A Design of Overlapped Chunked Code over Compute-and-Forward for Multi-Source Multi-Relay Networks

This paper investigates the design of overlapped chunked codes (OCC) for multi-source multi-relay networks in which compute-and-forward (CF), a physical-layer network coding approach based on nested lattice codes (NLC), is applied to the simultaneous transmissions from the sources to the relays. The resulting code is called OCC/CF. At each source, OCC is applied before NLC encoding, and random linear network coding is applied within each chunk. A decodability condition for designing OCC/CF is provided. In addition, an OCC with a contiguously overlapping, but non-rounded-end, fashion is employed for the design, which uses the probability distribution of the number of innovative codeword combinations and the probability distribution of the participation factor of each source in the codeword combinations received per chunk transmission. An estimation is carried out to select an allocation, i.e., the number of innovative blocks per chunk and the number of blocks taken from the previous chunk for all sources, that is expected to provide the desired performance. The numerical results show that the design overhead of OCC/CF is low when the probability distribution of the participation factor of each source is concentrated at the chunk size of that source.


Introduction
This paper is an extended version of the work in [1]. Wireless network nodes are now ubiquitous, and their density keeps increasing. Since the wireless channel bandwidth is limited, interference between nodes can degrade data transmission, e.g., by causing message loss, longer latency and high energy consumption. To address this issue, a handshaking mechanism can be applied to share the channel between nodes via control messages, e.g., request-to-send and clear-to-send. In addition, transmit power control can allow multiple sources to transmit their messages simultaneously with an acceptable level of interference between nodes. On the other hand, some proposed solutions exploit the interference instead of avoiding (or compensating for) it. One of these solutions is physical-layer network coding (PNC) [2], which also allows multiple sources to transmit their messages to common receivers simultaneously. of coded blocks, and feedback can be avoided [4]. However, if MSR is not constant, then some chunks will not be decodable, i.e., they are undecodable. To deal with this problem, the works in [13,14] proposed the overlapped chunked code (OCC), where a block can belong to more than one chunk. A decoded chunk can help to decode other undecodable chunks by back-substitution, i.e., blocks from decoded chunks are substituted into the undecodable chunks that also have them as input blocks. Other designs of OCCs, and of codes similar to OCC, were subsequently proposed, such as in the works of [15][16][17][18]. These codes are mainly designed for single-flow transmission or multicast transmission, i.e., the transmission of the data of one source. To date, there is no design of OCC for data transmission in multi-source multi-relay networks.
This paper considers the design and the application of OCC for data transmission in multi-source multi-relay networks where CF based on NLC is employed. The designed OCC is denoted as OCC/CF in this paper. The aim is to investigate the advantage of OCC/CF over a feedback-based transmission scheme. In addition, low computational complexity is targeted so that the proposed work is applicable to low-specification wireless nodes, e.g., wireless sensor nodes. This paper considers varying channel states where only the receivers have knowledge of the channel coefficients. The blocks of each source message are grouped into chunks. RLNC is done within each chunk before encoding with NLC. Only the transmissions from the sources to the relays are considered. The challenge in applying OCC in a multi-source multi-relay network is how to design OCC/CF such that the decodability of each chunk of all sources at the destination is ensured or the desired network performance is achieved. The contributions of this paper are as follows:
• analyzing the decodability of the chunks received at the destination to design OCC for each source, and providing a decodability condition to design OCC/CF;
• based on the decodability condition, designing OCC/CF by employing an OCC with a contiguously overlapping, but non-rounded-end, fashion at each source. The design uses the empirical rank distribution, i.e., the probability distribution of the number of linearly independent codeword combinations received at the destination per chunk transmission, as in the works of [17,18], and the probability distribution of the participation factor of each source in the received codeword combinations per chunk transmission. These two key quantities depend on the channel states from the sources to the relays, and they are applicable to any channel distribution;
• providing a decoding scheme based on the feature of the employed OCC. Besides back-substitution, the decoding scheme considers another opportunity to start decoding, namely the combination of chunks. The decoding complexity is bounded by the maximum number of combined chunks, and the storage overhead can be reduced;
• estimating the performance of the designed OCC/CF by following the decoding scheme and using table lookup for all allocations, i.e., the number of innovative blocks per chunk and the number of contiguously overlapped blocks for each source. The estimation determines which allocation can provide the desired performance, such as high decodability, the highest channel efficiency and acceptable decoding complexity;
• reducing the number of candidates for the linear combination coefficient vector computed at each relay. This is achieved through a trade-off between computational latency and frame error rate performance.
The numerical results demonstrate that the design of OCC/CF depends not only on the empirical rank distribution, but also on the probability distribution of the participation factor of each source. Compared with a feedback-based CF transmission scheme, the chance to improve the network performance by employing OCC/CF depends on the feedback latency and the feedback reception success rate.
The remainder of this paper is organized as follows. Related works are described in Section 2. A short review of NLC and CF is given in Section 3. Section 4 describes the system model of this work, which includes the scenario, the channel model, the encoding scheme at the sources and the computing at the relays, the acquisition of the linear combination coefficient vector at each relay, the considered empirical rank distributions and the analysis of decodability. Section 5 presents the design of OCC/CF using an OCC and the applied decoding scheme. The estimation of decodability for the designed OCC/CF is given in Section 6. The performance analysis and the reference schemes are described in Section 7. Section 8 shows the numerical results and discussion. Finally, Section 9 gives the conclusion.

Related Works
To complete a message transmission without the need for feedback, a natural choice is a rateless code, where the number of coded blocks is unlimited and the transmitter keeps sending coded blocks until the receiver can recover all original blocks or packets. The fountain code [19] is both an erasure code and a rateless code. Its main feature is the low computational complexity of the encoding and decoding processes, since they are done in the binary field, i.e., $\mathbb{F}_2$. This family includes the LT code [20], the Raptor code [21] and the online code [22]. The decodability of the LT code depends on the degree distribution, which is determined based on the soliton distribution. The degree is the number of input blocks used to generate a coded block. The input blocks for each coded block are randomly selected. The Raptor code applies a precoding process before encoding such that all original blocks are recoverable once a fraction of the coded blocks has been received. Online codes apply a precoding process for distributed networks. With fountain codes, the decoding process starts when at least one degree-one coded block, i.e., a plain block, exists and stops when there are no more degree-one coded blocks. The decoded blocks are back-substituted into the newly received coded blocks that also have them as input blocks. The application of the inactivation decoding method [23] was studied in the work of [24] for the decoding process of the LT code and the Raptor code to reduce the decoding complexity, because the transfer matrix of the received coded blocks, i.e., the coding coefficient matrix of the received coded blocks, is a sparse matrix.
For RLNC, each element of the coding coefficient matrix of the sent coded blocks is randomly drawn from a finite field $\mathbb{F}_q$ (normally, q is large enough, e.g., $q = 2^8$). The probability of linear independence between coded blocks is higher with RLNC than with sparse network coding (where the generated coding coefficient matrix of the coded blocks is a sparse matrix), especially in lossy communication networks, but the computational complexity of RLNC is higher. RLNC was employed within each chunk for the OCC proposed in the work of [13], where two overlapping fashions were given: the rectangular grid code and the diagonal grid code. The number of chunks is finite, but the decodability of the received chunks was not clearly analyzed. The overlapping fashion of the OCC in the work of [14] is contiguous and rounded-end. The decodability is analyzed in terms of the chunk size, the number of contiguously overlapped blocks and the number of received coded blocks. However, achieving high decodability, i.e., a high probability that a chunk is decodable, requires a large chunk size, which increases the computational complexity. A small chunk size was then analyzed in their later work [15]. However, in the worst case, i.e., when there are no more decodable chunks, the decoding process would start only when the receiver has collected a sufficient number of coded blocks of all chunks. Hence, more storage would be required at the receiver, and the decoding complexity is still significant. An OCC design with another overlapping fashion was proposed in the work of [16], where the overlapped blocks, i.e., the blocks taken from the other chunks, are randomly selected. Although its decodability is better than that of the OCC with the contiguously overlapping fashion [14], the decoding process still might start only when a sufficient number of coded blocks has been received.
Batched sparse (BATs) codes proposed in the work of [17] inherit the feature of rateless code by employing fountain codes as the outer code (chunk size obeys a degree distribution) and random linear network code as the inner code (RLNC is employed within each chunk). The degree distribution is determined using the empirical rank distribution to obtain the optimal performance in achievable rate. The decoding process starts when there is at least a decodable chunk, and back-substitution is done then. The inactivation decoding method might be applied when there are no more decodable chunks. The other design, which also employs the empirical rank distribution, is in the work of [18], where chunk size is fixed. Two degree distributions are defined, and a degree distribution is determined when another degree distribution is fixed to obtain the optimal achievable rate.
This paper provides the design of OCC/CF with a condition of decodability, which can be applied with the designs of codes for single flow transmission, which are described above. This paper employs an OCC in a contiguously overlapping fashion to design OCC/CF because it is simpler to determine which allocation for each source to obtain the desired performance since there are only two variables to be determined for each source. Although its performance in rate (channel efficiency, for this paper) is not higher than the other designs in single flow transmission, it has a potential to reduce the storage overhead and the computational complexity to suit its application with a low specification wireless node in multi-source multi-relay networks.

Notation
Boldface letters are used for vectors, e.g., a. Capital boldface letters are used for matrices, e.g., G. Superscripts T and −1 refer to the matrix transposition and inversion operations, respectively. $\mathbb{R}$ and $\mathbb{Z}$ denote the set of real numbers and the set of integers, respectively. In addition, the sign · refers to the multiplication operation, and the sign × is used to express the size of a matrix.

Nested Lattice Codes
An n-dimensional lattice $\Lambda$ is a discrete additive subgroup of $\mathbb{R}^n$, i.e., if $x_1, x_2 \in \Lambda$, then $x_1 + x_2 \in \Lambda$ and $-x_1 \in \Lambda$. A lattice point $x \in \Lambda$ is generated by the generator matrix $G \in \mathbb{R}^{n \times n}$ and an integer vector $b \in \mathbb{Z}^n$ by:
$$x = G \cdot b.$$
The fundamental Voronoi region of $\Lambda$, $\mathcal{V}$, is the set of points that are closer to the origin $x_o$ ($x_o = 0$) than to any other lattice point. A scaled lattice $\Lambda_p = p \cdot \Lambda$ is obtained by scaling the generator matrix of $\Lambda$, i.e., $G_p = p \cdot G$. A lattice $\Lambda_p$ is nested in $\Lambda$ if $\Lambda_p \subseteq \Lambda$, which holds if p is a non-zero positive integer. An NLC is formed by a coding lattice $\Lambda_c$ and a shaping lattice $\Lambda_s$, where $\Lambda_s \subseteq \Lambda_c$. The codebook of the NLC is the set of coset leaders of $\Lambda_c / \Lambda_s$, i.e., the lattice points (codewords) of $\Lambda_c$ that are inside the fundamental Voronoi region of $\Lambda_s$, $\mathcal{V}_s$. If $\Lambda_s = q \Lambda_c$, where q is a prime number, and the generator matrix of $\Lambda_c$, $G_c$, is full rank, then the coding rate of the NLC is $R = \log_2 q$, and the number of codewords is $q^n$. The key feature of NLC is that a linear combination of two codewords is still a codeword. The encoding process of NLC can be done as below:
$$x = [G_c \cdot b] \bmod \mathcal{V}_s,$$
where $b \in \mathbb{F}_q^n$ is the information, x is the NLC codeword corresponding to b and $[\,\cdot\,] \bmod \mathcal{V}_s$ is the operation mapping a lattice point of $\Lambda_c$ into $\mathcal{V}_s$. This operation restricts the transmit power of a sent codeword to an assigned maximum transmit power $P_{max}$.
The decoding process can be done as below:
$$b = [G_c^{-1} \cdot x] \bmod q,$$
where $[\,\cdot\,] \bmod q$ is the modulo operation by q, i.e., the operation mapping an integer value into the finite field $\mathbb{F}_q$.
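The linearity property of NLC can be checked on a toy example. The following Python sketch assumes the simplest possible nested pair, $\Lambda_c = \mathbb{Z}^n$ (so that $G_c$ is the identity matrix) and $\Lambda_s = q\mathbb{Z}^n$; this choice is an illustrative assumption and is not the lattice used in this paper, and the function names are hypothetical.

```python
import math

def mod_vs(x, q):
    # Map an integer vector into the Voronoi region of q*Z^n,
    # i.e., reduce each coordinate into the centered interval [-q/2, q/2).
    return [v - q * math.floor(v / q + 0.5) for v in x]

def encode(b, q):
    # Assumption: G_c is the identity, so x = [b] mod V_s.
    return mod_vs(b, q)

def decode(x, q):
    # Assumption: G_c^{-1} is the identity, so b = [x] mod q.
    return [round(v) % q for v in x]

def combine(codewords, coeffs, q):
    # Integer linear combination of codewords, mapped back into V_s.
    n = len(codewords[0])
    summed = [sum(a * x[j] for a, x in zip(coeffs, codewords)) for j in range(n)]
    return mod_vs(summed, q)

q = 7
b1, b2 = [1, 5, 6, 0], [3, 3, 2, 6]
x1, x2 = encode(b1, q), encode(b2, q)
u = combine([x1, x2], [2, 3], q)
# Decoding the combined codeword gives the same combination of the messages over F_q.
print(decode(u, q), [(2 * s + 3 * t) % q for s, t in zip(b1, b2)])  # both [4, 5, 4, 4]
```

With a non-trivial generator matrix, only `encode` and `decode` would change; the combination step stays the same.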

Compute-and-Forward
If K sources transmit their codewords simultaneously, the accumulated signal received at relay l can be expressed by:
$$y_l = \sum_{k=1}^{K} h_{kl} \, x_k + z_l,$$
where $x_k \in \mathbb{R}^n$ is an n-dimensional NLC codeword, which is transmitted from source k for $k \in \{1, 2, \cdots, K\}$, and $h_{kl} \in \mathbb{R}$ is the real channel coefficient of the link from source k to relay l. For the case of a complex channel coefficient, the derivation can be done as in the work of [7]. In addition, $z_l \in \mathbb{R}^n$ is additive white Gaussian noise (AWGN). Relay l processes $y_l$ to obtain a linear combination of the codewords of the K sources, $v_l$, where:
$$v_l = \sum_{k=1}^{K} a_{kl} \, x_k,$$
where $a_l = [a_{1l}, a_{2l}, \cdots, a_{Kl}]^T \in \mathbb{Z}^K$ is the linear combination integer coefficient vector used at relay l and $a_{kl}$ is the k-th element of $a_l$. Before forwarding, $v_l$ is mapped into $\mathcal{V}_s$ to obtain $u_l$ in order to obey the transmit power constraint, i.e., $\frac{1}{n} \sum_{n'=1}^{n} u_{ln'}^2 \le P_{max}$, where $u_{ln'}$ is the $n'$-th element of $u_l$, and $P_{max} > 0$.

Scenario
This paper takes the scenario of a K-source L-relay single-destination network, as shown in Figure 1 for the case K = L = 2. Each node is equipped with a single antenna. The direct links from the sources to the destination are not considered, and only the transmissions from the sources to the relays are considered. In this paper, all sources apply the NLC with the same coding rate $R = \log_2 q$. This paper assumes that the transmissions from the relays to the destination are lossless. The process of forwarding the codeword combinations to the destination can be done as in the works of [8,9] by exploiting the coordination from the destination, via control messages between the relays and the destination, to select which codeword combinations are to be forwarded and which relays forward them.
Figure 1. Scenario for the case of a two-source two-relay single-destination network. $h_{kl}$ is the channel coefficient corresponding to the instantaneous received signal-to-noise ratio (SNR) of the link from source k to relay l, where $k \in \{1, 2, \cdots, K\}$ and $l \in \{1, 2, \cdots, L\}$. $SNR_{kl}$ denotes the average received SNR of the link from source k to relay l.
This scenario is considered as data collection in wireless sensor networks or data backhauling in ultra-dense networks, and it is a part of the topologies of these networks. On the other hand, if its application in cognitive radio (CR) network is considered, the primary user (PU) is one of K sources, and the other sources are secondary users (SUs). Alternatively, all sources can be assigned as SUs.
There is a relay assigned for each source if the transmissions (used for reference schemes) via the orthogonal channel are considered.

Channel Model
This paper assumes that time is slotted and synchronized. Only real channel coefficients are considered, and the block channel fading is assumed, i.e., the channel coefficient for a whole block signal within a time slot along a channel link is constant. In addition, Rayleigh fading is considered, and the channel coefficient is independently and identically distributed for each channel link. Hence, the real channel coefficient is normally distributed. The average received signal-to-noise ratio for the link from source k to relay l is denoted by SNR kl . On the other hand, AWGN has zero mean and unit variance in this paper.

Computing Combination Coefficient Vector
$R_l(h_l, a_l)$ is defined as the computation rate corresponding to the channel coefficient vector $h_l = [h_{1l}, h_{2l}, \cdots, h_{Kl}]^T$ and the corresponding $a_l$. According to the work in [7], $R_l$ is achievable: for any large enough n, there exist encoders and decoders such that the receiver can recover the desired codeword combination with $a_l \neq 0$ with an average probability of error $\epsilon > 0$ if the maximum coding rate of all sources, i.e., R for this paper, satisfies the condition:
$$R < R_l(h_l, a_l). \qquad (7)$$
In this paper, $a_l$ is determined by applying the method proposed by U. Fincke and M. Pohst [25], as in the work of [26], to obtain the highest $R_l(h_l, a_l)$. Considering the hardware specification of sensor nodes, this paper exploits Condition (7) to reduce the computational overhead by reducing the number of candidates of $a_l$ in the search for the $a_l$ that provides the highest $R_l(h_l, a_l)$. In addition, Condition (7) is also used at each relay to filter the codeword combinations to be forwarded to the destination. Since $R = \log_2 q$, a higher value of q results in a higher message loss rate. In this paper, only a small value of q is considered. Reducing the number of candidates of $a_l$, i.e., tightening the bounds on the values of the elements of $a_l$, can be done as in the works of [26,27] by replacing the condition $R_l(h_l, a_l) > 0$ with $R_l(h_l, a_l) > R$.
However, this modification causes some degradation of the block error rate or frame error rate (FER) because a codeword combination might be correctly received without satisfying Condition (7). Figure 2 compares the number of candidates, the FER and the computational latency with those obtained under the condition $R_l(h_l, a_l) > 0$. The specification of the employed platform is shown in Table 1. The lattice code $E_8 / 7E_8$ is used for NLC in this comparison, where q = 7, and $E_8$ is a well-known lattice with n = 8.
The result is obtained by considering the codeword combinations of two sources at relay l, taking $SNR_{2,l}$ from 0 to 35 dB and $SNR_{1,l}$ in two cases: $SNR_{1,l} = 35$ dB, and $SNR_{1,l} = SNR_{2,l}$ from 0 to 35 dB. The FER for the condition $R_l(h_l, a_l) > 0$ was obtained by comparing the codeword combination with the combination of the original codewords. For the case with Condition (7), the codeword combination is first filtered with Condition (7) before being compared with the combination of the original codewords. Figure 2 shows that this setting trades some FER performance for lower computational latency.
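The candidate filtering described in this section can be sketched as follows. The sketch assumes the real-valued computation rate expression of [7], $R(h, a) = \frac{1}{2}\log_2^+\big((\|a\|^2 - P (h^T a)^2 / (1 + P\|h\|^2))^{-1}\big)$, with P the transmit SNR. It also replaces the Fincke-Pohst search of [25] with a plain exhaustive search over a small box of integer vectors, which is an assumption made only to keep the example short; the function names and the `bound` parameter are hypothetical.

```python
import itertools
import math

def computation_rate(h, a, snr):
    # Computation rate of [7] for real channels (assumed form, see lead-in).
    ha = sum(x * y for x, y in zip(h, a))
    hh = sum(x * x for x in h)
    aa = sum(x * x for x in a)
    denom = aa - snr * ha ** 2 / (1.0 + snr * hh)  # positive for any non-zero a
    return max(0.0, 0.5 * math.log2(1.0 / denom))

def best_coefficients(h, snr, rate, bound=2):
    # Exhaustive search (NOT Fincke-Pohst) over a in [-bound, bound]^K, a != 0.
    # Candidates are kept only if they satisfy R_l(h, a) > rate, as in Condition (7).
    best_a, best_r = None, 0.0
    for a in itertools.product(range(-bound, bound + 1), repeat=len(h)):
        if not any(a):
            continue
        r = computation_rate(h, a, snr)
        if r > rate and r > best_r:
            best_a, best_r = a, r
    return best_a, best_r

rate = math.log2(7)  # R = log2(q) with q = 7
print(best_coefficients([1.0, 1.0], snr=100.0, rate=rate))
print(best_coefficients([1.0, 1.0], snr=1.0, rate=rate))  # no candidate passes the filter
```

When no candidate satisfies the condition, the function returns `None` as the coefficient vector, which corresponds to the relay not forwarding a combination for that block.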

Encoding and Computing
A large file message is divided into small blocks, and the blocks are grouped into chunks (batches). For source k, where $k \in \{1, 2, \cdots, K\}$, the i-th chunk consists of $d_k^{(i)}$ blocks and is expressed by the matrix B. RLNC is applied within each chunk, with the coding coefficient matrix χ, to generate M coded blocks, W, which are then encoded with NLC, as shown in Figure 3. All sources transmit these M codewords for each chunk simultaneously to the relays. The information needed for decoding, e.g., the coding coefficients, $d_k^{(i)}$, etc., can be attached to the transmitted data, e.g., in the header of the frame. However, the location of the attached information for a source should not overlap with those of the other sources, as shown in Figure 4. Hence, a small chunk size is preferred when the header has a limited length. This paper assumes that the length of the attached information is negligible compared with the length of the sent block. Alternatively, this information can be made known to the receivers (relays or destination) by broadcasting from each source, for example. This paper assumes that the content of this information is correctly received. Relay l, for $l \in \{1, 2, \cdots, L\}$, computes the superposition of the K codewords $x_{1m}^{(i)}, \cdots, x_{Km}^{(i)}$ to obtain their linear combination $u_{lm}^{(i)} \in \mathbb{R}^n$ to forward to the destination, which is:
$$u_{lm}^{(i)} = \Big[ \sum_{k=1}^{K} \beta_{klm}^{(i)} \, x_{km}^{(i)} \Big] \bmod \mathcal{V}_s,$$
where $\beta_{lm}^{(i)} = [\beta_{1lm}^{(i)}, \cdots, \beta_{Klm}^{(i)}]^T$ is the combination coefficient vector computed at relay l for the m-th blocks of all sources for chunk i. Then, the combined coding coefficient vector of u

Empirical Probability Distributions
Since lossless transmissions from the relays to the destination are assumed, the total number of linearly independent codeword combinations at the relays for each chunk is the same as at the destination. $r^{(i)}$ (r for any chunk) denotes the number of linearly independent codeword combinations correctly received at the relays (destination) for chunk i. Hence, $r^{(i)} \le D^{(i)}$, where $D^{(i)} = \sum_{k=1}^{K} d_k^{(i)}$ is the total number of original blocks of all sources in chunk i. The original blocks of all sources for chunk i are recoverable if there are $D^{(i)}$ linearly independent received codeword combinations for chunk i, i.e., $r^{(i)} = D^{(i)}$. If the channel state is stable, i.e., $r^{(i)}$ is constant for all i, all chunks can be decoded with a suitable value of M without the need for feedback from the destination. However, with an unstable channel state, $r^{(i)}$ varies between chunks. Hence, without the aid of feedback, some chunks are undecodable. As in the works of [17,28], in this paper, $\rho(r)$ denotes the empirical probability distribution of r, for $r \in \{0, 1, \cdots, D_{max}\}$, where $D_{max} = \max\{D^{(i)}, \forall i\}$.
On the other hand, in this paper, $\theta_k^{(i)}$ ($\theta_k$ for any chunk) denotes the rank of the part of the matrix $C^{(i)}$ that corresponds to source k. $\theta_k^{(i)}$ is defined as the participation factor of source k in $C^{(i)}$, i.e., in the forwarded codeword combinations of chunk i. In addition, $\lambda_k(\theta_k)$ denotes the empirical probability distribution of $\theta_k$, for $\theta_k \in \{0, 1, \cdots, d_k\}$.
In practical applications, ρ (r) and λ k (θ k ) can be collected by employing a feedback-based transmission scheme only for the chunks without feedback loss, as in the work of [29], for example. The overhead caused by the linear dependence between coded blocks and between codeword combinations, i.e., due to the small value of finite field size q, is taken into account in the data collections of ρ (r) and λ k (θ k ). In addition, exploiting the probability distributions for the design of OCC/CF enables OCC/CF to be applicable to the other channel distributions, ensuring its robustness.
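As an illustration of how ρ(r) and λ_k(θ_k) could be collected, the following Python sketch computes the rank of a combined coding coefficient matrix over $\mathbb{F}_q$ (q prime), the participation factor of one source, and an empirical distribution from a list of samples. The matrix layout (one row per received codeword combination, one column per original block, with the columns of each source stored contiguously) and all function names are assumptions made for this example.

```python
from collections import Counter

def rank_mod_q(rows, q):
    # Rank over the prime field F_q by Gauss-Jordan elimination (Python 3.8+ for pow(x, -1, q)).
    m = [[v % q for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, q)
        m[rank] = [(v * inv) % q for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [(x - factor * y) % q for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

def participation_factor(rows, first_col, last_col, q):
    # theta_k: rank of the column block [first_col, last_col) that belongs to source k.
    return rank_mod_q([row[first_col:last_col] for row in rows], q)

def empirical_distribution(samples, max_value):
    # Relative frequency of each value 0..max_value among the collected samples.
    counts = Counter(samples)
    return [counts.get(v, 0) / len(samples) for v in range(max_value + 1)]

# Two sources with two blocks each; the third combination is the sum of the first two.
C = [[1, 0, 2, 0], [0, 1, 0, 3], [1, 1, 2, 3]]
print(rank_mod_q(C, 7), participation_factor(C, 0, 2, 7), participation_factor(C, 2, 4, 7))
print(empirical_distribution([2, 2, 1, 0], 2))
```

In this example, each source has full participation ($\theta_1 = \theta_2 = 2$) although the chunk is not decodable (r = 2 < 4), which shows why both distributions carry separate information.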

Decodability
In order to analyze the decodability, in this paper, $p_d$ and $p_k$ denote the probabilities that chunk i is decodable, i.e., $r^{(i)} = D^{(i)}$ and $\theta_k^{(i)} = d_k^{(i)}$, respectively, when employing an OCC that is designed by using $\rho(r)$ and $\lambda_k(\theta_k)$, respectively, in single-flow transmission, i.e., transmission from a source to a relay via an orthogonal channel. The overlapping fashions of the OCCs corresponding to $\rho(r)$ and $\lambda_k(\theta_k)$ are the same.
This paper considers the case that $K \ge L$, $M = d_{max}$, and source k generates M coded blocks by the RLNC encoder with $d_k^{(i)}$ linearly independent coded blocks for chunk i and all k, i.e., the coding coefficients would be pseudorandom to ensure the linear independence between coded blocks. When employing OCC/CF, the codeword combinations of chunk i are recoverable at the destination if there are $D^{(i)}$ received linearly independent codeword combinations, i.e., $r^{(i)} = D^{(i)}$.
To determine the probability that a chunk is decodable, this paper studies two cases as below:
• Case I: $\beta_{lm}^{(i)} = [\beta_{1lm}^{(i)}, \cdots, \beta_{Klm}^{(i)}]^T$ is a unit vector, i.e., only one element of $\beta_{lm}^{(i)}$ is equal to one, and the others are zero;
• Case II: $\beta_{lm}^{(i)}$ does not have zero elements; there are only M linearly independent codeword combinations, and they are forwarded by only one relay.
For Case I, $r^{(i)}$ can be written as
$$r^{(i)} = \sum_{k=1}^{K} \theta_k^{(i)}.$$
Hence, the decodability of each chunk only depends on the OCC design using $\lambda_k(\theta_k)$ for all k. In this case, the original blocks of each source can be recovered independently since every received codeword combination corresponds to the coded blocks from only one source. Chunk i is decodable for all sources, i.e., $r^{(i)} = D^{(i)}$, if the chunks of all sources are decodable; thus, the probability that a chunk is decodable for all sources, $p_{dec}$, can be written as
$$p_{dec} = \prod_{k=1}^{K} p_k.$$
Case II assumes that chunk i is not decodable and that there are $\gamma_k^{(i)}$ blocks inside chunk i for source k, with $k \in \{1, 2, \cdots, K\}$, which also belong to other chunks. If these $\gamma_k^{(i)}$ blocks have already been recovered from the decoded chunks, then there are still $d_k^{(i)} - \gamma_k^{(i)}$ blocks of source k to recover for chunk i. From another point of view, this is equivalent to the case that matrix $C^{(i)}$ has $\gamma_k^{(i)}$ eliminated rows for each source k. $C^{(i)}$ can thus be approximately obtained by eliminating $\sum_{k=1}^{K} \gamma_k$ rows, which corresponds to the case that $\sum_{k=1}^{K} \gamma_k$ blocks are back-substituted into chunk i when employing OCC in single-flow transmission. Then, the decodability of each chunk when employing OCC/CF is the same as when employing the OCC designed using $\rho(r)$ in single-flow transmission. Hence, in this case, $p_{dec} = p_d$. The feature of Case II is that the already recovered $\gamma_k^{(i)}$ blocks can be back-substituted into chunk i without waste. In contrast, for the other case, taking $\theta_k$ as an example, these recovered blocks can successfully increase the number of linearly independent received coded blocks in chunk i only if they are linearly independent of the existing received coded blocks in chunk i.
In addition, the value of γ (i) k should be appropriately selected by using θ (according to the OCC design using ρ (r)) and q = 7 is shown in Figure 5. In Figure 5a Figure 5b, then chunk i is decodable with all three possibilities. Therefore, a suitable selection of γ (i) 1 and γ (i) 2 can provide better performance for OCC/CF. Hence, the OCC design using λ k (θ k ) for all k is needed. In Figure 5c, r (i) = 3, θ (i) 1 = 2 and θ (i) 2 = 3 are given. By taking any two different recovered blocks, chunk i is decodable with nine of ten chances. The undecodable outcome should be caused by the selection of γ i.e., the OCC design using ρ (r). Figure 5c represents Case II where θ The decodability of chunk i is given with different possibilities of recovered blocks.
For the general case, by combining the two cases above, the effective probability that each chunk is decodable when OCC/CF is applied, denoted by p deff , can be approximately obtained by: On the other hand, for the case that K > L > 1, the values of M and d (i) k for k ∈ {1, 2, · · · , K} should be selected appropriately such that any chunk i can be decoded by itself, i.e., r (i) = D (i) .
For example, if taking d

Channel Efficiency
In this paper, channel efficiency is defined as the ratio of the total number of decoded blocks from all sources to the total transmission time (the total number of time slots for OCC/CF or for the transmission schemes without the need for feedback from the relays) taken from the sources to the relays. η and η eff denote the channel efficiencies corresponding to p d and p deff , respectively.
For a K-source L-relay network, the ideal value of channel efficiency, which is obtained with lossless transmission and without linear dependence between codeword combinations, is min{K, L}. Thus, for L = 1, the channel efficiency would be like in the case of single flow transmission via an orthogonal channel. Hence, applying an orthogonal channel might be a better option. This paper only considers the case that L > 1.
On the other hand, $\bar{\rho}$ denotes $\sum_{r=1}^{D_{max}} r \cdot \rho(r)$, and $\bar{\eta}$ denotes $\bar{\rho} / M$. $\bar{\eta}$ is called the channel capacity in this paper, i.e., the upper bound of $\eta_{eff}$. Many OCC designs in single-flow transmission try to obtain $\eta_{eff}$ close to $\bar{\eta}$. In this paper, the (design) overhead is defined as the gap between $\eta_{eff}$ and $\bar{\eta}$.
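As a small worked example of these definitions, the following Python sketch computes the expected rank, the channel capacity and the design overhead from an empirical rank distribution. The numerical values of ρ(r), M and the effective efficiency are made up for illustration, and the function names are hypothetical.

```python
def expected_rank(rho):
    # rho[r] is the empirical probability of receiving r independent combinations.
    return sum(r * p for r, p in enumerate(rho))

def channel_capacity(rho, m):
    # Expected rank per chunk divided by the M time slots used per chunk.
    return expected_rank(rho) / m

def design_overhead(eta_eff, rho, m):
    # Gap between the achieved efficiency and its upper bound.
    return channel_capacity(rho, m) - eta_eff

rho = [0.1, 0.2, 0.3, 0.4]  # hypothetical distribution for r in {0, 1, 2, 3}
print(expected_rank(rho), channel_capacity(rho, 4), design_overhead(0.45, rho, 4))
```

Here the expected rank is about 2 blocks per chunk, so with M = 4 the capacity is about 0.5 blocks per time slot.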

Encoding
This paper applies an OCC in a contiguously overlapping fashion, which is similar to the works of [14,15], but not in a rounded-end fashion, for the design of OCC/CF in a multi-source multi-relay network where CF based on NLC is employed. The applied overlapping fashion is shown in Figure 6. In this fashion, for source k and each chunk, there are $\mu_k > 0$ innovative blocks, i.e., blocks that are linearly independent of the blocks of the other chunks, and there are $\gamma_k$ overlapped blocks between two contiguous chunks. Hence, there are $d_k = \mu_k + \gamma_k$ blocks in every chunk except the first chunk, where there are only $\mu_k$ blocks, i.e., $d_k^{(1)} = \mu_k$, since it is not the rounded-end fashion. $\mu$ and $\gamma$ are defined as $\sum_{k=1}^{K} \mu_k$ and $\sum_{k=1}^{K} \gamma_k$, respectively. There are $\min\{M, d_k\}$ linearly independent coded blocks among the M coded blocks for chunk i and source k, where the coding coefficients χ should be pseudorandomly generated to achieve this goal.
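The chunk layout described above can be sketched as follows for one source. The function name is hypothetical, blocks are indexed from zero, and the sketch assumes $\gamma_k \le \mu_k$ (so that overlapped blocks come only from the immediately preceding chunk) and lets the last chunk be shorter when the number of blocks is not a multiple of $\mu_k$; neither choice is stated in the text.

```python
def chunk_block_indices(n_blocks, mu, gamma):
    # Contiguously overlapping, non-rounded-end layout for one source:
    # every chunk adds `mu` new blocks and reuses the last `gamma` blocks
    # of the previous chunk; the first chunk has no predecessor to reuse.
    if mu <= 0 or not 0 <= gamma <= mu:
        raise ValueError("expected mu > 0 and 0 <= gamma <= mu")
    chunks = []
    start = 0
    while start < n_blocks:
        first = max(0, start - gamma)
        last = min(start + mu, n_blocks)
        chunks.append(list(range(first, last)))
        start += mu
    return chunks

# mu = 3 innovative blocks and gamma = 1 overlapped block per chunk.
print(chunk_block_indices(8, 3, 1))  # [[0, 1, 2], [2, 3, 4, 5], [5, 6, 7]]
```

The first chunk holds $\mu_k$ blocks and the following full chunks hold $\mu_k + \gamma_k$ blocks, matching $d_k^{(1)} = \mu_k$ and $d_k = \mu_k + \gamma_k$.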

Decoding
The feature of OCC is that a decoded chunk can help the other undecodable chunks in decoding by using back-substitution (b.s). The recovered blocks of the decoded chunk are substituted into the undecodable chunks that consist of the same blocks, i.e., the overlapped blocks. Thus, the number of linearly independent codeword combinations of the back-substituted chunks might be increased, and it depends on the value of q and the pairs (d k , γ k ) for all k [28]. With the OCC employed in this paper, for chunk i, left back substitution (l.b.s) and right back-substitution (r.b.s) denote b.s by the decoded neighboring chunk on the left, i.e., chunk i − 1, and on the right, i.e., chunk i + 1, respectively.
In addition to b.s, this paper considers another decoding opportunity for the applied OCC, called combination of chunks (co.cs). co.cs combines contiguous undecodable chunks into a chain of chunks (ch.cs) with length φ ≥ 1, where φ is the number of combined chunks. Unlike with b.s, the decoding process can start without an already decoded neighboring chunk. The form of the combined coding coefficient matrix of co.cs, $C_c$, is shown in Figure 7. A ch.cs is decodable if the rank of $C_c$, rank($C_c$), is equal to the total number of original blocks inside that ch.cs, which is denoted by $r_{ch}$. A ch.cs is considered a directly undecodable ch.cs, without waiting to receive a new chunk, if rank($C_c$) is lower than a threshold value denoted by $r_{th}$. $r_{ch}$ and $r_{th}$ are determined as described in Algorithm 1 by using the feature of the applied OCC. Chunk i can participate in the co.cs process if $r_p^{(i)} \neq 0$, where $r_p^{(i)}$ is determined as described in Algorithm 2. In Algorithms 1 and 2, l.b.s.s(i) and r.b.s.s(i) refer to the states of l.b.s and r.b.s, respectively, for the considered chunk i. l.b.s.s(i) and r.b.s.s(i) declare whether or not undecodable chunk i has been back-substituted by its left neighboring decoded chunk, i.e., chunk i − 1, and by its right neighboring decoded chunk, i.e., chunk i + 1, respectively. $r_p^{(t)} = 0$ means that chunk t cannot participate in co.cs, and its decodability depends on $r^{(t)}$. The process of co.cs is described in Algorithm 3, where d.s(ch.cs) refers to the decodability state of the currently obtained ch.cs.
The decoding process can be done as described in Algorithm 4, where d.s(t) and d.s(t − φ + 1 : t) declare whether chunk t and ch.cs, combining from chunk t back to t − φ + 1, respectively, are decoded or not. The decoding process for a ch.cs with length φ ≥ 2 can be done by using the inactivation decoding method [23] in order to reduce the decoding complexity. However, Gaussian elimination is applied for the decoding process in this paper.
The chunks that are considered as directly undecodable chunks without waiting for the next received chunks can become decodable by the aid of feedback from the destination back to the sources. Otherwise, they can be discarded in order to reduce the storage overhead if they do not affect the recovery of all blocks, i.e., the original message. The latter option can be achievable by applying precoding before OCC at each source. With precoding, the original blocks can be recovered when a fraction of all coded blocks is decoded [12].

Algorithm 1
Determining r th and r ch of a co.cs starting from chunk t with length φ.

Design with Applied Overlapped Chunked Code
The design of the applied OCC for multi-source multi-relay, i.e., the design of OCC/CF, is to determine an allocation [(µ 1 , γ 1 ) , (µ 2 , γ 2 ) , · · · , (µ K , γ K )] with the desired p deff or with the desired effective channel efficiency η eff . For convenience, this paper takes M as the maximum chunk size, i.e., M ≥ max{d k , 1 ≤ k ≤ K} and M can provide min{θ k , 1 ≤ k ≤ K} > 0. If N k is the total number of blocks for source k, the number of chunks that contain the blocks of all sources is equal to The finite number of chunks for the applied OCC might cause high overhead in single-flow transmission if compared with the other codes such as in the works of [16][17][18]30], which have a rateless feature. However, the design of the applied OCC in a multi-source multi-relay network might be simpler if compared with the other designs that determine the probability distribution of chunk size for all sources, for example. By taking the design of BATs codes [17] as an example, d k is selected according to a determined degree distribution Ψ k = {ψ 0 , ψ 1 , · · · , ψ M } for each chunk. Determining Ψ k for all k must consider the outputs of p k , p d and p deff , while Ψ k for all k needs to satisfy Ψ 1 * Ψ 2 * · · · * Ψ K = Ψ [31], where Ψ is the degree distribution of D = ∑ K k=1 d k and sign * refers to the discrete-time convolution operation. It becomes more complicated to determine Ψ k for all k when K is large.
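The convolution constraint on the per-source degree distributions mentioned above can be illustrated with a short sketch. It is not taken from the paper: the function names and the toy distributions are hypothetical, and the sketch only shows how the distribution of D = d 1 + · · · + d K follows from independent per-source distributions Ψ k .

```python
def convolve(p, q):
    """Discrete convolution of two probability vectors indexed from degree 0."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def total_degree_distribution(dists):
    """Distribution of the sum of independent per-source chunk sizes."""
    result = [1.0]  # distribution of an empty sum: degree 0 with probability 1
    for d in dists:
        result = convolve(result, d)
    return result


# Hypothetical toy distributions for K = 2 sources (index = chunk size d_k).
psi_1 = [0.0, 0.5, 0.5]
psi_2 = [0.0, 0.25, 0.75]
psi = total_degree_distribution([psi_1, psi_2])
```

Designing Ψ k in the other direction, i.e., factoring a target Ψ into K per-source distributions, is the harder problem the text refers to, since the search space grows with K.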
For the design of OCC/CF, d k is fixed for all chunks except for the first chunk. Thus, there are 1 + 2 + · · · + M = M (M + 1)/2 candidates of (µ k , γ k ) for source k and (M (M + 1)/2)^K candidates for all sources. The number would be smaller if d k = M were fixed and only γ k were varied for all k, as in the previous work of this paper [1], where there are only M candidates for each source and M^K candidates for all sources. However, the decoding complexity and the storage overhead at the destination might then be high. For this work, the chunk size for each source is not large and is bounded by M; however, it can be large, as in the work of [14]. A larger chunk size with a large number of overlapped blocks can improve the decodability of all chunks, but it can cause high computational complexity, especially decoding complexity, and high storage overhead at the destination. An undecodable chunk needs to wait for several newly received chunks before decoding can start, i.e., φ is large. For example, if taking µ = 14, γ = 18, ρ̄ = 15 and M < D, where D = µ + γ = 32, then there are no chunks that can be decoded by themselves, i.e., with r = D and without using b.s or co.cs. The decoding process can only start by co.cs, where φ at least satisfies φ · ρ̄ ≥ φ · µ + γ; hence, φ ≥ 18. With a large chunk size, the decoding process rarely starts with back-substitution, i.e., with a ch.cs of length φ = 1. The decoding process only starts with co.cs with large φ. In this work, the length φ is bounded in order to keep the decoding complexity low, by providing more opportunities to conduct b.s, and to reduce the latency that an undecodable chunk incurs while waiting to become decodable.
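The counting argument and the chain-length example above can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, the enumeration assumes µ k ≥ 1, γ k ≥ 0 and µ k + γ k ≤ M, and the chain-length bound assumes that a chain can become decodable only once the expected total rank φ · ρ̄ reaches the φ · µ + γ original blocks it contains.

```python
def candidates_per_source(M):
    """Number of (mu, gamma) pairs with mu >= 1, gamma >= 0, mu + gamma <= M."""
    return sum(1 for d in range(1, M + 1) for mu in range(1, d + 1))


def candidates_all_sources(M, K, fixed_chunk_size=False):
    """Size of the allocation search space for K sources."""
    per_source = M if fixed_chunk_size else candidates_per_source(M)
    return per_source ** K


def min_chain_length(mu, gamma, mean_rank, limit=10_000):
    """Smallest phi with phi * mean_rank >= phi * mu + gamma, or None."""
    if mean_rank <= mu:
        return None  # the expected rank never catches up with the unknowns
    for phi in range(1, limit + 1):
        if phi * mean_rank >= phi * mu + gamma:
            return phi
    return None


per_source = candidates_per_source(10)           # M (M + 1) / 2 for M = 10
search_space = candidates_all_sources(10, 2)     # both mu_k and gamma_k vary
search_space_fixed = candidates_all_sources(10, 2, fixed_chunk_size=True)
phi_min = min_chain_length(mu=14, gamma=18, mean_rank=15)
```

With the values from the example in the text (µ = 14, γ = 18, ρ̄ = 15), the loop stops at the same bound φ ≥ 18.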

Overview
In order to select an appropriate allocation [(µ 1 , γ 1 ) , (µ 2 , γ 2 ) , · · · , (µ K , γ K )] for the desired purpose, the estimation of p deff (also η eff ) is done for each possible allocation. The estimations of p k and p d are conducted separately for each possible allocation. Then, p deff is determined by (11), and η eff is determined by: In the previous work of this paper [1], the estimation was done by conducting a simulation to obtain the performance in p d and p k of all allocations. Alternatively, in this paper, the estimation is done by conducting table lookup and the accumulative sum of the probabilities that ch.cs with the maximum length φ max are decodable for all possible combinations of r i−φ+1 , r i−φ+2 , · · · , r i and θ i−φ+1 k , θ i−φ+2 k , · · · , θ i k to determine p d and p k , respectively, using ρ (r) and λ (θ k ), respectively, for φ ∈ {1, 2, · · · , φ max } and i > 1. For convenience, only the estimation of p d is described, and the estimation of p k for all k can be done similarly.
At the start, it is assumed that an OCC with the fashion as in Figure 6 is applied from a sender to a receiver in single-flow transmission, and ρ (r) with r ∈ {1, 2, · · · , D max } is the obtained empirical rank distribution. The chunk size D is selected from the range of r; the maximum number of linearly independent codeword combinations (coded blocks) received per chunk then becomes D, and ρ (r) is updated to ρ (r′) where r′ ∈ {1, 2, · · · , D}. This paper estimates ρ (r′) from ρ (r) by: For the estimation and for convenience, another four back-substitution states for a chunk are defined as below: n.b.s.s(r′), for example, refers to a chunk that has r′ linearly independent coded blocks, and its state is n.b.s.s. n (r′), h (r′), q (r′) and f (r′) denote the probabilities that a chunk has r′ linearly independent coded blocks and has n.b.s.s, h.b.s.s, q.b.s.s and f.b.s.s, respectively. They satisfy the condition ∑_{r′=1}^{D} [n (r′) + h (r′) + q (r′) + f (r′)] = 1.
Initially, n (r′) = ρ (r′) and h (r′) = q (r′) = f (r′) = 0 for r′ ∈ {1, 2, · · · , D} are given. The estimation here is to update n (r′), h (r′), q (r′) and f (r′) according to the decoding process for all values of r′. If ρ d denotes the probability distribution of the number of linearly independent coded blocks in a chunk after conducting the updating process, then ρ d (r′) = n (r′) + h (r′) + q (r′) + f (r′), for r′ ∈ {1, 2, · · · , D}.
In the updating process, the chunk with n.b.s.s, h.b.s.s and q.b.s.s is active, i.e., n (r ), h (r ) and q (r ) are used to conduct the updating process, and the chunk with f.b.s.s is inactive, i.e., f (r ) cannot be used to conduct the updating process and is only used in determining ρ d . The updating process is to try transforming n.b.s.s(r ) for all r to the chunks with other states, i.e., to make n (r ) tend to zero for all r . At the end of the updating process, p d is obtained by taking p d = ρ d (D).
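The bookkeeping described above (four state tables that always sum to one, and p d read off as ρ d (D)) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the dictionary layout and the toy rank distribution are hypothetical, and the single transfer shown only mimics the "same increment, same decrement" idea.

```python
def init_states(rho, D):
    """n starts as the rank distribution; h, q and f start at zero."""
    n = {r: rho.get(r, 0.0) for r in range(1, D + 1)}
    h = {r: 0.0 for r in range(1, D + 1)}
    q = {r: 0.0 for r in range(1, D + 1)}
    f = {r: 0.0 for r in range(1, D + 1)}
    return n, h, q, f


def move_mass(src, r_from, dst, r_to, p):
    """Shift probability p from one state entry to another (mass-preserving)."""
    p = min(p, src[r_from])  # never move more mass than is available
    src[r_from] -= p
    dst[r_to] += p
    return p


def rho_d(n, h, q, f):
    """Rank distribution after updating: sum of the four state tables."""
    return {r: n[r] + h[r] + q[r] + f[r] for r in n}


# Hypothetical toy rank distribution for D = 18.
D = 18
rho = {15: 0.1, 16: 0.2, 17: 0.3, 18: 0.4}
n, h, q, f = init_states(rho, D)
move_mass(n, 15, f, 18, 0.0065)   # one illustrative transfer towards decodability
dist = rho_d(n, h, q, f)
p_d = dist[D]                     # decodability is the mass at full rank
```

Keeping every update as a paired decrement and increment guarantees that the four tables still sum to one after any number of iterations.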
The updating process is divided into two parts: b.s and co.cs, which are for φ = 1 and for 2 ≤ φ ≤ φ max , respectively. This paper assumes that the estimation of decodability is done at the destination. The destination informs about the desired allocation to the sources via feedback.
A combination of r (i−φ+1) , r (i−φ+2) , · · · , r (i) for a ch.cs (simply combination for convenience) with length φ is considered as a possible combination to be taken into account in the estimation if it satisfies: and it does not contain any possible combination with length φ inside, where φ ∈ {1, 2, · · · , φ − 1}. The ch.cs corresponding to a possible combination is decodable if rank (C c ) = φ · µ + γ. p s denotes the probability that a combination can make the correspondent ch.cs decodable and q s = 1 − p s . All possible combinations and their p s are obtained by conducting a computation in MATLAB in this work and known by the destination where table lookup is done while doing the estimation. Based on three different locations of a chunk in a ch.cs, the other three probabilities are defined as below. Chunk t is decodable, i.e., ch.cs that contains chunk t and t ∈ {i − φ + 1, · · · , i} is decodable, by probability p c , which is: When the beginning chunk t = i is focused on, it must have r (t) ≥ µ + 1, as in the case that one of l.b.s.s (t) and r.b.s.s (t) is true in Algorithm 2. For example, by taking D = 18 and γ = 4, a combination with rank array [16,14,14,17] is a possible combination where r (t) = 16 and p s ≈ 0.9663. However, the combination with rank array [16,15,14,17] is not a possible combination because it contains a possible combination with rank array [15,14,17].
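The screening of rank arrays in the examples above can be reproduced with a small sketch. It rests on an assumption that is not confirmed by the text: the omitted relation is taken to be the necessary rank-sum condition r (i−φ+1) + · · · + r (i) ≥ φ · µ + γ, which matches the decodability condition rank (C c ) = φ · µ + γ. The function names are hypothetical, and the per-position conditions from Algorithm 2 and the ordering rules discussed for intermediate chunks are deliberately ignored.

```python
def meets_rank_sum(ranks, mu, gamma):
    """Assumed necessary condition: total rank covers all blocks in the chain."""
    return sum(ranks) >= len(ranks) * mu + gamma


def is_possible_combination(ranks, mu, gamma):
    """True if the array meets the condition and no shorter contiguous
    sub-array already does (i.e., no shorter possible combination inside)."""
    if not meets_rank_sum(ranks, mu, gamma):
        return False
    n = len(ranks)
    for length in range(1, n):
        for start in range(0, n - length + 1):
            if meets_rank_sum(ranks[start:start + length], mu, gamma):
                return False
    return True


# Examples from the text with D = 18, gamma = 4, hence mu = D - gamma = 14.
a = is_possible_combination([16, 14, 14, 17], mu=14, gamma=4)
b = is_possible_combination([16, 15, 14, 17], mu=14, gamma=4)
c = is_possible_combination([15, 14, 17], mu=14, gamma=4)
```

Under this assumption the sketch agrees with the two rank arrays discussed in the text: the first is accepted, and the second is rejected because it contains [15,14,17].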
When the intermediate chunk t ∈ {i − φ + 2, · · · , i − 1} is focused on, it must have r (t) ≥ µ − γ + 2, as in the case that l.b.s.s (t) and r.b.s.s (t) are all false in Algorithm 2. However, for r (t) > µ, the possible combinations that have the same rank arrays as those of the possible combination when focusing on the beginning chunk with the same value of r (t) are not included. In this estimation, the possible combinations that are both available while focusing on the beginning chunk and the intermediate chunk and have the same elements of the rank array, and the same focused chunk with n.b.s.s is applied once in the updating process with co.cs. This is because there is no constraint on the order of chunks in a ch.cs for Relation (18). For example, by taking D = 18 and γ = 4, two combinations with rank array [16,13,15,16] focusing on the chunk with r (t) = 13 and with rank array [16,15,16] focusing on the chunk with r (t) = 15 are used because combinations with rank arrays [13,16,15,16] focusing on the chunk with r (t) = 13 and with rank array [15,16,16] focusing on the chunk with r (t) = 15 are not possible combinations. However, the combination with rank array [16,13,15,16] focusing on the chunk with r (t) = 15 is not a possible combination because the combination with rank array [15,16,13,16] focusing on the chunk with r (t) = 15 is a possible combination.
The purpose of introducing q.b.s.s is described by the following example. By taking D = 18 and γ = 3, a combination with rank array [17,16,14,17] has p s ≈ 0.9473. However, it is not a possible combination since the combination with rank array [17,16] is a possible combination with p s ≈ 0.8369. Then, no matter how the rest of the decoding process (with b.s) is done, the combination with rank array [17,16,14,17] is decodable with p s less than 0.8369, which is lower than the real value. Hence, in order to fix this problem, q.b.s.s is introduced by assuming that the beginning chunk with n.b.s.s of an undecodable possible combination (with a probability of q c = q s b (r (i) ) e (r (i−φ+1) ) ∏ i−1 t=i−φ+2 i (r (t) )) becomes a chunk with r (i) = D − 1, but no longer with n.b.s.s, and with a state like h.b.s.s during the updating process with co.cs and with a state like n.b.s.s during the updating process with b.s. After introducing q.b.s.s, the ch.cs with the rank array [17,16,14,17] is decodable with a probability around 0.9498, which is close to the real value.
The relationship (transition) between four states of a chunk for the updating process with co.cs is shown in Table 2, where p t refers to the transfer probability.

Table 2. Relationship (transition) between four states of a chunk for the updating process with co.cs (table body not recoverable; surviving column headers: "Condition for focused chunk t with r (t)" and "From").

The updating process with co.cs is done by applying the relationship in Table 2. For instance, for a combination with rank array [16,15,17], p s ≈ 0.8166 and p c = 0.0065 are obtained. Then, n (15) decreases by p c , and f (18) increases by p c . The updating process with co.cs can be done repeatedly for a number of iterations. n (r′), h (r′), q (r′) and f (r′) for all values of r′ are updated after an iteration of the updating process with co.cs and are used for the next iteration.

Back-Substitution
The work in [28] provides the probability that γ′ recovered (overlapped) blocks successfully help undecodable chunk t, which already has r (t) linearly independent coded blocks, turn into a decodable chunk, where r (t) + γ′ ≥ D and γ′ ∈ {γ, 2 · γ}. In this estimation, the increment of the number of linearly independent coded blocks in an undecodable chunk after conducting b.s by using γ′ recovered blocks is also considered. In addition, the probability distribution of the increment is used.
In order to obtain these data, a computation in MATLAB is conducted to obtain the rank probability distribution of matrix C (t) ∈ F q^(D×r (t)) with γ′ rows eliminated, i.e., a (D − γ′) × r (t) matrix. The rank increment that the γ′ overlapped blocks provide to undecodable chunk t follows from the rank of this reduced matrix. p bs = n (D) + h (D) is taken as the fraction (probability) of the decoded chunks that can be used for b.s, and q bs = 1 − p bs . If only half b.s (l.b.s or r.b.s) is executable, then γ′ = γ. If full b.s is executable, then γ′ = 2 · γ. The transition table between four chunk states for the updating process with b.s is shown in Table 3. The focused chunk t has C (t) with rank r (t) .

Table 3. Transition table between four chunk states for the updating process with b.s (table body not recoverable; surviving column headers: "Half or full b.s or N/A" and "From").
The updating process with b.s is done by applying the relationship of four chunk states shown in Table 3 and by following the concept of the same increment same decrement as in the updating process with co.cs. The updating process with b.s can be done repeatedly as with co.cs. After the end of each iteration of the updating process with b.s, p bs is added to f (D), and it is updated according to the updated value of h (D) because n (D) becomes zero after the first iteration of the updating process with b.s.
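The rank computation behind the back-substitution data can be sketched in Python (the paper reports using MATLAB). Everything below is an illustrative assumption: the function names are hypothetical, the field is taken to be a prime field GF(p), and the reading that a chunk becomes decodable once the reduced matrix has rank D − γ′ is an interpretation of the text rather than a statement from it.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix (list of integer rows) over GF(p), with p prime."""
    m = [[x % p for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        if rank == len(m):
            break
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)  # modular inverse via Fermat
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                factor = m[i][col]
                m[i] = [(a - factor * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def rank_after_row_removal(matrix, removed_rows, p):
    """Rank of the D x r matrix once the rows of recovered blocks are dropped."""
    removed = set(removed_rows)
    kept = [row for i, row in enumerate(matrix) if i not in removed]
    return rank_mod_p(kept, p)


# Toy coefficient matrix: D = 4 input blocks (rows), r = 2 coded blocks (columns).
C = [[1, 0], [0, 1], [1, 1], [0, 0]]
full_rank = rank_mod_p(C, 7)
reduced_rank = rank_after_row_removal(C, [0, 1], 7)
```

Repeating this over many random matrices of a given rank r (t) would give an empirical rank distribution for each γ′, which is the kind of lookup data the estimation relies on.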

Decoding Complexity
From the work in [17], encoding a block by applying RLNC to d k input blocks requires O (T · d k ) finite field operations, where T is the number of symbols per block. By applying Gaussian elimination for the decoding process, decoding a chunk with size d k requires O (d k ^2 + T · d k ) finite field operations per block on average. Since decoding complexity is more significant than encoding complexity, only decoding complexity is discussed in this paper.
With the applied decoding scheme, the decoding complexity is mainly governed by the mean length of ch.cs, φ̄. With OCC/CF, it is hard to estimate φ̄ since a successful b.s or co.cs does not depend on the selection of [D, γ] alone, but on the selection of [(µ 1 , γ 1 ) , (µ 2 , γ 2 ) , · · · , (µ K , γ K )]. However, this paper employs the OCC designed using ρ (r), i.e., the selection of [D, γ], to estimate φ̄ since p d ≥ p deff , and the obtained value of φ̄ might rely on the way of conducting the estimation. φ (φ) denotes the probability (fraction) that a ch.cs with length φ is successfully decoded. φ (φ) for 2 ≤ φ ≤ φ max can be estimated or collected along with the process of updating probabilities with co.cs, i.e., the increase in h (D) and in f (D) for a considered φ is the increase in φ (φ). In addition, . Hence, φ̄ can be obtained by: Therefore, while employing OCC/CF, each successful decoding would need O (φ̄^2 · D^2 + T · φ̄ · D) finite field operations per block. The actual cost would be lower since C c is an approximately sparse matrix, as shown in Figure 7.
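The two quantities discussed above can be evaluated with a short sketch. It is illustrative only: the function names and the toy chain-length fractions are hypothetical, the mean is taken as a normalized weighted average of chain lengths over decoded chains, and the operation count drops all constant factors hidden by the O(·) notation.

```python
def mean_chain_length(fractions):
    """Weighted mean of chain lengths; fractions maps length -> decoded share."""
    total = sum(fractions.values())
    if total == 0:
        raise ValueError("no decoded chains recorded")
    return sum(length * w for length, w in fractions.items()) / total


def decoding_ops_per_block(mean_length, D, T):
    """Order-of-magnitude operation count phi^2 * D^2 + T * phi * D."""
    return mean_length ** 2 * D ** 2 + T * mean_length * D


# Hypothetical fractions of decoded chains by length.
fractions = {1: 0.5, 2: 0.25, 3: 0.25}
phi_bar = mean_chain_length(fractions)
ops = decoding_ops_per_block(2, D=18, T=100)
```

The quadratic term in the chain length is what makes bounding φ by φ max worthwhile when D is already close to M.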

Obtaining the Estimated Performance
The process of updating probabilities with co.cs and b.s above can be done repeatedly and also alternately. Since there are µ innovative blocks per chunk, then chunk t with r (t) > µ can help the undecodable chunks to decode. This paper considers that applying the updating process with b.s first might make the value of n (r (t) ) where r (t) is close to D tend to zero early and then might make the other n (r (t) ) hardly tend to zero via the updating process. Therefore, this paper applies the updating process with co.cs first.
By taking the overlapping fashion of the applied OCC as the pseudo-reference, the updating process with co.cs is applied for one iteration, then the updating process with b.s is applied for φ max iterations. If N is the total number of original blocks, then there are approximately N ch = N/φ max chains of chunks with length φ max . One time of the updating process is defined as the updating process with co.cs for one iteration followed by the updating process with b.s for φ max iterations. Then, N ch times of the updating process are needed for the effect of the first ch.cs to reach the last ch.cs, and thus, 2 · N ch − 1 times are needed for this effect to return to the first chunk. In this work, one round of the updating process consists of 2 · N ch − 1 times of the updating process. In order to obtain the ultimate ρ d (r′), the updating process is repeated for a number of rounds until the increment of the obtained p d is lower than an assigned threshold t ; then, the updating process terminates, and the ultimate p d is obtained. The channel efficiency η is obtained by η = µ · p d . The computational complexity of the estimation depends on t : it is higher when t is smaller, but the accuracy might be higher.
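The schedule described above (co.cs once, then b.s for φ max iterations, repeated 2 · N ch − 1 times per round until p d stops improving) can be sketched as a driver loop. This is a sketch under stated assumptions, not the paper's code: the function name, the callback interface and the `max_rounds` safeguard are hypothetical, and the toy update rules below only stand in for the real probability updates.

```python
def run_estimation(update_cocs, update_bs, get_pd, n_ch, phi_max, tol,
                   max_rounds=1000):
    """Repeat rounds of updates until the gain in p_d falls below tol."""
    previous = get_pd()
    rounds = 0
    while rounds < max_rounds:
        for _ in range(2 * n_ch - 1):      # one round
            update_cocs()                  # one co.cs iteration
            for _ in range(phi_max):       # then phi_max b.s iterations
                update_bs()
        rounds += 1
        current = get_pd()
        if current - previous < tol:
            return current, rounds
        previous = current
    return previous, rounds


# Toy stand-in: p_d creeps towards a ceiling of 0.9 with each update.
state = {"pd": 0.5, "cocs": 0, "bs": 0}

def toy_cocs():
    state["pd"] += 0.10 * (0.9 - state["pd"])
    state["cocs"] += 1

def toy_bs():
    state["pd"] += 0.05 * (0.9 - state["pd"])
    state["bs"] += 1

p_d, rounds = run_estimation(toy_cocs, toy_bs, lambda: state["pd"],
                             n_ch=2, phi_max=3, tol=1e-3)
mu = 14
eta = mu * p_d   # channel efficiency as eta = mu * p_d
```

A smaller tolerance lengthens the loop, which mirrors the trade-off between estimation cost and accuracy noted in the text.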
For example, by taking K = L = 2, M = 10, φ max = 5, SNR 1,1 = SNR 2,2 = SNR 1,2 = 35 dB, SNR 2,1 = 15 dB, E 8 /7E 8 as NLC and by taking only the case that µ − γ > 0, the empirical rank distribution ρ (r), the estimated p d and the correspondent η are shown in Figure 8. The simulation result obtained with the same condition and by using the empirical rank distribution in Figure 8a is also shown in Figure 8b,c for comparison.
From Figure 8b, the estimation deviates strongly from the simulation result for the allocations (µ, γ) that provide low p d , e.g., lower than 0.9, and deviates only slightly for the allocations (µ, γ) that provide high p d . The error might be caused by the inaccuracy of the updated rank distribution or by the imperfection of the process of updating probabilities with a limited length of ch.cs. However, because performance with high p d is preferred in this paper, the applied estimation is acceptable.
On the other hand, from Figure 8c, the maximum η from the estimation, η max-est , is obtained by taking allocation (14,5). However, the maximum η from the simulation, η max-sim , is obtained by taking allocation (14,6). Both allocations have the same µ, but different output p d . From the work of [14,15], without limiting the length of ch.cs, larger γ with the same µ can provide higher p d . Due to the estimation deviation, the allocation to provide η max is not correctly given. However, during the application, the allocation providing η max-est can be switched to the other allocation with larger γ, but with the same µ to find out which allocation is more appropriate.
From Figure 8d, the estimatedφ is much larger thanφ from the simulation, since the process of updating probabilities with co.cs is executed first, which is different from the real fact that b.s should be conducted as soon as possible according to the decoding scheme described in Algorithm 4. Thus, φ (φ) is abnormally high for φ ≥ 2, especially, when γ is large. However, the obtained results ofφ with the same µ = D − γ, but with different set values (D, γ) from both estimation and simulation show that larger γ results in higherφ, hence higher decoding complexity. In addition,φ from the estimation somehow can serve asφ obtained in the worst case.

Examples of Allocations
This paper assumes that the fairness between sources is achieved if η 1 = η 2 = · · · = η K , where η k is the channel efficiency for source k. η k is defined as the ratio of the number of decoded blocks from source k to the number of time slots taken from the sources to the relays. If individual decoding is not considered, η k for all k depends only on p deff . In this case, the fairness is achieved by taking: By taking the data in Figure 8, Table 4 lists some allocations and their performances in p deff and η eff from estimation and simulation. In addition, φ̄ is the average value of φ, and it is counted when a ch.cs with length φ ≥ 1 is decoded. φ̄ represents the computational complexity of the decoding process. From Table 4, Allocation 1 can provide the fairness between sources, but it cannot provide the highest channel efficiency, while Allocations 2 and 3 can provide the highest channel efficiency by estimation and simulation, respectively. The interchanged Allocation 3, i.e., Allocation 4, shows the effect of an unsuitable selection of (µ 2 , γ 2 ), which has low p deff . Allocations 1-4 do not provide high p deff . On the other hand, Allocation 5 can provide high p deff , but not the highest channel efficiency. However, its decoding complexity is lower since φ̄ is smaller. If the difference between the provided channel efficiency and the highest channel efficiency is small, then this allocation can be applied instead when lower decoding complexity is required. Allocations 5-9 have the same µ, but different γ, which varies from 7 to 3. They show the effect of different values of γ on p deff and φ̄. From the result in Table 4, larger γ provides higher p deff , but higher decoding complexity.
On the other hand, the highest channel efficiency can be obtained by the precoding process at each source before employing OCC. However, this might cause additional decoding complexity and latency caused from re-ordering blocks after the decoding process. This paper assumes that there is no precoding overhead, i.e., the number of required received coded blocks is equal to the number of original blocks when the maximum channel efficiency is considered.

Impact of the Participation Factor of Each Source
From now on, this paper uses the term OCC for the applied OCC that uses the allocation providing the highest channel efficiency. In addition, OCC′ refers to the applied OCC that uses the allocation providing the highest channel efficiency under the condition p d ≥ p thr or p k ≥ p thr . OCC/CF and OCC′/CF refer to the transmission schemes employing OCC and OCC′, respectively, before NLC in a multi-source multi-relay network. The decodability condition for OCC′/CF is p deff ≥ p thr .
η max denotes the maximum channel efficiency that can be provided by the applied OCC designed using ρ (r). If individually decoding is not considered, from (11), the upper bound of η eff is η max .
As mentioned above, η̄ = ρ̄ M is the channel capacity or the upper bound of the channel efficiency for the transmission scheme employing OCC/CF from the sources to the destination. By taking K = L = 2, M = 10, φ max = 5, E 8 /7E 8 as NLC, p thr = 0.97, SNR 1,1 = SNR 2,2 = 35 dB, SNR 1,2 ∈ {5, 20, 35} dB and SNR 2,1 ∈ {0, 5, 10, 15, 20, 25, 30, 35} dB, the performance of OCC/CF and OCC′/CF in decodability and channel efficiency from estimation (with postfix "-est") and simulation (with postfix "-sim") is shown in Figure 9. From Figure 9, the estimated channel efficiencies of OCC/CF and OCC′/CF are around 97.95% and 98.34%, respectively, of the channel efficiencies obtained from simulation on average. In addition, the gap between the channel efficiency of OCC and η̄ represents the design overhead of the applied OCC. From the simulation result, η max is around 87.71% of η̄ on average. The design overhead of OCC/CF should be the aggregate of the design overhead using ρ (r) and λ k (θ k ) for all k. The channel efficiency of OCC/CF and OCC′/CF is close to η max when SNR 1,2 or SNR 2,1 is close to (as high as) SNR 1,1 or SNR 2,2 . This is because, in this case, the participation factor of each source is dense around θ k = d k . When λ k (d k ) is very dense, most of the allocations (µ k , γ k ) can provide p k close to one. Hence, the design of OCC/CF or OCC′/CF can depend only on ρ (r).
In the case that SNR 1,2 and SNR 2,1 are low, the received combined coded blocks are almost plain coded blocks, i.e., β (i) lm are almost in the form of unit vectors for all l and m. The design overheads of OCC/CF and OCC′/CF in this case should be the aggregate of the design overhead of OCCs using λ k (θ k ) for all k. Since the participation factor of each source is not dense around θ k = d k , the design overheads of OCC and OCC′ might be high, and higher than those of the prior case.
In order to make λ k (d k ) denser, in addition to forcing to obtain β (i) lm without zero elements at each relay, improving the diversity of the received codeword combinations by increasing the number of participating relays or equipping more antennas at relays as in work of [8,9] might be a solution.

Reference Schemes
Because the original work in [7] did not consider the retransmission, a feedback-based transmission scheme is used as the reference scheme instead to evaluate the performance of OCC/CF and OCC /CF, and it is called CF with protocol overhead (CF/PO) in this paper. For each round of CF/PO, each source applies NLC without OCC and needs feedback from the destination after sending a block to know which blocks have been decoded and which blocks need to be retransmitted. Feedback is forwarded by relays via an orthogonal channel. There is no decoding delay constraint, i.e., a source can transmit a new original block although the previous blocks of the other sources have not been decoded [6].
The protocol overhead (the transmission time of feedback and the loss of feedback) are taken into account. The feedback reception success rate is denoted by p f , and the ratio of the transmission time of feedback to a slot time is denoted by τ f . p f is obtained by conducting a simulation where NLC is applied on a link from a source to a relay with the highest SNR via an orthogonal channel. Because it is hard to track the performance of CF/PO with varying p f , only the performance with different values of τ f is considered.
In addition to the channel efficiency, the transmission efficiency ε t is also considered to evaluate the performances of OCC/CF and OCC /CF with a scheme called RLNC with orthogonal channel (RLNC/OC) where RLNC is applied before NLC at each source for the transmissions from the sources to the relays via an orthogonal channel. For each source, original blocks are grouped into disjoint chunks with M blocks per chunk. RLNC is applied within each chunk, and a feedback (ACK) is needed when a transmitted chunk is decodable. The protocol overhead is also considered, and it is assumed that feedback cannot be received instantaneously by the source to stop transmitting [32]. In this paper, ε t is defined as the ratio of the total number of decoded blocks to the total number of transmissions taken between the sources and the relays, while the transmission of feedback is also taken into account. In this paper, τ f is assumed as the ratio of the length of feedback data to the length of payload data per block. The performance in transmission efficiency reflects the energy consumption of each scheme.
On the other hand, a transmission scheme employing LT code [20,30] at each source before NLC, called fountain code with CF (FC/CF), is also used as a reference scheme. The considered parameters of LT code are c fc and δ fc . In the case of single flow transmission with N k original blocks, the receiver can recover all blocks with probability 1 − δ fc if receiving N k + 2 · log e (S fc /δ fc ) · S fc coded blocks, where S fc ≡ c fc · log e (N k /δ fc ) · √ N k . In the CF/PO and FC/CF schemes, the decoding process is done for each source block transmission. The destination tries to decode if there are K linearly independent codeword combinations. If undecodable, the destination stores the undecodable blocks (after decoding using (2)) and waits for the next received codeword combinations.
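The LT-code reception requirement quoted above can be evaluated directly. The sketch below is illustrative: the function name is hypothetical, and the parameter values are the ones listed later in the numerical setup (N k = 1000, c fc = 0.01, δ fc = 0.01), used here only as an example.

```python
from math import log, sqrt


def lt_required_blocks(n_k, c_fc, delta_fc):
    """Coded blocks needed to recover n_k blocks with probability 1 - delta_fc.

    Implements N_k + 2 * ln(S / delta) * S with S = c * ln(N_k / delta) * sqrt(N_k).
    """
    s_fc = c_fc * log(n_k / delta_fc) * sqrt(n_k)
    return n_k + 2.0 * log(s_fc / delta_fc) * s_fc, s_fc


required, s_fc = lt_required_blocks(1000, 0.01, 0.01)
overhead = required / 1000 - 1.0   # relative reception overhead
```

For these parameters the overhead is a few percent of N k in the single-flow case; the text notes that in the multi-source setting the number actually needed is larger, because some coded blocks cannot be extracted from the received codeword combinations.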
Since feedback is not needed in OCC/CF, OCC′/CF and FC/CF, their transmission efficiency is 1/K times their channel efficiency. If N dec , N fb and N ts denote the total number of decoded blocks, the total number of feedback messages and the total number of time slots taken excluding the transmission time of feedback, respectively, then the channel efficiencies and the transmission efficiencies of CF/PO and RLNC/OC can be written as below.

Numerical Results and Discussion
This paper considers two scenarios to observe the performance of the transmission schemes employing OCC/CF and OCC′/CF in comparison with the reference schemes. The first scenario investigates the performance in a two-source two-relay network with an asymmetric channel state at relays, i.e., the average SNRs of the links from the sources to a relay might be different, as used in Section 7.2. The second scenario considers a varying number of relays with a symmetric channel state, i.e., the average SNRs of all links from all sources to a relay are the same, and the number of sources is fixed to two. The numerical results are obtained by conducting simulations taking E 8 /7E 8 as NLC, M = 10, N k = 1000 for all k, c fc = 0.01, δ fc = 0.01, τ f = 0.05 for RLNC/OC and τ f ∈ {0, 0.05} for CF/PO. Each simulation is repeated 100 times. Each simulation terminates when at least one source k has fewer than µ k remaining innovative blocks for OCC/CF, OCC′/CF and RLNC/OC, and when all original blocks of at least one source are recovered for CF/PO and FC/CF.
The performances of OCC/CF and OCC /CF in the first scenario are the same as in Figure 9. The second scenario takes SNR 1,1 = SNR 2,2 = SNR 1,2 = SNR 2,1 ∈ {30, 35, 40} dB with correspondent From Figure 10, OCC/CF and OCC /CF increased channel efficiency by 72.98% and 63.28%, respectively, on average if comparing with RLNC/OC. For τ f = 0.05, OCC/CF had a 4.71% increment of channel efficiency on average if comparing with CF/PO. However, OCC/CF and OCC /CF provided higher channel efficiency than CF/PO only when SNR 1,2 or SNR 2,1 was similarly high as SNR 1,1 or SNR 2,2 , i.e., when the participation factor of source k was dense around θ k = d k for all k. OCC/CF and OCC /CF provided the increment of channel efficiency up to 16.41% and 13.41% if comparing with CF/PO for τ f = 0.05. On the other hand, by comparing with FC/CF, OCC/CF had higher channel efficiency than FC/CF in almost all cases. The increment was around 9.36% on average. For OCC /CF, because of the higher design overhead, it sometimes could not provide higher channel efficiency than FC/CF. There was only a 2.91% increment of channel efficiency on average if comparing with FC/CF. The issue of employing fountain code in this scenario was that the number of coded blocks to ensure the desired decodability was larger than the expected number, because some coded blocks from a source could not be extracted from the codeword combinations forwarded from the relays during each source block transmission.
On the other hand, the transmission efficiency of RLNC/OC was higher than the other schemes in all cases except CF/PO for τ f = 0. Thus, OCC/CF and OCC /CF had less of a chance to perform better than CF/PO when τ f was small. The transmission efficiency of OCC/CF and OCC /CF was 86.85% and 81.98% on average, respectively, of the transmission efficiency of RLNC/OC. It seems the performance of the proposed schemes was a trade-off between the increment of channel efficiency and the decrement of transmission efficiency if comparing with an orthogonal channel transmission scheme, e.g., RLNC/OC for this scenario. In order to improve the transmission efficiency of the proposed schemes, increasing the diversity of codeword combinations at relays, i.e., increasing the number of participating relays, was considered and discussed in Scenario 2.
From Figure 11, the channel efficiencies of OCC/CF and OCC /CF increased when SNR k,l increased, since the channel capacityρ also increased with SNR k,l . OCC/CF and OCC /CF performed similarly at high SNR k,l , and they had a chance to perform better than CF/PO for τ f = 0 in the case of four relays at SNR k,l = 40 dB since the loss of feedback also impacted the performance of CF/PO. If comparing with CF/PO, OCC/CF and OCC /CF increased channel efficiency up to 3.93% and 4.17%, respectively, for τ f = 0, and up to 16.18% and 14.59%, respectively, for τ f = 0.05. If comparing with RLNC/OC, OCC/CF and OCC /CF increased channel efficiency up to 140.32% and 140.88%, respectively. OCC /CF sometimes performed slightly better than OCC/CF, because of the estimation deviation. In addition, the channel efficiency of OCC/CF and OCC /CF increased faster than that of FC/CF when SNR k,l increased or the number of relays increased. This is because the overhead of fountain code was the same if the parameters c fc and δ fc were fixed. In addition, it might have been because of the condition of stopping the simulation, i.e., all original blocks of a source were recovered, and those of the other source had not been all recovered, the channel efficiency of FC/CF did not increase when SNR k,l increases or the number of relays increased, as shown in Figure 11c.
Regarding the transmission efficiency in Figure 11, OCC/CF and OCC /CF had higher transmission efficiency than RLNC/OC when the number of relays was higher than the number of sources. However, employing more relays might increase the complexity of the network, e.g., how to select which relays join and how to achieve time synchronization at all relays.
On the other hand, the performance of OCC/CF and OCC /CF could be improved, especially at low SNR 1,2 and low SNR 2,1 in the first scenario, by decoding individually, but this might increase the complexity of the decoding process if K is large.

Conclusions
This paper proposed a design of OCC that is applied before NLC in multi-source multi-relay networks, called OCC/CF. A decodability condition was provided for the design. The design used an OCC with a contiguously overlapping, but non-rounded-end, fashion. The decoding scheme and the estimation of the designed OCC/CF were also provided. The estimation is carried out for each allocation, i.e., the number of innovative blocks per chunk and the number of blocks taken from the previous chunk, to find the allocation that provides the desired performance, such as the highest channel efficiency, the preferred decodability, or an acceptable decoding complexity. The estimation deviation is low when the decodability is sufficiently high. Since the designed OCC/CF has a limited number of chunks, the design overhead is high when compared with the channel capacity. From the numerical results, the advantage of OCC/CF over a feedback-based transmission scheme depends on the level of protocol overhead, i.e., the transmission time and size of the feedback and the feedback loss rate. The performance of OCC/CF, especially its transmission efficiency relative to orthogonal channel transmission, can be improved by increasing the number of relays. Future work will consider decoding individually and the cooperation between feedback and OCC/CF for higher performance.