Distributed Rateless Codes with Unequal Error Protection Property for Space Information Networks

In this paper, we propose a novel distributed unequal error protection (UEP) rateless coding scheme (DURC) for space information networks (SIN). We consider the multimedia data transmissions in a dual-hop SIN communication scenario, where multiple disjoint source nodes need to transmit their UEP rateless coded data to a destination via a dynamic relay. We formulate the optimization problems to provide optimal degree distributions on the direct links and the dynamic relay links to satisfy the required error protection levels. The optimization methods are based on the And–Or tree analysis and can be solved by multi-objective programming. In addition, we evaluate the performance of the optimal DURC scheme, and simulation results show that the proposed DURC scheme can effectively provide UEP property under a variety of error requirements.


Introduction
With the development of space exploration and the evolution of future space information networks (SIN), erasure-correcting codes have attracted considerable research interest as a way to enhance information transmission capacity in extremely challenging space communication environments [1], which are characterized by frequent and lengthy link disruptions, high data loss, and long link delays [2]. In 2014, the Consultative Committee for Space Data Systems (CCSDS) released an experimental specification of long erasure correcting (LEC) codes for near-Earth and deep-space communications [3], in which near-optimum fixed-rate Irregular-Repeat-Accumulate (IRA) codes are proposed. In [4], a joint design of the CCSDS file delivery protocol (CFDP) and IRA codes is discussed; such erasure codes can be decoded efficiently with the maximum-likelihood algorithm [5], and the code rate can be selected from several values [6,7].
Rateless codes (RC), also termed fountain codes, are capacity-achieving loss-resilient codes for erasure channels. Luby-Transform (LT) codes with a well-designed robust Soliton degree distribution (RSD) are the first practical realization of fountain codes [8]. They can recover the original $k$ information (input) symbols from any $N = k + O(\sqrt{k}\,\ln^2(k/\theta))$ received coded (output) symbols with probability $1 - \theta$ at a decoding cost of $O(k \ln(k/\theta))$ operations, where $\theta$ is the allowable probability of failing to recover the original message after $N$ coded symbols have been received. To further reduce the decoding complexity and to address the high error floor of LT codes, Shokrollahi proposed Raptor codes, which concatenate a high-rate pre-code with an LT code that uses a weakened robust Soliton degree distribution (WRSD) [9]. In [10,11], LT codes are incorporated into the CFDP implementation; both the Packets Interleaving CCSDS File Delivery Protocol and the Loss-Tolerant File Delivery Protocol can resist channel erasures and further reduce the overhead of CFDP.
Moreover, a line-of-sight (LOS) link is often unavailable for space exploration rovers to communicate with the base station directly [12]. The space exploration rovers and relay satellites can form a typical multi-access relaying SIN, whose multi-hop links are dynamic and time-varying [13]. For example, the data from disjoint rovers/explorers need to be collected at the base station through a periodically available relaying satellite [14]. Therefore, rateless codes have been considered in increasingly complicated SIN to provide an efficient distributed transmission scheme [15].
The first distributed LT (DLT) codes are proposed in [16]. The degree distribution for the distributed sources is designed by decomposing the standard RSD, which suits a pre-fixed number of source nodes communicating with a single destination via a relay. In [17], the selective distributed LT (SDLT) code is proposed by applying the And-Or tree analysis and linear programming, which can find an optimal combination at the relay node for an arbitrary number of sources. In [18], a soliton-like rateless coding (SLRC) scheme is designed for a Y-network; the SLRC scheme provides degree distributions that generate LT-like output symbols at a relay with a simple network coding protocol. Monte Carlo simulations showed that the SLRC outperforms the DLT and SDLT. In [19], an improved SLRC approach is proposed for the relay buffer-limited situation to ensure more effective decoding. In [20], the SIN is considered in a scenario where direct links and relay links coexist, and a degree distribution optimization scheme is proposed based on the And-Or tree analysis. In this paper, rateless network coding (NC) for a dynamic relay topology, building on [20], is investigated in a preliminary way to increase the system throughput.
Furthermore, there are several scenarios where conventional rateless codes cannot perform optimally because they lack unequal error protection (UEP). For example, when transmitting the data blocks of discrete wavelet-transform encoded images in SIN, the lower-frequency parts of the data blocks are more important than the higher-frequency parts. Thus, it is more desirable to use rateless codes with UEP to protect the important parts. The first scheme of rateless codes with the UEP property is proposed in [21], in which message symbols are allocated two different weights according to their importance levels. In [22], expanding window fountain codes are proposed to generate output symbols only from message symbols within a certain window. Two overlapping and expanding windows are pre-designed such that the smaller window contains the important message symbols and the larger window contains all the symbols. A distributed rateless code with the UEP property has been proposed in [23] for a Y-network; it can provide different data importance levels with different error probabilities for two sources. The generalized UEP rateless code (GURC) for distributed relay networks is proposed in [24], and the relationship between the UEP property and the decoding error rate (DER) of LT codes is obtained by the And-Or tree analysis [25]. However, the UEP property for multiple source nodes and their original data in a dynamic SIN topology has not yet been studied sufficiently.
Consider a relay that has its own orbit around the mission planet, so that the links between the landed rovers and the relay are only periodically available. Since the space explorers have limited energy, broadcasting is prohibited. Thus, the rovers cannot communicate with the relay and the destination simultaneously. In this paper, to improve the throughput of multimedia services in future SIN communications, we propose a novel distributed UEP rateless coding (DURC) scheme for multimedia data transmission in a multi-access relaying SIN, which can achieve a lower DER under pre-selected UEP parameters. Specifically, the RC degree distributions and network coding rules are designed to match the durations of the link access conditions.
The rest of the paper is organized as follows. In Section 2, we present the system model and our DURC scheme. In Section 3, we derive the asymptotic performance based on the And-Or tree analysis and optimize the degree distributions and network coding rules by using multi-objective programming. In Section 4, we employ NSGA-II to design DURC codes and evaluate the performance of the DURC under different channel conditions. Finally, we conclude the paper in Section 5.

System Model
We consider a communication scenario in SIN as shown in Figure 1: two disjoint exploration rovers host sources $s_1$ and $s_2$, with data blocks of $k_1$ and $k_2$ input symbols, respectively. Let $S_1$ and $S_2$ denote the sets of input symbols of $s_1$ and $s_2$, respectively. $S_1$ and $S_2$ transmit to a base station $D$ with the assistance of a periodically moving relay satellite/orbiter $R$. Due to its periodic motion, $R$ has limited access time, and the source nodes know the access period in the SIN. Without loss of generality, the output symbols in Figure 1 are transmitted over binary erasure channels (BEC), and the erasure probabilities between the four nodes are denoted by $\varepsilon_{ij}$, where $i \in \{1, 2, R\}$ and $j \in \{R, D\}$. Note that the direct links in SIN are of much worse quality than the relay links, i.e., $\{\varepsilon_{1D}, \varepsilon_{2D}\} \gg \{\varepsilon_{1R}, \varepsilon_{2R}, \varepsilon_{RD}\}$. We define the relay links $S_1 - R$, $S_2 - R$, and $R - D$ (the Y-network in Figure 1) as primary links, and the direct links from the sources to the destination, $S_1 - D$ and $S_2 - D$, as secondary links. We define each period of the relay in the SIN as a transmission session, and each transmission session is divided into two phases.
In Phase 1, $R$ is invisible to $S_1$, $S_2$, and $D$. $S_1$ and $S_2$ perform LT coding over their information sets $s_1$ and $s_2$ with degree distributions $\Omega_1(x)$ and $\Omega_2(x)$, respectively, and transmit the coded symbols to the destination $D$ on the secondary links $S_1 - D$ and $S_2 - D$. In Phase 2, once the primary links $S_1 - R$ and $S_2 - R$ become available, the secondary links are closed immediately to save energy. $S_1$ and $S_2$ then perform LT coding over their information sets with degree distributions $\Psi_1(x)$ and $\Psi_2(x)$, respectively, and transmit the coded symbols to the relay $R$; rateless NC is performed at $R$, and the resulting symbols are transmitted to $D$ on $R - D$. The connections and durations are illustrated in Figure 2. The numbers of coded symbols transmitted on each link in Phase 1 and Phase 2 are denoted by $N_1$ and $N_2$, respectively. For this dynamic SIN network model, the details of our DURC scheme are as follows:
• Initialization: Suppose that the information symbol lengths of $s_1$ and $s_2$ are $k_1$ and $k_2$, respectively. The $k_m$ ($m \in \{1, 2\}$) symbols are divided into $n$ subsets $I_{m1}, I_{m2}, \ldots, I_{mn}$ according to their importance levels, with $k_m = \sum_{i=1}^{n} |I_{mi}|$. The fraction of the $k_m$ information symbols that belong to $I_{mi}$ is $\pi_{mi}$, with $\sum_{i=1}^{n} \pi_{mi} = 1$. $S_m$ selects symbols of the $i$-th importance level, i.e., from subset $I_{mi}$, with probability $w_{mi}$, which is called the symbol-selection weight [21]. $S_m$ employs LT-coding degree distributions of the polynomial form $\Omega_m(x) = \sum_{d_m=1}^{D_m} \Omega_{m,d_m} x^{d_m}$ (and analogously $\Psi_m(x)$), where $D_m$ denotes a pre-selected maximum value of the degree $d_m$.
• Phase 1: $R$ is invisible, and $S_1$ and $S_2$ generate distributed rateless coded symbols from their $k_m$ information symbols using the LT-coding degree distribution $\Omega_m(x)$, and transmit $N_1$ coded symbols to $D$ over the secondary links. In the encoding process at $S_m$, a degree $d_m$ is randomly selected with probability $\Omega_{m,d_m}$ according to the degree distribution $\Omega_m(x)$; then $d_m$ information symbols are selected uniformly at random and bitwise XORed to form the coded symbol.
• Phase 2.1: $R$ is visible, and $S_1$ and $S_2$ generate distributed UEP rateless coded symbols from their information symbols using the LT-coding degree distribution $\Psi_m(x)$, and transmit $N_2$ coded symbols to $R$ over the primary links. In the encoding process at $S_m$, a degree $d_m$ is randomly selected with probability $\Psi_{m,d_m}$ according to the degree distribution $\Psi_m(x)$; then $d_m$ information symbols are selected, each symbol in $I_{mi}$ being chosen with probability $w_{mi}/(\pi_{mi} k_m)$, and bitwise XORed to form the coded symbol.
• Phase 2.2: The coded symbols are transmitted from $S_1$ and $S_2$ to the relay $R$. Based on the network coding rule $P = \{p_1, p_2, p_3\}$ with $\sum_{i=1}^{3} p_i = 1$, $R$ generates three types of network coded symbols and forwards them to the destination $D$: $R$ forwards $S_1$'s output symbol directly with probability $p_1$, forwards $S_2$'s output symbol directly with probability $p_2$, and XORs the two incoming symbols and forwards the result to $D$ with probability $p_3$.
• Decoding: After receiving enough coded symbols from $S_1$, $S_2$, and $R$, joint decoding is performed at $D$ to recover the information symbols of $S_1$ and $S_2$.
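The encoding and relaying steps listed above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function names, the dictionary representation of a degree distribution (degree mapped to probability), the integer representation of symbols, and the redraw-on-duplicate rule used to obtain distinct neighbours in Phase 2.1 are all hypothetical choices and are not specified in the text.

```python
import random

def sample_degree(dist, rng):
    """Draw a degree d with probability dist[d]."""
    degrees = list(dist)
    return rng.choices(degrees, weights=[dist[d] for d in degrees], k=1)[0]

def xor_of(data, idx):
    """XOR of the data symbols at the given indices."""
    value = 0
    for i in idx:
        value ^= data[i]
    return value

def encode_uniform(data, omega, rng):
    """Phase 1: XOR d distinct symbols chosen uniformly at random."""
    d = min(sample_degree(omega, rng), len(data))
    idx = frozenset(rng.sample(range(len(data)), d))
    return idx, xor_of(data, idx)

def encode_weighted(data, classes, weights, psi, rng):
    """Phase 2.1: pick importance class i with probability w_i,
    then a symbol uniformly inside that (non-empty) class."""
    reachable = sum(len(c) for c, w in zip(classes, weights) if w > 0)
    d = min(sample_degree(psi, rng), reachable)
    idx = set()
    while len(idx) < d:  # assumption: redraw on duplicates
        cls = rng.choices(range(len(classes)), weights=weights, k=1)[0]
        idx.add(rng.choice(classes[cls]))
    return frozenset(idx), xor_of(data, idx)

def relay_forward(sym1, sym2, p, rng):
    """Phase 2.2: forward S1's symbol (p1), S2's symbol (p2),
    or the XOR of both (p3). Returns (neighbours in S1,
    neighbours in S2, value)."""
    (idx1, v1), (idx2, v2) = sym1, sym2
    choice = rng.choices((1, 2, 3), weights=p, k=1)[0]
    if choice == 1:
        return idx1, frozenset(), v1
    if choice == 2:
        return frozenset(), idx2, v2
    return idx1, idx2, v1 ^ v2
```

A relayed symbol keeps the neighbour sets of both sources, so a joint decoder at $D$ can treat it as one check equation over the union of the two information sets.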
Considering the erasure probabilities of the links in Phase 1 and Phase 2, the expected numbers of coded symbols successfully received at $D$ can be expressed as in (1), where $n_1$ and $n_2$ are the numbers of coded symbols received on the $S_1 - D$ and $S_2 - D$ links in Phase 1, respectively, and $n_{NC}$ is the number of output coded symbols received on the link $R - D$ in Phase 2.

Analysis of RC and NC Degree Distributions by the And-Or Tree Technique
In one transmission session, $D$ receives multi-path coded blocks in Phase 1 and Phase 2 and performs joint decoding via the belief propagation (BP) algorithm to restore the sources' original symbols. In this paper, we assume that the original symbols of each source node are divided into more important bits (MIB) and less important bits (LIB), i.e., $n = 2$ in the DURC scheme.
Based on the And-Or tree analysis [26], let $\delta(x)$ and $\psi(x)$ denote the edge distributions of the input nodes and the output nodes of a rateless code, respectively. The DER after $l$ BP decoding iterations is expressed as $y_l = \delta(1 - \psi(1 - y_{l-1}))$, where $y_0 = 1$. When $k$ approaches infinity, the edge distribution of the input nodes becomes a Poisson distribution, and the DER becomes $y_l = \exp(-\gamma \Psi'(1 - y_{l-1}))$, where $\Psi'(x)$ is the derivative of the output degree distribution and $\gamma$ is the decoding overhead, defined as the ratio of the number of coded symbols received by the destination decoder to the number of information symbols. Thus, the decoding performance is determined only by $\gamma$ and $\Psi(x)$.
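The asymptotic recursion $y_l = \exp(-\gamma \Psi'(1 - y_{l-1}))$ is straightforward to evaluate numerically. The following Python sketch iterates it for a degree distribution given as a dictionary from degree to coefficient; the function name, the dictionary representation, and the fixed iteration count are illustrative assumptions, not part of the original analysis.

```python
import math

def and_or_der(psi, gamma, iterations=200):
    """Iterate y_l = exp(-gamma * Psi'(1 - y_{l-1})) starting from y_0 = 1.

    psi maps each degree d to its coefficient Psi_d; gamma is the
    decoding overhead (received coded symbols per information symbol).
    """
    def d_psi(x):
        # Derivative of Psi(x) = sum_d Psi_d * x^d
        return sum(d * c * x ** (d - 1) for d, c in psi.items())

    y = 1.0
    for _ in range(iterations):
        y = math.exp(-gamma * d_psi(1.0 - y))
    return y
```

With $y_0 = 1$ the first step depends only on the degree-1 coefficient, so a distribution without degree-1 symbols stays at $y_l = 1$: BP decoding cannot start. This is one way to see why low-degree output symbols matter in the later optimization.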
To analyze the decoding performance of our DURC scheme, we derive the relationship between the degree distributions $\Omega_m(x)$ and $\Psi_m(x)$ and the network coding rule $P$. We consider the And-Or tree shown in Figure 3, in which the received coded symbols are divided into five groups. The first two groups are received on the secondary links $S_1 - D$ and $S_2 - D$ and are termed $C_1'$ and $C_2'$. The other three groups are received on the primary links and are termed $C_1$, $C_2$, and $C_3$. $C_1$ and $C_2$ are the coded blocks forwarded by $R$ with probabilities $p_1$ and $p_2$, respectively, and $C_3$ is the coded symbol transmitted from $R$ after the XOR operation with probability $p_3$. $C_1'$ and $C_2'$ are generated with degree distributions $\Omega_1(x)$ and $\Omega_2(x)$, respectively. $C_1$ and $C_2$ are generated with degree distributions $\Psi_1(x)$ and $\Psi_2(x)$, respectively. $C_3$ is generated with the convolution degree distribution $\Psi_1(x) \times \Psi_2(x)$.
The input nodes can be divided into four groups, termed $X_{11}$, $X_{12}$, $X_{21}$, and $X_{22}$. $X_{11}$ and $X_{12}$ are the MIB and LIB of $S_1$, respectively. Similarly, $X_{21}$ and $X_{22}$ are the MIB and LIB of $S_2$, respectively. Thus, we construct the And-Or tree $T_{l,11}$ of depth $2l$ with root $X_{11}$, as shown in Figure 3. Similarly, we can construct the And-Or trees $T_{l,12}$, $T_{l,21}$, and $T_{l,22}$ with roots $X_{12}$, $X_{21}$, and $X_{22}$ by the same method.
Theorem 1. Let $y_{l,1,n}$ (or $y_{l,2,n}$) be the probability that the root of $S_1$ (or $S_2$) evaluates to 0, i.e., that an input node of importance level $n$ is not recovered after the $l$-th BP decoding iteration. Then $y_{l,1,n}$ and $y_{l,2,n}$ are given by (2) and (3), respectively, where $y_{0,1,n} = y_{0,2,n} = 1$.

Proof of Theorem 1. In (2) and (3), the first product term corresponds to $y_l$ of Phase 1, and the second one to $y_l$ of Phase 2. For each importance level of the information symbols in a single source node, the input edge distribution and the average degree are different. Thus, the parameters used in (2) and (3) are defined as follows. The input edge distribution of the primary link is characterized by its input average degree and by the output average degree $\mu_m = \sum_{d_m=1}^{D_m} d_m \Psi_{m,d_m}$. The input edge distribution of the secondary link is expressed as $\delta_m(x) = \exp(\alpha_m (x - 1))$, where $\alpha_m = \lambda_m n_m / k_m$, and the output edge distribution of the secondary link is expressed as $\lambda_m(x) = \Omega_m'(x)/\Omega_m'(1)$. $\psi_{1,i}$ is defined as the probability that $C_1$ has $i$ children of input nodes, and $\psi_{2,i}$ is defined as the probability that $C_2$ has $i$ children of input nodes. Moreover, $\psi_{1,i}$ also refers to the probability that $C_3$ has $i$ children of $X_1$ (including $X_{11}$ and $X_{12}$), and $\psi_{2,i}$ refers to the probability that $C_3$ has $i$ children of $X_2$ (including $X_{21}$ and $X_{22}$). Every $C_1$ only has children of $X_1$, and the probability of any $X_1$ evaluating to 1 is $1 - \sum_{n=1}^{2} w_{1,n} y_{l-1,1,n}$. Thus, if a $C_1$ node has $i$ children of $X_1$, it evaluates to 1 only when all of them do, which happens with probability $\left(1 - \sum_{n=1}^{2} w_{1,n} y_{l-1,1,n}\right)^{i}$. In the same way, if a $C_3$ node has $i$ children of $X_1$ and $j$ children of $X_2$, the probability of $C_3$ evaluating to 1 is $\left(1 - \sum_{n=1}^{2} w_{1,n} y_{l-1,1,n}\right)^{i} \left(1 - \sum_{n=1}^{2} w_{2,n} y_{l-1,2,n}\right)^{j}$. In the And-Or tree $T_{l,11}$, $P_1$ and $P_3$ are the proportions of $X_1$ connected with $C_1' + C_1$ (the secondary-link and primary-link groups of $S_1$) and with $C_3$, respectively. $P_2$ and $P_4$ are the proportions of $X_2$ connected with $C_2' + C_2$ and with $C_3$, respectively. The relationships are expressed as (8). When $n_1$, $n_2$, and $n_{NC}$ are given, the DER $y_l$ in (2) and (3) decreases monotonically with $l$ and converges to a fixed value, which can be regarded as the final DER.
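To illustrate how symbol-selection weights separate the DER of different importance levels, the sketch below evaluates a simplified single-source, single-link recursion in the spirit of the weighted scheme of [21]. It is not the two-source recursion of Theorem 1. Assuming that each symbol of level $j$ is selected with probability $w_j/(\pi_j k)$, the input degree of level $j$ is asymptotically Poisson with mean $\gamma \mu w_j/\pi_j$, which gives $y_{l,j} = \exp\left(-\gamma \frac{w_j}{\pi_j} \Psi'\left(1 - \sum_i w_i y_{l-1,i}\right)\right)$. The function name and argument layout are hypothetical.

```python
import math

def uep_and_or(psi, gamma, pi, w, iterations=300):
    """Per-level DER for one source with weighted symbol selection.

    psi   : dict degree -> coefficient of the output degree distribution
    gamma : overhead (received coded symbols / information symbols)
    pi    : fraction of information symbols in each importance level
    w     : symbol-selection weight of each level (sums to 1)
    """
    def d_psi(x):
        # Derivative of Psi(x) = sum_d Psi_d * x^d
        return sum(d * c * x ** (d - 1) for d, c in psi.items())

    y = [1.0] * len(pi)
    for _ in range(iterations):
        # Probability that a randomly chosen child of an output node is recovered
        q = 1.0 - sum(wi * yi for wi, yi in zip(w, y))
        g = d_psi(q)
        y = [math.exp(-gamma * (wi / pii) * g) for wi, pii in zip(w, pi)]
    return y
```

For example, with $\pi = (0.5, 0.5)$ and $w = (0.7, 0.3)$ the first level always obtains the lower DER, while $w = \pi$ reduces to the equal-error-protection recursion.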

Analysis of System Throughput
The system throughput is defined as the ratio of the total number of recovered information symbols $k$ to the total number of received output symbols $n$ in one transmission session period. Since $y_l$ is the asymptotic DER, there exists a unique ideal throughput upper bound for a given system model that is related to $y_l$. For the LT code, the ideal DER tends to 0 when $\gamma \geq 1$, which means that the upper bound is affected only by the overhead $\gamma$. In this way, we can derive the throughput upper bound for a typical BEC model as shown in Figure 1, which is given by (9). In (9), $\varepsilon$ is the channel erasure probability, which affects the overhead $\gamma$. Moreover, by considering the periodic motion of the relay $R$ in the dynamic SIN scenario, we can derive the throughput upper bound of the system model in this paper, which is given by (10).

Optimizations of Rateless Coding Scheme and Network Coding Rule
To optimize the system performance, we construct a set of decision variables consisting of the degree distributions, symbol-selection weights, and network coding rule in (2) and (3), $Q = (\Omega_m, \Psi_{m,1}, \Psi_{m,2}, \ldots, \Psi_{m,d_m}, w_{m,n}, P)$, where $\Psi_{m,d_m}$ denotes a coefficient of the degree distribution, $w_{m,n}$ denotes the symbol-selection weight of each importance level, and $P$ is the network coding rule. Since the channel state information of the links is unknown, we cannot obtain the exact number of coded symbols received at $D$ in each phase. Hence, a sub-optimal $Q$ is sought that minimizes the BP DER at the destination $D$. The optimizations are given as follows.
We first consider the secondary links in our scenario. The erasure probability $\varepsilon_{1D}$ or $\varepsilon_{2D}$ is so high that the number of coded symbols received at the destination $D$ may be smaller than the number of information symbols, i.e., $n_1 < k_1$ and $n_2 < k_2$. Thus, the secondary links serve as a supplement that enhances the decoding performance of the whole transmission session: the optimization of $\Omega_1(x)$ or $\Omega_2(x)$ aims to guarantee that the received coded symbols can recover part of the information symbols, rather than to recover the complete original information symbol sets or to provide UEP. Similar formulations can be found in [25] and [27]. The problem of minimizing the decoding error probability is formulated as problem (11).
Without exact knowledge of $n_1$, the values of $\alpha_1$ for different channel erasure probabilities are unknown. Problem (11) is thus a constrained nonlinear optimization problem and is generally non-convex. We therefore adopt a simplified approach that considers only the coded symbols that are most important to the LT BP decoder. In each BP decoding iteration, degree-1 and degree-2 symbols are the most important, as they help reduce the number of edges connected to other unrecovered coded symbols. In this case, we can restrict the maximum degrees $M_1$ and $M_2$ to 2, i.e., $\Omega_m(x) = \Omega_{m,1} x + \Omega_{m,2} x^2$. In [27], a theoretical analysis demonstrates that the partial decoding performance of a degree distribution with only degree-1 and degree-2 nodes is acceptable when the received overhead $n/k$ is lower than 1. In addition, these low-degree coded symbols are combined with the coded symbols on the primary links to assist in full decoding.

Therefore, to minimize the DER of the joint decoding at the destination $D$ for a pre-selected decoding overhead $\gamma = \frac{n_1 + n_2 + n_{NC}}{k_1 + k_2}$, the asymptotic And-Or tree performances $y_{l,1}$ and $y_{l,2}$ in (2) and (3) can be computed for any choice of the remaining parameters $\Omega_{1,1}$, $\Omega_{1,2}$, $\Omega_{2,1}$, $\Omega_{2,2}$, $p_1$, $p_2$, and $p_3$, given the ratio $n_1 : n_2 : n_{NC}$ and with $\Psi_{m,n}(x)$ known beforehand. Inspection of (2) and (3) shows that $y_{l,1}$ and $y_{l,2}$ are two conflicting objective functions. Therefore, we have a multi-objective optimization (MOP) problem, referred to as MOP (12), that minimizes the objective function $y(S) = (y_1(S), y_2(S))$ concurrently, where $S$ is the set of decision variables, i.e., $S = (\Omega_1, \Omega_2, p_1, p_2, p_3)$.

Next, consider the primary links in our scheme. Although the actual values of $n_1$, $n_2$, and $n_{NC}$ are unknown, the expected ratio of the numbers of coded symbols received on the different links, $n_1 : n_2 : n_{NC}$, can be derived from the given block lengths and erasure probabilities. If a desired overhead $\gamma^*$ and the proportion of every importance level are given, the decoding performance of the MIB and LIB depends only on the degree distribution and the symbol-selection weights. Thus, if we constrain the relation between the decoding performance of the LIB and that of the MIB, the optimization of the rateless coding scheme of one source is complete once the decoding performance of the LIB has been optimized. Furthermore, inspection of (2) and (3) shows that $y_{l,1,2}$ and $y_{l,2,2}$ are two conflicting objective functions. Therefore, we have a multi-objective optimization problem, MOP1, given in (13), that minimizes the objective functions $y_{l,1,2}(Q)$ and $y_{l,2,2}(Q)$ concurrently, where $Q$ is the set of decision variables defined above. Its constraints include $\sum_{d_m} \Psi_{m,d_m} = 1$, $\Psi_{m,d_m} \geq 0$, and $0 \leq w_{mn} \leq 1$. Note that the inequalities on the second line of (13) are added to guarantee the UEP property by constraining the relation between the decoding performance of the LIB and that of the MIB.

When the original information in any source is divided into $n$ ($n > 2$) importance levels, the optimization can be carried out in the same way, because the same degree distribution and network coding rule are used for every importance level. The resulting problem, MOP2, is given in (14) and has $2n$ importance levels for the whole system.

The fast non-dominated sorting genetic algorithm (NSGA-II) [28] is one of many algorithms that can produce a Pareto front of an MOP with good performance. Thus, we employ this algorithm to solve for the set $Q$.
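The notion of a Pareto front for two conflicting DER objectives can be made concrete with a small non-dominated filter. The sketch below is not NSGA-II itself; it only shows the dominance test on which the algorithm's non-dominated sorting is built, applied to hypothetical objective pairs such as $(y_{l,1,2}, y_{l,2,2})$. The function names and the sample values are illustrative assumptions.

```python
def dominates(a, b):
    """True if objective vector a is no worse than b in every
    objective and strictly better in at least one (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# Hypothetical (DER of source 1, DER of source 2) pairs for five candidates
candidates = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6), (0.5, 0.9)]
front = pareto_front(candidates)
```

Here the candidates $(0.6, 0.6)$ and $(0.5, 0.9)$ are dominated by $(0.5, 0.5)$ and are discarded; the remaining three represent different trade-offs between the two sources, from which a designer picks one according to the desired protection ratio.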

Simulation and Comparison
Let us investigate the DURC parameters under a totally symmetric network model, where the block lengths are $N_1 = N_2 = 1200$, the information symbol lengths are $k_1 = k_2 = 1000$, and the erasure probabilities of the channel links are $\varepsilon_{1D} = \varepsilon_{2D} = 0.5$, $\varepsilon_{1R} = \varepsilon_{2R} = 0$, and $\varepsilon_{RD} = 0.1$. We set the maximal degree to $D_1 = D_2 = 50$, the desired total overhead to $\gamma^* = 1.1$, the proportion of MIB in every source to $\pi_{m1} = 0.5$, and the decoding performance relationship between MIB and LIB to $I_{m,1} \geq 10 I_{m,2}$. To obtain the optimized degree distributions $\Psi_1(x)$ and $\Psi_2(x)$, we solve MOP1 and obtain a Pareto front of the optimized $y_{l,1,2}$ and $y_{l,2,2}$. We plot the Pareto fronts obtained from our optimizations in Figure 4, where $\eta = y_{l,2,2}/y_{l,1,2}$. Clearly, the protection of $S_1$ increases as $\eta$ increases. Partial optimization results are shown in Table 1.

Furthermore, we select an equal error protection (eep) degree distribution from the set of our optimized DURC schemes, i.e., $\eta = 1$ and $n = 1$. The optimized $\Psi_1(x)$ and $\Psi_2(x)$ at the sources are then identical: $\Psi_1(x) = \Psi_2(x) = 0.0111x + 0.4944x^2 + 0.1787x^3 + 0.1653x^5 + 0.0053x^6 + 0.0978x^{12} + 0.0474x^{50}$. We substitute $\Psi_m(x)$ into the MOP problem (12) with a desired total overhead $\gamma^* = 1.1$ and solve for the RC degree distribution on the secondary links, $\Omega_1(x) = \Omega_2(x) = 0.054x + 0.946x^2$, and for the NC relaying probabilities, $p_1 = 0.045$, $p_2 = 0.045$, and $p_3 = 0.91$. We use this eep-DURC scheme for a comparison with three existing distributed rateless coding schemes in the same network model as described before. In the store-and-forward (SF) scheme, the RC degree distributions on the secondary and primary links are both set to the classical degree distribution used for Raptor codes in [9], and the relay node $R$ randomly forwards the received coded symbols from $S_1$ and $S_2$ with equal probability. In the simple network coding scheme (XOR), $R$ always sends to $D$ a new coded symbol that is generated by XORing the two coded symbols from $S_1$ and $S_2$. The SLRC scheme uses the network coding relay protocol of [18]: the relay node $R$ only forwards coded symbols of degree 1 and degree 2 with a threshold probability, and the RC degree distributions on the secondary and primary links are also the Raptor degree distributions.

Figure 5a shows the DER versus the total overhead for various distributed rateless code (DRC) schemes, obtained from the asymptotic performance given by the And-Or tree analysis. The XOR scheme has the worst decoding performance because $D$ receives too few low-degree symbols (degree 1 and degree 2). The SLRC scheme performs better than the SF scheme when the overhead is larger than 1.15. The DER of the eep-DURC scheme, which is the basis of our proposed UEP scheme, is clearly the lowest. Figure 5b shows the DER versus the total overhead for various UEP schemes. The DURC schemes achieve the lowest DER for both MIB and LIB, which strongly supports information transmission in SIN.

Figure 6a shows the DER versus the total overhead for the information within one source node for the DURC and eep-DURC schemes, obtained from the And-Or tree performance evaluation. The DURC scheme achieves a markedly lower DER for the MIB, about two orders of magnitude below that of the LIB, and thus protects the MIB better than the eep-DURC scheme. Figure 6b shows the LIB performance of the different sources for $\eta = 10, 10^2, 10^3, 10^4$. The results show that when the decoding performance of $S_1$ improves, the performance of $S_2$ degrades; that is, the performance gain of one source comes at the price of a performance loss of the other. Therefore, we only have to set the desired parameters, and the scheme proposed in this paper can then satisfy different needs.

We also estimate the throughput performance of the DURC scheme. Substituting the parameter settings of this section into (10) yields the system throughput upper bound. Figure 7 shows the system throughput versus the erasure probability of the channel $R - D$ for the information in one source and for eep-DURC. To match the system settings of the primary and secondary links, we set $\varepsilon \leq 0.5$. The results in Figure 7 show the same trend as Figure 6a: the throughput of the MIB is much higher than that of eep-DURC, whereas the throughput of the LIB is lower. However, for systems that need stronger protection of the MIB, this tradeoff is worthwhile.
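As a quick consistency check on the reported parameters, the Python sketch below verifies that the optimized degree distributions and the relaying rule are normalized and computes their average degrees. It also evaluates the expected number of Phase 1 symbols that survive a secondary link, using the standard BEC expectation $N(1 - \varepsilon)$; applying that expression here is an assumption made for illustration, and the variable and function names are hypothetical.

```python
# Reported eep-DURC parameters (degree -> coefficient)
psi = {1: 0.0111, 2: 0.4944, 3: 0.1787, 5: 0.1653,
       6: 0.0053, 12: 0.0978, 50: 0.0474}
omega = {1: 0.054, 2: 0.946}
p = (0.045, 0.045, 0.91)

def total(dist):
    """Sum of the coefficients; 1 for a valid degree distribution."""
    return sum(dist.values())

def mean_degree(dist):
    """Average output degree, i.e. the derivative of the polynomial at x = 1."""
    return sum(d * c for d, c in dist.items())

# Expected symbols surviving a BEC: N * (1 - eps)  (assumed form)
N1, k1, eps_1D = 1200, 1000, 0.5
expected_secondary = N1 * (1 - eps_1D)
```

Both distributions and the relaying rule sum to 1. The primary-link distribution has an average degree of about 5.94, against 1.946 on the secondary links. The expected 600 symbols received on a secondary link fall short of $k_1 = 1000$, which matches the premise $n_1 < k_1$ used when optimizing $\Omega_m(x)$ for partial recovery.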

Conclusions
We have investigated the design and optimization of the DURC over a dynamic, energy-limited satellite relay network with multiple sources in SIN. The decoding performance has been analyzed by the And-Or tree technique, and the parameters have been optimized by multi-objective programming. The DURC can adapt the degree distribution, symbol-selection weights, and network coding rule to various erasure probabilities on different links. Simulation results show that, in the DURC, the information of any source can be divided into an arbitrary number of importance levels. The DURC can strongly protect the MIB without a large sacrifice of the LIB. Furthermore, different UEP levels can be set for different sources to satisfy system requirements, which improves the flexibility of screening information for ground stations and makes the system more practical.

Figure 2. Connections and durations of DURC in SIN relaying communications.

Figure 3. And-Or tree illustration of the edge connections of $T_{l,11}$.

Figure 4. Pareto fronts obtained from the optimizations.

Figure 5. Decoding error rate versus the total overhead at the destination: (a) various DRC schemes; and (b) various UEP schemes.

Figure 6. Decoding performance of the DURC schemes: (a) DER performance of a single node; (b) DER performance of various UEP setups.