BAR: Blockwise Adaptive Recoding for Batched Network Coding

Multi-hop networks have become popular network topologies in various emerging Internet of Things (IoT) applications. Batched network coding (BNC) is a solution to reliable communications in such networks with packet loss. By grouping packets into small batches and restricting recoding to the packets belonging to the same batch, BNC has much smaller computational and storage requirements at the intermediate nodes compared with a direct application of random linear network coding. In this paper, we discuss a practical recoding scheme called blockwise adaptive recoding (BAR) which learns the latest channel knowledge from short observations so that BAR can adapt to fluctuations in channel conditions. Due to the low computational power of remote IoT devices, we focus on investigating practical concerns such as how to implement efficient BAR algorithms. We also design and investigate feedback schemes for BAR under imperfect feedback systems. Our numerical evaluations show that BAR has significant throughput gain for small batch sizes compared with existing baseline recoding schemes. More importantly, this gain is insensitive to inaccurate channel knowledge. This encouraging result suggests that BAR is suitable for practical use, as the exact channel model and its parameters could be unknown and subject to change from time to time.


I. INTRODUCTION
Noise, interference and congestion are common causes of packet loss in network communications. Usually, a packet has to travel through multiple hops before it can arrive at the destination node. Traditionally, the intermediate nodes apply the store-and-forward strategy. To maintain reliable communication, retransmission is a common practice: a feedback mechanism is applied so that a network node can learn that a packet has been lost. However, due to the delay and the bandwidth consumed by the feedback packets, retransmission schemes come at the cost of degraded system performance.
Random linear network coding (RLNC) [3], [4], which is a simple realization of network coding [5]-[7], can achieve the capacity of multi-hop networks with packet loss even without the need for feedback [8], [9]. Unfortunately, a direct application of RLNC induces an enormous overhead for the coefficient vectors, and also high computational and storage costs at the network nodes. Batched network coding (BNC) [10]-[14] is a practical variation of RLNC which resolves these issues by encoding the packets for transmission into small batches of coded packets, and then applying RLNC to the coded packets belonging to the same batch. BATS codes [14], [15], which are a class of BNC, have a close-to-optimal achievable rate, where the achievable rate is upper bounded by the expectation of the rank distribution of the batch transfer matrices that model the end-to-end network operations (packet erasures, network coding operations, etc.) on the batches [16]. This hints that the network coding operations, which are also known as recoding, have an impact on the throughput of BNC.
Baseline recoding is the simplest recoding scheme which generates the same number of recoded packets for every batch. However, the throughput of baseline recoding is not optimal with finite batch sizes [17]. The idea of adaptive recoding, which aims to outperform baseline recoding by generating different numbers of recoded packets for different batches, was proposed in [17] without truly optimizing the numbers. Two adaptive recoding optimization models for independent packet loss channels were formulated independently in [18] and the conference version of this paper [1]. A unified adaptive recoding framework was proposed in [19] which can subsume both optimization models and support other channel models under certain conditions.
Although adaptive recoding can be applied distributively with local network information, it is a challenge to obtain accurate local information when we deploy adaptive recoding in real-world scenarios. Adaptive recoding requires two pieces of information: the distribution of the information remaining in the received batches, and the channel condition of the outgoing link.
The first piece of information may change over time if the channel condition of the incoming link varies. One reason of the variation is that the link quality can be affected by the interference from the users of other networks around the network node. A way to adapt to this variation is to group a few batches into a block and observe the distribution from the received batches in this block. We call this approach blockwise adaptive recoding (BAR).
The second piece of information may also vary from time to time. In some scenarios such as deep-space and underwater communications, feedback can be expensive or is not available at all so that a feedbackless network is preferred. Without feedback, we cannot update our knowledge on the channel condition of the outgoing link. Although we may assume an unchanged channel condition and measure some information such as the packet loss rate of the channel beforehand, this measurement, however, can be inaccurate due to observational errors or precision limits.
In this paper, we focus on the practical design needed to apply BAR in real-world applications. Specifically, we answer the following questions:
1) How does the block size affect the throughput?
2) Is BAR sensitive to an inaccurate channel condition?
3) How can we calculate the components of BAR and solve the optimization efficiently?
4) How can we make use of link-by-link feedback when it is available?
The first question is related to the trade-off between throughput and delay: A larger block induces a longer delay but gives a higher throughput. We show by numerical evaluations that a small block size can already give a significant throughput gain compared with baseline recoding.
For the second question, we demonstrate that BAR performs very well with an independent packet loss model on channels with dependent packet loss. We also show that BAR is insensitive to an inaccurate packet loss rate. This is an encouraging result as this suggests that it is feasible to apply BAR in real-world applications.
The third question is important in practice as BAR is supposed to run at network nodes, which are usually routers or embedded devices with limited computational power that must nevertheless handle a huge amount of network traffic. Also, whenever we update the knowledge of the incoming link from a short observation, we need to recalculate the components of BAR and solve the optimization problem again. In light of this, we want to reduce the number of computations to improve the reaction time and reduce the stress of congestion. We answer this question by proposing an on-demand dynamic programming approach to build the components, an efficient greedy algorithm to solve BAR, and an approximation scheme to speed up the greedy algorithm.
Lastly, for the fourth question, we consider both a perfect feedback system (e.g., the feedback passes through a side channel with no packet loss) and a lossy feedback system (e.g., the feedback uses the reverse direction of the lossy channel for data transmission). We investigate a few ways to estimate the packet loss rate and show that we can further boost the throughput by using feedback. Also, a rough estimation is sufficient to track the variation of the channel condition.
In other words, unless there is another application which requires a more accurate estimation of the packet loss rate, we may consider using an estimator with low computational cost, e.g., the maximum likelihood estimator.
The paper is organized as follows. We first formulate BAR and introduce some of its properties in Section II. Then, we propose some algorithms to solve BAR efficiently and evaluate the throughput in Section III. In Section IV, we demonstrate that BAR is insensitive to inaccurate channel models and investigate the use of feedback mechanism. At last, we conclude the paper in Section V.

II. BLOCKWISE ADAPTIVE RECODING
In this section, we briefly introduce batched network coding (BNC) and then formulate blockwise adaptive recoding (BAR).
We consider line networks in this paper as they are the fundamental building blocks of a general network. A recoding scheme for line networks can be extended for general unicast networks and certain multicast networks [14], [18]. A line network is a sequence of network nodes where network links only exist between two neighboring nodes. An example of a line network is illustrated in Fig. 1.

A. Batched Network Coding
Suppose we want to send a file from a source node to a destination node through a multi-hop line network. The file is divided into multiple input packets, where each packet is regarded as a vector over a fixed finite field. A batched network code (BNC) has three main components: the encoder, the recoder and the decoder.
An encoder of a BNC is applied at the source node to generate batches from the input packets, where each batch consists of a small number of coded packets. To generate a batch, the encoder samples a predefined degree distribution to obtain a degree, which is the number of input packets contributing to the batch. Depending on the application, there are various ways to formulate the degree distribution [20]-[23]. According to the degree, a set of packets is chosen randomly from the input packets. Each packet in the batch is formed by taking random linear combinations of the chosen set of packets. The encoder generates M packets per batch, where M is known as the batch size.
Each packet in a batch has a coefficient vector attached to it. Two packets in a batch are linearly independent of each other if and only if their coefficient vectors are linearly independent of each other. Right after a batch is generated, the packets in it are defined to be linearly independent of each other. This can be accomplished by suitably choosing the initial coefficient vectors.
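As a concrete sketch of the encoder described above, the following works over GF(2) for brevity (the paper recommends q = 2^8 in practice); the function name and the use of identity initial coefficient vectors are illustrative assumptions, the latter being one way to make freshly generated packets linearly independent:

```python
import random

def encode_batch(input_packets, degree, M):
    """Encoder sketch over GF(2): pick `degree` input packets at random,
    then form M coded packets as random linear combinations of them.
    Each coded packet carries an M-dim coefficient vector; choosing the
    identity rows makes the fresh packets linearly independent."""
    chosen = random.sample(input_packets, degree)
    batch = []
    for m in range(M):
        payload = [0] * len(input_packets[0])
        for pkt in chosen:
            if random.randint(0, 1):          # random GF(2) coefficient
                payload = [a ^ b for a, b in zip(payload, pkt)]
        coeff = [1 if k == m else 0 for k in range(M)]
        batch.append((coeff, payload))
    return batch
```

Over GF(2^8), the XOR would be replaced by field multiplication and addition, but the structure is identical.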
A recoder is applied at each intermediate node, which performs network coding operations on the received batches to generate recoded packets. This procedure is known as recoding. Some packets of a batch may be lost when they pass through a network link. Each recoded packet of a batch is formed by taking a random linear combination of the received packets of this batch. The number of recoded packets depends on the recoding scheme. For example, baseline recoding generates the same number of recoded packets for every batch. Optionally, we can also apply a recoder at the source node so that we can have more than M packets per batch at the beginning.
After recoding, the recoded packets are sent to the next network node.
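The recoding operation can be sketched in the same GF(2) setting as before (q = 2^8 in practice; the function name is ours). Note that the coefficient vector is combined in the same way as the payload, so the next node can still measure the rank of the batch:

```python
import random

def recode(received, num_recoded):
    """Recoder sketch over GF(2): each recoded packet is a random linear
    combination of the received (coeff, payload) pairs of one batch."""
    coeff_len = len(received[0][0])
    payload_len = len(received[0][1])
    out = []
    for _ in range(num_recoded):
        c = [0] * coeff_len
        d = [0] * payload_len
        for coeff, payload in received:
            if random.randint(0, 1):          # random GF(2) coefficient
                c = [a ^ b for a, b in zip(c, coeff)]
                d = [a ^ b for a, b in zip(d, payload)]
        out.append((c, d))
    return out
```

Every output lies in the span of the received packets of the batch, which is exactly why recoding cannot increase a batch's rank.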
At the destination node, a decoder is applied to recover the input packets. Depending on the specific BNC, we can use different decoding algorithms such as Gaussian elimination, belief propagation and inactivation decoding [24], [25].

B. Expected Rank Functions
Define the rank of a batch at a network node as the number of linearly independent packets remaining in the batch, which is a measure of the amount of information carried by the batch. Adaptive recoding aims to maximize the sum of the expected values of the rank distributions of the batches arriving at the next network node. For simplicity, we call this expected value the expected rank.
For a batch b, denote the rank of b by r_b and the number of recoded packets to be generated for b by t_b. The expectation of the rank of b at the next network node, denoted by E(r_b, t_b), is known as the expected rank function. We have

E(r_b, t_b) = Σ_{i=0}^{t_b} Pr(X_{t_b} = i) Σ_{j=0}^{min{i, r_b}} j ζ_j^{i,r_b},   (1)

where X_t is the random variable of the number of packets of a batch received by the next network node when we send t packets for this batch at the current node, and ζ_j^{i,r} is the probability that a batch of rank r at the current node with i received packets at the next network node has rank j at the next network node.¹ The exact formulation of ζ_j^{i,r} can be found in [14]; it involves the terms ζ_j^m = Π_{k=0}^{j−1} (1 − q^{−m+k}), where q is the field size for the linear algebra operations. It is convenient to use q = 2^8 in practice as each symbol in this field can be represented by 1 byte. For a sufficiently large field size, say q = 2^8, ζ_j^{i,r} is very close to 1 if j = min{i, r} and very close to 0 otherwise. That is, we can approximate ζ_j^{i,r} by δ_{j,min{i,r}}, where δ is the Kronecker delta. This approximation is also used in literature such as [26], [27].
For the independent packet loss model with packet loss rate p, we have X_t ~ Binom(t, 1 − p), a binomial distribution. If p = 0, then a store-and-forward technique can already guarantee the maximal expected rank. If p = 1, then no matter how many packets we transmit, the next network node receives no packet. So, we assume 0 < p < 1 in this paper.² We demonstrate the accuracy of the approximation ζ_j^{i,r} ≈ δ_{j,min{i,r}} by showing the percentage error of the expected rank function, corrected to 3 decimal places, when q = 2^8, p = 0.2 and X_t ~ Binom(t, 1 − p) in Table I. We can see that only three pairs of (r, t) have percentage errors larger than 0.1%, and they occur when r, t ≤ 2. For all the other cases, the percentage errors are less than 0.1%.

¹ Systematic recoding [15], [17], which regards the received packets as recoded packets, achieves a nearly indistinguishable performance compared with the scheme which generates all recoded packets by taking random linear combinations [15]. So, we can also use (1) to approximate the expected rank functions for systematic recoding accurately.
Therefore, such an approximation is accurate enough for practical applications. In the remaining text, we assume ζ_j^{i,r} = δ_{j,min{i,r}}. That is, for the independent packet loss model, we have

E_indep(r, t) = E[min{X_t, r}] = Σ_{i=0}^{t} (t choose i) (1 − p)^i p^{t−i} min{i, r}.   (E-indep)

We also consider the expected rank functions for burst packet loss channels modelled by Gilbert-Elliott (GE) models [28], [29], where the GE model was also used in other literature on BNC such as [26], [30]. A GE model is a 2-state Markov chain, as illustrated in Fig. 2.
In each state, there is an independent event deciding whether a packet is lost or not. Define f(s, i, t) := Pr(S_t = s, X_t = i), where S_t is the random variable of the state of the GE model after sending t packets of the batch. By exploiting the structure of the GE model, the computation of f can be done by dynamic programming. Then, we have

E_GE(r, t) = Σ_s Σ_{i=0}^{t} f(s, i, t) min{i, r}.   (E-GE)

It is easy to see that computing (E-GE) takes more steps than computing (E-indep).
So, a natural question to ask is: for burst packet loss channels, is the throughput gap between adaptive recoding with (E-indep) and with (E-GE) small? We will demonstrate in Section IV-B that the gap is indeed small, so that we can use (E-indep) at all times and still obtain a good throughput. Therefore, we mainly focus our investigation on (E-indep).

² It is easy to prove that the results in this paper are also valid for p = 0 or 1 when we define 0^0 := 1, a convention in combinatorics under which Binom(t, 0) and Binom(t, 1) are well-defined with the correct interpretation.
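Both expected rank functions can be sketched in a few lines of code. The function names, the 0/1 state indexing, and the assumption that a packet is lost according to the state in which it is sent (with the chain stepping once per packet) are our illustrative choices, not prescribed by the paper:

```python
from math import comb

def expected_rank_indep(r, t, p):
    """(E-indep): E[min(X_t, r)] with X_t ~ Binom(t, 1 - p)."""
    return sum(comb(t, i) * (1 - p) ** i * p ** (t - i) * min(i, r)
               for i in range(t + 1))

def expected_rank_ge(r, t, P, loss, init):
    """(E-GE) via the DP on f(s, i, t) = Pr(S_t = s, X_t = i).
    P[s][s2]: transition probability of the 2-state chain,
    loss[s]: per-state loss probability, init[s]: initial distribution."""
    # f[s][i] = Pr(state s, i packets received) after k packets sent
    f = [[init[s] if i == 0 else 0.0 for i in range(t + 1)] for s in range(2)]
    for _ in range(t):
        g = [[0.0] * (t + 1) for _ in range(2)]
        for s in range(2):
            for i in range(t + 1):
                if f[s][i] == 0.0:
                    continue
                for s2 in range(2):
                    mass = f[s][i] * P[s][s2]
                    g[s2][i] += mass * loss[s]                # packet lost
                    if i + 1 <= t:
                        g[s2][i + 1] += mass * (1 - loss[s])  # packet received
        f = g
    return sum(f[s][i] * min(i, r) for s in range(2) for i in range(t + 1))
```

A quick sanity check: when both states have the same loss probability, the GE channel degenerates into independent loss and the two functions agree, which mirrors the observation above that (E-indep) is often an adequate substitute for (E-GE).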
In the rest of this paper, we write E(r, t) for E_indep(r, t) unless otherwise specified. We first give the recursive formula for E(r, t). For integers r ≥ 0 and t ≥ −1, define

β_p(t, r) := Pr(X_t ≤ r − 1) = Σ_{i=0}^{r−1} (t choose i) (1 − p)^i p^{t−i} for t ≥ 0, and β_p(−1, r) := 1.   (2)

When t ≥ 0, the function β_p(t, r) is the partial sum of the probability masses of the binomial distribution Binom(t, 1 − p). The case t = −1 will be used in the approximation scheme in Section III and we will discuss this case in that section.
Lemma 1. E(r, t + 1) = E(r, t) + (1 − p)β_p(t, r), where t and r are non-negative integers.
Proof: Let Y_1, Y_2, . . . be independent indicator random variables, each taking value 1 with probability 1 − p. When Y_i = 1, it means that the i-th packet is received by the next hop.

When we transmit one more packet at the current node, Y_{t+1} indicates whether this packet is received by the next network node or not. If Y_{t+1} = 0, i.e., the packet is lost, then the expected rank will not change. If Y_{t+1} = 1, then the packet is linearly independent of all the already received packets at the next network node if the number of received packets at the next network node is less than r. That is, the rank of this batch at the next network node increases by 1. The formula in Lemma 1 can be interpreted as: a newly received packet is linearly independent of all the already received packets with probability tending to 1, unless the rank has already reached r. This can also be interpreted as ζ_j^{i,r} = δ_{j,min{i,r}} holding with probability tending to 1.
Corollary 1. E(r, t + 1) ≥ E(r, t), where the equality holds if and only if r = 0.

Proof: By Lemma 1, E(r, t + 1) − E(r, t) = (1 − p)β_p(t, r). By the definition of β_p in (2), we can see that β_p(t, r) = 0 if and only if t ≥ 0 and r = 0.

C. Blockwise Adaptive Recoding
Let a block be a set of batches. We assume that the blocks at a network node are mutually disjoint. Blockwise adaptive recoding (BAR) is a recoding scheme which groups the batches into blocks and jointly optimizes the numbers of recoded packets for the batches in a block.
Fix a network node. Suppose the node receives a block L. For each batch b ∈ L, let r_b and t_b be the rank of b and the number of recoded packets to be generated for b, respectively. A node can only transmit a finite number of packets for a block in practice. We denote this number by t_max^L, which is an input to the optimization problem. The following model maximizes the sum of the expected ranks of the batches in the block L:

max_{(t_b)_{b∈L}} Σ_{b∈L} E(r_b, t_b)  subject to  Σ_{b∈L} t_b ≤ t_max^L, with t_b a non-negative integer for all b ∈ L.   (P)

The above optimization depends only on the local knowledge at the node. The batch rank r_b can be known from the coefficient vectors of the received packets of batch b. The value of t_max^L can affect the stability of the packet buffer.³

We may assume that the constraint is met with equality. Let {t*_b}_{b∈L} be an optimal solution of (P) and let {α_b}_{b∈L} be non-negative integers such that Σ_{b∈L}(t*_b + α_b) = t_max^L. Corollary 1 gives that the objective value with the set {t*_b + α_b}_{b∈L} is no less than the one with the set {t*_b}_{b∈L}. If the objective value is unchanged, it is no harm to achieve the equality in the constraint in (P) with {t*_b + α_b}_{b∈L}. Otherwise, it is a contradiction to the optimality. Therefore, we can replace the inequality constraint with an equality, which gives

max_{(t_b)_{b∈L}} Σ_{b∈L} E(r_b, t_b)  subject to  Σ_{b∈L} t_b = t_max^L, with t_b a non-negative integer for all b ∈ L.   (B)
Note that the solution of (B) may not be unique. We only need to obtain one of the solutions for recoding purposes. In general, (B) is a non-linear integer programming problem. A linear programming variant of (B) can be formulated by using a technique in [31]. However, such a formulation has a huge number of constraints and requires calculating the values of E(r_b, t) for all b ∈ L and all possible t beforehand. We defer the discussion of this formulation to Appendix C.

Now, we formally present BAR, which is based on the solution of (B). Similar to baseline recoding, the source node transmits a prescribed number of packets, say t_b^(0), for each batch b it generates. We can also use (B) to decide the value of t_b^(0), i.e., we group some batches generated by the encoder into a block L. We will discuss the choice of this value in Section III-D.

A network node keeps receiving packets until it has received enough batches to form a block L. A packet buffer is used to store the received packets. Then, the node solves (B) and obtains the number of recoded packets for each batch in the block, i.e., {t_b}_{b∈L}. The node then generates and transmits t_b recoded packets for every batch b ∈ L. At the same time, the network node keeps receiving new packets. After all the recoded packets for the block L are transmitted, the node drops the block from its packet buffer and then repeats the procedure with another block.
The size of a block depends on the application. For example, if an interleaver is applied to L batches, we can group these L batches as a block. When |L| = 1, the only solution is t_b = t_max^L, which degenerates into baseline recoding. Therefore, we need a block size of at least 2 in order to enjoy the throughput enhancement of BAR. Intuitively, it is better to optimize (B) with a larger block size; this is formally stated in Theorem 2 below. However, the block size is related to the transmission latency as well as the computational and storage burdens at the network nodes. Note that we cannot conclude the exact rank of each batch in a block until the previous network node finishes sending all the packets of this block. That is, we need to wait for the previous network node to deliver the packets of all the batches in a block before we can solve the optimization problem. Numerical evaluations in Section III-F show that |L| = 2 already has an obvious advantage over |L| = 1, and it may not be necessary to use a block size larger than 8.
Theorem 2. Let L and L′ be two blocks. The sum of the objectives of maximizing L and L′ separately is less than or equal to the objective of maximizing the block L ∪ L′.
Proof: See Appendix E.

D. Properties of Blockwise Adaptive Recoding
Due to the non-linear integer programming structure of (B), we need to find some properties of the model in order to design efficient algorithms.
Define the probability mass function of the binomial distribution Binom(t, 1 − p) by

B_p(t, i) := (t choose i) (1 − p)^i p^{t−i} if 0 ≤ i ≤ t, and B_p(t, i) := 0 otherwise.

We can rewrite (2) in terms of B_p(t, i) by

β_p(t, r) = Σ_{i=0}^{r−1} B_p(t, i) for all t ≥ 0.

Due to the fact that Σ_{i=0}^{t} B_p(t, i) = 1, we have

β_p(t, r) ≤ 1, where the equality holds if and only if t < r.   (3)

Furthermore, it is easy to see that

β_p(t + 1, r) = p β_p(t, r) + (1 − p) β_p(t, r − 1) for all t ≥ 0 and r ≥ 1,

by conditioning on whether the first packet is lost. A tabular form of β_p is illustrated in Fig. 3 after substituting the boundaries with 0s and 1s.
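An on-demand construction of the β_p lookup can be sketched via the recurrence above, with memoization standing in for the table of Fig. 3 (the function name is ours):

```python
from functools import lru_cache

def make_beta(p):
    """Lookup for beta_p(t, r) = Pr(X_t <= r - 1), X_t ~ Binom(t, 1-p),
    built on demand from the recurrence
        beta_p(t, r) = p*beta_p(t-1, r) + (1-p)*beta_p(t-1, r-1),
    with boundaries beta_p(-1, .) = 1 and beta_p(t, 0) = 0 for t >= 0."""
    @lru_cache(maxsize=None)
    def beta(t, r):
        if t == -1:
            return 1.0          # barrier value used by the algorithms
        if r == 0:
            return 0.0          # cannot have rank below 0 received packets
        if t == 0:
            return 1.0          # r >= 1 > 0 packets sent
        return p * beta(t - 1, r) + (1 - p) * beta(t - 1, r - 1)
    return beta
```

Each entry is computed once and reused, matching the O(1) amortized query cost assumed in Section III; the cache is reusable as long as the outgoing loss rate p is unchanged.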
The regularized incomplete beta function, defined as I_x(a, b) := ∫_0^x u^{a−1}(1 − u)^{b−1} du / ∫_0^1 u^{a−1}(1 − u)^{b−1} du, gives a closed-form alternative: for t ≥ r ≥ 1, we have β_p(t, r) = I_p(t − r + 1, r).

Lemma 2. Assume 0 < p < 1. Let Λ be an index set.
(b) β_p(t, r) ≥ β_p(t + 1, r), where the equality holds if and only if t + 1 < r or t ≥ r = 0; (c) β_p(t, r) ≤ β_p(t + 1, r + 1), where the equality holds if and only if t < r; (d) β_p(t, r + 1) ≥ β_p(t, r), where the equality holds if and only if t < r; (e) 1 ≥ max_{b∈Λ} β_p(t_b, r_b) ≥ β_p(t_a + s, r_a) for all a ∈ Λ and any non-negative integer s; (f) min_{b∈Λ} β_p(t_b, r_b) ≤ β_p(t_a − s, r_a) for all a ∈ Λ and any non-negative integer s such that t_a − s ≥ −1.
Proof: See Appendix F.
Lemma 3. Let t and r be non-negative integers. (a) E(r, t + 1) = E(r, t) + (1 − p)β_p(t, r); (b) the increments E(r, t + 1) − E(r, t) are non-increasing in t.

For a batch of rank r and t recoded packets transmitted, Lemma 1 states that when we transmit one more recoded packet, the expected rank of the batch at the next network node is increased by (1 − p)β_p(t, r). Define a multiset Ω_r which collects the values of (1 − p)β_p(t, r) for all non-negative integers t. The following lemma shows the relationship between E(r, t) and Ω_r.
Lemma 4. E(r, t) equals the sum of the largest t elements in Ω r .
Proof: By Lemma 2(b), (1 − p)β_p(t, r) is a monotonically decreasing function of t. By (3) and (5), we have β_p(0, r) = 1 ≥ β_p(t, r) for all positive integers t. So, E(r, t) is the sum of the largest t elements in Ω_r.
When t L max ≤ b∈L r b , we can find an optimal solution of (B) easily by the following lemma. However, this condition means that the value of t L max is too small such that the node has just enough or even not enough time to forward the linearly independent packets it received.
Lemma 5. If t_max^L ≤ Σ_{b∈L} r_b, then every assignment {t_b}_{b∈L} with t_b ≤ r_b for all b ∈ L and Σ_{b∈L} t_b = t_max^L is an optimal solution of (B), and the optimal value is (1 − p) t_max^L.

Proof: Lemma 1 and (5) show that E(r_b, t) − E(r_b, t − 1) = (1 − p)β_p(t − 1, r_b) = 1 − p whenever t ≤ r_b. Therefore, the first r_b packets of each batch gain the most to the expected rank. Note that each of the first r_b packets gains (1 − p) regardless of the value of r_b. This concludes that if t_max^L ≤ Σ_{b∈L} r_b, then every assignment with t_b ≤ r_b and Σ_{b∈L} t_b = t_max^L achieves the largest sum of expected ranks, which equals (1 − p) t_max^L.

Now, we consider the case where we have enough time to transmit more than Σ_{b∈L} r_b recoded packets. The following theorem formalizes the intuition that we should send more packets for a batch having a higher rank than for a batch having a lower rank.
Theorem 3. Let {t_b}_{b∈L} be an optimal solution of (B) with t_b ≥ r_b for all b ∈ L. Then t_m < t_n for all m, n ∈ L such that r_m < r_n.
Proof: See Appendix H.
Theorem 3 implies that a network node should not transmit the same number of packets for batches having different ranks. Baseline recoding has t_b = M for all b ∈ L, so it violates this condition when not all the batches have the same rank. This means that baseline recoding is not optimal, which is consistent with the result from [17]. Next, we give the following theorem, which shows the necessary and sufficient conditions for a non-optimal solution of (B).
Theorem 4. A feasible solution {t_b}_{b∈L} of (B) is not optimal if and only if there exist two distinct batches κ, ρ ∈ L with t_ρ ≥ 1 such that β_p(t_κ, r_κ) > β_p(t_ρ − 1, r_ρ).

Proof: See Appendix I.
We can take the contrapositive to obtain the necessary and sufficient conditions for an optimal solution. Finally, the following corollary gives a contrast to Lemma 5; it also supplies the assumption required by Theorem 3.

Corollary 2. If t_max^L ≥ Σ_{b∈L} r_b, then every optimal solution {t_b}_{b∈L} of (B) satisfies t_b ≥ r_b for all b ∈ L.

Proof: Suppose there exists some b ∈ L such that t_b < r_b. The constraint of (B), i.e., Σ_{a∈L} t_a = t_max^L ≥ Σ_{a∈L} r_a, then implies that there exists some a ∈ L such that t_a > r_a. By (3), β_p(t_b, r_b) = 1 as t_b < r_b, while β_p(t_a − 1, r_a) < 1 as t_a − 1 ≥ r_a. That is, β_p(t_b, r_b) > β_p(t_a − 1, r_a), so by Theorem 4 the solution is not optimal, a contradiction.

III. ALGORITHMS FOR BLOCKWISE ADAPTIVE RECODING
In this section, we first propose a greedy algorithm to solve (B) efficiently. This algorithm gives insight into the characteristics of the solution, which will be discussed in Section III-D.
We also propose an approximation scheme based on Theorem 3 to speed up the solver for practical implementations. We defer a discussion on connecting BAR with other adaptive recoding formulations which assume that the incoming link condition is known in advance to Appendix D.
The algorithms we propose in this paper frequently query and compare the values of (1 − p)β_p(t, r). We suppose a lookup table is constructed so that the queries can be done in O(1) time. The table is reusable if the packet loss rate of the outgoing link is unchanged. We only consider the subset {−1, 0, 1, . . . , t_max^L} × {0, 1, 2, . . . , M} of the domain of β_p because 1) the maximum rank of a batch is M; and 2) no t_b can exceed t_max^L as Σ_{b∈L} t_b = t_max^L. The case t = −1 will be used by our approximation scheme, so we keep it in the lookup table. We can build the table on demand by dynamic programming, which will be discussed in Section III-E.

A. Greedy Algorithm
We have discussed the case t_max^L ≤ Σ_{b∈L} r_b in Lemma 5. Now, we consider t_max^L > Σ_{b∈L} r_b. We first define the subproblems (B^(k)) of (B) for k ∈ {0, 1, . . . , t_max^L} so that we can investigate the optimal substructure for our greedy algorithm: (B^(k)) is the problem (B) with the budget t_max^L replaced by k, i.e.,

max_{(t_b)_{b∈L}} Σ_{b∈L} E(r_b, t_b)  subject to  Σ_{b∈L} t_b = k, with t_b a non-negative integer for all b ∈ L.   (B^(k))

Fix a block L. We define a multiset Ω := {(1 − p)β_p(t, r_b) : t ∈ {0, 1, 2, . . .}, b ∈ L}. If two batches a, b ∈ L have the same rank, i.e., r_a = r_b, then for all t we have β_p(t, r_a) = β_p(t, r_b). As Ω is a multiset, the duplicated values are not eliminated.
We have shown the relationship between E(r, t) and Ω r in Lemma 4. Now, we have the following lemma to connect (B (k) ) and Ω.
Lemma 6. The optimal value of (B (k) ) is the sum of the largest k elements in Ω.
Proof: Let {t_b}_{b∈L} solve (B^(k)). Suppose the optimal value is not the sum of the largest k elements in Ω. However, Lemma 4 states that E(r_b, t_b) equals the sum of the largest t_b elements in Ω_{r_b} for all b ∈ L. This means that there exist two distinct batches κ, ρ ∈ L with t_ρ ≥ 1 such that β_p(t_κ, r_κ) > β_p(t_ρ − 1, r_ρ).

Fig. 4: Each grid cell represents a value of (1 − p)β_p(y, r_b). The universe is the multiset Ω. The grey region Ω_k contains the largest k elements in Ω, which represents an optimal solution of (B^(k)). To achieve an optimal solution of (B^(k+1)), we need to find the (k + 1)-th largest element in Ω, which is the largest element above the solid line. By Lemma 2(b), we have β_p(y, r_b) ≥ β_p(y + 1, r_b) for all b ∈ L, which implies that the largest element above the solid line is the largest one among the blue cells.

Algorithm 1: Solver of BAR
By setting t_max^L = k, we can apply Theorem 4, which gives that {t_b}_{b∈L} is not an optimal solution of (B^(k)). The proof is done by contradiction.
In terms of their values, the largest k + 1 elements in Ω must contain the largest k elements in Ω. That is, Lemma 6 shows the optimal substructure of (B). Algorithm 1 is a greedy algorithm which makes use of this optimal substructure. After an initialization, the algorithm repeatedly finds the batch b which has the largest (1 − p)β_p(t_b, r_b) and increases the corresponding t_b by one, i.e., we solve (B^(k)) for successive k incrementally.
Proof: Note that the variable t in Algorithm 1 represents the number of unassigned packets.
The algorithm stops when t = 0, i.e., Σ_{b∈L} t_b = t_max^L, which is in the feasible region. If t_max^L ≤ Σ_{b∈L} r_b, the algorithm returns {t_b}_{b∈L} such that t_b ≤ r_b for all b ∈ L. Its optimality follows from Lemma 5. The algorithm takes O(|L|) time in this case. Now, we consider t_max^L > Σ_{b∈L} r_b. We prove the correctness of Algorithm 1 by induction.

Suppose that after some iterations the current assignment {t_b}_{b∈L} attains the optimal value of (B^(k)), and Ω_k is the collection of the largest k elements in Ω. By Lemma 2(e), the next packet assigned by the algorithm corresponds to the largest element of Ω \ Ω_k, so the resulting multiset Ω_{k+1} contains the largest k + 1 elements in Ω. So, by Lemma 6, the updated {t_b}_{b∈L} attains the optimal value of (B^(k+1)). After t_b is increased, the key (1 − p)β_p(t_b, r_b) of batch b can only decrease, so the update is a decrease-key operation in a max-heap. A Fibonacci heap [33] cannot benefit this operation here. Therefore, the overall time complexity is O(|L| + (t_max^L − Σ_{b∈L} r_b) log |L|).
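A compact sketch of this greedy solver follows (our own Python rendering, not a reproduction of Algorithm 1's listing; Python's heapq is a min-heap, so keys are negated, and β_p is recomputed directly instead of read from the lookup table):

```python
import heapq
from math import comb

def beta(t, r, p):
    """beta_p(t, r) = Pr(Binom(t, 1-p) <= r - 1), with beta_p(-1, .) = 1."""
    if t < 0:
        return 1.0
    return sum(comb(t, i) * (1 - p) ** i * p ** (t - i)
               for i in range(min(r, t + 1)))

def bar_greedy(ranks, t_max, p):
    """Greedy solver sketch for (B): t_max times, give one more recoded
    packet to the batch with the largest marginal gain (1-p)*beta_p(t_b, r_b).
    The batch index breaks ties deterministically."""
    t = [0] * len(ranks)
    heap = [(-beta(0, r, p), b) for b, r in enumerate(ranks)]
    heapq.heapify(heap)
    for _ in range(t_max):
        _, b = heapq.heappop(heap)
        t[b] += 1
        heapq.heappush(heap, (-beta(t[b], ranks[b], p), b))
    return t
```

Because the first r_b increments of a batch all have gain (1 − p), this single loop also covers the Lemma 5 case t_max^L ≤ Σ r_b without a separate branch.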

B. Equal Opportunity Approximation Scheme
Algorithm 1 increases t_b step by step. From a geometric point of view, the algorithm finds a path from the interior of a compact convex polytope P to the facet H := {(t_b)_{b∈L} : Σ_{b∈L} t_b = t_max^L, t_b ≥ 0 for all b ∈ L}: the half-space representation of P is the system of linear inequalities Σ_{b∈L} t_b ≤ t_max^L and t_b ≥ 0 for all b ∈ L. Equivalently, P is the convex hull of the points (t_max^L, 0, . . . , 0), (0, t_max^L, . . . , 0), . . ., (0, 0, . . . , t_max^L) and the origin.
If we have a method to move a non-optimal feasible point on H towards an optimal point, together with a fast and accurate enough approximation to (B), then we can combine them to solve (B) faster than using Algorithm 1 directly. This idea is illustrated in Fig. 5. We first give an approximation scheme in this subsection. As we cannot generate any linearly independent packet for a batch of rank 0, we have E(0, ·) = 0. So, we can exclude those batches having rank 0 from L before we start the approximation.
An easy way to give an approximation is to assign {t_b}_{b∈L} following the guideline given in Theorem 3: assign t_b = r_b + (t_max^L − Σ_{a∈L} r_a)/|L| to each batch b. In case (t_max^L − Σ_{a∈L} r_a)/|L| is not an integer, we can round it up for the batches having higher ranks and round it down for those having lower ranks.
The above rules allocate the unassigned packets to the batches equally after r b packets are assigned to each batch b. So, we call this approach the equal opportunity approximation scheme.
The steps of this scheme are summarized in Algorithm 2 and illustrated in Fig. 6.
Note that we do not need to know the packet loss rate p to apply this approximation. That is, even if we do not know the value of p, we can still apply this approximation to outperform baseline recoding.

Algorithm 2: Equal Opportunity Approximation Scheme
Round up for the r elements which have the largest ranks.

Proof: It is easy to see that Algorithm 2 outputs {t_b}_{b∈L} satisfying Σ_{b∈L} t_b = t_max^L. That is, the output is a feasible solution of (B). Let L′ := {b ∈ L : r_b > 0}; note that |L′| ≤ |L|, so the assignments and branches take time linear in |L|. If L′ = ∅, i.e., the whole block is lost, then any feasible {t_b}_{b∈L} is a solution, and the optimal objective value is 0. If t_max^L ≤ Σ_{b∈L} r_b, then the algorithm terminates with an output satisfying t_b ≤ r_b for all b ∈ L. By Lemma 5, such a solution is optimal.
In practice, the batch size M is small. We can search for the r batches having the highest ranks in O(|L| + M) time by a counting technique (see Appendix A-B) as an efficient alternative. Algorithm 2 is a (1 − p)-approximation algorithm, although the relative performance guarantee factor 1 − p is not tight in general. Nevertheless, it suggests that the smaller the packet loss rate p, the more accurate the output of the algorithm. We leave the discussion regarding this approximation to Appendix A-C.
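The equal opportunity rules above can be sketched as follows (our rendering, not Algorithm 2's exact listing; tie-breaking among equal ranks is arbitrary, as any choice satisfies the rounding rule):

```python
def equal_opportunity(ranks, t_max):
    """Equal opportunity approximation sketch: reserve r_b packets per
    batch, then split the remaining budget equally, rounding up for the
    higher-rank batches.  Rank-0 batches are excluded up front.
    Note that the loss rate p is not needed."""
    t = [0] * len(ranks)
    active = [b for b, r in enumerate(ranks) if r > 0]
    total_rank = sum(ranks[b] for b in active)
    if t_max <= total_rank:
        # small budget: any assignment with t_b <= r_b is optimal (Lemma 5)
        left = t_max
        for b in active:
            t[b] = min(ranks[b], left)
            left -= t[b]
        return t
    extra, rem = divmod(t_max - total_rank, len(active))
    # round up for the rem batches having the highest ranks
    order = sorted(active, key=lambda b: ranks[b], reverse=True)
    for idx, b in enumerate(order):
        t[b] = ranks[b] + extra + (1 if idx < rem else 0)
    return t
```

For example, with ranks (0, 1, 2, 3) and a budget of 10, the three positive-rank batches share 10 − 6 = 4 extra packets, and the single leftover packet goes to the rank-3 batch.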

C. Speed-up via Approximation
In this subsection, we investigate a method which corrects an approximate solution to an optimal solution.
Note that all integral points on the facet H are feasible solutions to (B). When we move a point on H, the minimal change is to increase t_b by 1 and decrease t_{b′} by 1, where b, b′ ∈ L and b ≠ b′. We call such a minimal change a step. A sequence of steps is called a path.
Lemma 7. For any non-optimal point T on the facet H, there exists a path of finitely many steps from T to an optimal point, where the objective value is strictly increasing along the path.
Proof: If t L max = 0, then the facet H degenerates into a single point, which is the origin. The only feasible point is the optimal point, so we cannot have a non-optimal solution.
Consider t_max^L > 0. Define a multiset Ω_{t_max} which is a collection of the largest t_max^L elements in Ω. By Lemma 6, Σ_{ω ∈ Ω_{t_max}} ω is the optimal value of (B). For a point on H, let Ψ be the multiset of the t_max^L values of (1 − p)β_p(·, ·) selected by that point, and index the point as (t_b^(k))_{b∈L}, where k is the number of elements in Ω_{t_max} which are also contained in Ψ. We know that there exists an m such that the given non-optimal point T is located at (t_b^(m))_{b∈L}.

We provide a constructive proof for this lemma. Given any k ∈ {m, m + 1, . . . , t_max^L − 1}, we are going to show that there exists a step to obtain a point located at (t_b^(k+1))_{b∈L} which has a larger objective value. Suppose we have a non-optimal point located at (t_b^(k))_{b∈L} with selected values Ψ_k. Define Ω̂_k as the multiset of elements of Ψ_k which are part of the optimal solution, so that |Ω̂_k| = k, and define Ω̃_k := Ψ_k \ Ω̂_k as the multiset of elements of Ψ_k which are not among the largest t_max^L elements in Ω. By Lemma 6, the elements in Ω̃_k are the cause of making the solution non-optimal. The relationship between Ψ_k, Ω_{t_max}, Ω̂_k and Ω̃_k is illustrated in Fig. 7.

Fig. 7: The intersection of the current non-optimal and optimal solutions, i.e., the multiset Ω̂_k, is the grey region. The red region is denoted by the multiset Ω̃_k, which contains the non-optimal values in Ψ_k which should be removed. The blue region contains the values which are part of the optimal solution but not in Ψ_k yet.

As Ψ_k ≠ Ω_{t_max}, i.e., the current point is not optimal, we can apply Theorem 4 and know that there exist two batches κ, ρ ∈ L with t_ρ ≥ 1 such that β_p(t_κ, r_κ) > β_p(t_ρ − 1, r_ρ). Without loss of generality, take any κ ∈ arg max_{b∈L} β_p(t_b, r_b) and ρ ∈ arg min_{b∈L} β_p(t_b − 1, r_b). Note that max(Ω \ Ψ_k) ∈ Ω_{t_max}, and max(Ω \ Ψ_k) > min(Ω̃_k) = min(Ψ_k). By Lemma 2(e) and (f), we have max(Ω \ Ψ_k) = (1 − p)β_p(t_κ, r_κ) and min(Ψ_k) = (1 − p)β_p(t_ρ − 1, r_ρ). We construct a step by removing min{Ψ_k} from Ψ_k and inserting max{Ω \ Ψ_k} into it, i.e., we decrease t_ρ by 1 and increase t_κ by 1. Mathematically, we are creating the multiset Ψ_{k+1} := (Ψ_k \ {min Ψ_k}) ∪ {max(Ω \ Ψ_k)}.

Algorithm 3: Solver of BAR via Approximation
Run an approximation to get t_b, b ∈ {a ∈ L : r_a > 0};

On the other hand, consider (8), which follows from (7). Lastly, by Lemma 3(b), the objective value along the step is strictly increasing. By induction, the existence of such a path is shown.
Algorithm 3 is a greedy algorithm which uses any point T on the facet H as a starting point. Then, it follows the path constructed in the proof of Lemma 7 to obtain an optimal solution.
An iteration that modifies the solution is illustrated in Fig. 8. Note that the algorithm may query β_p(t_a − 1, r_a) for a ∈ L. If t_a = 0, then it accesses the value β_p(−1, r_a). Recall that we have defined β_p(−1, ·) = 1, which is the upper bound of β_p(·, ·) by (3). These values thus act as barriers that prevent outputting a negative number of recoded packets.
Proof: Theorem 4 suggests that if the current feasible solution is not optimal, then we must have two distinct batches a, b ∈ L with t_a ≥ 1 satisfying (9). Suppose t_a = 0 for some a ∈ L. Then by (5), we have β_p(t_a − 1, r_a) = β_p(−1, r_a) = 1. By (3), it is then not possible to have a b ∈ L satisfying (9). That is, we do not need to check whether t_a = 0 in the algorithm: the value β_p(−1, r_a) serves this purpose.
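The stepping procedure and the barrier value β_p(−1, ·) = 1 can be sketched in Python. This is an illustrative rendering, not the paper's Algorithm 3: the paper uses heaps and starts from the output of Algorithm 2, while here a plain O(|L|) scan is used per step. The function `beta` implements β_p(t, r) as the partial sum Pr[Binom(t, 1−p) < r], an interpretation consistent with the text's properties (β_p non-increasing in t, β_p(t, r) = 1 iff t < r, and β_p(−1, ·) = 1).

```python
from math import comb

def beta(p, t, r):
    # Pr[Binom(t, 1-p) < r]; by convention beta(p, -1, r) = 1, the upper
    # bound of beta, which acts as a barrier against negative t_b
    if t < r:  # covers t = -1 as well
        return 1.0
    return sum(comb(t, i) * (1 - p)**i * p**(t - i) for i in range(r))

def correct(p, t, r):
    """Greedily move one recoded packet per step until the optimality
    condition of Theorem 4, max beta(t_b, r_b) <= min beta(t_b - 1, r_b),
    holds. `t` and `r` map batch -> packet count / rank."""
    L = list(t)
    while True:
        kappa = max(L, key=lambda b: beta(p, t[b], r[b]))
        rho = min(L, key=lambda b: beta(p, t[b] - 1, r[b]))
        if beta(p, t[kappa], r[kappa]) <= beta(p, t[rho] - 1, r[rho]):
            return t  # no strictly improving step remains
        t[kappa] += 1  # marginal gain (1-p) * beta(t_kappa, r_kappa)
        t[rho] -= 1    # marginal loss (1-p) * beta(t_rho - 1, r_rho)
```

A batch with t_b = 0 yields the key β_p(−1, r_b) = 1 in the minimization, so it is never decremented, exactly as the barrier argument above describes.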

D. Solution Characteristic
In this subsection, we discuss an observation inspired by Algorithm 1. Consider a block L.
Let B_r = {b ∈ L : r_b = r} be the set containing all batches with rank r in the block. It is easy to see that ∪_{r=0}^{M} B_r = L and B_α ∩ B_β = ∅ for all distinct α, β ∈ {0, 1, ..., M}.
Suppose after some iterations, we have t_b = t_{b′} for all b, b′ ∈ B_r, r = 0, 1, ..., M. This condition holds for the subproblem (B^{(k_0)}) where k_0 = Σ_{b∈L} r_b. The current iteration selects a batch b ∈ arg max_{b′∈L} β_p(t_{b′}, r_{b′}). Observe that actually all batches in B_{r_b} are in this arg max set.
That is, the algorithm can select another batch in B_{r_b} in the next iteration, and so on.
This suggests that the difference between every pair t_b, t_{b′} with b, b′ ∈ B_r is at most 1, for all r = 0, 1, ..., M. Also, we can select all the batches in B_r before considering another rank, which suggests that the difference of 1 mentioned above occurs for at most one r. The following theorem formally states this observation. Corollary 3 below responds to the discussion in Section II-C.

E. Construction of the Lookup Table
In the algorithms, we assume that we have a lookup table for the function β_p(·, ·) so that we can query its values quickly. As shown in (6), we can view our problem as accessing the values of a regularized incomplete beta function I_p(·, ·). Most available implementations accept non-negative real parameters and compute different queries independently. This generality is unnecessary for our application, as we only need to query the integral points efficiently. In this subsection, we propose an on-demand approach to construct the lookup table.
Being a dynamic programming approach, we need the following recursive relations: β_p(t, r) = β_p(t, r − 1) + B_p(t, r − 1) for 1 < r ≤ t + 1, (10), together with the companion relation (11) for B_p.

[Fig. 9: the first stage of the table generation. The 1s and 0s paddings are generated first. The solid and dashed arrows represent (10) and (11) respectively.]

From Fig. 9, we know that we have to calculate all rows of β_p(t, r) for t ≤ t_b. Also, since the recursive relations on a row only depend on the previous row, we prepare the values of B_p in the next row ahead of time so that we have the values to compute β_p in the next row. As an example, Fig. 10 illustrates the values we have, where R is the number of rows we want to construct. As restricted by the block size, we know that R ≤ t_max^L. The worst case is that we only receive one rank-M batch for the whole block, which is unlikely to occur. In that case, we have the worst-case complexity O(M t_max^L). Note that we can use fixed-point numbers instead of floating-point numbers for a more efficient implementation.
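The row-by-row construction can be sketched as follows. This sketch builds the whole table eagerly rather than on demand, and it assumes the companion relation (11) is the standard binomial pmf recursion B_p(t+1, r) = (1−p)·B_p(t, r−1) + p·B_p(t, r), which is not reproduced verbatim in the text; `beta[t][r]` is taken to be the partial sum Pr[Binom(t, 1−p) < r].

```python
def build_tables(p, T, M):
    """Build beta[t][r] = Pr[Binom(t, 1-p) < r] for 0 <= t <= T and
    0 <= r <= M + 1, preparing the pmf row B_p for t+1 before moving on."""
    B = [[0.0] * (M + 2) for _ in range(T + 1)]     # B[t][r]: binomial pmf
    beta = [[0.0] * (M + 2) for _ in range(T + 1)]
    B[0][0] = 1.0                                   # Binom(0, 1-p) is surely 0
    for t in range(T + 1):
        beta[t][0] = 0.0                            # 0s padding: Pr[X < 0] = 0
        for r in range(1, M + 2):
            # 1s padding when t < r; otherwise relation (10):
            # beta(t, r) = beta(t, r-1) + B(t, r-1)
            beta[t][r] = 1.0 if t < r else beta[t][r - 1] + B[t][r - 1]
        if t < T:
            # assumed pmf recursion (Pascal-style) for the next row
            for r in range(M + 2):
                prev = B[t][r - 1] if r > 0 else 0.0
                B[t + 1][r] = (1 - p) * prev + p * B[t][r]
    return beta
```

Each row costs O(M) work and only the previous pmf row is needed, matching the O(M t_max^L) worst case noted above.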

F. Throughput Evaluations
We now evaluate the performance of BAR in a feedbackless multi-hop network. Note that baseline recoding is a special case of BAR with block size 1. Let (h_0, h_1, ..., h_M) be the rank distribution of the batches arriving at a network node. The normalized throughput at a network node is defined as the average rank of the received batches divided by the batch size, i.e., Σ_{i=0}^{M} i·h_i / M. In the evaluations of this subsection, we set t_max^L = M|L| for every block L. That is, the source node transmits M packets per batch. We assume that every link in the line network has independent packet loss with the same packet loss rate.
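The normalized throughput defined above is a one-line computation; a minimal sketch (the function name is ours):

```python
def normalized_throughput(h, M):
    """Average rank of the received batches divided by the batch size,
    i.e., sum_{i=0}^{M} i * h_i / M, where h is the rank distribution."""
    return sum(i * hi for i, hi in enumerate(h)) / M
```

For example, a uniform rank distribution over {0, ..., M} gives a normalized throughput of (M+1)/(2M).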
We first evaluate the normalized throughput with different batch sizes and packet loss rates. In other words, Fig. 11 shows the best possible throughput of adaptive recoding. We will compare the effect of block sizes later. We observe that 1) adaptive recoding has a higher throughput than baseline recoding under the same setting; and 2) the difference in throughput between adaptive recoding and baseline recoding is larger when the batch size is smaller, the packet loss probability is larger, or the length of the line network is longer.
In terms of throughput, regarding the percentage gains of adaptive recoding over baseline recoding with different block sizes, we observe that 2) using |L| = 2 already gives a much larger throughput than using |L| = 1; and 3) using |L| > 8 gives little extra gain in terms of throughput.
Next, we show the performance of the equal opportunity approximation scheme. Fig. 13 compares the normalized throughput achieved by Algorithm 2 (AS) with the true optimal throughput (AR). We also include the best possible throughput of adaptive recoding here. We observe that 1) the approximation is close to the optimal solution; and 2) the gap in normalized throughput is smaller when the batch size is larger, the packet loss probability is smaller, or the length of the line network is shorter.

IV. IMPACT OF INACCURATE CHANNEL MODELS
In this section, we first demonstrate that the throughput of BAR is insensitive to inaccurate channel models and inaccurate packet loss rates. Then, we investigate the feedback design and show that feedback can further enhance the throughput slightly.

A. Sensitivity of β_p(t, r)
As our algorithms only depend on the order of the values of β_p(·, ·), it is possible that the optimal {t_b}_{b∈L} for an incorrect p is the same as the one for the correct p. As shown in Fig. 3, the boundary 0s and 1s are not affected by p ∈ (0, 1). That is, we only need to consider the remaining entries. We can also check the condition number [34] to verify the stability. We calculate some condition numbers of β_p(t, r) in Fig. 15 by the formula stated in Theorem 9.
Theorem 9. Let p ∈ (0, 1) and t ≥ r > 0. The condition number of β_p(t, r) with respect to p is
Proof: See Appendix K.

B. Impact of Inaccurate Channel Models
To demonstrate the impact of inaccurate channel models, we consider three different channels to present our observations.
• ch1: independent packet loss with constant loss rate p = 0.45.
• ch2: burst packet loss modelled by the GE model illustrated in Fig. 2 with the parameters used in [26], i.e., p_GB = p_BG = p_G = 0.1 and p_B = 0.8.

This suggests that, in terms of throughput, BAR is not sensitive to p. Even with a wild guess of p, BAR still outperforms baseline recoding, as illustrated by the green curves. Regarding ch2, we also plot the orange curve with the legend GE BAR, which is the throughput achieved by BAR with (E-GE). We can see that the gap between the throughputs achieved by BAR with (E-indep) and with (E-GE) is very small. As a summary of our demonstration: 1) We can use BAR with (E-indep) for bursty channels, and the loss in throughput is not significant.
2) BAR with an inaccurate constant p can achieve a throughput close to the one when we have the exact real-time loss rate.
3) We can see a significant throughput gain over baseline recoding by using BAR, even with inaccurate channel models.

C. Feedback Design
Although an inaccurate p can give an acceptable throughput, we can further enhance the throughput by adapting to the varying p. To achieve this goal, we need feedback.
We adopt a simple feedback strategy which lets the next node return the number of received packets of the batches, so that the current node can estimate p. Although the next node does not know the number of lost packets per batch, it knows the number of received packets per batch. Hence, we do not need to introduce more overhead into the packets transmitted by the current node.
To estimate p, we have to know the number of packets lost during a certain time frame. If the time frame is too short, the estimation is too sensitive, so the estimated p changes rapidly and unpredictably. If the time frame is too long, we capture too much outdated information about the channel, so the estimated p changes too slowly and may fail to track the real loss rate. Recall that the block size is not large, as we want to keep the delay small. We therefore use a block as an atomic unit of the time frame. The next node gives feedback on the number of received packets per block, and the current node uses the feedback of the blocks in the time frame to estimate p. We perform one estimation of p per received feedback. This way, the estimated p is the same within each block, so we can apply BAR with (E-indep).
If the feedback is sent via a reliable side channel, then we can assume that the current node always receives the feedback. However, if the feedback is sent via an unreliable channel, say, the reverse direction of the same channel through which the data packets were sent, then we need to consider feedback loss. Let Λ be the set of blocks in a time frame whose feedback was received. We handle feedback loss by treating the total number of packets transmitted for the blocks in Λ as the total number of packets transmitted during the time frame. This way, we can also start the estimation before a node has sent enough blocks to fill up a time frame. If no feedback is received for any block in a time frame, we reuse the previously estimated p for BAR.
At the beginning of the transmission, we have no feedback yet, so we have no information to estimate p. To outperform baseline recoding without knowledge of p, we can use the approximation of BAR given by Algorithm 2. Once we have received at least one feedback, we can start estimating p.

D. Estimators
Let x and n be the total number of packets received by the current node and the total number of packets transmitted by the previous node, respectively, in a time frame of observation. The maximum likelihood estimator is then p̂_MLE = (n − x)/n. The Bayesian estimator p̂_Bayes is the posterior mean with s = n − x and f = x, which is (γa + n − x)/(γ(a + b) + n). To prevent a bias when there are not enough samples, we should select a non-informative prior as the initial hyperparameters.
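The two estimators stated above are direct formulas; a minimal sketch (function names are ours, and the method-of-moments estimator p̂_MM is omitted here because its formula is not reproduced in the text):

```python
def p_mle(x, n):
    """Maximum likelihood estimate of the loss rate: packets lost over
    packets sent within the observation time frame."""
    return (n - x) / n

def p_bayes(x, n, a, b, gamma):
    """Posterior mean of a Beta(a, b) prior whose hyperparameters are
    discounted by gamma per time frame, as in the text:
    (gamma*a + n - x) / (gamma*(a + b) + n)."""
    return (gamma * a + n - x) / (gamma * (a + b) + n)
```

With a non-informative prior such as a = b = 1 and no discounting (γ = 1), p_bayes is the usual Laplace-smoothed loss-rate estimate.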
We first show the estimation of p by the different schemes in Fig. 17. We use BAR with (E-indep) and |L| = M = 4. The size of the time frame is W blocks. For p̂_MLE and p̂_MM, the observations in the whole time frame have the same weight. For p̂_Bayes, the effect of each observation decreases exponentially fast. We consider an observation to be out of the time frame when it is scaled down to 10% of its original value; that is, we define the scaling factor by γ = 0.1^{1/W}. In each subplot, the black curve is the real-time p. The red and blue curves are for the estimation without and with feedback loss respectively. In each case, the two curves are the 25th and 75th percentiles from 1000 runs.
We can see that a larger W responds more slowly to the change of p in ch3. Among the estimators, p̂_Bayes has the fastest response, as its observations within a time frame are not equally weighted. Also, although ch1 and ch2 have the same average loss rate, the estimation has a larger variance when the channel is bursty.

E. Throughput Evaluations
As discussed in Section IV-B, the p we guess does not have a significant impact on the throughput. We now show the throughput achieved by the estimation schemes in Fig. 18.
Since we are no longer making a wild guess of p, it is not surprising that we can achieve nearly the same throughput as when we know the real p for ch1 and ch2. Looking closely at Fig. 16, we can see that for ch3 there is a small gap between the throughput of BAR with the real-time p and that of BAR with a constant p. Although the estimation may not be accurate at all times, we can now adapt to the change of p, so we eventually achieve a throughput nearly the same as when we know the real-time p. On the other hand, regardless of whether feedback is lost or not, the plots shown in Fig. 18 are basically the same.

V. CONCLUDING REMARKS
In this paper, we proposed blockwise adaptive recoding (BAR), which can adapt to variations in the incoming channel condition. From a practical perspective, we discussed how to calculate the components of BAR and how to solve BAR efficiently. We also investigated the impact of an inaccurate channel model on the throughput achieved by BAR. Our evaluations showed that 1) BAR is not sensitive to the channel model: even a wild guess of the loss rate can still outperform baseline recoding.
2) For bursty channels, the throughput achieved by BAR with an independent loss model is nearly the same as that with the real channel model. That is, we can use the independent loss model for BAR in practice and apply the techniques in this paper to reduce the computational cost of BAR.
3) Feedback can slightly enhance the throughput for channels with a dynamic loss rate. On the other hand, feedback loss barely affects the throughput of BAR, so we can send the feedback through a lossy channel without retransmission. Unless an accurate estimated loss rate is needed for other applications, we can use MLE with a small time frame for BAR to reduce the computational time.
These encouraging results suggest that BAR is suitable to be deployed in real-world applications.
APPENDIX A DISCUSSION ON ALGORITHM 2

A. Linear Time Selection
Here we discuss how to add one to the number of recoded packets for the batches having the highest ranks in O(|L|) time. A linear worst-case time can be achieved by using introselect [37] or quickselect [38] with the median-of-medians [39] pivot strategy. We use the selection algorithm to find the r-th largest element, and we also make use of its intermediate steps.
During an iteration, one of the following three cases occurs. If the algorithm decides to search the part larger than the pivot, then the discarded part does not contain the largest r elements. If the part smaller than the pivot is selected, then the discarded part belongs to the largest r elements. If the pivot is exactly the r-th largest element, then the part larger than the pivot, together with the pivot, belongs to the largest r elements.

B. Counting Technique
Now we discuss how to find the batches having the highest ranks in O(|L| + M) time. It can be done by using part of the counting sort algorithm [40]. We first compute the histogram of the ranks of the batches in the block.

C. Performance Guarantee and Bounded Error
We start the discussion with the following theorem.
Theorem 10. Let SOL and OPT be the objective value of the solution given by Algorithm 2 and the optimal value of (B) respectively. Then SOL ≥ (1 − p) OPT.
Proof: We first show that the algorithm has a relative performance guarantee factor of 1 − p.
As stated in Theorem 6, when t_max^L ≤ Σ_{b∈L} r_b, the algorithm guarantees an optimal solution. So, we only consider t_max^L > Σ_{b∈L} r_b. Let {t_b}_{b∈L} be the approximation given by the algorithm. Note that linear combinations of r independent vectors cannot yield more than r independent vectors. So, the expected rank of a batch at the next hop must be no larger than the rank of the batch at the current hop, and it is also non-negative, as stated in (13). This gives the bound (14) on the optimal value. We then consider the exact formula of the approximation, where
• (15) is stated in Lemma 3(b);
• (16) holds as β_p(j, r_b) ≥ 0 for all j and r_b, which is by (3);
• (17) follows the inequality (14).
Lastly, we show the bounded error. Let {t*_b} be a solution to (B). By Corollary 2, we can write it accordingly. Note that the constraint of (B), i.e., Σ_{b∈L} t*_b = t_max^L, suggests (18). On the other hand, it is easy to see that the approximation must give either of the forms in (19). We consider the difference between OPT and SOL, where
• (20) is the difference between the exact form of OPT by Lemma 3(b) after substituting the lower bound of SOL shown in (19);
• the condition on b in the summation of (21) can be removed;
• (22) follows (18) and the fact, shown in (3), that the extra β_p(j, r_b) terms are non-negative.
The proof is done.
If the relative performance guarantee factor 1 − p were tight, both equalities in (16) and (17) would need to hold. First, by (3), we know that β_p(j, r_b) is always non-negative. The equality in (16) holds if and only if Σ_{j=r_b}^{t_b−1} β_p(j, r_b) = 0 for all b ∈ L. The sum equals 0 only when
• r_b = 0 and t_b ≥ 0, according to (4); or
• t_b − 1 < r_b, which forms an empty sum.
The equality in (17) holds if and only if OPT = Σ_{b∈L} E(r_b, t*_b) = Σ_{b∈L} r_b. Note that by (13), E(r_b, t*_b) equals r_b if and only if r_b = 0, as we assumed 0 < p < 1 in this paper. By Lemma 1, E(r_b, t) is a monotonically increasing function of t for all r_b ≥ 0. So when r_b = 0, we need t*_b > r_b, which implies that t_max^L > Σ_{b∈L} r_b. Then, the approximation will also give t_b > r_b for some b ∈ L in this case, and the equality in (16) does not hold.
That is, we have SOL = (1 − p)OPT only when r_b = 0 for all b ∈ L, in which case SOL = OPT = 0. In practice, the probability of having r_b = 0 for all b ∈ L is very small. So, the bound is not tight in most cases, but it guarantees that the approximation is good when the packet loss probability is small.

APPENDIX B LAZY EVALUATIONS IN ALGORITHM 3
In Algorithm 3, we need to query the minimum of β_p(t_a − 1, r_a) and the maximum of β_p(t_b, r_b), where a, b ∈ L. During an iteration, suppose we choose to increase t_b by 1 and decrease t_a by 1. It is clear that we need to decrease the key β_p(t_b, r_b) to β_p(t_b + 1, r_b) in the max-heap, and increase the key β_p(t_a − 1, r_a) to β_p(t_a − 2, r_a) in the min-heap. However, we can omit the updates for batch a in the max-heap and batch b in the min-heap, as discussed below.
Lemma 8. If the batch a is selected by the max-heap or the batch b is selected by the min-heap in any future iteration, then the optimal solution is reached.
Proof: Suppose the batch a with key A is selected by the max-heap in a future iteration. Note that A was once the smallest element in Ω^k for some k. So at the current state Ω^{k′} with k′ > k, every element in Ω^{k′} must be no smaller than A. Equivalently, we have (1 − p)β_p(t_κ, r_κ) ≤ (1 − p)β_p(t_ρ − 1, r_ρ) for all κ, ρ ∈ L. By Theorem 4, the optimal solution is reached. The min-heap counterpart can be proved in a similar fashion.
Suppose we omit the update for the batch ρ in a heap. We then call the key of the batch ρ a corrupted key, or say that the key of the batch ρ is corrupted. A key which is not corrupted is called an uncorrupted key, and a heap with corrupted keys is called a corrupted heap.5 In our scenario, described at the beginning of this section, the key of a batch is corrupted in the corrupted max-heap if and only if the same batch was once the minimum of the counterpart min-heap, and vice versa.
Lemma 9. If the root of a corrupted heap is a corrupted key, then the optimal solution is reached.
Proof: We only consider a corrupted max-heap in the proof. We can use similar arguments to show that a corrupted min-heap also works.
In a future iteration, suppose the batch a is selected by the corrupted max-heap. We consider the real maximum in the original max-heap. There are three cases.
Case I: the batch a is also the root of the original max-heap. As the key of a is corrupted, it means that the batch was once selected by the counterpart min-heap. By Lemma 8, the optimal solution is reached.
Case II: the root of the original max-heap is a batch a′ whose key is also corrupted. Similar to Case I, the batch a′ was once selected by the counterpart min-heap, and we can apply Lemma 8 to finish this case.
Case III: the root of the original max-heap is a batch a′ whose key is not corrupted.
In this case, the uncorrupted key of a′ is also in the corrupted max-heap. Note that the corrupted key of a is no larger than the actual key of a in the original max-heap. This means that the key of a′, the actual key of a, and the corrupted key of a all have the same value. It is then equivalent to letting the original max-heap select the batch a, as every element in Ω^k must be no smaller than the key of a′, where k represents the state of the current iteration. Then, the problem is reduced to Case I.
Combining the three cases, the proof is done.

5 We do not have a guaranteed maximum portion of corrupted keys as an input. Also, we do not adopt the carpooling technique. This suggests that the heap here is not a soft heap [41].

Theorem 11. The updates for the batch a in the max-heap and the batch b in the min-heap can be omitted.
Proof: When we omit the updates, the heap becomes a corrupted heap. We have to make sure that when a batch with a corrupted key is selected, the termination condition of the algorithm is also met.
We can express the key of a batch π in the corrupted max-heap and min-heap as β_p(t_π + s_π, r_π) and β_p(t_π − 1 − u_π, r_π) respectively, where s_π, u_π are non-negative integers. When s_π or u_π is 0, the corresponding key is uncorrupted. By Lemma 2(b), we have β_p(t_π + s_π, r_π) ≤ β_p(t_π, r_π). That is, the root of the corrupted max-heap is no larger than the root of the original max-heap.
Similarly for the min-heap; mathematically, we have (23) and (24). Suppose a corrupted key is selected. By Lemma 9, we know that the optimal solution is reached. So, we can apply the contrapositive of Theorem 4 and conclude that (25) holds for all κ, ρ ∈ L. We can omit the condition t_ρ ≥ 1 because, by (3) and (5), we have β_p(−1, ·) = 1 ≥ β_p(·, ·). The inequality (25) can be combined with (23) and (24) to show that, when a corrupted key is selected, the termination condition shown in Algorithm 3, after replacing the heaps with corrupted heaps, is met.
We just showed that once a corrupted key is selected, the termination condition is reached. In other words, before a corrupted key is selected, every previous selection must be an uncorrupted key; that is, the details inside the iterations are not affected. If an uncorrupted key is selected and it also satisfies the termination condition, then no corrupted key is ever touched, and the corrupted heap acts as a normal heap throughout.
The correctness of the algorithm when we use a corrupted heap is thus proved. Moreover, we do not need to mark down which key is corrupted, so we can simply omit the mentioned updates as lazy evaluations. As there are two heaps in the algorithm, this reduces four heap updates per iteration to two.
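One possible reading of this lazy scheme in Python, using the standard `heapq` module: each iteration refreshes only the max-heap key of the incremented batch and the min-heap key of the decremented batch, leaving the other two keys corrupted, and the termination test is run on the (possibly corrupted) roots. This is an illustrative sketch, not the paper's listing; `beta` implements β_p(t, r) as the binomial partial sum Pr[Binom(t, 1−p) < r] with the barrier β_p(−1, ·) = 1.

```python
import heapq
from math import comb

def beta(p, t, r):
    # Pr[Binom(t, 1-p) < r], with beta(p, -1, r) = 1 as a barrier value
    if t < r:
        return 1.0
    return sum(comb(t, i) * (1 - p)**i * p**(t - i) for i in range(r))

def correct_lazy(p, t, r):
    """Greedy correction with lazy (corrupted) heap keys: only two of the
    four key updates per iteration are performed, as argued in Appendix B."""
    maxh = [(-beta(p, t[b], r[b]), b) for b in t]      # max-heap via negation
    minh = [(beta(p, t[b] - 1, r[b]), b) for b in t]
    heapq.heapify(maxh)
    heapq.heapify(minh)
    while -maxh[0][0] > minh[0][0]:    # termination test on the roots
        kappa, rho = maxh[0][1], minh[0][1]
        t[kappa] += 1
        t[rho] -= 1
        # two updates instead of four: kappa's min-heap key and rho's
        # max-heap key are left corrupted
        heapq.heapreplace(maxh, (-beta(p, t[kappa], r[kappa]), kappa))
        heapq.heapreplace(minh, (beta(p, t[rho] - 1, r[rho]), rho))
    return t
```

Each `heapreplace` pops the root and pushes the refreshed key, so every batch keeps exactly one entry per heap while the stale entries play the role of corrupted keys.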

APPENDIX C LINEAR PROGRAMMING FORMULATION OF BAR
In [31], a distributionally robust optimization [42] formulation of adaptive recoding is given as a linear programming problem. It is based on the observation that when the expected rank function E(r, t) is concave with respect to t, we can reformulate it as E(r, t) = min_{i∈{0,1,...,ī}} (∆_{r,i} t + ξ_{r,i}) if we fix an artificial upper bound t ≤ ī, where ∆_{r,i} := E(r, i + 1) − E(r, i) and ξ_{r,i} := E(r, i) − i∆_{r,i}. In (B), we implicitly have t ≤ t_max^L, so we can make use of this expression to write (B) as

max Σ_{b∈L} e_b subject to e_b ≤ ∆_{r_b,i} t_b + ξ_{r_b,i} for all b ∈ L and i ∈ {0, 1, ..., t_max^L}, and Σ_{b∈L} t_b = t_max^L,

where t_b is allowed to be a non-integer. A non-integral t_b means that we first generate ⌊t_b⌋ recoded packets, and then generate one more recoded packet with probability t_b − ⌊t_b⌋. Note that there are |L| t_max^L constraints on e_b. To turn such a non-deterministic solution into a deterministic one, we perform the following steps: 1) Collect the batches having a non-integral number of recoded packets into a set S.
2) Compute R = Σ_{b∈S} (t_b − ⌊t_b⌋). Note that R must be an integer.
3) For every b ∈ S, remove the fractional part of t_b.
4) Randomly select R batches from S and add one recoded packet to each of these batches.
We have an integer R because Σ_{b∈L} t_b = t_max^L. Also, we have R < |S|. Referring to the idea of Algorithm 1, we have the same value of ∆_{r_b, t_b} for all b ∈ S. After removing the fractional parts of t_b for all b ∈ S, the problem becomes the subproblem (B^{(k)}) with k = t_max^L − R. The last step follows Algorithm 1, so the output is a solution to (B) in which t_b is an integer for every b ∈ L.
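The four rounding steps above can be sketched as follows (a hedged illustration; the function name and input shape are ours):

```python
import math
import random

def integralize(t, seed=None):
    """Turn the fractional LP solution into an integral one following
    steps 1)-4). `t` maps batch -> (possibly fractional) packet count."""
    rng = random.Random(seed)
    S = [b for b in t if t[b] != math.floor(t[b])]       # 1) fractional batches
    R = round(sum(t[b] - math.floor(t[b]) for b in S))   # 2) R is an integer
    for b in S:                                          # 3) drop fractional parts
        t[b] = math.floor(t[b])
    for b in rng.sample(S, R):                           # 4) one more packet each
        t[b] += 1
    return t
```

The sum Σ_b t_b is preserved, so the rounded solution stays on the constraint Σ_{b∈L} t_b = t_max^L.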

APPENDIX D BAR WITH KNOWN AND UNCHANGED INCOMING LINK CONDITION
Every batch arrives at the current node with a rank. The ranks of all the batches during the transmission form a rank distribution. As this distribution is for the input batches, we call it the input rank distribution.
If we know the input rank distribution, we can consider a block large enough to contain all the incoming batches. This has the following benefits: i) by Theorem 2, we can achieve the highest expected rank at the next node; ii) as there is only one block, we only need to solve (B) once; and iii) we can solve (B) before we receive a batch, i.e., we can decide the number of recoded packets as soon as we finish receiving a batch, so the delay is the same as that of a block of size 1.
We can calculate the input rank distribution before receiving all the batches if we know i) the packet loss probability of the incoming channel; ii) the input rank distribution at the previous node; and iii) the decision of the recoding scheme at the previous node. However, the previous node may not know its own input rank distribution, in which case it needs the same three pieces of information from its previous node. The problem recursively extends to the source node; that is, it becomes a centralized problem, which is not practical. On the other hand, it is also impractical to wait for all incoming batches, as the delay would be tremendous.
One solution is to use small blocks at the beginning and record the statistics of the ranks of the incoming batches. When the number of incoming batches is large enough, the empirical distribution formed by the collected statistics is close to the exact input rank distribution. Then, we can use the empirical input rank distribution to approximate the case of having received all the incoming batches.
According to Theorem 8, we can consider a solution such that |t_b − t_{b′}| ≤ 1 for all b, b′ ∈ B_r. The sum of the expected rank functions for the batches having rank R, together with the definitions that simplify its expression, then follows. In this case, we say that the batches having rank R transmit S + |B_R|/|B_R| recoded packets, although this number is a non-integral rational number. With this definition, we can denote by t_r the number of recoded packets to be transmitted for the batches having rank r.
Now, we can apply Theorem 8 to reformulate (B). We can rewrite the objective of (B) in terms of t_r, where t_r is a non-negative rational number. Theorem 8 also states that at most one r can have |t_b − t_{b′}| = 1 for some b, b′ ∈ B_r. This means that there is at most one non-integral t_r.
Similarly, we can rewrite the constraint. That is, we have the following optimization problem, denoted (IP), in which there is at most one non-integral t_r. When we have collected enough statistics, we can consider a block L which contains all the batches received so far. The ratio |B_r| : |L| can be used to approximate the portion of batches having rank r among all the incoming batches. Note that if we scale |B_r| for all r = 0, 1, ..., M in (IP), we also have to scale the value of t_max^L. As the scaling factor is a constant which can be moved out of the maximization, we can use (IP) for the block L as an approximation to the same problem for a block containing all the incoming batches. Now, suppose we have a set {t_r}_{r=0}^{M} solving (IP). When a new batch having rank R arrives, we immediately know that we should transmit t_R recoded packets if t_R is an integer. If t_R is not an integer, then we first transmit ⌊t_R⌋ recoded packets; after that, we have a chance of t_R − ⌊t_R⌋ to transmit one more recoded packet.
However, we still have the following issues: i) the algorithms in Section III take longer to solve the problem when |L| is large; and ii) a batch having rank R with |B_R| = 0 may arrive in the future, but we do not have a reasonable t_R for it.
Issue (i) is obvious, as the time complexity of every algorithm in Section III includes the term O(|L|). Issue (ii) may occur because we only have an empirical distribution. However, for (IP), |B_R| = 0 means that any value of t_R affects neither the optimal solution nor the constraint. So, we can actually provide a reasonable t_R, i.e., treat a batch of rank R as if it appears but contributes nothing to the optimization problem.
When t_max^L ≤ Σ_{r=0}^{M} r|B_r|, Lemma 5 tells us that every feasible solution satisfying t_b ≤ r_b for all b ∈ L solves (IP). Although an approach similar to that in Algorithm 1 can satisfy this requirement, it may leave some batches of non-zero rank transmitting nothing, which clearly cannot resolve issue (ii).
To handle this problem, we propose Algorithm 4, which acts similarly to a water-filling algorithm.7 The algorithm increases every t_r with t_r < r by 1 when there are enough unassigned timeslots for a full pass. When the remaining timeslots are not enough, it allocates them starting from the highest rank, M, in descending order. Fig. 19 illustrates the idea of this algorithm.
This assignment ensures that all ranks are considered, and we obtain a feasible solution satisfying t_r ≤ r for all r = 0, 1, ..., M. This is equivalent to t_b ≤ r_b for all b ∈ L, so the correctness is implied by Lemma 5. It is easy to see the worst-case time complexity of Algorithm 4. When t_max^L > Σ_{r=0}^{M} r|B_r|, we modify Algorithm 1 a bit to resolve both issues. First, we do not assign the timeslots one by one and batch by batch; we assign one timeslot to all batches at once. At last, if the remaining timeslots cannot fill a cell completely, we use up all the remaining timeslots to fill it partially. Note that the solution given by Algorithm 5 has at most one r such that t*_r is not an integer, so we have

7 Compared with the traditional water-filling algorithms used in communication systems, our water container is upside down: we have a flat bottom and an uneven ceiling.
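The pass-based filling with a final partial fill can be sketched as follows. This is only one interpretation of the description above (the paper's Algorithm 4/5 listings are not reproduced here): `sizes[r]` plays the role of |B_r|, a full pass raises every t_r with t_r < r by one, leftovers are handed out from rank M downward, and at most one rank receives a partial (fractional) fill.

```python
def water_fill(sizes, M, t_max):
    """Assign t_r (recoded packets per rank-r batch) with t_r <= r.
    sizes[r] = |B_r|; t_max is the timeslot budget for the block."""
    t = [0] * (M + 1)
    remaining = t_max
    while True:
        # cost of raising every unfinished rank by one packet per batch
        pass_cost = sum(sizes[r] for r in range(M + 1) if t[r] < r)
        if pass_cost == 0 or pass_cost > remaining:
            break
        for r in range(M + 1):
            if t[r] < r:
                t[r] += 1
        remaining -= pass_cost
    for r in range(M, -1, -1):             # leftovers, highest rank first
        if remaining <= 0:
            break
        if sizes[r] == 0:
            continue
        while t[r] < r and remaining >= sizes[r]:
            t[r] += 1                      # fill a whole cell
            remaining -= sizes[r]
        if t[r] < r and 0 < remaining < sizes[r]:
            t[r] += remaining / sizes[r]   # the single partial fill
            remaining = 0
    return t
```

Every rank is touched by the passes, including ranks with |B_r| = 0, which gives the "reasonable t_R" for issue (ii) at zero cost.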

Algorithm 4: Reasonable Solution for (IP) when t_max^L ≤ Σ_{r=0}^{M} r|B_r|
where the last equality follows the constraint of (IP).
Suppose we have a t_r which is not an integer for some r; then we need to decide whether to transmit ⌊t_r⌋ or ⌊t_r⌋ + 1 packets, for which we need a random number generator. If it is expensive to obtain a random number, we can sacrifice optimality by using a deterministic number of packets. As the number of timeslots is limited, we cannot send more packets than we are allowed to send. So, one choice is to drop the fractional part of t_r. This implies that {t_b}_{b∈L∪L′} is in the feasible region of (B′).
The sum of the expected ranks obtained by maximizing the two blocks separately is equal to the objective value of (B') with the feasible solution $\{t_b\}_{b \in L \cup L'}$. Therefore, the optimal value of (B') is larger than or equal to the sum of the expected ranks obtained by maximizing the two blocks separately.
Proof of (c): Case I: $t < r$. By (5), the equality always holds.
Case II: $t \ge r$. Recall that $\beta_p(t, r)$ is the partial sum of the probability mass of the binomial distribution $\mathrm{Binom}(t, 1-p)$. By summing one more term to obtain $\beta_p(t, r+1)$, the partial sum must be larger than or equal to $\beta_p(t, r)$. Note that the probability mass $b_p(t, i)$ is non-zero for all $0 \le i \le t$, so the equality holds if and only if $\beta_p(t, r) = 1$, which by (5) holds if and only if $t < r$.
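For concreteness, reading $\beta_p(t, r)$ as $\Pr[X \le r-1]$ for $X \sim \mathrm{Binom}(t, 1-p)$, which is consistent with the properties used in this proof (the formal definition is given elsewhere in the paper), the two facts of this case can be checked numerically:

```python
from math import comb

def beta(p, t, r):
    """Partial sum beta_p(t, r) = Pr[X <= r - 1], X ~ Binom(t, 1 - p).
    (Our reading of the definition; it makes beta = 1 exactly when t < r.)"""
    q = 1 - p
    return sum(comb(t, i) * q**i * p**(t - i) for i in range(min(t, r - 1) + 1))

p, r = 0.2, 3
for t in range(8):
    # (c): one more mass term can only grow the partial sum ...
    assert beta(p, t, r + 1) >= beta(p, t, r)
    # ... and the sum saturates at 1 exactly when t < r, since every
    # b_p(t, i) with 0 <= i <= t is strictly positive for 0 < p < 1.
    assert (abs(beta(p, t, r) - 1) < 1e-12) == (t < r)
```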
Proof of (e) and (f): Inductively by (b), we have $\beta_p(t_a + u, r_a) \le \beta_p(t_a, r_a) \le \beta_p(t_a - v, r_a)$ for all $a \in \Lambda$, where $u$ and $v$ are non-negative integers such that $t_a - v \ge -1$. By (3), the corresponding inequalities hold for all $a \in \Lambda$. Combining (28) and (29), the proof is done.
APPENDIX G

PROOF OF LEMMA 3

By Lemma 1, we have $E(r, t+1) = E(r, t) + (1-p)\beta_p(t, r)$. If $t < r$, we have $\beta_p(t, r) = 1$, so $E(r, t+1) = E(r, t) + (1-p)$. We can evaluate Lemma 1 recursively and obtain the first equality in (b).
By (a), we can show that when $t < r$, we have $E(r, t) = t(1-p)$.
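These identities can be sanity-checked numerically by taking $E(r, t)$ to be $\mathbb{E}[\min(X, r)]$ for $X \sim \mathrm{Binom}(t, 1-p)$, an interpretation consistent with the recursion in Lemma 1 (the paper's formal definition lives elsewhere):

```python
from math import comb

def beta(p, t, r):
    # Pr[X <= r - 1] for X ~ Binom(t, 1 - p)  (our reading of beta_p)
    q = 1 - p
    return sum(comb(t, i) * q**i * p**(t - i) for i in range(min(t, r - 1) + 1))

def exp_rank(p, r, t):
    # E(r, t) read as E[min(X, r)], X ~ Binom(t, 1 - p): the expected rank
    # after t transmissions over a loss-rate-p channel, capped at rank r.
    q = 1 - p
    return sum(min(i, r) * comb(t, i) * q**i * p**(t - i) for i in range(t + 1))

p, r = 0.3, 4
for t in range(10):
    # Lemma 1: E(r, t + 1) = E(r, t) + (1 - p) * beta_p(t, r)
    lhs = exp_rank(p, r, t + 1)
    rhs = exp_rank(p, r, t) + (1 - p) * beta(p, t, r)
    assert abs(lhs - rhs) < 1e-12
    if t < r:
        # Below the rank, each timeslot adds (1 - p) in expectation.
        assert abs(exp_rank(p, r, t) - t * (1 - p)) < 1e-12
```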
The above result contradicts the assumption that $\{t_b\}_{b \in L}$ solves (B), which gives that $t_m \le t_n$ for all $r_m < r_n$.
This also contradicts the assumption that $\{t_b\}_{b \in L}$ solves (B). So, we have $t_m = t_n$ for all $r_m < r_n$.
Combining the two cases, the proof is done.
APPENDIX I

PROOF OF THEOREM 4
We first prove the sufficient condition. If $\{t_b\}_{b \in L}$ does not solve (B), then there exists another configuration $\{t'_b\}_{b \in L}$ which gives a higher objective value. As $\sum_{b \in L} t_b = \sum_{b \in L} t'_b = t^L_{\max}$, there exist distinct $\kappa, \rho \in L$ such that $t'_\kappa > t_\kappa$ and $t'_\rho < t_\rho$. Note that $t'_\rho \ge 0$, so we must have $t_\rho \ge 1$. Define $\Theta = \{\kappa : t'_\kappa > t_\kappa\}$ and $\Phi = \{\rho : t'_\rho < t_\rho\}$. Using the fact that $\{t'_b\}_{b \in L}$ gives a larger objective value and by Lemma 3(b), we obtain (39). Now, we fix $\kappa, \rho$ such that $\kappa \in \arg\max_{\theta \in \Theta} \beta_p(t_\theta, r_\theta)$ and $\rho \in \arg\min_{\phi \in \Phi} \beta_p(t_\phi - 1, r_\phi)$.
Applying (39), we have $(1-p)\beta_p(t_\kappa, r_\kappa) > (1-p)\beta_p(t_\rho - 1, r_\rho)$, which proves the sufficient condition. Now we consider the necessary condition, where we have $(1-p)\beta_p(t_\kappa, r_\kappa) > (1-p)\beta_p(t_\rho - 1, r_\rho)$ for some distinct $\kappa, \rho \in L$. Let $t'_\kappa = t_\kappa + 1$, $t'_\rho = t_\rho - 1$, and $t'_b = t_b$ for all other $b \in L$. Then, by Lemma 1, the objective value of $\{t'_b\}_{b \in L}$ exceeds that of $\{t_b\}_{b \in L}$ by $(1-p)\beta_p(t_\kappa, r_\kappa) - (1-p)\beta_p(t_\rho - 1, r_\rho) > 0$. This means that $\{t_b\}_{b \in L}$ is not an optimal solution of (B).
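The exchange step can be checked numerically. With the same reading of $\beta_p$ and $E$ as before ($\Pr[X \le r-1]$ and $\mathbb{E}[\min(X, r)]$ for $X \sim \mathrm{Binom}(t, 1-p)$; our interpretation, not the paper's formal definitions), moving one timeslot from $\rho$ to $\kappa$ changes the objective by exactly $(1-p)\beta_p(t_\kappa, r_\kappa) - (1-p)\beta_p(t_\rho - 1, r_\rho)$:

```python
from math import comb

def beta(p, t, r):
    # Pr[X <= r - 1] for X ~ Binom(t, 1 - p)
    q = 1 - p
    return sum(comb(t, i) * q**i * p**(t - i) for i in range(min(t, r - 1) + 1))

def exp_rank(p, r, t):
    # E[min(X, r)] for X ~ Binom(t, 1 - p)
    q = 1 - p
    return sum(min(i, r) * comb(t, i) * q**i * p**(t - i) for i in range(t + 1))

p = 0.2
t_k, r_k = 2, 8   # batch kappa: far below its rank, so beta is close to 1
t_r, r_r = 6, 2   # batch rho: far above its rank, so beta(t_r - 1, r_r) is small

gain = exp_rank(p, r_k, t_k + 1) - exp_rank(p, r_k, t_k)   # kappa gains a slot
loss = exp_rank(p, r_r, t_r) - exp_rank(p, r_r, t_r - 1)   # rho loses a slot
# Lemma 1 identifies these marginal changes with (1 - p) * beta_p(., .).
assert abs(gain - (1 - p) * beta(p, t_k, r_k)) < 1e-12
assert abs(loss - (1 - p) * beta(p, t_r - 1, r_r)) < 1e-12
assert gain > loss   # the swap strictly increases the total expected rank
```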

APPENDIX J

PROOF OF THEOREM 8
We only consider Algorithm 1 in this proof.
We first consider the case that t L max > b∈L r b . Note that (B) is the same as (B (t L max ) ). We are going to prove the following proposition by induction: there exists a set of {t b } b∈L solving (B) such that Recall that after an iteration, the set of {t b } b∈L solves (B (k+1) ). Case A and B imply that the proposition is true for the subproblem (B (k+1) ). By induction, the proof for the case t L max > b∈L r b is done. Now, we consider the case that t L max ≤ b∈L r b . Suppose we have sorted the batches by their ranks in ascending order, and the foreach loop in Algorithm 1 follows the rank in ascending order. Then, the output of the algorithm is in the form described below. There exists a rank R such that where 0 < t < r b and there is at most one b which have t b = t.
It is obvious that $|t_b - t_{b'}| = 0$ for all $b, b' \in B_r$ where $r \in \{0, 1, \ldots, M\} \setminus \{R\}$. Now, we consider the batches in $B_R$. We are going to redistribute the assigned timeslots.
Let $T = \sum_{b \in B_R} t_b$, which is the total number of timeslots assigned to the batches having rank $R$. We can reassign the timeslots by the following steps: 1) assign $\lfloor T/|B_R| \rfloor$ timeslots to each batch in $B_R$; 2) select $T \bmod |B_R|$ distinct batches from $B_R$, and assign one more timeslot to each of them.
It is easy to see that the above assignment
• uses up all the $T$ timeslots; and
• does not assign more than $R$ timeslots to any batch in $B_R$.
That is, the solution is feasible, which is also optimal by Lemma 5. Also, we have the condition that $|t_b - t_{b'}| \le 1$ for all $b, b' \in B_r$, $r = 0, 1, \ldots, M$, and there is at most one $r$ which can achieve the equality. The proof is done.
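The two reassignment steps above amount to an even split (an illustrative sketch; the helper name is ours):

```python
def redistribute(T, num_batches):
    """Spread T timeslots over the num_batches batches of rank R as evenly
    as possible: each gets floor(T / n), and T mod n of them get one extra,
    so any two assignments differ by at most 1."""
    base, extra = divmod(T, num_batches)
    return [base + 1] * extra + [base] * (num_batches - extra)

t = redistribute(7, 3)        # e.g. 7 timeslots over 3 rank-R batches
assert sum(t) == 7            # step 1 + step 2 use up all T timeslots
assert max(t) - min(t) <= 1   # assignments within B_R differ by at most 1
```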