An Umbrella Converse for Data Exchange: Applied to Caching, Computing, and Shuffling

The problem of data exchange between multiple nodes with storage and communication capabilities models several current multi-user communication problems like Coded Caching, Data Shuffling, Coded Computing, etc. The goal in such problems is to design communication schemes which accomplish the desired data exchange between the nodes with the optimal (minimum) amount of communication load. In this work, we present a converse to such a general data exchange problem. The expression of the converse depends only on the number of bits to be moved between different subsets of nodes, and does not assume anything further specific about the parameters in the problem. Specific problem formulations, such as those in Coded Caching, Coded Data Shuffling, and Coded Distributed Computing, can be seen as instances of this generic data exchange problem. Applying our generic converse, we can efficiently recover known important converses in these formulations. Further, for a generic coded caching problem with heterogeneous cache sizes at the clients with or without a central server, we obtain a new general converse, which subsumes some existing results. Finally we relate a “centralized” version of our bound to the known generalized independence number bound in index coding and discuss our bound’s tightness in this context.


Introduction and Main Result
Consider a system of K nodes, denoted by $[K] \triangleq \{1, \ldots, K\}$, each of which has (not necessarily uniform) storage. The nodes can communicate with each other through a noiseless bus link, in which the transmissions of any node are received by all others. Each node possesses a collection of data symbols (represented in bits) in its local storage and demands another set of symbols present in other nodes. We formalize this as a data exchange problem.

Definition 1.
A data exchange problem on a set of K nodes involving a collection $B$ of information bits is given by the following:
• a collection $\{C_i : i \in [K]\}$, where $C_i \subset B$ denotes the subset of data present in node $i$,
• a collection $\{D_i : i \in [K]\}$, where $D_i \subset \bigcup_{j \neq i} C_j \setminus C_i$ denotes the set of bits demanded by node $i$.
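The condition in Definition 1 can be checked mechanically. The following is a minimal sketch, assuming bits are represented as integer labels and nodes as dictionary keys; the function name and the toy instance are hypothetical and only illustrate the definition.

```python
from typing import Dict, Set


def is_valid_data_exchange(storage: Dict[int, Set[int]],
                           demands: Dict[int, Set[int]]) -> bool:
    """Check Definition 1: each D_i lies in (union of C_j, j != i) minus C_i."""
    for node, demanded in demands.items():
        # Collect every bit stored at some node other than `node`.
        held_elsewhere = set()
        for other, content in storage.items():
            if other != node:
                held_elsewhere |= content
        # Demanded bits must exist elsewhere and be absent locally.
        if not demanded <= held_elsewhere - storage[node]:
            return False
    return True


# Hypothetical toy instance: 3 nodes, 6 bits labelled 0..5.
storage = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {4, 5, 0}}
demands = {1: {3, 5}, 2: {0, 5}, 3: {1, 2}}
valid = is_valid_data_exchange(storage, demands)
print(valid)
```

A demand for a bit that the node already stores, or for a bit stored nowhere, makes the instance invalid under this check.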
The above data exchange problem models a number of cache-enabled multi-receiver communication problems studied recently in the coding theory community, including Coded Caching [1], Coded Distributed Computing [2,3], Coded Data Shuffling [4][5][6], and Coded Data Rebalancing [7]. In [8], a special case of our general problem here was considered in the name of cooperative data exchange, where the goal was to reach a state in which all nodes have all the data in the system.

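A communication scheme for a data exchange problem consists of encoding functions Φ, with which each node encodes its stored bits into a transmission, and decoding functions Ψ, with which each node recovers its demanded bits from its own storage and the received transmissions. The communication load $L(\Phi, \Psi)$ of the scheme is the total number of bits transmitted by all the nodes, and the optimal communication load is $L^* \triangleq \min_{\Phi, \Psi} L(\Phi, \Psi)$.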
The central result in this work is Theorem 1 in Section 1.1, which is a lower bound on the optimal communication load L * . Using this lower bound, we recover several important converse results of cache-enabled communication problems studied in the literature, including Coded Caching (Section 2), Data Shuffling (Section 3), and Distributed Computing (Section 4). In each of these sections, we briefly review each setting and then apply Theorem 1 to recover the respective converses. As a result, the proofs of these existing converses are also made simpler than what is already available in the literature for the respective settings. The generic structure of the converse proofs obtained using our data exchange bound is presented in Section 1.2. This structure includes three steps, which we also highlight at the appropriate junctures within the proofs themselves. The close relationship between these problems is quite widely known. This work gives a further formal grounding to this connection, by abstracting the common structure of these converses into a general form, which can potentially be applied to other new data exchange problems as well.
Apart from recovering existing results, more importantly we also use our data exchange lower bound to obtain new tight converse results for some settings, while improving tightness results of some known bounds. Specifically, we present a new converse for a generic coded caching setting with multi-level cache sizes. Using this, we are able to close the gap to optimality for some known special cases of this generic setting (Section 2.1). In Section 5, we show the relationship between a "centralized" version of our data exchange lower bound and an existing bound for index coding known as the α-bound or the generalized independence number bound [9]. In general, we find that our bound is weaker than the α-bound. However, for unicast index coding problems, we identify the precise conditions under which our data exchange bound is equal to the α-bound. In Section 6, we discuss the application of our data exchange lower bound to more generalized index coding settings, specifically distributed index coding [10,11] and embedded index coding [12].
Notation: For a positive integer $a$, let $[a] \triangleq \{1, \ldots, a\}$. For a set $S$, we denote by $S \setminus k$ the set of items in $S$ except for the item $k$, and represent the union $S \cup \{k\}$ as $S \cup k$. The binomial coefficient is denoted by $\binom{n}{k}$, which is zero if $k > n$. The set of all $t$-sized subsets of a set $A$ is denoted by $\binom{A}{t}$.

A Converse for the Data Exchange Problem
In this subsection, we will obtain a lower bound on the optimal communication load of the general data exchange problem defined in Section 1. This is the central result of this work. The predecessor to the proof technique of our data exchange lower bound is in [3], which first presented an induction based approach for the converse of the coded distributed computing setting. Our proof uses a similar induction technique.
Given a data exchange problem and for $P, Q \subset [K]$ such that $P \neq \emptyset$, let $a_P^Q$ denote the number of bits which are stored in every node in the subset of nodes $Q$ and stored in no other node, and demanded by every node in the subset $P$ and demanded by no other node, i.e.,
$$a_P^Q \triangleq \left| \Big( \bigcap_{i \in P} D_i \Big) \cap \Big( \bigcap_{j \in Q} C_j \Big) \setminus \Big( \bigcup_{i \notin P} D_i \cup \bigcup_{j \notin Q} C_j \Big) \right|. \quad (1)$$
Note that, by definition, $a_P^Q = 0$ under the following conditions.
• If $P \cap Q \neq \emptyset$, as the bits demanded by any node are absent in the same node.
• If $Q = \emptyset$, by Definition 1.
Theorem 1 gives a lower bound on the optimal communication load of a given data exchange problem. The proof of the theorem is relegated to Appendix A. The idea of the proof is as follows. If we consider only two nodes in the system, say $[K] = \{1, 2\}$, then each of the 2 nodes has to transmit whatever bits it has which are demanded by the other node, i.e., $L^* \geq a_{\{1\}}^{\{2\}} + a_{\{2\}}^{\{1\}}$. The proof extends this observation to $K$ nodes by induction.
Theorem 1. The optimal communication load of a data exchange problem on $K$ nodes satisfies
$$L^* \geq \sum_{P \subset [K]} \sum_{Q \subset [K] \setminus P} \frac{|P|}{|P| + |Q| - 1} \, a_P^Q.$$
Theorem 1, along with the observation that $a_\emptyset^Q = 0 = a_P^\emptyset$, gives us the following corollary, which is a restatement of Theorem 1.
Corollary 1. Let $n(p, q) \triangleq \sum_{P, Q \subset [K] : |P| = p, |Q| = q, P \cap Q = \emptyset} a_P^Q$ denote the total number of bits present exactly in $q$ nodes and demanded exactly by $p$ (other) nodes. Then,
$$L^* \geq \sum_{p=1}^{K-1} \sum_{q=1}^{K-p} \frac{p}{p + q - 1} \, n(p, q).$$
Remark 1. In [13], the authors presented an essentially identical bound (Lemma 1, [13]) as Corollary 1 in the setting of coded distributed computing. The proof given in [13] for this lemma also generalizes the arguments presented in [3], as does this work. Our present work considers a general data exchange problem and derives the lower bound in Theorem 1 for the communication load in such a setting. We had derived this lower bound independently in the conference version of this paper [14], and only recently came to know about the bound in [13]. In subsequent sections, we show how to use this bound to recover converses for various multi-terminal communication problems considered in the literature in recent years, and also obtain new converses for some settings.
We also discuss, in Section 5, the looseness of Theorem 1 by considering a centralized version of the data exchange problem and comparing our bound with the generalized independence number bound in index coding. In Section 6, we discuss the application of our data exchange bound to more generalized index coding settings. These are the novel features of our present work, compared to the bound in Lemma 1 of [13].
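To make the quantities in Theorem 1 and Corollary 1 concrete, the sketch below counts $a_P^Q$ for a small instance and evaluates the bound. It assumes that the bound has the form $L^* \geq \sum_{P,Q} \frac{|P|}{|P|+|Q|-1} a_P^Q$, which is consistent with the two-node case and with the converses derived in later sections; the function names and the toy instances are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def exchange_counts(storage, demands):
    """Return {(P, Q): a_P^Q}: bits stored exactly at Q and demanded exactly by P."""
    counts = Counter()
    for bit in set().union(*storage.values()):
        Q = frozenset(i for i, c in storage.items() if bit in c)
        P = frozenset(i for i, d in demands.items() if bit in d)
        if P:  # bits demanded by nobody do not contribute
            counts[(P, Q)] += 1
    return counts


def theorem1_bound(storage, demands):
    """Assumed form of Theorem 1: sum of |P| / (|P| + |Q| - 1) * a_P^Q."""
    return sum((Fraction(len(P), len(P) + len(Q) - 1) * a
                for (P, Q), a in exchange_counts(storage, demands).items()),
               Fraction(0))


def corollary1_bound(storage, demands):
    """Same bound, grouped by the sizes p = |P| and q = |Q| as in Corollary 1."""
    n = Counter()
    for (P, Q), a in exchange_counts(storage, demands).items():
        n[(len(P), len(Q))] += a
    return sum((Fraction(p, p + q - 1) * v for (p, q), v in n.items()),
               Fraction(0))


# Two nodes: each must send exactly what the other demands.
two_node = theorem1_bound({1: {0, 1}, 2: {2}}, {1: {2}, 2: {0, 1}})

# Hypothetical three-node instance with 6 bits labelled 0..5.
storage = {1: {0, 1, 2}, 2: {2, 3, 4}, 3: {4, 5, 0}}
demands = {1: {3, 5}, 2: {0, 5}, 3: {1, 2}}
three_node = theorem1_bound(storage, demands)
print(two_node, three_node)
```

Exact rational arithmetic is used so that the two groupings of the sum can be compared for equality without rounding concerns.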

A Generic Outline of the Converse Proofs Presented in This Paper
In this work, we derive converse bounds for various settings in coded caching, coded distributed computing, and coded data shuffling using the bound in Theorem 1. Some of these converse bounds are already available in the literature, while others are novel. Each setting imposes some constraints on the size of the demands and the size of the pre-stored content at each node. The bound in Theorem 1 applies to the setting in which the nodes have some predetermined local storage and some specific demanded bits. However, the settings of coded caching, coded distributed computing, and coded data shuffling permit the design of the initial storage so that the communication load is minimized. Further, the optimal communication load as defined in the literature for some of these settings involves maximization over all possible demand configurations, keeping only the size of the demands fixed. In keeping with these specifics, our bound in Theorem 1 must be tuned for each setting to obtain the respective converse, as captured by the following three steps, which describe the generic structure behind our converse proofs.

1. Applying Theorem 1 to the present setting, we obtain a lower bound expression on the communication load, assuming an arbitrary choice of demands across the nodes and some arbitrary but fixed storage across the nodes.
2. "Symmetrization" step: In this step, the lower bound expression obtained in the previous step is averaged over some carefully chosen configurations of demanded bits at the nodes. This step helps to remove the dependency of the lower bound on the specific choice of demands.
3. Refine the averaged bound by imposing the constraints on the size of the initial storage at the nodes, and using convexity of terms inside the averaged bound to obtain the final expression of the bound. This step helps to remove the dependency of the converse on the specific initial storage configuration at the nodes.
These three steps enable us to give simpler proofs to those in the literature for known converses, and also obtain novel converses for some variants of the same problems. Further, it also illustrates the generic nature of the data exchange bound of Theorem 1. In the converse proofs that are to follow in this paper, we will highlight these steps at the appropriate junctures.

Coded Caching
In this section, we apply Theorem 1 to recover the lower bound obtained in [15] for the problem of coded caching introduced in [1]. Further, using Theorem 1, we prove in Section 2.1 a new converse for a generic coded caching problem under multiple cache size settings. This provides new converses for some existing settings in literature, and also tightens bounds in some others. In Section 2.2, we recover a converse for coded caching with multiple file requests. In Section 2.3, we recover the converse for coded caching with decentralized cache placement.
We now describe the main setting of this section. In the coded caching system introduced in [1], there is one server connected via a noiseless broadcast channel to K clients indexed as [K]. The server possesses N files, each of size F bits, where the files are indexed as $\{W_i : i \in [N]\}$. Each client contains local storage, or a cache, of size MF bits, for some M ≤ N. We call this a (K, M, N, F) coded caching system. Figure 1 illustrates this system model. The coded caching system operates in two phases. In the caching phase, which occurs during the low-traffic periods, the caches of the clients are populated by the server with some (uncoded) bits of the file library. This is known as uncoded prefetching. In this phase, the demands of the clients are not known. We denote the caching function for node k as $\zeta_k$, and thus the cache content at client k at the end of the caching phase is denoted as $Z_k \triangleq \zeta_k(\{W_i : i \in [N]\})$.

Figure 1.
A single server is connected to K clients via a broadcast channel. Each user has a cache capable of storing MF of the NF bits in the file-library available at the server.
In the delivery phase, which occurs during the high-traffic periods, each client demands one file from the server, and the server makes transmissions via the broadcast channel to satisfy the client demands. Let the demanded file at client k be $W_{d_k}$, where $d_k \in [N]$. The server uses an encoding function φ to obtain the coded transmissions $X$, such that each client k can employ a decoding function $\psi_k$ to decode its demanded file using the coded transmissions and its cache content, i.e., $\psi_k(X, Z_k) = W_{d_k}$.
The communication load $L_c(\{\zeta_k : k \in [K]\}, \phi, \{\psi_k : k \in [K]\})$ of the above coded caching scheme is the number of bits transmitted in the delivery phase (i.e., the length of X) in the worst case (where "worst case" denotes maximization across all possible demands). The optimal communication load, denoted by $L_c^*$, is then defined as
$$L_c^* \triangleq \min_{\{\zeta_k\}, \phi, \{\psi_k\}} L_c(\{\zeta_k : k \in [K]\}, \phi, \{\psi_k : k \in [K]\}). \quad (3)$$
For this system model, when $MK/N \in \mathbb{Z}$, the work in [1] proposed a caching and delivery scheme which achieves a communication load (normalized by the size of the file F) given by $\frac{K(1 - M/N)}{1 + MK/N}$. In [15], it was shown that, for any coded caching scheme with uncoded cache placement, the optimal (normalized) communication load is lower bounded by $\frac{K(1 - M/N)}{1 + MK/N}$. Therefore, it was shown that, when K ≤ N and $MK/N \in \mathbb{Z}$, the scheme in [1] is optimal.
In the present section, we give another proof of the lower bound for coded caching derived in [15]. We later discuss the case of arbitrary K, N in Remark 2.
We now proceed with restating the lower bound from [15]. Note that these converses are typically normalized by the file size in the literature; however, we recall them in their non-normalized form, in order to relate them with our data exchange problem setting.
Theorem 2 ([15]). Consider a (K, M, N, F) coded caching system with K ≤ N. The optimal communication load $L_c^*$ in the delivery phase satisfies
$$L_c^* \geq \frac{K(1 - M/N)}{1 + MK/N} F.$$
Proof based on Theorem 1. We assume that the caching scheme and delivery scheme of the coded caching scheme are designed such that the communication load $L_c$ is exactly equal to the optimal load $L_c^*$. Let the K client demands in the delivery phase be represented by a demand vector $\mathbf{d} = (d_1, \ldots, d_K)$, where $d_k \in [N]$ denotes the index of the demanded file of the client k. We are interested in the worst case demands scenario; this means that, without loss of generality, we can assume that all the demanded files are distinct, i.e., $d_k \neq d_{k'}$ for all $k \neq k'$, to bound $L_c^*$ from below.
We observe that a (K, M, N, F) coded caching problem during the delivery phase satisfies Definition 1 of a data exchange problem on K + 1 nodes indexed as {0, 1, . . . , K}, where we give the index 0 to the server node and include this in the data exchange system. Before proceeding, we remark that the below proof gives a lower bound where all K + 1 nodes in the system may transmit, whereas in the coded caching system of [1] only the server can transmit. Thus, any lower bound that we obtain in this proof applies to the setting in [1] also.
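Clearly in the equivalent data exchange problem, the node 0 (the server) does not demand anything, but has a copy of all the bits in the entire system. With these observations, we have by definition of $a_P^Q$ in (1)
$$a_P^Q = 0 \quad \text{if } |P| \geq 2 \text{ or } 0 \notin Q, \quad (4)$$
where the quantities $a_P^Q$ clearly depend on the demand vector $\mathbf{d}$. We thus use a new set of variables: for each $k \in [K]$, $Q \subset [K]$, and given demands $\mathbf{d}$, let $c_k^Q(\mathbf{d})$ denote the number of bits demanded by receiver node k that are available only at the nodes $Q \cup \{0\}$, i.e.,
$$c_k^Q(\mathbf{d}) \triangleq a_{\{k\}}^{Q \cup \{0\}}. \quad (5)$$
Using these definitions, we proceed following the three steps given in Section 1.2.
Applying Theorem 1: By Theorem 1, we have the following lower bound for demand vector $\mathbf{d}$:
$$L_c^* \geq \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{1}{1 + |Q|} \, c_k^Q(\mathbf{d}), \quad (6)$$
where (6) is obtained from (4) and (5).
"Symmetrizing" (6) over carefully chosen demand vectors: We now consider the averaging of bounds of type (6) over a chosen collection of N demand vectors, given by
$$\mathcal{D} \triangleq \{ (j \oplus_N 0, \, j \oplus_N 1, \ldots, j \oplus_N (K-1)) : j = 0, \ldots, N-1 \}, \quad (7)$$
where $j \oplus_N i \triangleq ((j + i) \bmod N) + 1$. That is, $\mathcal{D}$ contains the demand vectors consisting of consecutive K files, starting with each of the N files as the demand of the first client.
Averaging (6) through the set of N demand vectors in $\mathcal{D}$, the lower bound we obtain is
$$L_c^* \geq \frac{1}{N} \sum_{\mathbf{d} \in \mathcal{D}} \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{c_k^Q(\mathbf{d})}{1 + |Q|}. \quad (8)$$
Let $b_n^Q$ denote the number of bits of file n stored only in $Q \cup \{0\}$. Then, in the above sum, $b_n^Q = c_k^Q(\mathbf{d})$ if and only if $d_k = n$. For each fixed k and n, this happens precisely once in the collection of N demand vectors in $\mathcal{D}$. Thus, we have
$$L_c^* \geq \frac{1}{N} \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \sum_{n \in [N]} \frac{b_n^Q}{1 + |Q|} \quad (9)$$
$$= F \sum_{Q \subset [K]} \frac{K - |Q|}{1 + |Q|} \cdot \frac{\sum_{n \in [N]} b_n^Q}{NF}, \quad (10)$$
where (10) follows as for a fixed n and Q, $k \in [K] \setminus Q$ in (9), and by multiplying and dividing by F.
Refining the bound (10) by using the constraints of the setting: Now, by definition, $\sum_{Q \subset [K]} \sum_{n \in [N]} b_n^Q = NF$, so that $\{ \sum_{n} b_n^Q / (NF) : Q \subset [K] \}$ denotes a probability mass function. Furthermore, as each client caches at most MF bits, $\sum_{Q \subset [K]} |Q| \sum_{n \in [N]} b_n^Q \leq KMF$, i.e., the mean of $|Q|$ under this probability mass function is at most $MK/N$. As $\frac{K - x}{1 + x}$ is convex and decreasing in $x \geq 0$, applying Jensen's inequality to (10) we get $L_c^* \geq \frac{K(1 - M/N)}{1 + MK/N} F$, which completes the proof.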
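The symmetrization and the final expression can be sanity-checked numerically. The sketch below is illustrative only: it assumes that the set of demand vectors consists of the N cyclic shifts of consecutive file indices, and it evaluates the sum $\sum_k \sum_{Q \subset [K] \setminus k} c_k^Q / (1 + |Q|)$ for a hypothetical placement in which every file is split equally among all $t$-sized subsets of clients (with $t = MK/N$ an integer). All function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def cyclic_demands(K, N):
    """N demand vectors of K consecutive file indices (1-based, cyclic)."""
    return [tuple(((j + i) % N) + 1 for i in range(K)) for j in range(N)]


def theorem2_bound(K, M, N, F):
    """K(1 - M/N) / (1 + MK/N) * F, written with t = MK/N."""
    t = Fraction(M * K, N)
    return (K - t) / (1 + t) * F


def bound_for_subset_placement(K, t, F):
    """Sum over k and Q (not containing k) of c_k^Q / (1 + |Q|), where each
    demanded file has F / C(K, t) bits stored exactly at each t-subset Q."""
    per_subset = Fraction(F, comb(K, t))
    total = Fraction(0)
    for k in range(1, K + 1):
        others = [u for u in range(1, K + 1) if u != k]
        for Q in combinations(others, t):
            total += per_subset / (1 + len(Q))
    return total


demand_set = cyclic_demands(3, 5)
lhs = bound_for_subset_placement(K=4, t=2, F=12)
rhs = theorem2_bound(K=4, M=2, N=4, F=12)
print(demand_set[0], demand_set[-1], lhs, rhs)
```

For this symmetric placement the per-demand bound already coincides with the expression in Theorem 2, which is why the averaging and Jensen steps are only needed for arbitrary placements.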

Remark 2.
In the previous part of this section, we have shown the converse for the worst case communication load $L_c^*$ for coded caching in the regime of K ≤ N. We now consider a general coded caching setup with arbitrary K, N values and cache size M. Consider a positive integer $N_u \leq \min\{N, K\}$. For a fixed caching scheme denoted by $\zeta = \{\zeta_k : k \in [K]\}$, let the minimum communication load for satisfying the clients, maximized across all possible demand vectors with exactly $N_u$ distinct files in each of the demand vectors, be denoted as $L_c^*(N_u, \zeta)$. In the work [16], it was shown that for $t \triangleq MK/N$,
$$L_c^*(N_u, \zeta) \geq g_{N_u}(t) F, \quad (11)$$
where $g_{N_u}(x)$ is defined as the lower convex envelope of the points
$$\left\{ \left( t, \frac{\binom{K}{t+1} - \binom{K - N_u}{t+1}}{\binom{K}{t}} \right) : t \in \{0, 1, \ldots, K\} \right\}.$$
Note that $g_{N_u}(t)$ is independent of ζ. For this general setting, the optimal worst case load $L_c^*$, as defined in (3), satisfies
$$L_c^* = \min_{\zeta} \max_{N_u} L_c^*(N_u, \zeta) \geq \min_{\zeta} L_c^*(\min\{N, K\}, \zeta).$$
Thus, from (11), we get
$$L_c^* \geq g_{\min\{N, K\}}(t) F, \quad (12)$$
which is the converse bound on the worst case communication load proved in [16] for this general scenario. In Appendix B, we use our data exchange bound in Theorem 1 to recover (11), which therefore shows (12).

Server-Based and Server-Free Coded Caching with Heterogeneous Cache Sizes at Clients
So far we have discussed the coded caching scenario where there is a central server containing the entire file library and the client cache sizes are homogeneous, i.e., the same at all clients. We now describe a generalization of the result in Theorem 2 to the case of systems in which the clients have heterogeneous cache sizes, with either a centralized server present or absent. The proof of this is easily obtained from our data exchange bound in Theorem 1. To the best of our knowledge, a converse for this general setting is not known in the literature. Using this converse, we can derive new converses and tighten existing converses for various special cases of this setting, which include widely studied coded caching settings, such as device-to-device coded caching [17].
Consider a coded caching system with N files (each of size F) with K client nodes denoted by a set $\mathcal{K}_T$. We shall indicate by the value γ the presence (γ = 1) or absence (γ = 0) of a centralized server in the system containing the file library. For the purpose of utilizing our data exchange bound, we assume that all the nodes in the system are capable of transmissions; thereby, any converse for this scenario is also valid for the usual coded caching scenario in which only the server (if it is present) transmits in the delivery phase. The set of clients $\mathcal{K}_T$ is partitioned into subsets $\mathcal{K}_{T_i} : i = 1, \ldots, t$, where the nodes in subset $\mathcal{K}_{T_i}$ can store a fraction $\gamma_{T_i}$ of the file library. Let $|\mathcal{K}_{T_i}| = K_{T_i}$. We now give our converse for this setting. The caching and the delivery scheme, as well as the optimal communication load $L_c^*$, are defined as in the case of coded caching with homogeneous cache sizes.
Proposition 1. For the above heterogeneous cache sizes setting, assuming K ≤ N, the optimal communication load $L_c^*$ for uncoded cache placement is lower bounded as follows.
$$L_c^* \geq \frac{K - \sum_{i=1}^{t} K_{T_i} \gamma_{T_i}}{\gamma + \sum_{i=1}^{t} K_{T_i} \gamma_{T_i}} F.$$
Before giving the proof of Proposition 1, we give the following remarks regarding the generality of Proposition 1, the new results which arise by applying Proposition 1 and various results from existing literature that are subsumed or improved by it.
• Heterogeneous Cache Sizes: There exist a number of works discussing distinct or heterogeneous client cache sizes, for instance, [18,19]. However, to the best of our knowledge, closed form expressions for the lower bound on the load seem to be lacking for such scenarios. Proposition 1 gives a lower bound for all such settings.
• Device-to-Device Coded Caching: Suppose there is no designated server in a coded caching setup, but the client nodes themselves are responsible for exchanging the information to satisfy their demands. This corresponds to the case of Device-to-Device (D2D) coded caching, first explored in [17]. In [17], an achievable scheme was presented for the case when each (client) node has equal cache fraction $M/N$, and this scheme achieves a communication load of $(N/M - 1)F$ bits. In the work [20], it was shown that this communication load is optimal (for the regime of K ≤ N) over all possible "one shot" schemes (where "one shot" refers to those schemes in which each demanded bit is decoded using the transmission only from one server), and further it was shown that the load is within a multiplicative factor of 2 of the optimal communication load under the constraint of uncoded cache placement. We remark that the D2D setting of [17] corresponds to the special case of our current setting, with γ = 0, t = 1, $K_{T_1} = K$, and $\gamma_{T_1} = M/N$. By this correspondence, by applying Proposition 1, we see that the load in this case is lower bounded as $(N/M - 1)F$, thus showing that the achievable scheme in [17] is exactly optimal under uncoded cache placement. The D2D scenario with heterogeneous cache sizes was explored in [21], in which the optimal communication load was characterized as the solution of an optimization problem. However, no closed form expression of the load for such a scenario is mentioned. Clearly, our Proposition 1 gives such a lower bound, when we fix γ = 0, for any number of levels t of the client-side cache sizes.
Further, the result for coded caching with a server and equal cache sizes at receivers, as in Theorem 2, is clearly obtained as a special case of Proposition 1 with γ = 1, t = 1, $K_{T_1} = K$ and $\gamma_{T_1} = M/N$. We now proceed to prove Proposition 1. The proof is similar to that of Theorem 2.
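The two special cases mentioned above can be checked with a few lines of code. The sketch below assumes that the bound of Proposition 1 has the form $\frac{K - s}{\gamma + s} F$ with $s = \sum_i K_{T_i} \gamma_{T_i}$, which is the form consistent with both the D2D load $(N/M - 1)F$ and the bound of Theorem 2; the function name and argument names are hypothetical.

```python
from fractions import Fraction


def proposition1_bound(group_sizes, cache_fractions, server_present, F=1):
    """Assumed form of Proposition 1: (K - s) / (gamma + s) * F,
    where s is the sum of K_{T_i} * gamma_{T_i} over the cache-size levels."""
    K = sum(group_sizes)
    s = sum(Fraction(size) * Fraction(frac)
            for size, frac in zip(group_sizes, cache_fractions))
    gamma = 1 if server_present else 0
    if gamma + s == 0:
        raise ValueError("no server and no cached content: demands cannot be met")
    return (K - s) / (gamma + s) * F


# D2D special case: gamma = 0, one level, K = 4 clients, M/N = 1/2.
d2d = proposition1_bound([4], [Fraction(1, 2)], server_present=False)

# Server-based homogeneous case: gamma = 1, one level, K = 4, M/N = 1/2.
homogeneous = proposition1_bound([4], [Fraction(1, 2)], server_present=True)

# Two cache-size levels with the same total cached fraction.
two_level = proposition1_bound([2, 2], [Fraction(1, 4), Fraction(3, 4)],
                               server_present=True)
print(d2d, homogeneous, two_level)
```

Under this form, the bound depends on the cache sizes only through the total cached fraction $s$, so the two-level example gives the same value as the homogeneous one.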
Proof of Proposition 1. As in the proof of Theorem 2, we will denote the server node as the node 0 and assume a caching and delivery scheme which achieves the optimal load L * c for worst case client demands.
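Applying Theorem 1: For our setting, Theorem 1 gives
$$L_c^* \geq \sum_{P} \sum_{Q} \frac{|P|}{|P| + |Q| - 1} \, a_P^Q,$$
where $P$ ranges over the non-empty subsets of $\mathcal{K}_T$ and $Q$ over the subsets of nodes (including the server node 0 if γ = 1) disjoint from $P$. Note that if γ = 1 (i.e., the server is present), then $a_P^Q = 0$ whenever $0 \notin Q$. For a specific demand vector $\mathbf{d} = (d_1, \ldots, d_K)$ consisting of distinct demands and for some $Q \subset \mathcal{K}_T$, we define the quantity $c_k^Q(\mathbf{d})$ as the number of bits demanded by client k that are available exclusively in $Q \cup 0$ if γ = 1, and exclusively in $Q$ if γ = 0.

Symmetrization over appropriately chosen demand vectors: Choosing the same special set of demand vectors $\mathcal{D}$ as in (7) and averaging the above lower bound over the demand vectors in $\mathcal{D}$ similar to the proof of Theorem 2, we obtain a bound similar to (8):
$$L_c^* \geq \frac{1}{N} \sum_{\mathbf{d} \in \mathcal{D}} \sum_{k \in \mathcal{K}_T} \sum_{Q \subset \mathcal{K}_T \setminus k} \frac{c_k^Q(\mathbf{d})}{1 + |Q|} \ \text{ if } \gamma = 1, \qquad L_c^* \geq \frac{1}{N} \sum_{\mathbf{d} \in \mathcal{D}} \sum_{k \in \mathcal{K}_T} \sum_{Q \subset \mathcal{K}_T \setminus k, \, Q \neq \emptyset} \frac{c_k^Q(\mathbf{d})}{|Q|} \ \text{ if } \gamma = 0. \quad (14)$$
Combining the two expressions in (14), we can write a single equation which holds for γ ∈ {0, 1},
$$L_c^* \geq \frac{1}{N} \sum_{\mathbf{d} \in \mathcal{D}} \sum_{k \in \mathcal{K}_T} \sum_{Q \subset \mathcal{K}_T \setminus k} \frac{c_k^Q(\mathbf{d})}{\gamma + |Q|}. \quad (15)$$
We now define the term $b_n^Q$ as the number of bits of file n available exclusively in $Q \cup 0$ if γ = 1, and exclusively in $Q$ if γ = 0.
Using the above definition of $b_n^Q$ and observing that each demand vector in $\mathcal{D}$ has distinct components, Equation (15) can be written as
$$L_c^* \geq F \sum_{Q \subset \mathcal{K}_T} \frac{K - |Q|}{\gamma + |Q|} \cdot \frac{\sum_{n \in [N]} b_n^Q}{NF}. \quad (19)$$
Refining the bound in (19) using setting constraints and convexity: By the definition of $b_n^Q$, we have $\sum_{Q} \sum_{n} b_n^Q = NF$, and the cache size constraints give $\sum_{Q} |Q| \sum_{n} b_n^Q \leq \sum_{i=1}^{t} K_{T_i} \gamma_{T_i} NF$. As $\frac{K - x}{\gamma + x}$ is convex and decreasing in $x$, applying Jensen's inequality to (19) yields the bound in Proposition 1. This completes the proof.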
Remark 3. Proposition 1 holds when N ≥ K. This scenario is the most studied case in the literature and is practically more relevant than the case K > N. We now provide lower bounds for the heterogeneous cache sizes setting for general values of K, N, which includes the case K > N. As before, we consider two cases: γ = 1 indicates the presence of a centralized server in the system and γ = 0 indicates its absence.
Case 1, γ = 1: For the case where a centralized server is present, the optimal load satisfies the lower bound (20), which is expressed in terms of the function $g_{\min\{N,K\}}$ defined in Remark 2. The derivation of this lower bound follows the steps in Appendix B until (A21), where we choose $N_u = \min\{N, K\}$. Without loss of generality, we assume that all caches are fully populated with uncoded bits from the library; thus the total memory occupied by the cached bits, $\sum_{Q \subset [K]} |Q| a^Q$, is equal to the sum of all the cache memory available in the system, $\sum_{i=1}^{t} K_{T_i} \gamma_{T_i} NF$. Applying Jensen's inequality on (A21), we arrive at the lower bound (20).
Case 2, γ = 0: In this case, the optimal worst-case communication load satisfies the lower bound (21). The proof of this lower bound follows a similar approach to that in Appendix B and is outlined in Appendix C. Note that when N ≥ K, both (20) and (21) become identical to the inequality in Proposition 1.

Coded Caching with Multiple File Requests
In [22], coded caching with multiple file requests was considered, in which each client requests any ∆ files out of the N files in the delivery phase. It was shown in [22] (Section V.A) that if ∆K ≤ N, then the optimal worst case communication load can be lower bounded as
$$L_c^* \geq \frac{K \Delta (1 - M/N)}{1 + MK/N} F. \quad (22)$$
The work in [22] also gives an achievable scheme based on the scheme in [1] which meets the above bound. The same lower bound can also be derived using Theorem 1, by following a procedure similar to that of the proof of Theorem 2.
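Applying Theorem 1, we give the proof in brief. The demand vector assumed in the proof of Theorem 2 becomes a K∆-length vector in this case, consisting of K subvectors, each of length ∆, capturing ∆ distinct demands for each client. The proof proceeds as is until (6).
Symmetrization: The set $\mathcal{D}$ in (7) now contains the K∆-length vectors of consecutive file indices, cyclically constructed, starting from (1, . . . , K∆), i.e.,
$$\mathcal{D} \triangleq \{ \mathbf{d}(j) = (j \oplus_N 0, \, j \oplus_N 1, \ldots, j \oplus_N (K\Delta - 1)) : j = 0, \ldots, N-1 \}.$$
Then the indices of the demanded files at client $k \in [K]$, denoted by $\mathbf{d}_k(j)$, are given by
$$\mathbf{d}_k(j) = (j \oplus_N (k-1)\Delta, \ldots, j \oplus_N (k\Delta - 1)).$$
The averaged lower bound expression similar to (8) is then obtained as
$$L_c^* \geq \frac{1}{N} \sum_{j=0}^{N-1} \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{c_k^Q(\mathbf{d}(j))}{1 + |Q|}. \quad (24)$$
In this expression, $c_k^Q(\mathbf{d}(j))$ now indicates the number of bits of the ∆ distinct and consecutive files indexed by $\mathbf{d}_k(j)$ that are available exclusively at the nodes in $Q \cup 0$ (0 denoting the server).
Observation: Let $b_n^Q$ denote the number of bits of file n available exclusively in the nodes $Q \cup 0$, as in the proof of Theorem 2. Now, $n \in \mathbf{d}_k(j)$ if and only if the file n is demanded by client k. By definition of $\mathcal{D}$, the event $n \in \mathbf{d}_k(j)$ happens for precisely ∆ values of index j.
From (24), applying the above observation, we have the following.
$$L_c^* \geq \frac{\Delta}{N} \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \sum_{n \in [N]} \frac{b_n^Q}{1 + |Q|}. \quad (25)$$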
Refining the bound in (25) using the setting constraints: We use the constraints of the setting and the convexity of the resultant expression to refine (25). This refinement essentially follows similar subsequent steps as in the proof of Theorem 2 following (9), and leads finally to (22).
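The counting observation used in the symmetrization step (each file is demanded by a given client in exactly ∆ of the N demand vectors) can be verified by enumeration. The sketch below assumes the cyclic construction of K∆-length demand vectors described above; the function name and parameters are hypothetical.

```python
def multi_request_demands(K, delta, N):
    """For each shift j, return the list of per-client demand tuples:
    client k (0-based) gets delta consecutive file indices (1-based, cyclic)."""
    if K * delta > N:
        raise ValueError("this construction assumes delta * K <= N")
    demand_set = []
    for j in range(N):
        full = [((j + i) % N) + 1 for i in range(K * delta)]
        demand_set.append([tuple(full[k * delta:(k + 1) * delta])
                           for k in range(K)])
    return demand_set


def times_file_demanded(demand_set, client, file_index):
    """Number of shifts j for which `file_index` is among the client's demands."""
    return sum(file_index in d[client] for d in demand_set)


K, delta, N = 2, 2, 5
demand_set = multi_request_demands(K, delta, N)
counts = {(k, n): times_file_demanded(demand_set, k, n)
          for k in range(K) for n in range(1, N + 1)}
print(demand_set[0], counts[(0, 1)])
```

Since every (client, file) pair occurs exactly ∆ times, the averaged bound picks up a factor of ∆ relative to the single-request case.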

Remark 4.
The work in [23] considers a coded caching setup in which Λ caches (Λ ≤ K) are shared between the K clients. The special case when Λ divides K and each cache is serving exactly $K/\Lambda$ clients is equivalent to the scenario of the multiple file requests in [22] with Λ clients, each demanding $K/\Lambda$ files. The above proof then recovers the converse for this setting, which is obtained in [23] (Section III.A in [23]).

Coded Caching with Decentralized Caching
Theorem 2 and the subsequent results discussed above hold for the centralized caching framework, in which the caching phase is designed carefully in a predetermined fashion. In [24], the idea of decentralized placement was introduced, in which the caching phase is not coordinated centrally (this was called "decentralized coded caching" in [24]). In this scenario, each client, independently of others, caches a fraction $\gamma = M/N$ of the bits in each of the N files in the file library, chosen uniformly at random. For this scenario, the server (which has the file library) is responsible for the delivery phase. The optimal communication load $L_c^*$ is defined as the minimum worst case communication load over all possible delivery schemes for a given caching configuration, randomly constructed as given above. For the case of K ≤ N, the authors of [24] show a scheme which achieves a communication load of $\frac{1 - \gamma}{\gamma} \left( 1 - (1 - \gamma)^K \right) F$. This was shown to be optimal for large F in [16] and also in [25] via a connection to index coding. In the following, we show that the same optimality follows easily via our Theorem 1.
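Assume that we have distinct demands at the K clients, as in the proof of Theorem 2, given by the demand vector $\mathbf{d}$. We first note that by the law of large numbers, as F increases, for the decentralized cache placement, for any $k \in [K]$ and $Q \subset [K] \setminus k$, we have $c_k^Q(\mathbf{d}) \approx \gamma^{|Q|} (1 - \gamma)^{K - |Q|} F$, where $c_k^Q(\mathbf{d})$ is as defined in (5). This observation enables us to avoid steps 2 and 3 mentioned in Section 1.2, as the value of $c_k^Q(\mathbf{d})$ is independent of the specific random cache placement or the demands chosen (as long as they are distinct). Using this in (6), we get
$$L_c^* \geq \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{\gamma^{|Q|} (1 - \gamma)^{K - |Q|}}{1 + |Q|} F = K \sum_{q=0}^{K-1} \binom{K-1}{q} \frac{\gamma^{q} (1 - \gamma)^{K - q}}{1 + q} F = \frac{1 - \gamma}{\gamma} \sum_{q=0}^{K-1} \binom{K}{q+1} \gamma^{q+1} (1 - \gamma)^{K - q - 1} F = \frac{1 - \gamma}{\gamma} \left( 1 - (1 - \gamma)^K \right) F,$$
where the last step follows as $\sum_{q'=1}^{K} \binom{K}{q'} \gamma^{q'} (1 - \gamma)^{K - q'} = 1 - (1 - \gamma)^K$. Thus, we have given an alternate proof of the optimality of the decentralized scheme in [24].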
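The binomial identity behind this computation can be confirmed exactly for small parameters. The sketch below assumes the large-F approximation $c_k^Q(\mathbf{d}) \approx \gamma^{|Q|}(1-\gamma)^{K-|Q|} F$ (each bit is cached independently with probability γ at each client) and compares the resulting sum with the closed form $\frac{1-\gamma}{\gamma}(1 - (1-\gamma)^K)F$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def decentralized_sum(K, gamma, F=1):
    """Sum over k and Q (not containing k) of gamma^|Q| (1-gamma)^(K-|Q|) / (1+|Q|),
    grouped by q = |Q|: there are K * C(K-1, q) such pairs (k, Q)."""
    return sum(K * comb(K - 1, q) * gamma**q * (1 - gamma)**(K - q) / (1 + q)
               for q in range(K)) * F


def decentralized_closed_form(K, gamma, F=1):
    """Closed form (1 - gamma) / gamma * (1 - (1 - gamma)^K) * F."""
    return (1 - gamma) / gamma * (1 - (1 - gamma)**K) * F


gamma = Fraction(1, 3)
lhs = decentralized_sum(5, gamma)
rhs = decentralized_closed_form(5, gamma)
print(lhs, rhs)
```

Using `Fraction` for γ keeps the comparison exact; with floating-point γ the two values would agree only up to rounding.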

Decentralized Coded Data Shuffling
In distributed machine learning systems consisting of a master and multiple worker nodes, data are distributed to the workers by the master in order to perform training of the machine learning model in a distributed manner. In general, this training process takes multiple iterations, with the workers doing some processing (like computing gradients) on their respective training data subsets. In order to ensure that the training data subset at each node are sufficiently representative of the data, and to improve the statistical performance of machine learning algorithms, shuffling of the training data between the worker nodes is implemented after every training iteration. This is known as data shuffling.
A coding theoretic approach to data shuffling, which involves the master communicating coded data to the workers, was presented in [4]. The setting in [4] was centralized, meaning that a master node communicates with the workers to perform the data shuffling.
The work in [5] considered the data shuffling problem in which there is no master node, but the worker nodes exchange the training data among themselves, without involving the master node, to create a new desired partition in the next iteration. This was termed as decentralized data shuffling in [5]. Note that these notions of "centralized" and "decentralized" in the data shuffling problem are different from those in the coded caching [24], in which these terms were used to define the deterministic and random design of the caching phase, respectively. In this section, we look at the work in [5] and give a new simpler proof of the lower bound on the communication load for decentralized data shuffling.
We first review the setting in [5]. Consider K workers in the system, where each worker node is required to process q data units at any given time. The total dataset $F_1 \cup \cdots \cup F_N$ consists of N = Kq data units $F_1, \ldots, F_N$, with a size of B bits per data unit. The collection of data units to be processed by worker node k at time t is denoted as $A_{k,t}$. The collection of data units $A_{1,t}, \ldots, A_{K,t}$ must form a partition of the dataset $F_1 \cup \cdots \cup F_N$ for every time instant t, i.e., for any time t and any choice of $k, k' \in [K]$ with $k \neq k'$, we have $A_{k,t} \cap A_{k',t} = \emptyset$ and $\bigcup_{k \in [K]} A_{k,t} = F_1 \cup \cdots \cup F_N$. Each node k has a local cache of size MB bits (such that q ≤ M ≤ Kq) that can hold M data units. Out of these M units, q units are the current "active" data $A_{k,t}$ at any time step, which are required to be processed by the node k. The content of the cache of node k at time t is denoted as $Z_{k,t}$. Therefore, for each choice of $k \in [K]$ and any time t, we have $|Z_{k,t}| = MB$ and $A_{k,t} \subset Z_{k,t}$.
At each time instance t, a new partition $\{A_{k,t} : k \in [K]\}$ is to be made active at the nodes [K], where this new partition is made known to the workers only at time step t. Note that the contents of the nodes at time t − 1 are $Z_{1,t-1}, \ldots, Z_{K,t-1}$, and the active partition at time t − 1 is $A_{1,t-1}, \ldots, A_{K,t-1}$. The worker nodes communicate with each other over a common broadcast link, as shown in Figure 2, to achieve the new partition. The decentralized data shuffling problem is to find a delivery scheme (between workers) to shuffle the collection of active data units $\{A_{k,t-1} : k \in [K]\}$ to a new partition $\{A_{k,t} : k \in [K]\}$. Each worker k computes a function $\phi_k(Z_{k,t-1})$ of its cache contents and broadcasts it to the other workers. Using these transmissions and the locally available cache content $Z_{k,t-1}$, each node k is required to decode $A_{k,t}$. As in the case of coded caching, one seeks to reduce the worst-case communication load by designing the initial storage and coded transmissions carefully. The communication load of this data shuffling scheme, denoted by $L_{ds}$, is the sum of the number of bits broadcast by all the K nodes in the system, i.e., $L_{ds} = \sum_{k \in [K]} |\phi_k(Z_{k,t-1})|$. The optimal communication load of data shuffling $L_{ds}^*$ (for the worst case data shuffle) is obtained from $L_{ds}$ by maximizing over all possible choices for $\{A_{k,t-1} : k \in [K]\}$ and $\{A_{k,t} : k \in [K]\}$, and minimizing over all possible choices for the cache placement $\{Z_{k,t-1} : k \in [K]\}$ and the delivery scheme $\{\phi_k : k \in [K]\}$. For the above setting, the following bound on the communication load $L_{ds}^*$ was shown in [5].
$$L_{ds}^* \geq \frac{Kq}{K-1} \cdot \frac{K - M/q}{M/q} \, B. \quad (26)$$
The above bound was shown to be optimal for some special cases of the parameters, and order-optimal otherwise.

Proof of the Decentralized Data Shuffling Converse
We now recover the bound (26) by a simple proof using our generic lower bound in Theorem 1. We assume that the cache placement and delivery scheme of the data shuffling scheme are designed such that the communication load of the data shuffling scheme is exactly equal to L * ds . We proceed as per the three steps in Section 1.2. Applying Theorem 1: For k ∈ [K] and Q ⊂ [K], let A Q k,t denote the subset of bits of A k,t available exactly at the nodes in Q and not anywhere else. Note that |A Q k,t | = 0 if Q = ∅, as each bit is necessarily present in at least one of the K nodes.
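As per our bound in Theorem 1, we have
$$L^*_{ds} \ge \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{|A^Q_{k,t}|}{|Q|}.$$
Symmetrization by averaging over an appropriately chosen set of shuffles: Let the set of circular permutations of (1, 2, . . . , K), apart from the identity permutation, be denoted by Γ. Clearly, there are K − 1 of them. We denote an arbitrary permutation in Γ by γ, and by γ k we denote the kth coordinate of γ. Now, consider the shuffle given by γ ∈ Γ, i.e., for each k, A k,t = A γ k ,t−1 . For this shuffle, we have by the above equation that
$$L^*_{ds} \ge \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{|A^Q_{\gamma_k,t-1}|}{|Q|}. \tag{28}$$
Now, averaging (28) over all permutations in Γ, we get
$$L^*_{ds} \ge \frac{1}{K-1} \sum_{\gamma \in \Gamma} \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{|A^Q_{\gamma_k,t-1}|}{|Q|}.$$
As we go through all choices of γ ∈ Γ, we see that γ k takes every value except k, i.e., γ k assumes each value in [K] \ k exactly once. Moreover, A Q k′,t−1 is the collection of bits of A k′,t−1 present only in Q. However, the bits A k′,t−1 are already present at node k′, so A Q k′,t−1 is empty unless k′ ∈ Q. Each pair (k′, Q) with k′ ∈ Q therefore appears once for every k ∉ Q, i.e., K − |Q| times. Hence,
$$L^*_{ds} \ge \frac{1}{K-1} \sum_{k' \in [K]} \sum_{Q \subset [K] : k' \in Q} \frac{K - |Q|}{|Q|} \, |A^Q_{k',t-1}|.$$
Refining the bound using setting constraints and convexity: Now, we have the following observations on A Q k′,t−1 :
$$\sum_{k' \in [K]} \sum_{Q \subset [K]} |A^Q_{k',t-1}| = KqB, \qquad \sum_{k' \in [K]} \sum_{Q \subset [K]} |Q| \, |A^Q_{k',t-1}| \le KMB.$$
Utilizing the above, and the fact that (K − |Q|)/|Q| is a convex decreasing function in |Q| (for |Q| > 0), we have
$$L^*_{ds} \ge \frac{KqB}{K-1} \cdot \frac{K - M/q}{M/q}.$$
Thus, we have recovered (26).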
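The counting step behind the symmetrization can be checked numerically. The following Python sketch enumerates the K − 1 non-identity circular shifts and verifies that, for a node k′ and a set Q containing k′, the number of pairs (γ, k) with γ k = k′ and k ∉ Q equals K − |Q|. The function names are illustrative and not from the paper, and the shifts are assumed to be γ k = ((k − 1 + s) mod K) + 1 for s = 1, . . . , K − 1.

```python
from itertools import combinations


def cyclic_shifts(K):
    """Non-identity circular shifts of (1, ..., K); gamma[k - 1] holds gamma_k."""
    return [tuple((k - 1 + s) % K + 1 for k in range(1, K + 1))
            for s in range(1, K)]


def count_pairs(K, k_prime, Q):
    """Number of pairs (gamma, k) with gamma_k = k_prime and k not in Q."""
    return sum(1
               for gamma in cyclic_shifts(K)
               for k in range(1, K + 1)
               if gamma[k - 1] == k_prime and k not in Q)


def check(K):
    """True if the count equals K - |Q| for every Q and every k_prime in Q."""
    nodes = range(1, K + 1)
    for size in range(1, K + 1):
        for Q in combinations(nodes, size):
            for k_prime in Q:
                if count_pairs(K, k_prime, set(Q)) != K - size:
                    return False
    return True
```

Calling `check(K)` for small K confirms the weight K − |Q| that multiplies each term after averaging over the shuffles in Γ.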

Remark 5.
We have considered the decentralized version of the coded data shuffling problem in this subsection. The centralized version of the data shuffling problem was introduced in [4] and its information theoretic limits were studied elaborately in [6]. Our data exchange bound, when applied to the setting in [6], results in a looser converse result than that in [6]. The reasons for this are explored in Section 5 using the connection between our data exchange bound and the bound for index coding known in the literature.

Coded Distributed Computing
In a distributed computing setting, there are N files on which the distributed computing task has to be performed by K nodes. The job at hand is divided into three phases: Map, Shuffle, and Reduce. In the shuffle phase, the nodes that are assigned to perform the distributed computing task exchange data. In [3], the authors proposed coded communication during the shuffle phase to reduce the communication load. We recollect the setting and the main converse result from [3], which we recover using our data exchange bound.
A subset M i of the N files is assigned to the ith node, and the ith node computes the map functions on this subset in the map phase (see Figure 3). We assume that the total number of map functions computed at the K nodes is rN, where r is referred to as the computation load. In the reduce phase, a total of W reduce functions are to be computed across the K nodes corresponding to the N files. Each node is assigned the same number of functions. Obtaining the output of the reduce functions at all the nodes will complete the distributed computing task. In this work, as in [3], we consider two scenarios: in the first one, each reduce function is computed exactly at one node and in the second, each reduce function is computed at s nodes, where s ≥ 2. Let L * dc be the total number of bits broadcast by the K nodes in the shuffle phase, minimized over all possible map function assignments, reduce function assignments, and shuffling schemes, with a computation load r. We refer to L * dc as the minimum communication load. To obtain similar expressions for the communication load as in [3], we normalize the communication load by the total number of intermediate output bits (= WNT, where T is the number of bits in each intermediate output). We consider the first scenario now, where each reduce function is computed exactly at one node.

Theorem 3 ([3]). The minimum communication load L * dc incurred by a distributed computing system of K nodes for a given computation load r, where every reduce function is computed at exactly one node and each node computes W/K reduce functions, is bounded as
$$\frac{L^*_{dc}}{WNT} \ge \frac{1}{r}\left(1 - \frac{r}{K}\right).$$
Proof. We resort to two of the three steps of Section 1.2 to complete this proof. The symmetrization step, which involves averaging over demand configurations, is not applicable in the present setting because the definition of L * dc involves minimization over the reduce function assignment as well.
Applying Theorem 1: For a given map function assignment M = {M 1 , . . . , M K }, let $\tilde{a}^j_M$ denote the number of files that are mapped at exactly j nodes. It is easy to see that $\sum_{j=1}^{K} \tilde{a}^j_M = N$ and $\sum_{j=1}^{K} j \, \tilde{a}^j_M = rN$. We will apply Theorem 1 to this setting. Recall that each reduce function is computed exactly at one node in our present setup. To apply Theorem 1, we need to ascertain the quantities a Q P for P, Q being disjoint subsets of [K]. To do this, we first denote by $\tilde{a}^Q$ the number of files whose intermediate outputs are demanded by some node k and available exclusively in the nodes of Q. Note that $\tilde{a}^Q$ is the same for any k ∈ [K] \ Q, as each node demands intermediate outputs of all the files that are not mapped at the node itself.
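As the number of reduce functions assigned to node k is W/K (as each reduce function is computed at exactly one node) and each intermediate output is T bits long, we have $a^Q_{\{k\}} = \frac{WT}{K} \tilde{a}^Q$ for every k ∉ Q, and $a^Q_P = 0$ whenever |P| ≥ 2. Denoting by L M the communication load for the map function assignment M, Theorem 1 gives
$$L_M \ge \sum_{k \in [K]} \sum_{Q \subset [K] \setminus k} \frac{1}{|Q|} \cdot \frac{WT}{K} \tilde{a}^Q = \frac{WT}{K} \sum_{j=1}^{K} \tilde{a}^j_M \, \frac{K-j}{j},$$
where we have used $\tilde{a}^j_M = \sum_{Q : |Q| = j} \tilde{a}^Q$.
Refining the bound using convexity and setting constraints: Using the definition of L * dc , noting that (K − j)/j is a convex decreasing function of j, and that $\sum_{j=1}^{K} \tilde{a}^j_M / N = 1$ and $\sum_{j=1}^{K} j \, \tilde{a}^j_M / N = r$, Jensen's inequality gives, for every map function assignment with computation load r,
$$\frac{L^*_{dc}}{WNT} \ge \frac{1}{K} \sum_{j=1}^{K} \frac{\tilde{a}^j_M}{N} \cdot \frac{K-j}{j} \ge \frac{1}{K} \cdot \frac{K-r}{r} = \frac{1}{r}\left(1 - \frac{r}{K}\right).$$
This completes the proof.
Now, we consider the case in which each reduce function has to be computed at s nodes. The total number of reduce functions is assumed to be W. In addition, the following assumption is made to keep the problem formulation symmetric with respect to reduce functions: every possible s-sized subset of the K nodes is assigned $W / \binom{K}{s}$ reduce functions (we assume $\binom{K}{s}$ divides W). As in the previous case, we will denote the communication load for a given map function assignment by L M (s) and the optimal communication load with computation load r by L * dc (s). We will prove the following result, which gives a lower bound on L M (s).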

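Proposition 2 ([3]). The communication load corresponding to a map function assignment M when each reduce function has to be computed at s nodes is lower bounded as
$$\frac{L_M(s)}{WNT} \ge \sum_{j=1}^{K} \frac{\tilde{a}^j_M}{N} \sum_{l=\max(0,s-j)}^{\min(K-j,s)} \frac{\binom{K-j}{l}\binom{j}{s-l}}{\binom{K}{s}} \cdot \frac{l}{l+j-1}.$$
Proof. As before, we will denote by $\tilde{a}^Q$ the number of files whose map function outputs are available exclusively in the nodes of Q. Furthermore, we will denote the number of intermediate output bits which are demanded exclusively by the nodes in P and available exclusively in the nodes of Q by b Q P . Then, applying Theorem 1, the lower bound on the communication load in terms of {b Q P } is given by
$$L_M(s) \ge \sum_{P \subset [K]} \sum_{Q \subset [K] \setminus P} \frac{|P|}{|P| + |Q| - 1} \, b^Q_P.$$
We first interchange the above summation order and consider all sets Q with |Q| = j and all sets P such that |P| = l. For |Q| = j, we need to count the subsets of size s which form a subset of P ∪ Q. Thus, for a fixed j, we can see that l can range from max(0, s − j) to min(K − j, s). For a given subset P of size l, the number of s-sized subsets which are contained within P ∪ Q and contain P is $\binom{j}{s-l}$. Therefore, the number of intermediate output bits demanded exclusively by the nodes in P and available exclusively in Q is
$$b^Q_P = \tilde{a}^Q \binom{j}{s-l} \frac{W}{\binom{K}{s}} T.$$
This is because each of the s-sized subsets has to reduce $W / \binom{K}{s}$ functions. Using this relation, the above inequality can be rewritten as follows.
$$L_M(s) \ge \sum_{j=1}^{K} \sum_{Q : |Q| = j} \ \sum_{l=\max(0,s-j)}^{\min(K-j,s)} \ \sum_{P \subset [K] \setminus Q : |P| = l} \frac{l}{l+j-1} \, \tilde{a}^Q \binom{j}{s-l} \frac{WT}{\binom{K}{s}}$$
$$= \frac{WT}{\binom{K}{s}} \sum_{j=1}^{K} \tilde{a}^j_M \sum_{l=\max(0,s-j)}^{\min(K-j,s)} \binom{K-j}{l} \binom{j}{s-l} \frac{l}{l+j-1}, \tag{40}$$
where (40) follows as $\tilde{a}^j_M = \sum_{Q : |Q| = j} \tilde{a}^Q$ and as there are $\binom{K-j}{l}$ subsets P of size l disjoint from Q. Normalizing by WNT completes the proof.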
The above proposition, along with certain convexity arguments resulting from the constraints imposed by the computation load, can be used to prove the lower bound on L * dc (s). The interested reader is referred to the converse proof of Theorem 2 in [3] for the same.
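As a sanity check on how the two scenarios fit together, the following Python sketch evaluates a bound of the Proposition 2 type and compares it with the Theorem 3 expression when s = 1. It assumes the normalized bound has the form $\sum_j (\tilde{a}^j_M/N) \sum_l \binom{K-j}{l}\binom{j}{s-l}/\binom{K}{s} \cdot l/(l+j-1)$, as in [3]. The function names and the example assignments are illustrative only.

```python
from fractions import Fraction
from math import comb


def prop2_bound(a, K, s):
    """Normalized lower bound of the Proposition 2 form (assumed, see lead-in).

    a[j] is the number of files mapped at exactly j nodes, for j in 1..K.
    """
    N = sum(a.values())
    total = Fraction(0)
    for j, a_j in a.items():
        for l in range(max(0, s - j), min(K - j, s) + 1):
            if l == 0:
                continue  # no demanding node, so the term contributes nothing
            weight = Fraction(comb(K - j, l) * comb(j, s - l), comb(K, s))
            total += Fraction(a_j, N) * weight * Fraction(l, l + j - 1)
    return total


def theorem3_bound(r, K):
    """(1/r)(1 - r/K), the normalized bound for s = 1."""
    r = Fraction(r)
    return (1 / r) * (1 - r / K)


def computation_load(a):
    """r = (sum_j j * a[j]) / N."""
    return Fraction(sum(j * a_j for j, a_j in a.items()), sum(a.values()))
```

For a map assignment in which every file is mapped at exactly r nodes, `prop2_bound(a, K, 1)` coincides with `theorem3_bound(r, K)`. For a non-uniform assignment it is at least as large, which is the Jensen step in the proof of Theorem 3.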

Relation to Index Coding Lower Bound
We now consider the "centralized" version of the data exchange problem, where one of the nodes has a copy of all the information bits and is the lone transmitter in the system. We will use the index 0 for this server node, and assume that there are K other nodes in the system, with index set [K], acting as clients. In terms of Definition 1, this system is composed of K + 1 nodes {0} ∪ [K], the demand D 0 of the server is empty, while the demands D i and the contents C i of all the clients are subsets of the contents of the server, i.e., C i , D i ⊂ C 0 for all i ∈ [K]. Without loss of generality, we assume that only the server performs all the transmissions, as any coded bit that can be generated by any of the client nodes can be generated at the server itself. Clearly, this is an index coding problem [26] with K clients or receivers, in which the demand of the ith receiver is D i and its side information is C i . When applied to this scenario, our main result Theorem 1 therefore provides a lower bound on the index coding communication cost.
The maximum acyclic induced subgraph (MAIS) and its generalization, which is known as the generalized independence number or the α-bound, are well-known lower bounds in index coding [9,26]. In this section, we describe the relation between the α-bound of index coding and the centralized version of Theorem 1. We show that the latter is in general weaker, and identify the scenarios when these two bounds are identical. We then use these observations to explain why Theorem 1 cannot provide a tight lower bound for the centralized data shuffling problem [6].
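Let us first apply Theorem 1 to the centralized data exchange problem. As node 0 contains all the information bits and its demand is empty, we have a Q P = 0 if 0 ∉ Q or 0 ∈ P. Using $\tilde{Q} = Q \setminus \{0\}$ and defining the variable $c^{\tilde{Q}}_P = a^{\tilde{Q} \cup \{0\}}_P = a^Q_P$, we obtain the following result.

Theorem 4. The optimal communication load L * of the centralized data exchange problem satisfies
$$L^* \ge \sum_{P \subset [K]} \sum_{\tilde{Q} \subset [K] \setminus P} \frac{|P|}{|P| + |\tilde{Q}|} \, c^{\tilde{Q}}_P.$$
Note that it is possible to have $c^{\tilde{Q}}_P = a^{\tilde{Q} \cup \{0\}}_P > 0$ when $\tilde{Q} = \emptyset$. In Section 5.1, we express the generalized independence number α in terms of the parameters c Q P , and in Section 5.2, we identify the relation between our lower bound Theorem 4 and the index coding lower bound α.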
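The centralized bound depends only on how many bits fall into each (P, Q) class. A minimal Python sketch follows, assuming the bound takes the form $\sum_{P,Q} \frac{|P|}{|P|+|Q|} c^Q_P$ with Q ranging over subsets of the clients. The dictionary layout and the function name are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def centralized_bound(c):
    """Evaluate sum over (P, Q) of |P| / (|P| + |Q|) * c[(P, Q)].

    c maps a pair (P, Q) of tuples of client indices to the number of bits
    demanded exclusively by the clients in P and cached exclusively at the
    clients in Q (in addition to the server).
    """
    total = Fraction(0)
    for (P, Q), bits in c.items():
        P_set, Q_set = set(P), set(Q)
        if not P_set or P_set & Q_set:
            raise ValueError("P must be nonempty and disjoint from Q")
        total += Fraction(len(P_set), len(P_set) + len(Q_set)) * bits
    return total


# Hypothetical instance with K = 3 clients:
# 2 bits demanded by client 1 and cached at no client,
# 3 bits demanded by clients 1 and 2 and cached only at client 3.
example = {((1,), ()): 2, ((1, 2), (3,)): 3}
```

For this instance the first class contributes 2 · 1/1 and the second contributes 3 · 2/3, so `centralized_bound(example)` evaluates to 4.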

The Generalized Independence Number Bound
Let γ = (γ 1 , . . . , γ K ) be any permutation of [K], where γ i is the ith coordinate of the permutation. Applying similar ideas as in the proof of Theorem 1 to the centralized scenario, we obtain the following lower bound on L * . This lower bound considers the nodes in the order γ 1 , . . . , γ K , and for each node in this sequence it counts the number of bits that are demanded by this node and are neither demanded by nor available as side information at any of the earlier nodes.

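Proposition 3. For any permutation γ of [K],
$$L^* \ge \sum_{i=1}^{K} \ \sum_{P \subset \{\gamma_i, \ldots, \gamma_K\} : \gamma_i \in P} \ \sum_{Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\}} c^Q_P. \tag{41}$$
Proof. See Appendix D.
A direct consequence of Proposition 3 is
$$L^* \ge \max_{\gamma} \sum_{i=1}^{K} \ \sum_{P \subset \{\gamma_i, \ldots, \gamma_K\} : \gamma_i \in P} \ \sum_{Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\}} c^Q_P, \tag{42}$$
where the maximization is over all possible permutations on [K]. We now recall the definition of the generalized independence number [9]. Denote the collection of the c Q P information bits available exclusively at the nodes Q ∪ {0} and demanded exclusively by the nodes P as {w Q P,m : m = 1, . . . , c Q P }. Therefore, the set of all the information bits present in the system is
$$B = \bigcup_{P \subset [K]} \ \bigcup_{Q \subset [K] \setminus P} \{ w^Q_{P,m} : m = 1, \ldots, c^Q_P \}.$$
Note that each bit is identified by a triple (P, Q, m).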

Definition 2. A subset H of B is a generalized independent set if and only if every subset I ⊂ H satisfies the following:
• there exists a node k ∈ [K] and an information bit in I such that this information bit is demanded by k (and possibly some other nodes), and none of the other bits in I are available as side information at k.
The generalized independence number α is the size of the largest generalized independent set.
We next show that the lower bound in (42) is in fact equal to the generalized independence number α of this index coding problem.

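Theorem 5. The generalized independence number α satisfies
$$\alpha = \max_{\gamma} \sum_{i=1}^{K} \ \sum_{P \subset \{\gamma_i, \ldots, \gamma_K\} : \gamma_i \in P} \ \sum_{Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\}} c^Q_P, \tag{43}$$
where the maximization is over all K! permutations of [K].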
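On small instances, the statement of Theorem 5 can be checked by brute force. The Python sketch below computes α directly from Definition 2 and compares it with the maximum over permutations of the number of bits counted by Proposition 3. A bit with demand set P and side-information set Q is counted for γ exactly when the first element of P ∪ Q in the order γ lies in P. The representation of a bit as a pair (P, Q) of frozensets and all function names are illustrative assumptions.

```python
from itertools import combinations, permutations


def is_generalized_independent(H):
    """Definition 2: every nonempty subset I of H has a bit demanded by some
    node k such that no other bit of I is side information at k."""
    H = list(H)
    for size in range(1, len(H) + 1):
        for I in combinations(range(len(H)), size):
            found = any(
                all(k not in H[j][1] for j in I if j != i)
                for i in I
                for k in H[i][0]
            )
            if not found:
                return False
    return True


def alpha_by_definition(bits):
    """Size of the largest generalized independent set (exhaustive search)."""
    for size in range(len(bits), 0, -1):
        for H in combinations(bits, size):
            if is_generalized_independent(H):
                return size
    return 0


def counted(bit, gamma):
    """True if the first node of P ∪ Q in the order gamma belongs to P."""
    P, Q = bit
    first = next(node for node in gamma if node in P or node in Q)
    return first in P


def alpha_by_permutations(bits, K):
    """Maximum over all permutations of the number of counted bits."""
    return max(sum(counted(b, g) for b in bits)
               for g in permutations(range(1, K + 1)))


# Hypothetical instance with K = 3 clients and four bits (P, Q).
bits = [
    (frozenset({1}), frozenset({2})),
    (frozenset({2}), frozenset({3})),
    (frozenset({3}), frozenset({1})),
    (frozenset({1, 2}), frozenset({3})),
]
```

Both computations give 3 on this instance. The first three bits form a cycle and cannot all belong to one generalized independent set. The search is exponential in the number of bits and is meant only for illustration.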
Proof. See Appendix E.

Relation to the Index Coding Lower Bound
Taking the average of the right hand side of (41) with respect to all γ, we obtain
$$L^* \ge \frac{1}{K!} \sum_{\gamma} \sum_{i=1}^{K} \ \sum_{P \subset \{\gamma_i, \ldots, \gamma_K\} : \gamma_i \in P} \ \sum_{Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\}} c^Q_P.$$
For each choice of P, Q ⊂ [K] with P ∩ Q = ∅, we now count the number of times c Q P appears in this sum. For a given γ, the inner summations include the term c Q P if and only if the following holds:
$$\gamma_i \in P, \quad P \subset \{\gamma_i, \ldots, \gamma_K\}, \quad Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\} \quad \text{for some } i \in [K],$$
i.e., if we consider the elements γ 1 , . . . , γ K in that order, the first element from P ∪ Q to be observed in this sequence belongs to P. Thus, for a given pair P, Q, the probability that a permutation γ chosen uniformly at random includes the term c Q P in the inner summation is |P|/(|P| + |Q|). Therefore, the average of the lower bound in Proposition 3 over all possible γ is
$$\sum_{P \subset [K]} \sum_{Q \subset [K] \setminus P} \frac{|P|}{|P| + |Q|} \, c^Q_P,$$
which is exactly the bound in Theorem 4. As the bound in Theorem 4 is obtained by averaging over all γ, instead of maximizing over all γ, we conclude that it is in general weaker than the α-bound of index coding. The two bounds are equal if and only if the bound in Proposition 3 has the same value for every permutation γ.
Although weaker in general, we note that the bound of Theorem 4 is easier to use than the α-bound. As demonstrated by (2), in order to use Theorem 4, we only need to know, for each information bit, the number of nodes that contain this bit and the number of nodes that demand this bit. In comparison, this information is insufficient to evaluate the α-bound, which also requires the identities of these nodes.
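The averaging argument can be checked numerically. The following Python sketch evaluates the permutation-dependent count of Proposition 3 for every permutation of a small hypothetical instance, and compares its exact average with the weighted sum $\sum_{P,Q} \frac{|P|}{|P|+|Q|} c^Q_P$. The instance and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def prop3_sum(c, gamma):
    """Bits counted for gamma: classes (P, Q) whose first node in gamma is in P."""
    total = 0
    for (P, Q), bits in c.items():
        first = next(node for node in gamma if node in P or node in Q)
        if first in P:
            total += bits
    return total


def average_over_permutations(c, K):
    """Exact average of prop3_sum over all K! permutations."""
    perms = list(permutations(range(1, K + 1)))
    return Fraction(sum(prop3_sum(c, g) for g in perms), len(perms))


def max_over_permutations(c, K):
    """Maximum of prop3_sum over all K! permutations."""
    return max(prop3_sum(c, g) for g in permutations(range(1, K + 1)))


def weighted_sum(c):
    """Sum over (P, Q) of |P| / (|P| + |Q|) * c[(P, Q)]."""
    return sum(Fraction(len(P), len(P) + len(Q)) * bits
               for (P, Q), bits in c.items())


# Hypothetical instance with K = 4 clients; keys are (P, Q), values are bit counts.
c = {
    ((1,), (2,)): 4,
    ((2, 3), (4,)): 6,
    ((4,), (1, 2, 3)): 8,
    ((1,), ()): 1,
}
```

Here both `average_over_permutations(c, 4)` and `weighted_sum(c)` equal 9, while the maximum over permutations is at least as large, in line with the comparison between the averaged bound and the α-bound.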

On the Tightness of Theorem 4
We now consider the class of unicast problems, i.e., problems where each bit is demanded by exactly one of the nodes. For this class of problems, we characterize when Theorem 4 yields a tight bound; this characterization is Theorem 6, and its proof is given in Appendix F. When the lower bound of Theorem 4 is tight, the clique-covering based index coding scheme (see [26,27]) yields the optimal communication cost.
Our main result in Theorem 1, or equivalently, Theorem 4, does not provide a tight lower bound for the centralized data shuffling problem [6], because this problem involves scenarios that do not satisfy the tightness condition of Theorem 6. For instance, consider the simple canonical data shuffling setting, where the system has exactly K files, all of equal size F bits, and each node stores exactly one of these files, i.e., the entirety of the contents of the kth node C k is the kth file. Here, |C k | = F for all k ∈ [K], and C i ∩ C j = ∅ for all i ≠ j. Assume that the shuffling problem is to move the file C k+1 to node k, i.e., D k = C k+1 , where we consider the index K + 1 to be equal to 1. This is a worst-case demand for data shuffling incurring the largest possible communication cost. For this set of demands, we have $c^{\{k+1\}}_{\{k\}} = F$ for every k ∈ [K], and $c^Q_P = 0$ for all other choices of P and Q. Theorem 4 then gives L * ≥ KF/2, whereas Proposition 3 with γ = (1, . . . , K) gives L * ≥ (K − 1)F. Therefore, for K ≥ 3, our lower bound is strictly less than L * for this data shuffling problem, and is therefore not tight.
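The gap in this canonical example can be reproduced with a few lines of Python. The sketch below builds the cyclic instance (node k demands the file held only by node k + 1, indices taken modulo K), evaluates the averaged bound, in which every bit has weight 1/2 since |P| = |Q| = 1, and compares it with the best permutation-based count of Proposition 3 found by exhaustive search. The function names and the value of F are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def cycle_instance(K, F):
    """Map (k, holder) -> F: node k demands the F-bit file stored only at
    node holder = k + 1, with index K + 1 identified with 1."""
    return {(k, k % K + 1): F for k in range(1, K + 1)}


def averaged_bound(c):
    """Averaged (Theorem 4 type) bound: each bit has |P| = |Q| = 1, weight 1/2."""
    return sum(Fraction(1, 2) * bits for bits in c.values())


def permutation_bound(c, K):
    """Best Proposition 3 type count: a file is counted when its demanding
    node appears before its holder in the permutation."""
    best = 0
    for gamma in permutations(range(1, K + 1)):
        pos = {node: i for i, node in enumerate(gamma)}
        value = sum(bits for (k, holder), bits in c.items()
                    if pos[k] < pos[holder])
        best = max(best, value)
    return best
```

For K = 3, 4, 5 and F = 10, the averaged bound is KF/2 while the permutation-based bound is (K − 1)F, so the two differ whenever K ≥ 3.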

Relationship to Other Index Coding Settings
We now comment on the application of our data exchange bound to a couple of other important index coding settings known in literature, (a) distributed index coding studied in [10] (which is equivalent to the cooperative multi-sender index coding setting considered in [11]), and (b) embedded index coding, presented in [12].

Distributed Index Coding
In [10], the authors consider a generalization of the single-server index coding problem (which we studied in Section 5) called distributed index coding. The specific setting in [10] is as follows. There are n messages denoted by x j : j ∈ [n], where $x_j \in \{0, 1\}^{t_j}$ (for some positive integer t j ). There is a corresponding set of n receivers indexed by [n]. The receiver j ∈ [n] contains as side information the subset of messages indexed by A j ⊂ [n] (i.e., receiver j knows {x i : i ∈ A j }) and demands the message x j . There are $2^n - 1$ servers in the system, indexed by the sets $\mathcal{J} = \{J : J \subset [n], J \neq \emptyset\}$. The server J contains the messages {x i : i ∈ J}. The servers do not demand any messages and are responsible only for transmissions that satisfy the receivers. The server J is connected to the n receivers via a broadcast link with capacity C J bits. In order to satisfy the demands, each server J sends a message $y_J \in \{0, 1\}^{s_J}$ to all the receivers, where s J is some positive integer.

Definition 3 ([10]). The rate-capacity tuple ((R j : j ∈ [n]), (C J : J ∈ J )) is said to be achievable if there exists some positive integer r such that t j ≥ rR j , ∀j and s J ≤ rC J , ∀J, and there exist valid encoding functions (encoding the messages of lengths (t j : j ∈ [n]) into codewords of lengths (s J : J ∈ J )) and decoding functions, such that all receivers can decode their respective demands.
Slightly abusing Definition 2, for some T ⊂ [n], we call a set S ⊂ T of message indices a generalized independent set of T if, for every subset S′ ⊂ S, there is some j ∈ S′ such that A j ∩ (S′ \ j) = ∅.
Let ((R j : j ∈ [n]), (C J : J ∈ J )) be an achievable rate-capacity tuple. For any nonempty subset T ⊂ [n], let S T be a generalized independent set of T. Corollary 2 of [10] bounds the sum rate ∑ j∈S T R j in terms of the link capacities (C J : J ∈ J ); we refer to this bound as (44).

Remark 6. The above bound in (44) is given in [10] using the terminology of the side-information graph defining the index coding problem and its acyclic induced subgraphs. However, we have used generalized independent sets to state the same bound. The reader can easily confirm that an acyclic induced subgraph of the side-information graph as defined in [10] is the same as a generalized independent set as used in this work. Therefore, (44) is the same as the bound in Corollary 2 of [10].
Consider the quantity $\max_{S} \sum_{j \in S} R_j$, where the maximization is over all generalized independent sets S of [n]. Then, we have by (44),
$$\max_{S} \sum_{j \in S} R_j \le \sum_{J \in \mathcal{J}} C_J. \tag{45}$$
In order to relate the bound in (44) with our data exchange bound, we fix R j = t j , ∀j ∈ [n]. This means that we should have r = 1 in Definition 3. For these parameters, let s * J : J ∈ J be a choice of integers s J : J ∈ J such that the rate-capacity tuple ((R j = t j : j ∈ [n]), (s J : J ∈ J )) is achievable and ∑ J∈J s J is minimized. Note that such integers s * J : J ∈ J will exist, as each index coding problem has at least one solution, namely, the trivial solution consisting of uncoded transmissions of x j : j ∈ [n].
Then, applying (45), we have
$$\max_{S} \sum_{j \in S} t_j \le \sum_{J \in \mathcal{J}} s^*_J. \tag{46}$$
Note that ∑ J∈J s * J is exactly the minimum number of bits to be communicated by the servers for satisfying receiver demands. By arguments similar to those in the proof of Theorem 5, we can verify that the left hand side of (46) can be expressed as a maximization over all possible permutations γ = (γ 1 , . . . , γ n ) of (1, . . . , n); this is (47). By (46) and (47), we thus have a permutation-based lower bound (48) on ∑ J∈J s * J . Finally, we apply our data exchange bound in Theorem 1 to the distributed index coding setting. To do this, we first observe that if we replaced all servers by a single "virtual" central server containing all the messages x j : j ∈ [n], then ∑ J∈J s * J is at least the minimum number of bits to be transmitted by this virtual central server to satisfy the receiver demands. Any lower bound on the communication cost for this transformed setting with the virtual server will thus continue to apply for the original distributed setting with messages of length t j : j ∈ [n]. Now, utilizing the centralized version of Theorem 1 shown in Theorem 4 and the discussion in Section 5.2, we get a second lower bound (49) on ∑ J∈J s * J . Therefore, we see that the generalized independent set based bound in (48) is in general better than (49), as (48) involves a maximization over all permutations γ, while (49) involves the average.

Embedded Index Coding
We now consider the embedded index coding problem, introduced in [12], motivated by device-to-device communications. The embedded index coding setting consists of a set of m data blocks (each a binary vector of length t) distributed across a set of n nodes. Each node stores (as side information) a subset of the data blocks and demands another subset which it does not already have. This setting is different from [26,27] or distributed index coding [10], as there are no dedicated servers by default here. Each node transmits a codeword obtained by encoding its data blocks, and each demanded data block at any node is decoded from the codewords obtained from other nodes and the side information at the node itself. An embedded index code consists of a collection of such encoding functions and decoding functions at the nodes, such that all demanded blocks are decoded at the respective nodes. The communication cost of embedded index coding is the total number of bits transmitted between the nodes to satisfy the node demands. The work [12] generalizes the notion of minrank [26] of single-server index codes to define the optimal length of linear embedded index codes. Further, the authors also present heuristic constructions for general and specialized linear codes which have some nice properties.
As the embedded index coding problem clearly has a direct mapping with the data exchange problem considered in the present work, we can apply our data exchange bound directly to obtain a new lower bound for the communication cost of embedded index coding. The expression of this bound would be in the same form (up to only the change in notation) as Theorem 1 itself. As our bound holds in the information-theoretic sense, it would apply to not just the linear codes considered in [12] but nonlinear embedded index codes as well.

Conclusions
We have presented an information theoretic converse result for a generic data exchange problem, where the terminals contain some data in their local storage and want other data available at the local storage of other nodes. As a number of recently studied multi-terminal communication problems fall under this setting, we have used our general converse to obtain converses in many such settings, thus recovering many existing results and presenting some new results as well. Using a connection with index coding, we also presented some ideas on why and when our data exchange based converse can be loose in the index coding setting. It would be quite interesting to see if our converse result can be tightened further while still retaining a closed form expression, so as to cover all known bounds for any existing setting that can be modeled in the data exchange framework. A lower bound for the communication load in a generic data exchange setting in the presence of coded storage bits would also be a prospective direction for future research in this area.

Acknowledgments: The authors are thankful to the anonymous reviewers and the academic editor for their careful reading of the manuscript and for comments that helped improve its quality.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
We assume that all the bits in the collection B as in Definition 1 are i.i.d. and uniformly distributed on {0, 1}. For a given communication scheme for the given data exchange problem, let $X_i \triangleq \phi_i(C_i)$ represent the codeword transmitted by node i. For a subset S ⊂ [K], let $X_S \triangleq \cup_{i \in S} X_i$. Furthermore, let $Y_S = \cup_{i \in S} (D_i \cup C_i)$. We first prove the following claim. Applying S = [K] to the above claim then gives Theorem 1, as L * ≥ H(X [K] ). Now, we prove Claim A1. For this, we use induction on |S|.
We take the base case to be |S| = 2, as for |S| = 1 the problem of data exchange is not well defined. Let S = {1, 2} without loss of generality. Then, the LHS of (A1) gives where (A2) follows as conditioning reduces entropy and H(X 1 |C 1 ) = 0, (A3) is true as H(D 2 |Y S , C 2 , X 1 ) = 0 and H(D 1 |Y S , C 1 , X 2 ) = 0. This proves the base case. We now assume that the statement is true for |S| = t − 1, and prove that it holds for |S| = t. We have the LHS of (A1) satisfying the following relationships for |S| = t.
where (A6) follows because H(X k |C k ) = 0. In (A7), we introduce D k freely, because where the last two statements follow because H(X S |C S ) = 0 and from the decoding condition, respectively. We now interpret the two terms of (A8). For the first term, we have where the last statement follows by noting that for a fixed choice of P, Q we have |P| choices for (k, P′) such that P′ ∪ {k} = P.
Now, using the induction hypothesis for the last term of (A8), where the above follows by noting that for a fixed choice of disjoint subsets P, Q of S, we have |S| − |P| − |Q| choices for k such that P ⊂ S \ k and Q ⊂ S \ (P ∪ k). Using (A9) and (A11) we have RHS of (A8) thus proving Claim A1, which also concludes the proof of the theorem.

Appendix B. Proof of (11)
We proceed according to the three steps in Section 1.2, however with some important variations that are required to prove (11).
Applying Theorem 1 and symmetrizing: As in the proof of Theorem 2, we use the index 0 to represent the server. Consider that, for the given placement scheme ζ, the delivery scheme is designed so that the optimal communication load L * c (N u , ζ) is achieved.
Let $\mathcal{A} = \binom{[K]}{N_u}$ be the set of all N u -sized subsets of clients. Consider the coded caching subproblem induced by the server and a set A ∈ 𝒜 of clients. Consider some demand vector d = (d 1 , . . . , d K ) such that the demands of the clients in A are distinct, i.e., d i ≠ d j for i, j ∈ A, i ≠ j.
Let Ω consist of the N − 1 cyclic permutations of (1, . . . , N) which are N-cycles, along with the identity permutation. For σ ∈ Ω, let σ(d) denote the demand vector in which the ith component is exactly the value obtained by applying the permutation σ on the ith component of d, i.e., σ(d) i = σ(d i ).
Clearly, for each σ ∈ Ω, we have L * c (N u , ζ) ≥ L * A (σ(d)), where L * A (σ(d)) is the optimal communication load for this subproblem with demands σ(d) with respect to the placement ζ. Now, for each σ ∈ Ω, following similar steps as in the proof of Theorem 2, we can use our data exchange bound in Theorem 1 to obtain (A12). By averaging (A12) across the N demand vectors in {σ(d) : σ ∈ Ω}, we get (A13). For each A ∈ 𝒜, the bound (A13) holds. Averaging these bounds over all A ∈ 𝒜, we obtain a bound in which the sets A ∈ 𝒜 are grouped according to the size of A ∩ Q. For some Q ⊂ [K], to obtain an A ∈ 𝒜 such that |A ∩ Q| = q, we have to choose q elements from Q and N u − q elements from outside Q. Thus, we have
$$|\{A \in \mathcal{A} : |A \cap Q| = q\}| = \binom{|Q|}{q} \binom{K - |Q|}{N_u - q}.$$
Using (A20) and (A19) in (A17), we get (A21), where g Nu is the lower convex envelope as defined in Remark 2.
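The counting fact used above (choose q elements from Q and N u − q elements from outside Q) is a standard identity and is easy to confirm by enumeration. The following Python sketch is only an illustration; the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_subsets(K, N_u, Q, q):
    """Count by enumeration the N_u-subsets A of {1, ..., K} with |A ∩ Q| = q."""
    Q = set(Q)
    return sum(1 for A in combinations(range(1, K + 1), N_u)
               if len(Q.intersection(A)) == q)


def closed_form(K, N_u, Q_size, q):
    """binom(|Q|, q) * binom(K - |Q|, N_u - q)."""
    return comb(Q_size, q) * comb(K - Q_size, N_u - q)
```

For example, with K = 6, N u = 3 and Q = {1, 2, 4}, the enumeration and the closed form agree for every q from 0 to 3, and summing over q recovers the total number of N u -subsets.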

Using the constraints to revise (A21):
Observe that ∑ Q⊂[K] a Q = NF. We assume without loss of generality that all caches are completely populated with the bits of the file library (as we are bounding the optimal load), and thus we have ∑ Q⊂[K] |Q|a Q = tNF. By using these and applying Jensen's inequality to (A21), we have (11). This completes the proof.

Appendix C. Proof of (21)
The proof uses the same approach as in Appendix B, but takes into account the fact that there is no central server and the cache sizes are heterogeneous. The choice of the set of demand vectors used for symmetrization is the same as in Appendix B.
Applying Theorem 1 and symmetrizing: Let the sets 𝒜 and A ∈ 𝒜, the set of cyclic permutations Ω, the demand vector d, and the optimal communication loads L * c (N u , ζ) and L * A (σ(d)) be defined as in Appendix B. Throughout this proof we will assume N u = min{N, K}. Applying our main result Theorem 1 to the subproblem induced by the demands of the subset of nodes A, we obtain
$$L^*_A(\sigma(d)) \ge \sum_{k \in A} \sum_{Q \subset [K] \setminus k} \frac{1}{|Q|} \, c^Q_k(\sigma(d)),$$
where c Q k (σ(d)) is the number of bits of the file $W_{(\sigma(d))_k}$ demanded by node k that are available exclusively in all nodes in Q and not available in [K] \ Q. Let a Q be the total number of bits stored exclusively in the nodes Q. Then, considering the set of N demands {σ(d) : σ ∈ Ω}, we have ∑ σ∈Ω c Q k (σ(d)) = a Q .
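Averaging over all possible σ, we obtain
$$L^*_c(N_u, \zeta) \ge \frac{1}{N} \sum_{k \in A} \sum_{Q \subset [K] \setminus k} \frac{1}{|Q|} \, a^Q.$$
Again, averaging this inequality over all possible choices of $A \in \mathcal{A} = \binom{[K]}{N_u}$, we obtain
$$L^*_c(N_u, \zeta) \ge \frac{1}{N \binom{K}{N_u}} \sum_{A \in \mathcal{A}} \sum_{k \in A} \sum_{Q \subset [K] \setminus k} \frac{1}{|Q|} \, a^Q. \tag{A22}$$
In (A22), for a given choice of Q ⊂ [K], the term (1/|Q|) a Q appears in the summation for every choice of (A, k) such that k ∉ Q, |A| = N u , k ∈ A. Therefore, the number of times (1/|Q|) a Q appears in (A22) is the number of such choices of (A, k), which is $(K - |Q|) \binom{K-1}{N_u - 1}$. Hence, we arrive at
$$L^*_c(N_u, \zeta) \ge \frac{\binom{K-1}{N_u-1}}{N \binom{K}{N_u}} \sum_{Q \subset [K]} \frac{K - |Q|}{|Q|} \, a^Q = \frac{N_u}{NK} \sum_{Q \subset [K]} \frac{K - |Q|}{|Q|} \, a^Q. \tag{A23}$$
Using the constraints to revise (A23): We use the observations that (K − x)/x is convex and decreasing in x > 0, that $\sum_{Q \subset [K]} \frac{a^Q}{NF} = 1$, and that $\sum_{Q \subset [K]} |Q| a^Q \le \sum_{i=1}^{t} K_{T_i} \gamma_{T_i} NF$, since this is the total available cache memory across all nodes. Applying Jensen's inequality on (A23) using these constraints, we obtain (21). This completes the proof.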
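The count of pairs (A, k) used in passing from (A22) to (A23) can be confirmed by enumeration. The Python sketch below is an illustration with hypothetical function names. It also checks the simplification $\binom{K-1}{N_u-1} / \binom{K}{N_u} = N_u / K$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def count_pairs(K, N_u, Q):
    """Number of pairs (A, k) with |A| = N_u, k in A, and k not in Q."""
    Q = set(Q)
    return sum(1
               for A in combinations(range(1, K + 1), N_u)
               for k in A
               if k not in Q)


def closed_form(K, N_u, Q_size):
    """(K - |Q|) * binom(K - 1, N_u - 1)."""
    return (K - Q_size) * comb(K - 1, N_u - 1)


def ratio(K, N_u):
    """binom(K - 1, N_u - 1) / binom(K, N_u) as an exact fraction."""
    return Fraction(comb(K - 1, N_u - 1), comb(K, N_u))
```

For instance, with K = 5, N u = 3 and Q = {1, 2}, there are 3 choices of k outside Q and 6 sets A containing each of them, giving 18 pairs.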

Appendix E. Proof of Theorem 5
We prove this theorem by showing that α is both upper and lower bounded by the right hand side of (43).
Upper Bound: Assume that H is a largest generalized independent set. We will now identify a permutation π = (π 1 , . . . , π K ) corresponding to H. Let I 1 = H, and observe that as I 1 is itself a subset of H, it must contain an information bit, say w Q P,m , that is demanded by a node, say π 1 , and none of the bits in I 1 \ {w Q P,m } is available as side information at π 1 . For k = 2, . . . , K, we sequentially identify π k as follows. We first define
$$I_k = H \setminus \bigcup_{i < k} \ \bigcup_{P : \pi_i \in P} \ \bigcup_{Q \subset [K] \setminus P} \{ w^Q_{P,m} : m = 1, \ldots, c^Q_P \},$$
which is H minus all the bits demanded by any of π 1 , . . . , π k−1 . Thus, any bit in I k is demanded by one or more of the nodes in [K] \ {π 1 , . . . , π k−1 }. As I k ⊂ H, it contains an information bit such that this bit is demanded by a node, say π k ∈ [K] \ {π 1 , . . . , π k−1 }, and the rest of I k is not available as side information at π k .
Observe that H = I 1 ⊃ I 2 ⊃ · · · ⊃ I K , and I k \ I k+1 is the set of bits in H that are demanded by π k but not by any of the nodes in π 1 , . . . , π k−1 . Thus, I 1 \ I 2 , I 2 \ I 3 , . . . , I K−1 \ I K , I K form a partition of H. Here, we have abused the notation to denote I K by I K \ I K+1 . We also observe that for any choice of k′, none of the bits of I k′ is available as side information at π k′ . If k > k′, as I k ⊂ I k′ , we deduce that none of the bits in I k is available as side information at π k′ . Thus, we conclude that each bit in I k \ I k+1 is demanded by π k and is neither demanded by nor available as side information at any of π 1 , . . . , π k−1 . Therefore, |I k \ I k+1 | is upper bounded by the number of bits demanded by π k and possibly some subset of {π k+1 , . . . , π K }, and which are available exclusively at some subset of {π k+1 , . . . , π K }, i.e.,
$$|I_k \setminus I_{k+1}| \le \sum_{P \subset \{\pi_k, \ldots, \pi_K\} : \pi_k \in P} \ \sum_{Q \subset \{\pi_{k+1}, \ldots, \pi_K\}} c^Q_P.$$
This provides us the following upper bound,
$$\alpha = |H| = \sum_{k=1}^{K} |I_k \setminus I_{k+1}| \le \sum_{k=1}^{K} \ \sum_{P \subset \{\pi_k, \ldots, \pi_K\} : \pi_k \in P} \ \sum_{Q \subset \{\pi_{k+1}, \ldots, \pi_K\}} c^Q_P \le \max_{\gamma} \sum_{i=1}^{K} \ \sum_{P \subset \{\gamma_i, \ldots, \gamma_K\} : \gamma_i \in P} \ \sum_{Q \subset \{\gamma_{i+1}, \ldots, \gamma_K\}} c^Q_P.$$
Lower Bound: Let γ be any permutation of [K], and let H be the set of all bits w Q P,m for which γ i ∈ P, P ⊂ {γ i , . . . , γ K } and Q ⊂ {γ i+1 , . . . , γ K } for some i ∈ [K], so that |H| equals the value of the sum in (43) for this γ. To show that H is a generalized independent set, consider any subset I ⊂ H. Let k be the smallest integer such that I contains an information bit w Q P,m with γ k ∈ P, i.e., k is the smallest integer such that γ k demands some information bit in I. Therefore, any other bit w Q′ P′,m′ in I must satisfy P′ ⊂ {γ k′ , . . . , γ K }, Q′ ⊂ {γ k′+1 , . . . , γ K } for some k′ ≥ k.
Clearly, this bit is not available as side information at γ k . Thus, H is a generalized independent set, and hence α ≥ |H|. Choosing γ to attain the maximum in (43) gives the lower bound.

Appendix F. Proof of Theorem 6
For unicast problems c Q P > 0 only if |P| = 1. We abuse the notation mildly and use c Q k to denote c Q {k} . The Necessity Part: The lower bound of Proposition 3 for unicast problems is ∑ K i=1 ∑ Q⊂{γ i+1 ,...,γ K } c Q γ i . For brevity, we will denote this sum as f (γ). For Theorem 4 to be tight it is necessary that the bound of this theorem be equal to α, i.e., the value of f be the same for all permutations γ.