Efficient Algorithms for Coded Multicasting in Heterogeneous Caching Networks

Coded multicasting has been shown to be a promising approach to significantly improve the performance of content delivery networks with multiple caches downstream of a common multicast link. However, the schemes that have been shown to achieve order-optimal performance require content items to be partitioned into a number of packets that grows exponentially with the number of caches, leading to codes of exponential complexity that jeopardize their promising performance benefits. In this paper, we address this crucial performance-complexity tradeoff in a heterogeneous caching network setting, where edge caches with possibly different storage capacity collect multiple content requests that may follow distinct demand distributions. We extend the asymptotic (in the number of packets per file) analysis of shared link caching networks to heterogeneous network settings, and present novel coded multicast schemes, based on local graph coloring, that exhibit polynomial-time complexity in all the system parameters, while preserving the asymptotically proven multiplicative caching gain even for finite file packetization. We further demonstrate that the packetization order (the number of packets each file is split into) can be traded off with the number of requests collected by each cache, while preserving the same multiplicative caching gain. Simulation results confirm the superiority of the proposed schemes and illustrate the interesting request aggregation vs. packetization order tradeoff within several practical settings. Our results provide a compelling step towards the practical achievability of the promising multiplicative caching gain in next generation access networks.


Introduction
Recent information-theoretic studies have characterized the fundamental limiting performance of several caching networks of practical relevance, in which throughput scales linearly with cache size, showing great promise to accommodate the exponential traffic growth experienced in today's communication networks [50]. In this context, a caching scheme is defined in terms of two phases: the cache placement phase, which operates at a large time-scale and determines the content to be placed at the network caches, and the delivery phase, during which user requests are served from the content caches and sources in the network. Some of the network topologies studied include shared link caching networks [1,2,8-14], device-to-device (D2D) caching networks [17-19,33,34], hierarchical caching networks [24], multi-server caching networks [29], and combination caching networks [36-42].
Consider a network with one source (e.g., base station) having access to $m$ files, and $n$ users (e.g., small-cell base stations or end user devices), each with a cache memory of $M$ files. In [17], the authors showed that if the users can communicate with each other via D2D communications, a simple distributed random caching policy and TDMA-based unicast D2D delivery achieves the order-optimal throughput $\Theta\left(\max\left\{\frac{M}{m}, \frac{1}{m}, \frac{1}{n}\right\}\right)$, whose linear scaling with $M$ when $Mn \geq m$ exhibits a remarkable multiplicative caching gain, in the sense that the per-user throughput grows proportionally to the cache size $M$ for fixed library size $m$, and it is independent of the number of users $n$ in the system. Moreover, in this scheme each user caches entire files without the need for partitioning files into packets, and missing files are delivered via unicast transmissions between neighbor nodes, making it efficiently implementable in practice. We recall that order-optimality refers to the fact that the multiplicative gap between information-theoretic converse and achievable performance can be bounded by a constant number when $m, n \to \infty$.
In the case that users cannot communicate with each other, but share a multicast link from the content source, the authors in [8,9] showed that the use of coded multicasting (also referred to as index coding [51]) allows achieving the same order-optimal worst-case throughput as in the D2D caching network. In this case, however, in order to create enough coding opportunities during the delivery phase, requested files are required to be partitioned into a number of packets that grows exponentially with the number of users, leading to coding schemes of exponential complexity [8,9,21].
In [10,12], the authors considered the same shared link caching network, but under random demands characterized by a probability distribution, and proposed a scheme consisting of random aggregate popularity (RAP) placement and chromatic number index coding (CIC) delivery, referred to as RAP-CIC, which was proved to be order-optimal in terms of average throughput. The authors further provided optimal average rate scaling laws under Zipf [52] demand distributions, whose analytical characterization required resorting to a polynomial-time approximation of CIC, referred to as greedy constrained coloring (GCC). Using RAP-GCC, the authors further established the regions of the system parameters in which multiplicative caching gains are potentially achievable. While GCC exhibits polynomial complexity in the number of users and packets, the order-optimal performance guarantee still requires, in general, the packetization order (number of packets per file) to grow exponentially with the number of users, as shown in [21].
It is then key to understand if the promised multiplicative caching gain, shown to be asymptotically achievable by the above-referenced schemes, can be preserved in practical settings of finite packetization order. In this context, we shall differentiate between coded multicast schemes that assume a deterministic vs. a random cache placement phase. Deterministic placement policies determine where to store file packets according to a deterministic procedure that takes into account the ID of each packet. In contrast, random placement policies, after determining the number of packets to be cached of each file at each cache, choose the exact packet IDs uniformly at random. While the increased structure of deterministic placement policies can be exploited to design more efficient coded multicast algorithms, random placement policies are desirable in practice, as they provide increased robustness by requiring fewer cache configuration changes under system dynamics.
The seminal work of [21] showed that all previously proposed schemes (based on both deterministic and random cache placement) required exponential packetization, and that under random placement, no graph-coloring-based coded multicast algorithm can achieve multiplicative caching gains with sub-exponential packetization. Since the fundamental results of [21], several works have studied the now central problem in caching of finite file packetization. The authors in [53] connect the caching problem to resolvable combinatorial designs and derive a scheme that, while improving exponentially over previous schemes [8,9,21], still requires exponential packetization. In [54], the authors introduce the combinatorial concept of Placement Delivery Array (PDA) and derive a caching scheme where the packetization scales super-polynomially with the number of users. The work in [22] establishes a connection with the construction of hypergraphs with extremal properties, and provides the first sub-exponential (but still intractable) scheme. Somewhat surprisingly, some of the authors of [21] introduced a new combinatorial design based on Ruzsa-Szemerédi graphs in [30] and showed that a linear scaling of the number of packets per file with $n$ can be achieved for a throughput of $\Theta(n^{-\delta})$, where $\delta$ can be arbitrarily small. However, all the above studies focus on coded multicast algorithms that assume a deterministic cache placement phase. Under random cache placement, several coded multicast algorithms have been proposed in the context of homogeneous shared link caching networks [55-60], including our previous work that serves as the basis for this paper.
In this work, we address the important problem of finite-length coded multicasting under random cache placement, focusing on a more general heterogeneous shared link caching network, in which caches with possibly different sizes collect possibly multiple requests according to possibly different demand distributions (see Figure 1). As shown in Figure 1, this scenario can be motivated by the presence of both end user caches and cache-enabled small-cell base stations or WLAN access points sharing a common multicast link. In this case, each small-cell base station can be modeled as a user cache placing multiple requests. In addition, multiple requests per user also arise in the presence of delay-tolerant content requests (e.g., file downloading). While there have been several information-theoretic studies of shared link caching networks with distinct cache sizes [61-63], and with multiple per-user requests [13,14,34,64,65], none of these works considered the finite-length regime nor addressed the joint effect of random demands, heterogeneous cache sizes, and multiple per-user requests.
The contributions of this paper are as follows:
1. We provide a generalized model for heterogeneous shared link caching networks, in which users can have different cache sizes and make different numbers of requests according to different demand distributions.
2. We design two novel coded multicast algorithms based on local graph coloring, referred to as Greedy Local Coloring (GLC) and Hierarchical Greedy Local Coloring (HgLC), that exhibit polynomial-time complexity in both the number of caches and the packetization order. In combination with the Random Aggregate Popularity (RAP) placement policy of [10,12], we show that the overall schemes RAP-GLC and RAP-HgLC are order-optimal in the asymptotic file-length regime.
3. Focusing on the finite-length regime, in which content items can be partitioned into a finite number of packets, we show how the general advantage of local graph coloring is especially relevant when the number of per-user requests grows. We validate via simulations the superiority of RAP-GLC, especially with a high number of per-user requests. We then show how RAP-HgLC, with a slight increase in the polynomial complexity order, further improves the caching gain of RAP-GLC, remarkably approaching the multiplicative gain that existing schemes can only guarantee in the asymptotic file-length regime.
4. We demonstrate that there is a tradeoff between the required packetization order and the number of requested files per user. In particular, for a given target gain, if the number of requests increases, then the number of packets per file can be reduced, while preserving the target gain. We further quantify the regime of per-user requests for which a caching scheme with unit packetization order (i.e., a scheme that treats only whole files) is order-optimal. Our analysis illustrates the key impact of content request aggregation in time and space on caching performance. That is, if edge caches can wait to collect multiple requests over time and/or aggregate requests from multiple users, the same performance can be achieved with lower packetization order, and hence lower computational complexity.
The paper is organized as follows. Section 2 introduces the network model and problem formulation. Section 3 describes the construction of coded multicast algorithms using graph coloring, with special focus on the advantages of local graph coloring. Section 4 presents novel polynomial-time local-graph-coloring-based coded multicast schemes. Section 5 analyzes the effect of request aggregation on the performance-complexity tradeoff. Section 6 presents simulation results and related discussions. Finally, concluding remarks are given in Section 7.

Network Model and Problem Formulation
We consider a caching network formed by a source node with access to a content library, connected to several caching nodes/users via a single shared (multicast) link. Similar to previous works [8-10,12-14,21,22,30], we define a caching scheme in terms of two phases:
• Placement phase, which operates at a large time-scale and determines the content to be placed at the caching nodes;
• Delivery phase, during which user requests are served from the content caches and sources in the network.
However, differently from previous works, we generalize the model to a heterogeneous system in which each caching node has a possibly different cache size and requests a possibly different number of files. A practical example of our setting can be represented by a macro base station connected to several cache-enabled small-cell base stations, and a number of user devices served either by the macro base station or by the small-cell base stations. In this setting, each small cell acts as a super user requesting multiple files resulting from the requests of the users it serves.
Specifically, the heterogeneous caching network consists of a single source node storing a library of files $\mathcal{F} = \{1, \dots, m\}$, each with entropy $F$ bits, and $n$ user nodes $\mathcal{U} = \{1, \dots, n\}$, each with a cache of storage capacity $M_u F$ bits (i.e., each user caches up to $M_u$ files). Each user $u$ can request $L_u$ ($1 \leq L_u \leq m$) different files according to its individual request probability distribution. We assume that the library files have finite length and consequently a finite packetization order. Our main objective is to design a caching scheme that minimizes the number of transmissions required to satisfy the demands of all users.
In a homogeneous network setting with infinite packetization order, recent works [8-10,12-14] have shown that it is possible to satisfy a scaling number of users with only a constant number of multicast transmissions. The achievable schemes configure user caches with complementary (side) information during the caching phase, such that the resulting coded multicasting opportunities that arise during the delivery phase can be used to minimize the transmission rate (or load) over the shared multicast link. Specifically, reference [12] showed that under Zipf file popularity, a properly optimized random fractional placement policy, referred to as Random Aggregate Popularity (RAP) caching, achieves order-optimality when combined with a graph-coloring-based coded multicast scheme. Unfortunately, even in the homogeneous setting, it was shown in [21] that a central limitation of all previous works is that they require infinite packetization order: all existing caching schemes achieve at most a factor of two gain when the packetization order is finite.
In this work, inspired by the fundamental throughput-delay-memory tradeoff derived in [21], our goal is to design computationally efficient schemes that provide good performance in the finite packetization regime. For the caching phase, (1) we restrict our placement policies to the class of random fractional schemes described in [9,10,12-14], proved to be order-optimal in the homogeneous setting. For the delivery phase, (2) we focus on the class of graph-coloring-based index coding schemes, and design two novel polynomial-time algorithms that employ local graph coloring on the (index coding) conflict graph [51].

Random Fractional Cache Placement
The class of random fractional placement schemes is described as follows:

1. Packetization: Each file is partitioned into $B$ packets of equal size $F/B$ bits, where the integer $B$ is referred to as the packetization order. Each packet is represented by a symbol in the finite field $\mathbb{F}_{2^{F/B}}$, where we assume that $F/B$ is large enough.
2. Random Placement: Each user $u$ caches $p_{f,u} M_u B$ packets independently at random from each file $f$, where $p_{f,u}$ is the probability that file $f$ is cached at user $u$, and satisfies $0 \leq p_{f,u} \leq 1/M_u$ and $\sum_{f=1}^{m} p_{f,u} = 1$.

We introduce a caching distribution matrix $P = [p_{f,u}] \in \mathbb{R}_{+}^{m \times n}$, where $f \in \mathcal{F}$ and $u \in \mathcal{U}$. Please note that the number of packets of file $f$ cached at user $u$, $p_{f,u} M_u B$, can be directly determined from the caching distribution matrix $P$. As described in [10,12-14], the caching distribution must be properly optimized to balance the gains from local cache hits (where requested packets are served by the local cache) and coded multicast opportunities (where requested packets are served by coded transmissions that simultaneously satisfy distinct user requests). When this is the case, we refer to the cache placement scheme as Random Aggregate Popularity (RAP) caching (see e.g., [10,12-14]). Given the number of packets to be cached of a given file, the actual indices of the packets to be cached are chosen uniformly at random, and independently across users. We use $C_{u,f}$ to denote the set of packets of file $f$ cached at user $u$ and $C = \{C_{u,f}\}$ with $u \in \mathcal{U}$ and $f \in \mathcal{F}$ to denote the packet-level cache placement realization.
The goal of the placement phase is to configure the user caches to create coding opportunities during the delivery phase that allow serving distinct user requests via common multicast transmissions.
Compared to deterministic placement [8], random placement schemes allow configuring user caches with lower complexity and increased robustness, i.e., changes in system parameters (e.g., number of users, number of files, file popularity) require fewer changes in users' cache configurations [12].
Recall that the placement phase operates at a much larger time-scale than the delivery phase, and hence is unaware of the requests in the subsequent delivery rounds. Therefore, the placement phase can be designed according to the demand distribution, but must be independent of the request realizations.
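The random fractional placement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_fractional_placement`, the nested-list layout of the caching distribution, the `seed` argument, and the choice to round the packet count down when $p_{f,u} M_u B$ is not an integer are all hypothetical.

```python
import random

def random_fractional_placement(P, M, B, seed=0):
    """Return cache[u][f]: the set of packet indices of file f stored at user u.

    P[f][u] : caching distribution p_{f,u}
    M[u]    : cache size of user u, in files
    B       : packetization order (packets per file)
    """
    rng = random.Random(seed)
    m, n = len(P), len(M)
    cache = []
    for u in range(n):
        per_file = []
        for f in range(m):
            # Number of packets of file f cached at user u: p_{f,u} * M_u * B.
            # Assumption: round down (with a small tolerance for float error).
            count = int(P[f][u] * M[u] * B + 1e-9)
            # Packet indices are chosen uniformly at random, independently per user.
            per_file.append(set(rng.sample(range(B), count)))
        cache.append(per_file)
    return cache

# Toy setting: 3 files, 3 users, M_u = 1, B = 3, uniform caching distribution.
m, n, B = 3, 3, 3
P = [[1 / 3] * n for _ in range(m)]
M = [1] * n
cache = random_fractional_placement(P, M, B)
print(cache)
```

Because the placement depends only on $P$, $M_u$, and $B$, the same function can be reused across delivery rounds without looking at any request realization.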

Random Multiple Requests
Each user $u \in \mathcal{U}$ requests $L_u$ ($1 \leq L_u \leq m$) files independently from other users, following a probability distribution $q_{f,u}$ with $q_{f,u} \in [0, 1]$ and $\sum_{f=1}^{m} q_{f,u} = 1$ (i.e., for each request of user $u$, file $f$ is chosen with probability $q_{f,u}$). We introduce a demand distribution matrix $Q = [q_{f,u}] \in \mathbb{R}_{+}^{m \times n}$, where $f \in \mathcal{F}$ and $u \in \mathcal{U}$. In the following, we use $W = \{W_{u,f}\}$, with $u \in \mathcal{U}$ and $f \in \mathcal{F}$, to denote the packet-level demand realization, where $W_{u,f}$ denotes the packets of file $f$ requested by user $u$.
The multiple-request parameters $\{L_u\}$ have a key operational meaning, in that they capture the possibility of edge caches to collect requests across time and space. That is, $L_u$ may represent the number of requests collected over time (given the delay tolerance of some content requests) as well as the number of requests collected across space from users served by the given edge cache (e.g., when edge caches are located at helper nodes or small-cell base stations serving multiple individual users).
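As an illustration of the demand model, the sketch below draws $L_u$ distinct files for each user from its own distribution $q_{\cdot,u}$. The text does not spell out how distinctness is enforced, so redrawing duplicates is an assumption of this sketch, as are the function name `sample_requests`, the Zipf exponent 0.8, and the requirement that each user has at least $L_u$ files with nonzero probability.

```python
import random

def sample_requests(Q, L, seed=0):
    """Return requests[u]: a list of L[u] distinct file indices for user u.

    Q[f][u] : demand distribution q_{f,u} (each column sums to 1)
    L[u]    : number of files requested by user u
    """
    rng = random.Random(seed)
    m, n = len(Q), len(L)
    requests = []
    for u in range(n):
        weights = [Q[f][u] for f in range(m)]
        chosen = []
        while len(chosen) < L[u]:
            f = rng.choices(range(m), weights=weights)[0]
            if f not in chosen:          # assumption: duplicates are redrawn
                chosen.append(f)
        requests.append(chosen)
    return requests

# Toy setting: Zipf popularity (exponent 0.8, hypothetical), shared by all users.
m, n = 6, 3
zipf = [1 / (f + 1) ** 0.8 for f in range(m)]
total = sum(zipf)
Q = [[zipf[f] / total] * n for f in range(m)]
L = [1, 2, 3]                            # heterogeneous numbers of requests
requests = sample_requests(Q, L)
print(requests)
```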

Performance Metric
For given realizations of the random fractional cache placement and the random multiple requests, the goal is to design a delivery scheme that minimizes the rate over the shared multicast link required to satisfy all user requests. Since one placement phase is followed by an arbitrarily large number of delivery rounds (each characterized by a new independent request realization), the rate (or load) of the system refers only to the delivery phase (i.e., asymptotically the cache placement costs no rate). Furthermore, it makes sense to consider the average rate, where averaging with respect to the users' request distribution takes on the meaning of a time-averaged rate, invoking an ergodicity argument.
At each request round, let $\mathbf{F} = \{\mathbf{f}_1, \mathbf{f}_2, \cdots, \mathbf{f}_n\}$ be the demand realization, where $\mathbf{f}_u = \{f_{1,u}, f_{2,u}, \cdots, f_{L_u,u}\}$, $u \in \mathcal{U}$. The source node computes a multicast codeword as a function of the library and the demand realization $\mathbf{F}$. We assume that the source node communicates to the user nodes through an error-free deterministic shared multicast link.
Given the demand realization $\mathbf{F}$, let the total number of bits transmitted by the source node be $J(\mathbf{F})$. We are interested in the average performance of the coded multicast scheme, and hence define the average rate (or load) as the number of transmitted bits normalized by the file size:

$$R = \frac{\mathbb{E}[J(\mathbf{F})]}{F},$$

where the expectation is over the random demand distribution.

Graph-Coloring-Based Coded Multicast Delivery
It is important to note that for given cache placement and demand realizations, the delivery phase of a caching scheme reduces to an index coding problem with a twist. The only difference with the conventional index coding problem introduced in [51] is that the cache information may contain part of (as opposed to entire) requested files, and that users may request multiple (as opposed to single) files. Nevertheless, as in index coding, the problem can still be represented by a conflict graph [10,12-14], where vertices represent requested packets, and an edge between two vertices indicates a conflict, in the sense that the packet represented by one vertex is not present in the cache of the user requesting the packet represented by the other vertex. By construction, packets with no conflict in the graph can be simultaneously transmitted via an XOR operation. Performing graph coloring on the conflict graph and transmitting the packets via proper XOR operations, according to the graph coloring, results in an achievable linear index coding scheme, which we refer to as a coded multicast scheme.
In the following, we first illustrate how to construct the conflict graph, then review classical linear index coding schemes, and finally describe our proposed graph-coloring-based coded multicast schemes.

Conflict Graph Construction
Given cache placement realization $C$ and demand realization $W$, the directed conflict graph $H^d_{C,W} = (V, E)$ can be constructed as follows: each vertex $v \in V$ represents a requested packet and is labeled by the pair $\{\rho(v), \mu(v)\}$, where $\rho(v)$ denotes the identity of the packet, and $\mu(v)$ the user requesting it. Hence, if a packet is requested by multiple users, such a packet is represented in as many vertices as the number of users requesting it. Such vertices have the same packet label $\rho(v)$, but different user label $\mu(v)$.
To better understand the rationale behind the conflict graph and its construction, note that for any two vertices $v_1$ and $v_2$ that are labeled as $\{\rho(v_1), \mu(v_1)\}$ and $\{\rho(v_2), \mu(v_2)\}$, respectively, we have the following three possible cases:
• $\rho(v_1) \neq \rho(v_2)$, $\mu(v_1) = \mu(v_2)$: This indicates that two different packets are requested by the same user. Then, $v_1$ and $v_2$ are mutually conflicting, in the sense that if sent within the same time-frequency resource they interfere with each other. Hence, in the conflict graph, they are connected with two directed edges, one in each direction, i.e., $(v_1, v_2) \in E$ and $(v_2, v_1) \in E$.
• $\rho(v_1) = \rho(v_2)$, $\mu(v_1) \neq \mu(v_2)$: This indicates that the same packet is requested by two different users. Then, $v_1$ and $v_2$ are not conflicting, and hence not connected in the conflict graph; i.e., $(v_1, v_2) \notin E$ and $(v_2, v_1) \notin E$.
• $\rho(v_1) \neq \rho(v_2)$, $\mu(v_1) \neq \mu(v_2)$: This indicates that two different packets are requested by two different users. In this case, if packet $\rho(v_1)$ is in the cache of user $\mu(v_2)$, then, even if $\rho(v_1)$ and $\rho(v_2)$ are sent within the same time-frequency resource, user $\mu(v_2)$ will not suffer from interference, since, using its cache information, it can cancel out the undesired packet $\rho(v_1)$. Hence, there is a directed edge from $v_2$ to $v_1$ only if $\rho(v_1)$ is not in the cache of user $\mu(v_2)$.
Based on the above construction, it follows that the number of interference dimensions faced by a given node is at most the number of its outgoing neighbors.
To illustrate the construction of the directed conflict graph H d C,W , we present the following example.

Example 1.
We consider a network with $n = 3$ users denoted as $U = \{1, 2, 3\}$ and $m = 3$ files denoted as $F = \{A, B, C\}$. We assume $M_u = 1$, $\forall u \in U$, and partition each file into three packets. For example, $A = \{A_1, A_2, A_3\}$. We let $p_{f,u} = 1/3$ for all $f \in F$ and $u \in U$, which means that one packet from each of A, B, C is stored in each user's cache. For the sake of notational convenience, we assume a symmetric caching realization, where the caching configuration $C$ is given by $C_{u,A} = \{A_u\}$, $C_{u,B} = \{B_u\}$, $C_{u,C} = \{C_u\}$. That is, the cache configuration of each user $u \in U$ is $C_u = \{A_u, B_u, C_u\}$. We let each user make two requests, i.e., $L_u = 2$ ($\forall u \in U$). Specifically, we let user 1 request A, B, user 2 request B, C, and user 3 request C, A, i.e., $\mathbf{f}_1 = \{A, B\}$, $\mathbf{f}_2 = \{B, C\}$, and $\mathbf{f}_3 = \{C, A\}$. The associated directed conflict graph is shown in Figure 2.
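The conflict graph of Example 1 can be reproduced with a short Python sketch. The edge rule used here is the one implied by the three cases of Section 3.1 (a vertex points to every vertex whose packet is different from its own and is not in its user's cache). The function name `build_conflict_graph` and the representation of a packet as a `(file, index)` tuple are assumptions made for illustration.

```python
from itertools import product

def build_conflict_graph(cache, requests, B):
    """Return (vertices, out_edges) of the directed conflict graph.

    cache[u]    : set of packets (file, index) stored at user u
    requests[u] : list of files requested by user u
    A vertex is the pair (rho, mu) = (packet, requesting user).
    """
    vertices = [
        ((f, i), u)
        for u, files in requests.items()
        for f in files
        for i in range(1, B + 1)
        if (f, i) not in cache[u]        # cached packets need no delivery
    ]
    out_edges = {v: set() for v in vertices}
    for v, w in product(vertices, repeat=2):
        (rho_v, mu_v), (rho_w, _) = v, w
        # Edge v -> w: packet rho(w) differs from rho(v), and user mu(v)
        # cannot cancel it because it is not in mu(v)'s cache.
        if rho_v != rho_w and rho_w not in cache[mu_v]:
            out_edges[v].add(w)
    return vertices, out_edges

# Example 1: user u caches packets A_u, B_u, C_u; each user requests two files.
B = 3
cache = {u: {(f, u) for f in "ABC"} for u in (1, 2, 3)}
requests = {1: ["A", "B"], 2: ["B", "C"], 3: ["C", "A"]}
vertices, out_edges = build_conflict_graph(cache, requests, B)
print(len(vertices), "vertices")
for v in vertices:
    print(v, "->", sorted(out_edges[v]))
```

Note that a packet requested by two users, such as $A_2$ (requested by users 1 and 3), appears as two vertices with no edge between them, in line with the second case above.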

Code Construction
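Let $\omega_v$ denote the packet (message) associated with vertex $v$. In general, in a linear index coding scheme of length $\ell$, every vertex $v$ is associated with a "coding" vector $g_v \in \mathbb{F}_q^{\ell \times 1}$, and the transmitted codeword $x \in \mathbb{F}_q^{\ell \times 1}$ is built as follows:

$$x = \sum_{v \in V} \omega_v g_v. \qquad (2)$$

For any feasible scalar linear index coding scheme of the form (2), the following interference alignment condition is necessary: For every vertex $v$, the coding vector $g_v$ should be linearly independent of all the coding vectors assigned to the out-neighborhood of $v$.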
In the following, we describe how to construct coding vectors satisfying the interference alignment condition for every vertex. For ease of notation, we use H d to denote the directed conflict graph, and H to represent its underlying undirected skeleton, where the direction of edges is ignored. Recall that an undirected skeleton of a directed graph H d is an undirected graph where there is an undirected edge between v 1 and v 2 if, between v 1 and v 2 , there is a directed edge in either or both directions in H d .

Graph Coloring and Chromatic Number
A well-known procedure to construct the coding vectors $\{g_v, v \in V\}$ is the coloring of $H^d$. In the following, when used without any qualification, a coloring of a directed graph is considered to be a proper (vertex) coloring of its underlying undirected skeleton $H$, where a proper coloring is a labeling of the graph's vertices with colors, such that no two vertices sharing the same edge have the same color. Please note that by definition, any subset of nodes with the same color in a proper coloring forms an independent set (i.e., a subset of nodes in a graph, no two of which share the same edge). A coloring using at most $k$ colors is called a (proper) $k$-coloring. The smallest number of colors needed in a proper coloring of $H^d$ is called its chromatic number, and is denoted by $\chi(H^d)$. In the following, we explain why a coloring of $H^d$ provides a way to design the coding vectors $\{g_v, v \in V\}$. Let $\xi$ be the total number of colors in a given coloring of $H^d$. Let $e_i$ be the $i$-th unit vector in the space $\mathbb{F}_q^{\ell \times 1}$, with $\ell = \xi$, i.e., $e_i = [0, 0, \cdots, 1, \cdots, 0, 0]^T$, where the 1 is in the $i$-th position. Now, if vertex $v$ is colored with color $i$, then its coding vector is $g_v = e_i$. Making this choice for the coding vectors, the associated achievable rate is given by $\xi/B$. Since neighbors are assigned different colors, the interference alignment condition is satisfied for every vertex. Recalling the definition of $\chi(H^d)$, it is immediate to see that the best achievable rate due to conflict graph coloring is given by $\chi(H^d_{C,W})/B$, and, since the conflict graph has at most $B \sum_{u \in U} L_u$ vertices, it is loosely bounded by $\chi(H^d_{C,W})/B \leq \sum_{u \in U} L_u$, indicating that the achievable rate is a constant with regards to $B$. A much tighter bound will be given in Section 4.1.
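To make the link between a coloring and the rate $\xi/B$ concrete, the following sketch colors the undirected skeleton of a small directed graph with a generic largest-degree-first greedy heuristic. This heuristic is a stand-in chosen for brevity; it is not the GCC, GLC, or HgLC algorithm, and the four-vertex graph, the function names, and the value $B = 3$ are hypothetical.

```python
def undirected_skeleton(out_edges):
    """Ignore edge directions: v and w are adjacent if either points to the other."""
    adj = {v: set() for v in out_edges}
    for v, nbrs in out_edges.items():
        for w in nbrs:
            adj[v].add(w)
            adj.setdefault(w, set()).add(v)
    return adj

def greedy_coloring(adj):
    """Proper coloring: visit vertices by decreasing degree, take the smallest free color."""
    color = {}
    for v in sorted(adj, key=lambda x: len(adj[x]), reverse=True):
        used = {color[w] for w in adj[v] if w in color}
        color[v] = next(c for c in range(len(adj)) if c not in used)
    return color

# Hypothetical directed conflict graph on four requested packets.
out_edges = {"a": {"b"}, "b": {"a", "c"}, "c": set(), "d": {"c"}}
adj = undirected_skeleton(out_edges)
color = greedy_coloring(adj)

xi = len(set(color.values()))   # number of colors = number of coded transmissions
B = 3                           # packets per file (assumed)
rate = xi / B                   # achievable rate in file units
print(color, xi, rate)
```

Each color class is an independent set, so the packets in one class can be XORed into a single transmission, which corresponds to assigning the unit vector $e_i$ to every vertex of color $i$.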

Local Graph Coloring and Local Chromatic Number
More efficient sets of coding vectors can be constructed using the approach proposed in [66], which exploits the direction information in $H^d_{C,W}$, resulting in the following advanced coding scheme.

Definition 1 (Local Coloring Number). Given a proper coloring $c$ of $H^d$, the associated local coloring number is defined as:

$$\xi_{lc}(c) = \max_{v \in V} |c(N^+(v))|,$$

where $N^+(v)$ is the closed out-neighborhood of vertex $v$ (i.e., vertex $v$ and all its outgoing neighbors $N(v)$) and $|c(N^+(v))|$ is the total number of colors in $N^+(v)$ for a given proper color assignment $c$.
The minimum local coloring number over all proper colorings is referred to as the local chromatic number and is formally defined as follows:

Definition 2 (Local Chromatic Number). The directed local chromatic number of a directed graph $H^d$ is defined as:

$$\chi_{lc}(H^d) = \min_{c \in \mathcal{C}} \max_{v \in V} |c(N^+(v))|,$$

where $\mathcal{C}$ denotes the set of all proper coloring assignments of $H^d$, $N^+(v)$ is the closed out-neighborhood of vertex $v$, and $|c(N^+(v))|$ is the total number of colors in $N^+(v)$ for a given proper color assignment $c$.
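The local coloring number of a given proper coloring is straightforward to evaluate. The sketch below does so on a directed 5-cycle, a toy graph chosen here (not taken from the paper) because it shows the local coloring number falling below the total number of colors; the function name `local_coloring_number` is an assumption.

```python
def local_coloring_number(out_edges, color):
    """Largest number of distinct colors seen in any closed out-neighborhood."""
    return max(
        len({color[v]} | {color[w] for w in out_edges[v]})
        for v in out_edges
    )

# Directed 5-cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0.
out_edges = {v: {(v + 1) % 5} for v in range(5)}
# A proper coloring of the undirected skeleton (an odd cycle needs 3 colors).
color = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}

xi = len(set(color.values()))                    # total number of colors
xi_lc = local_coloring_number(out_edges, color)  # colors per closed out-neighborhood
print(xi, xi_lc)
```

Here every closed out-neighborhood contains only two vertices, so at most two colors are seen locally even though three colors are needed globally.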
Encoding Scheme: For a given realization of the cache placement ($C$) and user requests ($W$), let us consider the conflict graph $H^d_{C,W}$ as in Section 3.1. Given a (proper) $\xi$-coloring (i.e., a proper coloring of graph $H^d_{C,W}$ with $\xi$ colors), we compute the associated local coloring number $\xi_{lc}$. Set $\ell = \xi_{lc}$ and $p = \xi$. Then, consider the columns of the generator $H$ of an $\ell \times p$ Maximum Distance Separable (MDS) [67] code over the field $\mathbb{F}_q$, with $q > p$. If the color of a vertex $v$ is $i$, then the coding vector $g_v$ assigned to vertex $v$ is given by the $i$-th column $h_i$ of $H$. Then, the transmitted multicast codeword, $x \in \mathbb{F}_q^{\ell \times 1}$, is given by (2).
Decoding Scheme: In any closed out-neighborhood, there are at most $\ell$ different colors (from the definition of local coloring). Since every $\ell$ columns of $H$ are linearly independent (from the defining property of MDS codes), the coding vectors in any closed out-neighborhood have full rank, satisfying the interference alignment condition. The message $\omega_v$ at vertex $v$ is obtained at the user requesting it, $\mu(v)$, as follows: (1) Using its side information, cancel out message parts corresponding to all vertices outside $N^+(v)$, i.e., $\sum_{u \notin N^+(v)} \omega_u g_u$. This is possible because, by the definition of the conflict graph $H^d$, the messages $\{\omega_u\}_{u \notin N^+(v)}$ are available as side information at the user and the encoding mechanism is known to all the users. (2) Find a vector $z$ in the dual space of $\{g_u\}_{u \in N^+(v) \setminus \{v\}}$ such that $z^T g_v \neq 0$ (this is possible since $g_v$ is linearly independent of $\{g_u\}_{u \in N^+(v) \setminus \{v\}}$ because of the local chromatic number-based construction). Now, after the cancellation in step (1), $z^T x = (z^T g_v)\omega_v$. Therefore, the user recovers its own message. It follows that all users can recover all the requested packets employing such a linear scheme.
Achievable Rate: The coding scheme constructed as described above achieves a rate given by $\xi_{lc}/B$, where $B$ is the number of packets per file.
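The encoding and decoding steps can be traced on a toy instance. The sketch below makes several simplifying assumptions that are not in the paper: the graph is a directed 5-cycle with a proper 3-coloring ($\xi = 3$, $\xi_{lc} = 2$), messages are single symbols of the prime field GF(5) rather than of $\mathbb{F}_{2^{F/B}}$, the MDS generator columns are the Vandermonde vectors $(1, a)^T$, and each vertex is treated as its own user that knows every message outside its closed out-neighborhood.

```python
Q = 5      # prime field size (assumed); must exceed the number of colors xi = 3
ELL = 2    # code length = local coloring number xi_lc

out_edges = {v: {(v + 1) % 5} for v in range(5)}   # directed 5-cycle
color = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}             # proper 3-coloring
H = [(1, a) for a in range(3)]                     # columns (1, a)^T, pairwise independent mod 5
g = {v: H[color[v]] for v in out_edges}            # coding vector of each vertex

def encode(omega):
    """Codeword x = sum_v omega_v * g_v over GF(Q), as in (2)."""
    return [sum(omega[v] * g[v][k] for v in g) % Q for k in range(ELL)]

def decode(v, x, side_info):
    """Recover omega_v from x and the messages outside the closed out-neighborhood of v."""
    closed = {v} | out_edges[v]
    # Step (1): cancel the contribution of every vertex outside N+(v).
    r = list(x)
    for u in g:
        if u not in closed:
            for k in range(ELL):
                r[k] = (r[k] - side_info[u] * g[u][k]) % Q
    # Step (2): z is orthogonal to the single interfering vector g_w.
    (w,) = out_edges[v]
    z = (-g[w][1] % Q, g[w][0])
    num = (z[0] * r[0] + z[1] * r[1]) % Q
    den = (z[0] * g[v][0] + z[1] * g[v][1]) % Q    # nonzero: colors of v and w differ
    return num * pow(den, Q - 2, Q) % Q            # divide via Fermat inverse

omega = {0: 3, 1: 1, 2: 4, 3: 2, 4: 2}             # one GF(5) symbol per vertex
x = encode(omega)
decoded = {
    v: decode(v, x, {u: omega[u] for u in omega if u not in ({v} | out_edges[v])})
    for v in omega
}
print(x, decoded)
```

Two field symbols are transmitted instead of the three that unit-vector coding would need for the same coloring, which is the saving from $\xi/B$ to $\xi_{lc}/B$.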

Example 2.
We consider an example shown in Figure 3. First, we assign colors to each vertex such that the total number of colors is $\xi = 5$, and count the local coloring number, which is $\xi_{lc} = 4$. Then, we construct the generator matrix $A$ of a ($\xi = 5$, $\xi_{lc} = 4$) MDS code. After that, we assign the columns of $A$ to $g_v$, corresponding from left to right to the vertices with the packets $\{A_2, A_3, B_2, B_1, A_1\}$, as shown in Figure 3. Finally, the transmitted codewords, which are the rows of the right-hand side of (2), can be generated, where the length of the code is $\xi_{lc}/B = 4/3$ file units. It can be easily verified that every user can decode its desired packets with the cached ones.

It is immediate to see that the best achievable rate due to local coloring is obtained by computing the local chromatic number of $H^d_{C,W}$ and using its associated coloring to design the coding vectors, yielding a rate $\chi_{lc}/B$. However, note that to compute $\chi_{lc}$, we must optimize over all proper colorings. As with the chromatic number, this can be cast as an integer program, and it is an NP-hard problem. To overcome this limitation, in Section 4, we propose a greedy approach that (i) exhibits polynomial-time complexity in all the system parameters, (ii) achieves close to optimal performance for finite packetization order, and (iii) is asymptotically (i.e., for infinite packetization order) order-optimal.

Benefits of Local Coloring
Consider the following relation established for general directed graphs in [66]: Focusing on the conflict graph of interest H d C,W , the number of vertices can be as large as B ∑ u∈U L u . It then follows from (9) that the gap between the local chromatic number and the chromatic number can be as large as log(B ∑ u∈U L u ). Please note that this multiplicative factor grows with the number of packets per file B and the number of per-user requests L u , supporting the extra benefit of local coloring in the multiple-request scenario. In addition, the higher the number of per-user requests, the higher the directionality of the conflict graph, which is the main factor exploited by local coloring to reduce the achievable rate (see Section 3.2.2), further supporting the suitability of local coloring in increasingly practical settings where there is some form of spatial or temporal request aggregation.

Proposed Algorithms and Performance Analysis
As stated earlier, computing the local chromatic number is NP-hard. To circumvent this challenge, in this section we propose two greedy coded multicast schemes which, together with the cache placement described in Section 2.1, yield the following two caching schemes: Randomized Aggregate Popularity-Greedy Local Coloring (RAP-GLC) and Randomized Aggregate Popularity-Hierarchical greedy Local Coloring (RAP-HgLC). In both cases, the steps for obtaining the coded multicast scheme are as follows:
i. Given a realization of the cache placement (C) and of the user requests (W), build the conflict graph $H^d_{C,W}$ as in Section 3.1.
ii. Use either of the above algorithms (GLC or HgLC) to compute a proper coloring. Let $\xi$ denote the number of colors used by the chosen algorithm to color $H^d_{C,W}$, and let $\xi_{lc}$ be the associated local coloring number.
iii. Consider a $(\xi, \xi_{lc})$ MDS code and compute the corresponding coded multicast scheme as described in Section 3.2.2.
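Step iii relies on a $(\xi, \xi_{lc})$ MDS code, i.e., a $\xi_{lc} \times \xi$ generator matrix in which any $\xi_{lc}$ columns are linearly independent. As a hedged illustration, the sketch below builds one standard instance, a Vandermonde (Reed-Solomon style) matrix over a prime field, and checks the MDS property by brute force. The field size `P = 7`, the helper names, and the choice of a Vandermonde construction are assumptions made for this example; the paper does not prescribe a specific MDS code.

```python
from itertools import combinations


def vandermonde_generator(k, n, p):
    """k x n matrix with entry (i, j) = x_j**i mod p, using x_j = 1..n (needs prime p > n)."""
    return [[pow(x, i, p) for x in range(1, n + 1)] for i in range(k)]


def rank_mod_p(rows, p):
    """Rank of an integer matrix over GF(p), by Gauss-Jordan elimination."""
    mat = [[x % p for x in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col]), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        inv = pow(mat[rank][col], -1, p)  # modular inverse (Python 3.8+)
        mat[rank] = [(x * inv) % p for x in mat[rank]]
        for r in range(len(mat)):
            if r != rank and mat[r][col]:
                factor = mat[r][col]
                mat[r] = [(a - factor * b) % p for a, b in zip(mat[r], mat[rank])]
        rank += 1
    return rank


P = 7                    # assumed field size for the illustration
XI, XI_LC = 5, 4         # total colors and local coloring number
A = vandermonde_generator(XI_LC, XI, P)

# MDS check: every choice of XI_LC columns must have full rank.
mds_ok = all(
    rank_mod_p([[row[j] for j in cols] for row in A], P) == XI_LC
    for cols in combinations(range(XI), XI_LC)
)
print(A, mds_ok)
```

Each color is then associated with one column of `A`, so that a user who needs to resolve at most $\xi_{lc}$ colors in its closed out-neighborhood can do so from $\xi_{lc}$ coded transmissions.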

Randomized Aggregate Popularity-Greedy Local Coloring (RAP-GLC)
The RAP-GLC algorithm generalizes the RAP-GCC (Random Aggregate Popularity-Greedy Constrained Coloring) algorithm introduced in [12]. RAP-GCC is a caching scheme based on random fractional caching for the placement phase and a coded multicast scheme built on greedy-graph-coloring-based linear index coding [51,68] for the delivery phase. RAP-GLC is more general than RAP-GCC in two respects: (1) conventional coloring is replaced by local coloring to leverage possible gains in the multiple-request scenario, as described in Sections 3.2.1 and 3.3, and (2) RAP-GLC adaptively (depending on the demand realization) chooses between naive and coded multicasting according to a threshold parameter, instead of committing to one of them (as in RAP-GCC).

RAP-GLC Algorithm Description
The algorithm associates to each vertex $v$ a label or tag composed of two fields, i.e., $K_v \equiv (T_D(v), T_C(v))$, with $T_C(v)$ denoting the subset of users caching the packet associated with vertex $v$, i.e.,

$$T_C(v) \triangleq \{u \in \mathcal{U} : u \text{ caches } \rho(v)\}, \qquad (10)$$

and $T_D(v)$ denoting the subset of users requesting the packet associated with vertex $v$, i.e.,

$$T_D(v) \triangleq \{u \in \mathcal{U} : u \text{ requests } \rho(v)\}, \qquad (11)$$

which includes the user $\mu(v)$ itself, who requests $\rho(v)$, and all the other users requesting $\rho(v)$. Please note that the cardinality of $T_D(v)$ indicates the popularity of packet $\rho(v)$. Furthermore, let $T_v \triangleq T_D(v) \cup T_C(v)$. Given a vertex $v$, if the cardinality of $T_D(v)$ is higher than a predetermined threshold parameter $t \in \{0, \dots, n\}$, i.e., $|T_D(v)| > t$, then all vertices $v'$ such that $\rho(v') = \rho(v)$ are colored with the same color, leading to a naive multicast transmission scheme. If $|T_D(v)| \le t$, then RAP-GLC greedily looks for a maximal set of vertices with the same $T_v$ (Algorithm 1, Line 14) and colors them with the same color if there is no conflict among the vertices (Algorithm 1, Line 15). The threshold parameter $t$ is subject to optimization, as described in Section 4.1.2.
Doing this, RAP-GLC computes a valid coloring of the conflict graph $H$. Finally, the algorithm computes its associated local coloring number (Algorithm 1, Line 24). The coding scheme employed is based on the MDS code described in Section 3.2.1 associated with the above local coloring.

Algorithm 1 RAP-GLC
...
15: if {There is no edge between v and I} then
...

Time Complexity: In Algorithm 1, both the outer while-loop starting at Line 3 and the inner for-loop starting at Line 6 iterate at most $|V|$ times, and all other operations inside the loops take constant time. Therefore, the complexity of RAP-GLC is $O(|V|^2)$ or, equivalently, $O(n^2 B^2)$, since $|V| \le nB$, which is polynomial in $|V|$ (or $n$, $B$).
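The greedy rule described above (naive multicast for packets with $|T_D(v)| > t$, label matching otherwise) can be sketched as follows. This is an illustrative simplification, not the authors' Algorithm 1: it scans the vertices in a fixed order instead of a randomized one, takes the conflict relation as a precomputed symmetric adjacency map, assumes that vertices representing the same packet never conflict, and stops at the coloring (the local coloring number would be computed afterwards). All identifiers and the toy instance are hypothetical.

```python
def greedy_label_coloring(vertices, packet, t_d, label, conflicts, t):
    """Greedy coloring in the spirit of RAP-GLC.

    packet[v]    : packet represented by vertex v
    t_d[v]       : set of users requesting packet[v]
    label[v]     : set T_v of users requesting or caching packet[v]
    conflicts[v] : set of vertices in conflict with v (assumed symmetric)
    t            : threshold between naive and coded multicasting
    """
    color = {}
    next_color = 0
    uncolored = list(vertices)
    while uncolored:
        v = uncolored[0]
        if len(t_d[v]) > t:
            # Popular packet: naive multicast, one color for all its vertices.
            group = [w for w in uncolored if packet[w] == packet[v]]
        else:
            # Coded multicast: grow a conflict-free set with identical labels.
            group = [v]
            for w in uncolored[1:]:
                if label[w] == label[v] and all(w not in conflicts[x] for x in group):
                    group.append(w)
        for w in group:
            color[w] = next_color
        next_color += 1
        chosen = set(group)
        uncolored = [w for w in uncolored if w not in chosen]
    return color


# Toy instance with 3 users: packets A, B, C are each requested by one user
# and cached by the other two; packet D is requested by users 1 and 2 and
# cached by nobody, so its vertices conflict with all the others.
vertices = ["a", "b", "c", "d1", "d2"]
packet = {"a": "A", "b": "B", "c": "C", "d1": "D", "d2": "D"}
t_d = {"a": {1}, "b": {2}, "c": {3}, "d1": {1, 2}, "d2": {1, 2}}
label = {"a": {1, 2, 3}, "b": {1, 2, 3}, "c": {1, 2, 3}, "d1": {1, 2}, "d2": {1, 2}}
conflicts = {"a": {"d1", "d2"}, "b": {"d1", "d2"}, "c": {"d1", "d2"},
             "d1": {"a", "b", "c"}, "d2": {"a", "b", "c"}}

coloring_glc = greedy_label_coloring(vertices, packet, t_d, label, conflicts, t=1)
print(coloring_glc)
```

With $t = 1$, the three singly requested packets share one color (a coded transmission), while the doubly requested packet D is served by naive multicast with a second color.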

RAP-GLC Performance Analysis
In the following, we quantify the performance of RAP-GLC in the asymptotic regime in which the number of users and files is kept constant while the packetization order is sent to infinity. Denoting by $E[R^{\text{RAP-GLC}}(P, Q, t)]$ the asymptotic average achievable rate of RAP-GLC for a fixed threshold $t$, the threshold parameter $t$ is optimized to minimize $E[R^{\text{RAP-GLC}}(P, Q, t)]$. Hence, denoting by $\bar{R}^{\text{RAP-GLC}}$ the average rate achieved by RAP-GLC with optimized $t$, i.e.,

$$\bar{R}^{\text{RAP-GLC}}(P, Q) \triangleq \min_{t} E[R^{\text{RAP-GLC}}(P, Q, t)],$$

we generalize the results of [10,12], obtained using conventional graph coloring in the homogeneous shared link caching network, to the case of using local coloring in the heterogeneous caching network. Specifically, we extend the order-optimality analysis under single per-user requests ($L = 1$) in the asymptotic regime of $B \to \infty$ [10,12] to that under multiple ($L > 1$) per-user requests [13,14]. These theoretical results will serve as rate lower bounds for the finite-length performance of our proposed algorithms.
Let $L = \max_u L_u$ and order $L_u$, $u \in \mathcal{U}$, as a decreasing sequence $L_{[1]} \ge L_{[2]} \ge L_{[3]} \ge \dots \ge L_{[n]}$, where $L_{[i]}$ is the $i$-th largest $L_u$ and $[i] = u$ for some $u \in \mathcal{U}$. It can be seen that $L_{[1]} = \max_u L_u$ and $L_{[n]} = \min_u L_u$. Let $n_j = \sum_{i=1}^{n} 1\{L_{[i]} - j \ge 0\}$, where $1 \le j \le L_{[1]}$ and $1\{\cdot\}$ is the indicator function. Let $\mathcal{U}_{n_j} = \{[i] \in \mathcal{U} : 1\{L_{[i]} - j \ge 0\}\}$. In the next theorem, we provide a performance guarantee for the RAP-GLC algorithm.

Theorem 1. For any given $m$, $n$, $M_u$, the random caching distribution $P$ and the random request distribution $Q$, the average achievable rate of the RAP-GLC algorithm, $\bar{R}^{\text{RAP-GLC}}$, satisfies

$$\bar{R}^{\text{RAP-GLC}} \le \min\{\psi(P, Q), \bar{m} - \bar{M}\}, \qquad (12)$$

with $\mathcal{U}^{\ell}$ denoting a set of users with cardinality $\ell$, and $\rho_{f,u,\mathcal{U}^{\ell}}$ denoting the probability that $f$ is the file whose $p_{f,u}$ maximizes the term $\lambda(u, f_u, \mathcal{U}^{\ell})$ among $\mathbf{f}(\mathcal{U}^{\ell})$ (the set of files requested by $\mathcal{U}^{\ell}$).

Proof. See Appendix A.
Using the explicit expression forR RAP−GLC in Theorem 1, we can optimize the caching distribution for a wide class of heterogeneous network models to minimize the number of transmissions. We use P * to denote the caching distribution that minimizes R RAP−GLC .

Remark 1.
For the sake of the numerical evaluation of ψ(q, p), it is worthwhile to note that the probabilities ρ f ,u,U can be easily computed as follows. Given the subset of users, U of cardinality , let J u 1 , . . . , J u denote i.i.d. random variables each of them distributed over F with pmf q u i , with i = 1, . . . , . Since λ(u 1 Hence, it follows that which can be easily computed by sorting the values {λ(u i , j, U ) : j ∈ F , u i ∈ U }.
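One way to read the sorting-based computation in Remark 1 is sketched below. The sketch assumes that each user in the subset draws one file independently from its own request pmf, that the values $\lambda(u, f, \mathcal{U}^{\ell})$ are already available as plain numbers, and that all of them are distinct (no ties). Under these assumptions, the probability that the pair $(u, f)$ attains the maximum is $q_{f,u}$ times the probability that every other user draws a file with a strictly smaller $\lambda$. The function name and the numerical values are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def argmax_probabilities(lam, q):
    """rho[(u, f)] = P(the pair (u, f) attains the maximum lambda value).

    lam[u][f] : value of lambda for user u and file f (assumed all distinct)
    q[u][f]   : probability that user u requests file f
    """
    # Per user: lambda values in increasing order, with cumulative request mass.
    tables = {}
    for k in lam:
        pairs = sorted((lam[k][g], q[k][g]) for g in lam[k])
        values = [x for x, _ in pairs]
        cum = [0.0] + list(accumulate(p for _, p in pairs))
        tables[k] = (values, cum)

    rho = {}
    for u in lam:
        for f, x in lam[u].items():
            prob = q[u][f]
            for k, (values, cum) in tables.items():
                if k != u:
                    # Mass of files of user k whose lambda is strictly below x.
                    prob *= cum[bisect_left(values, x)]
            rho[(u, f)] = prob
    return rho


# Hypothetical numbers for a subset of two users and a library of two files.
lam = {1: {"A": 0.4, "B": 0.1}, 2: {"A": 0.3, "B": 0.2}}
q = {1: {"A": 0.5, "B": 0.5}, 2: {"A": 0.5, "B": 0.5}}
rho = argmax_probabilities(lam, q)
print(rho, sum(rho.values()))
```

Because exactly one pair attains the maximum when there are no ties, the resulting probabilities sum to one, which gives a quick sanity check on any implementation.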
Nevertheless, as shown in [21], when B is finite or is not exponential in n, the performance of RAP-GLC can degrade significantly, compromising the promising multiplicative caching gain, although it is already an improved version of RAP-GCC in [12]. This brings us to the other main contribution where we propose a new algorithm that preserves the gain due to coded multicasting even when B is finite.

Randomized Aggregate Popularity-Hierarchical Greedy Local Coloring (RAP-HgLC) for Finite-Length Packetization
Similarly to RAP-GLC, RAP-HgLC has a predetermined parameter $t \in \{0, \dots, n\}$ that is optimized to minimize its associated average achievable rate. However, in the RAP-HgLC algorithm, we arrange the vertices in a hierarchy and use this to design a more careful coloring algorithm. The key idea of RAP-HgLC is to exploit the labeling of each vertex more efficiently. More specifically, as in RAP-GLC, RAP-HgLC associates to each vertex $v$ a label or tag composed of the two fields $K_v \equiv (T_D(v), T_C(v))$, defined in (10) and (11).

RAP-HgLC Algorithm Description
Before describing the algorithm, we introduce the following notation.
• $W_1 \subseteq G_i$: a candidate pool of vertices in layer $G_i$ whose $|K_v|$ is close to the smallest available $|K_v|$, where $a \in [0, 1]$ is a design parameter that controls this closeness and $W_1$ is updated with every iteration.
• $Q_i$ (see Algorithm 2): another subset of $G_i$ that is updated every iteration.
• $W_2 \subset Q_i$: a subset of vertices in $Q_i$ whose $|K_v|$ is larger than, but close to, that of the vertex picked from $W_1$, where $b \in [0, 1]$ is another design parameter that controls this closeness.
Based on the above definitions, it follows that the full vertex set $V$ forms an $n$-layer hierarchy with the $i$-th layer composed of the set of vertices $G_i$.
Key Idea: Starting from layer n, at any layer i ≤ n, the RAP-HgLC algorithm attempts to form an independent set of size at least i; when there are no more such independent sets, all remaining packets are dropped to layer i − 1, and transmission actions on those packets are deferred to later layers. This is the key difference between RAP-HgLC and RAP-GLC. That is, RAP-HgLC makes an extra effort to place nodes with large labels into large independent sets.
We will now describe how the above key idea is implemented in RAP-HgLC. The RAP-HgLC algorithm forms large independent sets in a "top-down" fashion, starting with the highest layer, and iteratively moving to lower layers until layer 1. The following two steps are performed at each layer:

1. Step I: The first step is similar to that of the RAP-GLC algorithm. Given a vertex $v$, the algorithm first checks whether the cardinality of $T_D(v)$ is higher than $t$. If $|T_D(v)| > t$, then all the vertices $v'$ such that $\rho(v') = \rho(v)$ are colored with the same color. If $|T_D(v)| \le t$, then the algorithm greedily finds independent sets of size $i$, where every vertex $v$ in the independent set (Algorithm 2, Line 20) has the same $K_v$ (Algorithm 2, Line 19). After removing these vertices, the rest of the vertices in $G_i$ are left for the second step.
2. Step II: A candidate pool of vertices $W_1 \subseteq G_i$ is created. This set contains the vertices $v$ whose $|K_v|$ is close to the smallest available $|K_v|$. We randomly pick a vertex $v$ from $W_1$ (Algorithm 2, Line 31). The design parameter $a$ determines how close the picked $|K_v|$ is to the smallest available ones. We gradually form an independent set of size $i$ that includes $v$ as follows: we form another set $W_2$ (Algorithm 2, Line 34), excluding $v$, whose vertices have a $|K_{v'}|$ that is larger than, but close to, that of $v$, as determined by $b$, and we sample repeatedly with replacement from it to grow the independent set. If an independent set of size at least $i$ cannot be formed, we drop the vertex $v$ to the lower layer $G_{i-1}$ and take it into account in the next layer iteration. Otherwise, we assign a color to the independent set. $W_1$ is repeatedly formed, and random sampling from $W_1$ is repeated, until every vertex in $G_i$ is dropped or colored.

Algorithm 2 RAP-HgLC
...
9: if {$|T_D(v)| > t$} then
10: for all $v' \in V$ with $\rho(v') = \rho(v)$ do
11: ...
end for
13: Color all the vertices in $I$ by $c \notin C$;
14: Let $c[I] = c$;
15: for all $i = n, n-1, \dots, 2, 1$ do
16: $G_i = G_i \setminus I$;
17: end for
18: else
19: ...
20: if {There is no edge between $v'$ and $I$} then
21: ...
end if
23: end for
24: if $|I| = i$ then
25: Color all the vertices in $I$ by $c \notin C$;
...
$Q_i = G_i \setminus I$;
34: for all $v' \in Q_i$ with $v'$ randomly picked from $W_2 \subset Q_i$ do
35: if {$K_{v'} \supset K_v$} ∩ {No edge between $v'$ and $I$} then
36: ...
end if
41: end for
42: if $|I| \ge i$ then
43: Color all the vertices in $I$ by $c \notin C$;
...
end if
49: end for
50: end for
51: $c$ = LocalSearch($H_{C,W}$, $c$, $C$);
52: return the local coloring number $\max_{v \in V} |c(N^+(v))|$ and the corresponding color assignment $c(N^+(v))$ for each $v$;

Remark 2. Please note that RAP-GLC goes through the same Step I as RAP-HgLC, and then simply assigns a different color to each remaining uncolored vertex. On the other hand, Step II in RAP-HgLC tries to find further independent sets among the remaining uncolored vertices. It is this extra step that guarantees the performance of RAP-HgLC to be no worse than that of RAP-GLC.
The RAP-HgLC algorithm, when operating on the i-th layer, always colors at least i vertices with the same color. Please note that if there are remaining vertices when reaching layer 1, all such vertices will be colored, each with a different color.
To further reduce the required number of colors, we use a function called LocalSearch (Algorithm 2, Line 51), which is described in Algorithm 3. It works in an iterative fashion, replacing the current solution with a better one if one exists, and it terminates when no better solution can be found. In particular, the purpose of the local search algorithm is to check the redundancy of each color $c \in C$, so as to eventually decrease the current objective function value $|C|$. In more detail, the local search computes, iteratively for each color $c \in C$, the set $J_c$ of all vertices colored with color $c$, and performs the following steps:

1. For each vertex $i \in J_c$, if there is a color $c' \in C$, $c' \ne c$, that is not assigned to any adjacent vertex $j \in \mathrm{Adj}(i)$, then assign color $c'$ to vertex $i$;
2. Color $c$ is removed from the set $C$ if and only if, in the previous step, it has been possible to replace $c$ with some color $c' \ne c$ for all vertices in $J_c$.
Finally, in Algorithm 2, Line 52, we compute the local coloring number.
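The two local search steps above can be sketched as follows. This is a simplified variant written for illustration, not a transcription of Algorithm 3: it tentatively recolors the vertices of a color class and commits the change only when the whole class can be recolored, so that the color can be dropped. The adjacency map is assumed to be symmetric, colors are assumed to be integers, and all identifiers are hypothetical.

```python
def local_search(adj, color):
    """Try to eliminate redundant colors from a proper coloring.

    adj[v]   : set of vertices adjacent to v (assumed symmetric)
    color[v] : current (integer) color of v; the input dict is not modified
    """
    color = dict(color)
    palette = set(color.values())
    for c in sorted(palette):              # snapshot of the colors to examine
        members = [v for v in color if color[v] == c]
        trial = dict(color)
        removable = True
        for v in members:
            blocked = {trial[w] for w in adj[v]}
            options = [d for d in sorted(palette) if d != c and d not in blocked]
            if not options:
                removable = False          # some vertex must keep color c
                break
            trial[v] = options[0]
        if removable:
            color = trial                  # commit: color c is no longer used
            palette.discard(c)
    return color


# Toy example: a path 0 - 1 - 2 that was wastefully colored with 3 colors.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
improved = local_search(adj, {0: 0, 1: 1, 2: 2})
print(improved)
```

On the toy path, the first color is found to be redundant and its vertex is moved to an existing color, reducing the palette from three colors to two while keeping the coloring proper.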

Algorithm 3 LocalSearch($H_{C,W}$, $c$, $C$)
1: for all $c \in C$ do
2: Let $J_c$ be the set of vertices whose color is $c$;
...

To illustrate the RAP-HgLC algorithm, we present the following example. ... by user 3 and not cached anywhere).
The RAP-HgLC algorithm works as follows. For i = n = 3, G 3 = {A 2 , A 3 , B 2 , B 3 , C 2 }, let v = A 2 , then it can be found that B 2 and C 2 would be in I, hence I = {A 2 , B 2 , C 2 }. Now since |I| = n = 3, we color A 2 , B 2 , C 2 by black (see Figure 4). Then G i = G i \ I = {A 3 , B 3 }. In the following loop, since we cannot find a set I with |I| = n = 3, we move to Line 19. Then since we cannot find a I with |I| ≥ n = 3, then we do G 2 = G 2 ∪ {A 3 }, and then G 2 = G 2 ∪ {B 3 }. Therefore, we obtain G 2 = {A 3 , A 4 , B 3 , B 4 , C 3 }. Now we go to Line 5 (start next loop). For i = n − 1 = 2, in this loop, we first pick v = A 4 , then we can find I = {A 4 , B 4 }. We color {A 4 , B 4 } by blue (see Figure 4). Now G 2 = G 2 \ {A 4 , B 4 } = {A 3 , B 3 , C 3 }. Then in Line 19, we find the vertex with smallest length of K v (let a = 0), which is C 3 with K C 3 = {3, 2}, then we have I = {C 3 } and Q 2 = {A 3 , B 3 }, then in the next loop, we can find I = {C 3 , B 3 }. We color I = {C 3 , B 3 } by red (see Figure 4).
Then we go to next loop i = n − 2 = 1. Then we can see that I = {C 4 }, and we color {C 4 } by purple (see Figure 4).
Hence, we can find I = {A 3 } and we color {A 3 } by brown.
According to Figure 4, the total number of required colors is 5, while the maximum number of colors required locally by each user is 4. For naive multicasting, since it only allows vertices representing the same packet to share a color, the total number of required colors is 9, and the corresponding rate is 9/4. Hence, the final rate achieved by RAP-HgLC with local coloring is no more than min{4/4, 9/4} = 1. For the interested reader, it can be verified that if the GCC algorithm, designed for $B \to \infty$ and proposed in [10], is used, the corresponding number of required colors is 6.
The complexity of RAP-HgLC can be computed as follows. For the hierarchical coloring procedure (Lines 5-50 in Algorithm 2), the complexity is $O(n|V|^2)$, and the complexity of the local search procedure is $O(|E|)$. Therefore, the running time complexity of RAP-HgLC is given by $O(n|V|^2 + |E|) = O(n|V|^2)$. Since $|V| \le nB$, the running time complexity of RAP-HgLC is $O(n^3 B^2)$.

RAP-HgLC Performance Analysis
For the general heterogeneous network setting, tight upper bounds on the asymptotic (B → ∞) average achievable rate of RAP-HgLC are quite complex to derive, even though a simple (but not necessarily tight) upper bound on the asymptotic performance can be obtained considering the asymptotic average rate of RAP-GLC (see Remark 2).
Regarding the finite-length regime, in [21] we derived a tight upper bound on the performance of RAP-HgLC for the simpler case of homogeneous networks under worst-case demands. Specifically, the bound in [21] requires $B$ to be $\tilde{O}\left((m/M)^{g+2}\right)$ (where $\tilde{O}$ hides some polylogarithmic terms) to achieve a worst-case rate of at most $n/g$. This approximately matches a lower bound of $\tilde{O}\left((m/M)^{g}\right)$ derived in the same work for any coloring algorithm, showing, for the simpler homogeneous network setting, the optimality of RAP-HgLC among all graph-coloring-based algorithms.
For the more complex setting where demands arise from popularity distributions and every user requests multiple files, the finite-length performance of RAP-HgLC is investigated in Section 6 via numerical analysis, where we show how the RAP-HgLC is able to recover most of the multiplicative caching gain even with very moderate packetization order.

Tradeoff between Number of Requests and Code Length
As mentioned earlier, in the simpler homogeneous scenario, the authors in [21] showed that under worst-case demands, to achieve a gain $g$ over conventional naive multicasting, it is necessary for $B$ to grow exponentially with $g$. Intuitively, this is because a sufficiently large $B$ is needed to create coded multicast transmissions that are useful for multiple users. However, when each user makes multiple requests, the number of requests $L_u = L$ can play a similar role to that of $B$, such that the requirement on $B$, and hence the resulting computational complexity, can be reduced. For ease of analysis, in this section, we assume that all users place the same number of requests ($L_u = L$).
In the following, under either worst-case or uniform demands, we show sufficient conditions on $B$ and $L$ that guarantee achieving a gain $g = Mn/m$. From this result, we can obtain the regime in which $B$ and $L$ are interchangeable ($L$ plays an equivalent role to $B$). Note that it can be shown that the numbers of file transmissions under worst-case and uniform demands have the same order.
We consider two cases for the range of $B$: the case of $B = 1$, and the case of $B = \omega(m/M)$. The regime where $1 < B = O(m/M)$ is out of the scope of this paper. When $B = 1$, the cache placement algorithm becomes scalar uniform cache placement (SUP), in which each user caches $M$ entire files chosen uniformly at random. For simplicity, we let $M$ be a positive integer. Then, as shown in [14], letting $L \to \infty$ as a function of $n$, $m$, $M$, we obtain the following theorem.
where ε is an arbitrarily small number, then, the achievable rate of RAP-GLC is upper bounded by lim n,m→∞

Proof. See Appendix B.
From Theorem 2, we can see that when $L$ and $M$ are large enough, instead of requiring a large $B$ and packet-level coding, a simpler file-level coding scheme is sufficient to achieve the same order-optimal rate. We remark, however, that the range of the parameter regimes in which this result holds is limited due to the requirement of a large $M$ and $L$. Next, we focus on another parameter regime, $B = \omega(m/M)$, and find the achievable tradeoff between $B$ and $L$.
or (ii) M m ≥ e 1+e , and where ε is an arbitrarily small number, Then, the achievable rate of RAP-GLC is upper bounded by lim n,m→∞ Proof. See Appendix C.
If we particularize Theorem 1 to the homogeneous network setting under uniform demands, we see that the rate achieved by RAP-GLC is upper bounded by the same expression given in (24). Hence, from Theorem 3, we can see that when $L$ is large enough, instead of requiring a very large $B$, an intermediate value of $B = \omega(m/M)$ is sufficient to achieve the same order-optimal rate. In practice, it is important to find the right balance between $B$ and $L$ given the remaining system parameters. In Section 6, we show via simulation that a similar tradeoff also holds for RAP-HgLC.

Simulations and Discussions
In this section, we numerically evaluate the performance of the two polynomial-time algorithms described in Section 4, RAP-GLC and RAP-HgLC, in the finite-length regime characterized by the number of packets per file B.
Recall that the caching distribution $P^*$ is to be optimized to minimize the number of transmissions. Since the distribution $P^*$ resulting from minimizing the right-hand side of (12) may not admit an analytically tractable expression in general, in the following numerical results we restrict the caching distribution to take the form of a truncated uniform distribution $p_u$, as described in [12]:

$$p_{f,u} = \begin{cases} \dfrac{1}{m_u}, & f \le m_u, \\ 0, & f \ge m_u + 1, \end{cases} \qquad (25)$$

where the cut-off index $m_u \ge M$ is a function of the system parameters that is optimized to minimize the right-hand side of (12). The intuition behind the form of $p_u$ in (25) is that each user caches the same fraction of (randomly selected) packets from each of the $m_u$ most popular files, and does not cache any packet from the remaining $m - m_u$ least popular files. We point out that when $m_u = M$, this cache placement coincides with the LFU (Least Frequently Used) caching policy. Thus, this cache placement is referred to as Random LFU (RLFU) [12], and the corresponding caching algorithms as RLFU-GLC and RLFU-HgLC. Recall that LFU discards the least frequently requested file upon the arrival of a new file to a full cache of size $M_u$ files. In the long run, this is equivalent to caching the $M_u$ most popular files [69]. In Figures 5 and 6, we plot the average achievable rate, i.e., the average number of transmissions (normalized by the file size), as a function of the cache size for RLFU-GLC and RLFU-HgLC. For comparison, we also simulate the following algorithms:
• LFU, which has been shown to be optimal in single cache networks;
• RLFU-GLC with infinite file packetization ($B \to \infty$), whose performance guarantee is given in Theorem 1, and which is shown to be order optimal.
Regarding the LFU algorithm, the average achievable rate is given by

$$E[R^{\text{LFU}}] = \sum_{f=1}^{m} \left( 1 - \prod_{u \in \mathcal{U}_{\{M_u < f\}}} (1 - q_{f,u})^{L_u} \right),$$

where $\mathcal{U}_{\{M_u < f\}}$ denotes the set of users with $M_u < f$. For simplicity, and to better illustrate the effectiveness of the proposed algorithms, especially under multiple per-user requests, we consider a scenario in which all users request files according to a Zipf demand distribution with parameter $\gamma \in \{0.2, 0.4, 0.5\}$, and all caches have size $M$ files. Under Zipf demands, file $f$ is requested with probability

$$q_f = \frac{f^{-\gamma}}{\sum_{j=1}^{m} j^{-\gamma}}, \qquad f = 1, \dots, m.$$

We consider two types of users. In Figures 5a and 6a, users represent end devices requesting only one file each ($L = 1$), while in Figures 5b and 6b, they represent helpers/small-cells, each serving 10 end user devices and consequently collecting $L = 10$ requests.
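The ingredients of this simulation setup can be sketched numerically as follows. The sketch assumes a Zipf pmf of the standard form $q_f \propto f^{-\gamma}$, a truncated uniform caching distribution with cut-off $m_u$, and an LFU baseline in which each user requests $L_u$ files independently and a file must be transmitted whenever at least one user that does not cache it requests it. The function names and the small numerical instance are illustrative and do not correspond to the settings of Figures 5 and 6.

```python
def zipf_pmf(m, gamma):
    """Zipf demand distribution over files 1..m with parameter gamma."""
    weights = [f ** (-gamma) for f in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_uniform(m, m_u):
    """Caching distribution: uniform over the m_u most popular files, zero elsewhere."""
    return [1.0 / m_u if f <= m_u else 0.0 for f in range(1, m + 1)]


def lfu_average_rate(q, cache_sizes, requests):
    """Expected number of transmitted files when user u caches files 1..M_u.

    q[u][f-1]      : probability that a request of user u is for file f
    cache_sizes[u] : M_u, number of whole files cached by user u
    requests[u]    : L_u, number of independent requests made by user u
    """
    m = len(q[0])
    rate = 0.0
    for f in range(1, m + 1):
        nobody_needs_it = 1.0
        for u, m_cached in enumerate(cache_sizes):
            if m_cached < f:  # user u does not cache file f
                nobody_needs_it *= (1.0 - q[u][f - 1]) ** requests[u]
        rate += 1.0 - nobody_needs_it
    return rate


# Small illustrative instance: m = 4 files, 2 users, M = 2, L = 1, gamma = 0.
q_uniform = zipf_pmf(4, 0.0)
lfu_rate = lfu_average_rate([q_uniform, q_uniform], [2, 2], [1, 1])
p_rlfu = truncated_uniform(4, 2)
print(q_uniform, p_rlfu, lfu_rate)
```

In the toy instance, files 3 and 4 are uncached and each is requested by at least one of the two users with probability $1 - 0.75^2$, giving an average LFU rate of 0.875 file transmissions.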
In Figure 5a,b, we fix the total number of users $n$ and the product between $L$ and $B$ ($L \times B = 1000$). Figure 5a plots the average rate for a network with $n = 40$ users, $\gamma = 0.5$, $L = 1$, and $B = 1000$. It is immediate to observe the impact of finite packetization on the multiplicative caching gain. In fact, as predicted by the theory (see [21]), the significant caching gain (with respect to LFU) quantified by the asymptotic performance of RAP-GLC (GLC with $B = \infty$) is completely lost when using RAP-GLC with finite packetization (GLC with $B = 1000$). On the other hand, RAP-HgLC remarkably preserves, at the expense of a slight increase in computational complexity, most of the multiplicative caching gain for the same value of file packetization. For example, in Figure 5a, if $M$ doubles from $M = 200$ to $M = 400$, then the rate achieved by RAP-HgLC reduces from 15 to 5.7. Furthermore, RAP-HgLC can achieve a factor of 3.5 rate reduction from LFU for $M = 500$. For the same regime, it is straightforward to verify that neither RAP-GLC nor LFU exhibits this property. Note from Figure 5a that to guarantee a rate of 10, RAP-GLC requires a cache size of $M = 500$, while RAP-HgLC can reduce the cache size requirement to $M = 250$, a 2× cache size reduction. Furthermore, while LFU can only provide an additive caching gain, additive and multiplicative gains may appear indistinguishable when $M$ is comparable to the library size $m$. Hence, one needs to pick a reasonably small $M$ ($m/n < M \ll m$) to observe the multiplicative caching gain of RAP-HgLC. Figure 5b shows the average rate for a network with $n = 40$ helpers/small-cells, each serving 10 users making requests according to a Zipf distribution with $\gamma = 0.5$. Hence, the total number of distinct requests per helper is up to $L_u = 10$ for every helper $u$. In this case, we assume $B = 100$ (instead of $B = 1000$ as in Figure 5a).
To ease the comparison with Figure 5a, we normalize the achievable rate (number of transmissions) by the file size and the number of requests.
Note from Figure 5a,b that as predicted by Theorem 3, when L u increases (from L u = 1 to L u = 10), almost the same multiplicative caching gain can be achieved with a smaller B (from B = 1000 to B = 100). In fact, from Figure 5a,b, we see that under RAP-HgLC, the average rate per request for B = 100 and L = 10 is almost the same as the average rate per request for B = 1000 and L = 1. This confirms the interesting tradeoff between B and L established in Theorem 3.
We can observe a similar behavior in Figure 6a,b. Figure 6a plots the average rate for a network with $n = 80$ users, $\gamma = 0.4$, $L = 1$, and $B = 200$. RAP-HgLC is able to preserve most of the multiplicative caching gain for the same values of file packetization. For example, in Figure 6a, if $M$ doubles from $M = 200$ to $M = 400$, then the rate achieved by RAP-HgLC essentially halves from 20 to 10. Furthermore, RAP-HgLC can achieve a factor of 5 rate reduction from LFU for $M = 500$. Note from Figure 6a that to guarantee a rate of 20, RAP-GLC requires a cache size of $M = 500$, while RAP-HgLC can reduce the cache size requirement to $M = 200$, a 2.5× cache size reduction. Figure 6b plots the average rate for a network with $n = 20$ helpers/small-cells, each serving 10 users making requests according to a Zipf distribution with $\gamma = 0.2$. Hence, the total number of distinct requests per helper is up to $L_u = 10$, $\forall u \in \{1, \dots, 20\}$. In this case, we assume $B = 100$. Differently from Figure 5b, here we plot the average rate without normalizing it by the number of requests.
Note from Figure 6a,b that, as predicted by Theorem 3, when $L_u$ increases (from $L_u = 1$ to $L_u = 10$), almost the same multiplicative caching gain can be achieved with a smaller $B$ (from $B = 200$ to $B = 100$). Furthermore, from Figures 5 and 6, we notice that increasing the Zipf parameter reduces the gains with respect to LFU. This is explained by the fact that when aggregating multiple requests, there is a higher number of overlapping requests, which increases the opportunities for naive multicasting (as clearly characterized in [13]). Note, however, that RAP-HgLC can remarkably keep similar gains with respect to LFU in this multiple-request setting, and approach the asymptotic performance even with just $B = 100$ packets per file, confirming the effectiveness of the local graph coloring and extra processing procedures in RAP-HgLC.

Conclusions
Coded multicasting has been shown to be a promising approach to significantly reduce the traffic load in wireless caching networks. However, most existing schemes require the number of packets per file to grow exponentially with the number of users. To address this challenge, in this paper we focused on a heterogeneous shared link caching network model and designed novel coded multicast algorithms based on local graph coloring that exhibit polynomial-time complexity in all the system parameters, and preserve the asymptotically proven multiplicative caching gain for finite file packetization. We also demonstrated that the number of packets per file can be traded-off with the number of requests collected by each cache, such that the same multiplicative caching gain can be preserved. Simulation results confirm the superiority of the proposed schemes and illustrate the tradeoff between request aggregation and computational complexity (driven by the packetization order), shedding light into the practical achievability of the promising multiplicative caching gain in next generation wireless networks.
Author Contributions: All authors have contributed in equal part to the results of this paper.
Funding: This research was funded in part by NSF grants #1619129, #1817154, #1824558, and by the Alexander von Humboldt Professorship.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1
To analytically characterize the performance of RAP-GLC, we consider two specific cases, $t = n$ (i.e., the coded-multicast-only scheme) and $t = 0$ (i.e., the naive-multicasting-only scheme), and refer to these schemes as RAP-GLC$_1$ and RAP-GLC$_2$, respectively. In the following, we compute the performance of each of the two cases and take the minimum of the two rates, which clearly serves as an upper bound on the rate of RAP-GLC. To compute the average total number of colors provided by RAP-GLC$_1$, we first see that for all $v \in I$ obtained in this algorithm, the labels $T_v$ are identical. Based on Algorithm 1, by construction the vertices of each independent set $I \subset V$ generated by RAP-GLC$_1$ have the same (unordered) label of users requesting or caching the packets $\{\rho(v) : v \in I\}$. We shall refer to such an unordered label of users as the user label of the independent set. Hence, we count the independent sets by enumerating all possible user labels and upper bounding how many independent sets $I$ Algorithm 1 generates for each user label. Consider a user label $\mathcal{U}^{\ell} \subset \mathcal{U}$ of size $\ell$, let $I(\mathcal{U}^{\ell}, \mathbf{f}, i)$ be the $i$-th independent set generated by Algorithm 1 with label $\mathcal{U}^{\ell}$, and let $J(\mathcal{U}^{\ell}, \mathbf{f}) = \{I(\mathcal{U}^{\ell}, \mathbf{f}, i) : \forall i\}$.
Following Algorithm 1, for each U , the number of used colors is |J (U , f)|. Given f, we can see that |J (U , f)| is a random variable which is a function of C. Let the indicator 1{T v fu = U } denote the event that vertex v f u from file f u requested by user u ∈ U is available in all the users in U but u and the rest of the vertices U \ U , then 1{T v fu = U } follows a Bernoulli distribution with parameter such that its expectation is λ(u, f u ). Then, we can see that given f , ∑ ∀v fu 1{T v fu = U } = λ(u, f u )B + o(B) with high probability [70]. Thus, as B → ∞, we have that with high probability, where f(U ) represent the set of files requested by U . Then, by averaging over the demand's distribution, we obtain that with high probability: where (a) is by using (A2) and (b) is obtained by computing the probability that the requested file f u in f(U ) maximizes λ(u, f ). δ 1 (B) denotes a smaller order term of ∑ L j=1 ∑ n =1 ∑ U ⊂U n j ∑ m f =1 ∑ u∈U ρ f ,u,U λ(u, f ). For any U , we obtain that ∑ f ∑ u∈U ρ f ,u,U = 1, and ρ f ,u,U denotes the probability that file f is the file with memory assignment p f ,u such that ρ f ,u,U ∆ = P( f = arg max which is the first term inside the minimum in (12).
Appendix A.2. Average Total Number of Colors for RAP-GLC$_2$
As described in Section 4.1, RAP-GLC$_2$ computes the minimum coloring of $H_{C,W}$ subject to the constraint that only the vertices representing the same packet can have the same color. In this case, the total number of colors is equal to the number of distinct requested packets, and the coloring can be found in $O(|V|^2)$. Starting from this valid coloring, RAP-GLC$_2$ computes $\max_{v \in V} |c(N^+(v))|$. To show that the performance of RAP-GLC$_2$ is upper bounded by $\bar{m} - \bar{M}$, with $\bar{m}$ and $\bar{M}$ given as in (13) and (14), respectively, we note that: where $B_{\tilde{f}, \mathbf{f}}$ is the number of chunks of file $\tilde{f}$ that are going to be transmitted given that the demand vector is equal to $\mathbf{f}$, and (a) is due to the observation that given a file $\tilde{f}$, = P(file $\tilde{f}$ is requested).
(A6) Normalizing (A5) by $B$, we obtain that: which is the second term inside the minimum in (12).

Appendix B. Proof of Theorem 2
In this section, we prove Theorem 2. Recall that for each $\mathcal{U}^{\ell}$, the number of used colors is given by $|J(\mathcal{U}^{\ell}, \mathbf{f})|$, and we have $|J(\mathcal{U}^{\ell}, \mathbf{f})| = \max_{u \in \mathcal{U}^{\ell}} \sum_{\forall v_{f_u}} 1\{T_{v_{f_u}} = \mathcal{U}^{\ell}\}$, where $\mathbf{f}(\mathcal{U}^{\ell})$ represents the set of files requested by $\mathcal{U}^{\ell}$. In this case, it is clear that Then let The goal is to find a condition on $L$ such that where (a) is because $M = \omega(1)$, such that $\delta_1$ is a smaller order term compared to the first term in (A12), and uses the variance of the Binomial distribution. Then, we let $l > 0$. Hence, $s(l)$ is an increasing function such that the minimum value of $s(l)$ takes place when $l = 1$. Thus, by using (A15) we obtain the sufficient condition is given by