Coded Caching for Broadcast Networks with User Cooperation

The caching technique is a promising approach to reducing heavy traffic loads and improving the user latency experience in the Internet of Things (IoT). In this paper, by exploiting edge cache resources and communication opportunities in device-to-device (D2D) networks and broadcast networks, two novel coded caching schemes are proposed that greatly reduce transmission latency in the centralized and decentralized caching settings, respectively. In addition to the multicast gain, both schemes obtain an additional cooperation gain offered by user cooperation and an additional parallel gain offered by parallel transmission between the server and users. With a newly established lower bound on the transmission delay, we prove that the centralized coded caching scheme is order-optimal, i.e., it achieves the minimum transmission delay within a constant multiplicative gap. The decentralized coded caching scheme is also order-optimal if each user's cache size is larger than a threshold that approaches zero as the total number of users tends to infinity. Moreover, theoretical analysis shows that, to reduce the transmission delay, the number of users sending signals simultaneously should be chosen according to the users' cache sizes, and that always letting more users send information in parallel can cause high transmission delay.


Introduction
With the rapid development of Internet of Things (IoT) technologies, IoT data traffic, such as live streaming and on-demand video streaming, has grown dramatically over the past few years. To reduce the traffic load and improve the user latency experience, the caching technique has been viewed as a promising approach that shifts network traffic to low-congestion periods. In the seminal paper [1], Maddah-Ali and Niesen proposed a coded caching scheme based on centralized file placement and coded multicast delivery that achieves a significantly larger global multicast gain than the conventional uncoded caching scheme.
The coded caching scheme has attracted wide and significant interest. It was extended to a setup with decentralized file placement, where no coordination is required during placement [2]. For the cache-aided broadcast network, ref. [3] showed that the rate-memory tradeoff of the above caching system is within a factor of 2.00884 of optimal. For the setting with uncoded file placement, where each user stores uncoded content from the library, refs. [4,5] proved that Maddah-Ali and Niesen's scheme is optimal. In [6], both the placement and delivery phases of coded caching were described by a placement delivery array (PDA), and an upper bound for all possible regular PDAs was established. In [7], the authors studied a cache-aided network with a heterogeneous setting where the user cache memories are unequal. More asymmetric network settings have been discussed, such as coded caching with heterogeneous user profiles [8], with distinct file sizes [9], with asymmetric cache sizes [10-12] and with distinct link qualities [13]. Settings with varying file popularities have been discussed in [14-16]. Coded caching that jointly considers various heterogeneous aspects was studied in [17]. Other works on coded caching include, e.g., cache-aided noiseless multi-server networks [18], cache-aided wireless/noisy broadcast networks [19-22], cache-aided relay networks [23-25], cache-aided interference management [26,27], coded caching with random demands [28], caching in combination networks [29], coded caching under secrecy constraints [30], coded caching with reduced subpacketization [31,32], the coded caching problem where each user requests multiple files [33], and cache-aided broadcast networks for correlated content [34].
A different line of work studies cache-aided networks without a server, e.g., the device-to-device (D2D) cache-aided network. In [35], the authors investigated coded caching for wireless D2D networks, where users are located in a fixed mesh-topology network. A D2D system with selfish users, who do not participate in delivering the missing subfiles to all users, was studied in [36]. Wang et al. applied the PDA framework to cache-aided D2D wireless networks in [37]. In [38], the authors studied spatial D2D networks in which the user locations are modeled by a Poisson point process. For heterogeneous cache-aided D2D networks, where users are equipped with cache memories of distinct sizes, ref. [39] minimized the delivery load by optimizing the file partition during the placement phase and the size and structure of the D2D groups during the delivery phase. A highly dense wireless network with device mobility was investigated in [40].
In fact, combining the cache-aided broadcast network with the cache-aided D2D network can potentially reduce the transmission latency. This hybrid network is common in many practical distributed systems such as cloud networks [41], where a central cloud server broadcasts messages to multiple users through the cellular network while users communicate with each other through a fiber local area network (LAN). A typical scenario is that users in a moderately dense area, such as a university, want to download files, such as movies, from a data library, such as a video service provider. The user demands are highly redundant, and the files are not only stored by the central server but may also be partially cached by the users. A user can thus obtain its desired content by communicating with both the central server and other users, so that communication and storage resources are used efficiently. Unfortunately, there is very little research investigating the coded caching problem for this hybrid network. In this paper, we consider such a hybrid cache-aided network, where a server storing N ∈ Z+ files connects with K ∈ Z+ users through a broadcast network, and the users can additionally exchange information via a D2D network. Unlike the settings of [35,38], in which each user can only communicate with its neighboring users via spatial multiplexing, we consider the D2D network to be either an error-free shared link or a flexible routing network [18]. In particular, in the case of the shared link, all users exchange information via a single shared link. In the flexible routing network, a routing strategy adaptively partitions all users into multiple groups, in each of which one user sends data packets error-free to the remaining users of that group.
Let α ∈ Z+ be the number of groups that send signals at the same time. The following fundamental questions then arise for this hybrid cache-aided network:
• How does α affect the system performance?
• What is the (approximately) optimal value of α to minimize the transmission latency?
• How should communication loads be allocated between the server and users to achieve the minimum transmission latency?
In this paper, we address these questions, and our main contributions are summarized as follows:
• We propose novel coded caching schemes for this hybrid network under centralized and decentralized data placement. Both schemes efficiently exploit communication opportunities in the D2D and broadcast networks, and appropriately allocate communication loads between the server and users. In addition to the multicast gain, our schemes achieve much smaller transmission latency than both Maddah-Ali and Niesen's scheme for the broadcast network [1,2] and the D2D coded caching scheme [35]. We characterize a cooperation gain and a parallel gain achieved by our schemes, where the cooperation gain is obtained through cooperation among users in the D2D network, and the parallel gain is obtained through the parallel transmission between the server and users.
• We prove that the centralized scheme is order-optimal, i.e., it achieves the optimal transmission delay within a constant multiplicative gap in all regimes. Moreover, the decentralized scheme is also order-optimal when the cache size M of each user is larger than the threshold N(1 − (1/(K + 1))^{1/(K−1)}), which approaches zero as K → ∞.

• For the centralized data placement case, theoretical analysis shows that α should decrease as the user cache size increases. In particular, when each user's cache size is sufficiently large, only one user should be allowed to send information, indicating that the D2D network can simply be a shared link connecting all users. For the decentralized data placement case, α should change dynamically according to the sizes of the subfiles created in the placement phase. In other words, always letting more users send information in parallel can cause a high transmission delay.
Please note that the decentralized scenario is much more complicated than the centralized one, since each subfile can be stored by s = 1, 2, . . . , K users, leading to a dynamic file-splitting and communication strategy in the D2D network. Our schemes, in particular the decentralized coded caching scheme, differ greatly from the D2D coded caching scheme in [35]. Specifically, ref. [35] considered a fixed network topology where each user connects with a fixed set of users, and the total user cache size must be large enough to store all files in the library. In our schemes, by contrast, the user group partition changes dynamically, and each user can communicate with any set of users via network routing. Moreover, in our model the server shares communication loads with the users, resulting in a load-allocation problem between the broadcast network and the D2D network. Finally, our schemes achieve a tradeoff between the cooperation gain, parallel gain and multicast gain, while the schemes in [1,2,35] achieve only the multicast gain.
The remainder of this paper is organized as follows. Section 2 presents the system model and defines the main problem studied in this paper. Section 3 summarizes the main results. Section 4 gives a detailed description of the centralized coded caching scheme with user cooperation. Section 5 extends the techniques developed for the centralized caching problem to the setting of decentralized random caching. Section 6 concludes the paper.

System Model and Problem Definition
Consider a cache-aided network consisting of a single server and K users, as depicted in Figure 1. The server has a library of N independent files W_1, . . . , W_N. Each file W_n, n = 1, . . . , N, is uniformly distributed over [2^F] for some positive integer F. The server connects with the K users through a noise-free shared link that is rate-limited to a network speed of C_1 bits per second (bits/s). Each user k ∈ [K] is equipped with a cache memory of size MF bits, for some M ∈ [0, N], and the users can communicate with each other via a D2D network.
We mainly focus on two types of D2D networks: a shared link as in [1,2] and a flexible routing network introduced in [18]. In the case of a shared link, all users connect with each other through a shared error-free link that is rate-limited to C_2 bits/s. In the flexible routing network, the K users can arbitrarily form multiple groups via network routing, in each of which at most one user can send error-free data packets at a network speed of C_2 bits/s to the remaining users within the group. To unify these two types of D2D networks, we introduce an integer α_max ∈ {1, ⌊K/2⌋}, which denotes the maximum number of groups allowed to send data in parallel in the D2D network. When α_max = 1, the D2D network degenerates into a shared link, and when α_max = ⌊K/2⌋, it becomes the flexible routing network. The system works in two phases: a placement phase and a delivery phase. In the placement phase, all users can access the entire library W_1, . . . , W_N and fill their cache memories. More specifically, each user k ∈ [K] maps W_1, . . . , W_N to its cache content Z_k = φ_k(W_1, . . . , W_N). In the delivery phase, at most α_max users can send signals in parallel in each channel use, and the set of users who send signals in parallel can be adaptively changed during the delivery phase.
At the end of the delivery phase, due to the error-free transmission in the broadcast and D2D networks, user k observes the symbols sent to it, i.e., (X_j : j ∈ [K], k ∈ D_j), and decodes its desired message as Ŵ_{d_k} = ψ_{k,d}(X, (X_j : j ∈ [K], k ∈ D_j), Z_k), where ψ_{k,d} is a decoding function.
We define the worst-case probability of error as P_e = max_{d ∈ [N]^K} max_{k ∈ [K]} Pr{Ŵ_{d_k} ≠ W_{d_k}}. A coded caching scheme (M, R_1, R_2) consists of caching functions {φ_k}, encoding functions {f_d, f_{k,d}} and decoding functions {ψ_{k,d}}. We say that the tuple (M, R_1, R_2) is achievable if for every ε > 0 and every sufficiently large file size F, there exists a coded caching scheme such that P_e is less than ε.
Since the server and the users send signals in parallel, the total transmission delay, denoted by T, is defined as the maximum of the delays over the broadcast and D2D networks. The optimal transmission delay is T* ≜ inf{T : T is achievable}. For simplicity, we assume that C_1 = C_2 = F, and then from (7) we have T = max{R_1, R_2}. When C_1 ≠ C_2, e.g., C_1 : C_2 = 1/k, a small adjustment allowing our scheme to continue to work is to multiply λ by 1/(k(1 − λ) + λ), where λ is a design parameter introduced later. Our goal is to design a coded caching scheme that minimizes the transmission delay. Finally, in this paper we assume K ≤ N and M ≤ N. Extending the results to other scenarios is straightforward, as mentioned in [1].
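As a minimal sketch of this delay model (the function names are ours, not the paper's): with C_1 = C_2 = F the loads R_1 and R_2 are normalized by F, the total delay is their maximum, and the C_1 ≠ C_2 adjustment rescales λ as described above.

```python
def transmission_delay(R1: float, R2: float) -> float:
    """Total delay T = max{R1, R2} when C1 = C2 = F (loads normalized by F),
    since the server and the users transmit in parallel."""
    return max(R1, R2)


def adjust_lambda(lam: float, k: float) -> float:
    """Adjustment mentioned for C1 != C2 with C1 : C2 = 1/k: multiply the
    load-allocation parameter lambda by 1/(k(1 - lambda) + lambda)."""
    return lam / (k * (1.0 - lam) + lam)
```

For instance, `transmission_delay(2/15, 1/3)` returns 1/3, matching the centralized example in Section 4; for k = 1 the adjustment leaves λ unchanged.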

Main Results
We first establish a general lower bound on the transmission delay for the system model described in Section 2, then present two upper bounds of the optimal transmission delay achieved by our centralized and decentralized coded caching schemes, respectively. Finally, we present the optimality results of these two schemes.
Theorem 1 (Lower Bound). For memory size 0 ≤ M ≤ N, the optimal transmission delay T* is lower bounded by (9).

Proof. See the proof in Appendix A.

Centralized Coded Caching
In the following theorem, we present an upper bound on the transmission delay for the centralized caching setup.
Theorem 2 (Upper Bound for the Centralized Scenario). Let t ≜ KM/N ∈ Z+. For memory size M ∈ {0, N/K, 2N/K, . . . , N}, the optimal transmission delay T* is upper bounded by

T_central = min_{α ∈ [α_max]} K(1 − M/N) / (1 + t + α min{⌈K/α⌉ − 1, t}).   (10)

For general 0 ≤ M ≤ N, the lower convex envelope of these points is achievable.
Proof. See scheme in Section 4.
The following simple example shows that the proposed upper bound can greatly reduce the transmission delay.

Example 1. Consider a network described in Section 2 with KM/N = K − 1. The coded caching scheme without D2D communication [1] has the server multicast XOR messages useful for all K users, achieving the transmission delay K(1 − M/N)/(1 + t) = 1/K. The achievable transmission delay in Theorem 2 equals 1/(2K − 1) by letting α = 1, almost half the transmission delay of the previous scheme when K is sufficiently large.
From (10), we obtain that the optimal value of α, denoted by α*, equals 1 if t ≥ K − 1, and equals α_max if t ≤ K/α_max − 1. Ignoring all integer constraints, we obtain α* = K/(t + 1). We rewrite this choice in (11).

Remark 1. From (11), we observe that when M is small enough that t ≤ K/α_max − 1, we have α* = α_max. As M increases, α* becomes K/(t + 1), which is smaller than α_max. When M is sufficiently large that M ≥ (K − 1)N/K, only one user should be allowed to send information, i.e., α* = 1. This indicates that letting more users send information in parallel could be harmful. The main reason for this phenomenon is the tradeoff between the multicast gain, cooperation gain and parallel gain, which is introduced below in this section.
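A one-line sketch of this choice (ignoring integer constraints, as in (11); the function name is ours):

```python
def optimal_alpha(K: int, t: float, alpha_max: int) -> float:
    """alpha* = K/(t+1), clipped to the feasible range [1, alpha_max]:
    alpha* = alpha_max when t <= K/alpha_max - 1, and alpha* = 1 when
    t >= K - 1, matching the two extremes discussed above."""
    return min(max(K / (t + 1), 1.0), float(alpha_max))
```

For example, with K = 6 and t = 5 (so t ≥ K − 1) the clip yields α* = 1, while with K = 12, t = 1 and α_max = 3 it yields α* = α_max = 3.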
Comparing T_central with the transmission delay K(1 − M/N)/(1 + t) achieved by Maddah-Ali and Niesen's scheme for the broadcast network [1], T_central contains an additional factor, given in (12), which we refer to as the centralized cooperation gain, as it arises from user cooperation. Comparing T_central with the transmission delay achieved by the D2D coded caching scheme [35], T_central contains another additional factor, given in (13), which we refer to as the centralized parallel gain, as it arises from the parallel transmission between the server and users. Both gains depend on K, M/N and α_max.
Substituting the optimal α* into (12) yields (14). When (K, N, α_max) are fixed, G_central,c is in general not a monotonic function of M. More specifically, when M is small enough that t < K/α_max − 1, the function G_central,c is monotonically decreasing, reflecting the improvement brought by introducing D2D communication. This is mainly because a relatively larger M allows users to share more common data with each other, providing more opportunities for user cooperation. However, when M grows large enough that t ≥ K/α_max − 1, the local and global caching gains become dominant, and less improvement can be obtained from user cooperation, turning G_central,c into a monotonically increasing function of M. Similarly, substituting the optimal α* into (13) yields (15). Equation (15) shows that G_central,p is monotonically increasing with t, mainly because as M increases, more content can be sent through the D2D network without the help of the central server, decreasing the improvement from parallel transmission between the server and users. The centralized cooperation gain (12) and parallel gain (13) together quantify the reduction of T_central relative to the two baseline schemes.
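The two gains can be sketched numerically. The expressions below are our reconstruction, consistent with the monotonic behavior described above: G_central,c = T_central/T_MN and G_central,p = T_central/T_D2D, where T_D2D = (N/M)(1 − M/N) is the delay of the D2D-only scheme assumed from [35]; both lie in [0, 1].

```python
import math


def _m(K: int, t: float, alpha: int) -> float:
    # Number of receivers served per D2D sender: min(ceil(K/alpha) - 1, t).
    return min(math.ceil(K / alpha) - 1, t)


def cooperation_gain(K: int, t: float, alpha: int) -> float:
    """G_central,c = T_central / T_MN = (1 + t) / (1 + t + alpha * m)."""
    return (1 + t) / (1 + t + alpha * _m(K, t, alpha))


def parallel_gain(K: int, t: float, alpha: int) -> float:
    """G_central,p = T_central / T_D2D = t / (1 + t + alpha * m),
    using T_D2D = (N/M)(1 - M/N) = (K/t)(1 - M/N)."""
    return t / (1 + t + alpha * _m(K, t, alpha))
```

For K = 6, t = 4 and α = 1, this gives G_central,c = 5/9 and G_central,p = 4/9.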

Remark 2.
A larger α can lead to better parallel and cooperation gains (more users can concurrently multicast signals to other users), but results in a worse multicast gain (signals are multicast to fewer users in each group). The choice of α in (11) is in fact a tradeoff between the multicast gain, parallel gain and cooperation gain.
The proposed scheme achieving the upper bound in Theorem 2 is order-optimal.
Theorem 3. For memory size 0 ≤ M ≤ N, the ratio T_central/T* is upper bounded by a constant.

Proof. See the proof in Appendix B.
The exact gap T_central/T* could be much smaller. One could apply the method proposed in [3] to obtain a tighter lower bound and shrink the gap. In this paper, we only prove the order optimality of the proposed scheme, and leave finding a smaller gap as future work. Figure 3 plots the lower bound (9) and the upper bounds achieved by various schemes, including the proposed scheme, the scheme Maddah-Ali 2014 in [1], which considers the broadcast network without D2D communication, and the scheme Ji 2016 in [35], which considers the D2D network without a server. Our scheme outperforms the previous schemes and closely approaches the lower bound.
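A numeric sketch of this comparison, under assumed baseline expressions: T_MN = K(1 − M/N)/(1 + t) for the scheme of [1], T_D2D = (N/M)(1 − M/N) for the D2D-only scheme of [35] (valid when t ≥ 1), and the Theorem 2 bound optimized over α.

```python
import math


def compare_delays(K: int, N: int, M: float, alpha_max: int):
    """Return (T_MN, T_D2D, T_central) at one memory point; T_central
    minimizes the Theorem 2 expression over alpha in [alpha_max]."""
    t = K * M / N
    T_mn = K * (1 - M / N) / (1 + t)
    T_d2d = (N / M) * (1 - M / N)
    T_central = min(
        K * (1 - M / N) / (1 + t + a * min(math.ceil(K / a) - 1, t))
        for a in range(1, alpha_max + 1)
    )
    return T_mn, T_d2d, T_central
```

For K = 10, N = 20, M = 2 and α_max = 5, this gives roughly (4.5, 9.0, 1.29): the hybrid scheme sits well below both baselines, as in Figure 3.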

Decentralized Coded Caching
We exploit the multicast gain from coded caching, D2D communication, and parallel transmission between the server and users, leading to the following upper bound.

Theorem 4 (Upper Bound for the Decentralized Scenario). For memory size 0 ≤ M ≤ N with p = M/N, the optimal transmission delay T* is upper bounded by T_decentral given in (17).

Proof. Here, R_∅ represents the transmission rate for sending the contents that are not cached by any user, while R_s and R_u represent the transmission rate sent by the server via the broadcast network and the transmission rate sent by the users via the D2D network, respectively. Equation (17) balances the communication loads assigned to the server and the users. See the detailed proof in Section 5.
The key idea of the scheme achieving (17) is to partition the K users into ⌈K/s⌉ groups for each communication round s ∈ [K − 1], and to let each group perform the D2D coded caching scheme [35] to exchange information. The main challenge is that among these groups there are ⌊K/s⌋ groups of the same size s, plus an abnormal group of size (K mod s) if (K mod s) ≠ 0, leading to an asymmetric caching setup. One may use the scheme of [35] for the groups of size s, and likewise for the abnormal group when (K mod s) ≥ 2, but how to exploit the caching resources and communication capabilities of all groups, while balancing the communication loads between the two types of groups to minimize the transmission delay, remains elusive and needs to be carefully designed. Moreover, this challenge introduces complexity both in establishing the upper bound and in the optimality proof.

Remark 3.
The upper bound in Theorem 4 is achieved by setting the number of users that send signals in parallel as follows: if ⌈K/s⌉ ≤ α_max, the number of users who send data in parallel is ⌈K/s⌉, which can be smaller than α_max, indicating that always letting more users send messages in parallel could cause a higher transmission delay. For example, when K ≥ 4,

Remark 4.
From the definitions of T_decentral, R_s, R_u and R_∅, it is easy to obtain that R_∅ ≤ T_decentral ≤ R_s, that T_decentral decreases as α_max increases, and that T_decentral increases as R_u increases when R_u ≥ R_∅.
Due to the complicated term R_u, T_decentral in Theorem 4 is hard to evaluate. Since T_decentral increases as R_u increases (see Remark 4), substituting the following upper bound on R_u into (17) provides an efficient way to evaluate T_decentral.

Corollary 1. For memory size 0 ≤ p ≤ 1, the upper bound on R_u is given below:
• α_max = ⌊K/2⌋ (a flexible network):

Proof. See the proof in Appendix C.
Recall that the transmission delay achieved by the decentralized scheme without D2D communication [2] equals R_s given in (19). We define the ratio between T_decentral and R_s as the decentralized cooperation gain G_decentral,c, with G_decentral,c ∈ [0, 1] because R_∅ ≤ R_s. As in the centralized scenario, this gain arises from the coordination among users in the D2D network. Moreover, we also compare T_decentral with the transmission delay (1 − p)/p achieved by the decentralized D2D coded caching scheme [35], and define the ratio between R_s and (1 − p)/p as the decentralized parallel gain G_decentral,p, where G_decentral,p ∈ [0, 1] arises from the parallel transmission between the server and the users. We plot the decentralized cooperation gain and parallel gain for the two types of D2D networks in Figure 4 for N = 20 and K = 10. It can be seen that G_decentral,c and G_decentral,p are in general not monotonic functions of M. Here, G_decentral,c behaves similarly to G_central,c. When M is small, the function G_decentral,c monotonically decreases from value 1 until reaching its minimum; for larger M, it turns to increase monotonically with M. The reason for this phenomenon is that in the decentralized scenario, as M increases, the proportion of subfiles that are not cached by any user and must be sent by the server decreases, so more subfiles can be sent in parallel via the D2D network. Meanwhile, the decentralized scheme in [2] offers an additional multicast gain, and we need to balance these two gains to reduce the transmission delay. The function G_decentral,p behaves differently, as it monotonically increases when M is small.
After reaching its maximum, the function G_decentral,p decreases monotonically until meeting a local minimum (the abnormal bend in the parallel gain when α_max = ⌊K/2⌋ comes from a balancing effect between G_decentral,c and 1 − (1 − p)^K in (27)); then G_decentral,p turns into a monotonically increasing function for large M. Similar to the centralized case, as M increases, the impact of parallel transmission between the server and users becomes smaller, since more data can be transmitted by the users.
Proof. See the proof in Appendix D.

Figure 5 plots the lower bound (9) and the upper bounds achieved by various decentralized coded caching schemes, including our scheme, the scheme Maddah-Ali 2015 in [2], which considers the case without D2D communication, and the scheme Ji 2016 in [35], which considers the case without a server.

Coding Scheme under Centralized Data Placement
In this section, we describe a novel centralized coded caching scheme for arbitrary K, N and M such that t = KM/N is a positive integer. The scheme can be extended to the general case 1 ≤ t ≤ K by following the same approach as in [1].
We first use an illustrative example to show how we form D2D communication groups, split files and deliver data, and then present our generalized centralized coding caching scheme.

An Illustrative Example
Consider a network consisting of K = 6 users with cache size M = 4 and a library of N = 6 files; thus t = KM/N = 4. Divide all six users into two groups of equal size, and choose an integer L_1 = 2 that guarantees that K (K−1 choose t) L_1 / (α min{⌈K/α⌉ − 1, t}) is an integer. (According to (11) and (29), one optimal choice would be (α = 1, L_1 = 4, λ = 5/9); here we choose (α = 2, L_1 = 2, λ = 1/3) for simplicity, and also to demonstrate that even with a suboptimal choice our scheme still outperforms those in [1,35].) Split each file W_n, for n = 1, . . . , N, into 3 × (6 choose 4) = 45 subfiles. The two groups then exchange the subfiles with superscripts l = 1, 2 in parallel over the D2D network; for instance, one user in the second group multicasts XOR symbols to users 5 and 6. The transmission rate in the D2D network is R_2 = 1/3. For the remaining subfiles with superscript l = 3, the server delivers them in the same way as in [1]. Specifically, it sends the symbols ⊕_{k∈S} W^3_{d_k, S\{k}} for all S ⊆ [K] with |S| = 5, so the rate sent by the server is R_1 = 2/15. The transmission delay is T_central = max{R_1, R_2} = 1/3, which is less than the delays achieved by the coded caching schemes for the broadcast network [1] and for the D2D network [35], respectively.

The Generalized Centralized Coding Caching Scheme
In the placement phase, each file is first split into (K choose t) subfiles of equal size. More specifically, split W_n into subfiles as W_n = (W_{n,T} : T ⊂ [K], |T| = t). User k caches all subfiles W_{n,T} with k ∈ T, for all n = 1, . . . , N, occupying a cache memory of MF bits. Then split each subfile W_{n,T} into two mini-files as W_{n,T} = (W^s_{n,T}, W^u_{n,T}), of sizes λF/(K choose t) and (1 − λ)F/(K choose t) bits, respectively, with λ = (1 + t)/(1 + t + α min{⌈K/α⌉ − 1, t}) as given in (29). Here, the mini-files W^s_{n,T} and W^u_{n,T} will be sent by the server and the users, respectively. Split each mini-file W^u_{n,T} into L_1 pico-files of equal size (1 − λ)F/(L_1 (K choose t)), i.e., W^u_{n,T} = (W^{u,1}_{n,T}, . . . , W^{u,L_1}_{n,T}), where L_1 satisfies (30). As we will see later, condition (29) ensures that communication loads can be optimally allocated between the server and the users, and (30) ensures that the number of subfiles is large enough to maximize the multicast gain for the transmission in the D2D network.
In the delivery phase, each user k requests the file W_{d_k}. The request vector d = (d_1, d_2, . . . , d_K) is made known to the server and all users. Please note that different parts of file W_{d_k} have been stored in the user cache memories, and thus the uncached parts of W_{d_k} can be sent both by the server and by the users. Consider a group G_i and a user subset S ⊆ G_i. User k ∈ S sends an XOR symbol that contains the requested subfiles of the remaining users in S, i.e., ⊕_{j ∈ S\{k}} W^{u,l(k,G_i,S)}_{d_j, S\{j}}. The other groups perform similar steps and concurrently deliver the remaining requested subfiles to their users.
By changing group partition and performing the delivery strategy described above, we can send all the requested subfiles to the users.
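The inner-group XOR exchange above can be illustrated with a toy group (the sizes and names are ours, chosen for illustration): each user caches every subfile indexed by a set containing it, one user multicasts the XOR of the pieces wanted by the others, and each receiver cancels the terms it already caches.

```python
import os
from functools import reduce
from itertools import combinations

K, t, SIZE = 3, 2, 8                      # toy parameters: 3 users, t = 2
demands = {0: 0, 1: 1, 2: 2}              # user k requests file d_k = k

# Subfile W[n, T] is cached by exactly the users in T (|T| = t).
subfiles = {(n, T): os.urandom(SIZE)
            for n in range(K) for T in combinations(range(K), t)}


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


S = (0, 1, 2)                             # the group
sender = 0
# The sender multicasts the XOR of W[d_j, S \ {j}] over receivers j != sender.
symbol = reduce(xor, [subfiles[(demands[j], tuple(u for u in S if u != j))]
                      for j in S if j != sender])

# Receiver 1 caches W[d_2, (0, 1)] (since 1 is in (0, 1)) and strips it off
# to recover its missing piece W[d_1, (0, 2)].
recovered = xor(symbol, subfiles[(demands[2], (0, 1))])
assert recovered == subfiles[(demands[1], (0, 2))]
```

One multicast symbol thus serves every other user in the group simultaneously, which is the multicast gain exploited in the D2D network.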
Since α groups send signals in parallel (i.e., α users concurrently deliver content), and each user in a group delivers a symbol containing min{⌈K/α⌉ − 1, t} non-repeating pico-files requested by other users, in order to send all requested subfiles in (31) we need to send in total

K (K−1 choose t) L_1 / (α min{⌈K/α⌉ − 1, t})   (32)

XOR symbols, each of size (1 − λ)F/(L_1 (K choose t)) bits. Notice that L_1 is chosen according to (30), ensuring that (32) is an integer. Thus, we obtain R_2 as

R_2 = (1 − λ) K (1 − M/N) / (α min{⌈K/α⌉ − 1, t}) = K(1 − M/N) / (1 + t + α min{⌈K/α⌉ − 1, t}),   (33)

where the last equality holds by (29). Now consider the delivery of the subfiles sent by the server. Applying the delivery strategy of [1], i.e., the server broadcasts ⊕_{k∈S} W^s_{d_k, S\{k}} to all users for every S ⊆ [K] with |S| = t + 1, we obtain the transmission rate of the server

R_1 = λ K (1 − M/N) / (1 + t).   (34)

From (33) and (34), we see that the choice of λ in (29) guarantees equal communication loads at the server and the users. Since the server and the users transmit simultaneously, the transmission delay of the whole network is the maximum of R_1 and R_2, i.e., T_central = max{R_1, R_2} = K(1 − M/N) / (1 + t + α min{⌈K/α⌉ − 1, t}), for some α ∈ [α_max].
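The load allocation can be sketched as follows (our reconstruction of (29) and of the rates (33)-(34); the function name is ours, and the values are checked against the example of Section 4.1):

```python
import math


def centralized_rates(K: int, N: int, M: int, alpha: int, lam=None):
    """Return (R1, R2, T) for the centralized scheme.  If lam is None it is
    set to (1+t)/(1+t+alpha*m), our reading of (29), which equalizes the
    server load R1 and the D2D load R2."""
    t = K * M // N                                   # t = KM/N, integer here
    m = min(math.ceil(K / alpha) - 1, t)             # receivers per D2D sender
    if lam is None:
        lam = (1 + t) / (1 + t + alpha * m)
    R1 = lam * K * (1 - M / N) / (1 + t)             # server, scheme of [1]
    R2 = (1 - lam) * K * (1 - M / N) / (alpha * m)   # users, D2D delivery
    return R1, R2, max(R1, R2)
```

With (K, N, M, α, λ) = (6, 6, 4, 2, 1/3), this reproduces R_1 = 2/15 and R_2 = 1/3 from the example; with the default λ, both loads equal 2/9.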

Coding Scheme under Decentralized Data Placement
In this section, we present a novel decentralized coded caching scheme for joint broadcast network and D2D network. The decentralized scenario is much more complicated than the centralized scenario, since each subfile can be stored by s = 1, 2, . . . , K users, leading to a dynamic file-splitting and communication strategy in the D2D network. We first use an illustrative example to demonstrate how we form D2D communication groups, split data and deliver data, and then present our generalized coding caching scheme.

An Illustrative Example
Consider a joint broadcast and D2D network consisting of K = 7 users. Under the decentralized data placement strategy, the subfiles cached by user k can be written as (W_{n,T} : n ∈ [N], k ∈ T). We focus on the delivery of the subfiles (W_{n,T} : n ∈ [N], k ∈ T, |T| = s = 4), i.e., each of these subfiles is stored by s = 4 users. A similar process can be applied to deliver the other subfiles with s ∈ [K]\{4}.
To allocate communication loads between the server and users, we divide each subfile into two mini-files W_{n,T} = (W^s_{n,T}, W^u_{n,T}), where the mini-files {W^s_{n,T}} and {W^u_{n,T}} will be sent by the server and the users, respectively. To reduce the transmission delay, the sizes of W^s_{n,T} and W^u_{n,T} need to be chosen properly such that R_1 = R_2, i.e., the transmission rates of the server and the users are equal; see (37) and (39) ahead. Divide all users into two non-intersecting groups (G^r_1, G^r_2), for r ∈ [35], with |G^r_1| = 4 and |G^r_2| = 3. There are (7 choose 4) = 35 such partitions in total, hence r ∈ [35]. Please note that for any user k ∈ G^r_i, |G^r_i| − 1 of its requested mini-files are already cached by the other users in G^r_i, for i = 1, 2.
To avoid repeated transmission of any mini-file, each mini-file is divided into non-overlapping pico-files W^{u_1}_{d_k,T\{k}} and W^{u_2}_{d_k,T\{k}}. The sizes of W^{u_1}_{n,T} and W^{u_2}_{n,T} need to be chosen properly so that groups G^r_1 and G^r_2 have equal transmission rates; see (51) and (52) ahead. To allocate communication loads between the two different types of groups, split each W^{u_1}_{d_k,T\{k}} and W^{u_2}_{d_k,T\{k}} into three and two equal fragments, respectively. During the delivery phase, in each round, one user in each group produces and multicasts an XOR symbol to all other users in the same group, as shown in Table 2.

Table 2. Parallel user delivery when K = 7, s = 4, |G^r_1| = 4 and |G^r_2| = 3, r ∈ [35].
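The partition count in this example is easy to check (a sketch; the names are ours): each partition (G^r_1, G^r_2) is fixed by the choice of the size-4 group G^r_1.

```python
from itertools import combinations

USERS = frozenset(range(7))               # K = 7 users, s = 4

# Every partition (G1, G2) with |G1| = 4 is determined by choosing G1.
partitions = [(frozenset(G1), USERS - frozenset(G1))
              for G1 in combinations(sorted(USERS), 4)]

assert len(partitions) == 35              # C(7, 4) = 35, so r ranges over [35]
assert all(len(g1 & g2) == 0 and g1 | g2 == USERS for g1, g2 in partitions)
```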

The Generalized Decentralized Coded Caching Scheme
In the placement phase, each user k applies its caching function to map a subset of MF/N bits of each file W_n, n = 1, . . . , N, into its cache memory at random. This induces the partition W_n = (W_{n,T} : T ⊆ [K]), and the subfiles cached by user k can be written as (W_{n,T} : n ∈ [N], k ∈ T, T ⊆ [K]). When the file size F is sufficiently large, by the law of large numbers, the subfile size with high probability is |W_{n,T}| ≈ p^{|T|}(1 − p)^{K−|T|} F, with p = M/N. The delivery procedure is characterized at three different levels: allocating communication loads between the server and the users, inner-group coding (i.e., transmission within each group), and parallel delivery among groups.
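The subfile-size concentration can be sketched as follows (the function name is ours); summing over all subsets T confirms that the subfiles partition each file.

```python
from math import comb


def subfile_fraction(K: int, p: float, s: int) -> float:
    """Expected fraction of a file cached by exactly a given set of s users
    under independent random placement with p = M/N."""
    return p ** s * (1.0 - p) ** (K - s)


K, p = 7, 0.3
total = sum(comb(K, s) * subfile_fraction(K, p, s) for s in range(K + 1))
assert abs(total - 1.0) < 1e-12           # the 2^K subfiles cover the file
```

Note that the s = 0 term, (1 − p)^K, is the fraction cached by no user, which must be sent exclusively by the server.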

Allocating Communication Loads between the Server and User
To allocate communication loads between the server and the users, split each subfile W_{n,T}, for T ⊆ [K] with T ≠ ∅, into two non-overlapping mini-files W_{n,T} = (W^s_{n,T}, W^u_{n,T}), of sizes λ|W_{n,T}| and (1 − λ)|W_{n,T}|, where λ is a design parameter whose value is determined in Remark 5.
The mini-files (W^s_{d_k,T\{k}} : k ∈ [K]) will be sent by the server using the decentralized coded caching scheme for the broadcast network [2], leading to the transmission delay in (38), where R_s is defined in (19). The mini-files (W^u_{d_k,T\{k}} : k ∈ [K]) will be sent by the users using the parallel user delivery described in Section 5.2.3; the corresponding transmission rate is given in (39), where R_u represents the number of bits sent by each user, normalized by F. Since the subfile W_{d_k,∅} is not cached by any user and must be sent exclusively by the server, the corresponding transmission delay for sending (W_{d_k,∅} : k ∈ [K]) is given in (40), where R_∅ coincides with the definition in (18). Combining (38)-(40) yields (41). According to (8), we have T_decentral = max{R_1, R_2}.

Remark 5 (Choice of λ).
The parameter λ is chosen such that T_decentral is minimized. If R_u < R_∅, then the inequality R_2 ≤ R_1 always holds, and T_decentral reaches its minimum.
Inner-Group Coding

Split each $W^{i}_{d_k,\mathcal{S}\setminus\{k\}}$ into $(|\mathcal{G}|-1)\gamma$ non-overlapping fragments of equal size, and let each user $k \in \mathcal{G}$ take turns broadcasting the XOR symbol $X^{i}_{k,\mathcal{G},s}$ given in (43), where $l(k,\mathcal{G},\mathcal{S}) \in [(|\mathcal{G}|-1)\gamma]$ is a function of $(k,\mathcal{G},\mathcal{S})$ that avoids redundant transmission of any fragment. The XOR symbol $X^{i}_{k,\mathcal{G},s}$ is received and decoded by the remaining users in $\mathcal{G}$.
For each group $\mathcal{G}$, inner-group coding encodes in total $\binom{K-|\mathcal{G}|}{s-|\mathcal{G}|}$ mini-files $W^{i}_{d_k,\mathcal{S}\setminus\{k\}}$, and each XOR symbol $X^{i}_{k,\mathcal{G},s}$ in (43) contains fragments required by $|\mathcal{G}|-1$ users in $\mathcal{G}$.
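The inner-group exchange can be illustrated with a toy three-user group. The fragment layout and the side-information assumption (each receiver caches the fragments intended for the other group members) are simplifications for illustration, not the exact fragmentation of (43):

```python
from functools import reduce

def xor_bytes(chunks):
    """Bitwise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), chunks))

# Toy group of 3 users. frag[k][j] is the fragment user k caches and user j wants.
group = [0, 1, 2]
frag = {k: {j: bytes([10 * k + j] * 4) for j in group if j != k} for k in group}

recovered = {j: [] for j in group}
for k in group:  # user k multicasts one XOR symbol to the rest of its group
    symbol = xor_bytes([frag[k][j] for j in group if j != k])
    for j in group:
        if j == k:
            continue
        # User j cancels the fragments intended for the other receivers,
        # which it also caches by assumption in this toy model.
        known = [frag[k][i] for i in group if i not in (j, k)]
        recovered[j].append(xor_bytes([symbol] + known))

ok = all(frag[k][j] in recovered[j] for k in group for j in group if j != k)
print(ok)  # → True
```

One multicast symbol per sender thus serves $|\mathcal{G}|-1$ receivers at once, which is the source of the cooperation gain.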

Parallel Delivery among Groups
The parallel user delivery consists of $(K-1)$ rounds, indexed by $s = 2, \ldots, K$. In each round $s$, mini-files are recovered through D2D communication.
The key idea is to partition the $K$ users into groups for each communication round $s \in \{2, \ldots, K\}$ and let each group perform the D2D coded caching scheme [35] to exchange information. If $(K \bmod s) \neq 0$, there will be $\lfloor K/s \rfloor$ groups of the same size $s$ and an abnormal group of size $(K \bmod s)$, leading to an asymmetric caching setup. We optimally allocate the communication loads between the two types of groups, as well as between the broadcast network and the D2D network.
Based on $K$, $s$ and $\alpha_{\max}$, the delivery strategy in the D2D network is divided into three cases:
• Case 1: $\lfloor K/s \rfloor > \alpha_{\max}$. In this case, only $\alpha_{\max}$ users are allowed to send data simultaneously. Select $s \cdot \alpha_{\max}$ users from all $K$ users and divide them into $\alpha_{\max}$ groups of equal size $s$; the total number of such partitions is denoted by $\beta_1$. In each partition, $\alpha_{\max}$ users, one selected from each of the $\alpha_{\max}$ groups, send data in parallel via the D2D network.
• Case 2: $\lfloor K/s \rfloor \le \alpha_{\max}$ and $(K \bmod s) < 2$. In this case, choose $(\lfloor K/s \rfloor - 1)s$ users from all users and partition them into $(\lfloor K/s \rfloor - 1)$ groups of equal size $s$; the total number of such partitions is denoted by $\beta_2$. In each partition, $(\lfloor K/s \rfloor - 1)$ users, one selected from each of these groups, together with an extra user selected from the abnormal group of size $K - s(\lfloor K/s \rfloor - 1)$, send data in parallel via the D2D network.
• Case 3: $\lfloor K/s \rfloor \le \alpha_{\max}$ and $(K \bmod s) \ge 2$. In this case, every $s$ users form a group, resulting in $\lfloor K/s \rfloor$ groups containing $s\lfloor K/s \rfloor$ users in total, while the remaining $(K \bmod s)$ users form an abnormal group; the total number of such partitions is denoted by $\beta_3$. In each partition, $\lfloor K/s \rfloor$ users, one selected from each group of size $s$, together with an extra user selected from the abnormal group of size $(K \bmod s)$, send data in parallel via the D2D network.
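The three-case rule can be summarized in a short sketch; the values of $\alpha_D$ (the number of simultaneous senders) are inferred from the group counts described above:

```python
def parallel_senders(K: int, s: int, alpha_max: int) -> tuple[int, int]:
    """Return (case, alpha_D): which delivery case applies in round s and
    how many users transmit in parallel, following the three-case rule."""
    if K // s > alpha_max:
        return 1, alpha_max       # Case 1: cap at alpha_max senders
    if K % s < 2:
        return 2, K // s          # Case 2: floor(K/s)-1 regular senders + 1 extra
    return 3, K // s + 1          # Case 3: floor(K/s) regular senders + 1 extra

print(parallel_senders(K=20, s=2, alpha_max=4))  # floor(20/2)=10 > 4    → (1, 4)
print(parallel_senders(K=20, s=4, alpha_max=6))  # floor=5 <= 6, 20%4=0  → (2, 5)
print(parallel_senders(K=20, s=7, alpha_max=6))  # floor=2 <= 6, 20%7=6  → (3, 3)
```

The cap in Case 1 reflects the system constraint that at most $\alpha_{\max}$ users may transmit simultaneously.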
Thus, the exact number $\alpha_D$ of users who send signals in parallel is $\alpha_D = \alpha_{\max}$ in Case 1, $\alpha_D = \lfloor K/s \rfloor$ in Case 2, and $\alpha_D = \lfloor K/s \rfloor + 1$ in Case 3. Note that each group $\mathcal{G}$ re-appears in multiple partitions. The transmission delay of Case 1 in round $s$ is given in (49), which follows by (44) and (48). Case 2 ($\lfloor K/s \rfloor \le \alpha_{\max}$ and $(K \bmod s) < 2$): we apply the same delivery procedure as in Case 1, except that $\beta_1$ is replaced by $\beta_2$ and $\alpha_D = \lfloor K/s \rfloor$; the resulting transmission delay in round $s$ is given in (50). Case 3 ($\lfloor K/s \rfloor \le \alpha_{\max}$ and $(K \bmod s) \ge 2$): consider a partition $r \in [\beta_3]$ with regular groups $\mathcal{G}^r_i$, $i \in [\alpha_D - 1]$, and abnormal group $\mathcal{G}^r_{\alpha_D}$. Following encoding operations similar to (43), the regular groups and the abnormal group send their respective XOR symbols. For each $s \in \{2, \ldots, K\}$, the transmission delays for the two types of groups can be written accordingly. Since all groups send signals in parallel, equating the two delays eliminates the parameter $\lambda_2$ and yields the balanced transmission delay at the users for Case 3, given in (53).

Remark 6. The condition $\lfloor K/s \rfloor > \alpha_{\max}$ in Case 1 is equivalent to $s \le \frac{K}{\alpha_{\max}+1}$; in this regime, the transmission delay is given in (49). If $s > \frac{K}{\alpha_{\max}+1}$ and $(K \bmod s) < 2$, the scheme for Case 2 applies and the transmission delay is given in (50); if $s > \frac{K}{\alpha_{\max}+1}$ and $(K \bmod s) \ge 2$, the scheme for Case 3 applies and the transmission delay is given in (53).
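The elimination of $\lambda_2$ can be illustrated with a simplified linear model, where `t_regular` and `t_abnormal` are hypothetical per-group completion times standing in for the exact expressions (51) and (52):

```python
def balanced_delay(t_regular: float, t_abnormal: float) -> tuple[float, float]:
    """Split one unit of load between the regular groups and the abnormal
    group. Giving fraction lam2 to the regular groups yields delays
    lam2 * t_regular and (1 - lam2) * t_abnormal; since both types of
    groups transmit in parallel, equating the two delays eliminates lam2:
        lam2 = t_abnormal / (t_regular + t_abnormal)
    and the balanced delay is their parallel combination."""
    lam2 = t_abnormal / (t_regular + t_abnormal)
    delay = lam2 * t_regular  # == (1 - lam2) * t_abnormal
    return lam2, delay

lam2, delay = balanced_delay(3.0, 6.0)
print(lam2, delay)  # lam2 = 6/9 ≈ 0.667, balanced delay = 2.0
```

The balanced delay is always below both individual delays, which is exactly the benefit of letting the two group types work in parallel.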
In each round $s \in \{2, \ldots, K\}$, all requested mini-files can be recovered by the delivery strategies above. By Remark 6, the total transmission delay in the D2D network can be written in terms of $R^{u}$ defined in (20); see (55).

Conclusions
In this paper, we considered cache-aided communication over a broadcast network combined with a D2D network. Two novel coded caching schemes were proposed for the centralized and decentralized data placement settings, respectively. Both schemes achieve a parallel gain and a cooperation gain by efficiently exploiting communication opportunities in the broadcast and D2D networks and by optimally allocating communication loads between the server and users. Furthermore, we showed that in the centralized case, letting too many users send information in parallel can be harmful. Information-theoretic converse bounds were established, with which we proved that the centralized scheme achieves the optimal transmission delay within a constant multiplicative gap in all regimes, and that the decentralized scheme is also order-optimal when the cache size of each user exceeds a small threshold that tends to zero as the number of users tends to infinity. Our work indicates that combining the cache-aided broadcast network with the cache-aided D2D network can greatly reduce transmission latency.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of the Converse
Let $T_1^*$ and $T_2^*$ denote the optimal rates sent by the server and by each user, respectively. We first consider an enhanced system in which every user is served by an exclusive server and an exclusive cooperating user, both storing all files in the database; from this, the first lower bound readily follows. Another lower bound follows ideas similar to [1]. However, due to the flexibility of the D2D network, the connection and partitioning status among users can change during the delivery phase, which prohibits a direct application of the proof in [1] to the hybrid network considered in this paper. Moreover, the parallel transmission by the server and many users creates abundant distinct signals in the network, making the scenario more sophisticated.
Consider the cut separating $X_{1,0}, \ldots, X_{\lfloor N/s \rfloor,0}$, $X_{1:\alpha_{\max}}$, and $Z_1, \ldots, Z_s$ from the corresponding $s$ users. By the cut-set bound and (A2), we obtain the second lower bound. Since $T^* \ge \max\{T_1^*, T_2^*\}$ by definition, the claimed converse follows.

Appendix D.1.2. Case $\alpha_{\max} = K/2$ and $p \le p_{\mathrm{th}}$

From the definition of $T^{\mathrm{decentral}}$ in (17), and since Lemma A3 bounds the remaining terms, we only need to focus on the upper bound of $R^{\emptyset}/T^*$. According to Theorem 1, $T^*$ admits the following two lower bounds: $T^* \ge \frac{1-p}{2}$ and $T^* \ge s - \frac{KM}{N/(2s)}$.
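For intuition, the generic form of such a cut-set computation (cf. [1]) can be sketched as follows; the constants here are illustrative only, and the precise bound is the one stated in Theorem 1:

```latex
% Cut-set sketch (illustrative): over \lfloor N/s \rfloor successive demand
% vectors, the s users behind the cut must recover s\lfloor N/s \rfloor
% distinct files from the server signals, the \alpha_{\max} parallel user
% signals, and their s caches of size M each:
\left\lfloor \frac{N}{s} \right\rfloor \left( T_1^* + \alpha_{\max} T_2^* \right) + sM
  \;\ge\; s \left\lfloor \frac{N}{s} \right\rfloor
\quad\Longrightarrow\quad
T_1^* + \alpha_{\max} T_2^* \;\ge\; s - \frac{sM}{\lfloor N/s \rfloor}.
```

Optimizing such bounds over the cut size $s$ yields the piecewise lower bounds used throughout the appendix.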