Content Delivery in Fog-Aided Small-Cell Systems with Offline and Online Caching: An Information-Theoretic Analysis

The storage of frequently requested multimedia content at small-cell base stations (BSs) can reduce the load of macro-BSs without relying on high-speed backhaul links. In this work, the optimal operation of a system consisting of a cache-aided small-cell BS and a macro-BS is investigated for both offline and online caching settings. In particular, a binary fading one-sided interference channel is considered in which the small-cell BS, whose transmission is interfered by the macro-BS, has a limited-capacity cache. The delivery time per bit (DTB) is adopted as a measure of the coding latency, that is, the duration of the transmission block, required for reliable delivery. For offline caching, assuming a static set of popular contents, the minimum achievable DTB is characterized through information-theoretic achievability and converse arguments as a function of the cache capacity and of the capacity of the backhaul link connecting cloud and small-cell BS. For online caching, under a time-varying set of popular contents, the long-term (average) DTB is evaluated for both proactive and reactive caching policies. Furthermore, a converse argument is developed to characterize the minimum achievable long-term DTB for online caching in terms of the minimum achievable DTB for offline caching. The performance of online and offline caching is finally compared using numerical results.


Introduction
Edge or femto-caching relies on the storage of popular multimedia content at small-cell base stations (BSs) of a cellular system. This approach has been widely studied in recent years as a means to deliver video files with reduced latency and limited overhead on the backhaul connections to the "cloud" [1,2]. Caching at the edge can be seen as an instance of fog networking, whereby storage, computing and communication capabilities are moved closer to the end users [2]. Edge caching was initially studied for wireless channel models in which small-cell BSs and macro-BSs cannot coordinate their transmissions and hence cannot cooperatively manage their mutual interference (see [1,2] and references therein). In contrast, recent work in [3,4] addresses the possibility of interference management among edge nodes, such as small-cell and macro-BSs, based on the respective cached contents.
The papers [3,4] proposed caching and transmission schemes that enable coordination and cooperation among the BSs based on the cached contents for a system with three BSs and three users.
The performance of these schemes was evaluated in terms of the information-theoretic high signal-to-noise ratio (SNR) metric of the degrees of freedom (DoF), or, more precisely, of its inverse, as a function of the cache capacity of the BSs. More recent research in [5] provided an operational meaning for the inverse of the DoF metric used in [3,4] in terms of delivery latency, and derived a lower bound on the resulting metric, known as the Normalized Delivery Time (NDT), for a general system with any number of BSs and users. The delivery coding latency, henceforth delivery latency, measures the duration of the transmission block. A scenario in which both BSs and users have cache storage is considered in [6,7] under one-shot linear transmission, and in [8] under several transmission schemes for both centralized and decentralized caching strategies. It is proved that the caches at the BSs and at the users make the same quantitative contribution to the achievable sum-DoF. Naderializadeh et al. [9] proposed a universal scheme for content placement and delivery which is independent of the underlying communication network and is order-optimal in the high-SNR regime. In [10], upper and lower bounds on the NDT of cache-aided MIMO interference channels are provided.
In [11,12], the analysis in [3-5] was generalized to study a system in which a cloud server is connected to the BSs via finite-capacity backhaul links and can compensate for partial caching of the library of files at the BSs. This system was referred to as a Fog-Radio Access Network (F-RAN). The minimum NDT latency metric was characterized within a multiplicative factor of 2 in [12] as a function of the cache and backhaul capacities by developing achievability and converse arguments. Other works on NDT characterization include [13-16]. In [13], a scenario with a multicast fronthaul is studied. In [14], decentralized content placement and file delivery are considered for an F-RAN system with caching at both BSs and users. Reference [15] studies the achievable NDT region to account for heterogeneous requirements on the delivery of different files. In [16], the NDT performance of F-RAN systems is considered within a set-up characterized by a time-varying set of popular files. Reference [17] characterized the delivery time per bit of a cache-aided small-cell system by considering binary fading interference channels. Kakar et al. [18] considered the set-up in [17] under a linear deterministic channel model to provide upper and lower bounds on the NDT. The optimization of linear precoding and other signal processing aspects of F-RAN systems is considered in [19-23].
In this work, we consider the F-RAN model in Figure 1, which includes a small-cell BS and a macro-BS, represented by Encoder 1 and Encoder 2, respectively. The small-cell BS (Encoder 1) is equipped with a cache of finite capacity and can serve a small-cell mobile user, represented by Decoder 1. The macro-BS (Encoder 2) can serve a macro-cell user, namely Decoder 2, as well as, possibly, Decoder 1. The transmission from the macro-BS (Encoder 2) to Decoder 2 interferes with the reception at Decoder 1. It is assumed that the small-cell BS transmits with sufficiently small power so as not to create interference at Decoder 2, so that the system is modeled as a partially connected wireless channel. We investigate both the offline and the online caching scenarios.
Figure 1. Cloud and edge-aided data delivery over binary fading interference channels.
The main contributions of this article are as follows:
• An information-theoretic formulation for the analysis of the system in Figure 1 is presented that centers on the characterization of the minimum delivery coding latency, measured in terms of the Delivery Time per Bit (DTB), for both offline and online caching. The system model is based on a one-sided interference channel.
• Assuming a fixed set of popular contents, the minimum DTB for the system in Figure 1 is obtained in the offline setting as a function of the cache capacity at Encoder 1 and of the capacity of the backhaul link that connects the cloud to Encoder 1.
• Online caching and delivery schemes based on both reactive and proactive caching principles (see, e.g., [2]) are proposed in the presence of a time-varying set of popular files, and bounds on the corresponding achievable long-term DTBs are derived.
• A lower bound on the achievable long-term DTB is obtained as a function of the time-variability of the set of popular files. The lower bound is then used to compare the achievable DTBs under offline and online caching.
• Numerical results are provided in which the DTB performance of reactive and proactive online caching schemes is compared with offline caching. In addition, different eviction mechanisms, such as random eviction, Least Recently Used (LRU) and First In First Out (FIFO) (see, e.g., [24]), are evaluated.
The rest of the paper is organized as follows. In Section 2, we present the system model for offline caching, including the definition of the key performance metric of DTB. The minimum DTB for offline caching is then derived and discussed in Section 3. The online caching scenario for the system in Figure 1 is studied in Section 4 in terms of the long-term DTB metric. The comparison between online and offline caching is explored in Section 5. Numerical results are provided in Section 6 and, finally, Section 7 concludes the paper. This work was presented in part in [17].

System Model for Offline Caching
In this section, we study the fog-aided system depicted in Figure 1. We consider a static library of N files denoted by L = {W_1, ..., W_N}. Each file is independent and uniformly distributed, so that we have H(W_i) = F for all i ∈ {1, ..., N}, where F is the file size in bits. Encoder 1, which models a small-cell BS, has a local cache and is able to store µNF bits. The parameter µ, with 0 ≤ µ ≤ 1, is hence the fractional cache size and represents the portion of the library that can be stored in the cache. Encoder 2, which models a macro-BS, can access the entire library L thanks to its direct connection to the cloud. Encoder 1 is also connected to the cloud, but only through a rate-limited link of capacity C bits per channel use. We will first consider the scenario of edge-aided offline caching in which C = 0, i.e., Encoder 1 does not have access to the cloud, and then extend the analysis to cloud and edge-aided offline caching, i.e., C ≥ 0.
It is assumed that encoders and decoders are connected by a binary fading interference channel, previously studied in [17,25,26]. This model represents a special case of the deterministic linear model of [27], generalized to account for random fading (see [28]). As illustrated in Figure 1, the signals received at Decoder 1 and Decoder 2 at time t can be written as

Y_1(t) = G_1(t)X_1(t) ⊕ G_0(t)X_2(t) and Y_2(t) = G_2(t)X_2(t), (1)

where X_1(t) and X_2(t) are the binary symbols transmitted by Encoder 1 and Encoder 2; G_1(t) is the binary fading gain of the link from Encoder 1 to Decoder 1, which equals 1 with probability 1 − ε_1; and G_0(t) and G_2(t) are the binary fading gains of the links from Encoder 2 to Decoder 1 and to Decoder 2, respectively, each equal to 1 with probability 1 − ε_2. In the next two subsections, we first describe the edge-aided scenario and then generalize it to the cloud and edge-aided system.
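To make the channel model concrete, the following minimal sketch simulates one use of the binary fading one-sided interference channel in (1). The function name and the way the Bernoulli gains are drawn are illustrative assumptions, not part of the paper.

```python
import random

def channel_use(x1, x2, eps1, eps2, rng=random):
    """One use of the binary fading one-sided interference channel (1).

    g1 is the Encoder 1 -> Decoder 1 gain (on with probability 1 - eps1);
    g0 and g2 are the Encoder 2 -> Decoder 1 and Encoder 2 -> Decoder 2
    gains (each on with probability 1 - eps2). Decoder 2 sees no
    interference from Encoder 1 (one-sided interference).
    """
    g0 = int(rng.random() < 1 - eps2)
    g1 = int(rng.random() < 1 - eps1)
    g2 = int(rng.random() < 1 - eps2)
    y1 = (g1 * x1) ^ (g0 * x2)   # interfered observation at Decoder 1
    y2 = g2 * x2                 # interference-free link to Decoder 2
    return (g0, g1, g2), y1, y2

# Example: with eps1 = eps2 = 0 all links are on deterministically.
state, y1, y2 = channel_use(1, 0, eps1=0.0, eps2=0.0)
```

With both erasure probabilities set to zero, the state is always (1, 1, 1), so Decoder 1 observes X_1 ⊕ X_2 and Decoder 2 observes X_2.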

Edge-Aided Offline Caching
The edge-aided small-cell system corresponds to the case with C = 0 in Figure 1. The system operates according to the following two phases.
(1) Placement phase: The placement phase is defined by functions φ_i(·) at Encoder 1, which map each file W_i to its cached content V_i = φ_i(W_i). To satisfy the cache storage constraint, it is required that H(V_i) ≤ µF. The total cache content at Encoder 1 is given by V = (V_1, ..., V_N). Note that, as in [5,11], we concentrate on caching strategies that allow for arbitrary intra-file coding but not for inter-file coding as per (2). Furthermore, the caching policy is kept fixed over multiple transmission intervals and is thus independent of the receivers' requests and of the channel realizations in the transmission intervals.
(2) Delivery phase: The delivery phase is in charge of satisfying the given request vector d in each transmission interval for the current channel realization. We assume the availability of full Channel State Information (CSI) throughout the transmission block for simplicity of exposition, although this is not required by the achievable schemes that will be proven to be optimal (see Remark 1). Note that, in practice, non-causal CSI for the coding block can be justified for multi-carrier transmission schemes, such as OFDM, in which the index t runs over the subcarriers. The delivery phase is defined by the following two functions.
• Encoding: Encoder 1 uses the encoding function (5), which maps the cached content V, the demand vector d and the CSI sequence G^T to the transmitted signal X_1^T = (X_1(1), ..., X_1(T)). Note that T represents the duration of the transmission in channel uses. Encoder 2 uses the encoding function (6), which maps the library L of all files, the demand vector d and the CSI sequence G^T to the transmitted signal X_2^T.
• Decoding: Each decoder j ∈ {1, 2} is defined by the mapping (7), which outputs the detected message Ŵ_{d_j} = η_j(Y_j^T, d, G^T), where Y_j^T = (Y_j(1), ..., Y_j(T)) is the received signal (1) at receiver j.
We refer to a selection of caching, encoding, and decoding functions in (5)-(7) as a policy. The probability of error is evaluated with respect to the worst-case demand vector and decoder as P_e^F = max Pr(Ŵ_{d_j} ≠ W_{d_j}), where the maximum is taken over the demand vectors d and the decoders j ∈ {1, 2}.
The delivery time per bit (DTB) of a code is defined as T/F and is measured in channel symbols per bit. A DTB δ is said to be achievable if there exists a sequence of policies, indexed by the file size F, for which the limits lim_{F→∞} T/F = δ and P_e^F → 0 as F → ∞ hold. The minimum DTB δ*(µ) is the infimum of all achievable DTBs when the fractional cache capacity at Encoder 1 is equal to µ.

Cloud and Edge-Aided Offline Caching
In this section, we generalize the model described above to the case in which there is a link of capacity C ≥ 0 between the Cloud and Encoder 1. The content placement phase is the same as in Section 2.1. In the delivery phase, the Cloud implements an encoding function which maps the library L of all files, the demand vector d and the CSI vector G^T to the signal U^{T_C} = (U(1), ..., U(T_C)) transmitted to Encoder 1. Here, the parameter T_C represents the duration of the transmission from the Cloud to Encoder 1 in terms of number of channel uses of the fading channel from encoders to decoders. We have the inequality H(U^{T_C}) ≤ T_C · C by the capacity limitations on the Cloud-to-Encoder 1 link. Furthermore, Encoder 1 uses an encoding function which maps the cached content V, the received signal U^{T_C}, the demand vector d and the CSI sequence G^T to the transmitted signal X_1^T. Note that, as for the edge-aided case, we assume non-causal CSI at both cloud and edge for simplicity of exposition. As discussed, this is a sensible assumption for multi-carrier modulation schemes. However, as indicated in Remark 2, it will be proven that the optimal strategy requires only causal CSI at the encoders and no CSI at the cloud. As above, T represents the duration of the transmission on the binary fading channel in channel uses.
Decoding and the probability of error are defined as in Section 2.1. A DTB δ is now said to be achievable if there exists a sequence of policies, defined by (2), (6), (7), (10) and (11) and indexed by F, such that the limits lim_{F→∞} (T_C + T)/F = δ and P_e^F → 0 as F → ∞ hold. The minimum DTB δ*(µ, C) is the infimum of all achievable DTBs when the fractional cache size at Encoder 1 is equal to µ and the Cloud-to-Encoder 1 capacity is equal to C.

Minimum DTB under Offline Caching
In this section, we first characterize the minimum DTB for the edge-aided system under offline caching. Then, we derive the minimum DTB for the cloud and edge-aided system.

Edge-Aided System (C = 0)
In this subsection, we derive the minimum DTB δ*(µ) for the system in Figure 1 by assuming C = 0.

Proposition 1. The minimum DTB for the system in Figure 1 with C = 0 is given by (13), where µ_0 and δ_0 are given by (14) and (15), respectively.

Proof. The converse is presented in Appendix A, and the achievable scheme is presented next.
To provide some insight into the result in Proposition 1, consider first the set-up in which Encoder 1 has no caching capabilities, i.e., µ = 0. In this case, Encoder 2 needs to deliver the requested files to both decoders on a binary erasure broadcast channel. Considering the worst case in which two different files are requested by the two decoders, in every channel use in which at least one of the two links from Encoder 2 is on (an event of probability 1 − ε_2²) a bit can be delivered to either Decoder 1 or Decoder 2 by Encoder 2, yielding a minimum DTB of δ*(0) = 2/(1 − ε_2²). In contrast, when the entire library is available at Encoder 1, i.e., µ = 1, two different cases should be distinguished depending on the relative values of ε_1 and ε_2. Roughly speaking, if the channel between Encoder 2 and the decoders is weaker on average than the channel between Encoder 1 and Decoder 1, then the minimum DTB is limited by the transmission delay to Decoder 2, and the minimum DTB is δ*(1) = 1/(1 − ε_2). Instead, when the channel between Encoder 1 and Decoder 1 is weaker on average than the channel between Encoder 2 and both decoders, the resulting minimum DTB depends on both ε_1 and ε_2. In both cases, Encoder 2 serves a fraction (1 − µ_0) of the file requested by Decoder 1, so that Encoder 1 only needs to deliver a fraction µ_0 of that file.
As will be detailed below, a key element of the transmission policies is that, in the channel state in which all three links are active, the presence of the cache at Encoder 1 allows the latter to coordinate its transmission with Encoder 2 and cancel the interference caused by Encoder 2 at Decoder 1. Furthermore, from the discussion above, a fractional cache size µ ≥ µ_0 is sufficient to achieve the same DTB δ_0 as with full caching. Figure 2 shows the value µ_0 as a function of ε_1 for different values of ε_2. It is observed that, for fixed ε_2, the fraction µ_0 decreases with ε_1, showing that an Encoder 1 with a low channel quality cannot benefit from a large cache size. Furthermore, as the channel from Encoder 2 becomes more reliable, i.e., for small ε_2, a larger cache at Encoder 1 enables the latter to coordinate more effectively with Encoder 2, hence improving the DTB.
Remark 1. The achievable schemes proposed above only require the encoders to know the current state of the CSI, i.e., at each time t, only the CSI G(t) is needed. As a result, even if the encoders know only the current CSI, as well as the CSI statistics, the optimal performance is the same as for the case in which the entire sequence G^T is known, as per definitions (5) and (6).

Proof of Achievability
Here, we provide details on the policies that achieve the minimum DTB identified in Proposition 1. We start with the following convexity result.

Lemma 1. The minimum DTB δ*(µ) is a convex function of µ.

The proof leverages the splitting of files into subfiles delivered using different strategies via time sharing.
Proof.Consider two policies that require fractional cache sizes µ 1 and µ 2 and achieve DTBs δ 1 and δ 2 , respectively.Given a fractional cache size µ = αµ 1 + (1 − α)µ 2 for any α ∈ [0, 1], the system can operate by splitting each file into two parts, one of size αF and the other of size (1 − α)F, while satisfying the cache constraints.The first fraction of the files is delivered following the first policy, while the second fraction is delivered using the second policy.Since the delivery time is additive over the two file fractions, the DTB δ = αδ 1 + (1 − α)δ 2 is achieved.
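The file-splitting argument above can be sketched in a few lines. The corner-point values in the example are placeholders (the actual pair (µ_0, δ_0) is given by (14) and (15)); only the linear interpolation logic reflects the proof.

```python
def time_shared_dtb(mu, corner_a, corner_b):
    """DTB achieved by file splitting between two cache policies.

    corner_a = (mu_a, delta_a) and corner_b = (mu_b, delta_b) are
    achievable (fractional cache size, DTB) pairs. An intermediate cache
    size mu is served by delivering an alpha-fraction of each file with
    policy a and the rest with policy b; since delivery times add over
    the two fractions, both the cache usage and the DTB interpolate
    linearly.
    """
    (mu_a, d_a), (mu_b, d_b) = corner_a, corner_b
    alpha = (mu - mu_b) / (mu_a - mu_b)  # fraction handled by policy a
    assert 0.0 <= alpha <= 1.0, "mu must lie between the two corner points"
    return alpha * d_a + (1 - alpha) * d_b

# Hypothetical corner points: full caching (1.0, 1.0) and no caching (0.0, 2.0).
dtb_half = time_shared_dtb(0.5, (1.0, 1.0), (0.0, 2.0))
```

For µ halfway between the corners, the achieved DTB is the midpoint of the two corner DTBs, illustrating the convexity of δ*(µ).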
By the convexity of δ*(µ) proved in Lemma 1, it suffices to prove that the corner points (µ = 0, δ*(0) = 2/(1 − ε_2²)) and (µ_0, δ_0) are achievable. In fact, the minimum DTB δ*(µ) can then be achieved, following the proof of Lemma 1, by file splitting and time sharing between the optimal policies for µ = 0 and µ = µ_0 in the interval 0 ≤ µ ≤ µ_0, and by using the optimal policy for µ = µ_0 in the interval µ_0 ≤ µ ≤ 1 (see Figure 3). In the following, we use the notation (g_0, g_1, g_2) ∈ {0, 1}³ to identify the channel realization G = (G_0, G_1, G_2) = (g_0, g_1, g_2). For instance, (0, 1, 1) represents the channel realization in which Y_1 = X_1 and Y_2 = X_2, and (1, 0, 1) that in which Y_1 = X_2 and Y_2 = X_2. Consider first the corner point (µ = 0, δ*(0)). In this setting, in which Encoder 1 has no caching capabilities, the model reduces to a broadcast erasure channel from Encoder 2 to both decoders. The worst-case demand vector is any one in which the decoders request different files. In fact, if the same file is requested, it can always be treated as two distinct files, achieving the same latency as for a scenario with distinct files. Focusing on this worst-case scenario, we adopt the following delivery policy.
Encoder 1 always transmits X_1 = 0. Encoder 2 transmits 1 bit of information to Decoder 1 in the states (1, 0, 0) and (1, 1, 0), in which the channel from Encoder 2 to Decoder 1 is on while the channel to Decoder 2 is off. It transmits 1 bit of information to Decoder 2 in the states (0, 0, 1) and (0, 1, 1), in which the channel to Decoder 2 is on while the channel to Decoder 1 is off. Instead, in the states (1, 0, 1) and (1, 1, 1), in which both channels to Decoder 1 and Decoder 2 are on, Encoder 2 transmits 1 bit of information to Decoder 1 or to Decoder 2 with equal probability.
Consider now the time T_1 required for Decoder 1 to successfully decode F bits. We can write this random variable as T_1 = Σ_{k=1}^{F} T_{1,k}, where T_{1,k} denotes the number of channel uses required to transmit the kth bit. Given the discussion above, the variables T_{1,k} are independent for k ∈ [F] and have a Geometric distribution with mean 1/(Pr[G_0 = 1, G_2 = 0] + Pr[G_0 = 1, G_2 = 1]/2) = 2/(1 − ε_2²). By the strong law of large numbers, we now have the limit T_1/F → 2/(1 − ε_2²) with probability 1. In a similar manner, the delivery time for any given bit intended for Decoder 2 has a Geometric distribution with mean 2/(1 − ε_2²); and, by the strong law of large numbers, the time T_2 needed to transmit F bits to Decoder 2 satisfies the limit T_2/F → 2/(1 − ε_2²) almost surely. Using these limits, we conclude that there exists a sequence of policies with T/F → 2/(1 − ε_2²) and an arbitrarily small probability of error.
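The law-of-large-numbers argument can be checked by Monte Carlo simulation. The sketch below, an illustration rather than the paper's proof, simulates the µ = 0 delivery policy (serve whichever decoder's link is exclusively on, flip a fair coin when both are on) and compares the empirical DTB with 2/(1 − ε_2²).

```python
import random

def simulate_mu0_policy(F, eps2, seed=0):
    """Empirical DTB T/F of the mu = 0 broadcast policy.

    Encoder 2 broadcasts over two independent on/off links, each erased
    with probability eps2. It serves Decoder 1 when only Decoder 1's
    link is on, Decoder 2 when only Decoder 2's link is on, and picks
    one of the two uniformly when both are on (redirecting once a
    decoder has received all of its F bits).
    """
    rng = random.Random(seed)
    need1 = need2 = F   # remaining bits for each decoder
    T = 0
    while need1 > 0 or need2 > 0:
        T += 1
        g0 = rng.random() < 1 - eps2   # link to Decoder 1
        g2 = rng.random() < 1 - eps2   # link to Decoder 2
        serve1 = g0 and need1 > 0 and (not g2 or need2 == 0
                                       or rng.random() < 0.5)
        if serve1:
            need1 -= 1
        elif g2 and need2 > 0:
            need2 -= 1
    return T / F

eps2 = 0.5
empirical = simulate_mu0_policy(F=100_000, eps2=eps2)
analytical = 2 / (1 - eps2 ** 2)   # = 8/3 for eps2 = 0.5
```

For large F, the empirical value concentrates around the analytical DTB, in line with the strong-law argument above.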
Next, we consider the corner point (µ_0, δ_0) under the condition (1 − ε_1)ε_2 ≥ ε_1(1 − ε_2)². In this case, in which the channel of Encoder 1 is better in the average sense discussed above, our findings show that Encoder 2 should communicate with Decoder 1 only in the channel states in which the channel to Decoder 2 is off. Using these states, Encoder 2 sends (1 − µ_0)F bits to Decoder 1. Encoder 1 caches a fraction µ_0 of each file in the library and delivers µ_0 F bits of the file requested by Decoder 1. For this purpose, coordination between Encoder 1 and Encoder 2 is needed to manage interference in the state (1, 1, 1), in which all links are on.
A detailed description of the transmission strategy is provided below as a function of the channel state G.
In the state G = (1, 1, 1), Encoder 1 transmits X_1 ⊕ X_2, where X_1 is an information bit for Decoder 1. This form of coordination is enabled by the fact that Encoder 1 knows the bit X_2, since it is part of the µ_0 F cached bits of the file requested by Decoder 2. In this way, the interference from Encoder 2 is cancelled at Decoder 1, which receives (X_1 ⊕ X_2) ⊕ X_2 = X_1.
From the discussion above, Encoder 2 transmits 1 bit of information to Decoder 2 in the states (1), (3), (5) and (7). For large F, the normalized transmission delay for delivering the requested file to Decoder 2 is then equal to δ_22. Furthermore, Encoder 2 transmits (1 − µ_0)F bits to Decoder 1 in the states (4) and (6).
The required normalized time for large F is hence δ_21. Finally, Encoder 1 transmits µ_0 F bits to Decoder 1 in the states (2), (3) and (7), with a required normalized time δ_11. It can be shown that δ_11 ≤ δ_21 = δ_22 = δ_0 under the given condition (1 − ε_1)ε_2 ≥ ε_1(1 − ε_2)², and hence the DTB is given by max(δ_11, δ_21, δ_22) = δ_0. Finally, we consider the corner point (µ_0, δ_0) under the complementary condition (1 − ε_1)ε_2 ≤ ε_1(1 − ε_2)², in which Encoder 2 has better channels to the decoders. In this case, as above, Encoder 1 caches a fraction µ_0 of all files. Transmission takes place as described in the previous case, except for state (5), which is modified as follows: (5) G = (1, 0, 1): Encoder 2 transmits 1 bit of information to either Decoder 1 or Decoder 2 with probabilities α and 1 − α, respectively, for an appropriate choice of α.
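The interference-cancellation step used in state (1, 1, 1) can be verified exhaustively with a few lines; the helper name is illustrative.

```python
def decoder1_output_state_111(x1_info, x2_info):
    """Interference cancellation in state G = (1, 1, 1).

    Encoder 1 transmits x1 XOR x2 (it knows x2 from its cached copy of
    the file requested by Decoder 2, which Encoder 2 is sending in this
    state); the channel then XORs Encoder 2's bit x2 on top, so
    Decoder 1 recovers x1 exactly.
    """
    x1_tx = x1_info ^ x2_info   # coordinated transmission by Encoder 1
    y1 = x1_tx ^ x2_info        # Y1 = X1 (+) X2 with all gains equal to 1
    return y1
```

Exhausting the four bit combinations confirms that Decoder 1 always receives the intended information bit.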

Cloud and Edge-Aided System (C ≥ 0)
In the following proposition, we derive the minimum DTB δ*(µ, C) for the system in Figure 1 with C ≥ 0.

Proposition 2. The minimum DTB for the cache and cloud-aided system in Figure 1 is given by (23) if C ≤ 1 − ε_2², and by (24) otherwise, where δ*(µ), µ_0 and δ_0 are defined in (13), (14) and (15), respectively.
Proof. See below and Appendix B.
Figure 4 shows the minimum DTB as a function of µ and C. To elaborate on the results in Proposition 2, we focus first on the setting in which Encoder 1 has no caching capability, i.e., µ = 0. In this case, unlike in the scenario studied in the previous section, Encoder 1 can deliver part of the file requested by Decoder 1 thanks to its connection to the Cloud. Nevertheless, if C ≤ 1 − ε_2², that is, if the average delay for the transmission of 1 bit from the Cloud to Encoder 1, namely 1/C, is larger than the corresponding delay between Encoder 2 and both decoders, namely 1/(1 − ε_2²), then it is optimal to neglect Encoder 1 and operate as discussed in Section 3.1. Instead, if C ≥ 1 − ε_2², it is optimal for Encoder 1 to transmit parts of the requested files, or functions thereof, which are received from the Cloud. In fact, as discussed below, it is necessary for the Cloud to transmit a coded signal obtained from both of the files requested by the users in order to obtain the DTB in Proposition 2. Moreover, if the fractional cache size satisfies the inequality µ ≥ µ_0, then the cache at Encoder 1 is sufficient to achieve the DTB δ_0 corresponding to full caching, and the Cloud-to-Encoder 1 link can be neglected with no loss of optimality.

Proof of Achievability
In this section, we detail the policies that achieve the minimum DTB described in Proposition 2. We start by noting that, for C ≤ 1 − ε_2², the achievability of the DTB follows from Proposition 1, and hence we can concentrate on the case C ≥ 1 − ε_2². We first note that the minimum DTB δ*(µ, C) is a convex function of µ for any value of C (Lemma 2). The proof follows as in Lemma 1 by file splitting and time sharing and is hence omitted. By the convexity of δ*(µ, C) in Lemma 2, and by the achievability of the DTB in Proposition 1 with C = 0, and hence also for C ≥ 0, it suffices to prove that the corner point (µ = 0, δ*(0, C)) is achievable. To this end, we consider the worst case in which each decoder requests a different file, and we adopt the following policy.
The Cloud-to-Encoder 1 link is used for a normalized time proportional to ρ/C in order to transmit ρF bits of the file requested by Decoder 1, for an appropriate choice of the fraction ρ. Of these bits, ρF(1 − ε_1)ε_2/((1 − ε_1)ε_2 + (1 − ε_1)(1 − ε_2)²) bits are sent to Encoder 1 by the Cloud in uncoded form. The remaining ρF(1 − ε_1)(1 − ε_2)²/((1 − ε_1)ε_2 + (1 − ε_1)(1 − ε_2)²) bits are instead transmitted after XORing each bit with the corresponding bit of the file requested by Decoder 2. The mentioned ρF bits are sent to Decoder 1 by Encoder 1, while the remaining (1 − ρ)F bits are sent by Encoder 2 to Decoder 1, as discussed next.
The transmission strategy follows the approach described in Section 3.1. As in (20), the transmission of uncoded bits from Encoder 1 to Decoder 1 requires the normalized time in (25), while the transmission of coded bits requires the time in (26). Similarly to (19) and (22), the time required for Encoder 2 to transmit to Decoder 1 is given by (28), while the normalized time δ_22 = δ_0 is sufficient to communicate with Decoder 2. Under the channel condition (1 − ε_1)ε_2 > ε_1(1 − ε_2)², combining (25), (26) and (28) with the normalized delay ρ/C on the cloud link yields a DTB equal to δ*(0, C) in (24).
Remark 2. In a manner similar to the edge-aided case, the optimal scheme described above requires only causal CSI at the encoders and, furthermore, requires no CSI at the Cloud (but only knowledge of the channel statistics). This shows that the assumption of non-causal CSI is not needed to obtain optimal performance.

Online Caching
Section 2 focused on an offline caching scenario in which there is a fixed set L of popular contents and the operation of the system is divided between a placement phase and a delivery phase. In this section, instead, we consider an online caching set-up in which the set of popular files varies from one time slot to the next. As a result, both content delivery and cache update should generally be performed in every time slot, where the latter is needed to ensure the timeliness of the cached content.

System Model
Let L_t be the set of N popular files at time slot t. As in [29], we assume that, with probability 1 − p, the popular set is unchanged, i.e., L_t = L_{t−1}; while, with probability p, the set L_t is constructed by randomly and uniformly selecting one of the files in the set L_{t−1} and replacing it with a new popular file. At each time slot t, the users request files d_t, which are drawn uniformly at random from the set L_t without replacement. We consider two cases, namely: (i) known popular set: the Cloud is informed about the set L_t at time t, e.g., by leveraging data analytics tools; (ii) unknown popular set: the set L_t may only be inferred at the Cloud via the observation of the users' requests. We note that the latter assumption is typically made in the networking literature [24]. Define as T_{C,t} the duration of the transmission from the Cloud to Encoder 1, and as T_t the duration of the transmission from both encoders to the decoders at time slot t. As in the previous section, durations are measured in terms of number of channel uses of the binary fading channel. Since the set of popular files L_t is time-varying, both cache update and file delivery are generally performed at each time slot t. To this end, at time slot t, the Cloud encodes via a function which maps the library L_t of all files, the demand vector d_t and the CSI vector G^{T_t} to the signal U^{T_{C,t}} = (U(1), ..., U(T_{C,t})) = ψ_C(L_t, d_t, G^{T_t}) to be delivered to Encoder 1. We have the inequality H(U^{T_{C,t}}) ≤ T_{C,t} C according to the capacity constraints on the Cloud-to-Encoder 1 link. Moreover, Encoder 1 uses an encoding function which maps the cached content V_t, the received signal U^{T_{C,t}}, the demand vector d_t and the CSI sequence G^{T_t} to the transmitted signal X_1^{T_t}. The probability of error is defined as P_{e,t}^F = max_j Pr(Ŵ_{d_{j,t}} ≠ W_{d_{j,t}}), (31) where d_{j,t} is the index of the file requested by the jth user at time slot t. The probability of error in (31) is evaluated with respect to the distribution of the popular set L_t and of the request vector d_t. A sequence of policies indexed by t is said to be feasible if P_{e,t}^F → 0 as F → ∞ for all t. In a manner similar to the offline case, we define the DTB at time slot t as δ_t = lim_{F→∞} E[T_{C,t} + T_t]/F, where the average is taken over the distribution of the popular set L_t and of the request vector d_t. To measure the performance of online caching, we define the long-term DTB as the lim sup of the time-averaged per-slot DTBs δ_t, as per (33). We denote the minimum long-term DTB over all feasible policies under the known popular set assumption as δ*_on,k(µ, C), while δ*_on,u(µ, C) denotes the minimum long-term DTB under the unknown popular set assumption. By definition, we have the inequality δ*_on,k(µ, C) ≤ δ*_on,u(µ, C). Furthermore, both DTBs δ*_on,k(µ, C) and δ*_on,u(µ, C) are no smaller than the offline DTB δ*(µ, C), given that in the offline set-up caching takes place in a separate phase with no overhead on the Cloud-to-Encoder 1 link. In the rest of this section, we evaluate the performance of two proposed online caching schemes, and we also provide a lower bound on the minimum long-term DTB. The treatment is inspired by the prior work [16], which focuses on the F-RAN model studied in [11].
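The popular-set dynamics described above can be sketched as follows; representing files by plain integer identifiers is an illustrative assumption.

```python
import random

def evolve_popular_set(popular, next_file_id, p, rng):
    """One step of the popular-set dynamics of [29]: with probability
    1 - p the set is unchanged; with probability p a uniformly chosen
    file is replaced by a brand-new one."""
    popular = list(popular)
    if rng.random() < p:
        popular[rng.randrange(len(popular))] = next_file_id
        next_file_id += 1
    return popular, next_file_id

rng = random.Random(1)
popular, fresh = list(range(10)), 10   # N = 10 initial files W_0..W_9
changes = 0
for _ in range(10_000):
    new_popular, fresh = evolve_popular_set(popular, fresh, p=0.3, rng=rng)
    changes += new_popular != popular
    popular = new_popular
# The empirical change frequency concentrates around p = 0.3.
```

The set size N stays constant across slots; only the identity of one file changes at a time, which is exactly the Markov model assumed in the analysis.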

Proactive Online Caching
If the popular set L_t is known, the Cloud can proactively cache any new content at the small-cell BS by replacing the outdated file. Specifically, we propose to transfer a µ-fraction of the new popular file from the Cloud to Encoder 1 in order to update the cache content at the small-cell BS. Since, after this update, the cache configuration with respect to the current set L_t of popular files is the same as in the offline case with respect to L, delivery can then be performed by following the offline delivery policy detailed in Section 3.2. The following proposition presents the resulting achievable long-term DTB of proactive online caching.

Proposition 3. The proposed proactive online caching scheme for the cache and cloud-aided system in Figure 1 achieves the long-term DTB δ̄_on,pro(µ, C) = δ*(µ, C) + pµ/C, (34) where δ*(µ, C) is given by (23) and (24). We hence have the upper bound δ*_on,k(µ, C) ≤ δ̄_on,pro(µ, C).
Proof. With probability p, there is a new file in the popular set L_t, and hence a µ-fraction of the new content is sent on the Cloud-to-Encoder 1 link, resulting in a latency of T_{C,t} = µF/C. The achievable scheme in Section 3.2 is then used to deliver both requested files. As a result, the DTB at time slot t is δ_t = δ*(µ, C) + pµ/C. Using (33), the long-term DTB (34) follows.
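The additive structure of the proactive long-term DTB (34) is easy to evaluate numerically. In the sketch below, the offline DTB δ*(µ, C) is passed in as a number rather than computed from (23) and (24), which is an assumption made for brevity.

```python
def proactive_long_term_dtb(offline_dtb, mu, C, p):
    """Long-term DTB of proactive online caching, as in (34).

    With probability p a new file enters the popular set, and a
    mu-fraction of it (mu*F bits) must cross the cloud link of capacity
    C, adding an average overhead of p*mu/C channel uses per bit on top
    of the offline DTB delta*(mu, C).
    """
    assert C > 0, "the cloud link capacity must be positive"
    return offline_dtb + p * mu / C

# Example with a hypothetical offline DTB of 2.0 channel uses per bit.
dtb = proactive_long_term_dtb(offline_dtb=2.0, mu=0.5, C=1.0, p=0.2)
```

As p → 0 (a static popular set) the overhead vanishes and the offline DTB is recovered, while a larger cloud capacity C shrinks the update overhead as 1/C.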

Reactive Online Caching
When the popular set is highly time-varying, the proactive scheme sends a large number of new contents on the Cloud-to-Encoder 1 link in order to update the cache content at the small-cell BS. However, only a subset of these files will generally be requested before becoming outdated. To mitigate this problem, the Cloud can update the small-cell BS's cache by means of a reactive scheme. Accordingly, the Cloud updates the cache only if the files requested by Decoder 1 and/or Decoder 2 are not (partially) cached at the small-cell BS.
The reactive strategy, unlike the proactive one, can operate under the unknown popular set assumption.It is also possible to define a reactive strategy that leverages knowledge of the set of popular files to outperform proactive caching.This will be discussed in our future work.
To elaborate, in a manner similar to [29], in each time slot t, the small-cell BS stores a (µ/α)-fraction of N′ = αN files, for some α > 1. Note that the set of N′ > N files in the cache of the small-cell BS generally contains files that are no longer in the set L_t of N popular files. Caching N′ > N files is instrumental in keeping the intersection between the set of cached files and L_t from vanishing [29]. To update the cache content, a (µ/α)-fraction of each requested and uncached file is sent on the Cloud-to-Encoder 1 link and is cached at the small-cell BS by randomly and uniformly evicting the same number of cached files. The following proposition presents an achievable long-term DTB for the proposed reactive online caching policy.

Proposition 4. The proposed reactive online caching scheme for the cache and cloud-aided system in Figure 1 achieves a long-term DTB δ̄_on,react(µ, C) that is upper bounded as in (35) for any α > 1. This yields the upper bound δ*_on,u(µ, C) ≤ δ̄_on,react(µ, C).
Proof. Denoting by Y_t ∈ {0, 1, 2} the number of requested and uncached files at time slot t, the Cloud sends a (µ/α)-fraction of the Y_t requested and uncached files to the small-cell BS. Hence, the achievable DTB at each time slot t is given by (36). Plugging (36) into the definition (33) of the long-term DTB yields (37). Noting that content placement and random eviction are the same as in [29], the result of ([29], Lemma 3) can be invoked to obtain the upper bound (38) on the lim sup. Plugging (38) into (37) completes the proof.

Lower Bound on the Minimum Long-Term DTB

We now provide a lower bound on the minimum long-term DTB.

Proposition 5 (Lower Bound on the Long-Term DTB of Online Caching). For the cache and cloud-aided system in Figure 1 with N ≥ 2, the long-term DTB is lower bounded as in (39), with δ*(µ, C) given in (23) and (24).
Proof. See Appendix C.
The lower bound (39) will be leveraged in the next section to relate the performance of offline and online caching.

Comparison between Online and Offline Caching
In this section, we compare the performance of the offline caching system studied in Section 3 with that of the online caching system introduced in Section 4. The following proposition shows that the minimum long-term DTB can be upper and lower bounded in terms of the minimum DTB of offline caching.

Proposition 6. For the cache and cloud-aided system in Figure 1 with N ≥ 2, the long-term DTB satisfies upper and lower bounds expressed in terms of the minimum offline DTB δ*(µ, C).

Proof. The upper bound is obtained by comparing the performance (35) of the proposed reactive scheme with the minimum offline DTB in Proposition 2, while the lower bound follows from Proposition 5. Details are provided in Appendix D.
Proposition 6 shows that the long-term DTB with online caching is no larger than twice the minimum offline DTB in the regime of low capacity C. For larger values of C, instead, the minimum online DTB is proportional to the minimum offline DTB, with an additive gap that decreases as 1/C. Informally, these results demonstrate that the additive loss of online caching decreases as 1/C for sufficiently large C, while, for lower values of C, the multiplicative performance gap is bounded. This stands in contrast to [16], in which the performance gap between offline and online caching increases as the inverse of the capacity of the link between the Cloud and the BSs when the latter becomes small. The key distinction is that, here, the macro-BS has direct access to the set of popular files and can directly serve the users, whereas in [16] the Cloud can reach the users only through the finite-capacity links.

Numerical Results
In this section, we evaluate the performance of the proposed online caching schemes numerically. We specifically consider the long-term DTB achievable by the proposed proactive scheme (34) and the proposed reactive scheme (35). For the latter, we evaluate the expectation in (36) via Monte Carlo simulations by averaging over a large number of realizations, namely 10,000, of the random process Y_t. It is assumed that the small-cell cache is empty at the start of the simulation, i.e., at time t = 1.
The impact of the Cloud-to-Encoder 1 capacity C is first considered in Figure 5. As a reference, we also plot the minimum DTB for offline caching in (23) and (24) and the performance with no caching, that is, δ*(0, C) in (24). For reactive caching, we assume random eviction. Parameters are set as µ = 0.5, p = 0.5, ε_1 = ε_2 = 0.5 and N = 10. It is seen that both proactive and reactive caching can significantly improve over the no-caching scheme by updating the content stored at the small-cell BS. However, as the capacity of the Cloud-to-Encoder 1 link decreases, it becomes deleterious in terms of delivery latency to use the link to update the cache content. As a result, if C is small enough, the performance of reactive and proactive caching coincides with that of the no-caching system. When C is large enough, instead, the latency of the cache update is negligible and both proactive and reactive schemes achieve the same DTB, which tends to the minimum offline DTB.

Next, we compare the performance of the reactive and proactive online caching schemes as a function of the probability p of new content. As shown in Figure 6 for µ = 0.5, C = 0.5, ε_1 = ε_2 = 0.5 and N = 10, when p is small, proactive caching outperforms reactive caching, since it uses the Cloud-to-Encoder 1 connection only in the rare event that there is a new popular file. On the other hand, when p is large, as explained in the previous section, the reactive approach yields a smaller latency than the proactive scheme. It is also seen that the LRU eviction strategy, whereby the replaced file is the one that has been least recently requested by any user, and the FIFO eviction strategy, whereby the file that has been in the cache for the longest time is replaced, are both able to improve over randomized eviction.
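The eviction rules compared in Figure 6 can be captured by a small cache structure. The Python class below is a toy sketch (class and method names are ours, not from the paper): random eviction removes a uniformly chosen file, FIFO removes the file that has been cached the longest, and LRU removes the file least recently requested.

```python
import random
from collections import OrderedDict

class EvictingCache:
    """Toy cache holding up to `cap` files under one of the three
    eviction rules compared in the text: "random", "fifo" or "lru"."""
    def __init__(self, cap, policy="random", seed=0):
        self.cap, self.policy = cap, policy
        self.files = OrderedDict()   # insertion order doubles as FIFO order
        self.rng = random.Random(seed)

    def request(self, f):
        """Return True on a cache hit; on a miss, cache f, evicting if full."""
        if f in self.files:
            if self.policy == "lru":
                self.files.move_to_end(f)        # refresh recency on a hit
            return True
        if len(self.files) >= self.cap:
            if self.policy == "random":
                victim = self.rng.choice(list(self.files))
            else:
                victim = next(iter(self.files))  # oldest entry (FIFO/LRU)
            del self.files[victim]
        self.files[f] = None
        return False
```

For instance, with capacity 2 and the request sequence 1, 2, 1, 3, LRU evicts file 2 (least recently requested), whereas FIFO evicts file 1 (cached earliest).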

Conclusions
Motivated by recent advances in cache- and cloud-aided wireless network architectures, we have considered a fog-assisted system for content delivery. The system model includes a macro-BS that coexists with a cache- and cloud-aided small-cell BS, whose user can also be served by the macro-BS. Using the minimum delivery latency as the performance measure, the trade-off between latency and system resources has been studied. A characterization of this optimal trade-off has been derived for a binary fading interference channel in the presence of full CSI when the set of popular contents is fixed. For the alternative online scenario with a time-varying set of popular files, the average DTB over a long time horizon has been shown to be at most two times larger than in the offline scenario when the capacity of the link used to update the cache content is small, and to otherwise have a gap inversely proportional to this capacity.
where γ_F indicates any function that satisfies γ_F → 0 as F → ∞. In the above derivation, (a) follows from the facts that: (i) Y_1^T is a function of (V_1, V_2, G^T, G_2^T X_2^T), since X_1^T can be assumed, without loss of generality, to depend only on V_1 and V_2, and the vector G_0^T X_2^T can be obtained from G_2^T X_2^T and G^T; (ii) Y_2^T is a function of (G^T, G_2^T X_2^T); (b) follows from Fano's inequality; (c) follows from the fact that the messages are independent of the channel realization and from Fano's inequality H(V_2 | G_2^T X_2^T, G^T) ≤ F γ_F; and (d) hinges on the cache constraint (3) and on the following bounds, where G is the set of all channel states and the last inequality follows from the fact that the entropy in all states G = g is maximized for X_2 ∼ Bernoulli(1/2). For F → ∞, (A1) yields the bound on the minimum DTB. Based on the fact that the requested files should be retrievable from the received signals, another bound can be derived as follows, where (a) follows from Fano's inequality; (b) follows from the fact that the channel gains are independent of the files; (c) follows in a manner similar to (A2); and (d) is due to the fact that the entropy terms in

Figure 3.
Figure 2. Optimum fractional cache size µ_0 as a function of ε_1 for different values of ε_2, which ranges from 0 to 1 with step size 0.1.


Figure 5. Achievable long-term DTB versus the capacity C of the Cloud-to-Encoder 1 link for the proactive scheme (34) and reactive caching with random eviction (35). For reference, the DTB with no caching, namely δ*(0, C), and the offline minimum DTB (23) and (24) are also shown (p = 0.5, µ = 0.5, ε_1 = ε_2 = 0.5, N = 10).

G(t) = (G_0(t), G_1(t), G_2(t)) ∈ {0, 1}^3 is the vector of binary channel coefficients at time t, and X_1(t) and X_2(t) are the binary transmitted signals from Encoder 1 and Encoder 2, respectively. In (1), all operations are in the binary field. The channel gains are distributed as G_1(t) ∼ Bernoulli(ε_1) and G_0(t), G_2(t) ∼ Bernoulli(ε_2); they are mutually independent and change independently over time. The parameters ε_1 and ε_2 describe the average quality of the communication links originating at Encoder 1 and Encoder 2, respectively, and are hence in practice related to the transmission powers of Encoder 1 and Encoder 2. We remark that a more general model with different erasure probabilities for the links G_0(t) and G_2(t) could also be considered, but at the expense of more cumbersome notation and analysis, which is not further pursued here. Each user, or decoder, k requests a file W_{d_k} from the library L at every transmission interval for k = 1, 2. The demand vector is defined as d = (d_1, d_2).

Figure 6. Achievable long-term DTB versus the probability p of new content for the proactive scheme (34) and the reactive caching scheme with random, LRU or FIFO eviction (35). For reference, the DTB with no caching, namely δ*(0, C), and the offline minimum DTB (23) and (24) are also shown (C = 0.5, µ = 0.5, ε_1 = ε_2 = 0.5, N = 10).