Cache-Aided General Linear Function Retrieval

Coded caching, proposed by Maddah-Ali and Niesen (MAN), has the potential to reduce network traffic by pre-storing content in the users’ local memories when the network is underutilized and by transmitting coded multicast messages that simultaneously benefit many users during peak-hour times. This paper considers the linear function retrieval version of the original coded caching setting, where users are interested in retrieving a number of linear combinations of the data points stored at the server, as opposed to a single file. This extends the scope of the authors’ past work that only considered the class of linear functions that operate element-wise over the files. On observing that the existing cache-aided scalar linear function retrieval scheme does not work in the proposed setting, this paper designs a novel coded caching scheme that outperforms uncoded caching schemes that either use unicast transmissions or let each user recover all files in the library.


Introduction
Content caching is an efficient technique to handle the increase of requests for massive amounts of data and content over communication networks. By leveraging low-cost memory components at the user sides, caching reduces peak-time traffic by prefetching contents closer to users during off-peak time, thereby reducing the transmission delay or equivalently increasing the bandwidth in communication systems. Traditional caching techniques aim at prefetching popular content by predicting the user demands, thus realizing a "local caching gain" (i.e., that scales with the amount of local memory) [1]. Maddah-Ali and Niesen (MAN) showed that it is possible to actually attain a "global caching gain" (i.e., that scales with the global amount of memory in the network) by using codes [2]. The idea is that, if a single transmission can serve a number of users simultaneously, the network load can be reduced by the same factor thus speeding-up communications significantly.
In the MAN setting, a server has a library of N files and broadcasts to K users through an error-free shared link. Each user has a cache of size at most M files. The MAN scheme consists of two phases: the placement phase, where the server pushes content from the library to the local caches without knowledge of the users' future demands, and the delivery phase, where each user requests one file and the server broadcasts coded packets such that each user can correctly recover its desired file. The objective is to minimize the worst-case load over all possible user demands, that is, the number of files that must be communicated so that any demand can be satisfied. The MAN scheme is optimal under the constraint of uncoded cache placement (i.e., each user directly stores a collection of segments of the library files in its cache) when N ≥ K [3,4]. By removing the redundant transmissions in the MAN scheme when a file is requested multiple times, Yu, Maddah-Ali, and Avestimehr (YMA) derived a scheme that is optimal under the constraint of uncoded cache placement for N < K [5]. In general, the YMA scheme is order optimal to within a factor of 2 [6], that is, coded placement can at best halve the load of the YMA scheme.
Motivated by the fact that linear and multivariate polynomial queries naturally arise in modern engineering problems and deep learning algorithms, such as matrix-vector and matrix-matrix multiplications, the authors posed in [7] the question of what is the optimal worst-case load when the cache-aided users are interested in retrieving a scalar linear function of the files rather than a single file. For the class of functions considered in [7], which are restricted to operate element-wise on the file entries, it was surprisingly shown that the YMA load can be achieved, that is, there is no penalty in terms of load in retrieving scalar linear functions under the constraint of uncoded cache placement. It was noted in [7] that the proposed scalar linear function scheme can be extended to all scenarios to which the original MAN scheme has been extended, such as demand-private retrieval [8] and Device-to-Device networks [9,10]. In addition, the scalar linear function scheme [7] can be used as a building block to provide demand-privacy and content-security against colluding users [11,12].
In this paper, we move to a more general case of cache-aided linear function retrieval than in [7], where users can request general linear combinations of all symbols in the library, and not necessarily restricted to operate element-wise on the file entries. For example, each user aims to compute some statistics of a bunch of data such as local weighted averages (which are general linear functions) of the data; these are very common tasks in many applications depending on the data and on the weights.
Besides the novel and realistic problem formulation, our main contributions are as follows. We first introduce a baseline scheme that either lets each user recover all the symbols in the library or uses unicast transmissions to satisfy each user. The main challenge in implementing a coded caching strategy in this problem is that each symbol in a user's demand is a linear combination of all the symbols in the library. Inspired by the grouping coded caching strategy in [13], which was used to reduce the sub-packetization level (i.e., the smallest file length necessary to realize an achievable scheme), we propose a scheme that treats the demand of each user as a matrix-vector multiplication and uses the grouping strategy to generate multicast messages after possibly performing invertible linear matrix operations. The proposed scheme outperforms the baseline scheme in all parameter regimes.

Paper Organization
The rest of this paper is organized as follows. Section 2 formulates the shared-link cache-aided general linear function retrieval problem. Section 3 provides the main result of this paper. Section 4 provides some numerical evaluations. Section 5 concludes the paper. Some proofs may be found in Appendices.

Notation Convention
Calligraphic symbols denote sets, bold symbols denote vectors and matrices, and sans-serif symbols denote system parameters. We use $|\cdot|$ to represent the cardinality of a set or the length of a vector; $[a:b] := \{a, a+1, \ldots, b\}$ and $[n] := [1:n]$; $\oplus$ represents bit-wise XOR; $[a]^+ := \max\{a, 0\}$; $\mathbb{F}_q$ represents a finite field with order $q$; $\mathbf{A}^T$ and $\mathbf{A}^{-1}$ represent the transpose and the inverse of matrix $\mathbf{A}$, respectively; $\text{rank}_q(\mathbf{A})$ represents the rank of matrix $\mathbf{A}$ over the field $\mathbb{F}_q$; $\mathbf{I}_n$ represents the identity matrix with dimension $n \times n$; $(\mathbf{A})_{m \times n}$ indicates that the dimension of matrix $\mathbf{A}$ is $m \times n$; we let $\binom{x}{y} = 0$ if $x < 0$ or $y < 0$ or $x < y$.

System Model
Different from [7], here we consider the case where the users' desired linear functions are no longer scalar or operating element-wise across the file entries; thus we consider the whole library as a single file.
The (K, F, L, q) shared-link cache-aided general linear function retrieval problem consists of a central server with access to a library of F independent and identically distributed (i.i.d.) symbols over a finite field $\mathbb{F}_q$, denoted by $\mathbf{w} = (w_1, \ldots, w_F)^T \in (\mathbb{F}_q)^F$. We often treat $\mathbf{w}$ as a column vector, which should be clear from the context. The server is connected to K cache-aided users through an error-free shared link. The system has two phases.

• In the placement phase, the server pushes up to M symbols into the local cache of each user, where M ∈ [0 : F], without knowing what the users will demand later. The cached content of user k ∈ [K] is denoted by
$$Z_k = \phi_k(\mathbf{w}), \quad (1)$$
where $\phi_k$ is the placement function for user k defined as
$$\phi_k : (\mathbb{F}_q)^F \to (\mathbb{F}_q)^M. \quad (2)$$
M is referred to as the cache (or memory) size. If each user directly copies M symbols from the library into its cache, the cache placement is said to be uncoded.

• In the delivery phase, each user wants to retrieve L linear combinations of all the symbols in the library, where L ∈ [1 : F]. The demand of user k ∈ [K] is represented by the matrix $\mathbf{D}_k \in (\mathbb{F}_q)^{L \times F}$, meaning that user k aims to retrieve
$$\mathbf{y}_k = \mathbf{D}_k \mathbf{w} \in (\mathbb{F}_q)^L. \quad (3)$$
Let the collection of all demand matrices be $\mathbf{D} := [\mathbf{D}_1; \ldots; \mathbf{D}_K] \in (\mathbb{F}_q)^{KL \times F}$. We assume that the server and all users know $\mathbf{D}$, which is communicated on a separate channel and thus does not impact the downlink load defined next (see also Remark 4). (Notice that, differently from the cache-aided matrix multiplication problem in [14], where the matrix on each side of the desired multiplication is one of the library files, in this paper each user k ∈ [K] desires $\mathbf{D}_k \mathbf{w}$, where $\mathbf{D}_k$ is known by all the users in the delivery phase and $\mathbf{w}$ represents the vector of all symbols in the library.) According to the users' demand matrix $\mathbf{D}$, the server broadcasts the message
$$X = \psi(\mathbf{D}, \mathbf{w}), \quad (4)$$
where $\psi$ is the encoding function
$$\psi : (\mathbb{F}_q)^{KL \times F} \times (\mathbb{F}_q)^F \to (\mathbb{F}_q)^R, \quad (5)$$
for some R ∈ [0 : F]. R is referred to as the load.

Achievability: For the (K, F, L, q) shared-link cache-aided general linear function retrieval problem, we say that the pair (M, R) is achievable if, for any possible demand $\mathbf{D}$, there exist placement functions in (2) and a delivery function in (5) such that each user can recover its desired linear combinations from the demand matrix, its cache content, and the broadcast message, i.e.,
$$H(\mathbf{D}_k \mathbf{w} \mid \mathbf{D}, Z_k, X) = 0, \quad \forall k \in [K]. \quad (6)$$

Optimal memory-load tradeoff: For the (K, F, L, q) shared-link cache-aided general linear function retrieval problem, the objective is to determine the minimum worst-case downlink load (or load for simplicity) defined as
$$R^\star := \min \{ R : (M, R) \text{ is achievable} \}. \quad (7)$$

Optimal memory-load tradeoff in the limit for large file size: Since solving the problem in (7) for any given (K, F, L, q) is challenging, in the following we shall consider the regime where the file size F is as large as desired, and we thus let the system parameters scale with the file length as follows:
$$M = \mu F, \quad \mu \in [0, 1], \quad (8)$$
$$L = \lambda F, \quad \lambda \in [0, 1], \quad (9)$$
$$R = \rho F, \quad \rho \in [0, 1]. \quad (10)$$
For fixed (K, λ) we aim to characterize the minimum worst-case normalized downlink load (or normalized load for simplicity) $\rho^\star$, defined in (11) as the smallest ρ such that (µF, ρF) is achievable for sufficiently large F.

Remark 1 (Relationship to [7]). The cache-aided scalar linear function retrieval problem in [7] is a special case of the formulation here. More precisely, let F = NL (i.e., 1/N = λ), where N indicates the number of files and λF is the file length. The demand of user k ∈ [K] is represented by the vector $\mathbf{y}_k = (y_{k,1}, y_{k,2}, \ldots, y_{k,N}) \in (\mathbb{F}_q)^N$, by which we mean that the user is requesting $\mathbf{D}_k \mathbf{w}$ with
$$\mathbf{D}_k = [\, y_{k,1} \mathbf{I}_L, \; y_{k,2} \mathbf{I}_L, \; \ldots, \; y_{k,N} \mathbf{I}_L \,] \in (\mathbb{F}_q)^{L \times NL}, \quad (12)$$
where $\mathbf{I}_n$ is the identity matrix with dimension n × n. In the restricted setting where the demands are as in (12), the optimal load under the constraint of uncoded cache placement is the lower convex envelope of the points in (13), where for a given value of t in (13) the subpacketization level L must be an integer multiple of $\binom{K}{t}$.
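To make the relationship in Remark 1 concrete, the following minimal sketch builds the structured demand matrix of the scalar case and checks that the general matrix-vector demand reproduces the element-wise combination of the files. The field order `q` (taken to be prime so that integer arithmetic modulo `q` is field arithmetic), the toy sizes, and all function names are assumptions made for illustration only.

```python
# Sketch: general demand y_k = D_k w over a prime field F_q, and the
# element-wise (scalar) special case D_k = [y_1 I_L, ..., y_N I_L].
# q, N, L and all names below are illustrative assumptions.

def mat_vec(D, w, q):
    """Return the product D w with arithmetic modulo the prime q."""
    return [sum(d * x for d, x in zip(row, w)) % q for row in D]

def scalar_demand_matrix(y, L):
    """Build [y_1 I_L, ..., y_N I_L] as a list of L rows of length N*L."""
    N = len(y)
    return [[y[n] if col == row else 0 for n in range(N) for col in range(L)]
            for row in range(L)]

q, N, L = 5, 3, 2
files = [[1, 2], [3, 4], [0, 1]]        # N files of L symbols each
w = [s for f in files for s in f]       # whole library as one vector, F = N*L
y = [2, 1, 3]                           # scalar coefficients of one user

D = scalar_demand_matrix(y, L)
general = mat_vec(D, w, q)              # demand written as D_k w
elementwise = [sum(y[n] * files[n][i] for n in range(N)) % q for i in range(L)]
```

Here `general` and `elementwise` coincide, which is the sense in which the scalar problem is a special case; a general demand matrix simply has no such block structure.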

Remark 2 (A minrank solution).
For the (K, F, L, q) shared-link cache-aided general linear function retrieval problem, the best linear scheme, inspired by [15,16], is as follows. Linear placement: each user caches linear combinations of the library symbols. Linear delivery: the server sends, in the worst case, a number of symbols given by the solution of the minrank problem in (15). Solving the minrank problem in (15) is hard [15,16]; thus in the following we shall design a scheme with lower complexity.

Remark 3 (A baseline scheme).
For the (K, F, L, q) shared-link cache-aided general linear function retrieval problem, the load
$$R_{\text{baseline}} = \min\{ KL, \; F - M \} \quad (16)$$
can be achieved by an uncoded caching strategy as follows.
• In order to achieve the load KL, we transmit one by one the elements of $\mathbf{y}_k$, k ∈ [K], in (3). The main limitation of this unicast transmission scheme is the lack of multicast gain.
• In order to achieve F − M, we let each user recover all the symbols in the library. In the placement phase, each user caches the first M symbols in the library. In the delivery phase, the server transmits all the remaining F − M symbols. The main limitation of this scheme is that, if L < F − M, the users do not need to recover all the symbols in the library in order to retrieve their desired function.
The main contribution of this paper is to find schemes that, despite the lack of structure on the demand matrices in general, achieve a smaller load than (16).
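As a quick sanity check of the baseline in Remark 3, the sketch below evaluates the smaller of the unicast load KL and the full-recovery load F − M, both in symbols and in normalized form. The function names are hypothetical, and the numerical values used in the call are those of Example 1 (K = 6, µ = 47/72, λ = 1/12), for which the paper reports a baseline normalized load of 25/72.

```python
from fractions import Fraction

def baseline_load(K, F, L, M):
    """Smaller of the unicast load K*L and the full-recovery load F - M (in symbols)."""
    return min(K * L, F - M)

def baseline_normalized_load(K, lam, mu):
    """Same quantity normalized by F, with L = lam*F and M = mu*F."""
    return min(K * lam, 1 - mu)

# Parameters of Example 1; F = 72 is an arbitrary size chosen so that L and M are integers.
rho_baseline = baseline_normalized_load(6, Fraction(1, 12), Fraction(47, 72))
symbols = baseline_load(K=6, F=72, L=6, M=47)
```

With these parameters the full-recovery option is the better of the two, since Kλ = 1/2 exceeds 1 − µ = 25/72.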

Remark 4 (Uplink and downlink loads).
Besides the downlink load, the uplink load is also considered in the distributed matrix-matrix multiplication problem [17][18][19]. In this work, the communication cost of uploading the demand matrix to the server is not a focus, i.e., we assume that each user communicates the whole demand matrix to the server and all other users on a separate channel that is not the bottleneck in the system. This assumption can also be justified as follows. Let $\mathcal{D}^{(k)}$ denote the set of possible demand matrices of user k ∈ [K], referred to as the demand range, that is, user k chooses one matrix in $\mathcal{D}^{(k)}$ as its demand. We assume that $\mathcal{D}^{(k)}$ is known by the server and all users. The communication cost to let the server and the other users know the realization of the demand matrix is negligible compared to the number of transmissions from the server if

Main Result
Based on Remark 3, the main challenge is to design a coded caching strategy that (i) lets each user directly recover the desired linear combinations, instead of recovering all the library symbols, and (ii) attains coded caching gain, as opposed to serving the users one-by-one with unicast transmissions. The main contribution of this paper is the following theorem, which is proved in Appendix A.

Theorem 1. For the (K, λ) shared-link cache-aided general linear function retrieval problem, we have:
• if $\mu = \alpha \frac{g-1}{g} + (1-\alpha) \frac{g}{g+1}$, where g ∈ [K − 1] and α ∈ [0, 1], the normalized load $\rho_{\text{ach}}$ in (17) is achievable, with $\rho_1$ and $\rho_2$ given in (18) and (19);
• if $\mu = \alpha \frac{K-1}{K} + (1-\alpha)$, where α ∈ [0, 1], the normalized load in (20) is achievable.

Next, we provide the intuition behind the proposed scheme in Theorem 1, which is based on three ingredients:

1. We start with the achievable scheme for (20) with α = 1. We aim to design the cache placement such that each user caches a fraction $\frac{K-1}{K}$ of the file and the part of the file not cached by this user is known by the remaining K − 1 users. With this cache placement, the delivery consists of a single multicast message with multicasting gain K. Recall that, in Remark 1 with t = K − 1, each user misses a fraction 1/K of each file and that missing fraction is known by the remaining K − 1 users; with t + 1 = K, the delivery consists of a single multicast message with multicasting gain K that is the sum of each user's missing fraction of the demanded file. In our context, this idea becomes the following scheme. Assume K divides F. We use here a Matlab-like notation for submatrices. The library is partitioned into K equal-length subfiles as follows:
$$\mathcal{I}_k := \left[ (k-1)\tfrac{F}{K} + 1 : k \tfrac{F}{K} \right], \quad \mathbf{w}_k := \mathbf{w}(\mathcal{I}_k), \quad k \in [K];$$
user k ∈ [K] caches all subfiles except $\mathbf{w}_k$; the server delivers the multicast message
$$X = \sum_{k \in [K]} \mathbf{D}_{k;:,\mathcal{I}_k} \mathbf{w}_k,$$
where $\mathbf{D}_{k;:,\mathcal{I}_k}$ represents the sub-matrix of $\mathbf{D}_k$ including the columns with indices in $\mathcal{I}_k$. In X, each user k ∈ [K] knows all terms but the requested vector $\mathbf{D}_{k;:,\mathcal{I}_k} \mathbf{w}_k$, such that user k can recover it and then compute $\mathbf{D}_k \mathbf{w}$. Thus an achieved normalized memory-load tradeoff is
$$(\mu, \rho) = \left( \tfrac{K-1}{K}, \; \lambda \right). \quad (25)$$

2. We then introduce the achievable scheme for (17) with α ∈ {0, 1}. Assume now that the K users are partitioned into g groups of $\lceil K/g \rceil$ users each, where g ∈ [K − 1]. Let the users in the same group share the same cache content and recover all the linear combinations demanded by the users in the group. Then the normalized memory-load tradeoff is as in (25) but with K replaced by g and L replaced by $\lceil K/g \rceil L$. Therefore, we get that the following normalized memory-load points are achievable:
$$(\mu, \rho) = \left( \tfrac{g-1}{g}, \; \left\lceil \tfrac{K}{g} \right\rceil \lambda \right), \quad g \in [K]. \quad (26)$$

3. The rest of the proof of Theorem 1 consists of a method to 'interpolate' among the points in (26), as explained in Appendix A. Unlike cache-aided scalar linear function retrieval in [7], the difficulty in the considered problem is that connecting two normalized memory-load points by a line segment is generally impossible. The main reason is that, if we partition $\mathbf{w}$ as $\mathbf{w} = [\mathbf{w}'; \mathbf{w}'']$ and use a different cache placement strategy on each part, each demanded function $\mathbf{D}_k \mathbf{w}$ is in the form
$$\mathbf{D}_k \mathbf{w} = \mathbf{D}'_k \mathbf{w}' + \mathbf{D}''_k \mathbf{w}''; \quad (27)$$
thus it cannot be divided into two separate parts, where the first part only contains the linear combinations of $\mathbf{w}'$ and the second part only contains the linear combinations of $\mathbf{w}''$. An example to highlight this limitation and our approach to overcome it is provided at the end of this section.
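The single-multicast scheme of the first ingredient can be simulated in a few lines. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the field order `q` is taken to be prime, indices are 0-based, the library and demand matrices are drawn at random, and every function name is hypothetical. It checks that each user, knowing all subfiles but its own, recovers its full demand from one message of L symbols.

```python
import random

def mat_vec(D, w, q):
    """Return D w with arithmetic modulo the prime q."""
    return [sum(d * x for d, x in zip(row, w)) % q for row in D]

def columns(D, idx):
    """Sub-matrix of D made of the columns listed in idx."""
    return [[row[j] for j in idx] for row in D]

def simulate(K, F, L, q, seed=0):
    """Run placement and delivery once; return (all users decode?, message length)."""
    assert F % K == 0, "the sketch assumes that K divides F"
    rng = random.Random(seed)
    w = [rng.randrange(q) for _ in range(F)]
    D = [[[rng.randrange(q) for _ in range(F)] for _ in range(L)] for _ in range(K)]
    size = F // K
    I = [list(range(k * size, (k + 1) * size)) for k in range(K)]  # index sets I_k
    sub = [[w[j] for j in I[k]] for k in range(K)]                 # subfiles w_k

    # Multicast message X = sum_k D_k[:, I_k] w_k (L symbols in total).
    X = [0] * L
    for k in range(K):
        part = mat_vec(columns(D[k], I[k]), sub[k], q)
        X = [(a + b) % q for a, b in zip(X, part)]

    ok = True
    for k in range(K):
        cache = {j: sub[j] for j in range(K) if j != k}  # user k misses only w_k
        # Strip the other users' terms, which involve cached subfiles only.
        y = X[:]
        for j in cache:
            part = mat_vec(columns(D[j], I[j]), cache[j], q)
            y = [(a - b) % q for a, b in zip(y, part)]
        # y now equals D_k[:, I_k] w_k; add the locally computable remainder of D_k w.
        for j in cache:
            part = mat_vec(columns(D[k], I[j]), cache[j], q)
            y = [(a + b) % q for a, b in zip(y, part)]
        ok = ok and (y == mat_vec(D[k], w, q))
    return ok, len(X)

decoded, message_length = simulate(K=3, F=6, L=2, q=7)
```

The message length equals L regardless of K, which corresponds to the normalized load λ at memory (K − 1)/K.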
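Remark 5 (Comparison to the baseline scheme). We show here that the proposed scheme in Theorem 1 outperforms the baseline scheme in Remark 3.
• Case $\mu = \alpha \frac{g-1}{g} + (1-\alpha) \frac{g}{g+1}$, where g ∈ [K − 1] and α ∈ [0, 1]. From (17) and (19), it can be seen that
$$\rho_{\text{ach}} \le K \lambda. \quad (28)$$
From (17) and (18), it can be seen that
$$\rho_{\text{ach}} \le \frac{\alpha}{g} + \frac{1-\alpha}{g+1} = 1 - \mu. \quad (29)$$
Hence, from (28) and (29), we can prove $\rho_{\text{ach}} \le \rho_{\text{baseline}}$ in this case.
• Case $\mu = \alpha \frac{K-1}{K} + (1-\alpha)$, where α ∈ [0, 1]. From (20) we can prove $\rho_{\text{ach}} \le \rho_{\text{baseline}}$ in this case.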
Remark 6 (Connection to Remark 1). For the proposed scheme achieving (25), the cache placement is the same as the cache-aided scalar linear function retrieval scheme in Remark 1 with t = K − 1.
Notice that, for the considered cache-aided general linear function retrieval problem where $\mu = \frac{t}{K}$ and t ∈ [K], we could use the cache-aided scalar linear function retrieval scheme in Remark 1 to deliver $\binom{K}{t+1}$ pieces of the requested vectors. The scheme would achieve the normalized memory-load points in (30), which reduce to (25) for t = K − 1. By the grouping argument we would achieve the points in (31). Let us then fix one g ∈ [K] and one t ∈ [g − 2], and analyse the achieved normalized load in (31). This load is no smaller than the baseline load in (16), as established through (32)-(34), where (34) follows since t ∈ [g − 2] and thus $\binom{g}{t+1} \ge g$. This shows that, with the exception of the normalized memory-load points with t = g − 1, the scheme in (31) is inferior to the baseline scheme in (16), and it will thus not be pursued in the rest of the paper.
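We finish this section with an example to illustrate the main ideas of the proposed scheme.

Example 1. We consider a system with K = 6 users, cache fraction $\mu = \frac{47}{72}$, and demand fraction $\lambda = \frac{1}{12}$. It can be seen that
$$\mu = \alpha \frac{g-1}{g} + (1-\alpha) \frac{g}{g+1} \quad \text{with } g = 2 \text{ and } \alpha = \tfrac{1}{12}. \quad (36)$$

Placement Phase. It can be seen that the memory size is between $\mu_1 = \frac{g-1}{g} = \frac{1}{2}$ and $\mu_2 = \frac{g}{g+1} = \frac{2}{3}$. We partition $\mathbf{w}$ into two parts as $\mathbf{w} = [\mathbf{w}^1; \mathbf{w}^2]$, where $\mathbf{w}^1 \in (\mathbb{F}_q)^{F/12}$ and $\mathbf{w}^2 \in (\mathbb{F}_q)^{11F/12}$. Furthermore,
• $\mathbf{w}^1$ is partitioned into two equal-length subfiles, each with $\frac{F}{24}$ symbols, and each user caches one of them;
• $\mathbf{w}^2$ is partitioned into three equal-length subfiles, each with $\frac{11F}{36}$ symbols, and each user caches two of them.
Each user thus caches $\frac{F}{24} + \frac{22F}{36} = \frac{47F}{72}$ symbols.

Delivery Phase. With some permutation on the rows of $\mathbf{w}$, the demand of user 1 can be expressed as in (37), where user 1 can compute from its cache all the terms that involve its cached subfiles, and similarly for the other users. Thus in the delivery phase, the users need to recover the sums $B_1, \ldots, B_6$ in (38)-(43). If we treat each sum in (38)-(43) as a block and use the MAN strategy to deliver these blocks, we would transmit $B_1 + B_2$, $B_3 + B_4$, $B_5 + B_6$ for a total of $\frac{F}{4}$ symbols. Hence, the scheme achieves the same normalized load as the proposed scheme in (26) with $\mu_1 = \frac{1}{2}$; in other words, a portion of the memory of size $\mu - \mu_1 = \frac{47}{72} - \frac{1}{2} = \frac{11}{72}$ would be wasted. We next propose two novel schemes to let each user recover its desired sum in (38)-(43) while leveraging the whole memory.

The solution that achieves $\rho_1$ in (18). Focus on the demanded sum of user 1 in (38). The key idea is to let user 1 recover the two terms of its sum separately: the term that involves $\mathbf{w}^1$ is obtained by delivering the missing subfile of $\mathbf{w}^1$, for a total of $\frac{F}{24}$ symbols, and the term that involves $\mathbf{w}^2$ is delivered for a total of $\frac{F}{6}$ symbols. Hence, in the delivery phase the server transmits $\frac{F}{24} + \frac{F}{6} = \frac{5F}{24}$ symbols, and the normalized load is $\rho_1 = \frac{5}{24}$, which coincides with (18).

The solution that achieves $\rho_2$ in (19). The idea is to partition each user's demand into two parts after having removed its cached content, where the partition is the result of an invertible linear transformation; we then have two steps, one for each of the two parts.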
We first focus on the demand of user 1 in (38), restated in (47). The main strategy here is to take a linear transformation of (47) as in (48), $B'_1 = \mathbf{T}_1 B_1$, whose two parts are denoted by $B'_{1,\{2,6\}}$ and $B'_{1,\{2,3,5,6\}}$. It will be clarified later that the server transmits $B'_{1,\{2,6\}}$ with coded caching gain equal to g = 2 (i.e., the multicast message satisfies two users simultaneously), and $B'_{1,\{2,3,5,6\}}$ with coded caching gain equal to g + 1 = 3.
Following the same line of reasoning, we can express the demands of the other users in the same way. The transmission contains two steps.

• In the first step, we let each user k ∈ [6] recover the first term of its transformed demand $B'_k$. In this step, the server transmits multicast messages that contain $\frac{F}{8}$ symbols in total.
• In the second step, we let each user k ∈ [6] recover the second term of its transformed demand $B'_k$. In this step, the server transmits multicast messages for a total of $\frac{F}{12}$ symbols.
From the received multicast messages and its cache content, each user k ∈ [K] can recover $B'_k$, and then compute $B_k = \mathbf{T}_k^{-1} B'_k$. The normalized load is $\rho_2 = \frac{1}{8} + \frac{1}{12} = \frac{5}{24}$, which coincides with (19). In conclusion, the normalized load of the proposed scheme is $\rho_{\text{ach}} = \min\{\rho_1, \rho_2\} = \frac{5}{24}$, while the baseline scheme in (16) achieves a normalized load equal to $\frac{25}{72}$.
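The numbers in Example 1 can be reproduced with exact rational arithmetic. The sketch below evaluates the two delivery costs as they are counted in Appendix A: a subfile of size αF/g plus the virtual-user transmission for the first solution, and the two multicast steps for the second. The closed forms, including the ceilings on K/g and K/(g+1), are a reconstruction from those symbol counts and should be read as an assumption; all function names are hypothetical.

```python
from fractions import Fraction
from math import ceil

def alpha_from_mu(mu, g):
    """Solve mu = alpha*(g-1)/g + (1-alpha)*g/(g+1) for alpha."""
    hi, lo = Fraction(g, g + 1), Fraction(g - 1, g)
    return (hi - mu) / (hi - lo)

def proposed_loads(K, g, alpha, lam):
    """Normalized loads of the two delivery solutions (reconstructed forms)."""
    groups_g = ceil(Fraction(K, g))        # users per group when grouping by g
    groups_g1 = ceil(Fraction(K, g + 1))   # users per group when grouping by g+1
    second_part = (1 - alpha) / (g + 1)    # size of one subfile of the second part
    rho1 = alpha / g + min(groups_g1 * lam, second_part)
    rho2 = (groups_g * min(alpha / g, lam)
            + min(groups_g1 * max(lam - alpha / g, 0), second_part))
    return rho1, rho2

K, g = 6, 2
mu, lam = Fraction(47, 72), Fraction(1, 12)
alpha = alpha_from_mu(mu, g)
rho1, rho2 = proposed_loads(K, g, alpha, lam)
rho_ach = min(rho1, rho2)
rho_baseline = min(K * lam, 1 - mu)
```

Under these assumptions both solutions give 5/24 for the parameters of the example, against 25/72 for the baseline; for other parameters the two solutions generally differ, which is why the scheme takes their minimum.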

Numerical Evaluations
• Fix K and µ. When λ grows, the gap between the proposed scheme and the baseline scheme reduces. When $\lambda = \frac{1}{K}$, the proposed scheme and the baseline scheme have the same load; this is because at the corner points of the proposed scheme in (26) we achieve the load 1 − µ, which is the same as the baseline scheme.
• In addition, we also plot the cache-aided scalar linear function retrieval scheme in (14), which only works for the case where the demand matrices have the form in (12). This comparison shows that, if the demand matrices are structured, we can design caching schemes that leverage the special structure of the demands to achieve a load that is no larger than the load for the worst-case demands. Moreover, the more structure the demands have, the larger the gain compared to the load in (17).

Conclusions
In this paper, we formulated the cache-aided general linear function retrieval problem, where each user requests some linear combinations of all the symbols in the library. The formulated problem generalizes the cache-aided scalar linear function retrieval problem. We proposed a novel scheme that strictly improves on an uncoded caching baseline scheme. Further directions include designing improved coded caching schemes for arbitrary users' demand ranges (the setting considered here), as well as for given specific users' demand ranges. In addition, the derivation of a converse bound is also part of on-going work.

Appendix A. Proof of Theorem 1
By a grouping strategy, we can achieve the normalized memory-load points in (26). In the following, inspired by Example 1, we introduce a general interpolation method among the points in (26).
We let Mod(b, a) represent the modulo operation on b with integer divisor a and we let Mod(b, a) ∈ {1, . . . , a} (i.e., we let Mod(b, a) = a if a divides b).
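Since this convention differs from the usual remainder, which takes values in {0, . . . , a − 1}, a minimal sketch may help; the function name `mod1` is an assumption made for illustration.

```python
def mod1(b, a):
    """Modulo with values in {1, ..., a}: returns a when a divides b."""
    r = b % a
    return a if r == 0 else r

# Users 1..6 mapped to cache classes for g = 2 and g + 1 = 3.
classes_g2 = [mod1(k, 2) for k in range(1, 7)]
classes_g3 = [mod1(k, 3) for k in range(1, 7)]
```

With this convention the user indices 1, . . . , K map directly onto the 1-based subfile labels used in the placement.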
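We first consider the case where g ∈ [K − 1] and $\frac{\alpha}{g} \ge \left\lceil \frac{K}{g} \right\rceil \lambda$. Recall that $\mu = \alpha \frac{g-1}{g} + (1-\alpha) \frac{g}{g+1} > \frac{g-1}{g}$. In this case, we directly use the caching scheme in (26) for the memory size $\frac{g-1}{g}$, with achieved normalized load $\left\lceil \frac{K}{g} \right\rceil \lambda$, which coincides with (17).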
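We then focus on the case where g ∈ [K − 1] and $\frac{\alpha}{g} \le \left\lceil \frac{K}{g} \right\rceil \lambda$.

Placement Phase. The placement is done by memory-sharing between the proposed placements in (26) for $\mu_1 = \frac{g-1}{g}$ and $\mu_2 = \frac{g}{g+1}$. We divide $\mathbf{w}$ into two parts, $\mathbf{w} = [\mathbf{w}^1; \mathbf{w}^2]$, where the dimension of $\mathbf{w}^1$ is αF × 1 and the dimension of $\mathbf{w}^2$ is (1 − α)F × 1.
For the first part, we further partition $\mathbf{w}^1$ into g non-overlapping and equal-length subfiles, $\mathbf{w}^1 = \{ \mathbf{w}^1_{\mathcal{T}} : \mathcal{T} \subseteq [g], |\mathcal{T}| = g - 1 \}$, where the dimension of each subfile $\mathbf{w}^1_{\mathcal{T}}$ is $\frac{\alpha F}{g} \times 1$. Each user k ∈ [K] caches $\mathbf{w}^1_{\mathcal{T}}$ where $\mathcal{T} \subseteq [g]$, $|\mathcal{T}| = g - 1$, and Mod(k, g) ∈ $\mathcal{T}$. For the second part, we further partition $\mathbf{w}^2$ into g + 1 non-overlapping and equal-length subfiles, $\mathbf{w}^2 = \{ \mathbf{w}^2_{\mathcal{T}} : \mathcal{T} \subseteq [g+1], |\mathcal{T}| = g \}$, where the dimension of each subfile $\mathbf{w}^2_{\mathcal{T}}$ is $\frac{(1-\alpha) F}{g+1} \times 1$. Each user k ∈ [K] caches $\mathbf{w}^2_{\mathcal{T}}$ where $\mathcal{T} \subseteq [g+1]$, $|\mathcal{T}| = g$, and Mod(k, g + 1) ∈ $\mathcal{T}$.
In total, each user caches
$$(g-1) \frac{\alpha F}{g} + g \frac{(1-\alpha) F}{g+1} = \mu F = M$$
symbols, satisfying the memory size constraint.

Delivery Phase. For each $\mathcal{T}_1 \subseteq [g]$ where $|\mathcal{T}_1| = g - 1$, we define $\mathbf{D}_{k,\mathcal{T}_1}$ as the sub-matrix of $\mathbf{D}_k$ which contains the columns corresponding to the symbols in $\mathbf{w}^1_{\mathcal{T}_1}$. In addition, for each $\mathcal{T}_2 \subseteq [g+1]$ where $|\mathcal{T}_2| = g$, we define $\mathbf{D}_{k,\mathcal{T}_2}$ as the sub-matrix of $\mathbf{D}_k$ which contains the columns corresponding to the symbols in $\mathbf{w}^2_{\mathcal{T}_2}$.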
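The placement rule above can be checked by enumerating, for each user, the subfile labels it caches and summing their sizes. The sketch below does this with exact fractions; the helper names are hypothetical, and the parameters in the final lines are those of Example 1 (g = 2, α = 1/12), where every user should end up with a cached fraction of 47/72.

```python
from fractions import Fraction
from itertools import combinations

def mod1(b, a):
    """Modulo with values in {1, ..., a}."""
    r = b % a
    return a if r == 0 else r

def cached_subfiles(k, g):
    """Labels T of the subfiles of the two library parts cached by user k."""
    part1 = [T for T in combinations(range(1, g + 1), g - 1) if mod1(k, g) in T]
    part2 = [T for T in combinations(range(1, g + 2), g) if mod1(k, g + 1) in T]
    return part1, part2

def cached_fraction(k, g, alpha):
    """Fraction of the library stored by user k under this placement."""
    part1, part2 = cached_subfiles(k, g)
    # Subfiles of the first part have size alpha/g, those of the second (1-alpha)/(g+1).
    return len(part1) * alpha / g + len(part2) * (1 - alpha) / (g + 1)

g, alpha = 2, Fraction(1, 12)
user1 = cached_subfiles(1, g)
fractions_per_user = [cached_fraction(k, g, alpha) for k in range(1, 7)]
```

Each user misses exactly one subfile of each part, namely the ones labelled by the complement of its own class, which is what the delivery phase exploits.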
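We can express the demand of user k ∈ [K] as
$$\mathbf{D}_k \mathbf{w} = \sum_{\mathcal{T}_1 \subseteq [g] : |\mathcal{T}_1| = g-1} \mathbf{D}_{k,\mathcal{T}_1} \mathbf{w}^1_{\mathcal{T}_1} + \sum_{\mathcal{T}_2 \subseteq [g+1] : |\mathcal{T}_2| = g} \mathbf{D}_{k,\mathcal{T}_2} \mathbf{w}^2_{\mathcal{T}_2}. \quad (A4)$$
It can be seen that user k knows all the terms in (A4) except
$$B_k := \mathbf{D}_{k,[g] \setminus \{\text{Mod}(k,g)\}} \mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}} + \mathbf{D}_{k,[g+1] \setminus \{\text{Mod}(k,g+1)\}} \mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}.$$
Hence, in the delivery phase user k should recover $B_k$. We then propose two solutions for this objective.

The solution that achieves $\rho_1$ in (18). We let user k recover the two terms of $B_k$ separately, namely
$$B_{k,1} := \mathbf{D}_{k,[g] \setminus \{\text{Mod}(k,g)\}} \mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}, \qquad B_{k,2} := \mathbf{D}_{k,[g+1] \setminus \{\text{Mod}(k,g+1)\}} \mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}. \quad (A7)$$
For the first term $B_{k,1}$, the dimension of $\mathbf{D}_{k,[g] \setminus \{\text{Mod}(k,g)\}}$ is $\lambda F \times \frac{\alpha F}{g}$, and $\mathbf{D}_{k,[g] \setminus \{\text{Mod}(k,g)\}}$ is known by each user. Recall that in this case we have $\frac{\alpha F}{g} \le \lambda F$. Hence, we let user k directly recover $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$. Thus in the delivery phase, we let the server transmit
$$\sum_{j \in [g]} \mathbf{w}^1_{[g] \setminus \{j\}}, \quad (A8)$$
with $\frac{\alpha F}{g}$ symbols. It can be seen that each user k ∈ [K] desires $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$ and caches all the other terms in (A8), such that user k can recover $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$.

For the second term $B_{k,2}$, the dimension of $\mathbf{D}_{k,[g+1] \setminus \{\text{Mod}(k,g+1)\}}$ is $\lambda F \times \frac{(1-\alpha) F}{g+1}$. Notice that $B_{k,2}$ only contains linear combinations of the second part of the library. For the second part, the users in $\mathcal{G}^2_i := \{ k \in [K] : \text{Mod}(k, g+1) = i \}$, i ∈ [g + 1], have the same cache content and are treated as one virtual user $v_i$, which demands the linear combinations $\mathbf{D}'_i \mathbf{w}^2_{[g+1] \setminus \{i\}}$ in (A9), where $\mathbf{D}'_i$ stacks the matrices $\mathbf{D}_{k,[g+1] \setminus \{i\}}$ of the users k in $\mathcal{G}^2_i$. Notice that the dimension of $\mathbf{D}'_i$ is $|\mathcal{G}^2_i| \lambda F \times \frac{(1-\alpha) F}{g+1}$. So virtual user $v_i$ only needs to recover at most $\min\left\{ \left\lceil \frac{K}{g+1} \right\rceil \lambda, \frac{1-\alpha}{g+1} \right\} F$ symbols in (A9). We denote the set of these symbols by $P_{i,[g+1] \setminus \{i\}}$, which is known by all the other virtual users. We then let the server transmit the sum of $P_{i,[g+1] \setminus \{i\}}$ over i ∈ [g + 1], with $\min\left\{ \left\lceil \frac{K}{g+1} \right\rceil \lambda, \frac{1-\alpha}{g+1} \right\} F$ symbols, such that each virtual user can recover its demand.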
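In total, the server transmits
$$\left( \frac{\alpha}{g} + \min\left\{ \left\lceil \frac{K}{g+1} \right\rceil \lambda, \frac{1-\alpha}{g+1} \right\} \right) F$$
symbols, which coincides with (18).

The solution that achieves $\rho_2$ in (19). Recall that the demanded sum of user k is $B_k$. We take a linear transformation $B'_k = \mathbf{T}_k B_k$, where $\mathbf{T}_k$ is full-rank with dimension λF × λF, and the bottom $\left[ \lambda - \frac{\alpha}{g} \right]^+ F$ symbols in $B'_k$ are some linear combinations of $\mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}$ (i.e., these linear combinations do not contain any term in $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$). This is possible because $B_k$ contains λF linear combinations of all symbols in $[\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}; \mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}]$, while $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$ contains $\frac{\alpha F}{g}$ symbols. Hence, we can re-express $B'_k$ as $B'_k = [B'_{k,1}; B'_{k,2}]$, where $B'_{k,1}$ contains the top $\min\left\{ \frac{\alpha}{g}, \lambda \right\} F$ symbols and $B'_{k,2}$ the bottom $\left[ \lambda - \frac{\alpha}{g} \right]^+ F$ symbols.

The delivery phase is divided into two steps. In the first step, we let each user k ∈ [K] recover $B'_{k,1}$. Notice that $B'_{k,1}$ is a set of linear combinations of the symbols in $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$ and $\mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}$. The subfile $\mathbf{w}^1_{[g] \setminus \{\text{Mod}(k,g)\}}$ is known by any user $j_1$ ∈ [K] where Mod($j_1$, g) ≠ Mod(k, g); the subfile $\mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}$ is known by any user $j_2$ ∈ [K] where Mod($j_2$, g + 1) ≠ Mod(k, g + 1). Assume that $k = a_k g + \text{Mod}(k, g)$, where $a_k = \left\lceil \frac{k}{g} \right\rceil - 1$ and Mod(k, g) ∈ [g]. In Appendix B, we prove Lemma A1, on which the first delivery step relies.
For each $i \in \left[ \left\lceil \frac{K}{g} \right\rceil \right]$, we let the server transmit the sum of $B'_{(i-1)g+j,1}$ over j ∈ [g]. From Lemma A1, each user (i − 1)g + j knows all terms except $B'_{(i-1)g+j,1}$, such that it can recover $B'_{(i-1)g+j,1}$. In this step, the server transmits $\left\lceil \frac{K}{g} \right\rceil \min\left\{ \frac{\alpha}{g}, \lambda \right\} F$ symbols.
In the second step, we let each user k ∈ [K] recover $B'_{k,2}$, which contains linear combinations of $\mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}$. We can use the same delivery scheme as we used to deliver the second term in the first solution (i.e., $B_{k,2}$ in (A7), which contains λF linear combinations of $\mathbf{w}^2_{[g+1] \setminus \{\text{Mod}(k,g+1)\}}$), now with $\left[ \lambda - \frac{\alpha}{g} \right]^+ F$ linear combinations per user. Each user k then recovers $B'_k$, computes $B_k = \mathbf{T}_k^{-1} B'_k$, and thus recovers its demand. In total, the achieved normalized load is
$$\left\lceil \frac{K}{g} \right\rceil \min\left\{ \frac{\alpha}{g}, \lambda \right\} + \min\left\{ \left\lceil \frac{K}{g+1} \right\rceil \left[ \lambda - \frac{\alpha}{g} \right]^+, \frac{1-\alpha}{g+1} \right\},$$
coinciding with (19).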
Appendix A.3. Proof of (20)
Finally, we focus on the case $\mu = \alpha \frac{K-1}{K} + (1-\alpha)$, where α ∈ (0, 1). In this case, the proposed scheme is a direct extension of the proposed scheme in (25). More precisely, one of the following two options is used:
• We directly use the caching scheme in (25) for the memory size $\frac{K-1}{K}$, with achieved normalized load equal to λ.
• In this case, the number of symbols that are not cached by each user is $\frac{\alpha F}{K}$. Hence, we can let each user directly recover its uncached symbols, with achieved normalized load equal to $\frac{\alpha}{K}$.
This concludes the proof.