Symmetry, Outer Bounds, and Code Constructions: A Computer-Aided Investigation on the Fundamental Limits of Caching

We illustrate how computer-aided methods can be used to investigate the fundamental limits of caching systems, an approach significantly different from the conventional analytical approach usually seen in the information theory literature. The linear programming (LP) outer bound of the entropy space serves as the starting point of this approach; however, our effort goes significantly beyond using it to prove information inequalities. We first identify and formalize the symmetry structure in the problem, which enables us to show the existence of optimal symmetric solutions. A symmetry-reduced linear program is then used to identify the boundary of the memory-transmission-rate tradeoff for several small cases, for which we obtain a set of tight outer bounds. General hypotheses on the optimal tradeoff region are formed from these computed data and are then proven analytically. This leads to a complete characterization of the optimal tradeoff for systems with only two users, and a partial characterization for systems with only two files. Next, we show that by carefully analyzing the joint entropy structure of the outer bounds for certain cases, a novel code construction can be reverse-engineered, which eventually leads to a general class of codes. Finally, we show that outer bounds can be computed by strategically relaxing the LP in different ways, which can be used to explore the problem computationally. This allows us, first, to deduce generic characteristics of the converse proof, and second, to compute outer bounds for larger problem cases, despite the seemingly prohibitive computation scale.


Introduction
We illustrate how computer-aided methods can be used to investigate the fundamental limits of caching systems, in clear contrast to the conventional analytical approach usually seen in the information theory literature. The theoretical foundation of this approach can be traced back to the linear programming (LP) outer bound of the entropy space [1]. The computer-aided approach has previously been applied to distributed data storage systems [2][3][4][5] to derive various outer bounds, which in many cases are tight. In this work, we first show that the same general methodology can be effectively tailored to the caching problem to produce outer bounds in several cases; more importantly, we show that data obtained through computation can be used in several different ways to deduce meaningful structural understanding of the fundamental limits and of optimal code constructions.
The computer-aided investigation and exploration methods we propose are quite general; in this work, we apply them to the caching problem. Caching systems have attracted much research attention recently. In a nutshell, caching is a data management technique that can alleviate the communication burden during peak traffic or data demand times, by prefetching and prestoring certain useful content in the users' local caches. Maddah-Ali and Niesen [6] recently considered the problem in an information-theoretic framework, where the fundamental question is the optimal tradeoff between the local cache memory capacity and the content delivery transmission rate. It was shown in [6] that coding can be very beneficial in this setting, while uncoded solutions suffer a significant loss. Subsequent works extended it to decentralized caching placements [7], caching with nonuniform demands [8], online caching placements [9], hierarchical caching [10], and caching with random demands [11], among other things. There have been significant research activities recently [12][13][14][15][16][17][18][19][20][21] in both refining the outer bounds and finding stronger codes for caching. Despite these efforts, before our work the fundamental tradeoff had not been fully characterized except for the case with only two users and two files [6]. This is partly due to the fact that the main focus of the initial investigations [6][7][8][9] was on systems operating in the regime where the number of files and the number of users are both large, for which the coded solutions can provide the largest gain over their uncoded counterparts. However, in many applications, the number of simultaneous data requests can be small, or the collection of users or files needs to be divided into subgroups in order to account for various service and request inhomogeneities; see, e.g., [8]. 
More importantly, precise and conclusive results on such cases with small numbers of users or files can provide significant insights into more general cases, as we shall show in this work.
In order to utilize the computational tool in this setting, the symmetry structure in the problem needs to be understood and used to reduce the problem to a manageable scale. The symmetry-reduced LP is then used to identify the boundary of the memory-transmission-rate tradeoff for several cases. General hypotheses on the optimal tradeoff region are formed from these data and are then proven analytically. This leads to a complete characterization of the optimal tradeoff for systems with two users, and a partial characterization for systems with two files. Next, we show that by carefully analyzing the joint entropy structure of the outer bounds, a novel code construction can be reverse-engineered, which eventually leads to a general class of codes. Moreover, the data can also be used to show that a certain tradeoff pair is not achievable using linear codes. Finally, we show that outer bounds can be computed by strategically relaxing the LP in different ways, which can be used to explore the problem computationally. This allows us, first, to deduce generic characteristics of the converse proof, and second, to compute outer bounds for larger problem cases, despite the seemingly prohibitive computation scale.
Although some of the tightest bounds and the most conclusive results on the optimal memory-transmission-rate tradeoff in caching systems are presented in this work, our main focus is in fact to present generic computer-aided methods that can be used to facilitate information-theoretic investigations in a practically important research problem setting. For this purpose, we provide the necessary details on the development and the rationale of the proposed techniques in a semi-tutorial (and thus less concise) manner. The most important contributions of this work are three methods for investigating the fundamental limits of information systems: (1) computational and data-driven converse hypotheses, (2) reverse-engineering optimal codes, and (3) computer-aided exploration. We believe that these methods are sufficiently general that they can be applied to other coding and communication problems, particularly those related to data storage and management.
The rest of the paper is organized as follows. In Section 2, existing results on the caching problem and some background information on the entropy LP framework are reviewed. The symmetry structure of the caching problem is explored in detail in Section 3. In Section 4, we show how the data obtained through computation can be used to form hypotheses, and then analytically prove them. In Section 5, we show that the computed data can also be used to facilitate reverse-engineering new codes, and also to prove that a certain memory-transmission-rate pair is not achievable using linear codes. In Section 6, we provide a method to explore the structure of the outer bounds computationally, to obtain insights into the problem and derive outer bounds for large problem cases. A few concluding remarks are given in Section 7, and technical proofs and some computer-produced proof tables are relegated to Appendices A-I.

The Caching System Model
There are a total of N mutually independent files of equal size and K users in the system. The overall system operates in two phases: in the placement phase, each user stores in his/her local cache some content from these files; in the delivery phase, each user requests one file, and the central server transmits (multicasts) certain common content to all the users to accommodate their requests. Each user has a local cache memory of capacity M, and the contents stored in the placement phase are determined without knowing a priori the precise requests in the delivery phase. The goal is to minimize the multicast rate R, which must suffice for all possible combinations of user requests, subject to the cache memory constraint M; both quantities are measured as multiples of the file size F. The primary interest of this work is the optimal tradeoff between M and R. In the rest of the paper, we shall refer to a specific combination of the file requests of all users together as a demand, or a demand pattern, and reserve the word "request" for the particular file a user needs. Figure 1 provides an illustration of the overall system.

Figure 1. The caching system; in the depicted example, the requests of the users are (1,2,2,3), respectively, and the multicast common information is written as $X_{1,2,2,3}$.
Since we are investigating the fundamental limits of caching systems in this work, the notation for the various quantities in the systems needs to be specified. The N files in the system are denoted as $\mathcal{W} \triangleq \{W_1, W_2, \ldots, W_N\}$; the cached contents at the K users are denoted as $\mathcal{Z} \triangleq \{Z_1, Z_2, \ldots, Z_K\}$; and the transmissions to satisfy a given demand are denoted as $X_{d_1, d_2, \ldots, d_K}$, i.e., $X_{d_1, d_2, \ldots, d_K}$ is the transmitted information when user k requests file $W_{d_k}$, k = 1, 2, ..., K. For simplicity, we shall write $(W_1, W_2, \ldots, W_n)$ as $W_{[1:n]}$, and $(d_1, d_2, \ldots, d_K)$ as $d_{[1:K]}$; when there are only two users in the system, we write $(X_{i,1}, X_{i,2}, \ldots, X_{i,j})$ as $X_{i,[1:j]}$. Other simplifications of the notation for certain special cases of the problem will be introduced as they become necessary.
The cache content at the k-th user is produced directly from the files through the encoding function $f_k$, and the transmission content through the encoding function $g_{d_{[1:K]}}$, i.e.,

$$Z_k = f_k(\mathcal{W}), \qquad X_{d_{[1:K]}} = g_{d_{[1:K]}}(\mathcal{W}), \qquad (1)$$

where the second function depends on the particular demand $d_{[1:K]}$. Since the cached contents and the transmitted information are both deterministic functions of the files, we have:

$$H(Z_k \mid \mathcal{W}) = 0, \qquad H(X_{d_{[1:K]}} \mid \mathcal{W}) = 0. \qquad (2)$$

It is also clear that:

$$H(W_{d_k} \mid Z_k, X_{d_{[1:K]}}) = 0, \qquad (3)$$

i.e., the file $W_{d_k}$ is a function of the cached content $Z_k$ at user k and the transmitted information when user k requests $W_{d_k}$. The memory satisfies the constraint:

$$H(Z_k) \le MF, \qquad (4)$$

and the transmission rate satisfies:

$$H(X_{d_{[1:K]}}) \le RF. \qquad (5)$$

Any valid caching code must satisfy the specific set of conditions in (2)-(5). A slight variant of the problem definition allows a vanishing probability of error, i.e., the probability of error asymptotically approaches zero as F goes to infinity; all the outer bounds derived in this work remain valid for this variant with appropriate applications of Fano's inequality [22].

Known Results on Caching Systems
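The first achievability result on this problem was given in [6], which is directly quoted below.

Theorem 1 (Maddah-Ali and Niesen [6]). For N files and K users each with a cache size $M \in \{0, N/K, 2N/K, \ldots, N\}$,

$$R = K\left(1 - \frac{M}{N}\right) \min\left\{\frac{1}{1 + KM/N}, \frac{N}{K}\right\} \qquad (6)$$

is achievable. For general $0 \le M \le N$, the lower convex envelope of these (M, R) pairs is achievable.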
The first term in the minimization is achieved by the scheme of uncoded placement together with coded transmission [6], while the latter term is achieved by simple uncoded placement and uncoded transmission. More recently, Yu et al. [19] provided the optimal solution when the placement is restricted to be uncoded. Chen et al. [15] extended a special scheme for the case N = K = 2 discussed in [6] to the general case N ≤ K, and showed that the tradeoff pair $(M, R) = \left(\frac{1}{K}, \frac{N(K-1)}{K}\right)$ is achievable. There were also several other notable efforts to find better binary codes [16][17][18][21]. Tian and Chen [20] proposed a class of codes for N ≤ K, the origin of which will be discussed in more detail in Section 5. Gómez-Vilardebó [21] also proposed a new code, which can provide further improvement in the small cache memory regime. Tradeoff points achieved by the codes in [20] can indeed be optimal in some cases. It is worth noting that while all the schemes in [6][15][16][17][18][19][21] are binary codes, the codes in [20] use a more general finite field.
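The achievable tradeoff of [6] is easy to evaluate numerically. The Python sketch below is only an illustration (the function names are ours, not from [6]): it computes the points $M = tN/K$, $R = K(1 - M/N)\min\{1/(1 + KM/N), N/K\}$ for t = 0, 1, ..., K with exact rational arithmetic, takes their lower convex envelope (some points, e.g., t = 1 when N < K, can lie above it), and interpolates linearly, which is what space sharing between neighbouring points achieves.

```python
from fractions import Fraction

def man_points(N, K):
    """(M, R) points of the scheme in [6] at M = tN/K, t = 0, ..., K."""
    pts = []
    for t in range(K + 1):
        M = Fraction(t * N, K)
        R = K * (1 - M / N) * min(1 / (1 + K * M / N), Fraction(N, K))
        pts.append((M, R))
    return pts

def lower_convex_envelope(points):
    """Lower hull by Andrew's monotone chain (points sorted by M)."""
    hull = []
    for p in sorted(points):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # cross product <= 0: hull[-1] is on or above the segment hull[-2] -> p
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def man_rate(N, K, M):
    """Rate achieved by space sharing between neighbouring envelope points."""
    M = Fraction(M)
    hull = lower_convex_envelope(man_points(N, K))
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= M <= x2:
            return y1 + (y2 - y1) * (M - x1) / (x2 - x1)
    raise ValueError("M must lie in [0, N]")

print(man_rate(2, 2, Fraction(1, 2)))   # 5/4
```

For (N, K) = (2, 2) at M = 1/2 this gives R = 5/4, whereas the optimal tradeoff for this case has a corner point at (1/2, 1) that requires a different code.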
A cut-set outer bound was also given in [6], which is again directly quoted below.
Theorem 2 (Maddah-Ali and Niesen [6]). For N files and K users each with a cache size 0 ≤ M ≤ N,

$$R \ge \max_{s \in \{1, 2, \ldots, \min\{N, K\}\}} \left(s - \frac{s}{\lfloor N/s \rfloor} M\right). \qquad (7)$$

Several efforts to improve this outer bound have also been reported, which have led to more accurate approximate characterizations of the optimal tradeoff [12][13][14]. However, as mentioned earlier, even for the simplest cases beyond (N, K) = (2, 2), complete characterizations were not available before our work (first reported in [23]). In this work, we specifically treat such small problem cases, and attempt to deduce more generic properties and outer bounds from them. Some of the most recent works [24,25], obtained after the publication of our results [23], provide even more accurate approximations, the best of which is currently within a factor of roughly 2 [24].
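The cut-set bound is a maximum of finitely many linear functions of M and can be evaluated directly. The Python sketch below is an illustration only (the function name is ours); it uses exact fractions so that ties between the terms are resolved exactly.

```python
from fractions import Fraction

def cut_set_bound(N, K, M):
    """Lower bound on R from the cut-set bound of [6]: max over s of s - s*M/floor(N/s)."""
    M = Fraction(M)
    return max(s - Fraction(s, N // s) * M for s in range(1, min(N, K) + 1))

M = Fraction(1)
print(cut_set_bound(4, 2, M))   # 1
print((8 - 3 * M) / 4)          # 5/4, the rate required by 3M + 4R >= 8
```

For (N, K) = (4, 2) at M = 1, the cut-set bound only gives R ≥ 1, while the bound 3M + 4R ≥ 8 discussed in Section 4 requires R ≥ 5/4, so the latter is strictly tighter there.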

The Basic Linear Programming Framework
The basic linear programming bound on the entropy space was introduced by Yeung [1], and can be understood as follows. Consider a total of n discrete random variables $(X_1, X_2, \ldots, X_n)$ with a given joint distribution. There are a total of $2^n - 1$ joint entropies, each associated with a non-empty subset of these random variables. It is known that the entropy function is monotone and submodular, and thus, any valid $(2^n - 1)$-dimensional entropy vector must have the properties associated with such monotonicity and submodularity, which can be written as a set of inequalities. Yeung showed (see, e.g., [26]) that the minimal sufficient set of such inequalities is the so-called elemental inequalities:

$$H(X_i \mid X_{\mathcal{N} \setminus \{i\}}) \ge 0, \qquad i \in \mathcal{N}, \qquad (8)$$

$$I(X_i; X_j \mid X_{\Phi}) \ge 0, \qquad i \ne j, \; \Phi \subseteq \mathcal{N} \setminus \{i, j\}, \qquad (9)$$

where $\mathcal{N} \triangleq \{1, 2, \ldots, n\}$. The $2^n - 1$ joint entropy terms can be viewed as the variables in a linear programming (LP) problem, and there are a total of $n + \binom{n}{2} 2^{n-2}$ constraints in (8) and (9). In addition to this generic set of constraints, each specific coding problem places additional constraints on the joint entropy values. These can be viewed as a constraint set of the given problem, although the problem might also induce constraints that are not in this form, or that cannot even be written in terms of joint entropies. For example, in the caching problem, the set of random variables is $\mathcal{W} \cup \mathcal{Z} \cup \mathcal{X}$, where $\mathcal{X} \triangleq \{X_{d_{[1:K]}} : d_k \in \{1, 2, \ldots, N\}\}$, and there are a total of $2^{N+K+N^K} - 1$ variables in this LP; the problem-specific constraints are those in (2)-(5), and there are $N + K + N^K + \binom{N+K+N^K}{2} 2^{N+K+N^K-2}$ elemental entropy constraints, a number that is in fact doubly exponential in the number of users K.
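The elemental inequalities are straightforward to enumerate. The Python sketch below is our own illustration (the bitmask encoding and function names are assumptions, not from [1] or [26]): it lists (8) and (9) as coefficient vectors over subsets, confirms the count $n + \binom{n}{2}2^{n-2}$, and checks them on the entropy vector of three bits where $X_3 = X_1 \oplus X_2$.

```python
from itertools import combinations
from math import comb

def elemental_inequalities(n):
    """Elemental inequalities (8)-(9) for n random variables.

    Each inequality is a dict {mask: coefficient} meaning
    sum(coefficient * H(X_mask)) >= 0, where bit i of mask selects X_{i+1};
    the empty set (whose entropy is 0) is dropped.
    """
    def add(ineq, mask, coeff):
        if mask:
            ineq[mask] = ineq.get(mask, 0) + coeff

    full = (1 << n) - 1
    ineqs = []
    for i in range(n):                       # (8): H(X_i | all others) >= 0
        ineq = {}
        add(ineq, full, 1)
        add(ineq, full & ~(1 << i), -1)
        ineqs.append(ineq)
    for i, j in combinations(range(n), 2):   # (9): I(X_i; X_j | X_Phi) >= 0
        rest = [b for b in range(n) if b not in (i, j)]
        for r in range(len(rest) + 1):
            for phi in combinations(rest, r):
                s = sum(1 << b for b in phi)
                ineq = {}
                add(ineq, s | (1 << i), 1)
                add(ineq, s | (1 << j), 1)
                add(ineq, s | (1 << i) | (1 << j), -1)
                add(ineq, s, -1)
                ineqs.append(ineq)
    return ineqs

def holds(ineq, h):
    """Check one inequality against an entropy function h(mask)."""
    return sum(c * h(mask) for mask, c in ineq.items()) >= 0

def h_xor(mask):
    # Entropy (bits) when X1, X2 are independent uniform bits and X3 = X1 XOR X2:
    # any two of the three variables determine the third.
    return min(bin(mask).count("1"), 2)

ineqs = elemental_inequalities(3)
print(len(ineqs))                            # 9 = 3 + 3 * 2
print(all(holds(q, h_xor) for q in ineqs))   # True
```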

A Computer-Aided Approach to Find Outer Bounds
In principle, with the constraint set described above, one can simply convert the outer bounding problem into an LP (with an objective function R for each fixed M in the caching problem, or more generally a linear combination of M and R), and use a generic LP solver to compute it. Unfortunately, despite the effectiveness of modern LP solvers, directly applying this approach to an engineering problem is usually not possible, since the scale of the LP is often very large even for simple settings. For example, for the caching problem with N = 2 and K = 4, there are over 200 million elemental inequalities. The key observation used in [2] to make the problem tractable is that the LP can usually be significantly reduced by taking into account the symmetry and the implication relations in the problem.
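The scale quoted above follows from the counts in the previous subsection: with $n = N + K + N^K$ random variables (files, caches, and one transmission per demand), the unreduced LP has $2^n - 1$ variables and $n + \binom{n}{2}2^{n-2}$ elemental inequalities. A short Python check (the function name is ours):

```python
from math import comb

def caching_lp_size(N, K):
    """Size of the unreduced entropy LP for an (N, K) caching system."""
    n = N + K + N ** K                   # files, caches, one transmission per demand
    num_vars = 2 ** n - 1                # one joint entropy per nonempty subset
    num_elemental = n + comb(n, 2) * 2 ** (n - 2)
    return n, num_vars, num_elemental

print(caching_lp_size(2, 4))   # (22, 4194303, 242221078)
```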
The details of the reductions can be found in [2], and here, we only provide two examples in the context of the caching problem to illustrate the basic idea behind these reductions:

•
Assuming the optimal codes are symmetric, which will be defined more precisely later, the joint entropy H(W 2 , Z 3 , X 2,3,3 ) should be equal to the joint entropy H(W 1 , Z 2 , X 1,2,2 ). This implies that in the LP, we can represent both quantities using a single variable.

•
Because of the relation (3), the joint entropy H(W 2 , Z 3 , X 2,3,3 ) should be equal to the joint entropy H(W 2 , W 3 , Z 3 , X 2,3,3 ). This again implies that in the LP, we can represent both quantities using a single variable.
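Both reductions can be mechanized. The Python sketch below is our own illustration, not the program of [2]: it represents each random variable as a tuple, applies a user permutation and a file permutation, closes a subset under the implications (2) and (3), and uses the lexicographically smallest image as a canonical key, so that subsets sharing a key can share one LP variable. The helper names (permute_var, closure, canonical) are hypothetical, and the brute-force loop over all N!K! permutations is only practical for small cases.

```python
from itertools import combinations, permutations, product

def all_vars(N, K):
    """Random variables of an (N, K) caching system as hashable tuples."""
    return ([('W', n) for n in range(1, N + 1)]
            + [('Z', k) for k in range(1, K + 1)]
            + [('X', d) for d in product(range(1, N + 1), repeat=K)])

def permute_var(var, user_perm, file_perm):
    """Image of one variable when user k becomes user_perm[k-1]
    and file n becomes file_perm[n-1] (1-based labels)."""
    kind, idx = var
    if kind == 'W':
        return ('W', file_perm[idx - 1])
    if kind == 'Z':
        return ('Z', user_perm[idx - 1])
    new_d = [0] * len(idx)
    for k, req in enumerate(idx):        # user k+1's (relabeled) request moves
        new_d[user_perm[k] - 1] = file_perm[req - 1]
    return ('X', tuple(new_d))

def closure(subset, N, K):
    """Add files implied by decoding (3); all files determine everything (2)."""
    s = set(subset)
    while True:
        implied = {('W', d[k - 1])
                   for (a, k) in s if a == 'Z'
                   for (b, d) in s if b == 'X'} - s
        if not implied:
            break
        s |= implied
    if all(('W', n) in s for n in range(1, N + 1)):
        return frozenset(all_vars(N, K))
    return frozenset(s)

def canonical(subset, N, K):
    """Smallest sorted image of the closed subset over all N!K! symmetries."""
    closed = closure(subset, N, K)
    return min(tuple(sorted(permute_var(v, up, fp) for v in closed))
               for up in permutations(range(1, K + 1))
               for fp in permutations(range(1, N + 1)))

def reduced_variable_count(N, K):
    """Number of distinct LP variables left after both reductions."""
    vs = all_vars(N, K)
    return len({canonical(c, N, K)
                for r in range(1, len(vs) + 1)
                for c in combinations(vs, r)})

A = [('W', 2), ('Z', 3), ('X', (2, 3, 3))]
B = [('W', 1), ('Z', 2), ('X', (1, 2, 2))]
print(canonical(A, 3, 3) == canonical(B, 3, 3))   # True: first reduction
print(('W', 3) in closure(A, 3, 3))               # True: second reduction
print(reduced_variable_count(2, 2), "of", 2 ** 8 - 1)
```

The same permute_var call with user_perm = (2, 3, 1, 4) also reproduces the user-permutation example of Section 3, mapping $Z_2$ to $Z_3$ and $X_{1,2,3,2}$ to $X_{3,1,2,2}$.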
The reduced primal LP problem is usually significantly smaller, which allows us to compute a lower bound on the transmission rate, i.e., a point on the outer bound of the tradeoff region, for a specific instance with fixed file sizes. Moreover, after identifying the region of interest using these computed boundary points, a human-readable proof can also be produced computationally by invoking the dual of the LP given above. Note that a feasible and bounded LP always has a rational optimal solution when all the coefficients are rational, and thus, the bound will have rational coefficients. More details can again be found in [2]; however, this procedure can be intuitively viewed as follows. Suppose a valid outer bound in the constraint set has the form:

$$\sum_{\Phi \subseteq \{1, 2, \ldots, n\}} \alpha_{\Phi} H(X_{\Phi}) \ge 0; \qquad (10)$$

then it must be a linear combination of the known inequalities, i.e., (8) and (9), and the problem-specific constraints, e.g., (2)-(5) for the caching problem. Finding a human-readable proof essentially amounts to finding a valid linear combination of these inequalities, and for conciseness of the proof, the sparsest linear combination is preferred. By utilizing the LP dual with an additional linear objective, we can find among all valid combinations a sparse (but not necessarily the sparsest) one, which yields a concise proof of the inequality (10).
It should be noted that in [2], the region of interest was obtained by first finding a set of fine-spaced points on the boundary of the outer bound using the reduced LP, and then manually identifying the effective bounding segments using these boundary points. This task can however be accomplished more efficiently using an approach proposed by Lassez and Lassez [27], as pointed out in [28]. This prompted the author to implement this part of the computer program using this more efficient approach. For completeness, the specialization of the Lassez algorithm to the caching problem, which is much simplified in this setting, is provided in Appendix A.
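The following Python sketch illustrates, in two dimensions, the kind of iterative facet search that can replace a fine-spaced sweep of boundary points. It is our own simplified illustration rather than the algorithm of [27] or of Appendix A: an oracle returns a point minimizing aM + bR, and each segment between two known boundary points is either split at a point found beyond it or certified as a facet. The function toy_oracle is hypothetical; it stands in for the symmetry-reduced LP by minimizing over the known corner points (0, 2), (1/2, 1), (1, 1/2), (2, 0) of the (N, K) = (2, 2) tradeoff [6], and the starting points assume that the minimizers of M and of R are unique.

```python
from fractions import Fraction as F

def toy_oracle(a, b):
    """Hypothetical stand-in for the reduced LP: a minimizer of a*M + b*R
    over the (N, K) = (2, 2) tradeoff region, given by its corner points."""
    corners = [(F(0), F(2)), (F(1, 2), F(1)), (F(1), F(1, 2)), (F(2), F(0))]
    return min(corners, key=lambda p: a * p[0] + b * p[1])

def find_facets(oracle, p, q):
    """Facets of the lower boundary between boundary points p and q (p left of q).
    Each facet (a, b, c) means a*M + b*R >= c."""
    a, b = p[1] - q[1], q[0] - p[0]   # normal to segment pq, pointing into the region
    c = a * p[0] + b * p[1]
    r = oracle(a, b)
    if a * r[0] + b * r[1] < c:       # r lies strictly below the segment: refine
        return find_facets(oracle, p, r) + find_facets(oracle, r, q)
    return [(a, b, c)]                # nothing below: pq is a facet

left, right = toy_oracle(1, 0), toy_oracle(0, 1)   # minimal-M and minimal-R points
for a, b, c in find_facets(toy_oracle, left, right):
    print(f"{a}*M + {b}*R >= {c}")
```

On this toy input, the search reports three facets, which are equivalent to 2M + R ≥ 2, 2M + 2R ≥ 3 and M + 2R ≥ 2. With a real LP as the oracle, each oracle call is one LP solve, so the number of solves grows with the number of facets rather than with the resolution of a sweep.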
The proof found through this approach can be conveniently written as a matrix listing all the linear combination coefficients, and one can easily produce a chain of inequalities from such a table to obtain a more conventional human-readable proof. This approach of generating human-readable proofs has subsequently been adopted by other researchers [5,29]. Though we shall present several results thus obtained in tabular form, our main goal is to use these results to present the computer-aided approach and to show its effectiveness.

Symmetry in the Caching Problem
The computer-aided approach to derive outer bounds mentioned earlier relies critically on the reduction of the basic entropy LP using symmetry and other problem structures. In this section, we consider the symmetry in the caching problem. Intuitively, if we place the cached contents in a permuted manner at the users, it will lead to a new code that is equivalent to the original one. Similarly, if we reorder the files and apply the same encoding function, the transmissions can also be changed accordingly to accommodate the requests, which is again an equivalent code. The two types of symmetries can be combined, and they induce a permutation group on the joint entropies of the subsets of the random variables W ∪ Z ∪ X .
For concreteness, we may specialize to the case (N, K) = (3,4) in the discussion, and for this case:
With the new coding functions and the permuted random variables defined above, we have the following relation:

$$(\mathcal{W}^{\pi}, \mathcal{Z}^{\pi}, \mathcal{X}^{\pi}) = (\mathcal{W}, \pi(\mathcal{Z}), \pi(\mathcal{X})), \qquad (15)$$

where the superscript $\pi$ indicates the random variables induced by the new encoding functions. We call a caching code user-index-symmetric if, for any subsets $\mathcal{W}_o \subseteq \mathcal{W}$, $\mathcal{Z}_o \subseteq \mathcal{Z}$, $\mathcal{X}_o \subseteq \mathcal{X}$, and any permutation $\pi$, the following relation holds:

$$H(\mathcal{W}_o, \mathcal{Z}_o, \mathcal{X}_o) = H(\mathcal{W}_o, \pi(\mathcal{Z}_o), \pi(\mathcal{X}_o)). \qquad (16)$$

For example, for such a symmetric code, the entropy $H(W_2, Z_2, X_{1,2,3,2})$ under the aforementioned permutation is equal to $H(W_2, Z_3, X_{3,1,2,2})$; note that $W_2$ is a function of $(Z_2, X_{1,2,3,2})$, and after the mapping, it is a function of $(Z_3, X_{3,1,2,2})$.
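With the new coding functions and the permuted random variables defined above, we have the following equivalence relation:

$$(\mathcal{W}^{\bar\pi}, \mathcal{Z}^{\bar\pi}, \mathcal{X}^{\bar\pi}) \overset{d}{=} (\bar\pi(\mathcal{W}), \mathcal{Z}, \bar\pi(\mathcal{X})),$$

where $\overset{d}{=}$ indicates equality in distribution, which holds because the random variables in $\mathcal{W}$ are independently and identically distributed, and thus exchangeable. Analogously, we call a caching code file-index-symmetric if $H(\mathcal{W}_o, \mathcal{Z}_o, \mathcal{X}_o) = H(\bar\pi(\mathcal{W}_o), \mathcal{Z}_o, \bar\pi(\mathcal{X}_o))$ for any such subsets and any file permutation $\bar\pi$.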

Existence of Optimal Symmetric Codes
With the symmetry structure elucidated above, we can now state our first auxiliary result.

Proposition 1.
For any caching code, there is a code with the same or smaller caching memory and transmission rate, which is both user-index-symmetric and file-index-symmetric.
We call a code that is both user-index-symmetric and file-index-symmetric a symmetric code. This proposition implies that there is no loss of generality in considering only symmetric codes. The proof of this proposition relies on a simple space-sharing argument, where a set of base encoding functions and base decoding functions is used to construct a new code. In this new code, each file is partitioned into a total of N!K! segments, each of the size required by the base coding functions. Each pair of permutations $(\pi, \bar\pi)$ is assigned one segment index i, and the coding functions obtained as in (12) and (17) from the base coding functions using $\pi$ and $\bar\pi$ are applied to the i-th segments of all the files to produce the random variables $\mathcal{W}^{\bar\pi\cdot\pi} \cup \mathcal{Z}^{\bar\pi\cdot\pi} \cup \mathcal{X}^{\bar\pi\cdot\pi}$. Consider a set of random variables $(\mathcal{W}_o \cup \mathcal{Z}_o \cup \mathcal{X}_o)$ in the original code, and denote the same set of random variables in the new code as $(\hat{\mathcal{W}}_o \cup \hat{\mathcal{Z}}_o \cup \hat{\mathcal{X}}_o)$. We have: because of (15) and (19). Similarly, for another pair of permutations $(\pi', \bar\pi')$, the correspondingly permuted random variables in the new code will have exactly the same joint entropy value. It is now clear that the resultant code obtained by space sharing is indeed symmetric, and it has a (normalized) memory size and transmission rate no worse than the original one. A similar argument was used in [2] to show, with a more detailed proof, the existence of optimal symmetric solutions in regenerating codes. In a separate work [30], we investigated the properties of the induced permutation $\bar\pi \cdot \pi$, and in particular showed that it is isomorphic to the power group [31]; readers are referred to [30] for more details.
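The averaging at the heart of this space-sharing argument can be checked on a toy example. In the Python sketch below, which is only an illustration, each group element drives one of |G| equal-size segments; since the segments are independent and each is 1/|G| of a file, every joint entropy of the new code is the average of the base code's joint entropies over the permuted sets. The entropy values in h and the labels Z1, Z2, X12, X21 (a K = 2 system under the user swap, which maps demand (1, 2) to (2, 1)) are hypothetical numbers, not taken from any actual code.

```python
from fractions import Fraction

def symmetrize(h, group):
    """Entropy profile of the space-shared code.

    h maps frozensets of variable labels to (normalized) joint entropies of the
    base code; group lists label permutations as dicts. Each group element
    drives one of len(group) equal-size segments, so entropies are averaged.
    """
    return {S: sum(Fraction(h[frozenset(g[x] for x in S)]) for g in group) / len(group)
            for S in h}

# Hypothetical, deliberately asymmetric base code.
h = {frozenset({'Z1'}): Fraction(3, 4), frozenset({'Z2'}): Fraction(1, 4),
     frozenset({'X12'}): Fraction(1), frozenset({'X21'}): Fraction(1, 2)}
identity = {'Z1': 'Z1', 'Z2': 'Z2', 'X12': 'X12', 'X21': 'X21'}
swap = {'Z1': 'Z2', 'Z2': 'Z1', 'X12': 'X21', 'X21': 'X12'}   # swap users 1 and 2

sym = symmetrize(h, [identity, swap])
for S, v in sym.items():
    print(sorted(S), v)
```

The two cache entropies become equal, and neither the largest cache entropy nor the largest transmission entropy increases, which is exactly what Proposition 1 needs.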

Demand Types
Even for symmetric codes, the transmissions to satisfy different types of demands may use different rates. For example, in the setting (N, K) = (3, 4), $H(X_{1,2,2,2})$ may not be equal to $H(X_{1,1,2,2})$, and $H(X_{1,2,3,2})$ may not be equal to $H(X_{3,2,3,2})$. The transmission rate R is then chosen to be the maximum among all cases. This motivates the notion of demand types.

Definition 1.
In an (N, K) caching system, for a specific demand, let the number of users requesting file n be denoted as $m_n$, n = 1, 2, ..., N. We call the vector obtained by sorting the values $\{m_1, m_2, \ldots, m_N\}$ in decreasing order the demand type, denoted as T. For example, in a (3, 4) system, the demand (1, 2, 2, 3) has type T = (2, 1, 1).
Proposition 1 implies that for optimal symmetric solutions, demands of the same type can always be satisfied with transmissions of the same rate; however, demands of different types may still require different rates. This observation is also important in setting up the linear program in the computer-aided approach outlined in the previous section. Because we are interested in the worst case transmission rate among all types of demands, in the symmetry-reduced LP, an additional variable needs to be introduced to constrain the transmission rates of all possible types.
For an (N, K) system, determining the number of demand types is closely related to the integer partition problem: each demand type corresponds to a way of writing the integer K as a sum of at most N positive integers (with zeros appended to obtain a length-N vector). There is no simple closed-form formula for the number of partitions, but it can be computed using a generating function [32]. For several small (N, K) pairs, we list the demand types in Table 1.

Table 1. Demand types for small (N, K) pairs.

(N, K) | Demand Types

It can be seen that when N ≤ K, increasing N induces more demand types, but this stops when N > K; however, increasing K always induces more demand types. This suggests that it might be easier to find solutions for a collection of cases with a fixed K and arbitrary N values, but more difficult for a collection with a fixed N and arbitrary K values. This intuition is partially confirmed by our results presented next.
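The demand types can be generated directly as partitions of K into at most N parts. The Python sketch below is an illustration with hypothetical function names; it also computes the type of a single demand as in Definition 1, and the loop shows the count saturating once N reaches K.

```python
from itertools import product

def demand_type(demand, N):
    """Counts m_1, ..., m_N of users requesting each file, sorted decreasingly."""
    counts = [0] * N
    for d in demand:
        counts[d - 1] += 1
    return tuple(sorted(counts, reverse=True))

def demand_types(N, K):
    """All demand types: partitions of K into at most N parts, zero-padded to length N."""
    def parts(remaining, largest, slots):
        if remaining == 0:
            yield ()
        elif slots > 0:
            for p in range(min(remaining, largest), 0, -1):
                for rest in parts(remaining - p, p, slots - 1):
                    yield (p,) + rest
    return [t + (0,) * (N - len(t)) for t in parts(K, K, N)]

print(demand_types(2, 4))          # [(4, 0), (3, 1), (2, 2)]
for N in range(1, 7):
    print(N, len(demand_types(N, 4)))
```

A brute-force cross-check over all $N^K$ demands (as in the test below) confirms that every demand maps to one of the generated types and vice versa.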

Computational and Data-Driven Converse Hypotheses
Using the computational approach developed in [2] together with the problem symmetry, in this section, we first establish complete characterizations of the optimal memory-transmission-rate tradeoff for (N, K) = (3, 2) and (N, K) = (4, 2). Based on these results and the known result for (N, K) = (2, 2), we are able to form a hypothesis regarding the optimal tradeoff for the case K = 2. An analytical proof is then provided, which gives the complete characterization of the optimal tradeoff for (N, 2) caching systems. We then present a characterization of the optimal tradeoff for (N, K) = (2, 3) and an outer bound for (N, K) = (2, 4). These results also motivate a hypothesis on the optimal tradeoff for N = 2, which is subsequently proven analytically to yield a partial characterization. Note that since both M and R must be nonnegative, we do not explicitly state their non-negativity from here on.
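Our investigation thus starts with identifying the previously unknown optimal tradeoff for (N, K) = (3, 2) and (N, K) = (4, 2) using the computational approach outlined in Section 2, the results of which are summarized below as two propositions.

Proposition 3.

The optimal memory-transmission-rate tradeoff for the (N, K) = (3, 2) caching problem is given by:

$$3M + 3R \ge 6, \qquad M + 3R \ge 3.$$

Proposition 4.

The optimal memory-transmission-rate tradeoff for the (N, K) = (4, 2) caching problem is given by:

$$3M + 4R \ge 8, \qquad M + 4R \ge 4.$$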
The proofs for Propositions 3 and 4 can be found in Appendix B, and are given in the tabulation format mentioned earlier. Strictly speaking, these two results are specializations of Theorem 3, and there is no need to provide the proofs separately; however, we provide them to illustrate the computer-aided approach.
The optimal tradeoff for these cases is given in Figure 2. A few immediate observations are as follows:

•

For (N, K) = (3, 2) and (N, K) = (4, 2), there is only one non-trivial corner point on the optimal tradeoff, but for (N, K) = (2, 2), there are in fact two non-trivial corner points.

•
The cut-set bound is tight at the high memory regime in all the cases.

•
The single non-trivial corner point for (N, K) = (3, 2) and (N, K) = (4, 2) is achieved by the scheme proposed in [6]. For the (N, K) = (2, 2) case, one of the corner points is also achieved by this scheme, but the other corner point requires a different code.
Given the above observations, a natural hypothesis is as follows.

Hypothesis 1.
There is only one non-trivial corner point on the optimal tradeoff for (N, K) = (N, 2) caching systems when N ≥ 3, and it is (M, R) = (N/2, 1/2); equivalently, the two facets of the optimal tradeoff should be:

$$3M + NR \ge 2N, \qquad M + NR \ge N.$$

We are indeed able to confirm this hypothesis analytically, as stated formally in the following theorem.
Theorem 3. For any integer N ≥ 3, any memory-transmission-rate tradeoff pair for the (N, K) = (N, 2) caching problem must satisfy:

$$3M + NR \ge 2N, \qquad M + NR \ge N. \qquad (26)$$

Conversely, for any integer N ≥ 3, there exist codes for any nonnegative (M, R) pair satisfying (26). For (N, K) = (2, 2), the memory-transmission-rate tradeoff must satisfy:

$$2M + R \ge 2, \qquad 2M + 2R \ge 3, \qquad M + 2R \ge 2. \qquad (27)$$

Conversely, for (N, K) = (2, 2), there exist codes for any nonnegative (M, R) pair satisfying (27).

Since the solution for the special case (N, K) = (2, 2) was established in [6], we only need to consider the cases N ≥ 3. Moreover, for the converse direction, only the bound 3M + NR ≥ 2N needs to be proven, since the other one can be obtained from the cut-set bound in [6]. To prove the remaining inequality, the following auxiliary lemma is needed.
Using Lemma 1, we can prove the converse part of Theorem 3 through induction; the proofs of Theorem 3 and Lemma 1 can be found in Appendix C, and both rely heavily on the symmetry specified in the previous section. Although some clues can be found in the proof tables for the cases (N, K) = (3, 2) and (N, K) = (4, 2), such as the effective joint entropy terms in the converse proof each having only a small number of X random variables, finding the proof of Theorem 3 still required considerable human effort and was not completed directly through a computer program. One key observation simplifying the proof in this case is that, as the hypothesis states, the optimal corner point is achieved by the scheme given in [6], which is known only thanks to the computed bounds. In this specific case, the scheme reduces to splitting each file in half, and placing one half at the first user and the other half at the second user; the corresponding delivery strategy is also extremely simple. We combined this special structure and the clues from the proof tables to find the outer bounding steps.

Remark 1. The bounds developed in [12] can be used to establish the bound 3M + NR ≥ 2N when K = 2, but only for the cases when N is an integer multiple of three. For N = 4, the bounds developed in [12][13][14] give M + 2R ≥ 3 instead of 3M + 4R ≥ 8, and are thus loose in this case. After this bound was initially reported in [23], Yu et al. [24] discovered an alternative proof.
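As a quick consistency check of the two-user results, the Python sketch below (illustrative only; the function names are ours) intersects the facets 3M + NR ≥ 2N and M + NR ≥ N by Cramer's rule and compares the intersection with the t = 1 point of the scheme in [6] for K = 2. It also recovers the two non-trivial corner points of the (2, 2) facets 2M + R ≥ 2, 2M + 2R ≥ 3 and M + 2R ≥ 2. Exact rational arithmetic avoids floating-point ties.

```python
from fractions import Fraction

def intersect(l1, l2):
    """Intersection of a1*M + b1*R = c1 and a2*M + b2*R = c2 (Cramer's rule)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return (Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det))

def theorem3_corner(N):
    # Facets 3M + NR >= 2N and M + NR >= N (the latter is the s = 1 cut-set bound).
    return intersect((3, N, 2 * N), (1, N, N))

def man_k2_corner(N):
    """Point t = 1 of the scheme in [6] for K = 2:
    M = N/2, R = 2(1 - M/N) * min{1/(1 + 2M/N), N/2}."""
    M = Fraction(N, 2)
    return (M, 2 * (1 - M / N) * min(1 / (1 + 2 * M / N), Fraction(N, 2)))

for N in range(3, 8):
    print(N, theorem3_corner(N), man_k2_corner(N))

# Corner points of the (2, 2) region
print(intersect((2, 1, 2), (2, 2, 3)), intersect((2, 2, 3), (1, 2, 2)))
```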

A Partial Characterization for N = 2
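We first summarize the characterization of the optimal tradeoff for (N, K) = (2, 3) and the computed outer bound for (N, K) = (2, 4) in two propositions.

Proposition 5.

The optimal memory-transmission-rate tradeoff for the (N, K) = (2, 3) caching problem is given by:

$$2M + R \ge 2, \qquad 3M + 3R \ge 5, \qquad M + 2R \ge 2.$$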

Proposition 6.
The memory-transmission-rate tradeoff for the (N, K) = (2, 4) caching problem must satisfy:

For Proposition 5, the only new bound 3M + 3R ≥ 5 is a special case of the more general result of Theorem 4, and we thus do not provide its proof separately. For Proposition 6, only the second and the third inequalities need to be proven, since the fourth coincides with a bound in the (2, 3) case, the fifth is a special case of Theorem 4, and the others can be produced from the cut-set bounds. The proofs of these two inequalities are given in Appendix E. The optimal tradeoff for (N, K) = (2, 2), (2, 3) and the outer bound for (2, 4) are depicted in Figure 3. A few immediate observations and comments are as follows:

•

There are two non-trivial corner points on the outer bounds for (N, K) = (2, 2) and (N, K) = (2, 3), and there are five non-trivial corner points for (N, K) = (2, 4).

Remark 2. The bounds developed in
From the above observations, we can hypothesize that for N = 2, the number of corner points will continue to increase as K increases beyond four, and that in the high memory regime, the scheme in [6] is optimal. More precisely, we can establish the following theorem.

Theorem 4. When K ≥ 3 and N = 2, any (M, R) pair must satisfy:

$$K(K+1)M + 2K(K-1)R \ge 2(K-1)(K+2). \qquad (31)$$

As a consequence, the uncoded-placement-coded-transmission scheme in [6] (with space-sharing) is optimal when $M \ge 2(K-2)/K$, for the cases with K ≥ 4 and N = 2.
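The geometry behind Theorem 4 can be checked numerically. The Python sketch below (illustrative only; names are ours) takes the points of the scheme in [6] for N = 2 at t = K - 1 and t = K - 2, computes the line through them, and confirms that it is proportional to K(K+1)M + 2K(K-1)R = 2(K-1)(K+2); for K = 3 this is 3M + 3R = 5, the new bound in Proposition 5. It also checks that the first point lies on the cut-set line M + 2R = 2.

```python
from fractions import Fraction

def man_point(N, K, t):
    """Point t of the scheme in [6]: M = tN/K, R = (K - t) * min{1/(1 + t), N/K}."""
    return (Fraction(t * N, K), (K - t) * min(Fraction(1, 1 + t), Fraction(N, K)))

def line_through(p, q):
    """(a, b, c) with a*M + b*R = c through p and q, oriented so that a > 0."""
    a, b = p[1] - q[1], q[0] - p[0]
    c = a * p[0] + b * p[1]
    return (a, b, c) if a > 0 else (-a, -b, -c)

def proportional(u, v):
    """True if the coefficient triples u and v are scalar multiples of each other."""
    return all(u[i] * v[j] == u[j] * v[i] for i in range(3) for j in range(3))

for K in range(3, 9):
    p, q = man_point(2, K, K - 1), man_point(2, K, K - 2)
    target = (K * (K + 1), 2 * K * (K - 1), 2 * (K - 1) * (K + 2))
    print(K, proportional(line_through(p, q), target), p[0] + 2 * p[1] == 2)
```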
The first line segment in the high memory regime is M + 2R ≥ 2, which is given by the cut-set bound; its intersection with (31) is indeed the first point in:

$$\left(\frac{2(K-1)}{K}, \frac{1}{K}\right), \qquad \left(\frac{2(K-2)}{K}, \frac{2}{K-1}\right).$$

The proof of this theorem now boils down to the proof of the bound (31). This requires a sophisticated induction, the gist of which is summarized in the following lemma. The symmetry of the problem is again heavily utilized throughout the proof of this lemma. For notational simplicity, we use $X_{\to j}$ to denote $X_{1,1,\ldots,1,2,1,\ldots,1}$, i.e., the transmission when the j-th user requests the second file and all the other users request the first file; we also write a collection of such variables $(X_{\to j}, X_{\to j+1}, \ldots, X_{\to k})$ as $X_{\to[j:k]}$.

Lemma 2.
For N = 2 and K ≥ 3, the following inequality holds for k ∈ {2, 3, . . . , K − 1}: where we have taken the convention H(Z 1 , The proof of Lemma 2 is given in Appendix F. Theorem 4 can now be proven straightforwardly.
Proof of Theorem 4. We first write the following simple inequalities: Now, applying Lemma 2 with k = 2 gives: Observe that: where in the first inequality, the file index symmetry $H(W_1, Z_1) = H(W_2, Z_1)$ has been used. We can now continue to write: which has a common term $H(Z_1)$ on both sides with different coefficients. Removing the common term and multiplying both sides by two leads to: where the equality relies on the assumption that $W_1$ and $W_2$ are independent files of unit size. Taking into consideration the memory and transmission rate constraints (4) and (5) now completes the proof.
Lemma 2 provides a way to reduce the number of X variables in $H(Z_1, X_{\to[2:k]})$, and is thus the core of the proof. Even with the hypothesis that the scheme in [6] is optimal, deriving the outer bound (particularly the coefficients in the lemma above) directly from this insight is far from straightforward. Some of the guidance in finding our derivation was in fact obtained through a strategic computational exploration of the outer bounds. This guidance is helpful because computer-generated proofs are not unique, and some of them can appear quite arbitrary, whereas deducing general rules requires a more structured proof. In Section 6, we present this new exploration method in more detail and discuss how such insights can be actively identified in this particular case.

Reverse-Engineering Code Constructions
In the previous section, outer bounds on the optimal tradeoff were presented for the case (N, K) = (2, 4), as shown in Figure 3. Observe that the corner points: cannot be achieved by existing codes in the literature. The former point can indeed be achieved with a new code construction. This construction was first presented in [20], where it was generalized more systematically to yield a new class of codes for any N ≤ K, whose proof and analysis are more involved. In this paper, we focus on how a specific code for this corner point was found through a reverse-engineering approach, which should help dispel the mystery of this seemingly arbitrary code construction.
The Code to Achieve (M, R) = (2/3, 1) for (N, K) = (2, 4)
The two files are denoted as A and B. Each file is partitioned into six segments of equal size, denoted as $A_i$ and $B_i$, respectively, i = 1, 2, …, 6. Since memory and transmission are measured in multiples of the file size, the corner point (2/3, 1) means that each user stores 2/3 × 6 = 4 segments and the transmission uses six segments. The contents of each user's cache are given in Table 2. By the symmetry of the cached contents, we only need to consider two demands. The first is (A, A, A, B), in which the first three users request A and User 4 requests B. The second is (A, A, B, B), in which the first two users request A and the other two request B.
Table 2. Caching content for (N, K) = (2, 4).
Assume the file segments are in $\mathbb{F}_5$ for concreteness.

• For the demand (A, A, A, B), the transmission is as follows. Step 1: $B_1, B_2, B_4$; Step 2: $A_3 + 2A_5 + 3A_6$, $A_3 + 3A_5 + 4A_6$; Step 3: After Step 1, User 1 can recover $(A_1, A_2)$; furthermore, he/she has $(A_3 + B_3,$ Together with the transmission in Step 2, User 4 has three linearly independent combinations of $(A_3, A_5, A_6)$. After recovering them, he/she can remove these interferences from the cached content to obtain $(B_3, B_5, B_6)$.
• For the demand (A, A, B, B), we can send: Step 1: $B_1, A_6$; Step 2: User 1 has $A_1, B_1, A_6$ after Step 1, and he/she can also form: and together with $B_2 + 2B_3$ in the transmission of Step 2, he/she can recover $(B_2, B_3)$, and thus $A_2, A_3$. He/she still needs $(A_4, A_5)$. These can be recovered directly from the transmission $(A_2 + 2A_4, A_3 + 2A_5)$, since he/she already has $(A_2, A_3)$. Other users can use a similar strategy to decode their requested files.
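The decoding steps above rest on two elementary operations over $\mathbb{F}_5$. The first checks that a set of coded combinations is linearly independent. The second solves a combination once the other symbols in it are known. A minimal sketch follows, and the helper names are hypothetical. The Step 3 transmission is not reproduced in the text, so the third row used in the rank check is a hypothetical placeholder, not the actual code.

```python
P = 5  # field size: F_5, as assumed in the text


def rank_mod_p(rows, p=P):
    """Rank of a list of coefficient vectors over F_p (Gauss-Jordan elimination)."""
    m = [[x % p for x in row] for row in rows]
    rank = 0
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)             # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def solve_for_unknown(y, known, coeff, p=P):
    """Given y = known + coeff * x over F_p, return x."""
    return ((y - known) * pow(coeff, -1, p)) % p


# Step 2 for demand (A, A, A, B): coefficients on (A_3, A_5, A_6).
step2 = [[1, 2, 3], [1, 3, 4]]
print(rank_mod_p(step2))
hypothetical_third = [1, 0, 0]                  # placeholder, NOT the actual Step 3 symbol
print(rank_mod_p(step2 + [hypothetical_third]))

# Demand (A, A, B, B): User 1 knows A_2 and receives A_2 + 2*A_4.
a2, a4 = 3, 4                                   # arbitrary example values in F_5
received = (a2 + 2 * a4) % P
print(solve_for_unknown(received, a2, 2))
```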

Extracting Information for Reverse-Engineering
It is clear at this point that, for (N, K) = (2, 4), the code achieving this optimal corner point is not straightforward. Next, we discuss a general approach to deduce the code structure from the LP solution, which led to the discovery of the code in our work. The approach is based on two assumptions: the outer bound is achievable (i.e., tight), and there is a (vector) linear code that can achieve this performance.
Either of the two assumptions above may fail in general, in which case our attempt will not succeed. Nevertheless, although linear codes are known to be insufficient for some network coding problems [33], existing results in the literature suggest that vector linear codes are surprisingly versatile and powerful. Similarly, Shannon-type inequalities are the basis of the outer bound computation. Although they are known to be insufficient to characterize the rate regions of all coding problems [34,35], they are surprisingly powerful, particularly in coding problems with strong symmetry structures [36,37].
There are essentially two types of information that we can extract from the primal LP and the dual LP:
• From the effective information inequalities: since we can produce a readable proof using the dual LP, if a code can achieve this corner point, then the information inequalities in the proof must hold with equality for the joint entropy values induced by this code. This reveals a set of conditional independence relations among the random variables induced by this code;
• From the extremal joint entropy values at the corner points: although we are only interested in the tradeoff between the memory and the transmission rate, the LP solution provides the whole set of joint entropy values at an extreme point. These values can reveal a set of dependence relations among the random variables induced by any code that can achieve this point.
Though the first type of information is important, translating it into code constructions appears difficult. On the other hand, the second type of information appears more suitable for the purpose of code design, and we adopt it next.
One issue that complicates our task is that the entropy values so extracted are not always unique, and they sometimes have considerable slack. For example, for different LP solutions at the same operating point (M, R) = (2/3, 1), the joint entropy $H(Z_1, Z_2)$ can vary between 1 and 4/3. We can identify such slack in any joint entropy in the corner point solutions by considering a regularized primal LP. For a fixed rate value R at the corner point in question as an upper bound, the objective function can be set as: instead of: subject to the same original symmetric LP constraints at the target M. By choosing a small positive γ value, e.g., γ = 0.0001, we can find the minimum value of $H(Z_1, Z_2)$ at the same (M, R) point. Similarly, by choosing a small negative γ value, we can find the maximum value of $H(Z_1, Z_2)$ at the same (M, R) point. Such slack in the solution adds uncertainty to the codes we seek, and it may indeed imply the existence of multiple code constructions. For the purpose of reverse-engineering the codes, we focus on the joint entropies that do not have any slack, i.e., the "stable" joint entropies in the solution.
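The effect of the small γ term can be seen on a two-variable toy problem. In the sketch below, m plays the role of the memory objective and h plays the role of a joint entropy such as $H(Z_1, Z_2)$. The constraints are invented so that the optimal m is 2/3 while h has slack in [1, 4/3], echoing the example above. Because the toy has only two variables, it is solved exactly by enumerating vertices rather than with an LP solver. The regularized objective m + γh is an assumption for illustration only. It is not a reproduction of the objective in the text, which is not displayed there.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical toy constraints, each written as a*m + b*h >= c.
CONSTRAINTS = [
    (1, 0, F(2, 3)),  # m >= 2/3  (the target memory value)
    (0, 1, F(1)),     # h >= 1
    (2, -1, F(0)),    # h <= 2m
]


def vertices(cons):
    """All feasible intersections of pairs of constraint boundaries (Cramer's rule)."""
    pts = []
    for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never meet
        m = F(c1 * b2 - c2 * b1) / det
        h = F(a1 * c2 - a2 * c1) / det
        if all(a * m + b * h >= c for a, b, c in cons):
            pts.append((m, h))
    return pts


def solve(gamma, cons=CONSTRAINTS):
    """Minimize m + gamma*h. For a bounded LP over a pointed polyhedron the
    optimum is attained at a vertex, so scanning the vertices is exact."""
    return min(vertices(cons), key=lambda v: v[0] + gamma * v[1])


gamma = F(1, 10000)
print(solve(gamma))    # smallest h compatible with the optimal m
print(solve(-gamma))   # largest h compatible with the optimal m
```

Using exact fractions rather than floats avoids the machine-precision issue discussed later for the (N, K) = (3, 3) argument.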

Reverse-Engineering the Code for (N, K) = (2, 4)
With the method outlined above, we identify the stable joint entropy values listed in Table 3 for the (N, K) = (2, 4) case at the operating point (M, R) = (2/3, 1). The values are normalized by multiplying everything by six. For simplicity, let us assume that each file has six units of information, written as $W_1 = (A_1, A_2, \ldots, A_6) \triangleq A$ and $W_2 = (B_1, B_2, \ldots, B_6) \triangleq B$, respectively. This is a rich set of data, and a few immediate observations are given next.

• The quantities can be categorized into three groups. The first group contains quantities without any transmission. The second contains quantities involving the transmission that fulfills demand type (3, 1), and the last contains quantities for demand type (2, 2).
• The values indicate that, for each of the two files, each user should have three units in his/her cache. Any two users together should have five units, and any three users together should have all six units. This strongly suggests placing each piece $A_i$ (and $B_i$) at two users. Each $Z_i$ has only four units but needs to hold three units from each of the two files, so coded placement across files is needed. At this point, we place the corresponding symbols in the caches but leave the precise linear combination coefficients undetermined.

• The next critical observation is that $H(X_{1,2,2,2} | W_1) = H(X_{1,1,1,2} | W_1) = H(X_{1,1,2,2} | W_1) = 3$. This implies that, once $W_1$ is known, the transmission carries only three units of information (about $W_2$). By the file-index symmetry, the same holds when conditioning on $W_2$. However, the operating point dictates that $H(X_{1,2,2,2}) = H(X_{1,1,1,2}) = H(X_{1,1,2,2}) = 6$. It follows that in each transmission, three units are for the linear combinations of $W_2$ and three units are for those of $W_1$. In other words, the linear combinations do not need to mix information from different files.

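• Each transmission has only three units of information from each file, and each user has only three units of information from each file in the cache. Therefore the transmitted and cached parts of a file must be linearly independent of each other, so that a user requesting that file can collect all six units.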
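The counting in the second item can be checked directly. Assume that each segment is placed at exactly two users. Then "any two users hold five units" forces every pair of users to share exactly one segment. With six segments and six user pairs, this means segment i is cached by the i-th pair of users. The sketch uses this labeling, which is an assumption consistent with the counts rather than a copy of Table 2. It confirms that each user holds three segments of a file, any two users hold five, and any three hold all six. It also confirms that three segments of each file exceed the four cached units per user.

```python
from itertools import combinations

USERS = range(1, 5)  # K = 4 users
# Hypothetical labeling: segment i of file A is cached by the i-th pair of users.
PLACEMENT = {i: set(pair) for i, pair in enumerate(combinations(USERS, 2), start=1)}


def segments_held(users):
    """Segments of one file available in the union of the given users' caches."""
    return {i for i, holders in PLACEMENT.items() if holders & set(users)}


for size in (1, 2, 3):
    counts = {len(segments_held(s)) for s in combinations(USERS, size)}
    print(f"any {size} user(s): {counts} segment(s) of A")

per_user = len(segments_held([1]))
# Three segments of A plus three of B exceed the four cached units per user,
# so segments of A and B have to be mixed (coded placement).
print(2 * per_user, "> 4:", 2 * per_user > 4)
```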
The observations and deductions above are based only on the joint entropies given in Table 3, without much consideration of the particular coding requirements. Consider the last item discussed above. When transmitting the three units of information regarding a file (say file $W_2$), they should be simultaneously useful to the users requesting this file and to the users not requesting it. This intuition strongly suggests that each transmitted linear combination of $W_2$ should lie in the span of the $W_2$ parts cached at some users not requesting it. Using these intuitions as guidance, finding the code becomes a matter of straightforward trial and error. In [20], we were able to further generalize this special code to a class of codes for any case with N ≤ K; readers are referred to [20] for more details on these codes.
Table 3. Stable joint entropy values at the corner point (M, R) = (2/3, 1) for (N, K) = (2, 4).

Disproving Linear-Coding Achievability
The reverse-engineering approach may not always succeed. Either the structure revealed by the data is very difficult to construct explicitly, or linear codes are not sufficient to achieve the operating point. In some cases of the latter kind, this insufficiency can be established explicitly. In the sequel, we present such an example for (N, K) = (3, 3). An outer bound for (N, K) = (3, 3) is presented in the next section. Among its corner points, the pair (M, R) = (2/3, 4/3) is the only one that cannot be achieved by existing schemes. Since the outer bound appears quite strong, we may conjecture this pair to be achievable as well and attempt to construct a code. Unfortunately, as we shall show next, no such (vector) linear code exists. Before delving into the data provided by the LP, readers are encouraged to try proving directly that this tradeoff point cannot be achieved by linear codes. To the author, this does not appear to be straightforward.
We shall assume each file has 3m symbols in a certain finite field, where m is a positive integer. At (M, R) = (2/3, 4/3), each user thus caches 2m symbols, and the transmission consists of 4m symbols. The LP produces the joint entropy values in Table 4 at this corner point, measured in finite field symbols rather than in multiples of the file size as in the other sections of the paper. Only the joint and conditional entropies relevant to the discussion below are listed. The main idea is to use these joint entropy values to deduce structures of the coding matrices, and then to combine these structures with the coding requirements to reach a contradiction.
Table 4. Joint entropy values at (M, R) = (2/3, 4/3) for (N, K) = (3, 3).

The first critical observation is that $H(Z_1 | W_1, W_2) = m$, and the user-index symmetry implies that $H(Z_2 | W_1, W_2) = H(Z_3 | W_1, W_2) = m$. Moreover, $H(Z_1, Z_2, Z_3 | W_1, W_2) = 3m$. From this we can conclude that, after excluding files $W_1$ and $W_2$, each user stores m linearly independent combinations of the symbols of file $W_3$, and these are also linearly independent across the three users. Similar conclusions hold for files $W_1$ and $W_2$. Thus, without loss of generality, we can view the linear combinations of $W_i$ cached by the users, after excluding the symbols from the other two files, as a basis of file $W_i$. In other words, through a change of basis for each file, we can assume without loss of generality that user k stores 2m linear combinations in the following form: where $W_{n,j}$ is the j-th symbol of the n-th file and $V_k$ is a matrix of dimension $2m \times 3m$. $V_k$ can be partitioned into submatrices of dimension $m \times m$, which are denoted as $V_{k;i,j}$, i = 1, 2 and j = 1, 2, 3.
Note that, without loss of generality, the symbols cached at different users can be taken to be orthogonal to each other. Also without loss of generality, assume the transmitted content $X_{1,2,3}$ is: where $G$ is a matrix of dimension $4m \times 9m$. We partition it into blocks of size $m \times m$ and refer to each block as $G_{i,j}$, $i = 1, 2, \ldots, 4$ and $j = 1, 2, \ldots, 9$. Let us first consider User 1, who has the following symbols: The coding requirement states that $X_{1,2,3}$ and $Z_1$ together can be used to recover file $W_1$. Thus, one can recover all the symbols of $W_1$ knowing (45). Since $W_1$ can be recovered, its symbols can be eliminated from (45), which leaves the system (46). Table 4 specifies $H(Z_1 | W_1) = 2m$. Thus the coefficient matrix of $Z_1$ restricted to the symbols of $W_2$ and $W_3$ is in fact full rank, and from the top part of (46), $W_{2,[1:m]}$ and $W_{3,[1:m]}$ can be recovered. In summary, through elementary row operations and column permutations, the matrix in (45) can be converted into the following form: where the diagonal square blocks have full rank $3m$ and $2m$, respectively, and the $U_{i,j}$'s are the resulting block matrices after the row operations and column permutations. This further implies that the matrix $[U_{6,5}, U_{6,6}, U_{6,8}, U_{6,9}]$ has rank at most $m$. It follows that the submatrix of $G$ formed by the block columns (5, 6, 8, 9) has rank at most $m$. By the symmetry, the submatrices of $G$ formed by the block columns (1, 3, 7, 9) and by the block columns (1, 2, 4, 5) also each have rank at most $m$. These three sets of block columns together cover all nine block columns. The rank of $G$ is therefore at most $3m$, which contradicts the condition $H(X_{1,2,3}) = 4m$ in Table 4.
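The final counting step can be written as a single chain. Here $G_S$ denotes the submatrix of $G$ formed by the block columns in $S$, notation introduced only for this display. The argument uses the fact that, for a linear code with independent and uniformly distributed file symbols, the entropy of the transmission measured in symbols equals the rank of its coding matrix. The three column sets cover all nine block columns, so the column space of $G$ lies in the sum of the three column spaces:

```latex
S_1 = \{5,6,8,9\},\; S_2 = \{1,3,7,9\},\; S_3 = \{1,2,4,5\},\qquad
S_1 \cup S_2 \cup S_3 = \{1,2,\ldots,9\}
\;\Longrightarrow\;
H(X_{1,2,3}) = \operatorname{rank}(G)
  \le \sum_{\ell=1}^{3} \operatorname{rank}(G_{S_\ell})
  \le 3m < 4m .
```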
We can now conclude that this memory-transmission-rate pair is not achievable with any linear code. Strictly speaking, our argument above assumes that the joint entropy values produced by the LP are exact rational values, ignoring machine precision. If the solution is accurate only up to machine precision, one can introduce a small slack δ into the quantities, e.g., replacing 3m with (3 ± δ)m. A similar argument then shows that the same conclusion holds. This extended argument, however, is notationally lengthy, and we omit it here for simplicity.

Computational Exploration and Bounds for Larger Cases
In this section, we explore the fundamental limits of the caching systems in more detail using a computational approach. Due to the (doubly) exponential growth of the LP variables and constraints, directly applying the method outlined in Section 2 becomes infeasible for larger problem cases. This is the initial motivation for us to investigate single-demand-type systems where only a single demand type is allowed. Any outer bound on the tradeoff of such a system is an outer bound for the original one, and the intersection of these outer bounds is thus also an outer bound. This investigation further reveals several hidden phenomena. For example, outer bounds for different single-demand-type systems are stronger in different regimes, and moreover, the LP bound for the original system is not simply the intersection of all outer bounds for single-demand-type systems; however, in certain regimes, they do match.
Given the observations above, we take the investigation one step further by choosing only a small subset of demands instead of the complete set in a single demand type. This allows us to obtain results for cases that initially appear impossible to compute. For example, even for (N, K) = (2, 5), there are a total of 2 + 5 + 2^5 = 39 random variables. The number of constraints in the LP after symmetry reduction is more than 10^11, which is well beyond current LP solver capability. The problem can be further reduced using the problem-specific implication structures outlined in Section 2, but our experience suggests that even with such additional reduction, it may still be too large for a state-of-the-art LP solver. However, by strategically considering only a small subset of the demand patterns, we are indeed able to find meaningful outer bounds. Moreover, we can use the clues obtained in such computational exploration to complete the proof of Theorem 4. We shall discuss the method we develop and also present several example results for larger problem cases.
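The scale quoted above can be reproduced with a rough count that rests on three assumptions. First, the random variables are the N files, the K caches, and one transmission per demand vector, giving N^K transmissions. Second, the LP constraints are Yeung's elemental inequalities, of which there are n + C(n, 2)·2^(n−2) for n random variables. Third, the symmetry group consists of the N!·K! file and user permutations, so each orbit of constraints has at most N!·K! members. Under these assumptions, the number of constraint orbits is at least the total divided by N!·K!. The helper name is hypothetical.

```python
from math import comb, factorial


def lp_scale(N, K):
    """Rough size of the caching LP before and after symmetry reduction."""
    n = N + K + N ** K                            # files + caches + one X per demand
    joint_entropies = 2 ** n - 1
    elemental = n + comb(n, 2) * 2 ** (n - 2)     # Yeung's elemental inequalities
    group_order = factorial(N) * factorial(K)     # assumed: file and user permutations
    min_orbits = -(-elemental // group_order)     # ceiling division
    return n, joint_entropies, elemental, min_orbits


for N, K in [(2, 4), (3, 3), (2, 5)]:
    n, h, e, o = lp_scale(N, K)
    print(f"(N,K)=({N},{K}): n={n}, joint entropies={h:.2e}, "
          f"elemental={e:.2e}, orbits >= {o:.2e}")
```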

Single-Demand-Type Systems
As mentioned above, in a single-demand-type caching system, the demand must belong to a particular demand type. We first present results for two cases, (N, K) = (2, 4) and (N, K) = (3, 3), and then discuss our observations.
The optimal (M, R) tradeoffs are illustrated in Figure 4. The figure also shows the known inner bounds, i.e., those in [6,15] and the one given in the last section, and the computed outer bound of the original problem given in Section 4. Here, demand type (3, 1) in fact provides the tightest outer bound, which matches the known inner bound for M ∈ [0, 1/4] ∪ [2/3, 2]. The converse proofs of (51) and (52) are obtained computationally, and the details can be found in Appendix G. In fact, only the middle three inequalities in (51) and the second inequality in (52) need to be proven, since the others follow from the cut-set bound. The original caching problem requires codes that can handle all types of demands. Even so, the optimal codes for single-demand-type systems turn out to be quite interesting in their own right, and we therefore provide the forward proof of Theorem 7 in Appendix H. The computed outer bounds for single-demand-type systems for (N, K) = (3, 3) are summarized below; the proofs can be found in Appendix I.

Proposition 8.
Any memory-transmission-rate tradeoff pair for the (N, K) = (3, 3) caching problem must satisfy the following conditions for single-demand-type (3, 0, 0): and conversely any non-negative (M, R) pair satisfying (53) is achievable for single-demand-type (3, 0, 0); it must satisfy for single-demand-type (2, 1, 0): and conversely any non-negative (M, R) pair satisfying (54) is achievable for single-demand-type (2, 1, 0); it must satisfy for single-demand-type (1, 1, 1): These outer bounds are illustrated in Figure 5, together with the best known inner bound by combining [6,15], and the cut-set outer bound for reference. The bound is in fact tight for M ∈ [0, 1/3] ∪ [1,3]. Readers may notice that Proposition 8 provides complete characterizations for the first two demand types, but not the last demand type. As we have shown in Section 5, the point ( 2 3 , 4 3 ) in fact cannot be achieved using linear codes.
We can make the following observations immediately:
• Single-demand-type systems in which fewer distinct files are requested usually produce tighter bounds in the high-memory regime. Those in which more distinct files are requested usually produce tighter bounds in the low-memory regime. For example, the first high-memory segment of the bounds can be obtained by considering only demands that request a single file, and this segment coincides with the cut-set bound. For (N, K) = (3, 3), the bound obtained from demand type (2, 1, 0) is stronger than that from (1, 1, 1) in the range M ∈ [1, 2].
• Simply intersecting the single-demand-type outer bounds does not produce the same bound as that obtained from a system with the complete set of demands. This can be seen from the case (N, K) = (2, 4) in the range M ∈ [1/4, 2/3].
• The outer bounds produced by single-demand-type systems in many cases match the bound obtained when more comprehensive sets of demands are considered. This is particularly evident in the case
These observations provide further insight into the difficulty of the problem. For instance, for (N, K) = (2, 4), demand type (3, 1) is the most demanding case, and code design for this demand type should be considered the main challenge. More importantly, these observations suggest that it is possible to obtain very strong bounds by considering only a small subset of demands instead of the complete set. In the sequel, we further explore this direction.

Equivalent Bounds Using Subsets of Demands
Based on the observations in the previous subsection, we conjecture that in some cases, equivalent bounds can be obtained using only a smaller number of demands. Moreover, we conjecture that these demands do not need to form a complete demand type. Next, we show that this is indeed the case.
To be more precise, we relax the LP by including only the elemental inequality constraints that involve joint entropies of random variables within a subset of the random variables $W \cup Z \cup X$. All other constraints are removed. However, the symmetry structure specified in Section 3 is still used to reduce the problem. This approach is not equivalent to forming the LP for a caching system in which only those files, users, and demands are present. In that alternative setting, restricting to symmetric solutions may incur a loss of optimality.
There are many choices of subsets with which outer bounds can be computed. We provide only a few of the more relevant ones, which confirm our conjecture: These observations reveal that the subset of demands can be chosen to be rather small while still producing strong bounds. For example, for the (N, K) = (2, 4) case, including only joint entropies involving the eight random variables $W \cup Z \cup \{X_{1,1,1,2}, X_{1,1,2,2}\}$ produces the same bound as including all 22 random variables. Moreover, for specific regimes, the same bound can be produced using an even smaller number of random variables (for the case (N, K) = (3, 3)). It can also be produced with a more specific set of random variables (for the case (N, K) = (2, 4), where in the range [1/3, 2], including only some of the demands of type (3, 1) is sufficient). Equipped with these insights, we can attempt to tackle larger problem cases, for which producing computationally meaningful outer bounds would otherwise have appeared impossible. In the sequel, this approach is applied for two purposes: (1) to identify generic structures in converse proofs, and (2) to produce outer bounds for large problem cases.

Identifying Generic Structures in Converse Proofs
Recall our comment after the proof of Theorem 4 that finding this proof is not straightforward. One critical clue was obtained by applying the exploration approach discussed above. Restricting the set of included random variables to a smaller set relaxes the overall problem. If the outer bound thus obtained remains the same, then a proof of this bound exists that relies only on the joint entropies within this restricted set. For the specific case of (N, K) = (2, 5), we have the following fact. Together with the second item in Fact 1, this leads naturally to a conjecture about the hypothesized outer bound. To prove it, only the dependence structure within the set of random variables $W \cup Z \cup X_{\to[1:K]}$ needs to be considered, and all the proof steps can be written using mutual information or joint entropies of these variables alone. Although this is still not a trivial task, the search space is significantly reduced. For the (N, K) = (2, 5) case, it shrinks to only 12 random variables with a much simpler structure, compared with 39 random variables in the original problem. Perhaps more importantly, such a restriction makes it feasible to identify a common route of derivation in the converse proof and then generalize it, from which we obtain the proof of Theorem 4.
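The restricted set can be written down explicitly. The sketch below builds $W \cup Z \cup X_{\to[1:K]}$ for N = 2, encoding each $X_{\to j}$ by its demand vector: user j requests file 2 and everyone else requests file 1, as defined before Lemma 2. It then compares the number of joint entropies and elemental inequalities with those of the full set of N + K + N^K variables. The variable labels and helper names are hypothetical, and the counts ignore symmetry reduction.

```python
from itertools import product
from math import comb


def elemental_count(n):
    """Number of Yeung's elemental inequalities on n random variables."""
    return n + comb(n, 2) * 2 ** (n - 2)


def restricted_set(K):
    """Labels of W ∪ Z ∪ X_{->[1:K]} for N = 2 files."""
    files = ["W1", "W2"]
    caches = [f"Z{k}" for k in range(1, K + 1)]
    x_vars = []
    for j in range(1, K + 1):
        demand = tuple(2 if k == j else 1 for k in range(1, K + 1))
        x_vars.append("X" + "".join(map(str, demand)))
    return files + caches + x_vars


def full_set(K, N=2):
    """Labels of all files, caches, and one transmission per demand vector."""
    demands = product(range(1, N + 1), repeat=K)
    return ([f"W{n}" for n in range(1, N + 1)] + [f"Z{k}" for k in range(1, K + 1)]
            + ["X" + "".join(map(str, d)) for d in demands])


K = 5
sub, full = restricted_set(K), full_set(K)
assert set(sub) <= set(full)
for name, s in [("restricted", sub), ("full", full)]:
    print(f"{name}: {len(s)} variables, {2 ** len(s) - 1} joint entropies, "
          f"{elemental_count(len(s))} elemental inequalities")
```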

Computing Bounds for Larger Problem Cases
We now present a few outer bounds for larger problem cases, and make comparison with other known bounds in the literature. This is not intended to be a complete list of results we obtain, but these are perhaps the most informative.
In Figure 6, we provide results for (N, K) = (4, 3), (N, K) = (5, 3), and (N, K) = (6, 3). Included are the computed outer bounds, the inner bound by the scheme in [6], the cut-set outer bounds, and, for reference, the outer bounds given in [12]. We omit the bounds in [13,14] to avoid clutter in the plot; for these cases, they do not improve on the bounds in [12]. The computed bounds are in fact tight in the range M ∈ [4/3, 4] for (N, K) = (4, 3) and M ∈ [5/3, 5] for (N, K) = (5, 3), and tight in general for (N, K) = (6, 3). In these ranges, the scheme given in [6] is in fact optimal. The outer bound in [12] does not provide additional tight results beyond those already determined using the cut-set bound, except for the single point (M, R) = (2, 1) for (N, K) = (6, 3). Our computed bounds, in contrast, do.
Figure 6. … [6]; the thin blue lines are the outer bounds given in [12]. Only nontrivial outer bound corner points that match inner bounds are explicitly labeled.
In Figure 7, we provide results for (N, K) = (3, 4), (N, K) = (3, 5), and (N, K) = (3, 6). Included are the computed outer bounds, the inner bounds by the code in [6] and that in [20], the cut-set outer bound, and, for reference, the outer bounds in [12]. The bounds in [13,14] are again omitted. It can be seen that the computed bounds are in fact tight in the range M ∈ [0, 1/4] 3] for (N, K) = (3, 5), and M ∈ [0, 1/6] ∪ [3/2, 3] for (N, K) = (3, 6). For these cases, the scheme given in [6] is optimal in the high-memory regime, and the schemes in [15,20] are optimal in the low-memory regime. The outer bound in [12] does not provide additional tight results beyond those already determined using the cut-set bound. The bounds given above provide grounds and directions for further investigation and hypotheses on the optimal tradeoff, which we are currently exploring.
Figure 7. … [6,20]; the thin blue lines are the outer bounds given in [12]. Only nontrivial outer bound corner points that match inner bounds are explicitly labeled.

Conclusions
We presented a computer-aided investigation of the fundamental limits of the caching problem, with three main components. The first is data-driven hypothesis forming, which leads to several complete or partial characterizations of the memory-transmission-rate tradeoff. The second is a new code construction reverse-engineered from the computed outer bounding data. The third is a computerized exploration approach that can reveal hidden structures in the problem and also enables us to find surprisingly strong outer bounds for larger problem cases.
It is our belief that this work provides strong evidence of the effectiveness of the computer-aided approach in investigating the fundamental limits of communication, data storage, and data management systems. At first sight, the exponential growth of the LP would seem to rule out meaningful results on engineering problems of interest, but our experience in [2,3] and the current work suggest otherwise. By incorporating the structure of the problem, we developed more domain-specific tools for such investigations. With these tools we were able to obtain results that appear difficult for human experts to obtain directly.
Our effort can be viewed as both data-driven and computational, and thus, more advanced data analysis and machine learning techniques may prove useful. In particular, the computer-aided exploration approach is clearly a human-in-the-loop process, which could benefit from more automation based on reinforcement learning techniques. Moreover, the computer-generated proofs may involve a large number of inequalities and joint entropies. More efficient classification or clustering of these inequalities and joint entropies could reduce the human burden in the subsequent analysis. It is our hope that this work can serve as a starting point for introducing more machine intelligence and the corresponding computer-aided tools into information theory and communication research in the future.
Funding: This research was funded in part by the National Science Foundation under Grants CCF-15-26095 and CCF-18-32309.

Acknowledgments:
The author wishes to thank Urs Niesen and Vaneet Aggarwal for early discussions, which partly motivated this work. He also wishes to thank Jun Chen for several discussions, as well as the insightful comments on an early draft. Additionally, the author wishes to thank the authors of [12] for making the source code to compute their proposed outer bounds available online, which was conveniently used to generate some of the comparisons.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Finding Corner Points of the LP Outer Bounds
Since this is an LP problem, and also due to the problem setting, only the lower hull of the outer bound region in the (M, R) plane is of interest. In this specific setting, the general algorithm in [27] is equivalent to the procedure given in Algorithm 1. In this algorithm, the input set P contains the initial extreme points of the tradeoff region, which are trivially known from the problem setting. The variables and constraints in the LP are given as outlined in Section 2 for a fixed (N, K) pair; they are populated once and considered fixed. The output set P contains the final computed extreme points of the outer bound. The algorithm can be explained intuitively as follows. Start with two known extreme points. Any other corner points between them must lie below the line segment connecting these two points. Thus, an LP that minimizes the linear objective normal to this segment must find a value lower than the one attained on the segment. If so, the new point is also an extreme point, and we can repeat the procedure on the two new segments.
In the caching problem, the tradeoff is between two quantities, M and R. We note that if more than two quantities need to be considered in the tradeoff, the algorithm is more involved, and we refer the readers to [27,28] for more details on such settings. The proof of Proposition 3 is given in Tables A1 and A2, and that of Proposition 4 in Tables A3 and A4. Each row in Tables A2 and A4, except the last row, is a simple and known information inequality, up to the symmetry defined in Section 3. The last rows in Tables A2 and A4 are the sums of all previous rows. They are the sought-after inequalities, obtained simply by summing the known inequalities. When represented in this form, the correctness of the proof is immediate, since the columns representing quantities not present in the final bound cancel each other out when summed. The rows in Table A2 are labeled, and the table includes more detail, in order to illustrate the meaning and usage of the tabulation proof in the example we provide next.
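The cancellation that makes a tabulation proof self-checking is easy to mechanize. Store each row as a map from joint-entropy terms to coefficients, add the rows, and drop the zero entries. The two rows below are a hypothetical toy, consisting of a mutual-information inequality and a conditional-entropy inequality, and they are not rows of Table A2. The term labels are plain strings chosen for illustration.

```python
from collections import defaultdict

# Each row encodes sum(coeff * term) >= 0; terms are joint entropies as strings.
ROWS = [
    {"H(A)": 1, "H(B)": 1, "H(A,B)": -1},   # I(A;B) >= 0
    {"H(A,B)": 1, "H(B)": -1},              # H(A|B) >= 0
]


def sum_rows(rows):
    """Add the rows column by column and discard columns that cancel."""
    total = defaultdict(int)
    for row in rows:
        for term, coeff in row.items():
            total[term] += coeff
    return {term: c for term, c in total.items() if c != 0}


print(sum_rows(ROWS))   # only the column H(A) survives the summation
```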
As mentioned previously, each row in Table A2 is an information inequality. It involves multiple joint entropies but can also be represented in a mutual information form. For example, Row (2) is read as: and in the second-to-last column of Table A2, an information inequality is given, which is an equivalent representation as a mutual information quantity: which can be seen by simply expanding the mutual information as: Summing up these information inequalities and canceling redundant terms directly results in the bound $2R + 2H(Z_1) - 4F \ge 0$, which can clearly be used to write $2R + 2M - 4F \ge 0$. Using these proof tables, one can write down different versions of the proof. One such example is provided next. It is based on Tables A1 and A2 for Proposition 3 and invokes the inequalities in Table A2 one by one.
where the inequalities match precisely the rows in Table A2, and the equality labeled (c) indicates that the decoding requirement is used. In this version of the proof, we applied the inequalities in the order (1)-(3)-(5)-(2)-(4)-(6,7), but this is by no means critical, as any order yields a valid proof. One can similarly produce many different versions of the proof of Proposition 4 based on Tables A3 and A4.
Table A1. Terms needed to prove Proposition 3.
Table A2. Proof by tabulation of Proposition 3, with terms defined in Table A1.
Table A3.
where (b) follows from the submodularity of the entropy function, and (c) follows from (3). Substituting (A7) into (A5) then gives (28), which completes the proof.
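For reference, the submodularity of the entropy function invoked in step (b) is the standard statement that, for any subsets $\mathcal{A}$ and $\mathcal{B}$ of the random variables, the following holds; expanding the conditional mutual information shows that it is equivalent to the nonnegativity of that quantity:

```latex
H(\mathcal{A}) + H(\mathcal{B}) \ge H(\mathcal{A} \cup \mathcal{B}) + H(\mathcal{A} \cap \mathcal{B}),
\quad\text{equivalently}\quad
I(\mathcal{A} \setminus \mathcal{B};\, \mathcal{B} \setminus \mathcal{A} \mid \mathcal{A} \cap \mathcal{B}) \ge 0.
```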
We are now ready to prove Theorem 3.
Proof of Theorem 3. For $N \ge 3$, it can be verified that the three corner points of the given tradeoff region are $(0, 2)$, $(N/2, 1/2)$, and $(N, 0)$, which are achievable using the codes given in [6]. The outer bound $M + NR \ge N$ can also be obtained as one of the cut-set outer bounds in [6], so it only remains to show that the inequality $3M + NR \ge 2N$ holds. For this purpose, we claim that for any integer $n \in \{1, 2, \ldots, N - 2\}$: which we prove next by induction. First, notice that:
$= 3H(Z_1, W_1, X_{1,2}) + (N - 3)H(X_{1,2})$ (d)
where the label (3) indicates that Equation (3) is used, and (d) follows from Lemma 1 with $n = 1$. This is precisely the claim for $n = 1$, with the convention that an empty product equals one, i.e., $\prod_{k}^{n}(\cdot) = 1$ when $n < k$ in (A9). Assume the claim holds for $n = n^*$; we next prove that it holds for $n = n^* + 1$. Notice that the second and third terms in (A9) have a common factor: and using this to normalize the last two terms gives: where (e) follows from the file-index symmetry, and (f) follows from Lemma 1. Substituting (A11) and (A12) into (A9) for the case $n = n^*$ gives exactly (A9) for the case $n = n^* + 1$, which completes the proof of (A9). It remains to show that (A9) implies the bound $3M + NR \ge 2N$. For this purpose, notice that when $n = N - 2$, the last two terms in (A9) reduce to zero, and thus we only need to show that: For each summand, we have: Thus, we have: where we have used the well-known formula for the sum of integer squares, $\sum_{i=1}^{m} i^2 = m(m+1)(2m+1)/6$. The proof is thus complete.
Table A14. Tabulation proof of Proposition 7, inequality 5M + 6R ≥ 9, with terms defined in Table A13.
Table A15. Terms needed to prove Proposition 7, inequality 3M + 3R ≥ 5 in (52).
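The corner points stated at the start of the proof of Theorem 3 can be cross-checked against the two outer bounds. Below is a minimal sketch, assuming the region is cut out by $M \ge 0$, $R \ge 0$, $M + NR \ge N$, and $3M + NR \ge 2N$; the function names `bounds_hold` and `corner_points` are hypothetical. It recomputes the vertices with exact rational arithmetic and confirms that $(N/2, 1/2)$ is where the two bounds meet.

```python
from fractions import Fraction

def bounds_hold(M, R, N):
    """True if (M, R) satisfies both outer bounds of Theorem 3."""
    return M + N * R >= N and 3 * M + N * R >= 2 * N

def corner_points(N):
    """Vertices of {M, R >= 0, M + N R >= N, 3M + N R >= 2N} for N >= 3."""
    N = Fraction(N)
    # At M = 0 the bound 3M + N R >= 2N is the binding one, so R = 2.
    top = (Fraction(0), Fraction(2))
    # Subtracting M + N R = N from 3M + N R = 2N gives 2M = N.
    M = N / 2
    middle = (M, (N - M) / N)
    # At R = 0 the bound M + N R >= N is the binding one, so M = N.
    bottom = (N, Fraction(0))
    return [top, middle, bottom]

for N in range(3, 7):
    points = corner_points(N)
    assert all(bounds_hold(M, R, N) for M, R in points)
    print(N, [(str(M), str(R)) for M, R in points])
```

Exact fractions are used so that the tightness checks at the corner points are equalities rather than floating-point approximations.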
By symmetry, we only need to consider the demand in which the first three users request A and the last user requests B. The server can send the following symbols in this case:
Let us now consider the single-demand-type (2, 2) system, for which the corner points on the optimal tradeoff are $(M, R) = (0, 2)$, $(1/3, 4/3)$, $(4/3, 1/3)$, and $(2, 0)$.
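Since the optimal tradeoff of this single-demand-type system is piecewise linear with the corner points listed above, each segment between adjacent corner points lies on a line $aM + bR = c$. The sketch below recovers these lines with exact arithmetic. It assumes the points are distinct and ordered by increasing M; the function names `line_through` and `facets` are hypothetical.

```python
from fractions import Fraction
from math import gcd, lcm  # multi-argument gcd and lcm need Python 3.9+

def line_through(p, q):
    """Smallest integer (a, b, c) with a*M + b*R = c through points p and q."""
    (m1, r1), (m2, r2) = p, q
    a, b = r1 - r2, m2 - m1
    c = a * m1 + b * r1
    coeffs = [Fraction(v) for v in (a, b, c)]
    # Clear denominators, then divide out the common factor.
    scale = lcm(*(v.denominator for v in coeffs))
    ints = [int(v * scale) for v in coeffs]
    g = gcd(*ints)
    return tuple(v // g for v in ints)

def facets(points):
    """Lines through each pair of adjacent corner points."""
    return [line_through(p, q) for p, q in zip(points, points[1:])]

F = Fraction
corners = [(F(0), F(2)), (F(1, 3), F(4, 3)), (F(4, 3), F(1, 3)), (F(2), F(0))]
print(facets(corners))  # [(2, 1, 2), (3, 3, 5), (1, 2, 2)]
```

The three lines are $2M + R = 2$, $3M + 3R = 5$, and $M + 2R = 2$. Under the stated assumption, the tradeoff region of this system is the set of nonnegative $(M, R)$ satisfying the corresponding three inequalities.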
Let us denote the first file as $(A_1, A_2, A_3)$ and the second file as $(B_1, B_2, B_3)$, whose symbols are in the binary field. To achieve the corner point $(1/3, 4/3)$, we use the caching code in Table A18.
Table A18. Code for the tradeoff point (1/3, 4/3) for demand-type (2, 2) when (N, K) = (2, 4).
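Over the binary field, addition is bitwise XOR, and a subfile is recovered by XOR-ing a cached coded symbol with a transmitted one. The sketch below illustrates only this mechanism and the normalization of M and R. The file contents, the helper names, and the cached combination $A_1 \oplus B_1$ are hypothetical and are not the placement of Table A18. Since each subfile is one third of a file, one cached subfile-sized symbol corresponds to M = 1/3, and R = 4/3 corresponds to four transmitted subfile-sized symbols.

```python
from fractions import Fraction

SUBFILES = 3  # each file is split into three equal-size subfiles

def split(data, parts=SUBFILES):
    """Split a file into `parts` equal-length subfiles."""
    if len(data) % parts:
        raise ValueError("file length must be divisible by the number of parts")
    k = len(data) // parts
    return [data[i * k:(i + 1) * k] for i in range(parts)]

def xor(x, y):
    """Addition over the binary field, applied to equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))

def normalized(num_symbols, parts=SUBFILES):
    """Size, in units of files, of num_symbols subfile-sized symbols."""
    return Fraction(num_symbols, parts)

A = split(b"AAAaaa111")  # hypothetical file A -> A1, A2, A3
B = split(b"BBBbbb222")  # hypothetical file B -> B1, B2, B3

cache = [xor(A[0], B[0])]  # hypothetical cache: the single symbol A1 + B1
M = normalized(len(cache))  # one cached symbol
R = normalized(4)           # four transmitted symbols

# A user holding A1 + B1 who receives B1 recovers A1.
recovered = xor(cache[0], B[0])
assert recovered == A[0]
print(M, R, recovered)
```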

Again due to symmetry, we only need to consider the case in which the first two users request A and the other two request B. For this case, the server can send:
For the other corner point, $(4/3, 1/3)$, the placement in Table A19 can be used. Again, for the case in which the first two users request A and the other two request B, the server can send:
Table A19. Code for the tradeoff point (4/3, 1/3) for demand-type (2, 2) when (N, K) = (2, 4).
Table A21. Tabulation proof of Proposition 8, inequality M + R ≥ 2 in (54), with terms defined in Table A20.
Table A22. Terms needed to prove Proposition 8, inequality 2M + 3R ≥ 5 in (54).
Table A23. Tabulation proof of Proposition 8, inequality 2M + 3R ≥ 5 in (54), with terms defined in Table A22.