Multilevel Diversity Coding with Secure Regeneration: Separate Coding Achieves the MBR Point

The problem of multilevel diversity coding with secure regeneration (MDC-SR) is considered, which includes the problems of multilevel diversity coding with regeneration (MDC-R) and secure regenerating code (SRC) as special cases. Two outer bounds are established, showing that separate coding can achieve the minimum-bandwidth-regeneration (MBR) point of the achievable normalized storage-capacity repair-bandwidth trade-off regions for the general MDC-SR problem. The core of the new converse results is an exchange lemma, which can be established using Han’s subset inequality.

More specifically, in an (n, k, d) RC problem, a file M of size B is to be encoded in a total of n distributed storage nodes, each of capacity α. The encoding needs to ensure that the file M can be perfectly recovered by having full access to any k out of the total n storage nodes. In addition, when node failures occur and there are only d remaining nodes in the system, it is required that the data originally stored in any failed node can be recovered by downloading data of size β from each one of the d remaining nodes. An interesting technical challenge is to characterize the optimal trade-offs between the node capacity α and the download bandwidth β in satisfying both the file-recovery and node-repair requirements, which was studied in [8–20]. However, despite intensive research efforts that have yielded many interesting and highly non-trivial partial results, including a precise characterization of the minimum-storage-regenerating (MSR) and the minimum-bandwidth-regenerating (MBR) rate points, the optimal trade-offs between the node capacity α and the download bandwidth β are not yet fully understood for the general RC problem.
More recently, two extensions of the RC problem, namely multilevel diversity coding with regeneration (MDC-R) and secure regenerating code (SRC), have also been studied in the literature. The problem of MDC-R was first introduced by Tian and Liu [21]. In an (n, d) MDC-R problem, a total of d independent files M_1, …, M_d of sizes B_1, …, B_d, respectively, are to be stored in n distributed storage nodes, each of capacity α. The encoding needs to ensure that the file M_j can be perfectly recovered by having full access to any j out of the total n storage nodes for any j ∈ {1, …, d}. In addition, when node failures occur and there are only d remaining nodes in the system, it is required that the data originally stored in any failed node can be recovered by downloading data of size β from each one of the d remaining nodes.
Clearly, an (n, k, d) RC problem can be viewed as an (n, d) MDC-R problem with degenerate messages (M_j : j ≠ k) (i.e., B_j = 0 for all j ≠ k). Therefore, from the code construction perspective, it is natural to consider the so-called separate coding scheme: to construct a code for the (n, d) MDC-R problem, we can simply use an (n, j, d) RC to encode the file M_j for each j ∈ {1, …, d}, and the coded messages for each file remain separate when stored in the storage nodes and during the repair processes. However, despite being a natural scheme, it was shown in [21] that separate coding is in general suboptimal in achieving the optimal trade-offs between the normalized storage-capacity and repair-bandwidth. On the other hand, it has been shown that separate coding can, in fact, achieve both the MSR [21] and the MBR [22] points of the achievable normalized storage-capacity and repair-bandwidth trade-off region for the general MDC-R problem.
The problem of SRC is an extension of the RC problem that further requires security guarantees during the repair processes. More specifically, the (n, k, d, ℓ) SRC problem that we consider is the (n, k, d) RC problem [7–16], with the additional constraint that the file M needs to be kept information-theoretically secure against an eavesdropper, which can access the data downloaded to regenerate a total of ℓ different failed nodes under all possible repair groups. Obviously, this is only possible when ℓ < k. Furthermore, when ℓ = 0, the secrecy requirement degenerates, and the (n, k, d, ℓ) SRC problem reduces to the (n, k, d) RC problem without any repair secrecy requirement.
Under the additional repair secrecy requirement (ℓ ≥ 1), the optimal trade-offs between the node capacity α and repair bandwidth β have been studied in [23–30]. In particular, Shah, Rashmi and Kumar [25] showed that a particular trade-off point (referred to as the SRK point, after the first letters of the three authors' names) can be achieved by extending an MBR code based on the product-matrix construction proposed in [8]. Later, it was shown [30] that, for any given (k, d) pair, there is a lower bound on ℓ, denoted by ℓ*(k, d), such that, when ℓ ≥ ℓ*(k, d), the SRK point is the only corner point of the trade-off region for the (n, k, d, ℓ) SRC problem. On the other hand, when 1 ≤ ℓ < ℓ*(k, d), it is possible that the trade-off region features multiple corner points, even though a precise characterization of the trade-off region, including both the MSR and the MBR points, remains missing in general.
In this paper, we introduce the problem of multilevel diversity coding with secure regeneration (MDC-SR), which includes the problems of MDC-R and SRC as two special cases. (The problem of secure multilevel diversity coding without any node regeneration requirement has been considered in [6,31].) In this model, multiple files are to be stored in a distributed manner across several storage nodes, as in the multilevel diversity coding problem. The system requires that a user with full access to some of the nodes can recover the corresponding subset of the original files. Meanwhile, if any storage node fails, it can be regenerated by downloading messages from other nodes within a certain bandwidth limit. Additionally, if some nodes and repair messages are leaked to an eavesdropper, the original files must remain information-theoretically secure. The detailed definition of this model can be found in the next section. Similar to the MDC-R problem, it is natural to consider the separate coding scheme for the MDC-SR problem as well. Our main contribution consists of three parts. First, we establish two nontrivial outer bounds for the MDC-SR problem. Because of the secrecy constraint, outer bounding the trade-off region of the MDC-SR problem is not a simple extension of the bounding technique used for the MDC-R problem in [22]. Second, we present a coding scheme with a separate coding structure that achieves the intersection of the two outer bounds. This shows that the optimality of separate coding in terms of achieving the MBR point of the achievable normalized storage-capacity and repair-bandwidth trade-off region extends from the MDC-R problem to the MDC-SR problem. Last but not least, in the process of establishing the two outer bounds, we propose an exchange lemma, which we believe can be useful in other similar or even more general problems.
We note that our system model and main results specialize to results that were previously unknown. For example, when specialized to the SRC problem, our result shows that the SRK point [25] is, in fact, the MBR point of the achievable normalized storage-capacity and repair-bandwidth trade-off region, regardless of the number of corner points of the trade-off region.
From the technical viewpoint, this is mainly accomplished by establishing two outer bounds (one of them must be "horizontal", i.e., on the normalized repair-bandwidth only) on the achievable normalized storage-capacity and repair-bandwidth trade-off region, which intersect precisely at the superposition of the SRK points. The core of the new converse results is an exchange lemma, which we establish by exploiting the built-in symmetry of the problem via Han's subset inequality [32]. The meaning of "exchange" will be clear from the statement of the lemma. The lemma only relies on the functional dependencies for the repair processes and might be useful for solving some other related problems as well.
The rest of the paper is organized as follows. In Section 2, we formally introduce the problem of MDC-SR and the separate coding scheme. The main results of the paper are then presented in Section 3. In Section 4, we introduce the exchange lemma and use it to establish the main results of the paper. Finally, we conclude the paper in Section 5.
Notation and Remarks. Sets and random variables will be written in calligraphic and sans-serif fonts respectively, to differentiate from the real numbers written in normal math fonts. For any two integers t ≤ t′, we shall denote the set of consecutive integers {t, t + 1, …, t′} by [t : t′]. The use of the brackets will be suppressed otherwise.
Although many previous works are mentioned in this introduction, the ones most closely related to our work are [15,25,29]. We single them out for the convenience of our readers.

The MDC-SR Problem
Let (n, d, N_1, …, N_d, K, T, S) be a tuple of positive integers such that d < n. Formally, an (n, d, N_1, …, N_d, K, T, S) code consists of:

Let S^B_{i′→i} = f^B_{i′→i}(W_{i′}) be the data downloaded from the i′th storage node in order to regenerate the data originally stored at the ith storage node under the context of repair group B. Obviously, (B_1, …, B_d) = (log N_1, …, log N_d), α = log T, and β = log S represent the message sizes, storage capacity, and repair bandwidth, respectively.
Given these connections, our problem formulation can be viewed as providing a unified framework to investigate these closely-related problems.
A simple and natural strategy for constructing a code for the (n, d, ℓ) MDC-SR problem is to use an (n, j, d, ℓ) SRC to encode the message M_j separately for each j ∈ [ℓ + 1 : d]. Since the coded data are kept separate during the encoding, decoding and repair processes, the storage capacities and repair bandwidths of the component codes add up. Thus, for the general MDC-SR problem, the separate coding normalized storage-capacity repair-bandwidth trade-off region R_{n,d,ℓ}(B̄_{ℓ+1}, …, B̄_d) for a fixed normalized message-rate tuple (B̄_{ℓ+1}, …, B̄_d) is given by (5).

As mentioned previously, when ℓ = 0, the repair secrecy requirement (4) degenerates, and the (n, d, ℓ) MDC-SR problem reduces to the (n, d) MDC-R problem. In this case, it was shown in [22] that any achievable normalized message-rate storage-capacity repair-bandwidth tuple (B̄_1, …, B̄_d, ᾱ, β̄) ∈ R_{n,d} must satisfy:

β̄ ≥ ∑_{j=1}^{d} T_{d,j}^{-1} B̄_j (6)

and the outer bound (7), where T_{d,j} := ∑_{t=1}^{j} (d + 1 − t). When set as equalities, the intersection of (6) and (7) is given by:

(ᾱ, β̄) = (d ∑_{j=1}^{d} T_{d,j}^{-1} B̄_j, ∑_{j=1}^{d} T_{d,j}^{-1} B̄_j).

For any j ∈ [1 : d], the MBR point for the (n, j, d) RC problem can be written as [8]:

(ᾱ, β̄) = (d T_{d,j}^{-1}, T_{d,j}^{-1}).

We may thus conclude immediately from (5) (with ℓ = 0) that separate coding can achieve the MBR point for the general MDC-R problem.

Figure 1 shows the optimal trade-off curve between the normalized storage-capacity and repair-bandwidth and the best possible trade-offs that can be achieved by separate coding for the (4, 3) MDC-R problem with (B̄_1, B̄_2, B̄_3) = (0, 1/3, 2/3) [21]. Clearly, for this example, separate coding is strictly suboptimal when ᾱ ∈ (5/12, 1/2). On the other hand, when ᾱ ≤ 5/12 or ᾱ ≥ 1/2, separate coding can, in fact, achieve the optimal trade-offs. In particular, separate coding can achieve the MSR point (7/18, 11/36) and the MBR point (8/15, 8/45). In the same figure, the outer bounds (6) and (7) have also been plotted. As illustrated, they intersect precisely at the MBR point (8/15, 8/45). Notice that, for this example at least, the outer bound (7) is tight only at the MBR point.

Figure 1. Trade-off curves for the (4, 3) MDC-R problem with (B̄_1, B̄_2, B̄_3) = (0, 1/3, 2/3) ([21]), shown together with the outer bounds (6), (7) and (14).
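The numbers quoted for the (4, 3) MDC-R example can be checked with a few lines of exact arithmetic. The sketch below is only an illustration, not part of the original analysis: it assumes the standard closed forms for a unit-size file, namely the MBR point (d/T_{d,k}, 1/T_{d,k}) with T_{d,k} = ∑_{t=1}^{k} (d + 1 − t) and the MSR point (1/k, 1/(k(d − k + 1))), and it models separate coding as the rate-weighted sum of the per-message points. The function names are hypothetical.

```python
from fractions import Fraction


def T(d, k):
    """T_{d,k} = sum_{t=1}^{k} (d + 1 - t)."""
    return sum(d + 1 - t for t in range(1, k + 1))


def mbr_point(d, k):
    """Assumed normalized MBR point (alpha, beta) of an (n, k, d) RC, unit file size."""
    return Fraction(d, T(d, k)), Fraction(1, T(d, k))


def msr_point(d, k):
    """Assumed normalized MSR point (alpha, beta) of an (n, k, d) RC, unit file size."""
    return Fraction(1, k), Fraction(1, k * (d - k + 1))


def separate_coding(d, rates, point):
    """Superpose one RC per message M_j; rates[j-1] is the normalized rate of M_j."""
    alpha = sum(b * point(d, j)[0] for j, b in enumerate(rates, start=1))
    beta = sum(b * point(d, j)[1] for j, b in enumerate(rates, start=1))
    return alpha, beta


# The (4, 3) MDC-R example with normalized rates (0, 1/3, 2/3).
rates = [Fraction(0), Fraction(1, 3), Fraction(2, 3)]
mbr = separate_coding(3, rates, mbr_point)
msr = separate_coding(3, rates, msr_point)
print("MBR:", mbr)
print("MSR:", msr)
```

Under these assumptions the two superposed points agree with the MBR point (8/15, 8/45) and the MSR point (7/18, 11/36) stated in the text.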

Main Results
The main result of the paper is that the optimality of separate coding in terms of achieving the MBR point of the normalized storage-capacity repair-bandwidth trade-off region extends from the MDC-R problem to the MDC-SR problem. The results are summarized in the following theorem.

Theorem 1.
For the general MDC-SR problem, any achievable normalized message-rate storage-capacity repair-bandwidth tuple (B̄_{ℓ+1}, …, B̄_d, ᾱ, β̄) ∈ R_{n,d,ℓ} must satisfy:

β̄ ≥ ∑_{j=ℓ+1}^{d} T_{d,j,ℓ}^{-1} B̄_j (9)

and the outer bound (10), where T_{d,k,ℓ} := ∑_{t=ℓ+1}^{k} (d + 1 − t). When set as equalities, the intersection of (9) and (10) is given by:

(ᾱ, β̄) = (d ∑_{j=ℓ+1}^{d} T_{d,j,ℓ}^{-1} B̄_j, ∑_{j=ℓ+1}^{d} T_{d,j,ℓ}^{-1} B̄_j).

For any j ∈ [ℓ + 1 : d], the SRK point for the (n, j, d, ℓ) SRC problem can be written as [25]:

(ᾱ, β̄) = (d T_{d,j,ℓ}^{-1}, T_{d,j,ℓ}^{-1}). (11)

We may thus conclude immediately from (5) that separate coding can achieve the MBR point for the general MDC-SR problem.
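As a small worked illustration (not taken from the paper), consider d = 3, ℓ = 1 and normalized rates (B̄_2, B̄_3) = (1/3, 2/3). The computation assumes that the SRK point of [25] for a unit-size file is (d/T_{d,j,ℓ}, 1/T_{d,j,ℓ}) and that separate coding superposes these points weighted by B̄_j:

```latex
\begin{align*}
T_{3,2,1} &= \sum_{t=2}^{2} (3 + 1 - t) = 2, \qquad
T_{3,3,1} = \sum_{t=2}^{3} (3 + 1 - t) = 2 + 1 = 3, \\
\bar{\beta} &= \frac{\bar{B}_2}{T_{3,2,1}} + \frac{\bar{B}_3}{T_{3,3,1}}
             = \frac{1}{6} + \frac{2}{9} = \frac{7}{18}, \\
\bar{\alpha} &= d\,\bar{\beta} = 3 \cdot \frac{7}{18} = \frac{7}{6}.
\end{align*}
```

Compared with the ℓ = 0 values of the (4, 3) MDC-R example with the same rates, (8/15, 8/45), both coordinates increase, which is consistent with the secrecy requirement consuming part of the storage and repair budget.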
The following corollary follows immediately from Theorem 1 by setting B̄_j = 0 for all j ≠ k.

Corollary 1.
For the general SRC problem, any achievable normalized storage-capacity repair-bandwidth tuple (ᾱ, β̄) ∈ R_{n,k,d,ℓ} must satisfy:

β̄ ≥ T_{d,k,ℓ}^{-1} (12)

and the outer bound (13). When set as equalities, the intersection of (12) and (13) is precisely the SRK point (11) (with j = k), showing that the SRK point is, in fact, the MBR point of the achievable normalized storage-capacity repair-bandwidth trade-off region for the general SRC problem.
As a final remark, we mention here that when ℓ = 0, the outer bound (9) is reduced to (6) for the (n, d) MDC-R problem by the fact that T_{d,j,0} = T_{d,j}. However, when ℓ = 0, the outer bound (10) is reduced to a bound which is weaker than the outer bound (7) by the fact that d 2 > d(d−1)

Proof of the Main Results
Let us first outline the main ingredients for proving the outer bounds (9) and (10).
(1) Total number of nodes. To prove the outer bounds (9) and (10), let us first note that these bounds are independent of the total number of storage nodes n in the system. Therefore, in our proof, we only need to consider the cases where n = d + 1. For the cases where n > d + 1, any subsystem consisting of d + 1 out of the total n storage nodes gives rise to a (d + 1, d, ℓ) MDC-SR problem, so these outer bounds must apply as well. When n = d + 1, any repair group B of size d is uniquely determined by the node j to be repaired, i.e., B = [1 : n] \ {j}, and hence can be dropped from the notation S^B_{i→j} without causing any confusion.

(2) Code symmetry. Due to the built-in symmetry of the problem, to prove the outer bounds (9) and (10), we only need to consider the so-called symmetrical codes [10], for which the joint entropy of any subset of the code's random variables remains unchanged under any permutation over the storage-node indices.

These collections of random variables have also been used in [22,30].
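The claim in item (1) that the repair group is forced when n = d + 1 can be illustrated by brute-force enumeration. The following is a minimal sketch with a hypothetical helper name, not code from the paper; it lists every size-d set of surviving nodes for a given failed node.

```python
from itertools import combinations


def repair_groups(n, d, failed):
    """All size-d sets of surviving nodes that could help repair node `failed`."""
    survivors = [i for i in range(1, n + 1) if i != failed]
    return [set(group) for group in combinations(survivors, d)]


d = 3
n = d + 1
for j in range(1, n + 1):
    groups = repair_groups(n, d, j)
    # With n = d + 1 there are exactly d survivors, hence a single repair group.
    print(j, groups)
```

For n > d + 1 the same helper returns several groups for each failed node, which is why the repair group B has to be kept in the notation in the general problem.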
An important part of the proof is to understand the relations between the collections of random variables defined above, and to use them to derive the desired converse results. We shall discuss this next. As a result, S →t+1 = (S →t+1 , S →t+1 ) is a function of U (t,s) . It thus follows immediately from the node regeneration requirement (3) that W t+1 is a function of U (t,s) . Similarly and inductively, it can be shown that (S →j , W j ) is a function of U (t,s) for all j ∈ [t + 2 : s]. This completes the proof of the lemma.

Technical Lemmas
The above lemma demonstrates the "compactness" of U^{(t,s)} and has a number of direct consequences. For example, for any fixed s ∈ [1 : n], it is clear from Lemma 1 that U^{(t_2,s)} is a function of U^{(t_1,s)} and hence H(U^{(t_2,s)}) ≤ H(U^{(t_1,s)}) for any 0 ≤ t_1 ≤ t_2 ≤ s − 1.
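The entropy inequality in this consequence is the usual "a function cannot have more entropy than its argument" step. Written out, using only the functional dependence provided by Lemma 1:

```latex
\begin{align*}
H\big(U^{(t_2,s)}\big)
  &\le H\big(U^{(t_1,s)}, U^{(t_2,s)}\big) \\
  &= H\big(U^{(t_1,s)}\big) + H\big(U^{(t_2,s)} \,\big|\, U^{(t_1,s)}\big) \\
  &= H\big(U^{(t_1,s)}\big),
\end{align*}
```

where the last equality holds because the conditional entropy of a random variable given another one that determines it is zero.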
The following lemma plays a key role in proving the outer bounds (9) and (10). The proof is rather long and is deferred to the Appendix to improve the flow of the paper.
H(U^{(ℓ)}) (22)

for any m ∈ [ℓ + 1 : d]. Consequently,

Proof of Proposition 1. To see (22), consider proof by induction. For the base case with m = ℓ + 1, we have where (a) follows from the fact that M_{ℓ+1} is a function of W

where (a) follows from the induction assumption; (b) follows from Corollary 2; (c) follows from the fact that M_{m+1} is a function of W_{[1:m+1]}, which is a function of U^{(m+1)} by Lemma 1; (d) follows from the chain rule for entropy; and (e) follows from the facts that M_{m+1} is independent of M_{[ℓ+1:m]} and that H(M_{m+1}) = B_{m+1}. This completes the induction step and hence the proof of (22).

To see (23), simply set m = d in (22). We have Note that where the last equality follows from the fact that I(U^{(ℓ)}; M_{[ℓ+1:d]}) = 0 by the repair secrecy requirement (4). Substituting (25) into (24) completes the proof of (23).
Substituting (32) and (33) into (31) gives: where (a) follows from the fact that T_{d,d,j} + T_{d,j,ℓ} = T_{d,d,ℓ}. This completes the proof of the proposition.
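The identity used in step (a) follows directly from the definition T_{d,k,ℓ} := ∑_{t=ℓ+1}^{k} (d + 1 − t) given in Theorem 1, by splitting the summation range at j. A short derivation, assuming ℓ ≤ j ≤ d:

```latex
\begin{align*}
T_{d,d,j} + T_{d,j,\ell}
  &= \sum_{t=j+1}^{d} (d + 1 - t) + \sum_{t=\ell+1}^{j} (d + 1 - t) \\
  &= \sum_{t=\ell+1}^{d} (d + 1 - t) \\
  &= T_{d,d,\ell}.
\end{align*}
```

Since the summand is an arithmetic progression, the same definition also gives the closed form T_{d,k,ℓ} = (k − ℓ)(2d + 1 − k − ℓ)/2, which can be handy for numerical checks.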
Proof of Theorem 1. We are now ready to prove the outer bounds (9) and (10). To prove (9), note that where (a) follows from the fact that H(S_{→ℓ+1}) ≤ (d − ℓ)β; (b) follows from the union bound on entropy; and (c) follows from (23) of Proposition 1. Cancelling (1/(d − ℓ)) H(U^{(ℓ)}) from both sides of the inequality and normalizing both sides by ∑_{t=ℓ+1}^{d} B_t completes the proof of (9).
To prove (10), note that where (a) follows from the fact that H(W d+1 ) ≤ α; (b) follows from the fact that S

Conclusions
This paper considered the problem of MDC-SR, which includes the problems of MDC-R and SRC as special cases. Two outer bounds were established, showing that separate coding can achieve the MBR point of the achievable normalized storage-capacity repair-bandwidth trade-off regions for the general MDC-SR problem. When specialized to the SRC problem, it was shown that the SRK point [25] is the MBR point of the achievable normalized storage-capacity repair-bandwidth trade-off regions for the general SRC problem. The core of the new converse results is an exchange lemma, which we established by using Han's subset inequality [32]. The exchange lemma only relies on the functional dependencies for the repair processes and might be useful for solving some other related problems as well.
Note that separate encoding can also achieve the MSR point of the achievable normalized storage-capacity repair-bandwidth trade-off regions for the general MDC-R problem [22]. We suspect that this also generalizes to the MDC-SR problem. To prove such a result, however, we shall need new converse results as well as new code constructions for the general SRC problem, both of which are currently under investigation.
Author Contributions: T.L. and C.T. proposed the idea of this paper, S.S. was responsible for the technical proof of the paper, and all authors worked on writing, revising and editing this manuscript.

Acknowledgments: S.S. is grateful to Fangwei Ye for comments on an early version of this paper on arXiv.

Conflicts of Interest:
The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

Appendix. Proof of the Exchange Lemma
Proof of the Exchange Lemma. The lemma is proved iteratively. The sets of random variables to be "exchanged" are partitioned in a designed way, and each time a small part of the set is exchanged, we establish an inequality. In our proof, we use not only the submodularity of the entropy function but also the properties of regenerating codes, namely Lemma 1.

Fix Let us first note that, if j = m + 1, we must have i = i, and in this case the inequality (15) holds trivially with an equality. Therefore, for the remaining proof, we shall assume that j ≤ m. Now that d + 1 − j > d − m, we may write d + 1 − j = s(d − m) + r for some integer s ≥ 1 and r ∈ [1 : d − m]. Furthermore, let As illustrated in Figure A1, a_t is monotonically increasing with t. Finally, let τ_0 := {a_t : t ∈ [1 : r]} and τ_q := {a_t : t ∈ [r + 1 + (q − 1)(d − m) : r + q(d − m)]} for any q ∈ [1 : s]. It is straightforward to verify that:
• τ_q ∩ τ_{q′} = ∅ for any q ≠ q′, (A2)
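The decomposition d + 1 − j = s(d − m) + r with r ∈ [1 : d − m] differs from ordinary integer division in that the remainder is never zero. The sketch below, with hypothetical function names and under the assumption j ≤ m < d, computes s and r and lists the index ranges of t that define τ_0, τ_1, …, τ_s. It works with the indices t only, since the values a_t depend on a definition not reproduced here; because a_t is monotonically increasing, disjoint index blocks yield disjoint sets τ_q.

```python
def split_length(d, m, j):
    """Write d + 1 - j = s * (d - m) + r with s >= 1 and r in [1 : d - m]."""
    assert j <= m < d, "assumed range: the decomposition is used for j <= m < d"
    total, step = d + 1 - j, d - m
    # Shift by one so that the remainder lands in [1 : step] instead of [0 : step - 1].
    s, r = divmod(total - 1, step)
    return s, r + 1


def index_blocks(d, m, j):
    """Index ranges of t for tau_0, tau_1, ..., tau_s, as lists of integers."""
    s, r = split_length(d, m, j)
    step = d - m
    blocks = [list(range(1, r + 1))]  # tau_0 uses t in [1 : r]
    for q in range(1, s + 1):         # tau_q uses t in [r+1+(q-1)step : r+q*step]
        blocks.append(list(range(r + 1 + (q - 1) * step, r + q * step + 1)))
    return blocks


print(split_length(5, 3, 1))
print(index_blocks(5, 3, 1))
```

By construction the blocks are consecutive and together cover [1 : d + 1 − j] exactly once; τ_0 has r indices and every other block has d − m indices.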