Article

A Family of Optimal Linear Functional-Repair Regenerating Storage Codes

by
Henk D. L. Hollmann
Institute of Computer Science, University of Tartu, Tartu 50409, Estonia
Entropy 2025, 27(4), 376; https://doi.org/10.3390/e27040376
Submission received: 3 March 2025 / Revised: 24 March 2025 / Accepted: 26 March 2025 / Published: 1 April 2025
(This article belongs to the Special Issue Discrete Math in Coding Theory)

Abstract
We construct a family of linear optimal functional-repair regenerating storage codes with parameters {m, (n, k), (r, α, β)} = {(2r − α + 1)α/2, (r + 1, r), (r, α, 1)} for any integers r, α with 1 ≤ α ≤ r, over any field when α ∈ {1, r − 1, r}, and over any finite field F_q with q ≥ r − 1 otherwise. These storage codes are Minimum-Storage Regenerating (MSR) when α = 1, Minimum-Bandwidth Regenerating (MBR) when α = r, and represent extremal points of the (convex) attainable cut-set region different from the MSR and MBR points in all other cases. It is known that when 2 ≤ α ≤ r − 1, these parameters cannot be realized by exact-repair storage codes. Each of these codes comes with an explicit and relatively simple repair method, and repair can even be realized as help-by-transfer (HBT) if desired. The coding states of codes from this family can be described geometrically as configurations of r + 1 subspaces of dimension α in an m-dimensional vector space with restricted sub-span dimensions. A few "small" codes with these parameters are known: one for (r, α) = (3, 2) dating from 2013 and one for (r, α) = (4, 3) dating from 2024. Apart from these, our codes are the first examples of explicit, relatively simple, optimal functional-repair storage codes over a small finite field, with an explicit repair method and with parameters representing an extremal point of the attainable cut-set region distinct from the MSR and MBR points.
Data Set License: CC-BY

1. Introduction

The amount of data in need of storage continues to grow at an astonishing rate. The International Data Corporation (IDC) predicts that the Global Datasphere (the total amount of data created, captured, copied, and consumed globally) will grow from 149 zettabytes in 2024 [1], to 181 zettabytes by the end of 2025 [2,3], and to an estimated 394 zettabytes in 2028 [4] (a zettabyte equals 10^21 bytes). These developments may even be accelerated by the advancement of generative AI models. In view of these developments, the importance of efficient data storage management can hardly be overestimated. A major challenge is to devise storage technologies that are capable of handling these huge amounts of data in an efficient, reliable, and economically feasible way.

1.1. Distributed Storage Systems and Storage Codes

In modern storage systems, data storage is handled by a Distributed Storage System (DSS). A DSS stores data across potentially unreliable storage units commonly referred to as storage nodes, which are typically located in servers in data centers in widely different locations. Efficient update and repair mechanisms are critical for maintaining stability, especially during node failures [5]. To handle the occasional loss of a storage node, the DSS employs redundancy, in the form of a storage code [6,7]. Often, a DSS simply employs replication, where the storage code takes the form of a repetition code. But nowadays, many storage systems such as Amazon S3 [8]; Google File System [9] and its successor Colossus [10]; Microsoft's Azure [11,12,13]; and Facebook's storage systems [14,15], offer a storage mode involving a (non-trivial) erasure code. Especially for cold data (data that remains unchanged, for example for archiving), but also for warm data (data that needs to be updated only occasionally), non-trivial erasure codes such as Reed–Solomon (RS) codes, Locally Repairable Codes (LRCs) or Regenerating Codes (RGCs) are considered or already applied [7,16]. For example, Microsoft Azure employs a Reed–Solomon code for archiving purposes [11]. Hadoop implements various Reed–Solomon (RS) codes [17,18], and the implementation of other codes such as HTEC has been proposed, see, e.g., [19]. The Redundant Array of Independent Disks (RAID) standard RAID-6 specifies the use of two-parity erasure codes, see, e.g., [20]. Huawei OceanStor Dorado [21,22] employs Elastic EC, offering a choice between replication and EC, for example RAID-TP (triple parity), and IBM Ceph also offers a choice of EC profiles [23,24] (see also [25]). Several good overviews of modern storage codes and their performance are available, see for example [16,26,27,28,29]. For a general and recent reference on storage systems, see [30], and for an overview of Big Data management, see [31].

1.2. Node Repair

In the case of a lost node, the DSS uses the storage code to repair the damage. During repair, the DSS introduces a replacement node (sometimes called a newcomer node) into the system and downloads a small amount of data from some of the remaining nodes, referred to as the helper nodes; the data obtained is then used to compute a block of replacement data that is to be placed on the replacement node. This process, commonly referred to as node repair, comes in two variations. In the simplest repair mode, referred to as exact repair (ER) [32,33], the data stored on the newcomer node is an exact copy of the data stored on the lost node. A more subtle repair mode, first considered in [6], is functional repair (FR), where the replacement data need not be an exact copy of the lost data, but is designed to maintain the possibility of recovering the data that was originally stored, as well as to maintain the possibility for future repairs. An ER storage code can be thought of as an erasure code that enables efficient repair. In contrast, an FR storage code can be seen as a family of codes, all having the same parameters, where an erasure in a codeword from a code in the family is corrected into a codeword from possibly another code in the family [29] (Section 3.1.1). We define and discuss linear FR storage codes in detail in Section 3, and describe an example in Example 1. For a formal definition of general FR storage codes, we refer to [29] (Section 3.1.1).

1.3. Effectiveness of a Storage Code

Key considerations for measuring the effectiveness of a storage code are the storage overhead and the efficiency of the repair process. The storage overhead is determined by the fraction of redundancy employed by the code, and is measured by the rate of the code. Efficient repair, first of all, requires an easily implementable repair algorithm. Other important factors are the amount of data that needs to be transferred during repair, referred to as the repair bandwidth, and the amount of disk I/O, the number of times that a symbol is accessed on disk. In addition, it is desirable to limit the number of nodes that participate in the repair process, known as the repair degree [6] or repair locality [34,35].
In general, the data that is transferred by a helper node during repair may be computed from the available data symbols stored in that node. If each of the helper nodes simply transfers a subset of the symbols stored in that node, then we speak of help by transfer (HBT) [26,29]; if, in addition, no computations are done at the newcomer node either, then we speak of repair by transfer (RPT) [36,37]. We say that a storage code is an optimal-access code if the number of symbols read at a helper node equals the number of symbols transferred by that node [26,29,38].

1.4. Regenerating Codes and Locally Repairable Codes

Research into storage codes has diverged into two main directions. Regenerating codes (RGCs) investigate the possible trade-off between the storage capacity per node and the repair bandwidth (the total amount of data downloaded during repair), which is determined by the cut-set bound [6]. On the other hand, Locally Repairable Codes (LRCs) study the influence of the repair degree, the number of helper nodes that may be contacted during node repair [34,35,39]. A good overview of the different lines of research on codes for distributed storage and the obtained results can be found in [40].
We first discuss an often-used model for storage codes, see, e.g., [6,26,27,29]. A regenerating code (RGC) with parameters {m, (n, k), (r, α, β)}_q is a code that allows for the storage of m information symbols from some finite field F_q, in encoded form, onto n storage nodes, each of which is capable of holding α data symbols from F_q. We will refer to α as the storage capacity or the subpacketization of a storage node. The parameter k indicates that at all times, the original stored information can be recovered from the data stored on any set of k nodes. It is assumed that k is the smallest integer with this property; since any set of r nodes can repair all the remaining nodes, we then have k ≤ r. Note that the rate of the code is the fraction m/(nα) of information per stored symbol. The resilience of the code is described in terms of a parameter r, referred to as the repair degree, and a parameter β, referred to as the transport capacity of the code. If a node fails, then a replacement node is introduced into the system, which is then allowed to contact an arbitrary subset of size r of the remaining nodes, referred to as the set of helper nodes. Each of the helper nodes is allowed to compute β data symbols, which are then sent to the new node, which uses this data to compute a replacement block, again of size α. Therefore, the repair bandwidth γ of an RGC satisfies γ = rβ. It has been shown [6] that the parameters of an RGC satisfy the cut-set bound
m ≤ ∑_{i=0}^{k−1} min(α, (r − i)β). (1)
Remarkably, the cut-set bound is independent of n (but n does influence the required field size q for code construction). For fixed m, k, and r, the equality case in (1) takes the form of a piece-wise linear curve that represents the possible trade-off between the storage capacity α and the transport capacity β. Note that we have α ≥ m/k (since k nodes can recover the data) and β ≥ α/r (since r nodes can repair); the points on the curve where α = m/k with minimal β (so with β = α/(r − k + 1)) and β = α/r with minimal α (so with α = rm/(rk − (k² − k)/2)) are referred to as the Minimum Storage Regenerating (MSR) and Minimum Bandwidth Regenerating (MBR) points, respectively. It is easily verified that the achievable region determined by (1) is convex and has precisely k extreme points (also referred to as corner points), see Figure 1. We review the cut-set bound in detail in Section 4.
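The bound (1) and the MSR/MBR points can be checked numerically. A minimal Python sketch (the function name `cutset_bound` and the sample parameters k = r = 3, β = 1 are our illustrative choices, not from the paper):

```python
from fractions import Fraction

def cutset_bound(k, r, alpha, beta):
    """Right-hand side of the cut-set bound (1): sum_{i=0}^{k-1} min(alpha, (r-i)*beta)."""
    return sum(min(alpha, (r - i) * beta) for i in range(k))

# Illustrative parameters: k = r = 3, beta = 1.
k, r, beta = 3, 3, Fraction(1)
alpha_msr = (r - k + 1) * beta      # MSR: minimal alpha, with beta = alpha/(r-k+1)
alpha_mbr = r * beta                # MBR: alpha = r*beta
m_msr = cutset_bound(k, r, alpha_msr, beta)   # equals k*alpha at the MSR point
m_mbr = cutset_bound(k, r, alpha_mbr, beta)   # equals (r + (r-1) + ... + (r-k+1))*beta at MBR
print(m_msr, m_mbr)  # 3 6
```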
An optimal RGC is an RGC with parameters that attain the cut-set bound (1). It has been shown [41] (Theorem 7) that the MSR and MBR points are the only corner points that can be achieved by exact-repair RGCs; indeed, the only points on the cut-set bound between the MSR and MBR points that can be achieved by ER RGCs are the MSR and MBR points, with the possible addition of a small line segment starting at the MSR point and not including the next corner point. In fact, it is conjectured that the achievable region for ER RGCs is described by the (identical) parameter sets of Cascade codes [42] and Moulin codes [43]. Conversely, it has been shown [44] that every point on the cut-set bound is achievable by functional-repair RGCs; however, these codes are not (or not really) explicit, require a very large field size, and do not come with a repair algorithm. As far as we know, the only known explicit optimal FR RGCs are the partial exact-repair MSR codes with m = 2k from [37], the explicit k = n − 2 HBT "FMSR" codes in [45] (see also the "random" NCCloud HBT codes in [46] and the non-explicit k = 2 MSR codes in [47]), and the two explicit optimal FR RGCs from [48] and from [49,50]. Therefore, it is of great interest to construct "simple" FR RGCs with a small field size, in corner points different from the MSR and MBR points.
A Locally Repairable Code (LRC) also has parameters {m, (n, k), (r, α, β)}_q, where m, n, k, α, and β have the same meaning as for RGCs, but now we just require that repair of a failed node is always possible if we employ a specific set of r helper nodes (i.e., we are allowed to choose the r helpers). In [51,52], the maximal rate of such codes (without any constraint on k) was investigated, and in [52], it was conjectured that for the case where r + 1 divides n, the optimal rate is achieved by partitioning the n storage nodes into repair groups of size r + 1 and, within each repair group, using an {m, (n, r), (r, α, β)}_q optimal RGC, so with m attaining equality in (1). This partly explains our interest in RGCs with these parameters in this paper. It is an interesting problem to investigate optimal codes for the case where r + 1 does not divide n.

1.5. Our Contribution

Many existing storage codes employ MDS codes or, essentially, arcs in projective geometry, in their construction. Some examples are the MBR exact-repair codes obtained by the matrix-product code construction in [53], the MSR functional-repair codes in [37] and in [47], and the exact-repair Moulin codes in [43]. In this paper, we use MDS codes to construct explicit optimal linear RGCs with n − 1 = r = k, β = 1, and with α an integer with 1 ≤ α ≤ r, so with m = (2r − α + 1)α/2, which we refer to as (r, α)-regular codes. In fact, we show that the existence of (r, α)-regular storage codes is equivalent to the existence of an [r, α, r − α + 1]_q MDS code, so they can be realized over finite fields F_q with q ≥ r − 1, and even as binary codes if r ≤ α + 1. These codes come with a relatively simple repair method, and we show that, if desired, they allow for help-by-transfer (HBT) repair. The parameters of these codes achieve the r extremal points of the achievable cut-set region for varying α. Note that by employing the obvious space-sharing technique [37], we can use the two storage codes in consecutive extremal points on the cut-set bound (1) to also achieve the points between these extremal points. Our construction is based on what we call (r, α)-regular configurations, collections of r + 1 subspaces of dimension α in an ambient space of dimension m with restricted sub-span dimensions (such configurations were called (r, α − 1)-good in [48] and [49], see also [51] (Example 3.3)).
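The parameter claim above is easy to verify mechanically: for k = r and β = 1, the file size m = (2r − α + 1)α/2 meets the cut-set bound (1) with equality. A small Python check (function names are ours):

```python
def m_regular(r, alpha):
    """File size of an (r, alpha)-regular code: m = (2r - alpha + 1) * alpha / 2."""
    return (2 * r - alpha + 1) * alpha // 2   # the product is always even

def cutset_rhs(k, r, alpha, beta):
    """Right-hand side of the cut-set bound (1)."""
    return sum(min(alpha, (r - i) * beta) for i in range(k))

# With k = r, beta = 1, and 1 <= alpha <= r, the bound is met with equality.
for r in range(2, 10):
    for alpha in range(1, r + 1):
        assert m_regular(r, alpha) == cutset_rhs(r, r, alpha, 1)
print("equality holds for all tested (r, alpha)")
```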
The contents of this paper are organized as follows. In Section 2, we introduce some notation and recall various notions from coding theory, and in Section 3, we review linear storage codes. We revisit the cut-set bound in Section 4, where we also show that in optimal RGCs with k > 1, no two nodes store identical information; in addition, we show that if s is an integer such that (s − 1)β < α ≤ sβ, then any r − s + 1 nodes carry independent information, that is, together they carry an amount of information equal to (r − s + 1)α. In addition, in the case where r = k, we derive an inequality that motivates our definition of (k, r, s, β)-regular configurations in Section 5, where we also construct such configurations for all relevant parameters. The (k, r, s, β)-regular configurations with k = r, β = 1, and α = s are called (r, α)-regular. In Section 6, we investigate the structure of such configurations. Section 7 contains our main results. Here, we show that the repair of a lost node in an (r, α)-regular coding state necessarily involves an MDS code, thus providing a lower bound on the size of the finite field over which an (r, α)-regular storage code can be constructed. Theorems 3 and 4 together demonstrate the existence of (r, α)-regular codes for all feasible pairs (r, α), and include precise and simple repair instructions for the corresponding codes. In Section 8, we describe how to obtain smaller (r, α)-regular storage codes with extra symmetry, involving only (r, α)-regular configurations of a more restricted type. Finally, in Section 9, we present some conclusions.

2. Notation and Preliminaries

For a positive integer n, we define [n] := {1, …, n}. We write F_q to denote the (unique) finite field of size q. For two vectors a = (a_1, …, a_m) and b = (b_1, …, b_m) in the vector space V = F_q^m, and for a k × m matrix M = (M_{i,j}) with entries in F_q, define the dot product a · b := a_1 b_1 + ⋯ + a_m b_m; define M · a := (M^{(1)} · a, …, M^{(k)} · a), where M^{(i)} denotes the i-th row of M; and define a · M := (a · M_1, …, a · M_m), where M_j denotes the j-th column of M.
We define the span ⟨U_1, …, U_n⟩ of subspaces U_1, …, U_n of an ambient vector space V as the collection of all sums u_1 + ⋯ + u_n with u_i ∈ U_i for i ∈ [n]. (In other works, the span is sometimes denoted as U_1 + ⋯ + U_n.) We simply denote the span of the vectors u_1, …, u_n in V by ⟨u_1, …, u_n⟩. We say that subspaces U_1, …, U_n of a vector space V are independent if dim⟨U_1, …, U_n⟩ = dim U_1 + ⋯ + dim U_n, where dim V denotes the dimension of a vector space V.
We repeatedly use Grassmann’s identity, which states that for vector spaces U , V we have
dim(U ∩ V) + dim⟨U, V⟩ = dim U + dim V.
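As an illustration, Grassmann's identity gives the dimension of an intersection from three rank computations. A small sketch over F_2, with vectors encoded as integer bitmasks (the encoding and the helper name `rank_gf2` are ours):

```python
def rank_gf2(vectors):
    """Rank over F_2 of a list of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)   # reduce v against the current basis
        if v:
            basis.append(v)
    return len(basis)

# U = <e1, e2>, V = <e2, e3> inside F_2^3.
U = [0b100, 0b010]
V = [0b010, 0b001]
dim_span = rank_gf2(U + V)                        # dim <U, V>
dim_meet = rank_gf2(U) + rank_gf2(V) - dim_span   # Grassmann's identity
print(dim_span, dim_meet)  # 3 1
```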
We need various notions from coding theory. For reference, see, e.g., [54].
The support supp(v) of a vector v ∈ F_q^n is the collection of positions i ∈ {1, …, n} for which v_i ≠ 0; the (Hamming) weight w(v) of v is the number of positions i ∈ {1, …, n} for which v_i ≠ 0, that is, w(v) = |supp(v)|. The (Hamming) distance d(v, w) between v, w ∈ F_q^n is the number of positions i ∈ {1, …, n} for which v_i ≠ w_i. Note that d(v, w) = w(v − w).
A code C of length n over F_q is just a subset of F_q^n; the code C is called linear if C is a subspace of F_q^n. We often refer to the vectors contained in a code as codewords. The minimum weight w(C) of a code C is the smallest weight of a nonzero codeword from C, and the minimum distance d(C) of C is the smallest distance between two distinct codewords from C. Note that if the code C is linear, then d(C) = w(C). We often refer to a linear code C of length n, dimension k, and minimum distance d over F_q as an [n, k]_q code or as an [n, k, d]_q code; we simply write [n, k] or [n, k, d] if the intended field is clear from the context.
A generator matrix for an [n, k]_q code C is a k × n matrix G over F_q with rank k and with its rowspace equal to C, that is, C consists of the vectors a · G with a ∈ F_q^k. An (n − k) × n matrix H is a parity-check matrix for C if H has rank n − k and c ∈ C if and only if H · c = 0. The dual code C^⊥ of C is the collection of all vectors x for which x · c = 0 for all c ∈ C. It is not difficult to see that C^⊥ is an [n, n − k] code, and has generator matrix H and parity-check matrix G, see also [54] (Chapter 11).
Finally, we need some notions related to MDS codes. As a general reference for this material, see [54] (Chapter 11). The Singleton bound states that an [n, k, d]_q code satisfies d ≤ n − k + 1. For a proof, see, e.g., [54] (Chapter 1, Theorem 11), or see [55] (Theorem 4.1) for a generalization to non-linear codes. An [n, k, n − k + 1]_q code, that is, a linear code that attains the Singleton bound, is called an MDS code. A related notion is that of an arc, a collection of nonzero vectors in F_q^k with the property that any k of them are independent. (Usually, an arc is defined projectively, that is, as a set of points in PG(k − 1, q), but for our purposes, this will do.) We say that a k × n matrix M represents an n-arc if the columns of M constitute an n-arc (i.e., an arc of size n) in F_q^k; alternatively, we refer to such a matrix as an MDS-generator. (The term MDS matrix comes from cryptography and is commonly reserved for a matrix M for which [I M] is an MDS-generator.) Consider an [n, k]_q code C, with generator matrix G and parity-check matrix H. Obviously, if H has n − k columns that are dependent, then C has a nonzero codeword of weight at most n − k. Therefore, C is MDS if and only if the columns of H form an n-arc. Moreover, if G has k columns that are dependent, then there exists a ∈ F_q^k with a ≠ 0 such that the codeword c = a · G is nonzero but has a 0 in the corresponding positions, so that 0 < w(c) ≤ n − k and C is not MDS. Hence, C is MDS if and only if the columns of G form an n-arc, that is, if and only if its generator matrix (or parity-check matrix) is an MDS-generator. In particular, C is MDS if and only if C^⊥ is MDS [56] and [57] (Lemma 6.7, p. 245).
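The arc characterization above translates directly into a brute-force MDS test: a linear code is MDS if and only if every k columns of its generator matrix are independent. A binary sketch (helper names are ours):

```python
from itertools import combinations

def rank_gf2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def is_mds_generator_gf2(columns, k):
    """True iff the given F_2 columns form an n-arc, i.e. every k of them are independent."""
    return all(rank_gf2(list(sub)) == k for sub in combinations(columns, k))

# Columns of [I_3 | 1]: a 4-arc in F_2^3, so the [4, 3, 2] even-weight code is MDS.
print(is_mds_generator_gf2([0b100, 0b010, 0b001, 0b111], 3))  # True
# Appending the column e1 + e2 destroys the arc property.
print(is_mds_generator_gf2([0b100, 0b010, 0b001, 0b111, 0b110], 3))  # False
```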
Note that F_q^k itself, the repetition codes with parameters [n, 1, n]_q, and their duals, the codes with parameters [n, n − 1, 2]_q (called even-weight codes when q = 2), are all MDS codes. For k ≥ 2, let m(k, q) denote the largest n for which an [n, k, n − k + 1]_q MDS code exists. The famous MDS conjecture, proven by Simeon Ball for the case where q is prime in [58], claims that
m(k, q) = q + 1 for 2 ≤ k < q, and m(k, q) = k + 1 for k ≥ q, (2)
except that when q is even,
m(3, q) = m(q − 1, q) = q + 2. (3)
For k ≥ q, it was shown in [59] that m(k, q) = k + 1, and that a [k + 1, k]_q MDS code is equivalent to the dual of the repetition code, see also [54] (Corollary 7). It is well known that m(k, q) is at least equal to the stated values in (2) and (3). Indeed, we already mentioned that
[ 1 0 ⋯ 0 1 ]
[ 0 1 ⋯ 0 1 ]
[ ⋮ ⋮ ⋱ ⋮ ⋮ ]
[ 0 0 ⋯ 1 1 ]
is an MDS-generator for all k; the corresponding linear code for q = 2 is called the even-weight code. Furthermore, let α_1, …, α_{q−1} be the non-zero elements of F_q. If k ≤ q − 1, then
[ 1         ⋯ 1             1 0 ]
[ α_1       ⋯ α_{q−1}       0 0 ]
[ ⋮             ⋮           ⋮ ⋮ ]
[ α_1^{k−2} ⋯ α_{q−1}^{k−2} 0 0 ]
[ α_1^{k−1} ⋯ α_{q−1}^{k−1} 0 1 ]
is a k × ( q + 1 ) MDS-generator; moreover, if q is even, then
[ 1     ⋯ 1         1 0 0 ]
[ α_1   ⋯ α_{q−1}   0 1 0 ]
[ α_1^2 ⋯ α_{q−1}^2 0 0 1 ]
is a 3 × (q + 2) MDS-generator. The corresponding codes are referred to as (Generalized) Reed–Solomon codes. In fact, for any k with 1 ≤ k ≤ q + 1 such that q is even or k is odd, there exists a [q + 1, k, q − k + 2] cyclic MDS code over F_q [60] (this corrects an erroneous claim in [54]). For a reference for the above claims, see, e.g., [54] (Chapter 11, Sections 5–7).
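The (q + 1)-column construction above can be verified directly for small parameters: build the matrix over a prime field and check that every k of its columns are independent. A Python sketch (helper names are ours; q is taken prime so that inverses come from Fermat's little theorem):

```python
from itertools import combinations

def rank_mod_p(mat, p):
    """Row rank of a matrix over the prime field F_p (Gaussian elimination)."""
    m = [[x % p for x in row] for row in mat]
    rank = 0
    for col in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if piv is None:
            continue
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][col], p - 2, p)          # inverse via Fermat's little theorem
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank

def extended_rs_generator(k, p):
    """k x (p + 1) matrix: columns (1, a, ..., a^(k-1)) for nonzero a in F_p,
    plus the columns (1, 0, ..., 0) and (0, ..., 0, 1)."""
    cols = [[pow(a, i, p) for i in range(k)] for a in range(1, p)]
    cols += [[1] + [0] * (k - 1), [0] * (k - 1) + [1]]
    return [[c[i] for c in cols] for i in range(k)]   # transpose to k x (p+1)

# Check the arc property for k = 3, p = 5: every 3 of the 6 columns are independent.
k, p = 3, 5
G = extended_rs_generator(k, p)
columns = list(zip(*G))
ok = all(rank_mod_p([list(row) for row in zip(*sub)], p) == k
         for sub in combinations(columns, k))
print(ok)  # True
```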

3. Linear Storage Codes

In this paper, we adhere to the vector space view ([33,41,48,51,53,61,62,63]) on linear storage codes. Informally, a storage code with symbol alphabet F q is called linear if the four processes of data storage, data recovery, the generation of repair data from the helper nodes, and the generation of the replacement data from the repair data, are all linear operations over F q [29]. It turns out that in that case, the storage code can be described in terms of subspaces of an ambient vector space over F q referred to as the message space. In the description below, we will follow a similar approach as in [49,50]. We first need a few definitions.
Definition 1.
We say that the subspaces U_1, …, U_k of a vector space V form a recovery set for V if V = ⟨U_1, …, U_k⟩.
Definition 2.
We say that a subspace U_0 of a vector space V can be obtained from subspaces U_1, …, U_r of V by β-repair, written as
U_1, …, U_r →_β U_0,
if there are β-dimensional helper subspaces H_j ⊆ U_j (j ∈ [r]) such that U_0 ⊆ ⟨H_j | j ∈ [r]⟩.
We can now present a formal definition of a Linear Regenerating Code (LRGC) in terms of vector spaces, which can be seen as a "basis-free" representation of a linear storage code. To understand the definition, think of the data that is stored by the storage code as being represented by a vector x in the ambient vector space V = F_q^m, referred to as the message space of the code. Then for every subspace W of V that occurs in the definition, choose a fixed basis w_1, …, w_t, and think of W as representing the t data symbols x · w_1, …, x · w_t.
Definition 3.
Let m, n, k, r, α, β be integers for which 1 < k ≤ r < n and β ≤ α ≤ rβ. A linear storage code with parameters {m, (n, k), (r, α, β)}_q consists of an ambient m-dimensional vector space V over F_q together with a collection S of sequences σ = U_1, …, U_n of α-dimensional subspaces U_1, …, U_n of V, referred to as coding states of the storage code, with the following properties.
(i) (Data recovery) Every k subspaces in a coding state σ ∈ S constitute a recovery set for V. Moreover, we will assume that k is minimal with respect to this property.
(ii) (Repair) For every i ∈ [n] and for every J ⊆ [n] \ {i} with |J| = r, there is a subspace U_i′ of V with (U_j)_{j∈J} →_β U_i′ for which σ′ := U_1, …, U_{i−1}, U_i′, U_{i+1}, …, U_n is again a coding state in S.
For future use, we introduce some additional terminology.
Definition 4.
We refer to the collection of all the α-dimensional subspaces of V that occur in some coding state in S as the coding spaces of the linear storage code S .
A subsequence π = U_1, …, U_{i−1}, U_{i+1}, …, U_n (i ∈ [n]) of a state σ = U_1, …, U_n ∈ S will be referred to as a protostate of the storage code S.
So to actually employ the collection S as in Definition 3 as a storage code, think of the stored data as a vector x ∈ V (or as a linear functional, that is, as an element of the dual V^* of V mapping a ∈ V to x · a ∈ F_q as in [64]). Then, for every coding space U involved in S, choose a fixed m × α matrix U = U(U) with columnspace equal to U; now, if U is the coding space associated with a particular storage node, then we let this node store the α symbols of the vector c(U, x) := x · U. Note that if u is any vector in U, with u = U · a, say, then x · u = (x · U) · a, so for every u ∈ U, we can compute x · u from the stored vector x · U. Similarly, for a repair subspace H contained in a helper node with associated coding space U during repair, we choose a fixed m × β matrix H = H(H) with columnspace equal to H, and let this (helper) node contribute the β symbols x · H. The code associated with a coding state σ = U_1, …, U_n is the collection C_σ of all words c(x) in F_q^{nα} obtained as the concatenation of the words c(U_i, x) for i ∈ [n] when x ranges over V. Note that C_σ is an [nα, m]_q code with m × nα generator matrix
G(σ) = [ U_1 ⋯ U_n ],
where U i is a matrix with columnspace U i , for all i. It is not difficult to verify that the family of codes C σ associated with states σ from a storage code S as in Definition 3 indeed has the desired repair properties when used in this way to store data. Note that the resulting functional-repair (FR) storage code is exact-repair precisely when the code consists of a single coding state. In the case where the storage code is FR, at any time every storage node must “know” its associated coding space. The extra overhead that this entails can be relatively small if the code is used to store a large number of data vectors simultaneously. For further details, we refer to [49,50]. The next example illustrates the above.
Example 1
(See also [48] (Example 2.2), [49] (Example 2.6), and [50] (Example 2.7)). We will construct a binary linear functional-repair storage code S with parameters {m, (n, k), (r, α, β)}_q = {5, (4, 3), (3, 2, 1)}_2 (representing the smallest non-MSR/MBR extreme point of the achievable cut-set region). So let V be a 5-dimensional vector space over F_2. A set of three 2-dimensional subspaces {U_1, U_2, U_3} of V is said to be (3, 2)-regular if any two of them are independent and ⟨U_1, U_2, U_3⟩ = V (this was called (3, 1)-good in the cited papers). It is easily verified that if {U_1, U_2, U_3} is (3, 2)-regular, then there are nonzero vectors a_i ∈ U_i (i = 1, 2, 3) such that a_1 + a_2 + a_3 = 0; as a consequence, there is a basis e_1, e_2, e_3, a_1, a_2 for V such that U_i = ⟨e_i, a_i⟩ (i = 1, 2, 3). It is easily checked that with U_4 := ⟨e_1 + e_2, e_1 + e_3⟩, any subset of {U_1, U_2, U_3, U_4} of size 3 is (3, 2)-regular. As a consequence, the collection of all states σ = U_1, U_2, U_3, U_4 for which any three of the spaces form a (3, 2)-regular collection is a linear storage code with the parameters as specified. Note that there are coding states that are unreachable, that is, not obtainable by repair from a protostate; for example, states of the form σ = U_1, U_2, U_3, U_4 with U_i = ⟨e_i, a_i⟩ (i = 1, 2, 3) and with U_4 = ⟨e_1 + e_2, e_1 + e_3 + a_1⟩; obviously, such states can be freely deleted from the code.
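The regularity claims in Example 1 can be confirmed by a direct computation over F_2, encoding vectors in the basis e_1, e_2, e_3, a_1, a_2 as bitmasks (the encoding and helper names are ours):

```python
from itertools import combinations

def rank_gf2(vectors):
    """Rank over F_2 of vectors encoded as integer bitmasks."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)

def regular_32(triple):
    """(3,2)-regular: any two of the three 2-dim subspaces are independent
    (their span has dimension 4) and all three together span the 5-dim space."""
    A, B, C = triple
    pairwise = all(rank_gf2(X + Y) == 4 for X, Y in combinations((A, B, C), 2))
    return pairwise and rank_gf2(A + B + C) == 5

# Basis e1, e2, e3, a1, a2 of V = F_2^5; a3 := a1 + a2, so a1 + a2 + a3 = 0.
e1, e2, e3, a1, a2 = 0b10000, 0b01000, 0b00100, 0b00010, 0b00001
a3 = a1 ^ a2
U1, U2, U3 = [e1, a1], [e2, a2], [e3, a3]
U4 = [e1 ^ e2, e1 ^ e3]
print(all(regular_32(t) for t in combinations([U1, U2, U3, U4], 3)))  # True
```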

4. The Cut-Set Bound Revisited

Suppose that the DSS employs an {m, (n, k), (r, α, β)} storage code. Since k is assumed to be minimal and any r nodes can regenerate the stored information, we have n − 1 ≥ r ≥ k. (Indeed, to see this, choose an arbitrary set of r helper nodes, and one by one destroy and repair all the other nodes, employing these helper nodes for each repair. Then the information contained in the system is just the information that is contained in these r helper nodes.) Note also that, obviously, m/k ≤ α (since any k nodes regenerate the stored information), α ≤ rβ (since r helper nodes, each contributing an amount β of information, can create a replacement node), and β ≤ α (since α is the maximum amount that can be contributed by a helper node). Finally, let s be an integer such that (s − 1)β < α ≤ sβ, or such that β ≤ α ≤ sβ if s = r − k + 1; therefore, we may assume that r − k + 1 ≤ s ≤ r. We let ᾱ := α/m and β̄ := β/m denote the normalized storage capacity and transport capacity, respectively. Our aim is to provide a quick and informal derivation of the cut-set bound for RGCs and to establish a few simple properties of optimal codes that seem to have gone unobserved. First, we show the following.
Lemma 1
(Cut-set bound). Let m, n, k, r be positive integers with n − 1 ≥ r ≥ k, and let α, β be positive real numbers with β ≤ α ≤ rβ. Let s be an integer such that (s − 1)β < α ≤ sβ if s ∈ {r − k + 2, …, r}, or such that β ≤ α ≤ (r − k + 1)β if s = r − k + 1. A storage code with parameters {m, (n, k), (r, α, β)} satisfies
m ≤ ∑_{i=0}^{k−1} min(α, (r − i)β) = (r − s + 1)α + ((s − 1) + ⋯ + (r − k + 1))β. (7)
Moreover, in the case of equality in (7), we have the following.
  • Any r − s + 1 nodes, together, contain an amount of information (r − s + 1)α, that is, these nodes carry independent information.
  • Any two nodes carry an amount of information of at least 2α if s < r, or α + (k − 1)β if s = r. Therefore, if k = 1, then every node carries the stored information, so the code is essentially a repetition code, but if k ≥ 2, then no two nodes carry identical information.
  • If, in addition, we have r = k, then for any J ⊆ [n] with size |J| ≤ k, the information I(N_j : j ∈ J) contained in any collection of storage nodes N_j with j ∈ J satisfies
    I(N_j : j ∈ J) = |J|α
    if |J| ≤ r − s + 1, and
    I(N_j : j ∈ J) ≥ (r − s + 1)α + ((s − 1) + ⋯ + (s − t))β
    if |J| = r − s + 1 + t with 1 ≤ t ≤ k − r + s − 1.
Proof. 
Assume that nodes N_1, …, N_n store the file, and that each k nodes regenerate the stored file, with every node storing α symbols. Consider nodes N_1, …, N_{r+1}. Pretend that nodes N_{r−s+2}, …, N_k fail in turn, and are replaced by newcomer nodes N′_{r−s+2}, …, N′_k, with none of the nodes N_{r+2}, …, N_n ever participating in a repair. Assume that for i = 1, …, k − r + s − 1, the lost node N_{r−s+1+i} is replaced by newcomer node N′_{r−s+1+i}, which receives an amount β of information from each node contained in the set of r helper nodes consisting of the old nodes N_1, …, N_{r−s+1}, the new nodes N′_{r−s+2}, …, N′_{r−s+i}, and the old nodes N_{r−s+2+i}, …, N_{r+1}. Now consider the sequence K of k nodes defined by K := N_1, …, N_{r−s+1}, N′_{r−s+2}, …, N′_k. The first r − s + 1 nodes N_1, …, N_{r−s+1} in K contain an amount of information that is at most equal to (r − s + 1)α. And for i = 1, …, k − r + s − 1, the information in N′_{r−s+1+i} that is not already contained in the preceding nodes N_1, …, N_{r−s+1}, N′_{r−s+2}, …, N′_{r−s+i} in K is the information obtained from N_{r−s+2+i}, …, N_{r+1}, so is at most equal to (s − i)β. As a consequence, the amount of information contained in K is at most equal to (r − s + 1)α + ((s − 1) + (s − 2) + ⋯ + (r − k + 1))β, and since any k nodes should be able to regenerate the stored information, we conclude that (7) holds. Moreover, we conclude that if the bound (7) holds with equality, then the nodes N_1, …, N_{r−s+1} in K, together, contain an amount (r − s + 1)α of information, and, in addition, each newcomer node N′_{r−s+1+i} contributes a further amount (s − i)β of information that is independent of the information already present in the preceding nodes in K.
By keeping track of which of the nodes among N r − s + 2 , … , N r + 1 contributed the various pieces of information during the above repair process, we see that node N r − s + 2 + i for i = 1 , … , k − r + s − 2 contributes an independent amount of information i β , and the nodes N k + 1 , … , N r + 1 each contribute an independent amount ( k − r + s − 1 ) β . Also note that the sequence of nodes N 1 , … , N k , as well as their order, is arbitrary, and nodes N 1 and N r + 1 form an arbitrary pair of nodes. Now, if s < r , then r − s + 1 ≥ 2 and we already showed that any r − s + 1 nodes, together, contain at least an amount of 2 α of information; and if s = r then nodes N 1 and N r + 1 , together, contain at least an amount of α + ( k − r + s − 1 ) β = α + ( k − 1 ) β of information. Obviously, in the case where k = 1 , every node carries the same information, so the code is essentially a repetition code. Finally, in the case where r = k , by considering the sequence of nodes N 1 , … , N r − s + 1 , N r + 1 , … , N r − s + 3 , we see that the last claim in the lemma holds.  □
Definition 5.
We say that a Regenerating Code (RGC) with parameters { m , ( n , k ) , ( r , α , β ) } is optimal if the bound (1) is attained with equality, and if, moreover, lowering α or β results in violation of this bound.
Note that if α ( r k + 1 ) β , then (7) reads as m k α . In that case, if the code is optimal, then according to Definition 5, we must have α = m / k and β = α / ( r k + 1 ) .
It is not difficult to see that in terms of the normalized parameters α ¯ : = α / m and β ¯ : = β / m , we have the following. For s { r k + 1 , , r } , define
m k , r , s : = ( r s + 1 ) s + ( s 1 ) + + ( r k + 1 ) = ( r s + 1 ) s + s 2 r k + 1 2 ,
and set
α ¯ s : = s / m k , r , s , β ¯ s : = 1 / m k , r , s .
Then the achievable cut-set region, the region of all pairs ( α ¯ , β ¯ ) that can be realized by tuples ( m , k , r , α , β ) for which m ≤ k α , β ≤ α ≤ r β , and for which (7) holds with s as defined above, has extreme points ( α ¯ s , β ¯ s ) for s = r − k + 1 , … , r , and is further bounded by the half-lines α ¯ = 1 / k = α ¯ r − k + 1 , β ¯ ≥ α ¯ / ( r − k + 1 ) = β ¯ r − k + 1 and r β ¯ = α ¯ , β ¯ ≥ 2 / ( 2 r k − k 2 + k ) = β ¯ r ; see Figure 1 in Section 1.
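The extreme points are easy to tabulate; the following sketch (the function names are ours, not from the paper) computes m k , r , s and the normalized pairs ( α ¯ s , β ¯ s ) = ( s / m k , r , s , 1 / m k , r , s ) for a given ( k , r ) :

```python
# Tabulating the extreme points of the cut-set region (function names are
# ours, not from the paper): for s = r-k+1, ..., r we compute
# m_{k,r,s} = (r-s+1)s + (s-1) + ... + (r-k+1) and the normalized pair
# (alpha_bar_s, beta_bar_s) = (s/m_{k,r,s}, 1/m_{k,r,s}).
from fractions import Fraction

def m_krs(k, r, s):
    assert r - k + 1 <= s <= r
    return (r - s + 1) * s + sum(range(r - k + 1, s))

def extreme_points(k, r):
    return [(Fraction(s, m_krs(k, r, s)), Fraction(1, m_krs(k, r, s)))
            for s in range(r - k + 1, r + 1)]

pts = extreme_points(3, 3)   # k = r = 3
print(pts[0])    # the MSR point (s = 1): alpha_bar = 1/3, beta_bar = 1/3
print(pts[-1])   # the MBR point (s = 3): alpha_bar = 1/2, beta_bar = 1/6
```

For k = r = 3 this reproduces the MSR point ( 1 / 3 , 1 / 3 ) and the MBR point ( 1 / 2 , 1 / 6 ) , since m 3 , 3 , 1 = 3 and m 3 , 3 , 3 = 6 .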
We sometimes refer to the extreme points ( α ¯ s , β ¯ s ) ( s = r k + 1 , , r ) as the corner points of the achievable region. The corner points ( α ¯ r k + 1 , β ¯ r k + 1 ) and ( α ¯ r , β ¯ r ) are known as the MSR point and the MBR point, respectively (note that these points are equal if and only if k = 1 ).
Definition 6.
We say that an RGC with parameters { m , ( n , k ) , ( r , α , β ) } q attains a corner point of the achievable cut-set region if the pair ( α / m , β / m ) equals one of the pairs ( α ¯ s , β ¯ s ) with s ∈ { r − k + 1 , … , r } . An RGC that attains the MSR point or the MBR point is referred to as an MSR code or an MBR code, respectively.
Remark 1.
The result in (9) may well hold also for optimal storage codes where r > k , but we have no proof and no counterexample.
Remark 2.
There are cases of optimal codes where (9) is not satisfied with equality. Consider an MBR code with α = r = k = 3 , n = 4 , and m = 3 + 2 + 1 = 6 . The “standard” code has coding spaces U i = ⟨ e { i , j } ∣ j ∈ [ 4 ] , j ≠ i ⟩ , where the C ( 4 , 2 ) = 6 vectors e { i , j } with 1 ≤ i < j ≤ 4 form a basis. This code satisfies (8) and (9) with equality.
Now, let U 1 = e 1 , e 2 , e 3 , U 2 = e 4 , e 5 , e 6 , U 3 = e 1 + e 4 , e 2 , e 6 , and U 4 = e 2 + e 6 , e 3 , e 5 . Note that U 4 can be obtained by repair from U 1 (use e 3 ), U 2 (use e 5 ), and U 3 (use e 2 + e 6 ). Now any two coding spaces span at least a 5-space, and any three span a 6-space, but U 1 , U 2 are independent.
This example shows that in a coding state, (9) is not necessarily satisfied with equality. But note that this example can only represent an unreachable state in a storage code with these parameters, since once we have a protostate with no two spaces disjoint, then the new space has a repair vector in common with each of the other coding spaces.

5. ( k , r , s , β ) -Regular Configurations

In this section, let n , k , r be integers with n − 1 ≥ r ≥ k ≥ 1 , let s be an integer with r − k + 1 ≤ s ≤ r , and let m k , r , s be as defined in (10). Moreover, let β be a positive integer and let α : = s β . Motivated by the results from the previous section—notably, by (8) and (9)—and by the form of the “small” storage codes from [48,49] (see also [50]), we introduce and investigate the following notion.
Definition 7.
Let V be a vector space with dim V = m k , r , s β , and let U 1 , , U n be α-dimensional subspaces of V. We say that the collection { U 1 , , U n } is  ( k , r , s , β ) -regular in V if α = s β and, for every integer t with 0 t k ( r s + 1 ) and for every J [ n ] with | J | = r s + 1 + t , we have dim U j j J = d t β , where 
d t : = d t r , s : = ( r s + 1 ) s + i = 1 t ( s i ) = ( r s + 1 ) s + ( s 1 ) + + ( s t ) .
In addition, we say that { U 1 , , U n } is  ( k , r , s ) -regular if it is ( k , r , s , β ) -regular with β = 1 , and ( r , s ) -regular if it is ( k , r , s ) -regular with k = r . We will write 
m r , s : = m r , r , s = ( r s + 1 ) s + ( s 1 ) + + 1
to denote the dimension of the ambient space of an ( r , s ) -regular collection.
Note that Definition 7 requires, in particular, that any r s + 1 of the vector spaces in a ( k , r , s , β ) -regular collection are independent, and that any k of the vector spaces span V. Our aim in the remainder of this section is to study the properties of the numbers m k , r , s defined in (10), and to describe a construction of ( k , r , s ) -regular collections (and, hence, of ( k , r , s , β ) -regular configurations for all integers β ). To that end, we need the following.
Lemma 2.
For i [ s ] , define m i : = min ( r s + i , k ) . Then r s + 1 = m 1 m s = k . Let t be an integer with 0 t k ( r s + 1 ) , and set u : = ( r s + 1 ) + t . Then r s + 1 u k and
d t = ( r s + 1 ) s + ( s 1 ) + + ( s t ) = i = 1 s min ( m i , u ) .
In particular, for m k , r , s as defined in (10), we have
m k , r , s = d s ( r k + 1 ) = m 1 + m 2 + + m s .
Proof. 
Since r k + 1 s , we have r s + 1 k , hence m 1 = r s + 1 . Also, m s = min ( r , k ) = k . Obviously, m i m j if i < j . Therefore, the first claim follows immediately. Since u = ( r s + 1 ) + t k , we have min ( m i , u ) = min ( r s + i , u ) , so we have
i = 1 s min ( m i , u ) = ( r s + 1 ) + + ( r s + t ) + ( s t ) u = ( r s + 1 ) t + 0 + 1 + + ( t 1 ) + ( s t ) ( r s + 1 ) + ( s t ) t = ( r s + 1 ) s + 0 + 1 + 2 + + ( t 1 ) + t ( s t ) = ( r s + 1 ) s + ( s t ) + + ( s 1 ) = d t .
Taking t = k ( r s + 1 ) , we have u = k m i for all i, and we find that m k , r , s = d k ( r s + 1 ) = i = 1 s min ( m i , k ) = i = 1 s m i .  □
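The identity in Lemma 2 is also easy to confirm numerically; the following sketch (helper names are ours) verifies it for every admissible ( k , r , s , t ) with r ≤ 6 :

```python
# Brute-force check of the identity in Lemma 2 (helper names are ours):
# d_t = (r-s+1)s + (s-1) + ... + (s-t) equals sum_{i=1}^{s} min(m_i, u)
# with m_i = min(r-s+i, k) and u = (r-s+1) + t.
def d_t(r, s, t):
    return (r - s + 1) * s + sum(s - i for i in range(1, t + 1))

def min_sum(k, r, s, t):
    u = (r - s + 1) + t
    return sum(min(min(r - s + i, k), u) for i in range(1, s + 1))

for k in range(1, 7):
    for r in range(k, 7):
        for s in range(r - k + 1, r + 1):
            for t in range(0, k - (r - s + 1) + 1):
                assert d_t(r, s, t) == min_sum(k, r, s, t)
print("Lemma 2 identity verified for all parameters with r <= 6")
```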
Now, to construct a ( k , r , s ) -regular configuration of size n ≤ r + 1 , we proceed as follows. For i ∈ [ s ] , let M i be an m i × r MDS-generator over a sufficiently large field F q , and let M : = diag ( M 1 , … , M s ) . Now let U j : = ⟨ M 1 ( j ) , M 2 ( j ) , … , M s ( j ) ⟩ , where M i ( j ) denotes the j-th column of M i . Also, write V i = F q m i and let V : = V 1 ⊕ ⋯ ⊕ V s = ⟨ V 1 , … , V s ⟩ , where we identify V i with the subspace V i : = { 0 } × ⋯ × { 0 } × V i × { 0 } × ⋯ × { 0 } of V. Note that dim U j = s ( j ∈ [ n ] ), and, by Lemma 2, we have that dim V = m k , r , s .
Theorem 1.
Given the above definitions, σ : = { U 1 , , U n } is ( k , r , s ) -regular, and σ can be constructed from a generator matrix of an [ r , k , r k + 1 ] q MDS code (that is, from a k × r MDS-generator).
Proof. 
We begin by remarking that since m i ≤ k ≤ r and m s = k , the matrices M i can indeed be constructed if the field size q is large enough. Indeed, the matrices M 1 , … , M s − 1 can be constructed from a matrix M s by deleting some rows (for example, from a Vandermonde-type generator), and since m s = k , such a matrix exists if and only if there exists an [ r , k , r − k + 1 ] q MDS code. Note that for i ∈ [ s ] , the columns of M i are in V i ; hence, the corresponding columns in M are in V i . Next, consider the span of a collection { U i ∣ i ∈ I } , where | I | = u . For each i ∈ [ s ] , this span contains u vectors from V i , which correspond to u columns from M i , and the MDS property of M i implies that the dimension of their span is equal to min ( m i , u ) . Therefore, with u : = ( r − s + 1 ) + t , according to Lemma 2, the dimension of the span in V is equal to ∑ i = 1 s min ( m i , u ) = d t , as required. In particular, for t : = k − ( r − s + 1 ) , we have m : = dim V = d t = m k , r , s .  □
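This construction can be carried out explicitly with Vandermonde matrices as the MDS generators. The sketch below (our own code, for the case k = r = 3 , s = 2 over F 7 ) builds the subspaces U j and verifies the required sub-span dimensions d t by Gaussian elimination mod 7:

```python
# A concrete instance of the construction in Theorem 1 (all helper names are
# ours): k = r = 3, s = 2, so m_1 = 2, m_2 = 3 and m_{3,3,2} = 5. We take
# Vandermonde matrices over F_7 as the MDS generators M_i and verify the
# sub-span dimensions d_t of Definition 7 by Gaussian elimination mod 7.
from itertools import combinations

P = 7  # field size; q >= r suffices for Vandermonde-type generators

def rank_mod_p(rows, p=P):
    # row rank over F_p via Gauss-Jordan elimination
    rows = [list(row) for row in rows]
    rank, col, ncols = 0, 0, len(rows[0]) if rows else 0
    while rank < len(rows) and col < ncols:
        piv = next((i for i in range(rank, len(rows)) if rows[i][col] % p), None)
        if piv is None:
            col += 1
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col] % p:
                f = rows[i][col]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank, col = rank + 1, col + 1
    return rank

r, s = 3, 2
m_i = [min(r - s + i, r) for i in range(1, s + 1)]   # block sizes [2, 3]
m = sum(m_i)                                          # ambient dimension 5
# M_i is an m_i x r Vandermonde generator: any m_i of its columns are independent
M = [[[pow(x, e, P) for x in range(1, r + 1)] for e in range(mi)] for mi in m_i]

def U(j):  # generators of U_{j+1}: column j of each M_i, embedded in F_7^m
    vecs, off = [], 0
    for i, mi in enumerate(m_i):
        v = [0] * m
        for e in range(mi):
            v[off + e] = M[i][e][j]
        vecs.append(v)
        off += mi
    return vecs

d = lambda t: (r - s + 1) * s + sum(s - i for i in range(1, t + 1))
for t in range(0, r - (r - s + 1) + 1):
    for J in combinations(range(r), r - s + 1 + t):
        assert rank_mod_p([v for j in J for v in U(j)]) == d(t)
print("(3, 2)-regular configuration of size", r, "verified in dimension", m)
```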
The above suggests investigating storage codes with parameters { m k , r , s , ( n , k ) , ( r , s , 1 ) } and with coding states that are ( k , r , s ) -regular. This is the subject of Section 7 and Section 8 for the case where k = r . We note that not every such coding state is reachable by repair; see Example 2 below.
Example 2.
Let U 1 : = ⟨ a 1 , b 1 ⟩ , U 2 : = ⟨ a 2 , b 2 ⟩ , U 3 : = ⟨ a 3 , b 1 + b 2 ⟩ , and U 4 : = ⟨ a 1 − a 2 − a 3 , b 1 − b 2 ⟩ , where V : = ⟨ a 1 , a 2 , a 3 , b 1 , b 2 ⟩ has dimension 5. Then σ : = { U 1 , U 2 , U 3 , U 4 } is ( 3 , 2 ) -regular, but no subspace U i can be obtained from the other three subspaces U j with j ≠ i by 1-repair. Therefore, σ cannot be a reachable coding state in a { 5 , ( 4 , 3 ) , ( 3 , 2 , 1 ) } storage code. Replacing U 4 by U 4 ′ : = ⟨ a 1 − a 2 , a 1 − a 3 ⟩ yields a ( 3 , 2 ) -regular configuration that could be a reachable state in a storage code with these parameters.
In Section 8, we shall describe an alternative construction of an ( r , s ) -regular configuration. Here, we state a useful property of the numbers m k , r , s that is needed in that construction.
Lemma 3.
We have
m k , r , s = ( r − s + 1 ) s + ( s − 1 ) + ⋯ + ( r − k + 1 ) = r + m k − 1 , r − 1 , s − 1 , if s > r − k + 1 ; k s , if s = r − k + 1 ,
and hence
m k , r , s = r + ( r − 1 ) + ⋯ + ( 2 r − s − k + 2 ) + ( r − s + 1 ) ( r − k + 1 ) .
Proof. 
If r − k + 1 < s , then with r ′ : = r − 1 , k ′ : = k − 1 , s ′ : = s − 1 , we have
m k , r , s = ( r − s + 1 ) s + ( s − 1 ) + ⋯ + ( r − k + 1 ) = ( r ′ − s ′ + 1 ) s ′ + ( ( r − s + 1 ) + ( s − 1 ) ) + ( s ′ − 1 ) + ⋯ + ( r ′ − k ′ + 1 ) = r + m k ′ , r ′ , s ′ .
The last claim follows immediately from this claim by induction.  □
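Both the recursion and the closed form in Lemma 3 are quickly verified by brute force (helper names are ours):

```python
# Brute-force check of Lemma 3 (helper names ours): the recursion
# m_{k,r,s} = r + m_{k-1,r-1,s-1} for s > r-k+1, the base case
# m_{k,r,s} = k*s for s = r-k+1, and the resulting closed form.
def m_krs(k, r, s):
    return (r - s + 1) * s + sum(range(r - k + 1, s))

for k in range(1, 8):
    for r in range(k, 8):
        for s in range(r - k + 1, r + 1):
            if s == r - k + 1:
                assert m_krs(k, r, s) == k * s
            else:
                assert m_krs(k, r, s) == r + m_krs(k - 1, r - 1, s - 1)
            closed = sum(range(2 * r - s - k + 2, r + 1)) + (r - s + 1) * (r - k + 1)
            assert m_krs(k, r, s) == closed
print("Lemma 3 verified for all parameters with r <= 7")
```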

6. The Structure of an ( r , r , s , β ) -Regular Configuration

In this section, we consider the case where r = k . We begin with a result that is fundamental for what follows.
Lemma 4.
Let U 1 , , U r be subspaces of a vector space V. Define
U ¯ i : = U j j [ r ] , j i .
Suppose that H i is a subspace of U i with H i ∩ U ¯ i = { 0 } for all i ∈ [ r ] . Then, with H : = ⟨ H i ∣ i ∈ [ r ] ⟩ , we have dim H = ∑ i = 1 r dim H i , and for every J ⊆ [ r ] , we have ⟨ U j ∣ j ∈ J ⟩ ∩ H = ⟨ H j ∣ j ∈ J ⟩ .
Proof. 
Let j and t be integers with 0 ≤ j < t ≤ r . Since U 1 , … , U j , H 1 , … , H t − 1 ⊆ U ¯ t and H t ∩ U ¯ t = { 0 } , we have dim ⟨ U 1 , … , U j , H 1 , … , H t ⟩ = dim ⟨ U 1 , … , U j , H 1 , … , H t − 1 ⟩ + dim H t . Since ⟨ H 1 , … , H j ⟩ ⊆ ⟨ U 1 , … , U j ⟩ , by induction we have that
dim U 1 , , U j , H 1 , , H r = dim U 1 , , U j + i = j + 1 r dim H i .
By (16) for j = 0 , we conclude that dim H = ∑ i = 1 r dim H i , which proves the first part of the lemma. Next, let J ⊆ [ r ] with | J | = j . After renumbering the subspaces if necessary, we may assume that J = { 1 , … , j } . By (16) and Grassmann’s identity, we have
dim ( ⟨ U 1 , … , U j ⟩ ∩ H ) = dim ⟨ U 1 , … , U j ⟩ + dim H − dim ⟨ U 1 , … , U j , H ⟩ = ∑ i = 1 j dim H i .
Since H 1 , , H j U 1 , , U j and dim H 1 , , H j = i = 1 j dim H i , we conclude that U 1 , , U j H = H 1 , , H j , so the second part of the lemma follows.  □
Now assume that r, s, and β are positive integers with 1 ≤ s ≤ r and r ≥ 2 ; set α : = s β ; and let V be an m-dimensional vector space over some finite field F q with m = m r , s β , where m r , s is as defined in (13). Assume that π = { U 1 , … , U r } is ( r , r , s , β ) -regular in V. For i ∈ [ r ] , let H i be a β -dimensional subspace of U i with H i ∩ U ¯ i = { 0 } , where U ¯ i is as defined in (15), and define H : = ⟨ H 1 , … , H r ⟩ . Below, we will use these assumptions to draw a number of conclusions. First note that since π is ( r , r , s , β ) -regular, we have
dim U ¯ i = m − β
and
U ¯ i , U i = V
for all i [ r ] . By Lemma 4, H 1 , , H r are independent in H, so dim H = r β . Next, we note the following.
Lemma 5.
We have that
dim ( U ¯ 1 ∩ U ¯ 2 ∩ ⋯ ∩ U ¯ t ) = m − t β
for all t; in particular, with
V ′ : = ⋂ j = 1 r U ¯ j ,
we have dim V ′ = m − r β .
Proof. 
We use induction on t. By (17), the result certainly holds for t = 1 . Now, let t ≥ 2 , and suppose the claim holds for smaller values of t. First, we observe that since U t is contained in U ¯ 1 , … , U ¯ t − 1 , by (18), we have ⟨ U ¯ 1 ∩ ⋯ ∩ U ¯ t − 1 , U ¯ t ⟩ ⊇ ⟨ U t , U ¯ t ⟩ = V . Hence
dim ⟨ U ¯ 1 ∩ ⋯ ∩ U ¯ t − 1 , U ¯ t ⟩ = m .
By the induction hypothesis, dim ( U ¯ 1 ∩ ⋯ ∩ U ¯ t − 1 ) = m − ( t − 1 ) β , so using (17), (20), and Grassmann’s identity, we obtain
dim ( U ¯ 1 ∩ ⋯ ∩ U ¯ t ) = dim ( U ¯ 1 ∩ ⋯ ∩ U ¯ t − 1 ) + dim U ¯ t − dim ⟨ U ¯ 1 ∩ ⋯ ∩ U ¯ t − 1 , U ¯ t ⟩ = ( m − ( t − 1 ) β ) + ( m − β ) − m = m − t β .
The last claim in the lemma follows by letting t = r .  □
Lemma 6.
We have ⟨ V ′ , H ⟩ = V and V ′ ∩ H = { 0 } . (We will write this as V = V ′ ⊕ H , identifying V ′ with V ′ × { 0 } and H with { 0 } × H .)
Proof. 
We already noted that dim H = r β . Moreover, since r 2 , using Lemma 4 we have
H ∩ V ′ ⊆ ⋂ j = 1 r ( H ∩ U ¯ j ) = ⋂ j = 1 r ⟨ H i ∣ i ∈ [ r ] , i ≠ j ⟩ = { 0 } .
By Lemma 5, we have dim V ′ = m − r β , so dim V = dim V ′ + dim H , and the claimed result follows.  □
Next, for i = 1 , , r , we define
U i ′ : = U i ∩ V ′ .
Lemma 7.
For all i ∈ [ r ] , we have dim U i ′ = ( s − 1 ) β and U i = U i ′ ⊕ H i .
Proof. 
Let i [ r ] . Since U i U ¯ j for j i , we have that
U i ′ = U i ∩ V ′ = U i ∩ ( ⋂ j = 1 r U ¯ j ) = U i ∩ U ¯ i .
So by (17), (18), and Grassmann’s identity, we have
dim U i ′ = dim ( U i ∩ U ¯ i ) = dim U i + dim U ¯ i − dim ⟨ U i , U ¯ i ⟩ = s β + ( m − β ) − m = ( s − 1 ) β .
Since ⟨ U i ′ , H i ⟩ ⊆ U i , U i ′ = U i ∩ V ′ , and H ∩ U i = H i by Lemma 4 , the claimed results now follow.  □
We summarize the above result in the following theorem.
Theorem 2.
Let r, s, and β be positive integers with 2 ≤ s ≤ r and r ≥ 2 ; set α : = s β ; and let V be a vector space with m : = dim V = m r , s β , with m r , s as defined in (13).
(i) Let V ′ and H be subspaces of V for which V = ⟨ V ′ , H ⟩ and V ′ ∩ H = { 0 } (so that V = V ′ ⊕ H ), and let m ′ : = dim V ′ = m − r β = m r − 1 , s − 1 β and dim H = r β . Furthermore, let H 1 , … , H r be independent in H with dim H i = β ( i ∈ [ r ] ), and let σ = { U 1 ′ , … , U r ′ } be ( r − 1 , r − 1 , s − 1 , β ) -regular in V ′ . Then, with U i : = ⟨ U i ′ , H i ⟩ = U i ′ ⊕ H i ( i ∈ [ r ] ), we have that π = { U 1 , … , U r } is ( r , r , s , β ) -regular in V; moreover, V ′ satisfies (19), U i ′ = U i ∩ V ′ , H i ⊆ U i , and H i ∩ U ¯ i = { 0 } , where U ¯ i is as defined in (15).
(ii) Conversely, if π = { U 1 , … , U r } is ( r , r , s , β ) -regular in V, then π can be put in the form as in (i) by letting V ′ be as in (19), and, for all i ∈ [ r ] , letting U i ′ : = U i ∩ V ′ and choosing H i ⊆ U i with H i ∩ U ¯ i = { 0 } .
Proof. 
We first note that m − r β = m r − 1 , s − 1 β by Lemma 3. With d t r , s as in (12), we have d t r , s = d t r − 1 , s − 1 + ( r − s + 1 ) + t for integers t with 0 ≤ t ≤ s − 2 . Now, if U i = U i ′ ⊕ H i ( i ∈ [ r ] ), then with J ⊆ [ r ] with | J | = r − s + 1 + t and 0 ≤ t ≤ s − 1 , we have dim ⟨ U j ∣ j ∈ J ⟩ = dim ⟨ U j ′ ∣ j ∈ J ⟩ + β | J | . So for t < s − 1 , we have dim ⟨ U j ∣ j ∈ J ⟩ = d t r , s β if and only if dim ⟨ U j ′ ∣ j ∈ J ⟩ = d t r − 1 , s − 1 β , and, in addition, dim ⟨ U j ∣ j ∈ [ r ] ⟩ = dim V if and only if dim ⟨ U j ′ ∣ j ∈ [ r ] ⟩ = dim V ′ . We conclude that π is ( r , r , s , β ) -regular in V if and only if σ is ( r − 1 , r − 1 , s − 1 , β ) -regular in V ′ . This proves part (i); part (ii) follows from Lemmas 5–7.  □
The next lemma handles the case where s = 1 .
Lemma 8.
Let σ = { U 1 , … , U r + 1 } be ( r , r , 1 , β ) -regular in a vector space V with m : = dim V = m r , 1 β = r β . Then there is a basis { h i , j ∣ i ∈ [ r ] , j ∈ [ β ] } of V such that U i = ⟨ h i , j ∣ j ∈ [ β ] ⟩ for i ∈ [ r ] and U r + 1 = ⟨ h 1 , j + ⋯ + h r , j ∣ j ∈ [ β ] ⟩ . In particular, the resulting storage code is linear, exact-repair, and optimal, meeting the cut-set bound in the MSR point.
Proof. 
Since σ is ( r , r , 1 , β ) -regular, U 1 , … , U r are independent in V and every vector u ∈ U r + 1 is of the form u = u 1 + ⋯ + u r with u i ∈ U i ( i ∈ [ r ] ). Now, let h 1 , … , h β be a basis for U r + 1 , and write h j = h 1 , j + ⋯ + h r , j with h i , j ∈ U i for i ∈ [ r ] and j ∈ [ β ] . Since ⟨ U j ∣ j ∈ [ r + 1 ] , j ≠ i ⟩ = V , we conclude that U i = ⟨ h i , j ∣ j ∈ [ β ] ⟩ for all i ∈ [ r ] . Since U r + 1 = ⟨ h 1 , … , h β ⟩ , the first claim follows. It is also easily checked that a lost coding space U i can be exactly repaired from knowledge of all the vectors h t , j ( j ∈ [ β ] , t ∈ [ r + 1 ] , t ≠ i ), where we set h r + 1 , j : = h j . Since s = 1 , the resulting code is an ER MSR storage code.  □
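As a toy instance of Lemma 8 (our own choice of parameters: r = 3 , β = 2 over F 2 , so m = 6 ), the following sketch performs the exact repair of a lost node by subtracting the known components from the parity vectors:

```python
# A toy instance of Lemma 8 (our own choice of parameters): r = 3, beta = 2
# over F_2, so m = r*beta = 6. Nodes 1, 2, 3 store U_i = <h_{i,1}, h_{i,2}>,
# and node 4 stores U_4 = <h_{1,j} + h_{2,j} + h_{3,j} | j = 1, 2>.
r, beta = 3, 2
m = r * beta

def e(t):  # standard basis vector of F_2^m, as a 0/1 list
    return [1 if x == t else 0 for x in range(m)]

def add(u, v):  # vector addition over F_2 (subtraction is the same map)
    return [(a + b) % 2 for a, b in zip(u, v)]

h = {(i, j): e((i - 1) * beta + (j - 1)) for i in (1, 2, 3) for j in (1, 2)}
U4 = [add(add(h[(1, j)], h[(2, j)]), h[(3, j)]) for j in (1, 2)]

# exact repair of node 2: h_{2,j} = (h_{1,j} + h_{2,j} + h_{3,j}) - h_{1,j} - h_{3,j}
repaired = [add(add(U4[j - 1], h[(1, j)]), h[(3, j)]) for j in (1, 2)]
assert repaired == [h[(2, 1)], h[(2, 2)]]
print("node 2 exactly repaired from nodes 1, 3, and 4")
```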
The case where s = r is more complicated, as is illustrated by the example below.
Example 3.
The standard example is the following. Let dim V = β ( r + 1 ) r / 2 , let H { i , j } ( 1 ≤ i < j ≤ r + 1 ) be independent in V with dim H { i , j } = β , and let U i = ⟨ H { i , j } ∣ j ∈ [ r + 1 ] , j ≠ i ⟩ . Then σ = { U 1 , … , U r + 1 } is ( r , r , r , β ) -regular in V. But already for r = 2 and β = 1 we have a different example. Indeed, let dim V = m 2 , 2 = 3 with V = ⟨ e , a 1 , a 2 ⟩ , and let U 1 : = ⟨ e , a 1 ⟩ , U 2 : = ⟨ e , a 2 ⟩ , and U 3 : = ⟨ e , a 1 + a 2 ⟩ . Then σ = { U 1 , U 2 , U 3 } is ( 2 , 2 ) -regular.
We leave the determination of ( r , r , r , β ) -regular configurations as an open problem.

7. Main Results

In this section, we specialize to the case where β = 1 and, except in Corollary 1, also k = r . The following simple result may be of independent interest.
Lemma 9.
Let U 1 , , U r be subspaces in an m-dimensional vector space V over F q . Let h i U i ( i [ r ] ), and suppose that U 0 is a subspace of H : = h 1 , , h r with dim U 0 = α . Define C F q r to be the collection of all c F q r for which i = 1 r c i h i U 0 . If every collection { U j j J { 0 } } with J [ r ] and | J | = r α is independent, then h 1 , , h r are independent and C is an [ r , α , r α + 1 ] q MDS code.
Proof. 
Since U 0 is a subspace, the code C is linear over F q . Suppose that (after renumbering if necessary) h 1 , … , h t form a basis of H, for some t ≤ r . Let C 0 be the subcode of C consisting of all c ∈ C with supp ( c ) ⊆ [ t ] . Obviously, every u ∈ U 0 can be written as u = ∑ i = 1 t c i h i for a codeword c ∈ C 0 , and since h 1 , … , h t are independent, every such expression is unique. As a consequence, dim C 0 = dim U 0 = α . Moreover, if C 0 contains a nonzero codeword c with | supp ( c ) | ≤ r − α , then U 0 and the subspaces U j with j ∈ supp ( c ) are not independent, since the word u ∈ U 0 corresponding to the codeword c can be written as a linear combination of the vectors h j with j ∈ supp ( c ) . Therefore, C 0 is a linear code of length at most r, of dimension α , and with minimum distance at least r − α + 1 . By the Singleton bound, we conclude that t = r and C 0 has minimum distance r − α + 1 . As a consequence, h 1 , … , h r are independent and C 0 = C ; hence C is an [ r , α , r − α + 1 ] MDS code over F q .  □
Remark 3.
We note that a similar result holds if β > 1 and α = s β . As before, we can describe U 0 in terms of an [ r β , s β ] q code, with the positions partitioned into r groups of β positions each, but we can now only conclude that a nonzero codeword is nonzero in at least r − s + 1 of these groups, and so the code need not be MDS. However, by considering the code as a code of length r over the larger symbol alphabet F q β , we see that the minimum symbol-weight of this F q -linear (but possibly not F q β -linear) code of length r and size ( q β ) s is at least r − s + 1 , so the minimum symbol-distance is at least r − s + 1 . Therefore, this code meets the Singleton bound for non-linear codes [55] (Theorem 4.1), and is, again, a (non-linear) MDS code (or MDS array code). We leave further details to the interested reader.
Lemma 9 has an interesting consequence.
Corollary 1.
If there exists an optimal linear FR storage code with parameters { m , ( n , k ) , ( r , α , 1 ) } q in a corner point of the achievable cut-set region (that is, with α integer), then there exists an [ r , α , r α + 1 ] q MDS code.
Proof. 
Suppose that π = { U 1 , … , U n − 1 } is a protostate of such a code. Then we can choose helpers h i ∈ U i for i ∈ [ r ] and a subspace U 0 ⊆ H : = ⟨ h i ∣ i ∈ [ r ] ⟩ with dim U 0 = α such that σ = { U 0 , U 1 , … , U n − 1 } is a coding state of that code. By Lemma 1, any collection of subspaces U j ( j ∈ J ) with | J | = r − α + 1 is independent. Now the desired conclusion follows from Lemma 9.  □
We are now ready to state our main result. This result was announced already in [48] (Theorem 4.1), but, unfortunately, the required extra condition on the helper nodes was inadvertently omitted.
Theorem 3.
Suppose that π = { U 1 , , U r } is ( r , α ) -regular in a vector space V of dimension m = m r , α = α ( 2 r α + 1 ) / 2 over a finite field F q , and let h i U i for i [ r ] . Define U ¯ i as in (15). Then U i \ U ¯ i is nonempty for all i [ r ] . Let C F q r and let U 0 : = { c 1 h 1 + + c r h r c = ( c 1 , , c r ) C } . Then σ : = { U 0 , U 1 , , U r } is an ( r , α ) -regular extension of π if and only if h i U i \ U ¯ i for all i [ r ] and C is an [ r , α , r α + 1 ] MDS code over F q .
Proof. 
Note that by our assumption on π , we have dim U ¯ i = m 1 and dim U ¯ i , U i = dim V = m , hence U i is not contained in U ¯ i ; so U i \ U ¯ i is nonempty.
We begin by showing that the conditions on the vectors h i ( i [ r ] ) and on C are necessary. So suppose that σ is ( r , α ) -regular. First, if h i U ¯ i , then U 0 U ¯ i , hence σ \ { U i } is contained in the proper subspace U ¯ i of V, so it is not an ( r , α ) -configuration, contradicting our assumption. Hence h i U i \ U ¯ i for all i. Then by Lemma 4 with H i : = h i ( i [ r ] ), the vectors h 1 , , h r are independent. Next, let C ¯ denote the collection of all c F q r for which c i h i U 0 . Since h 1 , , h r are independent, we have C ¯ = C and by Lemma 9, we have that C ¯ , hence also C, is an [ r , α , r α + 1 ] q MDS code.
Now, we show that the conditions are also sufficient. So assume that h i ∈ U i ∖ U ¯ i for all i and that C is [ r , α , r − α + 1 ] MDS. By Lemma 4 with H i : = ⟨ h i ⟩ ( i ∈ [ r ] ), the vectors h 1 , … , h r are independent; hence dim U 0 = dim C = α . Next, let J ⊆ [ r ] ∪ { 0 } with | J | = r − α + 1 + t for some integer t with 0 ≤ t ≤ α − 1 . According to Definition 7, we have to show that dim ⟨ U j ∣ j ∈ J ⟩ = d t = ( r − α + 1 ) α + ( α − 1 ) + ⋯ + ( α − t ) . If 0 ∉ J , this holds since π is ( r , α ) -regular. So assume that J = J 0 ∪ { 0 } with J 0 ⊆ [ r ] and | J 0 | = r − α + t . Again using that π is ( r , α ) -regular, we have dim ⟨ U j ∣ j ∈ J 0 ⟩ = d t − 1 , so by Grassmann’s identity,
dim ⟨ U j ∣ j ∈ J ⟩ = d t − 1 + α − dim ( ⟨ U j ∣ j ∈ J 0 ⟩ ∩ U 0 ) ,
which is also correct for t = 0 if we set d − 1 : = ( r − α ) α . Setting H : = ⟨ h 1 , … , h r ⟩ , we have U 0 ⊆ H ; hence, using Lemma 4 and setting C 0 : = { c ∈ C ∣ supp ( c ) ⊆ J 0 } , we have
⟨ U j ∣ j ∈ J 0 ⟩ ∩ U 0 = ⟨ U j ∣ j ∈ J 0 ⟩ ∩ H ∩ U 0 = ⟨ h j ∣ j ∈ J 0 ⟩ ∩ U 0 = { ∑ j ∈ J 0 c j h j ∣ c ∈ C 0 } .
Now C is MDS and dim C = α ; hence, dim C 0 = max ( 0 , α − ( r − | J 0 | ) ) = max ( 0 , t ) = t . So combining (22) and (23), we have
dim ⟨ U j ∣ j ∈ J ⟩ = d t − 1 + α − t = d t .
Since J 0 is arbitrary, we conclude that σ is ( r , α ) -regular and of size r + 1 as claimed.  □
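One repair round of Theorem 3 can be simulated directly. The sketch below (our own small instance over F 2 , using the ( 3 , 2 ) -regular protostate U 1 = ⟨ a 1 , e 1 ⟩ , U 2 = ⟨ a 2 , e 2 ⟩ , U 3 = ⟨ a 1 + a 2 , e 3 ⟩ in F 2 5 and the [ 3 , 2 , 2 ] single-parity-check code as C ) checks that the repaired configuration is again ( 3 , 2 ) -regular:

```python
# One repair round of Theorem 3, simulated over F_2 (the instance is ours):
# ambient dimension m_{3,2} = 5, basis order (a1, a2, e1, e2, e3), and
# vectors encoded as 5-bit masks.
from itertools import combinations

def rank2(vs):  # rank over F_2 via bitmask echelon reduction
    pivots = {}
    for v in vs:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

a1, a2, e1, e2, e3 = 16, 8, 4, 2, 1
U1, U2, U3 = [a1, e1], [a2, e2], [a1 ^ a2, e3]   # a (3,2)-regular protostate

# helper vectors h_i in U_i \ <U_j | j != i>; one checks that e_i works here
h1, h2, h3 = e1, e2, e3
# C is the [3,2,2] single-parity-check code over F_2 (an MDS code); U_0 is
# spanned by c.h for the generator codewords (1,1,0) and (1,0,1) of C
U0 = [h1 ^ h2, h1 ^ h3]

sigma = [U0, U1, U2, U3]
for size, dim in ((2, 4), (3, 5)):   # d_0 = 4 and d_1 = 5 for (r, alpha) = (3, 2)
    for J in combinations(sigma, size):
        assert rank2([v for U in J for v in U]) == dim
print("the repaired configuration sigma is again (3,2)-regular")
```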
This theorem has the following important consequence.
Theorem 4.
Let F q be the finite field of size q. Suppose that there exists an [ r , α , r − α + 1 ] MDS code C over F q . Then the family of all ( r , α ) -regular configurations of size r + 1 in a vector space V of dimension m = m r , α = ( r − α + 1 ) α + ( α − 1 ) + ⋯ + 1 over F q forms the collection of coding states of an optimal linear storage code over F q with parameters { m , ( r + 1 , r ) , ( r , α , 1 ) } q . The protostates of this code are the ( r , α ) -regular configurations of size r.
Proof. 
In Theorem 1, we showed how to use an [ r , α , r − α + 1 ] q MDS code C to construct an ( r , α ) -regular configuration of size r + 1 , so the collection of coding states in the theorem is nonempty. And if a coding space is lost, then we are left with a protostate, which is ( r , α ) -regular of size r, and we can use Theorem 3 and the MDS code C to repair this protostate to another coding state.  □
It is usually possible to use a subset of the collection of all ( r , α ) -configurations of length r + 1 as coding states. A rather obvious restriction is discussed in the remark below.
Remark 4.
In Theorem 4, we can limit the coding states to all ( r , α ) -regular collections of size r + 1 in V that can be obtained by repair from a subcollection of size r, since other ones are not reachable. For example, let V = F 2 5 , and let a 1 , a 2 , e 1 , e 2 , e 3 be a basis for V; set a 3 : = a 1 + a 2 . For i ∈ [ 3 ] , define U i : = ⟨ a i , e i ⟩ , define U 4 : = ⟨ e 1 + e 2 , a 1 + e 1 + e 3 ⟩ , and define U 4 ′ : = ⟨ e 1 + e 2 , e 1 + e 3 ⟩ . It is easily verified that both π : = { U 1 , U 2 , U 3 , U 4 } and π ′ : = { U 1 , U 2 , U 3 , U 4 ′ } are ( 3 , 2 ) -regular of size 4 (in fact, it can be shown that, up to a linear transformation, every  ( 3 , 2 ) -regular configuration is equal to either π or π ′ ), and, moreover, no subspace U i ( i ∈ [ 4 ] ) can be obtained by 1-repair from the other three subspaces in π. So there is no need to include configurations such as π as coding states of a { 5 , ( 4 , 3 ) , ( 3 , 2 , 1 ) } 2 storage code.
In view of Theorem 3, Theorem 4, and of Remark 4, we introduce the following.
Definition 8.
Let r and α be integers with 1 α r . An optimal linear storage code with parameters { m r , α , ( r + 1 , r ) , ( r , α , 1 ) } is called an  ( r , α ) -regular storage code if the code has an ambient space V with dim V = m r , α and if every coding state is an ( r , α ) -regular configuration in V.
In the next section, we will introduce a more interesting family of ( r , α ) -regular storage codes.
We end this section with two further remarks.
Remark 5.
We show in Theorem 3 that an ( r , α ) -regular storage code over a finite field F q exists if and only if an [ r , α , r α + 1 ] q MDS code exists. As rightly pointed out by a reviewer, that leaves open the possibility that a storage code with parameters { m r , α , ( r + 1 , r ) , ( r , α , 1 ) } q exists while no [ r , α , r α + 1 ] q MDS code exists. We are not aware of any non-existence results for regenerating codes in terms of the alphabet size (even for MBR codes, this is listed as Open Problem 1 in [29]), so we cannot rule out this possibility. If one could prove that (9) always holds with equality, then we could conclude that every linear { m r , α , ( r + 1 , r ) , ( r , α , 1 ) } storage code is ( r , α ) -regular, but we do not see how to prove that (if it is true at all, which we doubt). But given the strong relation between construction methods for storage codes and MDS codes, and given our idea that these ( r , α ) -regular codes are, in a sense, “best-possible”, we strongly believe that these codes indeed realize the smallest possible alphabet size for their parameters. We leave this question as an interesting open problem.
Remark 6.
Interestingly, every storage code as in Theorem 3 can be realized as an optimal-access code, and, in fact, as a help-by-transfer (HBT) code. Essentially, with notation as in Theorem 3, the reason is that if a coding space U i is represented by a basis e 1 , , e α , then since U i U ¯ i , there must be an index j [ α ] such that e j U i \ U ¯ i . Note that this property need not hold for every ( r , α ) -regular storage code, since it may be required to choose helper vectors outside the given basis in order to repair to an available coding state. An example of this is given by the ( r , α ) = ( 3 , 2 ) -regular code from [48], as can be seen from its description in [50]. It is an interesting problem to find the smallest ( 3 , 2 ) -regular HBT code. We leave further details to the interested reader.

8. Smaller ( r , α ) -Regular Storage Codes

Inspired by Theorem 2, we will use Theorem 3 to produce a second (essentially recursive) construction of an ( r , α ) -regular collection of size r + 1 .
To this end, let V be a vector space over F q with dim V = m r , α . For t = 1 , … , α , let C ( δ + t ) be a [ δ − 1 + t , t , δ ] q MDS code, where δ : = r − α + 1 . In what follows, we will consider bases H for V consisting of vectors h i , j for i = 1 , … , α and j = 1 , … , δ − 1 + i , arranged as in Table 1.
Recall that by Lemma 3, we have m r , α = δ + ( δ + 1 ) + + r , so by counting “by row”, we see that these bases indeed have the right size. Given such a basis H = ( h i , j ) , we can use the given MDS codes to construct a sequence σ = σ ( H , C ( δ + 1 ) , , C ( r + 1 ) ) = U 1 , , U r + 1 as follows. First, for t = 1 , , δ , we let
U t : = ⟨ h i , t ∣ i ∈ [ α ] ⟩ .
Then, for t = 1 , , α , we define
W δ + t : = { ∑ j = 1 δ − 1 + t c j h t , j ∣ c = ( c 1 , … , c δ − 1 + t ) ∈ C ( δ + t ) }
and we let
U δ + t : = ⟨ W δ + t , h t + 1 , δ + t , … , h α , δ + t ⟩ .
Lemma 10.
With the above notation and assumptions, we have dim W δ + t = dim C ( δ + t ) = t ( t [ α ] ), and the collection σ : = { U 1 , , U r + 1 } is ( r , α ) -regular.
Proof. 
First, since h t , 1 , … , h t , δ − 1 + t are independent, it follows that dim W δ + t = dim C ( δ + t ) = t . Then, from (24), we see that dim U t = α for t ∈ [ δ ] , and from (26), we see that dim U δ + t = t + ( α − ( t + 1 ) + 1 ) = α , so all the subspaces in σ have the required dimension α . We will use induction to prove the last claim. To establish the base case for the induction, note that the δ + 1 subspaces U ( 1 ) : = ⟨ h 1 , 1 ⟩ , … , U ( δ ) : = ⟨ h 1 , δ ⟩ , U ( δ + 1 ) : = W δ + 1 form a ( δ , 1 ) -regular configuration (indeed, since C ( δ + 1 ) is MDS with dimension 1, the unique (up to a scalar) nonzero codeword in C ( δ + 1 ) has weight δ , hence is nonzero in every position). Now, suppose that we have constructed a ( δ + t − 2 , t − 1 ) -regular configuration σ ( t − 1 ) : = { U 1 ( t − 1 ) , … , U δ − 1 + t ( t − 1 ) } . Then, we “add an extra layer” by setting U j ( t ) : = ⟨ U j ( t − 1 ) , h t , j ⟩ ( j ∈ [ δ − 1 + t ] ), we add an extra subspace U δ + t ( t ) : = W δ + t , and we apply Theorem 2, part (i) to conclude that σ ( t ) : = { U 1 ( t ) , … , U δ + t ( t ) } is ( δ + t − 1 , t ) -regular. Since σ ( α ) = σ , the claim follows by induction.  □
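For ( r , α ) = ( 3 , 2 ) over F 2 , the layered construction of Lemma 10 can be written out completely (the bitmask encoding of the basis vectors is ours):

```python
# The layered construction of Lemma 10 written out for (r, alpha) = (3, 2)
# over F_2 (the bitmask encoding is ours): delta = 2, rows of sizes 2 and 3,
# C^(3) the [2,1,2] repetition code, C^(4) the [3,2,2] parity-check code.
from itertools import combinations

def rank2(vs):  # rank over F_2; vectors encoded as bitmasks
    pivots = {}
    for v in vs:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

h11, h12, h21, h22, h23 = 16, 8, 4, 2, 1   # basis of V, m = 2 + 3 = 5
U1 = [h11, h21]                  # U_t = <h_{1,t}, h_{2,t}> for t = 1, 2
U2 = [h12, h22]
U3 = [h11 ^ h12, h23]            # W_3 from the repetition code, plus h_{2,3}
U4 = [h21 ^ h22, h21 ^ h23]      # W_4 from the parity-check code
sigma = [U1, U2, U3, U4]

for size, dim in ((2, 4), (3, 5)):   # required sub-span dimensions d_0, d_1
    for J in combinations(sigma, size):
        assert rank2([v for U in J for v in U]) == dim
print("the layered construction yields a (3,2)-regular configuration of size 4")
```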
Next, we want to show that by restricting the allowed MDS codes involved, we can construct an ( r , α ) -regular storage code using only coding states of the type in Lemma 10. In that case, a coding state of this restricted type, when losing a subspace, must be repairable to a new coding state that is again of this restricted type. We will now sketch how this can be achieved.
Let C be a fixed [ r , α , δ ] MDS code. For every permutation τ = τ 1 , … , τ r of { 1 , … , r } , we define codes C ( δ + 1 ) , … , C ( r + 1 ) by letting
C ( δ + t ) : = { ( c τ 1 , … , c τ δ − 1 + t ) ∣ c = ( c 1 , … , c r ) ∈ C , supp ( c ) ⊆ { τ 1 , … , τ δ − 1 + t } } .
Note that since C is MDS, the code C ( δ + t ) is easily seen to be [ δ − 1 + t , t , δ ] MDS; note also that C ( r + 1 ) = C . Now, for every basis H = { h i , j ∣ 1 ≤ i ≤ α , 1 ≤ j ≤ δ − 1 + i } for V, we use these codes C ( δ + t ) defined above to construct an ( r , α ) -regular configuration σ = σ ( H , τ ) as explained earlier, that is, we set σ ( H , τ ) : = σ ( H , C ( δ + 1 ) , … , C ( r + 1 ) ) . Then by Lemma 10, σ ( H , τ ) is ( r , α ) -regular. We now have the following.
Theorem 5.
Let r and α be integers with 1 ≤ α ≤ r , let V be a vector space over F q with dim V = m r , α , and let C be an [ r , α , δ ] q MDS code, so with δ = r − α + 1 . The collection of all ( r , α ) -regular configurations of the form σ ( H , τ ) as defined above, where H = ( h i , j ∣ i ∈ [ α ] , j ∈ [ δ − 1 + i ] ) is a basis for V and where τ is a permutation of [ r ] , forms an ( r , α ) -regular storage code.
Proof. 
We sketch a proof as follows. Suppose that for each t [ α ] , we choose a basis s 1 , δ + t , , s t , δ + t for W δ + t . Then
U δ + t = ⟨ s 1 , δ + t , … , s t , δ + t , h t + 1 , δ + t , … , h α , δ + t ⟩ .
Note that every vector s u , δ + t can be uniquely expressed as a linear combination of the basis vectors h i , j for V; we will say that a vector h i , j occurs in s u , δ + t if h i , j occurs in that linear combination with a nonzero coefficient. Later, we will impose additional conditions on these vectors s u , δ + t .
We can now arrange the vectors h_{i,j} and the vectors s_{i,δ+j} in a rectangular α × (r+1) array such that the vectors in column j span U_j; see Table 2 below.
This array has the following characteristics.
A1. Row i of the array contains δ−1+i of the basis vectors of V.
A2. The basis vectors in row i occur only in the vectors s_{1,δ+i}, …, s_{i,δ+i}.
A3. The vector space W_{δ+i} = ⟨s_{1,δ+i}, …, s_{i,δ+i}⟩ is determined by the basis vectors in row i and by a [δ−1+i, i, δ] MDS code C^{(δ+i)} derived from the [r, α, δ] MDS code C through a fixed permutation τ of {1, …, r}.
Now consider what happens if we lose a subspace, that is, if we lose a column of the array in Table 2. Our aim will be to arrange the remaining r subspaces into a similar array, but with the last column removed, and then to use the MDS code C to construct the last column from the last row of the new array. Losing any column j with j ≤ r has the consequence of losing the basis vectors h_{i,j} in the array, and our aim will be to replace these lost basis vectors with the vectors s_{u,δ+t} (where u = 1 if j ≤ δ and u = j − δ if δ+1 ≤ j ≤ r), while maintaining the characteristics A1–A3 above. By A1, a row that contains a lost basis vector should move one row up, and the row that contains the replacement basis vectors should move into the last row. By A2, if s_{u,δ+t} replaces h_{i,j}, then h_{i,j} should occur in s_{u,δ+t} and should not occur in s_{u′,δ+t} for u′ ≠ u. Note that since C^{(δ+i)} is an MDS code, there is no position where all codewords have a 0; hence, we can always choose a basis s_{1,δ+i}, …, s_{i,δ+i} for W_{δ+i} such that a given vector h_{i,j} occurs in one and only one of the basis vectors. Finally, by A3, there has to be a suitable permutation τ′ that can describe the new [δ−1+t, t, δ] MDS codes. As we saw above, A1 and A2 determine how the new array should be formed; what is left is to find a suitable τ′, and then to verify that A3 holds again. Let us now turn to the details.
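The normalization step used here, choosing the basis of W_{δ+i} so that a given basis vector of V occurs in exactly one of the s-vectors, rests only on the fact that no coordinate of an MDS code is identically zero. A minimal GF(2) sketch (function name and 0-based indexing are ours):

```python
def normalize_basis(G, p):
    """Return a basis (list of GF(2) row vectors, as tuples) for the span of G
    in which coordinate p is nonzero in exactly one basis vector, namely the
    first. This is possible whenever some generator is nonzero at p, which
    the MDS property guarantees (no coordinate is identically zero)."""
    rows = [list(r) for r in G]
    pivot = next(i for i, r in enumerate(rows) if r[p] != 0)
    rows[0], rows[pivot] = rows[pivot], rows[0]
    for i in range(1, len(rows)):
        if rows[i][p]:
            # GF(2) elimination: subtract (= XOR) the first row.
            rows[i] = [a ^ b for a, b in zip(rows[i], rows[0])]
    return [tuple(r) for r in rows]

# Generator matrix of the [4, 3, 2] even-weight code.
G = [(1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 1, 1)]
B = normalize_basis(G, 2)
assert sum(1 for b in B if b[2] != 0) == 1   # coordinate 2 occurs exactly once
```

Since only elementary row operations are used, the returned vectors span the same space as the rows of G.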
As remarked before, if we lose U_{r+1}, then we can recover that subspace exactly. For the other subspaces, we distinguish two cases.
First, suppose we lose a subspace U_t with 1 ≤ t ≤ δ. Then, in Table 2, we delete column t, and we take out row 1 and place it after the last row, where we want the α vectors s_{1,δ+1}, …, s_{1,r+1} to replace the lost basis vectors h_{1,t}, …, h_{α,t}. Recall that the vectors s_{1,δ+i}, …, s_{i,δ+i} span W_{δ+i} and are each a linear combination of h_{i,1}, …, h_{i,δ−1+i}; now, choose these vectors such that s_{u,δ+i} contains h_{i,t} if and only if u = 1 (as remarked above, it is not difficult to verify that this is possible). Define a new permutation
τ′ = τ_1, …, τ_{t−1}, τ_{t+1}, …, τ_r, τ_t,
and a new basis H′ = (h′_{i,j}), where, for i = 1, …, α−1, j = 1, …, δ−1+i, we let
h′_{i,j} = h_{i+1,j}, if j < t;  h_{i+1,j+1}, if j ≥ t,
and for j = 1, …, r, we let
h′_{α,j} = h_{1,j}, if j < t;  h_{1,j+1}, if t ≤ j < δ;  s_{1,j+1}, if j ≥ δ.
Finally, with
U_0 := { Σ_{s=1}^{r} c_{τ′_s} h′_{α,s} | c ∈ C },
it is easily verified that σ′ := U_1, …, U_{t−1}, U_{t+1}, …, U_{r+1}, U_0 is precisely the configuration σ(H′, τ′).
Secondly, suppose that we lose subspace U_{δ+t} with 1 ≤ t ≤ α. In that case, we proceed in a similar way: in Table 2, we remove column δ+t, take out row t, and place that row after the last row in the table, where we now want the α−t vectors s_{t,δ+t+1}, …, s_{t,r+1} to replace the lost basis vectors h_{t+1,δ+t}, …, h_{α,δ+t}. This can be achieved by now choosing s_{u,δ+i} to contain h_{i,δ+t} if and only if u = t. Define a new permutation
τ′ = τ_1, …, τ_{δ+t−1}, τ_{δ+t+1}, …, τ_r, τ_{δ+t},
and a new basis H′ = (h′_{i,j}), where, for i = 1, …, α−1, j = 1, …, δ−1+i, we let
h′_{i,j} = h_{i,j}, if i < t;  h_{i+1,j}, if i ≥ t, j < δ+t;  h_{i+1,j+1}, if i ≥ t, j ≥ δ+t,
and, for j = 1, …, r, we let
h′_{α,j} = h_{t,j}, if j < δ+t;  s_{t,j+1}, if δ+t ≤ j ≤ r.
With U_0 as in (29), it is again easily verified that σ′ := U_1, …, U_{δ+t−1}, U_{δ+t+1}, …, U_{r+1}, U_0 is precisely the configuration σ(H′, τ′).
We leave further details to the reader.  □
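The row bookkeeping in the two cases above can be sanity-checked symbolically. The following sketch (with our own conventions; it verifies only characteristic A1, not the full repair) confirms that after losing any column, the rearranged rows again contain δ−1+i basis vectors each:

```python
def new_row_lengths(r, alpha, lost):
    """Symbolic check of characteristic A1 after losing column `lost`
    (1-based, 1 <= lost <= r): the surviving rows are rearranged as in the
    proof of Theorem 5, and row i of the new array must again contain
    delta-1+i basis vectors."""
    delta = r - alpha + 1
    # Row i of the old array holds delta-1+i basis vectors, in columns
    # 1..delta-1+i; the replacement vectors come from the row moved last.
    old = [delta - 1 + i for i in range(1, alpha + 1)]
    if lost <= delta:
        moved = 1                      # case 1: row 1 moves to the bottom
    else:
        moved = lost - delta           # case 2: row t = lost - delta moves down
    # The remaining rows shift up; a row loses a basis vector exactly when
    # it had one in the lost column, and the moved row is topped up to
    # length r by the s-vectors that replace the lost basis vectors.
    kept = [old[i - 1] - (1 if old[i - 1] >= lost else 0)
            for i in range(1, alpha + 1) if i != moved]
    return kept + [r]                  # the new bottom row holds r vectors

for r in range(2, 7):
    for alpha in range(1, r + 1):
        delta = r - alpha + 1
        for lost in range(1, r + 1):
            assert new_row_lengths(r, alpha, lost) == \
                [delta - 1 + i for i in range(1, alpha + 1)]
```

In both cases, the multiset of row lengths δ, δ+1, …, δ−1+α = r is restored, which is exactly characteristic A1.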
It turns out that, with a proper choice of the MDS code C, the (r, α)-regular configurations described in Theorem 5 may possess extra symmetry, even to the point where they are all equal up to a linear transformation; this happens, for example, when q = 2, r − α = 1, and C is the even-weight [r, r−1, 2]_2 MDS code. In such cases, we can apply automorphism-group techniques to construct "small" (r, α)-regular storage codes that involve only a relatively small number of different coding spaces. Examples of storage codes constructed in this way are the small (3, 2)-regular code from [48], which involves only 8 different coding spaces, and the small (4, 3)-regular storage code from [49,50], which involves only 72 different coding spaces. For more details on how such codes can be constructed, using groups of linear transformations fixing a protostate, we refer to [48,49,50].

9. Conclusions

A regenerating storage code (RGC) with parameters {m, (n, k), (r, α, β)}_q is designed to store m data symbols from a finite field F_q in encoded form on n storage nodes, each storing α encoded symbols. If a node is lost, a replacement node may be constructed by obtaining β symbols from each of a collection of r of the surviving nodes, called the helper nodes. The name of these codes stems from the requirement that, even after an arbitrary number of repairs, any k nodes can regenerate the original data. We say that the code employs exact repair (ER) if, after each repair, the information on the replacement node is identical to the information on the lost node; if not, then we say that the code employs functional repair (FR). An RGC is called optimal if its parameters meet an upper bound called the cut-set bound.
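For regenerating codes, the cut-set bound of Dimakis et al. [6] reads m ≤ Σ_{i=0}^{k−1} min((r − i)β, α). As a quick numerical check (not part of the formal development), the parameters of the family constructed in this paper meet this bound with equality:

```python
def cutset_bound(k, r, alpha, beta):
    # Cut-set bound for regenerating codes (Dimakis et al. [6]):
    # m <= sum_{i=0}^{k-1} min((r - i) * beta, alpha).
    return sum(min((r - i) * beta, alpha) for i in range(k))

# The family in this paper has m = (2r - alpha + 1) * alpha / 2 with
# (n, k) = (r + 1, r) and beta = 1; these parameters attain the bound,
# so the codes are optimal.
for r in range(1, 20):
    for alpha in range(1, r + 1):
        m = (2 * r - alpha + 1) * alpha // 2
        assert m == cutset_bound(k=r, r=r, alpha=alpha, beta=1)
```

Indeed, Σ_{j=1}^{r} min(j, α) = α(α+1)/2 + α(r − α) = (2r − α + 1)α/2, in agreement with the closed form for m.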
Linear MDS codes have often been instrumental in the construction of optimal RGCs. In this paper, we first introduce a special type of configuration of vector spaces that we call (r, α)-regular. We show that such configurations can be constructed from suitable linear MDS codes. Then we employ linear MDS codes and (r, α)-regular configurations to construct what we call (r, α)-regular codes, which are optimal linear RGCs with n − 1 = k = r and β = 1, over a relatively small finite field F_q (if r − α ≤ 1, then any field can be used; if r − α > 1, then q ≥ r − 1 is required). Along the way, we show that, conversely, the existence of an (r, α)-regular code over a finite field of size q implies the existence of an [r, α, r − α + 1]_q MDS code over that field.
Apart from two previously known examples, our storage codes are the only known explicit optimal RGCs whose parameters realize an extremal point of the achievable cut-set region different from the MSR and MBR points.

Funding

This work is supported in part by the Estonian Research Council grant PRG2531.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

I gratefully acknowledge various discussions with my colleagues Junming Ke, Vitaly Skachek, and Ago-Erik Riet. In addition, I want to thank one of the reviewers for a very careful reading of an earlier version and for some very useful comments that have helped to improve the quality of this paper.

Conflicts of Interest

The author declares no conflicts of interest.


References

1. Reinsel, D.; Gantz, J.; Rydning, J. The Digitization of the World From Edge to Core; An IDC White Paper. 2018. Available online: https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf (accessed on 29 January 2025).
2. Reinsel, D.; Rydning, J.; Gantz, J.F. Worldwide Global DataSphere Forecast, 2021–2025: The World Keeps Creating More Data–Now, What Do We Do with It All? IDC Report, 2021, Doc #US46410421.
3. Bartley, K. 2024. Available online: https://rivery.io/blog/big-data-statistics-how-much-data-is-there-in-the-world/ (accessed on 29 January 2025).
4. Taylor, P. Volume of Data/Information Created, Captured, Copied, and Consumed Worldwide from 2010 to 2020, with Forecasts from 2021 to 2025. Available online: https://www.statista.com/statistics/871513/worldwide-data-created (accessed on 17 January 2025).
5. Ke, J. Codes for Distributed Storage. Ph.D. Thesis, University of Tartu, Tartu, Estonia, 2024. Available online: https://hdl.handle.net/10062/105396 (accessed on 29 January 2025).
6. Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K. Network Coding for Distributed Storage Systems. IEEE Trans. Inf. Theory 2010, 56, 4539–4551.
7. Yin, C.; Xu, Z.; Li, W.; Li, T.; Yuan, S.; Liu, Y. Erasure Codes for Cold Data in Distributed Storage Systems. Appl. Sci. 2023, 13, 2170.
8. Amazon S3. 2006. Available online: https://aws.amazon.com/s3/ (accessed on 24 February 2024).
9. Ghemawat, S.; Gobioff, H.; Leung, S.T. The Google file system. ACM SIGOPS Oper. Syst. Rev. 2003, 37, 29–43.
10. Corbett, J.C.; Dean, J.; Epstein, M.; Fikes, A.; Frost, C.; Furman, J.; Ghemawat, S.; Gubarev, A.; Heiser, C.; Hochschild, P.; et al. Spanner: Google’s Globally Distributed Database. ACM Trans. Comp. Syst. (TOCS) 2013, 31, 1–22.
11. Huang, C.; Simitci, H.; Xu, Y.; Ogus, A.; Calder, B.; Gopalan, P.; Li, J.; Yekhanin, S. Erasure Coding in Windows Azure Storage. In Proceedings of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), Boston, MA, USA, 13–15 June 2012; pp. 15–26.
12. Khan, O.; Burns, R.; Plank, J.; Pierce, W.; Huang, C. Rethinking Erasure Codes for Cloud File Systems: Minimizing I/O for Recovery and Degraded Reads. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST 12), San Jose, CA, USA, 14–17 February 2012.
13. Chen, Y.L.; Mu, S.; Li, J.; Huang, C.; Li, J.; Ogus, A.; Phillips, D. Giza: Erasure Coding Objects across Global Data Centers. In Proceedings of the 2017 USENIX Annual Technical Conference (USENIX ATC 17), Santa Clara, CA, USA, 15–17 February 2017; pp. 539–551.
14. Rashmi, K.; Shah, N.B.; Gu, D.; Kuang, H.; Borthakur, D.; Ramchandran, K. A “Hitchhiker’s” Guide to Fast and Efficient Data Reconstruction in Erasure-coded Data Centers. ACM SIGCOMM Comput. Commun. Rev. 2014, 44, 331–342.
15. Sathiamoorthy, M.; Asteris, M.; Papailiopoulos, D.; Dimakis, A.G.; Vadali, R.; Chen, S.; Borthakur, D. XORing Elephants: Novel Erasure Codes for Big Data. Proc. VLDB Endow. 2013, 6, 325–336.
16. Chiniah, A.; Mungur, A. On the Adoption of Erasure Code for Cloud Storage by Major Distributed Storage Systems. In EAI Endorsed Transactions on Cloud Systems; EAI: New York, NY, USA, 2021; Volume 7, pp. 1–11.
17. Darrous, J.; Ibrahim, S. Understanding the Performance of Erasure Codes in Hadoop Distributed File System. In Proceedings of the CHEOPS 22, Rennes, France, 5 April 2022; pp. 24–32.
18. Apache Hadoop. HDFS Erasure Coding. 2017. Available online: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding (accessed on 21 March 2025).
19. Kralevska, K.; Gligoroski, D.; Jensen, R.E.; Øverby, H. HashTag Erasure Codes: From Theory to Practice. IEEE Trans. Big Data 2018, 4, 516–529.
20. Ramkumar, M.P.; Balaji, N.; Emil Selvan, G.S.R.; Jeya Rohini, R. RAID-6 Code Variants for Recovery of a Failed Disk. In Soft Computing in Data Analytics, Proceedings of the International Conference on SCDA 2018; Nayak, J., Abraham, A., Krishna, B.M., Chandra Sekhar, G.T., Das, A.K., Eds.; Springer: Singapore, 2019; pp. 237–245.
21. Huawei. OceanStor Dorado NAS All-Flash Storage. 2024. Available online: https://e.huawei.com/en/solutions/storage/all-flash-storage/nas (accessed on 17 February 2025).
22. Huang, K.; Li, X.; Yuan, M.; Zhang, J.; Shao, Z. Joint Directory, File and IO Trace Feature Extraction and Feature-based Trace Regeneration for Enterprise Storage Systems. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–17 May 2024; pp. 4002–4015.
23. Vajha, M.; Ramkumar, V.; Puranik, B.; Kini, G.; Lobo, E.; Sasidharan, B.; Kumar, P.V.; Barg, A.; Ye, M.; Narayanamurthy, S.; et al. Clay Codes: Moulding MDS Codes to Yield an MSR Code. In Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST 18), Oakland, CA, USA, 12–15 February 2018; pp. 139–154.
24. IBM Ceph. 2025. Available online: https://www.ibm.com/docs/en/storage-ceph/8.0?topic=overview-erasure-code-profiles (accessed on 17 February 2025).
25. Chen, J.; Li, Z.; Fang, G.; Hou, Y.; Li, X. A Comprehensive Repair Scheme for Distributed Storage Systems. Comput. Netw. 2023, 235, 109954.
26. Balaji, S.; Krishnan, M.; Vajha, M.; Ramkumar, V.; Sasidharan, B.; Kumar, P. Erasure Coding for Distributed Storage: An Overview. Sci. China Inf. Sci. 2018, 61, 100301.
27. Liu, S.; Oggier, F. An Overview of Coding for Distributed Storage Systems. In Network Coding and Subspace Designs; Greferath, M., Pavčević, M.O., Silberstein, N., Vázquez-Castro, M.Á., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 363–383.
28. Chen, R.; Xu, L. Practical Performance Evaluation of Space Optimal Erasure Codes for High-Speed Data Storage Systems. SN Comput. Sci. 2020, 1, 54.
29. Ramkumar, V.; Balaji, S.B.; Sasidharan, B.; Vajha, M.; Krishnan, M.N.; Kumar, P.V. Codes for Distributed Storage. Found. Trends Commun. Inf. Theory 2022, 19, 547–813.
30. Thomasian, A. Storage Systems: Organization, Performance, Coding, Reliability, and Their Data Processing, 1st ed.; Morgan Kaufmann: Cambridge, MA, USA, 2021.
31. Mazumdar, S.; Seybold, D.; Kritikos, K.; Verginadis, Y. A Survey on Data Storage and Placement Methodologies for Cloud-Big Data Ecosystem. J. Big Data 2019, 6, 15.
32. Wu, Y.; Dimakis, A.G. Reducing Repair Traffic for Erasure Coding-Based Storage via Interference Alignment. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT), Seoul, Republic of Korea, 28 June–3 July 2009; pp. 2276–2280.
33. Rashmi, K.; Shah, N.; Kumar, P.; Ramchandran, K. Explicit Codes Minimizing Repair Bandwidth for Distributed Storage. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 1–3 October 2009; pp. 1243–1249.
34. Gopalan, P.; Huang, C.; Simitci, H.; Yekhanin, S. On the Locality of Codeword Symbols. IEEE Trans. Inf. Theory 2012, 58, 6925–6934.
35. Papailiopoulos, D.S.; Dimakis, A.G. Locally Repairable Codes. IEEE Trans. Inf. Theory 2014, 60, 5843–5855.
36. El Rouayheb, S.; Ramchandran, K. Fractional Repetition Codes for Repair in Distributed Storage Systems. In Proceedings of the 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 29 September–1 October 2010; pp. 1510–1517.
37. Shah, N.B.; Rashmi, K.V.; Kumar, P.V.; Ramchandran, K. Distributed Storage Codes with Repair-by-Transfer and Nonachievability of Interior Points on the Storage-Bandwidth Tradeoff. IEEE Trans. Inf. Theory 2012, 58, 1837–1852.
38. Wang, Z.; Tamo, I.; Bruck, J. On Codes for Optimal Rebuilding Access. In Proceedings of the 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 28–30 September 2011; pp. 1374–1381.
39. Oggier, F.; Datta, A. Self-Repairing Homomorphic Codes for Distributed Storage Systems. In Proceedings of the 2011 IEEE INFOCOM, Shanghai, China, 10–15 April 2011; pp. 1215–1223.
40. Duursma, I.; Wang, H. Multilinear Algebra for Minimum Storage Regenerating Codes: A Generalization of the Product-Matrix Construction. Appl. Algebra Eng. Commun. Comput. 2023, 34, 717–743.
41. Shah, N.B.; Rashmi, K.V.; Kumar, P.V.; Ramchandran, K. Explicit Codes Minimizing Repair Bandwidth for Distributed Storage. In Proceedings of the 2010 IEEE Information Theory Workshop (ITW 2010), Cairo, Egypt, 6–8 January 2010; pp. 1–5.
42. Elyasi, M.; Mohajer, S. Cascade Codes for Distributed Storage Systems. IEEE Trans. Inf. Theory 2020, 66, 7490–7527.
43. Duursma, I.; Li, X.; Wang, H.P. Multilinear Algebra for Distributed Storage. SIAM J. Appl. Algebra Geom. 2021, 5, 552–587.
44. Wu, Y. Existence and Construction of Capacity-Achieving Network Codes for Distributed Storage. IEEE J. Sel. Areas Commun. 2010, 28, 277–288.
45. Hu, Y.; Lee, P.P.C.; Shum, K.W. Analysis and Construction of Functional Regenerating Codes with Uncoded Repair for Distributed Storage Systems. In Proceedings of the 2013 IEEE INFOCOM, Turin, Italy, 14–19 April 2013; pp. 2355–2363.
46. Hu, Y.; Chen, H.; Lee, P.; Tang, Y. NCCloud: Applying Network Coding for the Storage Repair in a Cloud-of-Clouds. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST 12), San Jose, CA, USA, 15–17 February 2012.
47. Shum, K.W.; Hu, Y. Functional-Repair-by-Transfer Regenerating Codes. In Proceedings of the 2012 IEEE International Symposium on Information Theory (ISIT), Cambridge, MA, USA, 1–6 July 2012; pp. 1192–1196.
48. Hollmann, H.D.; Poh, W. Characterizations and Construction Methods for Linear Functional-Repair Storage Codes. In Proceedings of the 2013 IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, 7–12 July 2013; pp. 336–340.
49. Ke, J.; Hollmann, H.D.; Riet, A.E. A Binary Linear Functional-Repair Regenerating Code on 72 Coding Spaces Related to PG(2, 8). In Proceedings of the 2024 IEEE International Symposium on Information Theory (ISIT), Athens, Greece, 7–12 July 2024; pp. 2335–2340.
50. Hollmann, H.D.; Ke, J.; Riet, A.E. An Optimal Binary Linear Functional-repair Storage Code with Efficient Repair Related to PG(2, 8). Submitted to Designs, Codes, and Cryptography.
51. Hollmann, H.D. Storage Codes—Coding Rate and Repair Locality. In Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), San Diego, CA, USA, 28–31 January 2013; pp. 830–834.
52. Hollmann, H.D. On the Minimum Storage Overhead of Distributed Storage Codes with a Given Repair Locality. In Proceedings of the 2014 IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 1041–1045.
53. Rashmi, K.V.; Shah, N.B.; Kumar, P.V. Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction. IEEE Trans. Inf. Theory 2011, 57, 5227–5239.
54. MacWilliams, F.; Sloane, N. The Theory of Error-Correcting Codes, 3rd ed.; Elsevier: North-Holland, The Netherlands, 1981.
55. Roth, R. Introduction to Coding Theory; Cambridge University Press: Cambridge, UK, 2006.
56. Singleton, R. Maximum Distance q-Nary Codes. IEEE Trans. Inf. Theory 1964, 10, 116–118.
57. Moon, T.K. Error Correction Coding; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2005.
58. Ball, S. On Sets of Vectors of a Finite Vector Space in Which Every Subset of Basis Size is a Basis. J. Eur. Math. Soc. 2012, 14, 733–748.
59. Bush, K. Orthogonal Arrays of Index Unity. Ann. Math. Statist. 1952, 23, 426–434.
60. Dahl, C.; Pedersen, J.P. Cyclic and Pseudo-Cyclic MDS Codes of Length q+1. J. Combin. Theory Ser. A 1992, 59, 130–133.
61. Zorgui, M.; Wang, Z. Centralized Multi-Node Repair Regenerating Codes. IEEE Trans. Inf. Theory 2019, 65, 4180–4206.
62. Ng, S.L.; Paterson, M. Functional Repair Codes: A View from Projective Geometry. Des. Codes Cryptogr. 2019, 87, 2701–2722.
63. Mital, N.; Kralevska, K.; Ling, C.; Gündüz, D. Functional Broadcast Repair of Multiple Partial Failures in Wireless Distributed Storage Systems. IEEE J. Sel. Areas Inf. Theory 2021, 2, 1093–1107.
64. Duursma, I.; Wang, H. Multilinear Algebra for Minimum Storage Regenerating Codes. arXiv 2020, arXiv:2006.08911v1.
Figure 1. The typical achievable region for functional repair and for exact repair when k = 4 , with fixed m and r.
Table 1. The array of basis vectors.
h_{1,1} ⋯ h_{1,δ}
⋮
h_{t,1} ⋯ h_{t,δ} ⋯ h_{t,δ−1+t}
⋮
h_{α,1} ⋯ h_{α,δ} ⋯ h_{α,δ−1+t} ⋯ h_{α,r}
Table 2. The array of vectors constructed above.
U_1 ⋯ U_δ | U_{δ+1} ⋯ U_{δ−1+t} | U_{δ+t} | U_{δ+t+1} ⋯ U_{r+1}
h_{1,1} ⋯ h_{1,δ} | s_{1,δ+1} ⋯ s_{1,δ−1+t} | s_{1,δ+t} | s_{1,δ+t+1} ⋯ s_{1,r+1}
⋮
h_{t,1} ⋯ h_{t,δ} | ⋯ h_{t,δ−1+t} | s_{t,δ+t} | s_{t,δ+t+1} ⋯ s_{t,r+1}
h_{t+1,1} ⋯ h_{t+1,δ} | ⋯ h_{t+1,δ−1+t} | h_{t+1,δ+t} | s_{t+1,δ+t+1} ⋯ s_{t+1,r+1}
⋮
h_{α,1} ⋯ h_{α,δ} | h_{α,δ+1} ⋯ h_{α,δ−1+t} | h_{α,δ+t} | h_{α,δ+t+1} ⋯ h_{α,r} s_{α,r+1}