Abstract
This paper summarizes recently proposed binary locally repairable codes (BLRCs) and their features. The construction of codes over a small alphabet is of particular interest for efficient hardware implementation. BLRCs are therefore highly noteworthy, because no multiplication is required during the encoding, decoding, and repair processes. We explain the various construction approaches of BLRCs, such as cyclic code based, bipartite graph based, anticode based, partial spread based, and generalized Hamming code based techniques. We also describe code generation methods based on modifications of linear codes, such as extending, shortening, expurgating, and augmenting. Finally, we summarize and compare the parameters of the discussed constructions.
1. Introduction
Efficient distributed storage systems (DSSs) are considered crucial infrastructure for handling big data. These systems must store data reliably over long periods by introducing redundancy and distributing the data across several storage nodes, each of which may be unreliable and may fail. Large data centers and peer-to-peer storage systems such as OceanStore [1] from Berkeley and Bigtable from Google [2] are well-known examples of distributed storage systems.
Owing to cost issues, large data centers use many commodity storage devices such as hard disk drives (HDDs) and solid-state drives (SSDs). As a result, device failures occur regularly rather than as exceptions. Data are therefore typically stored redundantly to protect them against potential failures. The traditional storage method for large storage services such as cloud storage is triplication, i.e., triple replication of each symbol; for example, the Google file system [3] and Hadoop [4] adopt this approach. However, because triplication requires three times the storage space, Facebook deploys a Reed–Solomon (RS) code in its warehouse cluster [5]. Although RS codes efficiently tolerate a specified number of erasures, repairing even a single erased symbol of an [n, k] RS code requires downloading k code symbols and reconstructing the data. Thus, more efficient storage methods have been actively researched, including regeneration codes (RCs), fractional repetition codes (FRCs), and locally repairable codes (LRCs) [6,7,8,9,10,11,12]. RCs attempt to minimize the number of transmitted symbols, whereas LRCs aim to minimize the number of disk reads required to repair a single lost node. In some respects, an LRC is essentially a block code with an additional parameter referred to as locality. There are excellent reviews of distributed storage codes (e.g., [13,14,15,16]), and a review article on this topic has recently been published [17]. However, to the best of the authors’ knowledge, no review paper deals specifically with binary LRC (BLRC) constructions, which are practically useful.
In most of the early suggestions for LRC constructions, the alphabet size of the stored symbols is very large. However, for efficient and convenient hardware implementation, the construction of codes over a small alphabet size for the stored symbols is of particular interest. For example, BLRCs are of special interest because multiplication is not necessary during the encoding, decoding, and repair processes.
This paper summarizes recently proposed constructions of BLRCs and their features. The code construction methods discussed in this paper are categorized as in Figure 1. The construction methods of BLRCs are explained using cyclic code based, bipartite graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, constructions of BLRCs using modification methods for linear codes, such as extending, shortening, expurgating, augmenting, and lengthening, are discussed. The remainder of this paper is organized as follows. In Section 2, the basic concepts used in coding techniques for distributed storage systems are introduced, and the characteristics of RCs, LRCs, and FRCs are explained, including the meaning of locality and availability. In Section 3, construction methods of LRCs are summarized with respect to their types and features, with a focus on BLRCs. Finally, the main conclusions are summarized in Section 4.
Figure 1.
Classification of binary locally repairable codes.
2. Preliminaries
2.1. Classification of Storage Codes for DSS
There are several types of codes for data storage systems, such as regeneration codes, locally repairable codes, and fractional repetition codes. Regeneration codes are a class of codes that enhance data reliability and facilitate the efficient repair of failed nodes in distributed storage systems [18,19]. The key metric of these codes is the repair bandwidth, i.e., the amount of data communicated over the network to repair a single failed node. When a node fails, the data stored in it must be recovered and stored in a replacement node. This is called repair or regeneration of a node. During the repair process, data are typically downloaded from the remaining nodes, and downloading the entire message wastes network resources. Therefore, regeneration codes are designed to reduce the amount of data downloaded during repair while retaining the storage efficiency of traditional maximum distance separable (MDS) codes.
The earliest LRCs were proposed as pyramid codes [20,21,22]. The formal definition of LRC with a tradeoff between locality r and the minimum distance d first appears in [23]. LRCs focus on optimization of the number of nodes accessed for node repair and reconstruction. These codes are introduced in [24] and developed further in [25,26]. In addition, LRCs were recently utilized in distributed storage systems, such as Windows Azure storage [27] and Facebook HDFS-RAID [28].
There are several approaches for the construction of efficient storage codes for distributed storage systems as follows:
- nonlinear codes [25,29];
- vector codes [30,31,32];
- codes over bounded alphabets [33];
- codes with short local MDS codes [24,30]; and
- codes with local regeneration [30,32].
A more detailed review of each method can be found in [17].
2.2. Locality, Recoverability, and Availability for Hot Data
Several criteria are used to evaluate the performance of distributed storage codes. This subsection introduces the concepts and definitions of the most important ones: locality, recoverability, and availability.
Let $\mathcal{C}$ be a q-ary $[n,k]$ linear code of length n and dimension k over a finite field $\mathbb{F}_q$. The locality of the ith coordinate of $\mathcal{C}$ is r if the ith symbol of every codeword of $\mathcal{C}$ can be expressed as a function of r other coordinates, and no such set of coordinates with cardinality less than r exists. For a linear code, this means that the ith coordinate has locality r if it can be expressed as a linear combination of r other coordinates. The set of such r coordinates that can repair the ith symbol is called a “repair set”. An $[n,k,d]$ code with locality r is denoted as an $(n,k,d;r)$ locally repairable code. Beyond maximizing the minimum distance, a maximal recoverable LRC (MR-LRC) is defined as a code that can correct every erasure pattern that is information-theoretically correctable under the locality constraints.
If the ith symbol in a codeword is lost, it can be recovered by reading r other symbols of the codeword. Locality is classified into two cases: “information locality r” if all information symbols have locality r, and “all-symbol locality r” if all symbols have locality r. Because a failed node is repaired by accessing only r other nodes, the repair (local decoding) complexity of an LRC is decoupled from the code length n.
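To make the notion of a repair set concrete, here is a minimal brute-force sketch (our illustration, not taken from the cited works) for small binary linear codes given by a generator matrix G. It relies on the fact that if column i of G equals the XOR of the columns indexed by a set S, then every codeword satisfies $c_i = \sum_{j \in S} c_j$ over $\mathbb{F}_2$, so repair uses XORs only. The helper names and the particular systematic [7,4] Hamming generator matrix are illustrative assumptions.

```python
from itertools import combinations, product

def column_masks(G):
    """Columns of a binary generator matrix G (k rows) as bit-masks (bit j = row j)."""
    return [sum(G[j][i] << j for j in range(len(G))) for i in range(len(G[0]))]

def encode(G, msg):
    """Codeword c = msg * G over GF(2), computed with AND/XOR only."""
    word = []
    for i in range(len(G[0])):
        bit = 0
        for j in range(len(G)):
            bit ^= msg[j] & G[j][i]
        word.append(bit)
    return word

def repair_set(G, i):
    """Smallest set S of other coordinates whose columns XOR to column i.

    Then c_i = XOR of c_j (j in S) for every codeword, and |S| is the locality of i.
    """
    cols = column_masks(G)
    others = [j for j in range(len(cols)) if j != i]
    for size in range(1, len(others) + 1):
        for S in combinations(others, size):
            acc = 0
            for j in S:
                acc ^= cols[j]
            if acc == cols[i]:
                return S
    return None  # coordinate i cannot be recovered from the other coordinates

# Systematic generator matrix of the [7,4,3] Hamming code (illustrative choice).
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

print([len(repair_set(G_HAMMING, i)) for i in range(7)])  # [3, 3, 3, 3, 3, 3, 3]
c = encode(G_HAMMING, [1, 0, 1, 1])
S = repair_set(G_HAMMING, 0)
print(c[0] == sum(c[j] for j in S) % 2)  # True: c_0 rebuilt from three other symbols
```

Every coordinate of this code has locality 3, which is consistent with its dual (the [7,3] simplex code) having all nonzero weights equal to 4.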
Other construction schemes for LRCs are intended to build codes with maximal recoverability (MR) called MR-LRCs, or partial MDS codes. Some examples are found in [34,35,36,37]. For MR-LRCs, it is important to not only maximize the global distance but also to correct any erasure patterns within a theoretical bound. Therefore, they are considered as a stronger class of LRCs than optimal LRCs [37].
Another important performance criterion is availability [38,39,40,41]. Availability is a very important feature when “hot data” are accessed. Hot data are data that are frequently and simultaneously accessed by many users in front-end systems. A binary linear code of length n is called a t-available r-locally repairable code if every coordinate i, for $1 \le i \le n$, is covered by at least t parity checks of weight at most $r+1$ whose supports are pairwise disjoint apart from i. A symbol has availability t if it can be read in parallel by t disjoint groups of symbols. These t reads have locality r if each read involves at most r symbols. Replication provides high availability for hot data. For example, with three-fold replication each symbol can be read in parallel from three nodes, so the availability is $t = 3$ and the locality of these reads is $r = 1$. Because replication incurs a large storage overhead, one possible solution is an LRC with multiple disjoint recovery sets.
There are two types of availabilities, namely information-symbol availability and all-symbol availability. If an LRC supports availability t for local repair on each of k information symbols, it is referred to as an LRC with information symbol availability. If an LRC supports availability t for all n symbols, it is referred to as an LRC with all-symbol availability [42].
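The availability definition can likewise be checked by brute force on a toy example. The sketch below (our illustration; all helper names are hypothetical) enumerates the parity checks of a small binary code, keeps those of weight at most $r+1$ that involve coordinate i, and finds the largest subfamily whose supports are pairwise disjoint apart from i. For the [7,3,4] simplex code, whose weight-3 parity checks are the seven lines of the Fano plane, it reports availability 3 with locality 2 for every coordinate.

```python
from itertools import combinations, product

def simplex_generator(m):
    """Generator matrix of the binary [2^m - 1, m] simplex code:
    its columns are all nonzero m-bit vectors."""
    return [[(c >> row) & 1 for c in range(1, 2 ** m)] for row in range(m)]

def parity_checks(G):
    """Supports of all nonzero x with G x^T = 0 over GF(2) (brute force, small n only)."""
    n = len(G[0])
    checks = []
    for x in product((0, 1), repeat=n):
        if any(x) and all(sum(g & b for g, b in zip(row, x)) % 2 == 0 for row in G):
            checks.append(frozenset(i for i in range(n) if x[i]))
    return checks

def availability(G, i, r):
    """Largest number of repair groups for coordinate i, each of size <= r,
    that are pairwise disjoint (each group = a parity-check support minus i)."""
    groups = [s - {i} for s in parity_checks(G) if i in s and len(s) <= r + 1]
    for t in range(len(groups), 0, -1):
        for choice in combinations(groups, t):
            if all(a.isdisjoint(b) for a, b in combinations(choice, 2)):
                return t
    return 0

G_SIMPLEX = simplex_generator(3)  # the [7, 3, 4] simplex code
print([availability(G_SIMPLEX, i, r=2) for i in range(7)])  # [3, 3, 3, 3, 3, 3, 3]
```

The exhaustive search over subfamilies is exponential and only meant for toy lengths; it makes the "disjoint recovery sets" requirement explicit rather than efficient.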
3. Binary Locally Repairable Codes
When LRCs were first introduced, there was no restriction on the field size. For the Singleton-like bound in [31], there exist optimal constructions that meet the bound over sufficiently large fields, in which the optimal LRCs are constructed using algebraic structures. However, the coding complexity can be significantly reduced by using BLRCs.
Compared to q-ary LRCs, BLRCs are known to be advantageous for implementation in practical systems. In [43], the advantages of BLRCs are discussed and compared with a non-binary LRC, the (14,10) RS code, and three-way replication in terms of four metrics: encoding complexity, repair complexity, mean time to data loss, and storage capacity. The authors of [43] further analyzed the advantages of BLRCs with a high Hamming distance and average locality [44,45]. In this section, we introduce bounds for BLRCs and various construction methods of BLRCs.
3.1. Bounds for the Binary Locally Repairable Codes
The bounds and constructions of BLRCs are quite different from those of q-ary LRCs. In terms of bounds, the maximum dimension of BLRCs is smaller than that of q-ary LRCs, and optimal BLRC constructions are therefore driven by different motivations, such as ease of implementation. We first discuss useful bounds for BLRCs.
Let us start with a general bound on LRCs that shows a tradeoff among the rate $k/n$, minimum distance d, and locality r [23]. For linear LRCs with information locality r, there are tradeoffs among n, k, d, and r. Let $\mathcal{C}$ be an $(n,k,d;r)$ LRC. Assuming $d \ge 2$ (see [23] for the precise conditions), the rate is bounded as follows:
$$\frac{k}{n} \le \frac{r}{r+1}.$$
In addition, the minimum distance is bounded by [31]
$$d \le n - k - \left\lceil \frac{k}{r} \right\rceil + 2,$$
which is called a Singleton-like bound because it generalizes the classical Singleton bound for linear codes: setting $r = k$ recovers the Singleton bound $d \le n - k + 1$. It is well known that a q-ary MDS code achieves the Singleton bound, and an optimal LRC achieves the Singleton-like bound with equality. Consider the two extreme cases $r = k$ and $r = 1$. For $r = k$, we have $d \le n - k + 1$, and an RS code is an optimal LRC. For $r = 1$, we have $d \le n - 2k + 2$, and the duplication of an RS code is an optimal LRC. Therefore, we are interested in the case $1 < r < k$.
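The rate bound and the Singleton-like bound are easy to evaluate numerically. The following minimal sketch (function names are ours) computes the largest d permitted by $d \le n - k - \lceil k/r \rceil + 2$, checks $k/n \le r/(r+1)$ in integer arithmetic, and reproduces the two extreme cases $r = k$ and $r = 1$ discussed above.

```python
def singleton_like_max_d(n, k, r):
    """Largest d allowed by d <= n - k - ceil(k / r) + 2 for an (n, k, d; r) LRC."""
    return n - k - (-(-k // r)) + 2  # -(-k // r) is ceil(k / r) in integer arithmetic

def rate_bound_holds(n, k, r):
    """Check k / n <= r / (r + 1) without floating point (assumes d >= 2)."""
    return k * (r + 1) <= n * r

# r = k recovers the classical Singleton bound d <= n - k + 1.
print(singleton_like_max_d(14, 10, 10))  # 5
# r = 1 gives d <= n - 2k + 2, met by two copies of a [6, 4, 3] RS code.
print(singleton_like_max_d(12, 4, 1))    # 6
# Intermediate localities for n = 16, k = 12.
print([singleton_like_max_d(16, 12, r) for r in (4, 6, 12)])  # [3, 4, 5]
```

Smaller r (cheaper repair) lowers the admissible distance for fixed n and k, which is the tradeoff the bound expresses.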
For the bounds of BLRCs, Cadambe–Mazumdar (C-M) [33], linear programming [46], and -space bounds [47,48] are introduced. The first bound, considering the alphabet size, is given as
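$$k \le \min_{t \in \mathbb{Z}^{+}} \left[ rt + k_{\mathrm{opt}}^{(q)}\bigl(n - t(r+1), d\bigr) \right], \tag{1}$$
where $k_{\mathrm{opt}}^{(q)}(n,d)$ denotes the largest possible dimension of a linear code of length n and minimum distance d over $\mathbb{F}_q$. The C-M bound is often used to determine whether a given BLRC with a short code length is optimal [32]. However, because the exact value of $k_{\mathrm{opt}}^{(q)}(n,d)$ is known only in limited cases with relatively short code lengths, it is difficult to apply the C-M bound to evaluate the optimality of general BLRCs.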
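Because $k_{\mathrm{opt}}^{(2)}(n,d)$ is generally available only from tables, the sketch below evaluates the binary C-M bound only for $d \le 4$, where the optimal dimension has a closed form (even-weight codes for d = 2, shortened Hamming codes for d = 3, and their parity extensions for d = 4). It is an illustration under that restriction, not a general implementation, and the function names are ours.

```python
def k_opt_binary(n, d):
    """Largest k of a binary linear [n, k, >= d] code; closed forms for d <= 4 only."""
    if n < d:
        return 0                       # no nonzero codeword of weight >= d fits
    if d == 1:
        return n                       # the whole space F_2^n
    if d == 2:
        return n - 1                   # even-weight code
    if d == 3:
        k = n                          # H needs n distinct nonzero columns: n <= 2^(n-k) - 1
        while 2 ** (n - k) < n + 1:
            k -= 1
        return k
    if d == 4:
        return k_opt_binary(n - 1, 3)  # extend an optimal d = 3 code by a parity bit
    raise ValueError("this sketch only covers d <= 4")

def cm_bound(n, d, r):
    """Binary C-M bound: k <= min over t >= 1 of r*t + k_opt(n - t(r+1), d)."""
    return min(r * t + k_opt_binary(n - t * (r + 1), d)
               for t in range(1, n // (r + 1) + 2))

print(cm_bound(7, 4, 2))  # 3: met by the [7, 3, 4] simplex code (locality 2)
print(cm_bound(7, 3, 3))  # 4: met by the [7, 4, 3] Hamming code (locality 3)
```

Larger t only adds rt once $n - t(r+1)$ drops below d, so the search range for t can stop there.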
In addition, a linear programming bound was proposed using the Delsarte linear programming method; it is known to be tighter than the C-M bound for BLRCs for some parameters [49]. However, both bounds are expressed in implicit form, and thus it is difficult to apply them to BLRCs with long code lengths.
For an linear LRC , -space bound was recently proposed using sphere packing [47,48]. The -space is defined as the dual of the linear space generated by a minimum set of local parity checks of with overall support covering all coordinates. For an BLRC with disjoint repair groups, where and , the following bound holds for the parity of [50].
- (i) If is odd, we have
- (ii) If is even, we have
These bounds are advantageous in two ways compared to the previous bounds. Firstly, the -space bound is known to be tighter than the C-M bound for BLRCs with long code lengths. In addition, the inequality of the bound is expressed in an explicit form, i.e., the value of the bound is easily derived for BLRCs with long code lengths. Furthermore, the improved -space bound is induced with the refined packing radius for BLRCs with [50].
A bound in an explicit form for is given in [48]. For an linear BLRC with locality r, such that and , it follows that
In the next subsection, we introduce the construction of BLRCs with various parameters and motivations, some of which are optimal or near-optimal with respect to the aforementioned bounds.
3.2. Classification of Binary Locally Repairable Codes
For the construction of BLRCs, various methods have been proposed based on the following:
- (i) cyclic codes [51,52,53,54];
- (ii) random vectors [42];
- (iii) bipartite graph [44,55,56];
- (iv) anticodes [57];
- (v) partial spread [50,58];
- (vi) generalized Hamming code [47,48]; and
- (vii) modification of codes [53,59].
In the following subsections, the various types of constructions of BLRCs are summarized.
3.3. BLRCs from Cyclic Codes
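Goparaju and Calderbank proposed several constructions of BLRCs from cyclic codes [51]. Cyclic codes inherently enjoy efficient structures for encoder and decoder implementation. The q-cyclotomic coset of s modulo n is defined as
$$C_s = \{ s, sq, sq^{2}, \ldots, sq^{a-1} \} \bmod n,$$
where a is the smallest positive integer that satisfies $sq^{a} \equiv s \pmod{n}$. The defining set of an $[n,k]$ cyclic code $\mathcal{C}$ with generator polynomial $g(x)$ is defined as
$$D_{\mathcal{C}} = \{ i \in \{0, 1, \ldots, n-1\} : g(\alpha^{i}) = 0 \},$$
where $g(x)$ has its roots in the splitting field $\mathbb{F}_{q^m}$ of $x^n - 1$, $n \mid q^m - 1$, and α is a primitive nth root of unity in $\mathbb{F}_{q^m}$. Using optimal cyclic codes in terms of the Singleton bound, three BLRC constructions are suggested as follows.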
Construction (CC1) [51]:
Let , be a factor of n and α be a primitive element of . Let be a cyclic code with the generator polynomial with the defining set as
Then, is an LRC with locality r and dimension .
Construction (CC2) [51]:
Let with even m, and locality . Let be a cyclic code in which the generator polynomial has the defining set
Then, is an LRC of dimension and a distance .
Construction (CC2) is shown to be distance-optimal among the set of linear codes that have disjoint locality parity checks.
Construction (CC3) [51]:
Let . Let α be a primitive element of . The generator polynomial with the defining set
can construct a BLRC that satisfies the following inequality for even k, , and .
The BLRC construction from the binary Hamming code is expressed in the following construction.
Construction (CC4) [51]:
For , we have when . Let be a cyclic code in which the generator polynomial has the defining set
Then, is a three-available two-local LRC with dimension and minimum distance . The corresponding parity check polynomial is then given as
Extending the results in [51], Zeh and Yaakobi proposed several construction methods for BLRCs in [52]. These constructions generate BLRCs with locality 2. Construction (CC5) is based on binary reversible codes. Let be the set given as . Let be the defining set of a single parity check code, which can correct one erasure in a block of length . Then, a BLRC can be obtained as in Construction (CC5).
Construction (CC5) [52]:
For odd m, let and . Let be a single parity check code with , where the defining set is given as:
The corresponding code is then an BLRC, where , , and .
In addition, Construction (CC4) was extended to obtain codes with a higher Hamming distance at the cost of a small reduction of the rate as follows:
Construction (CC6) [52]:
Let and (i.e., ). Let be the defining set given as
Then, the corresponding code is a BLRC with , , locality , and availability .
This construction was further extended to simplex codes with availability and locality 2 as follows.
Construction (CC7) [52]:
Let , which is divisible by (i.e., ). Let be a cyclic simplex code with the defining set given as
The corresponding code is then a BLRC with , , , and dimension .
Another example of BLRCs was proposed by Tamo, Barg, Goparaju, and Calderbank in 2016 as in the following construction.
Construction (CC8) [54]:
Let α be an nth root of unity and let z be an integer such that and . Then, is an binary cyclic code with the defining set D with the coset of the group . Then, the locality of is bounded as . Moreover, each symbol of the codewords in has at least recovery sets of size .
A BLRC that can satisfy the explicit bound given in Equation (4) is also proposed in [60] as follows:
Construction (CC9) [60]:
For , let and , where and . Let be a generator polynomial of the cyclic BLRC and be the uth root of unity. Then, BLRC can be constructed using the generator polynomials given by
- (i) For , , where is the minimum polynomial of over .
- (ii) For , , where m is a positive integer.
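All of the cyclic constructions above are specified through defining sets that are unions of cyclotomic cosets. The sketch below (our illustration, not the specific constructions of [51,52,54,60]) computes q-cyclotomic cosets and the binary generator polynomial $g(x) = \prod_{i \in D} (x - \alpha^{i})$ of the cyclic code with defining set D, using table-based arithmetic in $\mathbb{F}_{2^m}$. The primitive polynomial (for example $x^4 + x + 1$ for $\mathbb{F}_{16}$) is an assumption supplied by the caller.

```python
from math import gcd

def cyclotomic_coset(s, n, q=2):
    """q-cyclotomic coset of s modulo n: [s, sq, sq^2, ...] until it repeats."""
    assert gcd(n, q) == 1
    coset, x = [], s % n
    while x not in coset:
        coset.append(x)
        x = (x * q) % n
    return coset

def gf_tables(m, prim_poly):
    """exp/log tables of GF(2^m) from a primitive polynomial given as an int,
    e.g. 0b10011 for x^4 + x + 1 (the caller must supply a primitive polynomial)."""
    size = 2 ** m
    exp, x = [], 1
    for _ in range(size - 1):
        exp.append(x)
        x <<= 1
        if x & size:               # reduce modulo the primitive polynomial
            x ^= prim_poly
    log = {v: i for i, v in enumerate(exp)}
    return exp, log

def generator_polynomial(D, n, m, prim_poly):
    """Coefficients (lowest degree first) of g(x) = prod_{i in D} (x - alpha^i),
    where alpha is a primitive n-th root of unity in GF(2^m) and n | 2^m - 1."""
    order = 2 ** m - 1
    assert order % n == 0
    exp, log = gf_tables(m, prim_poly)
    step = order // n              # alpha = beta^step with beta primitive

    def mul(a, b):
        return 0 if a == 0 or b == 0 else exp[(log[a] + log[b]) % order]

    g = [1]
    for i in D:
        root = exp[(i * step) % order]
        new = [0] * (len(g) + 1)   # g(x) * (x + root); minus equals plus in char 2
        for j, c in enumerate(g):
            new[j] ^= mul(c, root)
            new[j + 1] ^= c
        g = new
    assert all(c in (0, 1) for c in g), "D should be a union of cyclotomic cosets"
    return g

C1, C3 = cyclotomic_coset(1, 15), cyclotomic_coset(3, 15)
print(C1, C3)  # [1, 2, 4, 8] [3, 6, 12, 9]
g = generator_polynomial(C1 + C3, n=15, m=4, prim_poly=0b10011)
print(g)       # [1, 0, 0, 0, 1, 0, 1, 1, 1], i.e. x^8 + x^7 + x^6 + x^4 + 1
print(15 - (len(g) - 1))  # dimension k = n - deg g = 7
```

The final assertion is a cheap sanity check: if D is not closed under multiplication by 2 modulo n, the product picks up coefficients outside $\mathbb{F}_2$.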
3.4. BLRCs from Random Vectors
A family of high-rate BLRCs with locality two and uneven availabilities was proposed in [42], which requires intermediate procedures. The uneven availability is represented as an availability profile. For its construction, a k-tuple binary column vector with a nonzero element at the random position is required. Let be a random function that converts x into a binary vector with the same length by changing a zero element into a nonzero element. From , square matrices for are constructed individually by increasing l as follows:
where is generated from by the lexicographical order of construction, and is the i circularly downward-cyclic-shifted vector of . Then, a matrix for the parity part of the generator matrix in a systematic form is generated by concatenating the matrix as follows:
Construction (RV) [42]:
Let denote the generator matrix of the proposed BLRC in a systematic form. Then, a systematic generator matrix is constructed as
It should be noted that the generator matrix has a code rate of .
An BLRC code from Construction (RV) has an all-symbol locality equal to and the all-symbol availability profile is given by
where the numbers of s, 2s, and 1s are k, , and k, respectively, and each value denotes the availability for local repair of the ith symbol of a codeword in .
3.5. BLRCs from Bipartite Graph
In coding theory, a Tanner graph is a bipartite graph with two sets of vertices, a set of n variable nodes and a set of check nodes, that represents the parity-check constraints of an error-correcting code. Suppose that the n variable nodes are partitioned into groups of size r+1. All variable nodes in each group are connected to a unique check node, called a local check node, and the remaining check nodes are called global check nodes. Because each symbol can then be repaired from the other r symbols of its group, the constructed BLRC has locality at most r for all symbols.
Construction (BG) [44]:
Let and , where ⊗ denotes the Kronecker product, denotes the all-one vector of length and is the parity check matrix of an Hamming code such as . Then, the parity check matrix of BLRC based on a bipartite graph of parameters is given as
The minimum distance of the parity check matrix H in Construction (BG) is 4. This BLRC is optimal in some cases. Even when it is not optimal, it is shown that this code has a near-optimal code rate with a rate gap of .
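The matrices in Construction (BG) are partly lost above, so the following is only a sketch under an assumed local/global layout: H stacks $I_{\ell} \otimes \mathbf{1}_{r+1}^{T}$ (one local check per group) on $\mathbf{1}_{\ell}^{T} \otimes H_m$ (global checks, with $H_m$ the $m \times (2^m - 1)$ binary Hamming parity-check matrix, so $r + 1 = 2^m - 1$). It computes the dimension from the GF(2) rank and the minimum distance as the smallest number of columns of H that sum to zero. The exact matrices in [44] may differ, and the helper names are ours.

```python
from itertools import combinations

def local_global_columns(num_groups, m):
    """Columns of an assumed H = [I_l (x) 1^T ; 1_l^T (x) H_m], groups of size 2^m - 1.

    Column (j, h): unit vector e_j on the l local-check rows stacked on the
    Hamming column h (an m-bit int) on the global-check rows. Stored as one int:
    low m bits = global part, bit m + j = local check of group j.
    """
    return [(1 << (m + j)) | h for j in range(num_groups) for h in range(1, 2 ** m)]

def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as ints (elimination on leading bits)."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def min_distance_from_columns(cols, max_w=5):
    """d = smallest number of columns of H that XOR to zero (searched up to max_w)."""
    for w in range(1, max_w + 1):
        for S in combinations(cols, w):
            acc = 0
            for c in S:
                acc ^= c
            if acc == 0:
                return w
    return None

cols = local_global_columns(num_groups=3, m=3)
n = len(cols)
k = n - gf2_rank(cols)
print(n, k, min_distance_from_columns(cols))  # 21 15 4
```

Each local check covers $2^m - 1 = 7$ symbols, so every symbol has locality 6 here; three columns can never cancel because the local parts of an odd number of unit vectors cannot sum to zero, which is why the search stops at d = 4.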
In addition, an expander graph based construction of BLRC exists [55,56]. Suppose we have two sets V and C that satisfy the following conditions:
- , ;
- the degree of is t; and
- the degree of c is .
For , the bipartite graph is a -expander if for any subset , implies the size of the subset of C connected to is greater than . In addition, the length of the shortest cycle of the graph G is greater than 4. As such, we can have the following construction:
Construction (EG) [55,56]:
Let be an parity check matrix where and , whose columns correspond to the vertices of V and whose rows correspond to the vertices of C. Then, is equal to one if the corresponding vertices and are connected by an edge. For , the code constructed from is an BLRC.
In Construction (EG), is chosen from the range and is determined as a solution of the following equation:
where . The probability that G is a expander is greater than for . In addition, the code rate is bounded by
where the equality holds for the case whereby is a full rank matrix.
3.6. BLRC from Anticode
An anticode of length n is a code that may contain repeated codewords in and has an upper bound on the distance between codewords [61]. Contrary to the minimum distance in generic error correcting codes, the maximum distance is defined as the maximum Hamming distance between any pair of codewords in . This anticode is a core ingredient of the following BLRC.
The generator matrix of the anticode is a matrix, and all codewords in can be expressed by a linear combination of k rows of . If the rank of is , then each codeword in occurs times. Let be an anticode of length and Hamming weight of 2 and the columns of its generator matrix are all weight-2 vectors of length s.
Construction (AC1) [57]:
Let be a binary simplex code of length , dimension m, and minimum Hamming distance . Let be the generator matrix of , and let its columns consist of all possible nonzero vectors in . We prepend zeros to every column of of to construct an matrix . By deleting the columns in from , we can construct a generator matrix G of BLRC, , with parameters and locality 2.
For , the code satisfies the C-M bound in Equation (1). Moreover, three instances with locality of Construction (AC1) are listed in [57]:
- The code from the anticode is a LRC.
- The code from the anticode is a LRC.
- The code from the anticode is a LRC.
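The deletion step of Construction (AC1) can be reproduced on a toy case. The sketch below follows our reading of the construction: start from the simplex generator matrix (all nonzero m-bit columns), delete every weight-2 vector of length s padded with m − s leading zeros, and then measure the resulting length, minimum distance, and per-coordinate locality by brute force. The choice m = 4, s = 3 and all helper names are illustrative assumptions, and the output is not claimed to match the instances listed above.

```python
from itertools import combinations

def ac1_columns(m, s):
    """Columns (as m-bit ints) kept by our reading of Construction (AC1).

    Padding a length-s anticode column with m - s leading zeros places its
    weight-2 pattern in the s least significant bits.
    """
    deleted = {(1 << a) | (1 << b) for a, b in combinations(range(s), 2)}
    return [c for c in range(1, 2 ** m) if c not in deleted]

def min_distance(cols, m):
    """Minimum weight over messages u != 0; symbol j of u*G is parity(u & col_j)."""
    return min(sum(bin(u & c).count("1") % 2 for c in cols)
               for u in range(1, 2 ** m))

def locality(cols, i):
    """Smallest number of other columns whose XOR equals column i."""
    others = cols[:i] + cols[i + 1:]
    for r in range(1, len(others) + 1):
        for S in combinations(others, r):
            acc = 0
            for c in S:
                acc ^= c
            if acc == cols[i]:
                return r
    return None

cols = ac1_columns(m=4, s=3)
print(len(cols), min_distance(cols, 4))               # 12 6
print({locality(cols, i) for i in range(len(cols))})  # {2}
```

Since the unit vectors survive the deletion, the kept columns still span $\mathbb{F}_2^4$, so this toy code has dimension 4; deleting the three weight-2 columns lowers the simplex distance 8 by at most 2.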
Construction (AC2) [57]:
Let , , be an anticode such that its generator matrix consists of all columns of weight in . Then, zeros are prepended to every column of to form an matrix whose columns will be deleted from to obtain a generator matrix G for the code , which becomes a LRC with locality .
This code achieves the Griesmer bound [62].
Construction (AC3) [57]:
Let be an anticode with generator matrix given by
where is the generator matrix of the simplex code . Let be a code obtained based on the Farrell construction using the simplex code and the anticode . Then, is a BLRC with locality .
It is also shown that this code can satisfy the bound in Equation (1).
3.7. BLRCs from Partial Spread
To introduce BLRCs constructed from partial spread, the definition of partial t-spread is given.
Definition [50]:
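A partial t-spread of $\mathbb{F}_2^m$ is a collection $S = \{V_1, \ldots, V_l\}$ of t-dimensional subspaces of $\mathbb{F}_2^m$ such that $V_i \cap V_j = \{\mathbf{0}\}$ for $i \ne j$. Moreover, S is maximal if it has the largest possible size. In particular, if $l = \frac{2^m - 1}{2^t - 1}$, then S is a t-spread. If $t \mid m$, a t-spread of $\mathbb{F}_2^m$ exists.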
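As a small instance of the existence statement for $t \mid m$, the sketch below builds a 2-spread of $\mathbb{F}_2^4$ by viewing $\mathbb{F}_2^4$ as $\mathrm{GF}(4)^2$ and taking the five GF(4)-lines through the origin. The GF(4) representation ($w^2 = w + 1$, elements stored as 2-bit ints) and the helper names are our assumptions for illustration.

```python
from itertools import combinations

def gf4_mul(a, b):
    """Multiply in GF(4) = F_2[w]/(w^2 + w + 1); elements are 2-bit ints (w = 2)."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i
    if prod & 0b100:   # reduce using w^2 = w + 1
        prod ^= 0b111
    return prod

def two_spread_f2_4():
    """Five 2-dim subspaces of F_2^4 = GF(4)^2: the GF(4)-lines through the origin.

    A vector (x, y) with x, y in GF(4) is stored as the 4-bit int (x << 2) | y.
    """
    spread = [frozenset((x << 2) | gf4_mul(a, x) for x in range(4)) for a in range(4)]
    spread.append(frozenset(range(4)))  # the line x = 0
    return spread

def is_partial_spread(subspaces):
    """Check that every pair of subspaces meets only in the zero vector."""
    return all(U & W == {0} for U, W in combinations(subspaces, 2))

S = two_spread_f2_4()
print(len(S), is_partial_spread(S))  # 5 True
print(len(set().union(*S)) == 16)    # True: every vector of F_2^4 is covered
```

Here $l = (2^4 - 1)/(2^2 - 1) = 5$, so the partial spread is in fact a full 2-spread.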
Now, we can define a BLRC with parity check matrix given by
Then, a BLRC of parameters can be constructed in the following way:
Construction (PS1) [50]:
Let be the all-one vector of length n. Let and be a matrix that has binary expansions of the vectors as its columns, where and are distinct elements of the finite field . Then, the parity check matrix of a BLRC is given as in Equation (5).
For the further extension of Construction (PS1), the parity check matrix can be given as
where . For , is an matrix, whose ith row is the all-one vector of length and the other rows are all-zero vectors. Moreover, is the ith submatrix of . It is well known that if any $d-1$ columns of the parity check matrix H are linearly independent, the minimum distance of the linear code is at least d. Furthermore, for a collection of any columns of , if , then , where satisfy the following two conditions:
- (i) For , is even, where ; and
- (ii) .
Then, we can construct two k-optimal BLRCs with disjoint repair groups as in the following construction.
Construction (PS2) [50]:
Let and be the maximum partial -spread of . In addition, let be a basis of . For , there exists a binary linear code with the parity check matrix . Let be the set of indices corresponding to nonzero coordinates of a vector x. For , let be the set , where and is the ith column of . When , . Let be an matrix whose columns consist of the vectors in . Then, we can define a BLRC with a parity check matrix H as in Equation (6), where .
A set is -wise weakly independent over if no set , where , has the sum of its elements equal to zero. Then, we have , if the columns of satisfy the following conditions:
- (i) for ;
- (ii) for ; and
- (iii) for .
Construction (PS3) [50]:
Let , and be a maximum partial -spread of and the basis of is . When , there is a binary linear code. Let be the same set in Construction (PS2) for . For , is defined as . Let be an matrix whose columns consist of the vectors in . Then, a BLRC can be constructed using a parity check matrix H in Equation (6) for .
Let be the maximal cardinality of subspace codes over with minimum distance d and dimension k. Then, we can construct a BLRC as follows:
Construction (PS4) [50]:
Construction (PS5) [50]:
Let be a maximum partial two-spread of . The basis of is given as . Then, a BLRC with parity check matrix H of the form in Equation (6) for can be constructed using the submatrices for , which is given as
Another construction based on the partial t-spread is also proposed in [58]. Let q be a prime power and be the vector space of dimension m over .
Construction (PS6) [58]:
Given an integer , determine the smallest integer t such that . An integer m such that can be chosen, and there exists a partial t-spread with a size of at least l of . Let be a basis of and be a set whose elements are defined as for and . Finally, let for . Let s be an integer such that , and we use any vectors in to fill each submatrix as its columns for . Then, the BLRC has length , dimension , minimum distance , and locality r.
Then, the BLRCs and obtained from Construction (PS6) are optimal. In addition, for , the BLRCs from Construction (PS6) are almost optimal in terms of the C-M bound and for , the BLRCs from Construction (PS6) are almost optimal with respect to the C-M bound.
3.8. BLRCs from Generalized Hamming Code
Suppose that s and t are two positive integers such that and . Let A be a binary parity check matrix such that any four columns of this matrix are linearly independent. For , A can be chosen as the identity matrix. For , A is the parity check matrix of a binary linear code that can be built from non-primitive cyclic codes with length . Let be the primitive root of , and let denote the minimum polynomial of . The degree of is . is a parity check matrix defining the binary cyclic code with parameters that is generated by . Then, the set forms a subset of the roots of . By deleting one coordinate of , we can construct the parity check matrix A of the punctured code with parameters . In addition, B is defined as a matrix such that the columns are all nonzero -tuples from , with the first nonzero element equal to 1. Then, B is an parity check matrix of a -ary Hamming code. Using the matrices A and B, a BLRC construction is provided as follows.
Construction (GH1) [47,48]:
Suppose that are the elements corresponding to the columns of A, and the ith column of B is denoted by a vector for . Let be a binary linear code with the parity check matrix given as
where and for , is an matrix whose ith row is an all-one vector, the other rows are all-zero vectors, and is an matrix over whose columns are binary expansions of the vectors .
It is shown that this construction can satisfy the bound given in Equation (4).
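The matrix B in Construction (GH1) is the parity-check matrix of a Hamming code over a larger binary-extension alphabet: its columns are one representative of each one-dimensional subspace, normalized so that the first nonzero entry is 1. The sketch below enumerates such columns for an alphabet of size Q and, for Q = 4, checks that no column is a scalar multiple of another. Labelling the field elements 0..Q−1 with 1 as the identity, and the GF(4) multiplication rule $w^2 = w + 1$, are our assumptions for illustration.

```python
from itertools import combinations, product

def gf4_mul(a, b):
    """Multiply in GF(4) = F_2[w]/(w^2 + w + 1); elements are 2-bit ints (w = 2)."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i
    if prod & 0b100:   # reduce using w^2 = w + 1
        prod ^= 0b111
    return prod

def qary_hamming_columns(q, m):
    """Columns of an m x (q^m - 1)/(q - 1) q-ary Hamming parity-check matrix:
    all nonzero m-tuples over {0, ..., q-1} whose first nonzero entry is 1."""
    cols = []
    for v in product(range(q), repeat=m):
        nonzero = [x for x in v if x != 0]
        if nonzero and nonzero[0] == 1:
            cols.append(v)
    return cols

def pairwise_independent_gf4(cols):
    """True if no column is a nonzero GF(4)-multiple of another column."""
    return not any(tuple(gf4_mul(c, x) for x in u) == v
                   for u, v in combinations(cols, 2) for c in (1, 2, 3))

B = qary_hamming_columns(4, 2)
print(B)  # [(0, 1), (1, 0), (1, 1), (1, 2), (1, 3)]
print(len(qary_hamming_columns(4, 3)), pairwise_independent_gf4(qary_hamming_columns(4, 3)))  # 21 True
```

Pairwise independence of the columns is exactly what gives the Q-ary Hamming code its minimum distance 3.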
Shortening an LRC can also yield another LRC. Let be an BLRC with locality r such that and . Then, an BLRC with locality r can be obtained by shortening C, where the parameters of satisfy , , and .
Construction (GH2) [48]:
By applying the shortening of the times to C, we have an BLRC.
This kind of code modification approach can be extended to well-known code modification methods such as extending, shortening, expurgating, augmenting, and lengthening [53], as described in the following subsection.
3.9. BLRCs from Code Modification
It is well known that there are various code modification methods for linear codes. For BLRCs, we can also use these modification methods to generate codes with new parameters [53]. Let $\mathcal{C}$ be an $[n,k,d]$ binary code with locality r, and let $d^{\perp}$ be the minimum distance of its dual code $\mathcal{C}^{\perp}$. By appending an overall parity bit to each codeword of $\mathcal{C}$, the extended code $\hat{\mathcal{C}}$ with parameters $[n+1, k, \hat{d}]$ is obtained. Formally,
$$\hat{\mathcal{C}} = \Bigl\{ (c_1, \ldots, c_n, c_{n+1}) : (c_1, \ldots, c_n) \in \mathcal{C},\ c_{n+1} = \textstyle\sum_{i=1}^{n} c_i \Bigr\},$$
where $\hat{d} = d + 1$ for odd d and $\hat{d} = d$ for even d [53]. For BLRCs, we are interested in the locality of the derived codes for a given $\mathcal{C}$ with locality r. Let be the dual code of . If the maximum Hamming weight among codewords in the code is , then the locality of the extended code is . If the maximum Hamming weight among codewords in is , then the locality of the extended code is . Finally, if is an cyclic code with an odd minimum distance d, then the locality of the dual code in the extended code of is [53].
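The effect of extension on the minimum distance can be checked directly on a small code. The sketch below (our illustration; the [7,4,3] Hamming generator matrix and helper names are assumptions) appends an overall parity bit to every codeword: the odd distance 3 grows to 4, while extending the resulting even-distance code again leaves the distance unchanged because every appended bit is 0.

```python
from itertools import product

# Systematic generator matrix of the [7,4,3] Hamming code (illustrative choice).
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(G):
    """All codewords msg * G over GF(2), as tuples."""
    cols = list(zip(*G))
    return [tuple(sum(m & g for m, g in zip(msg, col)) % 2 for col in cols)
            for msg in product((0, 1), repeat=len(G))]

def min_distance(code):
    """Minimum Hamming weight of the nonzero codewords (linear code)."""
    return min(sum(c) for c in code if any(c))

def extend(code):
    """Append an overall parity bit to every codeword."""
    return [c + (sum(c) % 2,) for c in code]

H = codewords(G_HAMMING)
X = extend(H)
print(min_distance(H), len(X[0]), min_distance(X))  # 3 8 4
print(min_distance(extend(X)))  # 4: d is already even, so the new bit is always 0
```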
Shortening can also be applied to derive new BLRCs. By deleting the codewords of $\mathcal{C}$ that have nonzero values in the last s coordinates and removing the last s coordinates from the remaining codewords, we obtain the shortened code $\mathcal{C}_s$ of $\mathcal{C}$. Formally,
$$\mathcal{C}_s = \bigl\{ (c_1, \ldots, c_{n-s}) : (c_1, \ldots, c_{n-s}, 0, \ldots, 0) \in \mathcal{C} \bigr\}.$$
For an original $[n,k,d]$ binary linear code, it is known that the shortened code has parameters $[n-s, k-s, \ge d]$. Moreover, if the original code is a BLRC with locality , then the locality of the shortened code is r or . Let be the dual of and let be the minimum distance of the dual code . Then, for an cyclic code , the locality of code is either or [53].
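Shortening can be illustrated the same way. In the sketch below (our illustration; the Hamming generator matrix and helper names are assumptions), the [7,4,3] Hamming code is shortened on its last s coordinates by keeping the codewords that vanish there and deleting those positions, which yields [6,3,3] and [5,2,3] codes for s = 1 and s = 2.

```python
from itertools import product

# Systematic generator matrix of the [7,4,3] Hamming code (illustrative choice).
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(G):
    """All codewords msg * G over GF(2), as tuples."""
    cols = list(zip(*G))
    return [tuple(sum(m & g for m, g in zip(msg, col)) % 2 for col in cols)
            for msg in product((0, 1), repeat=len(G))]

def min_distance(code):
    """Minimum Hamming weight of the nonzero codewords (linear code)."""
    return min(sum(c) for c in code if any(c))

def shorten(code, s):
    """Keep codewords that are zero in the last s >= 1 positions, then delete them."""
    return [c[:-s] for c in code if not any(c[-s:])]

C = codewords(G_HAMMING)
for s in (1, 2):
    Cs = shorten(C, s)
    # length, dimension (log2 of the number of codewords), minimum distance
    print(len(Cs[0]), len(Cs).bit_length() - 1, min_distance(Cs))
# 6 3 3
# 5 2 3
```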
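Next, expurgation can also be used to generate a new BLRC from an $[n,k,d]$ BLRC $\mathcal{C}$ that contains odd-weight codewords. The expurgated code $\mathcal{C}_{E}$ of $\mathcal{C}$ is the subcode of $\mathcal{C}$ consisting of its even-weight codewords:
$$\mathcal{C}_{E} = \{ \mathbf{c} \in \mathcal{C} : \mathrm{wt}(\mathbf{c}) \text{ is even} \}.$$
The corresponding parameters of $\mathcal{C}_{E}$ are $[n, k-1, d_{E}]$, where $d_{E}$ is the minimum Hamming weight of the nonzero codewords in $\mathcal{C}_{E}$. Let be the dual code of . Then, we have [53].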
As the inverse of expurgation, the augmented code of an $[n,k,d]$ code $\mathcal{C}$ that does not contain the all-one codeword $\mathbf{1}$ is defined as $\mathcal{C} \cup (\mathbf{1} + \mathcal{C})$, with parameters $[n, k+1, \min\{d, n - w_{\max}\}]$, where $w_{\max}$ is the maximum Hamming weight of codewords in $\mathcal{C}$. If $\mathcal{C}$ is cyclic, then its expurgated and augmented codes are also cyclic [53].
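Expurgation and augmentation can be checked as a round trip on a small example. The sketch below (our illustration; the Hamming generator matrix and helper names are assumptions) keeps the even-weight subcode of the [7,4,3] Hamming code, which is the [7,3,4] simplex code, and then augments it with the all-one word; the augmented distance agrees with $\min\{d, n - w_{\max}\}$ and the original code is recovered.

```python
from itertools import product

# Systematic generator matrix of the [7,4,3] Hamming code (illustrative choice).
G_HAMMING = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def codewords(G):
    """All codewords msg * G over GF(2), as tuples."""
    cols = list(zip(*G))
    return [tuple(sum(m & g for m, g in zip(msg, col)) % 2 for col in cols)
            for msg in product((0, 1), repeat=len(G))]

def min_distance(code):
    """Minimum Hamming weight of the nonzero codewords (linear code)."""
    return min(sum(c) for c in code if any(c))

def expurgate(code):
    """Even-weight subcode."""
    return [c for c in code if sum(c) % 2 == 0]

def augment(code):
    """code union (1 + code); assumes the all-one word is not already in code."""
    ones = (1,) * len(code[0])
    return sorted(set(code) | {tuple(a ^ b for a, b in zip(c, ones)) for c in code})

H = codewords(G_HAMMING)  # [7, 4, 3]
E = expurgate(H)          # expected [7, 3, 4]
A = augment(E)            # expected [7, 4, 3] again
print(len(E), min_distance(E))  # 8 4
w_max = max(sum(c) for c in E)
print(len(A), min_distance(A), min(min_distance(E), 7 - w_max))  # 16 3 3
print(set(A) == set(H))  # True
```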
Another example of BLRC from the code modification methods is presented in [60] using the shortened expurgated Hamming code.
Construction (SE-Hamming) [60]:
Let β be a primitive element of and n be a positive integer divisible by 3 such that . Let be an expurgated Hamming code with the generator polynomial , where is the minimal polynomial of β over . Then, a shortened expurgated Hamming code can be generated by shortening the first information bits of . Concatenating with an cyclic code with parity check polynomial as an inner code then yields an LRC .
3.10. Summary of BLRC Constructions
We summarize the discussed BLRC construction methods in Table 1. In Table 1, X indicates that the bound is not met with equality for all parameters. For the C-M bound, is assumed to satisfy the Singleton bound for given n and d.
Table 1.
Summary of parameters of various BLRC constructions.
4. Conclusions
This paper summarizes recently proposed constructions of BLRCs and their features. To achieve efficient hardware implementation, the codes are constructed over the binary field, which obviates multiplications during the encoding, decoding, and repair processes. We explain the various construction methods of BLRCs using cyclic code based, random vector based, bipartite or expander graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, constructions of BLRCs using code modification methods for linear codes, such as extending, shortening, expurgating, and augmenting, are introduced.
We have selectively reviewed important results on BLRCs from the authors’ perspective, so the selection inevitably reflects the authors’ bias. The omission of a result from this review does not imply that it is unimportant. We also apologize in advance for any missing citations or recent results; this area is being actively researched, and many papers have appeared in a relatively short period of time.
Author Contributions
The authors contributed equally to surveying the literature; conceptualizing the review; and writing, reviewing, editing, and drafting the manuscript.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. NRF-2017R1A2B2010588).
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| AC | Anticode |
| BG | Bipartite graph |
| BLRC | Binary locally repairable code |
| CC | Cyclic code |
| C-M | Cadambe–Mazumdar |
| DSS | Distributed storage system |
| EG | Expander graph |
| FRC | Fractional repetition code |
| GH | Generalized Hamming code |
| LRC | Locally repairable code |
| MDS | Maximum distance separable |
| MR | Maximal recoverability |
| MR-LRC | Maximally recoverable LRC |
| PS | Partial spread |
| RC | Regeneration code |
| RV | Random vector |
References
- Rhea, S.; Wells, C.; Eaton, P.; Geels, D.; Zhao, B.; Weatherspoon, H.; Kubiatowicz, J. Maintenance-free global data storage. IEEE Internet Comput. 2001, 5, 40–49.
- Chang, F.; Dean, J.; Ghemawat, S.; Hsieh, W.C.; Wallach, D.A.; Burrows, M.; Chandra, T.; Fikes, A.; Gruber, R.E. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. (TOCS) 2008, 26.
- Ghemawat, S.; Gobioff, H.; Leung, S.-T. The Google file system. In Proceedings of the 19th ACM Symp. Operating Systems Principles, Bolton Landing, NY, USA, 19–22 October 2003; pp. 20–43.
- Borthakur, D. The Hadoop Distributed File System: Architecture and Design. 2007. Available online: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.0/docs/hdfs_design.pdf (accessed on 9 April 2019).
- Rashmi, K.V.; Shah, N.B.; Gu, D.; Kuang, H.; Borthakur, D.; Ramchandran, K. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. In Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, San Jose, CA, USA, 6–9 October 2013.
- Dimakis, A.G.; Prabhakaran, V.; Ramchandran, K. Decentralized erasure codes for distributed networked storage. IEEE Trans. Inf. Theory 2006, 52, 2809–2816.
- Wu, Y.; Dimakis, A.G.; Ramchandran, K. Deterministic regenerating codes for distributed storage. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 18 September 2007; pp. 1–5.
- Rashmi, K.; Shah, N.; Kumar, P.V.; Ramchandran, K. Explicit construction of optimal exact regenerating codes for distributed storage. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 18 September 2009; pp. 1243–1249.
- Kim, Y.-S.; Park, H.; No, J.-S. Construction of new fractional repetition codes from relative difference sets with λ = 1. Entropy 2017, 19, 5637.
- Park, H.; Kim, Y.-S. Construction of fractional repetition codes with variable parameters for distributed storage systems. Entropy 2016, 18, 441.
- Tamo, I.; Barg, A. A family of optimal locally recoverable codes. IEEE Trans. Inf. Theory 2014, 60, 4661–4676.
- Rouayheb, S.E.; Ramchandran, K. Fractional repetition codes for repair in distributed storage systems. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–1 October 2010; pp. 1510–1517.
- Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C. A survey on network codes for distributed storage. Proc. IEEE 2011, 99, 476–489.
- Datta, A.; Oggier, F.E. An overview of codes tailor-made for better repairability in networked distributed storage systems. SIGACT News 2013, 44, 89–105.
- Li, J.; Li, B. Erasure coding for cloud storage systems: A survey. Tsinghua Sci. Technol. 2013, 18, 259–272.
- Liu, S.; Oggier, F. An overview of coding for distributed storage systems. In Network Coding and Subspace Designs; Springer: Berlin, Germany, 2018; pp. 363–383.
- Balaji, S.B.; Krishnan, M.N.; Vajha, M.; Ramkumar, V.; Sasidharan, B.; Kumar, P.V. Erasure coding for distributed storage: An overview. Sci. China Inf. Sci. 2018, 61, 100301.
- Rashmi, K.V.; Shah, N.B.; Ramchandran, K.; Kumar, P.V. Regenerating codes for errors and erasures in distributed storage. In Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings, Cambridge, MA, USA, 1–6 July 2012; pp. 1202–1206.
- Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K. Network coding for distributed storage systems. IEEE Trans. Inf. Theory 2010, 56, 4539–4551.
- Huang, C.; Chen, M.; Li, J. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2007), Cambridge, MA, USA, 12–14 July 2007; pp. 79–86.
- Huang, C.; Chen, M.; Li, J. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. ACM Trans. Storage (TOS) 2013, 9, 3:1–3:28.
- Oggier, F.; Datta, A. Self-repairing homomorphic codes for distributed storage systems. In Proceedings of the IEEE INFOCOM, Shanghai, China, 10–15 April 2011; pp. 1215–1223.
- Gopalan, P.; Huang, C.; Simitci, H.; Yekhanin, S. On the locality of codeword symbols. IEEE Trans. Inf. Theory 2012, 58, 6925–6934.
- Prakash, N.; Kamath, G.M.; Lalitha, V.; Kumar, P.V. Optimal linear codes with a local-error-correction property. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT 2012), Cambridge, MA, USA, 1–6 July 2012; pp. 2776–2780.
- Forbes, M.; Yekhanin, S. On the locality of codeword symbols in non-linear codes. Discret. Math. 2014, 324, 78–84.
- Tamo, I.; Papailiopoulos, D.S.; Dimakis, A.G. Optimal locally repairable codes and connections to matroid theory. IEEE Trans. Inf. Theory 2016, 62, 6661–6671.
- Calder, B.; Wang, J.; Ogus, A.; Nilakantan, N.; Skjolsvold, A.; McKelvie, S.; Xu, Y.; Srivastav, S.; Wu, J.; Simitci, H.; et al. Windows Azure storage: A highly available cloud storage service with strong consistency. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP’11), Cascais, Portugal, 23–26 October 2011; pp. 143–157.
- Mehrabi, M.; Ardakani, M.; Khabbazian, M. Minimizing the update complexity of Facebook HDFS-RAID locally repairable code. In Proceedings of the 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), Toronto, ON, Canada, 24–27 September 2017; pp. 1–5.
- Papailiopoulos, D.; Dimakis, A.G. Distributed storage codes through Hadamard designs. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT 2011), St. Petersburg, Russia, 31 July–5 August 2011; pp. 1230–1234.
- Silberstein, N.; Rawat, A.S.; Koyluoglu, O.O.; Vishwanath, S. Optimal locally repairable codes via rank-metric codes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, 7–12 July 2013; pp. 1819–1823.
- Papailiopoulos, D.S.; Dimakis, A.G. Locally repairable codes. IEEE Trans. Inf. Theory 2014, 60, 5843–5855.
- Kamath, G.M.; Prakash, N.; Lalitha, V.; Kumar, P.V. Codes with local regeneration and erasure correction. IEEE Trans. Inf. Theory 2014, 60, 4637–4660.
- Cadambe, V.R.; Mazumdar, A. Bounds on the size of locally recoverable codes. IEEE Trans. Inf. Theory 2015, 61, 5787–5794.
- Chen, M.; Huang, C.; Li, J. On the maximally recoverable property for multi-protection group codes. In Proceedings of the IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 486–490.
- Blaum, M.; Hafner, J.L.; Hetzler, S. Partial-MDS codes and their application to RAID type of architectures. IEEE Trans. Inf. Theory 2013, 59, 4510–4519.
- Gopalan, P.; Huang, C.; Jenkins, B.; Yekhanin, S. Explicit maximally recoverable codes with locality. IEEE Trans. Inf. Theory 2014, 60, 5245–5256.
- Martínez-Peñas, U.; Kschischang, F.R. Universal and dynamic locally repairable codes with maximal recoverability via sum-rank codes. In Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–5 October 2018; pp. 792–799.
- Shah, N.B.; Lee, K.; Ramchandran, K. When do redundant requests reduce latency? In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 731–738.
- Joshi, G.; Liu, Y.; Soljanin, E. On the delay-storage trade-off in content download from coded distributed storage systems. IEEE J. Sel. Areas Commun. 2014, 32, 989–997.
- Liang, G.; Kozat, U. TOFEC: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes. In Proceedings of the IEEE Conference on Computer Communications (IEEE INFOCOM), Toronto, ON, Canada, 27 April–2 May 2014; pp. 826–834.
- Rawat, A.S.; Papailiopoulos, D.S.; Dimakis, A.G.; Vishwanath, S. Locality and availability in distributed storage. IEEE Trans. Inf. Theory 2016, 62, 4481–4493.
- Lee, K.-S.; Park, H.; No, J.-S. New binary locally repairable codes with locality 2 and uneven availabilities for hot data. Entropy 2018, 20, 636.
- Shahabinejad, M.; Khabbazian, M.; Ardakani, M. An efficient binary locally repairable code for Hadoop distributed file system. IEEE Commun. Lett. 2014, 18, 1287–1290.
- Shahabinejad, M.; Khabbazian, M.; Ardakani, M. A class of binary locally repairable codes. IEEE Trans. Commun. 2016, 64, 3182–3193.
- Shahabinejad, M.; Khabbazian, M.; Ardakani, M. On the average locality of locally repairable codes. IEEE Trans. Commun. 2018, 66, 2773–2783.
- Hu, S.; Tamo, I.; Barg, A. Combinatorial and LP bounds for LRC codes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2016), Barcelona, Spain, 10–15 July 2016; pp. 1008–1012.
- Wang, A.; Zhang, Z.; Lin, D. Bounds and constructions for linear locally repairable codes over binary fields. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2033–2037.
- Wang, A.; Zhang, Z.; Lin, D. Bounds for binary linear locally repairable codes via a sphere-packing approach. IEEE Trans. Inf. Theory 2019.
- Agarwal, A.; Barg, A.; Hu, S.; Mazumdar, A.; Tamo, I. Combinatorial alphabet-dependent bounds for locally recoverable codes. IEEE Trans. Inf. Theory 2018, 64, 3481–3492.
- Ma, J.; Ge, G. Optimal binary linear locally repairable codes with disjoint repair groups. arXiv 2017, arXiv:1711.07138v1.
- Goparaju, S.; Calderbank, R. Binary cyclic codes that are locally repairable. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 676–680.
- Zeh, A.; Yaakobi, E. Optimal linear and cyclic locally repairable codes over small fields. In Proceedings of the IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
- Huang, P.; Yaakobi, E.; Uchikawa, H.; Siegel, P.H. Binary linear locally repairable codes. IEEE Trans. Inf. Theory 2016, 62, 6268–6283.
- Tamo, I.; Barg, A.; Goparaju, S.; Calderbank, R. Cyclic LRC codes, binary LRC codes, and upper bounds on the distance of cyclic codes. Int. J. Inf. Coding Theory 2016, 3, 345–364.
- Tamo, I.; Barg, A.; Frolov, A. Bounds on the parameters of locally recoverable codes. IEEE Trans. Inf. Theory 2016, 62, 3070–3083.
- Kruglik, S.; Nazirkhanova, K.; Frolov, A. New bounds and generalizations of locally recoverable codes with availability. IEEE Trans. Inf. Theory 2019.
- Silberstein, N.; Zeh, A. Optimal binary locally repairable codes via anticodes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1247–1251.
- Nam, M.Y.; Song, H.Y. Binary locally repairable codes with minimum distance at least 6 based on partial t-spreads. IEEE Commun. Lett. 2017, 21, 1683–1686.
- Kim, C.; No, J.-S. New constructions of binary LRCs with disjoint repair groups and locality 3 using existing LRCs. IEEE Commun. Lett. 2019, 23, 406–409.
- Kim, C.; No, J.-S. New constructions of binary and ternary locally repairable codes using cyclic codes. IEEE Commun. Lett. 2018, 22, 228–231.
- Farrell, P. Linear binary anticodes. Electron. Lett. 1970, 6, 419–421.
- MacWilliams, F.J.; Sloane, N.J.A. The Theory of Error-Correcting Codes; North Holland: Amsterdam, The Netherlands, 1988; p. 547.
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
