Overview of Binary Locally Repairable Codes for Distributed Storage Systems

Kim, Young-Sik; Kim, Chanki; No, Jong-Seon

doi:10.3390/electronics8060596


by Young-Sik Kim ¹,*, Chanki Kim ² and Jong-Seon No ²

¹ Department of Information and Communication Engineering, Chosun University, Gwangju 61452, Korea

² Department of Electrical and Computer Engineering, Institute of New Media and Communications, Seoul National University, Seoul 08826, Korea

* Author to whom correspondence should be addressed.

Electronics 2019, 8(6), 596; https://doi.org/10.3390/electronics8060596

Submission received: 9 April 2019 / Revised: 20 May 2019 / Accepted: 22 May 2019 / Published: 28 May 2019

(This article belongs to the Special Issue Energy-Efficient and Reliable Information Processing: Computing and Storage)


Abstract

This paper summarizes the details of recently proposed binary locally repairable codes (BLRCs) and their features. The construction of codes over a small symbol alphabet is of particular interest for efficient hardware implementation. Therefore, BLRCs are highly noteworthy because no multiplication is required during the encoding, decoding, and repair processes. We explain the various construction approaches of BLRCs such as cyclic code based, bipartite graph based, anticode based, partial spread based, and generalized Hamming code based techniques. We also describe code generation methods based on modifications of linear codes such as extending, shortening, expurgating, and augmenting. Finally, we summarize and compare the parameters of the discussed constructions.

Keywords:

locally repairable codes (LRCs); locality; availability; regeneration codes; distributed storage system; data center

1. Introduction

Efficient distributed storage systems (DSSs) are considered to be crucial infrastructure for handling big data. These systems must be able to reliably store data over a long duration by introducing redundancy and storing data in a distributed manner across several storage nodes, which may be individually unreliable and could generate failures. Large data centers and peer-to-peer storage systems such as OceanStore [1] from Berkeley and BigTable from Google [2] are famous examples of distributed storage systems.

Owing to cost issues, large data centers also use many commercial hardware storage devices such as hard disk drives and solid state drives (HDDs/SSDs). As a result, device failure occurs regularly, rather than as an exception. The data are typically stored in a redundant manner to effectively protect valuable data against potential failures. The traditional storage method for large storage services such as cloud storage is triplication, i.e., triple replication of each symbol. For example, the Google file system [3] and Hadoop [4] adopt this approach. However, given that triplication requires thrice the storage space, Facebook deploys a (14, 10) Reed–Solomon (RS) code in its warehouse cluster [5]. Although RS codes are efficient for handling specified numbers of erasures, all of the code symbols must be communicated and reconstructed to repair erasures. Thus, more efficient storage methods have been actively researched, including regeneration codes (RCs), fractional repetition codes (FRCs), and locally repairable codes (LRCs) [6,7,8,9,10,11,12]. RCs attempt to minimize the number of transmitted symbols, while the objective of LRCs is to optimize the number of disk reads required to repair a single lost node. In some respects, an LRC is essentially a block code with an additional parameter referred to as locality. There have been excellent reviews on distributed storage codes (e.g., [13,14,15,16]). Moreover, a review article on this topic has recently been published [17]. However, to the best of the authors' knowledge, no review paper deals only with binary LRC (BLRC) constructions, which are practically useful.

In most of the early suggestions for LRC constructions, the alphabet size of the stored symbols is very large. However, for efficient and convenient hardware implementation, the construction of codes over a small alphabet size for the stored symbols is of particular interest. For example, BLRCs are of special interest because multiplication is not necessary during the encoding, decoding, and repair processes.

This paper summarizes the recently proposed constructions of BLRCs and their features. The code construction methods discussed in this paper are categorized as in Figure 1. The construction methods of BLRCs are explained using cyclic code based, bipartite graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, the construction of BLRCs using modification methods for linear codes such as extending, shortening, expurgating, augmenting, and lengthening is discussed. This paper is organized into several sections. In Section 2, the basic concepts used in the coding techniques for distributed storage systems are introduced. In addition, the characteristics of RC, LRC, and FRC are explained, including the meaning of locality and availability. In Section 3, generation methods of LRCs are summarized with respect to individual types and features, with a focus on BLRC. Finally, the main conclusions are summarized in Section 4.

2. Preliminaries

2.1. Classification of Storage Codes for DSS

There are several types of codes for data storage systems such as regeneration codes, locally repairable codes, and fractional repetition codes. Regeneration codes are a class of codes that enhance data reliability and facilitate the efficient repair of failed nodes in distributed storage systems [18,19]. The key metric of these codes is the network bandwidth, which is intended to optimize the amount of data communicated to repair a single failure node. In the case of node failure, it is necessary to recover the data stored in the failed node or restore them in the replacement node. This is called repair or regeneration of a node. During the repair process, data are typically downloaded from the remaining nodes. In this case, downloading the entire message is a waste of network resources. Therefore, regeneration codes are introduced to reduce the amount of downloaded data during the repair process while retaining the storage efficiency of traditional maximum distance separable (MDS) codes.

The earliest LRCs were proposed as pyramid codes [20,21,22]. The formal definition of LRC with a tradeoff between locality r and the minimum distance d first appears in [23]. LRCs focus on optimization of the number of nodes accessed for node repair and reconstruction. These codes are introduced in [24] and developed further in [25,26]. In addition, LRCs were recently utilized in distributed storage systems, such as Windows Azure storage [27] and Facebook HDFS-RAID [28].

There are several approaches for the construction of efficient storage codes for distributed storage systems as follows:

–: nonlinear codes [25,29];
–: vector codes [30,31,32];
–: codes over bounded alphabets [33];
–: codes with short local MDS [24,30]; and
–: codes with local regeneration [30,32].

A more detailed review of each method can be found in [17].

2.2. Locality, Recoverability, and Availability for Hot Data

Several criteria are used to evaluate the performance of distributed storage codes. This subsection introduces the concepts and definitions of the most important ones, including locality, reliability, and availability.

Let $C$ be an $(n, k)$ q-ary code of length n and dimension k over a finite field $F_{q}$. The locality of the ith coordinate of $C$ is r if the value of the ith symbol of a codeword of $C$ can be represented as a function of r other coordinates, and no such set of coordinates with cardinality less than r exists. This means that a coordinate in a linear code has locality r if it can be expressed as a linear combination of r other coordinates. The set of such r coordinates that can repair the ith symbol is called a “repair set”. An $(n, k)$ code $C$ with locality r is denoted as an $(n, k, r)$ locally repairable code. In addition to maximizing the minimum distance, a maximally recoverable LRC (MR-LRC) is defined as a code that can correct all theoretically correctable erasure patterns under the locality constraints.

If the ith symbol $c_{i}$ in a codeword is lost, it can be recovered by reading r other symbols in the codeword. In this case, the locality can be classified into two cases: “information locality r” if all information symbols have locality r, and “all-symbol locality r” if all symbols have locality r. In the case of node failure, the decoding complexity of LRCs can be decoupled from the code length n.
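To make the notion of a repair set concrete, the following minimal Python sketch repairs one lost symbol of a binary local group by XOR-ing the r surviving symbols. It assumes, for illustration only, that the local group is a single parity check codeword; the function name `repair_symbol` and the example bits are hypothetical and do not come from the cited works.

```python
from functools import reduce
from operator import xor

def repair_symbol(group, lost_index):
    """Recover group[lost_index] from the other symbols of a local group.

    Assumption of this sketch: the group is a binary single parity check
    codeword, i.e., the XOR of all r + 1 symbols is zero.
    """
    survivors = [s for i, s in enumerate(group) if i != lost_index]
    return reduce(xor, survivors, 0)

# Hypothetical local group with r = 3 data bits and one local parity bit.
data = [1, 0, 1]
group = data + [reduce(xor, data, 0)]

# Every symbol of the group can be rebuilt from the other r = 3 symbols.
repaired = [repair_symbol(group, i) for i in range(len(group))]
```

Only XOR operations on r symbols are involved, regardless of the overall code length n, which is the sense in which the repair cost is decoupled from n.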

Other construction schemes for LRCs are intended to build codes with maximal recoverability (MR) called MR-LRCs, or partial MDS codes. Some examples are found in [34,35,36,37]. For MR-LRCs, it is important to not only maximize the global distance but also to correct any erasure patterns within a theoretical bound. Therefore, they are considered as a stronger class of LRCs than optimal LRCs [37].

Another important performance criterion is availability [38,39,40,41]. Availability is a very important feature when “hot data” are accessed. Hot data are data that are frequently accessed simultaneously by many users in front-end systems. A binary linear code $C$ of length n is called a t-available r-locally repairable code if every coordinate i for $1 \leq i \leq n$ is covered by at least t parity checks, each with $r + 1$ nonzero elements, whose supports are disjoint apart from coordinate i. A symbol has availability t if it can be read in parallel by t disjoint groups of symbols. These t reads have locality r if each read involves up to r symbols. Replication provides high availability for hot data. For example, if replication is performed three times, each symbol can be read in parallel three times, so the availability is $t = 3$ and the locality of these reads is $r = 1$. One possible alternative is an LRC with multiple disjoint recovery sets.

There are two types of availabilities, namely information-symbol availability and all-symbol availability. If an $(n, k, r, d)$ LRC supports availability t for local repair on each of k information symbols, it is referred to as an $(n, k, r, t, d)$ LRC with information-symbol availability. If an $(n, k, r, d)$ LRC supports availability t for all n symbols, it is referred to as an $(n, k, r, t, d)$ LRC with all-symbol availability [42].
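As a small illustration of locality and availability, the sketch below counts the size-2 repair sets of every coordinate of the binary $(7, 3, 4)$ simplex code. The choice of this code as the example, the integer encoding of generator-matrix columns, and the function names are assumptions of the sketch. A coordinate with column c is the XOR of coordinates a and b whenever the columns satisfy a XOR b = c.

```python
from itertools import combinations

def repair_pairs(columns, c):
    """Pairs (a, b) of other coordinates whose columns XOR to column c."""
    return [(a, b) for a, b in combinations(columns, 2)
            if c not in (a, b) and a ^ b == c]

def availability_profile(columns):
    """Number of size-2 repair sets per coordinate.

    For a fixed c, the map a -> a ^ c pairs up the remaining columns, so
    the repair pairs found for c never share a coordinate (they are disjoint).
    """
    return [len(repair_pairs(columns, c)) for c in columns]

# Columns of a generator matrix of the (7, 3, 4) simplex code, stored as
# 3-bit integers: all nonzero vectors of F_2^3.
simplex_columns = list(range(1, 8))
profile = availability_profile(simplex_columns)
```

In this example every coordinate has three disjoint repair pairs, i.e., all-symbol locality $r = 2$ with availability $t = 3$.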

3. Binary Locally Repairable Codes

When LRCs were first introduced, there was no restriction on the field size. For the Singleton-like bound in [31], optimal constructions that match the bound are known for field size $q > n + 1$, where the optimal LRCs are constructed using an algebraic structure. However, the coding complexity can be significantly reduced using BLRCs.

Compared to q-ary LRCs, BLRCs are known to be advantageous in terms of implementation in practical systems. In [43], the advantages of an $(n, k, d, r) = (15, 10, 4, 6)$ BLRC are discussed and compared with a $(16, 10, 4, 5)$ non-binary LRC, a (14,10) RS code, and three-replication in terms of four metrics: encoding complexity, repair complexity, mean time to data loss, and storage capacity. The authors of [43] further analyzed the advantages of BLRCs with a high Hamming distance and average locality [44,45]. In this section, we introduce bounds for BLRCs and various construction methods of BLRCs.

3.1. Bounds for the Binary Locally Repairable Codes

The bounds and constructions of BLRCs are quite different from those of q-ary LRCs. For the bound, the maximum code dimension of BLRCs is smaller than that of q-ary LRC and the corresponding optimal construction of the former should be made by different motivations such as easy implementation. Initially, we discuss the useful bounds for BLRCs.

Let us start with a general bound on LRCs that shows the tradeoff between the rate $k / n$, minimum distance d, and locality r [23]. For linear LRCs with information locality r, there are tradeoffs among n, k, d, and r. Let $C$ be an $(n, k, r)$ LRC. Assuming that $r | k$ and $(r + 1) | n$, the rate is bounded as follows:

\begin{matrix} \frac{k}{n} \leq \frac{r}{r + 1} . \end{matrix}

In addition, the minimum distance is bounded by [31]

\begin{matrix} d \leq n - k - ⌊\frac{k}{r}⌋ + 2, \end{matrix}

which is called a Singleton-like bound because it generalizes the classical Singleton bound for linear codes, to which it reduces if $r = k$. It is well-known that a q-ary $(n, k, d)$ MDS code achieves the Singleton bound. An optimal $(n, k, r)$ LRC achieves the Singleton-like bound with equality. We can consider two extreme cases, $r = k$ and $r = 1$. For $r = k$, we have $d \leq n - k + 1$ and an $(n, k)$ RS code is an $(n, k, r = k)$ optimal LRC. For $r = 1$, we have $d \leq n - k - ⌊ k ⌋ + 2 = 2 (\frac{n}{2} - k + 1)$ and the duplication of an $(n / 2, k)$ RS code is an $(n, k, r = 1)$ optimal LRC. Therefore, we are interested in the case of $1 < r < k$.
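The two extreme cases can be checked numerically. The following Python sketch evaluates the Singleton-like bound exactly as written above; the function name and the sample parameters are illustrative assumptions.

```python
def singleton_like_bound(n, k, r):
    """Upper bound d <= n - k - floor(k / r) + 2, as stated in the text."""
    if not 1 <= r <= k <= n:
        raise ValueError("expected 1 <= r <= k <= n")
    return n - k - k // r + 2

# r = k recovers the classical Singleton bound n - k + 1.
mds_case = singleton_like_bound(14, 10, 10)

# r = 1 gives n - 2k + 2 = 2 (n/2 - k + 1).
replication_case = singleton_like_bound(20, 6, 1)

# A hypothetical intermediate locality with 1 < r < k and r | k.
intermediate_case = singleton_like_bound(12, 6, 3)
```

For the sample values, the bound evaluates to 5, 10, and 6, respectively.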

For the bounds of BLRCs, the Cadambe–Mazumdar (C-M) bound [33], the linear programming bound [46], and the $L$-space bounds [47,48] are introduced. The first bound, which takes the alphabet size into account, is given as

\begin{matrix} k \leq min_{t \in Z^{+}} [t r + k_{o p t}^{(q)} (n - (r + 1) t, d)], \end{matrix}

(1)

where $k_{o p t}^{(q)} (n, d)$ denotes the largest possible dimension of an $(n, k, d)$ linear code over $F_{q}$. The C-M bound is often used to determine whether the given BLRC with short code length is optimal [32]. However, because the exact value of $k_{o p t}^{(q)} (n, d)$ can only be obtained in a limited number of cases with relatively short code length, it is difficult to apply the C-M bound to evaluate the optimality of general BLRCs.
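A rough Python sketch of how Equation (1) can be evaluated for binary codes is given below. It rests on two assumptions that are not part of the text: $k_{o p t}^{(2)} (n, d)$ is supplied in closed form only for $d \leq 4$ (where shortened and extended Hamming codes are optimal), and t is restricted to values that leave a residual length of at least d, which can only loosen the bound. The function names are hypothetical.

```python
def k_opt_binary(n, d):
    """Largest dimension of a binary linear code of length n, distance d.

    Covers only d in {1, 2, 3, 4}, where the value is known in closed form;
    this restriction is an assumption of the sketch.
    """
    if n < d:
        return 0
    if d == 1:
        return n
    if d == 2:
        return n - 1
    if d == 3:
        return n - n.bit_length()            # n - ceil(log2(n + 1))
    if d == 4:
        return n - 1 - (n - 1).bit_length()  # n - 1 - ceil(log2(n))
    raise NotImplementedError("closed form used here only for d <= 4")

def cm_bound(n, r, d):
    """min over t of t*r + k_opt(n - (r + 1)*t, d), residual length >= d."""
    candidates = [t * r + k_opt_binary(n - (r + 1) * t, d)
                  for t in range(1, n // (r + 1) + 1)
                  if n - (r + 1) * t >= d]
    return min(candidates) if candidates else None

bound_15_6_4 = cm_bound(15, 6, 4)
```

For $n = 15$, $r = 6$, and $d = 4$, the sketch returns 10, which agrees with the dimension of the $(15, 10, 4, 6)$ BLRC mentioned at the beginning of this section.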

In addition, a linear programming bound was proposed using the Delsarte linear programming method, which is known to be tighter than the C-M bound for BLRCs for some parameters [49]. However, both bounds are expressed in the implicit forms and, thus, it is difficult to apply these bounds to BLRCs with long code lengths.

For an $(n, k, d)$ linear LRC $C$, an $L$-space bound was recently proposed using sphere packing [47,48]. The $L$-space is defined as the dual of the linear space generated by a minimum set of local parity checks of $C$ with overall support covering all coordinates. For an $(n, k, d, r)$ BLRC with disjoint repair groups, where $d = 2 t + 2$ and $n = (r + 1) l$, the following bounds hold, depending on the parity of $t + 1$ [50].

(i): If $t + 1$ is odd, we have

$\begin{matrix} k \leq \frac{r n}{r + 1} - ⌈{log}_{2} (\sum_{0 \leq i_{1} + \dots + i_{l} \leq ⌊ \frac{d - 1}{4} ⌋} \prod_{j = 1}^{l} (\binom{r + 1}{2 i_{j}}))⌉ . \end{matrix}$

(2)
(ii): If $t + 1$ is even, we have

$\begin{matrix} k \leq \frac{r n}{r + 1} - ⌈{log}_{2} (\sum_{0 \leq i_{1} + \dots + i_{l} \leq ⌊ \frac{d - 1}{4} ⌋} \prod_{j = 1}^{l} (\binom{r + 1}{2 i_{j}}) + \frac{\sum_{i_{1} + \dots + i_{l} = \frac{d}{4}} \prod_{j = 1}^{l} (\binom{r + 1}{2 i_{j}})}{⌊ \frac{n}{t + 1} ⌋})⌉ . \end{matrix}$

(3)

These bounds are advantageous in two ways compared to the previous bounds. Firstly, the $L$-space bound is known to be tighter than the C-M bound for BLRCs with long code lengths. In addition, the inequality of the bound is expressed in an explicit form, i.e., the value of the bound is easily derived for BLRCs with long code lengths. Furthermore, an improved $L$-space bound is derived with a refined packing radius for BLRCs with $4 | d$ [50].

A bound in an explicit form for $d \geq 5$ is given in [48]. For an $(n, k, d)$ linear BLRC with locality r, such that $d \geq 5$ and $2 \leq r \leq \frac{n}{2} - 2$, it follows that

\begin{matrix} k \leq \frac{n r}{r + 1} - min \{{log}_{2} (1 + \frac{r n}{2}), \frac{r n}{(r + 1) (r + 2)}\} . \end{matrix}

(4)
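Because Equation (4) is explicit, it can be evaluated directly. The short Python sketch below does so; the function name and the sample parameters $n = 15$, $r = 2$ are illustrative assumptions.

```python
from math import floor, log2

def explicit_blrc_bound(n, r):
    """Right-hand side of Equation (4); valid for d >= 5, 2 <= r <= n/2 - 2."""
    if not 2 <= r <= n / 2 - 2:
        raise ValueError("Equation (4) requires 2 <= r <= n/2 - 2")
    penalty = min(log2(1 + r * n / 2), r * n / ((r + 1) * (r + 2)))
    return n * r / (r + 1) - penalty

# Sample evaluation: n = 15, r = 2 gives 10 - min(4, 2.5) = 7.5.
bound = explicit_blrc_bound(15, 2)
max_dimension = floor(bound)
```

For these sample values, any binary LRC with $d \geq 5$ and locality 2 of length 15 has dimension at most 7.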

In the next subsection, we introduce the construction of BLRCs with various parameters and motivations, some of which are optimal or near-optimal with respect to the aforementioned bounds.

3.2. Classification of Binary Locally Repairable Codes

For the construction of BLRCs, various methods have been proposed based on the following:

(i): cyclic codes [51,52,53,54];
(ii): random vectors [42];
(iii): bipartite graph [44,55,56];
(iv): anticodes [57];
(v): partial spread [50,58];
(vi): generalized Hamming code [47,48]; and
(vii): modification of codes [53,59].

In the following subsections, the various types of constructions of BLRCs are summarized.

3.3. BLRCs from Cyclic Codes

Goparaju and Calderbank proposed several constructions of BLRCs from cyclic codes [51]. Cyclic codes inherently enjoy efficient structures for encoder and decoder implementation. The q-cyclotomic coset $M_{i, n}$ is defined as

\begin{matrix} M_{i, n} = \{i q^{j} mod n | 0 \leq j < a\}, \end{matrix}

where a is the smallest positive integer that satisfies $i q^{a} \equiv i mod n$. The defining set of an $(n, k, d)$ cyclic code $C$ is defined as

\begin{matrix} D_{C} = \{i | g (α^{i}) = 0, 0 \leq i \leq n - 1\}, \end{matrix}

where $g (x)$ is the generator polynomial of $C$, which has its roots in the splitting field $F_{q^{s}}$, $n | (q^{s} - 1)$. Using optimal cyclic codes in terms of the Singleton bound, three BLRC constructions are suggested as follows.

Construction (CC1) [51]:

Let $n = 2^{m} - 1$, let $r + 1$ be a factor of n, and let α be a primitive element of $F_{2^{m}}$. Let $C$ be a cyclic code whose generator polynomial $g (x)$ has the defining set

\begin{matrix} D_{C} = \{j | j \equiv 0 mod (r + 1), 0 \leq j \leq n - 1\} . \end{matrix}

Then, $C$ is an LRC with locality r and dimension $k = r n / (r + 1)$.

Construction (CC2) [51]:

Let $n = 2^{m} - 1$ with even m, and locality $r = 2$. Let $C$ be a cyclic code in which the generator polynomial $g (x)$ has the defining set

\begin{matrix} D_{C} = \{j | j \equiv 0 mod 3, 0 \leq j \leq n - 1\} \cup M_{1, n} . \end{matrix}

Then, $C$ is an LRC of dimension $k = \frac{2}{3} (2^{m} - 1) - m$ and distance $d \geq 6$.

Construction (CC2) is shown to be distance-optimal among the set of linear codes that have disjoint locality parity checks.
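The parameters of Construction (CC2) can be checked on the smallest case $m = 4$ ($n = 15$). The Python sketch below computes the cyclotomic coset $M_{1, n}$, takes the defining set as the multiples of 3 together with $M_{1, n}$ (which is the reading consistent with the stated dimension), and finds the minimum distance by exhaustive search. The helper names, the bit-mask representation of polynomials, and the choice of the primitive polynomial $x^{4} + x + 1$ for α are assumptions of this sketch.

```python
def cyclotomic_coset(i, n, q=2):
    """q-cyclotomic coset of i modulo n."""
    coset, x = set(), i % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return coset

def cc2_defining_set(m):
    """Multiples of 3 together with M_{1,n}, for n = 2^m - 1 and even m."""
    n = 2**m - 1
    return n, {j for j in range(n) if j % 3 == 0} | cyclotomic_coset(1, n)

def gf2_poly_mul(a, b):
    """Multiply two GF(2) polynomials stored as bit masks (bit i <-> x^i)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def min_distance(g, k):
    """Minimum weight over all nonzero multiples u(x) g(x) with deg u < k."""
    return min(bin(gf2_poly_mul(u, g)).count("1") for u in range(1, 2**k))

n, defining_set = cc2_defining_set(4)   # n = 15
k = n - len(defining_set)               # dimension = n - deg g(x)

# Zeros at the multiples of 3 are the roots of x^5 + 1; zeros in M_{1,15}
# are the roots of x^4 + x + 1 (assumed minimal polynomial of alpha).
g = gf2_poly_mul(0b100001, 0b10011)
d = min_distance(g, k)
```

For this case, the sketch gives $k = 6 = \frac{2}{3} \cdot 15 - 4$ and $d = 6$, in agreement with the stated dimension and with $d \geq 6$.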

Construction (CC3) [51]:

Let $n = 2^{m} - 1$. Let α be a primitive element of $F_{2^{m}}$. The generator polynomial with the defining set

\begin{matrix} D_{C} = \{j | j \equiv 0 mod 3, 0 \leq j \leq n - 1\} \cup M_{1, n} \cup M_{- 1, n} \end{matrix}

can construct a BLRC that satisfies the following inequality $k \leq \frac{2}{3} (2^{m} - 1) - 2 m$ for even k, $d = 10$, and $r = 2$.

The BLRC construction from the $(7, 4, 3)$ binary Hamming code is expressed in the following construction.

Construction (CC4) [51]:

For $3 | m$, we have $7 | n$ when $n = 2^{m} - 1$. Let $C$ be a cyclic code in which the generator polynomial $g (x)$ has the defining set

\begin{matrix} D_{C} = \{j | j (mod 7) \in \{0, 3, 5, 6\}, 0 \leq j \leq n - 1\} . \end{matrix}

Then, $C$ is a three-available two-local LRC with dimension $k = 3 n / 7$ and minimum distance $d = 4$. The corresponding parity check polynomial $h (x)$ is then given as

\begin{matrix} h (x) = 1 + x^{n / 7} + x^{3 n / 7} . \end{matrix}

Extending the results in [51], Zeh and Yaakobi proposed several construction methods for BLRCs in [52]. These constructions generate BLRCs with locality 2. Construction (CC5) is based on binary reversible codes. Let $D_{C}^{[l]}$ be the set given as $\{(i + l) | i \in D_{C}\}$. Let $D_{L}$ be the defining set of an $(r + 1, r, 2)$ single parity check code, which can correct one erasure in a block of length $r + 1$. Then, a BLRC can be obtained as in Construction (CC5).

Construction (CC5) [52]:

For odd m, let $n = 2^{m} + 1$ and $3 | n$. Let $L$ be a $(3, 2, 2)$ single parity check code with $D_{L} = \{0\}$, and let the defining set of $C$ be given as

\begin{matrix} D_{C} = D_{L} \cup D_{L}^{[3]} \cup \dots \cup D_{L}^{[n - 3]} \cup M_{1, n} = \{j | j \equiv 0 mod 3, 0 \leq j \leq n - 1\} \cup M_{1, n} . \end{matrix}

The corresponding code $C$ is then an $(n, k, d, r)$ BLRC, where $k = \frac{2}{3} (2^{m} + 1) - 2 m$, $d \geq 10$, and $r = 2$.

In addition, Construction (CC4) was extended to obtain codes with a higher Hamming distance at the cost of a small reduction of the rate as follows:

Construction (CC6) [52]:

Let $n = 2^{m} - 1$ and $7 | n$ (i.e., $3 | m$). Let $D_{C}$ be the defining set given as

\begin{matrix} D_{C} & = \{j | j (mod 7) \in \{0, 3, 5, 6\}, 0 \leq j \leq n - 1\} \cup M_{1, n} \\ & = \{\dots, - 9, - 8, - 7, - 4, - 2, - 1, 0, 3, 5, 6, 7, 10, 12, 13, \dots\} \cup M_{1, n} . \end{matrix}

Then, the corresponding code $C$ is a BLRC with $k = 3 n / 7 - m$, $d \geq 12$, locality $r = 2$, and availability $t = 2$.

This construction was extended to one based on the $(2^{a} - 1, a, 2^{a - 1})$ simplex code $L$ with availability $2^{a - 1} - 1$ and locality 2 as follows.

Construction (CC7) [52]:

Let $n = 2^{m} - 1$, which is divisible by $2^{a} - 1$ (i.e., $a | m$). Let $L$ be a $(2^{a} - 1, a, 2^{a - 1})$ cyclic simplex code with the defining set given as

\begin{matrix} D_{L} = {0, 3, 5, 6, 2^{a - 1 + 1}, \dots, 2^{a} - 1} \cup M_{1, n} . \end{matrix}

The corresponding code $C$ is then a BLRC with $d \geq 2^{a} + 2^{a - 1}$, $r = 2$, $t = 2^{a - 1} - 1$, and dimension $k = \frac{a}{2^{a} - 1} (2^{m} - 1) - m$.

Another example of BLRCs was proposed by Tamo, Barg, Goparaju, and Calderbank in 2016 as in the following construction.

Construction (CC8) [54]:

Let α be an nth root of unity and let z be an integer such that $(2^{z} - 1) | n$ and $z \geq 1$. Let $D$ be an $(n, k)$ binary cyclic code whose defining set contains the coset $α G_{2^{z} - 1}$ of the group $G_{2^{z} - 1} = < α^{2^{z} - 1} >$. Then, the locality of $D$ is bounded as $r \leq 2^{z - 1} - 1$. Moreover, each symbol of the codewords in $D$ has at least $2^{z - 1}$ recovery sets $A_{i}$ of size $2^{z - 1} - 1$.

A BLRC that can satisfy the explicit bound given in Equation (4) is also proposed in [60] as follows:

Construction (CC9) [60]:

For $(r + 1) | n$, let $v = \frac{n}{r + 1}$ and $u = r + 1$, where $gcd (u, v) = 1$ and $u, v \geq 2$. Let $g (x)$ be a generator polynomial of the cyclic BLRC and $β^{'}$ be the uth root of unity. Then, a $(u v, u v - deg (g (x)), 4, u - 1)$ BLRC can be constructed using the generator polynomials given by

(i): For $2 | r$ , $g (x) = (x^{v} + 1) g_{1} (x)$ , where $g_{1} (x)$ is the minimum polynomial of $β^{'}$ over $F_{2}$ .
(ii): For $r = 2^{m} - 1$ , $g (x) = (x^{v} + 1) {(x + 1)}^{2^{m - 1}}$ , where m is a positive integer.

3.4. BLRCs from Random Vectors

A family of high-rate BLRCs with locality two and uneven availabilities was proposed in [42], which requires intermediate procedures. The uneven availability is represented as an availability profile. For its construction, a k-tuple binary column vector $z_{k}$ with a nonzero element at a random position is required. Let $Z (x)$ be a random function that converts x into a binary vector with the same length by changing a zero element into a nonzero element. From $z_{k}$, the $k \times k$ square matrices $P_{k, l}$ for $1 \leq l \leq k - 1$ are constructed individually by increasing l as follows:

\begin{matrix} P_{k, l} = [Z^{l} (z_{k}) Z_{(1)}^{l} (z_{k}) \dots Z_{(k - 1)}^{l} (z_{k})], \end{matrix}

where $Z^{l} (z_{k})$ is generated from $Z^{l - 1} (z_{k})$ by the lexicographical order of construction, and $Z_{(i)}^{l} (z_{k})$ is the vector obtained by cyclically shifting $Z^{l} (z_{k})$ downward i times. Then, a $k \times k (k - 2)$ matrix $P_{k}$ for the parity part of the generator matrix in a systematic form is generated by concatenating the matrices $P_{k, 1}, P_{k, 2}, \dots, P_{k, k - 2}$ as follows:

\begin{matrix} P_{k} = [P_{k, 1} P_{k, 2} \dots P_{k, k - 2}] = [Z^{1} (z_{k}) Z_{(1)}^{1} (z_{k}) \dots Z_{(k - 1)}^{1} (z_{k}) \dots Z^{k - 2} (z_{k}) Z_{(1)}^{k - 2} (z_{k}) \dots Z_{(k - 1)}^{k - 2} (z_{k})] . \end{matrix}

Construction (RV) [42]:

Let $G_{(n, k)}$ denote the generator matrix of the proposed $(n, k)$ BLRC $C$ in a systematic form. Then, a $k \times n$ systematic generator matrix $G_{(n, k)}$ is constructed as

\begin{matrix} G_{(n, k)} = [I_{k} P_{k}] . \end{matrix}

It should be noted that the $k \times k (k - 1)$ generator matrix $G_{(n, k)}$ has a code rate of $R = 1 / (k - 1)$.

An $(n, k)$ BLRC $C$ from Construction (RV) has an all-symbol locality equal to $r = 2$ and the all-symbol availability profile is given by

\begin{matrix} t = [k - 1, \dots, k - 1, 2, \dots, 2, 1, \dots, 1], \end{matrix}

where the numbers of $(k - 1)$s, 2s, and 1s are k, $k (k - 3)$, and k, respectively, and each value denotes the availability for local repair of the ith symbol of a codeword in $C$.

3.5. BLRCs from Bipartite Graph

In coding theory, a Tanner graph is a bipartite graph with two sets of vertices, a set of n variable nodes and a set of $(n - k)$ check nodes, which represents the constraints of an error correcting code. Suppose that n variable nodes are partitioned into $l = n / (r + 1)$ groups. All variable nodes in each group are linked to a unique check node called the local check node, and the other check nodes are called the global check nodes. Then, the constructed BLRC can achieve maximum locality r for all symbols.

Construction (BG) [44]:

Let $H_{B L} = I_{\frac{n}{r + 1}} \otimes 1_{r + 1} \in F_{2}^{\frac{n}{r + 1} \times n}$ and $H_{B G} = 1_{\frac{n}{r + 1}} \otimes H_{0}^{(r)} \in F_{2}^{⌈ {log}_{2} (r + 1) ⌉ \times n}$, where ⊗ denotes the Kronecker product, $1_{r + 1}$ denotes the all-one vector of length $r + 1$, and $H_{0}^{(r)}$ is the parity check matrix of an $(r + 1, r + 1 - ⌈ {log}_{2} (r + 1) ⌉)$ Hamming code such as $H_{0}^{(r)} = (0, 1, \dots, r) \in F_{2}^{⌈ {log}_{2} (r + 1) ⌉ \times (r + 1)}$. Then, the parity check matrix of the BLRC based on a bipartite graph with parameters $(n, \frac{r n}{r + 1} - ⌈ {log}_{2} (r + 1) ⌉, 4, r)$ is given as

\begin{matrix} H = (\begin{matrix} H_{B L} \\ H_{B G} \end{matrix}) \in F_{2}^{(n - k) \times n} . \end{matrix}

The minimum distance of the code defined by the parity check matrix H in Construction (BG) is 4. This BLRC is optimal in some cases. Even when it is not optimal, it is shown that this code has a near-optimal code rate with a rate gap of $O (\frac{log r}{n})$.
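The stated parameters of Construction (BG) can be verified on small instances. In the Python sketch below, each column of H is stored as an integer whose low bits hold the binary expansion of the position inside the local group (the column of $H_{0}^{(r)} = (0, 1, \dots, r)$) and whose high bits mark the local group (the column of $H_{B L}$). This encoding, the function names, and the brute-force search are assumptions made for illustration.

```python
from itertools import combinations

def bg_columns(n, r):
    """Columns of H = [H_BL; H_BG], each stored as an integer bit mask."""
    if n % (r + 1):
        raise ValueError("(r + 1) must divide n")
    s = r.bit_length()  # equals ceil(log2(r + 1)) for r >= 1
    return [(1 << (s + g)) | j
            for g in range(n // (r + 1)) for j in range(r + 1)]

def gf2_rank(vectors):
    """Rank over GF(2) by elimination on the leading bit."""
    pivots = {}  # leading bit position -> basis vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)

def min_distance(columns, max_weight=6):
    """Smallest number of columns of H that XOR to zero."""
    for w in range(1, max_weight + 1):
        for subset in combinations(columns, w):
            acc = 0
            for c in subset:
                acc ^= c
            if acc == 0:
                return w
    return None

def bg_parameters(n, r):
    cols = bg_columns(n, r)
    return n - gf2_rank(cols), min_distance(cols)

k, d = bg_parameters(8, 3)
```

For $n = 8$ and $r = 3$, the sketch gives $k = 4 = \frac{r n}{r + 1} - ⌈ {log}_{2} (r + 1) ⌉$ and $d = 4$.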

In addition, an expander graph based construction of BLRC exists [55,56]. Suppose we have two sets V and C that satisfy the following conditions:

–: $| V | = n$ , $| C | = \frac{n t}{r + 1}$ ;
–: the degree of $v \in V$ is t; and
–: the degree of c is $r + 1$ .

For $0 < α, γ \leq 1$, the bipartite graph $G = (V \cup C, E)$ is a $(t, r + 1, α, t γ)$-expander if for any subset $V^{'} \subset V$, $| V^{'} | \leq α n$ implies that the size of the subset of C connected to $V^{'}$ is greater than $t γ | V^{'} |$. In addition, the length of the shortest cycle of the graph G is greater than 4. As such, we can have the following construction:

Construction (EG) [55,56]:

Let $H_{E}$ be an $m \times n$ parity check matrix $[h_{i, j}]$, where $1 \leq i \leq m$ and $1 \leq j \leq n$, whose columns correspond to the vertices of V and whose rows correspond to the vertices of C. Then, $h_{i, j}$ is equal to one if the corresponding vertices $c_{i}$ and $v_{j}$ are connected with an edge. For $t < r + 1$, the code $C_{E}$ constructed from $H_{E}$ is an $(n, k, δ, r, t)$ BLRC.

In Construction (EG), $γ$ is chosen from the range $[\frac{1}{1 + r}, 1 - \frac{1}{t})$ and $α$ is determined as a solution of the following equation:

\begin{matrix} (t - 1) h (α) / t - h (α γ (r + 1)) / r + 1 - δ γ (r + 1) h (\frac{1}{γ (r + 1)}) = 0, \end{matrix}

where $h (x) = - x {log}_{2} x - (1 - x) {log}_{2} (1 - x)$. The probability that G is a $(t, r + 1, α^{'}, t γ)$ expander is greater than $1 - O (n^{- t (1 - γ) - 1})$ for $0 < α^{'} < α$. In addition, the code rate is bounded by

\begin{matrix} R \geq 1 - \frac{t}{r + 1} - o (1), \end{matrix}

where the equality holds for the case whereby $H_{E}$ is a full rank matrix.

3.6. BLRC from Anticode

An anticode $A$ of length n is a code that may contain repeated codewords in $F_{2}^{n}$ and has an upper bound on the distance between codewords [61]. In contrast to the minimum distance of generic error correcting codes, the maximum distance $δ$ is defined as the maximum Hamming distance between any pair of codewords in $A$. This anticode is a core ingredient of the following BLRC.

The generator matrix $G_{A}$ of the anticode $A$ is a $k \times n$ matrix, and all codewords in $A$ can be expressed by a linear combination of the k rows of $G_{A}$. If the rank of $G_{A}$ is $γ$, then each codeword in $A$ occurs $2^{k - γ}$ times. Let $A_{s, 2}$ be the anticode of length $n = (\binom{s}{2})$ whose generator matrix $G_{A}$ has as its columns all weight-2 vectors of length s.

Construction (AC1) [57]:

Let $S_{m}$ be a binary simplex code of length $2^{m} - 1$, dimension m, and minimum Hamming distance $2^{m - 1}$. Let $G_{m}$ be the generator matrix of $S_{m}$, and let its columns consist of all possible nonzero vectors in $F_{2}^{m}$. We prepend $m - s$ zeros to every column of $G_{A}$ of $A_{s, 2}$ to construct an $m \times (\binom{s}{2})$ matrix $G_{A}^{'}$. By deleting the columns in $G_{A}^{'}$ from $G_{m}$, we can construct a generator matrix G of the BLRC $C_{m, s, 2}$ with parameters $(2^{m} - (\binom{s}{2}) - 1, m, 2^{m - 1} - ⌊ \frac{s^{2}}{4} ⌋)$ and locality 2.

For $3 \leq s \leq 5$, the code $C_{m, s, 2}$ satisfies the C-M bound in Equation (1). Moreover, three instances with locality $r = 2$ of Construction (AC1) are listed in [57]:

–: The code $C_{m, 3, 2}$ from the anticode $A_{3, 2}$ is a $(2^{m} - 4, m, 2^{m - 1} - 2)$ LRC.
–: The code $C_{m, 4, 2}$ from the anticode $A_{4, 2}$ is a $(2^{m} - 7, m, 2^{m - 1} - 4)$ LRC.
–: The code $C_{m, 5, 2}$ from the anticode $A_{5, 2}$ is a $(2^{m} - 11, m, 2^{m - 1} - 6)$ LRC.
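The parameters of these instances can be reproduced by brute force for small m. In the Python sketch below, the columns of $G_{m}$ are stored as m-bit integers, and the anticode columns are the weight-2 integers confined to the low s bits, which plays the role of prepending $m - s$ zeros. The encoding and the function names are assumptions of this sketch.

```python
def ac1_columns(m, s):
    """Columns of G_m (all nonzero m-bit vectors) minus the anticode columns."""
    if not 2 <= s <= m:
        raise ValueError("expected 2 <= s <= m")
    return [c for c in range(1, 2**m)
            if not (c < 2**s and bin(c).count("1") == 2)]

def length_and_distance(m, s):
    """Code length and minimum distance of C_{m,s,2} by exhaustive search."""
    cols = ac1_columns(m, s)
    # Codeword bit at column c for message u is the parity of u AND c.
    weights = [sum(bin(u & c).count("1") & 1 for c in cols)
               for u in range(1, 2**m)]
    # All weights being positive means the dimension is m.
    return len(cols), min(weights)

def has_locality_2(cols):
    """True if every coordinate is the XOR of two other coordinates."""
    col_set = set(cols)
    return all(any(a != c and (a ^ c) in col_set for a in cols)
               for c in cols)

n, d = length_and_distance(4, 3)
local = has_locality_2(ac1_columns(4, 3))
```

For $m = 4$ and $s = 3$, the sketch gives length 12 and minimum distance 6, matching the $(2^{m} - 4, m, 2^{m - 1} - 2)$ instance, and it confirms that every coordinate can be repaired from two others.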

Construction (AC2) [57]:

Let $A_{t; 2, 3, \dots, t - 1}$, $3 \leq t \leq m$, be an anticode such that its generator matrix $G_{A}$ consists of all columns of weight in $\{2, 3, \dots, t - 1\}$. Then, $m - t$ zeros are prepended to every column of $G_{A}$ to form an $m \times \sum_{i = 2}^{t - 1} (\binom{t}{i})$ matrix whose columns will be deleted from $G_{m}$ to obtain a generator matrix G for the code $C_{m, t}$, which becomes a $(2^{m} - 2^{t} + t + 1, m, 2^{m - 1} - 2^{t - 1} + 2)$ LRC with locality $r = 2$.

This code achieves the Griesmer bound [62].

Construction (AC3) [57]:

Let $A_{m - 1}$ be an anticode with generator matrix given by

\begin{matrix} G_{A} = (\begin{matrix} 1 & 0 & \dots & 0 \\ 0 \\ ⋮ & G_{m - 1} \\ 0 \end{matrix}), \end{matrix}

where $G_{m - 1}$ is the generator matrix of the simplex code $S_{m - 1}$. Let $C$ be a code obtained based on the Farrell construction using the simplex code $S_{m}$ and the anticode $A_{m - 1}$. Then, $C$ is a $(2^{m - 1} - 1, m, 2^{m - 2} - 1)$ BLRC with locality $r = 3$.

It is also shown that this code can satisfy the bound in Equation (1).

3.7. BLRCs from Partial Spread

To introduce BLRCs constructed from partial spread, the definition of partial t-spread is given.

Definition [50]:

A partial t-spread of $F_{q}^{m}$ is a collection $S = \{W_{1}, \dots, W_{l}\}$ of t-dimensional subspaces of $F_{q}^{m}$ such that $W_{i} \cap W_{j} = \{0\}$ for $1 \leq i < j \leq l$. Moreover, S is maximal if it has the largest possible size. In particular, if $\cup_{i = 1}^{l} W_{i} = F_{q}^{m}$, then S is a t-spread. If $t | m$, a t-spread of $F_{q}^{m}$ exists.

Now, we can define a BLRC $C$ with parity check matrix given by

\begin{matrix} H = (\begin{matrix} H_{L} \\ H_{G} \end{matrix}) . \end{matrix}

(5)

Then, a BLRC $C$ of parameters $(n, k \geq \frac{r n}{r + 1} - t ⌈ {log}_{2} n ⌉, d \geq 2 t + 2, r)$ can be constructed in the following way:

Construction (PS1) [50]:

Let $1_{n}$ be the all-one vector of length $n$. Let $H_{L} = I_{\frac{n}{r + 1}} \otimes 1_{r + 1}$ and $H_{G}$ be a $t \lceil \log_{2} n \rceil \times n$ matrix that has binary expansions of the vectors $\{a_{1}, a_{2}, \dots, a_{n}\}$ as its columns, where $a_{i} = (\beta_{i}, \beta_{i}^{3}, \dots, \beta_{i}^{2 t - 1})^{T}$ and $\beta_{1}, \dots, \beta_{n}$ are distinct elements of the finite field $F_{2^{\lceil \log_{2} n \rceil}}$. Then, the parity check matrix of a BLRC $C$ is given as in Equation (5).
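Construction (PS1) can be instantiated for small parameters and checked numerically. The sketch below is illustrative and not taken from [50]: it assumes $n = 16$, $r = 3$, $t = 2$, represents $F_{2^{4}}$ with the primitive polynomial $x^{4} + x + 1$, and packs each column of $H$ into one integer; all function names are hypothetical.

```python
from itertools import combinations
from functools import reduce
from operator import xor


def gf_mul(a, b, m, poly):
    """Multiply a and b in GF(2^m) defined by the bitmask polynomial `poly`."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def ps1_columns(n, r, t, m, poly):
    """Columns of H = (H_L; H_G), each packed into one integer bitmask."""
    cols = []
    for i in range(n):
        beta = i  # field elements 0, 1, ..., n - 1 are distinct
        beta_sq = gf_mul(beta, beta, m, poly)
        packed, power = 0, beta
        for j in range(t):
            packed |= power << (j * m)  # binary expansion of beta^(2j+1)
            power = gf_mul(power, beta_sq, m, poly)
        packed |= 1 << (t * m + i // (r + 1))  # row of H_L = I (x) 1_{r+1}
        cols.append(packed)
    return cols


def gf2_rank(vectors):
    """Rank over F_2 of integer bitmask vectors (XOR basis)."""
    basis = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def distance_at_least(cols, d):
    """True if no set of 1..d-1 columns sums to zero, i.e. min distance >= d."""
    return all(
        reduce(xor, subset) != 0
        for size in range(1, d)
        for subset in combinations(cols, size)
    )


n, r, t, m = 16, 3, 2, 4
cols = ps1_columns(n, r, t, m, poly=0b10011)  # x^4 + x + 1
k = n - gf2_rank(cols)
print(k >= r * n // (r + 1) - t * m, distance_at_least(cols, 2 * t + 2))
```

The distance test uses the criterion that a code has minimum distance at least $d$ when every set of $d - 1$ columns of $H$ is linearly independent.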

For the further extension of Construction (PS1), the parity check matrix can be given as

$$H = \begin{pmatrix} H_{L} \\ H_{G} \end{pmatrix} = \begin{pmatrix} H_{L}^{1} & H_{L}^{2} & \cdots & H_{L}^{l} \\ H_{G}^{1} & H_{G}^{2} & \cdots & H_{G}^{l} \end{pmatrix}, \tag{6}$$

where $l = \frac{n}{r + 1}$. For $i \in [1, l]$, $H_{L}^{i}$ is an $l \times (r + 1)$ matrix whose $i$th row is the all-one vector of length $r + 1$ and whose other rows are all-zero vectors. Moreover, $H_{G}^{i}$ is the $i$th $(n - k - l) \times (r + 1)$ submatrix of $H_{G} = (H_{G}^{1} \; H_{G}^{2} \; \dots \; H_{G}^{l})$. It is well-known that if any $d - 1$ columns of the parity check matrix $H$ are linearly independent, the minimum distance of a linear code is greater than or equal to $d$. Furthermore, for a collection of any $a_{i}$ columns $\{c_{1}^{i}, c_{2}^{i}, \dots, c_{a_{i}}^{i}\}$ of $H_{G}^{i}$, if $\sum_{i = 1}^{l} \sum_{j = 1}^{a_{i}} c_{j}^{i} \neq 0$, then $d \geq 2 t + 2$, where $a_{1}, a_{2}, \dots, a_{l}$ satisfy the following two conditions:

(i): For $1 \leq i \leq l$ , $a_{i}$ is even, where $0 \leq a_{i} \leq \min \{2 t, r + 1\}$ ; and
(ii): $2 \leq \sum_{i = 1}^{l} a_{i} \leq 2 t$ .

Then, we can construct two k-optimal $(n, k, d, r)$ BLRCs with disjoint repair groups as in the following construction.

Construction (PS2) [50]:

Let $r = 2^{t}$ and $\{W_{1}, \dots, W_{a}\}$ be a maximum partial $2 t$-spread of $F_{2}^{s}$. In addition, let $\{e_{1}^{(i)}, e_{2}^{(i)}, \dots, e_{2 t}^{(i)}\}$ be a basis of $W_{i}$. For $t \geq 3$, there exists a $(2^{t}, 2^{t} - 2 t, \geq 5)$ binary linear code with the parity check matrix $H_{b}$. Let $\mathrm{supp}(x)$ be the set of indices corresponding to nonzero coordinates of a vector $x$. For $i \in [1, a]$, let $T^{(i)}$ be the set $\{0\} \cup \{f_{j} \mid 1 \leq j \leq 2^{t}\}$, where $f_{j} = \sum_{u \in \mathrm{supp}(h_{j})} e_{u}^{(i)}$ and $h_{j}$ is the $j$th column of $H_{b}$. When $t = 1, 2$, $T^{(i)} = \{0, e_{1}^{(i)}, e_{2}^{(i)}, \dots, e_{2 t}^{(i)}\}$. Let $H_{G}^{i}$ be an $s \times (2^{t} + 1)$ matrix whose columns consist of the vectors in $T^{(i)}$. Then, we can define a BLRC with a parity check matrix $H$ as in Equation (6), where $\frac{s}{r} < l \leq a$.

A set $T \subseteq F$ is $\tau$-wise weakly independent over $F_{2} \subseteq F$ if no set $T' \subseteq T$, where $2 \leq |T'| \leq \tau$, has the sum of its elements equal to zero. Then, we have $d \geq 6$ if the columns of $H_{G}$ satisfy the following conditions:

(i): $c_{1}^{i} + c_{2}^{i} \neq 0$ for $1 \leq i \leq l$ ;
(ii): $c_{1}^{i} + c_{2}^{i} + c_{3}^{i} + c_{4}^{i} \neq 0$ for $1 \leq i \leq l$ ; and
(iii): $c_{1}^{i} + c_{2}^{i} + c_{1}^{j} + c_{2}^{j} \neq 0$ for $1 \leq i \neq j \leq l$ .
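The notion of $\tau$-wise weak independence can be tested directly by enumeration. The helper below is a hypothetical sketch that assumes elements are given as integers whose bits are the coordinates over $F_{2}$, so that addition is XOR.

```python
from itertools import combinations
from functools import reduce
from operator import xor


def is_weakly_independent(elements, tau):
    """True if no subset of size 2..tau of `elements` sums (XORs) to zero."""
    elements = list(set(elements))
    return all(
        reduce(xor, subset) != 0
        for size in range(2, tau + 1)
        for subset in combinations(elements, size)
    )


T = [0b001, 0b010, 0b100, 0b111]
print(is_weakly_independent(T, 3))  # True
print(is_weakly_independent(T, 4))  # False, since 001 + 010 + 100 + 111 = 0
```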

Construction (PS3) [50]:

Let $r = 2^{t} + 2^{\lfloor (t + 1) / 2 \rfloor} - 1$, and let $\{W_{1}, W_{2}, \dots, W_{a}\}$ be a maximum partial $(2 t + 1)$-spread of $F_{2}^{s}$, where the basis of $W_{i}$ is $\{e_{1}^{(i)}, e_{2}^{(i)}, \dots, e_{2 t + 1}^{(i)}\}$. When $t \geq 3$, there is a $(2^{t} + 2^{\lfloor (t + 1) / 2 \rfloor} - 1, 2^{t} + 2^{\lfloor (t + 1) / 2 \rfloor} - 2 t - 2, 5)$ binary linear code. Let $T^{(i)}$ be defined in the same way as in Construction (PS2) for $1 \leq i \leq a$. For $t = 1, 2$, $T^{(i)}$ is defined as $\{0, e_{1}^{(i)}, e_{2}^{(i)}, \dots, e_{2 t + 1}^{(i)}\}$. Let $H_{G}^{i}$ be an $s \times (r + 1)$ matrix whose columns consist of the vectors in $T^{(i)}$. Then, a BLRC $C$ can be constructed using a parity check matrix $H$ in Equation (6) for $\frac{s}{r} < l \leq a$.

Let $A_{q}(m, k, d)$ be the maximal cardinality of subspace codes over $F_{q}^{m}$ with minimum distance $d$ and dimension $k$. Then, we can construct a BLRC as follows:

Construction (PS4) [50]:

Let $n = 3 l$ such that $l \neq \frac{2^{2 m + 1} - 2}{3}$ for $m \geq 2$. Then, there exists an $(n, k, 6, 2)$ BLRC $C$ with dimension given as

$$k = \begin{cases} 2 l - 2 m, & \text{if } l \in [A_{2}(2 m - 1, 2, 4) + 2, A_{2}(2 m, 2, 4)] \\ 2 l - 2 m - 1, & \text{if } l \in [A_{2}(2 m, 2, 4) + 1, A_{2}(2 m + 1, 2, 4)], \end{cases}$$

which is optimal with respect to the bound in Equation (2). The following construction is nearly optimal with respect to the bound in Equation (2).

Construction (PS5) [50]:

Let $\{W_{1}, W_{2}, \dots, W_{a}\}$ be a maximum partial two-spread of $F_{2}^{s}$. The basis of $W_{i}$ is given as $\{e_{1}^{(i)}, e_{2}^{(i)}\}$. Then, a $(4 l, \geq 3 l - s - 1, \geq 6, 3)$ BLRC $C$ with parity check matrix $H$ of the form in Equation (6) for $\frac{s + 1}{3} < l \leq a$ can be constructed using the submatrices $H_{G}^{i}$ for $1 \leq i \leq l$, which are given as

$$H_{G}^{i} = \begin{pmatrix} 0 & e_{1}^{(i)} & e_{2}^{(i)} & e_{1}^{(i)} + e_{2}^{(i)} \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
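Construction (PS5) is small enough to build and verify exhaustively. The sketch below is illustrative only: it assumes $s = 4$, finds a partial two-spread by a simple greedy search (a stand-in chosen here, which is not guaranteed to return a maximum partial spread and is not the method of [50]), and packs each column of $H$ into an integer; all names are hypothetical.

```python
from itertools import combinations
from functools import reduce
from operator import xor


def greedy_partial_2_spread(s):
    """Greedily pick bases (a, b) of 2-dim subspaces {0, a, b, a^b} of F_2^s
    that pairwise intersect only in 0 (vectors are integers, addition is XOR)."""
    used, spread = set(), []
    for a in range(1, 1 << s):
        if a in used:
            continue
        for b in range(a + 1, 1 << s):
            if b not in used and (a ^ b) not in used:
                spread.append((a, b))
                used.update((a, b, a ^ b))
                break
    return spread


def ps5_columns(spread, s, l):
    """Columns of H in Equation (6) with the PS5 submatrices H_G^i."""
    cols = []
    for i, (e1, e2) in enumerate(spread[:l]):
        local = 1 << (s + 1 + i)  # row i of H_L
        # Columns (0;1), (e1;0), (e2;0), (e1+e2;0); bit 0 is the extra last row.
        for vec, flag in ((0, 1), (e1, 0), (e2, 0), (e1 ^ e2, 0)):
            cols.append(local | (vec << 1) | flag)
    return cols


def gf2_rank(vectors):
    """Rank over F_2 of integer bitmask vectors (XOR basis)."""
    basis = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def distance_at_least(cols, d):
    """True if no set of 1..d-1 columns sums to zero, i.e. min distance >= d."""
    return all(
        reduce(xor, subset) != 0
        for size in range(1, d)
        for subset in combinations(cols, size)
    )


s = 4
spread = greedy_partial_2_spread(s)
l = len(spread)
cols = ps5_columns(spread, s, l)
n, k = len(cols), len(cols) - gf2_rank(cols)
print(n == 4 * l, k >= 3 * l - s - 1, distance_at_least(cols, 6))
```

Any partial two-spread with at least $l$ elements suffices for the distance check, because the argument only uses $W_{i} \cap W_{j} = \{0\}$.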

Another construction based on the partial t-spread is also proposed in [58]. Let $q$ be a prime power and $V_{m}(q)$ be the vector space of dimension $m$ over $F_{q}$.

Construction (PS6) [50]:

Given an integer $r \geq 2$, determine the smallest integer $t$ such that $r + 1 \leq t + \lfloor \frac{t}{2} \rfloor$. An integer $m$ such that $\frac{m + 1}{r} \leq l$ can be chosen, and there exists a partial $t$-spread $S$ of $V_{m}(2)$ with a size of at least $l$. Let $B_{i} = \{b_{i, 0}, b_{i, 1}, \dots, b_{i, t - 1}\}$ be a basis of $W_{i} \in S$ and $C_{i} = \{c_{i, 0}, c_{i, 1}, \dots, c_{i, \lfloor \frac{t}{2} \rfloor - 1}\}$ be a set whose elements are defined as $c_{i, j} = b_{i, 2 j} + b_{i, 2 j + 1}$ for $i = 0, 1, \dots, l - 1$ and $j = 0, 1, \dots, \lfloor \frac{t}{2} \rfloor - 1$. Finally, let $U_{i} = B_{i} \cup C_{i}$ for $i = 0, 1, \dots, l - 1$. Let $s$ be an integer such that $\frac{m + 1}{r} \leq s \leq l$, and we use any $r + 1$ vectors in $U_{i}$ to fill each submatrix $H_{G}^{i}$ as its $r + 1$ columns for $i = 0, 1, \dots, s - 1$. Then, the BLRC $C_{s, m, r}$ has length $n = (r + 1) s$, dimension $k = r s - m$, minimum distance $d \geq 6$, and locality $r$.

Then, the BLRCs $C_{4, 4, 2}$ and $C_{5, 4, 2}$ obtained from Construction (PS6) are optimal. In addition, for $s = 4, 5, \dots, 9$, the BLRCs $C_{s, 5, 2}$ from Construction (PS6) are almost optimal in terms of the C-M bound and, for $s = 3, 4, \dots, 9$, the BLRCs $C_{s, 6, 3}$ from Construction (PS6) are almost optimal with respect to the C-M bound.
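The parameter bookkeeping of Construction (PS6) can be written as a small helper. This is a sketch with a hypothetical function name; it only evaluates the stated formulas and does not check that a partial $t$-spread of size at least $s$ exists in $V_{m}(2)$.

```python
def ps6_parameters(s, m, r):
    """Return (n, k, t) for C_{s,m,r}, where t is the smallest integer with
    r + 1 <= t + floor(t / 2)."""
    if r < 2 or r * s < m + 1:
        raise ValueError("need r >= 2 and (m + 1) / r <= s")
    t = 1
    while t + t // 2 < r + 1:
        t += 1
    return (r + 1) * s, r * s - m, t


print(ps6_parameters(4, 4, 2))  # (12, 4, 2)
print(ps6_parameters(5, 4, 2))  # (15, 6, 2)
print(ps6_parameters(3, 6, 3))  # (12, 3, 3)
```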

3.8. BLRCs from Generalized Hamming Code

Suppose that $s$ and $t$ are two positive integers such that $2 t \mid s$ and $\frac{s}{2 t} \geq 2$. Let $A$ be a $2 t \times 2^{t}$ binary parity check matrix such that any four columns of this matrix are linearly independent. For $t \leq 2$, $A$ can be chosen as the identity matrix. For $t \geq 3$, $A$ is the parity check matrix of a $(2^{t}, 2^{t} - 2 t, 5)$ binary linear code that can be built from non-primitive cyclic codes with length $2^{t} + 1$. Let $\beta$ be a primitive root of $x^{2^{t} + 1} - 1$, and let $M(x)$ denote the minimal polynomial of $\beta$. The degree of $M(x)$ is $2 t$. Let $A'$ be a parity check matrix defining the binary cyclic code with parameters $(2^{t} + 1, 2^{t} - 2 t, \geq 6)$ that is generated by $(x - 1) M(x)$. Then, the set $\{\beta^{i} \mid i = -2, -1, 0, 1, 2\}$ forms a subset of the roots of $(x - 1) M(x)$, so the BCH bound gives a minimum distance of at least 6. By deleting one coordinate of $A'$, we can construct the parity check matrix $A$ of the punctured code with parameters $(2^{t}, 2^{t} - 2 t, \geq 5)$. In addition, $B$ is defined as a matrix such that the columns are all nonzero $\frac{s}{2 t}$-tuples from $F_{2^{2 t}}$, with the first nonzero element equal to 1. Then, $B$ is an $\frac{s}{2 t} \times \frac{2^{s} - 1}{2^{2 t} - 1}$ parity check matrix of a $2^{2 t}$-ary Hamming code. Using the matrices $A$ and $B$, a BLRC construction is provided as follows.

Construction (GH1) [47,48]:

Suppose that $a_{1}, \dots, a_{2^{t}} \in F_{2^{2 t}}$ are the $2^{t}$ elements corresponding to the columns of $A$, and the $i$th column of $B$ is denoted by a vector $\beta_{i}$ for $1 \leq i \leq \frac{2^{s} - 1}{2^{2 t} - 1}$. Let $C$ be a binary linear code with the parity check matrix given as

$$H = \begin{pmatrix} L_{1} & L_{2} & \cdots & L_{l} \\ H_{1} & H_{2} & \cdots & H_{l} \end{pmatrix},$$

where $l = \frac{2^{s} - 1}{2^{2 t} - 1}$ and, for $1 \leq i \leq l$, $L_{i}$ is an $l \times (2^{t} + 1)$ matrix whose $i$th row is an all-one vector and whose other rows are all-zero vectors, and $H_{i}$ is an $s \times (2^{t} + 1)$ matrix over $F_{2}$ whose columns are binary expansions of the vectors $\{0, a_{1} \beta_{i}, a_{2} \beta_{i}, \dots, a_{2^{t}} \beta_{i}\}$.

It is shown that this construction can satisfy the bound given in Equation (4).
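The number of columns of the matrix $B$ used in Construction (GH1), and hence the number $l$ of repair groups, can be confirmed by enumeration. In the sketch below, field elements of $F_{2^{2 t}}$ are used only as labels $0, 1, \dots, 2^{2 t} - 1$, with the label 1 assumed to denote the multiplicative identity; the function name is hypothetical.

```python
from itertools import product


def hamming_columns(q, length):
    """All nonzero tuples of the given length over the labels {0, ..., q-1}
    whose first nonzero entry is 1."""
    cols = []
    for col in product(range(q), repeat=length):
        first_nonzero = next((x for x in col if x != 0), None)
        if first_nonzero == 1:
            cols.append(col)
    return cols


s, t = 4, 1
q = 2 ** (2 * t)
cols = hamming_columns(q, s // (2 * t))
print(len(cols), (2 ** s - 1) // (q - 1))  # 5 5
```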

Shortening an LRC can also give us another LRC. Let $C$ be an $(n, k, d)$ BLRC with locality $r$ such that $n \geq 2 (r + 1)$ and $k \geq 2 r$. Then, an $(n', k', d')$ BLRC $C'$ with locality $r$ can be obtained by shortening $C$, where the parameters of $C'$ satisfy $n' = n - (r + 1)$, $k' \geq k - r$, and $d' \geq d$.

Construction (GH2) [48]:

By applying shortening $(r + 1)$ times to $C$, we have an $(n - (r + 1), \geq k - r, \geq d)$ BLRC.

This kind of code modification approach can be extended to the well-known code modification methods such as extending, shortening, expurgating, augmenting, and lengthening [53], as in the following subsection.

3.9. BLRCs from Code Modification

It is well-known that there are various code modification methods for linear codes. For BLRCs, we can also use these modification methods to generate codes with new parameters [53]. Let $C$ be an $(n, k, d)$ binary code with locality $r$ and let $d^{\perp}$ be the minimum distance of its dual code, $C^{\perp}$. By adding a parity bit to each codeword of $C$, the extended code $C_{\mathrm{ext}}$ with parameters $(n + 1, k, d_{\mathrm{ext}})$ can be obtained. This can be formally presented as

$$C_{\mathrm{ext}} = \Bigl\{(c_{1}, \dots, c_{n}, c_{n + 1}) \Bigm| (c_{1}, \dots, c_{n}) \in C, \; c_{n + 1} = \sum_{i = 1}^{n} c_{i}\Bigr\},$$

where $d_{\mathrm{ext}} = d + 1$ for odd $d$ and $d_{\mathrm{ext}} = d$ for even $d$ [53]. For BLRCs, we are interested in the locality of the derived codes for a given $C$ with locality $r$. Let $C_{\mathrm{ext}}^{\perp}$ be the dual code of $C_{\mathrm{ext}}$. If the maximum Hamming weight among codewords in the code $C^{\perp}$ is $n - r$, then the locality of the extended code $C_{\mathrm{ext}}$ is $r_{\mathrm{ext}} = r$. If the maximum Hamming weight among codewords in $C^{\perp}$ is $n + 1 - d^{\perp}$, then the locality of the extended code $C_{\mathrm{ext}}$ is $r_{\mathrm{ext}} = d^{\perp} - 1$. Finally, if $C$ is an $(n, k, d)$ cyclic code with an odd minimum distance $d$, then the locality of the dual code $C_{\mathrm{ext}}^{\perp}$ of the extended code of $C$ is $r_{\mathrm{ext}}^{\perp} = d$ [53].
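The effect of extending on the minimum distance can be illustrated with the $(7, 4, 3)$ Hamming code. The sketch below is only an example; the generator rows are the cyclic shifts of $g(x) = 1 + x + x^{3}$, and the helper names are hypothetical.

```python
from itertools import product


def codewords(generator_rows):
    """All codewords spanned over F_2 by the given generator rows."""
    n = len(generator_rows[0])
    words = set()
    for coeffs in product((0, 1), repeat=len(generator_rows)):
        word = [0] * n
        for c, row in zip(coeffs, generator_rows):
            if c:
                word = [a ^ b for a, b in zip(word, row)]
        words.add(tuple(word))
    return words


def min_distance(words):
    """Minimum weight of a nonzero codeword of a linear code."""
    return min(sum(w) for w in words if any(w))


def extend(words):
    """Append an overall parity bit c_{n+1} = c_1 + ... + c_n to every codeword."""
    return {w + (sum(w) % 2,) for w in words}


G = [
    (1, 1, 0, 1, 0, 0, 0),
    (0, 1, 1, 0, 1, 0, 0),
    (0, 0, 1, 1, 0, 1, 0),
    (0, 0, 0, 1, 1, 0, 1),
]
C = codewords(G)
C_ext = extend(C)
print(min_distance(C), min_distance(C_ext))  # 3 4
```

Since $d = 3$ is odd, the extended code has $d_{\mathrm{ext}} = d + 1 = 4$, giving the $(8, 4, 4)$ extended Hamming code.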

Shortening can also be applied to derive new BLRCs. By deleting the codewords in $C$ with nonzero values in the last coordinate and removing the last coordinate from the remaining codewords, we can find the shortened code $C_{s}$ of $C$. This can be formally represented as

$$C_{s} = \{(c_{1}, \dots, c_{n - 1}) \mid (c_{1}, \dots, c_{n - 1}, 0) \in C\}.$$

For an original $(n, k, d)$ binary linear code, it is known that the parameters of the shortened code are given as $(n - 1, k - 1, d_{s} \geq d)$. Moreover, if the original code is a BLRC with locality $r \geq 2$, then the locality of the shortened code $C_{s}$ is $r$ or $r - 1$. Let $C_{s}^{\perp}$ be the dual of $C_{s}$ and let $d^{\perp} \geq 3$ be the minimum distance of the dual code $C^{\perp}$. Then, for an $(n, k, d)$ cyclic code $C$, the locality of code $C_{s}$ is either $d^{\perp} - 2$ or $d^{\perp} - 1$ [53].
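Shortening can be illustrated in the same way. The following sketch assumes the $(7, 4, 3)$ Hamming code described by the parity check matrix whose $j$th column is the binary expansion of $j$; the helper names are hypothetical.

```python
from itertools import product


def hamming_7_4():
    """(7, 4, 3) Hamming code: words whose syndrome (XOR of the 1-based
    positions holding a 1) is zero."""
    words = set()
    for w in product((0, 1), repeat=7):
        syndrome = 0
        for pos, bit in enumerate(w, start=1):
            if bit:
                syndrome ^= pos
        if syndrome == 0:
            words.add(w)
    return words


def shorten(words):
    """Keep codewords whose last coordinate is 0 and drop that coordinate."""
    return {w[:-1] for w in words if w[-1] == 0}


def min_distance(words):
    """Minimum weight of a nonzero codeword of a linear code."""
    return min(sum(w) for w in words if any(w))


C = hamming_7_4()
C_s = shorten(C)
print(len(C_s), len(next(iter(C_s))), min_distance(C_s))  # 8 6 3
```

The result is a $(6, 3, 3)$ code, in agreement with the parameters $(n - 1, k - 1, d_{s} \geq d)$.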

Next, expurgation can also be used to generate a new BLRC from an $(n, k, d)$ BLRC $C$ that contains odd weight codewords. The expurgated code $C_{\mathrm{exp}}$ of $C$ can be generated as a subcode of $C$ by selecting only even weight codewords such that

$$C_{\mathrm{exp}} = \{c \mid c \in C, \text{ the Hamming weight } w(c) \text{ is even}\}.$$

The corresponding parameters of $C_{\mathrm{exp}}$ are given as $(n, k - 1, w_{e})$, where $w_{e}$ is the minimum even Hamming weight of the nonzero codewords in $C$. Let $C_{\mathrm{exp}}^{\perp}$ be the dual code of $C_{\mathrm{exp}}$. Then, we have $C_{\mathrm{exp}}^{\perp} = C^{\perp} \cup \overline{C^{\perp}}$ [53].

As an inverse method of the expurgation described above, the augmented code $C_{a}$ of an $(n, k, d)$ code $C$ without the all-one codeword $\mathbf{1}$ is defined as the code $C \cup (C + \mathbf{1})$, whose parameters are given as $(n, k + 1, \min \{d, n - w_{\max}\})$, where $w_{\max}$ is the maximum Hamming weight of codewords in $C$. If the code $C$ is cyclic, then the expurgated and augmented codes of $C$ are also cyclic [53].
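Expurgation and augmentation can be seen to undo each other on a small example. The sketch below again assumes the $(7, 4, 3)$ Hamming code given by the parity check matrix whose $j$th column is the binary expansion of $j$; the helper names are hypothetical.

```python
from itertools import product


def hamming_7_4():
    """(7, 4, 3) Hamming code: words whose syndrome (XOR of the 1-based
    positions holding a 1) is zero."""
    words = set()
    for w in product((0, 1), repeat=7):
        syndrome = 0
        for pos, bit in enumerate(w, start=1):
            if bit:
                syndrome ^= pos
        if syndrome == 0:
            words.add(w)
    return words


def expurgate(words):
    """Keep only the even weight codewords."""
    return {w for w in words if sum(w) % 2 == 0}


def augment(words):
    """Union of the code and its coset by the all-one word."""
    return words | {tuple(b ^ 1 for b in w) for w in words}


def min_distance(words):
    """Minimum weight of a nonzero codeword of a linear code."""
    return min(sum(w) for w in words if any(w))


C = hamming_7_4()
C_exp = expurgate(C)
C_aug = augment(C_exp)
w_max = max(sum(w) for w in C_exp)
print(len(C_exp), min_distance(C_exp))  # 8 4
print(min_distance(C_aug) == min(min_distance(C_exp), 7 - w_max))  # True
print(C_aug == C)  # True
```

Here the expurgated code is a $(7, 3, 4)$ code, and augmenting it returns the original $(7, 4, 3)$ code with $\min \{d, n - w_{\max}\} = \min \{4, 3\} = 3$.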

Another example of BLRC from the code modification methods is presented in [60] using the shortened expurgated Hamming code.

Construction (SE-Hamming) [60]:

Let $\beta$ be a primitive element of $F_{2^{m}}$ and $n \geq 9$ be a positive integer divisible by 3 such that $\frac{2 n}{3} \leq 2^{m} - 1$. Let $C_{E}$ be a $(2^{m} - 1, 2^{m} - m - 2, 4)$ expurgated Hamming code with the generator polynomial $g(x) = (x + 1) g_{1}(x)$, where $g_{1}(x)$ is the minimal polynomial of $\beta$ over $F_{2}$. Then, a $(\frac{2 n}{3}, \frac{2 n}{3} - m - 1, \geq 4)$ shortened expurgated Hamming code $C_{S}$ can be generated by shortening the first $(2^{m} - \frac{2 n}{3} - 1)$ information bits of $C_{E}$. The concatenation of $C_{S}$ and an $(n, \frac{2 n}{3})$ cyclic code with parity check polynomial $x^{\frac{2 n}{3}} + x^{\frac{n}{3}} + 1$ as an inner code then yields an $(n, \frac{2 n}{3} - \lceil \log_{2} (\frac{2 n}{3} + 1) \rceil - 1, d \geq 6, 2)$ LRC $C_{C}$.
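As a worked instance of Construction (SE-Hamming), the parameters for $n = 21$ are evaluated below. This is an illustration added here, assuming that $m$ is chosen as the smallest integer satisfying $\frac{2 n}{3} \leq 2^{m} - 1$.

```latex
\begin{align*}
n = 21:\quad & \tfrac{2n}{3} = 14, \qquad m = \lceil \log_2 (14 + 1) \rceil = 4, \\
C_E:\quad & (2^{4} - 1,\ 2^{4} - 4 - 2,\ 4) = (15, 10, 4), \\
C_S:\quad & (14,\ 14 - 4 - 1,\ \geq 4) = (14, 9, \geq 4)
            \quad \text{after shortening } 2^{4} - 14 - 1 = 1 \text{ information bit}, \\
C_C:\quad & (21,\ 14 - 4 - 1,\ \geq 6,\ 2) = (21, 9, \geq 6, 2).
\end{align*}
```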

3.10. Summary of BLRC Constructions

We summarize the discussed BLRC construction methods in Table 1. In Table 1, X denotes the case in which the bound is not achieved with equality for all parameters. For the C-M bound, $k_{\mathrm{opt}}$ is assumed to satisfy the Singleton bound for given $n$ and $d$.

4. Conclusions

This paper summarizes the recently proposed constructions for BLRCs and their features. To achieve efficient hardware implementation, the codes are constructed over the binary field because the need for multiplications is obviated during the encoding, decoding, and repair processes. We explain the various construction methods of BLRCs using cyclic code based, random vector based, bipartite or expander graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, construction methods of BLRCs using code modification methods for linear codes such as extending, shortening, expurgating, and augmenting are introduced.

We selectively review important achievements on BLRCs from the authors' perspective, and thus the authors' biases are inevitably reflected. Therefore, a result not being reviewed here does not mean that it is unimportant. We also apologize in advance for any missing citations or recent results, because this area is actively researched and many papers have appeared in a relatively short period of time.

Author Contributions

The authors contributed equally to surveying the literature; conceptualizing the review; and writing, reviewing, editing, and drafting the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (Ministry of Science and ICT) (No. NRF-2017R1A2B2010588).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AC	Anticode
BG	Bipartite graph
BLRC	Binary locally repairable code
CC	Cyclic code
C-M	Cadambe–Mazumdar
DSS	Distributed storage system
EG	Expander graph
FRC	Fractional repetition code
GH	Generalized Hamming code
LRC	Locally repairable code
MDS	Maximum distance separable
MR	Maximal recoverability
MR-LRC	Maximal recoverable-LRC
PS	Partial spread
RC	Regeneration code
RV	Random vector

References

1. Rhea, S.; Wells, C.; Eaton, P.; Geels, D.; Zhao, B.; Weatherspoon, H.; Kubiatowicz, J. Maintenance-free global data storage. IEEE Internet Comput. 2001, 5, 40–49.
2. Chang, F.; Dean, J.; Ghemawat, S.; Hsieh, W.C.; Wallach, D.A.; Burrows, M.; Chandra, T.; Fikes, A.; Gruber, R.E. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst. (TOCS) 2008, 26.
3. Ghemawat, S.; Gobioff, H.; Leung, S.-T. The Google file system. In Proceedings of the 19th ACM Symp. Operating Systems Principles, Bolton Landing, NY, USA, 19–22 October 2003; pp. 20–43.
4. Borthakur, D. The Hadoop Distributed File System: Architecture and Design. 2007. Available online: https://svn.apache.org/repos/asf/hadoop/common/tags/release-0.16.0/docs/hdfs_design.pdf (accessed on 9 April 2019).
5. Rashmi, K.V.; Shah, N.B.; Gu, D.; Kuang, H.; Borthakur, D.; Ramchandran, K. A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster. In Proceedings of the 5th USENIX Workshop on Hot Topics in Storage and File Systems, San Jose, CA, USA, 6–9 October 2013.
6. Dimakis, A.G.; Prabhakaran, V.; Ramchandran, K. Decentralized erasure codes for distributed networked storage. IEEE Trans. Inf. Theory 2006, 52, 2809–2816.
7. Wu, Y.; Dimakis, A.G.; Ramchandran, K. Deterministic regenerating codes for distributed storage. In Proceedings of the Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 18 September 2007; pp. 1–5.
8. Rashmi, K.; Shah, N.; Kumar, P.V.; Ramchandran, K. Explicit construction of optimal exact regenerating codes for distributed storage. In Proceedings of the 47th Annual Allerton Conference on Communication, Control, and Computing, Urbana-Champaign, IL, USA, 18 September 2009; pp. 1243–1249.
9. Kim, Y.-S.; Park, H.; No, J.-S. Construction of new fractional repetition codes from relative difference sets with λ = 1. Entropy 2017, 19, 563.
10. Park, H.; Kim, Y.-S. Construction of fractional repetition codes with variable parameters for distributed storage systems. Entropy 2016, 18, 441.
11. Tamo, I.; Barg, A. A family of optimal locally recoverable codes. IEEE Trans. Inf. Theory 2014, 60, 4661–4676.
12. Rouayheb, S.E.; Ramchandran, K. Fractional repetition codes for repair in distributed storage systems. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 29 September–1 October 2010; pp. 1510–1517.
13. Dimakis, A.G.; Ramchandran, K.; Wu, Y.; Suh, C. A survey on network codes for distributed storage. Proc. IEEE 2011, 99, 476–489.
14. Datta, A.; Oggier, F.E. An overview of codes tailor-made for better repairability in networked distributed storage systems. SIGACT News 2013, 44, 89–105.
15. Li, J.; Li, B. Erasure coding for cloud storage systems: A survey. Tsinghua Sci. Technol. 2013, 18, 259–272.
16. Liu, S.; Oggier, F. An overview of coding for distributed storage systems. In Network Coding and Subspace Designs; Springer: Berlin, Germany, 2018; pp. 363–383.
17. Balaji, S.B.; Krishnan, M.N.; Vajha, M.; Ramkumar, V.; Sasidharan, B.; Kumar, P.V. Erasure coding for distributed storage: An overview. Sci. China Inf. Sci. 2018, 61, 100301.
18. Rashmi, K.V.; Shah, N.B.; Ramchandran, K.; Kumar, P.V. Regenerating codes for errors and erasures in distributed storage. In Proceedings of the 2012 IEEE International Symposium on Information Theory Proceedings, Cambridge, MA, USA, 1–6 July 2012; pp. 1202–1206.
19. Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K. Network coding for distributed storage systems. IEEE Trans. Inf. Theory 2010, 56, 4539–4551.
20. Huang, C.; Chen, M.; Li, J. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA 2007), Cambridge, MA, USA, 12–14 July 2007; pp. 79–86.
21. Huang, C.; Chen, M.; Li, J. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. ACM Trans. Storage (TOS) 2013, 9, 3:1–3:28.
22. Oggier, F.; Datta, A. Self-repairing homomorphic codes for distributed storage systems. In Proceedings of the IEEE INFOCOM, Shanghai, China, 10–15 April 2011; pp. 1215–1223.
23. Gopalan, P.; Huang, C.; Simitci, H.; Yekhanin, S. On the locality of codeword symbols. IEEE Trans. Inf. Theory 2012, 58, 6925–6934.
24. Prakash, N.; Kamath, G.M.; Lalitha, V.; Kumar, P.V. Optimal linear codes with a local-error-correction property. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT 2012), Cambridge, MA, USA, 1–6 July 2012; pp. 2776–2780.
25. Forbes, M.; Yekhanin, S. On the locality of codeword symbols in non-linear codes. Discret. Math. 2014, 324, 78–84.
26. Tamo, I.; Papailiopoulos, D.S.; Dimakis, A.G. Optimal locally repairable codes and connections to matroid theory. IEEE Trans. Inf. Theory 2016, 62, 6661–6671.
27. Calder, B.; Wang, J.; Ogus, A.; Nilakantan, N.; Skjolsvold, A.; McKelvie, S.; Xu, Y.; Srivastav, S.; Wu, J.; Simitci, H.; et al. Windows Azure storage: A highly available cloud storage service with strong consistency. In Proceedings of the 23rd ACM Symposium Operating Systems Principles (SOSP’11), Cascais, Portugal, 23–26 October 2011; pp. 143–157.
28. Mehrabi, M.; Ardakani, M.; Khabbazian, M. Minimizing the update complexity of Facebook HDFS-RAID locally repairable code. In Proceedings of the 2017 IEEE 86th Vehicular Technology Conference (VTC-Fall), Toronto, ON, Canada, 24–27 September 2017; pp. 1–5.
29. Papailiopoulos, D.; Dimakis, A.G. Distributed storage codes through Hadamard designs. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT 2011), St. Petersburg, Russia, 31 July–5 August 2011; pp. 1230–1234.
30. Silberstein, N.; Rawat, A.S.; Koyluoglu, O.O.; Vishwanath, S. Optimal locally repairable codes via rank-metric codes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Istanbul, Turkey, 7–12 July 2013; pp. 1819–1823.
31. Papailiopoulos, D.S.; Dimakis, A.G. Locally repairable codes. IEEE Trans. Inf. Theory 2014, 60, 5843–5855.
32. Kamath, G.M.; Prakash, N.; Lalitha, V.; Kumar, P.V. Codes with local regeneration and erasure correction. IEEE Trans. Inf. Theory 2014, 60, 4637–4660.
33. Cadambe, V.R.; Mazumdar, A. Bounds on the size of locally recoverable codes. IEEE Trans. Inf. Theory 2015, 61, 5787–5794.
34. Chen, M.; Huang, C.; Li, J. On the maximally recoverable property for multi-protection group codes. In Proceedings of the IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 486–490.
35. Blaum, M.; Hafner, J.L.; Hetzler, S. Partial-MDS codes and their application to RAID type of architectures. IEEE Trans. Inf. Theory 2013, 59, 4510–4519.
36. Gopalan, P.; Huang, C.; Jenkins, B.; Yekhanin, S. Explicit maximally recoverable codes with locality. IEEE Trans. Inf. Theory 2014, 60, 5245–5256.
37. Martinez-Penas, U.; Kschischang, F.R. Universal and dynamic locally repairable codes with maximal recoverability via sum-rank codes. In Proceedings of the 2018 56th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–5 October 2018; pp. 792–799.
38. Shah, N.B.; Lee, K.; Ramchandran, K. When do redundant requests reduce latency? In Proceedings of the 51st Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 2–4 October 2013; pp. 731–738.
39. Joshi, G.; Liu, Y.; Soljanin, E. On the delay-storage trade-off in content download from coded distributed storage systems. IEEE J. Sel. Areas Commun. 2014, 32, 989–997.
40. Liang, G.; Kozat, U. Tofec: Achieving optimal throughput-delay trade-off of cloud storage using erasure codes. In Proceedings of the IEEE Conference Computer Communication (IEEE INFOCOM), Toronto, ON, Canada, 27 April–2 May 2014; pp. 826–834.
41. Rawat, A.S.; Papailiopoulos, D.S.; Dimakis, A.G.; Vishwanath, S. Locality and availability in distributed storage. IEEE Trans. Inf. Theory 2016, 62, 4481–4493.
42. Lee, K.-S.; Park, H.; No, J.-S. New binary locally repairable codes with locality 2 and uneven availabilities for hot data. Entropy 2018, 20, 636.
43. Shahabinejad, M.; Khabbazian, M.; Ardakani, M. An efficient binary locally repairable code for Hadoop distributed file system. IEEE Commun. Lett. 2014, 18, 1287–1290.
44. Shahabinejad, M.; Khabbazian, M.; Ardakani, M. A class of binary locally repairable codes. IEEE Trans. Commun. 2016, 64, 3182–3193.
45. Shahabinejad, M.; Khabbazian, M.; Ardakani, M. On the average locality of locally repairable codes. IEEE Trans. Commun. 2018, 66, 2773–2783.
46. Hu, S.; Tamo, I.; Barg, A. Combinatorial and LP bounds for LRC codes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT 2016), Barcelona, Spain, 10–15 July 2016; pp. 1008–1012.
47. Wang, A.; Zhang, Z.; Lin, D. Bounds and constructions for linear locally repairable codes over binary fields. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 2033–2037.
48. Wang, A.; Zhang, Z.; Lin, D. Bounds for binary linear locally repairable codes via a sphere-packing approach. IEEE Trans. Inf. Theory 2019.
49. Agarwal, A.; Barg, A.; Hu, S.; Mazumdar, A.; Tamo, I. Combinatorial alphabet-dependent bounds for locally recoverable codes. IEEE Trans. Inf. Theory 2018, 64, 3481–3492.
50. Ma, J.; Ge, G. Optimal binary linear locally repairable codes with disjoint repair groups. arXiv 2017, arXiv:1711.07138v1.
51. Goparaju, S.; Calderbank, R. Binary cyclic codes that are locally repairable. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Honolulu, HI, USA, 29 June–4 July 2014; pp. 676–680.
52. Zeh, A.; Yaakobi, E. Optimal linear and cyclic locally repairable codes over small fields. In Proceedings of the IEEE Information Theory Workshop (ITW), Jerusalem, Israel, 26 April–1 May 2015; pp. 1–5.
53. Huang, P.; Yaakobi, E.; Uchikawa, H.; Siegel, P.H. Binary linear locally repairable codes. IEEE Trans. Inf. Theory 2016, 62, 6268–6283.
54. Tamo, I.; Barg, A.; Goparaju, S.; Calderbank, R. Cyclic LRC codes, binary LRC codes, and upper bounds on the distance of cyclic codes. Int. J. Inf. Coding Theory 2016, 3, 345–364.
55. Tamo, I.; Barg, A.; Frolov, A. Bounds on the parameters of locally recoverable codes. IEEE Trans. Inf. Theory 2016, 62, 3070–3083.
56. Kruglik, S.; Nazirkhanova, K.; Frolov, A. New bounds and generalizations of locally recoverable codes with availability. IEEE Trans. Inf. Theory 2019.
57. Silberstein, N.; Zeh, A. Optimal binary locally repairable codes via anticodes. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Hong Kong, China, 14–19 June 2015; pp. 1247–1251.
58. Nam, M.Y.; Song, H.Y. Binary locally repairable codes with minimum distance at least 6 based on partial t-spreads. IEEE Commun. Lett. 2017, 21, 1683–1686.
59. Kim, C.; No, J.-S. New constructions of binary LRCs with disjoint repair groups and locality 3 using existing LRCs. IEEE Commun. Lett. 2019, 23, 406–409.
60. Kim, C.; No, J.-S. New constructions of binary and ternary locally repairable codes using cyclic codes. IEEE Commun. Lett. 2018, 22, 228–231.
61. Farrell, P. Linear binary anticodes. Electron. Lett. 1970, 6, 419–421.
62. MacWilliams, F.J.; Sloane, N.J.A. The Theory of Error-Correcting Codes; North Holland: Amsterdam, The Netherlands, 1988; p. 547.

Figure 1. Classification of binary locally repairable codes.

Table 1. Summary of parameters of various BLRC constructions.

Codes	n	k	d	r	t	S	C-M
CC1	$2^{m} - 1$	$r n / (r + 1)$	2	r	1	O	X
CC2	$2^{m} - 1$	$\frac{2}{3} (2^{m} - 1) - m$	$\geq 6$	2	1	X	X
CC3	$2^{m} - 1$	$\leq \frac{2}{3} (2^{m} - 1) - 2 m$	10	2	1	X	X
CC4	$2^{m} - 1$	$3 n / 7$	4	2	3	X	X
CC5	$2^{m} + 1$	$\frac{2}{3} (2^{m} + 1) - 2 m$	$\geq 10$	2	1	X	X
CC6	$2^{m} - 1$	$3 n / 7 - m$	$\geq 12$	2	3	X	X
CC7	$2^{a} - 1$	$\frac{a}{2^{a} - 1} (2^{m} - 1) - m$	$\geq 2^{a - 1} + 2^{a}$	2	$2^{a - 1} - 1$	X	X
CC8	n	k	$2^{z - 1}$	$\leq 2^{z - 1} - 1$	1	X	X
CC9	n	$n - deg (g (x))$	4	$u - 1$	1	X	X
RV	$k^{2} - k$	k	$2 k - 2$	2	$\geq 2$ *	O	X
BG	n	$\frac{r n}{r + 1} - ⌈ {log}_{2} (r + 1) ⌉$	4	r	1	O	X
EG	n	k	d	r	t	O	X
AC1	$2^{m} - (\binom{s}{2}) - 1$	m	$2^{m - 1} - ⌊ \frac{s^{2}}{4} ⌋$	2	1	X	O
AC2	$2^{m} - 2^{t} + t + 1$	m	$2^{m - 1} - 2^{t - 1} + 2$	2	1	X	X
AC3	$2^{m - 1} - 1$	m	$2^{m - 2} - 1$	3	1	X	O
PS1	n	$\geq \frac{r n}{r + 1} - t ⌈ {log}_{2} n ⌉$	$\geq 2 t + 2$	r	1	O	X
PS2	n	k	d	$2^{t}$	t	O	O
PS3	n	k	d	$2^{t} + 2^{⌊ \frac{t + 1}{2} ⌋} - 1$	t	O	O
PS4	$3 l$	$2 l - 2 m$ or $2 l - 2 m - 1$	6	2	t	O	O
PS5	$4 l$	$\geq 3 l - s - 1$	$\geq 6$	3	1	O	X
PS6	$(r + 1) s$	$r s - m$	$\geq 6$	r	t	O	X
GH1	$2^{t} + 1$	$2^{t} - l - s - 1$	d	r	t	O	X
GH2	$n - r - 1$	$\geq k - r$	$\geq d$	r	t	O	X
SE	n	$\frac{2 n}{3} - ⌈ {log}_{2} (\frac{2 n}{3} + 1) ⌉ - 1$	$\geq 6$	2	1	O	X

* This scheme has an uneven availability represented as an availability profile.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kim, Y.-S.; Kim, C.; No, J.-S. Overview of Binary Locally Repairable Codes for Distributed Storage Systems. Electronics 2019, 8, 596. https://doi.org/10.3390/electronics8060596

