1. Introduction
Large-scale distributed storage systems store vast amounts of data, which are vulnerable to loss when nodes fail. A widely adopted method to prevent data loss is replication, where the data of a failed node are recovered by copying them from one of their replicas to a new replacement node. This approach ensures data integrity and system reliability but introduces significant storage overhead. To address this inefficiency, erasure codes have been proposed as an alternative to replication, offering the same level of reliability with much lower storage overhead. Facebook’s cluster storage system [1] employs a $[14, 10]$ Reed–Solomon (RS) code, which can tolerate up to four node failures. However, repairing a single failed node with such a code requires downloading data from 10 other nodes. Therefore, there is a critical need for coding schemes that minimize the number of nodes accessed during recovery. To address this problem, locally repairable codes (LRCs) [2] have been introduced.
In recent years, locally repairable codes have gained significant attention due to their practical applications in distributed storage systems [2,3,4,5]. The primary objective of LRCs is to minimize the number of nodes that need to be accessed when repairing a failed node. In [4], the concept of LRCs with $(r,\delta)$-locality was introduced. Cai et al. [6] gave a construction of locally repairable codes with $(r,\delta)$-locality over smaller fields than previously known constructions of this kind.
However, these constructions do not support parallel reading of specific data nodes. In practice, some data are accessed far more frequently than others; such data are called hot data, and supporting parallel access to them is an important feature. To address this shortcoming, Wang et al. [7] introduced LRCs with $(r,t)$-availability. These codes allow the data stored in an individual node to be retrieved from $t$ mutually exclusive recovery sets, thereby enabling parallel access to frequently accessed data in distributed storage systems. Below, we give a formal definition of an $(r,t)$ LRC.
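Definition 1. Let $\mathcal{C}$ be a systematic linear code over $\mathbb{F}_q$ with block length $n$, dimension $k$, and minimum distance $d = d(\mathcal{C})$, denoted an $[n,k,d]_q$ code; its code rate is $R = k/n$. The generator matrix of $\mathcal{C}$ is denoted by $G \in \mathbb{F}_q^{k \times n}$, and the parity-check matrix of $\mathcal{C}$ is denoted by $H$, so that $GH^{\top} = \mathbf{0}$. A message $\mathbf{m} = (m_1, \ldots, m_k)$ of length $k$ over $\mathbb{F}_q$ is encoded into a codeword $\mathbf{c} = (c_1, \ldots, c_n)$ of length $n$ over $\mathbb{F}_q$. The encoding process can be written as the matrix multiplication $\mathbf{c} = \mathbf{m}G$.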
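As a minimal sketch of the encoding map $\mathbf{c} = \mathbf{m}G$, the Python snippet below encodes a message over $\mathbb{F}_2$ with a small systematic generator matrix $G = [I_k \mid P]$. The parity part `P`, which describes a $2 \times 2$ grid of message symbols with two row parities and two column parities, is an illustrative assumption and is not one of the codes constructed in this paper.

```python
import numpy as np

# Hypothetical parity part P of a systematic generator matrix G = [I_k | P] over F_2.
# The k = 4 message symbols form a 2x2 grid; the four parity symbols are the two
# row sums and the two column sums (an illustration, not a code from this paper).
P = np.array([
    [1, 0, 1, 0],  # m0 enters row parity 0 and column parity 0
    [1, 0, 0, 1],  # m1 enters row parity 0 and column parity 1
    [0, 1, 1, 0],  # m2 enters row parity 1 and column parity 0
    [0, 1, 0, 1],  # m3 enters row parity 1 and column parity 1
], dtype=np.int64)
k = P.shape[0]
G = np.hstack([np.eye(k, dtype=np.int64), P])  # k x n with n = 8


def encode(m, G):
    """Systematic encoding c = mG, with arithmetic reduced mod 2."""
    return (np.asarray(m, dtype=np.int64) @ G) % 2


m = [1, 0, 1, 1]
c = encode(m, G)
print(c.tolist())  # [1, 0, 1, 1, 1, 0, 0, 1]
```

Because $G$ is systematic, the first $k$ coordinates of the codeword repeat the message and the remaining $n - k$ coordinates are parity symbols.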
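Let $[n]$ denote the set $\{1, 2, \ldots, n\}$. A symbol $c_i$ of a codeword $\mathbf{c}$ has locality $r$ if $c_i$ is a linear combination of the symbols $\{c_j : j \in R_i\}$, where $R_i \subseteq [n] \setminus \{i\}$ and $|R_i| \le r$. The set $R_i$ is called a recovery set (repair set) of $c_i$. The code $\mathcal{C}$ is said to have all-symbol locality $r$ if every symbol of $\mathbf{c}$ has locality $r$.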
If a symbol $c_i$ of a codeword has $t$ repair sets $R_i^{(1)}, \ldots, R_i^{(t)}$, where $|R_i^{(j)}| \le r$ for all $j \in [t]$ and $R_i^{(j)} \cap R_i^{(l)} = \emptyset$ for all $j \ne l$, then the symbol has locality $r$ and availability $t$; equivalently, the symbol has $(r,t)$-availability [8]. For an $[n,k,d]_q$ code $\mathcal{C}$, if all information symbols possess $(r,t)$-availability, $\mathcal{C}$ is called an $[n,k,d]_q$ locally repairable code (LRC) with information-symbol (IS) $(r,t)$-availability, abbreviated as IS-LRC. Alternatively, if every codeword symbol of $\mathcal{C}$ has $(r,t)$-availability, $\mathcal{C}$ is referred to as an LRC with all-symbol $(r,t)$-availability, abbreviated as AS-LRC.
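The repair sets of any coordinate $i$ of a codeword from an $[n,k,d]_q$ locally repairable code can be deduced from its parity-check matrix $H$ (with $H$ chosen so that its rows include the corresponding repair equations). Let $\mathbf{h}_1, \mathbf{h}_2, \ldots$ denote the rows of $H$, and let $h_{j,s}$ denote the $s$-th entry of $\mathbf{h}_j$. If coordinate $i$ possesses $(r,t)$-availability, then the $i$-th column of $H$ has nonzero entries in at least $t$ rows $\mathbf{h}_{j_1}, \ldots, \mathbf{h}_{j_t}$. Since $\mathbf{c}H^{\top} = \mathbf{0}$, we obtain the $t$ linear equations $\sum_{s \in [n]} h_{j_l,s}\, c_s = 0$ for $l \in [t]$. Assume $c_i$ is lost. Then, in each of these equations, the coordinates other than $i$, namely $\mathrm{supp}(\mathbf{h}_{j_l}) \setminus \{i\}$, index a repair set of $c_i$, and $c_i$ can be solved for because $h_{j_l,i} \ne 0$.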
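The following sketch, assuming a binary code and a hypothetical parity-check matrix `H` (an $[8,4]$ grid code with two row parities and two column parities, not a code from this paper), reads off one repair set per row of $H$ that touches coordinate $i$, tests whether $t$ of them are pairwise disjoint with size at most $r$, and recovers an erased symbol as a sum mod 2. Indices are 0-based, and the helper names `repair_sets`, `has_rt_availability`, and `recover_binary` are illustrative.

```python
from itertools import combinations

import numpy as np

# Hypothetical binary parity-check matrix H = [P^T | I_4] of an [8, 4] grid code:
# each row is one parity equation (the sum of its support is 0 mod 2).
H = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],  # c0 + c1 + c4 = 0 (row parity 0)
    [0, 0, 1, 1, 0, 1, 0, 0],  # c2 + c3 + c5 = 0 (row parity 1)
    [1, 0, 1, 0, 0, 0, 1, 0],  # c0 + c2 + c6 = 0 (column parity 0)
    [0, 1, 0, 1, 0, 0, 0, 1],  # c1 + c3 + c7 = 0 (column parity 1)
], dtype=np.uint8)


def repair_sets(H, i):
    """One candidate repair set per row of H with a nonzero entry in column i:
    the support of that row with coordinate i removed."""
    return [set(np.flatnonzero(row).tolist()) - {i} for row in H if row[i] != 0]


def has_rt_availability(sets, r, t):
    """True if some t of the given sets have size <= r and are pairwise disjoint."""
    small = [s for s in sets if len(s) <= r]
    return any(
        all(a.isdisjoint(b) for a, b in combinations(group, 2))
        for group in combinations(small, t)
    )


def recover_binary(c, repair_set):
    """Over F_2, the lost symbol equals the sum (mod 2) of its repair set."""
    return sum(c[j] for j in repair_set) % 2


c = [1, 0, 1, 1, 1, 0, 0, 1]              # a codeword of this code
assert not ((H @ np.array(c)) % 2).any()  # c satisfies every parity equation
print(repair_sets(H, 0))                  # [{1, 4}, {2, 6}]
print(has_rt_availability(repair_sets(H, 0), r=2, t=2))   # True
print(has_rt_availability(repair_sets(H, 4), r=2, t=2))   # False
print([recover_binary(c, S) for S in repair_sets(H, 0)])  # [1, 1]
```

Here the message coordinate 0 has two disjoint repair sets of size 2, while the parity coordinate 4 lies in only one row of `H`, so these rows certify information-symbol, but not all-symbol, $(2,2)$-availability.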
Example 1. G is a generator matrix of an LRC over , and H is its parity-check matrix. In this code, the message symbols in any codeword can be recovered from two disjoint repair sets of size 2. Assume is lost. From H or G, we can see that and . So, the two disjoint repair sets of are and . Hence, can be recovered by reading the data stored in (or ), and .

Li et al. [9] constructed asymptotically good LRCs with multiple recovery sets using automorphism groups of function fields. Wang et al. [7] proposed the following minimum distance bound for $[n,k,d]$ LRCs with $(r,t)$-availability:
$$d \le n - k - \left\lceil \frac{t(k-1)+1}{t(r-1)+1} \right\rceil + 2. \tag{2}$$
Rawat et al. [8] demonstrated that for $(r,t)$ LRCs in which each repair group includes exactly one parity symbol, the minimum distance $d$ satisfies the following relationship:
$$d \le n - k - \left\lceil \frac{kt}{r} \right\rceil + t + 1. \tag{3}$$
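To make the two bounds concrete, the sketch below evaluates the right-hand sides of (2) and (3), taken here in the forms $d \le n - k - \lceil (t(k-1)+1)/(t(r-1)+1) \rceil + 2$ and $d \le n - k - \lceil kt/r \rceil + t + 1$, and compares them with the brute-force minimum distance of a hypothetical $[8,4]$ binary grid code (two row parities and two column parities, so $r = t = 2$ and each repair group contains exactly one parity symbol). The function names are illustrative, and the toy code is not one of this paper's constructions.

```python
from itertools import product


def ceil_div(a, b):
    """Ceiling of a / b for positive integers."""
    return -(-a // b)


def wang_bound(n, k, r, t):
    """Right-hand side of bound (2): n - k - ceil((t(k-1)+1) / (t(r-1)+1)) + 2."""
    return n - k - ceil_div(t * (k - 1) + 1, t * (r - 1) + 1) + 2


def rawat_bound(n, k, r, t):
    """Right-hand side of bound (3): n - k - ceil(kt / r) + t + 1."""
    return n - k - ceil_div(k * t, r) + t + 1


def min_distance_binary(G):
    """Brute-force minimum distance of the binary code spanned by the rows of G."""
    k, n = len(G), len(G[0])
    best = n
    for m in product([0, 1], repeat=k):
        if any(m):
            c = [sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n)]
            best = min(best, sum(c))
    return best


# Hypothetical generator matrix [I_4 | P] of the 2x2 grid code.
G = [
    [1, 0, 0, 0, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1, 0, 1],
]
print(wang_bound(8, 4, 2, 2), rawat_bound(8, 4, 2, 2), min_distance_binary(G))  # 3 3 3
```

All three values equal 3, so this toy code meets both bounds with equality. The brute-force search enumerates all $2^k$ messages and is practical only for very small $k$.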
In their work, Rawat et al. [8] employed combinatorial designs to construct locally repairable codes with $(r,t)$-availability under the condition that $r$ divides $k$. These constructions were shown to meet the bound in (3). The authors also raised an open question: whether
is a necessary condition for
locally repairable codes to attain the bound in (3). Su et al. [10] introduced binary LRCs constructed from resolvable configurations. Based on resolvable configurations, Su [11] gave additional parameters of locally repairable codes whose minimum distance attains the bound in (3).
It is important to highlight that binary codes over $\mathbb{F}_2$ offer significant benefits for practical applications compared to codes defined over larger fields. Hao et al. [12] presented methods for constructing locally repairable codes from LDPC codes. Balaji et al. [13] gave a definition of the strict availability of locally repairable codes. Zhang et al. [14] extended this construction by relaxing the constraint
given in [13]. Zhang et al. [15] also gave a series of distance-optimal locally repairable codes from certain combinatorial structures that attain the bound in (3). Among the codes proposed in [15],
is not a necessary condition. Teng et al. [16] developed constructions of optimal binary LRCs with multiple recovery sets. Jin et al. [17] introduced a new class of binary LRCs with high availability. Tan et al. [18] proposed several constructions of binary distance-optimal LRCs from linear algebra and partial geometries. Fetrat et al. [19] presented a novel family of binary LRCs designed for coded distributed computing.
Prakash et al. [20] derived the following upper bound on the code rate for
:
In practice, distributed storage systems often involve nodes with different characteristics, such as varying failure rates or uptimes, and existing constructions are not fully equipped to address these heterogeneous requirements. Kadhe et al. [21] and Zeh et al. [22] proposed LRCs with nonuniform localities, while Bhadane et al. [23] introduced LRCs with varying localities and availabilities. Cai et al. [24] derived a connection between
-packing and optimal locally repairable codes. They also proposed a classification of LRCs with nonuniform locality or availability and gave the conditions that the parameters must satisfy in each class [24]. Cai et al. [25] generalized the concept of
-availability and provided a series of constructions that achieve the bound in (3).
This paper generalizes these approaches by leveraging pairwise balanced designs (PBDs) and balanced incomplete block designs (BIBDs) to construct LRCs with nonuniform localities and availabilities.
Table 1 presents the parameters of all constructions proposed in this article. These codes prioritize binary implementations for ease of deployment while achieving optimal or near-optimal performance in terms of rate, minimum distance, and repair efficiency. The contributions of this paper are as follows:
We introduce LRCs with nonuniform, all-symbol availabilities from pairwise balanced designs. This construction includes the codes with all-symbol availability constructed in [15]. Moreover, from this construction, we obtain a distance-optimal binary
LRC with all-symbol availability attaining the bounds in (2) and (3);
We also give a construction of LRCs with nonuniform localities and all-symbol availability from PBDs. This construction also includes the codes from [15];
We introduce LRCs with nonuniform availabilities and uniform locality from PBDs. When the chosen PBD is a BIBD, this construction gives distance-optimal locally repairable codes with message $(r,t)$-availability, and its rate is optimal when
. This construction also includes some codes from [15,18,24], and its rate is higher than that of the codes in [11,24,25];
Finally, we propose distance-optimal LRCs with nonuniform localities and message $(r,t)$-availability from PBDs, attaining the bound in (3). The rate of this construction is higher than that of the codes in [11,24,25], while requiring a much smaller field size. This construction also includes some codes from [15,18,24] and gives codes with parameters not contained in [11,15,18,24,25].
2. Preliminaries
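This section outlines the notation and foundational concepts used throughout this article. For any positive integer $i$, we define $[i] = \{1, 2, \ldots, i\}$, and for integers $i \le j$, we define $[i, j] = \{i, i+1, \ldots, j\}$. The symbol $\mathbb{F}_q$ represents the finite field with $q$ elements, where $q$ is a prime power. Additionally, the support of a vector $\mathbf{x} = (x_1, \ldots, x_n)$ is denoted by $\mathrm{supp}(\mathbf{x}) = \{ i \in [n] : x_i \ne 0 \}$.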
2.1. Block Designs
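Let $v$, $k$, and $\lambda$ be integers such that $v > k \ge 2$ and $\lambda \ge 1$. A finite set $X$ is called the point set, and $\mathcal{B}$ is a collection of subsets of $X$, called blocks. The pair $(X, \mathcal{B})$ is called a balanced incomplete block design (BIBD), written as a $(v, k, \lambda)$-BIBD, if the following conditions are satisfied: $|X| = v$, $|B| = k$ for every $B \in \mathcal{B}$, and every pair of distinct points in $X$ is contained in exactly $\lambda$ blocks [26]. In a $(v, k, \lambda)$-BIBD, each point belongs to exactly $\frac{\lambda(v-1)}{k-1}$ blocks, and the number of blocks is $b = \frac{\lambda v(v-1)}{k(k-1)}$.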
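The BIBD conditions and the two counting formulas can be checked mechanically. The Python sketch below tests them on the Fano plane, a standard $(7,3,1)$-BIBD chosen here only as an illustration; `is_bibd` is a hypothetical helper written for this example.

```python
from collections import Counter
from itertools import combinations


def is_bibd(points, blocks, k, lam):
    """Check the (v, k, lambda)-BIBD conditions: every block is a k-subset of the
    point set, and every pair of distinct points lies in exactly lam blocks."""
    pts = set(points)
    if any(len(B) != k or not set(B) <= pts for B in blocks):
        return False
    pair_count = Counter(pr for B in blocks for pr in combinations(sorted(B), 2))
    return all(pair_count[pr] == lam for pr in combinations(sorted(pts), 2))


# The Fano plane: a (7, 3, 1)-BIBD on the points 0..6.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
v, k, lam = 7, 3, 1
print(is_bibd(range(v), fano, k, lam))                 # True
print(lam * (v - 1) // (k - 1))                        # blocks through each point: 3
print(lam * v * (v - 1) // (k * (k - 1)), len(fano))   # number of blocks: 7 7
```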
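A balanced incomplete block design can be described by its incidence matrix. For a $(v, k, \lambda)$-BIBD $(X, \mathcal{B})$, where $X = \{x_1, \ldots, x_v\}$ is a point set of size $v$ and $\mathcal{B} = \{B_1, \ldots, B_b\}$ is a block set of size $b$, let $M$ be the $v \times b$ incidence matrix. The columns of $M$ correspond to the blocks $B_j \subseteq X$, $j \in [b]$, and the rows of $M$ correspond to the points $x_i$, $i \in [v]$. The entry $m_{i,j}$ of $M$ is defined as
$$m_{i,j} = \begin{cases} 1, & \text{if } x_i \in B_j, \\ 0, & \text{otherwise}. \end{cases}$$
In this matrix, each column contains exactly $k$ ones, while each row contains exactly $\frac{\lambda(v-1)}{k-1}$ ones.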
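A short sketch of the incidence-matrix construction, again assuming the Fano plane as the example $(7,3,1)$-BIBD and a hypothetical helper `incidence_matrix` (points are indexed from 0), confirms the column and row counts stated above.

```python
def incidence_matrix(v, blocks):
    """v x b 0/1 matrix with entry (i, j) equal to 1 iff point i lies in block j."""
    return [[1 if i in B else 0 for B in blocks] for i in range(v)]


# The Fano plane, a (7, 3, 1)-BIBD, used here only as an example.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
M = incidence_matrix(7, fano)
col_sums = [sum(row[j] for row in M) for j in range(len(fano))]
row_sums = [sum(row) for row in M]
print(col_sums)  # [3, 3, 3, 3, 3, 3, 3]: each column has k = 3 ones
print(row_sums)  # [3, 3, 3, 3, 3, 3, 3]: each row has lambda(v-1)/(k-1) = 3 ones
```

The inner product of two distinct rows counts the blocks containing both points, so for this design it equals $\lambda = 1$.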
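A parallel class of a BIBD $(X, \mathcal{B})$ is a set of pairwise disjoint blocks whose union is the point set $X$. A resolution is a partition of $\mathcal{B}$ into parallel classes. If such a partition exists, the BIBD is called resolvable [26], and a resolvable $(v, k, \lambda)$-BIBD is denoted by $(v, k, \lambda)$-RBIBD. The existence of a parallel class implies that $k$ divides $v$, and each parallel class contains exactly $v/k$ blocks. Since every point lies in exactly one block of each parallel class, the blocks of a resolvable BIBD can be partitioned into $\frac{\lambda(v-1)}{k-1}$ parallel classes.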
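A standard example of a resolvable design is the affine plane $AG(2,q)$ for a prime $q$, which is a $(q^2, q, 1)$-RBIBD whose parallel classes are the sets of lines with a common slope. The sketch below, with hypothetical helper names `affine_plane` and `is_parallel_class`, builds $AG(2,3)$ with arithmetic mod 3 and checks that it has $\lambda(v-1)/(k-1) = 4$ parallel classes of $v/k = 3$ blocks each and that every pair of points lies on exactly one line.

```python
from collections import Counter
from itertools import combinations, product


def affine_plane(q):
    """Points and parallel classes of AG(2, q) for a prime q (arithmetic mod q)."""
    points = list(product(range(q), repeat=2))
    classes = []
    for m in range(q):  # lines y = m*x + c; one parallel class per slope m
        classes.append([frozenset((x, (m * x + c) % q) for x in range(q)) for c in range(q)])
    classes.append([frozenset((c, y) for y in range(q)) for c in range(q)])  # lines x = c
    return points, classes


def is_parallel_class(points, cls):
    """Blocks are pairwise disjoint and their union is the whole point set."""
    covered = [p for B in cls for p in B]
    return len(covered) == len(set(covered)) and set(covered) == set(points)


q = 3
points, classes = affine_plane(q)
blocks = [B for cls in classes for B in cls]
pairs = Counter(pr for B in blocks for pr in combinations(sorted(B), 2))
print(len(classes), all(is_parallel_class(points, cls) for cls in classes))  # 4 True
print(all(pairs[pr] == 1 for pr in combinations(sorted(points), 2)))        # True
```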
2.2. Pairwise Balanced Designs (PBDs)
A PBD is a generalization of the balanced incomplete block design (BIBD) and is defined below.
Let $v$ be a positive integer, let $K$ be a set of positive integers such that $k \ge 2$ for all $k \in K$, and let $\lambda$ be another positive integer. Let $X$ be a finite set, referred to as the set of points, and let $\mathcal{B}$ be a collection of subsets of $X$, known as blocks. The pair $(X, \mathcal{B})$ is called a $(v, K, \lambda)$-pairwise balanced design (PBD) [26], abbreviated as $(v, K, \lambda)$-PBD, if it satisfies the following conditions: the cardinality of $X$ is $v$, i.e., $|X| = v$; for every block $B \in \mathcal{B}$, the size $|B|$ belongs to $K$; and any pair of distinct points in $X$ appears in exactly $\lambda$ blocks.
The set $K$ is known as the parameter set of the PBD $(X, \mathcal{B})$. When $K$ contains only one element, i.e., $K = \{k\}$, a $(v, K, \lambda)$-PBD is a $(v, k, \lambda)$-BIBD. A $(v, K, \lambda)$-PBD is considered nondegenerate if there exists at least one block such that . In this paper, all PBDs discussed are assumed to be nondegenerate. PBDs can also be represented by incidence matrices, defined in the same way as for BIBDs.
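The PBD conditions can be checked in the same way as the BIBD conditions, with the single block size replaced by the set $K$. In the hypothetical helper `is_pbd` below, the $(5, \{2,3\}, 1)$-PBD is a toy example chosen for illustration; the last call shows that a BIBD is the special case $K = \{k\}$.

```python
from collections import Counter
from itertools import combinations


def is_pbd(points, blocks, K, lam):
    """Check the (v, K, lambda)-PBD conditions: each block is a subset of the
    point set whose size lies in K, and each pair of distinct points lies in
    exactly lam blocks."""
    pts = set(points)
    if any(len(B) not in K or not set(B) <= pts for B in blocks):
        return False
    count = Counter(pr for B in blocks for pr in combinations(sorted(B), 2))
    return all(count[pr] == lam for pr in combinations(sorted(pts), 2))


# A small (5, {2, 3}, 1)-PBD: one triple plus the seven pairs it does not cover.
pbd = [{0, 1, 2}, {0, 3}, {0, 4}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}]
print(is_pbd(range(5), pbd, {2, 3}, 1))  # True
print(is_pbd(range(5), pbd, {3}, 1))     # False: blocks of size 2 are not allowed

# With K = {3}, the same test accepts the Fano plane, a (7, 3, 1)-BIBD.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
print(is_pbd(range(7), fano, {3}, 1))    # True
```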
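A PBD can be constructed by deleting a point from a BIBD [26]. Suppose $(X, \mathcal{B})$ is a $(v, k, \lambda)$-BIBD with $k \ge 3$, and let $p \in X$ be any point. Define $X' = X \setminus \{p\}$ as the point set excluding $p$, and let $\mathcal{B}' = \{B \setminus \{p\} : B \in \mathcal{B}\}$ be the collection of blocks obtained by removing $p$ from each block in $\mathcal{B}$. The blocks that contained $p$ now have size $k-1$, the other blocks keep size $k$, and every pair of distinct points of $X'$ still lies in exactly $\lambda$ blocks. Then, the pair $(X', \mathcal{B}')$ forms a $(v-1, \{k-1, k\}, \lambda)$-PBD.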
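The point-deletion step can be sketched directly. Assuming the Fano plane as the input $(7,3,1)$-BIBD and a hypothetical helper `delete_point`, removing one point leaves 6 points, three blocks of size 2 (the blocks that contained the deleted point), four blocks of size 3, and every remaining pair still covered exactly once, i.e., a $(6, \{2,3\}, 1)$-PBD.

```python
from collections import Counter
from itertools import combinations


def delete_point(points, blocks, p):
    """Remove point p from the point set and from every block that contains it."""
    return [x for x in points if x != p], [set(B) - {p} for B in blocks]


# Fano plane: a (7, 3, 1)-BIBD, so deleting one point should give a (6, {2, 3}, 1)-PBD.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5}, {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
X2, B2 = delete_point(range(7), fano, 0)
sizes = Counter(len(B) for B in B2)
pairs = Counter(pr for B in B2 for pr in combinations(sorted(B), 2))
print(len(X2), dict(sorted(sizes.items())))               # 6 {2: 3, 3: 4}
print(all(pairs[pr] == 1 for pr in combinations(X2, 2)))  # True
```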
4. Discussion
In this section, we compare the parameters of our constructions with those of some related works.
Table 1 gives the parameters of all constructions proposed in this paper. Table 2 and Table 3 give the parameters of some locally repairable codes with information-symbol $(r,t)$-availability from this paper and from related constructions.
First, we compare our constructions with the codes with all-symbol $(r,t)$-availability constructed in [15]. Since a $(v, k, \lambda)$-BIBD is also a $(v, \{k\}, \lambda)$-PBD, we can use our constructions in Section 3.1 to obtain an
LRC with all-symbol availability, where
and
. This parameter set includes the first code constructed in [15]. The construction of an
LRC in Theorem 2 from a
-PBD leads to multiple choices of $r$ for fixed $t$, and the parameters of this construction cannot be derived from [15]. Moreover, the codes from Section 3.1 and Section 3.2 are LRCs with nonuniform locality or nonuniform availability, and from Theorem 1 we can obtain a distance-optimal
LRC whose coordinates have availability 2 or 3.
Next, we consider locally repairable codes with information-symbol $(r,t)$-availability. Since a $(v, k, \lambda)$-BIBD is also a $(v, \{k\}, \lambda)$-PBD, we can use the codes in Section 3.3 and Section 3.4 to obtain three classes of distance-optimal constructions from BIBDs. These constructions contain the codes from [15]. Moreover, Theorem 4 gives locally repairable codes with nonuniform availability, and Theorem 5 gives distance-optimal IS-LRCs with nonuniform locality. The locally repairable codes in this paper contain almost all the codes from [15], except for those derived from Latin squares. Table 2 summarizes the corresponding properties of these constructions.
We then compare our constructions with some known distance-optimal locally repairable codes from [6,11,18,24,25]. Locally repairable codes with $(r,\delta)$-locality [6] focus on enhancing the fault tolerance of each repair group. In this kind of code, each code symbol has multiple choices of repair groups; however, there is no guarantee that these repair sets are disjoint. In contrast, locally repairable codes with multiple disjoint recovery sets [11,18,24,25] support parallel reading of hot data.
Su [11] proposed a series of distance-optimal locally repairable codes with information-symbol $(r,t)$-availability from
resolvable configurations and an
MDS code or an
Gabidulin code. The first kind of construction in [11] gave
IS-LRCs without all-symbol locality and a minimum distance of
, where
. The second kind of construction in [11] gave distance-optimal
IS-LRCs with all-symbol locality, where
. Compared to the codes in [11], our distance-optimal constructions in Section 3.3 and Section 3.4 achieve higher code rates, which lowers the storage overhead, and require much smaller field sizes, which lowers the computational overhead. Moreover, every code symbol of our codes in Section 3.3 and Section 3.4 has locality $r$, whereas the codes in [11] that have all-symbol locality require a field size that is exponential in the code length $n$. Overall, compared to the codes in [11], our distance-optimal codes achieve the smallest possible field size, a higher code rate, and all-symbol locality at the expense of a lower minimum distance; the minimum distance of our codes nonetheless attains the bound in (3).
Rawat et al. [8] raised an open question: whether
is a necessary condition for
locally repairable codes to attain the bound in (3). Zhang et al. [15] and Cai et al. [24] addressed this open question by proposing distance-optimal locally repairable codes, where
. Cai et al. presented a framework for constructing locally repairable codes with optimal distance using
-packings. A
-packing can be viewed as a generalization of a PBD with $\lambda = 1$ in which each pair of points appears in at most one block. Cai et al. gave a series of distance-optimal locally repairable codes from packings when
. The parameters of these constructions are
,
, and
. Our constructions yield similar parameters in this case. The authors also gave a series of constructions when
; their parameters are listed in Table 3. Our construction in Theorem 5 yields parameters similar to those of these constructions when
or
, as well as in other cases. For example, given a
-RBIBD, the construction in Theorem 5 yields an LRC with parameters
,
. We can also obtain a construction with the same $r$ and $t$ as the codes in rows 3 and 4 of Table 3 [24]. When we choose
in the construction from Theorem 5, we obtain a construction with the same $r$ and $t$ as the code from [24] in the second row of Table 3. Moreover, our distance-optimal constructions in Section 3.4 have higher code rates and smaller field sizes, and all coordinates in these constructions have locality $r$. We also address the open problem proposed in [8]. Therefore, compared to the constructions in [24], the codes in this paper offer advantages in code rate, field size, and all-symbol locality.
Cai et al. [25] also proposed a construction of distance-optimal locally repairable codes with
-availability from linearized Reed–Solomon codes and Gabidulin codes. Their constructions attain the bound in (2), and their parameters are listed in Table 3. The constructions in [25] give
LRCs with information
-availability for arbitrary $r$ and $t$; they are the first explicit constructions of LRCs whose minimum distance attains the bound in (2). Compared to the codes in [25], our constructions in Section 3.4 have a smaller minimum distance. However, the code rate of the constructions in [25] is
, and their field size is much larger than 2, whereas our constructions achieve a higher code rate. In addition, our constructions are binary and distance-optimal, which makes them more suitable for deployment in practical distributed storage systems.
In [18], Tan et al. proposed two classes of optimal LRCs with message-symbol $(r,t)$-availability from linear algebra and partial geometries. Since some partial geometries are BIBDs, our constructions in Section 3.3 and Section 3.4 include the constructions from [18] whenever the chosen partial geometry is a BIBD. Their codes from linear algebra with
, where
and $r$ is a prime, are included in our constructions from an affine plane of order $r$. Their constructions with parameters
are covered by our codes whenever an
-BIBD exists. The codes in [15] can also yield these parameters. Therefore, our codes include part of the constructions from [18]. Moreover, our constructions can yield binary distance-optimal codes when
, while the constructions in [11,18] cannot.
5. Conclusions
This paper presents a comprehensive framework for constructing locally repairable codes (LRCs) using pairwise balanced designs (PBDs). We propose LRCs with all-symbol locality and nonuniform availability from PBDs. These constructions include previously developed codes from [15], and in some cases they attain the bounds in (2) and (3), i.e., they are distance-optimal. We also introduce a method for constructing LRCs with nonuniform localities and all-symbol availability.
We construct locally repairable codes with uniform locality and nonuniform availability from PBDs. When the chosen PBD is a BIBD, this construction is distance-optimal, and it is also rate-optimal for
. This construction contains the codes from [15].
Moreover, we develop distance-optimal LRCs with nonuniform localities and message $(r,t)$-availability, addressing the open question proposed in [8]. This construction contains some codes proposed in [15,18,24], and its code rate exceeds that of the codes in [11,24,25] while requiring a much smaller field size, which gives it an advantage over these state-of-the-art constructions in code rate and finite field size. It can also produce codes with parameters not contained in [11,15,18,24,25].
Through detailed comparisons with existing works, including those by Su [11], Cai et al. [24,25], Zhang et al. [15], and Tan et al. [18], we have shown that our constructions either subsume prior codes or outperform them in code rate and field size (and hence computational complexity), although the constructions in [11,25] achieve a larger minimum distance. Moreover, because our codes are binary, these constructions are highly suitable for practical deployment in distributed storage systems. Overall, the results in this paper provide a unified and generalized approach for constructing high-performance locally repairable codes, with significant theoretical and practical implications.