Three Efficient All-Erasure Decoding Methods for Blaum–Roth Codes

Blaum–Roth codes are binary maximum distance separable (MDS) array codes over the binary quotient ring $\mathbb{F}_2[x]/(M_p(x))$, where $M_p(x) = 1 + x + \cdots + x^{p-1}$ and $p$ is a prime number. Two existing all-erasure decoding methods for Blaum–Roth codes are the syndrome-based decoding method and the interpolation-based decoding method. In this paper, we propose a modified syndrome-based decoding method and a modified interpolation-based decoding method that have lower decoding complexity than the syndrome-based decoding method and the interpolation-based decoding method, respectively. Moreover, we present a fast decoding method for Blaum–Roth codes based on the LU decomposition of the Vandermonde matrix that has a lower decoding complexity than the two modified decoding methods for most of the parameters.


Introduction
Redundancy is necessary in storage systems to provide high data reliability in the event of disk failures [1]. Replication and erasure codes are the two main ways of adding redundancy. In replication, the data on one disk are copied to multiple disks, and the storage system replaces failed disks with their copies. Repair is fast, but replication requires a lot of storage space. In contrast, erasure codes provide higher data reliability at a small storage cost.
Maximum distance separable (MDS) codes [2] are typical erasure codes that achieve the optimal tradeoff between storage cost and data reliability, i.e., they attain the minimum storage cost for a given level of data reliability. Binary MDS codes are special MDS codes that have lower computational complexity in the encoding/decoding procedures, since only XORs and cyclic-shift operations are involved. Existing constructions of binary MDS codes include EVENODD codes [3,4], RDP codes [5], and X-codes [6,7], which can correct any two-column erasures (we use "column" and "disk" interchangeably in this paper), as well as RTP codes [8], Star codes [9,10], and extended EVENODD codes [11–14], which can correct any three-column erasures. With the rapid increase in the data scale of storage systems [15], we need binary MDS codes that can correct any number of erasures, together with efficient encoding/decoding methods. Graftage codes [16] can achieve various tradeoffs between storage and repair bandwidth, whereas we focus on efficient decoding methods for binary MDS codes. Blaum–Roth codes [17] are codes of this type; they are designed over the ring $R_p = \mathbb{F}_2[x]/(M_p(x))$, where $M_p(x) = 1 + x + \cdots + x^{p-1}$ and $p$ is a prime number.
When some columns are erased, the syndrome-based decoding method [17] and the interpolation-based decoding method [18] can be used to recover the erased columns. In the decoding methods [17,18], there are three basic operations over the ring $R_p$: (i) addition, (ii) multiplication of a polynomial by a power of $x$, and (iii) division by a factor $1 + x^b$ with $1 \le b \le p-1$. It is shown in [17,18] that we can first carry out operations (i) and (ii) modulo $1 + x^p$ and then reduce the results modulo $M_p(x)$, whereas operation (iii) in [17,18] is carried out directly modulo $M_p(x)$.
In this paper, we show that operation (iii) can also be computed modulo $1 + x^p$, which has lower computational complexity than computing it modulo $M_p(x)$. We propose modified versions of the two existing decoding methods [17,18] that have lower decoding complexity than the originals. The reason is twofold. First, all the operations in our decoding methods are carried out modulo $1 + x^p$, whereas the existing decoding methods execute the divisions modulo $M_p(x)$. Second, we propose new algorithms in the decoding procedure to reduce the number of operations. Please refer to Section 3 for our two modified decoding methods. Moreover, the efficient LU decoding method [19], proposed for decoding extended EVENODD codes, can also be employed to recover the erased columns of Blaum–Roth codes. We show that the LU decoding method has lower decoding complexity than the two modified decoding methods for most of the parameters. We define the decoding complexity as the total number of XORs required to recover the erased columns.

Blaum-Roth Codes
In this section, we first review the construction of Blaum–Roth codes [17] and then show the efficient operations over the ring $\mathbb{F}_2[x]/(1 + x^p)$. Finally, we present an algorithm that computes, with lower complexity, multiple multiplications over $\mathbb{F}_2[x]/(1 + x^p)$ of polynomials that each have two nonzero terms.

Construction of Blaum-Roth Codes [17]
The codeword of Blaum–Roth codes [17] is a $(p-1) \times n$ array $[c_{i,j}]_{i=0,j=0}^{p-2,n-1}$ that is encoded from the $(p-1)k$ information bits, where $c_{i,j} \in \mathbb{F}_2$ and $n \le p$. We can view any $k$ columns of the $(p-1) \times n$ array as information columns that store the $(p-1)k$ information bits and the other $r = n - k$ columns as parity columns that store the $(p-1)r$ parity bits. For $j = 0, 1, \ldots, n-1$, we represent the $p-1$ bits in column $j$ by the polynomial $c_j(x) = \sum_{i=0}^{p-2} c_{i,j} x^i$. The $(p-1) \times n$ array of Blaum–Roth codes is defined as

$$[c_0(x), c_1(x), \ldots, c_{n-1}(x)] \cdot H_{r \times n}^{T} = \mathbf{0} \bmod M_p(x),$$

where $H_{r \times n} = [x^{\ell j}]_{0 \le \ell \le r-1,\, 0 \le j \le n-1}$ is the $r \times n$ parity-check matrix and $\mathbf{0}$ is the all-zero row of length $r$. We denote the Blaum–Roth codes defined above as $C(p, n, r)$. When $p \ge n$ and $p$ is a prime number, we can always retrieve all the information bits from any $k$ out of the $n$ polynomials [17], i.e., $C(p, n, r)$ are MDS codes. If we let $c_{p-1,j} = 0$ for all $j = 0, 1, \ldots, n-1$, then $C(p, n, r)$ can be equivalently defined by the following $p \cdot r$ linear constraints (the subscripts are taken modulo $p$ unless otherwise specified):

$$\sum_{j=0}^{n-1} c_{m - \ell j,\, j} = 0,$$

where $0 \le m \le p-1$ and $0 \le \ell \le r-1$.
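To make the bit-level constraints concrete, the following minimal sketch checks them on a tiny array. It assumes the standard Vandermonde form of the Blaum–Roth parity checks, so that the constraint with indices $(m, \ell)$ sums the bits $c_{m - \ell j, j}$ over all columns $j$. The function name `satisfies_constraints` and the example array, worked out by hand for $C(3, 3, 2)$, are illustrative and not taken from [17].

```python
def satisfies_constraints(array, p, r):
    """Check the p*r linear constraints of C(p, n, r).

    array: list of n columns, each a list of p-1 bits c_{0,j}, ..., c_{p-2,j}.
    Assumed constraint form: sum_j c_{(m - l*j) mod p, j} = 0 over F_2.
    """
    cols = [col + [0] for col in array]      # imaginary row c_{p-1,j} = 0
    n = len(cols)
    for l in range(r):                       # slope of the line of bits
        for m in range(p):                   # starting row of the line
            parity = 0
            for j in range(n):
                parity ^= cols[j][(m - l * j) % p]
            if parity:
                return False
    return True

# Hand-computed codeword of C(3, 3, 2): c_0(x) = 1, c_1(x) = x, c_2(x) = 1 + x.
codeword = [[1, 0], [0, 1], [1, 1]]
print(satisfies_constraints(codeword, 3, 2))
```

Flipping any single bit of `codeword` makes the check fail, which matches the fact that every bit takes part in a row constraint ($\ell = 0$).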
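Suppose that the $\lambda$ columns $\{e_i\}_{i=0}^{\lambda-1}$ are erased, where $\lambda \ge 2$ and $0 \le e_0 < \cdots < e_{\lambda-1} < n$, so that $\delta = n - \lambda$ columns survive. The erased columns satisfy the linear system

$$\sum_{i=0}^{\lambda-1} x^{\ell e_i} c_{e_i}(x) = S_\ell(x), \quad 0 \le \ell \le \lambda-1, \tag{1}$$

over the ring $R_p$, where the coefficient matrix $V_{\lambda \times \lambda} = [x^{\ell e_i}]$ is the $\lambda \times \lambda$ square Vandermonde matrix and the syndrome polynomials are computed from the surviving columns as

$$S_\ell(x) = \sum_{j \notin \{e_0, \ldots, e_{\lambda-1}\}} x^{\ell j} c_j(x), \quad 0 \le \ell \le r-1. \tag{2}$$

In this paper, we present three efficient decoding methods to solve the linear system in Equation (1) over the ring $\mathbb{F}_2[x]/(1 + x^p)$.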

Efficient Operations over $\mathbb{F}_2[x]/(1 + x^p)$
It is more efficient to compute the multiplication by a power of $x$ and the division by the factor $1 + x^b$ over the ring $\mathbb{F}_2[x]/(1 + x^p)$ than over the ring $R_p$:
(i) Let $a(x) \in R_p$. The multiplication $x^i \cdot a(x)$ over the ring $R_p$ in [17] (Equation (19)) takes $p - 1$ XORs, while the multiplication $x^i \cdot a(x)$ over the ring $\mathbb{F}_2[x]/(1 + x^p)$ takes no XORs [20].
(ii) Consider the division by the factor $1 + x^d$, where $d$ is a positive integer that is coprime with $p$, i.e., the equation

$$(1 + x^d)\, g(x) = f(x) \bmod (1 + x^p), \tag{3}$$

where $f(x)$ has an even number of nonzero terms. Given such $f(x)$ and $d$, we can compute $g(x)$ by Lemma 1.
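The cost difference in item (i) can be seen in a short sketch. Polynomials are stored as lists of $p$ bits, lowest degree first; the function names `shift` and `reduce_mod_Mp` are illustrative. Multiplication by $x^i$ modulo $1 + x^p$ only re-indexes the coefficients, whereas reduction modulo $M_p(x)$ uses $x^{p-1} \equiv 1 + x + \cdots + x^{p-2}$ and may XOR the top coefficient into all $p - 1$ lower ones.

```python
def shift(a, i):
    """x^i * a(x) over F_2[x]/(1 + x^p): a cyclic shift, no XORs."""
    p = len(a)
    return [a[(k - i) % p] for k in range(p)]

def reduce_mod_Mp(a):
    """Reduce a(x) modulo M_p(x) = 1 + x + ... + x^{p-1}.

    Uses x^{p-1} = 1 + x + ... + x^{p-2} (mod M_p), so the top coefficient
    is XORed into each of the p-1 lower coefficients.
    """
    top = a[-1]
    return [bit ^ top for bit in a[:-1]]

# p = 5, a(x) = 1 + x^3
a = [1, 0, 0, 1, 0]
print(shift(a, 3))                   # x^3 + x^6 = x + x^3 modulo 1 + x^5
print(reduce_mod_Mp(shift(a, 1)))    # x + x^4 = 1 + x^2 + x^3 modulo M_5(x)
```

Deferring `reduce_mod_Mp` to the very end of decoding is what lets all intermediate multiplications by powers of $x$ stay free of XORs.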

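Lemma 1 (Lemma 8 in [21]). The coefficients of $g(x)$ in Equation (3) are given by

$$g_{p-1} = 0, \quad g_{d-1} = f_{d-1}, \quad g_{p-d-1} = f_{p-1}, \quad g_{p-(\ell+1)d-1} = g_{p-\ell d-1} + f_{p-\ell d-1} \ \text{ for } \ell = 1, 2, \ldots, p-3,$$

where the subscripts are taken modulo $p$.

By Lemma 1, computing the division by $1 + x^d$ takes $p - 3$ XORs, but $g(x)$ is not guaranteed to have an even number of nonzero terms. If we want to guarantee that $g(x)$ has an even number of nonzero terms, we should use Lemma 2 to compute the division.

Lemma 2 (Lemma 13 in [20]). The coefficients of $g(x)$ in Equation (3) are given by

$$g_{p-1} = f_{2d-1} + f_{4d-1} + \cdots + f_{(p-1)d-1}, \quad g_{\ell d-1} = f_{\ell d-1} + g_{(\ell-1)d-1} \ \text{ for } \ell = 1, 2, \ldots, p-1,$$

where the subscripts are taken modulo $p$.

By Lemma 2, computing the division by $1 + x^d$ takes $(3p-5)/2$ XORs, and $g(x)$ has an even number of nonzero terms. In contrast, computing the division in [17] takes $2(p-1)$ XORs over the ring $R_p$, which is strictly larger than the number of XORs required by Lemmas 1 and 2. It is shown in Theorem 5 in [19] that we can always solve the equations in Equation (1) over the ring $\mathbb{F}_2[x]/(1 + x^p)$ and that all the solutions are congruent to each other modulo $M_p(x)$. Therefore, to reduce the computational complexity, we can first solve the equations in Equation (1) over the ring $\mathbb{F}_2[x]/(1 + x^p)$ and then obtain the unique solution by reducing modulo $M_p(x)$.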
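The two divisions can be sketched as follows. Both routines are derived from the coefficient identity $f_i = g_i + g_{i-d}$ (subscripts modulo $p$) implied by Equation (3). The first fixes $g_{p-1} = 0$, and the second picks the starting coefficient so that $g(x)$ has an even number of nonzero terms. The function names and the XOR counters are illustrative assumptions. The recurrences are written to match the XOR counts $p - 3$ and $(3p-5)/2$ quoted in the text and are not copied from [20,21].

```python
def divide_lemma1(f, d):
    """Solve (1 + x^d) g = f mod (1 + x^p) with g_{p-1} = 0; returns (g, xors)."""
    p = len(f)
    g = [0] * p                          # g_{p-1} = 0 by choice
    g[(p - 1 - d) % p] = f[p - 1]        # g_{p-d-1} = f_{p-1}
    g[d - 1] = f[d - 1]                  # g_{d-1} = f_{d-1}
    xors = 0
    for l in range(1, p - 2):            # l = 1, ..., p-3
        src = (p - 1 - l * d) % p
        g[(src - d) % p] = g[src] ^ f[src]
        xors += 1
    return g, xors

def divide_lemma2(f, d):
    """Same equation, but the returned g has even weight; returns (g, xors)."""
    p = len(f)
    g = [0] * p
    xors = 0
    acc = f[(2 * d - 1) % p]
    for t in range(4, p, 2):             # f_{4d-1}, f_{6d-1}, ..., f_{(p-1)d-1}
        acc ^= f[(t * d - 1) % p]
        xors += 1
    g[p - 1] = acc
    for l in range(1, p):                # l = 1, ..., p-1
        idx = (l * d - 1) % p
        g[idx] = f[idx] ^ g[(idx - d) % p]
        xors += 1
    return g, xors

def times_one_plus_xd(g, d):
    """(1 + x^d) * g(x) over F_2[x]/(1 + x^p), used to check a division."""
    p = len(g)
    return [g[i] ^ g[(i - d) % p] for i in range(p)]

f = [1, 0, 1, 1, 1, 0, 0]                # p = 7, even number of nonzero terms
g1, x1 = divide_lemma1(f, 3)
g2, x2 = divide_lemma2(f, 3)
print(times_one_plus_xd(g1, 3) == f, x1)
print(times_one_plus_xd(g2, 3) == f, x2, sum(g2) % 2)
```

The two results differ at most by the all-ones polynomial $1 + x + \cdots + x^{p-1}$, which is annihilated by $1 + x^d$ modulo $1 + x^p$. This is why both reduce to the same polynomial modulo $M_p(x)$.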

Multiple Multiplications over $\mathbb{F}_2[x]/(1 + x^p)$
Note that in our modified syndrome-based decoding method and our modified interpolation-based decoding method, we need to compute multiple polynomial multiplications, where each polynomial has two nonzero terms. Suppose that we want to compute the $m$ multiplications in Equation (4), where $m$ is a positive integer. Algorithm 1 presents a method to simplify the multiplications in Equation (4). In Algorithm 1, we use the count-array $\Gamma$ to denote the number of the polynomial $1 + x$ in the multiplication $L(x^\tau)$. The number of multiplications, and hence the computational complexity, can be reduced with Algorithm 1. When Algorithm 1 is executed, all elements of the count-array $\Gamma$ should be zero or one, and the length $\eta$ of the final $L(x^\tau)$ is between 1 and $m$. In step 18 of Algorithm 1, $\omega$ is assigned the number of elements of the count-array $\Gamma$ that are no greater than one.

Decoding Algorithm
In this section, we present two decoding methods over the ring $\mathbb{F}_2[x]/(1 + x^p)$, obtained by modifying the two existing decoding methods [17,18] to reduce the decoding complexity.

Modified Syndrome-Based Method
We define the function $S(z) = \sum_{\ell=0}^{r-1} S_\ell(x) z^\ell$ of the indeterminate $z$, where $S_\ell(x)$ is given in Equation (2), and the polynomial $G_i(z) = \prod_{s=0, s \ne i}^{\lambda-1} (1 - x^{e_s} z)$ for $0 \le i \le \lambda-1$. From Equation (18) in [17], the polynomial $\sigma_i(x)$ can be regarded as the coefficient of $z^{\lambda-1}$ of the polynomial $G_i(z)S(z)$. Then, the erased column $c_{e_i}(x)$ is given by

$$c_{e_i}(x) = \frac{\sigma_i(x)}{\prod_{s=0, s \ne i}^{\lambda-1} (x^{e_i} - x^{e_s})}.$$

Note that the terms of the set $\{S_\ell(x) z^\ell\}_{\ell=\lambda}^{r-1}$ are not involved in computing the coefficient of $z^{\lambda-1}$ of the polynomial $G_i(z)S(z)$. Thus, we only need the first $\lambda$ terms (the $\lambda$ coefficients of degrees less than $\lambda$) of $S(z)$ when computing these coefficients, whereas all the $r$ terms of $S(z)$ are calculated in Step 1 in [17]. This is one essential way in which our modified syndrome-based decoding method obtains a lower decoding complexity than the original method in [17].

Moreover, the syndrome polynomials $S_\ell(x)$ satisfy

$$S_0(1) = S_1(1) = \cdots = S_{\lambda-1}(1), \tag{6}$$

i.e., by the definition in Equation (2), the $\lambda$ syndrome polynomials $S_\ell(x)$ either all have an even number of nonzero terms or all have an odd number of nonzero terms. Let $G(z) = (1 - x^{e_i} z) G_i(z)$ and $Q(z) = G(z)S(z)$. Then, we have

$$Q(z) = \prod_{s=0}^{\lambda-1} (1 - x^{e_s} z) \cdot S(z).$$

Thus, $Q(z)$ is independent of the erasure index $i$, and we only need to compute $Q(z)$ once in the decoding procedure. Recall that $\sigma_i(x)$ is the coefficient of $z^{\lambda-1}$ of the polynomial $G_i(z)S(z)$; then, $\sigma_i(x)$ is also the coefficient of $z^{\lambda-1}$ of the polynomial $Q(z)/(1 - x^{e_i} z)$. Writing $Q(z) = \sum_j Q_j(x) z^j$ and $Q(z)/(1 - x^{e_i} z) = \sum_j f_j^i(x) z^j$, we can derive the recurrence formula

$$f_0^i(x) = Q_0(x), \qquad f_j^i(x) = Q_j(x) + x^{e_i} f_{j-1}^i(x) \ \text{ for } 1 \le j \le \lambda-1, \tag{8}$$

where $0 \le i \le \lambda-1$. Notice that $\sigma_i(x) = f_{\lambda-1}^i(x)$ holds. As with $S(z)$, we only compute the first $\lambda$ terms (the $\lambda$ coefficients of degrees less than $\lambda$) of $Q(z)$, since the other coefficients of $Q(z)$ are not needed, whereas all the $r + \lambda$ terms of $Q(z)$ are calculated in Step 2 in [17]. This is another way in which our modified syndrome-based decoding method obtains a lower decoding complexity than the original method in [17]. Algorithm 2 shows our modified syndrome-based decoding method over the ring $\mathbb{F}_2[x]/(1 + x^p)$.
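The flow described above can be sketched end to end in a few lines. This is an illustrative reading of the method, not Algorithm 2 itself. It assumes the Vandermonde parity checks $\sum_j x^{\ell j} c_j(x) \equiv 0$, computes only the first $\lambda$ coefficients of $S(z)$ and $Q(z)$, applies the recurrence $f_j = Q_j + x^{e_i} f_{j-1}$, and performs every division modulo $1 + x^p$. It omits the Algorithm 1 simplification and makes no attempt to minimise XORs. All function names are hypothetical.

```python
def shift(a, i):
    """x^i * a(x) modulo 1 + x^p (cyclic shift; i may be negative)."""
    p = len(a)
    return [a[(k - i) % p] for k in range(p)]

def add(a, b):
    return [u ^ v for u, v in zip(a, b)]

def divide_even(f, d):
    """Even-weight g with (1 + x^d) g = f mod (1 + x^p); f must have even weight."""
    p = len(f)
    g = [0] * p
    idx = p - 1                          # start from g_{p-1} = 0
    for _ in range(p - 1):
        nxt = (idx + d) % p
        g[nxt] = f[nxt] ^ g[idx]         # f_i = g_i + g_{i-d}
        idx = nxt
    if sum(g) % 2:                       # other solution: g + (1 + x + ... + x^{p-1})
        g = [b ^ 1 for b in g]
    return g

def decode(cols, erased, p):
    """cols: n columns of p bits (top bit 0); erased: list of erased column indices."""
    lam = len(erased)
    S = []                               # first lam syndromes only
    for l in range(lam):
        s = [0] * p
        for j, c in enumerate(cols):
            if j not in erased:
                s = add(s, shift(c, l * j))
        S.append(s)
    Q = S                                # first lam coefficients of Q(z) = G(z) S(z)
    for e in erased:
        Q = [Q[0]] + [add(Q[j], shift(Q[j - 1], e)) for j in range(1, lam)]
    out = {}
    for e in erased:
        sigma = Q[0]
        for j in range(1, lam):          # recurrence f_j = Q_j + x^e f_{j-1}
            sigma = add(Q[j], shift(sigma, e))
        g = sigma
        for s in erased:                 # divide by x^e + x^s = x^lo (1 + x^d)
            if s != e:
                g = divide_even(shift(g, -min(e, s)), abs(e - s))
        top = g[-1]                      # final reduction modulo M_p(x)
        out[e] = [b ^ top for b in g[:-1]] + [0]
    return out

p, n, r = 7, 7, 4
info = [[(i * j + i + j) % 2 for i in range(p - 1)] + [0] for j in range(n - r)]
cols = info + [[0] * p for _ in range(r)]
for j, c in decode(cols, list(range(n - r, n)), p).items():
    cols[j] = c                          # encoding = decoding the parity columns
recovered = decode(cols, [1, 3, 6], p)
print(all(recovered[e] == cols[e] for e in (1, 3, 6)))
```

The example reuses `decode` as an encoder by treating the $r$ parity columns as erased, then erases three other columns and recovers them.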
The following lemma shows that we can always compute the divisions in steps 11-12 of Algorithm 2 by Lemmas 1 and 2 when $\lambda \ge 2$.
Lemma 3. In steps 11-12 of Algorithm 2, $\sigma_i(x)$ has an even number of nonzero terms for all $0 \le i \le \lambda-1$, and we can employ Lemmas 1 and 2 to compute the divisions.

Proof. From Equation (8) and steps 7-10 of Algorithm 2, we obtain

$$\sigma_i(x) = \sum_{j=0}^{\lambda-1} x^{(\lambda-1-j) e_i} Q_j(x).$$

If the number of polynomials in $\{Q_j(x)\}_{j=0}^{\lambda-1}$ that have an odd number of nonzero terms is even, then $\sigma_i(x)$ has an even number of nonzero terms for $0 \le i \le \lambda-1$. In the following, we show that this is true. According to Equation (6) and step 3 of Algorithm 2, $Q_0(1) = \cdots = Q_{\lambda-1}(1)$ holds.
According to Lemma 3, we can use Lemmas 1 and 2 to compute the divisions in step 12. Let $L_i$ denote the number of divisions required in step 12 for $i = 0, 1, \ldots, \lambda-1$, which ranges from 1 to $\lambda - 1$. We obtain $c_{e_i}(x)$ in step 12 by recursively computing the division $L_i$ times, where the polynomial resulting from each of the first $L_i - 1$ divisions must have an even number of nonzero terms. Therefore, we execute the first $L_i - 1$ divisions by Lemma 2 and the last division by Lemma 1. The computational complexity $T_D$ of steps 11-12 of Algorithm 2 is

$$T_D = \sum_{i=0}^{\lambda-1} \left( (L_i - 1)\,\frac{3p-5}{2} + (p-3) \right),$$

where $\lambda(p-3) \le T_D \le \lambda(\lambda-2)\frac{3p-5}{2} + \lambda(p-3)$. Without Algorithm 1, steps 11-12 of Algorithm 2 take $\lambda(\lambda-1)$ divisions, of which $\lambda$ are executed by Lemma 1 and $\lambda(\lambda-2)$ are executed by Lemma 2; however, the number of divisions can be reduced with Algorithm 1. In Table 1, we show the average number of divisions in steps 11-12 of Algorithm 2 executed by Lemma 1 and Lemma 2 with Algorithm 1 for $(p, n) \in \{(5, 5), (7, 7)\}$. We specify the computational complexity of Algorithm 2 as follows:
• Steps 1-2 take $\lambda(\delta-1)p = \lambda(n-\lambda-1)p$ XORs.
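As a quick numeric illustration of the bounds on $T_D$, the sketch below evaluates them from the per-division costs quoted in the text: $p - 3$ XORs for Lemma 1, $(3p-5)/2$ for Lemma 2, and $2(p-1)$ for a division over $R_p$. The function names are hypothetical. The $R_p$ figure is only the cost of the same $\lambda(\lambda-1)$ divisions at $2(p-1)$ XORs each, not the total complexity of the method in [17].

```python
def division_cost_bounds(p, lam):
    """Lower and upper bounds on T_D (XORs) for steps 11-12 of Algorithm 2."""
    lemma1 = p - 3                   # last division for each erased column
    lemma2 = (3 * p - 5) // 2        # each earlier division (even-weight result)
    lower = lam * lemma1             # L_i = 1 for every i
    upper = lam * (lam - 2) * lemma2 + lam * lemma1   # L_i = lam - 1 for every i
    return lower, upper

def ring_rp_division_cost(p, lam):
    """Cost of lam*(lam-1) divisions at 2(p-1) XORs each, as over R_p."""
    return lam * (lam - 1) * 2 * (p - 1)

print(division_cost_bounds(7, 3))    # (12, 36)
print(ring_rp_division_cost(7, 3))   # 72
```

For $(p, \lambda) = (7, 3)$ the worst case over $\mathbb{F}_2[x]/(1 + x^p)$ is half the cost of carrying out the same divisions over $R_p$.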
The computational complexity $T_{\text{Alg 2}}$ of Algorithm 2 is the sum of the costs of its steps, and it is strictly smaller than the computational complexity of the decoding method in [17]. Table 2 evaluates the computational complexity of the decoding method in [17] and of Algorithm 2 for some parameters. The results in Table 2 demonstrate that Algorithm 2 has much lower decoding complexity than the original decoding method in [17]. For example, Algorithm 2 has 40.60% less decoding complexity than the decoding method in [17] when $(p, n, r) = (7, 7, 4)$ and $\lambda = 3$. The reasons why Algorithm 2 has lower decoding complexity than the decoding method in [17] can be summarized in the following three points.
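Firstly, we only consider the first $\lambda$ terms (the $\lambda$ coefficients of degrees less than $\lambda$) of both $S(z)$ and $Q(z)$ in computing the coefficients of $z^{\lambda-1}$, while all $r$ terms of $S(z)$ and all $r + \lambda$ terms of $Q(z)$ are calculated in the decoding method in [17], where $r \ge \lambda$.
Secondly, all the divisions in Algorithm 2 are executed over the ring $\mathbb{F}_2[x]/(1 + x^p)$ by Lemmas 1 and 2, which take $p - 3$ XORs and $(3p-5)/2$ XORs per division, respectively, whereas each division in the decoding method in [17] is executed over the ring $R_p$ and takes $2(p-1)$ XORs.
Thirdly, we apply Algorithm 1 to steps 11-12 of Algorithm 2, which can significantly reduce the number of divisions and thus the number of XORs required.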

Modified Interpolation-Based Decoding Method
According to the decoding method in [18], we can recover the erased column $c_{e_i}(x)$ with $0 \le i \le \lambda-1$ by an interpolation formula expressed in terms of $f_i(y) = \prod_{s=0, s \ne i}^{\lambda-1} (y - x^{e_s})$ and $f(y) = \prod_{s=0}^{\lambda-1} (y - x^{e_s})$. The formula involves polynomials $a_j(x)$, where $0 \le j \le \delta-1$. Each $a_j(x)$ has an even number of nonzero terms, and we only need to compute $a_j(x)$ once in the decoding procedure, since $a_j(x)$ is independent of the erasure index $i$. Recall that $M_p(x) = 1 + x + \cdots + x^{p-1}$. Algorithm 3 shows our modified interpolation-based method over the ring $\mathbb{F}_2[x]/(1 + x^p)$. After using Algorithm 1, the number of polynomial multiplications in step 2 ranges from 1 to $\lambda$. Thus, the computational complexity $T_M$ of steps 1-2 of Algorithm 3 satisfies $(n-\lambda)p \le T_M \le (n-\lambda)\lambda p$.
In steps 1-2, we need $\lambda$ multiplications without Algorithm 1, which take $(n-\lambda)\lambda p$ XORs; with Algorithm 1, the number of multiplications involved in steps 1-2 can be reduced. In Table 3, we show the average number of XORs involved in steps 1-2 of Algorithm 3 with Algorithm 1 for $(p, n) \in \{(5, 5), (7, 7)\}$. The results in Table 3 show that Algorithm 1 reduces the number of XORs, especially for large values of $\lambda$. Only steps 4 and 6 of Algorithm 3 involve divisions. We should employ Lemma 2 to execute the divisions in steps 3-4 of Algorithm 3, since $b_i(x)$ in step 6 of Algorithm 3 should have an even number of nonzero terms. Notice that steps 5-6 of Algorithm 3 are exactly the same as steps 11-12 of Algorithm 2.
The computational complexity of Algorithm 3 is the sum of the costs of its steps, and it is smaller than the computational complexity of the decoding method in [18]. Table 4 evaluates the computational complexity of the decoding method in [18] and of Algorithm 3 for some parameters. The results in Table 4 demonstrate that Algorithm 3 has much lower decoding complexity than the original decoding method in [18]. For example, Algorithm 3 has 34.13% less decoding complexity than the decoding method in [18] when $(p, n, r) = (7, 7, 4)$ and $\lambda = 3$. The reasons why Algorithm 3 has a lower decoding complexity than the decoding method in [18] are summarized as follows.
Firstly, all the divisions in Algorithm 3 are executed over the ring $\mathbb{F}_2[x]/(1 + x^p)$ by Lemmas 1 and 2, which take $p - 3$ XORs and $(3p-5)/2$ XORs per division, respectively. Each division in the decoding method in [18] is executed over the ring $R_p$ and takes $2(p-1)$ XORs.
Secondly, we apply Algorithm 1 to steps 1-2 and steps 5-6, which significantly reduces the number of multiplications and thus the number of XORs required.

LU Decomposition-Based Method
The LU factorization of a matrix [22] expresses the matrix as a product of a lower triangular matrix $L$ and an upper triangular matrix $U$. According to the LU factorization of the Vandermonde matrix [23], we can express a Vandermonde matrix as a product of several lower triangular matrices and several upper triangular matrices. Therefore, we can solve the Vandermonde linear equations by first solving the linear systems whose coefficient matrices are the upper triangular matrices and then solving the linear systems whose coefficient matrices are the lower triangular matrices.
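For two erasures, the idea of solving through triangular factors can be written out directly. The sketch below is not Algorithm 4. It is a plain elimination for $\lambda = 2$ under the assumed system $c_a + c_b = S_0$, $x^a c_a + x^b c_b = S_1$, whose coefficient matrix factors into a unit lower triangular matrix with entry $x^a$ and an upper triangular matrix with diagonal entries $1$ and $x^a + x^b$. All names are illustrative.

```python
def shift(a, i):
    """x^i * a(x) modulo 1 + x^p (cyclic shift; i may be negative)."""
    p = len(a)
    return [a[(k - i) % p] for k in range(p)]

def add(a, b):
    return [u ^ v for u, v in zip(a, b)]

def divide(f, d):
    """One solution g of (1 + x^d) g = f mod (1 + x^p); f must have even weight."""
    p = len(f)
    g = [0] * p
    idx = p - 1                          # fix g_{p-1} = 0
    for _ in range(p - 1):
        nxt = (idx + d) % p
        g[nxt] = f[nxt] ^ g[idx]
        idx = nxt
    return g

def reduce_mp(a):
    """Reduce modulo M_p(x), keeping length p with a zero top coefficient."""
    top = a[-1]
    return [b ^ top for b in a[:-1]] + [0]

def solve_two_erasures(cols, a, b, p):
    """Recover columns a < b from the surviving columns of cols."""
    S0, S1 = [0] * p, [0] * p
    for j, c in enumerate(cols):
        if j not in (a, b):
            S0 = add(S0, c)
            S1 = add(S1, shift(c, j))
    y0 = S0                              # forward substitution with L
    y1 = add(S1, shift(S0, a))
    cb = divide(shift(y1, -a), b - a)    # back substitution: (x^a + x^b) c_b = y1
    ca = add(y0, cb)                     # c_a + c_b = y0
    return reduce_mp(ca), reduce_mp(cb)

p = 5
cols = [[1, 0, 1, 1, 0], [0, 1, 1, 0, 0], [1, 1, 0, 1, 0], [0] * p, [0] * p]
cols[3], cols[4] = solve_two_erasures(cols, 3, 4, p)     # encode the two parity columns
r1, r3 = solve_two_erasures(cols, 1, 3, p)               # then lose columns 1 and 3
print(r1 == cols[1], r3 == cols[3])
```

The only division is by $x^a(1 + x^{b-a})$. It is carried out modulo $1 + x^p$, and the reduction modulo $M_p(x)$ is deferred to the last step.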
Table 5 evaluates the decoding complexity of Algorithms 2–4 for some parameters, where Algorithm 4 denotes the LU decomposition-based method. The results in Table 5 demonstrate that Algorithm 2 performs better than Algorithm 3 if $\lambda \le n/2$; otherwise, if $\lambda > n/2$, Algorithm 3 has less decoding complexity. Algorithm 4 has less decoding complexity than both Algorithms 2 and 3 when $\lambda$ is small. However, when $\lambda$ is large, Algorithm 3 is more efficient than Algorithm 4. For example, compared with Algorithm 2, Algorithms 3 and 4 have 21.98% and 40.66% less decoding complexity, respectively, when $(p, n, r) = (5, 5, 4)$ and $\lambda = 3$.