Analysis of Blind Reconstruction of BCH Codes

In this paper, the theoretical lower-bound on the success probability of blind reconstruction of Bose–Chaudhuri–Hocquenghem (BCH) codes is derived. In particular, the blind reconstruction method of BCH codes based on the consecutive roots of generator polynomials is mainly analyzed because this method shows the best blind reconstruction performance. In order to derive a performance lower-bound, the theoretical analysis of BCH codes on the aspects of blind reconstruction is performed. Furthermore, the analysis results can be applied not only to the binary BCH codes but also to the non-binary BCH codes including Reed–Solomon (RS) codes. By comparing the derived lower-bound with the simulation results, it is confirmed that the success probability of the blind reconstruction of BCH codes based on the consecutive roots of generator polynomials is well bounded by the proposed lower-bound.


Introduction
In order to achieve reliable information transmission through noisy communication channels, the use of error-correcting codes (ECCs) in the data stream is indispensable [1]. By sharing the parameters of ECCs between the transmitter and the receiver, the errors introduced by communication channels can be detected or corrected at the receiver in a cooperative way. However, in a non-cooperative context, it is necessary to decode received (or intercepted) data without knowledge of the parameters of the ECC in use. In other words, a blind reconstruction of the parameters of the ECC in use should be performed by the receiver.
An analysis of the blind reconstruction of cyclic codes over the binary erasure channel (BEC) is performed in [20]. Note that for the BEC, the number and the locations of error bits in the received data stream are known to the receiver. By using this property of the BEC, a blind reconstruction scheme of binary cyclic codes is proposed and a lower-bound on the detection probability of this scheme is analyzed in [20]. However, many blind reconstruction schemes consider the binary symmetric channel (BSC), where the number and the locations of error bits in the received data stream are unavailable. Therefore, the analysis in [20] is not directly applicable to the blind reconstruction schemes considering the BSC.
In this paper, the blind reconstruction of BCH codes over the q-ary symmetric channel is mainly considered because BCH codes are one of the most widely used classes of cyclic codes, especially in communication and storage systems, and the q-ary symmetric channel is a general form of the BSC. In particular, the method in [15] shows the best blind reconstruction performance among the existing blind reconstruction methods of BCH codes, but the theoretical analysis of this method has not been performed yet. Therefore, by analyzing the properties of BCH codes on the aspects of blind reconstruction, a lower-bound on the success probability of the blind reconstruction method in [15] is derived. More specifically, the distribution of the Galois field Fourier transform (GFFT) values of the received codewords is analyzed and the blind reconstruction method is formulated by using the conjugacy classes. By comparing the derived lower-bound with the simulation results, it is confirmed that the success probability of the blind reconstruction is well lower-bounded. Furthermore, the analysis of BCH codes on the aspects of blind reconstruction may lay a foundation for an analysis of other blind reconstruction methods of BCH codes based on GFFT.
In Section 2, definitions and properties of BCH codes and GFFT are briefly explained. In Section 3, the theoretical analysis of the properties of BCH codes on the aspects of blind reconstruction is performed. In Section 4, the blind reconstruction method in [15] is explained, and a lower-bound on the success probability of this blind reconstruction method is derived. The simulation results confirm that the success probability of the blind reconstruction method is well bounded by the derived lower-bound. In Section 5, conclusions are provided.

BCH Codes and Galois Field Fourier Transform
In this section, the BCH codes and the Galois field Fourier transform (GFFT) are briefly described.

BCH Codes
BCH codes are a class of linear block codes for forward error correction. Let $\mathrm{GF}(q)$ denote the Galois field (or finite field) of $q$ elements and let $\mathrm{BCH}_q(n, k)$ denote the BCH code with length $n$ and dimension $k$ over $\mathrm{GF}(q)$. Note that the dimension $k$ equals the length of the message, which also determines the number of codewords, $q^k$. Then, the generator polynomial of $\mathrm{BCH}_q(n, k)$ is defined as follows:

$$g(x) = \mathrm{LCM}\{M_{\alpha^{b}}(x), M_{\alpha^{b+1}}(x), \cdots, M_{\alpha^{b+d-2}}(x)\} \tag{1}$$

where LCM denotes the least common multiple function, $\alpha$ is a primitive $n$-th root of unity in $\mathrm{GF}(q^m)$, $M_{\alpha^i}(x)$ is the minimal polynomial of $\alpha^i$ over $\mathrm{GF}(q)$, $b$ is an arbitrary positive integer smaller than $n$, and $d$ is the designed distance. Note that $m$ is the smallest positive integer such that $n$ divides $q^m - 1$. By the definition of the generator polynomial $g(x)$, $\alpha^{b}, \alpha^{b+1}, \cdots, \alpha^{b+d-2}$ are roots of $g(x)$, i.e., $g(\alpha^{b}) = g(\alpha^{b+1}) = \cdots = g(\alpha^{b+d-2}) = 0$. Let $S_r$ be the set of the exponents of all roots of $g(x)$ as follows:

$$S_r = \{\, i \mid g(\alpha^i) = 0,\ 0 \le i \le n-1 \,\}. \tag{2}$$

A message can be expressed in polynomial form as $m(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}$ and in vector form as $\mathbf{m} = (m_0, m_1, \cdots, m_{k-1})$, where $m_i \in \mathrm{GF}(q)$ for $i \in \{0, 1, \cdots, k-1\}$. A codeword of $\mathrm{BCH}_q(n, k)$ can be expressed in polynomial form as $c(x) = c_0 + c_1 x + \cdots + c_{n-1} x^{n-1}$ and in vector form as $\mathbf{c} = (c_0, c_1, \cdots, c_{n-1})$, where $c_i \in \mathrm{GF}(q)$ for $i \in \{0, 1, \cdots, n-1\}$. Then, $c(x)$ can be obtained as follows:

$$c(x) = m(x)\, g(x). \tag{3}$$

Since a codeword $c(x)$ has $g(x)$ as a factor, all roots of $g(x)$ are also roots of $c(x)$, i.e., $c(\alpha^i) = 0$ for all $i \in S_r$.

In this paper, the q-ary symmetric channel with error probability $\epsilon$ is considered. The channel error can be expressed in polynomial form as $e(x) = e_0 + e_1 x + \cdots + e_{n-1} x^{n-1}$ and in vector form as $\mathbf{e} = (e_0, e_1, \cdots, e_{n-1})$, where $e_i \in \mathrm{GF}(q)$ for $i \in \{0, 1, \cdots, n-1\}$. Note that by the definition of the q-ary symmetric channel, $\Pr(e_i = 0) = 1 - \epsilon$ and $\Pr(e_i = x) = \epsilon/(q-1)$ for $i \in \{0, 1, \cdots, n-1\}$ and $x \in \mathrm{GF}^*(q)$, where $\mathrm{GF}^*(q) = \mathrm{GF}(q) \setminus \{0\}$.
Then, a received codeword at the receiver is expressed in polynomial form as

$$r(x) = c(x) + e(x) \tag{4}$$

or in vector form as follows:

$$\mathbf{r} = \mathbf{c} + \mathbf{e} = (c_0 + e_0, c_1 + e_1, \cdots, c_{n-1} + e_{n-1}). \tag{5}$$

Throughout the paper, the polynomial form and the vector form will be used interchangeably.
If there is no error (i.e., $e(x) = 0$), $r(\alpha^i) = 0$ for all $i \in S_r$ because $r(x) = c(x)$. However, if $e(x) \neq 0$, we may have $r(\alpha^i) \neq 0$ for some $i \in S_r$ because $e(\alpha^i)$ can be nonzero for some $i \in S_r$.
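The channel model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: field symbols are represented by the integer labels 0 to q-1, and the function names `qsc_error` and `hamming_weight` are hypothetical, not taken from the paper.

```python
import random

def qsc_error(n, q, eps, rng=random):
    """Draw an error vector e of length n for a q-ary symmetric channel.

    Each symbol is 0 with probability 1 - eps; otherwise it is one of the
    q - 1 nonzero symbols, chosen uniformly (probability eps / (q - 1) each).
    Symbols are integer labels 0..q-1 (an assumption of this sketch).
    """
    return [rng.randrange(1, q) if rng.random() < eps else 0 for _ in range(n)]

def hamming_weight(v):
    """wt(v): the number of nonzero entries of v."""
    return sum(1 for x in v if x != 0)
```

For example, `hamming_weight(qsc_error(63, 2, 0.01))` gives the number of symbol errors in one simulated block of length 63.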

Conjugacy Classes and Cyclotomic Cosets
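Let $U_\beta$ denote the conjugacy class of $\beta \in \mathrm{GF}(q^m)$. Then, $U_\beta$ consists of $\beta$ and its conjugates $\beta^{q}, \beta^{q^2}, \beta^{q^3}, \cdots$. Note that the conjugacy classes of the elements in the same conjugacy class are the same. The minimal polynomial of $\alpha^i \in \mathrm{GF}(q^m)$ over $\mathrm{GF}(q)$, $M_{\alpha^i}(x)$, can be obtained by using the conjugacy classes as follows:

$$M_{\alpha^i}(x) = \prod_{\beta \in U_{\alpha^i}} (x - \beta). \tag{6}$$

The degree of $M_{\alpha^i}(x)$ is $|U_{\alpha^i}|$. Since $M_{\alpha^i}(x)$ has all the elements in $U_{\alpha^i}$ as its roots, $g(x)$ in (1) has all the elements in $U_{\alpha^{b}}, U_{\alpha^{b+1}}, \cdots, U_{\alpha^{b+d-2}}$ as its roots. Let $S_N$ denote the null spectrum of the $\mathrm{BCH}_q(n, k)$ which has the generator polynomial in (1). Then $S_N$ is obtained as follows:

$$S_N = \bigcup_{i=b}^{b+d-2} U_{\alpha^i}. \tag{7}$$

$S_r$ in (2) is also expressed as the set of the exponents of the elements in $S_N$, i.e., $S_r = \{\, i \mid \alpha^i \in S_N \,\}$. Then, the complement of $S_r$, denoted by $S_r^c$, is obtained as follows:

$$S_r^c = \{\, i \mid \alpha^i \notin S_N,\ 0 \le i \le n-1 \,\}. \tag{8}$$

It is clear that $S_r^c = \mathbb{Z}_n \setminus S_r$ where $\mathbb{Z}_n = \{0, 1, \cdots, n-1\}$.
Let $C_i$ denote the cyclotomic coset of $i$ modulo $n$ with respect to $\mathrm{GF}(q)$, i.e., $C_i = \{\, i q^j \bmod n \mid j = 0, 1, 2, \cdots \,\}$. Then, the exponents of all the elements in $U_{\alpha^i}$ make up $C_i$, and $S_r = \bigcup_{i=b}^{b+d-2} C_i$ by (2) and (7).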
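The relation between cyclotomic cosets and $S_r$ can be illustrated with a minimal Python sketch. The function names `cyclotomic_coset` and `root_exponents` are hypothetical helpers introduced here for illustration only.

```python
def cyclotomic_coset(i, n, q):
    """Return the cyclotomic coset of i modulo n with respect to GF(q)."""
    coset, j = [], i % n
    while j not in coset:       # stop once multiplying by q cycles back
        coset.append(j)
        j = (j * q) % n
    return sorted(coset)

def root_exponents(n, q, b, d):
    """S_r: the union of the cosets C_b, C_{b+1}, ..., C_{b+d-2}."""
    s_r = set()
    for i in range(b, b + d - 1):
        s_r.update(cyclotomic_coset(i, n, q))
    return sorted(s_r)
```

For instance, with n = 15, q = 2, b = 1 and d = 5, `root_exponents` returns eight exponents, so the generator polynomial has degree 8 and the code dimension is k = 15 - 8 = 7.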

Galois Field Fourier Transform
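The roots of a received codeword $r(x)$ can also be obtained by performing the Galois field Fourier transform (GFFT) on $r(x)$. The GFFT of $c(x)$, denoted as $C(X)$, can be expressed in polynomial form as follows:

$$C(X) = \sum_{i=0}^{n-1} c(\alpha^i) X^i \tag{9}$$

where $c(\alpha^i) \in \mathrm{GF}(q^m)$ for $i \in \{0, 1, \cdots, n-1\}$. It is also expressed in vector form as follows:

$$\mathbf{C} = (c(\alpha^0), c(\alpha^1), \cdots, c(\alpha^{n-1})). \tag{10}$$

The GFFT matrix $M_G$ is defined as follows:

$$M_G = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \alpha & \alpha^{2} & \cdots & \alpha^{n-1} \\ 1 & \alpha^{2} & \alpha^{4} & \cdots & \alpha^{2(n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \alpha^{n-1} & \alpha^{2(n-1)} & \cdots & \alpha^{(n-1)(n-1)} \end{bmatrix}. \tag{11}$$

Then, the GFFT of $\mathbf{c}$ is simply obtained by $\mathbf{C} = \mathbf{c} \times M_G$. By the definition of $g(x)$ in (1),

$$c(\alpha^i) = 0, \quad \forall i \in S_r.$$

The GFFT of $r(x)$, denoted as $R(X)$, can be expressed in polynomial form as follows:

$$R(X) = \sum_{i=0}^{n-1} r(\alpha^i) X^i \tag{12}$$

where $r(\alpha^i) \in \mathrm{GF}(q^m)$ for $i \in \{0, 1, \cdots, n-1\}$. The vector form of $R(X)$ is expressed as follows:

$$\mathbf{R} = (r(\alpha^0), r(\alpha^1), \cdots, r(\alpha^{n-1})). \tag{13}$$

By using $M_G$ in (11), the GFFT of $\mathbf{r}$ is simply obtained by $\mathbf{R} = \mathbf{r} \times M_G$. In the error-free case (i.e., $e(x) = 0$),

$$r(\alpha^i) = c(\alpha^i) = 0, \quad \forall i \in S_r.$$

The GFFT of $e(x)$, denoted as $E(X)$, can be expressed in polynomial form as follows:

$$E(X) = \sum_{i=0}^{n-1} e(\alpha^i) X^i \tag{14}$$

where $e(\alpha^i) \in \mathrm{GF}(q^m)$ for $i \in \{0, 1, \cdots, n-1\}$. The vector form of $E(X)$ is expressed as follows:

$$\mathbf{E} = (e(\alpha^0), e(\alpha^1), \cdots, e(\alpha^{n-1})). \tag{15}$$

By using (5), (10) and (13), it is clear that $\mathbf{R} = \mathbf{C} + \mathbf{E} = (\mathbf{c} + \mathbf{e}) M_G$.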
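As a concrete illustration of the GFFT, the following Python sketch evaluates a binary vector at every power of $\alpha$ in GF(8). It is a minimal sketch under stated assumptions: the field is built from the primitive polynomial $x^3 + x + 1$, field elements are stored as 3-bit integers, and the helper names `gf2m_exp_table` and `gfft` are hypothetical.

```python
def gf2m_exp_table(m, prim_poly):
    """Powers alpha^0 .. alpha^(2^m - 2) of a primitive element of GF(2^m).

    Field elements are m-bit integers; prim_poly is the primitive polynomial
    given as a bit mask (0b1011 stands for x^3 + x + 1).
    """
    table, x = [], 1
    for _ in range(2**m - 1):
        table.append(x)
        x <<= 1                 # multiply by alpha
        if x >> m:              # degree reached m: reduce modulo prim_poly
            x ^= prim_poly
    return table

def gfft(c, exp_table):
    """GFFT of a binary vector c: entry i is c(alpha^i) as an m-bit integer."""
    n = len(exp_table)          # n = 2^m - 1 in this sketch
    out = []
    for i in range(n):
        acc = 0
        for j, cj in enumerate(c):
            if cj:              # c_j is 0 or 1 for a binary code
                acc ^= exp_table[(i * j) % n]
        out.append(acc)
    return out

EXP = gf2m_exp_table(3, 0b1011)
c = [1, 1, 0, 1, 0, 0, 0]       # c(x) = 1 + x + x^3, lowest degree first
C = gfft(c, EXP)
roots = [i for i, value in enumerate(C) if value == 0]   # exponents in S_r
```

Here `roots` evaluates to `[1, 2, 4]`, i.e., the cyclotomic coset of 1 modulo 7, which is the root set of $1 + x + x^3$ under the assumed field construction.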

GFFT of a Single Symbol Error
In this subsection, the GFFT values of a single symbol error are investigated. Let $\mathrm{wt}(\mathbf{a})$ denote the Hamming weight of a vector $\mathbf{a}$, i.e., $\mathrm{wt}(\mathbf{a})$ is the number of non-zero elements in $\mathbf{a}$. Note that a single symbol error $e(x)$ satisfies $\mathrm{wt}(\mathbf{e}) = 1$.

Lemma 1.
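If a received codeword $r(x)$ of $\mathrm{BCH}_q(n, k)$ contains a single symbol error, then

$$r(\alpha^i) \neq 0, \quad \forall i \in S_r. \tag{16}$$

Proof. Let $e(x) = e_j x^j$ for some $j \in \{0, 1, \cdots, n-1\}$ and $e_j \in \mathrm{GF}^*(q)$. Since the GFFT value of $e(x)$ with respect to $\alpha^i$ is $e(\alpha^i) = e_j \alpha^{ij} \neq 0$ and $c(\alpha^i) = 0$ for any $i \in S_r$, it follows that $r(\alpha^i) = c(\alpha^i) + e(\alpha^i) = e_j \alpha^{ij} \neq 0$ for any $i \in S_r$.

Lemma 1 shows that if $r(x)$ contains a single symbol error, no root of $g(x)$ can be a root of $r(x)$. In the next subsection, the distribution of GFFT values of $c(x)$ is analyzed.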

GFFT of Codewords
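Let $S_{c(\alpha^i)}$ denote the set of the GFFT values taken by all the codewords $c(x)$ of $\mathrm{BCH}_q(n, k)$ for $x = \alpha^i$ as follows:

$$S_{c(\alpha^i)} = \{\, c(\alpha^i) \mid c(x) \in \mathrm{BCH}_q(n, k) \,\}. \tag{17}$$

Suppose that the minimal polynomial $M_\alpha(x)$ of a primitive $n$-th root of unity $\alpha \in \mathrm{GF}(q^m)$ over $\mathrm{GF}(q)$ has degree $m'$, where $m' \mid m$. Then, any $\alpha^i \in \mathrm{GF}(q^m)$ can be expressed by a linear combination of $\alpha^0, \alpha^1, \cdots, \alpha^{m'-1}$ as follows:

$$\alpha^i = h_0 \alpha^0 + h_1 \alpha^1 + \cdots + h_{m'-1} \alpha^{m'-1} \tag{18}$$

where $h_i \in \mathrm{GF}(q)$ for $i \in \{0, 1, \cdots, m'-1\}$. Moreover, based on (18), any $\alpha^i \in \mathrm{GF}(q^m)$ can be expressed in vector form, denoted as $\mathbf{v}_{\alpha^i}$, as follows:

$$\mathbf{v}_{\alpha^i} = (h_0, h_1, \cdots, h_{m'-1}). \tag{19}$$

In what follows, $V_{\alpha^i}$ denotes the $n \times m'$ matrix over $\mathrm{GF}(q)$ whose rows are the vector forms of $(\alpha^i)^0, (\alpha^i)^1, \cdots, (\alpha^i)^{n-1}$, and $\mathrm{rk}(\alpha^i)$ denotes the rank of $V_{\alpha^i}$.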

Lemma 2.
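Suppose that a message $m(x)$ is generated uniformly at random, a codeword $c(x)$ of $\mathrm{BCH}_q(n, k)$ is encoded by $g(x)$ as in (3), and $k \ge \mathrm{rk}(\alpha^i)$. Then, it is satisfied that

$$S_{c(\alpha^i)} = \{0\}, \quad \forall i \in S_r, \tag{20}$$

$$|S_{c(\alpha^i)}| = q^{\mathrm{rk}(\alpha^i)}, \quad \forall i \in S_r^c, \tag{21}$$

$$\Pr\big(c(\alpha^i) = x\big) = \frac{1}{|S_{c(\alpha^i)}|}, \quad \forall x \in S_{c(\alpha^i)},\ \forall i \in S_r^c. \tag{22}$$

Proof. First of all, for any $i \in S_r$, it is always true that $c(\alpha^i) = 0$ due to the definition of $S_r$. Therefore, $S_{c(\alpha^i)} = \{0\}$ for any $i \in S_r$.

Second, in order to prove (21), let $\Gamma \in \mathrm{GF}(q)^{q^k \times n}$ denote a matrix having all the $q^k$ codewords of $\mathrm{BCH}_q(n, k)$ as its rows. Then, the GFFT values of the $q^k$ codewords can be expressed in vector form as follows:

$$\Gamma \times V_{\alpha^i} = \Lambda \tag{23}$$

where $V_{\alpha^i} \in \mathrm{GF}(q)^{n \times m'}$ is the matrix of the vector forms of the powers of $\alpha^i$ and $\Lambda \in \mathrm{GF}(q)^{q^k \times m'}$ is a matrix with the vector forms of all GFFT values of the $q^k$ codewords with respect to $\alpha^i$ as its rows. Note that the rank of $\Gamma$ is $k$ because all the rows of $\Gamma$ are the codewords of $\mathrm{BCH}_q(n, k)$, and the rank of $V_{\alpha^i}$ is $\mathrm{rk}(\alpha^i)$ by the definition. The matrix $\Gamma$ can be decomposed as $\Gamma = \Delta \times G$, where $\Delta$ has all the elements of $\mathrm{GF}(q)^k$ as its rows and $G$ is the generator matrix of $\mathrm{BCH}_q(n, k)$. Note that the rank of $\Lambda$, $\mathrm{rank}(\Lambda)$, is equal to $\mathrm{rank}(G \times V_{\alpha^i})$. Since the rows $\mathbf{g}_j$ of $G$ for $j \in \{1, 2, \cdots, k\}$, i.e., $\mathbf{g}_1, \mathbf{g}_2, \cdots, \mathbf{g}_k$, are linearly independent, and $k \ge \mathrm{rk}(\alpha^i)$, it is clear that $\mathrm{rank}(G \times V_{\alpha^i})$ is equal to $\mathrm{rk}(\alpha^i)$. Therefore, the rank of $\Lambda$ is also equal to $\mathrm{rk}(\alpha^i)$, which implies that there are $q^{\mathrm{rk}(\alpha^i)}$ distinct rows in $\Lambda$ and $|S_{c(\alpha^i)}| = q^{\mathrm{rk}(\alpha^i)}$ for any $i \in S_r^c$.
Lastly, in order to show (22), let $\mathbf{x}_0, \mathbf{x}_1, \cdots, \mathbf{x}_{n_1-1} \in \mathrm{GF}(q)^n$ be all distinct codewords such that $x_0(\alpha^i) = x_1(\alpha^i) = \cdots = x_{n_1-1}(\alpha^i) = x$ for given $i$ and $x \in \mathrm{GF}(q^m)$. Also, let $\mathbf{y}_0, \mathbf{y}_1, \cdots, \mathbf{y}_{n_2-1} \in \mathrm{GF}(q)^n$ be all distinct codewords such that $y_0(\alpha^i) = y_1(\alpha^i) = \cdots = y_{n_2-1}(\alpha^i) = y$ for the same $i$ and $y \in \mathrm{GF}(q^m)$. These relations can be expressed in matrix multiplication as follows:

$$\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_1 \\ \vdots \\ \mathbf{x}_{n_1-1} \end{bmatrix} \times \mathbf{m}_i = \begin{bmatrix} x \\ x \\ \vdots \\ x \end{bmatrix}, \tag{24}$$

$$\begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \\ \vdots \\ \mathbf{y}_{n_2-1} \end{bmatrix} \times \mathbf{m}_i = \begin{bmatrix} y \\ y \\ \vdots \\ y \end{bmatrix}, \tag{25}$$

where $\mathbf{m}_i = (\alpha^{0}, \alpha^{i}, \alpha^{2i}, \cdots, \alpha^{(n-1)i})^T$ is the column of $M_G$ in (11) corresponding to $\alpha^i$. In order to show $\Pr(c(\alpha^i) = x) = 1/|S_{c(\alpha^i)}|$, it is enough to show $n_1 = n_2$. Without loss of generality, suppose that $n_1 > n_2$. From (24), we can obtain

$$\begin{bmatrix} \mathbf{x}_0 - \mathbf{x}_0 \\ \mathbf{x}_1 - \mathbf{x}_0 \\ \vdots \\ \mathbf{x}_{n_1-1} - \mathbf{x}_0 \end{bmatrix} \times \mathbf{m}_i = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{26}$$

Note that the $n_1$ vectors $\mathbf{x}_i - \mathbf{x}_0$ are all distinct. By adding $\mathbf{y}_0$ to each row of the first matrix in the LHS of (26), we obtain

$$\begin{bmatrix} \mathbf{y}_0 + \mathbf{x}_0 - \mathbf{x}_0 \\ \mathbf{y}_0 + \mathbf{x}_1 - \mathbf{x}_0 \\ \vdots \\ \mathbf{y}_0 + \mathbf{x}_{n_1-1} - \mathbf{x}_0 \end{bmatrix} \times \mathbf{m}_i = \begin{bmatrix} y \\ y \\ \vdots \\ y \end{bmatrix}. \tag{27}$$

Note that the $n_1$ vectors $\mathbf{y}_0 + \mathbf{x}_i - \mathbf{x}_0$ are still all distinct and they are valid codewords. According to (27), the number of codewords which have $y$ as the GFFT value with respect to $\alpha^i$ is at least $n_1$, which contradicts the assumption $n_1 > n_2$, and hence $n_1 = n_2$. Therefore, if GFFT is performed on all the codewords of $\mathrm{BCH}_q(n, k)$ with respect to $\alpha^i$, all the elements of $S_{c(\alpha^i)}$ occur uniformly at random for the random message $m(x)$, which implies $\Pr(c(\alpha^i) = x) = 1/|S_{c(\alpha^i)}|$ for any $i \in S_r^c$.
By Lemma 2, it is clear that $c(\alpha^i) = 0$ for $i \in S_r$, and $c(\alpha^i)$ for $i \in S_r^c$ takes a value from $S_{c(\alpha^i)}$ uniformly at random. In the next subsection, the distribution of GFFT values of $r(x)$ is analyzed.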
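The statement of Lemma 2 can be checked exhaustively on a small code. The Python sketch below enumerates all 16 codewords of the binary (7, 4) code generated by $g(x) = 1 + x + x^3$ and counts the GFFT values at one exponent inside $S_r$ and one outside it. It is an illustration only, assuming GF(8) is built from $x^3 + x + 1$ with elements stored as 3-bit integers; the helper names are hypothetical.

```python
from collections import Counter
from itertools import product

def exp_table_gf8():
    """alpha^0 .. alpha^6 in GF(8) built from x^3 + x + 1 (3-bit integers)."""
    table, x = [], 1
    for _ in range(7):
        table.append(x)
        x <<= 1
        if x & 0b1000:
            x ^= 0b1011
    return table

def encode(msg, gen):
    """c(x) = m(x) g(x) over GF(2); coefficient lists, lowest degree first."""
    c = [0] * (len(msg) + len(gen) - 1)
    for i, mi in enumerate(msg):
        for j, gj in enumerate(gen):
            c[i + j] ^= mi & gj
    return c

def evaluate(c, i, exp):
    """c(alpha^i) for a binary vector c."""
    acc = 0
    for j, cj in enumerate(c):
        if cj:
            acc ^= exp[(i * j) % len(exp)]
    return acc

EXP = exp_table_gf8()
GEN = [1, 1, 0, 1]                       # g(x) = 1 + x + x^3, S_r = {1, 2, 4}
codewords = [encode(m, GEN) for m in product([0, 1], repeat=4)]

at_root = Counter(evaluate(c, 1, EXP) for c in codewords)      # i = 1 in S_r
at_nonroot = Counter(evaluate(c, 3, EXP) for c in codewords)   # i = 3 in S_r^c
```

In this instance `at_root` contains only the value 0, while `at_nonroot` contains all $2^3 = 8$ field elements, each hit by exactly two codewords, which matches the uniform distribution with $\mathrm{rk}(\alpha^3) = 3$.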

GFFT of Received Codewords
Consider a received codeword $r(x) = c(x) + e(x)$ having a single symbol error, i.e., $e(x) = e_j x^j$ with $e_j \neq 0$. Let $S_{e(\alpha^i)}$ be the set of all GFFT values of a single symbol error $e(x)$ with respect to $\alpha^i$ as follows:

$$S_{e(\alpha^i)} = \{\, e(\alpha^i) \mid e(x) = e_j x^j,\ e_j \in \mathrm{GF}^*(q),\ j \in \mathbb{Z}_n \,\}. \tag{28}$$

By using Lemma 2, the distribution of GFFT values of $r(x)$ with a single symbol error is analyzed as follows.

Corollary 1.
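Suppose that a message $m(x)$ is generated uniformly at random, a codeword $c(x)$ of $\mathrm{BCH}_q(n, k)$ is encoded by $g(x)$ as in (3), $e(x)$ is a single symbol error, and $k \ge \mathrm{rk}(\alpha^i)$. Then, it is satisfied that

$$S_{e(\alpha^i)} \subset S_{c(\alpha^i)}, \quad \forall i \in S_r^c. \tag{29}$$

Proof. By Lemma 2, if $k \ge \mathrm{rk}(\alpha^i)$, it is clear that $|S_{c(\alpha^i)}| = q^{\mathrm{rk}(\alpha^i)}$ for $i \in S_r^c$, which means that $S_{c(\alpha^i)}$ contains all the linear combinations of $(\alpha^i)^0, (\alpha^i)^1, \cdots, (\alpha^i)^{n-1}$ over $\mathrm{GF}(q)$. Therefore, $S_{c(\alpha^i)}$ contains $S_{e(\alpha^i)}$ for any $i \in S_r^c$ and (29) holds.
Let $S_{r(\alpha^i)}$ be the set of all GFFT values of $r(x)$ with a single symbol error $e_j x^j$ with respect to $\alpha^i$ as follows:

$$S_{r(\alpha^i)} = \{\, r(\alpha^i) \mid r(x) = c(x) + e_j x^j,\ c(x) \in \mathrm{BCH}_q(n, k),\ e_j \in \mathrm{GF}^*(q),\ j \in \mathbb{Z}_n \,\}. \tag{30}$$

Lemma 3.
Suppose that a message $m(x)$ is generated uniformly at random, a codeword $c(x)$ of $\mathrm{BCH}_q(n, k)$ is encoded by $g(x)$ as in (3), $e(x)$ is a single symbol error, and $k \ge \mathrm{rk}(\alpha^i)$. Then, it is satisfied that

$$S_{r(\alpha^i)} = S_{c(\alpha^i)}, \quad \forall i \in S_r^c, \tag{31}$$

$$\Pr\big(r(\alpha^i) = x\big) = \frac{1}{q^{\mathrm{rk}(\alpha^i)}}, \quad \forall x \in S_{r(\alpha^i)},\ \forall i \in S_r^c. \tag{32}$$

Proof. Based on (30), $S_{r(\alpha^i)}$ can be expressed as follows:

$$S_{r(\alpha^i)} = \{\, c(\alpha^i) + e(\alpha^i) \mid c(\alpha^i) \in S_{c(\alpha^i)},\ e(\alpha^i) \in S_{e(\alpha^i)} \,\}. \tag{33}$$

As shown in Corollary 1, if $e(x)$ is a single symbol error and $k \ge \mathrm{rk}(\alpha^i)$, $S_{e(\alpha^i)} \subset S_{c(\alpha^i)}$ for any $i \in S_r^c$.
Therefore, since $S_{c(\alpha^i)}$ is closed under addition, $S_{r(\alpha^i)}$ is equal to $S_{c(\alpha^i)}$ for any $i \in S_r^c$.
The probability in (32) is derived as follows:

$$\Pr\big(r(\alpha^i) = x\big) = \sum_{y \in S_{e(\alpha^i)}} \Pr\big(e(\alpha^i) = y\big) \Pr\big(c(\alpha^i) = x - y\big) \overset{(a)}{=} \sum_{y \in S_{e(\alpha^i)}} \Pr\big(e(\alpha^i) = y\big) \frac{1}{q^{\mathrm{rk}(\alpha^i)}} = \frac{1}{q^{\mathrm{rk}(\alpha^i)}} \tag{34}$$

for $x \in S_{r(\alpha^i)}$ and $i \in S_r^c$. The equality (a) holds by Lemma 2.
Lemma 3 assumes $\mathrm{wt}(\mathbf{e}) = 1$; in practice, however, multiple errors also occur. If $\mathrm{wt}(\mathbf{e}) > 1$, Lemma 1 does not hold because $r(\alpha^i)$ can be 0 for some $i \in S_r$ even though $e(x) \neq 0$. Note that $\Pr\big(e(\alpha^i) = 0, \forall i \in S_r,\ e(x) \neq 0\big)$ is equal to the undetectable error probability of the BCH code which has $\{\alpha^i \mid i \in S_r\}$ as its null spectrum.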
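As a small worked instance of an undetectable error (an illustration that is not taken from the original analysis), consider the binary code of length 7 generated by $g(x) = 1 + x + x^3$, assuming $\alpha$ is a root of $x^3 + x + 1$ so that $S_r = \{1, 2, 4\}$. Choosing the weight-3 error pattern $e(x) = g(x)$ gives:

```latex
\begin{align*}
e(x) &= 1 + x + x^3 = g(x), \qquad \mathrm{wt}(\mathbf{e}) = 3, \\
e(\alpha^i) &= g(\alpha^i) = 0, \quad \forall i \in S_r = \{1, 2, 4\}, \\
r(\alpha^i) &= c(\alpha^i) + e(\alpha^i) = 0, \quad \forall i \in S_r .
\end{align*}
```

In this case $r(x)$ is itself a codeword, so every root of $g(x)$ is still a root of $r(x)$ although $e(x) \neq 0$.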

Lemma 4.
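Suppose that a message $m(x)$ is generated uniformly at random, a codeword $c(x)$ of $\mathrm{BCH}_q(n, k)$ is encoded by $g(x)$ as in (3), $e(x)$ is generated by the q-ary symmetric channel with error probability $\epsilon$, and $k \ge \mathrm{rk}(\alpha^i)$. Then, it is satisfied that

$$S_{r(\alpha^i)} = S_{c(\alpha^i)}, \quad \forall i \in S_r^c, \tag{35}$$

$$\Pr\big(r(\alpha^i) = x\big) = \frac{1}{q^{\mathrm{rk}(\alpha^i)}}, \quad \forall x \in S_{r(\alpha^i)},\ \forall i \in S_r^c. \tag{36}$$

Proof. Since the error $e(x)$ is not a single symbol error anymore, $S_{e(\alpha^i)}$ is defined as follows:

$$S_{e(\alpha^i)} = \{\, e(\alpha^i) \mid \mathbf{e} \in \mathrm{GF}(q)^n \,\}. \tag{37}$$

$S_{e(\alpha^i)}$ also contains all the linear combinations of $(\alpha^i)^0, (\alpha^i)^1, \cdots, (\alpha^i)^{n-1}$ over $\mathrm{GF}(q)$ and hence $S_{r(\alpha^i)} = S_{c(\alpha^i)}$ for any $i \in S_r^c$ because $r(\alpha^i) = c(\alpha^i) + e(\alpha^i)$.
The probability in (36) is derived as follows:

$$\Pr\big(r(\alpha^i) = x\big) = \sum_{y \in S_{e(\alpha^i)}} \Pr\big(e(\alpha^i) = y\big) \Pr\big(c(\alpha^i) = x - y\big) = \sum_{y \in S_{e(\alpha^i)}} \Pr\big(e(\alpha^i) = y\big) \frac{1}{q^{\mathrm{rk}(\alpha^i)}} = \frac{1}{q^{\mathrm{rk}(\alpha^i)}} \tag{38}$$

for $x \in S_{r(\alpha^i)}$ and $i \in S_r^c$.
As Lemmas 3 and 4 show, the conclusions (31) and (32) coincide with (35) and (36). This implies that if the encoded message $m(x)$ is generated uniformly at random, the GFFT of $r(x)$ with respect to $\alpha^i$ takes a value in $S_{r(\alpha^i)}$ uniformly at random for $i \in S_r^c$, regardless of the distribution of $e(x)$.
By Lemma 4, the probability that $r(x)$ has $\alpha^i$ as its root for $i \in S_r^c$ is $1/q^{\mathrm{rk}(\alpha^i)}$. Based on Lemma 4, the performance of the blind reconstruction method of BCH codes [15] is analyzed in the next section.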
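To give a feel for the size of this probability, the Python sketch below evaluates $1/q^{\mathrm{rk}(\alpha^i)}$ for a few exponents. It relies on an assumption that is not stated in the text: $\mathrm{rk}(\alpha^i)$ is taken to be the size of the cyclotomic coset $C_i$, i.e., the degree of the minimal polynomial of $\alpha^i$, since the powers of $\alpha^i$ span the subfield generated by $\alpha^i$. The function names are hypothetical.

```python
from fractions import Fraction

def coset(i, n, q):
    """Cyclotomic coset of i modulo n with respect to GF(q)."""
    members, j = [], i % n
    while j not in members:
        members.append(j)
        j = (j * q) % n
    return members

def false_root_probability(i, n, q):
    """1 / q^rk(alpha^i), with rk(alpha^i) taken as |C_i| (assumption)."""
    return Fraction(1, q ** len(coset(i, n, q)))
```

For n = 63 and q = 2, the exponent 21 lies in a coset of size 2, so the probability is 1/4, whereas an exponent in a coset of size 6, such as 1, gives 1/64.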

Blind Reconstruction Method of BCH Codes
In this subsection, the blind reconstruction method of BCH codes based on consecutive roots of generator polynomials [15] is described. In order to perform this method, it is assumed that the codeword synchronization is perfectly done and the code length $n$ is known to the receiver. Suppose that $M$ codewords are received. The $j$-th received codeword is expressed in polynomial form as $r_j(x) = r_{j,0} + r_{j,1} x + \cdots + r_{j,n-1} x^{n-1}$ and in vector form as $\mathbf{r}_j = (r_{j,0}, r_{j,1}, \cdots, r_{j,n-1})$ for $j \in \{1, 2, \cdots, M\}$. Let $L_j$ denote the set of pairs $(s, l)$ consisting of the starting value $s$ and the length $l$ of consecutive roots of $r_j(x)$, defined as follows:

$$L_j = \{\, (s, l) \mid r_j(\alpha^i) = 0,\ \forall i \in C_s^l \,\} \tag{39}$$

where $C_s^l \triangleq \bigcup_{i=s}^{s+l-1} C_i$. For example, if $\mathbf{r}_j = (0, 0, 1, 0, 1, 1, 0)$ is received, then the GFFT of $\mathbf{r}_j$ is $\mathbf{R}_j = (1, 0, 0, 1, 0, 1, 1)$, and therefore $L_j = \{(5, 2)\}$. Note that $0 < s < n$ and $2 \le l$ for the elements in $L_j$. By using (39), for $r_j(x)$, the maximum length of consecutive roots (MLCR) $l_j^{\max}$ and the corresponding starting value of consecutive roots (SVCR) $s_j^{\max}$ are obtained as follows:

$$l_j^{\max} = \max\{\, l \mid (s, l) \in L_j \,\}, \tag{40}$$

$$s_j^{\max} = s \ \text{ such that } \ (s, l_j^{\max}) \in L_j. \tag{41}$$

Let $S_{\max}$ denote the collection of $(s_j^{\max}, l_j^{\max})$ for $j \in \{1, 2, \cdots, M\}$ as follows:

$$S_{\max} = \{\, (s_j^{\max}, l_j^{\max}) \mid j \in \{1, 2, \cdots, M\} \,\}. \tag{42}$$

The blind reconstruction method of BCH codes in [15] is a two-stage process.

1. First stage: The most frequent $s_j^{\max}$ in $S_{\max}$ is selected and called a reference SVCR (R-SVCR), denoted as $s_{ref}$.
2. Second stage: The most frequent $l_j^{\max}$ among the pairs having $s_j^{\max} = s_{ref}$ in $S_{\max}$ is selected and called a reference MLCR (R-MLCR), denoted as $l_{ref}$.

By setting $b = s_{ref}$ and $d = l_{ref} + 1$ in (1), the generator polynomial of the used BCH code is reconstructed.
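The two stages can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation of [15]: each received word is summarized by the set of exponents $i$ with $r_j(\alpha^i) = 0$, runs of consecutive roots are not allowed to wrap around modulo $n$, only maximal runs are reported, and ties are broken in favor of the first candidate encountered. All function names are hypothetical.

```python
from collections import Counter

def consecutive_root_runs(root_exponents, n):
    """Maximal runs (s, l) of consecutive root exponents with 0 < s < n, l >= 2."""
    roots = set(root_exponents)
    runs, s = [], None
    for i in range(1, n + 1):            # i = n is a sentinel that closes a run
        is_root = i < n and i in roots
        if is_root and s is None:
            s = i                        # a new run starts at exponent i
        elif not is_root and s is not None:
            if i - s >= 2:
                runs.append((s, i - s))
            s = None
    return runs

def mlcr_svcr(root_exponents, n):
    """(s_max, l_max) of the longest run, or None if there is no run."""
    runs = consecutive_root_runs(root_exponents, n)
    return max(runs, key=lambda sl: sl[1]) if runs else None

def two_stage_vote(s_max_pairs):
    """Return (b, d) = (s_ref, l_ref + 1) from a list of (s_max, l_max) pairs."""
    s_ref = Counter(s for s, _ in s_max_pairs).most_common(1)[0][0]
    l_ref = Counter(l for s, l in s_max_pairs if s == s_ref).most_common(1)[0][0]
    return s_ref, l_ref + 1
```

For instance, a word of length 7 whose roots have exponents 3, 5 and 6 yields the single pair (5, 2). Feeding the pairs of all received words whose result is not `None` into `two_stage_vote` gives the estimate of $b$ and $d$.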

Performance Analysis of Blind Reconstruction Method of BCH Codes
In this subsection, the performance of the blind reconstruction method in [15] is analyzed. Suppose that $\mathrm{BCH}_q(n, k)$ is used and $M$ codewords are received. The generator polynomial $g_0(x)$ is set as in (1) with $b = s_0$ and $d = l_0 + 1$. In order to succeed in blind reconstruction of this BCH code, $s_{ref}$ and $l_{ref}$ should be correctly determined as $s_{ref} = s_0$ and $l_{ref} = l_0$. Define the sets of received codewords, $M(s, l)$, $M_m(s, l)$, $M^*(s, l)$, and $M_e(s, l)$ as follows: $M_e(s, l) = \{\, r_j(x) \mid e_j(x) \neq 0,\ r_j(x) \in M(s, l) \,\}$.
Note that $M^*(s, l) \subseteq M_m(s, l) \subseteq M(s, l)$. In order to succeed in the first stage of the blind reconstruction method, the relation (46) should be satisfied, which can be simplified as in the following Lemma 5.

Lemma 5.
If the following inequality is satisfied,

$$|M^*(s_0, l_0)| > |M(s, 2)|, \quad \forall s \neq s_0,$$

then the first stage of the blind reconstruction of BCH codes in [15] always succeeds, and the success probability of the first stage is lower-bounded as where, for better readability, the given condition that $|M^*(s_0, l_0)| > |M(s, 2)|, \forall s \neq s_0$, is omitted in the probability.
The lower-bound on the success probability of the second stage is derived as follows: Note that, for better readability, the given condition that $|M^*(s_0, l_0)| > |M(s, 2)|, \forall s \neq s_0$, is omitted in the probability.

Theorem 1.
Suppose that $M$ randomly generated codewords of $\mathrm{BCH}_q(n, k)$, which uses the generator polynomial $g(x)$ as in (1) with $b = s_0$ and $d = l_0 + 1$, are received after passing through the q-ary symmetric channel with error probability $\epsilon$. Then, the success probability of the blind reconstruction method of BCH codes in [15], denoted as $P_s$, is lower-bounded as in (56), where $(C_{s_0}^{l_0})^c = \{1, 2, \cdots, n-1\} \setminus C_{s_0}^{l_0}$, and $P_{ue}(C)$ is the undetectable error probability of the BCH code having $\{\alpha^i \mid i \in C\}$ as its null spectrum.
If the $j$-th received codeword $r_j(x)$ is error-free or has an undetectable error, it is always true that $r_j(x) \in M(s_0, l_0)$. Furthermore, if $r_j(\alpha^i) \neq 0$ for all $i \in (C_{s_0}^{l_0})^c$, it is also true that $r_j(x) \in M^*(s_0, l_0)$, where $(C_{s_0}^{l_0})^c = \{1, 2, \cdots, n-1\} \setminus C_{s_0}^{l_0}$. Then, $\Pr\big(r_j(x) \in M^*(s_0, l_0)\big)$ is derived as in (57), where $P_{ue}(C_{s_0}^{l_0})$ is the undetectable error probability of the BCH code having $\{\alpha^i \mid i \in C_{s_0}^{l_0}\}$ as its null spectrum. In the equality (a) in (57), $r_j(\alpha^i)$ for $i \in (C_{s_0}^{l_0})^c$ occurs uniformly at random because the message is generated uniformly at random. Therefore, the event that $r_j(\alpha^i) = 0$ for any $i \in C_{s_0}^{l_0}$ and the event that $r_j(\alpha^z) \neq 0$ for any $z \in (C_{s_0}^{l_0})^c$ are independent, and hence the equality (a) holds. The equality (b) is derived by using $\Pr\big(r_j(\alpha^i) = 0, \forall i \in C_{s_0}^{l_0}\big) = (1 - \epsilon)^n + P_{ue}(C_{s_0}^{l_0})$ and Lemma 4. The probability that $r_j(x) \in M(s, 2)$ for $s \neq s_0$ is calculated by using Lemma 4 as in (58). For better readability, $s \neq s_0$ is omitted in the probability.
Let $M_1(C_i)$ be $\{\, r_j(x) \mid r_j(\alpha^z) = 0, \forall z \in C_i \,\}$ and $M_2(C_i)$ be $\{\, r_j(x) \mid r_j(\alpha^z) = c_j(\alpha^z) + e_j(\alpha^z) = 0, \forall z \in C_i,\ e_j(x) \neq 0 \,\}$. If $|M^*(s_0, l_0)|$ is greater than $|M_1(C_i)|$ for any $C_i \subseteq (C_{s_0}^{l_0})^c$ and also greater than $|M_2(C_i)|$ for any $C_i \subseteq C_{s_0}^{l_0}$, it is also satisfied that $|M^*(s_0, l_0)| > |M(s, 2)|$ for any $s \neq s_0$. In particular, for any $s$ such that $C_i \subseteq C_{s_0}^{l_0}$, it is also true that $|M^*(s_0, l_0)| > |M(s, 2)|$ due to $|M^*(s_0, l_0)| > |M_2(C_i)| \ge |M(s, 2)|$ for any $C_i \subseteq C_{s_0}^{l_0}$. Then, the condition for the success of the first stage of the blind reconstruction method is simplified as follows: Moreover, $\Pr\big(r_j(\alpha^z) = 0, \forall z \in C_i\big)$ for $C_i \subseteq (C_{s_0}^{l_0})^c$ is simplified as follows: The equality (a) is derived by using (58) and (b) is derived by using $C_i \cap C_{s_0}^{l_0} = \emptyset$ and $C_i \setminus C_{s_0}^{l_0} = C_i$. Furthermore, $\Pr\big(r_j(\alpha^z) = 0, \forall z \in C_i,\ e_j(x) \neq 0\big)$ for $C_i \subseteq C_{s_0}^{l_0}$ is also simplified as follows: The equality (a) is derived by using $C_i \cap C_{s_0}^{l_0} = C_i$ and $C_i \setminus C_{s_0}^{l_0} = \emptyset$. The probability that $r_j(x) \in M_e(s_0, 2)$ is the same as the undetectable error probability of a BCH code having $\{\alpha^i \mid i \in C_{s_0}^{2}\}$ as its null spectrum, i.e., $\Pr\big(r_j(x) \in M_e(s_0, 2)\big) = P_{ue}(C_{s_0}^{2})$.
In Theorem 1, a lower-bound on the success probability of the blind reconstruction method of BCH codes in [15] is obtained. In order to confirm the validity of this lower-bound, simulations are performed by using the following BCH codes. As shown in Figure 1, the success probability of the blind reconstruction of binary BCH codes is well bounded by the lower-bound in (56). However, for $\mathrm{BCH}_2(63, 51)$, the gap between the simulation result and the lower-bound is larger than for the others because $\mathrm{BCH}_2(63, 51)$ has a cyclotomic coset of cardinality 2, while all the cyclotomic cosets of $\mathrm{BCH}_2(31, 21)$ and $\mathrm{BCH}_2(127, 113)$ have cardinality 5 and 7, respectively. In (56), if a cyclotomic coset $C_i \subseteq (C_{s_0}^{l_0})^c$ has small cardinality, $1/q^{\mathrm{rk}(\alpha^i)}$ becomes larger and $B(M, y, 1/q^{\mathrm{rk}(\alpha^i)})$ becomes smaller. Therefore, the lower-bounds on the blind reconstruction performance of $\mathrm{BCH}_2(31, 21)$ and $\mathrm{BCH}_2(127, 113)$ are much tighter than that of $\mathrm{BCH}_2(63, 51)$. As shown in Figure 2, the success probability of the blind reconstruction of RS codes is also well bounded by the lower-bound in (56). Moreover, as the code length increases, the proposed lower-bound for RS codes becomes tighter, and therefore this lower-bound can be a good estimate of the blind reconstruction performance for practical RS codes. Furthermore, since the proposed lower-bound can estimate the blind reconstruction performance without extensive simulation, it is suitable for practical use.

Conclusions
The blind reconstruction method of BCH codes in [15] shows the best performance among the existing methods, but the theoretical analysis of this method had not been performed. In this paper, by analyzing the properties of BCH codes on the aspects of blind reconstruction, a lower-bound on the success probability of the blind reconstruction method in [15] is derived. In particular, the distribution of GFFT values of the received codewords is analyzed and the blind reconstruction method is formalized based on the conjugacy classes. Furthermore, the analysis results can be applied not only to the binary BCH codes, but also to the non-binary BCH codes, including RS codes. By comparing the derived lower-bound with the simulation results, it is confirmed that the success probability of the blind reconstruction is well bounded by the proposed lower-bound.