Reducing the Overhead of BCH Codes: New Double Error Correction Codes

: The Bose-Chaudhuri-Hocquenghem (BCH) codes are a well-known class of powerful error correction cyclic codes. BCH codes can correct multiple errors with minimal redundancy. Primitive BCH codes only exist for some word lengths, which do not frequently match those employed in digital systems. This paper focuses on double error correction (DEC) codes for word lengths that are in powers of two (8, 16, 32, and 64), which are commonly used in memories. We also focus on hardware implementations of the encoder and decoder circuits for very fast operations. This work proposes new low redundancy and reduced overhead (LRRO) DEC codes, with the same redundancy as the equivalent BCH DEC codes, but whose encoder, and decoder circuits present a lower overhead (in terms of propagation delay, silicon area usage and power consumption). We used a methodology to search parity check matrices, based on error patterns, in order to design the new codes. We implemented and synthesized them, and compared their results with those obtained for the BCH codes. Our implementation of the decoder circuits achieved reductions between 2.8% and 8.7% in the propagation delay, between 1.3% and 3.0% in the silicon area, and between 15.7% and 26.9% in the power consumption. Therefore, we propose LRRO codes as an alternative for protecting information against multiple errors.


Introduction
Error control codes (ECCs) are frequently employed for fault tolerance in dependable systems. Coding theory has been studied for over half a century, and it is still going stronger than ever [1,2], because of its extensive application in computing, data storage, communications, etc.
The Bose-Chaudhuri-Hocquenghem (BCH) codes are one of the best-known ECCs. They were discovered in 1959 by Hocquenghem [3] and, independently, in 1960 by Bose and Ray-Chaudhuri [4]. Since then, BCH codes have been extensively employed in several applications, namely: flash memories in solid-state drives (SSDs) [5], optical storage like compact disks (CDs) or digital versatile disks (DVDs) [6], ethernet [7], video codecs [8], digital video broadcasting (DVB) [9], and satellite communications [10]. Their algebraic features make BCH codes very useful in a wide range of situations, and their error coverage is achieved with minimum redundancy.
Nevertheless, when applied to computers, these codes have two main weaknesses. First, primitive BCH codes (those generated according to their strict definition) only exist for a limited number of word lengths [11]. This problem can be easily solved by shortening a longer code. However, although they maintain their error coverage, shortened BCH codes may lose other properties, such as cyclicity or minimal redundancy, as stated later.
The second problem with these codes is the latency of the decoding process. Although their algebraic structure is useful in simplifying the encoding and decoding procedures, most of the decoding algorithms have a sequential structure that requires several clock cycles to complete the decoding [11]. Commonly, this is not a problem in software implementation or even in hardware when the speed requirements are not very high. However, the usefulness of BCH codes is limited when very fast encoding and decoding operations are required.
The information stored in key elements of a computer system, such as registers and memories, may be perturbed by different physical mechanisms [12][13][14]. As technology increases in integration scale, single error correction (SEC) or single error correction-double error detection (SEC-DED) codes may not be enough for present and future computers. Multiple error correction codes, like BCH, become an interesting alternative, but designers have to deal with the two problems described previously. First, the lengths of data words are commonly in a power of two (8, 16, 32, etc.), and they do not match to the block sizes of primitive BCH codes. Second, their "slow" decoding may reduce the system performance.
This paper focuses on binary codes for double error correction (DEC), applied to common data word lengths in computers (i.e., 8, 16, 32, and 64). We propose new low redundancy and reduced overhead (LRRO) DEC codes, which maintain the same error coverage and redundancy as the equivalent BCH codes, but reduce the overhead introduced by the encoder and decoder circuits in terms of the propagation delay, silicon area, and power consumption. Different encoder and decoder circuits have been implemented and synthesized in order to validate the error coverage and to measure and compare those parameters. The results validate the improvements achieved by our proposal, and confirm LRRO codes as an interesting alternative to BCH codes in high-speed applications, like random access memories (RAM) and processor registers. These codes are especially well suited for RAM memories, where low redundancy, high-speed operations, and low power consumption are mandatory.
This paper is organized as follows. Section 2 introduces basic concepts about coding theory and BCH codes. The proposed LRRO codes are described in Section 3. Section 4 includes the evaluation of the proposal and a comparison with the BCH codes. Finally, Section 5 presents some conclusions and ideas for future work.

Basics on Error Control Coding
An (n, k) binary linear block ECC encodes a k-bit input word in an n-bit output word [15]. The input word, u = (u0, u1, ..., uk−1), is a k-bit vector that represents the original data. The code word, b = (b0, b1, ..., bn−1), is an n-bit vector, where the (n − k) added bits are called the parity, code, or redundant bits. b is transmitted through an unreliable channel that delivers the received word, r = (r0, r1, ..., rn−1). The error vector, e = (e0, e1, ..., en−1), models the error induced by the channel. If no error has occurred in the i-th bit, then ei = 0; otherwise, ei = 1. Therefore, r can be interpreted as r = b ⊕ e. Figure 1 synthesizes this encoding, channel crossing, and syndrome decoding processes. The generator matrix, Gk×n, of a linear code (together with its related parity check matrix, H(n−k)×n) defines the code [1]. For the encoding process, b = u·G. The encoded word, b, meets the requirement The generator matrix, G k × n , of a linear code (together with its related parity check matrix, H (n − k) × n ) defines the code [1]. For the encoding process, b = u·G. The encoded word, b, meets the Electronics 2020, 9, 1897 3 of 15 requirement H·b T = 0, which means that it is a correct code word. For syndrome decoding, syndrome is defined as s T = H·r T , and it exclusively depends on e, as follows: There must be a different syndrome, s, for each correctable error vector, e. Syndrome decoding is done through a lookup table that relates each syndrome to the decoded error vector,ê. If s = 0, we can assume thatê = 0, and hence r is correct. Otherwise, an error has occurred. The decoded code word, b, is calculated asb = r ⊕ê. Fromb, it is easy to obtainû by discarding the parity bits. If the fault hypothesis used to design the ECC is consistent with the behavior of the channel,û and u must be equal with a very high probability.
For a binary word, the term Hamming weight, w, denotes the number of ones in that word. As explained later, the Hamming weights of the rows and columns of the parity check matrix determine the complexity of a code.
The Hamming distance between two binary words is the number of bits in which they differ. The minimum Hamming distance of a code (d min ) is the minimum of the distances between all pairs of valid code words. This parameter determines the error coverage of a code.
Code shortening is a code construction where, starting from a longer linear block code, the number of data bits is reduced, while maintaining the same number of parity bits [15]. In this way, we can construct a code for shorter data words, maintaining the same Hamming distance, and, therefore, the same error coverage.
As stated above, the encoding process in linear block codes can be easily implemented. However, the decoding process (mainly the lookup table implementation) may be difficult. Cyclic codes form an important subclass of linear block codes [11]. Encoding and syndrome computation can be implemented easily using linear feedback shift register circuits operated sequentially. Because of their algebraic structure, there are different practical methods for decoding.
Cyclic codes are linear block codes with the additional property that, for each code word, all cyclically shifted words are also valid code words.
Polynomial representation [11] is frequently employed to work with these codes. For example, the input word, u, can be represented as a (k − 1) or lower degree polynomial with the variable X, where the power of X is used to locate the bit u i in the following word: Cyclic codes can be represented using a generator polynomial, g(X). The data words, b(X), are computed as b(X) = u(X)·g(X). A parity check polynomial can be obtained from the generator polynomial. In fact, h(X)·g(X) = X n + 1 [11]. The generator and parity check matrices can be obtained from these polynomials.
BCH codes are one of the best-known cyclic codes. Reed-Solomon codes are the most important subclass of non-binary BCH codes [11]. However, we will focus on binary BCH codes from now on.

Binary BCH Codes
For any positive integers where m ≥ 3 and t < 2 m − 1 , there exists a binary BCH code with code word length, n = 2 m − 1; number of parity bits, (n − k) ≤ mt; and minimum distance d min ≥ 2t + 1. Therefore, this code can correct t errors [11].
For a t-error correction BCH code, its generator polynomial is given by g(X) = m 1 (X) × m 3 (X) × . . . × m 2t − 1 (X), where m i (X) is the minimal polynomial of α i , (i = 1, 3, . . . , 2t − 1) and α is an element of GF(2 m ) of order n. GF refers to Galois fields (or finite fields). If m 1 (X) is a primitive polynomial of degree m over GF (2), the code is called a primitive binary BCH code. Then, α is a primitive element and its order is n = 2 m − 1. The minimal polynomials m i (X) and/or the generator polynomials for binary primitive BCH codes can be found in the literature [11]. More information about algebra for coding theory can be found, for example, in the literature [1,11,15].
There are different algorithms to calculate the error-location polynomial (the key step of the decoding): Berlekamp-Massey (mainly used to implement software decoders), Euclidean (mostly employed in hardware implementations), etc. [11,16]. These methods are sequential algorithms with iterative steps. Therefore, they require several clock cycles to perform the decoding process.
A combinational decoder (sometimes referred to as a parallel decoder) is proposed in the literature [17]. Because of its importance in this work, it is briefly explained in Section 2.4. It is first necessary to understand how to obtain the generator and parity check matrices from the equivalent polynomials.

Matrices and Polynomials
As an example, let us consider the binary BCH code with m = 3 and t = 1. It is a (7, 4) BCH SEC code. It is equivalent to the cyclic Hamming code. Its generator polynomial is g(X) = X 3 + X + 1 [11]. As stated in Section 2.1, h(X) = (X 7 + 1)/g(X) = X 4 + X 2 + X + 1.
Combinational encoders and decoders may require the equivalent generator and parity check matrices. The binary representation of the generator polynomial is 1011. Arranging and shifting this word in the matrix, we obtain the following: As it can be observed, it is not in a systematic form. Obtaining the remainder by dividing X n − 1 , X n − 2 , . . . , X n − k by the generator polynomial, we get the matrix in a systematic form [16]: The same procedure can be followed to obtain the parity check matrix: This method is applied to BCH DEC codes in the literature [17], as presented in the following.

Combinational Circuits for BCH Codes
The work presented in the literature [17] proposes combinational encoder and decoder circuits for BCH DEC codes for SRAM protection. As stated before, BCH DEC codes have not found favorable application in SRAMs because of the non-alignment of their block sizes to typical memory word lengths, and particularly because of the large multi-cycle latency of traditional iterative decoding algorithms. The authors propose a solution based on the generator and parity check matrices. Shortened codes can be easily obtained from the matrices of the primitive codes. The encoder circuit can be simply designed from the generator matrix. Syndrome computation (the first step of the decoding process) is generated using the parity check matrix. The key element in this proposal is the error pattern decoder, equivalent to the lookup table shown in Figure 1. Combinational logic is employed to map the syndromes for correctable error patterns. This mapping is precomputed by multiplying all correctable error patterns with the parity check matrix, H.
Once the estimated error vector is determined, an erroneous bit is corrected by complementing it; hence, the error corrector circuit (final step of the decoding process) is simply a stack of XOR gates.
The authors of this work implemented BCH DEC codes for 16, 32, and 64 bits, and synthesized them using a 90-nm standard cell library. They reported, for example, decoding latencies ranging from 1.4 ns (for 16 bits) to 2.2 ns (for 64 bits).

Low Redundancy and Reduced Overhead (LRRO) Double Error Correction Codes
The Hamming weight for a row of a parity check matrix determines the number of terms to be XORed in order to calculate the parity and syndrome bits. Therefore, in hardware implementation, the heaviest row defines the logic depth of the XOR tree, which implements the encoder circuit and the syndrome computation in the decoder circuit. It influences the delay introduced by the ECC.
In the same way, the Hamming weight of the whole parity check matrix determines the number of logic gates required in the encoder circuit and the syndrome computation in the decoder circuit. This number has an important influence on both the silicon area occupied and the power consumed by those circuits.
Finally, the complexity of the ECC is mainly determined by its error coverage, and affects all overheads (delay, silicon area, and power consumption). Nevertheless, as the error coverage is determined by the design, nothing can be done about this parameter. Therefore, to reduce the overhead introduced by the encoder and decoder circuits, we must focus on the Hamming weights of the heaviest row and the whole matrix. Frequently, ECC designers can achieve these objectives by increasing the number of parity bits. Depending on the application, it could be an interesting alternative (e.g., [18]). However, this is not a good idea when protecting memories, as all redundant bits must be stored per individual word in the whole memory, leading to a much higher silicon area being occupied and more power being consumed by the memory circuitry.
Is it possible to find ECCs with the same redundancy as BCH codes, but reducing the overhead introduced? For thid, we employed a searching methodology based on the errors to be corrected and/or detected [19]. Although this methodology was initially employed to design flexible unequal error control (FUEC) codes, it has been successfully used to design different families of codes (e.g., [18,20,21]). First, let us briefly describe this methodology. Later, we describe the procedure needed in order to obtain matrices for the existing BCH DEC codes. Next, we present the new LRRO DEC codes. Finally, some important considerations have been included at the end of this section.

Searching Parity Check Matrices
Searching a parity check matrix that achieves the required error coverage may end up being very complex. We used a methodology based on searching matrices that can correct and/or detect a given set of error vectors. This methodology was first presented in the literature [19] to design flexible unequal error control (FUEC) codes. Although a detailed explanation of the methodology is out of the scope of this paper, it is briefly summarized in Figure 2, and is described in the following.
After determining the values of n and k for the code to be designed, the error patterns to be corrected must be selected. These error patterns can be represented using error vectors, according to the definition given in Section 2.1. Then, the parity check matrix, H, that satisfies (6) is searched, where E + represents the set of error vectors to be corrected. A detailed explanation of this algorithm, as well as a code design example, can be found in the literature [19].

BCH DEC Codes for 8, 16, 32, and 64 Bits
As stated above, in most cases, the block sizes of the primitive BCH codes do not align with the sizes employed in memories. Hence, if we want to use BCH DEC codes with data lengths of 8, 16, 32, or 64 bits, they must be shortened from primitive BCH DEC codes of longer data lengths. As described in Section 2.1, code shortening allows for the design of codes with any block size using longer block size codes. The primitive BCH DEC codes required for our purpose are (31, 21), (63, 51), and (127, 113). From the (31, 21) code, we can obtain (18,8) and (26, 16) BCH DEC codes. From the (63, 51) code, a (44, 32) code can be designed. The (127, 113) code allows for the construction of a (78, 64) BCH DEC code.
Following the steps described in Section 2.3 to get the matrices in a systematic form, and shortening the code as described in the literature [17], the parity check matrix for the (26, 16) BCH DEC code can be obtained as follows: In the same way, the matrices for all of the aforementioned codes have been obtained. These matrices and their resulting encoding and decoding circuits will be compared with our proposal in Section 4.

LRRO DEC Codes for 8, 16, 32, and 64 Bits
In this paper, we propose new low redundancy and reduced overhead (LRRO) codes for double error correction, equivalent (in terms of redundancy and error coverage) to the BCH DEC codes described in Section 3.2. That is, as primitive BCH codes offer the minimum possible redundancy, the  To find the matrix, the recursive backtracking algorithm shown in Figure 2b is used. It checks partial matrices and adds a new column only if the previous matrix satisfies the requirements. The added columns must be non-zero, so there are 2 n − k − 1 combinations for each column. In order to find matrices with a low Hamming weight, columns with a lower number of ones are checked first.
Once the H matrix is selected, it is easy to determine the logic equations in order to calculate each parity and syndrome bit, as well as the syndrome lookup table. They are required for encoder and decoder implementation.
In addition, we can improve the matrix generation so as to reduce the number of 1 s in those rows with a higher number of 1 s of the parity check matrix, as well as reducing the total number of 1 s in the whole matrix. As said before, these reductions will lead to faster, smaller, and less power-consuming circuits.
A detailed explanation of this algorithm, as well as a code design example, can be found in the literature [19].

BCH DEC Codes for 8, 16, 32, and 64 Bits
As stated above, in most cases, the block sizes of the primitive BCH codes do not align with the sizes employed in memories. Hence, if we want to use BCH DEC codes with data lengths of 8, 16, 32, or 64 bits, they must be shortened from primitive BCH DEC codes of longer data lengths. As described in Section 2.1, code shortening allows for the design of codes with any block size using longer block size codes. The primitive BCH DEC codes required for our purpose are (31, 21), (63, 51), and (127, 113). From the (31, 21) code, we can obtain (18,8) and (26, 16) BCH DEC codes. From the (63, 51) code, a (44, 32) code can be designed. The (127, 113) code allows for the construction of a (78, 64) BCH DEC code.
Electronics 2020, 9, 1897 7 of 15 Following the steps described in Section 2.3 to get the matrices in a systematic form, and shortening the code as described in the literature [17], the parity check matrix for the (26, 16) BCH DEC code can be obtained as follows: In the same way, the matrices for all of the aforementioned codes have been obtained. These matrices and their resulting encoding and decoding circuits will be compared with our proposal in Section 4.

LRRO DEC Codes for 8, 16, 32, and 64 Bits
In this paper, we propose new low redundancy and reduced overhead (LRRO) codes for double error correction, equivalent (in terms of redundancy and error coverage) to the BCH DEC codes described in Section 3.2. That is, as primitive BCH codes offer the minimum possible redundancy, the new codes have been designed to maintain the same redundancy and coverage as the BCH DEC codes mentioned previously, but attempt to reduce the overhead induced by such codes. In Section 3.4, some comments about the redundancy of non-primitive BCH codes, as well as a new code, have been included.
Using the methodology described in Section 3.1, and taking the values n = 18 and k = 8, we have found an (18,8) LRRO DEC code. This is its parity check matrix: In the same way, varying the values of n and k, matrices for the new (26, 16), (44, 32), and (78, 64) LRRO DEC codes have been found. They are shown in (10), (11), and (12), respectively:   100000000000001110000110010100011010100000000010001100000000000110111000011010  010000000000001101100000111010000010001000100001011001010001001000100100100100  001000000000001010110001100000100001100000100000000010010100011000011100111001  000100000000001001011010001100110100010100001001000110001000000001100001110000  000010000000000110101011001000100000010001010110010001100000010000000100111010  000001000000000101010101000010010001110010111000000000000000100010100101000111  000000100000000011001100100101001000001110010000001000000001001101010001010011  000000010000000000111100010001000100001001000100000100100000100101101010101101  000000001000000000000011110011001100000100001100100011000000000000011010001111  000000000100000000000000001111000011000010000010100000100011100010010111110011  000000000010000000000000000000111111000001000001100000001101011110000010101111  000000000001000000000000000000000000111111000000010000011011110000101010011000  000000000000100000000000000000000000000000111111110000000110110001011001000111 000000000000010000000000000000000000000000000000001111111110001111000111000100 It must be considered that these are just some examples of word sizes frequently employed in digital systems. Codes for different sizes (e.g., 128) can also be found using this methodology.
From the parity check matrices, it is easy to obtain the logic equations for the encoder circuits and the syndrome computation in the decoder circuits. An example can be found in the literature [19]. Later, it is necessary to relate each syndrome with the corresponding error vector, implementing a lookup table for each code. In fact, these are the truth tables of a group of logic functions where the inputs are the syndrome bits, and the outputs are the bits of the estimated error vectors.
Each logic function is a sum of minterms representing the syndromes that affect each bit. A good explanation of how to generate the lookup table and the logic equations for the bits of the estimated error vector can be found in the literature [18].

Redundancy and Non-Primitive BCH Codes
As stated before, although shortened (or non-primitive) BCH codes maintain the error coverage of the primitive codes, they may lose other properties. For example, do they maintain the minimum redundancy property for a given data block size? The answer is no for a general case, although it depends on the number of columns shortened. Using our methodology, we did not find LRRO DEC codes with a lower redundancy than the equivalent BCH DEC codes for 16, 32, and 64 bits. This does not mean that such codes do not exist, as a complete search is unfeasible nowadays; but if they exist, they are not easy to find. Using our methodology, if a code exists, it is frequently found after a short searching time.
However, when the (31, 21) BCH DEC code is shortened to get the (18,8) code, 13 columns are discarded-more than a half of the data columns. In this case, we found a (16,8) LRRO DEC code, whose parity check matrix is as follows: This code has a better redundancy than the equivalent BCH code. This is important when protecting memories, as the redundant bits are included in all of the words in the whole memory.
In any case, this code is not included in the comparisons performed in the next section. A deeper study of codes with lower redundancy than the equivalent BCH codes, as well as the implementation and synthesis of the code shown in (13), are out of the scope of this paper, and will form part of a future work.

Theoretical Study
The real values for the overhead induced by the encoder and decoder circuits of an ECC depend on each implementation: the technology, the synthesis process, etc. From a practical point of view, a logic synthesis is required to obtain these values. However, it is possible to do a more generic, theoretical study. As stated above, the Hamming weight of the heaviest row and the Hamming weight of the whole parity check matrix are important parameters that influence the overhead. Figure 3 shows the Hamming weights of the heaviest row (i.e., the maximum number of 1 s in a row) of all matrices for the codes considered. As can be observed, all LRRO codes reduce this number with respect to the equivalent BCH codes, and the difference increases as k increases.
Electronics 2020, 9, x FOR PEER REVIEW 9 of 14 the same redundancy and error coverage. From a theoretical point of view, these improvements should result in encoder and decoder circuits with a lower overhead. The question now is to confirm these good results when the circuits are implemented. This analysis is performed in the following.   Figure 4 shows the Hamming weights of the whole matrix (i.e., the total number of 1 s of each parity check matrix) for the codes considered. Again, all LRRO codes reduce this number with respect to the equivalent BCH codes, and the difference increases with k.

Implementation and Logic Synthesis
All LRRO DEC codes presented in this paper, as well as the considered BCH DEC codes, have been implemented and simulated to check if they achieve the expected error coverage. We implemented a simulation-based error injection tool [21] with which we could analyze the error coverage of the different ECCs. This tool can inject diverse error models. Particularly, single and multiple random errors were injected. These errors emulate bit-flips in memory cells.
The basic scheme of our error injector is shown in Figure 5. This tool can verify whether the injected error provokes an incorrect or a correct decoding. To do this, the simulation-based error injector tool compares the input and output words.
As expected, all of the studied codes corrected 100% of single and double errors.  The results shown here indicate that the proposed LRRO codes improve, in all cases, the attributes of their parity check matrices, when compared with the equivalent BCH codes, maintaining the same redundancy and error coverage. From a theoretical point of view, these improvements should result in encoder and decoder circuits with a lower overhead. The question now is to confirm these good results when the circuits are implemented. This analysis is performed in the following.

Implementation and Logic Synthesis
All LRRO DEC codes presented in this paper, as well as the considered BCH DEC codes, have been implemented and simulated to check if they achieve the expected error coverage. We implemented a simulation-based error injection tool [21] with which we could analyze the error coverage of the different ECCs. This tool can inject diverse error models. Particularly, single and multiple random errors were injected. These errors emulate bit-flips in memory cells.
The basic scheme of our error injector is shown in Figure 5. This tool can verify whether the injected error provokes an incorrect or a correct decoding. To do this, the simulation-based error injector tool compares the input and output words. these good results when the circuits are implemented. This analysis is performed in the following.

Implementation and Logic Synthesis
All LRRO DEC codes presented in this paper, as well as the considered BCH DEC codes, have been implemented and simulated to check if they achieve the expected error coverage. We implemented a simulation-based error injection tool [21] with which we could analyze the error coverage of the different ECCs. This tool can inject diverse error models. Particularly, single and multiple random errors were injected. These errors emulate bit-flips in memory cells.
The basic scheme of our error injector is shown in Figure 5. This tool can verify whether the injected error provokes an incorrect or a correct decoding. To do this, the simulation-based error injector tool compares the input and output words.
As expected, all of the studied codes corrected 100% of single and double errors.  As expected, all of the studied codes corrected 100% of single and double errors.
On the other hand, the encoder and decoder circuits for all ECCs were implemented in Very High speed integrated circuit Hardware Description Language (VHDL). For example, Figure 6 shows an excerpt of the VHDL implementation of the (26, 16) LRRO DEC code. Then, using CADENCE software [22], we carried out a logic synthesis for 45-nm technology using the NanGate FreePDK45 Open Cell Library [23,24]. Standard cells were based on Scalable Complementary Metal Oxide Semiconductor (SCMOS) design rules. The power voltage and temperature conditions were 1.1 V and 25 • C, respectively. Electronics 2020, 9, x FOR PEER REVIEW 10 of 14 On the other hand, the encoder and decoder circuits for all ECCs were implemented in Very High speed integrated circuit Hardware Description Language (VHDL). For example, Figure 6 shows an excerpt of the VHDL implementation of the (26, 16) LRRO DEC code. Then, using CADENCE software [22], we carried out a logic synthesis for 45-nm technology using the NanGate FreePDK45 Open Cell Library [23,24]. Standard cells were based on Scalable Complementary Metal Oxide Semiconductor (SCMOS) design rules. The power voltage and temperature conditions were 1.1 V and 25 °C, respectively. Although less generic than the theoretical study presented previously, logic synthesis allows for obtaining more realistic information about the overhead induced by different ECCs for a given technology. The values considered for comparison are the propagation delay of the circuits, the silicon area occupied, and the power consumption. Figure 7 shows the delay induced by the encoder circuits. As can be observed, the proposed LRRO codes greatly reduce this delay for all block sizes. Figure 8 depicts the propagation delay of the decoder circuits. In this case, the reduction achieved by the LRRO codes is less important than for the encoders. Anyway, all of the LRRO decoders reduce the delay with respect to the equivalent BCH circuits, for all block sizes.
When protecting memories using an ECC, encoding and decoding delays may influence the clock cycle and the working frequency of the processor. Therefore, the reduction achieved by LRRO codes makes them a better option than BCH codes.
The silicon areas occupied by the encoder and decoder circuits (in Figures 9 and 10, respectively) show the same trends observed with the propagation delay. All LRRO circuits use less area than the equivalent BCH circuits, and the reduction is proportionally more important in the encoder circuits. Because of the high integration density of the nanometric manufacturing processes, the silicon area is not a problem nowadays, in most cases. However, LRRO codes are superior to the equivalent BCH codes in this comparison. Although less generic than the theoretical study presented previously, logic synthesis allows for obtaining more realistic information about the overhead induced by different ECCs for a given technology. The values considered for comparison are the propagation delay of the circuits, the silicon area occupied, and the power consumption. Figure 7 shows the delay induced by the encoder circuits. As can be observed, the proposed LRRO codes greatly reduce this delay for all block sizes.       When protecting memories using an ECC, encoding and decoding delays may influence the clock cycle and the working frequency of the processor. Therefore, the reduction achieved by LRRO codes makes them a better option than BCH codes.
The silicon areas occupied by the encoder and decoder circuits (in Figures 9 and 10, respectively) show the same trends observed with the propagation delay. All LRRO circuits use less area than the equivalent BCH circuits, and the reduction is proportionally more important in the encoder circuits. Because of the high integration density of the nanometric manufacturing processes, the silicon area is not a problem nowadays, in most cases. However, LRRO codes are superior to the equivalent BCH codes in this comparison.        Finally, the power consumed by the encoder circuits is shown in Figure 11, while Figure 12 presents the power consumed by the decoder circuits. Again, the same trends can be observed. All LRRO circuits consume less power than the equivalent BCH circuits. The reduction is proportionally higher in the encoder circuits, but conversely to previous comparisons, the reductions observed in the decoder circuits are also high. The static power, more related to the silicon area, represents a small fraction of the total power consumed. In this case, the dynamic power consumption has been greatly reduced, leading the total power consumed to the results shown.
Nowadays, a low power consumption is a must in most digital systems. Therefore, the reductions achieved by the LRRO codes make them a clearly better option compared with BCH codes.
To sum up, the low redundancy and reduced overhead DEC codes presented in this paper are a better option than the equivalent BCH DEC codes, as they maintain the same redundancy and error coverage, while inducing a reduced overhead. Comparing the codes for the 8-, 16-, 32-, and 64-bit data lengths against BCH, LRRO circuits achieve the following:

•
Lower delay: at least 30% reduction in the encoder, up to 8.7% in the decoder circuits. • Less silicon areas: reductions up to 29% in the encoder and 3.0% in the decoder circuits.

•
Lower power consumption: about 40% in the encoder and up to 26.9% in the decoder.

Conclusions
The Bose-Chaudhuri-Hocquenghem (BCH) codes are one of the best-known error control codes. They are extensively employed in several fields, but they have not found favorable application in memories. The non-alignment of their block sizes to typical memory word lengths and the large multi-cycle latency of traditional iterative decoding algorithms are the main reasons for this. This paper presents low redundancy and reduced overhead (LRRO) double error correction (DEC) codes as an alternative to the BCH DEC codes for protecting registers and memories, i.e., for hardware implementations and block sizes that are in powers of 2 (8, 16, 32, 64, etc. Figure 11. Power consumed by the encoder circuits (in µW).
Nowadays, a low power consumption is a must in most digital systems. Therefore, the reductions achieved by the LRRO codes make them a clearly better option compared with BCH codes.
To sum up, the low redundancy and reduced overhead DEC codes presented in this paper are a better option than the equivalent BCH DEC codes, as they maintain the same redundancy and error coverage, while inducing a reduced overhead. Comparing the codes for the 8-, 16-, 32-, and 64-bit data lengths against BCH, LRRO circuits achieve the following:

•
Lower delay: at least 30% reduction in the encoder, up to 8.7% in the decoder circuits. • Less silicon areas: reductions up to 29% in the encoder and 3.0% in the decoder circuits.

•
Lower power consumption: about 40% in the encoder and up to 26.9% in the decoder.

Conclusions
The Bose-Chaudhuri-Hocquenghem (BCH) codes are one of the best-known error control codes. They are extensively employed in several fields, but they have not found favorable application in memories. The non-alignment of their block sizes to typical memory word lengths and the large multi-cycle latency of traditional iterative decoding algorithms are the main reasons for this. This paper presents low redundancy and reduced overhead (LRRO) double error correction (DEC) codes as an alternative to the BCH DEC codes for protecting registers and memories, i.e., for hardware implementations and block sizes that are in powers of 2 (8, 16, 32, 64, etc.). The objective is Nowadays, a low power consumption is a must in most digital systems. Therefore, the reductions achieved by the LRRO codes make them a clearly better option compared with BCH codes.
To sum up, the low redundancy and reduced overhead DEC codes presented in this paper are a better option than the equivalent BCH DEC codes, as they maintain the same redundancy and error coverage, while inducing a reduced overhead. Comparing the codes for the 8-, 16-, 32-, and 64-bit data lengths against BCH, LRRO circuits achieve the following:

•
Lower delay: at least 30% reduction in the encoder, up to 8.7% in the decoder circuits. • Less silicon areas: reductions up to 29% in the encoder and 3.0% in the decoder circuits.

•
Lower power consumption: about 40% in the encoder and up to 26.9% in the decoder.

Conclusions
The Bose-Chaudhuri-Hocquenghem (BCH) codes are one of the best-known error control codes. They are extensively employed in several fields, but they have not found favorable application in memories. The non-alignment of their block sizes to typical memory word lengths and the large multi-cycle latency of traditional iterative decoding algorithms are the main reasons for this. This paper presents low redundancy and reduced overhead (LRRO) double error correction (DEC) codes as an alternative to the BCH DEC codes for protecting registers and memories, i.e., for hardware implementations and block sizes that are in powers of 2 (8, 16, 32, 64, etc.). The objective is to maintain the redundancy and the error coverage of BCH DEC codes, while reducing the overhead induced by the encoder and decoder circuits.
We have demonstrated that all overhead comparisons are favorable to LRRO DEC codes. The theoretical study shows that all LRRO codes have a lower Hamming weight in the heaviest row of their parity check matrices, as well as a lower Hamming weight for the whole matrix, compared with the equivalent BCH DEC codes. Moreover, the logic synthesis illustrates that for all word lengths, the LRRO DEC circuits have a lower propagation delay, employ less silicon areas, and consume less power than the corresponding BCH DEC codes.
Therefore, we propose the LRRO DEC codes as an interesting alternative to BCH DEC codes for protecting registers and memories against single and double errors.
Finding ECCs with a lower redundancy, thereby maintaining affordable overheads, is part of our ongoing and future work.