On the Performance and Security of Multiplication in GF(2^{N})
Abstract
1. Introduction
2. The Field $GF(2^{N})$ in Cryptography: Arithmetic and Suitability
2.1. Application to Block Ciphers
2.2. Application to Classical Public-Key Cryptography
2.3. Application to Post-Quantum Public-Key Cryptography
2.4. Arithmetic in Extensions of $GF(2)$
2.5. Tower Fields Representation
 $P(x)={x}^{\ell}+\dots +{p}_{1}x+1$ is an irreducible polynomial over $GF(2)$ of degree ℓ,
 $Q(x)={x}^{m}+\dots +{q}_{1}x+1$ is an irreducible polynomial over $GF({2}^{\ell})$ of degree m,
 $D(x)={x}^{N}+\dots +{d}_{1}x+1$ is an irreducible polynomial over $GF(2)$ of degree N,
2.6. Composite Fields and Fields Mapping
 ${\alpha}^{r}=\gamma$ is known for some integer $r$, and
 $D({\alpha}^{r})\equiv 0 \pmod{P,Q}$.
 (1)
 ${\beta}^{r}\in GF({2}^{\ell})$,
 (2)
 if β is a primitive element, then γ is primitive in $GF({2}^{\ell})$.
 $P({\alpha}^{r})=0$ where $r=({2}^{m\ell}-1)/({2}^{\ell}-1)$, hence ${\alpha}^{r}=\gamma$,
 $Q(x)={S}_{\alpha}(x)$, hence ${q}_{m-k}={(-1)}^{k}{\sum}_{1\le {i}_{1}\le {i}_{2}\le \dots \le {i}_{k}\le m}{\alpha}_{{i}_{1}}{\alpha}_{{i}_{2}}^{{2}^{\ell}}{\alpha}_{{i}_{3}}^{{2}^{2\ell}}\dots {\alpha}_{{i}_{m}}^{{({2}^{\ell})}^{m-1}}.$
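As a worked instance of the exponent $r=(2^{m\ell}-1)/(2^{\ell}-1)$ and of the two properties of $\beta^{r}$ listed in this section, the following sketch assumes the case-study parameters $\ell=6$, $m=2$ (so $N=12$); these values are chosen here for illustration only.

```latex
\begin{align*}
r &= \frac{2^{m\ell}-1}{2^{\ell}-1} = \frac{2^{12}-1}{2^{6}-1} = \frac{4095}{63} = 65 = 2^{6}+1, \\
(\beta^{r})^{2^{\ell}-1} &= \beta^{2^{m\ell}-1} = \beta^{4095} = 1
  \quad \text{for every nonzero } \beta \in GF(2^{12}), \\
\operatorname{ord}(\beta^{r}) &= \frac{4095}{\gcd(4095,\,65)} = \frac{4095}{65} = 63
  \quad \text{if } \beta \text{ is primitive.}
\end{align*}
```

The second line shows that $\beta^{65}$ is a root of $x^{63}-1$ and therefore lies in $GF(2^{6})$; the third shows that, when $\beta$ is primitive, $\beta^{65}$ has full order $2^{6}-1$ and is thus primitive in $GF(2^{6})$.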
3. Results and Discussion
3.1. Multiplication in $GF({2}^{N})$
Algorithm 1 Initialization of the antilog table 
Require: The finite field $GF(2^{n})$ and its generator polynomial $P$.
Ensure: The antilog table.

Algorithm 2 Initialization of the log table 
Require: The antilog table.
Ensure: The log table.

Algorithm 3 Multiplication in the tower field $GF((2^{6})^{2})$
Require: Two polynomials $x=\{x_{i}\}$, $y=\{y_{i}\}$ and an extension polynomial $p$ of order 2.
Ensure: Polynomial $r=x\cdot y=\{r_{i}\}$.

Algorithm 4 Iterative multiplication with conditional reduction 
Require: Two polynomials $X=\{x_{i}\}$, $Y=\{y_{i}\}$ of orders at most $n$ and a reduction polynomial $P$ of order $n$.
Ensure: Polynomial $R=X\cdot Y=\{r_{i}\}$ of order $n$.

3.2. Secure Computation in $GF({2}^{N})$
Algorithm 5 Iterative multiplication with unconditional reduction 
Require: Two polynomials $X=\{x_{i}\}$, $Y=\{y_{i}\}$ of orders at most $n$ and a reduction polynomial $P$ of order $n$.
Ensure: Polynomial $R=X\cdot Y=\{r_{i}\}$ of order $n$.

Algorithm 6 Bitsliced multiplication 
Require: $2\times 64$ $n$-bit words $X_{i}$ and $Y_{i}$, where $i\in [1,64]$.
Ensure: 64 $n$-bit words $R_{i}=X_{i}\cdot Y_{i}$.

4. Case Study: Optimization of DAGS
4.1. Initial Choice of Parameters
4.2. Improved Field Selection
 Key Generation,
 Encapsulation,
 Decapsulation.
 Tabulated log/antilog (Algorithms A1–A3),
 Iterative, conditional reduction (Algorithm A5),
 Iterative, ASM with PCLMUL, conditional reduction (Algorithm A5),
 Iterative, unconditional reduction (Algorithm A6),
 Iterative, ASM with PCLMUL, unconditional reduction (Algorithm A6),
 Iterative, unconditional reduction, 1-bitsliced, 64 computations in parallel (Algorithm A7),
 Iterative, ASM with PCLMUL, unconditional reduction, bitsliced, 2 computations in parallel (Algorithm A8).
4.3. Implementation Performances
 (*): Conversion from $GF((2^{6})^{2})$ to $GF(2^{12})$ using $T$ in Example (1) takes 112 cycles; with the POPCNT assembly instruction, it takes 38 cycles (Algorithm A9).
 (**): Time to initialize the tables (Algorithms A1 and A2); the tables can be precomputed, in which case the cost is 0 cycles:
 2360 cycles on $GF(2^{6})$,
 267,086 cycles on $GF((2^{6})^{2})$, and
 7884 cycles on $GF(2^{12})$.
 (***): Transposition time (Algorithm A7.1):
 780 cycles on $GF(2^{6})$, and
 1613 cycles on $GF(2^{12})$.
 (****): Transposition $(X,X')\to X'\,2^{2N}+X$ takes 2 cycles on $GF(2^{N})$.
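The conversion mentioned in footnote (*) is a linear map over $GF(2)$: each output bit is the parity of the input ANDed with one matrix row, which is why a population count helps. The following is a minimal sketch of that mechanism, assuming a hypothetical 3x3 toy matrix `M_toy` and its inverse `Mi_toy` (neither is the paper's $T$ or its inverse) and hypothetical helper names `parity16`, `gf2_linear_map` and `toy_roundtrip_ok`.

```c
#include <assert.h>
#include <stdint.h>

/* Parity (Hamming weight mod 2) of a 16-bit word, by XOR folding. */
static unsigned parity16(uint16_t v)
{
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* y = M * x over GF(2): bit i of y is the parity of (x AND row i of M). */
static uint16_t gf2_linear_map(const uint16_t *rows, int n, uint16_t x)
{
    uint16_t y = 0;
    for (int i = 0; i < n; i++) {
        y |= (uint16_t)(parity16((uint16_t)(x & rows[i])) << i);
    }
    return y;
}

/* Hypothetical toy matrix (illustration only, NOT the paper's T):
 * y0 = x0 ^ x1, y1 = x1 ^ x2, y2 = x2. */
static const uint16_t M_toy[3]  = {0x3, 0x6, 0x4};
/* Its inverse: x0 = y0 ^ y1 ^ y2, x1 = y1 ^ y2, x2 = y2. */
static const uint16_t Mi_toy[3] = {0x7, 0x6, 0x4};

/* Returns 1 if mapping with M_toy and then Mi_toy restores every 3-bit input. */
static int toy_roundtrip_ok(void)
{
    for (uint16_t x = 0; x < 8; x++) {
        uint16_t y = gf2_linear_map(M_toy, 3, x);
        if (gf2_linear_map(Mi_toy, 3, y) != x) {
            return 0;
        }
    }
    return 1;
}
```

Calling `toy_roundtrip_ok()` from a `main` function exercises both directions. With 12 rows instead of 3, the same loop shape would apply to a 12-bit change of basis, and the parity step is the part that a hardware population count can replace.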
 The tabulated log/antilog version is the fastest among the non-parallel algorithms.
 It is faster to carry out tower field computations directly in an isomorphic field $GF(2^{N})$ (here, $GF(2^{12})$ instead of $GF((2^{6})^{2})$).
 The modular multiplication with the dedicated carry-less multiplication (PCLMUL) assembly (ASM) instruction does not improve the speed, since the function-call overhead dominates the computation for such small values of $N$. However, when only a single serial operation is needed, PCLMUL should be used because it has the lowest latency.
 Constant-time, cache-secure implementations take more time than those that are not secure. Moreover, we noticed that the “conditional reduction” in C code is actually constant-time once compiled to assembly code (when the optimization flag is set), owing to the compiler's use of the CMOV (conditional move) assembly instruction, which executes in a single clock cycle.
 The bitsliced single multiplication takes only $55/64=0.86$ cycles over $GF({2}^{6})$ and $335/64=5.23$ cycles over $GF({2}^{12})$, and is invulnerable to cache-timing attacks. Thus, it is our champion implementation, to be chosen for fast and secure arithmetic over $GF({2}^{N})$.
 For the second version of bitsliced implementation, we pack two words $X,{X}^{\prime}\in GF({2}^{N})$ as ${X}^{\prime}{2}^{2N}+X$. Then, the products $XY$ and ${X}^{\prime}{Y}^{\prime}$ can be computed in one go by noticing that PCLMULQDQ(${X}^{\prime}{2}^{2N}+X,{Y}^{\prime}{2}^{2N}+Y$) = PCLMULQDQ(${X}^{\prime}$, ${Y}^{\prime}$) ${2}^{4N}$ + (PCLMULQDQ(X, ${Y}^{\prime}$)⊕ PCLMULQDQ(${X}^{\prime}$, Y)) ${2}^{2N}$ + PCLMULQDQ(X, Y); hence, the results are obtained at bit indices $[6N,4N]$ and $[2N,0]$.
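The packing identity in the last item can be checked without the hardware instruction. The sketch below assumes a portable bit-by-bit carry-less multiplier, here called `clmul32`, standing in for PCLMULQDQ, and fixes $N=6$; the names `clmul32`, `pack2` and `packed_product_matches` are illustrative and do not come from the paper's code.

```c
#include <assert.h>
#include <stdint.h>

/* Portable carry-less (polynomial) multiplication over GF(2).
 * Assumption: used here as a software stand-in for PCLMULQDQ. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 32; i++) {
        /* all-ones mask if bit i of b is set, zero otherwise */
        uint64_t mask = (uint64_t)0 - ((b >> i) & 1u);
        r ^= ((uint64_t)a << i) & mask;
    }
    return r;
}

enum { N = 6 }; /* field GF(2^6), as in the case study */

/* Pack two N-bit operands as hi * 2^(2N) + lo. */
static uint32_t pack2(uint32_t lo, uint32_t hi)
{
    return (hi << (2 * N)) | lo;
}

/* Returns 1 if the packed product splits into three 2N-bit windows:
 * X*Y at bit 0, (X*Y' xor X'*Y) at bit 2N, and X'*Y' at bit 4N.
 * All products are unreduced polynomial products. */
static int packed_product_matches(void)
{
    const uint64_t lane = ((uint64_t)1 << (2 * N)) - 1;
    for (uint32_t x = 0; x < 64; x++) {
        for (uint32_t y = 0; y < 64; y++) {
            /* sample the second pair with a stride to keep the check fast */
            for (uint32_t xp = 0; xp < 64; xp += 7) {
                for (uint32_t yp = 0; yp < 64; yp += 7) {
                    uint64_t p = clmul32(pack2(x, xp), pack2(y, yp));
                    uint64_t cross = clmul32(x, yp) ^ clmul32(xp, y);
                    if ((p & lane) != clmul32(x, y)) return 0;
                    if (((p >> (2 * N)) & lane) != cross) return 0;
                    if ((p >> (4 * N)) != clmul32(xp, yp)) return 0;
                }
            }
        }
    }
    return 1;
}
```

The windows do not overlap because a product of two polynomials of degree at most $N-1$ has degree at most $2N-2$, which fits in $2N$ bits. The two useful products still need modular reduction afterwards, and the middle window is simply discarded.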
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Appendix A. C Code for Various Algorithms
 #include <stdlib.h>
 #include <stdint.h>
 typedef uint16_t gf_t; /* Galois field elements */
 #define gf_extd 6
 #define gf_card (1 << gf_extd)
 #define gf_ord ((gf_card) - 1)
 #define poly_primitive_subfield 67 // 0x43 (0b01000011): the bits encode
 // the polynomial x^6 + x + 1
 /* Algorithm A1: Precomputation of the antilog table for F(2)[x]/x^6+x+1 */
 static gf_t *gf_antilog;
 void gf_init_antilog()
 {
 int i = 1;
 int temp = 1 << (gf_extd - 1); // mask of the top coefficient (x^5)
 gf_antilog = (gf_t *)malloc((gf_card * sizeof(gf_t)));
 gf_antilog[0] = 1; // alpha^0 = 1
 for (i = 1; i < gf_ord; ++i)
 {
 gf_antilog[i] = gf_antilog[i - 1] << 1; // multiply by alpha = x
 if ((gf_antilog[i - 1]) & temp)
 {
 // XOR with 67: X^6 + x + 1
 gf_antilog[i] ^= poly_primitive_subfield;
 }
 }
 gf_antilog[gf_ord] = 1;
 }
 /* Algorithm A2: Precomputation of the log table for F(2)[x]/x^6+x+1 */
 static gf_t* gf_log;
 void gf_init_log()
 {
 int i = 1;
 gf_log = (gf_t *)malloc((gf_card * sizeof(gf_t)));
 gf_log[0] = 1; // Dummy value (not used)
 gf_log[1] = 0;
 for (i = 1; i < gf_ord; ++i)
 {
 gf_log[gf_antilog[i]] = i;
 }
 }
 /* Algorithm A3: Tabulated multiplication over GF(2^6) */
 /* Uses the precomputed tables of Algorithms A1 and A2, which are built */
 /* only once, at DAGS initialization. */
 /* This algorithm is not constant-time, so it is not protected. */
 /* Zero has no logarithm, so a zero operand short-circuits to 0. */
 #define gf_mult_tabulated(x, y) (((x) && (y)) ? \
 gf_antilog[(gf_log[x] + gf_log[y]) % gf_ord] : 0)
 // not constant-time
 /* Algorithm A4: Tabulated multiplication over GF((2^6)^2) */
 /* Uses Algorithm A3 (tabulated multiplication), so it is not protected */
 gf_t gf_mult_extension_tabulated(gf_t x, gf_t y)
 {
 gf_t a1, b1, a2, b2, a3, b3;
 a1 = x >> 6;
 b1 = x & 63;
 a2 = y >> 6;
 b2 = y & 63;
 // not constant-time
 a3 = gf_mult_tabulated(gf_mult_tabulated(a1, a2), 36) ^
 gf_mult_tabulated(a1, b2)^ gf_mult_tabulated(b1, a2);
 //36 is p_1 in the extension polynomial
 b3 = gf_mult_tabulated(gf_mult_tabulated(a1, a2), 2)
 ^ gf_mult_tabulated(b1, b2); //2 is p_0 in the extension polynomial
 return (a3 << 6) ^ b3;
 }
 /* Algorithm A5: Iterative multiplication over GF(2^6) with conditional reduction */
 /* The multiplication does not use the precomputed tables; the ASM PCLMUL */
 /* instruction can be used. It is not constant-time. */
 gf_t gf_mult_iterative_conditional(gf_t x, gf_t y)
 {
 #ifndef PCLMUL
 gf_t res,m;
 res = 0; // this variable will contain the result
 for(int i=0;i<6;++i) // For each coefficient of the polynomial
 {
 if ((y & 1) == 1) // Check the coefficient; this branch is not constant-time
 {
 res = res ^ x; // addition
 }
 y=y>>1;
 // this shift exposes the next coefficient of y for the next iteration
 x = x << 1;
 if((x & 64) != 0)
 // x must be reduced modulo x^6 + x + 1 (64 = 0x40 = 0b01000000)
 {
 x ^= 67; // 0x43: X^6 + x + 1
 }
 }
 return res;
 #else
 //using ASM PCLMUL instruction
 uint32_t a, m;
 // Multiplication
 asm volatile ("movdqa %1, %%xmm0;\n\t"
 "movdqa %2, %%xmm1;\n\t"
 "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
 "movdqa %%xmm1, %0;\n\t"
 : "=x"(a)
 : "x"((uint32_t)y), "x"((uint32_t)x)
 : "%xmm0","%xmm1"
 );
 // reduction polynomial (conditional reduction)
 for (int k=0; k<6; k++) { // For each coefficient of the polynomial
 if ((a >> (11-k)) & 1) // not constant-time
 {
 a ^= (67 << (5-k)); // 0x43: x^6 + x + 1, aligned on bit 11-k
 }
 }
 return a&0xFFFF;
 #endif
 }
 /* Algorithm A6: Iterative multiplication
 over GF(2^6) with unconditional reduction */
 /* The multiplication does not use the precomputed tables; the ASM PCLMUL */
 /* instruction can be used. It is constant-time. */
 gf_t gf_mult_iterative_unconditional(gf_t x, gf_t y)
 {
 #ifndef PCLMUL
 gf_t res,m;
 res = 0;
 for(int i=0;i<6;++i)
 // For each coefficient of the polynomial, constant-time
 {
 m = -(y&1); // m is either 0xffff or 0x0000
 res = res ^ (x&m); // addition
 y=y>>1;
 x = x << 1;
 // x must be reduced modulo x^6 + x + 1
 m = -(((x)>>6)&1);
 x ^= m & 67; // 0x43: x^6 + x + 1
 }
 return res;
 #else
 //using ASM PCLMUL instruction
 uint32_t a, m;
 // multiplication
 asm volatile ("movdqa %1, %%xmm0;\n\t"
 "movdqa %2, %%xmm1;\n\t"
 "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
 "movdqa %%xmm1, %0;\n\t"
 : "=x"(a)
 : "x"((uint32_t)y), "x"((uint32_t)x)
 :"%xmm0","%xmm1"
 );
 // reduction polynomial
 for (int k=0; k<6; k++) {
 m = -((a >> (11-k))&1); // all-ones mask if bit 11-k is set
 a ^= ((67 << (5-k))&m); // 0x43: x^6 + x + 1
 }
 return a&0xFFFF;
 #endif
 }
 /* Algorithm A7: 1-bitsliced multiplication
 over GF(2^6) (64 computations in parallel) */
 /* A7.1: Transpositions */
 /* res[0..5] must be zeroed by the caller before calling to_bitslice */
 void to_bitslice(gf_t *x, uint64_t *res) {
 int i = 0;
 for (i = 0; i<64; i++) {
 res[0] |= (((((uint64_t)x[i])) & 1) << i);
 res[1] |= (((((uint64_t)x[i]) >> 1) & 1) << i);
 res[2] |= (((((uint64_t)x[i]) >> 2) & 1) << i);
 res[3] |= (((((uint64_t)x[i]) >> 3) & 1) << i);
 res[4] |= (((((uint64_t)x[i]) >> 4) & 1) << i);
 res[5] |= (((((uint64_t)x[i]) >> 5) & 1) << i);
 }
 }
 void from_bitslice(uint64_t *res, gf_t *x) {
 int i = 0;
 for (i = 0;i<64; i++) {
 x[i] = (((res[0] >> i) & 1));
 x[i] |= (((res[1] >> i) & 1) << 1);
 x[i] |= (((res[2] >> i) & 1) << 2);
 x[i] |= (((res[3] >> i) & 1) << 3);
 x[i] |= (((res[4] >> i) & 1) << 4);
 x[i] |= (((res[5] >> i) & 1) << 5);
 }
 }
 /* A7.2: 1-bitsliced multiplication (SIMD code) */
 void gf_multsubTab(gf_t *x, gf_t *y, gf_t *z)
 {
 uint64_t xbin[6];
 uint64_t ybin[6];
 uint64_t res[6];
 xbin[0]=xbin[1]=xbin[2]=xbin[3]=xbin[4]=xbin[5] = 0;
 ybin[0]=ybin[1]=ybin[2]=ybin[3]=ybin[4]=ybin[5] = 0;
 // Transpose x and y
 to_bitslice(x, xbin);
 to_bitslice(y, ybin);
 // Multiplication and polynomial reduction,
 // with 64 computations in parallel for each coefficient of the polynomial
 // constant-time
 uint64_t const xbin05 = xbin[0] ^ xbin[5];
 uint64_t const xbin54 = xbin[5] ^ xbin[4];
 uint64_t const xbin43 = xbin[4] ^ xbin[3];
 uint64_t const xbin32 = xbin[3] ^ xbin[2];
 uint64_t const xbin21 = xbin[2] ^ xbin[1];
 res[0] = (xbin[0] & ybin[0]);
 res[1] = (xbin[1] & ybin[0]);
 res[2] = (xbin[2] & ybin[0]);
 res[3] = (xbin[3] & ybin[0]);
 res[4] = (xbin[4] & ybin[0]);
 res[5] = (xbin[5] & ybin[0]);
 res[0] ^= (xbin[5] & ybin[1]);
 res[1] ^= (xbin05 & ybin[1]);
 res[2] ^= (xbin[1] & ybin[1]);
 res[3] ^= (xbin[2] & ybin[1]);
 res[4] ^= (xbin[3] & ybin[1]);
 res[5] ^= (xbin[4] & ybin[1]);
 res[0] ^= (xbin[4] & ybin[2]);
 res[1] ^= (xbin54 & ybin[2]);
 res[2] ^= (xbin05 & ybin[2]);
 res[3] ^= (xbin[1] & ybin[2]);
 res[4] ^= (xbin[2] & ybin[2]);
 res[5] ^= (xbin[3] & ybin[2]);
 res[0] ^= (xbin[3] & ybin[3]);
 res[1] ^= (xbin43 & ybin[3]);
 res[2] ^= (xbin54 & ybin[3]);
 res[3] ^= (xbin05 & ybin[3]);
 res[4] ^= (xbin[1] & ybin[3]);
 res[5] ^= (xbin[2] & ybin[3]);
 res[0] ^= (xbin[2] & ybin[4]);
 res[1] ^= (xbin32 & ybin[4]);
 res[2] ^= (xbin43 & ybin[4]);
 res[3] ^= (xbin54 & ybin[4]);
 res[4] ^= (xbin05 & ybin[4]);
 res[5] ^= (xbin[1] & ybin[4]);
 res[0] ^= (xbin[1] & ybin[5]);
 res[1] ^= (xbin21 & ybin[5]);
 res[2] ^= (xbin32 & ybin[5]);
 res[3] ^= (xbin43 & ybin[5]);
 res[4] ^= (xbin54 & ybin[5]);
 res[5] ^= (xbin05 & ybin[5]);
 // Transpose
 from_bitslice(res, z);
 }
 /* Algorithm A8: Iterative, ASM with PCLMUL, unconditional reduction, bitsliced
 (2 computations in parallel) */
 /* With PCLMUL, at most 2 computations are possible. It is constant-time. */
 void gf_mult_bitslice_2computations(gf_t *x, gf_t *y, gf_t *tab) {
 // Transposition for computation in parallel
 uint64_t x2 = ((uint64_t)x[1] << 12) | x[0], y2 = ((uint64_t)y[1] << 12) | y[0];
 uint64_t a, m, m1, s, m0;
 // Multiplication
 // As the output is on 64 bits max
 asm volatile ("movdqa %1, %%xmm0;\n\t"
 "movdqa %2, %%xmm1;\n\t"
 "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
 "movq %%xmm1, %0;\n\t"
 : "=x"(a)
 : "x"(y2), "x"(x2)
 :"%xmm0","%xmm1"
 );
 // Polynomial reduction
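 for (int k=0; k<6; k++) {
 m0 = a >> (11-k);
 m = -(m0&1); // all-ones mask if bit 11-k of the low product is set
 m1 = -((m0>>24)&1); // same test for the high product, 24 bits above
 s = (67 << (5-k)); // 0x43: x^6 + x + 1
 a ^= (( s & m ) | ((s << 24) & m1 ));
 }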
 // Transposition
 tab[0] = a&0x3F;
 tab[1] = (a>>24)&0x3F;
 }
 /* Algorithm A9: Mapping between GF((2^6)^2) and GF(2^12) */
 //Conversion Matrix from GF((2^6)^2) to GF(2^12)
 static const gf_t T[12] = {3857, 1140, 3330, 132, 286,
 1954, 1938, 1208, 314, 3754, 2750, 188};
 //Conversion Matrix from GF(2^12) to GF((2^6)^2)
 static const gf_t Ti[12] = {3321, 3388, 4080, 2152,
 3712, 3808, 2274, 4088, 1076, 3904, 1904, 3708};
 //Hamming weight computation
 static inline gf_t hamming_weight(gf_t n) {
 #ifndef ASM_POPCNT
 n = ((n & 0x0AAA) >> 1) + (n & 0x0555);
 n = ((n & 0x0CCC) >> 2) + (n & 0x0333);
 n = ((n & 0x00F0) >> 4) + (n & 0x0F0F);
 n = ((n & 0x0F00) >> 8) + (n & 0x00FF);
 #else
 //using ASM
 asm (
 "POPCNT %1, %0 \n" // Count of Number of Bits Set to 1
 : "=r" (n)
 : "mr" (n)
 : "cc"
 );
 #endif
 return n;
 }
 /* A9.1: Conversion from GF(2^12) to GF((2^6)^2) */
 /* with the conversion matrix Ti from GF(2^12) to GF((2^6)^2) */
 gf_t iconv_bit(gf_t x)
 {
 gf_t res = 0;
 for (int i=0; i<12; i++) {
 res |= (hamming_weight(x & Ti[i])&1) << i; // Ti defined in (3.5)
 }
 return res;
 }
 /* A9.2: Conversion from GF((2^6)^2) to GF(2^12) */
 /* with the conversion matrix T from GF((2^6)^2) to GF(2^12) */
 gf_t conv_bit(gf_t x)
 {
 gf_t res = 0;
 for (int i=0; i<12; i++) {
 res |= (hamming_weight(x & T[i])&1) << i; // T defined in (3.5)
 }
 return res;
 }
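One way to sanity-check multipliers of the kind listed in this appendix is to compare a table-based product with a mask-based one over all operand pairs of $GF(2^{6})$. The block below is a self-contained sketch of such a check, not the authors' test code; the names `build_tables`, `mul_table`, `mul_masked` and `multipliers_agree` are hypothetical, and the tables are static arrays rather than heap allocations.

```c
#include <assert.h>
#include <stdint.h>

#define ORD 63     /* order of the multiplicative group of GF(2^6) */
#define POLY 0x43  /* x^6 + x + 1 */

static uint8_t antilog_tab[ORD + 1];
static uint8_t log_tab[64];

/* Fill both tables by repeated multiplication by alpha = x. */
static void build_tables(void)
{
    uint16_t v = 1;
    for (int i = 0; i < ORD; i++) {
        antilog_tab[i] = (uint8_t)v;
        log_tab[v] = (uint8_t)i;
        v <<= 1;
        if (v & 0x40) {
            v ^= POLY; /* reduce when the degree reaches 6 */
        }
    }
    antilog_tab[ORD] = 1; /* wrap-around entry: alpha^63 = 1 */
    log_tab[0] = 0;       /* placeholder: zero has no logarithm */
}

/* Table-based product; both operands are tested for zero. */
static uint8_t mul_table(uint8_t x, uint8_t y)
{
    if (x == 0 || y == 0) {
        return 0;
    }
    return antilog_tab[(log_tab[x] + log_tab[y]) % ORD];
}

/* Shift-and-add product with mask-based (branch-free) reduction. */
static uint8_t mul_masked(uint8_t x, uint8_t y)
{
    uint16_t a = x, r = 0;
    for (int i = 0; i < 6; i++) {
        uint16_t m = (uint16_t)(0u - (y & 1u)); /* 0xffff or 0x0000 */
        r ^= a & m;
        y >>= 1;
        a <<= 1;
        m = (uint16_t)(0u - ((a >> 6) & 1u));
        a ^= m & POLY;
    }
    return (uint8_t)r;
}

/* Returns 1 if both multipliers agree on all 64 x 64 operand pairs. */
static int multipliers_agree(void)
{
    build_tables();
    for (unsigned x = 0; x < 64; x++) {
        for (unsigned y = 0; y < 64; y++) {
            if (mul_table((uint8_t)x, (uint8_t)y) != mul_masked((uint8_t)x, (uint8_t)y)) {
                return 0;
            }
        }
    }
    return 1;
}
```

An exhaustive comparison is cheap at this field size (4096 pairs) and covers the zero operands, which the logarithm tables cannot represent.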
Submission | Type | Finite Field | Tower Fields Used
--- | --- | --- | ---
BIG QUAKE | Code-based | $GF(2^{N}), N=12,18$ | No
DAGS | Code-based | $GF((2^{5})^{2}), GF((2^{6})^{2})$ | Yes
Edon-K | Code-based | $GF(2^{N}), N=128,192$ | No
Ramstake | Code-based | $GF(2^{8})$ | No
RLCE | Code-based | $GF(2^{N}), N=10,11$ | No
LAC | Lattice-based | $GF(2^{N}), N=9,10$ | No
DME | Multivariate | $GF(2^{N}), N=24,48$ | No
HIMQ-3 | Multivariate | $GF(2^{8})$ | No
LUOV | Multivariate | $GF(2^{8}), GF((2^{16})^{\ell}), \ell=3,4,5$ | Yes
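Name | Security Level | q | m | n | k | s | t | Public Key Size
--- | --- | --- | --- | --- | --- | --- | --- | ---
DAGS_1 | 128 | $2^{5}$ | 2 | 832 | 416 | $2^{4}$ | 13 | 6760
DAGS_3 | 256 | $2^{6}$ | 2 | 1216 | 512 | $2^{5}$ | 11 | 8448
DAGS_5 | 512 | $2^{6}$ | 2 | 2112 | 704 | $2^{6}$ | 11 | 11,616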
Multiplication Algorithm (timings in cycles) | Algorithm (Appendix A) | $GF(2^{6})$ | $GF(2^{12})$ (*) | $GF((2^{6})^{2})$ | Constant-Time
--- | --- | --- | --- | --- | ---
Tabulated log/antilog (**) | 3, 4 | 8 | 11 | 20 | No
Iterative, conditional reduction | 5 | 27 | 51 | 133 | No
Iterative, ASM with PCLMUL, conditional reduction | 5 | 29 | 41 | 146 | No
Iterative, unconditional reduction | 6 | 30 | 58 | 155 | Yes
Iterative, ASM with PCLMUL, unconditional reduction | 6 | 35 | 65 | 225 | Yes
Iterative, unconditional reduction, 1-bitsliced (***), 64 computations in parallel | 7 | $55/64$ | $335/64$ | - | Yes
Iterative, ASM with PCLMUL, unconditional reduction, bitsliced (****), 2 computations in parallel | 8 | $55/2$ | $95/2$ | - | Yes
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Danger, J.L.; El Housni, Y.; Facon, A.; Gueye, C.T.; Guilley, S.; Herbel, S.; Ndiaye, O.; Persichetti, E.; Schaub, A. On the Performance and Security of Multiplication in GF(2^{N}). Cryptography 2018, 2, 25. https://doi.org/10.3390/cryptography2030025