# FPGA Implementation of Some Second Round NIST Lightweight Cryptography Candidates

## Abstract

## 1. Introduction

## 2. Background

#### 2.1. Authenticated Encryption with Associated Data

#### 2.2. GMU LWC Interface

## 3. Implemented Authenticated Ciphers

#### 3.1. Preliminaries

#### 3.2. Hardware Design Principles

#### 3.3. LOTUS and LOCUS

#### 3.4. LOTUS

#### 3.5. LOCUS

#### Brief Description of tweGift-64

#### 3.6. ESTATE

#### Brief Description of AES

- tweAES-8: The 8-bit datapath version of tweAES uses 16 clock cycles to compute AddRoundKey and SubBytes (one state byte per cycle), one clock cycle for ShiftRows, and four clock cycles for MixColumns. The latency per round is therefore 21 clock cycles; ten rounds take 210 clock cycles, and 16 additional clock cycles are needed to output the encrypted block, giving a total latency of 226 clock cycles. This architecture uses only one Sbox. Because the last round does not include the MixColumns transformation, the state register is disabled in the 10th round, which keeps the design uniform while avoiding the MixColumns computation.
- tweAES-32: The round is computed column-wise, so each round takes four clock cycles: ShiftRows and MixColumns are performed in parallel by selecting the correct bytes from the state as inputs to MixColumns. Four additional clock cycles are used to output the result. Thus, the total latency of tweAES-32 is 44 clock cycles. This implementation uses four Sboxes.
- tweGift-128: Both implementations, 8-bit datapath and 32-bit datapath, use the same architecture; the only difference is that the 32-bit version uses eight 4-bit Sboxes, while the 8-bit version uses only two. The permutation step takes one clock cycle in both cases, and a full round takes 5 clock cycles for the 32-bit datapath and 17 clock cycles for the 8-bit datapath. tweGift-128 executes 40 rounds, so the total latency is 204 clock cycles for the 32-bit datapath and 696 clock cycles for the 8-bit datapath.
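The latency figures in the list above follow a simple pattern: cycles per round times the number of rounds, plus the cycles spent shifting out the result. The following Python sketch reproduces that arithmetic. The function name `block_latency` and the `CORES` table are hypothetical names introduced here for illustration. The output-cycle counts for tweGift-128 (4 and 16) are not stated in the text; they are an assumption inferred from the quoted totals.

```python
# Minimal latency model for the block-cipher cores described above.
# latency = cycles_per_round * rounds + output_cycles

def block_latency(cycles_per_round: int, rounds: int, output_cycles: int) -> int:
    """Return the total latency in clock cycles for one block."""
    return cycles_per_round * rounds + output_cycles

# (cycles per round, rounds, output cycles)
CORES = {
    "tweAES-8": (21, 10, 16),              # all three values stated in the text
    "tweAES-32": (4, 10, 4),               # all three values stated in the text
    "tweGift-128 (32-bit)": (5, 40, 4),    # output cycles inferred: 204 - 5*40
    "tweGift-128 (8-bit)": (17, 40, 16),   # output cycles inferred: 696 - 17*40
}

latencies = {name: block_latency(*params) for name, params in CORES.items()}

if __name__ == "__main__":
    for name, cycles in latencies.items():
        print(f"{name}: {cycles} clock cycles")
```

Under these assumptions the model gives 226, 44, 204, and 696 clock cycles, matching the totals quoted for each core.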

#### 3.7. COMET

#### 3.8. Oribatida

## 4. Results

#### Discussion of Results

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

**Figure 1.** Top-level block diagram of the LWC core (based on the scheme found in [9]). sw, external key width; w, external data width; ccsw, internal key width; ccw, internal data width.

**Figure 11.** Encryption version of the Oribatida AEAD algorithm, where (**a**) is the nonce/AD stage and (**b**) is the encryption stage (scheme based on the one found in [4]).

Encoding | Segment |
---|---|
0001 | Associated data (AD) |
0100 | Message blocks (PT) |
0101 | Encrypted message (CT) |
1000 | Tag (T) |
1100 | Key (K) |
1101 | Npub |
0111 | Hash message |
1001 | Hash value |
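As an illustration of how the 4-bit encodings in the table above might be used in a software test bench, the following Python sketch maps each code to a short segment label. The names `SEGMENT_TYPES` and `segment_name` are hypothetical and are not part of the GMU LWC package; codes that do not appear in the table are reported as `"unknown"`, which is an assumption of this sketch.

```python
# Lookup table for the 4-bit segment-type encodings listed in the table above.
SEGMENT_TYPES = {
    0b0001: "AD",            # associated data
    0b0100: "PT",            # message blocks
    0b0101: "CT",            # CT segment
    0b1000: "Tag",
    0b1100: "Key",
    0b1101: "Npub",
    0b0111: "Hash message",
    0b1001: "Hash value",
}

def segment_name(code: int) -> str:
    """Return the label for a 4-bit segment code, or 'unknown' if unlisted."""
    if not 0 <= code <= 0xF:
        raise ValueError("segment code must fit in 4 bits")
    return SEGMENT_TYPES.get(code, "unknown")

if __name__ == "__main__":
    for code in sorted(SEGMENT_TYPES):
        print(f"{code:04b} -> {segment_name(code)}")
```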

**Table 2.** Utilization of resources, throughput and TPA for implemented architectures on the xc7a12tcsg325-3 FPGA.

Mode | LUT | FF | Slices | Freq (MHz) | Clock Cycles per Block | Throughput E/D (Mbps) | TPA (Mbps/LUT) |
---|---|---|---|---|---|---|---|
LOCUS_32 | 1846 | 1005 | 521 | 166.04 | 114 | 93.21 | 0.050 |
LOTUS_32 | 1525 | 908 | 454 | 132.45 | 114 | 74.36 | 0.049 |
ESTATE-AES_32 | 1359 | 733 | 420 | 202.02 | 88 | 293.85 | 0.216 |
ESTATE-AES_8 | 797 | 416 | 228 | 227.27 | 552 | 57.21 | 0.072 |
ESTATE-Gift_32 | 1055 | 869 | 324 | 233.97 | 408 | 73.40 | 0.070 |
ESTATE-Gift_8 | 821 | 558 | 248 | 259.74 | 1392 | 23.88 | 0.029 |
COMET_8 | 1052 | 1031 | 346 | 190.33 | 297 | 92.47 | 0.088 |
COMET_32 | 1737 | 1551 | 565 | 196.85 | 70 | 427.40 | 0.246 |
Oribatida-256_256 | 1432 | 1319 | 465 | 246.24 | 137 | 230.06 | 0.161 |
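The TPA column of Table 2 is expressed in Mbps/LUT, that is, throughput divided by the number of LUTs. The following Python sketch recomputes that ratio from the LUT and throughput columns and compares it with the reported value. The names `TABLE2`, `tpa`, and `recomputed` are hypothetical; rounding to three decimals is an assumption chosen to match the precision shown in the table.

```python
# Recompute TPA (throughput-to-area ratio, Mbps/LUT) from Table 2.
# Each entry: (mode, LUTs, throughput in Mbps, reported TPA)
TABLE2 = [
    ("LOCUS_32", 1846, 93.21, 0.050),
    ("LOTUS_32", 1525, 74.36, 0.049),
    ("ESTATE-AES_32", 1359, 293.85, 0.216),
    ("ESTATE-AES_8", 797, 57.21, 0.072),
    ("ESTATE-Gift_32", 1055, 73.40, 0.070),
    ("ESTATE-Gift_8", 821, 23.88, 0.029),
    ("COMET_8", 1052, 92.47, 0.088),
    ("COMET_32", 1737, 427.40, 0.246),
    ("Oribatida-256_256", 1432, 230.06, 0.161),
]

def tpa(throughput_mbps: float, luts: int) -> float:
    """Throughput-to-area ratio in Mbps per LUT."""
    return throughput_mbps / luts

# Round to three decimals, the precision used in the table.
recomputed = {mode: round(tpa(thr, luts), 3) for mode, luts, thr, _ in TABLE2}

if __name__ == "__main__":
    for mode, luts, thr, reported in TABLE2:
        print(f"{mode}: recomputed {recomputed[mode]:.3f}, reported {reported:.3f}")
```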

**Table 3.** Overhead in resources for complete design (LWC + CryptoCore) compared with CryptoCore alone. %Usage is the percentage of utilization of available resources on the xc7a12tcsg325-3 FPGA.

Mode | Complete design (LWC + CryptoCore): LUT/%Usage | Complete design (LWC + CryptoCore): FF/%Usage | Mode only (CryptoCore): LUT | Mode only (CryptoCore): FF | Overhead: LUT | Overhead: FF |
---|---|---|---|---|---|---|
LOCUS | 1846/23.07 | 1005/6.28 | 1640 | 956 | 195 | 52 |
LOTUS | 1525/18.77 | 908/5.67 | 1327 | 854 | 183 | 52 |
ESTATE-AES_32 | 1359/16.99 | 733/4.58 | 1065 | 505 | 294 | 228 |
ESTATE-AES_8 | 797/9.96 | 416/2.60 | 587 | 344 | 209 | 72 |
ESTATE-Gift_32 | 1055/13.19 | 869/5.43 | 788 | 625 | 267 | 244 |
ESTATE-Gift_8 | 821/10.26 | 558/3.49 | 535 | 479 | 286 | 79 |
COMET_8 | 1052/13.15 | 1031/6.44 | 803 | 951 | 249 | 80 |
COMET_32 | 1737/21.71 | 1551/9.69 | 1462 | 1451 | 275 | 100 |
Oribatida | 1432/17.90 | 1319/8.24 | 1248 | 1172 | 184 | 147 |

Mode | LUTs | Ranking (LUTs) | Freq (MHz) | Throughput E/D (Mbps) | Ranking (Thr.) | TPA (Mbps/LUT) | Ranking (TPA) |
---|---|---|---|---|---|---|---|
ASCON-AEAD [5] | 1898 | 11 | 263.00 | 1683.2 | 1 | 0.887 | 1 |
COMET-AES [5] | 2753 | 14 | 251.00 | 1606.40 | 2 | 0.584 | 2 |
Gift-COFB [5] | 1932 | 12 | 263.00 | 635.20 | 3 | 0.329 | 3 |
COMET_32 | 1737 | 9 | 196.85 | 427.40 | 5 | 0.246 | 4 |
ESTATE-AES_32 | 1359 | 6 | 202.02 | 293.85 | 6 | 0.216 | 5 |
Oribatida-256_256 | 1432 | 7 | 246.24 | 230.06 | 8 | 0.161 | 6 |
SpoC [5] | 1172 | 5 | 268.00 | 154.50 | 9 | 0.132 | 7 |
COMET-CHAM [5] | 2214 | 13 | 201.00 | 282.70 | 7 | 0.128 | 8 |
Schwaemm [5] | 4313 | 15 | 106.00 | 521.80 | 4 | 0.121 | 9 |
COMET_8 | 1052 | 3 | 190.33 | 92.47 | 10 | 0.088 | 10 |
ESTATE-AES_8 | 797 | 1 | 227.27 | 57.21 | 14 | 0.072 | 11 |
ESTATE-Gift_32 | 1055 | 4 | 233.97 | 73.40 | 12 | 0.070 | 12 |
LOTUS | 1530 | 8 | 132.013 | 74.11 | 11 | 0.048 | 13 |
LOCUS | 1835 | 10 | 121.951 | 68.46 | 13 | 0.037 | 14 |
ESTATE-Gift_8 | 821 | 2 | 259.74 | 23.88 | 15 | 0.029 | 15 |
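The three ranking columns in the comparison table above can be reproduced by sorting the rows: ascending by LUTs (smaller is better), and descending by throughput and by TPA (larger is better). The following Python sketch shows one way to do this. The names `ROWS` and `rank` are hypothetical, and the sort directions are an assumption inferred from the rankings shown in the table.

```python
# Reproduce the ranking columns of the comparison table.
# Each entry: (mode, LUTs, throughput in Mbps, TPA in Mbps/LUT)
ROWS = [
    ("ASCON-AEAD", 1898, 1683.2, 0.887),
    ("COMET-AES", 2753, 1606.40, 0.584),
    ("Gift-COFB", 1932, 635.20, 0.329),
    ("COMET_32", 1737, 427.40, 0.246),
    ("ESTATE-AES_32", 1359, 293.85, 0.216),
    ("Oribatida-256_256", 1432, 230.06, 0.161),
    ("SpoC", 1172, 154.50, 0.132),
    ("COMET-CHAM", 2214, 282.70, 0.128),
    ("Schwaemm", 4313, 521.80, 0.121),
    ("COMET_8", 1052, 92.47, 0.088),
    ("ESTATE-AES_8", 797, 57.21, 0.072),
    ("ESTATE-Gift_32", 1055, 73.40, 0.070),
    ("LOTUS", 1530, 74.11, 0.048),
    ("LOCUS", 1835, 68.46, 0.037),
    ("ESTATE-Gift_8", 821, 23.88, 0.029),
]

def rank(rows, column: int, descending: bool) -> dict:
    """Map each mode name to its 1-based position when sorted by one column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=descending)
    return {row[0]: position for position, row in enumerate(ordered, start=1)}

lut_rank = rank(ROWS, column=1, descending=False)  # fewer LUTs ranks higher
thr_rank = rank(ROWS, column=2, descending=True)   # more throughput ranks higher
tpa_rank = rank(ROWS, column=3, descending=True)   # higher TPA ranks higher

if __name__ == "__main__":
    for name, *_ in ROWS:
        print(f"{name}: LUTs #{lut_rank[name]}, Thr. #{thr_rank[name]}, TPA #{tpa_rank[name]}")
```

With this ordering, for example, ESTATE-AES_8 ranks first in LUTs but 14th in throughput, which illustrates the area/throughput trade-off between the 8-bit and 32-bit datapaths.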

