Large Field-Size Throughput/Area Accelerator for Elliptic-Curve Point Multiplication on FPGA
Abstract
1. Introduction
- We present a flexible hardware accelerator, a large field-size PM processor, designed to optimize throughput/area utilization. Flexibility means that users can load different curve parameters and a scalar multiplier into the proposed processor for PM computation (details are given in Section 3.1).
- To minimize the clock cycle count, we have implemented a fully recursive Karatsuba multiplier for a binary field of 571 bits. It computes one modular multiplication of two 571-bit inputs in one clock cycle (details are given in Section 3.3).
- To minimize the hardware resources, we reuse the Karatsuba multiplier for modular squaring. Moreover, the Itoh-Tsujii algorithm [22] for modular inversion runs on the same multiplier resources.
- A finite-state-machine (FSM)-based controller is implemented to provide efficient control functionalities.
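
The multiplier and its reuse for squaring, as listed above, can be illustrated in software. The following Python sketch is a minimal model under stated assumptions, not the authors' hardware description: it splits a carry-less (GF(2) polynomial) multiplication recursively down to single bits, in the spirit of a fully recursive Karatsuba structure. The names `clmul_schoolbook` and `karatsuba_clmul` and the split-in-half strategy are hypothetical choices made for illustration, and the product is left unreduced (up to 1141 bits for 571-bit inputs).

```python
def clmul_schoolbook(a, b):
    """Reference carry-less (GF(2)[x]) product by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_clmul(a, b, n=571):
    """Carry-less product of two polynomials of at most n bits each.

    Assumption: operands are split in half at every level until a
    single-bit AND remains (hypothetical split strategy).
    """
    if n == 1:
        return a & b
    h = n // 2                       # width of the low half
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h        # a = a1 * x^h + a0
    b0, b1 = b & mask, b >> h        # b = b1 * x^h + b0
    lo = karatsuba_clmul(a0, b0, h)
    hi = karatsuba_clmul(a1, b1, n - h)
    mid = karatsuba_clmul(a0 ^ a1, b0 ^ b1, n - h)
    # Over GF(2), subtraction is XOR: middle term = mid - lo - hi.
    return (hi << (2 * h)) ^ ((mid ^ lo ^ hi) << h) ^ lo


def clsquare(a, n=571):
    """Squaring reuses the multiplier, as the text describes."""
    return karatsuba_clmul(a, a, n)
```

Three half-size products replace the four of the schoolbook split, which is the source of the area saving; the modular reduction step is applied to the result separately.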
2. ECC Background over GF(2^m)
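Algorithm 1: Montgomery PM Algorithm [12]
Input: scalar multiplier k (bit stream) and initial point P, given by its affine x and y coordinates
Output: resultant point Q = k·P, given by its affine x and y coordinates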
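
Since the body of Algorithm 1 is not reproduced here, the following Python sketch shows one common textbook formulation of the Montgomery ladder with x-only López-Dahab projective formulas. It is an illustration under assumptions, not the authors' instruction schedule: the curve constant `B = 1` (as on the NIST Koblitz curve K-571) is chosen only to keep the demo short, the reduction polynomial is the standard NIST one for degree 571, the helper names are hypothetical, and the recovery of the y coordinate is omitted.

```python
M = 571
# Standard NIST reduction polynomial for degree 571 (not quoted in the text).
F = (1 << 571) | (1 << 10) | (1 << 5) | (1 << 2) | 1
B = 1  # assumption: curve constant b of K-571, used for the demo only


def gf_mul(a, b):
    """Modular product in GF(2^571); inputs must be below 2^571."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F
    return r


def gf_sqr(a):
    return gf_mul(a, a)


def gf_inv(a):
    """Inverse by Fermat's little theorem: a^(2^M - 2)."""
    r = a
    for _ in range(M - 2):
        r = gf_mul(gf_sqr(r), a)
    return gf_sqr(r)


def m_add(x, x1, z1, x2, z2):
    """x-only point addition (PA); x is the affine x of the base point."""
    t1, t2 = gf_mul(x1, z2), gf_mul(x2, z1)
    z3 = gf_sqr(t1 ^ t2)
    return gf_mul(x, z3) ^ gf_mul(t1, t2), z3


def m_double(x1, z1):
    """x-only point doubling (PD)."""
    xs, zs = gf_sqr(x1), gf_sqr(z1)
    return gf_sqr(xs) ^ gf_mul(B, gf_sqr(zs)), gf_mul(xs, zs)


def montgomery_ladder(k, x):
    """Return projective (X, Z) with X/Z = x-coordinate of k*P, for k >= 1."""
    x1, z1 = x, 1                                  # P
    x2, z2 = gf_sqr(gf_sqr(x)) ^ B, gf_sqr(x)      # 2P
    for bit in bin(k)[3:]:                         # bits below the leading one
        s = m_add(x, x1, z1, x2, z2)
        if bit == "1":
            (x1, z1), (x2, z2) = s, m_double(x2, z2)
        else:
            (x1, z1), (x2, z2) = m_double(x1, z1), s
    return x1, z1


def to_affine(x1, z1):
    return gf_mul(x1, gf_inv(z1))
```

Every loop iteration performs one PA and one PD regardless of the key bit; only the register pair that receives each result changes, which is why the per-bit cost is constant.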
3. Proposed Flexible Crypto Accelerator
3.1. Curve Parameters Unit
3.2. RegFile
3.3. Arithmetic Unit
Algorithm 2: NIST modular reduction over GF(2^571) [10]
Input: polynomial with (2m − 1)-bit length
Output: polynomial with m-bit length
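
The reduction step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard NIST reduction polynomial for degree 571, x^571 + x^10 + x^5 + x^2 + 1 (a known constant that is not quoted in the text above); the function name `nist_reduce_571` is hypothetical, and the folding loop below is a software formulation rather than the word-level procedure of Algorithm 2.

```python
M = 571
LOW_MASK = (1 << M) - 1


def nist_reduce_571(c):
    """Reduce a polynomial of up to 2m - 1 = 1141 bits to m = 571 bits.

    Uses x^571 = x^10 + x^5 + x^2 + 1 (mod f): the part above bit 570
    is folded back onto the low part until nothing is left above it.
    """
    while c >> M:
        high = c >> M
        c = (c & LOW_MASK) ^ high ^ (high << 2) ^ (high << 5) ^ (high << 10)
    return c
```

For a (2m − 1)-bit input the loop runs at most twice, because the first fold leaves at most nine bits above position 570.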
3.4. Control Unit and Clock Cycles Calculation
- Loading secret vectors (LSV). Algorithm 1 requires a bit stream of the scalar multiplier k and the coordinates of the initial point P as input. Therefore, based on the related control signals, LSV controls the loading of the ECC parameters and the scalar multiplier into the corresponding buffers (in the Curve Parameters unit). When the one-bit ld signal becomes 1, the x and y coordinates of the initial point P and the curve constant b are loaded into the corresponding buffers of the Curve Parameters unit. The control signals of the Curve Parameters unit are shown in Figure 1 and the related details are provided in Table 1. Similarly, when the one-bit lk signal becomes 1, the secret key (the scalar multiplier k) is loaded into the KeyReg buffer, as shown in Figure 1. Note that the interface of our processor architecture supports only 8-bit data loading through an 8-bit pin. Therefore, the 571-bit ECC parameters (i.e., x, y and b) and the secret key k must be loaded in 8-bit chunks. For example, loading a 571-bit secret key into the KeyReg buffer requires 72 clock cycles (571/8 rounded up). Similarly, our design requires 3 × 72 = 216 clock cycles to load x, y and b into the buffers of the Curve Parameters unit.
- Initialization (INT). Line one of Algorithm 1 specifies the initialization, i.e., the conversion from affine to projective coordinates. Because each of our modular operators (adder, squarer and multiplier) completes one modular operation in one clock cycle, 5 clock cycles are required for the affine-to-projective conversion.
- Point multiplication computation (PMC). The for loop in Algorithm 1 defines the PM computation in the López-Dahab projective coordinate system. The instructions in the if and else branches perform the PA and PD operations: one contiguous group of instructions implements PA and the remaining group implements PD. The branch taken depends on the value of the inspected bit of the scalar multiplier k. A total of fourteen instructions are involved in the PA and PD computations. Therefore, to execute the fourteen instructions, our design takes 14 × (m − 2) clock cycles, where m is the size of the secret key (i.e., 571). For m = 571 this gives 7966 clock cycles, which is consistent with the total of 9165 clock cycles reported in the results (5 for INT, 7966 for PMC and 1166 + 28 for GOP).
- Generating output (GOP). Algorithm 1 produces the x and y coordinates of the resultant point Q as output. The last two lines of Algorithm 1 therefore generate the x and y coordinates of the final point Q. This stage is also known as the reconversion from López-Dahab projective to affine coordinates. As the last two lines of Algorithm 1 show, it requires two modular inversions together with some addition and multiplication operations. As described earlier in this paper, our design requires 583 clock cycles for one inversion, so two inversions take 1166 clock cycles. Moreover, our architecture takes 28 clock cycles for the remaining addition and multiplication operations. Hence, 1166 + 28 = 1194 clock cycles are required for the projective-to-affine conversion.
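
The cycle counts of the four stages above can be cross-checked with a short Python sketch. It is a software model under assumptions, not the authors' controller: the Itoh-Tsujii addition chain below is the plain binary chain on m − 1 = 570, which needs 570 squarings and 13 multiplications and therefore happens to match the 583 cycles quoted for one inversion (whether the hardware schedules exactly this chain is an assumption). The PMC term 14 × (m − 2) is inferred so that the stages sum to the 9165 cycles reported in the results table, and the reduction polynomial is the standard NIST one for degree 571. Loading cycles (72 for the key, 3 × 72 for the parameters) are kept outside the total.

```python
M = 571
# Standard NIST reduction polynomial for degree 571 (not quoted in the text).
F = (1 << 571) | (1 << 10) | (1 << 5) | (1 << 2) | 1


def gf_mul(a, b):
    """Modular product in GF(2^571); inputs must be below 2^571."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F
    return r


def itoh_tsujii_inverse(a):
    """Return (a^-1, squarings, multiplications) for non-zero a."""
    squarings = multiplications = 0
    beta, k = a, 1                       # invariant: beta = a^(2^k - 1)
    for bit in bin(M - 1)[3:]:           # bits of m - 1 below the leading one
        t = beta
        for _ in range(k):               # t = beta^(2^k)
            t = gf_mul(t, t)
            squarings += 1
        beta = gf_mul(t, beta)           # beta = a^(2^(2k) - 1)
        multiplications += 1
        k *= 2
        if bit == "1":
            beta = gf_mul(gf_mul(beta, beta), a)   # beta = a^(2^(k+1) - 1)
            squarings += 1
            multiplications += 1
            k += 1
    inverse = gf_mul(beta, beta)         # a^(2^m - 2) = a^-1
    squarings += 1
    return inverse, squarings, multiplications


def cycle_budget(m=M, inverse_cycles=583):
    """Clock cycles per stage, one cycle per modular operation."""
    int_cycles = 5                       # affine to projective
    pmc_cycles = 14 * (m - 2)            # assumption: inferred from the total
    gop_cycles = 2 * inverse_cycles + 28 # two inversions plus other operations
    total = int_cycles + pmc_cycles + gop_cycles
    return {"INT": int_cycles, "PMC": pmc_cycles, "GOP": gop_cycles, "total": total}
```

Calling `cycle_budget()` reproduces the 5, 7966 and 1194 cycle figures and their sum of 9165.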
4. Results and Comparison
4.1. Results
4.2. Comparisons
4.2.1. Comparison to Area, Clock Cycles, Latency, Frequency and Throughput
4.2.2. Comparison to Throughput/Area
5. Conclusions and Future Trends
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- [1] Miller, V.S. Use of Elliptic Curves in Cryptography. In Proceedings of the Advances in Cryptology - CRYPTO ’85 Proceedings; Williams, H.C., Ed.; Springer: Berlin/Heidelberg, Germany, 1986; pp. 417–426.
- [2] Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 1978, 21, 120–126.
- [3] Kumar, K.A.; Krishna, A.V.N.; Chatrapati, K.S. New secure routing protocol with elliptic curve cryptography for military heterogeneous wireless sensor networks. J. Inf. Optim. Sci. 2017, 38, 341–365.
- [4] Gulen, U.; Baktir, S. Elliptic Curve Cryptography for Wireless Sensor Networks Using the Number Theoretic Transform. Sensors 2020, 20, 1507.
- [5] Noori, D.; Shakeri, H.; Niazi, T.M. Scalable, efficient, and secure RFID with elliptic curve cryptosystem for Internet of Things in healthcare environment. EURASIP J. Inf. Secur. 2020, 2020, 13.
- [6] Calderoni, L.; Maio, D. Lightweight Security Settings in RFID Technology for Smart Agri-Food Certification. In Proceedings of the 2020 IEEE International Conference on Smart Computing (SMARTCOMP), Bologna, Italy, 14–17 September 2020; pp. 226–231.
- [7] Singh, R.; Miglani, S. Efficient and secure message transfer in VANET. In Proceedings of the 2016 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, 26–27 August 2016; Volume 2, pp. 1–5.
- [8] Chavhan, S.; Doriya, R. Secured Map Building using Elliptic Curve Integrated Encryption Scheme and Kerberos for Cloud-based Robots. In Proceedings of the 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 11–13 March 2020; pp. 157–164.
- [9] NIST. Recommended Elliptic Curves for Federal Government Use. 1999. Available online: https://csrc.nist.gov/csrc/media/publications/fips/186/2/archive/2000-01-27/documents/fips186-2.pdf (accessed on 28 October 2022).
- [10] Hankerson, D.; Menezes, A.J.; Vanstone, S. Guide to Elliptic Curve Cryptography; 2004; pp. 1–311. Available online: https://link.springer.com/book/10.1007/b97644 (accessed on 7 November 2022).
- [11] Rashid, M.; Imran, M.; Jafri, A.R.; Al-Somani, T.F. Flexible Architectures for Cryptographic Algorithms: A Systematic Literature Review. J. Circuits Syst. Comput. 2019, 28, 1930003.
- [12] Imran, M.; Rashid, M.; Jafri, A.R.; Kashif, M. Throughput/area optimised pipelined architecture for elliptic curve crypto processor. IET Comput. Digit. Tech. 2019, 13, 361–368.
- [13] Islam, M.M.; Hossain, M.S.; Hasan, M.K.; Shahjalal, M.; Jang, Y. FPGA Implementation of High-Speed Area-Efficient Processor for Elliptic Curve Point Multiplication Over Prime Field. IEEE Access 2019, 7, 178811–178826.
- [14] Rashid, M.; Imran, M.; Kashif, M.; Sajid, A. An Optimized Architecture for Binary Huff Curves With Improved Security. IEEE Access 2021, 9, 88498–88511.
- [15] Khan, Z.U.A.; Benaissa, M. Throughput/Area-efficient ECC Processor Using Montgomery Point Multiplication on FPGA. IEEE Trans. Circuits Syst. II Express Briefs 2015, 62, 1078–1082.
- [16] Imran, M.; Pagliarini, S.; Rashid, M. An Area Aware Accelerator for Elliptic Curve Point Multiplication. In Proceedings of the 2020 27th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Glasgow, UK, 23–25 November 2020; pp. 1–4.
- [17] Li, L.; Li, S. High-Performance Pipelined Architecture of Point Multiplication on Koblitz Curves. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 1723–1727.
- [18] Li, J.; Wang, W.; Zhang, J.; Luo, Y.; Ren, S. Innovative Dual-Binary-Field Architecture for Point Multiplication of Elliptic Curve Cryptography. IEEE Access 2021, 9, 12405–12419.
- [19] Zhao, X.; Li, B.; Zhang, L.; Wang, Y.; Zhang, Y.; Chen, R. FPGA Implementation of High-Efficiency ECC Point Multiplication Circuit. Electronics 2021, 10, 1252.
- [20] Sutter, G.D.; Deschamps, J.P.; Imana, J.L. Efficient Elliptic Curve Point Multiplication Using Digit-Serial Binary Field Operations. IEEE Trans. Ind. Electron. 2013, 60, 217–225.
- [21] Khan, Z.U.A.; Benaissa, M. High-Speed and Low-Latency ECC Processor Implementation Over GF(2^m) on FPGA. IEEE Trans. Very Large Scale Integr. Syst. 2017, 25, 165–176.
- [22] Itoh, T.; Tsujii, S. A fast algorithm for computing multiplicative inverses in GF(2^m) using normal bases. Inf. Comput. 1988, 78, 171–177.
- [23] Rashid, M.; Imran, M.; Jafri, A.R.; Mehmood, Z. A 4-Stage Pipelined Architecture for Point Multiplication of Binary Huff Curves. J. Circuits Syst. Comput. 2020, 29, 2050179.
- [24] Islam, M.M.; Hossain, M.S.; Hasan, M.K.; Shahjalal, M.; Jang, Y.M. Design and Implementation of High-Performance ECC Processor with Unified Point Addition on Twisted Edwards Curve. Sensors 2020, 20, 5148.
- [25] Lara-Nino, C.A.; Diaz-Perez, A.; Morales-Sandoval, M. Lightweight elliptic curve cryptography accelerator for internet of things applications. Ad Hoc Netw. 2020, 103, 102159.
- [26] Sajid, A.; Rashid, M.; Imran, M.; Jafri, A.R. A Low-Complexity Edward-Curve Point Multiplication Architecture. Electronics 2021, 10, 1080.
- [27] Imran, M.; Rashid, M. Architectural review of polynomial bases finite field multipliers over GF(2^m). In Proceedings of the 2017 International Conference on Communication, Computing and Digital Systems (C-CODE), Islamabad, Pakistan, 8–9 March 2017; pp. 331–336.
- [28] Imran, M.; Abideen, Z.U.; Pagliarini, S. An Open-source Library of Large Integer Polynomial Multipliers. In Proceedings of the 2021 24th International Symposium on Design and Diagnostics of Electronic Circuits & Systems (DDECS), Vienna, Austria, 7–9 April 2021; pp. 145–150.
I/O Pin | Description | I/O Pin | Description |
---|---|---|---|
clk | input clock signal | rst | input reset signal |
ld | load data | din | load the data in 8-bit chunks |
st | start signal | lk | load a key in 8-bit chunks |
rp_addr | read parameter address | wp_addr | write parameter address |
rwp_en | read/write parameter enable | rd1_addr | read address one |
rd2_addr | read address two | wd_addr | write data address |
rw_en | read/write enable | op_sel_1 | select first operand |
op_sel_2 | select second operand | wb_sel | select the operand to be written back |
dout | delivers the output in 8-bit chunks | dn | done signal |
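
The 8-bit loading interface listed in the pin table above means that each 571-bit value enters the design over 72 transfers. A minimal Python sketch of that serialization follows; the least-significant-chunk-first order and the helper names `to_chunks` and `from_chunks` are assumptions, since the transfer order is not stated in the text.

```python
def to_chunks(value, bits=571, width=8):
    """Split a `bits`-wide value into `width`-bit chunks, low chunk first."""
    count = -(-bits // width)            # ceiling division: 72 for 571 bits
    mask = (1 << width) - 1
    return [(value >> (width * i)) & mask for i in range(count)]


def from_chunks(chunks, width=8):
    """Rebuild the value, as a receiving buffer would after the last chunk."""
    value = 0
    for i, chunk in enumerate(chunks):
        value |= chunk << (width * i)
    return value
```

With one chunk per clock cycle, a key needs `len(to_chunks(key))` = 72 cycles and the three curve parameters need 3 × 72 = 216 cycles.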
Device | Slices | LUTs | FFs | Clock Cycles | Freq (MHz) | Latency (µs) | Throughput | Throughput/Slices |
---|---|---|---|---|---|---|---|---|
Virtex-6 | 6107 | 15,216 | 6103 | 9165 | 319 | 28.73 | 34.80 kbps | 5.69 |
Virtex-7 | 5683 | 14,356 | 5961 | 9165 | 361 | 25.38 | 39.40 kbps | 6.93 |
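
The derived columns of the results can be recomputed from the clock cycles, the clock frequency and the resource counts. The Python sketch below shows the relations that reproduce the tabulated numbers to within rounding; the function name `pm_metrics` is hypothetical, and reading the throughput as thousands of PM operations per second (listed as kbps in the table) and the ratios as PM operations per second per slice or LUT is an interpretation inferred from the numbers, not a definition given in the text.

```python
def pm_metrics(cycles, freq_mhz, slices, luts):
    """Latency, throughput and throughput/area for one PM computation."""
    latency_us = cycles / freq_mhz            # MHz = cycles per microsecond
    throughput_k = 1e3 / latency_us           # thousand PM operations per second
    return {
        "latency_us": latency_us,
        "throughput_k": throughput_k,
        "thrpt_per_slice": 1e3 * throughput_k / slices,
        "thrpt_per_lut": 1e3 * throughput_k / luts,
    }


# Values taken from the Virtex-7 and Virtex-6 rows of the results.
virtex7 = pm_metrics(cycles=9165, freq_mhz=361, slices=5683, luts=14356)
virtex6 = pm_metrics(cycles=9165, freq_mhz=319, slices=6107, luts=15216)
```

Because the cycle count is fixed at 9165, differences between devices come only from the achievable clock frequency and the resource usage.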
Ref. # | Year | Device | Slices | LUTs | Clock Cycles | Freq (MHz) | Latency (µs) | Thrpt (kbps) | Thrpt/Slices | Thrpt/LUTs |
---|---|---|---|---|---|---|---|---|---|---|
[15] | 2015 | Virtex-7 | 12,965 | 38,547 | – | 250 | 57.61 | 17.35 | 1.33 | 0.45 |
[16] | 2020 | Virtex-7 | 4560 | 12,691 | 12,329 | 340 | 36.26 | 27.57 | 6.04 | 2.17 |
[17] | 2018 | Virtex-5 | 20,291 | – | – | – | 18.51 | 54.02 | 2.66 | – |
[18] | 2021 | Virtex-7 | – | 80,970 | – | 274 | 12.55 | 79.68 | – | 0.98 |
[19] | 2021 | Virtex-6 | – | 116,241 | 7628 | 135 | 56.50 | 17.69 | – | 0.15 |
[20] | 2013 | Virtex-5 | 11,640 | 324,332 | 44,047 | 127 | 348 | 2.87 | 0.246 | 0.008 |
[21] | 2017 | Virtex-7 | 50,336 | – | 3783 | 111 | 34.05 | 29.36 | 0.58 | – |
Design of Figure 1 | – | Virtex-5 | 7289 | 17,116 | 9165 | 296 | 30.96 | 32.29 | 4.42 | 1.88 |
Design of Figure 1 | – | Virtex-6 | 6107 | 15,216 | 9165 | 319 | 28.73 | 34.80 | 5.69 | 2.28 |
Design of Figure 1 | – | Virtex-7 | 5683 | 14,356 | 9165 | 361 | 25.38 | 39.40 | 6.93 | 2.74 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alhomoud, A.; Jamal, S.S.; Altowaijri, S.M.; Ayari, M.; Alharbi, A.R.; Aljaedi, A. Large Field-Size Throughput/Area Accelerator for Elliptic-Curve Point Multiplication on FPGA. Appl. Sci. 2023, 13, 869. https://doi.org/10.3390/app13020869