An Efficient Elliptic-Curve Point Multiplication Architecture for High-Speed Cryptographic Applications
Abstract
:1. Introduction
1.1. Existing High-Speed State-of-the-Art Implementations
1.2. Limitations in the Existing Solutions
1.3. Our Contributions
- An efficient high-speed hardware architecture for the PM computation of ECC using NIST defined binary fields over and are presented. The field is selected as it allows a richer comparison to the state-of-the-art solutions while the field is employed to evaluate the performance of proposed high-speed architecture for higher key lengths.
- To reduce the computational time for a single PM computation, we incorporate two modular multipliers (connected in parallel), two serially connected squarer units (one after the first multiplier and another after the adder) and a single adder unit (serially connected after multipliers) in the data path of the proposed high-speed PM architecture. The utilization factor for each parallel-connected multiplier is 83% in the loop iteration step of the PM algorithm, as the parallel-connected multipliers are fully utilized in five clock cycles when the total required cycles are six (details are given in Section 3).
- To reduce clock cycles, we provide an efficient scheduling of PA and PD instructions for PM computation of the Montgomery algorithm. It reduces 57% of clock cycles as compared to PA and PD instructions in the native Montgomery PM algorithm. Consequently, the reduction in computational time and clock cycles ultimately improves the overall speed.
- Finally, to speed up the control functionalities, a finite-state-machine (FSM) based dedicated control unit is used.
2. Background
- The Initialization step (Step-1 in Algorithm 1) ensures the conversions from affine to projective (Lopez Dahab) coordinates.
- The Point Multiplication step (Step-2 in Algorithm 1) provides some mathematical instructions related to PA () and PD () computations, based on the inspected value of scalar multiplier (ki). As shown in Algorithm 1, the point multiplication step is operated in a loop-fashion. In order to execute this loop in the hardware, i represents the integer value that is responsible for counting the number of points on the defined elliptic curve while n determines the implemented field size (163 or 571 in this work). Once the inspected key bit (i.e., k) becomes 1 then the instructions for PA and PD of “if” part of the Algorithm 1 are executed, otherwise the instructions from “else” part are computed.
- Finally, the Reconversion step (Step-3 in Algorithm 1) provides the instructions related to projective (Lopez Dahab) to affine conversions, including the costly inversion operation.
Algorithm 1: Montgomery point multiplication algorithm [24] |
|
3. Proposed Instructions Parallelization of Point Addition and Point Doubling Operations
Algorithm 2: Proposed instructions parallelization for PM computation of Montgomery algorithm using two modular multipliers, two squarers and an adder unit |
|
3.1. Instructions Parallelization for Point Addition
3.2. Instructions Parallelization for Point Doubling
3.3. Overall Decrease in Total Number of Clock Cycles
4. Proposed High-Speed Elliptic-Curve Point Multiplication Architecture
4.1. Memory Unit
4.2. Data Path
4.2.1. Structure of the Data Path
4.2.2. Implementation of Modular Operators
- Multipliers: To perform multiplication of two m bit input polynomials, i.e., A(x) × B(x) over , a Parallel-Least-Significant-Digit (P-LSD) multiplier with a digit size of 41 bits is employed inside the used MULT−1 and MULT−2 multipliers (shown in Figure 2). The digits with bits of input polynomial is created () by generating the partial products. Parallel multiplication of each created digit () is performed with the input polynomial, i.e., . For example, to compute FF multiplication over , a total of four digits are required. Out of these four digits, three digits () are with the size of 41 bits while the last digit () is with 40 bit size. The parallel multiplication of each digit with an m bit polynomial A(x), results bits. After multiplication of each d bit digit with an m bit polynomial, the resultant polynomial, i.e., bit, is created using shift and add operations over polynomials size of bits. Similarly, to compute FF multiplication over , a total of fourteen digits () are required. The first thirteen digits () are with 41 bit size while the remaining digit () is with 38 bit size.
- Modular reduction (not shown in Figure 2): The multiplication of two m bit polynomials, i.e., and squaring over an m bit polynomial, i.e., A(x), produces a resultant polynomial of degree bits. Therefore, after each FF multiplication and a squaring operation, a modular reduction is essential to perform. However, we have implemented NIST reduction algorithms over and fields, respectively (as described in Algorithm 2.41 and 2.45 in reference [9]).
- Polynomial inversion (not shown in Figure 2). The polynomial inversion over is implemented by using the Itoh−Tsujii algorithm [26]. It requires repeated squares followed with multiplications for inversion computations. For example, for , 162 squares followed with 9 multiplications are needed. Similarly, 570 squares followed with 11 multiplications are required to perform inversion over . Subsequently, there are two possibilities to implement repeated squares in the Itoh−Tsujii algorithm. In the first case, each square is required to compute in one clock cycle (termed as, square version of Itoh-Tsujii). In the second case, two repeated squares are performed in one clock cycle (named as, quad-block version of Itoh-Tsujii). Comparatively, the latter is more convenient to reduce the total number of clock cycles in high-speed elliptic-curve computations. Consequently, we have used the quad-block version of the Itoh–Tsujii algorithm for polynomial inversion computation using serially connected MULT−1 and SQR−2, as shown in Figure 2.
4.3. Control Unit
- Idle state: The state-0 is an idle state. The architecture begins execution based on the value of start signal, as illustrated in Figure 3.
- Step-1 of Algorithm 2: State-1 to State-3 are responsible for generating control signals for the initialization (affine to projective conversion) step of Algorithm 2, as shown in Figure 3.
- Step-2 of Algorithm 2: State-4 to State-9 generate control signals for the execution of the PM step (PA and PD instructions). Moreover, each state is responsible for checking the inspected key bit (this information is not provided in Figure 3). Once the value of becomes 1, the corresponding PA and PD instructions from the if part of Algorithm 2 are executed. Otherwise, the instructions from else part of Algorithm 2 are computed. The State-9 is also responsible for checking the value on n to count the number of points on the defined elliptic curve. Once the value of becomes 0 (the implemented field size—163 or 571—is the initial value to n) then the next-state will be State-10 otherwise the next-state will be State-3. These states (State-3 to State-9) continue their execution until the value of becomes 0.
- Step-3 of Algorithm 2: For , State-10 to State-41 (State-10 to State-31 are for inversions and the remaining states are required for the computation of additional instructions in Algorithm 2) are required to perform Lopez Dahab to affine conversions. Similarly, for this figure increases to 43 meaning that State-10 to State-33 are required for inversion computation and the additional states (State-34 to State-43) are required to execute remaining instructions in the projective to affine conversions of Algorithm 2.
5. Implementation Results and Comparisons
5.1. Implementation Results of the Proposed Architecture over for m = 163 Bits and 571 Bits
5.2. Comparison with State-of-the-Art Solutions
5.2.1. Comparison over Virtex-5
5.2.2. Comparison over Virtex-7
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Rashid, M.; Imran, M.; Jafri, A.R.; Al-Somani, T.F. Flexible Architectures for Cryptographic Algorithms—A Systematic Literature Review. J. Circuits Syst. Comput. 2019, 28, 1992002. [Google Scholar] [CrossRef] [Green Version]
- Rashid, M.; Imran, M.; Jafri, A.R.; Al-Somani, T.F. Comparative analysis of flexible cryptographic implementations. In Proceedings of the 11th International Symposium on Reconfigurable Communication-Centric Systems-on-Chip, Tallinn, Estonia, 27–29 June 2016; pp. 1–6. [Google Scholar]
- Khan, Z.U.A.; Benaissa, M. High-Speed and Low-Latency ECC Processor Implementation Over GF(2m) on FPGA. IEEE Trans. Very Large Scale Integr. Syst. 2017, 25, 165–176. [Google Scholar] [CrossRef] [Green Version]
- Koblitz, N. Elliptic curve cryptosystems. Math. Comput. 1987, 48, 203–209. [Google Scholar] [CrossRef]
- Miller, V. Use of elliptic curves in cryptography. In Advances in Cryptology— CRYPTO ’85 Proceedings; 218 of the Series Lecture Notes in Computer Science (LNCS); Williams, H.C., Ed.; Springer: Berlin/Heidelberg, Germany, 1986; pp. 417–426. [Google Scholar]
- Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 1978, 21, 120–126. [Google Scholar] [CrossRef]
- Imran, M.; Rashid, M.; Jafri, A.R.; Kashif, M. Throughput/area optimized pipelined architecture for elliptic curve crypto processor. IET Comput. Digit. Tech. 2019, 13, 361–368. [Google Scholar] [CrossRef] [Green Version]
- Khan, Z.; Benaissa, M. Throughput/area-efficient ECC processor using Montgomery point multiplication on FPGA. IEEE Trans. Circuits Syst. II 2015, 62, 1078–1082. [Google Scholar] [CrossRef]
- Hankerson, D.; Menezes, A.; Vanstone, S. Finite Field Arithmetic. In Guide to Elliptic Curve Cryptography; Publishing House: New York, NY, USA, 2004; pp. 32–58. [Google Scholar]
- Méloni, N. New Point Addition Formulae for ECC Applications. In Proceedings of the WAIFI: Workshop on the Arithmetic of Finite Fields, Madrid, Spain, 21–22 June 2007; pp. 189–201. [Google Scholar]
- Li, L.; Li, S. High-performance pipelined architecture of point multiplication on koblitz curves. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 1723–1727. [Google Scholar] [CrossRef]
- Rashid, M.; Imran, M.; Jafri, A.R.; Mehmood, Z. A 4-Stage Pipelined Architecture for Point Multiplication of Binary Huff Curves. J. Circuits Syst. Comput. 2020, 29, 2050179. [Google Scholar] [CrossRef]
- Imran, M.; Rashid, M.; Jafri, A.R.; Islam, M.N. ACryp-Proc: Flexible Asymmetric Crypto Processor for Point Multiplication. IEEE Access 2018, 6, 22778–22793. [Google Scholar] [CrossRef]
- Jafri, A.R.; Islam, M.N.; Imran, M.; Rashid, M. Towards an Optimized Architecture for Unified Binary Huff Curves. J. Circuits Syst. Comput. 2017, 26, 1750178. [Google Scholar] [CrossRef]
- Hu, X.; Zheng, X.; Zhang, S.; Cai, S.; Xiong, X. A low hardware consumption elliptic curve cryptographic architecture over GF(p) in embedded application. Electronics 2018, 7, 104. [Google Scholar] [CrossRef] [Green Version]
- Javeed, K.; Wang, X.; Scott, M. High performance hardware support for elliptic curve cryptography over general prime field. Microprcessors Microsyst. 2017, 51, 331–342. [Google Scholar] [CrossRef]
- Hamlin, A. Overview of Elliptic Curve Cryptography on Mobile Devices. Available online: https://pdfs.semanticscholar.org/e6dc/f250c18d37dfd380efe01f13ee4b24cef702.pdf (accessed on 11 November 2020).
- Mohammadi, S.; Abedi, S. ECC-Based biometric signature: A new approach in electronic banking security. In Proceedings of the International symposium on Electronic Commerce and Security, Guangzhou, China, 3–5 August 2008; pp. 763–766. [Google Scholar]
- Lee, C.; Li, C.; Chen, Z.; Chen, S.; Lai, Y. A novel authentication scheme for anonymity and digital rights management based on elliptic curve cryptography. Int. J. Electron. Secur. Digit. Forensics 2019, 11, 96–117. [Google Scholar] [CrossRef]
- Intel-White Paper. Understanding the SSL/TLS Adoption of Elliptic Curve Cryptography (ECC). Commun. Serv. Provid. Netw. Secur. Available online: https://builders.intel.com/docs/networkbuilders/understanding-the-ssl-tls-adoption-of-elliptic-curve-cryptography.pdf (accessed on 4 November 2020).
- Salarifard, R.; Bayat-Sarmadi, S.; Mosanaei-Boorani, H. A low-latency and low-complexity point-multiplication in ECC. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 2869–2877. [Google Scholar] [CrossRef]
- Rashidi, B.; Masoud Sayedi, S.; Rezaeian Farashahi, R. High-speed hardware architecture of scalar multiplication for binary elliptic curve cryptosystems. Microelectron. J. 2016, 52, 49–65. [Google Scholar] [CrossRef]
- Imran, M.; Rashid, M. Architectural review of polynomial bases finite field multipliers over GF(2m). In Proceedings of the 2017 International Conference on Communication, Computing and Digital Systems (C-CODE), Islamabad, Pakistan, 8–9 March 2017; pp. 331–336. [Google Scholar]
- Montgomery, P.L. Speeding the pollard and elliptic curve methods of factorization. Math. Comput. 1987, 48, 243–264. [Google Scholar] [CrossRef]
- Federal Information Processing Standards Publication (FIPS PUB 186-4). Digital Signature Standard (DSS). Available online: https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.186-4.pdf (accessed on 9 November 2020).
- Itoh, T.; Tsujii, S. A fast algorithm for computing multiplicative inverses in GF(2m) using normal bases. Inf. Comput. 1988, 78, 171–177. [Google Scholar] [CrossRef] [Green Version]
m | Slices | LUTs | FFs | Clock Cycles | Frequency (in MHz) | Latency (in s) | # of Used Resources |
---|---|---|---|---|---|---|---|
B-163 | 1593 | 5097 | 1328 | 1079 | 293 | 3.68 | 1 ADD, 2 SQR and 2 MULT |
B-571 | 5575 | 17,839 | 4596 | 3732 | 269 | 13.87 |
Reference | m | Device | Slices | Clock Cycles | Frequency (in MHz) | Latency (in s) | # of Used Multipliers |
---|---|---|---|---|---|---|---|
[3] | B-163 | Virtex-7 | 11,657 | 450 | 159 | 2.83 | LLECC_3M (three 163 bit mul) |
[3] | B-163 | Virtex-7 | 4150 | 1119 | 352 | 3.18 | HPECC_1M (one 163 bit mul) |
[7] | B-163 | Virtex-7 | 2207 | 3960 | 369 | 10.73 | one digit-parallel |
[8] | B-163 | Virtex-7 | 1476 | 4168 | 397 | 10.51 | 1 digit-serial |
[11] | K-163 | Virtex-5 | 3670 | 614 | 245 | 2.50 | - |
[21] | B-163 | Virtex-7 | 2435 | - | 264 | 3.01 | LC_1M |
[21] | B-163 | Virtex-7 | 5753 | - | 214 | 1.50 | LL_2M |
[22] | B-163 | Virtex-7 | 5575 | - | 437 | 3.97 | 1 digit-serial |
[3] | B-571 | Virtex-7 | 50,336 | 3783 | 111 | 34.05 | HPECC_1M (one 571 bit mul) |
[8] | B-571 | Virtex-7 | 12,965 | 14,396 | 250 | 57.61 | 1 digit-serial |
[11] | K-571 | Virtex-5 | 20,291 | 2056 | 111 | 18.51 | - |
This work | B-163 | Virtex-5 | 1384 | 1079 | 276 | 3.90 | two digit-parallel |
B-163 | Virtex-7 | 1593 | 1079 | 293 | 3.68 | ||
B-571 | Virtex-5 | 4542 | 3732 | 237 | 15.74 | ||
B-571 | Virtex-7 | 5575 | 3732 | 269 | 13.87 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Rashid, M.; Imran, M.; Sajid, A. An Efficient Elliptic-Curve Point Multiplication Architecture for High-Speed Cryptographic Applications. Electronics 2020, 9, 2126. https://doi.org/10.3390/electronics9122126
Rashid M, Imran M, Sajid A. An Efficient Elliptic-Curve Point Multiplication Architecture for High-Speed Cryptographic Applications. Electronics. 2020; 9(12):2126. https://doi.org/10.3390/electronics9122126
Chicago/Turabian StyleRashid, Muhammad, Malik Imran, and Asher Sajid. 2020. "An Efficient Elliptic-Curve Point Multiplication Architecture for High-Speed Cryptographic Applications" Electronics 9, no. 12: 2126. https://doi.org/10.3390/electronics9122126
APA StyleRashid, M., Imran, M., & Sajid, A. (2020). An Efficient Elliptic-Curve Point Multiplication Architecture for High-Speed Cryptographic Applications. Electronics, 9(12), 2126. https://doi.org/10.3390/electronics9122126