Quantum Circuit Optimization for Solving Discrete Logarithm of Binary Elliptic Curves Obeying the Nearest-Neighbor Constrained

In this paper, we consider the optimization of the quantum circuit for discrete logarithm of binary elliptic curves under a constrained connectivity, focusing on the resource expenditure and the optimal design for quantum operations such as the addition, binary shift, multiplication, squaring, inversion, and division included in the point addition on binary elliptic curves. Based on the space-efficient quantum Karatsuba multiplication, the number of CNOTs in the circuits of inversion and division has been reduced with the help of the Steiner tree problem reduction. The optimized size of the CNOTs is related to the minimum degree of the connected graph.


Introduction
The security of Elliptic Curve Cryptosystems is based on the difficulty of solving the discrete logarithm problem in an elliptic curve group. It seems more difficult to deal with the problem for solving discrete logarithm in F 2 n than in F p . The key agreement represents the protocol in which two or more parties together generate a secret key using a public channel [1][2][3]. For instance, better security can be achieved in Diffie-Hellman Key exchange by choosing a suitable elliptic curve in F 2 155 than in F p when p has 512 bits. The efficiency of the optimization for elliptic curve cryptosystems relies on the speed of the operations in the elliptic curve, whose core operation is the point addition. The efficient algorithms for elliptic curve cryptography are classified into high-level algorithms and low-level algorithms, i.e., group operations of elliptic curves and arithmetic operations in the fundamental finite field. Obviously, both of the above two-level operations should be optimized in order to realize the elliptic curve cryptosystem effectively.
With the intrinsic advantages in executing certain matrix multiplication operations, quantum algorithms are proposed to enhance data analysis techniques under some circumstances [4]. The first paper to discuss in detail how to use a quantum algorithm to solve elliptic curve discrete logarithm problem is by Proos and Zalka [5]. Based on this study, in 2017, Rötteler, Naehrig, Svore, and Lauter presented a concrete quantum resource estimation and the explicit quantum circuit for operations of point additions for solving the discrete logarithm problem in elliptic curves over F p [6].
While there is some common ground between the prime-field case and the characteristictwo case, there are also important differences. Elliptic curves over finite fields F 2 n play a prominent role in modern cryptography. Published quantum algorithms dealing with such curves build on a short Weierstrass form in combination with affine or projective coordinates. Amento, Rötteler, and Steinwandt use projective coordinates to avoid divisions [7]. They need only 13 multiplications every step, which would result in 26n log(3)+1 as the leading term in their Toffoli gate count if the multiplications were implemented using the space-efficient quantum Karatsuba multiplication [8]. Amento et al. show in their paper [7] the choice of how to represent the elements of F 2 n can have a significant impact on the resource requirements for quantum arithmetic. In particular, they show how the Gaussian normal basis representations and "ghost-bit basis" representations can be used to implement inverters with a quantum circuit of depth O(n log(n)). This is the first construction to compute inverse in F * 2 n with subquadratic depth reported in the literature. The quantum circuit of computing inverse in F * 2 n in [7] is based on the Itoh-Tsujii algorithm [9] which exploits the property that, in a normal basis representation, squaring corresponds to a permutation of the coefficients. Because the map ξ → ξ 2 i is a bijection in F * 2 n , it corresponds to an n by n nonsingular matrix, and all the elements in the matrix belong to F 2 . Then, using an LUP-decomposition of this matrix, the needed exponentiation can be realized with n 2 + n CNOT gates in depth 2n. For i ≥ 0 they define β = α 2 i −1 . Then the goal is to find α −1 = (β n−1 ) 2 from β 1 = α. For this they exploit that β i+j = β i · β 2 i j for all i, j ≥ 0. Thus, in a polynomial basis representation, one evaluation of β i+j = β i · β 2 i j can be realized in depth O(n) using n 2 Toffolis and 2n 2 + n − 1 CNOT gates.
However, this use of projective coordinates has two disadvantages. First, they use many ancillary qubits and separate input and output qubits, leading to 10n qubits in one point-addition step even with space-efficient quantum Karatsuba multiplications. Second, projective coordinates have a much larger space disadvantage not pointed out in Ref. [7]. Furthermore, Ref. [7] does not specify the entirety of Shor's algorithm, leaving open how exactly the presented results would be combined.
Building on the Karatsuba multiplier, the multiplication algorithm presented by Ref. [8] can be realized using O(n log(3) ) Toffoli gates and 3n qubits, which has been exploited by Ref. [10]. However, there exists a disadvantage in the method of [8]. There are so many CNOT gates needed in Ref. [8], which is O(n 2 ).
The number of qubits and the connectivity between qubits in practical quantum devices are limited by the noisy environment. However, the resource costs have not been discussed in Refs. [5][6][7][8][9][10] when the quantum bit connectivity is limited. We discuss the quantum circuit optimization for solving discrete logarithm of elliptic curve in F 2 n , obeying the nearest-neighbor constrained. It has been shown that when operating a CNOT gate between two qubits, the number and the depth of CNOT gates needed are determined by the distance between the two qubits. Therefore, the number and the depth of CNOT gates needed in elementary operations (such as additions, binary shifts, multiplications, and squarings) for point additions are dominated by the arrangement of qubits. In this paper we treat division by a field element as multiplication by the inverse of that element and the inversion step is based on Fermat's little theorem (i.e., using the Itoh-Tsujii algorithm to compute the inverse). With the help of the Steiner tree problem reduction in Refs. [11,12], we optimize the number of CNOT gates included in the point addition on binary elliptic curves under a constrained connectivity. The optimized size of the CNOTs is O(n 2 /log δ), where δ is the minimum degree of the connected graph. Based on this, for both division algorithms, the FLT-based algorithm preserves the similar number of Toffoli gates and qubits and suppresses the disadvantage previously in Ref. [10], which has roughly twice the number of the CNOT gate count compared with the GCD-based algorithm.

Materials and Methods
Each addition in F 2 takes one CNOT gate. The addition of two polynomials f (x), g(x) of degree at most n − 1 takes n CNOT gates with depth 1. Considering the connectivity of qubits [13], four CNOT gates will be needed in performing a CNOT gate between the first qubit and the third qubit, which is shown in the Figure 1. Eight CNOT gates will be needed in performing a CNOT gate between the first qubit and the fourth qubit, which is shown in Figure 2. Therefore, 4(n − 2) CNOT gates will be needed in performing a CNOT gate between the first qubit and the n-th qubit.  Let the connectivity of qubits corresponding to the coefficients of f (x), g(x) be: Then the number of and the depth of CNOT gates needed in the addition of f (x) and g(x) are still n and 1, respectively. When these qubits are arranged in the following order the number of and the depth of CNOT gates needed in the addition of f (x) and g(x) are 4(n − 1) · n = 4n 2 − 4n and 4(n − 1) · n − (n − 1) = 4n 2 − 5n + 1, respectively.
For polynomials in F 2 [x] multiplication by x is a shift of the coefficient vector. This requires no quantum computation by doing a series of swaps. In a finite field F 2 n we want to multiply a polynomial g(x) of degree at most n − 1 by x then by a modular reduction by a fixed irreducible weight-ω degree-n polynomial m(x). In general, we let ω be 3 or 5. As m(x) is irreducible, it always has coefficient 1 for x 0 , so after a reduction by m(x) that qubit will be 1 and if no reduction takes place that qubit will be 0, which means the modular shift algorithm is always reversible. Considering the connectivity of qubits, when the Hamming weight of m(x) is ω = 3 and m(x) = x n + x t + 1 (1 ≤ t < n), we let the connectivity of qubits corresponding to the coefficients of g(x) be: Then the number of and the depth of CNOT gates needed in multiplying g(x) by x then by a modular reduction by m(x) are still n and 1, respectively. When these qubits are arranged in the following order the number of and the depth of CNOT gates needed in multiplying g(x) by x then by a modular reduction by m(x) are 4(n − t − 1) and 4(n − t − 1). respectively. When the Hamming weight of m(x) is ω = 5 and m( , let the connectivity of qubits corresponding to the coefficients of g(x) be: Then the number of and the depth of CNOT gates needed in multiplying g(x) by x then by a modular reduction by m(x) are 4 and 3, respectively. When these qubits are arranged in the following order g 0 − g 1 − · · · − g n−2 − g n−1 , the number of and the depth of CNOT gates needed in multiplying g(x) by x then by a modular reduction by m(x) are at most 2(n − t 1 − 1) respectively. The number of and the depth of CNOT gates are at least 2(n − t 3 − 1) For multiplication, if we use a space-efficient Karatsuba algorithm by Van Hoof, we will need O(n 2 ) CNOT gates, O(n log (3) ) Toffoli gates, and 3n total qubits: 2n qubits for the input f (x), g(x), and n separate qubits for the output f (x) · g(x). In a multiplication, most CNOT gates are needed in the processes of multiplying by 1 + x k or (1 + x k ) −1 where k has log (n) values and each process need O(n 2 ) CNOT gates. In the quantum algorithm for the division we have to use up to 2(k 1 + t − 1) multiplications, so 4(log (n)) · O(n 2 ) · (k 1 + t − 1) (i.e., O(n 2 (log 2 (n)))) CNOT gates will be needed in the quantum algorithm for a division. If we take the constrained connectivity into consideration, at most 16(log (n)) · O(n 2 ) · (n − 2) · (k 1 + t − 1) (i.e., O(n 3 (log 2 (n)))) CNOT gates will be needed.
If the irreducible polymomial is fixed to a trinomial m(x) = x n + x t + 1 (1 ≤ t < n) or a pentanomial m(x) = x n + x t 3 + x t 2 + x t 1 + 1 (1 ≤ t 1 < t 2 < t 3 < n) each multiplying by 1 + x k or (1 + x k ) −1 will need about (log (n)) · n CNOT gates. Then we use up to 2(k 1 + t − 1) multiplications in the quantum algorithm for the division. Therefore only about 4(log (n)) 2 · n · (k 1 + t − 1) CNOT gates are needed in the quantum algorithm for a division. When the constrained connectivity has been taken into consideration, at most 16(log (n)) 2 · n · (n − 2) · (k 1 + t − 1) CNOT gates will be needed.
Take for example the irreducible polynomial m(x) = x 4 + x + 1, based on which the finite field F 2 4 can be constructed. The quantum circuit of the space-efficient Karatsuba algorithm by Van Hoof is shown in the Figure 3: The simulation is ran under IBM T-like graph (T65). The topological structure of IBM T65 is depicted below: For the sake of optimizing the number and the depth of CNOT gates while preserving the similar number of Toffoli gates and qubits, we adopt the implementation of a Toffoli gate shown in Figure 4, which has been proposed by Ref. [14]. If we take the constrained connectivity into consideration, 812 CNOT gates will be needed in the quantum circuit for the space-efficient Karatsuba algorithm by Van Hoof. Because the map ξ → ξ 2 i is a bijection in F 2 n , we can think of squaring in F 2 n as a circuit that replaces the input with the result. To square and replace the input, we make use of the fact that squaring is a linear map and we can write that map as an n by n matrix. Using an LUP-decomposition, we get a lower triangular, upper triangular, and permutation matrix, which can be translated into a circuit consisting of at most n 2 − n CNOT gates and a number of swaps. In the quantum algorithm for the division we have to use up to 4n − 4 squarings, so 4n 3 − 8n 2 + 4n CNOT gates will be needed in the quantum algorithm for a division. If we take the constrained connectivity into consideration, at most 16n 4 − 64n 3 + 80n 2 − 32n CNOT gates will be needed.
If the irreducible polymomial is fixed to a trinomial m(x) = x n + x t + 1 (1 ≤ t < n) or a pentanomial m(x) = x n + x t 3 + x t 2 + x t 1 + 1 (1 ≤ t 1 < t 2 < t 3 < n), each squaring will need about 2n CNOT gates. Then we use up to 4n − 4 squarings in the quantum algorithm for the division. Therefore, only about 8n 2 − 8n CNOT gates are needed in the quantum algorithm for a division. When the constrained connectivity has been taken into consideration, at most 32n 3 − 96n 2 + 64n CNOT gates will be needed.
Take for example the irreducible polynomial m(x) = x 4 + x + 1, based on which the finite field F 2 4 can be constructed. The quantum circuit of the squaring for a polynomial a(x) = a 0 + a 1 x + a 2 x 2 + a 2 x 3 in F 2 4 need 5 CNOT gates. If we take the constrained connectivity into consideration, 8 CNOT gates will be needed.
The classic algorithm for the inversion f −1 mod m(x) uses n − 1 squarings and the quantum algorithm for the division has to use up to 4n − 4.
Only CNOT gates exist in quantum circuits of squarings and, multiplying by 1 + x k or (1 + x k ) −1 in the multiplications, these circuits are CNOT circuits, which cost many CNOT gates.
For a graph G(V, E) with n vertices, without loss of generality, we assume that the degree of vertices are denoted as d 1 ≤ d 2 ≤ · · · ≤ d n . A theorem has been given by Bujiao Wu et al. in [12], which optimizes the size of CNOTs.
Given a set of terminals and a connectivity graph, the algorithm performs breadth-first search outwards from each of the terminals. When the paths collide, the nodes along that path consolidate into a single node and all the edges adjacent to the consolidated nodes are placed adjacent to this new node. The process is restarted with this node as a new terminal. From many trials, it seems that this approximation is sufficient to see a large reduction in the CNOT count of the output circuit. The choice of Steiner tree approximation algorithm for this purpose depends on the user's efficiency and performance requirements.
It follows that the optimized size in Theorem 1 is asymptotically tight for a nearly regular graph.
then there is a polynomial time algorithm to construct an equivalent O( n 2 log(n/k) ) size CNOT circuit for any n-qubit CNOT circuit on topological graph G, and there needs at least Ω( n 2 log d n ) size of CNOT gates for some invertible matrix.
We can see the proof of Theorem 1 in [12]. Let k = n/δ for any given CNOT circuits with n qubits under a constrained connectivity, in which δ is the minimum degree of the connected graph. Then it can be easily shown that the sum of degrees for any k vertices is greater than n. Therefore, we will get CNOT circuits who have O(n 2 /log δ) CNOT gates. Due to the lower bound of the size of CNOT gates being Ω(n 2 /log δ) for any CNOT circuits on a connected graph [15], the bound O(n 2 /log δ) is tight for a regular graph. Let δ = 4, then the size and the depth of CNOT gates needed in the quantum algorithm for the division will be cut in half.

Simulation of the Improved Quantum Circuit for Division Algorithm
In this paper, with the help of the Q# language, the resource estimation of the quantum circuit for the division algorithm used to solve discrete logarithms of elliptic curves in F 2 n has been simulated. It has been shown that based on the space-efficient quantum Karatsuba multiplication, the number of CNOTs in the circuits of inversion and division has been reduced with the help of the Steiner tree problem reduction.
From Table 1, it can be seen that when the FLT-based algorithm is used for the division algorithm, the optimized quantum circuit of this paper is better in terms of the size and the depth of CNOT gates than that of [6]. Due to the space-efficient quantum Karatsuba multiplication, both the consumption of qubits and the consumption of Toffoli gates are also quite good.  Table 1 has also shown that the optimized quantum circuit of this paper where the FLT-based algorithm is used for the division algorithm is better in terms of the size of CNOT gates than that of [6], where the GCD-based algorithm is used for the division algorithm.
If the constrained connectivity has been taken into consideration, about 128n 3 CNOT gates will be needed in the quantum circuit for the division algorithm proposed by this paper.

Discussion and Conclusions
With the development of time, extensive attention has been attracted by the field of quantum computation. The main tool for researching the implementation of quantum algorithms is quantum circuit models, whose optimization is a direction worthy of study. In this paper, we have comprehensively discussed the quantum circuit of solving discrete logarithms of elliptic curves in F 2 n and have made further optimizations of the size and the depth of CNOT gates. Based on the space-efficient quantum Karatsuba multiplication, we have reduced the number of CNOTs in the circuits of inversion and division with the help of the Steiner tree problem reduction.
In the future, we will consider the quantum circuit optimizations of practical quantum devices in noisy environments and assess the performances of quantum algorithms on practical quantum devices.