Efficient Implementations of Sieving and Enumeration Algorithms for Lattice-Based Cryptography

Abstract: The security of lattice-based cryptosystems rests on the hardness of lattice problems such as the shortest vector problem (SVP) and the closest vector problem (CVP). Various cryptanalysis algorithms, such as (Pro)GaussSieve, HashSieve, ENUM, and BKZ, have been proposed to solve these problems, and several implementations of them have been developed. Such implementations are expected to be efficient in terms of run time and memory space. In this paper, a modular software library containing efficient implementations of the GaussSieve, ProGaussSieve, HashSieve, and BKZ algorithms is developed, with run time as the main efficiency criterion. While constructing the library, some modifications are made to the algorithms to increase performance. The run times of these implementations are then compared with those of existing implementations. According to the experimental results, the proposed GaussSieve, ProGaussSieve, and HashSieve implementations are at least 70%, 75%, and 49% more efficient than previous ones, respectively.


Introduction
Traditional public key cryptosystems such as RSA and (EC)DSA are based on the hardness of integer factorization and the discrete logarithm problem [1]. However, because of Shor's algorithm [2], they are insecure in the quantum era. New cryptosystems are therefore needed to protect communication networks once quantum computers become widespread. The family of lattice-based cryptosystems is one of the candidates for the quantum era because of its efficiency and security [1]. Lattice-based cryptography can be used in different security areas, such as identification and authentication [3,4]. Its hardness rests on lattice problems such as SVP and CVP, for which no polynomial-time solution is known, even on quantum computers. To solve hard problems such as SVP and CVP, i.e., to break lattice-based cryptography, many sieving-based and enumeration-based algorithms and their implementations have been proposed in [5,6,7,8,9,10].

Previous Works
The main idea of sieving algorithms such as GaussSieve and ProGaussSieve is to keep a list data structure in memory, which initially holds longer vectors, and to proceed by finding vectors whose length approaches that of the sought shortest vector [9]. The efficiency of sieving algorithms is evaluated according to their memory usage and run times [5,6]. Several sieving algorithms that share this main idea and differ only in a few structural or technical modifications, such as GaussSieve and ProGaussSieve, have therefore been proposed in [5,6,10]. The AKS algorithm [11], the first exponential-time sieving algorithm, has the asymptotic run time complexity $2^{2.465n+o(n)}$ and the space complexity $2^{1.233n+o(n)}$, where $n$ is the lattice dimension. The Nguyen-Vidick sieve, a heuristic version of the AKS algorithm, was developed together with an implementation in [12], where it was also noted that memory problems may occur as the dimension grows. The ListSieve algorithm in [5], whose main operation is to start with an empty list and to add a shorter vector to the list in each step until the list contains the shortest vector, operates with the asymptotic run time complexity $2^{3.199n+o(n)}$ and the space complexity $2^{1.325n+o(n)}$. In the same work [5], Micciancio and Voulgaris also proposed the GaussSieve algorithm, which follows the same idea as ListSieve with a few changes. The GaussSieve algorithm in [5] works asymptotically with the run time complexity $2^{0.48n+o(n)}$ and the space complexity $2^{0.18n+o(n)}$. Applying a progressive approach to the GaussSieve algorithm, Laarhoven and Mariano introduced the ProGaussSieve sieving algorithm in [6], which has the asymptotic run time complexity $2^{0.42n+o(n)}$ and the space complexity $2^{0.21n+o(n)}$.
The HashSieve sieving algorithm in [10] was obtained by applying Charikar's angular locality-sensitive hash (angular-LSH) family [13] to the GaussSieve algorithm. It was shown to be faster than the GaussSieve algorithm and finds the shortest vector with the asymptotic run time complexity $2^{0.3366n+o(n)}$ and the space complexity $2^{0.3366n+o(n)}$. In [14], a new algorithm was proposed by combining the k-means LSH function with HashSieve.
The standard implementation of the GaussSieve algorithm was first developed by Micciancio and Voulgaris in [5] using the NTL library [15]. In 2014, the first known parallel implementation of the GaussSieve algorithm was developed in [16]. The parallel version of the GaussSieve algorithm in [17] was implemented using the distributed-memory method on a CPU. Later, another parallel GaussSieve implementation using an enhanced lock-free list data structure was introduced in [18]. Building on the parallel GaussSieve method of Ishiguro et al., a parallel implementation of the GaussSieve algorithm on a GPU was developed in [19]. The implementation of the ProGaussSieve algorithm, the progressive version of the GaussSieve algorithm, was developed by Laarhoven and Mariano in [6]. In 2015, the first known standard version of the HashSieve algorithm was implemented by Laarhoven in [10]. In [20], an SVP solver based on the Voronoi cell of a lattice was implemented. The authors reported the practical performance of this method, although it has some limitations, such as memory requirements that grow with the number of dimensions.
The enumeration algorithms are designed to enumerate all lattice points (vectors) in a bounded region in order to find the shortest vector with lower memory requirements. The first examples of enumeration algorithms are those of Kannan [21], Fincke and Pohst [22], and Schnorr and Euchner [7]. In 2010, the enumeration algorithms were made more efficient by the "extreme pruning" technique in [23]. Dagdelen and Schneider introduced a parallel implementation of the enumeration algorithm in [24]. In 2016, a parallel implementation of Schnorr and Euchner's enumeration algorithm, SE++, was presented in [25].
The sieving and enumeration algorithms rely on reduction algorithms such as Gram-Schmidt [26], LLL [26], and BKZ [8,27] to reduce the lattice basis before starting their main operations. The LLL algorithm has several uses in cryptography; for instance, it plays an important role in efficiently solving the knapsack problem and in integer factorization [28]. The LLL algorithm and the enumeration algorithms are used as subprocesses in the BKZ algorithm, which is the blockwise version of the LLL reduction algorithm [29]. The BKZ algorithm, which was first introduced by Schnorr in [30] and implemented in [8], provides the best results among these reduction algorithms. BKZ 2.0, which uses methods such as pruned enumeration [23], early termination, and progressive reduction, was introduced and implemented in [31]. In 2014, a parallel implementation of the BKZ algorithm was presented in [32]. Later, the ACBKZ algorithm, a version of the BKZ algorithm that operates on different blocks in parallel, was introduced together with its implementation in [33].

Motivation and Contribution
Sieving, enumeration, and reduction algorithms share many common components, so every implementation of these algorithms needs them. Such implementations can therefore be developed efficiently on top of a software library that provides the common components. However, a modular software library that serves as the infrastructure for developing efficient implementations of these algorithms is lacking [34]. In this paper, a modular software library is designed to fill this gap. The developed implementations are designed to be efficient in terms of run time and space complexity. In addition to using a modular software infrastructure library, the performance of implementations can be increased through implementation-level improvements to the algorithms [6]. Furthermore, the proposed modular software library and the efficient implementations can also be used for efficient implementations of other lattice-based schemes. The contributions of this paper are as follows:
• A modular software infrastructure library is developed to serve as the basis for efficient implementations of the algorithms.
• With the modular software library containing the components commonly used in the algorithms, efficient implementations of GaussSieve and ProGaussSieve are provided.
• To improve the performance of the GaussSieve and ProGaussSieve implementations, a change to the termination criterion of these algorithms is proposed.
• By making novel modifications to the HashSieve implementation developed by Laarhoven [10] and by using the modular software infrastructure library, a faster implementation of HashSieve is achieved.
• Efficient implementations of the ENUM and the BKZ algorithms are developed using the modular software infrastructure library.
• The solution proposed in [8] for the zero vector problem encountered in the LLLFP algorithm (a subprocess in the BKZ) is implemented in the LLLFP module.
The efficient implementations of GaussSieve and ProGaussSieve are compared with the GaussSieve and ProGaussSieve implementations in [6] with regard to run time. For the same lattice samples, the HashSieve implementation is compared with that in [10]. In addition, the accuracy of the outputs of the efficient implementations of ENUM and BKZ is checked by comparing them with the outputs of SageMath [35].

Organization
The rest of this paper is organized as follows. In Section 2, the mathematical background of lattice-based cryptosystems, the lattice algorithms, and the GaussSieve, ProGaussSieve, HashSieve, ENUM, and BKZ algorithms are recalled with possible improvements. Section 3 describes the software features of the modular software infrastructure library used to develop the efficient implementations of these algorithms and details the efficient implementations together with their novel modifications. In Section 4, the run times of the efficient implementations are compared with those of the implementations in the literature. Finally, the results obtained in this paper and directions for future studies are given in Section 5.

Preliminaries
In this section, the main computationally hard problems in lattice-based cryptography are recalled. Then, the common subcomponents and the algorithms developed to solve these hard problems are described. Finally, the basic operation of these algorithms is explained through their pseudo-codes. Table 1 shows some special notations that are often used in the mathematical descriptions and the algorithms.

Mathematical Background
Mathematical definitions of the lattice structure and the hard problems in these cryptosystems are given below.
Definition 1 (Lattice). Let $B = \{v_1, \ldots, v_n\}$ be a set of linearly independent vectors in $\mathbb{R}^n$. The set of all integer linear combinations of the basis vectors, $L(B) = \{x_1 v_1 + \cdots + x_n v_n : x_i \in \mathbb{Z}\}$, is called the lattice with basis $B$.
Definition 2 (SVP). Let the length of the shortest nonzero lattice vector be $\lambda_1(L) = \min \|v\|$, where the minimum is taken over all $v \in L$ with $v \neq 0$. SVP is the problem of finding a lattice vector $v \in L$ with $\|v\| = \lambda_1(L)$. In other words, SVP is the problem of finding a nonzero lattice vector with the smallest Euclidean norm [36].

Definition 3 (CVP).
Let $t \in \mathbb{R}^n$ be a target point, and let the smallest distance between $t$ and a lattice vector be $d(t, L) := \min_{v \in L} \|v - t\|$. CVP is the problem of finding a lattice vector $v \in L$ with $\|v - t\| = d(t, L)$. In other words, CVP is the problem of finding the lattice vector closest to the chosen target point [37].
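As a small worked illustration of Definitions 2 and 3, consider the two-dimensional lattice below. The basis and the target point are chosen here purely for illustration and do not come from the experiments in this paper.

```latex
\begin{align*}
  &v_1 = (2, 0), \quad v_2 = (1, 2), \qquad
   L = \{x v_1 + y v_2 : x, y \in \mathbb{Z}\} = \{(2x + y,\ 2y)\}. \\[4pt]
  &\text{SVP: for } y = 0,\ \|(2x, 0)\| = 2|x| \ge 2 \text{ when } x \ne 0;
   \quad \text{for } y \ne 0,\ \|(2x + y, 2y)\|^2 \ge 1 + 4 = 5. \\
  &\text{Hence } \lambda_1(L) = 2, \text{ attained by } \pm v_1. \\[4pt]
  &\text{CVP: for } t = (1.2,\ 1.9), \text{ the lattice point } v_2 = (1, 2)
   \text{ gives } \|v_2 - t\|^2 = 0.2^2 + 0.1^2 = 0.05. \\
  &\text{Every other lattice point differs from } t \text{ by at least } 1.8
   \text{ in one coordinate, so } d(t, L) = \sqrt{0.05} \approx 0.224.
\end{align*}
```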

Common Submodules in Algorithms
Many mathematical operations and algorithms are common to the sieving, enumeration, and BKZ reduction algorithms. These common operations and algorithms, called common submodules, are carried out as subprocesses in the sieving, enumeration, and BKZ reduction algorithms. Among them, the GaussReduce reduction [5], the Gram-Schmidt reduction [26], the LLL reduction [26], the LLLFP reduction [8], and Klein's Nearest Neighbor algorithm [38] are used for the purposes listed below, together with the commonly used mathematical operations.

• The GaussReduce Algorithm: This reduction algorithm is used in the Gauss-based sieving algorithms to reduce the size of a lattice vector by other lattice vectors.
• The Gram-Schmidt Algorithm: This reduction algorithm obtains the Gram-Schmidt constants and the reduced lattice consisting of vectors that are as perpendicular to each other as possible. The resulting reduced lattices and constants are given as input parameters to the sieving, enumeration, and reduction algorithms such as the BKZ and the LLL.
• The LLL and the LLLFP Algorithms: These algorithms produce reduced lattices whose vectors are nearly orthogonal to each other. The reduced lattice is given as an input parameter to the sieving algorithms and is used as a subprocess in the BKZ. The only difference between the LLLFP algorithm and the LLL algorithm is that the LLLFP algorithm minimizes the floating-point errors in the LLL algorithm.
• Klein's Nearest Neighbor Algorithm: It produces new sample vectors to be used in the reduction operations of the sieving algorithms. Klein's algorithm is composed of the nearA algorithm [38] as the main algorithm and the Randomized Rounding algorithm [39] as a submodule.
• The Mathematical Operations: Vector arithmetic is needed in the sieving, the enumeration, and the BKZ algorithms and in their subcomponents such as the GaussReduce and the Gram-Schmidt submodules. These operations are the Euclidean norm of a vector, vector addition/subtraction, and the inner product.
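The vector operations and the pairwise reduction step named in the list above can be sketched in C as follows. This is a minimal illustration, not the code of the library: the function names (`inner_product`, `norm2`, `vec_sub_multiple`, `round_div`, `gauss_reduce`) and the flat-array interface are assumptions made here, and the reduction rule is the usual one that subtracts the nearest-integer multiple of w from v whenever that shortens v.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Inner product of two integer vectors of dimension n. */
long long inner_product(const long long *a, const long long *b, size_t n)
{
    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Squared Euclidean norm; using the square keeps everything in integers. */
long long norm2(const long long *a, size_t n)
{
    return inner_product(a, a, n);
}

/* v <- v - q * w (q = 1 is vector subtraction, q = -1 is vector addition). */
void vec_sub_multiple(long long *v, const long long *w, long long q, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] -= q * w[i];
}

/* Nearest integer to num/den for den > 0, in exact integer arithmetic. */
static long long round_div(long long num, long long den)
{
    assert(den > 0);
    long long q = num / den;   /* C division truncates toward zero */
    long long r = num % den;   /* remainder has the sign of num */
    if (2 * r >= den)
        q++;
    else if (2 * r <= -den)
        q--;
    return q;
}

/* Pairwise reduction step (hypothetical name): if 2|<v,w>| > ||w||^2,
 * subtracting round(<v,w>/||w||^2) * w strictly shortens v.
 * Returns 1 if v was changed and 0 otherwise. */
int gauss_reduce(long long *v, const long long *w, size_t n)
{
    long long ww = norm2(w, n);
    long long vw = inner_product(v, w, n);
    if (ww == 0 || 2 * llabs(vw) <= ww)
        return 0;              /* v is already reduced with respect to w */
    vec_sub_multiple(v, w, round_div(vw, ww), n);
    return 1;
}
```

Working with squared norms avoids square roots and floating-point rounding in the comparison; overflow for very large coordinates is not handled in this sketch.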

The GaussSieve and the ProGaussSieve Algorithms
The main idea of the GaussSieve algorithm, which solves the shortest vector problem, is to add shorter new lattice vectors to the list data structure in each iteration. In doing so, the GaussSieve algorithm reduces the new lattice vector with the list vectors and, at the same time, reduces all list vectors with the new lattice vector.
The reduced lattice basis B and the termination criterion c (the total number of collisions) are given to the GaussSieve algorithm as inputs. In each iteration, the GaussSieve algorithm first takes a vector v from the stack data structure S, where the sample vectors are stored, or generates a new sample vector v with Klein's Nearest Neighbor algorithm (line 5 in Algorithm 1). The vector v is then given as input to the GaussReduce reduction algorithm, which reduces v by using all list vectors w (line 6 in Algorithm 1). The reduced vector v in turn reduces every list vector w for which the reduction condition is satisfied (line 7 in Algorithm 1). Next, the GaussSieve algorithm compares the Euclidean norm of the reduced vector v with the norm it had before the reduction (line 9 in Algorithm 1). If the length of v has not changed, the GaussSieve algorithm adds v to the list data structure L (line 10 in Algorithm 1). If the length of v has changed and is different from zero, the GaussSieve algorithm adds the reduced vector v to the stack data structure S (line 13 in Algorithm 1). If v has been reduced to the zero vector, the GaussSieve algorithm increases the number of collisions cl by one (line 15 in Algorithm 1). The GaussSieve algorithm repeats these steps and terminates when the number of collisions cl (zero vector states) reaches the termination criterion c. On termination, it outputs the shortest lattice vector in the list L. Algorithm 1 shows the pseudo-code of the ProGaussSieve algorithm; removing the lines written in red yields, with slight modifications, the pseudo-code of the GaussSieve algorithm.
The ProGaussSieve algorithm, a different version of the Gauss-based sieving algorithm, follows the same procedure as the GaussSieve algorithm. The main difference is that the ProGaussSieve algorithm divides the lattice into smaller subparts and starts operating on the smallest one, whereas the GaussSieve algorithm starts operating on the whole lattice.
Algorithm 1. The ProGaussSieve algorithm (partial listing)
...
8: Move the reduced vectors w ∈ L from list L to the stack S
9: if v has not changed then
...
16: if cl = c then
17:     if progressive = n then
18:         return argmin_{v ∈ L} ||v||
19:     else
20:         ...

The ProGaussSieve algorithm in Algorithm 1 takes, in addition to the input parameters of the GaussSieve algorithm, the constant progressive, which determines the size of the lattice subparts. The ProGaussSieve algorithm starts by dividing the lattice into subparts according to the constant progressive and performs the same operations as the GaussSieve algorithm on the smallest lattice subpart. When the ProGaussSieve algorithm reaches a certain number of collisions cl (zero vector states), it increases the constant progressive and continues to operate on the following lattice subpart. Once the ProGaussSieve algorithm has operated over the whole lattice and the number of collisions cl reaches the value of the termination criterion c, it halts and outputs the shortest vector of the lattice.
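The main loop of the GaussSieve algorithm described in this section can be sketched in C as follows. This is a simplified illustration under stated assumptions, not the implementation of this paper: the sample vectors are supplied by the caller instead of being generated by Klein's Nearest Neighbor algorithm, all names (`gauss_sieve`, `reduce`, `dot`, the `state` enumeration) are hypothetical, and the list L and the stack S are represented by per-vector state flags over a single pool of vectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum state { FRESH, LISTED, STACKED, DEAD };

static long long dot(const long long *a, const long long *b, size_t n)
{
    long long s = 0;
    for (size_t i = 0; i < n; i++) s += a[i] * b[i];
    return s;
}

/* Shorten v with w if possible; returns 1 if v changed. */
static int reduce(long long *v, const long long *w, size_t n)
{
    long long ww = dot(w, w, n), vw = dot(v, w, n);
    if (ww == 0 || 2 * llabs(vw) <= ww) return 0;
    long long q = vw / ww, r = vw % ww;       /* round vw/ww to nearest */
    if (2 * r >= ww) q++; else if (2 * r <= -ww) q--;
    for (size_t i = 0; i < n; i++) v[i] -= q * w[i];
    return 1;
}

/* samples: nsamples lattice vectors of dimension n (row-major).
 * c: termination criterion (number of collisions).
 * Writes the shortest list vector to out and returns its squared norm,
 * 0 if the list ended up empty, or -1 on allocation failure. */
long long gauss_sieve(const long long *samples, size_t nsamples, size_t n,
                      size_t c, long long *out)
{
    assert(samples != NULL && out != NULL);
    if (nsamples == 0 || n == 0) return 0;
    long long *pool = malloc(nsamples * n * sizeof *pool);
    enum state *st = calloc(nsamples, sizeof *st);   /* all FRESH */
    size_t *stack = malloc(nsamples * sizeof *stack);
    if (!pool || !st || !stack) { free(pool); free(st); free(stack); return -1; }
    memcpy(pool, samples, nsamples * n * sizeof *pool);

    size_t top = 0, next = 0, cl = 0;
    while (cl < c) {
        size_t vi;
        if (top > 0) vi = stack[--top];           /* take v from the stack S */
        else if (next < nsamples) vi = next++;    /* or take a new sample */
        else break;                               /* no samples left */
        long long *v = pool + vi * n;
        int v_changed = 0;
        for (size_t wi = 0; wi < next; wi++) {
            if (st[wi] != LISTED) continue;
            long long *w = pool + wi * n;
            v_changed |= reduce(v, w, n);         /* reduce v with w */
            if (reduce(w, v, n)) {                /* reduce w with v */
                if (dot(w, w, n) == 0) st[wi] = DEAD;
                else { st[wi] = STACKED; stack[top++] = wi; }
            }
        }
        if (dot(v, v, n) == 0) { st[vi] = DEAD; cl++; }            /* collision */
        else if (v_changed) { st[vi] = STACKED; stack[top++] = vi; }
        else st[vi] = LISTED;                                      /* v joins L */
    }

    long long best = 0;
    for (size_t i = 0; i < next; i++) {
        long long len = dot(pool + i * n, pool + i * n, n);
        if (st[i] == LISTED && (best == 0 || len < best)) {
            best = len;
            memcpy(out, pool + i * n, n * sizeof *out);
        }
    }
    free(pool); free(st); free(stack);
    return best;
}
```

With the two illustrative samples (5, 3) and (3, 2), the mutual reductions shrink both vectors to unit vectors, and the sketch returns the squared norm 1. A progressive variant would wrap this loop in an outer loop over growing sublattice ranks, which is not shown here.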

The HashSieve Algorithm
The main idea and the operating procedure of the HashSieve algorithm, which uses Charikar's angular-LSH (locality-sensitive hashing) family, are almost the same as those of the GaussSieve algorithm. The GaussSieve algorithm stores the vectors used for reduction in a list data structure. In contrast, the HashSieve algorithm stores them in hash tables as well as in the list data structure. Note that LSH is not a cryptographic hash function; it helps to determine whether the distance between vectors is small enough.
In the angular-LSH family, given a target vector $v$ and a hash vector $a$, the hash value consists of a single bit $h_a(v) \in \{0, 1\}$ and is calculated as $h_a(v) = 1$ if $\langle a, v \rangle \geq 0$ and $h_a(v) = 0$ if $\langle a, v \rangle < 0$. The angular-LSH family consists of the functions $H = \{h_a\}$ with $a \in \mathbb{R}^n$ drawn at random from an n-dimensional Gaussian distribution. These hash functions have the property that vectors mapped to the same bucket have a higher probability of being close to each other than "average" list vectors [40].
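A single angular-LSH bit can be sketched in C as follows, assuming the standard hyperplane rule in which the bit is 1 when the inner product of the hash vector a and the vector v is nonnegative and 0 otherwise. Combining k such bits into one bucket index is the usual way to address a hash table with LSH; the function names and the row-major layout of the hash vectors are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One angular-LSH bit: 1 if <a, v> >= 0, and 0 otherwise. */
int hash_bit(const double *a, const long long *v, size_t n)
{
    double dot = 0.0;
    for (size_t i = 0; i < n; i++)
        dot += a[i] * (double)v[i];
    return dot >= 0.0;
}

/* Concatenate k hash bits into a bucket index in [0, 2^k).
 * A holds k hash vectors of dimension n, stored row by row. */
uint32_t bucket_index(const double *A, size_t k, const long long *v, size_t n)
{
    assert(k <= 32);
    uint32_t idx = 0;
    for (size_t j = 0; j < k; j++)
        idx = (idx << 1) | (uint32_t)hash_bit(A + j * n, v, n);
    return idx;
}
```

Vectors that point in similar directions fall on the same side of most random hyperplanes, so they tend to share a bucket; only the vectors in the bucket of v need to be tried as reduction candidates.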
The pseudo-code of the HashSieve algorithm, which receives the reduced lattice B and the termination criterion c (the total number of collisions) as input parameters, is given in Algorithm 2. The HashSieve algorithm starts with empty hash tables T and an empty stack data structure S for the sample vectors. In each iteration, the HashSieve algorithm first takes either a sample vector v from the stack S or a new sample vector v generated by Klein's Nearest Neighbor algorithm (line 5 in Algorithm 2). The HashSieve algorithm reduces the vector v with the closest candidate vectors w found in the hash tables T. Then, it reduces each candidate vector w by using the reduced vector v (line 10 in Algorithm 2). After moving the nonzero reduced vectors w to the stack S, the HashSieve algorithm inserts the nonzero reduced vector v into the hash tables T at the end of the iteration (lines 16 and 20 in Algorithm 2). If the reduced vector v is a zero vector, the HashSieve algorithm increases the collision number cl (line 18 in Algorithm 2). The HashSieve algorithm performs these operations iteratively until the collision number cl reaches the termination criterion c and then outputs the shortest vector of the lattice.
Algorithm 2. The HashSieve algorithm (partial listing)
Input: Reduced lattice basis B, termination criterion c
Output: The shortest vector of the lattice with basis B
...
9: for each w ∈ C do
10:     Reduce vector v with vector w and reduce vector w with vector v
11:     if w has changed then
12:         ...

The ENUM Algorithm
Although the primary purpose of the ENUM algorithm, which is based on the enumeration idea, is to solve SVP, it is intended to be used as a subprocess in Schnorr and Euchner's BKZ reduction algorithm. There, the ENUM enumeration algorithm is used to find the shortest vector of the local lattice blocks in the BKZ algorithm.
The ENUM algorithm, whose pseudo-code is given in Algorithm 3 and which was developed specifically for Schnorr and Euchner's BKZ algorithm, takes the indices $j$ and $k$ as input parameters. For the current minimum $\tilde{c}_j$ of the function $c_j$, the ENUM algorithm performs a depth-first search over all integer vectors $(\tilde{u}_t, \ldots, \tilde{u}_k)$ that satisfy the condition $\tilde{c}_j > c_t(\tilde{u}_t, \ldots, \tilde{u}_k)$ (lines 7 to 17 in Algorithm 3). After computing $\tilde{c}_t = c_t(\tilde{u}_t, \ldots, \tilde{u}_k)$, the ENUM algorithm updates the variables $\Delta_t$ and $\tilde{u}_t$ at level $t$ (lines 23 and 24 in Algorithm 3). The algorithm, which always assigns the maximum value of $t$ to $s$, assigns one of the sequential values $1, -1, 2, -2, 3, -3, \ldots$ to the variable $\Delta_t$ when the condition $\tilde{c}_t \geq \tilde{c}_j$ is satisfied (line 23 in Algorithm 3). In that case, the algorithm moves up one level by increasing $t$ by 1 and updates $s$ accordingly (line 19 in Algorithm 3). When the algorithm continues at level $t - 1$, it assigns a value computed from $-y_t$ to the variable $\delta_t$ and the value 0 to the variable $\Delta_t$ (line 10 in Algorithm 3). When it returns to level $t$, it sets $\Delta_t$ to the next of the sequential values $1, -1, 2, -2, 3, -3, \ldots$ or $-1, 1, -2, 2, -3, 3, \ldots$ (line 23 in Algorithm 3). The ENUM algorithm proceeds iteratively and, when it reaches the termination criterion, returns the minimizing integer vector $(u_j, \ldots, u_k) \in \mathbb{Z}^{k-j+1}$ as output.
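The alternating offsets 1, −1, 2, −2, … (or −1, 1, −2, 2, …) mentioned above visit the integer candidates for one coefficient in order of increasing distance from a real-valued center. The following C sketch illustrates only this ordering, not the full ENUM algorithm; the function name `zigzag_candidates` and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[0..count-1] with the integers nearest to center, ordered by
 * increasing distance: w, w + s, w - s, w + 2s, w - 2s, ...
 * where w is the nearest integer and s = +1 or -1 points toward the
 * side of w on which center lies. */
void zigzag_candidates(double center, size_t count, long long *out)
{
    assert(out != NULL || count == 0);
    /* Nearest integer to center, computed without the math library. */
    long long w = (long long)(center >= 0.0 ? center + 0.5 : center - 0.5);
    long long step = (center >= (double)w) ? 1 : -1;
    long long delta = 0;
    for (size_t i = 0; i < count; i++) {
        out[i] = w + delta;
        if (delta == 0)
            delta = step;              /* first move: toward the center side */
        else if (delta * step > 0)
            delta = -delta;            /* mirror to the other side */
        else
            delta = -delta + step;     /* mirror back and go one step further */
    }
}
```

For a center of 2.3 the candidates are visited in the order 2, 3, 1, 4, 0, and for a center of 2.7 in the order 3, 2, 4, 1, so the first candidates tried are always the ones contributing least to the partial norm.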
Algorithm 3. The ENUM algorithm (partial listing)
Input: j and k with 1 ≤ j < k ≤ m
Output: The minimizing integer vector (u_j, ..., u_k) ∈ Z^{k−j+1}
...
7: if c̃_t < c̃_j then
8:     if t > j then
9:         ...
...
20: if t < s then
21:     ...
22: if Δ_t δ_t ≥ 0 then
23:     Δ_t = Δ_t + δ_t
24: ũ_t = w_t + Δ_t
25: return the minimizing integer vector (u_j, ..., u_k) ∈ Z^{k−j+1}

The BKZ Algorithm
Schnorr and Euchner's BKZ algorithm is used for lattice reduction. It operates on local sublattices whose sizes are determined by the parameter β. In Schnorr and Euchner's BKZ algorithm, in which the reduction quality varies according to the parameter β, the LLLFP reduction and the ENUM enumeration algorithms developed by Schnorr and Euchner are used as subprocesses.
In Algorithm 4, the pseudo-code of Schnorr and Euchner's BKZ algorithm is detailed. It takes as input parameters the basis $B = \{v_1, \ldots, v_n\}$ of the n-dimensional lattice $L$, the local lattice size value β, the floating-point error value δ (for the LLLFP reduction algorithm), the Gram-Schmidt constants µ, and the Euclidean norms of the reduced Gram-Schmidt lattice vectors $v_i^*$. [...] When this condition is not satisfied, the BKZ algorithm sends the local lattice block $v_1, \ldots, v_h$ to the LLLFP algorithm and updates the constants µ (line 17 in Algorithm 4). The algorithm thus produces the LLLFP-reduced lattice basis $\{v_1, \ldots, v_h\}$ and, when the index $j$ reaches the number $n$ without any successful enumeration, resets the index $j$ to 1. The BKZ algorithm performs all of these operations iteratively and uses the counter $z$ of failed enumeration processes as a termination criterion. After the whole lattice has been reduced and the termination counter $z$ equals $n - 1$, the BKZ algorithm returns the BKZ-reduced lattice $\{v_1, \ldots, v_n\}$ as output.

Modular Software Library
A modular software library for solving SVP in lattice-based cryptography is developed. The library is structured with a divide-and-conquer approach: the submodules (subcomponents) commonly used in the sieving, enumeration, and reduction cryptanalysis algorithms are identified and implemented first, and then the relations between these submodules are defined. Following the algorithmic framework that emerges from this analysis, the common subcomponents required by the algorithms are added to the software library as modules. These modules form the core of the efficient implementations of the algorithms. The algorithmic framework in Figure 1 shows the dependency relationships between the algorithms, such as the sieving, the ENUM, and the BKZ algorithms, and the submodules they need. An arrow in Figure 1 points from an algorithm to the algorithm in which it is used as a subprocess. For example, the LLLFP algorithm is used as a subprocess in the BKZ algorithm. The software library, developed in the C programming language, includes the Gram-Schmidt, the LLL, the LLLFP, the ENUM enumeration, Klein's Nearest Neighbor (nearA and Randomized Rounding algorithms), and the GaussReduce algorithms as modules. The software library also contains modules for the Euclidean norm of a vector, vector addition/subtraction, and the inner product, which are commonly used by all algorithms given in Section 2. These modules, which are used as subcomponents by the efficient implementations of the sieving, enumeration, and reduction algorithms, are available for different data types (such as long long int and double) in the software library. Since these modules are used as subcomponents in the efficient implementations, they directly affect the run time of the implementations. For this reason, the variables and parameters in the modules are defined as structure pointers or double pointers. Accessing data through structure pointers speeds up memory access during the processing of the subcomponents and thus reduces the run time of the efficient implementations.
Codes 1 and 2 are examples of the modules. Code 1 shows a Gram-Schmidt module that returns the reduced lattice in the data type double. Code 2 shows an inner product module that produces an output of the data type long long int. The modular software library is developed for the 64-bit architecture so that operations are performed without overflow problems. The source code of the modular software infrastructure library and of the efficient implementations is available at https://github.com/hsatilmis/modular_software_library (accessed on 2 July 2021). Section 3.1 provides the software features and structures of the efficient implementations of GaussSieve, ProGaussSieve, ENUM, and BKZ developed using the software library.
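To make the role of a Gram-Schmidt module concrete, the following C sketch computes the orthogonalized vectors and the Gram-Schmidt constants µ in the data type double. It is a textbook version written for illustration and is not Code 1 of the library: the function name, the flat row-major arrays (instead of double pointers), and the parameter names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Gram-Schmidt orthogonalization without normalization.
 * b:     m input vectors of dimension n, row-major (linearly independent)
 * bstar: m output vectors b*_i, row-major
 * mu:    m x m constants, mu[i*m + j] = <b_i, b*_j> / <b*_j, b*_j> for j < i,
 *        with ones on the diagonal and zeros above it */
void gram_schmidt(const double *b, size_t m, size_t n, double *bstar, double *mu)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t k = 0; k < n; k++)
            bstar[i * n + k] = b[i * n + k];
        for (size_t j = 0; j < m; j++)
            mu[i * m + j] = (j == i) ? 1.0 : 0.0;
        for (size_t j = 0; j < i; j++) {
            double num = 0.0, den = 0.0;
            for (size_t k = 0; k < n; k++) {
                num += b[i * n + k] * bstar[j * n + k];
                den += bstar[j * n + k] * bstar[j * n + k];
            }
            assert(den > 0.0);   /* fails only for dependent input vectors */
            mu[i * m + j] = num / den;
            for (size_t k = 0; k < n; k++)
                bstar[i * n + k] -= mu[i * m + j] * bstar[j * n + k];
        }
    }
}
```

For the illustrative basis (3, 1), (2, 2), the constant µ between the two vectors is 0.8 and the second orthogonalized vector is (−0.4, 1.2), which is perpendicular to (3, 1).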

The GaussSieve and the ProGaussSieve Implementations
The efficient implementations of the GaussSieve and the ProGaussSieve algorithms, which use the modular software library as an infrastructure, are written in the C programming language. In the developed implementations, pointer and double-pointer variables are used frequently to minimize the delay in accessing data in memory. The struct data structures provided by the C programming language are preferred in these implementations for variables that contain many values, such as lattices. Since the software library is used as an infrastructure for the efficient implementations, they are developed for the 64-bit architecture to be compatible with this library. Therefore, the integer variables are defined in the data type long long int. Code 3 shows the struct data structure that defines the array variable (Coord), which stores the coordinate values of the basis vectors of a lattice consisting of integer vectors, and the variable holding the Euclidean norms (Norm2) of these vectors.

The GaussSieve and the ProGaussSieve algorithms frequently perform vector operations in each iteration. For this reason, the mathematical operation modules in the modular software infrastructure library are used to obtain efficient implementations. The Gram-Schmidt module in the software library is used in the efficient implementations to obtain the reduced lattice that the other implementations receive as input parameters. Observations 1 and 2 give more details on the termination criterion and the number of collisions, which help to understand the main idea of the performance improvements.
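A data structure of the kind described for Code 3 might look as follows in C. Only the member names Coord and Norm2 and the data type long long int are taken from the text; the type name `Lattice`, the `dim` member, the helper functions, and the choice to store squared norms in Norm2 are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    long long **Coord;  /* Coord[i][j]: j-th coordinate of the i-th basis vector */
    long long *Norm2;   /* Norm2[i]: squared Euclidean norm of vector i (assumed) */
    size_t dim;         /* hypothetical member: number of vectors and coordinates */
} Lattice;

void lattice_free(Lattice *lat)
{
    if (!lat) return;
    if (lat->Coord)
        for (size_t i = 0; i < lat->dim; i++)
            free(lat->Coord[i]);
    free(lat->Coord);
    free(lat->Norm2);
    free(lat);
}

/* Allocate a zero-initialized dim x dim lattice; returns NULL on failure. */
Lattice *lattice_alloc(size_t dim)
{
    Lattice *lat = malloc(sizeof *lat);
    if (!lat) return NULL;
    lat->dim = dim;
    lat->Coord = calloc(dim, sizeof *lat->Coord);
    lat->Norm2 = calloc(dim, sizeof *lat->Norm2);
    if (!lat->Coord || !lat->Norm2) { lattice_free(lat); return NULL; }
    for (size_t i = 0; i < dim; i++) {
        lat->Coord[i] = calloc(dim, sizeof **lat->Coord);
        if (!lat->Coord[i]) { lattice_free(lat); return NULL; }
    }
    return lat;
}

/* Recompute the cached norms after the coordinates have been modified. */
void lattice_update_norms(Lattice *lat)
{
    assert(lat != NULL);
    for (size_t i = 0; i < lat->dim; i++) {
        long long s = 0;
        for (size_t j = 0; j < lat->dim; j++)
            s += lat->Coord[i][j] * lat->Coord[i][j];
        lat->Norm2[i] = s;
    }
}
```

Caching the norms next to the coordinates lets the sieve compare vector lengths without recomputing inner products in every iteration.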

Observation 1.
In the GaussSieve implementation developed by Micciancio and Voulgaris in [5], the lattices given as input parameters are randomly generated. For this implementation, it is difficult to estimate the total number of collisions (the termination criterion) at which the shortest vectors of the randomly generated lattices are found. Furthermore, the shortest vectors of the randomly generated lattices are unknown, so the correctness of the shortest vector found by the implementation cannot be theoretically proven.

Observation 2.
It is not easy to determine the total number of collisions, which is the termination criterion of sieving algorithms [41]. When a small value is chosen as the termination criterion, the implementations of sieving algorithms may stop before they find the shortest lattice vector. On the other hand, when a very large value is chosen, the implementations may continue to run even after finding the shortest lattice vector and thus use resources unnecessarily.
In Remark 1, the comparison details are given.

Remark 1.
To compare the run times of the efficient implementations of GaussSieve and ProGaussSieve fairly with those of the GaussSieve and the ProGaussSieve implementations in [6], memory space is used as the termination criterion in this paper. Using memory space as the termination criterion also offers a different perspective on the problem of determining the termination criterion in sieving algorithms. The amount of memory that the implementations in the literature have consumed at the point where they find the shortest vectors is set as the termination criterion for the efficient implementations of GaussSieve and ProGaussSieve developed in this paper.
The lattices generated with SageMath are given as input parameters to the efficient implementations in this paper. The efficient implementations return the shortest vectors as output once they have consumed the memory space selected as the termination criterion. Both the efficient implementations in this paper and the implementations in the literature output the shortest vectors of the input lattices within the same memory space.
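A memory-based termination check of the kind described in Remark 1 can be sketched in C as follows. How memory consumption is measured in the implementations is not specified in the text, so the accounting used here (the coordinates plus one stored norm per list vector, all of type long long int) and the function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Estimated number of bytes held by a list of num_vectors vectors of
 * dimension dim: dim coordinates plus one cached norm per vector. */
size_t list_bytes(size_t num_vectors, size_t dim)
{
    assert(dim > 0);
    return num_vectors * (dim + 1) * sizeof(long long);
}

/* Returns 1 once the list has consumed the memory budget, 0 otherwise. */
int memory_limit_reached(size_t num_vectors, size_t dim, size_t budget_bytes)
{
    return list_bytes(num_vectors, dim) >= budget_bytes;
}
```

In a sieve loop, such a check would take the place of the comparison between the collision counter cl and the termination criterion c, with the budget set to the memory that a reference implementation consumed when it found the shortest vector.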

The HashSieve Implementation
The efficient implementation of the HashSieve algorithm was developed based on the HashSieve implementation in [10]. Unlike the implementation in [10], the efficient implementation of HashSieve, which uses the modular software infrastructure library, was written in the C programming language for the 64-bit architecture.
In the efficient implementation of HashSieve, the variables were defined as pointers and double pointers to reduce the run time. The struct data structures are used for variables such as lattices with different properties. In addition to the struct data structures in the HashSieve implementation in [10], the structures shown in Code 4 are used. The variables in the data structures in Code 4 are defined as pointers or double pointers to provide faster access to data in memory, whereas in the HashSieve implementation in [10] the same variables are defined as standard arrays. This difference in the definition of variables is one of the factors that allow the efficient implementation of HashSieve to achieve a lower run time than the HashSieve implementation in [10]. The data structures in Code 4 define the coordinates (matrix, dmatrix) and the Euclidean norms (matrixNorm, dmatrixNorm) of lattices of the data types long long int and double. The lattices used in the HashSieve implementation in [10] are given as input parameters to the efficient implementation of HashSieve. For the same lattice samples, the efficient implementation of HashSieve outputs the same shortest lattice vectors as the HashSieve implementation in [10].

The ENUM and The BKZ Implementations
The ENUM algorithm is used as a subprocess to find the shortest vector; in the BKZ reduction algorithm of Schnorr and Euchner, this corresponds to obtaining the new lattice vector. In the ENUM implementation, the variables are mostly defined as pointers or double pointers to keep the run time low.
Since the ENUM implementation uses the software library with the 64-bit architecture as its infrastructure, the integer variables in this implementation are defined with the long long int data type. The vector arithmetic is carried out with the arithmetic operations modules of the software library. The developed ENUM implementation is added to the modular software library as a module for use in the efficient implementation of Schnorr and Euchner's BKZ.
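To make the role of enumeration concrete, the following C sketch performs a plain depth-first enumeration of integer coefficient vectors inside a given squared radius, using the Gram-Schmidt coefficients and squared norms of the basis. It is a simplified illustration: it omits the zig-zag ordering and other refinements of the Schnorr-Euchner variant, and every name in it (EnumState, enum_rec, enum_try, enumerate_shortest, ENUM_MAX_DIM) is hypothetical rather than taken from the library described in this paper.

```c
#include <assert.h>
#include <stddef.h>

#define ENUM_MAX_DIM 64  /* assumed upper bound on the enumeration dimension */

typedef struct {
    size_t n;                           /* number of basis vectors */
    double **mu;                        /* Gram-Schmidt coefficients mu[j][i], i < j */
    const double *norm2;                /* squared norms of the Gram-Schmidt vectors */
    double best;                        /* squared norm of the best vector so far */
    int found;                          /* 1 once a nonzero vector beats the radius */
    long long int x[ENUM_MAX_DIM];      /* current coefficients */
    long long int best_x[ENUM_MAX_DIM]; /* coefficients of the best vector */
} EnumState;

static void enum_rec(EnumState *s, size_t k, double rho);

/* Try value x for coefficient i; returns 0 when the branch is pruned. */
static int enum_try(EnumState *s, size_t i, double c, long long int x, double rho)
{
    double d = (double)x - c;
    double next = rho + d * d * s->norm2[i];
    if (next >= s->best) return 0;
    s->x[i] = x;
    enum_rec(s, i, next);
    return 1;
}

/* k coefficients (indices 0..k-1) are still free; rho is the partial squared norm. */
static void enum_rec(EnumState *s, size_t k, double rho)
{
    if (k == 0) {
        int nonzero = 0;
        for (size_t i = 0; i < s->n; i++) nonzero |= (s->x[i] != 0);
        if (nonzero && rho < s->best) {
            s->best = rho;
            s->found = 1;
            for (size_t i = 0; i < s->n; i++) s->best_x[i] = s->x[i];
        }
        return;
    }
    size_t i = k - 1;
    double c = 0.0;  /* centre of the admissible interval for x[i] */
    for (size_t j = i + 1; j < s->n; j++)
        c -= (double)s->x[j] * s->mu[j][i];
    long long int r = (long long int)(c >= 0.0 ? c + 0.5 : c - 0.5);
    /* Walk upwards from the nearest integer, then downwards, until pruned. */
    for (long long int x = r; enum_try(s, i, c, x, rho); x++) { }
    for (long long int x = r - 1; enum_try(s, i, c, x, rho); x--) { }
}

/* Returns 1 and fills coeffs/out_norm2 if a nonzero vector with squared
 * norm strictly below radius2 exists; returns 0 otherwise. */
int enumerate_shortest(double **mu, const double *norm2, size_t n,
                       double radius2, long long int *coeffs, double *out_norm2)
{
    assert(mu != NULL && norm2 != NULL && coeffs != NULL && out_norm2 != NULL);
    assert(n > 0 && n <= ENUM_MAX_DIM);
    for (size_t i = 0; i < n; i++) assert(norm2[i] > 0.0);
    EnumState s = { .n = n, .mu = mu, .norm2 = norm2, .best = radius2, .found = 0 };
    enum_rec(&s, n, 0.0);
    if (!s.found) return 0;
    for (size_t i = 0; i < n; i++) coeffs[i] = s.best_x[i];
    *out_norm2 = s.best;
    return 1;
}
```

In a BKZ-style loop, the radius would typically be set to the squared norm of the first vector of the current block, so that a return value of 0 means the block already starts with its shortest vector. That usage is an assumption about how such a routine is commonly called, not a description of the module in this paper.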
The efficient and practical version of Schnorr and Euchner's BKZ algorithm is implemented using the modular software library developed in this paper as the infrastructure. The efficient implementation of BKZ uses the LLLFP reduction module of the software library as a subprocess. Observation 3 describes the zero vector problem in the LLLFP algorithm.

Observation 3.
While testing the LLLFP module of the software library, which is based on the LLL reduction algorithm and designed to minimize floating-point errors, the zero vector problem was encountered. For some particular input lattices, the LLLFP module computes a zero vector as the first lattice vector during the reduction process. This is undesirable, because the reduction cannot proceed correctly when the first vector of the lattice is a zero vector.
Remark 2 describes the solution to the zero vector problem in the LLLFP algorithm.

Remark 2.
In [8], to solve the zero vector problem in the LLLFP reduction algorithm, it is proposed to remove the zero vector from the lattice and to continue the reduction without the zero vector. This proposed solution is implemented in the LLLFP module in the software library with a slight difference.
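The removal step can be sketched in C as follows. This minimal example assumes that the basis is stored as an array of row pointers and moves zero rows behind the nonzero ones, so that the reduction can continue on the leading rows. The function names and the in-place layout are assumptions for illustration; the sketch does not reproduce the slightly different variant used in the LLLFP module.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every coordinate of v is zero, 0 otherwise. */
static int is_zero_vector(const long long int *v, size_t dim)
{
    for (size_t j = 0; j < dim; j++)
        if (v[j] != 0) return 0;
    return 1;
}

/* Moves all nonzero rows to the front of basis, preserving their order,
 * and returns how many there are. Zero rows end up behind them, so no
 * pointer is lost and the caller keeps ownership of every row. */
size_t remove_zero_vectors(long long int **basis, size_t rows, size_t dim)
{
    assert(basis != NULL && dim > 0);
    size_t kept = 0;
    for (size_t i = 0; i < rows; i++) {
        if (!is_zero_vector(basis[i], dim)) {
            long long int *tmp = basis[kept];  /* a zero row whenever kept < i */
            basis[kept] = basis[i];
            basis[i] = tmp;
            kept++;
        }
    }
    return kept;
}
```

After the call, a reduction routine would operate only on the first `kept` rows. Swapping pointers rather than copying coordinates keeps the cost of the cleanup linear in the number of rows plus the cost of the zero tests.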
The ENUM module of the software library is used as a subprocess in the efficient implementation of BKZ to find the shortest vector. The efficient implementation of BKZ is written in the C programming language and uses the Gram-Schmidt module of the modular software infrastructure library to calculate the Gram-Schmidt coefficients and the Euclidean norms of the Gram-Schmidt orthogonalized lattice vectors. Since vector arithmetic operations are carried out continuously in the BKZ algorithm, the arithmetic operations module of the software library is used in the BKZ implementation. In line with the 64-bit architecture, the integer variables in the BKZ implementation are defined with the long long int data type. In addition, struct data structures and pointers or double pointers are used throughout the efficient implementation of BKZ.
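The quantities that such a Gram-Schmidt module produces can be illustrated with the following C sketch of classical Gram-Schmidt orthogonalization on a basis stored as row pointers. The function name, the argument order, and the use of double precision are assumptions made for this example and are not taken from the library.

```c
#include <assert.h>
#include <stddef.h>

/* Inner product of two vectors of length dim. */
static double dot(const double *x, const double *y, size_t dim)
{
    double s = 0.0;
    for (size_t k = 0; k < dim; k++) s += x[k] * y[k];
    return s;
}

/* Classical Gram-Schmidt on the n rows of b (each of length dim).
 * On success: bstar[i] holds the orthogonalized vector b*_i,
 * mu[i][j] = <b_i, b*_j> / <b*_j, b*_j> for j < i, mu[i][i] = 1,
 * and norm2[i] = ||b*_i||^2. mu must provide n rows of at least n entries.
 * Returns 0 on success and -1 if some b*_i is the zero vector. */
int gram_schmidt(double **b, size_t n, size_t dim,
                 double **bstar, double **mu, double *norm2)
{
    assert(b != NULL && bstar != NULL && mu != NULL && norm2 != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < dim; k++)
            bstar[i][k] = b[i][k];
        for (size_t j = 0; j < i; j++) {
            mu[i][j] = dot(b[i], bstar[j], dim) / norm2[j];
            for (size_t k = 0; k < dim; k++)
                bstar[i][k] -= mu[i][j] * bstar[j][k];
        }
        mu[i][i] = 1.0;
        norm2[i] = dot(bstar[i], bstar[i], dim);
        if (norm2[i] == 0.0) return -1;  /* linearly dependent input */
    }
    return 0;
}
```

For the two-dimensional basis (3, 1), (2, 2), this yields a coefficient of 0.8 between the two vectors and squared norms of 10 and 1.6 for the orthogonalized vectors. Implementations aimed at larger dimensions often prefer numerically more stable variants.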

Experimental Results
In this section, we give the details about the experimental results.

Settings
The efficient implementations and the modular software library were developed and tested on the x86 solution platform in Visual Studio Community 2017 version 15.9.11. The compiler's default compilation options were used. The average run times of the efficient implementations developed in this paper were measured on a server with 2× Intel Xeon E5-2630V4 (20 Core) processors and 64 GB of RAM.

Results
The average run time of each efficient implementation was calculated by running it at least 1000 times. The average run times are given in Table 2. Because Table 2 reports measured run times, the average run times of the implementations in the literature are not included in this table for comparison. Lattices randomly generated with SageMath were used as input to test the efficient implementations of GaussSieve and ProGaussSieve. When the outputs of these implementations were evaluated, it was observed that they find shorter vectors in each iteration and return the shortest possible vector. Based on the experimental results, the run time complexities were computed in big-O notation from the measured run times. Figure 2 shows, as functions of the lattice size, the linear functions fitted to the exponents and the resulting run time complexity curves of the efficient implementations. Remark 3 discusses how the complexities of GaussSieve and ProGaussSieve were computed. Note that these complexities are derived from the experimental results.

Remark 3.
The run time complexities of the efficient implementations of GaussSieve and ProGaussSieve were computed by curve fitting with a Linear Regression model [42]. The exponential run time complexities were calculated from the measured run times in Table 2. The exponent is assumed to be a linear function of $n$, namely $y = an + b$. The constants $a$ and $b$ were estimated by fitting the Linear Regression model, which yields the run time complexities of the efficient implementations in the form $2^{y} = 2^{an+b}$. As a result, the run time complexities of the implementations of GaussSieve and ProGaussSieve were found to be $2^{0.21n-9.5}$ and $2^{0.17n-9}$, respectively. The run time complexities of the efficient implementations and the implementations in [6] are given in Table 3. The source code of the Linear Regression model, which was developed with the scikit-learn module [43] in the Python programming language, is available at https://github.com/hsatilmis/modular_software_library/blob/master/(pro)gausssieve_curve_fitting.ipynb (accessed on 2 July 2021). Remark 4 discusses the performance improvements.
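The fitting step can be illustrated with a closed-form ordinary least squares fit of a straight line. The C sketch below is a stand-in for the scikit-learn model used by the authors, not a port of their notebook. It assumes that each y value is the base-2 logarithm of a measured run time, computed by the caller, and the function name fit_line is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinary least squares fit of y = a*n + b to count data points.
 * n[i] is the lattice size and y[i] the base-2 logarithm of the run
 * time measured for that size (the caller computes the logarithm).
 * Returns 0 on success, or -1 if there are fewer than two points or
 * all n[i] are identical. */
int fit_line(const double *n, const double *y, size_t count,
             double *a, double *b)
{
    assert(n != NULL && y != NULL && a != NULL && b != NULL);
    if (count < 2) return -1;

    double mean_n = 0.0, mean_y = 0.0;
    for (size_t i = 0; i < count; i++) {
        mean_n += n[i];
        mean_y += y[i];
    }
    mean_n /= (double)count;
    mean_y /= (double)count;

    double sxx = 0.0, sxy = 0.0;  /* centred sums of squares and products */
    for (size_t i = 0; i < count; i++) {
        double dn = n[i] - mean_n;
        sxx += dn * dn;
        sxy += dn * (y[i] - mean_y);
    }
    if (sxx == 0.0) return -1;

    *a = sxy / sxx;               /* slope: growth rate of the exponent */
    *b = mean_y - (*a) * mean_n;  /* intercept */
    return 0;
}
```

For points that lie exactly on the line $y = 0.21n - 9.5$, the function returns $a \approx 0.21$ and $b \approx -9.5$ up to rounding, and the fitted run time model is then $2^{an+b}$.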

Remark 4.
In the efficient implementations of GaussSieve and ProGaussSieve, the space complexity values obtained from the results in [6] were used as the termination criterion. Therefore, the efficient implementations of GaussSieve and ProGaussSieve and the implementations in [6] use the same memory space. According to the experimental results on small lattice sizes, which were used to compare the run times, the developed GaussSieve implementation is at least 70% faster than the GaussSieve implementation in [6]. Moreover, the run time of ProGaussSieve is improved by almost 75% compared with Laarhoven and Mariano's ProGaussSieve implementation in [6].
To test the efficient implementation of HashSieve, the lattices used in the HashSieve implementation in [10] were given to it as input. The outputs of the efficient implementation were compared with those of the HashSieve implementation in [10], and it was concluded that the efficient implementation works correctly. Remark 5 discusses the performance analysis of the HashSieve implementation.

Remark 5.
Considering the experimental results for all lattice sizes, the proposed HashSieve implementation is at least 49% more efficient than Laarhoven's standard HashSieve implementation in [10].
Lattices randomly generated with SageMath were used as input to test the efficient implementation of BKZ. The same lattice samples were also given as input to the BKZ algorithm in SageMath, and the outputs of the two were compared. The comparison shows that, except for some particular lattice samples, the efficient implementation of BKZ gives correct outputs. Remark 6 discusses the reasons for the improved run times.

Remark 6.
There are two main reasons why the efficient implementations of algorithms developed in this paper have better run times than previous ones.

1. The common subcomponents of the algorithms are used as subprocesses. The algorithms call these subcomponents constantly during their operation, so the run time of the implementations depends directly on them. For this reason, a modular software library that includes the common subcomponents as modules was developed. In the variable definitions of these modules, pointers and double pointers are used extensively. Because pointers provide quick access to data in memory, the processing speed of the modules is increased. Therefore, the run time of the efficient implementations, which use the modules as subprocesses, is improved.

2. The algorithms require vector arithmetic on lattices. Therefore, data structures are needed to define the vectors and the lattices in the efficient implementations. To obtain efficient implementations, pointers and struct data structures were used, which gives quick access to the data and the vector elements.

Conclusions and Future Works
In this paper, a modular software infrastructure library was developed to provide an infrastructure for efficient implementations of the sieving, enumeration, and reduction algorithms. Using the modules in this software library, efficient implementations of the GaussSieve, ProGaussSieve, HashSieve, ENUM, and BKZ algorithms were developed. The correctness of the developed implementations was assessed by comparing their outputs with those of the implementations in the literature. The run times and the memory usage of these efficient implementations were reported. The run time complexities of the efficient implementations of GaussSieve and ProGaussSieve were calculated and compared with those in the literature. It is concluded that the efficient implementations of GaussSieve and ProGaussSieve, which use the memory space as a termination criterion, have better run time complexities than the implementations in the literature. According to the experimental results, the efficient implementations of GaussSieve and ProGaussSieve are at least 70% and 75% more efficient in terms of run time than previous ones, respectively. Finally, the efficient implementation of HashSieve is at least 49% more efficient in terms of run time than the implementation in the literature on which it is based. In future studies, we aim to develop parallel versions of the implementations developed in this paper to improve their efficiency further.