Faster provable sieving algorithms for the Shortest Vector Problem and the Closest Vector Problem on lattices in $\ell_p$ norm

In this work, we give provable sieving algorithms for the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP) on lattices in $\ell_p$ norm ($1\leq p\leq\infty$). The running time we obtain is better than that of existing provable sieving algorithms. We give a new linear sieving procedure that works for all $\ell_p$ norms ($1\leq p\leq\infty$). The main idea is to divide the space into hypercubes such that each vector can be mapped efficiently to a sub-region. We achieve a time complexity of $2^{2.751n+o(n)}$, which is much less than the $2^{3.849n+o(n)}$ complexity of the previous best algorithm. We also introduce a mixed sieving procedure, where a point is mapped to a hypercube within a ball and then a quadratic sieve is performed within each hypercube. This improves the running time, especially in the $\ell_2$ norm, where we achieve a time complexity of $2^{2.25n+o(n)}$, while the List Sieve Birthday algorithm has a running time of $2^{2.465n+o(n)}$. We adapt our sieving techniques to approximation algorithms for SVP and CVP in $\ell_p$ norm ($1\leq p\leq\infty$) and show that our algorithm has a running time of $2^{2.001n+o(n)}$, while previous algorithms have a time complexity of $2^{3.169n+o(n)}$.


Introduction
A lattice $L$ is the set of all integer combinations of linearly independent vectors $b_1, \ldots, b_n \in \mathbb{R}^d$. We call $n$ the rank of the lattice and $d$ the dimension of the lattice. The matrix $B = (b_1, \ldots, b_n)$ is called a basis of $L$. A lattice is said to be full-rank if $n = d$. In this work, we only consider full-rank lattices unless otherwise stated.
The two most important computational problems on lattices are the Shortest Vector Problem (SVP) and the Closest Vector Problem (CVP). Given a basis for a lattice $L \subseteq \mathbb{R}^d$, the goal of SVP is to compute the shortest non-zero vector in $L$, while the goal of CVP is to compute a lattice vector at a minimum distance to a given target vector $t$. Typically, the length/distance is defined in terms of the $\ell_p$ norm, which is given by $\|x\|_p = (|x_1|^p + |x_2|^p + \cdots + |x_d|^p)^{1/p}$ for $1 \le p < \infty$ and $\|x\|_\infty = \max_{1 \le i \le d} |x_i|$.
* mukhopadhyay.priyanka@gmail.com, p3mukhop@uwaterloo.ca
These lattice problems have been mostly studied in the Euclidean norm ($p = 2$). Starting with the seminal work of [1], algorithms for solving these problems either exactly or approximately have been studied intensely. These algorithms have found applications in various fields, such as factoring polynomials over rationals [1], integer programming [2,3,4,5], cryptanalysis [6,7,8], checking the solvability by radicals [9], and solving low-density subset-sum problems [10]. More recently, many powerful cryptographic primitives have been constructed whose security is based on the worst-case hardness of these or related lattice problems [11,12,13,14,15,16,17,18,19].
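As a concrete illustration of the $\ell_p$ norm used throughout the paper, the following minimal Python sketch evaluates it for finite $p$ and for $p = \infty$. The function name `lp_norm` is a hypothetical helper chosen for illustration and does not appear in the paper.

```python
import math

def lp_norm(x, p):
    """Return the l_p norm of the sequence x for 1 <= p <= infinity."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    if math.isinf(p):
        # l_infinity norm: largest absolute coordinate
        return max((abs(xi) for xi in x), default=0.0)
    # (|x_1|^p + ... + |x_d|^p)^(1/p)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
print(lp_norm(x, 1))         # 7.0
print(lp_norm(x, 2))         # 5.0
print(lp_norm(x, math.inf))  # 4.0
```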

Prior Work
The lattice algorithms that have been developed to solve SVP and CVP are either based on sieving techniques [21,20], enumeration methods [22,3], basis reduction [1,23], or Voronoi cell-based deterministic computation [24,4,25]. The fastest of these run in a time of $2^{cn}$, where $n$ is the rank of the lattice and $c$ is some constant. Since the aim of this paper is to improve the time complexity of sieving algorithms, we mainly focus on these.
For an overview of the other types of algorithms, interested readers can refer to the survey by Hanrot et al. [26].

Sieving Algorithms in the Euclidean Norm
The first algorithm to solve SVP in time exponential in the dimension of the lattice was given by Ajtai, Kumar, and Sivakumar [21], who devised a method based on "randomized sieving", whereby exponentially many randomly generated lattice vectors are iteratively combined to create increasingly short vectors, eventually resulting in the shortest vector in the lattice. The time complexity of this algorithm was shown to be $2^{3.4n+o(n)}$ by Micciancio and Voulgaris [27]. This was later improved by Pujol and Stehlé [28], who analyzed it with the birthday paradox and gave a time complexity of $2^{2.571n+o(n)}$. In [27] the authors introduced List Sieve, which was modified in [28] (List Sieve Birthday) to give a time complexity of $2^{2.465n+o(n)}$. The current fastest provable algorithm for exact SVP runs in a time of $2^{n+o(n)}$ [20,29], and the fastest algorithm that gives a large constant approximation runs in a time of $2^{0.802n+o(n)}$ [30].
To make lattice sieving algorithms more practical for implementation, heuristic variants were introduced in [31,27]. Efforts have been made to decrease the asymptotic time complexity at the cost of using more space [32,33,34,35] and to study the trade-offs in reducing the space complexity [35,36,37,38]. Attempts have been made to make these algorithms competitive in high-performance computing environments [39,40,41,42,43]. The theoretically fastest heuristic algorithm that is conjectured to solve SVP runs in a time of $2^{0.29n+o(n)}$ [33] (LDSieve).
CVP is considered to be a harder problem than SVP since there is a simple dimension- and approximation-factor-preserving reduction from SVP to CVP [44]. Based on a technique due to Kannan [3], Ajtai, Kumar, and Sivakumar [45] gave a provable sieving-based algorithm that gives a $1 + \alpha$ approximation of CVP in time $(2 + 1/\alpha)^{O(n)}$. Later, exact exponential time algorithms for CVP were discovered [24,46]. The current fastest algorithm for CVP runs in a time of $2^{n+o(n)}$ and is due to [46].

Algorithms in Other $\ell_p$ Norms
Blömer and Naewe [47] and then Arvind and Joglekar [48] generalized the AKS algorithm [21] to give exact provable algorithms for SVP that run in a time of $2^{O(n)}$. Additionally, [47] gave a $1 + \epsilon$ approximation algorithm for CVP for all $\ell_p$ norms that runs in a time of $(2 + 1/\epsilon)^{O(n)}$. For the special case when $p = \infty$, Eisenbrand et al. [5] gave a $2^{O(n)} \cdot (\log(1/\epsilon))^n$ algorithm for $(1+\epsilon)$-approximate CVP. Aggarwal and Mukhopadhyay [49] gave an algorithm for SVP and approximate CVP in the $\ell_\infty$ norm using a linear sieving technique that significantly improves the overall running time. In fact, for a large constant approximation factor, they achieved a running time of $3^n$ for SVP. The authors have argued that it is not possible for any of the above-mentioned algorithms to achieve this running time in the $\ell_\infty$ norm.

Hardness Results
The first NP-hardness result for CVP in all $\ell_p$ norms and SVP in the $\ell_\infty$ norm was given by Van Emde Boas [50]. Ajtai [51] proved that SVP is NP-hard under randomized reductions. Micciancio [52] showed that SVP is NP-hard to approximate within some constant approximation factor. Subsequently, it was shown that approximating CVP in any $\ell_p$ norm and SVP in the $\ell_\infty$ norm up to a factor of $n^{c/\log\log n}$ is NP-hard [53,54]. This hardness factor has been improved to $n^c$ in [55], assuming the Projection Games Conjecture [56]. Furthermore, the hardness of SVP up to a factor of $2^{\log^{1-\epsilon} n}$ has been obtained assuming $\mathrm{NP} \not\subseteq \mathrm{RTIME}(n^{\mathrm{poly}(\log n)})$ [57,58]. Recently, [59] showed that for almost all $p \ge 1$, CVP in the $\ell_p$ norm cannot be solved in $2^{n(1-\varepsilon)}$ time under the strong exponential time hypothesis. A similar hardness result has also been obtained for SVP in the $\ell_p$ norm [60].

Our Results and Techniques
In this paper, we adopt the framework of [21,45] and give sieving algorithms for SVP and CVP in $\ell_p$ norm for $1 \le p \le \infty$. The primary difference between our sieving algorithm and the previous AKS-style algorithms such as those in [21,45,47,48] is in the sieving procedure: ours is a linear sieve, while theirs is a quadratic sieve. This results in an improvement in the overall running time.
Before describing our idea, we give an informal description of the sieving procedure of [21,45,47,48]. The algorithm starts by randomly generating a set $S$ of $N = 2^{O(n)}$ lattice vectors with a length of at most $R = 2^{O(n)}$. It then runs a sieving procedure a polynomial number of times. In the $i$-th iteration, the algorithm starts with a list $S$ of lattice vectors of a length of at most $R_{i-1} \approx \gamma^{i-1} R$, for some parameter $\gamma \in (0, 1)$. The algorithm maintains and updates a list of "centers" $C$, which is initialized to be the empty set. Then, for each lattice vector $y$ in the list, the algorithm checks whether there is a center $c$ at a distance of at most $\gamma \cdot R_{i-1}$ from this vector. If there exists such a center, then the vector $y$ is replaced in the list by $y - c$, and otherwise it is deleted from $S$ and added to $C$. This results in $N_{i-1} - |C|$ lattice vectors which have a length of at most $R_i \approx \gamma R_{i-1}$, where $N_{i-1}$ is the number of lattice vectors at the end of $i - 1$ sieving iterations. We would like to mention here that this description hides many details and in particular, in order to show that this algorithm succeeds eventually in obtaining the shortest vector, we need to add a small perturbation to the lattice vectors to start with. The details of this can be found in Section 3.
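The informal quadratic sieving pass just described can be sketched in Python as follows. This is only an illustrative sketch under simplifying assumptions: it works directly on plain vectors and omits the perturbations that the provable analysis needs, and the names `lp_norm` and `quadratic_sieve_step` are hypothetical helpers, not taken from the paper.

```python
import math

def lp_norm(x, p):
    """l_p norm of x for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def quadratic_sieve_step(vectors, gamma, p):
    """One sieving pass; returns (shorter_vectors, centers)."""
    radius = max(lp_norm(v, p) for v in vectors)  # plays the role of R_{i-1}
    centers, output = [], []
    for y in vectors:
        # Scanning all centers for every vector is what makes the pass quadratic.
        for c in centers:
            diff = tuple(a - b for a, b in zip(y, c))
            if lp_norm(diff, p) <= gamma * radius:
                output.append(diff)   # y is replaced by y - c
                break
        else:
            centers.append(y)         # no center is close enough: y becomes a center
    return output, centers

vectors = [(4, 0), (4, 1), (0, 4), (1, 4), (-4, 0)]
shorter, centers = quadratic_sieve_step(vectors, 0.5, 2)
print(shorter)   # [(0, 1), (1, 0)]
print(centers)   # [(4, 0), (0, 4), (-4, 0)]
```

Every vector in the output has norm at most `gamma * radius`, and the number of output vectors equals the input size minus the number of centers.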
A crucial step in this algorithm is to find a vector c from the list of centers that is close to y. This problem is called the nearest neighbor search (NNS) problem and has been well studied, especially in the context of heuristic algorithms for SVP (see [33] and the references therein). A trivial bound on the running time for this is |S| · |C|, but much effort has been dedicated to improving this bound under heuristic assumptions (see Section 1.1.1 for some references). Since they require heuristic assumptions, such improved algorithms for the NNS have not been used to improve the provable algorithms for SVP.
One can also view such sieving procedures as a division of the "ambient" geometric space (consisting of all the vectors in the current list). In the $i$-th iteration, the space of all vectors with a length of at most $R_{i-1}$ is divided into a number of sub-regions such that in each sub-region the vectors are within a distance of at most $\gamma R_{i-1}$ from a center. In the previous provable sieving algorithms such as those in [21,47,48,27] or even the heuristic ones, these sub-regions have been an $\ell_p$ ball of a certain radius (if the algorithm is in $\ell_p$ norm) or some sections of it (spherical cap, etc.). Given a vector, one has to compare it with all the centers (and hence sub-regions formed so far) to determine to which of these sub-regions it belongs. If none is found, we make it a center and associate a new sub-region with it. Note that such a division of space depends on the order in which the vectors are processed.
The basic idea behind our sieving procedure (let us call it Linear Sieve) is similar to that used in [49,61] in the special case of the $\ell_\infty$ norm. In fact, our procedure is a generalization of this method to all $\ell_p$ norms ($1 \le p \le \infty$). We select these sub-regions as hypercubes and divide the ambient geometric space a priori (before we start processing the vectors in the current list), considering only the maximum length of a vector in the list. A diagrammatic representation of such a division of space in two dimensions is given in Figure 1. It must be noted that in this figure (for ease of illustration), the radius of the small hypercube (square) is the same for $\ell_1$, $\ell_2$, and $\ell_\infty$ balls (circles). However, in our algorithm, this radius depends on the norm. The advantage we obtain is that we can map a vector to a sub-region efficiently, in $O(n)$ time; i.e., in a sense we obtain a better "decodability" property. If the vector's hypercube (sub-region) does not contain a center, we select this point as the center; otherwise, we subtract this vector from the center to obtain a shorter lattice vector. Thus, the time complexity of each sieving procedure is linear in the number of sampled vectors. Overall, we obtain an improved time complexity at the cost of increased space complexity compared to previous algorithms [48,47,26]. A more detailed explanation can be found in Section 3.1.
Figure 1: Division of the area of a circle in $\ell_1$, $\ell_2$, and $\ell_\infty$ norm (respectively) into smaller squares.
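Specifically, we obtain the following result. Let $\gamma \in (0, 1)$, and let $\xi > 1/2$. Given a full-rank lattice $L \subset \mathbb{Q}^n$, there is a randomized algorithm for $\mathrm{SVP}^{(p)}$ with a success probability of at least $1/2$, space complexity of at most $2^{c_{\mathrm{space}} n+o(n)}$, and running time of at most $2^{c_{\mathrm{time}} n+o(n)}$, where $c_{\mathrm{space}} = c_s + \max(c_c, c_b)$ and $c_{\mathrm{time}} = \max(c_{\mathrm{space}}, 2c_b)$.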

A mixed sieving algorithm
In an attempt to gain as many advantages as possible, we introduce a mixed sieving procedure (let us call it Mixed Sieve). Here, we divide a hyperball into larger hypercubes so that we can map each point efficiently to a hypercube. Within a hypercube, we perform a quadratic sieving procedure such as AKS with the vectors in that region. This improves both time and space complexity, especially in the Euclidean norm.
Approximation algorithms for $\mathrm{SVP}^{(p)}$ and $\mathrm{CVP}^{(p)}$
We have adapted our sieving techniques to approximation algorithms for $\mathrm{SVP}^{(p)}$ and $\mathrm{CVP}^{(p)}$. The idea is quite similar to that described in [49,61] (where it was shown to work for only the $\ell_\infty$ norm). In Section 5.1, we show that our approximation algorithms are faster than those of [48,47], but again they require more space.
Remark 1.1. It is quite straightforward to extend our algorithm to the Subspace Avoiding Problem (SAP) (or Generalized Shortest Vector Problem, GSVP) [47,48]: replace the quadratic sieve by any one of the faster sieves described in this paper. We thus obtain a similar improvement in running time. By Theorem 3.4 in [47], there are polynomial time reductions from other lattice problems, such as the Successive Minima Problem (SMP) with approximation factor $1 + \epsilon$, to GSVP with approximation factor $1 + \epsilon$. (Given a lattice $L$ with rank $n$, SMP requires finding $n$ linearly independent vectors $v_1, \ldots, v_n \in L$ such that $\|v_i\|_p \le c\,\lambda_i^{(p)}(L)$; the successive minima $\lambda_i^{(p)}(L)$, $i = 1, \ldots, n$, are defined in Section 2 (Definition 2.5), and $c$ is the approximation factor.) Thus, we can obtain a similar improvement in running time for both these problems. Since in this paper we focus mainly on SVP and CVP, we do not delve into further details for these other problems.
Remark 1.2. Our algorithm (and, for that matter, any sieving algorithm) is quite different from deterministic algorithms such as those in [4,62]. They reduce the problem in any norm to the $\ell_2$ norm and compute an approximation of the shortest vector length (or the distance of the closest lattice point to a target in the case of CVP) using the Voronoi cell-based deterministic algorithm in [27]. Then, they enumerate all lattice points within a convex region to find the shortest one. By constructing ellipsoidal coverings, it has been shown that the lattice points within a convex body can be computed in a time proportional to the maximum number of lattice points that the body can contain in any translation of an ellipsoid. Note that for the $\ell_p$ norm, any smaller $\ell_q$ ball (where $p = q$ or $p \ne q$) can serve this purpose, and the bound on the number of translates comes from standard packing arguments. For these deterministic algorithms, the target would be to choose a shape so that the upper bound (packing bound) on the number of translates can be reduced. Thus, the authors chose small $\ell_p$ balls to cover a larger $\ell_p$ ball.
In contrast, in our sieving algorithm, we aim to map each lattice point efficiently to a sub-region. Thus, we divide an arbitrary $\ell_p$ ball into smaller hypercubes. The result is an increase in space complexity, but due to the efficient mapping, we reduce the running time. To the best of our knowledge, this kind of subdivision has not been used before in any sieving algorithm. The focus of our paper is to develop randomized sieving algorithms. Thus, we will not delve further into the details of the above-mentioned deterministic algorithms. Clearly, these are different procedures.

Organization of the Paper
In Section 2, we give some preliminary definitions and results that are useful for this paper. In Section 3, we introduce the linear sieving technique, while in Section 4, we describe the mixed sieving technique. In Section 5, we discuss how to extend our sieving methods to approximation algorithms.

Notations
We write $\log_q$ to represent the logarithm to the base $q$, and simply $\log$ when the base is $q = 2$. We denote the natural logarithm by $\ln$.
We use bold lowercase letters (e.g., $\mathbf{v}^n$) for vectors and bold uppercase letters for matrices (e.g., $\mathbf{M}^{m \times n}$). We may drop the dimension in the superscript whenever it is clear from the context. Sometimes, we represent a matrix as a vector of column vectors. The representation size of $\mathbf{x}$ with respect to $\mathbf{M}$ is the maximum of $n$ and the binary lengths of the numerators and denominators of the coefficients $x_i$. We denote the volume of a geometric body $A$ by $\mathrm{vol}(A)$.

$\ell_p$ Norm and Ball
Definition 2.2. A ball is the set of all points within a fixed distance or radius (defined by a metric) from a fixed point or center. More precisely, we define the (closed) ball centered at $x \in \mathbb{R}^n$ with radius $r$ as $B_n^{(p)}(x, r) = \{y \in \mathbb{R}^n : \|y - x\|_p \le r\}$. We may drop the first argument when the ball is centered at the origin $0$ and drop both arguments for a unit ball centered at the origin. We drop the first argument if the spherical shell or corona is centered at the origin.
The algorithm of Dyer, Frieze, and Kannan [63] almost uniformly selects a point in any convex body in polynomial time if a membership oracle is given [64]. For the sake of simplicity, we ignore the implementation detail and assume that we are able to uniformly select a point in $B_n^{(p)}(x, r)$ in polynomial time.
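For intuition only, uniform sampling from an $\ell_p$ ball can be sketched with plain rejection sampling from the enclosing cube. This is a swapped-in technique, not the algorithm of Dyer, Frieze, and Kannan: it is exactly uniform, but its acceptance rate decays quickly with the dimension, so it is practical only for small $n$. The names `lp_norm` and `sample_lp_ball` are hypothetical.

```python
import math
import random

def lp_norm(x, p):
    """l_p norm of x for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def sample_lp_ball(center, r, p, rng=random):
    """Uniform sample from the closed l_p ball of radius r around center.

    Rejection sampling: draw uniformly from the cube [-r, r]^n, which
    contains every l_p ball of radius r, and keep the first draw that
    lands inside the ball.
    """
    n = len(center)
    while True:
        z = [rng.uniform(-r, r) for _ in range(n)]
        if lp_norm(z, p) <= r:
            return [c + zi for c, zi in zip(center, z)]

rng = random.Random(0)          # fixed seed only to make the demo repeatable
point = sample_lp_ball([1.0, -2.0, 0.5], 2.0, 1, rng)
print(point)
```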

Lattice
Each lattice has a basis $B = (b_1, \ldots, b_n)$. For algorithmic purposes, we can assume that $L \subseteq \mathbb{Q}^d$. We call $n$ the rank of $L$ and $d$ the dimension. If $d = n$, the lattice is said to be full-rank. Though our results can be generalized to arbitrary lattices, in the rest of the paper, we only consider full-rank lattices.
Definition 2.4. For any lattice basis $B$, we define the fundamental parallelepiped as $\mathcal{P}(B) = \{Bx : x \in [0, 1)^n\}$.
If $y \in \mathcal{P}(B)$, then $\|y\|_p \le n\|B\|_p$, as can be easily seen by the triangle inequality. For any $z \in \mathbb{R}^n$, there exists a unique $y \in \mathcal{P}(B)$ such that $z - y \in L(B)$. This vector is denoted by $y \equiv z \bmod B$ and it can be computed in polynomial time given $B$ and $z$.
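The reduction $z \bmod B$ can be sketched as follows: write $z$ in the coordinates of the basis, keep only the fractional parts of the coefficients, and map back. The sketch below assumes the standard half-open parallelepiped (coefficients in $[0, 1)$), a square basis given as a list of basis vectors, and exact rational arithmetic; `solve` and `mod_basis` are hypothetical helper names.

```python
from fractions import Fraction
from math import floor

def solve(basis, z):
    """Solve sum_j x_j * basis[j] = z exactly by Gauss-Jordan elimination."""
    n = len(basis)
    # Augmented matrix whose columns are the basis vectors, followed by z.
    a = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(z[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        lead = a[col][col]
        a[col] = [v / lead for v in a[col]]
        for r in range(n):
            factor = a[r][col]
            if r != col and factor != 0:
                a[r] = [vr - factor * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][n] for i in range(n)]

def mod_basis(basis, z):
    """Return the unique y in P(B) with z - y in L(B)."""
    x = solve(basis, z)
    frac = [xi - floor(xi) for xi in x]   # fractional parts, each in [0, 1)
    n = len(basis)
    return [sum(frac[j] * basis[j][i] for j in range(n)) for i in range(n)]

basis = [[2, 0], [1, 3]]          # basis vectors b_1 = (2, 0), b_2 = (1, 3)
y = mod_basis(basis, [7, 5])
print(y)                          # [Fraction(2, 1), Fraction(2, 1)]
```

Here $z - y = (5, 3) = 2b_1 + b_2$, so the difference is indeed a lattice vector.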
Definition 2.5. For $i \in \{1, \ldots, n\}$, the $i$-th successive minimum is defined as the smallest real number $r$ such that $L$ contains $i$ linearly independent vectors with a length of at most $r$: $\lambda_i^{(p)}(L) = \inf\{r : \dim(\mathrm{span}(L \cap B_n^{(p)}(r))) \ge i\}$.
Thus, the first successive minimum of a lattice is the length of the shortest non-zero vector in the lattice: $\lambda_1^{(p)}(L) = \min\{\|v\|_p : v \in L \setminus \{0\}\}$.
We consider the following lattice problems. In all the problems defined below, $c \ge 1$ is some arbitrary approximation factor (usually specified as a subscript), which can be a constant or a function of any parameter of the lattice (usually the rank). For exact versions of the problems (i.e., $c = 1$), we drop the subscript.
Definition 2.6 (Shortest Vector Problem ($\mathrm{SVP}^{(p)}_c$)). Given a lattice $L$, find a vector $v \in L \setminus \{0\}$ such that $\|v\|_p \le c\|u\|_p$ for any other $u \in L \setminus \{0\}$.
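Definition 2.7 (Closest Vector Problem ($\mathrm{CVP}^{(p)}_c$)). Given a lattice $L$ with rank $n$ and a target vector $t \in \mathbb{R}^n$, find $v \in L$ such that $\|v - t\|_p \le c\|w - t\|_p$ for all other $w \in L$.
Lemma 2.1. The LLL algorithm [1] can be used to solve $\mathrm{SVP}^{(p)}_{2^n}$ in polynomial time.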
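The following result shows that in order to solve $\mathrm{SVP}^{(p)}_{1+\epsilon}$, it is sufficient to consider the case when $2 \le \lambda_1^{(p)}(L) < 3$. This is done by appropriately scaling the lattice.
Lemma 2.2 (Lemma 4.1 in [47]). For all $\ell_p$ norms, if there is an algorithm $A$ that for all lattices $L$ with $2 \le \lambda_1^{(p)}(L) < 3$ solves $\mathrm{SVP}^{(p)}_{1+\epsilon}$, then there is an algorithm $A'$ that solves $\mathrm{SVP}^{(p)}_{1+\epsilon}$ for all lattices, with the running-time overhead stated in [47].
Thus, henceforth, we assume $2 \le \lambda_1^{(p)}(L) < 3$.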

Some Useful Definitions and Results
In this section, we give some results and definitions which are useful for our analysis later.
Definition 2.8. Let $P$ and $Q$ be two point sets in $\mathbb{R}^n$. The Minkowski sum of $P$ and $Q$, denoted as $P \oplus Q$, is the point set $\{p + q : p \in P, q \in Q\}$.
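For finite point sets, the Minkowski sum in Definition 2.8 can be computed directly. The following small sketch only illustrates the definition; `minkowski_sum` is a hypothetical helper name, and the convex bodies used in the paper are of course infinite sets.

```python
def minkowski_sum(P, Q):
    """Return {p + q : p in P, q in Q} for finite sets of equal-length tuples."""
    return {tuple(a + b for a, b in zip(p, q)) for p in P for q in Q}

P = {(0, 0), (1, 0)}
Q = {(0, 0), (0, 1)}
print(sorted(minkowski_sum(P, Q)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```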
If $|D|$ and $|B_1|$ are the volumes of $D$ and $B_1$, respectively, then

[26]
When $p = 2$, further optimization can be done such that we get Here, $\phi_{u,v}$ is the angle between the vectors $u$ and $v$.
Below, we give some bounds which work for all $\ell_p$ norms. We especially mention the bounds obtained for the $\ell_2$ norm where some optimization has been performed using Theorem 2.1.
If $C$ is a set of points in $B_n^{(p)}(R)$ such that the distance between two points is at least $\gamma R$, then $|C| \le 2^{c_c n+o(n)}$.

[27, 26] When
Since the distance between two distinct lattice vectors is at least $\lambda_1^{(p)}(L)$, we obtain the following corollary. Corollary 2.1. Let $L$ be a lattice and $R$ be a real number greater than the length of the shortest vector in the lattice.

A Faster Provable Sieving Algorithm in $\ell_p$ Norm
In this section, we present an algorithm for $\mathrm{SVP}^{(p)}$ that uses the framework of the AKS algorithm [21] but uses a different sieving procedure that yields a faster running time. Using Lemma 2.1, we can obtain an estimate $\lambda^*$ of $\lambda_1^{(p)}(L)$ such that $\lambda_1^{(p)}(L) \le \lambda^* \le 2^n \lambda_1^{(p)}(L)$. Thus, if we try polynomially many different values of $\lambda = (1 + 1/n)^{-i}\lambda^*$, for $i \ge 0$, then for one of them, we have $\lambda_1^{(p)}(L) \le \lambda \le (1 + 1/n)\lambda_1^{(p)}(L)$. For the rest of this section, we assume that we know an estimate $\lambda$ of the length of the shortest vector in $L$, which is correct up to a factor $1 + 1/n$.
The AKS algorithm (or its $\ell_p$ norm generalization in [48,47]) initially samples uniformly a large number of perturbation vectors $e \in B_n^{(p)}(d)$, where $d \in \mathbb{R}_{>0}$, and for each such perturbation vector, it maintains a vector $y$ close to the lattice ($y$ is such that $y - e \in L$). Thus, initially, we have a set $S$ of many such pairs $(e, y)$, to which a sequence of sieving iterations is applied. The desired situation is that after a polynomial number of such sieving iterations, we are left with a set of vector pairs $(e', y')$ such that $y' - e' \in L$ lies in a small $\ell_p$ ball around the origin. Finally, we take the pair-wise differences of the lattice vectors corresponding to these vector pairs and output the one with the smallest non-zero norm. It was shown in [21,48,47] that, with overwhelming probability, this is the shortest vector in the lattice.
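The search over guesses $\lambda = (1 + 1/n)^{-i}\lambda^*$ can be sketched as below. The sketch assumes that $\lambda^*$ overestimates the true shortest length by a factor of at most $2^n$ (the LLL guarantee quoted in Section 2), so the geometric sequence can stop once it drops below $\lambda^*/2^n$; `lambda_guesses` is a hypothetical helper name.

```python
def lambda_guesses(lambda_star, n):
    """Candidate values (1 + 1/n)^(-i) * lambda_star for i = 0, 1, 2, ...

    Assumption: lambda_1 <= lambda_star <= 2^n * lambda_1, so it suffices
    to go down to lambda_star / 2^n. The list has O(n^2) entries.
    """
    guesses = []
    lam = float(lambda_star)
    lower = lambda_star / 2.0 ** n
    while lam >= lower:
        guesses.append(lam)
        lam /= 1.0 + 1.0 / n
    return guesses

guesses = lambda_guesses(16.0, 4)
true_lambda1 = 3.0   # unknown to the algorithm; used here only to check the guarantee
good = [g for g in guesses if true_lambda1 <= g <= (1 + 1 / 4) * true_lambda1]
print(len(guesses), good)
```

Because consecutive guesses differ by a factor of exactly $1 + 1/n$, at least one of them is correct up to that factor, and the algorithm is simply run once per guess.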
One of the main and usually the most expensive steps in this algorithm is the sieving procedure, where, given a list of vector pairs $(e, y)$ with $y \in B_n^{(p)}(R)$, the procedure outputs a list of vector pairs whose vectors are shorter by roughly a factor $\gamma \in (0, 1)$. In each sieving iteration, a number of vector pairs (usually exponential in $n$) are identified as "center pairs". The second element of each such center pair is referred to as the "center". By a well-defined map, each of the remaining vector pairs is associated with a "center pair" such that after certain operations (such as subtraction) on the vectors, we obtain a pair with a vector difference yielding a lattice vector with a norm less than $R$. If we start an iteration with, say, $N$ vector pairs and identify $|C|$ center pairs, then the output consists of $N - |C|$ vector pairs. An illustration is given in Figure 2. In [21] and most other provable variants or generalizations such as [48,47], the running time of this sieving procedure, which is the dominant part of the total running time of the algorithm, is roughly quadratic in the number of sampled vectors.
Figure 2: Among the sampled vector pairs, some are identified as centers (red dots) and the space is divided into a number of balls, centered around these red dots. Vector subtraction (denoted by arrow) is performed with the center pair in each ball, such that we obtain shorter lattice vectors in the next iteration.
Here, we propose a different sieving approach to reduce the overall time complexity of the algorithm. This can be thought of as a generalization of the sieving method introduced in [49] for the ∞ norm. We divide the space such that each lattice vector can be mapped efficiently into some desired division. In the following subsection, we explain this sieving procedure, whose running time is linear in the number of sampled vectors.

Linear Sieve
In the initial AKS algorithm [21,45] as well as in all its variants thereafter [47,48,27], in the sieving subroutine, a space $B_n^{(p)}(R)$ has been divided into sub-regions such that each sub-region is associated with a center. Then, given a vector, we map it to a sub-region and subtract it from the center so that we get a vector of length at most $\gamma R$. We must aim to select these sub-regions such that we can (i) map a vector efficiently to a sub-region (ii) without increasing the number of centers "too much". The latter factor is determined by the number of divisions of $B_n^{(p)}(R)$ into these sub-regions and directly contributes to the space (and hence time) complexity.
In all the previous provable sieving algorithms, the sub-regions were small hyperballs (or parts of them) in $\ell_p$ norm. In this paper, our sub-regions are hypercubes. The choice of this particular sub-region makes the mapping very efficient. First, let us note that, in contrast with the previous algorithms (except [49]), we divide the space a priori. This can be done by dividing each co-ordinate axis into intervals of length $\gamma R / n^{1/p}$ so that the distance between any two vectors in the resulting hypercube is at most $\gamma R$. In an ordered list, we store an appropriate index (say, the co-ordinates of one corner) of only those hypercubes which have a non-zero intersection with $B_n^{(p)}(R)$. We can map a vector to a hypercube in $O(n)$ time simply by looking at the interval to which each of its co-ordinates belongs. If the hypercube contains a center, then we subtract the vectors and store the difference; otherwise, we assign this vector as the center. An illustration is given in Figure 3.
Figure 3: (b) The space is divided into a number of hypercubes with diagonal length $r$, and each vector pair is mapped into a hypercube. (c) Within each hypercube, a subtraction operation (denoted by arrow) is performed between a center (red dot) and the remaining vector pairs, such that we obtain shorter lattice vectors in the next iteration.
The following lemma gives a bound on the number of hypercubes or centers we obtain by this process. Such a volumetric argument can be found in [67]. We do not know whether this is the optimal way of sub-dividing $B_n^{(p)}(R)$ into smaller hypercubes. In [49], it has been shown that if we divide $[-R, R]^n$ from one corner, i.e., place one small hypercube at one corner of the larger hypercube $B$ … (Figure 3(a)). We would like to combine points so that we are left with vectors in $B_n^{(p)}(\gamma R)$. We divide each axis into intervals of length $y = \gamma R / n^{1/p}$ and store in an ordered set ($I$) the co-ordinates of one corner of the resulting hypercubes that have a non-zero intersection with $B_n^{(p)}(R)$ (Figure 3(b)). Note that this can be done in a time of $O(nN_h)$, where $N_h$ is the maximum number of hypercube translates as described in Lemma 3.1.
We maintain a list $C$ of pairs, where the first entry of each pair is an $n$-tuple in $I$ (let us call it the "index-tuple") and the second one, initialized as the empty set, is for storing a center pair. Given $y$, we map it to its index-tuple $I_y$ as follows: we calculate the interval to which each of its co-ordinates belongs (steps 10-13 in Algorithm 2). This can be done in $O(n)$ time. This is equivalent to storing information about the hypercube (in Figure 3(b)) to which it belongs or is mapped. We can access $C[I_y]$ in constant time. For each $(e, y) \in S$, if there exists an $(e_c, c) \in C[I_y]$, i.e., $I_y = I_c$ (implying $\|y - c\|_p \le \gamma R$), then we add $(e, y - c + e_c)$ to the output set $S'$ (Figure 3(c)). Otherwise, we add the vector pair $(e, y)$ to $C[I_y]$ as a center pair. This implies that if there exists a center in the hypercube, then we perform subtraction operations to obtain a shorter vector. Otherwise, we make $(e, y)$ the center for its hypercube. Finally, we return $S'$.
More details of this sieving procedure (Linear Sieve) can be found in Algorithm 2.
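One pass of the Linear Sieve described above can be sketched in Python as follows. This is an illustrative sketch, not a transcription of Algorithm 2: it assumes floating-point coordinates and a non-zero maximum norm, and it uses a hash map keyed by index-tuples created on the fly instead of the pre-enumerated ordered set $I$ of hypercube corners. The names `lp_norm`, `index_tuple`, and `linear_sieve` are hypothetical.

```python
import math

def lp_norm(x, p):
    """l_p norm of x for 1 <= p <= infinity."""
    if math.isinf(p):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def index_tuple(y, side):
    """Map y to the integer index of its hypercube in O(n) time."""
    return tuple(math.floor(yi / side) for yi in y)

def linear_sieve(pairs, gamma, p):
    """One pass over pairs (e, y) with y - e in L; returns (S', centers)."""
    n = len(pairs[0][1])
    radius = max(lp_norm(y, p) for _, y in pairs)   # R
    side = gamma * radius / n ** (1.0 / p)          # for p = inf the divisor is 1
    centers = {}                                    # index-tuple -> center pair
    output = []
    for e, y in pairs:
        key = index_tuple(y, side)
        if key in centers:
            e_c, c = centers[key]
            # Same hypercube, so ||y - c||_p <= gamma * R; the new pair keeps
            # the invariant because (y - c + e_c) - e = (y - e) - (c - e_c) is in L.
            y_new = tuple(a - b + d for a, b, d in zip(y, c, e_c))
            output.append((e, y_new))
        else:
            centers[key] = (e, y)                   # first vector in this cube
    return output, centers

# Toy example on the integer lattice Z^2 with small perturbations e.
pairs = [((0.1, 0.2), (4.1, 3.2)),
         ((-0.2, 0.1), (4.8, 3.1)),
         ((0.0, -0.1), (-3.0, 3.9))]
reduced, centers = linear_sieve(pairs, 0.5, 2)
print(reduced)
print(len(centers))
```

Each vector is handled with one index computation and one dictionary lookup, so the pass is linear in the number of pairs, at the price of storing up to one center per hypercube.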

AKS Algorithm with a Linear Sieve
Algorithm 1 describes an exact algorithm for $\mathrm{SVP}^{(p)}$ with a linear sieving procedure (Linear Sieve) (Algorithm 2).
Proof. This follows from Lemma 3.1 in Section 3.1.
1. The first invariant is maintained at the beginning of the sieving iterations in Algorithm 1 due to the choice of y at step 4 of Algorithm 1.
Since each center pair $(e_c, c)$ once belonged to $S$, $c - e_c \in L$. Thus, at step 15 of the sieving procedure (Algorithm 2), we have $(e - y) + (c - e_c) \in L$.
2. The second invariant is maintained in steps 2-6 of Algorithm 1 because $y \in \mathcal{P}(B)$ and hence $\|y\|_p \le n\|B\|_p$. We claim that this invariant is also maintained in each iteration of the sieving procedure.
Consider a pair $(e, y) \in S$ and let $I_y$ be its index-tuple. Let $(e_c, c)$ be its associated center pair. By Algorithm 2, we have $I_y = I_c$; i.e., $\|y - c\|_p \le \gamma R$, and hence $\|y - c + e_c\|_p \le \|y - c\|_p + \|e_c\|_p \le \gamma R + \xi\lambda$. The claim follows by the re-assignment of variable $R$ at step 10 in Algorithm 1.
In the following lemma, we bound the length of the remaining lattice vectors after all the sieving iterations are over. The proof is similar to that given in [61], so we write it briefly.
Thus, after $k$ iterations, $\|y\|_p \le R_k$, and hence after $k$ iterations, Using Corollary 2.1 and assuming $\lambda \approx \lambda_1^{(p)}(L)$ …
The above lemma along with the invariants implies that at the beginning of step 12 in Algorithm 1, we have "short" lattice vectors; i.e., vectors with a norm bounded by $R'$. We want to start with a "sufficient number" of vector pairs so that we do not end up with all zero vectors at the end of the sieving iterations. For this, we work with the following conceptual modification proposed by Regev [68].
Algorithm 2: Linear Sieve (first steps)
Output: pairs $(e'_i, y'_i)$ with $y'_i \in B_n^{(p)}(\gamma R + \xi\lambda)$ such that $\forall i \in I'$, $y'_i - e'_i \in L$
1 $R \leftarrow \max_{(e,y) \in S} \|y\|_p$;
2 $S' \leftarrow \emptyset$;
3 Divide each axis into intervals of length $\gamma R / n^{1/p}$ and store a corner of those resulting hypercubes with a non-zero intersection with $B_n^{(p)}(R)$;
For the analysis of the algorithm, we assume that for each perturbation vector $e$ chosen by our algorithm, we replace $e$ by $\sigma(e)$ with probability $1/2$ and that it remains unchanged with probability $1/2$. We call this procedure tossing the vector $e$. This does not change the distribution of the perturbation vectors $\{e\}$. Further, we assume that this replacement of the perturbation vectors happens at the step where this has any effect on the algorithm for the first time. In particular, at step 17 in Algorithm 2, after we have identified a center pair $(e_c, c)$, we apply $\sigma$ on $e_c$ with probability $1/2$. Then, at the beginning of step 12 in Algorithm 1, we apply $\sigma$ to $e$ for all pairs $(e, y) \in S$. The distribution of $y$ remains unchanged by this procedure because $y \equiv e \equiv \sigma(e) \bmod \mathcal{P}(B)$ and $y - e \in L$. A somewhat more detailed explanation of this can be found in the following result of [47]. Note that since this is just a conceptual modification intended for ease in analysis, we should not be concerned with the actual running time of this modified procedure. Even the fact that we need a shortest vector to begin the mapping $\sigma$ does not matter.
The following lemma will help us to estimate the number of vector pairs to sample at the beginning of the algorithm. Thus, with a probability of at least $1 - \frac{4}{qN}$, we have at least $2^{-c_s n} N$ pairs $(e_i, y_i)$ before the sieving iterations such that $e_i \in D_1 \cup D_2$.
Lemma 3.6. If $N \ge \frac{2}{q}(k|C| + 2^{c_b n} + 1)$, then with a probability of at least $1/2$, Algorithm 1 outputs a shortest non-zero vector in $L$ with respect to the $\ell_p$ norm for $1 \le p \le \infty$.
Proof. Of the $N$ vector pairs $(e, y)$ sampled in steps 2-6 of Algorithm 1, we consider those such that $e \in (D_1 \cup D_2)$. We have already seen that there are at least $\frac{qN}{2}$ such pairs with a probability of at least $1 - \frac{4}{qN}$. We remove $|C|$ vector pairs in each of the $k$ sieve iterations. Thus, at step 12 of Algorithm 1, we have $N' \ge 2^{c_b n} + 1$ pairs $(e, y)$ to process. By Lemma 3.3, each of them is contained within a ball of radius $R$ which can have at most $2^{c_b n}$ lattice vectors. Thus, there exists at least one lattice vector $w$ for which the perturbation is in $D_1 \cup D_2$, and it appears twice in $S$ at the beginning of step 12. With a probability of $1/2$, it remains $w$, or with the same probability, it becomes either $w + u$ or $w - u$. Thus, after taking pair-wise differences at step 12, with a probability of at least $1/2$, we find the shortest vector.
Proof. If we start with $N$ pairs (as stated in Lemma 3.6), then the space complexity is at most $2^{c_{\mathrm{space}} n+o(n)}$ with $c_{\mathrm{space}} = c_s + \max(c_c, c_b)$.
In each iteration of the sieving Algorithm 2, it takes at most O(nN h ) time to initialize and index C (Lemmas 3.1 and 3.2). For each vector pair (e, y) ∈ S, it takes a time of at most n to calculate its indextuple I y . Thus, the time taken to process each vector pair is at most (n + 1), and the total time taken per iteration of Algorithm 2 is at most O(n (N h + N )), which is at most 2 cspacen+o(n) , and there are at most poly(n) such iterations.
If at least $2^{c_b n} + 1$ pairs remain, then the time complexity for the computation of the pairwise differences is at most $2^{2c_b n+o(n)}$. Thus, the overall time complexity is at most $2^{c_{\mathrm{time}} n+o(n)}$ where $c_{\mathrm{time}} = \max(c_{\mathrm{space}}, 2c_b)$.
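The per-vector cost in the argument above comes from computing an index-tuple and looking it up. A minimal Python sketch of one such pass follows. The helper names (`lp_norm`, `index_tuple`, `linear_sieve_step`) are illustrative and not from the paper, the dictionary stands in for the indexed list $C$, and the update $y - c + e_c$ is the usual AKS-style subtraction, assumed here rather than quoted from Algorithm 2. The sketch also assumes that at least one $y$ is non-zero.

```python
import math

def lp_norm(v, p):
    """l_p norm of a vector; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def index_tuple(y, side):
    """Integer corner index of the axis-aligned hypercube of the given side containing y."""
    return tuple(math.floor(coord / side) for coord in y)

def linear_sieve_step(pairs, gamma, p):
    """One pass over (e, y) pairs: the first pair seen in a hypercube becomes its center."""
    n = len(pairs[0][1])
    R = max(lp_norm(y, p) for _, y in pairs)
    # Side gamma*R / n^(1/p), so any two points in one cube are within gamma*R in l_p.
    # For p = math.inf the exponent 1/p is 0.0 and the side is gamma*R.
    side = gamma * R / (n ** (1.0 / p))
    centers = {}   # index-tuple -> center pair (e_c, c)
    shortened = []
    for e, y in pairs:
        key = index_tuple(y, side)          # O(n) work per vector
        if key not in centers:
            centers[key] = (e, y)
        else:
            e_c, c = centers[key]
            y_new = [yi - ci + eci for yi, ci, eci in zip(y, c, e_c)]
            shortened.append((e, y_new))
    return centers, shortened
```

Each vector costs one index computation and one expected constant-time dictionary lookup, which is the linear behaviour that the analysis relies on.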

Improvement Using the Birthday Paradox
We can obtain a better running time and space complexity if we use the birthday paradox to decrease the number of sampled vectors while still obtaining at least two vector pairs corresponding to the same lattice vector after the sieving iterations [28,26]. For this, we have to ensure that the vectors are independent and identically distributed before step 12 of Algorithm 1. Thus, we incorporate the following modification, as discussed in [26]. Very briefly, the trick is to set aside many uniformly distributed vector pairs as centers for each sieving step, even before the sieving iterations begin. In each sieving iteration, the probability that a vector pair is not within the required distance of any center pair decreases. Now, if we sample enough vectors, then with good probability at step 12, we have at least two vectors whose perturbation is in $D_1 \cup D_2$, implying that with a probability of at least 1/2, we obtain the shortest vector.
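To see why roughly the square root of the number of candidate lattice points is enough, the generic birthday estimate can be evaluated numerically. The sketch below is not the analysis of [26]. It only uses the standard bound $1 - \exp(-m(m-1)/(2M))$ on the probability that $m$ independent, identically distributed samples over $M$ values contain a repeat; the function name is illustrative.

```python
import math

def collision_lower_bound(num_samples, num_values):
    """Lower bound on P(two of num_samples i.i.d. draws over num_values values coincide).

    The bound is attained up to the exponential estimate by the uniform
    distribution, which minimises the collision probability.
    """
    m, M = num_samples, num_values
    return 1.0 - math.exp(-m * (m - 1) / (2.0 * M))

# With M = 2**20 candidates, about 4 * sqrt(M) samples already make a repeat very likely.
print(collision_lower_bound(4 * 2**10, 2**20))
```

The bound requires independence of the samples, which is why the centers are set aside before sieving in the modification described above.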
In the analysis of [26], the authors simply stated that the required center pairs can be sampled uniformly at the beginning. In our linear sieving algorithm, we have an advantage. Unlike the AKS-style algorithms, in which the center pairs are selected and then the space is divided, in our case, we can divide the space a priori. We take advantage of this and conduct a number of random divisions of the space. Since in each iteration, the length of the vectors decreases, the size of the hypercubes also decreases, and this can be calculated. Thus, for each iteration we have a number of divisions of the space into hypercubes of a certain size. For this, we need to divide the axes into intervals of a fixed size. Simply by shifting the intervals in each axis, we can make this division random. Then, among the uniformly sampled vectors, we select a center for each hypercube.
Assume we start with $N \geq \frac{2}{q}\left(n^3 k|C| + n2^{\frac{c_b}{2} n}\right)$ sampled pairs. After the initial sampling, for each of the $k$ sieving iterations, we fix $\Omega\left(\frac{2n^3}{q}|C|\right)$ pairs to be used as center pairs in the following way.
1. Let $R = \max_{i\in[N]} \|y_i\|_p$. We maintain $k$ lists of pairs, $C_1, C_2, \ldots, C_k$, where each list is similar to $C$, as described in Algorithm 2. In the $i$-th list, we store the indices (co-ordinates of a corner) of translates of $B$. For such a division, we can obtain $O(|C|)$ center pairs in each list. To meet our requirement, we maintain $O(n^3)$ such lists for each $i$. We call these $O(n^3)$ lists the "sibling lists" of $C_i$.
2. For each $(e, y) \in S$ (where $S$ is the set of sampled pairs), we first calculate $\|y\|_p$ to check to which list group it can potentially belong, say $C_j$. That is, $C_j$ corresponds to the smallest hyperball containing $y$. Then, we map it to its index-tuple $I_y$, as described before. We add $(e, y)$ to a list in $C_j$ or any of its sibling lists if the corresponding entry was empty before. Since we sampled uniformly, this ensures we obtain the required number of (initially) fixed centers, and no other vector can be used as a center throughout the algorithm.
Having set aside the centers, we now repeat the following sieving operations $k$ times. For each vector pair $(e_1, y_1) \in S$, we can check which list (or its sibling lists) it can belong to from $\|y_1\|_p$. Then, if a center pair is found, we subtract as in step 15 of Algorithm 2. Otherwise, we discard it and consider it "lost".
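The two phases above (fixing centers on a shifted grid, then sieving against those fixed centers only) can be sketched as follows. This is a simplified illustration with a single list per iteration and no sibling lists. The function names are hypothetical, and the subtraction $y - c + e_c$ is again the standard AKS-style update assumed for step 15.

```python
import math
import random

def shifted_index(y, side, shift):
    """Index-tuple of y on a grid of the given side, translated by a per-axis shift."""
    return tuple(math.floor((c + s) / side) for c, s in zip(y, shift))

def random_shift(n, side, rng=random):
    """A uniformly random translation of the grid, one offset per axis."""
    return [rng.uniform(0.0, side) for _ in range(n)]

def fix_centers(candidates, side, shift):
    """Set aside the first candidate pair landing in each hypercube as its center."""
    centers = {}
    for e, y in candidates:
        centers.setdefault(shifted_index(y, side, shift), (e, y))
    return centers

def sieve_with_fixed_centers(pairs, centers, side, shift):
    """Shorten each pair against its cube's pre-fixed center; pairs without one are lost."""
    kept, lost = [], 0
    for e, y in pairs:
        hit = centers.get(shifted_index(y, side, shift))
        if hit is None:
            lost += 1            # no new centers are created during sieving
            continue
        e_c, c = hit
        kept.append((e, [yi - ci + eci for yi, ci, eci in zip(y, c, e_c)]))
    return kept, lost
```

Because the centers are chosen before any sieving, the surviving pairs are never promoted to centers, which is the property the independence argument needs.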
Let us call this modified sieving procedure LinearSieveBirthday. We obtain the following improvement in the running time.
Theorem 3.2. Let $\gamma \in (0, 1)$, and let $\xi > 1/2$. Given a full-rank lattice $L \subset \mathbb{Q}^n$, there is a randomized algorithm for SVP$^{(p)}$ with a success probability of at least 1/2, a space complexity of at most $2^{c_{\mathrm{space}} n+o(n)}$, and a running time of at most $2^{c_{\mathrm{time}} n+o(n)}$, where $c_{\mathrm{space}}$ and $c_{\mathrm{time}}$ depend on the parameters $\gamma$ and $\xi$.
Proof. This analysis has been taken from [26]. At the beginning of the algorithm, among the pairs set aside as centers for the first step, there are $\Omega(n^3 |C|)$ pairs such that the perturbation is in $D_1 \cup D_2$ with high probability (Lemma 3.5). We call them good pairs. After fixing these pairs as centers, the probability that the distance between the next perturbed vector and the closest center is more than $\gamma R$ decreases. The sum of these probabilities is bounded from above by $|C|$. As a consequence, once all centers have been processed, the probability for any of the subsequent pairs to be lost is $O\left(\frac{1}{n^3}\right)$. By induction, it can be proved that the same proportion of pairs is lost at each step of the sieve with high probability. As a consequence, no more than $n$ pairs are lost during the whole algorithm. This means that in the final ball, there are $\Omega\left(n2^{\frac{c_b}{2} n}\right)$ probabilistically independent lattice points corresponding to good pairs with high probability. As in the proof of Lemma 3.6, this implies that the algorithm returns a shortest vector with a probability of at least 1/2.
Comparison of Linear Sieve with provable sieving algorithms [21,45,47,48]
For $1 \leq p \leq \infty$, [47] gives a bound on the number of centers. We can incorporate modifications to apply the birthday paradox, as has been done in [26] (for the $\ell_2$ norm), which would improve the exponents. Clearly, the running time of our algorithm is less since $\left(1 + \frac{2}{\gamma}\right)^2 > 2 + \frac{2}{\gamma}$ for all $\gamma < 1$. In [47], the authors did not specify the constant in the exponent of the running time. However, using these formulae, we found that their algorithm can achieve a time complexity of $2^{3.849n+o(n)}$ and a space complexity of $2^{2.023n+o(n)}$ at parameters $\gamma = 0.78$, $\xi = 1.27$ (without the birthday paradox, the algorithm in [47] can achieve time and space complexities of $2^{5.179n+o(n)}$ and $2^{3.01n+o(n)}$, respectively, at parameters $\gamma = 0.572$, $\xi = 0.742$). In comparison, our algorithm can achieve a time and space complexity of $2^{2.751n+o(n)}$ at parameters $\gamma = 0.598$, $\xi = 0.82$.
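The inequality invoked in this comparison, $\left(1+\frac{2}{\gamma}\right)^2 > 2+\frac{2}{\gamma}$, follows by expanding the square. A short check, assuming only $0 < \gamma < 1$:

```latex
\[
\left(1+\frac{2}{\gamma}\right)^{2} - \left(2+\frac{2}{\gamma}\right)
= 1 + \frac{4}{\gamma} + \frac{4}{\gamma^{2}} - 2 - \frac{2}{\gamma}
= \frac{4}{\gamma^{2}} + \frac{2}{\gamma} - 1 > 0,
\qquad \text{since } \frac{4}{\gamma^{2}} > 4 \text{ for } 0 < \gamma < 1 .
\]
```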
For $p = 2$, we can use Theorem 2.1 to obtain a better bound on the number of lattice vectors that remain after all sieving iterations. This is reflected in the quantity $c_b$, and we then obtain $c^{(2)}_{\mathrm{space}} = 2.49$. The AKS algorithm with the birthday paradox manages to achieve a time complexity of $2^{2.571n+o(n)}$ and a space complexity of $2^{1.407n+o(n)}$ when $\gamma = 0.589$ and $\xi = 0.9365$ [26]. Thus, our algorithm achieves a better time complexity at the cost of more space.
For $p = \infty$, we can reduce the space complexity by using the sub-division mentioned in Section 3.1 and achieve a space and time complexity of $2^{2.443n+o(n)}$ at parameters $\gamma = 0.501$, $\xi = 0.738$. In [49], the authors mentioned a time and space complexity of $2^{2.82n+o(n)}$ in the $\ell_\infty$ norm; we obtain a slightly better running time by using $c_b$, as mentioned in this paper. Again, this is better than the time complexity of [47] (which holds for all $\ell_p$ norms).

A Mixed Sieving Algorithm
The main advantage in dividing the space (hyperball) into hypercubes (as we did in Linear Sieve) is the efficient "decodability" in the sense that a vector can be mapped to a sub-region (and thus be associated with a center) in O(n) time. However, the price we pay is in space complexity, because the number of hypercubes required to cover a hyperball is greater than the number of centers required if we used smaller hyperballs like in [21,47,48]. To reduce the space complexity, we perform a mixed sieving procedure. Double sieving techniques have been used for heuristic algorithms as in [32], where the rough idea is the following. There are two sets of centers: the first set consists of centers of larger radius balls, and for each such center, there is another set of centers of smaller radius balls within the respective large ball. In each sieving iteration, each non-center vector is mapped to the larger balls by comparing with the centers in the first set. Then, they are mapped to a smaller ball by comparing with the second set of centers. Thus, in both levels, a quadratic sieve is applied.
In our mixed sieving, the primary difference is the fact that in the two levels, we use two types of sieving methods: a linear sieve in the first level and then a quadratic sieve such as AKS in the next level. The overall outline of the algorithm is the same as in Algorithm 1, except at step 9, where we apply the following sieving procedure, which we call Mixed Sieve. An illustration is given in Figure 4. The input to Mixed Sieve is a set of vectors of length R, and the output is a set of smaller vectors of length γR.
1. We divide the whole space into large hypercubes of side length $A\gamma R / n^{1/p}$, where $A$ is some constant. In $O(n)$ time, we map a vector to a large hypercube by comparing its co-ordinates. This has been explained in Section 3.1. We do not assign centers yet and do not perform any vector operation at this step. The distance between any two vectors mapped to the same hypercube is at most $A\gamma R$ (Figure 4b).
2. Next, we perform the AKS sieving procedure within each hypercube. For each hypercube, we have a set (initially empty) of centers. When a vector is mapped to a hypercube, we check if it is within distance $\gamma R$ of any center (within that hypercube). If yes, then we subtract it from the center and add the resultant shorter vector to the output set. If no, then we add this vector to the set of centers (Figure 4c).
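The two levels described in steps 1 and 2 can be sketched in Python as below. The names are illustrative, a dictionary keyed by the hypercube index holds the per-cube center lists, and the update $y - c + e_c$ is the standard AKS-style subtraction, assumed here since the exact step is given in Algorithm 2 rather than in this section.

```python
import math

def lp_norm(v, p):
    """l_p norm of a vector; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def mixed_sieve_step(pairs, gamma, A, p):
    """Level 1: O(n) map to a large hypercube. Level 2: AKS-style scan inside that cube."""
    n = len(pairs[0][1])
    R = max(lp_norm(y, p) for _, y in pairs)
    side = A * gamma * R / (n ** (1.0 / p))   # large cube, l_p diameter A*gamma*R
    cubes = {}      # cube index -> list of center pairs (e_c, c)
    shortened = []
    for e, y in pairs:
        key = tuple(math.floor(coord / side) for coord in y)
        centers = cubes.setdefault(key, [])
        for e_c, c in centers:                # quadratic part, restricted to one cube
            if lp_norm([a - b for a, b in zip(y, c)], p) <= gamma * R:
                shortened.append((e, [yi - ci + eci for yi, ci, eci in zip(y, c, e_c)]))
                break
        else:
            centers.append((e, y))            # no center within gamma*R: y becomes one
    return cubes, shortened
```

Only the centers of a single hypercube are scanned per vector, which is where the trade-off between the number of cubes and the number of centers per cube, governed by $A$, enters.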
Using the same kind of counting method as in Section 3.1, we can say we need $2^{cn}$ large hypercubes, where $c = \log\left(2 + \frac{2}{A\gamma}\right)$. The maximum distance between any two vectors in each hypercube is $A\gamma R$, and we want to get vectors of length at most $\gamma R$ by applying the AKS sieve. Thus, the number of centers (let us call these "AKS sieve-centers") within each hypercube is $2^{c_p n+o(n)}$ where $c_p = \log(1 + A)$ (in the special case of the Euclidean norm, we have $c_2 = 0.401 - \log\frac{2}{A}$). The constants $c_p$ and $c_2$ are obtained by applying Lemma 2.4. Note that the value of $A$ must ensure the non-negativity of $c_2$. Thus, the total number of centers is $2^{c^{(p)} n+o(n)}$ where $c^{(p)} = c + c_p$.
To use the birthday paradox, we apply methods similar to those given in Section 3.3 and [26]. Assume that we initially sample $N \geq \frac{2}{q}\left(n^3 k2^{c^{(p)} n+o(n)} + n2^{\frac{c_b}{2} n}\right)$ vectors. Then, using similar arguments as in Section 3, we can conclude that, with high probability, we end up with the shortest vector in the lattice. We do not repeat the proof since it is similar to that of Theorem 3.2. The only thing that is slightly different is the number of center pairs set aside at the beginning of the sieving iterations. As in Section 3, we randomly divide the space $n^3$ times into $2^{cn}$ hypercubes. Then, among the uniformly sampled vectors, we set aside $2^{c_p n}$ vector pairs as centers for each hypercube. Thus, in Theorem 3.2, we replace $|C|$ by $2^{c^{(p)} n+o(n)}$.
Thus, the space complexity is $2^{c_{\mathrm{space}} n+o(n)}$ where $c_{\mathrm{space}} = c_s + \max(c^{(p)}, c_b/2)$. It takes $O(n)$ time to map each vector to a large hypercube, and then at most $2^{c_p n+o(n)}$ time to compare it with the "AKS sieve-centers" within each hypercube. Thus, the time complexity is $2^{c_{\mathrm{time}} n+o(n)}$ where $c_{\mathrm{time}} = \max(c_{\mathrm{space}} + c_p, c_b)$.
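The exponents above are easy to evaluate for given parameters. The sketch below reads the formulas of this section as base-2 logarithms, namely $c = \log(2 + \frac{2}{A\gamma})$, $c_p = \log(1+A)$ and, in the Euclidean case, $c_2 = 0.401 - \log\frac{2}{A}$. The function and key names are illustrative, and $c_s$ and $c_b$ are passed in as inputs because their definitions appear earlier in the paper.

```python
import math

def mixed_sieve_exponents(gamma, A, c_s, c_b, euclidean=False):
    """Evaluate the Mixed Sieve exponents from gamma, A and externally supplied c_s, c_b."""
    c_cubes = math.log2(2 + 2 / (A * gamma))          # number of large hypercubes: 2^(c n)
    if euclidean:
        c_inner = 0.401 - math.log2(2 / A)            # c_2, must be non-negative
    else:
        c_inner = math.log2(1 + A)                    # c_p
    c_total = c_cubes + c_inner                       # c^(p) = c + c_p
    c_space = c_s + max(c_total, c_b / 2)
    c_time = max(c_space + c_inner, c_b)
    return {"c_cubes": c_cubes, "c_inner": c_inner, "c_total": c_total,
            "c_space": c_space, "c_time": c_time}
```

For instance, at $\gamma = 0.645$ and $A = 2^{0.599}$ in the Euclidean case, this gives $c \approx 2.017$ and $c_2 \approx 0$, so $A = 2^{0.599}$ is the smallest value that keeps $c_2$ non-negative.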
Theorem 4.1. Let $\gamma \in (0, 1)$, $\xi > 1/2$ and $A$ be some constant. Given a full-rank lattice $L \subset \mathbb{Q}^n$, there is a randomized algorithm for SVP$^{(p)}$ with a success probability of at least 1/2, a space complexity of at most $2^{c_{\mathrm{space}} n+o(n)}$, and a running time of at most $2^{c_{\mathrm{time}} n+o(n)}$. Here, $c_{\mathrm{space}} = c_s + \max(c^{(p)}, c_b/2)$ and $c_{\mathrm{time}} = \max(c_{\mathrm{space}} + c_p, c_b)$, as derived above.
Comparison with previous provable sieving algorithms [27,28,20]
In the Euclidean norm with parameters $\gamma = 0.645$, $\xi = 0.946$ and $A = 2^{0.599}$, we obtain a space and time complexity of $2^{2.25n+o(n)}$, while the List Sieve Birthday [26,28] has space and time complexities of $2^{1.233n+o(n)}$ and $2^{2.465n+o(n)}$, respectively. We can also use a different sieve in the second level, such as List Sieve [27], which works in the $\ell_2$ norm and is faster than the AKS sieve. We can therefore expect to achieve a better running time.
The Discrete Gaussian-based sieving algorithm of Aggarwal et al. [20] with a time complexity of 2 n+o(n) performs better than both our sieving techniques. However, their algorithm works for the Euclidean norm and, to the best of our knowledge, it has not been generalized to any other norm.

Approximation Algorithms for SVP (p) and CVP (p)
In this section, we show how to adapt our sieving techniques to approximation algorithms for SVP$^{(p)}$ and CVP$^{(p)}$. The analysis and explanations are similar to those given in [61]. For completeness, we give a brief outline.

Algorithm for Approximate SVP (p)
We note that at the end of the sieving procedure in Algorithm 1, we obtain lattice vectors of length at most $R = \frac{\xi(2-\gamma)\lambda}{1-\gamma} + O(\lambda/n)$. Thus, if we can ensure that one of the vectors obtained at the end of the sieving procedure is non-zero, we obtain a $\tau = \frac{\xi(2-\gamma)}{1-\gamma} + o(1)$-approximation of the shortest vector. Consider a new algorithm A (let us call it Approx-SVP) that is identical to Algorithm 1, except that Step 12 is replaced by a step that outputs a non-zero vector $y - e$ among the remaining pairs $(e, y)$. We now show that if we start with sufficiently many vectors, we must obtain a non-zero vector.
Lemma 5.1. If $N \geq \frac{2}{q}(k|C| + 1)$, then with a probability of at least 1/2, Algorithm A outputs a non-zero vector in $L$ of length at most $\frac{\xi(2-\gamma)\lambda}{1-\gamma} + O(\lambda/n)$.
Proof. Of the $N$ vector pairs $(e, y)$ sampled in steps 2-6 of Algorithm A, we consider those such that $e \in (D_1 \cup D_2)$. We have already seen there are at least $\frac{qN}{2}$ such pairs. We remove $|C|$ vector pairs in each of the $k$ sieve iterations. Thus, at step 12 of Algorithm 1, we have at least one pair $(e, y)$ to process.
With a probability of 1/2, $e$, and hence $w = y - e$, is replaced by either $w + u$ or $w - u$. Thus, the probability that this vector is the zero vector is at most 1/2.
We thus obtain the following result.
Note that while presenting the above theorem, we assumed that we are using the Linear Sieve in Algorithm 1. We can also use the Mixed Sieve procedure as described in Section 4. Then, we obtain space and time complexities of $2^{(c_s+c^{(p)})n+o(n)}$ and $2^{(c_s+c^{(p)}+c_p)n+o(n)}$, respectively, where $c^{(p)} = \log\left(2 + \frac{2}{A\gamma}\right) + c_p$ and $c_p = \log(1 + A)$ (in the Euclidean norm, the parameters are as described in Theorem 4.1).
Comparison with provable approximation algorithms [30,47,48]
We have mentioned in Section 1 that [48,47] gave approximation algorithms for lattice problems that work for all $\ell_p$ norms and use the quadratic sieving procedure (as has been described before). Using our notation, the space and time complexities of their approximation algorithms are $2^{c^{(BN)}_{\mathrm{space}} n+o(n)}$ and $2^{c^{(BN)}_{\mathrm{time}} n+o(n)}$, respectively. The authors did not mention any explicit value of the constant in the exponent. Using their formulae, we conclude that [48] and [47] can achieve time and space complexities of $2^{3.169n+o(n)}$ and $2^{1.586n+o(n)}$, respectively, at parameters $\gamma = 0.99$, $\xi = 10.001$ with a large constant approximation factor. In comparison, we can achieve a space and time complexity of $2^{2.001n+o(n)}$ with a large constant approximation factor at the same parameters.
In the $\ell_2$ norm, using the mixed sieving procedure, we obtain a time and space complexity of $2^{1.73n+o(n)}$ and a large constant approximation factor at parameters $\gamma = 0.999$, $\xi = 1$. In [30], the best running time reported is $2^{0.802n}$ for a large approximation factor.
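To give a sense of what "large constant approximation factor" means at these parameters, the leading term of $\tau = \frac{\xi(2-\gamma)}{1-\gamma} + o(1)$ from the Approx-SVP analysis can be evaluated directly. The helper below is an illustrative sketch that ignores the $o(1)$ term; its name is not from the paper.

```python
def approx_factor(gamma, xi):
    """Leading term xi*(2 - gamma)/(1 - gamma) of the approximation factor tau."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return xi * (2.0 - gamma) / (1.0 - gamma)

# Parameters quoted for the l_p comparison: gamma = 0.99, xi = 10.001.
print(approx_factor(0.99, 10.001))
```

At $\gamma = 0.99$, $\xi = 10.001$ the leading term is about 1010, so the factor is constant in $n$ but far from 1; the factor grows as $\gamma$ approaches 1.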
Using a similar linear sieve, a time and space complexity of $3^n$, i.e., $2^{1.585n+o(n)}$, can be achieved for the

Algorithm for Approximate CVP (p)
Given a lattice $L$ and a target vector $t$, let $d$ denote the distance of the closest vector in $L$ to $t$. Just as in Section 3.2, we assume that we know the value of $d$ within a factor of $1 + 1/n$. We can get rid of this assumption by using Babai's algorithm [69] to guess the value of $d$ within a factor of $2^n$ and then running our algorithm for polynomially many values of $d$.
For $\tau > 0$, define the following $(n + 1)$-dimensional lattice $L'$, generated by the vectors $(v, 0)$ for $v \in L$ together with $(t, \tau d/2)$. Let $z^* \in L$ be the lattice vector closest to $t$.
We sample $N$ vector pairs $(e, y)$ with respect to $B'$, where $B' = [(b_1, 0), \ldots, (b_n, 0), (t, \tau d/2)]$ is a basis for $L'$. Next, we run a number of iterations of the sieving Algorithm 2 to obtain a number of vector pairs such that $\|y\|_p \leq R = \frac{\xi d}{1-\gamma} + o(1)$. Further details can be found in Algorithm 3. Note that in the algorithm, $v|_{[n]}$ is the $n$-dimensional vector obtained by restricting $v$ to the first $n$ co-ordinates (with respect to the computational basis). Selecting $\xi < \frac{(1-\gamma)\tau}{2-\gamma} - o(1)$ ensures that our sieving algorithm does not return vectors from $(L, 0) - (k't, k'\tau d/2)$ for some $k'$ such that $|k'| \geq 2$. Then, every vector has $\|v\|_p < \tau d$, and so either $v = \pm(z' - t, 0)$ or $v = \pm(z - t, -\tau d/2)$ for some lattice vectors $z, z' \in L$.
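The embedding used here is straightforward to set up. The sketch below builds the rows $(b_1, 0), \ldots, (b_n, 0), (t, \tau d/2)$, evaluates the leading term of the constraint $\xi < \frac{(1-\gamma)\tau}{2-\gamma}$, and restricts an $(n+1)$-dimensional vector to its first $n$ co-ordinates. Function names are illustrative, and the $o(1)$ term of the constraint is ignored.

```python
def embed_basis(basis, t, tau, d):
    """Rows (b_1, 0), ..., (b_n, 0), (t, tau*d/2) spanning the (n+1)-dimensional lattice."""
    rows = [list(b) + [0.0] for b in basis]
    rows.append(list(t) + [tau * d / 2.0])
    return rows

def max_xi(gamma, tau):
    """Leading term of the upper bound on xi that rules out multiples |k'| >= 2 of t."""
    return (1.0 - gamma) * tau / (2.0 - gamma)

def restrict(v):
    """Drop the last co-ordinate of an (n+1)-dimensional vector."""
    return list(v[:-1])
```

The last co-ordinate of any vector in the embedded lattice is a multiple $k'\tau d/2$, so a vector of norm below $\tau d$ must have $|k'| \leq 1$, which is what the bound on $\xi$ enforces.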
With arguments similar to those in [61] (using the tossing argument outlined in Section 3.2), we can conclude that with some non-zero probability we have at least one vector in $L' \setminus (L \pm t, 0)$ after the sieving iterations.
Thus, we obtain the following result.
Again, using Mixed Sieve in Algorithm 1, we obtain space and time complexities of $2^{(c_s+c^{(p)})n+o(n)}$ and $2^{(c_s+c^{(p)}+c_p)n+o(n)}$, respectively, where $c^{(p)} = \log\left(2 + \frac{2}{A\gamma}\right) + c_p$ and $c_p = \log(1 + A)$ (in the Euclidean norm, the parameters are as described in Theorem 4.1).

Discussions
In this paper, we have designed new sieving algorithms that work for any $\ell_p$ norm. A comparative performance evaluation has been given in Table 1. We achieve a better time complexity at the cost of space complexity for every $1 \leq p \leq \infty$. The exception is the Discrete Gaussian-based sieving algorithm of [20], which has better space and time complexity in the Euclidean norm. To the best of our knowledge, that algorithm does not work for any other norm.

Future work
An obvious direction for further research would be to design heuristic algorithms based on this kind of sieving technique and to study whether these can be adapted to other computing environments, such as parallel computing. The major difference between our algorithm and others such as [21,47] is in the choice of the shape of the sub-regions into which we divide the ambient space (as has already been explained). Due to this, we get superior "decodability", in the sense that a vector can be efficiently mapped to a sub-region, at the cost of inferior space complexity, as described before. It might be interesting to study what other shapes of sub-regions might be considered and what trade-offs they offer.
It might be possible to improve the bound on the number of hypercubes required to cover the hyperball. At least in the $\ell_\infty$ norm, we have seen that the number of hypercubes may depend on the initial position of the smaller hypercube, whose translates cover the bigger hyperball. In fact, it might be possible to obtain a lower bound on the complexity of this kind of approach.