Improved Bernoulli Sampling for Discrete Gaussian Distributions over the Integers

Abstract: Discrete Gaussian sampling is one of the fundamental mathematical tools for lattice-based cryptography. In this paper, we revisit the Bernoulli(-type) sampling for centered discrete Gaussian distributions over the integers, which was proposed by Ducas et al. in 2013. Combining the idea of Karney's algorithm for sampling from the Bernoulli distribution B_{e^{-1/2}}, we present an improved Bernoulli sampling algorithm. It does not require the use of floating-point arithmetic to generate a precomputed table, as the original Bernoulli sampling algorithm did. It only needs a fixed look-up table of very small size (e.g., 128 bits) that stores the binary expansion of ln 2. We also propose a noncentered version of the Bernoulli sampling algorithm for discrete Gaussian distributions with varying centers over the integers. It requires no floating-point arithmetic and supports centers of precision up to 52 bits. The experimental results show that our proposed algorithms yield a significant improvement in sampling efficiency as compared to other rejection algorithms.


Introduction
Lattice-based cryptography is one of the most widely and deeply studied branches of post-quantum cryptography. Discrete Gaussian sampling is one of the important mathematical tools for lattice-based cryptography [1][2][3]. Many lattice-based cryptosystems, including their security proofs, involve discrete Gaussian sampling.
Generally, discrete Gaussian sampling means sampling from a (multivariate) discrete Gaussian distribution over an n-dimensional lattice Λ. In particular, discrete Gaussian sampling over the integers, denoted by SampleZ, refers to the generation of random integers that follow the Gaussian distribution D_{Z,σ,µ} with parameter σ > 0 and center µ ∈ R. Since SampleZ is much simpler and more efficient than discrete Gaussian sampling over a general lattice, many lattice-based cryptosystems, such as [4,5], involve only discrete Gaussian sampling over the integers.
The methods for sampling from a continuous Gaussian distribution do not carry over trivially to the discrete case. Therefore, it is of great significance to study discrete Gaussian sampling and to design sampling algorithms, especially for SampleZ. This problem has attracted a lot of attention in recent years.

Related Work
For the commonly used techniques for SampleZ, including inversion sampling, the Knuth-Yao method, and rejection sampling, we refer to [6,7]. One can also consider the convolution technique (based on the convolution-like properties of discrete Gaussians developed by Peikert in [8]) [9,10]. In addition, Saarinen developed a binary arithmetic coding sampler for discrete Gaussians (BAC sampling) [11]. Moreover, Karmakar et al. designed a class of very efficient discrete Gaussian samplers based on computing a series of Boolean functions with random bits as input [12,13].
Sampling from a discrete Gaussian distribution over the integers usually involves floating-point arithmetic because of the exponential computation in the Gaussian function. It is believed that high-precision floating-point arithmetic is required to attain a small enough statistical distance to the target distribution. This implies that online high-precision floating-point computation may be an implementation bottleneck. Thus, most of the improved SampleZ algorithms manage to use high-precision floating-point arithmetic only at offline time (such as algorithms based on CDTs or Knuth-Yao) or even to avoid floating-point arithmetic altogether (such as Karney's sampling algorithm [14]).
Recent results show that Bernoulli sampling, introduced by Ducas et al. in [4] for centered discrete Gaussian distributions over the integers, can be improved effectively by combining it with the polynomial approximation technique. Specifically, Bernoulli sampling is a kind of rejection sampling, and its rejection operation can be performed very efficiently by evaluating an approximating polynomial [15,16]. The polynomial is predetermined; it thus supports discrete Gaussians with a varying center µ ∈ [0, 1) over the integers.
Generic discrete Gaussian sampling algorithms over the integers that support a varying center µ, and even a varying parameter σ, are also very interesting. A discrete Gaussian sampling algorithm for an n-dimensional lattice usually requires such an algorithm as its subroutine [2,8,17,18]. However, most of the algorithms for sampling D_{Z,σ,µ} and their improvements are designed only for the case where the center µ is fixed in advance, such as [4,9,11,19,20]. When sampling from D_{Z,σ,µ} with varying µ ∈ [0, 1) (e.g., double-precision numbers sampled online from a specific distribution), the existing algorithms other than [14] either rely on floating-point arithmetic, such as the algorithms in [2,15] and the sampler used in the Falcon signature [5], or require a large amount of offline computation and precomputation storage, such as the algorithms proposed by Micciancio et al. [10] and by Aguilar-Melchor et al. [21], respectively. Moreover, the sampler presented by Barthe et al. in [16] is not a fully generic algorithm: it supports a varying µ ∈ R but not a varying parameter σ, so one needs to determine a new approximation polynomial for each new σ.

Our Contribution
In this paper, combining the idea of Karney's algorithm for sampling from the Bernoulli distribution B_{e^{-1/2}}, we present a generalized version of the Bernoulli sampling algorithm for both centered and noncentered discrete Gaussian distributions over the integers.
Specifically, we present an improved implementation of the Bernoulli sampling algorithm for centered discrete Gaussian distributions over the integers (see Algorithm 2). We remove the precomputed table that the original Bernoulli sampling algorithm requires and obtains in advance by using floating-point arithmetic. Apart from a fixed look-up table of very small size (e.g., 128 bits) that stores the binary expansion of ln 2, our proposed algorithm requires no precomputation storage and uses no floating-point arithmetic.
We also propose a noncentered Bernoulli sampling algorithm (see Algorithm 3) that supports discrete Gaussian distributions with varying centers. It uses no floating-point arithmetic, and it is suitable for sampling D_{Z,σ,µ} with µ ∈ [0, 1) of precision up to 52 bits. We also show that our noncentered version has the same rejection rate as the original Bernoulli sampling algorithm. Furthermore, the experimental results show that our proposed algorithms yield a significant improvement in sampling efficiency as compared to other rejection algorithms for discrete Gaussian distributions over the integers [14]. Table 1 summarizes some existing sampling algorithms for discrete Gaussian distributions over the integers and their properties in comparison to our work. Here, λ is the security parameter. The columns "FPA" and "exp(·)" indicate whether the algorithm requires floating-point arithmetic online and whether it evaluates exp(·) online, respectively. The column "varying µ" refers to the ability to produce samples (online) from discrete Gaussians with varying centers rather than from a specific discrete Gaussian with a fixed center. Similarly, the column "varying σ" refers to the ability to produce samples (online) from discrete Gaussians with a varying parameter σ.

Preliminaries
In this paper, R denotes the set of real numbers, and Z and Z⁺ denote the sets of integers and non-negative integers, respectively. For any real function f(·) and any countable set S, we write f(S) = ∑_{x∈S} f(x) whenever the sum exists. The Gaussian function with parameter σ > 0 and center µ ∈ R is defined as ρ_{σ,µ}(x) = exp(−(x − µ)²/(2σ²)) for x ∈ R. The discrete Gaussian distribution over the integers is then defined as D_{Z,σ,µ}(x) = ρ_{σ,µ}(x)/ρ_{σ,µ}(Z) for x ∈ Z.
The Bernoulli(-type) sampling algorithm proposed by Ducas et al. in [4] was designed exclusively for centered discrete Gaussian distributions over the integers, namely D_{Z,σ,µ} with σ = kσ₂ and µ = 0, where k is a positive integer and σ₂² = 1/(2 · ln 2). This algorithm uses rejection sampling from the binary discrete Gaussian distribution, denoted by D_{Z⁺,σ₂}. Sampling from D_{Z⁺,σ₂} can be done very efficiently using only unbiased random bits (one may refer to Algorithm 10 in [4]). First, it samples an integer x ∈ Z⁺ from D_{Z⁺,σ₂}; then it samples y ∈ Z uniformly from {0, 1, 2, ..., k − 1} and accepts z = kx + y with probability exp(−(y² + 2kxy)/(2k²σ₂²)); finally, it outputs a signed integer, z or −z, with equal probability, exploiting the symmetry of the target distribution (see Algorithms 11 and 12 in [4]). It was shown that the average number of rejections is not more than 1.47.
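The correctness of this acceptance step rests on the factorization ρ_{kσ₂}(kx + y) = ρ_{σ₂}(x) · exp(−(y² + 2kxy)/(2k²σ₂²)), which follows by expanding (kx + y)². The following short sketch verifies the identity numerically (floating point is used here only for the check itself; the samplers discussed in this paper avoid it):

```python
import math

SIGMA2 = math.sqrt(1.0 / (2.0 * math.log(2.0)))  # binary Gaussian parameter

def rho(sigma, t):
    """Centered Gaussian function rho_sigma(t) = exp(-t^2 / (2 sigma^2))."""
    return math.exp(-t * t / (2.0 * sigma * sigma))

k = 4
for x in range(6):           # candidate from D_{Z+, sigma2}
    for y in range(k):       # uniform offset in {0, ..., k-1}
        z = k * x + y
        accept = math.exp(-(y * y + 2 * k * x * y)
                          / (2.0 * k * k * SIGMA2 ** 2))
        # rho_{k*sigma2}(kx + y) = rho_{sigma2}(x) * (acceptance probability)
        assert abs(rho(k * SIGMA2, z) - rho(SIGMA2, x) * accept) < 1e-12
print("factorization verified")
```

Since the acceptance probability is exactly ρ_{kσ₂}(z)/ρ_{σ₂}(x), the accepted values z follow the target distribution.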
A precomputed look-up table is required for computing the exponential function exp(−(y² + 2kxy)/(2σ²)). The size of the table varies with the parameter σ. For example, the size is 2.3 kb for σ ≈ 107 and 4 kb for σ ≈ 271 according to the estimates given by Ducas et al. in [4], which is not very small but acceptable.

Algorithm 1 was adapted by Karney [14] from Von Neumann's algorithm for sampling from exp(−x) with x > 0. Algorithm 1 is used as one of the basic operations in Karney's sampling algorithm for the standard normal distribution, and it can be executed using only integer arithmetic.
In Algorithm 1, by replacing 1/2 with p/q, where p, q are positive integers and p < q, we obtain a Bernoulli random value that is true with probability e^{−p/q}, i.e., a sample from B_{e^{−p/q}}. This modification can be regarded as a generalized version of Algorithm 1 and is used implicitly in Karney's algorithm. Karney's sampling algorithm for the discrete Gaussian distribution over the integers is also a rejection algorithm; it is a discretization of his algorithm for sampling from the standard normal distribution. It supports discrete Gaussians with varying rational parameters (both σ and µ) of precision up to 32 bits. The algorithm requires no floating-point arithmetic and needs no precomputation storage. One may refer to [14] for the full description of Karney's sampling algorithm.

Algorithm 1: [14] Sample from the Bernoulli distribution B_{1/√e}
Output: a Bernoulli random value from B_{1/√e}
1 sample uniform deviates u1, u2, ... with u_i ∈ [0, 1) and determine the maximum value n ≥ 0 such that 1/2 > u1 > u2 > ... > un;
2 if n is even then
3   return true;
4 else
5   return false;
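As a sketch, Algorithm 1 can be implemented in a few lines. Floating-point uniforms are used here for readability only; Karney's actual algorithm compares the deviates digit by digit so that no floating-point arithmetic is needed:

```python
import random

def bernoulli_inv_sqrt_e(rng):
    """Return True with probability 1/sqrt(e) = exp(-1/2) (Algorithm 1).

    Finds the maximum n >= 0 with 1/2 > u1 > u2 > ... > un and returns
    True iff n is even; since P(n >= m) = (1/2)^m / m!, the probability
    of returning True is the alternating sum exp(-1/2).
    """
    n = 0
    prev = 0.5
    while True:
        u = rng.random()
        if u >= prev:          # the descending chain stops here
            return n % 2 == 0
        prev = u
        n += 1

# Empirical check: the frequency of True should approach exp(-1/2) ~ 0.6065.
rng = random.Random(1)
mean = sum(bernoulli_inv_sqrt_e(rng) for _ in range(200_000)) / 200_000
print(round(mean, 3))
```

Each call consumes only a handful of uniform deviates on average, which is what makes this primitive attractive as a building block.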

Removing Precomputed Tables in Bernoulli Sampling
One drawback of Bernoulli sampling is that it needs a precomputed table to generate a Bernoulli random value from B_{exp(−(y²+2kxy)/(2σ²))}. Although the table is not large, it has to be precomputed before sampling for every different σ = kσ₂. Therefore, it is interesting to sample D_{Z,kσ₂} without a precomputed table. In this subsection, we propose an alternative algorithm that samples D_{Z,kσ₂} for a given k ≥ 1 and requires no precomputed table apart from a small fixed look-up table.
We use the notation of the sampling algorithm given by Ducas et al. (Algorithm 11 in [4]). Since σ₂² = 1/(2 · ln 2) and σ = kσ₂, it follows that exp(−(y² + 2kxy)/(2σ²)) = 2^{−(y²+2kxy)/k²} = 2^{−⌊(y²+2kxy)/k²⌋} · exp(−(ln 2){(y² + 2kxy)/k²}), where {·} denotes the fractional part. The Bernoulli value can therefore be obtained by generating ⌊(y² + 2kxy)/k²⌋ Bernoulli deviates according to B_{1/2} and one Bernoulli random value from B_{exp(−(ln 2){(y²+2kxy)/k²})}. If no false value is generated in this procedure, the result is true; otherwise it is false.
It is clear that a Bernoulli deviate according to B_{1/2} is equivalent to a uniform random bit. Thus, the only remaining problem is to generate a Bernoulli random value from B_{exp(−(ln 2){(y²+2kxy)/k²})}. We address this problem with Algorithm 2.
One can see that Algorithm 2 is adapted from Algorithm 1. It provides a technique for determining the relation between (ln 2)(z/k²) and a uniform deviate u1 ∈ [0, 1), and it needs no floating-point operations at run-time, provided that a sufficiently long binary expansion of ln 2 (e.g., 128 bits) has been stored as a fixed look-up table.
In step 1, p is assigned the t most significant bits of the binary expansion of ln 2. In steps 5 and 6, if p > ⌊q/z⌋ with q = vk², then 2^t(ln 2) > (vk²)/z, which implies (ln 2)(z/k²) > v/2^t. This is equivalent to obtaining a uniform deviate u1 = v/2^t that is strictly less than (ln 2)(z/k²). Conversely, in steps 7 and 8, if p < ⌊q/z⌋, this is equivalent to obtaining a uniform deviate u1 that is strictly greater than (ln 2)(z/k²). Otherwise, if p = ⌊q/z⌋, we have obtained a uniform deviate u1, but at this point we cannot yet determine whether it is less than (ln 2)(z/k²) or not.
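A sketch of this idea in Python follows. Arbitrary-precision integers stand in for the multi-limb arithmetic a C implementation would use, and the only non-integer arithmetic appears offline, when the 128-bit table of ln 2 is generated once. Comparisons of the form u1 < (ln 2)(z/k²) become integer comparisons against ⌊2^128 · ln 2 · z/k²⌋; the remaining links of the descending chain compare raw integers. The probability-2^{-128} tie case, which the paper resolves exactly by extending the precision, is ignored in this sketch:

```python
import random
from decimal import Decimal, getcontext

# Fixed look-up table: the first 128 bits of the binary expansion of ln 2,
# computed once offline (this is the only non-integer arithmetic).
getcontext().prec = 60
LN2_128 = int(Decimal(2).ln() * (1 << 128))   # floor(2^128 * ln 2)

def bernoulli_exp_ln2(z, ksq, rng):
    """Return True with probability exp(-(ln 2) * z / ksq) = 2^(-z/ksq),
    for integers 0 < z < ksq, in the spirit of Algorithm 2."""
    threshold = (LN2_128 * z) // ksq   # integer image of (ln 2)(z/ksq)
    n = 0
    while True:
        v = rng.getrandbits(128)       # uniform deviate v / 2^128
        if v >= threshold:
            return n % 2 == 0          # Von Neumann parity rule
        threshold = v                  # next deviate must be smaller still
        n += 1

# Empirical check: z = 1, ksq = 2 should give 2^(-1/2) ~ 0.7071.
rng = random.Random(7)
mean = sum(bernoulli_exp_ln2(1, 2, rng) for _ in range(200_000)) / 200_000
print(round(mean, 3))
```

Online, only integer multiplications, divisions, and comparisons are performed, matching the paper's claim that floating-point arithmetic is confined to the one-time table generation.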

Bernoulli Sampling for Discrete Gaussians with Varying Centers
In this subsection, we propose a new noncentered Bernoulli sampling algorithm for D_{Z,σ,i/b} with σ = kσ₂, where b is a power of two, i ∈ [0, b − 1] ∩ Z, k is a positive integer, and σ₂² = 1/(2 · ln 2). Our proposed algorithm does not require any floating-point arithmetic, and it needs no precomputed table apart from the small fixed look-up table storing the binary expansion of ln 2 mentioned in Section 2.2.
Informally, the sketch of the proposed algorithm is as follows. We first sample an integer x ∈ Z⁺ from D_{Z⁺,σ₂}, and then sample y ∈ Z uniformly from {0, 1, 2, ..., k}. Since a noncentered discrete Gaussian distribution is not symmetric, we need to determine the sign of the prospective output before the acceptance-rejection step, by setting s ← ±1 with equal probability. Finally, we accept s(kx + y) as the output with probability exp(−((y − sµ)² + 2kx(y − sµ))/(2k²σ₂²)), where µ = i/b, and restart the whole algorithm otherwise. The following equation guarantees that the output value has the desired relative probability density ρ_{kσ₂,µ}(s(kx + y)):

ρ_{σ₂}(x) · exp(−((y − sµ)² + 2kx(y − sµ))/(2k²σ₂²)) = ρ_{kσ₂,µ}(s(kx + y)).
The sketch of the algorithm seems to work, but there are several subtleties we have to address.
Bernoulli sampling for centered discrete Gaussian distributions needs to restart its whole sampling procedure with probability 1/2 when the algorithm is about to return z = 0 (see Algorithm 12 in [4]). In our algorithm, in order to avoid double-counting 0, we instead set y ← y + 1 if s = 1, so that there is only one case in which z = 0. This also guarantees that y − sµ is always non-negative, and thus exp(−((y − sµ)² + 2kx(y − sµ))/(2k²σ₂²)) never exceeds one.
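The sketch so far can be rendered as compact Python. Two caveats: the index handling below (drawing y from k values and shifting it by one when s = 1, so that every integer is proposed exactly once) reflects our reading of the sketch above, and both the base sampler by inversion and the call to math.exp are floating-point stand-ins for the bit-based Algorithm 10 of [4] and the integer-only rejection step of Algorithm 2, respectively:

```python
import math
import random

SIGMA2 = math.sqrt(1.0 / (2.0 * math.log(2.0)))   # binary Gaussian parameter

def sample_binary_gauss(rng):
    """Sample from D_{Z+, sigma2}, i.e. P(x) proportional to 2^(-x^2).
    (Stand-in by table inversion; the paper uses Algorithm 10 of [4].)"""
    weights = [2.0 ** -(x * x) for x in range(10)]
    u = rng.random() * sum(weights)
    for x, w in enumerate(weights):
        if u < w:
            return x
        u -= w
    return len(weights) - 1

def sample_noncentered(k, i, b, rng):
    """Sketch of Algorithm 3: sample from D_{Z, k*sigma2, mu} with mu = i/b."""
    mu = i / b
    while True:
        x = sample_binary_gauss(rng)
        y = rng.randrange(k)                  # y uniform over k values
        s = 1 if rng.getrandbits(1) else -1   # sign fixed before rejection
        if s == 1:
            y += 1                            # avoid double-counting z = 0
        t = y - s * mu                        # always non-negative here
        if rng.random() < math.exp(-(t * t + 2 * k * x * t)
                                   / (2.0 * k * k * SIGMA2 ** 2)):
            return s * (k * x + y)

# Quick sanity check: with k = 3, mu = 1/4, the sample mean should sit near mu.
rng = random.Random(5)
samples = [sample_noncentered(3, 1, 4, rng) for _ in range(200_000)]
print(round(sum(samples) / len(samples), 2))
```

Because t = y − sµ ≥ 0 in both branches, the acceptance probability never exceeds one, exactly as the shift y ← y + 1 is meant to guarantee.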
Another problem is how to compute the value of the exponential function efficiently. Since σ₂² = 1/(2 · ln 2), we have exp(−((y − sµ)² + 2kx(y − sµ))/(2k²σ₂²)) = exp(−(ln 2)δ) = 2^{−⌊δ⌋} · exp(−(ln 2){δ}), where δ = ((y − sµ)² + 2kx(y − sµ))/k² and {δ} is the fractional part of δ. Then, to compute the value of this exponential function, we generate ⌊δ⌋ Bernoulli deviates according to B_{1/2} and one Bernoulli random value that is true with probability exp(−(ln 2){δ}) = 2^{−{δ}}. If no false value is generated, we accept s(kx + y) as the output; otherwise we restart the algorithm. Our proposed algorithm, which formalizes the above discussion, is described as Algorithm 3.
Recall that our goal is an algorithm requiring no floating-point arithmetic. A Bernoulli deviate according to B_{1/2} is simply a uniform random bit, and sampling from D_{Z⁺,σ₂} can be implemented efficiently using only unbiased random bits (Algorithm 10 in [4]). Thus, the only remaining problem is to generate, without floating-point arithmetic, a Bernoulli random value that is true with probability exp(−(ln 2){δ}) = exp(−(ln 2)((by − si)² + 2bkx(by − si))/(b²k²)). Since (by − si)² + 2bkx(by − si) is a non-negative integer, what we essentially need is a method for generating a Bernoulli random value from B_{exp(−(ln 2)(z/(b²k²)))}, where z = (by − si)² + 2bkx(by − si) and k are positive integers such that z < b²k². To this end, we can apply Algorithm 2 with k² replaced by b²k², which involves no floating-point arithmetic and requires no precomputation storage.
Samplers for centers of low precision arise, for instance, with q-ary lattices, where q is relatively large but not huge: sampling from discrete Gaussian distributions with q different centers may then be required, which entails a large storage overhead if one uses precomputed tables. In this case, one could instead use our sampler as described in Algorithm 3, which requires no precomputed table.

Increasing the Precision of µ
Algorithm 3 only supports centers of finite precision. The precision of µ depends on the value of b: a larger b means a µ of higher precision. For example, let b = 2^52. We remark that the bound of 52 bits is significant: floating-point numbers with 53 bits of precision correspond to the double-precision type of the IEEE 754 standard, which is what is commonly used in software. When µ ∈ [0, 1) is given by a double-precision floating-point number, we can represent µ as a rational number with denominator 2^52 and use Algorithm 3 directly.
However, b cannot be 2^52 in practice, since in this case we would need to compute with the modulus b²k² > 2^64, which leads to integer overflow (for 64-bit C/C++ compilers). A big-integer library works but compromises run-time performance. In this subsection, therefore, we improve Algorithm 3 so that it supports a µ of precision up to 52 bits. The basic idea is to avoid computing b²k². We note that generating a Bernoulli random value that is true with probability exp(−(ln 2)(z/(b²k²))) can be decomposed into generating two Bernoulli random values that are true with probabilities exp(−(ln 2)((by − si)/(bk))²) and exp(−(ln 2) · 2x(by − si)/(bk)), respectively. It is not hard to see that a Bernoulli random value that is true with probability exp(−(ln 2) · 2x(by − si)/(bk)) can be generated by using Algorithm 2 with z = 2x(by − si) and k² replaced by bk. In particular, Algorithm 2 requires 2x(by − si) < bk. When 2x(by − si) > bk, we first need to compute ⌊2x(by − si)/(bk)⌋ and invoke Algorithm 1, and then use Algorithm 2 with z = 2x(by − si) mod (bk) and k² replaced by bk.
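The decomposition is pure algebra: z/(b²k²) = ((by − si)/(bk))² + 2x(by − si)/(bk), so the one exponential splits into two whose denominators never exceed bk (times itself). It can be checked numerically as follows; the parameter values are hypothetical, chosen only to exercise the identity, and floating point is used only for the check itself:

```python
import math

# Hypothetical parameter choices, only to exercise the identity.
b, k = 1 << 20, 7
x, y, s, i = 2, 3, 1, 12345          # chosen so that by - si > 0

w = b * y - s * i
z = w * w + 2 * b * k * x * w        # numerator of the full exponent
full = math.exp(-math.log(2.0) * z / (b * b * k * k))
part1 = math.exp(-math.log(2.0) * (w / (b * k)) ** 2)
part2 = math.exp(-math.log(2.0) * 2 * x * w / (b * k))
assert abs(full - part1 * part2) < 1e-12
print("decomposition verified")
```

In the integer-only setting, the point of the split is that the second factor only ever involves the denominator bk, which stays comfortably below the 64-bit limit even for b = 2^52 and small k.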
The following algorithm (Algorithm 4) is designed to generate a Bernoulli random value from the Bernoulli distribution B_θ with θ = exp(−(ln 2)(z/K)) and 0 < z < K.

Algorithm 4: Sample from the Bernoulli distribution B_θ with θ = exp(−(ln 2)(z/K))
Input: integers K and z such that 0 < z < K
Output: a Bernoulli random value from B_θ
1 set x ← ln 2 and n ← 0;
2 sample a uniform deviate u with u ∈ [0, 1);
3 if u ≥ x then go to step 7;
4 sample a uniform deviate v with v ∈ [0, 1);
5 if v < z/K then
6   set x ← u, n ← n + 1 and go to step 2;
7 if n is even then return true, otherwise return false.

The main idea behind Algorithm 4 is to sample two sets of uniform deviates u1, u2, ... and v1, v2, ..., and to determine the maximum value n ≥ 0 such that ln 2 > u1 > u2 > ... > un and v_i < z/K for every i ≤ n. The probability that n ≥ m is ((ln 2)(z/K))^m/m!, so the probability that n is even, i.e., that the algorithm returns true, is exp(−(ln 2)(z/K)). Furthermore, when implementing Algorithm 4, we apply the technique presented in Algorithm 2 for the comparisons involving ln 2, thereby avoiding floating-point arithmetic.
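A floating-point sketch of this u/v loop follows, assuming the reading above: the u-chain descends below ln 2, and each link must additionally pass an exact integer test that succeeds with probability z/K. Here math.log(2.0) stands in for the table-based comparison of Algorithm 2:

```python
import math
import random

def bernoulli_alg4(z, K, rng):
    """Return True with probability exp(-(ln 2) * z / K), for 0 < z < K.

    Counts the maximum n with ln 2 > u1 > ... > un where each link also
    passes an exact test of probability z/K, so that
    P(n >= m) = ((ln 2) z/K)^m / m!  and  P(n even) = exp(-(ln 2) z / K).
    """
    x = math.log(2.0)      # stand-in for the fixed ln 2 table comparison
    n = 0
    while True:
        u = rng.random()
        # rng.randrange(K) < z is an exact integer Bernoulli(z/K) test
        if u >= x or rng.randrange(K) >= z:
            return n % 2 == 0          # chain broken: parity decides
        x = u
        n += 1

# Empirical check: z = 1, K = 2 should give 2^(-1/2) ~ 0.7071.
rng = random.Random(11)
mean = sum(bernoulli_alg4(1, 2, rng) for _ in range(200_000)) / 200_000
print(round(mean, 3))
```

Note that the v-test needs no division at all: randrange(K) < z compares two integers, which is why K = bk can be used directly without ever forming b²k².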
Let us return to the problem of increasing the precision of µ. If k is restricted to be less than 2^8, we can take b = 2^52. In this case, we have (by − si) ≤ bk < 2^60 and 2x(by − si) < 2^64 with overwhelming probability. This means that Algorithm 3, combined with Algorithm 4 as well as Algorithm 2, supports a µ of precision up to 52 bits if k < 2^8, and it can be implemented using only 64-bit integer arithmetic.

Results
On a laptop computer (Intel i7-8550U, 16 GB RAM, the g++ compiler with the -O3 optimization option enabled), we implemented the GPV algorithm in [2], Karney's algorithm in [14], the original Bernoulli sampling algorithm in [4], and our Algorithm 3 (with and without Algorithm 4). Except for Karney's algorithm, the uniform pseudorandom numbers are generated using AES256-CTR with AES-NI (Intel Advanced Encryption Standard New Instructions). Figure 1 shows the performance comparison of these sampling algorithms. We selected Karney's algorithm and the GPV algorithm for comparison with our generalized Bernoulli sampling because they also use rejection sampling and require no precomputation storage. Although algorithms based on inversion sampling or the Knuth-Yao method are usually more efficient, they require a large amount of precomputation storage, especially for sampling from distributions with variable µ or large σ.
For centered discrete Gaussian distributions with σ roughly from 10 to 190, one can obtain about 12.7 × 10^6 samples per second using Algorithm 3. This means that the performance of our improved Bernoulli sampling algorithm is significantly better than that of the original Bernoulli sampling as well as Karney's sampling algorithm. When b = 2^20, for the discrete Gaussian distribution D_{Z,σ,µ} with µ = i/b and i uniform over [0, b − 1], we also observed that Algorithm 3 has almost the same performance as in the centered case. When we take b = 2^52, we need to combine Algorithm 4 with Algorithm 3. The performance in this case decreases to about 6.4 × 10^6 samples per second, slightly worse than the original Bernoulli sampling algorithm; the latter, however, does not support noncentered discrete Gaussian distributions.
Here, the GPV algorithm has the worst performance, since its proposal distribution is poor and its rejection rate is too high, close to 10 [2], while the Bernoulli sampling algorithm has a rejection rate of about 1.47, and the rejection rate of Karney's algorithm is even lower, about 1.39 [14]. We prove in the discussion section of this paper that the rejection rate of Algorithm 3 is still about 1.47.
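The 1.47 figure can be reproduced directly from the distribution parameters: the expected number of trials of Bernoulli sampling is 2k · ρ_{σ₂}(Z⁺)/ρ_{kσ₂}(Z), the ratio of the proposal mass (k offsets, two signs, base mass ρ_{σ₂}(Z⁺)) to the target mass, and this ratio is essentially independent of k. A numeric check of that claim (the formula is our rendering of the standard rejection-sampling accounting, matching the figure stated in [4]):

```python
import math

SIGMA2 = math.sqrt(1.0 / (2.0 * math.log(2.0)))

def rho_sum(sigma, support):
    """Sum of the centered Gaussian function over an integer support."""
    return sum(math.exp(-t * t / (2.0 * sigma * sigma)) for t in support)

for k in (4, 64, 256):
    sigma = k * SIGMA2
    num = 2 * k * rho_sum(SIGMA2, range(0, 40))      # proposal mass
    lim = 15 * int(sigma) + 1
    den = rho_sum(sigma, range(-lim, lim))           # target mass over Z
    print(k, round(num / den, 2))                    # prints "<k> 1.47"
```

The rate converges to 2ρ_{σ₂}(Z⁺)/(σ₂√(2π)) ≈ 1.47, which is why the rejection rate is flat across the σ range tested in Figure 1.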
Our Algorithm 3 outperforms Karney's algorithm because it uses the binary discrete Gaussian distribution D_{Z⁺,σ₂} as its base sampler instead of D_{Z,1}, and a very efficient sampling algorithm exists for this distribution. Algorithm 3 is faster than the original Bernoulli sampling algorithm mainly because its rejection operation is reduced to sampling from only one Bernoulli distribution with an exponential function as its parameter. The original Bernoulli sampling algorithm, by contrast, requires a look-up table in order to sample from a relatively large number of Bernoulli distributions with exponential functions as parameters when performing the rejection operation, which affects its practical performance.
Note that our Algorithm 4 requires Algorithm 2 to be fully implemented, and the basic idea of Algorithm 4 follows that of Algorithm 2. Performing Algorithm 4 once takes roughly the same amount of computation as performing Algorithm 2 twice. This intuitively explains why the performance of Algorithm 3 drops by about half when Algorithm 4 is used.

Discussion
In this section, we show that the noncentered version of Bernoulli sampling presented by Algorithm 3 has the same rejection rate as the Bernoulli sampling for centered discrete Gaussians.
Intuitively, evaluating the rejection rate of Algorithm 3 can be done just as for Bernoulli sampling over centered discrete Gaussian distributions, as presented by Ducas et al. in [4]. Nevertheless, because the distributions here are noncentered, we need the notion of the smoothing parameter of a lattice, introduced by Micciancio and Regev in [1]. In fact, we only need the smoothing parameter of the lattice Z and two of its important properties, since this work involves only the (one-dimensional) integer lattice Z, the simplest lattice.
Definition 1 (a special case of [1], Definition 3.1). Let ε > 0 be a positive real. The smoothing parameter of the lattice Z, denoted by η_ε(Z), is defined to be the smallest real s such that ρ_{1/s}(Z \ {0}) ≤ ε.
We can also see that the rejection rate of 1.47 equals the average number of times that Algorithm 3 calls Algorithm 2 or Algorithm 4. This is because the rejection rate of Algorithm 3 equals the inverse of the probability that the algorithm executes steps 7 and 10 without restarting. Algorithm 3 performs step 7 every time it either outputs a sample or restarts, and step 7 requires one call to Algorithm 2 or Algorithm 4. Thus, the average number of times step 7 is executed equals the rejection rate, and so does the average number of calls to Algorithm 2 or Algorithm 4; by Theorem 1, this is about 1.47, independent of the parameter σ and the center µ.

Conclusions
In this paper, we have shown that Bernoulli(-type) sampling for centered discrete Gaussian distributions over the integers can be implemented without floating-point arithmetic. Apart from a fixed look-up table of very small size (e.g., 128 bits) that stores the binary expansion of ln 2, our proposed algorithm needs no precomputation storage and requires no floating-point arithmetic. We have also presented a noncentered Bernoulli sampling algorithm that supports discrete Gaussian distributions with varying centers. The algorithm needs no floating-point arithmetic, and it is suitable for sampling D_{Z,σ,µ} with µ ∈ [0, 1) of precision up to 52 bits. The noncentered version has the same rejection rate as the original Bernoulli sampling algorithm. Our experiments show that the improved algorithms outperform other rejection algorithms for discrete Gaussian distributions over the integers.