Two-Party Privacy-Preserving Set Intersection with FHE

A two-party private set intersection (PSI) protocol allows two parties, the client and the server, to compute the intersection of their private sets without revealing any information beyond the intersecting elements. We present a novel private set intersection protocol based on Shuhong Gao's fully homomorphic encryption scheme and prove the security of the protocol in the semi-honest model. We also present a variant of the protocol, a novel construction that computes the intersection using a Bloom filter and fully homomorphic encryption, whose complexity is independent of the size of the client's set. The security of the protocols relies on the learning with errors and ring learning with errors problems. Furthermore, in a cloud setting with malicious adversaries, the computation of the private set intersection can be outsourced to the cloud service provider without revealing any private information.


Introduction
In 1978, Rivest et al. first presented the idea of fully homomorphic encryption (FHE) [1]. Gentry constructed the first concrete FHE scheme in 2009 [2]. Since then, Gentry and many other researchers around the world have made dramatic progress in FHE. The first generation is based on ideal lattices and the approximate GCD problem over the integers [2,3]; the second generation is based on the ring learning with errors (RLWE) and learning with errors (LWE) problems, and developed several techniques for limiting noise growth, including re-linearization, key switching and modulus reduction [4,5]; the third generation involves the GSW scheme, which is based on the approximate eigenvector method and RLWE [6]. Shuhong Gao's scheme [7], denoted by SGFHE below, is a compressed fully homomorphic encryption scheme with three features: (1) The ciphertext expansion factor is six under private-key encryption and $10 + \log_2(n)$ under public-key encryption, where $n$ (a power of 2) is the block length of the message; all ciphertext computations are modulo $r$, where $r = 16n$; and the noise size is bounded by $n - 1$. (2) The bootstrapping algorithm needs only a bootstrapping key, and the noise size of the output ciphers is still bounded by $n - 1$, with no failure at all. (3) The security of the scheme is based on the LWE and RLWE problems; for any message block length $n \ge 512$, breaking the scheme with current approaches costs at least $2^{160}$ bit operations. By contrast, with TFHE bootstrapping [8], the LWE cipher produced can be invalid with a probability of about $2^{-33}$ (for $n = 500$). That probability is very small and acceptable for computing many functions; however, it cannot be applied to functions

Related Work
Several specialized PSI protocols have been proposed in the literature that are more efficient than general secure computation [33]. The main approaches are based on oblivious polynomial evaluation [25], oblivious pseudo-random functions [26], blind signatures [27], homomorphic encryption [28] and Bloom filters [29]. Shen Liyan et al. [30] gave a detailed overview of the development prospects of privacy-preserving set intersection computing. In a protocol developed at Google, Mihaela Ion et al. [11] applied privacy-preserving set intersection computing to advertising cooperation.

Contributions
We present three private set intersection protocols. First, we propose a novel private set intersection protocol based on Shuhong Gao's fully homomorphic encryption scheme and prove its security in the honest-but-curious (semi-honest) model. We then present an improved variant of this protocol. Finally, we present a novel construction that computes the intersection using a Bloom filter and fully homomorphic encryption; this protocol's complexity is independent of the size of the client's set. The security of the protocols relies on the learning with errors and ring learning with errors problems. Furthermore, in a cloud setting with malicious adversaries, the computation of the private set intersection can be outsourced to the cloud service provider without revealing any private information. The ciphertext expansion of the protocols is small, which makes them practical.
The remainder of the paper is structured as follows. In Section 2, we review the basic concepts and techniques used. In Section 3, we introduce the homomorphic operations used. In Section 4, we describe the basic two-party computing protocol, the improved protocol and the two-party computing protocol based on the Bloom filter. We present our conclusions in Section 5.

Notation
Let $\chi$ be an error distribution; $x \leftarrow \chi$ denotes that $x$ is chosen randomly according to $\chi$. For an integer $n \ge 1$, let $R_n = \mathbb{Z}[x]/(x^n + 1)$ and $R_{n,q} = \mathbb{Z}[x]/(x^n + 1, q)$, where $(x^n + 1, q)$ denotes the ideal of $\mathbb{Z}[x]$ generated by $x^n + 1$ and $q$. For any polynomial

LWE Ciphers and Modulus Reduction
Regev proposed the LWE problem [31,32] over $\mathbb{Z}_q$. Let $\chi$ be a probability distribution and let $s \in \mathbb{Z}_q^n$ be an arbitrary vector serving as a user's secret key. An LWE sample is a pair $(a, b)$, where $a \in \mathbb{Z}_q^n$ is selected uniformly at random and $b \equiv \langle s, a \rangle + e \pmod{q}$ with $e \leftarrow \chi$.
Modulus reduction converts LWE ciphers over $\mathbb{Z}_q$ into LWE ciphers over $\mathbb{Z}_r$, where $r$ is far smaller than $q$.
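As a rough illustration of the two notions above, the following Python sketch builds a toy LWE cipher of a single bit and rescales it from $\mathbb{Z}_q$ to $\mathbb{Z}_r$. The message encoding (the bit multiplied by $q/2$), the binary secret key, the bounded uniform noise standing in for $\chi$ and all parameter values are assumptions made for this example only; they are not the SGFHE parameters and provide no security.

```python
import random

def lwe_encrypt(s, bit, q, noise_bound, rng):
    """Toy LWE cipher (a, b) with b = <s, a> + e + bit * (q // 2) mod q."""
    a = [rng.randrange(q) for _ in s]            # a chosen uniformly from Z_q^n
    e = rng.randint(-noise_bound, noise_bound)   # assumed stand-in for e <- chi
    b = (sum(si * ai for si, ai in zip(s, a)) + e + bit * (q // 2)) % q
    return a, b

def lwe_decrypt(s, cipher, q):
    """Recover the bit by checking whether b - <s, a> is nearer to 0 or to q/2."""
    a, b = cipher
    phase = (b - sum(si * ai for si, ai in zip(s, a))) % q
    return 1 if q // 4 <= phase < 3 * q // 4 else 0

def modulus_reduce(cipher, q, r):
    """Rescale every component from Z_q to Z_r by rounding x * r / q."""
    a, b = cipher
    scale = lambda x: ((x * r + q // 2) // q) % r
    return [scale(ai) for ai in a], scale(b)

rng = random.Random(1)
n, q, r = 16, 2**20, 2**10                       # illustrative sizes only
s = [rng.randrange(2) for _ in range(n)]         # assumed binary secret key
for bit in (0, 1):
    small = modulus_reduce(lwe_encrypt(s, bit, q, 4, rng), q, r)
    assert lwe_decrypt(s, small, r) == bit       # the bit survives the reduction
```

With a binary secret, the rounding adds an error of at most $(n + 1)/2$ to the phase, which is why the reduced cipher still decrypts correctly as long as this stays well below $r/4$.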

Gadget Matrix
Suppose that $B$ and $\ell$ are positive integers such that $B^{\ell} \ge q$, and let $g = (1, B, \dots, B^{\ell-1})$. An arbitrary $a \in \mathbb{Z}_q$ can be written as $a \equiv a_0 + a_1 B + \cdots + a_{\ell-1} B^{\ell-1} \pmod{q}$, where each $a_i \in \mathbb{Z}$ is small. If $-B/2 \le a_i \le B/2$, then $(a_0, a_1, \dots, a_{\ell-1})$ is unique. If instead $-2B \le a_i \le 2B$, the following lemma is straightforward to prove.
, which is uniform, random and independent. Suppose that
Hence, the decomposition extends to any list of elements in $\mathbb{Z}_q$. That is, each pair of polynomials $(a(x), b(x)) \in R_{n,q}^2$ can be denoted by $(a(x), b(x))\,G^{-1} = u(x)$, where $u(x) \in R_n^{2\ell}$ is selected randomly and uniformly and $\|u(x)\|_\infty \le 2B$. Here $G^{-1}$ is only an operator acting on the right of $(a(x), b(x))$ ($G$ is not a square matrix, so it has no inverse).
The row vector $u(x)$ consists of $2\ell$ polynomials whose coefficients are small, at most $2B$ in absolute value. This trades a larger dimension for smaller coefficients. By the above definition, we have the following equation: $\bigl((a(x), b(x))\,G^{-1}\bigr)\,G = (a(x), b(x))$ for all $(a(x), b(x)) \in R_{n,q}^2$.
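To make the digit decomposition concrete, the following Python sketch writes an element of $\mathbb{Z}_q$ in signed base-$B$ digits and recombines it with $g = (1, B, \dots, B^{\ell-1})$. It is a deterministic variant with digits in $[-B/2, B/2)$ and assumes that $B$ is even and $B^{\ell} \ge q$; the function names and the offset trick are choices made for this illustration, and the randomized decomposition with $|a_i| \le 2B$ used in the scheme is not reproduced here.

```python
def gadget_decompose(a, B, ell, q):
    """Digits (a_0, ..., a_{ell-1}) with -B/2 <= a_i < B/2 and sum a_i * B^i = a (mod q)."""
    assert B % 2 == 0 and B ** ell >= q
    offset = (B // 2) * ((B ** ell - 1) // (B - 1))   # value of the digit string (B/2, ..., B/2)
    t = (a + offset) % q                              # 0 <= t < q <= B^ell
    digits = []
    for _ in range(ell):
        digits.append(t % B - B // 2)                 # unsigned digit shifted into the signed range
        t //= B
    return digits

def gadget_recompose(digits, B, q):
    """Inner product of the digit vector with g = (1, B, ..., B^(ell-1)), reduced mod q."""
    return sum(d * B ** i for i, d in enumerate(digits)) % q

digits = gadget_decompose(987, B=10, ell=3, q=1000)   # [-3, -1, 0]
assert gadget_recompose(digits, B=10, q=1000) == 987
```

Applying the same function to every coefficient of $a(x)$ and $b(x)$ yields the $2\ell$ small polynomials of the row vector $u(x)$.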

External Product
Suppose that $v = (a(x), b(x)) \in R_{n,q}^2$ is a row vector and $A \in R_{n,q}^{2\ell \times 2}$ is an arbitrary $2\ell \times 2$ matrix. Define the external product of $v$ and $A$ as $v \odot A = (v\,G^{-1})\,A$; it is a random vector, because $v\,G^{-1}$ is a random $1 \times 2\ell$ vector. By definition, the external product is right distributive; namely, for two arbitrary matrices A, B ∈ R

GSW Ciphers
Let an n-bit secret key s( according to the definition of RLWE sample where $w(x) \in R_n^{2\ell}$ and $\|w(x)\| \le \tau$; $\tau$ is the error size of GSW ciphers.

Bloom Filter
A Bloom filter [34] is a compact data structure for probabilistic set membership testing that supports efficient insertion and queries. It provides a time- and space-efficient way to check whether an element is in a set. A Bloom filter consists of a binary vector and a set of hash functions; $b_j$ denotes the $j$-th bit of the Bloom filter $b$, and all bits of an empty Bloom filter are 0. A Bloom filter $b$ supports the following three operations:
Create($\alpha$): Create an empty Bloom filter with $\alpha$ bits; the hash functions $\{h_i \mid 0 \le i < \beta\}$ map elements to cell subscripts in $\{0, \dots, \alpha - 1\}$.
Add($x$): Add the element $x$ to the Bloom filter $b$. Compute the $\beta$ hash values $g_i = h_i(x)$ of the element $x$ and set the Bloom filter cells with subscript $g_i$ to 1.
Test($x$): Test whether the element $x$ is in the Bloom filter $b$. Compute the $\beta$ hash values $g_i = h_i(x)$ of the element $x$; if all $\beta$ cells with subscript $g_i$ are 1 ($b_{g_i} = 1$), return 1 (true); otherwise, return 0 (false).
The Bloom filter has a negligible false positive probability: Test($x$) may return 1 although $x$ was never added to the Bloom filter. Given $\omega$ elements to be added and a maximum acceptable false positive probability $2^{-k}$, the Bloom filter size $\alpha$ needs to satisfy $\alpha \ge \omega k \log_2 e$.
Bloom filters are widely used in cryptography. Bellovin and Cheswick [35] and Goh [36] implemented secure document search using a Bloom filter. Raykov and Bellovin [37] realized a secure database query. Qiu and Li [38] realized privacy-preserving data mining, and BIP-0037 proposed the use of Bloom filters in Bitcoin. References [39–41] realized set intersection computing based on Bloom filters.
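The three Bloom filter operations can be sketched in a few lines of Python. The hash family below (SHA-256 of the index and the element, reduced modulo $\alpha$) is an assumption chosen for the example, and `bloom_size` uses the standard sizing rule $\alpha \ge \omega k \log_2 e$, which corresponds to using about $\beta = k$ hash functions; neither is claimed to be the choice made in the cited constructions.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, alpha, beta):
        """Create(alpha): an empty filter of alpha bits with beta hash functions."""
        self.alpha, self.beta = alpha, beta
        self.bits = [0] * alpha

    def _positions(self, x):
        # Assumed hash family: h_i(x) = SHA-256("i|x") mod alpha.
        return [int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % self.alpha
                for i in range(self.beta)]

    def add(self, x):
        """Add(x): set the beta cells g_i = h_i(x) to 1."""
        for g in self._positions(x):
            self.bits[g] = 1

    def test(self, x):
        """Test(x): return 1 only if all beta cells g_i = h_i(x) are 1."""
        return int(all(self.bits[g] for g in self._positions(x)))

def bloom_size(omega, k):
    """Smallest integer alpha with alpha >= omega * k * log2(e)."""
    return math.ceil(omega * k * math.log2(math.e))

bf = BloomFilter(bloom_size(omega=3, k=30), beta=30)
for name in ("alice", "bob", "carol"):
    bf.add(name)
assert all(bf.test(name) == 1 for name in ("alice", "bob", "carol"))
```

An element that was added always tests positive; only elements that were never added can produce a wrong answer.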

Homomorphic Operations
In the SGFHE scheme, let any two LWE ciphers be $E_s(x_1)$ and $E_s(x_2)$; the scheme follows the approach of Ducas et al. [42] and Chillotti et al. [43], but does not need to perform a key switch.

Bootstrapping Algorithm
Lemma 4. Suppose that a bootstrapping key bk has error size at most $\tau_1$, $r$ is divisible by 8 with $r \ge 16n$, and Q ≥ n n−3 16Br 2 τ 1 . Then, for any two LWE ciphers $E_s(x_i) = v_i \in \mathbb{Z}_r^n \times \mathbb{Z}_r$ with error size $\le D_r/4$, where $x_i \in \{0, 1\}$ for $i = 1, 2$, the bootstrapping algorithm in Algorithm 1 outputs random LWE ciphers in $\mathbb{Z}_r^n \times \mathbb{Z}_r$, all with error size $< n \le D_r/4$ [7].
We can divide the data $x$ into $d$ blocks of length $n$. Let $N = dn$, $x = (x_1, x_2, \dots, x_d) \in \{0, 1\}^N$ and $x_k = (x_{k,0}, x_{k,1}, \dots, x_{k,n-1}) \in \{0, 1\}^n$. Each $x_k$ can be expressed as a polynomial $\sum_{i=0}^{n-1} x_{k,i} x^i \in R_n$. The blocks are then encrypted either using the private-key scheme, $c_k = RE_s(x_k)$, $1 \le k \le d$, by Algorithm 2 (the ciphertext size is about $6N$ bits), or using the public-key scheme, $c'_k = RE_{pk}(x_k)$, $1 \le k \le d$, by Algorithm 3 (the ciphertext size is about $N(10 + \log_2(n))$ bits).
Algorithm 3. Encryption under public key: $RE_{pk}(m(x)) \to (a(x), b(x)) \in R_{n,r}^2$. Input: $pk = (k_0(x), k_1(x))$, $k_0(x) \leftarrow R_{n,q}$;
Homomorphic computing can be performed in three steps as follows:
(1) Unpack the RLWE ciphertexts $RE(x_k)$ to get LWE ciphers in $\mathbb{Z}_r^n \times \mathbb{Z}_r$ for the bits of $x$.
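The block layout and the ciphertext sizes quoted above can be checked with a short Python sketch. Padding the last block with 0s is an assumption made here so that every block has length $n$; the function names are illustrative and no encryption is performed.

```python
def split_into_blocks(bits, n):
    """Pad with 0s to a multiple of n, then cut into d blocks x_1, ..., x_d.

    Entry i of a block is the coefficient of x^i in the block's polynomial.
    """
    d = -(-len(bits) // n)                         # ceiling division
    padded = bits + [0] * (d * n - len(bits))
    return [padded[k * n:(k + 1) * n] for k in range(d)]

def ciphertext_bits(N, n):
    """Approximate sizes quoted in the text: (private-key, public-key)."""
    log2_n = n.bit_length() - 1                    # exact because n is a power of 2
    return 6 * N, N * (10 + log2_n)

blocks = split_into_blocks([1, 0, 1, 1, 1], n=4)   # two blocks of length 4
private_bits, public_bits = ciphertext_bits(N=1024, n=512)
```

For $N = 1024$ and $n = 512$, this gives about 6144 bits under private-key encryption and about 19456 bits under public-key encryption.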

Privacy-Preserving Set Intersection
We abstract the private set intersection computation model as follows. The client C owns a set $\{c_1, \dots, c_v\}$ of size $v$, and the server S holds a set $\{s_1, \dots, s_\omega\}$ of size $\omega$. At the end of the protocol, the client C obtains only the intersection $\{c_1, \dots, c_v\} \cap \{s_1, \dots, s_\omega\}$, whereas the server learns nothing about the client's input or the intersection (including the size of the intersection).
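The input and output behaviour described above can be stated as a small Python function that plays the role of a trusted party. This is only a specification of what each party is meant to learn, written with an illustrative function name; it is not a protocol and protects nothing by itself.

```python
def ideal_psi(client_set, server_set):
    """Trusted-party view of PSI: returns (client_output, server_output)."""
    client_output = set(client_set) & set(server_set)   # the client learns the intersection
    server_output = None                                # the server learns nothing
    return client_output, server_output

client_out, server_out = ideal_psi({1, 2, 3}, {2, 3, 4})
assert client_out == {2, 3} and server_out is None
```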

The Basic Two-Party Computing Protocol
The basic private two-party intersection protocol is summarized in Figure 1. The specific steps are as follows:
1. The client C encrypts its set with its private key and sends the ciphertexts to the server S.
2. The server S performs the homomorphic computation with the bootstrapping key and sends the result to the client C.
3. The client C decrypts the result and computes the intersection of the two sets; the server S cannot acquire any information about the input and output.
Our basic two-party computing protocol is shown in Figure 2. At step C → S, the client sends pk, bk and $RE_{sk}(c_k)$ to the server. At step S, the server unpacks $RE_{sk}(c_k)$ to get $E_{sk}(c_{k,j})$, unpacks $RE_{pk}(s_i)$ to get $E_{sk}(s_{i,j})$, samples $u \in \{0, 1\}^n$, calls bootstrapping operations to compute $E_{sk}(z_{k,i})$, computes the LWE ciphers $E_{sk}(w_{i,j})$, packs the resulting LWE ciphers $E_{sk}(w_{i,j})$ into RLWE ciphers $RE_{sk}(w_i)$ and sends them to the client. At step C, the client decrypts $RE_{sk}(w_i)$ to get $w_i$ and computes the intersection.

Correctness of the Basic Two-Party Computing Protocol
First, the correctness of the SGFHE scheme has been proven. Let $c_k$, $s_i$ be the binary representations of the set elements of the client and the server, respectively. Shorter representations are padded with 0s to length $n$.
If $z_{k,i} = 1$, then $c_k \ne s_i$; if $z_{k,i} = 0$, then $c_k = s_i$.
The server can acquire $E_{sk}(z_{k,i})$. Remark: RE represents an RLWE cipher; E represents an LWE cipher. Let $z_i = z_{1,i} \wedge \cdots \wedge z_{v,i}$; then $E_{sk}(z_i) \in \mathbb{Z}_r^n \times \mathbb{Z}_r$ can be computed from the $E_{sk}(z_{k,i})$ by implementing $(v - 1)$ bootstrapping operations. Hence, implementing $(2n + v - 2)$ bootstrapping operations by (6) can compute $E_{sk}(z_i)$.
If $z_i = 1$, then $c_k \ne s_i$ for all $k$, and $w_i = u$ is a random value; if $z_i = 0$, then there exists $k$ such that $c_k = s_i$, and $w_i = c_k = s_i$ is in the intersection. For plaintexts $u_j$ and $s_{i,j}$, an LWE cipher of each bit $w_{i,j} = z_i \wedge u_j \oplus (1 - z_i) \wedge s_{i,j}$ can be computed as $E_{sk}(w_{i,j}) = u_j E_{sk}(z_i) + s_{i,j} (E_{sk}(1) - E_{sk}(z_i))$, which still has error size $< D_r/4$. The server can pack the resulting LWE ciphers $E_{sk}(w_{i,j})$ into RLWE ciphers $RE_{sk}(w_i)$ and send them to the client.
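The logic of the correctness argument can be traced on plaintext bits with the following Python sketch, in which no encryption takes place. It assumes that the comparison bit is computed as $z_{k,i} = \bigvee_j (c_{k,j} \oplus s_{i,j})$, which equals 1 exactly when $c_k \ne s_i$, and that $z_i$ is the AND of the $z_{k,i}$ over all $k$; the function names are illustrative. In the protocol itself, each Boolean operation on secret bits would be a bootstrapping operation on LWE ciphers.

```python
import random

def to_bits(x, n):
    """Binary representation of x, padded with 0s to length n (least significant bit first)."""
    return [(x >> j) & 1 for j in range(n)]

def server_compute(client_bits, server_set, n, rng):
    """Plaintext analogue of the server step: one masked word w_i per server element."""
    words = []
    for s in server_set:
        s_bits = to_bits(s, n)
        u = [rng.randrange(2) for _ in range(n)]            # random mask u in {0,1}^n
        # Assumed comparison: z_{k,i} = OR_j (c_{k,j} XOR s_{i,j}), so z_{k,i} = 1 iff c_k != s_i.
        z_ki = [int(any(c ^ b for c, b in zip(c_bits, s_bits))) for c_bits in client_bits]
        z_i = int(all(z_ki))                                # z_i = 1 iff no client element equals s_i
        words.append([(z_i & uj) ^ ((1 - z_i) & sj) for uj, sj in zip(u, s_bits)])
    return words

def client_intersect(client_set, words):
    """Client step: decode each w_i and keep the values that lie in the client's own set."""
    decoded = {sum(bit << j for j, bit in enumerate(w)) for w in words}
    return decoded & set(client_set)

n = 32
client_set, server_set = [5, 9, 12], [9, 12, 20]
words = server_compute([to_bits(c, n) for c in client_set], server_set, n, random.Random(0))
result = client_intersect(client_set, words)
assert {9, 12} <= result                                    # every common element is recovered
```

A random mask $u$ coincides with an element of the client's set only with probability about $v / 2^n$, so the decoded result equals the true intersection except with negligible probability.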

Security Analysis of the Basic Two-Party Computing Protocol
We analyze the security of the protocol by comparing the real model with the ideal model. The real model is the actual execution of the basic private set intersection protocol, whereas the ideal model uses a trusted party to compute the intersection. The trusted party receives the input $\{c_1, \dots, c_v\}$ of the client and the input $\{s_1, \dots, s_\omega\}$ of the server and returns the intersection to the client; the server does not get any information about the output. The ideal model captures all the security requirements. In the semi-honest model, a participant's view includes its own input and the information received from the other participant during the execution of the protocol. If a simulator can use the participant's input and output to build a simulation that is computationally indistinguishable from this view, then the participant cannot obtain any information beyond its input and output.
Theorem 1. If the SGFHE scheme is secure, then the basic two-party computing protocol realizes private set intersection computing in the semi-honest model.

Proof.
In the protocol, the server receives nothing other than RLWE ciphers. Its view can therefore be simulated with ciphertexts alone, and security against the server follows from the IND-CPA security of the RLWE scheme.
The client receives only RLWE ciphers of the intersection elements and RLWE ciphers of random values. Its view therefore contains only the output of the set intersection, and the simulator can reproduce it from that output alone.

The Improvement of the Basic Two-Party Computing Protocol
In the basic two-party computing protocol, the server returns the ciphertexts of the intersection elements or random ciphertexts, and the client computes the intersection by decrypting them. In our improved protocol, shown in Figure 3, the server only needs to determine whether $c_k$ is in $\{s_1, \dots, s_\omega\}$, without computing the ciphertexts of the intersection elements. On the one hand, this reduces the computational complexity; on the other hand, it does not reveal the size of the server's set. Let $c_k$, $s_i$ be the binary representations of the set elements of the client and the server, respectively. Shorter representations are padded with 0s to length $n$.
If $z_{k,i} = 1$, then $c_k \ne s_i$; if $z_{k,i} = 0$, then $c_k = s_i$.
The server can acquire $E_{sk}(z_{k,i})$. In the protocol, the server receives nothing other than RLWE ciphers, so its view can be simulated by ciphertexts alone. Security against the server is based on the IND-CPA security of the RLWE scheme.
The client acquires $z_{k,i}$ by (9); however, the probability of obtaining $s_{i,j}$ from $z_{k,i}$ and $c_{k,j}$ is $2^{-n}$, which is negligible. The client receives only the output of the intersection; therefore, the view of the simulator is just the output of the set intersection.

Two-Party Computing Protocol Based on a Bloom Filter
In this section, we construct a two-party protocol based on a Bloom filter, shown in Figure 4, in which the client C encrypts each bit of the Bloom filter with its private key and sends it to the server S. The server S homomorphically computes Test($s_j$) with the bootstrapping key of the client C and sends the result to the client. C obtains the intersection of the two sets by decrypting, but the server cannot get any information about the input and output (including the size of the intersection). Let $c_k$, $s_i$ be the binary representations of the set elements of the client and the server, respectively. Shorter representations are padded with 0s to length $n$.
The client C constructs a Bloom filter $b$ = Create($\alpha$), adds its elements $c_1, \dots, c_v$ to it, and sends pk, bk and $RE_{sk}(b)$ to the server S.
$w_j = \{w_{j,1}, \dots, w_{j,n}\} = z_j s_j \oplus (1 - z_j) u$ (11)
If $z_j = 1$, then there exists $k$ such that $c_k = s_j$, and (11) gives $w_j = s_j$; similarly, if $z_j = 0$, then $c_k \ne s_j$ for all $k$, and (11) gives $w_j = u$. For plaintexts $s_j$ and $u$, each bit can be computed by (11) as $w_{j,t} = z_j \wedge s_{j,t} \oplus (1 - z_j) \wedge u_t$, $1 \le t \le n$.
The corresponding LWE cipher is $E_{sk}(w_{j,t}) = s_{j,t} E_{sk}(z_j) + u_t (E_{sk}(1) - E_{sk}(z_j))$.
The correctness and security of the two-party computing protocol based on the Bloom filter are similar to those of the basic two-party computing protocol. Please refer to Sections 4.2 and 4.3.
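The data flow of the Bloom-filter-based protocol can be traced on plaintext bits with the Python sketch below; again, nothing is encrypted. The hash family (SHA-256 of the index and the element, reduced modulo $\alpha$), the parameter values and the function names are assumptions made for the illustration. The sketch also checks that the linear form $s_{j,t} z_j + u_t (1 - z_j)$ applied to ciphers agrees with the Boolean form $z_j \wedge s_{j,t} \oplus (1 - z_j) \wedge u_t$.

```python
import hashlib
import random

def positions(x, alpha, beta):
    """Assumed hash family h_0, ..., h_{beta-1}: SHA-256 of "i|x" reduced mod alpha."""
    return [int.from_bytes(hashlib.sha256(f"{i}|{x}".encode()).digest(), "big") % alpha
            for i in range(beta)]

def client_filter(client_set, alpha, beta):
    """Client step: Create(alpha), then add every client element."""
    b = [0] * alpha
    for c in client_set:
        for g in positions(c, alpha, beta):
            b[g] = 1
    return b

def server_words(b, server_set, alpha, beta, n, rng):
    """Server step on plaintext bits: w_j = s_j if Test(s_j) = 1, else a random mask u."""
    words = []
    for s in server_set:
        z = int(all(b[g] for g in positions(s, alpha, beta)))   # z_j = Test(s_j)
        s_bits = [(s >> t) & 1 for t in range(n)]
        u = [rng.randrange(2) for _ in range(n)]
        w = [st * z + ut * (1 - z) for st, ut in zip(s_bits, u)]  # linear form used on ciphers
        assert w == [(z & st) ^ ((1 - z) & ut) for st, ut in zip(s_bits, u)]
        words.append(w)
    return words

def client_intersect(client_set, words):
    """Client step: decode each w_j and keep the values that lie in the client's own set."""
    decoded = {sum(bit << t for t, bit in enumerate(w)) for w in words}
    return decoded & set(client_set)

alpha, beta, n = 512, 8, 32                                  # illustrative parameters
client_set, server_set = [5, 9, 12], [9, 12, 20]
b = client_filter(client_set, alpha, beta)
words = server_words(b, server_set, alpha, beta, n, random.Random(0))
assert {9, 12} <= client_intersect(client_set, words)        # common elements are recovered
```

The server's work per element depends only on $\beta$ and $n$, not on the number of client elements, which reflects the claim that the protocol's complexity is independent of the size of the client's set.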

Conclusions
We constructed two-party set intersection protocols based on a fully homomorphic encryption scheme. The protocols are simple and need only two rounds of communication, and their security in the semi-honest model is based on the RLWE and LWE problems. The ciphertext expansion of the protocols is small, which makes them practical. Furthermore, the set intersection protocols can be extended to outsourced computation under the malicious model. One limitation of our schemes is that they are two-party protocols; in future work, we shall extend them to multi-party protocols. Another limitation is efficiency: the bootstrapping operation of the SGFHE scheme is the bottleneck. Although the performance of fully homomorphic encryption has improved greatly, its efficiency still deserves in-depth study; we will therefore further study the parallelization and hardware implementation of bootstrapping to improve the overall efficiency of the protocols.