Practical and Malicious Multiparty Private Set Intersection for Small Sets

Private set intersection (PSI) is a pivotal subject in the realm of privacy computation. Numerous research endeavors have concentrated on situations involving vast and imbalanced sets. Nevertheless, few existing PSI protocols are tailored for small sets. Those that exist are either restricted to two parties or require resource-intensive homomorphic operations. To provide practical multiparty private set intersection solutions for small sets, we present two multiparty PSI protocols built on Oblivious Key-Value Stores (OKVSs), polynomials, and garbled cuckoo tables. Our security analysis shows that these protocols are secure in the malicious model and resist collusion attacks. Through experimental evaluations, we establish that, in comparison to related work, our protocols excel in small-set contexts, particularly in low-bandwidth wide area network (WAN) settings.


Introduction
With the successive promulgation of data protection laws like the General Data Protection Regulation (GDPR), privacy preservation has garnered significant attention. Enabling data circulation and computation while safeguarding data privacy poses a challenge. Secure Multiparty Computation (MPC) has emerged as an important solution. MPC encompasses three key functionalities: element-to-element computations, such as basic MPC operations [1]; element-to-set computations, such as Private Information Retrieval (PIR) [2]; and set-to-set computations, like private set intersection [3], which is the focus of our interest. PSI allows two or more parties to compute the intersection of their sets without revealing anything about the elements outside the intersection. PSI has practical applications in various scenarios, including evaluating the scale of incidental intelligence collection [4], safeguarding the plaintext privacy of users in biometric systems [5], privacy protection in vehicular networks [6], and data alignment in federated learning [7].
Data alignment constitutes a crucial preliminary step in federated learning, which is divided into horizontal federated learning and vertical federated learning. Horizontal federated learning, also termed sample-aligned federated learning, requires each party to execute the PSI protocol on the samples, which are typically numerous. Vertical federated learning, also known as feature-aligned federated learning, requires the parties to run the PSI protocol on features, which are usually fewer in number. Presently, most PSI protocols are tailored for large sets and are effective in horizontal federated learning scenarios. While large-set PSI can achieve feature alignment, it does not fully account for scenarios with a limited number of features and leaves room for further runtime optimization. Our primary focus lies in developing PSI protocols tailored for vertical federated learning contexts. Beyond vertical federated learning, numerous applications exist for small sets, such as identifying shared friends across hundreds of items or discovering social circles with common interests through a handful of shared tags. In such settings, an efficient protocol can significantly enhance the end-user experience. The relationship between federated learning and private set intersection is shown in Figure 1.
To the best of our knowledge, the most efficient large-set PSI protocols are based on the Oblivious Transfer (OT) extension primitive [3,8,9]. The OT extension protocol involves a period of public key operations before the symmetric cryptographic operations commence, and in small-set scenarios, the pre-processing time for these public key operations cannot be overlooked. Hence, directly applying large-set PSI protocols to small-set scenarios is not the optimal solution. Rosulek et al. proposed a PSI protocol based on key agreement, which is well-suited for small-set scenarios [10]. However, this protocol is limited to two parties, whereas federated learning is not restricted to pairs of parties. Therefore, its practicality needs further enhancement. Some scholars have introduced PSI protocols for unbalanced sets using fully homomorphic encryption. While theoretically applicable to small-set scenarios, as pointed out by Rosulek et al., these protocols have computational costs several orders of magnitude higher than key agreement and are also limited to two-party interactions. Although there are existing multiparty PSI protocols, such as OT-extension-based multiparty PSI protocols [11] and homomorphic multiparty PSI protocols [12], they are not optimal for multiparty small-set scenarios. OT-extension-based multiparty PSI protocols still require a warm-up period, and homomorphic multiparty PSI protocols involve expensive homomorphic operations. Incorporating the advantages of key agreement into multiparty PSI scenarios is therefore a research direction worth pursuing.
We integrate the benefits of key agreement into multiparty PSI scenarios while also exploring other optimization strategies. In recent years, research on index structures has become a hot topic. Index structures can be effectively integrated into various fields. For example, incorporating optimized index structures in the field of machine learning significantly improves efficiency [13,14]. Similarly, in studies related to PSI, various index structures such as cuckoo hashing and Bloom filters have been introduced [3,11,15]. To enhance security against malicious attacks, researchers have proposed more robust index structures such as garbled Bloom filters (GBFs) and garbled cuckoo tables (GCTs) [8,16]. Integrating these advanced index structures into new PSI protocols poses a significant challenge.
In this paper, we propose practical multiparty private set intersection protocols for small sets. In sum, the contributions of our work are as follows:

1. We introduce two multiparty private set intersection protocols designed for small sets, leveraging distinct structures of oblivious key-value stores. These protocols employ key agreement and zero-sharing techniques to achieve our objectives.

2. We analyze the security of both protocols and demonstrate that both are correct and secure under the malicious security model against collusion attacks.

3. We implement the two protocols in Rust, and the experimental results demonstrate that, compared with related works, our protocols are more suitable for small-set scenarios, especially in bandwidth-constrained simulations.

Related Work
The original PSI was based on the idea of Diffie-Hellman (DH). Meadows et al. [17] proposed the first DH-based PSI protocol. Nowadays, work on PSI aims at enhancing the security model, improving efficiency, adapting to different scenarios, and so on.
Enhanced security model: De Cristofaro et al. [18] constructed a DH-based PSI protocol under the malicious model, while Orrù et al. [19] constructed an OT-extension-based PSI protocol under the malicious model. With the introduction of Oblivious Key-Value Store structures, the performance gap between the two approaches is gradually narrowing.
Improved efficiency: Ishai et al. [20] proposed an OT extension that reduces public key operations; Kolesnikov et al. [3,21] improved the OT extension, permitting longer elements as input; and Garimella et al. and Pinkas et al. [8,16] proposed OT-extension-based PSI protocols under the malicious model. Garimella et al. [16] proposed OKVSs to improve the computational efficiency. Theoretically, the 3H-GCT provides the best results, while the GBF incurs more communication and still requires some improvement. For example, Ben-Efraim et al. [22] used the GBF to solve specific problems, Boyle et al. [23] proposed silent OT, and Rindal et al. [24] introduced silent OT into PSI to reduce the communication cost. Similarly, Rosulek et al. [10] reduced the communication volume by half compared to the original DH-based PSI protocol.
Adaptation scenarios: Kolesnikov et al. [11] proposed a multiparty PSI protocol using the IKNP OT extension; Rosulek et al. [10] proposed a two-party PSI for small sets; Nevo et al. [9] proposed a multiparty PSI protocol that tolerates arbitrary collusion; and Dui et al. [25] proposed a small-entries PSI protocol with a length of approximately 32 bits per element. Other works [26,27] require the number of intersection elements to exceed a certain value. Additionally, Bay et al. [12] introduced a different threshold, which characterizes an element appearing in the sets of at least t parties. Wei et al. [28] also recently proposed a PSI protocol for small sets of multiple participants based on a zero-sharing structure different from ours, which can be further optimized.

Organization
We introduce the prerequisite knowledge for the subsequent protocols in Section 2. Section 3 provides the definitions of functionality and security. Sections 4 and 5 discuss two specific protocols. Section 6 presents the security analysis. Section 7 covers the experimental details. Finally, Section 8 discusses the findings, and Section 9 concludes the paper.

Key Agreement
We focus on a type of one-round key agreement (KA), namely Diffie-Hellman key agreement (DHKA). Given security parameter $\kappa$, the size of the output key space satisfies $|\mathrm{KA.K}| \ge 2^{\kappa}$. The algorithms of the KA are as follows:

1. KA.R is the space of private randomness.

2. KA.msg(a) = aG, where a is a secret key and G is the base point.

3. KA.key(a, y) = ay, where y is a msg (public key).

Elliptic-curve points have a distinctive structure and can be easily distinguished from random strings. In certain scenarios, it is necessary to encode an elliptic-curve point as a uniformly distributed random string. This functionality was achieved in related work [29]. The algorithm for Elligator-DHKA is as follows:

$\mathrm{KA.key}_e(a, y) = a \cdot \mathrm{dec}(y)$

The function enc(·) encodes points on an elliptic curve into random strings. Approximately 50% of the points can be successfully encoded. The points rG used here lie in a subset of the elliptic curve denoted as im(dec), where dec(·) is the function that decodes random strings into points on the elliptic curve.
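The one-round key agreement interface above can be illustrated with a small sketch. It is only a toy under stated assumptions: the multiplicative group modulo the prime 2^127 − 1 with base 3 stands in for the elliptic-curve group (so aG becomes G^a mod P), the Elligator encoding is omitted, and the function names `ka_random`, `ka_msg`, and `ka_key` are hypothetical.

```python
import secrets

# Toy parameters (assumption): a Mersenne prime modulus and a small base.
# A real instantiation would use an elliptic-curve group such as Curve25519.
P = 2**127 - 1
G = 3

def ka_random():
    """Sample private randomness a from KA.R (here: an exponent in [2, P-2])."""
    return secrets.randbelow(P - 3) + 2

def ka_msg(a):
    """KA.msg(a): the public message; written aG on a curve, G^a mod P here."""
    return pow(G, a, P)

def ka_key(a, y):
    """KA.key(a, y): the shared key; written ay on a curve, y^a mod P here."""
    return pow(y, a, P)

# Both sides derive the same key from their own secret and the other's message.
a, b = ka_random(), ka_random()
assert ka_key(a, ka_msg(b)) == ka_key(b, ka_msg(a))
```

Because each side sends a single message that does not depend on the other side's message, the exchange completes in one round, which is the property the protocols rely on.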

Zero Sharing
We adopt an unconditional zero-sharing structure, which tolerates a maximum of $m - 2$ corrupt parties [11]. For any element $x$, the zero-sharing value of party $P_i$ is denoted as $S_i(x)$ and satisfies $\sum_{i=0}^{m-1} S_i(x) = 0$, where $m$ is the number of parties.

Oblivious Key-Value Store
The Oblivious Key-Value Store (OKVS) is a structure abstracted by Garimella et al. and Pinkas et al. [8,16]. It preserves the key-value mapping while hiding the actual keys. An OKVS consists of two steps: encoding and decoding. Following related work [16], we provide a specific step-by-step procedure for a three-hashing garbled cuckoo table (3H-GCT).
The encoding of 3H-GCT is shown in Algorithm 1, and the decoding of 3H-GCT is shown in Algorithm 2. The common parameters required for encoding and decoding are as follows:

• The parameter of statistical security $\lambda = 40$.

Algorithm 1 (3H-GCT encoding, recoverable steps): each key $k_i$ is placed in, among others, the bucket at position $H_2(k_i)$ of $V_2$; in step 3, the encoder repeatedly searches the three spaces for a bucket containing only one element, pushes the element of that bucket onto the stack $S$, and removes that element from the other two spaces.
Here, $\langle x, y \rangle$ represents the inner product of vector $x$ and vector $y$.
The 3H-GCT is essentially a hypergraph. Graphs have been extensively studied in various fields, such as graph decomposition [30] and research on symbolic networks [31]. In theory, optimizing graphs or hypergraphs could further enhance our protocol.
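The peeling step described for the 3H-GCT can be sketched as follows. This is a simplified illustration, not the construction of [16]: it omits the dense part that the inner product is used for, retries with a new seed when peeling gets stuck, uses 64-bit integer values combined with XOR, and relies on BLAKE2b from the Python standard library as a stand-in hash. The names `positions`, `encode`, `decode`, and `build` are hypothetical.

```python
import hashlib
import secrets

def positions(key, seed, region_size):
    """Map a key to one bucket in each of the three regions of the table."""
    out = []
    for r in range(3):
        d = hashlib.blake2b(key, key=seed, salt=bytes([r]), digest_size=8).digest()
        out.append(r * region_size + int.from_bytes(d, "big") % region_size)
    return out

def encode(pairs, seed, region_size):
    """Return a table where the XOR of a key's three buckets is its value, or None."""
    size = 3 * region_size
    pos = {k: positions(k, seed, region_size) for k in pairs}
    buckets = [set() for _ in range(size)]
    for k, ps in pos.items():
        for p in ps:
            buckets[p].add(k)
    # Peeling: repeatedly take a bucket holding exactly one key, push it on the
    # stack, and remove that key from its other two buckets.
    stack = []
    todo = [p for p in range(size) if len(buckets[p]) == 1]
    while todo:
        p = todo.pop()
        if len(buckets[p]) != 1:
            continue
        (k,) = buckets[p]
        stack.append((k, p))
        for q in pos[k]:
            buckets[q].discard(k)
            if len(buckets[q]) == 1:
                todo.append(q)
    if len(stack) != len(pairs):
        return None  # peeling got stuck; this sketch simply gives up on the seed
    # Unwind the stack: the peeled bucket is still unassigned, so solve for it.
    table = [None] * size
    for k, p in reversed(stack):
        others = [q for q in pos[k] if q != p]
        for q in others:
            if table[q] is None:
                table[q] = secrets.randbits(64)
        table[p] = pairs[k] ^ table[others[0]] ^ table[others[1]]
    return [secrets.randbits(64) if t is None else t for t in table]

def decode(table, key, seed, region_size):
    """Decoding is the XOR of the three buckets the key hashes to."""
    a, b, c = positions(key, seed, region_size)
    return table[a] ^ table[b] ^ table[c]

def build(pairs, region_size, max_tries=64):
    """Try successive seeds until peeling succeeds (assumed retry strategy)."""
    for attempt in range(max_tries):
        seed = attempt.to_bytes(2, "big")
        table = encode(pairs, seed, region_size)
        if table is not None:
            return seed, table
    raise RuntimeError("peeling failed for every seed; enlarge the table")
```

A caller would run `seed, table = build(pairs, region_size)` and later `decode(table, key, seed, region_size)`; decoding a key that was never encoded returns a value that depends only on random-looking table entries.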

Oblivious Programmable PRF
Pseudorandom function (PRF): Given a PRF key $k$ and an input $x$, the PRF outputs the pseudorandom value $\mathrm{PRF}_k(x)$.

Security Definitions
Ideal Functionality of MPSI
Definition 1 (MPSI ideal functionality $F_{\mathrm{MPSI}}$). For $i \in [0, m)$, each party $P_i$ holds its own set $X_i = \{x_{i,j} : j \in [0, n_i)\}$ as the input of $F_{\mathrm{MPSI}}$, where $n_i = |X_i|$. Then, $F_{\mathrm{MPSI}}$ returns the output $\bigcap_{i=0}^{m-1} X_i$ to $P_{m-1}$ without leaking extra information.
MPSI is based on the malicious security model. In the malicious security model, corrupted parties can deviate from the protocol and try to reveal other parties' private input.
Let $\mathrm{Real}_{\Pi}(\mathrm{input}(\cdot), \mathrm{output}(\cdot))$ be the view of the adversaries in the execution of the real protocol $\Pi$ with input $\mathrm{input}(\cdot)$ and the real output $\mathrm{output}(\cdot)$ of corrupted parties $P_i \in C$, where $C$ is the set of corrupted parties. Let $\mathrm{Ideal}_{\Pi}(\mathrm{input}(\cdot), \mathrm{output}(\cdot))$ be the view of the adversaries in the execution of the ideal protocol $F$ with input $\mathrm{input}(\cdot)$ and the ideal functionality $F$ controlled by a simulator Sim.
Definition 2 (Malicious security model). Given the MPSI protocol $\Pi_{\mathrm{MPSI}}$ and the corresponding ideal functionality $F_{\mathrm{MPSI}}$, the protocol $\Pi_{\mathrm{MPSI}}$ is secure in the malicious security model to achieve $F_{\mathrm{MPSI}}$ if there exists a probabilistic polynomial time (PPT) simulator Sim that uses the random input $R_i$ of any party and the output of $F_{\mathrm{MPSI}}$ to generate a simulated view that is computationally indistinguishable from the view of an arbitrary PPT adversary in the real world, where $\mathrm{input}(X_C)$ is the set of inputs of the parties in $C$.

Poly-DH MPSI Protocol
Inspired by the works of Rosulek et al. [10] and Kolesnikov et al. [11], we describe an MPSI protocol based on key agreement and zero sharing, and we implement the OPPRF functionality using KA and zero sharing. We call this protocol the "Poly-DH MPSI protocol", and we offer an encompassing framework diagram in Figure 2.
The protocol employs a star-shaped topology, with party $P_{m-1}$ designated as the central node. $P_{m-1}$ and the parties $P_i$ ($0 \le i \le m - 2$) execute the OPPRF protocol. $P_{m-1}$ collects the OPPRF results from the other parties and, ultimately, outputs the intersection. Our OPPRF protocol is based on a one-round key agreement construction. The process whereby $P_{m-1}$ sends messages to $P_i$ ($0 \le i \le m - 2$) is referred to as PSI Request, and the process whereby $P_i$ ($0 \le i \le m - 2$) sends messages to $P_{m-1}$ is called PSI Response. Before the request and response steps, there are the zero-sharing and preprocessing stages. We refer to the preprocessing stage as PSI Preparation.

Initialization
Determine the $m$ parties $P_0, P_1, \ldots, P_{m-1}$ and their input sets $X_0, X_1, \ldots, X_{m-1}$ for the PSI protocol. For unconditional zero sharing, agree on a specific implementation of the PRF. Agree on a random oracle $H : \{0, 1\}^* \to \mathbb{F}$, an ideal permutation $\Pi/\Pi^{-1} : \mathbb{F} \to \mathbb{F}$, and the specific KA implementation.
The ideal permutation model is a weaker variant of the ideal cipher model in which the cipher is fixed to one key. The permutation acts like a random oracle, but it is invertible.

Zero Sharing
We use an example to explain why we chose the zero-sharing technique to extend the two-party protocol of [10] to m parties.
Example 1 (Why did we choose zero sharing?). We represent by $[[x]]$ the "ciphertext" obtained by a series of DHKA transformations of element $x$. Assuming Alice has a set $X_A = \{a, d, e, g\}$, Bob has a set $X_B = \{b, d, f, g\}$, and Carol has a set $X_C = \{c, e, f, g\}$, the set relationship is shown in Figure 3.
We can obtain the intersection $X_A \cap X_B = \{d, g\}$ through the two-party PSI protocol. If the three-party intersection is instead assembled from two-party PSI runs, information can leak: the multiparty PSI protocol should expose only $g$ to Alice, Bob, and Carol, whereas implementing the two-party PSI protocol twice in this way also exposes $e$ to Alice and Carol and $f$ to Bob and Carol.
If we apply zero sharing for each element, the shares of the three parties for $g$ sum to zero, and each of Alice, Bob, and Carol has the element $g$, so the term $-3[[g]]$ can be calculated and the resulting equation checked to obtain the intersection. For Carol, $e$ and $f$ are not leaked because the values Carol obtains for them do not satisfy this check, and she cannot determine whether they are intersecting elements. If Carol is the central node, an equivalent judgment formula applies.
Parties $P_0, P_1, \cdots, P_{m-1}$ in the zero-sharing protocol are connected in a fully connected topology.

1. Party $P_i$ randomly generates a set of PRF keys $\{k_{ij} : i < j < m\}$ and sends PRF key $k_{ij}$ to party $P_j$.

2. Party $P_j$ receives PRF keys $k_{ij}$ from $P_i$ ($0 \le i < j$) and obtains a set of PRF keys $\{k_{ij} : 0 \le i < j\}$.

3. Party $P_i$ obtains the zero-sharing function handle $S_i(\cdot)$ through Equation (1).
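The three steps above can be sketched in a few lines. Equation (1) is not reproduced in the text, so the sign convention used here (add the PRF values for keys a party generated, subtract those for keys it received) is an assumption that merely guarantees the shares sum to zero. The sketch also assumes a toy prime field and uses keyed BLAKE2b from the Python standard library in place of the keyed BLAKE3 PRF; `prf`, `deal_keys`, and `share` are hypothetical names.

```python
import hashlib
import secrets

P = 2**127 - 1  # toy prime field (assumption); the paper works with 256-bit values

def prf(key, x):
    """Keyed hash as a PRF stand-in (BLAKE2b here instead of keyed BLAKE3)."""
    digest = hashlib.blake2b(x, key=key, digest_size=32).digest()
    return int.from_bytes(digest, "big") % P

def deal_keys(m):
    """Steps 1-2: P_i samples k_ij for every j > i and sends it to P_j."""
    return {(i, j): secrets.token_bytes(32)
            for i in range(m) for j in range(i + 1, m)}

def share(i, m, keys, x):
    """Step 3 (assumed sign convention): add PRFs for keys P_i generated,
    subtract PRFs for keys P_i received from lower-indexed parties."""
    plus = sum(prf(keys[(i, j)], x) for j in range(i + 1, m))
    minus = sum(prf(keys[(j, i)], x) for j in range(i))
    return (plus - minus) % P

# Every pairwise key contributes once with each sign, so the shares cancel.
m = 5
keys = deal_keys(m)
assert sum(share(i, m, keys, b"g") for i in range(m)) % P == 0
```

Since the keys are exchanged once and the shares are derived per element on demand, the same key material can serve several PSI runs, which matches the reuse noted in the PSI Preparation stage.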

PSI Preparation
This stage is the preprocessing stage of PSI and can be performed in advance during the offline phase, for example, to generate the required DH public-private key pairs (a, aG). Our zero-sharing protocol and PSI protocol are two independent protocols in real application scenarios, and multiple PSI tasks can reuse the same zero-sharing task, which is an advantage over the work of Wei et al. [28].
Party $P_i$ randomly generates secret key $a_i$ and obtains public key $y_i$ through Equation (2). Then, $P_i$ sends $y_i$ to party $P_{m-1}$.
Party $P_{m-1}$ randomly generates a set of secret keys $\{a_{m-1,j} : 0 \le j < n_{m-1}\}$ in $\mathrm{KA.R}_e$ and obtains a set of public keys $\{y_{m-1,j} : 0 \le j < n_{m-1}\}$. Here, $n_{m-1}$ is the set size of $X_{m-1}$. Party $P_{m-1}$ receives a set of public keys $\{y_i : 0 \le i < m - 1\}$ and obtains a set of KA keys $\{\mathrm{key}_{i,j} : 0 \le i < m - 1,\ 0 \le j < n_{m-1}\}$ through Equation (3):
$\mathrm{key}_{i,j} = \mathrm{KA.key}(a_{m-1,j}, y_i)$. (3)

PSI Request Flow
This subsection mainly describes the relevant steps for $P_{m-1}$ to send messages to the other parties.
Party $P_{m-1}$ inputs set $X_{m-1} = \{x_{m-1,j} : 0 \le j < n_{m-1}\}$ and obtains interpolation polynomial $\mathrm{Pol}_{m-1}$ through Equation (4). Then, $P_{m-1}$ sends $\mathrm{Pol}_{m-1}$ to party $P_i$ ($0 \le i < m - 1$). Here, interpol(k, v) represents polynomial interpolation over the field $\mathbb{F}$, where (k, v) is a key-value pair. $H$ is the random oracle, and $\Pi^{-1}$ is the inverse of the ideal permutation.

1. Party $P_i$ receives interpolation polynomial $\mathrm{Pol}_{m-1}$ from party $P_{m-1}$ and obtains a set of interpolated values $\{y^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (5). Here, v = Pol(k) represents the evaluation of the polynomial at k to obtain v, $\Pi$ is the ideal permutation, and $n_i$ is the set size for set $X_i$ of party $P_i$.

2. Party $P_i$ obtains a set of KA keys $\{\mathrm{key}^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (6).

3. Party $P_i$ obtains a set $Z_i = \{z_{i,j} : 0 \le j < n_i\}$ through Equation (7). Here, $S_i(x)$ represents the zero-sharing share of element $x$.
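Equations (4) to (7) are referenced but not reproduced in the text. A plausible reading, inferred from the description of interpol, $H$, $\Pi$, and $\Pi^{-1}$ above and from the simulator's expression $z_{i,j} = S_i(x_{i,j}) - \mathrm{KA.key}_e(a_i, \Pi(\mathrm{Pol}_{m-1}(H(x_{i,j}))))$ in the security proof, is the following. The exact form and the correspondence to the equation numbers are assumptions.

```latex
\begin{align*}
\mathrm{Pol}_{m-1} &= \mathrm{interpol}\bigl(\{(H(x_{m-1,j}),\ \Pi^{-1}(y_{m-1,j})) : 0 \le j < n_{m-1}\}\bigr) \\
y^{i}_{m-1,j} &= \Pi\bigl(\mathrm{Pol}_{m-1}(H(x_{i,j}))\bigr) \\
\mathrm{key}^{i}_{m-1,j} &= \mathrm{KA.key}_e\bigl(a_i,\ y^{i}_{m-1,j}\bigr) \\
z_{i,j} &= S_i(x_{i,j}) - \mathrm{key}^{i}_{m-1,j}
\end{align*}
```

Under this reading, an element shared with $P_{m-1}$ makes $P_i$ recover exactly the public key $y_{m-1,j}$ that $P_{m-1}$ bound to that element, while any other element yields an unrelated value.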

PSI Response Flow
This step mainly describes the relevant steps for $P_i$ ($0 \le i < m - 1$) to send messages to $P_{m-1}$.
Party $P_i$ obtains interpolation polynomial $\mathrm{Pol}_i$ through Equation (8). Then, $P_i$ sends $\mathrm{Pol}_i$ to party $P_{m-1}$.
Evaluation for $P_{m-1}$:
1. Party $P_{m-1}$ receives interpolation polynomial $\mathrm{Pol}_i$ from party $P_i$ ($0 \le i < m - 1$) and obtains a set of interpolated values $\{z^{m-1}_{i,j} : 0 \le j < n_{m-1}\}$ through Equation (9).
2. Party $P_{m-1}$ obtains a set $\{t_j : 0 \le j < n_{m-1}\}$ through Equation (10).
We summarize the Poly-DH MPSI protocol in Figure 4.
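To make the message flow concrete, the following toy run strings the stages together. It is a sketch under several loudly stated assumptions rather than the paper's implementation: the group is the multiplicative group modulo the prime 2^127 − 1 (not Curve25519), the same prime field is used for interpolation, the ideal permutation and Elligator encoding are omitted, BLAKE2b stands in for both the random oracle and the PRF, all parties are simulated in one process, and the form of $t_j$ follows the output rule $t_j = 0$. All function names are hypothetical, and each input set must contain distinct elements.

```python
import hashlib
import secrets

P = 2**127 - 1   # toy prime: both the field F and the modulus of the DH group
G = 3            # toy base element (assumption)

def h(x, key=b""):
    """Hash bytes into F; unkeyed it plays H, keyed it plays the PRF."""
    d = hashlib.blake2b(x, key=key, digest_size=32).digest()
    return int.from_bytes(d, "big") % P

def interpol(points):
    """Return a function evaluating the Lagrange polynomial through `points`."""
    def evaluate(k):
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (k - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, -1, P)) % P
        return total
    return evaluate

def zero_share(i, m, keys, x):
    """Share S_i(x) with an assumed sign convention; shares sum to zero."""
    plus = sum(h(x, keys[(i, j)]) for j in range(i + 1, m))
    minus = sum(h(x, keys[(j, i)]) for j in range(i))
    return (plus - minus) % P

def poly_dh_mpsi(sets):
    """Return the intersection as seen by the last party (the central node)."""
    m = len(sets)
    center = sets[m - 1]
    keys = {(i, j): secrets.token_bytes(32)
            for i in range(m) for j in range(i + 1, m)}
    # PSI Preparation: one secret per leaf party, one per element of the center.
    a = [secrets.randbelow(P - 3) + 2 for _ in range(m - 1)]
    y = [pow(G, ai, P) for ai in a]
    a_c = [secrets.randbelow(P - 3) + 2 for _ in center]
    # PSI Request: the center interpolates H(x) -> its per-element public key.
    pol_c = interpol([(h(x), pow(G, ac, P)) for x, ac in zip(center, a_c)])
    # PSI Response: each leaf masks its zero share with the derived KA key.
    responses = []
    for i in range(m - 1):
        pts = [(h(x), (zero_share(i, m, keys, x) - pow(pol_c(h(x)), a[i], P)) % P)
               for x in sets[i]]
        responses.append(interpol(pts))
    # Evaluation at the center: t == 0 when every mask cancels.
    result = []
    for x, ac in zip(center, a_c):
        t = zero_share(m - 1, m, keys, x)
        for i in range(m - 1):
            t += responses[i](h(x)) + pow(y[i], ac, P)
        if t % P == 0:
            result.append(x)
    return result
```

Running it on the sets of Example 1, `poly_dh_mpsi([[b"a", b"d", b"e", b"g"], [b"b", b"d", b"f", b"g"], [b"c", b"e", b"f", b"g"]])`, leaves only `b"g"` passing the zero check at the central node, except with negligible probability.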

Cuckoo-DH MPSI Protocol
In Section 4, we implemented a small-set MPSI protocol using polynomial interpolation. Among OKVS structures, polynomial interpolation incurs the lowest communication overhead but has relatively high computational complexity. If we allow a slight increase in the communication overhead to reduce the computational costs, an alternative protocol construction is possible. This involves replacing the polynomial interpolation component with the 3H-GCT structure. The communication overhead of the 3H-GCT is roughly 1.3 times that of polynomial interpolation, but the computational complexity decreases significantly. Therefore, we propose a small-set MPSI protocol based on the 3H-GCT. We call this protocol the "Cuckoo-DH MPSI protocol", and we offer an encompassing framework diagram in Figure 5.
Compared to Poly-DH MPSI, Cuckoo-DH MPSI incurs a slight increase in communication overhead. Therefore, it is suitable for scenarios where the bandwidth is not extremely limited.
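The communication trade-off can be made tangible with a back-of-the-envelope calculation. The sketch below assumes 256-bit (32-byte) elements as in the implementation, models the 3H-GCT as exactly ceil(1.3 n) slots with no additional dense slots, and ignores latency and protocol framing; the function names are hypothetical and the numbers are illustrative arithmetic, not measurements.

```python
ELEMENT_BYTES = 32  # 256-bit elements, as in the implementation section

def poly_bytes(n):
    """A polynomial through n points has n coefficients."""
    return n * ELEMENT_BYTES

def gct_bytes(n):
    """Assumed 3H-GCT size: ceil(1.3 * n) slots, ignoring any extra dense slots."""
    slots = (13 * n + 9) // 10  # integer form of ceil(1.3 * n)
    return slots * ELEMENT_BYTES

def transfer_seconds(num_bytes, mbps):
    """Idealised transfer time at `mbps` megabits per second (no latency)."""
    return num_bytes * 8 / (mbps * 1_000_000)

n = 2**10
print(poly_bytes(n), gct_bytes(n))                   # bytes per encoded structure
print(transfer_seconds(poly_bytes(n), 1),            # seconds at 1 Mbps
      transfer_seconds(gct_bytes(n), 1))
```

For a single structure at the largest set size considered, the difference is on the order of ten kilobytes, which is why the extra cost only matters when the link is very slow.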

Zero Sharing and PSI Preparation
These two steps are consistent with the zero-sharing step and PSI preparation step of the Poly-DH MPSI protocol, and there is no need to modify any relevant parameters.

PSI Request Flow with 3H-GCT
We abstracted the interpolation and evaluation as encode and decode operations. Therefore, the improved steps are as follows.

1. Party $P_i$ receives coefficient vector $D_{m-1}$ from party $P_{m-1}$ and obtains a set of decoded values $\{y^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (12). Here, v = D(k) represents the decoding of k to obtain v.

2. Party $P_i$ obtains a set of KA keys $\{\mathrm{key}^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (6).

3. Party $P_i$ obtains a set $Z_i = \{z_{i,j} : 0 \le j < n_i\}$ through Equation (7).

PSI Response Flow with 3H-GCT
Similarly, in this step, the polynomial is replaced with the 3H-GCT.
Encode for $P_i$ ($0 \le i < m - 1$):
1. Party $P_i$ obtains coefficient vector $D_i$ through Equation (13). Then, $P_i$ sends $D_i$ to party $P_{m-1}$.
Decode for $P_{m-1}$:
1. Party $P_{m-1}$ receives coefficient vector $D_i$ from party $P_i$ ($0 \le i < m - 1$) and obtains a set of decoded values $\{z^{m-1}_{i,j} : 0 \le j < n_{m-1}\}$ through Equation (14).
We summarize the Cuckoo-DH MPSI protocol in Figure 6.


Figure 6. Cuckoo-DH MPSI protocol. In the final step, $P_{m-1}$ outputs $\{x_{m-1,j} : t_j = 0 \text{ and } j \in [0, n_{m-1})\}$.
Proof. For each $x_{i,q} = x_{m-1,j} \in X_i \cap X_{m-1}$, $P_i$ can compute $\mathrm{Pol}_{m-1}(x_{i,q})$ or $D_{m-1}(x_{i,q})$ and obtain $y^{i}_{m-1,q} = y_{m-1,j}$. Consequently, $\mathrm{key}^{i}_{m-1,q} = a_i y^{i}_{m-1,q} = a_i y_{m-1,j}$ and $z_{i,q} = S_i(x_{i,q}) - \mathrm{key}^{i}_{m-1,q} = S_i(x_{m-1,j}) - a_i y_{m-1,j}$. $P_{m-1}$ can compute $\mathrm{Pol}_i(x_{m-1,j})$ or $D_i(x_{m-1,j})$ and obtain $z^{m-1}_{i,j} = z_{i,q} = S_i(x_{m-1,j}) - a_i y_{m-1,j}$. $P_{m-1}$ can also compute $\mathrm{key}_{i,j} = a_{m-1,j} y_i = a_i y_{m-1,j}$; therefore, $z^{m-1}_{i,j} + \mathrm{key}_{i,j} = S_i(x_{m-1,j})$.
For each $x_{i,q} \notin X_{m-1}$, $P_i$ can compute $\mathrm{Pol}_{m-1}(x_{i,q})$ or $D_{m-1}(x_{i,q})$ and obtain $y^{i}_{m-1,q} = \nabla$. Consequently, $\mathrm{key}^{i}_{m-1,q} = a_i y^{i}_{m-1,q} = a_i \nabla$ and $z_{i,q} = S_i(x_{i,q}) - \mathrm{key}^{i}_{m-1,q} = S_i(x_{i,q}) - a_i \nabla$. In this context, $\nabla$ is indistinguishable from random values.
For each $x_{m-1,j} \notin X_i$, $P_{m-1}$ can compute $\mathrm{Pol}_i(x_{m-1,j})$ or $D_i(x_{m-1,j})$ and obtain $z^{m-1}_{i,j} = \nabla'$. $P_{m-1}$ can also compute $\mathrm{key}_{i,j} = a_{m-1,j} y_i = a_i y_{m-1,j}$; therefore, $z^{m-1}_{i,j} + \mathrm{key}_{i,j} = \nabla' + a_{m-1,j} y_i = \nabla''$. In this context, $\nabla'$ and $\nabla''$ are indistinguishable from random values. If $x_{m-1,j}$ is missing from at least one set $X_i$, at least one random value $\nabla''$ is added, and the final sum is still a random value.
Therefore, Poly-DH MPSI and Cuckoo-DH MPSI are correct with overwhelming probability.
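For an element $x = x_{m-1,j}$ held by every party, the cancellation argued in the proof can be written out in one chain. The form of $t_j$ used in the first line is an assumption, since Equation (10) is not reproduced in the text; it is inferred from the output rule $t_j = 0$ and from the terms $z^{m-1}_{i,j} + \mathrm{key}_{i,j}$ in the proof.

```latex
\begin{align*}
t_j &= S_{m-1}(x) + \sum_{i=0}^{m-2}\bigl(z^{m-1}_{i,j} + \mathrm{key}_{i,j}\bigr) \\
    &= S_{m-1}(x) + \sum_{i=0}^{m-2}\bigl(S_i(x) - a_i\,y_{m-1,j} + a_i\,y_{m-1,j}\bigr) \\
    &= \sum_{i=0}^{m-1} S_i(x) = 0 .
\end{align*}
```

If $x$ is missing from some $X_i$, the corresponding summand is replaced by a value indistinguishable from random, so $t_j = 0$ holds only with negligible probability.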

Malicious Secure MPSI
Theorem 2. Poly-DH MPSI and Cuckoo-DH MPSI are secure against collusion attacks by up to $m - 2$ parties in the malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations.
Proof. We denote the set of corrupted and colluding parties as $C$. We consider two collusion cases, $P_{m-1} \in C$ and $P_{m-1} \notin C$, and perform the simulation experiment between the ideal world and the real world for each.
With $P_{m-1} \in C$ in Poly-DH MPSI, we use a series of hybrid experiments to prove that the real-world protocol execution is indistinguishable from the ideal-world simulation.
$\mathrm{Hybrid}_0$. The experiment is a real protocol execution.
$\mathrm{Hybrid}_1$. In the zero-sharing step, Sim plays the role of the honest parties $P_i \notin C$ to send $k_{ij}$ to $P_j \in C$ and generates $S_i(\cdot)$ with $k_{ij}$ from $P_j \in C$. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_0$. $\mathrm{Hybrid}_1$ is computationally indistinguishable from $\mathrm{Hybrid}_0$ since the zero sharing is unconditionally secure against collusion of up to $m - 2$ parties.
$\mathrm{Hybrid}_2$. In the PSI preparation step, Sim plays the role of the honest party $P_i \notin C$, chooses $a_i \in \mathrm{KA.R}$ to compute $y_i = \mathrm{KA.msg}_e(a_i)$, and sends $y_i$ to the adversary $P_{m-1} \in C$. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_1$. $\mathrm{Hybrid}_2$ is computationally indistinguishable from $\mathrm{Hybrid}_1$ since $y_i$ is uniformly random for $a_i \leftarrow \mathrm{KA.R}$.
$\mathrm{Hybrid}_3$. In the PSI request flow, Sim runs the random oracle $H$, records every query $H(x, k)$ made by $C$, and stores the input-output tuple $(x, k, H(x, k))$ in the list $L_1$. Sim also runs the ideal permutation $\Pi^{-1}$, records $\Pi^{-1}(\mathrm{KA.msg}_e(b))$, and stores $\mathrm{KA.msg}_e(b)$ in the list $L_2$. Then, the adversary $P_{m-1}$ in $C$ returns the polynomial $\mathrm{Pol}_{m-1}$ to Sim. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_2$. $\mathrm{Hybrid}_3$ is computationally indistinguishable from $\mathrm{Hybrid}_2$ since $H$ is a cryptographically secure hash modeled as a random oracle, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations.
$\mathrm{Hybrid}_4$. The experiment is a complete ideal-world experiment, identical to $\mathrm{Hybrid}_3$ except for the PSI response flow. In the PSI response flow, for the honest party $P_i \notin C$, Sim executes the polynomials $\mathrm{Pol}_{P_i \in C}$ as in the protocol. Using the records in $L_1$ and $L_2$, Sim can compute the intersection $S$ of the adversaries' sets. Sim uses $S$ as the input of the parties in $C$ and sends it, together with the sets $X_i$ that Sim simulates for the honest parties $P_i \notin C$, to the MPSI ideal functionality. Sim obtains the intersection $I = \bigcap_{P_i \notin C} X_i \cap S$ from the MPSI ideal functionality and computes for each honest party $P_i$ the values $z_{i,j} = S_i(x_{i,j}) - \mathrm{KA.key}_e(a_i, \Pi(\mathrm{Pol}_{m-1}(H(x_{i,j}))))$, where $x_{i,j} \in X_i$ and $x_{i,j} \notin I$. Thus, Sim can construct the polynomial $\mathrm{Pol}_i$ and send it to $P_{m-1}$. $\mathrm{Hybrid}_4$ is computationally indistinguishable from $\mathrm{Hybrid}_3$ since $H$ is a cryptographically secure hash modeled as a random oracle, and the KA is an Elligator-DHKA.
Therefore, $\mathrm{Hybrid}_4$ is computationally indistinguishable from $\mathrm{Hybrid}_0$; that is, the real-world protocol execution is computationally indistinguishable from the ideal-world simulation. In this way, we prove that Poly-DH MPSI is secure against collusion attacks by up to $m - 2$ parties in the malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations when $P_{m-1} \in C$.
With $P_{m-1} \notin C$ in Poly-DH MPSI, we also use a series of hybrid experiments to prove that the real-world protocol execution is indistinguishable from the ideal-world simulation.
$\mathrm{Hybrid}_{1,0}$. The experiment is a real protocol execution.
$\mathrm{Hybrid}_{1,1}$. In the zero-sharing step, Sim plays the role of the honest party $P_i \notin C$ to send $k_{ij}$ to $P_j \in C$ and generates $S_i(\cdot)$ with $k_{ij}$ from $P_j \in C$. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_{1,0}$. $\mathrm{Hybrid}_{1,1}$ is computationally indistinguishable from $\mathrm{Hybrid}_{1,0}$, since the zero sharing is unconditionally secure against collusion of up to $m - 2$ parties.
$\mathrm{Hybrid}_{1,2}$. In the PSI preparation step, Sim plays the role of the honest party $P_i \notin C$, chooses $a_i \in \mathrm{KA.R}$ to compute $y_i = \mathrm{KA.msg}_e(a_i)$, and records $y_i$ in the role of the honest party $P_{m-1}$. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_{1,1}$. $\mathrm{Hybrid}_{1,2}$ is computationally indistinguishable from $\mathrm{Hybrid}_{1,1}$, since $y_i$ is uniformly random for $a_i \leftarrow \mathrm{KA.R}$.
$\mathrm{Hybrid}_{1,3}$. In the PSI request flow, Sim runs the random oracle $H$, records every query $H(x, k)$ made by $C$, and stores the input-output tuple $(x, k, H(x, k))$ in the list $L_1$. Sim also runs the ideal permutation $\Pi^{-1}$, records $\Pi^{-1}(\mathrm{KA.msg}_e(b))$, and stores $\mathrm{KA.msg}_e(b)$ in the list $L_2$. Then, Sim records the polynomial $\mathrm{Pol}_{m-1}$ in the role of the honest party $P_{m-1}$ and sends it to $P_i \notin C$. In the other parts of the protocol, Sim executes as in $\mathrm{Hybrid}_{1,2}$. $\mathrm{Hybrid}_{1,3}$ is computationally indistinguishable from $\mathrm{Hybrid}_{1,2}$, since $H$ is a cryptographically secure hash modeled as a random oracle, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations.
$\mathrm{Hybrid}_{1,4}$. The experiment is a complete ideal-world experiment, identical to $\mathrm{Hybrid}_{1,3}$ except for the PSI response flow. In the PSI response flow, for the honest party $P_i \notin C$, Sim executes the polynomials $\mathrm{Pol}_{P_i \in C}$ as in the protocol. Using the records in $L_1$ and $L_2$, Sim can compute the intersection $S$ of the adversaries' sets. Sim uses $S$ as the input of the parties in $C$ and sends it, together with the sets $X_i$ that Sim simulates for the honest parties $P_i \notin C$, to the MPSI ideal functionality. Sim obtains the intersection $I = \bigcap_{P_i \notin C} X_i \cap S$ from the MPSI ideal functionality, which is the same as the intersection computed based on the correctness proof in Theorem 1. $\mathrm{Hybrid}_{1,4}$ is computationally indistinguishable from $\mathrm{Hybrid}_{1,3}$, since $H$ is a cryptographically secure hash modeled as a random oracle and the KA is an Elligator-DHKA.
Therefore, $\mathrm{Hybrid}_{1,4}$ is computationally indistinguishable from $\mathrm{Hybrid}_{1,0}$; that is, the real-world protocol execution is computationally indistinguishable from the ideal-world simulation. In this way, we prove that Poly-DH MPSI is secure against collusion attacks by up to $m - 2$ parties in the malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations when $P_{m-1} \notin C$.
Similar to the security proof of Poly-DH MPSI, we can also use a simulation model based on the security analysis of the 3H-GCT [16] to prove that Cuckoo-DH MPSI is secure against collusion attacks by up to $m - 2$ parties in the malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations.

Implementation
In this section, we describe how the various components of the protocol were instantiated. Our entire protocol revolved around a series of 256-bit operations.
Zero Sharing: The key in zero sharing was a uniformly distributed 256-bit random value generated with the rand library of Rust; the PRF was instantiated with the keyed hash mode of BLAKE3.
Key Agreement: For one-round KA, we used Curve25519 [32], based on the Montgomery curve $By^2 = x^3 + Ax^2 + x$, where $B = 1$, $A = 486662$, and $x, y \in \mathbb{F}_{2^{255}-19}$. For the enc(·) and dec(·) functions, we followed the implementation in the work of Bernstein et al. [29].
Random Oracle: We used the hash method of BLAKE3. BLAKE3 is a fast modern hash function and performs better than the SHA-2 algorithm used by Rosulek et al. [10].
Ideal Permutation: The ideal permutation [10] is an invertible permutation function that was simulated using a fixed-key Rijndael-256. AES was derived from Rijndael, but its block size is fixed at 128 bits, while the ideal permutation here requires a 256-bit block size. We followed the definition of Daemen et al. [33] for the implementation.
Polynomial Interpolation: We used the Lagrangian interpolation method over $GF(2^{256})$. Rust currently has no cryptographic library comparable to the C++ NTL and GMP libraries and lacks finite-field interpolation tools; given the time constraints and the engineering effort involved, our interpolation code is not deeply optimized.
Garbled Cuckoo Table: We used the 3H-GCT structure shown in the work of Garimella et al. [16], where three hashes act on each of the three regions. Our implementation referred to the C++ code of the 2H-GCT shown in the work of Pinkas et al. [8] and widened it to three hashes.
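The role of the ideal permutation, a fixed public bijection on 256-bit blocks with an efficiently computable inverse, can be illustrated with a toy construction. The sketch below is not Rijndael-256: it is a four-round Feistel network over two 128-bit halves with keyed BLAKE2b as the round function, chosen only because it is short and provably invertible. The key, round count, and the names `perm` and `perm_inv` are hypothetical.

```python
import hashlib

HALF = 16                       # bytes per half: a 256-bit block in two halves
ROUNDS = 4                      # toy round count (assumption)
FIXED_KEY = b"fixed-demo-key"   # hypothetical fixed public key

def _round(i, half):
    """Round function: keyed BLAKE2b, domain-separated by the round index."""
    return hashlib.blake2b(half, key=FIXED_KEY, salt=bytes([i]),
                           digest_size=HALF).digest()

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def perm(block):
    """Forward direction, playing the role of the permutation."""
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(i, right))
    return left + right

def perm_inv(block):
    """Inverse direction: undo the rounds in reverse order."""
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(i, left)), left
    return left + right

block = bytes(range(32))
assert perm_inv(perm(block)) == block
```

In the protocol, the inverse direction is applied to a key-agreement message before it is placed in the OKVS and the forward direction is applied after decoding, so both directions must be available to every party.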

Experiments and Evaluation
We implemented our protocol in Rust and ran it on a 12th Gen Intel Core i7-12700H with 32 GB RAM and Ubuntu 22.04.We conducted experiments on the two protocols proposed in this article and compared them with the work of Kolesnikov et al. [11].
The MultipartyPSI (https://github.com/osu-crypto/MultipartyPSI(accessed on 6 April 2023)) library is also one of the few open-source libraries for multiparty PSI available at present.It provides PSI constructions based on the Poly, Table, and Bloom filter (BF) types, with the Table type utilizing cuckoo hashing for optimization.Our protocol was constructed based on the Poly and 3H-GCT designs.The Poly and 3H-GCT structures we used were advanced versions compared to the Poly and Table structures used in the work of Kolesnikov et al. [11], designed to be suitable for malicious models.However, we did not utilize the advanced version of the BF, i.e., the GBF, in constructing the PSI protocol due to its higher communication overhead.
We set the parameter of computational security κ = 128 and the parameter of statistical security λ = 40 for conducting experiments under both local area network (LAN) and wide area network (WAN) conditions.
Small sets with LAN setting: We considered PSI for small sets, taking the set size $n$ as $2^4, 2^5, \dots, 2^{10}$ in the experiments. In the LAN setting, the main influencing factor was the computational cost. The work of Kolesnikov et al. [11] was based on the OT protocol, which had a certain startup cost, and we found that our protocol constructed based on DHKA had an advantage when the set size was smaller than $2^7$. Ref. [11] presented a semi-honest protocol, which would take more time once transformed into a malicious protocol; as a reference, the malicious OT-PSI protocol with PaXoS [8] consumed about 1.4 times more time than the semi-honest OT-PSI protocol [3]. Because the Rust ecosystem is currently limited, there is no efficient computational library similar to C++ NTL/GMP; thus, we wrote some code by hand as a temporary replacement, without deep optimization. The PSI protocol of [10] demonstrated that both DHKA-based protocols had advantages when the set was smaller than $2^{10}$, and thus we believe that our protocol has much room for engineering improvement. To explore the impact of the number of parties on the protocol, we ran experiments with 5, 10, and 20 parties. Since the protocol adopted a star topology, the running time should theoretically grow linearly with the number of parties (similar to [11]). The experimental results are shown in Table 1.
Small sets with WAN setting: OT-based PSI protocols perform best without bandwidth limitations, but, as highlighted by Google, their relatively high communication costs have a more substantial impact than computational costs in real-world deployments [34]. Therefore, we limited the bandwidth and conducted simulation experiments at 20 Mbps, 10 Mbps, 5 Mbps, and 1 Mbps. The experimental results showed that our DHKA-based protocol had lower time consumption than the OT-based protocol under bandwidth limitation. Under the settings of 5 Mbps to 20 Mbps, there was an efficiency improvement of about 3× to 7×, and under the worse network conditions of 1 Mbps, there was an efficiency improvement of more than 10×. Our proposed Cuckoo-DH MPSI protocol was able to balance the computation cost and communication cost and had the best performance in bandwidth-constrained situations. The experimental results are shown in Table 2.
The execution time of PSI is influenced by both computational and communication costs. In the LAN experiments, where messages were transmitted locally and communication time could be neglected, the execution time primarily hinged on the computational complexity of the protocol. As seen in Table 1, both the approach in reference [11] and our method based on polynomial interpolation showed a notable increase in running time as the set size grew. This was primarily due to the time-consuming nature of polynomial interpolation, which is particularly evident over the non-prime field $GF(2^{256})$. The polynomial interpolation method used in reference [11] was derived from the C++ NTL library with a time complexity of $O(n \log^2 n)$, while our Rust implementation, currently not fully optimized, had a time complexity of $O(n^2)$. However, this does not imply that protocols based on polynomial interpolation lack value, a point that will be explained in the subsequent WAN experiments. Table 1 demonstrates that in scenarios involving small sets, our DH-based protocol exhibited higher efficiency than the OT-based protocol in reference [11]. This discrepancy was attributed not only to the time-consuming nature of polynomial interpolation but also to the significant time overhead from public key operations, as shown in Table 3.
Table 3. Theoretical computation costs of MPSI protocols. "SH" and "M" refer to semi-honest and malicious protocols, respectively. κ is the computational security parameter. λ is the statistical security parameter. $n_i$ is the set size of party $P_i$. "Fixed-base Mul", "Variable-base Mul", and "Add" in the table represent operations on points of an elliptic curve. "Encode" is an indexing operation (including cuckoo hashing) for reference [11], while for the others it refers to OKVS encoding. When κ = 128, L = 1023.

* "COT (2H-GCT)" is the multiparty extension of the two-party PSI protocol proposed in [8], where "2H-GCT" refers to the two-hashing gabled cuckoo table. The structure of silent OT [23] used by VOLE-PSI [24] is relatively complex. Only when the set is larger than $2^{20}$ can it amortize the cost of computation and perform better, so VOLE-PSI is not compared here.
The DH-based PSI protocol conducted approximately $m \cdot n_{m-1}$ public key operations, whereas the OT-based PSI protocol performed about $3.5(m+1)\kappa$ public key operations. Thus, when the set size satisfied $n_{m-1} < 3.5\kappa\left(1 + \frac{1}{m}\right)$, the DH-based PSI protocol had fewer public key operations. For example, when $m = 2$ and $\kappa = 128$, $n_{m-1} < 672 < 2^{9.4}$; similarly, as $m \to +\infty$ with $\kappa = 128$, $n_{m-1} < 448 < 2^{8.81}$. On the other hand, reference [11] explored the use of Bloom filters and cuckoo hash tables as alternative methods to polynomial interpolation. Bloom filters trade space for time, whereas cuckoo hash tables strike a balance between the two. However, these methods are suitable for semi-honest models and are not effectively applicable to malicious models. In contrast, our protocol introduced the 3H-GCT, an improved version of the cuckoo hash table suitable for malicious models but at the expense of increased time consumption.
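The crossover between the two operation counts can be checked numerically. The short Python sketch below simply evaluates the two approximate formulas quoted above; the function names are illustrative, and the counts are the paper's estimates rather than measured values.

```python
import math

def dh_public_key_ops(m: int, n_leader: int) -> float:
    """Approximate public key operations of the DH-based protocol: m * n_{m-1}."""
    return m * n_leader

def ot_public_key_ops(m: int, kappa: int) -> float:
    """Approximate public key operations of the OT-based protocol: 3.5 (m + 1) kappa."""
    return 3.5 * (m + 1) * kappa

def crossover_set_size(m: int, kappa: int) -> float:
    """Set size below which the DH-based protocol needs fewer public key operations."""
    return 3.5 * kappa * (1 + 1 / m)

two_party_bound = crossover_set_size(2, 128)   # bound for m = 2
limit_bound = 3.5 * 128                        # limit of the bound as m grows
log2_two_party = math.log2(two_party_bound)
log2_limit = math.log2(limit_bound)
```

For any set size below `crossover_set_size(m, kappa)`, `dh_public_key_ops` is smaller than `ot_public_key_ops`, which matches the small-set regime targeted by the experiments.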
Reference [10] employed polynomial interpolation to construct two-party PSI protocols, and experiments validated the significant advantage of this protocol in bandwidth-constrained scenarios. As evident from Table 2, both the protocol based on polynomial interpolation in reference [11] and our own polynomial interpolation-based protocol performed well. The reason behind this was that polynomial interpolation incurs the lowest communication overhead. The DH-based PSI protocol also had lower communication overhead than the OT-based PSI protocol, resulting in strong performance for both of our proposed protocols in Table 2. Since reference [11] open-sourced its code and implemented various protocols based on polynomial interpolation, Bloom filters, cuckoo graphs, and other designs, it provided a valuable comparison for our two protocols based on polynomial interpolation and gabled cuckoo graphs. We conducted theoretical analyses on communication overhead, as depicted in Tables 4 and 5, encompassing several other protocols. For instance, reference [24] introduced the VOLE-PSI protocol based on silent OT, along with the design of OPPRF, enabling the construction of a multiparty PSI protocol based on OPPRF. Although the reference did not elaborate further on multiparty PSI or provide a code implementation for MPSI, we conducted theoretical analyses for its multiparty scenario. Tables 4 and 5 reveal that our protocols theoretically exhibited lower communication overhead. Apart from these protocols amenable to theoretical analysis, we also compared the protocols experimentally, as illustrated in Figure 7.
Table 4. Theoretical communication costs of zero sharing (in bits). "SH" and "M" refer to semi-honest and malicious protocols, respectively. κ is the computational security parameter.

* VOLE-PSI is a two-party PSI protocol based on silent OT proposed in [24], which introduces the specific construction of OPPRF. OPPRF can be used to construct the MPSI protocol, so "VOLE" here refers to the protocol constructed based on [24] and combined with zero-sharing technology. There is a structural difference between the zero sharing mentioned in [28] and the unconditional zero sharing proposed in [11], and although the ideas are similar, the latter form of zero sharing is heavily coupled with the subsequent DH-based MPSI; therefore, the zero sharing of the latter protocol is subsumed into the subsequent MPSI protocol.
Table 5. Theoretical communication costs of MPSI protocols (in bits). "SH" and "M" refer to semi-honest and malicious protocols, respectively. κ is the computational security parameter. λ is the statistical security parameter. The cost of base OTs is independent of input size and equal to 5κ. $n_i$ is the set size of party $P_i$, and $n = \max_{i=0}^{m-2} n_i$. φ is the size of elliptic curve group elements (256 was used here). $\beta_{i,1}$ and $\beta_{i,2}$ are the required bin sizes when mapping $n_i$ elements to $1.2n_i$ and $0.2n_i$ bins using simple hashing, and $\gamma_i = 3.6\beta_{i,1} + 0.4\beta_{i,2}$. When $n_i = 2^{14}$, $\beta_{i,1} = 28$ and $\beta_{i,2} = 63$. When κ = 128, L = 1023.
* "COT (2H-GCT)" is the multiparty extension of the two-party PSI protocol proposed in [8], where "2H-GCT" refers to the two-hashing gabled cuckoo table. "VOLE (Poly)" and "VOLE (2H-GCT)" are the multiparty extensions of the two-party PSI protocol proposed in [24], and "VOLE (3H-GCT)" is a multiparty extension of the combination of [24] and [16].

Discussion
In Section 6, we proved the security of our proposed protocol under a malicious model using unconditional zero sharing, withstanding attacks from m − 2 colluding parties for m parties.To defend against attacks from m − 1 colluding parties, conditional zero sharing is required, as demonstrated in [11].Due to the lower efficiency of conditional zero sharing, we did not provide a detailed explanation of this type of protocol in the paper.However, conditional zero sharing can also be applied within our protocol, and its principles are similar to those outlined in [11].
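For readers unfamiliar with zero sharing, the following Python sketch shows the general idea behind an unconditional zero sharing in the style of [11]: every pair of parties holds a common key, and each party's share for an element is the XOR of keyed hashes under all of its pairwise keys, so the shares of all parties XOR to zero. This is an assumption-laden illustration, not the construction from Section 6; keyed BLAKE2b from the Python standard library stands in for the PRF, and all names are hypothetical.

```python
import hashlib
import secrets

def prf(key: bytes, element: bytes) -> int:
    """Keyed hash used as a PRF stand-in (assumption: keyed BLAKE2b, 256-bit output)."""
    return int.from_bytes(
        hashlib.blake2b(element, key=key, digest_size=32).digest(), "big"
    )

def setup_pairwise_keys(m: int):
    """Give every unordered pair of the m parties one shared random key."""
    keys = [dict() for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            k = secrets.token_bytes(32)
            keys[i][j] = k
            keys[j][i] = k
    return keys

def zero_share(party_keys: dict, element: bytes) -> int:
    """Share of one party for one element: XOR of PRF values under all its keys."""
    share = 0
    for k in party_keys.values():
        share ^= prf(k, element)
    return share

# Each pairwise key contributes to exactly two shares, so the XOR over all
# parties cancels to zero for any element.
keys = setup_pairwise_keys(5)
shares = [zero_share(keys[i], b"example-element") for i in range(5)]
combined = 0
for s in shares:
    combined ^= s
```

Any strict subset of the shares still looks random to parties that do not hold all the relevant keys, which is the property the MPSI protocols build on.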
In Section 7, we manually implemented interpolation with a time complexity of $O(n^2)$ and limited optimization, which was not as good as the work of Moenck et al. [35], whose method had a time complexity of $O(n \log^2 n)$; we hope to improve upon this in the future. If the structure of 2H-GCT is a graph, then the structure of 3H-GCT is a hypergraph. Resolving cycles in a hypergraph is more complicated, and we adopted a polling-like approach for edge peeling, which might be further optimized with deeper knowledge of hypergraphs.
In addition, improving the encoding and decoding efficiency of the OKVS and reducing the space occupied by the coefficient vector after OKVS encoding are both ways to improve the efficiency of the PSI protocol. Our KA was implemented with Curve25519, but KA is not limited to Curve25519. New elliptic-curve constructions, or even new cryptographic structures, may improve the efficiency of the PSI protocol. Accelerating the protocols with hardware such as GPUs and FPGAs, without introducing new cryptographic theory, is also a focus of future research.
Our protocol can be effectively utilized for feature alignment in federated learning.However, our protocol is not limited to this.For example, in [10], a scenario was discussed wherein two parties aim to schedule a meeting that must occur during a time slot when both parties are available.This requires computing the intersection of available time slots without disclosing any additional schedule information beyond the intersection.We extended this scenario to encompass multiple parties agreeing on a meeting, a common situation, and our protocol could fulfill this requirement.Identifying common friends among multiple users and privacy-preserving intelligent recommendations are potential subsequent applications.

Conclusions
This paper introduced two multiparty private set intersection protocols for small sets: Poly-DH MPSI and Cuckoo MPSI.These protocols were constructed using key agreement, zero sharing, and different OKVS structures.In small-set scenarios, both Poly-DH MPSI and Cuckoo MPSI showed higher efficiency than previous approaches, even in LAN settings.Particularly in scenarios with bandwidth constraints, our proposed protocol demonstrated distinct advantages.The protocol based on the gabled cuckoo table incurred lower computational costs but slightly higher communication costs compared to the polynomial-based protocol.

Figure 1. Federated learning and private set intersection.

… using Gaussian elimination and obtain the vector D; pop the elements from the stack S, map them onto the matrix, and modify vector D to satisfy $\langle L(k_i) \| R(k_i), D \rangle = v_i$. If there are three (or two) undefined elements in the corresponding positions in D, fill two (or one) of the positions randomly and adjust the remaining one to satisfy the equation;
Algorithm 2: Decoding of 3H-GCT
Input: set size n, coefficient vector D, key k
Output: value v
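The rule of filling undefined positions and the decoding interface above can be illustrated with a small Python sketch. It assumes that the row $L(k) \| R(k)$ is binary, so the inner product with D reduces to an XOR of the selected entries, that each of the three hashes maps into its own region, and that BLAKE2b from the standard library stands in for BLAKE3. Region sizes and function names are hypothetical, and peeling and Gaussian elimination are omitted.

```python
import hashlib
import secrets

REGION_SIZE = 16   # hypothetical number of slots in each of the three sparse regions
DENSE_SIZE = 8     # hypothetical number of slots addressed by the dense part R(k)
TABLE_SIZE = 3 * REGION_SIZE + DENSE_SIZE

def _hash_int(tag: bytes, key: bytes) -> int:
    """Domain-separated hash of the key (BLAKE2b as a stand-in for BLAKE3)."""
    return int.from_bytes(hashlib.blake2b(tag + key, digest_size=8).digest(), "big")

def positions(key: bytes) -> list:
    """Indices where the binary row L(k)||R(k) equals 1."""
    sparse = [r * REGION_SIZE + _hash_int(bytes([r]), key) % REGION_SIZE
              for r in range(3)]
    dense_bits = _hash_int(b"R", key)
    dense = [3 * REGION_SIZE + j for j in range(DENSE_SIZE) if (dense_bits >> j) & 1]
    return sparse + dense

def decode(D: list, key: bytes) -> int:
    """Decoding: XOR of the entries of D selected by L(k)||R(k)."""
    value = 0
    for p in positions(key):
        value ^= D[p]
    return value

def assign_key(D: list, key: bytes, value: int) -> None:
    """Fill all but one undefined slot randomly, then fix the last one so that
    decode(D, key) == value. Undefined slots are marked with None."""
    pos = positions(key)
    free = [p for p in pos if D[p] is None]
    if not free:
        raise ValueError("no undefined slot left for this key")
    for p in free[:-1]:
        D[p] = secrets.randbits(256)
    target = free[-1]
    acc = value
    for p in pos:
        if p != target:
            acc ^= D[p]
    D[target] = acc

def fill_undefined(D: list) -> None:
    """Replace any slot that is still undefined with a random value."""
    for i, entry in enumerate(D):
        if entry is None:
            D[i] = secrets.randbits(256)

D = [None] * TABLE_SIZE
assign_key(D, b"alice", 0x1234)
fill_undefined(D)
recovered = decode(D, b"alice")
```

In a full encoder, keys would be processed in reverse peeling order so that each key still has at least one undefined slot when it is assigned.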

Figure 7. Running time (ms) of MPSI protocols. Our Poly-MPSI protocol did not use the C++ NTL API, whose interpolation has a time complexity of $O(n \log^2 n)$, but rather an interpolation function written by hand in Rust. Due to the large engineering workload, the time complexity was $O(n^2)$, and the experimental results were not the theoretical optimal values, so they are not compared in the figure.

Table 1. Running time (ms) of MPSI protocols (LAN setting) for m parties on sets with size n. "SH" and "M" refer to semi-honest and malicious protocols, respectively.

Table 2. Running time (ms) of MPSI protocols (WAN setting) for 5 parties on sets with size n. "SH" and "M" refer to semi-honest and malicious protocols, respectively.