Secure Computation Protocol of Text Similarity against Malicious Attacks for Text Classification in Deep-Learning Technology

Liu, Xin; Wang, Ruxue; Luo, Dan; Xu, Gang; Chen, Xiubo; Xiong, Neal; Liu, Xiaomeng

doi:10.3390/electronics12163491


by Xin Liu ¹,², Ruxue Wang ¹, Dan Luo ³,*, Gang Xu ⁴, Xiubo Chen ², Neal Xiong ⁵ and Xiaomeng Liu ¹

¹ School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
² State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
³ Department of Computer, Tianjin Ren’ai College, Tianjin 301636, China
⁴ College of Information, North China University of Technology, Beijing 100144, China
⁵ Department of Computer Science and Mathematics, Sul Ross State University, Alpine, TX 79830, USA
* Author to whom correspondence should be addressed.

Electronics 2023, 12(16), 3491; https://doi.org/10.3390/electronics12163491

Submission received: 18 July 2023 / Revised: 11 August 2023 / Accepted: 15 August 2023 / Published: 17 August 2023


Abstract:

With the development of deep learning, the demand for similarity matching between texts in text classification is becoming increasingly high. How to match texts quickly under the premise of keeping private information secure has become a research hotspot. However, most existing protocols currently have full set limitations, and the applicability of these methods is limited when the data size is large and scattered. Therefore, this paper applies the secure vector calculation method for text similarity matching in the case of data without any complete set constraints, and it designs a secure computation protocol of text similarity (SCTS) based on the semi-honest model. At the same time, elliptic-curve cryptography technology is used to greatly improve the execution efficiency of the protocol. In addition, we also analyzed the possibility of the malicious behavior of participants in the semi-honest-model protocol, and further designed an SCTS protocol suitable for the malicious model using the cut-and-choose and zero-knowledge-proof methods. By proposing a security mechanism, this protocol aims to provide a reliable and secure computing solution that can effectively prevent malicious attacks and interference. Finally, through the analysis of the efficiencies of the existing protocols, the efficiencies of the protocols under the malicious model are further verified, and the practical value for text classification in deep learning is demonstrated.


1. Introduction

In recent years, with the development of artificial intelligence, deep learning has become a hot research topic and is widely used in speech recognition [1], automatic driving [2], natural language processing [3], and other fields. In these applications, information retrieval, semantic understanding, and text classification all require the secure computation protocol of text similarity (SCTS). For example, in text classification, a text can be assigned to a category by comparing its similarity with texts of known categories, an approach often used in tasks such as sentiment analysis, spam filtering, medical classification, and news classification. When the similarity score between the text to be classified and the text of a known category is above a set threshold, the text to be classified is deemed to belong to that category. Commonly used techniques for the text classification task include convolutional neural networks, recurrent neural networks, attention mechanisms, and transformer models. Depending on the specific task and dataset requirements, the right approach can be chosen flexibly and combined with pre-trained models and transfer learning to improve the classification performance. In general, deep learning has achieved good results in text-similarity-matching tasks [4,5,6]. Within natural language processing, vector-based approaches are an important way to solve the problem of text similarity [7,8,9,10].

However, in the era of big data, text information is highly susceptible to leakage, and the SCTS faces great challenges. The traditional method of calculating text similarity is to calculate the cosine similarity between two text vectors, but it is time-consuming, is mostly suitable for short texts, and easily leaks information. This paper departs from the traditional method by increasing the length of the matching string and placing no full set restriction on the range of the string, which greatly improves applicability and security. Therefore, the SCTS is necessary, and secure multi-party computation (MPC) is a technology that can achieve the secure computation of text similarity.

MPC was first introduced by Professor Yao [11], and Goldreich [12] and Cramer et al. [13] further studied MPC algorithms, including secure data mining [14], confidential computing sets and geometric problems [15], and secure vector operations [16]. These studies provide conditions for the development of MPC on the basis of protecting the privacy of their respective information.

SCTS can be abstracted as the secure computation of vectors. For example, the text data are represented as the vector $x = (x_{1}, \dots, x_{n})$, and the user’s text data are represented as the vector $y = (y_{1}, \dots, y_{n})$. By setting a threshold ($t$), it is only necessary to judge whether the number of identical components of the two vectors reaches the threshold ($t$). If the set threshold ($t$) is reached, then the texts match; otherwise, they do not match. By setting the threshold ($t$), the user can learn the degree of similarity matching between the two texts.
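The matching rule above can be sketched in plaintext, with no privacy protection at all, to make the target function concrete. The function names `count_equal` and `matches` are illustrative and do not come from the paper.

```python
def count_equal(x, y):
    """Number of positions i with x[i] == y[i] (the quantity compared with t)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length n")
    return sum(1 for xi, yi in zip(x, y) if xi == yi)


def matches(x, y, t):
    """Return 1 if at least t components coincide, otherwise 0."""
    return 1 if count_equal(x, y) >= t else 0


print(matches((2, 6, 8, 9), (2, 6, 7, 8), t=2))  # 1: two components coincide
print(matches((2, 6, 8, 9), (2, 6, 7, 8), t=3))  # 0: only two coincide
```

A secure protocol has to produce the same output bit while neither party sees the other party's vector.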

There are few existing SCTS protocols, and those that exist are implemented only in the semi-honest model. String matching can be seen as a special case of SCTS. Reference [17] used GM encryption to compute the similarity of two strings, but the algorithm has a long running time and is inefficient. Reference [18] designed a string-matching protocol similar to the application scenario of this paper, which implements exact matching and can accurately classify text, but it is less secure and has a full set restriction. Reference [19] converted the string-matching problem into a set membership determination problem, and the string-matching protocol used in the confidential matching process is the BMH algorithm, which can greatly reduce the computational complexity, but it is not secure enough in the execution process, and it cannot resist the attacks of malicious adversaries. Reference [20] solved the problem of approximate string matching, and Reference [21] proposed a wildcard-pattern-matching protocol with a query function. However, the number of characters needs to be controlled within a certain range, which is a significant limitation and is prone to information leakage. Reference [22] proposed the problem of vector secrecy computation, which can achieve the text similarity computation of two vectors by setting the corresponding thresholds, but the protocol is based on the semi-honest model and is less efficient and less secure.

To address the above issues, this article designs an SCTS protocol for malicious adversaries, which has important guiding significance for further promoting the research and application of the SCTS. The main contributions are as follows:

(1): The text is converted into a vector by an encoding method, and the number of equal elements in the secure vector computation is compared with the threshold value to judge the similarity of the two texts;
(2): The SCTS protocol under the semi-honest model is designed using elliptic-curve cryptography without set range limits. Compared with other encryption algorithms, elliptic-curve cryptography has obvious advantages, such as high security, a small key size, fast encryption and decryption, low storage requirements, and high adaptability;
(3): For the possible malicious behaviors of malicious adversaries in the SCTS semi-honest protocol, based on the cut-and-choose method and zero-knowledge proof, an SCTS protocol under the malicious model is designed. Whether the protocol is secure is verified via the real/ideal-model paradigm, and the efficiency of the protocol is analyzed.

The paper is structured as follows: Section 2 introduces the encoding method and the basic theorem, and then Section 3 describes the solution to the problem and designs the SCTS protocol under the semi-honest model. Section 4 then improves on Section 3 by designing the SCTS protocol under the malicious model using cryptographic tools. Section 5 provides a comparative analysis of the performances of existing protocols. Finally, Section 6 summarizes the work accomplished in this paper and shows future research directions.

2. Related Work

2.1. Elliptic-Curve Cryptography

Elliptic-curve cryptography (ECC) [23] is an asymmetric public-key cryptosystem based on a finite field. Taking the elliptic curve y² = x³ − x as an example, a straight line through points P and Q intersects the curve at a third point R′, and the vertical line through R′ (perpendicular to the X axis) intersects the curve at point R; then, P + Q = R. R′ is the additive inverse of R, and R′ and R are symmetric about the X axis. Please refer to Figure 1 for details.

When P = Q, the tangent line to the curve at point P intersects the curve at point R′, and the vertical line through R′ gives R as before; then, P + P = R, which is recorded as 2P = R. Adding P to itself k times is recorded as k·P; for example, P + P + P + P = P + 3P = 4P. The security of ECC rests on the discrete logarithm problem on the elliptic curve: when K = k·P, it is easy to obtain K when k and P are known, but it is computationally infeasible to obtain k when K and P are known.

The ECC methodology is as follows:

First, select an elliptic curve (E_p(a,b)), select the base point (G) and the private key (k), and then calculate kG = K to obtain the public key (K).

(1): Encryption: Encode a plaintext (m) to a point (M) on the elliptic curve (E_p(a,b)), select a random positive integer (r), and calculate C₁ = M + rK, C₂ = rG;
(2): Decryption: Recover M through the formula C₁ − kC₂ = M + rK − k(rG) = M. To obtain the plaintext information (m), the point M on the elliptic curve must then be decoded;
(3): Addition homomorphism: the following properties exist: E(M₁) + E(M₂) = E(M₁ + M₂).
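The three operations above can be sketched on a toy curve. Everything concrete here is an assumption made for illustration: the curve y² = x³ + 2x + 2 over F₁₇ with base point G = (5, 1) of order 19 is a small textbook curve, not the one used by the authors, and the encoding m ↦ m·G is one common choice that makes the additive homomorphism hold; the paper does not specify its Encode function. The last function implements the flip operation T(C) = E(1) + (−C) from Theorem 1.

```python
import secrets

# Toy parameters (assumption, not from the paper): y^2 = x^3 + 2x + 2 over F_17,
# base point G = (5, 1) of prime order 19. Far too small for real use.
P_MOD, A = 17, 2
G, ORDER = (5, 1), 19


def add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None                                  # p + (-p) = infinity
    if p == q:                                       # tangent rule (doubling)
        s = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:                                            # chord rule
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)


def neg(p):
    """Additive inverse: reflect the point about the X axis."""
    return None if p is None else (p[0], -p[1] % P_MOD)


def mul(k, p):
    """Scalar multiplication k*p by double-and-add."""
    k %= ORDER
    result = None
    while k:
        if k & 1:
            result = add(result, p)
        p = add(p, p)
        k >>= 1
    return result


def keygen():
    k = secrets.randbelow(ORDER - 1) + 1             # private key in [1, ORDER-1]
    return k, mul(k, G)                              # public key K = kG


def encrypt(m, K):
    r = secrets.randbelow(ORDER - 1) + 1
    M = mul(m, G)                                    # assumed encoding m -> m*G
    return (add(M, mul(r, K)), mul(r, G))            # (C1, C2) = (M + rK, rG)


def decrypt(c, k):
    M = add(c[0], neg(mul(k, c[1])))                 # C1 - k*C2 = M
    return next(m for m in range(ORDER) if mul(m, G) == M)  # brute-force decode


def c_add(c, d):
    """Homomorphic addition: E(m1) + E(m2) = E(m1 + m2)."""
    return (add(c[0], d[0]), add(c[1], d[1]))


def flip(c, K):
    """Flip operation T(C) = E(1) + (-C), turning E(h) into E(1 - h)."""
    return c_add(encrypt(1, K), (neg(c[0]), neg(c[1])))


k, K = keygen()
print(decrypt(c_add(encrypt(3, K), encrypt(4, K)), k))  # 7
print(decrypt(flip(encrypt(0, K), K), k))               # 1
```

With this encoding the plaintext space is the integers modulo the group order (19 here), and decoding needs a small discrete-logarithm search, so the approach only suits small plaintext values.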

Theorem 1.

Suppose C = E(h) is the ciphertext of 0 or 1 encrypted by the elliptic curve, and −C is its additive inverse. According to the addition homomorphism of ECC, T(C) = E(1) + (−C) = E(1 − h). If C is the ciphertext of 0, then T(C) is the ciphertext of 1, and if C is the ciphertext of 1, then T(C) is the ciphertext of 0; that is, ciphertext T(C) flips the plaintext 1 or 0 corresponding to C; thus, T(C) = E(1) + (−C) is called a flip operation.

Theorem 2.

For any integers a and b, the following conclusions hold:

(1): a > b if and only if 2a > 2b +1;
(2): a ≥ b if and only if 2a + 1 > 2b.

Theorem 3.

In the ECC encryption scheme, it is assumed that a,b take values in the plaintext space Z_N. If C = E(a) + E(−b) = E(a – b mod N) and w = D(C), the following conclusions can be reached:

(1): w = 0 if and only if a = b;
(2): If 0 ≤ a,b < N/2, then 0 < w < N/2 if and only if a > b, and N/2 < w < N if and only if a < b.

Note: The proofs of Theorem 1 and Theorem 2 can be found in Reference [22].
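Theorems 2 and 3 are elementary enough to check exhaustively on small values. The sketch below does this for an illustrative odd modulus N = 101, which is an assumption chosen only to keep the loops short; it is a sanity check, not a proof.

```python
N = 101  # hypothetical small odd modulus standing in for the plaintext space Z_N


def check_theorem_2(limit=30):
    """a > b  <=>  2a > 2b + 1, and a >= b  <=>  2a + 1 > 2b, for small integers."""
    for a in range(-limit, limit):
        for b in range(-limit, limit):
            assert (a > b) == (2 * a > 2 * b + 1)
            assert (a >= b) == (2 * a + 1 > 2 * b)
    return True


def check_theorem_3():
    """For 0 <= a, b < N/2, the position of w = (a - b) mod N reveals the order."""
    half = N / 2
    for a in range((N + 1) // 2):          # all integers with 0 <= a < N/2
        for b in range((N + 1) // 2):
            w = (a - b) % N
            assert (w == 0) == (a == b)
            assert (0 < w < half) == (a > b)
            assert (half < w < N) == (a < b)
    return True


print(check_theorem_2(), check_theorem_3())  # True True
```

The doubling in Theorem 2 is what lets a strict comparison between odd and even values stand in for both ">" and "≥" on the original integers.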

2.2. Coding Method

In this paper, the ASCII code is used to map each character one-to-one to a three-digit decimal number (the ASCII table can be consulted for the specific values), so that text characters are encoded into vectors. For example, for the text string ‘Love’, the ASCII code of the character ‘L’ is 076, that of ‘o’ is 111, that of ‘v’ is 118, and that of ‘e’ is 101. Therefore, the number string corresponding to ‘Love’ is 076111118101, and the vector is (076,111,118,101). The number string corresponding to the text string ‘Like’ is 076105107101, which is expressed as the vector (076,105,107,101).
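A minimal sketch of this encoding is shown below. The function name `encode_text` is illustrative, and rejecting characters outside the 7-bit range is an assumption made to keep every code within three decimal digits of the table described above.

```python
def encode_text(text):
    """Map each character to its three-digit decimal code.

    Returns the concatenated digit string and the vector of integer codes.
    """
    codes = [ord(ch) for ch in text]
    if any(code > 127 for code in codes):
        raise ValueError("only 7-bit characters are supported in this sketch")
    digit_string = "".join(f"{code:03d}" for code in codes)
    return digit_string, tuple(codes)


print(encode_text("Love"))  # ('076111118101', (76, 111, 118, 101))
print(encode_text("Like"))  # ('076105107101', (76, 105, 107, 101))
```

The two example vectors agree in their first and last components, so they have two equal elements out of four.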

2.3. Cut-and-Choose Method

The cut-and-choose method [24] plays an essential role in resisting the attacks of malicious adversaries. One party sends a large amount of data, and the other party arbitrarily selects a part of the data and requires the sender to open it for verification. After the verification is passed, data are selected from the remainder for the calculation. A malicious participant can succeed only if the checked data pass the verification and the other party then happens to pick the incorrect data for the calculation; otherwise, the incorrect data are found in the verification phase. Thus, the cut-and-choose method can effectively resist the attacks of a malicious adversary.

Input:

(1): Participant A inputs $l$ vectors $\vec{x_{i}}$ ($i = 1, \dots, l$), where $\vec{x_{i}} = \langle (x_{0}^{i, 1}, x_{1}^{i, 1}), (x_{0}^{i, 2}, x_{1}^{i, 2}), \dots, (x_{0}^{i, s}, x_{1}^{i, s}) \rangle$. $s$ is used as the input to check whether the $X_{1}, \dots, X_{s}$ values are in $\{0, 1\}^{n}$;
(2): Participant B inputs $σ_{1}, \dots, σ_{l} \in \{0, 1\}$ and a set of parameters ($ζ \subseteq [s]$).

Output: The receiving party will obtain the following information:

(1): The receiving party ($R$) receives the $j$-th pair in the vector $\vec{x_{i}}$, namely, $(x_{0}^{i, j}, x_{1}^{i, j})$;
(2): The receiving party ($R$) receives the values selected by $σ_{i}$ (i.e., $\langle x_{σ_{i}}^{i, 1}, x_{σ_{i}}^{i, 2}, \dots, x_{σ_{i}}^{i, s} \rangle$). Here, $i = 1, \dots, l$, $j \in ζ$, $k \notin ζ$, and the $X_{k}$ are output by $R$.
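To give a feel for why opening a random subset deters cheating, the following sketch computes the chance that a cheating sender goes unnoticed in a simplified setting. The model is an assumption and is not taken from the paper: the checker opens `opened` of `total` items uniformly at random, and the sender escapes only if none of its `bad` malformed items is opened.

```python
from math import comb


def escape_probability(total, opened, bad):
    """Probability that none of `bad` malformed items is among the `opened`
    items, when the opened subset is chosen uniformly at random."""
    if not 0 <= opened <= total or not 0 <= bad <= total:
        raise ValueError("need 0 <= opened <= total and 0 <= bad <= total")
    if bad > total - opened:
        return 0.0                      # too many bad items to hide them all
    return comb(total - bad, opened) / comb(total, opened)


print(escape_probability(10, 5, 1))   # 0.5
print(escape_probability(40, 20, 5))  # about 0.024
```

In this simplified model, each extra malformed item sharply lowers the chance of passing the check, which is the intuition behind the method.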

2.4. Security of Malicious Model

The protocol under the malicious model has the highest security. Reference [24] specifically describes the security proofs of protocols under the malicious model.

Ideal protocol: Assume that data $x$ and $y$ are owned by Alice and Bob, respectively. They can compute the function $f (x, y) = (f_{1} (x, y), f_{2} (x, y))$ through a trusted third party (TTP) without disclosing their own data. In the end, Alice can only obtain the result $f_{1} (x, y)$ and Bob can only obtain the result $f_{2} (x, y)$, as follows:

(1): Alice and Bob, respectively, send input data $x$ and $y$ to the TTP. If the participants are honest, then they will send authentic data ( $x$ or $y$ ) to the TTP; if the participants are malicious, then the participants might choose to terminate the protocol or input false data ( $x^{'}$ or $y^{'}$ );
(2): The result of the calculation is sent to Alice by the TTP. After the TTP obtains the input pair $(x, y)$ , it will independently calculate function $f (x, y)$ and send Alice function $f_{1} (x, y)$ ; otherwise, it will send Alice a special symbol ( $⊥$ );
(3): The result of the calculation is sent to Bob by the TTP. If Alice is a malicious participant, then, after she obtains her own result, she has two possible responses: The first is to stop cooperating with the TTP, at which point the TTP sends $⊥$ to Bob. The second is to let the TTP continue, in which case the TTP sends $f_{2} (x, y)$ to Bob.

When implementing the ideal protocol, the TTP and participants will not disclose any information except their own output information; thus, the protocol with the highest security is the one under the ideal model.
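The message flow of the ideal protocol can be sketched as a single function played by the TTP. This is only an illustration of the three steps above: the names `ideal_execution`, `alice_sends_input`, and `alice_lets_ttp_continue` are hypothetical, `None` stands in for the symbol ⊥, and only Alice's two possible deviations are modeled.

```python
BOTTOM = None  # stands for the special symbol ⊥


def ideal_execution(x, y, f1, f2, alice_sends_input=True,
                    alice_lets_ttp_continue=True):
    """Return (Alice's output, Bob's output) as delivered by the TTP."""
    if not alice_sends_input:
        return BOTTOM, BOTTOM            # protocol terminated before computing
    out_alice = f1(x, y)                 # the TTP answers Alice first
    if not alice_lets_ttp_continue:
        return out_alice, BOTTOM         # Alice aborts, so Bob receives ⊥
    return out_alice, f2(x, y)


def threshold_match(x, y, t=2):
    """Example functionality: 1 if at least t components coincide."""
    return 1 if sum(a == b for a, b in zip(x, y)) >= t else 0


print(ideal_execution((2, 6, 8, 9), (2, 6, 7, 8), threshold_match, threshold_match))
print(ideal_execution((2, 6, 8, 9), (2, 6, 7, 8), threshold_match, threshold_match,
                      alice_lets_ttp_continue=False))
```

The second call shows the one unfairness the ideal model tolerates: Alice learns her output while Bob receives ⊥.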

Under the ideal model, the process by which the participants jointly calculate $F (x, y)$ with auxiliary input information $z$ and strategy $\bar{B}$ is denoted by $\mathrm{IDEAL}_{F, \bar{B} (z)} (x, y)$. It is defined as follows: the adversary uniformly selects a random number ($r$), and $\mathrm{IDEAL}_{F, \bar{B} (z)} (x, y) = γ (x, y, z, r)$. The details are as follows:

(1): When Alice is honest, there are $γ (x, y, z, r) = (f_{1} (x, y^{'}), B_{2} (y, z, r, f_{2} (x, y^{'})))$ and $y^{'} = B_{2} (y, z, r)$ ;
(2): When Bob is honest, there is the following:

$γ (x, y, z, r) = \begin{cases} (B_{1} (x, z, r, f_{1} (x^{'}, y), ⊥), ⊥), & B_{1} (x, z, r, f_{1} (x^{'}, y)) = ⊥ \\ (B_{1} (x, z, r, f_{1} (x^{'}, y)), f_{2} (x^{'}, y)), & \text{otherwise} \end{cases}$ (1)

In both of these cases, $x^{'} = B_{1} (x, z, r)$.

Definition 1.

Security of the protocol under the malicious model.

If, for every admissible strategy pair $\bar{A} = (A_{1}, A_{2})$ in the actual protocol, there exists a strategy pair $\bar{B} = (B_{1}, B_{2})$ in the ideal model such that $\{\mathrm{IDEAL}_{F, \bar{B} (z)} (x, y)\}_{x, y, z} \overset{c}{\equiv} \{\mathrm{REAL}_{Π, \bar{A} (z)} (x, y)\}_{x, y, z}$, then $Π$ securely computes the function ($F$) (where $x, y, z \in \{0, 1\}^{*}$ satisfy $|x| = |y|$ and $|z| = \mathrm{poly} (|x|)$).

Note 1: To design an MPC protocol under the malicious model, it must be ensured that at least one of the participants is honest; otherwise, the MPC protocol will not be implemented (which cannot be avoided under the ideal model).

3. Secure Computation Protocol of Text Similarity under the Semi-Honest Model

3.1. Problem Description

Alice encodes her private text as the vector $x = (x_{1}, \dots, x_{n})$. Bob encodes his private text as $y = (y_{1}, \dots, y_{n})$ (see Section 2.2), and both parties agree on a threshold ($t$). Both parties can securely output $L_{t} (x, y)$ without disclosing any information. If $L_{t} (x, y) = 1$, then the number of positions with $x_{i} = y_{i} (1 \leq i \leq n)$ in vector $x$ and vector $y$ is $l \geq t$, indicating that the similarity between the two texts is at least $t / n$. Otherwise, the output is $L_{t} (x, y) = 0$ (where the number of equal elements is recorded as $l = L (x, y)$).

3.2. Solutions

(1): Alice calculates the vectors $u_{i} = 2 x_{i}$ and $u_{i}^{'} = 2 x_{i} + 1$. Bob calculates the vectors $v_{i} = 2 y_{i} + 1$ and $v_{i}^{'} = 2 y_{i}$. Both parties jointly calculate the dominance degree ($h$) of $u = (u_{1}, \dots, u_{n})$ with respect to $v = (v_{1}, \dots, v_{n})$ and the dominance degree ($h^{'}$) of $v^{'} = (v_{1}^{'}, \dots, v_{n}^{'})$ with respect to $u^{'} = (u_{1}^{'}, \dots, u_{n}^{'})$ (see Note 2 below for the definition of the dominance degree). According to Theorem 2, $h$ counts the positions with $x_{i} > y_{i}$ and $h^{'}$ counts the positions with $y_{i} > x_{i}$, so the number of positions with $x_{i} = y_{i} (1 \leq i \leq n)$ in vectors $x$ and $y$ is $l = L (x, y) = n - h - h^{'}$;
(2): The random vectors $r = (r_{1}, \dots, r_{n})$ and $r^{'} = (r_{1}^{'}, \dots, r_{n}^{'})$ are selected, and the signs of $t_{i} = r_{i} (u_{i} - v_{i})$ and $t_{i}^{'} = r_{i}^{'} (v_{i}^{'} - u_{i}^{'})$ are determined for each $i \in [1, n]$;
(3): The dominance degree ($h$) of $u$ with respect to $v$ is the number of indices for which $t_{i}$ and $r_{i}$ have the same sign. The dominance degree ($h^{'}$) of $v^{'}$ with respect to $u^{'}$ is the number of indices for which $t_{i}^{'}$ and $r_{i}^{'}$ have the same sign (the ciphertexts of $h$ and $h^{'}$ are obtained). Finally, $l$ and $t$ are compared through the ciphertext of $l = n - h - h^{'}$.

Note 2: Given two n-dimensional vectors $u = (u_{1}, \dots, u_{n})$ and $v = (v_{1}, \dots, v_{n})$, the number of indices with $u_{i} > v_{i} (1 \leq i \leq n)$ is called the vector dominance degree of $u$ with respect to $v$.
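The counting idea in the solution steps can be checked in plaintext before any encryption is added. In the sketch below the helper names are illustrative, and the random masks are passed in explicitly as an assumption so that the example is reproducible; the sketch shows that a mask hides the magnitude of a difference but still lets the sign relation recover the dominance degree.

```python
def dominance(u, v):
    """Dominance degree of u with respect to v: number of i with u[i] > v[i]."""
    return sum(1 for ui, vi in zip(u, v) if ui > vi)


def masked_dominance(u, v, r):
    """Same count, recovered only from the signs of r[i] * (u[i] - v[i]) and r[i]."""
    same_sign = lambda a, b: (a > 0) == (b > 0)
    return sum(1 for ui, vi, ri in zip(u, v, r) if same_sign(ri * (ui - vi), ri))


def equal_count(x, y):
    """l = n - h - h', using the doubled vectors from step (1)."""
    u, u_prime = [2 * xi for xi in x], [2 * xi + 1 for xi in x]
    v, v_prime = [2 * yi + 1 for yi in y], [2 * yi for yi in y]
    h = dominance(u, v)                    # positions with x_i > y_i
    h_prime = dominance(v_prime, u_prime)  # positions with y_i > x_i
    return len(x) - h - h_prime


x, y = (2, 6, 8, 9), (2, 6, 7, 8)
print(equal_count(x, y))                                    # 2
print(masked_dominance([4, 12, 16, 18], [5, 13, 15, 17], [1, 2, -3, -1]))  # 2
```

Because each u_i is even and each v_i is odd, the differences are never zero, so every masked product has a well-defined sign.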

The specific Algorithm 1 is as follows:

Algorithm 1: Computing the text similarity under the semi-honest model.

Input: $x = (x_{1}, \dots, x_{n})$: Alice’s input; $y = (y_{1}, \dots, y_{n})$: Bob’s input; $t$: agreed threshold between both parties; $G$: the base point of the elliptic curve ($E_{p}$); $pk = K$: the public key; $sk = k$: Alice’s private key; $E$: encrypt; $D$: decrypt; Encode: encode points onto the elliptic curve ($E_{p}$); $u_{i} = 2 x_{i}, - u_{i}^{'} = - (2 x_{i} + 1)$: Alice’s calculation; $- v_{i} = - (2 y_{i} + 1), v_{i}^{'} = 2 y_{i}$: Bob’s calculation;

1: $\mathrm{Encode} (u_{1}, \dots, u_{n}) = (M_{1}, \dots, M_{n})$, $\mathrm{Encode} (- u_{1}^{'}, \dots, - u_{n}^{'}) = (M_{1}^{'}, \dots, M_{n}^{'})$;
2: $\mathrm{Encode} (- v_{1}, \dots, - v_{n}) = (P_{1}, \dots, P_{n})$, $\mathrm{Encode} (v_{1}^{'}, \dots, v_{n}^{'}) = (P_{1}^{'}, \dots, P_{n}^{'})$;
3: Select $a_{i}$, $a_{i}^{'}$, $b_{i}$, $b_{i}^{'}$;
4: $E_{pk} (u_{i}) = (M_{i} + a_{i} K, a_{i} G)$, $E_{pk} (- u_{i}^{'}) = (M_{i}^{'} + a_{i}^{'} K, a_{i}^{'} G)$;
5: $E_{pk} (- v_{i}) = (P_{i} + b_{i} K, b_{i} G)$, $E_{pk} (v_{i}^{'}) = (P_{i}^{'} + b_{i}^{'} K, b_{i}^{'} G)$;
6: Select random vectors $r = (r_{1}, \dots, r_{n})$, $r^{'} = (r_{1}^{'}, \dots, r_{n}^{'})$;
7: Compute $w_{i} = E_{pk} [r_{i} (u_{i} - v_{i})] = E_{pk} (r_{i} u_{i}) + E_{pk} (- r_{i} v_{i})$ and $w_{i}^{'} = E_{pk} [r_{i}^{'} (v_{i}^{'} - u_{i}^{'})] = E_{pk} (r_{i}^{'} v_{i}^{'}) + E_{pk} (- r_{i}^{'} u_{i}^{'})$;
8: $D_{sk} (w_{i}) = r_{i} (u_{i} - v_{i}) = d_{i}$, $D_{sk} (w_{i}^{'}) = r_{i}^{'} (v_{i}^{'} - u_{i}^{'}) = d_{i}^{'}$;
9: If $d_{i} < N / 2$, then set $h_{i} = 1$; otherwise, $h_{i} = 0$; if $d_{i}^{'} < N / 2$, then set $h_{i}^{'} = 1$; otherwise, $h_{i}^{'} = 0$;
10: $\mathrm{Encode} (h_{1}, \dots, h_{n}) = (N_{1}, \dots, N_{n})$, $\mathrm{Encode} (h_{1}^{'}, \dots, h_{n}^{'}) = (N_{1}^{'}, \dots, N_{n}^{'})$;
11: Select $b_{1 i}$, $b_{1 i}^{'}$;
12: $E_{pk} (h_{i}) = (N_{i} + b_{1 i} K, b_{1 i} G)$, $E_{pk} (h_{i}^{'}) = (N_{i}^{'} + b_{1 i}^{'} K, b_{1 i}^{'} G)$;
13: For each $i \in [1, n]$, when $r_{i} > 0$, let $H_{i} = E (h_{i})$; when $r_{i} < 0$, calculate $H_{i} = T [E (h_{i})]$;
14: When $r_{i}^{'} > 0$, let $H_{i}^{'} = E (h_{i}^{'})$; when $r_{i}^{'} < 0$, calculate $H_{i}^{'} = T [E (h_{i}^{'})]$;
15: Compute $\hat{H} = \sum_{i \in [1, n]} (H_{i} + H_{i}^{'}) = E (e) = E (n - l)$;
16: Select $r^{*}$;
17: Compute $Z = E_{pk} (r^{*} \cdot 1) + E_{pk} (r^{*} \cdot 2 n) - E_{pk} (r^{*} \cdot 2 e) + E_{pk} (- 2 t r^{*}) = E [r^{*} (2 l + 1 - 2 t)]$;
18: $D_{sk} (Z) = z^{*}$;
19: If $z^{*} < N / 2$, then $L_{t} (x, y) = 1$; otherwise, $L_{t} (x, y) = 0$.

Output: $L_{t} (x, y)$.

The specific Protocol 1 is as follows:

Protocol 1: The SCTS protocol under the semi-honest model.

Input: Alice’s text vector ($x = (x_{1}, \dots, x_{n})$), Bob’s text vector ($y = (y_{1}, \dots, y_{n})$), and threshold ($t$).
Output: The size relationship ($L_{t} (x, y)$) between the number of equal elements ($l$) of the two vectors ($x$ and $y$) and the threshold ($t$).

Preparation: For each $i \in [1, n]$, Alice calculates $u_{i} = 2 x_{i}, - u_{i}^{'} = - (2 x_{i} + 1)$, and Bob calculates $- v_{i} = - (2 y_{i} + 1), v_{i}^{'} = 2 y_{i}$ (with $u_{i}, v_{i} < N / 2$). Alice selects an elliptic curve ($E_{p} (a, b)$), selects the base point ($G$) and the private key ($k$), and then calculates $kG = K$ to obtain the public key ($K$). Alice sends $E_{p} (a, b)$, the public key ($K$), and $G$ to Bob.
Protocol Start:

(1) Alice encodes the plaintext sets $u = (u_{1}, \dots, u_{n})$ and $- u^{'} = (- u_{1}^{'}, \dots, - u_{n}^{'})$ to points $M_{i}$ and $M_{i}^{'} (1 \leq i \leq n)$ on the elliptic curve ($E_{p}$) one by one, selects n random numbers ($a_{i}$ and $a_{i}^{'}$), and encrypts each element ($M_{i}$ and $M_{i}^{'} (1 \leq i \leq n)$) with the public key ($K$). That is, the ciphertexts $E (M_{i}) = (C_{1 i}, C_{2 i})$ and $E (M_{i}^{'}) = (C_{1 i}^{'}, C_{2 i}^{'})$ are calculated, where $C_{1 i} = M_{i} + a_{i} K, C_{2 i} = a_{i} G$ and $C_{1 i}^{'} = M_{i}^{'} + a_{i}^{'} K, C_{2 i}^{'} = a_{i}^{'} G$. Then, Alice obtains the sets $E (u) = (E (M_{1}), E (M_{2}), \dots, E (M_{n}))$ and $E (- u^{'}) = (E (M_{1}^{'}), E (M_{2}^{'}), \dots, E (M_{n}^{'}))$ and sends $E (u)$ and $E (- u^{'})$ to Bob;

(2) After Bob receives $E (u)$ and $E (- u^{'})$:

(a): Bob encodes the plaintext sets $- v = (- v_{1}, \dots, - v_{n})$ and $v^{'} = (v_{1}^{'}, \dots, v_{n}^{'})$ to points $P_{i}$ and $P_{i}^{'} (1 \leq i \leq n)$ on the elliptic curve ($E_{p}$) one by one, selects n random numbers ($b_{i}$ and $b_{i}^{'} (1 \leq i \leq n)$), and encrypts each element ($P_{i}$ and $P_{i}^{'} (1 \leq i \leq n)$) with the public key ($K$). That is, the ciphertexts $E (P_{i}) = (I_{1 i}, I_{2 i})$ and $E (P_{i}^{'}) = (I_{1 i}^{'}, I_{2 i}^{'})$ are calculated, where $I_{1 i} = P_{i} + b_{i} K, I_{2 i} = b_{i} G$ and $I_{1 i}^{'} = P_{i}^{'} + b_{i}^{'} K, I_{2 i}^{'} = b_{i}^{'} G$. Then, Bob obtains the sets $E (- v) = (E (P_{1}), E (P_{2}), \dots, E (P_{n}))$ and $E (v^{'}) = (E (P_{1}^{'}), E (P_{2}^{'}), \dots, E (P_{n}^{'}))$. At the same time, Bob selects the random vectors $r = (r_{1}, \dots, r_{n})$ and $r^{'} = (r_{1}^{'}, \dots, r_{n}^{'})$, where $0 < |r_{i}|, |r_{i}^{'}| < \sqrt{N / 2}$ and $i \in [1, n]$;
(b): For each $i \in [1, n]$, Bob calculates $w_{i}$ and $w_{i}^{'}$ as follows: with $w_{i 1} = E (u_{i})$ and $w_{i 2} = E (- v_{i})$, $w_{i} = E [r_{i} (u_{i} - v_{i})] = E (r_{i} u_{i}) + E (- r_{i} v_{i})$; that is, $w_{i} = \underbrace{(w_{i 1} + \dots + w_{i 1})}_{r_{i}} + \underbrace{(w_{i 2} + \dots + w_{i 2})}_{r_{i}}$. At the same time, with $w_{i 1}^{'} = E (- u_{i}^{'})$ and $w_{i 2}^{'} = E (v_{i}^{'})$, Bob calculates $w_{i}^{'} = E [r_{i}^{'} (v_{i}^{'} - u_{i}^{'})] = E (r_{i}^{'} v_{i}^{'}) + E (- r_{i}^{'} u_{i}^{'})$; that is, $w_{i}^{'} = \underbrace{(w_{i 1}^{'} + \dots + w_{i 1}^{'})}_{r_{i}^{'}} + \underbrace{(w_{i 2}^{'} + \dots + w_{i 2}^{'})}_{r_{i}^{'}}$. Then, Bob sends the ciphertexts $w_{i}$ and $w_{i}^{'}$ to Alice;

(3) For each $i \in [1, n]$:

(a): Alice decrypts $w_{i}$ and $w_{i}^{'}$ and decodes the x-coordinates of the resulting points to obtain $d_{i}$ and $d_{i}^{'}$. If $d_{i} < N / 2$, then $h_{i} = 1$ is set; otherwise, $h_{i} = 0$. Similarly, if $d_{i}^{'} < N / 2$, then $h_{i}^{'} = 1$ is set; otherwise, $h_{i}^{'} = 0$;
(b): Alice encodes $h_{i}$ and $h_{i}^{'}$ to points $N_{i}$ and $N_{i}^{'} (1 \leq i \leq n)$ on the elliptic curve ($E_{p} (a, b)$) one by one, selects n random numbers ($b_{1 i}$ and $b_{1 i}^{'}$), adopts the encryption method in step (1), uses the public key ($K$) to encrypt each element ($N_{i}$ and $N_{i}^{'} (1 \leq i \leq n)$) one by one, obtains $E (h) = (E (h_{1}), \dots, E (h_{n}))$ and $E (h^{'}) = (E (h_{1}^{'}), \dots, E (h_{n}^{'}))$, and sends them to Bob;

(4) Bob makes the following calculation:

(a): For each $i \in [1, n]$, when $r_{i} > 0$, let $H_{i} = E (h_{i})$; when $r_{i} < 0$, calculate $H_{i} = T [E (h_{i})]$. Similarly, when $r_{i}^{'} > 0$, let $H_{i}^{'} = E (h_{i}^{'})$; when $r_{i}^{'} < 0$, calculate $H_{i}^{'} = T [E (h_{i}^{'})]$;
(b): Compute $\hat{H} = \sum_{i \in [1, n]} (H_{i} + H_{i}^{'}) = E (e)$;
(c): Select the random number $r^{*} < \frac{N}{4 (n + 1)}$, then select the random numbers $a_{1 i}, a_{1 i}^{'}, a_{2 i}, a_{2 i}^{'}$, and use the encryption method in step (1) to encrypt with the public key ($K$): $Z_{1} = E (1), Z_{2} = E (2 n), Z_{3} = E (2 e), Z_{4} = E (- 2 t)$. Compute $Z = \underbrace{(Z_{1} + \dots + Z_{1})}_{r^{*}} + \underbrace{(Z_{2} + \dots + Z_{2})}_{r^{*}} - \underbrace{(Z_{3} + \dots + Z_{3})}_{r^{*}} + \underbrace{(Z_{4} + \dots + Z_{4})}_{r^{*}}$ and send $Z$ to Alice;

(5) Alice decrypts to obtain $z = D (Z)$ and decodes the x-coordinate of point $z$ to obtain $z^{*}$. If $z^{*} < N / 2$, then $L_{t} (x, y) = 1$; otherwise, $L_{t} (x, y) = 0$. Alice outputs $L_{t} (x, y)$ and informs Bob of it.

The protocol ends.
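The arithmetic of Protocol 1 can be traced without any cryptography by replacing each ciphertext with its plaintext modulo N. The sketch below does exactly that, so it offers no privacy and only illustrates why the masks, the N/2 tests, and the flips produce the right bit. The modulus `N`, the function name, and the use of Python's `random` module are assumptions for illustration.

```python
import random

N = 2_147_483_647  # hypothetical prime modulus standing in for the plaintext space


def scts_arithmetic(x, y, t, rng=random):
    """Plaintext trace of Protocol 1: returns L_t(x, y)."""
    n = len(x)
    u = [2 * xi for xi in x]                 # Alice: u_i = 2 x_i
    u_p = [2 * xi + 1 for xi in x]           # Alice: u'_i = 2 x_i + 1
    v = [2 * yi + 1 for yi in y]             # Bob:   v_i = 2 y_i + 1
    v_p = [2 * yi for yi in y]               # Bob:   v'_i = 2 y_i

    bound = int((N // 2) ** 0.5)             # 0 < |r_i| < sqrt(N / 2)
    mask = lambda: rng.choice([-1, 1]) * rng.randrange(1, bound)
    r = [mask() for _ in range(n)]
    r_p = [mask() for _ in range(n)]

    # Step (2b): Bob forms w_i; step (3a): Alice sees only d_i = D(w_i).
    d = [(ri * (ui - vi)) % N for ri, ui, vi in zip(r, u, v)]
    d_p = [(ri * (vi - ui)) % N for ri, vi, ui in zip(r_p, v_p, u_p)]
    h = [1 if di < N / 2 else 0 for di in d]
    h_p = [1 if di < N / 2 else 0 for di in d_p]

    # Step (4a): Bob flips the bits whose mask was negative.
    H = [hi if ri > 0 else 1 - hi for hi, ri in zip(h, r)]
    H_p = [hi if ri > 0 else 1 - hi for hi, ri in zip(h_p, r_p)]
    e = sum(H) + sum(H_p)                    # e = n - l

    # Steps (4c) and (5): masked comparison of l with t.
    r_star = rng.randrange(1, N // (4 * (n + 1)))
    z_star = (r_star * (1 + 2 * n - 2 * e - 2 * t)) % N
    return 1 if z_star < N / 2 else 0


print(scts_arithmetic((2, 6, 8, 9), (2, 6, 7, 8), t=2))  # 1
print(scts_arithmetic((2, 6, 8, 9), (2, 6, 7, 8), t=3))  # 0
```

The output does not depend on the random masks, because the bounds on r_i and r* keep every masked product strictly between −N/2 and N/2 for inputs as small as character codes.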

3.3. Correctness Analysis

(1): Steps (1)–(4) of the protocol, executed in parallel by Alice and Bob with $u$ and $v$ and $u^{'}$ and $v^{'}$ , respectively, reduce its communication complexity;
(2): In step (2b) of the protocol, for each $i$, $0 \leq u_{i}, v_{i} \leq \sqrt{N / 2}$ and $0 < |r_{i}| < \sqrt{N / 2}$, and because $u_{i} \neq v_{i}$, the value of $r_{i} (u_{i} - v_{i})$ lies in the range $- N / 2 < r_{i} (u_{i} - v_{i}) < 0$ or $0 < r_{i} (u_{i} - v_{i}) < N / 2$. According to Theorem 3, if $0 < d_{i} < N / 2$, then $0 < r_{i} (u_{i} - v_{i}) < N / 2$, so $r_{i}$ and $u_{i} - v_{i}$ have the same sign, and $h_{i} = 1$; if $d_{i} > N / 2$, then $- N / 2 < r_{i} (u_{i} - v_{i}) < 0$, so $r_{i}$ and $u_{i} - v_{i}$ have different signs, and $h_{i} = 0$;
(3): For the input $u$ and $v$, it is known that $u_{i} > v_{i}$ if and only if $H_{i} = E (1)$; $u_{i} < v_{i}$ if and only if $H_{i} = E (0)$. It is further known from Theorem 2 that $x_{i} > y_{i}$ if and only if $H_{i} = E (1)$; $x_{i} \leq y_{i}$ if and only if $H_{i} = E (0)$;
(4): For the input $u^{'}$ and $v^{'}$, it is known that $u_{i}^{'} > v_{i}^{'}$ if and only if $H_{i}^{'} = E (0)$; $u_{i}^{'} < v_{i}^{'}$ if and only if $H_{i}^{'} = E (1)$. It is further known from Theorem 2 that $x_{i} < y_{i}$ if and only if $H_{i}^{'} = E (1)$; $x_{i} \geq y_{i}$ if and only if $H_{i}^{'} = E (0)$;
(5): Clearly, the $i$-th components of vectors $x$ and $y$ are equal exactly when $H_{i}$ and $H_{i}^{'}$ are both ciphertexts of 0. By the ECC additive homomorphism, it follows that $\hat{H} = \sum_{i \in [1, n]} (H_{i} + H_{i}^{'}) = E (e) = E (n - l)$;
(6): Steps (4c) and (5) of the protocol compare the sizes of $l = L (x, y)$ and $t$, and by the ECC property:

$Z = \underbrace{(Z_{1} + \dots + Z_{1})}_{r^{*}} + \underbrace{(Z_{2} + \dots + Z_{2})}_{r^{*}} - \underbrace{(Z_{3} + \dots + Z_{3})}_{r^{*}} + \underbrace{(Z_{4} + \dots + Z_{4})}_{r^{*}} = E [r^{*} (2 l + 1 - 2 t) \bmod N]$ (2)

According to Theorem 2, the following is known:

If $0 < z^{*} < N / 2$, then $2 l + 1 > 2 t \Leftrightarrow l \geq t \Leftrightarrow L_{t} (x, y) = 1$. (3)

If $N / 2 < z^{*} < N$, then $L_{t} (x, y) = 0$. (4)

For example: Alice’s input vector: $x = (2, 6, 8, 9)$; Bob’s input vector: $y = (2, 6, 7, 8)$; threshold: $t = 2$.

Preparation procedure: Alice calculates $u = (2 x_{i}) = (4, 12, 16, 18)$ and $- u^{'} = (- (2 x_{i} + 1)) = (- 5, - 13, - 17, - 19)$. Bob calculates $- v = (- (2 y_{i} + 1)) = (- 5, - 13, - 15, - 17)$ and $v^{'} = (2 y_{i}) = (4, 12, 14, 16)$.

Calculation procedure:

(1): Alice encrypts $E (u) = (E (4), E (12), E (16), E (18))$ and $E (- u_{}^{'}) = (E (- 5), E (- 13), E (- 17), E (- 19))$ and sends them to Bob;
(2): Bob encrypts $E (- v) = (E (- 5), E (- 13), E (- 15), E (- 17))$ , $E (v_{}^{'}) = (E (4), E (12), E (14), E (16))$ and picks the random vectors $r = (1,2, - 3, - 1)$ and $r^{'} = (1,3, 2, - 1)$ . Then, Bob computes $w_{i} = E [r_{i} (u_{i} - v_{i})] = [E (1 \times (- 1)), E (2 \times (- 1)), E ((- 3) \times 1), E ((- 1) \times 1)]$ and $w_{i}^{'} = E [r_{i}^{'} (v_{i}^{'} - u_{i}^{'})] = [E (1 \times (- 1)), E (3 \times (- 1)), E (2 \times (- 3)), E ((- 1) \times (- 3))]$ , sending them to Alice;
(3): Alice decrypts to obtain $h_{i}$ and $h_{i}^{'}$ (i.e., $h = (0,0, 0,0)$ and $h^{'} = (0,0, 0, 1)$ ). Alice encrypts to obtain $E (h) = (E (0), E (0), E (0), E (0))$ and $E (h^{'}) = (E (0), E (0), E (0), E (1))$ and sends them to Bob;
(4): For each $i \in [1, n]$, when $r_{i} > 0$, let $H_{i} = E (h_{i})$; when $r_{i} < 0$, calculate $H_{i} = T [E (h_{i})]$. Similarly, when $r_{i}^{'} > 0$, let $H_{i}^{'} = E (h_{i}^{'})$; when $r_{i}^{'} < 0$, calculate $H_{i}^{'} = T [E (h_{i}^{'})]$. Bob obtains the vectors $H = (E (0), E (0), E (1), E (1))$ and $H^{'} = (E (0), E (0), E (0), E (0))$. $\hat{H} = E (0 + 0) + E (0 + 0) + E (1 + 0) + E (1 + 0) = E (2)$ is computed;
(5): Bob selects $r^{*} = 1$ and substitutes it into the calculation to obtain $Z = E (1 + 2 n - 2 e - 2 t) = E (1 + 8 - 4 - 4) = E (1)$, and sends $Z$ to Alice. Alice decrypts and outputs $L_{t} (x, y) = 1$, which shows that the number of equal elements in $x$ and $y$ is greater than or equal to the set threshold (i.e., $l \geq t$), and the similarity of the two vectors is at least $\frac{2}{4} = 50\%$.

To sum up, Protocol 1 is secure because the participants are honest under the semi-honest model. However, malicious adversaries can exist in real situations, and so the design of a secure MPC protocol is required under the malicious model.

4. Secure Computation Protocol of Text Similarity under the Malicious Model

4.1. Solutions

This section analyzes the possible attack behaviors of malicious participants in Protocol 1, with the aim that an attack is either prevented in time or discovered as soon as it is carried out. The possible malicious actions in Protocol 1 are as follows:

(1): In Protocol 1, both Alice and Bob can encrypt plaintexts, but only Alice can decrypt the ciphertext. If Alice informs Bob of an incorrect result, Bob can only accept it passively, which is extremely unfair to Bob. Therefore, the design should ensure both that Alice and Bob are treated fairly and that both obtain the correct result;
(2): In Protocol 1, Alice and Bob need to send each other their encryption results when they execute the protocol. If one party intentionally sends a wrong ciphertext, this is an input error, which cannot be prevented even in the ideal protocol, and so it is not considered;
(3): In Step 5, Alice may not output the correct result after decryption even though she already knows it at this time, and so this step admits malicious behavior.

To address the above malicious behaviors, this paper designs an SCTS protocol under the malicious model. The design idea is that Alice and Bob have equal status: each has a public key and a private key, and the two parties use the zero-knowledge-proof and cut-and-choose methods to verify whether their calculation results are consistent. Please refer to Figure 2 for details.

The specific Algorithm 2 is as follows:

Algorithm 2: Computing the text similarity under the malicious model.

Input: $x = (x_{1}, \dots, x_{n})$: Alice’s input; $y = (y_{1}, \dots, y_{n})$: Bob’s input; $t$: agreed threshold between both parties; $G$: the base point of the elliptic curve ($E_{p}$); $a$: Alice’s choice; $b$: Bob’s choice; $pk_{1} = K_{1}$: Alice’s public key; $pk_{2} = K_{2}$: Bob’s public key; $sk_{1} = k_{1}$: Alice’s private key; $sk_{2} = k_{2}$: Bob’s private key; $E$: encrypt; $D$: decrypt; Encode: encode points onto $E_{p}$; $u_{i} = 2 x_{i}, - u_{i}^{'} = - (2 x_{i} + 1)$: Alice’s calculation; $- v_{i} = - (2 y_{i} + 1), v_{i}^{'} = 2 y_{i}$: Bob’s calculation;

1: Compute $q_{1} = a K_{1}$;
2: Compute $q_{2} = b K_{2}$;
3: Exchange $(K_{1}, q_{1})$ and $(K_{2}, q_{2})$;
4: $\mathrm{Encode} (u_{1}, \dots, u_{n}) = (M_{1}, \dots, M_{n})$, $\mathrm{Encode} (- u_{1}^{'}, \dots, - u_{n}^{'}) = (M_{1}^{'}, \dots, M_{n}^{'})$;
5: $\mathrm{Encode} (- v_{1}, \dots, - v_{n}) = (P_{1}, \dots, P_{n})$, $\mathrm{Encode} (v_{1}^{'}, \dots, v_{n}^{'}) = (P_{1}^{'}, \dots, P_{n}^{'})$;
6: Select $a_{i}, a_{i}^{'}, b_{i}, b_{i}^{'}$;
7: $E_{pk_{1}} (u_{i}) = (M_{i} + a_{i} K_{1}, a_{i} G)$, $E_{pk_{1}} (- u_{i}^{'}) = (M_{i}^{'} + a_{i}^{'} K_{1}, a_{i}^{'} G)$;
8: $E_{pk_{2}} (- v_{i}) = (P_{i} + b_{i} K_{2}, b_{i} G)$, $E_{pk_{2}} (v_{i}^{'}) = (P_{i}^{'} + b_{i}^{'} K_{2}, b_{i}^{'} G)$;
9: Select random vectors $r_{a} = (r_{a 1}, \dots, r_{a n})$, $r_{a}^{'} = (r_{a 1}^{'}, \dots, r_{a n}^{'})$;
10: Select random vectors $r_{b} = (r_{b 1}, \dots, r_{b n})$, $r_{b}^{'} = (r_{b 1}^{'}, \dots, r_{b n}^{'})$;
11: Compute $w_{i} = E [r_{a i} (u_{i} - v_{i})] = E (r_{a i} u_{i}) + E (- r_{a i} v_{i})$ and $w_{i}^{'} = E [r_{a i}^{'} (v_{i}^{'} - u_{i}^{'})] = E (r_{a i}^{'} v_{i}^{'}) + E (- r_{a i}^{'} u_{i}^{'})$;
12: Compute $g_{i} = E [r_{b i} (u_{i} - v_{i})] = E (r_{b i} u_{i}) + E (- r_{b i} v_{i})$ and $g_{i}^{'} = E [r_{b i}^{'} (v_{i}^{'} - u_{i}^{'})] = E (r_{b i}^{'} v_{i}^{'}) + E (- r_{b i}^{'} u_{i}^{'})$;
13: Exchange $w_{i}, w_{i}^{'}$ and $g_{i}, g_{i}^{'}$;
14: $D_{sk_{1}} (w_{i}) = r_{a i} (u_{i} - v_{i}) = W_{1 i}$, $D_{sk_{1}} (w_{i}^{'}) = r_{a i}^{'} (v_{i}^{'} - u_{i}^{'}) = W_{2 i}$;
15: $D_{sk_{2}} (g_{i}) = r_{b i} (u_{i} - v_{i}) = Q_{1 i}$, $D_{sk_{2}} (g_{i}^{'}) = r_{b i}^{'} (v_{i}^{'} - u_{i}^{'}) = Q_{2 i}$;
16: Select $p_{1 s}, p_{2 s}, f_{1 s}, f_{2 s}$, $0 \leq s \leq m$;
17: $(c_{1 a}^{s}, c_{2 a}^{s}) = (p_{1 s} W_{1 i} + K_{1}, W_{1 i} + p_{1 s} W_{1 i} + a G)$ and $(o_{1 a}^{s}, o_{2 a}^{s}) = (p_{2 s} W_{2 i} + K_{1}, W_{2 i} + p_{2 s} W_{2 i} + a G)$;
18: $(c_{1 b}^{s}, c_{2 b}^{s}) = (f_{1 s} Q_{1 i} + K_{2}, Q_{1 i} + f_{1 s} Q_{1 i} + b G)$ and $(o_{1 b}^{s}, o_{2 b}^{s}) = (f_{2 s} Q_{2 i} + K_{2}, Q_{2 i} + f_{2 s} Q_{2 i} + b G)$;
19: Exchange $(c_{1 a}^{s}, c_{2 a}^{s}), (o_{1 a}^{s}, o_{2 a}^{s})$ and $(c_{1 b}^{s}, c_{2 b}^{s}), (o_{1 b}^{s}, o_{2 b}^{s})$;
20: Choose $m / 2$ groups $(c_{1 b}^{s}, c_{2 b}^{s})$ and $(o_{1 b}^{s}, o_{2 b}^{s})$;
21: If $f_{1 s} Q_{1 i} + K_{2} = c_{1 b}^{s}$ and $f_{2 s} Q_{2 i} + K_{2} = o_{1 b}^{s}$, then continue; otherwise, terminate;
22: Choose $m / 2$ groups $(c_{1 a}^{s}, c_{2 a}^{s})$ and $(o_{1 a}^{s}, o_{2 a}^{s})$;
23: If $p_{1 s} W_{1 i} + K_{1} = c_{1 a}^{s}$ and $p_{2 s} W_{2 i} + K_{1} = o_{1 a}^{s}$, then continue; otherwise, terminate;
24: Choose one $(c_{1 b}^{j}, c_{2 b}^{j})$ and one $(o_{1 b}^{j}, o_{2 b}^{j})$ from the remaining groups;
25: $c_{b} = a (c_{2 b}^{j} - c_{1 b}^{j} - W_{1 i} + K_{2}) = a (Q_{1 i} - W_{1 i}) + a b G$, $c_{b}^{'} = a (o_{2 b}^{j} - o_{1 b}^{j} - W_{2 i} + K_{2}) = a (Q_{2 i} - W_{2 i}) + a b G$;
26: $O_{1} = q_{1}^{*} G$, $\lambda_{b} = q_{1}^{*} K_{2}$, $O_{1}^{'} = q_{1}^{'} G$, $\lambda_{b}^{'} = q_{1}^{'} K_{2}$;
27: Choose one $(c_{1 a}^{j}, c_{2 a}^{j})$ and one $(o_{1 a}^{j}, o_{2 a}^{j})$ from the remaining groups;
28: $c_{a} = b (c_{2 a}^{j} - c_{1 a}^{j} - Q_{1 i} + K_{1}) = b (W_{1 i} - Q_{1 i}) + a b G$, $c_{a}^{'} = b (o_{2 a}^{j} - o_{1 a}^{j} - Q_{2 i} + K_{1}) = b (W_{2 i} - Q_{2 i}) + a b G$;
29: $O_{2} = q_{2}^{*} G$, $\lambda_{a} = q_{2}^{*} K_{1}$, $O_{2}^{'} = q_{2}^{'} G$, $\lambda_{a}^{'} = q_{2}^{'} K_{1}$;
30: Exchange $c_{b} + O_{1}, c_{b}^{'} + O_{1}^{'}$ and $c_{a} + O_{2}, c_{a}^{'} + O_{2}^{'}$;
31: $\beta_{a} = k_{1} (c_{a} + O_{2})$, $m_{a} = k_{1} c_{a}$ and $\beta_{a}^{'} = k_{1} (c_{a}^{'} + O_{2}^{'})$, $m_{a}^{'} = k_{1} c_{a}^{'}$;
32: $\beta_{b} = k_{2} (c_{b} + O_{1})$, $m_{b} = k_{2} c_{b}$ and $\beta_{b}^{'} = k_{2} (c_{b}^{'} + O_{1}^{'})$, $m_{b}^{'} = k_{2} c_{b}^{'}$;
33: Exchange $\beta_{a}, m_{a}, \beta_{a}^{'}, m_{a}^{'}$ and $\beta_{b}, m_{b}, \beta_{b}^{'}, m_{b}^{'}$;
34: If $m_{b} = \beta_{b} - \lambda_{b}$ and $m_{b}^{'} = \beta_{b}^{'} - \lambda_{b}^{'}$, then continue; otherwise, terminate;
35: If $m_{a} = \beta_{a} - \lambda_{a}$ and $m_{a}^{'} = \beta_{a}^{'} - \lambda_{a}^{'}$, then continue; otherwise, terminate;
36: If $k_{2} a (Q_{1 i} - W_{1 i}) = 0$ by $m_{b} - a q_{2}$ and $k_{2} a (Q_{2 i} - W_{2 i}) = 0$ by $m_{b}^{'} - a q_{2}$, then continue; otherwise, terminate;
37: If $k_{1} b (W_{1 i} - Q_{1 i}) = 0$ by $m_{a} - b q_{1}$ and $k_{1} b (W_{2 i} - Q_{2 i}) = 0$ by $m_{a}^{'} - b q_{1}$, then continue; otherwise, terminate;
38: $D (W_{1 i}) = d_{1 i}$, $D (W_{2 i}) = d_{1 i}^{'}$; if $d_{1 i} < N / 2$, then set $h_{1 i} = 1$; otherwise, $h_{1 i} = 0$;
39: $D (Q_{1 i}) = d_{2 i}$, $D (Q_{2 i}) = d_{2 i}^{'}$; if $d_{1 i}^{'} < N / 2$, then set $h_{1 i}^{'} = 1$; otherwise, $h_{1 i}^{'} = 0$;
40: $E_{pk_{1}} (h_{1}) = E_{pk_{1}} (h_{11}), \dots, E_{pk_{1}} (h_{1 n})$;
41: $E_{pk_{2}} (h_{1}^{'}) = E_{pk_{2}} (h_{11}^{'}), \dots, E_{pk_{2}} (h_{1 n}^{'})$;
42: Obtain $H_{1 i}$ and $H_{1 i}^{'}$;
43: Obtain $H_{2 i}$ and $H_{2 i}^{'}$;
44: Compute $\overset{\land}{H_{1}} = \sum_{i \in [1, n]} H_{1 i} + H_{1 i}^{'} = E (e_{1}) = E (n - l)$;
45: Compute $\overset{\land}{H_{2}} = \sum_{i \in [1, n]} H_{2 i} + H_{2 i}^{'} = E (e_{2}) = E (n - l)$;
46: Select $r_{1}, r_{2}$;
47: Compute $Z = E_{pk_{1}} (r_{1}) + E_{pk_{1}} (2 n r_{1}) - E_{pk_{1}} (2 e_{1} r_{1}) + E_{pk_{1}} (- 2 t r_{1}) = E [r_{1} (2 l + 1 - 2 t)]$;
48: Compute $Z^{'} = E_{pk_{2}} (r_{2}) + E_{pk_{2}} (2 n r_{2}) - E_{pk_{2}} (2 e_{2} r_{2}) + E_{pk_{2}} (- 2 t r_{2}) = E [r_{2} (2 l + 1 - 2 t)]$;
49: $D_{sk_{1}} (Z) = z_{1}$, $D_{sk_{2}} (Z^{'}) = z_{2}$;
50: Select $p_{i}$ and $p_{i}^{'}$ $(0 \leq i \leq m)$;
51: $(c_{11 a}^{i}, c_{12 a}^{i}) = (p_{i} z_{1} + K_{1}, z_{1} + p_{i} z_{1} + a G)$;
52: $(c_{11 b}^{i}, c_{12 b}^{i}) = (p_{i}^{'} z_{2} + K_{2}, z_{2} + p_{i}^{'} z_{2} + b G)$;
53: Exchange $(c_{11 a}^{i}, c_{12 a}^{i})$ and $(c_{11 b}^{i}, c_{12 b}^{i})$;
54: Choose $m / 2$ groups from $(c_{11 a}^{i}, c_{12 a}^{i})$ and $(c_{11 b}^{i}, c_{12 b}^{i})$;
55: If $c_{11 b}^{i} = p_{i}^{'} z_{2} + K_{2}$, then continue; otherwise, terminate;
56: If $c_{11 a}^{i} = p_{i} z_{1} + K_{1}$, then continue; otherwise, terminate;
57: Choose one $(c_{11 b}^{j}, c_{12 b}^{j})$ and one $(c_{11 a}^{j}, c_{12 a}^{j})$ from the remaining groups;
58: Compute $c_{b 1} = a (c_{12 b}^{j} - c_{11 b}^{j} - z_{1} + K_{2}) = a (z_{2} - z_{1}) + a b G$, $J_{1} = q_{3} G$, $\lambda_{b 1} = q_{3} K_{2}$;
59: Compute $c_{a 1} = b (c_{12 a}^{j} - c_{11 a}^{j} - z_{2} + K_{1}) = b (z_{1} - z_{2}) + a b G$, $J_{2} = q_{4} G$, $\lambda_{a 1} = q_{4} K_{1}$;
60: Exchange $c_{b 1} + J_{1}$ and $c_{a 1} + J_{2}$;
61: $\beta_{a 1} = k_{1} (c_{a 1} + J_{2})$, $m_{a 1} = k_{1} c_{a 1}$;
62: $\beta_{b 1} = k_{2} (c_{b 1} + J_{1})$, $m_{b 1} = k_{2} c_{b 1}$;
63: Exchange $\beta_{a 1}, m_{a 1}$ and $\beta_{b 1}, m_{b 1}$;
64: If $m_{b 1} = \beta_{b 1} - \lambda_{b 1}$, then continue; otherwise, terminate;
65: If $m_{a 1} = \beta_{a 1} - \lambda_{a 1}$, then continue; otherwise, terminate;
66: If $k_{2} a (z_{2} - z_{1}) = 0$ by $m_{b 1} - a q_{2}$, then continue;
67: If $k_{1} b (z_{1} - z_{2}) = 0$ by $m_{a 1} - b q_{1}$, then continue;
68: $D (z_{1}) = z_{3}$; if $z_{3} < N / 2$, make $L_{t} (x, y) = 1$;
69: $D (z_{2}) = z_{4}$; if $z_{4} < N / 2$, make $L_{t} (x, y) = 1$;

Output: $L_{t} (x, y)$
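The encryption in lines 7 and 8 of Algorithm 2, $(M + a K, a G)$, together with decryption as $C_{1} - k C_{2}$, can be tried on a toy curve. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: the textbook curve $y^{2} = x^{3} + 2 x + 2$ over $\mathbb{F}_{17}$ with base point $(5, 1)$ of order 19, the encoding $\mathrm{Encode}(m) = m G$ (the paper does not specify its encoding), brute-force decoding, and the hypothetical helper names `encrypt`, `decrypt`, `ct_add`, and `ct_scale`. It requires Python 3.8 or later for modular inverses via `pow`.

```python
# Toy additive EC-ElGamal; every parameter here is an illustrative assumption.
P, A = 17, 2             # curve y^2 = x^3 + 2x + 2 over F_17
G, N = (5, 1), 19        # base point and its prime order
INF = None               # point at infinity

def add(p, q):
    """Add two curve points."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P == 0:
        return INF
    if p == q:
        s = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        s = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (s * s - x1 - x2) % P
    return (x3, (s * (x1 - x3) - y1) % P)

def neg(p):
    return INF if p is INF else (p[0], -p[1] % P)

def mul(k, p):
    """Double-and-add scalar multiplication; k is reduced modulo the order."""
    k %= N
    acc = INF
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc

def encrypt(m, pub, nonce):
    """E(m) = (mG + nonce*K, nonce*G), with the assumed encoding m -> mG."""
    return (add(mul(m, G), mul(nonce, pub)), mul(nonce, G))

def decrypt(ct, priv):
    """Compute C1 - k*C2, then decode; values at or above N/2 read as negative."""
    point = add(ct[0], neg(mul(priv, ct[1])))
    for m in range(N):
        if mul(m, G) == point:
            return m if m < N / 2 else m - N
    raise ValueError("point is not a multiple of G")

def ct_add(c, d):
    return (add(c[0], d[0]), add(c[1], d[1]))

def ct_scale(r, c):
    return (mul(r, c[0]), mul(r, c[1]))

k1 = 7                   # hypothetical private key
K1 = mul(k1, G)          # matching public key
blinded = ct_scale(2, ct_add(encrypt(4, K1, 3), encrypt(-5, K1, 5)))
print(decrypt(blinded, k1))
```

The last lines mirror $E [r (u_{i} - v_{i})] = E (r u_{i}) + E (- r v_{i})$ for $u_{i} = 4$, $v_{i} = 5$, $r = 2$, with both ciphertexts under the same key. A real deployment would use a standardized curve and a vetted library rather than this toy arithmetic.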

The specific Protocol 2 is as follows:

Protocol 2: The SCTS protocol under the malicious model.

Input: Alice’s text vector $x = (x_{1}, \dots, x_{n})$, Bob’s text vector $y = (y_{1}, \dots, y_{n})$, and a threshold $t$.

Output: The size relationship $L_{t} (x, y)$ between the number $l$ of equal elements of the two vectors $x$ and $y$ and the threshold $t$.

Preparation: For each $i \in [1, n]$, Alice calculates $u_{i} = 2 x_{i}$, $- u_{i}^{'} = - (2 x_{i} + 1)$ and Bob calculates $- v_{i} = - (2 y_{i} + 1)$, $v_{i}^{'} = 2 y_{i}$ (with $u_{i}, v_{i} < N / 2$). Alice and Bob together choose the elliptic curve $E_{p}$ and base point $G$, and they choose the private keys $k_{1}, k_{2}$ ($k_{1}, k_{2} > 0$) and random numbers $a, b$, respectively. Then, the parties calculate their public keys $K_{1} = k_{1} G$ and $K_{2} = k_{2} G$, together with $q_{1} = a K_{1}$ and $q_{2} = b K_{2}$, respectively. Finally, Alice and Bob exchange $(K_{1}, q_{1})$ and $(K_{2}, q_{2})$.

Protocol Start:

(1)

Alice and Bob calculate the following:

(a): Alice encodes the plaintext sets $u = (u_{1}, \dots, u_{n})$ and $- u^{'} = (- u_{1}^{'}, \dots, - u_{n}^{'})$ to points $M_{i}$ and $M_{i}^{'}$ $(1 \leq i \leq n)$ on the elliptic curve $E_{p}$ one by one, selects $n$ random numbers $a_{i}$ and $a_{i}^{'}$, and encrypts each element $M_{i}$ and $M_{i}^{'}$ separately by using the public key $K_{1}$. That is, the ciphertexts $E (M_{i}) = (C_{1 i}, C_{2 i})$ and $E (M_{i}^{'}) = (C_{1 i}^{'}, C_{2 i}^{'})$ are calculated, where $C_{1 i} = M_{i} + a_{i} K_{1}$, $C_{2 i} = a_{i} G$ and $C_{1 i}^{'} = M_{i}^{'} + a_{i}^{'} K_{1}$, $C_{2 i}^{'} = a_{i}^{'} G$. Then, Alice obtains the sets $E (u) = (E (M_{1}), E (M_{2}), \dots, E (M_{n}))$ and $E (- u^{'}) = (E (M_{1}^{'}), E (M_{2}^{'}), \dots, E (M_{n}^{'}))$, and sends $E (u)$ and $E (- u^{'})$ to Bob;
(b): Bob encodes the plaintext sets $- v = (- v_{1}, \dots, - v_{n})$ and $v^{'} = (v_{1}^{'}, \dots, v_{n}^{'})$ to points $P_{i}$ and $P_{i}^{'}$ $(1 \leq i \leq n)$ on the elliptic curve $E_{p}$ one by one, selects $n$ random numbers $b_{i}$ and $b_{i}^{'}$, and encrypts each element $P_{i}$ and $P_{i}^{'}$ separately by using the public key $K_{2}$. That is, the ciphertexts $E (P_{i}) = (I_{1 i}, I_{2 i})$ and $E (P_{i}^{'}) = (I_{1 i}^{'}, I_{2 i}^{'})$ are calculated, where $I_{1 i} = P_{i} + b_{i} K_{2}$, $I_{2 i} = b_{i} G$ and $I_{1 i}^{'} = P_{i}^{'} + b_{i}^{'} K_{2}$, $I_{2 i}^{'} = b_{i}^{'} G$. Then, Bob obtains the sets $E (- v) = (E (P_{1}), E (P_{2}), \dots, E (P_{n}))$ and $E (v^{'}) = (E (P_{1}^{'}), E (P_{2}^{'}), \dots, E (P_{n}^{'}))$, and sends $E (- v)$ and $E (v^{'})$ to Alice;

(2)

After receiving the ciphertexts of each other, the participants calculate the following:

(a): Alice selects random vectors $r_{a} = (r_{a 1}, \dots, r_{a n})$ and $r_{a}^{'} = (r_{a 1}^{'}, \dots, r_{a n}^{'})$, where $0 < |r_{a i}|, |r_{a i}^{'}| < \sqrt{N / 2}$, $i \in [1, n]$. Meanwhile, Bob selects random vectors $r_{b} = (r_{b 1}, \dots, r_{b n})$ and $r_{b}^{'} = (r_{b 1}^{'}, \dots, r_{b n}^{'})$, where $0 < |r_{b i}|, |r_{b i}^{'}| < \sqrt{N / 2}$, $i \in [1, n]$;
(b): For each $i \in [1, n]$, Alice calculates $w_{i}$ and $w_{i}^{'}$:
With $w_{i 1} = E (u_{i})$ and $w_{i 2} = E (- v_{i})$, $w_{i} = E [r_{a i} (u_{i} - v_{i})] = E (r_{a i} u_{i}) + E (- r_{a i} v_{i})$; that is, $w_{i} = \underbrace{(w_{i 1} + \dots + w_{i 1})}_{r_{a i}} + \underbrace{(w_{i 2} + \dots + w_{i 2})}_{r_{a i}}$. At the same time, with $w_{i 1}^{'} = E (- u_{i}^{'})$ and $w_{i 2}^{'} = E (v_{i}^{'})$, Alice calculates $w_{i}^{'} = E [r_{a i}^{'} (v_{i}^{'} - u_{i}^{'})] = E (r_{a i}^{'} v_{i}^{'}) + E (- r_{a i}^{'} u_{i}^{'})$; that is, $w_{i}^{'} = \underbrace{(w_{i 1}^{'} + \dots + w_{i 1}^{'})}_{r_{a i}^{'}} + \underbrace{(w_{i 2}^{'} + \dots + w_{i 2}^{'})}_{r_{a i}^{'}}$. Then, Alice sends the ciphertexts $w_{i}$ and $w_{i}^{'}$ to Bob.
For each $i \in [1, n]$, Bob calculates $g_{i}$ and $g_{i}^{'}$:
With $g_{i 1} = E (u_{i})$ and $g_{i 2} = E (- v_{i})$, $g_{i} = E [r_{b i} (u_{i} - v_{i})] = E (r_{b i} u_{i}) + E (- r_{b i} v_{i})$; that is, $g_{i} = \underbrace{(g_{i 1} + \dots + g_{i 1})}_{r_{b i}} + \underbrace{(g_{i 2} + \dots + g_{i 2})}_{r_{b i}}$. At the same time, with $g_{i 1}^{'} = E (- u_{i}^{'})$ and $g_{i 2}^{'} = E (v_{i}^{'})$, Bob calculates $g_{i}^{'} = E [r_{b i}^{'} (v_{i}^{'} - u_{i}^{'})] = E (r_{b i}^{'} v_{i}^{'}) + E (- r_{b i}^{'} u_{i}^{'})$; that is, $g_{i}^{'} = \underbrace{(g_{i 1}^{'} + \dots + g_{i 1}^{'})}_{r_{b i}^{'}} + \underbrace{(g_{i 2}^{'} + \dots + g_{i 2}^{'})}_{r_{b i}^{'}}$. Then, Bob sends the ciphertexts $g_{i}$ and $g_{i}^{'}$ to Alice;

(3): For each $i \in [1, n]$, Alice decrypts $w_{i}$ and $w_{i}^{'}$ with the private key $k_{1}$ to obtain points $W_{1 i}$ and $W_{2 i}$. Bob decrypts $g_{i}$ and $g_{i}^{'}$ using $k_{2}$ to obtain points $Q_{1 i}$ and $Q_{2 i}$;
(4): For each $i \in [1, n]$, Alice selects $m$ random numbers $p_{1 s}$ and $p_{2 s}$ $(0 \leq s \leq m)$, and calculates $(c_{1 a}^{s}, c_{2 a}^{s}) = (p_{1 s} W_{1 i} + K_{1}, W_{1 i} + p_{1 s} W_{1 i} + a G)$ and $(o_{1 a}^{s}, o_{2 a}^{s}) = (p_{2 s} W_{2 i} + K_{1}, W_{2 i} + p_{2 s} W_{2 i} + a G)$. Bob selects $m$ random numbers $f_{1 s}$ and $f_{2 s}$ $(0 \leq s \leq m)$, and calculates $(c_{1 b}^{s}, c_{2 b}^{s}) = (f_{1 s} Q_{1 i} + K_{2}, Q_{1 i} + f_{1 s} Q_{1 i} + b G)$ and $(o_{1 b}^{s}, o_{2 b}^{s}) = (f_{2 s} Q_{2 i} + K_{2}, Q_{2 i} + f_{2 s} Q_{2 i} + b G)$. Finally, Alice and Bob exchange $(c_{1 a}^{s}, c_{2 a}^{s}), (o_{1 a}^{s}, o_{2 a}^{s})$ and $(c_{1 b}^{s}, c_{2 b}^{s}), (o_{1 b}^{s}, o_{2 b}^{s})$;
(5): Using the cut-and-choose method:
Alice randomly selects $m / 2$ groups from the $m$ groups $(c_{1 b}^{s}, c_{2 b}^{s})$ and $(o_{1 b}^{s}, o_{2 b}^{s})$ sent by Bob, and Bob is required to publish the corresponding $f_{1 s} Q_{1 i}$ and $f_{2 s} Q_{2 i}$. Then, Alice verifies the received data: $f_{1 s} Q_{1 i} + K_{2} = c_{1 b}^{s}$, $f_{2 s} Q_{2 i} + K_{2} = o_{1 b}^{s}$. If the verification passes, then the protocol is continued; otherwise, the protocol is terminated. Bob randomly selects $m / 2$ groups from the $m$ groups $(c_{1 a}^{s}, c_{2 a}^{s})$ and $(o_{1 a}^{s}, o_{2 a}^{s})$ sent by Alice, and Alice is required to publish the corresponding $p_{1 s} W_{1 i}$ and $p_{2 s} W_{2 i}$. Then, Bob verifies the received data: $p_{1 s} W_{1 i} + K_{1} = c_{1 a}^{s}$, $p_{2 s} W_{2 i} + K_{1} = o_{1 a}^{s}$. If the verification passes, then the protocol is continued; otherwise, the protocol is terminated;
(6): Alice randomly selects one $(c_{1 b}^{j}, c_{2 b}^{j})$ and one $(o_{1 b}^{j}, o_{2 b}^{j})$ from the remaining $m / 2$ groups $(c_{1 b}^{s}, c_{2 b}^{s})$ and $(o_{1 b}^{s}, o_{2 b}^{s})$, respectively. Bob randomly selects one $(c_{1 a}^{j}, c_{2 a}^{j})$ and one $(o_{1 a}^{j}, o_{2 a}^{j})$ from the remaining $m / 2$ groups $(c_{1 a}^{s}, c_{2 a}^{s})$ and $(o_{1 a}^{s}, o_{2 a}^{s})$, respectively. Meanwhile, Alice and Bob choose random numbers ($a, q_{1}^{*}, q_{1}^{'}$ and $b, q_{2}^{*}, q_{2}^{'}$, respectively). Alice calculates $c_{b} = a (c_{2 b}^{j} - c_{1 b}^{j} - W_{1 i} + K_{2}) = a (Q_{1 i} - W_{1 i}) + a b G$ and $c_{b}^{'} = a (o_{2 b}^{j} - o_{1 b}^{j} - W_{2 i} + K_{2}) = a (Q_{2 i} - W_{2 i}) + a b G$, and sets $O_{1} = q_{1}^{*} G$, $\lambda_{b} = q_{1}^{*} K_{2}$ and $O_{1}^{'} = q_{1}^{'} G$, $\lambda_{b}^{'} = q_{1}^{'} K_{2}$. Bob calculates $c_{a} = b (c_{2 a}^{j} - c_{1 a}^{j} - Q_{1 i} + K_{1}) = b (W_{1 i} - Q_{1 i}) + a b G$ and $c_{a}^{'} = b (o_{2 a}^{j} - o_{1 a}^{j} - Q_{2 i} + K_{1}) = b (W_{2 i} - Q_{2 i}) + a b G$, and sets $O_{2} = q_{2}^{*} G$, $\lambda_{a} = q_{2}^{*} K_{1}$ and $O_{2}^{'} = q_{2}^{'} G$, $\lambda_{a}^{'} = q_{2}^{'} K_{1}$. Then, $c_{b} + O_{1}, c_{b}^{'} + O_{1}^{'}$ and $c_{a} + O_{2}, c_{a}^{'} + O_{2}^{'}$ are exchanged between Alice and Bob;
(7): After both parties receive messages from the other, Alice calculates $\beta_{a} = k_{1} (c_{a} + O_{2})$, $m_{a} = k_{1} c_{a}$ and $\beta_{a}^{'} = k_{1} (c_{a}^{'} + O_{2}^{'})$, $m_{a}^{'} = k_{1} c_{a}^{'}$ and sends them to Bob. Bob calculates $\beta_{b} = k_{2} (c_{b} + O_{1})$, $m_{b} = k_{2} c_{b}$ and $\beta_{b}^{'} = k_{2} (c_{b}^{'} + O_{1}^{'})$, $m_{b}^{'} = k_{2} c_{b}^{'}$ and sends them to Alice;
(8): To determine whether $m_{b}$ and $m_{b}^{'}$ sent by Bob are correct, Alice uses the zero-knowledge proof to check that Bob really obtained $m_{b}$ by multiplying his private key ($k_{2}$) by her own $c_{b}$, and that he really obtained $m_{b}^{'}$ by multiplying his private key ($k_{2}$) by her own $c_{b}^{'}$; that is, she judges whether $m_{b} = \beta_{b} - \lambda_{b}$ and $m_{b}^{'} = \beta_{b}^{'} - \lambda_{b}^{'}$ hold, respectively. Similarly, Bob uses the same method to determine whether $m_{a} = \beta_{a} - \lambda_{a}$ and $m_{a}^{'} = \beta_{a}^{'} - \lambda_{a}^{'}$ hold. A party that does not pass the check is malicious;
(9): Alice can obtain $k_{2} a (Q_{1 i} - W_{1 i})$ by calculating $m_{b} - a q_{2}$; if $k_{2} a (Q_{1 i} - W_{1 i}) = 0$, then $Q_{1 i} = W_{1 i}$. At the same time, Alice calculates $m_{b}^{'} - a q_{2}$ to obtain $k_{2} a (Q_{2 i} - W_{2 i})$; if $k_{2} a (Q_{2 i} - W_{2 i}) = 0$, then $Q_{2 i} = W_{2 i}$. Similarly, Bob obtains $k_{1} b (W_{1 i} - Q_{1 i})$ by calculating $m_{a} - b q_{1}$; if $k_{1} b (W_{1 i} - Q_{1 i}) = 0$, then $Q_{1 i} = W_{1 i}$. At the same time, Bob calculates $m_{a}^{'} - b q_{1}$ to obtain $k_{1} b (W_{2 i} - Q_{2 i})$; if $k_{1} b (W_{2 i} - Q_{2 i}) = 0$, then $Q_{2 i} = W_{2 i}$. If $Q_{1 i} = W_{1 i}$ and $Q_{2 i} = W_{2 i}$ hold at the same time, this proves that the results obtained by both parties are correct and equal; otherwise, the protocol is terminated;
(10): Alice decodes the points $W_{1 i}, W_{2 i}$ to obtain the values $d_{1 i}, d_{1 i}^{'}$; Bob decodes the points $Q_{1 i}, Q_{2 i}$ to obtain the values $d_{2 i}, d_{2 i}^{'}$. If $d_{1 i} < N / 2$, then $h_{1 i} = 1$ is set; otherwise, $h_{1 i} = 0$ is set. Similarly, if $d_{1 i}^{'} < N / 2$, then $h_{1 i}^{'} = 1$ is set; otherwise, $h_{1 i}^{'} = 0$ is set. Bob obtains $h_{2 i}$ and $h_{2 i}^{'}$ in the same way;
(11): Alice encodes the values $h_{1 i}$ and $h_{1 i}^{'}$ to points $N_{1 i}$ and $N_{1 i}^{'}$ $(1 \leq i \leq n)$ on the elliptic curve $E_{p} (a, b)$ one by one, selects $n$ random numbers $\theta_{1 i}$ and $\theta_{1 i}^{'}$, adopts the encryption method in step 1 (a), uses the public key $K_{1}$ to encrypt each element $N_{1 i}$ and $N_{1 i}^{'}$ one by one, obtains $E (h_{1}) = E (h_{11}), \dots, E (h_{1 n})$ and $E (h_{1}^{'}) = E (h_{11}^{'}), \dots, E (h_{1 n}^{'})$, and sends them to Bob.
Bob encodes the values $h_{2 i}$ and $h_{2 i}^{'}$ to points $N_{2 i}$ and $N_{2 i}^{'}$ $(1 \leq i \leq n)$ on the elliptic curve $E_{p} (a, b)$ one by one, selects $n$ random numbers $\theta_{2 i}$ and $\theta_{2 i}^{'}$, adopts the encryption method in step 1 (b), uses the public key $K_{2}$ to encrypt each element $N_{2 i}$ and $N_{2 i}^{'}$ one by one, obtains $E (h_{2}) = E (h_{21}), \dots, E (h_{2 n})$ and $E (h_{2}^{'}) = E (h_{21}^{'}), \dots, E (h_{2 n}^{'})$, and sends them to Alice;

(12): For each $i \in [1, n]$, Alice calculates the following: when $r_{a i} > 0$, let $H_{1 i} = E (h_{1 i})$; when $r_{a i} < 0$, calculate $H_{1 i} = T [E (h_{1 i})]$. Similarly, when $r_{a i}^{'} > 0$, let $H_{1 i}^{'} = E (h_{1 i}^{'})$; when $r_{a i}^{'} < 0$, calculate $H_{1 i}^{'} = T [E (h_{1 i}^{'})]$. Bob calculates $H_{2 i}$ and $H_{2 i}^{'}$ in the same way;
(13): Alice calculates $\overset{\land}{H_{1}} = \sum_{i \in [1, n]} H_{1 i} + H_{1 i}^{'} = E (e_{1})$, and Bob calculates $\overset{\land}{H_{2}} = \sum_{i \in [1, n]} H_{2 i} + H_{2 i}^{'} = E (e_{2})$;
(14): Alice selects a random number $r_{1} < \frac{N}{4 (n + 1)}$, uses the encryption method in step 1 (a), selects the random numbers $a_{1 i}, a_{1 i}^{'}, a_{2 i}, a_{2 i}^{'}$, encrypts with the public key $K_{1}$, and calculates $Z_{1} = E (1)$, $Z_{2} = E (2 n)$, $Z_{3} = E (2 e_{1})$, $Z_{4} = E (- 2 t)$, and $Z = \underbrace{(Z_{1} + \dots + Z_{1})}_{r_{1}} + \underbrace{(Z_{2} + \dots + Z_{2})}_{r_{1}} - \underbrace{(Z_{3} + \dots + Z_{3})}_{r_{1}} + \underbrace{(Z_{4} + \dots + Z_{4})}_{r_{1}} = (C_{1}, C_{2})$. Alice sends the point $Z$ to Bob. Bob selects a random number $r_{2} < \frac{N}{4 (n + 1)}$, uses the encryption method in step 1 (b), selects the random numbers $a_{3 i}, a_{3 i}^{'}, a_{4 i}, a_{4 i}^{'}$, encrypts with the public key $K_{2}$, and calculates $Z_{1}^{'} = E (1)$, $Z_{2}^{'} = E (2 n)$, $Z_{3}^{'} = E (2 e_{2})$, $Z_{4}^{'} = E (- 2 t)$, and $Z^{'} = \underbrace{(Z_{1}^{'} + \dots + Z_{1}^{'})}_{r_{2}} + \underbrace{(Z_{2}^{'} + \dots + Z_{2}^{'})}_{r_{2}} - \underbrace{(Z_{3}^{'} + \dots + Z_{3}^{'})}_{r_{2}} + \underbrace{(Z_{4}^{'} + \dots + Z_{4}^{'})}_{r_{2}} = (C_{1}^{'}, C_{2}^{'})$. Bob sends the point $Z^{'}$ to Alice;
(15): Alice decrypts $Z$ using the private key $k_{1}$ (i.e., calculates $C_{1} - k_{1} C_{2} = z_{1}$ and obtains point $z_{1}$); Bob decrypts $Z^{'}$ using the private key $k_{2}$ (i.e., calculates $C_{1}^{'} - k_{2} C_{2}^{'} = z_{2}$ and obtains point $z_{2}$);
(16): Alice selects $m$ random numbers $p_{i}$ $(0 \leq i \leq m)$ and calculates $(c_{11 a}^{i}, c_{12 a}^{i}) = (p_{i} z_{1} + K_{1}, z_{1} + p_{i} z_{1} + a G)$. Bob selects $m$ random numbers $p_{i}^{'}$ $(0 \leq i \leq m)$ and calculates $(c_{11 b}^{i}, c_{12 b}^{i}) = (p_{i}^{'} z_{2} + K_{2}, z_{2} + p_{i}^{'} z_{2} + b G)$. Finally, Alice and Bob exchange $(c_{11 a}^{i}, c_{12 a}^{i})$ and $(c_{11 b}^{i}, c_{12 b}^{i})$;
(17): Using the cut-and-choose method, Alice randomly selects $m / 2$ groups from the data $(c_{11 b}^{i}, c_{12 b}^{i})$ and publishes them, and Bob publishes the corresponding $p_{i}^{'} z_{2}$. Alice verifies $c_{11 b}^{i} = p_{i}^{'} z_{2} + K_{2}$. Bob uses the same method and asks Alice to publish $p_{i} z_{1}$. Bob verifies $c_{11 a}^{i} = p_{i} z_{1} + K_{1}$. If the equations hold, then the next step is taken; otherwise, the protocol is stopped;
(18): Alice and Bob randomly select one $(c_{11 b}^{j}, c_{12 b}^{j})$ and one $(c_{11 a}^{j}, c_{12 a}^{j})$ from the remaining $m / 2$ groups $(c_{11 b}^{i}, c_{12 b}^{i})$ and $(c_{11 a}^{i}, c_{12 a}^{i})$, respectively. At the same time, Alice and Bob each pick random numbers ($a, q_{3}$ and $b, q_{4}$, respectively). Alice calculates $c_{b 1} = a (c_{12 b}^{j} - c_{11 b}^{j} - z_{1} + K_{2}) = a (z_{2} - z_{1}) + a b G$, $J_{1} = q_{3} G$, $\lambda_{b 1} = q_{3} K_{2}$, and Bob calculates $c_{a 1} = b (c_{12 a}^{j} - c_{11 a}^{j} - z_{2} + K_{1}) = b (z_{1} - z_{2}) + a b G$, $J_{2} = q_{4} G$, $\lambda_{a 1} = q_{4} K_{1}$. Then, $c_{b 1} + J_{1}$ and $c_{a 1} + J_{2}$ are exchanged between Alice and Bob;
(19): After both parties receive messages from the other, Alice calculates $\beta_{a 1} = k_{1} (c_{a 1} + J_{2})$ and $m_{a 1} = k_{1} c_{a 1}$, Bob calculates $\beta_{b 1} = k_{2} (c_{b 1} + J_{1})$ and $m_{b 1} = k_{2} c_{b 1}$, and each sends the results to the other party;
(20): To prove through the zero-knowledge-proof method that the $m_{b 1}$ sent by Bob is correct, Alice judges whether $m_{b 1} = \beta_{b 1} - \lambda_{b 1}$ holds. Likewise, to verify the correctness of the $m_{a 1}$ sent by Alice, Bob judges whether $m_{a 1} = \beta_{a 1} - \lambda_{a 1}$ holds. If either party’s equation does not hold, the protocol terminates;
(21): Alice can obtain $k_{2} a (z_{2} - z_{1})$ by calculating $m_{b 1} - a q_{2}$. If $k_{2} a (z_{2} - z_{1}) = 0$, then $z_{1} = z_{2}$. Bob can obtain $k_{1} b (z_{1} - z_{2})$ by calculating $m_{a 1} - b q_{1}$. If $k_{1} b (z_{1} - z_{2}) = 0$, then $z_{1} = z_{2}$. The equality $z_{1} = z_{2}$ means that both parties’ results are correct; otherwise, the protocol is immediately stopped;
(22): Finally, Alice and Bob decode the x-coordinates of points $z_{1}$ and $z_{2}$, respectively, to obtain $z_{3}$ and $z_{4}$. If $z_{3} < N / 2$, then Alice sets $L_{t} (x, y) = 1$; otherwise, she sets $L_{t} (x, y) = 0$, and $L_{t} (x, y)$ is finally output. Similarly, if $z_{4} < N / 2$, then Bob sets $L_{t} (x, y) = 1$; otherwise, he sets $L_{t} (x, y) = 0$, and $L_{t} (x, y)$ is finally output.

The protocol ends.
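The checks in steps (6) to (9) rest on a few identities that follow directly from the definitions in steps (4) to (7) and from $K_{2} = k_{2} G$, $q_{2} = b K_{2}$. They are written out here as a reading aid for the unprimed quantities on Alice’s side; the primed quantities and Bob’s side are symmetric. The final implication assumes that $G$ has prime order and that $k_{2} a$ is not a multiple of that order, which the text does not state explicitly.

```latex
\begin{aligned}
c_{2b}^{j} - c_{1b}^{j} &= (Q_{1i} + f_{1j} Q_{1i} + bG) - (f_{1j} Q_{1i} + K_{2}) = Q_{1i} + bG - K_{2},\\
c_{b} &= a\,(c_{2b}^{j} - c_{1b}^{j} - W_{1i} + K_{2}) = a\,(Q_{1i} - W_{1i}) + abG,\\
\beta_{b} - \lambda_{b} &= k_{2}\,(c_{b} + q_{1}^{*} G) - q_{1}^{*} K_{2} = k_{2} c_{b} = m_{b},\\
m_{b} - a q_{2} &= k_{2} a\,(Q_{1i} - W_{1i}) + a b k_{2} G - a b K_{2} = k_{2} a\,(Q_{1i} - W_{1i}),\\
m_{b} - a q_{2} = 0 &\iff Q_{1i} = W_{1i}.
\end{aligned}
```

The same pattern, with $z_{1}, z_{2}$ in place of $W_{1 i}, Q_{1 i}$ and $q_{3}, J_{1}, \lambda_{b 1}$ in place of $q_{1}^{*}, O_{1}, \lambda_{b}$, underlies steps (18) to (21).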

4.2. Correctness Analysis

The execution operations of Alice and Bob in Protocol 2 are identical, and so only Alice’s execution process is analyzed.

(1): In the protocol preparation stage, Alice converts vector $x$ into vectors $u$ and $- u^{'}$ by calculating $u_{i} = 2 x_{i}$ and $- u_{i}^{'} = - (2 x_{i} + 1)$ . Therefore, Alice does not disclose any information about the privacy vector ( $x$ ) during the protocol execution phase;
(2): In steps (5) and (6) of the protocol, Alice verifies whether there are malicious adversaries in the protocol via the cut-and-choose method;
(3): In step (8) of the protocol, Alice uses zero-knowledge proof to verify that $m_{b}$ and $m_{b}^{'}$ sent by Bob are correct (that is, to determine whether $m_{b} = β_{b} - λ_{b}$ and $m_{b}^{'} = β_{b}^{'} - λ_{b}^{'}$ are correct, respectively);
(4): The possible malicious behavior of Alice in the first round (from step (1) to step (10)) is that the random numbers $p_{1 s}$ and $p_{2 s}$ selected by Alice in step (4) do not meet the requirements, are not detected in the step (5) verification, and happen to be chosen by Bob in step (6), which leads Bob to obtain a faulty result. If Alice spoofs in this way, the maximum probability of successful spoofing can be illustrated by taking $m = 10$ as an example. If five groups do not meet the requirements, the probability is $\frac{C_{9}^{5}}{C_{10}^{5}} \times \frac{1}{5} = \frac{1}{10}$. If more than half of the groups do not meet the requirements, the probability of successful spoofing drops to zero because the spoofing is always detected in the verification phase. Therefore, the first round of the protocol is secure;
(5): In steps (17) and (18), Alice uses the cut-and-choose method to verify whether there are malicious adversaries in the protocol;
(6): In step (20) of the protocol, whether the $m_{b 1}$ sent by Bob is correct is verified by Alice using the zero-knowledge-proof method;
(7): The possible malicious behavior of Alice in the second round (from step (11) to step (22)) is that the random numbers ( $p_{i}$ ) selected by Alice in step (16) do not meet the requirements, are not detected in the step (17) verification, and happen to be chosen by Bob in step (18), which leads Bob to obtain a wrong result. The maximum probability that Alice spoofs successfully is the same as in item (4). Thus, the second round of the protocol is secure.
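The cut-and-choose probability used in items (4) and (7) can be explored numerically. The sketch below assumes a simple model that is not spelled out in the text: the verifier opens $m / 2$ of the $m$ groups uniformly at random, then picks one of the unopened groups uniformly at random, and cheating succeeds only if no opened group is non-conforming and the picked group is. The function name `cheat_success` is hypothetical.

```python
from fractions import Fraction
from math import comb

def cheat_success(m, bad):
    """Probability that `bad` non-conforming groups out of m go undetected
    in the m/2 opened groups and one of them is then selected."""
    half = m // 2
    # All opened groups must come from the m - bad conforming ones.
    undetected = Fraction(comb(m - bad, half), comb(m, half))
    # One of the bad groups must be picked among the half left unopened.
    picked = Fraction(bad, half)
    return undetected * picked

for bad in range(1, 11):
    print(bad, cheat_success(10, bad))
```

Under these assumptions and with $m = 10$, the expression $\frac{C_{9}^{5}}{C_{10}^{5}} \times \frac{1}{5}$ printed in item (4) is the value this model gives for a single non-conforming group, which is also the largest value over all counts, and the probability is zero as soon as more than five groups are non-conforming.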

4.3. Security Proof

The real/ideal-model paradigm is widely used to prove security under the malicious model, and it is used here to prove the security of Protocol 2.

The parties pass two rounds of validation of Protocol 2. In the first round of verification, steps 1–3 are mainly used to calculate $W_{1 i}, W_{2 i}$ and $Q_{1 i}, Q_{2 i}$ confidentially. If Alice and Bob have verified that $W_{1 i} = Q_{1 i}$ and $W_{2 i} = Q_{2 i}$ according to the protocol, then they pass the verification. In the second round of verification, step 15 is mainly used to obtain points $z_{1}$ and $z_{2}$ confidentially. If Alice and Bob verify that $z_{1} = z_{2}$ according to the protocol, then they pass the verification. The $W_{1 i}, W_{2 i}$ and $Q_{1 i}, Q_{2 i}$ in the first step and the $z_{1}$ and $z_{2}$ in the second step will be the input data under the ideal model. If the protocol ends, then the input message sent to the TTP is incorrect. Therefore, input information errors will not be considered in the malicious model. In the proof phase, the problem can be transformed into whether $W_{1 i}$ is equal to $Q_{1 i}$, $W_{2 i}$ is equal to $Q_{2 i}$, and $z_{1}$ is equal to $z_{2}$. Therefore, proving whether $z_{1}$ is equal to $z_{2}$ is required.

Theorem 4. Protocol 2 (denoted as ∏ ) is secure under the malicious model.

See Appendix A for the detailed proof.

5. Performance Analysis

This paper analyzes the performance of the protocols in terms of the computational complexity and the communication complexity (number of communication rounds), and it compares the execution times of the existing protocols with those of Protocol 1 and Protocol 2 through experimental simulation.

5.1. Efficiency Analysis

References [17,18] and Protocol 1 all calculate the SCTS under the semi-honest model. In Reference [17], two sequences with lengths of $n$ are coded into two 0–1 sequences with lengths of $4 n$ and are encrypted via the GM encryption algorithm. This requires $20 n \log N$ modular multiplications, but it can only be used for string matching, with a small scope of application. Reference [18] confidentially determined whether two strings are equal, with an encryption number of $n + m$ and a decryption number of $n - m$, requiring $[3 \log_{2} N (n + 1)] \log N$ modular multiplication operations ($N$ is usually taken as 1024 bits), but its efficiency is relatively low. In contrast, the ECC encryption method is adopted in Protocol 1 of this paper. Assuming that the character length is $n$ (which can be encoded into $n$ vector components), a total of $16 n + 6$ modular multiplication operations are required (other simple arithmetic operations and inversion operations can be ignored), which gives a wider application range and higher efficiency.

Protocol 2 is an SCTS protocol under the malicious model. At present, no relevant protocol under the malicious model has been found. For solving the problem of text similarity, the protocol proposed in Reference [19] is more efficient. It is based on the GM encryption algorithm. Assuming that the character length is $n$, the protocol requires $8 (n + m - 1) l^{'} \log N$ modular multiplications ($l^{'} \geq 2$ is a security parameter, $N = 1024$ bits), but it cannot defend against malicious attacks. Protocol 2 uses the ECC encryption method (the character length is $n$), which requires $32 n + 12$ modular multiplications (other simple arithmetic operations and flip operations are ignored). Not only does it improve the efficiency, but it can also resist attacks from malicious adversaries.
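To get a feel for the stated operation counts, the formulas can be evaluated side by side. The sketch below is only an arithmetic illustration under loudly stated assumptions: $\log N$ is read as the 1024-bit modulus length, the second string length $m$ in the formula of Reference [19] is set equal to $n$, and $l^{'} = 2$. The count for Reference [18] is left out because its bracketing is ambiguous as printed. The function names are hypothetical, and the counts are not directly comparable in cost, since the moduli differ between the GM and ECC settings.

```python
LOG_N = 1024  # assumption: "log N" taken as the bit length of a 1024-bit N

def ref17_mults(n):
    """20 n log N, the count stated for Reference [17]."""
    return 20 * n * LOG_N

def protocol1_mults(n):
    """16 n + 6, the count stated for Protocol 1."""
    return 16 * n + 6

def ref19_mults(n, m, l_prime=2):
    """8 (n + m - 1) l' log N, the count stated for Reference [19]."""
    return 8 * (n + m - 1) * l_prime * LOG_N

def protocol2_mults(n):
    """32 n + 12, the count stated for Protocol 2."""
    return 32 * n + 12

for n in (1, 5, 10):
    # m = n is an illustrative assumption for the Reference [19] formula.
    print(n, ref17_mults(n), protocol1_mults(n),
          ref19_mults(n, n), protocol2_mults(n))
```

For any $n$, the Protocol 2 count is exactly twice the Protocol 1 count, which matches the two parties performing symmetric work.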

For the communication complexity, measured by the number of communication rounds, Protocol 1 and References [17,18] require two rounds, while Protocol 2 and Reference [19] require four rounds. Compared with References [17,18], Protocol 1 in this paper improves the computational efficiency and has a wider application range with the same number of communication rounds. Compared with Reference [19], Protocol 2 in this paper needs the same number of rounds, requires fewer modular multiplications, and can resist attacks from malicious adversaries.

The performances of Protocol 1 and Protocol 2 are compared with those of References [17,18,19], as shown in Table 1.

Both Protocol 1 and Protocol 2 in this paper adopt ECC, which is not only more efficient but also more widely applicable than the GM and Paillier encryption schemes. Protocol 2 is an ECC-based protocol under the malicious model: it can resist attacks from malicious adversaries, and it requires fewer modular multiplications than References [17,18,19]. At the cost of at most two additional communication rounds, Protocol 2 is both more efficient and more secure. Therefore, Protocol 2 is advantageous in terms of both the computational complexity and the application scope.

5.2. Experimental Analysis

To further verify the execution efficiencies of the protocols in this paper, the experiments were run on the Windows 10 64-bit (home version) operating system with an Intel(R) Core(TM) i7-6600 CPU @ 3.30 GHz and 8 GB of memory; the protocols were implemented in Java and run in MyEclipse. The experiments used textual datasets for clinical diagnosis, such as patients’ electronic health records, medical records, and disease databases. These texts were encoded into vectors and combined with the protocol algorithms of this paper to determine the patients’ disease types, such as infectious diseases, cardiovascular diseases, and other types of diseases. The execution times of the protocols were then compared under the premise of achieving the same classification effect (i.e., the shorter the execution time, the higher the efficiency of the protocol).

By comparing the execution times of the protocols through simulation experiments, the execution efficiencies of the protocols can be obtained. First, for each character length $n = 1, 2, 3, \dots, 10$, the protocol was simulated 1000 times, and the average execution time was recorded. Figure 3 shows a comparison of the execution times for different string lengths in Protocol 2. The horizontal coordinate indicates the string length, and the vertical coordinate indicates the execution time. When $n = 10$, the execution time of Protocol 2 is 1000 ms.
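The measurement procedure described above (1000 runs per value of $n$, then the average) can be sketched as follows. This is a minimal Python illustration, not the authors' Java harness: `average_runtime_ms` and `placeholder_protocol` are hypothetical names, and the placeholder workload merely stands in for an actual protocol execution.

```python
import time
from statistics import mean
from typing import Callable, List


def average_runtime_ms(run_once: Callable[[int], None], n: int,
                       trials: int = 1000) -> float:
    """Run `run_once(n)` `trials` times and return the mean wall-clock time in ms."""
    samples: List[float] = []
    for _ in range(trials):
        start = time.perf_counter()
        run_once(n)
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def placeholder_protocol(n: int) -> None:
    """Hypothetical stand-in workload; NOT the paper's Protocol 2."""
    acc = 1
    for i in range(1, 200 * n + 1):
        acc = (acc * (i + 2)) % 1_000_003


if __name__ == "__main__":
    # One averaged measurement per string length, as in the experiment.
    for n in range(1, 11):
        print(n, round(average_runtime_ms(placeholder_protocol, n), 4))
```

Averaging over many runs reduces the influence of scheduling noise on any single measurement, which is presumably why each setting was repeated 1000 times.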

Secondly, with the string length fixed at $n = 5$, 1000 simulation experiments were carried out for each value of the modulus $N$, and the average execution time was recorded. Figure 4 shows a comparison of the execution times for different lengths of the modulus $N$ in Protocol 2. The horizontal coordinate indicates the length of the modulus $N$, and the vertical coordinate indicates the execution time. When $n = 5$ and $N = 1024$ bits, the execution time of Protocol 2 is 712 ms.

Next, Figure 5 compares the execution times of References [17,18,19] with those of Protocol 1 and Protocol 2 in this paper through simulation experiments. The abscissa represents the string length, and the ordinate represents the execution time of the protocol. When the string length is 10, the execution times are as follows: Reference [18]: 1440 ms; Reference [17]: 1200 ms; Reference [19]: 1118 ms; and Protocol 2: 1000 ms. Thus, for the same string length, Protocol 2 has the shortest execution time among these protocols. Figure 6 shows the execution times of References [17,18,19] and of Protocol 1 and Protocol 2 for different lengths of the modulus $N$ (with $n = 5$). The abscissa represents the length of the modulus $N$, and the ordinate represents the execution time of the protocol. When the length of the modulus $N$ is 1024 bits, the execution times are as follows: Reference [18]: 1024 ms; Reference [17]: 932 ms; Reference [19]: 800 ms; and Protocol 2: 712 ms. Thus, for the same modulus length, Protocol 2 again has the shortest execution time among these protocols.
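The reported timings can be turned into relative savings with a few lines of arithmetic. The Python sketch below uses only the execution times quoted above; the helper names are illustrative.

```python
# Execution times (ms) quoted in the text for Figures 5 and 6.
REPORTED_MS = {
    "string length n = 10": {
        "Reference [18]": 1440, "Reference [17]": 1200,
        "Reference [19]": 1118, "Protocol 2": 1000,
    },
    "modulus N = 1024 bits (n = 5)": {
        "Reference [18]": 1024, "Reference [17]": 932,
        "Reference [19]": 800, "Protocol 2": 712,
    },
}


def reduction_percent(baseline_ms: float, protocol_ms: float) -> float:
    """Relative time saved with respect to the baseline, in percent."""
    return 100.0 * (baseline_ms - protocol_ms) / baseline_ms


def reductions(times: dict, ours: str = "Protocol 2") -> dict:
    """Time saved by `ours` relative to every other scheme in `times`."""
    return {name: reduction_percent(t, times[ours])
            for name, t in times.items() if name != ours}


if __name__ == "__main__":
    for setting, times in REPORTED_MS.items():
        print(setting)
        for name, saved in reductions(times).items():
            print(f"  vs {name}: {saved:.1f}% less time")
```

With these figures, Protocol 2 takes roughly 10.6% to 30.6% less time than the reference schemes at $n = 10$, and roughly 11.0% to 30.5% less at $N = 1024$ bits.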

To compare the efficiencies of the protocols more comprehensively, the delay times during the experiment should also be considered. Figure 7 shows the delay times of Protocol 2 for different string lengths. Figure 8 shows the delay times of Protocol 2 for different lengths of the modulus $N$ (with $n = 5$). For example, when $n = 1$, Figure 5 shows that the execution time of Protocol 2 is 100 ms, and Figure 7 shows that its delay time is 0.6 ms; thus, the total time consumed in executing Protocol 2 is $100 + 0.6 = 100.6$ ms.

Figure 5 and Figure 6 show that the execution times of all the protocols increase as the character length or the modulus length increases. Compared with References [17,18,19], Protocol 1 and Protocol 2 have the shortest execution times and are therefore the most efficient. At the same time, Protocol 2 is not only efficient but also resistant to attacks by malicious adversaries, which greatly improves its security. Therefore, Protocol 2 has greater practical value.

6. Conclusions

Text similarity computation in deep learning and natural language processing has a wide range of application scenarios and important application value in intelligent recommendation systems, information retrieval, data mining, etc. The existing SCTS protocols are inefficient and cannot resist malicious adversaries. Therefore, based on the efficient ECC encryption algorithm, this paper proposes an SCTS protocol under the semi-honest model. To counter the malicious behaviors that participants may commit when running the semi-honest protocol, an SCTS protocol under the malicious model is designed using the cut-and-choose and zero-knowledge-proof methods. The security of the protocol is proven using the real/ideal-model paradigm. Compared with existing schemes, the proposed protocol is more efficient and can resist malicious attacks, which gives it practical value.

In future work, we will extend the number of participants from two parties to multiple parties to study the SCTS, and we will use the protocol in this paper as a basic tool to solve more secure multi-party computation problems, such as string-pattern matching, interval determination, etc., so that it can play more roles in more fields in the future.

Author Contributions

Conceptualization, X.L. (Xin Liu) and R.W.; methodology, R.W. and X.L. (Xiaomeng Liu); investigation, X.L. (Xin Liu); software, D.L. and G.X.; experimental simulation, D.L.; security proof, D.L.; English grammar modification, N.X.; validation, N.X. and X.C.; writing – review and editing, X.L. (Xin Liu) and N.X.; writing – original draft, X.L. (Xin Liu) and X.C.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China: Big Data Analysis based on Software Defined Networking Architecture, grant numbers 62177019 and F0701; NSFC, grant numbers 62271070, 72293583, and 61962009; the Inner Mongolia Natural Science Foundation, grant number 2021MS06006; the 2023 Inner Mongolia Young Science and Technology Talents Support Project, grant number NJYT23106; the 2022 Fund Project of Central Government Guiding Local Science and Technology Development, grant number 2022ZY0024; the 2022 Fundamental Research Funds for the Inner Mongolia University of Science and Technology 2022-101; the Inner Mongolia Postgraduate Scientific Research Innovation Project, grant number 2023; the 2022 “Western Light” Talent Training Program “Western Young Scholars” Project, grant number 22040601; the 14th Five-Year Plan of Education and Science of Inner Mongolia, grant number NGJGH2021167; the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications), grant number SKLNST-2023-1-08; the 2022 Inner Mongolia Postgraduate Education and Teaching Reform Project: JGSZ2022037; the 2022 Ministry of Education Central and Western China Young Backbone Teachers and Domestic Visiting Scholars Program, grant number 2022015; the Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory Open Project Fund, grant number IMDBD202020; the Baotou Kundulun District Science and Technology Plan Project, grant number YF2020013; the Inner Mongolia Science and Technology Major Project, grant number 2019ZD025; Project JCKY2021208B036, and the Fundamental Research Funds for Beijing Municipal Commission of Education, grant number 220201.

Data Availability Statement

The authors agreed to include data in the article to support the findings.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Appendix A

Theorem A1.

Protocol 2 (denoted as ∏) is secure under the malicious model.

Proof:

Suppose that when the protocol $\prod$ is executed, both parties take an acceptable policy pair $\bar{A} = (A_{1}, A_{2})$. To prove that the protocol is secure under the malicious model, it must be shown that $\bar{A} = (A_{1}, A_{2})$ can be converted into a policy pair $\bar{B} = (B_{1}, B_{2})$ under the ideal model. Security only needs to be guaranteed when at least one of the two parties is honest. Thus, the following two cases are considered:

(1) $A_{1}$ is honest and $A_{2}$ is dishonest (that is, Alice is honest and Bob is dishonest). Then

$$\mathrm{REAL}_{\bar{A}} (z_{1}, z_{2}) = \{ F (z_{1}, A_{2} (z_{2})), A_{2} ((c_{11 a}^{i}, c_{12 a}^{i}), m_{a 1}, S) \}, \tag{A1}$$

where $S$ is the data generated by $A_{2}$ in the zero-knowledge-proof process, $i = 1, \dots, m$, and $F$ is the function executed by protocol $\prod$. To demonstrate the security of the protocol, we only need to find a policy pair $\bar{B} = (B_{1}, B_{2})$ under the ideal model whose output is computationally indistinguishable from $\mathrm{REAL}_{\bar{A}} (z_{1}, z_{2})$.

Because $A_{1}$ is honest (Alice is honest), $B_{1}$ will send the correct $z_{1}$ to the TTP. $B_{1}$ will also allow the TTP to send the result to $B_{2}$ after receiving it, so there is no case in which $B_{2}$ cannot receive the result. What $B_{2}$ sends to the TTP depends on $A_{2}$’s actual operation strategy ($B_{2}$ needs to call $A_{2}$ to achieve this). In the ideal model, $B_{2}$ gives $z_{2}$ to $A_{2}$, and $A_{2}$ returns to $B_{2}$ the input $A_{2} (z_{2})$ that it uses in the actual situation. $B_{2}$ sends $A_{2} (z_{2})$ to the TTP and obtains $F (z_{1}, A_{2} (z_{2}))$ from the TTP. $B_{2}$ then uses the $F (z_{1}, A_{2} (z_{2}))$ sent to it by the TTP to construct $\mathrm{view}_{B_{2}} (z_{1}, A_{2} (z_{2}))$, which must be indistinguishable from the $\mathrm{view}_{A_{2}} (z_{1}, A_{2} (z_{2}))$ obtained by $A_{2}$ in the actual situation, so that the output of $B_{2}$ is the same as that of $A_{2}$ in the actual situation. Specifically, $B_{2}$ selects a $z_{1}'$ such that $F (z_{1}', A_{2} (z_{2})) = F (z_{1}, A_{2} (z_{2}))$, then executes Protocol 2 with $z_{1}'$ to obtain $m_{a 1}'$, $c_{11 a}'$, and $c_{12 a}'$, and denotes the sequence received in the zero-knowledge proof as $S'$:

$$\mathrm{IDEAL}_{\bar{B}} (z_{1}, z_{2}) = \{ F (z_{1}, A_{2} (z_{2})), A_{2} ((c_{11 a}^{i'}, c_{12 a}^{i'}), m_{a 1}', S') \}. \tag{A2}$$

Because the ciphertexts in the real and ideal executions are produced by the same probabilistic algorithm, $c_{11 a}^{i'} \overset{c}{\equiv} c_{11 a}^{i}$, $c_{12 a}^{i'} \overset{c}{\equiv} c_{12 a}^{i}$, and $S' \overset{c}{\equiv} S$; thus, $\{ \mathrm{REAL}_{\bar{A}} (z_{1}, z_{2}) \} \overset{c}{\equiv} \{ \mathrm{IDEAL}_{\bar{B}} (z_{1}, z_{2}) \}$.

(2) $A_{2}$ is honest and $A_{1}$ is dishonest (that is, Bob is honest and Alice is dishonest). There are two situations:

(2.1) In the actual situation, $A_{1}$ passes the zero-knowledge-proof verification and publishes the results:

$$\mathrm{REAL}_{\bar{A}} (z_{1}, z_{2}) = \{ A_{1} ((c_{11 b}^{i}, c_{12 b}^{i}), m_{b 1}, S), F (A_{1} (z_{1}), z_{2}) \}. \tag{A3}$$

(2.2) In the actual situation, $A_{1}$ does not perform the zero-knowledge-proof verification or does not announce the results:

$$\mathrm{REAL}_{\bar{A}} (z_{1}, z_{2}) = \{ A_{1} ((c_{11 b}^{i}, c_{12 b}^{i}), m_{b 1}, S), \bot \}. \tag{A4}$$

Because $A_{2}$ is honest, $B_{2}$ sends the correct $z_{2}$ to the TTP (there is no protocol termination during this period), while what $B_{1}$ sends to the TTP depends on $A_{1}$’s actual operation strategy ($B_{1}$ needs to call $A_{1}$ to achieve this). In the ideal model, $B_{1}$ gives $z_{1}$ to $A_{1}$, and $A_{1}$ returns to $B_{1}$ the input $A_{1} (z_{1})$ that it uses in the actual situation. $B_{1}$ sends $A_{1} (z_{1})$ to the TTP and obtains $F (A_{1} (z_{1}), z_{2})$ from the TTP. If, in the actual situation, $A_{1}$ does not perform the zero-knowledge proof or does not announce the results, then in the ideal model the TTP outputs $\bot$ to $B_{2}$. $B_{1}$ uses the $F (A_{1} (z_{1}), z_{2})$ sent to it by the TTP to construct $\mathrm{view}_{B_{1}} (A_{1} (z_{1}), z_{2})$, which must be indistinguishable from the $\mathrm{view}_{A_{1}} (A_{1} (z_{1}), z_{2})$ obtained by $A_{1}$ in the actual situation, so that the output of $B_{1}$ is the same as that of $A_{1}$ in the actual situation. That is, $B_{1}$ selects a $z_{2}'$ such that $F (A_{1} (z_{1}), z_{2}') = F (A_{1} (z_{1}), z_{2})$, then executes Protocol 2 with $z_{2}'$ to obtain $m_{b 1}'$, $c_{11 b}'$, and $c_{12 b}'$, and denotes the sequence received in the zero-knowledge proof as $S'$.

In the ideal model, when $B_{1}$ does not publish the results to $B_{2}$ through the TTP,

$$\mathrm{IDEAL}_{\bar{B}} (z_{1}, z_{2}) = \{ A_{1} ((c_{11 b}^{i'}, c_{12 b}^{i'}), m_{b 1}', S'), \bot \}. \tag{A5}$$

In the ideal model, when $B_{1}$ publishes the results to $B_{2}$ through the TTP,

$$\mathrm{IDEAL}_{\bar{B}} (z_{1}, z_{2}) = \{ A_{1} ((c_{11 b}^{i'}, c_{12 b}^{i'}), m_{b 1}', S'), F (A_{1} (z_{1}), z_{2}) \}. \tag{A6}$$

Because the ciphertexts in the real and ideal executions are produced by the same probabilistic algorithm, $c_{11 b}^{i'} \overset{c}{\equiv} c_{11 b}^{i}$, $c_{12 b}^{i'} \overset{c}{\equiv} c_{12 b}^{i}$, and $S' \overset{c}{\equiv} S$; thus, $\{ \mathrm{IDEAL}_{\bar{B}} (z_{1}, z_{2}) \} \overset{c}{\equiv} \{ \mathrm{REAL}_{\bar{A}} (z_{1}, z_{2}) \}$.

To sum up, Protocol 2 under the malicious model is secure. □
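The ideal-model execution used in case (2) of the proof can be illustrated with a toy Python sketch. It is only a schematic under stated assumptions: the function $F$ is replaced by a hypothetical dot product of integer vectors, `b1_input_strategy` stands for the possibly dishonest choice $A_{1} (z_{1})$, and `None` plays the role of $\bot$. None of these names come from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[int]
ABORT = None  # plays the role of the symbol ⊥ in Equations (A4) and (A5)


def dot_product(z1: Vector, z2: Vector) -> int:
    """Hypothetical stand-in for the function F computed by the protocol."""
    return sum(a * b for a, b in zip(z1, z2))


def ideal_model(z1: Vector, z2: Vector,
                f: Callable[[Vector, Vector], int],
                b1_input_strategy: Callable[[Vector], Vector] = lambda z: z,
                b1_releases_output: bool = True) -> Tuple[int, Optional[int]]:
    """Toy ideal execution with a trusted third party (TTP).

    B1 may substitute its input (modelling A1(z1)); the honest B2 submits z2.
    The TTP computes F and hands the result to B1 first. B2 receives the
    result only if B1 lets the TTP forward it; otherwise B2 receives ABORT.
    """
    submitted_z1 = b1_input_strategy(z1)   # corresponds to A1(z1)
    result = f(submitted_z1, z2)           # TTP computes F(A1(z1), z2)
    output_b1 = result
    output_b2 = result if b1_releases_output else ABORT
    return output_b1, output_b2


if __name__ == "__main__":
    print(ideal_model([1, 2, 3], [4, 5, 6], dot_product))
    print(ideal_model([1, 2, 3], [4, 5, 6], dot_product,
                      b1_releases_output=False))
```

The two calls correspond to the two situations in case (2): the result is released to both parties, or the honest party receives the abort symbol. In both, the dishonest party can influence only its own input and whether the output is released, which is the behavior the real protocol is shown to be indistinguishable from.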

References

1. Shahamiri, S.R. Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 852–861.
2. Lou, R.; Wang, W.; Li, X.; Zheng, Y.C.; Lv, Z.H. Prediction of Ocean Wave Height Suitable for Ship Autopilot. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25557–25566.
3. Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing 2022, 470, 443–456.
4. Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 2023, 82, 3713–3744.
5. Kumar, P.; Kumar, R.; Srivastava, G.; Gupta, G.P.; Tripathi, R.; Gadekallu, T.R.; Xiong, N.N. PPSF: A privacy-preserving and secure framework using blockchain-based machine-learning for IoT-driven smart cities. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2326–2341.
6. Yao, Y.; Xiong, N.; Park, J.H.; Ma, L.; Liu, J. Privacy-preserving max/min query in two-tiered wireless sensor networks. Comput. Math. Appl. 2013, 65, 1318–1325.
7. Huang, S.; Zeng, Z.; Ota, K.; Dong, M.; Wang, T.; Xiong, N. An intelligent collaboration trust interconnections system for mobile information control in ubiquitous 5G networks. IEEE Trans. Netw. Sci. Eng. 2020, 8, 347–365.
8. Fu, A.; Zhang, X.L.; Xiong, N.; Gao, Y.S.; Wang, H.Q.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2020, 18, 3316–3326.
9. Chen, Y.W.; Zhou, L.D.; Pei, S.W.; Yu, Z.W.; Chen, Y.; Liu, X.; Du, J.X.; Xiong, N. KNN-BLOCK DBSCAN: Fast clustering for large-scale data. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3939–3953.
10. Zheng, R.; Wang, Q.; Lin, Z.; Jiang, Z.W.; Fu, J.M.; Peng, G.J. Cryptocurrency malware detection in real-world environment: Based on multi-results stacking learning. Appl. Soft Comput. 2022, 124, 109044.
11. Yao, A.C. Protocols for secure computation. In Proceedings of the 23rd Annual Symposium on Foundation of Computer Science, Chicago, IL, USA, 3–5 November 1982; pp. 160–164.
12. Goldreich, O. The Fundamental of Cryptography: Basic Application; Cambridge University Press: London, UK, 2004.
13. Cramer, R.; Damgård, I.B.; Nielsen, J.B. Secure Multiparty Computation; Cambridge University Press: London, UK, 2015.
14. Tran, A.T.; Luong, T.D.; Karnjana, J.; Huynh, V.N. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation. Neurocomputing 2021, 422, 245–262.
15. Zhang, E.; Li, H.; Huang, Y.; Hong, L.; Zhao, L.; Ji, C. Practical multi-party private collaborative k-means clustering. Neurocomputing 2022, 467, 256–265.
16. Braun, L.; Demmler, D.; Schneider, T.; Tkachenko, O. MOTION – A Framework for Mixed-Protocol Multi-Party Computation. ACM Trans. Priv. Secur. 2022, 25, 1–35.
17. Ma, M.; Xu, Y.; Liu, Z. Privacy preserving Hamming distance computing problem of DNA sequences. J. Comput. Appl. 2019, 39, 2636.
18. Zhang, K.X.; Yang, C.; Li, S.D. Confidential calculation of string matching. J. Cryptol. 2022, 9, 619–632.
19. Kang, J.; Li, S.D.; Yang, X.Y. Secure Multiparty Computation for String Pattern Matching. J. Cryptol. 2017, 4, 241–252.
20. Fiori, F.J.; Pakalén, W.; Tarhio, J. Approximate string matching with SIMD. Comput. J. 2022, 65, 1472–1488.
21. Xu, L.; Wei, X.; Cai, G.; Li, Y.; Wang, H. SWMQ: Secure wildcard pattern matching with query. Int. J. Intell. Syst. 2022, 37, 6262–6282.
22. Wang, Y.N.; Dou, J.W.; Ge, X. Secure vector computation based on threshold. J. Cryptol. 2020, 7, 750–762.
23. Guan, Z.; Zhou, X.; Liu, P.; Wu, L.F.; Yang, W.T. A blockchain based dual side privacy preserving multiparty computation scheme for edge enabled smart grid. IEEE Internet Things J. 2021, 9, 14287–14299.
24. Li, S.D.; Wang, W.L.; Du, R.M. Protocol for millionaires’ problem in malicious models. Sci. Sin. Inf. 2021, 51, 75–78. (In Chinese)

Figure 1. Elliptic-curve operation.

Figure 2. Possible malicious behavior in Protocol 1.

Figure 3. Comparison of execution times for different string lengths in Protocol 2.

Figure 4. Comparison of execution times for different lengths of the modulus N in Protocol 2.

Figure 5. Comparison of execution times for different schemes (Reference [17]: Ma, M. 2019; Reference [18]: Zhang, K.X. 2022; Reference [19]: Kang, J. 2017).

Figure 6. Comparison of execution times for different schemes (Reference [17]: Ma, M. 2019; Reference [18]: Zhang, K.X. 2022; Reference [19]: Kang, J. 2017).

Figure 7. Comparison of delay times for different string lengths in Protocol 2.

Figure 8. Comparison of delay times for different lengths of the modulus N in Protocol 2.

Table 1. Protocol comparison.

Protocol	Calculation Complexity (Modular Multiplication)	Communication Rounds	Scope of Application	Resistance to Malicious Behaviors
Protocol 1	$16 n + 6$	2 rounds	Rational number, string	No
Reference [17]	$20 n \log N$	2 rounds	String	No
Reference [18]	$[3 \log_{2} N (n + 1)] \log N$	2 rounds	String	No
Protocol 2	$32 n + 12$	4 rounds	Rational number, string	Yes
Reference [19]	$8 (n + m - 1) l^{'} \log N$	4 rounds	String	No

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, X.; Wang, R.; Luo, D.; Xu, G.; Chen, X.; Xiong, N.; Liu, X. Secure Computation Protocol of Text Similarity against Malicious Attacks for Text Classification in Deep-Learning Technology. Electronics 2023, 12, 3491. https://doi.org/10.3390/electronics12163491
