Article

Secure Computation Protocol of Text Similarity against Malicious Attacks for Text Classification in Deep-Learning Technology

1
School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China
2
State Key Laboratory of Network and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
3
Department of Computer, Tianjin Ren’ai College, Tianjin 301636, China
4
College of Information, North China University of Technology, Beijing 100144, China
5
Department of Computer Science and Mathematics, Sul Ross State University, Alpine, TX 79830, USA
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(16), 3491; https://doi.org/10.3390/electronics12163491
Submission received: 18 July 2023 / Revised: 11 August 2023 / Accepted: 15 August 2023 / Published: 17 August 2023

Abstract

With the development of deep learning, the demand for similarity matching between texts in text classification is becoming increasingly high. How to match texts quickly under the premise of keeping private information secure has become a research hotspot. However, most existing protocols currently have full set limitations, and the applicability of these methods is limited when the data size is large and scattered. Therefore, this paper applies the secure vector calculation method for text similarity matching in the case of data without any complete set constraints, and it designs a secure computation protocol of text similarity (SCTS) based on the semi-honest model. At the same time, elliptic-curve cryptography technology is used to greatly improve the execution efficiency of the protocol. In addition, we also analyzed the possibility of the malicious behavior of participants in the semi-honest-model protocol, and further designed an SCTS protocol suitable for the malicious model using the cut-and-choose and zero-knowledge-proof methods. By proposing a security mechanism, this protocol aims to provide a reliable and secure computing solution that can effectively prevent malicious attacks and interference. Finally, through the analysis of the efficiencies of the existing protocols, the efficiencies of the protocols under the malicious model are further verified, and the practical value for text classification in deep learning is demonstrated.

1. Introduction

In recent years, with the development of artificial intelligence, deep learning has become a hot research topic and is widely used in speech recognition [1], automatic driving [2], natural language processing [3], and other fields. In these applications, information retrieval, semantic understanding, and text classification all require a secure computation protocol of text similarity (SCTS). In text classification, for example, a text is assigned to a category by comparing it with texts of known categories: when the similarity score between the text to be classified and the text of a known category is above a set threshold, the text is deemed to belong to that category. This approach is often used in tasks such as sentiment analysis, spam filtering, medical classification, and news classification. Commonly used techniques for text classification include convolutional neural networks, recurrent neural networks, attention mechanisms, and transformer models. Depending on the task and dataset, these approaches can be chosen flexibly and combined with pre-trained models and transfer learning to improve the classification performance. In general, deep learning has achieved good results in text-similarity-matching tasks [4,5,6], and vector-based approaches from natural language processing are an important way to solve the text-similarity problem [7,8,9,10].
However, in the era of big data, text information is highly susceptible to leakage, and the SCTS faces great challenges. The traditional method of calculating text similarity is to calculate the cosine similarity between two text vectors, but it is time-consuming, is mostly suitable for short texts, and easily leaks information. This paper departs from the traditional method by increasing the length of the matching string and placing no full-set restriction on the range of the string, which greatly improves applicability and security. Secure multi-party computation (MPC) is therefore the technology needed to achieve the secure computation of text similarity.
MPC was first introduced by Professor Yao [11], and Goldreich [12] and Cramer et al. [13] further studied MPC algorithms, including secure data mining [14], confidential computing sets and geometric problems [15], and secure vector operations [16]. These studies provide conditions for the development of MPC on the basis of protecting the privacy of their respective information.
SCTS can be abstracted as the secure computation of vectors. For example, the text data are represented as the vector $x = (x_1, \ldots, x_n)$, and the user's text data are represented as the vector $y = (y_1, \ldots, y_n)$. Given a threshold $t$, it is only necessary to judge whether the number of identical components of the two vectors reaches $t$. If it does, the two texts match; otherwise, they do not. By setting the threshold $t$, the user controls the degree of similarity required for a match.
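The matching rule above can be sketched in the clear, before any cryptography is applied. This is only an illustration of the function being computed; the function name `matches` is hypothetical, and the sketch offers no privacy at all.

```python
def matches(x, y, t):
    """Return 1 if x and y agree in at least t positions, else 0.

    Plaintext illustration only: both vectors are visible to whoever runs this.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    equal = sum(1 for a, b in zip(x, y) if a == b)  # number of identical components
    return 1 if equal >= t else 0


print(matches([2, 6, 8, 9], [2, 6, 7, 8], t=2))  # two equal components, prints 1
```

The secure protocols in the rest of the paper aim to compute this same output without either party revealing its vector.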
There are few existing SCTS protocols, and those that exist are implemented only in the semi-honest model. String matching can be seen as a special case of SCTS. Reference [17] used GM encryption to compute the similarity of two strings, but the algorithm has a long running time and is inefficient. Reference [18] designed a string-matching protocol for an application scenario similar to that of this paper; it implements exact matching and can classify text accurately, but it is less secure and has a full set restriction. Reference [19] converted the string-matching problem into a set membership determination problem and used the BMH algorithm in the confidential matching process, which greatly reduces the computational complexity, but the protocol is not secure enough during execution and cannot resist the attacks of malicious adversaries. Reference [20] solved the problem of approximate string matching, and Reference [21] proposed a wildcard-pattern-matching protocol with a query function. However, the number of characters must be kept within a certain range, which is a significant limitation and makes the protocol prone to information leakage. Reference [22] proposed the problem of vector secrecy computation, which can achieve the text similarity computation of two vectors by setting the corresponding thresholds, but the protocol is based on the semi-honest model and is less efficient and less secure.
To address the above issues, this article designs an SCTS protocol for malicious adversaries, which has important guiding significance for further promoting the research and application of the SCTS. The main contributions are as follows:
(1)
The text is converted into a vector by an encoding method, and the number of equal elements in the secure vector computation is compared with the threshold value to judge the similarity of the two texts;
(2)
The SCTS protocol under the semi-honest model is designed using elliptic-curve cryptography without set range limits. Compared with other encryption algorithms, elliptic-curve cryptography has obvious advantages, such as high security, a small key size, fast encryption and decryption, low storage requirements, and high adaptability;
(3)
For the possible malicious behaviors of malicious adversaries in the SCTS semi-honest protocol, based on the cut-and-choose method and zero-knowledge proof, an SCTS protocol under the malicious model is designed. Whether the protocol is secure is verified via the real/ideal-model paradigm, and the efficiency of the protocol is analyzed.
The paper is structured as follows: Section 2 introduces the encoding method and the basic theorem, and then Section 3 describes the solution to the problem and designs the SCTS protocol under the semi-honest model. Section 4 then improves on Section 3 by designing the SCTS protocol under the malicious model using cryptographic tools. Section 5 provides a comparative analysis of the performances of existing protocols. Finally, Section 6 summarizes the work accomplished in this paper and shows future research directions.

2. Related Work

2.1. Elliptic-Curve Cryptography

Elliptic-curve cryptography (ECC) [23] is an asymmetric public-key cryptosystem defined over a finite field. Take the elliptic curve $y^2 = x^3 - x$ as an example: the straight line through points P and Q meets the curve at a third point R′, and the line through R′ perpendicular to the x-axis meets the curve again at R; then P + Q = R. R′ is the additive inverse of R, and R′ and R are symmetric about the x-axis. Please refer to Figure 1 for details.
When P = Q, the straight line through P is the tangent to the curve at P, and it meets the curve at R′; then P + P = R, which is recorded as 2P = R. Adding P to itself k times is recorded as k·P; for example, P + P + P + P = P + 3P = 4P. The security of ECC rests on the discrete logarithm problem on the elliptic curve: given K = k·P, it is easy to compute K when k and P are known, but it is computationally infeasible to recover k when K and P are known.
The ECC methodology is as follows:
First, select an elliptic curve (Ep(a,b)), select the base point (G) and the private key (k), and then calculate kG = K to obtain the public key (K).
(1)
Encryption: Encode a plaintext (m) to a point (M) on the elliptic curve (Ep(a,b)), select a random positive integer (r), and calculate C1 = M + rK, C2 = rG;
(2)
Decryption: Recover M through the formula C1 − kC2 = M + rK − k(rG) = M. To obtain the plaintext information (m), the point M on the elliptic curve must then be decoded;
(3)
Addition homomorphism: the following properties exist: E(M1) + E(M2) = E(M1 + M2).
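The key generation, encryption, decryption, and additive homomorphism listed above can be sketched on a toy curve. The following is a minimal illustration, not the authors' implementation: it assumes the small textbook curve $y^2 = x^3 + 2x + 2$ over $F_{17}$ with base point (5, 1) of order 19, and it assumes that a plaintext m is encoded as the point mG (the paper does not specify its encoding). All function names are hypothetical, and the parameters are far too small for real use.

```python
import random

P_MOD, A = 17, 2        # toy curve y^2 = x^3 + 2x + 2 over F_17 (assumed, illustrative only)
G, ORDER = (5, 1), 19   # base point and its prime order on this toy curve
INF = None              # point at infinity (group identity)

def add(p, q):
    """Add two curve points with the chord-and-tangent rule."""
    if p is INF:
        return q
    if q is INF:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                                   # q is the additive inverse of p
    if p == q:                                       # tangent slope for doubling
        s = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:                                            # chord slope
        s = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def neg(p):
    return INF if p is INF else (p[0], (-p[1]) % P_MOD)

def mul(k, p):
    """Compute k*p by double-and-add; k is reduced modulo the group order."""
    result, k = INF, k % ORDER
    while k:
        if k & 1:
            result = add(result, p)
        p, k = add(p, p), k >> 1
    return result

def keygen():
    k = random.randrange(1, ORDER)                   # private key k
    return k, mul(k, G)                              # public key K = kG

def encrypt(m, K):
    r = random.randrange(1, ORDER)
    M = mul(m, G)                                    # assumed encoding: m -> mG
    return (add(M, mul(r, K)), mul(r, G))            # (C1, C2) = (M + rK, rG)

def decrypt(c, k):
    M = add(c[0], neg(mul(k, c[1])))                 # M = C1 - k*C2
    for m in range(ORDER):                           # decode by brute-force search
        if mul(m, G) == M:
            return m

def add_ciphertexts(c, d):
    """Componentwise point addition: E(m1) + E(m2) = E(m1 + m2)."""
    return (add(c[0], d[0]), add(c[1], d[1]))

k, K = keygen()
print(decrypt(add_ciphertexts(encrypt(3, K), encrypt(4, K)), k))  # prints 7
```

With the mG encoding, adding two ciphertexts adds the plaintexts modulo the group order, which is the property the protocols below rely on. Decoding by exhaustive search is only practical because the toy plaintext space is tiny.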
Theorem 1.
Suppose C = E(h) is the ciphertext of 0 or 1 encrypted by the elliptic curve, and −C is its additive inverse. According to the addition homomorphism of ECC, T(C) = E(1) + (−C) = E(1 − h). If C is the ciphertext of 0, then T(C) is the ciphertext of 1, and if C is the ciphertext of 1, then T(C) is the ciphertext of 0; that is, ciphertext T(C) flips the plaintext 1 or 0 corresponding to C; thus, T(C) = E(1) + (−C) is called a flip operation.
Theorem 2.
For any integers a and b, the following conclusions hold:
(1)
a > b if and only if 2a > 2b + 1;
(2)
a ≥ b if and only if 2a + 1 > 2b.
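Theorem 2 is what later lets a strict comparison on doubled values stand in for a comparison on the original integers. As a quick sanity check (not a proof), both equivalences can be tested exhaustively over a small range; the function name and the range bounds are arbitrary choices for illustration.

```python
from itertools import product

def theorem2_holds(lo=-20, hi=20):
    """Check both equivalences of Theorem 2 for all integer pairs in [lo, hi]."""
    for a, b in product(range(lo, hi + 1), repeat=2):
        if (a > b) != (2 * a > 2 * b + 1):
            return False
        if (a >= b) != (2 * a + 1 > 2 * b):
            return False
    return True

print(theorem2_holds())  # prints True
```

The equivalences hold because 2a and 2b + 1 have different parities, so they can never be equal.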
Theorem 3.
In the ECC encryption scheme, it is assumed that a,b take values in the plaintext space ZN. If C = E(a) + E(−b) = E(a – b mod N) and w = D(C), the following conclusions can be reached:
(1)
w = 0 if and only if a = b;
(2)
If 0 ≤ a,b < N/2, then 0 < w < N/2 if and only if a > b, and N/2 < w < N if and only if a < b.
Note: For the proofs of Theorem 1 and Theorem 2, see Reference [22].
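Theorem 3 can be illustrated on plain integers: the residue of a − b modulo N falls in the lower half of the range when a > b and in the upper half when a < b. The sketch below works on plaintexts only and uses hypothetical names; in the protocol, w would be obtained by decrypting a ciphertext.

```python
def compare_mod(a, b, N):
    """Classify a versus b from w = (a - b) mod N, assuming 0 <= a, b < N/2."""
    if not (0 <= 2 * a < N and 0 <= 2 * b < N):
        raise ValueError("inputs must satisfy 0 <= a, b < N/2")
    w = (a - b) % N
    if w == 0:
        return "equal"
    return "greater" if 2 * w < N else "less"   # 2*w < N is w < N/2 without floats

print(compare_mod(7, 3, 101))   # prints greater
print(compare_mod(3, 7, 101))   # prints less
```

The bound on a and b matters: without it, a large positive difference could wrap around and be mistaken for a negative one.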

2.2. Coding Method

In this paper, the ASCII code is used to map each character one-to-one to a three-digit decimal number (see any standard ASCII table), and text characters are thereby encoded into vectors. For example, the ASCII code of the character ‘L’ is 076, that of ‘o’ is 111, that of ‘v’ is 118, and that of ‘e’ is 101. Therefore, the number string corresponding to the text string ‘Love’ is 076111118101, and the vector is (076,111,118,101). The number string corresponding to the text string ‘Like’ is 076105107101, which is expressed as the vector (076,105,107,101).
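The encoding described above can be sketched in a few lines. The helper names are hypothetical; the sketch simply maps each character to its ASCII code and pads it to three digits.

```python
def encode_text(text):
    """Encode a text as a vector of ASCII codes, one component per character."""
    codes = [ord(ch) for ch in text]
    if any(c > 127 for c in codes):
        raise ValueError("only ASCII characters are supported")
    return codes

def digit_string(codes):
    """Concatenate the codes as three-digit decimal groups."""
    return "".join(f"{c:03d}" for c in codes)

love, like = encode_text("Love"), encode_text("Like")
print(digit_string(love))   # prints 076111118101
print(digit_string(like))   # prints 076105107101
```

The two example vectors agree in their first and last components ('L' and 'e'), so they have two equal elements out of four.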

2.3. Cut-and-Choose Method

The cut-and-choose method [24] plays an essential role in resisting the attacks of malicious adversaries. One party sends a large amount of data, and the other party arbitrarily selects a part of the data and asks the sender to open it for verification. If the verification passes, the remaining data are used for the calculation. A malicious participant who sends incorrect data cannot predict which part will be checked, so incorrect data are very likely to be selected and discovered in the verification phase; thus, the cut-and-choose method can effectively resist the attack of a malicious adversary.
Input:
(1)
Participant A inputs the vectors $x_i$ ($i = 1, \ldots, l$); that is, $x_i = \langle (x_0^{i,1}, x_1^{i,1}), (x_0^{i,2}, x_1^{i,2}), \ldots, (x_0^{i,s}, x_1^{i,s}) \rangle$, and there are $l$ vectors in total. $s$ is used as the input to check whether the values $X_1, \ldots, X_s$ are in $\{0,1\}^n$;
(2)
Participant B inputs $\sigma_1, \ldots, \sigma_i \in \{0,1\}$ and a set of parameters ($\zeta \subseteq [s]$).
Output: The receiving party will obtain the following information:
(1)
The receiving party ($R$) receives the $j$-th pair in the vector $x_i$, namely, $(x_0^{i,j}, x_1^{i,j})$;
(2)
The receiving party ($R$) receives the values selected by $\sigma_i$ (i.e., $\langle x_{\sigma_i}^{i,1}, x_{\sigma_i}^{i,2}, \ldots, x_{\sigma_i}^{i,s} \rangle$). Here, $i = 1, \ldots, l$, $j \in \zeta$, $k \notin \zeta$, and $X_k$ is output by $R$.
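To give a feel for why opening part of the data deters cheating, the chance that a cheater escapes the check can be computed under a simple assumption that is not taken from the paper: the sender prepares m items, some number of them are incorrect, and the checker opens a uniformly random half. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def escape_probability(m, bad):
    """Probability that none of the `bad` incorrect items is among the m/2 opened ones.

    Assumes m is even and the opened half is chosen uniformly at random.
    """
    if m % 2 or not 0 <= bad <= m:
        raise ValueError("m must be even and 0 <= bad <= m")
    half = m // 2
    # All opened items must come from the m - bad correct ones.
    return Fraction(comb(m - bad, half), comb(m, half))

print(escape_probability(10, 1))   # prints 1/2
print(escape_probability(10, 3))   # prints 1/12
```

Under this assumption, every additional incorrect item sharply reduces the chance of passing the check, and corrupting more than half of the items is always caught.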

2.4. Security of Malicious Model

The protocol under the malicious model has the highest security. Reference [24] specifically describes the security proofs of protocols under the malicious model.
Ideal protocol: Assume that data $x$ and $y$ are owned by Alice and Bob, respectively. They can compute the function $f(x, y) = (f_1(x, y), f_2(x, y))$ through a trusted third party (TTP) without disclosing their own data. In the end, Alice obtains only the result $f_1(x, y)$ and Bob obtains only the result $f_2(x, y)$, as follows:
(1)
Alice and Bob, respectively, send input data $x$ and $y$ to the TTP. If the participants are honest, then they send authentic data ($x$ or $y$) to the TTP; if the participants are malicious, then they might choose to terminate the protocol or input false data ($x'$ or $y'$);
(2)
The result of the calculation is sent to Alice by the TTP. After the TTP obtains the input pair $(x, y)$, it independently calculates $f(x, y)$ and sends $f_1(x, y)$ to Alice; otherwise, it sends Alice a special symbol ($\perp$);
(3)
The result of the calculation is sent to Bob by the TTP. If Alice is a malicious participant, then after obtaining the message from the TTP she can respond in one of two ways: she can choose to disregard the TTP, at which point the TTP sends $\perp$ to Bob, or she can let the protocol continue, in which case the TTP sends $f_2(x, y)$ to Bob.
When implementing the ideal protocol, the TTP and participants will not disclose any information except their own output information; thus, the protocol with the highest security is the one under the ideal model.
Under the ideal model, the process by which the participants jointly calculate $F(x, y)$ with auxiliary input $z$ and strategy pair $\bar{B}$ is denoted $IDEAL_{F,\bar{B}(z)}(x, y)$. With $r$ a random number selected uniformly by the adversary, it is defined as $IDEAL_{F,\bar{B}(z)}(x, y) = \gamma(x, y, z, r)$. The details are as follows:
(1)
When Alice is honest, $\gamma(x, y, z, r) = (f_1(x, y'), B_2(y, z, r, f_2(x, y')))$, where $y' = B_2(y, z, r)$;
(2)
When Bob is honest, there is the following:
$$\gamma(x, y, z, r) = \begin{cases} (B_1(x, z, r, f_1(x', y), \perp), \perp), & \text{if } B_1(x, z, r, f_1(x', y)) = \perp \\ (B_1(x, z, r, f_1(x', y)), f_2(x', y)), & \text{otherwise} \end{cases}$$
Both of these cases have $x' = B_1(x, z, r)$.
Definition 1.
Security of the protocol under the malicious model.
Let $\bar{A} = (A_1, A_2)$ be a strategy pair in the actual protocol and $\bar{B} = (B_1, B_2)$ a strategy pair in the ideal model. If for every admissible $\bar{A}$ there exists an admissible $\bar{B}$ such that $\{IDEAL_{F,\bar{B}(z)}(x, y)\}_{x,y,z} \stackrel{c}{\equiv} \{REAL_{\Pi,\bar{A}(z)}(x, y)\}_{x,y,z}$, then $\Pi$ is secure in computing the function $F$ (where $x, y, z \in \{0,1\}^*$ are such that $|x| = |y|$ and $|z| = poly(|x|)$).
Note 1: To design an MPC protocol under the malicious model, it must be ensured that at least one of the participants is honest; otherwise, the MPC protocol will not be implemented (which cannot be avoided under the ideal model).

3. Secure Computation Protocol of Text Similarity under the Semi-Honest Model

3.1. Problem Description

Alice encodes her private text as the vector $x = (x_1, \ldots, x_n)$, and Bob encodes his private text as $y = (y_1, \ldots, y_n)$ (see Section 2.2); the two parties agree on a threshold $t$. Both parties can securely output $L_t(x, y)$ without disclosing any other information. If $L_t(x, y) = 1$, then the number $l$ of positions with $x_i = y_i$ ($1 \le i \le n$) in vector $x$ and vector $y$ satisfies $l \ge t$, indicating that the similarity between the two texts is at least $t/n$. Otherwise, the output is $L_t(x, y) = 0$ (the number of equal elements is recorded as $l = L(x, y)$).

3.2. Solutions

(1)
Alice calculates the vectors $u_i = 2x_i$ and $u'_i = 2x_i + 1$. Bob calculates the vectors $v_i = 2y_i + 1$ and $v'_i = 2y_i$. Both parties jointly calculate the dominance degree $h$ of $u = (u_1, \ldots, u_n)$ with respect to $v = (v_1, \ldots, v_n)$ and the dominance degree $h'$ of $v' = (v'_1, \ldots, v'_n)$ with respect to $u' = (u'_1, \ldots, u'_n)$ (see Note 2 below for the definition of the dominance degree). According to Theorem 2, $h$ counts the positions with $x_i > y_i$ and $h'$ counts the positions with $x_i < y_i$, so the number of positions with $x_i = y_i$ ($1 \le i \le n$) in vectors $x$ and $y$ is $l = L(x, y) = n - h - h'$;
(2)
The random vectors $r = (r_1, \ldots, r_n)$ and $r' = (r'_1, \ldots, r'_n)$ are selected, and the signs of $t_i = r_i(u_i - v_i)$ and $t'_i = r'_i(v'_i - u'_i)$ are determined for each $i \in [1, n]$;
(3)
The dominance $h$ of $u$ with respect to $v$ is the number of positions where $t_i$ and $r_i$ have the same sign. The dominance $h'$ of $v'$ with respect to $u'$ is the number of positions where $t'_i$ and $r'_i$ have the same sign (the ciphertexts of $h$ and $h'$ are obtained). Finally, the sizes of $l$ and $t$ are compared through the ciphertext of $l = n - h - h'$.
Note 2: Given two n-dimensional vectors $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$, the number of positions with $u_i > v_i$ ($1 \le i \le n$) is called the vector dominance of $u$ with respect to $v$.
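The counting argument in this solution can be checked on plaintext vectors. The sketch below reads the two vector pairs as (u, v) with u_i = 2x_i, v_i = 2y_i + 1 and (u', v') with u'_i = 2x_i + 1, v'_i = 2y_i, which is an interpretation of the text above; the function names are hypothetical and nothing is encrypted.

```python
def dominance(u, v):
    """Vector dominance of u with respect to v: number of positions with u_i > v_i."""
    return sum(1 for a, b in zip(u, v) if a > b)

def equal_count(x, y):
    """Number of equal components of x and y, computed as n - h - h'."""
    u = [2 * a for a in x]             # Alice: u_i  = 2*x_i
    u_p = [2 * a + 1 for a in x]       # Alice: u'_i = 2*x_i + 1
    v = [2 * b + 1 for b in y]         # Bob:   v_i  = 2*y_i + 1
    v_p = [2 * b for b in y]           # Bob:   v'_i = 2*y_i
    h = dominance(u, v)                # counts positions with x_i > y_i
    h_p = dominance(v_p, u_p)          # counts positions with x_i < y_i
    return len(x) - h - h_p

print(equal_count([2, 6, 8, 9], [2, 6, 7, 8]))  # h = 2, h' = 0, prints 2
```

Doubling and adding one guarantees that the compared values are never equal, so every position is counted as "greater", "less", or neither in both tests, and the remainder is exactly the number of equal components.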
The specific Algorithm 1 is as follows:
Algorithm 1: Computing the text similarity under the semi-honest model.
Input: $x = (x_1, \ldots, x_n)$: Alice’s input; $y = (y_1, \ldots, y_n)$: Bob’s input; $t$: agreed threshold between both parties; $G$: the base point of the elliptic curve ($E_p$); $pk = K$: the public key; $sk = k$: Alice’s private key; $E$: encrypt; $D$: decrypt; Encode: encode points onto the elliptic curve ($E_p$); $u_i = 2x_i$, $u'_i = 2x_i + 1$: Alice’s calculation; $v_i = 2y_i + 1$, $v'_i = 2y_i$: Bob’s calculation;
1
$Encode(u_1, \ldots, u_n) = (M_1, \ldots, M_n)$, $Encode(u'_1, \ldots, u'_n) = (M'_1, \ldots, M'_n)$;
2
$Encode(v_1, \ldots, v_n) = (P_1, \ldots, P_n)$, $Encode(v'_1, \ldots, v'_n) = (P'_1, \ldots, P'_n)$;
3
Select $a_i$, $a'_i$, $b_i$, $b'_i$;
4
$E_{pk}(u_i) = (M_i + a_i K, a_i G)$, $E_{pk}(u'_i) = (M'_i + a'_i K, a'_i G)$;
5
$E_{pk}(v_i) = (P_i + b_i K, b_i G)$, $E_{pk}(v'_i) = (P'_i + b'_i K, b'_i G)$;
6
Select random vectors $r = (r_1, \ldots, r_n)$, $r' = (r'_1, \ldots, r'_n)$;
7
Compute $w_i = E_{pk}[r_i(u_i - v_i)] = E_{pk}(r_i u_i) + E_{pk}(-r_i v_i)$
and $w'_i = E_{pk}[r'_i(v'_i - u'_i)] = E_{pk}(r'_i v'_i) + E_{pk}(-r'_i u'_i)$;
8
$D_{sk}(w_i) = r_i(u_i - v_i) = d_i$, $D_{sk}(w'_i) = r'_i(v'_i - u'_i) = d'_i$;
9
If $d_i < N/2$, then set $h_i = 1$; otherwise, $h_i = 0$; if $d'_i < N/2$, then set $h'_i = 1$; otherwise, $h'_i = 0$;
10
$Encode(h_1, \ldots, h_n) = (N_1, \ldots, N_n)$, $Encode(h'_1, \ldots, h'_n) = (N'_1, \ldots, N'_n)$;
11
Select $b_{1i}$, $b'_{1i}$;
12
$E_{pk}(h_i) = (N_i + b_{1i} K, b_{1i} G)$, $E_{pk}(h'_i) = (N'_i + b'_{1i} K, b'_{1i} G)$;
13
For each $i \in [1, n]$, when $r_i > 0$, let $H_i = E(h_i)$; when $r_i < 0$, calculate $H_i = T[E(h_i)]$;
14
When $r'_i > 0$, let $H'_i = E(h'_i)$; when $r'_i < 0$, calculate $H'_i = T[E(h'_i)]$;
15
Compute $H = \sum_{i \in [1,n]} (H_i + H'_i) = E(e) = E(n - l)$;
16
Select $r^*$;
17
Compute $Z = E_{pk}(r^* \cdot 1) + E_{pk}(r^* \cdot 2n) - E_{pk}(r^* \cdot 2e) + E_{pk}(-2t r^*) = E[r^*(2l + 1 - 2t)]$;
18
$D_{sk}(Z) = z^*$;
19
If $z^* < N/2$, then $L_t(x, y) = 1$; otherwise, $L_t(x, y) = 0$.
Output: $L_t(x, y)$.
The specific Protocol 1 is as follows:
Protocol 1: The SCTS protocol under the semi-honest model.
Input: Alice’s text vector ($x = (x_1, \ldots, x_n)$), Bob’s text vector ($y = (y_1, \ldots, y_n)$), and threshold ($t$).
Output: The size relationship ($L_t(x, y)$) between the number of equal elements ($l$) of the two vectors ($x$ and $y$) and the threshold ($t$).
Preparation: For each $i \in [1, n]$, Alice calculates $u_i = 2x_i$, $u'_i = 2x_i + 1$, and Bob calculates $v_i = 2y_i + 1$, $v'_i = 2y_i$ ($u_i, v_i < N/2$). Alice selects an elliptic curve ($E_p(a, b)$), selects the base point ($G$) and the private key ($k$), and then calculates $kG = K$ to obtain the public key ($K$). Alice sends $E_p(a, b)$, the public key ($K$), and $G$ to Bob.
Protocol Start:
(1)
Alice encodes the plaintext sets $u = (u_1, \ldots, u_n)$ and $u' = (u'_1, \ldots, u'_n)$ to points $M_i$ and $M'_i$ ($1 \le i \le n$) on the elliptic curve ($E_p$) one by one, selects $n$ random numbers ($a_i$ and $a'_i$), and encrypts each element ($M_i$ and $M'_i$) with the public key ($K$). That is, the ciphertexts $E(M_i) = (C_{1i}, C_{2i})$ and $E(M'_i) = (C'_{1i}, C'_{2i})$ are calculated, where $C_{1i} = M_i + a_i K$, $C_{2i} = a_i G$ and $C'_{1i} = M'_i + a'_i K$, $C'_{2i} = a'_i G$. Then, Alice obtains the sets $E(u) = (E(M_1), E(M_2), \ldots, E(M_n))$ and $E(u') = (E(M'_1), E(M'_2), \ldots, E(M'_n))$ and sends $E(u)$ and $E(u')$ to Bob;
(2)
After Bob receives $E(u)$ and $E(u')$:
(a)
Bob encodes the plaintext sets $v = (v_1, \ldots, v_n)$ and $v' = (v'_1, \ldots, v'_n)$ to points $P_i$ and $P'_i$ ($1 \le i \le n$) on the elliptic curve ($E_p$) one by one, selects $n$ random numbers ($b_i$ and $b'_i$), and encrypts each element ($P_i$ and $P'_i$) with the public key ($K$). That is, the ciphertexts $E(P_i) = (I_{1i}, I_{2i})$ and $E(P'_i) = (I'_{1i}, I'_{2i})$ are calculated, where $I_{1i} = P_i + b_i K$, $I_{2i} = b_i G$ and $I'_{1i} = P'_i + b'_i K$, $I'_{2i} = b'_i G$. Then, Bob obtains the sets $E(v) = (E(P_1), E(P_2), \ldots, E(P_n))$ and $E(v') = (E(P'_1), E(P'_2), \ldots, E(P'_n))$. At the same time, Bob selects the random vectors $r = (r_1, \ldots, r_n)$, $r' = (r'_1, \ldots, r'_n)$, where $0 < |r_i|, |r'_i| < N/2$ and $i \in [1, n]$;
(b)
For each $i \in [1, n]$, Bob calculates $w_i$ and $w'_i$, including the following:
$w_{i1} = E(u_i)$, $w_{i2} = E(-v_i)$, $w_i = E[r_i(u_i - v_i)] = E(r_i u_i) + E(-r_i v_i)$; that is, $w_i = \underbrace{w_{i1} + \cdots + w_{i1}}_{r_i} + \underbrace{w_{i2} + \cdots + w_{i2}}_{r_i}$. At the same time, Bob calculates $w'_{i1} = E(-u'_i)$, $w'_{i2} = E(v'_i)$, $w'_i = E[r'_i(v'_i - u'_i)] = E(r'_i v'_i) + E(-r'_i u'_i)$; that is, $w'_i = \underbrace{w'_{i1} + \cdots + w'_{i1}}_{r'_i} + \underbrace{w'_{i2} + \cdots + w'_{i2}}_{r'_i}$. Then, Bob sends the ciphertexts $w_i$ and $w'_i$ to Alice;
(3)
For each $i \in [1, n]$:
(a)
Alice decrypts $w_i$ and $w'_i$ and decodes the x-coordinates of points $w_i$ and $w'_i$ to obtain $d_i$ and $d'_i$. If $d_i < N/2$, $h_i = 1$ is set; otherwise, $h_i = 0$. Similarly, if $d'_i < N/2$, then $h'_i = 1$ is set; otherwise, $h'_i = 0$;
(b)
Alice encodes $h_i$ and $h'_i$ to points $N_i$ and $N'_i$ ($1 \le i \le n$) on the elliptic curve ($E_p(a, b)$) one by one, selects $n$ random numbers ($b_{1i}$ and $b'_{1i}$), adopts the encryption method in step 1, uses the public key ($K$) to encrypt each element ($N_i$ and $N'_i$) one by one, obtains $E(h) = (E(h_1), \ldots, E(h_n))$ and $E(h') = (E(h'_1), \ldots, E(h'_n))$, and sends them to Bob;
(4)
Bob makes the following calculation:
(a)
For each $i \in [1, n]$, when $r_i > 0$, let $H_i = E(h_i)$; when $r_i < 0$, calculate $H_i = T[E(h_i)]$. Similarly, when $r'_i > 0$, let $H'_i = E(h'_i)$; when $r'_i < 0$, calculate $H'_i = T[E(h'_i)]$;
(b)
Compute $H = \sum_{i \in [1,n]} (H_i + H'_i) = E(e)$;
(c)
Select the random number $r^* < \frac{N}{4n+1}$, then select the random numbers $a_{1i}, a'_{1i}, a_{2i}, a'_{2i}$, and use the encryption method in step 1 to encrypt with the public key ($K$): $Z_1 = E(1)$, $Z_2 = E(2n)$, $Z_3 = E(2e)$, $Z_4 = E(-2t)$, $Z = \underbrace{Z_1 + \cdots + Z_1}_{r^*} + \underbrace{Z_2 + \cdots + Z_2}_{r^*} - \underbrace{Z_3 + \cdots + Z_3}_{r^*} + \underbrace{Z_4 + \cdots + Z_4}_{r^*}$, and send $Z$ to Alice;
(5)
Alice decrypts to obtain $z = D(Z)$ and decodes the x-coordinate of point $z$ to obtain $z^*$. If $z^* < N/2$, then $L_t(x, y) = 1$; otherwise, $L_t(x, y) = 0$. Alice informs Bob by outputting $L_t(x, y)$.
The protocol ends.
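The arithmetic of Protocol 1 can be traced with a plaintext simulation in which every ciphertext is replaced by the value it would decrypt to. This is a sketch under stated assumptions rather than the protocol itself: it uses no encryption and therefore gives no privacy, it reads the primed vectors as u'_i = 2x_i + 1 and v'_i = 2y_i, it picks an arbitrary modulus N, and it draws Bob's blinding factors as small nonzero integers of random sign. The function name `scts_plain` is hypothetical.

```python
import random

def scts_plain(x, y, t, N=2**31 - 1, rng=random):
    """Plaintext trace of Protocol 1: returns 1 if x and y share at least t equal components."""
    n = len(x)
    u, u_p = [2 * a for a in x], [2 * a + 1 for a in x]       # Alice's preparation
    v, v_p = [2 * b + 1 for b in y], [2 * b for b in y]       # Bob's preparation
    # Step (2): Bob blinds the differences with nonzero factors of random sign.
    r = [rng.choice((-1, 1)) * rng.randrange(1, 100) for _ in range(n)]
    r_p = [rng.choice((-1, 1)) * rng.randrange(1, 100) for _ in range(n)]
    d = [(ri * (ui - vi)) % N for ri, ui, vi in zip(r, u, v)]
    d_p = [(ri * (vi - ui)) % N for ri, vi, ui in zip(r_p, v_p, u_p)]
    # Step (3): Alice sees only d and d'; a residue below N/2 means "positive".
    h = [1 if 2 * di < N else 0 for di in d]
    h_p = [1 if 2 * di < N else 0 for di in d_p]
    # Step (4a): Bob undoes his own sign with the flip operation T (1 - h in the clear).
    H = [hi if ri > 0 else 1 - hi for hi, ri in zip(h, r)]
    H_p = [hi if ri > 0 else 1 - hi for hi, ri in zip(h_p, r_p)]
    e = sum(H) + sum(H_p)                                     # e = n - l
    # Step (4c): Bob blinds 2l + 1 - 2t = 1 + 2n - 2e - 2t with a positive r*.
    r_star = rng.randrange(1, 100)
    z = (r_star * (1 + 2 * n - 2 * e - 2 * t)) % N
    # Step (5): Alice reads off the sign of the blinded value.
    return 1 if 2 * z < N else 0

print(scts_plain([2, 6, 8, 9], [2, 6, 7, 8], t=2))   # prints 1
print(scts_plain([2, 6, 8, 9], [2, 6, 7, 8], t=3))   # prints 0
```

The simulation assumes the inputs are small enough that every blinded product stays below N/2 in absolute value, which mirrors the range conditions the protocol places on u_i, v_i, r_i, and r*.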

3.3. Correctness Analysis

(1)
Steps (1)–(4) of the protocol are executed in parallel by Alice and Bob for the pair $u$, $v$ and the pair $u'$, $v'$, which reduces the communication complexity;
(2)
In step (2b) of the protocol, for each $0 \le u_i, v_i \le N/2$ and $0 < |r_i| < N/2$, and because $u_i \ne v_i$, the value of $r_i(u_i - v_i)$ satisfies $-N/2 < r_i(u_i - v_i) < 0$ or $0 < r_i(u_i - v_i) < N/2$. According to Theorem 3, if $0 < d_i < N/2$, then $0 < r_i(u_i - v_i) < N/2$, so $r_i$ and $u_i - v_i$ have the same sign, and $h_i = 1$; if $d_i > N/2$, then $-N/2 < r_i(u_i - v_i) < 0$, so $r_i$ and $u_i - v_i$ have different signs, and $h_i = 0$;
(3)
For the input $u$ and $v$, $u_i > v_i$ if and only if $H_i = E(1)$, and $u_i < v_i$ if and only if $H_i = E(0)$. It further follows from Theorem 2 that $x_i > y_i$ if and only if $H_i = E(1)$, and $x_i \le y_i$ if and only if $H_i = E(0)$;
(4)
For the input $u'$ and $v'$, $u'_i > v'_i$ if and only if $H'_i = E(0)$, and $u'_i < v'_i$ if and only if $H'_i = E(1)$. It further follows from Theorem 2 that $x_i < y_i$ if and only if $H'_i = E(1)$, and $x_i \ge y_i$ if and only if $H'_i = E(0)$;
(5)
Clearly, the $i$-th components of vectors $x$ and $y$ are equal exactly when $H_i$ and $H'_i$ are both encryptions of 0. By the ECC additive homomorphism, it follows that $H = E(e) = \sum_{i \in [1,n]} (H_i + H'_i) = E(n - l)$;
(6)
Steps (4c) and (5) of the protocol compare the sizes of $l = L(x, y)$ and $t$, and by the ECC homomorphic property:
$$Z = \underbrace{Z_1 + \cdots + Z_1}_{r^*} + \underbrace{Z_2 + \cdots + Z_2}_{r^*} - \underbrace{Z_3 + \cdots + Z_3}_{r^*} + \underbrace{Z_4 + \cdots + Z_4}_{r^*} = E[r^*(2l + 1 - 2t) \bmod N]$$
According to Theorem 2, the following is known:
If $0 < z^* < N/2$, then $2l + 1 > 2t \Rightarrow l \ge t \Rightarrow L_t(x, y) = 1$;
if $N/2 < z^* < N$, then $L_t(x, y) = 0$.
For example: Alice’s input vector: $x = (2, 6, 8, 9)$; Bob’s input vector: $y = (2, 6, 7, 8)$; threshold: $t = 2$.
Preparation procedure: Alice calculates $u = (2x_i) = (4, 12, 16, 18)$, $u' = (2x_i + 1) = (5, 13, 17, 19)$. Bob calculates $v = (2y_i + 1) = (5, 13, 15, 17)$, $v' = (2y_i) = (4, 12, 14, 16)$.
Calculation procedure:
(1)
Alice encrypts $E(u) = (E(4), E(12), E(16), E(18))$ and $E(u') = (E(5), E(13), E(17), E(19))$ and sends them to Bob;
(2)
Bob encrypts $E(v) = (E(5), E(13), E(15), E(17))$, $E(v') = (E(4), E(12), E(14), E(16))$ and picks the random vectors $r = (1, 2, -3, -1)$ and $r' = (1, 3, -2, -1)$. Then, Bob computes $w_i = E[r_i(u_i - v_i)] = [E(1 \times (-1)), E(2 \times (-1)), E((-3) \times 1), E((-1) \times 1)]$ and $w'_i = E[r'_i(v'_i - u'_i)] = [E(1 \times (-1)), E(3 \times (-1)), E((-2) \times (-3)), E((-1) \times (-3))]$, sending them to Alice;
(3)
Alice decrypts to obtain $h_i$ and $h'_i$ (i.e., $h = (0, 0, 0, 0)$ because every $r_i(u_i - v_i)$ is negative, and $h' = (0, 0, 1, 1)$ because $r'_i(v'_i - u'_i) = (-1, -3, 6, 3)$). Alice encrypts to obtain $E(h) = (E(0), E(0), E(0), E(0))$ and $E(h') = (E(0), E(0), E(1), E(1))$ and sends them to Bob;
(4)
For each $i \in [1, n]$, when $r_i > 0$, let $H_i = E(h_i)$; when $r_i < 0$, calculate $H_i = T[E(h_i)]$. Similarly, when $r'_i > 0$, let $H'_i = E(h'_i)$; when $r'_i < 0$, calculate $H'_i = T[E(h'_i)]$. Bob obtains the vectors $(H_i) = (E(0), E(0), E(1), E(1))$ and $(H'_i) = (E(0), E(0), E(0), E(0))$. $H = E(0 + 0) + E(0 + 0) + E(1 + 0) + E(1 + 0) = E(2)$ is computed, so $e = 2$;
(5)
Bob selects $r^* = 1$ and substitutes it into the calculation to obtain $Z = E(1 + 2n - 2e - 2t) = E(1 + 8 - 4 - 4) = E(1)$, and sends $Z$ to Alice. Alice decrypts and outputs $L_t(x, y) = 1$, which shows that the number of equal elements in $x$ and $y$ is greater than or equal to the set threshold (i.e., $l \ge t$), and the similarity of the two vectors is at least $2/4 = 50\%$.
To sum up, Protocol 1 is secure because the participants are honest under the semi-honest model. However, malicious adversaries can exist in real situations, and so the design of a secure MPC protocol is required under the malicious model.

4. Secure Computation Protocol of Text Similarity under the Malicious Model

4.1. Solutions

The possible attack behaviors of malicious participants in Protocol 1 are analyzed so that an attack is either prevented or detected as soon as it is carried out. The following is an analysis of possible malicious actions in Protocol 1:
(1)
In Protocol 1, both Alice and Bob can encrypt plaintexts, but only Alice can decrypt the ciphertext. If Alice informs Bob of an incorrect result, Bob can only accept it passively, which puts Bob at a severe disadvantage compared with Alice. Therefore, the protocol should ensure that Alice and Bob are treated fairly and that both obtain the correct result;
(2)
In Protocol 1, Alice and Bob need to send each other their encryption results when they execute the protocol. If one party intentionally sends a wrong ciphertext, this amounts to an input error, which cannot be prevented even in the ideal protocol, and so it is not considered;
(3)
In Step 5, Alice may not output the correct result after decryption even though she already knows it at this point, so malicious behavior is possible in this step.
To counter the above malicious behaviors, this paper designs an SCTS protocol under the malicious model. The design idea is that Alice and Bob have the same status: each has a public key and a private key, and they use the zero-knowledge-proof and cut-and-choose methods to verify whether their calculation results are consistent. Please refer to Figure 2 for details.
The specific Algorithm 2 is as follows:
Algorithm 2: Computing the text similarity under the malicious model.
Input: $x = (x_1, \ldots, x_n)$: Alice's input; $y = (y_1, \ldots, y_n)$: Bob's input; $t$: agreed threshold between both parties; $G$: the base point of the elliptic curve $E_p$; $a$: Alice's choice; $b$: Bob's choice; $pk_1 = K_1$: Alice's public key; $pk_2 = K_2$: Bob's public key; $sk_1 = k_1$: Alice's private key; $sk_2 = k_2$: Bob's private key; $E$: encrypt; $D$: decrypt; $\mathrm{Encode}$: encode points onto $E_p$; $u_i = 2x_i$, $u_i' = -(2x_i + 1)$: Alice's calculation; $v_i = -(2y_i + 1)$, $v_i' = 2y_i$: Bob's calculation;
1: Compute $q_1 = aK_1$;
2: Compute $q_2 = bK_2$;
3: Exchange $(K_1, q_1)$ and $(K_2, q_2)$;
4: $\mathrm{Encode}(u_1, \ldots, u_n) = (M_1, \ldots, M_n)$, $\mathrm{Encode}(u_1', \ldots, u_n') = (M_1', \ldots, M_n')$;
5: $\mathrm{Encode}(v_1, \ldots, v_n) = (P_1, \ldots, P_n)$, $\mathrm{Encode}(v_1', \ldots, v_n') = (P_1', \ldots, P_n')$;
6: Select $a_i$, $a_i'$, $b_i$, $b_i'$;
7: $E_{pk_1}(u_i) = (M_i + a_iK_1,\ a_iG)$, $E_{pk_1}(u_i') = (M_i' + a_i'K_1,\ a_i'G)$;
8: $E_{pk_2}(v_i) = (P_i + b_iK_2,\ b_iG)$, $E_{pk_2}(v_i') = (P_i' + b_i'K_2,\ b_i'G)$;
9: Select random vectors $r_a = (r_{a1}, \ldots, r_{an})$, $r_a' = (r_{a1}', \ldots, r_{an}')$;
10: Select random vectors $r_b = (r_{b1}, \ldots, r_{bn})$, $r_b' = (r_{b1}', \ldots, r_{bn}')$;
11: Compute $w_i = E[r_{ai}(u_i + v_i)] = E(r_{ai}u_i) + E(r_{ai}v_i)$ and $w_i' = E[r_{ai}'(v_i' + u_i')] = E(r_{ai}'v_i') + E(r_{ai}'u_i')$;
12: Compute $g_i = E[r_{bi}(u_i + v_i)] = E(r_{bi}u_i) + E(r_{bi}v_i)$ and $g_i' = E[r_{bi}'(v_i' + u_i')] = E(r_{bi}'v_i') + E(r_{bi}'u_i')$;
13: Exchange $w_i, w_i'$ and $g_i, g_i'$;
14: $D_{sk_1}(w_i) = r_{ai}(u_i + v_i) = W_{1i}$, $D_{sk_1}(w_i') = r_{ai}'(v_i' + u_i') = W_{2i}$;
15: $D_{sk_2}(g_i) = r_{bi}(u_i + v_i) = Q_{1i}$, $D_{sk_2}(g_i') = r_{bi}'(v_i' + u_i') = Q_{2i}$;
16: Select $p_{1s}$, $p_{2s}$, $f_{1s}$, $f_{2s}$, $0 \le s \le m$;
17: $(c_{1as}, c_{2as}) = (p_{1s}W_{1i} + K_1,\ W_{1i} + p_{1s}W_{1i} + aG)$ and $(o_{1as}, o_{2as}) = (p_{2s}W_{2i} + K_1,\ W_{2i} + p_{2s}W_{2i} + aG)$;
18: $(c_{1bs}, c_{2bs}) = (f_{1s}Q_{1i} + K_2,\ Q_{1i} + f_{1s}Q_{1i} + bG)$ and $(o_{1bs}, o_{2bs}) = (f_{2s}Q_{2i} + K_2,\ Q_{2i} + f_{2s}Q_{2i} + bG)$;
19: Exchange $(c_{1as}, c_{2as}), (o_{1as}, o_{2as})$ and $(c_{1bs}, c_{2bs}), (o_{1bs}, o_{2bs})$;
20: Choose $m/2$ groups $(c_{1bs}, c_{2bs})$ and $(o_{1bs}, o_{2bs})$;
21: If $f_{1s}Q_{1i} + K_2 = c_{1bs}$ and $f_{2s}Q_{2i} + K_2 = o_{1bs}$, then continue; otherwise, terminate;
22: Choose $m/2$ groups $(c_{1as}, c_{2as})$ and $(o_{1as}, o_{2as})$;
23: If $p_{1s}W_{1i} + K_1 = c_{1as}$ and $p_{2s}W_{2i} + K_1 = o_{1as}$, then continue; otherwise, terminate;
24: Choose one $(c_{1bj}, c_{2bj})$ and $(o_{1bj}, o_{2bj})$ from the remaining groups;
25: $c_b = a(c_{2bj} - c_{1bj} - W_{1i} + K_2) = a(Q_{1i} - W_{1i}) + abG$, $c_b' = a(o_{2bj} - o_{1bj} - W_{2i} + K_2) = a(Q_{2i} - W_{2i}) + abG$;
26: $O_1 = q_1^{*}G$, $\lambda_b = q_1^{*}K_2$, $O_1' = q_1'G$, $\lambda_b' = q_1'K_2$;
27: Choose one $(c_{1aj}, c_{2aj})$ and $(o_{1aj}, o_{2aj})$ from the remaining groups;
28: $c_a = b(c_{2aj} - c_{1aj} - Q_{1i} + K_1) = b(W_{1i} - Q_{1i}) + abG$, $c_a' = b(o_{2aj} - o_{1aj} - Q_{2i} + K_1) = b(W_{2i} - Q_{2i}) + abG$;
29: $O_2 = q_2^{*}G$, $\lambda_a = q_2^{*}K_1$, $O_2' = q_2'G$, $\lambda_a' = q_2'K_1$;
30: Exchange $c_b + O_1$, $c_b' + O_1'$ and $c_a + O_2$, $c_a' + O_2'$;
31: $\beta_a = k_1(c_a + O_2)$, $m_a = k_1c_a$ and $\beta_a' = k_1(c_a' + O_2')$, $m_a' = k_1c_a'$;
32: $\beta_b = k_2(c_b + O_1)$, $m_b = k_2c_b$ and $\beta_b' = k_2(c_b' + O_1')$, $m_b' = k_2c_b'$;
33: Exchange $\beta_a, m_a, \beta_a', m_a'$ and $\beta_b, m_b, \beta_b', m_b'$;
34: If $m_b = \beta_b - \lambda_b$ and $m_b' = \beta_b' - \lambda_b'$, then continue; otherwise, terminate;
35: If $m_a = \beta_a - \lambda_a$ and $m_a' = \beta_a' - \lambda_a'$, then continue; otherwise, terminate;
36: If $k_2a(Q_{1i} - W_{1i}) = 0$ by $m_b - aq_2$ and $k_2a(Q_{2i} - W_{2i}) = 0$ by $m_b' - aq_2$, continue;
37: If $k_1b(W_{1i} - Q_{1i}) = 0$ by $m_a - bq_1$ and $k_1b(W_{2i} - Q_{2i}) = 0$ by $m_a' - bq_1$, continue;
38: $D(W_{1i}) = d_{1i}$, $D(W_{2i}) = d_{1i}'$; if $d_{1i} < N/2$, then set $h_{1i} = 1$; otherwise, $h_{1i} = 0$ (and likewise $h_{1i}'$ from $d_{1i}'$);
39: $D(Q_{1i}) = d_{2i}$, $D(Q_{2i}) = d_{2i}'$; if $d_{2i} < N/2$, then set $h_{2i} = 1$; otherwise, $h_{2i} = 0$ (and likewise $h_{2i}'$ from $d_{2i}'$);
40: $E_{pk_1}(h_1) = E_{pk_1}(h_{11}), \ldots, E_{pk_1}(h_{1n})$;
41: $E_{pk_2}(h_2) = E_{pk_2}(h_{21}), \ldots, E_{pk_2}(h_{2n})$;
42: Obtain $H_{1i}$ and $H_{1i}'$;
43: Obtain $H_{2i}$ and $H_{2i}'$;
44: Compute $H_1 = \sum_{i \in [1,n]} (H_{1i} + H_{1i}') = E(e_1) = E(n - l)$;
45: Compute $H_2 = \sum_{i \in [1,n]} (H_{2i} + H_{2i}') = E(e_2) = E(n - l)$;
46: Select $r_1$, $r_2$;
47: Compute $Z = E_{pk_1}(r_1 \cdot 1) + E_{pk_1}(r_1 \cdot 2n) - E_{pk_1}(r_1 \cdot 2e_1) + E_{pk_1}(-2tr_1) = E[r_1(2l + 1 - 2t)]$;
48: Compute $Z' = E_{pk_2}(r_2 \cdot 1) + E_{pk_2}(r_2 \cdot 2n) - E_{pk_2}(r_2 \cdot 2e_2) + E_{pk_2}(-2tr_2) = E[r_2(2l + 1 - 2t)]$;
49: $D_{sk_1}(Z) = z_1$, $D_{sk_2}(Z') = z_2$;
50: Select $p_i$ and $p_i'$ $(0 \le i \le m)$;
51: $(c_{11ai}, c_{12ai}) = (p_iz_1 + K_1,\ z_1 + p_iz_1 + aG)$;
52: $(c_{11bi}, c_{12bi}) = (p_i'z_2 + K_2,\ z_2 + p_i'z_2 + bG)$;
53: Exchange $(c_{11ai}, c_{12ai})$ and $(c_{11bi}, c_{12bi})$;
54: Choose $m/2$ groups from $(c_{11ai}, c_{12ai})$ and $(c_{11bi}, c_{12bi})$;
55: If $c_{11bi} = p_i'z_2 + K_2$, then continue; otherwise, terminate;
56: If $c_{11ai} = p_iz_1 + K_1$, then continue; otherwise, terminate;
57: Choose one $(c_{11bj}, c_{12bj})$ and $(c_{11aj}, c_{12aj})$ from the remaining groups;
58: Compute $c_{b1} = a(c_{12bj} - c_{11bj} - z_1 + K_2) = a(z_2 - z_1) + abG$, $J_1 = q_3G$, $\lambda_{b1} = q_3K_2$;
59: Compute $c_{a1} = b(c_{12aj} - c_{11aj} - z_2 + K_1) = b(z_1 - z_2) + abG$, $J_2 = q_4G$, $\lambda_{a1} = q_4K_1$;
60: Exchange $c_{b1} + J_1$, $c_{a1} + J_2$;
61: $\beta_{a1} = k_1(c_{a1} + J_2)$, $m_{a1} = k_1c_{a1}$;
62: $\beta_{b1} = k_2(c_{b1} + J_1)$, $m_{b1} = k_2c_{b1}$;
63: Exchange $\beta_{a1}, m_{a1}$ and $\beta_{b1}, m_{b1}$;
64: If $m_{b1} = \beta_{b1} - \lambda_{b1}$, then continue; otherwise, terminate;
65: If $m_{a1} = \beta_{a1} - \lambda_{a1}$, then continue; otherwise, terminate;
66: If $k_2a(z_2 - z_1) = 0$ by $m_{b1} - aq_2$, then continue;
67: If $k_1b(z_1 - z_2) = 0$ by $m_{a1} - bq_1$, then continue;
68: $D(z_1) = z_3$; if $z_3 < N/2$, make $L_t(x, y) = 1$;
69: $D(z_2) = z_4$; if $z_4 < N/2$, make $L_t(x, y) = 1$;
Output: $L_t(x, y)$
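The comparison logic that Algorithm 2 evaluates under encryption can be checked on plaintext integers. The following is a minimal sketch, not the authors' implementation: it performs no encryption, cut-and-choose, or zero-knowledge checks; the function name `lt_plain` and the `seed` parameter are hypothetical; and it assumes the sign conventions $u_i = 2x_i$, $u_i' = -(2x_i + 1)$, $v_i = -(2y_i + 1)$, $v_i' = 2y_i$, positive blinding factors, and that a decoded value below $N/2$ corresponds to a positive integer.

```python
import random

def lt_plain(x, y, t, seed=0):
    """Plaintext walk-through of the sign tests in Algorithm 2 (no cryptography).

    Returns (L_t, l), where l is the number of equal positions and
    L_t is 1 when r * (2l + 1 - 2t) is positive, else 0.
    """
    rng = random.Random(seed)
    n = len(x)
    e = 0  # number of positive blinded sums; equals n - l
    for xi, yi in zip(x, y):
        u, u_prime = 2 * xi, -(2 * xi + 1)    # Alice's encodings
        v, v_prime = -(2 * yi + 1), 2 * yi    # Bob's encodings
        r, r_prime = rng.randint(1, 1000), rng.randint(1, 1000)  # positive blinding
        h = 1 if r * (u + v) > 0 else 0                          # 1 exactly when x_i > y_i
        h_prime = 1 if r_prime * (v_prime + u_prime) > 0 else 0  # 1 exactly when y_i > x_i
        e += h + h_prime
    l = n - e
    z = rng.randint(1, 1000) * (1 + 2 * n - 2 * e - 2 * t)  # r * (2l + 1 - 2t)
    return (1 if z > 0 else 0), l
```

For integer inputs, equal positions make both blinded sums negative and unequal positions make exactly one of them positive, which is why the count of positive values equals $n - l$. Under these assumptions the returned flag is 1 exactly when $l \ge t$.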
The specific Protocol 2 is as follows:
Protocol 2: The SCTS protocol under the malicious model.
Input: Alice's text vector $x = (x_1, \ldots, x_n)$, Bob's text vector $y = (y_1, \ldots, y_n)$, and a threshold $t$.
Output: The size relationship $L_t(x, y)$ between the number $l$ of equal elements of the two vectors $x$ and $y$ and the threshold $t$.
Preparation: For each $i \in [1, n]$, Alice calculates $u_i = 2x_i$, $u_i' = -(2x_i + 1)$ and Bob calculates $v_i = -(2y_i + 1)$, $v_i' = 2y_i$ $(u_i, v_i < N/2)$. Alice and Bob together choose the elliptic curve $E_p$ and base point $G$, and they choose the private keys $k_1, k_2$ $(k_1, k_2 > 0)$ and random numbers $a, b$, respectively. Then, both parties calculate their public keys $K_1 = k_1G$, $K_2 = k_2G$ and $q_1 = aK_1$, $q_2 = bK_2$, respectively. Finally, Alice and Bob exchange $(K_1, q_1)$ and $(K_2, q_2)$.
Protocol Start:
(1)
Alice and Bob calculate the following:
(a)
Alice encodes the plaintext sets $u = (u_1, \ldots, u_n)$ and $u' = (u_1', \ldots, u_n')$ to points $M_i$ and $M_i'$ $(1 \le i \le n)$ on the elliptic curve $E_p$ one by one, selects $n$ random numbers $a_i$ and $a_i'$, and encrypts each element $M_i$ and $M_i'$ separately by using the public key $K_1$. That is, the ciphertexts $E(M_i) = (C_{1i}, C_{2i})$ and $E(M_i') = (C_{1i}', C_{2i}')$ are calculated, where $C_{1i} = M_i + a_iK_1$, $C_{2i} = a_iG$ and $C_{1i}' = M_i' + a_i'K_1$, $C_{2i}' = a_i'G$. Then, Alice obtains the sets $E(u) = (E(M_1), E(M_2), \ldots, E(M_n))$ and $E(u') = (E(M_1'), E(M_2'), \ldots, E(M_n'))$, and sends $E(u)$ and $E(u')$ to Bob;
(b)
Bob encodes the plaintext sets $v = (v_1, \ldots, v_n)$ and $v' = (v_1', \ldots, v_n')$ to points $P_i$ and $P_i'$ $(1 \le i \le n)$ on the elliptic curve $E_p$ one by one, selects $n$ random numbers $b_i$ and $b_i'$, and encrypts each element $P_i$ and $P_i'$ separately by using the public key $K_2$. That is, the ciphertexts $E(P_i) = (I_{1i}, I_{2i})$ and $E(P_i') = (I_{1i}', I_{2i}')$ are calculated, where $I_{1i} = P_i + b_iK_2$, $I_{2i} = b_iG$ and $I_{1i}' = P_i' + b_i'K_2$, $I_{2i}' = b_i'G$. Then, Bob obtains the sets $E(v) = (E(P_1), E(P_2), \ldots, E(P_n))$ and $E(v') = (E(P_1'), E(P_2'), \ldots, E(P_n'))$, and sends $E(v)$ and $E(v')$ to Alice;
(2)
After receiving the ciphertexts of each other, the participants calculate the following:
(a)
Alice selects random vectors $r_a = (r_{a1}, \ldots, r_{an})$ and $r_a' = (r_{a1}', \ldots, r_{an}')$, where $0 < r_{ai}, r_{ai}' < N/2$, $i \in [1, n]$. Meanwhile, Bob selects random vectors $r_b = (r_{b1}, \ldots, r_{bn})$ and $r_b' = (r_{b1}', \ldots, r_{bn}')$, where $0 < r_{bi}, r_{bi}' < N/2$, $i \in [1, n]$;
(b)
For each $i \in [1, n]$, Alice calculates $w_i$ and $w_i'$:
$w_{i1} = E(u_i)$, $w_{i2} = E(v_i)$, $w_i = E[r_{ai}(u_i + v_i)] = E(r_{ai}u_i) + E(r_{ai}v_i)$, that is, $w_i = \underbrace{(w_{i1} + \cdots + w_{i1})}_{r_{ai}} + \underbrace{(w_{i2} + \cdots + w_{i2})}_{r_{ai}}$. At the same time, Alice calculates $w_{i1}' = E(u_i')$, $w_{i2}' = E(v_i')$, $w_i' = E[r_{ai}'(v_i' + u_i')] = E(r_{ai}'v_i') + E(r_{ai}'u_i')$, that is, $w_i' = \underbrace{(w_{i1}' + \cdots + w_{i1}')}_{r_{ai}'} + \underbrace{(w_{i2}' + \cdots + w_{i2}')}_{r_{ai}'}$. Then, Alice sends the ciphertexts $w_i$ and $w_i'$ to Bob.
For each $i \in [1, n]$, Bob calculates $g_i$ and $g_i'$:
$g_{i1} = E(u_i)$, $g_{i2} = E(v_i)$, $g_i = E[r_{bi}(u_i + v_i)] = E(r_{bi}u_i) + E(r_{bi}v_i)$, that is, $g_i = \underbrace{(g_{i1} + \cdots + g_{i1})}_{r_{bi}} + \underbrace{(g_{i2} + \cdots + g_{i2})}_{r_{bi}}$. At the same time, Bob calculates $g_{i1}' = E(u_i')$, $g_{i2}' = E(v_i')$, $g_i' = E[r_{bi}'(v_i' + u_i')] = E(r_{bi}'v_i') + E(r_{bi}'u_i')$, that is, $g_i' = \underbrace{(g_{i1}' + \cdots + g_{i1}')}_{r_{bi}'} + \underbrace{(g_{i2}' + \cdots + g_{i2}')}_{r_{bi}'}$. Then, Bob sends the ciphertexts $g_i$ and $g_i'$ to Alice;
(3)
For each $i \in [1, n]$, Alice decrypts $w_i$ and $w_i'$ with the private key $k_1$ to obtain points $W_{1i}$ and $W_{2i}$. Bob decrypts $g_i$ and $g_i'$ using $k_2$ to obtain points $Q_{1i}$ and $Q_{2i}$;
(4)
For each $i \in [1, n]$, Alice selects $m$ random numbers $p_{1s}$ $(0 \le s \le m)$ and $p_{2s}$ $(0 \le s \le m)$, and calculates $(c_{1as}, c_{2as}) = (p_{1s}W_{1i} + K_1,\ W_{1i} + p_{1s}W_{1i} + aG)$ and $(o_{1as}, o_{2as}) = (p_{2s}W_{2i} + K_1,\ W_{2i} + p_{2s}W_{2i} + aG)$. Bob selects $m$ random numbers $f_{1s}$ $(0 \le s \le m)$ and $f_{2s}$ $(0 \le s \le m)$, and calculates $(c_{1bs}, c_{2bs}) = (f_{1s}Q_{1i} + K_2,\ Q_{1i} + f_{1s}Q_{1i} + bG)$ and $(o_{1bs}, o_{2bs}) = (f_{2s}Q_{2i} + K_2,\ Q_{2i} + f_{2s}Q_{2i} + bG)$. Finally, Alice and Bob exchange $(c_{1as}, c_{2as}), (o_{1as}, o_{2as})$ and $(c_{1bs}, c_{2bs}), (o_{1bs}, o_{2bs})$;
(5)
Using the cut-and-choose method:
Alice randomly selects $m/2$ of the $m$ groups $(c_{1bs}, c_{2bs})$ and $(o_{1bs}, o_{2bs})$ sent by Bob to be opened, and Bob is required to publish the corresponding $f_{1s}Q_{1i}$ and $f_{2s}Q_{2i}$. Then, Alice verifies the received data: $f_{1s}Q_{1i} + K_2 = c_{1bs}$, $f_{2s}Q_{2i} + K_2 = o_{1bs}$. If the verification passes, the protocol continues; otherwise, the protocol is terminated. Bob randomly selects $m/2$ of the $m$ groups $(c_{1as}, c_{2as})$ and $(o_{1as}, o_{2as})$ sent by Alice to be opened, and Alice is required to publish the corresponding $p_{1s}W_{1i}$ and $p_{2s}W_{2i}$. Then, Bob verifies the received data: $p_{1s}W_{1i} + K_1 = c_{1as}$, $p_{2s}W_{2i} + K_1 = o_{1as}$. If the verification passes, the protocol continues; otherwise, the protocol is terminated;
(6)
Alice randomly selects one $(c_{1bj}, c_{2bj})$ and one $(o_{1bj}, o_{2bj})$ from the remaining $m/2$ groups $(c_{1bs}, c_{2bs})$ and $(o_{1bs}, o_{2bs})$, respectively. Bob randomly selects one $(c_{1aj}, c_{2aj})$ and one $(o_{1aj}, o_{2aj})$ from the remaining $m/2$ groups $(c_{1as}, c_{2as})$ and $(o_{1as}, o_{2as})$, respectively. Meanwhile, Alice and Bob choose random numbers ($a, q_1^{*}, q_1'$ and $b, q_2^{*}, q_2'$, respectively). Alice calculates $c_b = a(c_{2bj} - c_{1bj} - W_{1i} + K_2) = a(Q_{1i} - W_{1i}) + abG$ and $c_b' = a(o_{2bj} - o_{1bj} - W_{2i} + K_2) = a(Q_{2i} - W_{2i}) + abG$, and sets $O_1 = q_1^{*}G$, $\lambda_b = q_1^{*}K_2$ and $O_1' = q_1'G$, $\lambda_b' = q_1'K_2$, respectively. Bob calculates $c_a = b(c_{2aj} - c_{1aj} - Q_{1i} + K_1) = b(W_{1i} - Q_{1i}) + abG$ and $c_a' = b(o_{2aj} - o_{1aj} - Q_{2i} + K_1) = b(W_{2i} - Q_{2i}) + abG$, and sets $O_2 = q_2^{*}G$, $\lambda_a = q_2^{*}K_1$ and $O_2' = q_2'G$, $\lambda_a' = q_2'K_1$, respectively. Then, $c_b + O_1$, $c_b' + O_1'$ and $c_a + O_2$, $c_a' + O_2'$ are exchanged between Alice and Bob;
(7)
After both parties receive the messages from each other, Alice calculates $\beta_a = k_1(c_a + O_2)$, $m_a = k_1c_a$ and $\beta_a' = k_1(c_a' + O_2')$, $m_a' = k_1c_a'$ and sends them to Bob. Bob calculates $\beta_b = k_2(c_b + O_1)$, $m_b = k_2c_b$ and $\beta_b' = k_2(c_b' + O_1')$, $m_b' = k_2c_b'$ and sends them to Alice;
(8)
To determine whether $m_b$ and $m_b'$ sent by Bob are correct, Alice uses the zero-knowledge proof to check that Bob really obtained $m_b$ by multiplying his private key $k_2$ with her $c_b$, and $m_b'$ by multiplying $k_2$ with her $c_b'$; that is, she judges whether $m_b = \beta_b - \lambda_b$ and $m_b' = \beta_b' - \lambda_b'$ hold, respectively. Similarly, Bob uses the same method to determine whether $m_a = \beta_a - \lambda_a$ and $m_a' = \beta_a' - \lambda_a'$ hold. The party that does not pass is malicious;
(9)
Alice can obtain $k_2a(Q_{1i} - W_{1i})$ by calculating $m_b - aq_2$; if $k_2a(Q_{1i} - W_{1i}) = 0$, then $Q_{1i} = W_{1i}$. At the same time, Alice calculates $m_b' - aq_2$ to obtain $k_2a(Q_{2i} - W_{2i})$; if $k_2a(Q_{2i} - W_{2i}) = 0$, then $Q_{2i} = W_{2i}$. Similarly, Bob obtains $k_1b(W_{1i} - Q_{1i})$ by calculating $m_a - bq_1$; if $k_1b(W_{1i} - Q_{1i}) = 0$, then $Q_{1i} = W_{1i}$. At the same time, Bob calculates $m_a' - bq_1$ to obtain $k_1b(W_{2i} - Q_{2i})$; if $k_1b(W_{2i} - Q_{2i}) = 0$, then $Q_{2i} = W_{2i}$. If $Q_{1i} = W_{1i}$ and $Q_{2i} = W_{2i}$ hold at the same time, the results obtained by both parties are correct and equal; otherwise, the protocol is terminated;
(10)
Alice decodes the points $W_{1i}, W_{2i}$ to obtain $d_{1i}, d_{1i}'$; Bob decodes the points $Q_{1i}, Q_{2i}$ to obtain $d_{2i}, d_{2i}'$. If $d_{1i} < N/2$, then $h_{1i} = 1$ is set; otherwise, $h_{1i} = 0$ is set. Similarly, if $d_{1i}' < N/2$, then $h_{1i}' = 1$ is set; otherwise, $h_{1i}' = 0$ is set. In the same way, Bob obtains $h_{2i}$ and $h_{2i}'$;
(11)
Alice encodes $h_{1i}$ and $h_{1i}'$ to points $N_{1i}$ and $N_{1i}'$ $(1 \le i \le n)$ on the elliptic curve $E_p(a, b)$, selects $n$ random numbers $\theta_{1i}$ and $\theta_{1i}'$, adopts the encryption method in step 1(a), uses $K_1$ to encrypt each element $N_{1i}$ and $N_{1i}'$ $(1 \le i \le n)$ one by one, obtains $E(h_1) = E(h_{11}), \ldots, E(h_{1n})$ and $E(h_1') = E(h_{11}'), \ldots, E(h_{1n}')$, and sends them to Bob.
Bob encodes $h_{2i}$ and $h_{2i}'$ to points $N_{2i}$ and $N_{2i}'$ $(1 \le i \le n)$ on the elliptic curve $E_p(a, b)$ one by one, selects $n$ random numbers $\theta_{2i}$ and $\theta_{2i}'$, adopts the encryption method in step 1(b), uses the public key $K_2$ to encrypt each element $N_{2i}$ and $N_{2i}'$ $(1 \le i \le n)$ one by one, obtains $E(h_2) = E(h_{21}), \ldots, E(h_{2n})$ and $E(h_2') = E(h_{21}'), \ldots, E(h_{2n}')$, and sends them to Alice;
(12)
For each $i \in [1, n]$, Alice sets $H_{1i} = E(h_{1i})$ when $r_{ai} > 0$ and calculates $H_{1i} = T[E(h_{1i})]$ when $r_{ai} < 0$. Similarly, she sets $H_{1i}' = E(h_{1i}')$ when $r_{ai}' > 0$ and calculates $H_{1i}' = T[E(h_{1i}')]$ when $r_{ai}' < 0$. In the same way, Bob calculates $H_{2i}$ and $H_{2i}'$;
(13)
Alice calculates $H_1 = \sum_{i \in [1,n]} (H_{1i} + H_{1i}') = E(e_1)$, and Bob calculates $H_2 = \sum_{i \in [1,n]} (H_{2i} + H_{2i}') = E(e_2)$;
(14)
Alice selects a random number $r_1 < \frac{N}{4n+1}$, uses the encryption method in step 1(a), selects the random numbers $a_{1i}, a_{1i}', a_{2i}, a_{2i}'$, encrypts with the public key $K_1$, and calculates $Z_1 = E(1)$, $Z_2 = E(2n)$, $Z_3 = E(2e_1)$, $Z_4 = E(-2t)$, $Z = \underbrace{(Z_1 + \cdots + Z_1)}_{r_1} + \underbrace{(Z_2 + \cdots + Z_2)}_{r_1} - \underbrace{(Z_3 + \cdots + Z_3)}_{r_1} + \underbrace{(Z_4 + \cdots + Z_4)}_{r_1} = (C_1, C_2)$. Alice sends the point $Z$ to Bob. Bob selects a random number $r_2 < \frac{N}{4n+1}$, uses the encryption method in step 1(b), selects the random numbers $a_{3i}, a_{3i}', a_{4i}, a_{4i}'$, encrypts with the public key $K_2$, and calculates $Z_1' = E(1)$, $Z_2' = E(2n)$, $Z_3' = E(2e_2)$, $Z_4' = E(-2t)$, $Z' = \underbrace{(Z_1' + \cdots + Z_1')}_{r_2} + \underbrace{(Z_2' + \cdots + Z_2')}_{r_2} - \underbrace{(Z_3' + \cdots + Z_3')}_{r_2} + \underbrace{(Z_4' + \cdots + Z_4')}_{r_2} = (C_1', C_2')$. Bob sends the point $Z'$ to Alice;
(15)
Alice decrypts $E(Z)$ using the private key $k_1$ (i.e., calculates $C_1 - k_1C_2 = z_1$ and obtains point $z_1$); Bob decrypts $E(Z')$ using the private key $k_2$ (i.e., calculates $C_1' - k_2C_2' = z_2$ and obtains point $z_2$);
(16)
Alice selects $m$ random numbers $p_i$ $(0 \le i \le m)$ and calculates $(c_{11ai}, c_{12ai}) = (p_iz_1 + K_1,\ z_1 + p_iz_1 + aG)$. Bob selects $m$ random numbers $p_i'$ $(0 \le i \le m)$ and calculates $(c_{11bi}, c_{12bi}) = (p_i'z_2 + K_2,\ z_2 + p_i'z_2 + bG)$. Alice and Bob then exchange $(c_{11ai}, c_{12ai})$ and $(c_{11bi}, c_{12bi})$;
(17)
Using the cut-and-choose method, Alice randomly selects $m/2$ groups from the data $(c_{11bi}, c_{12bi})$ to be opened, and Bob publishes the corresponding $p_i'z_2$. Alice verifies $c_{11bi} = p_i'z_2 + K_2$. Bob uses the same method and asks Alice to publish $p_iz_1$. Bob verifies $c_{11ai} = p_iz_1 + K_1$. If the equation holds, then the next step is taken; if the equation does not hold, then the protocol is stopped;
(18)
Alice and Bob randomly select one $(c_{11bj}, c_{12bj})$ and one $(c_{11aj}, c_{12aj})$ from the remaining $m/2$ groups $(c_{11bi}, c_{12bi})$ and $(c_{11ai}, c_{12ai})$, respectively. At the same time, Alice and Bob each pick random numbers ($a, q_3$ and $b, q_4$, respectively). Alice calculates $c_{b1} = a(c_{12bj} - c_{11bj} - z_1 + K_2) = a(z_2 - z_1) + abG$, $J_1 = q_3G$, $\lambda_{b1} = q_3K_2$, and Bob calculates $c_{a1} = b(c_{12aj} - c_{11aj} - z_2 + K_1) = b(z_1 - z_2) + abG$, $J_2 = q_4G$, $\lambda_{a1} = q_4K_1$. Then, $c_{b1} + J_1$ and $c_{a1} + J_2$ are exchanged between Alice and Bob;
(19)
After both parties receive the messages from each other, Alice calculates $\beta_{a1} = k_1(c_{a1} + J_2)$ and $m_{a1} = k_1c_{a1}$, Bob calculates $\beta_{b1} = k_2(c_{b1} + J_1)$ and $m_{b1} = k_2c_{b1}$, and the results are sent to the other party;
(20)
To check through the zero-knowledge-proof method whether the $m_{b1}$ sent by Bob is correct, Alice judges whether $m_{b1} = \beta_{b1} - \lambda_{b1}$ holds. To check in the same way whether the $m_{a1}$ sent by Alice is correct, Bob judges whether $m_{a1} = \beta_{a1} - \lambda_{a1}$ holds. If either party's equation does not hold, the protocol is terminated;
(21)
Alice can obtain $k_2a(z_2 - z_1)$ by calculating $m_{b1} - aq_2$. If $k_2a(z_2 - z_1) = 0$, then $z_1 = z_2$. Bob can obtain $k_1b(z_1 - z_2)$ by calculating $m_{a1} - bq_1$. If $k_1b(z_1 - z_2) = 0$, then $z_1 = z_2$. $z_1 = z_2$ means that both parties' results are correct; otherwise, the protocol is immediately stopped;
(22)
Finally, Alice and Bob decode the x-coordinates of points $z_1$ and $z_2$, respectively, to obtain $z_3$ and $z_4$. If $z_3 < N/2$, then $L_t(x, y) = 1$ is set; otherwise, $L_t(x, y) = 0$ is set, and $L_t(x, y)$ is finally output. Similarly, if $z_4 < N/2$, then $L_t(x, y) = 1$ is set; otherwise, $L_t(x, y) = 0$ is set, and $L_t(x, y)$ is finally output.
The protocol ends.
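The encryption used in steps (1), (2), and (15), with ciphertext $(M + aK, aG)$ and decryption $C_1 - kC_2$, can be tried out on a toy curve. The sketch below is illustrative only and is not the authors' implementation. Its assumptions: the textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with base point $(5, 1)$ of order 19 (far too small to be secure), a hypothetical encoding that maps an integer $m$ to the point $mG$, and a single key pair for both ciphertexts. All function names are hypothetical.

```python
# Toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1) of order 19.
# Far too small to be secure; chosen only so the arithmetic is easy to follow.
P_MOD, A_COEF = 17, 2
G, ORDER = (5, 1), 19

def add(p, q):
    """Add two curve points; None stands for the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        lam = (3 * x1 * x1 + A_COEF) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def neg(p):
    return None if p is None else (p[0], (-p[1]) % P_MOD)

def mul(k, p):
    """Double-and-add scalar multiplication; k is reduced modulo the group order."""
    k %= ORDER
    acc = None
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc

def encode(m):
    """Hypothetical encoding: the integer m is mapped to the point mG."""
    return mul(m, G)

def encrypt(K, M, a):
    return (add(M, mul(a, K)), mul(a, G))      # (C1, C2) = (M + aK, aG)

def decrypt(k, c):
    return add(c[0], neg(mul(k, c[1])))        # M = C1 - k*C2

def ct_add(c, d):
    return (add(c[0], d[0]), add(c[1], d[1]))  # componentwise sum of ciphertexts

def ct_scale(r, c):
    return (mul(r, c[0]), mul(r, c[1]))        # r-fold sum of one ciphertext

k1 = 7
K1 = mul(k1, G)
c_u = encrypt(K1, encode(4), a=3)     # E(u) with u = 4
c_v = encrypt(K1, encode(-5), a=11)   # E(v) with v = -5
w = ct_scale(2, ct_add(c_u, c_v))     # E[2(u + v)]
print(decrypt(k1, w) == encode(-2))
```

Adding ciphertexts componentwise adds the underlying points, and the $r$-fold repeated sum written with underbraces in step (2) corresponds to `ct_scale`. With this particular encoding, recovering the integer from the decrypted point would require a small table lookup, which the sketch leaves out.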

4.2. Correctness Analysis

The execution operations of Alice and Bob in Protocol 2 are identical, and so only Alice’s execution process is analyzed.
(1)
In the protocol preparation stage, Alice converts vector $x$ into vectors $u$ and $u'$ by calculating $u_i = 2x_i$ and $u_i' = -(2x_i + 1)$. Therefore, Alice does not disclose any information about the private vector $x$ during the protocol execution phase;
(2)
In steps (5) and (6) of the protocol, Alice verifies whether there are malicious adversaries in the protocol via the cut-and-choose method;
(3)
In step (8) of the protocol, Alice uses zero-knowledge proof to verify that $m_b$ and $m_b'$ sent by Bob are correct (that is, to determine whether $m_b = \beta_b - \lambda_b$ and $m_b' = \beta_b' - \lambda_b'$ hold, respectively);
(4)
The possible malicious behavior of Alice in the first round (from step (1) to step (10)) is that the random numbers $p_{1s}$ and $p_{2s}$ selected by Alice in step (4) do not meet the requirements, are not detected in the step (5) verification, and happen to be chosen by Bob in step (6), which leads Bob to obtain a faulty result. Taking $m = 10$ as an example, five groups are opened, and the maximum probability of successful spoofing is reached when a single group does not meet the requirements: it escapes the opening and is then selected from the five remaining groups with probability $\frac{C_9^5}{C_{10}^5} \times \frac{1}{5} = \frac{1}{10}$. If more than half of the groups do not meet the requirements, the probability of successful spoofing drops to zero, because the cheating is always detected in the verification phase. Therefore, the first round of the protocol is secure;
(5)
In steps (17) and (18), Alice uses the cut-and-choose method to verify whether there are malicious adversaries in the protocol;
(6)
In step (20) of the protocol, Alice uses the zero-knowledge-proof method to verify whether the $m_{b1}$ sent by Bob is correct;
(7)
The possible malicious behavior of Alice in the second round (from step (11) to step (22)) is that the random numbers $p_i$ selected by Alice in step (16) do not meet the requirements, are not detected in the step (17) verification, and happen to be chosen by Bob in step (18), which leads Bob to obtain a wrong result. The maximum probability of successful spoofing by Alice is the same as in item (4) above. Thus, the second round of the protocol is secure.
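The cut-and-choose estimate quoted for $m = 10$ can be reproduced numerically. The following is a minimal sketch under a stated assumption: spoofing succeeds only if none of the `k` non-conforming groups is among the $m/2$ opened groups and the single group chosen afterwards is one of them. The function name `cheat_success_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def cheat_success_probability(m, k):
    """Probability that k bad groups out of m all escape the opening of m/2
    groups and that the one group chosen afterwards is a bad one."""
    opened = m // 2
    remaining = m - opened
    if k > remaining:
        return Fraction(0)  # at least one bad group is necessarily opened
    escape = Fraction(comb(m - k, opened), comb(m, opened))
    chosen_bad = Fraction(k, remaining)
    return escape * chosen_bad

for k in range(1, 7):
    print(k, cheat_success_probability(10, k))
```

Under this model, a single non-conforming group gives $\frac{C_9^5}{C_{10}^5} \times \frac{1}{5} = \frac{1}{10}$, which matches the figure quoted in item (4), and the probability is zero once more than half of the groups are non-conforming.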

4.3. Security Proof

The real/ideal-model paradigm, which is widely used for protocols under the malicious model, is used to prove the security of Protocol 2.
The parties pass through two rounds of verification in Protocol 2. In the first round, steps (1) to (3) are mainly used to calculate $W_{1i}, W_{2i}$ and $Q_{1i}, Q_{2i}$ confidentially. If Alice and Bob have verified that $W_{1i} = Q_{1i}$ and $W_{2i} = Q_{2i}$ according to the protocol, then they pass the verification. In the second round, step (15) is mainly used to obtain points $z_1$ and $z_2$ confidentially. If Alice and Bob verify that $z_1 = z_2$ according to the protocol, then they pass the verification. The $W_{1i}, W_{2i}$ and $Q_{1i}, Q_{2i}$ from the first round and the $z_1$ and $z_2$ from the second round will be the input data under the ideal model. If the protocol terminates early, then the input message sent to the trusted third party (TTP) is incorrect. Therefore, input information errors will not be considered in the malicious model. In the proof phase, the problem can be transformed into whether $W_{1i}$ is equal to $Q_{1i}$, $W_{2i}$ is equal to $Q_{2i}$, and $z_1$ is equal to $z_2$. Therefore, proving whether $z_1$ is equal to $z_2$ is required.
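The equality tests referred to above follow directly from the definitions in Protocol 2. As a worked check for the first-round quantities, using $q_2 = bK_2 = bk_2G$ from the preparation stage, $O_1 = q_1^{*}G$, and $\lambda_b = q_1^{*}K_2$:

```latex
\begin{aligned}
c_b &= a\,(c_{2bj} - c_{1bj} - W_{1i} + K_2)
     = a\,\bigl(Q_{1i} + bG - K_2 - W_{1i} + K_2\bigr)
     = a\,(Q_{1i} - W_{1i}) + abG,\\
\beta_b - \lambda_b &= k_2\,(c_b + q_1^{*}G) - q_1^{*}k_2G = k_2c_b = m_b,\\
m_b - a\,q_2 &= k_2a\,(Q_{1i} - W_{1i}) + k_2abG - abk_2G = k_2a\,(Q_{1i} - W_{1i}).
\end{aligned}
```

Assuming that $G$ has prime order and that neither $k_2$ nor $a$ is a multiple of that order, the last expression is the point at infinity exactly when $Q_{1i} = W_{1i}$. The same calculation with $z_1$, $z_2$, $q_3$, and $J_1$ gives the second-round test.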
Theorem 4.
Protocol 2 (denoted as ∏ ) is secure under the malicious model.
See Appendix A for the detailed proof.

5. Performance Analysis

This paper analyzes the performance of the protocol in terms of computational complexity and communication complexity (number of communication rounds), and it compares the execution times of the existing protocols with those of Protocol 1 and Protocol 2 through experimental simulation.

5.1. Efficiency Analysis

References [17,18] and Protocol 1 all calculate the SCTS under the semi-honest model. In Reference [17], two sequences of length $n$ are coded into two 0-1 sequences of length $4n$ and are encrypted via the GM encryption algorithm. This requires $20n\log N$ modular multiplications, and it can only be used for string matching, which limits its scope of application. Reference [18] confidentially determines whether two strings are equal, with $n + m$ encryptions and $nm$ decryptions, requiring $[3\log_2 N\,(n + 1)]\log N$ modular multiplications ($N$ is usually taken as 1024 bits), so its efficiency is relatively low. In contrast, Protocol 1 of this paper adopts the ECC encryption method: assuming that the character length is $n$ (which can be encoded into $n$ vector components), a total of $16n + 6$ modular multiplications are required (other simple arithmetic operations and inversion operations can be ignored), which gives a wider application range and higher efficiency.
Protocol 2 is an SCTS protocol under the malicious model. At present, no relevant protocol under the malicious model has been found. For solving the problem of text similarity, the protocol proposed in Reference [19] is more efficient. It is based on the GM encryption algorithm. Assuming that the character length is n , the protocol requires 8 ( n + m 1 ) l log N times of modular multiplications ( l 2 is a security parameter, N = 1024 bits), but it cannot defend against malicious attacks. Protocol 2 uses the ECC encryption method (the character length is n ), which requires 32 n + 12 times of modular multiplications (other simple arithmetic operations and flip operations are ignored). Not only does it improve the efficiency, but it can also resist attacks from malicious enemies.
For the communication complexity, measured by the number of communication rounds, Protocol 1 and References [17,18] require two rounds, while Protocol 2 and Reference [19] require four rounds. Compared with References [17,18], Protocol 1 in this paper therefore improves the computational efficiency and has a wider application range with the same number of communication rounds. Compared with Reference [19], Protocol 2 in this article requires fewer modular multiplications with the same number of rounds and can additionally resist attacks from malicious adversaries.
The performances of Protocol 1 and Protocol 2 are compared with those of References [17,18,19], as shown in Table 1.
Both Protocol 1 and Protocol 2 in this article adopt ECC, which is not only more efficient but also more widely applicable compared to the GM and Paillier encryption schemes. Protocol 2 in this article is a protocol under the malicious model of ECC encryption, which can resist attacks from malicious adversaries and has improved efficiency compared to References [17,18,19]. With a small difference in the number of communication rounds, Protocol 2 is more efficient and secure. Therefore, Protocol 2 is efficient in terms of both the computational complexity and application scope.
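To give a feel for the operation counts quoted above, the following sketch evaluates three of the stated formulas for a given string length. It is illustrative only: the function name `multiplication_counts` is hypothetical, $\log N$ is assumed to equal the modulus bit length (1024), and the counts for References [18,19] are left out because they depend on parameters not fixed in this section. The counts are not directly comparable in cost, since an elliptic-curve operation and a modular multiplication modulo $N$ take different amounts of time.

```python
def multiplication_counts(n, log_n_bits=1024):
    """Operation counts quoted in Section 5.1 for strings of length n.

    log_n_bits is an assumed value for log N (the modulus bit length).
    """
    return {
        "Reference [17]": 20 * n * log_n_bits,  # 20 n log N
        "Protocol 1": 16 * n + 6,
        "Protocol 2": 32 * n + 12,
    }

for name, count in multiplication_counts(10).items():
    print(f"{name}: {count}")
```

Protocol 2 needs exactly twice the operations of Protocol 1 for any $n$, which reflects that both parties now carry out the encryption and comparison work symmetrically.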

5.2. Experimental Analysis

To further verify the execution efficiencies of the protocols in this paper, the experiments were run on the Windows 10 64-bit (home version) operating system with an Intel(R) Core(TM) i7-6600 [email protected] GHZ processor and 8 GB of memory, and the implementation was written in Java and run in MyEclipse. The experiments used textual datasets for clinical diagnosis, such as patients’ electronic health records, medical records, and disease databases. These were encoded into vectors and combined with the protocol algorithm of this paper to determine the patients’ disease types, such as infectious diseases, cardiovascular diseases, and other types of diseases. The execution times of the protocols were then compared under the premise of achieving the same classification effect (i.e., the shorter the execution time, the higher the efficiency of the protocol).
The execution efficiencies of the protocols were obtained by comparing their execution times in simulation experiments. First, for each character length n = 1, 2, 3, …, 10, the simulation was run 1000 times and the average was taken. Figure 3 shows a comparison of the execution times for different string lengths in Protocol 2. The horizontal coordinate indicates the string length, and the vertical coordinate indicates the execution time. When n = 10, the execution time of Protocol 2 is 1000 ms.
Secondly, this study fixed the string length at n = 5, carried out 1000 simulation experiments for each value of the modulus N, and took the average. Figure 4 shows a comparison of the execution times for different lengths of the modulus N in Protocol 2. The horizontal coordinate indicates the length of the modulus N, and the vertical coordinate indicates the execution time. When n = 5 and N = 1024 bits, the execution time of Protocol 2 is 712 ms.
Next, Figure 5 compares the execution times of References [17,18,19] with those of Protocol 1 and Protocol 2 in this paper through simulation experiments. The abscissa represents the string length, and the ordinate represents the execution time of the protocol. When the string length is 10, the execution times of the protocols are as follows: Reference [18]: 1440 ms; Reference [17]: 1200 ms; Reference [19]: 1118 ms; and Protocol 2: 1000 ms, which shows that the protocol of this paper has the shortest execution time, and hence the highest efficiency, for the same string length. Figure 6 shows the execution times of References [17,18,19] and Protocol 1 and Protocol 2 for different lengths of the modulus N (with n = 5) through simulation experiments. The abscissa represents the length of the modulus N, and the ordinate represents the execution time of the protocol. When the length of the modulus N is 1024 bits, the execution times of the protocols are as follows: Reference [18]: 1024 ms; Reference [17]: 932 ms; Reference [19]: 800 ms; and Protocol 2: 712 ms, which shows that, for the same modulus length, the protocol in this paper has the shortest execution time and is the most efficient.
Of course, in order to compare the implementation efficiencies of the protocols more comprehensively, the delay times during the experiment should be considered. Figure 7 shows the delay times of Protocol 2 for different string lengths during the experiment. Figure 8 shows the delay times of Protocol 2 for different lengths of module N (when n = 5 ). For example, when n = 1 , we can see from Figure 5 that the execution time of Protocol 2 is 100 ms, and from Figure 7, that the delay time of Protocol 2 is 0.6 ms; thus, the total time consumed in the execution process of Protocol 2 should be 100.6 ms.
Figure 5 and Figure 6 show that the execution times of all the protocols increase as the string length or the modulus length grows. Compared with the protocols in References [17,18,19], Protocol 1 and Protocol 2 have the shortest execution times and are the most efficient. Moreover, Protocol 2 is not only efficient but also resistant to attacks by malicious adversaries, which is a substantial improvement in security. Therefore, Protocol 2 has greater practical value.

6. Conclusions

Text similarity computation has a wide range of applications in deep learning and natural language processing, including intelligent recommendation systems, information retrieval, and data mining. The existing SCTS protocols are inefficient and cannot resist malicious adversaries. Therefore, based on the efficient ECC encryption algorithm, this paper proposes an SCTS protocol under the semi-honest model. To counter the malicious behaviors that participants may commit when running the semi-honest protocol, an SCTS protocol under the malicious model is designed using the cut-and-choose and zero-knowledge-proof methods, and its security is proven with the real/ideal-model paradigm. Compared with existing schemes, the protocol proposed in this paper is more efficient and can resist malicious attacks, which gives it practical value.
In future work, we will extend the SCTS from two parties to multiple parties, and we will use the protocol in this paper as a building block for other secure multi-party computation problems, such as string-pattern matching and interval determination, so that it can be applied in more fields.

Author Contributions

Conceptualization, X.L. (Xin Liu) and R.W.; methodology, R.W. and X.L. (Xiaomeng Liu); investigation, X.L. (Xin Liu); software, D.L. and G.X.; experimental simulation, D.L.; security proof, D.L.; English grammar modification, N.X.; validation, N.X. and X.C.; writing (review and editing), X.L. (Xin Liu) and N.X.; writing (original draft), X.L. (Xin Liu) and X.C.; funding acquisition, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China: Big Data Analysis based on Software Defined Networking Architecture, grant numbers 62177019 and F0701; NSFC, grant numbers 62271070, 72293583, and 61962009; the Inner Mongolia Natural Science Foundation, grant number 2021MS06006; the 2023 Inner Mongolia Young Science and Technology Talents Support Project, grant number NJYT23106; the 2022 Fund Project of Central Government Guiding Local Science and Technology Development, grant number 2022ZY0024; the 2022 Fundamental Research Funds for the Inner Mongolia University of Science and Technology 2022-101; the Inner Mongolia Postgraduate Scientific Research Innovation Project, grant number 2023; the 2022 “Western Light” Talent Training Program “Western Young Scholars” Project, grant number 22040601; the 14th Five-Year Plan of Education and Science of Inner Mongolia, grant number NGJGH2021167; the Open Foundation of State Key Laboratory of Networking and Switching Technology (Beijing University of Posts and Telecommunications), grant number SKLNST-2023-1-08; the 2022 Inner Mongolia Postgraduate Education and Teaching Reform Project: JGSZ2022037; the 2022 Ministry of Education Central and Western China Young Backbone Teachers and Domestic Visiting Scholars Program, grant number 2022015; the Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory Open Project Fund, grant number IMDBD202020; the Baotou Kundulun District Science and Technology Plan Project, grant number YF2020013; the Inner Mongolia Science and Technology Major Project, grant number 2019ZD025; Project JCKY2021208B036, and the Fundamental Research Funds for Beijing Municipal Commission of Education, grant number 220201.

Data Availability Statement

The data supporting the findings of this study are included in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Theorem A1.
Protocol 2 (denoted as $\Pi$) is secure under the malicious model.
Proof: 
Suppose that, when the protocol is executed, the two parties use an admissible strategy pair $\bar{A} = (A_1, A_2)$. To prove that the protocol is secure under the malicious model, we show that $\bar{A} = (A_1, A_2)$ can be converted into a strategy pair $\bar{B} = (B_1, B_2)$ under the ideal model. Security is to be guaranteed when at least one of the two parties is honest, so the following two cases arise:
(1) $A_1$ is honest and $A_2$ is dishonest (that is, Alice is honest and Bob is dishonest). Then
$$\mathrm{REAL}_{\bar{A}}(z_1, z_2) = \{ F(z_1, A_2(z_2)),\ A_2((c_{11a}^{i}, c_{12a}^{i}), m_{a1}, S) \},$$
where $S$ is the data generated by $A_2$ in the zero-knowledge-proof process, $i = 1, \ldots, m$, and $F$ is the function executed by protocol $\Pi$. To demonstrate the security of the protocol, we only need to find a strategy pair $\bar{B} = (B_1, B_2)$ under the ideal model whose output is computationally indistinguishable from $\mathrm{REAL}_{\bar{A}}(z_1, z_2)$.
Because $A_1$ is honest (Alice is honest), $B_1$ sends the correct $z_1$ to the TTP. $B_1$ also allows the TTP to send the result to $B_2$ after receiving it, so there is no case in which $B_2$ does not receive the result. The data that $B_2$ sends to the TTP depend on $A_2$'s actual strategy ($B_2$ needs to call $A_2$ to determine them): $B_2$ gives $z_2$ to $A_2$, $A_2$ returns the input $A_2(z_2)$ that it uses in the real execution, and $B_2$ sends $A_2(z_2)$ to the TTP and receives $F(z_1, A_2(z_2))$ from it. In the ideal model, $B_2$ uses the $F(z_1, A_2(z_2))$ sent to it by the TTP to obtain $\mathrm{view}_{B_2}(z_1, A_2(z_2))$, which must be indistinguishable from the $\mathrm{view}_{A_2}(z_1, A_2(z_2))$ obtained by $A_2$ in the real execution, so that its output matches that of $A_2$ in the real execution. To this end, $B_2$ selects a $z_1'$ such that $F(z_1', A_2(z_2)) = F(z_1, A_2(z_2))$, then executes Protocol 2 to obtain $m_{a1}'$, $c_{11a}'$, and $c_{12a}'$, and records the sequence received in the zero-knowledge proof as $S'$:
$$\{\mathrm{IDEAL}_{\bar{B}}(z_1, z_2)\} = \{ F(z_1, A_2(z_2)),\ A_2((c_{11a}^{\prime\,i}, c_{12a}^{\prime\,i}), m_{a1}', S') \}.$$
Because the ciphertexts obtained in the real and ideal executions are produced by the same probabilistic algorithm, $c_{11a}^{i} \overset{c}{\equiv} c_{11a}^{\prime\,i}$, $c_{12a}^{i} \overset{c}{\equiv} c_{12a}^{\prime\,i}$, and $S \overset{c}{\equiv} S'$; thus, $\{\mathrm{REAL}_{\bar{A}}(z_1, z_2)\} \overset{c}{\equiv} \{\mathrm{IDEAL}_{\bar{B}}(z_1, z_2)\}$;
(2) $A_2$ is honest and $A_1$ is dishonest (that is, Bob is honest and Alice is dishonest). There are two situations:
(2.1) In the real execution, $A_1$ passes the zero-knowledge-proof verification and publishes the results:
$$\mathrm{REAL}_{\bar{A}}(z_1, z_2) = \{ A_1((c_{11b}^{i}, c_{12b}^{i}), m_{b1}, S),\ F(A_1(z_1), z_2) \}.$$
(2.2) In the real execution, $A_1$ does not perform the zero-knowledge-proof verification and does not announce the results:
$$\mathrm{REAL}_{\bar{A}}(z_1, z_2) = \{ A_1((c_{11b}^{i}, c_{12b}^{i}), m_{b1}, S),\ \perp \}.$$
Because $A_2$ is honest, $B_2$ sends the correct $z_2$ to the TTP (the protocol is not terminated during this period), while the data that $B_1$ sends to the TTP depend on $A_1$'s actual strategy ($B_1$ needs to call $A_1$ to determine them): $B_1$ gives $z_1$ to $A_1$, $A_1$ returns the input $A_1(z_1)$ that it uses in the real execution, and $B_1$ sends $A_1(z_1)$ to the TTP and receives $F(A_1(z_1), z_2)$ from it. If, in the real execution, $A_1$ does not perform the zero-knowledge proof or does not announce the results, then in the ideal model the TTP outputs $\perp$ to $B_2$. In the ideal model, $B_1$ uses the $F(A_1(z_1), z_2)$ sent to it by the TTP to obtain $\mathrm{view}_{B_1}(A_1(z_1), z_2)$, which must be indistinguishable from the $\mathrm{view}_{A_1}(A_1(z_1), z_2)$ obtained by $A_1$ in the real execution, so that its output matches that of $A_1$ in the real execution. That is, $B_1$ selects a $z_2'$ such that $F(A_1(z_1), z_2') = F(A_1(z_1), z_2)$, then executes Protocol 2 to obtain $m_{b1}'$, $c_{11b}'$, and $c_{12b}'$, and records the sequence received in the zero-knowledge proof as $S'$.
In the ideal model, when $B_1$ does not publish the results to $B_2$ through the TTP,
$$\mathrm{IDEAL}_{\bar{B}}(z_1, z_2) = \{ A_1((c_{11b}^{\prime\,i}, c_{12b}^{\prime\,i}), m_{b1}', S'),\ \perp \}.$$
In the ideal model, when $B_1$ publishes the results to $B_2$ through the TTP,
$$\mathrm{IDEAL}_{\bar{B}}(z_1, z_2) = \{ A_1((c_{11b}^{\prime\,i}, c_{12b}^{\prime\,i}), m_{b1}', S'),\ F(A_1(z_1), z_2) \}.$$
Because the ciphertexts obtained in the real and ideal executions are produced by the same probabilistic algorithm, $c_{11b}^{i} \overset{c}{\equiv} c_{11b}^{\prime\,i}$, $c_{12b}^{i} \overset{c}{\equiv} c_{12b}^{\prime\,i}$, and $S \overset{c}{\equiv} S'$; thus, $\{\mathrm{IDEAL}_{\bar{B}}(z_1, z_2)\} \overset{c}{\equiv} \{\mathrm{REAL}_{\bar{A}}(z_1, z_2)\}$.
To sum up, Protocol 2 is secure under the malicious model. □
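The ideal model used in the proof can be sketched in a few lines of code. This is only a hedged illustration, not the construction in the paper: the function names are hypothetical, `hamming_distance` is a placeholder for the functionality $F$ (an assumption, since the definition of $F$ is given elsewhere in the article), and `None` stands for the abort output. A dishonest party is modeled as a strategy that may replace its input before submitting it to the TTP, and the first party may refuse to let the TTP forward the result, as in case (2.2).

```python
from typing import Callable, Optional, Tuple

ABORT = None  # stands for the abort symbol (⊥) used in the proof


def hamming_distance(z1: str, z2: str) -> int:
    """Placeholder functionality F (an assumption, not the paper's definition)."""
    if len(z1) != len(z2):
        raise ValueError("inputs must have equal length")
    return sum(a != b for a, b in zip(z1, z2))


def ideal_execution(
    z1: str,
    z2: str,
    strategy1: Callable[[str], str] = lambda z: z,
    strategy2: Callable[[str], str] = lambda z: z,
    party1_releases: bool = True,
    f: Callable[[str, str], int] = hamming_distance,
) -> Tuple[int, Optional[int]]:
    """Return (output of party 1, output of party 2) in the ideal model."""
    submitted1 = strategy1(z1)  # an honest party submits its input unchanged
    submitted2 = strategy2(z2)  # a dishonest party may substitute A(z)
    result = f(submitted1, submitted2)  # computed by the trusted third party
    # Party 1 receives the result first and decides whether party 2 gets it.
    output2 = result if party1_releases else ABORT
    return result, output2


if __name__ == "__main__":
    print(ideal_execution("10110", "10011"))
    print(ideal_execution("10110", "10011", party1_releases=False))
    print(ideal_execution("10110", "10011", strategy2=lambda z: "00000"))
```

The point of the sketch is that, in the ideal model, a dishonest party can influence the outcome only by changing its own input or by withholding the result, which are exactly the behaviors the simulation argument has to reproduce.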

References

  1. Shahamiri, S.R. Speech vision: An end-to-end deep learning-based dysarthric automatic speech recognition system. IEEE Trans. Neural Syst. Rehabil. Eng. 2021, 29, 852–861.
  2. Lou, R.; Wang, W.; Li, X.; Zheng, Y.C.; Lv, Z.H. Prediction of Ocean Wave Height Suitable for Ship Autopilot. IEEE Trans. Intell. Transp. Syst. 2021, 23, 25557–25566.
  3. Lauriola, I.; Lavelli, A.; Aiolli, F. An introduction to deep learning in natural language processing: Models, techniques, and tools. Neurocomputing 2022, 470, 443–456.
  4. Khurana, D.; Koli, A.; Khatter, K.; Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 2023, 82, 3713–3744.
  5. Kumar, P.; Kumar, R.; Srivastava, G.; Gupta, G.P.; Tripathi, R.; Gadekallu, T.R.; Xiong, N.N. PPSF: A privacy-preserving and secure framework using blockchain-based machine-learning for IoT-driven smart cities. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2326–2341.
  6. Yao, Y.; Xiong, N.; Park, J.H.; Ma, L.; Liu, J. Privacy-preserving max/min query in two-tiered wireless sensor networks. Comput. Math. Appl. 2013, 65, 1318–1325.
  7. Huang, S.; Zeng, Z.; Ota, K.; Dong, M.; Wang, T.; Xiong, N. An intelligent collaboration trust interconnections system for mobile information control in ubiquitous 5G networks. IEEE Trans. Netw. Sci. Eng. 2020, 8, 347–365.
  8. Fu, A.; Zhang, X.L.; Xiong, N.; Gao, Y.S.; Wang, H.Q.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2020, 18, 3316–3326.
  9. Chen, Y.W.; Zhou, L.D.; Pei, S.W.; Yu, Z.W.; Chen, Y.; Liu, X.; Du, J.X.; Xiong, N. KNN-BLOCK DBSCAN: Fast clustering for large-scale data. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3939–3953.
  10. Zheng, R.; Wang, Q.; Lin, Z.; Jiang, Z.W.; Fu, J.M.; Peng, G.J. Cryptocurrency malware detection in real-world environment: Based on multi-results stacking learning. Appl. Soft Comput. 2022, 124, 109044.
  11. Yao, A.C. Protocols for secure computation. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science, Chicago, IL, USA, 3–5 November 1982; pp. 160–164.
  12. Goldreich, O. Foundations of Cryptography: Volume 2, Basic Applications; Cambridge University Press: London, UK, 2004.
  13. Cramer, R.; Damgård, I.B.; Nielsen, J.B. Secure Multiparty Computation; Cambridge University Press: London, UK, 2015.
  14. Tran, A.T.; Luong, T.D.; Karnjana, J.; Huynh, V.N. An efficient approach for privacy preserving decentralized deep learning models based on secure multi-party computation. Neurocomputing 2021, 422, 245–262.
  15. Zhang, E.; Li, H.; Huang, Y.; Hong, L.; Zhao, L.; Ji, C. Practical multi-party private collaborative k-means clustering. Neurocomputing 2022, 467, 256–265.
  16. Braun, L.; Demmler, D.; Schneider, T.; Tkachenko, O. MOTION – A Framework for Mixed-Protocol Multi-Party Computation. ACM Trans. Priv. Secur. 2022, 25, 1–35.
  17. Ma, M.; Xu, Y.; Liu, Z. Privacy preserving Hamming distance computing problem of DNA sequences. J. Comput. Appl. 2019, 39, 2636.
  18. Zhang, K.X.; Yang, C.; Li, S.D. Confidential calculation of string matching. J. Cryptol. 2022, 9, 619–632.
  19. Kang, J.; Li, S.D.; Yang, X.Y. Secure Multiparty Computation for String Pattern Matching. J. Cryptol. 2017, 4, 241–252.
  20. Fiori, F.J.; Pakalén, W.; Tarhio, J. Approximate string matching with SIMD. Comput. J. 2022, 65, 1472–1488.
  21. Xu, L.; Wei, X.; Cai, G.; Li, Y.; Wang, H. SWMQ: Secure wildcard pattern matching with query. Int. J. Intell. Syst. 2022, 37, 6262–6282.
  22. Wang, Y.N.; Dou, J.W.; Ge, X. Secure vector computation based on threshold. J. Cryptol. 2020, 7, 750–762.
  23. Guan, Z.; Zhou, X.; Liu, P.; Wu, L.F.; Yang, W.T. A blockchain based dual side privacy preserving multiparty computation scheme for edge enabled smart grid. IEEE Internet Things J. 2021, 9, 14287–14299.
  24. Li, S.D.; Wang, W.L.; Du, R.M. Protocol for millionaires’ problem in malicious models. Sci. Sin. Inf. 2021, 51, 75–78. (In Chinese)
Figure 1. Elliptic-curve operation.
Figure 2. Possible malicious behavior in Protocol 1.
Figure 3. Comparison of execution times for different string lengths in Protocol 2.
Figure 4. Comparison of execution times for different modulus N lengths in Protocol 2.
Figure 5. Comparison of execution times for different schemes (Reference [17]: Ma, M. 2019; Reference [18]: Zhang, K.X. 2022; Reference [19]: Kang, J. 2017).
Figure 6. Comparison of execution times for different schemes (Reference [17]: Ma, M. 2019; Reference [18]: Zhang, K.X. 2022; Reference [19]: Kang, J. 2017).
Figure 7. Comparison of delay times for different string lengths in Protocol 2.
Figure 8. Comparison of delay times for different modulus N lengths in Protocol 2.
Table 1. Protocol comparison.

| Protocol | Calculation Complexity (Modular Multiplication) | Communication Rounds | Scope of Application | Resistance to Malicious Behaviors |
|---|---|---|---|---|
| Protocol 1 | $16n + 6$ | 2 rounds | Rational number, string | No |
| Reference [17] | $20 n \log N$ | 2 rounds | String | No |
| Reference [18] | $[3 \log_2 N (n + 1)] \log N$ | 2 rounds | String | No |
| Protocol 2 | $32n + 12$ | 4 rounds | Rational number, string | Yes |
| Reference [19] | $8 (n + m - 1) l \log N$ | 4 rounds | String | No |
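
The complexity expressions in Table 1 can be evaluated for concrete parameters to get a feel for the gap between the schemes. The sketch below is illustrative only: the function names are hypothetical, and for Reference [17] it assumes that $\log N$ denotes the bit length of the modulus $N$, which the table does not state explicitly. The expressions for References [18] and [19] are left out because their remaining parameters are not defined in this section.

```python
def protocol1_mults(n: int) -> int:
    """Modular multiplications of Protocol 1 according to Table 1: 16n + 6."""
    return 16 * n + 6


def protocol2_mults(n: int) -> int:
    """Modular multiplications of Protocol 2 according to Table 1: 32n + 12."""
    return 32 * n + 12


def reference17_mults(n: int, modulus_bits: int) -> int:
    """Table 1 entry 20 n log N for Reference [17].

    Assumption: log N is taken as the bit length of the modulus N.
    """
    return 20 * n * modulus_bits


if __name__ == "__main__":
    n, bits = 5, 1024
    print("Protocol 1:", protocol1_mults(n))
    print("Protocol 2:", protocol2_mults(n))
    print("Reference [17]:", reference17_mults(n, bits))
```

Note that $32n + 12 = 2(16n + 6)$, so by these expressions Protocol 2 costs exactly twice as many modular multiplications as Protocol 1 for any string length $n$.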
