An Intelligent Semi-Honest System for Secret Matching against Malicious Adversaries

Liu, Xin; Kong, Jianwei; Luo, Dan; Xiong, Neal; Xu, Gang; Chen, Xiubo

doi:10.3390/electronics12122617



by Xin Liu $^{1,2}$, Jianwei Kong $^{1}$, Dan Luo $^{3,*}$, Neal Xiong $^{4}$, Gang Xu $^{5}$ and Xiubo Chen $^{6}$

$^{1}$ School of Information Engineering, Inner Mongolia University of Science and Technology, Baotou 014010, China

$^{2}$ School of Computer Science, Shaanxi Normal University, Xi’an 710062, China

$^{3}$ Computer Department, Tianjin Ren’ai College, Tianjin 301636, China

$^{4}$ Department of Computer, Mathematical and Physical Sciences, Sul Ross State University, Alpine, TX 79830, USA

$^{5}$ School of Information Science and Technology, North China University of Technology, Beijing 100144, China

$^{6}$ State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

$^{*}$ Author to whom correspondence should be addressed.

Electronics 2023, 12(12), 2617; https://doi.org/10.3390/electronics12122617

Submission received: 24 May 2023 / Revised: 6 June 2023 / Accepted: 7 June 2023 / Published: 10 June 2023


Abstract:

With natural language processing being an important research direction in deep learning, problems such as text similarity calculation, natural language inference, question answering, and information retrieval can be regarded as text matching applications for different data and scenarios. Secure matching computation of text string patterns can solve the privacy protection problem in fields such as biological sequence analysis, keyword search, and database query. In this paper, we propose an Intelligent Semi-Honest System (ISHS) for secret matching against malicious adversaries. First, a secure computation protocol based on the semi-honest model is designed for the secret matching of text strings; it adopts a new digital encoding method and an ECC encryption algorithm and provides a solution for honest participants. Second, a text string matching protocol under the malicious model, which uses the cut-and-choose method and zero-knowledge proofs, is designed to resist malicious behaviors that participants may commit in the semi-honest protocol. The correctness and security of the protocols are analyzed, and the protocols are more efficient and of greater practical value than existing algorithms. Secure text matching has important engineering applications.

Keywords:

natural language processing; text matching; secure multi-party computation; malicious model; cut-and-choose method

1. Introduction

In recent years, deep learning (DL) has been widely used in data processing, including vision tasks such as image classification, object detection, image segmentation, and 3D reconstruction. Natural language processing is another important area of DL research. For example, news applications push the news and content that people want to read, and social platforms filter and process text such as malicious comments and uncivil posts. Algorithms from natural language processing can greatly increase the efficiency of text-search-related work.

Text matching (TM) is a fundamental and important research direction in natural language processing, and research on TM can be applied to many natural language processing tasks [1]. For example, in information retrieval, the similarity between a query and the text of a web page can be calculated to find the best-matching web page [2,3,4,5,6,7,8]; in machine translation, the similarity between texts in the source and target languages can be calculated and used as an evaluation criterion [9,10,11,12,13,14,15,16,17]; in automatic question answering, TM between a question and candidate answers yields matching scores that are used to select the correct answer [18]; and TM algorithms can also be used in other areas such as dialogue systems and paraphrase identification.

In text matching, privacy protection [19] has become a focus of natural language processing. Secure multi-party computation can ensure that text strings are matched while remaining confidential. Secure multi-party computation [20] (MPC) is an important branch of cryptography that aims to solve the problem of cooperative computation among a group of mutually distrustful parties while protecting their private data. Throughout the execution of the computation protocol, the computation is completed without relying on a third party, and each participant obtains only the result, not the private data of the other participants. Most existing string matching protocols implement only approximate string matching, string matching in plaintext, exact matching in plaintext, or Bloom filter-based string matching in cloud computing, and their computational and communication complexity is relatively high. To address these limitations of existing secure text matching, namely high computational complexity and low efficiency, we propose a new method that solves the text matching problem securely. The method produces short ciphertexts and has lower communication and computation complexity, which makes it well suited to the secure text matching problem.

Current research on the secure matching of text strings includes algorithmic improvements to secure text matching (STM) [21,22], approximate matching based on Bloom filters [23,24,25,26], exact matching [27,28,29,30,31,32,33,34,35,36], and string equality problems [37,38,39]. Keyword matching is a very common and important area of STM. The keyword is the pattern string, and an algorithm that finds all positions at which the pattern string occurs in the text string is called a string pattern matching algorithm. Zhang, K.X. et al. [27] designed a new encoding method that handles the STM problem by hiding each participant's confidential data in a vector, and used the Paillier homomorphic encryption algorithm to design a secure protocol for deciding string pattern matching under the semi-honest model. Luo, Y.L. et al. [36] encoded each character as the binary number of its ASCII code and encrypted it with the ElGamal encryption algorithm, but the computational complexity was high. Luo, Y.L. et al. [36] also used the Goldwasser–Micali homomorphic encryption algorithm to design a string pattern matching protocol that compares characters by XORing their binary representations; this method also has relatively high computational complexity. Another protocol in Luo, Y.L. et al. [36] used a symmetric cryptographic algorithm for the STM problem, which is no longer secure once the key is leaked. Yasuda, M. et al. [38] used a homomorphic encryption scheme with a new data packing technique that packs binary data into a single ciphertext over a ring and determines whether strings match by computing multiple Hamming distances on the ciphertext. This improves efficiency, but the method can handle only single pattern matching (that is, the pattern string can appear only once in the text).

In this paper, given that the elliptic curve cryptosystem (ECC) is widely used because of its short keys, we propose an Intelligent Semi-Honest System (ISHS) for secret matching against malicious adversaries and analyze its computational efficiency and security.

The contributions are as follows:

(1): An encoding method suited to ECC encryption is proposed, which is simpler and more efficient.
(2): A text string fuzzy matching protocol based on the semi-honest model is designed, and its correctness is analyzed.
(3): Using cryptographic tools such as zero-knowledge proofs and the cut-and-choose method, the STM protocol is modified to resist malicious attacks that participants may mount against the semi-honest protocol. The security of the modified protocol is proved using the real/ideal model paradigm, and its efficiency is analyzed through experimental simulations.

The rest of the paper is organized as follows: Section 2 introduces the basic tools needed to construct secure protocols, the string encoding rules, and the security definitions of protocols; Section 3 constructs a secure string pattern matching protocol under the semi-honest model and analyzes its correctness; Section 4 constructs an MPC protocol for string pattern matching under the malicious model and analyzes and proves its security; Section 5 analyzes the performance of the protocol and introduces its engineering applications; Section 6 concludes the paper.

2. Related Work

2.1. Text String Encoding

Coding rules: Assume the full set $U = \{a, b, \dots, z\}$, whose elements are encoded as $\{11, 12, \dots, 36\}$; that is, each character is represented by a two-digit decimal number. For example, character $a$ is denoted as 11, character $b$ as 12, and so on, up to character $z$, which is denoted as 36.

Comparison rules: Let $S_A$ have length $n$ and $S_B$ have length $m$ ($m \leq n$). String $S_A$ has exactly $n - m + 1$ substrings of length $m$, one starting at each position $i = 1, \dots, n - m + 1$. Determining whether $S_B$ is a substring of $S_A$ therefore reduces to checking whether any of these $n - m + 1$ substrings equals $S_B$, which requires $n - m + 1$ rounds of comparison: if at least one of the $n - m + 1$ alignments yields 0, then $S_B$ is a substring of $S_A$. For convenience of description, the binary predicate is defined as follows:

$$P(S_A, S_B) = \begin{cases} 0, & \text{match} \\ 1, & \text{no match} \end{cases} \tag{1}$$

For example, Alice has string $S_A = acdbc$, which generates vector $A = (11, 13, 14, 12, 13)$ according to the encoding rules, and Bob has string $S_B = ac$, which generates vector $B = (11, 13)$. The details of the calculation are shown in Table 1.
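The encoding and comparison rules can be made concrete with a small plaintext sketch in Python. It only illustrates what the protocols below evaluate under encryption; the function names are ours, and only the lowercase letters of $U$ are supported. An alignment whose differences are all 0 is a position at which $S_B$ occurs in $S_A$.

```python
import string

# Coding rule of Section 2.1: 'a' -> 11, 'b' -> 12, ..., 'z' -> 36.
CODE = {ch: 11 + idx for idx, ch in enumerate(string.ascii_lowercase)}


def encode(text: str) -> list[int]:
    """Map a lowercase string to its vector of two-digit codes."""
    return [CODE[ch] for ch in text]


def window_differences(a: list[int], b: list[int]) -> list[list[int]]:
    """Element-wise differences a[i + j] - b[j] for each of the n - m + 1 alignments."""
    n, m = len(a), len(b)
    return [[a[i + j] - b[j] for j in range(m)] for i in range(n - m + 1)]


def predicate(s_a: str, s_b: str) -> int:
    """P(S_A, S_B) in plaintext: 0 if S_B occurs in S_A, 1 otherwise."""
    rows = window_differences(encode(s_a), encode(s_b))
    return 0 if any(all(d == 0 for d in row) for row in rows) else 1


print(encode("acdbc"), encode("ac"))  # [11, 13, 14, 12, 13] [11, 13]
print(window_differences(encode("acdbc"), encode("ac")))
# [[0, 0], [2, 1], [3, -1], [1, 0]]
print(predicate("acdbc", "ac"), predicate("acdbc", "bd"))  # 0 1
```

Only the first alignment is all zeros, so $P(S_A, S_B) = 0$ for the example above; the last alignment, $(1, 0)$, shows why every position of a window must be checked rather than just one.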

2.2. Elliptic Curve Cryptography

Elliptic curve cryptography (ECC) is a public-key encryption technique based on elliptic curves [40]. For example, over a prime field $\mathbb{F}_p$, the equation $y^2 = x^3 + ax + b$ defines an elliptic curve provided that $4a^3 + 27b^2 \neq 0$; its points, together with a point at infinity, form an additive group.

The ECC encryption and decryption process is as follows:

(1): Select an elliptic curve $E_p(a, b)$ and take a point on the curve as the base point $P$.
(2): Select a large number $k$ as the private key and generate the public key $Q = kP$.
(3): Encryption: choose a random number $r$, encode the plaintext $m$ as a point $M$, and compute the ciphertext $C$, which is a pair of points: $C = (rP, M + rQ)$. Point subtraction uses the negative of a point, $-(x, y) = (x, -y \bmod p) = (x, p - y)$.
(4): Decryption: compute $M + rQ - k(rP) = M + r(kP) - k(rP) = M$, then recover $m$ from $M$.

Homomorphic addition: given two ciphertexts $C_1 = (r_1 P, M_1 + r_1 Q)$ and $C_2 = (r_2 P, M_2 + r_2 Q)$, their component-wise sum is

$$C_1 + C_2 = \left((r_1 + r_2) P, (M_1 + M_2) + (r_1 + r_2) Q\right),$$

which is an encryption of $M_1 + M_2$ under randomness $r_1 + r_2$; decrypting it therefore yields $M_1 + M_2$.
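To see the scheme and its additive homomorphism at work, here is a toy Python sketch on the textbook curve $y^2 = x^3 + 2x + 2$ over $\mathbb{F}_{17}$ with base point $P = (5, 1)$ of order 19. The parameters are chosen only to keep the numbers small and offer no security; embedding a plaintext $m$ as the point $mP$ is our assumption, since the text leaves the point encoding open. The ciphertext layout $(rP, M + rQ)$ follows the steps above.

```python
import secrets

# Toy curve y^2 = x^3 + 2x + 2 over F_17; P = (5, 1) generates a group of prime
# order 19. Illustrative only: these parameters provide no security.
p, a, b = 17, 2, 2
P, ORDER = (5, 1), 19
O = None  # point at infinity (the group identity)


def add(U, V):
    """Elliptic-curve point addition in affine coordinates."""
    if U is O:
        return V
    if V is O:
        return U
    (x1, y1), (x2, y2) = U, V
    if x1 == x2 and (y1 + y2) % p == 0:
        return O  # V == -U
    if U == V:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)


def neg(U):
    """Negative of a point: -(x, y) = (x, p - y)."""
    return O if U is O else (U[0], (-U[1]) % p)


def mul(k, U):
    """Scalar multiplication kU by double-and-add."""
    R = O
    k %= ORDER  # every point on this curve has order dividing 19
    while k:
        if k & 1:
            R = add(R, U)
        U = add(U, U)
        k >>= 1
    return R


def keygen():
    k = secrets.randbelow(ORDER - 1) + 1  # private key in [1, 18]
    return k, mul(k, P)                   # (k, Q = kP)


def encrypt(Q, M):
    r = secrets.randbelow(ORDER - 1) + 1
    return (mul(r, P), add(M, mul(r, Q)))  # C = (rP, M + rQ)


def decrypt(k, C):
    C1, C2 = C
    return add(C2, neg(mul(k, C1)))  # M + rQ - k(rP) = M


def hom_add(C, D):
    """Component-wise sum of two ciphertexts: an encryption of M1 + M2."""
    return (add(C[0], D[0]), add(C[1], D[1]))


k, Q = keygen()
M1, M2 = mul(3, P), mul(7, P)  # plaintexts 3 and 7 embedded as 3P and 7P
C = hom_add(encrypt(Q, M1), encrypt(Q, M2))
print(decrypt(k, C) == mul(10, P))  # True: decrypts to (3 + 7)P
```

Note that `hom_add` never touches the private key; only `decrypt` does, which is what lets one party combine ciphertexts that it cannot read.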

2.3. Cut-and-Choose Method

The cut-and-choose method [41] is an important cryptographic tool used in most protocols under the malicious model. One party constructs and sends a large number of circuits; the other party randomly selects half of them and asks the sender to open them so that their correctness can be checked, and the secure computation is then carried out on the remaining, unopened half. For example, a cut-and-choose functionality between Alice (the sender) and Bob (the receiver R) is defined as follows:

Input:

(1): Alice inputs $l$ vectors $\vec{x}_i$, $i = 1, 2, \dots, l$, each composed of $s$ pairs, that is, $\vec{x}_i = \langle (x_0^{i,1}, x_1^{i,1}), (x_0^{i,2}, x_1^{i,2}), \dots, (x_0^{i,s}, x_1^{i,s}) \rangle$. She also inputs $s$ values $X_1, X_2, \dots, X_s \in \{0, 1\}^n$.
(2): Bob inputs bits $\sigma_1, \sigma_2, \dots, \sigma_l \in \{0, 1\}$ and a set of indices $\zeta \subseteq [s]$.

Output:

(1): For every $i = 1, \dots, l$ and every $j \in \zeta$, the receiver R obtains the entire $j$th pair of vector $\vec{x}_i$, that is, $(x_0^{i,j}, x_1^{i,j})$.
(2): For every $i = 1, \dots, l$ and every $k \notin \zeta$, the receiver R obtains only the $\sigma_i$th element of the $k$th pair, that is, $x_{\sigma_i}^{i,k}$; in addition, R outputs $X_k$ for every $k \notin \zeta$.
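The input/output behaviour above can be written down as an ideal (trusted-party) function, which may help when reading the protocols later. The Python sketch below is only that: it performs no cryptography, uses 0-based indices, and its names are ours. A real realization would rely on oblivious transfer so that Alice learns neither the bits $\sigma_i$ nor the set $\zeta$.

```python
def cut_and_choose_functionality(x, checks, sigma, zeta):
    """Ideal behaviour of the cut-and-choose functionality (no cryptography).

    x      -- Alice's l vectors; x[i][j] is the pair (x_0^{i,j}, x_1^{i,j})
    checks -- Alice's s values X_1, ..., X_s
    sigma  -- Bob's choice bits sigma_1, ..., sigma_l
    zeta   -- Bob's set of indices j whose pairs are fully opened (0-based)
    Returns what the receiver R (Bob) learns; Alice learns nothing.
    """
    l, s = len(x), len(checks)
    if len(sigma) != l or any(len(vec) != s for vec in x):
        raise ValueError("inconsistent input sizes")
    if any(bit not in (0, 1) for bit in sigma) or not set(zeta) <= set(range(s)):
        raise ValueError("invalid choice bits or index set")
    # j in zeta: both elements of the j-th pair of every vector.
    opened = {j: [x[i][j] for i in range(l)] for j in sorted(zeta)}
    # k not in zeta: only the sigma_i-th element of the k-th pair of vector i.
    chosen = {k: [x[i][k][sigma[i]] for i in range(l)]
              for k in range(s) if k not in zeta}
    revealed_checks = {k: checks[k] for k in range(s) if k not in zeta}
    return opened, chosen, revealed_checks


x = [[("a0", "a1"), ("b0", "b1"), ("c0", "c1")],
     [("d0", "d1"), ("e0", "e1"), ("f0", "f1")]]  # l = 2 vectors, s = 3 pairs each
opened, chosen, revealed = cut_and_choose_functionality(
    x, ["X1", "X2", "X3"], sigma=[1, 0], zeta={0})
print(opened)    # {0: [('a0', 'a1'), ('d0', 'd1')]}
print(chosen)    # {1: ['b1', 'e0'], 2: ['c1', 'f0']}
print(revealed)  # {1: 'X2', 2: 'X3'}
```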

2.4. Security under the Malicious Model

The malicious model [42] is a more general model for secure multi-party computation (MPC) than the semi-honest model. Because MPC protocols under the malicious model must prevent or detect the behaviors of malicious participants, they are more difficult to design than protocols under the semi-honest model. To prove that an MPC protocol is secure under the malicious model, one must show that it satisfies the corresponding security definition, namely that the real protocol achieves the same security as the ideal protocol. This proof method is called the real/ideal model paradigm.

Ideal protocol: $P_1$ and $P_2$ hold private data $x$ and $y$, respectively, and want to jointly compute the function $f(x, y) = (f_1(x, y), f_2(x, y))$. The computation relies on a trusted third party (TTP), and in the end $P_1$ and $P_2$ obtain $f_1(x, y)$ and $f_2(x, y)$, respectively. The concrete process is as follows:

(1): $P_1$ and $P_2$ send $x$ and $y$ to the TTP, respectively. If $P_i$ ($i = 1, 2$) is honest, it sends its correct data to the TTP. If $P_i$ is malicious, it may send a false input $x'$ or $y'$ derived from its private data, or it may refuse to execute the protocol. Such behaviors affect the result even in the ideal model, so they are not regarded as attacks that the protocol must prevent.
(2): After receiving $x$ and $y$, the TTP computes $f(x, y)$, sends $f_1(x, y)$ to $P_1$, and sends $f_2(x, y)$ to $P_2$.

In the ideal protocol, $P_1$ and $P_2$ learn nothing about each other's data beyond $f_i(x, y)$ ($i = 1, 2$). The ideal protocol is therefore the most secure possible; if a protocol designed under the malicious model achieves the same security as the ideal protocol, the real protocol can be considered secure. In addition, the malicious model requires at least one of the parties to be honest: no protocol can be secure if all participants are malicious adversaries.

In the ideal model, the parties hold auxiliary information $z$. The joint computation of $F(x, y)$ under the ideal-model strategy $\bar{B} = (B_1, B_2)$ is denoted $\mathrm{IDEAL}_{F, \bar{B}(z)}(x, y)$. Choosing a random number $r$, we have $\mathrm{IDEAL}_{F, \bar{B}(z)}(x, y) = \gamma(x, y, z, r)$, where $\gamma(x, y, z, r)$ is defined as follows:

(1): If $P_1$ is honest,

$$\gamma(x, y, z, r) = \left(f_1(x, y'), B_2(y, z, r, f_2(x, y'))\right), \tag{2}$$

where $y' = B_2(y, z, r)$.
(2): If $P_2$ is honest,

$$\gamma(x, y, z, r) = \begin{cases} \left(B_1(x, z, r, f_1(x', y), \bot), \bot\right), & \text{if } B_1(x, z, r, f_1(x', y)) = \bot \\ \left(B_1(x, z, r, f_1(x', y)), f_2(x', y)\right), & \text{otherwise} \end{cases} \tag{3}$$

where $x' = B_1(x, z, r)$, and $\bot$ indicates that $P_1$ aborts after receiving its own output, so that $P_2$ receives nothing.

Definition 1.

Security of MPC protocols under the malicious model

Let $F: \{0, 1\}^* \times \{0, 1\}^* \to \{0, 1\}^* \times \{0, 1\}^*$ be a probabilistic polynomial-time functionality, and let $\Pi$ be a two-party protocol for computing $F$ in which the private inputs of the two parties are $x$ and $y$, respectively. Let $\bar{A} = (A_1, A_2)$ be a pair of probabilistic polynomial-time algorithms representing a strategy in the real model, and let $\mathrm{REAL}_{\Pi, \bar{A}(z)}(x, y)$ denote the output pair of the interaction between $A_1(x, z)$ and $A_2(y, z)$ when $\Pi$ is executed with strategy $\bar{A}$ on auxiliary input $z$.

If for every admissible strategy $\bar{A} = (A_1, A_2)$ in the real protocol there exists an admissible strategy $\bar{B} = (B_1, B_2)$ in the ideal protocol such that

$$\{\mathrm{IDEAL}_{F, \bar{B}(z)}(x, y)\}_{x, y, z} \overset{c}{\equiv} \{\mathrm{REAL}_{\Pi, \bar{A}(z)}(x, y)\}_{x, y, z}, \tag{4}$$

where $\overset{c}{\equiv}$ denotes computational indistinguishability, then protocol $\Pi$ securely computes function $F$.

3. Secure Text Matching Protocol under the Semi-Honest Model

Problem description: Secure text matching determines whether one string is a substring of another. Alice has string $S_A$ of length $n$, and Bob has string $S_B$ of length $m$ ($m \leq n$). Both parties want to know whether $S_B$ is a substring of $S_A$ without revealing any other information.

Solution idea: The first element of string $S_A$ and the $m - 1$ elements following it form substring $s_{a1}$ of length $m$. The corresponding characters of $s_{a1}$ and $S_B$ are combined using the additive homomorphism of ECC to determine whether $s_{a1}$ and $S_B$ are equal; a position at which the two characters are equal yields 0. Determining whether substring $s_{a1}$ equals $S_B$ therefore requires $m$ comparisons, and $s_{a1}$ equals $S_B$ if and only if all $m$ comparisons yield 0. This operation is repeated in a loop: in the $i$th round, the $i$th element of $S_A$ and the $m - 1$ elements following it form substring $s_{ai}$ of length $m$, which is compared with $S_B$. Therefore, determining whether $S_B$ is a substring of $S_A$ requires $n - m + 1$ rounds in total, as shown in Figure 1.

If at least one of the $n - m + 1$ comparison rounds yields 0, then $S_B$ is a substring of $S_A$. Since $S_A$ has exactly $n - m + 1$ substrings of length $m$, determining whether $S_B$ is a substring of $S_A$ reduces to determining whether any of these $n - m + 1$ substrings of $S_A$ equals $S_B$.

Encoding method: Assume the full set $U = \{a, b, \dots, z\}$ is encoded as $\{11, 12, \dots, 36\}$, so that each character is represented by a two-digit decimal number (for example, character $a$ is represented as 11). Alice has string $S_A = a_1 a_2 \dots a_n$ over $U$. Using the correspondence between the characters of $S_A$ and the elements of $U$, $S_A$ is transformed into vector $A = (a'_1, a'_2, \dots, a'_n)$. Bob has string $S_B = b_1 b_2 \dots b_m$ over $U$; similarly, $S_B$ is transformed into vector $B = (b'_1, b'_2, \dots, b'_m)$. Bob takes $m$ consecutive elements $A' = (a'_i, a'_{i+1}, \dots, a'_{i+m-1})$ of vector $A$, moving from left to right, and combines them with the elements of vector $B$ using ECC homomorphic addition. If $S_B$ is a substring of $S_A$, then at least one of the resulting difference vectors is the zero vector, and all elements of that vector sum to 0.
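A plaintext sketch of the masked comparison may clarify why a zero appears exactly at a matching window. It assumes (our choice, not the paper's) that plaintexts live in $\mathbb{Z}_q$ for a prime $q$, so that $N - b'_j$ acts as $-b'_j$; the masks $r_{ij}$ are fresh nonzero values, and the sign bit $s$ selects $a' - b'$ or $b' - a'$, mirroring the formula for $E(\omega_i)$ in Protocol 1. Encryption is omitted: the returned list is what Alice would see after decryption.

```python
import secrets

Q = 2**61 - 1  # prime modulus standing in for the plaintext space (assumption)


def masked_windows(a: list[int], b: list[int], s: int) -> list[int]:
    """omega_i = sum_j r_ij * (a[i + j] - b[j]) mod Q, with the sign flipped when s = 1."""
    n, m = len(a), len(b)
    omegas = []
    for i in range(n - m + 1):
        acc = 0
        for j in range(m):
            r = secrets.randbelow(Q - 1) + 1  # fresh nonzero mask r_ij
            diff = a[i + j] - b[j] if s == 0 else b[j] - a[i + j]
            acc = (acc + r * diff) % Q
        omegas.append(acc)
    return omegas


A = [11, 13, 14, 12, 13]  # "acdbc"
B = [11, 13]              # "ac"
W = masked_windows(A, B, s=secrets.randbelow(2))
print(W[0] == 0, [w != 0 for w in W[1:]])
```

A non-matching window gives 0 only if the random combination of its nonzero differences happens to vanish, which occurs with probability roughly $1/q$ per window; the masks also hide how far apart the mismatched codes are.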

Correctness analysis:

(1): In Step (4), the calculation is performed for each $i \in [1, n - m + 1]$. Bob selects the $i$th character of $S_A$ and the $m - 1$ characters after it (in encrypted form) to form a substring of length $m$ and combines it homomorphically with the elements of $E(B)$ to obtain $E(\omega_i) = \prod_{j=1}^{m} E(a'_{i+j-1}) \times E(N - b'_j)$. A total of $n - m + 1$ rounds are required.
(2): In Step (6), Alice decrypts the result $E(W)$ to obtain set $W$. If any element of $W$ is 0, then $S_B$ is a substring of $S_A$; otherwise, it is not.
(3): When computing $E(\omega_i)$, Bob randomly selects $s \in \{0, 1\}$ and random numbers $r_{ij}$; these keep the data secure and prevent negative numbers during the ECC additive homomorphic calculation.

Protocol 1 is secure under the semi-honest model, in which participants execute the protocol according to its rules. In practice, however, participants may behave maliciously, so STM protocols under the malicious model are also needed.

Protocol 1. STM protocol for two strings under the semi-honest model.

Input: Alice's string $S_A = a_1 a_2 \dots a_n$ and Bob's string $S_B = b_1 b_2 \dots b_m$, where $m \leq n$.

Output:

$$P(S_A, S_B) = \begin{cases} 0, & S_B \subseteq S_A \\ 1, & S_B \not\subseteq S_A \end{cases}$$

(1): Alice selects elliptic curve $E_p(a, b)$, base point $G$, and private key $k$, computes the public key $K = kG$, and sends $E_p(a, b)$, $K$, and $G$ to Bob.
(2): Alice transforms $S_A$ into $A = (a'_1, a'_2, \dots, a'_n)$ according to the encoding method and encodes each $a'_i$ ($i = 1, 2, \dots, n$) as a point $M_i$ on $E_p$. She chooses $n$ random numbers $r_{ai}$ and uses public key $K$ to encrypt each $M_i$ separately, computing $E(M_i) = (C_{a1i}, C_{a2i})$ with $C_{a1i} = M_i + r_{ai} K$ and $C_{a2i} = r_{ai} G$. She obtains the ciphertext $E(A) = (E(M_1), E(M_2), \dots, E(M_n))$ and sends it to Bob.
(3): Bob transforms $S_B$ into $B = (b'_1, b'_2, \dots, b'_m)$ according to the encoding method and encodes each $b'_i$ ($i = 1, 2, \dots, m$) as a point $N_i$ on $E_p$. He chooses $m$ random numbers $r_{bi}$ and uses public key $K$ to encrypt each $N_i$ separately, computing $E(N_i) = (C_{b1i}, C_{b2i})$ with $C_{b1i} = N_i + r_{bi} K$ and $C_{b2i} = r_{bi} G$, to obtain the ciphertext $E(B) = (E(N_1), E(N_2), \dots, E(N_m))$.
(4): Bob randomly selects $s \in \{0, 1\}$ and random numbers $r_{ij}$ and computes, for each $i$,

$$E(\omega_i) = \begin{cases} \prod_{j=1}^{m} \left(E(a'_{i+j-1}) \times E(N - b'_j)\right)^{r_{ij}}, & s = 0 \\ \prod_{j=1}^{m} \left(E(N - a'_{i+j-1}) \times E(b'_j)\right)^{r_{ij}}, & s = 1 \end{cases}$$

(5): After $n - m + 1$ rounds of calculation, Bob obtains $E(W) = \{E(\omega_1), E(\omega_2), \dots, E(\omega_{n-m+1})\}$ and sends $E(W)$ to Alice.
(6): Alice decrypts $E(W)$ with her private key $k$ to obtain set $W$. If at least one element of $W$ is 0, she outputs $P(S_A, S_B) = 0$, meaning that $S_B$ is a substring of $S_A$; otherwise, she outputs $P(S_A, S_B) = 1$, meaning that $S_B$ is not a substring of $S_A$.

The Protocol ends.
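To trace Protocol 1 end to end, the Python sketch below runs all six steps in a single process. It is a correctness check, not an implementation, and several details are our assumptions: the public secp256k1 parameters are used simply as a convenient prime-order curve, each code $a'$ is embedded as the point $a'G$ (the protocol does not fix a point encoding), Bob encrypts $-b'_jG$ directly in place of $E(N - b'_j)$, and the point at infinity plays the role of the decrypted value 0. It uses no constant-time arithmetic and must not be used to protect real data.

```python
import secrets

# secp256k1 (y^2 = x^3 + 7 over F_p), used only as a convenient prime-order curve.
FP = 2**256 - 2**32 - 977
ORDER = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)
assert (G[1] ** 2 - G[0] ** 3 - 7) % FP == 0  # sanity check: G is on the curve


def add(U, V):
    """Affine point addition; None is the point at infinity."""
    if U is None:
        return V
    if V is None:
        return U
    (x1, y1), (x2, y2) = U, V
    if x1 == x2 and (y1 + y2) % FP == 0:
        return None
    if U == V:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, FP) % FP
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, FP) % FP
    x3 = (lam * lam - x1 - x2) % FP
    return (x3, (lam * (x1 - x3) - y1) % FP)


def neg(U):
    return None if U is None else (U[0], -U[1] % FP)


def mul(k, U):
    """Double-and-add scalar multiplication kU (k >= 0)."""
    R = None
    while k:
        if k & 1:
            R = add(R, U)
        U = add(U, U)
        k >>= 1
    return R


def rand_scalar():
    return secrets.randbelow(ORDER - 1) + 1


# Ciphertexts use Protocol 1's layout (C1, C2) = (M + rK, rG).
def enc(K, M):
    r = rand_scalar()
    return (add(M, mul(r, K)), mul(r, G))


def dec(k, C):
    return add(C[0], neg(mul(k, C[1])))


def c_add(C, D):
    return (add(C[0], D[0]), add(C[1], D[1]))


def c_neg(C):
    return (neg(C[0]), neg(C[1]))


def c_mul(r, C):
    return (mul(r, C[0]), mul(r, C[1]))


CODE = {ch: 11 + i for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz")}


def protocol1(s_a: str, s_b: str) -> int:
    k = rand_scalar()                                    # Step (1): Alice's key pair
    K = mul(k, G)
    EA = [enc(K, mul(CODE[c], G)) for c in s_a]          # Step (2): E(A), a' embedded as a'G
    ENB = [enc(K, neg(mul(CODE[c], G))) for c in s_b]    # Step (3): Bob encrypts -b'G
    n, m = len(s_a), len(s_b)
    s = secrets.randbelow(2)                             # Step (4): random sign bit
    EW = []
    for i in range(n - m + 1):                           # Steps (4)-(5): one E(omega_i) per window
        acc = (None, None)
        for j in range(m):
            term = c_add(EA[i + j], ENB[j])              # encrypts (a' - b')G
            if s == 1:
                term = c_neg(term)                       # encrypts (b' - a')G
            acc = c_add(acc, c_mul(rand_scalar(), term))  # fresh mask r_ij
        EW.append(acc)
    W = [dec(k, C) for C in EW]                          # Step (6): Alice decrypts
    return 0 if any(w is None for w in W) else 1         # infinity plays the role of 0


print(protocol1("acdbc", "ac"), protocol1("acdbc", "bd"))
```

`protocol1` returns $P(S_A, S_B)$: for the strings of Table 1 it returns 0 for `"ac"` and 1 for a pattern such as `"bd"` that does not occur, barring a false zero whose probability is about the reciprocal of the group order.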

Advantages and disadvantages: The protocol under the semi-honest model requires fewer computation steps, so it is less computationally intensive, and it has fewer communication rounds, which makes it faster. Its disadvantage is that it cannot resist attacks by malicious adversaries: if a participant behaves maliciously, the protocol becomes insecure. It is therefore necessary to design protocols under the malicious model.

4. Secure Text Matching Protocol under the Malicious Model

Designing an MPC protocol under the malicious model usually involves devising countermeasures against the malicious behaviors that participants could commit in the corresponding semi-honest protocol, so that such behaviors are either prevented or detected.

Note first that some malicious behaviors cannot be prevented even in the ideal model, and therefore cannot be prevented under the malicious model either. They are of three types: (1) a participant inputs wrong data; (2) a participant refuses to participate in the protocol; (3) a participant aborts the protocol midway after receiving the information it wants.

Possible malicious behaviors in Protocol 1 include the following (as shown in Figure 2):

(1): In Protocol 1, Alice holds both public key $K$ and private key $k$, whereas Bob holds only public key $K$. Moreover, only Alice decrypts the final result, which is unfair to Bob. The countermeasure is to let both participants perform decryption.
(2): In Steps 2 and 3 of Protocol 1, the ciphertexts computed by Alice and Bob may be incorrect, in which case neither party obtains the correct result. The countermeasure is to use the cut-and-choose method and zero-knowledge proofs.
(3): In Step 6 of Protocol 1, Alice may tell Bob a wrong result after decryption, leading Bob to a wrong conclusion. The countermeasure is to give Bob and Alice equal status: each generates its own public and private keys at the same time.

4.1. Specific Protocols

To prevent these malicious behaviors and obtain a protocol that is secure, fair, and yields correct conclusions under the malicious model, we extend Protocol 1 with cryptographic tools such as the cut-and-choose method and zero-knowledge proofs to block possible malicious behaviors; in addition, the final result is computed by both parties simultaneously.

4.2. Correctness Analysis

(1): In Step (4), Alice and Bob use their private keys $k_1$ and $k_2$ to decrypt sets $W$ and $B$, respectively.

(2): The values $(c_{a1}^i, c_{a2}^i)$ and $(c_{b1}^i, c_{b2}^i)$ published in Step (5) disclose no information, because each is masked with random numbers.

(3): In Step (7), Alice and Bob compute, respectively,

$$c_b = a(c_{b2}^j - c_{b1}^j - W + K_2) = a(B + f_j B + bG - f_j B - K_2 - W + K_2) = a(B - W) + abG,$$

$$c_a = b(c_{a2}^j - c_{a1}^j - B + K_1) = b(W + d_j W + aG - d_j W - K_1 - B + K_1) = b(W - B) + abG.$$

Then Alice and Bob send $c_b + P_1$ and $c_a + P_2$ to each other, respectively.

(4): Suppose that in Step (5) Alice chooses a wrong random number $a_i$ and that Bob does not select it among the $\frac{t}{2}$ groups he checks in Step (6), so the wrong $a_i$ goes undetected. If Bob then picks it in Step (7), he finally computes a wrong result. Alice's chance of success with this strategy is maximized when she mixes a single wrong $a_i$ into the $t$ random numbers; in this case, the probability of successful deception is $\frac{1}{t}$. For $t = 20$, this probability is

$$\frac{C_{19}^{10}}{C_{20}^{10}} \times \frac{1}{10} = \frac{1}{2} \times \frac{1}{10} = \frac{1}{20}.$$

If instead Alice mixes in 10 wrong $a_i$, she succeeds only if none of them is opened, in which case every remaining group is wrong, so the probability of successful deception is

$$\frac{C_{10}^{10}}{C_{20}^{10}} \times 1 = \frac{1}{184756} \approx 5.4 \times 10^{-6},$$

which is much smaller. If more than $\frac{t}{2}$ of the $t$ random numbers are wrong, at least one of them is necessarily opened, so Alice is always detected in the subsequent verification. Therefore, the protocol is secure against this malicious behavior.

(5): The result that Alice and Bob obtain in Step (10) is correct for the following reasons.

After Alice has verified with the zero-knowledge proof that Bob sent the correct $m_b$, she obtains the correct answer by computing $m_b - av$:

$$m_b - av = m_b - abK_2 = k_2 c_b - abk_2 G = k_2 a(B - W) + k_2 abG - abk_2 G = k_2 a(B - W).$$

After Bob has verified with the zero-knowledge proof that Alice sent the correct $m_a$, he obtains the correct answer by computing $m_a - bu$:

$$m_a - bu = m_a - baK_1 = k_1 c_a - abk_1 G = k_1 b(W - B) + k_1 abG - abk_1 G = k_1 b(W - B).$$

(6): In Step (11), Alice and Bob decode the ciphertext sets $W$ and $B$, respectively. No information is leaked in the computation between the two parties.

(7): No secret data are leaked throughout the process, and each party obtains the result itself, which avoids the unfairness of one party telling the other the result.
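The deception probabilities in item (4) can be computed exactly with a few lines of Python under the following model: $w$ of Alice's $t$ groups are wrong, Bob audits $t/2$ of them chosen uniformly at random, and one of the remaining $t/2$ groups, also chosen uniformly, is used in Step (7). The function name is ours; exact fractions avoid rounding.

```python
from fractions import Fraction
from math import comb


def cheat_success(t: int, w: int) -> Fraction:
    """Chance that w wrong groups out of t all escape an audit of t/2 random groups
    and that the single group used afterwards (uniform over the t/2 unopened ones)
    is a wrong one."""
    half = t // 2
    unopened = Fraction(comb(t - w, half), comb(t, half))  # no wrong group is audited
    used_wrong = Fraction(w, t - half)                     # a wrong group is then used
    return unopened * used_wrong


for w in (1, 2, 10, 11):
    print(w, cheat_success(20, w), float(cheat_success(20, w)))
```

With $t = 20$ this gives $1/20$ for a single wrong group, $9/190$ for two, $1/184756 \approx 5.4 \times 10^{-6}$ for ten, and 0 for eleven or more, since `math.comb` returns 0 when fewer than $t/2$ correct groups exist; the maximum over $w$ is attained at $w = 1$, where the value is $1/t$ for any even $t$.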

4.3. Proof of Security

For MPC protocols under the malicious model, the widely accepted real/ideal model paradigm is used to prove security.

Theorem 1.

The STM protocol (Protocol 2) is secure under the malicious model.

Protocol 2. STM protocol for two strings under the malicious model.

Input: Alice's string $S_A = a_1 a_2 \dots a_n$ and Bob's string $S_B = b_1 b_2 \dots b_m$, where $m \leq n$.

Output:

$$P(S_A, S_B) = \begin{cases} 0, & S_B \subseteq S_A \\ 1, & S_B \not\subseteq S_A \end{cases}$$

Alice and Bob jointly choose elliptic curve $E_{P}$ and base point $G$ . Alice and Bob choose their private keys $k_{1}, k_{2} (k_{1}, k_{2} > 0)$ and random numbers $a$ and $b$ , respectively; then, Alice and Bob compute their respective public keys $K_{1} = k_{1} G, K_{2} = k_{2} G$ and $u = a K_{1}, v = b K_{2}$ ; finally, Alice and Bob exchange $(K_{1}, u)$ and $(K_{2}, v)$ .
Alice transforms $S_{A}$ into $A = (a_{1} ’, a_{2} ’, \dots, a_{n} ’)$ according to the encoding and encodes $a_{i} ’ (i \in 1, 2, \dots, n)$ to points $M_{i}^{a} (1 \leq i \leq n)$ of elliptic curve $E_{p}$ one by one. Alice chooses $n$ random numbers $r_{a i} (i \in 1, 2, \dots, n)$ and uses public key $K_{1}$ to encrypt each element $M_{i}^{a}$ separately, that is, she calculates $E (M_{i}^{a}) = (C_{a 1 i}^{a}, C_{a 2 i}^{a})$ corresponding to each element $M_{i}^{a}$ , where $C_{a 1 i}^{a} = M_{i}^{a} + r_{a i} K_{1}, C_{a 2 i}^{a} = r_{a i} G$ , to obtain ciphertext $E (A) = (E (M_{1}^{a}), E (M_{2}^{a}), \dots, E (M_{n}^{a}))$ and sends $E (A)$ to Bob.
Bob transforms $S_{B}$ into $B = (b_{1} ’, b_{2} ’, \dots b_{m} ’)$ according to the encoding and encodes $b_{i} ’ (i \in 1, 2, \dots, m)$ to points $N_{i}^{b} (1 \leq i \leq m)$ of elliptic curve $E_{p}$ one by one. Alice chooses $m$ random numbers $r_{b i} (i \in 1, 2, \dots, m)$ and uses public key $K_{2}$ to encrypt each element $N_{i}^{b}$ separately, that is, she calculates the $E (N_{i}^{b}) = (C_{b 1 i}^{b}, C_{b 2 i}^{b})$ corresponding to each element $N_{i}^{b}$ , where $C_{b 1 i}^{b} = N_{i}^{b} + r_{b i} K, C_{b 2 i}^{b} = r_{b i} G$ , to obtain ciphertext $E (B) = (E (N_{1}^{b}), E (N_{2}^{b}), \dots, E (N_{m}^{b}))$ and sends $E (B)$ to Alice.
Bob selects the coordinates of length $m$ from the first coordinate according to the ciphertext set sent by Alice and obtains set $E (W)$ , that is, $E (W) = \{E (ω_{1}), E (ω_{2}), \cdot \cdot \cdot, E (ω_{n - m + 1})\}$ , after $n - m + 1$ rounds of calculation. Then, he permutes the components in $E (W)$ randomly, which is still $E (W)$ after permutation, and sends them to Alice, who decrypts $E (W)$ using private key $k_{1}$ to obtain set $W$ .
Alice chooses $t$ random numbers $d_{i} (0 \leq i \leq t)$ and calculates $(c_{a 1}^{i}, c_{a 2}^{i}) = (d_{i} W + K_{1}, W + d_{i} W + a G)$ . Bob chooses $t$ random numbers $f_{i} (0 \leq i \leq t)$ and calculates $(c_{b 1}^{i}, c_{b 2}^{i}) = (f_{i} B + K_{2}, B + f_{i} B + b G)$ . Finally, Alice and Bob exchange $(c_{a 1}^{i}, c_{a 2}^{i})$ and $(c_{b 1}^{i}, c_{b 2}^{i})$ .
With the cut-and-choose method, Alice randomly selects $t / 2$ groups $(c_{b 1}^{i}, c_{b 2}^{i})$ from the $t$ groups sent by Bob and announces them, asking Bob to announce the corresponding $f_{i} B$ . Alice verifies the received data: $f_{i} B + K_{2} = c_{b 1}^{i}$ . Verification passes and continues; otherwise, the protocol is terminated. Bob randomly selects $t / 2$ groups $(c_{a 1}^{i}, c_{a 2}^{i})$ from the $t$ groups sent by Alice and announces them, asking Alice to announce the corresponding $d_{i} W$ . Bob verifies the received data: $d_{i} W + K_{1} = c_{a 1}^{i}$ . Verification passes and the protocol continues; otherwise, the protocol is terminated.
From the remaining unopened pairs, Alice picks a random $(c_{b1}^j, c_{b2}^j)$ and Bob picks a random $(c_{a1}^j, c_{a2}^j)$; in addition, Alice picks two random numbers $a, p_1$, and Bob picks two random numbers $b, p_2$. Alice computes $c_b = a(c_{b2}^j - c_{b1}^j - W + K_2) = a(B - W) + abG$, $P_1 = p_1 G$, and $λ_b = p_1 K_2$; Bob computes $c_a = b(c_{a2}^j - c_{a1}^j - B + K_1) = b(W - B) + abG$, $P_2 = p_2 G$, and $λ_a = p_2 K_1$. Alice and Bob then send $c_b + P_1$ and $c_a + P_2$, respectively, to each other.
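Substituting the definitions of the unopened pairs shows why the two expressions in this step take the stated values; the random masks $f_j B$ and $d_j W$ and the public keys cancel:

```latex
\begin{aligned}
c_b &= a\,(c_{b2}^{j} - c_{b1}^{j} - W + K_2) \\
    &= a\,\big[(B + f_j B + bG) - (f_j B + K_2) - W + K_2\big] \\
    &= a\,(B - W + bG) = a(B - W) + abG, \\
c_a &= b\,(c_{a2}^{j} - c_{a1}^{j} - B + K_1) \\
    &= b\,\big[(W + d_j W + aG) - (d_j W + K_1) - B + K_1\big] \\
    &= b\,(W - B + aG) = b(W - B) + abG.
\end{aligned}
```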
After each participant receives a message from the other, Alice computes $θ_{a} = k_{1} (c_{a} + P_{2})$ and $m_{a} = k_{1} c_{a}$ and sends them to Bob; Bob computes $θ_{b} = k_{2} (c_{b} + P_{1})$ and $m_{b} = k_{2} c_{b}$ and sends them to Alice.
Alice uses a zero-knowledge proof to verify that the $m_b$ sent by Bob is correct, that is, that Bob did multiply $c_b$ by his private key $k_2$ to obtain $m_b$; concretely, she checks whether $m_b = θ_b - λ_b$ holds. Likewise, Bob uses a zero-knowledge proof to verify that the $m_a$ sent by Alice is correct, that is, that Alice did multiply $c_a$ by her private key $k_1$ to obtain $m_a$, by checking whether $m_a = θ_a - λ_a$ holds. A party whose check fails is identified as the malicious participant.
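The check $m_b = θ_b - λ_b$ holds for an honest Bob because the blinding term $k_2 P_1$ equals $λ_b$. The derivation below assumes the standard ECC key relation $K_1 = k_1 G$ and $K_2 = k_2 G$, which is not restated in this section:

```latex
\begin{aligned}
\theta_b &= k_2 (c_b + P_1) = k_2 c_b + k_2 p_1 G = m_b + p_1 (k_2 G) = m_b + p_1 K_2 = m_b + \lambda_b, \\
\theta_a &= k_1 (c_a + P_2) = k_1 c_a + k_1 p_2 G = m_a + p_2 (k_1 G) = m_a + p_2 K_1 = m_a + \lambda_a .
\end{aligned}
```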
Alice computes $m_b - av$ to obtain $k_2 a(B - W)$; if $k_2 a(B - W) = 0$, then $B = W$. Bob computes $m_a - bu$ to obtain $k_1 b(W - B)$; if $k_1 b(W - B) = 0$, then $B = W$. If $B = W$, both parties have obtained the same, correct result; otherwise, the protocol is terminated.
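Alice's side of the last three steps can be checked numerically with a minimal sketch. It reuses the toy additive group $\mathbb{Z}_q$, $q = 2^{61} - 1$, $G = 1$, as a stand-in for curve points (no security, algebra only), treats $B$ and $W$ as single group elements, and assumes $v = bK_2$: $v$ is not defined in this section, but $m_b - av = k_2 a(B - W)$ holds only if $av = k_2 abG$, which gives $v = bK_2$. The function name `alice_side` is hypothetical.

```python
import random

Q = 2**61 - 1  # prime; Z_Q under addition is a toy stand-in for curve points
G = 1          # toy base point


def smul(k, pt):
    """Toy scalar multiplication k * pt (modular multiplication in Z_Q)."""
    return (k * pt) % Q


def alice_side(B, W, rng):
    """Return (check m_b = theta_b - lambda_b passed, m_b - a v, k_2 a (B - W))."""
    k2 = rng.randrange(1, Q)                          # Bob's private key k_2
    K2 = smul(k2, G)                                  # Bob's public key K_2 = k_2 G
    a, p1 = rng.randrange(1, Q), rng.randrange(1, Q)  # Alice's random a, p_1
    b, f = rng.randrange(1, Q), rng.randrange(1, Q)   # Bob's random b, f_j
    cb1 = (smul(f, B) + K2) % Q                       # c_b1^j = f_j B + K_2
    cb2 = (B + smul(f, B) + smul(b, G)) % Q           # c_b2^j = B + f_j B + bG
    c_b = smul(a, (cb2 - cb1 - W + K2) % Q)           # = a(B - W) + abG
    P1, lam_b = smul(p1, G), smul(p1, K2)             # P_1 = p_1 G, lambda_b = p_1 K_2
    theta_b = smul(k2, (c_b + P1) % Q)                # Bob returns theta_b = k_2 (c_b + P_1)
    m_b = smul(k2, c_b)                               # and m_b = k_2 c_b
    consistent = m_b == (theta_b - lam_b) % Q         # Alice's check
    v = smul(b, K2)                                   # ASSUMPTION: v = b K_2
    residue = (m_b - smul(a, v)) % Q                  # equals k_2 a (B - W)
    return consistent, residue, smul(k2, smul(a, (B - W) % Q))


rng = random.Random(42)
print(alice_side(1234, 1234, rng)[:2])  # (True, 0): B = W gives a zero residue
```

Because $q$ is prime and $k_2, a \neq 0$, the residue is zero exactly when $B = W$.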
Finally, Alice and Bob decode the sets $W$ and $B$, respectively. If the two sets contain at least one element equal to 0, then string $S_B$ is a substring of $S_A$; if neither set contains an element equal to 0, then $S_B$ is not a substring of $S_A$.
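For testing, the decision rule can be compared against a plaintext reference that computes in the clear what the protocol computes under encryption. The sketch below reproduces the "Operation Result" column of Table 1 (0 for a window equal to Bob's string, 1 otherwise) and reports a match when any 0 appears. Alice's string "acdbc" is reconstructed from the overlapping windows ac, cd, db, bc listed in Table 1; the function names are hypothetical.

```python
CODES = {"a": 11, "b": 12, "c": 13, "d": 14}  # character codes from Table 1


def encode(s):
    return tuple(CODES[ch] for ch in s)


def match_results(s_a, s_b):
    """0 for each length-m window of S_A whose code equals that of S_B, 1 otherwise."""
    m = len(s_b)
    target = encode(s_b)
    return [0 if encode(s_a[j:j + m]) == target else 1 for j in range(len(s_a) - m + 1)]


def is_substring(s_a, s_b):
    """S_B is a substring of S_A exactly when some window yields 0."""
    return 0 in match_results(s_a, s_b)


print(match_results("acdbc", "ac"))  # [0, 1, 1, 1], as in Table 1
```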

The Protocol ends.

Proof.

To prove that Protocol 2 is secure under the malicious model, it suffices to show that, during the execution of Protocol 2, any acceptable strategy pair $\bar{A} = (A_1, A_2)$ of the participants in the real protocol can be transformed into a corresponding strategy pair $\bar{B} = (B_1, B_2)$ in the ideal protocol such that the outputs of $A_1$ and $A_2$ are computationally indistinguishable from those of $B_1$ and $B_2$. Because no protocol can offer protection when both parties are malicious, that case is not considered; we assume that one party is honest and the other is dishonest. The discussion is divided into two cases (here, $A_1, B_1$ correspond to Alice and $A_2, B_2$ to Bob).

Case 1: $A_1$ is honest and $A_2$ is dishonest.

$A_1$ executes Protocol 2 honestly; then

$$\mathrm{REAL}_{\bar{A}}(W, B) = \{F(W, A_2(B)),\ A_2(C_{a1}^{i}, C_{a2}^{i}),\ m_a,\ S\}, \qquad (5)$$

where $S$ denotes the sequence of messages received by $A_2$ during the zero-knowledge proof.

Since $A_1$ is honest, $B_1$ is determined by $A_1$ and executes the protocol according to its steps. It remains to transform the real-protocol adversary $A_2$ into an ideal-protocol malicious adversary $B_2$; in other words, to find an acceptable strategy pair $\bar{B} = (B_1, B_2)$ in the ideal model whose output is computationally indistinguishable from $\mathrm{REAL}_{\bar{A}}(W, B)$ (the decisions of $B_2$ depend on the behavior of $A_2$).

In the ideal model, $B_1$ sends its true input $W$ to the TTP (after receiving the message, $B_1$ allows the TTP to send a message to $B_2$). $B_2$ is dishonest, and the message it sends to the TTP depends on $A_2$'s strategy: $B_2$ sends $A_2(B)$ to the TTP, and the TTP sends $F(W, A_2(B))$ to $B_2$ ($B_1$ also obtains this result). Using $F(W, A_2(B))$, $B_2$ constructs a view $\mathrm{view}_{B_2}(W, A_2(B))$ that is indistinguishable from the view $\mathrm{view}_{A_2}(W, A_2(B))$ obtained by $A_2$ in the real protocol, and gives it to $A_2$ to obtain $A_2$'s output.

Specifically, $B_2$ selects $W'$ satisfying $F(W', A_2(B)) = F(W, A_2(B))$; that is, treating $W'$ as the input of $A_1$, $B_2$ executes Protocol 2 with $A_2$ and obtains the corresponding message sequence $S'$ during the execution. In this case, the protocol execution can be completed, and

$$\mathrm{IDEAL}_{\bar{B}}(W, B) = \{F(W, A_2(B)),\ A_2(C_{a1}^{i\prime}, C_{a2}^{i\prime}),\ m_a',\ S'\}. \qquad (6)$$

Since the ideal protocol and the real protocol use the same encryption, $(C_{a1}^{i}, C_{a2}^{i}) \overset{c}{\equiv} (C_{a1}^{i\prime}, C_{a2}^{i\prime})$ and $m_a' \overset{c}{\equiv} m_a$ are guaranteed; the zero-knowledge proof, in turn, guarantees $S' \overset{c}{\equiv} S$. Therefore,

$$\{\mathrm{IDEAL}_{\bar{B}}(W, B)\} \overset{c}{\equiv} \{\mathrm{REAL}_{\bar{A}}(W, B)\}. \qquad (7)$$

Case 2: $A_2$ is honest and $A_1$ is dishonest. There are two scenarios:

(1): $A_1$ does not publish the result or ignores the TTP (regarded as $A_1$ aborting the protocol), and the TTP sends $\perp$ to $A_2$; then

$$\mathrm{REAL}_{\bar{A}}(W, B) = \{A_1(C_{b1}^{i}, C_{b2}^{i}),\ m_b,\ S,\ \perp\}. \qquad (8)$$

(2): Otherwise, the TTP sends $F(A_1(W), B)$ to $A_2$; then

$$\mathrm{REAL}_{\bar{A}}(W, B) = \{A_1(C_{b1}^{i}, C_{b2}^{i}),\ m_b,\ S,\ F(A_1(W), B)\}, \qquad (9)$$

where $S$ denotes the sequence of messages received by $A_1$ during the zero-knowledge proof.

Since $A_2$ is honest, only the real-model adversary $A_1$ needs to be transformed into an ideal-model adversary $B_1$. That is, to show that $A_1$ is indistinguishable from $B_1$, a strategy pair $\bar{B} = (B_1, B_2)$ must be found in the ideal model whose output is computationally indistinguishable from $\mathrm{REAL}_{\bar{A}}(W, B)$.

Since $A_1$ is dishonest, $B_1$'s strategy toward the TTP depends on $A_1$'s behavior: the message $B_1$ sends to the TTP is $A_1(W)$, and the message it receives from the TTP is $F(A_1(W), B)$. In the ideal model, $B_1$ uses $F(A_1(W), B)$ to construct a view $\mathrm{view}_{B_1}(A_1(W), B)$ that is computationally indistinguishable from the view $\mathrm{view}_{A_1}(A_1(W), B)$ obtained by $A_1$ in the real protocol, and gives it to $A_1$ to obtain $A_1$'s output. To do so, $B_1$ executes Protocol 2 with $A_1$, using as input a value $B'$ satisfying $F(A_1(W), B') = F(A_1(W), B)$.

During the execution of the protocol, $B_1$ obtains the corresponding message sequence $S'$, and two cases arise:

(1): In the ideal model, when $B_1$ instructs the TTP not to send the result to $B_2$,

$$\mathrm{IDEAL}_{\bar{B}}(W, B) = \{A_1(C_{b1}^{i\prime}, C_{b2}^{i\prime}),\ m_b',\ S',\ \perp\}. \qquad (10)$$

(2): Otherwise,

$$\mathrm{IDEAL}_{\bar{B}}(W, B) = \{A_1(C_{b1}^{i\prime}, C_{b2}^{i\prime}),\ m_b',\ S',\ F(A_1(W), B)\}. \qquad (11)$$

In both cases, the outputs of $A_2$ in the real protocol and $B_2$ in the ideal protocol are the same. Since the ideal protocol and the real protocol use the same ECC encryption algorithm, $(C_{b1}^{i}, C_{b2}^{i}) \overset{c}{\equiv} (C_{b1}^{i\prime}, C_{b2}^{i\prime})$ and $m_b' \overset{c}{\equiv} m_b$, while the zero-knowledge proof guarantees $S' \overset{c}{\equiv} S$. Then,

$$\{\mathrm{IDEAL}_{\bar{B}}(W, B)\} \overset{c}{\equiv} \{\mathrm{REAL}_{\bar{A}}(W, B)\}. \qquad (12)$$

In summary, Protocol 2 is secure under the malicious model.

4.4. Characteristics of the Protocol

The ECC encryption algorithm offers strong resistance to known attacks relative to its key size, so the secure text matching method proposed in this paper is secure and resistant to malicious attacks. Because the ECC keys used in this paper are short, the protocol reaches the same security level as other encryption algorithms with less computation, which improves efficiency. In addition, the protocol achieves exact string matching, providing a new method for solving the pattern matching problem. Although the proposed protocol has advantages over other protocols, it targets the two-party setting; in a multi-party setting, its computational and communication complexity would increase, so the protocol has limitations there.

5. Performance Analysis

The performance of Protocol 1 and Protocol 2 is evaluated by comparing their computational and communication complexity with those of existing schemes.

5.1. Computational Complexity

Zhang, K.X. et al. [27] designed an STM protocol based on the Paillier encryption algorithm that requires $m + \frac{3}{2}(2n + 1)\log_2 N$ modular multiplications. Luo, Y.L. et al. [32] designed an STM protocol based on the ElGamal encryption algorithm with a computational complexity of $mn[(2nk + 1)\log_2 p + nk - 1]$ modular multiplications, where $n$ is the number of characters of strings $A$ and $B$, and $k$ is the number of bits of the ASCII value of each character of $A$ and $B$. Kang, J. et al. [36] designed an STM protocol based on the Goldwasser–Micali encryption algorithm with a computational complexity of $mn(5nk + nk\log_2 p)$ modular multiplications.

The computational cost of Protocol 1 consists mainly of $n$ ECC encryptions by Alice, one ECC decryption by Bob, and a total of $mn$ modular multiplications. In Protocol 2, Alice performs $n$ ECC encryptions and one ECC decryption, and Bob performs $m$ encryptions and one decryption, for a total of $2mn$ modular multiplications.

Table 2 compares these counts and shows that Protocol 1 and Protocol 2 are more efficient than the existing protocols.
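To make the comparison concrete, the modular-multiplication counts from Table 2 can be evaluated for sample parameters. This is a minimal sketch that transcribes the formulas as stated; the parameter choices are assumptions for illustration: $n = 26$ and $m = 5$, $k = 8$ bits per ASCII character, and $\log_2 N = \log_2 p = 1024$. Note that the counts for Protocols 1 and 2 exclude the ECC encryptions and decryptions listed separately above, so the numbers are not a full cost model.

```python
def zhang_27(n, m, log2_N):
    return m + 1.5 * (2 * n + 1) * log2_N


def luo_32(n, m, k, log2_p):
    return m * n * ((2 * n * k + 1) * log2_p + n * k - 1)


def kang_36(n, m, k, log2_p):
    return m * n * (5 * n * k + n * k * log2_p)


def protocol_1(n, m):
    return m * n


def protocol_2(n, m):
    return 2 * m * n


n, m, k, bits = 26, 5, 8, 1024  # assumed sample parameters
counts = {
    "Zhang [27]": zhang_27(n, m, bits),     # 81413.0
    "Luo [32]": luo_32(n, m, k, bits),      # 55537950
    "Kang [36]": kang_36(n, m, k, bits),    # 27824160
    "Protocol 1": protocol_1(n, m),         # 130
    "Protocol 2": protocol_2(n, m),         # 260
}
```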

5.2. Communication Complexity

Communication complexity is usually measured by the number of communication rounds. The method of Zhang, K.X. et al. [27] requires two rounds of communication, that of Luo, Y.L. et al. [32] requires $mn^2 + mn$ rounds, and that of Kang, J. et al. [36] requires $2mn$ rounds. Protocol 1 in this paper performs at most $n - m + 1$ rounds of communication, and Protocol 2 performs a total of $2(n - m + 1)$ rounds. Table 2 shows the overall performance comparison of the protocols.

In comparison, with only a small difference in the number of communication rounds, Protocol 1 and Protocol 2 use the simple and fast ECC encryption algorithm to improve efficiency, and Protocol 2 additionally resists malicious behavior, which makes it more widely applicable.
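The round counts in Table 2 can likewise be tabulated for the experimental setting $n = 26$ and several pattern lengths $m$. This sketch only transcribes the formulas above; the function name is hypothetical.

```python
def rounds(n, m):
    """Communication rounds per Table 2 (n = length of S_A, m = length of S_B)."""
    return {
        "Zhang [27]": 2,
        "Luo [32]": m * n**2 + m * n,
        "Kang [36]": 2 * m * n,
        "Protocol 1": n - m + 1,
        "Protocol 2": 2 * (n - m + 1),
    }


for m in (1, 10, 20):
    print(m, rounds(26, m))
```

Unlike the schemes of [32] and [36], the round counts of Protocols 1 and 2 decrease as $m$ grows, since fewer windows of length $m$ fit in $S_A$.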

5.3. Experimental Simulation

To compare the protocols visually, the protocols in references [27,32,36] and Protocol 2 of this paper were simulated. The experimental environment was an Intel(R) Core(TM) i5-8300H processor @ 2.30 GHz with 12 GB of RAM running Windows 10 (64-bit); the protocols were implemented in Python in PyCharm 2020.3.2.

The STM protocols of Zhang, K.X. et al. [27], Luo, Y.L. et al. [32], and Kang, J. et al. [36] all use homomorphic encryption algorithms, so their efficiency is compared by measuring execution time in simulation experiments.

The experiment uses strings $S_A$ and $S_B$, with the length of $S_A$ set to $n = 26$ and the length $m$ of $S_B$ taking the values 1, 2, …, 20. For each $m$, the simulation is run 1000 times, and the average protocol execution time (ignoring the preprocessing time) is recorded. Figure 3 compares how the time consumption grows with the modulus size for the protocols of Zhang, K.X. et al. [27] and Luo, Y.L. et al. [32], Protocol 2 of Kang, J. et al. [36], and Protocol 2 of this paper. The average time of each protocol is computed at 128-, 256-, 512-, and 1024-bit moduli; the vertical axis shows the time consumed (s), and the horizontal axis shows the modulus size (bits). Figure 3 shows that Protocol 2 consumes less time than the protocols of Zhang, K.X. et al. [27], Luo, Y.L. et al. [32], and Kang, J. et al. [36] at every modulus size.
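The measurement procedure above (fixed $n = 26$, $m = 1, \dots, 20$, 1000 runs per $m$, averaged, preprocessing excluded) can be sketched as a small timing harness. This is not the authors' benchmark code: `run_protocol` is a hypothetical callable standing in for one protocol run, and drawing random lowercase strings as inputs is an assumption.

```python
import random
import string
import time
from statistics import mean


def time_protocol(run_protocol, n=26, m_values=range(1, 21), trials=1000, seed=0):
    """Average execution time of run_protocol(s_a, s_b) for each pattern length m."""
    rng = random.Random(seed)
    averages = {}
    for m in m_values:
        samples = []
        for _ in range(trials):
            # Inputs are generated outside the timed region, so this
            # preprocessing is not counted in the measurement.
            s_a = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
            s_b = "".join(rng.choice(string.ascii_lowercase) for _ in range(m))
            start = time.perf_counter()
            run_protocol(s_a, s_b)
            samples.append(time.perf_counter() - start)
        averages[m] = mean(samples)
    return averages


# Usage with a plaintext placeholder in place of a real protocol implementation:
averages = time_protocol(lambda s_a, s_b: s_b in s_a, trials=10)
```

Using `time.perf_counter` rather than wall-clock time gives a monotonic, high-resolution timer suited to short runs.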

The protocols analyzed are the protocol of Zhang, K.X. et al. [27], based on the Paillier encryption algorithm; the protocol of Luo, Y.L. et al. [32], based on the ElGamal encryption algorithm; the protocol of Kang, J. et al. [36], based on the Goldwasser–Micali (GM) encryption algorithm; and Protocol 2 of this paper, based on the ECC encryption algorithm. In the experiments, the modulus of each encryption algorithm (ElGamal, Goldwasser–Micali, Paillier, and ECC) was set to 1024 bits, and the random numbers were 64 bits long. Figure 4 shows how the execution time of string pattern matching grows with the number of string characters for each protocol.

Figure 4 shows that, as the string length $m$ increases, the computational cost of Protocol 2 in this paper is significantly lower than that of the other protocols, so Protocol 2 is efficient and widely applicable.

5.4. Engineering Applications

Applications of TSM span many aspects of daily life, and research on MPC for TSM has important practical applications in privacy computing, deep learning, and machine learning. Some specific application scenarios follow.

(1): Keyword-search encryption can be applied in blockchain systems: users input keywords, and the system returns the data that match them. The key data, after segmentation, are stored on the blockchain, the encrypted data are stored on the cloud server, and the identifier of the encrypted data is sent to both the cloud server and the blockchain. Because the blockchain cannot be tampered with, even if a malicious user modifies the data in the cloud server, the record identifier remains on the blockchain, so the service can be stopped when the data are retrieved, which ensures data security. An additive homomorphic encryption algorithm is also used to secure key distribution. The framework is shown in Figure 5.
(2): Smart grids in edge computing. The power grid generates large amounts of sampling data whose collection, transmission, and storage require substantial bandwidth and storage resources, while centralized storage may also leak users' private information. With the rise of edge computing, grid terminals can better support local, real-time intelligent processing: raw data collected locally can first be analyzed at the edge, and only useful data are transmitted to the cloud, which reduces the network burden, lowers transmission costs, and protects data privacy and security. A multi-keyword ciphertext retrieval scheme suited to power data achieves precise matching between multi-keyword searches and the record collection index and returns the list of search results to the users. The process is shown in Figure 6.

6. Summary

This paper addresses the problem of secure text matching in natural language processing. We proposed an Intelligent Semi-Honest System (ISHS) for secret matching against malicious adversaries. First, a new encoding method is designed to encode text strings as numbers, and an MPC protocol under the semi-honest model is built on it in combination with the ECC encryption algorithm. Then, an MPC protocol for text matching under the malicious model is designed using the cut-and-choose method and zero-knowledge proofs to resist malicious behaviors that participants may commit in the semi-honest protocol. The security of the protocols is proven with the real/ideal model paradigm, and their computational and communication complexity compare favorably with those of existing protocols, providing an effective solution to the text matching problem in deep learning and natural language processing. Secure text matching has important application value in engineering. In future work, we will investigate secure string matching protocols with wildcards that resist malicious adversaries, as well as secure string matching protocols with multiple participants.

Author Contributions

Conceptualization, X.L. and J.K.; methodology, J.K.; investigation, X.L.; writing—original draft preparation, X.L. and N.X.; software, D.L. and G.X.; funding acquisition, X.L.; validation, N.X. and X.C.; writing—original draft, J.K.; writing—review and editing, X.L., N.X. and X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China: Big Data Analysis based on Software Defined Networking Architecture, grant numbers 62177019 and F0701; NSFC, grant numbers 62271070, 72293583, and 61962009; Inner Mongolia Natural Science Foundation, grant number 2021MS06006; 2023 Inner Mongolia Young Science and Technology Talents Support Project, grant number NJYT23106; 2022 Fund Project of Central Government Guiding Local Science and Technology Development, grant number 2022ZY0024; 2022 Basic Scientific Research Project of Direct Universities of Inner Mongolia, grant number 20220101; 2022 “Western Light” Talent Training Program “Western Young Scholars” Project, grant number 22040601; the 14th Five-Year Plan of Education and Science of Inner Mongolia, grant number NGJGH2021167; 2023 Open Project of the State Key Laboratory of Network and Exchange Technology, grant number 230201; 2022 Inner Mongolia Postgraduate Education and Teaching Reform Project: JGSZ2022037; the 2022 Ministry of Education Central and Western China Young Backbone Teachers and Domestic Visiting Scholars Program, grant number 2022015; Inner Mongolia Discipline Inspection and Supervision Big Data Laboratory Open Project Fund, grant number IMDBD202020; Baotou Kundulun District Science and Technology Plan Project, grant number YF2020013; Inner Mongolia Science and Technology Major Project, grant number 2019ZD025; Project JCKY2021208B036, and the Fundamental Research Funds for Beijing Municipal Commission of Education, grant number 220201.

Data Availability Statement

The authors confirm that the data supporting the findings of this study are included in the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bernardini, G.; Gawrychowski, P.; Pisanti, N.; Pissis, S.P.; Rosone, G. Elastic-Degenerate String Matching via Fast Matrix Multiplication. SIAM J. Comput. 2022, 51, 549–576.
2. Cinti, A.; Bianchi, F.M.; Martino, A.; Rizzi, A. A novel algorithm for online inexact string matching and its FPGA implementation. Cogn. Comput. 2020, 12, 369–387.
3. Kumar, P.; Kumar, R.; Srivastava, G.; Gupta, G.P.; Tripathi, R.; Gadekallu, T.R.; Xiong, N.N. PPSF: A privacy-preserving and secure framework using blockchain-based machine-learning for IoT-driven smart cities. IEEE Trans. Netw. Sci. Eng. 2021, 8, 2326–2341.
4. Fu, A.M.; Zhang, X.L.; Xiong, N.X.; Gao, Y.S.; Wang, H.Q.; Zhang, J. VFL: A verifiable federated learning with privacy-preserving for big data in industrial IoT. IEEE Trans. Ind. Inform. 2020, 18, 3316–3326.
5. Yao, Y.L.; Xiong, N.X.; Park, J.H.; Ma, L.; Liu, J.F. Privacy-preserving max/min query in two-tiered wireless sensor networks. Comput. Math. Appl. 2013, 65, 1318–1325.
6. Cali, D.S.; Kalsi, G.S.; Bingöl, Z.; Fritina, C.; Subramanian, L.; Kim, J.S.; Ausavarungnirun, R.; Alser, M.; Gomez-Luna, J.; Boroumand, A.; et al. GenASM: A high-performance, low-power approximate string matching acceleration framework for genome sequence analysis. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020.
7. Chen, Y.W.; Zhou, L.D.; Pei, S.W.; Yu, Z.W.; Chen, Y.; Liu, X.; Du, J.X.; Xiong, N. KNN-BLOCK DBSCAN: Fast clustering for large-scale data. IEEE Trans. Syst. Man Cybern. Syst. 2019, 51, 3939–3953.
8. Hu, W.J.; Fan, J.; Du, Y.X.; Li, B.S.; Xiong, N.; Bekkering, E. MDFC-ResNet: An agricultural IoT system to accurately recognize crop diseases. IEEE Access 2020, 8, 115287–115298.
9. Equi, M.; Mäkinen, V.; Tomescu, A.I.; Grossi, R. On the complexity of string matching for graphs. ACM Trans. Algorithms 2023, 19, 1–25.
10. Equi, M.; Mäkinen, V.; Tomescu, A.I. Graphs cannot be indexed in polynomial time for sub-quadratic time string matching, unless SETH fails. In Proceedings of the 47th International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2021), Bolzano-Bozen, Italy, 25–29 January 2021.
11. Huang, S.; Zeng, Z.; Ota, K.; Dong, M.; Wang, T.; Xiong, N. An intelligent collaboration trust interconnections system for mobile information control in ubiquitous 5G networks. IEEE Trans. Netw. Sci. Eng. 2020, 8, 347–365.
12. Cheng, H.J.; Xie, Z.; Shi, Y.S.; Xiong, N. Multi-step data prediction in wireless sensor networks based on one-dimensional CNN and bidirectional LSTM. IEEE Access 2019, 7, 117883–117896.
13. Gao, Y.B.; Xiang, X.H.; Xiong, N.; Huang, B.; Lee, H.J.; Alrifai, R.; Jiang, X.Y.; Fang, Z.J. Human action monitoring for healthcare based on deep learning. IEEE Access 2018, 6, 52277–52285.
14. Wu, C.X.; Luo, C.; Xiong, N.; Zhang, W.; Kim, T.H. A greedy deep learning method for medical disease analysis. IEEE Access 2018, 6, 20021–20030.
15. Wu, C.X.; Ju, B.B.; Wu, Y.; Lin, X.; Xiong, N.; Xu, G.Q.; Li, H.Y.; Liang, X.F. UAV autonomous target search based on deep reinforcement learning in complex disaster scene. IEEE Access 2019, 7, 117227–117245.
16. Zhao, J.; Huang, J.F.; Xiong, N. An effective exponential-based trust and reputation evaluation system in wireless sensor networks. IEEE Access 2019, 7, 33859–33869.
17. Navarro, G. Indexing highly repetitive string collections, part I: Repetitiveness measures. ACM Comput. Surv. 2021, 54, 1–31.
18. Kang, L.; Chen, R.S.; Xiong, N.; Chen, Y.C.; Hu, Y.X.; Chen, C.M. Selecting hyper-parameters of Gaussian process regression based on non-inertial particle swarm optimization in Internet of Things. IEEE Access 2019, 7, 59504–59513.
19. Zhao, C.; He, Y. Auto-em: End-to-end fuzzy entity-matching using pre-trained deep models and transfer learning. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019.
20. Goldreich, O. Secure multi-party computation. Manuscr. Prelim. Version 1998, 78, 110.
21. Chen, Z.H.; Li, S.D.; Wang, D.S.; Huang, Q.; Dong, L.H. Protocols for secure computation of set-inclusion with the unencrypted method. J. Comput. Res. Dev. 2017, 54, 1549–1556.
22. Nozaki, K.; Hochin, T.; Nomiya, H. Semantic schema matching for string attribute with word vectors. In Proceedings of the 6th International Conference on Computational Science/Intelligence and Applied Informatics (CSII), Honolulu, HI, USA, 29–31 May 2019.
23. Markić, I.; Štula, M.; Zorić, M.; Stipaničev, D. Entropy-based approach in selection exact string-matching algorithms. Entropy 2020, 23, 31.
24. Karcioglu, A.A.; Bulut, H. The WM-q multiple exact string matching algorithm for DNA sequences. Comput. Biol. Med. 2021, 136, 104656.
25. Xu, L.; Wei, X.; Cai, G.; Li, Y.; Wang, H. SWMQ: Secure wildcard pattern matching with query. Int. J. Intell. Syst. 2022, 37, 6262–6282.
26. Mua’ad, M.; Aldebei, K.; Alqadi, Z.A. Simple, efficient, highly secure, and multiple purposed method on data cryptography. Traitement Du Signal 2022, 39, 173–178.
27. Zhang, K.X.; Yang, C.; Li, S.D. Confidential calculation of string matching. J. Cryptol. 2022, 9, 619–632.
28. Ling, H.Z.; Xue, K.P.; Wei David, S.L.; Li, R.D. Searchable encryption scheme supporting multi-keyword fuzzy search for multi-user scenarios. J. Univ. Sci. Technol. China 2021, 51, 562–576.
29. Lv, Z.; Peng, R. A novel periodic learning ontology matching model based on interactive grasshopper optimization algorithm. Knowl.-Based Syst. 2021, 228, 107239.
30. Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT′99), Prague, Czech Republic, 2–6 May 1999.
31. Li, S.D.; Wang, W.L.; Du, R.M. Protocol for millionaires’ problem in malicious models (in Chinese). Sci. Sin. Inf. 2021, 51, 75–88.
32. Luo, Y.L.; Shi, L.; Zhang, C.Y.; Zhang, J. Privacy-preserving protocols for string matching. In Proceedings of the 2010 Fourth International Conference on Network and System Security (NSS 2010), Melbourne, VIC, Australia, 1–3 September 2010.
33. Hosseini, K.; Nanni, F.; Ardanuy, M.C. DeezyMatch: A flexible deep learning approach to fuzzy string matching. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 5 October 2020.
34. Bosker, H.R. Using fuzzy string matching for automated assessment of listener transcripts in speech intelligibility studies. Behav. Res. Methods 2021, 53, 1945–1953.
35. Vaiwsri, S.; Ranbaduge, T.; Christen, P. Accurate and efficient privacy-preserving string matching. Int. J. Data Sci. Anal. 2022, 14, 191–215.
36. Kang, J.; Li, S.D.; Yang, X.Y. Secure Multiparty Computation for String Pattern Matching. J. Cryptogr. 2017, 4, 241–252.
37. Hazay, C.; Toft, T. Computationally secure pattern matching in the presence of malicious adversaries. J. Cryptol. 2014, 27, 358–395.
38. Yasuda, M.; Shimoyama, T.; Kogure, J.; Yokoyama, K.; Koshiba, T. Secure pattern matching using somewhat homomorphic encryption. In Proceedings of the 2013 ACM Workshop on Cloud Computing Security Workshop, Berlin, Germany, 8 November 2013.
39. Barton, C. On the average-case complexity of pattern matching with wildcards. Theor. Comput. Sci. 2022, 922, 37–45.
40. Benssalah, M.; Rhaskali, Y.; Drouiche, K. An efficient image encryption scheme for TMIS based on elliptic curve integrated encryption and linear cryptography. Multimed. Tools Appl. 2021, 80, 2081–2107.
41. Liu, X.; Zhang, R.L.; Xu, G.; Chen, X.B.; Xiong, N. Confidentially judging the relationship between an integer and an interval against malicious adversaries and its applications. Comput. Commun. 2021, 180, 115–125.
42. Kociumaka, T.; Pissis, S.P.; Radoszewski, J. Pattern matching and consensus problems on weighted sequences and profiles. Theor. Comput. Syst. 2019, 63, 506–542.

Figure 1. Comparison of $n - m + 1$ cycles.

Figure 2. Example of malicious attacks in Protocol 1.

Figure 3. Comparison of time consumption for different moduli. (Reference [27]: Zhang, K.X. 2022, Reference [32]: Luo, Y.L. 2010, Reference [36]: Kang, J. 2017).

Figure 4. Comparison of experimental simulation execution time. (Reference [27]: Zhang, K.X. 2022, Reference [32]: Luo, Y.L. 2010, Reference [36]: Kang, J. 2017).

Figure 5. Text matching framework in blockchain.

Figure 6. Text matching process in smart grid.

Table 1. Examples of strings and encoding.

Alice’s String	Code	Bob’s String	Code	Operation Result
ac	(11,13)	ac	(11,13)	0
cd	(13,14)			1
db	(14,12)			1
bc	(12,13)			1

Table 2. Performance Comparison.

Protocol	Computational Complexity	Communication Complexity	Anti-Malicious Adversaries
Zhang, K.X. et al. [27]	$m + \frac{3}{2} (2 n + 1) \log_{2} N$	2	×
Luo, Y.L. et al. [32]	$m n [(2 n k + 1) \log_{2} p + n k - 1]$	$m n^{2} + m n$	×
Kang, J. et al. [36]	$m n (5 n k + n k \log_{2} p)$	$2 m n$	×
Protocol 1	$m n$	$n - m + 1$	×
Protocol 2	$2 m n$	$2 (n - m + 1)$	√

×: Cannot resist a malicious adversary; √: Can resist malicious adversaries.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
