Article

Fast, Searchable, Symmetric Encryption Scheme Supporting Ranked Search

1 School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
2 School of Cyberspace Security, Dongguan University of Technology, Dongguan 523820, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(5), 1029; https://doi.org/10.3390/sym14051029
Submission received: 19 April 2022 / Revised: 11 May 2022 / Accepted: 15 May 2022 / Published: 18 May 2022

Abstract

Searchable encryption (SE) is one of the effective techniques for searching encrypted data without decrypting it. This technique can provide a secure indexing mechanism for encrypted data and utilize a secure trapdoor to search the encrypted data directly, thus realizing secure ciphertext retrieval. Existing schemes usually build a secure index directly on the whole dataset and retrieve encrypted data by running a secure search algorithm on this index. However, this approach requires testing many non-relevant documents, which diminishes the query efficiency. In this paper, we adopt a clustering method to preclassify the dataset, which can filter out a considerable portion of irrelevant documents and thus improve query efficiency. Concretely, we first partition the dataset into multiple document clusters using the k-means clustering algorithm; then, we design index-building and searching algorithms for these document clusters; finally, by using the asymmetric scalar-product-preserving encryption (ASPE) scheme to encrypt the indexes and queries, we propose a fast searchable symmetric encryption scheme that supports ranked search. A detailed security analysis demonstrates that the proposed scheme guarantees the data and query security of the search process. In addition, theoretical and experimental analysis indicates that our scheme outperforms similar schemes in terms of query efficiency.

1. Introduction

With the rapid growth of cloud computing, more and more individuals and businesses are willing to outsource their data to the cloud. Cloud service providers can offer online data storage and analysis services to different types of customers, which greatly reduces the cost of data storage and computing for users. As a result, data outsourcing techniques in the cloud environment have become widely popular. However, although data outsourcing brings convenience to data users, it also inevitably carries the risk of data leakage. To ensure data security, we can encrypt the data before uploading. However, encryption means that ciphertexts no longer support the retrieval operations available on plaintext. Driven by the demand for searching over encrypted data, a large number of searchable encryption schemes have emerged [1,2,3,4,5,6]. According to the underlying encryption mechanism, searchable encryption schemes can be divided into two categories: searchable symmetric encryption (SSE) [1,2,3] and searchable public key encryption (SPE) [4,5,6]. SPE schemes can provide a secure search service for multiple data owners, but public-key operations have a high computational cost and are not suitable for large-scale data applications. In contrast, the symmetric encryption used in SSE schemes is simple and inexpensive, which makes it more suitable for the current big-data environment.
An SSE scheme solves the problem of finding the documents in a corpus most related to the query keywords according to a given metric. Earlier SSE schemes only returned documents containing all the query keywords and did not rank the results by how closely they relate to the query. To improve query accuracy, Cao et al. proposed an SSE scheme supporting ranked search using the term frequency-inverse document frequency (TF-IDF) model [7]. However, due to the use of a forward index structure, its search time is linear in the number of documents, which is impractical in the big-data environment. To improve the search efficiency, Xia et al. used a balanced binary tree structure to propose an SSE scheme with sublinear search complexity [8]. Subsequently, Guo et al. used bloom filter technology to compress the vector dimension of internal tree nodes [9], further improving the query efficiency of the SSE scheme.
Although these two schemes achieve ranked search on encrypted data, there is still room for improvement. As the number of documents increases, the number of nodes in the index tree increases dramatically, resulting in a high time cost for searching the index tree. To improve the query efficiency, in this paper we propose an SSE scheme that is more efficient than previous schemes with similar functionality. To implement our scheme, we first transform the documents into the corresponding semantic vectors using a keyword conversion method. Then, the document set is divided into multiple document clusters by applying the k-means clustering algorithm to the document semantic vectors, and an index tree is built for each document cluster. Finally, we exploit the ASPE scheme to encrypt these index trees and user queries to achieve a secure ranked search. To sum up, the contributions of this paper comprise the following parts.
(1)
A clustering algorithm is adopted to cluster the document set into multiple document clusters, and a secure index tree is constructed on each cluster. By utilizing this approach, the time complexity of index tree retrieval can be reduced because the height of the index tree is decreased.
(2)
We optimize the search method to reduce the number of index trees to be retrieved and further improve the query efficiency of our scheme without sacrificing query accuracy too much.
(3)
By utilizing the ASPE scheme [10] to encrypt the index and the query, we propose a fast SSE scheme supporting ranked search (F-SSE-RS). Moreover, we design a dynamic update method so that the index in our scheme can support secure document update operations.
In addition, we give a detailed analysis to prove the security of F-SSE-RS and implement it on a widely used data collection. The experiment results demonstrate that our scheme is feasible in practice.
Related Work: Searchable encryption, as an attractive technique in data security and privacy protection, has received much attention in recent years. According to its different cryptographic prototypes, there are two main categories: searchable symmetric key encryption (SSE) and searchable public key encryption (SPE).
In the SSE scheme, the data owner and the data user share the secret key. Song et al. proposed the first SSE scheme that supports only a single keyword search [1]. To realize multi-keywords search, Goh designed an SSE scheme that supports conjunctive keyword search using a technique called the bloom filter [11]. Subsequently, to support more flexible query conditions, researchers proposed some SSE schemes that support complex query conditions such as range search [12,13], fuzzy search [14,15], and semantical search [2,16]. However, since these schemes fail to measure the relevance of documents to queries, many low-relevance documents will be returned, thus resulting in a large computational and communication overhead. To solve this problem, Wang et al. proposed an SSE scheme that supports ranked search in [17]. This scheme can compute the relevance between documents and queries in the encrypted state and sort the query results. Since the scheme proposed in [17] only supports a single keyword search, Cao et al. proposed a ranked search scheme that supports multiple keywords search [7]. However, this scheme adopts a forward index structure in which each document has an individual index, so its search time is linear in the number of documents. The search efficiency of this scheme is impractical in the current big data scenario. To improve the search efficiency, by using a tree–index structure, Xia et al. proposed a similar SSE scheme [8]. The search efficiency of this scheme is sublinear with the number of documents. Subsequently, Guo et al. further improved the scheme to reduce the index building time [9]. However, we find that there is still room for decreasing the search time of these schemes, so we will propose a new scheme to improve the query efficiency in this paper. Besides, many recent SSE schemes are dedicated to supporting more complex query conditions, such as fuzzy query [18], spatial data query [19], and phrase query [20]. These schemes can better meet the user’s query intent.
In an SPE scheme, the data owner uses the public key to build an encrypted index, while the data user uses the private key to perform ciphertext retrieval. Thus, SPE naturally supports many-to-one query scenarios. Boneh et al. introduced the first SPE scheme, called public key encryption with keyword search (PEKS) [4]. However, it supports only single-keyword retrieval. Subsequently, researchers proposed various SPE schemes that support multi-keyword queries, such as conjunctive keyword search [21], disjunctive keyword search [22], and Boolean keyword search [23]. In recent years, in addition to the research on multi-keyword search, many studies have targeted the security, precision, and efficiency of SPE, for example, access control [24], fast query [25], and semantic search [26]. All these works have greatly improved the usefulness of SPE.
Organization: The rest of this paper is organized as follows. In Section 2, we define the system model and the threat model of the scheme and give the design objectives of our scheme. Section 3 focuses on the index-building and search methods of the proposed scheme. Section 4 gives the construction of the F-SSE-RS scheme, its update algorithms, and the security analysis. The theoretical and experimental analysis of the proposed scheme is presented in Section 5. The conclusion is given in Section 6.

2. Problem Formulation

This section first presents the system model of the F-SSE-RS scheme. Then, based on this model, we introduce two threat models commonly considered in SSE schemes. Finally, we propose the design goal of our scheme. For clarity, the main notations utilized in this paper are listed in Table 1.

2.1. System Model

Data owner (DO), data user (DU), and cloud server (CS) are the three roles in the system model of F-SSE-RS. To further illustrate the relationships among these three roles, we present an architectural diagram of the system model in Figure 1. As shown in Figure 1, DO encrypts a collection of documents and builds a secure index, and then sends the encrypted documents together with the encrypted index to CS. Once DU wishes to run a query Q, it computes a search trapdoor T_Q of Q and transmits it to CS. After receiving the trapdoor, CS searches the encrypted index using T_Q and sends the most relevant documents to DU.
To clarify the system model, the specific duties for each role are formally described as follows.
(1)
Data owner (DO): DO owns a large number of sensitive documents F = { f 1 , f 2 , , f d } . DO utilizes a symmetric key encryption scheme, e.g., AES, to encrypt the document set F, and adopts the F-SSE-RS scheme to build the secure searchable index. After these operations, DO uploads encrypted documents and secure index to CS. Finally, DO delivers the secret key to the data users who have been granted access to the data.
(2)
Data user (DU): An authorized DU can make a secure query over encrypted data. Given a query Q, DU creates a trapdoor with the secret key and Q and sends it to CS. When DU gets search results from CS, DU can use the secret key to decode the encrypted contents.
(3)
Cloud server (CS): DO’s encrypted index and documents are stored in CS. Once the trapdoor, T Q , is obtained from DU, CS executes the query over the index and returns the most relevant encrypted documents related to Q. In addition, CS runs update operations over the encrypted index after obtaining updated information from DO.

2.2. Threat Model

In reality, most cloud servers are not completely trustworthy, which means they might snoop on information they should not have access to. Here, we assume that the cloud server follows the “honest-but-curious” model adopted by many SE schemes [7,8,9]. In the honest-but-curious model, CS performs the algorithms of the F-SSE-RS scheme according to the prescribed computational process, but curiously tries to infer the user's private information. Under the honest-but-curious model, based on the extent of information known to CS, our scheme adopts the following two threat models proposed by Cao et al. [7].
-
Known ciphertext model: In this model, CS only knows encrypted documents and the secure index, which are stored on the server.
-
Known background model: CS can access more information in this model than the aforementioned model. This information involves the relationship between a trapdoor and the dataset, and the statistics related to the dataset. For example, CS might exploit the dataset’s term frequency (TF)-inverse document frequency (IDF) knowledge to perform the statistical attack.

2.3. Design Goals

To make the ranked search execute efficiently and securely, our scheme should concurrently satisfy the following goals under the aforementioned model.
(1)
Multi-keyword ranked search: Each document f_i in F is associated with a keyword set W_i = {w_{i,1}, w_{i,2}, ..., w_{i,n_i}}, in which n_i is the number of keywords in W_i. The multi-keyword query Q is Q = {q_1, q_2, ..., q_m}. The search result of the F-SSE-RS scheme is sorted, which means that F-SSE-RS returns the documents whose keyword set W_i is most relevant to the query Q. Furthermore, the F-SSE-RS scheme supports efficient dynamic operations, such as document insertion and deletion.
(2)
Efficiency: The F-SSE-RS scheme can achieve sublinear search efficiency. Furthermore, the time cost of keyword search is substantially lower than existing similar schemes.
(3)
Privacy preserving: The F-SSE-RS scheme, like some previous schemes, prevents CS from deducing more private information from ciphertexts, secure indexes, and trapdoors. Privacy requirements that our scheme focuses on are listed as follows.
-
Document and index privacy: Document privacy is usually protected by traditional symmetric-key encryption schemes, e.g., AES, DES, and six-face cubical key encryption [27]. For index privacy, the F-SSE-RS scheme prevents CS from learning what is hidden in the index.
-
Trapdoor unlinkability: The trapdoor generation algorithm needs to be probabilistic rather than deterministic, which means that the same keyword query will generate different trapdoors.
-
Keyword privacy: Although the trapdoor can be protected using cryptographic techniques, search results can be adopted to infer query keywords. Thus, our scheme needs to prevent CS from learning query keywords from trapdoors by search results and statistics of documents.

3. Methods for Index Building and Searching

In this section, we give the method for creating the plaintext index as well as the search method over this index. Before presenting these methods, we first devise a keyword conversion method to transform documents and queries into vector representations. Then, we introduce the index building method, which mainly consists of two steps: splitting the corpus into several document sets and building an index tree for each document set. After these two steps, the index tree of each document set is combined to construct the index of the entire corpus. Finally, we give a recursive method for searching the index. The following subsections provide a concrete discussion of these approaches.

3.1. Keyword Conversion Method

In our scheme, we need to convert documents and queries into vectors. We adopt the well-known TF-IDF word-weighting algorithm to create vectors for documents and queries. The term frequency (TF) information is applied to create vector representations of documents, and the inverse document frequency (IDF) information is utilized to construct vector representations of queries. The concrete conversion approach is as follows.
(1)
The method extracts the keywords in the dataset and builds a dictionary DIC = {w_1, w_2, ..., w_N}, where w_t is a unique keyword in DIC and t ∈ [1, N].
(2)
For each document f_i associated with a keyword set W_i = {w_{i,1}, w_{i,2}, ..., w_{i,n_i}}, the method first creates a TF vector $\vec{W}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,N})$ for f_i. Based on Equation (1), the method then sets $x_{i,t} = TF_{w_{i,j}}$ when $w_{i,j} = w_t$, where t ∈ [1, N], i ∈ [1, d], and j ∈ [1, n_i]:

$$TF_{w_{i,j}} = \frac{1 + \ln(r_{w_{i,j}})}{\sqrt{\sum_{w_{i,j} \in W_i} \left(1 + \ln(r_{w_{i,j}})\right)^2}} \qquad (1)$$

where $r_{w_{i,j}}$ denotes the number of repetitions of $w_{i,j}$ in the document f_i.
(3)
For a query Q = {q_1, q_2, ..., q_m}, the method first constructs an IDF vector $\vec{Q} = (v_1, v_2, \ldots, v_N)$. Then, based on Equation (2), the method sets $v_t = IDF_{q_j}$ when $q_j = w_t$, where t ∈ [1, N] and j ∈ [1, m]:

$$IDF_{q_j} = \ln\!\left(1 + \frac{n}{n_{q_j}}\right) \qquad (2)$$

where n is the total number of documents in the dataset and $n_{q_j}$ is the number of documents that contain the keyword q_j.
Based on the TF vector $\vec{W}_i$ and the IDF vector $\vec{Q}$, the relevance score between f_i and Q can be calculated by Equation (3):

$$Score(f_i, Q) = \vec{W}_i \cdot \vec{Q}. \qquad (3)$$
We can obtain the most relevant documents based on relevance scores between documents and queries.
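To make the conversion concrete, the following Python sketch builds the TF and IDF vectors of Equations (1) and (2) and computes the relevance score of Equation (3) for a toy corpus; the dictionary, documents, and function names (build_tf_vector, build_idf_vector, score) are illustrative assumptions rather than part of the scheme.

```python
import math

DIC = ["cloud", "encryption", "search", "index"]   # dictionary {w_1, ..., w_N}
N = len(DIC)

def build_tf_vector(doc_terms):
    """Equation (1): length-normalized TF vector of one document."""
    counts = {w: doc_terms.count(w) for w in set(doc_terms) if w in DIC}
    norm = math.sqrt(sum((1 + math.log(r)) ** 2 for r in counts.values()))
    vec = [0.0] * N
    for t, w in enumerate(DIC):
        if w in counts:
            vec[t] = (1 + math.log(counts[w])) / norm
    return vec

def build_idf_vector(query_terms, docs):
    """Equation (2): IDF vector of a query over the whole corpus."""
    n = len(docs)
    vec = [0.0] * N
    for t, w in enumerate(DIC):
        if w in query_terms:
            n_q = sum(1 for d in docs if w in d)
            if n_q > 0:
                vec[t] = math.log(1 + n / n_q)
    return vec

def score(tf_vec, idf_vec):
    """Equation (3): relevance score as an inner product."""
    return sum(x * v for x, v in zip(tf_vec, idf_vec))

docs = [["cloud", "encryption", "cloud"], ["search", "index", "index"]]
query = ["cloud", "search"]
idf = build_idf_vector(query, docs)
print([round(score(build_tf_vector(d), idf), 3) for d in docs])
```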

3.2. Approach for Index Building

For building the index, we first utilize the k-means clustering algorithm to divide the entire corpus into several document sets. Then, we provide a tree-building algorithm to produce an index tree for each document set. Finally, all index trees are merged to create the corpus’s plaintext index. The detailed algorithms for these steps are given as follows.

3.2.1. Dataset Division Method

By using the keyword conversion method, each document f_i in F corresponds to a vector $\vec{W}_i$. Given the vector set $\{\vec{W}_1, \vec{W}_2, \ldots, \vec{W}_d\}$, we apply the well-known k-means clustering algorithm to split the dataset F. The concrete division method is given in Algorithm 1.
Algorithm 1 Dataset division method.
Input: A vector set $\{\vec{W}_1, \vec{W}_2, \ldots, \vec{W}_d\}$ for the dataset F; the number of document clusters k that users want to produce.
Output: k document clusters C = {c_1, c_2, ..., c_k}.
1: Inputs $\{\vec{W}_1, \vec{W}_2, \ldots, \vec{W}_d\}$ to the k-means algorithm and obtains k document clusters C = {c_1, c_2, ..., c_k};
2: Let MaxLen be the maximum value of |c_1|, |c_2|, ..., |c_k|, where |c_i| is the number of documents in the cluster c_i and i ∈ [1, k];
3: for each i ∈ [1, k] do
4:    Computes GapLen = MaxLen - |c_i|;
5:    if GapLen > 0 then
6:        Constructs GapLen fake documents whose TF vectors are set to the zero vector;
7:        Combines these GapLen fake documents and c_i to create a new cluster c_i;
8:    end if
9: end for
10: return C = {c_1, c_2, ..., c_k};
Algorithm 1 first applies the k-means algorithm to partition the dataset into k document clusters C = {c_1, c_2, ..., c_k}, where the documents in each cluster are semantically relevant. Since the number of documents in each cluster differs, for the sake of security, Algorithm 1 then appends fake documents to each cluster so that all clusters contain the same number of documents. Finally, Algorithm 1 outputs the k clusters C = {c_1, c_2, ..., c_k}, where c_i = {f_{i,1}, f_{i,2}, ..., f_{i,φ}}, f_{i,j} is a document in c_i, and φ is the number of documents in each cluster. We build the index of the dataset F based on the partition result C.
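As a minimal illustration of the division step, the following Python sketch clusters TF vectors with scikit-learn's KMeans and pads every cluster with zero-vector fake documents, as Algorithm 1 prescribes; the helper name divide_dataset and the random toy data are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

def divide_dataset(tf_vectors, k):
    """Split d TF vectors into k clusters of equal size, padding with fake documents."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(tf_vectors)
    clusters = [[] for _ in range(k)]
    for doc_id, label in enumerate(labels):
        clusters[label].append((doc_id, tf_vectors[doc_id]))
    max_len = max(len(c) for c in clusters)            # MaxLen in Algorithm 1
    dim = tf_vectors.shape[1]
    for c in clusters:
        while len(c) < max_len:                        # append GapLen fake documents
            c.append((None, np.zeros(dim)))            # zero TF vector, no real file ID
    return clusters

tf_vectors = np.random.rand(12, 8)                     # 12 documents, dictionary of size 8
clusters = divide_dataset(tf_vectors, k=3)
print([len(c) for c in clusters])                      # every cluster has the same size
```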

3.2.2. Method for Building the Plaintext Index

Like some previous SSE schemes [8,9], we take advantage of the balanced binary tree as the index structure. For the sake of clarity, we define the data structure of a tree node u as u = <ID, $\vec{u}$, P_l, P_r, FID>, where ID is the identity of u; $\vec{u}$ is the vector representation of u; P_l and P_r are two pointers that point to u's left and right children, respectively; and FID is the identity of a document if u is a leaf node. According to this formal definition, we propose the method for building the plaintext index. For each cluster c_i in C, to create the index tree T_i of c_i, we first convert the documents of the cluster c_i into leaf nodes. Specifically, for each document f_{i,j} in c_i, we set u_{i,j}.ID = GenID(), $u_{i,j}.\vec{u} = \vec{W}_{i,j}$, u_{i,j}.P_l = u_{i,j}.P_r = NULL, and u_{i,j}.FID to be the identifier of f_{i,j}, where GenID() is a function that generates a unique ID for a tree node, $\vec{W}_{i,j}$ is the TF vector of the document f_{i,j}, i ∈ [1, k], and j ∈ [1, φ]. After this conversion, we obtain a leaf node set l_i = {u_{i,1}, u_{i,2}, ..., u_{i,φ}} for the cluster c_i.
Based on the leaf node set, l i , of the cluster, c i , we propose the index tree building algorithm for c i in Algorithm 2.
From Algorithm 2, l_i initially contains all the leaf nodes of the cluster c_i. We need to construct the internal nodes of the index tree in a bottom-up manner based on l_i. Let |l_i| be the number of nodes in l_i. If l_i contains only one node, this node is the root node of the index tree of c_i. Otherwise, we use two nodes in l_i to create a parent node. More specifically, let l_i[t] and l_i[t+1] be two nodes in l_i; a parent node u of these two nodes is built as follows.
The ID of u is generated by running GenID(). The two pointers of u point to l_i[t] and l_i[t+1], respectively. For the vector of u, $\vec{u}[j]$ is the maximum of $\vec{l_i[t]}[j]$ and $\vec{l_i[t+1]}[j]$, where $\vec{u}[j]$, $\vec{l_i[t]}[j]$, and $\vec{l_i[t+1]}[j]$ represent the values of the j-th dimension of $\vec{u}$, $\vec{l_i[t]}$, and $\vec{l_i[t+1]}$, respectively. By calling the function BuildTree(l_i) recursively, the plaintext index tree T_i for the cluster c_i can be constructed.
After building the index tree for each cluster, we use all the index trees together as the index for the entire dataset, denoted by Ind = {r_1, r_2, ..., r_k}, where r_i is the root node of T_i and i ∈ [1, k]. For the sake of clarity, we give an example to illustrate the process of index building.
Algorithm 2 The algorithm for building the index tree for the cluster c_i, declared by BuildTree(l_i)
Input: The leaf node set l_i of the cluster c_i.
Output: The plaintext index tree T_i for the cluster c_i.
1: if |l_i| == 1 then
2:    return l_i;   \\ This is the root node of the tree.
3: end if
4: Initializes an empty set TempNodeSet;
5: Sets s = |l_i|, t = 0;
6: while t < s do
7:    if t + 1 < s then
8:        Creates a parent node u for the two nodes l_i[t] and l_i[t+1], where u.ID = GenID(), u.P_l = l_i[t], u.P_r = l_i[t+1], u.FID = NULL, and $\vec{u}[j] = Max(\vec{l_i[t]}[j], \vec{l_i[t+1]}[j])$;
9:        Inserts u into TempNodeSet;
10:   else
11:       Inserts l_i[t] into TempNodeSet;
12:   end if
13:   t = t + 2;
14: end while
15: l_i = BuildTree(TempNodeSet);   \\ recursively calls BuildTree
16: return l_i;
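The following Python sketch mirrors the node structure defined above and the bottom-up construction of Algorithm 2; the class and function names (TreeNode, build_tree) and the random toy vectors are illustrative assumptions.

```python
import itertools
import numpy as np

_id_counter = itertools.count()              # stands in for GenID()

class TreeNode:
    def __init__(self, vector, left=None, right=None, fid=None):
        self.id = next(_id_counter)
        self.vector = vector                 # TF vector (leaf) or dimension-wise maximum (internal)
        self.left, self.right = left, right
        self.fid = fid                       # document identifier for leaf nodes, else None

def build_tree(nodes):
    """BuildTree(l_i): merge nodes pairwise until a single root remains."""
    if len(nodes) == 1:
        return nodes[0]                      # root of the index tree
    parents = []
    for t in range(0, len(nodes), 2):
        if t + 1 < len(nodes):
            vec = np.maximum(nodes[t].vector, nodes[t + 1].vector)
            parents.append(TreeNode(vec, left=nodes[t], right=nodes[t + 1]))
        else:
            parents.append(nodes[t])         # an odd node is carried up unchanged
    return build_tree(parents)

# Leaves of one cluster: four documents with 5-dimensional TF vectors.
leaves = [TreeNode(np.random.rand(5), fid=f"f{j}") for j in range(1, 5)]
root = build_tree(leaves)
print(root.vector)                           # dimension-wise maximum over the whole cluster
```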
Example 1.
An example of building the index for a document set F = {f_1, f_2, ..., f_12} is illustrated in Figure 2. As shown in Figure 2, after applying the clustering algorithm, we suppose that F is divided into three clusters c_1 = {f_1, f_4, f_7, f_11}, c_2 = {f_2, f_3, f_8, f_12}, and c_3 = {f_5, f_6, f_9, f_10}. For each cluster c_i, we first convert each document in c_i into a leaf node. Then, based on these leaf nodes, we construct the internal nodes in a bottom-up manner using Algorithm 2. Finally, we combine these three index trees as the plaintext index of the document set F.

3.3. Approach for Index Search

For a keyword query Q, we adopt the keyword conversion method to transform Q into an IDF vector $\vec{Q}$. Given the index Ind = {r_1, r_2, ..., r_k} of the dataset, for each r_i we compute the relevance score between $\vec{Q}$ and $\vec{r}_i$ according to Equation (3), where $\vec{r}_i$ is the vector of the root node r_i and i ∈ [1, k]. We then choose the t root nodes with the highest relevance scores and search the index trees associated with these root nodes, where t is set by the data users. Supposing that the selected root nodes are {r_1, r_2, ..., r_t}, the search algorithm for each index tree with root r_i, where i ∈ [1, t], is described in Algorithm 3.
By running Algorithm 3, we obtain a new document set DS = RList_1 ∪ RList_2 ∪ ... ∪ RList_t, where RList_i is the result of querying the index tree with root r_i and i ∈ [1, t]. We then select the θ documents with the highest relevance scores from DS as the query results.
Algorithm 3 The algorithm for searching the index tree, declared by SearchIndexTree($\vec{Q}$, u, RList)
Input: An IDF vector $\vec{Q}$ of the query Q, an index tree with root node r_i, and an empty result list RList.
Output: RList containing the documents with the θ maximum relevance scores.
1: if u is an internal node then
2:    SearchIndexTree($\vec{Q}$, u.P_l, RList);
3:    SearchIndexTree($\vec{Q}$, u.P_r, RList);
4: else
5:    if $\vec{u} \cdot \vec{Q}$ > the θ-th score then
6:        Deletes the element holding the smallest relevance score in RList;
7:        Inserts a new element <Score(u, Q), u.FID> into RList, and updates the θ-th score;
8:    end if
9: end if
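A minimal Python sketch of Algorithm 3 is given below: a recursive depth-first traversal that keeps the θ highest-scoring leaves in the result list. The simple Node class and the function name search_index_tree are illustrative, and the scores are computed here on plaintext vectors, whereas the real scheme evaluates them on encrypted nodes (Section 4).

```python
import numpy as np

class Node:
    def __init__(self, vector, left=None, right=None, fid=None):
        self.vector, self.left, self.right, self.fid = vector, left, right, fid

def search_index_tree(q_vec, u, rlist, theta):
    if u.fid is None:                                    # internal node: visit both children
        search_index_tree(q_vec, u.left, rlist, theta)
        search_index_tree(q_vec, u.right, rlist, theta)
    else:                                                # leaf node: compare with the theta-th score
        s = float(np.dot(u.vector, q_vec))               # Score(u, Q)
        if len(rlist) < theta:
            rlist.append((s, u.fid))
        elif s > min(rlist)[0]:
            rlist.remove(min(rlist))                     # drop the smallest score
            rlist.append((s, u.fid))
    return rlist

# A tiny index tree over three documents.
f1, f2, f3 = (Node(np.array([0.2, 0.8]), fid="f1"),
              Node(np.array([0.9, 0.1]), fid="f2"),
              Node(np.array([0.5, 0.5]), fid="f3"))
inner = Node(np.maximum(f1.vector, f2.vector), left=f1, right=f2)
root = Node(np.maximum(inner.vector, f3.vector), left=inner, right=f3)
print(sorted(search_index_tree(np.array([1.0, 0.5]), root, [], theta=2), reverse=True))
```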
Example 2.
In this example, we assume that the search aim is to find the documents with the top two relevance scores. From Figure 3, the entire search process consists of three parts. Firstly, we transform the query, Q, into an IDF vector Q , and then calculate the relevance score between the query, Q, and the three index tree roots, { r 1 , r 2 , r 3 } . Suppose that we retrieve only two index trees whose root nodes are most relevant to the query, Q. According to the calculation results, index tree 1 and tree 3 are chosen as our retrieval targets. Secondly, we apply Algorithm 3 to perform the retrieval operation on the index tree and obtain f 1 , f 7 and f 6 , f 10 as the result of the query on index trees 1 and 3, respectively. Thirdly, we combine the query results of index tree 1 and tree 3 and return the two documents f 1 , f 10 with the highest correlation scores as the final query results.

4. Proposed Scheme

In the last section, we introduced the construction and retrieval methods for the plaintext index. In this section, we utilize the ASPE scheme to encrypt the index and give the concrete construction method of our F-SSE-RS scheme. After this, the dynamic update approach and the security analysis for our scheme are also presented.

4.1. Construction of F-SSE-RS

According to the system model introduced in Section 2.1, the F-SSE-RS scheme consists of four algorithms: KeyGen, IndexBuild, TrapdoorGen, and Search. The KeyGen and IndexBuild algorithms are executed by the data owner to generate the secret key and create the encrypted index, respectively. The data user performs the TrapdoorGen algorithm to generate the encrypted trapdoor and transmits it to the cloud server. Upon receiving the trapdoor, the cloud server runs the Search algorithm to perform the keyword query and returns the search result to the data user. The detailed construction of F-SSE-RS is given as follows.
  • KeyGen(γ): Taking a security parameter γ as input, this algorithm chooses two random invertible matrices M_1 and M_2 of dimension (N+L) × (N+L) and a vector S of dimension N+L. Then, it sets the secret key sk as {M_1, M_2, S} and outputs sk to the authorized data users.
  • IndexBuild(sk, F): Given a document set F, this algorithm first partitions F into k document subsets {c_1, c_2, ..., c_k} using the dataset division method. For each document subset c_j, this algorithm adopts Algorithm 2 to generate an index tree T_j for c_j, where j ∈ [1, k]. Then, this algorithm encrypts the index tree T_j. The encryption process starts from the root node, and each node is encrypted using a sequential traversal method. More precisely, for a node u = <ID, $\vec{u}$, P_l, P_r, FID>, the algorithm extends the N-dimensional vector $\vec{u}$ into an (N+L)-dimensional vector $\vec{u}_E$, in which $\vec{u}_E[i]$ is set to $\vec{u}[i]$ when i ∈ [1, N] and to a random number $\epsilon_i$ when i ∈ [N+1, N+L]. After the extension process, two random vectors $\{\vec{u}_E', \vec{u}_E''\}$ of $\vec{u}_E$ are created by using the following equations:

$$\begin{cases} \vec{u}_E'[i] + \vec{u}_E''[i] = \vec{u}_E[i], & \text{if } S[i] = 0; \\ \vec{u}_E'[i] = \vec{u}_E''[i] = \vec{u}_E[i], & \text{if } S[i] = 1, \end{cases} \qquad i \in [1, N+L].$$

    After encrypting each node in the index tree T_j, the algorithm generates the encrypted index tree ET_j, where the encrypted node Eu of each u can be expressed as Eu = <ID, $M_1^T \vec{u}_E'$, $M_2^T \vec{u}_E''$, P_l, P_r, FID>. Finally, after encrypting all the index trees, the algorithm outputs the encrypted index EInd = {ET_1, ET_2, ..., ET_k}.
  • TrapdoorGen(sk, Q): Given a query Q, this algorithm first transforms Q into an IDF vector $\vec{Q}$ using the keyword conversion method given in Section 3.1. Then, this algorithm extends the N-dimensional vector $\vec{Q}$ into an (N+L)-dimensional vector $\vec{Q}_E$, where $\vec{Q}_E[i]$ is set to $\vec{Q}[i]$ when i ∈ [1, N] and to 0 or 1 randomly when i ∈ [N+1, N+L]. After this, the algorithm generates two random vectors $\{\vec{Q}_E', \vec{Q}_E''\}$ according to the following equations:

$$\begin{cases} \vec{Q}_E'[i] + \vec{Q}_E''[i] = \vec{Q}_E[i], & \text{if } S[i] = 1; \\ \vec{Q}_E'[i] = \vec{Q}_E''[i] = \vec{Q}_E[i], & \text{if } S[i] = 0, \end{cases} \qquad i \in [1, N+L].$$

    Finally, this algorithm outputs $T_Q = \{M_1^{-1} \vec{Q}_E', M_2^{-1} \vec{Q}_E''\}$ as the trapdoor for Q.
  • Search(T_Q, EInd): Given the trapdoor T_Q, for each encrypted tree ET_i, this algorithm computes the relevance score rs_i between the encrypted root node r_i of ET_i and T_Q, where i ∈ [1, k]. Suppose that $\{rs_{\rho_1}, rs_{\rho_2}, \ldots, rs_{\rho_t}\}$ are the top-t relevance scores; the search algorithm then performs a traversal search on the corresponding encrypted trees $\{ET_{\rho_1}, ET_{\rho_2}, \ldots, ET_{\rho_t}\}$, where $\{\rho_1, \rho_2, \ldots, \rho_t\} \subseteq \{1, 2, \ldots, k\}$. For each j ∈ [1, t], this algorithm searches the encrypted tree $ET_{\rho_j}$ according to Algorithm 3. In the search process, for an encrypted tree node Eu = <ID, $M_1^T \vec{u}_E'$, $M_2^T \vec{u}_E''$, P_l, P_r, FID> and the trapdoor $T_Q = \{M_1^{-1} \vec{Q}_E', M_2^{-1} \vec{Q}_E''\}$, this algorithm can compute:

$$(M_1^T \vec{u}_E') \cdot (M_1^{-1} \vec{Q}_E') + (M_2^T \vec{u}_E'') \cdot (M_2^{-1} \vec{Q}_E'') = \vec{u}_E' \cdot \vec{Q}_E' + \vec{u}_E'' \cdot \vec{Q}_E'' = \vec{u}_E \cdot \vec{Q}_E = Score(u, Q). \qquad (4)$$

    According to Equation (4), the computation result between Eu and T_Q is the same as that between the plaintext $\vec{u}$ and $\vec{Q}$. Therefore, the search algorithm can employ Algorithm 3 to perform the ranked search in the encrypted state. After finishing the query on the encrypted tree $ET_{\rho_j}$, a result set $RList_{\rho_j}$ on $ET_{\rho_j}$ can be obtained. Finally, this algorithm selects the θ documents with the highest scores from $RList_{\rho_1} \cup RList_{\rho_2} \cup \ldots \cup RList_{\rho_t}$ and returns them to the user as the query results.
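To illustrate the ASPE-style encryption used in IndexBuild and TrapdoorGen, the following Python sketch extends, splits, and encrypts one index vector and one query vector and then checks the inner-product preservation of Equation (4); the random secret key, the dimensions, and all variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 6, 2                                   # keyword dimension and extension length
dim = N + L

# Secret key sk = {M1, M2, S}: two random (almost surely invertible) matrices and a bit vector.
M1 = rng.random((dim, dim)) + np.eye(dim)
M2 = rng.random((dim, dim)) + np.eye(dim)
S = rng.integers(0, 2, dim)

def split(vec, split_when):
    """Split vec into (v', v''): random split where S[i] == split_when, copy elsewhere."""
    v1, v2 = vec.copy(), vec.copy()
    for i in range(dim):
        if S[i] == split_when:
            r = rng.random()
            v1[i], v2[i] = r, vec[i] - r
    return v1, v2

# Index side: extend the TF vector with L random numbers, split where S[i] == 0.
u = rng.random(N)
uE = np.concatenate([u, rng.random(L)])
u1, u2 = split(uE, split_when=0)
Eu = (M1.T @ u1, M2.T @ u2)                   # encrypted node vectors

# Query side: extend the IDF vector with L random bits, split where S[i] == 1.
q = rng.random(N)
qE = np.concatenate([q, rng.integers(0, 2, L).astype(float)])
q1, q2 = split(qE, split_when=1)
TQ = (np.linalg.inv(M1) @ q1, np.linalg.inv(M2) @ q2)   # trapdoor

# Equation (4): the server-side score equals the inner product of the extended vectors.
enc_score = Eu[0] @ TQ[0] + Eu[1] @ TQ[1]
print(np.isclose(enc_score, uE @ qE))         # True
```

The check confirms that the server recovers $\vec{u}_E \cdot \vec{Q}_E$ without ever seeing M_1, M_2, or S; the extension dimensions are what mask the plaintext score, as discussed in the keyword-privacy analysis of Section 4.3.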

4.2. Dynamic Update Operations

In general, in addition to the above search requirement, our scheme also needs to satisfy the user’s requirements for adding, deleting, and modifying documents. Therefore, we require three additional approaches to the scheme to support the above dynamic update operations. Since the encrypted index of the scheme is based on the balanced binary tree, we can implement these update operations by dynamically adding and deleting tree nodes. Inspired by the methods given in [8,9], the three dynamic update methods on F-SSE-RS are as follows.
-
Deletion: When DO wants to delete a document f from the index, DO first determines which index tree f belongs to. Then, DO locates the position of the leaf node of f in that index tree. Finally, DO sends the location information to CS, which nulls the node based on this information to achieve the deletion operation.
-
Addition: When DO wants to add a document f to the index, DO first transforms f's keywords into a TF vector using the keyword conversion method and constructs a leaf node for f with this TF vector. Subsequently, using the TF vector, DO finds the index tree whose root node is the most semantically similar to f and locates a leaf node marked as invalid in that tree. Then, DO replaces this invalid node with the leaf node of f and updates the vectors of all internal nodes on the path from the root of the tree to this leaf node. Finally, DO encrypts all the changed nodes and sends them to CS together with their corresponding position information. When CS receives these nodes, CS replaces the relevant nodes based on the position information to implement the insertion operation. In addition, if there are no leaf nodes marked as invalid in the index tree, DO can add multiple invalid nodes to the index tree and update the index tree. After that, DO encrypts the modified tree nodes and sends them with their location information to CS. According to this location information, CS updates the index tree to realize the file addition operation.
-
Modification: If DO wants to modify a file, DO first locates the leaf node corresponding to that file and replaces the semantic vector of the leaf node with the new vector. Then, DO updates all the nodes on the path from the root of the tree to that leaf node based on the modified vector of the leaf node. Finally, DO encrypts the contents of all the changed nodes and sends them, together with their location information, to CS. When CS receives these nodes, it replaces the old nodes according to the location information to perform the update operation.
Note that the above dynamic operations all require that DO has a plaintext index stored locally. The advantage of this is that by updating the plaintext index locally, DO can obtain the information about the location and content of index updates faster. In addition, during the update process, DO encrypts the content information to be updated in the index and sends the corresponding location information to CS in plaintext, so that CS can finish modifying the index without understanding the updated content. Although the local storage method has additional space overhead, it improves the update efficiency and security of the scheme.
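As a minimal illustration of the path-update step shared by the addition and modification operations above, the following Python sketch replaces a leaf vector and recomputes the dimension-wise maximum of every internal node on the root-to-leaf path; the Node class and the name update_path are assumptions for this example, and the subsequent encryption and transmission to CS are omitted.

```python
import numpy as np

class Node:
    def __init__(self, vector, left=None, right=None, fid=None):
        self.vector, self.left, self.right, self.fid = vector, left, right, fid

def update_path(path, new_vector, new_fid):
    """path = [root, ..., leaf]; returns the nodes whose contents changed."""
    leaf = path[-1]
    leaf.vector, leaf.fid = new_vector, new_fid
    changed = [leaf]
    for node in reversed(path[:-1]):                     # walk back up towards the root
        node.vector = np.maximum(node.left.vector, node.right.vector)
        changed.append(node)
    return changed                                       # DO would encrypt these and send them to CS

# Two leaves under one root; replace the invalid (zero-vector) leaf with a new document.
f1 = Node(np.array([0.4, 0.1]), fid="f1")
slot = Node(np.zeros(2), fid=None)                       # fake/invalid leaf
root = Node(np.maximum(f1.vector, slot.vector), left=f1, right=slot)
update_path([root, slot], np.array([0.2, 0.9]), "f_new")
print(root.vector)                                       # [0.4 0.9]
```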

4.3. Security Analysis

In this subsection, we analyze the security of the proposed F-SSE-RS scheme based on the privacy requirements introduced in Section 2.3.
-
Document and index privacy: In the F-SSE-RS scheme, the confidentiality of the document content is guaranteed by a traditional symmetric secret key encryption scheme, such as AES. The index in the F-SSE-RS scheme is a combination of multiple index trees, and the content of each node in the index tree is cryptographically protected using the ASPE scheme. Because AES and ASPE are provably secure under known ciphertext models, the plaintext contents hidden in the documents and indices cannot be inferred by an attacker. So, we argue that the privacy of documents and indices is protected well.
-
Trapdoor unlinkability: The trapdoor-generation algorithm of the proposed scheme is probabilistic, which is manifested in the following two aspects. (1) The semantic vector Q of the query Q is enlarged into an extension vector Q E before generating the trapdoor, and even the same two queries can be enlarged into different extension vectors; (2) in the “ TrapdoorGen” algorithm, the query vector Q E is partitioned into two parts randomly. Based on the above two points, we can conclude that the same two queries can be encrypted into different trapdoors, so the proposed scheme can satisfy the requirement of trapdoor unlinkability.
-
Keyword privacy: Under the known ciphertext model, the attacker cannot infer the keyword information from the index and trapdoor, since the F-SSE-RS scheme utilizes the ASPE scheme to encrypt them. However, in the known background model, CS can use the document–word frequency to perform statistical attacks and then infer the keywords embedded in the index and trapdoors. Against the statistical attack in the known background model, our scheme extends the keyword vectors $\vec{u}$ and $\vec{Q}$ in the index and trapdoor into $\vec{u}_E$ and $\vec{Q}_E$, respectively. Specifically, for each extended dimension of $\vec{u}_E$, the scheme randomly selects a number $\epsilon_i$, while for each extended dimension of $\vec{Q}_E$, the scheme randomly selects 0 or 1. This approach allows the query results to be masked by the randomness of $\epsilon_i$. Since the number of extended dimensions is L, the probability that two $\epsilon_i$ have the same value is only $1/2^L$. Therefore, when L increases, the query results are more influenced by $\epsilon_i$, with the result that the privacy of keywords increases but the search accuracy decreases. Therefore, by adjusting L, we can make a tradeoff between precision and privacy in practical applications. The analysis of this tradeoff can be found in [8].

5. Performance Evaluation

In this section, we evaluate the proposed F-SSE-RS scheme theoretically and experimentally and give a detailed experiment to quantify the space–time efficiency of the scheme. We implemented the proposed scheme in Python and tested it on a real dataset, i.e., Enron e-mail datasets [28]. In addition, our experimental runtime environment includes an Intel(R) Core(TM) i7 CPU whose frequency is 2.90 GHz and 16 GB of RAM. To illustrate the advantages of the proposed scheme, we compare it with two similar previous schemes in terms of the time complexity of index construction, trapdoor generation and searching, and the space complexity of indexes and trapdoors. For convenience, we denote the schemes proposed in [8,9] as Xia16 and Guo19, respectively. In addition, we also conduct experiments on the accuracy of these schemes to demonstrate the merits of the proposed schemes more comprehensively.

5.1. Efficiency of Index Building

In the index-building phase, the proposed scheme splits d documents into k document clusters, each of which contains nearly d/k documents on average. For each document cluster, we construct an index tree that contains about 2d/k nodes. Since each node contains a TF vector of length N+L, the time cost of encrypting a node is O((N+L)^2), where the encryption operation mainly consists of two multiplications between an (N+L)×(N+L) invertible matrix and an (N+L)-dimensional TF vector. Considering that the k index trees contain a total of about 2d tree nodes, the index-building time of the proposed scheme is O(d(N+L)^2). Moreover, since the index contains about 2d nodes and each node holds a TF vector of length N+L, the index storage consumption of the proposed scheme is O(d(N+L)). For Xia16, since its index also contains about 2d tree nodes, the index-building time of Xia16 is O(d(N+L)^2) and its storage cost is also O(d(N+L)). For Guo19, its leaf nodes are constructed in the same way as in Xia16, but the vectors of the internal nodes of its index tree are compressed using the bloom filter technique. Thus, its index-building time is O(d(N+L)^2) + O(dα^2), where α is the length of the bloom filter and α << N. Since the index of Guo19 exists in two different vector forms, the storage cost of its scheme is O(d(N+L)) + O(dα).
As shown in Figure 4 and Figure 5, the index-building times of the Xia16, Guo19, and F-SSE-RS schemes are all quadratic in N (Figure 4) and linear in d (Figure 5). Specifically, when d = 1000 and N = 10,000, the index-building times for Xia16, Guo19, and the proposed scheme are 886 s, 444 s, and 902 s, respectively. It is observed that the index generation time of Guo19 is about half of that of Xia16 and the proposed scheme, because the vector dimension of its internal nodes is shorter. In addition, the index-building time of the proposed scheme is slightly longer than that of Xia16, because our scheme performs one additional clustering step. All the above experimental results are consistent with the theoretical analysis.

5.2. Efficiency of Trapdoor Generation

In the trapdoor-generation phase, the proposed scheme first converts the query Q into an IDF vector of dimension N+L and then encrypts this vector using the ASPE scheme. Therefore, the trapdoor generation time of the proposed scheme is O((N+L)^2). For Xia16, its trapdoor generation method is the same as that of our scheme, so the trapdoor generation time of Xia16 is also O((N+L)^2). According to the above analysis, the trapdoor storage costs of both the proposed scheme and Xia16 are O(N+L). For Guo19, since the internal nodes and leaf nodes are constructed using the bloom filter vector and the TF vector, respectively, its trapdoor can be seen as a binary tuple (BF_Q, TF_Q), where BF_Q and TF_Q are used to query the internal nodes and leaf nodes, respectively. Based on the above analysis, the trapdoor generation time of Guo19 is O(N+L) + O(α), and its trapdoor storage consumption is also O(N+L) + O(α).
As shown in Figure 4, the trapdoor generation times of the Xia16, Guo19, and F-SSE-RS schemes are all quadratic in N. In particular, when d = 1000 and N = 10,000, the trapdoor generation times for Xia16, Guo19, and the proposed scheme are 440 ms, 455 ms, and 438 ms, respectively. The trapdoor generation time of Guo19 is larger than that of the other two schemes because it has to encrypt two vectors, one for querying internal nodes and one for querying leaf nodes. Besides, the trapdoor generation time of Xia16 is almost the same as that of F-SSE-RS. All the above experimental results are consistent with the theoretical analysis.

5.3. Efficiency of Search

In the search phase, because the index of the proposed scheme contains k index trees and the height of each index tree is log_2(d/k), the search time of each index tree is O(log_2(d/k)·(N+L)), where N+L is the length of the vector contained in an internal node. In addition, when the search operation reaches the leaf nodes, the similarity calculation is performed; its time consumption for each index tree is O(N+L) since the dimension of the TF vector is N+L. Based on the above analysis, assuming that we select the t most relevant index trees for querying, the search time of the proposed scheme is O(t·log_2(d/k)·(N+L)) + O(N+L). Xia16 has to search an index tree of height log_2(d), so its query time is O(log_2(d)·(N+L)) + O(N+L). For Guo19, its query time is O(log_2(d)·α) + O(N+L) since the vector length of its internal nodes is α.
As shown in Figure 4 and Figure 5, the query times of the Xia16, Guo19, and F-SSE-RS schemes are linear in N and sublinear in d. Concretely, when d = 1000 and N = 10,000, the search times for Xia16, Guo19, and F-SSE-RS are 193 ms, 162 ms, and 98 ms, respectively. Based on this experimental result, the search time of the proposed scheme is about two-thirds of that of Guo19 and half of that of Xia16. The proposed scheme has the highest query efficiency due to the lower depth of its index trees and the smaller number of queried nodes, which is consistent with the theoretical analysis.

5.4. Accuracy

Our scheme selects only a few of the most relevant index trees for querying, which affects the accuracy of the search results. To quantify this impact, we use the "precision" definition proposed in [7], namely p = θ'/θ, where θ' is the number of real top-θ files returned by CS. For clarity, we design an experiment to test the relationship between the search time and the query accuracy of F-SSE-RS. Concretely, we construct an index consisting of 5 index trees and select the most relevant t index trees for querying, where t ∈ {1, 2, 3, 4, 5}. Figure 6 shows the comparison results among the proposed scheme, Guo19, and Xia16. From Figure 6, we can find that the query accuracy of F-SSE-RS decreases as t decreases, but the search time also decreases substantially. Specifically, for every loss of approximately 10% in query accuracy, there is a reduction of approximately 20% in search time. In summary, according to Figure 6, compared with Xia16, F-SSE-RS improves search efficiency while sacrificing as little query accuracy as possible. Compared with Guo19, the proposed scheme guarantees similar query accuracy while using less search time.

5.5. Discussion

Based on the above theoretical analysis and experimental results, we can find that the proposed scheme has good flexibility compared to Xia16. Specifically, we can adjust the value of t to make a certain compromise between query accuracy and search time. Furthermore, compared with Guo19, our scheme has better query accuracy and search time, except for the longer index-building time. In real-time applications, users generally care more about search time and query accuracy, so our scheme will be more practical.
In addition, the index of the proposed scheme consists of multiple index trees, so we can accelerate the query process using parallel computing methods. Whenever DU performs a keyword search, DU sends a query trapdoor to CS. Then, CS will utilize the trapdoor to execute the search operation on multiple index trees in parallel. Each task independently performs a keyword search and adds the results to the final result set. Finally, CS returns the result set to DU. By adopting this method, the search efficiency can be significantly improved. Since the cloud platform has powerful computing power, we believe that the parallel strategy is very suitable for the proposed scheme.
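To sketch the parallel strategy described above, the following Python example fans a trapdoor out to several index trees with a thread pool and merges the per-tree results into the final top-θ list; the tree objects, their search method, and the name parallel_search are assumptions standing in for the encrypted-tree search of Section 4.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

def parallel_search(trees, trapdoor, theta):
    """Query every selected index tree in parallel and keep the top-theta hits overall."""
    with ThreadPoolExecutor(max_workers=len(trees)) as pool:
        partial_results = list(pool.map(lambda t: t.search(trapdoor), trees))
    merged = [hit for rlist in partial_results for hit in rlist]
    return heapq.nlargest(theta, merged)

# Stand-in "trees" whose search just returns precomputed (score, file_id) pairs.
class FakeTree:
    def __init__(self, hits): self.hits = hits
    def search(self, trapdoor): return self.hits

trees = [FakeTree([(0.9, "f1"), (0.4, "f7")]), FakeTree([(0.7, "f10"), (0.2, "f6")])]
print(parallel_search(trees, trapdoor=None, theta=2))   # [(0.9, 'f1'), (0.7, 'f10')]
```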
The proposed scheme can be applied to cloud-based communication systems, such as wireless IoT systems [29], e-healthcare systems [30], and personalized search systems [2]. Taking the personalized search system as an example, users can use their terminal devices to send encrypted query information to the cloud, and the cloud server can find, through secure retrieval, the points of interest near the users and related to their queries and mark them on the users' devices. This kind of application is characterized by the real-time nature of user queries, and the fast query capability of our solution can better serve such users. In addition, the proposed solution offers good flexibility between real-time performance and accuracy: the user can dynamically adjust the parameter t and choose whether real-time performance or accuracy is the priority. This customization allows users to have a superior query experience.

6. Conclusions

In this paper, we utilize a clustering algorithm to divide the document set into multiple document clusters and index each document cluster using a binary balanced tree. When a keyword query is performed, the search algorithm retrieves only the index tree that is most semantically related to the query keyword, which effectively improves the query efficiency. By adopting an ASPE scheme to encrypt the index and query, we propose an F-SSE-RS scheme. This scheme can support ranked search on encrypted data and is secure under the known background model.
Furthermore, we give a detailed theoretical and experimental analysis. This analysis indicates that the query efficiency of the proposed scheme is sublinear in the number of documents. In addition, our scheme achieves better query efficiency without compromising too much query accuracy and has better flexibility than other typical similar schemes. Thus, we believe that the proposed scheme has high practicality. Although this paper improves the query efficiency by eliminating some irrelevant documents through clustering, it still loses some query precision. Therefore, one direction for future work is to further improve query accuracy while maintaining query efficiency. In addition, the proposed scheme currently supports only textual queries, while many existing cloud-based applications need to support both spatial and textual queries. Therefore, another extension of this work is to construct efficient searchable symmetric encryption schemes supporting spatial data queries.

Author Contributions

Conceptualization, W.H. and Y.Z.; data curation, Y.Z. and Y.L.; formal analysis, W.H. and Y.Z.; funding acquisition, W.H. and Y.Z.; methodology, W.H. and Y.Z.; software, Y.Z.; validation, W.H. and Y.Z.; writing—original draft preparation, W.H. and Y.Z.; writing—review & editing, W.H. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 61972090, 31872704), by Natural Science Foundation of Henan (Grant No. 202300410339), and by the Science and Technology Project of Henan Province (Grant No. 212102310993).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used to support the findings of this study is available from the website, “URL: http://www.cs.cmu.edu/~./enron/” (accessed on 19 April 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SE: Searchable encryption
ASPE: Asymmetric scalar-product-preserving encryption
SSE: Searchable symmetric key encryption
SPE: Searchable public key encryption
TF-IDF: Term frequency-inverse document frequency
PEKS: Public key encryption with keyword search
DO: Data owner
DU: Data user
CS: Cloud server

References

1. Song, D.; Wagner, D.; Perrig, A. Practical techniques for searching on encrypted data. In Proceedings of the IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55.
2. Fu, Z.; Ren, K.; Shu, J.; Sun, X.; Huang, F. Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans. Parallel Distrib. Syst. 2015, 27, 2546–2559.
3. Sun, W.; Liu, X.; Lou, W.; Hou, Y.T.; Li, H. Catch you if you lie to me: Efficient verifiable conjunctive keyword search over large dynamic encrypted cloud data. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May 2015; pp. 2110–2118.
4. Boneh, D.; Di Crescenzo, G.; Ostrovsky, R.; Persiano, G. Public key encryption with keyword search. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; pp. 506–522.
5. Zhang, Y.; Li, Y.; Wang, Y. Secure and Efficient Searchable Public Key Encryption for Resource Constrained Environment Based on Pairings under Prime Order Group. Secur. Commun. Netw. 2019, 2019, 1–14.
6. Miao, Y.; Tong, Q.; Deng, R.; Choo, K.K.R.; Liu, X.; Li, H. Verifiable searchable encryption framework against insider keyword-guessing attack in cloud storage. IEEE Trans. Cloud Comput. 2020, 1–14.
7. Cao, N.; Wang, C.; Li, M.; Ren, K.; Lou, W. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 222–233.
8. Xia, Z.; Wang, X.; Sun, X.; Wang, Q. A Secure and Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 340–352.
9. Guo, C.; Zhuang, R.; Chang, C.C.; Yuan, Q. Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data. IEEE Access 2019, 7, 35826–35837.
10. Wong, W.K.; Cheung, D.W.; Kao, B.; Mamoulis, N. Secure kNN computation on encrypted databases. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; pp. 139–152.
11. Goh, E.J. Secure indexes. IACR Cryptol. ePrint Arch. 2003, 2003, 216.
12. Wang, B.; Li, M.; Wang, H. Geometric range search on encrypted spatial data. IEEE Trans. Inf. Forensics Secur. 2015, 11, 704–719.
13. Xu, G.; Li, H.; Dai, Y.; Yang, K.; Lin, X. Enabling efficient and geometric range query with access control over encrypted spatial data. IEEE Trans. Inf. Forensics Secur. 2018, 14, 870–885.
14. Fu, Z.; Wu, X.; Guan, C.; Sun, X.; Ren, K. Toward Efficient Multi-Keyword Fuzzy Search Over Encrypted Outsourced Data With Accuracy Improvement. IEEE Trans. Inf. Forensics Secur. 2017, 11, 2706–2716.
15. Kuzu, M.; Islam, M.S.; Kantarcioglu, M. Efficient similarity search over encrypted data. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012; pp. 1156–1167.
16. Zhang, Y.; Li, Y.; Wang, Y. Efficient Searchable Symmetric Encryption Supporting Dynamic Multikeyword Ranked Search. Secur. Commun. Netw. 2020, 2020, 1–16.
17. Wang, C.; Cao, N.; Ren, K.; Lou, W. Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 1467–1479.
18. Shao, J.; Lu, R.; Guan, Y.; Wei, G. Achieve Efficient and Verifiable Conjunctive and Fuzzy Queries over Encrypted Data in Cloud. IEEE Trans. Serv. Comput. 2020, 15, 124–137.
19. Wang, X.; Ma, J.; Liu, X.; Deng, R.H.; Miao, Y.; Zhu, D.; Ma, Z. Search me in the dark: Privacy-preserving boolean range query over encrypted spatial data. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 2253–2262.
20. Guo, C.; Chen, X.; Jie, Y.; Fu, Z.; Li, M.; Feng, B. Dynamic multi-phrase ranked search over encrypted data with symmetric searchable encryption. IEEE Trans. Serv. Comput. 2020, 13, 1034–1044.
21. Park, D.J.; Kim, K.; Lee, P.J. Public key encryption with conjunctive field keyword search. In Proceedings of the International Workshop on Information Security Applications, Jeju Island, Korea, 23–25 August 2004; pp. 73–86.
22. Katz, J.; Sahai, A.; Waters, B. Predicate encryption supporting disjunctions, polynomial equations, and inner products. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; pp. 146–162.
23. Xu, P.; Tang, S.; Xu, P.; Wu, Q.; Hu, H.; Susilo, W. Practical multi-keyword and boolean search over encrypted e-mail in cloud server. IEEE Trans. Serv. Comput. 2019, 14, 1877–1889.
24. Miao, Y.; Liu, X.; Choo, K.K.R.; Deng, R.H.; Li, J.; Li, H.; Ma, J. Privacy-preserving attribute-based keyword search in shared multi-owner setting. IEEE Trans. Dependable Secur. Comput. 2021, 18, 1080–1094.
25. Xu, P.; He, S.; Wang, W.; Susilo, W.; Jin, H. Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks. IEEE Trans. Ind. Inform. 2017, 14, 3712–3723.
26. Zhang, Y.; Wang, Y.; Li, Y. Searchable Public Key Encryption Supporting Semantic Multi-Keywords Search. IEEE Access 2019, 7, 122078–122090.
27. Dhandabani, R.; Periyasamy, S.S.; Padma, T.; Sangaiah, A.K. Six-face cubical key encryption and decryption based on product cipher using hybridisation and Rubik’s cubes. IET Netw. 2018, 7, 313–320.
28. Cohen, W.W. Enron E-Mail Dataset. Available online: http://www.cs.cmu.edu/~./enron/ (accessed on 19 April 2022).
29. Sangaiah, A.K.; Javadpour, A.; Ja’fari, F.; Pinto, P.; Ahmadi, H.; Zhang, W. CL-MLSP: The design of detection mechanism for sinkhole attacks in smart cities. Microprocess. Microsyst. 2022, 90, 104504.
30. Zhang, J.; Liang, X.; Zhou, F.; Li, B.; Li, Y. TYLER, a fast method that accurately predicts cyclin-dependent proteins by using computation-based motifs and sequence-derived features. Math. Biosci. Eng. 2021, 18, 6410–6429.
Figure 1. System model of F-SSE-RS.
Figure 2. An example of the index building process.
Figure 3. An example of the search process.
Figure 4. Impact of N on the time cost of setup (a), index building (b), trapdoor generation (c), and search (d) (N = {2000; 4000; 6000; 8000; 10,000}; d = 1000; k = 5).
Figure 5. Impact of d on the time cost of index building (a) and search (b) (d = {200; 400; 600; 800; 1000}; N = 10,000; k = 5).
Figure 6. Impact of t on the search time and query accuracy (t = {1, 2, 3, 4, 5}; d = 1000; N = 10,000; k = 5).
Table 1. Description of the main notations in the F-SSE-RS scheme.
F: A document set {f_1, f_2, ..., f_d}.
d: The number of documents in F.
DIC = {w_1, w_2, ..., w_N}: The dictionary of the dataset.
N: The number of keywords in the dictionary.
W_i = {w_{i,1}, w_{i,2}, ..., w_{i,n_i}}: The keyword set of the document f_i, where i ∈ [1, d].
n_i: The number of keywords in W_i, where i ∈ [1, d].
w_{i,j}: The j-th keyword in W_i, where i ∈ [1, d], j ∈ [1, n_i].
$\vec{W}_i$: The vector representation of W_i.
Q = {q_1, q_2, ..., q_m}: A keyword query.
q_i: A keyword in Q, where i ∈ [1, m].
$\vec{Q}$: The vector representation of the query Q.
T_Q: The trapdoor of Q.
C = {c_1, c_2, ..., c_k}: The k document clusters divided from F.
c_i = {f_{i,1}, f_{i,2}, ..., f_{i,φ}}: The document set in the cluster c_i.
f_{i,j}: The j-th document in the cluster c_i, where i ∈ [1, k], j ∈ [1, φ].
$\vec{W}_{i,j}$: The vector representation of f_{i,j}.
k: The number of clusters for dataset clustering.
φ: The number of documents in each cluster.
T_i: The index tree for the cluster c_i, where i ∈ [1, k].
r_i: The root node of T_i, where i ∈ [1, k].
u: A node in an index tree.
$\vec{u}$: The vector representation of the node u.
Ind = {r_1, r_2, ..., r_k}: The index for F.
EInd = {ET_1, ET_2, ..., ET_k}: The encrypted index for F.
ET_i: The encrypted index tree for the cluster c_i, where i ∈ [1, k].
t: The number of index trees to be searched.
θ: The number of documents to be returned.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
