Open Access
This article is
 freely available
 reusable
Computers 2020, 9(1), 1; https://doi.org/10.3390/computers9010001
Article
A Privacy Preserving CloudBased KNN Search Scheme with Lightweight User Loads
^{1}
MediaTek Inc., Hsinchu 30078, Taiwan
^{2}
Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei 10617, Taiwan
^{3}
Department of Computer Science and Information Engineering, National Taiwan University, Taipei 10617, Taiwan
^{*}
Author to whom correspondence should be addressed.
Received: 4 November 2019 / Accepted: 29 December 2019 / Published: 1 January 2020
Abstract
:With the growing popularity of cloud computing, it is convenient for data owners to outsource their data to a cloud server. By utilizing the massive storage and computational resources in cloud, data owners can also provide a platform for users to make query requests. However, due to the privacy concerns, sensitive data should be encrypted before outsourcing. In this work, a novel privacy preserving Knearest neighbor (KNN) search scheme over the encrypted outsourced cloud dataset is proposed. The problem is about letting the cloud server find K nearest points with respect to an encrypted query on the encrypted dataset, which was outsourced by data owners, and return the searched results to the querying user. Comparing with other existing methods, our approach leverages the resources of the cloud more by shifting most of the required computational loads, from data owners and query users, to the cloud server. In addition, there is no need for data owners to share their secret key with others. In a nutshell, in the proposed scheme, data points and user queries are encrypted attributewise and the entire search algorithm is performed in the encrypted domain; therefore, our approach not only preserves the data privacy and query privacy but also hides the data access pattern from the cloud server. Moreover, by using a tree structure, the proposed scheme could accomplish query requests in subliner time, according to our performance analysis. Finally, experimental results demonstrate the practicability and the efficiency of our method.
Keywords:
KNN; privacy preserving; cloud; encryption; security1. Introduction
By outsourcing data and/or tasks to the cloud, even devices with low computational ability can conduct analytic works with a large amount of data. Otherwise, data owners need to build a data warehouse for hosting their data so that other users can make queries on it, for further researches or services. Cloud server does benefit to the above application scenarios; however, it is wellknown that security issues are arisen if sensitive data are involved. For example, the medical records have not yet been stored on the cloud platform because the records may contain sensitive information which needs specific privacy protection before being released to public. A compromised cloud server might expose the medical dataset outsourced from medical data owners or infringe upon patients’ privacy by leaking out the associated symptoms or diagnosis results.
For the protection of sensitive data, data owners usually encrypt their data before outsourcing them to the cloud. Moreover, to make sure none of the others can retrieve the private data, no one except the data owner themselves can access the decryption key. At the same time, other users who want to access the data can make query requests to the cloud. Interestingly, sometimes these users also want to preserve their query privacy from data owners and the cloud server. Therefore, the query would be encrypted as well. The abovementioned security issues can be solved if we can process the queries directly over the encrypted dataset and make sure that the whole scheme works as a searchable privacy preserving data storage system.
In this work, we focus on solving the Knearest neighbor (KNN) search problem over the encrypted dataset in the cloud. The KNN search is a basic method widely used for classification and regression in pattern recognition or data mining areas. The KNN search algorithm takes a dataset and a query as input and outputs K data points closest to the query. Both data points in the dataset and the query can be multidimensional data and the measurement of the distance between them is the norm defined in a specific space, usually the Euclidean space. In the proposed scheme, data points are encrypted attributewise and so is the query. The entire search algorithm is performed in the encrypted domain and the user obtains the unencrypted results relative to his query. Apparently, the user has to be trusted by data owners since they can obtain a small portion of the plaintext data, eventually.
There are several works proposed to solve the predescribed privacypreserving KNN (PPkNN) search problem, but all of them have certain limitations, such as the need of sharing the data owner’s decryption key to others, adding additional storage to the user’s end, using two noncolluding servers, and refining the resultant data by the query user, etc. In this work, we try to overcome those limitations and find a new approach which meets the needs of privacy preserving and searchability simultaneously. Moreover, unlike most of existing methods that search the KNN results linearly through the entire dataset, our approach leverages a tree structure, named Rtree [1], to store the data, making the search complexity faster than the linear ones. In short, the contributions of our work can be summarized as follows:
 We propose a PPkNN search scheme based on the new encryption method described above, which can perform comparison and addition operations in the encrypted domain, and use an Rtree structure to achieve sublinear search complexity.
 Benefiting from the new encryption method, the construction of an Rtree can also be done by the cloud server on the encrypted domain, which can reduce the computational overhead of data owners and enhance the query user’s security by preventing data owners from knowing the data access pattern.
 We address and overcome the limitations of related works and make a comparison of our work with them.
 Experimental results over synthetic datasets show that our method is practical and efficient. Moreover, the workloads on user ends are very lightweight.
The rest of this paper is organized as follows. In Section 2, we survey some related works and discuss their limitations. The preliminary backgrounds to understand our scheme are introduced in Section 3 and the proposed work is addressed in Section 4. The security guarantees, such as the ideal security definition, the data privacy, and the query privacy, are analyzed in Section 5. In Section 6, we evaluate the performances of our work by complexity analyses and demonstrate them by the experimental results. Finally, Section 7 concludes this work.
2. Related Works
2.1. A Brief Survey of the Access Control Mechanisms
As mentioned by one of the anonymized reviewers, cloud computing provides the on demand and scalable services, therefore, the corresponding environment is highly dynamic. In addition to seeking for the assistance of cryptographic techniques, applying the access control mechanism is another fundamental and important choice that can meet the security requirements in the cloud environment. The main goal of applying access control mechanism is to restrict the users from performing any unauthorized activity to protect the sensitive information.
As summarized in [4], there are wide variety of models, methods, and policies proposed for designing an access control system. Each access control system has its own attributes, methods, and functions based on set of policies. For example, the Mandatory Access Control model (MAC) [5] is an access control policy in which a subject or request initiator can perform some sort of operation on a particular object or resource. When a subject or user attempts to access an object or resources, an authorization rule is enforced to determine whether the access can take place by examining the security attributes. Discretionary Access Control model (DAC) [6] is another wellknown access control policy which determines the owner of an object. The owner decides who is allowed to access the service based on users’ identities. Some researchers analyze the dynamic requirements for cloud environment and introduce the Role Based Access Control model (RBAC) [7] to the cloud environment. RBAC is an access control policy determined by the systems rather than by the owner itself. RBAC model can only be applied within a closed network and it is based on identification. It only checks the user identity and roles assigned to users and based on this role it checks if the user is authorized or not.
However, as commented also in [4], both MAC and DAC can only be applied to specific environments like operating and database management systems, while RBAC model fails to check the malicious activity done by the authorized users. In facing of the abovementioned shortages, the trustbased mechanisms (TBMs) [8] and the contextaware RBAC (CARBAC) [9,10,11,12], which well fit today’s dynamic environments provided by cloud and fog computing, are arisen. Both TBMs and CARBAC do effectively secure the sensitive information by authorizing legal users and protecting the cloud resources from the malicious activities. However, the focus of this work is to find cryptographybased security solution for Cloudbased KNN search problem, and a complete survey of TBMs and ACRBAC cannot be provided within such a limited space. Interested readers can find the corresponding details in the listed references and the references therein.
2.2. A General Survey of PrivacyPreserving KNN Problem
As mentioned in Section 1, many works have been proposed to solve the secure KNN search problem over the encrypted outsourced dataset in the cloud. In [13], a PPkNN approach was proposed to provide the privacy of dataset, input query, kNN result, and data access pattern, but it is vulnerable to collusion attacks. In other words, it assumed that there is neither collusion among cloud servers nor collusion between any data owner and cloud server. In [14], a privacy preserving medical diagnosis system using Ehealth was proposed, in which the cloud server worked in a privacy preserving manner even though the medical datasets are owned by multiple data owners. As a building block of the proposed diagnosis system, the authors designed a deterministic KNN based privacy preserving protocol for finding the k data with the highest similarity to a queried symptom, which reduces the average running time by 35% compared to that of a previous probabilisticbehaved work [15]. Basically, this PPkNN protocol is constructed on the basis of multiparty computation (MPC) based on secret sharing, and therefore works in a distributed manner without any trusted server. A survey about existing works related to PPkNN can be found in [14]. Similarly, in [16], the authors focused on solving the clustering problem over encrypted cloud data. In particular, they proposed a privacypreserving kmeans clustering technology over encrypted multidimensional cloud data by leveraging the scalarproductpreserving encryption primitive, called PPKmeans. The proposed technique is able to achieve efficient multidimensional data clustering as well as preserve the confidentiality of the outsourced cloud data. The authors claimed that their work is the first one to explore the privacypreserving multidimensional data clustering in the cloud computing environment. Extensive experiments in simulation datasets and reallife datasets demonstrate that the proposed PPKmeans is secure, efficient, and practical.
2.3. A Survey of the Most Related Works
In this subsection, we briefly describe the most related works and their corresponding characteristics such as the scheme structure, the main idea of searchable encryption, the security definition and the corresponding performances.
Wong et al. [17] presented an asymmetric scalarproduct preserving encryption (ASPE) scheme to perform the secure KNN computation on encrypted databases. The word asymmetric indicates the fact that the data owner and the query user perform their encryption in different ways. The scalarproduct between encrypted query and data points is preserved and the encryption equations built by ASPE allow the user to find out the K closest points among all the encrypted data. The data owner is forced to share the decryption key with users so that they can recover the encrypted results after receiving them. We consider that as a risk of revealing more unexpected private information to users. In addition, the method simply uses linear scan to find the results, but the requirement of complexity we consider is a sublinear one.
Hu et al. [18] discussed the general problem of query processing over untrusted data cloud including KNN query and range query. Unlike most of the other methods which simply outsourced the data to cloud server, they restructured their data based on an Rtree and sent the associated indexes to users. The query processing procedure is executed by the cloud server, who has the decryption key, and the user. They leverage a privacy homomorphism encryption method to make the secure version of KNN best first search (BFS) algorithm over Rtree executable. However, most of the computational cost associated with the processing procedure is endured at the user end which is against the original intention of using the cloud. Moreover, the indexes of Rtree need to be stored which consumes additional memories from the user. Furthermore, the scheme was proved not secure under probing attack [19]. After sending an adequate number of query requests, the user is able to recover the plaintext owned by the data owner.
Some security issues of [17,18] have been reported by Yao et al. in their work [19]. They revisited the above two methods and proved that they are insecure under chosen plaintext attack by introducing a new attack model. Another secure nearest neighbor finding scheme using a secure Voronoi Diagram (SVD) was presented in [19]. The SVD scheme induces a partition of the dataset and the data owner encrypts those partitions and sends them to the server with their identifiers. The query user finds the nearest neighbor of his query by asking server to return the partitions relative to the results. However, the abovementioned limitations such as the need of sharing decryption key with users and the additional storage consumption (to store the descriptions of partitioned data) still exist in this scheme. Users also need to provide extra computational resources to retrieve the accurate results among the returned partitions. Worst of all, the construction of SVD on data with higher than two dimensions needs large computational cost.
Considering the security level more strictly, Elmehdwi et al. [20] proposed a fully secure KNN search protocol over encrypted datasets in outsourced environment. They introduced several basic security protocols, e.g., secure multiplication, secure squared Euclidean distance, secure minimum, and secure bitor with Paillier cryptosystem to build the whole secure scheme, which not only preserve the data privacy and query privacy but also hide the data access pattern from the cloud server. Unfortunately, the protocols are based on the setup of two noncolluding cloud servers and the corresponding complicated protocols between the two servers leads to a long query response time. Those characteristics are regard as the disadvantages of this fully secure scheme which may handicap its value in practical usage. Our approach tries to relax the security guarantee of this approach by preventing the data access pattern revealed to server into considerations so as to enhance the practicability.
Wang et al. [21] proposed a practical method using the ideal secure order preserving encryption and the Rtree structure to solve the similar problem on largescale data. In their work, the search procedure includes two interaction communications between the user and the server. In the first round of interaction, the server narrows down the candidate results by identifying which minimum bounding rectangle (MBR) contains the query and sends the points within MBR back to the user. Then the user creates a search box according to the nearest point in the returned set and sends it to the server, and the server outputs all points in that box as the resulting set. One can see that this scheme, the same as [19], does not return accurate results, and extra distance calculations must be operated by the user. The problem structure of this work is also different from the ordinary ones, for example, in [21] a user plays the roles of both the data owner and the query user. That is, if we want to allow a thirdparty user to make a query request, the data owner needs to share out the decryption key, after all.
In addition, a few new works have been proposed to solve the same problem, but most of them did not meet our needs as well. Those works either need two noncolluding servers as modelassumption [22] or only return the indexes of search results [23,24]; moreover, decryption key shared from the data owner is a must, if the user wants to obtain the unencrypted results.
In summary, plenty of privacy preserving KNN search schemes have been proposed but all of them have certain shortages and put up barriers on their usage in practice. On the other hand, our approach achieves the relaxed security level, which preserves the data privacy of the data owner and the query privacy of the query user but reveals the data access pattern to the cloud server. Notice that the leakage of data access pattern is not an issue when the stored data has been encrypted. In other words, without the decryption key, no damage occurs even if the server knows what ciphertext the user has accessed. More importantly, the limitations of the other works have been released somewhat by our scheme. We summarize the characteristics of the proposed scheme as follows and also show the comparisons with some related works in Table 1:
 The search complexity of our scheme is faster than linear.
 The data owner does not have to share the decryption key to the cloud server or other query users.
 Our approach does not need extra local storage at users’ end and consumes extremely small computational resources.
 The cloud server returns neither the approximate results nor the indexes of resultant data points, but the accurate KNN values.
 Not only suitable for two dimensional spatial tuples, our approach can be extended to high dimensional dataset, directly.
 Comparing with some methods in which two noncolluding servers are required, one honestbutcurious server is good enough to our scheme.
3. Preliminaries
For the ease of explanation, the three most important components of the proposed scheme: Paillier Cryptosystem, Orderpreserving Encryption, and Rtree will be briefly reviewed in this section.
3.1. Paillier Cryptosystem
Paillier cryptosystem is a probabilistic public key cryptographic system proposed by Pascal Paillier in 1999 [3]. Since it is a secure and efficient cryptography along with the additive homomorphic property, it has been utilized in many existing applications such as Evoting. The Paillier encryption scheme, denoted as $(\mathrm{KeyGen},\text{}\mathrm{Enc},$ $\mathrm{Dec}{)}_{paillier}$, is defined as follows:
a. ${\mathrm{KeyGen}}_{paillier}$
The key generation procedure is responsible for generating a pair of keys $\left(pk,\text{}sk\right)$. Public key $pk$ is for data encryption and available to all public users in the system. Secret key $sk$ is for decryption and only the privilege member, e.g., official manager has the right to access it. A simplified variant of generation steps [25] can be described as follows:
 (1)
 Randomly select two large primes $p$ and $q$, such that $\mathrm{pq}$ and $\left(p1\right)\left(q1\right)$ are relatively prime, i.e., $\mathrm{gcd}\left(pq,\left(p1\right)\left(q1\right)\right)=1$.
 (2)
 Compute $n=pq$ and $\lambda =\left(p1\right)\left(q1\right)$.
 (3)
 Compute $g=n+1$ and $\mu ={\lambda}^{1}\mathrm{mod}\text{}n$.
 (4)
 Public key $pk$ is $\left(n,g\right)$ and secret key $sk$ is $\left(\lambda ,\mu \right)$.
b. ${\mathrm{Enc}}_{paillier}$
The encryption procedure takes the plaintext message $m\in {\mathbb{Z}}_{n}$ and $pk$ as input, while the output is the encrypted ciphertext $c$. Notation ${\mathrm{E}}_{pk}(\xb7)$ is used to represent the encryption procedure with the encryption key $pk$. The following two steps complete the corresponding encryption process.
 (1)
 Select a random number $r\in {\mathbb{Z}}_{n}$
 (2)
 $\mathrm{c}={\mathrm{E}}_{pk}\left(\mathrm{m}\right)={g}^{m}\xb7{r}^{n}\text{}\mathrm{mod}{n}^{2}$
c. ${\mathrm{Dec}}_{paillier}$
The decryption procedure takes the ciphertext $c\in {\mathbb{Z}}_{{n}^{2}}^{\ast}$ and $sk$ as input, while the output is the decrypted plaintext message $m$. Notation ${\mathrm{D}}_{sk}(\xb7)$ is used to represent the decryption procedure with the decryption key $sk$. The decryption procedure can be represented by the following formulas.
where function $\mathrm{L}\left(\mathrm{x}\right)=\frac{x1}{n}$.
$$m={\text{}\mathrm{D}}_{sk}\left(c\right)=\mathrm{L}\left({c}^{\lambda}\mathrm{mod}{n}^{2}\right)\xb7\mu \mathrm{mod}n$$
The most wellknown feature of Paillier cryptosystem is its additive homomorphic property, which enables users to do additive operation directly in the encrypted domain without decrypting the ciphertext first. The additive homomorphic property of Paillier cryptosystem, from the decryption point of view, includes:
 (1)
 The decryption of the product of two ciphertexts ${c}_{1}$ and ${c}_{2}$ can be realized by the addition of the two corresponding plaintexts ${m}_{1}$ and ${m}_{2}$ directly, that is ${\mathrm{D}}_{sk}({c}_{1}\xb7{c}_{2})={\mathrm{D}}_{sk}\left({\mathrm{E}}_{pk}\left({m}_{1}\right)\xb7{\mathrm{E}}_{pk}\left({m}_{2}\right)\mathrm{mod}{n}^{2}\right)={\mathrm{E}}_{pk}\left({m}_{1}+{m}_{2}\mathrm{mod}n\right)$.
 (2)
 The decryption of a ciphertext ${c}_{1}$ raised to the power of a plaintext ${m}_{2}$ can be realized by computing the product of the corresponding plaintexts ${m}_{1}$ and ${m}_{2}$ directly, that is ${\mathrm{D}}_{sk}\left({c}_{1}{}^{{m}_{2}}\right)={\mathrm{D}}_{sk}\left({\mathrm{E}}_{pk}{\left({m}_{1}\right)}^{{m}_{2}}\mathrm{mod}{n}^{2}\right)$ = ${m}_{1}\xb7{m}_{2}\mathrm{mod}n$.
3.2. OrderPreserving Encryption
Orderpreserving encryption or orderpreserving encoding (OPE) [26] is a special kind of encryption scheme in which the order relationship between plaintext messages is preserved after the encryption is done. The security of an ideal OPE should be defined with indistinguishability under ordered chosen plaintext attacks (INDOCPA) [27]. That means the ciphertexts of an OPE should reveal nothing more but the ordering of the plaintexts. Popa et al. [2] presented the first idealsecurity protocol for OPE. They showed that the mutability of some ciphertexts is required for idealsecurity OPE even if the encryption model is already stateful and interactive. The proposed mutable order preserving encoding (mOPE) protocol is conducted between a client (or data owner) and an OPE server. A basic mOPE protocol scheme, denoted as $(\mathrm{KeyGen},\mathrm{InitState},\mathrm{Enc},$ $\mathrm{Dec},\mathrm{Order}{)}_{mOPE}$, can be understood as follows:
a. ${\mathrm{KeyGen}}_{mOPE}$
The client takes a security parameter $\mathsf{\kappa}$ as input and generates the secrete key $sk$ by the key generation module of any symmetric deterministic encryption scheme ($\mathrm{SDES}$), which obeys the pseudo random function security property [28].
b. ${\mathrm{InitState}}_{mOPE}$
The server initializes its state $st$ by creating a binary search tree, or a $b$ary Btree, as OPE tree and a table as OPE table. The nodes on an OPE tree after encryption contain the ciphertexts and the values of the decrypted ciphertexts in the tree’s left subtree nodes are smaller than those in the right subtree ones. The OPE table records a mapping from each ciphertext to its corresponding OPE value.
c. ${\mathrm{Enc}}_{mOPE}$
The encryption algorithm of mOPE is run interactively by a client and a server. The client takes $sk$ as input to encrypt the original plaintext $m$. After running the algorithm, the server state $st$ is updated. The full encryption scheme can be addressed as follows:
 (1)
 Client: Encrypt $m$ by $\mathrm{SDES}$ and send the ciphertext $c$ to the server.
 (2)
 Server: If OPE table already contains $c$, do nothing.
 (3)
 Client $\leftrightarrow $ Server: Otherwise the client helps the server insert $c$ to OPE tree by starting a traversal from the root of OPE tree:
 (a)
 The server sends back the ciphertext ${c}^{\prime}$ to the client when it encounters a node.
 (b)
 The client decrypts ${c}^{\prime}$ to ${m}^{\prime}$. If ${m}^{\prime}>m$, the client tells the server to go left, otherwise go right.
 (c)
 Repeat steps a and b until the traversal goes to an empty leaf node and then puts $c$ in there.
 (d)
 Rebalance OPE tree if needed and update the OPE table.
A ciphertext value in OPE table is computed according to a path from the root to the node and the position of the ciphertext in that node is obtained based on Equation (2):
where $\left[\mathrm{path}\right]$ is the concatenation of the binary strings of path pointer indexes from root to the node, where $\left[\mathrm{pos}\right]$ is the binary string of one plus position index of the ciphertext in that node. The total length of both of them depends on $b$, which is the maximum ciphertext number in a node. Figure 1 shows an example of the predescribed encryption algorithm. The texts in each node represent a plaintextciphertext pair. The blue strings represent the path string from the root to the target node. The red strings represent the ciphertext positions in that node.
$$\mathrm{OPE}\text{}\mathrm{value}=\left[\mathrm{path}\right]\left[\mathrm{pos}\right]0\dots 0,$$
Suppose that there are five integers: 12, 10, 18, 27, and 14 to be encrypted by the client. The state of the 3ary Btree and the content of OPE table after the first four insertions are on the left of Figure 1. After the insertion of the fifth number 14, the Btree on the right rebalances its structure and the server updates the OPE table. Take the number 27 as an example. Before the insertion, its path index from the root was $1$ so $\left[\mathrm{path}\right]=\text{\u201c}01\text{\u201d}$ and one plus its position index becomes $1+1=2$ so $\left[\mathrm{pos}\right]=\text{\u201c}10\text{\u201d}$. After the encryption of number 14, the path index of number 27 becomes 2 so $\left[\mathrm{path}\right]=\text{\u201c}10\text{\u201d}$ and one plus the position index becomes $1+0=1$ so $\left[\mathrm{pos}\right]=\text{\u201c}01\text{\u201d}$. The OPE value of number 27 changed from 6 to 9 but the relative ordering did not change. The padding zeros at the end of Equation (2) are used to let all binary strings of OPE values have the same length.
d. ${\mathrm{Dec}}_{mOPE}$
The decryption of mOPE is just the same as the decryption of $\mathrm{SDES}$. There are no additional operations needed for calculating OPE value or OPE binary string.
e. ${\mathrm{Order}}_{mOPE}$
Since OPE table stores the ordered OPE values for each ciphertext, by taking ciphertexts as input, we can compare which corresponding plaintext is larger by using the ordering function $\mathrm{Ord}(\xb7)$. Take numbers 10 and 14 in Figure 1 as an example, since $10<14$, $\mathrm{Ord}\left(\text{\u201d}\mathrm{x}2\mathrm{b}017\text{\u201d}\right)=1<\mathrm{Ord}\left(\text{\u201d}\mathrm{x}6481\mathrm{d}\text{\u201d}\right)=6$.
3.3. RTree
Rtree is a tree data structure proposed by Antonin Guttman in 1984 [1]. It is widely used in many multidimensional data management tasks, such as $K$NN or geometric search. Each node in an Rtree represents an MBR of all its children MBRs. At the leaf level, each MBR contains a specific number of spatial data points or data objects as its children. Rtree is also a balanced tree, like Btree, so the average time complexity for searching a data point on it is $\mathrm{O}\left({\mathrm{log}}_{M}N\right)$, where $M$ is the maximum number of children in each node and $N$ is the total number of data points. Figure 2 shows an example of a twodimensional Rtree, in which $M=3$ and $N=10$.
3.3.1. BulkLoading
Bulkloading is a kind of construction method for Rtree which loads several data points to an Rtree at once. Apart from the normal construction method, where the data points are inserted one by one, bulkloading provides a more efficient construction but needs to know the whole data points beforehand. The Rtree constructed by bulkloading method also has better query performance since the overlap between MBRs can be reduced.
Leutenegger et al. [29] proposed an efficient Rtree bulkloading algorithm in 1997. The leafMBRs of an Rtree constructed by this algorithm do not overlap at all and the rest MBRs in higher levels overlap just a little. The following is the construction algorithm for $N$ $D$dimensional data points, where $M$ is the maximum children number in a node:
 (1)
 Set $d=D$; if $N<M$, simply create the rootMBR and end the algorithm.
 (2)
 Calculate the number of leafMBR pages, that is $P=\lceil \frac{N}{M}\rceil $, where $\lceil x\rceil $ denotes the ceiling function.
 (3)
 Sort the data points according to the $d$th coordinate and divide them into $S=\lceil {P}^{\frac{1}{d}}\rceil $ slices. Each slice has $M\xb7\lceil {P}^{\frac{d1}{d}}\rceil $ data points.
 (4)
 Recursively process each slice, by repeating steps 2 and 3, where $N$ is the current data number in this slice, and $d=d1$ until each slice contains only $M$ data points.
 (5)
 Create an MBR for each slice with pointers pointing to all data points in that slice.
 (6)
 Treat MBRs created in step 5 as data points and create higher level MBRs based on step 1.
3.3.2. $K$NN Search over RTree
The structure of an Rtree can improve the performance of a $K$NN search. By traversing a part of MBRs and the data points in them, there is no need to compute the distance between the query point and the whole data points. The idea of the best first search (BFS), proposed in [20] for $K$NN search over Rtree, is to access the nearest MBRs or data points all the time by using a priority queue. The input of the algorithm is an Rtree, $Rt$, and a query point $\mathit{q}$, and the output will be $\mathit{R}$, the set of $K$ data points nearest to $\mathit{q}$. The algorithm can be understood as follows:
 (1)
 Initialize a priority queue $Q$ and the result set $\mathit{R}$.
 (2)
 Enqueue the root of $Rt$ into $Q$.
 (3)
 While $Q$ is not empty
 (a)
 Dequeue element $\mathit{e}$ from $Q$.
 (b)
 If $\mathit{e}$ is a data point, put it in $\mathit{R}$. If $\left\mathit{R}\right=K$, return $\mathit{R}$.
 (c)
 Otherwise
 Compute the distance from $\mathit{q}$ to each of the children of $\mathit{e}$.
 Enqueue all children of $\mathit{e}$ into $Q$ using their distance to $\mathit{q}$ as priority.
At the beginning, we enqueue the root of an Rtree to the priority queue. For every element dequeued from the priority queue, we check if it is a data point. If so, add it into the resulting set or recursively enqueue its children to the priority queue if it is still an MBR. The algorithm ends if the resulting set has $K$ elements. Since the priority used in the priority queue is the distance to query point, we can always access to the best candidate containing the resultant points.
4. The Proposed Scheme
4.1. Overview
In this section, the structure of our proposed scheme and some notations will be described first, and the details of the scheme will be addressed in the remaining paragraphs. There are three characters (or players) involved in the proposed scheme, the data owner ($DO$) who owns the original data, the cloud server ($CS$) who provides the storage for data and shares the computation loads of $K$NN search, and the query user ($QU$) who makes the $K$NN search query request to $CS$, and obtains the final results. In order to preserve the privacy of the data, $DO$ encrypts his data before outsourcing them to $CS$. On the other hand, $QU$ wants to protect his own query privacy from revealing it to $CS$ and $DO$, so he also encrypts the query. We assume that $CS$ mentioned here is an honestbutcurious third party, i.e., it will follow the protocol to maintain its business credit but will try to know the data or query as much as it can during the execution of procedures. We leverage Paillier cryptosystem as the encryption method for $DO$ and $QU$ so that both of them can perform encryption with the same public key while the privilege of decryption retained to $DO$. The homomorphic properties of Paillier will help us shift some operations to $CS$ on the encrypted domain. At the same time, the ordering between the encrypted data should be preserved if we want to outsource the construction of Rtree and the runningjob of BFS algorithm as well. After executing BFS algorithm, the results returned to $QU$ should be decrypted but without revealing them to $DO$; otherwise the approximate value of query will be known by $DO$. $CS$ scrambles the results before sending them to $DO$ for decryption and helps $QC$ restore them, and then obtains the real data results. The structure of the proposed scheme and the detail steps of the proposed protocol are presented in Section 4.4.
In the next section, we describe the main encryption method used in the proposed scheme. Some security data management building blocks, used by server for protecting query privacy, are introduced in Section 4.3 and a newly proposed privacy preserving $K$NN search method is described in Section 4.4.
4.2. Encryption Method
As mentioned above, the encryption method in the proposed protocol is a combination of Paillier cryptosystem and OPE. By utilizing both of their properties, $CS$ is able to do additive operations without asking $DO$ to decrypt the ciphertexts first and checking the order between them. First of all, we try to replace $SDES$ in mOPE by Paillier cryptosystem. There are plenty of issues to be considered because Paillier is neither a symmetric nor a deterministic cryptosystem. In an mOPE, the order relation is preserved by OPE tree whose construction needs decryption. If we change the cryptosystem from symmetric to asymmetric ones, the whole members in that system are able to encrypt data using the public key but cannot obtain the order unless the accessibility to secret key is provided. Therefore, no matter how the other members encrypt their own new data by the public key, they are unable to find out the original data of $DO$ through probing attack. More security analyses will be presented in Section 5. On the other hand, the original mOPE made use of a deterministic algorithm so the ciphertexts will be the same if there are two equal plaintexts. mOPE protocol can easily come to an end if a ciphertext already existed in OPE table. If we change $SDES$ to Paillier, two equal plaintexts will be encrypted into two different ciphertexts and their OPE values will also be different. It is no big deal here, after we described the whole scheme, because the corresponding OPE values still adjacent to each other in ascending or descending order and the equal ciphertexts are always decrypted to the same number. The homomorphic properties of Paillier also need to be considered in this combination. The situation that some system members, without having secret key, may create new ciphertexts with the aid of homomorphic addition is similar to that of the predescribed symmetric/asymmetric situation. The OPE value of the new ciphertext obtained from homomorphic addition will not be the addition of two OPE values. In order to obtain the real one, one has to access to secret key from $DO$, first. The full encryption scheme combining Paillier and mOPE is denoted as $(KeyGen,InitState,Enc,SmOPE,Dec,$ $Order)$ in the rest of this paper.
a. $\mathrm{KeyGen}$
The same as ${\mathrm{KeyGen}}_{paillier}$, $DO$ generates public key $pk$ for $CS$ and $QU$ for encryption and keeps the secret key $sk$ to himself for conducting the decryption job.
b. $\mathrm{InitState}$
Similar to ${\mathrm{InitState}}_{mOPE}$, $CS$ initializes an OPE tree and an OPE table before $DO$ encrypts any data. A node in OPE tree contains the ciphertexts of Paillier and OPE table gives the map between the ciphertexts of Paillier to the values of OPE table.
c. $\mathrm{Enc}$
The same as ${\mathrm{Enc}}_{paillier}$, the one with $pk$ encrypts a plaintext to a ciphertext by Paillier cryptography.
d. $\mathrm{SmOPE}$
$\mathrm{SmOPE}$ stands for the secure mOPE. First of all, we replace the encryption of $\mathrm{SDES}$ by taking the ciphertext from $\mathrm{Enc}$ in the first step of ${\mathrm{Enc}}_{mOPE}$. Secondly, we do not need to check whether the ciphertext is already in OPE table (in step 2) or not because Paillier is a nondeterministic algorithm. The rest of the protocols will be described, as a security building blocks, in Section 4.3.
e. $\mathrm{Dec}$
$DO$ is the only one who owns $sk$ in this scheme and the decryption of a ciphertext is the same as that of ${\mathrm{Dec}}_{paillier}$.
f. $\mathrm{Order}$
We can obtain the ordering between ciphertexts of Paillier because of the functionality of $\mathrm{mOPE}$. However, the difference apart from ${\mathrm{Order}}_{mOPE}$ is that if ${m}_{1}$ and ${m}_{2}$ are two plaintexts and ${m}_{1}={m}_{2}$, there will be two different ciphertexts ${c}_{1}$ and ${c}_{2}$, where ${c}_{1}\ne {c}_{2}$ and $\mathrm{Ord}\left({c}_{2}\right)<\mathrm{Ord}\left({c}_{1}\right)$ if ${c}_{1}$ is encrypted before ${c}_{2}$.
4.3. Security Building Blocks
In our scheme, only $DO$ has the right to use $sk$ to decrypt data. This concern is about not letting anybody except $DO$ himself have the authority to obtain the original data. $DO$ will be more willing to provide data to $QU$ through $CS$ if he has such a security guarantee. On the other hand, $QU$ wants to access the data from $DO$ by sending queries without revealing his own personal query to any other people. However, the query of $QU$, even encrypted by the scheme described in Section 4.2, can easily be revealed to $DO$ through conducting $\mathrm{Dec}$. Thus, $CS$ plays an important role here to meet the needs of both. Some security building blocks are needed to be used by $CS$ such that $DO$ does not need to share his secret key with others and $QU$ can keep his query privacy after decryption at the same time.
4.3.1. RTree Construction on the Encrypted Domain
In existing methods, which leveraged Rtree data structure to speed up their $K$NN search, the construction steps are done by data owners in the plaintext domain. After data owners constructed the tree, they encrypted the plaintext values at the tree nodes and left the pointers between parents and children unencrypted and sent the tree structure to the server. Though the server would not obtain the real plaintext value in each node, it obviously could know the data access pattern of the query since the pointers are unencrypted. Therefore, if we use similar construction procedure to the existing methods, the data access pattern will be revealed to $DO$ when running BFS algorithm. That is unacceptable because $DO$ is the one who built the tree in this situation. Once he obtains the access pattern, he can find out nearby data of the query and thus obtains an approximation of the personal query value of $QU$.
To make sure that the abovementioned problem will not happen, we decided to shift the construction steps of Rtree from $DO$ to $CS$. By doing this, we can hide the access pattern from $DO$ for one thing and share the computation cost of construction for another. As we can see, the Rtree construction method introduced in the previous section, only needs to sort along with different dimensions and a few computations about the total data number $N$, dimension $D$, and the maximum number of children $M$ in a node. The sorting procedure can be successfully done in the encrypted domain by $\mathrm{Ord}(.)$. Parameters $N$ and $D$ should be known by $CS$ even if the data outsourced by $DO$ are encrypted. In our method, we will let $CS$ decide which dimension the sorting procedure will be followed up in addition to parameter $M$ as a randomness of construction. The altered algorithm of Rtree construction on the encrypted domain, denoted as $\mathrm{RtConstruct}$, is shown in Algorithm 1.
Algorithm 1.$\mathrm{RtConstruct}$ 
Input: $N$ $D$dimensional encrypted data points Output: Encrypted Rtree root pointer $Rt$
Treat MBRs created in step 6 as data points and create higher level MBRs based on step 2. 
By adding two parameters decided by $CS$ and the function $\mathrm{Ord}(.)$ to sort the ciphertexts, $DO$ cannot directly know which MBRs or data points are nearby $QU$’s query during the execution of BFS algorithm. The data access pattern is revealed to $CS$, which is the same as the construction process of existing methods; but it does not matter since $CS$ cannot perform the decryption to find the plaintext value out.
4.3.2. Secure Compare and Secure mOPE
$CS$ keeps the structure of Rtree in order to prevent $DO$ from finding out the query through data access pattern. However, $DO$ already has a chance to know it before searching if we simply follow the original mOPE protocol. The problem is that $\mathrm{mOPE}$ requires predecryption by $DO$ and that will directly reveal the query value, which is unacceptable. We need a secure protocol that can make the procedure $\mathrm{mOPE}$ more robust. $DO$ should know nothing about query, including its plaintext value or the ordering to any of his data point. We leverage a secure comparison protocol, $\mathrm{SCompare}$ as shown in Algorithm 2, as a building block and propose the secure version of $\mathrm{mOPE}$, denoted as $\mathrm{SmOPE}$, shown in Algorithm 3. We also let $CS$ decide the order of data points input to $\mathrm{SmOPE}$ once he obtains all the outsourced data. As a result, $DO$ does not obtain the structure details of OPE tree (just like an Rtree); thus, he truly has no idea what the query value is.
Algorithm 2.$\mathrm{SCompare}$ 
Input: two ciphertexts ${\mathrm{E}}_{pk}\left(a\right)$ and ${\mathrm{E}}_{pk}\left(b\right)$ Output: $a>b?$ $CS:$
$DO:$
$CS:$

Algorithm 3.$\mathrm{SmOPE}$ 
Input: ciphertext $c$ Output: the newest state $st$ of OPE table $CS:$
$CS\leftrightarrow DO:$
$CS:$

4.3.3. Secure Square Euclidean Distance
The last security building block introduced in this section is the secure square Euclidean distance protocol, $\mathrm{SSED}$. This protocol allows $CS$ and $DO$ to compute the Euclidean distance between two data points together without revealing the plaintext distance value to both of them. It comes up from the security protocols proposed in [20]. In this protocol, first, the distance between two encrypted data points is calculated through homomorphic subtraction. Each dimension of that difference vector is then used to compute the square of the difference vector through a secure multiplication protocol. Finally, the ciphertext of the square Euclidean distance is computed by homomorphically adding all the components of the square difference vector. We do not need to further compute the square root of the previous distance square because it will be used just for finding the priority in BFS algorithm. The secure multiplication, $\mathrm{SM}$, is a multiplication protocol between $CS$ and $DO$ for two given ciphertexts, and it outputs the ciphertext of multiplication of two corresponding plaintexts without letting both of them know any one of the plaintext values. Details of $\mathrm{SM}$ and $\mathrm{SSED}$ are shown in Algorithms 4 and 5, respectively.
Algorithm 4.$\mathrm{SM}$ 
Input: ciphertexts ${\mathrm{E}}_{pk}\left(a\right)$ and ${\mathrm{E}}_{pk}\left(b\right)$ Output: ${\mathrm{E}}_{pk}\left(ab\right)$ $CS:$
$DO:$
$CS:$

Algorithm 5.$\mathrm{SSED}$ 
Input: encrypted data points ${\mathrm{E}}_{pk}\left(\mathit{X}\right)=\left({\mathrm{E}}_{pk}\left({x}_{1}\right),{\mathrm{E}}_{pk}\left({x}_{2}\right),\dots ,{\mathrm{E}}_{pk}\left({x}_{D}\right)\right)$ and ${E}_{pk}\left(\mathit{Y}\right)$ Output: encrypted square Euclidean distance ${\mathrm{E}}_{pk}{(\parallel (\mathit{X}\mathit{Y})\parallel )}^{2}$. $CS:$

4.4. A Newly Proposed Privacy Preserving KNN Search Scheme (PPKSS)
The setup of our scheme starts from the encryption of all data points by $DO$ using $sk$. $DO$ publishes the public key $pk$ of Paillier and then outsources the encrypted data points to $CS$. Upon receiving the encrypted data, $CS$ permutes them and runs $\mathrm{SmOPE}$ one ciphertext by another. The maximum ciphertexts number in the node of an OPE tree is also decided by $CS$ so that $DO$ cannot obtain the structure details of OPE tree. After $CS$ processed the $\mathrm{RtConstruct}$ algorithm, we are ready for responding to any query request from $QU$ now.
$QU$ is able to encrypt his own query any time he wants using $pk,$ broadcasted by $DO,$ and sends the encrypted query to $CS$. In order to prevent $DO$ from keeping himself online all the time (which is impractical), $QU$ needs to notify $DO$ every time he wants to make a query request. Upon receiving the request from $QU$, $DO$ checks if he is a trusted query user and ready for participating the protocols starting from $CS$. The first protocol $CS$ needs to start is $\mathrm{SmOPE}$ for inserting the query to OPE tree and keeping its order relation with other data points in OPE table. Secondly, $CS$ runs the secure BFS algorithm, $\mathrm{SBFS}$ as shown in Algorithm 6, in the encrypted domain by using encrypted distance as priority, calculating the encrypted distance by $\mathrm{SSED}$ and pushing it into priority queue by $\mathrm{SCompare}$. Notice that the distance between the query and an MBR should be the distance from the query to the closest point in that MBR which can be found by using $\mathrm{Ord}\left(\right)$. Figure 3 shows two examples of finding the closest points to queries in an MBR with the aid of the predescribed orderpreserving properties. The output returned by the algorithm is a set of encrypted data points which represents the KNN search results of the query. $CS$ then adds a random number to each dimensional component of each point by homomorphic addition and sends the resultant set to $DO$. $DO$ decrypts the whole resultant set and returns it to $QU$. At the same time, $CS$ would send the plaintexts of the random numbers matrix he adds on the encrypted results to $QU$ to help $QU$ recover the real plaintext results. The steps of the full privacy preserving $K$NN search scheme, $\mathrm{PP}K\mathrm{SS}$, is shown in Algorithm 7 and the corresponding flow diagram is given in Figure 4.
Algorithm 6. SBFS 
Input: encrypted query ${\mathrm{E}}_{pk}\left(\mathit{q}\right)$, Rtree $Rt$ and $K$. Output: encrypted resultant set ${\mathrm{E}}_{pk}\left(\mathit{R}\right)$ $CS\leftrightarrow DO:$

Algorithm 7.$\mathrm{PP}K\mathrm{SS}$ 
Input: $DO:$ generate data points $\mathit{P}=\left\{{\mathit{p}}_{1},{\mathit{p}}_{2},\dots ,{\mathit{p}}_{N}\right\}$, where ${\mathit{p}}_{i}=\left({p}_{1}^{i},{p}_{2}^{i},\dots ,{p}_{D}^{i}\right)$, ${p}_{j}^{i}\in \mathbb{Z}$; $QU:$ make a query $\mathit{q}=\left({q}_{1}^{i},{q}_{2}^{i},\dots ,{q}_{D}^{i}\right))$, ${q}_{j}^{i}\in \mathbb{Z}$ and $K$ Results: $QU:$ receive $K$NN result $\mathit{R}=\left\{{\mathit{r}}_{1},{\mathit{r}}_{2},\dots ,{\mathit{r}}_{K}\right\}$ Setup $DO:$
$CS:$
$CS\leftrightarrow DO:$
$CS:$
Search $QU:$
$CS\leftrightarrow DO:$
$CS:$
$DO:$
$QU:$

5. Security Analysis
5.1. Indistinguishability under Order Chosen Plaintext Attack
The ideal security of OPE is founded on its indistinguishability under order chosen plaintext attack (INDOCPA), defined by Boldyreva et al. [27]. Generally speaking, the ideal OPE method is INDOCPA secure if the encryption procedure leaks nothing more than the ordering between ciphertexts. An INDOCPA security game between DO, CS, and a malicious adversary (MA) can clearly describe the definition. Consider that there are two sequences $\mathit{x}=\left\{{x}_{1},{x}_{2},\dots ,{x}_{l}\right\}$ and $\mathit{y}=\left\{{y}_{1},{y}_{2},\dots ,{y}_{l}\right\}$ sent from an MA to DO, where $l$ is the sequence length and the sequences have the same order relation, that is, ${x}_{i}<{x}_{j}$ if and only if ${y}_{i}<{y}_{j}$ for all $i,j\in {\mathbb{Z}}_{l}^{\ast}$. DO then randomly selects a sequence to encrypt and MA would attack the model by guessing which sequence has been chosen. The definition assumes that MA is able to check every ciphetexts of the chosen sequence and the corresponding state of the server if it is an interactive scheme. We say MA wins, Win_{MA}, if the right sequence randomly selected by DO is guessed and the satisfaction of Equation (3) realizes the INDOCPA security. That is
where Pr[A] denotes the probability of event A and negl(κ) is a negligible function with parameter κ. That means MA can only have a negligible advantage over random guessing if the scheme is INDOCPA secure.
$$\mathrm{Pr}\left[{\mathrm{Win}}_{MA}\right]\le \frac{1}{2}+\mathrm{negl}\left(\mathsf{\kappa}\right),$$
We prove that our modified mOPE combining with Paillier cryptosystem is INDOCPA secure based on their primitive security guarantees. Paillier cryptosystem has been proved to provide semantic security against chosen plaintext attack (INDCPA), so, by definition, it meets a higher security requirement than an OPE scheme. That is, the order relationships of the two sequences sent by MA are not necessary the same, if the scheme is INDCPA secure. Therefore, we can prove the proposed encryption method provides INDOCPA security merely by discussing that the information leaks from CS while running mOPE is indistinguishable between two sequences. We inductively prove it on the basis of the number of encrypted messages. The base case is at the beginning of the procedure, when no message is encrypted yet, CS initializes the server state, which leaks the same original information to MA. We assume that $MA$ receives the same information no matter which sequence has been encrypted and cannot distinguish the results after conducting the $i$th encryption. At the $\left(i+1\right)$th encryption, the sustentation of the indistinguishability completes the proof. Since sequences $\mathit{x}$. and $\mathit{y}$ have the same order relation, the associated OPE trees have the same structure after the $i$th encryption has been conducted. The traversal over the tree and the update of OPE table, at the $\left(i+1\right)$th encryption, are also in the same way as before. Therefore, by observing the server state, $MA$ can obtain nothing but the ordering between the ciphertexts, which is the same on both sequences. By mathematical induction, the procedure of mOPE reveals the same ordering information to $MA$. Additionally, owing to INDCPA security of Paillier, we complete the proof that the encryption scheme can survive the order chosen plaintext attack (INDOCPA).
5.2. Privacy Preserving of Data
We claim that the data privacy of our scheme is preserved if the following two statements can be justified:
 (1)
 $CS$ is prevented from knowing the actual plaintexts of the original data.
 (2)
 $QU$ receives only $K$NN search results in every query request, and has no idea about the rest of the dataset.
Since the data are encrypted by Paillier cryptosystem before outsourcing and $sk$ is held by $DO$ all the time, $CS$ cannot know the plaintexts directly. Therefore, our analyses focus on the interactions within mOPE. Despite assuming an honestbutcurious server, we consider the situation that $CS$ takes wrong ciphertexts while running $\mathrm{SmOPE}$ as launching a probing attack and tries to find the plaintexts of dataset out. $DO$ can prevent this kind of attack by setting a reasonable threshold on the maximum number of $SCompare$ taken by $CS$ to complete the system setup. The exceeding amount of $SCompare$ is regarded as a malicious behavior and $DO$ could shut the comparison service down before the privacy exposes.
The second requirement for preserving the data privacy is achieved much simpler than the first one, since $QU$ has no chance at all to directly contact with the encrypted dataset. The data involved by $K$NN search are only shared by $CS$ and $DO$, so $QU$ has no idea about the dataset except the receiving results. Certainly, one should also setup a threshold for the maximum acceptable value of $K$ to prevent $QU$ from sending an extremely large $K$ value and retrieving the entire dataset.
5.3. Privacy Preserving of Query
Apart from the preserving of data privacy, the proposed scheme preserves query privacy as well. Similarly, two required statements have to be justified to fulfill the privacy preserving of query.
 (1)
 $CS$ is prevented from knowing the actual value of query.
 (2)
 While running the search scheme, $DO$ must have no idea about the query, including the plaintext of it, the order relation with any other point in the dataset and the corresponding $K$NN search result.
The first statement can be justified in a similar way to that of the previous section, i.e., the query can be treated as another new encrypted data point. By executing $\mathrm{SmOPE}$ protocol, the only information leaks to $CS$ is the order relation. On the other hand, since our approach needs $QU$ to make a request to $DO$ at the same time, $CS$ is unable to maliciously pretend as a fake query user and send a large sequence of values to probe for the plaintext data.
The issues of query privacy prevention against $DO$ are more complicated because $DO$ is the one has decryption key $sk$. We design the scheme carefully by bringing in randomness on every part of operations involving the decryption process. First of all, the structure of OPE tree is unknown to $DO$ as the parameters are decided by $CS$. Under the above condition, $\mathrm{SmOPE}$ involves a sequence of comparisons of random values at $DO$’s aspect. The true value of the query and the order relation with data points in the dataset are hidden as a result. Secondly, the structure of Rtree is also unknown to $DO$, similar to the case of OPE tree. Therefore, $DO$ has no choice but treats $\mathrm{SBFS}$ algorithm as an ordinary sequence of multiplications and comparisons. Moreover, the protocols used in $\mathrm{SBFS}$, i.e., $\mathrm{SSED}$ and $\mathrm{SCompare}$, always add random numbers to the ciphertexts, which then will be decrypted by Paillier homomorphic property, so that $DO$ is performing operations on obscured data values. Finally, $DO$ receives a set of encrypted results at the end of PPKSS; however, even if the decryption is performed upon all of them, $DO$ still cannot obtain the $K$NN search result associated with the query eventually, since they have been randomly permuted in advance.
6. Performance Evaluation
The effectiveness of our scheme, PPKSS, is demonstrated in the following two different ways. First, we analyze the complexity of the scheme in the next section. Secondly, we show the experimental results in Section 6.2. The full PPKSS can be separated into different parts and each part will be discussed independently.
6.1. Complexity Analysis
First of all, $DO$ encrypts the dataset attributewise and outsources them to $CS$, which takes $\mathrm{O}\left(N\ast D\right)$ encryptions. This part of tasks can be fully parallelized since the encryption of each data can be done separately. After the outsourcing, $CS$ performs $\mathrm{O}\left(N\ast D\right)$ $\mathrm{SmOPE}$ protocol to preserve the order relation of each ciphertext. We further disassemble $\mathrm{SmOPE}$ protocol and find that it is composed of $\mathrm{O}\left(\mathrm{log}\left(N\ast D\right)\right)$ $\mathrm{SCompare}$ and $\mathrm{O}\left(1\right)$ OPE table update, where $\mathrm{log}\left(N\ast D\right)$ indicates the height of the OPE tree. The computation load of $\mathrm{SCompare}$ is shared by $DO$ and $CS$, where they need $\mathrm{O}\left(1\right)$ Paillier decryption and plaintext comparison and $\mathrm{O}\left(1\right)$ Paillier encryption and ciphertext multiplication, respectively. Additionally, a hash table is used to implement the OPE table, so we claim that the average update and access complexity takes $\mathrm{O}\left(1\right)$ plaintext operations.
After the construction of OPE tree, we leverage the order relation store in OPE table to build the Rtree, $\mathrm{RtConstruct}$, which is done simply based on sorting. The complexity of $\mathrm{RtConstruct}$ is $\mathrm{O}\left(DN\mathrm{log}\left(N\right)\right)$, and it can be derived as follows.
Since $\mathrm{RtConstruct}$ sorts data along with every dimension at each level of an Rtree, the comparison time using quick sort is the multiplication of $D$ and the summation of sorting complexity at each level. That is,
$$D\ast {{\displaystyle \sum}}_{i=0}^{{\mathrm{log}}_{M}N}\frac{N}{{M}^{i}}\mathrm{log}\left(\frac{N}{{M}^{i}}\right)\text{}\phantom{\rule{0ex}{0ex}}=DN((\mathrm{log}\left(N\right)\ast {{\displaystyle \sum}}_{i=0}^{{\mathrm{log}}_{M}N}\frac{1}{{M}^{i}})\left(\mathrm{log}\left(M\right)\ast {{\displaystyle \sum}}_{i=0}^{{\mathrm{log}}_{M}N}\frac{i}{{M}^{i}}\right))\phantom{\rule{0ex}{0ex}}\approx DN(\mathrm{log}\left(N\right)\left(\mathrm{log}\left(M\right)\ast {{\displaystyle \sum}}_{i=0}^{{\mathrm{log}}_{M}N}\frac{i}{{M}^{i}}\right))\text{}(\mathrm{sin}\mathrm{ce}\text{}N\gg M)\phantom{\rule{0ex}{0ex}}\le DN\mathrm{log}\left(N\right)$$
$QU$ enrolls the system after the setup steps were completed. We will discuss why the scheme is claimed to be with lightweight user loads, later. The complexity of query encryption is the same as the complexity of encryption of an individual data point, where the Paillier encryption is done by $QU$. We analyze the complexity of $\mathrm{SBFS}$ based on the original search algorithm over an Rtree in the plaintext domain [29]. The analysis is simplified under the assumption that the data in the dataset are uniformly distributed. Let $\mathit{H}$ be the hypersphere centered at the query $\mathit{q}$, with $r$ (the distance from $\mathit{q}$ to the $K$th result ${\mathit{r}}_{K}$), as radius. The search complexity or the total time $\mathrm{SSED}$ would take is the access number of MBRs from the root of an Rtree to the leafMBRs, which is $\mathrm{O}\left(\mathrm{log}N\right)$, plus the total access number of data points of the leafMBRs (i) inside $H$, which is $\mathrm{O}\left(K\right)$, or (ii) intersected by $\mathit{H}$, which is the most complicated part to analyze and it is derived as follows.
For simplification, another assumption is made, that is, we assume each leafMBR intersected by $\mathit{H}$ forms a hypercube with average occupancy of $c$ data points. Since the data points are assumed to be uniformly distributed, the expected volume of search region is $\frac{K}{N}$ and the expected volume of each leaf MBR is $\frac{c}{N}$. Moreover, the volume of $\mathit{H}$ is proportional to ${\pi}^{\frac{D}{2}}{r}^{D}$, so if ${\pi}^{\frac{D}{2}}{r}^{D}=\frac{K}{N}$, $r=\sqrt[D]{\frac{K}{{\pi}^{\frac{D}{2}}N}}$. Similarly, the side volume of each leafMBR $s=\sqrt[\frac{D}{D1}]{\frac{c}{N}}$. In addition, the number of leafMBRs intersected by $\mathit{H}$ is the same as that intersected by the circumscribed cube of $\mathit{H}$ [30,31], so the $2\ast D$ sides of a circumscribed cube intersect $\lfloor \frac{{\left(2r\right)}^{D1}}{s}\rfloor \le \frac{{\left(2r\right)}^{D1}}{s}$ regions, where $\lfloor x\rfloor $ denotes the floor function. Thus, the total expected data access number of leafMBRs intersected by $\mathit{H}$ is:
$$2D\ast \frac{{\left(2r\right)}^{D1}}{s}\ast c\phantom{\rule{0ex}{0ex}}=2D\ast \frac{{2}^{D1}{\left(\frac{K}{{\pi}^{\frac{D}{2}}N}\right)}^{\frac{D1}{D}}}{{(\frac{c}{N})}^{\frac{D1}{D}}}\ast c\approx 2\mathrm{D}\ast \sqrt[\frac{D}{D1}]{cK}$$
In summary, the total number operations of $\mathrm{SSED}$ is bounded by $\mathrm{O}\left(\mathrm{log}N+K+\mathrm{D}{K}^{\frac{D1}{D}}\right)$ and $\mathrm{O}\left(\mathrm{log}N+\text{}\mathrm{D}{K}^{\frac{D1}{D}}\right)$ is the expected node number in the priority queue.
Each subtask of $\mathrm{SSED}$ is analyzed separately to see the workload in detail. From the pseudocodes given in Section 4.3.3, it follows that $\mathrm{SSED}$ runs $\mathrm{O}\left(D\right)$ $\mathrm{SM}$ protocols, each of which needs $\mathrm{O}\left(1\right)$ decryption, plaintext multiplication and encryption on the site of $DO$, and $\mathrm{O}\left(1\right)$ encryption, ciphertext multiplication, and exponentiation on the site of $CS$.
On the other hand, a heap is used to implement the priority queue. If the expected number of MBRs and data points in the priority queue is $\leftQ\right$, which is about $\mathrm{O}\left(\mathrm{log}N+\text{}D{K}^{\frac{D1}{D}}\right)$ in total, then it takes $\mathrm{O}\left(\mathrm{log}\left(\leftQ\right\right)\right)$ $\mathrm{SCompare}$ individually to enqueue or dequeue data from the queue. Moreover, the complexity of our enqueue and dequeue approximates to $\mathrm{O}\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)$.
In the end of $\mathrm{PP}K\mathrm{SS}$, $CS$ takes $\mathrm{O}\left(DK\right)$ Paillier encryptions and multiplications to randomly permute the results and $DO$ takes $\mathrm{O}\left(DK\right)$ decryptions. Finally, only extremely light, $\mathrm{O}\left(DK\right)$, plaintext subtractions need to be done by $QU$.
We summarize our complexity analysis results in Table 2 by showing the executed complexities of different type of operations in different protocols of our scheme. Clearly, from the table, the operations in Paillier ciphertext domain, which cost most expensively, are performed by $CS$. As for $DO$, he or she focuses mainly on the decryption tasks and needs to do a few plaintext operations. Most importantly, a very lightweight workload is put on $QU$, which raises the possibility of extending our approach to a user with resourcelimited mobile devices. Another highlight of our approach is that the search complexity, $\mathrm{O}\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)$, is faster than linear scan, whose complexity is $\mathrm{O}\left(N\right)$, since $N$ is much greater than $D$ and $K$, in general.
6.2. Experimental Results
The overall performance of our proposed scheme is also demonstrated by several simulation experiments. We realized the proposed scheme, including $DO$, $CS$, and $QU$, in Clanguage and executed them on a laptop running macOS Sierra 10.12.3, with 2.7 GHz Intel Core i5 and 8 GB 1867 MHz DDR3 memory. The test dataset for $DO$ is randomly generated with different $N$ and $D$. We use 4ary Btree as an OPE tree and $M=5$ for Rtree. The key length of Paillier cryptosystem is set to 1024 bits. The rest of this section shows the executing time of each part of $\mathrm{PP}K\mathrm{SS}$ except for the encryption operations of dataset and query, which are simply accomplished by applying the original Paillier encryption. In general, a Paillier encryption of a plaintext data or a query takes about 7.5 milliseconds (ms) under our settings. Specifically, the timing responses of the proposed scheme will be examined separately so as to represent the workload of each character, independently. Moreover, the communication cost is also listed because the proposed procedures always include interactions between two parties.
First of all, Figure 5 shows the timing cost of performing $\mathrm{SmOPE}$ on a single Paillier ciphertext, that is, the time required to insert one ciphertext into an OPE tree and update the corresponding OPE table. When the total number of encrypted attributes grows from 100,000 to 1,000,000, each insertion may take longer time, changing from 40 ms to 45 ms, since the height of the tree grows. Notice that it may take a long time to complete the order preserving setup, but that does not matter since it is a onetime job.
The second task of the setup is $\mathrm{RtConstruct}$, which can totally be accomplished by $CS$. We illustrate the required timing for constructing the Rtree of the testing datasets with different sizes and dimensions in Figure 6. As the size of 2dimensional dataset grows from 20,000 to 100,000, the construction costs changed from 8 s to 45 s. Since the required construction time is proportion to the dimension of the dataset, as expected, Figure 6 shows it grows linearly with the respect to the number of involved data attributes.
We define the query search time of our scheme as a specific timeduration, which starts from the time after $CS$ completed $\mathrm{SmOPE}$ for inserting the encrypted query into an OPE table, and ends after $QU$ removed the random numbers out and got the unencrypted results correctly. The size of dataset $N$, the dimension of data $D$, and the range value $K$ for a $K$NN search are used as parameters to generate various sets of experiments. Each experimental result (corresponding to a different set of parameters) is obtained by running 100 queries on it and we record the resultant average search time. First, by fixing $D=2$ and $K=1$, that is, to find the nearest neighbors in a 2dimensional dataset, we illustrate the obtained average search time, for datasets with sizes changed from 10,000 to 100,000, in Figure 7. Since our approach utilizes Rtree structure to manage the dataset, the search time grows only a little while the size of dataset increases tremendously. Next, by setting $K=1$ and $N=20,000$, Figure 8 shows the search time behavior of our work over multidimensional dataset. The average search time grows apparently as the data dimension increases, for example, the search time changed from 3.52 s to 53.49 s when $D$ changed from 2 to 6. The reasons for this phenomenon come from the relatively complicated structure of Rtree and the increase of the number of necessary visited data points in MBRs, if high dimensional datasets are involved. In the last experiment, we fix $N=20,000$ and $D=2$ and conduct $K$NN searches by varying $K$, and the resulting search time is shown in Figure 9. As the value of $K$ increases from $1$ to $20$, the search time grows linearly from 3.52 s to 9.17 s.
Finally, we separate the search time of our approach into the combination of $DO$’s CPU time, $CS$’s CPU time, and $QU$’s CPU time to clearly identify the workload of each character. The data transmission costs are also listed to evaluate the extra communication overhead of the proposed scheme. We take $N=20,000$, $K=20$ and different $D$ as testing parameters and report the details of the composition of a search time in Table 3. One can see that although the search time is a little bit longer as the dimension of dataset is increased, about 80% of computational loads are outsourced to $CS$, and $DO$ takes care of the rest 20%. At the $QU$ site, the computation overhead is extremely lightweight and is negligible. This fact makes the proposed scheme more applicable and practical even if $DO$ and $QU$ are worked on resourcelimited environments. Notice that there is a high correlation between the communication cost and the total query search time, which is quite reasonable.
7. Conclusions and Future Works
In this work, a privacy preserving $K$NN search scheme is constructed based on a newly proposed encryption scheme, which preserves not only comparison but also addition operations in the encrypted domain. Moreover, some disadvantages of related works are addressed and released by our approach. The proposed scheme is realized on the bases of several security protocols among a data owner, a cloud server and a query user, and we manage the dataset with the structure of an Rtree. Security analysis shows that the proposed encryption scheme not only achieves INDOCPA security but also preserves both the data privacy and query privacy. Furthermore, we justify that our approach runs in sublinear time by complexity analysis. And finally, experimental results demonstrate its effectiveness with respect to different sizes of datasets. Most of all, the impressive lightweight user loads enhance the applicability of our approach to resourcelimited mobile platforms.
As an interesting extension of PPkNN, Wu et al. [32] presented the first solution to the socalled Group knearest neighbor (kGNN) search problem, which allows a group of n mobile users to jointly retrieve k points from a locationbased service provider (LSP) that minimizes the aggregate distance to them, at the same time. The authors identified four protection objectives in the privacypreserving kGNN (PPkGNN) search: (i) every user’s location should be protected from LSP; (ii) the group’s query and the query answer should be protected from LSP; (iii) LSP’s private database information should be protected from users, i.e., the users cannot learn more information beyond the answer they requested; (iv) every user’s location should be protected from the other users in the group. Since the encryption mechanism of our scheme is based on the Paillier cryptosystem, same as in [32], we might extend our PPkNN scheme to solve PPkGNN search problem, which is one of our future research directions. Furthermore, how to improve the overall system performance in regarding to highdimensional datasets is another research direction.
One of the anonymized reviewers brings the following important and interesting issue to us: “How a data owner can be contributed in the process of securing/encrypting cloud dataset?”, which is an important aspect of the dynamically changing environments, such as in Cloudbased servers. Although this topic is out of the main scope of our current work, this subject should be included in our future research directions, as suggested by the reviewer. Specific thanks to the same reviewer for mentioning the following useful reference [33] to us. As mentioned by the same reviewer: “Using more than one Cloud servers the processing and computational overheads can be further reduced.” Of course, this interesting and challenging issue would be included in our future research topics. Thanks also to the reviewer for bringing the following useful references to us [34,35].
Our scheme would fail if a cloud server did not follow the predefined protocols and returned the wrong results to the query user. In other words, we think how to design a cloudbased privacy preserving $K$NN search scheme which is robust to “Maninthemiddle attack” is still an open problem.
Author Contributions
Conceptualization, Y.C.H. and J.L.W.; Data curation, Y.C.H.; Formal analysis, Y.C.H.; Investigation, Y.C.H.; Methodology, Y.C.H. and J.L.W.; Software, Y.C.H.; Supervision, J.L.W.; Validation, Y.C.H.; Visualization, Y.C.H. and C.H.H.; Writing—original draft, Y.C.H. and J.L.W.; Writing—review & editing, C.H.H. and J.L.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
References
 Guttman, A. Rtrees: A Dynamic Index Structure for Spatial Searching. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Boston, MA, USA, 18–21 June 1984; pp. 47–57. [Google Scholar]
 Popa, R.A.; Li, F.H.; Zeldovich, N. An IdealSecurity Protocol for OrderPreserving Encoding. In Proceedings of the IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 19–22 May 2013; pp. 463–477. [Google Scholar]
 Paillier, P. Publickey Cryptosystems Based on Composite Degree Residuosity Classes. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; pp. 223–238. [Google Scholar]
 Behera, P.K.; Khilar, P.M. A Novel Trust Based Access Control Model for Cloud Environment. In Proceedings of the International Conference on Signal, Networks, Computing, and Systems, Delhi, India, 25–27 February 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 285–295. [Google Scholar]
 “Mandatory Access Control.” Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc. Available online: http://en.wikipedia.org/wiki/Mandatory_access_control (accessed on 21 December 2019).
 Samarati, P.; De Vimercati, S.C. Access control: Policies, models, and mechanisms. In International School on Foundations of Security Analysis and Design; Springer: Berlin/Heidelberg, Germany, 2001; pp. 137–196. [Google Scholar]
 Sandhu, R.S.; Coyne, E.J.; Feinstein, H.L.; Youman, C.E. Rolebased access control models. Computer 1996, 29, 38–47. [Google Scholar] [CrossRef]
 Guoyuan, L.; Wang, D.; Bie, Y.; Lei, M. MTBAC: A mutual trustbased access control model in Cloud computing. China Commun. 2014, 11, 154–162. [Google Scholar] [CrossRef]
 Kayers, A.S.M.; Rahayu, W.; Dillon, T.; Chang, E.; Han, J. Contextaware access control with imprecise context characterization for cloudbased data resources. Future Gener. Comp. Syst. 2019, 93, 237–255. [Google Scholar]
 Kwon, Y.; Seo, J.H.; Park, T.B. Dynamic Rolebased User Service Authority Control and Management on Cloud Computing. Univers. J. Electr. Electron. Eng. 2019, 6, 79–93. [Google Scholar] [CrossRef]
 Voloch, N.; Levy, P.; Elmakies, M.; Gudes, E. An Access Control Model for Data Security in Online Social Networks Based on Role and User Credibility. The Third International Symposium on Cyber Security Cryptography and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2019; pp. 156–168. [Google Scholar]
 Kayes, A.S.M.; Rahayu, W.; Tharam, T. Critical situation management utilizing IoTbased data resources through dynamic contextual role modeling and activation. Computing 2019, 101, 743–772. [Google Scholar] [CrossRef]
 Rong, H.; Wang, H.M.; Liu, J.; Xian, M. Privacypreserving knearest neighbor computation in multiple cloud environments. IEEE Access 2016, 4, 9589–9603. [Google Scholar] [CrossRef]
 Park, J.; Lee, D.H. Privacy preserving knearest neighbor for medical diagnosis in ehealth cloud. J. Healthc. Eng. 2018, 2018, 4073103. [Google Scholar] [CrossRef] [PubMed]
 Burkhart, M.; Dimitropoulos, X. Privacypreserving distributed network troubleshooting—bridging the gap between theory and practice. ACM Trans. Inf. Syst. Secur. (TISSEC) 2011, 14, 31. [Google Scholar] [CrossRef]
 Yin, H.; Zhang, J.; Xiong, Y.; Huang, X.; Deng, T. PPKmeans: Achieving privacypreserving clustering over encrypted multidimensional cloud data. Electronics 2018, 7, 310. [Google Scholar] [CrossRef]
 Wong, W.K.; Cheung, D.W.L.; Kao, B.; Mamoulis, N. Secure KNN Computation On Encrypted Databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; pp. 139–152. [Google Scholar]
 Hu, H.; Xu, J.; Ren, C.; Choi, B. Processing Private Queries over Untrusted Data Cloud Through Privacy Homomorphism. In Proceedings of the IEEE 27th International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011; pp. 601–612. [Google Scholar]
 Yao, B.; Li, F.; Xiao, X. Secure Nearest Neighbor Revisited. In Proceedings of the IEEE 29th International Conference on Data Engineering (ICDE), Brisbane, Australia, 8–12 April 2013; pp. 733–744. [Google Scholar]
 Elmehdwi, Y.; Samanthula, B.K.; Jiang, W. Secure Knearest Neighbor Query over Encrypted Data in Outsourced Environments. In Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014; pp. 664–675. [Google Scholar]
 Wang, B.; Hou, Y.; Li, M. Practical and Secure Nearest Neighbor Search on Encrypted Largescale Data. In Proceedings of the IEEE INFOCOM 2016—The 35th Annual IEEE International Conference on Computer Communications, San Francisco, CA, USA, 10–14 April 2016; pp. 1–9. [Google Scholar]
 Xu, R.; Morozov, K.; Yang, Y.; Zhou, J.; Takagi, T. Privacypreserving Knearest Neighbour Query on Outsourced Database. In Proceedings of the Australasian Conference on Information Security and Privacy, Melbourne, Australia, 4–6 July 2016; Springer: Berlin/Heidelberg, Germany; pp. 181–197. [Google Scholar]
 Zhou, L.; Zhu, Y.; Castiglione, A. Efficient kNN query over encrypted Data in cloud with limited keydisclosure and offline data owner. Comput. Secur. 2017, 69, 84–96. [Google Scholar] [CrossRef]
 Zhu, Y.; Huang, Z.; Takagi, T. Secure and controllable kNN query over encrypted cloud data with key confidentiality. J. Parallel Distrib. Comput. 2016, 89, 1–12. [Google Scholar] [CrossRef]
 Katz, J.; Lindell, Y. Introduction to Modern Cryptography; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014. [Google Scholar]
 Agrawal, R.; Kiernan, J.; Srikant, R.; Xu, Y. Order Preserving Encryption for Numeric Data. In Proceedings of the ACM SIGMOD International Conference on Management of data, Paris, France, 13–18 June 2004; pp. 563–574. [Google Scholar]
 Boldyreva, A.; Chenette, N.; Lee, Y.; O’neill, A. OrderPreserving Symmetric Encryption. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 26–30 April 2009; pp. 224–241. [Google Scholar]
 Goldreich, O. Foundations of Cryptography: Volume 1, Basic Tools; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar]
 Leutenegger, S.T.; Lopez, M.A.; Edgington, J. STR: A Simple and Efficient Algorithm for Rtree Packing. In Proceedings of the 13th International Conference on Data Engineering, Birmingham, UK, 7–11 April 1997; pp. 497–506. [Google Scholar]
 Hjaltason, G.R.; Samet, H. Distance browsing in spatial databases. ACM Trans. Database Syst. (TODS) 1999, 24, 265–318. [Google Scholar] [CrossRef]
 Henrich, A. A Distance Scan Algorithm for Spatial Access Structures. In Proceedings of the ACMGIS, Gaithersburg, MD, USA, 1–2 December 1994; pp. 136–143. [Google Scholar]
 Wu, Y.; Wang, K.; Zhang, Z.; Lin, W.; Chen, H.; Li, C. Privacy Preserving Group Nearest Neighbor Search. In Proceedings of the EDBT, Vienna, Austria, 26–29 March 2018; pp. 277–288. [Google Scholar]
 Kayes, A.; Han, J.; Colman, A.; Islam, M.S. Relboss: A RelationshipAware Access Control Framework for Software Services. In Proceedings of the OTM Confederated International Conferences” On the Move to Meaningful Internet Systems”, Rhodes, Greece, 21–25 October 2014; pp. 258–276. [Google Scholar]
 Khilar, P.M.; Chaudhari, V.; Swain, R.R. TrustBased Access Control in Cloud Computing Using Machine Learning. In Cloud Computing for Geospatial Big Data Analytics; Springer: Berlin/Heidelberg, Germany, 2019; pp. 55–79. [Google Scholar]
 Kayes, A.S.M.; Rahayu, W.; Dillon, T.S.; Chang, E. Accessing Data from Multiple Sources Through ContextAware Access Control. In Proceedings of the 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, New York, NY, USA, 1–3 August 2018; pp. 551–559. [Google Scholar]
Figure 2.
An Rtree example. The nodes with uppercase letters indicate the minimum bounding rectangles (MBRs), while the nodes with lowercase letters are the data points.
Figure 4.
The structure and the flow diagram of the proposed secure Knearest neighbor ($K$NN) search scheme. The green key denotes the public key while the red one denotes the corresponding secret key.
Figure 5.
Average execution time for secure mOPE ($\mathrm{SmOPE}$) per attribute with respect to different numbers of total attributes.
Figure 7.
Average nearest neighbor query search time with respect to different sizes of twodimensional datasets, in which $D=2$ and $K=1$.
Figure 8.
Average nearest neighbor query search time of multidimensional datasets, in which $N=20,000$ and $K=1$.
[13]  [14]  [15]  [16]  [17]  Our Scheme  

Sublinear search complexity  ✕  ✓  ✓  ✕  ✓  ✓ 
No need to share private key with other  ✕  ✕  ✕  ✕  ✕  ✓ 
No need to store local data at user end  ✓  ✕  ✕  ✓  ✓  ✓ 
Return only accurate KNN set  ✓  ✓  ✕  ✓  ✕  ✓ 
Applicable to high dimensional data  ✓  ✓  ✕  ✓  ✓  ✓ 
Single Sever  ✓  ✓  ✓  ✕  ✓  ✓ 
Table 2.
The complexity analysis results of the proposed scheme. The complexities represented by red, blue, and green colors indicate the computational loads of data owner (DO), cloud server (CS), and query user (QU), respectively.
$\mathbf{Enc}/\mathbf{Dec}$  Paillier Operation  Plaintext Operation  

$Enc$for Data  $\mathrm{O}\left(ND\right)$  
$SmOPE$for Data  $\mathrm{O}\left(ND\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(ND\mathrm{log}\left(ND\right)\right)$  
$\mathrm{O}\left(ND\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(ND\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(ND\right)$  
$RtConstruct$  $\mathrm{O}\left(DN\mathrm{log}\left(N\right)\right)$  
$Enc$for Query  $\mathrm{O}\left(D\right)$  
$SmOPE$for Query  $\mathrm{O}\left(D\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(D\mathrm{log}\left(ND\right)\right)$  
$\mathrm{O}\left(D\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(D\mathrm{log}\left(ND\right)\right)$  $\mathrm{O}\left(D\right)$  
$SBFS$  $SSED$  $\mathrm{O}\left(D\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  $\mathrm{O}\left(D\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  
$\mathrm{O}\left(D\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  $\mathrm{O}\left(D\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  
$\mathbf{Access}$  $\mathrm{O}\left(\mathrm{log}\leftQ\right\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  $\mathrm{O}\left(\mathrm{log}\leftQ\right\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  
$\mathrm{O}\left(\mathrm{log}\leftQ\right\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  $\mathrm{O}\left(\mathrm{log}\leftQ\right\left(\mathrm{log}N+K+D{K}^{\frac{D1}{D}}\right)\right)$  
Return Results  $\mathrm{O}\left(DK\right)$  
$\mathrm{O}\left(DK\right)$  $\mathrm{O}\left(DK\right)$  $\mathrm{O}\left(DK\right)$ 
Table 3.
Details of the query search time of different $D$ with N=20,000 and K=20. (Time is measured in seconds and communication cost is measured in Kb, Percentage abbreviated to Per.).
$\mathit{D}$  Total Query Time  $\mathit{D}\mathit{O}\text{}\mathbf{Time}$  Per.  $\mathit{C}\mathit{S}\text{}\mathbf{Time}$  Per.  $\mathit{Q}\mathit{U}\text{}\mathbf{Time}$  Per.  Communication Cost 

$2$  9.17  2.05  22.3%  7.04  76.7%  0.000003  <0.01%  469.003 
$3$  19.03  3.93  20.7%  14.91  78.4%  0.000003  <0.01%  978.568 
$4$  40.24  7.85  19.5%  32.01  79.5%  0.000003  <0.01%  2091.073 
$5$  77.8  14.52  18.7%  62.55  80.4%  0.000004  <0.01%  4030.649 
$6$  135.06  24.41  18.1%  109.43  81.0%  0.000006  <0.01%  7059.019 
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).