Article

Privacy-Preserving Top-k Query Processing Algorithms Using Efficient Secure Protocols over Encrypted Database in Cloud Computing Environment

1 Department of Computer Engineering, Jeonbuk National University, Jeonju-si 54896, Korea
2 Department of IT Convergence System, Jeonju Vision College, Jeonju-si 55069, Korea
* Author to whom correspondence should be addressed.
Electronics 2022, 11(18), 2870; https://doi.org/10.3390/electronics11182870
Submission received: 9 August 2022 / Revised: 2 September 2022 / Accepted: 8 September 2022 / Published: 11 September 2022
(This article belongs to the Special Issue Digital Trustworthiness: Cybersecurity, Privacy and Resilience)

Abstract

Recently, studies on secure database outsourcing have attracted attention in the cloud computing environment. A few secure Top-k query processing algorithms have been proposed for the encrypted database. However, the previous algorithms support either security or efficiency, but not both. Therefore, we propose a new Top-k query processing algorithm using a homomorphic cryptosystem that supports both security and efficiency. For security, we propose new secure and efficient protocols based on arithmetic operations. To achieve a high level of efficiency, we also propose a parallel Top-k query processing algorithm using an encrypted random value pool. Through our performance analysis, the proposed Top-k algorithms show about 1.5∼7.1 times better performance in terms of query processing time than the existing algorithms.

1. Introduction

With the increasing popularity of cloud computing, there has been growing interest in outsourcing databases. Cloud computing provides a service that allows internet-connected users to use virtual computing resources such as storage, computation, and network [1,2,3,4,5]. Generally, three requirements are considered for outsourcing databases in cloud computing. First, it is essential to provide data privacy because the data owner's database contains the sensitive information of clients [6,7]. Second, the query and the query result must be protected because this information can contain, or allow an attacker to infer, the sensitive information of the query issuer [8,9]. Finally, data access patterns must be protected because an attacker can infer sensitive information by analyzing them [10,11,12]. Therefore, secure query processing over an encrypted database has been researched to protect the original data, queries, and access patterns. Previous strategies modify plaintexts into substituted data and outsource them to a cloud [13,14,15,16,17,18]. However, these strategies cannot completely protect both data and queries because they are vulnerable to various attacks. To tackle this problem, recent strategies encrypt the original data and outsource them to the cloud [19,20,21,22,23,24,25]. That is, a data owner encrypts the database before outsourcing the original data to a cloud. The cloud processes queries issued by users, and each user obtains the results from the cloud. Figure 1 shows an encrypted database outsourcing model.
Top-k query processing algorithms are used in various applications such as location-based services, e-commerce, and web search engines [26]. First, H-I. Kim et al. [27] proposed a Top-k query processing algorithm based on the Paillier cryptosystem. However, the algorithm incurs a high computational cost while processing Top-k queries because it uses secure protocols based on binary bit operations. Second, H-J. Kim et al. [28] proposed a Top-k query processing algorithm using Yao's garbled circuit. By using Yao's garbled circuit, the algorithm checks whether or not a node includes a point without binary bit operations over ciphertexts. However, since Yao's garbled circuit [29,30] is based on a hardware-style Boolean circuit, the algorithm still requires repetitive bit operations, thus causing a high computation cost. Both algorithms share the critical problem of requiring a high computational cost for Top-k query processing. To the best of our knowledge, there is no conventional Top-k query processing algorithm that is suitable for parallel processing. To tackle this problem, we propose a new Top-k query processing algorithm using new secure protocols based on arithmetic operations over an encrypted database in cloud computing. In terms of efficiency, our algorithm uses both the new secure protocols and a data filtering technique based on the kd-tree [31]. The proposed secure protocols streamline the comparison procedure by using arithmetic operations, rather than employing binary bit operations, which require a high computation cost. Moreover, to further improve efficiency, we propose a parallel Top-k query processing algorithm using a thread pool. In addition, while hiding data access patterns, our algorithm supports security for both the original data and the user query by using the Paillier cryptosystem [32] and secure two-party computation [33,34,35].
To prove the security of our algorithm, we provide the formal security proofs of both the proposed secure protocols and Top-k query processing algorithms. Through the performance analysis, we prove that our Top-k query processing algorithms outperform the existing ones. The contributions of our paper are as follows:
  • We present an architecture for outsourcing both the encrypted data and index.
  • We propose new secure protocols based on arithmetic operations (i.e., ASC, ASRO, and ASPE) to protect the original data, user queries, and access patterns.
  • We propose a new Top-k query processing algorithm that can support both security and efficiency.
  • We propose a new parallel Top-k query processing algorithm using a random value pool to improve the efficiency of Top-k query processing.
  • We also present a comprehensive performance analysis of our algorithms with synthetic and real datasets.
We organize the rest of the paper as follows. In Section 2, we describe the Paillier cryptosystem, the adversarial attack model, and related work. In Section 3, we present the two-party computation structure and the new secure protocols. In Section 4, we describe a new Top-k query processing algorithm. In Section 5, we describe a parallel Top-k algorithm. In Section 6, we present the security proof of our Top-k algorithms. In Section 7, we describe the performance analysis of our Top-k algorithms. Finally, we conclude this paper.

2. Background and Related Work

2.1. Background

Paillier cryptosystem. The Paillier cryptosystem [32] is an additively homomorphic and probabilistic asymmetric encryption scheme for public key cryptography. The public key pk for encryption is represented by (N, g), where N is the product of the large primes p and q, and g is an element of Z*_N^2, the multiplicative group of integers modulo N^2. The secret key sk for decryption is represented by (p, q). Let E(·) denote the encryption function and D(·) the decryption function. The Paillier cryptosystem has the following characteristics.
  • Homomorphic addition: the multiplication of two ciphertexts E(m_1) and E(m_2) generates the ciphertext of the sum of their plaintexts m_1 and m_2 (Equation (1)).
    E(m_1 + m_2) = E(m_1) × E(m_2) mod N^2
  • Homomorphic multiplication: the m_2-th power of a ciphertext E(m_1) generates the ciphertext of the product of m_1 and m_2 (Equation (2)).
    E(m_1 × m_2) = E(m_1)^(m_2) mod N^2
  • Semantic security: encryptions of the same plaintext generate different ciphertexts under the same public key (Equation (3)).
    m_1 = m_2 ⇏ E(m_1) = E(m_2)
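The three properties above can be exercised with a minimal, runnable sketch of the textbook Paillier scheme. The tiny primes and the choice g = N + 1 below are illustrative assumptions only and offer no real security:

```python
# Minimal textbook Paillier sketch illustrating the three properties above.
# Toy parameters only (small primes, no padding); not for real use.
import math, random

p, q = 61, 53                        # small primes (illustration only)
N = p * q
N2 = N * N
g = N + 1                            # common simplification: g = N + 1
lam = math.lcm(p - 1, q - 1)         # lambda = lcm(p - 1, q - 1)

def L(x):                            # L(x) = (x - 1) / N
    return (x - 1) // N

mu = pow(L(pow(g, lam, N2)), -1, N)  # mu = L(g^lambda mod N^2)^-1 mod N

def E(m):                            # E(m) = g^m * r^N mod N^2, random r
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(g, m, N2) * pow(r, N, N2)) % N2

def D(c):                            # D(c) = L(c^lambda mod N^2) * mu mod N
    return (L(pow(c, lam, N2)) * mu) % N

# Equation (1): multiplying ciphertexts adds plaintexts
assert D(E(3) * E(4) % N2) == 7
# Equation (2): exponentiating by a plaintext multiplies
assert D(pow(E(3), 4, N2)) == 12
# Equation (3): the same plaintext encrypts to different ciphertexts
assert E(5) != E(5)
```

Multiplying ciphertexts adds plaintexts, exponentiating by a constant multiplies them, and re-encrypting the same message yields a fresh ciphertext, matching Equations (1)∼(3).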
Adversarial attack model. In the outsourced database environment, two attack models can be considered: a semi-honest attack model and a malicious attack model [36,37]. In the semi-honest (or honest-but-curious) attack model, the cloud performs its own protocol honestly, but attempts to obtain sensitive data about the data owner and the authorized user. To protect against a semi-honest attack, the original data should be encrypted before outsourcing. In the malicious attack model, an attacker tries to obtain the original data or to disable the service. To protect the original data under malicious attacks, a service provider concentrates on detecting attacks and restoring the damaged formal procedure. Since we concentrate on protecting sensitive data in an outsourced environment, we propose secure query processing under the semi-honest attack model. A secure protocol for the semi-honest attack model is defined as follows [36,37].
Definition 1.
Assume α_i is the input of cloud C_i and Π_i(ρ(α)) is the execution image of C_i for the protocol ρ. If a simulated execution image S_i(ρ(α)) is computationally indistinguishable from Π_i(ρ(α)), the protocol ρ is a secure protocol under the semi-honest attack model.

2.2. Related Work

The existing privacy-preserving Top-k query processing algorithms are as follows. First, J. Vaidya et al. [38] proposed a privacy-preserving Top-k query processing algorithm using Fagin's scheme [39], in which the data are vertically partitioned. Each party reports its scored data in order of local score until there are at least k common data items in the output of all parties; the union of the reported data then includes the Top-k results. The algorithm determines the actual Top-k results by identifying an approximate cutoff score separating the k-th data item from those below it. The algorithm has the advantage that it does not reveal the score of an individual datum, owing to a secure comparison technique. However, in terms of query privacy, an attacker can infer the preference of a user because the cutoff score is easily estimated by a binary search over the range of values. Regarding data privacy, because the algorithm does not encrypt the partitioned data, an attacker can obtain the original data under the semi-honest attack model. In addition, data access patterns are not protected because the identities of the Top-k results are disclosed.
Second, M. Burkhart et al. [40] proposed a Top-k query processing algorithm that utilizes hash tables and a secret sharing technique. The algorithm aggregates the distributed key-value data by using Shamir's secret sharing technique and finds the k key-value pairs with the largest aggregated values as the Top-k result. To reduce computation time, the algorithm uses fixed-length hash tables to avoid expensive key comparison operations. However, the algorithm cannot guarantee an accurate result because the aggregated results are probabilistic: the algorithm performs a binary search to estimate the intermediate threshold separating the k-th item from the (k+1)-th item. Moreover, an attacker can infer the preference of a user because the threshold is easily estimated by a binary search over the range of values. The algorithm also cannot conceal data access patterns because the indices of the hash table related to the Top-k results are revealed.
Third, Y. Zheng et al. [41] proposed a privacy-preserving Top-k query processing algorithm over vertically distributed data sources. To support data privacy, the service provider (SP) divides the data owner's database into data sources (DS = <DS_1, DS_2, …, DS_m>), one per attribute. The SP generates the encryption keys and determines the positive weight values w_1, w_2, …, w_m for the scoring function f(x_1, x_2, …, x_m) = Σ_{i=1}^{m} w_i × x_i, where x_i is a data record and Σ_{i=1}^{m} w_i = 1. Then, the SP distributes the encryption keys to each DS_i and sends the corresponding weight w_i to DS_i via a secure channel. To support the Top-k query, each DS_i outsources its data to the cloud. Because both x_i and w_i are private information, DS_i encrypts the product of x_i and w_i before outsourcing it to the cloud. However, in the semi-honest attack model, an attacker can obtain the sensitive data from the data sources because the data owner transmits data records and weights to the data sources in plaintext. Therefore, the algorithm does not protect the data, queries, or access patterns.
Fourth, H-I. Kim et al. [27] proposed a privacy-preserving Top-k query processing algorithm (STopk_I) based on both an encrypted kd-tree index and the Paillier cryptosystem [31]. The Paillier cryptosystem provides homomorphic operations and resists chosen-plaintext attacks, so the cloud can obtain the Top-k query result without decrypting either the user's query or the original data. The algorithm also proposed a secure protocol based on binary bit operations that can access all leaf nodes without exposing data access patterns. The algorithm can protect the user's query, the sensitive data, and the data access patterns. However, because the secure protocol consists of binary bit operations over ciphertexts, a binary bit operation must be performed as many times as the bit length of the data. These repetitive binary bit operations cause a high computation cost.
Finally, H-J. Kim et al. [28] proposed a privacy-preserving Top-k query processing algorithm (STopk_G) using Yao's garbled circuit [29,30]. Yao's garbled circuit is a secure protocol enabling two-party secure computation, in which two semi-honest parties jointly compute a function over private inputs by using a Boolean circuit. By using Yao's garbled circuit, the algorithm checks whether or not a node includes a point without binary bit operations over ciphertexts. The algorithm performs more efficiently than STopk_I while providing the same level of privacy. However, since Yao's garbled circuit is based on a hardware-style Boolean circuit, the algorithm still requires repetitive bit operations, thus causing a high computation cost.
Table 1 summarizes the existing studies based on their characteristics. We compare them with regard to three major characteristics, i.e., hiding access patterns, computation overhead, and security risk. First, H-I. Kim et al.'s work [27] and H-J. Kim et al.'s work [28] can protect data access patterns, while J. Vaidya and C. Clifton's work [38], M. Burkhart and X. Dimitropoulos's work [40], and Y. Zheng et al.'s work [41] cannot. Second, J. Vaidya and C. Clifton's work [38] and M. Burkhart and X. Dimitropoulos's work [40] require low computational overhead because they do not use any encryption scheme, while H-I. Kim et al.'s work [27] and H-J. Kim et al.'s work [28] need high computational overhead due to the use of the Paillier cryptosystem [32]. Finally, H-I. Kim et al.'s work [27] and H-J. Kim et al.'s work [28] have low security risk because they protect sensitive data, the user's query, and data access patterns, while J. Vaidya and C. Clifton's work [38], M. Burkhart and X. Dimitropoulos's work [40], and Y. Zheng et al.'s work [41] have high security risk because they only protect sensitive data.

3. Overall System Architecture

3.1. System Architecture

Since we adopt the semi-honest attack model [36,37], we consider the clouds as insider adversaries. Under the semi-honest adversarial model, the cloud follows the procedure of the protocol, but may attempt to obtain additional information not allowed to it. Accordingly, earlier studies [13,16] also operate under the semi-honest attack model. Meanwhile, a secure protocol is a communication protocol, i.e., an agreed sequence of actions performed by two or more communicating entities (e.g., cloud A and cloud B) in order to accomplish some mutually desirable goal [34,35]. It makes use of cryptographic techniques, allowing the communicating entities to achieve a security goal. For example, the entities encrypt the information or permute the sequence of data by using random values for data protection. A protocol can be proven secure under the semi-honest adversarial model by using Definition 1 in Section 2.
The system is organized into four components: cloud A (C_A), cloud B (C_B), the data owner (DO), and the authorized user (AU). The DO possesses the original database T of a set of n records [27]. A record t_i (1 ≤ i ≤ n) is composed of m columns, where m denotes the number of data dimensions, and the j-th column of t_i is represented by t_i,j (1 ≤ j ≤ m). The DO partitions T by using the kd-tree [27] to provide indexing on T. If we retrieved the tree structure in a hierarchical manner, the access pattern could be disclosed. Therefore, we only consider the leaf nodes of the kd-tree, and all the leaf nodes are retrieved once during the query processing step. Hence, a node hereafter denotes a leaf node. Let h denote the level of the constructed kd-tree and F the fanout of each leaf node. A node is denoted by node_z (1 ≤ z ≤ 2^(h−1)), where 2^(h−1) is the number of leaf nodes. The region data of node_z are given by the upper bound ub_z,j and the lower bound lb_z,j (1 ≤ z ≤ 2^(h−1), 1 ≤ j ≤ m). Each node retains the identifiers (ids) of the data located inside the node region. To preserve data privacy, the DO encrypts T attribute-wise using the public key (pk) of the Paillier cryptosystem [32] before outsourcing the database to the cloud. That is, the DO creates E(t_i,j) by encrypting t_i,j for 1 ≤ i ≤ n and 1 ≤ j ≤ m. The DO also encrypts the region data of all nodes of the kd-tree so as to support efficient query processing. In particular, ub and lb of each node are encrypted attribute-wise, such that E(ub_z,j) and E(lb_z,j) are created for 1 ≤ z ≤ 2^(h−1) and 1 ≤ j ≤ m. Figure 2 presents an example of the encrypted kd-tree when the number of data items is 20 and the level of the kd-tree is 3. The encrypted kd-tree consists of four regions (leaf nodes), i.e., Node_1, Node_2, Node_3, and Node_4. For example, the lower bound and upper bound of Node_1 are <E(0), E(0)> and <E(6), E(5)>, respectively. In addition, Node_1 includes five data items, i.e., E(d_1) = <E(2), E(1)>, E(d_2) = <E(1), E(2)>, E(d_3) = <E(2), E(3)>, E(d_4) = <E(4), E(4)>, and E(d_5) = <E(5), E(3)>.
We assume that C_A and C_B are non-colluding and semi-honest clouds; thus, they correctly follow the procedure of the protocols. To support Top-k query processing over the encrypted database, secure multi-party computation (SMC) is required between C_A and C_B. To establish the SMC structure, the DO outsources both the encrypted data and its encrypted index to C_A along with pk, while sending sk to C_B. The encrypted index includes the region data of each node in ciphertext and the ids of the data located inside each node in plaintext. The DO also sends pk to AUs to authorize them to encrypt Top-k queries. When issuing a query, an AU first creates E(q_j) by encrypting a query q attribute-wise for 1 ≤ j ≤ m. C_A and C_B process the query and return its result to the AU.

3.2. Secure Protocol

Since the proposed Top-k query processing algorithm is constructed from several secure protocols, we use four secure protocols from the literature [19,27,28]: Secure Multiplication (SM) [19], Secure Bit-Not (SBN) [27], Secure Compare (CMP-S) [27,28], and Secure Minimum from a Set of n values (SMS_n) [27,28]. In addition, we newly propose three secure protocols: Advanced Secure Compare (ASC), Advanced Secure Range Overlapping (ASRO), and Advanced Secure Point Enclosure (ASPE). By using arithmetic operations, the proposed secure protocols improve on the existing comparison protocols that employ binary bit comparison [19] and garbled circuit comparison [29,30]. Therefore, the proposed secure protocols require a low computation cost.
Advanced Secure Compare (ASC) protocol. The ASC protocol securely compares two encrypted values and returns whether the first value is less than or equal to the second. For two encrypted values E(u) and E(v), the ASC protocol returns E(1) if u ≤ v; otherwise, it returns E(0). The procedure of the ASC protocol is presented in Algorithm 1. First, C_A selects two random numbers and their encryptions, <r_a, E(r_a)> and <r_b, E(r_b)>, from the random value pool (line 1). Second, C_A computes E(u') = E(r_a × u + r_b) = E(u)^(r_a) × E(r_b) and E(v') = E(r_a × v + r_b) = E(v)^(r_a) × E(r_b) by using the Paillier cryptosystem (lines 2∼3). Third, C_A randomly selects one of two functions, F_0 and F_1; the chosen function is not exposed to C_B. If F_0 is chosen, C_A transmits <E(u'), E(v')> to C_B; if F_1 is chosen, C_A transmits <E(v'), E(u')> (lines 4∼8). Fourth, C_B receives a pair of ciphertexts <E(α), E(β)> from C_A (line 9) and decrypts E(α) and E(β). Because C_B does not know the order of the data received from C_A, C_B cannot determine whether <α, β> is <u', v'> or <v', u'>. If α is less than or equal to β, C_B sends E(γ) = E(1) to C_A; otherwise, C_B sends E(γ) = E(0) (lines 10∼16). Finally, C_A receives E(γ) from C_B. If F_0 was selected, C_A returns E(γ); otherwise, C_A returns SBN(E(γ)) because C_A sent <E(v'), E(u')>, the reverse order of the original pair <E(u'), E(v')> (lines 18∼21).
For example, suppose C_A compares E(u) = E(5) and E(v) = E(10) by using the ASC protocol. First, C_A picks the random values r_a = 3 and r_b = 2 from the random value pool. C_A calculates E(u') = E(u)^(r_a) × E(r_b) = E(r_a × u + r_b) = E(3 × 5 + 2) = E(17) and E(v') = E(v)^(r_a) × E(r_b) = E(r_a × v + r_b) = E(3 × 10 + 2) = E(32). Second, C_A selects one of the two functions, F_0 and F_1. Assuming that C_A chooses F_1, the order of the data is <E(v'), E(u')>; therefore, C_A transmits <E(32), E(17)> to C_B. Third, C_B sets <α, β> to <32, 17> by decrypting <E(32), E(17)>. Because α is greater than β, γ is set to 0. C_B encrypts γ and sends E(γ) = E(0) to C_A. Finally, C_A receives E(γ) = E(0) from C_B. Because C_A selected F_1, C_A returns SBN(E(γ)) = E(1).
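The blinding and order-randomization steps of the ASC protocol can be illustrated with a small plaintext simulation. This is a sketch of the control flow only: the plain integers below stand in for Paillier ciphertexts, and the local comparison corresponds to what C_B performs after decryption.

```python
# Plaintext simulation of the ASC protocol flow (blinding + random order flip).
# Integers stand in for ciphertexts purely to show the logic; in the real
# protocol, every value exchanged below would be Paillier-encrypted.
import random

def asc(u, v):
    # C_A: blind both inputs with the same r_a > 0 and r_b (order-preserving)
    r_a, r_b = random.randrange(1, 100), random.randrange(0, 100)
    u_p, v_p = r_a * u + r_b, r_a * v + r_b
    # C_A: randomly pick F0 (keep order) or F1 (swap order)
    f1 = random.random() < 0.5
    alpha, beta = (v_p, u_p) if f1 else (u_p, v_p)
    # C_B: compares the blinded pair without learning u, v, or the order
    gamma = 1 if alpha <= beta else 0
    # C_A: undo the order flip (the SBN step)
    return 1 - gamma if f1 else gamma

assert asc(5, 10) == 1   # u <= v, as in the worked example
assert asc(10, 5) == 0   # u > v
```

Because the blinding multiplies both inputs by the same positive r_a and adds the same r_b, the order of u' and v' matches the order of u and v, which is what lets C_B compare safely.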
Advanced Secure Range Overlapping (ASRO) protocol. The ASRO protocol determines whether two encrypted ranges overlap. A region is denoted by its lower bound (lb) and upper bound (ub). Figure 3 presents the overlapping and non-overlapping cases between two regions (i.e., range_1 and range_2) in the ASRO protocol. The ranges range_1 and range_2 overlap if range_1.lb ≤ range_2.ub and range_2.lb ≤ range_1.ub for all dimensions. For two encrypted ranges E(range_1) and E(range_2), the ASRO protocol returns E(1) if range_1 and range_2 overlap; otherwise, it returns E(0).
Algorithm 1 ASC Protocol.
Input: E(u), E(v)
Output: if u ≤ v, return E(1); otherwise, return E(0)
C A :
   01. pick two pairs <r_a, E(r_a)> and <r_b, E(r_b)> in the random value pool
   02. E(u') ← E(u)^(r_a) × E(r_b)
   03. E(v') ← E(v)^(r_a) × E(r_b)
   04. randomly select F_0 or F_1
   05. if F_0 is selected then
   06.    transmit <E(u'), E(v')> to C_B
   07. else if F_1 is selected then
   08.    transmit <E(v'), E(u')> to C_B
C B :  
   09. receive E( α ), E( β ) // C B cannot know any information from C A .
   10. α D(E( α ))
   11. β D(E( β ))
   12. if α ≤ β
   13.    E( γ ) ← E(1)
   14. else
   15.    E( γ ) ← E(0)
   16. Transmit E ( γ ) to C A  
C A :  
   17. receive E ( γ )  
   18. if F 0 is selected, then
   19.    return E( γ )
   20. else
   21.    return SBN(E( γ ))
First, C_A initializes E(α) as E(1) (line 1). Second, for all dimensions, C_A performs ASC(E(range_1.lb_i), E(range_2.ub_i)), where 1 ≤ i ≤ m (the number of dimensions), and stores the result of the ASC protocol in E(β). The ASC protocol is used to check whether the lower bound of range_1 is less than or equal to the upper bound of range_2 for each dimension. Then, C_A performs SM(E(α), E(β)) [19] and stores its result in E(α) (lines 2∼4). Third, for all dimensions, C_A performs ASC(E(range_2.lb_i), E(range_1.ub_i)) and stores its result in E(γ). The ASC protocol is used to check whether the lower bound of range_2 is less than or equal to the upper bound of range_1 for each dimension. Then, C_A performs SM(E(α), E(γ)) and stores its result in E(α) (lines 5∼7). The SM protocol is used to combine the per-dimension conditions over all dimensions. Finally, C_A returns E(α) as the result of the ASRO protocol. The procedure of the ASRO protocol is presented in Algorithm 2.
Algorithm 2 ASRO Protocol.
Input: E(range_1), E(range_2) (each range consists of <range.lb_i, range.ub_i>, 1 ≤ i ≤ m, where m is the number of dimensions)
Output: if range_1 and range_2 overlap, return E(1); otherwise, return E(0)
C A :
   01. E( α ) ← E(1)
   02. for 1 ≤ i ≤ m
   03.    E(β) ← ASC(E(range_1.lb_i), E(range_2.ub_i))
   04.    E(α) ← SM(E(β), E(α))
   05. for 1 ≤ i ≤ m
   06.    E(γ) ← ASC(E(range_2.lb_i), E(range_1.ub_i))
   07.    E(α) ← SM(E(γ), E(α))
   08. return E( α )
For example, in Figure 4, C_A checks the overlap between E(range_1) and E(range_2) by using the ASRO protocol. First, C_A sets E(α) to E(1). Second, for the x-axis, C_A calculates ASC(E(range_1.lb_x), E(range_2.ub_x)) = ASC(E(1), E(5)) = E(1) and stores the result in E(β); then C_A calculates SM(E(α), E(β)) = SM(E(1), E(1)) = E(1) and stores it in E(α). For the y-axis, C_A calculates ASC(E(range_1.lb_y), E(range_2.ub_y)) = ASC(E(1), E(5)) = E(1) and stores it in E(β); then C_A calculates SM(E(α), E(β)) = SM(E(1), E(1)) = E(1) and stores it in E(α). Third, for the x-axis, C_A calculates ASC(E(range_2.lb_x), E(range_1.ub_x)) = ASC(E(2), E(4)) = E(1) and stores it in E(γ); then C_A calculates SM(E(α), E(γ)) = SM(E(1), E(1)) = E(1) and stores it in E(α). For the y-axis, C_A calculates ASC(E(range_2.lb_y), E(range_1.ub_y)) = ASC(E(3), E(4)) = E(1) and stores it in E(γ); then C_A calculates SM(E(α), E(γ)) = SM(E(1), E(1)) = E(1) and stores it in E(α). Finally, C_A returns E(α) = E(1) as the result of the ASRO protocol.
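The per-dimension chaining of Algorithm 2 can be sketched in plaintext, with 1/0 standing in for E(1)/E(0) and simple functions standing in for the ASC and SM protocols; the range values reuse the worked example above.

```python
# Plaintext sketch of the ASRO overlap test (per-dimension ASC + SM chaining).
# Ranges are lists of per-dimension (lb, ub) pairs; 1/0 stand in for E(1)/E(0).

def asc(u, v):          # plaintext stand-in for the ASC protocol
    return 1 if u <= v else 0

def sm(a, b):           # plaintext stand-in for the SM (secure multiply) protocol
    return a * b

def asro(range1, range2):
    alpha = 1
    for (lb1, ub1), (lb2, ub2) in zip(range1, range2):
        alpha = sm(alpha, asc(lb1, ub2))   # range1.lb_i <= range2.ub_i ?
        alpha = sm(alpha, asc(lb2, ub1))   # range2.lb_i <= range1.ub_i ?
    return alpha        # 1 iff the ranges overlap in every dimension

r1 = [(1, 4), (1, 4)]                    # (lb, ub) per dimension (x, y)
r2 = [(2, 5), (3, 5)]
assert asro(r1, r2) == 1                 # overlapping, as in the worked example
assert asro(r1, [(5, 6), (0, 1)]) == 0   # disjoint on the x-axis
```

Note that the two-sided test also covers containment: if range_1 fully encloses range_2, both per-dimension conditions still hold, so the result is 1.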
Advanced Secure Point Enclosure (ASPE) protocol. The ASPE protocol determines whether a point is included in a given range. A point p is included in a range if range.lb_i ≤ p_i ≤ range.ub_i for all dimensions. Given an encrypted point E(p) and an encrypted range E(range), the ASPE protocol returns E(1) if the range includes the point p; otherwise, it returns E(0). First, C_A assigns E(1) to E(α) (line 1). Second, for each dimension i, C_A executes E(β) = ASC(E(p_i), E(range.ub_i)) and E(α) = SM(E(α), E(β)) (lines 2∼4). The ASC protocol is used to check whether p is less than or equal to the upper bound of the range for each dimension, and the SM protocol combines the per-dimension conditions over all dimensions. Third, for each dimension i, C_A performs E(γ) = ASC(E(range.lb_i), E(p_i)) and E(α) = SM(E(α), E(γ)) (lines 5∼6). Finally, C_A returns E(α) (line 7). We skip the example of the ASPE protocol because its process is identical to that of the ASRO protocol, except that the input is a point rather than a range. The procedure of the ASPE protocol is presented in Algorithm 3.
Algorithm 3 ASPE Protocol.
Input: E(p), E(range) (the range consists of <range.lb_i, range.ub_i>, 1 ≤ i ≤ m, where m is the number of dimensions)
Output: if the range includes p, return E(1); otherwise, return E(0)
C A :
   01. E( α ) ← E(1)
   02. for 1 ≤ i ≤ m
   03.    E(β) ← ASC(E(p_i), E(range.ub_i))
   04.    E(α) ← SM(E(β), E(α))
   05.    E(γ) ← ASC(E(range.lb_i), E(p_i))
   06.    E(α) ← SM(E(α), E(γ))
   07. return E( α )

4. Privacy-Preserving Top-k Query Processing Algorithm

In this section, we propose a privacy-preserving Top-k query processing algorithm that uses the new secure protocols, i.e., the ASC, ASRO, and ASPE protocols presented in Section 3. The proposed Top-k query processing algorithm retrieves the k data items that have the highest scores for a scoring function over the encrypted database. The algorithm is organized into three phases: the node data search phase, the Top-k retrieval phase, and the Top-k result refinement phase.

4.1. Node Data Search Phase

In this phase, C_A securely extracts all the data items from the node that may contain the highest score, while hiding the data access patterns. The procedure of the node data search phase is presented in Algorithm 4. First, to extract the data items related to the query, C_A securely calculates a max point mp = <mp_1, mp_2, …, mp_m>, where mp_j is the coordinate yielding the highest score for dimension j (1 ≤ j ≤ m) (lines 1∼4). To calculate mp, C_A adds E(hint) to E(coef_i) for 1 ≤ i ≤ m, where m is the number of dimensions and hint denotes the largest absolute value among the coefficients. mp_i is determined according to the sign of the coefficient of the i-th dimension: if coef_i + hint is greater than hint, coef_i is positive, so mp_i is set to E(max_i); otherwise, mp_i is set to E(0). By finding the node including mp, C_A can extract the data items that possibly have the highest score.
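The sign-based selection of the max point can be sketched in plaintext as follows; `max_point` is a hypothetical helper name, and the arithmetic mirrors lines 1∼4 of Algorithm 4 with the ASC, SM, and SBN calls replaced by their plaintext effects.

```python
# Plaintext sketch of the max-point (mp) computation in the node data search
# phase: for a linear scoring function, the best corner of the data domain
# takes max_i when coef_i is positive and 0 otherwise. The sign test mirrors
# ASC(E(coef_i + hint), E(hint)) with hint = max |coef_i|.

def max_point(coefs, maxs):
    hint = max(abs(c) for c in coefs)       # largest absolute coefficient
    mp = []
    for c, mx in zip(coefs, maxs):
        eps = 1 if c + hint <= hint else 0  # ASC stand-in: 1 iff coef_i <= 0
        mp.append(0 if eps else mx)         # SM/SBN selection of 0 or max_i
    return mp

# score(x) = 3*x1 - 2*x2 is maximized at x1 = max_1, x2 = 0
assert max_point([3, -2], [10, 10]) == [10, 0]
```

Adding hint before the comparison keeps both ASC inputs non-negative, which matters in the real protocol because Paillier plaintexts live in Z_N.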
Second, C_A finds the node including mp by executing E(δ_i) = ASPE(E(mp), E(node_i)) for 1 ≤ i ≤ #_of_nodes, where #_of_nodes denotes the total number of leaf nodes in the kd-tree (lines 5∼6). The result of the ASPE protocol over all leaf nodes, i.e., E(δ), is <E(δ_1), E(δ_2), …, E(δ_#_of_nodes)>. Because our Top-k query processing algorithm utilizes the ASPE protocol, it achieves better performance than the existing algorithms [27,28]; in contrast, the existing algorithms incur a large computational overhead because they perform an iterative operation as many times as the bit length of the data. Third, C_A creates E(δ') by shuffling E(δ) using a random permutation function π. Then, C_A sends E(δ') to C_B (lines 7∼8).
Fourth, by decrypting E(δ'), C_B counts how many δ'_j equal 1 (i.e., c) (lines 9∼10). Fifth, C_B generates c node groups (NG) (line 11). C_B assigns each node with δ'_j = 1 to a different group among the c node groups and evenly distributes the remaining nodes with δ'_j = 0 across the c node groups (lines 12∼14). Then, C_B randomly shuffles the ids of the nodes in each node group and sends the shuffled NG, i.e., NG', to C_A (line 15).
Sixth, C_A obtains NG* by deshuffling NG' using π^(−1) (lines 17∼18). Finally, for all the data items of each node in each NG*, C_A performs E(cand_w,t) = E(cand_w,t) × SM(E(node_z.data_s,t), E(δ_z)), where 1 ≤ i ≤ c, 1 ≤ j ≤ #_of_nodes in the selected NG_i*, 1 ≤ s ≤ FanOut, and 1 ≤ t ≤ m (lines 19∼25). Here, z denotes the id of the j-th node of NG_i* and w = cnt + s, where cnt is the number of data items extracted from the encrypted leaf nodes of the kd-tree. E(δ_z) is the result of the ASPE protocol corresponding to node_z. By repeating these steps with an updated cnt, C_A returns <E(cand_1), E(cand_2), …, E(cand_cnt)>. Consequently, all the data items in the node including mp are securely obtained without exposing the data access patterns [10,11].
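The shuffle/group/deshuffle exchange described above can be sketched in plaintext; `hide_access` is a hypothetical helper, the 1/0 flags stand in for the encrypted ASPE results, and the padding follows the #_of_node/c − 1 rule of Algorithm 4.

```python
# Plaintext sketch of the access-pattern-hiding node retrieval: C_A shuffles
# the ASPE flags with a random permutation pi, C_B groups node ids around the
# matching flags without learning which real node matched, and C_A deshuffles
# with pi^-1. Flags are 1/0 stand-ins for E(1)/E(0).
import random

def hide_access(delta):
    n = len(delta)
    pi = list(range(n)); random.shuffle(pi)        # C_A's secret permutation
    shuffled = [delta[pi[j]] for j in range(n)]    # E(delta') sent to C_B
    # C_B: one group per matching node, padded with non-matching node ids
    ones = [j for j, d in enumerate(shuffled) if d == 1]
    zeros = [j for j, d in enumerate(shuffled) if d == 0]
    groups, per = [], len(zeros) // max(len(ones), 1)
    for i, j in enumerate(ones):
        g = [j] + zeros[i * per:(i + 1) * per]
        random.shuffle(g)                          # shuffle ids inside the group
        groups.append(g)
    # C_A: map shuffled ids back to real node ids with pi^-1
    return [[pi[j] for j in g] for g in groups]

delta = [0, 0, 1, 0]                       # node 2 contains mp
groups = hide_access(delta)
assert len(groups) == 1 and 2 in groups[0] # the matching node survives
assert len(groups[0]) == 4                 # padded with decoy node ids
```

Because C_B only ever sees shuffled flags and C_A only ever sees shuffled id groups, neither cloud alone learns which real node contained mp.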
Algorithm 4 Node data search phase.
Input: <E( n o d e 1 ), E( n o d e 2 ), …, E( n o d e p ) | p = # _ o f _ n o d e >, E( c o e f )=<E( c o e f 1 ), E( c o e f 2 ), …, E( c o e f m ) | c o e f means the coefficients of the score function and m = # _ o f _ d i m e n s i o n >, <E( m a x 1 ), E( m a x 2 ), …, E( m a x m ) | m a x i is the maximum value of the data domain in the ith dimension >, E( h i n t ) // hint means the largest of the absolute values among the coefficients
Output: <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t )> // all candidates inside nodes being related to a query
C A :
   01. for 1 ≤ i ≤ m  
   02.    E( c o e f i + h i n t ) ← E( c o e f i )×E( h i n t )
   03.    E( ϵ i ) ← ASC(E( c o e f i + h i n t ), E( h i n t ))
   04.    E( m p i ) ← SM(E( ϵ i ), E(0)) × SM(SBN(E( ϵ i )), E( m a x i ))
   05. for 1 ≤ i ≤ p // p = 2^( h −1) where h means the level of the kd-tree
   06.    E( δ i ) ← ASPE(E( m p ), E( n o d e i ))
   07. E( δ ) ← π (E( δ )) // shuffle the order of array E( δ ) = <E( δ 1 ), E( δ 2 ), …, E( δ p )>
   08. send E( δ ) to C B  
C B :
   09. δ ← D(E( δ ))
   10. c ← the number of ’1’ in δ  
   11. generate c number of Node Group such that N G = < N G 1 , N G 2 , …, N G c >
   12. for 1 ≤ i ≤ c  
   13.    assign into N G i a node with δ = 1 and ( # _ o f _ n o d e / c ) − 1 nodes with δ = 0
   14.     N G i ′ ← shuffle the ids of nodes in N G i  
   15. send N G ′ = < N G 1 ′, N G 2 ′, …, N G c ′> to C A  
C A :
   16. c n t ← 0 // c n t is # of data items extracted from the encrypted leaf nodes of the kd-tree
   17. for 1 ≤ i ≤ c  
   18.     N G i * ← deshuffle node ids using π⁻¹ for each N G i // N G * = < N G 1 * , N G 2 * , …, N G c * >
   19. for 1 ≤ i ≤ c  
   20.    for 1 ≤ j ≤ n u m // n u m = # _ o f _ n o d e s in the selected N G i *  
   21.        z ← id of the jth node of N G i *  
   22.       for 1 ≤ s ≤ F a n O u t // F a n O u t : # of data items in each node
   23.          for 1 ≤ t ≤ m  
   24.             if (s = = 1)
   25.                E( c a n d w , t ) ← E(0) where w = c n t + s // E( c a n d w ) = <E( c a n d w , 1 ), E( c a n d w , 2 ), …, E( c a n d w , m )>
   26.             E( c a n d w , t ) ← E( c a n d w , t )×SM(E( n o d e z . d a t a s , t ), E( δ z ))
   27.     c n t ← c n t + F a n O u t  
   28. send <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t )>
Figure 5 presents an example of the node data search phase, reusing the data items in Figure 2. First, C A executes the ASPE protocol between E( N o d e i . R a n g e ) and E( m p ) for 1 ≤ i ≤ # _ o f _ n o d e . C A sets the result of the ASPE protocol as E( α ) and transmits the result to C B . In Figure 5, for  N o d e 1 , C A performs the ASPE protocol between E( N o d e 1 . R a n g e ) = <E( l b ) = <E(0), E(0)>, E( u b ) = <E(6), E(5)>> and E( m p ) = <E(10), E(10)>, and sets the result of the ASPE protocol, i.e., E(0), as E( α 1 ). C A obtains the results of the ASPE protocol as < N o d e 1 , E(0)>, < N o d e 2 , E(0)>, < N o d e 3 , E(0)>, < N o d e 4 , E(1)>.
Second, C A permutates the order of < N o d e 1 , E( α 1 )>, < N o d e 2 , E( α 2 )>, …, < N o d e # _ o f _ n o d e , E( α # _ o f _ n o d e )> and assigns new ids based on the shuffled order of nodes, so that C A conceals the original node ids from C B . To recover the original node ids, C A stores the pairs of <the original id, the new id>. For example, in Figure 5, the original order of < N o d e 1 , E(0)>, < N o d e 2 , E(0)>, < N o d e 3 , E(0)>, < N o d e 4 , E(1)> is permutated to < N o d e 4 , E(1)>, < N o d e 1 , E(0)>, < N o d e 2 , E(0)>, < N o d e 3 , E(0)>. Then, C A converts N o d e 4 , N o d e 1 , N o d e 2 , and  N o d e 3 into P N 1 , P N 2 , P N 3 , and  P N 4 , respectively. The permutated order is < P N 1 , E(1)>, < P N 2 , E(0)>, < P N 3 , E(0)>, < P N 4 , E(0)>. Then, C A sends the permutated order to C B .
Third, C B obtains the permutated order and decrypts it. In Figure 5, C B obtains < P N 1 , E(1)>, < P N 2 , E(0)>, < P N 3 , E(0)>, < P N 4 , E(0)>, and gets < P N 1 , 1>, < P N 2 , 0>, < P N 3 , 0>, < P N 4 , 0> by decrypting it. To create node groups, C B counts how many 1s are in the order. Each node group has one core node whose α p equals 1, where 1 ≤ p ≤ # _ o f _ n o d e . Nodes whose α p equals 0 are uniformly assigned to the node groups. Then,  C B transmits the node groups to C A . For example, C B counts how many 1s are in the order of < P N 1 , 1>, < P N 2 , 0>, < P N 3 , 0>, < P N 4 , 0>, and generates a node group with the core node ( P N 1 ). The nodes < P N 3 , 0>, < P N 2 , 0>, and < P N 4 , 0> are assigned to the same node group. C B transmits the node group, i.e.,  P N 1 , P N 3 , P N 2 , P N 4 , to  C A .
Fourth, C A recovers the original node ids by using the pairs of <the original id, the new id>. In Figure 5, using < N o d e 1 ,   P N 2 >, < N o d e 2 ,   P N 3 >, < N o d e 3 ,   P N 4 >, < N o d e 4 ,   P N 1 >; C A gains N o d e 4 ,   N o d e 2 ,   N o d e 1 ,   N o d e 3 as the original node ids. Fifth, C A executes the SM protocol between the encrypted data item in a node group and E( α ). C A executes SM(E(1), E( N o d e 4 . D a t a )), SM(E(0), E( N o d e 2 . D a t a )), SM(E(0), E( N o d e 1 . D a t a )), and SM(E(0), E( N o d e 3 . D a t a )). The results of the SM protocol are E( d 16 ), E(0), E(0), E(0), and C A stores E( d 16 ) in the candidate set by merging the results. Sixth, for each node group, the algorithm performs the steps 5∼7 as many times as the # of data items. In Figure 5, by merging the results, C A obtains E( d 16 ), E( d 17 ), E( d 18 ), E( d 19 ), E( d 20 ).
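The grouping and oblivious extraction steps above can be sketched in plaintext. In the sketch below, integers stand in for Paillier ciphertexts, and the helper names (`build_node_groups`, `extract_candidates`) and the node data are illustrative assumptions, not names from the paper; the key point is that multiplying each item by its node flag δ and summing within a group keeps only the core node's items.

```python
import random

def build_node_groups(delta, seed=0):
    """C_B's grouping: each group gets exactly one delta=1 (core) node and an
    even share of the delta=0 nodes; the member ids are then shuffled."""
    rng = random.Random(seed)
    core = [i for i, d in enumerate(delta) if d == 1]
    rest = [i for i, d in enumerate(delta) if d == 0]
    groups = [[i] for i in core]
    for pos, i in enumerate(rest):
        groups[pos % len(groups)].append(i)
    for g in groups:
        rng.shuffle(g)  # hide which member of the group is the core node
    return groups

def extract_candidates(groups, delta, node_data, fanout, m):
    """C_A's extraction: cand[w][t] = sum over nodes z in the group of
    data[z][s][t] * delta[z] -- only the core node's items survive."""
    cands = []
    for g in groups:
        for s in range(fanout):
            cands.append([sum(node_data[z][s][t] * delta[z] for z in g)
                          for t in range(m)])
    return cands

# Four leaf nodes with two 2-dimensional items each; node 3 contains mp.
node_data = [[[1, 2], [3, 4]], [[5, 6], [7, 8]],
             [[2, 9], [4, 1]], [[10, 8], [9, 9]]]
delta = [0, 0, 0, 1]
groups = build_node_groups(delta)
cands = extract_candidates(groups, delta, node_data, fanout=2, m=2)
assert sorted(cands) == sorted([[10, 8], [9, 9]])  # only node 3's items
```

In the encrypted protocol, the per-item multiplication is the SM protocol and the sum is the homomorphic addition of Paillier ciphertexts, so C A never learns which node in a group was the core node.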

4.2. Top-k Retrieval Phase

From the candidates obtained in the previous phase, the algorithm retrieves the k data items with the highest scores. The procedure of the Top-k retrieval phase is as follows. First, C A calculates the scores by using the score function that is represented by the encrypted coefficients ( q u e r y ). Second, C A finds the maximum score, i.e.,  s c o r e m a x , among the calculated scores. An attacker cannot distinguish which data item has s c o r e m a x because of the Paillier cryptosystem [32]. Third, to mark the encrypted data item with s c o r e m a x , C A subtracts each score from s c o r e m a x . If the result of the subtraction equals E(0), the corresponding data item has the maximum score. Because E(0) is unchanged by the homomorphic operations of the Paillier cryptosystem, the algorithm securely obtains the Top-k. Fourth, the algorithm also multiplies the result by a random value under encryption, so as to conceal the original data. To hide the data access patterns, the algorithm permutates the order of the result. Finally, the algorithm finds the data item with the highest score and repeats the above process until k results are retrieved. The pseudo code of the Top-k retrieval phase is presented in Algorithm 5. First, using a score function (SF), C A securely computes the score E( s c o r e i ) between E( c o e f ) and E( c a n d i ) for 1 ≤ i ≤ c n t (lines 1∼2). Second, C A executes SMAX n to obtain the maximum value E( s c o r e m a x ) among E( s c o r e i ) for 1 ≤ i ≤ c n t (lines 3∼4). Third, C A computes E( τ i ) = E( s c o r e m a x ) × E( s c o r e i )^( N −1) for 1 ≤ i ≤ c n t . C A then updates E( τ i ) = E( τ i )^( r i ), where r i means a random value for 1 ≤ i ≤ c n t . C A obtains E( λ ) by permutating E( τ ) and transmits E( λ ) to C B (lines 5∼9). Fourth, after decrypting E( λ ), C B stores E( U i ) = E(1) if λ i = 0 for 1 ≤ i ≤ c n t . Otherwise, C B sets E( U i ) = E(0). C B sends E(U) = <E( U 1 ), E( U 2 ), …, E( U c n t )> to C A (lines 10∼13). 
Fifth, C A obtains E(V) by deshuffling E(U) using π⁻¹. Then, C A performs SM(E( V i ), E( c a n d i , j )) to obtain E( V i , j ), where 1 ≤ i ≤ c n t and 1 ≤ j ≤ m (lines 14∼17). Sixth, C A sets the E( s c o r e i ) with the highest score to E(0) by computing Equation (4) (lines 18∼19). Because the highest score is set to E(0) and the other scores are unchanged, the algorithm can obtain the Top-k result while preventing the same result from being selected more than once.
E ( s c o r e i ) = S M ( E ( V i ) , E ( 0 ) ) × S M ( S B N ( E ( V i ) ) , E ( s c o r e i ) )
Finally, by calculating E( t o p k s , j ) = ∏_{i=1}^{cnt} E( V i , j ) for 1 ≤ j ≤ m and 1 ≤ s ≤ k , C A can securely extract the data item corresponding to E( s c o r e m a x ) (lines 20∼21). This procedure is repeated k times to obtain the Top-k result.
Figure 6 presents an example of the Top-k retrieval phase, reusing the data items in Figure 2. For simplicity, the permutation function π is omitted in this example. First, C A computes the score by using a score function (SF), and stores the result in E( s c o r e i ) for 1 ≤ i ≤ c n t (①). In Figure 6, C A executes SF(E( c a n d 1 ) = <E( c a n d 1 , x a x i s ), E( c a n d 1 , y a x i s )>, E(coef) = <E( c o e f 1 ), E( c o e f 2 )>) = SF(<E(5), E(8)>, <E(3), E(2)>) = E(31), and stores it into E( s c o r e 1 ). Second, the highest score is calculated by using SMAX n . C A stores SMAX n (E( s c o r e 1 ), E( s c o r e 2 ), E( s c o r e 3 ), E( s c o r e 4 ), E( s c o r e 5 )) = SMAX n (E(31), E(38), E(39), E(45), E(46)) = E(46) in E( s c o r e m a x ) (②). Third, to gain the encrypted data item with the highest score, C A stores E( s c o r e m a x − s c o r e i ) in E( τ i ) for 1 ≤ i ≤ c n t (③). If  s c o r e m a x is the same as s c o r e i , E( τ i ) is set to E(0). For E( s c o r e 5 ), C A stores E(46−46) = E(0) into E( τ 5 ). Fourth, C A multiplies E( τ i ) by a random value under encryption to prevent the leakage of sensitive data (④). For E( τ 1 ), when the random value is 2, C A stores E(15×2) = E(30) into E( τ 1 ). Fifth, for  1 ≤ i ≤ c n t , if E( τ i ) is E(0), C B sets E( V i ) to E(1); otherwise, C B sets E( V i ) to E(0) (⑤). Sixth, C A obtains the Top-1 result by executing the SM protocol between E( c a n d i ) and E( V i ) for 1 ≤ i ≤ c n t and summing up the results of the SM protocol (⑥∼⑦). In Figure 6, for the x-axis, C A performs SM(E( V 1 = 0), E( c a n d 1 , x a x i s = 5)), SM(E( V 2 = 0), E( c a n d 2 , x a x i s = 8)), SM(E( V 3 = 0), E( c a n d 3 , x a x i s = 7)), SM(E( V 4 = 0), E( c a n d 4 , x a x i s = 9)), and SM(E( V 5 = 1), E( c a n d 5 , x a x i s = 10)). 
For the y-axis, C A performs SM(E( V 1 = 0), E( c a n d 1 , y a x i s = 8)), SM(E( V 2 = 0), E( c a n d 2 , y a x i s = 7)), SM(E( V 3 = 0), E( c a n d 3 , y a x i s = 9)), SM(E( V 4 = 0), E( c a n d 4 , y a x i s = 9)), and SM(E( V 5 = 1), E( c a n d 5 , y a x i s = 8)). C A sums E(0), E(0), E(0), E(0), and E(10) for the x-axis while summing E(0), E(0), E(0), E(0), and E(8) for the y-axis. Thus, C A gains <E(10), E(8)> as the first Top-k result. Seventh, using Equation (4), C A sets the score of the retrieved Top-k result to E(0) so that C A can avoid selecting the same data item the next time. In Figure 6, C A stores SM(E(0), E(1))×SM(E(46), E(0)) = E(0) into E( s c o r e 5 ). Finally, C A repeats the previous procedure until all Top-k data are retrieved (②∼⑧).
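The numbers in the Figure 6 walk-through can be checked with a plaintext analogue, in which SM, SMAX n , and the homomorphic sums collapse to ordinary multiplication, max, and addition (no encryption; the candidate and coefficient values are the ones from the example):

```python
# Candidates and score-function coefficients from the Figure 6 example.
cands = [[5, 8], [8, 7], [7, 9], [9, 9], [10, 8]]
coef = [3, 2]

scores = [sum(c * x for c, x in zip(coef, d)) for d in cands]
assert scores == [31, 38, 39, 45, 46]

score_max = max(scores)                        # plaintext stand-in for SMAX_n
V = [1 if s == score_max else 0 for s in scores]
top1 = [sum(V[i] * cands[i][j] for i in range(len(cands)))
        for j in range(len(coef))]             # stand-in for the product of E(V_{i,j})
assert top1 == [10, 8]

# Equation (4): zero out the winner's score so it cannot be selected again.
scores = [0 if V[i] else scores[i] for i in range(len(scores))]
assert max(scores) == 45                       # the next round selects cand_4
```

Each assertion mirrors one circled step of Figure 6; in the protocol the same arithmetic happens over Paillier ciphertexts.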
Algorithm 5 Top-k retrieval phase.
Input: <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t ) | c n t = # _ o f _ c a n d i d a t e s >, E( c o e f ), k // c o e f means coefficient of score function and k means the number of Top-k
Output: <E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )> // temporary Top-k results
C A :
   01. for 1 ≤ i ≤ c n t  
   02.    E( s c o r e i ) ← SF(E( c o e f ), E( c a n d i )) // SF is a score function for Top-k
   03. for 1 ≤ s ≤ k  
   04.    E( s c o r e m a x ) ← SMAX n (E( s c o r e 1 ), E( s c o r e 2 ), …, E( s c o r e c n t ))
   05.    for 1 ≤ i ≤ c n t  
   06.       E( τ i ) ← E( s c o r e m a x )×E( s c o r e i )^( N −1)  
   07.       E( τ i ) ← E( τ i )^( r i ) // r i means a random value for i
   08.    E( λ ) ← π (E( τ )) // E( τ )=<E( τ 1 ), E( τ 2 ), …, E( τ c n t )>
   09.    send E( λ )=<E( λ 1 ), E( λ 2 ), …, E( λ c n t )> to C B  
C B :
   10.    for 1 ≤ i ≤ c n t  
   11.       if D(E( λ i )) = 0 then E( U i ) ← E(1)
   12.       else E( U i ) ← E(0)
   13.    send E(U)=<E( U 1 ), E( U 2 ), …, E( U c n t )> to C A  
C A :
   14.    E(V) ← π⁻¹(E(U)) // E(V)=<E( V 1 ), E( V 2 ), …, E( V c n t )>
   15.    for 1 ≤ i ≤ c n t  
   16.       for 1 ≤ j ≤ m // m means # _ o f _ d i m e n s i o n  
   17.          E( V i , j ) ← SM(E( V i ), E( c a n d i , j )) // E( c a n d i )=<E( c a n d i , 1 ), E( c a n d i , 2 ), …,E( c a n d i , m )>
   18.       if s < k  
   19.          E( s c o r e i ) ← SM(E( V i ), E(0))×SM(SBN(E( V i )), E( s c o r e i ))
   20.    for 1 ≤ j ≤ m  
   21.       E( t o p k s , j ) ← ∏_{i=1}^{cnt} E( V i , j ) // E( t o p k s )=<E( t o p k s , 1 ), E( t o p k s , 2 ), …, E( t o p k s , m )>
   22.    E( t o p k ) ← E( t o p k ) ∪ E( t o p k s )
   23. Return E( t o p k )=<E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )>

4.3. Top-k Result Refinement Phase

The Top-k result refinement phase confirms the correctness of the current Top-k result. In particular, the neighboring nodes must be searched to obtain data items with a higher score than the criteria. Therefore, we calculate the max point of the ith node, m p _ n o d e i = < m p _ n o d e i , 1 ,   m p _ n o d e i , 2 ,   …,   m p _ n o d e i , m >, where m p _ n o d e i , j is the coordinate of the max point in dimension j ( 1 ≤ j ≤ m ), such that m p _ n o d e i is the point in the ith node whose score is the highest for the given query ( c o e f ). If the coefficient for the jth dimension is positive, C A stores the upper bound value of the ith node into m p _ n o d e i , j . Otherwise, C A stores the lower bound value of the ith node into m p _ n o d e i , j . If the score of m p _ n o d e i is greater than the criteria, C A extracts the data items from the ith node. Finally, after searching all the nodes, C A recalculates the final Top-k result in the same way as Algorithm 5. The process of the Top-k result refinement phase is presented in Algorithm 6. First, C A calculates E(criteria) = SF(E( c o e f ), E( t o p k k )) to gain the minimum score of the current Top-k result for the query (line 1). Second, for each node, C A executes SM(E( ϵ j ), E( n o d e i . l b j ))×SM(SBN(E( ϵ j )), E( n o d e i . u b j )) for 1 ≤ i ≤ # _ o f _ n o d e s and 1 ≤ j ≤ m , and stores the result into E( m p _ n o d e i , j ) (lines 2∼4). E( ϵ j ) is the value computed by the ASC protocol for the jth dimension in the node data search phase. Thus, C A securely obtains the max point of the ith node, i.e., E( m p _ n o d e i ). Third, C A calculates the highest score between the query and E( m p _ n o d e i ) by using the score function (SF), i.e., E( h s _ n o d e i ) (line 5). 
Fourth, to prevent the same node from being selected more than once, C A securely computes E( h s _ n o d e i ) = SM(E( δ i ), E(0))×SM(SBN(E( δ i )), E( h s _ n o d e i )), where E( δ i ) is the value returned by the ASPE protocol; this sets E( h s _ n o d e i ) to E(0) for every node that has already been retrieved in the node data search phase (line 6). By performing E( δ i ) = ASC(E(criteria), E( h s _ n o d e i )), C A sets E( δ i ) to E(1) for the ith node if E(criteria) is less than E( h s _ n o d e i ); otherwise, C A sets E( δ i ) to E(0) (line 7). Fifth, C A securely obtains the data items stored in the nodes with E( δ i ) = E(1) and generates E( c a n d ) by merging them with E( t o p k ) (lines 8∼9). Then, C A executes the Top-k retrieval phase on E( c a n d ) to calculate the final Top-k result E( t o p k i * ) for 1 ≤ i ≤ k (line 10). Sixth, to hide the Top-k result from C B , C A calculates E( γ i , j ) = E( t o p k i , j * ) × E( r i , j ) for 1 ≤ i ≤ k and 1 ≤ j ≤ m using a random value r i , j . Then, C A transmits E( γ i , j ) to C B and r i , j to A U (lines 11∼15). Seventh, C B decrypts E( γ i , j ) and transmits the decrypted values to A U (lines 16∼19). Finally, A U gains the plaintexts of the Top-k result by calculating γ i , j − r i , j (lines 20∼22). As a result, A U can reduce its computation overhead because it does not need to perform decryption operations.
Algorithm 6 Top-k result refinement phase.
Input: <E( n o d e 1 ), E( n o d e 2 ), …, E( n o d e p ) | p = # _ o f _ n o d e >, <E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )>, E( c o e f ), k // c o e f means coefficient of score function and k means the number of Top-k
Output: t o p k * =< t o p k 1 * , t o p k 2 * , …, t o p k k * > // final Top-k results
C A :
   01. E(criteria) = SF(E( c o e f ), E( t o p k k ))
   02. for 1 ≤ i ≤ p  
   03.    for 1 ≤ j ≤ m  
   04.       E( m p _ n o d e i , j )←SM(E( ϵ j ), E( n o d e i . l b j ))×SM(SBN(E( ϵ j )), E( n o d e i . u b j )) // E( ϵ j ) is the value computed by the ASC protocol in line 03 of Algorithm 4
   05.    E( h s _ n o d e i ) ← SF(E( m p _ n o d e i ), E( c o e f ))
   06.    E( h s _ n o d e i ) ← SM(E( δ i ), E(0))× SM(SBN(E( δ i )), E( h s _ n o d e i )) // E( δ i ) is value returned by ASPE protocol in line 6 of Algorithm 4
   07.    E( δ i ) ← ASC(E(criteria), E( h s _ n o d e i ))
   08. E( c a n d ) ← perform lines 7∼27 of Algorithm 4 with <E( n o d e 1 ), E( n o d e 2 ), …, E( n o d e p )>, E( δ ) = <E( δ 1 ), E( δ 2 ), …, E( δ p )>
   09. E( c a n d ) ← E( c a n d )∪<E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )>
   10. E( t o p k * ) ← perform Algorithm 5 with E( c a n d ), E( c o e f ) and k
   11. for 1 ≤ i ≤ k  
   12.    for 1 ≤ j ≤ m  
   13.       pick up the random value r i , j // r i =< r i , 1 , r i , 2 , …, r i , m >
   14.       E( γ i , j ) ← E( t o p k i , j * )×E( r i , j ) // E( t o p k i * )=<E( t o p k i , 1 * ), E( t o p k i , 2 * ), …, E( t o p k i , m * )> and E( γ i )=<E( γ i , 1 ), E( γ i , 2 ), …,E( γ i , m )>
   15. send <E( γ 1 ), E( γ 2 ), …, E( γ k ) > to C B and < r 1 , r 2 , …, r k > to A U // A U means authorized user
C B :
   16. for 1 ≤ i ≤ k  
   17.    for 1 ≤ j ≤ m  
   18.        γ i , j ← D(E( γ i , j )) // γ i = < γ i , 1 , γ i , 2 , …, γ i , m >
   19. send < γ 1 , γ 2 , …, γ k > to A U  
A U :
   20. for 1 ≤ i ≤ k  
   21.    for 1 ≤ j ≤ m  
   22.        t o p k i , j * γ i , j - r i , j // t o p k i * = < t o p k i , 1 * , t o p k i , 2 * , …, t o p k i , m * >
Figure 7 presents an example of the Top-k result refinement phase. First, C A calculates the max point of the ith node ( m p _ n o d e i ) (①). For  n o d e 1 , C A performs m p _ n o d e 1 = <E( m p _ n o d e 1 , x a x i s ), E( m p _ n o d e 1 , y a x i s )> = <SM(E(0), E(0))×SM(E(1), E(6)), SM(E(0), E(0))×SM(E(1), E(5))> = <E(6), E(5)>. C A performs the same operation for n o d e 2 , n o d e 3 , and  n o d e 4 . Second, C A calculates the highest score in the ith node ( h s _ n o d e i ) (②). For the highest score of n o d e 1 , C A performs E( h s _ n o d e 1 ) = SM(E(3), E(6))×SM(E(2), E(5)) = E(28). C A computes the highest scores for n o d e 2 , n o d e 3 , and n o d e 4 in the same way. Third, to prevent the same node from being selected more than once, C A sets E( h s _ n o d e i ) to E(0) for every n o d e i that has already been searched in the node data search phase (③). For  n o d e 4 , C A performs E( h s _ n o d e 4 ) = SM(E(1), E(0))×SM(E(0), E(50)) = E(0). Fourth, for node expansion, C A checks whether the ith node needs to be searched by performing the ASC protocol between E(criteria) and E( h s _ n o d e i ) for 1 ≤ i ≤ # _ o f _ n o d e (④). For  n o d e 2 , C A performs E( δ 2 ) = ASC(E(criteria), E( h s _ n o d e 2 )) = ASC(E(38), E(40)) = E(1), where E(criteria) equals E(38). Fifth, C A extracts all the data items in the expansion node ( n o d e 2 ) by using the node data search phase (⑤). Finally, C A obtains the final Top-k results by performing the Top-k retrieval phase in Figure 7 (⑥).
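The node-expansion test in this phase can be sketched in plaintext. In the sketch below, `max_point` mirrors the coefficient-sign selection of line 04 of Algorithm 6 (upper bound when the coefficient is positive, lower bound otherwise); the node bounds are our own assumptions, chosen so that the highest scores match the numbers in the Figure 7 walk-through.

```python
def max_point(lb, ub, coef):
    # Plaintext analogue of line 04: pick ub_j when coef_j > 0, else lb_j.
    return [u if c > 0 else l for l, u, c in zip(lb, ub, coef)]

def needs_expansion(lb, ub, coef, criteria, already_searched=False):
    if already_searched:          # hs_node forced to E(0), as in line 06
        return False
    hs = sum(c * x for c, x in zip(coef, max_point(lb, ub, coef)))
    return hs > criteria          # plaintext analogue of ASC(criteria, hs) = 1

coef, criteria = [3, 2], 38
assert max_point([0, 0], [6, 5], coef) == [6, 5]
assert not needs_expansion([0, 0], [6, 5], coef, criteria)          # hs = 28
assert needs_expansion([7, 0], [10, 5], coef, criteria)             # hs = 40 > 38
assert not needs_expansion([7, 6], [10, 10], coef, criteria, True)  # already searched
```

Only the second node qualifies for expansion, matching the example in which only n o d e 2 (with highest score 40 against criteria 38) is searched again.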

5. Privacy-Preserving Parallel Top-k Query Processing Algorithm

In this section, we propose a privacy-preserving parallel Top-k query processing algorithm, which extends the Top-k query processing algorithm proposed in Section 4 so that it can be efficiently executed in a multi-core environment. The proposed parallel Top-k query processing algorithm consists of three phases: the parallel node data search phase, the parallel Top-k retrieval phase, and the parallel Top-k result refinement phase.

5.1. Parallel Node Data Search Phase

C A obtains all the data items from the node containing the max point ( m p ) in a parallel way. To extend the node data search phase in Section 4.1 to a multi-core environment, we utilize a thread pool where tasks can be executed simultaneously. The procedure of the parallel node data search phase is presented in Algorithm 7. First, C A computes m p in the same way as Algorithm 4 (lines 1∼4). Second, C A creates a thread pool based on a queue (line 5). Whenever a thread in the pool becomes available, C A assigns a task to it in a First-In-First-Out manner. Third, for the ith node, where 1 ≤ i ≤ # _ o f _ n o d e s , C A allocates to the thread pool the task of the ASPE protocol (Algorithm 3 in Section 3), i.e., Proc_ASPE(E( m p ), E( n o d e i ), E( δ i )). The result of the ASPE protocol is set in E( δ ) = <E( δ 1 ), E( δ 2 ), …, E( δ # _ o f _ n o d e s )> (lines 6∼7). Fourth, C A shuffles E( δ ) using a random permutation function π and transmits the shuffled E( δ ) to C B (lines 8∼9). Fifth, C B performs lines 9∼15 in Algorithm 4 to determine which nodes are selected (line 10). Sixth, to recover the original node ids from the shuffled node ids, C A obtains node groups ( N G * ) with the original node ids by performing lines 16∼18 in Algorithm 4 (line 11). Seventh, C A accesses the data items of each node for each N G * and allocates to the thread pool the task of Extract_candidate (lines 12∼17). The procedure Extract_candidate securely obtains the candidate data items of the node that includes m p . For this, C A performs E( c a n d w , t ) ← E( c a n d w , t )×SM(E( n o d e z . d a t a s , t ), E( δ z )), where 1 ≤ t ≤ m , z = id of the jth node of N G i * , w = c n t + s , and  1 ≤ s ≤ F a n O u t . Here, 1 ≤ i ≤ c , 1 ≤ j ≤ # _ o f _ n o d e s in the selected N G i * , and  c n t is the number of data items extracted from the encrypted leaf nodes of the kd-tree. Finally, C A merges all the candidate data items obtained in a parallel way (line 18).
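The thread-pool dispatch described above can be sketched with Python's standard `concurrent.futures` module. Here `proc_aspe` is only a plaintext stand-in for the Proc_ASPE task (it tests directly whether mp lies inside a node's bounding range), and the node bounds are assumptions chosen for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def proc_aspe(mp, node):
    """Plaintext stand-in for the ASPE task: 1 if mp is inside the node's
    range, 0 otherwise (the real protocol computes this over ciphertexts)."""
    lb, ub = node
    return int(all(l <= x <= u for l, x, u in zip(lb, mp, ub)))

# Four leaf-node ranges (assumed) and the max point of the query.
nodes = [((0, 0), (6, 5)), ((0, 6), (6, 10)),
         ((7, 0), (10, 5)), ((7, 6), (10, 10))]
mp = (10, 10)

# One task per leaf node; the pool hands tasks to free workers in FIFO order.
with ThreadPoolExecutor(max_workers=4) as pool:
    delta = list(pool.map(lambda node: proc_aspe(mp, node), nodes))
assert delta == [0, 0, 0, 1]  # only the last node contains mp
```

Because each leaf node's ASPE evaluation is independent of the others, the per-node tasks can run concurrently without synchronization, which is what makes this phase parallelize well.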
Algorithm 7 Parallel node data search phase.
Input: <E( n o d e 1 ), E( n o d e 2 ), …, E( n o d e p ) | p = # _ o f _ n o d e >, E( c o e f )=<E( c o e f 1 ), E( c o e f 2 ), …, E( c o e f m ) | c o e f means the coefficients of the score function and m = # _ o f _ d i m e n s i o n >, <E( m a x 1 ), E( m a x 2 ), …, E( m a x m ) | m a x i is the maximum value of the data domain in the ith dimension >, E( h i n t ) // hint means the largest of the absolute values among the coefficients
Output: <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t )> // all candidates inside nodes being related to a query
C A :
   01. for 1 ≤ i ≤ m  
   02.    E( c o e f i + h i n t ) ← E( c o e f i )×E( h i n t )
   03.    E( ϵ i ) ← ASC(E( c o e f i + h i n t ), E( h i n t ))
   04.    E( m p i ) ← SM(E( ϵ i ), E(0)) × SM(SBN(E( ϵ i )), E( m a x i ))
   05. generate thread_pool // create a thread and wait in the pool until a task is given
   06. for 1 ≤ i ≤ p  
   07.    call thread_pool_push(Proc_ASPE(E( m p ), E( n o d e i ), E( δ i ))) // assign the task to an available thread
   08. E( δ ) ← π (E( δ )) // shuffle the order of array E( δ ) = <E( δ 1 ), E( δ 2 ), …, E( δ p )>
   09. send E( δ ) to C B  
C B :
   10. perform line 9∼15 in Algorithm 4
C A :
   11. perform line 16∼18 in Algorithm 4
   12. for 1 ≤ i ≤ c  
   13.    for 1 ≤ j ≤ n u m // n u m = # _ o f _ n o d e s in the selected N G i *  
   14.        z ← id of the jth node of N G i *  
   15.       for 1 ≤ s ≤ F a n O u t // F a n O u t : the number of data items in each node
   16.          call thread_pool_push(Extract_candidate(E( n o d e z ), E( δ z ), m, s, c n t , E( c a n d w )))
   17.     c n t ← c n t + F a n O u t  
   18. return <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t )>
 
procedure 1. Proc_ASPE(E( m p ), E( n o d e i ), E( δ i ))
    Begin Procedure
     01. E( δ i ) ← ASPE(E( m p ), E( n o d e i ))
     02. return E( δ i )
    End Procedure
end procedure
 
procedure 2. Extract_candidate(E( n o d e z ), E( δ z ), m , s , c n t , E( c a n d w )) // E( n o d e z . d a t a s ) = <E( n o d e z . d a t a s , 1 ), E( n o d e z . d a t a s , 2 ), …, E( n o d e z . d a t a s , m )>
    Begin Procedure
     01. for 1 ≤ t ≤ m  
     02.    if (s = = 1)
     03.       E( c a n d w , t ) ← E(0) where w = c n t + s // E( c a n d w ) = <E( c a n d w , 1 ), E( c a n d w , 2 ), …,E( c a n d w , m )>
     04.    E( c a n d w , t ) ← E( c a n d w , t )×SM(E( n o d e z . d a t a s , t ), E( δ z ))
     05. return E( c a n d w ) = <E( c a n d w , 1 ), E( c a n d w , 2 ), …, E( c a n d w , m )>
    End Procedure
end procedure

5.2. Parallel Top-k Retrieval Phase

C A retrieves the k data items with the highest scores in a parallel way. The process of the parallel Top-k retrieval phase is presented in Algorithm 8. First, to calculate the scores of the candidates in a parallel way, C A allocates to the thread pool the task of the score function, i.e., Proc_SF(E( c o e f ), E( c a n d i ), E( s c o r e i )) for 1 ≤ i ≤ c n t . The result of Proc_SF is set to E( s c o r e ) = <E( s c o r e 1 ), E( s c o r e 2 ), …, E( s c o r e c n t )> (lines 1∼2). Second, C A executes SMAX n to search the maximum E( s c o r e m a x ) among E( s c o r e i ) for 1 ≤ i ≤ c n t (lines 3∼4). Third, to determine the data item with the highest score in a parallel way, C A allocates to the thread pool the task of Mark_Top_one. The procedure Mark_Top_one securely sets the data item with the highest score to E(0). For this, C A performs E( τ i ) ← (E( s c o r e m a x )×E( s c o r e i )^( N −1))^ r for 1 ≤ i ≤ c n t , where r means a random number and N is created in the Paillier cryptosystem (lines 5∼6). Fourth, C A gains E( λ ) by permutating E( τ ) = <E( τ 1 ), E( τ 2 ), …, E( τ c n t )> using a random permutation function π and sends E( λ ) to C B (lines 7∼8). Fifth, after decrypting E( λ ), C B performs lines 10∼13 in Algorithm 5 to mark which data item contains the highest score (line 9). That is, C B stores E( U i ) = E(1) if λ i = 0 for 1 ≤ i ≤ c n t ; otherwise, C B sets E( U i ) to E(0). Sixth, C A obtains E(V) by recovering E(U) = <E( U 1 ), E( U 2 ), …, E( U c n t )> using π⁻¹ (line 10). Seventh, to prune out all the data items except the data item with the highest score (Top-one) in a parallel way, C A allocates to the thread pool the task of Pruneout_From_Top2. The procedure Pruneout_From_Top2 securely sets the data items except the Top-one to E(0) by performing Pruneout_From_Top2(E( V i ), E( c a n d i , j ), E( s c o r e i ), E( V i , j )) for 1 ≤ i ≤ c n t and 1 ≤ j ≤ m . 
The result of Pruneout_From_Top2 is stored in both E( s c o r e ) = <E( s c o r e 1 ), E( s c o r e 2 ), …, E( s c o r e c n t )> and E( V i ) = <E( V i , 1 ), E( V i , 2 ), …, E( V i , m )> (lines 11∼12). Eighth, to obtain the Top-one in a parallel way, C A allocates the task of Find_Top_one to the thread pool. The procedure Find_Top_one securely merges all the data items. For this, Find_Top_one performs a secure additive operation, i.e., E( t o p k s , j ) ← E( t o p k s , j )×E( V i , j ) for 1 ≤ i ≤ c n t , 1 ≤ j ≤ m , and  1 ≤ s ≤ k . The result of Find_Top_one is stored into E( t o p k s ) = <E( t o p k s , 1 ), E( t o p k s , 2 ), …, E( t o p k s , m )> (lines 13∼14). Finally, C A obtains the Top-k result by merging all the Top-one results over k rounds, i.e., by repeating lines 4∼15 in Algorithm 8 (lines 15∼16).
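The blind-and-shuffle round-trip between C A and C B in this phase can be checked with a plaintext analogue. Below, the score values are the ones from the Figure 6 example, while the random factors and the permutation are arbitrary; the point is that only the maximum's difference stays zero under blinding, so C B can mark it without learning anything else.

```python
import random

rng = random.Random(42)
scores = [31, 38, 39, 45, 46]
score_max = max(scores)

# Mark_Top_one analogue: blind each difference with a random factor.
tau = [(score_max - s) * rng.randrange(1, 1000) for s in scores]

# C_A shuffles before sending; C_B sees only the permuted, blinded vector.
perm = list(range(len(tau)))
rng.shuffle(perm)
lam = [tau[i] for i in perm]

U = [1 if x == 0 else 0 for x in lam]   # C_B marks the single zero entry
V = [0] * len(U)
for pos, i in enumerate(perm):          # C_A undoes the shuffle (pi^-1)
    V[i] = U[pos]
assert V == [0, 0, 0, 0, 1]             # cand_5 holds the maximum score
```

In the real protocol the blinding is ciphertext exponentiation and the marking vector U is re-encrypted before being returned, so neither server links positions to plaintext scores.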

5.3. Parallel Top-k Result Refinement Phase

C A determines, in a parallel way, whether the Top-k result obtained in the parallel Top-k retrieval phase is sufficient. If it is not, C A performs both the parallel node data search phase and the parallel Top-k retrieval phase again. The process of the parallel Top-k result refinement phase is presented in Algorithm 9. First, C A calculates E(criteria) = SF(E( c o e f ), E( t o p k k )) to gain the minimum score of the current Top-k result for the query (line 1). Second, to determine, in a parallel way, the neighboring nodes that need to be searched to acquire data items with a higher score than E(criteria), C A allocates the task of Check_node_expansion to the thread pool (lines 2∼5). To find the max point of each node, C A performs E( m p _ n o d e i , j )←SM(E( ϵ j ), E( n o d e i . l b j ))×SM(SBN(E( ϵ j )), E( n o d e i . u b j )) for 1 ≤ i ≤ # _ o f _ n o d e s and 1 ≤ j ≤ # _ o f _ d i m e n s i o n . Here, E( ϵ j ) is the value obtained by the ASC protocol in line 03 of Algorithm 7. If the coefficient for the jth dimension is positive, E( ϵ j ) is set to E(0); otherwise, E( ϵ j ) is set to E(1). To calculate the highest score of each node, C A performs E( h s _ n o d e i ) ← SF(E( m p _ n o d e i ), E( c o e f )) for the ith node. To find the nodes whose highest score is greater than the criteria, C A performs E( δ i ) ← ASC(E(criteria), E( h s _ n o d e i )) for the ith node. If E( δ i ) is E(1), the ith node needs to be expanded for the parallel Top-k result refinement. Third, C A calculates the final Top-k result in ciphertext by including the additional Top-k result obtained by performing Algorithm 8 (line 6). Fourth, to hide the Top-k result from C B , C A calculates E( γ i , j ) = E( t o p k i , j * ) × E( r i , j ) for 1 ≤ i ≤ k and 1 ≤ j ≤ m , utilizing a random value r i , j . Then, C A transmits E( γ i , j ) to C B and r i , j to A U (lines 7∼11). Fifth, C B decrypts E( γ i , j ) and transmits the decrypted values to A U (lines 12∼15). 
Finally, A U gains the plaintext of the Top-k result by calculating γ i , j − r i , j (lines 16∼18). As a result, A U can reduce its computation overhead because it does not need to perform decryption operations.
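The result-release step can be sketched in plaintext. The masks r are added to the Top-k coordinates under encryption (homomorphic addition), C B decrypts only the masked values γ, and A U strips the masks by subtraction modulo the Paillier plaintext modulus N; the toy N and the Top-k values below are assumptions for illustration.

```python
import random

N = 60491                       # toy stand-in for the Paillier plaintext modulus
topk = [[10, 8], [9, 9]]        # final Top-2 result, in plaintext for the sketch

# C_A picks one mask per coordinate and adds it (homomorphically in the protocol).
r = [[random.randrange(N) for _ in row] for row in topk]
gamma = [[(t + mask) % N for t, mask in zip(row, masks)]   # what C_B decrypts
         for row, masks in zip(topk, r)]

# A_U receives gamma from C_B and r from C_A, and recovers the result
# with a modular subtraction -- no Paillier decryption needed on A_U's side.
recovered = [[(g - mask) % N for g, mask in zip(row, masks)]
             for row, masks in zip(gamma, r)]
assert recovered == topk
```

Since C B sees only γ (uniformly masked values) and A U never handles the secret key, the final result is revealed to the authorized user alone, at the cost of one subtraction per coordinate.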
Algorithm 8 Parallel Top-k retrieval phase.
Input: <E( c a n d 1 ), E( c a n d 2 ), …, E( c a n d c n t ) | c n t = # _ o f _ c a n d i d a t e s >, E( c o e f ), k // c o e f means coefficient of score function and k means the number of Top-k
Output: <E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )> // temporary Top-k results
C A :
   01. for 1 ≤ i ≤ c n t  
   02.    call thread_pool_push(Proc_SF(E( c o e f ), E( c a n d i ), E( s c o r e i )))
   03. for 1 ≤ s ≤ k  
   04.    E( s c o r e m a x ) ← SMAX n (E( s c o r e 1 ), E( s c o r e 2 ), …, E( s c o r e c n t ))
   05.    for 1 ≤ i ≤ c n t  
   06.       call thread_pool_push(Mark_Top_one(E( s c o r e m a x ), E( s c o r e i ), E( τ i )))
   07.    E( λ ) ← π ( τ ) // τ =< τ 1 , τ 2 , …, τ c n t >
   08.    send E( λ )=<E( λ 1 ), E( λ 2 ), …, E( λ c n t )> to C B  
C B :
   09. perform line 10∼13 in Algorithm 5
C A :
   10.    E(V) ← π⁻¹(E(U)) // E(V) = <E( V 1 ), E( V 2 ), …, E( V c n t )>
   11.    for 1 ≤ i ≤ c n t  
   12.       call thread_pool_push(Pruneout_From_Top2(E( V i ), E( c a n d i , j ), E( s c o r e i ), E( V i , j )))
   13.    for 1 ≤ i ≤ c n t  
   14.       call thread_pool_push(Find_Top_one(E( V i , j ), c n t , E( t o p k s , j )))
   15.    E( t o p k ) ← E( t o p k ) ∪ E( t o p k s )
   16. return E( t o p k ) = <E( t o p k 1 ), E( t o p k 2 ), …, E( t o p k k )>
 
procedure 3. Proc_SF(E( c o e f ), E( c a n d i ), E( s c o r e i ))
    Begin Procedure
     01. E( s c o r e i ) ← SF(E( c o e f ), E( c a n d i ))
     02. return E( s c o r e i )
    End Procedure
end procedure
 
procedure 4. Mark_Top_one(E( s c o r e m a x ), E( s c o r e i ), E( τ i ))
    Begin Procedure
     01. E( τ i ) ← E( s c o r e m a x )×E( s c o r e i )^( N −1)  
     02. E( τ i ) ← E( τ i )^ r // r means a random value
     03. return E( τ i )
    End Procedure
end procedure
 
procedure 5. Pruneout_From_Top2(E( V i ), E( c a n d i , j ), E( s c o r e i ), E( V i , j ))
    Begin Procedure
     01. for 1 ≤ j ≤ m // m = # _ o f _ dimension
     02.    E( V i , j ) ← SM(E( V i ), E( c a n d i , j ))
     03. if s < k // k = # _ o f _ topk and s is from 1 to k
     04.    E( s c o r e i ) ← SM(E( V i ), E(0))×SM(SBN(E( V i )), E( s c o r e i ))
     05. return E( V i , j ), E( s c o r e i )
    End Procedure
end procedure
 
procedure 6. Find_Top_one(E( V i , j ), c n t , E( t o p k s , j ))
    Begin Procedure
     01. for 1 ≤ j ≤ m // m = # _ o f _ dimension
     02.    E( t o p k s , j ) ← E( t o p k s , j )×E( V i , j ) // s = index of loop from 1 to k, k = # _ o f _ topk
     03. return E( t o p k s , j )
    End Procedure
end procedure
Algorithm 9 Parallel Top-k result refinement phase.
Input: <E(node_1), E(node_2), …, E(node_p) | p = #_of_nodes>, <E(topk_1), E(topk_2), …, E(topk_k)>, E(coef), k // coef denotes the coefficients of the score function and k denotes the number of Top-k results
Output: topk* = <topk*_1, topk*_2, …, topk*_k> // final Top-k results
C_A:
   01. E(criteria) ← SF(E(coef), E(topk_k))
   02. for 1 ≤ i ≤ p
   03.    call thread_pool_push(Check_node_expansion(E(ϵ), E(coef), E(δ_i), E(node_i), E(criteria), m), E(δ_i))
   04. E(cand) ← perform lines 7∼27 of Algorithm 6 with <E(node_1), E(node_2), …, E(node_p)>, E(δ) = <E(δ_1), E(δ_2), …, E(δ_p)>
   05. E(cand) ← E(cand) ∪ <E(topk_1), E(topk_2), …, E(topk_k)>
   06. E(topk*) ← perform Algorithm 8 with E(cand), E(coef), and k
   07. for 1 ≤ i ≤ k
   08.    for 1 ≤ j ≤ m
   09.       pick up the random value r_{i,j} // r_i = <r_{i,1}, r_{i,2}, …, r_{i,m}>
   10.       E(γ_{i,j}) ← E(topk*_{i,j}) × E(r_{i,j}) // E(topk*_i) = <E(topk*_{i,1}), E(topk*_{i,2}), …, E(topk*_{i,m})> and E(γ_i) = <E(γ_{i,1}), E(γ_{i,2}), …, E(γ_{i,m})>
   11. send <E(γ_1), E(γ_2), …, E(γ_k)> to C_B and <r_1, r_2, …, r_k> to AU
C_B:
   12. for 1 ≤ i ≤ k
   13.    for 1 ≤ j ≤ m
   14.       γ_{i,j} ← D(E(γ_{i,j})) // γ_i = <γ_{i,1}, γ_{i,2}, …, γ_{i,m}>
   15. send <γ_1, γ_2, …, γ_k> to AU
AU:
   16. for 1 ≤ i ≤ k
   17.    for 1 ≤ j ≤ m
   18.       topk*_{i,j} ← γ_{i,j} − r_{i,j} // topk*_i = <topk*_{i,1}, topk*_{i,2}, …, topk*_{i,m}>
 
procedure 7. Check_node_expansion(E(ϵ) = <E(ϵ_1), E(ϵ_2), …, E(ϵ_m)>, E(coef), E(δ_i), E(node_i), E(criteria), m = #_of_dimensions)
    Begin Procedure
     01. for 1 ≤ j ≤ m
     02.    E(mp_node_{i,j}) ← SM(E(ϵ_j), E(node_i.lb_j)) × SM(SBN(E(ϵ_j)), E(node_i.ub_j))
     03. E(hs_node_i) ← SF(E(mp_node_i), E(coef))
     04. E(hs_node_i) ← SM(E(δ_i), E(0)) × SM(SBN(E(δ_i)), E(hs_node_i)) // E(δ_i) is the value returned by the ASPE protocol in Algorithm 6
     05. E(δ_i) ← ASC(E(criteria), E(hs_node_i))
     06. return E(δ_i)
    End Procedure
end procedure

6. Security Proof

6.1. Security Proof of the Secure Protocols

In this section, we present the security proofs of the ASC and ASPE protocols proposed in Section 3. To prove that the proposed protocols are secure under the semi-honest model, we show that the simulated images of the proposed protocols are computationally indistinguishable from their actual execution images. Security proof of the ASC protocol: We prove the security of the ASC protocol by analyzing its execution images on the C_A and C_B sides. First, the execution image on the C_B side, i.e., C_B(ASC), is shown in Equation (5). Here, E(v_1) and E(v_2) are the encrypted data sent from C_A (line 9 of Algorithm 1), and v_1 and v_2 are obtained by decrypting E(v_1) and E(v_2). α is the result computed by the ASC protocol using v_1 and v_2 on the C_B side.
C_B(ASC) = {E(v_1), E(v_2), v_1, v_2, α}    (5)
Assume that C_B^S(ASC) = {E(s_1), E(s_2), s_1, s_2, s_3} is the simulated execution image of the ASC protocol on the C_B side. Here, E(s_1) and E(s_2) are random numbers chosen from Z_{N²}, s_1 and s_2 are values masked by the addition of random numbers, and s_3 is the result of the ASC protocol using s_1 and s_2 on the C_B side. Because the ASC protocol is built on the Paillier cryptosystem, it provides semantic security; therefore, E(v_1) and E(v_2) are computationally indistinguishable from E(s_1) and E(s_2). Similarly, v_1 and v_2 are indistinguishable from s_1 and s_2, since each is masked by a random value, and s_3 is indistinguishable from α because it is computed from the two masked values s_1 and s_2. Therefore, C_B(ASC) is computationally indistinguishable from C_B^S(ASC). Because C_B can observe only the result (e.g., α) of a comparison between masked values (e.g., v_1 and v_2), C_B cannot obtain the original data while performing the ASC protocol. Furthermore, the execution image of C_A is C_A(ASC) = {E(α)}, where E(α) received from C_B is the result of the ASC protocol. Assume that the simulated image of C_A is C_A^S(ASC) = {E(s_4)}, where E(s_4) is chosen at random from Z_{N²}. Then E(α) is computationally indistinguishable from E(s_4). Based on the above analysis, there is no information leakage on either the C_A or the C_B side. Thus, we conclude that the proposed ASC protocol is secure under the semi-honest adversarial model. Security proof of the ASPE protocol: We prove the security of the ASPE protocol by analyzing its execution images on the C_A and C_B sides. First, the execution image on the C_B side, i.e., C_B(ASPE), is shown in Equation (6). Here, C_B(ASC) denotes the execution image of the ASC protocol and C_B(SM) denotes the execution image of the SM protocol.
C_B(ASPE) = {C_B(ASC), C_B(SM)}    (6)
In the security proof of the ASC protocol, we have already proven the security of the ASC protocol on the C_B side. In addition, Y. Elmehdwi et al.'s work [19] proved the security of the SM protocol on the C_B side. Because the ASPE protocol is composed of the ASC and SM protocols, the ASPE protocol is secure on the C_B side by the composition theorem [42]. On the other hand, the execution image of C_A is C_A(ASPE) = {C_A(ASC), C_A(SM)}, i.e., the results of the ASC and SM protocols. We have already proven the indistinguishability of the ASC protocol on the C_A side, and Y. Elmehdwi et al.'s work [19] proved the security of the SM protocol on the C_A side. Because the ASPE protocol consists of the ASC and SM protocols, it is secure on the C_A side by the composition theorem [42]. Based on the above analysis, there is no information leakage on either the C_A or the C_B side. Thus, we conclude that the proposed ASPE protocol is secure under the semi-honest adversarial model.

6.2. Security Proof of the Proposed Top-k Query Processing Algorithm

We prove that the proposed Top-k query processing algorithm over the encrypted database is secure under the semi-honest adversarial model. The algorithm consists of the node data search phase (Algorithm 4), the Top-k retrieval phase (Algorithm 5), and the Top-k result refinement phase (Algorithm 6), and we analyze the security of each phase in turn. First, because the node data search phase is composed of the ASPE protocol, which has been proven to be secure, Algorithm 4 is secure under the semi-honest adversarial model by the composition theorem [42]. Second, the Top-k retrieval phase is secure on the C_A side, because C_A evaluates the score function, and the SMAX_n and SM protocols have been proven to be secure in previous studies [27,28]. Even if C_B decrypts the data received from C_A in the Top-k retrieval phase, C_B cannot obtain the original data, because the data sent by C_A are transformed by raising the original data to the power of a random integer and applying a permutation function. Thus, by the composition theorem [42], Algorithm 5 is secure under the semi-honest adversarial model. Lastly, the images generated by the Top-k result refinement phase are the same as those generated by Algorithms 4 and 5, so the Top-k result refinement phase (Algorithm 6) is also secure under the semi-honest adversarial model. Because all phases of the proposed Top-k algorithm are secure, the proposed Top-k algorithm over the encrypted database is secure under the semi-honest adversarial model.

7. Performance Analysis

7.1. Performance Evaluation of the Proposed Top-k Query Processing Algorithm in a Single-Core Environment

In the performance analysis, we compare the proposed privacy-preserving Top-k query processing algorithm with H.-I. Kim et al.'s work [27] (STopk_I) and H.-J. Kim et al.'s work [28] (STopk_G). The three algorithms were implemented in C++ on an Intel(R) Xeon(R) E5-2630 v4 @ 2.20 GHz CPU with 64 GB (16 GB × 4) DDR3 UDIMM 1600 MHz RAM, running Linux Ubuntu 18.04.2. We compare the three algorithms in terms of query processing time while varying the number of data items, the number of k, the level of the kd-tree, and the number of data dimensions. For the experiments, we randomly generate 100,000 data points with six dimensions; the data domain ranges from 0 to 2^22. Table 2 presents the parameters for the performance evaluation in a single-core environment.
Figure 8 presents the performance of STopk_I, STopk_G, and the proposed algorithm with respect to the height of the kd-tree (h). The number of data items (n) is computed as F × 2^{h−1}, where 2^{h−1} is the number of leaf nodes in the kd-tree. Given n, it is therefore crucial to select a proper height h for the kd-tree. In Figure 8, all three algorithms show their best performance when h = 10. The performance of STopk_I is strongly influenced by h because it relies on secure protocols using encrypted binary operations, which carry a high computational cost. In contrast, the proposed algorithm is less sensitive to h because it uses secure protocols based on arithmetic operations with low computational cost. Therefore, we set h to 10 in our experiments.
Figure 9 presents the processing time for a varying number of data items when the encryption key size is 1024 bits. When the number of data items is 20 k, 40 k, 60 k, 80 k, and 100 k, the proposed algorithm requires 2383, 4453, 5287, 6836, and 8447 s, respectively. STopk_G requires 4082, 7653, 9156, 11,895, and 14,799 s, while STopk_I requires 23,124, 28,015, 32,280, 38,636, and 46,989 s for the same data sizes. The proposed algorithm thus shows 1.7 and 6.6 times better performance than STopk_G and STopk_I, respectively, because it uses secure protocols based on arithmetic operations, whereas STopk_G and STopk_I use secure protocols based on binary operations, which require far more iterations.
Figure 10 presents the processing time for a varying number of k when the encryption key size is 1024 bits and n is 60 k. When k is 5, 10, 15, and 20, the proposed algorithm requires 4749, 5287, 7693, and 8856 s. STopk_G requires 8297, 9156, 13,232, and 15,043 s, while STopk_I requires 22,288, 32,380, 76,076, and 97,021 s. The proposed algorithm shows 1.7 and 7.9 times better performance than STopk_G and STopk_I, respectively, because it uses secure protocols based on arithmetic operations. As k increases, the number of data items searched in the Top-k result refinement phase increases. Because the proposed algorithm utilizes the ASC protocol, it outperforms both STopk_I and STopk_G. STopk_G outperforms STopk_I because it utilizes Yao's garbled circuit.
Figure 11 presents the processing time for a varying key size when n is 100 k. When the key size is 512 and 1024 bits, the proposed algorithm requires 2041 and 8486 s, respectively. STopk_G requires 4367 and 14,847 s, while STopk_I requires 13,500 and 46,989 s. Because the proposed algorithm utilizes secure protocols based on arithmetic operations, it outperforms STopk_G and STopk_I by 1.9 and 6.0 times, respectively. STopk_G and STopk_I use secure protocols based on binary operations, which require more iterations than arithmetic operations.

7.2. Performance Evaluation of the Proposed Top-k Query Processing Algorithm in a Multi-Core Environment

For the performance evaluation in a multi-core environment, we compare the proposed parallel Top-k query processing algorithm with parallel versions of the existing works. For this, we implement parallel STopk_I (PSTopk_I) and parallel STopk_G (PSTopk_G) so that H.-I. Kim et al.'s work [27] and H.-J. Kim et al.'s work [28], respectively, can operate in a multi-core environment. The three algorithms were implemented in C++ on an Intel(R) Xeon(R) E5-2630 v4 @ 2.20 GHz CPU with 64 GB (16 GB × 4) DDR3 UDIMM 1600 MHz RAM, running Linux Ubuntu 18.04.2. We compare the proposed parallel algorithm with PSTopk_G and PSTopk_I in terms of query processing time while varying the number of data items (n), the number of k, and the number of threads. For the experiments, we randomly generate 100,000 data points with six dimensions; the data domain ranges from 0 to 2^22. Table 3 describes the parameters for the performance analysis in a multi-core environment.
Figure 12 presents the processing time for a varying number of threads when n is 100 k and k is 10. When the number of threads is 2, 4, 6, 8, and 10, the proposed parallel algorithm requires 4451, 2547, 1871, 1552, and 1291 s, respectively. PSTopk_G requires 7621, 4304, 3154, 2575, and 2115 s, while PSTopk_I requires 23,765, 13,293, 9520, 7671, and 6294 s for the same thread counts. The proposed parallel algorithm thus shows 1.6 and 5 times better performance than PSTopk_G and PSTopk_I, respectively, because it utilizes secure protocols based on arithmetic operations, whereas PSTopk_G and PSTopk_I use secure protocols based on binary operations, which require more iterations.
Figure 13 presents the processing time for varying n when k is 10 and the number of threads is 10. When n is 20 k, 40 k, 60 k, 80 k, and 100 k, the proposed algorithm requires 433, 762, 808, 1041, and 1301 s, respectively. PSTopk_G requires 642, 1192, 1337, 1711, and 2127 s, while PSTopk_I requires 3193, 3994, 4344, 5254, and 6272 s. Thus, the proposed parallel algorithm shows 1.5 and 5.5 times better performance than PSTopk_G and PSTopk_I, respectively, because it utilizes secure protocols based on arithmetic operations, whereas PSTopk_G and PSTopk_I use secure protocols based on binary operations that require more iterations.
Figure 14 presents the processing time for a varying number of k when the number of threads is 10 and n is 60 k. When k is 5, 10, 15, and 20, the proposed parallel algorithm requires 681, 845, 1401, and 1664 s. PSTopk_G requires 1165, 1392, 2148, and 2466 s, while PSTopk_I requires 3000, 4472, 10,235, and 13,042 s. Therefore, the proposed parallel algorithm shows 1.5 and 6.2 times better performance than PSTopk_G and PSTopk_I, respectively. As k increases, the number of data items searched in the Top-k result refinement phase increases. Because the proposed parallel algorithm utilizes the ASC protocol, it outperforms both PSTopk_I and PSTopk_G. PSTopk_G outperforms PSTopk_I because it uses Yao's garbled circuit.
Table 4 presents the parameters for our performance analysis on a real dataset. We utilize the chess dataset [43], built from a chess endgame database for a white king and rook against a black king; the dataset is intended to categorize the optimal depth of win for white. We compare the proposed parallel algorithm with PSTopk_G and PSTopk_I in terms of query processing time while varying the number of k, the level of the kd-tree, and the number of threads.
To select the optimal level of the kd-tree, we evaluate the three algorithms with respect to the kd-tree level. Figure 15 presents the processing time for a varying kd-tree level when the number of threads is 10 and k is 10. As the kd-tree level ranges from 5 to 12, the proposed parallel algorithm requires 639, 524, 425, 354, 305, 267, 289, and 312 s. PSTopk_G requires 1122, 916, 749, 631, 546, 453, 498, and 539 s, while PSTopk_I requires 4282, 3497, 2771, 2259, 1929, 1625, 1745, and 1931 s. All three algorithms show their best performance when h = 10. Thus, we set h to 10 for our experiments.
Figure 16 presents the processing time for a varying number of k when the number of threads is 10. When k is 5, 10, 15, and 20, the proposed parallel algorithm requires 147, 266, 407, and 505 s. PSTopk_G requires 316, 453, 769, and 933 s, while PSTopk_I requires 1028, 1625, 3003, and 3758 s. The proposed parallel algorithm outperforms PSTopk_G and PSTopk_I by 1.8 and 6.9 times, respectively. As k increases, the number of data items searched in the Top-k result refinement phase increases. Because the proposed parallel algorithm utilizes the ASC protocol based on arithmetic operations, it outperforms both PSTopk_I and PSTopk_G. Because PSTopk_G uses Yao's garbled circuit, it outperforms PSTopk_I.
Figure 17 presents the processing time for a varying number of threads when k is 10. When the number of threads is 2, 4, 6, 8, and 10, the proposed parallel algorithm requires 718, 443, 349, 299, and 266 s. PSTopk_G requires 1280, 753, 597, 503, and 452 s, while PSTopk_I requires 6122, 3396, 2461, 1976, and 1629 s. Because the proposed parallel algorithm utilizes secure protocols based on arithmetic operations, it outperforms PSTopk_G and PSTopk_I by 1.7 and 7.1 times, respectively. In contrast, PSTopk_G and PSTopk_I use secure protocols based on binary operations, which require more iterations than arithmetic operations.

8. Discussion

Impact of new secure protocols with low computation cost: A secure protocol is essential for secure query processing in a multi-party computation environment. Because we aim at secure query processing based on the Paillier cryptosystem, which consumes a large amount of computing resources, we need to streamline the secure protocols. First, H.-I. Kim et al.'s work [27] used secure protocols such as SCMP, SPE, SMAX, and SMAX_n for Top-k query processing. By utilizing the Paillier cryptosystem, it can hide both the sensitive data and the user's query: it uses arithmetic operations to hide the sensitive data and a permutation technique to hide data access patterns. However, its weakness is an extremely high computational cost, because the SCMP, SPE, SMAX, and SMAX_n protocols operate on encrypted binary representations. For example, to execute the SMAX protocol between E(11) and E(15), the clouds first convert each encrypted decimal into an encrypted binary array: E(11) becomes {E(0), E(1), E(0), E(1), E(1)} and E(15) becomes {E(0), E(1), E(1), E(1), E(1)}. The clouds then execute the SMAX protocol over these two encrypted binary arrays. Consequently, the SMAX protocol incurs a high computation cost because it executes binary operations as many times as the bit length; for the same reason, the SCMP, SPE, and SMAX_n protocols are also expensive. Second, H.-J. Kim et al.'s work [44] and Y. Kim et al.'s work [45] proposed secure protocols such as GSCMP and GSPE, which are used for index searching in kNN query processing and kNN classification based on Yao's garbled circuit. However, since both GSCMP and GSPE also take binary arrays as input, they incur a high computation cost for the same reason as the SMAX protocol.
In contrast, the proposed algorithm uses the new ASC and ASPE protocols, which take an encrypted decimal as input rather than a binary array and therefore perform a single Paillier arithmetic operation instead of per-bit iterations. Consequently, the new ASC and ASPE protocols incur a low computation cost.
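The cost gap can be made concrete by counting Paillier encryptions per input value: a binary-input protocol such as SCMP or SMAX encrypts a value bit by bit, so a domain of 2^22 costs 22 ciphertexts (and as many encrypted-bit operations per comparison), whereas an encrypted-decimal protocol takes one ciphertext. A self-contained sketch on a toy Paillier instance (tiny primes, insecure, parameters of my own choosing; with such a small modulus only small plaintexts are used):

```python
import random
from math import gcd

# Toy Paillier cryptosystem (tiny primes -- illustration only, NOT secure)
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

def encrypt_binary(m, bits=22):
    # binary-array input used by SCMP/SMAX/GSCMP: one encryption per bit (MSB first)
    return [enc((m >> i) & 1) for i in range(bits - 1, -1, -1)]

def encrypt_decimal(m):
    # decimal input used by ASC/ASPE: a single encryption
    return enc(m)

c_bits = encrypt_binary(11)   # 22 ciphertexts for one value
c_dec = encrypt_decimal(11)   # 1 ciphertext for the same value
```

A binary protocol must then iterate over all 22 encrypted bits per comparison, while the decimal form is compared in a single arithmetic round.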
Impact of a random value pool for parallel processing: In the proposed system architecture, we utilize multi-party computation for the parallel Top-k query processing algorithm, so we must prevent sensitive data from leaking to C_B while the secure protocols are executed. For this purpose, C_A generates a random integer r from Z_N and encrypts it with the Paillier cryptosystem. C_A then adds the encrypted random integer E(r) to the encrypted plaintext E(m) by computing E(m + r) = E(m) × E(r). Because m + r is independent of m, C_B cannot recover the sensitive data by decryption. However, computing homomorphic additions and multiplications to protect sensitive data degrades performance, since their modular and exponentiation operations require more computing resources than other encrypted operations. As Table 5 shows, for the Secure Multiplication protocol, H.-I. Kim et al.'s work [27] and H.-J. Kim et al.'s work [28] require three encryptions: two encryptions of random integers at C_A and one encryption of the product at C_B. In contrast, the proposed algorithm requires only one encryption of the product at C_B, because C_A picks the random integers from the encrypted random value pool. For the Secure Compare protocol, H.-I. Kim et al.'s work [27] and H.-J. Kim et al.'s work [28] require log₂D encryptions, where D is the data domain, whereas the proposed algorithm requires only one encryption for the comparison at C_B by utilizing the random value pool. Thus, the proposed algorithm saves encryption cost by utilizing the random value pool.
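The saving is mechanical: masking E(m) as E(m + r) normally costs a fresh Paillier encryption E(r) plus one ciphertext multiplication per use, and the encryption dominates. With a pool of (r, E(r)) pairs generated offline, the online cost drops to the single multiplication. A self-contained sketch on a toy Paillier instance (tiny primes, insecure; the pool size and helper names are assumptions of mine, not the paper's implementation):

```python
import random
from math import gcd

# Toy Paillier cryptosystem (tiny primes -- illustration only, NOT secure)
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = (p - 1) * (q - 1)
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# offline: C_A precomputes an encrypted random value pool of (r, E(r)) pairs
pool = []
for _ in range(16):
    r = random.randrange(1, n)
    pool.append((r, enc(r)))

def mask(c_m):
    # online: one ciphertext multiplication, no fresh encryption
    r, c_r = pool.pop()
    return (c_m * c_r) % n2, r   # E(m + r mod N); C_A keeps r to unmask later

c_masked, r_used = mask(enc(123))
recovered = (dec(c_masked) - r_used) % n   # what C_A recovers after C_B's step
```

Since m + r mod N is uniformly distributed and independent of m, C_B learns nothing by decrypting the masked value, yet C_A pays no online encryption cost.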
Impact of time complexity: To show that the proposed algorithm is more efficient than the existing ones, we analyze the time complexity of the proposed algorithm and of the existing algorithms (H.-I. Kim et al. [27] and H.-J. Kim et al. [28]). Because the number of data items (n), the number of dimensions (m), the bit length (l), and the number of Top-k results (k) affect the efficiency of Top-k query processing, we describe the time complexity in terms of these parameters. Table 6 shows the average-case and worst-case time complexities of the proposed algorithm and the existing ones [27,28]. The worst case occurs when a query must expand additional kd-tree nodes in the Top-k result refinement phase; the average case is measured over all queries, whether or not they require such expansion. In the average case, the proposed algorithm has a time complexity lower than the existing algorithms by m × (l − 1) + k × (l − 1) × log₂n, because it uses secure protocols based on decimal arithmetic operations rather than binary operations. The worst-case complexity is lower by the same term, for the same reason.

9. Conclusions and Future Work

In this paper, we proposed a new Top-k query processing algorithm that supports both security and efficiency in cloud computing. For security, we proposed new secure and efficient protocols based on arithmetic operations: ASC, ASRO, and ASPE. To reduce the query processing time, we proposed a privacy-preserving parallel Top-k query processing algorithm that utilizes a random value pool. In addition, we proved that the proposed algorithm is secure under the semi-honest adversarial model. In our performance analysis, the proposed algorithm outperformed the existing algorithms by 1.7∼6.6 times, since it uses new secure protocols with a streamlined process. Moreover, when the number of threads ranges from 2 to 10, the proposed parallel algorithm outperforms the existing algorithms by 1.5∼7.1 times, since it uses not only the new secure protocols but also the random value pool. For future work, we plan to apply the proposed secure protocols to various privacy-preserving data mining algorithms.

Author Contributions

Conceptualization, H.-J.K.; methodology, H.-J.K.; software, H.-J.K.; validation, H.-J.K., Y.-K.K., H.-J.L., J.-W.C.; formal analysis, H.-J.K., Y.-K.K., H.-J.L., J.-W.C.; investigation, H.-J.K., Y.-K.K., H.-J.L., J.-W.C.; resources, H.-J.K., Y.-K.K., H.-J.L., J.-W.C.; data curation, H.-J.K.; writing—original draft preparation, H.-J.K.; writing—review and editing, H.-J.K., Y.-K.K., H.-J.L., J.-W.C.; visualization, H.-J.K.; supervision, Y.-K.K., J.-W.C.; project administration, J.-W.C.; funding acquisition, J.-W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Korean government (MSIT) No. 2019R1I1A3A01058375. This paper was funded by Jeonbuk National University in 2020.

Acknowledgments

This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1I1A3A01058375). This paper was supported by research funds of Jeonbuk National University in 2020.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hayes, B. Cloud computing. Commun. ACM 2008, 51, 9–11. [Google Scholar] [CrossRef]
  2. Qian, L.; Luo, Z.; Du, Y.; Guo, L. Cloud computing: An overview. In Proceedings of the IEEE International Conference on Cloud Computing, Bangalore, India, 21–25 September 2009; pp. 626–631. [Google Scholar]
  3. Grolinger, K.; Higashino, W.A.; Tiwari, A.; Capretz, M.A. Data management in cloud environments: NoSQL and NewSQL data stores. J. Cloud Comput. Adv. Syst. Appl. 2013, 2, 1–24. [Google Scholar] [CrossRef]
  4. Zhao, L.; Sakr, S.; Liu, A.; Bouguettaya, A. Cloud Data Management; Springer: Berlin/Heidelberg, Germany, 2014; 189p. [Google Scholar]
  5. Agrawal, D.; Das, S.; Abbadi, A.E. Data management in the cloud: Challenges and opportunities. In Synthesis Lectures on Data Management; Springer: Nature/Cham, Switzerland, 2012; Volume 4, 138p. [Google Scholar]
  6. Sun, Y.; Zhang, J.; Xiong, Y.; Zhu, G. Data security and privacy in cloud computing. Int. J. Distrib. Sens. Netw. 2014, 10, 190903. [Google Scholar] [CrossRef]
  7. Sharma, Y.; Gupta, H.; Khatri, S.K. A security model for the enhancement of data privacy in cloud computing. In Proceedings of the 2019 Amity International Conference on Artificial Intelligence (AICAI), Dubai, United Arab Emirates, 4–6 February 2019; pp. 898–902. [Google Scholar]
  8. Garigipati, N.; Krishna, R.V. A study on data security and query privacy in cloud. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; pp. 337–341. [Google Scholar]
  9. Cao, N.; Yang, Z.; Wang, C.; Ren, K.; Lou, W. Privacy-preserving query over encrypted graph-structured data in cloud computing. In Proceedings of the 2011 31st International Conference on Distributed Computing Systems, Minneapolis, MN, USA, 20–24 June 2011; pp. 393–402. [Google Scholar]
  10. Islam, M.S.; Kuzu, M.; Kantarcioglu, M. Access pattern disclosure on searchable encryption: Ramification, attack and mitigation. In Proceedings of the 19th Annual Network and Distributed System Security Symposium, San Diego, CA, USA, 5–8 February 2012; p. 12. [Google Scholar]
  11. Williams, P.; Sion, R.; Carbunar, B. Building castles out of mud: Practical access pattern privacy and correctness on untrusted storage. In Proceedings of the 15th ACM Conference on Computer and Communications Security, Alexandria, VI, USA, 27–31 August 2008; pp. 139–148. [Google Scholar]
  12. Cui, S.; Belguith, S.; Zhang, M.; Asghar, M.R.; Russello, G. Preserving access pattern privacy in sgx-assisted encrypted search. In Proceedings of the 2018 27th International Conference on Computer Communication and Networks (ICCCN), Hangzhou, China, 30 July–2 August 2018; pp. 1–9. [Google Scholar]
  13. Yiu, M.L.; Ghinita, G.; Jensen, C.S.; Kalnis, P. Enabling search services on outsourced private spatial data. VLDB J. 2010, 19, 363–384. [Google Scholar] [CrossRef]
  14. Boldyreva, A.; Chenette, N.; Lee, Y.; Oneill, A. Order-preserving symmetric encryption. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 26–30 April 2009; pp. 224–241. [Google Scholar]
  15. Boldyreva, A.; Chenette, N.; O’Neill, A. Order-preserving encryption revisited: Improved security analysis and alternative solutions. In Proceedings of the Annual Cryptology Conference, Santa Barbara, CA, USA, 14–18 August 2011; pp. 578–595. [Google Scholar]
  16. Qi, Y.; Atallah, M.J. Efficient privacy-preserving k-nearest neighbor search. In Proceedings of the 28th International Conference on Distributed Computing Systems, Beijing, China, 17–20 June 2008; pp. 311–319. [Google Scholar]
  17. Shaneck, M.; Kim, Y.; Kumar, V. Privacy preserving nearest neighbor search. In Machine Learning in Cyber Trust; Springer: Berlin/Heidelberg, Germany, 2009; pp. 247–276. [Google Scholar]
  18. Vaidya, J.; Clifton, C. Privacy-preserving top-k queries. In Proceedings of the 21st International Conference on Data Engineering (ICDE’05), Tokyo, Japan, 5–8 April 2005; pp. 545–546. [Google Scholar]
  19. Elmehdwi, Y.; Samanthula, B.K.; Jiang, W. Secure k-nearest neighbor query over encrypted data in outsourced environments. In Proceedings of the 2014 IEEE 30th International Conference on Data Engineering, Chicago, IL, USA, 31 March–4 April 2014; pp. 664–675. [Google Scholar]
  20. Kim, H.J.; Kim, H.I.; Chang, J.W. A privacy-preserving kNN classification algorithm using Yao’s garbled circuit on cloud computing. In Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), Honolulu, HI, USA, 25–30 June 2017; pp. 766–769. [Google Scholar]
  21. Zhou, L.; Zhu, Y.; Castiglione, A. Efficient k-NN query over encrypted data in cloud with limited key-disclosure and offline data owner. Comput. Secur. 2017, 69, 84–96. [Google Scholar] [CrossRef]
  22. Kim, H.I.; Kim, H.J.; Chang, J.W. A secure kNN query processing algorithm using homomorphic encryption on outsourced database. Data Knowl. Eng. 2019, 123, 101602. [Google Scholar] [CrossRef]
  23. Sun, X.; Wang, X.; Xia, Z.; Fu, Z.; Li, T. Dynamic multi-keyword top-k ranked search over encrypted cloud data. Int. J. Secur. Its Appl. 2014, 8, 319–332. [Google Scholar] [CrossRef]
  24. Zhang, W.; Liu, S.; Xia, Z. A distributed privacy-preserving data aggregation scheme for smart grid with fine-grained access control. J. Inf. Secur. Appl. 2022, 66, 103118. [Google Scholar] [CrossRef]
  25. Hozhabr, M.; Asghari, P.; Javadi, H.H.S. Dynamic secure multi-keyword ranked search over encrypted cloud data. J. Inf. Secur. Appl. 2021, 61, 102902. [Google Scholar] [CrossRef]
  26. Ilyas, I.F.; Beskales, G.; Soliman, M.A. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 2008, 40, 1–58. [Google Scholar] [CrossRef] [Green Version]
  27. Kim, H.I.; Kim, H.J.; Chang, J.W. A privacy-preserving top-k query processing algorithm in the cloud computing. In Proceedings of the International Conference on the Economics of Grids, Clouds, Systems, and Services, Athens, Greece, 20–22 September 2016; pp. 277–292. [Google Scholar]
  28. Kim, H.J.; Chang, J.W. A new Top-k query processing algorithm to guarantee confidentiality of data and user queries on outsourced databases. Int. J. Syst. Assur. Eng. Manag. 2019, 10, 898–904. [Google Scholar] [CrossRef]
  29. Yao, A.C.C. How to generate and exchange secrets. In Proceedings of the 27th Annual Symposium on Foundations of Computer Science, Toronto, ON, Canada, 27–28 October 1986; pp. 162–167. [Google Scholar]
  30. Lindell, Y.; Pinkas, B. A proof of security of Yao’s protocol for two-party computation. J. Cryptol. 2009, 22, 161–188. [Google Scholar] [CrossRef]
  31. Bentley, J.L. Multidimensional binary search trees used for associative searching. Commun. ACM 1975, 18, 509–517. [Google Scholar] [CrossRef]
  32. Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Prague, Czech Republic, 2–6 May 1999; pp. 223–238. [Google Scholar]
  33. Lindell, Y. Secure multiparty computation for privacy preserving data mining. In Encyclopedia of Data Warehousing and Mining; IGI Global: Hershey, PA, USA, 2005; pp. 1005–1009. [Google Scholar]
  34. Hazay, C.; Lindell, Y. Efficient Secure Two-Party Protocols: Techniques and Constructions; Springer Science & Business Media: Berlin, Germany, 2010. [Google Scholar]
35. Cramer, R.; Damgård, I.B. Secure Multiparty Computation; Cambridge University Press: Cambridge, UK, 2015. [Google Scholar]
  36. Hazay, C.; Lindell, Y. A note on the relation between the definitions of security for semi-honest and malicious adversaries. Available online: https://eprint.iacr.org/2010/551.pdf (accessed on 6 September 2022).
  37. Veugen, T.; Blom, F.; de Hoogh, S.J.; Erkin, Z. Secure comparison protocols in the semi-honest model. IEEE J. Sel. Top. Signal Process. 2015, 9, 1217–1228. [Google Scholar] [CrossRef]
  38. Vaidya, J.; Clifton, C.W. Privacy-preserving kth element score over vertically partitioned data. IEEE Trans. Knowl. Data Eng. 2008, 21, 253–258. [Google Scholar] [CrossRef]
  39. Fagin, R. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci. 1999, 58, 83–99. [Google Scholar] [CrossRef]
40. Burkhart, M.; Dimitropoulos, X. Fast privacy-preserving top-k queries using secret sharing. In Proceedings of the 19th International Conference on Computer Communications and Networks, Zurich, Switzerland, 2–5 August 2010; pp. 1–7. [Google Scholar]
  41. Zheng, Y.; Lu, R.; Yang, X.; Shao, J. Achieving efficient and privacy-preserving top-k query over vertically distributed data sources. In Proceedings of the ICC 2019-2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6. [Google Scholar]
  42. Goldreich, O. Foundations of Cryptography: Volume 2, Basic Applications; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
43. Chess (King-Rook vs. King) Dataset. Available online: http://archive.ics.uci.edu/ml/datasets/Chess+%28King-Rook+vs.+King%29 (accessed on 7 July 2022).
  44. Kim, H.J.; Lee, H.; Kim, Y.K.; Chang, J.W. Privacy-preserving kNN query processing algorithms via secure two-party computation over encrypted database in cloud computing. J. Supercomput. 2022, 78, 9245–9284. [Google Scholar] [CrossRef]
  45. Kim, Y.K.; Kim, H.J.; Lee, H.; Chang, J.W. Privacy-preserving parallel kNN classification algorithm using index-based filtering in cloud computing. PLoS ONE 2022, 17, e0267908. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Encrypted database outsourcing model.
Figure 2. Example of the encrypted kd-tree.
Figure 3. Overlapping and non-overlapping between two regions in the ASRO protocol.
Figure 3. Overlapping and non-overlapping between two regions in the ASRO protocol.
Electronics 11 02870 g003
Figure 4. Overlapping between E( r a n g e 1 ) and E( r a n g e 2 ) .
Figure 5. Example of the node data search phase.
Figure 6. Example of the Top-k retrieval phase.
Figure 7. Example of the Top-k result refinement phase.
Figure 8. Processing time with varying levels of the kd-tree.
Figure 9. Processing time with a varying number of data items.
Figure 10. Processing time with varying k.
Figure 11. Processing time with a varying key size.
Figure 12. Processing time with a varying number of threads.
Figure 13. Processing time with a varying number of data items using 10 threads.
Figure 14. Processing time with varying k using 10 threads.
Figure 15. Processing time with varying levels of the kd-tree on the real dataset.
Figure 16. Processing time with varying k on the real dataset.
Figure 17. Processing time with a varying number of threads on the real dataset.
Table 1. Comparison of the existing studies.

| Scheme | Data Privacy | Query Privacy | Hiding Data Access Pattern | Index | Computation Overhead | Encryption | User Involvement in Computation | Security Risk |
|---|---|---|---|---|---|---|---|---|
| J. Vaidya and C. Clifton's work [38] | Support | Not support | Not support | None | Low | None | Involved | High |
| M. Burkhart and X. Dimitropoulos's work [40] | Support | Not support | Not support | None | Low | None | Involved | High |
| Y. Zheng et al. [41] | Support | Not support | Not support | None | Moderate | AES | Involved | High |
| H-I. Kim et al. [27] | Support | Support | Support | Encrypted kd-tree | Very high | Paillier | Not involved | Low |
| H-J. Kim et al. [28] | Support | Support | Not support | Encrypted kd-tree | High | Paillier | Not involved | Low |
| Proposed | Support | Support | Support | Encrypted kd-tree | Low | Paillier | Not involved | Low |
Table 2. Parameters for performance evaluation in a multi-core environment.

| Parameter | Values | Default |
|---|---|---|
| Number of data items (n) | 20 k, 40 k, 60 k, 80 k, 100 k | - |
| Level of kd-tree | 7, 8, 9, 10, 11 | 10 |
| Number of dimensions | 2, 3, 4, 5, 6 | 6 |
| Key size | 512, 1024 | - |
| Data domain (bit length) | 2²² | 2²² |
Table 3. Parameters for performance evaluation in a multi-core environment.

| Parameter | Values | Default |
|---|---|---|
| Number of data items (n) | 20 k, 40 k, 60 k, 80 k, 100 k | - |
| Number of k | 5, 10, 15, 20 | 10 |
| Number of threads | 2, 4, 6, 8, 10 | 10 |
| Key size | 1024 | - |
| Data domain (bit length) | 2²² | 2²² |
Table 4. Parameters for performance evaluation on the real dataset.

| Parameter | Values | Default |
|---|---|---|
| Number of data items (n) | 28,056 | - |
| Level of kd-tree | 5, 6, 7, 8, 9, 10, 11, 12 | 10 |
| Number of dimensions | 6 | 6 |
| Number of threads | 2, 4, 6, 8, 10 | 10 |
| Number of k | 5, 10, 15, 20 | 10 |
| Key size | 512 | - |
| Data domain (bit length) | 2¹² | 2¹² |
Table 5. Comparison of the number of encryptions according to the encrypted random value pool.

| Algorithm | SM Protocol | Secure Compare Protocol |
|---|---|---|
| H-I. Kim et al. [27] | 3 × E | log₂D × E |
| H-J. Kim et al. [28] | 3 × E | log₂D × E |
| Proposed algorithm | 1 × E | 1 × E |

E = Encryption, D = Data domain.
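The saving in Table 5 comes from replacing per-operation online encryptions with draws from a pool of pre-encrypted random values: because Paillier is additively homomorphic, multiplying a ciphertext by a pooled E(r) masks the plaintext without a fresh encryption. The sketch below is a minimal illustration of that idea, assuming textbook Paillier with deliberately tiny, insecure parameters; the `mask` helper and the pool layout are illustrative assumptions, not the paper's exact SM protocol.

```python
import random
from math import gcd

# Textbook Paillier with toy parameters -- illustration only, not secure.
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1)              # multiple of lcm(p-1, q-1), so decryption works

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Offline phase: build the encrypted random value pool in advance.
pool = [(r, encrypt(r)) for r in (random.randrange(n) for _ in range(16))]

# Online phase: masking a ciphertext needs zero fresh encryptions,
# since E(m) * E(r) mod n^2 = E(m + r) by additive homomorphism.
def mask(c):
    r, er = pool.pop()
    return (c * er) % n2, r

c_masked, r = mask(encrypt(42))
assert decrypt(c_masked) == (42 + r) % n
```

The expensive modular exponentiations thus move to an offline phase, which is what reduces the online count from 3 × E to 1 × E per secure multiplication in the table above.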
Table 6. Time complexity of the proposed and existing algorithms.

| Top-k Query Processing Algorithm | Average Time Complexity | Worst Time Complexity |
|---|---|---|
| H-I. Kim et al. [27] | O(m × l + n × m + k × (l × log₂n + n × m)) | O(m × l + n × m + k × (l × log₂n + n × m)) |
| H-J. Kim et al. [28] | O(m × l + n × m + k × (l × log₂n + n × m)) | O(m × l + n × m + k × (l × log₂n + n × m)) |
| Proposed algorithm | O(m + n × m + k × (log₂n + n × m)) | O(m + n × m + k × (log₂n + n × m)) |
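Both rows of Table 6 share the dominant n × m term, so the asymptotic gap lies only in the l-dependent terms. Plugging in sample parameters (a hypothetical reading using the evaluation defaults, m = 6 dimensions, l = 10 tree levels, k = 10) shows that the operation-count formulas alone predict a near-1 ratio; the measured 1.5~7.1× speedup therefore comes mainly from the cheaper per-operation constants, i.e., the reduced encryption counts of Table 5.

```python
from math import log2

# Evaluate the Table 6 cost expressions for sample parameters
# (n = data items, m = dimensions, l = kd-tree level, k = requested results).
def cost_existing(n, m=6, l=10, k=10):   # H-I. Kim [27] / H-J. Kim [28]
    return m * l + n * m + k * (l * log2(n) + n * m)

def cost_proposed(n, m=6, l=10, k=10):
    return m + n * m + k * (log2(n) + n * m)

for n in (20_000, 100_000):
    ratio = cost_existing(n) / cost_proposed(n)
    print(f"n={n}: existing/proposed operation-count ratio = {ratio:.4f}")
```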
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Kim, H.-J.; Kim, Y.-K.; Lee, H.-J.; Chang, J.-W. Privacy-Preserving Top-k Query Processing Algorithms Using Efficient Secure Protocols over Encrypted Database in Cloud Computing Environment. Electronics 2022, 11, 2870. https://doi.org/10.3390/electronics11182870

Chicago/Turabian Style

Kim, Hyeong-Jin, Yong-Ki Kim, Hyun-Jo Lee, and Jae-Woo Chang. 2022. "Privacy-Preserving Top-k Query Processing Algorithms Using Efficient Secure Protocols over Encrypted Database in Cloud Computing Environment" Electronics 11, no. 18: 2870. https://doi.org/10.3390/electronics11182870

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
