Privacy-Preserving Secure Computation of Skyline Query in Distributed Multi-Party Databases

Qaosar, Mahboob; Zaman, Asif; Siddique, Md. Anisuzzaman; Annisa; Morimoto, Yasuhiko

doi:10.3390/info10030119

Privacy-Preserving Secure Computation of Skyline Query in Distributed Multi-Party Databases^†

by Mahboob Qaosar ^1,2,*, Asif Zaman ², Md. Anisuzzaman Siddique ², Annisa ³ and Yasuhiko Morimoto ¹

¹ Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima 739-8527, Japan

² Department of Computer Science and Engineering, University of Rajshahi, Rajshahi 6205, Bangladesh

³ Department of Computer Science, Bogor Agricultural University, Bogor 1668, Indonesia

^* Author to whom correspondence should be addressed.

^† This paper is an extended version of our paper presented at the 12th International Conference on Advanced Data-Mining and Applications (ADMA 2016), Gold Coast, QLD, Australia, 12–15 December 2016. This version includes a more efficient implementation of our proposed method.

Information 2019, 10(3), 119; https://doi.org/10.3390/info10030119

Submission received: 21 February 2019 / Revised: 19 March 2019 / Accepted: 21 March 2019 / Published: 25 March 2019

Abstract

Selecting representative objects from a large-scale database is an essential task for understanding the database. A skyline query is one of the popular methods for selecting representative objects. It retrieves a set of non-dominated objects. In this paper, we consider a distributed algorithm for computing the skyline that is efficient enough to handle “big data”. While the importance of “big data” is widely recognized, its privacy must also be protected. In conventional distributed algorithms for computing a skyline query, the sensitive values of each object of a private database must be disclosed to another party for comparison. Therefore, the privacy of the objects is not preserved. Such disclosure of sensitive information is not allowed in modern privacy-aware computing environments. Recently, several privacy-preserving skyline computation frameworks have been introduced. However, most of them use computationally expensive secure comparison protocols for comparing homomorphically encrypted data. In this work, we propose a novel and efficient approach for computing the skyline in a secure multi-party computing environment without disclosing the individual attribute values of the objects. We use a secure multi-party sorting protocol, based on homomorphic encryption in the semi-honest adversary model, to transform each attribute value of the objects without changing their order on each attribute. To compute the skyline, we use the order of the objects on each attribute to determine the dominance relationships among the objects. The security analysis confirms that the proposed framework can achieve multi-party skyline computation without leaking the sensitive attribute values to others. In addition, our experimental results validate the effectiveness and scalability of the proposed privacy-preserving skyline computation framework.

Keywords:

secure skyline; homomorphic encryption; Paillier cryptosystem; information security; data-mining; data privacy; semi-honest adversary model; multi-party computation

1. Introduction

Data is an integral part of the current business and technology world. Every day, organizations produce massive amounts of data, also known as “big data”. The analysis of this “big data” has attracted much attention from organizations and researchers because it can assist in making strategic decisions and creating new knowledge. Product pricing for the open marketplace, investment risk estimation, mining customers’ spending/buying behaviors, credit card usage patterns, health issues, and so on are some common examples of big data analytics. Designing a new framework for collecting, storing, and analyzing this “big data” is undoubtedly a challenging task.

In the current IT era, multiple organizations that offer similar kinds of services want to perform analysis on their joint databases. This is often referred to as multi-party computation or analysis. Such analysis may involve data-mining, querying over the joint dataset, data classification, statistical decision making, etc. [1,2]. Since business applications contain sensitive data, such as personal health-related or financial data, unveiling these data can violate individual privacy and lead to significant financial loss for the organizations. Therefore, organizations do not want to disclose their data to anyone. However, when multiple organizations conduct a data-mining operation jointly, they want to obtain the result from the union of their databases without disclosing their sensitive data.

On the other hand, the skyline query is one of the popular methods for selecting representative objects from a large dataset. It retrieves a set of representative objects, each of which is not dominated by any other object within the database. For example, let us consider the issue of financial investment: an investor usually wants to purchase stocks that minimize both commission costs and predicted risks. As a result, the target can be formalized as finding the skyline stocks with minimal cost and minimal risk. Figure 1 shows a sample scatter plot of stock records along with their costs and risks. If we want to provide a suitable suggestion list for our clients using a skyline query, the result will be $\{U, O, P, X, Q, Y, Z\}$. From Figure 1, it is clear that no other object within the given sample dataset dominates any of those seven objects. Therefore, they are in the skyline result. The skyline query attracts consistent attention in database research due to its applications in decision making and analytics.

Like other data analysis applications, distributed skyline computation can benefit the participating organizations by producing the set of skyline objects from their joint database. However, such computation also depends on managing data security and privacy challenges, especially when the skyline is computed from distributed multi-party databases. So far, several algorithms have been proposed for skyline computation; some of them are designed for a distributed computing environment and are able to handle “big data” [3,4,5]. However, none of them considers database privacy issues.

Let us assume that several organizations have conducted surveys about commission cost and risk prediction, where each organization has collected the same kind of private information from its customers/clients. This information is sensitive, since protecting the privacy of client information is a vital responsibility of each organization. Therefore, no organization wants to disclose its dataset to another organization. Hence, an organization cannot compute the global skyline on the union of the organizations’ databases but can only compute the skyline of its own database, although all parties (organizations) want to obtain the skyline result from their combined databases. With conventional skyline computation algorithms, it is not possible to obtain the skyline query result without disclosing the objects’ attribute values to others.

Regarding the privacy of database objects in a distributed multi-party computation environment, most of the existing work on privacy-preserving skyline computation focuses on the secure comparison of encrypted values owned by the participating organizations [6,7,8,9]. Although these frameworks can preserve the privacy of the data objects, they are not computationally efficient. In our previous work [10], we introduced a MapReduce framework-based secure ordering of database objects on each attribute in a semi-honest computation environment. That work then computes the skyline by using the dominance relationship among the orders of the parties’ objects on each attribute. Although it is more efficient than secure comparison-based skyline queries, it requires several rounds of ID encryption and decryption by the individual parties on each attribute of the database objects to create the order of the objects. It also needs several rounds of data sorting by the coordinator on each dimension of the database objects. As a result, our previous work consumes a significant amount of time for preparing the secure object order on each attribute. We also used the MapReduce framework only for sorting numeric values. However, using the MapReduce framework just for object ordering is not worthwhile, since the framework itself requires a significant amount of time for inter-node communication and for managing process execution among multiple nodes.

In this work, we introduce an extension of [10] that can process the distributed object order more efficiently in a semi-honest computation environment while preserving the privacy of individual objects. In this extended work, we incorporate the Paillier cryptosystem [11] to transform the objects’ attribute values without changing the order of the objects on each attribute; each participating party securely prepares the encrypted object order on each attribute in collaboration with the other participating parties. The skyline is then computed from the order of the objects’ attribute values on each dimension, without obtaining the original attribute values of the objects.

The remaining part of this paper is organized as follows. Section 2 reviews the related work. Section 3 discusses the notions and basic properties of skyline and Paillier cryptosystem. We briefly explain our secure skyline computation problem and proposed system model in Section 4. In Section 5, we specify the detailed algorithm with proper examples and analysis. Next, we discuss the privacy and security of our proposed framework in Section 6. We experimentally explain the efficiency of our algorithms in Section 7 under a variety of settings. Finally, Section 8 concludes this work.

Throughout this paper, we have used the hexadecimal number system for describing our proposed algorithm.

2. Related Work

Our previous research [10], as well as the current work, is motivated by earlier studies of skyline query processing, secure multi-party computation, and privacy-preserving secure skyline computation. Section 2.1 focuses on the skyline query, and Section 2.2 discusses multi-party secure computation. Lastly, Section 2.3 highlights privacy-preserving secure skyline queries.

2.1. Skyline Query

Borzsonyi et al., who introduced the skyline operator, proposed three algorithms for computing the skyline of a large dataset: Block-Nested-Loops (BNL), Divide-and-Conquer (D&C), and B-tree-based schemes [12]. The BNL algorithm compares each object of the database with every other object and reports an object as a skyline object when no other object within the database dominates it. The D&C algorithm addresses the memory limitations of a system. It divides the large dataset into several partitions and computes the set of skyline objects for each partition by using a main-memory skyline algorithm. The skyline computation on the merged set of the skyline objects of each partition produces the final skyline. Later, Kossmann et al. improved the D&C algorithm and proposed the Nearest Neighbor (NN) algorithm, which prunes dominated objects efficiently by iteratively partitioning the data space based on the nearest objects in the domain space [13]. Similarly, Chomicki et al. improved BNL by presorting, known as Sort-Filter-Skyline (SFS) [14]. The current most efficient algorithm is Branch-and-Bound Skyline (BBS) [15,16], which is a progressive algorithm based on the Best-First Nearest Neighbor (BF-NN) algorithm proposed by Papadias et al. [17].

Presently, distributed skyline computation has become very popular. Balke et al. introduced skyline queries in distributed environments [18]. In their study, they presented several models for computing distributed skyline queries from vertically partitioned web information. Wang et al. and Chen et al. both studied skyline queries in structured P2P networks, named BATON networks, where each peer is responsible for a partial region of the data space [19,20]. Alternatively, a grid-based approach for distributed skyline processing (AGiDS) was proposed by Rocha-Junior et al. [21], assuming that each peer maintains a grid-based data summary structure describing its data distribution. Arefin et al. [22] worked on an agent-based privacy skyline-set for distributed databases, but their query is different from ours.

2.2. Multi-Party Secure Computation

The secure multi-party computation problem has a long history. Yao, who first introduced this problem, presented a secure function evaluation process [23]. The process allows a set $P = \{p_{1}, \dots, p_{m}\}$ of $m$ players/parties to compute an arbitrary agreed function of their private data. The function preserves the privacy of the data even if an adversary corrupts and controls some players/parties in various ways. After that, Goldreich, Micali, and Wigderson [24] and many others extended the research. According to Goldreich et al. [24], security in multi-party computation means that the parties’ data remain secret except for the intended results of the computation. In general, generic secure multi-party computation protocols are less efficient than special-purpose protocols.

Privacy-preserving data-mining is another example of a secure multi-party computation problem that has been addressed in the literature. Lindell et al. and Agrawal et al. proposed two different privacy-preserving data-mining approaches [1,25]. Lindell defines the problem for two parties, each of which has a nonpublic database; the parties want to conduct a data-mining operation jointly on the union of their databases without disclosing their databases to each other or to any third party. In Agrawal’s paper, the problem is defined differently, assuming two parties, Alice and Bob. The problem is to allow Alice to conduct a data-mining operation on a private database owned by Bob, where Bob wants to prevent Alice from accessing precise information in individual data records. Although the problems are quite similar, the solutions proposed by Lindell and Agrawal are different: Lindell and Pinkas adopted secure multi-party computation protocols to solve their problem, while Agrawal applied the data perturbation method.

Most of the existing solutions use homomorphic encryption for secure comparison [26,27,28], although these protocols are highly expensive in terms of computation and communication complexity [29]. Lin et al. introduced an efficient comparison protocol based on homomorphic encryption [30]. They improved the secure comparison protocol so that two secret values are compared in two rounds of data communication between two participating parties. However, this protocol is limited to comparing secure attribute values owned by two parties, and it is not scalable.

Besides that, several multi-party computation tasks can be performed over the sorting order of the objects’ attributes, such as skyline computation, querying with aggregation functions, statistical analysis, and so on [10,31,32,33]. The oblivious radix sort, proposed by Hamada et al. [31], is a well-known protocol for sorting privacy-preserving multi-party objects. However, it demands multiple rounds of computation and communication between the participating parties for sorting multi-party objects based on attribute value. Recently, Xin et al. also proposed a solution for the secure multi-party sorting problem [34]. However, their protocol is based on the assumption that the attribute values are elements of a universal set known by all participating parties, and the computational complexity of the protocol becomes high when the universal set is large.

2.3. Secure Skyline Query

Due to the growing awareness of information privacy and security, privacy-preserving secure data analysis is considered to be one of the major research areas in “big data” processing. The privacy-preserving secure skyline query is also being studied for multi-criteria data analysis from different application perspectives. Liu et al. proposed secure skyline queries on a cloud platform [7]. Hua et al. proposed another privacy-preserving skyline computation model, called CINEMA [8], which computes the skyline based on the user’s dynamic query. Their framework keeps the user’s dynamic query point private and keeps the database objects secret from the users, so that the users cannot access the secure database objects and the database owner cannot obtain the user’s query point during computation. Although their proposed model provides a secure computation environment with respect to data privacy, their setting is entirely different from ours. Moreover, both models involve computationally expensive secure comparison protocols. Liu et al. integrate the secure comparison and secure bit-decomposition protocols proposed by Veugen et al. [27] and Samanthula et al. [35], whereas Hua et al. reduce the communication overhead of secure comparison by using the 0-encoding and 1-encoding scheme proposed by Lin et al. [30].

Liu et al. proposed another privacy-preserving skyline computation framework [6], which can be deployed on a multi-party computation platform. They improved the efficiency of multi-party secure skyline computation by using a secure comparison protocol based on the 0-encoding and 1-encoding scheme proposed in [30] and the Lightweight Additive Homomorphic Public Key Encryption (LAHE) scheme. They also reduced the number of secure comparisons by using the additivity property of the skyline [36]. In their framework, each party first computes its local skyline object set. Then the global skyline object set is computed by using secure dominance relationship computation among each party’s local skyline objects. However, their framework relies on pairwise secure skyline computation to compute the global skyline, so the computational complexity increases rapidly with the number of participating parties. Moreover, the complexity of the 0-encoding and 1-encoding scheme used by their framework for comparing two private attribute values increases with the bit-length of the attribute values.

Recently, Liu et al. also proposed a new framework for privacy-preserving user-centric dynamic skyline queries over multi-party databases, called PUSC [9]. Although it is a new framework for dynamic skyline queries over distributed multi-party databases, it is not efficient enough, since it requires a long execution time due to the complexity of the different protocols integrated into the computation process. In addition, the skyline computation time of PUSC increases with the total number of encrypted data objects supplied by the data providers.

Besides that, our previous work introduced a skyline computation framework based on secure object ordering [10]. In this framework, the participating parties jointly construct the order of their database objects in collaboration with a semi-honest third party, called the coordinator. It requires several rounds of ID encryption and decryption by individual parties and several rounds of data sorting by the coordinator to generate the objects’ order securely on each attribute. As a result, our previous work consumes significant time for preparing a secure object order. We also deployed the MapReduce framework for sorting the numeric values there. However, employing the MapReduce framework just for sorting values does not improve the efficiency of the computation, since the framework requires significant time for inter-node communication and for controlling task execution across multiple computing nodes.

3. Preliminaries

This section defines related properties of the proposed algorithm.

3.1. Dominance and Skyline

Given a dataset $DS$ with $d$ dimensions $\{d_{1}, d_{2}, \dots, d_{d}\}$ and $n$ objects $\{O_{1}, O_{2}, \dots, O_{n}\}$, we use $O_{i}.d_{j}$ to denote the $j$-th dimension value of object $O_{i}$. We assume that the smaller value in each attribute is better, without loss of generality.

Dominance: An object $O_{i} \in DS$ is said to dominate another object $O_{j} \in DS$, denoted as $O_{i} \prec O_{j}$, if $O_{i}.d_{r} \leq O_{j}.d_{r}$ ($1 \leq r \leq d$) for all $d$ dimensions and $O_{i}.d_{t} < O_{j}.d_{t}$ ($1 \leq t \leq d$) for at least one dimension. We call such $O_{i}$ the dominant object and such $O_{j}$ the dominated object between $O_{i}$ and $O_{j}$. For example, in Figure 2 object W is dominated by object P.

Skyline: An object $O_{i} \in DS$ is said to be a skyline object of $DS$ if and only if there is no object $O_{j} \in DS$ ($j \neq i$) that dominates $O_{i}$. The skyline of $DS$, denoted by $Sky(DS)$, is the set of skyline objects in $DS$. For the dataset shown in Figure 2, objects $\{U, O, P, X, Q, Y, Z\}$ are not dominated by any other objects. Thus, the skyline query retrieves $Sky(DS) = \{U, O, P, X, Q, Y, Z\}$.
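The two definitions above can be sketched in a few lines of Python. This is only an illustrative quadratic scan, not the algorithm proposed in this paper; the object names and (cost, risk) values are hypothetical and are not the coordinates of Figure 2.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a dominates b: a <= b on every dimension and a < b on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(objects: Dict[str, Point]) -> List[str]:
    """Names of the objects that no other object dominates (pairwise scan)."""
    return sorted(
        name
        for name, p in objects.items()
        if not any(dominates(q, p) for other, q in objects.items() if other != name)
    )


# Hypothetical (cost, risk) pairs; smaller is better on both dimensions.
stocks = {"A": (1, 9), "B": (3, 4), "C": (4, 6), "D": (7, 2), "E": (8, 8)}
result = skyline(stocks)
print(result)
```

Here C and E are both dominated by B, so only A, B, and D remain in the skyline.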

Additivity of Skyline Computation [36]: Given a dataset $DS$ and $p$ datasets such that $DS = DS_{1} \cup \dots \cup DS_{p}$, the following equation holds: $Sky(DS) = Sky(Sky(DS_{1}) \cup \dots \cup Sky(DS_{p}))$. In Figure 2, if we consider that the red bubbles represent the objects of $DS_{1}$ and the green squares represent the objects of $DS_{2}$, then the skyline object sets of $DS_{1}$ and $DS_{2}$ are given by $Sky(DS_{1}) = \{M, N, O, P, Q, R, S\}$ and $Sky(DS_{2}) = \{U, V, W, X, Y, Z\}$. However, the skyline object set of the union dataset is given by $Sky(DS) = \{U, O, P, X, Q, Y, Z\}$, where $\{O, P, Q\} \subseteq Sky(DS_{1})$ and $\{U, X, Y, Z\} \subseteq Sky(DS_{2})$.
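The additivity property can be checked on a small example. The sketch below uses two hypothetical two-dimensional datasets (not the data of Figure 2) and compares the skyline of the union with the skyline of the merged local skylines.

```python
def dominates(a, b):
    """a dominates b: no worse on every dimension, strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Set of points that are not dominated by any other point in the set."""
    return {p for p in points if not any(dominates(q, p) for q in points if q != p)}


# Hypothetical local datasets of two parties.
ds1 = {(1, 9), (3, 4), (4, 6), (9, 3)}
ds2 = {(2, 7), (5, 5), (7, 2), (8, 8)}

global_direct = skyline(ds1 | ds2)                    # Sky(DS)
global_merged = skyline(skyline(ds1) | skyline(ds2))  # Sky(Sky(DS1) ∪ Sky(DS2))
print(global_direct == global_merged)
```

Note that (9, 3) and (5, 5) are local skyline objects but are dominated once the two local skylines are merged, which is why a second skyline pass over the merged set is still required.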

3.2. Paillier Cryptosystem

In our proposed approach, we use the Paillier cryptosystem, which is a probabilistic asymmetric algorithm for public key cryptography [11]. In the Paillier cryptosystem, the public key and the private key each consist of two integers, where the public key is given by $Paillier_{pk}(n, g)$ and the private key is given by $Paillier_{sk}(λ, μ)$. The scheme is an additively homomorphic encryption scheme; this means that, given the public key and the encryptions of plain messages $m_{1}$ and $m_{2}$, one can compute the encryption of $m_{1} + m_{2}$.

Let us consider two plain messages $m_{1}$ and $m_{2}$ and their corresponding cipher messages $ζ_{1}$ and $ζ_{2}$, where $ζ_{1} = En(m_{1}, Paillier_{pk})$ and $ζ_{2} = En(m_{2}, Paillier_{pk})$.

Then, the following equations give the homomorphic addition and multiplication properties of the Paillier cryptosystem.

Homomorphic Addition

$(ζ_{1} \times ζ_{2}) \mod n^{2} = En((m_{1} + m_{2}) \mod n, Paillier_{pk})$

Homomorphic Multiplication

$ζ_{1}^{k} \mod n^{2} = En((k \times m_{1}) \mod n, Paillier_{pk})$

In the above equations, $n$ is part of the Paillier public key and $k$ is a positive integer constant.
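Both properties can be verified with a toy implementation. The following is a minimal sketch, not the implementation used in our experiments: the function names are hypothetical, it assumes the common simplification $g = n + 1$, and the two primes are far too small to be secure. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy key generation from two primes, assuming g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lambda mod n^2) = lambda mod n
    return (n, n + 1), (lam, mu)


def encrypt(m: int, pk) -> int:
    n, g = pk
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1  # fresh randomness makes encryption probabilistic
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(c: int, pk, sk) -> int:
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


# Two Mersenne primes, chosen only for illustration.
pk, sk = keygen(2**31 - 1, 2**61 - 1)
n2 = pk[0] ** 2

m1, m2, k = 0x2A, 0x15, 3
c1, c2 = encrypt(m1, pk), encrypt(m2, pk)

sum_plain = decrypt((c1 * c2) % n2, pk, sk)     # homomorphic addition
scaled_plain = decrypt(pow(c1, k, n2), pk, sk)  # multiplication by the constant k
print(hex(sum_plain), hex(scaled_plain))
```

Multiplying two ciphertexts adds the underlying plaintexts, and raising a ciphertext to the power $k$ multiplies the plaintext by $k$; neither operation needs the private key.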

4. Multi-Party Secure Skyline Computation Problem and Proposed System Model

In this section, we formalize privacy-preserving multi-party secure skyline computation problem and our proposed system model.

4.1. Multi-Party Secure Skyline Problem

Let us consider a situation where several organizations have conducted surveys about commission cost and risk prediction. We assume that each organization has collected similar private information from its customers. Also, assume that all the organizations have computed the local skyline of their private datasets. Now each organization wants to find the resultant skyline from the union of these local skyline results, which we also refer to as the organizations’ global database. However, none of them is allowed to disclose the attribute values of its database objects to other organizations. We call the organizations participating in the skyline computation parties. Due to the additivity property of skyline computation, the skyline computed from the union of all parties’ datasets is equal to the skyline computed from the merged results of the individual skylines.

To simplify the problem, we fix the number of participating parties at two. They are denoted as $DataNode^{1}$ and $DataNode^{2}$, respectively. To describe the proposed algorithm, assume that Figure 2 represents the union dataset of these two parties, where the “Green Square” symbol marks objects that come from $DataNode^{1}$ and the “Red Circle” symbol marks objects that come from $DataNode^{2}$. Table 1 and Table 2 represent the two-dimensional secure skyline object sets of $DataNode^{1}$ and $DataNode^{2}$.

4.2. System Model

In our proposed system model, we introduce a procedure for computing the skyline from secure multi-party databases in an efficient and privacy-preserving way. Like some existing models of privacy-preserving multi-party computation [7,8,9,10], we adopt the semi-honest adversary model, as defined in [37], and include a semi-honest third party, called the coordinator, which is trusted by all participating parties. We consider the coordinator to be honest-but-curious. Specifically, all participating parties and the coordinator strictly execute the protocol but may try to extract private data from the computation. Therefore, no participating party exposes its objects directly to the coordinator or to other participating parties. Instead, all parties securely transform their objects’ attribute values without changing their order on each dimension, and the coordinator computes the multi-party skyline object set from the order of the objects’ attribute values. The detailed process of this skyline query, which uses the order of the attributes rather than the actual attribute values, can be found in [32]. The sorting order generation process needs to be secure enough that the coordinator obtains nothing other than the relative order of the objects’ secret attribute values on each dimension. The proposed framework also needs to ensure that the participating parties are unable to guess the secret attribute values of another party’s objects during computation. Therefore, the transformed order information needs to be kept secret from all participating parties. For this purpose, we use the Paillier cryptosystem and its properties to transform the objects’ attribute values. As a semi-honest model, our proposed framework implicitly assumes that there is no collusion between the coordinator and any corrupted parties.

5. Privacy-Preserving Multi-Party Secure Skyline Computation Algorithm

In this section, we provide details of the proposed algorithm. It consists of eight steps.

(1) Local skyline computation.
(2) Fix the bit-slice length and maximum bit-length of substitute vector element.
(3) Paillier key-pair generation.
(4) Generate and share the encrypted substitute vectors.
(5) Combine the encrypted substitute vectors.
(6) Encrypt the object order and resultant dataset generation.
(7) Decrypt the objects order and global skyline computation.
(8) Qualified global skyline objects identification.

Figure 3 shows the simplified block diagram of our proposed privacy-preserving skyline computation model, which uses one coordinator and $p$ participating parties. Each $V^{m}$ represents the substitute vector generated by $DataNode^{m}$, and $En(V^{m})$ represents the encrypted substitute vector of $V^{m}$, where each element of $V^{m}$ is encrypted using the Paillier public key.

5.1. Local Skyline Computation

Due to the additivity property of skyline computation, each global skyline object must be a member of at least one of the local skyline object sets of the participating parties. Therefore, each participating party initially computes its local skyline object set from its secure private dataset before the global skyline is computed. Computing the local skyline first reduces the risk of database disclosure through the coordinator’s analysis of the objects’ attribute order information. It also reduces the complexity of computing the skyline from the combined large database of the multi-party objects’ attribute orders.

5.2. Fix the Bit-Slice Length and Maximum Bit-Length of Substitute Vector Element

We acknowledge that the objects’ attribute values can be significantly large. Therefore, we need to split each attribute value into multiple slices and substitute each slice with a substitute vector element. These substitute vectors replace the attribute values without changing their order. We also need to keep the vector size within the available memory capacity during computation. For example, if the attribute value ranges from 0 to $(2^{32} - 1)$, it is not feasible to create a single vector of length $2^{32}$ for replacing the attribute value. However, it is possible to use three substitute vectors of length $2^{11}$ to substitute the attribute value without changing the sorting order. For this reason, at the beginning of our proposed framework, all participating parties mutually fix the bit-slice lengths for splitting the attribute values of each dimension, so that each slice can be replaced by a substitute vector element. After that, the framework generates a separate object order on each attribute. The parties also mutually fix the maximum bit-length of the substitute vector elements. The maximum bit-length of a substitute vector element must be greater than the corresponding bit-slice length. It is also essential that the bit-slice length be long enough: when the bit-slice length is small, the coordinator may try to infer the actual attribute values by analyzing the frequency of bit patterns in the transformed values of the objects’ attributes.
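As a rough illustration of the slicing step, the sketch below splits a 32-bit value into three 11-bit slices, least significant slice first. The function name and the sample value are hypothetical; each slice then indexes a vector of only $2^{11}$ entries instead of one vector of $2^{32}$ entries.

```python
def split_into_slices(value: int, slice_lengths):
    """Split a non-negative integer into bit slices, least significant slice first."""
    slices = []
    for n_bits in slice_lengths:
        slices.append(value & ((1 << n_bits) - 1))  # keep the lowest n_bits bits
        value >>= n_bits                            # move on to the next slice
    if value:
        raise ValueError("value does not fit into the given slice lengths")
    return slices


lengths = [11, 11, 11]  # three slices cover up to 33 bits
value = 0xDEADBEEF      # arbitrary 32-bit sample value
parts = split_into_slices(value, lengths)

# Each slice is an index into a substitute vector of length 2**11.
print([hex(s) for s in parts])
```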

Our proposed algorithm uses the most straightforward way of fixing the bit-slice length and the maximum bit-length of the substitute vector elements, without involving the coordinator. First, each participating party recommends a bit-slice length and a maximum bit-length and shares them with the other parties. Then, each participating party computes the rounded-up integer average of all participating parties’ recommendations. All participating parties must use this rounded-up integer average bit-slice length and maximum bit-length of the corresponding vector element when generating the encrypted substitute vectors.
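The rounded-up integer average can be sketched as follows. The recommendation values are hypothetical and cover the two slices of a single attribute; they are not the entries of the tables in this paper.

```python
def common_lengths(recommendations):
    """Element-wise rounded-up integer average of the parties' recommendations."""
    parties = len(recommendations)
    # -(-s // parties) is integer ceiling division, which avoids floating point.
    return [-(-sum(column) // parties) for column in zip(*recommendations)]


# Hypothetical recommendations of two parties, listed as [slice 0, slice 1].
n_recs = [[4, 4], [5, 4]]   # bit-slice lengths
r_recs = [[8, 9], [9, 10]]  # maximum bit-lengths of substitute vector elements

common_n = common_lengths(n_recs)
common_r = common_lengths(r_recs)
print(common_n, common_r)
```

Because every party applies the same deterministic rule to the same shared recommendations, all parties arrive at identical lengths without any further communication.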

Assume that the recommendations of two participating parties are shown in Table 3 and Table 4, respectively, for generating encrypted substitute vectors to substitute their two-dimensional integer dataset. Here, each $N_{i,j}$ indicates the bit-slice length of the $i^{th}$ attribute and $j^{th}$ slice, where $j$ is indexed from the least significant bit slice to the most significant bit slice of the corresponding attribute value. Similarly, each $R_{i,j}$ indicates the maximum bit-length of the substitute vector element for the $i^{th}$ attribute and $j^{th}$ slice. Table 5 presents the computed common bit-slice lengths for splitting the attribute values and the common maximum bit-lengths of the corresponding vector elements.

Although we have considered 8-bit integer attribute values for our running example and 4 or 5-bit bit-slice length for splitting the attribute value, in the real experiment, we have examined our proposed protocol for 32-bit integer attribute value and bit-slice length higher than 10.

5.3. Paillier Key-Pair Generation

The coordinator generates a Paillier public key, $Paillier_{pk}(n, g)$, for data encryption and a private key, $Paillier_{sk}(λ, μ)$, for data decryption. The Paillier key construction process is explained in detail in [11]. After generating the key pair, the coordinator shares the public key with all participating parties.
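
As an illustration only, a toy Paillier key pair with the components $(n, g)$ and $(λ, μ)$ can be sketched in Python as follows. The sketch assumes the common simplification $g = n + 1$ and uses two small hard-coded primes, which offer no security. All function names are hypothetical, and the actual construction is the one described in [11].

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    g = n + 1  # assumed simplification; other choices of g are possible
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    # mu is the inverse of L(g^lam mod n^2) modulo n, with L(x) = (x - 1) // n.
    # pow(x, -1, n) needs Python 3.8 or later.
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(m, pk):
    """Encrypt m with fresh randomness r that is coprime to n."""
    n, g = pk
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(c, pk, sk):
    n, _ = pk
    lam, mu = sk
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add(c1, c2, pk):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    n, _ = pk
    return (c1 * c2) % (n * n)


pk, sk = keygen(1009, 1013)  # toy primes, far too small for real use
total = add(encrypt(0xC2, pk), encrypt(0xDA, pk), pk)
print(hex(decrypt(total, pk, sk)))
```

Only the holder of $(λ, μ)$ can run `decrypt`, which mirrors the role of the coordinator, while any holder of the public key can run `encrypt` and `add`.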

5.4. Generate and Share the Encrypted Substitute Vectors

To conceal the actual attribute values from the coordinator, each participating party generates $2^{N_{i, j}}$ unique random values between 0 and $(2^{R_{i, j}} - 1)$ for substituting the $j^{th}$ slice of the $i^{th}$ dimension. Each participating party $DataNode^{m}$ then sorts the generated random values into a vector table, $V_{i, j}^{m}$. After that, each element of the sorted vector table is multiplied by $2^{K_{i, j}}$, except for the vector table constructed for the least significant bit-slice of each attribute (i.e., $j = 0$). The value of $K_{i, j}$ can be computed using the following equation.

K_{i, j} = \sum_{l = 0}^{j - 1} R_{i, l}

After multiplying by $2^{K_{i, j}}$, the participating parties encrypt each element of their generated vector table using the Paillier public key, $Paillier_{pk}$, to construct the encrypted substitute vector table, $ρ_{i, j}^{m}$. Assume that all parties have decided to construct an encrypted substitute vector table for substituting the $j^{th}$ slice of the $i^{th}$ dimension of a dataset, where the bit-slice length is $N_{i, j} = 4$ and the maximum bit-length is $R_{i, j} = 8$. The construction of the encrypted substitute vector $ρ_{i, j}^{1}$ for $DataNode^{1}$ is described in Table 6.

In this way, all participating parties generate encrypted substitute vectors for all attributes and slices according to Table 5. After generating the encrypted substitute vectors, each participating party shares its vectors with the other parties, but not with the coordinator. Paillier encryption hides the values of the sorted vector elements from the other participating parties while the vectors are shared among them. It also allows homomorphic addition of the encrypted vector elements and multiplication of an encrypted element by a plaintext constant.
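
The plaintext part of this step can be sketched in a few lines of Python. The sketch below covers only the generation, sorting, and scaling of one substitute vector and omits the Paillier encryption of each element. The function names and the use of Python's `random` module are assumptions for illustration; a real deployment would need a cryptographically secure random source.

```python
import random


def offset_bits(r_lengths, j):
    """K_{i,j}: the sum of R_{i,l} over all lower slices l < j."""
    return sum(r_lengths[:j])


def substitute_vector(n_bits, r_bits, k_bits, rng=random):
    """Sorted vector of 2^N unique values below 2^R, each multiplied by 2^K."""
    values = sorted(rng.sample(range(2 ** r_bits), 2 ** n_bits))
    return [v << k_bits for v in values]  # v << K equals v * 2^K


# Attribute 1 of Table 5: N = 4 and R = 8 for both slices.
r_lengths = [8, 8]
low_slice = substitute_vector(4, 8, offset_bits(r_lengths, 0))
high_slice = substitute_vector(4, 8, offset_bits(r_lengths, 1))
print(len(low_slice), len(high_slice))
```

Sorting before scaling keeps each vector strictly increasing in its index, which is the property the later order-preserving transformation relies on.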

5.5. Combine the Encrypted Substitute Vectors

After receiving the encrypted substitute vectors from all participating parties, each party adds all the encrypted substitute vectors supplied by the individual parties, using the homomorphic addition property, to obtain the final combined encrypted substitute vectors. Table 7 illustrates this process for two participating parties and $N_{i, j} = 4$.

5.6. Encrypt the Object Order and Resultant Dataset Generation

All participating parties split each attribute value of their local skyline objects according to the predetermined bit-slice lengths, which are shown in Table 5 for our running example. Each split value is used as an index into the combined encrypted vector for the corresponding attribute and slice. Finally, the encrypted vector elements selected for each attribute value are added together using homomorphic addition to generate the encrypted order value of the object on that attribute. For self-blinding, each party also adds an encryption of 0 to the encrypted order value.

Consider that both parties agreed to split the $i^{th}$ attribute value into $S_{i}$ slices, and let $σ_{i, 0}, \dots, σ_{i, S_{i} - 1}$ represent the encrypted vector elements corresponding to the split pieces of that attribute value. Then the encrypted object order $δ_{i}$ can be computed from the encrypted substitute vector elements by the following equation:

δ_{i} = \sum_{j = 0}^{S_{i} - 1} σ_{i, j} + En(0, Paillier_{pk})
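
To see why this sum preserves the order of the original values, the transformation can be simulated on plaintext, that is, on the values the coordinator would obtain after decryption. The following Python sketch is such a simulation under stated assumptions: two parties, the attribute-1 parameters of Table 5, ordinary integer addition in place of homomorphic addition, and hypothetical function names. The added encryption of 0 contributes nothing to the plaintext and is therefore omitted.

```python
import random

N_BITS = [4, 4]  # bit-slice lengths N_{1,0}, N_{1,1} from Table 5
R_BITS = [8, 8]  # maximum bit-lengths R_{1,0}, R_{1,1} from Table 5


def party_vectors(rng):
    """One party's substitute vectors, already multiplied by 2^K per slice."""
    vectors, k = [], 0
    for n_bits, r_bits in zip(N_BITS, R_BITS):
        values = sorted(rng.sample(range(2 ** r_bits), 2 ** n_bits))
        vectors.append([v << k for v in values])
        k += r_bits  # K for the next slice is the running sum of R
    return vectors


def combine(all_parties):
    """Element-wise sum over parties (homomorphic addition in the protocol)."""
    return [[sum(col) for col in zip(*slices)] for slices in zip(*all_parties)]


def transform(value, combined):
    """Split value into slices, least significant first, and add the entries."""
    total = 0
    for n_bits, vector in zip(N_BITS, combined):
        total += vector[value & ((1 << n_bits) - 1)]
        value >>= n_bits
    return total


rng = random.Random(2019)  # fixed seed only to make the sketch repeatable
combined = combine([party_vectors(rng), party_vectors(rng)])
order = [transform(v, combined) for v in range(256)]
print(all(a < b for a, b in zip(order, order[1:])))
```

The order is preserved because a step up in a higher slice changes the sum by at least $2^{K_{i, j}}$ per party, while all lower slices together contribute less than $2^{K_{i, j}}$ per party.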

The coordinator could infer the identity of an individual skyline object by identifying its provider. To avoid this situation, the individual parties do not send the attribute orders of their locally computed skyline objects to the coordinator separately. In this regard, each party also anonymizes the IDs of its local skyline objects as follows: (1) each party appends redundant bits to its local skyline objects’ IDs using a CRC scheme [38]; (2) the IDs with the padded CRC bits are then encrypted with the party’s symmetric encryption key. Let $α$ be the original $ID$ of a local skyline object belonging to $DataNode^{i}$, and let $DES_{i}$ be the symmetric encryption key of $DataNode^{i}$. If $id_{α}$ represents the encrypted $ID$ of that object, then $id_{α}$ can be computed using the following equation:

id_{α} = En((α ∥ CRC(α)), DES_{i})

Table 8 and Table 9 describe the encrypted ordering sequence generation process of $DataNode^{1}$ and $DataNode^{2}$, respectively.

Finally, all participating parties send the encrypted orders of their local skyline objects on each attribute, along with the encrypted $ID$s, to a common participating party. This party is responsible for merging the encrypted orders of all local skyline objects on each attribute. It then sends the merged set of encrypted local skyline object orders to the coordinator.

5.7. Decrypt the Objects Order and Global Skyline Computation

After receiving the dataset with the encrypted disguised order of the local skyline objects on each attribute, the coordinator decrypts it using the Paillier private key, $Paillier_{sk}$, and obtains the transformed values of the local skyline objects, which preserve their relative order.

Table 10 illustrates the sample database with the encrypted data obtained from the individual parties, together with the transformed order values of the objects’ attributes on each dimension after decryption. Each value in column $θ_{i}$, for $i = 1, 2$, is obtained by decrypting the corresponding encrypted value in column $δ_{i}$. This process can be represented by the following equation:

θ_{i} = De(δ_{i}, Paillier_{sk})

Here we discuss the procedure for obtaining $θ_{1, id_{M}} = 4C9C_{16}$ for $id_{M}$, where the original attribute value is $2D_{16}$. Assume that the substitute vector elements $V_{1, 0, D}^{1}$ and $V_{1, 0, D}^{2}$ for the hexadecimal value $D_{16}$, generated by $DataNode^{1}$ and $DataNode^{2}$, are $C2_{16}$ and $DA_{16}$, respectively. Similarly, $V_{1, 1, 2}^{1} = 1D_{16}$ and $V_{1, 1, 2}^{2} = 2E_{16}$ for $2_{16}$.

After encrypting with $Paillier_{pk}$, $DataNode^{1}$ and $DataNode^{2}$ obtain $ρ_{1, 0, D}^{1} = En(C2_{16})$ and $ρ_{1, 0, D}^{2} = En(DA_{16})$. Therefore, using the homomorphic addition property, both parties can obtain the combined encrypted substitute vector element for $D_{16}$ as $ξ_{1, 0, D} = ρ_{1, 0, D}^{1} + ρ_{1, 0, D}^{2} = En(C2_{16}) + En(DA_{16}) = En(19C_{16})$.

For our running example, $K_{1, 1} = 8$. Hence, $DataNode^{1}$ computes $ρ_{1, 1, 2}^{1} = En(2^{K_{1, 1}} \times V_{1, 1, 2}^{1}) = En(2^{8} \times 1D_{16}) = En(1D00_{16})$. Using the same equation, $DataNode^{2}$ computes $ρ_{1, 1, 2}^{2} = En(2E00_{16})$. Proceeding in the same way as for the combined encrypted substitute vector element for $D_{16}$, both parties obtain $ξ_{1, 1, 2} = ρ_{1, 1, 2}^{1} + ρ_{1, 1, 2}^{2} = En(4B00_{16})$ for $2_{16}$.

Finally, by adding the encrypted substitute vector elements for the original attribute value $2D_{16}$, $DataNode^{1}$ can produce the encrypted order value as $δ_{1, M} = ξ_{1, 0, D} + ξ_{1, 1, 2} = En(19C_{16}) + En(4B00_{16}) = En(4C9C_{16})$.
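
The hexadecimal arithmetic of this worked example can be verified on the plaintext values with a few lines of Python. The variable names below are chosen for illustration only, and the encryption layer is left out because homomorphic addition acts as ordinary addition on the underlying plaintexts.

```python
# Substitute vector elements assumed in the running example.
v1_low, v2_low = 0xC2, 0xDA    # entries for the low slice D, parties 1 and 2
v1_high, v2_high = 0x1D, 0x2E  # entries for the high slice 2, parties 1 and 2
K_1_1 = 8                      # K_{1,1} = R_{1,0} = 8 from Table 5

xi_low = v1_low + v2_low                              # plaintext of xi_{1,0,D}
xi_high = (v1_high << K_1_1) + (v2_high << K_1_1)     # plaintext of xi_{1,1,2}
theta = xi_low + xi_high                              # plaintext of delta_{1,M}

print(hex(xi_low), hex(xi_high), hex(theta))
```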

After decryption, the coordinator uses the object order on each attribute to compute the global skyline query. From Table 10, we observe that, according to the transformed values of the objects’ secure attribute values, the objects with IDs $\{id_{U}, id_{O}, id_{P}, id_{X}, id_{Q}, id_{Y}, id_{Z}\}$ are not dominated by any other object in the dataset. This can be confirmed from columns $θ_{1}$ and $θ_{2}$. Therefore, the coordinator computes the skyline result as $\{id_{U}, id_{O}, id_{P}, id_{X}, id_{Q}, id_{Y}, id_{Z}\}$. Since each $id_{α}$ represents the object with ID $α$, the result is also correct with respect to the original attribute values, as illustrated in Figure 2. After computing the global skyline objects set $Sky(DS)$, the coordinator sends the encrypted IDs of the qualified $Sky(DS)$ objects to all participating parties.
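
The dominance test itself only needs comparisons, so it works on any order-preserving transformation of the attributes. As a minimal sketch, the following Python code applies a naive skyline computation to the plaintext values of Table 1 and Table 2, which stand in for the transformed values $θ_{1}$ and $θ_{2}$ that the coordinator actually sees. It assumes that smaller values are preferred on both attributes, which is consistent with the result stated above. The function names are hypothetical.

```python
def dominates(p, q):
    """p dominates q if p is no worse on every attribute and better on one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(objects):
    """Return the IDs of all objects that no other object dominates."""
    return {
        oid
        for oid, p in objects.items()
        if not any(dominates(q, p) for other, q in objects.items() if other != oid)
    }


# Local skyline objects of DataNode 1 (Table 1) and DataNode 2 (Table 2).
objects = {
    "M": (0x2D, 0xE3), "N": (0x3B, 0xBF), "O": (0x41, 0xA7), "P": (0x4D, 0x72),
    "Q": (0x90, 0x51), "R": (0xB6, 0x4C), "S": (0xF4, 0x42),
    "U": (0x25, 0xB2), "V": (0x54, 0xAC), "W": (0x5D, 0x7F),
    "X": (0x6F, 0x66), "Y": (0xA8, 0x34), "Z": (0xD8, 0x28),
}

print(sorted(skyline(objects)))
```

The pairwise check is quadratic in the number of objects, which is acceptable here because the coordinator only receives the local skyline objects rather than the full datasets.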

5.8. Qualified Global Skyline Objects Identification

After receiving the encrypted $ID$s of the global skyline objects, each party tries to decrypt them using its symmetric encryption key. If the party owns a skyline object, it can quickly identify the object by checking the CRC code of the decrypted $ID$. Proceeding in this way, each participating party recognizes its own globally qualified skyline objects.
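
The anonymization and recognition of IDs can be sketched as follows. This Python example is a simplified illustration, not the implementation used in the experiments: it uses CRC-32 from `zlib` as the CRC scheme and replaces the symmetric cipher with a toy keystream built from SHA-256, since the standard library provides no DES. The keys, IDs, and function names are all hypothetical, and the toy cipher must not be used for real data.

```python
import hashlib
import zlib


def _keystream_xor(data, key):
    """Toy stream cipher (SHA-256 in counter mode); a stand-in for DES."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(4, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


def anonymize(object_id, key):
    """Encrypt ID || CRC32(ID) under the owner's symmetric key."""
    payload = object_id + zlib.crc32(object_id).to_bytes(4, "big")
    return _keystream_xor(payload, key)


def recognize(encrypted_id, key):
    """Return the ID if the CRC check passes after decryption, else None."""
    payload = _keystream_xor(encrypted_id, key)
    object_id, crc = payload[:-4], payload[-4:]
    if zlib.crc32(object_id).to_bytes(4, "big") == crc:
        return object_id
    return None


key_1, key_2 = b"datanode-1-key", b"datanode-2-key"  # hypothetical keys
token = anonymize(b"M", key_1)
print(recognize(token, key_1), recognize(token, key_2))
```

Decrypting with a foreign key yields effectively random bytes, so the CRC check fails except with a small probability that depends on the CRC length.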

6. Privacy and Security

Our proposed framework for privacy-preserving secure multi-party skyline computation is based on transforming the attribute values without changing the order of the objects’ attributes on each dimension. Operating in the semi-honest adversary model, this framework implicitly assumes that all participating parties trust the coordinator, that the coordinator honestly executes the processes, and that it does not collude with any corrupted party to obtain the combined encrypted substitute vectors.

Since only the coordinator has the private decryption key, no other party can obtain the transformed order information of the objects’ attributes. So, the data privacy of honest parties is not affected by the dishonesty of corrupted parties. On the other hand, since the participating parties share only the attribute orders of their local skyline objects, computed from their secure databases, the attribute values cannot be guessed by analyzing the frequency of the order values of such a limited number of objects. However, if the coordinator colludes with a corrupted party that shares the substitute vectors used for transforming the objects’ attributes, the proposed framework cannot meet the privacy and security expectations.

Therefore, under these assumptions, we claim that the proposed framework preserves the privacy of the objects during multi-party skyline computation.

7. Experiments

In this section, we evaluate the performance and effectiveness of our proposed framework. For the experimental setup, we used four identical computers connected through a Cisco Catalyst 2960-X Series Gigabit switch. One of the four computers acted as the coordinator, and the other three acted as individual parties holding private datasets. Each computer has an Intel® Core™ i5-6500 3.20 GHz CPU and 8 GB of memory and runs the 64-bit Ubuntu 16.04 operating system. We compiled the source code with Java 8 and executed the program under the Java™ 1.8.0 Runtime Environment. We generated synthetic datasets for evaluating the performance of our proposed framework; each attribute value was picked at random from the 32-bit unsigned integers. In this study, we focus on the performance of generating the secure object order for skyline computation from privacy-preserved multi-party databases without unveiling the original attribute values of the objects to anyone. For evaluating the efficiency of our model, we assumed that all participating parties begin to generate the encrypted substitute vectors and to compute their local skyline objects sets simultaneously after obtaining the Paillier public key, $Paillier_{pk}$, from the coordinator.

From our experiments, we found that most of the time is consumed by computing the local skyline objects sets, generating the encrypted substitute vectors, and combining the vectors generated by the individual parties. However, since the individual parties compute their local skyline objects sets from their plain datasets without any security protocol, the local skyline computation time is the same for non-secured distributed skyline computation and for privacy-preserved multi-party skyline computation. We also compared the complexity of our proposed framework with that of the frameworks proposed in [6,10].

A. Encrypted Substitute Vector Generation and Combining: We studied the runtime of the encrypted substitute vector generation process described in Section 5.4, which is executed by all participating parties simultaneously. Since the length of the substitute vector doubles with each one-bit increase of the bit-slice length, the runtime of generating the unique random numbers within a given range and encrypting the substitute vector elements also increases. However, a larger bit-slice length reduces the number of slices into which an attribute value is split, and thus the number of required substitute vectors. For example, a 32-bit attribute value can be substituted using two vectors with a 16-bit slice length, but it requires three vectors with an 11-bit slice length. We examined the runtime for bit-slice lengths from 10 to 16. Figure 4a shows the effect of the bit-slice length on the encrypted substitute vector generation process.
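
This trade-off can be made concrete with a short calculation. The Python sketch below tabulates, for each examined bit-slice length, the number of slices per 32-bit attribute and the number of vector elements each party must generate and encrypt. It assumes that every slice of an attribute uses the same bit-slice length, so the totals are upper bounds when the most significant slice is shorter. The function name is hypothetical.

```python
import math

ATTRIBUTE_BITS = 32  # attribute width used in the experiments


def vector_cost(slice_bits, attribute_bits=ATTRIBUTE_BITS):
    """Return (slices per attribute, elements per vector, total elements)."""
    slices = math.ceil(attribute_bits / slice_bits)
    elements = 2 ** slice_bits
    return slices, elements, slices * elements


table = {bits: vector_cost(bits) for bits in range(10, 17)}
for bits, (slices, elements, total) in table.items():
    print(f"{bits:2d}-bit slices: {slices} vectors x {elements} elements = {total}")
```

The element count grows exponentially with the bit-slice length while the number of vectors shrinks only slowly, which matches the increase in generation time described above.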

We also studied the execution time of combining the encrypted substitute vectors using the homomorphic addition property, as described in Section 5.5. We examined the runtime of combining three substitute vectors generated by three participating parties for varied bit-slice lengths. The results are illustrated in Figure 4b.

B. Privacy-Preserving Multi-Party Skyline Computation: To evaluate the performance of our proposed framework, we assumed that each participating party computes its local skyline from an equal number of data tuples. We evaluated the performance for different data distributions and for a varied number of object dimensions. In both experiments, we varied the number of tuples of each participating party from 10 k to 50 k.

For this experiment, we used three types of data distribution: correlated, anti-correlated, and independent. As shown in Figure 5, the framework is affected by the data distribution. It is most efficient for the correlated dataset and least efficient for the anti-correlated dataset, while the performance for the independent dataset lies between the two.

Figure 6 illustrates the effect of the data dimension on skyline computation. We varied the data dimension from 2 to 6. Since the number of required encrypted substitute vectors, the number of comparisons, and the number of qualified local skyline objects all increase with the data dimension, the execution time also increases, and the results of our experiment reflect this.

C. Comparison with Existing Privacy-Preserving Multi-party Skyline Computation Frameworks: The framework proposed in [6] applies pairwise secure comparison of the objects’ attributes to compute the dominance relationship between the objects of two participating parties. Therefore, the complexity of the algorithm increases with the number of participating parties, since each local skyline object of a party needs to be securely compared with the local skyline objects set of every other party separately. The authors proposed generating the homomorphic encryption key pair twice for each comparison of two private objects using the LAHE scheme. The complexity of the Fast Secure Integer Comparison (FSIC) protocol used by the framework depends on the maximum bit-length of the attribute values. Furthermore, it requires five rounds of information exchange between each pair of participating parties for each comparison of their local skyline objects.

On the other hand, our proposed framework is comparatively less dependent on the number of participating parties. The coordinator generates the homomorphic encryption key pair only once for the whole process, and our framework does not employ a secure comparison protocol like [6]. Moreover, it requires only six rounds of data exchange for the entire computation. The first round takes place between the coordinator and the participating parties for sharing the public encryption key. Three rounds of communication are then required among the participating parties: for fixing the bit-slice length, for sharing the encrypted substitute vectors, and for merging the encrypted orders of the individual parties’ local skyline objects on each attribute. Another round is required for sending the merged set of encrypted local skyline object orders to the coordinator. The final round takes place between the coordinator and the participating parties for sharing the encrypted $ID$s of the globally qualified skyline objects. Although a large amount of data must be transmitted when the parties share their encrypted substitute vectors, this cost is negligible compared to five rounds of information exchange for each dominance comparison between two parties’ objects.

The method proposed in [10] is also scalable to any number of participating parties. However, to prepare the order of the objects on each attribute, it requires multiple rounds of data exchange between the participating parties and the coordinator, depending on the number of slices of each attribute value and the number of dimensions of the objects. It also requires multiple rounds of sorting by the coordinator and partial order merging by the individual parties to generate the objects’ order securely on each attribute. In contrast, our present work does not need several rounds of data exchange, data sorting, and partial order merging like [10]. Instead, we use homomorphically encrypted substitute vectors to transform the objects’ attribute values securely without altering their order on each attribute.

Therefore, we claim that the proposed algorithm is more efficient and robust in terms of computation and communication complexity.

8. Conclusions

Our proposed approach addresses the problem of privacy-preserving skyline queries in distributed multi-party databases. With growing privacy awareness, data privacy during multi-party computation must be taken into account. We offered a secure yet straightforward and efficient approach for skyline queries in distributed multi-party databases that does not unveil the objects’ attribute values, whereas most existing frameworks for privacy-preserving multi-party skyline queries require time-consuming, expensive, and complex computation. We demonstrated the effectiveness and scalability of the proposed algorithm through extensive examples and experiments. Our proposed algorithm could also be considered for the secure computation of other variants of the skyline query, such as the k-dominant skyline and the k-skyband. Besides that, the proposed secure object ordering algorithm may also be applicable to retrieving the number of tuples that satisfy given criteria on the database attributes from privacy-preserved distributed multi-party databases.

Author Contributions

M.Q., A.Z., M.A.S., Annisa and Y.M. conceived the original idea for the study, analyzed the experiment results and revised the manuscript. M.Q. and A.Z. designed the system model. M.Q. performed the experiments and wrote the initial manuscript. All authors have confirmed and approved the submitted manuscript.

Funding

This work is supported by KAKENHI (16K00155, 23500180) Japan. Mahboob Qaosar is supported by the Japanese Government MEXT Scholarship. Asif Zaman was supported by the Japanese Government MEXT Scholarship. Annisa was supported by the Indonesian Government DG-RSTHE Scholarship.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Agrawal, R.; Srikant, R. Privacy-preserving Data Mining. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; pp. 439–450.
2. Pathak, F.A.N.; Pandey, S.B.S. An efficient method for privacy preserving data mining in secure multiparty computation. In Proceedings of the 2013 Nirma University International Conference on Engineering (NUiCONE), Ahmedabad, India, 28–30 November 2013; pp. 1–3.
3. Afrati, F.N.; Koutris, P.; Suciu, D.; Ullman, J.D. Parallel Skyline Queries. In Proceedings of the International Conference on Database Theory (ICDT), Berlin, Germany, 26–28 March 2012; pp. 274–284.
4. Mullesgaard, K.; Pedersen, J.L.; Lu, H.; Zhou, Y. Efficient Skyline Computation in MapReduce. In Proceedings of the International Conference on Extending Database Technology (EDBT), Athens, Greece, 24–28 March 2014; pp. 37–48.
5. Park, Y.; Min, J.K.; Shim, K. Parallel Computation of Skyline and Reverse Skyline Queries Using MapReduce. J. Proc. VLDB Endow. 2013, 6, 2002–2013.
6. Liu, X.; Lu, R.; Ma, J.; Chen, L.; Bao, H. Efficient and privacy-preserving skyline computation framework across domains. Future Gen. Comput. Syst. 2016, 62, 161–174.
7. Liu, J.; Yang, J.; Xiong, L.; Pei, J. Secure Skyline Queries on Cloud Platform. In Proceedings of the 2017 IEEE 33rd International Conference on Data Engineering (ICDE), San Diego, CA, USA, 19–22 April 2017; pp. 633–644.
8. Hua, J.; Zhu, H.; Wang, F.; Liu, X.; Lu, R.; Li, H.; Zhang, Y. CINEMA: Efficient and Privacy-Preserving Online Medical Primary Diagnosis with Skyline Query. IEEE Internet Things J. 2018.
9. Liu, X.; Choo, K.R.; Deng, R.H.; Yang, Y.; Zhang, Y. PUSC: Privacy-Preserving User-Centric Skyline Computation Over Multiple Encrypted Domains. In Proceedings of the 2018 17th IEEE International Conference On Trust, Security and Privacy In Computing And Communications/12th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), New York, NY, USA, 1–3 August 2018; pp. 958–963.
10. Zaman, A.; Siddique, M.A.; Annisa; Morimoto, Y. Secure Computation of Skyline Query in MapReduce. In Advanced Data Mining and Applications (ADMA) 2016; Li, J., Li, X., Wang, S., Li, J., Sheng, Q.Z., Eds.; Springer International Publishing: Cham, Switzerland, 2016; pp. 345–360.
11. Paillier, P. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In Advances in Cryptology, Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT)’99, Prague, Czech Republic, 2–6 May 1999; Stern, J., Ed.; Springer: Berlin/Heidelberg, Germany, 1999; pp. 223–238.
12. Borzsonyi, S.; Kossmann, D.; Stocker, K. The skyline operator. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), Heidelberg, Germany, 2–6 April 2001; pp. 421–430.
13. Kossmann, D.; Ramsak, F.; Rost, S. Shooting stars in the sky: An online algorithm for skyline queries. In Proceedings of the International Conference on Very Large Data Bases (VLDB), Hong Kong, China, 20–23 August 2002; pp. 275–286.
14. Chomicki, J.; Godfrey, P.; Gryz, J.; Liang, D. Skyline with Presorting. In Proceedings of the IEEE International Conference on Data Engineering (ICDE), Bangalore, India, 5–8 March 2003; pp. 717–719.
15. Jin, W.; Han, J.; Ester, M. Mining Thick Skylines over Large Databases. In Knowledge Discovery in Databases: PKDD 2004; Boulicaut, J.F., Esposito, F., Giannotti, F., Pedreschi, D., Eds.; Springer: Berlin/Heidelberg, Germany, 2004; pp. 255–266.
16. He, W.; Li, C.; Chen, H. Maintaining the Dominant Representatives on Data Streams. In Database and Expert Systems Applications; Bhowmick, S.S., Küng, J., Wagner, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 704–718.
17. Papadias, D.; Tao, Y.; Fu, G.; Seeger, B. Progressive skyline computation in database systems. ACM Trans. Database Syst. 2005, 30, 41–82.
18. Balke, W.T.; Güntzer, U.; Zheng, J.X. Efficient Distributed Skylining for Web Information Systems. In Advances in Database Technology, Proceedings of the EDBT 2004: 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, 14–18 March 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 256–273.
19. Wang, S.; Ooi, B.C.; Tung, A.K.H.; Xu, L. Efficient Skyline Query Processing on Peer-to-Peer Networks. In Proceedings of the 2007 IEEE 23rd International Conference on Data Engineering, Istanbul, Turkey, 15–20 April 2007; pp. 1126–1135.
20. Chen, L.; Cui, B.; Lu, H.; Xu, L.; Xu, Q. iSky: Efficient and Progressive Skyline Computing in a Structured P2P Network. In Proceedings of the 2008 the 28th International Conference on Distributed Computing Systems, Beijing, China, 17–20 June 2008; pp. 160–167.
21. Rocha, J.B.; Vlachou, A.; Doulkeridis, C.; Nørvåg, K. AGiDS: A Grid-Based Strategy for Distributed Skyline Query Processing. In Data Management in Grid and Peer-to-Peer Systems: Second International Conference, Globe 2009 Linz, Austria Proceedings; Springer: Berlin/Heidelberg, Germany, 2009; pp. 12–23.
22. Arefin, M.S.; Morimoto, Y. Privacy Aware Parallel Computation of Skyline Sets Queries from Distributed Databases. In Proceedings of the 2013 International Conference on Computing, Networking and Communications (ICNC), Osaka, Japan, 30 November–2 December 2011; pp. 186–192.
23. Yao, A.C. Protocols for secure computations. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science, Chicago, IL, USA, 3–5 November 1982; pp. 160–164.
24. Goldreich, O.; Micali, S.; Wigderson, A. How to Play ANY Mental Game. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing STOC’87, New York, NY, USA, 25–27 May 1987; pp. 218–229.
25. Lindell, Y.; Pinkas, B. Privacy Preserving Data Mining. In Advances in Cryptology, Proceedings of the CRYPTO 2000: 20th Annual International Cryptology Conference Santa Barbara, CA, USA, 20–24 August 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 36–54.
26. Lin, Z.; Jaromczyk, J.W. An efficient secure comparison protocol. In Proceedings of the 2012 IEEE International Conference on Intelligence and Security Informatics, Washington, DC, USA, 11–14 June 2012; pp. 30–35.
27. Veugen, T.; Blom, F.; de Hoogh, S.J.A.; Erkin, Z. Secure Comparison Protocols in the Semi-Honest Model. IEEE J. Sel. Top. Signal Process. 2015, 9, 1217–1228.
28. Nishide, T.; Ohta, K. Multiparty Computation for Interval, Equality, and Comparison Without Bit-Decomposition Protocol. In Public Key Cryptography—PKC 2007; Okamoto, T., Wang, X., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 343–360.
29. Kerschbaum, F.; Biswas, D.; de Hoogh, S. Performance Comparison of Secure Comparison Protocols. In Proceedings of the 2009 20th International Workshop on Database and Expert Systems Application, Linz, Austria, 31 August–4 September 2009; pp. 133–136.
30. Lin, H.Y.; Tzeng, W.G. An Efficient Solution to the Millionaires’ Problem Based on Homomorphic Encryption. In Applied Cryptography and Network Security; Ioannidis, J., Keromytis, A., Yung, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; pp. 456–466.
31. Hamada, K.; Ikarashi, D.; Chida, K.; Takahashi, K. Oblivious Radix Sort: An Efficient Sorting Algorithm for Practical Secure Multi-party Computation. IACR Cryptol. ePrint Arch. 2014, 2014, 121.
32. Siddique, M.A.; Tian, H.; Morimoto, Y. Distributed Skyline Computation of Vertically Splitted Databases by Using MapReduce. In Database Systems for Advanced Applications (DASFAA); Springer: Berlin/Heidelberg, Germany, 2014; pp. 33–45.
33. Sepehri, M.; Cimato, S.; Damiani, E. Privacy-Preserving Query Processing by Multi-Party Computation. Comput. J. 2015, 58, 2195–2212.
34. Liu, X.; Li, S.; Chen, X.; Xu, G.; Zhang, X.; Zhou, Y. Efficient Solutions to Two-Party and Multiparty Millionaires’ Problem. Secur. Commun. Netw. 2017, 2017, 11.
35. Samanthula, B.K.K.; Chun, H.; Jiang, W. An Efficient and Probabilistic Secure Bit-decomposition. In Proceedings of the 8th ACM SIGSAC Symposium on Information, Computer and Communications Security (ASIA CCS ’13), Hangzhou, China, 8–10 May 2013; pp. 541–546.
36. Hose, K.; Vlachou, A. A survey of skyline processing in highly distributed environments. VLDB J. 2012, 21, 359–384.
37. Hazay, C.; Lindell, Y. Semi-honest Adversaries. In Efficient Secure Two-Party Protocols: Techniques and Constructions; Springer: Berlin/Heidelberg, Germany, 2010; pp. 53–80.
38. Williams, R. A Painless Guide to CRC Error Detection Algorithms. 1993. Available online: http://www.ross.net/crc/download/crc_v3.txt (accessed on 27 December 2018).

Figure 1. A skyline Problem.

Figure 2. A multi-party skyline Problem. Green Squares and Dotted-Line represent the objects and skyline of $DS_{1}$. Red Bubbles and Dotted-Line represent the objects and skyline of $DS_{2}$. Black Dotted-Line represents the global skyline of $DS_{1}$ and $DS_{2}$.

Figure 3. Privacy-preserving multi-party skyline computation model.

Figure 4. Bit-slice length effect on encrypted substitute vector generation process.

Figure 5. Running time varies with data distribution. [Dimension: 2; Bit-slice length: 11-bit; Slices/Attribute: 3].

Figure 6. Running time varies with data dimension. [Data Distribution: Independent; Bit-slice length: 11-bit; Slices/Attribute: 3].

Table 1. Secure skyline objects set, $Sky(DS_{1})$, of $DataNode^{1}$.

ID	$d_{1}$	$d_{2}$
M	2D	E3
N	3B	BF
O	41	A7
P	4D	72
Q	90	51
R	B6	4C
S	F4	42

Table 2. Secure skyline objects set, $Sky(DS_{2})$, of $DataNode^{2}$.

ID	$d_{1}$	$d_{2}$
U	25	B2
V	54	AC
W	5D	7F
X	6F	66
Y	A8	34
Z	D8	28

Table 3. Bit-slice length, $N$, and maximum bit-length, $R$, recommended by $DataNode^{1}$.

Attribute, $i$	Slice, $j$	$N_{i, j}$	$R_{i, j}$
1	0	3	7
1	1	5	9
2	0	5	9
2	1	3	8

Table 4. Bit-slice length, $N$, and maximum bit-length, $R$, recommended by $DataNode^{2}$.

Attribute, $i$	Slice, $j$	$N_{i, j}$	$R_{i, j}$
1	0	5	9
1	1	3	7
2	0	4	8
2	1	4	8

Table 5. Determined bit-slice length, $N$, and maximum bit-length, $R$.

Attribute, $i$	Slice, $j$	$N_{i, j}$	$R_{i, j}$
1	0	4	8
1	1	4	8
2	0	5	9
2	1	4	8

Table 6. Example of Encrypted Substitute Vector Generation for $N_{i, j} = 4$ and $R_{i, j} = 8$.

Index, $k$	Sorted Random Number, $V_{i, j}^{1}$	Encrypted Vector, $ρ_{i, j}^{1} = En(2^{K_{i, j}} \times V_{i, j}^{1}, Paillier_{pk})$
0	0D	$ρ_{i, j, 0}^{1}$
1	13	$ρ_{i, j, 1}^{1}$
2	26	$ρ_{i, j, 2}^{1}$
3	31	$ρ_{i, j, 3}^{1}$
4	3B	$ρ_{i, j, 4}^{1}$
5	40	$ρ_{i, j, 5}^{1}$
6	44	$ρ_{i, j, 6}^{1}$
7	51	$ρ_{i, j, 7}^{1}$
8	5E	$ρ_{i, j, 8}^{1}$
9	6C	$ρ_{i, j, 9}^{1}$
A	9F	$ρ_{i, j, A}^{1}$
B	A6	$ρ_{i, j, B}^{1}$
C	AF	$ρ_{i, j, C}^{1}$
D	C2	$ρ_{i, j, D}^{1}$
E	DC	$ρ_{i, j, E}^{1}$
F	F4	$ρ_{i, j, F}^{1}$
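
The plaintext side of Table 6 can be sketched as follows: for a bit-slice length $N_{i, j} = 4$ the vector has $2^{4} = 16$ entries, and for $R_{i, j} = 8$ each entry fits in 8 bits. This is a minimal sketch, not the authors' implementation. The function name `substitute_vector` is hypothetical, and drawing distinct values is an assumption made here so that the sorted vector is strictly increasing, as the values in Table 6 are. The scaling by $2^{K_{i, j}}$ and the encryption step are left out.

```python
import secrets


def substitute_vector(n_bits: int, r_bits: int) -> list[int]:
    """Return 2**n_bits distinct random values below 2**r_bits, in ascending order."""
    size = 1 << n_bits
    limit = 1 << r_bits
    if size > limit:
        raise ValueError("cannot draw that many distinct values in this range")
    chosen: set[int] = set()
    while len(chosen) < size:
        chosen.add(secrets.randbelow(limit))
    return sorted(chosen)


vector = substitute_vector(n_bits=4, r_bits=8)
print([f"{v:02X}" for v in vector])  # 16 ascending two-digit hex values
```

Because the vector is sorted, a larger slice value $k$ always maps to a larger substitute, which is what lets the substitutes stand in for the original values in order comparisons.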

Table 7. Example of Combined Encrypted Substitute Vector Construction for $N_{i, j} = 4$.

Index, $k$	Encrypted Vector, $ρ_{i, j}^{1}$	Encrypted Vector, $ρ_{i, j}^{2}$	Combined Vector, $ξ_{i, j} = ρ_{i, j}^{1} + ρ_{i, j}^{2}$
0	$ρ_{i, j, 0}^{1}$	$ρ_{i, j, 0}^{2}$	$ξ_{i, j, 0}$
1	$ρ_{i, j, 1}^{1}$	$ρ_{i, j, 1}^{2}$	$ξ_{i, j, 1}$
2	$ρ_{i, j, 2}^{1}$	$ρ_{i, j, 2}^{2}$	$ξ_{i, j, 2}$
3	$ρ_{i, j, 3}^{1}$	$ρ_{i, j, 3}^{2}$	$ξ_{i, j, 3}$
4	$ρ_{i, j, 4}^{1}$	$ρ_{i, j, 4}^{2}$	$ξ_{i, j, 4}$
5	$ρ_{i, j, 5}^{1}$	$ρ_{i, j, 5}^{2}$	$ξ_{i, j, 5}$
6	$ρ_{i, j, 6}^{1}$	$ρ_{i, j, 6}^{2}$	$ξ_{i, j, 6}$
7	$ρ_{i, j, 7}^{1}$	$ρ_{i, j, 7}^{2}$	$ξ_{i, j, 7}$
8	$ρ_{i, j, 8}^{1}$	$ρ_{i, j, 8}^{2}$	$ξ_{i, j, 8}$
9	$ρ_{i, j, 9}^{1}$	$ρ_{i, j, 9}^{2}$	$ξ_{i, j, 9}$
A	$ρ_{i, j, A}^{1}$	$ρ_{i, j, A}^{2}$	$ξ_{i, j, A}$
B	$ρ_{i, j, B}^{1}$	$ρ_{i, j, B}^{2}$	$ξ_{i, j, B}$
C	$ρ_{i, j, C}^{1}$	$ρ_{i, j, C}^{2}$	$ξ_{i, j, C}$
D	$ρ_{i, j, D}^{1}$	$ρ_{i, j, D}^{2}$	$ξ_{i, j, D}$
E	$ρ_{i, j, E}^{1}$	$ρ_{i, j, E}^{2}$	$ξ_{i, j, E}$
F	$ρ_{i, j, F}^{1}$	$ρ_{i, j, F}^{2}$	$ξ_{i, j, F}$
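
The element-wise addition in Table 7 relies on the additive homomorphism of the Paillier cryptosystem: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The sketch below illustrates this with a toy textbook Paillier (generator $g = n + 1$) built on two small Mersenne primes, which is far too weak for real use. The vector `V1` is copied from Table 6; `V2` is a hypothetical vector for the second party; the $2^{K_{i, j}}$ scaling is omitted. Decryption appears here only to check the arithmetic and is not meant to describe which party decrypts in the protocol.

```python
import math
import secrets

# Toy key material (assumption for illustration only; not secure).
PRIME_P = 2**31 - 1
PRIME_Q = 2**61 - 1
MODULUS = PRIME_P * PRIME_Q
MODULUS_SQ = MODULUS * MODULUS
PHI = (PRIME_P - 1) * (PRIME_Q - 1)
MU = pow(PHI, -1, MODULUS)  # modular inverse, needs Python 3.8+


def encrypt(m: int) -> int:
    """Paillier encryption with g = n + 1, so g**m = 1 + m*n (mod n**2)."""
    while True:
        r = secrets.randbelow(MODULUS - 1) + 1
        if math.gcd(r, MODULUS) == 1:
            break
    return (1 + m * MODULUS) % MODULUS_SQ * pow(r, MODULUS, MODULUS_SQ) % MODULUS_SQ


def decrypt(c: int) -> int:
    x = pow(c, PHI, MODULUS_SQ)
    return (x - 1) // MODULUS * MU % MODULUS


def add(c1: int, c2: int) -> int:
    """Homomorphic addition: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % MODULUS_SQ


V1 = [0x0D, 0x13, 0x26, 0x31, 0x3B, 0x40, 0x44, 0x51,
      0x5E, 0x6C, 0x9F, 0xA6, 0xAF, 0xC2, 0xDC, 0xF4]  # Table 6
V2 = [0x05, 0x1A, 0x22, 0x37, 0x48, 0x59, 0x63, 0x70,
      0x81, 0x8E, 0x9B, 0xA4, 0xB7, 0xC9, 0xE0, 0xFB]  # hypothetical

rho1 = [encrypt(v) for v in V1]
rho2 = [encrypt(v) for v in V2]
xi = [add(a, b) for a, b in zip(rho1, rho2)]  # combined encrypted vector

combined_plain = [decrypt(c) for c in xi]
assert combined_plain == [a + b for a, b in zip(V1, V2)]
# The sum of two ascending vectors is still ascending, so order is preserved.
assert all(a < b for a, b in zip(combined_plain, combined_plain[1:]))
```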

Table 8. Encrypted disguised object order generation by $DataNode^{1}$.

ID	$d_{1}$	$d_{2}$	$σ_{1, 1}$	$σ_{1, 0}$	$σ_{2, 1}$	$σ_{2, 0}$	$id$	$δ_{1}$	$δ_{2}$
M	2D	E3	$ξ_{1, 1, 2}$	$ξ_{1, 0, D}$	$ξ_{2, 1, 7}$	$ξ_{2, 0, 03}$	$id_{M}$	$δ_{1, M}$	$δ_{2, M}$
N	3B	BF	$ξ_{1, 1, 3}$	$ξ_{1, 0, B}$	$ξ_{2, 1, 5}$	$ξ_{2, 0, 1F}$	$id_{N}$	$δ_{1, N}$	$δ_{2, N}$
O	41	A7	$ξ_{1, 1, 4}$	$ξ_{1, 0, 1}$	$ξ_{2, 1, 5}$	$ξ_{2, 0, 07}$	$id_{O}$	$δ_{1, O}$	$δ_{2, O}$
P	4D	72	$ξ_{1, 1, 4}$	$ξ_{1, 0, D}$	$ξ_{2, 1, 3}$	$ξ_{2, 0, 12}$	$id_{P}$	$δ_{1, P}$	$δ_{2, P}$
Q	90	51	$ξ_{1, 1, 9}$	$ξ_{1, 0, 0}$	$ξ_{2, 1, 2}$	$ξ_{2, 0, 11}$	$id_{Q}$	$δ_{1, Q}$	$δ_{2, Q}$
R	B6	4C	$ξ_{1, 1, B}$	$ξ_{1, 0, 6}$	$ξ_{2, 1, 2}$	$ξ_{2, 0, 0C}$	$id_{R}$	$δ_{1, R}$	$δ_{2, R}$
S	F4	42	$ξ_{1, 1, F}$	$ξ_{1, 0, 4}$	$ξ_{2, 1, 2}$	$ξ_{2, 0, 02}$	$id_{S}$	$δ_{1, S}$	$δ_{2, S}$

Table 9. Encrypted disguised object order generation by $DataNode^{2}$.

ID	$d_{1}$	$d_{2}$	$σ_{1, 1}$	$σ_{1, 0}$	$σ_{2, 1}$	$σ_{2, 0}$	$id$	$δ_{1}$	$δ_{2}$
U	25	B2	$ξ_{1, 1, 2}$	$ξ_{1, 0, 5}$	$ξ_{2, 1, 5}$	$ξ_{2, 0, 12}$	$id_{U}$	$δ_{1, U}$	$δ_{2, U}$
V	54	AC	$ξ_{1, 1, 5}$	$ξ_{1, 0, 4}$	$ξ_{2, 1, 5}$	$ξ_{2, 0, 0C}$	$id_{V}$	$δ_{1, V}$	$δ_{2, V}$
W	5D	7F	$ξ_{1, 1, 5}$	$ξ_{1, 0, D}$	$ξ_{2, 1, 3}$	$ξ_{2, 0, 1F}$	$id_{W}$	$δ_{1, W}$	$δ_{2, W}$
X	6F	66	$ξ_{1, 1, 6}$	$ξ_{1, 0, F}$	$ξ_{2, 1, 3}$	$ξ_{2, 0, 06}$	$id_{X}$	$δ_{1, X}$	$δ_{2, X}$
Y	A8	34	$ξ_{1, 1, A}$	$ξ_{1, 0, 8}$	$ξ_{2, 1, 1}$	$ξ_{2, 0, 14}$	$id_{Y}$	$δ_{1, Y}$	$δ_{2, Y}$
Z	D8	28	$ξ_{1, 1, D}$	$ξ_{1, 0, 8}$	$ξ_{2, 1, 1}$	$ξ_{2, 0, 08}$	$id_{Z}$	$δ_{1, Z}$	$δ_{2, Z}$
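
The third index of each $ξ$ entry in Tables 8 and 9 is the value of the corresponding bit slice of the attribute, cut according to the slice lengths in Table 5 (slice 0 is the least significant). A minimal sketch of that slicing, with the hypothetical helper name `bit_slices`, reproduces the indices shown in the tables; how the looked-up $ξ$ entries are then merged into $δ$ is not covered here.

```python
# Slice lengths per attribute from Table 5, listed with slice 0 (lowest bits) first.
SLICE_BITS = {1: [4, 4], 2: [5, 4]}


def bit_slices(value: int, lengths: list[int]) -> list[int]:
    """Split value into consecutive bit slices, least significant slice first."""
    slices = []
    for n in lengths:
        slices.append(value & ((1 << n) - 1))  # keep the lowest n bits
        value >>= n                            # move on to the next slice
    return slices


# Object M of Table 8: d1 = 2D, d2 = E3.
print([f"{s:X}" for s in bit_slices(0x2D, SLICE_BITS[1])])  # ['D', '2']
print([f"{s:X}" for s in bit_slices(0xE3, SLICE_BITS[2])])  # ['3', '7']
```

For object M this gives slice indices D and 2 for $d_{1}$, and 03 and 7 for $d_{2}$, matching $ξ_{1, 0, D}$, $ξ_{1, 1, 2}$, $ξ_{2, 0, 03}$ and $ξ_{2, 1, 7}$ in Table 8.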

Table 10. Disguised Object Order Decryption by the coordinator.

ID	$δ_{1}$	$δ_{2}$	$θ_{1}$	$θ_{2}$
$id_{M}$	$δ_{1, M}$	$δ_{2, M}$	4C9C	22A93
$id_{N}$	$δ_{1, N}$	$δ_{2, N}$	7A60	18BEE
$id_{O}$	$δ_{1, O}$	$δ_{2, O}$	C572	1891D
$id_{P}$	$δ_{1, P}$	$δ_{2, P}$	C69C	F874
$id_{Q}$	$δ_{1, Q}$	$δ_{2, Q}$	15060	CA6C
$id_{R}$	$δ_{1, R}$	$δ_{2, R}$	185BF	C9DC
$id_{S}$	$δ_{1, S}$	$δ_{2, S}$	1D1A5	C854
$id_{U}$	$δ_{1, U}$	$δ_{2, U}$	4BAA	18A74
$id_{V}$	$δ_{1, V}$	$δ_{2, V}$	F2A5	189DC
$id_{W}$	$δ_{1, W}$	$δ_{2, W}$	F39C	F9EE
$id_{X}$	$δ_{1, X}$	$δ_{2, X}$	FDED	F70E
$id_{Y}$	$δ_{1, Y}$	$δ_{2, Y}$	15C22	869E
$id_{Z}$	$δ_{1, Z}$	$δ_{2, Z}$	1B622	853A
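
Since the decrypted values $θ_{1}$ and $θ_{2}$ in Table 10 keep the per-attribute order of the original objects, the coordinator can run an ordinary dominance test on them. The sketch below is a naive quadratic skyline, assuming that smaller values are preferred on both attributes; it is not the authors' implementation. It also checks that the result equals the skyline of the plaintext values from Tables 1 and 2, which the coordinator itself never sees.

```python
# Decrypted disguised orders (theta_1, theta_2) from Table 10.
THETA = {
    "M": (0x4C9C, 0x22A93), "N": (0x7A60, 0x18BEE), "O": (0xC572, 0x1891D),
    "P": (0xC69C, 0xF874), "Q": (0x15060, 0xCA6C), "R": (0x185BF, 0xC9DC),
    "S": (0x1D1A5, 0xC854), "U": (0x4BAA, 0x18A74), "V": (0xF2A5, 0x189DC),
    "W": (0xF39C, 0xF9EE), "X": (0xFDED, 0xF70E), "Y": (0x15C22, 0x869E),
    "Z": (0x1B622, 0x853A),
}

# Original attribute values (d_1, d_2) from Tables 1 and 2, used only for the check.
PLAIN = {
    "M": (0x2D, 0xE3), "N": (0x3B, 0xBF), "O": (0x41, 0xA7), "P": (0x4D, 0x72),
    "Q": (0x90, 0x51), "R": (0xB6, 0x4C), "S": (0xF4, 0x42), "U": (0x25, 0xB2),
    "V": (0x54, 0xAC), "W": (0x5D, 0x7F), "X": (0x6F, 0x66), "Y": (0xA8, 0x34),
    "Z": (0xD8, 0x28),
}


def dominates(a: tuple, b: tuple) -> bool:
    """a dominates b: no worse on every attribute and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: dict) -> list:
    """Identifiers of all objects not dominated by any other object."""
    return sorted(
        key for key, p in points.items()
        if not any(dominates(q, p) for other, q in points.items() if other != key)
    )


global_skyline = skyline(THETA)
print(global_skyline)  # ['O', 'P', 'Q', 'U', 'X', 'Y', 'Z']
assert global_skyline == skyline(PLAIN)
```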

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Qaosar, M.; Zaman, A.; Siddique, M.A.; Annisa; Morimoto, Y. Privacy-Preserving Secure Computation of Skyline Query in Distributed Multi-Party Databases. Information 2019, 10, 119. https://doi.org/10.3390/info10030119
