Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data

Li, Shuai; Li, Miao; Xu, Haitao; Zhou, Xianwei

doi:10.3390/s19051059


by Shuai Li, Miao Li, Haitao Xu * and Xianwei Zhou

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China

* Author to whom correspondence should be addressed.

Sensors 2019, 19(5), 1059; https://doi.org/10.3390/s19051059

Submission received: 16 January 2019 / Revised: 24 February 2019 / Accepted: 25 February 2019 / Published: 1 March 2019

(This article belongs to the Special Issue Big Data Driven IoT for Smart Cities)


Abstract:

The Internet of Things (IoT) has become a significant part of our daily lives. Composed of millions of intelligent devices, the IoT can interconnect people with the physical world. With the development of IoT technology, the amount of data generated by sensors and devices is increasing dramatically, and IoT-based big data has become a very active research area. One of the key issues in IoT-based big data is ensuring the utility of data while preserving privacy. In this paper, we address the protection of big data privacy in the data storage phase and propose a searchable encryption scheme that satisfies personalized privacy needs. Our proposed scheme works for all file types, including text, audio, image, and video, and meets the different privacy needs of different individuals at the expense of a higher storage cost. We also show that our proposed scheme satisfies index indistinguishability and trapdoor indistinguishability.

Keywords:

Internet of Things; big data; searchable encryption; personalized privacy needs; index indistinguishability; trapdoor indistinguishability

1. Introduction

The Internet of Things (IoT) has become a significant part of our daily lives over the past few years. A huge number of sensors and intelligent devices have been integrated to interconnect people with the physical world, generating massive amounts of sensing data. Data generated by IoT devices are collected, disseminated, and exchanged among different people, businesses, and societies. With the development of IoT, the amount of data generated by organizations and individuals is increasing dramatically [1].

Although the massive data generated in the IoT environment is of significant value, exploring and using this value increases the risk of privacy breaches [2]. The collection, storage, and reuse of personal data for profit pose a serious threat to our privacy. Consequently, researchers face the challenge of ensuring the utility of data while preserving privacy. Various techniques have been developed to protect data privacy. Generally, these techniques can be grouped according to the stages of the big data life cycle, as follows [3].

Data generation: In the data generation phase, access restriction and data falsification techniques are used.
Data storage: The approaches in the data storage phase are mainly based on encryption techniques.
Data processing: Anonymization techniques as well as clustering, classification, and association rule mining-based techniques are used in the data processing phase.

In this paper, we focus on the protection of big data privacy in the data storage phase of the big data life cycle. In the IoT environment, the sensing data generated by various sensors and devices are collected and uploaded to cloud servers, which provide massive storage and cloud computing services. As noted above, privacy protection in the data storage phase relies mainly on encryption techniques. When a large amount of encrypted data is stored on cloud servers, the first consideration is the confidentiality of the data, which can be ensured by secure and efficient encryption schemes. However, when a data user wants to retrieve the data containing a specific keyword, the cloud server cannot respond to the retrieval request, because it cannot decrypt the encrypted data. This problem can be solved by searchable encryption schemes [4,5], such as searchable symmetric encryption [6] and public key encryption with keyword search [7]. A searchable encryption scheme mainly involves three entities: the data owner, the data user, and the cloud server. The data owner outsources the encrypted data to the cloud server. The data user queries the cloud server for the encrypted data containing a specific keyword. The cloud server stores and retrieves the encrypted data.

In existing searchable encryption schemes, the data user can access all the data owned by the data owner, which can result in a privacy breach for the data owner. On the one hand, the data owner may be willing to share the data with some specific data users, but not with others. On the other hand, the data owner may be willing to share specific data with a data user, but not other data. Furthermore, additional information in the data owned by the data owner can also result in a privacy breach. Privacy is subjective, and different people have different privacy needs. For example, the hidden text in a typical Word file includes a lot of sensitive personal information [8]. However, this additional information, which may disclose the privacy of the data owner, is useless for some data users. In data mining, data preprocessing is used to transform raw data into an understandable format [9]. In natural language processing, text feature extraction is used to transform a list of words into a feature set that is usable by a classifier [10]. In speech recognition and image recognition, feature extraction is a key step [11,12]. This means that such additional information may be discarded by the data user in the feature extraction phase anyway. In summary, allowing the data user to access all the data owned by the data owner results in a privacy breach for the data owner without improving the utility of the data.

In this paper, we propose a searchable encryption scheme for personalized privacy protection in IoT-based big data. The main contributions of this paper are as follows:

In our proposed scheme, the data owner generates file features at different levels and uploads the encrypted file features to the cloud server.
The proposed scheme makes a trade-off between ensuring the utility of the data and preserving privacy, and meets the different privacy needs of different individuals.

The rest of this paper is organized as follows. Section 2 reviews recent searchable encryption schemes. Section 3 presents the necessary notations and definitions. Section 4 formalizes the searchable encryption scheme for meeting personalized privacy needs in big data and presents the main security definitions. Section 5 describes the detailed construction of our proposed scheme. Section 6 analyzes the security of our proposed scheme. Section 7 presents experimental results and compares our proposed scheme with existing schemes. The last section concludes the paper.

2. Related Work

Several searchable encryption schemes have been proposed to allow data users to retrieve encrypted data [4,5]. In this section, we briefly review existing work on searchable encryption schemes.

In 2000, Song et al. [6] first proposed a searchable encryption scheme based on symmetric encryption, known as searchable symmetric encryption (SSE). However, their scheme has the following limitations: it is not proven secure; the distribution of the underlying plaintexts is vulnerable to statistical attacks; and the search time is linear in the length of the document collection. To overcome these limitations, Goh et al. [13] and Chang and Mitzenmacher [14] deployed a masked index table for SSE and introduced the notion of security for indexes. Curtmola et al. [15] generalized the security definitions of SSE and proposed two SSE schemes that are secure under the new definitions; the search time of their schemes is linear in the number of documents. Subsequently, several SSE schemes were proposed as improvements. For example, Cash et al. [16] proposed an SSE scheme that supports conjunctive search and general Boolean queries on outsourced symmetrically encrypted data; Salam et al. [17] proposed a privacy-preserving data storage and retrieval system in cloud computing; Li et al. [18] proposed three SSE schemes that can guard against a coercer by using the idea of deniable encryption; and Soleimanian et al. [19] proposed a publicly verifiable SSE scheme.

Although SSE schemes are highly efficient, they suffer from complicated secret key distribution. To resolve this problem, Boneh et al. [7] introduced a searchable encryption scheme based on public key cryptography, namely public key encryption with keyword search (PEKS). Waters et al. [20] showed that PEKS schemes based on bilinear maps could be applied to build encrypted and searchable audit logs. However, the bilinear pairing operation is computationally expensive. Di et al. [21] introduced a PEKS scheme without bilinear pairing. The original PEKS scheme in [7] requires a secure channel to transmit the trapdoors. To overcome this limitation, Baek et al. [22] proposed a new PEKS scheme that does not require a secure channel. Byun et al. [23] introduced the off-line keyword-guessing attack (KGA) and pointed out that the original PEKS scheme in [7] is susceptible to KGA. Rhee et al. [24] proposed the notion of trapdoor indistinguishability and showed that trapdoor indistinguishability is a sufficient condition for preventing outside KGA. Jeong et al. [25] showed that constructing PEKS schemes secure against inside KGA is impossible under the original PEKS framework in [7]. Xu et al. [26] proposed a PEKS scheme to resist inside KGA. More recently, various improved PEKS schemes have been proposed. For example, Liang et al. [27] proposed a searchable attribute-based proxy re-encryption system that achieves privacy-preserving keyword search, encrypted data sharing, and keyword update; Chen et al. [28] proposed a dual-server PEKS scheme to resist inside KGA launched by a malicious server; Yang et al. [29] proposed a semantic keyword searchable proxy re-encryption scheme for secure cloud storage using lattice-based cryptographic primitives; Wu et al. [30] designed an efficient and secure searchable encryption protocol for cloud-based IoT using a trapdoor permutation function; and Yin et al. [31] proposed a ciphertext-policy attribute-based searchable encryption scheme to achieve keyword-based search and fine-grained access control over encrypted data.

Table 1 gives a brief comparison of some existing searchable encryption schemes. Privacy is a key concern in the design of searchable encryption schemes. However, in all the existing searchable encryption schemes, the data user can access all the data owned by the data owner, which can result in a privacy breach for the data owner.

3. Preliminaries

A summary of the notations used in this paper is presented in Table 2.

The set of all binary strings of length $n$ is denoted as $\{0, 1\}^{n}$, and the set of all finite binary strings is denoted as $\{0, 1\}^{*}$.

An index table (or dictionary) is a data structure of the form $I[\mathit{key}] = \mathit{value}$. Given a $\mathit{key}$, the $\mathit{value}$ matching the $\mathit{key}$ is returned.

A function $\mu : \mathbb{N} \to \mathbb{R}_{\geq 0}$ is negligible if for every positive polynomial $p(\cdot)$ and all sufficiently large $\lambda$, $\mu(\lambda) < \frac{1}{p(\lambda)}$. We write $f(\lambda) = \mathit{negl}(\lambda)$ to mean that there exists a negligible function $\mu(\cdot)$ such that $f(\lambda) \leq \mu(\lambda)$ for all sufficiently large $\lambda$.
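As a quick numerical illustration (not part of the original paper), $\mu(\lambda) = 2^{-\lambda}$ is the textbook example of a negligible function. The hypothetical helper `threshold(c)` below finds, for $p(\lambda) = \lambda^c$, the smallest $\lambda_0$ from which $2^{-\lambda} < 1/p(\lambda)$ holds. It checks only up to a finite bound, so it illustrates the definition rather than proving it.

```python
def threshold(c: int, limit: int = 2000) -> int:
    """Smallest lam0 such that lam**c < 2**lam for every lam0 <= lam <= limit.

    lam**c < 2**lam is the integer form of 2**-lam < 1 / lam**c, so lam0 marks
    where mu(lam) = 2**-lam drops below 1/p(lam) for p(lam) = lam**c
    (verified only up to the finite bound `limit`).
    """
    last_bad = 0
    for lam in range(1, limit + 1):
        if lam ** c >= 2 ** lam:
            last_bad = lam
    return last_bad + 1


for c in (1, 2, 3, 10):
    print(f"p(lam) = lam^{c}: 2^-lam < 1/p(lam) for all lam >= {threshold(c)}")
```

For example, `threshold(2)` is 5 and `threshold(3)` is 10: the crossing point moves right as the polynomial's degree grows, but for $2^{-\lambda}$ one always exists.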

The following basic cryptographic primitives can be found in [32].

A symmetric encryption scheme is a tuple $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ of probabilistic polynomial-time (PPT) algorithms, where $\mathit{Gen}$ takes the security parameter $\lambda$ as input and outputs a secret key $k$; $\mathit{Enc}$ takes a key $k$ and a message $m \in \{0, 1\}^{*}$ as input and outputs a ciphertext $c = \mathit{Enc}(k, m)$; and $\mathit{Dec}$ takes a key $k$ and a ciphertext $c$ as input and outputs $m$ if $c = \mathit{Enc}(k, m)$.
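A minimal sketch of the $(\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ interface, assuming only the Python standard library. The paper's experiments use AES-256; since AES is not in the standard library, this sketch substitutes randomized counter mode over HMAC-SHA256 treated as a pseudorandom function, a standard CPA-secure construction under that assumption. The function names are illustrative, and the scheme provides no integrity protection.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16


def gen(lam_bytes: int = 32) -> bytes:
    """Gen: output a uniformly random key of 8 * lam_bytes bits."""
    return secrets.token_bytes(lam_bytes)


def _keystream(k: bytes, nonce: bytes, length: int) -> bytes:
    """Concatenate HMAC-SHA256(k, nonce || counter) blocks, used as a pseudorandom function."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(k, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def enc(k: bytes, m: bytes) -> bytes:
    """Enc: a fresh random nonce per call, so encrypting the same m twice gives different c."""
    nonce = secrets.token_bytes(NONCE_LEN)
    body = bytes(a ^ b for a, b in zip(m, _keystream(k, nonce, len(m))))
    return nonce + body


def dec(k: bytes, c: bytes) -> bytes:
    """Dec: regenerate the keystream from the stored nonce and XOR it off."""
    nonce, body = c[:NONCE_LEN], c[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(k, nonce, len(body))))


k = gen()
c = enc(k, b"sensor reading 42")
assert dec(k, c) == b"sensor reading 42"
```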

For any symmetric encryption scheme $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$, any adversary $A$, and any value $\lambda$ of the security parameter, the chosen-plaintext attack (CPA) indistinguishability experiment $SE_{A,E}^{cpa}(\lambda)$ is defined as follows:

A random key k is generated by running $G e n (λ)$ .
The adversary A is given input $λ$ and oracle access to $E n c (k, \cdot)$ , and outputs a pair of messages $m_{0}$ , $m_{1}$ of the same length.
A random bit $b \in \{0, 1\}$ is chosen, and then a ciphertext $c = \mathit{Enc}(k, m_{b})$ is computed and given to A; c is called the challenge ciphertext.
The adversary A continues to have oracle access to $E n c (k, \cdot)$ , and outputs a bit $b^{'}$ .
The output of the experiment is defined to be 1 if $b^{'} = b$ , and 0 otherwise. In the case $S E_{A, E}^{c p a} (λ) = 1$ , we say that A succeeded.
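The experiment can be simulated directly. In this sketch (an illustration, not from the paper), a hypothetical adversary uses its encryption oracle to re-encrypt $m_0$ and compares the result with the challenge ciphertext. Against a deterministic scheme it wins every time; against a randomized scheme it does no better than guessing, which is why CPA security requires $\mathit{Enc}$ to be randomized. Both toy schemes use HMAC-SHA256 counter mode as an assumed stand-in for a real cipher.

```python
import hashlib
import hmac
import secrets


def _stream(k: bytes, nonce: bytes, n: int) -> bytes:
    """HMAC-SHA256(k, nonce || counter) blocks, truncated to n bytes."""
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hmac.new(k, nonce + ctr.to_bytes(8, "big"), hashlib.sha256).digest()
        ctr += 1
    return bytes(out[:n])


def enc_randomized(k: bytes, m: bytes) -> bytes:
    nonce = secrets.token_bytes(16)                 # fresh nonce on every call
    return nonce + bytes(a ^ b for a, b in zip(m, _stream(k, nonce, len(m))))


def enc_deterministic(k: bytes, m: bytes) -> bytes:
    nonce = bytes(16)                               # fixed nonce: same m always gives same c
    return bytes(a ^ b for a, b in zip(m, _stream(k, nonce, len(m))))


def equality_adversary(oracle):
    """First phase of A: choose two equal-length messages, return the guessing phase."""
    m0, m1 = b"attack at dawn!!", b"attack at dusk!!"

    def guess(c: bytes) -> int:
        # Second phase: re-encrypt m0 through the oracle; a match suggests b = 0.
        return 0 if oracle(m0) == c else 1

    return m0, m1, guess


def cpa_experiment(enc, adversary) -> int:
    """One run of SE^cpa_{A,E}(lambda); returns 1 iff the adversary guesses b."""
    k = secrets.token_bytes(32)                     # k <- Gen(lambda)

    def oracle(m: bytes) -> bytes:                  # oracle access to Enc(k, .)
        return enc(k, m)

    m0, m1, guess = adversary(oracle)
    b = secrets.randbelow(2)
    c = enc(k, (m0, m1)[b])                         # challenge ciphertext
    return int(guess(c) == b)


def success_rate(enc, trials: int = 2000) -> float:
    return sum(cpa_experiment(enc, equality_adversary) for _ in range(trials)) / trials


print("deterministic:", success_rate(enc_deterministic))
print("randomized:   ", success_rate(enc_randomized))
```

With deterministic encryption the success rate is exactly 1.0; with the randomized variant it stays close to 0.5.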

Definition 1.

A symmetric encryption scheme $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ is CPA-secure if for all PPT adversaries $A$ there exists a negligible function $\mathit{negl}$ such that

$$\Pr[SE_{A,E}^{cpa}(\lambda) = 1] \leq \frac{1}{2} + \mathit{negl}(\lambda),$$

where the probability is taken over the random coins used by $A$, as well as the random coins used in the CPA indistinguishability experiment.

For any adversary $A$ and any value $\lambda$ of the security parameter, the computational Diffie-Hellman (CDH) experiment $CDH_{A,\mathit{Setup}}(\lambda)$ is defined as follows:

Run $S e t u p (λ)$ to obtain output $(G, q, g)$ , where $G$ is a cyclic group of order q (with bit length $λ$ ) and g is a generator of $G$ .
Randomly choose $a, b \in \mathbb{Z}_{q}$.
A is given $G$ , q, g, $g^{a}$ , $g^{b}$ and outputs $h \in G$ .
The output of the experiment is defined to be 1 if $h = g^{a b}$ , and 0 otherwise.

Definition 2.

The CDH problem is hard relative to $\mathit{Setup}$ if for all PPT adversaries $A$ there exists a negligible function $\mathit{negl}$ such that

$$\Pr[CDH_{A,\mathit{Setup}}(\lambda) = 1] \leq \mathit{negl}(\lambda).$$
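The CDH experiment can be sketched in the same way. The toy parameters are an assumption for illustration only: $p = 1019 = 2q + 1$ with $q = 509$ prime, and $g = 4$ generates the subgroup of order $q$ in $\mathbb{Z}_p^*$. Because $q$ is tiny, the hypothetical brute-force adversary below recovers $a$ and always wins; at the 256-bit group order used in Section 7, exhaustive search is far out of reach.

```python
import secrets


def setup():
    """Toy Setup: p = 2q + 1 with q prime; g = 4 generates the order-q subgroup (insecure)."""
    p, q, g = 1019, 509, 4
    return p, q, g


def cdh_experiment(adversary) -> int:
    p, q, g = setup()
    a, b = secrets.randbelow(q), secrets.randbelow(q)     # a, b <- Z_q
    h = adversary(p, q, g, pow(g, a, p), pow(g, b, p))    # A sees the group, q, g, g^a, g^b
    return int(h == pow(g, a * b, p))                     # 1 iff h = g^{ab}


def brute_force_adversary(p, q, g, ga, gb):
    # Recover a by trying every exponent in Z_q; feasible only because q is tiny.
    a = next(x for x in range(q) if pow(g, x, p) == ga)
    return pow(gb, a, p)


wins = sum(cdh_experiment(brute_force_adversary) for _ in range(100))
print(wins)  # 100 with these toy parameters
```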

4. System Model

The searchable encryption scheme for personalized privacy protection mainly involves three entities: the data owner, the data user, and the cloud server. The data owner outsources the encrypted file features to the cloud server. The data user queries the cloud server for the encrypted file features containing a specific keyword. The cloud server stores and retrieves the encrypted file features. As in existing searchable encryption schemes, the data owner is considered fully trusted. The data user is considered malicious, which means it may attempt to learn more information than it is authorized to retrieve. The cloud server is considered honest but curious: it correctly executes the searchable encryption protocol but may try to learn as much information as possible from the stored encrypted data.

Given $n$ files $F_{i}$, $1 \leq i \leq n$, and a non-negative integer $l$, let $F_{il}$ denote the file feature of $F_{i}$ at level $l$. In particular, let $F_{i0} = F_{i}$, i.e., the file feature of $F_{i}$ at level 0 is $F_{i}$ itself.

Let $n_{f} + 1$ denote the number of file feature levels (FFLs). The data owner wishes to store the file features set $F = \{F_{il} : 1 \leq i \leq n, 0 \leq l \leq n_{f}\}$ on the cloud server. The objectives of the data owner are as follows:

For $1 \leq i \leq n$, $0 \leq l \leq n_{f}$, each file feature $F_{il}$ is stored on the cloud server such that its confidentiality is preserved.
The data user queries for a keyword w and an FFL l to retrieve all authorized file features $F_{i l}$ such that $w \in F_{i l_{0}}$ for a given $l_{0}$ in a secure and efficient way.

4.1. Formal Definition

The searchable encryption scheme for meeting the personalized privacy needs consists of the following algorithms:

$S e t u p (λ)$ : This algorithm is run by the data owner. It takes the security parameter $λ$ as input, and outputs the global parameter $Λ$ .
$K e y G e n (Λ)$ : This algorithm is run by the data owner and the data user, respectively. It takes the global parameter $Λ$ as input, and outputs public/private key pairs $(p k_{o}, s k_{o})$ and $(p k_{u}, s k_{u})$ for the data owner and the data user, respectively.
$S t o r e (F, p k_{u}, s k_{o})$ : This algorithm is run by the data owner. It takes the file features set $F$ , the data user’s public key $p k_{u}$ and the data owner’s private key $s k_{o}$ as input, and outputs the encrypted file features set $F^{'}$ and the encrypted index set $I n d^{'}$ .
$T r a p d o o r (w, l, p k_{o}, s k_{u})$ : This algorithm is run by the data user. It takes a keyword w, an FFL l, the data owner’s public key $p k_{o}$ , and the data user’s private key $s k_{u}$ as input, and outputs the trapdoor $T_{w, l}$ .
$\mathit{Search}(F', \mathit{Ind}', T_{w,l})$: This algorithm is performed interactively between the cloud server and the data user. It takes the encrypted file features set $F'$, the encrypted index set $\mathit{Ind}'$, and the trapdoor $T_{w,l}$ as input, and outputs all authorized file features $F_{il}$ such that $w \in F_{il_{0}}$ for a given $l_{0}$.

4.2. Security Definition

The searchable encryption scheme for meeting personalized privacy needs must satisfy index indistinguishability and trapdoor indistinguishability under chosen keyword-FFL pair attack. Following [15], we define two challenge-response games, $\mathit{Game}_{I}$ and $\mathit{Game}_{T}$, between an adversary $A$ and a challenger $C$ to capture index indistinguishability and trapdoor indistinguishability under chosen keyword-FFL pair attack, respectively.

The adversary $A$ plays $\mathit{Game}_{I}$ with the challenger $C$ and attempts to distinguish the encrypted index of a given keyword-FFL pair from other encrypted indexes. If $A$ wins $\mathit{Game}_{I}$, then $A$ has obtained useful information from the encrypted indexes.

$\mathit{Game}_{I}$:

Setup:

The challenger $C$ runs $\mathit{Setup}(\lambda)$ and $\mathit{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda$ and the public/private key pairs $(pk_{o}, sk_{o})$ and $(pk_{u}, sk_{u})$ of the data owner and the data user, respectively, and sends $\Lambda$, $pk_{o}$, and $pk_{u}$ to $A$.

Adaptive query:

The adversary A makes the following queries to C:

- The adversary $A$ adaptively selects a keyword-FFL pair $(w, l)$ for an encrypted index query. $C$ responds with $\mathit{Ind}'[w']$.
- The adversary $A$ adaptively selects a keyword-FFL pair $(w, l)$ for a trapdoor query. $C$ responds with $T_{w, l}$.

Challenge:

The adversary $A$ sends two challenge keyword-FFL pairs $(w_{0}, l_{0})$ and $(w_{1}, l_{1})$ to $C$. $C$ picks a random bit $b \in \{0, 1\}$ and sends the encrypted index $\mathit{Ind}'[w_{b}']$ of the keyword-FFL pair $(w_{b}, l_{b})$ to $A$.

Guess:

The adversary $A$ outputs $b' \in \{0, 1\}$ and wins the game if $b' = b$.

Definition 3.

We say that the searchable encryption scheme for meeting personalized privacy needs satisfies index indistinguishability under chosen keyword-FFL pair attack if for all PPT adversaries $A$ there exists a negligible function $\mathit{negl}$ such that

$$\Pr[A \text{ wins } \mathit{Game}_{I}] \leq \frac{1}{2} + \mathit{negl}(\lambda).$$

The adversary $A$ plays $\mathit{Game}_{T}$ with the challenger $C$ and attempts to distinguish the trapdoor of a given keyword-FFL pair from other trapdoors. If $A$ wins $\mathit{Game}_{T}$, then $A$ has obtained useful information from the trapdoors.

$\mathit{Game}_{T}$:

Setup:

The challenger $C$ runs $\mathit{Setup}(\lambda)$ and $\mathit{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda$ and the public/private key pairs $(pk_{o}, sk_{o})$ and $(pk_{u}, sk_{u})$ of the data owner and the data user, respectively, and sends $\Lambda$, $pk_{o}$, and $pk_{u}$ to $A$.

Adaptive query:

A makes the following queries to C:

- The adversary $A$ adaptively selects a keyword-FFL pair $(w, l)$ for an encrypted index query. $C$ responds with $\mathit{Ind}'[w']$.
- The adversary $A$ adaptively selects a keyword-FFL pair $(w, l)$ for a trapdoor query. $C$ responds with $T_{w, l}$.

Challenge:

The adversary $A$ sends two challenge keyword-FFL pairs $(w_{0}, l_{0})$ and $(w_{1}, l_{1})$ to $C$. $C$ picks a random bit $b \in \{0, 1\}$ and sends the trapdoor $T_{w_{b}, l_{b}}$ of the keyword-FFL pair $(w_{b}, l_{b})$ to $A$.

Guess:

The adversary $A$ outputs $b' \in \{0, 1\}$ and wins the game if $b' = b$.

Definition 4.

We say that the searchable encryption scheme for meeting personalized privacy needs satisfies trapdoor indistinguishability under chosen keyword-FFL pair attack if for all PPT adversaries $A$ there exists a negligible function $\mathit{negl}$ such that

$$\Pr[A \text{ wins } \mathit{Game}_{T}] \leq \frac{1}{2} + \mathit{negl}(\lambda).$$

5. Proposed Scheme

In this section, we present our proposed searchable encryption scheme for meeting the personalized privacy needs. It consists of the following algorithms.

$\mathit{Setup}(\lambda)$ is run by the data owner. It takes the security parameter $\lambda$ as input and performs the following:

Choose a cyclic group $G$ of prime order q and a generator g of $G$ .
Choose a symmetric encryption scheme $E = (G e n, E n c, D e c)$ .
Choose two collision-resistant hash functions $H_{1} : G \to \{0, 1\}^{\lambda}$ and $H_{2} : \{0, 1\}^{*} \to \{0, 1\}^{\lambda}$.
Set the global parameter $Λ = (G, q, g, E, H_{1}, H_{2})$ .
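A sketch of $\mathit{Setup}$ under stated assumptions: the group is a toy order-509 subgroup of $\mathbb{Z}_{1019}^*$ (insecure, chosen only so the code runs instantly; the paper's experiments use an elliptic-curve group), $H_1$ and $H_2$ are instantiated with SHA-256 plus distinct domain tags (a common choice that the paper does not specify), and the symmetric scheme $E$ is omitted because any CPA-secure scheme fits. `GlobalParams` is a hypothetical name.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalParams:
    p: int  # modulus; G is the order-q subgroup of Z_p^*
    q: int  # prime order of G
    g: int  # generator of G


def setup() -> GlobalParams:
    # Toy safe prime p = 2q + 1 (insecure, for illustration only).
    p, q = 1019, 509
    g = 4  # 4 = 2^2 is a quadratic residue other than 1, so it generates the order-q subgroup
    assert pow(g, q, p) == 1 and g != 1
    return GlobalParams(p, q, g)


def H1(params: GlobalParams, h: int) -> bytes:
    """H1 : G -> {0,1}^256: SHA-256 over a fixed-width encoding of h, with a domain tag."""
    width = (params.p.bit_length() + 7) // 8
    return hashlib.sha256(b"H1|" + h.to_bytes(width, "big")).digest()


def H2(x: bytes) -> bytes:
    """H2 : {0,1}^* -> {0,1}^256, domain-separated from H1 by its tag."""
    return hashlib.sha256(b"H2|" + x).digest()


params = setup()
```

The domain tags keep $H_1$ and $H_2$ from colliding on inputs that happen to share the same byte encoding.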

$\mathit{KeyGen}(\Lambda)$ is run by the data owner and the data user, respectively. It takes the global parameter $\Lambda$ as input and performs the following:

Randomly select two elements $k_{o}$ and $k_{u}$ in $Z_{q}$ as the private keys of the data owner and the data user, respectively.
Compute $g^{k_{o}}$ and $g^{k_{u}}$ in $G$ as the public keys of the data owner and the data user, respectively.
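The key pairs and the key shared between the data owner and the data user can be sketched as follows, over a toy order-509 group (insecure, for illustration only). Private keys are drawn from $\mathbb{Z}_q \setminus \{0\}$, a small assumption that avoids the identity as a public key. The owner derives $k_1$ from $pk_u$ and $sk_o$, the user derives $k_2$ from $pk_o$ and $sk_u$, and both equal $H_1(g^{k_o k_u})$; this is the equality on which the correctness argument relies.

```python
import hashlib
import secrets

P, Q, G = 1019, 509, 4   # toy group: g = 4 generates the order-q subgroup of Z_p^* (insecure)


def keygen() -> tuple[int, int]:
    """Return (pk, sk) with sk uniform in {1, ..., q-1} and pk = g^sk."""
    sk = 1 + secrets.randbelow(Q - 1)
    return pow(G, sk, P), sk


def H1(h: int) -> bytes:
    """H1 : G -> {0,1}^256, instantiated here with domain-tagged SHA-256."""
    return hashlib.sha256(b"H1|" + h.to_bytes(2, "big")).digest()


pk_o, sk_o = keygen()    # data owner:  (g^{k_o}, k_o)
pk_u, sk_u = keygen()    # data user:   (g^{k_u}, k_u)

k1 = H1(pow(pk_u, sk_o, P))   # owner side (Store):    H1((g^{k_u})^{k_o})
k2 = H1(pow(pk_o, sk_u, P))   # user side (Trapdoor):  H1((g^{k_o})^{k_u})
assert k1 == k2               # both equal H1(g^{k_o k_u})
```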

$\mathit{Store}(F, pk_{u}, sk_{o})$ is run by the data owner. It takes the file features set $F$, the data user's public key $pk_{u} = g^{k_{u}}$, and the data owner's private key $sk_{o} = k_{o}$ as input, and performs the following:

Compute $k_{1} = H_{1} ({(g^{k_{u}})}^{k_{o}})$ .
For $1 \leq i \leq n$, $0 \leq l \leq n_{f}$, randomly select $\mathit{id}_{il} \in \{0, 1\}^{\lambda}$ as the identifier of $F_{il}$, run algorithm $\mathit{Gen}(\lambda)$ to generate the encryption key $\mathit{ek}_{il}$ of $F_{il}$, and compute $\mathit{id}_{il}' = \mathit{Enc}(k_{1}, \mathit{id}_{il})$, $\mathit{ek}_{il}' = \mathit{Enc}(k_{1}, \mathit{ek}_{il})$, and $F_{il}' = \mathit{Enc}(\mathit{ek}_{il}, F_{il})$.
Create the index table $F'$ such that $F'[\mathit{id}_{il}] = F_{il}'$ for every $1 \leq i \leq n$ and $0 \leq l \leq n_{f}$.
Given an FFL $l_{0}$, create the keyword set $W_{l_{0}}$ of the file features set $\{F_{il_{0}} : 1 \leq i \leq n\}$.
For $w \in W_{l_{0}}$, compute $w' = \mathit{Enc}(k_{1}, H_{2}(w))$.
For $0 \leq l \leq n_{f}$, compute $l' = \mathit{Enc}(k_{1}, H_{2}(l))$.
For $1 \leq i \leq n$, construct the set $L_{i}$ of the authorized FFLs of the file $F_{i}$. In other words, $l \in L_{i}$ means that the data user is authorized to access the file feature $F_{il}$.
Create the index table $\mathit{Ind}'$ such that $\mathit{Ind}'[w'] = \{(\mathit{id}_{il}', \mathit{ek}_{il}', l') : w \in F_{il_{0}}, l \in L_{i}, 1 \leq i \leq n\}$ for every $w \in W_{l_{0}}$.
Send $F^{'}$ and $I n d^{'}$ to the cloud server.

$\mathit{Trapdoor}(w, l, pk_{o}, sk_{u})$ is run by the data user. It takes a keyword $w$, an FFL $l$, the data owner's public key $pk_{o} = g^{k_{o}}$, and the data user's private key $sk_{u} = k_{u}$ as input, and performs the following:

Compute $k_{2} = H_{1}((g^{k_{o}})^{k_{u}})$.
Compute $T_{w, l} = (\mathit{Enc}(k_{2}, H_{2}(w)), \mathit{Enc}(k_{2}, H_{2}(l)))$.
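A sketch of $\mathit{Trapdoor}$. For the cloud server to match $T_1$ against $w'$ and $T_2$ against $l'$ by equality, the token computation must be deterministic; this sketch therefore substitutes HMAC-SHA256 keyed with $k_2$ for $\mathit{Enc}(k_2, H_2(\cdot))$, an assumption not stated in the paper. The tokens are only compared, never decrypted. The toy group and the fixed example keys are illustrative.

```python
import hashlib
import hmac

P, Q, G = 1019, 509, 4   # toy group (insecure, illustration only)


def H1(h: int) -> bytes:
    return hashlib.sha256(b"H1|" + h.to_bytes(2, "big")).digest()


def H2(x: bytes) -> bytes:
    return hashlib.sha256(b"H2|" + x).digest()


def token(k: bytes, x: bytes) -> bytes:
    """Deterministic stand-in for Enc(k, H2(x)): equal (k, x) always give equal tokens."""
    return hmac.new(k, H2(x), hashlib.sha256).digest()


def trapdoor(w: str, l: int, pk_o: int, sk_u: int) -> tuple[bytes, bytes]:
    """T_{w,l} = (T1, T2), computed from the owner's public key and the user's private key."""
    k2 = H1(pow(pk_o, sk_u, P))                  # k2 = H1((g^{k_o})^{k_u})
    return token(k2, w.encode()), token(k2, str(l).encode())


# Example keys (hypothetical). The owner's index entries use k1 = H1((g^{k_u})^{k_o}).
sk_o, sk_u = 123, 456
pk_o, pk_u = pow(G, sk_o, P), pow(G, sk_u, P)
k1 = H1(pow(pk_u, sk_o, P))
T1, T2 = trapdoor("noon", 1, pk_o, sk_u)
assert T1 == token(k1, b"noon")   # T1 equals the owner's w'
assert T2 == token(k1, b"1")      # T2 equals the owner's l'
```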

$\mathit{Search}(F', \mathit{Ind}', T_{w,l})$ is performed interactively between the cloud server and the data user. It takes the encrypted file features set $F'$, the encrypted index set $\mathit{Ind}'$, and the trapdoor $T_{w,l}$ as input, and performs the following:

The cloud server: Given $T_{w, l} = (T_{1}, T_{2})$, search $\mathit{Ind}'[T_{1}]$ to obtain the set $S = \{(s_{1}, s_{2}, s_{3}) \in \mathit{Ind}'[T_{1}] : s_{3} = T_{2}\}$ and send $S$ to the data user.
The data user: Given $S$, create two index tables $S_{1}$ and $S_{2}$ such that $S_{1}[r_{s}] = \mathit{Dec}(k_{2}, s_{1})$ and $S_{2}[r_{s}] = \mathit{Dec}(k_{2}, s_{2})$ for every $s = (s_{1}, s_{2}, s_{3}) \in S$, where $k_{2} = H_{1}((g^{k_{o}})^{k_{u}})$ and each $r_{s}$ ($s \in S$) is randomly selected from $\{0, 1\}^{\lambda}$. Send $S_{1}$ to the cloud server and store $S_{2}$.
The cloud server: Given $S_{1}$, create the index table $R$ such that $R[r_{s}] = F'[S_{1}[r_{s}]]$ for every key $r_{s}$ in $S_{1}$, and send $R$ to the data user.
The data user: Given $S_{2}$ and $R$, compute $\mathit{Dec}(S_{2}[r_{s}], R[r_{s}])$ for every key $r_{s}$ in $S_{2}$.
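Putting the algorithms together, the following end-to-end sketch runs $\mathit{Store}$, $\mathit{Trapdoor}$, and the four-step $\mathit{Search}$ interaction in a single process. It makes several assumptions not in the paper: a toy order-509 group, randomized HMAC-SHA256 counter mode standing in for AES-256 as $E$, HMAC-SHA256 as a deterministic stand-in for the keyword and FFL tokens $\mathit{Enc}(k, H_2(\cdot))$ so that equality matching works, and keyword sets `W` supplied directly. `search` also receives the data user's keys because the user-side steps need $k_2$.

```python
import hashlib
import hmac
import secrets

P, Q, G = 1019, 509, 4          # toy group (insecure, illustration only)
LAM = 32                        # lambda in bytes


def H1(h: int) -> bytes:
    return hashlib.sha256(b"H1|" + h.to_bytes(2, "big")).digest()


def H2(x: bytes) -> bytes:
    return hashlib.sha256(b"H2|" + x).digest()


def token(k: bytes, x: bytes) -> bytes:
    """Deterministic stand-in for Enc(k, H2(x)); only ever compared, never decrypted."""
    return hmac.new(k, H2(x), hashlib.sha256).digest()


def _stream(k: bytes, nonce: bytes, n: int) -> bytes:
    out, ctr = bytearray(), 0
    while len(out) < n:
        out += hmac.new(k, nonce + ctr.to_bytes(8, "big"), hashlib.sha256).digest()
        ctr += 1
    return bytes(out[:n])


def enc(k: bytes, m: bytes) -> bytes:
    """Randomized stand-in for Enc (the paper's experiments use AES-256)."""
    nonce = secrets.token_bytes(16)
    return nonce + bytes(a ^ b for a, b in zip(m, _stream(k, nonce, len(m))))


def dec(k: bytes, c: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(c[16:], _stream(k, c[:16], len(c) - 16)))


def keygen() -> tuple[int, int]:
    sk = 1 + secrets.randbelow(Q - 1)
    return pow(G, sk, P), sk


def store(F: dict, W: dict, L: dict, pk_u: int, sk_o: int):
    """F[(i, l)] = F_il; W[i] = keywords of F_{i l0}; L[i] = authorized FFLs of F_i."""
    k1 = H1(pow(pk_u, sk_o, P))
    F_enc, entry = {}, {}
    for (i, l), feature in F.items():
        id_il, ek_il = secrets.token_bytes(LAM), secrets.token_bytes(LAM)
        F_enc[id_il] = enc(ek_il, feature)                       # F'[id_il] = F'_il
        entry[i, l] = (enc(k1, id_il), enc(k1, ek_il), token(k1, str(l).encode()))
    Ind: dict = {}
    for i, keywords in W.items():
        for w in keywords:
            for l in L[i]:
                Ind.setdefault(token(k1, w.encode()), []).append(entry[i, l])
    return F_enc, Ind


def trapdoor(w: str, l: int, pk_o: int, sk_u: int) -> tuple[bytes, bytes]:
    k2 = H1(pow(pk_o, sk_u, P))
    return token(k2, w.encode()), token(k2, str(l).encode())


def search(F_enc: dict, Ind: dict, T: tuple[bytes, bytes], pk_o: int, sk_u: int) -> list:
    T1, T2 = T
    S = [s for s in Ind.get(T1, []) if s[2] == T2]              # server: filter by FFL
    k2 = H1(pow(pk_o, sk_u, P))
    S1, S2 = {}, {}
    for s in S:                                                  # user: random keys r_s
        r = secrets.token_bytes(LAM)
        S1[r], S2[r] = dec(k2, s[0]), dec(k2, s[1])
    R = {r: F_enc[id_il] for r, id_il in S1.items()}            # server: look up F'
    return [dec(S2[r], R[r]) for r in S2]                        # user: decrypt features


pk_o, sk_o = keygen()                                            # data owner
pk_u, sk_u = keygen()                                            # data user
F = {(1, 0): b"<raw audio>", (1, 1): b"meeting at noon",         # level 1 = transcript
     (2, 0): b"<raw image>", (2, 1): b"noon sunlight"}           # level 1 = caption
W = {1: {"meeting", "at", "noon"}, 2: {"noon", "sunlight"}}      # keywords at l0 = 1
L = {1: {1}, 2: {0, 1}}                                          # F_10 is not authorized
F_enc, Ind = store(F, W, L, pk_u, sk_o)
print(search(F_enc, Ind, trapdoor("noon", 0, pk_o, sk_u), pk_o, sk_u))  # [b'<raw image>']
```

In this example the data user is authorized for level 0 of file 2 but not of file 1, so a level-0 query for `noon` returns only the raw image, while a level-1 query returns both text features.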

Remark 1.

Note that $k_{1} = H_{1}((g^{k_{u}})^{k_{o}}) = H_{1}((g^{k_{o}})^{k_{u}}) = k_{2}$; hence $T_{1} = w'$ and $T_{2} = l'$. Thus, $s_{1} = \mathit{id}_{il}'$, $s_{2} = \mathit{ek}_{il}'$, $S_{1}[r_{s}] = \mathit{id}_{il}$, $S_{2}[r_{s}] = \mathit{ek}_{il}$, and $R[r_{s}] = F'[S_{1}[r_{s}]] = F_{il}'$ for every $s = (s_{1}, s_{2}, s_{3}) \in S$, where $w \in F_{il_{0}}$, $l \in L_{i}$, and $1 \leq i \leq n$, so that the data user obtains $\mathit{Dec}(\mathit{ek}_{il}, F_{il}') = F_{il}$. Therefore, our proposed scheme is correct.

Given an FFL $l_{0}$, creating the keyword set $W_{l_{0}}$ of the file features subset $\{F_{il_{0}} : 1 \leq i \leq n\}$ requires that $F_{il_{0}}$, $1 \leq i \leq n$, be text. Thus, our proposed scheme works for all file types, including text, audio, image, and video, as long as there exists an FFL $l_{0}$ such that the file feature of each file at level $l_{0}$ is text.
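The requirement that features at level $l_0$ be text can be checked mechanically. In this sketch, text features are Python `str` and other features are `bytes`; `text_levels` and `keyword_set` are hypothetical helpers, and the lowercase word split used to build $W_{l_0}$ is an assumption, since the paper does not specify a keyword extraction method.

```python
import re


def text_levels(F: dict, n: int, n_f: int) -> list:
    """Levels l0 in 0..n_f at which every feature F[(i, l0)], 1 <= i <= n, is text (str)."""
    return [l for l in range(n_f + 1)
            if all(isinstance(F[i, l], str) for i in range(1, n + 1))]


def keyword_set(F: dict, n: int, l0: int) -> set:
    """W_{l0}: union of lowercase alphanumeric words over F[(i, l0)] (assumed extraction rule)."""
    W = set()
    for i in range(1, n + 1):
        W |= set(re.findall(r"[a-z0-9]+", F[i, l0].lower()))
    return W


F = {
    (1, 0): b"RIFF\x00\x00\x00\x00WAVE",   # level 0: raw audio bytes (the file itself)
    (1, 1): "Meeting at noon",             # level 1: transcript
    (2, 0): b"\x89PNG\r\n\x1a\n",          # level 0: raw image bytes
    (2, 1): "Noon sunlight",               # level 1: caption
}
levels = text_levels(F, n=2, n_f=1)
print(levels, keyword_set(F, 2, levels[0]))
```

Here only level 1 is text for every file, so it is the only admissible choice of $l_0$.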

If the authorized FFL set of each file is created only by the data owner, then the data user cannot access unauthorized file features; thus, our proposed scheme meets the different privacy needs of different individuals.

Our proposed scheme can be extended to the multi-user scenario. Let $n_{o}$ and $n_{u}$ be the numbers of data owners and data users, respectively. In the multi-user scenario, a public/private key pair is first generated for every data owner and every data user; the file features stored on the cloud server form a vector of $n_{o}$ elements, where the $i$-th element is the encrypted file features set of the $i$-th data owner; and the index stored on the cloud server is an $n_{o} \times n_{u}$ matrix, where the element in the $i$-th row and $j$-th column is the encrypted index set that the $i$-th data owner created for the $j$-th data user.
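In the multi-user extension, each cell of the $n_o \times n_u$ index matrix is built under its own key. Assuming each cell uses the key derivation of $\mathit{Store}$, owner $i$ and user $j$ share $H_1(g^{k_{o,i} k_{u,j}})$. The sketch computes that key matrix over a toy group (insecure) with fixed example private keys; the function names are illustrative.

```python
import hashlib

P, Q, G = 1019, 509, 4   # toy group (insecure, illustration only)


def H1(h: int) -> bytes:
    return hashlib.sha256(b"H1|" + h.to_bytes(2, "big")).digest()


def pair_key_matrix(sk_owners: list, sk_users: list) -> list:
    """n_o x n_u matrix; entry [i][j] is the key owner i uses to build the index for user j."""
    pk_users = [pow(G, k, P) for k in sk_users]
    return [[H1(pow(pk_u, k_o, P)) for pk_u in pk_users] for k_o in sk_owners]


def user_side_key(sk_user: int, pk_owner: int) -> bytes:
    """The key user j recomputes from owner i's public key for trapdoors and decryption."""
    return H1(pow(pk_owner, sk_user, P))


sk_owners, sk_users = [3, 7], [5, 11, 13]          # fixed example keys (n_o = 2, n_u = 3)
M = pair_key_matrix(sk_owners, sk_users)
assert all(M[i][j] == user_side_key(sk_users[j], pow(G, sk_owners[i], P))
           for i in range(2) for j in range(3))
print(len(M), "x", len(M[0]), "distinct keys:", len({k for row in M for k in row}))
```

Because every cell has a different key, an index built by owner $i$ for user $j$ is unusable with another user's trapdoors.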

Clearly, the storage space required by our proposed scheme grows as $n_{f}$ increases. In particular, when $n_{f} = 0$, our proposed scheme requires storage space similar to that of existing searchable encryption schemes.
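The storage overhead can be counted directly from the steps of $\mathit{Store}$: $F'$ holds one ciphertext per file and level, i.e., $n(n_f + 1)$ entries, and $\mathit{Ind}'$ holds one tuple per file $i$, keyword of $F_{il_0}$, and authorized level in $L_i$. The helper below is a hypothetical sketch that counts entries, not bytes.

```python
def storage_counts(n: int, n_f: int, keywords_per_file: list, authorized: list) -> tuple:
    """Count (ciphertexts in F', tuples in Ind') produced by Store.

    keywords_per_file[i] = number of distinct keywords in F_{i l0};
    authorized[i] = |L_i|, the number of authorized FFLs of file i.
    """
    ciphertexts = n * (n_f + 1)                    # one F'_il per file i and level l
    tuples = sum(kw * a for kw, a in zip(keywords_per_file, authorized))
    return ciphertexts, tuples


print(storage_counts(3, 0, [4, 2, 5], [1, 1, 1]))   # (3, 11): n_f = 0, one level per file
print(storage_counts(3, 2, [4, 2, 5], [1, 3, 2]))   # (9, 20): n_f = 2, three levels per file
```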

6. Security Analysis

In this section, we show that our proposed scheme satisfies the index indistinguishability and the trapdoor indistinguishability under chosen keyword-FFL pair attack.

Theorem 1.

If $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathit{Setup}$, then our proposed scheme satisfies index indistinguishability under chosen keyword-FFL pair attack.

Proof.

Suppose that a PPT adversary $A$ wins $\mathit{Game}_{I}$. Then there exists a simulator $B$ such that $SE_{B,E}^{cpa}(\lambda) = 1$ or $CDH_{B,\mathit{Setup}}(\lambda) = 1$.

In the setup phase, $C$ runs $\mathit{Setup}(\lambda)$ and $\mathit{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda = (G, q, g, E, H_{1}, H_{2})$ and the public/private key pairs $(pk_{o} = g^{k_{o}}, sk_{o} = k_{o})$ and $(pk_{u} = g^{k_{u}}, sk_{u} = k_{u})$ of the data owner and the data user, respectively. Then, $C$ sends $\Lambda$, $pk_{o} = g^{k_{o}}$, and $pk_{u} = g^{k_{u}}$ to $A$.

In the adaptive query phase, assume that $A$ adaptively makes $n_{q} - 1$ queries to $C$. The $q$-th query can be one of the following:

- $A$ adaptively selects a keyword-FFL pair $(w_{q}, l_{q})$ for an encrypted index query. $C$ responds with $\mathit{Ind}'[w_{q}'] = \{(\mathit{id}_{il_{q}}', \mathit{ek}_{il_{q}}', l_{q}') : w_{q} \in D_{i}, l_{q} \in L_{i}, 1 \leq i \leq n\}$, where $L_{i}$ is the authorized FFL set of $F_{i}$, $\mathit{id}_{il_{q}}' = \mathit{Enc}(k_{1}, \mathit{id}_{il_{q}})$, $\mathit{ek}_{il_{q}}' = \mathit{Enc}(k_{1}, \mathit{ek}_{il_{q}})$, $l_{q}' = \mathit{Enc}(k_{1}, H_{2}(l_{q}))$, and $k_{1} = H_{1}((g^{k_{o}})^{k_{u}})$.
- $A$ adaptively selects a keyword-FFL pair $(w_{q}, l_{q})$ for a trapdoor query. $C$ responds with $T_{w_{q}, l_{q}} = (\mathit{Enc}(k_{2}, H_{2}(w_{q})), \mathit{Enc}(k_{2}, H_{2}(l_{q})))$, where $k_{2} = H_{1}((g^{k_{u}})^{k_{o}})$.

In the challenge phase, $A$ sends two challenge keyword-FFL pairs $(w_{0}, l_{0})$ and $(w_{1}, l_{1})$ to $C$. $C$ picks a random bit $b \in \{0, 1\}$ and sends the encrypted index $\mathit{Ind}'[w_{b}'] = \{(\mathit{id}_{il_{b}}', \mathit{ek}_{il_{b}}', l_{b}') : w_{b} \in D_{i}, l_{b} \in L_{i}, 1 \leq i \leq n\}$ of the keyword-FFL pair $(w_{b}, l_{b})$ to $A$, where $\mathit{id}_{il_{b}}' = \mathit{Enc}(k_{1}, \mathit{id}_{il_{b}})$, $\mathit{ek}_{il_{b}}' = \mathit{Enc}(k_{1}, \mathit{ek}_{il_{b}})$, $l_{b}' = \mathit{Enc}(k_{1}, H_{2}(l_{b}))$, and $k_{1} = H_{1}((g^{k_{o}})^{k_{u}})$.

In the guess phase, $A$ outputs its guess $b' \in \{0, 1\}$, indicating whether the challenge $\mathit{Ind}'[w_{b}']$ is the encrypted index of $(w_{0}, l_{0})$ or of $(w_{1}, l_{1})$.

From the perspective of $A$, $\mathit{id}_{il_{q}}' = \mathit{Enc}(k_{1}, \mathit{id}_{il_{q}})$ and $\mathit{ek}_{il_{q}}' = \mathit{Enc}(k_{1}, \mathit{ek}_{il_{q}})$ are random values in $\{0, 1\}^{\lambda}$ for every $1 \leq i \leq n$ and $2 \leq q \leq n_{q}$. Note that $k_{1} = H_{1}((g^{k_{u}})^{k_{o}}) = H_{1}((g^{k_{o}})^{k_{u}}) = k_{2}$. Then the information obtained by the adversary $A$ in $\mathit{Game}_{I}$ is the same as the information obtained by a simulator $B$ in the CPA indistinguishability experiment $SE_{B,E}^{cpa}(\lambda)$ and in the CDH experiment $CDH_{B,\mathit{Setup}}(\lambda)$. Thus, if $A$ wins $\mathit{Game}_{I}$, then $SE_{B,E}^{cpa}(\lambda) = 1$ or $CDH_{B,\mathit{Setup}}(\lambda) = 1$, i.e.,

$$\begin{aligned} \Pr[A \text{ wins } \mathit{Game}_{I}] &\leq \Pr[SE_{B,E}^{cpa}(\lambda) = 1] + \Pr[CDH_{B,\mathit{Setup}}(\lambda) = 1] \\ &\leq \frac{1}{2} + \mathit{negl}(\lambda). \end{aligned}$$

Therefore, our proposed scheme satisfies index indistinguishability under chosen keyword-FFL pair attack if $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathit{Setup}$. □

Similarly, we can prove the following theorem:

Theorem 2.

If $E = (\mathit{Gen}, \mathit{Enc}, \mathit{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathit{Setup}$, then our proposed scheme satisfies trapdoor indistinguishability under chosen keyword-FFL pair attack.

7. Performance Analysis

Table 3 presents a comprehensive comparison of the computation cost between our proposed scheme and some existing searchable encryption schemes. The notations used in Table 3 are as follows:

$T_{b p}$ : Time cost for a bilinear pairing.
$T_{h}$ : Time cost for a hash function.
$T_{e x p}$ : Time cost for an exponentiation operation in $G$ .
$T_{m u l}$ : Time cost for a multiplication operation in $G$ .
$T_{e n c}$ : Time cost for an encryption process of $E$ .
$T_{d e c}$ : Time cost for a decryption process of $E$ .

To meet a basic security level for comparison, SHA-256 and AES-256 are selected as the collision-resistant hash function and the symmetric encryption scheme, respectively. The cyclic group $G$ of order $q$ is generated by a point on an elliptic curve $E(\mathbb{F}_{p})$, where $q$ and $p$ are 256-bit and 521-bit primes, respectively. To evaluate the efficiency of the five schemes, we perform our experiments on a computer with a 2.4 GHz Intel Core i7 processor and 8 GB RAM.
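A micro-benchmark harness in the spirit of these measurements might look as follows. It is a sketch under stated assumptions: it times SHA-256 (for $T_h$) and, as a stand-in for $T_{exp}$, a 256-bit modular exponentiation modulo the 521-bit Mersenne prime $2^{521} - 1$, rather than a scalar multiplication on the elliptic curve used in the paper. Absolute numbers depend on the machine and are not comparable to the paper's figures.

```python
import hashlib
import secrets
import timeit

P521 = 2**521 - 1  # Mersenne prime, also the field prime of NIST P-521


def per_call_seconds(fn, number: int = 200) -> float:
    """Average wall-clock time of fn() over `number` calls."""
    return timeit.timeit(fn, number=number) / number


data = secrets.token_bytes(64)
base = secrets.randbelow(P521 - 2) + 2
exponent = secrets.randbits(256)

t_h = per_call_seconds(lambda: hashlib.sha256(data).digest())
t_exp = per_call_seconds(lambda: pow(base, exponent, P521))
print(f"T_h   ~ {t_h * 1e6:.2f} us")
print(f"T_exp ~ {t_exp * 1e6:.2f} us (Z_p^* stand-in, not the elliptic-curve group)")
```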

As shown in Figures 1, 2, and 3, our proposed scheme is the most efficient in the storage and search phases. In the trapdoor phase, our proposed scheme has a higher computational cost than that of Boneh et al. [7], although it is still lower than those of the other schemes. In summary, our proposed scheme is more efficient overall than the four schemes studied in [7,24,26,28].

8. Conclusions

In this paper, we have proposed a searchable encryption scheme that meets personalized privacy needs. Our proposed scheme involves three entities: the data owner, the data user, and the cloud server. The data owner outsources the encrypted file features to the cloud server. The data user queries the cloud server for the encrypted file features containing a specific keyword. The cloud server stores and retrieves the encrypted file features. Compared with existing searchable encryption schemes, our proposed scheme works for all file types, including text, audio, image, and video, and meets the different privacy needs of different individuals at the expense of a high storage cost. We also show that our proposed scheme satisfies index indistinguishability and trapdoor indistinguishability under the chosen keyword-FFL pair attack; in other words, our proposed scheme is secure against inside keyword guessing attacks (KGA). Performance analysis shows that our proposed scheme is efficient in the storage, trapdoor, and search phases.

Given the decreasing cost of storage, the storage overhead of our proposed scheme is not a problem when $n_{f} + 1$, i.e., the number of FFLs, is small. However, storage cost remains a problem when $n_{f}$ is large. Thus, choosing an appropriate $n_{f}$ is an important direction for future work.
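The dependence on $n_{f}$ can be made concrete. By the definition of the file features set $F$ in Table 2, and assuming the encrypted set $F^{'}$ holds one ciphertext per element of $F$, the cloud server stores $n(n_{f} + 1)$ encrypted file features, so storage grows linearly in $n_{f}$. The minimal sketch below uses hypothetical function names, assumes that all files share a fixed encrypted-feature size per level, and ignores the encrypted index set $Ind^{'}$.

```python
# Back-of-the-envelope storage estimate for the encrypted file-feature set F'.
# From Table 2, F = {F_il : 1 <= i <= n, 0 <= l <= n_f}, so F' is ASSUMED to
# hold n * (n_f + 1) ciphertexts. Per-level sizes are HYPOTHETICAL inputs.
from typing import Sequence


def num_encrypted_features(n: int, n_f: int) -> int:
    """Number of stored encrypted features: one per file per level."""
    if n < 0 or n_f < 0:
        raise ValueError("n and n_f must be non-negative")
    return n * (n_f + 1)


def storage_bytes(feature_sizes: Sequence[int], n: int) -> int:
    """Total bytes when each of n files stores feature_sizes[l] bytes at
    level l, where len(feature_sizes) == n_f + 1. Assumes equal sizes
    across files at a given level, which the paper does not state."""
    return n * sum(feature_sizes)
```

For example, moving from $n_{f} = 0$ to $n_{f} = 3$ multiplies the number of stored ciphertexts by four.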

Author Contributions

Writing–original draft preparation, S.L.; writing–review and editing, M.L., H.X. and X.Z.

Funding

This work is supported by the National Key R&D Program of China (No. 2018YFB1003905) and the National Natural Science Foundation of China (Grants No. U1603116 and No. 61701020).

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions that improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

IoT	Internet of Things
SSE	searchable symmetric encryption
PEKS	public key encryption with keyword search
KGA	keyword guessing attack
FFL	file feature level

References

1. Lohr, S. The age of big data. New York Times, 11 February 2012.
2. John Walker, S. Big Data: A Revolution That Will Transform How We Live, Work, and Think; Houghton Mifflin Harcourt: Boston, MA, USA, 2014.
3. Mehmood, A.; Natgunanathan, I.; Xiang, Y.; Hua, G.; Guo, S. Protection of big data privacy. IEEE Access 2016, 4, 1821–1834.
4. Bösch, C.; Hartel, P.; Jonker, W.; Peter, A. A survey of provably secure searchable encryption. ACM Comput. Surv. 2015, 47, 18.
5. Poh, G.S.; Chin, J.J.; Yau, W.C.; Choo, K.K.R.; Mohamad, M.S. Searchable symmetric encryption: Designs and challenges. ACM Comput. Surv. 2017, 50, 40.
6. Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55.
7. Boneh, D.; Di Crescenzo, G.; Ostrovsky, R.; Persiano, G. Public key encryption with keyword search. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 506–522.
8. Byers, S. Information leakage caused by hidden data in published documents. IEEE Secur. Privacy 2004, 2, 23–27.
9. Hand, D.J. Principles of data mining. Drug Safety 2007, 30, 621–622.
10. Feldman, R.; Sanger, J. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data; Cambridge University Press: Cambridge, UK, 2007.
11. Hirsch, H.G.; Pearce, D. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In Proceedings of the ASR2000-Automatic Speech Recognition: Challenges for the New Millenium ISCA Tutorial and Research Workshop (ITRW), Beijing, China, 16–20 October 2000.
12. Hong, Z.Q. Algebraic feature extraction of image for recognition. Pattern Recogn. 1991, 24, 211–219.
13. Goh, E.J. Secure indexes. IACR Cryptol. ePrint Arch. 2003, 2003, 216.
14. Chang, Y.C.; Mitzenmacher, M. Privacy preserving keyword searches on remote encrypted data. In Proceedings of the International Conference on Applied Cryptography and Network Security, New York, NY, USA, 7–10 June 2005; pp. 442–455.
15. Curtmola, R.; Garay, J.; Kamara, S.; Ostrovsky, R. Searchable symmetric encryption: Improved definitions and efficient constructions. J. Comput. Secur. 2011, 19, 895–934.
16. Cash, D.; Jarecki, S.; Jutla, C.; Krawczyk, H.; Roşu, M.C.; Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries. In Advances in Cryptology–CRYPTO 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 353–373.
17. Salam, M.I.; Yau, W.C.; Chin, J.J.; Heng, S.H.; Ling, H.C.; Phan, R.C.; Poh, G.S.; Tan, S.Y.; Yap, W.S. Implementation of searchable symmetric encryption for privacy-preserving keyword search on cloud storage. Hum. Centr. Comput. Inf. Sci. 2015, 5, 19.
18. Li, H.; Zhang, F.; Fan, C.I. Deniable searchable symmetric encryption. Inf. Sci. 2017, 402, 233–243.
19. Soleimanian, A.; Khazaei, S. Publicly verifiable searchable symmetric encryption based on efficient cryptographic components. Des. Codes Cryptogr. 2019, 87, 123–147.
20. Waters, B.R.; Balfanz, D.; Durfee, G.; Smetters, D.K. Building an Encrypted and Searchable Audit Log; NDSS: San Diego, CA, USA, 2004; Volume 4, pp. 5–6.
21. Di Crescenzo, G.; Saraswat, V. Public key encryption with searchable keywords based on Jacobi symbols. In Proceedings of the International Conference on Cryptology in India, Chennai, India, 9–13 December 2007; pp. 282–296.
22. Baek, J.; Safavi-Naini, R.; Susilo, W. Public key encryption with keyword search revisited. In Proceedings of the International Conference on Computational Science and Its Applications, Perugia, Italy, 30 June–3 July 2008; pp. 1249–1259.
23. Byun, J.W.; Rhee, H.S.; Park, H.A.; Lee, D.H. Off-line keyword guessing attacks on recent keyword search schemes over encrypted data. In Proceedings of the Workshop on Secure Data Management, Seoul, Korea, 10–11 September 2006; pp. 75–83.
24. Rhee, H.S.; Park, J.H.; Susilo, W.; Lee, D.H. Trapdoor security in a searchable public-key encryption scheme with a designated tester. J. Syst. Softw. 2010, 83, 763–771.
25. Jeong, I.R.; Kwon, J.O.; Hong, D.; Lee, D.H. Constructing PEKS schemes secure against keyword guessing attacks is possible? Comput. Commun. 2009, 32, 394–396.
26. Xu, P.; Jin, H.; Wu, Q.; Wang, W. Public-key encryption with fuzzy keyword search: A provably secure scheme under keyword guessing attack. IEEE Trans. Comput. 2013, 62, 2266–2277.
27. Liang, K.; Susilo, W. Searchable attribute-based mechanism with efficient data sharing for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1981–1992.
28. Chen, R.; Mu, Y.; Yang, G.; Guo, F.; Wang, X. Dual-server public-key encryption with keyword search for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 2016, 11, 789–798.
29. Yang, Y.; Zheng, X.; Chang, V.; Tang, C. Semantic keyword searchable proxy re-encryption for postquantum secure cloud storage. Concurr. Comput. Pract. Exp. 2017, 29, e4211.
30. Wu, L.; Chen, B.; Zeadally, S.; He, D. An efficient and secure searchable public key encryption scheme with privacy protection for cloud storage. Soft Comput. 2018, 22, 7685–7696.
31. Yin, H.; Zhang, J.; Xiong, Y.; Ou, L.; Li, F.; Liao, S.; Li, K. CP-ABSE: A Ciphertext-Policy Attribute-Based Searchable Encryption Scheme. IEEE Access 2019, 7, 5682–5694.
32. Lindell, Y.; Katz, J. Introduction to Modern Cryptography; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.

Figure 1. Computation cost in the storage phase.

Figure 2. Computation cost in the trapdoor phase.

Figure 3. Computation cost in the search phase.

Table 1. A comparison of some existing searchable encryption schemes.

Type	Limitation	Characteristic	Literature
SSE	need key distribution	masked index	[13,14]
		boolean queries	[16]
		against the coercer	[18]
		publicly verifiable	[19]
PEKS	lower search efficiency	without bilinear pairing	[21]
		without secure channel	[22]
		keyword update	[27]
		against inside KGA	[28]
		synonym keyword search	[29]
		fine-grained access control	[31]

Table 2. Summary of notations.

Notation	Description
$λ$	The security parameter
$G$	A cyclic group of order q
g	A generator of $G$
$negl (λ)$	A negligible function with respect to $λ$
$(p k_{o}, s k_{o})$	The public/private key pair of the data owner
$(p k_{u}, s k_{u})$	The public/private key pair of the data user
n	The number of files of the data owner
$F_{i}$	The i-th file of the data owner ( $1 \leq i \leq n$ )
$n_{f} + 1$	The number of file feature levels
l	A file feature level ( $0 \leq l \leq n_{f}$ )
$L_{i}$	The set of authorized file feature levels of $F_{i}$
$F_{i l}$	The file feature of $F_{i}$ at level l
$F$	The file features set $\{F_{i l} : 1 \leq i \leq n, 0 \leq l \leq n_{f}\}$
$F^{'}$	The encrypted file features set
$W_{l_{0}}$	The keyword set of the file features set $\{F_{i l_{0}} : 1 \leq i \leq n\}$
w	A keyword in $W_{l_{0}}$
$I n d$	The index set
$I n d^{'}$	The encrypted index set
$T_{w, l}$	The trapdoor with respect to w and l

Table 3. Computation cost: a comprehensive comparison.

Scheme	Storage Phase	Trapdoor Phase	Search Phase
Boneh et al. [7]	$T_{b p} + 2 T_{h} + 2 T_{e x p}$	$T_{h} + T_{e x p}$	$T_{b p} + T_{h}$
Rhee et al. [24]	$T_{b p} + 2 T_{h} + 2 T_{e x p}$	$2 T_{h} + 3 T_{e x p}$	$T_{b p} + 2 T_{h} + 2 T_{e x p} + T_{m u l}$
Xu et al. [26]	$2 T_{b p} + 4 T_{h} + 4 T_{e x p}$	$2 T_{h} + 2 T_{e x p}$	$2 T_{b p} + 2 T_{h}$
Chen et al. [28]	$T_{h} + 4 T_{e x p} + 2 T_{m u l}$	$T_{h} + 4 T_{e x p} + 2 T_{m u l}$	$7 T_{e x p} + 3 T_{m u l}$
Our scheme	$T_{e x p} + 3 T_{h} + 5 T_{e n c}$	$T_{e x p} + 3 T_{h} + 2 T_{e n c}$	$T_{e x p} + T_{h} + 2 T_{d e c}$

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, S.; Li, M.; Xu, H.; Zhou, X. Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data. Sensors 2019, 19, 1059. https://doi.org/10.3390/s19051059

AMA Style

Li S, Li M, Xu H, Zhou X. Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data. Sensors. 2019; 19(5):1059. https://doi.org/10.3390/s19051059

Chicago/Turabian Style

Li, Shuai, Miao Li, Haitao Xu, and Xianwei Zhou. 2019. "Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data" Sensors 19, no. 5: 1059. https://doi.org/10.3390/s19051059

