A Content-Based Image Retrieval Scheme Using an Encrypted Difference Histogram in Cloud Computing

Content-based image retrieval (CBIR) has been widely used in many applications. The large storage and computation overheads of CBIR make outsourcing CBIR services attractive. However, outsourcing raises serious privacy concerns. In this paper, a secure CBIR scheme based on an encrypted difference histogram (EDH-CBIR) is proposed. First, the image owner calculates the order or disorder difference matrices of the RGB components and encrypts them by value replacement and position scrambling. The encrypted images are then uploaded to the cloud server, which extracts encrypted difference histograms as image feature vectors. To search for similar images, image users encrypt the query image in the same way as the image owner, and the cloud server extracts the query feature vector. The Euclidean distance between the query feature vector and each image feature vector is used to measure similarity. Security analysis and experiments demonstrate the usability of the proposed scheme.


Introduction
The number of images generated by all kinds of devices has increased greatly in recent years. Accordingly, content-based image retrieval (CBIR) technology has attracted wide attention and made remarkable advances [1][2][3][4][5]. Images consume considerable storage, and CBIR techniques typically have high computational complexity. There is thus a motivation to outsource CBIR services to the cloud server.
Public cloud storage services provide cheap storage space, convenient computation, and multiple access modes. Despite these advantages, cloud storage raises privacy and security problems that deserve careful consideration. Users generally assume that the cloud service provider is untrustworthy [6][7][8]. The urgent need for privacy protection has led researchers to study secure outsourced CBIR schemes.
To solve the privacy problems of outsourcing CBIR services, existing secure CBIR schemes mainly take the following steps. First, the user extracts features directly from the plaintext image, builds an index, and then encrypts the features, the index, and the images. The encrypted features, index, and images are then uploaded to the cloud server. In these schemes, the cloud server only provides storage and retrieval services, so the computational burden on the user side remains heavy. It is therefore necessary to design a secure CBIR scheme in which the cloud server extracts features directly from the ciphertext domain.
Contributions. This paper proposes a secure CBIR scheme based on encrypted difference histograms (EDH-CBIR). The major contributions are as follows: (1) A specially designed image encryption method is proposed that supports feature extraction directly in the ciphertext domain. (2) In EDH-CBIR, users only need to encrypt images; feature extraction and index construction are completed by the cloud server, which greatly reduces the users' workload. (3) This paper takes the statistical characteristics of difference histograms into account and considers two difference calculation methods. The retrieval accuracy and security of the two methods are tested and analyzed separately.
The rest of this paper is organized as follows. Section 2 reviews typical existing CBIR schemes, and Section 3 gives an overview of the system and security models. Section 4 elaborates on the proposed scheme. The security analysis and experimental results are presented in Sections 5 and 6, respectively. Finally, Section 7 concludes the paper.

Related Work
Early searchable symmetric encryption (SSE) schemes [9][10][11][12][13] were mainly proposed to support secure retrieval of text. Lu et al. proposed the first ciphertext image retrieval scheme in 2009 [14]. The scheme extracts local features from the whole image database and uses clustering to generate visual words. The Jaccard similarity between visual word sets measures the similarity between images, and the image content is protected by min-hash and order-preserving encryption. In the same year, Lu et al. also analyzed three kinds of feature protection methods [15]. Features encrypted with bit-plane randomization and random unary coding support Hamming distance calculation, and features encrypted with random projection support L1 distance calculation. In 2014, Lu et al. proposed an image retrieval algorithm based on homomorphic encryption and compared its retrieval precision, efficiency, and storage overhead with those of three previously proposed encryption algorithms [16]. The results show that the homomorphic-encryption-based algorithm is time-consuming despite its high level of security. Xia et al. [17] proposed an image retrieval scheme based on the scale-invariant feature transform (SIFT) and the earth mover's distance (EMD). The scheme uses SIFT to extract image features, and EMD is used to measure the distance between frequency histograms. Because computing the EMD is essentially a linear programming problem, the authors protected the features by applying a linear transformation to this linear programming problem. Cheng et al. [18] proposed an encrypted image retrieval scheme based on a Markov process. The scheme encrypts the encoded data of JPEG images with a stream cipher and extracts Markov features directly from the encrypted data; image similarity is then measured by the similarity of the Markov features. In [6], Xia et al. represented images with four MPEG descriptors and protected the image features with the secure k-nearest neighbour (kNN) algorithm; locality-sensitive hashing was used to improve search efficiency. Degang et al. [19] proposed a triple-bit quantization-based scheme, which assigns a 3-bit code to each dimension and applies an asymmetric distance algorithm to re-rank candidates. Although the aforementioned schemes address the privacy issues, the computational burden on users is considerable. To address this problem, Bellafqira et al. [20,21] proposed two privacy-preserving feature extraction methods in the homomorphic encryption domain; however, the extracted features cannot be used for image similarity retrieval. Ferreira et al. [22] proposed a secure image encryption algorithm for image retrieval that processes image color information and texture information separately. The scheme encrypts texture information with a randomized encryption algorithm, protects color information with deterministic encryption, and uses the color histogram as the image feature to support retrieval. Ferreira's scheme greatly reduces the users' computational burden, but the color histogram ignores the texture information of the image. Inspired by [22], we propose a new image retrieval scheme based on difference histograms to improve retrieval accuracy.
The proposed system includes three entities: the image owner, the cloud server, and the users. The tasks of the three entities and the communication between them are shown in Figure 1.
Image owner side. The image owner holds an original image database $I = \{I_i\}_{i=1}^{n}$, where $I_i$ is the $i$th image and $n$ is the total number of images in the database. The image owner first generates secret keys and encrypts the original images; the encrypted image database is denoted $C = \{C_i\}_{i=1}^{n}$. The image owner then outsources $C$ to the cloud server.
Cloud server side. After receiving the encrypted images, the cloud server extracts image features from them and builds the index. On receiving a search request from a user, the cloud server extracts features from the trapdoor and searches the index for the most similar features. The $k$ images with the most similar features are returned to the user.
User side. To search for the desired images, a user encrypts the query image in the same way as the image owner and uploads the encrypted query image to the cloud server as the trapdoor. The user then decrypts the similar images returned by the cloud server with the secret keys.
Like typical SSE schemes, the proposed scheme considers the semi-honest, i.e., honest-but-curious (HBC), security model.
In the HBC model, the cloud server faithfully completes the specified tasks but may try to learn the content of the encrypted images by collecting and analyzing historical search records. The image owner and the image users are assumed to be trusted, meaning that they will not reveal any private information to the cloud server during communication. In addition, if queries with images $I_i$ and $I_j$ return the same set of similar images, it is easy to infer that $I_i$ is similar to $I_j$; this kind of leakage is not considered in this paper.

The Proposed Scheme
This section presents the scheme, which consists of six main tasks: image encryption on the image owner side; feature extraction, index construction, and image retrieval on the cloud server side; and trapdoor generation and image decryption on the user side. The image user encrypts the query image in the same way as the image owner and decrypts the returned encrypted images by reversing the encryption operations. Because the users' tasks mirror those of the image owner, we focus on image encryption, feature extraction and index construction, and image retrieval.

Image Encryption
The algorithm operates in the RGB color space, and the encryption process consists of two steps: difference matrix computation and difference matrix encryption.
The difference histogram is used as the image feature. To allow the feature to be extracted directly from the encrypted image, the image owner calculates the difference matrix of the plaintext image in the following three steps:
(1) One-dimensional array conversion. Assume that every image in the database has size M × N. An appropriate conversion method turns the image pixel matrix into a one-dimensional array Array of length imgsize, where imgsize = M × N. Two conversion methods are considered: orderly scanning and disorderly block scanning, illustrated in Figure 2. In orderly scanning, the pixel values are obtained by scanning the pixel matrix in order; the scanned sequence is shown in Figure 2a. In disorderly block scanning, the image is first divided into blocks, and the pixel values are then obtained by scanning each block's pixel matrix in a disordered order. Note that the image blocks are arranged by line priority (row by row). Figure 2b gives a disorderly block scanning example with a block size of 2 × 2.
The pixels obtained by either scanning method are stored sequentially, which transforms the pixel matrix into a one-dimensional array. The two scanning methods thus yield two kinds of arrays: the order array and the disorder array. The pixels of both kinds of arrays are indexed as $Array(pixel) = \{pixel \mid 1 \le pixel \le imgsize\}$, and the associated pixel values of the RGB components are $\{Value(pixel)_*\}_{* \in \{r,g,b\}} \in [0, 255]$.
(2) Difference value calculation. The one-dimensional difference array DiffArray is obtained by subtracting adjacent values in Array. DiffArray is indexed in the same way, $DiffArray(pixel) = \{pixel \mid 1 \le pixel \le imgsize\}$, and each entry DiffValue is the difference between adjacent values of Array in the corresponding color component.
(3) Difference matrix acquisition. The difference matrix $I_{DM}$ is obtained by converting DiffArray back to matrix form with the inverse of the initial conversion method. The positions of the difference matrix are $p = \{(x, y) \mid 1 \le x \le M, 1 \le y \le N\}$, and the corresponding difference value is denoted $pv$.
After the difference matrix computation, we obtain two types of difference matrices: the order difference matrix (ODM) and the disorder difference matrix (DDM). The difference values in a difference matrix reflect the changing trend of the pixels, and the difference positions reflect the roughness of the difference matrix. To prevent privacy leakage, both ODM and DDM are encrypted by value replacement and position scrambling. Since the encryption method is the same for both, we refer to it simply as difference matrix (DM) encryption.
-Value replacement. The image owner first uses a pseudo-random permutation generator to generate three random permutations $key_{vr}$, $key_{vg}$, $key_{vb}$ of the ranges $[val^{min}_*, \ldots, val^{max}_*]$, $* \in \{r, g, b\}$, where $val^{min}_*$ and $val^{max}_*$ are the minimum and maximum difference values in the image database. The image owner then replaces each original difference value with the value at the corresponding position of the random sequence. Denote by $pv_r$, $pv_g$, and $pv_b$ the three components of a difference value $pv$ in $I_{DM}$, and by $pv'_r$, $pv'_g$, $pv'_b$ the corresponding encryption results; every $pv \in I_{DM}$ is processed in this way. A simple example visualizes the difference value replacement method. A sample random sequence is shown in Table 1. Figure 3a gives an original difference matrix, and Figure 3b shows the result of replacing its difference values with the values in the random sequence of Table 1. Note that this simple example ignores the color components and serves only as an illustration.
-Position scrambling. For example, assume that a difference value of the R component lies at position (1, 1) of the difference matrix, i.e., $(x, y)_r = (1, 1)$. Assume further that the first value of the random sequence $key_{prh}$ is 92 and the first value of the random sequence $key_{prw}$ is 88, i.e., $key_{prh}[1] = 92$ and $key_{prw}[1] = 88$. According to Formula (3), $(x, y)_r \leftarrow (key_{prh}[1], key_{prw}[1]) = (92, 88)$, i.e., the difference position (1, 1) becomes (92, 88). Applying this operation to all pixel positions yields the encrypted image C.
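The two encryption steps can be sketched as follows for one color component. Assumptions not taken from the paper: Python's `random.Random` stands in for the pseudo-random permutation generator (it is not cryptographically secure; a real deployment would derive permutations from a keyed PRP), indices are 0-based whereas the text counts from 1, and the helper names are hypothetical. Position scrambling follows the worked example above: the entry at (x, y) moves to (key_ph[x], key_pw[y]).

```python
import random

def make_value_key(val_min, val_max, seed):
    """Random permutation of the difference-value range [val_min, val_max]."""
    seq = list(range(val_min, val_max + 1))
    random.Random(seed).shuffle(seq)  # stand-in PRP, NOT cryptographically secure
    return seq

def replace_values(dm, key, val_min):
    """Value replacement: difference value v -> key[v - val_min]."""
    return [[key[v - val_min] for v in row] for row in dm]

def restore_values(enc, key, val_min):
    """Inverse of replace_values, used by the user when decrypting."""
    inverse = {c: i + val_min for i, c in enumerate(key)}
    return [[inverse[c] for c in row] for row in enc]

def make_position_key(M, N, seed):
    """Row sequence key_ph over 0..M-1 and column sequence key_pw over 0..N-1."""
    rng = random.Random(seed)
    key_ph, key_pw = list(range(M)), list(range(N))
    rng.shuffle(key_ph)
    rng.shuffle(key_pw)
    return key_ph, key_pw

def scramble_positions(mat, key_ph, key_pw):
    """Position scrambling: entry (x, y) moves to (key_ph[x], key_pw[y])."""
    M, N = len(mat), len(mat[0])
    out = [[None] * N for _ in range(M)]
    for x in range(M):
        for y in range(N):
            out[key_ph[x]][key_pw[y]] = mat[x][y]
    return out

def unscramble_positions(mat, key_ph, key_pw):
    """Inverse of scramble_positions."""
    M, N = len(mat), len(mat[0])
    return [[mat[key_ph[x]][key_pw[y]] for y in range(N)] for x in range(M)]

dm = [[10, 2, -1, 0],
      [2, 0, -1, -2]]                      # toy difference matrix (R component)
key_v = make_value_key(-255, 255, seed=1)    # value key for this component
key_ph, key_pw = make_position_key(2, 4, seed=2)
enc = scramble_positions(replace_values(dm, key_v, -255), key_ph, key_pw)
dec = restore_values(unscramble_positions(enc, key_ph, key_pw), key_v, -255)
print(dec == dm)  # True
```

Because both steps are permutations, decryption simply applies the inverse position map and then the inverse value map.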

Feature Extraction and Index Construction
The proposed scheme reduces the users' computational burden by transferring feature extraction and index construction to the cloud server. When the image owner uploads the encrypted image database, the cloud server extracts difference histograms directly from the encrypted images as image features.
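Why the cloud server can work on ciphertext: value replacement is a bijection on the value range and position scrambling only moves entries, so the histogram of an encrypted difference matrix is the plaintext histogram with its bins permuted by the same key for every image. Euclidean distances between histograms are therefore unchanged. The sketch below checks this on toy data; it assumes one histogram per color component with one bin per value in [val_min, val_max] (how components are concatenated into the final feature vector is not specified here), and all names are hypothetical.

```python
import math
import random

def diff_histogram(mat, val_min, val_max):
    """One bin per value in [val_min, val_max]; positions are ignored."""
    hist = [0] * (val_max - val_min + 1)
    for row in mat:
        for v in row:
            hist[v - val_min] += 1
    return hist

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

VMIN, VMAX = -3, 3
key = list(range(VMIN, VMAX + 1))
random.Random(5).shuffle(key)  # shared value-replacement key (toy stand-in)

def encrypt_values(dm):
    return [[key[v - VMIN] for v in row] for row in dm]

a = [[0, 1, -1], [0, 0, 2]]    # toy difference matrices of two images
b = [[3, -3, 0], [1, 1, 1]]
plain = euclidean(diff_histogram(a, VMIN, VMAX), diff_histogram(b, VMIN, VMAX))
cipher = euclidean(diff_histogram(encrypt_values(a), VMIN, VMAX),
                   diff_histogram(encrypt_values(b), VMIN, VMAX))
print(plain == cipher)  # True: the shared bin permutation preserves the distance
```

The equality only holds when all images share the same value key, which is why the image owner and the users must encrypt with identical keys.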

[Index table columns: Image Identity, Feature Vector]

Image Retrieval
The user encrypts the query image with the encryption method described above to form the trapdoor TD. After receiving TD, the cloud server extracts its feature vector $f_q = \{f_{q1}, f_{q2}, \ldots, f_{q\tau}, \ldots, f_{qVal_{sum}}\}$. To find the most similar images, the cloud server searches the index, i.e., it matches $f_q$ against every feature vector in the index. Similarity is measured by the Euclidean distance $d(f_q, f_i)$ between $f_q$ and each $f_i$, $i \in \{1, \ldots, n\}$:
$$d(f_q, f_i) = \sqrt{\sum_{\tau=1}^{Val_{sum}} \left(f_{q\tau} - f_{i\tau}\right)^2}$$
Similar vectors have smaller distances. The cloud server therefore sorts all distances in ascending order and returns the k images with the smallest distances to the querying user, which completes the image retrieval process.
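A minimal sketch of the linear-index search just described. The index is modeled as a Python dict from a hypothetical image identity to its feature vector; ties in distance are broken by identity, which the paper does not specify.

```python
import math

def euclidean(fq, fi):
    """d(f_q, f_i) = sqrt(sum over tau of (f_q[tau] - f_i[tau])^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(fq, fi)))

def search(index, fq, k):
    """Linear scan: score every entry, sort ascending, keep the k closest."""
    scored = sorted((euclidean(fq, fi), img_id) for img_id, fi in index.items())
    return [img_id for _, img_id in scored[:k]]

# Hypothetical index: image identity -> encrypted difference histogram.
index = {
    "img1": [4, 0, 1],
    "img2": [1, 3, 1],
    "img3": [3, 1, 1],
}
print(search(index, [4, 0, 0], k=2))  # ['img1', 'img3']
```

Since every query touches every index entry, the search cost grows linearly with both the number of images and the feature length.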

Security Analysis
We adopt the honest-but-curious (HBC) cloud server as the security model and analyze the security of the proposed scheme under the ciphertext-only attack (COA) model and the known background attack (KBA) model.

Security under COA Model
Our security proofs follow the paradigm of secure multi-party computation [23]. The interaction between the cloud server and the users is defined as the real experiment, and the HBC cloud server is modeled as an adversary A. We also build an ideal experiment in which a simulator S simulates all possible attacks by the cloud server. If the real experiment and the ideal experiment are computationally indistinguishable, the proposed scheme is secure.
Theorem 1. The proposed scheme is secure against HBC probabilistic polynomial-time adversaries.
The security strength is used to measure the security of the proposed scheme.

• Security of the encrypted image. Simulator S simulates an image set $I^S$. Because S knows the number of images and the image size of the database, it can simulate a hypothetical image database $I^S$ similar to the real database I. EDH-CBIR comprises the encrypted order difference histogram-based CBIR scheme (EODH-CBIR) and the encrypted disorder difference histogram-based CBIR scheme (EDDH-CBIR); the security of both is analyzed.
-EODH-CBIR. To simulate an image in EODH-CBIR, S needs to solve one permutation to obtain the order difference matrix, $val^{sum}_r! \cdot val^{sum}_g! \cdot val^{sum}_b!$ permutations for value replacement, and $(imgsize!)^3$ permutations for the pixel scrambling of the three components. Let Sec denote the security strength; the security strength of the order difference scheme is denoted $Sec_{od}$.
-EDDH-CBIR. To simulate an image in EDDH-CBIR, S needs to solve $(imgsize!)^3$ permutations to obtain the disorder difference matrix, $val^{sum}_r! \cdot val^{sum}_g! \cdot val^{sum}_b!$ permutations for value replacement, and $(imgsize!)^3$ permutations for the pixel scrambling of the three components. The security strength of the disorder difference scheme is denoted $Sec_{dd}$.
• Security of the image feature. The proposed scheme extracts the difference histogram directly from the encrypted image as the image feature. S simulates an image set $I^S$ and can extract simulated features from it. The security strength of the image feature is mainly determined by the difference value replacement, because position scrambling does not change a histogram. Therefore, the security strength of order difference image features is $\log_2(val^{sum}_r!) + \log_2(val^{sum}_g!) + \log_2(val^{sum}_b!)$ bits, and that of disorder difference image features is $3\log_2(imgsize!) + \log_2(val^{sum}_r!) + \log_2(val^{sum}_g!) + \log_2(val^{sum}_b!)$ bits.
• Security of the trapdoor. S simulates a query image $I^S_q$. Because S knows the size of the query image, it can simulate a query image $I^S_q$ with the same number of pixels as the real query image $I_q$. The user encrypts the query image in the same way as the image owner, so the trapdoor has the same security strength as the encrypted images, and the analysis is not repeated here.
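The bit strengths above involve $\log_2$ of very large factorials, which can be evaluated with the log-gamma function, since $\ln(n!) = \mathrm{lgamma}(n + 1)$. The sketch evaluates the image-feature strengths stated above; the parameters in the example (301 values per component, i.e., a range of [−150, 150], and a 64 × 64 image) are illustrative assumptions, not settings reported in the paper, and the function names are hypothetical.

```python
import math

def log2_factorial(n):
    """log2(n!) computed as lgamma(n + 1) / ln 2, avoiding huge integers."""
    return math.lgamma(n + 1) / math.log(2)

def order_feature_bits(val_sums):
    """log2(val_sum_r!) + log2(val_sum_g!) + log2(val_sum_b!)."""
    return sum(log2_factorial(v) for v in val_sums)

def disorder_feature_bits(val_sums, imgsize):
    """3 * log2(imgsize!) plus the order-difference term."""
    return 3 * log2_factorial(imgsize) + order_feature_bits(val_sums)

val_sums = (301, 301, 301)   # assumed: difference values in [-150, 150]
imgsize = 64 * 64            # assumed image size
print(round(order_feature_bits(val_sums)))
print(round(disorder_feature_bits(val_sums, imgsize)))
```

Comparing `log2_factorial(511)` with `log2_factorial(301)` gives a feel for how much a narrower observed value range shrinks the per-component search space.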

Security under the KBA Model
Beyond the leakage discussed above, the statistical characteristics of plaintext images may be inferred from the ciphertext images. The pixel values of each color component lie in [0, 255], so the difference values theoretically lie in [−255, 255]; that is, S would need to solve 511! sequences for the value replacement encryption. However, some difference values never occur in an image, which reduces the number of sequences to be solved. Taking the standard grayscale Lena image as an example, we calculate the gray order difference matrix (GODM) and the gray disorder difference matrix (GDDM) as described in Section 4.1. The difference distributions of GODM and GDDM are shown in Figures 4 and 5, respectively. The difference values of GODM lie in [−150, 150], and those of GDDM lie in [−200, 200]. Compared with the theoretical range, the number of sequences to be solved is greatly reduced.
The distribution in Figure 4 resembles a Laplace distribution, i.e., the values are concentrated around 0. Under such a distribution, an attacker can easily infer the original value corresponding to each encrypted value from its frequency of occurrence, which weakens the value replacement encryption. The distribution in Figure 5 is flatter, which improves security to some extent.
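The frequency-analysis weakness can be illustrated with a toy rank-matching attack: the attacker pairs the i-th most frequent encrypted value with the i-th most probable plaintext value from background knowledge. The data below are a synthetic, Laplace-like distribution peaked at 0 (an assumption for illustration, not the Lena statistics), and the attack is a generic technique rather than one described in the paper. Because the toy distribution is symmetric, the attack recovers the magnitude of each difference value but not necessarily its sign.

```python
import random
from collections import Counter

def frequency_attack(cipher_values, background):
    """Map each encrypted value to a plaintext guess by frequency rank."""
    c_rank = [v for v, _ in Counter(cipher_values).most_common()]
    p_rank = sorted(background, key=background.get, reverse=True)
    return dict(zip(c_rank, p_rank))

VMIN, VMAX = -5, 5
# Synthetic peaked distribution: value v occurs 2**(5 - |v|) times.
background = {v: 2 ** (5 - abs(v)) for v in range(VMIN, VMAX + 1)}
plain = [v for v, count in background.items() for _ in range(count)]

key = list(range(VMIN, VMAX + 1))
random.Random(3).shuffle(key)          # secret value-replacement key
cipher = [key[v - VMIN] for v in plain]

guess = frequency_attack(cipher, background)
print(guess[key[0 - VMIN]])            # 0: the most frequent ciphertext value
```

A flatter distribution, such as the disorder difference case, gives many values similar frequencies, so rank matching becomes much less reliable.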

Experimental Results
This section presents experimental results on encryption effectiveness and retrieval accuracy. The scheme is implemented in MATLAB 2014a, and the simulations are run on a computer with a 2.50 GHz Intel Core CPU and 16 GB of memory. All experiments are based on the INRIA Holidays database [24], which contains 1491 images in 500 classes. The first image of each class is placed in the query image set. The proposed scheme calculates the difference matrices of the three components and encrypts them. Taking the R component as an example, Figures 7 and 8 show the effects of difference value replacement, position scrambling, and their combination for the order difference and the disorder difference, respectively. The merged encryption results of the three RGB components and the final encrypted image are shown in Figure 9.

Retrieval Accuracy
In our experiments, the mean average precision (mAP) is used to measure retrieval accuracy. For the encrypted disorder difference histogram-based CBIR scheme (EDDH-CBIR), the image block size may be an important parameter affecting retrieval precision; the results for different block sizes are shown in Table 3. The proposed EDH-CBIR comprises two sub-schemes: EODH-CBIR and EDDH-CBIR. Several comparison experiments are carried out to evaluate the accuracy of the proposed method. The comparison schemes in the ciphertext domain are the encrypted color histogram-based CBIR scheme (ECH-CBIR) and the global disorder difference histogram-based CBIR scheme (GDDH-CBIR). The comparison schemes in the plaintext domain are the order difference histogram-based CBIR scheme (ODH-CBIR) and the disorder difference histogram-based CBIR scheme (DDH-CBIR). The mAPs of all these schemes are shown in Table 4. The results show that EDH-CBIR is advantageous and achieves accuracy comparable to that of the plaintext-domain schemes.
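For reference, mAP can be computed as sketched below. The sketch uses the common definition of average precision (the sum of the precision values at the ranks where relevant images appear, divided by the number of relevant images); the exact evaluation protocol behind Table 4, for example whether the query image itself is excluded from the ranking, is not stated, so treat this as an assumption. The image identities are hypothetical.

```python
def average_precision(ranked_ids, relevant):
    """AP of one query: precision@r summed over the ranks r of relevant
    results, divided by the number of relevant images."""
    hits, total = 0, 0.0
    for r, img_id in enumerate(ranked_ids, start=1):
        if img_id in relevant:
            hits += 1
            total += hits / r
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)

results = [
    (["a", "x", "b"], {"a", "b"}),  # relevant images at ranks 1 and 3
    (["y", "c"], {"c"}),            # relevant image at rank 2
]
print(round(mean_average_precision(results), 4))  # 0.6667
```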

Efficiency
Efficiency is an important evaluation criterion; it covers the time consumption of image encryption, index construction, and image search. For comparison, this section considers the schemes evaluated in the ciphertext domain.

• The time consumption of image encryption. The encryption process of ECH-CBIR includes value replacement and position scrambling. The encryption processes of EODH-CBIR and GDDH-CBIR include difference matrix calculation, difference value replacement, and pixel scrambling. The encryption process of EDDH-CBIR includes block difference matrix calculation, difference value replacement, and pixel permutation. The time consumption of image encryption for all these schemes is shown in Figure 10.

• The time consumption of image retrieval. When the cloud server receives a user's trapdoor, it searches the index for the k most similar images. The index designed in this paper is linear, so for a fixed database the retrieval time depends only on the length of the feature vectors. The time consumption of the compared schemes is shown in Figure 12.

Conclusions
In this paper, a secure CBIR scheme using encrypted difference features is proposed. The scheme encrypts the image by difference matrix calculation, difference value replacement, and difference position scrambling. We compare it with the ECH-CBIR, GDDH-CBIR, ODH-CBIR, and DDH-CBIR schemes, and the experiments show that the encrypted difference histogram feature is advantageous. However, both EDDH-CBIR and EODH-CBIR face security risks under the KBA model. Future work will focus on more efficient encryption methods to improve the security of EDH-CBIR.

Figure 3. An example of value replacement: (a) an original difference matrix; (b) the difference matrix after value replacement.

Figure 6 shows the R, G, B components of the first image in the INRIA Holidays database.

Figure 6. The first image in the INRIA Holidays database and the related R, G, B components. (a) Original image; (b) R component; (c) G component; (d) B component.

Figure 10. The time consumption of image encryption.
• The time consumption of index construction. A linear index is built for all the schemes to allow a direct comparison. The time consumption includes both feature extraction and index construction; the results of the three schemes are shown in Figure 11.

Figure 11. The time consumption of index construction.

Figure 12. The time consumption of image retrieval.

Table 1. The sequence example.

Table 3. The mean average precision (mAP) (%) of different parameters for the encrypted disorder difference histogram-based content-based image retrieval scheme (EDDH-CBIR).