Reversible Data Hiding in Encrypted Image Based on Multi-MSB Embedding Strategy

Abstract: In this paper, a reversible data hiding method in encrypted images (RDHEI) is proposed. Prior to image encryption, the embeddable pixels are selected from the original image according to prediction errors, exploiting the strong correlation between adjacent pixels. The embeddable pixels and the other pixels are then rearranged and encrypted to generate an encrypted image. During the encoding phase, secret bits are embedded directly into the multiple most significant bits (MSBs) of the embeddable pixels in the encrypted image to generate a marked encrypted image. In the decoding phase, the secret bits can be extracted from the multiple MSBs of the embeddable pixels in the marked encrypted image. Moreover, the original embeddable pixels are restored losslessly by using the correlation of adjacent pixels. Thus, a reconstructed image with high visual quality can be obtained even when only the encryption key is available. Because multiple MSBs of the embeddable pixels are exploited, the proposed method can obtain a very large embedding capacity. Experimental results show that the proposed method achieves an average embedding rate as large as 1.7215 bpp (bits per pixel) on the BOWS-2 database.


Introduction
Image encryption and data hiding are two main means of data security. The former transforms a meaningful image into a noise-like one to prevent leakage of the image content [1][2][3][4], while the latter embeds secret data into a cover image imperceptibly. In image encryption, the original image is the object to be protected, whereas in data hiding, the secret data is the information that must remain undisclosed. Traditional data hiding is usually irreversible: the embedding process introduces permanent distortion into the original carrier. This is unacceptable in applications such as military imaging, medical imaging, and judicial evidence collection, where the original carrier must be restored without distortion. To meet the requirement of lossless recovery of the original carrier, reversible data hiding (RDH) was proposed.
RDH technology has made great progress in the past decade. Early RDH methods losslessly compress the least significant bit (LSB) planes or quantization residuals to make room for secret bits [5][6][7][8]. Later, Tian [9] proposed an RDH method using difference expansion (DE). This method divides an image into a series of pixel pairs and embeds one secret bit into each pair by expanding the difference between its two pixels. Based on the DE method, several improved RDH algorithms were proposed in [10][11][12][13]. Another classic RDH technique is histogram shifting (HS) [14][15][16][17][18][19][20][21]. HS methods aim to generate a sharp histogram from pixel values [14], pixel differences [15,16], or prediction errors [17][18][19][20][21]. The peak bin is expanded for data embedding, and the other bins are shifted to preserve reversibility. Among these methods, HS-based methods can achieve better embedding performance.
Nowadays, RDHEI, in which both the original image and the secret data need to be protected, has received increasing attention in the research community. In RDHEI, image encryption and data embedding are carried out separately by different users. The content owner first encrypts the original image into a noise-like one with an encryption key; the data hider then embeds secret data into the encrypted image with a data-hiding key, without knowing the original content. The receiver can perform different operations according to the available keys. This can be applied in many scenarios, such as cloud storage and medical image management systems.
In general, RDHEI methods can be divided into two categories, namely, vacating room after encryption (VRAE) [22][23][24][25][26][27][28][29][30][31] and reserving room before encryption (RRBE) [32][33][34][35][36][37][38][39]. Puech et al. [22] used the AES (advanced encryption standard) algorithm to encrypt the original image and then randomly selected a location in each 4×4 pixel block to embed the secret bits. To extract the secret bits during the decryption phase, they performed a local standard deviation analysis of the marked image. The embedding rate (ER) of this method is very small, i.e., only 0.0625 bpp. Zhang [23] first encrypted the original image using a stream cipher, then divided the encrypted image into blocks and split each pixel block into two parts. In one part, the three LSBs of each pixel were flipped to embed a secret bit. On the receiver side, a fluctuation function was designed for data extraction and image recovery. Yu et al. [24] improved the method of Zhang [23] so that the visual quality of decrypted images is higher. Wu et al. [25] used prediction errors to introduce two RDH methods in the encrypted domain, namely, a joint method and a separable method. To make full use of the spatial correlation in the original image, Li et al. [26] abandoned block segmentation and adopted a random diffusion strategy, which performs better than the method of Zhang [23]. In [27], Zhou et al. designed a method that embeds secret bits through a public key mechanism. In the decoding process, a two-class SVM (support vector machine) classifier is used to distinguish encrypted from unencrypted image patches, so that the embedded bits and the original image can be decoded jointly. Qian et al. [28] introduced an RDH method based on distributed source coding. After stream encryption of an image, a series of pixels is selected from the encrypted image and compressed to vacate room for embedding data.
However, the entropy of an image reaches its maximum after encryption, which makes it difficult to vacate room for embedding bits, so the embedding rate is usually very low. To address this, some methods based on special encryption schemes have been proposed. Xu et al. [29] designed a special encryption mode to encrypt the non-sampled interpolation error and then used improved histogram shifting and DE to embed data. Li et al. [30] divided the image into several crosses, encrypted the pixels of each cross with the same key, and embedded secret data by shifting the difference histogram of the encrypted image. Yu et al. [31] encrypted the original image with key transmission and calculated two-layer pixel errors to generate an error histogram with three peak bins. Data hiding is performed by shifting the error histogram.
In order to improve the embedding rate, Ma et al. [32] proposed the first RRBE-based method. They exploited a traditional RDH method to achieve self-embedding before encryption so as to obtain room for data embedding. The embedding rate of Ma et al. is higher than that of the previous methods. Zhang et al. [33] predicted some pixel values before encryption, embedded data into the prediction errors through histogram shifting, and designed a special encryption scheme to encrypt the prediction errors. Yi et al. [34] increased the embedding rate by improving the method of Zhang et al. [33], using half of the pixels in the original image to predict the other half. To better utilize the correlation between adjacent pixels, Cao et al. [35] considered a patch-level sparse representation, which can reserve a large redundant room for embedding data in the encrypted image. Zhang et al. [36] encrypted the original image using homomorphic public key encryption and then embedded data into the LSBs of pixels in the encrypted image. In [37], Nguyen et al. proposed a novel RDHEI method that first divides the pixels into smooth and complex regions according to the four neighboring pixels of each pixel and then encrypts the original image. The secret data is embedded in the median plane of the smooth pixels of the encrypted image. Compared with VRAE-based RDHEI methods, these RRBE-based RDHEI methods improve the embedding capacity and the visual quality of decrypted images, but the hiding capacity is still limited by the embedding method. For this reason, Puteaux et al. [38] designed a new method based on MSB prediction, which embeds data in the MSB of the pixels.
In [38], two approaches are proposed: the CPE-HCRDH approach (high-capacity reversible data hiding approach with correction of prediction errors) and the EPE-HCRDH approach (high-capacity reversible data hiding approach with embedded prediction errors). In the CPE-HCRDH approach, the embedding rate can reach 1 bpp for every image. In the EPE-HCRDH approach, the embedding rate is slightly lower but still between 0.8 bpp and 1 bpp for most tested images. Subsequently, building on the method of Puteaux et al. [38], Yi et al. [39] further improved the maximum embedding rate by embedding data in the two MSBs of pixels. For most tested images, the embedding rate was larger than 1 bpp.
Although the methods in [38,39] achieve a high embedding rate, only one MSB and two MSBs are exploited in [38] and [39], respectively. The utilization of MSBs is still unsatisfactory in these two methods, and the embedding rate still has room for improvement. In this paper, an RDHEI method based on a multi-MSB embedding strategy is proposed. By integrating prediction with pixel correlation, multiple MSBs can be exploited to improve the embedding rate. The proposed scheme has the following highlights:
• By using a multi-MSB embedding strategy, the secret bits can be embedded in the encrypted images without any pixel oversaturation in the plaintext domain.
• More importantly, by using the multi-MSB embedding strategy, secret bits can be directly extracted in the encrypted domain from the multiple MSBs of pixels without any error. A reconstructed image with very high visual quality can be obtained when only the encryption key is available.
• Compared with the other state-of-the-art methods [38,39], the proposed method can achieve a significantly higher maximum embedding rate.
The remainder of this paper is organized as follows. The details of the proposed method are introduced in Section 2. Section 3 describes how to choose the optimal parameters. Experimental results and performance comparison are presented in Section 4. Finally, the proposed method is concluded in Section 5.

Proposed Method
In this section, an RDH method in encrypted images is proposed, which includes image encryption, data hiding in the encrypted image, data extraction, and image restoration. The proposed method suits many typical scenarios. For example, as shown in Figure 1, the content owner is a user of the cloud. To prevent the image content from being leaked, the content owner encrypts the image before uploading it to the cloud. As a third party, the cloud can provide data-embedding services to generate a marked encrypted image without knowing the original image content; that is, the cloud acts as the data hider in our method. The recipient is also a cloud user, who can obtain the marked encrypted image from the cloud and then extract the data and decrypt the image. We introduce our method from the perspectives of the content owner, the data hider, and the recipient. The content owner conducts a series of operations including selecting embeddable pixels, generating a location map, rearranging the image, and encrypting the image. The data hider embeds secret bits into the encrypted image with the data-hiding key, without knowing the content of the original image. At the receiving end, a legal recipient who has only the data-hiding key can extract the secret bits without error. A recipient who has only the encryption key can obtain a reconstructed image. A recipient who has both the data-hiding key and the encryption key can extract the data without error and restore the image losslessly.
The outline of this section is as follows: selection of embeddable pixels is introduced in Section 2.1. Image encryption and data embedding are presented in Sections 2.2 and 2.3, respectively. Finally, data extraction and image recovery are introduced in Section 2.4.

Selection of Embeddable Pixels
Suppose that the original image $I$ is an 8-bit gray-scale image of size $H \times W$, so that each pixel value $I(i,j)$ lies in [0, 255] ($1 \le i \le H$, $1 \le j \le W$). As shown in Figure 2a, the content owner divides the original image into black, white, and grey parts. The black pixels are denoted as $I_B$, the white pixels as $I_W$, and the grey pixels at the edge as $I_{ED}$. Each white pixel has some black pixels around it. According to the number and location of these surrounding black pixels, the white pixels are divided into three cases, I, II, and III, as shown in Figure 2b. In case I, the white pixel is surrounded by four diagonal black pixels. In case II, two black pixels are located to the left and right of the white pixel. In case III, two black pixels are located above and below the white pixel. Thus, a white pixel has either 2 or 4 surrounding black pixels. The local complexity $\Delta(i,j)$ of each white pixel $I_W(i,j)$ is then calculated by Equation (1), where $I_B^k(i,j)$ ($k = 1, \dots, u$) are the black pixels surrounding $I_W(i,j)$ and $I_{ave}(i,j)$ is the average value of these surrounding pixels.
A large value of $\Delta(i,j)$ indicates that the current white pixel $I_W(i,j)$ is located in a relatively complex region. We therefore define a threshold $T$ that determines whether the current pixel is in a smooth region or in a complex region. If $\Delta(i,j) \le T$, the current pixel is located in a smooth region and $I_W(i,j) \in I_W^s$; otherwise, $I_W(i,j) \in I_W^c$, where $I_W^s$ and $I_W^c$ are the sets of white pixels located in smooth and complex regions, respectively. To achieve accurate prediction, only the pixels in $I_W^s$ are predicted. For the white pixels in the three cases I, II, and III, their corresponding nearest black pixels are used for prediction. Let $\tilde{I}_W^s(i,j)$ denote the prediction value of $I_W^s(i,j)$, which is computed by Equation (2) according to the cases in Figure 2b.
According to the prediction error between $I_W^s(i,j)$ and $\tilde{I}_W^s(i,j)$, the pixels in $I_W^s$ are classified into two categories, one of which is used to embed data. The criterion for pixel classification is given by Equation (3), where $m \in \{3, 4, 5, 6, 7\}$. White pixels that satisfy Equation (3) can be used for embedding data and remain denoted as $I_W^s$ (from here on, $I_W^s$ refers to these embeddable pixels). The remaining smooth pixels cannot be used for embedding data and are denoted as $I_W^{s*}$. Since secret data is embedded into the MSBs of the pixels to generate marked pixels, the MSBs of those pixels are modified in the encoding phase. To guarantee reversibility, only the pixels that satisfy Equation (3) (i.e., $I_W^s$) can be used to embed data. The proof is as follows.
Note that the 1st to $m$th LSBs of the marked pixels remain unchanged, so these original bit planes can be restored after decryption. The $(m+1)$th to 8th MSBs of these pixels are changed by data embedding, so the surrounding pixels must be used to restore their original values in the decoding phase. Suppose that the directly decrypted value of a marked pixel is $X$. The 1st to $m$th LSBs of $X$ are unchanged, and each of the $(m+1)$th to 8th MSBs of $X$ may be 0 or 1, so there are $2^{8-m}$ possible values corresponding to the different combinations of these MSBs. Denote these values by $X_l$ ($l = 1, \dots, 2^{8-m}$); one of them must equal the original pixel value. Let $X_k$ ($1 \le k \le 2^{8-m}$) be equal to the original pixel value (i.e., $X_k = I_W^s(i,j)$). Then, according to Equation (3), $X_k$ satisfies Equation (4). Note that the black pixels surrounding $I_W^s(i,j)$ are not modified; thus, their original values are restored losslessly by direct decryption, and the same predicted value $\tilde{I}_W^s(i,j)$ can be obtained according to Equation (2).
Next, we prove that only one value of $X_l$ (i.e., $X_k$) satisfies Equation (4). Assume that two distinct candidates $X_{k_1}$ and $X_{k_2}$ ($1 \le k_1, k_2 \le 2^{8-m}$, $k_1 \ne k_2$) both satisfy Equation (4), which gives Equations (5) and (6). Combining Equations (5) and (6) yields Equation (7). Note that the 1st to $m$th LSBs of $X_{k_1}$ and $X_{k_2}$ are the same, and at least one of their $(m+1)$th to 8th MSBs differs. Each bit plane among the $(m+1)$th to 8th MSBs represents a decimal value equal to or larger than $2^m$, so $X_{k_1}$ and $X_{k_2}$ differ by at least $2^m$, which contradicts Equation (7). The assumption is therefore invalid: only one value of $X_l$ satisfies Equation (4), and this value equals the original pixel value.
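The recovery argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the displayed form of Equation (3) is not reproduced in this text, so the sketch assumes the embeddable condition is |original − prediction| < 2^(m−1), which is the bound the uniqueness proof appears to rely on. The function names `candidates` and `recover_pixel` are hypothetical, and the prediction value is taken as an input rather than computed from the neighbouring black pixels.

```python
def candidates(x: int, m: int) -> list:
    """All 2**(8 - m) values that share the m least significant bits of x."""
    low = x & ((1 << m) - 1)
    return [(hi << m) | low for hi in range(1 << (8 - m))]


def recover_pixel(x: int, pred: float, m: int) -> int:
    """Pick the candidate that lies within 2**(m - 1) of the prediction.

    Assumption (not confirmed by the text): the embeddable condition is
    abs(original - pred) < 2**(m - 1). Candidates are spaced 2**m apart,
    so at most one of them can fall inside that open interval.
    """
    hits = [c for c in candidates(x, m) if abs(c - pred) < 2 ** (m - 1)]
    if len(hits) != 1:
        raise ValueError("pixel does not satisfy the assumed embeddable condition")
    return hits[0]
```

For instance, with m = 4, an original value of 130 and a prediction of 127.5, overwriting the four MSBs with arbitrary data leaves the low nibble 0010; among the sixteen candidates only 130 lies within 8 of the prediction, so `recover_pixel(82, 127.5, 4)` returns 130.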
Since the pixels in $I_W^{s*}$ are not suitable for data embedding, these pixels should be marked. To address this problem, a location map $LM$ is used, which is a 0-1 matrix of the same size as the original image. In $LM$, the pixels in $I_W^{s*}$ are marked with 1, and the other pixels are marked with 0.

Image Encryption
Actually, to construct the encrypted image, there are three steps: auxiliary information generation, image rearrangement, and image encryption. Firstly, the auxiliary information is generated which is needed at the decoding phase. Next, the pixels are rearranged to produce a rearranged image. Last, the rearranged image is encrypted to generate its final version.
1) Auxiliary information generation: In order to correctly extract data and restore the original image losslessly during the decoding phase, auxiliary information is essential. It includes the threshold $T$ (8 bits), the parameter $m$ (3 bits), the number $S$ of pixels in $I_W^s$ (16 bits), and the location map $LM$. $LM$ is compressed losslessly by arithmetic coding to reduce its size; the compressed version is denoted as $L_{clm}$, and its size is $l_{clm}$ bits. A parameter $n$ (16 bits) records the value of $l_{clm}$ so that the auxiliary information can be extracted in advance. The parameters are arranged in the auxiliary information in the order $T$, $m$, $S$, $n$, $L_{clm}$. Consequently, the total size of the auxiliary information is $(43 + l_{clm})$ bits.
2) Image rearrangement: As Figure 2a shows, the black pixels occupy a quarter of the image; hence, these black pixels are arranged in the top quarter of the image in raster scan order. After $I_B$, the different types of pixels belonging to $I_W$ are arranged in the order $I_W^s \rightarrow I_W^{s*} \rightarrow I_W^c$. Finally, the pixels in $I_{ED}$ are arranged at the end of the image. Figure 3 shows the rearranged version of the original image, which is denoted as $R$.
3) Image encryption: In order to prevent leakage of the original image content, the rearranged image is encrypted with the encryption key $K_e$ using the RC4 stream cipher [40], chosen for its security and efficiency. Let $R_l(i,j)$ be the $l$th ($l = 1, 2, \dots, 8$) bit of $R(i,j)$; then:
$$R_l(i,j) = \left\lfloor \frac{R(i,j)}{2^{l-1}} \right\rfloor \bmod 2, \tag{8}$$
where $\lfloor \cdot \rfloor$ is the floor function. Each encrypted bit $E_l(i,j)$ is calculated by
$$E_l(i,j) = R_l(i,j) \oplus p_l(i,j), \tag{9}$$
where $\oplus$ denotes the exclusive-or operation and $p_l(i,j)$ is the corresponding keystream bit generated from the encryption key $K_e$. The encrypted pixel value $E(i,j)$ is then calculated by:
$$E(i,j) = \sum_{l=1}^{8} E_l(i,j) \times 2^{l-1}. \tag{10}$$
After that, the encrypted image $E$ is obtained.
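The bit-plane encryption described above (decompose each pixel into bits, XOR with keystream bits, recompose) can be illustrated with a short Python sketch. RC4 itself is implemented in its standard textbook form. How the keystream is mapped onto pixels is not specified in the text, so the sketch assumes one keystream byte per pixel in scan order, with bit l of that byte used as the keystream bit for bit plane l. The helper names are hypothetical.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Generate n keystream bytes with the standard RC4 KSA and PRGA."""
    s = list(range(256))
    j = 0
    for i in range(256):  # key-scheduling algorithm
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):  # pseudo-random generation algorithm
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) % 256])
    return bytes(out)


def bit_plane(value: int, l: int) -> int:
    """l-th bit of an 8-bit value, with l = 1 the LSB and l = 8 the MSB."""
    return (value >> (l - 1)) & 1


def encrypt_pixels(pixels, key: bytes) -> list:
    """XOR every bit plane of every pixel with the matching keystream bit.

    Assumption: one keystream byte per pixel, consumed in scan order.
    Because XOR is an involution, the same call also decrypts.
    """
    keystream = rc4_keystream(key, len(pixels))
    encrypted = []
    for r, p in zip(pixels, keystream):
        e = 0
        for l in range(1, 9):
            e_l = bit_plane(r, l) ^ bit_plane(p, l)  # encrypted bit of plane l
            e += e_l << (l - 1)                      # weight 2**(l - 1)
        encrypted.append(e)
    return encrypted
```

Working plane by plane is equivalent to XORing whole bytes; the explicit loop is kept only to mirror the per-bit description in the text.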

Data Embedding in the Encrypted Image
After receiving the encrypted image $E$ and the auxiliary information, the data hider can embed secret bits into the image without knowing the content of the original image. First, the encrypted bit planes of the pixels in $I_{ED}$ are replaced with the auxiliary information. The replaced bit planes and the secret bits are then concatenated to form the embedded data, which is embedded into the encrypted image. An overview of the encoding phase is presented in Figure 4. The number of pixels in $I_W^s$ is $S$, and these pixels are arranged immediately after the pixels of $I_B$, so the data hider knows which pixels belong to $I_W^s$. The data hider then uses the data-hiding key $K_h$ to pseudo-randomly select the pixels of $I_W^s$ for data embedding. In the proposed method, the $(m+1)$th to 8th MSBs of the selected pixels are directly replaced by the embedded data to generate a marked encrypted image, and the values of the marked pixels are derived as:
$$E'(i,j) = \sum_{l=1}^{m} E_l(i,j) \times 2^{l-1} + \sum_{l=m+1}^{8} d_{l-m} \times 2^{l-1}, \tag{11}$$
where $d_{l-m}$ is a bit of the embedded data with a value of 0 or 1. Each pixel of $I_W^s$ can accommodate $(8-m)$ bits of data, and the number of pixels in $I_W^s$ is determined by $m$ and $T$. Thus, the embeddable capacity of an image under fixed $m$ and $T$ is:
$$Cap = (8 - m) \times S. \tag{12}$$
The maximum embeddable capacity of an image therefore depends on the values of $m$ and $T$. How to obtain the optimal values of $m$ and $T$ is introduced in Section 3.
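The MSB replacement and the resulting capacity can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the bit ordering (the first data bit goes to bit m+1, the last to the MSB) is an assumption that follows the index d_{l−m} used in the text.

```python
def embed_pixel(e: int, data_bits, m: int) -> int:
    """Keep the m LSBs of encrypted pixel e and overwrite the 8 - m MSBs.

    Assumed ordering: data_bits[0] is written to bit m + 1 (weight 2**m),
    data_bits[-1] to bit 8 (weight 2**7).
    """
    if len(data_bits) != 8 - m:
        raise ValueError("each embeddable pixel carries exactly 8 - m bits")
    marked = e & ((1 << m) - 1)              # bits 1..m are left untouched
    for idx, d in enumerate(data_bits, start=1):
        marked |= (d & 1) << (m + idx - 1)   # bit l = m + idx has weight 2**(l - 1)
    return marked


def capacity(num_embeddable: int, m: int) -> int:
    """Embeddable capacity in bits for S embeddable pixels and parameter m."""
    return (8 - m) * num_embeddable
```

As a usage example, `embed_pixel(0b10110110, [0, 1, 0], 5)` keeps the low five bits 10110 and sets bit 7, giving 86, and `capacity(1000, 5)` is 3000 bits.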

Data Extraction and Image Recovery
In the decoding phase, the recipient can perform different operations according to the availability of encryption key and data hiding key. There are three possible scenarios:

1) The recipient only has the data-hiding key $K_h$.
2) The recipient only has the encryption key $K_e$.
3) The recipient has both the data-hiding key $K_h$ and the encryption key $K_e$.
An overview of the decoding phase is presented in Figure 5. In the first scenario, the recipient can use the data-hiding key to extract data in the encrypted domain. First, the auxiliary information is extracted from the pixels in $I_{ED}$ of the encrypted image. The recipient then obtains the positions of the marked pixels according to the data-hiding key. The $(8-m)$ data bits are extracted from each marked pixel $E'(i,j)$ as follows:
$$d_{l-m} = \left\lfloor \frac{E'(i,j)}{2^{l-1}} \right\rfloor \bmod 2, \quad l = m+1, \dots, 8. \tag{13}$$
After every marked pixel has been processed, all the data are extracted. Since the entire process is performed in the encrypted domain, the content of the original image is not leaked.
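Extraction in the encrypted domain can be sketched in Python as below. The text does not say how the data-hiding key drives the pseudo-random pixel selection, so the sketch uses Python's `random.Random` seeded with the key purely as a stand-in; `selection_order`, `extract_bits`, and `extract_all` are hypothetical names, and the bit ordering mirrors the index d_{l−m} (bit m+1 first, MSB last).

```python
import random


def selection_order(num_pixels: int, key: int) -> list:
    """Stand-in for the key-driven pseudo-random visiting order (an assumption)."""
    order = list(range(num_pixels))
    random.Random(key).shuffle(order)
    return order


def extract_bits(marked: int, m: int) -> list:
    """Read bits m + 1 .. 8 of a marked pixel, lowest of them first."""
    return [(marked >> (l - 1)) & 1 for l in range(m + 1, 9)]


def extract_all(marked_pixels, m: int, key: int) -> list:
    """Visit the embeddable pixels in key order and concatenate their payloads."""
    bits = []
    for idx in selection_order(len(marked_pixels), key):
        bits.extend(extract_bits(marked_pixels[idx], m))
    return bits
```

No decryption is involved: the payload is read straight from the MSBs of the marked encrypted pixels, which is why the encryption key is not needed in this scenario.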
In the second scenario, if the recipient only has the encryption key $K_e$, a reconstructed image can be obtained. The detailed process is as follows.
Step 1: The auxiliary information is extracted from the pixels in $I_{ED}$ of the encrypted image.
Step 2: The pixels of the encrypted image are decrypted directly by using the encryption key:
$$D_l(i,j) = E'_l(i,j) \oplus p_l(i,j). \tag{14}$$
Then the pixel values after decryption are calculated as:
$$D(i,j) = \sum_{l=1}^{8} D_l(i,j) \times 2^{l-1}, \tag{15}$$
where $E'_l(i,j)$ and $D_l(i,j)$ are the $l$th bit values of the marked pixel and the decrypted pixel, respectively.
Step 3: The pixels are moved back to their original positions. The process is as follows.
1) According to the previous scan order, the pixels of $I_B$ and $I_{ED}$ are moved back to their original positions.
2) The original positions of the pixels in $I_W$ are scanned in raster order, and $\Delta(i,j)$ is calculated one by one according to Equation (1). If $\Delta(i,j) > T$, the pixel at this position belongs to $I_W^c$. This gives the total number of pixels in $I_W^c$; since these pixels are arranged immediately in front of the pixels in $I_{ED}$, their positions in the rearranged image are easily determined. The pixels in $I_W^c$ are then moved back to their original positions one by one, and the remaining positions of $I_W$ belong to the smooth pixels.
3) From the auxiliary information, the compressed $LM$ is decompressed to obtain the positions of the pixels in $I_W^{s*}$, and these pixels are moved back to their original positions according to the location map. The remaining positions belong to the pixels of $I_W^s$. Therefore, all pixels can be moved back to their original positions.
Step 4: Since some pixels in $I_W^s$ carry embedded data and some pixels in $I_{ED}$ have been replaced by auxiliary information, the pixels of these two parts cannot be restored to their original values by direct decryption. For the marked pixels in $I_W^s$, the surrounding pixels in $I_B$ have been restored by direct decryption, so they can be used to recover the original values of the pixels in $I_W^s$, as introduced in Section 2.1. For the pixels in $I_{ED}$ that were replaced by auxiliary information, the original values cannot be restored without error in this scenario. Therefore, the average value of their surrounding black pixels is used to restore these edge pixels.
Finally, the reconstructed image is obtained. The flow chart of these steps is shown in Figure 6. Since all pixels except the edge pixels in $I_{ED}$ have been restored to their original values, the reconstructed image closely approximates the original image.
In the last scenario, if the recipient has both data-hiding key K h and encryption key K e , he can extract the embedded data and recover the original image without error.
The recipient uses the data-hiding key to extract the embedded data, which includes the secret bits and the encrypted bit planes of the pixels in $I_{ED}$. By using the encryption key, these encrypted bit planes can be decrypted directly. As described for the second scenario, only the pixels in $I_{ED}$ cannot be restored to their original values with the encryption key alone; in this scenario, however, the original bit planes of these edge pixels are available. All the pixels in $I_{ED}$ can therefore be restored to their original values by putting these bit planes back, and all the pixels in the reconstructed image are restored losslessly.

Details of Selection of Parameters
The proposed method is tested on standard images of size 512 × 512 × 8. The maximum embedding rate and the peak signal-to-noise ratio (PSNR) of the decrypted image are employed to evaluate its performance. To achieve the highest ER and PSNR, a few implementation details of parameter selection need to be introduced in advance.

Parameters of Maximum Embedding Rate
In this paper, the data is embedded into the multiple MSBs of pixels, and the embedding capacity of an image is $Cap$, as introduced in Section 2.3. The embedded data includes both the secret bits and the bit planes of edge pixels that were replaced by auxiliary information. Thus, the pure embedding capacity for secret bits is $(Cap - 43 - l_{clm})$ bits, and the pure embedding rate (PER) can be defined as:
$$PER = \frac{Cap - 43 - l_{clm}}{H \times W}. \tag{16}$$
For each image, the maximum PER is determined by the values of $m$ and $T$. In the proposed method, the parameter $T$ is used to select the smooth pixels, and only the smooth pixels are predicted in order to choose the embeddable pixels. A larger $T$ yields more smooth pixels and hence more embeddable pixels, but it also produces more pixels in $I_W^{s*}$, which must be marked in the location map. The compressed location map is embedded into the edge pixels, whose number is finite. Therefore, it is necessary to ensure that the edge pixels can accommodate the compressed location map and the other auxiliary information. The number of edge pixels is 2043, which provides 16,344 bits for auxiliary information. Thus, the following condition must be satisfied:
$$43 + l_{clm} \le 16{,}344. \tag{17}$$
On the other hand, the value of $m$ determines the number of bits that an embeddable pixel can accommodate. The smaller $m$ is, the more data bits can be embedded into an embeddable pixel; however, fewer pixels satisfy the embeddable condition (i.e., Equation (3)).
From the discussion above, an optimal combination $(m, T)$ needs to be chosen in order to obtain the maximum PER of an image. For each value of $m$ (i.e., 3, 4, 5, 6, 7), the value of $T$ is increased from 1 to at most 50 with a step size of 1 until Equation (17) is no longer satisfied. For each combination $(m, T)$ that satisfies Equation (17), the PER is obtained by using Equation (16). Finally, the combination $(m, T)$ that achieves the maximum PER of the image is selected.
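The search procedure above can be sketched as a small grid search. The two callables `count_embeddable(m, T)` and `clm_size(m, T)` are hypothetical placeholders for image-dependent computations (the number S of embeddable pixels and the compressed location-map size in bits); they are assumptions of this sketch, not functions defined in the paper. The constants follow the text: 43 fixed auxiliary bits and 2043 edge pixels of 8 bits each.

```python
from typing import Callable, Optional, Tuple

EDGE_BITS = 2043 * 8   # 16,344 bits available in the edge pixels
AUX_FIXED_BITS = 43    # T (8) + m (3) + S (16) + n (16)


def best_parameters(
    count_embeddable: Callable[[int, int], int],
    clm_size: Callable[[int, int], int],
    height: int = 512,
    width: int = 512,
    t_max: int = 50,
) -> Optional[Tuple[float, int, int]]:
    """Return (PER, m, T) with the largest pure embedding rate, or None."""
    best = None
    for m in range(3, 8):
        for t in range(1, t_max + 1):
            l_clm = clm_size(m, t)
            if AUX_FIXED_BITS + l_clm > EDGE_BITS:
                break  # auxiliary data no longer fits; stop increasing T
            cap = (8 - m) * count_embeddable(m, t)
            per = (cap - AUX_FIXED_BITS - l_clm) / (height * width)
            if best is None or per > best[0]:
                best = (per, m, t)
    return best
```

With toy inputs such as `count_embeddable = lambda m, t: 100 * t * m` and `clm_size = lambda m, t: 400 * t`, the capacity condition stops the search at T = 40 and the best combination is (m, T) = (4, 40).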

Parameters of the Highest PSNR of Decrypted Image
As introduced in Section 2.4, after the image is decrypted with only the encryption key, some pixels in $I_{ED}$ cannot be recovered to their original values because their bit planes were replaced by auxiliary information. Hence, the PSNR of the decrypted image is mainly affected by these unrecoverable edge pixels. Under a given embedding rate GER, the smaller the amount of auxiliary information, the fewer edge pixels need to be replaced and the higher the PSNR. Therefore, the optimal combination $(m^*, T^*)$ is selected according to Equation (18) to obtain the highest PSNR of the decrypted image under a given embedding rate.

Experimental Results and Analysis
In this section, we conduct several experiments to evaluate the proposed method. In Sections 4.1 and 4.2, we introduce the maximum embedding rate in tested images, and then a detailed example for the proposed method is presented. Finally, the proposed method is compared with other methods in Section 4.3.

The Maximum Embedding Rate for the Tested Images
We take ten standard images, including three publicly available images, "Lena", "Airplane", and "Man", and seven images selected from the BOWS-2 database [41], to illustrate the pure embedding rate of the proposed method. For convenience of description, we name these seven images $F_1, F_2, \dots, F_7$, as shown in Figure 7. The maximum pure embedding rates of the ten images under different values of $m$ are shown in Table 1. Table 1 shows that the embedding rate of some images is very low, or even 0, when the value of $m$ is small. This is mainly because moderately smooth images have few embeddable pixels when $m$ is small, while more pixels need to be marked in the location map. If the edge pixels cannot accommodate the auxiliary information, no secret bits are embedded, and the embedding rate is 0. For each image, the five values of $m$ give five different PERs, and the largest of them is selected as the maximum PER of the image. Different images may reach their maximum PER at different values of $m$, depending mainly on the smoothness of the image. For example, among the ten tested images, $F_1$ and $F_2$ are smoother than the others, and their maximum pure embedding rates, obtained at $m = 3$, are 3.5123 bpp and 3.0972 bpp, respectively. In general, the smoother the image, the more accurate the predicted pixel values, and the smaller the value of $m$ that can be used to obtain the maximum PER.

A Detailed Example for the Proposed Method
In this part, we apply the common image Lena to introduce the performance of the proposed method. Figure 8 shows the results. For the experiment results, the given embedding rate are 0.5 bpp and 0.9 bpp, and the optimal parameters (m*, T*) are (7,5) and (6,4), respectively. Figure 8a is the original image Lena. In Figure 8b, under (m*, T*) = (7, 5), the upper quarter of the rearranged image is composed of black pixels, and the appearance of the original image can be seen. The other three-quarters of the rearranged image are composed of various white pixels and edged pixels, which looks disordered. Figure 8c is the rearranged image under (m*, T*) = (6, 4). The rearranged images are encrypted to become noise-like, as shown in Figure 8d,e. It is well-known that each original image has its own histogram characteristic which can be used for retrieving original image. As shown in Figure 8f,g, the histograms of the encrypted images are nearly uniformly distributed, which can well protect the content of the image to withstand the statistical attack. In addition, we utilize correlation of adjacent pixels and information entropy to illustrate image encryption security further. To test the correlation between two adjacent pixels, 10,000 pairs of two horizontally, vertically, and diagonally adjacent pixels from an image are randomly selected, respectively, and then the corresponding correlation coefficient r xy of each pair is calculated using the following equations: cov(x,y) = E{(x − E(x))(y − E(y))} where x and y are values of the two adjacent pixels in the image, E(x) is the mean value of x, and D(x) is the variance of x. If r xy is close to 1, x and y have a strong correlation. Otherwise, if r xy is close to 0, it indicates there is no correlation between x and y. Table 2 shows the comparison of adjacent pixel correlation between the original image and Figure 8d,e. It is obvious that the values of pixel correlation are extremely close to 0. 
This indicates that the encryption in our method effectively breaks the correlation of pixels in all three directions.
Entropy is the most prominent measure of randomness. The information entropy H(X) of a message source X can be calculated as
$$H(X) = -\sum_{i=0}^{L-1} P(x_i) \log_2 P(x_i),$$
where $X = \{x_0, x_1, \ldots, x_{L-1}\}$ and $P(x_i)$ is the probability of $x_i$. If the information entropy is close to its maximum value, the encrypted image has excellent randomness properties. For a grayscale image, $x_i$ is an integer in the range [0, 255]. More specifically, each pixel in a grayscale image can be encoded by 8 binary bits, so the ideal value of the information entropy is 8. The information entropies of Figure 8d,e are 7.9831 and 7.9754, respectively. In summary, the histogram distribution, correlation coefficients, and entropy of the encrypted images indicate that the encryption used in the proposed method is secure.
After image encryption, secret bits are embedded into the multi-MSB of pixels in the proposed method. This greatly changes the pixel values, but because the data are embedded in the encrypted image, there is no need to consider the visual damage caused by embedding. Figure 8h,i shows the marked encrypted images when the pure embedding rates are 0.5 bpp and 0.9 bpp, respectively. In the proposed method, if only the data-hiding key is available, the embedded data can be extracted from the encrypted domain without any error. If only the encryption key is available, the reconstructed images after decryption are obtained, as shown in Figure 8j,k. The two reconstructed images are close approximations of the original image, as indicated by PSNRs of 85.98 dB and 74.25 dB, respectively. In fact, the reconstructed image is overwhelmingly similar to the original image in our proposed method, because only some edged pixels of the reconstructed image differ from their original values.
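The two security indicators used above, adjacent-pixel correlation and information entropy, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' evaluation code: the function names, the representation of an image as a list of rows of integers in [0, 255], and the fixed random seed are all assumptions made here.

```python
import math
import random
from collections import Counter

def adjacent_correlation(img, direction="horizontal", n_pairs=10_000, seed=0):
    """Correlation coefficient r_xy over randomly sampled adjacent pixel pairs.

    img is a list of rows of integer gray levels (assumed representation).
    Raises ZeroDivisionError if the sampled pixels are all identical.
    """
    dy, dx = {"horizontal": (0, 1), "vertical": (1, 0), "diagonal": (1, 1)}[direction]
    h, w = len(img), len(img[0])
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_pairs):
        r = rng.randrange(h - dy)  # keep the neighbour inside the image
        c = rng.randrange(w - dx)
        xs.append(img[r][c])
        ys.append(img[r + dy][c + dx])
    ex = sum(xs) / n_pairs
    ey = sum(ys) / n_pairs
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n_pairs
    var_x = sum((x - ex) ** 2 for x in xs) / n_pairs
    var_y = sum((y - ey) ** 2 for y in ys) / n_pairs
    return cov / math.sqrt(var_x * var_y)

def information_entropy(img):
    """Shannon entropy H(X) in bits of the gray-level distribution of img."""
    counts = Counter(p for row in img for p in row)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

For a well-encrypted 8-bit image, one would expect `adjacent_correlation` to return a value near 0 in all three directions and `information_entropy` to return a value near 8.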
If both the data-hiding key and the encryption key are available, the recipient can not only extract the data without any error but also obtain a reconstructed image that is identical to the original image, as shown in Figure 8l. In this case, the mean squared error is zero, so the PSNR of the reconstructed image is +∞.
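The PSNR values quoted in this example follow the standard definition for 8-bit images. A minimal sketch is given below, assuming images stored as lists of rows of integers; the function name `psnr` and the `peak` parameter are chosen here for illustration. A lossless reconstruction gives a mean squared error of zero, which is reported as an infinite PSNR.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0
    n_pixels = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            n_pixels += 1
    if squared_error == 0:
        return math.inf  # identical images: lossless reconstruction
    mse = squared_error / n_pixels
    return 10 * math.log10(peak ** 2 / mse)
```

Because only a small number of pixels differ between the original and the image reconstructed with the encryption key alone, the mean squared error is tiny and the PSNR is correspondingly high.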
To obtain the best PSNR of the image decrypted with only the encryption key for each given embedding rate, the optimal parameters (m*, T*) need to be chosen. Table 3 lists the optimal parameters for the image Lena under different given embedding rates. Equation (18) shows that the smaller the auxiliary information, the higher the PSNR of the decrypted image. Generally, provided that the pure embedding rate is higher than the given embedding rate, a larger value of m means that fewer pixels need to be marked (i.e., the pixels in I s * W ), so the amount of auxiliary information is smaller and the PSNR of the reconstructed image after decryption is higher. Table 3 also shows that as the given embedding rate increases, the optimal value of m decreases, which ensures that the pure embedding rate remains higher than the given embedding rate.
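The selection rule described above can be sketched as a simple search: among the candidate pairs (m, T) whose pure embedding rate meets the given embedding rate, keep the one with the highest PSNR. The sketch below assumes that the pure embedding rate and the PSNR of each candidate have already been measured. The `Candidate` record and `select_parameters` function are hypothetical names introduced here, and the paper's own search procedure may differ.

```python
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    m: int            # number of MSBs used for embedding
    T: int            # threshold parameter
    pure_rate: float  # measured pure embedding rate in bpp
    psnr: float       # PSNR in dB of the image decrypted with the encryption key only

def select_parameters(candidates: Sequence[Candidate],
                      given_rate: float) -> Optional[Candidate]:
    """Return the feasible candidate with the highest PSNR, or None if none is feasible."""
    feasible = [c for c in candidates if c.pure_rate >= given_rate]
    return max(feasible, key=lambda c: c.psnr, default=None)
```

The function returns `None` when no candidate reaches the requested rate, which corresponds to a given embedding rate above the maximum capacity of the image.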

Comparisons with Related Methods and Analysis
To demonstrate the superiority of the proposed method, we compare it with methods built on different frameworks [31,35,36,38,39] in terms of the maximum pure embedding rate and the quality of the image reconstructed with only the encryption key. The method by Yu et al. [31] is based on the VRBE framework, while the other methods [35,36,38,39] are based on the BBRE framework. In particular, the two MSB prediction methods proposed by Puteaux et al. [38] and Yi et al. [39] are closely related to the proposed method. All five compared methods are RDH algorithms in the encrypted domain.
The proposed method and the two methods proposed in [38] and [39] are all based on an MSB embedding strategy. Thus, we compare the proposed method with these two methods in terms of maximum embedding rate. We use the eight well-known images Lena, Airplane, Man, Crowd, Baboon, Hill, Peppers, and Lake for comparison. The comparison results are shown in Table 4. In the CPE approach of Puteaux et al. [38], the original image is modified in order to avoid all prediction errors. The maximum embedding rate is 1 bpp for each tested image because every pixel of the image can accommodate one bit in this approach. However, this modification causes some damage to the quality of the reconstructed image. In the EPE approach of Puteaux et al. [38], the information about the error locations is recorded in the encrypted image, so the embedding capacity is slightly smaller than that of the CPE approach. However, during the decoding phase, the original image can be recovered losslessly. In the method proposed by Yi et al. [39], unlike the two approaches in [38], two bits can be embedded into one pixel, which improves the embedding rate, especially for smooth images. In the methods proposed in [38] and [39], at most one or two bits of data can be embedded into a pixel, which limits the embedding rate. In the proposed method, a pixel can accommodate up to 5 data bits. For images with different levels of smoothness, we can select the most suitable value of m for embedding, which makes the proposed method more flexible than the methods in [38] and [39]. For our method and that of Yi et al. [39], the maximum embedding rates of the eight tested images differ considerably, because the embedding capacity of these two methods depends strongly on the smoothness of the image. The more complex the image, the smaller the embedding capacity. For the image Baboon, which is more complex, the embedding rate is much smaller than that of the other images.
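The basic operation behind an MSB embedding strategy, overwriting the top bits of a pixel with secret bits, can be sketched as follows. This is a generic bit-substitution illustration under the assumption of 8-bit pixels, not the complete scheme of the paper: the function names are hypothetical, and the selection of embeddable pixels and the prediction-based restoration of the original MSBs are not shown.

```python
def embed_msbs(pixel, bits, n_bits=8):
    """Replace the len(bits) most significant bits of pixel with the given bits."""
    k = len(bits)
    if not 1 <= k <= n_bits:
        raise ValueError("number of secret bits must be between 1 and n_bits")
    payload = 0
    for b in bits:  # first list element becomes the most significant bit
        payload = (payload << 1) | (b & 1)
    low_mask = (1 << (n_bits - k)) - 1  # keeps the untouched low-order bits
    return (payload << (n_bits - k)) | (pixel & low_mask)

def extract_msbs(pixel, k, n_bits=8):
    """Read back the k most significant bits of pixel as a list of 0/1 values."""
    top = pixel >> (n_bits - k)
    return [(top >> i) & 1 for i in range(k - 1, -1, -1)]
```

For example, `embed_msbs(0b10110101, [0, 1, 1])` yields `0b01110101`, and `extract_msbs(0b01110101, 3)` returns `[0, 1, 1]`. Carrying more bits per pixel raises capacity, but leaves fewer untouched low-order bits from which the original MSBs must later be predicted.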
Compared with the two approaches in [38], the proposed method has a higher embedding rate for all tested images except Baboon, and the average embedding rate over the eight tested images increases by 0.4268 bpp and 0.4647 bpp, respectively. Compared with Yi et al. [39], the proposed method has a higher pure embedding rate for all eight tested images, and the average embedding rate increases by 0.4437 bpp. To further demonstrate the superiority of the proposed method in maximum embedding rate, we compare our method with the methods of [38] and [39] on 10,000 test images from the BOWS-2 database [41]. As shown in Table 5, our average embedding rate over the 10,000 images is 1.7215 bpp, which is higher than the 1 bpp and 0.9681 bpp of the two approaches of Puteaux et al. [38] and the 1.3512 bpp of Yi et al. [39]. For extremely smooth images in particular, the proposed method can achieve a very high embedding rate. Ideally, nearly 3/4 of the pixels of the whole image can be used to embed data, and each of these pixels can carry 5 bits of data. Thus, the highest embedding rate of an image is close to 3.75 bpp. As can be seen from Table 5, the highest embedding rate of the proposed method reaches 3.7140 bpp. For complex images, however, the embedding rate is not ideal, and the worst case is only 0.1863 bpp. To better compare the embedding rates of different images, we randomly selected 500 images from the 10,000 images and obtained their embedding rates. The experimental results are shown in Figure 9. Compared with the EPE approach of Puteaux et al. [38] and the method of Yi et al. [39], the proposed method achieved a significant improvement in embedding rate on most images. Of these 500 tested images, the embedding rates of 495 images under the proposed method were higher than those of Puteaux et al. EPE [38], and the embedding rates of 475 images were higher than those of Yi et al. [39].
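The ideal upper bound on the embedding rate quoted above follows from a short calculation. As a sketch, let N denote the number of pixels in the image and ER_max the ideal embedding rate (both symbols are introduced here for illustration only), and assume that exactly 3/4 of the pixels are embeddable and that each carries 5 bits:

```latex
\mathrm{ER}_{\max} \approx \frac{\tfrac{3}{4} N \times 5~\text{bits}}{N~\text{pixels}}
                   = 3.75~\text{bpp}
```

The best case observed in the experiments, 3.7140 bpp, stays slightly below this bound, which is consistent with the statement that only nearly 3/4 of the pixels can be used.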
For the proposed method, the embedding rates of most tested images were between 1.6 bpp and 2.7 bpp, which is satisfactory. For Yi et al.'s method [39], the embedding rates were between 1.2 bpp and 1.7 bpp for most images, which is clearly lower than for the proposed method. In Puteaux et al. EPE [38], the maximum embedding rates of most images were between 0.8 bpp and 1 bpp. We used three straight lines to indicate the average embedding rate of the 500 tested images for the three methods. For the proposed method, the average embedding rate of the 500 images was 2.0061 bpp. Correspondingly, the average embedding rates of [38,39] were 0.9736 bpp and 1.4531 bpp, respectively. For these 500 tested images, the average embedding rate of the proposed method was higher than those of Puteaux et al. EPE [38] and Yi et al. [39] by 1.0325 bpp and 0.5530 bpp, respectively. Overall, the experimental results demonstrate that the embedding rate of the proposed method is approximately 1.8 and 1.3 times those of the methods [38] and [39], respectively.
Table 5. Maximum embedding rate comparison between the proposed method and methods [38,39] on the BOWS-2 database (bpp, 10,000 images).
For RDH in the encrypted domain, the visual quality of the image reconstructed with only the encryption key is an important evaluation index. Our method was compared with other relevant methods [31,35,36,38,39]. To do this, the well-known images Lena, Airplane, and Man were used. Figure 10 shows the rate-distortion curves generated from the three tested images. For all tested images, since the EPE approach of Puteaux et al. [38] and the method of Yi et al. [39] do not need to use overhead, the reconstructed images are lossless (i.e., PSNR = +∞). The method proposed in [35] considers patch-level sparse representation when embedding secret data. In addition, the learned dictionary is also embedded into the encrypted image.
With the powerful representation of sparse coding, a large space can be vacated, so that more secret bits can be embedded in the encrypted image. As we can see from Figure 10, as the amount of embedded data increases, the PSNR of the decrypted image decreases markedly. For example, for the image Lena, when the embedding rate increased from 0.05 bpp to 0.75 bpp, the corresponding PSNR decreased from 54 dB to 30.8 dB. In [36], a lossless, reversible, and combined data-hiding method based on probability homomorphism is proposed. In the lossless scheme, the additional data are embedded into several least significant bit planes of ciphertext pixels by multi-layer wet paper coding, and the ciphertext pixels are replaced by new values. In the reversible scheme, the image histogram is shrunk by preprocessing before image encryption, so that the modification of the encrypted image during data embedding does not cause pixel oversaturation in the plaintext domain. Because the lossless and reversible schemes are compatible, the two kinds of data embedding operations can be performed simultaneously in the encrypted image. For a given image, the higher the embedding rate, the higher the distortion of the decrypted image. In [31], a separable and error-free reversible data-hiding method for encrypted images based on two-layer pixel error is proposed. A histogram of the errors between two adjacent layers of encrypted pixels is used to embed the secret data through histogram shifting to generate a marked encrypted image. The embedding capacity is determined by the value of the parameter K, which determines which prediction errors are used to embed secret data. The higher the value of K, the more secret data can be embedded, but the more distortion the decrypted image suffers. Although the PSNRs of the reconstructed images of our method cannot reach +∞, they are all very high.
For the reconstructed images Lena, Airplane, and Man, the PSNRs are between 64 dB and 86 dB, 63 dB and 66 dB, and 64 dB and 90 dB, respectively, which are much higher than those of the CPE approach of Puteaux et al. [38] and of the methods [31,35,36]. In fact, only some edged pixels are changed, and most pixels can be recovered losslessly. The reconstructed image of our method is therefore very close to the original image.

On the other hand, the proposed method can achieve a higher embedding rate than all the compared methods [31,35,36,38,39].
Figure 10. Performance comparisons for the tested images Lena, Airplane, and Man with the methods [31,35,36,38,39].
In summary, the proposed method is able to recover the embedded data and original image without errors and achieves an excellent trade-off between the embedding rate and the visual quality of the reconstructed image with only the encryption key.

Conclusions
In this paper, we proposed an efficient RDHEI method that uses a multi-MSB embedding strategy to achieve a very high embedding rate, much higher than those of the related methods [31,35,36,38,39]. In the proposed method, the value of m at which the maximum embedding rate is obtained may differ from image to image. In general, the smoother the image, the smaller the value of m selected to obtain the maximum embedding rate. For a given embedding rate, we can select the optimal values of (m, T) to obtain the highest PSNR of the image decrypted with only the encryption key. In the image reconstructed with only the encryption key, only some of the edged pixels are damaged, and all other pixels are restored to their original values. This means that, within the maximum embeddable capacity, no matter how much data is embedded, the visual quality of the reconstructed image is not significantly damaged, and most of the pixel values can be losslessly recovered. The experimental results show that the proposed method can achieve excellent embedding performance.