1. Introduction
With the rapid development of the internet, a large number of videos are stored in the cloud [1]. To protect the video content, these videos are stored in encrypted form [2]. To allow fast retrieval of encrypted video and to protect the video after decryption, data hiding is combined with encryption to embed label information into the encrypted video [3]. Furthermore, to allow lossless recovery of the video, reversible data hiding in encrypted videos has emerged [4].
Recently, reversible data hiding in encrypted images has been widely studied [5,6,7]. However, because video coding has a different structure, most schemes for reversible data hiding in encrypted images are difficult to apply to video. As a result, research on reversible data hiding in encrypted videos (RDH-EV) has developed slowly and relies on a single framework. The first RDH-EV scheme is presented in [4]. The intra-prediction mode (IPM), the sign of the motion vector difference (MVD), and the sign of the quantized discrete cosine transform (QDCT) coefficient are encrypted by RC4. Histogram shifting (HS) is used to embed information in the encrypted QDCT coefficients. The scheme separates encryption from reversible data hiding, which means that decryption and data extraction can be performed in any order. The most recent studies [8,9,10,11] adopt the separable framework and focus on improving data hiding. The scheme presented in [8] focuses on embedding capacity and adds a scale factor for embedding zone selection, so that the zone can be expanded according to different capacity requirements. In [9], the authors estimate the distortion caused by embedding information into the QDCT coefficients and assign different embedding priorities to the coefficients to reduce interframe distortion drift. This work focuses on the imperceptibility of data hiding. In [10], every two adjacent coefficients are grouped into a pair and information is embedded in the pairs by two-dimensional histogram shifting (2DHS). The scheme exploits the correlation of adjacent QDCT coefficients and provides high capacity. In [11], the authors port the framework to H.265. In that scheme, the amplitude of the MVD is encrypted by displacing the vertical and horizontal components, and the signs of the MVD and QDCT coefficients are encrypted by RC4. Conventional HS is used to embed information into the QDCT coefficients. In [12], a robust framework based on multidomain embedding is proposed. In that scheme, some QDCT coefficients of the I frame are permuted by logistic chaotic scrambling and then used for robust embedding. The QDCT coefficients of the P frame are encrypted by the ZUC algorithm and then used for reversible embedding. The scheme focuses on the robustness of data hiding.
To the best of our knowledge, the papers above are all of the recent work in this field. Although some literature [13,14,15] combining video encryption and data hiding has been published in recent years, the data hiding algorithms used there are traditional steganography [16] and are not reversible, so those schemes are outside the scope of this paper. Current schemes therefore combine only a few encryption and reversible data hiding techniques. In terms of encryption, the IPM and the signs of the MVD and QDCT coefficients are usually encrypted by a stream cipher. In 2017, Minemura et al. [17] presented a sketch attack on encrypted video, with which attackers can obtain a rough outline of the original video directly from its encrypted counterpart. Unfortunately, the encryption methods used in current schemes cannot resist this attack. This means that these schemes are not applicable to scenarios with high security requirements, such as military applications. Therefore, RDH-EV needs to adopt more secure encryption methods. In terms of reversible data hiding, only HS has been combined with the encryption methods. Conventional reversible data hiding techniques include HS [18,19,20], difference expansion (DE) [21,22,23], and integer transform [24,25]. How to combine reversible data hiding and encryption in more diverse ways is therefore also a focus of the field. Based on these two considerations, this paper proposes a novel RDH-EV scheme that combines logistic chaotic scrambling and DE. The 4 × 4 QDCT blocks in each frame are permuted by logistic chaotic scrambling. To further enhance the visual confusion effect, the IPM and MVD signs are also encrypted by XOR encryption. Information is then embedded in fixed regions of each 4 × 4 QDCT block of the P frames by DE. The highlights of this paper can be summarized as follows:
In terms of encryption, the proposed scheme can resist the sketch attack and has better visual security than the existing ones while maintaining format compliance and the bit rate.
In terms of reversible data hiding, DE is applied to RDH-EV for the first time and can provide labeled video with good quality.
Encryption and reversible data hiding in the proposed scheme are separable and can be performed in any order, so the scheme can be applied in more scenarios.
The rest of the paper is organized as follows. Section 2 summarizes related work on the separable framework. The proposed scheme is described in Section 3. The experimental results are presented in Section 4. Section 5 concludes the paper.
2. Related Work
In past decades, much work has been done on video encryption [26,27,28] and reversible data hiding [18,19,20,21,22,23,24,25]. However, few RDH-EV schemes have been put forward, and they share a single framework. In this section, we summarize the framework used in the current schemes.
In most schemes, the IPM and MVD sign are encrypted only to distort the content. The separable framework in these schemes is mainly realized by encryption and embedding in the QDCT domain. In this framework, the signs of the QDCT coefficients are flipped by stream cipher encryption and the amplitudes of the QDCT coefficients are used for data embedding. An origin-symmetric histogram shifting algorithm is designed and combined with the sign encryption algorithm to achieve separability. The strategy of this shifting algorithm is shown in Figure 1. Its prominent feature is symmetry about the origin. In the embedding phase, the bit embedded by shifting “1” to “2” is the same as the bit embedded by shifting “−1” to “−2,” and the bit embedded by leaving “1” at “1” is the same as the bit embedded by leaving “−1” at “−1.” In this way, in the extraction phase, the information extracted from symmetrical points is consistent, as shown in Equation (1). This makes information extraction independent of the sign encryption algorithm, which means that extraction can be performed in both the encrypted video and the decrypted video. For example, in the encryption phase, the sign of the coefficient “1” is flipped by XOR encryption, so “1” is encrypted to “−1.” In the embedding phase, the information is embedded in the encrypted coefficient “−1,” which is shifted to a different value depending on the bit to be embedded: if the bit is “1”/“0,” the encrypted coefficient “−1” is shifted to “−2”/“−1.” If extraction is performed in the encrypted video, the bit “1”/“0” is extracted from the encrypted shifted coefficient “−2”/“−1.” If extraction is performed in the decrypted video, the encrypted shifted coefficient “−2”/“−1” is first decrypted to “2”/“1,” and the bit extracted from “2”/“1” is still “1”/“0.” Similarly, the strategy of 2DHS satisfies Equation (2).
where Extract( ) denotes the data extraction algorithm; x″ denotes a modified QDCT coefficient; m denotes bit “0” or “1.”
where x″ and y″ denote a pair of modified QDCT coefficients.
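The sign-independence property described above can be illustrated with a minimal Python sketch. It is not the implementation of the cited schemes: the function names are hypothetical, and the handling of coefficients with amplitude 2 or more (shifted outward by one to make room) is an assumption based on conventional histogram shifting. Only the behavior of “±1” follows the description in the text.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def hs_embed(coeffs, bits):
    """Origin-symmetric histogram shifting (illustrative assumption).

    Coefficients of amplitude 1 carry one bit each: bit 0 keeps the
    amplitude at 1, bit 1 moves it to 2. Larger amplitudes are shifted
    outward by one; zeros are untouched. Signs are never changed.
    """
    bit_iter = iter(bits)
    marked = []
    for x in coeffs:
        if abs(x) == 1:
            marked.append(sign(x) * (1 + next(bit_iter)))
        elif abs(x) >= 2:
            marked.append(x + sign(x))
        else:
            marked.append(0)
    return marked


def hs_extract(marked):
    """Extract the bits and restore the coefficients before embedding."""
    bits, restored = [], []
    for x in marked:
        if abs(x) in (1, 2):
            bits.append(abs(x) - 1)       # depends on amplitude only
            restored.append(sign(x))
        elif abs(x) > 2:
            restored.append(x - sign(x))
        else:
            restored.append(0)
    return bits, restored


def flip_signs(coeffs, keystream):
    """XOR-style sign encryption: a keystream bit of 1 flips the sign."""
    return [-x if k else x for x, k in zip(coeffs, keystream)]


# Hypothetical coefficients, keystream and payload.
original = [3, -1, 0, 1, -2, 1]
keystream = [1, 0, 1, 1, 0, 1]
payload = [1, 0, 1]                       # one bit per amplitude-1 coefficient

encrypted = flip_signs(original, keystream)
marked_encrypted = hs_embed(encrypted, payload)

# Extraction in the encrypted domain.
bits_enc, restored_enc = hs_extract(marked_encrypted)
# Decryption first, then extraction in the plaintext domain.
bits_dec, restored_dec = hs_extract(flip_signs(marked_encrypted, keystream))
```

Because extraction reads only the amplitude, both orders yield the same payload, which is the separability that the framework relies on.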
It can be seen that the framework perfectly realizes the separation of encryption and data hiding, and is easy to implement. However, because the encryption method used by the framework cannot resist sketch attacks, there is an urgent need for a new separable framework that combines higher visual security encryption methods.
4. Experimental Results with Analysis
4.1. Experiment Setting
To keep the experimental results objective, nine standard common intermediate format (352 × 288) video sequences (container, stefan, coastguard, foreman, news, akiyo, bus, flower and bridge-close) are used for simulation. These video sequences can be obtained from the YUV Video Sequence website [32]. The nine selected video sequences are rich in content, including fast motion, slow motion, complex texture and simple texture scenes. The luminance components of the first 100 frames of each video sequence are used in the experiments.
- 2. Experimental operation environment and parameter setting
All experiments in this section were run on a computer equipped with an Intel i7-8550U 4 GHz CPU and 8 GB of memory. The simulations use a MATLAB H.264 codec [33]. The group of pictures (GOP) is set to “IPPP” with a length of 20, which means that the first frame is encoded as an I frame and the remaining 19 frames are encoded as P frames. The default quantization parameter (QP) is 28.
- 3. Contrast experiment setting
In terms of encryption, the proposed method is compared with the encryption algorithm commonly used in current schemes [4,8,9,10] (IPM, QDCT coefficient sign and MVD sign encryption, denoted I-Q-M encryption). As far as we know, this paper is the first to apply difference expansion to encrypted video, and no other expansion algorithm for encrypted video is available for comparison. Therefore, for the embedding algorithm, this paper compares the algorithm with itself under different settings in as much detail as possible.
4.2. Visual Encryption Effect
The visual effect of an encryption algorithm is mainly evaluated by subjective video quality and by objective metrics. The objective metrics include peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [34,35].
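For reference, PSNR for 8-bit luminance frames is conventionally computed as 10·log10(255² / MSE). The sketch below is a minimal pure-Python illustration of that standard definition. The function names and the representation of frames as nested lists are assumptions made for illustration, not the evaluation code used in the experiments.

```python
import math


def mean_squared_error(frame_a, frame_b):
    """Mean squared error between two equally sized luminance frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += (pixel_a - pixel_b) ** 2
            count += 1
    return total / count


def psnr(frame_a, frame_b, peak=255):
    """PSNR in dB; identical frames give infinity."""
    error = mean_squared_error(frame_a, frame_b)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical 2 x 2 frames.
reference = [[100, 120], [140, 160]]
off_by_one = [[101, 121], [141, 161]]
value = psnr(reference, off_by_one)
```

A low PSNR between an original frame and its encrypted version indicates strong visual distortion, which is the desired outcome for encryption, whereas a high PSNR is desired for the labeled decrypted video.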
The subjective video quality results are shown in Figure 8. It can be seen that the content of the original frame cannot be perceived from the encrypted frames. Compared with the I-Q-M encryption method, the visual effect of the proposed encryption method is smoother and shows less texture. This is because the proposed encryption method scrambles the QDCT blocks, thus further confusing the texture in the video.
Table 2 gives the PSNR and SSIM before and after video encryption. The PSNR and SSIM of frames encrypted by the proposed method do not exceed 13.03 dB and 0.1015, respectively, and are similar to those of I-Q-M encryption. In addition, the histograms of the two kinds of encrypted videos are statistically analyzed. As shown in Figure 9, the histogram of the encrypted frame differs from that of the original frame, indicating that the proposed encryption algorithm performs well. In terms of both subjective video quality and objective metrics, the encryption method used in this paper achieves a better visual confusion effect and meets the requirements of visual security.
4.3. Resistance to Sketch Attacks
Resisting sketch attacks is a remarkable characteristic of the proposed encryption method. At present, none of the encryption methods used in RDH-EV has this characteristic.
Figure 10 shows the result of the sketch attack on videos encrypted by the I-Q-M method and by the proposed method. It can be clearly seen that the I-Q-M encryption method cannot resist the sketch attack: the outline of the original video content can still be recovered from the ciphertext. Information about the video content, such as characters, scenes and vehicles, can easily be obtained from these outline images. For video encrypted by the proposed method, the sketch image produced by the attack is a noisy image without obvious information, from which no content of the original video can be distinguished.
4.4. Computational Complexity Analysis
The proposed encryption method includes three parts: IPM encryption, MVD sign encryption and QDCT block permutation. IPM encryption and MVD sign encryption are XOR encryption of codewords, and their time complexity mainly depends on the number of corresponding codewords. In a 4 × 4 macroblock, the IPM codeword used for encryption is 3 bits. Therefore, in a frame containing M × N macroblocks, the time complexity of IPM encryption is O(3(M−1)(N−1)). For the MVD, the maximum number of sign bits in a P frame is 2 × M × N, so the time complexity of MVD sign encryption is O(2MN). Scrambling QDCT blocks requires sorting and scrambling operations, and the time complexity is O(MN log2(MN) + MN). Since 3(M−1)(N−1) and 2MN grow more slowly than MN log2(MN) + MN, the total time complexity of the proposed encryption operations is O(MN log2(MN)).
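The sorting step that dominates this cost can be sketched as follows. This is a minimal illustration of a sort-based permutation driven by a logistic map, under stated assumptions: the map parameter `mu`, the initial value `x0`, the burn-in length and all function names are hypothetical, and the sketch does not reproduce the key derivation of the proposed scheme.

```python
def logistic_sequence(x0, mu, length, burn_in=100):
    """Iterate x -> mu * x * (1 - x) and return `length` values.

    The first `burn_in` iterations are discarded (an assumption commonly
    made to reduce dependence on the transient).
    """
    x = x0
    for _ in range(burn_in):
        x = mu * x * (1 - x)
    values = []
    for _ in range(length):
        x = mu * x * (1 - x)
        values.append(x)
    return values


def permutation_from_key(x0, mu, length):
    """Sort the chaotic sequence; the sorting order is the permutation.

    Sorting costs O(n log n), which is the dominant term in the analysis.
    """
    sequence = logistic_sequence(x0, mu, length)
    return sorted(range(length), key=sequence.__getitem__)


def scramble(blocks, permutation):
    """Place block permutation[i] at position i: O(n)."""
    return [blocks[source] for source in permutation]


def unscramble(scrambled, permutation):
    """Invert the permutation: O(n)."""
    original = [None] * len(permutation)
    for position, source in enumerate(permutation):
        original[source] = scrambled[position]
    return original


# Hypothetical frame of 12 blocks, each represented by a label.
blocks = [f"block{i}" for i in range(12)]
perm = permutation_from_key(x0=0.3141, mu=3.99, length=len(blocks))
scrambled = scramble(blocks, perm)
recovered = unscramble(scrambled, perm)
```

With n = MN blocks, generating the sequence and moving the blocks are linear in n, while the sort contributes the n log2(n) term.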
The I-Q-M encryption method [4,8,9,10] includes IPM encryption, QDCT sign encryption and MVD sign encryption. The time complexity of the IPM encryption and the MVD encryption is the same as in our encryption method: O(3(M−1)(N−1)) for the IPM encryption and O(2MN) for the MVD encryption. In the QDCT sign encryption, the signs of non-zero AC coefficients are encrypted by XOR, and the time complexity depends on the number of non-zero AC coefficients. Since a 4 × 4 macroblock cannot contain more than 15 non-zero AC coefficients, the time complexity of the QDCT sign encryption is O(15MN). To sum up, the total time complexity of the I-Q-M encryption is O(MN), which is lower than that of the method proposed in this paper.
The encryption method proposed in this paper is based on the coding bitstream. First, some elements, IPM and QDCT coefficients in the bitstream, are decoded, and then the corresponding encryption is performed. Finally, the encrypted bitstream is obtained by re-encoding the encrypted elements. Although the complexity of the proposed encryption method is higher than that of the I-Q-M encryption method, the proposed encryption method only partially decodes and encodes in the whole process and the time cost in the actual operation process is not too large.
4.5. Encryption Space Analysis
Generally speaking, the encryption space should be large enough to resist brute force attacks. As described in Section 3.1, the encryption keys used in the proposed encryption method are K1, K2 and K3. Each key is 256 bits long, and the keys are used to generate random sequences with the ZUC algorithm, which is difficult to attack [32]. However, the attacker may attempt to enumerate the encryption operations instead of the encryption key. Therefore, the encryption space of the I frame and the P frame is analyzed. Let ms denote the number of MVD sign bits and mQ denote the number of sign bits of non-zero AC coefficients in a frame. For the proposed encryption method and the I-Q-M encryption, the encryption space of the I frame is calculated by the following formula:
The encryption space of P frame is calculated by the following formula:
It can be seen from Equations (22) and (23) that, for the proposed encryption method, the size of the encryption space mainly depends on the number of macroblocks and the number of MVDs. For the I-Q-M encryption, the size of the encryption space mainly depends on the number of macroblocks, the number of MVDs and the number of non-zero AC coefficients. As shown in Table 3, although the encryption space of the P frame is smaller than that of the I frame, the encryption space of both the proposed method and the I-Q-M encryption is large enough to resist brute force attacks. In addition, P frame content is decoded based on the I frame content. As long as the I frame encryption is not cracked, the P frame content cannot be obtained even if the P frame encryption is cracked. Likewise, decrypting a later encrypted P frame requires the complete content of the preceding P frames; otherwise the original content cannot be recovered. To sum up, for the proposed encryption, the encryption space of the i-th frame in a GOP can be calculated as follows:
When i = 1, Ei takes the minimum value E1, which is equal to EI. In other words, the minimum encryption space to crack a frame in a GOP is EI, which is large enough to resist the brute force attack.
Table 3. The encryption space of I frame and P frame.
Video Sequence | Proposed Method, EI | Proposed Method, EP | I-Q-M Encryption, EI | I-Q-M Encryption, EP |
---|---|---|---|---|
container | 6335! × 2^18531 | 6335! × 2^1031 | 2^36034 | 2^5042 |
stefan | 6335! × 2^18531 | 6335! × 2^1188 | 2^45511 | 2^15327 |
foreman | 6335! × 2^18531 | 6335! × 2^1200 | 2^30947 | 2^6298 |
news | 6335! × 2^18531 | 6335! × 2^1112 | 2^31119 | 2^4540 |
akiyo | 6335! × 2^18531 | 6335! × 2^938 | 2^25694 | 2^2227 |
bus | 6335! × 2^18531 | 6335! × 2^1115 | 2^48383 | 2^17894 |
flower | 6335! × 2^18531 | 6335! × 2^1088 | 2^53570 | 2^14067 |
bridge-close | 6335! × 2^18531 | 6335! × 2^1366 | 2^39816 | 2^12370 |
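To compare the entries of Table 3 on a common scale, they can be converted to bits (base-2 logarithms). The sketch below is an illustrative calculation, not part of the original analysis. It assumes that each entry for the proposed method is a factorial multiplied by a power of two, and the helper names are hypothetical. It also checks that 3(M−1)(N−1) with M × N = 88 × 72 blocks of size 4 × 4 (a 352 × 288 frame) gives the exponent 18531 that appears in the table.

```python
import math


def log2_factorial(n):
    """log2(n!) computed through the log-gamma function."""
    return math.lgamma(n + 1) / math.log(2)


def space_bits(permutation_size, xor_bits):
    """Bits of an encryption space of the form permutation_size! * 2**xor_bits."""
    return log2_factorial(permutation_size) + xor_bits


# 352 x 288 luminance frame divided into 4 x 4 blocks.
blocks_wide = 352 // 4
blocks_high = 288 // 4
ipm_bits = 3 * (blocks_wide - 1) * (blocks_high - 1)

# Proposed method, I frame (same for every sequence in Table 3).
proposed_i_bits = space_bits(blocks_wide * blocks_high - 1, ipm_bits)

# akiyo, P frame: the smallest P-frame entries in Table 3.
proposed_p_akiyo_bits = space_bits(6335, 938)
iqm_p_akiyo_bits = 2227
```

Under this reading, the permutation term 6335! alone contributes on the order of 7 × 10^4 bits, so even the smallest P-frame space of the proposed method exceeds the corresponding I-Q-M entry.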
4.6. Visual Quality of Labeled Decrypted Video
Because no difference expansion algorithm in the ciphertext domain is available for comparison, this paper uses PSNR and SSIM to evaluate the visual quality of the labeled decrypted video under different QP settings. Table 4 shows the average PSNR and SSIM of the original video and the labeled video with QP of 24, 28, and 32. The data in the table are obtained at full embedding. It can be seen from the table that the PSNR and SSIM of the embedded video decrease compared with those of the original video, but the decrease is not large. The average PSNR decreases by 3.57 dB at most, and the average SSIM decreases by 0.0027 at most, which meets the requirements of good visual quality. Figure 11 shows the PSNR and SSIM changes of the container video sequence after embedding. It can be seen that, within a GOP, the PSNR and SSIM of the labeled video decrease slowly and reach their lowest values in the last frame. In other words, the visual quality of the last frame in a GOP is the worst. Figure 12 shows the 20th frame (the GOP length in this paper is 20) of each labeled video sequence. Even in the 20th frame, it is difficult to see traces of information embedding. In the cloud environment, when authorized users directly decrypt videos for rough browsing, the embedding algorithm in this paper can provide good video quality.
4.7. Embedding Capacity
Embedding capacity is a basic evaluation metric of data hiding. The full embedding capacity of each video sequence is shown in Table 5. Video sequences with complex textures, such as stefan, bus and flower, have relatively large embedding capacity, whereas sequences with simple texture, such as akiyo and container, have relatively small embedding capacity. This is because only specific QDCT blocks are selected for embedding, namely those whose DC coefficient is not 0 and whose AC coefficients are not all 0. The more complex the texture of the video sequence is, the more QDCT blocks can be used for embedding, and the larger the full embedding capacity. In video compression coding, the larger the QP value is, the more QDCT coefficients are quantized to 0, resulting in fewer embeddable QDCT blocks and smaller full embedding capacity. According to the change of SSIM with QP in Table 4, the embedding capacity of this algorithm is large when the QP value is small, while the SSIM of the embedded video does not decrease significantly. This shows that the embedding algorithm in this paper is better suited to video encoded with low QP.
4.8. Robustness Analysis
Robustness refers to the ability to retain hidden information when the carrier is changed. When the video containing information is not attacked, the information embedded by the proposed embedding algorithm can be accurately extracted from the video. However, if the video is attacked, for example by recompression, the embedded information cannot be extracted reliably. After recompression, the coefficients of some QDCT blocks containing information may all be quantized to 0, so these blocks are skipped during information extraction, resulting in extraction errors. In addition, even if the coefficients of a QDCT block are not all quantized to zero, the coefficients a and b will probably change, making it impossible to extract the information accurately. Therefore, the embedding algorithm in this paper is not robust. This means that the scheme in this paper is not suitable for scenarios with high robustness requirements, but it can be applied to content authentication, data indexing and other fields.
4.9. Reversibility Analysis
The reversibility of the embedding algorithm refers to the property that the carrier can be recovered without loss after information extraction. In this paper, the classical reversible algorithm, difference expansion, is used. The algorithm uses the difference between coefficients to extract the information and restore the carrier. Figure 13 shows the original PSNR and the PSNR after information embedding and extraction for the news, container and bus video sequences. It can be seen from the figure that the two PSNR polylines corresponding to each video sequence coincide. This shows that the carrier is restored to its original value after the information is extracted and that the original video quality is recovered, proving that the embedding algorithm in this paper is reversible.
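The reversibility of difference expansion can be illustrated with the classical integer-pair formulation (average and difference, with the difference doubled to make room for one bit). The sketch below follows that classical form and is only an illustration: the function names are hypothetical, and the coefficient selection and overflow handling of the proposed scheme are not reproduced.

```python
def de_embed(a, b, bit):
    """Embed one bit into the integer pair (a, b) by difference expansion."""
    average = (a + b) // 2            # floor of the mean, kept invariant
    difference = a - b
    expanded = 2 * difference + bit   # the bit becomes the parity of the difference
    marked_a = average + (expanded + 1) // 2
    marked_b = average - expanded // 2
    return marked_a, marked_b


def de_extract(marked_a, marked_b):
    """Recover the embedded bit and the original pair."""
    average = (marked_a + marked_b) // 2
    expanded = marked_a - marked_b
    bit = expanded % 2
    difference = expanded // 2        # floor division undoes the expansion
    a = average + (difference + 1) // 2
    b = average - difference // 2
    return bit, a, b


# Hypothetical coefficient pair and payload bit.
marked = de_embed(5, 3, 1)
bit, a, b = de_extract(*marked)
```

Python's floor division rounds toward negative infinity, which matches the floor operations of the classical formulation for negative coefficients as well.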
4.10. Bit Rate Variation
In order to further evaluate the performance of this scheme, the bit rate variations caused by the encryption and information embedding are analyzed. Bit rate variation can be calculated as follows:
where BR_e denotes the bit rate after encryption or embedding and BR_ori denotes the original bit rate. The larger BR_var is, the greater the bit rate variation caused by the operation, which affects the compression performance.
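A minimal sketch of this computation is given below. It assumes the usual relative-change definition, BR_var = (BR_e − BR_ori) / BR_ori × 100%, which is consistent with the symbol definitions above but is an assumption rather than a formula taken from the text. The function name and the sample bit rates are hypothetical.

```python
def bit_rate_variation(br_e, br_ori):
    """Relative bit rate change in percent (assumed definition)."""
    if br_ori <= 0:
        raise ValueError("original bit rate must be positive")
    return (br_e - br_ori) / br_ori * 100.0


# Hypothetical bit rates in kbit/s.
unchanged = bit_rate_variation(1000.0, 1000.0)   # encryption that keeps the bit rate
increased = bit_rate_variation(1050.0, 1000.0)   # embedding that adds bits
```

Under this definition, an operation that leaves the bitstream length unchanged gives a variation of zero.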
The encryption method in this paper includes three aspects—QDCT block permutation, IPM encryption, and MVD sign encryption—and it is implemented at the code stream. The QDCT block permutation only changes the location, not the coefficients in the QDCT block, so it will not cause bit changes. IPM encryption is to perform XOR encryption on three codewords in the code stream without changing the code length. The encryption of the MVD sign will only cause the reversal of its sign codeword, and will not increase the bit. Therefore, the encryption method proposed in this paper will not lead to changes in bit rate. The embedding algorithm used in this paper will change amplitude of some QDCT coefficients, so it will inevitably lead to the variation of bit rate. The more information is embedded, the greater the bit rate change.
Table 6 shows the bit rate variation caused by encryption and embedding of each video sequence under different QP conditions. It can be seen that the experimental results are consistent with the above analysis. In different video sequences encoded by different QP, the bit rate after encryption is always consistent with the original bit rate. As for the bit rate change caused by embedding, we can analyze it from two perspectives: video sequence and QP value. From the perspective of video sequence, the bit rate of video sequence with complex texture changes more. From the perspective of QP, the bit rate of video sequence with low QP changes more. Through analysis, the reason is that video sequences with complex texture or low QP contain more embeddable QDCT blocks, and more coefficients are modified, resulting in greater changes in bit rate.
5. Conclusions
In this paper, a novel separable scheme for encryption and reversible data hiding is proposed, with improved security in the encryption method. In the encryption stage, the QDCT blocks are permuted by logistic chaotic scrambling and part of the codewords of the IPM and the MVD are encrypted by XOR encryption, which does not change the bit rate. In the reversible data hiding stage, difference expansion is applied for the first time, and AC10 and AC12 in each 4 × 4 QDCT block of the P frame are selected as the coefficient pair for embedding. The experiments show that, compared with the encryption method used in other schemes, the proposed encryption method can resist the sketch attack and has better security. The reversible data hiding algorithm provides good visual quality of the labeled decrypted video: the average PSNR decreases by 3.57 dB at most and the average SSIM decreases by 0.0027 at most. The encryption and the data hiding in the proposed scheme are separable, and the proposed scheme can be applied to more scenarios, as described in Section 3.4. Based on these conclusions, users concerned about data privacy can safely store video data in the cloud and can directly decrypt the video in the cloud to browse the content quickly. In addition, owing to its high security, the proposed scheme may be applied to telemedicine or military fields. The proposed scheme can still be improved. Although the permutation encryption can resist sketch attacks, the increased computational complexity reduces the real-time performance of the scheme. In future work, we will focus on how to combine faster secure encryption techniques with reversible data hiding techniques.