A Novel Separable Scheme for Encryption and Reversible Data Hiding

Abstract: With the increasing emphasis on security and privacy, video in the cloud sometimes needs to be stored and processed in an encrypted format. To facilitate the indexing and tamper detection of encrypted videos, data hiding is performed in encrypted videos. This paper proposes a novel separable scheme for encryption and reversible data hiding. For encryption, the intra-prediction mode and motion vector difference are encrypted by XOR encryption, and the quantized discrete cosine transform blocks are permutated based on logistic chaotic mapping. For reversible data hiding, difference expansion is applied in encrypted video for the first time in this paper. The encryption method and the data hiding algorithm are separable, and the embedded information can be accurately extracted from both the encrypted video bitstream and the decrypted video bitstream. The experimental results show that the proposed encryption method can resist sketch attack and has higher security than other schemes while keeping the bit rate unchanged. The embedding algorithm provides higher capacity for video with a lower quantization parameter and good visual quality of the labeled decrypted video, while maintaining low bit rate variation. Because video encryption and reversible data hiding are separable, the scheme can be applied in more scenarios.


Introduction
With the rapid development of the internet, a large number of videos are stored in the cloud [1]. To protect the video content, these videos are stored in encrypted format [2]. To enable fast retrieval of encrypted video and protection of video after decryption, data hiding is combined with encryption to embed label information into encrypted video [3]. Furthermore, to allow lossless recovery of the video, reversible data hiding in encrypted videos has been introduced [4].
Recently, reversible data hiding in encrypted images has been widely studied [5][6][7]. However, because video coding has a different structure, most schemes of reversible data hiding in encrypted images are difficult to apply to video. Therefore, research on reversible data hiding in encrypted videos (RDH-EV) has developed slowly and relies on a single framework. In [4], the first RDH-EV scheme is presented. The intra-prediction mode (IPM), the sign of motion vector difference (MVD), and the sign of quantized discrete cosine transform (QDCT) coefficient are encrypted by RC4. Histogram shifting (HS) is implemented to embed information in the encrypted QDCT coefficient. The scheme achieves the separation of encryption and reversible data hiding, which means that decryption and data extraction can be performed in any order. The most recent studies [8][9][10][11] apply the separable framework and focus on the improvement of data hiding. The scheme presented in [8] focuses on the embedding capacity of data hiding and adds a scale factor for embedding

• In terms of encryption, the proposed scheme can resist the sketch attack and has better visual security than the existing ones while maintaining format compliance and the bit rate.
• In terms of reversible data hiding, DE is applied to RDH-EV for the first time and can provide labeled video with good quality.
• Encryption and reversible data hiding in the proposed scheme are separable and can be performed in any order, which allows the scheme to be applied in more application scenarios.
The rest of the paper is organized as follows. In Section 2, related work about the separable framework is summarized. The proposed scheme is described in Section 3. The experimental results are shown in Section 4. Section 5 concludes the paper.

Related Work
In past decades, much work has been done on video encryption [26][27][28] and reversible data hiding [18][19][20][21][22][23][24][25]. However, few RDH-EV schemes have been put forward, and they share a single framework. In this section, we summarize the framework used in the current schemes.
In most schemes, the IPM and MVD sign are encrypted only for content distortion. The separable framework is mainly realized by encryption and embedding in the QDCT domain: the signs of QDCT coefficients are flipped through stream cipher encryption, and the amplitudes of QDCT coefficients are used for data embedding. An origin-symmetric histogram shifting algorithm is designed and combined with the sign encryption algorithm to achieve separability. The strategy of this shifting algorithm is shown in Figure 1. Its prominent feature is symmetry about the origin. In the embedding phase, the bit embedded by shifting "1" to "2" is consistent with the bit embedded by shifting "−1" to "−2", and the bit embedded by leaving "1" at "1" is consistent with the bit embedded by leaving "−1" at "−1". As a result, the information extracted from symmetrical points is consistent, as shown in Equation (1). This makes information extraction independent of the sign encryption algorithm, so extraction can be performed in both the encrypted video and the decrypted video.
For example, in the encryption phase, the sign of the coefficient "1" is flipped by XOR encryption and "1" is encrypted to "−1". In the embedding phase, the information is embedded in the encrypted coefficient "−1": if the bit to be embedded is "1"/"0", the encrypted coefficient "−1" is shifted to "−2"/"−1". If extraction is performed in the encrypted video, the information bit "1"/"0" is extracted from the encrypted shifted coefficient "−2"/"−1". If extraction is performed in the decrypted video, the encrypted shifted coefficient "−2"/"−1" is first decrypted to "2"/"1", and the information bit extracted from "2"/"1" is still "1"/"0". Similarly, the strategy of 2DHS satisfies Equation (2).
Electronics 2022, 11, x FOR PEER REVIEW 3 of 22
It can be seen that the framework perfectly realizes the separation of encryption and data hiding, and is easy to implement. However, because the encryption method used by the framework cannot resist sketch attacks, there is an urgent need for a new separable framework that combines higher visual security encryption methods.
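The sign-independence property of origin-symmetric histogram shifting described in this section can be illustrated with a small sketch. It assumes the simplest one-dimensional variant, in which only coefficients of magnitude 1 carry a bit and larger magnitudes are shifted outward by one; the function names (`hs_embed`, `hs_extract`, `flip_sign`) are hypothetical and are not taken from the cited schemes.

```python
def sign(c):
    """Return -1, 0 or 1 according to the sign of c."""
    return (c > 0) - (c < 0)

def hs_embed(c, bit):
    """Origin-symmetric histogram shifting on one QDCT coefficient (assumed variant)."""
    mag = abs(c)
    if mag == 0:
        return 0                      # zeros are left untouched
    if mag == 1:
        return sign(c) * (1 + bit)    # |c| = 1 carries the bit: stays at 1 or moves to 2
    return sign(c) * (mag + 1)        # larger magnitudes are shifted outward

def hs_extract(c_marked):
    """Return (bit or None, restored coefficient); depends only on the magnitude."""
    mag = abs(c_marked)
    if mag == 0:
        return None, 0
    if mag <= 2:
        return mag - 1, sign(c_marked)
    return None, sign(c_marked) * (mag - 1)

def flip_sign(c, key_bit):
    """Sign encryption/decryption by one keystream bit (XOR on the sign)."""
    return -c if key_bit else c

# The example from the text: coefficient 1, key bit 1, embedded bit 1.
encrypted = flip_sign(1, 1)                                  # 1 -> -1
marked = hs_embed(encrypted, 1)                              # -1 -> -2
bit_in_encrypted, _ = hs_extract(marked)                     # extraction before decryption
bit_in_decrypted, restored = hs_extract(flip_sign(marked, 1))  # extraction after decryption
```

Because `hs_extract` looks only at the magnitude, the same bit is obtained before and after sign decryption, which is the separability argument made above.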

Proposed Scheme
The proposed scheme is shown in Figure 2. It includes three parts: video encryption, reversible data hiding in encrypted video, and data extraction and video recovery, which are elaborated in this section. In the video encryption stage, the video owner parses the original bitstream and encrypts the IPM, QDCT blocks and MVD to obtain the encrypted bitstream. In the data embedding stage, the data hider embeds label information in the encrypted bitstream by DE. Finally, the receiver can extract the label information from the encrypted bitstream or the decrypted bitstream and restore the original bitstream. As shown in Scenario II, the labeled encrypted bitstream is decrypted to the labeled bitstream, which can be decoded and played normally with good video quality.

Video Encryption
As mentioned in Section 2, encryption methods in which the IPM and the signs of QDCT coefficients and MVD are encrypted by XOR operation (I-Q-M encryption) can hardly resist the sketch attack. In this section, the sketch attack presented in [17] is analyzed, and then an effective encryption method is proposed.

Sketch Attack Using Macroblock Bitstream Size (MBS)
As described in [17], the complexity of a macroblock can be inferred from the bitstream size required for encoding it: the more complex the macroblock, the more bits are needed to encode it, and vice versa. Based on this feature, Equation (3) can be used to attack encrypted video and obtain the outline of the original video. In Equation (3), b(i, j) denotes the number of bits spent on encoding the (i, j)-th macroblock, max{ } is the maximum function, round( ) is the rounding function, and ϕ(i, j) denotes the (i, j)-th pixel value of the sketch image.
It can be seen from Equation (3) that MBS attacks obtain plaintext information by calculating the encoded bitstream size of each macroblock. The attack effect is shown in Figure 3. Part of the content of the original video can clearly be seen in the sketch image obtained by the MBS attack. The I-Q-M encryption does not change the bitstream size, so it cannot resist MBS attacks. An encryption algorithm can resist MBS attacks in two ways. One is to change the coded bitstream size of each element by using a specific replacement operation. In the ideal case, the bitstream sizes of all macroblocks tend to be the same, so the bitstream size of a macroblock no longer reflects its complexity. The other is to scramble the positions of the macroblocks, so that the sketch image is scrambled. Due to the complexity of the video coding structure, it is difficult to adjust the bitstream distribution to the ideal state while maintaining format compatibility. Therefore, the scrambling encryption method is used to improve the security in this paper, while keeping the bit rate unchanged.
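The idea behind the MBS attack can be sketched in a few lines. Equation (3) is not reproduced in the text above, so the normalization used here (dividing by the largest bit count and scaling to the 8-bit range 0 to 255) is an assumption that is merely consistent with the stated symbols b(i, j), max{ }, round( ) and ϕ(i, j); the function name `mbs_sketch` is hypothetical.

```python
def mbs_sketch(bits):
    """Map per-macroblock bit counts b(i, j) to sketch pixel values.

    Assumed form: phi(i, j) = round(b(i, j) / max{b} * 255).
    """
    peak = max(max(row) for row in bits)
    if peak == 0:
        # No macroblock spent any bits: the sketch is uniformly black.
        return [[0 for _ in row] for row in bits]
    return [[round(b / peak * 255) for b in row] for row in bits]

# Toy frame with 2 x 2 macroblocks; larger counts mean more complex content.
bit_counts = [[10, 30],
              [40, 0]]
sketch = mbs_sketch(bit_counts)
```

Since the pixel value depends only on how many bits each macroblock occupies, sign-only encryption leaves the sketch unchanged, whereas relocating the data that dominates the bit count scrambles it.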

Permutation of QDCT Block
Analysis shows that the bitstream size spent on encoding a macroblock mainly depends on that spent on encoding its QDCT block. In this paper, the QDCT blocks are permutated instead of the macroblocks, which are permutated in [26]. In this way, the sketch image is scrambled and no decoding error of IPM occurs.
Chaotic mapping is generally used in scrambling algorithms to generate random sequences. Chaotic systems have very complex dynamic behaviors and are widely used in the field of secure communications [29,30]. The one-dimensional logistic mapping formula is as follows:

$$x_{n+1} = \lambda x_n (1 - x_n) \qquad (4)$$

where $\lambda \in (0, 4]$ is the logistic parameter and $x_n \in (0, 1)$. When $3.57 < \lambda \le 4$, the map is in a chaotic state. The closer λ is to 4, the more uniformly the values of x are distributed over (0, 1). Therefore, λ is equal to 4 in this paper. When $x_0$ is given, a random sequence can be generated according to Equation (4), from which the random sequence $S = (s_1, \ldots, s_{l-1}, s_l)$ with length l can be intercepted. Then, the index sequence $K = (k_1, \ldots, k_{l-1}, k_l)$ is obtained according to Equation (5) and used for permutation.
Let $B_{i,j}$ denote the (i, j)-th QDCT block in a frame that has M × N QDCT blocks. The QDCT blocks in a frame are divided into two sequences $Q_0$ and $Q_1$. The permutation process is shown in Figure 4. The sequence $Q_1$ is permutated to generate the permutated sequence $Q'_1$ according to Equation (8) with the index sequence $K_1$, where $K_1(n)$ denotes the n-th number in the permutation key $K_1$ and $Q_1(n)$ denotes the n-th $B_{i,j}$ in the sequence $Q_1$. The QDCT blocks are rearranged according to $Q'_1$.
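A minimal sketch of logistic-map-driven permutation follows. The logistic iteration itself is Equation (4); everything else is an assumption made for illustration: the index sequence is taken to be the sort order of the chaotic values (a common construction, since Equation (5) is not reproduced here), the `burn_in` parameter and the value of `x0` are arbitrary, and the function names are hypothetical.

```python
def logistic_sequence(x0, length, lam=4.0, burn_in=100):
    """Iterate x_{n+1} = lam * x_n * (1 - x_n) and return `length` values."""
    x = x0
    for _ in range(burn_in):          # discard the transient (assumed choice)
        x = lam * x * (1.0 - x)
    seq = []
    for _ in range(length):
        x = lam * x * (1.0 - x)
        seq.append(x)
    return seq

def index_sequence(seq):
    """Assumed stand-in for Equation (5): positions of seq in ascending order."""
    return sorted(range(len(seq)), key=lambda n: seq[n])

def permute(blocks, key):
    """Place blocks[key[n]] at position n."""
    return [blocks[k] for k in key]

def inverse_permute(permuted, key):
    """Undo `permute` using the same key."""
    original = [None] * len(key)
    for n, k in enumerate(key):
        original[k] = permuted[n]
    return original

blocks = ["B%d" % n for n in range(8)]          # placeholders for QDCT blocks
key = index_sequence(logistic_sequence(0.3731, len(blocks)))
scrambled = permute(blocks, key)
recovered = inverse_permute(scrambled, key)
```

Anyone holding the same `x0` and λ regenerates the same key and can invert the permutation, which is why the initial value acts as the secret in this kind of construction.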

Encryption of MVD Sign
The MVD is encoded with the codeword structure [M zeros][1][INFO] using Exp-Golomb code in the video encoding process [31]. INFO is an M-bit information field. M and INFO can be calculated as follows:

$$M = \lfloor \log_2(\mathrm{codeNum} + 1) \rfloor, \qquad \mathrm{INFO} = \mathrm{codeNum} + 1 - 2^{M}$$

where codeNum is calculated according to the value k to be encoded, as follows:

$$\mathrm{codeNum} = \begin{cases} 2|k|, & k \le 0 \\ 2|k| - 1, & k > 0 \end{cases}$$

The characteristics of the codeword are shown in Table 1. It can be seen from the table that the codewords of opposite numbers differ only in the last bit. Therefore, the last bit of the MVD codeword is encrypted by XOR using a stream cipher to realize MVD sign encryption in this paper. It should be noted that an MVD contains horizontal and vertical components, and the two components are encrypted independently according to Equation (11), where $mvd_i$ denotes the i-th MVD; $mvd\_h_i$ and $mvd\_v_i$ represent the horizontal and vertical components of the MVD, respectively; and $S_1$ is generated by the ZUC algorithm [32] with the key $K_2$.
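The last-bit property used for MVD sign encryption can be checked with a short sketch of signed Exp-Golomb coding. The key bits below are arbitrary placeholders rather than ZUC output, the helper names are hypothetical, and leaving the one-bit codeword for 0 unencrypted is an assumption of this sketch (that codeword has no INFO field and 0 has no sign).

```python
def se_encode(k):
    """Signed Exp-Golomb codeword [M zeros][1][INFO] as a bit string."""
    code_num = 2 * k - 1 if k > 0 else -2 * k
    value = code_num + 1
    m = value.bit_length() - 1            # M = floor(log2(codeNum + 1))
    return "0" * m + format(value, "b")   # binary of codeNum + 1 is "1" followed by INFO

def se_decode(codeword):
    """Inverse of se_encode."""
    m = codeword.index("1")               # number of leading zeros
    code_num = int(codeword[m:], 2) - 1
    if code_num % 2:                      # odd codeNum -> positive value
        return (code_num + 1) // 2
    return -(code_num // 2)               # even codeNum -> zero or negative value

def encrypt_sign(codeword, key_bit):
    """XOR the last codeword bit with one keystream bit."""
    if len(codeword) == 1:                # codeword "1" encodes 0; left as-is (assumption)
        return codeword
    last = int(codeword[-1]) ^ key_bit
    return codeword[:-1] + str(last)

plain = se_encode(3)                      # "00110"
cipher = encrypt_sign(plain, 1)           # "00111"
decoded_cipher = se_decode(cipher)        # -3: same magnitude, opposite sign
```

The codeword length is unchanged by the XOR, which is consistent with the claim that this encryption keeps the bit rate constant.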

Encryption of IPM Codeword
To enhance visual security, the encryption of IPM is added. As in most schemes, XOR encryption is performed on the following three codewords when flag is 0, as shown in Figure 5. The IPM of the first row or column is not encrypted because encrypting it may cause decoding errors. The encryption is performed according to Equation (12), where $p_i$ denotes the i-th bit of all the IPM codewords in a frame and $S_2$ is generated by the ZUC algorithm with the key $K_3$.

Adaptive Key Generation
In order to further strengthen the security of the encryption method and facilitate key management, each frame of the video is encrypted with a different permutation key $K_f$, which is adaptively generated based on the data of each frame and an initial seed $S_0$ with 256 bits. The steps of adaptive generation are as follows:
Step 1: Arrange the DC coefficients of all QDCT blocks in a frame into a one-dimensional sequence from large to small, $D = (dc_1, dc_2, \ldots, dc_{MN})$.
Step 2: Calculate the adaptive value $x_a$ of the frame according to Equation (11).
Step 3: Generate a chaotic sequence $(x_k)_{k=0}^{\infty}$ by logistic mapping using λ and $x_a$. Extract $(x_k)_{k=1000}^{1255}$ from $(x_k)_{k=0}^{\infty}$ and generate the binary random sequence $S_p$ according to Equation (12).
Step 4: XOR $S_p$ and $S_0$ to generate a new sequence $S_n$.
Step 5: Divide $S_n$ into 16 subsequences $S_{ni}$ (i = 1, ..., 16), each of which has a length of 16 bits. Calculate $x_0$ of the frame according to Equation (13).
Step 6: Calculate the permutation sequence $K_f$ of the frame according to Equations (4) and (5) using λ and $x_0$.

Embedding Region Selection
As mentioned in Section 3.1.2, the complexity of a macroblock can be inferred from the bitstream size spent on encoding its QDCT block. The human visual system is less sensitive to small changes in complex texture regions. Therefore, QDCT blocks with a large bitstream size are selected as the embeddable region to achieve better imperceptibility. In this paper, QDCT blocks whose DC coefficient is not 0 and whose AC coefficients are not all 0 are selected for information embedding. In these blocks, AC10 and AC12 of each QDCT block form a pair of coefficients, as shown in Figure 6.

Difference Expansion in Encrypted Videos
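Let (x, y) denote the coefficient pair. The average value a and difference h of the coefficient pair can be calculated as follows:

$$a = \left\lfloor \frac{x + y}{2} \right\rfloor, \qquad h = x - y$$

The inverse transformation of the above formula is as follows:

$$x = a + \left\lfloor \frac{h + 1}{2} \right\rfloor, \qquad y = a - \left\lfloor \frac{h}{2} \right\rfloor$$

The difference h is expanded and one bit m is embedded in the expanded difference h', as follows:

$$h' = 2 \times h + m \qquad (18)$$

Then, the original coefficient pair is modified as follows:

$$x' = a + \left\lfloor \frac{h' + 1}{2} \right\rfloor, \qquad y' = a - \left\lfloor \frac{h'}{2} \right\rfloor$$

The whole embedding process is shown in Algorithm 1.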

Information Extraction and Video Recovery
Information extraction is the inverse process of data hiding. The embedded bit is extracted from the expanded difference of the labeled coefficient pair. The original difference h can then be obtained according to Equation (19), and the original coefficient pair (x, y) can be recovered by Equation (15).
The extraction operation can be implemented in encrypted video or decrypted video. The original QDCT block can be accurately restored to recover the original video.
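The embedding and extraction steps described above can be traced with a short sketch of integer difference expansion on one coefficient pair. It assumes the classical form of the transform (floor average, plain difference, h' = 2h + m) and omits any overflow handling or the block selection of the previous subsection; the function names are hypothetical.

```python
def de_embed(x, y, m):
    """Embed one bit m (0 or 1) into the pair (x, y) by difference expansion."""
    a = (x + y) // 2            # integer average (floor)
    h = x - y                   # difference
    h_exp = 2 * h + m           # expanded difference carrying the bit
    return a + (h_exp + 1) // 2, a - h_exp // 2

def de_extract(x_marked, y_marked):
    """Return the embedded bit and the original pair."""
    a = (x_marked + y_marked) // 2   # the average is preserved by embedding
    h_exp = x_marked - y_marked
    m = h_exp % 2                    # least significant bit of the expanded difference
    h = h_exp // 2                   # floor division also handles negative differences
    return m, (a + (h + 1) // 2, a - h // 2)

marked = de_embed(5, 3, 1)           # a = 4, h = 2, h' = 5 -> (7, 2)
bit, original = de_extract(*marked)  # 1, (5, 3)
```

Python's `//` and `%` follow floor semantics for negative integers, which matters here because QDCT coefficients are signed; in a language with truncating division the floor would have to be implemented explicitly.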

Application Scenario
If an RDH-EV scheme is separable, it can be applied to more application scenarios. Denote the original bitstream, the label information to be embedded, the encryption operation, the encryption key, the data-embedding operation, the encrypted bitstream, the labeled bitstream, the decryption operation and the data-extraction operation by O, M, E, $K_e$, H, $O_E$, $O_H$, D and R, respectively. If video encryption and reversible data hiding are separable, the separability condition given in [13] must hold. Specifically, the data embedding, data extraction and video recovery can be implemented in both the plain and the encrypted domains. As shown in Figure 2, the proposed scheme can realize these operations, which indicates that the scheme in this paper is separable. Figure 7 shows three application scenarios in a cloud environment.
1. User 1 only owns the encryption key. User 1 can download the labeled encrypted bitstream from the cloud and decrypt it to obtain the labeled bitstream. The labeled bitstream can be decoded normally to play video with good quality.
2. The cloud provider only owns the hiding key. The cloud provider can directly extract the label information from the encrypted bitstream to complete indexing, authentication or other operations.
3. User 2 is authorized by user 1 and the cloud provider, and owns both the encryption key and the hiding key. User 2 can not only extract the label information for management, but also decrypt the bitstream to obtain the original video.

Experimental Results with Analysis
The effectiveness of the proposed scheme has been investigated through a series of simulation experiments. Section 4.1 introduces the video sequences used, the experimental runtime environment and the methods used for comparison. The experiments for the encryption method are analyzed in Sections 4.2-4.5: the visual effect of the encryption method is analyzed in Section 4.2, the ability to resist sketch attack is shown in Section 4.3, Section 4.4 analyzes computational complexity, and the encryption space is analyzed in Section 4.5. Sections 4.6-4.9 present the experimental analysis of the embedding algorithm: the visual quality of the labeled decrypted video is analyzed in Section 4.6, the embedding capacity in Section 4.7, the robustness in Section 4.8, and the reversibility in Section 4.9. Section 4.10 reports the bit rate variation caused by the encryption or embedding.

Video sequences
For the objectivity of the experimental results, nine standard common intermediate format (352 × 288) video sequences (container, stefan, coastguard, foreman, news, akiyo, bus, flower and bridge-close) are used for simulation. These video sequences can be obtained on the YUV Video Sequence website [32]. The nine video sequences selected are rich in content, including fast motion, slow motion, complex texture and simple texture scenes. In this paper, the luminance components of the first 100 frames of each video sequence are used for experiments.

Experimental operation environment and parameter setting
All experiments in this section were simulated on a computer equipped with an Intel i7-8550U 4 GHz CPU and 8 GB memory. The simulation experiments run in a MATLAB H.264 codec [33]. The group of pictures (GOP) is set to "IPPP" with a length of 20, which means that the first frame is encoded as an I frame and the remaining 19 frames are encoded as P frames. The default quantization parameter (QP) is 28.

Contrast experiment setting
In terms of the encryption algorithm, the proposed method is compared with the encryption algorithm commonly used in the current schemes [4,8,9,10] (IPM, QDCT coefficient sign and MVD sign encryption, recorded as I-Q-M encryption). As far as we know, the difference expansion algorithm is applied to encrypted video for the first time in this paper, and no other expansion algorithm in encrypted video exists that can be used for comparison. Therefore, in terms of the embedding algorithm, this paper makes longitudinal comparative experiments as detailed as possible.

Visual Encryption Effect
The visual encryption effect of the encryption algorithm is evaluated mainly by subjective video quality and objective evaluation metrics. The objective metrics include peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [34,35].
The subjective video quality results are shown in Figure 8. The content of the original frame cannot be perceived from the encrypted frames. Compared with the I-Q-M encryption method, the visual effect of the proposed encryption method is smoother and shows less texture. This is because the proposed encryption method scrambles the QDCT blocks, thus further confusing the texture in the video.
Table 2 gives the PSNR and SSIM before and after video encryption. The PSNR and SSIM of the frames encrypted by the proposed method do not exceed 13.03 dB and 0.1015, respectively, and they are similar to those of the frames encrypted by I-Q-M encryption. In addition, the histograms of the two kinds of encrypted videos are statistically analyzed. As shown in Figure 9, the histogram of the encrypted frame is different from that of the original frame, indicating that the proposed encryption algorithm has good performance. From the perspective of both subjective video quality and objective metrics, the encryption method used in this paper achieves a better visual confusion effect and meets the requirements of visual security.
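As a rough illustration of the objective metric used above, the following Python sketch computes PSNR between two frames using the standard definition. The frames are random stand-ins rather than the test sequences of this paper, and the function name `psnr` and all array names are assumptions made for the example. SSIM is not implemented here.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """PSNR in dB between two same-sized frames (standard definition)."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

# Hypothetical data: a random 144 x 176 frame stands in for a real video frame.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(144, 176), dtype=np.uint8)

# A heavily scrambled version (pixel shuffle) versus a lightly distorted one.
scrambled = rng.permutation(original.ravel()).reshape(original.shape)
noise = rng.integers(-2, 3, original.shape)
lightly_distorted = np.clip(original.astype(int) + noise, 0, 255)

psnr_scrambled = psnr(original, scrambled)
psnr_light = psnr(original, lightly_distorted)
print(f"scrambled: {psnr_scrambled:.2f} dB, lightly distorted: {psnr_light:.2f} dB")
```

A low PSNR against the original frame, as reported for the encrypted frames, indicates that little of the original pixel content survives, while a lightly distorted frame scores much higher.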

Resistance to Sketch Attacks
Resistance to sketch attacks is a notable characteristic of the proposed encryption method. At present, none of the encryption methods used in RDH-EV has this characteristic. Figure 10 shows the results of the sketch attack on videos encrypted by the I-Q-M method and by the proposed method. The I-Q-M encryption method clearly cannot resist the sketch attack: the outline of the original video content can still be recovered from the ciphertext. Some information about the video content, such as characters, scenes and vehicles, can easily be obtained from these outline images. For the proposed method, the sketch image of the encrypted video is a noisy image without obvious information, from which no content of the original video can be distinguished.
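The difference between the two methods can be illustrated with a toy model. The sketch below assumes one simple form of sketch attack, in which the attacker builds an outline-like map from the number of non-zero coefficients in each 4 × 4 block, a statistic that stays readable without the key. The attack model, the block data and all names are assumptions for illustration, not the attack used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frame of 8 x 8 QDCT blocks, each holding 4 x 4 quantized coefficients.
rows, cols = 8, 8
blocks = np.zeros((rows, cols, 4, 4), dtype=np.int64)
blocks[:, :4] = rng.integers(-5, 6, size=(rows, 4, 4, 4))  # textured left half
blocks[:, 4:, 0, 0] = 3                                     # flat right half, DC only

def sketch(b):
    """Per-block count of non-zero coefficients (the assumed sketch statistic)."""
    return np.count_nonzero(b, axis=(2, 3))

# Stand-in for sign-only encryption: flip coefficient signs with a random mask.
signs = rng.choice([-1, 1], size=blocks.shape)
sign_encrypted = blocks * signs

# Stand-in for block permutation: move whole blocks to new positions.
perm = rng.permutation(rows * cols)
permuted = blocks.reshape(rows * cols, 4, 4)[perm].reshape(rows, cols, 4, 4)

same_after_signs = np.array_equal(sketch(blocks), sketch(sign_encrypted))
same_after_perm = np.array_equal(sketch(blocks), sketch(permuted))
print("sketch unchanged by sign encryption:", same_after_signs)
print("sketch unchanged by block permutation:", same_after_perm)
```

In this toy model, sign flipping leaves the per-block statistic untouched, so the spatial outline survives, whereas permutation keeps the same set of values but scatters them across the frame.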

Computational Complexity Analysis
The proposed encryption method includes three parts: IPM encryption, MVD sign encryption and QDCT block permutation. IPM encryption and MVD sign encryption are XOR encryptions of codewords, and their time complexity mainly depends on the number of corresponding codewords. In a 4 × 4 macroblock, the IPM codeword used for encryption is 3 bits. Therefore, in a frame containing $M \times N$ macroblocks, the time complexity of IPM encryption is $O(3(M-1)(N-1))$. For the MVD, the maximum number of sign bits in a P frame is $2 \times M \times N$, so the time complexity of MVD sign encryption is $O(2MN)$. Scrambling the QDCT blocks requires sorting and scrambling operations, with a time complexity of $O(MN\log_2 MN + MN)$. Since $3(M-1)(N-1)$ and $2MN$ are asymptotically dominated by $MN\log_2 MN + MN$, the total time complexity of the proposed encryption operations is $O(MN\log_2 MN)$.
The I-Q-M encryption method [4,8–10] includes IPM encryption, QDCT sign encryption and MVD sign encryption. Its IPM encryption and MVD sign encryption have the same time complexity as in the proposed method, namely $O(3(M-1)(N-1))$ and $O(2MN)$, respectively. The QDCT sign encryption encrypts the signs of the non-zero AC coefficients by XOR, so its time complexity depends on the number of non-zero AC coefficients. Since a 4 × 4 macroblock has at most 15 non-zero AC coefficients, the time complexity of the QDCT sign encryption is $O(15MN)$. To sum up, the total time complexity of the I-Q-M encryption is $O(MN)$, which is lower than that of the method proposed in this paper.
The encryption method proposed in this paper operates on the coded bitstream. First, some elements in the bitstream, such as the IPM and the QDCT coefficients, are decoded, and then the corresponding encryption is performed. Finally, the encrypted bitstream is obtained by re-encoding the encrypted elements. Although the complexity of the proposed encryption method is higher than that of the I-Q-M encryption method, the proposed method only partially decodes and re-encodes the bitstream, so the time cost in practice is not too large.
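To make the comparison concrete, the following Python sketch evaluates the worst-case operation counts quoted above for a few frame sizes. It only plugs numbers into the expressions from the text; the function names and the chosen values of M and N are assumptions, and constant factors of a real implementation are ignored.

```python
import math

def proposed_ops(m, n):
    """Worst-case count for the proposed method, using the expressions in the text."""
    ipm = 3 * (m - 1) * (n - 1)                      # IPM XOR encryption
    mvd = 2 * m * n                                  # MVD sign encryption
    permutation = m * n * math.log2(m * n) + m * n   # sort plus scramble
    return ipm + mvd + permutation

def iqm_ops(m, n):
    """Worst-case count for I-Q-M encryption, using the expressions in the text."""
    ipm = 3 * (m - 1) * (n - 1)
    mvd = 2 * m * n
    qdct_sign = 15 * m * n                           # at most 15 AC signs per block
    return ipm + mvd + qdct_sign

# Hypothetical frame sizes, expressed in blocks.
for m, n in [(36, 44), (128, 128), (512, 512)]:
    ratio = proposed_ops(m, n) / iqm_ops(m, n)
    print(f"M={m}, N={n}: proposed / I-Q-M = {ratio:.2f}")
```

With these worst-case expressions, the permutation term $MN\log_2 MN + MN$ exceeds the $15MN$ sign-encryption term only when $MN > 2^{14}$, so the asymptotic ordering translates into a modest difference for typical frame sizes.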

Encryption Space Analysis
Generally speaking, the encryption space should be large enough to resist brute force attacks. As described in Section 3.1, the encryption keys used in the proposed encryption method are $K_1$, $K_2$ and $K_3$. Each key is 256 bits long and is used to generate random sequences by the ZUC algorithm, which is difficult to attack [32]. However, the attacker may attempt to enumerate the encryption operations instead of the encryption keys. Therefore, the encryption space of the I frame and the P frame is analyzed. Let $m_s$ denote the number of MVD sign bits and $m_Q$ denote the number of sign bits of non-zero AC coefficients in a frame. For the proposed encryption method and the I-Q-M encryption, the encryption space of the I frame is given by Equation (22), and the encryption space of the P frame is given by Equation (23).
It can be seen from Equations (22) and (23) that, for the proposed encryption method, the size of the encryption space mainly depends on the number of macroblocks and the number of MVDs. For the I-Q-M encryption, the size of the encryption space mainly depends on the number of macroblocks, the number of MVDs and the number of non-zero AC coefficients. As shown in Table 3, although the encryption space of the P frame is smaller than that of the I frame, the encryption space of both the proposed method and the I-Q-M encryption is large enough to resist brute force attacks.
In addition, P frame content is decoded based on the I frame content. As long as the I frame encryption is not cracked, the P frame content cannot be obtained even if the P frame encryption is cracked. Likewise, decrypting an encrypted P frame requires the complete content of the preceding P frames; otherwise, the original content cannot be recovered. To sum up, for the proposed encryption, let $E_i$ denote the encryption space of the i-th frame in a GOP. When $i = 1$, $E_i$ takes its minimum value $E_1$, which is equal to $E_I$. In other words, the minimum encryption space to crack a frame in a GOP is $E_I$, which is large enough to resist brute force attacks.

Visual Quality of Labeled Decrypted Video
Because no difference expansion algorithm in the ciphertext domain is available for comparison, this paper uses PSNR and SSIM to evaluate the visual quality of the labeled decrypted video under different QP settings as a longitudinal comparison. Table 4 shows the average PSNR and SSIM of the original video and the labeled video with QP values of 24, 28 and 32. The data in the table are obtained at full embedding. The PSNR and SSIM of the embedded video decrease compared with those of the original video, but the decrease is not large: the average PSNR decreases by at most 3.57 dB and the average SSIM by at most 0.0027, which meets the requirements of good visual quality. Figure 11 shows the PSNR and SSIM changes of the container video sequence after embedding. Within a GOP, the PSNR and SSIM of the labeled video decrease slowly and reach their lowest values in the last frame. In other words, the visual quality of the last frame in a GOP is the worst. Figure 12 shows the 20th frame (the GOP length in this paper is 20) of each labeled video sequence. Even in the 20th frame, it is difficult to see traces of information embedding. In the cloud environment, when authorized users directly decrypt videos for rough browsing, the embedding algorithm in this paper can provide good video quality.
Figure 11. PSNR and SSIM before and after embedding. (a) PSNR before and after embedding; (b) SSIM before and after embedding.

Embedding Capacity
Embedding capacity is a basic evaluation metric of data hiding. The full embedding capacity of each video sequence is shown in Table 5. Video sequences with complex textures, such as stefan, bus and flower, have relatively large embedding capacity, whereas video sequences with simple textures, such as akiyo and container, have relatively small embedding capacity. This is because only specific QDCT blocks are selected for embedding, namely those whose DC coefficient is not 0 and whose AC coefficients are not all 0. The more complex the texture of the video sequence, the more QDCT blocks can be used for embedding, and the larger the full embedding capacity. In video compression coding, the larger the QP value, the more QDCT coefficients are quantized to 0, resulting in fewer embeddable QDCT blocks and a smaller full embedding capacity. According to the change of SSIM with QP in Table 4, the embedding capacity of this algorithm is large when the QP value is small, yet the SSIM of the embedded video does not decrease significantly. This shows that the embedding algorithm in this paper is more suitable for video encoded with a low QP.
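The block selection rule and its dependence on QP can be sketched as follows. The selection test (non-zero DC and at least one non-zero AC coefficient) follows the text; everything else is an assumption for illustration: the coefficients are random, the quantizer is a crude uniform truncation rather than the H.264 quantizer, and `approx_qstep` only approximates the step size, which roughly doubles every 6 QP.

```python
import numpy as np

def approx_qstep(qp):
    """Approximate quantizer step size (assumption: doubles every 6 QP)."""
    return 0.625 * 2.0 ** (qp / 6.0)

def is_embeddable(block):
    """A 4 x 4 block qualifies if DC != 0 and the AC coefficients are not all 0."""
    block = np.asarray(block)
    ac = block.ravel()[1:]  # all coefficients except the DC term at [0, 0]
    return bool(block[0, 0] != 0 and np.any(ac != 0))

def count_embeddable(coeffs, qp):
    """Quantize hypothetical transform coefficients and count qualifying blocks."""
    quantized = np.fix(coeffs / approx_qstep(qp)).astype(np.int64)
    return sum(is_embeddable(b) for b in quantized)

# 500 hypothetical 4 x 4 blocks of unquantized coefficients.
rng = np.random.default_rng(2)
coeffs = rng.laplace(scale=12.0, size=(500, 4, 4))

counts = {qp: count_embeddable(coeffs, qp) for qp in (24, 28, 32)}
print(counts)
```

Because a coefficient that survives a coarse quantizer also survives a finer one, the number of qualifying blocks in this sketch can only stay the same or shrink as QP grows, which matches the trend described above.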

Robustness Analysis
Robustness refers to the ability to retain the hidden information when the carrier undergoes changes. If the video containing information is not attacked, the information embedded by the proposed embedding algorithm can be accurately extracted from it. However, if the video is subjected to attacks such as recompression, the embedded information cannot be extracted effectively. For example, after recompression, the coefficients of some QDCT blocks containing information may all be quantized to 0, so these blocks will be skipped during information extraction, resulting in extraction errors. In addition, even if the coefficients of the QDCT block are not all quantized to zero, the coefficients a and b will probably change, making it impossible to accurately extract the information. Therefore, the embedding algorithm in this paper is not robust. This means that the scheme is not suitable for scenarios with high robustness requirements, but it can be applied to content authentication, data indexing and other fields.

Reversibility Analysis
The reversibility of the embedding algorithm refers to the property that the carrier can be recovered without loss after information extraction. This paper uses a classical reversible algorithm, difference expansion, which uses the difference between coefficients to extract the information and restore the carrier. Figure 13 shows the original PSNR and the PSNR after information embedding and extraction for the news, container and bus video sequences. The two PSNR polylines corresponding to each video sequence coincide. This shows that the carrier is restored to its original values after information extraction and that the original video quality is recovered, confirming that the embedding algorithm in this paper is reversible.
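The reversibility argument can be checked on a small scale with the classical difference expansion on an integer pair. The sketch below implements the textbook form of the technique (integer average and expanded difference); it is not the exact embedding procedure of this paper, which operates on a selected pair of QDCT coefficients and may add further conditions. The function names are assumptions, and overflow handling is omitted.

```python
def de_embed(a, b, bit):
    """Embed one bit into the integer pair (a, b) by expanding their difference."""
    avg = (a + b) // 2          # integer average (floor)
    diff = a - b
    expanded = 2 * diff + bit   # the bit becomes the LSB of the new difference
    return avg + (expanded + 1) // 2, avg - expanded // 2

def de_extract(a_marked, b_marked):
    """Recover the embedded bit and the original pair from a marked pair."""
    expanded = a_marked - b_marked
    avg = (a_marked + b_marked) // 2
    bit = expanded % 2          # Python's % returns 0 or 1 for negative values too
    diff = expanded // 2        # floor division undoes the expansion
    return avg + (diff + 1) // 2, avg - diff // 2, bit

# Round trip on a hypothetical coefficient pair.
a, b = 5, -2
marked = de_embed(a, b, 1)
restored = de_extract(*marked)
print(marked, restored)
```

The round trip returns the original pair together with the embedded bit, which is the property that makes the two PSNR curves coincide after extraction.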

Bit Rate Variation
In order to further evaluate the performance of this scheme, the bit rate variations caused by encryption and information embedding are analyzed. The bit rate variation is calculated as follows:
$$BR\_var = \frac{BR\_e - BR\_ori}{BR\_ori} \times 100\%$$
where BR_e denotes the bit rate after encryption or embedding and BR_ori denotes the original bit rate. The larger BR_var is, the greater the bit rate variation caused by the operation, which affects the compression performance.
The encryption method in this paper consists of three parts (QDCT block permutation, IPM encryption and MVD sign encryption) and is implemented on the code stream. The QDCT block permutation changes only the location of the QDCT blocks, not the coefficients within them, so it does not cause bit changes. IPM encryption performs XOR encryption on three codewords in the code stream without changing the code length. The encryption of the MVD sign only flips its sign codeword and does not add bits. Therefore, the encryption method proposed in this paper does not change the bit rate. The embedding algorithm used in this paper changes the amplitude of some QDCT coefficients, so it inevitably leads to bit rate variation. The more information is embedded, the greater the bit rate change.
Table 6 shows the bit rate variation caused by encryption and embedding for each video sequence under different QP values. The experimental results are consistent with the above analysis. For all video sequences and QP values, the bit rate after encryption is always identical to the original bit rate. The bit rate change caused by embedding can be analyzed from two perspectives: video sequence and QP value. From the perspective of video sequence, the bit rate of video sequences with complex texture changes more. From the perspective of QP, the bit rate of video sequences encoded with low QP changes more. The reason is that video sequences with complex texture or low QP contain more embeddable QDCT blocks, so more coefficients are modified, resulting in greater changes in bit rate.
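The bit rate variation metric defined in this section is simple to compute. The following Python sketch applies it to made-up bit rates; the numbers and the function name are assumptions for illustration and are not taken from Table 6.

```python
def bit_rate_variation(br_e, br_ori):
    """BR_var in percent: relative change of the bit rate after an operation."""
    return (br_e - br_ori) / br_ori * 100.0

# Hypothetical bit rates in kbit/s.
br_original = 500.0
br_encrypted = 500.0   # encryption is expected to keep the bit rate unchanged
br_embedded = 510.0    # embedding modifies coefficients and may add bits

var_encryption = bit_rate_variation(br_encrypted, br_original)
var_embedding = bit_rate_variation(br_embedded, br_original)
print(f"encryption: {var_encryption:.2f}%, embedding: {var_embedding:.2f}%")
```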

Conclusions
In this paper, a novel separable scheme for encryption and reversible data hiding is proposed, which improves the security of the encryption method. In the encryption stage, the QDCT blocks are permuted by logistic chaotic scrambling and partial codewords of the IPM and the MVD are encrypted by XOR encryption, which does not lead to bit rate variation. In the reversible data hiding stage, difference expansion is applied for the first time and $AC_{10}$ and $AC_{12}$ in the 4 × 4 QDCT block of the P frame are selected as coefficient