A Multi-Domain Embedding Framework for Robust Reversible Data Hiding Scheme in Encrypted Videos

Abstract: For easier cloud management, reversible data hiding is performed in the encrypted domain to embed label information. However, existing schemes are not robust and may lose label information during transmission. Enhancing robustness while maintaining reversibility in data hiding is a challenge. In this paper, a multi-domain embedding framework for encrypted videos is proposed to achieve both robustness and reversibility. The framework makes full use of the multi-domain characteristic of encrypted video. The element for robust embedding, marked as element-I, is encrypted through Logistic chaotic scrambling. To further improve robustness, the label information is encoded with the Bose–Chaudhuri–Hocquenghem (BCH) code. The label information is then robustly embedded into element-I by modulating its amplitude, and auxiliary information is generated for lossless recovery of element-I. The element for reversible embedding is marked as element-II, the sign of which is encrypted by a stream cipher. The auxiliary information is reversibly embedded into element-II through traditional histogram shifting (HS). To verify the feasibility of the framework, an anti-recompression scheme for reversible data hiding in encrypted videos (RDH-EV) based on the framework is proposed. The experimental results show that the proposed scheme outperforms the current representative ones in terms of robustness, while achieving reversibility. In the proposed scheme, video encryption and data hiding are commutative and the original video bitstream can be fully recovered. These results demonstrate the feasibility of the multi-domain embedding framework in encrypted videos.


Introduction
Reversible data hiding [1] in encrypted domain (RDH-ED) is a technique that uses encrypted data as the carrier to reversibly embed information, while still allowing correct data extraction, carrier decryption and recovery [2]. This technique is increasingly used in the field of information security to ensure privacy and copyright protection [3]. Military data are often stored and transmitted in ciphertext. To authenticate access, label information needs to be embedded into the encrypted data. In the medical field, multimodal medical images [4] are usually encrypted to prevent information on the patient's condition from leaking. For convenience of management, information about individuals and conditions is embedded into the encrypted images [5]. With the rapid development of networks, video information is widely disseminated on the Internet. At the same time, because the video codec process offers many elements for data hiding, reversible data hiding in encrypted videos (RDH-EV) has attracted the attention of many researchers [6][7][8].
Recently, the technology of reversible data hiding in encrypted images (RDH-EI) has seen great development, and can be roughly divided into three categories: vacating room after encryption [9][10][11], vacating room before encryption [12][13][14] and vacating room in

• The proposed framework for RDH-EV can achieve both robustness and reversibility.
• In terms of robustness, the proposed scheme outperforms the existing ones.
• Video encryption and data hiding in the proposed scheme are commutative.
The rest of the paper is organized as follows. In Section 2, the proposed framework is described and anti-recompression reversible data hiding in encrypted videos based on multi-domain embedding is proposed. The experimental results are shown in Section 3. Section 4 concludes the paper.

Proposed Framework and Scheme
In this section, the multi-domain embedding framework in encrypted videos and the anti-recompression RDH-EV scheme based on the framework are elaborated. The framework is shown in Figure 1. To achieve commutativity, data hiding and encryption usually avoid interference by modifying different attributes of the same element. In this framework, the element for robust embedding, marked as element-I, is encrypted through Logistic chaotic scrambling. To further improve robustness, the label information is encoded with BCH (7, 4, 1) [27,28]. Then, the label information is robustly embedded into element-I by modulating the amplitude of element-I, during which the auxiliary information for lossless recovery of element-I is generated. The element for reversible embedding is marked as element-II, the sign of which is encrypted by a stream cipher. The auxiliary information is reversibly embedded into element-II through traditional histogram shifting (HS). The scenario flow of the proposed scheme is shown in Figure 2. The scheme is divided into three parts: video encryption, data embedding in the encrypted video, and data extraction and video recovery. In the video encryption phase, the video owner selectively encrypts elements of the original video bitstream using Logistic chaotic scrambling and the ZUC algorithm. In the data hiding phase, the data hider embeds the label information and the auxiliary information into different encrypted elements. The data hider can be a cloud service or the video owner, depending on the application scenario. In the data extraction and video recovery phase, the authorized receiver extracts the information and fully recovers the original video bitstream.

Video Encryption
Selective encryption encrypts only some syntax elements in the compression process or the compressed bitstream, so as to ensure format compatibility [29,30]. In this paper, we focus on a selective encryption scheme that can be applied to a compressed H.264 bitstream. The quantized discrete cosine transform (QDCT) coefficients of the I-frame and those of the P-frame are encrypted to form two independent ciphertext fields.
As shown in Figure 2, some QDCT coefficients of the I-frame are used for encryption and robust data hiding. It is not appropriate to encrypt the sign of these QDCT coefficients, because the proposed data hiding may change the sign of a QDCT coefficient, which would affect normal decryption and make the scheme non-commutative. To keep encryption and data hiding commutative, Logistic chaotic scrambling is applied, which only permutes the locations of the coefficients.
Chaotic systems have complex dynamic behavior and are widely used in the field of confidential communication. The one-dimensional Logistic map is defined as

$$x_{n+1} = \lambda x_n (1 - x_n), \qquad (1)$$

where $\lambda \in (0, 4]$ is the Logistic parameter and $x_n \in (0, 1)$. When $3.57 < \lambda \le 4$, the map is in a chaotic state. The closer $\lambda$ is to 4, the more uniformly the values of $x$ are distributed over (0, 1). When $\lambda$ and $x_0$ are given, the random sequence $S = (s_1, \ldots, s_{l-1}, s_l)$ of length $l$ can be generated according to Equation (1). Then, the index sequence $K = (k_1, \ldots, k_{l-1}, k_l)$, which is the encryption key, can be obtained according to Equation (2).
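The key generation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Logistic iteration is the standard one, but the rule that turns S into the index sequence K (here, the rank order of the chaotic values) is an assumption, since Equation (2) is not reproduced in this text. The function names and the sample values λ = 3.99 and x_0 = 0.37 are likewise hypothetical.

```python
def logistic_sequence(lam, x0, length):
    """Iterate x_{n+1} = lam * x_n * (1 - x_n) and return `length` values."""
    seq = []
    x = x0
    for _ in range(length):
        x = lam * x * (1.0 - x)
        seq.append(x)
    return seq


def index_key(seq):
    """Assumed key rule: positions of the chaotic values in ascending order."""
    return sorted(range(len(seq)), key=lambda i: seq[i])


def scramble(items, key):
    """Move the item at original position key[p] to position p."""
    return [items[k] for k in key]


def unscramble(items, key):
    """Invert scramble() with the same key."""
    out = [None] * len(items)
    for pos, k in enumerate(key):
        out[k] = items[pos]
    return out


# Hypothetical parameters and coefficient groups, for illustration only.
s = logistic_sequence(3.99, 0.37, 8)
k = index_key(s)
groups = ["A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8"]
encrypted = scramble(groups, k)
recovered = unscramble(encrypted, k)
```

Because the permutation only moves whole groups, the amplitudes inside each group are untouched, which is the property the scheme relies on for commutativity.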
In this paper, partial QDCT coefficients of the I-frame are chosen as element-I. As shown in Figure 3, the partial AC coefficients $AC_9$ to $AC_{9+\mu}$ of a 4 × 4 QDCT block in the non-first rows and non-first columns of an I-frame are divided into a group $A_i = (a_1, \ldots, a_\mu, a_{\mu+1})$, where $i = 1, \ldots, n-1, n$; $n$ is the number of QDCT blocks available in the I-frame for embedding; $\mu \in \{1, 3, 5\}$ is a robust embedding capacity parameter; and the default is $\mu = 1$ in this paper. Suppose $M = (A_1, \ldots, A_{n-1}, A_n)$. Then $M$ will be encrypted with key $K$. The encryption method in the proposed scheme is as follows:
(1) Given $\lambda$ and $x_0$, the scrambling key $K_1$ is obtained according to the Logistic chaotic scrambling algorithm. Then, $M$ is encrypted with $K_1$, as shown in Figure 4. In this way, the encrypted domain for robust embedding is formed. Note that $\lambda$ and $x_0$ will be reversibly embedded into the P-frame as part of the auxiliary information.
(2) The signs of the remaining non-zero QDCT coefficients of the I-frame are encrypted by stream cipher according to K 2 . Specifically, the sign bits are XORed with the random sequence generated by the ZUC algorithm with the encryption key K 2 .
(3) The signs of all non-zero QDCT coefficients of the P-frame are encrypted by a stream cipher according to K 3 . Specifically, the sign bits are XORed with the random sequence generated by the ZUC algorithm with the encryption key K 3 . In this way, the encrypted domain for reversible embedding is formed. For security purposes, the intra-prediction mode (IPM) should also be encrypted. Details of this encryption can be found in [18][19][20][21][22][23][24], and are not repeated in this paper.
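Steps (2) and (3) can be illustrated with a short Python sketch of sign encryption. ZUC is not available in the Python standard library, so the keystream below is a stand-in built from SHA-256 in counter mode; it is not ZUC and is used only to make the example runnable. The function names and the key bytes are hypothetical.

```python
import hashlib


def keystream_bits(key: bytes, n: int):
    """Stand-in keystream (SHA-256 in counter mode); NOT the ZUC cipher."""
    bits = []
    counter = 0
    while len(bits) < n:
        block = hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        for byte in block:
            for shift in range(8):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:n]


def xor_signs(coeffs, key: bytes):
    """Flip the sign of each non-zero coefficient whose keystream bit is 1.

    Zeros carry no sign and consume no keystream bit, so the same call
    both encrypts and decrypts.
    """
    nonzero = sum(1 for c in coeffs if c != 0)
    ks = iter(keystream_bits(key, nonzero))
    return [(-c if next(ks) else c) if c != 0 else 0 for c in coeffs]


# Hypothetical P-frame coefficients and key.
plain = [3, 0, -1, 2, 0, -4, 1, 1]
cipher = xor_signs(plain, b"K3-demo")
decrypted = xor_signs(cipher, b"K3-demo")
```

Only the signs change, so the amplitude histogram that histogram shifting operates on is identical before and after encryption.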

Element selection
As described in Section 2.1, AC 9 -AC 9+µ of I-frame are taken as the robust embedding points, which are marked as I-QDCT. Because the Logistic chaotic encryption only changes the position of QDCT coefficients, the amplitude of QDCT can be modified to a large extent without considering the change of its sign, so as to achieve strong robustness. The nonzero coefficient of P-frame is selected as the reversible embedding point, which is marked as P-QDCT. There is no doubt that element selection can be optimized further, but this paper mainly verifies the feasibility of the framework and does not focus too much on the details of processing.

Robust embedding in I-frame
AC 9 -AC 9+µ can be divided into N non-overlapping coefficient pairs (b 1 , b 2 ). One bit can be embedded into one coefficient pair, so N bits can be embedded into a 4 × 4 QDCT block. The information is embedded in sequence from back to front into the 4 × 4 QDCT block. The specific embedding algorithm is as follows: where ξ is the parameter that controls robustness, with a default value of 1, w ∈ {0,1} is the information bit and d is the difference between the two coefficients in a coefficient pair. After embedding, the difference d is modified as follows: where z = d mod 2.

Reversible embedding in P-frame
In this stage, the auxiliary information is embedded into the nonzero coefficients of the P-frame. The auxiliary information L includes the Logistic parameter λ, the initial value x 0 , the robustness control parameter ξ, the robust embedding capacity parameter µ and the original difference d of each coefficient pair, which amounts to very little overhead after Run-Huffman encoding. Classic one-dimensional HS is used to embed the auxiliary information into P-QDCT, and its embedding method is as follows: where y is a single QDCT coefficient, y′ is the modified coefficient, and sign() is the sign function.
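For readers unfamiliar with histogram shifting, the following Python sketch shows one common one-dimensional HS variant on signed coefficients. It is an assumption that this matches the embedding equation of the scheme, which is not reproduced in this text: here coefficients with amplitude 1 carry one bit each, larger amplitudes are shifted outward by 1, and zeros are left alone. The back-to-front embedding order is ignored for brevity, and all function names are hypothetical.

```python
def sign(y):
    """Return -1, 0 or 1 according to the sign of y."""
    return (y > 0) - (y < 0)


def hs_embed(coeffs, bits):
    """Embed bits into coefficients with |y| == 1; shift |y| > 1 outward."""
    pending = list(bits)
    out = []
    for y in coeffs:
        if y == 0:
            out.append(0)
        elif abs(y) == 1:
            w = pending.pop(0) if pending else 0  # pad with 0 when done
            out.append(y + sign(y) * w)
        else:
            out.append(y + sign(y))
    if pending:
        raise ValueError("not enough amplitude-1 coefficients")
    return out


def hs_extract(marked, n_bits):
    """Return (first n_bits extracted bits, restored coefficients)."""
    bits, restored = [], []
    for y in marked:
        if y == 0:
            restored.append(0)
        elif abs(y) <= 2:
            bits.append(abs(y) - 1)
            restored.append(sign(y))
        else:
            restored.append(y - sign(y))
    return bits[:n_bits], restored


def flip_signs(coeffs, mask):
    """Toy sign encryption: negate coefficients where mask is 1."""
    return [-c if m else c for c, m in zip(coeffs, mask)]


coeffs = [0, 1, -3, -1, 2, 1, 0, -1]
payload = [1, 0, 1]
marked = hs_embed(coeffs, payload)
extracted, restored = hs_extract(marked, len(payload))
```

Since this variant only changes amplitudes and reads the sign, embedding before or after a sign flip gives the same result, which is the commutativity the scheme needs between sign encryption and reversible embedding.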

Multi-domain embedding process
The proposed algorithm takes a group of pictures (GOP) as the basic embedding unit. For the whole video, the data embedding starts in the last GOP and the embedding is carried forward one by one. The main steps of the multi-domain embedding algorithm can be summarized as follows:
Step 1: Partially decode the encrypted video bitstream to obtain each GOP.
Step 2: Decode the encrypted I-QDCT and P-QDCT from the last GOP that has not been embedded.
Step 3: Divide the encrypted I-QDCT into non-overlapping coefficient pairs (b 1 , b 2 ). The label information is encoded by the BCH code at first and then is embedded into the I-QDCT in a sequence from back to front according to Equation (3), in which the corresponding auxiliary information is generated.
Step 4: Embed the auxiliary information into the P-QDCT according to Equation (5) in a sequence from back to front.
Step 5: Go back to Step 2 until the label information is embedded completely.
Step 6: Encode the I-QDCT and P-QDCT to obtain the encrypted labeled video bitstream.
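Step 3 encodes the label with a BCH code before embedding. As a hedged illustration, the sketch below implements the single-error-correcting (7, 4) binary BCH code in systematic form, using the generator polynomial g(x) = x^3 + x + 1. The choice of this generator, the bit ordering, and the function names are assumptions; the text only states the code parameters (7, 4, 1).

```python
G = 0b1011  # g(x) = x^3 + x + 1, one valid generator for the (7, 4) code


def poly_mod(value, gen=G):
    """Remainder of a GF(2) polynomial (bits of an int) divided by gen."""
    gen_deg = gen.bit_length() - 1
    while value.bit_length() - 1 >= gen_deg:
        value ^= gen << (value.bit_length() - 1 - gen_deg)
    return value


def bch74_encode(msg4):
    """Encode a 4-bit message (0..15) into a 7-bit systematic codeword."""
    shifted = msg4 << 3
    return shifted | poly_mod(shifted)


# Each single-bit error at position i yields the syndrome x^i mod g(x).
SYNDROME_TO_POS = {poly_mod(1 << i): i for i in range(7)}


def bch74_decode(word7):
    """Correct at most one bit error and return the 4-bit message."""
    syndrome = poly_mod(word7)
    if syndrome:
        word7 ^= 1 << SYNDROME_TO_POS[syndrome]
    return word7 >> 3


codeword = bch74_encode(0b1101)
corrupted = codeword ^ (1 << 5)  # flip one bit to mimic an extraction error
decoded = bch74_decode(corrupted)
```

With this code, every group of 4 label bits costs 7 embedded bits and survives one wrong bit per group, which only helps when extraction errors are sparse.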

There are two steps to information extraction:
Step 1: Extract the auxiliary information L according to Equation (6), and restore P-QDCT according to Equation (7).
Step 2: Extract the label information according to Equation (8), and use the auxiliary information L I to restore the I-QDCT according to Equation (9).
Video recovery is divided into the following two cases:
Case 1. The video is decrypted before the information is extracted. After the video decryption, for P-QDCT, the correct auxiliary information extraction can be carried out directly and the corresponding QDCT coefficient can be restored. In this case, it is necessary to use λ and x 0 in the auxiliary information to generate the key K 1 , and then reverse the information extracted in the I-QDCT to obtain the correct label information.
Case 2. The information is extracted before the video is decrypted. In this case, it is only necessary to extract the information, restore the carrier in P-QDCT and I-QDCT successively, and then use the encryption key to decrypt the video to obtain the original video.
The above two situations show that the decryption and data extraction are commutative in the proposed scheme.

Experimental Results with Analysis
The effectiveness of the proposed scheme has been investigated through a series of simulation experiments. Section 3.1 introduces the video sequences used, the experimental runtime environment, the representative methods used for comparison and the objective evaluation metrics. The security of the encrypted video is analyzed in Section 3.2. The robustness is analyzed in depth in Section 3.3. The embedding capacity is analyzed in Section 3.4. Section 3.5 reports the visual quality using both subjective visual performance and objective evaluation statistics. The bit rate increase ratio (BIR) and reversibility are analyzed in Sections 3.6 and 3.7, respectively. Comparative analysis is given in Section 3.8. Further discussion and research are considered in Section 3.9.

Video sequences
For objectivity of the experimental results, eight well-known video sequences (i.e., foreman, carphone, salesman, news, container, coastguard and city) in QCIF format (176 × 144) are used for simulation. The video sequences can be accessed on the website of the YUV Video Sequence [31]. The eight selected video sequences are rich in variety, including rapid motion, slow motion, complex texture and simple texture. In this paper, the luminance component of the first 100 frames of each video is used for experiments.

Experimental runtime environment and parameter setting
All simulation experiments in this section are conducted on a PC equipped with an Intel i7-8550U 4 GHz CPU and 8 GB memory. Simulations were run in MATLAB R2019a. The MATLAB implementation of H.264 is used for simulation, and can be accessed on the website of MathWorks [32]. The GOP is set as "IPPP" with length 20. The default quantization parameter (QP) is set as 28.

Representative methods used for comparison
Five representative schemes have been selected for comparison [21][22][23][24][25]. These five schemes were chosen to meet two criteria. First, the schemes are reversible and separable. Second, the five schemes can be implemented in H.264, although [25] is for HEVC. All of the selected schemes were implemented following the parameter settings in the cited references. In [12], the parameter β = 0 or 1. In this experiment, β = 1 is chosen for comparison.
PSNR is commonly used to measure the difference between two images. The larger the PSNR, the smaller the difference between the two images. Different from the absolute error measured by PSNR, SSIM is a perception model, which is more in line with the intuitive feeling of the human eye. The larger the SSIM, the more similar the two images are; values range from −1 to 1. BER in this paper is the number of erroneously extracted bits divided by the number of all embedded bits, which can effectively measure the robustness of the proposed algorithm. BIR is introduced to measure the variation of the video bit rate, and can be calculated according to Equation (10).
$$BIR = \frac{BR\_em - BR\_ori}{BR\_ori} \times 100\%, \qquad (10)$$

where BR_em is the bitrate after the video is encrypted or embedded with information, and BR_ori is the original video bitrate.
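The three scalar metrics can be computed as in the following Python sketch. PSNR and BER follow their standard definitions; the BIR function assumes the usual relative-increase form, (BR_em − BR_ori) / BR_ori expressed as a percentage, which is consistent with the definitions of BR_em and BR_ori above. SSIM is omitted because it needs a windowed implementation. Function names and sample values are hypothetical.

```python
import math


def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def ber(embedded, extracted):
    """Fraction of extracted bits that differ from the embedded bits."""
    errors = sum(a != b for a, b in zip(embedded, extracted))
    return errors / len(embedded)


def bir(br_em, br_ori):
    """Bit rate increase ratio in percent (assumed relative-increase form)."""
    return (br_em - br_ori) / br_ori * 100.0


# Hypothetical values for illustration.
accuracy = 1.0 - ber([1, 0, 1, 1], [1, 1, 1, 0])
increase = bir(101.0, 100.0)
quality = psnr([10, 20, 30, 40], [10, 21, 30, 39])
```

The extraction accuracy reported in the robustness experiments is simply 1 − BER.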

Security Analysis of Encrypted Video
Selective encryption for video is designed to prevent unauthorized users from having complete and clear content. Selective encryption for video needs to meet cryptographic security and perceptual security [19]. In terms of cryptographic security, the ZUC algorithm and the Logistic chaos scrambling algorithm are used in the proposed scheme to ensure this aspect of security. Perceptual safety is mainly evaluated by subjective video quality and objective evaluation criteria (i.e., the PSNR and the SSIM).
The subjective results are shown in Figures 5 and 6. The video content cannot be distinguished in the figures, so the video content is protected. Table 1 provides the PSNR and the SSIM of each encrypted video sequence. The PSNR of the encrypted video does not exceed 15 dB, and the SSIM does not exceed 0.25. Although the PSNR and the SSIM of the same video fluctuate with QP after encryption, the encryption effect still meets the requirement of scrambling the content. In addition, the histograms are compared. As shown in Figure 7, the histograms of encrypted frames differ from those of the original frames, which indicates good encryption performance. From both subjective and objective aspects, the video encryption in the proposed scheme meets the requirement of perceptual security.

Robustness Analysis
Robustness refers to the anti-interference ability of the data hiding algorithm. The robustness of a scheme is evaluated by its extraction accuracy (1 − BER) under attacks. The higher the accuracy, the stronger the robustness. In this experiment, the representative algorithms [21][22][23][24][25] are selected for comparison. For consistency, all algorithms are tested in the I-frame. Figure 8 compares the extraction accuracy (1 − BER) of the proposed algorithm and the representative algorithms. As seen from the figure, the proposed scheme outperforms the compared schemes in terms of robustness. The proposed algorithm achieves an accuracy of more than 90% under low-QP re-compression attacks, while the accuracies of the algorithms in [21][22][23][24][25] fluctuate around 50% in the eight test sequences. Although the BCH code is used in [23], that algorithm cannot resist a re-compression attack, like the other four algorithms. This is because the embedding algorithms used in [21][22][23][24][25] are all traditional HS, with which the information must be extracted from exact amplitudes (i.e., 1 and −1). Once one of these amplitudes is quantized to 0 under the re-compression attack, it affects the extraction of not only the current bit but also the subsequent bits. As a result, most of the extracted information is wrong and the BCH code cannot correct it. In the proposed algorithm, the information is extracted by distinguishing whether the difference d′ is positive or negative. For example, although the coefficient pair (2, 1) containing bit '1' is quantized to (1, 0), the correct bit '1' can still be extracted according to Equation (8). Moreover, the embedding and extracting area is fixed, and the extraction error of any bit does not affect the extraction of other bits. Therefore, the proposed algorithm can better resist the re-compression attack.
Under the re-compression attack, some small QDCT coefficient pairs will be quantized to (0, 0) in the re-compression with large QP, so the information embedded in them may be lost. As can be seen from Table 2, when ξ = 1, it is difficult to resist the re-compression attack with QP ≥ 36. This is because the proposed algorithm mainly uses the medium and high frequency coefficients, most of which are small. When the re-compression QP is large and the parameter ξ is small, the labeled QDCT coefficient can be easily quantized to 0. When ξ ≥ 2, the proposed algorithm can be effective against large QP re-compression attacks. Figure 9 displays the relationship between ξ and the ability to resist re-compression attacks. Obviously, the larger the ξ, the stronger the ability to resist re-compression attacks. That is because the larger the ξ, the larger the modified coefficient. A large coefficient is difficult to quantize to 0 so the information can be extracted correctly under a considerable degree of re-compression attacks. Figure 10 displays the extraction accuracy of foreman, salesman and carphone video sequences with different original encoding QP values under a QP = 40 re-compression attack. It can be seen that the larger the original encoding QP value, the larger the re-compression QP value that can be resisted. When the original encoding QP ≥ 32, it can effectively resist the QP = 40 re-compression attack. As seen from Figures 8-10, the proposed scheme not only outperforms other schemes in terms of robustness but also can adapt to the robustness requirements in different scenarios by setting ξ and original encoding QP.
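The effect described in this section can be reproduced with a toy Python model. Everything here is an assumption made for illustration: the extraction rule (bit 1 when the pair difference is positive, bit 0 otherwise) is inferred from the (2, 1) to (1, 0) example rather than taken from the extraction equation, and re-compression is modeled crudely as scaling each coefficient and truncating toward zero.

```python
def requantize(pair, scale):
    """Toy stand-in for re-compression: shrink and truncate toward zero."""
    return tuple(int(c * scale) for c in pair)


def extract_bit(pair):
    """Assumed rule: bit is 1 when the difference b1 - b2 is positive."""
    b1, b2 = pair
    return 1 if b1 - b2 > 0 else 0


# The pair from the text: (2, 1) carrying bit 1 becomes (1, 0), bit survives.
survives = extract_bit(requantize((2, 1), 0.5))

# A small pair collapses to (0, 0) and the bit is lost.
lost = extract_bit(requantize((1, 0), 0.5))

# A larger amplitude gap, as a larger ξ would produce, survives a
# stronger attack.
strong = extract_bit(requantize((4, 0), 0.25))
```

In this model a bit is lost only when both coefficients of its own pair collapse to the same value, and the damage does not spread to neighboring pairs, which mirrors the argument made above for the fixed embedding area.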

Embedding Capacity
For video, the embedding capacity requirement is not high because the information can be spread over several frames to meet the requirement. For the proposed algorithm, the embedding capacity of a single frame can be calculated according to Equation (11).
$$EC = \frac{h \times w}{size} \times \frac{\mu + 1}{2}, \qquad (11)$$

where h is the height of the frame, w is the width of the frame, µ ∈ {1, 3, 5} is the embedding capacity parameter, and size is the size of the QDCT block, which is 4 × 4 in this paper. For the QCIF format video, when µ = 1, the embedding capacity is 1584 bits. When µ = 3, the embedding capacity is 3168 bits. When µ = 5, the embedding capacity is 4752 bits. Of course, in practice, this much information is not embedded into a single frame. When the embedding capacity is higher, the video quality decreases accordingly, and a compromise can be selected according to actual needs.
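As a quick check of the capacity figures, the Python sketch below assumes that each 4 × 4 block contributes (µ + 1) / 2 coefficient pairs, one bit per pair, and that every block of the frame is counted. This assumption reproduces the three values quoted for QCIF. The function name is hypothetical.

```python
def embedding_capacity(h, w, mu, block_size=16):
    """Bits per frame: one bit per coefficient pair, (mu + 1) / 2 pairs per block."""
    if mu not in (1, 3, 5):
        raise ValueError("mu must be 1, 3 or 5")
    blocks = (h * w) // block_size  # number of 4 x 4 blocks in the frame
    return blocks * (mu + 1) // 2


# QCIF frame: 176 x 144 luminance samples.
qcif = {mu: embedding_capacity(144, 176, mu) for mu in (1, 3, 5)}
```

A 256-bit label, as used in the later experiments, therefore fits in a single QCIF I-frame even with µ = 1.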

Visual Quality of the Decrypted Labeled Video
In some cases, authorized users will decrypt the encrypted video directly in order to browse the similar video quickly and the decrypted video still contains the label information. In this experiment, the visual quality of decrypted labeled videos will be evaluated. As shown in Figures 5 and 11, from the perspective of human eyes, it is difficult to detect the difference between the decrypted labeled video and the original video. Table 3 provides the PSNR and the SSIM of each video sequence with 256 bits of the label information. It can be seen that the video quality is reduced to varying degrees in the same video with different QP. For QP = 24, the average ∆PSNR and ∆SSIM are 4.02 dB and 0.0032, respectively. For QP = 28, the average ∆PSNR and ∆SSIM are 2.97 dB and 0.0049, respectively. For QP = 32, the average ∆PSNR and ∆SSIM are 3.33 dB and 0.0094, respectively. The ∆PSNR of the same decrypted labeled video with different QP fluctuates, but the ∆PSNR is not more than 5.48 dB at most. Judging from the SSIM which is more in line with human visual characteristics than the PSNR, as QP increases, the ∆SSIM increases in most video sequences. However, the ∆SSIM is small and not more than 0.0253 at most, illustrating that the proposed algorithm has good imperceptibility. Therefore, the proposed scheme can meet the needs of directly decrypting to obtain similar videos in the cloud environment.

Bit Rate Variation Analysis
During the encryption process, the encryption of the QDCT sign does not affect the bit rate due to the encoding characteristics. Scrambling the same area of different QDCT blocks does not seriously affect the overall distribution, and I-frames are few, so the bit rate does not increase significantly. When embedding information, the embedding algorithm increases or decreases the amplitude of a coefficient and may modify coefficients equal to 0, so the bit rate increases. Table 4 provides the BIR caused by encrypting and embedding 256 bits of label information. The bit rate changes differently in the same video sequence with different QP. The BIR caused by encryption fluctuates in the same encrypted video with different QP, but is no more than 1.14%. It depends on the distribution of QDCT coefficients after scrambling encryption. As QP increases, the BIR caused by embedding increases. This is because, as QP increases, the amplitudes of the QDCT coefficients become smaller and tend to 0, so the probability of modifying a zero coefficient increases. Inevitably, the BIR also increases with the embedding rate. Therefore, a proper embedding rate needs to be selected to control the BIR in some scenarios.

Reversibility Analysis
The reversibility of a scheme means that the data hiding it performs is reversible. Specifically, the labeled video can be losslessly recovered after data extraction. The proposed scheme embeds the label information into the I-QDCT, generates the auxiliary information required for reversible extraction, and then uses HS to reversibly embed the auxiliary information into the P-QDCT. During extraction, the auxiliary information is extracted from the P-frame and the P-QDCT is restored at the same time. Then, while the label information is extracted from the I-frame, the I-QDCT is restored using the auxiliary information. The I-frame restoration can only be carried out if the auxiliary information is extracted successfully; otherwise, only the label information can be extracted from the I-frame and the carrier cannot be restored. Figure 12 displays the original PSNR of the video sequences and the PSNR of the video sequences after information embedding and extraction. The two PSNR values for each video sequence are consistent. That is to say, the proposed scheme can fully recover the carrier and is reversible.

Qualitative Analysis
Table 5 gives a qualitative analysis of the related schemes. It can be seen that the existing schemes in [25][26][27][28] do not make full use of encrypted elements and do not take both reversibility and robustness into account. The existing schemes in [18][19][20][21][22][23][24][25] only utilize one encrypted element to embed information, while the proposed scheme utilizes two encrypted elements. The scheme in [18] is robust but not reversible. The schemes in [19,20] are neither reversible nor robust. The schemes in [21][22][23][24][25] are reversible but not robust. The proposed scheme achieves robustness, reversibility and separability while maintaining format compliance.

information for lossless recovery of the former element. The auxiliary information was reversibly embedded into the latter element. An anti-recompression RDH-EV scheme based on the framework was proposed to verify the feasibility of the framework. The robustness experiment showed that the proposed scheme outperforms the compared schemes. Under the low-QP re-compression attack, the accuracy of data extraction in the proposed scheme was above 90%, while the accuracy of data extraction in the compared schemes fluctuated around 50%. In the reversibility experiment, the PSNR of the recovered video was consistent with the PSNR of the original video, which shows that the proposed scheme is fully reversible. The results of the two experiments demonstrate that, compared with existing schemes, the scheme based on the framework can achieve both robustness and reversibility. In addition, encryption and data hiding are commutative, which suits different application scenarios.
Of course, the proposed framework can be further improved. For example, the Logistic chaotic scrambling algorithm will cause an increase in the bit rate. This problem can be solved by scrambling based on the whole QDCT block in the follow-up research. In addition, the proposed scheme based on the framework has its limitations, mentioned in Section 3.9. In future work, we will try to design a more robust embedding algorithm based on the framework to resist more attacks.