Genuine Reversible Data Hiding Technique for H.264 Bitstream Using Multi-Dimensional Histogram Shifting Technology on QDCT Coefficients

Abstract: Video has become the most important medium for communication among people. Therefore, reversible data hiding technologies for video have been developed so that information can be hidden in the video without damaging the original, for use in video copyright protection and distribution. This paper proposes a practical and genuine reversible data hiding method that uses a multi-dimensional histogram shifting scheme on QDCT coefficients in the H.264/AVC bitstream. The proposed method defines the vacant histogram bins as a set of n-dimensional vectors and finds the optimal vector space, which gives the best performance, in a 4 × 4 QDCT block. The secret message is then mapped to the optimal vector space, which is equivalent to embedding the information into the QDCT block. The simulation results show that the data hiding efficiency is the highest among the compared five existing methods. In addition, the measured image quality and maximum payload capacity are quite high.


Introduction
With fast and inexpensive networks and the growing distribution of digital content, there is also growing concern about copyright infringement. One way to prevent copyright infringement is to hide copyright information, such as the copyright holder, the camera source identification number, and the distributors, in the digital content, and then use the hidden information as evidence when the content is later used illegally. Such copyright information may change from time to time for various reasons, so it must be deleted and rewritten again and again. However, information concealment using existing watermarking techniques damages the original content to some degree. Thus, if the copyright information is modified frequently, the content quality deteriorates accordingly. The reversible data hiding (RDH) technique inserts and extracts data without compromising the original content. Therefore, if reversible data hiding is used to hide copyright or distribution information in the content, there is no need to worry about content damage caused by frequent information modifications. Moreover, since copyright information tends to be large, which causes more damage to the content when a traditional watermarking technique is used, the need for a reversible hiding technique is even greater. The application fields are not limited to copyright protection. Images are used to monitor and visualize natural phenomena in various scientific research activities. Securing the originality and reliability of these images is also an important issue [1] and can be achieved by reversibly hiding time stamp or hash information.
blocks. Moreover, the size of the stego bitstream has been significantly increased due to the properties of the entropy coding algorithm.
Even though most reversible video data hiding technologies focus on high payload, low distortion, and a small increase in compressed bitstream size, there are also application-specific approaches. Ma et al. [14] applied the RDH technique to privacy protection of surveillance videos. A privacy region such as a person's face is visually protected with full reversibility, so that privacy is protected under normal circumstances while full information remains available to law enforcement. As part of privacy protection, RDH methods for encrypted video have been studied recently. In [15], a specific codeword substitution method is used for three important syntax elements, including the DCT coefficients, of an encrypted H.264 video. Yao et al. [16] theoretically analyzed the picture distortion caused by data embedding and inter-frame distortion drift. Xu [17] proposed a way of hiding data in a partially encrypted HEVC video using the coefficient modulation method. In recent years, many techniques used in H.264 have also been applied to the HEVC standard [18] in [19,20].
Most previous works are pseudo-reversible and not practical. Since most videos are distributed in a compressed form, it is desirable to utilize the H.264 bitstream as a cover video, as in [12,14]. In this paper, we propose a genuine RDH technique that utilizes a histogram shifting method on mid-frequency QDCT coefficients. The remainder of this paper is organized as follows: Section 2 explains the proposed algorithm for reversible data hiding and extraction, Section 3 explains the experimental environment and results, and Section 4 discusses the simulation findings and their implications and presents conclusions.

The Proposed Method
The overall reversible data hiding and extraction process shown in Figure 1 is adapted from the genuine method proposed in [12]. First, the cover H.264 bitstream compressed by an H.264 encoder is decompressed until the QDCT coefficients are obtained. Second, the secret message is embedded in certain positions in the QDCT block, so that some QDCT coefficients are modified, causing image distortion. Finally, the modified blocks are re-compressed to make a stego H.264 bitstream. The data extraction process is the reverse. The stego bitstream is the input to the extraction process, and the fully recovered H.264 bitstream and the extracted secret message are the output.

Hideable QDCT Block Identification
Network Abstraction Layer (NAL) units containing an I slice or P slice include a macroblock header and residual data, which are converted into 4 × 4 or 8 × 8 QDCT blocks during the decompression process. Only 4 × 4 blocks are used for data hiding, while 8 × 8 blocks are skipped. One 4 × 4 QDCT block has 16 transform coefficients, as shown in Figure 2, and can be represented as a set of 16 elements denoted by $R = \{r_i \mid i = 1, 2, \ldots, 16\}$. In addition, the secret message to be hidden in $R$ can be denoted by a set $S = \{s_i \mid i \in \mathbb{Z}^+\}$. Since the information in $R$ corresponds to the residual image after the prediction and motion estimation processes, the image distortion caused by modifying the coefficients of $R$ may not be large. However, because distortion in the current frame can easily propagate to subsequent frames under H.264 compression, the coefficient modification should be minimized. Identifying QDCT blocks that are able to hide the secret data is the first step in the proposed RDH technique. Thus, let us define an expandable block as $R_e = \{r_i \in R \mid r_{11\sim16} = 0\}$ in order to select QDCT blocks with less complex textural patterns in the spatial domain. $R_e$ is defined in this way because the human visual system is more sensitive to images with high frequency components than images with low frequency components. The expandable blocks are histogram shifted, and some of them become hideable blocks $R_h$, in which the secret message is actually hidden. The definition of $R_h$ changes according to the histogram shifting thresholds and data hiding locations. Let us assume that $m$ consecutive bits of the secret message are hidden in one single $R_h$. We then need at least $2^m$ vacant histogram bins to accommodate $m$ bits.
According to the results of [11,12], the most appropriate positions for data hiding are $r_7$ through $r_{10}$, taking into account the three main objectives of an RDH technology: larger payload, less distortion, and smaller bitstream increase. In this medium frequency range, the results are generally not biased toward any one of these three contradictory goals. Thus, the hideable block is defined as $R_h = \{r_i \in R_e \mid r_{7\sim10} = 0\}$, because $r_{7\sim10} = 0$ usually constitutes the peak bin of a 4-dimensional histogram with axes $r_7$ through $r_{10}$.
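The two block definitions above can be sketched as simple predicates. This is a minimal illustration, not the authors' implementation; it assumes a block is stored as a list of 16 integers in zigzag order, so that $r_1$ is index 0 and $r_{16}$ is index 15. The function names are hypothetical.

```python
def is_expandable(block):
    """R_e test: coefficients r_11..r_16 are all zero (zigzag order assumed)."""
    if len(block) != 16:
        raise ValueError("expected the 16 coefficients of a 4x4 QDCT block")
    return all(c == 0 for c in block[10:16])


def is_hideable(block):
    """R_h test: an expandable block whose r_7..r_10 are also all zero."""
    return is_expandable(block) and all(c == 0 for c in block[6:10])


# Three illustrative blocks (made-up coefficient values).
smooth = [12, -3, 2, 0, 1, 0] + [0] * 10             # r_7..r_16 all zero
mid = [12, -3, 2, 0, 1, 0, 1, 0, 0, 0] + [0] * 6     # r_7 is nonzero
busy = [12, -3, 2, 0, 1, 0] + [0] * 9 + [1]          # r_16 is nonzero

print(is_expandable(smooth), is_hideable(smooth))
print(is_expandable(mid), is_hideable(mid))
print(is_expandable(busy), is_hideable(busy))
```

Under these definitions every hideable block is also expandable, which is why the second predicate reuses the first.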

Multi-Dimensional Histogram Shifting Method
The four-dimensional histogram is generated by counting the values of $r_7$ through $r_{10}$ of all $R_e$ blocks. The blocks corresponding to the peak bin are selected as $R_h$ so that adjacent bins can be vacated efficiently to hide the secret message. The histogram is shifted away from the zero bin along the four axes, by the positive threshold $T_p$ in the positive direction and by the negative threshold $T_n$ in the negative direction. As a result, the number of vacant bins, $N_b = (T_p - T_n + 1)^4$, grows with both the number of chosen $r_i$s, which is fixed to 4 in the proposed method, and the threshold levels. Even though all the vacant bins can theoretically be used to hide secret data, some bins are not profitable because they may distort the stego image more than others. Therefore, a more sophisticated approach is needed to hide the secret message effectively in the empty histogram bins.
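The shifting step can be illustrated with a short sketch. It assumes, as one plausible reading of the description, that each nonzero component of $(r_7, r_8, r_9, r_{10})$ is moved away from zero by $T_p$ or $T_n$ independently and that zero components stay at zero; the function names and the sample vectors are hypothetical.

```python
from collections import Counter
from itertools import product


def shift_vector(r, t_n, t_p):
    """Move every nonzero component away from zero (assumed per-axis rule)."""
    return tuple(c + t_p if c > 0 else c + t_n if c < 0 else 0 for c in r)


def vacant_bin_count(t_n, t_p, dims=4):
    """N_b = (T_p - T_n + 1)^dims, the size of the cube [T_n, T_p]^dims."""
    return (t_p - t_n + 1) ** dims


t_n, t_p = -1, 1
# Made-up (r_7, r_8, r_9, r_10) vectors taken from expandable blocks.
vectors = [(0, 0, 0, 0)] * 5 + [(1, 0, 0, 0), (0, -1, 2, 0), (3, 0, 0, -2)]

peak = Counter(vectors).most_common(1)[0][0]      # peak bin of the histogram
shifted = [shift_vector(v, t_n, t_p) for v in vectors]

# After shifting, only the peak bin is still occupied inside the cube.
cube = set(product(range(t_n, t_p + 1), repeat=4))
occupied = {v for v in shifted if v in cube}
print(peak, occupied, vacant_bin_count(t_n, t_p))
```

Note that the count given by the formula includes the peak bin itself, so the bins that are actually empty inside the cube number one fewer.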
There are several rules for hiding the secret data in the QDCT coefficients. First, $m$ consecutive bits of secret data are hidden in the first $R_h$ and the next $m$ consecutive bits in the next $R_h$. This rule helps to increase the hiding capacity according to the pigeonhole principle, because all $R_h$s with the same capacity are assigned the same amount of secret data. Second, the selected $r_i$s used to hide the data should be consecutive in zigzag scan order, as shown in Figure 2, and as few of them as possible should have nonzero values after data hiding. By doing so, the modified QDCT coefficients are likely to contain successive zero values, which leads to shorter entropy codes and, accordingly, a smaller increase in the stego bitstream. In this paper, we chose $r_7$ to $r_{10}$ for data hiding and represent them as a vector $\vec{r} = (r_7, r_8, r_9, r_{10})$. If we define all possible values of $\vec{r}$ as the vector space $V$, the vacant bins form a subspace of $V$ called the vacant bin space $V_v$. The space $V_v$ depends on $(T_n, T_p)$ and is defined in Equation (1). However, only a portion of $V_v$ is used for data hiding in order to reduce image distortion and the stego bitstream size at the cost of payload reduction. Thus, an $m$-sized data chunk should be embedded in a coded fashion to achieve relatively good video quality. For example, if the data '1111' are embedded in $R_h$ as they are, then $\vec{r} = (0, 0, 0, 0)$ is modified to $\vec{r} = (1, 1, 1, 1)$, causing severe distortion. However, if '1111' is embedded as $\vec{r} = (1, 0, 0, 0)$ in a coded way, the effect is relatively small. We can also define the subspace $V_u$ of $V$, which is used to hide data, as in Equation (2), where $k$ is an integer and the $\vec{r}_{bi}$ are the standard basis vectors that span $V$. Since the dimension of $V$ is four, there are four $\vec{r}_{bi}$s: $\vec{r}_{b1}$, $\vec{r}_{b2}$, $\vec{r}_{b3}$, and $\vec{r}_{b4}$.
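One way to write the two spaces that agrees with the surrounding definitions is sketched below. The exact form is an assumption: it is inferred from the bin count $N_b = (T_p - T_n + 1)^4$ and from the description of $V_u$ as integer multiples of the standard basis vectors, and may differ from the authors' typeset Equations (1) and (2).

```latex
% Assumed form of the vacant bin space: the cube [T_n, T_p]^4
V_v = \{\, \vec{r} \in V \mid T_n \le r_i \le T_p,\ i = 7, \ldots, 10 \,\}

% Assumed form of the used subspace: points of V_v on the coordinate axes
V_u = \{\, \vec{r} \in V_v \mid \vec{r} = k\,\vec{r}_{bi},\ k \in \mathbb{Z},\ T_n \le k \le T_p,\ i = 1, \ldots, 4 \,\}
```

Under this reading, every element of $V_u$ has at most one nonzero coefficient, which matches the stated goal of keeping as many of $r_7$ to $r_{10}$ at zero as possible.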
The size of $V_u$, or $|V_u|$, determines the embedding capacity of a single $R_h$ and can be calculated using Equation (3). The smaller the norm of $\vec{r}$, or $\|\vec{r}\|$, the less the image distortion. For example, if $(T_n, T_p) = (0, 1)$, $V_u$ consists of the zero vector and the four standard basis vectors. Since the secret message can be mapped to these used vacant bins, the payload capacity of $R_h$ is determined by Equation (4). For instance, if $(T_n, T_p) = (-1, 1)$, then $\mathrm{Cap}(R_h) = 3.17$ bits, meaning that we can hide 3.17 bits per $R_h$ on average. Thus, three bits are always hidden in an $R_h$, and sometimes four bits can be hidden, as shown in Table 1. According to Table 1, the $R_h$ can hide four bits when the data chunk of $S$ contains '1110' or '1111'. The one-to-one correspondence table depends on $(T_n, T_p)$, so the specific mapping rule should be designed accordingly. The proposed HS method can, in principle, cause QDCT coefficient overflow. However, considering that the QP value ranges from 10 to 40 in practice, it is reasonable to assume that there are no overflow issues.
Table 1. An example of a one-to-one correspondence table from the data chunk of $S$ to $V_u$.
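The capacity figures can be checked with a small sketch. It assumes that $V_u$ is the zero vector plus every nonzero integer multiple $k$ of a standard basis vector with $T_n \le k \le T_p$, and that the capacity is $\log_2 |V_u|$; both are inferred from the 3.17-bit example rather than quoted from the equations. The chunk-to-vector assignment in `prefix_table` is hypothetical and only mirrors the stated property that seven chunks carry three bits while '1110' and '1111' carry four.

```python
import math


def used_vacant_bins(t_n, t_p):
    """Assumed V_u: the zero vector plus k * e_i for nonzero k in [t_n, t_p]."""
    vectors = [(0, 0, 0, 0)]
    for axis in range(4):
        for k in range(t_n, t_p + 1):
            if k != 0:
                v = [0, 0, 0, 0]
                v[axis] = k
                vectors.append(tuple(v))
    return vectors


def capacity_bits(t_n, t_p):
    """Assumed Cap(R_h) = log2(|V_u|), in bits per hideable block."""
    return math.log2(len(used_vacant_bins(t_n, t_p)))


def prefix_table(vectors):
    """Hypothetical chunk -> vector table for the 9-element case only."""
    if len(vectors) != 9:
        raise ValueError("this illustrative table needs exactly 9 vectors")
    chunks = [format(i, "03b") for i in range(7)] + ["1110", "1111"]
    return dict(zip(chunks, vectors))


v_u = used_vacant_bins(-1, 1)
table = prefix_table(v_u)
print(len(v_u), round(capacity_bits(-1, 1), 2))
print(len(used_vacant_bins(0, 1)), round(capacity_bits(0, 1), 2))
print(table["000"], table["1111"])
```

Because no chunk is a prefix of another, a decoder can read the embedded bit stream back without needing separators between chunks.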

Reversible Data Hiding and Extraction Algorithms
For a better understanding, the overall data embedding and extraction algorithms are summarized in Algorithms 1 and 2.
Algorithm 1. Data embedding.
Step 1 Calculate the used vacant bin space $V_u$ using Equation (2).
Step 2 Prepare a one-to-one correspondence table from the data chunk of $S$ to $V_u$.
Step 3 Decode $B$ and find the first QDCT block $R$.
Step 4 Restore the coefficient values of $R$.
Step 5 Determine if $R$ is an expandable block $R_e$.
Step 5-1 If yes, shift the $\vec{r}$ of $R_e$ by $(T_n, T_p)$ to make vacant histogram bins.
Step 6 Determine if $R_e$ can be classified as $R_h$.
Step 6-1 If yes, map the data chunk of $S$ to the corresponding element of $V_u$.
Step 6-2 If yes, replace the $\vec{r}$ of $R_h$ with the mapped element.
Step 7 Restore the coefficient values of the next $R$.
Step 8 Go to Step 5 until all $R$ blocks are processed.
Algorithm 2. Data extraction.
Step 1 Calculate the used vacant bin space $V_u$ using Equation (2).
Step 2 Prepare a one-to-one correspondence table from $V_u$ to the data chunk of $S$.
Step 3 Decode the stego bitstream and find the first QDCT block $R$.
Step 4 Restore the coefficient values of $R$.
Step 5 Determine if $R$ is a hideable block $R_h$.
Step 5-1 If yes, extract the secret message $S$ using the $\vec{r}$ of $R_h$ and the mapping table prepared in Step 2.
Step 5-2 If yes, set the $\vec{r}$ of $R_h$ to the zero vector.
Step 6 Determine if $R$ is an expandable block $R_e$.
Step 6-1 If yes, shift the $\vec{r}$ of $R_e$ backward by $(T_n, T_p)$ to remove the vacant histogram bins.
Step 7 Restore the coefficient values of the next $R$ from the stego bitstream.
Step 8 Go to Step 5 until all $R$ blocks are processed.
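The two algorithms can be exercised end to end with a toy round trip. The sketch below works directly on lists of 16 coefficients in zigzag order and skips all bitstream decoding and re-encoding; it fixes $(T_n, T_p) = (-1, 1)$, assumes a per-component shift, and uses a hypothetical chunk-to-vector table. Passing the payload length to the extractor is also an assumption made here so that unused hideable blocks are not misread as data.

```python
T_N, T_P = -1, 1                     # thresholds, fixed for this sketch
CODES = ["000", "001", "010", "011", "100", "101", "110", "1110", "1111"]


def build_table():
    """Hypothetical table: bit chunk -> element of V_u (zero vector, +/- e_i)."""
    vecs = [(0, 0, 0, 0)]
    for axis in range(4):
        for k in (1, -1):
            v = [0, 0, 0, 0]
            v[axis] = k
            vecs.append(tuple(v))
    return dict(zip(CODES, vecs))


def shift(c):
    return c + T_P if c > 0 else c + T_N if c < 0 else 0


def unshift(c):
    return c - T_P if c > 0 else c - T_N if c < 0 else 0


def embed(blocks, bits):
    table, out, pos = build_table(), [], 0
    for block in blocks:
        block = list(block)
        if not any(block[10:16]):                  # expandable block R_e
            if any(block[6:10]):                   # not the peak bin: shift
                block[6:10] = [shift(c) for c in block[6:10]]
            elif pos < len(bits):                  # hideable block R_h: embed
                chunk = bits[pos:pos + 3].ljust(3, "0")
                if chunk == "111":                 # four-bit chunk
                    chunk += bits[pos + 3:pos + 4] or "0"
                pos += len(chunk)
                block[6:10] = table[chunk]
        out.append(block)
    return out, min(pos, len(bits))                # stego blocks, bits embedded


def extract(stego, n_bits):
    inverse = {v: c for c, v in build_table().items()}
    out, bits = [], ""
    for block in stego:
        block = list(block)
        if not any(block[10:16]):
            r = tuple(block[6:10])
            if r in inverse:                       # hideable: read, then clear
                if len(bits) < n_bits:
                    bits += inverse[r]
                block[6:10] = [0, 0, 0, 0]
            else:                                  # shifted block: shift back
                block[6:10] = [unshift(c) for c in r]
        out.append(block)
    return out, bits[:n_bits]


blocks = [
    [9, 2, 0, 0, 0, 0] + [0] * 10,                 # hideable
    [9, 2, 0, 0, 0, 0, 1, 0, -2, 0] + [0] * 6,     # expandable only
    [9, 2, 0, 0, 0, 0] + [0] * 9 + [3],            # not expandable
    [4, 0, 0, 0, 0, 0] + [0] * 10,                 # hideable
    [7, 1, 0, 0, 0, 0] + [0] * 10,                 # hideable, left unused
]
secret = "1011110"
stego, n = embed(blocks, secret)
restored, recovered = extract(stego, n)
print(restored == blocks, recovered == secret)
```

The round trip works because embedded vectors always contain a component inside $[T_n, T_p]$ or are zero, while shifted vectors never do, so the extractor can tell the two kinds of block apart from the coefficients alone.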

Simulation Results
We implemented the proposed method based on the H.264/AVC JM-18 reference software [21]. In total, eight 352 × 288 video sequences of 300 frames were used for the simulation: Bridge (closed), Coast Guard, Foreman, Hall Monitor, Mobile, Mother, News, and Akiyo. The JM-18 configuration parameters were set to the baseline profile, 30 frames/second, and an intra update period of 30 frames (i.e., a group of pictures of IPPP...), with both CAVLC entropy coding and RD optimization mode on. As an indicator of video quality distortion, we used the peak signal-to-noise ratio (PSNR) between the compressed video $B$ and the stego video. The payload per frame (PPF) index was used to measure the payload capacity of the algorithm. PPF is calculated by dividing the amount of hidden payload by the number of frames in the cover video. A large PPF value means good payload performance. Meanwhile, the file increase per payload (FPP) index is used to measure the file growth after data embedding. FPP is the difference in file size before and after data hiding divided by the amount of hidden payload. A smaller FPP means better data hiding efficiency.
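The three indices can be written down directly from their definitions. The sketch below is illustrative only: it assumes 8-bit samples passed as flat sequences for PSNR and that file sizes and payload are expressed in the same unit (bits) for FPP; the function names and sample numbers are hypothetical.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally long sample sequences (8-bit assumed)."""
    if len(reference) != len(test) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def ppf(payload_bits, n_frames):
    """Payload per frame: hidden payload divided by the number of frames."""
    return payload_bits / n_frames


def fpp(cover_size_bits, stego_size_bits, payload_bits):
    """File increase per payload: file growth divided by the hidden payload."""
    return (stego_size_bits - cover_size_bits) / payload_bits


# Made-up numbers, chosen only to exercise the formulas.
print(psnr([10, 20, 30, 40], [10, 21, 30, 39]))
print(ppf(1_281_900, 300))
print(fpp(1_000_000, 1_230_000, 100_000))
```

An FPP above 1 means the file grows by more bits than are hidden, so values closer to 1 indicate embedding that costs little beyond the payload itself.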

Searching for the Optimal Subspace V u
Many $V_u$ spaces exist because $V_u$ varies with the relevant parameters, as described in Equation (2). Thus, it is necessary to compare the performance of various $V_u$s to find the optimal one. First, the difference between symmetric and asymmetric histogram shifting is investigated, as shown in Table 2. Four asymmetric threshold cases, $(T_n, T_p) = (0, 1), (-1, 0), (-1, 2), (-2, 1)$, are tested to measure the three main performance indices of the proposed method. We append the letter S to $(T_n, T_p)$ to indicate that the histogram is shifted symmetrically even though the thresholds are asymmetric. In that case, the $\vec{r}$ of $R_e$ is not shifted by $|T_p|$ in the positive direction and by $|T_n|$ in the negative direction, but is shifted equally by $\max(|T_n|, |T_p|)$ in both directions. However, the $V_u$ of $(T_n, T_p)$S is the same as that of $(T_n, T_p)$ according to Equation (2). In Table 2, symmetric shifting always shows better PSNR than asymmetric shifting. However, the FPP results are reversed. The interim conclusion from Table 2 is that, for the same amount of payload, the symmetric shifting method gives better image quality but a larger file size increase. This is because natural image quality deteriorates significantly when asymmetric frequency components are introduced synthetically. Thus, we adopt symmetric shifting to achieve a higher PSNR. Next, we compare the performance of various $V_u$s, as shown in Table 3. We again investigated four cases of histogram shifting thresholds for all test sequences. Specifically, the four cases $(T_n, T_p) = (-1, 0)$S, $(-1, 1)$, $(-2, 1)$S, and $(-2, 2)$ were tested, and the average values were recorded. We intentionally made the payload the same for the same QP value in order to tell which $V_u$ provides better PSNR and FPP performance.
It is quite clear that the $(T_n, T_p) = (-1, 1)$ case reports the best PSNR and FPP at the same time, so we use the corresponding $V_u$ as the optimal space.

Performance Comparison with Existing Methods
The performance of the proposed method is measured in terms of PSNR, PPF, and FPP as before. The simulation results of the proposed method are compared with those of [10–13], as shown in Table 4. The performance of [10,11,13] is much lower than that of the proposed method simply because the pseudo RDH method is used, regardless of their own algorithms. For a fair comparison, the genuine RDH method suggested in [12] is applied to those methods. In Figure 3, the tenth stego video frames of the 'Hall monitor' sequence at QP = 30 are displayed to compare image quality. From the perspective of image distortion, the overall comparison is illustrated in Figure 4a. Chen et al.'s method [13] achieves the highest average PSNR of 36.41 dB, as we expected. The second highest is the proposed method at 29.76 dB. From the viewpoint of capacity, Kim et al.'s method [12] can hide the largest payload at 5974 bits per frame, and the second highest is the proposed method at 4273, as shown in Figure 4b. On the other hand, the PPF of [13] is 1924, which is the smallest among the five methods and less than half that of the proposed method. Finally, the file size increase due to data embedding is a very important performance factor, as it implies how compatible the algorithm is with the H.264 standard. From the graph in Figure 4c, the proposed method shows the lowest ratio at 2.30, which is much better than the 2.72 and 2.95 achieved by [12,13], respectively. Therefore, the proposed method is 18.3% more efficient than the other methods for most test sequences.
Figure 3 (panels c to f). (c) The stego video of [13]; (d) the stego video of [12]; (e) the stego video of [11]; (f) the stego video of [10].
Table 4. Comparison between our proposed method (P) and the methods of [10–13].
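The 18.3% figure is not derived explicitly. One reading that reproduces it, assuming it measures the FPP of the next-best method [12] against that of the proposed method, is the following short calculation.

```latex
% Assumed derivation: relative FPP gap between [12] (2.72) and the proposed method (2.30)
\frac{2.72 - 2.30}{2.30} = \frac{0.42}{2.30} \approx 0.183 = 18.3\%
```

Under the same assumption, the gap to the FPP of 2.95 reported for [13] would be larger, at roughly 28%.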

Discussion
There are three main contributions of this paper. First, we proposed a generalized multi-dimensional HS method for the H.264 bitstream. It should also be noted that the method of [11] can be considered a special case of the proposed method. As a result of this generalization, the data embedding capacity can be changed flexibly by changing $(T_n, T_p)$ and $V_u$ as needed. On top of this flexibility, we can estimate the PSNR, PPF, and FPP by calculating the norms of the $V_u$ elements and the size of $V_u$, which is useful when designing the subspace $V_u$. Second, we found an optimal $V_u$ through a number of simulations. However, since the choice of $V_u$ is closely related to performance, there is still room for improvement if a more sophisticated search algorithm is devised. Third, the proposed algorithm achieved the best data hiding efficiency while maintaining quite good image quality and a high payload capacity. The method by Chen et al. [13] shows good image quality but embeds the smallest payload among the compared methods. The method by Kim et al. [12] hides the largest amount of secret messages, but its image quality is moderate and its coding efficiency is on the lower side. Overall, the proposed method gave the best result in one of the three performance measures and the second-best result in the other two. Therefore, it can be said that the proposed method offers the best overall balance among the compared methods. In fact, considering how far RDH technology has evolved, it is not easy to develop an algorithm with the best performance in all fields. Therefore, future research should define an application field first and then develop a method suitable for it.