A Robust and High Capacity Data Hiding Method for H.265/HEVC Compressed Videos with Block Roughness Measure and Error Correcting Techniques

Recently, the H.265/HEVC video coding has been standardised by the ITU-T VCEG and the ISO/IEC MPEG. The improvements in H.265/HEVC video coding structure (CTU, motion compensation, inter- and intra-prediction, etc.) open up new possibilities to realise better data hiding algorithms in terms of capacity and robustness. In this paper, we propose a new data hiding method for HEVC videos. The proposed method embeds data in 4 × 4 and some selected larger transform units. As theory of Human Visual System suggests that human vision is less sensitive to change in uneven areas, relatively coarser blocks among the 8 × 8 and 16 × 16 blocks are selected as embedding destinations based on the proposed Jensen-Shannon Divergence and Second Moment (JSD-SM) block coarseness measure. In addition, the SME(1,3,7) embedding technique is able to embed three bits of message by modifying only one coefficient and therefore exhibits superior distortion performance. Furthermore, to achieve better robustness against re-compression attacks, BCH and Turbo error correcting codes have been used. Comparative studies of BCH and Turbo codes show the effectiveness of Turbo codes. Experimental results show that the proposed method achieves greater payload capacity and robustness than many existing state-of-the-art techniques without compromising on the visual quality.


Introduction
Digital motion picture, colloquially known as video, has become one of the most popular media in the global entertainment industry. It also has widespread use in IP telephony, surveillance and audio-visual education. The growing dependence on the Internet and the ubiquitous availability of network coverage through ADSL broadband, Wi-Fi, 3G and 4G LTE mobile Internet services, together with cheap data packages, has enabled almost exponential growth in the usage of popular social media and content sharing services such as Facebook, YouTube, Instagram and Twitter. The availability of high speed mobile Internet and the convenience of handheld mobile devices have given rise to online content sharing and streaming services such as Netflix, YouTube and Hulu, which are rapidly replacing traditional cable and satellite TV services. Due to the ease and convenience with which video content can be downloaded and shared through social networking services over the Internet, the industry feels the need to control, manage and protect the copyright of its intellectual property. Specifically, the industry wishes: (1) to manage distribution rights of its video content to protect its business interests; (2) to monitor and keep track of when, where, how many times and by whom a copyrighted video has been downloaded, shared and streamed; and (3) to hyperlink related contents together to enhance the end-user experience. Data hiding, the process of embedding secret information imperceptibly into a cover medium, is one of the possible solutions that can fulfil the aforementioned objectives. Yang et al. [19] proposed a multilevel data hiding technique for HEVC videos that is based on PU partition modes in P-frames. This technique works in two passes. The first pass records the PU partition modes selected by the H.265 algorithm. The second pass is the modification pass: it forces each PU mode into one of the encoding groups to reflect the presence of the binary data bits.
Finally, the modified PU partition modes are used in the rest of the HEVC encoding process. This technique achieves good payload capacity at an acceptable level of bit-rate increase. However, the robustness of the embedded data was not evaluated by the authors. Apart from the above, several techniques [20][21][22] have been proposed in the literature that use the motion vector feature of HEVC as an information hiding venue. In [20], the authors proposed a reversible data hiding method that embeds data in pairs of MVs by reducing the difference expansion coefficient. The actual embedding is done by substituting the LSBs of the MVs. This technique yields good perceptual visual quality and an acceptable level of bit-rate increase, but robustness performance was not considered in the paper. The authors of [22] proposed a data hiding technique that embeds in motion vector features of both P-frames and B-frames in compressed video. To minimise perceptual distortion, a candidate subset of MVs is selected based on prediction error. This approach yielded good distortion performance; however, the authors did not consider robustness performance.
The research work in this paper is motivated by observations about the existing works on data hiding in AVC and HEVC coded videos. Firstly, many data hiding methods have been developed for H.264/AVC and earlier video compression standards; the literature on methods developed for H.264/AVC is very rich [3][4][5][6][7][8][9][10]. However, most of these existing methods cannot be applied to H.265/HEVC due to the many changes in coding structure and prediction algorithms, which are briefly discussed in Section 3.1. Secondly, most of the aforementioned methods that use QTCs as the data embedding venue embed data only in the 4 × 4 intra-predicted blocks and therefore suffer from low payload capacity. Thirdly, many proposed techniques that utilise MVs and block partitioning as data embedding features do not consider robustness as a performance objective. The main objectives of this research work are to increase payload capacity by embedding data in larger transform units (8 × 8 and 16 × 16, in addition to the 4 × 4 blocks) in CTBs, to improve visual quality by minimising the distortion caused by data embedding, and to increase robustness by using powerful error correcting codes. The main contributions of the paper are as follows. Firstly, the Jensen-Shannon Divergence and Second Moment based block coarseness measure (JSD-SM) is proposed. The JSD-SM coarseness measure selects the relatively coarser blocks among all 8 × 8 and 16 × 16 luma TUs. Secondly, increased robustness is achieved using Turbo coding. Multiple sets of Turbo coding parameters are proposed and evaluated, and the results are compared to BCH codes. Thirdly, the Simplified Matrix Encoding, i.e., SME(1,3,7), data embedding technique is able to embed three data bits in a block of seven quantised transform coefficients (QTCs) by modifying only one QTC. Hence, it reduces the change density (i.e., the number of modifications per embedded data bit) of the proposed method.

Overview of H.265/HEVC Video Coding Standard
The H.265/HEVC video coding takes a hybrid approach of intra- and inter-frame prediction, motion compensation and 2-D discrete transform. The algorithm takes raw frames as input and divides the sequence into groups of pictures (GOP). Each GOP contains a pre-defined number of frames. The first frame of a GOP is coded using intra-picture prediction, i.e., using redundant spatial data within the same frame. The rest of the frames in a GOP are coded using inter-picture prediction in either of two modes: (1) prediction mode, in which the ith frame is coded by taking advantage of temporal redundancy with the (i − 1)th frame, resulting in P-frames; and (2) bi-directional prediction mode, which takes into account both the (i − 1)th and (i + 1)th frames while coding the ith frame and results in B-frames. Therefore, each GOP contains one I-frame at the beginning, and the rest of the frames are either P-frames or B-frames. The inter-picture (P- and B-frame) encoding process involves finding motion information, i.e., motion vectors associated with each block of the current picture with respect to the reference frame, i.e., the previous I-frame. This process is known as motion compensation and is based on the idea that, in a video sequence, many blocks in the picture are expected to move with respect to the I-frame; therefore, to encode the next frames, it suffices to encode which blocks moved how much and in which direction, rather than encoding all the blocks themselves. The residual signal, i.e., the difference between the original blocks and their predicted versions, is mathematically transformed using a spatial transform. The transform coefficients are then passed through a series of processes involving scaling, quantisation and entropy coding. The final output is then stored or transmitted in a format specified by the HEVC standard. Figure 1 shows the overall encoding process. HEVC CTUs contain luma and chroma coding tree blocks (CTB) whose size can be as large as 64 × 64.
HEVC allows these blocks to be further divided and arranged in a quad-tree structure, as shown in Figure 3. As long as further splitting is possible, as signalled by the maximum depth of the residual quadtree indicated in the SPS, each quadrant is assigned a flag that indicates whether it is split into four smaller quadrants. The leaf node blocks resulting from the residual quadtree are the transform blocks that are further processed by transform coding. The encoder indicates the maximum and minimum luma TB sizes that it will use. Splitting is implicit when the CB size is larger than the maximum TB size. Not splitting is implicit when splitting would result in a luma TB size smaller than the indicated minimum. The chroma TB size is half the luma TB size in each dimension, except when the luma TB size is 4×4, in which case a single 4×4 chroma TB is used for the region covered by four 4×4 luma TBs. In the case of intrapicture-predicted CUs, the decoded samples of the nearest-neighboring TBs (within or outside the CB) are used as reference data for intrapicture prediction. In contrast to previous standards, the HEVC design allows a TB to span across multiple PBs for interpicture-predicted CUs to maximize the potential coding efficiency benefits of the quadtree-structured TB partitioning.

F. Slices and Tiles
Slices are a sequence of CTUs that are processed in the order of a raster scan. A picture may be split into one or several slices, as shown in Fig. 5(a), so that a picture is a collection of one or more slices. Slices are self-contained in the sense that, given the availability of the active sequence and picture parameter sets, their syntax elements can be parsed from the bitstream and the values of the samples in the area of the picture that the slice represents can be correctly decoded (except with regard to the effects of in-loop filtering near the edges of the slice). A slice may be coded using intrapicture prediction only, or using interpicture prediction with motion-compensated prediction; B slices use bidirectional prediction. The main purpose of slices is resynchronisation after data losses. Furthermore, slices are often restricted to a maximum number of bits, e.g., for packetised transmission; therefore, slices may contain a highly varying number of CTUs, in a manner dependent on the activity in the video scene. In addition to slices, HEVC also defines tiles, which are self-contained and independently decodable rectangular regions of the picture. The main purpose of tiles is to enable the use of parallel processing architectures for encoding and decoding. Multiple tiles may share header information by being contained in the same slice; alternatively, a single tile may contain multiple slices. A tile consists of a rectangularly arranged group of CTUs (typically with all of them containing about the same number of CTUs), as shown in Fig. 5(b). To assist with finer granularity of packetisation, dependent slices are additionally defined.

Motion Estimation
Motion estimation is the process of finding matching blocks of pixels in inter-frame coding. In intra-frame coding, although there is no motion involved as the blocks are matched within a single frame, the objective is the same: to take advantage of data redundancy. Intra prediction exploits spatial redundancy, whereas inter-frame prediction estimates the motion of pixel blocks to find temporal redundancy between consecutive frames. In intra-prediction, i.e., when utilising spatial redundancy for compression within an I-frame, H.264/AVC supports nine prediction modes. These modes are shown in Figure 4. For large 16 × 16 luma blocks, four prediction modes are supported. H.265/HEVC significantly improved intra-prediction by introducing much finer angles of supported directions. HEVC specifies 33 non-uniform angular prediction modes (Figure 5). The angles are finer at near-horizontal and near-vertical directions and coarser at diagonal directions. This arrangement provides better statistical matches of blocks of pixels. The possible prediction directions are shown in Fig. 6. Alternatively, planar prediction (assuming an amplitude surface with a horizontal and vertical slope derived from the boundaries) and DC prediction (a flat surface with a value matching the mean value of the boundary samples) can also be used. For chroma, the horizontal, vertical, planar, and DC prediction modes can be explicitly signaled, or the chroma prediction mode can be indicated to be the same as the luma prediction mode (and, as a special case to avoid redundant signaling, when one of the first four choices is indicated and is the same as the luma prediction mode, the Intra-Angular[34] mode is applied instead). Each CB can be coded by one of several coding types, depending on the slice type. Similar to H.264/MPEG-4 AVC, intrapicture predictive coding is supported in all slice types.
HEVC supports various intrapicture predictive coding methods referred to as Intra-Angular, Intra-Planar, and Intra-DC. The following subsections briefly explain these methods and several techniques they apply in common.
1) PB Partitioning: An intrapicture-predicted CB of size M×M may have one of two types of PB partitions, referred to as PART_2N×2N and PART_N×N, the first of which indicates that the CB is not split and the second that the CB is split into four equal-sized PBs. (Conceptually, in this notation, N = M/2.) However, it is possible to represent the same regions that would be specified by four PBs by using four smaller CBs when the size of the current CB is larger than the minimum CU size. Thus, the HEVC design only allows the partitioning type PART_N×N to be used when the current CB size is equal to the minimum CU size. This means that the PB size is always equal to the CB size when the CB is coded using an intrapicture prediction mode and the CB size is not equal to the minimum CU size. Although the intrapicture prediction mode is established at the PB level, the actual prediction process operates separately for each TB.
2) Intra-Angular Prediction: Spatial-domain intrapicture prediction has previously been successfully used in H.264/MPEG-4 AVC. The intrapicture prediction of HEVC similarly operates in the spatial domain, but is extended significantly, mainly due to the increased size of the TB and an increased number of selectable prediction directions. Compared to the eight prediction directions of H.264/MPEG-4 AVC, HEVC supports a total of 33 prediction directions, denoted as Intra-Angular[k], where k is a mode number from 2 to 34. The angles are intentionally designed to provide denser coverage for near-horizontal and near-vertical angles and coarser coverage for near-diagonal angles, reflecting the observed statistical prevalence of the angles and the effectiveness of the signal prediction processing.
When using an Intra-Angular mode, each TB is predicted directionally from spatially neighboring samples that are reconstructed (but not yet filtered by the in-loop filters) before being used for this prediction. For a TB of size N×N, a total of 4N+1 spatially neighboring samples may be used for the prediction, as shown in Fig. 6. When available from preceding decoding operations, samples from lower-left TBs can be used for prediction in HEVC, in addition to samples to the left, above, and above-right of the current TB. The prediction process of the Intra-Angular modes can involve extrapolating samples from a projected reference sample location according to the given direction. To remove the need for sample-by-sample switching between reference row and column buffers, for Intra-Angular[k] with k in the range of 2-17, the samples located in the above row are projected as additional samples located in the left column; and with k in the range of 18-34, the samples located in the left column are projected as samples located in the above row. To improve the intrapicture prediction accuracy, the projected reference sample location is computed with 1/32 sample accuracy. Bilinear interpolation is used to obtain the value of the projected reference sample using the two closest reference samples located at integer positions.
The prediction process of the Intra-Angular modes is consistent across all block sizes and prediction directions, whereas H.264/MPEG-4 AVC uses different methods for its supported block sizes of 4×4, 8×8, and 16×16. This design consistency is especially desirable since HEVC supports a greater variety of TB sizes and a significantly increased number of prediction directions compared to H.264/MPEG-4 AVC.
3) Intra-Planar and Intra-DC Prediction: In addition to Intra-Angular prediction, which targets regions with strong directional edges, HEVC supports two alternative prediction methods, Intra-Planar and Intra-DC, for which similar modes were specified in H.264/MPEG-4 AVC. While Intra-DC prediction uses an average value of the reference samples for the prediction, Intra-Planar prediction uses average values of two linear predictions derived from four corner reference samples.

Transform Coding and Quantisation
Two types of transforms are specified in the H.265/HEVC standard [23]: a core transform and an alternative transform [24,25]. The core transform is a discrete cosine transform (DCT) that is performed on 4 × 4, 8 × 8, 16 × 16 and 32 × 32 transform blocks (TBs). However, the H.265 standard specifies only one transform matrix, of size 32 × 32 [23]. The transform matrices for the smaller TBs (4 × 4, 8 × 8 and 16 × 16) are computed by sub-sampling the 32 × 32 transform matrix [24,26]. The alternative transform, which is used only for 4 × 4 luma residual TBs in the intra-picture prediction mode, is derived from the discrete sine transform (DST). The 4 × 4 DST is computationally only slightly more demanding than the DCT of the same dimension, but on average it yields approximately 1% savings in terms of bit rate. The sub-sampled 8 × 8 core transform matrix and the 4 × 4 alternative transform matrix are shown in Equations (1) and (2).
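The sub-sampling relationship described above can be illustrated with the exact (floating-point) DCT-II. Note that the HEVC standard itself specifies scaled integer approximations of these matrices, so the sketch below only demonstrates the principle: taking the first N entries of every (32/N)-th row of the 32-point matrix yields (up to a scale factor) the N-point DCT matrix.

```python
import math

def dct2_matrix(N):
    """Orthonormal DCT-II basis matrix of size N x N."""
    M = [[math.sqrt(2.0 / N) * math.cos(math.pi * (2 * j + 1) * i / (2 * N))
          for j in range(N)] for i in range(N)]
    for j in range(N):                  # row 0 gets the extra 1/sqrt(2) DC scaling
        M[0][j] /= math.sqrt(2.0)
    return M

def subsampled(C32, N):
    """First N entries of every (32/N)-th row of the 32-point matrix,
    rescaled so that the result is again orthonormal."""
    step = 32 // N
    scale = math.sqrt(32.0 / N)
    return [[scale * C32[i * step][j] for j in range(N)] for i in range(N)]

C32 = dct2_matrix(32)
C8 = dct2_matrix(8)
S8 = subsampled(C32, 8)
err = max(abs(C8[i][j] - S8[i][j]) for i in range(8) for j in range(8))
print(err < 1e-12)   # the sub-sampled matrix matches the 8-point DCT
```

The same check passes for N = 4 and N = 16, which is why a single 32 × 32 matrix suffices in the standard.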

Entropy Coding
The MPEG-1 and MPEG-2 standards employed Huffman coding as their entropy coding model. H.264/AVC specifies two entropy coding models: context-adaptive variable length coding (CAVLC) for its baseline profile and context-adaptive binary arithmetic coding (CABAC) for its main profile [27]. H.265/HEVC specifies only CABAC for all profiles [23]. CABAC is a lossless entropy coding model that employs different probability models for different contexts. This allows better modelling of the distribution, as local data are generally well-correlated. Although CABAC is more complex and computationally intensive than CAVLC and Huffman coding, it is 10% more efficient than CAVLC in terms of bitrate savings [28].

Overview of Error Correcting Techniques
Several researchers [14, [29][30][31][32] have shown that error correcting codes (linear block and convolutional codes) such as CRC, Reed-Muller codes, and Bose–Chaudhuri–Hocquenghem (BCH) codes can significantly reduce embedding distortion and increase robustness in noisy and unreliable transmission scenarios, and also against video re-quantisation attacks. In this paper, we use a carefully designed set of BCH codes and Turbo codes [33]. In the following sections, we briefly discuss the theories of BCH and Turbo codes.

BCH Syndrome Error Correcting Codes
Bose–Chaudhuri–Hocquenghem (BCH) codes are a powerful class of random error correcting codes capable of correcting multiple errors. For any integers m ≥ 3 and t ≤ (2^m − 1)/2, a BCH code is characterised by the following parameters: block length n = 2^m − 1, number of parity check bits n − k ≤ mt, and minimum distance d_min ≥ 2t + 1. This system of codes is capable of correcting t errors and is referred to as BCH(n, k, t). If α is a primitive element of the Galois field GF(2^m), m being the order of the field, n the length of the codeword and k the dimension of the code, the error correction (parity check) matrix H of BCH(n, k, t) has as its ith row (i = 1, 2, . . . , t) the elements 1, α^(2i−1), α^(2(2i−1)), . . . , α^((n−1)(2i−1)). Given the original data D = {d_0, d_1, d_2, . . . , d_{k−1}} with message polynomial d(x), the BCH codeword V = {v_0, v_1, v_2, . . . , v_{n−1}} for D is calculated systematically as v(x) = x^(n−k) d(x) + [x^(n−k) d(x) mod g(x)], where g(x) is the generator polynomial. An unreliable, noisy channel may introduce multiple errors in the data.
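As a concrete illustration of syndrome decoding, the smallest binary BCH code, BCH(7,4,1), coincides with the Hamming(7,4) code, whose syndrome directly names the error position. The sketch below corrects a single bit error via the parity check matrix; full BCH(n, k, t) decoding with t > 1 requires GF(2^m) arithmetic and Berlekamp's algorithm, which is omitted here.

```python
# Parity-check matrix of BCH(7,4,1) (the Hamming code): column i is the
# binary representation of i+1, so a nonzero syndrome is the error position.
H = [[((i + 1) >> b) & 1 for i in range(7)] for b in range(3)]

def syndrome(r):
    """Three syndrome bits of a received 7-bit word r."""
    return [sum(H[b][i] * r[i] for i in range(7)) % 2 for b in range(3)]

def correct(r):
    """Correct at most one bit error using the syndrome as error locator."""
    s = syndrome(r)
    pos = s[0] + 2 * s[1] + 4 * s[2]   # 1-based error position, 0 = no error
    r = list(r)
    if pos:
        r[pos - 1] ^= 1
    return r

codeword = [0, 0, 0, 0, 0, 0, 0]             # all-zero word is a codeword
received = list(codeword)
received[4] ^= 1                              # single-bit channel error
print(correct(received) == codeword)          # -> True
```

The same mechanism generalises to the BCH(n, k, t) parameters of Table 1, with the syndrome equations solved over GF(2^m) instead of bit positions read off directly.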
If R = {r_0, r_1, . . . , r_{n−1}} is the erroneous data received at the receiver end and E is the error pattern, then R = V + E. The syndrome S is given by the 2t-tuple S = (s_1, s_2, . . . , s_{2t}), where s_i = R(α^i). If φ_i(x) is the minimal polynomial of α^i and b_i(x) is the remainder of R(x) on division by φ_i(x), then, for binary polynomials, it holds that R(α^i) = b_i(α^i). Hence, s_i can be evaluated as s_i = b_i(α^i). If errors occurred at positions j_1, j_2, . . . , j_k, the syndromes satisfy s_i = α^(i·j_1) + α^(i·j_2) + . . . + α^(i·j_k), where α^(j_1), α^(j_2), . . . , α^(j_k) are unknown. To decode a BCH code, we must solve this system of equations (Equation (11)). In this paper, we use Berlekamp's iterative algorithm [34] as the BCH decoding algorithm.

Turbo Codes
To increase the coding efficiency of traditional codes, and to approach the Shannon limit [35], the code-word length of linear block codes (or the constraint length, in the case of convolutional codes) should be increased. However, increasing the code-word length causes the complexity of the decoder to increase exponentially, and the decoder takes proportionately longer. Turbo codes were proposed to address these issues. A Turbo code [35] simulates larger coding blocks by splitting and interleaving the data, such that decoding can be done in a number of manageable steps. An interleaver is a component that temporally permutes (with the help of a memory buffer) a sequence of symbols in a totally deterministic manner. An added benefit of interleaving is that burst errors in the data are converted to statistically independent short errors when the data are de-interleaved at the decoder side. This enables codes designed for statistically independent errors to be used as the constituent codes (for encoders 1 and 2). Figure 6 depicts the basic building blocks of a Turbo encoder. The interleaver permutes (in a periodic or pseudo-random manner) the input bits such that the two encoders operate on the same set of input bits, but in a different order. The constituent encoders may use the same or different convolutional or linear block codes. Figure 7 shows the general structure of a Turbo decoder.
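The burst-spreading effect of interleaving can be shown with a minimal sketch. It uses a simple block (row/column) interleaver rather than the pseudo-random interleaver proposed later in the paper, but the effect is the same: a run of consecutive channel errors lands on well-separated positions after de-interleaving.

```python
def interleave(bits, rows, cols):
    """Write row-wise into a rows x cols buffer, read it out column-wise."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    # Column-wise readout is inverted by interleaving with swapped dimensions.
    return interleave(bits, cols, rows)

data = [0] * 12                         # all-zero payload for illustration
tx = interleave(data, 3, 4)
tx[4] ^= 1; tx[5] ^= 1; tx[6] ^= 1      # burst of 3 consecutive channel errors
rx = deinterleave(tx, 3, 4)
err_pos = [i for i, b in enumerate(rx) if b]
print(err_pos)                          # -> [2, 5, 9]: the burst is spread out
```

After de-interleaving, the three errors are isolated single-bit errors, which the constituent codes are designed to handle.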

Proposed Method of Data Hiding
As discussed in Section 2, the proposed method focuses on improving three main aspects of data hiding methods: payload capacity, robustness against transmission errors or re-quantisation attacks, and distortion performance. Most of the existing works in the literature embed data only in the 4 × 4 blocks, as these are the most irregular blocks. The Human Visual System is not very sensitive to small changes in rough areas, so embedding in 4 × 4 blocks does not cause noticeable visual degradation; however, it severely limits payload capacity. To increase payload capacity, in this paper, we embed data in selected 8 × 8 and 16 × 16 TU blocks in addition to all 4 × 4 blocks. To keep visual distortion under control, we embed data only in the most irregular 8 × 8 and 16 × 16 blocks, selected using the proposed JSD-SM block coarseness measure. In addition, most existing methods in the literature [2][3][4][5][6][7][8][9][10][11][12][13] embed data by directly modifying the LSBs of the QTC values, so that one QTC is modified to embed one bit of data. The SME(1,3,7) technique presented in this paper is able to embed three bits of data by modifying one QTC in a block of seven QTCs; in a 4 × 4 block, it can therefore embed 6 data bits by modifying only 2 QTCs. Change density is defined as the number of modifications per embedded data bit. In most of the existing literature [2][3][4][5][6][7][8][9][10][11][12][13], the change density is 1, whereas the change density of SME(1,3,7) is 1/3, i.e., one-third of that of the existing methods. This reduced change density allows us to embed data in blocks larger than 4 × 4 without causing significant visual distortion.
However, we only select the relatively rough 8 × 8 and 16 × 16 blocks, based on the proposed JSD-SM coarseness measure. The video stream with embedded data would most likely be transmitted over a network or streamed over publicly shared channels. Most transmission channels (e.g., WiFi and mobile broadband networks) are prone to error. These errors may cause some coefficients of the compressed video to change. Moreover, an attacker may deliberately introduce perturbations in the QTCs or recompress (i.e., re-quantise) the video to foil correct recovery of the embedded secret data by the intended receiver. To attain a high degree of robustness against these attacks, in this paper, we use BCH and Turbo error correcting codes. The BCH code has been used in some existing literature [3,4,30,31]. However, its effectiveness is expected to increase when used in conjunction with SME(1,3,7), whose change density is one-third of that of most existing LSB based embedding techniques. In addition to BCH codes, we also use Turbo error correcting codes. BCH codes are able to correct random single-bit errors but are less effective against burst errors, i.e., when multiple consecutive bits are changed due to error. Turbo codes are a family of powerful error correcting codes that were developed to correct such burst errors in addition to random single-bit errors. In the following subsections, we present the proposed JSD-SM block coarseness measure, the SME(1,3,7) data embedding and extraction technique, and the design and parameters of the BCH and Turbo error correcting codes. In Section 5.4, we present the overall embedding and extraction process.

Block Selection Using JSD-SM Coarseness Measure
As discussed in Section 3.1.1, the H.265/HEVC compression algorithm partitions the frames into transform units (TUs) of sizes from 4 × 4 to 32 × 32, which are organised in CTB structures. HEVC partitions a picture frame into different TU sizes based on the smoothness or coarseness of a region. If a region of a picture is coarse, i.e., contains high variation of pixel values, it is subdivided into many 4 × 4 blocks to retain maximum detail. Larger blocks such as 8 × 8, 16 × 16 and 32 × 32 are allocated to relatively less coarse regions. In the proposed work, message bits are embedded in all 4 × 4 blocks, since these are the coarsest blocks. However, to achieve higher payload capacity, it is necessary to embed data not only in the 4 × 4 blocks but also in the larger 8 × 8 and 16 × 16 blocks. As the HEVC algorithm allocates larger blocks to relatively less coarse regions, the 8 × 8 and 16 × 16 blocks are relatively less coarse than the 4 × 4 blocks. Embedding data in these blocks may introduce visual artefacts, as the human visual system is sensitive to small changes in smooth (i.e., low frequency) regions. To minimise embedding distortion in these blocks, we embed data only in the coarsest blocks among all 8 × 8 and 16 × 16 blocks.
To select the coarser blocks, we propose a method that is based on the Jensen-Shannon divergence (JS-divergence) [36] measure and the second moment of the pixel values. We call the proposed method block selection using Jensen-Shannon Divergence and Second Moment (JSD-SM). The Jensen-Shannon divergence is a symmetric adaptation of the Kullback-Leibler divergence (KL-divergence). Given two probability distributions P and Q in the same probability space, the KL-divergence is defined as D_KL(P‖Q) = Σ_x P(x) log [P(x)/Q(x)]. The Jensen-Shannon divergence is then defined as D_JS(P‖Q) = (1/2) D_KL(P‖M) + (1/2) D_KL(Q‖M), where M = (1/2)(P + Q). Unlike the KL-divergence, the Jensen-Shannon divergence is finite and bounded, i.e., 0 ≤ D_JS ≤ 1 (for the base-2 logarithm), and symmetric, i.e., D_JS(P‖Q) = D_JS(Q‖P). These properties make it more suitable for our purpose.
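The boundedness and symmetry properties can be verified with a short sketch of the two divergences (using the base-2 logarithm, under which D_JS attains its maximum of 1 for distributions with disjoint supports):

```python
import math

def kl(p, q):
    """D_KL(P || Q) with base-2 logarithm; terms with p(x) = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js(p, q):
    """Jensen-Shannon divergence D_JS(P || Q)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.5, 0.0, 0.0]
q = [0.0, 0.0, 0.5, 0.5]
print(js(p, q))                            # disjoint supports -> 1.0
print(js(p, p))                            # identical distributions -> 0.0
print(abs(js(p, q) - js(q, p)) < 1e-12)    # symmetry -> True
```

Note that D_KL(P‖Q) itself would be infinite for these disjoint p and q, which is exactly why the bounded, symmetric D_JS is preferred for comparing block histograms.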
The first and second moments of a block are defined as follows. Given a block B of dimension m × n, its pixel values can be arranged as a one-dimensional vector v = (v_1, v_2, . . . , v_{mn}). The first moment (mean) is then m_1 = (1/mn) Σ v_i and the second moment is m_2 = (1/mn) Σ v_i². Given the above definitions of the Jensen-Shannon divergence and the second moment, we propose the JSD-SM block coarseness measure as described in Algorithm 1.
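The moment computation can be sketched as follows. (The exact way Algorithm 1 combines the JS-divergence and the second moment into a single coarseness score is specified in the paper's Algorithm 1 and is not reproduced here.)

```python
def moments(block):
    """First and second raw moments of an m x n block of pixel values."""
    v = [x for row in block for x in row]     # flatten the block to a vector
    n = len(v)
    m1 = sum(v) / n                           # first moment (mean)
    m2 = sum(x * x for x in v) / n            # second moment
    return m1, m2

print(moments([[1, 3], [5, 7]]))              # -> (4.0, 21.0)
```

A block whose second moment is large relative to the square of its mean (i.e., one with high variance) contains strong pixel-value variation, which is the intuition behind using the second moment as a roughness indicator.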

Data Embedding and Extraction
The process of representing message bits by altering the carrier file features is called data embedding, and the process of recovering embedded data from the cover medium is called data extraction. In the literature, many data embedding techniques have been proposed for steganography, watermarking and data hiding algorithms in general. The earliest and simplest data embedding technique is the Least Significant Bit (LSB) substitution [37] method and its derivatives. In LSB substitution based methods, one or more LSBs of an embedding feature (e.g., pixel value, transform coefficient, motion vector, etc.) are substituted with the secret data bits. Equation (15) mathematically represents the basic LSB substitution based data embedding operation: Y_i = X_i − (X_i mod 2) + m_i,
where m_i is the ith message bit, X_i is the value of the ith selected pixel before embedding and Y_i is the value of the ith pixel after embedding [38]. LSB based techniques are simple, fast and generally have very good payload capacity. However, these techniques suffer from high distortion and a lack of robustness against re-compression attacks, and they also suffer in the presence of transmission errors. Several derivatives of the LSB technique, namely LPAP [37], OPAP [39], SLSB [40] and PVD [41], improved the distortion performance; however, these methods are not suitable for data embedding in video features for the following reasons: (1) low embedding efficiency [42]; (2) high distortion; (3) lack of robustness in erroneous transmission scenarios; and (4) lack of robustness against re-quantisation attacks. The lack of robustness of simple LSB substitution based techniques motivates us to use error correcting codes.
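The basic LSB substitution of Equation (15) can be sketched in a few lines (a minimal illustration on pixel values; the same operation applies to any integer embedding feature):

```python
def lsb_embed(values, bits):
    """Equation (15): replace the LSB of each selected value with a message bit."""
    return [x - (x % 2) + b for x, b in zip(values, bits)]

def lsb_extract(values, count):
    """Recover the embedded bits by reading the LSBs back."""
    return [y % 2 for y in values[:count]]

cover = [142, 87, 200, 53]
msg = [1, 0, 1, 1]
stego = lsb_embed(cover, msg)
print(stego)                          # -> [143, 86, 201, 53]
print(lsb_extract(stego, 4) == msg)   # -> True
```

Each carrier value changes by at most 1, but every embedded bit may cost one modification, i.e., the change density is 1, which is the baseline that SME(1,3,7) improves upon.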
In this paper, for data embedding in the selected quantised transform coefficients (QTCs), we propose a Simplified Matrix Embedding (SME) technique. Matrix embedding techniques were discussed by Fridrich and Soukal [43]. The SME technique embeds k data bits in a block of n = 2^k − 1 QTCs by modifying at most one QTC value. Specifically, in our case, we use an SME(1, k = 3, n = 7) scheme. This scheme embeds three message bits in a block of seven QTCs. The embedding scheme works as follows.
1. Let a 3-bit message block be M = (m_1 m_2 m_3) and let the destination block of seven QTCs be QB = (Q_1, Q_2, Q_3, Q_4, Q_5, Q_6, Q_7). At most one of the Q_i is modified to encode the message block in the QTC block.
2. Define three parity values P_1, P_2 and P_3 from the LSBs of the QTCs as follows: P_1 = LSB(Q_1) ⊕ LSB(Q_3) ⊕ LSB(Q_5) ⊕ LSB(Q_7); P_2 = LSB(Q_2) ⊕ LSB(Q_3) ⊕ LSB(Q_6) ⊕ LSB(Q_7); P_3 = LSB(Q_4) ⊕ LSB(Q_5) ⊕ LSB(Q_6) ⊕ LSB(Q_7).
3. To encode the binary message bits (m_1 m_2 m_3), compute the position e = (m_1 ⊕ P_1) + 2 (m_2 ⊕ P_2) + 4 (m_3 ⊕ P_3). If e = 0, the message already matches the parities and no QTC is modified; otherwise, the LSB of Q_e is flipped (by adding or subtracting 1).
At the receiver end, the data are extracted from the cover medium by simply recomputing the parities on the received QTC block: m_1 = P_1, m_2 = P_2 and m_3 = P_3.
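The steps above can be sketched as follows (a minimal illustration assuming non-negative QTC values, so the LSB flip is a simple XOR; flipping the LSB of a signed coefficient would need a sign-aware ±1 adjustment):

```python
def parities(q):
    """P1, P2, P3 computed from the LSBs of the seven QTCs."""
    lsb = [x & 1 for x in q]
    p1 = lsb[0] ^ lsb[2] ^ lsb[4] ^ lsb[6]    # positions 1, 3, 5, 7
    p2 = lsb[1] ^ lsb[2] ^ lsb[5] ^ lsb[6]    # positions 2, 3, 6, 7
    p3 = lsb[3] ^ lsb[4] ^ lsb[5] ^ lsb[6]    # positions 4, 5, 6, 7
    return (p1, p2, p3)

def embed_sme(q, m):
    """Embed 3-bit message m = (m1, m2, m3) into 7 QTCs, changing at most one."""
    p = parities(q)
    e = (m[0] ^ p[0]) + 2 * (m[1] ^ p[1]) + 4 * (m[2] ^ p[2])
    q = list(q)
    if e:                         # flip the LSB of Q_e (1-based position)
        q[e - 1] ^= 1
    return q

def extract_sme(q):
    """The receiver recovers the message by recomputing the parities."""
    return parities(q)

block = [5, 2, 7, 1, 9, 4, 3]
for msg in [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]:
    stego = embed_sme(block, msg)
    assert extract_sme(stego) == msg
    assert sum(x != y for x, y in zip(block, stego)) <= 1
print("all 8 messages round-trip with at most one change")
```

Flipping position e toggles exactly the parities whose index sets contain e, which is why the recomputed parities equal the message bits after embedding.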

Design of the BCH and Turbo Error Correcting Codes
We use two different error correcting codes: BCH and Turbo codes. These error correcting codes introduce redundancies that help correct errors in a video stream transmitted through an error-prone channel, or errors that may be introduced by re-quantisation attacks. This increases the robustness, or survivability, of the embedded data. As discussed in Section 4.1, BCH coding schemes are described as BCH(n, k, t), where n is the code block length, k is the number of message bits and t is the number of error bits that the code is capable of correcting. We propose the three sets of parameters and corresponding generator polynomials of BCH(n, k, t) coding given in Table 1. In this paper, we use Berlekamp's iterative algorithm [44] to decode BCH coded data. For Turbo coding, we employ two identical parallel concatenated convolutional encoders (PCCE) as the constituent encoders, in conjunction with a pseudo-random interleaver. The proposed structure of the constituent encoder is shown in Figure 9. The constraint length of the Turbo code is determined by the pseudo-random interleaver. The decoder is the Soft-Output Viterbi Algorithm (SOVA) [45], which decodes received codes by estimating the logarithm of the likelihood ratio (LLR) as in Equation (22): L(u) = log [P(u = 1 | R) / P(u = 0 | R)],
where u is the transmitted bit and R is the received bit. For the constituent convolutional encoders, we propose a recursive systematic convolutional (RSC) code that is used for both CE_1 and CE_2. The structure of the constituent RSC is shown in Figure 9. The proposed RSC has 2^3 = 8 states and a constraint length of 4. Its transfer function is G(D) = [1, g_1(D)/g_0(D)],
where g_0(D) = 1 + D^2 + D^3 and g_1(D) = 1 + D + D^3. The initial values of the three shift registers are all zeros when starting the encoding process. The constraint length of a Turbo encoder depends on the interleaver used. In this paper, we use a pseudo-random interleaver based on a linear congruential generator (LCG), given by Equation (24): X_{n+1} = (a · X_n + c) mod m,
where X is the output sequence of pseudo-random numbers, m (> 0) is the modulus, a ∈ (0, m) is a constant multiplier, c ∈ [0, m) is the increment and X₀ ∈ [0, m) is the seed or start value. The sets of these values and the corresponding constraint lengths K_t used in this paper are summarised in Table 2. The block diagram of the proposed Turbo encoder and decoder is illustrated in Figure 10, in which the convolutional encoders CE₁ and CE₂ have the same structure shown in Figure 9.
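A sketch of the LCG-driven interleaver follows. The parameter values (m = 16, a = 5, c = 3) are illustrative placeholders, not the values of Table 2; they satisfy the Hull-Dobell full-period conditions, which any choice must in order to yield a true permutation:

```python
def lcg_interleaver(m, a, c, seed):
    """Pseudo-random interleaving permutation from Equation (24):
    X_{n+1} = (a * X_n + c) mod m. Under the Hull-Dobell conditions
    the LCG has full period m, so m successive outputs visit every
    value in 0..m-1 exactly once."""
    perm, x = [], seed
    for _ in range(m):
        x = (a * x + c) % m
        perm.append(x)
    return perm

def interleave(bits, perm):
    """Reorder a block so position i receives bits[perm[i]]."""
    return [bits[p] for p in perm]

perm = lcg_interleaver(m=16, a=5, c=3, seed=0)
assert sorted(perm) == list(range(16))    # full period -> a permutation
```

Feeding CE₂ the interleaved block while CE₁ sees the original order is what decorrelates the two parity streams and gives the Turbo code its burst-error resilience.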

Overall Architecture of the Proposed Method
The proposed method embeds message bits into the H.265/HEVC quantised transform coefficients of the intra-coded frames. The embedding process is described in Algorithm 2 and illustrated in Figure 11. Once data are embedded in a video sequence, it is transmitted to one or more receivers over a public channel. At the receiving end, the receiver extracts the embedded data using the key. The receiver starts decoding the H.265/HEVC compressed video. After entropy decoding, all 4 × 4, 8 × 8 and 16 × 16 TU blocks in the luminance channel are copied and stored separately. Then, the usual HEVC inverse quantisation and inverse transform processes continue. The inverse transform generates the spatial domain blocks corresponding to each DST (4 × 4) and DCT (8 × 8 and 16 × 16) domain block. The JSD-SM coarseness of these spatial domain blocks is evaluated. Next, the κ% most coarse blocks are selected. Data are then extracted from the quantised DST and DCT coefficients of the corresponding transform domain blocks (that were saved separately) using the SME(1,3,7) extraction process described in Section 5.2. After extraction, the data are decoded and output to the receiver. The whole process is described in Algorithm 3 and illustrated in Figure 12.

Experimental Results and Discussion
The proposed method was implemented in the H.265/HEVC reference coding software HM (version 16.20) released by the Fraunhofer Heinrich Hertz Institute. All experiments were run in a GNU/Linux based operating system on a computer with an Intel i7-3770 CPU and 16 GB RAM. Six test video sequences were used as the carrier videos. The names and specifications of these video sequences are given in Table 3. These videos were chosen because they are widely used by researchers in data hiding, video compression and allied fields, which facilitates objective comparison with other state-of-the-art data hiding methods proposed in the literature [14-16]. For objective comparison with the above-mentioned research works, we used the main profile of the H.265/HEVC standard for video encoding.

Visual Quality and Payload Capacity
For measuring and comparing the distortion characteristics of the proposed method, we used the Peak Signal to Noise Ratio (PSNR). PSNR is defined in terms of the mean squared error (MSE). Given an m × n original image or video frame I and its corresponding stego-frame K, MSE is defined as:

MSE = (1/(mn)) Σᵢ Σⱼ [I(i, j) − K(i, j)]²

Then, PSNR (in dB) is defined as:

PSNR = 10 · log₁₀(MAX² / MSE)

where MAX is the maximum pixel value of the image/video. In our case, as the bit depth of the YUV test videos is 8, MAX = 2⁸ − 1 = 255. For objective comparison of our results with the state of the art in the literature, we selected the works discussed in [4,14]. These papers were selected because they represent the recent state of the art and reported much better results than earlier works in the literature. Moreover, the researchers in these papers used the same set of test video sequences at the same resolution as in our work; for a comparison of capacity and distortion performances, the test video sequences must be the same. These observations motivated us to compare our results with the results in these papers. We kept the quantisation parameter (QP) fixed at 30. As in these studies, we encoded 300 frames of each test video sequence at a frame rate of 30 frames per second. We used the main profile of the H.265/HEVC encoder. Table 4 summarises the visual quality performances of all the proposed schemes in terms of PSNR values. In this table, PSNR₁ is the visual quality difference between the original YUV videos and the re-quantised video with QP = 30, but without embedded data. PSNR₂ values denote the visual quality difference between the original YUV videos and the re-quantised video (with QP = 30) with data embedded using the proposed method. Therefore, the quantity (PSNR₁ − PSNR₂) denotes the loss in visual quality due to embedding. The table shows that the maximum difference between PSNR₁ and PSNR₂ is less than 2 dB for all video sequences and proposed schemes.
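The two definitions above translate directly into code; a small sketch (NumPy is assumed here purely for illustration):

```python
import numpy as np

def psnr(original, stego, max_val=255.0):
    """PSNR in dB between an original frame I and its stego-frame K,
    via MSE = mean((I - K)^2) and PSNR = 10 * log10(MAX^2 / MSE)."""
    original = np.asarray(original, dtype=np.float64)
    stego = np.asarray(stego, dtype=np.float64)
    mse = np.mean((original - stego) ** 2)
    if mse == 0:
        return float("inf")       # identical frames: distortion-free
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For 8-bit YUV frames, max_val = 2⁸ − 1 = 255, matching the definition in the text.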
This indicates that the visual distortion due to data embedding is hardly noticeable to the naked human eye. This can be observed visually in Figure 13. Figure 13a,d,g,j shows the original frames from the original YUV video sequences Container, News, Mobile and Akiyo, from top to bottom. The frames in the middle column are the corresponding frames from the HEVC video sequences, compressed with QP = 30 but with no embedded data. The rightmost column shows the frames from the HEVC compressed video sequences with data embedded using the proposed method. These frames are visually almost indistinguishable from those in the middle and leftmost columns. For closer inspection, we present enlarged segments of I-frames of the video sequences in Figure 14. The left column shows the first HEVC compressed (with QP = 30) I-frames of the Container, News, Mobile and Akiyo datasets from top to bottom, with no data embedded. The middle column shows the corresponding segments with data embedded. The right-hand column shows the matrices of absolute differences between the pixels of the segments with and without embedded data. The difference matrices are almost entirely black, indicating pixel value differences very close to zero. This confirms the efficacy of the proposed method in preserving good visual quality. Table 4. The PSNR values attained with the different coding schemes of the proposed method. PSNR₁ denotes the measured quality difference between the original YUV video sequences and the HEVC encoded video sequences with QP = 30, but with no data embedded. PSNR₂ values represent the attained quality when data are embedded in the HEVC compressed video with QP = 30.

Table 5 presents the comparison of PSNR values and attained payload capacity between the proposed method and the aforementioned state-of-the-art methods in the literature. Table 5 shows that the proposed method attains much greater payload capacity than the methods of Liu et al. [4] and Liu et al. [14]. Liu et al. [4] proposed multiple embedding schemes; among those, the highest performing scheme was compared. In the proposed method, the SME(1,3,7) embedding scheme together with the BCH(7,4,1) error correcting code yields a PSNR of more than 38 dB and a capacity of over 400 bits on average per I-frame for all the test video sequences. Therefore, the average payload capacity is three times that of the method in [14] and more than 2.5 times the capacity offered by the method in [4]. When SME(1,3,7) embedding is used together with the Turbo(K_t = 24) error correcting scheme, the embedding capacity is more than twice the capacity of the method in [4] and almost two times that of the method in [14]. Despite the drastic increase in capacity, the proposed method does not compromise visual quality. In all cases, the proposed schemes attain slightly better PSNR values than the methods of both Liu et al. [4] and Liu et al. [14]. Early methods in the literature selected only the 4 × 4 TU blocks in order to keep visual quality degradation under control. This, however, yielded low payload capacity. In the proposed method, in addition to all 4 × 4 TU blocks, some 8 × 8 and 16 × 16 TU blocks that pass the proposed JSD-SM coarseness criteria are also used as data embedding venues. This drastically increases payload capacity. Moreover, the SME(1,3,7) data modulation technique used in this paper is able to embed 3 bits of data in a block of 7 QTCs by changing only 1 of the QTC values.
Thus, SME(1,3,7) results in a low change density, i.e., fewer modifications per embedded bit. Fewer changes to QTCs give more headroom to embed data in blocks larger than 4 × 4. Hence, the proposed method is able to embed data in 8 × 8 and 16 × 16 blocks without negatively affecting the visual quality of the stego-video. Table 5. Comparison of visual quality (PSNR) and payload capacity between various schemes of the proposed method and those in [4,14]. PSNR is in dB and capacity is given in bits. As Liu et al. [4] proposed many error correcting schemes, only the best performing scheme was compared.
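SME(1,3,7) as described is an instance of Hamming-code matrix embedding: the syndrome of 7 cover LSBs carries 3 message bits, and at most one flip is needed to set it to any desired value. The following is a sketch under that assumption, using the standard syndrome formulation (the paper's Section 5.2 defines the exact scheme over QTCs):

```python
def syndrome(lsbs):
    """3-bit syndrome of 7 cover LSBs; the parity-check columns are the
    numbers 1..7 in binary, so the syndrome names a position directly."""
    s = 0
    for i, b in enumerate(lsbs, start=1):
        if b:
            s ^= i                # XOR of 1-based positions holding a 1
    return s

def sme_embed(lsbs, msg3):
    """Embed 3 message bits (an int in 0..7) by flipping at most one LSB."""
    lsbs = list(lsbs)
    flip = syndrome(lsbs) ^ msg3  # position whose flip moves syndrome to msg3
    if flip:
        lsbs[flip - 1] ^= 1
    return lsbs

def sme_extract(lsbs):
    """Recover the 3 embedded bits: just the syndrome of the stego LSBs."""
    return syndrome(lsbs)

cover = [0, 1, 1, 0, 1, 0, 0]
stego = sme_embed(cover, 0b101)
assert sme_extract(stego) == 0b101
assert sum(a != b for a, b in zip(cover, stego)) <= 1
```

This is why the change density is at most 1/3 modification per embedded bit: three bits always cost zero or one coefficient change.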

Robustness Performance
For the robustness analysis of the proposed framework, we adopted the following methodology. First, we embedded data in the video sequences using the different error correcting coding schemes proposed in Section 5.3. Next, the video sequences with embedded data were re-quantised with different values of the quantisation parameter (QP). Then, the payload data were extracted from the re-quantised video sequences. The recovered payload data were then compared to the original data and the bit-by-bit similarity was evaluated. The higher the similarity, the higher the survivability, or robustness. For the purpose of objective comparison with the methods of Swati et al. [15] and Liu et al. [14], the QP values were taken to be the same as in those methods, i.e., from QP = 29 to QP = 35 in steps of 1.
The results are summarised in Table 6. As seen in the table, in the case of the Container video, the survival rate attained by the proposed BCH(7,4,1) error correcting code is better than that of both Swati et al. [15] and Liu et al. [14] at all QP values. As expected, the BCH(31,11,5) scheme performs much better than both methods from the literature at all QP values in the range 29-35. The proposed Turbo error correcting schemes give interesting results. As seen in the table, the coding scheme with block length K_t = 16 performs much better than both Swati et al. [15] and Liu et al. [14]. In addition, its performance is similar to that of BCH(31,11,5). This means that, in the proposed method, the Turbo code with block length 16 is almost as powerful as BCH(31,11,5). The fact that the Turbo code is able to correct a similar number of errors with a smaller block size can be attributed to its superior burst error correcting capability, achieved with the help of the parallel concatenated recursive convolutional codes and the pseudo-random interleaver (Section 4.2). In the case of the News and Mobile video sequences, a similarly superior performance of the Turbo code is observed. Table 6. Survival rate of the proposed method at different QP values when the BCH(7,4,1), BCH(31,16,3), BCH(31,11,5), Turbo(K_t = 16) and Turbo(K_t = 24) error correcting codes are used. Performances of these codes have been compared to the performance of the methods of Swati et al. [15] and Liu et al. [14]. The robustness performance of the proposed method was also evaluated in terms of similarity (SIM) and bit error rate (BER). For original embedded data D(i, j) and recovered data D̂(i, j) of size m × n, SIM and BER are defined as follows:

SIM = [Σᵢ Σⱼ D(i, j) · D̂(i, j)] / [√(Σᵢ Σⱼ D(i, j)²) · √(Σᵢ Σⱼ D̂(i, j)²)] × 100%

BER = (1/(mn)) Σᵢ Σⱼ [D(i, j) ⊕ D̂(i, j)] × 100%

The results are compared and summarised in Table 7. The table shows that the average similarity values achieved by the proposed framework are much higher than those of Swati et al. [15] and Liu et al. [14], and slightly better than those of Liu et al. [16], at all QP values. It should be noted that Liu et al. [14] used the same set of test video sequences as this work. However, both Swati et al. [15] and Liu et al. [16] used high-resolution video sequences such as ParkScene (1920 × 1080), BQMall (832 × 480), PeopleOnStreet (2560 × 1600), etc., which used to be available from the University of Hannover [46] but are no longer accessible, and therefore could not be evaluated in our work. The main difference between those videos and the videos used in our work is resolution: those videos are very high resolution, whereas our test sequences have a resolution of 352 × 288. Therefore, the comparison of our results with these works is not perfect, but since both BER and SIM are given in percentage (i.e., as dimensionless quantities), it gives a fair idea of where our results stand in comparison to Swati et al. [15] and Liu et al. [16].
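Assuming the common normalised-correlation form of SIM and the usual bit-error count for BER, the two measures can be sketched as follows (the degenerate all-zero input, where the denominator vanishes, is not handled):

```python
import numpy as np

def sim_ber(original, recovered):
    """Similarity (normalised cross-correlation) and bit error rate,
    both as percentages, between the embedded bit matrix D and the
    recovered matrix D-hat."""
    d = np.asarray(original, dtype=np.float64)
    dr = np.asarray(recovered, dtype=np.float64)
    sim = d.ravel().dot(dr.ravel()) / (np.linalg.norm(d) * np.linalg.norm(dr))
    ber = np.mean(d != dr)            # fraction of mismatched bits
    return 100.0 * sim, 100.0 * ber
```

A perfectly recovered payload gives SIM = 100% and BER = 0%, which is the ideal case a robust scheme approaches as QP stays near the embedding QP.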

Bit-Rate Increase
The bit-rate (and consequently the total size of the video sequence) tends to increase after data embedding. The percentage of this increase should be as low as possible. The bit-rate increase due to data embedding is measured as follows. First, the original YUV video sequences were compressed in HEVC with QP = 30, but no data were embedded. Next, the YUV sequences were compressed in HEVC with QP = 30 and data were embedded using the proposed method. The percentage increase in size (in bits) is the increase in bit-rate. For example, the size of the HEVC compressed Container sequence (when no data are embedded) is 9,299,832 bytes. When Turbo coded data (with K_t = 24) are embedded using the proposed method, the size increases to 9,671,825 bytes, an increase in bit-rate of approximately 4%. Table 8 summarises the results and compares them to those of Liu et al. [14]. The table shows that the proposed method causes some increase in bit-rate, but this increase is lower than that of Liu et al. [14]. In [14], Liu et al. used the same set of test video sequences as this work, making a direct comparison possible. However, other recent state-of-the-art techniques, such as those of Liu et al. [16] and Li et al. [47], used a set of higher resolution video sequences, which makes these techniques unsuitable for direct comparison with the proposed method. Nevertheless, the average bit-rate increases in these works and in the proposed work are presented in Table 9, together with the specifications of the video datasets used.
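The Container example can be checked with a one-line computation; note that the 371,993-byte growth over 9,299,832 bytes corresponds to roughly a 4% increase:

```python
def bitrate_increase_pct(size_without, size_with):
    """Percentage growth in compressed size caused by data embedding."""
    return 100.0 * (size_with - size_without) / size_without

# The Container figures quoted in the text:
pct = bitrate_increase_pct(9_299_832, 9_671_825)
assert abs(pct - 4.0) < 0.1
```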

JSD-SM Coarseness Measure Analysis
In the proposed data hiding method, the JSD-SM coarseness measure (Algorithm 1) plays a crucial role. As the theory of the Human Visual System (HVS) [48] suggests that human vision is less sensitive to changes in luminosity in coarse regions, the aim is to embed data in relatively coarse TU blocks of the luma channel of each I-frame in the video sequence. The HEVC compression algorithm allocates 4 × 4 TU blocks to the coarsest areas; therefore, in this paper, data are embedded in all 4 × 4 blocks. To achieve greater payload capacity, data should also be embedded in larger TU blocks. However, this may cause unacceptable visual distortion. Hence, only the relatively coarser blocks among the 8 × 8 and 16 × 16 luma TU blocks are selected as embedding venues. The top κ% most uneven 8 × 8 and 16 × 16 luma TU blocks are selected based on the proposed JSD-SM coarseness measure, as explained in Section 5.1. Data are embedded in these selected blocks using the SME(1,3,7) technique. Figure 15 illustrates four specimen 16 × 16 luma TU blocks from the second I-frame of the Akiyo video sequence. The top left block is a non-selected block, as it contains little variation in luminosity. The other three blocks were selected because they have high coarseness according to the JSD-SM measure. This shows that the proposed JSD-SM coarseness measure correctly captures the relatively coarser blocks. The quantity κ acts as a parameter of the embedding algorithm. If κ = p, the top p% most coarse 8 × 8 and 16 × 16 TU blocks are selected for data embedding. Thus, κ controls the balance between embedding capacity and embedding distortion. If κ = 0, none of the 8 × 8 and 16 × 16 blocks is selected and data are embedded only in the 4 × 4 blocks. Similarly, if κ = 100, all 8 × 8 and 16 × 16 blocks are selected as data embedding venues. Figure 16 shows the variation of the average embedding capacity in I-frames at different values of κ.
As the parameter κ specifies the percentage of the coarsest blocks to be selected, both payload capacity and embedding distortion are dependent on κ. The figure shows that at κ = 0, i.e., when data are embedded only in the 4 × 4 blocks and in no 8 × 8 or 16 × 16 blocks, the average capacity varies from 256 bits to 380 bits. The capacity of the Foreman video sequence is the lowest because, among all test sequences, it contains the fewest 4 × 4 blocks at QP = 30. On the other hand, the Mobile test sequence is coarser and is allocated more 4 × 4 blocks at QP = 30; hence, it has the highest capacity among all test videos at QP = 30. At κ = 20, in addition to all the 4 × 4 blocks, data are embedded in the 20% most coarse 8 × 8 and 16 × 16 blocks selected by the proposed JSD-SM block coarseness measure. Consequently, the payload capacity increases significantly for all the test sequences. With every increment of κ from 0 to 100, the embedding capacity increases as more and more 8 × 8 and 16 × 16 blocks are selected as embedding venues. At κ = 100, the highest capacity is achieved, as data are embedded in all 8 × 8 and 16 × 16 blocks. Figure 17 illustrates the distortion performance in terms of PSNR at different values of κ at QP = 30. At κ = 0, i.e., when data are embedded only in the 4 × 4 blocks, the highest PSNR values are achieved for all the test video sequences, ranging from 38 to 39 dB. At κ = 20, in addition to the 4 × 4 blocks, 20% of the 8 × 8 and 16 × 16 blocks are used as data embedding venues; therefore, as more QTCs are modified, the PSNR value drops. At κ = 100, all 4 × 4, 8 × 8 and 16 × 16 blocks in an I-frame are used as data embedding venues. Consequently, the PSNR values drop even further. However, even at κ = 100, the PSNR values stay above 35 dB. These results prove that the proposed method achieves good distortion characteristics.
Moreover, the proposed method provides great flexibility in attainable payload capacity and distortion performance: the user can choose a value of κ that dictates the balance between payload capacity and distortion. Figure 17. PSNR values attained with the six video sequences at different values of κ, when the BCH(7,4,1) error correcting scheme is used in conjunction with the SME(1,3,7) data embedding technique.
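The role of κ can be sketched as a simple top-percentile selection. In the sketch below, the coarseness scores are made-up placeholders standing in for the per-block JSD-SM values produced by Algorithm 1:

```python
def select_coarse_blocks(blocks, coarseness, kappa):
    """Return indices of the top kappa-percent coarsest blocks.
    `coarseness[i]` is the score of blocks[i] (larger = coarser);
    in the paper this score comes from the JSD-SM measure."""
    ranked = sorted(range(len(blocks)), key=lambda i: coarseness[i], reverse=True)
    n_sel = round(len(blocks) * kappa / 100.0)
    return sorted(ranked[:n_sel])     # selected embedding venues, in block order

# Illustrative scores for five 8x8/16x16 candidate blocks:
scores = [0.9, 0.1, 0.5, 0.7, 0.2]
assert select_coarse_blocks(range(5), scores, 0) == []            # 4x4 blocks only
assert select_coarse_blocks(range(5), scores, 100) == [0, 1, 2, 3, 4]
assert select_coarse_blocks(range(5), scores, 40) == [0, 3]       # two coarsest
```

Sweeping kappa from 0 to 100 reproduces exactly the capacity/PSNR trade-off curves of Figures 16 and 17: more selected blocks means more payload and more modified QTCs.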

Computation Time
The HM software is a C++ source-only package; researchers are expected to modify the software as needed and then compile it to generate the executable encoder, decoder and other modules. We compiled the software with the GNU C++ compiler using the optimiser switch "-O2". This is important to mention because an executable generated with a different optimiser switch may give different execution times. For the sake of consistency, CPU throttling was turned off and the clock speed was kept fixed at 2.4 GHz. Multi-threading was also turned off, as it may introduce variability in execution times. The computation time of the proposed embedding method can be divided into the following steps.
A. Average data pre-processing and data encoding time: the data to be embedded are first converted from their original format into a binary bit-stream and then encoded with one of the BCH or Turbo coding schemes described in Section 5.3. Different BCH and Turbo encoding schemes take slightly different times; the average time over all the proposed schemes is reported.
B. Block selection time, using the proposed JSD-SM technique as described in Section 5.1.
C. Data embedding time, using the SME(1,3,7) technique as described in Section 5.2.
D. Total time: the time taken to complete the whole embedding process. This includes A, B, C and the rest of the usual HEVC process, such as motion vector analysis, quantisation, entropy coding, etc.
Similarly, the computation time for the data extraction process can be divided into the following steps:
M. Block selection using the proposed JSD-SM technique.
N. Data extraction using the SME(1,3,7) technique.
O. Data decoding using one of the proposed Turbo/BCH coding schemes, plus post-processing.
P. Total time, which includes M, N, O and the rest of the HEVC decoding steps, e.g., inverse DST/DCT, inverse quantisation, etc.
Tables 10 and 11 summarise the computation time of the different steps, as well as the total time taken by the embedding and extraction processes for each test video sequence. The results shown are the times taken to embed data in 300 frames of each video sequence.

Conclusions
In this paper, we propose a method of robust and secure data hiding in H.265/HEVC compressed videos that achieves a higher payload capacity than the previous state-of-the-art methods in the literature, with similar or better visual quality. Compared to the previous state-of-the-art works, the proposed method achieves two to three times more payload capacity without compromising visual quality. The higher payload capacity has been achieved by embedding data in the larger transform blocks of H.265/HEVC, while embedding distortion is kept under control by reducing the embedding change density with the SME(1,3,7) data embedding technique. SME(1,3,7) is capable of embedding 3 bits of data in a block of seven QTCs by modifying only one QTC. Visual distortion is also kept under control by avoiding data embedding in smoother 8 × 8 and 16 × 16 blocks. The smooth blocks are filtered out and the coarse blocks are selected as embedding venues with the help of the proposed JSD-SM block coarseness measure, which is based on the Jensen-Shannon divergence and the second moment of the pixel value distribution in image blocks. Moreover, the proposed method achieves excellent robustness against re-quantisation attacks with the help of powerful BCH and Turbo error correcting codes. We have compared a set of different parameters and generator polynomials of BCH and Turbo codes. The results show that Turbo codes achieve superior robustness due to their burst error correcting properties. However, there is room for further improvement. A potential way forward is to increase payload capacity even more by embedding data in the inter-predicted frames (P and B) and in other HEVC compression features, such as motion vectors.

Conflicts of Interest:
The author declares no conflict of interest.