Symmetry 2019, 11(11), 1343; https://doi.org/10.3390/sym11111343

Article
Detection of Transcoding from H.264/AVC to HEVC Based on CU and PU Partition Types
1
School of Information Engineering, Beijing Institute of Graphic Communication, Beijing 102600, China
2
The Operation Center of Network Security Products China Telecom Corp. Ltd., Beijing 100010, China
3
School of Electronic and Information Engineering, Beijing JiaoTong University, Beijing 100044, China
4
Guangdong Provincial Key Laboratory of Information Security Technology, Guangzhou 510000, China
5
Jeejio Technology Co., Ltd, Beijing 100086, China
*
Author to whom correspondence should be addressed.
Received: 5 October 2019 / Accepted: 22 October 2019 / Published: 1 November 2019

Abstract

High Efficiency Video Coding (HEVC) is a popular video coding standard worldwide owing to its high coding efficiency. To make profits, forgers may transcode videos from previous standards such as H.264/AVC to HEVC. To deal with this issue, an efficient method is proposed to expose such transcoded HEVC videos based on coding unit (CU) and prediction unit (PU) partition types. CUs and PUs are two syntactic units unique to HEVC, and their partitioning can reflect a video’s compression history. In this paper, the CU and PU partition types of I pictures and P pictures are first extracted. Then, their mean frequencies are calculated and concatenated as distinguishing features, which are sent to a support vector machine (SVM) for classification. Experimental results show that the proposed method can identify transcoded HEVC videos with high accuracy and has strong robustness against frame-deletion and shifted Group of Pictures (GOP) structure attacks.
Keywords:
video forensics; HEVC; CU; PU; transcoding

1. Introduction

Owing to the rapid development of the mobile web and video capture devices, people can shoot, transmit, and watch videos at any time and in any place. Every minute, 500 h of video content is uploaded to the video-sharing website YouTube [1]. However, with increasingly user-friendly video editing software, forgers can change the original video contents at will, for example by inserting or deleting particular pictures or objects or by splicing different video sequences, which may lead to judicial misjudgment or fake news. Therefore, it is essential to verify the authenticity of a video. A falsified video must have undergone a re-encoding process, because forgers have to decompress a video before tampering with it and recompress it afterwards [2]. Re-encoding detection can therefore serve as the first step in verifying the authenticity of a given video, and it has become one of the most important video forensics techniques [3].
In the literature, numerous re-encoding detection methods have been proposed for videos recompressed with the same codec as the original one. Discrete cosine transform (DCT) coefficients [4] and block artifacts [5] were utilized to detect re-encoded MPEG videos. The method in Reference [6] was proposed for MPEG and H.264/AVC (AVC for short) re-encoding. After the release of the latest video coding standard, High Efficiency Video Coding (HEVC), a number of researchers turned to HEVC re-encoding detection [7,8,9,10,11,12]. However, few algorithms have been reported for detecting transcoding, i.e., re-encoding with a different codec, even though transcoding is a common form of re-encoding. Moreover, the transcoding process covers up the video’s original compression information, so users can only obtain the compression information of the re-encoding.
In Reference [13], the proposed algorithm can detect AVC videos transcoded from MPEG videos and estimate the original Group of Pictures (GOP) size. In recent years, HEVC has gained growing support from the industry and has begun to replace AVC. In such circumstances, a forger may re-encode low-quality AVC videos to disguise them as HEVC videos, and then upload them to a video-sharing website to gain popularity or even profits. Therefore, it is of great importance to reliably detect HEVC videos transcoded from AVC videos. For short, we call such videos “AVC/HEVC videos” in this paper.
For detecting AVC/HEVC videos, Costanzo et al. [14] analyzed the frequencies of motion prediction modes over all B pictures and identified AVC/HEVC videos from the critical point at which the distance between the given video and its recompressed version begins to increase. However, the method is only suitable for videos encoded in constant Quantization Parameter (QP) mode, and it needs to recompress the videos a number of times, which leads to high computational complexity. Bian et al. [15] combined the frequencies of PU partition types in I pictures and P pictures to detect AVC/HEVC videos, and the results confirmed that PU partition types could be effective in identifying AVC/HEVC videos. However, the method did not consider the characteristics of CU partition types. Like Reference [15], Reference [12], which was proposed for detecting videos recompressed with the same codec as the original one, explored only PU partition types and did not consider CU partition types. Furthermore, PU partition types were only extracted from P pictures in Reference [12]. However, the CU is the parent partition of the PU in the HEVC coding standard. Whereas the PU better captures the subtle differences of Coding Tree Unit (CTU) contents, the CU usually reflects the complexity of CTU contents and would also be influenced by the transcoding process. Motivated by this, we comprehensively analyze the partition differences between singly compressed HEVC videos and AVC/HEVC videos in terms of both CU types and PU types in this paper.
The proposed method consists of two stages. In the first stage, the CU and PU partition types of I pictures and the first P pictures in each GOP are extracted, and the corresponding mean frequencies are calculated and concatenated. The support vector machine (SVM) classification is carried out in the second stage. The experimental results show that compared with the state-of-the-art work, the proposed method has higher classification accuracy in identifying AVC/HEVC videos, and stronger robustness against frame-deletion and shifted GOP structure attacks.
The rest of the paper is organized as follows. Section 2 analyzes the CU and PU partition types in I and P pictures. Section 3 gives the theoretical analysis and examples of the difference between AVC/HEVC videos and singly compressed HEVC videos. Classification features are described in Section 4. Experimental results are discussed in Section 5, and Section 6 summarizes the paper.

2. CU and PU Partition Types in I and P Pictures of HEVC

HEVC was jointly released by ITU-T [16] and ISO/IEC [17] in 2013 and has been continually improved [18] and supported by hardware producers. It employs the same hybrid coding framework as AVC, including intra/inter prediction and 2D transform coding. However, HEVC has brought in various new techniques to increase compression efficiency. One major technique is the introduction of CU, PU, and transform unit (TU). CU is the basic coding unit with flexible size, and PU and TU are the basic units for intra/inter prediction and transform/quantization, respectively.
The Coding Tree Unit (CTU) is defined as the basic processing unit in HEVC. The first step of the HEVC encoding process is to split each picture to be coded into non-overlapping CTUs whose size is determined by the user. Then, each CTU is divided into multiple CUs in a quadtree structure. Figure 1 displays an example of a quadtree partition of one CTU from a coded picture. The red square denotes one CTU with a size of 64×64, and the digits in the CTU represent the coding order of the CUs. We can see that the red CTU in Figure 1a is divided into 16 CUs with the largest size of 32×32 and the smallest size of 8×8. The CUs can be grouped into four types according to their sizes. I pictures and P pictures adopt the same CTU partition rule; thus, they have the same CU types. For clarity, we list the four CU types and their corresponding indexes in Table 1.
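The quadtree splitting described above can be illustrated with a small sketch. It is not the encoder's actual decision rule: a real HEVC encoder chooses splits by rate-distortion cost, whereas the variance threshold used here is only a toy stand-in, and the names `split_ctu`, `cu_histogram`, and `threshold` are hypothetical.

```python
import numpy as np

CU_SIZES = (64, 32, 16, 8)  # the four CU types for a 64x64 CTU with an 8x8 minimum CU


def split_ctu(block, threshold, min_size=8):
    """Recursively split a square block and return the sizes of the leaf CUs.

    The variance test is a toy stand-in for the encoder's rate-distortion decision.
    """
    size = block.shape[0]
    if size <= min_size or block.var() <= threshold:
        return [size]  # keep this block as a single CU
    half = size // 2
    leaves = []
    # Visit the four quadrants in z-order: top-left, top-right, bottom-left, bottom-right.
    for r in (0, half):
        for c in (0, half):
            leaves += split_ctu(block[r:r + half, c:c + half], threshold, min_size)
    return leaves


def cu_histogram(leaves):
    """Count how many leaf CUs of each size were produced."""
    return {s: leaves.count(s) for s in CU_SIZES}


# A flat CTU with one textured 16x16 corner: only that corner is split finely.
rng = np.random.default_rng(0)
ctu = np.zeros((64, 64))
ctu[:16, :16] = rng.integers(0, 256, size=(16, 16))

leaves = split_ctu(ctu, threshold=10.0)
hist = cu_histogram(leaves)
print(hist)
```

In this toy CTU the smooth regions stay as large CUs, while the textured corner is split down to the minimum size, which mirrors the content dependence of the partition.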
When implementing a CTU quadtree partition, the encoder traverses all the PU partition modes and determines the optimal one for each CU according to the rate-distortion function. As shown in Figure 2, there are two kinds of prediction modes: intra prediction and inter prediction. In intra prediction mode, two PU partition modes are stipulated. One 2N×2N (N = 4, 8, 16, 32) CU can either be coded as one single PU or divided into four PUs of equal size (N×N). In inter prediction mode, the CU can be split into symmetric or asymmetric PUs. The symmetric and asymmetric PU partitions each comprise four modes.
Figure 3 illustrates the PU partition of the CTU marked red in Figure 1. A CU without a dotted line has no PU subdivision, i.e., the PU size equals the CU size, while a CU with a dotted line is subdivided into two PUs. For instance, the 4th CU and 15th CU are encoded as one single PU, while asymmetric and symmetric PU partitions are taken by the 1st and 6th CU, respectively. Similar to CUs, the PUs can be categorized into different types depending on their sizes. During the encoding of I pictures, only the intra prediction mode is allowed, while both the intra prediction mode and the inter prediction mode are supported in P pictures. Thus, there are a total of 5 PU types in the I picture, and 25 PU types in the P picture. Table 2 and Table 3 list the correspondence between PU types and PU indexes for the I picture and P picture, respectively.
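As a sanity check on these counts, the sketch below enumerates the distinct PU sizes (width, height). It rests on assumptions that the text does not spell out: CU sizes from 64×64 down to 8×8, the intra N×N split and exclusion of asymmetric partitions at the minimum CU size, and no inter N×N split. The function names are hypothetical, and the ordering of Tables 2 and 3 is not reproduced.

```python
def intra_pu_sizes(cu_sizes=(64, 32, 16, 8)):
    """Distinct intra PU sizes: one PU per CU, plus NxN at the minimum CU size."""
    sizes = {(s, s) for s in cu_sizes}      # 2Nx2N: the PU equals the CU
    smallest = min(cu_sizes)
    sizes.add((smallest // 2, smallest // 2))  # NxN split, assumed only for the smallest CU
    return sizes


def inter_pu_sizes(cu_sizes=(64, 32, 16, 8)):
    """Distinct inter PU sizes from symmetric and asymmetric partitions."""
    sizes = set()
    smallest = min(cu_sizes)
    for s in cu_sizes:
        half, quarter = s // 2, s // 4
        # Symmetric modes 2Nx2N, 2NxN, Nx2N (inter NxN assumed unused).
        sizes |= {(s, s), (s, half), (half, s)}
        if s > smallest:
            # Asymmetric modes 2NxnU/2NxnD and nLx2N/nRx2N, assumed disabled for the smallest CU.
            sizes |= {(s, quarter), (s, s - quarter), (quarter, s), (s - quarter, s)}
    return sizes


i_picture_types = intra_pu_sizes()
p_picture_types = intra_pu_sizes() | inter_pu_sizes()
print(len(i_picture_types), len(p_picture_types))
```

Under these assumptions the enumeration yields 5 sizes for I pictures and 25 for P pictures, matching the dimensions of the I-PU and P-PU feature sets used in Section 4.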
CU and PU partitions are closely related to image contents. Generally, for regions with simple contents, HEVC tends to select coarse CUs and PUs, i.e., CUs and PUs with large sizes. On the contrary, CUs and PUs with small sizes are often selected for regions with complex contents to adapt to the variable shape of the target. The transcoding process will introduce an irreversible change of the video contents and make the reconstructed video different from the original one. Therefore, we suppose the CU and PU partition types will be different between AVC/HEVC videos and singly compressed HEVC videos. In the subsequent section, we will construct a theoretical model to analyze the factors leading to the differences.

3. CU and PU Partition Types Analysis in Singly Compressed HEVC Videos and AVC/HEVC Videos

In the AVC and HEVC encoding processes, quantization and reconstruction are basic operations, but they introduce irreversible quantization errors and reconstruction errors, which make the decoded video different from the original one. The change of video content further affects the CU and PU partition types and makes them differ between AVC/HEVC videos and singly compressed HEVC videos. We illustrate the difference in detail in this section.

3.1. Theoretical Analysis

Figure 4 describes the simplified block diagram of AVC/HEVC transcoding. YUV video is a kind of uncompressed video and is often used as test input for video encoders. Given a YUV video $V$, the first step is to encode $V$ into the AVC bit stream $H_1$ with bitrate $r_1$. Then $H_1$ is decoded to the YUV video $\hat{V}$ and recompressed into the HEVC bit stream $H_2$ with bitrate $r_2$. Please note two points here. First, the reconstruction module in the encoding process is equivalent to the decoding process; thus, we directly use the reconstruction module to represent the decoder in Figure 4 to save space. Second, only HEVC encoding is implemented for a singly compressed HEVC video. That is to say, for a singly compressed HEVC video, the input video is the uncompressed YUV video $V$, not its decoded version $\hat{V}$.
From Figure 4, we can see that CU and PU partition types of each picture are determined by the content of the picture and the number of bits allocated to it by the rate control module. Here, please note that in this paper, a picture contains only one slice. Though CU and PU partition types in I pictures are different from P pictures, the partition strategy is similar. Therefore, we take the CU types in P pictures as an example to analyze the difference between AVC/HEVC videos and singly compressed HEVC videos.
Now let us consider the AVC/HEVC transcoding process. Assume one uncompressed video sequence $V$ consists of $N$ P pictures and is expressed as Equation (1), where $I_n$ denotes the $n$th P picture of $V$. The bit stream $H_1$ can then be obtained by implementing the prediction, transform, quantization, and entropy coding processes.
$$V = \{ I_n \mid n = 1, 2, \ldots, N \} \tag{1}$$
In the AVC encoding process, a rate control process is implemented. Assume the bit rate for $V$ is $r$, and let $u(\cdot)$ denote the rate control process; then, the number of bits allocated to the $n$th P picture can be represented as $b_n(r) = u(n; V; r)$. After that, the quantization step $Q_p$ for the picture is determined according to $b_n(r)$. In addition, the AVC standard adopts the macroblock as the basic coding unit and does not introduce the concept of the CU; hence, CU types do not exist in the AVC encoding process.
The decoding process is the inverse process of encoding. Let $C_n$ stand for the prediction signal of $I_n$, and let $\mathrm{DCT}(\cdot)$ and $\mathrm{IDCT}(\cdot)$ represent the discrete cosine transform (DCT) and inverse DCT, respectively. Then the decoded video sequence $\hat{V}$ can be obtained by Equations (2) and (3), where $\hat{I}_n$ is the decoded version of $I_n$, $[\cdot]$ represents the rounding operator, and $E(I_n)$ and $E(C_n)$ denote the irreversible quantization and reconstruction errors of $I_n$ and $C_n$, respectively. The quantization error is the error introduced in the quantization process. The reconstruction error is the rounding and truncation error generated in the reconstruction process.
$$\begin{aligned}
\hat{I}_n &= \mathrm{IDCT}\left( \left[ \mathrm{DCT}(I_n - C_n) / Q_p \right] \times Q_p \right) + C_n \\
&\approx \mathrm{IDCT}\left( \left[ \mathrm{DCT}(I_n) / Q_p \right] \times Q_p \right) - \mathrm{IDCT}\left( \left[ \mathrm{DCT}(C_n) / Q_p \right] \times Q_p \right) + C_n \\
&= I_n + E(I_n) - E(C_n)
\end{aligned} \tag{2}$$
$$\hat{V} = \{ \hat{I}_n \mid n = 1, 2, \ldots, N \} \tag{3}$$
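To make the first line of Equation (2) concrete, the following sketch applies a floating-point 8×8 DCT, uniform quantization with step $Q_p$, and the inverse DCT to a random block. It is a simplification under stated assumptions: real AVC and HEVC codecs use integer transforms and a QP-to-step mapping, and the names `dct_matrix`, `reconstruct`, `picture`, and `prediction` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return d


D = dct_matrix(8)


def dct2(block):
    """Separable 2D DCT of an 8x8 block."""
    return D @ block @ D.T


def idct2(coeffs):
    """Inverse of dct2 (D is orthonormal, so its inverse is its transpose)."""
    return D.T @ coeffs @ D


def reconstruct(picture, prediction, qstep):
    """First line of Equation (2): quantize the transformed residual, then add the prediction back."""
    residual = dct2(picture - prediction)
    return idct2(np.round(residual / qstep) * qstep) + prediction


rng = np.random.default_rng(1)
picture = rng.integers(0, 256, size=(8, 8)).astype(float)
prediction = picture + rng.normal(0.0, 5.0, size=(8, 8))  # toy prediction signal

for qstep in (1, 8, 32):
    error = reconstruct(picture, prediction, qstep) - picture
    print(qstep, np.linalg.norm(error))
```

Because the transform is orthonormal and each coefficient is rounded by at most half a quantization step, the Frobenius norm of the reconstruction error for an 8×8 block is bounded by $8 \times Q_p / 2$, so a larger quantization step permits a larger deviation between $\hat{I}_n$ and $I_n$.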
In the process of transcoding $\hat{V}$ to the HEVC bit stream $H_2$, bits are allocated to the $n$th P picture according to Equation (4). Here we use $u'(\cdot)$ rather than $u(\cdot)$ to represent the rate control process because the bit allocation function adopted in HEVC is different from that of the AVC standard.
$$b_n(r_1, r_2) = u'(n; \hat{V}; r_2) \tag{4}$$
Knowing the number of bits allocated to the $n$th P picture, the type of the $k$th CU in the $n$th P picture, $\mathrm{PCU}_{n,k}(r_1, r_2)$, can be written as Equation (5), where $\omega(\cdot)$ stands for the CU partition process, and $\hat{C}_n$ denotes the corresponding prediction signal of $\hat{I}_n$.
$$\mathrm{PCU}_{n,k}(r_1, r_2) = \omega(\hat{I}_n; b_n(r_1, r_2); \hat{C}_n) \tag{5}$$
For a singly compressed HEVC video with bitrate $r_2$, the number of bits allocated to the $n$th P picture and the CU partition type are determined by Equations (6) and (7), respectively.
$$b_n(r_2) = u'(n; V; r_2) \tag{6}$$
$$\mathrm{PCU}_{n,k}(r_2) = \omega(I_n; b_n(r_2); C_n) \tag{7}$$
Finally, the difference in CU partition types between the AVC/HEVC video and the singly compressed HEVC video is given by Equation (8).
$$\begin{aligned}
\mathrm{PCU}_{n,k}(r_1, r_2) - \mathrm{PCU}_{n,k}(r_2) &= \omega(\hat{I}_n; b_n(r_1, r_2); \hat{C}_n) - \omega(I_n; b_n(r_2); C_n) \\
&= \omega(\hat{I}_n; u'(n; \hat{V}; r_2); \hat{C}_n) - \omega(I_n; u'(n; V; r_2); C_n) \\
&= \omega(\hat{I}_n; u'(n; \{ \hat{I}_n \mid n = 1, 2, \ldots, N \}; r_2); \hat{C}_n) - \omega(I_n; u'(n; \{ I_n \mid n = 1, 2, \ldots, N \}; r_2); C_n)
\end{aligned} \tag{8}$$
As shown in Equation (8), $\mathrm{PCU}_{n,k}(r_1, r_2)$ would be different from $\mathrm{PCU}_{n,k}(r_2)$. Two factors lead to the difference. The first is the difference between $\hat{I}_n$ and $I_n$. Equation (2) states that the irreversible quantization error and reconstruction error make $\hat{I}_n$ different from $I_n$. Here, please note that the quantization error is closely related to $Q_p$: the bigger the $Q_p$, the bigger the quantization error. However, $Q_p$ is determined by the number of bits allocated to the picture, $b_n(r)$, so $Q_p$ is indirectly decided by the rate control process. Because the rate control process in AVC is quite different from that in HEVC, the selection of $Q_p$ in AVC would be different from that in HEVC, which further enlarges the difference between $\hat{I}_n$ and $I_n$. The second factor that causes $\mathrm{PCU}_{n,k}(r_1, r_2)$ to differ from $\mathrm{PCU}_{n,k}(r_2)$ is the difference between $\hat{C}_n$ and $C_n$. $\hat{C}_n$ and $C_n$ are the prediction signals of $\hat{I}_n$ and $I_n$, respectively, which are calculated from the reconstructed pictures encoded before $\hat{I}_n$ and $I_n$. Since quantization error and reconstruction error also exist in the reconstructed pictures, $\hat{C}_n$ would be different from $C_n$.
The CU types in I pictures and the PU types in I pictures and P pictures are likewise inevitably affected by the quantization error, reconstruction error, and bit allocation method, and their theoretical analyses are similar to that of the CU types in P pictures. Therefore, it can be concluded that the CU types in I pictures and P pictures and the PU types in I pictures and P pictures would differ between singly compressed HEVC videos and AVC/HEVC videos.

3.2. Illustrative Examples

In this subsection, we exhibit the CU and PU partition types of singly compressed HEVC videos and AVC/HEVC videos to demonstrate the difference between them. The YUV sequence “sign_irene” is selected as the test video. For the singly compressed HEVC video, this YUV sequence is directly encoded with the HEVC standard at bitrate 300 Kbps, while for the transcoded HEVC videos, this YUV sequence is encoded with the AVC standard at bitrates 200 Kbps, 300 Kbps, and 400 Kbps, each followed by the HEVC standard at bitrate 300 Kbps. Figure 5 presents the CU and PU partition of the first P picture in the fourth GOP, where solid lines and dotted lines indicate the CU boundaries and PU boundaries, respectively. Figure 5a shows the CU and PU partition of the singly compressed HEVC video. Figure 5b–d show the CU and PU partition of HEVC videos transcoded from AVC videos with bitrates 200 Kbps, 300 Kbps, and 400 Kbps, respectively.
It can be observed that the CU and PU partitions in the AVC/HEVC transcoded pictures differ markedly from those in the singly compressed HEVC picture, even though they have the same visual content. This phenomenon supports the analysis in Section 3.1, and thus, CU and PU partition types can be exploited as footprints for AVC/HEVC video detection. Furthermore, the blocks marked by the red boundaries in Figure 5a,b have the same CU partitions, but their PU partitions differ. The CU usually reflects the complexity of the CTU contents, whereas the PU better captures the subtle differences in content. That is to say, CU types and PU types are complementary when encoding a video. The CU types and PU types in the I pictures show a similar phenomenon. Therefore, CU types and PU types can be used as complementary features, and we can merge them to detect AVC/HEVC videos.
In order to illustrate the characteristics of CU and PU partitions more clearly, the numbers of each CU and PU partition type in Figure 5 are exhibited in Figure 6 and Figure 7. We can see that the number of each CU and PU partition type in the AVC/HEVC videos differs markedly from that in the singly compressed HEVC video. For example, for the 8×8 PU partition type, the counts in the 200–300 Kbps and 300–300 Kbps AVC/HEVC videos are nearly 25 and 20 smaller than that in the singly compressed HEVC video, respectively, while the count in the 400–300 Kbps AVC/HEVC video is much larger than that in the singly compressed HEVC video. Based on the above theoretical analysis and examples, it is reasonable to take both CU and PU types in the I pictures and P pictures as classification features and use them to identify AVC/HEVC videos with the SVM classification method. The specific feature extraction method and the classification method are explained in Section 4.

4. Proposed Method for Detecting HEVC Videos Transcoded from AVC

From the above analysis, we know that the numbers of CU partition types and PU partition types in both I pictures and P pictures are different between singly compressed HEVC videos and AVC/HEVC videos. In order to describe this characteristic and make it more universal, we consider utilizing the mean frequencies of CU partition types from I pictures (I-CU for short), CU partition types from P pictures (P-CU for short), PU partition types from I pictures (I-PU for short), and PU partition types from P pictures (P-PU for short) as classification features. Please note that we utilize the first P picture of each GOP to extract features for P pictures in this paper. The reason is that the characteristics of each P picture in one GOP are similar. In this section, we will firstly describe the feature extraction method in detail and then present the flow of the proposed method.

4.1. Feature Extraction

In the proposed method, we use $s_i$ ($i = 1, 2, 3, 4$) to represent the $i$th mean frequency set of CU and PU partition types. Specifically, $s_1$ is the 4-dimensional mean frequency set of I-CU partition types, $s_2$ is the 4-dimensional mean frequency set of P-CU partition types, and $s_3$ and $s_4$ are the 5-dimensional mean frequency set of I-PU partition types and the 25-dimensional mean frequency set of P-PU partition types, respectively. Each mean frequency set is given by Equation (9), where $f_{ij}$ represents the mean frequency of the $j$th partition type of the $i$th mean frequency set in the whole video, $(x_{ij})_k$ is the number of occurrences of the $j$th partition type of the $i$th mean frequency set in the $k$th I picture or P picture, $N$ denotes the total number of I pictures or P pictures in one video, and $M_i$ represents the dimension of the $i$th mean frequency set, so $M_i$ equals 4, 4, 5, and 25 when $i$ is 1, 2, 3, and 4, respectively.
$$s_i = \left\{ f_{ij} \;\middle|\; f_{ij} = \sum_{k=1}^{N} (x_{ij})_k / N,\ j = 1, 2, \ldots, M_i \right\} \tag{9}$$
$$S = \{ s_i \mid i = 1, 2, 3, 4 \} \tag{10}$$
According to Section 3.2, PU partition types and CU partition types are complementary features, and we can fuse them to enhance their expressiveness. Therefore, we concatenate the mean frequency set of I-CU, P-CU, I-PU, and P-PU, and get the final combination feature set S, as shown in Equation (10). From the above analysis, S is a 38-dimensional feature set that describes the distribution characteristics of the CU and PU partition.
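Equations (9) and (10) can be sketched in a few lines. The example assumes that the per-picture partition counts have already been extracted from the bit stream (for instance by an instrumented decoder, which is not shown) and are available as arrays with one row per picture; the names `DIMS`, `mean_frequency`, and `feature_vector` are hypothetical.

```python
import numpy as np

# Dimension M_i of each mean frequency set, in the concatenation order of Equation (10).
DIMS = {"I-CU": 4, "P-CU": 4, "I-PU": 5, "P-PU": 25}


def mean_frequency(counts):
    """Equation (9): average the per-picture counts of each partition type.

    counts has shape (N, M_i): one row per picture, one column per partition type.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.sum(axis=0) / counts.shape[0]


def feature_vector(counts_by_set):
    """Equation (10): concatenate the four mean frequency sets into S."""
    parts = []
    for name, dim in DIMS.items():
        freq = mean_frequency(counts_by_set[name])
        if freq.shape != (dim,):
            raise ValueError(f"{name} must have {dim} partition types")
        parts.append(freq)
    return np.concatenate(parts)


# Toy counts for a clip with 6 I pictures and 6 first-P pictures (random placeholder values).
rng = np.random.default_rng(0)
demo_counts = {name: rng.integers(0, 50, size=(6, dim)) for name, dim in DIMS.items()}
S = feature_vector(demo_counts)
print(S.shape)
```

The resulting vector has 4 + 4 + 5 + 25 = 38 entries, one per CU or PU partition type.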
In Figure 8, a line chart is used to show the difference between the classification feature set $S$ of the singly compressed HEVC video and that of the AVC/HEVC video. For display purposes, the mean frequencies of the elements in $S$ are plotted in logarithmic form. The red line represents one singly compressed HEVC video, which is directly encoded with the HEVC standard at bitrate 300 Kbps, while the green, blue, and black lines represent its corresponding AVC/HEVC versions, which are encoded with the AVC standard at bitrates 200 Kbps, 300 Kbps, and 400 Kbps, respectively, followed by the HEVC standard at bitrate 300 Kbps. The abscissa indicates the index of each element in the classification feature set $S$; that is to say, one index denotes one kind of PU or CU partition type. The ordinate indicates the corresponding mean frequency of each index, i.e., the mean frequency of each CU or PU partition type. From Figure 8, we can see that the mean frequencies of the singly compressed HEVC video and its corresponding AVC/HEVC versions differ greatly at most indexes. This indicates that the mean frequencies of most CU and PU partition types of the singly compressed HEVC video are very different from those of the AVC/HEVC videos, and it supports the effectiveness of the classification feature set. However, it is difficult to separate them by intuitive linear classification. Thus, we bring in SVM, which is well suited to nonlinear, high-dimensional pattern recognition problems with small sample sizes; the detailed process is provided in the next subsection.

4.2. The Flow of the Proposed Method

For a given test video, the flow of identifying whether it is an AVC/HEVC video is illustrated in Figure 9. First, the CU and PU partition types of the I pictures and the first P picture in each GOP are extracted. Then, the classification feature set $S$ is calculated according to Equations (9) and (10) and sent to the SVM. Finally, the SVM decides the category of the given video by using the classification model, which is trained beforehand in the SVM training process. The SVM training process requires samples of both singly compressed HEVC videos and AVC/HEVC videos. The classification feature set $S$ is calculated for both kinds of samples and then sent to the SVM to obtain the optimal classification model.

5. Experimental Results

In this section, 29 widely known standard YUV sequences are used, including one low-resolution video set of 18 CIF format videos with a resolution of 352×288, and one high-resolution video set of 11 1080p videos with the resolution of 1920×1080. To increase the size of the video sets, each video is segmented into non-overlapped video clips with a length of 100 frames. Finally, a total of 43 CIF video clips and 32 1080p video clips are generated and used for the subsequent training and testing.
HM [19] with encoder_lowdelay_P_main configuration file and JM [20] with encoder_main configuration file are used to implement the HEVC and AVC encoding and decoding process, respectively. The frame rate, period of the I picture, and GOP structure are set as 30, 4, and IPPP, respectively.
A singly compressed HEVC video is generated by using HM with bitrate $r_2$. In the generation procedure of AVC/HEVC transcoded videos, YUV sequences are first encoded with JM at bitrate $r_1$, and then they are decoded with JM and recompressed with HM at bitrate $r_2$. To guarantee the visual quality of the compressed videos, the bitrates $r_1$ and $r_2$ used for the CIF video set are selected from {200, 300, 400, 500} (Kbps), and $r_1$ and $r_2$ for the 1080p video set are selected from {3, 4, 5} (Mbps).
LIBSVM, proposed by Chang et al. [21], is a popular SVM library and is suitable for non-linear classification. Therefore, LIBSVM with the PolySVC kernel is adopted as the classifier in this paper to distinguish AVC/HEVC videos from singly compressed ones. A total of 35 and 27 video clips are randomly selected as training samples for the CIF video set and the 1080p video set, respectively, while the remaining clips are used for testing.
As shown in Equation (11), the Accuracy Rate (AR) is used to evaluate the performance of the proposed method, where TNR is the true negative rate, defined as the fraction of singly compressed videos that are labeled as singly compressed, and TPR is the true positive rate, defined as the fraction of transcoded videos that are labeled as transcoded. The training and testing process is repeated 100 times, and the mean value of AR is taken as the final classification accuracy.
$$\mathrm{AR} = (\mathrm{TNR} + \mathrm{TPR}) / 2 \tag{11}$$
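Equation (11) can be computed directly from the true and predicted labels. The sketch below assumes a label convention that the text does not fix: 0 for singly compressed videos and 1 for transcoded videos. The function name `accuracy_rate` is hypothetical.

```python
def accuracy_rate(y_true, y_pred):
    """Equation (11): AR = (TNR + TPR) / 2.

    Assumed labels: 0 = singly compressed (negative), 1 = transcoded (positive).
    """
    n_neg = sum(1 for t in y_true if t == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    if n_neg == 0 or n_pos == 0:
        raise ValueError("both classes must be present in y_true")
    true_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tnr = true_neg / n_neg  # singly compressed videos labeled as singly compressed
    tpr = true_pos / n_pos  # transcoded videos labeled as transcoded
    return (tnr + tpr) / 2


# One of two singly compressed videos is misclassified; both transcoded videos are correct.
ar = accuracy_rate([0, 0, 1, 1], [0, 1, 1, 1])
print(ar)
```

Averaging TNR and TPR weights the two classes equally, so AR is not inflated when one class has more test samples than the other.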

5.1. Accuracy of Detecting AVC/HEVC Videos

We tested the effectiveness of the proposed method on the CIF and 1080p video sets, and the results are shown in Table 4 and Table 5. We can see that, whether or not $r_1$ is smaller than $r_2$, the classification accuracy is higher than 0.9, with a minimum value of 0.904 and a maximum value of 0.997 over the two video sets. The classification results support the theoretical analysis in Section 3 and indicate that the proposed method can identify AVC/HEVC videos very well.
To the best of our knowledge, Reference [15], published in 2019, is the most recent work and performs best in detecting AVC/HEVC videos. Therefore, a comparative analysis with Reference [15] is conducted on the low-resolution CIF video set and the high-resolution 1080p video set. The same SVM training and testing process as in the proposed method is adopted for the comparison, and the results are reported in Table 6 and Table 7.
From Table 6 and Table 7, it can be seen that the classification accuracy of the proposed method is higher than that of Bian’s method [15] in almost all cases; the exception is the entry (300, 400) in Table 6. When $r_1 = 500$ Kbps and $r_2 = 200$ Kbps, the accuracy of the proposed method is even 5% higher than that of Reference [15]. The reason is that Bian’s method adopted only PU types as classification features and considered neither the characteristics of CU types nor the complementary relationship between PU types and CU types, whereas the proposed method explores this relationship and extracts more comprehensive features based on both. For the entry (300, 400) in Table 6, the classification accuracy of Bian’s method is 0.8 percent higher than that of the proposed method, which may be caused by the particular bitrate and video contents. CU partition types and PU partition types are decided by the video contents and the allocated bitrate. When the video contents are fixed, the CU partition types of singly compressed HEVC videos can be similar to those of AVC/HEVC videos at certain bitrates. Figure 10 shows the CU partition of one picture in the singly compressed HEVC video at bitrate 400 Kbps and its corresponding transcoded version at bitrate 300 Kbps for AVC encoding and 400 Kbps for HEVC encoding. We can see that the CU partition types are identical at this bitrate combination. This makes the features extracted from I-CU and P-CU ineffective and causes redundancy in the feature set $S$, which in turn decreases the SVM classification accuracy.

5.2. Robustness against Frame-Deletion

Frame deletion is a common forgery type for digital videos. Hence, we tested the robustness of the proposed method and Reference [15] against it. The frame-deleted videos are generated as follows. First, each YUV video sequence in the 1080p video set is encoded and decoded with the AVC codec at bitrate $r_1$. Then, 30 frames are randomly deleted from each video, and finally, the frame-deleted videos are encoded with HEVC at bitrate $r_2$. Table 8 reports the classification accuracy.
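The deletion step can be sketched as follows. The sketch assumes that the deleted frames are drawn uniformly at random from arbitrary positions and that the remaining frames keep their order; the text does not state whether the deleted frames are scattered or contiguous. The function name `delete_random_frames` and the `seed` parameter are hypothetical.

```python
import random


def delete_random_frames(frames, n_delete, seed=None):
    """Remove n_delete frames at random positions, preserving the order of the rest."""
    if not 0 <= n_delete <= len(frames):
        raise ValueError("n_delete must be between 0 and the number of frames")
    rng = random.Random(seed)
    dropped = set(rng.sample(range(len(frames)), n_delete))
    return [frame for index, frame in enumerate(frames) if index not in dropped]


# A 100-frame clip represented by its frame indices; 30 frames are removed.
clip = list(range(100))
kept = delete_random_frames(clip, 30, seed=0)
print(len(kept))
```

After deletion, the decoded frames no longer line up with the GOP boundaries of the first compression, so pictures that were originally I pictures may be re-encoded as P pictures and vice versa.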
We can see that the classification accuracy is above 0.97 for both the proposed method and Reference [15], which indicates that both are effective for detecting frame-deleted AVC/HEVC videos. This is explained by the difference between $\hat{I}_n$ and $I_n$, which, according to the analysis in Section 3, is one major factor that causes the CU and PU partition types to differ between AVC/HEVC videos and singly compressed HEVC videos. Moreover, the frame-deletion operation removes several frames and thereby further increases the difference between $\hat{I}_n$ and $I_n$.
On the other hand, it can be observed that the proposed method achieves higher classification accuracy than Reference [15], with an average improvement of about 1.1%. This again implies that CU types are effective features and that exploiting the complementary relationship between PU types and CU types can improve the performance of detecting AVC/HEVC videos.

5.3. Robustness against Shifted GOP Structure

A shifted GOP structure means that the GOP size adopted in recompression differs from that adopted in the first compression, which is also a common forgery type for digital videos. Therefore, we tested the robustness of the proposed method and Reference [15] against the shifted GOP structure on the 1080p video set. In the first (AVC) encoding, the GOP size and GOP structure are set to four and IPPP, respectively. In the subsequent HEVC encoding, the GOP size and GOP structure are changed. The tests are carried out in two cases. In the first case, the GOP size used in HEVC encoding is an integral multiple of that in AVC encoding; in the second case, it is not. For the first case, the GOP size and GOP structure are set to eight and IPPPPPPP, respectively, and we call the corresponding video set the GOP4-GOP8 video set for short. The GOP size and GOP structure used for the second case are six and IPBBPB, respectively, and the corresponding video set is called the GOP4-GOP6 video set. The classification accuracies of the proposed method and Reference [15] are listed in Table 9 and Table 10.
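The two cases can be visualized by listing where the I pictures fall in each encoding pass. The sketch below is a simplification: it treats each GOP structure string as repeating frame by frame and ignores any reordering between display order and coding order that B pictures introduce. The function names are hypothetical.

```python
def picture_types(n_frames, gop_structure):
    """Picture type of each frame when the GOP structure string repeats."""
    return [gop_structure[i % len(gop_structure)] for i in range(n_frames)]


def i_picture_positions(n_frames, gop_structure):
    """Indices of the frames coded as I pictures."""
    types = picture_types(n_frames, gop_structure)
    return [i for i, t in enumerate(types) if t == "I"]


n_frames = 24
first_pass = i_picture_positions(n_frames, "IPPP")  # AVC encoding, GOP size 4

for second_structure in ("IPPPPPPP", "IPBBPB"):     # HEVC encoding, GOP size 8 or 6
    second_pass = i_picture_positions(n_frames, second_structure)
    shared = sorted(set(first_pass) & set(second_pass))
    print(second_structure, second_pass, shared)
```

When the second GOP size is an integral multiple of the first, every I picture of the HEVC pass coincides with an I picture of the AVC pass; otherwise only some of them do, so the HEVC encoder re-encodes a mix of former I and P pictures as its own I pictures.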
We observed that, whether or not the GOP size used in HEVC encoding is an integral multiple of that in AVC encoding, both the proposed method and Reference [16] performs well under the shifted GOP structure with a classification accuracy above 0.98. Additionally, the proposed method performs better than Reference [16], with the classification accuracy being one in some cases. It indicates that the proposed method is also effective under the shifted GOP structure case.

6. Conclusions

In this paper, an effective method is proposed to detect HEVC videos transcoded from AVC videos. Both PU partition types and CU partition types are analyzed, and the analysis shows that they can be used as complementary features. Based on this analysis, the mean frequencies of I-CU, I-PU, P-CU, and P-PU are calculated and concatenated to form the final 38-dimensional feature set, which is then sent to an SVM for classification. The experimental results verify that the proposed method detects transcoded HEVC videos well and has strong robustness against frame-deletion and shifted GOP structure attacks. In future work, we will look for other valid features to improve the classification accuracy and even estimate the codec parameters adopted in the first compression.
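The dimensionality of the feature set follows from the index tables: 4 CU types and 5 PU types for I pictures, and 4 CU types and 25 PU types for P pictures, giving 4 + 5 + 4 + 25 = 38. The sketch below illustrates how such a vector could be assembled. It assumes that a mean frequency is the per-picture normalized histogram of partition-type indexes averaged over pictures, which may differ in detail from the definition given earlier in the paper, and it assumes the partition indexes have already been extracted from the bitstream. All names are hypothetical.

```python
# Number of partition types per group, taken from Tables 1-3.
GROUP_SIZES = {"I-CU": 4, "I-PU": 5, "P-CU": 4, "P-PU": 25}


def frequencies(type_indexes, num_types):
    """Normalized histogram of 1-based partition-type indexes for one picture."""
    counts = [0] * num_types
    for idx in type_indexes:
        counts[idx - 1] += 1
    total = len(type_indexes)
    return [c / total for c in counts] if total else [0.0] * num_types


def mean_frequencies(pictures, num_types):
    """Average the per-picture frequencies over all pictures of one kind."""
    if not pictures:
        return [0.0] * num_types
    per_picture = [frequencies(p, num_types) for p in pictures]
    return [sum(col) / len(per_picture) for col in zip(*per_picture)]


def feature_vector(partitions):
    """Concatenate the four mean-frequency vectors (I-CU, I-PU, P-CU, P-PU)."""
    features = []
    for group, num_types in GROUP_SIZES.items():
        features.extend(mean_frequencies(partitions[group], num_types))
    return features


# Toy input: one I picture and two P pictures with made-up partition indexes.
example = feature_vector({
    "I-CU": [[1, 1, 2, 4]],
    "I-PU": [[1, 2, 2, 5]],
    "P-CU": [[3, 3], [4, 4]],
    "P-PU": [[1, 19], [25, 25]],
})
print(len(example))  # 38
```

The resulting vectors, one per video, would then be passed to an SVM library such as LIBSVM [21] for training and classification.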

Author Contributions

Conceptualization and methodology, C.L., Z.L. and Z.Z.; software, C.L., H.Y. and Z.Z.; formal analysis, C.L., Z.Z. and L.Y.; writing—original draft preparation, Z.Z.; writing—review and editing, C.L., Z.L., L.Y. and H.Y.; project administration, Z.L.; funding acquisition, Z.L.

Funding

This work was supported by the National Natural Science Foundation of China (No. 61702034), and by the Opening Project of Guangdong Province Key Laboratory of Information Security Technology (Grant No. 2017B030314131).

Acknowledgments

The authors highly appreciate the authors of [16], S. Bian, H. Li, et al., for their kind help in implementing their method. The authors also thank the editors and the reviewers for their invaluable comments and suggestions, which greatly helped to improve this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. 160 YouTube Statistics and Facts 2019. Available online: https://expandedramblings.com/index.php/youtube-statistics/ (accessed on 10 August 2019).
  2. Bestagini, P.; Milani, S.; Tagliasacchi, M.; Tubaro, S. Codec and GOP Identification in Double Compressed Videos. IEEE Trans. Image Process. 2016, 25, 2298–2310.
  3. He, P.; Jiang, X.; Sun, T.; Wang, S. Detection of Double Compression in MPEG-4 Videos Based on Block Artifact Measurement. Neurocomputing 2016, 228, 84–96.
  4. Huang, Z.; Huang, F.; Huang, J. Detection of double compression with the same bit rate in MPEG-2 videos. In Proceedings of the 2014 IEEE China Summit & International Conference on Signal and Information Processing (China SIP), Xi’an, China, 9–13 July 2014; pp. 306–309.
  5. Luo, W.; Wu, M.; Huang, J. MPEG recompression detection based on block artifacts. In Security, Forensics, Steganography, and Watermarking of Multimedia Contents X; International Society for Optics and Photonics: Bellingham, WA, USA, 2008; Volume 6819, p. 68190X.
  6. Jiang, X.; He, P.; Sun, T.; Xie, F.; Wang, S. Detection of Double Compression with the Same Coding Parameters Based on Quality Degradation Mechanism Analysis. IEEE Trans. Inf. Forensics Secur. 2018, 13, 170–185.
  7. Huang, M.; Wang, R.; Jian, X.; Li, Q.; Xu, D. Detection of double compression for HEVC videos based on statistics properties of DCT coefficient. J. Optoelectron. Laser 2015, 26, 733–739.
  8. Huang, M.; Wang, R.; Xu, J.; Xu, D.; Li, Q. Detection of double compression for HEVC videos based on the Markov feature optimization. In Proceedings of the 12th China Information Hiding Multimedia Security Workshop (CIHW 2015), Wuhan, China, 7–10 October 2015; pp. 475–481.
  9. Huang, M.; Wang, R.; Xu, J.; Xu, D.; Li, Q. Detection of double compression for HEVC videos based on the co-occurrence matrix of DCT coefficients. In Proceedings of the 14th Int. Workshop Digital Water-Marking (IWDW 2015), Tokyo, Japan, 7–10 October 2015; pp. 61–71.
  10. Xu, Q.; Sun, T.; Jiang, X.; Dong, Y. HEVC double compression detection based on SN-PUPM feature. In Digital Forensics and Watermarking (IWDW 2017), Lecture Notes in Computer Science; Kraetzer, C., Shi, Y.Q., Dittmann, J., Kim, H., Eds.; Springer: Cham, Switzerland, 2017; Volume 10431, pp. 3–17.
  11. Li, Q.; Wang, R.; Xu, D. Detection of Double Compression in HEVC Videos Based on TU Size and Quantized DCT Coefficients. IET Inf. Secur. 2019, 13, 1–6.
  12. Liang, X.; Li, Z.; Yang, Y.; Zhang, Z.; Zhang, Y. Detection of Double Compression for HEVC Videos with Fake Bitrate. IEEE Access 2018, 6, 53243–53253.
  13. Vazquez-Padin, D.; Fintani, M.; Bianchi, T. Detection of video double encoding with GOP size estimation. In Proceedings of the IEEE International Workshop on Information Forensics and Security, Tenerife, Spain, 2–5 December 2012; pp. 151–156.
  14. Costanzo, A.; Barni, M. Detection of double AVC/HEVC encoding. In Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 28 August–2 September 2016; pp. 2245–2249.
  15. Bian, S.; Li, H.; Gu, T.; Kot, A.C. Exposing Video Compression History by Detecting Transcoded HEVC Videos from AVC Coding. Symmetry 2019, 11, 67.
  16. Rec. ITU-T H.265: High Efficiency Video Coding. 2016. Available online: https://www.itu.int/itu-t/recommendations/rec.aspx?rec=12905 (accessed on 12 August 2019).
  17. International Organization for Standardization. ISO/IEC 23008-2:2017. Information Technology—High Efficiency Coding and Media Delivery in Heterogeneous Environments—Part 2: High Efficiency Video Coding; International Organization for Standardization: Geneva, Switzerland, 2017; Available online: https://www.iso.org/standard/69668.html (accessed on 12 August 2019).
  18. Ferroukhi, M.; Ouahabi, A.; Attari, M.; Habchi, Y.; Taleb-Ahmed, A. Medical Video Coding Based on 2nd-Generation Wavelets: Performance Evaluation. Electronics 2019, 8, 88.
  19. HM Software. Available online: https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags (accessed on 20 August 2019).
  20. JM Software. Available online: http://iphome.hhi.de/suehring/tml/ (accessed on 20 August 2019).
  21. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 27.
Figure 1. Example of a quadtree partition of a Coding Tree Unit (CTU) into coding units (CUs). (a) CTU partition; (b) corresponding quadtree.
Figure 2. Prediction unit (PU) partition types in different prediction modes.
Figure 3. PU partition of the CTU marked red in Figure 1.
Figure 4. Simplified block diagram of AVC/High Efficiency Video Coding (HEVC) transcoding.
Figure 5. CU and PU partitioning of the first P picture in a singly compressed HEVC video and AVC/HEVC transcoded videos. (a) Singly compressed HEVC video. (b–d) HEVC videos transcoded from various AVC coding.
Figure 6. The numbers of CU partition types of videos shown in Figure 5.
Figure 7. The numbers of PU partition types of videos shown in Figure 5.
Figure 8. Classification feature set S of a singly compressed HEVC video and its corresponding AVC/HEVC transcoded versions.
Figure 9. The flow of the proposed method.
Figure 10. CU partition of one picture in a singly compressed HEVC video and its AVC/HEVC transcoding version at a special bitrate. (a) HEVC 400 kbps; (b) AVC 300 kbps HEVC 400 kbps.
Table 1. Indexes of CU types in I picture and P picture.

| Index | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| CU Type | 8×8 | 16×16 | 32×32 | 64×64 |

Table 2. Indexes of PU types in the I picture.

| Index | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| PU Type | 4×4 | 8×8 | 16×16 | 32×32 | 64×64 |

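Table 3. Indexes of PU types in the P picture.

| Index | PU Type | Index | PU Type | Index | PU Type | Index | PU Type | Index | PU Type |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 4×4 | 6 | 16×8 | 11 | 4×16 | 16 | 32×8 | 21 | 32×64 |
| 2 | 8×8 | 7 | 8×16 | 12 | 32×32 | 17 | 24×32 | 22 | 64×48 |
| 3 | 4×8 | 8 | 16×12 | 13 | 32×16 | 18 | 8×32 | 23 | 64×16 |
| 4 | 8×4 | 9 | 16×4 | 14 | 16×32 | 19 | 64×64 | 24 | 48×64 |
| 5 | 16×16 | 10 | 12×16 | 15 | 32×24 | 20 | 64×32 | 25 | 16×64 |
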
Table 4. Classification accuracy in the CIF video set.

| $r_1$ \ $r_2$ (kbps) | 200 | 300 | 400 | 500 |
|---|---|---|---|---|
| 200 | 0.939 | 0.991 | 0.987 | 0.988 |
| 300 | 0.919 | 0.991 | 0.968 | 0.988 |
| 400 | 0.904 | 0.976 | 0.983 | 0.980 |
| 500 | 0.926 | 0.953 | 0.974 | 0.986 |

Table 5. Classification accuracy in the 1080p video set.

| $r_1$ \ $r_2$ (Mbps) | 2 | 3 | 4 |
|---|---|---|---|
| 2 | 0.997 | 0.99 | 0.993 |
| 3 | 0.991 | 0.989 | 0.992 |
| 4 | 0.985 | 0.983 | 0.989 |

Table 6. Comparison results of the proposed method and Reference [16] in the CIF video set.

| $r_1$ \ $r_2$ (kbps) | 200, Ref [16] | 200, Proposed | 300, Ref [16] | 300, Proposed | 400, Ref [16] | 400, Proposed | 500, Ref [16] | 500, Proposed |
|---|---|---|---|---|---|---|---|---|
| 200 | 0.911 | 0.939 | 0.963 | 0.991 | 0.982 | 0.987 | 0.983 | 0.988 |
| 300 | 0.876 | 0.919 | 0.978 | 0.991 | 0.976 | 0.968 | 0.986 | 0.988 |
| 400 | 0.889 | 0.904 | 0.950 | 0.976 | 0.979 | 0.983 | 0.977 | 0.980 |
| 500 | 0.876 | 0.926 | 0.941 | 0.953 | 0.969 | 0.974 | 0.983 | 0.986 |

Table 7. Comparison results of the proposed method and Reference [16] in the 1080p video set.

| $r_1$ \ $r_2$ (Mbps) | 3, Ref [16] | 3, Proposed | 4, Ref [16] | 4, Proposed | 5, Ref [16] | 5, Proposed |
|---|---|---|---|---|---|---|
| 3 | 0.984 | 0.997 | 0.978 | 0.99 | 0.986 | 0.993 |
| 4 | 0.98 | 0.991 | 0.98 | 0.989 | 0.98 | 0.992 |
| 5 | 0.974 | 0.985 | 0.977 | 0.983 | 0.978 | 0.989 |

Table 8. Classification accuracy of the proposed method and Reference [16] when deleting 30 frames in the 1080p video set.

| $r_1$ \ $r_2$ (Mbps) | 3, Ref [16] | 3, Proposed | 4, Ref [16] | 4, Proposed | 5, Ref [16] | 5, Proposed |
|---|---|---|---|---|---|---|
| 3 | 0.984 | 0.994 | 0.976 | 0.991 | 0.98 | 0.994 |
| 4 | 0.981 | 0.994 | 0.979 | 0.99 | 0.98 | 0.99 |
| 5 | 0.971 | 0.987 | 0.987 | 0.99 | 0.975 | 0.982 |

Table 9. Classification accuracy of the proposed method and Reference [16] in the 1080p GOP4-GOP8 video set.

| $r_1$ \ $r_2$ (Mbps) | 3, Ref [16] | 3, Proposed | 4, Ref [16] | 4, Proposed | 5, Ref [16] | 5, Proposed |
|---|---|---|---|---|---|---|
| 3 | 0.991 | 0.997 | 0.997 | 1 | 0.999 | 1 |
| 4 | 0.989 | 0.995 | 0.991 | 1 | 0.999 | 1 |
| 5 | 0.982 | 0.987 | 0.992 | 1 | 0.998 | 1 |

Table 10. Classification accuracy of the proposed method and Reference [16] in the 1080p GOP4-GOP6 video set.

| $r_1$ \ $r_2$ (Mbps) | 3, Ref [16] | 3, Proposed | 4, Ref [16] | 4, Proposed | 5, Ref [16] | 5, Proposed |
|---|---|---|---|---|---|---|
| 3 | 0.991 | 0.995 | 0.985 | 0.989 | 0.985 | 0.987 |
| 4 | 0.985 | 0.988 | 0.982 | 0.988 | 0.985 | 0.989 |
| 5 | 0.987 | 0.997 | 0.99 | 0.994 | 0.986 | 0.99 |