Fake Bitrate Detection of HEVC Videos Based on Prediction Process

Abstract: In order to gain fraudulent click-through rates, some merchants recompress a low-bitrate video into a high-bitrate one without improving the video quality. This behavior deceives viewers and wastes network resources. Therefore, a stable algorithm that detects fake bitrate videos is urgently needed. High-Efficiency Video Coding (HEVC) is a worldwide popular video coding standard. Hence, in this paper, a robust algorithm is proposed to detect HEVC fake bitrate videos. Firstly, five effective feature sets are extracted from the prediction process of HEVC, including the Coding Unit partitioning modes of I-pictures/P-pictures, the Prediction Unit partitioning modes of I-pictures/P-pictures, and the Intra Prediction Modes of I-pictures. Secondly, feature concatenation is adopted to enhance the expressiveness and improve the effectiveness of the features. Finally, five single feature sets and three concatenated feature sets are separately sent to the support vector machine for modeling and testing. The performance of the proposed algorithm is compared with state-of-the-art algorithms on HEVC videos of various resolutions and fake bitrates. The results show that the proposed algorithm can not only better detect HEVC fake bitrate videos, but also has strong robustness against frame deletion, copy-paste, and shifted Group of Picture structure attacks.


Introduction
Digital video has become an indispensable part of our daily lives. According to the statistics, about 65,000 videos are uploaded to YouTube every day [1]. However, falsified videos have been found to sneak in among them [2], which can cause serious moral, ethical, and legal problems. Therefore, it is essential to verify the authenticity of videos before they can be trusted. One video tampering method is to up-convert the bitrate of a video without introducing any additional information about the video content. Since this operation does not improve the quality of the video, we call the claimed high bitrate a 'fake bitrate'. Abused fake bitrate videos not only mislead viewers, but also lead to a big waste of storage space. Hence, it is necessary to propose an effective algorithm to detect fake bitrate videos.
In the process of making a fake bitrate video, the encoded video is decompressed before up-converting the bitrate and then recompressed after that. The re-encoding process may differ from that of the original video in terms of video coding format and/or video coding parameters. Hence, a fake bitrate video is compressed at least twice. Therefore, detecting whether a video has been recompressed is a key step in detecting fake bitrate videos. However, very limited work has been reported on detecting High-Efficiency Video Coding (HEVC) recompressed videos. In summary, existing recompression detection methods can be divided into two categories: transformation-process-based methods, in which the Discrete Cosine Transform (DCT) is typically adopted, and prediction-process-based methods. In addition, the robustness of the proposed method is considered, and experimental results show that it has strong robustness against frame deletion, copy-paste, and shifted GOP structure attacks.
The rest of the paper is organized as follows. Section 2 briefly reviews the prediction process in HEVC, and Section 3 analyzes several effective prediction features under different fake bitrates. Experimental results are shown in Section 4, and Section 5 concludes the paper.

Basics of Prediction Process in HEVC
In 2013, the HEVC standard was jointly ratified and published by ITU-T and ISO/IEC [19,20]. The video coding layer of HEVC employs the same hybrid approach (inter/intra prediction and 2-D transform coding) used in all video compression standards since H.261 [21]. The coding framework of HEVC is shown in Figure 1. It mainly includes the bitrate control module, the transformation and quantization module, the prediction module, the entropy coding module, and the reconstruction module. It can be seen that the prediction module is a very important part of the coding process and is closely related to each of the other modules. Prediction coding in HEVC uses the spatial and temporal correlation of the image signal to predict the current encoding pixel from its reconstructed pixels. There are two kinds of video signal prediction methods in HEVC: intra prediction and inter prediction. We use Figure 2 to describe the HEVC prediction process visually. In the prediction process of HEVC, each picture first performs CU and PU partitioning in a quadtree structure, and then selects the optimal IPM in each PU. The HEVC standard stipulates that an I-picture can only adopt intra prediction, while a P-picture can adopt either intra or inter prediction. Using the quadtree partitioning method, the coding tree unit (CTU) is divided into CUs, and then the PU partitioning modes in each CU block are determined.
The 64 × 64 luma component of a CTU can be split into multiple CUs with sizes from 8 × 8 to 64 × 64 (Figure 3a). The partitioning can be described by a quadtree structure, as shown in Figure 3b, where the numbers indicate the indexes of the CUs in Figure 3a. Regardless of whether intra prediction or inter prediction is used, there are only four types of CU partitioning modes. The index of each CU partitioning mode is shown in Table 1. The PU partitioning modes in intra prediction and inter prediction are shown in Figure 4. A CU can be symmetrically divided into one or four prediction blocks (PBs) for intra prediction, while in inter prediction, a CU can be divided into symmetric or asymmetric PBs. There are 5 PU partitioning modes in intra prediction and 25 PU partitioning modes in inter prediction, and their indexes are listed in Tables 2 and 3.
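The quadtree partitioning described above can be sketched as follows. This is a minimal illustration, not the encoder's actual data structures: it only captures the relation between quadtree depth and CU luma size (64, 32, 16, 8 for depths 0 to 3) and how one split step produces four child CUs.

```python
def cu_size(depth: int) -> int:
    """CU luma size at a given quadtree depth: 64, 32, 16, 8 for d = 0..3."""
    assert depth in (0, 1, 2, 3)
    return 64 >> depth

def split_cu(x, y, depth):
    """Split the CU at (x, y) into its four quadtree children at the next depth."""
    half = cu_size(depth) // 2
    return [(x, y, depth + 1), (x + half, y, depth + 1),
            (x, y + half, depth + 1), (x + half, y + half, depth + 1)]

# A 64 x 64 CTU split once gives four 32 x 32 CUs:
children = split_cu(0, 0, 0)
```

In the real encoder, whether to split further at each node is decided by rate-distortion optimization; here the split is applied unconditionally for illustration.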
In the intra prediction process, the optimal IPM is selected for each PU. HEVC defines 35 IPMs, which are the DC mode, 33 angular modes, and the planar mode, as shown in Figure 5. The increased number of intra prediction angles makes intra prediction more accurate, thereby reducing the spatial redundancy of the video more effectively.
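As a quick sketch of the 35 IPMs just described: in the HEVC standard, index 0 is the planar mode, index 1 is the DC mode, and indexes 2 to 34 are the 33 angular modes. The helper below simply maps a mode index to its category.

```python
def ipm_category(mode: int) -> str:
    """Map an HEVC intra prediction mode index to its category."""
    if mode == 0:
        return "planar"
    if mode == 1:
        return "DC"
    if 2 <= mode <= 34:
        return "angular"
    raise ValueError("HEVC defines only IPM indices 0-34")

# The 33 angular modes occupy indices 2..34:
angular_modes = [m for m in range(35) if ipm_category(m) == "angular"]
```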

Prediction Features in HEVC Videos
In this paper, we focus on proposing an effective algorithm to distinguish recompressed HEVC videos with fake bitrate from original videos. In the HEVC video recompression process, reconstruction error and quantization error are generated in the reconstruction process and the quantization process, respectively. These two kinds of errors make the decoded video lose a part of its content information, which further influences the prediction variables: I-CU partitioning modes, I-PU partitioning modes, I-IPM, P-CU partitioning modes, P-PU partitioning modes, and P-IPM. The theoretical model and a case study of the influence of these irreversible coding errors on the prediction variables are described in detail in this section.

Theoretical Analysis and Modeling
Consider an original YUV (luma and chroma) sequence V (V = {F_n | n = 1, 2, · · · , N}), where F_n denotes the nth frame and N is the total number of video frames. Given a bitrate r, the CU partitioning modes, PU partitioning modes, and IPMs in I-pictures and P-pictures are successively determined. Let ρ(·) represent the bit allocation process of the rate control module; the amount of bits allocated to the nth frame can be written as b_n^(r) = ρ(n; V; r). Please note that in this paper, a picture contains only one slice. The intra prediction process π(·) selects the optimal splitting of the CTU into CUs, the optimal partitioning of each CU into PUs, and the best IPM in each PU to obtain the smallest rate-distortion value. Let k and K_n represent the kth CU in the nth frame and the total number of CUs in the nth frame, respectively. The CU partitioning sequence with partitioning depth d in the nth frame can be denoted as {CU_{n,k,d} | k = 0, 1, 2, . . . , K_n, d ∈ {0, 1, 2, 3}}, where d = 0, 1, 2, 3 means the CU size is 64 × 64, 32 × 32, 16 × 16, and 8 × 8, respectively. When the CU depth is 0 to 2, the PU partitioning mode is the same as the CU partitioning mode. When the CU depth is 3, the PU can be either 8 × 8 or 4 × 4. Therefore, the PU partitioning sequence in the nth frame can be denoted accordingly, and IPM_{n,k,d,i,j} denotes an IPM with index j of the ith PU in the kth CU with depth d in the nth frame.
In order to demonstrate the subscripts more clearly, we draw a CTU with size 64 × 64 in the nth frame as an example. The CU sequence of the CTU is shown in Figure 6a, and the PU sequence of the 32 × 32 CU in the upper left corner of Figure 6a is shown in Figure 6b. The index of each CU is counted from top to bottom and from left to right. The sequences of CU partitioning modes, PU partitioning modes, and IPMs in the intra prediction process can be represented as Equation (1). We can see that the CU partitioning modes, PU partitioning modes, and IPMs in intra prediction are mainly affected by the picture content and the bits allocated to the frame.

For the inter prediction process f(·), the partition strategy is similar to intra prediction. The CU partitioning sequence with depth d in the nth frame can be represented as {CU_{n,k,d} | k = 0, 1, 2, . . . , K_n, d ∈ {0, 1, 2, 3}}. Different from the case in intra prediction, apart from symmetric partitioning modes, inter prediction also has asymmetric PU partitioning modes, as shown in Figure 4. Therefore, the inter PU partitioning sequence in the nth frame can be represented as {PU_{n,k,d,i}}, where i and I represent the ith PU in the kth CU and the total number of PUs contained in the kth CU, respectively. When the size of the kth CU is 64 × 64, 32 × 32, or 16 × 16 (d = 0, 1, or 2), the kth CU can be partitioned symmetrically or asymmetrically, so I = 1 or 2. When d = 3, the partitioning is the same as in intra prediction. No IPM exists in inter prediction. Thus, the CU partitioning sequence and PU partitioning sequence in the inter prediction process can be represented as Equation (2), where C_n is the reference frame of F_n. That is to say, the CU partitioning modes and PU partitioning modes in inter prediction are determined not only by the content of the picture and the bits allocated to the frame, but also by the content of the reference frame.

We use the P-PU partitioning sequence as an example to describe the difference between an HEVC recompressed video with fake bitrate and a single-compressed video. As shown in Figure 2, a P-picture can use either intra prediction or inter prediction. That is to say, the P-PU partitioning sequence in an HEVC video, denoted as PPU^(r), is composed of the PU partitioning sequence obtained by intra prediction and that obtained by inter prediction. Therefore, the P-PU partitioning sequence PPU^(r) can be expressed as Equation (3). Assume that the first and second compression bitrates of the recompressed video are r_1 and r_2 (r_1 − r_2 for abbreviation), respectively. Here, we only consider the fake bitrate recompression situation, that is, bitrate up-converting, so r_1 < r_2. In the second compression, the amount of bits allocated to the nth P-frame can be written as Equation (4), where V̂ (V̂ = {F̂_n | n = 1, 2, · · · , N}) represents the decoded YUV sequence and F̂_n is the decompressed picture of F_n. The P-PU partitioning sequence of the HEVC recompressed video is PPU^(r_1, r_2). For an HEVC single-compressed video with bitrate r_2, b_n^(r_2) denotes the amount of bits allocated to the nth P-picture, and PPU^(r_2) is the P-PU partitioning sequence of this single-compressed video.
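The composition of the P-PU partitioning sequence from its intra and inter parts, as just described, can be sketched as follows. This is a hypothetical illustration: `cus` is an assumed list of (prediction_type, pu_mode_index) pairs decoded from one P-picture, not an actual decoder output.

```python
def ppu_sequence(cus):
    """Collect the P-PU partitioning sequence of one P-picture as the union of
    the PU modes of intra-coded CUs and those of inter-coded CUs."""
    intra_pus = [mode for kind, mode in cus if kind == "intra"]
    inter_pus = [mode for kind, mode in cus if kind == "inter"]
    return intra_pus + inter_pus

# Hypothetical decoded CU list for one P-picture:
example = [("inter", 1), ("intra", 0), ("inter", 4)]
```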
We use D(·) to represent the difference between PPU^(r_1, r_2) and PPU^(r_2). According to Equations (1)–(7), we can obtain the difference between the P-PU partitioning sequence of an HEVC single-compressed video and that of its corresponding recompressed video with fake bitrate, as shown in Equation (8), where F̂_n and Ĉ_n are the decompressed versions of F_n and C_n, respectively.

D(PPU^(r_1, r_2), PPU^(r_2)) = D((π(F̂_n; ρ(n; {F̂_n | n = 1, 2, · · · , N}; r_2)) ∪ f(F̂_n; ρ(n; {F̂_n | n = 1, 2, · · · , N}; r_2); Ĉ_n)), (π(F_n; ρ(n; {F_n | n = 1, 2, · · · , N}; r_2)) ∪ f(F_n; ρ(n; {F_n | n = 1, 2, · · · , N}; r_2); C_n))) (8)

It can be concluded from Equation (8) that the main difference between PPU^(r_1, r_2) and PPU^(r_2) comes from the contents of F_n, F̂_n, C_n, and Ĉ_n. The relation between F̂_n and F_n can be derived as Equation (9), where E(F_n) and E(C_n) are the quantization errors of F_n and C_n under the given quantization step Q_p, respectively, and [·] denotes the rounding operation. Therefore, we can get F̂_n − F_n = E(F_n) − E(C_n), which means the difference between F̂_n and F_n is caused by quantization error and reconstruction error. The quantization error contains the rounding error and the truncation error in the quantization process. The reconstruction error is caused by the reference frame in the reconstruction process. Hence, the difference between the P-PU partitioning sequence of an HEVC double-compressed video with fake bitrate and that of a single-compressed video is mainly caused by quantization error and reconstruction error.
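A toy numeric illustration of the rounding-type quantization error discussed above (this is not the paper's exact model; the values and the simplified reconstruction are assumptions for illustration only): quantizing a value v with step Q_p loses the residue E(v) = v − Q_p · [v / Q_p], and a pixel reconstructed from a quantized reference plus a quantized residual therefore differs from the original by a combination of such errors.

```python
def quantize(v: float, qp: float) -> float:
    """Quantize v with step qp (rounding to the nearest reconstruction level)."""
    return qp * round(v / qp)

def quant_error(v: float, qp: float) -> float:
    """Rounding-type quantization error E(v) = v - qp * [v / qp]."""
    return v - quantize(v, qp)

qp = 8.0
f, c = 103.0, 61.0  # hypothetical pixel values standing in for F_n and C_n
# Simplified reconstruction: quantized reference plus quantized residual.
f_hat = quantize(c, qp) + quantize(f - c, qp)
# The reconstructed pixel deviates from the original by the accumulated errors.
diff = f_hat - f
```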
There are six kinds of prediction variables, as illustrated in Figure 2. Similar to the P-PU partitioning sequence PPU^(r), we can obtain the other prediction sequences: the I-CU partitioning sequence ICU^(r), the I-PU partitioning sequence IPU^(r), the I-IPM sequence IIPM^(r), the P-CU partitioning sequence PCU^(r), and the P-IPM sequence PIPM^(r). The differences between these prediction sequences of an HEVC recompressed video and those of a single-compressed video can be illustrated by Equations (10)–(13), respectively. Please note that the theoretical models of IIPM^(r) and PIPM^(r) are the same because IPMs only appear in the intra prediction process.

Feature Analysis and Example Description
To illustrate the difference between the prediction variables in a fake bitrate recompressed video and a single-compressed video, we extract the CU partitioning modes, PU partitioning modes, and IPMs of the same I-frame from a recompressed video with fake bitrate 100-400 Kbps and from the corresponding single-compressed video with bitrate 400 Kbps, as shown in Figure 7. The 16th picture (I-picture) with CU partitioning extracted from the double-compressed video with fake bitrate is shown in Figure 7a, and the same frame of the single-compressed video is shown in Figure 7c. We take the 32 × 32 block (shown in Figure 7b,d) surrounded by the red circle in the I-frame as an example for analysis. The block with PU partitioning modes in the double- and single-compressed videos is shown in Figure 7e,f, respectively, and Figure 7g,h exhibit the block with IPMs in the double- and single-compressed videos, respectively. Observing Figure 7, it can be found that the three prediction variables (I-CU partitioning modes, I-PU partitioning modes, I-IPM) in the same I-picture of the single-compressed video and the double-compressed video with fake bitrate are quite different. Therefore, these three prediction variables of I-pictures can be used as independent classification features for detecting fake bitrate recompressed videos. In addition, even within the same I-CU partitioning block, the I-PU partitioning mode in the single-compressed frame and the recompressed frame is not necessarily the same, and the same holds for the IPM, so these three prediction variables of I-pictures are interdependent. As we can see, I-IPM is based on the I-PU partitioning modes, and the I-PU partitioning modes are based on the I-CU partitioning modes; thus, the fusion method of concatenating these three single features can be used to enhance feature expression.
As the first compression bitrate gets closer to the second one (Figure 8a), the I-CU partitioning, I-PU partitioning, and I-IPM in the recompressed video blocks will change; the number of changes is shown in Table 4. Both visually and statistically, it can be seen that when the difference between the first bitrate and the second bitrate of the recompressed video becomes smaller, the difference between the prediction variables of the I-picture in the single-compressed video and the double-compressed video also becomes smaller, and the same holds for the prediction variables of the P-picture.

In a P-picture, each CU block traverses the various modes of inter prediction and intra prediction, and then selects the mode with the lowest rate-distortion cost as the final optimal prediction mode. Since the temporal redundancy in a video is generally larger than the spatial redundancy, the efficiency of inter prediction is usually higher than that of intra prediction. Therefore, most CU blocks in P-pictures select inter prediction instead of intra prediction. This phenomenon is ubiquitous in HEVC videos; for example, we randomly selected 9 P-pictures from an HEVC single-compressed video and separately counted the number of CU blocks adopting intra prediction (abbreviated as "intra-CUs") and the number of CU blocks adopting inter prediction (abbreviated as "inter-CUs") in each P-picture. The statistical histogram is shown in Figure 9, where the red bars represent the inter-CUs and the blue bars represent the intra-CUs. It can be seen that intra-CUs take only a tiny percentage of the total CUs and are far fewer than inter-CUs. Furthermore, there are 35 IPM choices for each PU block, which makes the count of each IPM in P-pictures too small to be used as a single classification feature. Therefore, we ignore the P-IPM and only adopt the other two prediction variables for P-pictures: P-CU partitioning modes and P-PU partitioning modes. Similarly, the CU partitioning modes and the PU partitioning modes of P-pictures are mutually dependent.

In the following, we use a line chart to show the difference between the five prediction variables in the HEVC double-compressed video and the single-compressed video, as shown in Figure 10. The bitrate of the HEVC single-compressed video is 400 Kbps (blue lines), and the corresponding recompressed video has a first compression bitrate r_1 = 100 Kbps and a second compression bitrate r_2 = 400 Kbps (red lines).
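The counting experiment described above can be sketched as follows. `p_pictures` is an assumed list of per-picture CU prediction-type lists (as would be parsed from a decoded HEVC stream); the counts shown are hypothetical.

```python
from collections import Counter

def count_cu_types(p_pictures):
    """Return (intra_count, inter_count) for each P-picture, i.e., the per-picture
    tallies plotted as the histogram in Figure 9."""
    stats = []
    for cus in p_pictures:
        c = Counter(cus)
        stats.append((c["intra"], c["inter"]))
    return stats

# Two hypothetical P-pictures, dominated by inter-coded CUs:
p_pictures = [["inter"] * 95 + ["intra"] * 5, ["inter"] * 98 + ["intra"] * 2]
stats = count_cu_types(p_pictures)
```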
The abscissa indicates the index of the mode used in the prediction variable, and the ordinate indicates the total number of times that mode appears in the video. It is clear that the difference between the red and blue lines in Figure 10a-d is obvious, which means the four prediction variables (I-CU partitioning modes, I-PU partitioning modes, P-CU partitioning modes, P-PU partitioning modes) can effectively distinguish HEVC single- and double-compressed videos. In Figure 10e, we find that the number of I-IPMs differs greatly only in the 1st, 2nd, 10th, 11th, 26th, and 27th intra prediction modes, and this phenomenon appears in most videos. Thus, we only use these six modes as the representatives of I-IPM.

The Proposed Target Features
From the above analysis, we know that the counts of the five prediction variables differ between double-compressed videos with fake bitrate and single-compressed ones. In order to make the characteristics of the prediction variables more universal, we use the probability matrix of each prediction variable as the classification feature. The probability set can be expressed as Equations (14) and (15), where f_i denotes the probability matrix of the ith prediction variable, and i = 1, 2, 3, 4, 5 represents the 4-dimensional I-CU partitioning modes, 5-dimensional I-PU partitioning modes, 6-dimensional I-IPM, 4-dimensional P-CU partitioning modes, and 25-dimensional P-PU partitioning modes, respectively. M_i represents the dimension of each prediction variable, so M_i = {4, 5, 6, 4, 25 | i = 1, 2, 3, 4, 5}. x_ij represents the count of the jth mode of the ith prediction variable in an HEVC video, and p_ij represents the probability of the jth mode of the ith prediction variable in the whole video. For example, for the I-CU partitioning modes, i = 1, M_1 = 4, and j = 1, 2, 3, 4. The counts of the I-CU partitioning modes in the HEVC video are x_1j, and their probabilities are expressed as p_1j = x_1j / Σ_{j=1}^{M_1} x_1j; then, the probability matrix of the I-CU partitioning modes is f_1. The probability matrices of the other prediction variables are obtained similarly. Finally, we obtain five single features: the probability matrix of I-CU partitioning modes (PMoICU) with dimension 4, the probability matrix of I-PU partitioning modes (PMoIPU) with dimension 5, the probability matrix of I-IPM (PMoIIPM) with dimension 6, the probability matrix of P-CU partitioning modes (PMoPCU) with dimension 4, and the probability matrix of P-PU partitioning modes (PMoPPU) with dimension 25.
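The feature construction above amounts to normalizing each vector of mode counts into a histogram with p_ij = x_ij / Σ_j x_ij. A minimal sketch, with hypothetical mode counts:

```python
def probability_matrix(counts):
    """Normalize a vector of mode counts x_ij into probabilities p_ij."""
    total = sum(counts)
    return [x / total for x in counts] if total else [0.0] * len(counts)

# Hypothetical counts of the four I-CU partitioning modes (i = 1, M_1 = 4):
x_icu = [120, 60, 40, 20]
pm_icu = probability_matrix(x_icu)  # the single feature PMoICU
```

The same function produces PMoIPU, PMoIIPM, PMoPCU, and PMoPPU from their respective count vectors of dimensions 5, 6, 4, and 25.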
According to Section 3.2, the CU partitioning modes, PU partitioning modes, and IPMs are interdependent, so a feature fusion method can enhance the expressiveness of the single features. Therefore, we concatenate these single features in three ways, called the Concatenation Feature of I-picture (CFoI), the Concatenation Feature of P-picture (CFoP), and the Full Concatenation Feature (FCF). CFoI, with dimension 15, is made up of PMoICU, PMoIPU, and PMoIIPM; CFoP, with dimension 29, is composed of PMoPCU and PMoPPU; and FCF, with dimension 44, is the concatenation of all single features.
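The concatenation scheme can be sketched directly; the placeholder vectors below only stand in for the five single features at the paper's stated dimensions (4, 5, 6, 4, 25).

```python
# Placeholder single features with the paper's dimensions:
pm_icu, pm_ipu, pm_iipm = [0.0] * 4, [0.0] * 5, [0.0] * 6
pm_pcu, pm_ppu = [0.0] * 4, [0.0] * 25

cfoi = pm_icu + pm_ipu + pm_iipm   # Concatenation Feature of I-picture, 15-dim
cfop = pm_pcu + pm_ppu             # Concatenation Feature of P-picture, 29-dim
fcf = cfoi + cfop                  # Full Concatenation Feature, 44-dim
```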

The Flow of the Proposed Algorithm
The flow of the proposed algorithm is shown in Figure 11. The classification feature is extracted from the HEVC video, and then the feature is sent to the SVM classifier for model training and testing to determine whether the video is a single-compressed video or a double-compressed video with fake bitrate. The process of feature extraction of the proposed algorithm is shown in Figure 12. Five prediction variables are extracted from the decoding stream of the HEVC video. Based on the whole HEVC video, probabilistic statistics are performed on the five prediction variables to obtain the five single features. Then, PMoICU, PMoIPU, and PMoIIPM are concatenated into CFoI, PMoPCU and PMoPPU are concatenated into CFoP, and finally, CFoI and CFoP are concatenated into the final classification feature FCF of the proposed algorithm.
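The train-then-classify flow above can be sketched end to end. To keep the sketch dependency-free, a toy nearest-centroid classifier stands in for the paper's LIBSVM PolySVC; the feature vectors and labels (0 = single-compressed, 1 = fake bitrate) are hypothetical.

```python
def train(features, labels):
    """Toy 'model training': average the feature vectors of each class."""
    centroids = {}
    for lbl in set(labels):
        rows = [f for f, l in zip(features, labels) if l == lbl]
        centroids[lbl] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, f):
    """Classify a feature vector by its nearest class centroid."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(centroids[lbl], f))

# Hypothetical 2-D slices of FCF features for four training videos:
X = [[0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]
y = [0, 0, 1, 1]
model = train(X, y)
pred = predict(model, [0.75, 0.25])
```

In the actual algorithm the classifier is LIBSVM with a polynomial kernel and the features are the 44-dimensional FCF vectors; only the overall extract-train-test structure is represented here.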

Experimental Results
In this section, the performance of the proposed method is investigated on HEVC single- and double-compressed videos in the QCIF (Quarter Common Interchange Format), CIF (Common Interchange Format), 720p, and 1080p video sets, whose resolutions are 176 × 144, 352 × 288, 1280 × 720, and 1920 × 1080, respectively. Each YUV sequence [22] is cut into several non-overlapping 100-frame subsequences, giving 36 QCIF videos, 43 CIF videos, 36 720p videos, and 32 1080p videos in the YUV420P pixel format. The GOP size is set to 4 with the IPPP structure in both single- and double-compressed videos, and rate control is enabled during encoding. Because this paper targets recompressed videos with fake bitrate, the second bitrate of every recompressed video in our video sets is greater than the first bitrate (the bitrate up-converting case); the cases of equal or decreasing bitrate are not considered. To guarantee the quality of the videos, for each double-compressed video the first compression bitrate r1 is selected to be lower than the recompression bitrate r2.

The accuracy rate (AR) is used as the criterion and calculated as AR = (TPR + TNR)/2, where TPR and TNR are the true positive rate and true negative rate, respectively. For the LIBSVM [23] classifier, the PolySVC kernel is selected and the optimal parameters are obtained by a round-robin search. The entire classification procedure is repeated 20 times and the average AR is taken as the final classification accuracy. For each video set, the ratio of videos in the training and testing sets is 5:1; videos are randomly assigned, and the training and testing sets do not overlap. Each single prediction feature is tested separately to verify its validity, and then the concatenation features CFoI, CFoP, and FCF are tested. The robustness of the proposed method against frame-deletion, copy-paste, and shifted GOP structure attacks is also discussed.
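The evaluation protocol above can be sketched as follows. The paper uses LIBSVM's PolySVC; this sketch reproduces only the AR criterion and the 5:1 random split on synthetic labels, and omits the classifier itself.

```python
import random

# Hedged sketch of the evaluation protocol: the AR = (TPR + TNR) / 2
# criterion and a non-overlapping 5:1 train/test split. Labels are
# synthetic; the SVM itself is not reproduced here.

def accuracy_rate(y_true, y_pred):
    """AR = (TPR + TNR) / 2 over binary labels (1 = fake bitrate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return (tp / pos + tn / neg) / 2

def split_videos(videos, seed=0):
    """Random non-overlapping 5:1 train/test split of one video set."""
    rng = random.Random(seed)
    shuffled = list(videos)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 5 // 6
    return shuffled[:cut], shuffled[cut:]

train, test = split_videos(range(36))   # e.g., the 36-video QCIF set
```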



Single Features of I-Picture
From Section 3, we know that the single features related to I-pictures are PMoICU, PMoIPU, and PMoIIPM, with dimensions of 4, 5, and 6, respectively. In this section, their classification accuracies on the four video sets are tested and shown in Tables 5-7. As we can see, although the dimensions of these three single features are low, they are still effective in detecting recompression with fake bitrate.

Table 8. Classification accuracy of probability matrix of P-CU partitioning modes (PMoPCU) on four kinds of video sets (in percentage).

Table 8 layout: panels (a) QCIF, (b) CIF, (c) 720p, and (d) 1080p video sets, each listing r1 − r2 (bps) against classification accuracy.

Table 9. Classification accuracy of probability matrix of P-PU partitioning modes (PMoPPU) on four kinds of video sets (in percentage).

The statistics in Tables 5-9 show that the five single features proposed in this paper can effectively detect HEVC fake bitrate videos on both the low-resolution QCIF and CIF video sets and the High-Definition (HD) 720p and 1080p video sets.

The Concatenation Features
According to Section 3.2, concatenation can enhance the expressiveness of the classification features. The classification accuracies of CFoI and CFoP are shown in Tables 10 and 11, respectively. All the classification accuracies are above 80% for both CFoI and CFoP, and even reach 100% in some cases. Comparing Table 10 with Tables 5-7, the classification accuracy of CFoI is, on average, 17.6%, 11.4%, and 4.7% higher than that of the single features PMoICU, PMoIPU, and PMoIIPM, respectively. Similarly, comparing CFoP with the single features PMoPCU and PMoPPU, the average increments are 12.2% and 1.1%, respectively. Obviously, the classification accuracy is improved considerably by feature concatenation. Finally, the detection results of the final classification feature of the proposed method, FCF, are shown in Table 12. All the classification accuracies of FCF exceed 92% on the four video sets, and on the HD video sets (720p and 1080p) they are all above 95%.

Next, we investigate the relationship between the classification accuracy and the bitrate difference. On the CIF video set, we compare the classification accuracies at three bitrate pairs, 100-400 Kbps, 200-400 Kbps, and 300-400 Kbps, as shown in Figure 13a; the accuracies at the bitrate pairs 10-40 Mbps, 20-40 Mbps, and 30-40 Mbps on the 720p video set are shown in Figure 13b. Whether on the low-resolution or the high-resolution video set, the smaller the difference between the first and second bitrates of the recompressed video, the lower the classification accuracy of each feature. This is because, in this case, the difference between the prediction variables of the single-compressed and recompressed videos becomes smaller, as analyzed in Section 3.2.
Regarding the double-compression detection of HEVC videos, some algorithms target recompression with the same quantization parameters, e.g., reference [15]; algorithms such as references [6-9] detect recompression with different quantization parameters; reference [16] detects recompression with different GOP structures; and references [17,18] detect fake bitrate videos. To better demonstrate the performance of the proposed method, we compare it with the previous excellent algorithms [8,9,16-18] in these different respects. The feature extraction of each previous algorithm is implemented with its authors' own code, and the SVM classifier is set up as described above. Since references [16-18] analyzed the robustness of their detection algorithms, the comparison with these three algorithms is carried out in the subsequent robustness analysis.

The team of Ningbo University proposed three classical HEVC recompressed-video detection algorithms around 2015, of which reference [8] is one. Reference [9] is the latest HEVC recompression detection algorithm, published in 2018. The classification accuracies of the proposed method and the excellent references [8,9] on the QCIF video set are shown in Table 13. When r1 − r2 equals 100-200 Kbps, the classification accuracy of FCF is 11.4% and 5.8% higher than that of references [8] and [9], respectively; the classification accuracy of FCF is also higher than that of references [8,9] in the other cases.
Therefore, we can conclude that the proposed method achieves higher classification accuracy in HEVC fake bitrate video detection than both the classical algorithm [8] and the latest algorithm [9]. Furthermore, we compare both the classification accuracy and several kinds of robustness among the proposed method and the previous works [16-18] in the next three sections. Since most of the previous works only tested robustness to one specific attack on one specific resolution video set, to maintain fairness we compare the proposed method with each previous work under the same attack and on the same-resolution video set.

Robustness to Frame-Deletion
Frame-deletion is a widely used technique in digital video tampering, so we construct two frame-deletion video sets and test the robustness of the proposed method against frame deletion in this section. Firstly, we compress the videos at bitrate r1 and decompress them. Then, we delete the 30th-59th frames from each decompressed video and recompress each video at bitrate r2 (r1 < r2), obtaining the frame-deleted videos. Table 14 presents the classification accuracy of our proposed feature FCF for detecting frame-deleted videos on the 1080p video set: all the classification accuracies of FCF reach 100% in identifying recompressed frame-deleted videos with fake bitrate. From Section 3, we know that the difference in prediction variables between single-compressed and recompressed videos is caused by the difference between F̂_n and F_n. For frame-deleted videos, besides the difference between F̂_n and F_n, the frame deletion reduces the number of video frames and influences the bitrate allocated to each picture, which further changes the prediction variables. Thus, the recompressed frame-deleted video exhibits more significant changes in FCF than the fake bitrate video without frame deletion. Reference [17] reported its experimental results on the QCIF recompressed video set and the QCIF frame-deletion video set, so we compare it with the proposed method on these two video sets; the comparison results are shown in Table 15. The accuracies of FCF are as high as those of reference [17], except when the bitrates are 200-300 Kbps and 300-400 Kbps, but the classification accuracies of the proposed method on the QCIF frame-deleted video set are all 100%, much higher than those of reference [17]. Therefore, the proposed method maintains high fake bitrate detection accuracy and has stronger robustness against the frame-deletion attack.
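The construction of the frame-deletion test set can be sketched as follows; the compression and recompression steps at r1 < r2 are outside the scope of this snippet, which only models the removal of the 30th-59th frames (1-based) from a 100-frame decoded video.

```python
# Sketch of the frame-deletion attack used to build the test set: 30
# frames (the 30th through 59th, 1-based) are removed from each 100-frame
# decompressed video before it is recompressed at the fake bitrate r2.

def delete_frames(frames, first=30, last=59):
    """Drop the first..last frames, 1-based and inclusive, as in the paper."""
    return frames[:first - 1] + frames[last:]

original = list(range(1, 101))   # frame numbers 1..100 of a decoded video
tampered = delete_frames(original)
assert len(tampered) == 70       # 30 frames removed
```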

Robustness to Shifted GOP Structure
GOP structure is an important setting in encoding videos, and it influences the CU partitioning modes, PU partitioning modes, and IPM. Therefore, we test the robustness of the proposed method against the shifted GOP structure attack. Reference [16] is specifically proposed for recompression detection with different GOP structures; to facilitate the comparison, we conduct the experiments on the 1080p video set, the same as reference [16]. The 1080p single-compressed videos are encoded with GOP structure IPPPPPPP at the fake bitrate r2. The recompressed 1080p video set with shifted GOP structure is obtained as follows: after decompressing single-compressed 1080p videos encoded with GOP structure IPPP at bitrate r1, we recompress them with GOP structure IPPPPPPP at the fake bitrate r2 (r1 < r2). The classification accuracies of the proposed method and reference [16] on the unshifted and shifted GOP structure video sets are shown in Table 16. Whether on the unshifted or the shifted GOP structure video set, the accuracies of the proposed method are all much higher than those of reference [16]. Therefore, compared with reference [16], the proposed method performs better both in fake bitrate detection and in resisting the shifted GOP structure attack.
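A small sketch of why shifting the GOP structure perturbs the prediction statistics: with GOP size 4 (IPPP) in the first compression and 8 (IPPPPPPP) in the recompression, some pictures coded as I-pictures in the first pass are re-coded as P-pictures. The function below is illustrative, not from the paper, and ignores rate control.

```python
# Picture-type sequences under the two GOP structures compared in the
# shifted-GOP experiment (illustrative model: 'I' starts every GOP).

def picture_types(num_frames, gop_size):
    """Assign 'I' to the first picture of every GOP and 'P' otherwise."""
    return ['I' if i % gop_size == 0 else 'P' for i in range(num_frames)]

first = picture_types(16, 4)    # first compression:  IPPP
second = picture_types(16, 8)   # recompression:      IPPPPPPP

# Pictures intra-coded in the first pass but re-coded as P-pictures in the
# recompression -- their prediction variables change accordingly.
reencoded_as_p = [i for i in range(16) if first[i] == 'I' and second[i] == 'P']
```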

Robustness to Copy-Paste Tampering
Copy-paste is also a common tampering operation on digital videos, so we test the robustness of the proposed method against copy-paste tampering. A copy-paste video set is constructed from the 1080p video set as follows: after compressing each video at bitrate r1 and decompressing it, we copy a region of the first frame and paste it into the 30th-59th frames, and then recompress the video at the fake bitrate r2 (r1 < r2). The copied region occupies 30% of the first frame.
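The copy-paste tampering can be sketched as follows, with frames modelled as 2-D lists of pixel values. The region size and placement here are assumptions, chosen only so that the pasted region covers 30% of the frame area as in the paper.

```python
# Minimal sketch of the copy-paste tampering: a block of the first decoded
# frame is pasted into a later frame before recompression. Region size and
# position are illustrative assumptions (5 x 6 px = 30% of a 10 x 10 frame).

def paste_region(src_frame, dst_frame, top, left, height, width):
    """Return dst_frame with a height x width block copied from src_frame."""
    out = [row[:] for row in dst_frame]
    for r in range(height):
        out[top + r][left:left + width] = src_frame[top + r][left:left + width]
    return out

H, W = 10, 10
src = [[1] * W for _ in range(H)]   # first decoded frame (all ones)
dst = [[0] * W for _ in range(H)]   # one of the 30th-59th frames
tampered = paste_region(src, dst, 2, 2, 5, 6)   # 5*6 = 30 px = 30% of 100
```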
Our previous work [18] is the latest paper on HEVC recompression detection, published in 2018; it used only one prediction variable, the PU partitioning modes of P-pictures. However, in the prediction process of HEVC, the layer above PU is CU and the layer below PU is IPM. Reference [18] therefore ignored the CU partitioning modes, which better describe the overall complexity of an image block, and the IPM, which contains the most detailed information about the video content; more importantly, it ignored the information of I-pictures, which is vital for HEVC videos. By comprehensively analyzing the prediction process of HEVC, the proposed method instead extracts the I-CU partitioning modes, I-PU partitioning modes, I-IPM, P-CU partitioning modes, and P-PU partitioning modes simultaneously. Each prediction variable has different expressiveness for different video content: for example, IPM is more accurate for texture-complex regions, while CU is more suitable for smooth regions, and almost every video contains both smooth and complex regions. We therefore use the fusion method to enhance feature expressiveness and further improve the classification accuracy of the proposed method. In the following, we comprehensively compare our method with reference [18] in terms of computational complexity, classification accuracy on the recompressed video sets, and robustness against different video attacks.
In the proposed method, the feature extraction time and the SVM training time are the main costs, so they are used to represent the computational complexity and compared with those of [18]. The feature dimension of the proposed method is 44, and that of reference [18] is 25. The feature extraction times and model training times on the four resolutions are shown in Table 17: because the feature dimension of the proposed method is slightly higher, its time complexity is slightly higher. Reference [18] has strong robustness against copy-paste, frame-deletion, and shifted GOP structure attacks. To better demonstrate the superiority of the proposed method, we compare it with reference [18] on the four kinds of 1080p video sets, as shown in Table 18. Whether on the recompressed video sets or on the three kinds of attacked video sets, the classification accuracies of the proposed method are all higher than those of reference [18], which illustrates that the proposed method not only improves the accuracy of detecting HEVC fake bitrate recompressed videos, but also improves the robustness against frame-deletion, copy-paste, and shifted GOP structure attacks.

Conclusions
In this paper, we have proposed a novel method to detect double-compressed HEVC videos with fake bitrates. After systematically and comprehensively analyzing the prediction process of HEVC, five effective single features are extracted from it: the probability matrices of the I-CU partitioning modes, I-PU partitioning modes, I-IPM, P-CU partitioning modes, and P-PU partitioning modes. Considering the complementarity of these five single features, concatenation is adopted and three concatenation features are obtained: CFoI, CFoP, and FCF. Each kind of classification feature is sent to the SVM for training and testing. Experimental results show that both the five single features and the concatenation features are effective. Furthermore, compared with state-of-the-art works, the proposed method not only has higher classification accuracy for detecting HEVC recompressed videos with fake bitrates, but also has stronger robustness against frame-deletion, copy-paste, and shifted GOP structure attacks. In the future, we will further study other HEVC coding modules and look for more effective classification features.