Detection of Double-Compressed H.264/AVC Video Incorporating the Features of the String of Data Bits and Skip Macroblocks

Today's H.264/AVC coded videos offer high quality and a high data-compression ratio. They also have strong fault tolerance and good network adaptability, and they have been widely applied on the Internet. With the popularity of powerful and easy-to-use video editing software, digital videos can be tampered with in various ways. Because editing a compressed video generally requires recompressing it, detecting double compression in an H.264/AVC video can be used as a first step in video-tampering forensics. This paper proposes a simple, but effective, double-compression detection method that analyzes the periodic features of the string of data bits (SODBs) and the skip macroblocks (S-MBs) for all I-frames and P-frames in a double-compressed H.264/AVC video. For a given suspicious video, the SODBs and S-MBs are extracted for each frame. Both features are then incorporated to generate one enhanced feature that represents the periodic artifact of the double-compressed video. Finally, a time-domain analysis is conducted to detect the periodicity of the features. The primary Group of Pictures (GOP) size is estimated based on an exhaustive strategy. The experimental results demonstrate the efficacy of the proposed method.


Introduction
The rapid development of video-compression technology has made it possible to use digital video technology in many different fields (e.g., digital TV broadcasts, video conferences, digital video surveillance). With the automatic manufacturing of cheap digital devices, digital cameras are widely used in daily life. In addition, due to the implementation of high-speed Internet and the popularity of various video-sharing sites (e.g., YouTube), more and more users are uploading videos to the Internet. Digital videos may have undergone editing operations throughout the spreading process. Users may enhance the quality of a video after shooting it. After downloading a video from a site, a user can perform a variety of video transformations: copying a video scene, zooming in or out, adjusting the video color and contrast, and inserting subtitles.
Research to identify the authenticity of multimedia (e.g., images, videos, audios) has attracted greater attention in the past few years. At first, tampering-detection research mainly focused on images, and the existing techniques for image forgery detection can be roughly grouped into three categories: image forensic-hashing techniques [1-3], image fragile-watermarking techniques [4,5], and image passive-forensic techniques [6-8]. Although methods using the former two techniques can provide higher detection accuracy, their main disadvantage is that the hash information needs to be extracted in advance and the watermark information needs to be embedded during the image-generating process. With passive-forensic techniques, however, the inspector can authenticate the image without any prior knowledge; in other words, passive-forensic methods are suitable for a wider range of practical applications. Inspired by image forgery-detection applications, forgery-detection research has quickly expanded to other media formats. To detect the tampering history of digital videos, great efforts have been invested in protecting the content of videos with digital-watermarking techniques [9]. As with image forgery detection, the main disadvantage of video-watermarking techniques is that the watermark information needs to be embedded during the video-recording process. Unlike watermarking techniques, passive video-forensic techniques verify the authenticity of videos without adding any extra information during the generation and spread processes. Therefore, video-forensic techniques have become the primary research direction of copyright protection in recent years [10,11].
In most video-tampering cases, the existing video-editing tools cannot directly operate on compressed video. As such, the process of editing a video is divided into three steps: decompressing the input compressed-video sequence, editing the content of the decompressed sequence, and recompressing the edited video. In other words, the digital videos on the network are likely to have been compressed at least twice. Once a video sequence is decompressed into the pixel domain and a secondary compression is conducted, some of the primary encoding information is lost. Hence, the previous compression history cannot easily be obtained by analyzing the last encoded stream alone. In the digital video-compression process, different coding standards (e.g., conventional MPEG-2 and MPEG-4 encoders, and the latest H.264/AVC and HEVC [12] standards) can be selected according to the application scenarios.
In all video-coding standards, a Group of Pictures (GOP) is used as the encoding unit. The GOP includes three video frame types: intra-coded frame (I-frame), predictive frame (P-frame) and bi-directionally predictive frame (B-frame). According to the difference between the primary and secondary compressed GOP structures, the research on double-compressed video detection can be grouped into two categories: detecting double-compressed video with the same GOP structure and detecting double-compressed video with different GOP structures.
For double-compressed video with the same GOP structure, several detection methods based on different features have been proposed. By utilizing a convex pattern in the distribution of quantized DCT coefficients, Su and Xu [13] proposed an approach to effectively detect double MPEG-2 compression at various output bit rates. In [14], Wang and Farid utilized the time-domain motion residual sequence of P-frames to detect digital video-frame deletion or insertion operations. Based on the probability distribution of the first digits of the non-zero MPEG quantized alternating current (AC) coefficients, Chen and Shi [15] proposed a detection scheme to effectively detect doubly MPEG compressed videos for both the variable bit rate mode and constant bit rate mode. Combining the first digit distribution of quantized AC coefficients and the DCT coefficients, Sun et al. [16] proposed an approach with a serial support vector machine (SVM) architecture to estimate the original bit rate scale in a double-compressed video. Jiang et al. [17] proposed an approach to detect double MPEG-4 compression artifacts based on Markov statistics.
For double-compressed videos with different GOP structures, most of the previously mentioned methods fail to detect this kind of forgery. The reason may be that some original I-frames are coded as P-frames in the secondary compression and that these two kinds of frames use two different predictive-coding strategies. Since a fixed GOP is used, Wang and Farid [18] detected double compression with different GOPs by using the periodicity of the average prediction residual sequence. In [19], Aghamaleki and Behrad used the quantization error of the P-frame prediction residuals to detect double video compression and locate the deleted or inserted frames in the time domain. Stamm et al. [20] proposed an automatic double-compression detection method for spectral peak characteristics based on the average prediction residuals of the P-frames. Vazquez-Padin et al. [21] proposed a double-compression detection method based on the variation characteristics of the macroblock prediction type in the double-coded P-frames and estimated the primary GOP size of the to-be-detected videos. For a double-compressed video with intense foreground motion, He et al. [22] proposed a method to detect double MPEG-4 compression based on a local motion vector field analysis, conducted after a static background segmentation. More recently, Bestagini et al. [23] proposed a coding footprint to identify the secondary codec and estimated the GOP size used in the first encoding process. The method in [23] recodes the video sequence using the same codec and encoding parameters, which can generate characteristics similar to those of the former sequence.
The aforementioned double-compression detection methods are primarily based on the MPEG-4 compression standard. However, videos coded by the H.264/AVC standard are more prevalent in practice. Thus, in this paper, by analyzing the time-domain periodic characteristics of H.264/AVC videos, we propose a simple, but effective, double-compressed H.264/AVC video-detection method incorporating the periodic features of the string of data bits and skip macroblocks. The rest of this paper is organized as follows. Sections 2 and 3 introduce the motivation and concrete steps of the proposed method, respectively. Section 4 presents the experimental results and Section 5 concludes the paper.

Periodic Artifact in the P-Frame String of Data Bits
The original video sequence contains a large amount of data that needs to be stored and transmitted. In order to compress the original video, a variety of compression algorithms are used to reduce the redundancy in the video content. As the most prevalent video-compression standard, the H.264/AVC video-coding technique reduces the redundancy in both the space and time domains through the integration of intra- and inter-prediction coding. Similar to the MPEG-2 and MPEG-4 standards, the video frames in the H.264/AVC standard are divided into several consecutive GOPs. To suppress the channel noise and the error propagation caused by frame loss during the decoding process, each GOP is independently coded.
The GOP structure contains three types of frames: I-frame, P-frame, and B-frame. The I-frame, also known as the intra-frame coding frame, is the starting frame of each GOP. It does not rely on any other frame during the coding process. It should be noted that the H.264/AVC coding standard applies a new intra-prediction coding technology that can effectively reduce the string of data bits (SODBs) of the I-frames. The P-frame, also known as the forward predictive coding frame, executes compression coding by using a one-way motion compensation technique. In this way, only the residuals with respect to the I-frame or P-frame preceding the current coding frame, together with the motion compensation prediction values, are encoded. The B-frame, also known as the bi-directional predictive coding frame, has a similar encoding process to the P-frame, except that a two-way motion compensation technique is used to generate the residual of the current encoding frame.
The values of the orthogonally transformed coefficients and prediction errors are quantized during video-compression coding so that the output values are represented with fewer bits. Once the secondary compression process is conducted using another GOP structure, the encoding style of an original I-frame may be converted to that of a P-frame. Due to the use of different quantization steps for the different encoding types, the P-frames of the double-compressed video exhibit periodic statistical characteristics. Denote $F = \{F_t,\ t = 1, 2, \ldots, T\}$ as the original video frame set, where $T$ is the total frame number. Note that we only consider the I-frames and P-frames in each GOP. Figure 1 shows the schematic of the double compression for the case of conversion from an I-frame to a P-frame. As shown in Figure 1, during the primary compression, $F_{t-1}$ is the last frame of the $(i-1)$-th GOP and is encoded using the inter-frame coding mode, whereas $F_t$ is the starting frame of the $i$-th GOP. The quantization process of the primary compression on $F_t$ and $F_{t-1}$ can be expressed as in (1) and (2), where $Q_t^n$ stands for the $n$-th-time quantized value of the $t$-th frame, $Q_{\Delta_1}[\cdot]$ and $Q_{\Delta_2}[\cdot]$ denote the intra-frame and inter-frame quantization operations, respectively, $\Gamma$ represents the transformation coding operation, the superscript ^ represents the prediction operation during the intra-frame coding process, $\Delta_1$ and $\Delta_2$ denote the corresponding quantization steps, where the former is greater than the latter, and the superscript ~ stands for the inter-frame motion compensation prediction process, which is obtained from the current frame $F_{t-1}$ and its preceding decompressed frame, denoted by $F_{t-2}$.
After the single compression is conducted, $F_t$ and $F_{t-1}$ can be approximately restored as in (3) and (4). Once the video sequence undergoes a secondary compression, the restored frames are again treated as the input frames for the compression. As shown in Figure 1, both frames are likely to be designated as P-frames and coded in the inter-frame mode. Note that both frames are grouped into the same GOP in the secondary compression. Similar to (1) and (2), the secondary compression process for the $(t-1)$-th and $t$-th frames can be expressed as in (5) and (6). Note that, for (5), the $(t-1)$-th frame uses the same inter-frame prediction mode and the same reference frame during the primary and secondary compression processes. Thus, the predicted values after the two motion compensations are approximately equal. According to (5) and (6), a longer string of data bits is needed to represent the $t$-th frame than the $(t-1)$-th frame. Furthermore, there are two different types of double-compressed P-frames: the frame specified as a P-frame during both compression processes (referred to herein as the P-P-frame; e.g., the $(t-1)$-th frame) and the frame specified as an I-frame during the first compression process but converted to a P-frame in the secondary compression process (referred to herein as the I-P-frame; e.g., the $t$-th frame).
Figure 1. Schematic of the double compression for the case of conversion from an I-frame to a P-frame.
Note that the P-P-frame uses the same inter-frame coding mode during both compression processes. Moreover, the quantization parameters of the two compression processes are the same. For the I-P-frame, the intra-frame and inter-frame coding modes are applied during the primary and secondary compression processes, respectively. Therefore, taking $F_t$ as an example, the time-domain correlation between the reference and current frame is weak, relative to the P-P-frame. In other words, although the $(t-1)$-th and $t$-th frames are both encoded as P-frames in the secondary compression process, more data bits are needed to represent the $t$-th frame than the $(t-1)$-th frame, due to the weaker correlation. To verify this, Figure 2 illustrates an example. Here, the YUV sequence hall (all standard test sequences are downloaded from http://media.xiph.org/video/derf/) was doubly compressed with the first and second GOP lengths set at 12 and 15, respectively. Figure 2 shows the corresponding data bits for each I-frame and P-frame, where the blue solid line and red dashed line represent the data bits after the primary and secondary compressions, respectively. Figure 2 also shows that, for a double compression, besides the obvious peaks at the second I-frame indices, there are still some lower peaks located at the first I-frame indices, which have been encoded as P-frames during the secondary compression.
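To make the location of these lower peaks concrete, the following minimal Python sketch lists the frames that would be I-P-frames for a given pair of GOP sizes. It is an illustration only, under assumptions not stated in the text: both encoders use a fixed GOP, both start a GOP at the first frame, no frames are inserted or deleted between the two compressions, and frame indices are 0-based (so the paper's 73rd frame is index 72). The function name `ip_frame_indices` is hypothetical.

```python
def ip_frame_indices(total_frames, g1, g2):
    """Return 0-based indices of I-P-frames.

    A frame is an I-P-frame when it starts a GOP in the primary compression
    (index divisible by g1) but does not start a GOP in the secondary
    compression (index not divisible by g2).
    """
    return [t for t in range(total_frames) if t % g1 == 0 and t % g2 != 0]


# GOP sizes from the hall example: 12 (primary) and 15 (secondary).
print(ip_frame_indices(61, 12, 15))  # [12, 24, 36, 48]
```

Under these assumptions, indices 0 and 60 are missing from the result because those frames are I-frames in both compressions, so they show up among the high peaks rather than the lower ones.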
Symmetry 2017, 9, 313

Periodic Feature of Skip Macroblocks for Double-Compressed H.264/AVC Videos
In Section 2.1, we explained the periodic artifacts for the string of data bits of P-frames after a double compression with different GOP structures. This characteristic can be used as a significant clue to detect the double compression in most circumstances. However, some videos experience dramatic changes in the video content (e.g., the test sequence coastguard, due to the camera moving up and down during the shooting process). In such cases, the background changes dramatically and the periodic characteristic is no longer observed. This is shown in Figure 3, where the red solid stem denotes the 73rd frame, an I-P-frame that is now submerged in its surrounding values. In this example, the 75th frame, which is a P-P-frame, has more data bits than the 73rd frame. Thus, enhancing the periodic artifact for double-compressed videos with moving content is important for double-compression forgery detection.

To suppress the interference of scene changes (e.g., the 75th frame in Figure 3), we propose an I-P-frame feature enhancement method for H.264/AVC videos. During the encoding process of the H.264/AVC standard, each P-frame is permitted to have three types of macroblocks: intra-frame encoding macroblocks (I-MB), prediction-encoding macroblocks (P-MB) and skip macroblocks (S-MB). It is worth noting that an S-MB is a macroblock for which compression coding is skipped; only the macroblock type needs to be marked in the frame stream. Since the I-P-frames and their surrounding P-P-frames originally belonged to different GOPs in the primary compression process, a large number of macroblock types are converted during the I-P-frame coding. More specifically, S-MBs are converted into P-MBs for prediction coding. In addition, the number of I-MBs corresponding to the regions of more dramatic motion in the video frame increases accordingly.
For ease of comprehension, Figure 4 compares the numbers of all three macroblock types between the P-P-frame and the I-P-frame after the double-compression process. In Figure 4, we again take the sequence coastguard as an example. The primary and secondary GOP sizes were set at 12 and 25, respectively. As shown in Figure 4, the 72nd, 74th and 75th frames are P-P-frames and the 73rd frame is an I-P-frame. The lower panel in Figure 4 illustrates the macroblock types of the 72nd, 73rd, 74th and 75th frames in the double-compressed video. Here, the S-MB, I-MB (8 × 8 size), I-MB (4 × 4 or 16 × 16 size) and P-MB are denoted by yellow, orange, red and blue dots, respectively. It can be observed that the number of S-MBs for each P-P-frame is greater than that for the I-P-frame, even for some scene-switching frames (e.g., the 75th frame). For the I-P-frames (e.g., the 73rd frame in Figure 4), a large number of P-MBs are converted into I-MBs. At the same time, S-MBs are converted into P-MBs relative to the preceding frame. This artifact is caused by the conversion of the prediction-coding mode, from the I-frame to the P-frame.
In general, the quantization matrix and quality factor of the I-frame are different from those of the P-frame. The I-frame serves, directly or indirectly, as the reference frame for the subsequent frames yet to be coded (P-frames or B-frames). In addition, for a static background, if there is no residual information or motion vector, the encoder will use the S-MB to improve the coding efficiency. However, in the case of severe background changes or frequent scene switching, once the decompressed I-frame is encoded in the P-frame prediction coding mode during the secondary compression process, a slight change with respect to its reference frame is inevitably generated. Thus, a relative displacement is generated to obtain the prediction vector. The S-MB cannot be used directly, so it is converted to a P-MB for inter-frame prediction coding. In short, the number of S-MBs in the I-P-frame is smaller than that in the other P-P-frames.
Based on the observation of the macroblock artifacts for the I-P-frames, we enhance the periodic characteristics of double-compressed H.264/AVC videos by incorporating the skip macroblock features. In this paper, for the $t$-th frame, we use the ratio of the total number of macroblocks, denoted by $N_t$, and the number of S-MBs, denoted by $S_t$, to quantify this feature, as given in (7).

Proposed Method
The periodic artifacts of double-compressed H.264/AVC videos have been explained in Section 2. This section introduces the proposed method to detect the double-compression manipulation. In addition, the GOP size of the primary compression can be estimated using the proposed method. Figure 5 illustrates the three-step schematic diagram of the proposed method: coding parameter extraction, feature sequence calculation, and periodic characteristic analysis. More specifically, the string of data bits and the skip macroblock number for each I-frame and P-frame, and the GOP size of the to-be-detected video, can be extracted directly from the video stream. The conversion of the S-MBs in the video frame is calculated and the final feature sequence is generated by combining the features of the string of data bits and the S-MBs. Finally, the feature sequence is analyzed using the time-domain periodic analysis method in order to obtain the detection result and estimate the primary GOP size. Details for Steps 2 and 3 are presented in Sections 3.1 and 3.2, respectively.
Figure 5. Schematic diagram of the proposed method. Step 1, coding parameter extraction: the SODBs and S-MBs are extracted from each I-frame and P-frame of the input video. Step 2, feature sequence generation: the interference of the current I-frames is suppressed and the SODB and S-MB features are incorporated into the enhanced feature sequence. Step 3, periodic analysis of the enhanced feature sequence, which yields the output result.
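Step 1 is described only at a high level. As one possible illustration, and not the authors' tooling, the per-frame coded size and picture type can be read from the JSON report that ffprobe produces for a video stream. The sketch below only parses such a report. The function name and the sample values are hypothetical, the packet size in bytes is merely a proxy for the SODB length because it also covers header overhead, and S-MB counts are not available from this report and would need a bitstream-level parser.

```python
import json


def frame_sizes_from_ffprobe_json(report_text):
    """Parse an ffprobe JSON frame report into (picture types, sizes in bits).

    Assumes each frame entry carries the 'pict_type' and 'pkt_size' fields.
    """
    frames = json.loads(report_text)["frames"]
    pict_types = [frame["pict_type"] for frame in frames]
    # pkt_size is reported in bytes; convert to bits as a rough SODB proxy.
    size_bits = [8 * int(frame["pkt_size"]) for frame in frames]
    return pict_types, size_bits


# A report of this shape can be requested with (not executed here):
#   ffprobe -v error -select_streams v:0 \
#       -show_entries frame=pict_type,pkt_size -of json input.mp4
sample_report = (
    '{"frames": ['
    '{"pict_type": "I", "pkt_size": "9000"}, '
    '{"pict_type": "P", "pkt_size": "1200"}]}'
)
types, bits = frame_sizes_from_ffprobe_json(sample_report)
```

Keeping the parsing separate from the ffprobe call makes the sketch usable on a saved report without the tool installed.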

Generation of the Feature Sequence
The SODB and the macroblock types are first extracted for each frame. As demonstrated in Section 2.1, the original I-frame is likely to be converted to a P-frame for inter-frame prediction coding in the secondary compression process. Therefore, for a double-compressed video, the SODBs of the P-frames exhibit periodic peaks. This attribute is used as the characteristic for double-compression detection in the proposed method. To further expose the underlying periodic artifact, a pre-process is conducted by replacing the SODB value at each current I-frame by the average SODB of its preceding and next frames, as given in (8), where $G_2$ is the GOP size of the latest compression process, $T$ is the total frame number, and $D_t$ is the SODB of the $t$-th frame. By doing so, the interference of the extremely high SODBs of the current I-frames is suppressed. Taking the sequence hall as an example, as in Figure 2, a comparison of the single and double compression after the pre-process is shown in Figure 6. Figure 6 illustrates that the peaks at the current I-frames have been successfully suppressed.
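The pre-process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the current I-frames are taken to sit at 0-based indices 0, G2, 2·G2, and so on; an I-frame at either end of the sequence, which has only one neighbour, takes that neighbour's value; and the averages use the original, unmodified neighbour values. The function name `suppress_current_i_frames` is hypothetical.

```python
def suppress_current_i_frames(sodb, g2):
    """Replace the SODB of each current I-frame by the mean of its neighbours.

    sodb: per-frame SODB lengths (I-frames and P-frames only).
    g2:   GOP size of the latest (current) compression.
    """
    smoothed = list(sodb)
    last = len(sodb) - 1
    for t in range(0, len(sodb), g2):  # assumed I-frame positions
        neighbours = []
        if t > 0:
            neighbours.append(sodb[t - 1])
        if t < last:
            neighbours.append(sodb[t + 1])
        if neighbours:  # a one-frame sequence is left unchanged
            smoothed[t] = sum(neighbours) / len(neighbours)
    return smoothed


example = suppress_current_i_frames([900, 100, 120, 800, 140, 160], g2=3)
```

In this toy sequence the two I-frame values (900 and 800) are replaced by values at the level of the surrounding P-frames, which is the effect the text attributes to the pre-process.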
To further suppress the interference of scene switching, a P-frame skip macroblock feature is extracted according to (7). The periodic artifact in the SODBs is then enhanced by incorporating the skip macroblock feature, as given in (9), where $E_t$ is the enhanced data bits feature of the $t$-th frame, incorporating the S-MB feature. At this point, we have obtained the feature sequence $\{E_t,\ t = 1, 2, \ldots, T\}$, which will be used to expose the periodic artifact in Step 3.

Detection of Double Compression and Estimation of the Primary GOP Size
The feature sequence {Et, t = 1, 2, …, T} is used to conduct a time-domain analysis to detect the double compression and estimate the primary GOP size.A time-domain analysis method, similar to [22], is used to obtain the final result.Since the detector does not know the primary GOP size, the set containing all primary GOP size candidates is initially identified.It is reasonable to assume that the maximum value of the primary GOP size is limited and is much smaller than the length of the video.
Based on this assumption, the maximum candidate value of the primary GOP size, denoted by 1 max Ĝ , is set at: ( ) Thus, the set ˆ1 G = {2, 3, ..., 1 max Ĝ } contains all probable primary GOP sizes.The value of 1 is not included in the set ˆ1 G , because the I-P frame does not appear, in this case.Hence, it is not realistic for a practical H.264/AVC compression with GOP size being set at 1.
To further suppress the interference of scene switching, a P-frame skip macroblock feature is extracted according to (7). The periodic artifact in the SODBs is then enhanced by incorporating the skip macroblock feature, which yields $E_t$, the enhanced data-bits feature of the $t$-th frame. At this point, we have obtained the feature sequence $\{E_t, t = 1, 2, \ldots, T\}$, which will be used to expose the periodic artifact in Step 3.

Detection of Double Compression and Estimation of the Primary GOP Size
The feature sequence $\{E_t, t = 1, 2, \ldots, T\}$ is analyzed in the time domain to detect the double compression and estimate the primary GOP size. A time-domain analysis method similar to [22] is used to obtain the final result. Since the detector does not know the primary GOP size, the set containing all primary GOP size candidates is identified first. It is reasonable to assume that the maximum value of the primary GOP size is limited and is much smaller than the length of the video. Based on this assumption, the maximum candidate value of the primary GOP size, denoted by $\hat{G}_1^{\max}$, is set to $\hat{G}_1^{\max} = \min(150, T/10)$ (10). Thus, the candidate set $\hat{\mathcal{G}}_1 = \{2, 3, \ldots, \hat{G}_1^{\max}\}$ contains all probable primary GOP sizes. The value 1 is not included in $\hat{\mathcal{G}}_1$, because no I-P frame appears in this case. In addition, a GOP size of 1 is not realistic for practical H.264/AVC compression.
The function $\Lambda(\hat{G}_1)$ is then defined to evaluate the periodic characteristics present in the feature sequence $\{E_t, t = 1, 2, \ldots, T\}$ for each candidate $\hat{G}_1$ in the candidate set $\hat{\mathcal{G}}_1$. Considering that some frames may have been deleted before the secondary compression, each frame of a primary GOP may be intra-coded as a key frame in the secondary compression process. Thus, for each candidate $\hat{G}_1$, every frame position within a period of length $\hat{G}_1$ is assumed to be a possible original I-frame position, and all positions are traversed to obtain the final $\Lambda(\hat{G}_1)$.
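As a small illustration of the candidate set built from (10), the following sketch enumerates the primary GOP size candidates for a video of T frames. The function name `candidate_gop_sizes` is hypothetical, and rounding T/10 down to an integer is an assumption of this example, since the rounding is not stated in the text.

```python
def candidate_gop_sizes(total_frames, cap=150):
    """Return the candidate primary GOP sizes {2, 3, ..., G1_max}.

    G1_max = min(cap, T/10); the integer (floor) division is an assumption.
    """
    g1_max = min(cap, total_frames // 10)
    # The value 1 is excluded; the list is empty for very short videos.
    return list(range(2, g1_max + 1))
```

For the 300-frame sequences used in the experiments, this gives candidates 2 through 30; the cap of 150 only takes effect for videos of 1500 frames or more.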
After computing $\Lambda(\hat{G}_1)$ for all candidates $\hat{G}_1$, we seek the maximum and the second-largest value among all $\Lambda(\hat{G}_1)$ and compare their difference with a pre-defined threshold $T_\Lambda$. It is worth pointing out that the strategy for setting $T_\Lambda$ in the proposed method follows [22]: $T_\Lambda$ is sought by a receiver operating characteristic (ROC) curve analysis on a large number of samples with fixed parameters. If the difference is smaller than $T_\Lambda$, the video is identified as a single-compressed video; otherwise, it is identified as a double-compressed video, and its primary GOP size $G_1^{*}$ can be estimated as the candidate that maximizes $\Lambda$ over the candidate set: $G_1^{*} = \arg\max_{\hat{G}_1} \Lambda(\hat{G}_1)$ (12).
To better explain the basic flow of the proposed method, a pseudo-code implementation of double-compressed video detection and primary GOP size estimation is described in Algorithm 1, where max1( ) and max2( ) are the functions that seek the maximum and the second-largest value in the input data set.
Algorithm 1. Pseudo code of double-compressed video detection and primary GOP size estimation.
Input: video sequence $\{F_t, t = 1, 2, \ldots, T\}$, the last GOP size $G_2$, and threshold $T_\Lambda$
1: Count the string of data bits (SODB) of each frame $F_t$, denoted $D_t$
2: Count the total number of macroblocks and the number of skip macroblocks (S-MBs), denoted $N_t$ and $S_t$, respectively
3: for t = 1 to T do
4:
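The final decision step described above (compare the gap between the largest and second-largest periodicity scores with the threshold, then take the maximizing candidate) can be sketched as follows. This is only an illustration under stated assumptions: the function name `detect_and_estimate` is hypothetical, and the scores are assumed to have been computed beforehand by the periodicity function Λ, whose definition is not reproduced here.

```python
def detect_and_estimate(scores, threshold):
    """Decide single vs. double compression from precomputed Lambda scores.

    scores:    dict mapping each candidate GOP size to its Lambda value
               (assumed to be computed elsewhere).
    threshold: the pre-defined threshold T_Lambda.
    Returns (is_double_compressed, estimated_primary_gop_or_None).
    """
    if len(scores) < 2:
        raise ValueError("at least two candidates are needed")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_gop, max1), (_, max2) = ranked[0], ranked[1]
    # A gap smaller than the threshold means no dominant period was found.
    if max1 - max2 < threshold:
        return False, None
    return True, best_gop
```

With the made-up scores `{10: 5.0, 15: 1.0, 20: 4.5}`, a threshold of 1.0 yields a single-compression decision (the gap is 0.5), whereas a threshold of 0.3 yields a double-compression decision with an estimated primary GOP size of 10.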

Experimental Results
In this section, the efficacy of the proposed method is demonstrated on standard test sequences, which are described in Section 4.1.

Test Dataset
To verify the proposed method, we used a variety of video sequences encoded with different parameters. Table 1 lists all encoding parameters used in our dataset, including the primary compression codec $c_1$, the primary compression bit rate $R_1$, its corresponding GOP size $G_1$, the secondary compression codec $c_2$, the secondary compression bit rate $R_2$, and its corresponding GOP size $G_2$.
In the experiments, 11 CIF (resolution 352 × 288) YUV sequences were used as the original videos: akiyo, bowing, container, foreman, deadline, hall, coastguard, news, paris, sign-irene, and silent. Note that the YUV sequences in our dataset include both fixed-camera and moving-camera footage. All selected video sequences are shown in Figure 7.
To reduce the computational complexity, only the first 300 frames of each YUV sequence were compressed. Different bit rates and codecs were used in the compression process to generate the target videos. MPEG-2, MPEG-4, and H.264/AVC standard codecs were used during the primary compression process, whereas the H.264/AVC standard codec was fixed during the secondary compression. The primary bit rate $R_1$ and the secondary bit rate $R_2$ were each chosen from {300, 500, 700, 900, 1100} kbps. The primary and secondary GOP sizes $G_1$ and $G_2$ were chosen from {10, 15, 30, 40} and {9, 16, 33, 50}, respectively. Therefore, for each YUV sequence, 60 single-compressed videos and 1200 double-compressed videos can be generated from the different combinations of parameters.
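The per-sequence counts quoted above follow directly from the parameter grid, as the short enumeration below shows. The variable names are chosen for this example only; the parameter values are those listed in the text.

```python
from itertools import product

PRIMARY_CODECS = ("MPEG-2", "MPEG-4", "H.264/AVC")  # c1
BIT_RATES_KBPS = (300, 500, 700, 900, 1100)         # values for R1 and R2
PRIMARY_GOPS = (10, 15, 30, 40)                     # G1
SECONDARY_GOPS = (9, 16, 33, 50)                    # G2 (c2 is always H.264/AVC)

# One single-compressed video per (c1, R1, G1) combination.
single_settings = list(product(PRIMARY_CODECS, BIT_RATES_KBPS, PRIMARY_GOPS))

# Each single-compressed video is recompressed with every (R2, G2) pair.
double_settings = list(product(single_settings, BIT_RATES_KBPS, SECONDARY_GOPS))

print(len(single_settings), len(double_settings))
```

The enumeration gives 3 × 5 × 4 = 60 single-compressed and 60 × 5 × 4 = 1200 double-compressed settings per YUV sequence.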

Detection Results for the Double Compression
In this section, the proposed double-compression detection method is evaluated with different parameter settings, including bit rates and GOP sizes. Figure 8 illustrates an example of our detection results for the test sequence coastguard, where the primary and secondary GOP sizes were set at 10 and 16, respectively. It shows that, by using the enhanced feature of the string of data bits, the interference of frame 75 has been partially suppressed and the periodicity of all peak values is detected. We also compare the proposed method with other state-of-the-art methods, including He et al. [22], Stamm et al. [20], and Vazquez-Padin et al. [21].

The coding bit rate is a significant parameter for video compression. Accordingly, we first evaluated our method with different bit rate settings. For a fixed setting of ($R_1$, $R_2$), by varying the other parameters over all test videos, we can generate 11 (number of sequences) × 3 (number of $c_1$ codecs) × 4 (number of $G_1$ values) = 132 single-compressed videos. These test videos were double compressed using the different settings of $G_2$ to generate 132 × 4 = 528 double-compressed sample videos. Following [22], we obtained the receiver operating characteristic (ROC) curve to seek the optimal threshold $T_\Lambda$ for each fixed combination of ($R_1$, $R_2$). The optimal detection accuracy was obtained at the same time. Table 2 lists the detection accuracy with different bit rate settings, where the best results for each combination of ($R_1$, $R_2$) are bold and underlined. Note that the accuracy covers two aspects: whether the single-compressed and the double-compressed videos were correctly detected. Table 2 shows that, in most cases, the proposed method performs better for double-compression detection than the other methods across the different target bit rates. A greater bit rate means a better quality of the compressed video. When $R_1 < R_2$, all methods achieve satisfactory accuracies. When $R_1$ is small, the detection accuracies of the proposed method are slightly lower than those of [22]. However, as $R_1$ increases, the proposed method performs slightly better than all of the other methods. In the case of $R_1 > R_2$, the performance of all methods declines to different degrees. This decline occurs because a higher primary bit rate means a smaller quantization step in the primary compression, so the periodic artifact features used by all methods become less obvious. However, our method still has higher accuracies than the other methods, owing to the incorporation of the skip macroblock feature.
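The threshold search mentioned above can be illustrated with a simple accuracy sweep. This is a simplified stand-in for the ROC analysis of [22], not a reproduction of it: the function name `best_threshold` is hypothetical, each input score is assumed to be the gap between the largest and second-largest Λ values of one video, and the toy numbers in the usage note are made up.

```python
def best_threshold(single_scores, double_scores):
    """Pick the threshold that maximizes overall detection accuracy.

    A video is labelled single-compressed when its score is below the
    threshold and double-compressed otherwise.
    """
    total = len(single_scores) + len(double_scores)
    if total == 0:
        raise ValueError("no scores given")
    best_t, best_acc = None, -1.0
    # Only observed score values need to be tried as thresholds.
    for t in sorted(set(single_scores) | set(double_scores)):
        correct = sum(s < t for s in single_scores)
        correct += sum(d >= t for d in double_scores)
        acc = correct / total
        if acc > best_acc:  # ties keep the smallest threshold
            best_t, best_acc = t, acc
    return best_t, best_acc
```

For the made-up scores `single = [0.1, 0.2, 0.4]` and `double = [0.3, 0.8, 0.9]`, the sweep selects a threshold of 0.3, which classifies five of the six videos correctly.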
The detection accuracy was then evaluated with different GOP size settings. Similar to the previous experiment, for a fixed setting of ($G_1$, $G_2$), we generated 11 × 3 × 5 = 165 single-compressed videos. Based on these videos, we also generated 165 × 5 = 825 double-compressed videos. After an ROC curve fitting, the optimal threshold $T_\Lambda$ was selected to obtain the best detection accuracy. Table 3 lists the detection accuracies with different GOP size settings, where the best results for each combination of ($G_1$, $G_2$) are bold and underlined. Table 3 shows that the proposed method performs better in most cases. Even when the primary GOP size is relatively large, all detection rates are above 0.80. The experimental results illustrate that the feature sequence $\{E_t, t = 1, 2, \ldots, T\}$ generated by this method exhibits a more obvious periodicity than the features of the other methods, especially for videos with scene switching.
To further demonstrate the superiority of the proposed method, we compared its detection performance with that of He et al. [22], Stamm et al. [20], and Vazquez-Padin et al. [21] by applying non-parametric statistical tests. We used the STAC platform [24] to test the statistical significance of the previously presented experimental results, with the objective of determining whether statistical differences exist between the proposed method and the comparative approaches [20–22]. Specifically, a Friedman ranking test [25] with a significance level of 0.05 was first performed. The test results are presented in Table 4, where a lower rank indicates a better performance. As shown in Table 4, the proposed method has the lowest rank of 1.439, followed by [22] with a rank of 1.744. Next, we verified whether the proposed method outperforms the other methods by means of a Holm post hoc test [26] with an alpha level of 0.05. Table 5 shows the Holm test results, where the proposed method is set as the control method, the null hypothesis (H0) is that there is no difference between the control method and the comparative method, and the Holm p-value refers to the adjusted p-value of the first combination of the ranking that performs significantly worse than one of the best groups. It can be seen in Table 5 that for Stamm et al. [20] and Vazquez-Padin et al. [21], the hypothesis H0 is rejected in both cases; in other words, the proposed method is significantly better than [20,21]. However, for He et al. [22], the hypothesis H0 is not rejected; that is, the proposed method is slightly better than [22] in terms of performance, but the improvement is not statistically significant.
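To make the notion of a Friedman rank concrete, the sketch below computes the average rank of each method over several experimental settings, with rank 1 assigned to the highest accuracy and tied methods sharing the mean of their positions. It only illustrates the ranking step and does not replace the STAC platform used in the paper; the function name `average_ranks` is hypothetical, and the accuracies in the usage note are made-up toy values, not results from Tables 2 or 3.

```python
def average_ranks(results):
    """Average per-method ranks over settings (lower rank = better).

    results: list of rows; each row holds the accuracies of k methods
             on one setting. Ties receive the mean of their positions.
    """
    k = len(results[0])
    totals = [0.0] * k
    for row in results:
        order = sorted(range(k), key=lambda j: -row[j])  # best first
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            # Extend j over the block of methods tied with position i.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
            for m in range(i, j + 1):
                ranks[order[m]] = shared
            i = j + 1
        for c in range(k):
            totals[c] += ranks[c]
    return [t / len(results) for t in totals]
```

For the toy rows `[[0.9, 0.8, 0.7], [0.8, 0.9, 0.7], [0.9, 0.9, 0.6]]`, the first two methods each obtain an average rank of 1.5 and the third obtains 3.0.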

Performance for Primary GOP Size Estimation
In this section, we evaluate the accuracy of the primary GOP size estimation with different secondary bit rate settings. Here, we selected six test videos as our test sequences: akiyo, bowing, container, hall, paris, and coastguard. For a fixed parameter $R_2$, we generated 6 × 3 × 4 × 4 = 288 double-compressed videos. We used the threshold $T_\Lambda$ sought in the first experiment to classify all test videos as single- or double-compressed. For all correctly detected videos, we further estimated the corresponding primary GOP size $G_1$ of each video according to (12), and the estimation accuracy was computed from these estimates. Table 6 compares the estimation accuracy of the proposed method for the different sequences with different secondary bit rates. To show the estimation performance more intuitively, Figure 9 presents a boxplot of the estimation accuracy of the proposed method on the six standard test sequences with different settings of $R_2$. The bottom and top of each box are the first and third quartiles, the band inside the box is the median accuracy, and the ends of the whiskers represent the minimum and maximum of all accuracies. Table 6 illustrates that the estimation accuracy increases with $R_2$. In addition, as observed from Table 6 and Figure 9, all sequences have similar estimation accuracies for a fixed $R_2$, except for the hall and coastguard sequences. More specifically, the accuracy for hall is higher, and that for coastguard is lower, than for the other sequences. The probable reason is that hall is a surveillance video with a large static background, so the features of the string of data bits and the skip macroblocks are both more significant than those of the other sequences. For coastguard, due to the fast-moving camera, the features of the I-P-frames are easily disturbed by P-P-frames with violent content changes.

Time-Efficiency Evaluation
The algorithm was implemented on the Visual Studio 2010 software development platform under the Windows operating system. It was run on a personal computer with an Intel Core i3-2310M CPU (2.10 GHz) and 4 GB RAM. In our experiment, the standard test video library was generated with the FFmpeg video codec tool, which was used to create test videos with different parameters. To evaluate the time efficiency of the proposed method, we ran our algorithm on two standard test sequences, akiyo and coastguard, with a fixed frame number of 300 and with different primary and secondary GOP sizes. Table 7 lists the corresponding execution times. It can be seen in Table 7 that the average running times of the entire process for akiyo and coastguard are 129 ms and 197 ms, respectively. These results demonstrate that the computation cost of the proposed method is satisfactory and that the method can be applied in practical video forgery detection. It is noteworthy that the execution time for coastguard is generally higher than that for akiyo. The reason for this difference is perhaps that akiyo has relatively static scenes, whereas coastguard contains a variety of dramatically changing scenes. Thus, the execution complexity of the macroblock classification is likely to be higher for scenes with dramatic changes than for static scenes.

Conclusions
This paper presents a simple, but effective, double-compressed H.264/AVC video-detection method. The feature of the string of data bits for each P-frame was extracted and then incorporated with the skip macroblock feature. Both features can be obtained directly from the H.264/AVC coding stream. A time-domain periodicity analysis was then conducted to detect the artifact in the enhanced feature of the SODBs of all I-P-frames and P-P-frames. Finally, the primary GOP size was estimated by a traversal search. In the experiments, the validity of the proposed method was demonstrated on a constructed video dataset, which highlighted its robustness to the different coding parameters (including the bit rate, GOP size, and codec) used in the primary and secondary compression processes, as well as the accuracy of the primary GOP size estimation. The contributions of this paper are mainly twofold. On the one hand, the SODB feature is extracted directly from H.264/AVC coding streams; therefore, the computation cost of the proposed method is relatively low and very suitable for real-time applications. On the other hand, our method considers the effect of the primary compression on macroblock types, and thus the double-compression artifacts are further enhanced before the periodicity analysis.
Some issues remain to be resolved in future work. First, when the primary GOP size is an integer multiple of the secondary GOP size (e.g., $G_1 = 16$, $G_2 = 8$), the proposed method will be invalid, because there is no I-P-frame in this case. In addition, for the detection of inter-frame tampering, the proposed method will also be invalid when the number of frames deleted or inserted is an integer multiple of the primary GOP length. These issues are the future directions of our research.

Figure 1. The schematic of the double compression for the cases of conversion from an I-frame to a P-frame.

Figure 2. The comparison of the string of data bits between single and double compression on the sequence hall. The sizes of the Groups of Pictures (GOPs) for the single and double compression are 12 and 15, respectively.

Figure 3. The string of data bits for the double-compressed test sequence coastguard. The 73rd and 75th frames are an I-P-frame and a P-P-frame, respectively.

Figure 4. The macroblock-type comparison for the I-P and P-P frames. The yellow dots denote the skip macroblocks.

Figure 5. Schematic diagram of the proposed method.

Figure 6. A comparison of the string of data bits between the single and double compression on the sequence hall after key-frame suppression. The sizes of the GOPs for the single and double compression are 12 and 15, respectively.

Figure 7. Screenshots of the standard test sequences used in our experiments.

Figure 8. Enhanced feature of the string of data bits for the test sequence coastguard. The periodic artifact and primary GOP size are observable in the figure.

Figure 9. Boxplot of the estimation accuracy of the proposed method for six standard test sequences with different settings of $R_2$.

Table 1. List of parameter settings used to build the dataset.

Table 2. Detection accuracy with different bit rate settings.

Table 3. Detection accuracy with different GOP size settings.

Table 4. Friedman ranking test results.

Table 5. Holm post hoc test results.

Table 6. The estimation accuracy for the double-compressed H.264 videos for different sequences and secondary bit rate settings.

Table 7. The execution time for the proposed method (time unit: ms).