Overcome the Brightness and Jitter Noises in Video Inter-Frame Tampering Detection

Digital video forensics plays a vital role in judicial forensics, media reporting, e-commerce, finance, and public security. Although many methods have been developed, there is currently no efficient solution for real-life videos with illumination and jitter noise. To address this issue, we propose a detection method for video inter-frame forgery that adapts to brightness changes and jitter. For videos with severe brightness changes, we relax the brightness constancy constraint and adopt intensity normalization to propose a new optical flow algorithm. For videos with large jitter noise, we introduce motion entropy to detect the jitter and extract the stable texture changes fraction feature for double-checking. Experimental results on public benchmark datasets show that, compared with previous algorithms, the proposed method is more accurate and robust for videos with significant brightness variation or heavy jitter.


Introduction
The rapid development and spread of low-cost, easy-to-use video editing software, such as Adobe Premiere, Photoshop, and Lightworks, makes it easy to tamper with digital video. Inter-frame forgery happens quite often. It includes inserting frames into a video sequence or removing frames from a video sequence [1]. These tampered videos may be indistinguishable to the naked eye, so they can harm judicial forensics, media reports, e-commerce, finance, and public security. Therefore, it is necessary to develop methods that help human eyes identify tampered videos [1].
A considerable amount of effort has been devoted to inter-frame forgery detection. Most of these approaches are based on the successful extraction of some characteristics of the video. For example, some recent works detected tampered video by calculating the optical flow between frames [2][3][4][5][6]. However, this process could be severely interrupted by illumination noises, which invalidates the extraction of optical flow features [7,8]. Besides, jitter noise may also affect correlation consistency between adjacent frames in the video [9,10], causing many false detections.
For forgery detection in noisy videos, a few methods have been developed, including low-rank theory for videos with blur noise [11] and a coarse-to-fine approach under regular attacks such as additive noise and filtering [12]. However, these works did not consider brightness and jitter noises. Videos with brightness changes and jitter are common in real life, e.g., most videos are shot with cell phones. Although the motion-adaptive method [13] considered both brightness and jitter noises, it is not suitable for the lowest-motion videos with minor changes between adjacent frames, which are quite common. Moreover, these methods do not consider or validate the effect of multiple tampering.
We propose a novel framework that not only takes both brightness and jitter noises into account but also handles the lowest-motion videos. To deal with considerable illumination noise, we adopt the relaxed brightness constancy assumption [14] and develop a linear model to represent the physical intensity change. To deal with subtle illumination noise, we introduce intensity normalization [15]. To deal with the false detections caused by video jitter, we propose motion entropy and the stable texture changes fraction feature for double-checking. In addition, the improved robust optical flow is insensitive to the motion level of the video, and the texture changes fraction feature can describe the subtle inter-frame differences of the lowest-motion videos, so our method also suits them. Experimental results on three public video databases show that our method can be applied to videos with brightness variation, videos with significant jitter, and the lowest-motion videos. Furthermore, our approach can not only locate the forgery precisely but also estimate the manner of multi-forgery at the tampered positions.
The rest of this paper is organized as follows. In Section 2, we briefly introduce the related work for inter-frame forgery detection. In Section 3, we briefly describe the preliminaries in this paper. Section 4 describes the proposed scheme in detail. We provide the evaluation of optical flow computation in Section 5. Experimental results and analysis are presented in Section 6, and we draw a conclusion and discuss future works in Section 7.

Related Work
Most of the prior works detected forgeries based on the analysis of correlations between frames, which relies on features extracted from videos. As noises in videos could significantly affect feature extraction and correlation analysis, we classify the existing methods into two categories: methods without considering noises and methods considering noises.

Methods without Considering Noises
In terms of the type of features, previous methods can be divided into two categories: image-features-based and video-features-based. Methods in the first category usually extract image features of each frame, such as texture features [9], color characteristics [9,16], histogram features [16], and structural features [17]. Methods in the second category mainly utilize the impact of tampering on video features, including video encoding characteristics [18][19][20], double compression [21], motion features such as errors in motion estimation [10], optical flow, and prediction residual gradients [19], and brightness features such as the segmented brightness variance descriptor (BBVD) [2] and illumination information [4].
Although these methods have been validated on videos from public datasets, they generally do not consider noises and could produce incorrect results on real-life videos containing various noises. For example, the performance of the methods [2,4,19] declines under illumination noise, and the methods [9,18] are susceptible to jitter noise in real-life videos.

Methods Considering Noises
To address the issue of feature extraction in blurry videos, Lin et al. [11] adopted low-rank theory to deblur video, fusing multiple fuzzy kernels of keyframes by low-rank decomposition. Jia et al. [12] proposed a video copy-move detection method based on robust optical flow features; they also adopted adaptive and stable parameters to detect tampering under regular attacks, including additive noise and filtering. However, the method in [12] is limited to, and only validated on, copy-move forgery, whereas our proposed approach applies to all tampering operations, including frame insertion, frame copy-move, frame replication, and frame deletion. Feng et al. [13] adopted a frame deletion detection method based on the motion residual feature. They employ post-processing forensic tools, including automatic color equalization (ACE) forensics and mean gradient evaluation, to eliminate the detection interference caused by illumination and jitter noises.
Illumination noises and jitter noises have side effects on the detection result, yet few works take both brightness and jitter noises into account at the same time. Although [13] considered both noise factors, it does not fit the lowest-motion videos. Our work takes both brightness and jitter into account at the same time and is also suitable for the lowest-motion videos.

Horn and Schunck (H&S) Method
When a moving object in the three-dimensional world is projected onto a two-dimensional plane, optical flow (OF) is the relative displacement of the pixels of the image pairs [8]. Specifically, the optical flow method uses the information difference between adjacent frames to describe the movement of objects in the three-dimensional world [7]. OF has been widely applied in various scenes, such as object segmentation, target tracking, and video stabilization [22].
The Horn and Schunck (H&S) method [8] is a classical OF estimation algorithm based on three major assumptions: brightness constancy, spatial coherence of neighboring pixels, and small pixel motion [23]. Given a video sequence, let I(x, y, t) denote the pixel intensity at position (x, y) of the t-th frame. Brightness constancy can be described by the equation

I(x, y, t) = I(x + dx, y + dy, t + dt), (1)

where dx and dy correspond to the slight change of the movement over dt. Equation (1) can be expanded by a first-order Taylor series:

I(x + dx, y + dy, t + dt) ≈ I(x, y, t) + (∂I/∂x)dx + (∂I/∂y)dy + (∂I/∂t)dt. (2)

Let I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t, so that I_x, I_y, I_t represent the change rate of the grey value of the pixel along the x, y, and t directions, respectively. Combining Equations (1) and (2), we get

I_x dx + I_y dy + I_t dt = 0. (3)

According to the definition of speed, u = dx/dt and v = dy/dt, we obtain

I_x u + I_y v + I_t = 0. (4)

Equation (4) is the OF constraint equation. The OF calculation is then constrained to the minimum optimization problem of Equation (5):

E_d = ∫_Ω (I_x u + I_y v + I_t)^2 dxdy, (5)

where E_d is the sum of the errors under the brightness constancy constraint. Since one equation cannot determine the two unknowns u and v uniquely, a new condition E_s needs to be introduced. E_s is the constraint condition for smooth changes of the OF over the entire image [24]:

E_s = ∫_Ω (|∇u|^2 + |∇v|^2) dxdy, (6)

where ∇ represents the gradient operator. The H&S algorithm converts the OF solution to the minimum optimization problem

min_{u,v} E = E_d + λ E_s, (7)

where λ represents the smoothing factor. Equation (7) consists of a grayscale change term E_d and a smooth change term E_s. The ideal OF value E is relatively small, so the corresponding grayscale change E_d and speed change E_s are also small, which meets the assumptions of constant brightness and small motion, respectively.
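To make the iteration behind Equation (7) concrete, the following is a minimal NumPy sketch of the classical H&S solver; the derivative kernels, the smoothing weight lam, the iteration count, and the function name are our own choices, not the authors' implementation.

import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, lam=100.0, n_iter=100):
    # Minimal Horn-Schunck solver for Equation (7): min E = E_d + lam * E_s
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    # Spatiotemporal derivatives I_x, I_y, I_t via small finite-difference kernels
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
    kt = np.ones((2, 2)) * 0.25
    Ix = convolve(I1, kx) + convolve(I2, kx)
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = convolve(I2, kt) - convolve(I1, kt)
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    # Local-average kernel that realizes the smoothness term E_s
    avg = np.array([[1/12, 1/6, 1/12], [1/6, 0.0, 1/6], [1/12, 1/6, 1/12]])
    for _ in range(n_iter):
        u_bar = convolve(u, avg)
        v_bar = convolve(v, avg)
        # Jacobi update derived from the Euler-Lagrange equations of (7)
        num = Ix * u_bar + Iy * v_bar + It
        den = lam + Ix**2 + Iy**2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v

A larger lam enforces a smoother flow field; a smaller one trusts the data term E_d more.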

Robust Optical Flow Algorithm against Brightness Changes
The above classical OF calculation is usually incorrect when the image sequence has significant brightness changes, which exist in most real-life videos. Therefore, the OF algorithm was enhanced by relaxing the brightness constancy assumption [14].
Gennert et al. [14] relaxed the brightness constancy assumption via the equation

I(x + dx, y + dy, t + dt) = S(x, y, t) I(x, y, t) + T(x, y, t), (8)

where S(x, y, t) and T(x, y, t) are constraint parameters for space and time. Combining Equations (2) and (8), we obtain

I_x dx + I_y dy + I_t dt = (S − 1) I + T. (9)

Letting sc = (S − 1)/dt and tc = T/dt, we combine (9) and (3) to obtain

I_x u + I_y v + I_t = sc·I + tc. (10)

The enhanced OF is calculated by solving the extreme value problem described by Equation (11). Compared with Equation (7), the enhanced OF algorithm is more robust because it takes the brightness change into account:

min_{u,v,sc,tc} E = E_d + λ_s E_s + λ_sc E_sc + λ_tc E_tc, (11)

E_d = ∫_Ω (I_x u + I_y v + I_t − sc·I − tc)^2 dxdy,
E_s = ∫_Ω (|∇u|^2 + |∇v|^2) dxdy,
E_sc = ∫_Ω |∇sc|^2 dxdy,
E_tc = ∫_Ω |∇tc|^2 dxdy,

where λ_s, λ_sc, λ_tc are the smoothing factor, the spatial-domain constraint weight, and the time-domain constraint weight, respectively; E_d and E_s are the grayscale change term and the smooth change term; and E_sc and E_tc are the smoothness terms of the spatial and time-domain constraint parameters.

Method
We propose a novel framework to overcome the brightness and jitter noises in video inter-frame tampering detection. As illustrated in Figure 1, there are three algorithms in this framework. First, Algorithm 1 reduces the impact of illumination changes in the input video sequence using the optical flow information. Then, if the motion entropy is greater than a certain threshold, we detect the jittery video with Algorithm 2. Based on the tampering points detected in the above two steps, Algorithm 3 finally makes the judgment of video tamper.


Algorithm 1: Reduce the Impact of Illumination Changes
The consistency of the OF has been proven to be an efficient tool to check the integrity of video [3,5,6]. Based on the enhanced OF algorithm described in the previous section, we design Algorithm 1 to reduce the impact of illumination changes; the main steps are shown as follows.
Step 1: Due to the brightness variations, the intensity of the images should be normalized [15] before applying the optical flow method. To cope with the high-frequency noise that affects the OF computation, we preprocess the input video with a Gaussian filter [25].
Step 2: Based on the enhanced OF method described in Section 3, we extract the OF fluctuation feature r_i to measure the similarity between adjacent frames of the video:

r_i = sum_OF_i / avg(sum_OF_i), (12)

where sum_OF_i is the OF sum of the i-th video frame,

sum_OF_i = Σ_{x=1}^{wid} Σ_{y=1}^{hei} |OF_i(x, y)|, (13)

wid and hei represent the width and height of the video frame, respectively, and N is the number of video frames. The average OF sum avg(sum_OF_i) in a sliding window centered on the i-th frame is calculated by the equation

avg(sum_OF_i) = (1 / 2w) Σ_{k=i−w, k≠i}^{i+w} sum_OF_k, (14)

where 2w is the width of the sliding window.
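As one concrete reading of Equations (12)-(14), here is a short Python sketch; the per-pixel flow-magnitude sum, the window half-width w, and the function name are our assumptions.

import numpy as np

def of_fluctuation_features(flows, w=5):
    # flows: list of (u, v) arrays between consecutive frames
    # Equation (13): sum of OF magnitudes over the whole frame
    sum_of = np.array([np.sqrt(u**2 + v**2).sum() for u, v in flows])
    r = np.ones_like(sum_of)
    for i in range(len(sum_of)):
        lo, hi = max(0, i - w), min(len(sum_of), i + w + 1)
        # Equation (14): window average excluding the center frame
        neighbors = np.concatenate([sum_of[lo:i], sum_of[i + 1:hi]])
        if neighbors.size and neighbors.mean() > 0:
            r[i] = sum_of[i] / neighbors.mean()   # Equation (12)
    return r

Under this reading, r_i stays near 1.0 for continuous content and spikes at a discontinuity, which matches the peak detection in Step 4.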
Step 3: Jitter-frame pixels have small-amplitude movements in the same motion direction [26], i.e., the motion direction is consistent. A video with a consistent motion direction has a small motion direction entropy. Therefore, we adopt the motion direction entropy ME to perceive the consistency of the video motion direction, which can sense video jitter. ME is calculated as follows: (1) use the frame difference method [27] to calculate the binarized motion area; (2) use the Shi-Tomasi corner detection method [28] to obtain the corners c(j) on the binarized motion area; (3) combine the standard deviation S(θ) of the histogram of the OF directions to compute the motion entropy ME of the video.
where OF_ij is the OF of the corner c(j) in the i-th frame, S(θ) is the standard deviation of the OF direction histogram, which measures the consistency of the direction histogram, ME_i is the motion entropy of the i-th frame, and std is the standard deviation over the N video frames.
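The exact equations for ME were lost in extraction, so the following OpenCV sketch only mirrors the three textual steps above; the difference threshold, the histogram binning, and the use of the direction histogram's standard deviation as the consistency score are our reconstruction.

import cv2
import numpy as np

def motion_entropy(prev, curr, diff_thresh=25, n_bins=16):
    # prev, curr: 8-bit grayscale frames
    # 1. Frame difference -> binarized motion area
    diff = cv2.absdiff(curr, prev)
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    # 2. Shi-Tomasi corners c(j) restricted to the motion area
    corners = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01,
                                      minDistance=7, mask=mask)
    if corners is None:
        return 0.0
    # Track the corners to obtain the per-corner optical flow OF_ij
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, curr, corners, None)
    d = (nxt - corners).reshape(-1, 2)[status.ravel() == 1]
    if len(d) == 0:
        return 0.0
    # 3. Direction histogram; S(theta) = its standard deviation
    theta = np.arctan2(d[:, 1], d[:, 0])
    hist, _ = np.histogram(theta, bins=n_bins, range=(-np.pi, np.pi), density=True)
    return float(np.std(hist))

In this sketch, a consistent (jittery) motion direction produces a peaked histogram and hence a large score, which is consistent with the ME values above the 0.5 threshold reported for jittery videos in Section 6.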
Step 4: We judge whether the video has been tampered with based on the continuity of the video frame feature sequence. THR_R is the threshold selected for peak points of the OF fluctuation feature sequence, C is the counter for peak points, and THR_E is the threshold selected for ME. If r_i > THR_R, the i-th frame is considered a suspected tamper point, and C += 1 counts the number of suspected tamper points. When C ≥ 1, the video has suspected tamper points. Under the premise of C ≥ 1, if the detection result satisfies ME ≤ THR_E, the video is not jittery, and we can judge the video as tampered directly; otherwise, the video is jittery and needs to be further detected by Algorithm 2. The suspected tampered position detection process is summarized in Algorithm 1.

Algorithm 1: Reduce the impact of illumination changes
Input: video frames I(p) (1 ≤ p ≤ N); THR_R, the threshold selected for peak points; THR_E, the threshold selected for ME.
Output: positions of suspicious tampering points stored in S.
1: S = ∅, C = 0 // C is the counter for peak points
2: for i = 1; i < N; i++ do
3:   calculate the OF fluctuation feature r_i and the motion entropy ME
4:   if r_i > THR_R then
5:     add i into S
6:     C += 1
7:   end if
8: end for
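Putting the steps together, a driver for Algorithm 1 might look like the following Python sketch; it reuses the of_fluctuation_features and motion_entropy sketches above, and the string return convention is purely illustrative.

import numpy as np

def algorithm1(flows, frame_pairs, thr_r, thr_e):
    # Driver sketch of Algorithm 1 (our naming); flows are inter-frame OF fields,
    # frame_pairs are the corresponding (prev, curr) grayscale frames.
    r = of_fluctuation_features(flows)
    S = [i for i, ri in enumerate(r) if ri > thr_r]   # suspected tamper points
    C = len(S)                                        # peak-point counter
    if C >= 1:
        ME = np.mean([motion_entropy(p, q) for p, q in frame_pairs])
        if ME <= thr_e:
            return "tampered", S               # not jittery: trust the OF peaks
        return "check_with_algorithm2", S      # jittery: double-check with TC
    return "original", S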

Algorithm 2: Detect Jittery Video
Video jitter refers to a small motion of the video frame in a consistent direction. Since the enhanced OF fluctuation feature r in Algorithm 1 is a global motion statistic, using the feature r alone is likely to cause missed or false detections, especially in the case of severe video jitter. To eliminate the false detections caused by video jitter, we adopt the video texture changes fraction TC to double-check jittery videos. The TC feature captures the local detail changes in different motion directions of the video frame and does not respond to the consistent motion direction of jitter frames, so jittery frames are not identified as tampered frames. The TC feature is calculated in three steps.
Step 1: We compute the gradient structure information of the i-th frame as ∆I_i:

∆I_i = sqrt((I_x^i)^2 + (I_y^i)^2), (18)

and obtain the corresponding binary mask TM_i by applying the threshold Th_t to the gradient image ∆I_i; the binary masks of the video frames are shown in Figure 2(b1,b2),
where I_x^i is the partial derivative of the i-th frame in the x-direction, I_y^i is the partial derivative of the i-th frame in the y-direction, and ∆I_i is the gradient structure information of the i-th frame.
Figure 2. Textured areas of video frames. (a1,a2) are the video frames; (b1,b2) are the corresponding binary masks after computing the gradient structure information of (a1,a2); (c1,c2) are the textured areas of the video frames after performing morphological operations on (b1,b2).
Step 2: We perform morphological operations on the binary mask TM_i to fill the gaps and remove small areas containing noise, as shown in Figure 2(c1,c2):

TM_i = (TM_i • SE) ∘ SE, (19)

where • denotes the closing operation of morphological operations, ∘ denotes the opening operation, and SE is the structural element of the opening and closing operations.
Step 3: We calculate the texture changes fraction TC(I_i, I_{i+1}) between TM_i and TM_{i+1}:

TC(I_i, I_{i+1}) = (1 / (wid × hei)) Σ_{x,y} |TM_i(x, y) − TM_{i+1}(x, y)| = (Cout + Cin) / (wid × hei), (20)

where |·| is the absolute value operator. A pixel whose value is 1 in the i-th frame and 0 in the (i+1)-th frame is called an exiting pixel, shown by the arrow at the top of Figure 3, and its count is Cout. Conversely, a pixel whose value is 0 in the i-th frame and 1 in the (i+1)-th frame is called an entering pixel, shown by the arrow at the bottom of Figure 3, and its count is Cin. The process of the detection algorithm based on the video texture changes fraction TC is shown in Algorithm 2.
Figure 3. Statistics of the video frame texture changes fraction: exiting pixels (top arrow) disappear from the texture mask between the i-th and (i+1)-th frames, while entering pixels (bottom arrow) appear in it.
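A compact OpenCV sketch of the three TC steps follows; the Sobel gradient, the threshold th_t, the structuring-element size, and the normalization by the frame area are our reading of Equations (18)-(20).

import cv2
import numpy as np

def texture_changes_fraction(f1, f2, th_t=30, se_size=5):
    # f1, f2: consecutive grayscale frames
    se = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (se_size, se_size))
    masks = []
    for f in (f1, f2):
        # Step 1 / Equation (18): gradient magnitude, binarized with Th_t
        gx = cv2.Sobel(f, cv2.CV_64F, 1, 0)
        gy = cv2.Sobel(f, cv2.CV_64F, 0, 1)
        tm = (np.sqrt(gx**2 + gy**2) > th_t).astype(np.uint8)
        # Step 2 / Equation (19): closing fills gaps, opening removes small noise
        tm = cv2.morphologyEx(tm, cv2.MORPH_CLOSE, se)
        tm = cv2.morphologyEx(tm, cv2.MORPH_OPEN, se)
        masks.append(tm)
    tm1, tm2 = masks
    c_out = int(np.sum((tm1 == 1) & (tm2 == 0)))   # exiting pixels (Cout)
    c_in = int(np.sum((tm1 == 0) & (tm2 == 1)))    # entering pixels (Cin)
    # Step 3 / Equation (20): TC = (Cout + Cin) / (wid * hei)
    return (c_out + c_in) / tm1.size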
Due to the picture continuity of video frames, the content similarity between adjacent frames is substantial, and the value of TC is considerably small. If a certain number of frames are inserted or deleted, the video continuity is destroyed: the larger the value of TC, the more likely the video has been tampered with. The common video tampering operations cause different tampering patterns in the extracted video feature sequence. More concretely, a deletion forgery causes a single sudden peak in the feature sequence, whereas an insertion forgery causes two peaks. Suppose I_i is a frame forgery point whose previous frame is I_{i−1}, and I_j is another frame forgery point whose next frame is I_{j+1}. If I_{i−1} and I_{j+1} are very similar, then the frames from I_i to I_j form an inserted clip of length j − i + 1; otherwise, the tampering is a deletion forgery. The process of the judgment of video tamper is shown in Algorithm 3.

Algorithm 3: Judgment of video tamper
Input: suspicious tampering point set S; the peak-point counter C.
Output: frame insertion set S_insert; frame deletion set S_delete.
1: S_insert = ∅, S_delete = ∅
2: for i = 1; i < C; i++ do
3:   for j = i + 1; j ≤ C; j++ do
4:     if I_{S(i)−1} and I_{S(j)+1} are very similar then
5:       add (S(i), S(j)) into S_insert
6:   if no matching j is found then
7:     add S(i) into S_delete
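The truncated listing above can be read as the following Python sketch; the similarity test via the OF fluctuation ratio and the tolerance sim_tol are our assumptions, based on the r ≈ 1.0 examples reported in Section 6.

def judge_tamper(peaks, r_between, sim_tol=0.05):
    # peaks: sorted suspicious frame indices from Algorithms 1 and 2
    # r_between(a, b): OF fluctuation ratio between frames a and b (~1.0 = similar)
    s_insert, s_delete, used = [], [], set()
    for a in range(len(peaks)):
        if peaks[a] in used:
            continue
        matched = False
        for b in range(a + 1, len(peaks)):
            i, j = peaks[a], peaks[b]
            # Insertion leaves two peaks whose outer neighbors I_{i-1}, I_{j+1} match
            if abs(r_between(i - 1, j + 1) - 1.0) < sim_tol:
                s_insert.append((i, j))
                used.update((i, j))
                matched = True
                break
        if not matched:
            s_delete.append(peaks[a])   # a lone peak indicates frame deletion
    return s_insert, s_delete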

Experimental Setup
To evaluate the enhanced OF algorithm of the proposed detection framework, we perform experiments on the benchmark dataset [29]. The dataset contains various image sequences and the corresponding ground-truth OF information, so we can quantify the robustness and accuracy of the enhanced OF algorithm. To evaluate the enhanced OF algorithm against dynamic brightness variation, the image I is multiplied by a factor M, and a constant C_1 is added to construct a model of dynamic brightness variation. The specific calculation process is shown in Equation (21). For example, Figure 4a,b show frame10 and frame11 of the Hydrangea sequence group in the dataset, respectively. When M = 1.1 and C_1 = 10, Figure 4b is changed to Figure 4c. We need to calculate the OF information between Figure 4a,c.
I'(x, y) = M · I(x, y) + C_1, (21)

where M ∈ [0.9, 1.1] and C_1 ∈ [−10, 10]. We estimate the OF information between Figure 4a,c; let u_i^gt, v_i^gt represent the ground-truth OF information and u_i^e, v_i^e represent the estimated OF information. We evaluate OF methods by two measurement indicators: the average angular error (AAE) [30] and the endpoint error (EPE) [31]. The AAE and EPE compare the difference between the ground-truth OF and the estimated OF. The smaller the values of AAE and EPE, the better the performance of the corresponding OF algorithm. We can also visually estimate algorithm performance from the flow map. AAE and EPE are defined in Equations (22) and (23):

AAE = (1/N) Σ_i arccos( (u_i^gt u_i^e + v_i^gt v_i^e + 1) / (sqrt((u_i^gt)^2 + (v_i^gt)^2 + 1) · sqrt((u_i^e)^2 + (v_i^e)^2 + 1)) ), (22)

EPE = (1/N) Σ_i sqrt((u_i^gt − u_i^e)^2 + (v_i^gt − v_i^e)^2). (23)
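For reference, a small NumPy sketch of the brightness model in Equation (21) and the two error measures in Equations (22) and (23); the function names and the degree conversion for AAE are our choices.

import numpy as np

def brightness_change(I, M=1.1, C1=10):
    # Equation (21): synthetic dynamic brightness variation
    return np.clip(M * I.astype(np.float64) + C1, 0, 255)

def aae_epe(u_gt, v_gt, u_e, v_e):
    # Equation (22): average angular error between flow fields
    num = u_gt * u_e + v_gt * v_e + 1.0
    den = np.sqrt(u_gt**2 + v_gt**2 + 1.0) * np.sqrt(u_e**2 + v_e**2 + 1.0)
    aae = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0))).mean()
    # Equation (23): average endpoint error
    epe = np.sqrt((u_gt - u_e)**2 + (v_gt - v_e)**2).mean()
    return aae, epe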

Experimental Results and Analysis
The test results of different OF algorithms between frame10 (Figure 4a) and frame11 under brightness change (Figure 4c) in the Hydrangea sequence group are shown in Figure 5. The descriptions of the different approaches and the parameter settings used for the OF evaluation are shown in Table 1, and the performance evaluation results of the different OF algorithms are shown in Table 2. The original image and the ground-truth velocity field are shown in Figure 5a. The flow map and the warped image obtained by the HS algorithm are shown in Figure 5b. The flow map uses different colors and brightness levels to indicate the magnitude and direction of the estimated OF, and the warped image represents frame11 warped to frame10 according to the estimated OF. The estimated flow map in Figure 5b and the ground truth in Figure 5a are significantly different, and the error measures AAE and EPE in Table 2 are also relatively large. We therefore observe that the HS algorithm is not suitable for image sequences with dynamic brightness variation.
The evaluation result of the HS+IN (intensity normalization) algorithm is shown in Figure 5c. Compared to the HS algorithm, the values of AAE and EPE of the HS+IN algorithm are significantly reduced. The execution time is not much different, which indicates that the intensity normalization is beneficial to the OF calculation of image sequences with brightness changes.
The evaluation result of the HS+BR (brightness relaxing factor) algorithm is shown in Figure 5d. Compared to the HS algorithm, the values of AAE and EPE of the HS+BR algorithm are greatly increased, which indicates that introducing the brightness relaxing factor alone is not beneficial to the OF calculation of image sequences with brightness changes.
Combining IN and BR, we propose the enhanced OF algorithm. The evaluation result is shown in Figure 5e, which is visually very close to the ground-truth velocity field in Figure 5a. The warped image is also similar to frame10. The values of AAE and EPE are small, reaching single digits. These indicators show that the proposed enhanced OF algorithm is suitable for the OF calculation of image sequences with brightness changes. We have made a trade-off between computational accuracy and time complexity.

Experimental Results and Analysis
We conduct extensive experiments in diverse and realistic forensic setups to evaluate the performance of the proposed detection framework in this section. The experimental data are introduced first, then the parameter settings and evaluation standards are given. Finally, we present the experimental results and a comparison analysis with four existing state-of-the-art algorithms in terms of detection accuracy and robustness.

Experimental Data
To evaluate the detection effect of the proposed method, we performed experiments on three public datasets, namely the SULFA Video Library (The Surrey University Library for Forensic Analysis) [32], the CDNET Video Library (a video database for testing change detection algorithms) [33], and the VFDD Video Library (Video Forgery Detection Database of South China University of Technology Version 1.0) [34], respectively. There are about 200 videos in total. The scenes in the video library are as follows: (1) The video library includes videos of different motion levels, including slow motion, medium motion, and high motion.
(2) The video library contains videos of different brightness intensities and different scenes (indoors and outdoors).
(3) The video library includes a variety of mobile phone videos, as well as camera videos, which were taken with or without a tripod.

Experimental Setup
We downloaded 150 videos with noticeable brightness changes from video websites and adopted the metrics of ACE forensics [35] to determine the brightness changes of the videos. We found that these videos have a higher intensity of dynamic brightness changes than the experimental video library. Because the authenticity of website videos is uncertain, they cannot be used as experimental videos. Therefore, we apply the model of dynamic brightness change shown in Equation (21) to simulate the video brightness changes of a real-life environment. We report the precision with respect to λ_sc, λ_tc, λ_s, and THR_E, respectively. Based on the results of Figure 6, we observe that the effect is best when λ_sc = 1, λ_tc = 1, λ_s = 10, and THR_E = 0.5. The values of THR_R and THR_R1 are set adaptively according to the Chebyshev inequality [36], and the number of corner points c(i) in Algorithm 1 is set to 50.

To evaluate the performance of the detection algorithm, we use the error metrics of precision and recall to analyze the experimental results. The calculation Equations are:

precision = N_c / (N_c + N_f), (24)

recall = N_c / (N_c + N_m), (25)

where N_c is the number of detected correct points, N_f is the number of detected false points, and N_m is the number of tampered points that were missed.
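For completeness, Equations (24) and (25) in code form; a trivial sketch with our own function name.

def precision_recall(n_correct, n_false, n_missed):
    # Equations (24) and (25): precision and recall of detected tamper points
    precision = n_correct / (n_correct + n_false) if (n_correct + n_false) else 0.0
    recall = n_correct / (n_correct + n_missed) if (n_correct + n_missed) else 0.0
    return precision, recall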
Figure 7 shows the detection result of frame deletion forgery for a video with jitter noises and illumination noises. Figure 7a shows the experimental result of Algorithm 1: the OF fluctuation feature sequence has the peak group (91, 99, 118). At the same time, the calculated motion entropy ME is 0.672, which indicates that the video is jittery. To reduce the side effect of the video jitter, we double-check the video with Algorithm 2, which uses the texture changes fraction feature TC. The detection result of the double-checking is shown in Figure 7b, where the tampering point is 118. Finally, we make the judgment of video tamper with Algorithm 3 and obtain that frame 118 is a frame deletion forgery point, while the peak pair (91, 99) is a false detection.

Figure 7. Detection result of frame deletion forgery for a video with jitter and illumination noises. In (a), Algorithm 1 uses the fluctuation extent of OF to detect forgery; the feature sequence has the peak group (91, 99, 118), and the motion entropy of OF is greater than the selected threshold, which indicates a jittery video. The video is then checked by Algorithm 2, and the detection result in (b) shows that frame 118 is the tampering point.

Experimental Results
Based on the detection result of Figure 7, Figure 8 shows the detection result of multiple tampering of the same video. Figure 8a shows the experimental result of Algorithm 1: the OF fluctuation feature sequence has the peak group (91, 99, 118, 150, 180).
Moreover, the motion entropy ME is 0.752, which indicates that the video is jittery. To eliminate the effect of the video jitter, this video is re-tested with Algorithm 2. The re-testing detection result is shown in Figure 8b, which locates the tampering points at (118, 150, 180); the peak pair (91, 99) is a false detection. Finally, we make the judgment of tamper with Algorithm 3 and obtain that frame 118 is a deletion forgery point, while the point pair (150, 180) is a frame insertion forgery point.
Figure 8. Detection result of multiple tampering of the same video. In (a), the feature sequence has the peak group (91, 99, 118, 150, 180), and the motion entropy of OF is greater than the selected threshold, which indicates a jittery video. The video is re-tested using Algorithm 2, and the detection results in (b) show that frames (118, 150, 180) are the tampering points.

Figure 9 shows the detection result of an untampered video with jitter noises and illumination noises. Figure 9a shows the detection result of Algorithm 1: the OF fluctuation feature sequence has the peak pair (22, 70). The motion entropy ME is 0.643, which indicates that the video is jittery. To eliminate the effect of the video jitter, this video is re-tested with Algorithm 2, which uses the texture changes fraction. The re-testing detection result is shown in Figure 9b, which indicates that the texture changes fraction sequence has no peaks. Based on the above test results, we judge that the video is original and has not been tampered with.

Figure 9. Detection result of the untampered video with jitter and illumination noises. In (a), Algorithm 1 uses the fluctuation extent of OF to detect forgery; the feature sequence has the peak pair (22, 70), and the motion entropy of OF is greater than the selected threshold, which indicates a jittery video. The video is re-tested using Algorithm 2, and the texture changes fraction sequence in (b) has no peaks.

Figure 10 shows the frame replacement forgery detection result of a video with illumination noise. The feature sequence has the peak pair (51, 93). At the same time, the calculated motion entropy ME is 0.453, which indicates that the video is not jittery. We then judge the video tamper: the OF fluctuation feature r between frame 50 and frame 94 is 1.0046, which shows that the frame pair (50, 94) is very similar. Therefore, the peak pair (51, 93) is the location of the video insertion forgery.

Figure 11 shows the detection result of frame deletion forgery of a video with illumination noises. The OF fluctuation feature r has a prominent peak at the frame deletion point 56. Because the motion entropy ME is 0.486, the video is not jittery. Finally, we make the judgment of video tamper and obtain that frame 56 is the location of the video deletion forgery.

Figure 11. Detection result of frame deletion forgery of a video with illumination noise.

Figure 12 shows the detection result of video frame copy-move forgery of a video with illumination noise. Figure 12 shows the detection result of Algorithm 1: the OF fluctuation feature sequence has the peak pair (45, 57), and the calculated motion entropy ME is 0.482, which indicates that the video is not jittery. Finally, we make the judgment of video tamper: the OF fluctuation feature r between frame 44 and frame 58 is 0.9844, so the frame pair (44, 58) is very similar. Therefore, the peak pair (45, 57) is the location of the video insertion forgery.
According to the performance evaluation criteria of the proposed algorithm, a comparison is made between the proposed algorithm and state-of-the-art video tamper detection algorithms [3,6,9,37]. Table 3 shows the parameter descriptions of the comparison methods; our proposed method and the comparison methods use the same dataset, and the comparison results are shown in Table 4.

As compared to the methods reported in [3,6,9,37], the proposed method has higher robustness and higher accuracy. The results indicate that the proposed method is capable of effective detection and localization of all inter-frame forgeries in videos with illumination noises and jitter noises. In a real-life scenario, the forensic investigator has no control over the parameters of the environment where the video was captured or the parameters used by the video tamperer. The forensic investigator must detect forgeries in the complete absence of any information regarding the noises, the motion level, and the forgery operation forms of the captured video. Therefore, the most suitable forgery detection method is one with practical suitability for real-life video scenes, such as videos with brightness variance, videos with significant jitter, and videos of various motion levels. Furthermore, our method can not only locate the forgery precisely but also estimate the manner of multi-forgery at the tampered positions.
For [3,6], the OF-based detection methods become invalid when illumination changes are added to the image sequence; hence, their detection results are not good. For [9], the detection performance is better, mainly because the Zernike moment feature avoids the effect of brightness intensity; however, experiments show that its detection performance on jittery videos decreases significantly, so its detection result is also not good. For [37], the test results are also relatively improved, mainly because the multi-channel feature avoids missed detections; however, experimental results show that the performance of this method is not good for minor frame deletion forgeries, so it is not as stable as the method proposed in this paper.
Prior video tampering detection methods are not suitable for videos with dynamic brightness changes or jittery videos. The detection method [13] based on motion residuals can be ideal for most motion levels, such as high and medium motion levels. However, it is not suitable for the slowest motion-level videos: the inter-frame difference decreases as the video motion level decreases, so the extracted motion residual feature becomes weak. Meanwhile, the relocated I-frame is not affected by the motion level of the video, so the relocated I-frame will be mistakenly identified as a tampered frame. Therefore, reference [13] is not suitable for the lowest-motion videos. Our proposed method utilizes the inconsistencies of features, including the enhanced OF and the texture changes fraction, to detect tampering in real-life videos. The former feature is insensitive to the motion level of the video, and the latter can describe the subtle inter-frame differences of the lowest-motion videos. Therefore, our method is also suitable for the lowest-motion videos.
To reduce the effect of illumination noises and jitter noises, we utilize a robust optical flow detection method based on the relaxed brightness constancy assumption and intensity normalization, which reduce the influence of significant and small brightness changes, respectively. At the same time, we use the motion entropy ME to sense whether the video is jittery and utilize the texture changes fraction TC for double-checking, so the false detections caused by video jitter can be reduced. Experiments show that the proposed detection method has strong robustness and high accuracy for complex-scene videos.

Conclusions
In this paper, we have proposed a novel detection framework for inter-frame forgery in real-life videos with illumination noises and jitter noises. First, for videos with severe brightness changes, we relax the brightness constancy constraint and adopt intensity normalization to propose a new optical flow algorithm in Algorithm 1. Second, for videos with large jitter noises, we introduce motion entropy to detect the jitter and extract the stable texture changes fraction feature for double-checking in Algorithm 2. Finally, we make the judgment of video tamper in Algorithm 3. The proposed method was validated by extensive experimentation in diverse and realistic forensic setups. The obtained results indicate that the proposed method is accurate and robust. It can detect single or multiple video forgeries with an average accuracy of 89%, including frame deletion, frame insertion, frame replacement, and frame copy-move. Furthermore, the proposed method is not sensitive to jitter noises, illumination noises, or the motion level of the video. In the future, it would be beneficial to explore the suitability of other real-life video scenes, such as blurred video and still video.