An Inter-Frame Forgery Detection Algorithm for Surveillance Video

Surveillance systems are ubiquitous in our lives, and surveillance videos are often used as significant evidence in judicial forensics. However, the authenticity of surveillance video is difficult to guarantee, and ascertaining it is an urgent problem. Inter-frame forgery is one of the most common ways of tampering with video. The forgery reduces the correlation between adjacent frames at the tampering position, so this correlation can be used to detect the tampering operation. The proposed algorithm is composed of feature extraction and abnormal-point localization. During feature extraction, we extract the 2-D phase congruency of each frame, since it is a good image characteristic, and then calculate the correlation between adjacent frames. In the second phase, the abnormal points are detected using the k-means clustering algorithm: the normal and abnormal points are clustered into two categories. Experimental results demonstrate that the scheme has high detection and localization accuracy.


Introduction
Video sequences are often believed to provide stronger forensic evidence than still images. Thus, surveillance video, as important evidence, is often used in case investigations. However, its digital nature makes surveillance video easy to manipulate. Tampering with a digital video without leaving visible clues is easily accomplished with video editing software, such as Adobe Premiere. Therefore, digital video forensics, which is designed to verify the trustworthiness of digital video, has become an important and exciting field of recent research. Katsaounidou et al. [1] introduced a framework for cross-media authentication and verification, as well as the value of cross-media authentication in journalism and judicial practice. Arab et al. [2] proposed a detection method for surveillance systems based on embedding a robust watermark in the video. The algorithm was proven efficient at detecting a wide range of tampering. However, embedding watermarks in the tested video is sometimes infeasible. Therefore, detection algorithms that do not depend on prior information, for instance those detecting traces of forgery, have attracted much attention in recent research.
Forgery detection for surveillance video can be divided into source authenticity and content authenticity. Source authenticity [3,4] analyzes "where the video came from" and "how it came to be". Many approaches have been developed to investigate each step of the acquisition process [5]. Content authenticity, by contrast, studies whether the video has undergone some kind of tampering. Forgery detection of video content includes double compression detection [6][7][8], intra-frame forgery detection [9][10][11][12], and inter-frame forgery detection, because videos may be tampered with in various ways, including spatial tampering, temporal tampering, and spatio-temporal tampering [13]. Mizher et al. [14] provide a detailed classification and introduction to video falsification techniques and video forgery detection techniques. In this paper, we focus on detecting inter-frame forgery of surveillance video, which is one of the most common ways of tampering with video. The purpose of such forgery is to imitate or to conceal a specific event by inserting or deleting certain frames.
The existing inter-frame forgery detection methods can be divided into two categories [15].
(1) Methods based on the periodic effect of double compression. Su et al. [16] indicated that the power of the high-frequency region of the DCT coefficient blocks in inter-frame forgeries shows a clear periodic artifact. The weakness of the algorithm is that it may only apply to MPEG-2. Dong et al. [17] proposed a motion-compensated edge artifact (MCEA) scheme to detect frame-based video manipulation by judging spikes in the Fourier transform domain after double MPEG compression; frame deletion or insertion causes frames to move from one GOP to another, giving rise to relatively larger motion estimation errors. A machine learning approach to detecting frame deletion was put forward in [18]. A number of discriminative features, such as prediction residuals, the percentage of intra-coded macroblocks, quantization scales, and reconstruction quality, are extracted from the video bit stream and its reconstructed images; machine learning techniques are then used to detect frame deletion. However, the method cannot provide the exact location of the deleted frames. Feng et al. [19] proposed a method that is applicable to video sequences with variable motion strengths. They analyzed the statistical characteristics of the most common interfering frames, then exploited a new fluctuation feature based on frame motion residuals to identify frame deletion points. The disadvantage of such methods is that they depend on the encoding parameters of the tested video. In addition, the forger can achieve anti-forensics by correcting the prediction error [20].
(2) Methods based on the discontinuity of content at the tampering position. These methods are insensitive to encoding parameters and have more advantages in practical applications. Chao et al. [21] utilized optical flow consistency between adjacent frames to detect frame forgery, since inter-frame forgery disturbs this consistency. The authors select different detection methods for frame insertion and frame deletion forgery; however, we cannot know in advance what kind of forgery is involved. Similar ideas appear in [22,23], where velocity-field consistency and motion vector pyramid (MVP) consistency were used, respectively. In [24], a method based on quotients of correlation coefficients between local binary pattern (LBP) coded frames is proposed, with abnormal-point detection achieved by applying the Chebyshev inequality twice. Its weakness is that it fails to discuss the selection of the multiple parameters involved, while different parameters yield different detection precision. Zhang et al. [25] also used the Chebyshev inequality to locate the tampering position. They used a three-dimensional tensor to describe the video features, factorized the tensor by Tucker non-negative decomposition, and finally extracted the time-dimension matrix to calculate correlations and determine whether frame insertion or deletion had occurred. Zhao et al. [26] proposed an algorithm to detect frame-deletion forgery; the feature extraction is based on normalized mutual information, and the generalized ESD test is used to localize the tampering point. However, the method assumes that there is only one discontinuity point and cannot detect multiple tampering points.
Since content-based detection methods do not rely on encoding standards, they have wider applicability and have attracted the attention of scholars in recent years. However, as mentioned above, current methods still have shortcomings. In this paper, we propose a novel scheme for inter-frame forgery detection based on 2D phase congruency and k-means clustering. By means of k-means clustering analysis, the abnormal points can be accurately located; there is no need to select multiple thresholds, which avoids the impact of threshold selection on the detection results, and the scheme is also effective for multiple tampering. Because surveillance video has a static background, an inter-frame forgery operation reduces the correlation between adjacent frames at the tampering position, disturbing the consistency of the consecutive correlation coefficients. Tampering localization can then be achieved by detecting these discontinuous points, i.e., abnormal points. When calculating the correlation of adjacent frames, we use 2D phase congruency as the frame feature, since it is a good image characteristic. Furthermore, we employ k-means clustering analysis to cluster the normal and abnormal points into two categories.
The rest of the paper is organized as follows. In Section 2, the concept of 2D phase congruency and its feasibility are introduced. Section 3 describes the k-means clustering algorithm and the detection procedure for abnormal points. Section 4 gives our experimental results and discussion. Finally, conclusions are drawn in Section 5.

Feature Extraction
Digital video is composed of sequences of still images or frames, and thus it is also referred to as motion pictures. Therefore, some descriptors for digital images are also applicable to video. Due to the limited storage space of monitoring equipment, the resolution of surveillance video is usually low. 2D phase congruency is very sensitive to the edges and texture of an image, and can therefore be used to describe the content of surveillance video.

2-D Phase Congruency
The authors of [27] provide a detailed introduction to 2D phase congruency. The Local Energy Model developed by Morrone et al. [28] postulates that sharp features are perceived at points of maximum phase congruency in an image. Phase congruency (PC) was first defined by Morrone [29] in terms of the Fourier series expansion of a signal at some location x as:

$$PC(x) = \max_{\overline{\phi}(x) \in [0, 2\pi]} \frac{\sum_n A_n \cos\left(\phi_n(x) - \overline{\phi}(x)\right)}{\sum_n A_n} \quad (1)$$

where $A_n$ is the amplitude of the nth Fourier component, $\phi_n(x)$ is the local phase of the nth Fourier component at position x, and $\overline{\phi}(x)$ is the weighted mean local phase angle at position x. If PC reaches its maximal value of 1, a noticeable change in the signal is detected, as in the case of the step edge of a square wave. Otherwise, PC takes values between 0 and 1, denoting that the harmonic phases begin to be inconsistent; in this case, the signal change weakens, or there is no feature change at all. The principle can be extended to a two-dimensional (2D) image signal.
When the harmonic phases have a high degree of consistency, the image contains sharp features, such as edges and lines. Kovesi [30] extended the 1D PC to allow the calculation of the 2D PC of an image by applying the 1D analysis over several orientations and combining the results. To calculate the 2D PC of a given image, the image is first convolved with a bank of log-Gabor filters. Let the image be denoted by I(x, y), and the even-symmetric and odd-symmetric filters at scale s and orientation o be denoted by $M^e_{so}$ and $M^o_{so}$, respectively. The response of each quadrature pair of filters is a vector:
$$\left[e_{so}(x,y),\; o_{so}(x,y)\right] = \left[I(x,y) * M^e_{so},\; I(x,y) * M^o_{so}\right] \quad (2)$$

where * is the convolution operator. From Equation (2), the amplitude and phase of this response are given by Equations (3) and (4):

$$A_{so}(x,y) = \sqrt{e_{so}(x,y)^2 + o_{so}(x,y)^2} \quad (3)$$

$$\phi_{so}(x,y) = \arctan\left(e_{so}(x,y),\, o_{so}(x,y)\right) \quad (4)$$

The 2D phase congruency is then calculated by:

$$PC(x,y) = \frac{\sum_o \sum_s W_o(x,y) \left( A_{so}(x,y)\, \Delta\phi_{so}(x,y) - T_o \right)^{+}}{\sum_o \sum_s A_{so}(x,y) + \varepsilon} \quad (5)$$

where $(\;)^+$ denotes that the enclosed quantity equals itself if it is positive and zero otherwise, $W_o(x,y)$ is a measure of the significance of the frequency spread, $\varepsilon$ is a small positive constant that prevents division by zero, $T_o$ is a quantity introduced to compensate for image noise, and $\Delta\phi_{so}(x,y)$ is a sensitive phase deviation function defined as:

$$\Delta\phi_{so}(x,y) = \cos\left(\phi_{so}(x,y) - \overline{\phi}_o(x,y)\right) - \left|\sin\left(\phi_{so}(x,y) - \overline{\phi}_o(x,y)\right)\right| \quad (6)$$

The result of 2D phase congruency processing for a video frame is shown in Figure 1.


The Correlation of Adjacent Frames
The contents of adjacent frames in a video are usually very close, while the contents of distant frames may be very different, whether the video is captured by a static camera or a hand-held camera, provided there is no shot change. Therefore, we use the correlation coefficient as a measure of the continuity of inter-frame content. The video frames are first processed through 2D PC, and the correlation coefficient between adjacent frames is defined in Equation (7):

$$r_k = \frac{\sum_{i=1}^{h}\sum_{j=1}^{w}\left(PC_k(i,j)-\overline{PC_k}\right)\left(PC_{k+1}(i,j)-\overline{PC_{k+1}}\right)}{\sqrt{\sum_{i=1}^{h}\sum_{j=1}^{w}\left(PC_k(i,j)-\overline{PC_k}\right)^2 \;\sum_{i=1}^{h}\sum_{j=1}^{w}\left(PC_{k+1}(i,j)-\overline{PC_{k+1}}\right)^2}} \quad (7)$$

where $r_k$ indicates the correlation coefficient between the kth and (k+1)th frames, $PC_k(i,j)$ represents the 2-D PC value of the kth frame at location (i, j), n is the total number of video frames, and $\overline{PC_k}$ is the average 2-D PC of the kth frame. Assuming the frame width of the video is w pixels and the height is h pixels, $\overline{PC_k}$ can be calculated using Equation (8):

$$\overline{PC_k} = \frac{1}{w\,h}\sum_{i=1}^{h}\sum_{j=1}^{w} PC_k(i,j) \quad (8)$$

As shown in Figure 2, we obtain a sequence of inter-frame correlation coefficients. If the total number of video frames is n, the length of the sequence is n − 1.
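As a minimal sketch of Equations (7) and (8), the correlation between the 2-D PC maps of two frames can be computed directly. Here the PC maps are plain nested lists of floats; a real implementation would first compute them with a log-Gabor filter bank as described in Section 2.1.

```python
import math

def mean2d(pc):
    """Average 2-D PC value of a frame (Equation (8))."""
    h, w = len(pc), len(pc[0])
    return sum(sum(row) for row in pc) / (h * w)

def corr(pc_k, pc_k1):
    """Pearson correlation r_k between the PC maps of frames k and k+1 (Equation (7))."""
    mk, mk1 = mean2d(pc_k), mean2d(pc_k1)
    num = den_a = den_b = 0.0
    for row_a, row_b in zip(pc_k, pc_k1):
        for a, b in zip(row_a, row_b):
            num += (a - mk) * (b - mk1)
            den_a += (a - mk) ** 2
            den_b += (b - mk1) ** 2
    return num / math.sqrt(den_a * den_b)

# Identical frames correlate perfectly; a disturbed frame correlates less.
f1 = [[0.1, 0.9], [0.8, 0.2]]
f2 = [[0.1, 0.9], [0.8, 0.2]]
f3 = [[0.9, 0.1], [0.2, 0.8]]
print(round(corr(f1, f2), 4))        # 1.0
print(corr(f1, f3) < corr(f1, f2))   # True
```

Applying `corr` to every adjacent pair of a video's PC maps yields the length-(n − 1) sequence shown in Figure 2.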
In the original video, the values of the correlation coefficients are close to each other, which means the curve of r is consistent. Figure 3 shows an illustration of frames 143–202 of the test video "person15_jogging". When the original video is subjected to frame-insertion tampering, the value of the correlation coefficient decreases at the tampering position, because the two frames used to calculate the coefficient are no longer originally adjacent, regardless of whether the inserted frames come from the same video or another video. The same is true of frame deletion. Figure 4 shows the correlation coefficients of forged videos. In Figure 4a, ten frames from the same video were inserted into the original video, and the result of frame deletion (thirteen frames deleted) is shown in Figure 4b. The red line indicates the tampering position; we can see that the correlation coefficient at the tampering position is much lower than the others.


The Variation of Consecutive Correlation Coefficients
Some characteristics of a video, such as the complexity of its texture or the speed of movement, affect the value of the correlation coefficient. For instance, in Figure 3, the correlation coefficient is larger when the video content changes slowly. Therefore, detection is not accurate if we simply use the value of the correlation coefficient. To restrain this phenomenon caused by the diversity of video content, some scholars have proposed calculating the variation of consecutive correlation coefficients, such as the quotients [24] or absolute differences [31] of consecutive correlation coefficients. The definitions are given in Equations (9) and (10):

$$\Delta r_1(k) = \frac{\max(r_k, r_{k+1})}{\min(r_k, r_{k+1})} \quad (9)$$

$$\Delta r_2(k) = \left|r_k - r_{k+1}\right| \quad (10)$$

where ∆r1 and ∆r2 represent the quotients and absolute differences of consecutive correlation coefficients, respectively. By construction, ∆r1 ≥ 1 and ∆r2 ≥ 0. We selected 60 frames from the video "person15_jogging" for testing. The curves of ∆r1 and ∆r2 for the test video are shown in Figure 5 (∆r1 on the left, ∆r2 on the right); the red line marks the tampering position. A pair of peaks appears there, which we call the abnormal points. Compared to r and ∆r2, ∆r1 is more reliable, as changes in video content have little impact on it. Therefore, the quotient of consecutive correlation coefficients (∆r1) is more suitable for inter-frame forgery detection, and we adopt ∆r1 as the measure of the variation of consecutive correlation coefficients.
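The two features can be sketched as below. Note that the exact quotient formula of [24] is not reproduced in the text, so this sketch assumes the max/min form, which satisfies the stated property ∆r1 ≥ 1; the r values are hypothetical.

```python
# Equations (9) and (10), assuming the max/min form of the quotient,
# which guarantees Delta_r1 >= 1 for every frame pair.
def quotients(r):
    return [max(r[k], r[k + 1]) / min(r[k], r[k + 1]) for k in range(len(r) - 1)]

def abs_diffs(r):
    return [abs(r[k] - r[k + 1]) for k in range(len(r) - 1)]

# A dip in r at a tampering point produces a pair of peaks in Delta_r1.
r = [0.98, 0.97, 0.55, 0.98, 0.97]           # the 0.55 mimics a tampering point
dr1 = quotients(r)
print([round(v, 2) for v in dr1])            # [1.01, 1.76, 1.78, 1.01]
```

The pair of values well above 1 corresponds to the pair of peaks visible in Figure 5.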


Detection Scheme for Abnormal Points
From Section 2.3, we can conclude that frame insertion and deletion influence the consistency of the variation of consecutive correlation coefficients. The peaks at the tampering position are the abnormal points that we need to detect. From Figure 5, we can see that the normal value of ∆r1 is near 1; this property yields a good clustering effect.
We hope to use a clustering algorithm to detect the outliers among the sample points. K-means clustering is sensitive to outliers and may converge to a local minimum; here, this sensitivity helps us cluster the outliers into one category accurately.
In this section, we will describe the k-means clustering algorithm and the detection procedure for abnormal points.

The k-Means Clustering Algorithm
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense or another) to each other than to those in other groups (clusters). The k-means (KM) clustering algorithm [32] is the most widely used clustering algorithm due to its simplicity and efficiency. Its objective is to minimize an objective function so as to assign each data point to its centroid.
Given a set of observations $(x_1, x_2, \ldots, x_n)$, where each observation is a d-dimensional real vector, k-means clustering aims to partition the n observations into k (k ≤ n) sets $S = \{S_1, S_2, S_3, \cdots, S_k\}$ so as to minimize the within-cluster sum of squares. In other words, its objective is to find:

$$\arg\min_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \left\|x - \mu_i\right\|^2$$

where $\mu_i$ is the mean of the points in $S_i$.
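The within-cluster sum of squares can be evaluated as follows for a toy 1-D partition (in our setting d = 1, since the samples are scalar ∆r1 values); the numbers are illustrative only.

```python
# Within-cluster sum of squares (WCSS): the k-means objective.
def wcss(clusters):
    total = 0.0
    for points in clusters:
        mu = sum(points) / len(points)           # cluster mean mu_i
        total += sum((x - mu) ** 2 for x in points)
    return total

good = [[1.0, 1.02, 1.04], [2.4, 2.6]]           # tight, well-separated clusters
bad  = [[1.0, 2.6], [1.02, 1.04, 2.4]]           # mixed clusters
print(wcss(good) < wcss(bad))                    # True
```

A partition that separates the near-1 values from the peak values has a much smaller WCSS, which is why minimizing this objective isolates the abnormal points.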
Combined with the characteristics of this paper, we introduce the step of k-means clustering algorithm as below.
Step 1. The variation of consecutive correlation coefficients ∆r1 is obtained by applying Equation (9), where the 2-D PC of each frame and the inter-frame correlation coefficients r are calculated according to Sections 2.1 and 2.2, respectively.
Step 2. Initialize the clustering parameters. Since our purpose is to classify the sample points into two categories, one cluster of normal points and one of abnormal points, we set the cluster number k = 2. The k-means algorithm depends heavily on the selection of the initial clustering centers. Since the outliers to be detected are the several largest values among the sample points, we select the two largest values of X as cluster S 1, with its centroid being their mean, and the remaining values of X as cluster S 2, with its centroid being the minimum of X. Each cluster stores the locations (indices) of its values.
Step 3. Assign each observation x p to the cluster whose mean yields the minimum distance between x p and centroid of cluster.
Because ∆r1 is one-dimensional, the distance is defined as the absolute difference between $x_p$ and a centroid m:

$$S_i^{(t)} = \left\{x_p : \left|x_p - m_i^{(t)}\right| \le \left|x_p - m_j^{(t)}\right| \;\; \forall j,\; 1 \le j \le k\right\}$$

where (t) denotes the iteration. Each $x_p$ is assigned to exactly one $S_i^{(t)}$, even if it could be assigned to two or more of them.
Step 4. Calculate the new means to be the centroids of the observations in the new clusters.
Step 5. Steps 3 and 4 are repeated until convergence has been reached.
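Steps 1–5 above can be sketched as a 1-D k-means with k = 2 and the initialization described in Step 2 (the two largest ∆r1 values seed the abnormal cluster S 1; the minimum of X seeds the normal cluster S 2). The input sequence below is hypothetical.

```python
# Minimal sketch of the abnormal-point detector (Steps 1-5), assuming the
# paper's initialization: mean of the two largest values for S1, min(X) for S2.
def detect_abnormal(x, max_iter=100):
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    m1 = (x[order[0]] + x[order[1]]) / 2.0       # centroid of S1 (abnormal)
    m2 = min(x)                                   # centroid of S2 (normal)
    for _ in range(max_iter):
        # Step 3: assign each point to the nearest centroid (1-D distance).
        s1 = [i for i in range(len(x)) if abs(x[i] - m1) < abs(x[i] - m2)]
        s2 = [i for i in range(len(x)) if i not in s1]
        # Step 4: recompute centroids.
        new_m1 = sum(x[i] for i in s1) / len(s1) if s1 else m1
        new_m2 = sum(x[i] for i in s2) / len(s2) if s2 else m2
        if new_m1 == m1 and new_m2 == m2:        # Step 5: converged
            break
        m1, m2 = new_m1, new_m2
    return s1, m1                                 # abnormal indices and S1 centroid

# A pair of peaks at positions 6 and 7 mimics a tampering point.
dr1 = [1.02, 1.01, 1.03, 1.00, 1.02, 1.01, 2.41, 2.35, 1.02, 1.01]
idx, centroid = detect_abnormal(dr1)
print(idx)                  # [6, 7]
print(round(centroid, 2))   # 2.38
```

The returned S 1 centroid is what Section 3.3 compares against the threshold T to decide whether the video is forged.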

Abnormal Points Detection Based on KM
In this section, we will illustrate the effectiveness of the feature and the feasibility of the detection method in detail.

Clustering Results of Original Video
Figure 6a shows sixteen sequential frames of an original video; the resolution is 640 × 480. ∆r1 was calculated using Equation (9), and the clustering results are shown in Figure 6c: '*' indicates cluster S 1, '☆' represents cluster S 2, and '+' denotes the centroids of the clusters. The centroids of the two clusters were 1.0389 and 1.0118, respectively. We find that both centroid values are very close to 1 when the video is not forged.

Clustering Results of Forged Video by Frame Insertion
In Figure 7a, the last eight frames are inserted frames taken from the same video as the first eight frames. In Figure 7b, there is a pair of peaks at the 7th and 8th frames because of the low correlation between the 8th and 9th frames. If we detect the abnormal points, we can prove that the video has been forged; the location of an abnormal point is where the tampering happened. In the clustering results, cluster S 1 represents the detected outliers. The centroids of the two clusters were 2.5144 and 1.0413, respectively. We find that the centroid of S 1 deviates from 1, due to the tampering operation.


Clustering Results of Forged Video by Frame Deletion
In Figure 8a, the 11th and 12th frames are not adjacent, because we deleted several frames between them. The consistency of ∆r1 was destroyed at the deletion position, and frames 10 and 11 were clustered into S 1. The centroids of S 1 and S 2 were 1.7757 and 1.0359, respectively.

Sometimes, in order to falsify evidence, the forger deletes all the frames that contain characters. For example, in Figure 9a we entirely removed a group of frames in which a subject passes. There appear to be no discontinuities, but in fact inconsistency appears at the deletion position, as shown in Figure 9b.


Clustering Results of Forged Video by Multiple Tampering
Sometimes a video suffers from multiple instances of tampering, for example a combination of frame insertion and deletion. As shown in Figure 10a, the video suffered three forgeries. Owing to the differences in tampering positions and in the number of frames involved, the heights of the peaks differ. However, all the tampering positions were precisely detected, as shown in Figure 10b. The centroids of S 1 and S 2 were 1.6289 and 1.0489, respectively.

From the above examples and analysis, we can conclude that the selected feature effectively reflects inter-frame manipulation, and that the KM algorithm is able to locate the tampering position. From the clustering results, it is not difficult to see that if the video is forged, the centroid of S 1 will be larger, deviating from 1 and corresponding to the peaks caused by tampering; thus, the points in cluster S 1 are the abnormal points. When the video is original, both cluster centers are very close to 1. Therefore, the value of the cluster center is the basis for judging whether the video has been tampered with. Moreover, the cluster center for frame insertion is generally larger than that for frame deletion, although this is not true in all cases.

Threshold Decision
According to the description in Section 3.2, the normal and abnormal points are clustered into two categories by k-means clustering; the value of the centroid is the basis for judging whether the video has been tampered with.
We tested the proposed method on the original sub-database, and analyzed the cluster center of S 1 of 599 original videos.The histogram of the value of centroid is shown in Figure 11.The x-axis indicates the value of centroid of the cluster, the y-axis indicates the frequency of occurrence of each value.

From Figure 11, we conclude that most of the clustering center values are less than 1.25, so we set T = 1.25 as the threshold to distinguish original video from forged video. When the centroid of S 1 is greater than T, the video is detected as forged; otherwise, the video is detected as original.
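The decision rule then reduces to a single comparison against T. The example centroid values below are the ones reported for the original video of Figure 6 and the frame-insertion forgery of Figure 7.

```python
# Tampering decision: compare the S1 centroid with the empirical threshold.
T = 1.25

def is_forged(s1_centroid, threshold=T):
    """A video is detected as forged when its S1 centroid exceeds T."""
    return s1_centroid > threshold

print(is_forged(1.0389))   # False (original video, Figure 6)
print(is_forged(2.5144))   # True  (frame-insertion forgery, Figure 7)
```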


Dataset
In our experiments, two datasets of different sources and resolutions are selected. The first dataset is drawn from the public Kungliga Tekniska Högskolan (KTH) dataset [33] and includes one original sub-dataset and four forged sub-datasets with different numbers of tampered frames. The videos contain six types of human actions, namely walking, jogging, running, boxing, hand clapping, and hand waving. The test videos are MPEG-compressed and were taken with a static background at a 50 fps frame rate and a resolution of 180 × 144. The numbers of tested videos in the five sub-datasets are 599, 599, 599, 599, and 598, respectively.
The second dataset has 480 videos: half of them are original, and the other half are forged by frame insertion and frame deletion. The number of tampered frames in each forged video is more than 20. The composition of the second dataset is shown in Table 1.

Evaluation Metrics and Method Assessment Procedure
In order to evaluate the validity of the scheme, we consider six performance indices: TPR (true positive rate, also known as recall), TNR (true negative rate), PPV (positive predictive value, also known as precision), Accuracy, F1 score, and Location Precision (LP). Accuracy is the average detection accuracy, the F1 score can be interpreted as the weighted average of precision and recall, and LP is the percentage of correct localizations among all correctly detected forged videos:

TPR = TP/(TP + FN), TNR = TN/(TN + FP), PPV = TP/(TP + FP),
Accuracy = (TP + TN)/(TP + TN + FP + FN), F1 = 2 × PPV × TPR/(PPV + TPR), LP = TPL/TP,

where TP (true positives) is the number of forged videos detected as forged; TN (true negatives) is the number of original videos detected as original; FP (false positives) is the number of original videos detected as forged; FN (false negatives) is the number of forged videos detected as original; and TPL is the number of correct localizations. Thus (TP + FN) is the total number of forged videos, (TN + FP) is the total number of original videos, and (TP + FP + TN + FN) is the size of the whole database.
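Under the standard definitions of these six indices, the computation can be sketched as follows (the formulas are reconstructed from the textual definitions, not copied from the paper; the function name is ours):

```python
def evaluation_metrics(tp, tn, fp, fn, tpl):
    """Compute the six indices from the confusion counts.
    tpl is the number of forged videos whose tampering position
    was also correctly localized."""
    tpr = tp / (tp + fn)                        # recall over forged videos
    tnr = tn / (tn + fp)                        # specificity over originals
    ppv = tp / (tp + fp)                        # precision
    accuracy = (tp + tn) / (tp + tn + fp + fn)  # overall detection accuracy
    f1 = 2 * ppv * tpr / (ppv + tpr)            # harmonic mean of PPV and TPR
    lp = tpl / tp                               # localization precision
    return dict(TPR=tpr, TNR=tnr, PPV=ppv, Accuracy=accuracy, F1=f1, LP=lp)
```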
In order to improve LP, the localization results are post-processed: we reject isolated suspected abnormal points, since a pair of peaks always appears at the tampering position.
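This post-processing step might look like the following sketch. The pairing window max_gap is our assumption, since the paper only states that genuine tampering produces a pair of peaks; the function name is ours.

```python
def reject_isolated_peaks(abnormal_idx, max_gap=10):
    """Keep only abnormal points that occur in pairs within max_gap frames
    of each other; isolated points are treated as false alarms."""
    idx = sorted(abnormal_idx)
    keep = set()
    for a, b in zip(idx, idx[1:]):
        if b - a <= max_gap:
            keep.update((a, b))
    return sorted(keep)
```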

Conclusions
In this paper, an inter-frame forgery detection scheme based on 2-D phase congruency and k-means clustering was proposed for surveillance video. We first calculate the 2-D PC of each frame. Then, the correlation coefficients of adjacent frames and the variation of consecutive correlation coefficients are obtained. Finally, the discontinuous points caused by tampering are detected by the k-means clustering algorithm. Experimental results show that our approach can detect and localize tampering positions efficiently. A shortcoming is that when frames are deleted at the beginning or end of the video, the method cannot detect the tampering. Moreover, the TPR and LP of this scheme are higher for frame insertion than for frame deletion. Therefore, in future work, we will focus on finding a better solution to improve the precision of frame-deletion detection.
In addition, the method locates inter-frame tampering without distinguishing whether the inserted frames are copied from the same video or spliced from another video. In future work, we will try to distinguish between these frame insertion forgeries.

Figure 6. Original video frames and clustering results. (a) Sixteen frames from an original video; (b) the curve of ∆r1; (c) clustering results.

Figure 7. Frame insertion and clustering results. (a) Sixteen frames from a video tampered by frame insertion; (b) the curve of ∆r1; (c) clustering results.

Figure 8. Frame deletion and clustering results. (a) Sixteen frames from a video tampered by frame deletion; (b) the curve of ∆r1; (c) clustering results.

Figure 9. Frame deletion and clustering results. (a) Sixteen frames from a video tampered by frame deletion; (b) the curve of ∆r1; (c) clustering results.

Figure 11. The histogram of the cluster center.

Table 1. The composition of the second video dataset.

Table 5. The time consumption of the algorithm.