Sensors
  • Article
  • Open Access

18 August 2024

An Anti-Forensics Video Forgery Detection Method Based on Noise Transfer Matrix Analysis

1 Institute of Intelligent Rehabilitation Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
2 Institute of Forensic Science of Shanghai Municipal Public Security Bureau, Shanghai 200083, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Section Sensing and Imaging

Abstract

The dispute over the authenticity of video has become a hot topic in judicial practice in recent years. Although detection methods are updated rapidly, methods for determining authenticity still have limitations, especially against high-level forgery. Deleting an integral group of pictures (GOP) length of frames from a static scene can remove key information from a video, leading to unjust sentencing. Anyone can perform such an operation using publicly available software, thus evading state-of-the-art detection methods. In this paper, we propose a detection method based on noise transfer matrix analysis. A pyramid structure and a weight learning module are adopted to improve the detection rate and reduce the false positive rate. In total, 80 videos subjected to delicate anti-forensic forgery operations were examined to compare the detection performance of the proposed method with that of three previously reported methods. In addition, two of the latest learning-based methods were included in our experiments for evaluation. The experimental results show that the proposed method significantly improves the detection of frame deletion points compared with traditional and learning-based methods, especially in low false positive rate (FPR) intervals, which is meaningful in forensic science.

1. Introduction

Digital videos are captured by cameras deployed in many scenarios, such as streets, schools, subways, and shopping malls, and even a cell phone carried by a passerby can serve as a camera. Every minute, 100 hours’ worth of video is uploaded to YouTube [1]. Digital videos have been widely used in many fields for their intuitive characteristics.
In forensic science, tampered videos can convey inaccurate messages, which mislead the audience. When used in court, tampered video can have severe effects. For example, an attacker could delete a sequence of frames where a person is walking through an accident scene to destroy any evidence that the person was present [2]. Furthermore, digital audio–visual materials can be easily modified for frame insertion and deletion [3]. The authenticity of digital data must be confirmed before the data can be accepted. Digital watermarks and blind detection techniques offer authentication solutions. Digital watermarks rely on hardware for inserting watermark information in the recording process, and blind detection techniques have become a hot topic in forensic science. Scholars have proposed some useful standards to apply in detection. Common forms of tampering with digital video include deleting frames and cloning or duplicating frames or parts of a frame, as these operations are easy to implement and perform well [4].
Methods for video forensics can be divided into intra-frame and inter-frame forgery detection [5,6]. Intra-frame detection focuses on identifying forged regions within a frame, while inter-frame detection aims to determine temporal-domain forgery, such as frame deletion or frame insertion. As reported by Milani et al. [7], detection methods can be divided into three categories: recording equipment artifacts, geometric/physical inconsistencies, and recoding traces. With the rapid development of neural networks, multiple networks have been used to detect video forgery [8,9,10,11,12,13,14,15].
The basis of equipment artifact detection is the unique characteristics of the device. Noise is a simple feature that illustrates equipment artifacts: pixel intensity is corrupted by photon counting, readout, and quantization noise. Aloraini et al. [16] conducted sequential and patch analyses based on noise for object removal detection in video. Fayyaz et al. [17] developed an improved surveillance video forgery detection technique to address forged frames using induced sensor pattern noise. If the edited video has frames inserted from the same device, or only consecutive frames removed, methods that rely on recording equipment artifacts [11,16,18,19,20,21,22,23] become useless for verifying the video.
Detecting geometric/physical inconsistencies relies on features that remain consistent throughout an entire video; inter-frame forgery disrupts this consistency. Li et al. [5] proposed a novel frame deletion detection method based on optical flow orientation variation. Robust principal component analysis is applied to extract moving objects from video sequences and compute their descriptors. They examined more than 300 real-world videos to evaluate the method’s performance; the results demonstrated a true positive rate of 90.12% with a 7.71% false alarm rate. Fadl et al. [24] proposed a fused inter-frame forgery detection method based on the histogram of oriented gradients (HOG) and motion energy image (MEI). Shehnaz et al. [25] proposed a histogram similarity pattern-detection approach combining HOG and local binary patterns (LBPs), using a supervised support vector machine (SVM) classifier trained to detect video tampering based on histogram similarity metrics; this method can recognize all forms of inter-frame forgery. Most detections of geometric/physical inconsistencies [5,8,9,10,24,26,27,28,29,30,31,32] can be fooled by removing frames from static scenes, because an abnormal motion trace is the key cue for detecting inconsistencies.
It should be pointed out that some reported approaches [1,33,34] can detect frame deletion in static scenes, including integer-length group of pictures (GOP) deletion. This is significant in forensics, as frame deletion in static scenes is a common way to tamper with videos while evading forensic analysis. A recoding trace is an inevitable feature of a forged video, as a tampered video must be re-encoded at least once after manipulation. Wang et al. [35] proposed an algorithm to detect double compression after frame removal. After frames are dropped from a video, an unusual phenomenon appears in the prediction error curve of the predictive frames: the deletion shifts some frames from one GOP sequence to another. When a predictive frame moves from one GOP to another, its prediction error increases periodically, because the relationship between the predictive frame and its intra-frame is destroyed by the deletion manipulation. This is solid proof that multiple compressions have occurred in the video. Similar approaches with modified models on this basis have been reported in the literature [2,36,37,38,39,40,41,42,43,44]. In recent years, network models have also been introduced to improve the performance of GOP methods. Hong et al. [42] proposed a multi-layer perceptron scheme to classify the features extracted from a GOP. Such attempts help make the evaluation criteria of GOP methods more objective.
Although existing digital forensic techniques have developed rapidly in recent decades, and many approaches have achieved remarkable performance on publicly available datasets [45], researchers have demonstrated that many examiners can be deceived if forgers use anti-forensic techniques [36]. In this paper, our detection target is the anti-forensic operation of removing an integer GOP length of frames from a static scene, as shown in Figure 1. The GOP structure is described in Section 2.4. This operation can alter a video by hiding key information, which can mislead courts in sentencing. In addition, it can easily be performed by anyone who understands the principles of current detection methods. The performance of a network-based approach depends highly on the training set, and to date no training set covering such anti-forensic operations has been verified for network-based approaches. Methods for detecting equipment artifacts and geometric/physical inconsistencies can be fooled by frame deletion in static scenes. Compared with inconsistency detection methods, multiple recoding detection methods can verify the authenticity of videos even when frames have been removed from static scenes. However, such methods have two primary drawbacks:
Figure 1. Sketch map of anti-forensics operation.
  • Multiple recoding traces are solid proof of a loss of originality. However, multiple recoding traces do not necessarily indicate that the video has been tampered with, since legitimate transmission processes can also introduce them.
  • Integral GOP frame deletion in static scenes is an effective attack against most multiple recoding detection approaches based on Wang et al.’s [35] principle.
This paper proposes an anti-forensic forged video detection method based on a noise transfer matrix analysis. The primary contributions of the proposed method are as follows:
  • We propose a novel anti-forensic detection approach to discern forged videos (integral GOP frame deletion in static scenes), using the combination of a pyramid structure and an adaptive weight adjustment module.
  • We adopt a pooling operation and pyramid structure to extract noise features, which are available for subsequent analysis with suppressed sensitivity.
  • A normalization operation and the combination of successive frame results reduce the influence of variable dimensions and video recording interference.
  • Incorporating an adaptive weight adjustment module ensures the algorithm’s universality and fast learning ability from only a single video across diverse environments, thus meeting the practical requirements of forensic science.
  • Original videos and visual examples are discussed in detail in the paper, intuitively displaying the characteristics of the various detection methods.
  • The receiver operating characteristic (ROC) results demonstrate significant enhancement, particularly in the low false positive rate (FPR), indicating highly improved performance in terms of the forensic principle of no punishment in doubtful cases.
The rest of the paper is organized as follows: Section 2 presents related work on the detection of forged videos. Section 3 first notes the particularity of the experimental materials, as integer-length GOP deletion was not involved in past research, and then formulates the correlation of noise using a noise transfer matrix analysis. Section 4 provides the experimental results and a discussion, and lastly, conclusions are given in Section 5.

3. Proposed Approach

Before introducing the proposed method, we must point out the particularity of the dataset tested in this paper. Integer-length GOP deletion in static scenes is the anti-forensic operation we aim to detect. To the best of our knowledge, there is no specialized dataset for verifying such detection performance. Therefore, the experiments reported in Section 4 were conducted on our own dataset, which is described in detail there.

3.1. Methods

In this section, we present the operation of the proposed video tampering detection method in detail. As stated in Section 1, the inter-frame noise correlation between adjacent frames of a natural video is inherently high: the noise stream is continuous in the video, and the global and local features are correlated across successive frames, changing relatively gently over the time series. Changes in noise features are caused by coupling between sensors, sensor characteristics and defects, the temperature and working conditions of the equipment, and so on. Integer GOP deletion in a static scene is an effective operation for evading most existing detection approaches, but the noise correlation is still adversely affected, leaving a trace that can be detected. We propose an anti-forensic forgery detection method based on a noise transfer matrix with a pyramid structure and adaptive weight adjustment. The method can be summarized in three main steps: (1) extracting noise, (2) computing the noise transfer matrix, and (3) adjusting the weights of the transfer matrix. Figure 2 shows a flowchart of the proposed detection method.
Figure 2. Flowchart of proposed detection method.

3.1.1. Noise Extraction

A color image is converted to grayscale before processing. For static scenes, the noise intensity can be calculated by the averaging technique using Equation (1), where $N_{x,y,m,n}$ is the noise intensity at position $(x,y)$ of the $n$th frame belonging to the $m$th GOP, $I_{x,y,m,n}$ is the intensity at position $(x,y)$ of the $n$th frame belonging to the $m$th GOP, and $\bar{I}_{x,y,m}$ is the average intensity at position $(x,y)$ of the $m$th GOP containing the $n$th frame:

$$N_{x,y,m,n} = I_{x,y,m,n} - \bar{I}_{x,y,m} \tag{1}$$
As reported by Li et al. [51], extracting the noise intensity directly at the pixel level does not work well. We adopted a three-layer pyramid structure inspired by Zhao et al. [33] to extract noise considering both local and structural features. First, an overlapped 16 × 9 window is used to extract local noise features, including the maximum, mean, and standard deviation. Then, a non-overlapped 2 × 2 window is applied twice in succession to the maximum noise map extracted in the previous layer, and the corresponding maximum, mean, and standard deviation are extracted from each resulting noise map. The sizes of the three feature maps are 120 × 120, 60 × 60, and 30 × 30, respectively. A flowchart of the noise feature maps is shown in Figure 3. Windows of other sizes (32 × 18 and 64 × 36) were also tested to optimize the window size; the results of the parameter optimization experiments are described in Section 4.
Figure 3. Flowchart of 3-layer noise feature maps.
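To make this step concrete, the following Python sketch (all function names and the window stride are our own assumptions, not the authors' code) computes the noise residual of Equation (1) and the three-layer maximum/mean/standard-deviation feature maps; the layer-1 stride is chosen so that a 1920 × 1080 frame yields the 120 × 120, 60 × 60, and 30 × 30 maps reported above.

```python
import numpy as np

def extract_noise(gop_frames: np.ndarray) -> np.ndarray:
    """Noise residual of Eq. (1): frame intensity minus the per-pixel
    average intensity of the GOP the frame belongs to.
    gop_frames: (n_frames, H, W) grayscale array."""
    return gop_frames - gop_frames.mean(axis=0)

def window_stats(img, wh, ww, sh, sw):
    """Max / mean / standard-deviation maps over (wh x ww) windows
    slid with vertical step sh and horizontal step sw."""
    rows = range(0, img.shape[0] - wh + 1, sh)
    cols = range(0, img.shape[1] - ww + 1, sw)
    mx = np.empty((len(rows), len(cols)))
    mn, sd = np.empty_like(mx), np.empty_like(mx)
    for a, r in enumerate(rows):
        for b, c in enumerate(cols):
            w = img[r:r + wh, c:c + ww]
            mx[a, b], mn[a, b], sd[a, b] = w.max(), w.mean(), w.std()
    return mx, mn, sd

def pyramid_features(noise: np.ndarray):
    """Three-layer pyramid: a 16 x 9 window at layer 1 (the stride chosen here
    makes a 1920 x 1080 frame yield a 120 x 120 map), then non-overlapped
    2 x 2 windows applied twice to the previous layer's maximum map
    (120 -> 60 -> 30)."""
    layers = []
    mx, mn, sd = window_stats(noise, wh=9, ww=16, sh=9, sw=16)
    layers.append((mx, mn, sd))
    for _ in range(2):  # layers 2 and 3
        mx, mn, sd = window_stats(mx, wh=2, ww=2, sh=2, sw=2)
        layers.append((mx, mn, sd))
    return layers
```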

3.1.2. Noise Transfer Matrix Computation

After noise extraction, the noise transfer matrix is computed as shown in Equation (2). A schematic diagram is shown in Figure 4. Similar computations are conducted on the maximum, mean, and standard deviation values in each layer. A total of nine noise transfer matrices are computed to evaluate the distance between adjacent frames.
$$C_{i,j,n} = \frac{N_{i,j,n}}{N_{i,n}} \tag{2}$$
Figure 4. Schematic diagram of noise transfer matrix.
In Equation (2), $C_{i,j,n}$ stands for the transfer probability, $N_{i,j,n}$ stands for the amount of noise feature intensity transferred from $i$ to $j$ at corresponding positions in the $n$th and $(n+1)$th frames, and $N_{i,n}$ denotes the total amount of noise features with intensity $i$ in the $n$th frame. The distance $D_{n+1}$ between successive frames can be calculated using Equations (3)–(6), where $D_{n+1,\max}$ represents the distance between the maximum noise features extracted from the three-layer structure; a similar operation is used to compute the distances of the mean and standard deviation noise. If the dimensions of two matrices are not equal, the smaller matrix is zero-padded to match the dimension of the larger one. Peaks in a single noise feature can be caused by interference in the video, while peaks present in multiple noise features are more likely caused by frame deletion; an example is shown in Figure 5, where black circles indicate the extracted noise features at frame deletion points and the blue circle indicates an occasional spike in a single noise feature.
$$D_{n+1,\max} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \left( C_{i,j,\max,t,n} - C_{i,j,\max,t,n+1} \right)^2 \tag{3}$$
$$D_{n+1,\mathrm{mean}} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \left( C_{i,j,\mathrm{mean},t,n} - C_{i,j,\mathrm{mean},t,n+1} \right)^2 \tag{4}$$
$$D_{n+1,\mathrm{sd}} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \left( C_{i,j,\mathrm{sd},t,n} - C_{i,j,\mathrm{sd},t,n+1} \right)^2 \tag{5}$$
$$D_{n+1} = \sqrt[3]{D_{n+1,\max} \times D_{n+1,\mathrm{mean}} \times D_{n+1,\mathrm{sd}}} \tag{6}$$
Figure 5. Example of multiple noise features extracted in scene 11: (a) maximum noise; (b) mean noise; (c) standard deviation noise; (d) combined result.
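A minimal sketch of this computation follows, assuming the noise feature maps are quantized into a fixed number of intensity bins (the paper lets the matrix dimension vary and zero-pads mismatched matrices; fixed bins sidestep that step). The bin count and function names are our own choices.

```python
import numpy as np

def transfer_matrix(feat_n, feat_n1, n_bins=32):
    """Eq. (2): C[i, j] is the probability that a position holding quantized
    noise intensity i in frame n holds intensity j in frame n + 1."""
    lo = min(feat_n.min(), feat_n1.min())
    hi = max(feat_n.max(), feat_n1.max()) + 1e-9
    q_n = ((feat_n - lo) / (hi - lo) * n_bins).astype(int).ravel()
    q_n1 = ((feat_n1 - lo) / (hi - lo) * n_bins).astype(int).ravel()
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (q_n, q_n1), 1)            # N[i, j]: transfer counts
    totals = counts.sum(axis=1, keepdims=True)   # N[i]: positions at intensity i
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def feature_distance(mats_n, mats_n1):
    """Eqs. (3)-(5): sum of squared element-wise differences between the
    transfer matrices of consecutive frame pairs over the three layers."""
    return sum(((a - b) ** 2).sum() for a, b in zip(mats_n, mats_n1))

def combined_distance(d_max, d_mean, d_sd):
    """Eq. (6): geometric mean of the three per-statistic distances."""
    return (d_max * d_mean * d_sd) ** (1.0 / 3.0)
```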
The same distance value has different meanings at different matrix dimensions. An example of the changing dimension over an entire video is shown in Figure 6. In a short video, the slow drift of the dimension is negligible; over hundreds of frames, however, the variation in dimension can reach 50%, which should not be overlooked, as reported by Li et al. [51]. With stronger correlations between successive frames, the transfer matrix approaches an identity matrix; with weaker correlations, it approaches a uniform matrix. To normalize these extreme situations, the results computed by Equations (3)–(5) are multiplied by a regulation coefficient $K$ (Equation (7)) to eliminate the effect of different scales, where $n$ denotes the dimension of the transfer matrix. Because the dimension of the transfer matrix is relatively large, the normalization is simplified to multiplication by the coefficient in Equation (8).
$$K = \frac{n^2}{n-1} \tag{7}$$
$$K = n \tag{8}$$
Figure 6. Schematic diagram of dimensions of transfer matrix in a sample video.
The correlation of noise between adjacent frames can be influenced by multiple factors. In successive frames, minor disturbances reduce the continuity of noise to some extent. With integral GOP deletion, frames from different GOPs become back-to-back neighbors, reducing the correlation between their noise features. Two consecutive high signals thus appear at the frame deletion point (FDP), as explained in Figure 7. Taking the product of neighboring noise correlations amplifies the difference between minor disturbances in the video and integral GOP removal. The final correlation of noise continuity is enhanced using Equation (9), where ln() denotes the natural logarithm. An example of this enhanced operation is shown in Figure 8, where the black circles indicate the selection of the genuine FDP and the blue circles indicate other peaks. It can be seen that the enhanced operation highlights the abnormal fluctuation of noise features caused by frame removal.
$$D_{n,f} = \ln\left( D_{n-1} \times D_n \right) \tag{9}$$
Figure 7. Diagram of two consecutive peaks appearing in FDP.
Figure 8. Example of enhanced operation in scene 79: (a) original result; (b) modified result from enhanced operation.
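The normalization and enhancement steps reduce to a few lines. The sketch below uses the simplified coefficient $K = n$ of Equation (8) and the log-product enhancement of Equation (9); the function names are ours.

```python
import numpy as np

def normalize_distance(d, dim):
    """Eqs. (7)-(8): scale a raw distance by the regulation coefficient so
    videos whose transfer matrices differ in dimension stay comparable;
    the simplified K = n of Eq. (8) is used here."""
    return d * dim

def enhance(distances):
    """Eq. (9): an FDP produces two consecutive high distances, so the
    log-compressed product of neighbouring distances amplifies genuine
    deletion points relative to one-frame disturbances."""
    d = np.asarray(distances, dtype=float)
    out = np.full_like(d, np.nan)                # no value for the first frame
    out[1:] = np.log(d[:-1] * d[1:])             # D_{n,f} = ln(D_{n-1} * D_n)
    return out
```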

3.1.3. Adjusting Transfer Matrix Weights

Equations (3)–(5) assume that all transfer matrix elements contribute equally. This is a simple assumption but not a reasonable one: some elements change rapidly even between successive frames, while others change slowly despite the removal of an entire GOP. In practical forensic cases, disputed digital data usually last tens of seconds to a few minutes, without supplementary digital data. The weight modification should therefore be practical and efficient.
Inspired by the above, we introduce a self-adaptive module to adjust the weights of the transfer matrix. The elements that participate in weight modification are decided in the first step. The weight amendment is learned from successive frames and discontinuous frames. In this paper, we use the length of a whole GOP (25 frames) for discontinuous-frame contribution learning and weight adjustment. A sketch map is shown in Figure 9. Comparison experiments with various intervals are presented in Section 4. In general, weights related to break points between frames are strengthened, and weights related to successive frames are weakened. All weights are initialized to 1. First, the elements of the transfer matrix, for both successive and discontinuous frames, are arranged in descending order by Equations (10) and (11):
$$Var_{s,i,j} = \left( C_{i,j,t,n} - C_{i,j,t,n+1} \right)^2 \tag{10}$$
$$Var_{d,i,j} = \left( C_{i,j,t,n} - C_{i,j,t,n-25} \right)^2 \tag{11}$$
Figure 9. Sketch map of weight adjustment.
The top elements of the two sets represent the maximum variance in successive frames and in discontinuous frames, respectively. If an element appears in only one set, it provides useful information for weight adjustment; if it appears in both sets, opposite information is learned from the two descending-order sets, which confuses the weight adjustment. Based on the above, Equation (12), inspired by the information entropy described by Zhao et al. [33], is established to determine the weights participating in the modification:
$$Entropy_N = -\sum_{k=1}^{N} Var_{s,k} \times C_{i,j} \log_2 C_{i,j} - \sum_{k=1}^{N} Var_{d,k} \times C_{i,j} \log_2 C_{i,j} \tag{12}$$
where $Var_{s,k}$ and $Var_{d,k}$ represent the top $N$ elements of the arranged sets, and $C_{i,j}$ is the corresponding transfer probability of the noise feature. The extreme value of the above function determines the elements participating in weight adjustment. Then, the weights of the elements selected by Equations (10) and (12) are weakened by Equation (13), while the weights of the elements selected by Equations (11) and (12) are strengthened by Equation (14):
$$Weights_{w,N} = \frac{\delta}{GL} \times \log_{10}\left( \left( C_{i,j,N,n} - C_{i,j,N,n+1} \right)^2 + 0.000001 \right) \tag{13}$$
$$Weights_{s,t} = \frac{\sum_{N=1}^{N} Weights_{w,N}}{\sum_{N=1}^{N} \left( C_{i,j,N,n} - C_{i,j,N,n+1} \right)^2} \times \left( C_{i,j,t,n} - C_{i,j,t,n+1} \right)^2 \tag{14}$$
$Weights_{w,N}$ represents the $N$th element of the set determined by Equation (10), $GL$ represents the frame length participating in weight adjustment, $C_{i,j,N,n}$ represents the corresponding value in the transfer matrix, and $\delta$ represents the power of the adjustment. $Weights_{s,t}$ represents the $t$th element of the set determined by Equation (11), and $C_{i,j,t,n}$ is the corresponding value in the transfer matrix. According to the comparison experiments conducted to optimize the parameter, the intensity of $\delta$ influences the performance of the weight adjustment module; the results are shown in Section 4. Finally, the noise features computed by Equations (3)–(5) are modified by the weight adjustment module using Equations (15)–(17), where $\omega$ represents the corresponding weight after adjustment:
$$D_{n+1,\max} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \omega_{i,j,\max,t} \times \left( C_{i,j,\max,t,n} - C_{i,j,\max,t,n+1} \right)^2 \tag{15}$$
$$D_{n+1,\mathrm{mean}} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \omega_{i,j,\mathrm{mean},t} \times \left( C_{i,j,\mathrm{mean},t,n} - C_{i,j,\mathrm{mean},t,n+1} \right)^2 \tag{16}$$
$$D_{n+1,\mathrm{sd}} = \sum_{i} \sum_{j} \sum_{t=1}^{3} \omega_{i,j,\mathrm{sd},t} \times \left( C_{i,j,\mathrm{sd},t,n} - C_{i,j,\mathrm{sd},t,n+1} \right)^2 \tag{17}$$
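The following heavily simplified sketch conveys the intent of Equations (10)–(17). The fixed top_n selection and the multiplicative update rules are our stand-ins for the entropy criterion of Equation (12) and the exact updates of Equations (13) and (14), which we do not reproduce.

```python
import numpy as np

def adjust_weights(C_n, C_n1, C_gap, delta=3.0, top_n=50):
    """Sketch of the adaptive weight module. C_n / C_n1 are transfer matrices
    of successive frames; C_gap is the matrix one GOP (25 frames) earlier.
    Elements that vary strongly between successive frames (Eq. 10) are treated
    as noisy and weakened; elements that vary strongly across the one-GOP gap
    (Eq. 11) are treated as sensitive to discontinuity and strengthened."""
    w = np.ones_like(C_n)
    var_s = (C_n - C_n1) ** 2                    # Eq. (10)
    var_d = (C_n - C_gap) ** 2                   # Eq. (11)
    flat_s = np.argsort(var_s, axis=None)[::-1][:top_n]
    flat_d = np.argsort(var_d, axis=None)[::-1][:top_n]
    sel_s = set(map(tuple, np.transpose(np.unravel_index(flat_s, var_s.shape))))
    sel_d = set(map(tuple, np.transpose(np.unravel_index(flat_d, var_d.shape))))
    ambiguous = sel_s & sel_d        # appears in both sets: skipped (Eq. 12 intent)
    norm_s = sum(var_s[ij] for ij in sel_s - ambiguous) + 1e-12
    norm_d = sum(var_d[ij] for ij in sel_d - ambiguous) + 1e-12
    for ij in sel_s - ambiguous:                 # weaken fast-changing elements
        w[ij] /= 1.0 + delta * var_s[ij] / norm_s
    for ij in sel_d - ambiguous:                 # strengthen discontinuity-sensitive ones
        w[ij] *= 1.0 + delta * var_d[ij] / norm_d
    return w

def weighted_distance(w, C_n, C_n1):
    """Eqs. (15)-(17): weighted sum of squared element-wise differences."""
    return (w * (C_n - C_n1) ** 2).sum()
```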

4. Experiments

In this section, we first introduce the dataset and coarse filter, and then describe the parameter optimization, ablation, and comparison experiments conducted to verify and test the performance of the proposed method. Finally, visual examples and a discussion are given at the end of the section.

4.1. Dataset

To our knowledge, there is no specialized video database for detecting integer GOP deletion in static scenes. This is a blind spot in publicly reported research, and at the same time it is meaningful in forensic practice. Therefore, we created our own database by shooting 80 static-scene videos. Analog and digital signals are the two main kinds of video in practical use, so test videos containing both analog and digital signals provide more stringency for forensic science. We recorded 40 samples with a Sony HVR-25C as analog signals saved on magnetic tape and 40 samples with a Sony NXCAM as digital signals saved on memory cards. All videos are 500 frames long and consist of static scenes with a frame rate of 25 frames per second, a resolution of 1920 × 1080 pixels, a GOP length of 25, and other parameters set to default values. Half of the videos are original, and half were tampered with by deleting 100 frames (4 GOPs) at the 250th frame. Example images from the videos are shown in Figure 10. None of the genuine FDPs could be observed by direct frame-by-frame inspection, even by experienced forensic examiners.
Figure 10. Examples of static scenes tested in this paper: (a) static scene 4; (b) static scene 8; (c) static scene 14; (d) static scene 22; (e) static scene 30; (f) static scene 37.

4.2. Coarse Filter

Before presenting our experimental results, we first describe the preliminary detection of outliers. As reported in previous work [1,33,34], I frames are encoded independently in each GOP without any reference frames, so the correlation value drops significantly between an I frame and the last frame of the previous GOP. This is a periodic effect caused by the GOP structure. Our concern is whether an abnormal reduction point is caused by this periodic effect or by integral GOP deletion. We adopted Chebyshev’s inequality to filter the massive amount of useless data in the curve using Equations (18) and (19) for the results of the three compared methods mentioned in Section 4; similar operations were conducted in [1,34]. $T_1$ and $T_2$ represent the filter thresholds; values higher than the threshold are filtered out. $\mu_1$ and $\mu_2$ represent the mean values of the correlation distribution and its derived function for the given video, $\sigma_1$ and $\sigma_2$ the corresponding standard deviations, and $p_1$ and $p_2$ the number of standard deviations from the mean. In our paper, this value is set to 1 as a coarse processing stage. Samples are shown in Figure 11 and Figure 12.
$$T_1 = \mu_1 - p_1 \times \sigma_1 \tag{18}$$
$$T_2 = \mu_2 - p_2 \times \sigma_2 \tag{19}$$
Figure 11. Results for scene 3: (a) original video; (b) result of LBP method; (c) result of MNMI method; (d) result of Haralick coded method.
Figure 12. Filtered results of LBP method for scene 3 (blue circles).
After filtering by Equations (18) and (19), a portion of the points are selected for further calculation, as indicated by the blue circles in Figure 12. The mean and standard deviation of the filtered points are computed to evaluate each method's detection performance using the receiver operating characteristic (ROC) curve. The true positive rate (TPR) and false positive rate (FPR) at a variable number of standard deviations from the mean are used to draw the ROC curve. The numbers in the tables represent the deviation, in standard deviations from the mean, of each true or false positive detection. In our paper, 0.5 standard deviations is used as the interval of the ROC curve.
Unlike the compared methods [1,33,34], in our method, peak points indicate a low correlation between the extracted noise features. A similar filter operation is therefore performed using Equations (20) and (21) to extract points for further processing. $T_3$ and $T_4$ represent the filter thresholds; values lower than the thresholds are filtered out. $\mu_3$ and $\mu_4$ represent the mean values of the correlation distribution and its derived function for the given video, $\sigma_3$ and $\sigma_4$ the corresponding standard deviations, and $p_3$ and $p_4$ the number of standard deviations from the mean; the value is again set to 1 as a coarse processing stage. A sample is shown in Figure 13. It should be pointed out that an obvious trough was observed at the initial stage of the recording in the results of all proposed noise-based methods. This phenomenon can be explained by the unstable noise features at the beginning of recording, so the first 100 frames were excluded from subsequent filtering. The numbers in the tables have the same meaning as those used to evaluate the compared methods, and the ROC curve was used to compare the detection performance of the proposed method with that of the other methods.
$$T_3 = \mu_3 + p_3 \times \sigma_3 \tag{20}$$
$$T_4 = \mu_4 + p_4 \times \sigma_4 \tag{21}$$
Figure 13. Filtered results of proposed method for scene 3 (blue circles).
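A sketch of the coarse filter, covering both directions (keeping drops for the compared methods and peaks for the proposed one); the parameter names are ours, and the secondary threshold on the derived function (Equations (19) and (21)) is omitted for brevity.

```python
import numpy as np

def coarse_filter(scores, p=1.0, keep="low", skip_head=0):
    """Chebyshev-style coarse filter (Eqs. 18-21). For the compared methods a
    genuine FDP is a *drop* in correlation, so values above T = mu - p*sigma
    are discarded (keep="low"); for the proposed method an FDP is a *peak* in
    distance, so values below T = mu + p*sigma are discarded (keep="high").
    skip_head drops the unstable opening frames (100 in the paper)."""
    s = np.asarray(scores, dtype=float)[skip_head:]
    mu, sigma = s.mean(), s.std()
    if keep == "low":
        mask = s < mu - p * sigma                # keep abnormal drops
    else:
        mask = s > mu + p * sigma                # keep abnormal peaks
    idx = np.flatnonzero(mask) + skip_head       # indices in the original series
    return idx, s[mask]
```

The surviving candidates are then ranked by their deviation from the mean in units of sigma, which is the quantity the later tables report.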

4.3. Parameter Optimization Experiments

Control-variable experiments were conducted to determine the optimal parameters for our method based on the 80 videos described in Section 4.1. Scenes 1–20 and 41–60 are original videos; in the remaining videos, integral GOPs were carefully removed from the static scenes. The coarse filter described in Section 4.2 was used to extract points for further analysis. Noise feature extraction windows of different sizes (16 × 9, 32 × 18, 64 × 36), different GOP lengths involved in weight adjustment (from 0.25 to 2 GOPs, in intervals of 0.25 GOP), and different weight adjustment strengths (from 1 to 8, in intervals of 1) were tested in every combination to optimize the parameters of the proposed method. Table 1 shows the comparison results, and Figure 14 shows the ROC curves. Numbers in the “true positive” and “false positive” columns of the table represent the deviation from the mean in standard deviations.
Table 1. Comparison results for various extraction window sizes.
Figure 14. ROC curves for various extraction window sizes.
Figure 14 and Table 1 show the results for different extraction window sizes with the same GOP length involved in weight adjustment (1 GOP) and the same weight adjustment strength (3). The results indicate that a large extraction window blunts the ROC curve, especially on the left. With regard to the forensic principle of no punishment in doubtful cases, the left part of the curve matters far more than the right, so we adopted a 16 × 9 noise feature extraction window in this paper. This can be explained by noise being a local feature of the image: an appropriate window size optimizes filtering performance, while a larger window loses too many details, narrowing the gap between successive and interrupted frames. These phenomena generally hold across different combinations of GOP length and weight adjustment strength.
Figure 15, Figure 16, and Table 2 show the results for different GOP lengths involved in weight adjustment with the same extraction window size (16 × 9) and the same weight adjustment strength (3). The results indicate that the best performance was achieved with 1 GOP, so we adopted 1 GOP for weight adjustment in our paper. If shorter lengths are selected, the ROC curve degrades as the weight amendment length is reduced. This can be explained by the variable phase between the learned and current frames; moreover, a learning length that is too short means inadequate weight adjustment, which leads to worse performance. Conversely, if a longer length is selected, the ROC curve drops as a whole within a narrow margin as the weight amendment size increases. Despite these tiny changes in the ROC curves, the results of individual cases (examples shown in Table 2) differ markedly with variable weight amendment size. This can be explained by an overly long learning length causing overfitting of the weights, which polarizes the results in individual cases; meanwhile, the variable phase between the learned and current frames also degrades performance. These phenomena generally hold across different combinations of extraction window size and weight adjustment strength.
Figure 15. ROC curves of GOPs of various lengths involved in weight adjustment (1).
Figure 16. ROC curves of GOPs of various lengths involved in weight adjustment (2).
Table 2. Comparison results of variable weight adjustment and length.
Table 3 shows the results for different weight adjustment strengths with the same extraction window size (16 × 9) and the same GOP length involved in weight adjustment (1 GOP). Figure 17 and Figure 18, respectively, describe the results for two groups of control variables (16 × 9 with 1 GOP, and 16 × 9 with 2 GOPs). At lower strengths, performance is best at $\delta = 3$ with 1 GOP and then declines with significant overfitting. At higher strengths, performance is best at $\delta = 6$ with 2 GOPs and then declines with gentle overfitting. Between the two, low strength with a shorter length beats higher strength with a longer length, so we adopted $\delta = 3$ as the weight adjustment strength in our paper. This can be explained as follows: for weight learning, a length of 2 GOPs is too far away (50 frames) to learn noise features relevant to the current frame, which reduces performance. Meanwhile, weight learning over more combined frames makes the weight matrix more uniform, so the ROC curve becomes flatter overall with an obvious drop on the left, as demonstrated with $\delta = 3$ and $\delta = 6$; overfitting is also gentler, as demonstrated with $\delta = 4$ and $\delta = 8$. These phenomena generally hold across different combinations of extraction window size and GOP length involved in weight adjustment.
Table 3. Comparison results of various weight adjustment strengths.
Figure 17. ROC curves for various weight adjustment strengths (1).
Figure 18. ROC curves for various weight adjustment strengths (2).
Based on the above experiments, we adopted a noise feature extraction window size of 16 × 9, a weight learning length of 1 GOP, and a strength coefficient of 3 for our experiments.

4.4. Ablation Experiments

Ablation experiments were also conducted to verify the contributions of the pyramid structure and the weight learning module, as well as their fused performance. The dataset and coarse filter are described in Section 4.1 and Section 4.2, respectively. The individual detection results for each component and the combined results are shown in Table 4, and the corresponding ROC curves are shown in Figure 19.
Table 4. Comparison results of ablation experiments. OT, original true positive; OF, original false positive; PT, pyramid true positive; PF, pyramid false positive; WLT, weight learning true positive; WLF, weight learning false positive; CT, combined true positive; CF, combined false positive.
Figure 19. ROC curves for ablation experiments.
The results of the ablation experiments are shown in Figure 19. Both the pyramid structure and the weight learning module contribute to the performance of frame deletion detection, and they work best in combination. Comparing the ROC curves, the weight learning module improves the TPR rapidly at a low FPR. Although the pyramid structure achieves only a relatively small improvement, in individual cases (examples shown in Table 4) abnormal peaks in original videos are suppressed by the pyramid structure. Conversely, the weight learning module highlights frame deletion points while increasing the probability of normal points turning into suspected peaks. Our proposed method combines the advantages of the pyramid structure and weight learning. For the purposes of forensic science, the proposed method can detect a considerable share of integral GOP deletions in static scenes at low risk.

4.5. Comparison Experiments

We quantitatively evaluated the contribution of the proposed method by conducting comparison experiments. The dataset and coarse filter are described in Section 4.1 and Section 4.2, respectively. Multiple GOPs were removed from the original videos of static scenes, an operation that strongly opposes existing detection technology. Both up-to-date learning-based detection methods [14,15] and anti-forensic forgery detection methods [1,33,34] were used for comparison. Before presenting our results, we briefly describe the compared methods [1,14,15,33,34].

4.5.1. LBP Method

LBP was originally designed for texture analysis. Because of its simplicity and efficacy in image processing and recognition, LBP has been applied in many research areas related to images. The definition of LBP is given by Equations (22) and (23):
$$LBP_P = \sum_{p=0}^{P-1} s\left( g_p - g_c \right) 2^p \tag{22}$$
$$s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{23}$$
$P$ denotes the number of sampling points in a neighborhood around the central pixel, usually a 3 × 3 window; $g_c$ denotes the gray value of the central pixel, and $g_p$ denotes the gray value of a neighboring pixel. If $g_p$ is greater than or equal to $g_c$, the corresponding binary code is 1; otherwise, it is 0. Finally, these binary codes are transformed into a decimal number. Theoretically, the correlation between consecutive LBP-coded frames (CCoLBP) will drop noticeably at an FDP. The LBP method proposed by Zhang et al. [1] was selected for comparison in our paper.
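For reference, a basic 8-neighbour LBP implementation matching Equations (22) and (23); the vectorized form and neighbour ordering are our own choices.

```python
import numpy as np

def lbp_image(gray: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP over a 3x3 window (Eqs. 22-23): each neighbour
    greater than or equal to the centre contributes a binary 1, and the eight
    bits form a decimal code in [0, 255]."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                            # centre pixels
    # 8 neighbours in a fixed clockwise order starting at the top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << bit   # s(g_p - g_c) * 2^p
    return code
```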

4.5.2. MNMI Method

In information theory, a communication system consists of a source, a sink, and a channel. Redundancy is associated with the probability of each symbol’s occurrence, i.e., the uncertainty in the message. The mutual information of two events can be used to describe the content similarity of two adjacent frames. Inspired by this idea, the shared information between adjacent frames can be measured by the joint entropy shown in Equation (24), and the average mutual information by Equation (25). Greater average mutual information means higher similarity between adjacent frames. Because the range of the average mutual information does not lie between 0 and 1, Equation (25) is normalized as shown in Equation (26):
$$H\left( F_t, F_{t+1} \right) = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p\left( I_i^{F_t}, I_j^{F_{t+1}} \right) \times \mathrm{lb}\, p\left( I_i^{F_t}, I_j^{F_{t+1}} \right) \tag{24}$$
$$MI\left( F_t, F_{t+1} \right) = H\left( F_t \right) + H\left( F_{t+1} \right) - H\left( F_t, F_{t+1} \right) \tag{25}$$
$$NMI\left( F_t, F_{t+1} \right) = \begin{cases} 0, & H\left( F_t, F_{t+1} \right) = 0 \\ \dfrac{H\left( F_t \right) + H\left( F_{t+1} \right)}{2 H\left( F_t, F_{t+1} \right)}, & H\left( F_t, F_{t+1} \right) \neq 0 \end{cases} \tag{26}$$
where $p(I_i^{F_t}, I_j^{F_{t+1}})$ denotes the probability of the gray value pair $(I_i, I_j)$ appearing at corresponding pixel positions of $F_t$ and $F_{t+1}$, and lb is the base-2 logarithm. Because spatial analysis at different scales can extract multi-level details of an image, three-layer Gaussian pyramids are adopted in the algorithm. The final MNMI result is computed using Equation (27), with pyramid layer weights $\omega_0 = 0.5$, $\omega_1 = 0.286$, $\omega_2 = 0.143$, and $\omega_3 = 0.071$. As a result, discontinuity points caused by inter-frame forgery are highlighted in the MNMI. The MNMI method proposed by Zhao et al. [33] was selected for comparison in our paper.
$$MNMI_t = \sum_{k=0}^{3} \omega_k\, NMI\left( F_t^k, F_{t+1}^k \right) \tag{27}$$
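A sketch of the NMI computation of Equations (24)–(27); the 2 × 2 decimation standing in for the Gaussian pyramid, and the function names, are our simplifications.

```python
import numpy as np

def nmi(f1: np.ndarray, f2: np.ndarray, levels: int = 256) -> float:
    """Normalized mutual information of two grayscale frames (Eqs. 24-26),
    computed from the joint gray-level histogram."""
    joint, _, _ = np.histogram2d(f1.ravel(), f2.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    p_joint = joint / joint.sum()
    p1, p2 = p_joint.sum(axis=1), p_joint.sum(axis=0)
    h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))   # entropy, lb = log2
    h_joint = h(p_joint)
    if h_joint == 0:
        return 0.0
    return (h(p1) + h(p2)) / (2.0 * h_joint)

def mnmi(frame_t, frame_t1, weights=(0.5, 0.286, 0.143, 0.071)):
    """Eq. (27): weighted NMI over the original frames plus three pyramid
    levels. Simple 2x2 decimation stands in for the Gaussian pyramid here."""
    score, ft, ft1 = 0.0, frame_t.astype(float), frame_t1.astype(float)
    for w in weights:
        score += w * nmi(ft, ft1)
        ft, ft1 = ft[::2, ::2], ft1[::2, ::2]    # crude downsampling (assumption)
    return score
```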

4.5.3. Haralick Coded Method

Texture is one of the most important characteristics of an image. Haralick coding extracts structural information from images, whereas LBP focuses on pixel differences. To compute Haralick features, the gray-level co-occurrence matrix (GLCM) must be determined first; it tabulates the frequencies of different combinations of pixel gray values in the image. Four GLCMs were constructed using Equations (28)–(31):
$$P_{0^\circ,d}(a,b) = \#\left\{ \left( (k,l),(m,n) \right) : k = m,\ |l - n| = d \right\} \tag{28}$$
$$P_{45^\circ,d}(a,b) = \#\left\{ \left( (k,l),(m,n) \right) : k - m = n - l = d \text{ or } -d \right\} \tag{29}$$
$$P_{90^\circ,d}(a,b) = \#\left\{ \left( (k,l),(m,n) \right) : |k - m| = d,\ l = n \right\} \tag{30}$$
$$P_{135^\circ,d}(a,b) = \#\left\{ \left( (k,l),(m,n) \right) : k - m = l - n = d \text{ or } -d \right\} \tag{31}$$
where $(k,l)$ and $(m,n)$ indicate the locations of pixels with gray levels $a$ and $b$, and $\#\{\cdot\}$ denotes set cardinality. After the GLCMs are computed, the result is normalized using Equation (32) before Haralick feature extraction:
$$N'(i,j) = \frac{N(i,j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} N(i,j)} \tag{32}$$
where $N(i,j)$ represents the $(i,j)$ element of the non-normalized GLCM, and $N'(i,j)$ represents the corresponding element after normalization. In our paper, the dimension of the GLCM is set to 1920 × 1080. Haralick features are extracted from the normalized GLCM of each frame, including the angular second moment, contrast, sum of squares, correlation, sum average, sum entropy, sum variance, inverse difference moment, entropy, difference variance, difference entropy, information measure of correlation, and maximal correlation coefficient. Finally, the correlation between adjacent frames is computed using Equation (33) based on the extracted Haralick features:
$$r_i = \frac{\sum_j \left( F_{i,j} - \bar{F_i} \right) \times \left( F_{i+1,j} - \bar{F}_{i+1} \right)}{\sqrt{\sum_j \left( F_{i,j} - \bar{F_i} \right)^2 \times \sum_j \left( F_{i+1,j} - \bar{F}_{i+1} \right)^2}} \tag{33}$$
where $r_i$ represents the correlation between the $i$th and $(i+1)$th frames, $F_{i,j}$ represents the $j$th Haralick feature of the $i$th frame, and $\bar{F_i}$ denotes the average value of all Haralick features of the $i$th frame. Abrupt changes in the correlation curve are visible at editing points, including frame deletion, insertion, and duplication. The Haralick coded method proposed by Bakas et al. [34] was selected for comparison in our paper.
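A compact sketch of the GLCM construction, normalization, and frame correlation (Equations (28)–(33)); only three of the thirteen Haralick features are shown, and the 8-level requantization is our assumption.

```python
import numpy as np

def glcm(gray, dx, dy, levels=8):
    """Normalized GLCM for pixel displacement (dx, dy) (Eqs. 28-32); the four
    Haralick directions with d = 1 correspond to (1, 0), (1, -1), (0, 1), and
    (1, 1). Gray values are requantized to `levels` bins."""
    q = np.clip((gray.astype(int) * levels) // 256, 0, levels - 1)
    h, w = q.shape
    ys, xs = np.mgrid[0:h, 0:w]
    yd, xd = ys + dy, xs + dx                    # displaced partner pixels
    ok = (yd >= 0) & (yd < h) & (xd >= 0) & (xd < w)
    counts = np.zeros((levels, levels))
    np.add.at(counts, (q[ys[ok], xs[ok]], q[yd[ok], xd[ok]]), 1)
    return counts / counts.sum()                 # Eq. (32) normalization

def haralick_features(p):
    """Three of the thirteen Haralick features, for brevity."""
    i, j = np.indices(p.shape)
    asm = (p ** 2).sum()                         # angular second moment
    contrast = ((i - j) ** 2 * p).sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return np.array([asm, contrast, entropy])

def frame_correlation(f_i, f_i1):
    """Eq. (33): Pearson correlation between the Haralick feature vectors of
    consecutive frames; abrupt drops mark candidate editing points."""
    a, b = f_i - f_i.mean(), f_i1 - f_i1.mean()
    return (a * b).sum() / np.sqrt((a ** 2).sum() * (b ** 2).sum())
```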

4.5.4. MS-SSIM Method

Gowda et al. [14] designed an MS-SSIM-based method to detect inter-frame deletion in videos. To differentiate frames from each other, an absolute difference layer is added at the beginning of the model. In addition, a group of frames is constructed to improve accuracy and reduce computation. Pixel-wise discrepancies between adjacent frames are computed using Equations (34) and (35). The inter-frame forgery detection model then applies rectified linear units (ReLU), batch normalization (BN), MaxPooling3D, global average pooling, dense, and softmax layers.
$$P_f(m,n) = \left| K_f(m,n) - K_{f+1}(m,n) \right| \tag{34}$$
$$D_k(m,n) = \begin{cases} 1, & \text{if } P_f(m,n) > f \\ 0, & \text{otherwise} \end{cases} \tag{35}$$
In the learning stage, the UCF-101 dataset is split randomly at a ratio of 3:1 for training and testing the model. After that, multi-scale structural similarity index measurement (MS-SSIM) is adopted to locate inter-frame forgery points if they exist.
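Equations (34) and (35) amount to an absolute difference followed by a threshold; in the sketch below, `tau` stands for the threshold written as $f$ in Equation (35), whose value is not specified here.

```python
import numpy as np

def frame_discrepancy(frame_k, frame_k1, tau):
    """Eqs. (34)-(35): absolute pixel-wise difference between adjacent frames,
    thresholded to a binary discrepancy map."""
    p = np.abs(frame_k.astype(float) - frame_k1.astype(float))   # Eq. (34)
    return (p > tau).astype(np.uint8)                            # Eq. (35)
```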

4.5.5. UFS-MSRC Method

Girish et al. [15] introduced the UFS-MSRC algorithm with an LSTM network to detect inter-frame forgery. The model comprises five major steps: dataset collection, data pre-processing, feature extraction, feature selection, and classification. The data pre-processing operation combines temporal and spatial information, outputting the scene background with a pale appearance of movement vectors. In the feature extraction and selection stage, the GoogleNet model extracts 1024 feature vectors, which are fed to the UFS-MSRC algorithm to select discriminative feature vectors. The UFS-MSRC algorithm ranks the overall features in ascending order based on k-nearest neighbor and Laplacian graphs developed in subspaces. Finally, the selected optimal feature vectors are given as the input to the LSTM module for video forgery detection. The LSTM classifier’s parameters are set as follows: maximum epoch = 100, learning rate = 0.01, batch size = 40, hidden layers = 4, layer 1 = 100 units, layer 2 = 100 units, layer 3 = 125 units, and layer 4 = 100 units. The SULFA dataset is split randomly at a ratio of 4:1 for training and testing the LSTM model.
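The reported LSTM hyperparameters can be mirrored in a Keras sketch; the input shaping, optimizer choice, and two-class softmax head are our assumptions, since the summary above does not specify them.

```python
import tensorflow as tf

def build_lstm_classifier(n_features: int = 1024) -> tf.keras.Model:
    """Four stacked LSTM layers of 100/100/125/100 units, as reported above.
    Input shaping, SGD optimizer, and the two-class head are assumptions."""
    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(None, n_features)),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.LSTM(100, return_sequences=True),
        tf.keras.layers.LSTM(125, return_sequences=True),
        tf.keras.layers.LSTM(100),
        tf.keras.layers.Dense(2, activation="softmax"),  # forged vs. original
    ])
    model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as reported: maximum epoch = 100, batch size = 40, SULFA split 4:1,
# e.g. model.fit(x_train, y_train, epochs=100, batch_size=40).
```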

4.5.6. Learning-Based Method Results

Despite the excellent results achieved in forged video detection by newly developed learning-based methods, proof of their utility in forensic science is still weak. In our experiments, the learning-based methods [14,15] failed to detect the frame deletion points (FDPs) in all 40 edited videos: in none of the videos was the FDP detected as the lowest similarity point. Examples are shown in Figure 20, Figure 21 and Figure 22.
Figure 20. Example of detection scene with learning-based method: (a) static scene 21; (b) static scene 22; (c) static scene 68; (d) static scene 69.
Figure 21. Example of detection results with MS-SSIM method: (a) static scene 21; (b) static scene 22; (c) static scene 68; (d) static scene 69.
Figure 22. Example of detection results with UFS-MSRC method: (a) static scene 21; (b) static scene 22; (c) static scene 68; (d) static scene 69.
In the figures, FDPs are indicated by black circles, while the lowest similarity points in the videos are indicated by blue circles. The results indicate that the learning-based methods [14,15] were completely unable to detect the delicate forgeries in our experimental videos: decreases in similarity caused by other factors in the video cover the FDPs, so further verification based on the current results is unnecessary. This phenomenon is widespread in the generalization of learning-based methods [63], and out-of-distribution data significantly reduce their performance [64]. It should be highlighted that although many learning-based methods achieve impressive detection accuracy, the datasets used to verify them, such as UCF-101, VIFFD, and SULFA, are quite different from those encountered in forensic practice. UCF-101 was originally built for human action recognition research, VIFFD is made up of scenes from daily life, and SULFA was designed for the forensic analysis of tampered videos presenting lifelike scenarios. In these databases, the continuous motion vectors and local differences existing in successive frames can certainly be detected: a 3D-CNN or LSTM model can detect FDPs in such videos by learning extracted features from the continuous motion. In our paper, static scenes are used for frame deletion detection because this is a common operation criminals use to forge videos. The instability of the focal plane, tiny changes in light sources, and various interference factors that occur during recording are detected as local minima of similarity, and over an entire video, genuine FDPs are covered by these other low-similarity points if the video is long enough. As learning-based methods are highly content-related, their extracted features, designed models, and parameters do not perform well in practical work, especially for integer-length GOP deletion in static scenes. Furthermore, such an anti-forensic operation also evades the series of methods based on the double compression detection principle proposed by Wang et al. [35]. Examples from UCF-101, VIFFD, and SULFA are shown in Figure 23, Figure 24 and Figure 25.
Figure 23. Video examples in UCF-101 database.
Figure 24. Video example in VIFFD database.
Figure 25. Video example in SULFA database.

4.5.7. Results of Robustness to Anti-Forensics Operation

Several representative detection methods reported to be robust to anti-forensics operations [1,33,34] were selected for comparison to evaluate the performance of the method proposed in this paper. Table 5 shows the detection results, and Figure 26 shows the ROC curves. Note that if the genuine FDP in the compared method is filtered by the coarse stage described in Section 4 or is larger than the mean computed in the coarse filtering operation, the number is recorded as 0.
Table 5. Detection results of comparison experiments. LBPT, LBP method true positive; LBPF, LBP method false positive; MNMIT, MNMI method true positive; MNMIF, MNMI method false positive; HCT, Haralick coded method true positive; HCF, Haralick coded method false positive; PT, proposed method true positive; PF, proposed method false positive.
Figure 26. ROC curves of compared methods.
The LBP method is a pixel-level detection method. Its detection capability is fairly weak, as seen in the poor ROC curve in Figure 26, and some FDPs are filtered out by the coarse processing stage, shown as zeros in the LBPT column of Table 5. Nonetheless, the results indicate that this method can detect frame deletion points in some cases; examples are shown in Figure 27, where genuine frame deletion detections are indicated by black circles. As a content-related approach, it also detects similar unusual peaks in original videos, which masks its detection performance to some extent. Moreover, LBP is an overly sensitive detection method, and false positive detections are common in some cases. Obvious examples are shown in Figure 28 and Figure 29: Figure 28 shows the typical influence of jitter in the focal plane, and Figure 29 shows the typical influence of noise instability caused by a large underexposed region in the frame. Obvious false positive detections are indicated by black circles. As mentioned above, the LBP method cannot be reliably applied in forensic practice to detect integral-GOP-length frame removal in forged static-scene videos.
Figure 27. Examples of detection performance of LBP method: (a) static scene 73; (b) LBP result for static scene 73; (c) static scene 74; (d) LBP result for static scene 74.
Figure 28. Examples of the typical influence of jitter in the focal plane: (a) static scene 5; (b) LBP result for static scene 5; (c) static scene 6; (d) LBP result for static scene 6.
Figure 29. Examples of the typical influence of noise instability: (a) static scene 10; (b) LBP result for static scene 10; (c) static scene 13; (d) LBP result for static scene 13.
Haralick coding is a local structure-level detection method. It consumes considerable running time when handling large images; in our experiments, it took much more running time than the LBP method for little improvement. Some successful detection examples are shown in Figure 30, with genuine frame deletion detections indicated by black circles. Compared with the LBP method, the Haralick coded method is less sensitive to distraction in videos (see Table 5). However, false positive detections still occur; obvious examples are shown in Figure 31, where camera shake leads to false positive detections, indicated by black circles. Similar to the LBP method, the Haralick coded method also cannot be used to detect integral-GOP-length frame removal in forged static-scene videos.
Figure 30. Examples of detection performance of Haralick coded method: (a) static scene 68; (b) Haralick coded result for static scene 68; (c) static scene 71; (d) Haralick coded result for static scene 71.
Figure 31. Examples of the typical influence of camera shake: (a) static scene 22; (b) Haralick coded result for static scene 22; (c) static scene 23; (d) Haralick coded result for static scene 23.
The MNMI method is inspired by information theory. It achieved significantly better ROC performance than the LBP and Haralick coded methods, improving the detection of FDPs while restraining the influence of interference factors. Some examples are shown in Figure 32, with true positive detections indicated by black circles. Although false positive detection is limited by the specific nature of the MNMI method, failures can still be found in the results: Figure 33 shows the typical influence of a twinkling light source, with obvious false positive detections indicated by black circles. As mentioned above, the MNMI method has little positive application in forensic science for detecting integral-GOP-length frame removal in forged static-scene videos.
Figure 32. Examples of detection performance of MNMI method: (a) static scene 31; (b) MNMI result for static scene 31; (c) static scene 65; (d) MNMI result for static scene 65.
Figure 33. Examples of the typical influence of twinkling of light source: (a) static scene 37; (b) MNMI result for static scene 37; (c) static scene 38; (d) MNMI result for static scene 38.

4.6. Discussion

The proposed approach is based on noise transfer matrix analysis, adopting a pyramid structure and a weight learning module to detect delicate frame deletion in forged videos. As shown in Figure 26 and Table 5, the proposed approach outperforms the other approaches both in the ROC curve and in individual cases. The results reveal the following:
(1)
The LBP, Haralick, and MNMI algorithms detect suspected frame deletion points better than chance. Even though their ROC curves demonstrate poor performance, these methods undoubtedly still provide useful information. Candidate false positive points dominate in number: even accounting for the periodic effect of the GOP structure, possible false positive detection points outnumber genuine positive detection points by dozens of times. An intuitive example is shown in Figure 12, where the points indicated by blue circles form the candidate pool within which the genuine FDP must be identified.
(2)
Our approach improves the detection of delicate frame deletion (integral GOP deletion in a static scene) compared with the other approaches [1,33,34]. From Table 5, we can identify multiple genuine FDPs detected by our approach that are missed by the other approaches. Our approach reaches a TPR of 0.4 at an FPR of 0, whereas the compared methods do not exceed 0.05. Because of the rigor of forensic science, vague clues cannot be accepted as evidence; a high TPR without any dispute can make a vital contribution to court proceedings.
(3)
Our approach reduces the possibility of false positive detections caused by the various types of interference that arise during video generation. This is a key point in forensic science for verifying the authenticity of a video. In practical work, many novel approaches have been discarded due to false positive detections. In court, a suspected false positive detection anywhere in a video can overturn the result of a genuine FDP detection. Examples are given below, presenting the characteristics of the proposed approach and comparing the approaches directly.
Despite demonstrating significant improvements, our method still has limitations, which point to future work: (1) the TPR at a low FPR does not yet reach a high level, limiting forensic applications; (2) noise features are highly content-dependent, so more videos should be tested to further verify and improve the proposed method; and (3) the proposed measure specifically targets integral GOP deletions in static scenes of forged videos, so combining it with other approaches would help expand its scope of application.
Anomalies due to frame deletion that cannot be observed directly can be verified by the proposed approach, which was not possible with the other approaches. Some examples are shown in Figure 34 and Figure 35; the genuine FDPs detected by the proposed approach are indicated by black circles. Furthermore, the proposed approach is more robust to multiple types of interference in videos than the other approaches. Some examples are shown in Figure 36: genuine FDPs are detected by the proposed approach, while the other approaches highlight suspected signals that are in fact false positives. Obvious false positive detections by the compared approaches are indicated by blue circles, and genuine frame deletion detections by the proposed approach by black circles. Because of the content-dependent nature of the proposed approach, obvious changes in a video, such as the twinkling of a light source, cannot be avoided, just as with the compared approaches; an example is shown in Figure 37, with obvious false positive detections indicated by black circles. As stated above, the proposed approach performs considerably better than the compared approaches in forensic science with regard to detecting integral-GOP-length frame removal in forged static-scene videos.
Figure 34. Examples of compared results: (a) static scene 34; (b) LBP method result for static scene 34; (c) MNMI method result for static scene 34; (d) Haralick coded method result for static scene 34; (e) proposed method result for static scene 34.
Figure 35. Examples of compared results: (a) static scene 68; (b) LBP method result for static scene 68; (c) MNMI method result for static scene 68; (d) Haralick coded method result for static scene 68; (e) proposed method result for static scene 68.
Figure 36. Examples of compared results: (a) static scene 25; (b) LBP method result for static scene 25; (c) MNMI method result for static scene 25; (d) Haralick coded method result for static scene 25; (e) proposed method result for static scene 25.
Figure 37. Examples of compared results: (a) static scene 37; (b) LBP method result for static scene 37; (c) MNMI method result for static scene 37; (d) Haralick coded method result for static scene 37; (e) proposed method result for static scene 37.

5. Conclusions

Determining the authenticity of a video poses a considerable challenge because a video can easily be modified without leaving a visible trace. In forensic science, frame deletion can have serious consequences, such as unjust sentencing by courts. Meanwhile, many videos are presented as evidence in court. Removing an integral GOP length of frames from a static scene is an effective method of forging videos that cannot be reliably detected by current methods. Furthermore, such an operation is easy for anyone with publicly available software and a basic understanding of current detection methods.
This paper presents a method for anti-forensic forgery video detection based on noise transfer matrix analysis. Compared to three other anti-forensic forgery detection methods and newly developed learning-based methods, the proposed method significantly improves the performance of the ROC curve, especially in the low FPR interval, which is highly relevant in forensic science.

Author Contributions

Methodology, Q.B. and Y.W.; Software, Q.B. and Y.W.; Dataset curation, H.H. and K.D.; Validation, H.H. and K.D.; Writing—original draft preparation, Q.B.; Writing—review and editing, F.L.; Supervision, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Z.; Hou, J.; Ma, Q.; Li, Z. Efficient video frame insertion and deletion detection based on inconsistency correlations between local binary pattern coded frames. Secur. Commun. Netw. 2015, 8, 311–320. [Google Scholar] [CrossRef]
  2. Yu, L.; Wang, H.; Han, Q.; Niu, X.; Yiu, S.M.; Fang, J.; Wang, Z. Exposing frame deletion by detecting abrupt changes in video streams. Neurocomputing 2016, 205, 84–91. [Google Scholar] [CrossRef]
  3. Kumar, V.; Gaur, M. Multiple forgery detection in video using inter-frame correlation distance with dual-threshold. Multimed. Tools Appl. 2022, 81, 43979–43998. [Google Scholar] [CrossRef]
  4. Sharma, H.; Kanwal, N. Video interframe forgery detection: Classification, technique & new dataset. J. Comput. Secur. 2021, 29, 531–550. [Google Scholar]
  5. Li, S.; Hou, H. Frame deletion detection based on optical flow orientation variation. IEEE Access 2021, 9, 37196–37209. [Google Scholar] [CrossRef]
  6. El-Shafai, W.; Fouda, M.A.; El-Rabaie, E.S.M.; El-Salam, N.A. A comprehensive taxonomy on multimedia video forgery detection techniques: Challenges and novel trends. Multimed. Tools Appl. 2024, 83, 4241–4307. [Google Scholar] [CrossRef] [PubMed]
  7. Milani, S.; Fontani, M.; Bestagini, P.; Barni, M.; Piva, A.; Tagliasacchi, M.; Tubaro, S. An overview on video forensics. APSIPA Trans. Sig. Inf. Process. 2012, 1, e2. [Google Scholar] [CrossRef]
  8. Long, C.; Smith, E.; Basharat, A.; Hoogs, A. A C3D-based convolutional neural network for frame dropping detection in a single video shot. In Proceedings of the 2017 Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1898–1906. [Google Scholar]
  9. Long, C.; Basharat, A.; Hoogs, A.; Singh, P.; Farid, H. A coarse-to-fine deep convolutional neural network framework for frame duplication detection and localization in forged videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–20 June 2019; pp. 1–10. Available online: https://arxiv.org/abs/1811.10762 (accessed on 16 August 2024).
  10. Bakas, J.; Naskar, R. A digital forensic technique for inter–frame video forgery detection based on 3D CNN. In Proceedings of the International Conference on Information Systems Security, Bangalore, India, 17–19 December 2018; pp. 304–317. [Google Scholar]
  11. Yang, Q.; Yu, D.; Zhang, Z.; Yao, Y.; Chen, L. Spatiotemporal trident networks: Detection and localization of object removal tampering in video passive forensics. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4131–4144. [Google Scholar] [CrossRef]
  12. Fadl, S.; Han, Q.; Li, Q. CNN spatiotemporal features and fusion for surveillance video forgery detection. Sig. Process. Image Commun. 2021, 90, 116066. [Google Scholar] [CrossRef]
  13. Kaur, H.; Jindal, N. Deep convolutional neural network for graphics forgery detection in video. Wirel. Pers. Commun. 2020, 112, 1763–1781. [Google Scholar] [CrossRef]
  14. Gowda, R.; Pawar, D. Deep learning-based forgery identification and localization in videos. Sig. Image Video Process. 2023, 17, 2185–2192. [Google Scholar] [CrossRef]
  15. Girish, N.; Nandini, C. Inter-frame video forgery detection using UFS-MSRC algorithm and LSTM network. Int. J. Model. Simul. Sci. Comput. 2023, 14, 2341013. [Google Scholar] [CrossRef]
  16. Aloraini, M.; Sharifzadeh, M.; Schonfeld, D. Sequential and patch analyses for object removal video forgery detection and localization. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 917–930. [Google Scholar] [CrossRef]
  17. Fayyaz, M.A.; Anjum, A.; Ziauddin, S.; Khan, A.; Sarfaraz, A. An improved surveillance video forgery detection technique using sensor pattern noise and correlation of noise residues. Multimed. Tools Appl. 2020, 79, 5767–5788. [Google Scholar] [CrossRef]
  18. Hsu, C.C.; Hung, T.Y.; Lin, C.W.; Hsu, C.T. Video forgery detection using correlation of noise residue. In Proceedings of the 2008 IEEE 10th Workshop on Multimedia Signal Processing, Cairns, Qld, Australia, 8–10 October 2008; pp. 170–174. [Google Scholar]
  19. Mandelli, S.; Bestagini, P.; Tubaro, S.; Cozzolino, D.; Verdoliva, L. Blind detection and localization of video temporal splicing exploiting sensor-based footprints. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Roma, Italy, 3–7 September 2018; pp. 1362–1366. [Google Scholar]
  20. Kobayashi, M.; Okabe, T.; Sato, Y. Detecting forgery from static-scene video based on inconsistency in noise level functions. IEEE Trans. Inf. Forensics Secur. 2010, 5, 883–892. [Google Scholar] [CrossRef]
  21. Bayram, S.; Sencar, H.; Memon, N.; Avcibas, I. Source camera identification based on CFA interpolation. In Proceedings of the IEEE International Conference on Image Processing 2005, Genova, Italy, 14 September 2005; pp. III–69. [Google Scholar]
  22. Liu, C.; Freeman, W.T.; Szeliski, R.; Kang, S.B. Noise estimation from a single image. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 06), New York, NY, USA, 17–22 June 2006; pp. 901–908. [Google Scholar]
  23. Matsushita, Y.; Lin, S. Radiometric calibration from noise distributions. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
  24. Fadl, S.; Han, Q.; Li, Q. Exposing video inter-frame forgery via histogram of oriented gradients and motion energy image. Multidimens. Syst. Sig. Process. 2020, 31, 1365–1384. [Google Scholar] [CrossRef]
  25. Shehnaz; Kaur, M. Detection and localization of multiple inter-frame forgeries in digital videos. Multimed. Tools Appl. 2024, 83, 1–33. [Google Scholar] [CrossRef]
  26. Wang, Q.; Li, Z.; Zhang, Z.; Ma, Q. Video inter-frame forgery identification based on consistency of correlation coefficients of gray values. J. Comput. Commun. 2014, 2, 51–57. [Google Scholar] [CrossRef]
  27. Wang, Q.; Li, Z.; Zhang, Z.; Ma, Q. Video inter-frame forgery identification based on optical flow consistency. Sens. Transducers 2014, 166, 229–234. [Google Scholar]
  28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  29. Wan, Q.; Panetta, K.; Agaian, S. A video forensic technique for detecting frame integrity using human visual system-inspired measure. In Proceedings of the 2016 IEEE Symposium on Technologies for Homeland Security (HST), Waltham, MA, USA, 10–11 May 2016; pp. 1–6. [Google Scholar]
  30. Jing, Z.; Su, Y.; Zhang, M. Exposing digital video forgery by ghost shadow artifact. In Proceedings of the First ACM Workshop on Multimedia in Forensics, Tianjin, China, 23 October 2009; pp. 49–54. [Google Scholar]
  31. Feng, C.; Xu, Z.; Jia, S.; Zhang, W.; Xu, Y. Motion-adaptive frame deletion detection for digital video forensics. IEEE Trans. Circuits Syst. Video Technol. 2016, 27, 2543–2554. [Google Scholar] [CrossRef]
  32. Kingra, S.; Aggarwal, N.; Singh, R.D. Inter-frame forgery detection in H.264 videos using motion and brightness gradients. Multimed. Tools Appl. 2017, 76, 25767–25786. [Google Scholar] [CrossRef]
  33. Zhao, Y.; Pang, T.; Liang, X.; Li, Z. Frame-deletion detection for static-background video based on multi-scale mutual information. In Proceedings of the International Conference on Cloud Computing and Security (ICCCS 2017), Nanjing, China, 16–18 June 2017; pp. 371–384. [Google Scholar]
  34. Bakas, J.; Naskar, R.; Dixit, R. Detection and localization of inter-frame video forgeries based on inconsistency in correlation distribution between Haralick coded frames. Multimed. Tools Appl. 2019, 78, 4905–4935. [Google Scholar] [CrossRef]
  35. Wang, W.; Farid, H. Exposing digital forgeries in video by detecting double MPEG compression. In Proceedings of the 8th Workshop on Multimedia and Security, Geneva, Switzerland, 26–27 September 2006; pp. 37–47. [Google Scholar]
  36. Stamm, M.C.; Lin, W.S.; Liu, K.J.R. Temporal forensics and anti-forensics for motion compensated video. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1315–1329. [Google Scholar] [CrossRef]
  37. Gironi, A.; Fontani, M.; Bianchi, T.; Piva, A.; Barni, M. A video forensic technique for detecting frame deletion and insertion. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 6226–6230. [Google Scholar]
  38. Jiang, X.; Wang, W.; Sun, T.; Shi, Y.Q.; Wang, S. Detection of double compression in MPEG-4 videos based on Markov statistics. IEEE Sig. Process. Lett. 2013, 20, 447–450. [Google Scholar] [CrossRef]
  39. Shanableh, T. Detection of frame deletion for digital video forensics. Digit. Investig. 2013, 10, 350–360. [Google Scholar] [CrossRef]
  40. Liu, H.; Li, S.; Bian, S. Detecting frame deletion in H.264 video. In Proceedings of the International Conference on Information Security Practice and Experience (ISPEC 2014), Fuzhou, China, 5–8 May 2014; pp. 262–270. [Google Scholar]
  41. Vazquez-Padin, D.; Fontani, M.; Bianchi, T.; Comesana, P.; Piva, A.; Barni, M. Detection of video double encoding with GOP size estimation. In Proceedings of the 2012 IEEE International Workshop on Information Forensics and Security (WIFS), Costa Adeje, Spain, 2–5 December 2012; pp. 151–156. [Google Scholar]
  42. Hong, J.H.; Yang, Y.; Oh, B.T. Detection of frame deletion in HEVC-Coded video in the compressed domain. Digit. Investig. 2019, 30, 23–31. [Google Scholar] [CrossRef]
  43. Singh, R.D.; Aggarwal, N. Detection of re-compression, transcoding and frame-deletion for digital video authentication. In Proceedings of the 2015 2nd International Conference on Recent Advances in Engineering & Computational Sciences (RAECS), Chandigarh, India, 21–22 December 2015; pp. 1–6. [Google Scholar]
  44. Bakas, J.; Naskar, R.; Bakshi, S. Detection and localization of inter-frame forgeries in videos based on macroblock variation and motion vector analysis. Comp. Electr. Eng. 2021, 89, 106929. [Google Scholar] [CrossRef]
  45. Shekar, B.H.; Abraham, W.; Pilar, B. A simple difference based inter frame video forgery detection and localization. In Proceedings of the International Conference on Soft Computing and its Engineering Applications (icSoftComp 2023), Changa, Anand, India, 7–9 December 2023; pp. 3–15. [Google Scholar]
  46. Wang, W.; Jiang, X.; Wang, S.; Wan, M.; Sun, T. Identifying video forgery process using optical flow. In Proceedings of the International Workshop on Digital Forensics and Watermarking (IWDW 2013), Auckland, New Zealand, 1–4 October 2013; pp. 244–257. [Google Scholar]
  47. Dar, Y.; Bruckstein, A.M. Motion-compensated coding and frame rate up-conversion: Models and analysis. IEEE Trans. Image Process. 2015, 24, 2051–2066. [Google Scholar] [CrossRef] [PubMed]
  48. Yao, H.; Ni, R.; Zhao, Y. An approach to detect video frame deletion under anti-forensics. J. Real-Time Image Process. 2019, 16, 751–764. [Google Scholar] [CrossRef]
  49. Reibman, A.R.; Poole, D. Characterizing packet-loss impairments in compressed video. In Proceedings of the 2007 IEEE International Conference on Image Processing, San Antonio, TX, USA, 16–19 September 2007; pp. 77–80. [Google Scholar]
  50. Reibman, A.R.; Vaishampayan, V.A.; Sermadevi, Y. Quality monitoring of video over a packet network. IEEE Trans. Multimed. 2004, 6, 327–334. [Google Scholar] [CrossRef]
  51. Li, H.; Bao, Q.; Deng, N. Detection of video continuity based on noise Markov transfer matrix. J. Crim. Investig. Police Univ. China 2019, 149, 125–128. [Google Scholar]
  52. Chen, J.; Kang, X.; Liu, Y.; Wang, Z.J. Median filtering forensics based on convolutional neural networks. IEEE Sig. Process. Lett. 2015, 22, 1849–1853. [Google Scholar] [CrossRef]
  53. Kirchner, M.; Böhme, R. Hiding traces of resampling in digital images. IEEE Trans. Inf. Forensics Secur. 2008, 3, 582–592. [Google Scholar] [CrossRef]
  54. Pawar, S.; Pradhan, G.; Goswami, B.; Bhutad, S. Identifying Fake Images Through CNN Based Classification Using FIDAC. In Proceedings of the 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP), Hyderabad, India, 21–23 July 2022; pp. 1–6. [Google Scholar]
  55. Lin, T.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2999–3007. [Google Scholar]
  56. Patel, J.; Sheth, R. An optimized convolution neural network based inter-frame forgery detection model—A multi-feature extraction framework. ICTACT J. Image Video Process. 2021, 12, 2570–2581. [Google Scholar]
  57. Paing, S.; Htun, Y. An unsupervised learning algorithm based deletion of Inter-frame forgery detection system. In Proceedings of the 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 1–2 November 2023; pp. 1–4. [Google Scholar]
  58. Chen, C.; Shi, Y.Q.; Su, W. A machine learning based scheme for double JPEG compression detection. In Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; pp. 1–4. [Google Scholar]
  59. Thing, V.L.L.; Chen, Y.; Cheh, C. An improved double compression detection method for JPEG image forensics. In Proceedings of the 2012 IEEE International Symposium on Multimedia, Irvine, CA, USA, 10–12 December 2012; pp. 290–297. [Google Scholar]
  60. Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
  61. Su, Y.; Nie, W.; Zhang, C. A frame tampering detection algorithm for MPEG videos. In Proceedings of the 2011 6th IEEE Joint International Information Technology and Artificial Intelligence Conference, Chongqing, China, 20–22 August 2011; pp. 461–464. [Google Scholar]
  62. Wu, Y.; Jiang, X.; Sun, T.; Wang, W. Exposing video inter-frame forgery based on velocity field consistency. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 2674–2678. [Google Scholar]
  63. Yang, J.; Zhou, K.; Li, Y.; Liu, Z. Generalized out-of-distribution detection: A survey. Int. J. Comput. Vis. 2024, 1–28. [Google Scholar] [CrossRef]
  64. Zhu, L.; Yang, Y.; Gu, Q.; Wang, X.; Zhou, C.; Ye, N. CRoFT: Robust fine-tuning with concurrent optimization for OOD generalization and open-set OOD detection. arXiv 2024, arXiv:2405.16417. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
