Electronics
  • Article
  • Open Access

19 June 2024

An Approach to Deepfake Video Detection Based on ACO-PSO Features and Deep Learning

1 Computer Engineering, Karabuk University, 78050 Karabuk, Turkey
2 Information Security and Digital Forensics, University at Albany, State University of New York, Albany, NY 12222, USA
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Applications of Artificial Intelligence, Machine Learning, Deep Learning, and Explainable AI (XAI)

Abstract

The rapid advancement of deepfake technology presents significant challenges in detecting highly convincing fake videos, posing risks such as misinformation, identity theft, and privacy violations. In response, this paper proposes an innovative approach to deepfake video detection by integrating features derived from ant colony optimization–particle swarm optimization (ACO-PSO) and deep learning techniques. The proposed methodology leverages ACO-PSO features and deep learning models to enhance detection accuracy and robustness. Features from ACO-PSO are extracted from the spatial and temporal characteristics of video frames, capturing subtle patterns indicative of deepfake manipulation. These features are then used to train a deep learning classifier to automatically distinguish between authentic and deepfake videos. Extensive experiments using comparative datasets demonstrate the superiority of the proposed method in terms of detection accuracy, robustness to manipulation techniques, and generalization to unseen data. The computational efficiency of the approach is also analyzed, highlighting its practical feasibility for real-time applications. The findings revealed that the proposed method achieved an accuracy of 98.91% and an F1 score of 99.12%, indicating remarkable success in deepfake detection. The integration of ACO-PSO features and deep learning enables comprehensive analysis, bolstering precision and resilience in detecting deepfake content. This approach addresses the challenges involved in facial forgery detection and contributes to safeguarding digital media integrity amid misinformation and manipulation.

1. Introduction

Deepfake videos, which leverage advanced deep learning techniques to manipulate facial expressions and actions in videos, have emerged as a significant concern in recent years [1]. The accessibility of the necessary technology has facilitated the creation and dissemination of deepfake content, posing serious threats to privacy, security, and trust in media [2]. Detecting deepfake videos has thus become a critical research area in combating misinformation and propaganda [3].

1.1. Motivations

There are rising concerns regarding the proliferation of deepfake videos and their potential impact on privacy, security, and trust in media. Thus, there is an urgent need to develop effective detection methods to identify and mitigate the spread of misinformation and propaganda. Various techniques, including facial expression analysis and machine learning algorithms, have been proposed for detecting deepfake videos [3,4,5]. However, inconsistencies in generated content and evolving manipulation techniques remain persistent challenges in this regard [6,7].
This paper proposes a novel approach to deepfake video detection, integrating features derived from ant colony optimization–particle swarm optimization (ACO-PSO) and deep learning techniques. Our methodology leverages ACO-PSO features and deep learning models to enhance the detection accuracy and robustness [8]. Specifically, ACO-PSO features are extracted from the spatial and temporal characteristics of video frames, capturing subtle patterns indicative of deepfake manipulation [9].

1.2. Main Contributions

The main contributions of this paper are as follows:
  • The integration of ACO-PSO features and deep learning for improved deepfake video detection.
  • The utilization of transfer learning with pre-trained ACO-PSO models to address the limitations of the available training data.
  • An emphasis on optical flow features to analyze temporal dynamics and enhance detection accuracy.
  • A comprehensive analysis encompassing both the spatial and temporal dimensions of video data to strengthen deepfake detection capabilities.
Through extensive experiments using benchmark datasets, our proposed method demonstrates superior performance in terms of detection accuracy, robustness to manipulation techniques, and generalization to unseen data. Furthermore, the computational efficiency of our approach is analyzed, highlighting its practical feasibility for real-time applications.

3. Material and Methods

Optical flow describes the apparent motion of objects between consecutive images and is most often applied to image sequences such as video frames. It estimates the velocity of pixels in these images and predicts where those pixels will appear in subsequent frames. The optical flow vector field describes how shifting the pixels of an object in the first image produces the corresponding object in the second image. Because the flow field can be approximated when the corresponding pixels of an object are known, optical flow estimation is a form of correspondence learning. The flowchart of the optical flow algorithm used to extract features is shown in Figure 1.
Figure 1. Proposed method.
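As an illustration of how dense optical flow features can be extracted frame by frame, the sketch below uses OpenCV's Farneback estimator; the function name, parameter values, and per-frame summary are generic examples, not the exact configuration used in this study.

```python
import cv2
import numpy as np

def optical_flow_features(video_path, max_frames=10):
    """Compute dense optical flow between consecutive frames and
    summarize each flow field as a small feature vector (sketch)."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    features = []
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY) if ok else None
    while ok and len(features) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback dense optical flow: returns an (H, W, 2) array of (dx, dy)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Simple per-frame summary: mean/std of flow magnitude and angle
        features.append([mag.mean(), mag.std(), ang.mean(), ang.std()])
        prev_gray = gray
    cap.release()
    return np.array(features)
```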
ACO is inspired by the foraging behavior of ants, where they find the shortest path to a food source by laying down pheromones. The key steps in the ACO algorithm are as follows:
1. Initialization:
Set the number of ants, m.
Initialize the pheromone matrix, $\tau_{ij}$, and heuristic information, $\eta_{ij}$.
2. Construct Ant Solutions:
Each ant constructs a solution by moving from node to node based on a probabilistic decision rule.
The probability, $P_{ij}^{k}$, of ant $k$ moving from node $i$ to node $j$ is given by:
$$P_{ij}^{k} = \frac{\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}}{\sum_{l \in N_i^{k}} \tau_{il}^{\alpha}\,\eta_{il}^{\beta}}$$
where $\alpha$ and $\beta$ are parameters that control the influence of the pheromone and heuristic information, respectively, and $N_i^{k}$ is the set of feasible moves.
3. Update Pheromones:
After all ants have constructed their solutions, update the pheromone levels:
$$\tau_{ij} = (1-\rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$
where $\rho$ is the evaporation rate and $\Delta\tau_{ij}^{k}$ is the pheromone deposited by ant $k$:
$$\Delta\tau_{ij}^{k} = \begin{cases} \dfrac{Q}{L_k} & \text{if ant } k \text{ uses edge } (i,j) \text{ in its solution} \\ 0 & \text{otherwise} \end{cases}$$
where $Q$ is a constant and $L_k$ is the length of the solution constructed by ant $k$.
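For concreteness, a minimal NumPy sketch of these two ACO steps (the transition probabilities and the pheromone update) is given below; the parameter values and helper names are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np

def transition_probs(tau, eta, feasible, i, alpha=1.0, beta=2.0):
    """Probability of moving from node i to each feasible node j:
    tau^alpha * eta^beta, normalized over the feasible moves N_i^k."""
    scores = (tau[i, feasible] ** alpha) * (eta[i, feasible] ** beta)
    return scores / scores.sum()

def update_pheromones(tau, solutions, lengths, rho=0.1, Q=1.0):
    """Evaporate pheromone, then deposit Q / L_k on every edge used by ant k."""
    tau *= (1.0 - rho)                      # evaporation: (1 - rho) * tau
    for edges, L_k in zip(solutions, lengths):
        for (i, j) in edges:
            tau[i, j] += Q / L_k            # deposit proportional to solution quality
    return tau
```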
PSO is inspired by the social behavior of birds flocking or fish schooling. The algorithm maintains a population of particles, where each particle represents a potential solution. The key steps in the PSO algorithm are:
1. Initialization:
Initialize the positions, $x_i$, and velocities, $v_i$, of n particles randomly.
Set the personal best position, $p_i$, for each particle and the global best position, $g$.
2. Update Velocities:
Update the velocity, $v_i$, of each particle based on its personal best position and the global best position:
$$v_i(t+1) = \omega\, v_i(t) + c_1 r_1 \big(p_i - x_i(t)\big) + c_2 r_2 \big(g - x_i(t)\big)$$
where $\omega$ is the inertia weight, $c_1$ and $c_2$ are cognitive and social coefficients, and $r_1$ and $r_2$ are random values between 0 and 1.
3. Update Positions:
Update the position, $x_i$, of each particle:
$$x_i(t+1) = x_i(t) + v_i(t+1)$$
4. Update Personal and Global Bests:
Update the personal best position, $p_i$, if the new position is better.
Update the global best position, $g$, if any personal best position is better than the current global best.
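A compact sketch of one PSO iteration under these update rules is shown below; the fitness function and coefficient values are placeholders chosen for illustration, not the settings used in the experiments.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, fitness, w=0.7, c1=1.5, c2=1.5):
    """One PSO iteration: velocity update, position update, and best updates.
    x, v, p_best: arrays of shape (n_particles, dim); g_best: shape (dim,)."""
    n, dim = x.shape
    r1, r2 = np.random.rand(n, dim), np.random.rand(n, dim)
    v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)   # velocity update
    x = x + v                                                     # position update
    f_x = np.apply_along_axis(fitness, 1, x)
    f_p = np.apply_along_axis(fitness, 1, p_best)
    improved = f_x < f_p                       # assuming a minimization problem
    p_best[improved] = x[improved]
    if f_x.min() < fitness(g_best):
        g_best = x[f_x.argmin()].copy()
    return x, v, p_best, g_best
```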
Hybrid ACO-PSO Algorithm
A hybrid ACO-PSO algorithm leverages the strengths of both ACO and PSO to achieve better performance in optimization tasks. One possible hybridization approach is outlined below:
1. Initialization:
Initialize the population of particles as in PSO.
Initialize the pheromone matrix as in ACO.
2. PSO Update:
For each particle, update its velocity and position using the PSO update rules.
Evaluate the new positions and update the personal and global bests.
3. ACO Update:
For each ant (which can be regarded as a particle), construct solutions using the ACO probabilistic decision rule.
Evaluate the solutions and update the pheromone matrix as in ACO.
4. Information Sharing:
Integrate the information from PSO and ACO by influencing the pheromone update with the global best position found by PSO.
Use the pheromone levels to guide the particles in PSO by adjusting the velocities toward regions with higher pheromone concentrations.
5. Iteration:
Repeat the PSO and ACO updates for a fixed number of iterations or until convergence.

The dataset used in this study was obtained from [48] and can be downloaded from the following link: https://www.kaggle.com/competitions/deepfake-detection-challenge/data (accessed on 10 November 2023).
Few datasets are available for deepfake detection. The Deepfake Detection Challenge (DFDC) dataset [48], one of the largest of these, was used here. It is a training dataset published for the DFDC Kaggle competition. Facebook created this dataset of more than 100,000 videos featuring real people; it contains a mix of real and fake videos, and various algorithms were used to create the deepfake videos. The full dataset is 471 GB in size, of which 4.5 GB were used in this paper. Sample images are shown in Figure 2.
Figure 2. Real images and fake images.
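For readers reproducing this split, the Kaggle release of the DFDC training data ships each video folder with a metadata.json file mapping filenames to REAL/FAKE labels; a minimal loading sketch under that assumption follows (the folder name is a placeholder).

```python
import json
from pathlib import Path

def load_dfdc_labels(part_dir="dfdc_train_part_0"):
    """Read the per-folder metadata.json from the Kaggle DFDC release and
    return lists of real and fake video paths (layout assumed, not verified here)."""
    part = Path(part_dir)
    with open(part / "metadata.json") as f:
        meta = json.load(f)
    real = [part / name for name, info in meta.items() if info["label"] == "REAL"]
    fake = [part / name for name, info in meta.items() if info["label"] == "FAKE"]
    return real, fake

# Example: real_videos, fake_videos = load_dfdc_labels()
```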

4. Innovative Solutions for Overcoming Contemporary Challenges in Facial Forgery Detection

The method presented here provides a compelling resolution to the obstacles encountered in facial forgery detection, thereby playing a crucial role in preserving the integrity of digital media amid the escalating prevalence of misinformation and manipulation, as depicted in Figure 3.
Figure 3. Proposed method to resolve the challenges facing current facial forgery detection systems.
A. Challenges facing contemporary facial forgery detection systems
  • Adapting to varied manipulation techniques: existing systems struggle with the ever-evolving manipulation techniques seen in deepfake creation, ranging from basic facial swaps to intricate alterations of expressions, lighting, and occlusions.
  • Limited training data: obtaining labeled data for training is hindered by privacy concerns and the expertise needed to label deepfake videos, impacting the system’s ability to be generalized to new manipulation techniques.
  • Real-time processing: handling high-resolution video streams in real time, especially on platforms such as social media or video-streaming services, poses significant computational hurdles.
  • Robustness to compression and quality loss: deepfake videos undergo compression and quality degradation when shared online, challenging detection systems to maintain accuracy across various platforms and viewing conditions.
  • Ethical considerations: deploying these systems raises ethical concerns regarding privacy, consent, and potential misuse, necessitating the careful consideration of user consent, false positives, and privacy implications.
B. How our method addresses these challenges
  • Feature integration using ACO-PSO and deep learning: our approach combined the features of ACO-PSO and deep learning to enhance detection accuracy and robustness.
  • Transfer learning with ACO-PSO: we employed transfer learning with pre-trained ACO-PSO models to leverage knowledge from large datasets, reducing the need for extensive training data.
  • Focus on facial regions: by concentrating on facial features using the Viola–Jones face detection method, we streamlined detection and reduced computational overhead.
  • Evaluation of benchmark datasets: extensive experiments on benchmark datasets allowed us to assess our method’s effectiveness in terms of detection accuracy, robustness, and generalization to new data.
  • Computational efficiency: we analyzed the computational efficiency of our approach, ensuring practical feasibility for real-time applications while maintaining accuracy.
As indicated in Table 2, these significant resources introduce innovative approaches to identifying deepfakes and tackling the various obstacles involved, including localization, adversarial learning, generalization, and the extraction of crucial cues. Researchers and practitioners have the opportunity to harness the insights and techniques provided in these resources to actively combat the widespread dissemination of deepfake content.
Table 2. Key resources for enhanced deepfake detection.

5. Dataset Description and Features

The dataset employed in this research served as the cornerstone for both training and assessing the efficacy of the novel deepfake detection approach proposed herein. Sourced from the DFDC database, generously provided by Meta, this dataset represents a rich tapestry of videos encompassing a broad spectrum of authenticity, ranging from genuine to synthetically manipulated content. Categorically organized, this dataset enables the systematic development and rigorous validation of robust deepfake detection algorithms.

Dataset Features

(1) Diverse video content: this dataset spans a wide gamut of video content, ranging from everyday scenarios to orchestrated performances, ensuring a comprehensive representation of real-world contexts.
(2) Manipulation techniques: videos within the dataset exhibit a plethora of manipulation techniques characteristic of deepfake creation, including facial swaps, expression alterations, lighting modifications, and occlusion effects. This diversity fosters the thorough training and evaluation of detection algorithms across various manipulation scenarios.
(3) Variation in resolution and quality: reflecting real-world conditions encountered across diverse platforms and recording devices, videos in the dataset exhibit variations in resolution and quality. This diversity fortifies the robustness of detection algorithms against discrepancies in video resolution and quality, which is vital for real-world deployment.
(4) Temporal dynamics: capturing the intrinsic temporal dynamics of video sequences, the dataset encapsulates motion patterns, temporal irregularities, and frame-level alterations. This temporal dimension enriches the training data, empowering algorithms to discern subtle temporal cues indicative of deepfake manipulation.
(5) Focus on facial regions: considering the predominant focus of deepfake manipulation on facial features, the dataset emphasizes facial regions within videos. Leveraging the Viola–Jones face detection method, the dataset streamlines the extraction of facial frames, optimizing the detection process and enhancing computational efficiency.
(6) Annotation and labeling: each video in the dataset undergoes meticulous annotation and labeling indicating its authenticity status, i.e., whether it is real or a deepfake. These annotations serve as ground truth labels, ensuring the accuracy and reliability of performance metrics during algorithm training and evaluation.
By harnessing the diverse array of features embedded within the dataset, the proposed methodology undergoes rigorous training and evaluation, culminating in the development of robust deepfake detection capabilities. The incorporation of various manipulation techniques, resolution disparities, and temporal dynamics and an emphasis on facial regions enrich the training data, enabling algorithms to be effectively generalized to real-world scenarios and emerging manipulation techniques.

6. Results and Discussion

In this section, the results obtained from the proposed method are discussed. Ant colony optimization–particle swarm optimization yielded the best results, mainly because of the small size of the dataset; had we used the entire DFDC dataset, other networks might have produced better results. For frame extraction, we used the Viola–Jones face detection method from one of MATLAB's libraries. Viola–Jones face detection provides the facial frames from each video in the dataset, and in this study we set the number of frames to 10 per video. With this face detection method, more than 4000 frames were extracted from the videos. The main reason for focusing on the faces in the videos is that deepfake videos are typically created by changing faces; concentrating on the face and ignoring other regions saved a great deal of processing time and storage.
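The paper performs this step in MATLAB; as an illustration of the same Viola–Jones (Haar cascade) face cropping with 10 frames per video, a Python/OpenCV sketch is given below. The cascade file ships with OpenCV, and the even-sampling strategy is a simplifying assumption rather than the paper's exact procedure.

```python
import cv2

# Haar cascade implementing the Viola–Jones detector, bundled with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_face_frames(video_path, n_frames=10):
    """Sample frames from the video and keep only the detected face region."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    faces = []
    for idx in range(0, total, max(total // n_frames, 1)):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        boxes = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(boxes) > 0:
            x, y, w, h = boxes[0]          # keep the first detected face
            faces.append(frame[y:y + h, x:x + w])
        if len(faces) >= n_frames:
            break
    cap.release()
    return faces
```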
After all the preprocessing steps performed on the dataset, the ACO-PSO model was built using transfer learning. Transfer learning can be realized in several ways, such as reusing a pre-trained model or creating a new one. There are two main ways to employ a pre-trained model. One is fine-tuning, in which the pre-trained weights and biases are used as initial parameters and training continues on the new data. The other is feature extraction, in which the parameters of the pre-trained model are used to extract features from the input images and a separate classifier is trained on those features. When little data are available, one can also initialize a new model with the weights of a model built for a similar, data-rich problem. In this study, the already-trained ACO-PSO family of models was employed as a feature extractor, and a classifier was trained on top of it to produce the prediction output.
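As a generic illustration of this feature-extractor-plus-classifier pattern (not the paper's exact pipeline), a frozen feature stage can be combined with a lightweight classifier as follows; extract_aco_pso_features is a hypothetical placeholder for the feature extraction described above, and the SVM choice is an assumption for the sketch.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def extract_aco_pso_features(face_frames):
    """Placeholder for the ACO-PSO / optical-flow feature stage:
    returns one fixed-length feature vector per video."""
    raise NotImplementedError

def train_deepfake_classifier(videos, labels):
    # Features come from the frozen, pre-trained extractor; only the classifier is trained.
    X = np.stack([extract_aco_pso_features(v) for v in videos])
    y = np.asarray(labels)                      # 1 = fake, 0 = real
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
    return clf
```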
Ant colony optimization–particle swarm optimization was chosen as the initial model due to its proven effectiveness and suitability considering the size of our dataset. This decision considered the trade-off between model complexity and ensemble dimension, acknowledging that with larger datasets, alternative networks may yield superior results.
Transfer learning with a family of pre-trained ACO-PSOs was used to leverage the knowledge gained from a model trained on a larger dataset. This method contributed to reducing the need for extensive data for training and utilizing pre-existing features learned by the model.
By combining these diverse sets of machine learning algorithms, we intended to achieve the comprehensive and accurate detection of deepfake videos, considering the subtleties of facial changes and the overall dynamics of video sequences. Each algorithm was strategically chosen and contributed to the strength of our proposed method. The dataset used in this study was obtained from the DFDC database, provided by Meta. The dataset is focused on deepfake videos and classifies each video as either deepfake or real. Each video underwent face extraction using the Viola–Jones face detection method, which efficiently isolated facial regions. The focus on the face was justified by the nature of deepfake videos, which primarily manipulate facial features. After the face extraction process, the dataset was further processed using the proposed method, specifically ACO-PSO, for feature extraction and deepfake detection. This dataset was used to evaluate the suggested strategy, with the number of frames set to 10 per video. The division of the dataset into training and testing sets was crucial, allowing for a comprehensive evaluation of the algorithm's performance in distinguishing between deepfake and real videos.
The decision to utilize ACO-PSO and extract optical flow features was a strategic move aimed at bolstering the precision and resilience of deepfake detection. In fact, ACO-PSO features were specifically selected for their capability to capture nuanced patterns and irregularities within the spatial and temporal dimensions of video frames, providing valuable cues indicative of deepfake manipulation. Leveraging the strengths inherent in ACO-PSO, our proposed methodology effectively discerned between authentic and deepfake videos, even considering the intricacies of sophisticated manipulation techniques.
Similarly, the inclusion of optical flow features assumed a pivotal role in deepfake detection by elucidating the movement of pixels within images over time. This dynamic insight proved particularly valuable in the realm of video analysis, where it can uncover disparities in motion suggestive of tampering. By integrating optical flow features into our methodology, we gained deep insights into the temporal dynamics of video sequences, thereby enhancing our capacity to accurately pinpoint deepfake alterations. In essence, the integration of ACO-PSO and optical flow features empowered our proposed method to provide a comprehensive analysis encompassing both the spatial and temporal dimensions of video data. This holistic approach fortified our ability to detect deepfake content and surmount the obstacles created by evolving manipulation techniques.
In this study, the results were evaluated using accuracy, sensitivity, specificity, precision, and F1 score, as given by Equations (6)–(10), respectively [50,51,52,53]:
$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \times 100$$
$$\text{Sensitivity} = \frac{TP}{TP + FN} \times 100$$
$$\text{Specificity} = \frac{TN}{TN + FP} \times 100$$
$$\text{Precision} = \frac{TP}{TP + FP} \times 100$$
$$F1 = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \times 100$$
True positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts were used to determine each of these indicators [53]. Accuracy (ACC) is the proportion of correct predictions out of all predictions. A decision tree and a support vector machine were both used for the classification stage of deepfake detection in this study. The new method was also compared with traditional ACO, PSO, and GA methods. For the repeated optical flow experiments, Figure 4 illustrates the precision with which fake and real images can be identified. Four feature extraction methods, including ACO-PSO, were applied in this implementation to extract features from the deepfake detection images.
Figure 4. Results of the various methods.
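As a quick reference for how Equations (6)–(10) translate into code, a small helper computing these percentages from raw confusion-matrix counts is sketched below; the example call at the end uses made-up counts purely for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, and F1 score
    as percentages, following Equations (6)-(10)."""
    acc = (tp + tn) / (tp + tn + fp + fn) * 100
    sensitivity = tp / (tp + fn) * 100          # recall
    specificity = tn / (tn + fp) * 100
    precision = tp / (tp + fp) * 100
    f1 = 2 * sensitivity * precision / (sensitivity + precision)  # already in percent
    return {"accuracy": acc, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Example with illustrative counts: classification_metrics(tp=495, tn=490, fp=5, fn=10)
```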
The ACO-PSO technique demonstrated its effectiveness in classifying images correctly according to the analysis and evaluation. The accuracy was enhanced by carefully selecting the feature vector in the optical flow method. Additionally, the ACO-PSO method reduced server-to-server traffic during distributed training. For machine learning and training, the feature selection method identified the most significant features from both real and synthetic images. In this experiment, the accuracy of the ACO-PSO was 98.25%. The accuracy of the ACO-PSO on this dataset increased to an average of 98.88% as the population of feature vectors grew over 20 experiments.
First, the dataset was selected, the suggested strategy was implemented, and the images were used to evaluate the approach's efficacy. Table 3 shows the sensitivity, specificity, and accuracy comparisons between the proposed optical flow method based on ACO-PSO and competing algorithms such as ACO, PSO, and GA. For the two classes, the suggested method's mean sensitivity, specificity, and accuracy values were compared with those of the competing methods. Table 3 also reports the corresponding sensitivity, specificity, accuracy, and F1 scores.
Table 3. Comparison of the proposed method with various metaheuristic methods.
According to the analysis, the suggested method's average sensitivity, specificity, accuracy, and F1 score were 98.21%, 98.56%, 98.25%, and 97.56%, respectively. Compared to other methods, the suggested method performed better in terms of sensitivity, specificity, accuracy, and F1 score in the analysis and classification of fake and real images. The ACO technique had an accuracy index of 93.90%, whereas the proposed method's index reached 99.12%. The ACO algorithm yielded the worst results, as can be seen in Table 3: its sensitivity, specificity, accuracy, and F1 score were 97.18%, 95.79%, 93.90%, and 96.01%, respectively. The DFDC dataset was also used to further assess the suggested approach.
The proposed method and other approaches to classifying fake and real images, such as ACO, PSO, and GA, were compared using four indicators: accuracy, precision, sensitivity, and F1 score. These indicators were derived through an examination of the proposed method and the competing methods. Fake and real images were classified most successfully using the proposed optical-flow-based approach. The comparison between the proposed method and the other methods is shown in Figure 5.
Figure 5. The proposed method’s mean index of sensitivity, accuracy, and precision in comparison to rival techniques.
The proposed method successfully classified images from the DFDC dataset with a sensitivity, specificity, precision, accuracy, and F1 score of 99.34%, 99.08%, 98.91%, 99.12%, and 98.94%, respectively. Evaluations revealed that the optical flow approach had the lowest classification accuracy for deepfake detection, at 91.88%. The suggested method achieved the highest sensitivity score, while optical flow showed the lowest performance in this metric. In terms of precision, the proposed strategy performed the best with a score of 98.91%, whereas the optical flow method had the lowest precision index at 61.86%. The proposed strategy also outperformed other methods in the F1 index, while optical flow showed the worst performance.

7. Limitations of the Proposed Method

  • Dataset size: Acknowledging the influence of dataset size on method effectiveness is crucial. While the ACO-PSO model exhibited proficiency with the current dataset, it is important to recognize that outcomes may differ with larger datasets. This constraint could impact the adaptability of the approach across diverse contexts with varying dataset sizes.
  • Computational complexity: The integration of deep learning techniques with optimization algorithms such as ACO-PSO can lead to computational intensiveness. This complexity may pose challenges for real-time applications, especially on platforms with limited computational resources.
  • Robustness to new manipulation techniques: The dynamic nature of deepfake creation constantly introduces novel manipulation techniques. Ensuring the method’s resilience to these evolving techniques necessitates ongoing adaptation and updates, presenting a hurdle in terms of maintaining detection accuracy over time.
  • Ethical considerations: Deploying deepfake detection systems raises ethical concerns surrounding privacy, consent, and potential misuse. Addressing these ethical implications requires the development of comprehensive mechanisms to safeguard against misuse and ensure ethical usage.

Potential Solutions

  • Dataset expansion: expanding the dataset to include a wider array of manipulation techniques, resolutions, and quality levels can enhance method generalizability and facilitate a more thorough performance evaluation.
  • Model optimization: Exploring techniques to enhance the computational efficiency of the model without compromising detection accuracy is essential. This may involve strategies such as model compression, pruning, or the development of lightweight architectures tailored for real-time deployment.
  • Adaptability mechanisms: Developing mechanisms to dynamically monitor and adapt the detection system to emerging manipulation techniques in real time is critical. Continuous model retraining using incoming data streams or integrating anomaly detection algorithms can aid in identifying and addressing suspicious content.
  • Ethical framework development: Collaborating with ethicists, policymakers, and stakeholders to formulate comprehensive ethical frameworks is imperative. These frameworks should address issues of consent, user privacy, and potential societal impacts, ensuring responsible deployment and minimizing harm.

8. Conclusions

This study introduced a novel method for identifying counterfeit videos by integrating ACO-PSO features with deep learning techniques. Our findings underscored the effectiveness of this approach, achieving an impressive accuracy of 98.91% and an F1 score of 99.12% in distinguishing between genuine and manipulated videos. By strategically integrating ACO-PSO features, which capture spatial and temporal patterns indicative of deepfake manipulation, with deep learning classifiers, our approach offers superior precision and resilience in detection.
Furthermore, we leveraged transfer learning with pre-trained ACO-PSO models, tapping into knowledge gained from models trained on larger datasets. This strategy reduced the need for extensive training data while leveraging pre-existing features learned by the model. Consequently, it addressed challenges posed by limited datasets available for deepfake detection, yielding promising results even with smaller datasets.
Moreover, incorporating optical flow features enhanced our method’s ability to analyze temporal dynamics, providing insights into motion disparities suggestive of tampering. This comprehensive approach, encompassing both spatial and temporal analyses, strengthened our ability to detect deepfake content amid evolving manipulation techniques.
Looking forward, it will be imperative to develop more robust models and further reduce error rates. Emphasizing advancements in the deepfake creation process and deploying deep-learning-based ensemble methods will be essential in staying ahead of emerging threats. By actively pursuing innovative techniques and drawing insights from ongoing research, we can continue to enhance the effectiveness and efficiency of deepfake detection, safeguarding digital media integrity and mitigating the risks associated with misinformation and manipulation.

9. Future Work

  • Enhanced ensemble methods: Delve into the untapped potential of refining ensemble learning techniques by integrating a broader spectrum of machine learning algorithms and architectures. Explore innovative combinations to fine-tune deepfake detection accuracy and resilience.
  • Dynamic feature selection: Investigate dynamic feature selection methods aimed at intelligently adapting to select the most pertinent features for deepfake detection. This approach has the potential to bolster efficiency and efficacy, particularly in real-time applications.
  • Adversarial defense mechanisms: Forge robust adversarial defense mechanisms to counter the impact of adversarial attacks on deepfake detection systems. Explore advanced techniques, such as adversarial training and GANs, to fortify model resilience.
  • Continual learning strategies: Explore continual learning strategies to empower deepfake detection models to evolve and improve over time. This is especially crucial in the face of evolving manipulation techniques and emerging variants of deepfake technology.
  • Ethical and legal implications: Scrutinize the ethical and legal dimensions of deploying deepfake detection systems, delving into issues surrounding privacy, consent, and potential misuse. Develop comprehensive guidelines and frameworks to ensure the responsible deployment and ethical usage of deepfake detection technology.
  • User education and awareness: Place a spotlight on educating users about the existence and potential risks associated with deepfake technology. Empower users to critically assess media content and equip them with the necessary tools to safeguard against misinformation and manipulation.
  • Real-world deployment and integration: Initiate pilot studies and forge collaborations with pertinent stakeholders to seamlessly deploy deepfake detection systems and integrate them into real-world scenarios. This includes integration into platforms such as social media, video-streaming services, and forensic analysis tools.
  • Multi-modal fusion: Explore the fusion of diverse modalities, such as audio, text, and contextual information, with visual data to create a more comprehensive deepfake detection framework. Uncover synergies between various modalities to amplify detection accuracy and robustness.
  • Benchmarking and evaluation: Establish standardized benchmarks and evaluation protocols for deepfake detection systems to facilitate fair comparison and reproducibility across various research endeavors. Continuously update benchmarks to reflect evolving challenges and advancements in deepfake technology.

Author Contributions

Methodology, H.S.A.; Writing—original draft, Y.C.; Project administration, S.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding authors upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. Mesonet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar]
  2. Agarwal, S.; Farid, H.; Gu, Y.; He, M.; Nagano, K.; Li, H. Protecting World Leaders Against Deep Fakes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 15–20 June 2019; p. 38. [Google Scholar]
  3. Güera, D.; Delp, E.J. Deepfake video detection using recurrent neural networks. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
  4. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
  5. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
  6. Li, Y.; Chang, M.-C.; Lyu, S. In ictu oculi: Exposing ai created fake videos by detecting eye blinking. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar]
  7. Hussain, S.; Neekhara, P.; Jere, M.; Koushanfar, F.; McAuley, J. Adversarial deepfakes: Evaluating vulnerability of deepfake detectors to adversarial examples. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Virtual, 5–9 January 2021; pp. 3348–3357. [Google Scholar]
  8. Nirkin, Y.; Keller, Y.; Hassner, T. Fsgan: Subject agnostic face swapping and reenactment. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 7184–7193. [Google Scholar]
  9. Siarohin, A.; Lathuilière, S.; Tulyakov, S.; Ricci, E.; Sebe, N. First order motion model for image animation. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, 8–14 December 2019. [Google Scholar]
  10. Thies, J.; Elgharib, M.; Tewari, A.; Theobalt, C.; Nießner, M. Neural voice puppetry: Audio-driven facial reenactment. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 716–731. [Google Scholar]
  11. Vougioukas, K.; Petridis, S.; Pantic, M. Realistic speech-driven facial animation with gans. Int. J. Comput. Vis. 2020, 128, 1398–1413. [Google Scholar] [CrossRef]
  12. Yang, X.; Li, Y.; Lyu, S. Exposing deep fakes using inconsistent head poses. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8261–8265. [Google Scholar]
  13. Yi, R.; Ye, Z.; Zhang, J.; Bao, H.; Liu, Y.-J. Audio-driven talking face video generation with learning-based personalized head pose. arXiv 2020, arXiv:2002.10137. [Google Scholar]
  14. Kong, C.; Chen, B.; Li, H.; Wang, S.; Rocha, A.; Kwong, S. Detect and locate: Exposing face manipulation by semantic-and noise-level telltales. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1741–1756. [Google Scholar] [CrossRef]
  15. Luo, A.; Kong, C.; Huang, J.; Hu, Y.; Kang, X.; Kot, A.C. Beyond the prior forgery knowledge: Mining critical clues for general face forgery detection. IEEE Trans. Inf. Forensics Secur. 2023, 19, 1168–1182. [Google Scholar] [CrossRef]
  16. Mohamed, M.H.; Khafagy, M.H.; Elbeh, H.; Abdalla, A.M. Sparsity and cold start recommendation system challenges solved by hybrid feedback. Int. J. Eng. Res. Technol. 2019, 12, 2734–2741. [Google Scholar]
  17. Mohamed, M.H.; Ibrahim, L.F.; Elmenshawy, K.; Fadlallah, H.R. Adaptive learning systems based on ILOs of courses. WSEAS Trans. Syst. Control 2023, 18, 1–17. [Google Scholar] [CrossRef]
  18. Mohamed, M.H.; Khafagy, M.H. Hash semi cascade join for joining multi-way map reduce. In Proceedings of the 2015 SAI Intelligent Systems Conference (IntelliSys), London, UK, 10–11 November 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 355–361. [Google Scholar]
  19. Mohamed, M.H.; Khafagy, M.H.; Hasan, M. Music Recommendation System Used Emotions to Track and Change Negative Users’ mood. J. Theor. Appl. Inf. Technol. 2021, 99, 4358–4376. [Google Scholar]
  20. Sayed, A.R.; Khafagy, M.H.; Ali, M.; Mohamed, M.H. Predict student learning styles and suitable assessment methods using click stream. Egypt. Inform. J. 2024, 26, 100469. [Google Scholar] [CrossRef]
  21. Shan, Y.; Hu, D.; Wang, Z. A Novel Truncated Norm Regularization Method for Multi-channel Color Image Denoising. IEEE Trans. Circuits Syst. Video Technol. 2024. [Google Scholar] [CrossRef]
  22. Liu, Y.; Yan, Z.; Tan, J.; Li, Y. Multi-purpose oriented single nighttime image haze removal based on unified variational retinex model. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1643–1657. [Google Scholar] [CrossRef]
  23. Liu, Y.; Yan, Z.; Chen, S.; Ye, T.; Ren, W.; Chen, E. Nighthazeformer: Single nighttime haze removal using prior query transformer. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 4119–4128. [Google Scholar]
  24. Li, C.; Hu, E.; Zhang, X.; Zhou, H.; Xiong, H.; Liu, Y. Visibility restoration for real-world hazy images via improved physical model and Gaussian total variation. Front. Comput. Sci. 2024, 18, 1–3. [Google Scholar] [CrossRef]
  25. Xiao, J.; Zhou, J.; Lei, J.; Xu, C.; Sui, H. Image hazing algorithm based on generative adversarial networks. IEEE Access 2019, 8, 15883–15894. [Google Scholar] [CrossRef]
  26. Mustak, M.; Salminen, J.; Mäntymäki, M.; Rahman, A.; Dwivedi, Y.K. Deepfakes: Deceptions, mitigations, and opportunities. J. Bus. Res. 2023, 154, 113368. [Google Scholar] [CrossRef]
  27. Traboulsi, N. Deepfakes: Analysis of Threats and Countermeasures; California State University: Fullerton, CA, USA, 2020. [Google Scholar]
  28. Don, L. Advanced Cybersecurity Strategies: Leveraging Machine Learning for Deepfake and Malware Defense. Cameroon. 2024. Available online: https://easychair.org/publications/preprint/pN7Q (accessed on 16 April 2024).
  29. Owaid, M.A.; Hammoodi, A.S. Evaluating Machine Learning and Deep Learning Models for Enhanced DDoS Attack Detection. Math. Model. Eng. Probl. 2024, 11, 493–499. [Google Scholar] [CrossRef]
  30. Wazirali, R.; Yaghoubi, E.; Abujazar, M.S.S.; Ahmad, R.; Vakili, A.H. State-of-the-art review on energy and load forecasting in microgrids using artificial neural networks, machine learning, and deep learning techniques. Electr. Power Syst. Res. 2023, 225, 109792. [Google Scholar] [CrossRef]
  31. Thippanna, D.G.; Priya, M.D.; Srinivas, T.A.S. An Effective Analysis of Image Processing with Deep Learning Algorithms. Int. J. Comput. Appl. 2023, 975, 8887. [Google Scholar] [CrossRef]
  32. Hassini, K.; Khalis, S.; Habibi, O.; Chemmakha, M.; Lazaar, M. An end-to-end learning approach for enhancing intrusion detection in Industrial-Internet of Things. Knowl. Based Syst. 2024, 294, 111785. [Google Scholar] [CrossRef]
  33. George, A.S.; George, A.S.H. Deepfakes: The Evolution of Hyper realistic Media Manipulation. Partn. Univers. Innov. Res. Publ. 2023, 1, 58–74. [Google Scholar]
  34. Masood, M.; Nawaz, M.; Malik, K.M.; Javed, A.; Irtaza, A.; Malik, H. Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Appl. Intell. 2023, 53, 3974–4026. [Google Scholar] [CrossRef]
  35. Namazli, P. Face Spoof Detection Using Convolutional Neural Networks. Probl. Inf. Soc. 2023, 14, 40–46. [Google Scholar]
  36. Alkishri, W.; Widyarto, S.; Yousif, J.H.; Al-Bahri, M. Fake Face Detection Based on Colour Textual Analysis Using Deep Convolutional Neural Network. J. Internet Serv. Inf. Secur. 2023, 13, 143–155. [Google Scholar] [CrossRef]
  37. Gupta, A.; Pandey, D. Unmasking the Illusion: Deepfake Detection through MesoNet. In Proceedings of the 2024 IEEE International Conference on Computing, Power and Communication Technologies (IC2PCT), Greater Noida, India, 9–10 February 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1934–1938. [Google Scholar]
  38. Sajith, S.; Pooja, A.; Ramesh, T.; Rajpal, P.; Roshna, A.R.; Ahammad, J. Anemia Identification from Blood Smear Images Using Deep Learning: An XAI Approach. In Proceedings of the 2023 International Conference on Recent Advances in Information Technology for Sustainable Development (ICRAIS), Manipal, India, 6–7 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 147–152. [Google Scholar]
  39. Zhang, C.; Costa-Perez, X.; Patras, P. Adversarial attacks against deep learning-based network intrusion detection systems and defense mechanisms. IEEE/ACM Trans. Netw. 2022, 30, 1294–1311. [Google Scholar] [CrossRef]
  40. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  41. Khaleel, M.; Yaghoubi, E.; Yaghoubi, E.; Jahromi, M.Z. The role of mechanical energy storage systems based on artificial intelligence techniques in future sustainable energy systems. Int. J. Electr. Eng. Sustain. (IJEES) 2023, 1, 1–31. [Google Scholar]
  42. Zhang, L.; Zhao, D.; Lim, C.P.; Asadi, H.; Huang, H.; Yu, Y.; Gao, R. Video Deepfake Classification Using Particle Swarm Optimization-based Evolving Ensemble Models. Knowl. Based Syst. 2024, 289, 111461. [Google Scholar] [CrossRef]
  43. Nailwal, S.; Singhal, S.; Singh, N.T.; Raza, A. Deepfake Detection: A Multi-Algorithmic and Multi-Modal Approach for Robust Detection and Analysis. In Proceedings of the 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 1–2 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–8. [Google Scholar]
  44. Taji, K.; Sohail, A.; Shahzad, T.; Khan, B.S.; Khan, M.A.; Ouahada, K. An Ensemble Hybrid Framework: A Comparative Analysis of Metaheuristic Algorithms for Ensemble Hybrid CNN features for Plants Disease Classification. IEEE Access 2024, 12, 61886–61906. [Google Scholar] [CrossRef]
  45. Passos, L.A.; Jodas, D.; Costa, K.A.P.; Júnior, L.A.S.; Rodrigues, D.; Del Ser, J.; Camacho, D.; Papa, J.P. A review of deep learning-based approaches for deepfake content detection. Expert Syst. 2022. [Google Scholar] [CrossRef]
  46. Bappy, J.H.; Roy-Chowdhury, A.K.; Bunk, J.; Nataraj, L.; Manjunath, B.S. Exploiting spatial structure for localizing manipulated image regions. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4970–4979. [Google Scholar]
  47. Zhang, W.; Gu, X.; Tang, L.; Yin, Y.; Liu, D.; Zhang, Y. Application of machine learning, deep learning and optimization algorithms in geoengineering and geoscience: Comprehensive review and future challenge. Gondwana Res. 2022, 109, 1–17. [Google Scholar] [CrossRef]
  48. Dolhansky, B.; Bitton, J.; Pflaum, B.; Lu, J.; Howes, R.; Wang, M.; Ferrer, C.C. The deepfake detection challenge (dfdc) dataset. arXiv 2020, arXiv:2006.07397. [Google Scholar]
  49. Zi, B.; Chang, M.; Chen, J.; Ma, X.; Jiang, Y.G. Wilddeepfake: A challenging real-world dataset for deepfake detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020. [Google Scholar]
  50. Al Shalchi, N.F.A.; Rahebi, J. Human retinal optic disc detection with grasshopper optimization algorithm. Multimed. Tools Appl. 2022, 81, 24937–24955. [Google Scholar] [CrossRef]
  51. Al-Safi, H.; Munilla, J.; Rahebi, J. Patient privacy in smart cities by blockchain technology and feature selection with Harris Hawks Optimization (HHO) algorithm and machine learning. Multimed. Tools Appl. 2022, 81, 8719–8743. [Google Scholar] [CrossRef]
  52. Al-Safi, H.; Munilla, J.; Rahebi, J. Harris Hawks Optimization (HHO) Algorithm based on Artificial Neural Network for Heart Disease Diagnosis. In Proceedings of the 2021 IEEE International Conference on Mobile Networks and Wireless Communications (ICMNWC), Tumkur, India, 3–4 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5. [Google Scholar]
  53. Al-Rahlawee, A.T.H.; Rahebi, J. Multilevel thresholding of images with improved Otsu thresholding by black widow optimization algorithm. Multimed. Tools Appl. 2021, 80, 28217–28243. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
