MDPI - Publisher of Open Access Journals

16 pages, 5963 KB

Open Access Article

Channel-Aware Local–Global Representation Learning for Generalizable Deepfake Detection

by Liang Kuang, Beijing Chen and Pei Shi

Mathematics 2026, 14(11), 1913; https://doi.org/10.3390/math14111913 - 1 Jun 2026

Viewed by 234

Deepfake detection has emerged as a critical research area in image content security. To address the issue of limited generalization caused by insufficient modeling of forgery representations, this paper proposes a channel-aware local–global representation learning network for generalizable deepfake detection. Specifically, we introduce a Local–Global Integration Vision Transformer (LGI-ViT) block that learns local and global representations and further integrates them with input features to capture more generalizable forgery cues at both fine-grained and global levels. Local representation learning is enhanced through coordinate convolution, while a hybrid convolution–Transformer architecture is employed to model global dependencies. Based on these representations, a residual connection is incorporated to integrate the combined local–global representation with the original input features. In addition, a Lightweight Channel Attention Network (LCAN) block is designed to strengthen interactions among feature channels and improve the discriminability of forgery-related representations. Experimental results demonstrate that the proposed network, trained on the FaceForensics++ (FF++) dataset, achieves cross-dataset AUC scores of 73.95% and 79.42% on the DeepFake Detection Challenge (DFDC) and Celeb-DF datasets, respectively. It outperforms the best-performing baseline among 11 competing models for generalizable detection by 1.01 percentage points on average, thereby validating its effectiveness in generalizable deepfake detection.
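
The abstract does not specify how the LCAN block is built. As a rough illustration of what channel attention does in general, the sketch below uses a generic squeeze-and-excitation style gate in NumPy. The function name, tensor sizes, reduction ratio, and random weights are all hypothetical and are not taken from the paper.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    Generic SE-style sketch: global average pool -> bottleneck -> sigmoid gate.
    This is a stand-in for illustration, not the paper's LCAN block.
    """
    squeezed = features.mean(axis=(1, 2))          # (C,) one descriptor per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)        # (C // r,) ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # (C,) weights in (0, 1)
    return features * gate[:, None, None], gate

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2                            # hypothetical sizes
x = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C // r, C)) * 0.1            # hypothetical, untrained weights
w2 = rng.normal(size=(C, C // r)) * 0.1
y, gate = channel_attention(x, w1, w2)
print(y.shape, gate.round(3))
```

Each channel is scaled by a single learned weight, so informative channels can be emphasized without changing the spatial layout of the feature map.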

(This article belongs to the Special Issue Artificial Intelligence Algorithms in Information Security and Cryptography)

18 pages, 1606 KB

Open Access Article

Multi-Scale Dynamic Perception and Context Guidance Modulation for Efficient Deepfake Detection

by Yuanqing Ding, Fanliang Bu and Hanming Zhai

Electronics 2026, 15(8), 1569; https://doi.org/10.3390/electronics15081569 - 9 Apr 2026

Viewed by 445

Abstract

Deepfake technology poses significant threats to information authenticity and social trust, necessitating effective detection methods. However, existing detection approaches predominantly rely on high-complexity network architectures that, while accurate in controlled environments, suffer from prohibitive computational costs that hinder deployment in resource-constrained scenarios such as social media platforms. To address this efficiency-accuracy dilemma, we propose a lightweight face forgery detection method that systematically learns multi-scale forgery traces. Our approach features a four-stage lightweight architecture that hierarchically extracts features from local textures to global semantics, mimicking the human visual system. Within each stage, a multi-scale dynamic perception mechanism divides feature channels into parallel groups equipped with lightweight attention modules to capture forgery cues spanning pixel-level anomalies, local structures, regional patterns, and semantic inconsistencies. Furthermore, rather than relying on conventional feature fusion that risks suppressing subtle artifacts, we introduce a novel Context-Guided Dynamic Convolution. This mechanism uses mid-level spatial anomalies as active anchors to dynamically modulate high-level semantic filters, with the goal of mitigating the disconnect between semantic content and forgery evidence. Our model achieves strong performance, yielding an AUC of 91.98% on FaceForensics++ and 93.50% on DeepFake Detection Challenge, outperforming current state-of-the-art lightweight methods. Furthermore, compared to heavy Vision Transformers, our model achieves a superior performance-efficiency trade-off, requiring only 3.06 M parameters and 1.36 G FLOPs, making it highly suitable for real-time, resource-constrained deployment.

(This article belongs to the Section Electronic Multimedia)

15 pages, 1482 KB

Open Access Article

PatchSeal: A Robust and Intangible Image Watermarking Framework for AIGC

by Ting You, Haixia Zheng, Zhaohan Wang and Yi Chen

Mathematics 2026, 14(4), 679; https://doi.org/10.3390/math14040679 - 14 Feb 2026

Viewed by 692

Abstract

The rapid growth of artificial intelligence-generated content (AIGC) has created serious challenges for image copyright protection, since semantic edits and deepfake manipulations can easily erase or distort embedded watermarks. Traditional robust watermarking methods, which are mainly designed to resist pixel-level distortions such as noise, compression, or filtering, often fail when faced with content-level transformations generated by AIGC models. This paper presents PatchSeal, a robust and intangible image watermarking framework that combines multi-targeted and attention-oriented embedding with focus-oriented masking. The proposed framework introduces a segmentation-assisted embedding strategy that distributes watermark bits across several prominent regions to improve resilience to semantic changes. An attention-based module, composed of a subject extraction branch and a channel weighting branch, steers the encoder toward texture-rich and semantically stable regions, improving both invisibility and robustness. Experiments conducted on three public object datasets show that PatchSeal achieves an average PSNR of 43.13 dB and a bit precision of 92.98% under various AIGC editing conditions, surpassing representative methods such as MBRS and FIN. These results demonstrate the effectiveness of the proposed method in resisting AIGC-driven manipulations and provide new practical paths and methodological insights for the design of robust watermarks in the AIGC era.
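
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how PSNR and a per-bit recovery rate are commonly computed. It assumes 8-bit images and assumes that "bit precision" means the fraction of watermark bits decoded correctly; the paper may define it differently. The function names and toy data are hypothetical.

```python
import numpy as np

def psnr(original, watermarked, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    original = np.asarray(original, dtype=np.float64)
    watermarked = np.asarray(watermarked, dtype=np.float64)
    mse = np.mean((original - watermarked) ** 2)
    if mse == 0:
        return float("inf")                       # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of watermark bits recovered correctly (assumed definition)."""
    embedded_bits = np.asarray(embedded_bits)
    decoded_bits = np.asarray(decoded_bits)
    return float(np.mean(embedded_bits == decoded_bits))

cover = np.zeros((8, 8), dtype=np.uint8)
marked = np.ones((8, 8), dtype=np.uint8)          # every pixel off by 1 -> MSE = 1
value = psnr(cover, marked)                       # equals 20 * log10(255)
acc = bit_accuracy([1, 0, 1, 1], [1, 0, 0, 1])    # 3 of 4 bits match
print(round(value, 2), acc)
```

A higher PSNR means the watermarked image is closer to the cover image, which is why it is used as a proxy for invisibility.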

30 pages, 16517 KB

Open Access Article

An Attention-Based Framework for Detecting Face Forgeries: Integrating Efficient-ViT and Wavelet Transform

by Yinfei Xiao, Yanbing Zhou, Pengzhan Cheng, Leqian Ni, Xusheng Wu and Tianxiang Zheng

Mathematics 2025, 13(16), 2576; https://doi.org/10.3390/math13162576 - 12 Aug 2025

Cited by 1 | Viewed by 2715

Abstract

As face forgery techniques, particularly DeepFake methods, advance, effective detection of manipulations that produce hyper-realistic facial representations becomes imperative for mitigating security threats. Current spatial-domain approaches commonly struggle to generalize across forgery methods and compression artifacts, whereas frequency-based analyses show promise in identifying nuanced local cues; however, their lack of global context limits generalization. This study introduces a hybrid architecture that integrates Efficient-ViT and a multi-level wavelet transform to merge spatial and frequency features through a dynamic adaptive multi-branch attention (DAMA) mechanism, thereby improving the deep interaction between the two modalities. We devise a joint loss function and a training strategy to address the imbalanced data issue and improve the training process. Experimental results on the FaceForensics++ and Celeb-DF (V2) datasets validate the effectiveness of our approach, which attains 97.07% accuracy in intra-dataset evaluations and a 74.7% AUC score in cross-dataset assessments, surpassing our baseline Efficient-ViT by 14.1% and 7.7%, respectively. The findings indicate that our approach generalizes well across datasets and methodologies while also minimizing feature redundancy through an orthogonal loss that regularizes the feature space, as evidenced by the ablation study and parameter analysis.
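
The abstract does not say which wavelet family is used. To make the idea of a multi-level wavelet transform concrete, here is a minimal sketch with the orthonormal Haar wavelet in NumPy, assuming image sides divisible by 2 at every level. The function names and the band names are hypothetical.

```python
import numpy as np

def haar_dwt2(image):
    """One level of a 2D orthonormal Haar transform on an even-sized image."""
    x = np.asarray(image, dtype=np.float64)
    a, b = x[0::2, 0::2], x[0::2, 1::2]     # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]     # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0          # low-frequency content
    row_diff = (a + b - c - d) / 2.0        # change between the two rows
    col_diff = (a - b + c - d) / 2.0        # change between the two columns
    diag_diff = (a - b - c + d) / 2.0       # diagonal change
    return approx, (row_diff, col_diff, diag_diff)

def multilevel_haar(image, levels):
    """Repeatedly decompose the approximation band; returns (approx, details)."""
    approx = np.asarray(image, dtype=np.float64)
    details = []
    for _ in range(levels):
        approx, bands = haar_dwt2(approx)
        details.append(bands)
    return approx, details

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(8, 8))    # toy stand-in for a face crop
approx, details = multilevel_haar(img, levels=2)
print(approx.shape, [band.shape for band in details[0]])
```

The detail bands at each level isolate high-frequency content, which is where blending and compression artifacts often show up; the approximation band carries the coarse structure.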

35 pages, 1458 KB

Open Access Article

User Comment-Guided Cross-Modal Attention for Interpretable Multimodal Fake News Detection

by Zepu Yi, Chenxu Tang and Songfeng Lu

Appl. Sci. 2025, 15(14), 7904; https://doi.org/10.3390/app15147904 - 15 Jul 2025

Cited by 1 | Viewed by 2998

Abstract

The proliferation of fake news in the digital age has a profound and harmful impact on societal structures, including the misguidance of public opinion, the erosion of social trust, and the exacerbation of social polarization. Current fake news detection methods are largely limited to superficial text analysis or basic text–image integration, and they struggle to identify deceptive information accurately. To bridge this gap, we propose the UC-CMAF framework, which comprehensively integrates news text, images, and user comments through an adaptive co-attention fusion mechanism. The UC-CMAF workflow consists of four key subprocesses: multimodal feature extraction, cross-modal adaptive collaborative attention fusion of news text and images, cross-modal attention fusion of user comments with news text and images, and finally, input of the fused features into a fake news detector. Specifically, we introduce multi-head cross-modal attention heatmaps and comment importance visualizations to provide interpretability support for the model’s predictions, revealing key semantic areas and user perspectives that influence judgments. Through the cross-modal adaptive collaborative attention mechanism, UC-CMAF achieves deep semantic alignment between news text and images and uses social signals from user comments to build an enhanced credibility evaluation path, offering a new paradigm for interpretable fake information detection. Experimental results demonstrate that UC-CMAF consistently outperforms 15 baseline models across two benchmark datasets, achieving F1 Scores of 0.894 and 0.909. These results validate the effectiveness of its adaptive cross-modal attention mechanism and the incorporation of user comments in enhancing both detection accuracy and interpretability.
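
As a rough illustration of cross-modal attention and of where an attention heatmap comes from, the sketch below lets text tokens attend over image patches with plain scaled dot-product attention. It is single-head and omits the learned projections that a real model would use; the function names, dimensions, and random features are hypothetical and are not UC-CMAF's actual design.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_tokens, image_patches):
    """Text tokens (queries) attend over image patches (keys and values).

    Returns image-informed text features and the attention weights,
    which can be rendered as a token-by-patch heatmap.
    """
    d = text_tokens.shape[-1]
    scores = text_tokens @ image_patches.T / np.sqrt(d)   # (n_tokens, n_patches)
    weights = softmax(scores, axis=-1)                    # each row sums to 1
    attended = weights @ image_patches                    # (n_tokens, d)
    return attended, weights

rng = np.random.default_rng(0)
text = rng.normal(size=(5, 16))      # 5 hypothetical token embeddings
patches = rng.normal(size=(9, 16))   # 9 hypothetical patch embeddings (3x3 grid)
fused, heatmap = cross_modal_attention(text, patches)
print(fused.shape, heatmap.shape)
```

Reshaping one row of `heatmap` to the patch grid (here 3x3) shows which image regions a given token relied on, which is the kind of visualization the abstract describes.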

(This article belongs to the Special Issue Explainable Artificial Intelligence Technology and Its Applications)

42 pages, 3140 KB

Open Access Review

Face Anti-Spoofing Based on Deep Learning: A Comprehensive Survey

by Huifen Xing, Siok Yee Tan, Faizan Qamar and Yuqing Jiao

Appl. Sci. 2025, 15(12), 6891; https://doi.org/10.3390/app15126891 - 18 Jun 2025

Cited by 12 | Viewed by 17588

Abstract

Face recognition has achieved tremendous success in both its theory and technology. However, with increasingly realistic attacks, such as print photos, replay videos, and 3D masks, as well as new attack methods like AI-generated faces or videos, face recognition systems are confronted with significant challenges and risks. Distinguishing between real and fake faces, i.e., face anti-spoofing (FAS), is crucial to the security of face recognition systems. With the advent of large-scale academic datasets in recent years, FAS based on deep learning has achieved a remarkable level of performance and now dominates the field. This paper systematically reviews the latest advancements in FAS based on deep learning. First, it provides an overview of the background, basic concepts, and types of FAS attacks. Then, it categorizes existing FAS methods from the perspectives of RGB (red, green and blue) modality and other modalities, discussing the main concepts, the types of attacks that can be detected, their advantages and disadvantages, and so on. Next, it introduces popular datasets used in FAS research and highlights their characteristics. Finally, it summarizes the current research challenges and future directions for FAS, such as its limited generalization for unknown attacks, the insufficient multi-modal research, the spatiotemporal efficiency of algorithms, and unified detection for presentation attacks and deepfakes. We aim to provide a comprehensive reference in this field and to inspire progress within the FAS community, guiding researchers toward promising directions for future work.

(This article belongs to the Special Issue Deep Learning in Object Detection)

20 pages, 2171 KB

Open Access Article

CBAM-ResNet: A Lightweight ResNet Network Focusing on Time Domain Features for End-to-End Deepfake Speech Detection

by Yuezhou Wu, Hua Huang, Zhiri Li and Siling Zhang

Electronics 2025, 14(12), 2456; https://doi.org/10.3390/electronics14122456 - 17 Jun 2025

Cited by 3 | Viewed by 2231

Abstract

With the rapid development of synthetic speech and deepfake technology, fake speech poses a severe challenge to voice authentication systems. Traditional detection methods generally rely on manual feature extraction, facing problems such as limited feature expression ability and insufficient cross-scenario generalization performance. To this end, this paper proposes an improved ResNet network based on a Convolutional Block Attention Module (CBAM) for end-to-end fake speech detection. This method introduces channel attention and spatial attention mechanisms into the ResNet network structure to enhance the model’s attention to the temporal characteristics of speech, thereby improving the ability to distinguish between real and fake speech. The proposed model adopts an end-to-end training strategy, directly processes the original spectrogram input, uses the residual structure to alleviate the gradient vanishing problem in the deep network, and enhances the collaborative expression ability of local details and global context through the CBAM module. The experiment is conducted on the ASVspoof2019 LA dataset, and the equal error rate (EER) is used as the main evaluation indicator. The experimental results show that compared with traditional deepfake speech detection methods, the proposed model achieves better performance in indicators such as EER, verifying the effectiveness of the CBAM attention mechanism in forged speech detection.
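
Since the equal error rate is the main metric here, a small sketch of how it can be estimated from detector scores may help. It assumes that higher scores mean "more likely bona fide" and that label 1 marks bona fide speech; the function name and toy scores are hypothetical, and official evaluation toolkits may interpolate between thresholds rather than pick the nearest one.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Estimate the EER by sweeping thresholds over the observed scores.

    labels: 1 = bona fide, 0 = spoof. A trial is accepted when score >= threshold.
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)
    thresholds = np.append(thresholds, thresholds[-1] + 1.0)  # "reject everything"
    spoof, bona = scores[labels == 0], scores[labels == 1]
    far = np.array([np.mean(spoof >= t) for t in thresholds])  # spoof accepted
    frr = np.array([np.mean(bona < t) for t in thresholds])    # bona fide rejected
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]   # toy detector outputs
labels = [1, 1, 1, 0, 0, 0]
eer, thr = equal_error_rate(scores, labels)
print(eer, thr)
```

In this toy case one spoof trial outranks two bona fide trials, so no threshold separates the classes perfectly and the two error rates meet at one third.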

(This article belongs to the Special Issue Emerging Trends in Generative-AI Based Audio Processing)

41 pages, 5112 KB

Open Access Article

Deepfake Face Detection and Adversarial Attack Defense Method Based on Multi-Feature Decision Fusion

by Shanzhong Lei, Junfang Song, Feiyang Feng, Zhuyang Yan and Aixin Wang

Appl. Sci. 2025, 15(12), 6588; https://doi.org/10.3390/app15126588 - 11 Jun 2025

Cited by 5 | Viewed by 8091

Abstract

The rapid advancement in deep forgery technology in recent years has created highly deceptive face video content, posing significant security risks. Detecting these fakes is increasingly urgent and challenging. To improve the accuracy of deepfake face detection models and strengthen their resistance to adversarial attacks, this manuscript introduces a method for detecting forged faces and defending against adversarial attacks based on a multi-feature decision fusion. This approach allows for rapid detection of fake faces while effectively countering adversarial attacks. Firstly, an improved IMTCCN network was employed to precisely extract facial features, complemented by a diffusion model for noise reduction and artifact removal. Subsequently, the FG-TEFusionNet (Facial-geometry and Texture enhancement fusion-Net) model was developed for deepfake face detection and assessment. This model comprises two key modules: one for extracting temporal features between video frames and another for spatial features within frames. Initially, a facial geometry landmark calibration module based on the LRNet baseline framework ensured an accurate representation of facial geometry. A SENet attention mechanism was then integrated into the dual-stream RNN to enhance the model’s capability to extract inter-frame information and derive preliminary assessment results based on inter-frame relationships. Additionally, a Gram image texture feature module was designed and integrated into EfficientNet and the attention maps of WSDAN (Weakly Supervised Data Augmentation Network). This module aims to extract deep-level feature information from the texture structure of image frames, addressing the limitations of purely geometric features. The final decisions from both modules were integrated using a voting method, completing the deepfake face detection process. 
Ultimately, the model’s robustness was validated by generating adversarial samples using the I-FGSM algorithm and optimizing model performance through adversarial training. Extensive experiments demonstrated the superior performance and effectiveness of the proposed method across four subsets of FaceForensics++ and the Celeb-DF dataset.
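
For reference, the iterative fast gradient sign method (I-FGSM) mentioned above is usually written as the update below. The symbols are the conventional ones and are not taken from the paper: $x$ is the clean input, $y$ its label, $J$ the training loss with parameters $\theta$, $\alpha$ the step size, and $\epsilon$ the maximum allowed perturbation.

```latex
x^{\mathrm{adv}}_{0} = x, \qquad
x^{\mathrm{adv}}_{t+1} = \operatorname{Clip}_{x,\epsilon}\!\left\{
  x^{\mathrm{adv}}_{t} + \alpha \,\operatorname{sign}\!\left(
    \nabla_{x} J\!\left(\theta,\, x^{\mathrm{adv}}_{t},\, y\right)
  \right)
\right\}
```

Here $\operatorname{Clip}_{x,\epsilon}$ keeps every pixel within $\epsilon$ of the clean input, so the perturbation stays small while the loss is pushed up over several steps.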

(This article belongs to the Section Computing and Artificial Intelligence)

16 pages, 2542 KB

Open Access Article

The Eyes: A Source of Information for Detecting Deepfakes

by Elisabeth Tchaptchet, Elie Fute Tagne, Jaime Acosta, Danda B. Rawat and Charles Kamhoua

Information 2025, 16(5), 371; https://doi.org/10.3390/info16050371 - 30 Apr 2025

Cited by 5 | Viewed by 4344

Abstract

Deepfakes are becoming increasingly significant because deep learning tools based on generative adversarial networks (GANs) enable the creation of extremely realistic images capable of deceiving anyone. These images are used as profile pictures on social media with the intent to sow discord and perpetrate scams on a global scale. In this study, we demonstrate that these images can be identified through various imperfections present in the synthesized eyes, such as the irregular shape of the pupil and the difference between the corneal reflections of the two eyes. These defects result from the absence of physical and physiological constraints in most GAN models. We develop a two-level architecture capable of detecting these fake images. This approach begins with an automatic segmentation method for the pupils to verify their shape, as pupils in real images naturally have a regular shape, typically round. Next, for all images where the pupils are not regular, the entire image is analyzed to verify the reflections. This step involves passing the facial image through an architecture that extracts and compares the specular reflections of the corneas of the two eyes, assuming that the eyes of real people observing a light source should reflect the same thing. Our experiments with a large dataset of real images from the Flickr-Faces-HQ (FFHQ) and CelebA datasets, as well as fake images from StyleGAN2 and ProGAN, show the effectiveness of our method. On the FFHQ dataset and images generated by StyleGAN2, our algorithm achieved a detection accuracy of 0.968, a sensitivity of 0.911, a specificity of 0.907, and a precision of 0.90. On the CelebA dataset and images generated by ProGAN, it achieved a detection accuracy of 0.870, a sensitivity of 0.901, a specificity of 0.807, and a precision of 0.88. Our approach maintains good stability of physiological properties during deep learning, making it as robust as some single-class deepfake detection methods. The results of the tests on the selected datasets demonstrate higher accuracy compared to other methods.
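
The two-level check described in this abstract can be sketched as a small cascade. The version below is only one possible reading: it scores pupil regularity as the overlap between a segmented pupil mask and a disk of equal area, and compares the two corneal highlight masks with intersection over union, assuming the masks are already segmented and aligned. The function names, thresholds, and toy masks are hypothetical and are not the authors' implementation.

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union

def disk_mask(shape, center, radius):
    yy, xx = np.ogrid[:shape[0], :shape[1]]
    return (yy - center[0]) ** 2 + (xx - center[1]) ** 2 <= radius ** 2

def pupil_regularity(mask):
    """Overlap between the pupil mask and an equal-area disk at its centroid."""
    mask = np.asarray(mask, dtype=bool)
    area = mask.sum()
    if area == 0:
        return 0.0
    ys, xs = np.nonzero(mask)
    radius = np.sqrt(area / np.pi)
    return iou(mask, disk_mask(mask.shape, (ys.mean(), xs.mean()), radius))

def two_level_check(pupils, reflections, shape_thr=0.85, refl_thr=0.5):
    # Level 1: both pupils close to round -> treat the image as real.
    if all(pupil_regularity(m) >= shape_thr for m in pupils):
        return "real"
    # Level 2: irregular pupils -> compare the two corneal highlight masks.
    left, right = reflections
    return "real" if iou(left, right) >= refl_thr else "fake"

round_pupil = disk_mask((64, 64), (32, 32), 10)
odd_pupil = np.zeros((64, 64), dtype=bool)
odd_pupil[30:34, 12:52] = True                  # elongated, clearly not round
glint_a = np.zeros((64, 64), dtype=bool)
glint_a[5:8, 5:8] = True
glint_b = np.zeros((64, 64), dtype=bool)
glint_b[20:23, 20:23] = True                    # highlight in a different place
print(two_level_check((odd_pupil, odd_pupil), (glint_a, glint_b)))
```

Running the cheap shape test first means the reflection comparison is only needed for the images that fail it, which matches the order of the two levels in the abstract.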

48 pages, 6422 KB

Open Access Review

Modern Trends and Recent Applications of Hyperspectral Imaging: A Review

by Ming-Fang Cheng, Arvind Mukundan, Riya Karmakar, Muhamed Adil Edavana Valappil, Jumana Jouhar and Hsiang-Chen Wang

Technologies 2025, 13(5), 170; https://doi.org/10.3390/technologies13050170 - 23 Apr 2025

Cited by 60 | Viewed by 18621

Abstract

Hyperspectral imaging (HSI) is an advanced imaging technique that captures detailed spectral information across multiple fields. This review explores its applications in counterfeit detection, remote sensing, agriculture, medical imaging, cancer detection, environmental monitoring, mining, mineralogy, and food processing, specifically highlighting significant achievements from the past five years and providing a timely update across several fields. It also presents a cross-disciplinary classification framework to systematically categorize applications in medicine, agriculture, the environment, and industry. In counterfeit detection, HSI identified fake currency with high accuracy in the 400–500 nm range and achieved a 99.03% F1-score for counterfeit alcohol detection. Remote sensing applications include hyperspectral satellites, which improve forest classification accuracy by 50%, and soil organic matter prediction, which reached R² = 0.6. In agriculture, the HSI-TransUNet model achieved 86.05% accuracy for crop classification, and disease detection reached 98.09% accuracy. Medical imaging benefits from HSI’s non-invasive diagnostics, distinguishing skin cancer with 87% sensitivity and 88% specificity. In cancer detection, colorectal cancer identification reached 86% sensitivity and 95% specificity. Environmental applications include PM2.5 pollution detection with 85.93% accuracy and marine plastic waste detection with 70–80% accuracy. In food processing, egg freshness prediction achieved R² = 0.91, and pine nut classification reached 100% accuracy. Despite its advantages, HSI faces challenges like high costs and complex data processing. Advances in artificial intelligence and miniaturization are expected to improve accessibility and real-time applications. Future advancements are anticipated to concentrate on the integration of deep learning models for automated feature extraction and decision-making in hyperspectral imaging analysis.
The development of lightweight, portable HSI devices will enable more on-site applications in agriculture, healthcare, and environmental monitoring. Moreover, real-time processing methods will enhance efficiency for field deployment. These improvements seek to enhance the accessibility, practicality, and efficacy of HSI in both industrial and clinical environments.

(This article belongs to the Special Issue Artificial Intelligence and Smart Information Systems: Trends and Innovations)

30 pages, 1422 KB

Open Access Feature Paper Article

A Comparative Analysis of Compression and Transfer Learning Techniques in DeepFake Detection Models

by Andreas Karathanasis, John Violos and Ioannis Kompatsiaris

Mathematics 2025, 13(5), 887; https://doi.org/10.3390/math13050887 - 6 Mar 2025

Cited by 5 | Viewed by 5400

Abstract

DeepFake detection models play a crucial role in ambient intelligence and smart environments, where systems rely on authentic information for accurate decisions. These environments, integrating interconnected IoT devices and AI-driven systems, face significant threats from DeepFakes, potentially leading to compromised trust, erroneous decisions, and security breaches. To mitigate these risks, neural-network-based DeepFake detection models have been developed. However, their substantial computational requirements and long training times hinder deployment on resource-constrained edge devices. This paper investigates compression and transfer learning techniques to reduce the computational demands of training and deploying DeepFake detection models, while preserving performance. Pruning, knowledge distillation, quantization, and adapter modules are explored to enable efficient real-time DeepFake detection. An evaluation was conducted on four benchmark datasets: “SynthBuster”, “140k Real and Fake Faces”, “DeepFake and Real Images”, and “ForenSynths”. It compared compressed models with uncompressed baselines using widely recognized metrics such as accuracy, precision, recall, F1-score, model size, and training time. The results showed that a compressed model at 10% of the original size retained only 56% of the baseline accuracy, but fine-tuning in similar scenarios increased this to nearly 98%. In some cases, the accuracy even surpassed the original’s performance by up to 12%. These findings highlight the feasibility of deploying DeepFake detection models in edge computing scenarios.
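
Two of the compression techniques named here, pruning and quantization, can be illustrated on a single weight matrix. The sketch below uses unstructured magnitude pruning and symmetric int8 quantization as generic examples; the paper does not state which variants it uses, and the function names, the 10% keep ratio applied to weights, and the random matrix are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, keep_ratio):
    """Zero out all but the largest-magnitude fraction of the weights."""
    flat = np.abs(weights).ravel()
    k = max(1, int(round(keep_ratio * flat.size)))
    threshold = np.sort(flat)[-k]                  # k-th largest magnitude
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

def quantize_int8(weights):
    """Symmetric per-tensor quantization to int8; returns (q, scale)."""
    max_abs = np.max(np.abs(weights))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))                    # toy stand-in for a layer
pruned, mask = magnitude_prune(w, keep_ratio=0.10)
q, scale = quantize_int8(pruned)
restored = q.astype(np.float64) * scale            # dequantized weights
print(int(mask.sum()), q.dtype, float(np.max(np.abs(restored - pruned))))
```

Pruning removes weights and quantization stores the remaining ones in fewer bits. Both introduce error, which is consistent with the abstract's observation that accuracy drops after compression and recovers with fine-tuning.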

(This article belongs to the Special Issue Ambient Intelligence Methods and Applications)

21 pages, 3599 KB

Open Access Article

Using Deep Learning to Identify Deepfakes Created Using Generative Adversarial Networks

by Jhanvi Jheelan and Sameerchand Pudaruth

Computers 2025, 14(2), 60; https://doi.org/10.3390/computers14020060 - 10 Feb 2025

Cited by 10 | Viewed by 6724

Abstract

Generative adversarial networks (GANs) have revolutionised various fields by creating highly realistic images, videos, and audio, thus enhancing applications such as video game development and data augmentation. However, this technology has also given rise to deepfakes, which pose serious challenges due to their potential to create deceptive content. Thousands of media reports have informed us of such occurrences, highlighting the urgent need for reliable detection methods. This study addresses the issue by developing a deep learning (DL) model capable of distinguishing between real and fake face images generated by StyleGAN. Using a subset of the 140K real and fake face dataset, we explored five different models: a custom CNN, ResNet50, DenseNet121, MobileNet, and InceptionV3. We leveraged the pre-trained models to utilise their robust feature extraction and computational efficiency, which are essential for distinguishing between real and fake features. Through extensive experimentation with various dataset sizes, preprocessing techniques, and split ratios, we identified the optimal ones. The 20k_gan_8_1_1 dataset produced the best results, with MobileNet achieving a test accuracy of 98.5%, followed by InceptionV3 at 98.0%, DenseNet121 at 97.3%, ResNet50 at 96.1%, and the custom CNN at 86.2%. All of these models were trained on only 16,000 images and validated and tested on 2000 images each. The custom CNN model was built with a simpler architecture of two convolutional layers and, hence, lagged in accuracy due to its limited feature extraction capabilities compared with deeper networks. This research work also included the development of a user-friendly web interface that allows deepfake detection by uploading images. The web interface backend was developed using Flask, enabling real-time deepfake detection, allowing users to upload images for analysis and demonstrating a practical use for platforms in need of quick, user-friendly verification. 
This application demonstrates significant potential for practical applications, such as on social media platforms, where the model can help prevent the spread of fake content by flagging suspicious images for review. This study makes important contributions by comparing different deep learning models, including a custom CNN, to understand the balance between model complexity and accuracy in deepfake detection. It also identifies the best dataset setup that improves detection while keeping computational costs low. Additionally, it introduces a user-friendly web tool that allows real-time deepfake detection, making the research useful for social media moderation, security, and content verification. Nevertheless, identifying specific features of GAN-generated deepfakes remains challenging due to their high realism. Future work will aim to expand the dataset by using all 140,000 images, refine the custom CNN model to increase its accuracy, and incorporate more advanced techniques, such as Vision Transformers and diffusion models. The outcomes of this study contribute to the ongoing efforts to counteract the negative impacts of GAN-generated images.
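
The dataset name 20k_gan_8_1_1 and the reported counts (16,000 training images, 2000 each for validation and testing) are consistent with an 8:1:1 split of 20,000 images. A minimal sketch of such a split is shown below; the function name, the seed, and the use of a random shuffle are assumptions, since the abstract does not describe how the split was drawn.

```python
import numpy as np

def split_indices(n_samples, ratios=(8, 1, 1), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]                 # remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_indices(20_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Keeping the three index sets disjoint is what makes the reported test accuracy a fair estimate, because no test image is seen during training or model selection.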

16 pages, 603 KB

Open Access Article

Comprehensive Evaluation of Deepfake Detection Models: Accuracy, Generalization, and Resilience to Adversarial Attacks

by Maryam Abbasi, Paulo Váz, José Silva and Pedro Martins

Appl. Sci. 2025, 15(3), 1225; https://doi.org/10.3390/app15031225 - 25 Jan 2025

Cited by 17 | Viewed by 18629

Abstract

The rise of deepfakes (synthetic media generated using artificial intelligence) threatens digital content authenticity, facilitating misinformation and manipulation. Deepfakes can depict real or entirely fictitious individuals, leveraging state-of-the-art techniques such as generative adversarial networks (GANs) and emerging diffusion-based models. Existing detection methods face challenges with generalization across datasets and vulnerability to adversarial attacks. This study focuses on subsets of frames extracted from the DeepFake Detection Challenge (DFDC) and FaceForensics++ videos to evaluate three convolutional neural network architectures (XCeption, ResNet, and VGG16) for deepfake detection. Performance metrics include accuracy, precision, F1-score, AUC-ROC, and Matthews Correlation Coefficient (MCC), combined with an assessment of resilience to adversarial perturbations via the Fast Gradient Sign Method (FGSM). Among the tested models, XCeption achieves the highest accuracy (89.2% on DFDC), strong generalization, and real-time suitability, while VGG16 excels in precision and ResNet provides faster inference. However, all models exhibit reduced performance under adversarial conditions, underscoring the need for enhanced resilience. These findings indicate that robust detection systems must consider advanced generative approaches, adversarial defenses, and cross-dataset adaptation to effectively counter evolving deepfake threats.
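Two of the evaluation ingredients named in this abstract, MCC and FGSM, can be illustrated on a toy problem. The sketch below is not the authors' code. It assumes a two-feature logistic-regression "detector" in place of the CNNs studied, and the weights, input, and `eps` value are made up for illustration. FGSM moves the input by `eps` in the direction of the sign of the loss gradient. For a logistic model with binary cross-entropy loss, that gradient with respect to the input is `(p - y) * w`.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of the positive class under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx)."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]  # gradient of BCE loss w.r.t. x
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (label 1 = "fake").
w, b = [2.0, -1.0], 0.0
x, y, eps = [0.5, 0.2], 1, 0.1
x_adv = fgsm(w, b, x, y, eps)

# The perturbation lowers the model's confidence in the true label.
print(predict(w, b, x), predict(w, b, x_adv))
```

Each coordinate moves by exactly `eps`, so the attack stays within a bounded change to the input while pushing the prediction away from the true label. The abstract reports the same kind of degradation for all three architectures.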

21 pages, 5152 KB

Open Access Article

GAGAN: Enhancing Image Generation Through Hybrid Optimization of Genetic Algorithms and Deep Convolutional Generative Adversarial Networks

by Despoina Konstantopoulou, Paraskevi Zacharia, Michail Papoutsidakis, Helen C. Leligou and Charalampos Patrikakis

Algorithms 2024, 17(12), 584; https://doi.org/10.3390/a17120584 - 19 Dec 2024

Cited by 8 | Viewed by 4111

Abstract

Generative Adversarial Networks (GANs) are highly effective for generating realistic images, yet their training can be unstable due to challenges such as mode collapse and oscillatory convergence. In this paper, we propose a novel hybrid optimization method that integrates Genetic Algorithms (GAs) to improve the training process of Deep Convolutional GANs (DCGANs). Specifically, GAs are used to evolve the discriminator’s weights, complementing the gradient-based learning typically employed in GANs. The proposed GAGAN model is trained on the CelebA dataset, using 2000 images, to generate 128 × 128 images, with the generator learning to produce realistic faces from random latent vectors. The discriminator, which classifies images as real or fake, is optimized not only through standard backpropagation, but also through a GA framework that evolves its weights via crossover, mutation, and selection processes. This hybrid method aims to enhance convergence stability and boost image quality by balancing local search from gradient-based methods with the global search capabilities of GAs. Experiments show that the proposed approach reduces generator loss and improves image fidelity, demonstrating that evolutionary algorithms can effectively complement deep learning techniques. This work opens new avenues for optimizing GAN training and enhancing performance in generative models.
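The selection, crossover, and mutation loop mentioned in this abstract can be sketched on a toy objective. The code below is not the GAGAN implementation. It evolves a short weight vector toward a made-up target rather than a discriminator, it omits the backpropagation half of the hybrid scheme, and the function name `evolve`, the uniform crossover, the Gaussian mutation, and all hyperparameters are illustrative assumptions.

```python
import random


def evolve(population, fitness, generations=50,
           mutation_rate=0.1, mutation_scale=0.1, rng=None):
    """Evolve weight vectors with selection, crossover, and mutation."""
    rng = rng or random.Random(0)
    pop = [list(ind) for ind in population]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged as parents (elitism),
        # so the best individual never gets worse between generations.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: max(2, len(pop) // 2)]
        children = []
        while len(parents) + len(children) < len(pop):
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from either parent.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation applied independently per weight.
            child = [g + rng.gauss(0.0, mutation_scale)
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


# Toy stand-in for "discriminator quality": closeness to a target vector.
target = [0.5, -1.0, 2.0]


def fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


seed_rng = random.Random(42)
initial = [[seed_rng.uniform(-3, 3) for _ in target] for _ in range(20)]
best = evolve(initial, fitness, rng=random.Random(1))
print(fitness(max(initial, key=fitness)), fitness(best))
```

In a hybrid setting like the one described, a loop of this kind would presumably be interleaved with gradient updates, with fitness measured by the discriminator's loss on a batch. The abstract does not give that schedule, so it is left out here.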

(This article belongs to the Special Issue Algorithms for Image Processing and Machine Vision)

14 pages, 543 KB

Open Access Article

CSTAN: A Deepfake Detection Network with CST Attention for Superior Generalization

by Rui Yang, Kang You, Cheng Pang, Xiaonan Luo and Rushi Lan

Sensors 2024, 24(22), 7101; https://doi.org/10.3390/s24227101 - 5 Nov 2024

Cited by 4 | Viewed by 3048

Abstract

With the advancement of deepfake forgery technology, highly realistic fake faces have posed serious security risks to sensor-based facial recognition systems. Recent deepfake detection models mainly use binary classification models based on deep learning. Despite achieving high detection accuracy in intra-dataset evaluation, these models generalize poorly in cross-dataset evaluation. We propose a deepfake detection model named Channel-Spatial-Triplet Attention Network (CSTAN), which focuses on the difference between real and fake features, thereby enhancing the generalization of the detection model. To enhance the feature-learning ability of the model for image forgery regions, we have designed the Channel-Spatial-Triplet (CST) attention mechanism, which extracts subtle local information by capturing feature channels and the spatial correlation of three different scales. Additionally, we propose a novel feature extraction method, OD-ResNet-34, by embedding ODConv into the feature extraction network to enhance its dynamic adaptability to data features. Trained on the FF++ dataset and tested on the Celeb-DF-v1 and Celeb-DF-v2 datasets, our model shows stronger cross-dataset generalization than similar models.

(This article belongs to the Special Issue Image Processing and Analysis for Object Detection: 2nd Edition)

Search Results (56)
