Proceeding Paper

A Survey on Deep-Learning-Based Techniques for Detecting AI-Generated Synthetic Images †

by Staycy Guevara, Ana Lucila Sandoval Orozco and Luis Javier García Villalba *
Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), Calle Profesor José García Santesmases, 9, Ciudad Universitaria, 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
Presented at the First Summer School on Artificial Intelligence in Cybersecurity, Cancun, Mexico, 3–7 November 2025.
Eng. Proc. 2026, 123(1), 32; https://doi.org/10.3390/engproc2026123032
Published: 11 February 2026
(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)

Abstract

Detecting synthetic images has become increasingly challenging due to the high realism achieved by current generation models. Generative adversarial networks (GANs) and diffusion models can produce images that mimic human features and textures with remarkable accuracy, raising concerns about the spread of sensitive content, such as AI-generated child sexual abuse material (CSAM). To address this issue, deep-learning-based detection techniques can accurately distinguish AI-generated images from real ones, offering robust generalization capabilities. This review provides an in-depth examination of AI-generated synthetic image detection techniques, highlighting strengths, limitations, and emerging trends, with a focus on applications in detecting manipulated content and identifying areas for future research and development.

1. Introduction

Artificial intelligence (AI) algorithms can now create highly realistic synthetic images, making it difficult for humans to distinguish them from real ones. Diffusion-based algorithms are particularly effective in producing high-quality synthetic images [1]. However, this has raised concerns about image manipulation, fake news, and explicit content [2]. The rise in synthetic image production poses a challenge for ensuring online content authenticity, highlighting the need for AI-generated synthetic image detection.

2. Related Work

The increasing prevalence of fake images has driven the development of sophisticated approaches to detect AI-generated content. Distinguishing real images from synthetic ones has become a complex task that calls for innovative methods, including analysis of an image’s frequency patterns, models that combine visual and textual information, and examination of image properties such as edges, colors, and textures, with machine learning at the core of most detection pipelines. A recent approach, presented in “High-resolution network-based multi-feature fusion for generalized forgery detection” [3], proposes a system based on HRNet, a high-resolution network that fuses information from three domains: gradient, frequency, and color (RGB). This multi-domain fusion improves the accuracy of synthetic-image detection: HRNet captures edge and texture information in the gradient domain, checks for periodic patterns and inconsistencies in the frequency domain, and examines color data in the RGB domain, making it a robust solution for forgery detection. The study “Deepfake Detection using Deep Learning: A Two-Pronged Approach with CNNs and Autoencoders” [4] takes a two-part approach: a convolutional neural network directly classifies images as real or fake, while an autoencoder reconstructs each image so that reconstruction anomalies or inconsistencies can reveal manipulation. Combining the two strategies improves detection accuracy. With the growth of increasingly sophisticated generative models, however, CNN-based architectures have shown limitations in capturing long-range relationships within images.
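The multi-domain fusion idea can be illustrated with a minimal sketch. This is a toy illustration of gradient-, frequency-, and color-domain features computed with NumPy, not the HRNet implementation from [3]; the function name and feature choices are ours.

```python
import numpy as np

def multi_domain_features(img: np.ndarray) -> dict:
    """Toy features from the three domains HRNet-style detectors fuse:
    gradient (edges/textures), frequency, and RGB color.
    `img` is an H x W x 3 float array in [0, 1]."""
    gray = img.mean(axis=2)

    # Gradient domain: finite differences approximate edge/texture strength.
    gy, gx = np.gradient(gray)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)

    # Frequency domain: the log-magnitude spectrum exposes periodic
    # artifacts that some generators leave behind.
    spectrum = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))

    # RGB domain: per-channel means and standard deviations summarize color.
    color_stats = np.concatenate([img.mean(axis=(0, 1)), img.std(axis=(0, 1))])

    return {
        "gradient_energy": float(grad_mag.mean()),
        "spectrum_peak": float(spectrum.max()),
        "color_stats": color_stats,  # shape (6,): 3 means + 3 stds
    }

# A real detector fuses such features inside the network and feeds them to a
# classifier; here we only show that all three domains are computed.
rng = np.random.default_rng(0)
feats = multi_domain_features(rng.random((64, 64, 3)))
```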
To overcome this limitation, Vision Transformers (ViTs) have been explored as an alternative, processing images as sequences of patches to identify generation patterns more effectively. The study “Advanced Detection of AI-Generated Images Through Vision Transformers” [5] demonstrated the effectiveness of ViTs in detecting synthetic images, achieving an accuracy of 98.2% on a dataset of 30,000 real and synthetic images and outperforming methods such as ResNet50 and EfficientNet. Unlike approaches that rely solely on visual models, “CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection” [6] explores vision-language models, specifically CLIP, for universal deepfake detection. CLIP combines visual and textual information, enabling it to understand image context and meaning, leverage textual descriptions and clues, and generalize better across diverse datasets; it achieved an accuracy of 95.5%, outperforming traditional visual-only approaches. Frequency-domain analysis has also proven effective for detecting synthetic images. Along these lines, “MaskSim: Detection of Synthetic Images by Masked Spectrum Similarity Analysis” [7] introduces a semi-white-box approach based on the anomalous spectral patterns present in AI-generated images. MaskSim compares an image’s masked spectrum with a reference pattern to highlight inconsistencies left by synthetic generation methods, and its results are highly explainable. Combination methods built on frequency analysis also appear in the literature.
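The spectrum-comparison idea behind masked spectrum similarity can be sketched as follows. The function name, the cosine-similarity scoring, and the hand-built mask are our own illustrative assumptions, not the published MaskSim pipeline from [7], which learns its masks and reference patterns.

```python
import numpy as np

def masked_spectrum_similarity(img: np.ndarray, ref_spec: np.ndarray,
                               mask: np.ndarray) -> float:
    """Cosine similarity between the masked log-magnitude spectrum of a
    test image and a reference spectral pattern. A high score against a
    generator's reference pattern would flag the image as likely synthetic."""
    spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(img))))
    a, b = spec[mask], ref_spec[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(1)
test_img = rng.random((32, 32))
# Reference pattern: here just the spectrum of another random image; in
# practice it would be estimated from many images of a single generator.
ref_spec = np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(rng.random((32, 32))))))
mask = np.ones((32, 32), dtype=bool)
mask[12:20, 12:20] = False  # example: ignore low frequencies near the center
score = masked_spectrum_similarity(test_img, ref_spec, mask)
```

Because the log-magnitude spectra are nonnegative, the score lands in [0, 1]; a detector would threshold it per generator.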
The study “Development of a Deepfake Detection Method” [8] proposes detecting deepfakes by combining frequency analysis, which identifies patterns common in manipulated images, with color-space reduction, which simplifies the image and makes manipulations more apparent. Hybrid methods combining different techniques are also emerging, such as the framework proposed in “A Novel Framework Based on a Hybrid Vision Transformer and Deep Neural Network for Deepfake Detection” [9]. This approach combines Vision Transformers (ViTs) with convolutional autoencoders (CAEs) in two models: (1) ViT-CAE integration, which reconstructs and classifies images to identify subtle differences between authentic and synthetic content; and (2) CAE-based feature extraction, where CAE encodings are analyzed with traditional machine learning algorithms such as support vector machines (SVMs) and artificial neural networks (ANNs). Similarly, “UAM-Net: Robust Deepfake Detection Through Hybrid Attention Into Scalable Convolutional Network” [10] proposes UAM-Net, a scalable convolutional neural network combining spatial and channel attention mechanisms; it achieved an accuracy and F1 score of 98.07% across multiple datasets, including DFDC-P, FaceForensics++, and CelebA-DF. Recently, vision-language models (VLMs) have shown great potential in deepfake detection: FLODA (Florence-2 Optimized for Deepfake Assessment) [11] reframes detection as a visual question answering problem, enabling a more contextual analysis of AI-generated images.
FLODA surpasses existing models with an average accuracy of 97.14% across 16 datasets and achieves 100% accuracy under adversarial attacks, backdoors, and data poisoning.
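The ViT-based detectors discussed above share one preprocessing step: the image is split into a sequence of flattened patches before any transformer layers run. A minimal sketch of that step (the patch size of 8 is an arbitrary choice here; real ViT detectors also apply a learned linear projection and position embeddings afterwards):

```python
import numpy as np

def patchify(img: np.ndarray, patch: int = 8) -> np.ndarray:
    """Split an H x W x C image into the flattened patch sequence a Vision
    Transformer consumes. H and W must be divisible by `patch`."""
    h, w, c = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch, c)
               .transpose(0, 2, 1, 3, 4)          # group pixels by patch
               .reshape(-1, patch * patch * c))   # (num_patches, patch*patch*C)

# A 32 x 32 RGB image yields 16 patches, each flattened to 8*8*3 = 192 values.
seq = patchify(np.zeros((32, 32, 3)), patch=8)
```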
As shown in Figure 1, detection techniques are grouped by the type of generative model: diffusion-based models, GAN-based models, and other models.
Table 1 compares the evaluated techniques on five performance metrics: accuracy; AUC, which plots the true positive rate (recall) against the false positive rate (FPR); F1 score, the harmonic mean of precision and recall; recall (sensitivity, the true positive rate); and precision, the fraction of positive predictions that were actually correct.
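These metrics follow directly from confusion-matrix counts. A small self-contained helper illustrates them (our own sketch, not tied to any surveyed implementation; AUC is omitted because it needs continuous scores rather than hard labels):

```python
def detection_metrics(y_true, y_pred):
    """Compute the label-based metrics of Table 1 from binary labels
    (1 = synthetic, 0 = real)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Example: 6 images, the detector flags 4 of them as synthetic.
m = detection_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])
# accuracy = 4/6, precision = 2/3, recall = 2/3, f1 = 2/3
```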

3. Conclusions

A more comprehensive study is needed of techniques that increase diversity in training datasets and address data selection and sensitivity to image distortion. This is particularly important now that generative artificial intelligence has achieved such a high degree of visual realism that generated images are often virtually indistinguishable from real ones, posing a new challenge for distinguishing authentic from artificially generated content and raising questions about the authenticity, integrity, and credibility of digitally circulated images. Looking ahead, applying knowledge of synthetic image detection could significantly improve law enforcement efforts against criminal activities involving manipulated images, such as CSAM. By leveraging this expertise, cybersecurity professionals can better combat these crimes and strengthen defenses against emerging threats.

Author Contributions

Conceptualization, S.G., A.L.S.O. and L.J.G.V.; methodology, S.G., A.L.S.O. and L.J.G.V.; validation, S.G., A.L.S.O. and L.J.G.V.; investigation, S.G., A.L.S.O. and L.J.G.V.; writing—original draft preparation, S.G., A.L.S.O. and L.J.G.V.; writing—review and editing, S.G., A.L.S.O. and L.J.G.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the European Commission under the Horizon 2020 research and innovation programme, as part of the project HEROES (http://heroes-fct.eu, Grant Agreement no. 101021801) and of the project ALUNA (https://aluna-isf.eu/, Grant Agreement no. 101084929). This work was also carried out with funding from the Recovery, Transformation and Resilience Plan, financed by the European Union (Next Generation EU), through the Chair “Cybersecurity for Innovation and Digital Protection” INCIBE-UCM. In addition, this work has been supported by the Comunidad Autónoma de Madrid, CIRMA-CM Project (TEC-2024/COM-404).

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Acknowledgments

The content of this article does not reflect the official opinion of the European Union. Responsibility for the information and views expressed therein lies entirely with the authors. S.G. thanks the National Secretariat of Science, Technology and Innovation (SENACYT) of Panama for the scholarship that supported her PhD studies.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bammey, Q. Synthbuster: Towards Detection of Diffusion Model Generated Images. IEEE Open J. Signal Process. 2024, 5, 1–9. [Google Scholar] [CrossRef]
  2. Corvi, R.; Cozzolino, D.; Zingarini, G.; Poggi, G.; Nagano, K.; Verdoliva, L. On The Detection of Synthetic Images Generated by Diffusion Models. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
  3. Liu, R.; Zhang, S.; Xu, Y.; Xu, W.; He, X. High-resolution network-based multi-feature fusion for generalized forgery detection. Multimed. Syst. 2025, 31, 35. [Google Scholar] [CrossRef]
  4. Madhumitha, A.N.K.K.; Shet, M.; Alekhya, P. Deepfake Detection using Deep Learning: A Two-Pronged Approach with CNNs and Autoencoders. In Proceedings of the 2024 2nd International Conference on Recent Advances in Information Technology for Sustainable Development (ICRAIS), Manipal, India, 6–7 November 2024; pp. 24–29. [Google Scholar] [CrossRef]
  5. Lamichhane, D. Advanced Detection of AI-Generated Images Through Vision Transformers. IEEE Access 2025, 13, 3644–3652. [Google Scholar] [CrossRef]
  6. Khan, S.A.; Dang-Nguyen, D. CLIPping the Deception: Adapting Vision-Language Models for Universal Deepfake Detection. In Proceedings of the 2024 International Conference on Multimedia Retrieval (ICMR ’24); Association for Computing Machinery: New York, NY, USA, 2024; pp. 1006–1015. [Google Scholar] [CrossRef]
  7. Li, Y.; Bammey, Q.; Gardella, M.; Nikoukhah, T.; Morel, J.-M.; Colom, M.; Gioi, R.G.V. MaskSim: Detection of synthetic images by masked spectrum similarity analysis. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 17–18 June 2024; pp. 3855–3865. [Google Scholar] [CrossRef]
  8. Rogovoi, V.; Korzhuk, V.M.; Kokorina, O.A. Development of a Deepfake Detection Method: Application of Frequency Analysis and Reduction of the Image Color Space to Improve Classification Accuracy. In Proceedings of the 2024 V International Conference on Neural Networks and Neurotechnologies (NeuroNT), Saint Petersburg, Russia, 20 June 2024; pp. 36–39. [Google Scholar] [CrossRef]
  9. Shahin, M.; Deriche, M. A Novel Framework based on a Hybrid Vision Transformer and Deep Neural Network for Deepfake Detection. In Proceedings of the 2024 21st International Multi-Conference on Systems, Signals & Devices (SSD), Erbil, Iraq, 20 June 2024; pp. 329–333. [Google Scholar] [CrossRef]
  10. Sudarshana, K.; Vamsidhar, Y. UAM-Net: Robust Deepfake Detection Through Hybrid Attention Into Scalable Convolutional Network. Expert Syst. 2025, 42, e70009. [Google Scholar] [CrossRef]
  11. Youngho, B.; Seunghyeon, P.; Gunhui, H.; Alexander, O. FLODA: Harnessing Vision-Language Models for Deepfake Assessment. In Proceedings of the 2025 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 11–14 January 2025. [Google Scholar] [CrossRef]
Figure 1. Detection techniques grouped by generative model: detection techniques for diffusion models, for GAN models, and other models.
Table 1. Metrics for evaluating image detection techniques.
Technique         Accuracy (%)   AUC (%)   F1 (%)   Recall (%)   Precision (%)
CNN-based         91.4           88.7      90.2     89.6         91.0
HRNet             99.8           99.7      99.7     99.5         99.8
Transformer       97.6           97.4      97.3     96.9         97.1
Frequency         96.8           98.1      97.0     96.5         96.9
Vision Language   98.0           98.5      98.2     98.3         97.8
Hybrid            97.2           94.6      95.4     94.9         95.0