DIFC-Net: Diffusion-Intrinsic Feature Capture for AI-Generated Image Detection
Abstract
1. Introduction
- We propose DIFC-Net, a novel diffusion-intrinsic detection framework that identifies AI-generated images by analyzing their reconstruction behavior during diffusion inversion, rather than relying on static pixel-level artifacts.
- We introduce a unified multi-path learning architecture that integrates spatial residual discrepancies and latent diffusion trajectory evolution, enabling robust detection of generative inconsistencies across diverse models.
- We demonstrate strong cross-model generalization and interpretability, achieving the highest average AUC among the compared detectors (90.29 ± 1.10 over ten test generators, including unseen diffusion models) and providing transparent forensic reasoning through residual visualization and attention heatmaps.
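
The inversion-and-reconstruction idea behind the first two contributions can be illustrated with a minimal NumPy sketch of deterministic DDIM-style updates [7]. This is a sketch under stated assumptions, not the authors' implementation: the linear beta schedule, the `toy_eps` noise predictor, and every function name below are hypothetical stand-ins, and a real detector would call a pretrained diffusion model's noise network instead.

```python
import numpy as np

def alpha_bars(num_train_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def toy_eps(x, t):
    """Hypothetical noise predictor standing in for a pretrained denoising network."""
    return np.tanh(x) * (t / 1000.0)

def ddim_step(x, t_from, t_to, abar, eps_fn):
    """One deterministic DDIM update from timestep t_from to t_to (either direction)."""
    a_from, a_to = abar[t_from], abar[t_to]
    eps = eps_fn(x, t_from)
    # Predict the clean sample, then re-noise it to the target timestep.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def invert_and_reconstruct(x0, steps=50, eps_fn=toy_eps):
    """Invert x0 towards noise, denoise back, and return the reconstruction residual."""
    abar = alpha_bars()
    ts = np.linspace(0, len(abar) - 1, steps + 1).astype(int)

    # Inversion: walk from low to high timesteps and record every intermediate state.
    x = x0
    trajectory = [x0]
    for t_from, t_to in zip(ts[:-1], ts[1:]):
        x = ddim_step(x, t_from, t_to, abar, eps_fn)
        trajectory.append(x)
    x_T = x

    # Reconstruction: walk the same timesteps in reverse order.
    rev = ts[::-1]
    for t_from, t_to in zip(rev[:-1], rev[1:]):
        x = ddim_step(x, t_from, t_to, abar, eps_fn)
    x_rec = x

    residual = np.abs(x0 - x_rec)  # spatial discrepancy between input and reconstruction
    return x_T, x_rec, residual, np.stack(trajectory)
```

Calling `invert_and_reconstruct(img, steps=50)` on an array `img` returns the inverted state, the reconstruction, the absolute residual, and the stacked trajectory of `steps + 1` intermediate states. The residual and the trajectory are the kinds of signals the spatial and trajectory paths are described as consuming; how they are encoded and fused in DIFC-Net is not shown here.
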
2. Related Work
2.1. Low-Level Artifacts and Frequency Domain Signals
2.2. End-to-End Detection Based on Deep Features
2.3. Detection and Reconstruction Consistency of Diffusion Model Specificity
2.4. Fingerprint, Attribution and Traceability
2.5. Datasets, Robustness and Calibration
3. Methods
3.1. Reconstruction
3.1.1. Forward Diffusion
3.1.2. Reverse Denoising Process
3.1.3. Triplet and Latent Trajectory
3.2. Visual Comparator Path
3.3. Trajectory Feature Model
3.4. Active Capture Fusion
3.5. Binary Decision Layer
4. Results
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Baselines
4.2. Results and Analysis
4.2.1. Comparisons of Detection
4.2.2. Analysis of Residual and Attention Maps
4.2.3. Discussion
4.2.4. Computational Complexity and Runtime Analysis
4.3. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Zhu, M.; Chen, H.; Yan, Q.; Huang, X.; Lin, G.; Li, W.; Tu, Z.; Hu, H.; Hu, J.; Wang, Y. GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image. arXiv 2023, arXiv:2306.08571.
2. Marra, F.; Gragnaniello, D.; Verdoliva, L.; Poggi, G. Do GANs leave artificial fingerprints? arXiv 2018, arXiv:1812.11842.
3. Nataraj, L.; Mohammed, T.M.; Chandrasekaran, S.; Flenner, A.; Bappy, J.H.; Roy-Chowdhury, A.K.; Manjunath, B.S. Detecting GAN generated Fake Images using Co-occurrence Matrices. arXiv 2019, arXiv:1903.06836.
4. Frank, J.; Eisenhofer, T.; Schönherr, L.; Fischer, A.; Kolossa, D.; Holz, T. Leveraging Frequency Analysis for Deep Fake Image Recognition. arXiv 2020, arXiv:2003.08685.
5. Durall, R.; Keuper, M.; Keuper, J. Watch your Up-Convolution: CNN Based Generative Deep Neural Networks are Failing to Reproduce Spectral Distributions. arXiv 2020, arXiv:2003.01826.
6. Wang, S.Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot… for now. arXiv 2020, arXiv:1912.11035.
7. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. arXiv 2022, arXiv:2010.02502.
8. Wang, Z.; Bao, J.; Zhou, W.; Wang, W.; Hu, H.; Chen, H.; Li, H. DIRE for Diffusion-Generated Image Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2023; pp. 22445–22455.
9. Chen, B.; Zeng, J.; Yang, J.; Yang, R. DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In Proceedings of the 41st International Conference on Machine Learning; ICML’24; JMLR.org: Norfolk, MA, USA, 2024.
10. Zhang, Y.; Xu, X. Diffusion Noise Feature: Accurate and Fast Generated Image Detection. arXiv 2025, arXiv:2312.02625.
11. Xu, Q.; Jiang, X.; Sun, T.; Wang, H.; Meng, L.; Yan, H. Detecting Artificial Intelligence-Generated images via deep trace representations and interactive feature fusion. Inf. Fusion 2024, 112, 102578.
12. Rössler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to Detect Manipulated Facial Images. arXiv 2019, arXiv:1901.08971.
13. Guarnera, L.; Giudice, O.; Battiato, S. DeepFake Detection by Analyzing Convolutional Traces. arXiv 2020, arXiv:2004.10448.
14. Chen, Y.; Kang, X.; Wang, Z.J.; Zhang, Q. Densely Connected Convolutional Neural Network for Multi-purpose Image Forensics under Anti-forensic Attacks. In IH&MMSec ’18: 6th ACM Workshop on Information Hiding and Multimedia Security; Association for Computing Machinery: New York, NY, USA, 2018; pp. 91–96.
15. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models From Natural Language Supervision. arXiv 2021, arXiv:2103.00020.
16. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked Autoencoders Are Scalable Vision Learners. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2022; pp. 15979–15988.
17. Li, J.; Li, D.; Savarese, S.; Hoi, S. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv 2023, arXiv:2301.12597.
18. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685.
19. Luo, Y.; Du, J.; Yan, K.; Ding, S. LaRE2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection. arXiv 2025, arXiv:2403.17465.
20. Wahab, H.; Ugail, H.; Jaleel, L. Ensemble-Based Deepfake Detection using State-of-the-Art Models with Robust Cross-Dataset Generalisation. arXiv 2025, arXiv:2507.05996.
21. Lin, Y.; Mao, T.; Chen, Z.; Lu, H.; Chen, Z.; Kang, Y. MFF-Net: A multi-view feature fusion network for generalized forgery image detection. Neurocomputing 2025, 640, 130351.
22. Qin, Z.; Guo, X.; Li, J.; Chen, Y. Domain generalization for image classification based on simplified self ensemble learning. PLoS ONE 2025, 20, e0320300.
23. Wu, Y.; Zhang, F.; Shi, T.; Yin, R.; Wang, Z.; Gan, Z.; Wang, X.; Lv, C.; Zheng, X.; Huang, X. Explainable Synthetic Image Detection through Diffusion Timestep Ensembling. arXiv 2025, arXiv:2503.06201.
24. Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. arXiv 2021, arXiv:2105.05233.
25. Ma, R.; Duan, J.; Kong, F.; Shi, X.; Xu, K. Exposing the Fake: Effective Diffusion-Generated Images Detection. arXiv 2023, arXiv:2307.06272.
26. Chu, B.; Xu, X.; Wang, X.; Zhang, Y.; You, W.; Zhou, L. FIRE: Robust Detection of Diffusion-Generated Images via Frequency-Guided Reconstruction Error. arXiv 2025, arXiv:2412.07140.
27. Cazenavette, G.; Sud, A.; Leung, T.; Usman, B. FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion. arXiv 2024, arXiv:2406.08603.
28. Yang, D.; Huang, Y.; Guo, Q.; Juefei-Xu, F.; Jia, X.; Wang, R.; Pu, G.; Liu, Y. Text Modality Oriented Image Feature Extraction for Detecting Diffusion-based DeepFake. arXiv 2024, arXiv:2405.18071.
29. Yu, Z.; Ni, J.; Lin, Y.; Deng, H.; Li, B. Diffforensics: Leveraging diffusion prior to image forgery detection and localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2024; pp. 12765–12774.
30. Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Deep Image Prior. Int. J. Comput. Vis. 2020, 128, 1867–1888.
31. Tasnim, N.; Uddin, K.; Malik, K. GReX-Bench: Benchmarking Generalization, Robustness, and Explainability in AI-Generated Image Detection. Res. Sq. 2026.
32. Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in Frequency: Face Forgery Detection by Mining Frequency-aware Clues. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2020.
33. Liu, Z.; Qi, X.; Jia, J.; Torr, P.H.S. Global Texture Enhancement for Fake Face Detection in the Wild. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2020; pp. 8057–8066.
34. Ojha, U.; Li, Y.; Lee, Y.J. Towards Universal Fake Image Detectors that Generalize Across Generative Models. arXiv 2024, arXiv:2302.10174.
35. Karageorgiou, D.; Papadopoulos, S.; Kompatsiaris, I.; Gavves, E. Any-Resolution AI-Generated Image Detection by Spectral Learning. arXiv 2025, arXiv:2411.19417.
36. Corvi, R.; Cozzolino, D.; Zingarini, G.; Poggi, G.; Nagano, K.; Verdoliva, L. On The Detection of Synthetic Images Generated by Diffusion Models. In ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); IEEE: Piscataway, NJ, USA, 2022; pp. 1–5.
37. Sha, Z.; Li, Z.; Yu, N.; Zhang, Y. DE-FAKE: Detection and Attribution of Fake Images Generated by Text-to-Image Generation Models. In CCS ’23: 2023 ACM SIGSAC Conference on Computer and Communications Security; Association for Computing Machinery: New York, NY, USA, 2023; pp. 3418–3432.
38. Koutlis, C.; Papadopoulos, S. Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection. arXiv 2024, arXiv:2402.19091.
39. Ju, Y.; Jia, S.; Ke, L.; Xue, H.; Nagano, K.; Lyu, S. Fusing Global and Local Features for Generalized AI-Synthesized Image Detection. In 2022 IEEE International Conference on Image Processing (ICIP); IEEE: Piscataway, NJ, USA, 2022; pp. 3465–3469.
40. Tan, C.; Zhao, Y.; Wei, S.; Gu, G.; Liu, P.; Wei, Y. Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2024; pp. 28130–28139.
41. Tan, C.; Tao, R.; Liu, H.; Gu, G.; Wu, B.; Zhao, Y.; Wei, Y. C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Washington, DC, USA, 2024.
42. Yu, N.; Davis, L.; Fritz, M. Attributing Fake Images to GANs: Learning and Analyzing GAN Fingerprints. arXiv 2019, arXiv:1811.08180.
43. Lu, Y.; Liu, J.; Zhang, R. GANFR: GAN fingerprint removal network for image anti-forensics. Knowl.-Based Syst. 2025, 327, 114134.
44. Cozzolino, D.; Verdoliva, L. Noiseprint: A CNN-Based Camera Model Fingerprint. IEEE Trans. Inf. Forensics Secur. 2020, 15, 144–159.
45. Wen, Y.; Kirchenbauer, J.; Geiping, J.; Goldstein, T. Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust. arXiv 2023, arXiv:2305.20030.
46. Deng, H.; Jiang, Y.; Yu, G.; Wang, Q.; Wang, X.; Ma, B.; Ni, W.; Liu, R.P. PoLO: Proof-of-Learning and Proof-of-Ownership at Once with Chained Watermarking. arXiv 2025, arXiv:2505.12296.
47. Bammey, Q. Synthbuster: Towards Detection of Diffusion Model Generated Images. IEEE Open J. Signal Process. 2024, 5, 1–9.
48. Cai, H.; Liu, C.; Shen, S.; Qu, Y.; Gui, P. Robust AI-Synthesized Image Detection via Multi-feature Frequency-aware Learning. arXiv 2025, arXiv:2504.02879.
49. Guo, C.; Pleiss, G.; Sun, Y.; Weinberger, K.Q. On Calibration of Modern Neural Networks. arXiv 2017, arXiv:1706.04599.
50. Minderer, M.; Djolonga, J.; Romijnders, R.; Hubis, F.; Zhai, X.; Houlsby, N.; Tran, D.; Lucic, M. Revisiting the Calibration of Modern Neural Networks. arXiv 2021, arXiv:2106.07998.
51. Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. LAION-5B: An open large-scale dataset for training next generation image-text models. arXiv 2022, arXiv:2210.08402.
52. Roca, T.; Postiglione, M.; Gao, C.; Gortner, I.; Wojciak, Z.; Wang, P.; Alimardani, M.; Anlen, S.; White, K.; Lavista, J.; et al. Introducing the MNW Benchmark for AI Forensics. 2025. Available online: https://www.dau.mcaindia.in/blog/northwestern-university-introducing-the-mnw-benchmark-for-ai-forensics (accessed on 7 December 2025).






| Method | Category | Core Cue | Inv. | Traj. | Key Idea |
|---|---|---|---|---|---|
| DIRE [8] | Diffusion-spec. | Reconstruction error after inversion | Yes | No | Detects synthetic images through inversion-induced reconstruction inconsistency. |
| DRCT [9] | Diffusion-spec. | Contrastive learning on reconstructed views | Yes | No | Learns discriminative representations from reconstruction-based multi-view contrastive training. |
| DNF [10] | Diffusion-spec. | Noise/residual-based clues | Partial | No | Exploits diffusion noise features and residual cues for detection. |
| LaRE2 [19] | Diffusion-spec. | Latent reconstruction error | Yes | Limited | Measures latent-space reconstruction inconsistency after inversion. |
| SeDID [25] | Diffusion-spec. | Reconstruction/noise inconsistency | Yes | No | Uses reconstruction and noise inconsistency as diffusion-specific forensic evidence. |
| FIRE [26] | Diffusion-spec. | Frequency-guided reconstruction error | Yes | No | Combines reconstruction error with frequency-aware forensic cues. |
| FakeInversion [27] | Diffusion-spec. | Multi-view inversion features | Yes | No | Fuses original images, decoded noise, and reconstructions as inversion-based multi-view inputs. |
| TOFE [28] | Diffusion-spec. | Text-guided inversion features | Yes | Limited | Introduces text-conditioned inversion optimization for detection. |
| DIP [29] | Diffusion-spec. | Reconstruction-consistency fingerprints | Yes | No | Builds source fingerprints from reconstruction consistency of deep image priors. |
| CNNSpot [6] | General | CNN-learned synthetic artifacts | No | No | Learns generic synthetic-image artifacts with CNN-based classification. |
| F3Net [32] | General | Frequency-domain artifacts | No | No | Detects manipulations through frequency-domain anomaly modeling. |
| GramNet [33] | General | Texture/statistical correlations | No | No | Uses texture statistics and Gram-style feature correlations for detection. |
| UnivFD [34] | Foundation-model | Universal pretrained visual features | No | No | Leverages large-scale pretrained visual representations for universal fake detection. |
| SPAI [35] | General | Spatial/semantic artifact cues | No | No | Detects AI-generated images through appearance-space inconsistencies. |
| DMID [36] | General | Deep discriminative artifact cues | No | No | Uses deep discriminative forensic features for synthetic image detection. |
| DE-Fake [37] | General | Enhanced forensic deep features | No | No | Enhances deep forensic representations for AI-generated image detection. |
| RINE [38] | General | Robust image forensic features | No | No | Uses robust image forensic representations extracted from intermediate encoder features. |
| Fusing [39] | General | Global–local feature fusion | No | No | Combines global and local image features for synthetic image detection. |
| NPR [40] | General | Noise/pattern representations | No | No | Models generic noise-pattern representations as forensic evidence. |
| C2PClip [41] | Vision–language | CLIP-based contrastive features | No | No | Uses pretrained CLIP representations for contrastive AI-image detection. |
| DIFC-Net (Ours) | Diffusion-intrinsic | Triplet discrepancy + latent trajectory with adaptive fusion | Yes | Yes | Jointly models reconstruction-triplet discrepancy and latent diffusion trajectory with adaptive fusion. |

| Fake Dataset | Data size (Fake/Real) | Real Dataset |
|---|---|---|
| DALL·E3 | 500/500 | LAION |
| SDXL | 500/500 | |
| SDv1.5 | 500/500 | |
| ADM | 500/500 | |
| GLIDE | 500/500 | |
| Wukong | 500/500 | |
| Adobe | 350/350 | |
| Image3 | 250/250 | |
| Baidu | 250/250 | |
| BigGAN | 500/500 | |

| Method | DALL·E3 | SDXL | SDv1.5 | ADM | GLIDE | Wukong | Adobe | Image3 | Baidu | BigGAN | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CNNSpot | 63.59 ± 1.42 | 70.42 ± 1.67 | 76.08 ± 1.53 | 85.35 ± 1.63 | 56.20 ± 0.92 | 85.78 ± 1.64 | 45.48 ± 3.19 | 57.66 ± 1.19 | 47.63 ± 1.30 | 51.13 ± 1.12 | 63.93 ± 1.56 |
| F3Net | 57.94 ± 1.87 | 79.75 ± 1.89 | 67.97 ± 1.94 | 56.61 ± 2.19 | 49.44 ± 1.87 | 74.99 ± 2.09 | 73.87 ± 2.21 | 68.16 ± 2.28 | 65.17 ± 2.01 | 49.39 ± 1.86 | 64.33 ± 2.02 |
| GramNet | 51.64 ± 1.89 | 77.84 ± 2.08 | 84.26 ± 2.14 | 86.72 ± 1.64 | 78.53 ± 2.16 | 88.36 ± 1.71 | 55.56 ± 2.33 | 82.69 ± 2.18 | 70.35 ± 2.32 | 52.77 ± 2.12 | 72.87 ± 2.06 |
| UnivFD | 47.21 ± 1.64 | 81.08 ± 1.33 | 87.98 ± 1.71 | 77.39 ± 1.29 | 80.96 ± 1.40 | 90.92 ± 1.78 | 75.10 ± 1.81 | 84.11 ± 1.54 | 63.40 ± 1.32 | 61.55 ± 1.34 | 74.97 ± 1.52 |
| DIRE | 69.05 ± 2.03 | 62.63 ± 2.22 | 63.02 ± 2.29 | 49.43 ± 2.20 | 40.07 ± 2.36 | 55.64 ± 2.50 | 24.28 ± 2.58 | 44.17 ± 2.38 | 36.51 ± 2.74 | 44.85 ± 2.24 | 48.97 ± 2.35 |
| SPAI | 90.45 ± 2.06 | 93.71 ± 2.22 | 98.19 ± 1.29 | 85.50 ± 2.67 | 92.74 ± 2.16 | 97.08 ± 1.95 | 78.86 ± 2.27 | 90.00 ± 2.23 | 75.74 ± 2.25 | 84.83 ± 2.19 | 88.71 ± 2.13 |
| C2PClip | 75.12 ± 1.76 | 98.92 ± 0.66 | 99.27 ± 0.65 | 60.96 ± 1.61 | 71.90 ± 1.78 | 85.02 ± 1.61 | 49.17 ± 1.66 | 51.59 ± 1.65 | 24.61 ± 2.36 | 89.13 ± 1.48 | 70.57 ± 1.52 |
| DMID | 46.13 ± 1.67 | 98.21 ± 1.26 | 97.41 ± 1.73 | 67.00 ± 2.12 | 72.55 ± 1.68 | 95.37 ± 1.60 | 69.03 ± 1.64 | 55.58 ± 1.96 | 85.50 ± 1.87 | 67.40 ± 2.11 | 75.42 ± 1.76 |
| DE-Fake | 92.53 ± 1.67 | 60.60 ± 2.42 | 91.94 ± 1.80 | 73.10 ± 1.82 | 78.13 ± 1.61 | 85.61 ± 1.86 | 46.91 ± 2.10 | 81.01 ± 1.73 | 82.85 ± 1.62 | 72.47 ± 1.93 | 76.52 ± 1.86 |
| RINE | 41.92 ± 1.77 | 98.89 ± 1.00 | 94.94 ± 1.70 | 66.66 ± 2.35 | 96.30 ± 1.64 | 92.50 ± 1.67 | 60.72 ± 1.63 | 91.60 ± 1.62 | 84.37 ± 1.59 | 89.86 ± 1.61 | 81.78 ± 1.66 |
| Fusing | 24.97 ± 1.80 | 64.64 ± 1.63 | 59.65 ± 1.65 | 41.68 ± 1.82 | 60.87 ± 1.93 | 53.90 ± 2.04 | 34.96 ± 1.86 | 60.33 ± 1.89 | 45.61 ± 2.28 | 72.65 ± 1.64 | 51.93 ± 1.85 |
| NPR | 96.26 ± 1.47 | 74.65 ± 1.78 | 83.65 ± 1.96 | 64.74 ± 1.37 | 74.51 ± 1.92 | 90.26 ± 1.47 | 73.99 ± 1.58 | 36.15 ± 1.44 | 52.38 ± 1.56 | 74.51 ± 1.90 | 72.11 ± 1.65 |
| DIFC-Net (Ours) | 98.83 ± 0.51 | 98.82 ± 0.92 | 99.02 ± 0.61 | 85.99 ± 1.20 | 89.24 ± 1.24 | 95.41 ± 1.19 | 71.63 ± 1.29 | 95.41 ± 1.57 | 87.28 ± 1.30 | 81.28 ± 1.19 | 90.29 ± 1.10 |
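
As a quick arithmetic check on the comparison table, the Avg column appears to be the unweighted mean of the ten per-generator scores. The sketch below recomputes it for two rows; the values are copied from the table, while the dictionary and function names are illustrative only.

```python
from statistics import mean

# Per-generator scores copied from the comparison table, in column order
# (DALL·E3, SDXL, SDv1.5, ADM, GLIDE, Wukong, Adobe, Image3, Baidu, BigGAN).
per_generator_scores = {
    "SPAI": [90.45, 93.71, 98.19, 85.50, 92.74, 97.08, 78.86, 90.00, 75.74, 84.83],
    "DIFC-Net": [98.83, 98.82, 99.02, 85.99, 89.24, 95.41, 71.63, 95.41, 87.28, 81.28],
}

# Avg column as reported in the table.
reported_avg = {"SPAI": 88.71, "DIFC-Net": 90.29}

def macro_average(scores):
    """Unweighted mean over generators, rounded to two decimals like the table."""
    return round(mean(scores), 2)

recomputed = {name: macro_average(s) for name, s in per_generator_scores.items()}

for name, value in recomputed.items():
    print(f"{name}: recomputed {value:.2f}, reported {reported_avg[name]:.2f}")
```

Both recomputed means agree with the reported column. Because the mean is unweighted, the smaller test sets (Adobe, Image3, Baidu) count as much as the 500/500 sets; a sample-weighted average would give slightly different numbers.
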

| Dataset | Precision (%) | Recall (%) | F1-Score (%) | FPR (%) |
|---|---|---|---|---|
| DALL·E3 | 98.38 | 97.40 | 97.89 | 1.60 |
| SDXL | 98.18 | 97.00 | 97.59 | 1.80 |
| SDv1.5 | 98.40 | 98.20 | 98.30 | 1.60 |
| ADM | 81.58 | 80.60 | 81.09 | 18.20 |
| GLIDE | 86.00 | 84.80 | 85.40 | 13.80 |
| Wukong | 93.13 | 92.20 | 92.66 | 6.80 |
| Adobe | 72.19 | 69.71 | 70.93 | 26.86 |
| Image3 | 93.15 | 92.40 | 92.77 | 6.80 |
| Baidu | 83.81 | 82.80 | 83.30 | 16.00 |
| BigGAN | 78.56 | 76.20 | 77.36 | 20.80 |
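
The four metrics in this table are linked through the confusion counts, which makes the rows easy to sanity-check. The sketch below treats "fake" as the positive class and uses true-positive and false-positive counts back-calculated from the reported percentages and the dataset sizes; these counts are an inference for illustration and are not stated in the paper, and the function name is hypothetical.

```python
def detection_metrics(tp, fp, n_fake, n_real):
    """Return (precision, recall, F1, FPR) in percent, rounded to two decimals.

    tp: fake images flagged as fake; fp: real images flagged as fake.
    """
    fn = n_fake - tp                      # fake images missed by the detector
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / n_real
    return tuple(round(100 * v, 2) for v in (precision, recall, f1, fpr))

# Counts inferred from the table (assumption): 97.40% recall of 500 fakes -> 487,
# 1.60% FPR of 500 reals -> 8.
dalle3 = detection_metrics(tp=487, fp=8, n_fake=500, n_real=500)
# Adobe uses 350/350 images: 69.71% recall -> 244, 26.86% FPR -> 94.
adobe = detection_metrics(tp=244, fp=94, n_fake=350, n_real=350)

print(dalle3)  # (98.38, 97.4, 97.89, 1.6), matching the DALL·E3 row
print(adobe)   # (72.19, 69.71, 70.93, 26.86), matching the Adobe row
```

The inferred counts reproduce both rows exactly, which suggests the table was computed at a single fixed decision threshold per dataset.
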

| Variant | Avg. AUC (%) |
|---|---|
| Triplet Only | 46.72 ± 5.94 |
| VCP Only | 69.16 ± 2.08 |
| TFM Only | 61.55 ± 3.11 |
| VCP + TFM (Concatenation + MLP) | 85.51 ± 2.24 |
| DIFC-Net (Full) | 90.29 ± 1.10 |

| Fusion Strategy | Avg. AUC (%) |
|---|---|
| Element-wise Sum | 74.33 ± 3.18 |
| Weighted Sum | 76.06 ± 2.10 |
| Concatenation + MLP | 85.51 ± 2.24 |
| Cross-Attention Fusion | 82.26 ± 1.85 |
| ACF (Ours) | 90.29 ± 1.10 |
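
To make the baseline rows of this ablation concrete, the following NumPy sketch shows three of the simpler fusion strategies applied to a pair of feature vectors, one from the Visual Comparator Path (VCP) and one from the Trajectory Feature Model (TFM). The feature dimensions, parameter shapes, and function names are assumptions for illustration; cross-attention fusion and the proposed ACF module are not reproduced because their details are not given here.

```python
import numpy as np

def fuse_sum(f_vcp, f_tfm):
    """Element-wise sum: both paths contribute equally, no learned parameters."""
    return f_vcp + f_tfm

def fuse_weighted(f_vcp, f_tfm, w=0.5):
    """Weighted sum with a single scalar weight w in [0, 1] shared by all samples."""
    return w * f_vcp + (1.0 - w) * f_tfm

def fuse_concat_mlp(f_vcp, f_tfm, W1, b1, W2, b2):
    """Concatenate both features, then apply a two-layer MLP with ReLU."""
    h = np.concatenate([f_vcp, f_tfm], axis=-1)
    h = np.maximum(h @ W1 + b1, 0.0)
    return h @ W2 + b2

# Hypothetical sizes: batch of 4 images, 16-dimensional features per path.
rng = np.random.default_rng(0)
f_vcp = rng.standard_normal((4, 16))
f_tfm = rng.standard_normal((4, 16))
W1, b1 = rng.standard_normal((32, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)

fused_sum = fuse_sum(f_vcp, f_tfm)
fused_weighted = fuse_weighted(f_vcp, f_tfm, w=0.7)
fused_mlp = fuse_concat_mlp(f_vcp, f_tfm, W1, b1, W2, b2)
```

The two sum variants apply the same mixing to every image, whereas concatenation followed by an MLP lets the combination depend on the features themselves, which is consistent with its higher score in the table.
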

| Steps | Avg. AUC (%) |
|---|---|
| 10 | 68.62 ± 1.98 |
| 25 | 83.47 ± 1.36 |
| 50 | 90.29 ± 1.10 |
| 100 | 89.71 ± 1.14 |
| 150 | 89.12 ± 1.42 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lu, S.; Tian, J.; Zhang, Y.; Wu, F.; Gong, L.; Pan, H. DIFC-Net: Diffusion-Intrinsic Feature Capture for AI-Generated Image Detection. Sensors 2026, 26, 2389. https://doi.org/10.3390/s26082389

