EFIMD-Net: Enhanced Feature Interaction and Multi-Domain Fusion Deep Forgery Detection Network
Abstract
1. Introduction
- This paper proposes a cosine-similarity-guided Cross-Feature Interaction Enhancement (CFIE) mechanism. Instead of the traditional static fusion mode, CFIE uses dynamic cosine-similarity calculation to guide the interaction between spatial- and frequency-domain features, achieving efficient adaptive fusion and improving the complementarity of multi-domain features.
- An Enhanced Feature Guidance (EFG) module is designed that integrates multi-level self-attention, channel attention [12], and spatial attention. Through this multi-dimensional attention mechanism, it extracts accurate semantic information from the RGB stream, guides the SRM stream toward potential forgery regions, and strengthens the capture and localization of diverse fine-grained artifacts.
- An Enhanced Multi-Scale Feature Fusion (EMFF) technique is developed that, through adaptive feature enhancement and an efficient recombination strategy, improves the model's ability to perceive, integrate, and discriminate forgery artifacts at different scales.
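To make the first contribution concrete, the gating idea behind similarity-guided fusion can be sketched as follows. This is a minimal, framework-free illustration under our own assumptions, not the paper's CFIE module: the actual CFIE operates on convolutional feature maps, and its exact gating form is not reproduced here. The mapping of similarity to a blend weight `alpha` is a hypothetical choice for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two flattened feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)  # epsilon avoids division by zero

def similarity_guided_fusion(spatial, freq):
    """Blend two feature vectors with a weight derived dynamically
    from their cosine similarity, rather than a fixed static weight."""
    s = cosine_similarity(spatial, freq)   # in [-1, 1]
    alpha = 0.5 * (1.0 + s)                # map to [0, 1]; agreement -> favor spatial
    return [alpha * x + (1.0 - alpha) * y for x, y in zip(spatial, freq)]
```

The point of the sketch is only that the fusion weight is recomputed per input from the two streams' agreement, which is what distinguishes dynamic guidance from a static fusion mode.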
2. Materials and Methods
2.1. Related Work
2.1.1. Deep Forgery Technology
2.1.2. Multi-Domain Feature Fusion Detection Method
2.2. EFIMD-Net
2.2.1. Overall Structure
2.2.2. Cross-Feature Interaction Enhancement Module
2.2.3. Enhanced Feature Guidance Module
2.2.4. Enhanced Multi-Scale Feature Fusion Module
2.2.5. Loss Function
3. Results
3.1. Experimental Settings
3.2. Contrast Experiment
3.2.1. Domain Performance
3.2.2. Generalization Performance Across Datasets
3.3. Ablation Experiments
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Mirsky, Y.; Lee, W. The creation and detection of deepfakes: A survey. ACM Comput. Surv. 2021, 54, 1–41. [Google Scholar] [CrossRef]
- Al Redhaei, A.; Fraihat, S.; Al-Betar, M.A. A self-supervised BEiT model with a novel hierarchical patchReducer for efficient facial deepfake detection. Artif. Intell. Rev. 2025, 58, 1–37. [Google Scholar] [CrossRef]
- Li, Y.; Lyu, S. Exposing DeepFake Videos By Detecting Face Warping Artifacts. arXiv 2018, arXiv:1811.00656. [Google Scholar]
- Yang, X.; Li, Y.; Lyu, S. Exposing deep fakes using inconsistent head poses. In Proceedings of the ICASSP 2019-2019 IEEE International Conference On Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 8261–8265. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. In Proceedings of the NIPS’20: 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Volume 33, pp. 6840–6851. [Google Scholar]
- Miao, C.; Tan, Z.; Chu, Q.; Liu, H.; Hu, H.; Yu, N. F2Trans: High-Frequency Fine-Grained Transformer for Face Forgery Detection. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1039–1051. [Google Scholar] [CrossRef]
- Liu, H.; Li, X.; Zhou, W.; Chen, Y.; He, Y.; Xue, H.; Zhang, W.; Yu, N. Spatial-phase shallow learning: Rethinking face forgery detection in frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 772–781. [Google Scholar]
- Li, L.; Liu, J.; Wang, S.; Zhang, K.; Lau, R.W.H.; Chen, M. UMMAFormer: A Universal Multimodal-adapter Transformer Framework for Temporal Forgery Localization. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 1532–1541. [Google Scholar]
- Hussein, S.A.; Tirer, T.; Giryes, R. Image-adaptive GAN based reconstruction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 3121–3129. [Google Scholar]
- Guo, M.; Yin, Q.; Lu, W.; Luo, X. Towards Open-world Generalized Deepfake Detection: General Feature Extraction via Unsupervised Domain Adaptation. arXiv 2025, arXiv:2505.12339. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
- DeepFakes GitHub Repository. Available online: https://github.com/deepfakes/faceswap (accessed on 26 March 2025).
- Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2387–2395. [Google Scholar]
- FaceSwap GitHub Repository. Available online: https://github.com/marekkowalski/FaceSwap (accessed on 30 March 2025).
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8110–8119. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical text-conditional image generation with clip latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3207–3216. [Google Scholar]
- Yang, X.; Li, Y.; Qi, H.; Lyu, S. Exposing GAN-synthesized faces using landmark locations. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security; Association for Computing Machinery (ACM): New York, NY, USA, 2019; pp. 113–118. [Google Scholar]
- Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1–11. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Zhao, H.; Zhou, W.; Chen, D.; Wei, T.; Zhang, W.; Yu, N. Multi-attentional deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 2185–2194. [Google Scholar]
- Durall, R.; Keuper, M.; Keuper, J. Watch your up-convolution: Cnn based generative deep neural networks are failing to reproduce spectral distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2020; pp. 7890–7899. [Google Scholar]
- Marra, F.; Gragnaniello, D.; Verdoliva, L.; Poggi, G. Do gans leave artificial fingerprints? In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 506–511. [Google Scholar]
- Li, J.; Xie, H.; Li, J.; Wang, Z.; Zhang, Y. Frequency-aware discriminative feature learning supervised by single-center loss for face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6458–6467. [Google Scholar]
- Li, Z.; Tang, W.; Gao, S.; Wang, S.; Wang, Y. Multiple Contexts and Frequencies Aggregation Network for Deepfake Detection. arXiv 2024, arXiv:2408.01668. [Google Scholar] [CrossRef]
- Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 86–103. [Google Scholar]
- Zhou, P.; Han, X.; Morariu, V.I.; Davis, L.S. Learning rich features for image manipulation detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1053–1061. [Google Scholar]
- Qiu, X.; Miao, X.; Wan, F.; Duan, H.; Shah, T.; Ojha, V.; Long, Y.; Ranjan, R. D2Fusion: Dual-domain fusion with feature superposition for Deepfake detection. Inf. Fusion 2025, 120, 103087. [Google Scholar] [CrossRef]
- Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face x-ray for more general face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5001–5010. [Google Scholar]
- Sun, K.; Yao, T.; Chen, S.; Ding, S.; Li, J.; Ji, R. Dual contrastive learning for general face forgery detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Online Meeting, 22 February–1 March 2022; pp. 2316–2324. [Google Scholar]
- Zhao, T.; Xu, X.; Xu, M.; Ding, H.; Xiong, Y.; Xia, W. Learning self-consistency for deepfake detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15023–15033. [Google Scholar]
- Shiohara, K.; Yamasaki, T. Detecting deepfakes with self-blended images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18720–18729. [Google Scholar]
- Zhou, T.; Wang, W.; Liang, Z.; Shen, J. Face forensics in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5778–5788. [Google Scholar]
- Li, J.; Xie, H.; Yu, L.; Zhang, Y. Wavelet-enhanced weakly supervised local feature learning for face forgery detection. In Proceedings of the 30th ACM International Conference on Multimedia, Lisbon, Portugal, 10–14 October 2022; pp. 1299–1308. [Google Scholar]
- Zheng, Y.; Bao, J.; Chen, D.; Zeng, M.; Wen, F. Exploring temporal coherence for more general video face forgery detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 15044–15054. [Google Scholar]
- Wang, J.; Sun, Y.; Tang, J. LiSiam: Localization Invariance Siamese Network for Deepfake Detection. IEEE Trans. Inf. Forensics Secur. 2022, 17, 2425–2436. [Google Scholar] [CrossRef]
- Haliassos, A.; Vougioukas, K.; Petridis, S.; Pantic, M. Lips don’t lie: A generalisable and robust approach to face forgery detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5039–5049. [Google Scholar]
- Guan, J.; Zhou, H.; Hong, Z.; Ding, E.; Wang, J.; Quan, C.; Zhao, Y. Delving into sequential patches for deepfake detection. In Proceedings of the NIPS’22: 36th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 4517–4530. [Google Scholar]
- Shuai, C.; Zhong, J.; Wu, S.; Lin, F.; Wang, Z.; Ba, Z.; Liu, Z.; Cavallaro, L.; Ren, K. Locate and verify: A two-stream network for improved deepfake detection. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 7131–7142. [Google Scholar]
Method | FF++ | DF | F2F | FS | NT | Avg |
---|---|---|---|---|---|---|
Xception [23] | 0.963 | 0.994 | 0.995 | 0.994 | 0.995 | 0.942 |
Face X-Ray [32] | 0.985 | 0.991 | 0.993 | 0.992 | 0.993 | 0.922 |
DCL [33] | 0.993 | - | 0.992 | - | 0.990 | 0.991 |
PCL + l2G [34] | 0.991 | 1.00 | 0.990 | 0.999 | 0.976 | 0.912 |
SBIs [35] | 0.992 | - | - | 0.988 | 0.996 | 0.992 |
Ours | 0.995 ↑0.002 | 0.997 | 0.997 ↑0.002 | 0.993 | 0.995 | 0.995 ↑0.003 |
Method | Training Set | CelebDF-v1 | CelebDF-v2 |
---|---|---|---|
Xception [23] | FF++ | 0.623 | 0.737 |
Face X-Ray [32] | Prd | 0.806 | - |
FWA [3] | Prd | 0.538 | 0.569 |
DAM [36] | FF++ | - | 0.783 |
Li et al. [37] | FF++ | - | 0.870 |
FTCN [38] | FF++ | - | 0.869 |
LiSiam [39] | FF++ | 0.811 | 0.782 |
SBIs [35] | Prd | - | 0.870 |
LipForensics [40] | FF++ | - | 0.824 |
LITTD [41] | FF++ | - | 0.893 |
Locate and Verify [42] | FF++ | 0.847 | 0.922 |
Ours | FF++ | 0.938 ↑0.091 | 0.995 ↑0.073 |
Model Variant | ACC (FF++) | AUC (FF++) |
---|---|---|
Xception | 0.885 | 0.959 |
+ SRM | 0.901 | 0.974 |
+ SRM + CFIE | 0.934 | 0.989 |
+ SRM + CFIE + EFG | 0.947 | 0.992 |
+ SRM + CFIE + EFG + EMFF | 0.960 | 0.995 |
Method | FLOPs [G] | Parameters [M] |
---|---|---|
Locate and Verify [42] | 21.39 | 61.87 |
EFIMD-Net (ours) | 101.33 | 69.43 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cheng, H.; Pang, W.; Li, K.; Wei, Y.; Song, Y.; Chen, J. EFIMD-Net: Enhanced Feature Interaction and Multi-Domain Fusion Deep Forgery Detection Network. J. Imaging 2025, 11, 312. https://doi.org/10.3390/jimaging11090312