Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness
Abstract
1. Introduction
- We propose Landmark-Guided Convolution (LGConv), which generates facial landmark-guided offsets to adjust the sampling positions of the convolutional kernel. This lets LGConv focus on semantically meaningful facial regions that are prone to forgery artifacts, such as the areas around the eyes, lips, and nose wings.
- We propose the Facial Structure Awareness Block (FSAB) to improve the detection of local, fine-grained deepfake artifacts. Using residual connections and a Convolutional Block Attention Module (CBAM) for adaptive feature recalibration, FSAB fuses the landmark-guided local features extracted by LGConv with the global representations from the backbone, making the model more sensitive to subtle facial forgery cues.
- Extensive evaluations on multiple mainstream benchmark datasets show that our method achieves state-of-the-art or highly competitive detection performance in both intra-dataset and cross-dataset evaluations. In particular, on the challenging CD1, CD2, and DFDCP datasets, it outperforms all compared methods, supporting its strong generalization ability in complex real-world scenarios.
2. Related Works
2.1. Methods Based on Overall Facial Consistency
2.2. Methods Based on Facial Landmark Structural Features
2.3. Methods Based on Local Facial Forgery Traces
2.4. Deformable Convolution
3. Methods
3.1. Overview
3.2. VMamba and Facial Structure Awareness Block
| Algorithm 1: Facial Structure Awareness Block (FSAB). |
|---|
| Input: input feature map, facial landmarks L |
| Output: output feature y |
| Step 1: Apply landmark-guided convolution (LGConv). |
| Step 2: Residual unit 1 with normalization and activation. |
| Step 3: Apply LGConv again to the updated input. |
| Step 4: Residual unit 2. |
| Step 5: Apply the CBAM attention module. |
| return y |
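
Algorithm 1 fixes only the order of operations. The sketch below makes that data flow concrete in dependency-free Python, with a feature map stored as nested lists `[channel][row][col]`. Several details are assumptions, not statements from the paper: each residual unit is taken to be `h + ReLU(Norm(LGConv(h)))`, the normalization is per-channel instance normalization, and the two LGConv calls and CBAM are passed in as callables (so whether the two convolutions share weights is left open). The names `fsab`, `instance_norm`, `relu`, and `add` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Feature = List[List[List[float]]]          # [channels][height][width]
Landmarks = Sequence[Tuple[float, float]]  # (row, col) in feature-map coordinates
ConvFn = Callable[[Feature, Landmarks], Feature]


def instance_norm(x: Feature, eps: float = 1e-5) -> Feature:
    """Normalize each channel to zero mean and unit variance (assumed norm type)."""
    out = []
    for ch in x:
        vals = [v for row in ch for v in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        scale = 1.0 / (var + eps) ** 0.5
        out.append([[(v - mean) * scale for v in row] for row in ch])
    return out


def relu(x: Feature) -> Feature:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum used for the residual connections."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def fsab(x: Feature, landmarks: Landmarks, lgconv1: ConvFn, lgconv2: ConvFn,
         cbam: Callable[[Feature], Feature]) -> Feature:
    # Steps 1-2: landmark-guided convolution, then residual unit 1.
    h = add(x, relu(instance_norm(lgconv1(x, landmarks))))
    # Steps 3-4: LGConv on the updated input, then residual unit 2.
    h = add(h, relu(instance_norm(lgconv2(h, landmarks))))
    # Step 5: CBAM recalibrates the fused features.
    return cbam(h)
```

Passing simple functions for the convolutions is a quick way to check the residual wiring: a convolution that outputs a constant map normalizes to zero, so the block returns its input unchanged.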
3.3. Facial Landmark-Guided Convolution
| Algorithm 2: Landmark-Guided Convolution (LGConv). |
|---|
| Input: input feature map, facial landmarks |
| Output: output feature map y |
| Step 1: Generate the fixed base sampling shape. |
| Step 2: Generate landmark-guided offsets. |
| Step 3: Learn image-dependent offsets. |
| Step 4: Compute the sampling positions. |
| Step 5: Resample the feature map at the sampling positions. |
| Step 6: Apply convolution over the sampled features. |
| return y |
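
Algorithm 2 does not specify how landmarks become offsets or how the learned offsets are predicted. The single-channel sketch below walks through the six steps in plain Python under explicit assumptions. The landmark rule in `landmark_offsets` (pull each base sampling point a fixed fraction toward its nearest landmark if that landmark lies within a radius) is hypothetical. The learned offsets are supplied by the caller, whereas in a deformable-convolution layer (Dai et al.) a convolution over the input predicts 2k² offset values per location. Sampling uses bilinear interpolation with zero padding. All function names are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Grid = List[List[float]]
Point = Tuple[float, float]  # (row, col)


def base_offsets(k: int = 3) -> List[Point]:
    """Step 1: fixed k x k sampling shape centred on the output location."""
    r = (k - 1) / 2
    return [(i - r, j - r) for i in range(k) for j in range(k)]


def landmark_offsets(points: Sequence[Point], landmarks: Sequence[Point],
                     strength: float = 0.5, radius: float = 2.0) -> List[Point]:
    """Step 2 (hypothetical rule): move each point `strength` of the way
    toward its nearest landmark if that landmark is within `radius`."""
    if not landmarks:
        return [(0.0, 0.0)] * len(points)
    out = []
    for py, px in points:
        ly, lx = min(landmarks, key=lambda l: (l[0] - py) ** 2 + (l[1] - px) ** 2)
        if math.hypot(ly - py, lx - px) <= radius:
            out.append((strength * (ly - py), strength * (lx - px)))
        else:
            out.append((0.0, 0.0))
    return out


def bilinear(x: Grid, y: float, xx: float) -> float:
    """Step 5: bilinear interpolation; positions outside the map read as zero."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(y), math.floor(xx)
    val = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                val += (1 - abs(y - yi)) * (1 - abs(xx - xi)) * x[yi][xi]
    return val


def lgconv_at(x: Grid, center: Tuple[int, int], weights: Sequence[float],
              landmarks: Sequence[Point], learned: Sequence[Point], k: int = 3) -> float:
    base = [(center[0] + dy, center[1] + dx) for dy, dx in base_offsets(k)]
    lm = landmark_offsets(base, landmarks)
    # Steps 3-4: position = centre + base offset + landmark offset + learned offset.
    pos = [(by + ly + ey, bx + lx + ex)
           for (by, bx), (ly, lx), (ey, ex) in zip(base, lm, learned)]
    # Steps 5-6: resample at each position and apply the k*k kernel weights.
    return sum(wt * bilinear(x, py, px) for wt, (py, px) in zip(weights, pos))


def lgconv(x: Grid, weights: Sequence[float], landmarks: Sequence[Point],
           learned_fn: Callable[[int, int], Sequence[Point]], k: int = 3) -> Grid:
    """Apply lgconv_at at every location; learned_fn(i, j) returns k*k offsets."""
    return [[lgconv_at(x, (i, j), weights, landmarks, learned_fn(i, j), k)
             for j in range(len(x[0]))] for i in range(len(x))]
```

With landmarks far from every sampling point and zero learned offsets, the sketch reduces to an ordinary k × k convolution (cross-correlation) with zero padding, which makes a useful sanity check.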
4. Experiments
4.1. Datasets and Experimental Setup
4.2. Results
4.2.1. Intra-Dataset Experiments
4.2.2. Cross-Dataset Evaluation
4.2.3. Ablation Experiment
4.2.4. Robustness Evaluation
4.3. Grad-CAM Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Li, L.; Bao, J.; Zhang, T.; Yang, H.; Chen, D.; Wen, F.; Guo, B. Face X-Ray for More General Face Forgery Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5000–5009.
2. Zhu, H.; Huang, H.; Li, Y.; Zheng, A.; He, R. Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020, Vienna, Austria, 11–17 July 2020; pp. 2362–2368.
3. Wang, Y.; Chen, X.; Zhu, J.; Chu, W.; Tai, Y.; Li, J.; Wang, C.; Wu, Y.; Huang, F.; Ji, R. HifiFace: 3D Shape and Semantic Prior Guided High Fidelity Face Swapping. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Montreal, QC, Canada, 19–27 August 2021; pp. 1136–1142.
4. He, Y.; Yu, N.; Keuper, M.; Fritz, M. Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Montreal, QC, Canada, 19–27 August 2021; pp. 2534–2541.
5. Hu, Z.; Xie, H.; Wang, Y.; Li, J.; Wang, Z.; Zhang, Y. Dynamic Inconsistency-aware DeepFake Video Detection. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI 2021, Montreal, QC, Canada, 19–27 August 2021; pp. 736–742.
6. Nirkin, Y.; Wolf, L.; Keller, Y.; Hassner, T. DeepFake Detection Based on Discrepancies Between Faces and Their Context. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6111–6121.
7. Yang, Q.; Yu, D.; Zhang, Z.; Yao, Y.; Chen, L. Spatiotemporal Trident Networks: Detection and Localization of Object Removal Tampering in Video Passive Forensics. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4131–4144.
8. Li, Y.; Chang, M.; Lyu, S. In Ictu Oculi: Exposing AI Created Fake Videos by Detecting Eye Blinking. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, WIFS 2018, Hong Kong, China, 11–13 December 2018; pp. 1–7.
9. Matern, F.; Riess, C.; Stamminger, M. Exploiting Visual Artifacts to Expose Deepfakes and Face Manipulations. In Proceedings of the IEEE Winter Applications of Computer Vision Workshops, WACV Workshops 2019, Waikoloa Village, HI, USA, 7–11 January 2019; pp. 83–92.
10. Akhtar, Z.; Dasgupta, D. A Comparative Evaluation of Local Feature Descriptors for DeepFakes Detection. In Proceedings of the 2019 IEEE International Symposium on Technologies for Homeland Security (HST), Woburn, MA, USA, 5–6 November 2019; pp. 1–5.
11. Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. MesoNet: A Compact Facial Video Forgery Detection Network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security, WIFS 2018, Hong Kong, China, 11–13 December 2018; pp. 1–7.
12. Hsu, C.; Lee, C.; Zhuang, Y. Learning to Detect Fake Face Images in the Wild. arXiv 2018, arXiv:1809.08754.
13. Hsu, C.C.; Zhuang, Y.X.; Lee, C.Y. Deep Fake Image Detection Based on Pairwise Learning. Appl. Sci. 2020, 10, 370.
14. Nguyen, H.H.; Fang, F.; Yamagishi, J.; Echizen, I. Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos. In Proceedings of the 10th IEEE International Conference on Biometrics Theory, Applications and Systems, BTAS 2019, Tampa, FL, USA, 23–26 September 2019; pp. 1–8.
15. Nguyen, H.H.; Yamagishi, J.; Echizen, I. Capsule-forensics: Using Capsule Networks to Detect Forged Images and Videos. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, UK, 12–17 May 2019; pp. 2307–2311.
16. Guera, D.; Delp, E.J. Deepfake Video Detection Using Recurrent Neural Networks. In Proceedings of the 15th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2018, Auckland, New Zealand, 27–30 November 2018; pp. 1–6.
17. Amerini, I.; Galteri, L.; Caldelli, R.; Bimbo, A.D. Deepfake Video Detection through Optical Flow Based CNN. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshops, ICCV Workshops 2019, Seoul, Republic of Korea, 27–28 October 2019; pp. 1205–1207.
18. Li, Y.; Lyu, S. Exposing DeepFake Videos By Detecting Face Warping Artifacts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Long Beach, CA, USA, 16–20 June 2019.
19. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to Detect Manipulated Facial Images. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1–11.
20. Sun, K.; Liu, H.; Ye, Q.; Gao, Y.; Liu, J.; Shao, L.; Ji, R. Domain General Face Forgery Detection by Learning to Weight. Proc. AAAI Conf. Artif. Intell. 2021, 35, 2638–2646.
21. Zhou, P.; Han, X.; Morariu, V.I.; Davis, L.S. Two-Stream Neural Networks for Tampered Face Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2017, Honolulu, HI, USA, 21–26 July 2017; IEEE Computer Society: Piscataway, NJ, USA, 2017; pp. 1831–1839.
22. Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3204–3213.
23. Shiohara, K.; Yamasaki, T. Detecting Deepfakes with Self-Blended Images. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 18699–18708.
24. Ganiyusufoglu, I.; Ngô, L.M.; Savov, N.; Karaoglu, S.; Gevers, T. Spatio-temporal Features for Generalized Detection of Deepfake Videos. arXiv 2020, arXiv:2010.11844.
25. Amerini, I.; Caldelli, R. Exploiting Prediction Error Inconsistencies through LSTM-based Classifiers to Detect Deepfake Videos. In Proceedings of the IH&MMSec ’20: ACM Workshop on Information Hiding and Multimedia Security, Denver, CO, USA, 22–24 June 2020; pp. 97–102.
26. Masi, I.; Killekar, A.; Mascarenhas, R.M.; Gurudatt, S.P.; AbdAlmageed, W. Two-Branch Recurrent Network for Isolating Deepfakes in Videos. In Proceedings of the Computer Vision—ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part VII; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12352, pp. 667–684.
27. Haliassos, A.; Vougioukas, K.; Petridis, S.; Pantic, M. Lips Don’t Lie: A Generalisable and Robust Approach to Face Forgery Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5037–5047.
28. Liu, H.; Li, X.; Zhou, W.; Chen, Y.; He, Y.; Xue, H.; Zhang, W.; Yu, N. Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 772–781.
29. Cozzolino, D.; Thies, J.; Rössler, A.; Riess, C.; Nießner, M.; Verdoliva, L. ForensicTransfer: Weakly-supervised Domain Adaptation for Forgery Detection. arXiv 2018, arXiv:1812.02510.
30. Kong, C.; Chen, B.; Li, H.; Wang, S.; Rocha, A.; Kwong, S. Detect and Locate: Exposing Face Manipulation by Semantic- and Noise-Level Telltales. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1741–1756.
31. Wang, Y.; Peng, C.; Liu, D.; Wang, N.; Gao, X. ForgeryNIR: Deep Face Forgery and Detection in Near-Infrared Scenario. IEEE Trans. Inf. Forensics Secur. 2022, 17, 500–515.
32. Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Jiao, J.; Liu, Y. VMamba: Visual State Space Model. In Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, 10–15 December 2024.
33. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; IEEE Computer Society: Piscataway, NJ, USA, 2017; pp. 764–773.
34. Bayar, B.; Stamm, M.C. A Deep Learning Approach to Universal Image Manipulation Detection Using a New Convolutional Layer. In Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security, IH&MMSec 2016, Vigo, Galicia, Spain, 20–22 June 2016; pp. 5–10.
35. Qian, Y.; Yin, G.; Sheng, L.; Chen, Z.; Shao, J. Thinking in Frequency: Face Forgery Detection by Mining Frequency-Aware Clues. In Proceedings of the Computer Vision—ECCV 2020—16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XII; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12357, pp. 86–103.
36. Li, H.; Li, B.; Tan, S.; Huang, J. Identification of deep network generated images using disparities in color components. Signal Process. 2020, 174, 107616.
37. Nirkin, Y.; Wolf, L.; Keller, Y.; Hassner, T. DeepFake Detection Based on the Discrepancy Between the Face and its Context. arXiv 2020, arXiv:2008.12262.
38. Sun, K.; Chen, S.; Yao, T.; Liu, H.; Sun, X.; Ding, S.; Ji, R. Diffusionfake: Enhancing generalization in deepfake detection via guided stable diffusion. Adv. Neural Inf. Process. Syst. 2024, 37, 101474–101497.
39. Fu, X.; Yan, Z.; Yao, T.; Chen, S.; Li, X. Exploring Unbiased Deepfake Detection via Token-Level Shuffling and Mixing. In Proceedings of the AAAI-25, Sponsored by the Association for the Advancement of Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 3040–3048.
40. Yan, Z.; Zhang, Y.; Fan, Y.; Wu, B. UCF: Uncovering Common Features for Generalizable Deepfake Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 1–6 October 2023; pp. 22355–22366.
41. Sun, K.; Liu, H.; Yao, T.; Sun, X.; Chen, S.; Ding, S.; Ji, R. An Information Theoretic Approach for Attention-Driven Face Forgery Detection. In Proceedings of the Computer Vision—ECCV 2022—17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XIV; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2022; Volume 13674, pp. 111–127.
42. Peng, S.; Zhang, T.; Gao, L.; Zhu, X.; Zhang, H.; Pang, K.; Lei, Z. WMamba: Wavelet-based Mamba for Face Forgery Detection. In Proceedings of the 33rd ACM International Conference on Multimedia, New York, NY, USA, 27–31 October 2025; pp. 4768–4777.
43. Zhang, Y.; Wang, C.; Zhou, X. MSER-Net: Multi-stage edge refinement network for deepfake detection. Knowl. Based Syst. 2025, 328, 114280.
44. Zakharov, E.; Shysheya, A.; Burkov, E.; Lempitsky, V.S. Few-Shot Adversarial Learning of Realistic Neural Talking Head Models. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, ICCV 2019, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9458–9467.
45. Zhang, J.; Zeng, X.; Wang, M.; Pan, Y.; Liu, L.; Liu, Y.; Ding, Y.; Fan, C. FReeNet: Multi-Identity Face Reenactment. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 5325–5334.
46. Wiles, O.; Koepke, A.S.; Zisserman, A. X2Face: A Network for Controlling Face Generation Using Images, Audio, and Pose Codes. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part XIII; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11217, pp. 690–706.
47. Hsu, G.S.; Tsai, C.H.; Wu, H.Y. Dual-Generator Face Reenactment. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 632–640.
48. Doukas, M.C.; Ververas, E.; Sharmanska, V.; Zafeiriou, S. Free-HeadGAN: Neural Talking Head Synthesis With Explicit Gaze Control. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 9743–9756.
49. Liu, Z.; Qi, X.; Torr, P.H. Global Texture Enhancement for Fake Face Detection in the Wild. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8057–8066.
50. Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. In Proceedings of the Computer Vision—ECCV 2024—18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part LIV; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2024; Volume 15112, pp. 363–380.
51. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic Snake Convolution based on Topological Geometric Constraints for Tubular Structure Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 1–6 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 6047–6056.
52. Woo, S.; Park, J.; Lee, J.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the Computer Vision—ECCV 2018—15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part VII; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2018; Volume 11211, pp. 3–19.
53. Zhang, X.; Song, Y.; Song, T.; Yang, D.; Ye, Y.; Zhou, J.; Zhang, L. LDConv: Linear deformable convolution for improving convolutional neural networks. Image Vis. Comput. 2024, 149, 105190.
54. Dolhansky, B.; Howes, R.; Pflaum, B.; Baram, N.; Canton-Ferrer, C. The Deepfake Detection Challenge (DFDC) Preview Dataset. arXiv 2019, arXiv:1910.08854.
55. Dufour, G.R.N.; Gully, A. Contributing Data to Deepfake Detection Research. 2019. Available online: https://ai.googleblog.com/2019/09/contributing-data-to-deepfake-detection.html (accessed on 7 December 2025).
56. Li, X.; Ni, R.; Yang, P.; Fu, Z.; Zhao, Y. Artifacts-Disentangled Adversarial Learning for Deepfake Detection. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 1658–1670.
57. Bai, N.; Wang, X.; Han, R.; Hou, J.; Wang, Q.; Pang, S. Towards generalizable face forgery detection via mitigating spurious correlation. Neural Netw. 2025, 182, 106909.
58. Zhao, H.; Wei, T.; Zhou, W.; Zhang, W.; Chen, D.; Yu, N. Multi-attentional Deepfake Detection. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 2185–2194.
59. Cao, J.; Ma, C.; Yao, T.; Chen, S.; Ding, S.; Yang, X. End-to-End Reconstruction-Classification Learning for Face Forgery Detection. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 4103–4112.
60. Yang, Z.; Liang, J.; Xu, Y.; Zhang, X.; He, R. Masked Relation Learning for DeepFake Detection. IEEE Trans. Inf. Forensics Secur. 2023, 18, 1696–1708.
61. Duan, H.; Jiang, Q.; Jin, X.; Wozniak, M.; Zhao, Y.; Wu, L.; Yao, S.; Zhou, W. Mf-net: Multi-feature fusion network based on two-stream extraction and multi-scale enhancement for face forgery detection. Complex Intell. Syst. 2025, 11, 11.
62. Qiu, X.; Miao, X.; Wan, F.; Duan, H.; Shah, T.; Ojha, V.; Long, Y.; Ranjan, R. D2Fusion: Dual-domain fusion with feature superposition for Deepfake detection. Inf. Fusion 2025, 120, 103087.
63. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, Long Beach, CA, USA, 9–15 June 2019; Proceedings of Machine Learning Research, Volume 97, pp. 6105–6114.
64. Chen, S.; Yao, T.; Chen, Y.; Ding, S.; Li, J.; Ji, R. Local Relation Learning for Face Forgery Detection. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, 2–9 February 2021; pp. 1081–1088.
65. Sun, K.; Yao, T.; Chen, S.; Ding, S.; Li, J.; Ji, R. Dual Contrastive Learning for General Face Forgery Detection. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, 22 February–1 March 2022; pp. 2316–2324.
66. Hu, J.; Liao, X.; Liang, J.; Zhou, W.; Qin, Z. FInfer: Frame Inference-Based Deepfake Detection for High-Visual-Quality Videos. In Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelfth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022, Virtual Event, 22 February–1 March 2022; pp. 951–959.
67. Dong, S.; Wang, J.; Ji, R.; Liang, J.; Fan, H.; Ge, Z. Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, 17–24 June 2023; pp. 3994–4004.
68. Nguyen, D.; Mejri, N.; Singh, I.P.; Kuleshova, P.; Astrid, M.; Kacem, A.; Ghorbel, E.; Aouada, D. LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, 16–22 June 2024; pp. 17395–17405.
69. Yan, Z.; Luo, Y.; Lyu, S.; Liu, Q.; Wu, B. Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, 16–22 June 2024; pp. 8984–8994.
70. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; Proceedings of Machine Learning Research, Volume 139, pp. 11863–11874.
71. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
72. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 27–30 June 2016; IEEE Computer Society: Piscataway, NJ, USA, 2016; pp. 770–778.
73. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
74. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.






| Training Set | Methods | Venue | DF AUC (%) | FS AUC (%) | F2F AUC (%) | NT AUC (%) | Avg AUC (%) |
|---|---|---|---|---|---|---|---|
| DF | Face-Xray * | CVPR2020 | 98.70 | 60.07 | 63.36 | 69.82 | 72.99 |
| | MAT | CVPR2021 | 99.92 | 40.51 | 75.23 | 71.08 | 71.69 |
| | RECCE * | CVPR2022 | 99.95 | 54.72 | 69.75 | 77.15 | 75.39 |
| | ADA * [56] | TCSVT2023 | 99.68 | 79.41 | 73.61 | 81.92 | 83.66 |
| | MRL * | TIFS2023 | 99.75 | 95.63 | 79.04 | 79.69 | 88.53 |
| | TGF [57] | NN2024 | 99.47 | 69.06 | 77.39 | 68.51 | 78.61 |
| | Mf-net | CIS2024 | 99.97 | 74.54 | 70.82 | 82.37 | 81.93 |
| | D2Fusion | IF2025 | 99.98 | 62.25 | 77.88 | 75.73 | 78.96 |
| | LGMamba (Ours) | | 99.81 | 96.82 | 85.17 | 83.66 | 91.37 |
| FS | Face-Xray * | CVPR2020 | 45.84 | 95.89 | 76.17 | 70.22 | 72.03 |
| | MAT | CVPR2021 | 64.13 | 99.67 | 66.39 | 50.10 | 70.07 |
| | RECCE * | CVPR2022 | 63.05 | 99.72 | 66.21 | 58.07 | 71.76 |
| | ADA * | TCSVT2023 | 72.36 | 99.91 | 70.20 | 62.12 | 76.15 |
| | MRL * | TIFS2023 | 89.33 | 95.67 | 75.66 | 80.35 | 85.25 |
| | TGF | NN2024 | 81.47 | 98.93 | 65.28 | 60.63 | 76.58 |
| | Mf-net | CIS2024 | 76.62 | 99.91 | 70.12 | 60.34 | 76.75 |
| | D2Fusion | IF2025 | 77.50 | 99.92 | 69.76 | 58.45 | 76.41 |
| | LGMamba (Ours) | | 93.22 | 98.10 | 80.63 | 83.92 | 88.97 |
| F2F | Face-Xray * | CVPR2020 | 63.06 | 68.81 | 94.43 | 72.58 | 74.72 |
| | MAT | CVPR2021 | 86.15 | 60.14 | 99.13 | 64.59 | 77.50 |
| | RECCE * | CVPR2022 | 71.55 | 50.02 | 99.20 | 72.27 | 73.26 |
| | ADA * | TCSVT2023 | 90.32 | 69.49 | 99.17 | 73.13 | 83.03 |
| | MRL * | TIFS2023 | 81.44 | 83.43 | 83.41 | 79.52 | 81.95 |
| | TGF | NN2024 | 78.07 | 67.58 | 98.27 | 74.01 | 79.48 |
| | Mf-net | CIS2024 | 86.74 | 67.51 | 99.96 | 90.42 | 86.16 |
| | D2Fusion | IF2025 | 89.50 | 62.64 | 99.86 | 75.23 | 81.81 |
| | LGMamba (Ours) | | 83.01 | 77.50 | 98.23 | 86.19 | 86.23 |
| NT | Face-Xray * | CVPR2020 | 70.51 | 78.37 | 79.22 | 92.57 | 80.17 |
| | MAT | CVPR2021 | 87.23 | 75.33 | 48.22 | 98.66 | 77.36 |
| | RECCE * | CVPR2022 | 72.37 | 51.61 | 64.69 | 99.59 | 72.07 |
| | ADA * | TCSVT2023 | 90.94 | 78.47 | 63.28 | 99.28 | 82.99 |
| | MRL * | TIFS2023 | 80.54 | 81.74 | 76.56 | 78.42 | 79.32 |
| | TGF | NN2024 | 83.81 | 63.88 | 78.60 | 92.42 | 79.68 |
| | Mf-net | CIS2024 | 89.68 | 64.59 | 74.97 | 99.36 | 82.15 |
| | D2Fusion | IF2025 | 94.44 | 80.75 | 71.08 | 99.43 | 86.43 |
| | LGMamba (Ours) | | 83.88 | 81.23 | 83.03 | 95.07 | 85.80 |

| Methods | Venue | CD1 AUC (%) | CD2 AUC (%) | DFDCP AUC (%) | DFD AUC (%) | Avg AUC (%) |
|---|---|---|---|---|---|---|
| Xception * [19] | ICCV2019 | 78.90 | 73.75 | 74.96 | 80.66 | 77.07 |
| Ef-b4 * [63] | ICML2019 | 69.44 | 64.29 | 70.38 | 83.17 | 71.82 |
| LRL * [64] | AAAI2021 | ~ | 78.26 | ~ | 89.24 | ~ |
| LipFor [27] | CVPR2021 | ~ | 82.40 | ~ | ~ | ~ |
| DCL * [65] | AAAI2022 | ~ | 82.30 | 76.71 | 91.66 | 83.56 |
| FInfer [66] | AAAI2022 | 70.60 | ~ | 70.39 | ~ | ~ |
| ADA * | TCSVT2023 | 82.49 | 84.62 | 78.51 | 92.14 | 84.44 |
| MRL * | TIFS2023 | ~ | 83.58 | 71.53 | ~ | ~ |
| CADDM * [67] | CVPR2023 | 89.57 | 77.04 | 81.23 | 93.92 | 85.44 |
| LAA-Net [68] | CVPR2024 | ~ | 95.40 | 86.94 | ~ | ~ |
| LSDA [69] | CVPR2024 | 86.70 | 83.00 | 81.50 | 88.00 | 84.80 |
| D2Fusion | IF2025 | 88.14 | 83.29 | ~ | ~ | ~ |
| UDD [39] | AAAI2025 | ~ | 86.90 | 85.60 | 91.00 | ~ |
| LGMamba (Ours) | | 92.34 | 96.01 | 88.87 | 92.26 | 92.37 |
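
The tables report AUC in percent but do not state whether scores are aggregated per frame or per video, or which class is treated as positive. For reference, AUC equals the probability that a randomly chosen fake sample receives a higher detection score than a randomly chosen real one, with ties counted as half. The sketch below computes it directly from that pairwise definition, assuming label 1 = fake; `auc` is a hypothetical helper, not the authors' evaluation code.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: 1 = fake (assumed positive class), 0 = real.
    Returns P(score_fake > score_real) + 0.5 * P(score_fake == score_real).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# The tables report 100 * AUC.
print(100 * auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 75.0
```

The pairwise loop costs O(n·m) for n fake and m real samples; on full test sets, a rank-based computation after sorting (or a library routine such as scikit-learn's `roc_auc_score`) gives the same value in O((n + m) log(n + m)).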

| DConv | Training Set | CD2 AUC (%) | DFDCP AUC (%) |
|---|---|---|---|
| DCN | FF++ | 92.50 | 80.77 |
| DSConv | FF++ | 95.23 | 84.62 |
| WTConv | FF++ | 95.51 | 84.98 |
| LGConv | FF++ | 96.01 | 88.87 |

| Components | Training Set | Test Set: FF++ AUC (%) |
|---|---|---|
| None | FF++ | 87.50 |
| SimAM [70] | FF++ | 87.93 |
| SENet [71] | FF++ | 88.22 |
| CBAM | FF++ | 89.91 |
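
The attention ablation compares modules for the recalibration step that Algorithm 1 places at the end of FSAB. For reference, CBAM (Woo et al.) first rescales each channel by a sigmoid of a shared MLP applied to the channel's global average- and max-pooled values, then rescales each spatial location by a sigmoid of a convolution over the channel-wise average and max maps. The dependency-free sketch below follows that standard formulation; weights are passed in explicitly, MLP bias terms are omitted, and the function names are illustrative rather than taken from the authors' code.

```python
import math
from typing import List, Sequence

Feature = List[List[List[float]]]  # [channels][height][width]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def shared_mlp(v: Sequence[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP C -> C/r -> C with ReLU in between (bias terms omitted)."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w1]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w2]


def channel_attention(x: Feature, w1: Matrix, w2: Matrix) -> Feature:
    # Global average- and max-pool each channel; both go through the same MLP.
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    mx = [max(map(max, ch)) for ch in x]
    scale = [sigmoid(a + b) for a, b in zip(shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2))]
    return [[[s * v for v in row] for row in ch] for s, ch in zip(scale, x)]


def spatial_attention(x: Feature, kernel: List[Matrix]) -> Feature:
    # Pool across channels, convolve the 2-channel map ('same' zero padding), sigmoid.
    c, h, w = len(x), len(x[0]), len(x[0][0])
    avg = [[sum(x[n][i][j] for n in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x[n][i][j] for n in range(c)) for j in range(w)] for i in range(h)]
    pooled = (avg, mx)
    ks = len(kernel[0])  # CBAM uses 7; any odd size works here
    r = ks // 2
    att = [[sigmoid(sum(kernel[p][di][dj] * pooled[p][i + di - r][j + dj - r]
                        for p in range(2) for di in range(ks) for dj in range(ks)
                        if 0 <= i + di - r < h and 0 <= j + dj - r < w))
            for j in range(w)] for i in range(h)]
    return [[[att[i][j] * ch[i][j] for j in range(w)] for i in range(h)] for ch in x]


def cbam(x: Feature, w1: Matrix, w2: Matrix, kernel: List[Matrix]) -> Feature:
    """Channel attention followed by spatial attention."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

Because both attention maps lie in (0, 1), CBAM can only attenuate features; with all-zero weights every gate equals 0.5, so the output is exactly one quarter of the input.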

| LGConv | FSAB | Backbone | Params | FLOPs | Inference Time | Training Set | DFDCP AUC (%) | CD2 AUC (%) |
|---|---|---|---|---|---|---|---|---|
| | | ResNet | 44 M | 7.8 G | 59 ms | FF++ | 72.57 | 70.38 |
| ✓ | | ResNet | 45 M | 8.0 G | 91 ms | FF++ | 72.91 | 71.02 |
| ✓ | ✓ | ResNet | 45 M | 8.0 G | 127 ms | FF++ | 73.80 | 72.22 |
| | | Swin-T | 49 M | 8.5 G | 102 ms | FF++ | 80.22 | 89.90 |
| ✓ | | Swin-T | 50 M | 8.7 G | 194 ms | FF++ | 81.83 | 91.01 |
| ✓ | ✓ | Swin-T | 50 M | 8.7 G | 231 ms | FF++ | 82.24 | 92.88 |
| | | VMamba | 30 M | 4.9 G | 68 ms | FF++ | 85.12 | 94.28 |
| ✓ | | VMamba | 31 M | 5.1 G | 178 ms | FF++ | 86.97 | 95.66 |
| ✓ | ✓ | VMamba | 31 M | 5.1 G | 199 ms | FF++ | 88.87 | 96.01 |
| Methods | Venue | Clean | Blur | Noise | Block | Avg |
|---|---|---|---|---|---|---|
| Xception | ICCV2019 | 75.98 | 73.64 | 72.75 | 72.56 | 73.73 |
| Face-Xray | CVPR2020 | 80.92 | 76.31 | 75.02 | 77.25 | 77.38 |
| RECCE | CVPR2022 | 78.33 | 76.20 | 74.69 | 74.23 | 75.86 |
| ADA | TCSVT2023 | 80.92 | 79.33 | 80.01 | 77.59 | 79.46 |
| MRL | TIFS2023 | 80.21 | 79.65 | 77.32 | 78.44 | 78.91 |
| LGMamba | | 89.91 | 87.69 | 88.70 | 85.95 | 88.06 |
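
The Avg column of the robustness table is consistent with an unweighted mean over all four conditions, Clean included: recomputing it from the four listed values reproduces every reported Avg to within two-decimal rounding. The sketch below performs that check and also reports the mean drop from Clean under the three perturbations, which separates the absolute score level from the degradation. Row values are copied from the table; `check_row` is an illustrative helper.

```python
from typing import Sequence, Tuple

# (Clean, Blur, Noise, Block, reported Avg), copied from the robustness table.
ROWS = {
    "Xception": (75.98, 73.64, 72.75, 72.56, 73.73),
    "Face-Xray": (80.92, 76.31, 75.02, 77.25, 77.38),
    "RECCE": (78.33, 76.20, 74.69, 74.23, 75.86),
    "ADA": (80.92, 79.33, 80.01, 77.59, 79.46),
    "MRL": (80.21, 79.65, 77.32, 78.44, 78.91),
    "LGMamba": (89.91, 87.69, 88.70, 85.95, 88.06),
}


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def check_row(clean: float, blur: float, noise: float, block: float,
              reported: float) -> Tuple[float, float]:
    """Return (mean over all four conditions, mean drop from Clean)."""
    avg = mean([clean, blur, noise, block])
    drop = mean([clean - blur, clean - noise, clean - block])
    # Reported values are rounded to two decimals, so allow half a unit of 0.01.
    assert abs(avg - reported) <= 0.0051, (avg, reported)
    return avg, drop


for name, row in ROWS.items():
    avg, drop = check_row(*row)
    print(f"{name:10s} avg={avg:.4f} mean drop={drop:.2f}")
```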

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Chen, H.; Zhang, Z.; Li, Q.; Feng, C. Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness. Algorithms 2026, 19, 270. https://doi.org/10.3390/a19040270
Chen H, Zhang Z, Li Q, Feng C. Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness. Algorithms. 2026; 19(4):270. https://doi.org/10.3390/a19040270
Chicago/Turabian Style: Chen, Hao, Zhengxu Zhang, Qin Li, and Chunhui Feng. 2026. "Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness" Algorithms 19, no. 4: 270. https://doi.org/10.3390/a19040270
APA Style: Chen, H., Zhang, Z., Li, Q., & Feng, C. (2026). Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness. Algorithms, 19(4), 270. https://doi.org/10.3390/a19040270

