Fighting Deepfakes by Detecting GAN DCT Anomalies
Abstract
1. Introduction
- A new high-performance Deepfake face detection method based on the analysis of the AC coefficients computed through the Discrete Cosine Transform. The method achieves not only strong generalization but also better classification results than previously published works, and it requires neither GPU computation nor hours of training to perform Real vs. Deepfake classification;
- The detection method is explainable (a white-box method): by simply estimating the characterizing parameters of the Laplacian distribution of each AC frequency, we are able to detect the anomalous frequencies generated by the various Deepfake architectures (a minimal sketch of this estimation follows the list);
- Finally, the detection strategy was attacked in order to simulate in-the-wild conditions. Mirroring, scaling, rotation and the addition of rectangles of random size, position and color were applied to the images, demonstrating the robustness of the proposed method and its ability to perform well even on video datasets never seen during training.
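To make the second contribution concrete, the following minimal Python sketch estimates the Laplacian scale β of each of the 63 AC frequencies over the 8×8 DCT blocks of a grayscale image. It illustrates the statistic involved, not the authors' implementation; the function name and block handling are assumptions.

```python
# Minimal sketch: per-frequency Laplacian scale (beta) over 8x8 DCT blocks.
import numpy as np
from PIL import Image
from scipy.fftpack import dct

def block_dct_betas(path, block=8):
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    h, w = img.shape
    h, w = h - h % block, w - w % block              # crop to a multiple of 8
    blocks = (img[:h, :w]
              .reshape(h // block, block, w // block, block)
              .swapaxes(1, 2)
              .reshape(-1, block, block))
    # 2D type-II DCT of every block (rows, then columns)
    coeffs = dct(dct(blocks, axis=-1, norm="ortho"), axis=-2, norm="ortho")
    ac = coeffs.reshape(-1, block * block)[:, 1:]    # drop the DC coefficient
    # For a zero-centred Laplacian, the maximum-likelihood estimate of the
    # scale beta at each frequency is the mean absolute coefficient value.
    return np.abs(ac).mean(axis=0)                   # 63-dimensional vector
```

Anomalously small or large β values at specific frequencies are the kind of GAN fingerprint the method looks for.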
2. Related Works
2.1. Deepfake Generation Techniques of Faces
2.2. Deepfake Detection Techniques
3. The CTF Approach
4. Datasets Details
- CelebA (CelebFaces Attributes Dataset): a large-scale face-attributes dataset with more than 200K celebrity images, each annotated with 40 facial-attribute labels such as hair color, gender and age. The images cover large pose variations and background clutter. The dataset is composed of JPEG images.
- FFHQ (Flickr-Faces-HQ): a high-quality image dataset of human faces with variations in age, ethnicity and image background. The images were crawled from Flickr and automatically aligned and cropped using dlib [43]. The dataset is composed of high-quality PNG images.
- StarGAN performs image-to-image translation on multiple domains using a single model. Using CelebA as the real-image dataset, every image was manipulated by means of a pre-trained model (https://github.com/yunjey/stargan, accessed on 14 February 2021).
- GDWCT improves the styling capability of image-to-image translation. Using CelebA as the real-image dataset, every image was manipulated by means of a pre-trained model (https://github.com/WonwoongCho/GDWCT, accessed on 14 February 2021).
- AttGAN transfers facial attributes under constraints. Using CelebA as the real-image dataset, every image was manipulated by means of a pre-trained model (https://github.com/LynnHo/AttGAN-Tensorflow, accessed on 14 February 2021).
- StyleGAN transfers semantic content from a source domain to a target domain characterized by a different style. Images were generated by a model trained on FFHQ (https://drive.google.com/drive/folders/1uka3a1noXHAydRPRbknqwKVGODvnmUBX, accessed on 14 February 2021).
- StyleGAN2 improves upon StyleGAN image quality on the same task. Images were generated by a model trained on FFHQ (https://drive.google.com/drive/folders/1QHc-yF5C3DChRwSdZKcx1w6K8JvSxQi7, accessed on 14 February 2021).
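Together, the real images and the five generated classes above form a six-class problem (Real plus five GAN architectures). A minimal sketch of gathering (path, label) pairs from per-class folders follows; the directory layout and class names are assumptions, not the authors' structure.

```python
# Hedged sketch: collect (image path, class label) pairs for the six classes.
# The folder names below are an assumed layout.
from pathlib import Path

CLASSES = ["real", "attgan", "gdwct", "stargan", "stylegan", "stylegan2"]

def gather_dataset(root="dataset"):
    samples = []
    for label, name in enumerate(CLASSES):
        for ext in ("*.jpg", "*.png"):   # CelebA ships as JPEG, FFHQ as PNG
            for path in sorted(Path(root, name).glob(ext)):
                samples.append((path, label))
    return samples
```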
5. Discussion on GSF
Finalizing the CTF Approach
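As a rough illustration of the "no GPU, fast training" claim from the Introduction, the per-image β vectors can feed a lightweight off-the-shelf classifier. The sketch below uses scikit-learn's logistic regression purely as a stand-in; it is not the exact classifier finalized by the CTF approach.

```python
# Hedged sketch: train a simple classifier on 63-dimensional beta vectors.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_beta_classifier(features, labels):
    # features: (n_images, 63) beta vectors; labels: 0 = Real, 1-5 = GAN class
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")
    return clf
```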
6. Experimental Results
6.1. Testing with Noise
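The robustness tests summarized in the first results table below apply common image manipulations to the test images: mirroring, rotation, scaling, random squares, Gaussian filtering and JPEG compression. A minimal Pillow sketch of these attacks follows; the kernel radius, rotation angle and JPEG quality factor shown are illustrative choices, not necessarily those used in the paper.

```python
# Hedged sketch of the robustness attacks evaluated in the results table.
import io
import random
from PIL import Image, ImageDraw, ImageFilter

def attack(img: Image.Image, kind: str) -> Image.Image:
    if kind == "mirror":
        return img.transpose(Image.FLIP_LEFT_RIGHT)   # horizontal mirroring
    if kind == "rotate":
        return img.rotate(45)                         # one of the tested angles
    if kind == "downscale":
        w, h = img.size
        return img.resize((w // 2, h // 2))           # -50% scaling
    if kind == "gaussian":
        return img.filter(ImageFilter.GaussianBlur(radius=3))
    if kind == "random_square":
        out = img.copy()
        w, h = out.size
        side = random.randint(w // 10, w // 4)        # random size
        x, y = random.randint(0, w - side), random.randint(0, h - side)
        color = tuple(random.randint(0, 255) for _ in range(3))  # random color
        ImageDraw.Draw(out).rectangle([x, y, x + side, y + side], fill=color)
        return out
    if kind == "jpeg":
        buf = io.BytesIO()                            # in-memory JPEG round trip
        img.convert("RGB").save(buf, format="JPEG", quality=50)
        buf.seek(0)
        return Image.open(buf)
    raise ValueError(f"unknown attack: {kind}")
```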
6.2. Comparison and Generalization Tests
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Vaccari, C.; Chadwick, A. Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Soc. Media + Soc. 2020, 6, 2056305120903408.
- Guarnera, L.; Giudice, O.; Battiato, S. DeepFake Detection by Analyzing Convolutional Traces. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 666–667.
- Guarnera, L.; Giudice, O.; Nastasi, C.; Battiato, S. Preliminary Forensics Analysis of DeepFake Images. In Proceedings of the 2020 AEIT International Annual Conference (AEIT), Catania, Italy, 23–25 September 2020; pp. 1–6.
- Zhang, X.; Karaman, S.; Chang, S.F. Detecting and simulating artifacts in GAN fake images. In Proceedings of the 2019 IEEE International Workshop on Information Forensics and Security (WIFS), Delft, The Netherlands, 9–12 December 2019; pp. 1–6.
- Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
- Xu, K.; Qin, M.; Sun, F.; Wang, Y.; Chen, Y.K.; Ren, F. Learning in the frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1740–1749.
- Xu, Z.Q.J.; Zhang, Y.; Xiao, Y. Training behavior of deep neural network in frequency domain. In Neural Information Processing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 264–274.
- Yin, D.; Gontijo Lopes, R.; Shlens, J.; Cubuk, E.D.; Gilmer, J. A Fourier Perspective on Model Robustness in Computer Vision. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Volume 32, pp. 13276–13286.
- Rahaman, N.; Baratin, A.; Arpit, D.; Draxler, F.; Lin, M.; Hamprecht, F.; Bengio, Y.; Courville, A. On the spectral bias of neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5301–5310.
- Farinella, G.M.; Ravì, D.; Tomaselli, V.; Guarnera, M.; Battiato, S. Representing scenes for real-time context classification on mobile devices. Pattern Recognit. 2015, 48, 1086–1100.
- Ravì, D.; Bober, M.; Farinella, G.M.; Guarnera, M.; Battiato, S. Semantic segmentation of images exploiting DCT based features and random forest. Pattern Recognit. 2016, 52, 260–273.
- Lam, E.Y.; Goodman, J.W. A mathematical analysis of the DCT coefficient distributions for images. IEEE Trans. Image Process. 2000, 9, 1661–1666.
- Tolosana, R.; Vera-Rodriguez, R.; Fierrez, J.; Morales, A.; Ortega-Garcia, J. Deepfakes and beyond: A survey of face manipulation and fake detection. arXiv 2020, arXiv:2001.00179.
- Verdoliva, L. Media Forensics and DeepFakes: An overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797.
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410.
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8110–8119.
- He, Z.; Zuo, W.; Kan, M.; Shan, S.; Chen, X. AttGAN: Facial attribute editing by only changing what you want. IEEE Trans. Image Process. 2019, 28, 5464–5478.
- Cho, W.; Choi, S.; Park, D.K.; Shin, I.; Choo, J. Image-to-image translation via group-wise deep whitening-and-coloring transformation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10639–10647.
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738.
- Langner, O.; Dotsch, R.; Bijlstra, G.; Wigboldus, D.H.; Hawk, S.T.; Van Knippenberg, A. Presentation and validation of the Radboud Faces Database. Cogn. Emot. 2010, 24, 1377–1388.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Lee, H.Y.; Tseng, H.Y.; Huang, J.B.; Singh, M.; Yang, M.H. Diverse image-to-image translation via disentangled representations. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 35–51.
- Wilber, M.J.; Fang, C.; Jin, H.; Hertzmann, A.; Collomosse, J.; Belongie, S. BAM! The Behance Artistic Media dataset for recognition beyond photography. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 1202–1211.
- Durall, R.; Keuper, M.; Pfreundt, F.J.; Keuper, J. Unmasking deepfakes with simple features. arXiv 2019, arXiv:1911.00686.
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196.
- Wang, R.; Ma, L.; Juefei-Xu, F.; Xie, X.; Wang, J.; Liu, Y. FakeSpotter: A simple baseline for spotting AI-synthesized fake faces. arXiv 2019, arXiv:1909.06122.
- Liu, M.; Ding, Y.; Xia, M.; Liu, X.; Ding, E.; Zuo, W.; Wen, S. STGAN: A unified selective transfer network for arbitrary image attribute editing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3673–3682.
- Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. FaceForensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27–28 October 2019; pp. 1–11.
- Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3207–3216.
- Jain, A.; Majumdar, P.; Singh, R.; Vatsa, M. Detecting GANs and Retouching based Digital Alterations via DAD-HCNN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 672–673.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544.
- Liu, Z.; Qi, X.; Torr, P.H. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8060–8069.
- Hulzebosch, N.; Ibrahimi, S.; Worring, M. Detecting CNN-Generated Facial Images in Real-World Scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 642–643.
- Guarnera, L.; Giudice, O.; Battiato, S. Fighting Deepfake by Exposing the Convolutional Traces on Images. IEEE Access 2020, 8, 165085–165098.
- Moon, T.K. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60.
- Jing, X.Y.; Zhang, D. A face and palmprint recognition approach based on discriminant DCT feature extraction. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2004, 34, 2405–2415.
- Thai, T.H.; Retraint, F.; Cogranne, R. Camera model identification based on DCT coefficient statistics. Digit. Signal Process. 2015, 40, 88–100.
- Lam, E.Y. Analysis of the DCT coefficient distributions for document coding. IEEE Signal Process. Lett. 2004, 11, 97–100.
- King, D.E. Dlib-ml: A Machine Learning Toolkit. J. Mach. Learn. Res. 2009, 10, 1755–1758.
- Wang, S.Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot… for now. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; Volume 7.
- Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene parsing through ADE20K dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 633–641.
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
- Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2Face: Real-time face capture and reenactment of RGB videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2387–2395.
Per-class precision/recall/F1 (P/R/F1, in %) and overall accuracy (%) under the tested manipulations:

| Manipulation | Real (P/R/F1) | AttGAN (P/R/F1) | GDWCT (P/R/F1) | StarGAN (P/R/F1) | StyleGAN (P/R/F1) | StyleGAN2 (P/R/F1) | Overall Acc. |
|---|---|---|---|---|---|---|---|
| Raw images | 99/97/98 | 99/100/99 | 98/98/98 | 99/100/100 | 99/98/99 | 98/100/99 | 99 |
| Random square | 98/94/96 | 90/96/93 | 92/89/91 | 100/98/99 | 98/99/98 | 99/99/99 | 96 |
| Gaussian filter | 98/95/96 | 83/88/86 | 89/92/91 | 92/86/89 | 97/98/98 | 99/99/99 | 93 |
|  | 98/99/98 | 62/59/60 | 70/79/74 | 59/53/56 | 99/98/99 | 98/99/98 | 81 |
|  | 100/97/98 | 58/64/61 | 72/64/68 | 55/53/54 | 98/99/98 | 95/100/97 | 80 |
| Rotation 45° | 97/93/95 | 85/82/83 | 92/98/95 | 84/84/84 | 97/99/98 | 99/98/98 | 92 |
| Rotation 90° | 98/99/98 | 95/99/97 | 98/93/95 | 100/99/99 | 99/98/98 | 99/99/99 | 98 |
| Rotation 135° | 95/96/96 | 85/83/84 | 97/94/96 | 83/86/85 | 96/95/96 | 96/97/97 | 92 |
| Rotation 180° | 98/94/96 | 95/100/97 | 97/95/96 | 99/100/99 | 98/99/99 | 100/99/99 | 98 |
| Rotation 225° | 96/95/95 | 88/85/87 | 96/96/96 | 86/89/88 | 96/97/97 | 97/97/97 | 93 |
| Mirror (horizontal) | 99/96/98 | 99/100/99 | 98/99/98 | 99/100/99 | 99/99/99 | 100/100/100 | 99 |
| Mirror (vertical) | 99/96/98 | 99/100/99 | 97/99/98 | 99/100/100 | 99/99/99 | 100/100/100 | 99 |
| Mirror (both) | 99/94/97 | 98/100/99 | 97/99/98 | 99/100/100 | 99/99/99 | 100/100/100 | 99 |
| Scaling +50% | 99/98/99 | 94/95/95 | 95/93/94 | 98/99/99 | 99/99/99 | 99/100/99 | 97 |
| Scaling −50% | 74/95/84 | 77/66/71 | 74/72/73 | 81/77/79 | 82/85/84 | 90/81/85 | 80 |
| JPEG (quality 1) | 78/69/73 | 63/65/64 | 59/67/63 | 59/57/58 | 78/83/80 | 84/80/82 | 70 |
| JPEG (quality 50) | 93/95/94 | 98/99/98 | 87/80/83 | 84/89/86 | 88/88/88 | 90/89/89 | 90 |
| JPEG (quality 100) | 99/99/99 | 100/99/99 | 98/98/98 | 99/100/99 | 99/99/99 | 99/99/99 | 99 |
Comparison with state-of-the-art methods (values in %):

| Method | StarGAN | StyleGAN | StyleGAN2 |
|---|---|---|---|
| AutoGAN [5] | 65.6 | 79.5 | 72.5 |
| FakeSpotter [28] | 88 | 99.1 | 91.9 |
| EM [38] | 90.55 | 99.48 | 99.64 |
| CTF (ours) | 99.9 | 100 | 100 |