Detect with Style: A Contrastive Learning Framework for Detecting Computer-Generated Images
Abstract
1. Introduction
- A novel CNN-based framework, abbreviated as CoStNet, is designed to discriminate between computer-generated images (CGIs) and natural images (NIs). To the best of the authors’ knowledge, this is the first attempt to conduct such discrimination based on supervised contrastive learning and style transfer on the benchmark DSTok, Rahmouni, and LSCGB datasets.
- A complementary style transfer module, operating in real time, is employed to augment the set of training CGIs even when only a limited number of training samples is available, thus enhancing the training procedure.
- CoStNet achieves state-of-the-art accuracy on the benchmark DSTok, Rahmouni, and LSCGB datasets, underscoring a remarkable advancement in the field.
- The generalization capability of CoStNet is evaluated by training on the LSCGB dataset and testing on the DSTok dataset. Additionally, CoStNet is trained on the DSTok dataset and tested on Rahmouni’s dataset to assess its broader applicability.
- The proposed framework is robust against salt-and-pepper and Gaussian noise at various corruption levels.
- Multiple tests are conducted to empirically demonstrate that CoStNet is less sensitive to modifications of the training parameters, such as the number of training epochs and the batch size.
- An ablation study is performed to assess the impact of the style transfer module when limited training samples are available.
- Hypothesis testing confirms that the improvements in detection accuracy between CoStNet and methods reported in the literature are statistically significant.
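The supervised contrastive objective underlying CoStNet (Khosla et al. [26]) pulls together the representations of samples that share a class label (here, CGI vs. NI) and pushes apart all others. The following is a minimal pure-Python sketch of that loss over L2-normalised feature vectors; the temperature value is an illustrative assumption, not the paper’s setting:

```python
import math

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss (Khosla et al., 2020), reference sketch.

    `features` is a list of feature vectors, `labels` the matching class
    labels. Each anchor attracts its same-label positives and repels the
    rest via a softmax over cosine similarities."""
    n = len(features)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def normalise(v):
        norm = math.sqrt(dot(v, v))
        return [a / norm for a in v]

    z = [normalise(f) for f in features]
    loss = 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        # numerically stable log-sum-exp over all samples except the anchor
        logits = [dot(z[i], z[a]) / temperature for a in range(n) if a != i]
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # mean negative log-likelihood of the anchor's positives
        loss += -sum(dot(z[i], z[p]) / temperature - log_denom
                     for p in positives) / len(positives)
    return loss / n
```

For instance, four 2-D features forming two tight same-label clusters yield a near-zero loss, while the same features with mismatched labels inflate it, which is the behaviour the detector exploits to separate CGIs from NIs in feature space.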
2. Related Work
3. Proposed Framework
3.1. Framework Overview
3.2. Style Transfer Learning
3.3. Supervised Contrastive Learning
4. Datasets
- DSTok dataset [27]: The DSTok dataset comprises 4850 CGIs and 4850 NIs sourced from the Internet. The NIs cover diverse indoor and outdoor landscapes captured by various devices, while the CGIs exhibit photorealistic qualities. The collection contains high-resolution images of varying sizes and shows significant inter-class diversity. These characteristics make the DSTok dataset a pivotal resource for research on CGI detection and explain its prominence in the literature.
- Rahmouni’s dataset [24]: Rahmouni’s dataset consists of 1800 high-resolution CGIs downloaded from the Level-Design Reference Database [43]. These CGIs were captured from photorealistic video games (i.e., Uncharted 4, Battlefield Bad Company 2, The Witcher 3, Battlefield 4, and Grand Theft Auto 5); only these five video games were deemed to exhibit a sufficient level of photorealism. Another 1800 high-resolution NIs were obtained from the RAISE dataset [44], which comprises a diverse array of settings, including outdoor and indoor scenes such as monuments, houses, landscapes, people’s bodies and faces, and forests.
- LSCGB dataset [28]: It is one of the most recent datasets, and it is an order of magnitude larger than the preceding ones. It consists of 71,168 CGIs and 71,168 NIs and is characterized by high diversity and small bias in the distributions of color, tone, brightness, and saturation.
- He’s dataset [22]: He’s dataset consists of 6800 CGIs downloaded from the Internet, created with a variety of rendering software packages, such as Maya and AutoCAD. Another 6800 NIs, captured under various indoor and outdoor conditions, were included in the dataset. All images are stored in JPEG format, and their sizes vary.
- Columbia dataset [45]: The Columbia dataset consists of four sets of 800 images, for a total of 3200 images. It contains 800 NIs captured with professional single-lens reflex Canon 10D and Nikon cameras; these images exhibit content diversity regarding indoor and outdoor scenes, various lighting conditions, etc. Another 800 NIs were retrieved from the Internet via Google Image Search using keywords matching the CGI set’s categories. A total of 800 CGIs, created with various rendering software packages and categorized by content (e.g., nature, objects, architecture), were downloaded from the Internet. The remaining 800 CGIs were recaptured from a monitor displaying the previous set of 800 CGIs.
Dataset | # of CGIs | # of NIs | CGI Sources | NI Sources | Year |
---|---|---|---|---|---|
DSTok [27] | 4850 | 4850 | 3D models | Photo-sharing websites | 2013 |
Rahmouni [24] | 1800 | 1800 | Video games | Existing benchmarks | 2017 |
LSCGB [28] | 71,168 | 71,168 | Models, games, movies, GANs | Existing benchmarks, movies, photo-sharing websites | 2020 |
He [22] | 6800 | 6800 | 3D models | Personal collection | 2018 |
Columbia [45] | 1600 | 1600 | 3D models | Personal collection, Google Image Search | 2005 |
5. Experimental Evaluation
5.1. Experimental Setup and Augmentations
5.2. Evaluation Results on the Benchmark Datasets
5.3. Parameters’ Assessment and Generalization Ability
5.4. Robustness Capability
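This section evaluates robustness against salt-and-pepper and Gaussian noise at various corruption levels. As an illustrative sketch only (the corruption amounts and noise standard deviations are assumptions, not the paper’s settings), such corruptions can be applied to a flat list of pixel intensities in [0, 1] as follows:

```python
import random

def add_salt_and_pepper(pixels, amount, seed=0):
    """Set a fraction `amount` of the pixels to 0.0 (pepper) or 1.0 (salt).

    `pixels` is a flat list of intensities in [0, 1]; a seeded RNG keeps
    the corruption reproducible across runs."""
    rng = random.Random(seed)
    noisy = list(pixels)
    for idx in rng.sample(range(len(noisy)), int(amount * len(noisy))):
        noisy[idx] = rng.choice([0.0, 1.0])
    return noisy

def add_gaussian(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation `sigma`,
    clipping the result back into [0, 1]."""
    rng = random.Random(seed)
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]
```

A robustness sweep then simply re-evaluates detection accuracy on test images corrupted at increasing `amount` and `sigma` values.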
5.5. Impact of Style Transfer (Ablation Study)
5.6. Statistical Significance
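This section reports hypothesis testing confirming that the accuracy gains over prior methods are statistically significant; the excerpt does not state which test was applied. As an illustrative assumption (not necessarily the authors’ choice), the exact binomial McNemar test is a common way to compare two classifiers evaluated on the same test images:

```python
import math

def mcnemar_exact(b, c):
    """Exact (binomial) McNemar test for paired classifier comparison.

    b = test images only classifier A labels correctly,
    c = test images only classifier B labels correctly.
    Under the null hypothesis the discordant pairs follow
    Binomial(b + c, 0.5); returns the two-sided p-value."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: the classifiers are indistinguishable
    k = min(b, c)
    # one-sided tail probability, then doubled and capped at 1
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, if one detector uniquely corrects 9 of 10 discordant test images, the resulting p-value falls below 0.05, supporting a claim of significant improvement.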
5.7. Real-World Forensic Applications
6. Conclusions, Limitations, and Future Directions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Autodesk A360 Rendering Gallery. Available online: https://gallery.autodesk.com/a360rendering/ (accessed on 24 January 2020).
- Artlantis Gallery. Available online: https://artlantis.com/en/gallery/ (accessed on 24 January 2020).
- Learn VRay. Available online: https://www.learnvray.com/fotogallery/ (accessed on 24 January 2020).
- Corona Renderer Gallery. Available online: https://corona-renderer.com/gallery (accessed on 24 January 2020).
- Yang, P.; Baracchi, D.; Ni, R.; Zhao, Y.; Argenti, F.; Piva, A. A survey of deep learning-based source image forensics. J. Imaging 2020, 6, 9. [Google Scholar] [CrossRef]
- Mazumdar, A.; Bora, P.K. Siamese convolutional neural network-based approach towards universal image forensics. IET Image Process. 2020, 14, 3105–3116. [Google Scholar] [CrossRef]
- Goel, N.; Kaur, S.; Bala, R. Dual branch convolutional neural network for copy move forgery detection. IET Image Process. 2021, 15, 656–665. [Google Scholar] [CrossRef]
- Rhee, K.H. Detection of spliced image forensics using texture analysis of median filter residual. IEEE Access 2020, 8, 103374–103384. [Google Scholar] [CrossRef]
- Chang, H.; Yeh, C. Face anti-spoofing detection based on multi-scale image quality assessment. Image Vis. Comput. 2022, 121, 104428. [Google Scholar] [CrossRef]
- Matern, F.; Riess, C.; Stamminger, M. Gradient-based illumination description for image forgery detection. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1303–1317. [Google Scholar] [CrossRef]
- Chen, J.; Liao, X.; Qin, Z. Identifying tampering operations in image operator chains based on decision fusion. Signal Process. Image Commun. 2021, 95, 116287. [Google Scholar] [CrossRef]
- Zhang, X.; Sun, Z.; Karaman, S.; Chang, S. Discovering image manipulation history by pairwise relation and forensics tools. IEEE J. Sel. Top. Signal Process. 2020, 14, 1012–1023. [Google Scholar] [CrossRef]
- Carvalho, T.; Faria, F.; Pedrini, H.; Torres, R.; Rocha, A. Illuminant-based transformed spaces for image forensics. IEEE Trans. Inf. Forensics Secur. 2015, 11, 720–733. [Google Scholar] [CrossRef]
- Wang, J.; Li, T.; Shi, Y.; Lian, S.; Ye, J. Forensics feature analysis in quaternion wavelet domain for distinguishing photographic images and computer graphics. Multim. Tools Appl. 2017, 76, 23721–23737. [Google Scholar] [CrossRef]
- Peng, F.; Zhou, D.; Long, M.; Sun, X. Discrimination of natural images and computer generated graphics based on multi-fractal and regression analysis. AEU Int. J. Electron. Commun. 2017, 71, 72–81. [Google Scholar] [CrossRef]
- Ng, T.; Chang, S.; Hsu, J.; Xie, L.; Tsui, M. Physics-motivated features for distinguishing photographic images and computer graphics. In Proceedings of the 13th Annual ACM International Conference on Multimedia, Singapore, 28 November–30 December 2005; pp. 239–248. [Google Scholar]
- Chen, W.; Shi, Y.Q.; Xuan, G. Identifying computer graphics using HSV color model and statistical moments of characteristic functions. In Proceedings of the IEEE International Conference on Multimedia and Expo, Beijing, China, 2–5 July 2007; pp. 1123–1126. [Google Scholar]
- Zhang, R.; Wang, R.; Ng, T. Distinguishing photographic images and photorealistic computer graphics using visual vocabulary on local image edges. In Digital Watermarking Techniques in Curvelet and Ridgelet Domain; Springer: Cham, Switzerland, 2011; pp. 292–305. [Google Scholar]
- Yao, Y.; Zhang, Z.; Ni, X.; Shen, Z.; Chen, L.; Xu, D. CGNet: Detecting computer-generated images based on transfer learning with attention module. Signal Process. Image Commun. 2022, 105, 116692. [Google Scholar] [CrossRef]
- Quan, W.; Wang, K.; Yan, D.M.; Zhang, X.; Pellerin, D. Learn with diversity and from harder samples: Improving the generalization of CNN-Based detection of computer-generated images. Forensic Sci. Int. Digit. Investig. 2020, 35, 301023. [Google Scholar] [CrossRef]
- Zhang, R.; Quan, W.; Fan, L.; Hu, L.; Yan, D. Distinguishing computer-generated images from natural images using channel and pixel correlation. J. Comput. Sci. Technol. 2020, 35, 592–602. [Google Scholar] [CrossRef]
- He, P.; Jiang, X.; Sun, T.; Li, H. Computer graphics identification combining convolutional and recurrent neural networks. IEEE Signal Process. Lett. 2018, 25, 1369–1373. [Google Scholar] [CrossRef]
- De Rezende, E.R.; Ruppert, G.C.; Theophilo, A.; Tokuda, E.K.; Carvalho, T. Exposing computer generated images by using deep convolutional neural networks. Signal Process. Image Commun. 2018, 66, 113–126. [Google Scholar] [CrossRef]
- Rahmouni, N.; Nozick, V.; Yamagishi, J.; Echizen, I. Distinguishing computer graphics from natural images using convolution neural networks. In Proceedings of the IEEE Workshop on Information Forensics and Security (WIFS), Rennes, France, 4–7 December 2017; pp. 1–6. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 18661–18673. [Google Scholar]
- Tokuda, E.; Pedrini, H.; Rocha, A. Computer generated images vs. digital photographs: A synergetic feature and classifier combination approach. J. Visual Commun. Image Repres. 2013, 24, 1276–1292. [Google Scholar] [CrossRef]
- Bai, W.; Zhang, Z.; Li, B.; Wang, P.; Li, Y.; Zhang, C.; Hu, W. Robust texture-aware computer-generated image forensic: Benchmark and algorithm. IEEE Trans. Image Process. 2021, 30, 8439–8453. [Google Scholar] [CrossRef]
- Lyu, S.; Farid, H. How realistic is photorealistic? IEEE Trans. Signal Process. 2005, 53, 845–850. [Google Scholar] [CrossRef]
- Yao, Y.; Hu, W.; Zhang, W.; Wu, T.; Shi, Y. Distinguishing computer-generated graphics from natural images based on sensor pattern noise and deep learning. Sensors 2018, 18, 1296. [Google Scholar] [CrossRef]
- Quan, W.; Wang, K.; Yan, D.; Zhang, X. Distinguishing between natural and computer-generated images using convolutional neural networks. IEEE Trans. Inf. Forensics Secur. 2018, 13, 2772–2787. [Google Scholar] [CrossRef]
- Tariang, D.B.; Sengupta, P.; Roy, A.; Chakraborty, R.S.; Naskar, R. Classification of Computer Generated and Natural Images based on Efficient Deep Convolutional Recurrent Attention Model. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 146–152. [Google Scholar]
- He, P.; Li, H.; Wang, H.; Zhang, R. Detection of computer graphics using attention-based dual-branch convolutional neural network from fused color components. Sensors 2020, 20, 4743. [Google Scholar] [CrossRef]
- Meena, K.B.; Tyagi, V. Methods to distinguish photorealistic computer generated images from photographic images: A review. In Proceedings of the Advances and Applications in Computer Science, Electronics and Industrial Engineering, Ghaziabad, India, 12–13 April 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 64–82. [Google Scholar]
- Ni, X.; Chen, L.; Yuan, L.; Wu, G.; Yao, Y. An evaluation of deep learning-based computer generated image detection approaches. IEEE Access 2019, 7, 130830–130840. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Luo, X.; Han, Z.; Yang, L. Progressive Attentional Manifold Alignment for Arbitrary Style Transfer. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 3206–3222. [Google Scholar]
- Izmailov, P.; Podoprikhin, D.; Garipov, T.; Vetrov, D.; Wilson, A. Averaging weights leads to wider optima and better generalization. arXiv 2018, arXiv:1803.05407. [Google Scholar]
- Manning, C.; Raghavan, P.; Schütze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar]
- Kolkin, N.; Salavon, J.; Shakhnarovich, G. Style transfer by relaxed optimal transport and self-similarity. In Proceedings of the Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10051–10060. [Google Scholar]
- Qiu, T.; Ni, B.; Liu, Z.; Chen, X. Fast optimal transport artistic style transfer. In Proceedings of the 27th International Conference on Multimedia Modeling, Prague, Czech Republic, 22–24 June 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 37–49. [Google Scholar]
- Afifi, M.; Brubaker, M.; Brown, M. Histogan: Controlling colors of gan-generated and real images via color histograms. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 7941–7950. [Google Scholar]
- Piaskiewicz, M. Level-Design Reference Database. 2017. Available online: http://level-design.org/referencedb (accessed on 24 January 2020).
- Dang-Nguyen, D.; Pasquini, C.; Conotter, V.; Boato, G. RAISE: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, Portland, OR, USA, 18–20 March 2015; pp. 219–224. [Google Scholar]
- Ng, T.; Chang, S.; Hsu, J.; Pepeljugoski, M. Columbia Photographic Images and Photorealistic Computer Graphics Dataset; ADVENT Technical Report 205-2004; Columbia University: New York, NY, USA, 2005. [Google Scholar]
- Amari, S. A theory of adaptive pattern classifiers. IEEE Trans. Electron. Comput. 1967, 3, 299–307. [Google Scholar] [CrossRef]
- Meena, K.B.; Tyagi, V. Distinguishing computer-generated images from photographic images using two-stream convolutional neural network. Appl. Soft Comput. 2021, 100, 107025. [Google Scholar] [CrossRef]
- Nguyen, H.; Tieu, T.; Nguyen-Son, H.; Nozick, V.; Yamagishi, J.; Echizen, I. Modular convolutional neural network for discriminating between computer-generated images and photographic images. In Proceedings of the 13th International Conference on Availability, Reliability and Security, Hamburg, Germany, 27–30 August 2018; pp. 1–10. [Google Scholar]
- Huang, R.; Fang, F.; Nguyen, H.; Yamagishi, J.; Echizen, I. A method for identifying origin of digital images using a convolutional neural network. In Proceedings of the IEEE 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Auckland, New Zealand, 7–10 December 2020; pp. 1293–1299. [Google Scholar]
- Gando, G.; Yamada, T.; Sato, H.; Oyama, S.; Kurihara, M. Fine-tuning deep convolutional neural networks for distinguishing illustrations from photographs. Expert Syst. Appl. 2016, 66, 295–301. [Google Scholar] [CrossRef]
- Chawla, C.; Panwar, D.; Anand, G.S.; Bhatia, M. Classification of computer generated images from photographic images using convolutional neural networks. In Proceedings of the IEEE 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Bangalore, India, 19–22 September 2018; pp. 1053–1057. [Google Scholar]
- Guyon, I.; Makhoul, J.; Schwartz, R.; Vapnik, V. What size test set gives good error rate estimates? IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 52–64. [Google Scholar] [CrossRef]
Details | Configuration |
---|---|
CPU | i9-7900X @ 3.3 GHz |
GPU | RTX 2080 Ti |
RAM | 126 GB |
Epochs | 10 | 20 | 30 | 50 | 60 | 70 | 100 |
---|---|---|---|---|---|---|---|
Accuracy | | | | | | | |
Batch size | 20 | 30 | 50 | 60 | 70 | 100 | 250 |
---|---|---|---|---|---|---|---|
Accuracy | | | | | | | |
Algorithms | Rahmouni’s Dataset |
---|---|
Rahmouni [24] | |
Quan [31] | |
Yao [30] | |
Gando [50] | |
De Rezende [23] | |
He [22] | |
Zhang [21] | |
Quan [20] | |
Yao [19] | |
CoStNet (Proposed) |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Karantaidis, G.; Kotropoulos, C. Detect with Style: A Contrastive Learning Framework for Detecting Computer-Generated Images. Information 2024, 15, 158. https://doi.org/10.3390/info15030158