Detecting Browser Drive-By Exploits in Images Using Deep Learning
Abstract
1. Introduction
2. Background
2.1. Steganography
- Based on the spatial domain. These methods rely on the statistics of the image and create a hidden channel using a replacement method. The replacement can be applied sequentially, e.g., using the least significant bits (LSB) in order, or following a generated sequence, for instance, LSB with a Fermat or Fibonacci number generator (https://stegano.readthedocs.io/en/latest/software.html#the-command-stegano-lsb-set (accessed on 15 July 2021)).
- Based on the frequency domain. These methods spread the data over the frequency domain of the signal. Almost all robust steganography methods work in the frequency domain. Some examples are the F5 algorithm (based on the Discrete Cosine Transform (DCT)), OutGuess (https://www.rbcafe.es/software/outguess/ (accessed on 1 August 2022)), YASS (https://github.com/logasja/yass-js (accessed on 1 August 2022)), etc. These methods are more robust than LSB, although their capacity is still limited by the number of least significant bits available in an image.
- Based on spread spectrum image steganography (SSIS). These methods modulate a narrow-band signal over the carrier.
- Based on machine learning algorithms [8].
- Manual insertion of the code at random positions in the image, etc.
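The two spatial-domain variants in the list above (sequential LSB and LSB driven by a number generator) can be illustrated with a minimal sketch. It works on a flat list of 8-bit samples rather than a real image file, and the helper names (`sequential_positions`, `fibonacci_positions`, `embed`, `extract`) are hypothetical; they are not the API of the Stegano tool referenced above, whose generators may select positions differently.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def sequential_positions() -> Iterator[int]:
    """Yield 0, 1, 2, ...: every sample is used in order."""
    i = 0
    while True:
        yield i
        i += 1


def fibonacci_positions() -> Iterator[int]:
    """Yield the strictly increasing Fibonacci numbers 1, 2, 3, 5, 8, ..."""
    a, b = 1, 2
    while True:
        yield a
        a, b = b, a + b


def to_bits(payload: bytes) -> List[int]:
    """Expand bytes into a flat list of bits, most significant bit first."""
    return [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]


def from_bits(bits: List[int]) -> bytes:
    """Pack a flat list of bits (most significant bit first) back into bytes."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed(samples: List[int], payload: bytes, positions: Iterable[int]) -> List[int]:
    """Overwrite the least significant bit of the selected samples."""
    stego = list(samples)
    for bit, pos in zip(to_bits(payload), positions):
        if pos >= len(stego):
            raise ValueError("cover is too small for this payload and generator")
        stego[pos] = (stego[pos] & ~1) | bit
    return stego


def extract(samples: List[int], n_bytes: int, positions: Iterable[int]) -> bytes:
    """Read back n_bytes from the least significant bits at the same positions."""
    bits = [samples[pos] & 1 for pos in islice(positions, n_bytes * 8)]
    return from_bits(bits)


# Synthetic cover: 2000 samples in the range 0..255 (an assumption, not real image data).
cover = [(i * 37) % 256 for i in range(2000)]

stego_seq = embed(cover, b"alert(1)", sequential_positions())
stego_fib = embed(cover, b"JS", fibonacci_positions())

print(extract(stego_seq, 8, sequential_positions()))
print(extract(stego_fib, 2, fibonacci_positions()))
```

Because Fibonacci positions grow quickly, a generator-driven scheme of this kind spreads the payload sparsely and needs a much larger cover than the sequential variant for the same payload length.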
2.2. Steganalysis: Frameworks and Techniques
2.3. Polyglot Attacks with Steganography
2.4. Previous Work in Steganalysis and Deep Learning
3. Proposal for a Steganalysis Approach to Polyglot Detection
3.1. Description of the Approach
- The type and variety of objects displayed in the images. COCO was used as a source since its images show different types of objects. The ILSVRC dataset was also used as a source of images to increase variety and avoid possible biases.
- The number of polyglots embedded in the images. Several training runs were performed, varying the number of JavaScript polyglots and the number of infected images.
- The characteristics of the images. Training was conducted with greyscale or colour images. Polyglots were embedded before and after colour transformation for different tests.
- The homogeneity of the images (same size, orientation, etc.). Images were transformed with respect to size and orientation.
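The point about embedding before or after the colour transformation can be illustrated with a small sketch. It assumes a naive LSB embedding in the red channel and an integer approximation of the common 0.299/0.587/0.114 luma weights; both choices, the synthetic pixels, and the function names are illustrative assumptions, not the pipeline used in the experiments. A payload written after the greyscale conversion is read back unchanged, whereas a payload written before it is generally not preserved, because the grey value mixes all three channels.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_grey(pixel: Pixel) -> int:
    """Integer approximation of the 0.299 / 0.587 / 0.114 luma weights."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def embed_lsb(values: List[int], bits: List[int]) -> List[int]:
    """Overwrite the least significant bit of the first len(bits) values."""
    out = list(values)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit
    return out


def extract_lsb(values: List[int], count: int) -> List[int]:
    """Read the least significant bit of the first count values."""
    return [v & 1 for v in values[:count]]


def embed_then_convert(pixels: List[Pixel], bits: List[int]) -> List[int]:
    """Embed in the red channel of the colour image, then convert to greyscale."""
    reds = embed_lsb([p[0] for p in pixels], bits)
    stego_rgb = [(r, p[1], p[2]) for r, p in zip(reds, pixels)]
    return [to_grey(p) for p in stego_rgb]


def convert_then_embed(pixels: List[Pixel], bits: List[int]) -> List[int]:
    """Convert to greyscale first, then embed in the grey values."""
    return embed_lsb([to_grey(p) for p in pixels], bits)


# Synthetic 64-pixel colour image and a 32-bit payload (both assumptions).
pixels = [((i * 53) % 256, (i * 91) % 256, (i * 17) % 256) for i in range(64)]
bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4

survives_after = extract_lsb(convert_then_embed(pixels, bits), len(bits)) == bits
survives_before = extract_lsb(embed_then_convert(pixels, bits), len(bits)) == bits

print("payload intact when embedded after conversion: ", survives_after)
print("payload intact when embedded before conversion:", survives_before)
```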
3.2. Experimental Setup
- Watermelon dataset + LSB steganography (v0.1): Dataset of different watermelon images. It contained 1354 clean images and 1946 infected images with 1 polyglot.
- Watermelon dataset + LSB steganography (v0.2): Dataset of different watermelon images. It contained 1354 clean images and 1946 infected images with 20 polyglots.
- COCO Dataset + LSB steganography (v1): Using COCO as a source of images that contain a variety of items and situations, the LSB technique was used to create stego images with polyglots, resulting in a dataset with 37,000 clean images and 3000 infected images.
- COCO Dataset + LSB steganography + Image modifications (Resizing, Relocation, …) (v2): Using the dataset configured in (2), data augmentation and image resizing were performed.
- COCO Dataset + Gray Conversion + LSB Steganography (v3): Using the clean COCO dataset configured in (2) (40,000 images), 20,280 images were first converted to greyscale and polyglots were included in 1256 of these greyscale images using the LSB technique.
- COCO Dataset + LSB Steganography (v4): Using the images from COCO, the number of different polyglots embedded using the LSB technique was increased to 20 common and known structures in JavaScript. The dataset was also enlarged to 411,000 images, consisting of 328,000 clean images and 83,000 infected images.
- COCO Dataset + LSB Steganography (v5): As the previous dataset configuration could suggest overfitting, a new version of the training dataset was designed. Using images from the COCO dataset, the number of different polyglots embedded using the LSB technique was increased to 104 common and known structures in JavaScript. The number of clean images was reduced to 123,460 and the number of infected images to 31,000.
- COCO Dataset + ILSVRC dataset + LSB Steganography (v6): Using images from both the COCO dataset and the ILSVRC dataset [37], two datasets were generated, which contained 41,026 clean images and 8,313 embedded images in the first case and 205,130 clean images and 41,026 embedded images in the second. In both cases, stego images were embedded with 104 common structures of polyglots in JavaScript using the LSB technique.
- COCO Dataset + ILSVRC dataset + LSB Steganography + LSB Steganography using Fermat and Fibonacci generation (v7): Based on the v4 dataset, 33,347 images infected using the Fermat and Fibonacci generators were added. The final dataset is composed of 279,503 images, of which 41,026 are LSB-infected images and 33,347 are images infected using the Fibonacci and Fermat generators.
- COCO Dataset + ILSVRC dataset + LSB Steganography + LSB Steganography using Fermat and Fibonacci generation (v7) + F5 [35] (v8): Based on the v5 dataset, 33,347 images infected using the Fermat and Fibonacci generators and 621 F5 images were added. The final dataset is composed of 280,124 images, of which 41,026 are LSB-infected images, 621 are F5-infected images and 33,347 are images infected using the Fibonacci and Fermat generators.
- Local machine;
- Docker virtual machine based on TensorFlow without GPU;
- Google Colab with no hardware optimizations;
- Google Colab with GPU [39];
- Google Colab with TPU.
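The composition of the dataset versions listed above can be tabulated with a short sketch that recomputes totals and the share of infected images from the stated counts. The labels "small" and "large" for the two v6 datasets are hypothetical names introduced here, and the clean count used for v7 and v8 (205,130) is not stated directly in the text; it is inferred from the reported totals and should be read as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    name: str
    clean: int
    infected: int

    @property
    def total(self) -> int:
        return self.clean + self.infected

    @property
    def infected_share(self) -> float:
        return self.infected / self.total


VERSIONS = [
    DatasetVersion("Watermelon v0.1/v0.2", 1_354, 1_946),
    DatasetVersion("COCO v1", 37_000, 3_000),
    DatasetVersion("COCO v4", 328_000, 83_000),
    DatasetVersion("COCO v5", 123_460, 31_000),
    DatasetVersion("COCO+ILSVRC v6 (small)", 41_026, 8_313),
    DatasetVersion("COCO+ILSVRC v6 (large)", 205_130, 41_026),
    # Clean count below is inferred from the reported totals (assumption).
    DatasetVersion("COCO+ILSVRC v7", 205_130, 41_026 + 33_347),
    DatasetVersion("COCO+ILSVRC v8", 205_130, 41_026 + 33_347 + 621),
]

BY_NAME = {version.name: version for version in VERSIONS}

for version in VERSIONS:
    print(f"{version.name:<26} total={version.total:>7} "
          f"infected share={version.infected_share:.3f}")
```

Under that assumption, the recomputed totals for v4, v7 and v8 agree with the figures given in the list (411,000, 279,503 and 280,124 images).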
4. Results and Discussion
5. Conclusions and Outlook
Author Contributions
Funding
Conflicts of Interest
References
- Kadhim, I.J.; Premaratne, P.; Vial, P.J.; Halloran, B. Comprehensive survey of image steganography: Techniques, Evaluations, and trends in future research. Neurocomputing 2019, 335, 299–326.
- Muñoz, A. A Simple Steganalysis Tool. Available online: https://stegsecret.sourceforge.net/ (accessed on 1 October 2022).
- ENISA Threat Landscape 2020: Cyber Attacks Becoming More Sophisticated, Targeted, Widespread and Undetected. Available online: https://www.enisa.europa.eu/news/enisa-news/enisa-threat-landscape-2020 (accessed on 29 November 2021).
- Steganography in Attacks on Industrial Enterprises. Available online: https://ics-cert.kaspersky.com/reports/2020/06/17/steganography-in-attacks-on-industrial-enterprises (accessed on 29 November 2021).
- Jiang, C.; Pang, Y.; Xiong, S. A High Capacity Steganographic Method Based on Quantization Table Modification and F5 Algorithm. Circuits Syst. Signal Process. 2014, 33, 1611–1626.
- Kour, J.; Verma, D. Steganography Techniques—A Review Paper. Int. J. Emerg. Res. Manag. Technol. 2015, 3, 132–135.
- Malik, S.; Mitra, W. Hiding Information—A Survey. J. Inf. Sci. Comput. Technol. 2015, 3, 232–240.
- Cho, D.X.; Thuong, D.T.H.; Dung, N.K. A Method of Detecting Storage Based Network Steganography Using Machine Learning. Procedia Comput. Sci. 2019, 154, 543–548.
- Wang, J.; Cheng, M.; Wu, P.; Chen, B. A Survey on Digital Image Steganography. J. Inf. Hiding Priv. Prot. 2019, 1, 87–93.
- Jiao, S.; Zhou, C.; Shi, Y.; Zou, W.; Li, X. Review on Optical Image Hiding and Watermarking Techniques. Opt. Laser Technol. 2019, 109, 370–380.
- Luo, X.Y.; Wang, D.S.; Wang, P.; Liu, F.L. A review on blind detection for image steganography. Signal Process. 2008, 88, 2138–2157.
- Nissar, A.; Mir, A.H. Classification of Steganalysis Techniques: A Study. Digit. Signal Process. 2010, 20, 1758–1770.
- Karampidis, K.; Kavallieratou, E.; Papadourakis, G. A Review of Image Steganalysis Techniques for Digital Forensics. J. Inf. Secur. Appl. 2018, 40, 217–235.
- Bebloh—A Well-Known Banking Trojan with Noteworthy Innovations. Available online: https://www.gdatasoftware.com/blog/2013/12/23978-bebloh-a-well-known-banking-trojan-with-noteworthy-innovations (accessed on 1 October 2022).
- Ursnif. Available online: https://attack.mitre.org/software/S0386/ (accessed on 1 October 2022).
- Tabares-Soto, R.; Arteaga-Arteaga, H.; Mora-Rubio, A.; Bravo-Ortiz, M.A.; Arias-Garzón, D.; Grisales, J.A.A.; Jacome, A.B.; Orozco-Arias, S.; Isaza, G.; Pollan, R.R. Strategy to improve the accuracy of convolutional neural network architectures applied to digital image steganalysis in the spatial domain. J. Comput. Sci. 2021, 7, e45.
- Chaumont, M. Deep Learning in steganography and steganalysis from 2015 to 2018. In Digital Media Steganography: Principles, Algorithms, Advances; Hassaballah, M., Ed.; Elsevier: Amsterdam, The Netherlands, 2019.
- Shi, Y.Q.; Xuan, G.R.; Zou, D.K. Image steganalysis based on moments of characteristic functions using wavelet decomposition, prediction-error image, and neural network. In Proceedings of the IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, 6 July 2005; pp. 269–272.
- Lie, W.N.; Lin, G.S. A Feature-based classification technique for blind steganalysis. IEEE Trans. Multimed. 2005, 7, 1007–1020.
- Tan, S.; Li, B. Stacked Convolutional Auto-Encoders for Steganalysis of Digital Images. In Proceedings of the Signal and Information Processing Association Annual Summit and Conference, Siem Reap, Cambodia, 9–12 December 2014.
- Qian, Y.; Dong, J.; Wang, W.; Tan, T. Deep Learning for Steganalysis via Convolutional Neural Networks. In Proceedings of the Media Watermarking, Security, and Forensics, San Francisco, CA, USA, 8–12 February 2015; Volume 9404.
- Jin, B.; Cruz, L.; Goncalves, N. Deep Facial Diagnosis: Deep Transfer Learning From Face Recognition to Facial Diagnosis. IEEE Access 2020, 8, 123649–123661.
- Boroumand, M.; Chen, M.; Fridrich, J. Deep Residual Network for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1181–1193.
- Li, B.; Wei, W.; Ferreira, A.; Tan, S. ReST-Net: Diverse Activation Modules and Parallel Subnets-Based CNN for Spatial Image Steganalysis. IEEE Signal Process. Lett. 2018, 25, 650–654.
- Xu, G. Deep Convolutional Neural Network to Detect J-UNIWARD. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, Philadelphia, PA, USA, 20–22 June 2017; pp. 63–67.
- Shi, H.; Dong, J.; Wang, W.; Qian, Y.; Zhang, X. SSGAN: Secure Steganography Based on Generative Adversarial Networks. In Lecture Notes in Computer Science, Proceedings of the 18th Pacific-Rim Conference on Multimedia, Harbin, China, 28–29 September 2017; Springer: Cham, Switzerland, 2017; pp. 534–544.
- Tang, W.; Tan, S.; Li, B.; Huang, J. Automatic Steganographic Distortion Learning Using Generative Adversarial Networks. IEEE Signal Process. Lett. 2017, 24, 1547–1551.
- Yasrab, R. SRNET: A shallow skip connection based convolutional neural network design for resolving singularities. J. Comput. Sci. Technol. 2019, 34, 924–938.
- Reinel, T.S.; Brayan, A.A.H.; Alej, B.O.M.; Alej, M.R.; Daniel, A.G.; Alej, A.G.J.; Buenaventura, B.-J.A.; Simon, O.-A.; Gustavo, I.; Raúl, R.-P. GBRAS-Net: A convolutional neural network architecture for spatial image steganalysis. IEEE Access 2021, 9, 14340–14350.
- Wu, L.; Han, X.; Wen, C.; Li, B. A Steganalysis framework based on CNN using the filter subset selection method. Multimed. Tools Appl. 2020, 79, 19875–19892.
- Zheng, Q.; Yang, M.; Yang, J.; Zhang, Q.; Zhang, X. Improvement of Generalization Ability of Deep CNN via Implicit Regularization in Two-Stage Training Process. IEEE Access 2018, 6, 15844–15869.
- Liu, Y.; Dou, Y.; Qiao, P. Beyond top-N accuracy indicator: A comprehensive evaluation indicator of CNN models in image classification. IET Comput. Vis. 2020, 14, 407–414.
- Zhao, M.; Chang, C.H.; Xie, W.; Xie, Z.; Hu, J. Cloud shape classification system based on multi-channel cnn and improved fdm. IEEE Access 2020, 8, 44111–44124.
- Jin, B.; Cruz, L.; Goncalves, N. Pseudo RGB-Face Recognition. IEEE Sens. J. 2022, 22, 21780–21794.
- Newman, J.; Lin, L.; Chen, W.; Reinders, S.; Wang, Y.; Wu, M.; Guan, Y. StegoAppDB: A steganography apps forensics image database. Electron. Imaging 2019, 2019, 536.
- Lin, T.Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common Objects in Context. In Lecture Notes in Computer Science, Proceedings of Computer Vision—ECCV 2014, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. (IJCV) 2015, 115, 211–252.
- A Collection of JavaScript Engine CVEs with PoCs. Available online: https://github.com/tunz/js-vuln-db (accessed on 29 November 2021).
- Zhao, M.; Jha, A.; Liu, Q.; Millis, B.; Mahadevan-Jansen, A.; Lu, L.; Landman, B.; Tyska, M.J.; Huo, Y. Faster Mean-shift: GPU-accelerated clustering for cosine embedding-based cell segmentation and tracking. Med. Image Anal. 2021, 17, 102048.
Dataset | Model | Number of Polyglots | Type of Stego | Val Accuracy |
---|---|---|---|---|
Watermelon (v0.1) | 1 | 1 | LSB | 0.9672 |
Watermelon (v0.2) | 1 | 20 | LSB | 0.561 |
Coco RGB (v1) | 1 | 1 | LSB | 0.9507 |
Coco RGB (v2) | 2 | 20 | LSB | 0.9543 |
Coco Gray (v3) | 2 | 20 | LSB | 0.9399 |
Coco RGB (v5) | 2 | 104 | LSB | 0.9739 |
Coco RGB (v3+v4) | 2 | 20 | LSB + Gray | 0.0915 |
Coco+ILSVR (v6) | 2 | 104 | LSB | 1 |
Coco+ILSVR (v7) | 2 | 104 | LSB | 0.9521 |
Coco+ILSVR+F5 (v8) | 2 | NA | LSB, F5 | 0.9861 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Iglesias, P.; Sicilia, M.-A.; García-Barriocanal, E. Detecting Browser Drive-By Exploits in Images Using Deep Learning. Electronics 2023, 12, 473. https://doi.org/10.3390/electronics12030473