Getting Ahead of the Arms Race: Hothousing the Coevolution of VirusTotal with a Packer
Abstract
1. Introduction
- We introduce a learning method that shows white hats where they are more vulnerable, helping them anticipate black hats (Section 2).
- We instantiate this framework for information-theoretic malware detection by evolving a polymorphic engine, EEE, and coevolving it against VirusTotal (Section 3).
- We demonstrate that artificial evolution via a genetic algorithm can drive down the median detection rate of state-of-the-art malware detectors in the context of a coevolutionary detection arms race (Section 4).
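To make the third contribution concrete, the following is a minimal sketch of a genetic algorithm that minimises a detection rate. It is not EEE: the bit-string genome, the synthetic `detection_rate` function, and all parameter values are assumptions chosen for illustration, and no real detector or VirusTotal interface is involved.

```python
import random


def detection_rate(genome):
    """Toy stand-in for a detector ensemble (hypothetical, not VirusTotal).

    Each gene set to 1 "triggers" one synthetic detector, so the value is
    the fraction of detectors that flag this candidate.
    """
    return sum(genome) / len(genome)


def evolve(pop_size=20, genome_len=16, generations=30, mutation_rate=0.05, seed=0):
    """Minimise detection_rate with truncation selection, one-point
    crossover, and bit-flip mutation. Returns (best genome, history)."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    history = []  # best detection rate at the start of each generation
    for _ in range(generations):
        population.sort(key=detection_rate)
        history.append(detection_rate(population[0]))
        # Keep the less-detected half unchanged (elitism).
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)
            child = a[:cut] + b[cut:]  # one-point crossover
            # Flip each bit independently with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    best = min(population, key=detection_rate)
    return best, history
```

Because the surviving parents are carried over unchanged, the best detection rate in `history` never increases from one generation to the next. In a real setting the fitness evaluation would be the expensive step, since each candidate has to be scored by an external detector.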
2. Hothousing the Coevolution
3. EEE: The Evolutionary Packer
3.1. Evolving UPX to EEE
3.2. EEE Protections
3.3. EEE Final Version
4. Experiments
- How much does this modified version of UPX reduce the detection rate for real antivirus tools? (Section 4.1)
- How quickly does VirusTotal learn from the programs submitted to it? Because we also detected false positives, we additionally ask: how quickly does VirusTotal recover from false positives? (Section 4.2)
- How much does automated search using a genetic algorithm allow EEE to reduce VirusTotal’s detection rate? (Section 4.3)
- Which antivirus tools are more resistant and what are they detecting? (Section 4.4)
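The questions above are phrased in terms of a detection rate. As a minimal sketch, assuming each scan report is a mapping from engine name to a boolean verdict (a hypothetical format, not VirusTotal's actual report schema), the per-sample rate and the median over a set of samples could be computed as follows. The engine names and verdicts are made up.

```python
from statistics import median


def detection_rate(verdicts):
    """Fraction of engines that flagged one sample.

    `verdicts` maps an engine name to True (flagged) or False (clean).
    """
    if not verdicts:
        raise ValueError("no engine verdicts for this sample")
    return sum(verdicts.values()) / len(verdicts)


def median_detection_rate(reports):
    """Median of the per-sample detection rates over a list of reports."""
    return median(detection_rate(v) for v in reports)


# Hypothetical reports for three samples scanned by four engines.
reports = [
    {"engine_a": True, "engine_b": True, "engine_c": False, "engine_d": False},
    {"engine_a": True, "engine_b": False, "engine_c": False, "engine_d": False},
    {"engine_a": True, "engine_b": True, "engine_c": True, "engine_d": True},
]
```

The median is less sensitive than the mean to a few samples that nearly every engine flags or misses.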
4.1. Initial Steps
4.2. VirusTotal Is a Fast Learner
4.3. Automatically Adapting to VirusTotal
4.4. VirusTotal’s Resilience in Depth
5. Related Work
5.1. Coevolutionary and Adversary Learning Models
5.2. Entropy in Malware Detection
6. Conclusions
7. Availability
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Version | Compress | Encrypt | CERs | Search | Protect | Points |
---|---|---|---|---|---|---|
UPX | Yes | - | - | - | - | B |
MUPX-I | Yes | XOR | - | - | - | C |
MUPX-II | Yes | XOR | Fixed | - | - | D-E |
MUPX-III | Yes | XOR | Variable | Random | Delay | F-G |
EEE | Yes | XOR | Variable | Evolutionary | Delay & Oli | H |
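The table lists compression and XOR encryption among the packer features, and the paper's setting is information-theoretic detection. The sketch below illustrates how these two transformations affect byte-level Shannon entropy. It rests on several assumptions: `zlib` stands in for the packer's compressor purely for illustration, the XOR uses a single-byte key (the key scheme is not stated in this section), and the input is synthetic data rather than an executable.

```python
import math
import random
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with the same single-byte key (illustrative assumption)."""
    return bytes(b ^ key for b in data)


# Synthetic low-entropy input: 4000 symbols drawn from an 8-letter alphabet,
# so its byte entropy cannot exceed log2(8) = 3 bits per byte.
rng = random.Random(0)
plain = bytes(rng.choice(b"abcdefgh") for _ in range(4000))

compressed = zlib.compress(plain)  # stand-in for the packer's compressor
xored = xor_bytes(plain, 0x5A)     # single-byte XOR with an arbitrary key

h_plain = shannon_entropy(plain)
h_compressed = shannon_entropy(compressed)
h_xored = shannon_entropy(xored)
```

A single-byte XOR only relabels byte values, so it permutes the histogram and leaves `h_xored` equal to `h_plain`, whereas compression removes redundancy and pushes `h_compressed` well above `h_plain`.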
| | Median | Mean | Min | Max |
---|---|---|---|---|
Malware | 392 | 717 | 4 | 17,972 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Menéndez, H.D.; Clark, D.; T. Barr, E. Getting Ahead of the Arms Race: Hothousing the Coevolution of VirusTotal with a Packer. Entropy 2021, 23, 395. https://doi.org/10.3390/e23040395