
Detecting Malware with Information Complexity

Computer Science Department, University College London, London WC1E 6BT, UK
Computer Science Department, Middlesex University London, London NW4 4BG, UK
Entropy 2020, 22(5), 575;
Received: 20 April 2020 / Revised: 15 May 2020 / Accepted: 16 May 2020 / Published: 20 May 2020
(This article belongs to the Section Multidisciplinary Applications)
Malware concealment is the predominant strategy for malware propagation. Black hats create variants of malware based on polymorphism and metamorphism. Malware variants, by definition, share some information. Although the concealment strategy alters this information, patterns still remain in the software. Given a zoo of labelled malware and benign-ware, we ask whether a suspect program is more similar to our malware or to our benign-ware. Normalized Compression Distance (NCD) is a generic metric that measures the shared information content of two strings. This measure opens a new front in the malware arms race, one where the countermeasures promise to be more costly for malware writers, who must now obfuscate patterns in their variants as strings qua strings, without reference to execution. Our approach classifies disk-resident malware with 97.4% accuracy and a false positive rate of 3%. We demonstrate that its accuracy can be improved by combining NCD with the compressibility rates of executables using decision forests, paving the way for future improvements. We demonstrate that malware reported within a narrow time frame of a few days is more homogeneous than malware reported over two years, but that our method still classifies the latter with 95.2% accuracy and a 5% false positive rate. Because it relies on compression, the time and computational cost of our method are nontrivial. We show that simple approximation techniques can improve its running time by up to 63%. We compare our results with those obtained by applying the 59 anti-malware programs used on the VirusTotal website to our malware. Our approach outperforms each of these programs used alone and matches their performance when all of them are used collectively.
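A minimal Python sketch of the idea, under stated assumptions, follows. NCD is conventionally defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(s) is the length of s after compression by a real-world compressor and xy is the concatenation of x and y. The sketch assumes LZMA (via Python's standard lzma module) as the compressor, a k-nearest-neighbour vote as the decision rule, and compressed size divided by original size as the "compressibility rate". The helper names compressed_size, compression_rate, ncd and classify are hypothetical, and none of these choices is claimed to be the authors' exact configuration.

```python
import lzma
from collections import Counter
from typing import Iterable, Tuple


def compressed_size(data: bytes) -> int:
    """C(s): length of `data` after compression.

    Assumption: LZMA in the legacy .lzma container (FORMAT_ALONE), chosen for
    its small header and large dictionary; the paper's compressor may differ.
    """
    return len(lzma.compress(data, format=lzma.FORMAT_ALONE))


def compression_rate(data: bytes) -> float:
    """Hypothetical compressibility feature: compressed size / original size."""
    if not data:
        raise ValueError("cannot compute the rate of an empty input")
    return compressed_size(data) / len(data)


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(suspect: bytes, zoo: Iterable[Tuple[bytes, str]], k: int = 1) -> str:
    """Label `suspect` by majority vote over its k nearest zoo samples under NCD.

    Assumption: a k-nearest-neighbour rule. Ties go to the label of the
    nearest sample, because Counter keeps first-encountered (distance) order.
    """
    ranked = sorted((ncd(suspect, sample), label) for sample, label in zoo)
    if not ranked:
        raise ValueError("the zoo of labelled samples is empty")
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy byte strings standing in for executables; not real malware.
    zoo = [
        (b"evil payload " * 50 + b"A", "malware"),
        (b"hello benign world " * 50, "benign"),
    ]
    suspect = b"evil payload " * 50 + b"B"
    print(classify(suspect, zoo), round(ncd(suspect, zoo[0][0]), 3))
```

Values of NCD near 0 indicate that two inputs share most of their information, and values near 1 indicate that they share little. The abstract reports combining NCD with compressibility rates in decision forests and speeding the method up with approximation techniques; neither step is reproduced in this sketch.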
Keywords: information theory; Kolmogorov complexity; normalized compression distance; malware detection


MDPI and ACS Style

Alshahwan, N.; Barr, E.T.; Clark, D.; Danezis, G.; Menéndez, H.D. Detecting Malware with Information Complexity. Entropy 2020, 22, 575.

AMA Style

Alshahwan N, Barr ET, Clark D, Danezis G, Menéndez HD. Detecting Malware with Information Complexity. Entropy. 2020; 22(5):575.

Chicago/Turabian Style

Alshahwan, Nadia, Earl T. Barr, David Clark, George Danezis, and Héctor D. Menéndez. 2020. "Detecting Malware with Information Complexity" Entropy 22, no. 5: 575.


