Open Access Article

An Improved Bytewise Approximate Matching Algorithm Suitable for Files of Dissimilar Sizes

Institute of Physical and Information Technologies (ITEFI), Spanish National Research Council (CSIC), Serrano 144, 28034 Madrid, Spain
* Author to whom correspondence should be addressed.
Mathematics 2020, 8(4), 503; https://doi.org/10.3390/math8040503
Received: 21 February 2020 / Revised: 30 March 2020 / Accepted: 30 March 2020 / Published: 2 April 2020
(This article belongs to the Special Issue Evolutionary Computation & Swarm Intelligence)
The goal of digital forensics is to recover and investigate pieces of data found on digital devices and, in the process, to analyse their relationship with other fragments of data from the same device or from different ones. Approximate matching functions, also called similarity-preserving or fuzzy hashing functions, pursue that goal by comparing files and determining how similar they are. In this regard, ssdeep, sdhash, and LZJD are currently among the best-known functions dealing with this problem. However, although these applications are useful and trustworthy, they also have important limitations. Mainly, ssdeep and LZJD cannot compare files of very different sizes, sdhash and LZJD produce excessively large signatures, and, with all three applications, the comparison score occasionally bears little relation to the actual content of the files. In this article, we propose a new signature generation procedure and an algorithm for comparing two files through their digital signatures. Although our design is based on ssdeep, it overcomes some of its limitations and satisfies the requirements that approximate matching applications should fulfil. A set of ad-hoc and standard tests based on the FRASH framework shows that the proposed algorithm has remarkable overall detection strengths and is suitable for comparing files of very different sizes. A full description of the multi-thread implementation of the algorithm is included, along with all the tests used to compare this proposal with ssdeep, sdhash, and LZJD.
Keywords: approximate matching; context-triggered piecewise hashing; edit distance; fuzzy hashing; LZJD; multi-thread programming; sdhash; signatures; similarity detection; ssdeep
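The abstract describes ssdeep-style fuzzy hashing only in outline. As a rough illustration of the general idea behind context-triggered piecewise hashing (CTPH) and edit-distance comparison, the Python sketch below splits the input wherever a rolling hash over the last few bytes hits a trigger value, emits one Base64 character per chunk, and scores two signatures with a normalised Levenshtein distance. It is explicitly not the signature procedure or comparison algorithm proposed in this article. The function names, the additive rolling hash, the 7-byte window, the fixed `block_size` of 64, the FNV-1a chunk hash, and the 0 to 100 scoring formula are all simplifying assumptions. ssdeep's own rolling hash and scoring differ.

```python
"""Simplified context-triggered piecewise hashing (CTPH) sketch.

Hypothetical illustration: NOT ssdeep's exact algorithm and NOT the
signature or comparison algorithm proposed in the article.
"""
import random

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # assumed rolling-hash window length, in bytes


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash of one chunk."""
    h = 0x811C9DC5
    for b in data:
        h ^= b
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def ctph_signature(data: bytes, block_size: int = 64) -> str:
    """Cut data where a rolling sum hits the trigger value; one Base64 char per chunk."""
    sig = []
    start = 0
    rolling = 0  # sum of the last WINDOW bytes (simplified rolling hash)
    for i, b in enumerate(data):
        rolling += b
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if rolling % block_size == block_size - 1:  # context trigger
            sig.append(B64[fnv1a_32(data[start:i + 1]) % 64])
            start = i + 1
    if start < len(data):  # trailing chunk that ended without a trigger
        sig.append(B64[fnv1a_32(data[start:]) % 64])
    return "".join(sig)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (unit-cost insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(sig_a: str, sig_b: str) -> int:
    """Score in [0, 100]: 100 minus the length-normalised edit distance."""
    if not sig_a and not sig_b:
        return 100
    d = edit_distance(sig_a, sig_b)
    return round(100 * (1 - d / max(len(sig_a), len(sig_b))))


if __name__ == "__main__":
    random.seed(1)
    original = bytes(random.randrange(256) for _ in range(20000))
    modified = original[:10000] + b"INSERTED TEXT" + original[10000:]
    unrelated = bytes(random.randrange(256) for _ in range(20000))
    s1, s2, s3 = (ctph_signature(x) for x in (original, modified, unrelated))
    print("modified: ", similarity(s1, s2))
    print("unrelated:", similarity(s1, s3))
```

Because chunk boundaries depend only on local content, inserting a few bytes in the middle of a file changes only the signature characters around the insertion point, so the two signatures stay close. This sketch uses a fixed `block_size`. In ssdeep, the block size is derived from the file length, and only signatures with equal or adjacent block sizes can be compared. That is one reason files of very different sizes are hard to match.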

MDPI and ACS Style

Gayoso Martínez, V.; Hernández-Álvarez, F.; Hernández Encinas, L. An Improved Bytewise Approximate Matching Algorithm Suitable for Files of Dissimilar Sizes. Mathematics 2020, 8, 503.
