Open Access Article

A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models

1 Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
2 Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
3 Department of Virology, University of Helsinki, 00100 Helsinki, Finland
* Author to whom correspondence should be addressed.
Entropy 2019, 21(11), 1074; https://doi.org/10.3390/e21111074
Received: 24 September 2019 / Revised: 25 October 2019 / Accepted: 31 October 2019 / Published: 2 November 2019
The development of efficient data compressors for DNA sequences is crucial not only for reducing storage and transmission bandwidth, but also for analysis purposes. In particular, the development of improved compression models directly influences the outcome of anthropological and biomedical compression-based methods. In this paper, we describe a new lossless compressor with improved compression capabilities for DNA sequences representing different domains and kingdoms. The reference-free method uses a competitive prediction model to estimate, for each symbol, the best class of models to be used before applying arithmetic encoding. There are two classes of models: weighted context models (including substitutional tolerant context models) and weighted stochastic repeat models. Both classes of models use specific sub-programs to handle inverted repeats efficiently. The results show that the proposed method attains a higher compression ratio than state-of-the-art approaches, on a balanced and diverse benchmark, using a competitive level of computational resources. An efficient implementation of the method is publicly available, under the GPLv3 license.
Keywords: lossless data compression; DNA sequences; competitive prediction; weighted models; context models; stochastic repeat models
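
The competitive scheme summarized in the abstract can be illustrated with a toy estimator. The sketch below is not the authors' implementation. The class names, model orders, hit probability, decay factor, and weight-update rule are all assumptions chosen for illustration. It reports an ideal code length (-log2 p) in place of arithmetic coding, selects the class by recent coding cost rather than by a learned predictor, and omits inverted repeats and substitution tolerance.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

class ContextModel:
    """Order-k context model: symbol counts per preceding k-mer, additively smoothed."""
    def __init__(self, order, alpha=1.0):
        self.order, self.alpha = order, alpha
        self.counts = defaultdict(lambda: [0] * 4)

    def _ctx(self, history):
        return history[max(0, len(history) - self.order):]

    def predict(self, history):
        c = self.counts[self._ctx(history)]
        total = sum(c) + 4 * self.alpha
        return [(n + self.alpha) / total for n in c]

    def update(self, history, symbol):
        self.counts[self._ctx(history)][ALPHABET.index(symbol)] += 1

class RepeatModel:
    """Toy repeat model: bets on the symbol that followed the last copy of the current k-mer."""
    def __init__(self, k, hit_prob=0.9):
        self.k, self.hit_prob = k, hit_prob
        self.last_pos = {}  # k-mer -> index of the symbol that followed it

    def predict(self, history):
        if len(history) >= self.k:
            pos = self.last_pos.get(history[-self.k:])
            if pos is not None:
                miss = (1.0 - self.hit_prob) / 3
                return [self.hit_prob if s == history[pos] else miss for s in ALPHABET]
        return [0.25] * 4  # no repeat found: uniform guess

    def update(self, history, symbol):
        if len(history) >= self.k:
            self.last_pos[history[-self.k:]] = len(history)

class WeightedClass:
    """Mixture of models; weights follow each model's recent success (assumed rule)."""
    def __init__(self, models, gamma=0.9):
        self.models, self.gamma = models, gamma
        self.weights = [1.0 / len(models)] * len(models)
        self.last = []

    def predict(self, history):
        self.last = [m.predict(history) for m in self.models]
        return [sum(w * p[s] for w, p in zip(self.weights, self.last)) for s in range(4)]

    def update(self, history, symbol):
        idx = ALPHABET.index(symbol)
        raw = [(w ** self.gamma) * p[idx] for w, p in zip(self.weights, self.last)]
        total = sum(raw)
        self.weights = [r / total for r in raw]
        for m in self.models:
            m.update(history, symbol)

def estimate_bits(seq, decay=0.95):
    """Estimated code length when, per symbol, the recently cheaper class is used."""
    classes = [WeightedClass([ContextModel(1), ContextModel(4)]),
               WeightedClass([RepeatModel(6), RepeatModel(10)])]
    recent_cost = [0.0, 0.0]  # decayed cost per class, computable by a decoder too
    total_bits, history = 0.0, ""
    for symbol in seq:
        idx = ALPHABET.index(symbol)
        probs = [c.predict(history) for c in classes]
        chosen = 0 if recent_cost[0] <= recent_cost[1] else 1
        total_bits += -math.log2(probs[chosen][idx])
        for i, c in enumerate(classes):
            recent_cost[i] = decay * recent_cost[i] - math.log2(probs[i][idx])
            c.update(history, symbol)
        history += symbol
    return total_bits
```

Because the class choice depends only on symbols already processed, a decoder could repeat the same choice without side information. On a highly repetitive input such as `"ACGTTGCA" * 50`, `estimate_bits` should come out well below the 2 bits per base of a uniform code.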
MDPI and ACS Style

Pratas, D.; Hosseini, M.; Silva, J.M.; Pinho, A.J. A Reference-Free Lossless Compression Algorithm for DNA Sequences Using a Competitive Prediction of Two Classes of Weighted Models. Entropy 2019, 21, 1074.

