Open Access Article
Entropy 2018, 20(6), 393; https://doi.org/10.3390/e20060393

Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes

1 Institute of Electronics and Informatics Engineering of Aveiro, University of Aveiro, 3810-193 Aveiro, Portugal
2 Department of Medical Sciences and Institute for Biomedicine - iBiMED, University of Aveiro, 3810-193 Aveiro, Portugal
3 Department of Electronics, Telecommunications and Informatics, University of Aveiro, 3810-193 Aveiro, Portugal
* Author to whom correspondence should be addressed.
Received: 3 March 2018 / Revised: 16 May 2018 / Accepted: 21 May 2018 / Published: 23 May 2018
(This article belongs to the Special Issue Entropy-based Data Mining)

Abstract

An efficient DNA compressor provides an approximation for measuring and comparing the quantities of information present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we directly compare two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions: the NCD measures how similar two strings are (in terms of information content), whereas the NRC (which, in general, is nonsymmetric) indicates the fraction of one string that cannot be constructed using information from the other. This raises the problem of deciding which measure (or question) is more suitable for the answer we need. To compute both, we use a state-of-the-art DNA sequence compressor that we benchmark against some top compressors in different compression modes. Then, we apply the compressor to DNA sequences of different scales and natures, first to synthetic sequences and then to real DNA sequences. The latter include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.
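The difference between the two measures can be illustrated with a minimal Python sketch. It is not the compressor or the implementation used in the paper. It assumes zlib as a general-purpose stand-in compressor for the NCD, using the standard definition (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}. For the NRC it assumes a hypothetical fixed order-k context model built only from y, and normalizes the bits needed for x by |x| log2 of the alphabet size. The function names and the parameters `k` and `alpha` are illustrative assumptions.

```python
import math
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression (a stand-in for C(.), not a DNA compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance; C(x, y) is approximated by compressing x + y."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nrc(x: str, y: str, k: int = 8, alpha: float = 1 / 16,
        alphabet: str = "ACGT") -> float:
    """Bits to encode x with a frozen order-k model of y, over |x| * log2(|alphabet|).

    Assumption: a simple smoothed context model replaces a real relative compressor.
    """
    context_totals = Counter()
    symbol_counts = Counter()
    # Learn counts exclusively from the reference sequence y.
    for i in range(k, len(y)):
        context = y[i - k:i]
        context_totals[context] += 1
        symbol_counts[context, y[i]] += 1

    size = len(alphabet)
    bits = 0.0
    for i, symbol in enumerate(x):
        context = x[max(0, i - k):i]
        # Additive smoothing keeps every probability positive.
        # The model is never updated with x, so x cannot "explain itself".
        p = (symbol_counts[context, symbol] + alpha) / (
            context_totals[context] + alpha * size)
        bits -= math.log2(p)
    return bits / (len(x) * math.log2(size))
```

With this sketch, `ncd(x, y)` and `ncd(y, x)` give nearly the same value, whereas `nrc(x, y)` and `nrc(y, x)` can differ substantially: if x is a fragment of y, x is almost fully explained by y, but most of y is not explained by x. Because of the smoothing, this toy NRC can slightly exceed 1 for unrelated sequences.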
Keywords: data compression; NCD; NRC; DNA sequences; primate evolution
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).
MDPI and ACS Style

Pratas, D.; Silva, R.M.; Pinho, A.J. Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes. Entropy 2018, 20, 393.

Entropy, EISSN 1099-4300, published by MDPI AG, Basel, Switzerland.