Entropy 2013, 15(9), 3435-3448; doi:10.3390/e15093435
Article

Bacterial DNA Sequence Compression Models Using Artificial Neural Networks

1 Instituto de Telecomunicações / Departamento de Electrónica, Telecomunicações e Informática, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
2 Instituto de Engenharia Electrónica e Telemática de Aveiro / Departamento de Electrónica, Telecomunicações e Informática, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
* Author to whom correspondence should be addressed.
Received: 22 May 2013; in revised form: 2 August 2013 / Accepted: 27 August 2013 / Published: 30 August 2013
Abstract: It is widely accepted that advances in DNA sequencing techniques have contributed to an unprecedented growth of genomic data. This has increased interest in DNA compression, not only from the information-theory and biology points of view, but also from a practical perspective, since such sequences require substantial storage resources. Several compression methods exist; in particular, those using finite-context models (FCMs) have received increasing attention, as they have been shown to compress DNA sequences effectively, achieving low bits-per-base as well as low encoding/decoding time per base. However, the run-time memory required to store high-order finite-context models may become impractical: a context order as low as 16 requires up to 17.2 × 10⁹ memory entries. This paper presents a method to reduce this memory requirement through a novel application of artificial neural networks (ANNs) that builds such probabilistic models compactly, and shows how to use them to estimate the probabilities. The system was implemented and its performance compared against state-of-the-art compressors, such as XM-DNA (expert model) and FCM-Mx (mixture of finite-context models), as well as against general-purpose compressors. Using a combination of an order-10 FCM and an ANN, encoding results similar to those of FCMs of order up to 16 are obtained using only 17 megabytes of memory, whereas the latter, even when employing hash tables, require several hundred megabytes.
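To make the memory argument concrete, the following is a minimal Python sketch of an adaptive order-k finite-context model over the four-letter DNA alphabet. A dense order-k table holds 4^k contexts × 4 symbol counters = 4^(k+1) entries, so order 16 gives 4^17 ≈ 17.2 × 10⁹ entries. The sketch assumes an additive estimator P(s | c) = (n(s, c) + α) / (n(c) + 4α) with α = 1, and charges the first k bases 2 bits each; these choices, and the names `FiniteContextModel`, `table_entries` and `bits_per_base`, are hypothetical illustrations rather than the paper's implementation. It does not model the authors' ANN, and the dictionary of counts is only a simple stand-in for the hash tables mentioned above.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def table_entries(order: int) -> int:
    """Counters in a dense order-k table: 4**k contexts x 4 symbols = 4**(k+1)."""
    return len(ALPHABET) ** order * len(ALPHABET)


class FiniteContextModel:
    """Adaptive order-k FCM with an additive (Laplace-style) estimator.

    Hypothetical sketch: only contexts that actually occur get an entry in
    the dict (a simple stand-in for a hash table); this is not the paper's ANN.
    """

    def __init__(self, order: int, alpha: float = 1.0):
        self.order = order
        self.alpha = alpha
        # context string -> list of 4 symbol counts
        self.counts = defaultdict(lambda: [0] * len(ALPHABET))

    def probability(self, context: str, base: str) -> float:
        # dict.get does not create entries, so querying unseen contexts costs no memory
        c = self.counts.get(context, (0, 0, 0, 0))
        total = sum(c)
        return (c[INDEX[base]] + self.alpha) / (total + self.alpha * len(ALPHABET))

    def update(self, context: str, base: str) -> None:
        self.counts[context][INDEX[base]] += 1


def bits_per_base(seq: str, order: int, alpha: float = 1.0) -> float:
    """Ideal adaptive code length in bits per base (no actual arithmetic coder)."""
    if len(seq) == 0:
        return 0.0
    model = FiniteContextModel(order, alpha)
    head = min(order, len(seq))
    bits = 2.0 * head  # assumption: the first k bases lack a full context, 2 bits each
    for i in range(head, len(seq)):
        context, base = seq[i - order:i], seq[i]
        bits -= math.log2(model.probability(context, base))
        model.update(context, base)
    return bits / len(seq)
```

For example, `table_entries(16)` returns 17179869184 (about 17.2 × 10⁹), while `table_entries(10)` returns 4194304; at an assumed 4 bytes per counter, a dense order-10 table would occupy about 16.8 × 10⁶ bytes. On a highly repetitive input such as `"ACGT" * 1000`, `bits_per_base(seq, 4)` falls far below 2 bits per base, whereas on uniformly random bases it stays close to 2.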
Keywords: compression; finite-context models; Markov models; neural nets


Cite This Article

MDPI and ACS Style

Duarte, M.J.; Pinho, A.J. Bacterial DNA Sequence Compression Models Using Artificial Neural Networks. Entropy 2013, 15, 3435-3448.

AMA Style

Duarte MJ, Pinho AJ. Bacterial DNA Sequence Compression Models Using Artificial Neural Networks. Entropy. 2013; 15(9):3435-3448.

Chicago/Turabian Style

Duarte, Manuel J.; Pinho, Armando J. 2013. "Bacterial DNA Sequence Compression Models Using Artificial Neural Networks." Entropy 15, no. 9: 3435-3448.

Entropy EISSN 1099-4300 Published by MDPI AG, Basel, Switzerland