Open Access Article

A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme

School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430070, China
* Author to whom correspondence should be addressed.
Algorithms 2020, 13(4), 99; https://doi.org/10.3390/a13040099
Received: 24 March 2020 / Revised: 16 April 2020 / Accepted: 17 April 2020 / Published: 20 April 2020
(This article belongs to the Special Issue Data Compression Algorithms and their Applications)
With the rapid evolution of DNA sequencing technology, a massive amount of genomic data, mainly DNA sequences, is produced every day, demanding ever more storage and bandwidth. Managing, analyzing and, in particular, storing these large amounts of data has become a major scientific challenge for bioinformatics, and compression has become necessary to meet it. In this paper, we describe a new reference-free DNA compressor, abbreviated as DNAC-SBE. DNAC-SBE is a lossless hybrid compressor that consists of three phases. First, starting from the most frequent base (Bi), the positions of each Bi are replaced with ones and the positions of the other bases that have smaller frequencies than Bi are replaced with zeros. Second, to encode the generated streams, we propose a new single-block encoding scheme (SBE) that exploits the positions of neighboring bits within each block using two different techniques. Finally, the proposed algorithm dynamically assigns the shorter-length code to each block. Results show that DNAC-SBE outperforms state-of-the-art compressors and proves its efficiency in terms of special conditions imposed on compressed data, storage space and data transfer rate, regardless of the file format or the size of the data.
Keywords: DNA sequence; storage; lossless compression; FASTA files; FASTQ files; binary encoding scheme; single-block encoding scheme; compression ratio
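
The first phase described in the abstract (turning a sequence into per-base bit streams, most frequent base first) can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `binary_streams` and `rebuild` are hypothetical, and both the alphabetical tie-breaking rule and the choice to emit no stream for the least frequent base are assumptions. The single-block encoding of the resulting streams is not covered.

```python
from collections import Counter


def binary_streams(seq):
    """Split seq into one bit stream per base, most frequent base first.

    Assumption: once a base has been encoded, its positions are dropped,
    so each later stream only covers bases with smaller frequencies.
    """
    counts = Counter(seq)
    # Descending frequency; ties broken alphabetically (an assumption).
    order = sorted(counts, key=lambda b: (-counts[b], b))
    streams = []
    remaining = seq
    # The least frequent base needs no stream: it is whatever is left over.
    for base in order[:-1]:
        streams.append("".join("1" if c == base else "0" for c in remaining))
        remaining = "".join(c for c in remaining if c != base)
    return order, streams


def rebuild(order, streams, length):
    """Invert binary_streams, showing that the transformation is lossless."""
    if not order:
        return ""
    # Zeros in the last stream are exactly the occurrences of the last base.
    tail_len = streams[-1].count("0") if streams else length
    remaining = order[-1] * tail_len
    # Re-insert the bases from least frequent to most frequent.
    for base, bits in zip(reversed(order[:-1]), reversed(streams)):
        rest = iter(remaining)
        remaining = "".join(base if bit == "1" else next(rest) for bit in bits)
    return remaining


order, streams = binary_streams("ACGTAACCA")
print(order)    # ['A', 'C', 'G', 'T']
print(streams)  # ['100011001', '10011', '10']
print(rebuild(order, streams, 9))  # ACGTAACCA
```

Under this reading, each successive stream is shorter than the previous one, because the positions of already-encoded bases are no longer represented.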
MDPI and ACS Style

Mansouri, D.; Yuan, X.; Saidani, A. A New Lossless DNA Compression Algorithm Based on A Single-Block Encoding Scheme. Algorithms 2020, 13, 99.

