Open Access Article
Algorithms 2010, 3(2), 145-167; doi:10.3390/a3020145

Suffix-Sorting via Shannon-Fano-Elias Codes

Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV 26506-6109, USA
* Author to whom correspondence should be addressed.
Received: 23 December 2009 / Revised: 18 March 2010 / Accepted: 18 March 2010 / Published: 1 April 2010
(This article belongs to the Special Issue Data Compression)

Abstract

Given a sequence $T = t_0 t_1 \ldots t_{n-1}$ of size $n = |T|$, with symbols from a fixed alphabet $\Sigma$ ($|\Sigma| \le n$), the suffix array provides a listing of all the suffixes of $T$ in lexicographic order. Given $T$, the suffix sorting problem is to construct its suffix array. The direct suffix sorting problem is to construct the suffix array of $T$ directly, without using the suffix tree data structure. While algorithms for linear-time, linear-space direct suffix sorting have been proposed, the actual constant in the linear space is still a major concern, given that the applications of suffix trees and suffix arrays (such as in whole-genome analysis) often involve huge data sets. In this work, we reduce the gap between current results and the minimal space requirement. We introduce an algorithm for the direct suffix sorting problem with worst-case time complexity in $O(n)$, requiring only $(1\tfrac{2}{3}\, n \log n - n \log |\Sigma| + O(1))$ bits of memory. This implies a total space requirement of $5\tfrac{2}{3}\, n + O(1)$ bytes (including space for both the output suffix array and the input sequence $T$), assuming $n \le 2^{32}$, $|\Sigma| \le 256$, and 4 bytes per integer. The basis of our algorithm is an extension of Shannon-Fano-Elias codes used in source coding and information theory. This is the first time information-theoretic methods have been used as the basis for solving the suffix sorting problem.
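The step from the bit bound to the byte bound stated in the abstract can be checked with a short calculation. This is a reader's sketch, not taken from the paper: it assumes logarithms are base 2 and substitutes the limiting values $\log_2 n = 32$ (one 4-byte integer) and $\log_2 |\Sigma| = 8$ (one byte per symbol).

```latex
\begin{align*}
1\tfrac{2}{3}\, n \log_2 n - n \log_2 |\Sigma| + O(1)
  &= \tfrac{5}{3}\, n \cdot 32 - n \cdot 8 + O(1) \ \text{bits} \\
  &= \tfrac{160}{3}\, n - \tfrac{24}{3}\, n + O(1) \ \text{bits} \\
  &= \tfrac{136}{3}\, n + O(1) \ \text{bits} \\
  &= \tfrac{17}{3}\, n + O(1) \ \text{bytes}
   = 5\tfrac{2}{3}\, n + O(1) \ \text{bytes}.
\end{align*}
```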
Keywords: suffix sorting; suffix arrays; suffix tree; Shannon-Fano-Elias codes; source coding
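To make the definition of a suffix array in the abstract concrete, here is a minimal Python sketch that builds one by sorting the suffixes with ordinary string comparison. It is only an illustration of the problem statement, not the Shannon-Fano-Elias-based algorithm of the article: it runs in roughly $O(n^2 \log n)$ time in the worst case, copies every suffix, and assumes no end-of-string sentinel. The function name `naive_suffix_array` is hypothetical.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`,
    listed in lexicographic order of the suffixes.

    Illustration only: comparison-sorts explicit suffix copies,
    so it is far from the linear time and space discussed in the abstract.
    """
    # Sort the indices 0..n-1 using the suffix starting at each index as the key.
    # Python compares strings lexicographically by code point.
    return sorted(range(len(text)), key=lambda i: text[i:])


# Suffixes of "banana" in sorted order:
# a (5), ana (3), anana (1), banana (0), na (4), nana (2)
print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

The output array stores one integer per suffix, which is why the abstract counts 4 bytes per entry when $n \le 2^{32}$.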
This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Adjeroh, D.; Nan, F. Suffix-Sorting via Shannon-Fano-Elias Codes. Algorithms 2010, 3, 145-167.

Algorithms EISSN 1999-4893 Published by MDPI AG, Basel, Switzerland