Algorithms 2009, 2(3), 1105-1136; doi:10.3390/a2031105

Approximate String Matching with Compressed Indexes

L.M.S. Russo 1,3,*, G. Navarro 2, A.L. Oliveira, P. Morales
1 INESC-ID, R. Alves Redol 9, 1000 Lisboa, Portugal
2 Department of Computer Science, University of Chile, Avenida Blanco Encalada, 2120, 837-0459 Santiago, Chile
3 CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal
4 Instituto Superior Técnico, Universidade Técnica de Lisboa, Av. Rovisco Pais, 1, 1049-001 Lisboa, Portugal
* Author to whom correspondence should be addressed.
Received: 9 July 2009 / Revised: 8 September 2009 / Accepted: 9 September 2009 / Published: 10 September 2009
(This article belongs to the Special Issue Data Compression)


A compressed full-text self-index for a text T is a data structure that requires reduced space and can search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest in compressed indexes, there has not been much progress on functionalities beyond basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest in areas such as bioinformatics. We study ASM algorithms for Lempel-Ziv compressed indexes and for compressed suffix trees/arrays. Most compressed self-indexes belong to one of these classes. We start by adapting the classical method of partitioning into exact search to self-indexes, and optimize it over a representative of either class of self-index. Then, we show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to a Lempel-Ziv index. Finally, we improve hierarchical verification, a successful technique for sequential searching, so as to extend the matches of pattern pieces to the left or right. Most compressed suffix trees/arrays support the required bidirectionality, thus enabling the implementation of the improved technique. In turn, the improved verification largely reduces the accesses to the text, which are expensive in self-indexes. We show experimentally that our algorithms are competitive and provide useful space-time tradeoffs compared to classical indexes.
Keywords: compressed index; approximate string matching; Lempel-Ziv; compressed suffix tree; compressed suffix array
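
The classical method of partitioning into exact search mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a plain `str.find` scan stands in for the compressed index, verification uses a textbook dynamic-programming scan rather than hierarchical verification, and all function names (`split_pattern`, `find_all`, `match_ends`, `approximate_search`) are hypothetical. The idea is that if P occurs with at most k errors and is cut into k + 1 pieces, at least one piece must occur unaltered, so only the text around exact piece occurrences needs to be verified.

```python
def split_pattern(pattern, k):
    """Cut pattern into k + 1 contiguous pieces; return (offset, piece) pairs."""
    base, extra = divmod(len(pattern), k + 1)
    pieces, start = [], 0
    for i in range(k + 1):
        length = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def find_all(text, piece):
    """Yield exact occurrences of piece (assumption: a scan replaces the index)."""
    pos = text.find(piece)
    while pos != -1:
        yield pos
        pos = text.find(piece, pos + 1)


def match_ends(pattern, window, k):
    """Return end offsets in window where pattern matches with <= k errors."""
    m = len(pattern)
    col = list(range(m + 1))  # edit-distance column for the empty text prefix
    ends = []
    for j, c in enumerate(window):
        diag = col[0]  # col[0] stays 0, so a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(diag + (pattern[i - 1] != c), old + 1, col[i - 1] + 1)
            diag = old
        if col[m] <= k:
            ends.append(j + 1)
    return ends


def approximate_search(text, pattern, k):
    """End positions in text of occurrences of pattern with <= k errors."""
    m = len(pattern)
    if m <= k:
        raise ValueError("pattern must be longer than k")
    ends = set()
    for offset, piece in split_pattern(pattern, k):
        for pos in find_all(text, piece):
            # Any occurrence containing this piece lies inside this window.
            lo = max(0, pos - offset - k)
            hi = min(len(text), pos - offset + m + k)
            ends.update(lo + e for e in match_ends(pattern, text[lo:hi], k))
    return sorted(ends)
```

For example, `approximate_search(text, pattern, 1)` searches for two exact pieces of `pattern` and verifies a window of length at most `len(pattern) + 2` around each hit. In a self-index, the verification step is the costly part, because each window must first be extracted from the compressed text.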
This is an open access article distributed under the Creative Commons Attribution License (CC BY) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

MDPI and ACS Style

Russo, L.M.S.; Navarro, G.; Oliveira, A.L.; Morales, P. Approximate String Matching with Compressed Indexes. Algorithms 2009, 2, 1105-1136.

Algorithms, EISSN 1999-4893, published by MDPI AG, Basel, Switzerland