Open Access Article
Algorithms 2016, 9(2), 26; doi:10.3390/a9020026

siEDM: An Efficient String Index and Search Algorithm for Edit Distance with Moves

1 Graduate School of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka-shi, Fukuoka 820-8502, Japan
2 Computer Centre, Gakushuin University, 1-5-1 Mejiro, Toshima-ku, Tokyo 171-8588, Japan
3 PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho, Kawaguchi-shi, Saitama 332-0012, Japan
* Author to whom correspondence should be addressed.
Academic Editor: Florin Manea
Received: 25 November 2015 / Revised: 8 April 2016 / Accepted: 11 April 2016 / Published: 15 April 2016

Abstract

Although several self-indexes for highly repetitive text collections exist, developing an index and search algorithm that supports editing operations remains a challenge. Edit distance with moves (EDM) is a string-to-string distance measure that allows substring moves in addition to the ordinary editing operations for turning one string into another. Although computing EDM exactly is intractable, the measure has a wide range of potential applications, especially in approximate string retrieval. Despite this importance, there has been no efficient method for indexing and searching large text collections under the EDM measure. We propose the first such algorithm, named string index for edit distance with moves (siEDM), for indexing and searching strings with EDM. siEDM builds its index structure by leveraging the idea behind edit sensitive parsing (ESP), an efficient algorithm that approximates EDM with guaranteed upper and lower bounds on the exact EDM. Using the proposed method, siEDM efficiently prunes the search space for query strings, enabling fast query searches with the same approximation guarantee as ESP. We experimentally evaluated the ability of siEDM to index and search strings on benchmark datasets and demonstrated its efficiency.
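To make the EDM measure concrete, here is a minimal Python sketch that computes EDM exactly by brute-force breadth-first search and compares it with the classic (Levenshtein) edit distance. It assumes unit cost for each character insertion, deletion, replacement, and substring move. The abstract does not spell out the costs, so this is an assumption. The function names `levenshtein`, `neighbors`, and `edm_bruteforce` are hypothetical illustrations. The code is not siEDM or ESP. Its running time is exponential in the distance, so it is only usable on tiny strings.

```python
from collections import deque


def levenshtein(s: str, t: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions only."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # delete a
                           cur[j - 1] + 1,            # insert b
                           prev[j - 1] + (a != b)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def neighbors(s: str, alphabet: str):
    """Yield every string reachable from s by one unit-cost EDM operation (assumed costs)."""
    n = len(s)
    for i in range(n + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]                   # insert c at position i
    for i in range(n):
        yield s[:i] + s[i + 1:]                       # delete s[i]
        for c in alphabet:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]           # replace s[i] with c
    for i in range(n):
        for j in range(i + 1, n + 1):
            block, rest = s[i:j], s[:i] + s[j:]
            for k in range(len(rest) + 1):
                yield rest[:k] + block + rest[k:]     # move block s[i:j] to position k


def edm_bruteforce(s: str, t: str) -> int:
    """Exact EDM by breadth-first search over intermediate strings (tiny inputs only)."""
    # Inserted/replacement symbols are restricted to characters of s and t:
    # any other symbol would have to be removed again, so it never helps.
    alphabet = "".join(sorted(set(s) | set(t)))
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in neighbors(u, alphabet):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise AssertionError("unreachable: t is always reachable from s")


if __name__ == "__main__":
    print(levenshtein("abcd", "cdab"), edm_bruteforce("abcd", "cdab"))
```

Running the script prints `4 1`. Turning `abcd` into `cdab` needs four edits without moves, but a single move of the block `ab` behind `cd` suffices under EDM. The search space grows exponentially with the distance, which is why exact computation does not scale. ESP-style approximation with bounded error is the route the abstract describes instead.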
Keywords: edit distance with moves; self-index; grammar-based self-index; string index for edit distance with moves
This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite This Article

MDPI and ACS Style

Takabatake, Y.; Nakashima, K.; Kuboyama, T.; Tabei, Y.; Sakamoto, H. siEDM: An Efficient String Index and Search Algorithm for Edit Distance with Moves. Algorithms 2016, 9, 26.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Algorithms EISSN 1999-4893, published by MDPI AG, Basel, Switzerland