
Efficient Data Structures for Range Shortest Unique Substring Queries

1 Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
2 Department of Computer Science, University of Wisconsin - Whitewater, Whitewater, WI 53190, USA
3 Life Sciences and Health, CWI, 1098 XG Amsterdam, The Netherlands
4 Center for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
An early version of this paper appeared in the Proceedings of SPIRE 2019.
Algorithms 2020, 13(11), 276; https://doi.org/10.3390/a13110276
Received: 9 September 2020 / Revised: 23 October 2020 / Accepted: 29 October 2020 / Published: 30 October 2020
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Let T[1,n] be a string of length n and T[i,j] be the substring of T starting at position i and ending at position j. A substring T[i,j] of T is a repeat if it occurs more than once in T; otherwise, it is a unique substring of T. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given a string T as input, the Shortest Unique Substring problem is to find a shortest substring of T that does not occur elsewhere in T. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over T that efficiently answers online queries of the following type: given a range [α,β], return a shortest substring T[i,j] of T with exactly one occurrence in [α,β]. We present an O(n log n)-word data structure with O(log_w n) query time, where w = Ω(log n) is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an O(n)-word data structure with O(√n log^ε n) query time, where ε > 0 is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
Keywords: shortest unique substring; suffix tree; heavy-light decomposition; range queries; geometric data structures
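To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not the paper's data structure and makes no attempt at the stated bounds; it is only a naive baseline that could be used to check answers on small inputs. It assumes 1-indexed inclusive ranges as in the abstract, counts an occurrence as lying in [α,β] only when it fits entirely inside the range, and breaks ties among shortest answers by taking the leftmost one. The function name `rsus_naive` is hypothetical.

```python
from collections import Counter
from typing import Optional, Tuple


def rsus_naive(T: str, alpha: int, beta: int) -> Optional[Tuple[int, int]]:
    """Hypothetical brute-force Range Shortest Unique Substring query.

    Returns 1-indexed (i, j) with alpha <= i <= j <= beta such that T[i..j]
    occurs exactly once inside T[alpha..beta] (occurrences must lie fully
    within the range) and j - i + 1 is minimal; ties go to the leftmost i.
    Worst-case cost is cubic in the window length m = beta - alpha + 1.
    """
    if not (1 <= alpha <= beta <= len(T)):
        raise ValueError("need 1 <= alpha <= beta <= len(T)")
    window = T[alpha - 1:beta]  # 0-indexed slice holding T[alpha..beta]
    m = len(window)
    for length in range(1, m + 1):
        # Count every length-`length` substring that starts and ends in the window.
        counts = Counter(window[k:k + length] for k in range(m - length + 1))
        for k in range(m - length + 1):
            if counts[window[k:k + length]] == 1:
                # Convert the window offset back to 1-indexed positions in T.
                return (alpha + k, alpha + k + length - 1)
    return None  # not reached: the whole window always occurs exactly once in itself
```

For T = abaab, the substring b is a repeat of the whole string, yet it is unique within [1,3], so `rsus_naive("abaab", 1, 3)` returns (2, 2); over the full range [1,5] the answer is (2, 3), i.e. ba. Each call rescans the window from scratch, which is the per-query work that a preprocessed data structure over T is meant to avoid.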

MDPI and ACS Style

Abedin, P.; Ganguly, A.; Pissis, S.P.; Thankachan, S.V. Efficient Data Structures for Range Shortest Unique Substring Queries. Algorithms 2020, 13, 276. https://doi.org/10.3390/a13110276

Chicago/Turabian Style

Abedin, Paniz; Ganguly, Arnab; Pissis, Solon P.; Thankachan, Sharma V. 2020. "Efficient Data Structures for Range Shortest Unique Substring Queries" Algorithms 13, no. 11: 276. https://doi.org/10.3390/a13110276
