Article

# Efficient Data Structures for Range Shortest Unique Substring Queries †

1 Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
2 Department of Computer Science, University of Wisconsin - Whitewater, Whitewater, WI 53190, USA
3 Life Sciences and Health, CWI, 1098 XG Amsterdam, The Netherlands
4 Center for Integrative Bioinformatics, Vrije Universiteit, 1081 HV Amsterdam, The Netherlands
* Author to whom correspondence should be addressed.
† An early version of this paper appeared in the Proceedings of SPIRE 2019.
Algorithms 2020, 13(11), 276; https://doi.org/10.3390/a13110276
Received: 9 September 2020 / Revised: 23 October 2020 / Accepted: 29 October 2020 / Published: 30 October 2020
(This article belongs to the Special Issue Combinatorial Methods for String Processing)
Let $T[1,n]$ be a string of length $n$ and $T[i,j]$ be the substring of $T$ starting at position $i$ and ending at position $j$. A substring $T[i,j]$ of $T$ is a repeat if it occurs more than once in $T$; otherwise, it is a unique substring of $T$. Repeats and unique substrings are of great interest in computational biology and information retrieval. Given string $T$ as input, the Shortest Unique Substring problem is to find a shortest substring of $T$ that does not occur elsewhere in $T$. In this paper, we introduce the range variant of this problem, which we call the Range Shortest Unique Substring problem. The task is to construct a data structure over $T$ that efficiently answers online queries of the following type: given a range $[\alpha,\beta]$, return a shortest substring $T[i,j]$ of $T$ with exactly one occurrence in $[\alpha,\beta]$. We present an $O(n\log n)$-word data structure with $O(\log_w n)$ query time, where $w=\Omega(\log n)$ is the word size. Our construction is based on a non-trivial reduction that allows us to apply a recently introduced optimal geometric data structure [Chan et al., ICALP 2018]. Additionally, we present an $O(n)$-word data structure with $O(\sqrt{n}\log^{\epsilon} n)$ query time, where $\epsilon>0$ is an arbitrarily small constant. The latter data structure relies heavily on another geometric data structure [Nekrich and Navarro, SWAT 2012].
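
To make the query concrete, the following is a minimal brute-force sketch in Python. It is not the data structure presented in the paper: it does no preprocessing and simply scans the range at query time. The function name `range_sus` is hypothetical, and the sketch assumes that an occurrence counts as lying in $[\alpha,\beta]$ only when it starts and ends inside the range; the abstract does not spell out this convention, so treat it as an assumption.

```python
def range_sus(text, alpha, beta):
    """Brute-force baseline for a range shortest unique substring query.

    Positions are 1-based and inclusive. Assumption (not stated in the
    abstract): an occurrence lies in [alpha, beta] only if it is fully
    contained in that range. Returns (i, j) such that text[i..j] is a
    shortest substring with exactly one such occurrence; ties are broken
    by the leftmost starting position.
    """
    window = text[alpha - 1:beta]  # the characters T[alpha..beta]
    m = len(window)
    for length in range(1, m + 1):  # try the shortest candidates first
        counts = {}  # substring -> number of occurrences in the window
        first = {}   # substring -> 0-based offset of its first occurrence
        for start in range(m - length + 1):
            sub = window[start:start + length]
            counts[sub] = counts.get(sub, 0) + 1
            first.setdefault(sub, start)
        unique_starts = [first[s] for s, c in counts.items() if c == 1]
        if unique_starts:
            start = min(unique_starts)
            return (alpha + start, alpha + start + length - 1)
    return None  # only reached for an empty range


# In "abab" no single character is unique; "ba" at positions 2..3 is.
print(range_sus("abab", 1, 4))  # (2, 3)
```

Under this convention a non-empty range always has an answer, since $T[\alpha,\beta]$ itself occurs exactly once in the range. The scan takes time cubic in the range length in the worst case, which is the cost that the indexed solutions described above are designed to avoid.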

MDPI and ACS Style

Abedin, P.; Ganguly, A.; Pissis, S.P.; Thankachan, S.V. Efficient Data Structures for Range Shortest Unique Substring Queries. Algorithms 2020, 13, 276. https://doi.org/10.3390/a13110276
