1. Introduction
Quantum computing [1,2,3] has been one of the hottest topics in computer science over the last few decades. There are many problems where quantum algorithms outperform the best-known classical algorithms [4,5,6,7,8,9,10,11,12].
One such area is problems on strings. Researchers have shown the power of quantum algorithms for such problems in [13,14,15,16,17,18,19,20,21,22].
In this paper, we consider three problems: the String Sorting problem, the Most Frequent String Search problem, and the Intersection of Two String Sequences problem.
Our algorithms use some quantum algorithms as subroutines, and the remaining parts are classical. We investigate the problems in terms of query complexity. The query model is one of the most popular models for quantum algorithms. Such algorithms can query a black box that has access to the sequence of strings. By the running time of an algorithm, we mean the number of queries to the black box.
In this paper, we suggest a quantum procedure for comparing two strings. We show that its quantum query complexity is $\Theta \left(\sqrt{k}\right)$, where k is the length of the strings. The classical complexity is $\Theta \left(k\right)$. Thus, the quantum algorithm has a quadratic speedup over classical algorithms. We propose a quantum algorithm that is based on the algorithm for the “first one search” problem (finding the minimal element satisfying a condition) from [23,24,25,26]. This algorithm is a modification of Grover’s search algorithm [27,28]. Another important search algorithm is described in [29]. Using this idea, we obtain quantum algorithms for several problems.
The first problem is the String Sorting problem. Assume that we have n strings of length k. It is known [30] that no quantum algorithm can sort arbitrary comparable objects faster than $O(n\log n)$. At the same time, several researchers have tried to improve the hidden constant [31,32]. Other researchers have investigated the space-bounded case [33]. We focus on sorting strings. In the classical case, we can use an algorithm that is better than general comparison-based sorting algorithms: radix sort, which has $O\left(nk\right)$ query complexity [34] for a finite-size alphabet. This is also the lower bound for classical (randomized or deterministic) algorithms, namely $\Omega \left(nk\right)$. Our quantum algorithm for the String Sorting problem has query complexity $O(n(\log n)\cdot \sqrt{k})=\tilde{O}\left(n\sqrt{k}\right)$, where $\tilde{O}$ hides logarithmic factors. It is based on standard sorting algorithms [34] such as Heapsort [34,35] and the quantum algorithm for comparing strings. Additionally, we use the idea of a noisy comparison procedure for sorting [36].
The second problem is the following. We have n strings of length k. We assume that string symbols are letters from a constant-size alphabet, for example, the binary alphabet, the Latin alphabet, or Unicode. The problem is to find the string that occurs in the sequence most often. This problem [37] is one of the most well-studied ones in the area of data streams [38,39,40,41]. Many applications in packet routing, telecommunication logging, and tracking keyword queries in search engines are critically based on such routines. The best-known classical (randomized or deterministic) algorithms require $\Omega \left(nk\right)$ queries because an algorithm should at least read all symbols of all strings. A deterministic solution can use the radix sort algorithm [34] or the Trie (prefix tree) [42,43,44,45] to achieve this complexity.
We propose a quantum algorithm that is based on the sorting algorithm from the first problem. Our algorithm for the Most Frequent String Search problem has query complexity $O(n(\log n)\cdot \sqrt{k})=\tilde{O}\left(n\sqrt{k}\right)$. If ${log}_{2}n=o\left(\sqrt{k}\right)$, then our algorithm outperforms its classical counterparts. Note that this setup makes sense in practical cases.
The third problem is the Intersection of Two String Sequences problem. Assume that we have two sequences of strings of length k. The size of the first sequence is n, and the size of the second one is m. The first sequence is given in advance, and the second one arrives in an online fashion, one string at a time. After each requested string from the second sequence, we want to check whether this string belongs to the first sequence. We propose a quantum algorithm for the problem with quantum query complexity $O((n+m(\log m+\log \log n))\cdot \log n\cdot \sqrt{k})=\tilde{O}\left((n+m)\sqrt{k}\right)$. The algorithm uses the quantum algorithm for sorting strings. At the same time, the best-known classical (randomized or deterministic) algorithm requires $\Omega \left((n+m)k\right)$ queries, and this bound is achieved using the radix sort algorithm or the Trie data structure.
The paper is an extended version of a conference paper [46].
The structure of the paper is the following. The computational model is discussed in Section 2. We present the quantum subroutine that compares two strings in Section 3. Then, we discuss the three problems: the String Sorting problem in Section 4, the Most Frequent String Search problem in Section 5, and the Intersection of Two String Sequences problem in Section 6. Section 7 contains the conclusions.
2. Preliminaries
We use the standard form of the quantum query model. Let $f:D\to \{0,1\}$, $D\subseteq {\{0,1\}}^{N}$, be an N-variable function. An input for the function is $x\in D$. We are given oracle access to the input x; i.e., it is realized by a specific unitary transformation usually defined as $|i\rangle |z\rangle |w\rangle \to |i\rangle |z+{x}_{i}\phantom{\rule{4.44443pt}{0ex}}(mod\phantom{\rule{0.277778em}{0ex}}2)\rangle |w\rangle $, where the $|i\rangle $ register indicates the index of the variable we are querying, $|z\rangle $ is the output register, and $|w\rangle $ is some auxiliary workspace. Note that we use Dirac notation for vectors. An algorithm in the query model consists of alternating applications of arbitrary unitaries independent of the input and the query unitary, and a measurement in the end.
In the case of a non-binary input, we present the input variables in binary form. Using alternating unitaries independent of the input, we can store bits in the auxiliary workspace $|w\rangle $ and use the obtained variable in an algorithm. In the case of computing a complex function f on a non-binary input, we can consider a block of alternating unitaries independent of the input and the query unitary that stores the required variables in the auxiliary workspace $|w\rangle $. Then, we compute the Boolean value of the function f on the arguments and store it in the auxiliary workspace $|w\rangle $. After that, we can use the value of the function f in our algorithms.
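As a purely classical illustration of our own (not from the paper), the action of the query oracle on computational-basis states can be sketched as follows; the actual oracle is a unitary that also acts on superpositions of such states.

```python
# A minimal classical sketch of the query oracle on basis states:
# |i>|z>|w>  ->  |i>|z + x_i mod 2>|w>
def query_oracle(x, i, z, w):
    """Apply the binary query oracle to the basis state (i, z, w)."""
    return (i, (z + x[i]) % 2, w)

x = [0, 1, 1, 0]                  # the input bit string held by the black box
once = query_oracle(x, 2, 0, 0)   # query index 2: the output register flips
twice = query_oracle(x, *once)    # the oracle is self-inverse on basis states
```

Applying the oracle twice restores the output register, which reflects that the transformation is its own inverse.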
The smallest number of queries used by an algorithm that outputs $f\left(x\right)$ with probability at least $\frac{2}{3}$ on all x is called the quantum query complexity of the function f and is denoted by $Q\left(f\right)$. We refer the reader to [1,2,3] for more details on quantum computing.
In the quantum algorithms in this article, we discuss quantum query complexity. We use modifications of Grover’s search algorithm [27,28] as quantum subroutines. For these subroutines, the time complexity (the number of gates in a circuit) exceeds the query complexity by an additional logarithmic factor. Note that a query can be implemented using the CNOT gate.
3. The Quantum Algorithm for Comparing Two Strings
Firstly, we discuss a quantum subroutine that compares two strings of length k. We call this subroutine Compare_strings$(s,t,k)$; it compares s and t in the lexicographical order. It returns:
$-1$ if $s<t$;
$0$ if $s=t$;
$1$ if $s>t$.
As a base for our algorithm, we use the algorithm for finding the minimal argument on which a Boolean-valued function takes the value 1. Formally, we have:
Lemma 1 ([24,25], Theorem 10; [23], Section 2.2; [26], Proposition 4). Suppose we have a function $f:\{1,\dots ,N\}\to \{0,1\}$ for some integer N. There is a quantum algorithm for finding ${j}_{0}=min\{j\in \{1,\dots ,N\}:f\left(j\right)=1\}$. The algorithm finds ${j}_{0}$ with expected query complexity $O\left(\sqrt{{j}_{0}}\right)$ and error probability at most $\frac{1}{2}$.
Let us choose the function $f\left(j\right)=({s}_{j}\ne {t}_{j})$. Thus, we search for ${j}_{0}$, the index of the first unequal symbol of the strings. Then, s precedes t in the lexicographical order if and only if the symbol ${s}_{{j}_{0}}$ precedes the symbol ${t}_{{j}_{0}}$; this claim holds by the definition of the lexicographical order. If there are no unequal symbols, then the strings are equal.
Concerning the implementation of f: to compute the value $f\left(j\right)$, we store the binary representations of ${s}_{j}$ and ${t}_{j}$ in the auxiliary workspace, say in registers $|{\psi}_{s}\rangle $ and $|{\psi}_{t}\rangle $. Then, we compute the value of $f\left(j\right)$ and store it in a qubit $|\varphi \rangle $. After that, we can use this value in the algorithm. The last step is clearing $|\varphi \rangle $ using the values of $|{\psi}_{s}\rangle $ and $|{\psi}_{t}\rangle $ and the CNOT gate; then, we clear $|{\psi}_{s}\rangle $ and $|{\psi}_{t}\rangle $ by repeating the same queries (that use CNOT gates). All these manipulations take a constant number of queries because of the constant size of the input alphabet.
We use The_first_one_search$(f,k)$ as the subroutine from Lemma 1, where $f\left(j\right)=({s}_{j}\ne {t}_{j})$. Assume that this subroutine returns $k+1$ if it does not find any solution or if the found argument ${j}^{\prime}$ is such that $f\left({j}^{\prime}\right)=0$.
We use the standard technique of boosting the success probability. Thus, we repeat the subroutine $\lceil {log}_{2}\left({\delta}^{-1}\right)\rceil $ times and return the minimal answer.
Suppose the subroutine has an error. There are two cases. The first one is finding an index of unequal symbols that is not the minimal one. In the second case, the algorithm does not find any unequal symbols; then, we assume that it returns $k+1$. Thus, in the case of an error, the subroutine returns a value that is bigger than the correct answer.
Therefore, if at least one subroutine invocation has no error, then the whole algorithm succeeds. All error events are independent. The error probability of the whole algorithm is the probability of error in all invocations of the subroutine, that is, $O\left({2}^{-{log}_{2}\left({\delta}^{-1}\right)}\right)=O\left(\delta \right)$.
Let us present Algorithm 1.
Algorithm 1 Compare_strings$(s,t,k)$. The Quantum Algorithm for Comparing Two Strings.

${j}_{0}$ ← The_first_one_search$(f,k)$ ▹ The initial value
for $i\in \{1,\dots ,\lceil {log}_{2}{\delta}^{-1}\rceil \}$ do
  ${j}_{0}$ ← min(${j}_{0}$, The_first_one_search$(f,k)$)
end for
if ${j}_{0}=k+1$ then
  $result\leftarrow 0$ ▹ The strings are equal.
else
  if ${s}_{{j}_{0}}<{t}_{{j}_{0}}$ then
    $result\leftarrow -1$ ▹ s precedes t.
  else
    $result\leftarrow 1$ ▹ s succeeds t.
  end if
end if
return $result$

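The procedure above can be sketched classically. In this illustration of our own, the quantum subroutine The_first_one_search from Lemma 1 is modelled by a noisy stub that, with probability at most 1/2, errs only by returning a value larger than the true first mismatch, which is the only error mode discussed in the text; the fixed seed is an assumption added for reproducibility.

```python
import math
import random

random.seed(0)  # fixed seed so this classical illustration is reproducible

def noisy_first_one_search(f, k):
    """Noisy stub for the quantum first-one-search of Lemma 1."""
    j0 = next((j for j in range(1, k + 1) if f(j)), k + 1)  # true answer
    if random.random() < 0.5 and j0 <= k:
        # an error overshoots: a later mismatch index, or k + 1 ("not found")
        later = [j for j in range(j0 + 1, k + 1) if f(j)] + [k + 1]
        return random.choice(later)
    return j0

def compare_strings(s, t, k, delta=0.001):
    """Return -1 if s < t, 0 if s == t, and 1 if s > t (lexicographically)."""
    f = lambda j: s[j - 1] != t[j - 1]          # 1-indexed mismatch predicate
    repeats = math.ceil(math.log2(1 / delta))   # boosting, as in the text
    j0 = min(noisy_first_one_search(f, k) for _ in range(repeats + 1))
    if j0 == k + 1:
        return 0                                # no mismatch: equal strings
    return -1 if s[j0 - 1] < t[j0 - 1] else 1
```

Since errors only overshoot, taking the minimum over the repetitions recovers the true first-mismatch index unless every invocation fails, which happens with probability at most $2^{-(\lceil \log_2 \delta^{-1}\rceil + 1)} = O(\delta)$.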
The next property follows from the previous discussion.
Lemma 2. Algorithm 1 compares two strings of length k in the lexicographical order with query complexity $O(\sqrt{k}\log {\delta}^{-1})$ and error probability $O\left(\delta \right)$ for some integer k and $0<\delta <1$.
The algorithm finds the minimal index of unequal symbols ${j}_{0}$. We can say that ${j}_{0}-1$ is the length of the longest common prefix of the two strings.
We can show that the lower bound for the problem is $\Omega \left(\sqrt{k}\right)$.
Lemma 3. Any quantum algorithm for the Comparing Two Strings problem has $\Omega \left(\sqrt{k}\right)$ query complexity.
Proof. Let us show that the problem is at least as hard as the unstructured search problem. Let ${s}_{\lfloor k/2\rfloor}=1$ and ${s}_{j}=0$ for all $j\in \{1,\dots ,\lfloor k/2\rfloor -1,\lfloor k/2\rfloor +1,\dots ,k\}$. The string t is such that it contains exactly one 1, in some position z. In other words, there is $z\in \{1,\dots ,k\}$ such that ${t}_{z}=1$ and ${t}_{j}=0$ for all $j\in \{1,\dots ,z-1,z+1,\dots ,k\}$.
If $z<\lfloor k/2\rfloor $, then $t>s$. If $z=\lfloor k/2\rfloor $, then $t=s$. If $z>\lfloor k/2\rfloor $, then $t<s$. Therefore, the problem is at least as hard as the search for 1 among the first $\lfloor k/2\rfloor $ variables in the string t.
It is known [14] that the quantum query complexity of unstructured search among $\lfloor k/2\rfloor $ variables is $\Omega \left(\sqrt{k}\right)$. □
At the same time, the classical complexity of the problem is $\Theta \left(k\right)$.
Lemma 4. The randomized query complexity of the Comparing Two Strings problem is $\Theta \left(k\right)$.
Proof. Due to the proof of Lemma 3, the problem is at least as hard as the search for 1 among the first $k/2$ variables in the string t.
It is known [14] that the randomized query complexity of unstructured search among $k/2$ variables is $\Omega \left(k\right)$.
At the same time, we can check all symbols sequentially to find the first unequal symbol. This algorithm has $O\left(k\right)$ query complexity. □
Additionally, we can compute the complexity of any algorithm based on the two strings comparison procedure.
Lemma 5. Suppose we have some integer n, an integer $A=A\left(n\right)$, and ε such that $\underset{n\to \infty}{lim}\epsilon /A=0$. Then, if a quantum algorithm performs $A\left(n\right)$ comparisons of strings of length k and has $O\left(\epsilon \right)$ error probability, it makes at most $O(A\sqrt{k}\log (A/\epsilon ))$ queries.
Proof. As the string comparison procedure, we use the Compare_strings subroutine with $\delta =\epsilon /A$. Because of Lemma 2, the complexity of the subroutine is $O(\sqrt{k}\log (A/\epsilon ))$, and the error probability is $O(\epsilon /A)$. Because of the A comparison operations, the total complexity of the algorithm is $O(A\sqrt{k}\log (A/\epsilon ))$.
Let us discuss the error probability. Error events in the algorithm are independent, and for the algorithm to succeed, all comparisons must be correct. The error probability of one comparison is at most $\epsilon /A$. Hence, the probability that all A comparisons are correct is at least ${\left(1-\epsilon /A\right)}^{A}$, so the error probability of the whole algorithm is at most $1-{\left(1-\epsilon /A\right)}^{A}$.
Note that $1-{\left(1-\epsilon /A\right)}^{A}\le A\cdot \left(\epsilon /A\right)=\epsilon $ by the union bound. Hence, the total error probability is at most
$O\left(\epsilon \right)$. □
4. Strings Sorting Problem
Let us consider the following problem.
Problem. For some positive integers n and k, we have a sequence of strings $s=({s}^{1},\dots ,{s}^{n})$. Each ${s}^{i}=({s}_{1}^{i},\dots ,{s}_{k}^{i})\in {\Sigma}^{k}$ for some finite-size alphabet $\Sigma $. We search for an order $ORDER=({i}_{1},\dots ,{i}_{n})$ such that for any $j\in \{1,\dots ,n-1\}$, we have ${s}^{{i}_{j}}\le {s}^{{i}_{j+1}}$ in the lexicographical order.
We use one of the existing sorting algorithms (for example, the Heapsort algorithm [34,35] or the Merge sort algorithm [34]) as a base, together with the quantum algorithm for string comparison from Section 3. In fact, our comparison function can have errors. That is why we use the result on “noisy computation” from [36]. The result is presented in the following lemma.
Lemma 6 ([36], Theorem 3.5). Suppose we have a comparison procedure that works with error probability ε. Then, there is a sorting algorithm with query complexity $O(n\log (n/\epsilon ))$ and error probability at most ε.
The complexity of our algorithm is presented in the following theorem.
Theorem 1. The algorithm sorts $s=({s}^{1},\dots ,{s}^{n})$ with query complexity $O(n(\log n)\cdot \sqrt{k})=\tilde{O}\left(n\sqrt{k}\right)$ and constant error probability.
Proof. The correctness of the algorithm follows from the description. Let $\epsilon =0.1$. Then, we apply the result from Lemma 6 and use the quantum comparison procedure that has $\epsilon $ error probability and $O(\sqrt{k}\log {\epsilon}^{-1})=O\left(\sqrt{k}\right)$ query complexity. Therefore, the query complexity of the algorithm is $O(n(\log (n/\epsilon ))\cdot \sqrt{k})=O(n(\log n)\cdot \sqrt{k})=\tilde{O}\left(n\sqrt{k}\right)$, and the error probability is $\epsilon $. □
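The sorting step can be sketched classically as follows. This is our own illustration: here compare_strings is an exact -1/0/1 comparator standing in for the quantum subroutine of Section 3; any comparator with that interface, including a noisy one with boosted success probability, can be plugged in.

```python
from functools import cmp_to_key

def compare_strings(s, t):
    """Exact lexicographic comparator: -1, 0, or 1 (stand-in for Section 3)."""
    return (s > t) - (s < t)

def sort_strings(strings):
    """Return ORDER = (i_1, ..., i_n): indices of `strings` in sorted order."""
    order = list(range(len(strings)))
    # comparison-based sort driven entirely by the pluggable comparator
    order.sort(key=cmp_to_key(lambda a, b: compare_strings(strings[a], strings[b])))
    return order

order = sort_strings(["bb", "ab", "aa", "ab"])   # -> [2, 1, 3, 0]
```

The sort is stable, so equal strings keep their original relative order, which the second step of the Most Frequent String algorithm relies on.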
We can show the lower bound for the problem.
Theorem 2. Any quantum algorithm for the Sorting problem has $\Omega \left(\sqrt{nk}\right)$ query complexity.
Proof. Let us show that the problem is at least as hard as the unstructured search problem. Assume that strings ${s}^{1},\dots ,{s}^{n}$ are such that
There is a pair $(u,v)$ such that ${s}_{v}^{u}=1$;
For all pairs $(i,j)\ne (u,v)$, ${s}_{j}^{i}=0$.
In that case, the answer is $ORDER=({i}_{1},\dots ,{i}_{n-1},u)$, where $({i}_{1},\dots ,{i}_{n-1})$ is a permutation of the integers from $\{1,\dots ,u-1,u+1,\dots ,n\}$. Searching for the required index u is at least as hard as the search for the 1-valued variable ${s}_{v}^{u}$.
It is known [14] that the quantum complexity of unstructured search among $nk$ variables is $\Omega \left(\sqrt{nk}\right)$. □
The lower bound for the classical complexity can be proven in the same way as in Theorem 2.
Theorem 3. The randomized query complexity of the Sorting problem is $\Theta \left(nk\right)$.
Proof. Due to the proof of Theorem 2, the problem is at least as hard as the search for 1 among $nk$ variables in the strings ${s}^{1},\dots ,{s}^{n}$.
It is known [14] that the randomized query complexity of unstructured search among $nk$ variables is $\Omega \left(nk\right)$.
The radix sort algorithm [34] reaches this bound and has $O\left(nk\right)$ complexity in the case of a finite alphabet. □
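For comparison with the classical baseline, a least-significant-position radix sort for fixed-length strings can be sketched as follows (our own illustration); it reads each of the nk symbols once per pass, for $O(nk)$ symbol reads in total over the k passes.

```python
# LSD radix sort of n strings of length k over a finite alphabet.
def radix_sort(strings, k):
    order = list(range(len(strings)))
    for pos in range(k - 1, -1, -1):      # least significant position first
        buckets = {}
        for i in order:                    # stable distribution into buckets
            buckets.setdefault(strings[i][pos], []).append(i)
        order = [i for ch in sorted(buckets) for i in buckets[ch]]
    return order

assert radix_sort(["bb", "ab", "aa", "ab"], 2) == [2, 1, 3, 0]
```

Because each pass is stable, after processing position 1 and then position 0 the indices come out in full lexicographic order.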
5. The Most Frequent String Search Problem
Let us formally present the problem.
Problem. For some positive integers n and k, we have a sequence of strings $s=({s}^{1},\dots ,{s}^{n})$. Each ${s}^{i}=({s}_{1}^{i},\dots ,{s}_{k}^{i})\in {\Sigma}^{k}$ for some finite-size alphabet $\Sigma $. Let $\#\left(t\right)=|\{i\in \{1,\dots ,n\}:{s}^{i}=t\}|$ be the number of occurrences of a string t. We search for $i=arg\,ma{x}_{i\in \{1,\dots ,n\}}\#\left({s}^{i}\right)$. If several strings satisfy the condition, then the answer is the minimal such index in the sequence s.
Firstly, we present the idea of the algorithm. The algorithm contains two steps. The first step is sorting the sequence of strings and obtaining $ORDER=({i}_{1},\dots ,{i}_{n})$ such that for any $j\in \{1,\dots ,n-1\}$, we have ${s}^{{i}_{j}}\le {s}^{{i}_{j+1}}$ in the lexicographical order. In that case, equal strings are situated sequentially. In the second step, we find each segment $[{i}_{\ell},{i}_{r}]$ of indexes of equal strings, i.e., ${s}^{j}={s}^{{i}_{\ell}}$ for $j\in \{{i}_{\ell},\dots ,{i}_{r}\}$; ${s}^{{i}_{\ell -1}}\ne {s}^{{i}_{\ell}}$ or $\ell =1$; and ${s}^{{i}_{r+1}}\ne {s}^{{i}_{r}}$ or $r=n$. We check the segments for different strings one by one. We store the longest segment’s length in ${c}_{max}$ and the minimal index of the string that corresponds to this segment in ${j}_{max}$. As in the sorting algorithm, in the second step of the algorithm, we apply the Compare_strings subroutine for checking the equality of strings. Assume that we have the Sort_strings$\left(s\right)$ subroutine that implements the algorithm from Section 4.
Let us present the algorithm formally in Algorithm 2.
Let us discuss the complexity of the algorithm.
Theorem 4. Algorithm 2 finds the most frequent string of $s=({s}^{1},\dots ,{s}^{n})$ with query complexity $O(n(\log n)\cdot \sqrt{k})=\tilde{O}\left(n\sqrt{k}\right)$ and constant error probability.
Proof. The correctness of the algorithm follows from the description. Let us discuss the query complexity. Because of Theorem 1, the sorting algorithm’s complexity is $O\left(n(\log n)\sqrt{k}\right)$, and the error probability is constant. The second step does $O\left(n\right)$ comparison operations. Let ${\epsilon}^{\prime}=0.1$. Thus, because of Lemma 5, the second step of the algorithm does $O\left(n(\log n)\sqrt{k}\right)$ queries, and the error probability is constant. The total complexity is $O(n(\log n)\sqrt{k}+n(\log n)\sqrt{k})=O\left(n(\log n)\sqrt{k}\right)$.
Error events of the two steps are independent. Therefore, the error probability of the whole algorithm is also constant. We can achieve any required constant error probability by repetition. The technique is standard for both one-sided and two-sided errors. It can be seen, for example, in [16]. □
Algorithm 2 The Quantum Algorithm for the Most Frequent String Problem.

$({i}_{1},\dots ,{i}_{n})=ORDER$ ← Sort_strings(s) ▹ We sort $s=({s}^{1},\dots ,{s}^{n})$.
${c}_{max}\leftarrow 0,{j}_{max}\leftarrow 1$
$c\leftarrow 1,j\leftarrow {i}_{1}$
for $b\in (1,\dots ,n)$ do
  if $b=n$ or ($b\ne n$ and Compare_strings$({s}^{{i}_{b}},{s}^{{i}_{b+1}},k)\ne 0$) then ▹ We find the end of a segment
    if $c>{c}_{max}$ then ▹ If the current segment is longer than the current longest one
      ${c}_{max}\leftarrow c,{j}_{max}\leftarrow j$
    end if
    $c\leftarrow 1$
    if $b\ne n$ then
      $j\leftarrow {i}_{b+1}$
    end if
  else
    $c\leftarrow c+1$
    if ${i}_{b+1}<j$ then ▹ j is the minimal index of the current segment
      $j\leftarrow {i}_{b+1}$
    end if
  end if
end for
return ${j}_{max}$

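The second step of Algorithm 2 can be sketched classically as follows; this is our own illustration in which exact string comparison stands in for the Compare_strings subroutine.

```python
# Scan the sorted order, find the longest run of equal strings, and keep the
# minimal original index among the indices of that run.
def most_frequent(strings, order):
    c_max, j_max = 0, order[0]
    c, j = 1, order[0]
    for b in range(len(order)):
        last = b == len(order) - 1
        if last or strings[order[b]] != strings[order[b + 1]]:
            if c > c_max:                    # current run is the longest so far
                c_max, j_max = c, j
            if not last:
                c, j = 1, order[b + 1]       # start a new run
        else:
            c += 1
            j = min(j, order[b + 1])         # minimal index within the run
    return j_max

s = ["ab", "aa", "ab", "bb", "ab"]
order = sorted(range(len(s)), key=lambda i: s[i])   # [1, 0, 2, 4, 3]
assert most_frequent(s, order) == 0                  # "ab" occurs 3 times
```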
Theorem 5. Suppose we have a constant ε such that $0<\epsilon <3/4$. If the length of the strings satisfies $k\ge {log}_{2}n$, then any quantum algorithm for the Most Frequent String Search problem has $\Omega (\sqrt{nk}+{n}^{3/4-\epsilon})$ query complexity. If $k<{log}_{2}n$, then any quantum algorithm for the Most Frequent String Search problem has $\Omega \left(\sqrt{nk}\right)$ query complexity.
Proof. Let us show that the problem is at least as hard as the unstructured search problem. Assume that $n=2t$ for some integer t and $k>1$. Then, let ${s}^{t+1},\dots ,{s}^{2t}={0}^{k}$, where ${0}^{k}$ is the string of k zeros. The other strings can be ${s}^{1},\dots ,{s}^{t}={1}^{k}$, or there are $z\in \{1,\dots ,t\}$ and $u\in \{1,\dots ,k\}$ such that ${s}_{u}^{z}=0$ and ${s}_{{u}^{\prime}}^{z}=1$ for all ${u}^{\prime}\in \{1,\dots ,u-1,u+1,\dots ,k\}$.
In the first case, the answer is ${1}^{k}$. In the second case, the answer is ${0}^{k}$. Therefore, solving the problem for this instance is equivalent to the search for 0 among the first $tk=nk/2$ variables.
According to [14], the quantum complexity of unstructured search among $nk/2$ variables is $\Omega \left(\sqrt{nk}\right)$.
In the case of odd n, we assign ${s}^{n}={1}^{k/2}{0}^{k/2}$, and it is not used in the search. Then, we can consider only $n-1$ strings. Thus, $n-1$ is even.
Let us consider the case of $k=1$. If n is odd, then ${s}^{n}=2$. Let ${s}^{i}=0$ for $i\ge t+1$, and $t=\lfloor n/2\rfloor $. Let us consider two cases. The first one is ${s}^{i}=1$ for all $i\in \{1,\dots ,t\}$. The second case is ${s}^{i}=1$ for all $i\in \{1,\dots ,t\}\backslash \left\{{i}_{1}\right\}$ and ${s}^{{i}_{1}}=0$ for some ${i}_{1}\in \{1,\dots ,t\}$. In the first case, the answer is 1. In the second case, the answer is 0. Therefore, solving the problem for this instance is equivalent to the search for 0 among the first $t=n/2=nk/2$ variables.
Let us show that the problem is at least as hard as the d-distinctness problem [47]. Let d be such that $\frac{1}{4d}=\epsilon /2$. Let b be the maximal integer that satisfies $n\ge b\cdot (d-1)+1$. Let ${u}^{j}$ be the binary representation of j for $j\in \{0,\dots ,b\}$.
Assume that ${s}^{1}={u}^{1}$. For the other strings, we have two cases:
Case 1. The sequence s contains $d-1$ copies of each ${u}^{j}$ with $j\ge 1$, and the other strings are ${u}^{0}$. Formally:
- $\#\left({u}^{j}\right)=d-1$ for $j\in \{1,\dots ,b\}$;
- $\#\left({u}^{0}\right)=n-b\cdot (d-1)$.
Case 2. The sequence s contains $d-1$ copies of each ${u}^{j}$ with $j\ge 1$ except some ${j}_{m}\in \{2,\dots ,b\}$; d copies of ${u}^{{j}_{m}}$; and the other strings are ${u}^{0}$. Formally:
- $\#\left({u}^{{j}_{m}}\right)=d$ for some ${j}_{m}\in \{2,\dots ,b\}$;
- $\#\left({u}^{j}\right)=d-1$ for $j\in \{1,\dots ,b\}\backslash \left\{{j}_{m}\right\}$;
- $\#\left({u}^{0}\right)=n-b\cdot (d-1)-1$.
In the first case, $\#\left({u}^{j}\right)=d-1$ for $j\in \{1,\dots ,b\}$, $\#\left({u}^{0}\right)\le d-1$, and ${s}^{1}={u}^{1}$. Therefore, the answer is 1. In the second case, $\#\left({u}^{j}\right)=d-1$ for $j\in \{1,\dots ,b\}\backslash \left\{{j}_{m}\right\}$, $\#\left({u}^{0}\right)\le d-1$, and $\#\left({u}^{{j}_{m}}\right)=d$. Therefore, the answer is ${i}_{m}=min\{i:{s}^{i}={u}^{{j}_{m}}\}$. Note that ${i}_{m}\ne 1$ because ${j}_{m}\ge 2$ and ${s}^{1}={u}^{1}\ne {u}^{{j}_{m}}$.
Hence, solving the problem for this instance is equivalent to checking whether there is a string that occurs in the input at least d times. This is the d-distinctness problem from [47]. It is known that the complexity of the problem is $\Omega \left(\frac{1}{{4}^{d}{d}^{2}\cdot {\log}^{5/2}R}\cdot {R}^{3/4-1/\left(4d\right)}\right)$ for $R=\Theta \left({d}^{d/2}n\right)$. In our case, the complexity is $\Omega \left({n}^{3/4-\epsilon}\right)$. □
Secondly, let us discuss the classical complexity of the problem.
Theorem 6. Any randomized algorithm for the Most Frequent String Search problem has $\Theta \left(nk\right)$ query complexity.
Proof. The best-known classical algorithm uses the radix sort algorithm and performs steps similar to those of the quantum algorithm.
The running time of this algorithm is $O\left(nk\right)$. At the same time, we can show that this is also a lower bound.
As was shown in the proof of Theorem 5, the problem is at least as hard as the unstructured search problem among $nk/2$ variables. It is known [14] that the randomized complexity of unstructured search among $nk/2$ variables is $\Omega \left(nk\right)$. □
6. Intersection of Two Sequences of Strings Problem
Let us consider the following problem.
Problem. For some positive integers $n,m$, and k, we have a sequence of strings $s=({s}^{1},\dots ,{s}^{n})$. Each ${s}^{i}=({s}_{1}^{i},\dots ,{s}_{k}^{i})\in {\Sigma}^{k}$ for some finite-size alphabet $\Sigma $. Then, we obtain m requests $t=({t}^{1},\dots ,{t}^{m})$, where ${t}^{i}=({t}_{1}^{i},\dots ,{t}_{k}^{i})\in {\Sigma}^{k}$. The answer to a request ${t}^{i}$ is 1 if there is $j\in \{1,\dots ,n\}$ such that ${t}^{i}={s}^{j}$, and 0 otherwise. We should answer each of the m requests.
Let us present the algorithm, which is based on the sorting algorithm from Section 4. We sort the strings from s. Then, we answer each request using binary search in the sorted sequence of strings [34], with the Compare_strings quantum subroutine for string comparison during the binary search.
Let us present Algorithm 3. Assume that the sorting algorithm from Section 4 is the subroutine Sort_strings$\left(s\right)$, and it returns the order $ORDER=({i}_{1},\dots ,{i}_{n})$. The subroutine Binary_search_for_strings$({t}^{i},s,ORDER)$ is the binary search algorithm with the Compare_strings subroutine as a comparator; it searches for ${t}^{i}$ in the ordered sequence $({s}^{{i}_{1}},\dots ,{s}^{{i}_{n}})$. Suppose that the subroutine Binary_search_for_strings returns 1 if it finds ${t}^{i}$ and 0 otherwise.
Algorithm 3 The Quantum Algorithm for the Intersection of Two Sequences of Strings Problem using the sorting algorithm.

$ORDER$ ← Sort_strings(s) ▹ We sort $s=({s}^{1},\dots ,{s}^{n})$.
for $i\in \{1,\dots ,m\}$ do
  $ans$ ← Binary_search_for_strings$({t}^{i},s,ORDER)$ ▹ We search for ${t}^{i}$ in the ordered sequence.
  return $ans$ ▹ The answer to the i-th request.
end for

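The request-handling step of Algorithm 3 can be sketched classically as follows; in this illustration of our own, an exact comparator stands in for the Compare_strings quantum subroutine.

```python
def compare_strings(s, t):
    """Exact lexicographic comparator: -1, 0, or 1 (stand-in for Section 3)."""
    return (s > t) - (s < t)

def binary_search_for_strings(t, strings, order):
    """Return 1 if t occurs among strings[order[0]], ..., strings[order[-1]]."""
    lo, hi = 0, len(order) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        c = compare_strings(strings[order[mid]], t)
        if c == 0:
            return 1                 # t found in the first sequence
        # narrow to the half that can still contain t
        lo, hi = (mid + 1, hi) if c < 0 else (lo, mid - 1)
    return 0

s = ["bb", "aa", "ab"]
order = sorted(range(len(s)), key=lambda i: s[i])     # [1, 2, 0]
assert binary_search_for_strings("ab", s, order) == 1
assert binary_search_for_strings("ba", s, order) == 0
```

Each request costs $O(\log n)$ comparisons, matching the $O(m\log n)$ comparison count used in the proof of Theorem 7.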
The algorithm has the following query complexity.
Theorem 7. Algorithm 3 solves the Intersection of Two Sequences of Strings problem with query complexity $O((n+m)\sqrt{k}\cdot \log n\cdot \log (n+m))=\tilde{O}\left((n+m)\sqrt{k}\right)$ and error probability $O\left(\frac{1}{n+m}\right)$.
Proof. The correctness of the algorithm follows from the description.
Because of Theorem 1, the sorting algorithm’s complexity is $O(n\log n\cdot \sqrt{k})$, and its error probability is constant.
Let us consider the second part of the algorithm. It does $O(m\log n)$ comparison operations over all invocations of the binary search algorithm. Let $\epsilon =0.1$. Thus, because of Lemma 5, the second part of the algorithm does $O(m(\log m+\log \log n)\cdot \log n\cdot \sqrt{k})$ queries, and the error probability is constant.
Thus, the total complexity is $O((n+m(\log m+\log \log n))\sqrt{k}\log n)$. Error events of the two steps are independent. Therefore, the error probability of the whole algorithm is also constant. We can achieve any required constant error probability by repetition. □
The lower bound for the classical case can be proven using a result stated in [48] (Lemma 7, Section 5.1).
Theorem 8. The randomized query complexity of the Intersection of Two Sequences of Strings problem is $\Theta \left((n+m)k\right)$.
Proof. Assume that $n>m$. Let us consider ${t}^{1}={0}^{k}$, and suppose each ${s}^{i}$ contains only 0s and 1s, i.e., ${s}_{j}^{i}\in \{0,1\}$ for all $i\in \{1,\dots ,n\},j\in \{1,\dots ,k\}$.
For checking ${s}^{i}={t}^{1}$, it is enough to check $\neg {\bigvee}_{j=1}^{k}({s}_{j}^{i}=1)$ because this condition implies ${s}_{j}^{i}=0$ for all $j\in \{1,\dots ,k\}$. In that case, checking for the existence of ${t}^{1}$ among the ${s}^{i}$ is the same as checking the condition $\neg {\bigwedge}_{i=1}^{n}{\bigvee}_{j=1}^{k}({s}_{j}^{i}=1)$. This condition means that not every string ${s}^{i}$ contains at least one 1.
The randomized complexity of computing $\neg {\bigvee}_{j=1}^{k}({s}_{j}^{i}=1)$ is the same as the complexity of unstructured search for a 1 among k variables, which is $\Omega \left(k\right)$. According to [48] (Lemma 7, Section 5.1), the total complexity of the function is $\Omega \left(nk\right)$.
Assume that $m>n$. Let us consider ${s}^{i}={0}^{k}$ for all $i\in \{1,\dots ,n\}$. Checking the existence of ${t}^{j}$ among ${s}^{1},\dots ,{s}^{n}$ is at least as hard as the search for a 1 among ${t}_{1}^{j},\dots ,{t}_{k}^{j}$, which requires $\Omega \left(k\right)$ queries. This is true for all $j\in \{1,\dots ,m\}$. Therefore, the total randomized complexity is $\Omega \left(mk\right)$.
Hence, joining both cases, the randomized complexity of solving the problem is $\Omega (max(n,m)\cdot k)=\Omega \left((n+m)\cdot k\right)$.
The complexity $O\left((n+m)k\right)$ can be reached if we use the radix sort algorithm and perform the same operations as in the quantum algorithm. □
Note that we can use the quantum algorithm for element distinctness [49,50] for this problem. That algorithm solves the problem of finding two identical elements in a sequence. Its query complexity is $O\left({D}^{2/3}\right)$, where D is the number of elements in the sequence. The complexity is tight because of [51]. The approach can be the following: on the j-th request, we add the string ${t}^{j}$ to the sequence ${s}^{1},\dots ,{s}^{n}$ and invoke the element distinctness algorithm, which finds a collision of ${t}^{j}$ with the other strings. Such an approach requires $\Omega \left({n}^{2/3}\sqrt{k}\right)$ queries for each request and $\Omega \left(m{n}^{2/3}\sqrt{k}\right)$ for processing all requests. Note that the online nature of the requests does not allow us to access all of ${t}^{1},\dots ,{t}^{m}$ at once; thus, each request should be processed separately.
In the case of $n\ll m$, we can use Grover’s search algorithm to search for ${t}^{j}$ among $({s}^{1},\dots ,{s}^{n})$. The complexity is $\tilde{O}\left(m\sqrt{nk}\right)$ in that case.
Because of the probabilistic behavior of the oracle, we should use an approach similar to that of [52], which uses the ideas of Amplitude Amplification [53].
7. Conclusions
In this paper, we propose a quantum algorithm for comparing strings and a general complexity bound for any algorithm that performs A string comparison operations. Then, using these results, we construct a quantum string sorting algorithm that works faster than the radix sort algorithm, which is the best-known deterministic algorithm for sorting a sequence of strings.
We propose quantum algorithms for two problems based on the sorting algorithm: the Most Frequent String Search problem and the Intersection of Two String Sequences problem. These quantum algorithms are more efficient than their classical (deterministic or randomized) counterparts in the case of ${log}_{2}n=o\left(\sqrt{k}\right)$, where k is the length of the strings and n is the number of strings. In the case of the Intersection of Two String Sequences problem, the condition is ${log}_{2}n\cdot ({log}_{2}m+{log}_{2}{log}_{2}n)=o\left(\sqrt{k}\right)$, where n and m are the numbers of strings in the two sequences. Note that these assumptions are reasonable in practice.
We discussed quantum and classical lower bounds for these problems. Classical lower bounds are tight, and at the same time, there is room to improve the quantum lower bounds.