1. Introduction
Quantum search algorithms and their generalizations [1, 2, 3, 4, 5] are part of a group of quantum algorithms that are of great interest in computer science.
The use of quantum computers can significantly reduce the computation time for models with a large number of weights, or those requiring an exponentially growing number of combinations. We have presented several results in the field of quantum approaches to classification and information retrieval in the papers [6, 7].
The task of finding occurrences of a given substring in a text is an important problem of information retrieval. It occurs in a wide range of applications, namely, in text editors, in search robots, spam filters, bioinformatics, etc. A large number of algorithms (deterministic and probabilistic) for solving the search problem have been developed over the past few decades. Quantum algorithms for solving the search problem have been developed in the last two decades.
1.1. Problem
Given a binary text of length N and a binary sequence w of length m, $m \le N$. Let $n = N - m + 1$ denote the number of substrings of length m in the text. It is required to find the index of an occurrence of the substring w in the text, that is, an index k such that the substring of length m starting at position k of the text coincides with w.
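As a classical point of reference, the problem statement can be sketched in a few lines of Python. The function name `find_occurrence` and the 0-based indexing are assumptions made for illustration only; this is a brute-force baseline, not the algorithm proposed in the paper.

```python
def find_occurrence(text: str, w: str) -> int:
    """Return an index k such that text[k:k+m] == w, or -1 if there is none."""
    N, m = len(text), len(w)
    n = N - m + 1  # number of substrings of length m in the text
    for k in range(n):
        if text[k:k + m] == w:
            return k
    return -1


k = find_occurrence("0110100", "101")
print(k)
```

The loop inspects up to n candidate positions, which is the linear query cost that the quantum search discussed below aims to improve quadratically.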
1.2. Related Work
The well-known classical Knuth–Morris–Pratt algorithm [8] (1977) solves the problem in linear time $O(N + m)$.
In the early 2000s, a quantum algorithm [9] for searching for a given substring in a text was presented. It achieves a quadratic search speed-up and returns one of the indices at which the searched substring occurs. The probability of obtaining the correct answer is strictly greater than 1/2. The query complexity (this measure of complexity is sometimes associated with time complexity) of the algorithm is
. The authors of [9] do not estimate the memory complexity (the number of qubits used) of their algorithm. Since they use m qubits to encode a single binary subword of length m, the algorithm of [9] requires
qubits.
1.3. Our Contribution
We present a quantum algorithm for finding a pattern in a string. The algorithm is based on Grover’s search algorithm and hashing technique. It is known that the hashing method can significantly save space and, as a rule, the time required for searching in databases. The results of the paper demonstrate the potential of a universal hashing method for building quantum search algorithms.
To simplify the presentation, and not to clutter the idea with technical details, consider the case when it is known in advance that the desired substring occurs in the text exactly once.
More precisely, we propose a hybrid random–quantum search algorithm that searches a text for a substring and has the following characteristics:
The algorithm produces the correct answer with high probability.
The algorithm is based on Grover’s search. This search is presented in the paper as an auxiliary algorithm and requires $O(\sqrt{n})$ query steps.
Compared to the algorithm of [9], the algorithm saves exponentially in the number of qubits with respect to the parameter m, the length of the substring: the number of qubits it requires depends only logarithmically on m.
The main idea of the paper is the use of the hashing technique to save space complexity in quantum search. The algorithm is based on a certain universal family of hash functions. The algorithm is a generalization of the result to a general universal hash family of functions.
The rest of the paper is organized as follows. In the next section (Section 2), we present the basic model of the quantum search algorithm, basic notation, and definitions. In Section 3, we present the main result: we first present the auxiliary quantum procedure and then a quantum search algorithm based on the hashing technique. Theorem 1 gives an analysis of this algorithm. In Section 5, we present a generalization of this algorithm.
A preliminary version of the article was published in Russian [10].
2. Preliminaries for Quantum Query Algorithm
We refer the reader to [5, 11] for an introduction to the basics of quantum query algorithms and the state of research in this area. Here, we define the query model of computation in a way that is convenient for us. We use the notation defined in detail in [6, 7].
The operations applied to quantum s-qubit states $|\psi\rangle$ are mathematically expressed using unitary operators: $|\psi'\rangle = U|\psi\rangle$, where U is a $2^s \times 2^s$ unitary matrix.
2.1. Query Model
For finite sets X and Y, let $g: X \to Y$ be a discrete function. A quantum query algorithm
computing the function
g begins in a quantum state
and applies a sequence of operators
The operator depends on the input X. The quantum community calls the operator an “oracle”. Application of the oracle is called the query of the algorithm to the initial data X. The operator does not depend on X.
The algorithm computes the value
of the discrete function
g for
if the initial state
goes to the final state
in the process of computing on the input value
. The final state allows extracting the value of
as a result of measuring the state of
.
2.2. Search in an Unordered Database
The quantum search algorithm for an unordered database of n elements that contains exactly one element of interest (Grover’s algorithm) is a special case of a query algorithm. The following scheme underlies many generalizations:
Initialize a quantum system of qubits into a state containing information about the database. The state is constructed so that each of the basis quantum states represents the required information about n database elements.
The following “macro” steps are performed on the state:
“Macro” Step 2 describes the key operation of quantum search. Each such step increases the amplitude of the basis state that represents the desired information. The number of such steps maximizes the amplitude, and the probability of extracting the required information from the resulting state becomes close to 1.
2.3. Basic Operations of Qubits for Quantum Search
The potential advantages of quantum algorithms, on which the results of this work are based, lie in the possibility of implementing quantum operators of dimension by basic operations based on a small number of order qubits and in a small number of quantum computing steps (not a complex scheme implemented by basic elements).
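The main operations are:
X is the NOT operator. It changes the state of the qubit from $|0\rangle$ to $|1\rangle$, and vice versa: $X|0\rangle = |1\rangle$, $X|1\rangle = |0\rangle$.
Z is the amplitude sign reversal operator: $Z|0\rangle = |0\rangle$, $Z|1\rangle = -|1\rangle$.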
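The action of these single-qubit operators can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard matrix forms of X, Z, and the Hadamard operator H (which is used later in the oracle construction). The helper names `matmul`, `apply`, and `close` are hypothetical and introduced only for this illustration.

```python
import math


def matmul(a, b):
    """Multiply two small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(u, state):
    """Apply the matrix u to a state vector."""
    return [sum(u[i][k] * state[k] for k in range(len(state)))
            for i in range(len(u))]


def close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


s = 1 / math.sqrt(2)
X = [[0, 1], [1, 0]]    # NOT: swaps |0> and |1>
Z = [[1, 0], [0, -1]]   # sign reversal of the |1> amplitude
H = [[s, s], [s, -s]]   # Hadamard (standard form, assumed here)

ket0, ket1 = [1, 0], [0, 1]

print(apply(X, ket0), apply(X, ket1))
print(apply(Z, ket0), apply(Z, ket1))
print(close(matmul(matmul(H, X), H), Z))  # checks the identity H X H = Z
```

The identity H X H = Z is the reason a bit flip sandwiched between two Hadamard transformations acts as a phase flip, which is how an auxiliary qubit is commonly used to implement a phase oracle.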
2.4. Characteristics of the Algorithm
The characteristics of a quantum query algorithm are the used memory, the number of queries to the analyzed data, and the probability of an error.
Size complexity (memory complexity). The number of used qubits is a measure of the memory complexity of the quantum algorithm .
We denote by the number of qubits used by the algorithm to solve the problem of finding the substring w in the .
Through , we denote the maximum among the numbers over all and w with parameters .
Query complexity. The number
of queries (number of the oracle applications) is a “query” measure of the complexity of the quantum algorithm
. Note that in [11], one request to the oracle tests one variable (one bit). In our work, by contrast, one request to the oracle tests the entire string w (the hash value of w).
We denote by the number of applications by the algorithm of the oracle operator.
Through , we denote the maximum among the numbers over all and w with parameters .
Error probability. We denote by the probability of the following event. The algorithm , as a result of solving the problem of finding the substring w in the text , gives the number k of the position in the text such that .
Through , we denote the maximum among the numbers over all and w with parameters .
3. Algorithms for Finding the Index of Occurrence of a Substring in the Text
In this section, we present a hybrid classical–quantum algorithm (more precisely, a hybrid random–quantum algorithm)
for finding a substring
w of length
m in a binary text
We consider the simplest version of the problem: we assume that the required substring w is guaranteed to occur exactly once in the text .
The problem is reduced to the problem of finding the word w in a vocabulary as follows. Let $n = N - m + 1$. To simplify the presentation, we assume below that the number n (the number of substrings of length m in the text) is a power of 2. Denote by V the sequence composed of all n substrings of length m of the text, taken in the order of their starting positions. We will call V a vocabulary.
Now, the problem of finding the word w in is represented as the problem of finding an index k such that for the vocabulary .
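The reduction from text search to vocabulary search can be illustrated with a short sketch. The function name `build_vocabulary` and the 0-based indexing are assumptions for illustration; the example text is chosen so that n is a power of 2 and the word w occurs exactly once.

```python
def build_vocabulary(text: str, m: int) -> list[str]:
    """Return all substrings of length m, ordered by starting position."""
    n = len(text) - m + 1
    return [text[i:i + m] for i in range(n)]


text = "01101001"   # N = 8
m = 5
V = build_vocabulary(text, m)  # n = N - m + 1 = 4 words
w = "10100"

k = V.index(w)  # index of w in the vocabulary
print(V, k)
```

The index of w in the vocabulary is the same as the starting position of w in the text, so the two formulations of the problem have the same answer.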
3.1. Auxiliary Quantum Procedure (Algorithm 1)
The quantum part of the algorithm is the following auxiliary quantum procedure . Procedure starts with an initial quantum state (quantum vocabulary) constructed from the vocabulary V using the pre-procedure .
Pre-procedure
generates the initial state (quantum vocabulary)
from vocabulary
composed of binary words of length
. The
state has the following structure:
| Algorithm 1 Auxiliary quantum procedure |
Input: Quantum state . Binary word v. Output: The index k, which is interpreted as an index such that . That is, implements the mapping
The following two macro steps, described below in operator form, are applied to the state times. In the literature, these two macro steps are often referred to as the “Grover iteration” [ 1]: The oracle operation of changing the phase of the state representing information about , for which is performed. For a binary sequence , define the Boolean function by the condition if and only if . Let denote the basic state corresponding to the element x, which is one of . In this case, is l—the qubit basis state. Oracle performs the following three actions: Application of the H operator on the auxiliary ( )-th qubit :
Application of the operator on the last qubits:
Application of the H operator on the auxiliary ( )-th qubit :
Note that the auxiliary qubit at the end restores its value (by repeated application of the Hadamard transformation) to the state: Inversion operation. = 2 —operator applied on the first n qubits. The operator is given in matrix form:
Obtaining the result of computation (the output of the ). After running the macro steps above times, we measure the first qubits in the computational basis. The resulting bits are interpreted as a binary representation of the required index k. |
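The effect of the two macro steps on the amplitudes can be simulated classically for small n. The sketch below is an illustration under simplifying assumptions, not the procedure itself: it stores the n amplitudes in a list, applies the oracle as a direct sign flip of the marked amplitude (the auxiliary qubit is not modeled), and implements the inversion operation as reflection about the mean amplitude. The function name `grover_iterate` and the iteration count passed to it are hypothetical choices.

```python
import math


def grover_iterate(n: int, marked: int, r: int) -> list[float]:
    """Return the amplitudes after r Grover iterations on n basis states."""
    amps = [1 / math.sqrt(n)] * n  # uniform initial superposition
    for _ in range(r):
        amps[marked] = -amps[marked]         # oracle: phase flip of the marked state
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]  # inversion about the mean
    return amps


n, marked = 16, 5
amps = grover_iterate(n, marked, r=3)  # 3 is close to (pi / 4) * sqrt(16)

most_likely = max(range(n), key=lambda i: amps[i] ** 2)
p_success = amps[marked] ** 2
print(most_likely, round(p_success, 4))
```

Measuring the state corresponds to sampling an index with probability equal to the squared amplitude, so after a suitable number of iterations the marked index is obtained with probability close to 1.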
3.2. Algorithm (Algorithm 2)
The algorithm consists of two sequentially working parts:
First part: preparing the initial state based on the dictionary .
Second part: reading the search word w and searching for its occurrence in the vocabulary.
We emphasize that the algorithm has two different input sets: and w. These sets are fed to the first and second parts of the algorithm , respectively.
Notations:
For a binary string w of length m, denote by the number represented by w, .
Given a number , denote by its binary representation of length m.
For the vocabulary
, let
denote the following vocabulary:
where
and
is the
p-remainder of
. That is,
, where
and
.
Denote by the set of first d primes.
| Algorithm 2 Algorithm |
Input: For the first part: Vocabulary . For the second part: Binary word w of length m. Output: The index k, which is interpreted as an index such that . That is, implements the mapping
The first part of the algorithm is to prepare the initial state from the vocabulary : First, the algorithm makes a classical random choice: a prime number p is uniformly and randomly selected from the set and the vocabulary
prepared from the vocabulary . The second step consists of the preparation of quantum state composed of
The second part of the algorithm is to read the input word w and find k such that : |
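The classical part of the algorithm (random choice of a prime and hashing of the vocabulary) can be sketched as follows. This is an illustration under stated assumptions: the quantum search of the second part is replaced here by a plain classical scan over the hashed vocabulary, and the names `first_primes`, `fingerprint`, `matches_for_prime`, `hashed_search`, as well as the value d = 50, are hypothetical.

```python
import random


def first_primes(d: int) -> list[int]:
    """Return the first d primes by trial division."""
    primes: list[int] = []
    c = 2
    while len(primes) < d:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def fingerprint(word: str, p: int) -> int:
    """Remainder modulo p of the number represented by the binary word."""
    return int(word, 2) % p


def matches_for_prime(V: list[str], w: str, p: int) -> list[int]:
    """Indices whose fingerprint equals that of w (classical stand-in for the quantum search)."""
    target = fingerprint(w, p)
    return [k for k, v in enumerate(V) if fingerprint(v, p) == target]


def hashed_search(V: list[str], w: str, d: int, rng: random.Random):
    """First part: pick a random prime; second part: search among the hashes."""
    p = rng.choice(first_primes(d))
    return p, matches_for_prime(V, w, p)


V = ["01101", "11010", "10100", "01001"]
w = "10100"
p, matches = hashed_search(V, w, d=50, rng=random.Random(0))
print(p, matches)
```

For a prime that separates w from all other words, the list of matches contains only the true index. For an unlucky prime, extra indices collide with w, which is the event whose probability is bounded in the error analysis.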
3.3. Characteristics of the Algorithm
The following theorem describes the characteristics of the main part of the algorithm (searching for the w).
Theorem 1. For a vocabulary of words of length m, for a word w of length m, algorithm solves the problem of finding index k such that with the following characteristics.
For an arbitrary integer , for an integer , it is true that The proof of Theorem 1 is given in the next section.
4. Proof of Theorem 1
4.1. Space Complexity
We start with the technical lemma we need below.
Lemma 1. For arbitrary , for , for vocabulary generated from vocabulary V and for a word formed from the word w of length m, it is true that Proof. Due to the theorem condition, we select the set
of the first
d primes, where
. Due to Chebyshev’s theorem, there exist constants
such that for all
, the
d-th prime number
satisfies the inequalities
The values of the constants defined by Chebyshev (see for example [
12]) are
That is, for arbitrary
and
, it is true that
□
To estimate the space complexity characteristic
, consider the quantum state
formed on the basis of
First, qubits are needed to encode basis states. Second, according to the Lemma 1, qubits are needed to encode basis states. Thus, we have .
4.2. The Query Complexity
The query complexity of algorithm is completely determined by the query complexity of the auxiliary procedure when takes as input the quantum state . Recall that represents (quantumly) the vocabulary for , with d satisfying the condition of the theorem.
Property 1. Let with . Let be the quantum state generated by , where itself is formed by the vocabulary . Let a word be formed from w. Then, for the , the following is true Proof. Let us consider the amplitude amplification procedure when the procedure is applied to the initial state . Let us put . Let us denote the following numbers by and : is the initial amplitude for the required basic state such that , and are the amplitudes of all other basic states of the initial state :
and .
By virtue of the introduced notation, the state
can be represented in the following form:
After applying successively j times two macrosteps to the initial state , we have the following:
(1) The operation of changing the phase of the state representing information about , for which is performed;
(2) The inversion operations (called in the literature Grover’s iteration), the amplitudes of the
-th state of the
, will be expressed by formulas (see, for example, ref. [
13] for a detailed technical justification for the effects of operators (1), and (2) on
states):
We have:
where (recall that we are considering the case when
k for which
is the only number):
Therefore,
and
can be given as
Further, if . Based on these considerations, the optimal number r of iterations of the search algorithm is determined.
Ref. [
13] shows that the probability of obtaining an incorrect result does not exceed
if you run
of Grover’s iterations in succession. If
n is a sufficiently large number, then
, then
Since one call to the oracle in the algorithm is testing an entire substring, the final value is . □
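The relation between the number of iterations and the error probability can be checked numerically. The sketch below assumes the standard analysis of Grover’s search with a single marked element, as in [13]: an angle θ with sin θ = 1/√n, a number of iterations equal to the integer part of π/(4θ), and success probability sin²((2r + 1)θ). The function name `grover_iterations` is hypothetical.

```python
import math


def grover_iterations(n: int) -> tuple[int, float]:
    """Return (r, p_success) for a search among n items with one marked item."""
    theta = math.asin(1 / math.sqrt(n))
    r = int(math.pi / (4 * theta))                  # number of Grover iterations
    p_success = math.sin((2 * r + 1) * theta) ** 2  # probability of the marked item
    return r, p_success


for n in (16, 256, 4096):
    r, p = grover_iterations(n)
    approx = math.pi / 4 * math.sqrt(n)             # large-n approximation of r
    print(n, r, round(approx, 2), 1 - p <= 1 / n)
```

For each n, the computed number of iterations is within one of (π/4)√n, and the failure probability stays below 1/n, in line with the bound quoted from [13].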
4.3. The Error
The proof of the upper bound for the error probability is based on the following considerations. For V and the desired word w, we split the set of primes into good and bad .
A prime number will be considered good for the V and w if for all such that . That is, the vocabulary with (the “good” vocabulary ) represents the vocabulary V “correctly”, and the vocabulary with (the “bad” vocabulary ) represents vocabulary V “wrongly”.
Then, the error probability
of the algorithm
can be estimated from above as follows:
where
is the probability of choosing
, and
is the error of
when “good” dictionary
is chosen for procedure
.
Note that, if a bad p occurs, it is possible to obtain the correct result when applying the procedure. However, considering all continuations of the procedure as erroneous for bad p, we only increase the error probability.
In the remainder of the proof, we estimate the
and
components of the sum (
6).
Property 2. For , the following is true Proof. Observe that
p is selected uniformly at random from
. So,
. To estimate the
, consider the following. For a pair of distinct numbers
, we denote by
, the set of primes
such that
. For the vocabulary
V and the desired sequence
w, we have that
Note that for an arbitrary pair of distinct numbers
, the following is true
The idea of the proof of inequality (8) is as follows. The number $a$, the absolute value of the difference of the two numbers in the pair, does not exceed $2^m$. Therefore, fewer than m different primes can divide a. For details of the proof of (8), see Lemma 7.4 of [14].
Finally, combining (
7) and (
8), we have that
□
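The counting argument behind this property can be illustrated on a small example. The sketch below lists the "bad" primes for a given vocabulary and word, that is, the primes modulo which some other word collides with w. The names `first_primes` and `bad_primes`, the example vocabulary, and the value d = 50 are assumptions for illustration.

```python
def first_primes(d: int) -> list[int]:
    """Return the first d primes by trial division."""
    primes: list[int] = []
    c = 2
    while len(primes) < d:
        if all(c % p for p in primes if p * p <= c):
            primes.append(c)
        c += 1
    return primes


def bad_primes(V: list[str], w: str, primes: list[int]) -> list[int]:
    """Primes p for which some word v != w has the same remainder as w modulo p."""
    a_w = int(w, 2)
    others = [int(v, 2) for v in V if v != w]
    return [p for p in primes if any((a - a_w) % p == 0 for a in others)]


V = ["01101", "11010", "10100", "01001"]  # n = 4 words of length m = 5
w = "10100"
n, m, d = len(V), len(w), 50

bad = bad_primes(V, w, first_primes(d))
print(bad, len(bad) / d)
```

Each of the n − 1 differences is smaller than 2^m and therefore has fewer than m distinct prime divisors, so the number of bad primes is at most (n − 1)(m − 1). Choosing p uniformly from the first d primes then gives a bad prime with probability at most (n − 1)(m − 1)/d.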
Now let us estimate the second term of the sum (
6). We estimate the
in the condition when we apply the procedure
for the state
. The state
represents (quantumly) the vocabulary
for
. That is, we are in the case where
represents the vocabulary
V “correctly”. This situation satisfies the condition of Grover’s search algorithm. So, immediately we have that
Now, finally, the probability
is estimated based on (
6) and Property 2 and (
9) as follows:
Thus, we have that there are a few “bad variants” of processing the pair and w by the algorithm — their share does not exceed of the total number of possible processings. In the “good case” of processing the pair and w, the probability of erroneous processing is bounded from above by the error probability of the procedure , which is bounded by .
5. Generalization
In this section, we present the algorithm, which is a generalization of the algorithm in terms of a universal family of hash functions.
The concept of universal hashing is defined in [
15] and has been discussed in sufficient detail in a number of papers, see, for example, [
16,
17]. The family
of (
)-functions
for
is called a universal family of hash functions if for some
and for an arbitrary pair
where
.
For our purposes, we extend the definition of the universal family of hash functions as follows.
Definition 1. A family of ()-functions for and will be called a strongly
-universal family of hash (
)-functions
if for each n-subset of the set and an arbitrary word , where .
Note that a strongly -universal family of hash functions is an -universal family of hash functions in the standard sense.
An example of a strongly
-universal family of (
)-functions is the set
of following functions. For
, (
)-function
is determined by the
j-th prime number
as follows:
Here, is the remainder r when is divided by a prime , and is the binary presentation of the number . Clearly, we have that .
The family satisfies the following property.
Property 3. For and , the set of ()-functions forms a family that is a strongly -universal family of hash ()-functions.
Proof. See the proof of Property 2 above. □
We now present the algorithm (Algorithm 3)—generalization of the algorithm—in terms of a strongly -universal family of hash functions.
The algorithm consists of two sequentially working parts:
First part: preparing the initial state based on the dictionary .
Second part: reading the search word w and searching for its occurrence in the dictionary.
We emphasize that the algorithm
has two different input sets:
and
w. These sets are fed to the first and second parts of the algorithm
, respectively.
| Algorithm 3 Algorithm |
Input: For the first part: Vocabulary composed of binary words of length m from the . For the second part: Binary word w of length m. Output: The index k, which is interpreted as an index such that . The first part of the algorithm (preparing the initial state): The second part of the algorithm (searching for the ): |
Now, we have the following statement—a generalization of Theorem 1 for the algorithm based on strongly -universal family of ()-functions.
Theorem 2. For a vocabulary of n words of length m, for a word w of length m algorithm, solves the problem of finding an index k such that , with the following characteristics Proof. The proof of Theorem 2 repeats, word for word, the proof of Theorem 1 with only one amendment: it is based on the general family of strongly -universal ()-functions instead of hash functions of a specific family from Property 3. □
Note that such generalization (algorithm
) works effectively when the length
l of hashes is small. As the algorithm
shows, such algorithms exist—for the algorithm
, the parameter
, see Property 3. This provides an upper bound
6. Conclusions
The paper presents a hybrid classical–quantum algorithm and its generalization—the quantum algorithm for finding the occurrence of the word w in the text . The problem naturally reduces to searching for a word in the vocabulary formed from .
The quantum part of algorithms and is presented by the auxiliary quantum procedure . The quantum procedure is essentially Grover’s quantum search algorithm. Here, we use the original conditions for applying Grover’s algorithm, namely, we consider that in the text , there is (necessarily) a unique occurrence of the word w. It must be said that the calculation of query complexity depends on the direct implementation of the algorithm. In this work, we consider the oracle as an appeal to the whole substring, not to its individual bits.
The main declared result in saving the number of qubits used is achieved in the
and
algorithms. The
and
algorithms use hash family techniques to save quantum space. The
algorithm is based on a specific family of hash functions, known as Freivalds’ fingerprints (see, for example, Chapter 7 of the book [14] for more information). The
algorithm is the generalization of the
for an arbitrary strongly
-universal family of hash functions.
It is important to note that in this paper, as well as in the cited paper [9], the algorithms are applied to a quantum state (the initial state) that is prepared in advance based on the analyzed text
(on the vocabulary
that is formed from the
). Preparing the initial state requires preliminary work, which is not accounted for in the complexity of the algorithms. Note that the preparation of the initial state, as a problem in its own right, is beginning to be discussed in the quantum information retrieval community. In particular, the authors of [18] drew attention to the complexity of preparing the initial state.
Finally, we note once again that in this paper we considered the case when it is known in advance that the desired substring occurs in the text exactly once. The problem can be extended to the case where the number of occurrences is greater than one and known in advance, and to the case where the number of occurrences is not known in advance. Estimates of the time and space complexity of Grover’s search algorithm in these cases are given in [13].