Hybrid classical-quantum text search based on hashing

The paper considers the problem of finding a given substring in a text. It is known that the complexity of a classical search query in an unordered database is linear in the length of the text and of the given substring. At the same time, Grover's quantum search provides a quadratic speedup in the query complexity and gives the correct result with high probability. We propose a hybrid classical-quantum algorithm (more precisely, a hybrid random-quantum algorithm) that uses Grover's search to find a given substring in a text. As expected, the algorithm a) obtains the correct result with high probability and b) achieves a quadratic query speedup compared to the classical one. What is new is that our algorithm uses the technique of universal families of hash functions. As a result, our algorithm is much more memory efficient (in terms of the number of qubits used) than previously known quantum algorithms.

Quantum search algorithms and their generalizations [1,2,3,4,5] are part of a group of quantum algorithms that are of great interest in computer science.
The use of quantum computers can significantly reduce the computation time for models with a large number of weights, or for those requiring an exponentially growing number of combinations. We presented a review of some results in the field of quantum approaches to classification and information retrieval in the papers [6,7].
The task of finding occurrences of a given substring in a text is an important problem of information retrieval. It occurs in a wide range of applications, namely in text editors, search robots, spam filters, bioinformatics, etc. A large number of algorithms (deterministic and probabilistic) for solving the search problem have been developed over the past few decades. Quantum algorithms for solving the search problem have been developed in the last two decades.
Problem. Given a binary sequence $string = b_1 \dots b_N$ of length $N$ and a binary sequence $w$ of length $m$, $m < N$. It is required to find the index of the occurrence of the substring $w$ in the text $string$. Namely, it is required to find an index $k$ such that $w = b_k \dots b_{k+m-1}$.
Related work. The well-known classical Knuth-Morris-Pratt algorithm [8] (1977) solves the problem in linear time $O(m + n)$.
In the early 2000s, a quantum algorithm [9] for searching for a given substring in a text was presented. It achieves a quadratic search speed-up. It returns one of the indexes of the occurrence of the searched substring. The probability of getting the correct answer is strictly greater than 1/2. The query complexity of the algorithm (this measure of complexity is sometimes associated with time complexity) is $O(\sqrt{n}\,\log\sqrt{n/m}\,\log m + \sqrt{m}\,\log^{2} m)$. The authors of [9] do not estimate the memory complexity (the number of qubits used) of their algorithm. An analysis shows that the algorithm of [9] requires $O(\log n + m)$ qubits.
Our contribution. We present a quantum algorithm for finding a pattern in a string. The algorithm is based on Grover's search algorithm and on the hashing technique. It is known that hashing can significantly reduce the space and, as a rule, the time required for searching in databases. The results of the paper demonstrate the potential of universal hashing for building quantum search algorithms.
To simplify the presentation and not to clutter the idea with technical details, we consider the case when it is known in advance that the desired substring occurs in the text exactly once.
More precisely, we propose a hybrid random-quantum search algorithm A that searches a text for a substring and has the following characteristics:
• The A algorithm produces the correct answer with high probability.
• The A algorithm is based on Grover's search. This search is presented in the paper as an auxiliary algorithm A1 and requires $O(\sqrt{n})$ query steps.
• The A algorithm exponentially reduces (compared to the algorithm of [9]) the number of qubits relative to the parameter $m$, the length of the substring. Namely, the algorithm requires $O(\log n + \log m)$ qubits.
The main idea of the paper is the use of the hashing technique to reduce the space complexity of quantum search. The A algorithm is based on a certain universal family of hash functions. The A2 algorithm is a generalization of the result to a general universal family of hash functions.
The paper is organized as follows. In the next section (Section 2), we present the basic model of the quantum search algorithm, basic notation and definitions. In Section 3 we present the main result. We first present the auxiliary quantum procedure A1. Next, we present a quantum search algorithm A based on the hashing technique. Theorem 1 gives an analysis of the A algorithm. In Section 5, we present the A2 algorithm, a generalization of the A algorithm.
The preliminary version of the article was published in Russian [10].

Preliminaries for quantum query algorithm
We refer the reader to [5,11] for an introduction to the basics of quantum query algorithms and the state of research in this area. Here we define the query model of computation in a way that is convenient for us. We use the notation defined in detail in [6,7].
The operations applied to quantum $s$-qubit states $|\psi\rangle \in (\mathcal{H}^2)^{\otimes s}$ are mathematically expressed using unitary operators: $|\psi'\rangle = U|\psi\rangle$, where $U$ is a $2^s \times 2^s$ unitary matrix.
Query model. For finite sets $X$ and $Y$, let $g : X \to Y$ be a discrete function. A quantum query algorithm $A$ computing the function $g$ begins in a quantum state $|\psi_{start}\rangle$ and applies a sequence of operators $O_X, U, \dots, O_X, U$. The operator $O_X$ depends on the input $X$. The quantum community calls the operator $O_X$ an "oracle". An application of the oracle is called a query of the algorithm $A$ to the initial data $X$. The operator $U$ does not depend on $X$.
The algorithm computes the value $g(\sigma)$ of the discrete function $g$ for $\sigma \in X$ if, in the process of computing on the input value $\sigma$, the initial state $|\psi_{start}\rangle$ goes to the final state $|\psi(g(\sigma))\rangle = U O_\sigma \cdots U O_\sigma |\psi_{start}\rangle$. The final state allows the value of $g(\sigma)$ to be extracted by measuring the state $|\psi(g(\sigma))\rangle$.
Search in an unordered database. The quantum search algorithm in an unordered database of $n$ elements that contains exactly one element of interest to us (Grover's algorithm) is a special case of a query algorithm. The following algorithm scheme underlies many generalizations.
1. Initialize a quantum system of $O(\log n)$ qubits into a state $|\psi_{start}\rangle$ containing information about the database. The state $|\psi_{start}\rangle$ is constructed so that each of the $2^{O(\log n)}$ basis quantum states represents the required information about the $n$ database elements.

Basic operations on qubits for quantum search. The potential advantages of quantum algorithms on which the results of this work are based lie in the possibility of implementing quantum operators of dimension $2^s \times 2^s$ by basic operations that act on a small number of qubits, of order $O(s)$, and in a small number of quantum computing steps (that is, by a circuit of basic elements that is not complex).
The main operations are $I$, $X$, $Z$, $H$.
• $X$ is the NOT operator. It changes the state of a qubit from $|0\rangle$ to $|1\rangle$, and vice versa.
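As a minimal illustration of these single-qubit operations, the following sketch writes the standard matrices of $I$, $X$, $Z$ and $H$ in plain Python and applies them to basis states. The helper name `apply` and the list-based representation of a state are assumptions made for this example only; they are not notation from the paper.

```python
import math

S = 1 / math.sqrt(2)

# Standard 2x2 matrices of the basic single-qubit operations.
I = [[1, 0], [0, 1]]    # identity
X = [[0, 1], [1, 0]]    # NOT: swaps |0> and |1>
Z = [[1, 0], [0, -1]]   # phase flip: negates the amplitude of |1>
H = [[S, S], [S, -S]]   # Hadamard transform

KET0 = [1.0, 0.0]       # amplitudes of |0>
KET1 = [0.0, 1.0]       # amplitudes of |1>

def apply(gate, state):
    """Apply a 2x2 gate to a single-qubit state [amp of |0>, amp of |1>]."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]

minus = apply(H, KET1)       # (|0> - |1>) / sqrt(2)
restored = apply(H, minus)   # applying H a second time returns |1>
```

The last two lines mirror how the auxiliary qubit is used in the oracle: one Hadamard transform prepares it, and a second one restores it to $|1\rangle$.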
Characteristics of the A algorithm. The characteristics of a quantum query algorithm are the memory used, the number of queries to the analyzed data, and the probability of an error.

Size complexity (Memory complexity).
The number $S(A)$ of qubits used is a measure of the memory complexity of the quantum algorithm $A$. We denote by $S_A(string, w)$ the number of qubits used by the algorithm $A$ to solve the problem of finding the substring $w$ in the $string$. By $S_A(n, m)$ we denote the maximum of the numbers $S_A(string, w)$ over all $string$ and $w$ with parameters $n$ and $m$.

Query complexity.
The number $Q(A)$ of queries (the number of oracle applications) is a "query" measure of the complexity of the quantum algorithm $A$. Note that in [11], one request to the oracle tests one variable (one bit). In our work, by contrast, one request to the oracle tests the entire string $w$ (the hash value of $w$). We denote by $Q_A(string, w)$ the number of applications of the oracle operator by the algorithm $A$. By $Q_A(n, m)$ we denote the maximum of the numbers $Q_A(string, w)$ over all $string$ and $w$ with parameters $n$ and $m$.

Error probability.
We denote by $Er_A(string, w)$ the probability of the following event: the algorithm $A$, as a result of solving the problem of finding the substring $w$ in the text $string$, outputs a position $k$ in the text $string$ such that $w_k \neq w$. By $Er_A(n, m)$ we denote the maximum of the numbers $Er_A(string, w)$ over all $string$ and $w$ with parameters $n$ and $m$.

3 Algorithms for finding the index of occurrence of a substring in the text
In this section, we present a hybrid classical-quantum algorithm (more precisely, a hybrid random-quantum algorithm) A for finding a substring $w$ of length $m$ in a binary text $string$ of length $N$. We consider the simplest version of the problem: we assume that the required substring $w$ is guaranteed to occur exactly once in the text $string$.
The problem is reduced to the problem of finding a word $w$ in a vocabulary as follows. Let $n = N + 1 - m$. To simplify the presentation, we assume below that the number $n$ (the number of substrings of length $m$ in the $string$) is a power of 2. Denote by $V(string, m)$ the sequence composed of all substrings of length $m$ of the $string$: $V(string, m) = \{w_0, \dots, w_{n-1}\}$, where $w_k = b_{k+1} \dots b_{k+m}$ for $0 \le k \le n - 1$. We will call $V(string, m)$ a vocabulary. Now the problem of finding the word $w$ in the $string$ is represented as the problem of finding an index $k$ such that $w = w_k$ for the vocabulary $V(string, m)$.
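The reduction above can be sketched classically as follows. The function names `vocabulary` and `find_index` are hypothetical and chosen only for this illustration; the index $k$ is 0-based, matching $w_k = b_{k+1} \dots b_{k+m}$.

```python
def vocabulary(text: str, m: int) -> list:
    """Return [w_0, ..., w_{n-1}]: all length-m substrings of text, n = N + 1 - m."""
    n = len(text) + 1 - m
    return [text[k:k + m] for k in range(n)]

def find_index(text: str, w: str) -> int:
    """Classical reference search: the index k with w == w_k.

    Assumes, as in the paper, that w occurs in the text exactly once.
    """
    return vocabulary(text, len(w)).index(w)

text = "0110100"               # N = 7
vocab = vocabulary(text, 4)    # n = 7 + 1 - 4 = 4 words of length 4
k = find_index(text, "1010")   # position of the word 1010 in the vocabulary
```

This linear scan is only a baseline against which the quantum procedure can be compared; it plays no role in the algorithm itself.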

Auxiliary quantum procedure A1.
The quantum part of the algorithm is the following auxiliary quantum procedure A1. Procedure A1 starts with an initial quantum state $|string\rangle$ (a quantum vocabulary) constructed from the vocabulary $V$ using the pre-procedure P.
Pre-procedure P generates the initial state (quantum vocabulary) $|string\rangle$ from a vocabulary $V = \{v_0, \dots, v_{n-1}\}$ composed of binary words of length $l \ge 1$. The state $|string\rangle$ has the following structure:
$$|string\rangle = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} |k\rangle \otimes |v_k\rangle \otimes |1\rangle.$$

Description of the procedure A1
Input: Quantum state $|string\rangle$. Binary word $v$.
Output: The index $k$, which is interpreted as an index such that $v = v_k$.
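The following two macro steps, described below in operator form, are applied to the state $|string\rangle$ $\frac{\pi}{4}\sqrt{n}$ times. In the literature, these two macro steps are often referred to as the "Grover iteration" [1].
• The oracle $O_{f_v}$: the operation of changing the phase of the state representing information about the $v_k$ for which $v_k = v$. Let $|x\rangle$ denote the basis state corresponding to the element $x$, which is one of the $v_k$. In this case, $|x\rangle$ is an $l$-qubit basis state. Let $f_v(x) = 1$ if $x = v$ and $f_v(x) = 0$ otherwise. The oracle $O_{f_v}$ performs the following three actions:
1. Application of the $H$ operator to the auxiliary $(\log n + l + 1)$-th qubit $|1\rangle$: $|k\rangle|x\rangle|1\rangle \to |k\rangle|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}}$.
2. Application of the operator $U_{f_v} : |x\rangle|y\rangle \to |x\rangle|y \oplus f_v(x)\rangle$ to the last $l + 1$ qubits: $|k\rangle|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} \to (-1)^{f_v(x)} |k\rangle|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}}$.
3. Application of the $H$ operator to the auxiliary $(\log n + l + 1)$-th qubit: $(-1)^{f_v(x)} |k\rangle|x\rangle \frac{|0\rangle - |1\rangle}{\sqrt{2}} \to (-1)^{f_v(x)} |k\rangle|x\rangle|1\rangle$.
Note that at the end the auxiliary qubit is restored (by the repeated application of the Hadamard transformation) to the state $|1\rangle$.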
• Inversion operation. The operator $D = 2R - I^{\otimes \log n}$ is applied to the first $\log n$ qubits. The operator $R$ is given in matrix form as the $n \times n$ matrix all of whose entries are equal to $1/n$.

Getting the result of the computation (the output of A1). After running the macro steps above $\frac{\pi}{4}\sqrt{n}$ times, we measure the first $\log n$ qubits in the computational basis. The resulting $\log n$ bits are interpreted as the binary representation of the required index $k$.
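The effect of procedure A1 on the amplitudes can be checked with a small classical simulation. The sketch below is an illustration under stated assumptions, not the authors' implementation: it tracks only the $n$ amplitudes of the index register (the value register and the auxiliary qubit are unchanged apart from the phase), and it returns the most probable index instead of sampling a measurement, so that the result is deterministic. The name `grover_a1` is hypothetical.

```python
import math

def grover_a1(vocab, v):
    """Simulate the Grover iterations of A1 on the amplitudes of |k>.

    vocab: list of words v_0..v_{n-1}; v: the searched word.
    Returns (most probable index, its measurement probability).
    """
    n = len(vocab)
    amps = [1 / math.sqrt(n)] * n                    # uniform initial state
    iterations = int(math.pi / 4 * math.sqrt(n))     # about (pi/4) * sqrt(n)
    for _ in range(iterations):
        # Oracle: multiply by -1 the amplitude of every k with v_k == v.
        amps = [-a if x == v else a for a, x in zip(amps, vocab)]
        # Inversion D = 2R - I: reflect each amplitude about the mean.
        mean = sum(amps) / n
        amps = [2 * mean - a for a in amps]
    probs = [a * a for a in amps]
    k = max(range(n), key=probs.__getitem__)
    return k, probs[k]

vocab = ["000", "011", "101", "110", "001", "111", "010", "100"]
k, prob = grover_a1(vocab, "110")   # n = 8, so two Grover iterations are run
```

For this vocabulary of $n = 8$ words, two iterations already concentrate more than 90% of the probability on the index of the searched word.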

Algorithm A.
The algorithm consists of two sequentially working parts:
• First part: preparing the initial state based on the vocabulary $V(string, m)$.
• Second part: reading the search word w and searching for its occurrence in the vocabulary.
We emphasize that the algorithm A has two different inputs: $V(string, m)$ and $w$. These are fed to the first and second parts of the algorithm A, respectively.

Notations.
• For a binary string $w$ of length $m$, denote by $a(w)$ the number represented by $w$, $0 \le a(w) \le 2^m - 1$.
• For the vocabulary $V(string, m)$, let $V_p$ denote the vocabulary $V_p = \{v(w_0), \dots, v(w_{n-1})\}$, where $v(w_k) = bin(r(w_k)_p)$ and $r(w_k)_p$ is the remainder of $a(w_k)$ modulo $p$. That is, $a(w_k) = cp + r(w_k)_p$, where $c \ge 0$ and $r(w_k)_p \in \{0, \dots, p - 1\}$.
• Denote by $P_d = \{p_1, \dots, p_d\}$ the set of the first $d$ primes.
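The notation above can be made concrete with a short sketch. The function names `first_primes`, `a` and `hashed_vocabulary` are hypothetical, and padding every $bin(r)$ to a fixed width is an assumption made here so that all hashed words have the same length.

```python
def first_primes(d):
    """Return the set P_d as a list of the first d primes (trial division)."""
    primes = []
    cand = 2
    while len(primes) < d:
        is_prime = True
        for p in primes:
            if p * p > cand:
                break
            if cand % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(cand)
        cand += 1
    return primes

def a(w):
    """The number a(w) represented by the binary string w."""
    return int(w, 2)

def hashed_vocabulary(vocab, p, width):
    """Return V_p: each word w_k replaced by bin(a(w_k) mod p), zero-padded."""
    return [format(a(w) % p, "0{}b".format(width)) for w in vocab]

vocab = ["0110", "1101", "1010", "0100"]    # a(w_k) = 6, 13, 10, 4
v_p = hashed_vocabulary(vocab, 7, 3)        # remainders modulo 7: 6, 6, 3, 4
```

In this example the words 0110 and 1101 collide modulo 7, but the word 1010 keeps a unique hash value, so $p = 7$ still represents the vocabulary correctly with respect to a search for 1010.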
Description of the algorithm A.

Input:
For the first part: Vocabulary V (string, m).
For the second part: Binary word w of length m.
Output: The index k, which is interpreted as an index such that w = w k .
That is, A implements the mapping $A : V(string, m), w \longrightarrow k$.
The first part of the algorithm is to prepare the initial state from the vocabulary $V(string, m)$:
• First, the algorithm makes a classical random choice: a prime number $p$ is selected uniformly at random from the set $P_d$, and the vocabulary $V_p = \{v(w_0), \dots, v(w_{n-1})\}$ is prepared from the vocabulary $V(string, m)$.
• The second step consists of preparing the quantum state $|string, p\rangle$ composed of $V_p$:
$$|string, p\rangle = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} |k\rangle \otimes |v(w_k)\rangle \otimes |1\rangle.$$
The second part of the algorithm is to read the input word $w$ and find $k$ such that $w = w_k$:
• The algorithm reads the input word $w$ and prepares $v(w) = bin(r(w)_p)$.
• The quantum stage of the algorithm A: the quantum procedure A1 is applied with the input $|string, p\rangle$ and the word $v(w)$. A1 implements the mapping $A1 : |string, p\rangle, v(w) \longrightarrow k$ for the state $|string, p\rangle$ and the search word $v(w)$. The number $k$ is the result of measuring the first $\log n$ qubits. The number $k$ is declared to be the required index of the word $w_k$ such that $w = w_k$.
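The two parts of the algorithm A can be sketched end to end in classical code. This is a simplified illustration under loud assumptions: the quantum procedure A1 is replaced by a classical stand-in that returns the first index whose hash matches, and the names `search_with_prime` and `algorithm_a` are hypothetical.

```python
import random

def first_primes(d):
    """Return the first d primes by trial division."""
    primes, cand = [], 2
    while len(primes) < d:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

def search_with_prime(text, w, p):
    """Run both parts of A for a fixed prime p (A1 replaced by a linear scan)."""
    m = len(w)
    n = len(text) + 1 - m
    # First part: the hashed vocabulary V_p (remainders stand for bin(r(w_k)_p)).
    hashed = [int(text[k:k + m], 2) % p for k in range(n)]
    # Second part: hash the searched word and look it up in V_p.
    target = int(w, 2) % p
    return hashed.index(target)   # classical stand-in for procedure A1

def algorithm_a(text, w, c=3, rng=random):
    """Choose p uniformly at random from P_d with d = c*n*m, then search."""
    m = len(w)
    n = len(text) + 1 - m
    p = rng.choice(first_primes(c * n * m))
    return search_with_prime(text, w, p)

good = search_with_prime("0110100", "1010", 7)   # 7 separates 1010 from the rest
bad = search_with_prime("0110100", "1010", 3)    # 3 does not: 13 = 10 (mod 3)
```

The second call shows what a "bad" prime does: the word 1101 has the same remainder modulo 3 as the searched word, so the search can report the wrong index.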

Characteristics of the algorithm A
The following theorem describes the characteristics of the main part of the A algorithm (searching for $w$).
Theorem 1. For a vocabulary $V = V(string, m) = \{w_0, \dots, w_{n-1}\}$ of words of length $m$ and for a word $w$ of length $m$, the algorithm A solves the problem of finding the index $k$ such that $w = w_k$ with the following characteristics. For an arbitrary integer $c \ge 3$ and for the integer $d = cnm$, it is true that
$$S_A(n, m) = O(\log n + \log m), \qquad Q_A(n, m) = O(\sqrt{n}), \qquad Er_A(n, m) \le \frac{1}{c} + \frac{1}{n}.$$
The proof of Theorem 1 is given in the next section.
4 Proof of Theorem 1
Proof. Let us consider the amplitude amplification procedure when the procedure A1 is applied to the initial state $|string, p\rangle$. Let us put $|string_0\rangle = |string, p\rangle$. We denote by $\alpha_0$ and $\beta_0$ the following numbers: $\alpha_0$ is the initial amplitude of the required basis state $|k\rangle|v(w_k)\rangle|1\rangle$ such that $v(w_k) = v(w)$, and $\beta_0$ is the amplitude of each of the other basis states of the initial state $|string_0\rangle$.
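By virtue of the introduced notation, the state $|string_0\rangle$ can be represented in the following form:
$$|string_0\rangle = \alpha_0 |k\rangle|v(w_k)\rangle|1\rangle + \beta_0 \sum_{i \ne k} |i\rangle|v(w_i)\rangle|1\rangle.$$
The following two macro steps are applied successively to the initial state $|string_0\rangle$: 1) the operation of changing the phase of the state representing information about the $w_k$ for which $v(w_k) = v(w)$, and 2) the inversion operation (together called Grover's iteration in the literature). After each application, the amplitudes of the state $|string_{j+1}\rangle$ are expressed through the amplitudes of the state $|string_j\rangle$ by the formulas (see, for example, [12] for a detailed technical justification of the effects of operators 1) and 2) on the states $|string_j\rangle$):
$$\alpha_{j+1} = \frac{n-2}{n}\,\alpha_j + \frac{2(n-1)}{n}\,\beta_j, \qquad \beta_{j+1} = \frac{n-2}{n}\,\beta_j - \frac{2}{n}\,\alpha_j.$$
We have: $\alpha_0 = \sin\theta$ and $\beta_0 = \frac{1}{\sqrt{n-1}}\cos\theta$, where $\sin^2\theta = 1/n$ (recall that we are considering the case when the index $k$ for which $v(w_k) = v(w)$ is unique).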
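Therefore, $\alpha_j$ and $\beta_j$ can be given as
$$\alpha_j = \sin((2j+1)\theta), \qquad \beta_j = \frac{1}{\sqrt{n-1}}\cos((2j+1)\theta).$$
Further, $\alpha_r = 1$ if $(2r + 1)\theta = \pi/2$. Based on these considerations, the optimal number of iterations of the search algorithm is determined as $r = (\pi - 2\theta)/(4\theta)$. It is shown in [12] that the probability of getting an incorrect result does not exceed $1/n$ if $\lfloor \pi/(4\theta) \rfloor$ Grover iterations are run in succession. If $n$ is a sufficiently large number, then $\theta \approx \sin\theta = 1/\sqrt{n}$, so the number of iterations is about $\frac{\pi}{4}\sqrt{n}$. Since one call to the oracle in the algorithm tests an entire substring, the final value is $Q_{A1}(n, m) = O(\sqrt{n})$.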
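The iteration count and the resulting success probability can be evaluated numerically. The following sketch assumes the standard closed form for a single marked element from [12], $\sin\theta = 1/\sqrt{n}$ and success probability $\sin^2((2r+1)\theta)$ after $r$ iterations; the function name `grover_schedule` is hypothetical.

```python
import math

def grover_schedule(n):
    """Return (theta, r, success probability) for one marked item among n."""
    theta = math.asin(1 / math.sqrt(n))        # sin(theta) = 1 / sqrt(n)
    r = int(math.pi / (4 * theta))             # floor(pi / (4 * theta)) iterations
    p_success = math.sin((2 * r + 1) * theta) ** 2
    return theta, r, p_success

for n in (8, 64, 1024):
    theta, r, p = grover_schedule(n)
    # r stays below (pi/4) * sqrt(n) and the failure probability below 1/n.
    print(n, r, round(1 - p, 6), r <= math.pi / 4 * math.sqrt(n), 1 - p <= 1 / n)
```

For each of the three sizes, both printed checks should hold, in agreement with the $1/n$ bound on the error of A1 quoted from [12].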

The error $Er_A(n, m)$
The proof of the upper bound for the error probability $Er_A(V, w)$ is based on the following considerations. For $V$ and the desired word $w$, we split the set $P_d$ of primes into good primes $P_{good}$ and bad primes $P_{bad}$.
A prime number $p \in P_d$ will be considered good for $V$ and $w$ if $r(w)_p \neq r(w_j)_p$ for all $w_j \in V$ such that $w \neq w_j$. That is, the vocabulary $V_p$ with $p \in P_{good}$ (the "good" vocabulary $V_p$) represents the vocabulary $V$ "correctly", and the vocabulary $V_p$ with $p \in P_{bad}$ (the "bad" vocabulary $V_p$) represents the vocabulary $V$ "wrongly".
Then the error probability $Er_A(V, w)$ of the algorithm A can be estimated from above as follows:
$$Er_A(V, w) \le Pr_{bad} + Er_{A1}, \qquad (6)$$
where $Pr_{bad}$ is the probability of choosing $p \in P_{bad}$ and $Er_{A1}$ is the error of A1 when a "good" vocabulary $V_p$ is chosen for procedure A1.
Note that if a bad $p$ occurs, it is still possible to obtain the correct result when applying the A1 procedure. However, by considering all continuations of the procedure A1 as erroneous for bad $p$, we only increase the error probability.
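In the remainder of the proof we estimate the $Pr_{bad}$ and $Er_{A1}$ components of the sum (6). For a pair of distinct numbers $a_1, a_2 \in \{0, \dots, 2^m - 1\}$, let $P_{a_1,a_2}$ denote the set of primes $p \in P_d$ such that $a_1 \equiv a_2 \pmod{p}$. For the vocabulary $V$ and the desired sequence $w$, we have that
$$P_{bad} \subseteq \bigcup_{w_j \in V,\; w_j \neq w} P_{a(w),\,a(w_j)}. \qquad (7)$$
Note that for an arbitrary pair of distinct numbers $a_1, a_2 \in \{0, \dots, 2^m - 1\}$ the following is true:
$$|P_{a_1,a_2}| \le m. \qquad (8)$$
The idea of proving the inequality (8) is as follows. The number $a = |a_1 - a_2|$ does not exceed $2^m$. Therefore, fewer than $m$ different primes can divide $a$. For details of the proof of (8), see Lemma 7.4 in [13].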
Finally, combining (7) and (8), we have that
$$Pr_{bad} = \frac{|P_{bad}|}{|P_d|} \le \frac{(n-1)\,m}{c\,n\,m} \le \frac{1}{c}.$$
✷
Now let us estimate the second term of the sum (6). We estimate $Er_{A1}$ under the condition that the procedure A1 is applied to the state $|string, p\rangle$. The state $|string, p\rangle$ represents (quantumly) the vocabulary $V_p = \{v(w_0), \dots, v(w_{n-1})\}$ for $p \in P_{good}$. That is, we are in the case where $V_p$ represents the vocabulary $V$ "correctly". This situation satisfies the condition of Grover's search algorithm. So we immediately have that
$$Er_{A1} \le \frac{1}{n}. \qquad (9)$$
Now, finally, the probability $Er_A(string, w)$ is estimated based on (6), Property 2, and (9) as follows:
$$Er_A(string, w) \le \frac{1}{c} + \frac{1}{n}.$$
Thus, there are few "bad variants" of processing the pair $string$ and $w$ by the A algorithm: their share does not exceed $1/c$ of the total number of possible processings. In the "good case" of processing the pair $string$ and $w$, the probability of erroneous processing is bounded from above by the error probability of the procedure A1, which is bounded by $1/n$.
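The bound on the share of bad primes can be checked on a small example. The sketch below enumerates $P_d$ for $d = cnm$ and lists the primes for which some word $w_j \neq w$ collides with $w$; the names `first_primes` and `bad_primes` are hypothetical, and the example text is an arbitrary choice for illustration.

```python
def first_primes(d):
    """Return the first d primes by trial division."""
    primes, cand = [], 2
    while len(primes) < d:
        if all(cand % p for p in primes):
            primes.append(cand)
        cand += 1
    return primes

def bad_primes(text, w, c=3):
    """Return (list of bad primes in P_d, their share), where d = c*n*m."""
    m = len(w)
    n = len(text) + 1 - m
    a_w = int(w, 2)
    # Numbers a(w_j) of all vocabulary words different from w.
    others = [int(text[k:k + m], 2) for k in range(n) if text[k:k + m] != w]
    primes = first_primes(c * n * m)
    # p is bad if a(w) and some a(w_j) have the same remainder modulo p.
    bad = [p for p in primes if any((a_w - a_j) % p == 0 for a_j in others)]
    return bad, len(bad) / len(primes)

bad, share = bad_primes("0110100", "1010", c=3)   # n = 4, m = 4, d = 48
```

Here the differences between $a(w) = 10$ and the other words are 4, 3 and 6, so only the primes 2 and 3 are bad, a share of 2/48, well below $1/c = 1/3$.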

Generalization
In this section, we present the A2 algorithm, which is a generalization of the A algorithm in terms of a universal family of hash functions.
The concept of universal hashing is defined in [14] and has been discussed in sufficient detail in a number of papers; see, for example, [15] and [16]. The family $F = \{f_1, \dots, f_d\}$ of $(m, l)$-functions $f : \{0,1\}^m \to \{0,1\}^l$ for $l < m$ is called a universal family of hash functions if for some $\epsilon \in [0, 1]$ and for an arbitrary pair $v, w \in \{0,1\}^m$ with $v \neq w$ it holds that $\Pr_{f \in F}[f(v) = f(w)] \le \epsilon$. For our purposes, we extend the definition of the universal family of hash functions as follows.
A family $F = \{f_1, \dots, f_d\}$ of $(m, l)$-functions, for $n \ge 1$ and $\epsilon \in [0, 1]$, will be called a strongly $(n, \epsilon)$-universal family of hash $(m, l)$-functions if for each $n$-subset $Set = \{v_1, \dots, v_n\}$ of the set $\{0,1\}^m$ and an arbitrary word $w \in \{0,1\}^m$ it holds that $\Pr_{f \in F}[\exists\, v \in Set,\ v \neq w : f(v) = f(w)] \le \epsilon$. Note that a strongly $(1, \epsilon)$-universal family of hash functions is an $\epsilon$-universal family of hash functions in the standard sense.
Consider the family $F = \{f_1, \dots, f_d\}$ with $f_j(w) = bin(r(w)_{p_j})$ for $p_j \in P_d$. Here $r(w)_{p_j}$ is the remainder when $a(w)$ is divided by the prime $p_j$, and $bin(r(w)_{p_j})$ is the binary representation of the number $r(w)_{p_j}$. Clearly, we have that $l \le \log p_d$.
The family $F$ satisfies the following property: for $d = cnm$, it is a strongly $(n, 1/c)$-universal family of hash functions. Proof. For the proof, see Property 2 above. ✷ We now present the A2 algorithm, a generalization of the A algorithm, in terms of a strongly $(n, \epsilon)$-universal family of hash functions.
algorithm, namely, we consider that in the text $string$ there is (necessarily) a unique occurrence of the word $w$. It must be said that the calculation of the query complexity depends on the direct implementation of the algorithm. In this work, we treat one oracle call as a query on the whole substring, not on its individual bits.
The main declared result, the saving in the number of qubits used, is achieved in the A and A2 algorithms. The A and A2 algorithms use hash family techniques to save quantum space. The A algorithm is based on a specific family of hash functions. This specific family of hash functions is known as Freivalds' fingerprints (see, for example, the book [13], Chapter 7, for more information). The A2 algorithm is the generalization of A to an arbitrary strongly $(n, \epsilon)$-universal family of hash functions.
It is important to note that in this paper, as well as in the cited paper [9], the algorithms are applied to a quantum state (the initial state) that is prepared in advance based on the analyzed text $string$ (on the vocabulary $V(string)$ that is formed from the $string$). The preparation of the initial state requires preliminary work, but this work is not taken into account in the algorithms. Note that the problem of preparing the initial state, as a separate problem, is beginning to be discussed in the community dealing with quantum information search. In particular, the authors of [17] drew attention to the problem of the complexity of preparing the initial state.
Finally, once again, we note that in this paper we considered the case when it is known in advance that the desired substring occurs in the text exactly once.The problem can be expanded if the number of occurrences is greater than 1 and is known in advance, and also if the number of occurrences of the substring is not known in advance.An estimate of the time and space complexity of Grover's search algorithm in these cases is described in [12].

4.1 Space complexity $S_A(n, m)$.
We start with the technical lemma we need below.
Lemma 1. For arbitrary $p \in P_d$ with $d = cnm$, for the vocabulary $V_p = \{v(w_0), \dots, v(w_{n-1})\}$ generated from the vocabulary $V$, and for a word $v(w)$ formed from the word $w$ of length $m$, it is true that $|v(w)| = O(\log n + \log m)$ and $|v(w_k)| = O(\log n + \log m)$.
Proof. By the condition of the theorem, we select the set $P_d$ of the first $d$ primes, where $d = cnm$. By Chebyshev's theorem, there exist constants $0 < a < A$ such that for all $d \ge 2$ the $d$-th prime number $p_d$ satisfies the inequalities $a\,d \ln d < p_d < A\,d \ln d$. That is, for arbitrary $p \in P_d$ and $r \le p$ it is true that $|bin(r)| \le \log(A\,d \ln d) = O(\log n + \log m)$. ✷
To estimate the space complexity $S_A(n, m)$, consider the quantum state $|string, p\rangle$ formed on the basis of $V_p$:
$$|string, p\rangle = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} |k\rangle \otimes |v(w_k)\rangle \otimes |1\rangle.$$
First, $\log n$ qubits are needed to encode the basis states $|k\rangle$. Second, according to Lemma 1, $O(\log n + \log m)$ qubits are needed to encode the basis states $|v(w_k)\rangle$. Thus, we have $S_A(n, m) = O(\log n + \log m)$.
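The qubit count of Lemma 1 can be illustrated numerically. The sketch below compares the size of the register when hashed words are stored ($\log n$ index qubits, the bit length of $p_d$, and one auxiliary qubit) with the size when the $m$-bit words themselves are stored. The second figure is only an illustrative baseline assumed for this comparison, and the names `nth_prime` and `qubit_counts` are hypothetical.

```python
def nth_prime(d):
    """Return the d-th prime p_d by trial division."""
    primes, cand = [], 2
    while len(primes) < d:
        is_prime = True
        for p in primes:
            if p * p > cand:
                break
            if cand % p == 0:
                is_prime = False
                break
        if is_prime:
            primes.append(cand)
        cand += 1
    return primes[-1]

def qubit_counts(n, m, c=3):
    """Return (qubits with hashed words, qubits with unhashed m-bit words)."""
    index_qubits = n.bit_length() - 1        # log2(n); n is a power of 2
    l = nth_prime(c * n * m).bit_length()    # bits sufficient for any r < p_d
    hashed = index_qubits + l + 1            # |k>, |v(w_k)>, auxiliary qubit
    plain = index_qubits + m + 1             # baseline: store w_k itself
    return hashed, plain

hashed, plain = qubit_counts(16, 32)         # d = 3 * 16 * 32 = 1536
```

For $n = 16$ and $m = 32$ the hashed register is roughly half the size of the unhashed one, and the gap widens as $m$ grows because the hash length depends on $m$ only logarithmically.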

4.2 The query complexity $Q_A(n, m)$.
The query complexity $Q_A(n, m)$ of the algorithm A is completely determined by the query complexity $Q_{A1}(n, m)$ of the auxiliary procedure A1 when A1 takes as input the quantum state $|string, p\rangle$. Recall that $|string, p\rangle$ represents (quantumly) the vocabulary $V_p$ for $p \in P_d$ with $d$ satisfying the condition of the theorem.
Property 1. Let $p \in P_d$ with $d = cnm$. Let $|string, p\rangle$ be the quantum state generated by $V_p$, where $V_p$ itself is formed from the vocabulary $V(string, m)$. Let the word $v(w)$ be formed from $w$. Then for $A1(|string, p\rangle, v(w))$ the following is true: $Q_{A1}(n, m) = O(\sqrt{n})$.
2. The following $O(\sqrt{n})$ "macro" steps are performed on the state $|\psi_{start}\rangle$:
• The oracle $O_X$ is used. It recognizes the basis state of interest to us and multiplies its amplitude by $-1$.
• The operator $U$ performs the inversion about the average value of all amplitudes.
Observe that $p$ is selected uniformly at random from $P_d$. So, $Pr_{bad} = |P_{bad}|/|P_d|$. To estimate $|P_{bad}|$, consider the following. For a pair of distinct numbers $a_1, a_2 \in \{0, \dots, 2^m - 1\}$, we denote by $P_{a_1,a_2}$ the set of primes $p \in P_d$ such that $a_1 \equiv a_2 \pmod{p}$.