Hybrid Classical–Quantum Text Search Based on Hashing

Ablayev, Farid; Salikhova, Nailya; Ablayev, Marat

doi:10.3390/math12121858

by Farid Ablayev ^*,†, Nailya Salikhova ^*,† and Marat Ablayev ^*,†

Institute of Computational Mathematics and Information Technology, Kazan Federal University, Kazan 420008, Russia

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Mathematics 2024, 12(12), 1858; https://doi.org/10.3390/math12121858

Submission received: 19 April 2024 / Revised: 31 May 2024 / Accepted: 8 June 2024 / Published: 14 June 2024

(This article belongs to the Special Issue Quantum Algorithms and Quantum Computing)

Abstract

The paper considers the problem of finding a given substring in a text. It is known that the complexity of a classical search query in an unordered database is linear in the length of the text and a given substring. At the same time, Grover’s quantum search provides a quadratic speed-up in the complexity of the query and gives the correct result with a high probability. We propose a hybrid classical–quantum algorithm (hybrid random–quantum algorithm, to be more precise) that implements Grover’s search to find a given substring in a text. As expected, the algorithm works (a) with a high probability of obtaining the correct result and (b) with a quadratic query acceleration compared to the classical one. What is new is that our algorithm uses the uniform hash family functions technique. As a result, our algorithm is much more memory efficient (in terms of the number of qubits used) compared to previously known quantum algorithms.

Keywords: text search; quantum hashing; Grover’s algorithm; prime numbers; strongly (n,ϵ)-universal family of hash functions

MSC: 68Q12

1. Introduction

Quantum search algorithms and their generalizations [1,2,3,4,5] are part of a group of quantum algorithms that are of great interest in computer science.

The use of quantum computers can significantly reduce the computation time for models with a large number of weights, or those requiring an exponentially growing number of combinations. We presented several results in the field of quantum approaches to classification and information retrieval in the papers [6,7].

The task of finding occurrences of a given substring in a text is an important problem of information retrieval. It occurs in a wide range of applications, namely, in text editors, in search robots, spam filters, bioinformatics, etc. A large number of algorithms (deterministic and probabilistic) for solving the search problem have been developed over the past few decades. Quantum algorithms for solving the search problem have been developed in the last two decades.

1.1. Problem

We are given a binary sequence $string = b_{1} \dots b_{N}$ of length N and a binary sequence w of length m, $m < N$. Let $n = N + 1 - m$ denote the number of substrings of length m in the text $string$. It is required to find the index of the occurrence of the substring w in the text $string$. Namely, it is required to find an index k, $0 \leq k \leq n - 1$, such that $w = b_{k + 1} \dots b_{k + m}$.

1.2. Related Work

The classical Knuth–Morris–Pratt algorithm [8] (1977) solves the problem in linear time $O (m + n)$.

In the early 2000s, a quantum algorithm [9] for searching for a given substring in a text was presented. It achieves a quadratic search speed-up and returns one of the indices at which the searched substring occurs. The probability of obtaining the correct answer is strictly greater than 1/2. The query complexity (this measure of complexity is sometimes associated with time complexity) of the algorithm is $O (\sqrt{n} \log \sqrt{\frac{n}{m}} \log m + \sqrt{m} \log^{2} m)$. The authors of [9] do not estimate the memory complexity (number of qubits used) of their algorithm. Since they use m qubits to encode a single binary subword of length m, the algorithm of [9] requires $O (\log n + m)$ qubits.

1.3. Our Contribution

We present a quantum algorithm for finding a pattern in a string. The algorithm is based on Grover’s search algorithm and hashing technique. It is known that the hashing method can significantly save space and, as a rule, the time required for searching in databases. The results of the paper demonstrate the potential of a universal hashing method for building quantum search algorithms.

To simplify the presentation, and not to clutter the idea with technical details, consider the case when it is known in advance that the desired substring occurs in the text exactly once.

More precisely, we propose a hybrid random–quantum search algorithm $A$ that searches a text for a substring and has the following characteristics:

The $A$ algorithm produces the correct answer with high probability.
The $A$ algorithm is based on Grover’s search. This search is presented in the paper as an auxiliary algorithm $A 1$ and requires $O (\sqrt{n})$ query steps.
The $A$ algorithm exponentially reduces (compared to the algorithm of [9]) the number of qubits relative to the parameter m, the length of the substring. Namely, the algorithm requires $O (\log n + \log m)$ qubits.

The main idea of the paper is the use of the hashing technique to save space complexity in quantum search. The $A$ algorithm is based on a certain universal family of hash functions. The $A 2$ algorithm is a generalization of the result to a general universal family of hash functions.

The rest of the paper is organized as follows. In the next section (Section 2), we present the basic model of the quantum search algorithm, basic notation and definitions. In Section 3, we present the main result. We first present the auxiliary quantum procedure $A 1$. Next, we present a quantum search algorithm $A$ based on the hashing technique. Theorem 1 gives an analysis of the $A$ algorithm; its proof is given in Section 4. In Section 5, we present the $A 2$ algorithm, a generalization of the $A$ algorithm.

The preliminary version of the article was published in Russian [10].

2. Preliminaries for Quantum Query Algorithm

We refer the reader to [5,11] for an introduction to the basics of quantum query algorithms and the state of research in this area. Here, we define the query model of computation in a way that is convenient for us. We use the notation defined in detail in [6,7].

The operations applied to quantum s-qubit states $|ψ〉 \in {(H^{2})}^{\otimes s}$ are mathematically expressed using unitary operators:

$|ψ'〉 = U |ψ〉,$

(1)

where U is a $2^{s} \times 2^{s}$ unitary matrix.

2.1. Query Model

For finite sets X and Y, let $g : X \to Y$ be a discrete function. A quantum query algorithm $A$ computing the function g begins in a quantum state $|ψ_{start}〉$ and applies a sequence of operators

$O_{X}, U, \dots, O_{X}, U .$

The operator $O_{X}$ depends on the input from X (for a concrete input $σ \in X$, we write $O_{σ}$). The quantum community calls the $O_{X}$ operator an “oracle”. An application of the oracle is called a query of the algorithm $A$ to the input data. The operator $U$ does not depend on the input.

The algorithm computes the value $g (σ)$ of the discrete function g for $σ \in X$ if the initial state $|ψ_{start}〉$ goes to the final state

$|ψ (g (σ))〉 = U O_{σ} \dots U O_{σ} |ψ_{start}〉$

in the process of computing on the input value $σ$. The final state allows extracting the value of $g (σ)$ as a result of measuring the state $|ψ (g (σ))〉$.

2.2. Search in an Unordered Database

The quantum search algorithm in an unordered database of n elements, in which there is exactly one element of interest to us (Grover’s algorithm), is a special case of the query algorithms. The following algorithm scheme underlies many generalizations:

1. Initialize a quantum system of $O (\log n)$ qubits into a $|ψ_{start}〉$ state containing information about the database. The $|ψ_{start}〉$ state is constructed so that each of the $2^{O (\log n)}$ basis quantum states represents the required information about n database elements.
2. Perform the following $O (\sqrt{n})$ “macro” steps on the $|ψ_{start}〉$ state:
- The oracle $O_{X}$ is used. It recognizes the basis state of interest to us and multiplies its amplitude by −1.
- The operator $U$ performs the inversion about the average value of all amplitudes.

“Macro” Step 2 describes the key operation of quantum search. Each such step increases the amplitude of the basis state that represents the desired information. Performing $O (\sqrt{n})$ such steps maximizes the amplitude, and the probability of extracting the required information from the resulting state $|ψ_{final}〉$ becomes close to 1.

2.3. Basic Operations of Qubits for Quantum Search

The potential advantages of quantum algorithms, on which the results of this work are based, lie in the possibility of implementing quantum operators of dimension $2^{s} \times 2^{s}$ using basic operations on a small number, of order $O (s)$, of qubits and in a small number of quantum computing steps (that is, by a circuit of basic elements that is not complex).

The main operations are $I, X, Z, H$:

I is the identity operator:

$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} .$
X is the NOT operator. It changes the state of the qubit from $|0〉$ to $|1〉$, and vice versa:

$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} .$

(2)
Z is the amplitude sign reversal operator (it multiplies the amplitude of $|1〉$ by −1):

$Z = \begin{pmatrix} 1 & 0 \\ 0 & - 1 \end{pmatrix} .$

(3)
H is the Hadamard operator:

$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & - 1 \end{pmatrix} .$

(4)
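The matrix identities that the later oracle construction relies on can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul` and `close` are illustrative and not part of the paper.

```python
import math

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    """Entrywise comparison of two 2x2 matrices up to rounding error."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

s = 1 / math.sqrt(2)
I = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]

# H is its own inverse, so applying it twice restores a qubit to its state.
hh_is_identity = close(matmul(H, H), I)
# Conjugating X by H gives Z: a bit flip in one basis is a sign flip in the other.
hxh_is_z = close(matmul(matmul(H, X), H), Z)
# H maps |1> = (0, 1) to (|0> - |1>)/sqrt(2), which is the second column of H.
h_on_one = [H[0][1], H[1][1]]
print(hh_is_identity, hxh_is_z, h_on_one)
```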

2.4. Characteristics of the $A$ Algorithm

The characteristics of a quantum query algorithm are the used memory, the number of queries to the analyzed data, and the probability of an error.

Size complexity (memory complexity). The number $S (A)$ of qubits used is a measure of the memory complexity of the quantum algorithm $A$.
We denote by $S^{A} (string, w)$ the number of qubits used by the $A$ algorithm to solve the problem of finding the substring w in the $string$.
By $S^{A} (n, m)$, we denote the maximum among the numbers $S^{A} (string, w)$ over all $string$ and w with parameters $N = | string |, m = | w |, n = N - m + 1$.
Query complexity. The number $Q (A)$ of queries (number of oracle applications) is a “query” measure of the complexity of the quantum algorithm $A$. Note that in [11], one request to the oracle tests one variable (one bit). In our work, by contrast, one request to the oracle tests the entire string w (the hash value of w).
We denote by $Q^{A} (string, w)$ the number of applications of the oracle operator by the algorithm $A$.
By $Q^{A} (n, m)$, we denote the maximum among the numbers $Q^{A} (string, w)$ over all $string$ and w with parameters $N = | string |, m = | w |, n = N - m + 1$.
Error probability. We denote by $Er^{A} (string, w)$ the probability of the following event: the algorithm $A$, as a result of solving the problem of finding the substring w in the text $string$, outputs a position k in the text $string$ such that $w_{k} \neq w$.
By $Er^{A} (n, m)$, we denote the maximum among the numbers $Er^{A} (string, w)$ over all $string$ and w with parameters $N = | string |, m = | w |, n = N - m + 1$.

3. Algorithms for Finding the Index of Occurrence of a Substring in the Text

In this section, we present a hybrid classical–quantum algorithm (more precisely, a hybrid random–quantum algorithm) $A$ for finding a substring w of length m in a binary text $string = b_{1} \dots b_{N}$.

We consider the simplest version of the problem: we assume that the required substring w is guaranteed to occur exactly once in the text $string$.

The problem is reduced to the problem of finding the word w in a vocabulary as follows. Let $n = N + 1 - m$. To simplify the presentation, we assume below that the number n (the number of substrings of length m in the $string$) is a power of 2. Denote by $V (string, m)$ the sequence composed of all substrings of length m of the $string$:

$V (string, m) = \{w_{0}, \dots, w_{n - 1}\},$

where $w_{k} = b_{k + 1} \dots b_{k + m}$ for $0 \leq k \leq n - 1$. We will call $V (string, m)$ a vocabulary.

Now, the problem of finding the word w in $string$ is represented as the problem of finding an index k such that $w = w_{k}$ for the vocabulary $V (string, m)$.
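The reduction to a vocabulary can be illustrated classically. The sketch below builds $V (string, m)$ as a Python list and scans it linearly; the function names and the sample text are illustrative assumptions, and the linear scan is only the classical baseline, not the algorithm proposed in the paper.

```python
def vocabulary(string: str, m: int) -> list[str]:
    """Return V(string, m) = [w_0, ..., w_{n-1}], all length-m substrings in order."""
    n = len(string) + 1 - m
    # With 0-based Python slicing, string[k:k + m] is b_{k+1} ... b_{k+m}.
    return [string[k:k + m] for k in range(n)]

def classical_search(string: str, w: str) -> int:
    """Linear scan over the vocabulary; returns the first k with w == w_k, or -1."""
    for k, w_k in enumerate(vocabulary(string, len(w))):
        if w_k == w:
            return k
    return -1

text = "0110100111"          # N = 10
V = vocabulary(text, 3)      # n = 10 + 1 - 3 = 8 words
print(V)
print(classical_search(text, "100"))
```

Here the word `100` occurs exactly once, which matches the simplifying assumption made above.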

3.1. Auxiliary Quantum Procedure $A 1$ (Algorithm 1)

The quantum part of the algorithm is the following auxiliary quantum procedure $A 1$. Procedure $A 1$ starts with an initial quantum state $|string〉$ (quantum vocabulary) constructed from the vocabulary V using the pre-procedure $P$.

Pre-procedure $P$ generates the initial state (quantum vocabulary) $|string〉$ from a vocabulary $V = \{v_{0}, \dots, v_{n - 1}\}$ composed of binary words of length $l \geq 1$. The $|string〉$ state has the following structure:

$|string〉 = \frac{1}{\sqrt{n}} \sum_{k = 0}^{n - 1} |k〉 \otimes |v_{k}〉 \otimes |1〉 .$

Description of the procedure $A 1$

Algorithm 1 Auxiliary quantum procedure $A 1$

Input: Quantum state $|string〉$. Binary word v.

Output: The index k, which is interpreted as an index such that $v = v_{k}$.

That is, $A 1 (|string〉, v)$ implements the mapping

$A 1 : |string〉, v ⟼ k .$

The following two macro steps, described below in operator form, are applied $\frac{π}{4} \sqrt{n}$ times to the state $|string〉$. In the literature, these two macro steps are often referred to as the “Grover iteration” [1]:

The oracle $O_{f_{v}}$ changes the phase of the basis state representing the word $v_{k}$ for which $v_{k} = v$. For a binary sequence $v \in \{0, 1\}^{l}$, define the Boolean function $f_{v} : \{0, 1\}^{l} \to \{0, 1\}$ by the condition $f_{v} (x) = 1$ if and only if $x = v$.
Let $|x〉$ denote the basis state corresponding to the element x, which is one of the $v_{k}$. In this case, $|x〉$ is an l-qubit basis state. The oracle $O_{f_{v}}$ performs the following three actions:
- Application of the H operator to the auxiliary ( $\log n + l + 1$ )-th qubit $|1〉$ :
  
  $|x〉 \otimes |1〉 \overset{I^{\otimes (\log n + l)} \otimes H}{\to} |x〉 \otimes \frac{1}{\sqrt{2}} (|0〉 - |1〉)$
- Application of the operator $U_{f_{v}}$ to the last $l + 1$ qubits:
  
  $\begin{aligned} |x〉 \otimes \frac{1}{\sqrt{2}} (|0〉 - |1〉) & \overset{I^{\otimes \log n} \otimes U_{f_{v}}}{\to} \frac{1}{\sqrt{2}} |x〉 \otimes (|0 \oplus f_{v} (x)〉 - |1 \oplus f_{v} (x)〉) \\ & = {(- 1)}^{f_{v} (x)} |x〉 \otimes \frac{1}{\sqrt{2}} (|0〉 - |1〉) \end{aligned}$
- Application of the H operator to the auxiliary ( $\log n + l + 1$ )-th qubit:
  
  ${(- 1)}^{f_{v} (x)} |x〉 \otimes \frac{1}{\sqrt{2}} (|0〉 - |1〉) \overset{I^{\otimes (\log n + l)} \otimes H}{\to} {(- 1)}^{f_{v} (x)} |x〉 \otimes |1〉$
Note that the auxiliary qubit at the end restores its value (by repeated application of the Hadamard transformation) to the $|1〉$ state.
Inversion operation. The operator $D = 2 R - I^{\otimes \log n}$ is applied to the first $\log n$ qubits. The $R$ operator is the $n \times n$ matrix

$R = \frac{1}{n} \begin{pmatrix} 1 & 1 & \dots & 1 \\ 1 & 1 & \dots & 1 \\ ⋮ & ⋮ & ⋱ & ⋮ \\ 1 & 1 & \dots & 1 \end{pmatrix}$

Obtaining the result of computation (the output of $A 1$). After running the macro steps above $\frac{π}{4} \sqrt{n}$ times, we measure the first $\log n$ qubits in the computational basis. The resulting $\log n$ bits are interpreted as a binary representation of the required index k.
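The effect of the two macro steps on the amplitudes can be reproduced with a small classical simulation. This is a sketch under simplifying assumptions: it tracks only the n real amplitudes of the index register, assumes a single marked index, and rounds the iteration count down to an integer. The function name `grover_amplitudes` is illustrative.

```python
import math

def grover_amplitudes(n: int, marked: int, steps: int) -> list[float]:
    """Apply `steps` Grover iterations to the uniform state over n indices."""
    amp = [1.0 / math.sqrt(n)] * n
    for _ in range(steps):
        amp[marked] = -amp[marked]             # oracle: phase flip on the marked index
        mean = sum(amp) / n
        amp = [2.0 * mean - a for a in amp]    # D = 2R - I: inversion about the mean
    return amp

n = 64
steps = math.floor(math.pi / 4 * math.sqrt(n))   # 6 iterations for n = 64
amp = grover_amplitudes(n, marked=37, steps=steps)
p_success = amp[37] ** 2                          # probability of measuring index 37
print(steps, p_success)
```

Measuring the index register then returns the marked index with probability `p_success`; for n = 64 the failure probability stays below 1/n.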

3.2. Algorithm $A$ (Algorithm 2)

The algorithm consists of two sequentially working parts:

First part: preparing the initial state based on the vocabulary $V (string, m)$.
Second part: reading the search word w and searching for its occurrence in the vocabulary.

We emphasize that the algorithm $A$ has two different inputs: $V (string, m)$ and w. They are fed to the first and second parts of the algorithm $A$, respectively.

Notations:

For a binary string w of length m, denote by $a (w)$ the number represented by w, $0 \leq a (w) \leq 2^{m} - 1$.
Given a number $a \in \{0, \dots, 2^{m} - 1\}$, denote by $bin (a)$ its binary representation of length m.
For the vocabulary $V (string, m)$, let $V_{p}$ denote the following vocabulary:

$V_{p} = \{v (w_{0}), \dots, v (w_{n - 1})\},$

where $v (w_{k}) = bin (r {(w_{k})}_{p})$ and $r {(w_{k})}_{p}$ is the remainder of $a (w_{k})$ modulo p. That is, $a (w_{k}) = q p + r {(w_{k})}_{p}$, where $q \geq 0$ and $r {(w_{k})}_{p} \in \{0, \dots, p - 1\}$.
Denote by $P_{d} = \{p_{1}, \dots, p_{d}\}$ the set of the first d primes.
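The notation above translates directly into a few helper functions. The sketch below is illustrative: the names `a`, `bin_repr`, `first_primes`, `fingerprint` and `hashed_vocabulary` are chosen here, and padding each remainder to the bit length of p − 1 is an assumption about how the hashed words are laid out, not something the paper specifies.

```python
def a(w: str) -> int:
    """The number a(w) represented by the binary string w."""
    return int(w, 2)

def bin_repr(value: int, length: int) -> str:
    """Binary representation of `value`, left-padded with zeros to `length` bits."""
    return format(value, "b").zfill(length)

def first_primes(d: int) -> list[int]:
    """The set P_d of the first d primes, by trial division (fine for small d)."""
    primes, candidate = [], 2
    while len(primes) < d:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def fingerprint(w: str, p: int) -> str:
    """v(w) = bin(r(w)_p): the remainder of a(w) modulo p, as a bit string."""
    length = max(1, (p - 1).bit_length())   # enough bits for any remainder in 0..p-1
    return bin_repr(a(w) % p, length)

def hashed_vocabulary(string: str, m: int, p: int) -> list[str]:
    """The vocabulary V_p built from V(string, m)."""
    n = len(string) + 1 - m
    return [fingerprint(string[k:k + m], p) for k in range(n)]

print(first_primes(5))
print(hashed_vocabulary("0110100111", 3, 5))
```

For the small prime p = 5 used here, distinct words can share a fingerprint; such collisions are what the error analysis of the algorithm has to control.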

Description of the Algorithm $A$

Algorithm 2 Algorithm $A$

Input:

For the first part: Vocabulary $V (string, m)$.

For the second part: Binary word w of length m.

Output: The index k, which is interpreted as an index such that $w = w_{k}$.

That is, $A$ implements the mapping

$A : V (string, m), w ⟼ k .$

The first part of the algorithm is to prepare the initial state from the vocabulary $V (string, m)$:

First, the algorithm makes a classical random choice: a prime number p is selected uniformly at random from the set $P_{d}$, and the vocabulary

$V_{p} = \{v (w_{0}), \dots, v (w_{n - 1})\}$

is prepared from the vocabulary $V (string, m)$.
The second step consists of the preparation of the quantum state $|string, p〉$ composed of $V_{p}$:

$|string, p〉 = \frac{1}{\sqrt{n}} \sum_{k = 0}^{n - 1} |k〉 \otimes |v (w_{k})〉 \otimes |1〉 .$

The second part of the algorithm is to read the input word w and find k such that $w = w_{k}$:

The algorithm reads the input word w and prepares $v (w) = bin (r {(w)}_{p})$.
The quantum stage of the algorithm $A$: the quantum procedure $A 1$ is applied to the state $|string, p〉$ and the search word $v (w)$. $A 1$ implements the mapping

$A 1 : (|string, p〉, v (w)) ⟼ k .$

The number k is the result of measuring the first $\log n$ qubits. The number k is declared as the required index of the word $w_{k}$ such that $w = w_{k}$.
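The two parts of the algorithm can be put together in a classical simulation. This is a sketch, not the authors' implementation: the quantum procedure is replaced by a simulation of the amplitudes followed by sampling of the measurement outcome, remainders are compared as integers rather than as bit strings, and the names `simulate_A1`, `algorithm_A` and the sample text are assumptions made for illustration.

```python
import functools
import math
import random

@functools.lru_cache(maxsize=None)
def first_primes(d):
    """Return the first d primes as a tuple (cached, trial division)."""
    primes, candidate = [], 2
    while len(primes) < d:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return tuple(primes)

def simulate_A1(hashes, target, rng):
    """Classically simulate Grover's search over the hashed vocabulary.

    Returns an index sampled from the final measurement distribution.
    """
    n = len(hashes)
    amp = [1.0 / math.sqrt(n)] * n
    for _ in range(math.floor(math.pi / 4 * math.sqrt(n))):
        # Oracle: flip the sign of every index whose hash equals the target.
        amp = [-x if h == target else x for x, h in zip(amp, hashes)]
        # Inversion about the mean (operator D = 2R - I).
        mean = sum(amp) / n
        amp = [2.0 * mean - x for x in amp]
    return rng.choices(range(n), weights=[x * x for x in amp])[0]

def algorithm_A(string, w, c=3, rng=None):
    """Hybrid search sketch: random prime, hashed vocabulary, simulated Grover."""
    rng = rng or random.Random()
    m = len(w)
    n = len(string) + 1 - m
    p = rng.choice(first_primes(c * n * m))                    # classical random choice
    hashes = [int(string[k:k + m], 2) % p for k in range(n)]   # vocabulary V_p
    return simulate_A1(hashes, int(w, 2) % p, rng)

text = "0010110100111010001"   # N = 19, so n = 16 for m = 4
w = "1001"                     # occurs exactly once, at index k = 7
rng = random.Random(0)
hits = sum(algorithm_A(text, w, rng=rng) == 7 for _ in range(200))
print(f"correct index returned in {hits} of 200 runs")
```

The run is randomized in two places (the choice of p and the simulated measurement), so the number of correct answers varies with the seed; only its typical size is meaningful.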

3.3. Characteristics of the Algorithm $A$

The following theorem describes the characteristics of the main part of the $A$ algorithm (searching for w).

Theorem 1.

For a vocabulary $V = V (string, m) = \{w_{0}, \dots, w_{n - 1}\}$ of words of length m and a word w of length m, algorithm $A$ solves the problem of finding the index k such that $w = w_{k}$ with the following characteristics. For an arbitrary integer $c \geq 3$ and the integer $d = c n m$, it is true that

$\begin{aligned} S^{A} (n, m) & = O (\log n + \log m), \\ Q^{A} (n, m) & = O (\sqrt{n}), \\ Er^{A} (V, w) & \leq \frac{1}{c} + \frac{1}{n} . \end{aligned}$

The proof of Theorem 1 is given in the next section.

4. Proof of Theorem 1

4.1. Space Complexity $S^{A} (n, m)$

We start with the technical lemma we need below.

Lemma 1.

For arbitrary $p \in P_{d}$ with $d = c n m$, for the vocabulary

$V_{p} = \{v (w_{0}), \dots, v (w_{n - 1})\}$

generated from the vocabulary V, and for the word $v (w)$ formed from the word w of length m, it is true that

$| v (w) | = O (\log n + \log m), \qquad | v (w_{k}) | = O (\log n + \log m) .$

Proof.

By the condition of the theorem, we select the set $P_{d}$ of the first d primes, where $d = c n m$. By Chebyshev’s theorem, there exist constants $0 < a < A$ such that for all sufficiently large d, the d-th prime number $p_{d}$ satisfies the inequalities

$a d \ln d < p_{d} < A d \ln d .$

The values of the constants defined by Chebyshev (see, for example, [12]) are

$a = 0.92129, \qquad A = 1.10555 .$

That is, for arbitrary $p \in P_{d}$ and a remainder $r < p$, it is true that

$| bin (r) | \leq \lceil \log (A d \ln d) \rceil = O (\log n + \log m),$

since, for a fixed c, $\log (c n m \ln (c n m)) = O (\log n + \log m)$.

□
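The size of the hashed words can be checked numerically for concrete parameters. The sketch below computes the d-th prime with a sieve and reports its bit length; the parameter values c = 3, n = 1024, m = 64 are chosen here purely for illustration, and the sieve limit relies on the known bound $p_{d} < d (\ln d + \ln \ln d)$ for $d \geq 6$.

```python
import math

def nth_prime(d: int) -> int:
    """Return the d-th prime (1-indexed) using a sieve of Eratosthenes."""
    # For d >= 6, p_d < d (ln d + ln ln d); a fixed limit covers smaller d.
    limit = 15 if d < 6 else int(d * (math.log(d) + math.log(math.log(d)))) + 1
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out i*i, i*i + i, ... up to the limit.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    count = 0
    for value, is_prime in enumerate(sieve):
        count += is_prime
        if count == d:
            return value
    raise ValueError("sieve limit too small")

c, n, m = 3, 1024, 64
d = c * n * m                      # number of primes in P_d
p_d = nth_prime(d)                 # largest prime that can be selected
hash_bits = p_d.bit_length()       # bits needed for any remainder modulo p <= p_d
print(d, p_d, hash_bits, m)
```

For these values a hashed word needs 22 bits, compared with m = 64 bits for the original word, and the gap widens as m grows because the hash length depends on m only logarithmically.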

To estimate the space complexity characteristic $S^{A} (n, m)$, consider the quantum state $|string, p〉$ formed on the basis of $V_{p}$:

$|string, p〉 = \frac{1}{\sqrt{n}} \sum_{k = 0}^{n - 1} |k〉 \otimes |v (w_{k})〉 \otimes |1〉 .$

First, $\log n$ qubits are needed to encode the $|k〉$ basis states. Second, according to Lemma 1, $O (\log n + \log m)$ qubits are needed to encode the $|v (w_{k})〉$ basis states. Third, one auxiliary qubit holds the state $|1〉$. Thus, we have $S^{A} (n, m) = O (\log n + \log m)$.

4.2. The Query Complexity $Q^{A} (n, m)$

The query complexity $Q^{A} (n, m)$ of algorithm $A$ is completely determined by the query complexity $Q^{A 1} (n, m)$ of the auxiliary procedure $A 1$ when $A 1$ takes as input the quantum state $|string, p〉$. Recall that $|string, p〉$ represents (quantumly) the vocabulary $V_{p}$ for $p \in P_{d}$, with d satisfying the condition of the theorem.

Property 1.

Let $p \in P_{d}$ with $d = c n m$. Let $|string, p〉$ be the quantum state generated by $V_{p}$, where $V_{p}$ itself is formed from the vocabulary $V (string, m)$. Let the word $v (w)$ be formed from w. Then, for $A 1 (|string, p〉, v (w))$, the following is true:

$Q^{A 1} (n, m) = O (\sqrt{n}) .$

(5)

Proof.

Let us consider the amplitude amplification procedure when the procedure $A 1$ is applied to the initial state $|string, p〉$. Put $|string_{0}〉 = |string, p〉$. Denote by $α_{0}$ the initial amplitude of the required basis state $|k〉 |v (w_{k})〉 |1〉$ such that $v (w_{k}) = v (w)$, and by $β_{0}$ the amplitude of each of the other basis states of the initial state $|string_{0}〉$:

$α_{0} = β_{0} = 1 / \sqrt{n}$ and $α_{0}^{2} + (n - 1) β_{0}^{2} = 1$.

By virtue of the introduced notation, the state $|string_{0}〉$ can be represented in the following form:

$|string_{0}〉 = α_{0} \sum_{k : v (w_{k}) = v (w)} |k〉 \otimes |v (w_{k})〉 \otimes |1〉 + β_{0} \sum_{k : v (w_{k}) \neq v (w)} |k〉 \otimes |v (w_{k})〉 \otimes |1〉 .$

Each Grover iteration applies two macro steps to the current state $|string_{j}〉$: (1) the operation of changing the phase of the basis state representing $w_{k}$ for which $v (w_{k}) = v (w)$, and (2) the inversion operation. After the $(j + 1)$-th iteration, the amplitudes of the state $|string_{j + 1}〉$ are expressed by the following formulas (see, for example, ref. [13] for a detailed technical justification of the effects of operators (1) and (2) on the $|string_{j}〉$ states):

$α_{j + 1} = \frac{n - 2}{n} α_{j} + \frac{2 (n - 1)}{n} β_{j}, \qquad β_{j + 1} = \frac{n - 2}{n} β_{j} - \frac{2}{n} α_{j} .$

We have:

$|string_{j + 1}〉 = α_{j + 1} \sum_{k : v (w_{k}) = v (w)} |k〉 \otimes |v (w_{k})〉 \otimes |1〉 + β_{j + 1} \sum_{k : v (w_{k}) \neq v (w)} |k〉 \otimes |v (w_{k})〉 \otimes |1〉,$

where (recall that we are considering the case when the index k for which $v (w_{k}) = v (w)$ is unique):

$α_{j + 1}^{2} + (n - 1) β_{j + 1}^{2} = 1 .$

Therefore, $α_{j}$ and $β_{j}$ can be written as

$α_{j} = \sin ((2 j + 1) θ), \qquad β_{j} = \frac{1}{\sqrt{n - 1}} \cos ((2 j + 1) θ),$

where the angle $θ$ is defined by $\sin θ = 1 / \sqrt{n}$, so that $α_{0} = β_{0} = 1 / \sqrt{n}$.

Further, $α_{r} = 1$ if $(2 r + 1) θ = π / 2$. Based on these considerations, the optimal number r of iterations of the search algorithm is determined as $r = (π - 2 θ) / (4 θ)$.

Ref. [13] shows that the probability of obtaining an incorrect result does not exceed $1 / n$ if one runs $\lfloor π / (4 θ) \rfloor$ Grover iterations in succession. If n is sufficiently large, then $θ \approx \sin θ = 1 / \sqrt{n}$, and hence

$r \approx \frac{π}{4} \sqrt{n} .$

Since one call to the oracle in the algorithm tests an entire substring, the final value is $Q^{A 1} (n, m) = O (\sqrt{n})$. □
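The agreement between the amplitude recurrences and the trigonometric closed form can be verified numerically. The following sketch iterates the recurrences for a sample value n = 256 (chosen here for illustration) and compares each step with $\sin ((2 j + 1) θ)$ and $\cos ((2 j + 1) θ) / \sqrt{n - 1}$; the function name is an assumption.

```python
import math

def grover_amplitude_trace(n: int, steps: int):
    """Iterate the recurrences for (alpha_j, beta_j), starting from 1/sqrt(n)."""
    alpha = beta = 1.0 / math.sqrt(n)
    trace = [(alpha, beta)]
    for _ in range(steps):
        alpha, beta = ((n - 2) / n * alpha + 2 * (n - 1) / n * beta,
                       (n - 2) / n * beta - 2 / n * alpha)
        trace.append((alpha, beta))
    return trace

n = 256
theta = math.asin(1.0 / math.sqrt(n))
r = math.floor(math.pi / (4 * theta))    # number of Grover iterations
trace = grover_amplitude_trace(n, r)

# Largest deviation between the recurrence and the closed form over all steps.
max_gap = max(
    max(abs(a - math.sin((2 * j + 1) * theta)),
        abs(b - math.cos((2 * j + 1) * theta) / math.sqrt(n - 1)))
    for j, (a, b) in enumerate(trace)
)
failure = 1.0 - trace[r][0] ** 2         # probability of not measuring the marked index
print(r, max_gap, failure)
```

The deviation stays at the level of floating-point rounding, and the failure probability after the final iteration is below 1/n.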

4.3. The Error $E r^{A} (V, w)$

The proof of the upper bound for the error probability $Er^{A} (V, w)$ is based on the following considerations. For V and the desired word w, we split the set $P_{d}$ of primes into good primes $P_{good}$ and bad primes $P_{bad}$.

A prime number $p \in P_{d}$ is considered good for V and w if $r {(w)}_{p} \neq r {(w_{j})}_{p}$ for all $w_{j} \in V$ such that $w \neq w_{j}$. That is, the vocabulary $V_{p}$ with $p \in P_{good}$ (the “good” vocabulary $V_{p}$) represents the vocabulary V “correctly”, and the vocabulary $V_{p}$ with $p \in P_{bad}$ (the “bad” vocabulary $V_{p}$) represents the vocabulary V “wrongly”.

Then, the error probability $Er^{A} (V, w)$ of the algorithm $A$ can be estimated from above as follows:

$\begin{aligned} Er^{A} (V, w) & \leq Pr_{bad} + (1 - Pr_{bad}) Er^{A 1} \\ & \leq Pr_{bad} + Er^{A 1}, \end{aligned}$

(6)

where $Pr_{bad}$ is the probability of choosing $p \in P_{bad}$, and $Er^{A 1}$ is the error of $A 1$ when a “good” vocabulary $V_{p}$ is chosen for procedure $A 1$.

Note that, if a bad p occurs, it is still possible to obtain the correct result when applying the $A 1$ procedure. However, by considering all continuations of the procedure $A 1$ as erroneous for bad p, we only increase the error probability.

In the remainder of the proof, we estimate the $Pr_{bad}$ and $Er^{A 1}$ components of the sum (6).

Property 2.

For $d = c n m$, the following is true:

$Pr_{bad} \leq \frac{1}{c} .$

Proof.

Observe that p is selected uniformly at random from $P_{d}$. So, $Pr_{bad} = | P_{bad} | / | P_{d} |$. To estimate $| P_{bad} |$, consider the following. For a pair of distinct numbers $a_{1}, a_{2} \in \{0, \dots, 2^{m} - 1\}$, we denote by $P_{a_{1}, a_{2}}$ the set of primes $p \in P_{d}$ such that $a_{1} \equiv a_{2} \pmod{p}$. For the vocabulary V and the desired sequence w, we have that

$P_{bad} = \bigcup_{v \in V, v \neq w} P_{a (w), a (v)} .$

(7)

Note that for an arbitrary pair of distinct numbers $a_{1}, a_{2} \in \{0, \dots, 2^{m} - 1\}$, the following is true:

$| P_{a_{1}, a_{2}} | \leq m .$

(8)

The idea of proving the inequality (8) is as follows. The number $a = | a_{1} - a_{2} |$ satisfies $1 \leq a < 2^{m}$. A product of t distinct primes is at least $2^{t}$, so fewer than m different primes can divide a. For details of the proof of (8), see Lemma 7.4 of [14].

Finally, the union in (7) runs over at most n words, so (7) and (8) give $| P_{bad} | \leq n m$, and since $| P_{d} | = d = c n m$, we have that

$Pr_{bad} \leq \frac{n m}{c n m} = \frac{1}{c} .$

□
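The set of bad primes can be enumerated explicitly for a small instance. The sketch below does this for an illustrative 19-bit text and the word `1001`; the text, the word and the function names are assumptions made for the example.

```python
def first_primes(d):
    """Return the first d primes by trial division (fine for small d)."""
    primes, candidate = [], 2
    while len(primes) < d:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes

def bad_primes(vocab, w, primes):
    """Primes p for which some word v != w in the vocabulary collides with w modulo p."""
    a_w = int(w, 2)
    others = {int(v, 2) for v in vocab if v != w}
    return [p for p in primes if any((a_w - a_v) % p == 0 for a_v in others)]

text = "0010110100111010001"
m, c = 4, 3
n = len(text) + 1 - m                       # n = 16
vocab = [text[k:k + m] for k in range(n)]
primes = first_primes(c * n * m)            # P_d with d = c n m = 192
bad = bad_primes(vocab, "1001", primes)
pr_bad = len(bad) / len(primes)
print(bad, pr_bad)
```

In this instance only 4 of the 192 primes are bad, well below the worst-case bound 1/c; the bound is a guarantee for every text and word, not an estimate of typical behavior.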

Now let us estimate the second term of the sum (6). We estimate $Er^{A 1}$ under the condition that the procedure $A 1$ is applied to the state $|string, p〉$. The state $|string, p〉$ represents (quantumly) the vocabulary

$V_{p} = \{v (w_{0}), \dots, v (w_{n - 1})\}$

for $p \in P_{good}$. That is, we are in the case where $V_{p}$ represents the vocabulary V “correctly”. This situation satisfies the condition of Grover’s search algorithm. So, we immediately have that

$Er^{A 1} \leq \frac{1}{n} .$

(9)

Now, finally, the probability $Er^{A} (string, w)$ is estimated based on (6), Property 2 and (9) as follows:

$Er^{A} (string, w) \leq \frac{n m}{c n m} + \frac{1}{n} = \frac{1}{c} + \frac{1}{n} .$

Thus, there are few “bad variants” of processing the pair $string$ and w by the $A$ algorithm: their share does not exceed $1 / c$ of the total number of possible processings. In the “good case” of processing the pair $string$ and w, the probability of erroneous processing is bounded from above by the error probability of the procedure $A 1$, which is bounded by $1 / n$.
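As a purely illustrative instance of this bound (the values $c = 10$, $n = 2^{10}$ and $m = 64$ are chosen here and do not come from the paper), the number of primes and the resulting error bound are:

```latex
d = c\,n\,m = 10 \cdot 1024 \cdot 64 = 655360, \qquad
Er^{A} \le \frac{1}{c} + \frac{1}{n} = \frac{1}{10} + \frac{1}{1024} \approx 0.101 .
```

For large n the term $1 / c$ dominates, so the choice of c controls the error at the cost of a larger set $P_{d}$, which in turn adds only about $\log c$ bits to the hash length.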

5. Generalization

In this section, we present the $A 2$ algorithm, which is a generalization of the $A$ algorithm in terms of a universal family of hash functions.

The concept of universal hashing is defined in [15] and has been discussed in sufficient detail in a number of papers; see, for example, [16,17]. The family $F = \{f_{1}, \dots, f_{d}\}$ of ( $m, l$ )-functions $f : \{0, 1\}^{m} \to \{0, 1\}^{l}$ for $l < m$ is called a universal (more precisely, $ϵ$-universal) family of hash functions if, for some $ϵ \in [0, 1]$ and for an arbitrary pair of distinct words $v, w \in \{0, 1\}^{m}$,

$\frac{| F_{v, w} |}{| F |} \leq ϵ,$

where $F_{v, w} = \{f \in F : f (v) = f (w)\}$.

For our purposes, we extend the definition of the universal family of hash functions as follows.

Definition 1.

A family $F = \{f_{1}, \dots, f_{d}\}$ of ( $m, l$ )-functions $f : \{0, 1\}^{m} \to \{0, 1\}^{l}$ for $n \geq 1$ and $ϵ \in [0, 1]$ will be called a strongly $(n, ϵ)$-universal family of hash ( $m, l$ )-functions if for each n-subset $Set = \{v_{1}, \dots, v_{n}\}$ of the set $\{0, 1\}^{m}$ and an arbitrary word $w \in \{0, 1\}^{m}$,

$\frac{| F_{Set, w} |}{| F |} \leq ϵ,$

where $F_{Set, w} = \bigcup_{v \in Set, v \neq w} F_{v, w}$.

Note that a strongly $(1, ϵ)$-universal family of hash functions is an $ϵ$-universal family of hash functions in the standard sense.

Example of a strongly $(n, ϵ)$ -universal family of hash functions

An example of a strongly $(n, ϵ)$-universal family of ( $m, l$ )-functions is the set $F = \{f_{1}, \dots, f_{d}\}$ of the following functions. For $j \in \{1, \dots, d\}$, the ( $m, l$ )-function $f_{j} : \{0, 1\}^{m} \to \{0, 1\}^{l}$ is determined by the j-th prime number $p_{j}$ as follows:

$f_{j} (w) = bin (r {(w)}_{p_{j}}) .$

Here, $r {(w)}_{p_{j}}$ is the remainder when $a (w)$ is divided by the prime $p_{j}$, and $bin (r {(w)}_{p_{j}})$ is the binary representation of the number $r {(w)}_{p_{j}}$. Clearly, we have that $l \leq \lceil \log p_{d} \rceil$.

The family $F$ satisfies the following property.

Property 3.

For $c \geq 3$ and $d = c n m$, the set $F = \{f_{1}, \dots, f_{d}\}$ of ( $m, O (\log (n m))$ )-functions forms a strongly $(n, 1 / c)$-universal family of hash ( $m, O (\log (n m))$ )-functions.

Proof.

For the proof, see Property 2 above. □

We now present the $A 2$ algorithm (Algorithm 3), a generalization of the $A$ algorithm, in terms of a strongly $(n, ϵ)$-universal family of hash functions.

Algorithm $A 2$

The algorithm consists of two sequentially working parts:

First part: preparing the initial state based on the vocabulary $V (string, m)$.
Second part: reading the search word w and searching for its occurrence in the vocabulary.

We emphasize that the algorithm $A 2$ has two different inputs: $V (string, m)$ and w. They are fed to the first and second parts of the algorithm $A 2$, respectively.

Algorithm 3 Algorithm $A 2$

Input:

For the first part: Vocabulary $V (string, m) = \{w_{0}, \dots, w_{n - 1}\}$ composed of binary words of length m from the $string$.

For the second part: Binary word w of length m.

Output: The index k, which is interpreted as an index such that $w = w_{k}$.

The first part of the algorithm (preparing the initial state):

First stage (classical): The function f is chosen uniformly at random from the set $F$.
Second stage (preparation of the quantum state): For the function $f \in F$ and vocabulary $V (string, m)$, the algorithm forms the vocabulary

$V_{f} = \{f (w_{0}), \dots, f (w_{n - 1})\} .$

Then, based on $V_{f}$, the following quantum state is generated:

$|string, f〉 = \frac{1}{\sqrt{n}} \sum_{k = 0}^{n - 1} |k〉 \otimes |f (w_{k})〉 \otimes |1〉 .$

The second part of the algorithm (searching for $w$):

Third stage (quantum): The quantum procedure $A 1$ is applied to the state $|string, f〉$ and the search word $f (w)$. $A 1$ implements the mapping

$A 1 : (|string, f〉, f (w)) ⟼ k .$

The number k is the result of measuring the first $\log n$ qubits. The number k is declared as the required index of the word $w_{k}$ such that $w = w_{k}$.
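The generalized scheme differs from the prime-based one only in where the hash function comes from. The sketch below takes the family as a list of Python callables and again replaces the quantum procedure by a classical simulation of the amplitudes; the names `algorithm_A2` and `mod_hash`, the three-element toy family and the sample text are assumptions for illustration. Because the moduli 17, 19 and 23 exceed every 4-bit value, this toy family is collision-free on the sample words, so the only source of error is the simulated measurement.

```python
import math
import random
from typing import Callable, Sequence

HashFn = Callable[[str], str]

def algorithm_A2(string: str, w: str, family: Sequence[HashFn],
                 rng: random.Random) -> int:
    """Sketch of the generalized search with a pluggable hash family."""
    m = len(w)
    n = len(string) + 1 - m
    f = rng.choice(family)                              # first stage: classical choice
    hashes = [f(string[k:k + m]) for k in range(n)]     # second stage: vocabulary V_f
    target = f(w)
    amp = [1.0 / math.sqrt(n)] * n                      # uniform superposition over k
    for _ in range(math.floor(math.pi / 4 * math.sqrt(n))):
        # Third stage, simulated: phase flip on matching hashes, then inversion.
        amp = [-x if h == target else x for x, h in zip(amp, hashes)]
        mean = sum(amp) / n
        amp = [2.0 * mean - x for x in amp]
    return rng.choices(range(n), weights=[x * x for x in amp])[0]

def mod_hash(p: int) -> HashFn:
    """Hash a binary word to the binary form of its remainder modulo p."""
    return lambda word: format(int(word, 2) % p, "b")

family = [mod_hash(p) for p in (17, 19, 23)]
text = "0010110100111010001"      # n = 16 words of length 4; "1001" is at index 7
rng = random.Random(1)
hits = sum(algorithm_A2(text, "1001", family, rng) == 7 for _ in range(200))
print(f"correct index returned in {hits} of 200 runs")
```

Passing the family as a parameter mirrors the statement that only the collision bound ϵ of the family enters the error estimate.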

Now, we have the following statement—a generalization of Theorem 1 for the

A 2

algorithm based on

F = {f_{1}, \dots, f_{d}}

strongly

(n, ϵ)

-universal family of (

m, l

)-functions.

Theorem 2.

For a vocabulary $V = V(\mathit{string}, m) = \{w_{0}, \dots, w_{n-1}\}$ of $n$ words of length $m$ and a word $w$ of length $m$, the algorithm $A2$ solves the problem of finding an index $k$ such that $w = w_{k}$, with the following characteristics:

$\begin{aligned} S^{A2}(n, m) &= O(\log n + l), \\ Q^{A2}(n, m) &= O(\sqrt{n}), \\ Er^{A2}(\mathit{string}, w) &\leq \epsilon + \frac{1}{n}. \end{aligned}$

Proof.

The proof of Theorem 2 repeats, word for word, the proof of Theorem 1 with only one amendment: it is based on a general strongly $(n, \epsilon)$-universal family $F$ of $(m, l)$-functions instead of the specific family of hash functions from Property 3. □
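The query and error characteristics in Theorem 2 can be checked numerically for concrete sizes. The sketch below assumes that the Grover procedure runs $\lfloor (\pi/4)\sqrt{n} \rfloor$ iterations with exactly one marked item (a standard choice that this section does not restate) and uses the textbook success probability $\sin^{2}((2t+1)\theta)$ with $\sin\theta = 1/\sqrt{n}$. The function names are illustrative only.

```python
import math


def grover_iterations(n):
    """Oracle queries for one marked item among n: floor(pi/4 * sqrt(n))."""
    return math.floor(math.pi / 4 * math.sqrt(n))


def grover_failure(n):
    """Probability of measuring a wrong index after grover_iterations(n) steps."""
    theta = math.asin(1 / math.sqrt(n))
    t = grover_iterations(n)
    return 1 - math.sin((2 * t + 1) * theta) ** 2


def error_bound(n, eps):
    """Bound from Theorem 2: hashing error eps plus search error 1/n."""
    return eps + 1 / n


if __name__ == "__main__":
    for n in (4, 16, 64, 1024):
        print(n, grover_iterations(n), grover_failure(n), 1 / n)
```

For the sizes listed in the loop, the simulated search failure stays below $1/n$, so the total error is dominated by the hashing term $\epsilon$ whenever $\epsilon \gg 1/n$.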

Comment

Note that this generalization (algorithm $A2$) works effectively when the hash length $l$ is small. As the algorithm $A$ shows, such algorithms exist: for the algorithm $A$, the parameter $l = O(\log(nm))$ (see Property 3). Since $\log(nm) = \log n + \log m$, this provides the upper bound

$S^{A} = O(\log n + \log m).$
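To give a feel for the space bound, the following sketch counts qubits for the state $|k\rangle \otimes |f(w_{k})\rangle \otimes |1\rangle$ and compares it with a hypothetical baseline that stores each $m$-bit word unhashed. The constant `c` in the hash length is an illustrative assumption standing in for the constant hidden in $l = O(\log(nm))$; it is not taken from Property 3.

```python
import math


def qubits_unhashed(n, m):
    """Hypothetical baseline: index register + full m-bit word + 1 ancilla."""
    return math.ceil(math.log2(n)) + m + 1


def qubits_hashed(n, l):
    """Index register + l-bit hash + 1 ancilla, as in |k>|f(w_k)>|1>."""
    return math.ceil(math.log2(n)) + l + 1


def hash_length(n, m, c=2):
    """Assumed hash length l = ceil(c * log2(n * m)); c is illustrative."""
    return math.ceil(c * math.log2(n * m))


if __name__ == "__main__":
    n, m = 1024, 1000
    l = hash_length(n, m)
    print(l, qubits_unhashed(n, m), qubits_hashed(n, l))
```

Under these assumptions, the hashed register grows with $\log m$ rather than with $m$, which is the saving that the Comment describes.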

6. Conclusions

The paper presents a hybrid classical–quantum algorithm $A$ and its generalization, the quantum algorithm $A2$, for finding the occurrence of the word $w$ in the text $\mathit{string}$. The problem naturally reduces to searching for a word in the vocabulary $V(\mathit{string})$ formed from $\mathit{string}$.

The quantum part of the algorithms $A$ and $A2$ is the auxiliary quantum procedure $A1$, which is essentially Grover’s quantum search algorithm. Here, we use the original conditions for applying Grover’s algorithm; namely, we assume that the text $\mathit{string}$ contains exactly one occurrence of the word $w$. Note that the computed query complexity depends on how the oracle is implemented. In this work, one oracle query accesses a whole substring rather than its individual bits.

The main result, a saving in the number of qubits used, is achieved in both the $A$ and $A2$ algorithms, which use hash family techniques to save quantum space. The $A$ algorithm is based on a specific family of hash functions known as Freivalds’ fingerprints (see, for example, the book [14], Chapter 7, for more information). The $A2$ algorithm is the generalization of $A$ to an arbitrary strongly $(n, \epsilon)$-universal family of hash functions.

It is important to note that in this paper, as well as in the cited paper [9], the algorithms are applied to a quantum state (the initial state) that is prepared in advance from the analyzed text $\mathit{string}$ (via the vocabulary $V(\mathit{string})$ formed from it). The preparation of the initial state requires preliminary work, but this work is not counted in the complexity of the algorithms. Note that the problem of preparing the initial state, as a problem in its own right, is beginning to be discussed in the quantum information search community. In particular, the authors of [18] drew attention to the complexity of preparing the initial state.

Finally, we note once again that this paper considers the case where it is known in advance that the desired substring occurs in the text exactly once. The problem can be extended to the case where the number of occurrences is greater than one and known in advance, and to the case where the number of occurrences is not known in advance. Estimates of the time and space complexity of Grover’s search algorithm in these cases are given in [13].

Author Contributions

Conceptualization, N.S.; Methodology, F.A. and N.S.; Formal analysis, F.A., N.S. and M.A.; Investigation, F.A. and N.S.; Writing—original draft, F.A. and N.S.; Writing—review & editing, F.A. and N.S.; Supervision, F.A.; Project administration, F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This paper has been supported by the Kazan Federal University Strategic Academic Leadership Program (“PRIORITY-2030”).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Grover, L.K. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, Philadelphia, PA, USA, 22–24 May 1996; pp. 212–219.
2. Gilliam, A.; Pistoia, M.; Gonciulea, C. Optimizing quantum search using a generalized version of Grover’s algorithm. arXiv 2020, arXiv:2005.06468.
3. Brassard, G.; Hoyer, P.; Mosca, M.; Tapp, A. Quantum amplitude amplification and estimation. Contemp. Math. 2002, 305, 53–74.
4. Reitzner, D.; Nagaj, D.; Buzek, V. Quantum walks. arXiv 2012, arXiv:1207.7283.
5. De Wolf, R. Quantum computing: Lecture notes. arXiv 2019, arXiv:1907.09415.
6. Ablayev, F.; Ablayev, M.; Huang, J.Z.; Khadiev, K.; Salikhova, N.; Wu, D. On quantum methods for machine learning problems part I: Quantum tools. Big Data Min. Anal. 2019, 3, 41–55.
7. Ablayev, F.; Ablayev, M.; Huang, J.Z.; Khadiev, K.; Salikhova, N.; Wu, D. On quantum methods for machine learning problems part II: Quantum classification algorithms. Big Data Min. Anal. 2019, 3, 56–67.
8. Knuth, D.E.; Morris, J.H., Jr.; Pratt, V.R. Fast pattern matching in strings. SIAM J. Comput. 1977, 6, 323–350.
9. Ramesh, H.; Vinay, V. String matching in $O(\sqrt{n} + \sqrt{m})$ quantum time. J. Discret. Algorithms 2003, 1, 103–110.
10. Ablayev, M.F.; Salikhova, N.M. Quantum search for a given substring in the text using a hashing technique. Seriya Fiziko-Matematicheskie Nauki 2020, 162, 241–258.
11. Ambainis, A. Understanding quantum algorithms via query complexity. In Proceedings of the International Congress of Mathematicians (ICM 2018), Rio de Janeiro, Brazil, 1–9 August 2018; World Scientific: Singapore, 2019.
12. Diamond, H.G. Elementary methods in the study of the distribution of prime numbers. Bull. Am. Math. Soc. 1982, 7, 553–589.
13. Boyer, M.; Brassard, G.; Høyer, P.; Tapp, A. Tight bounds on quantum searching. Fortschritte Phys. Prog. Phys. 1998, 46, 493–505.
14. Motwani, R.; Raghavan, P. Randomized Algorithms; Cambridge University Press: Cambridge, UK, 1995.
15. Carter, J.L.; Wegman, M.N. Universal classes of hash functions. J. Comput. Syst. Sci. 1979, 18, 143–154.
16. Stinson, D.R. Universal hashing and authentication codes. In Annual International Cryptology Conference; Springer: Berlin/Heidelberg, Germany, 1991; pp. 74–85.
17. Stinson, D.R. Combinatorial techniques for universal hashing. J. Comput. Syst. Sci. 1994, 48, 337–346.
18. Macaluso, A.; Clissa, L.; Lodi, S.; Sartori, C. Quantum ensemble for classification. arXiv 2020, arXiv:2007.01028.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

