Mind the $\tilde{O}$: Asymptotically Better, but Still Impractical, Quantum Distributed Algorithms

Kerger, Phillip; Bernal Neira, David E.; Gonzalez Izquierdo, Zoe; Rieffel, Eleanor G.

doi:10.3390/a16070332


by Phillip Kerger^{1,2,3,*}, David E. Bernal Neira^{2,3,4}, Zoe Gonzalez Izquierdo^{2,3} and Eleanor G. Rieffel^{2,*}

¹ Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD 21218, USA

² Quantum Artificial Intelligence Laboratory, NASA Ames Research Center, Moffett Field, CA 94035, USA

³ Research Institute of Advanced Computer Science, Universities Space Research Association, Mountain View, CA 94043, USA

⁴ Davidson School of Chemical Engineering, Purdue University, West Lafayette, IN 47907, USA

^* Authors to whom correspondence should be addressed.

Algorithms 2023, 16(7), 332; https://doi.org/10.3390/a16070332

Submission received: 16 June 2023 / Revised: 8 July 2023 / Accepted: 9 July 2023 / Published: 11 July 2023

(This article belongs to the Collection Feature Paper in Algorithms and Complexity Theory)


Abstract: We present two algorithms in the quantum CONGEST-CLIQUE model of distributed computation that succeed with high probability: one for producing an approximately optimal Steiner tree, and one for producing an exact directed minimum spanning tree, each of which uses $\tilde{O} (n^{1 / 4})$ rounds of communication and $\tilde{O} (n^{9 / 4})$ messages, achieving a lower asymptotic round and message complexity than any known algorithms in the classical CONGEST-CLIQUE model. At a high level, we achieve these results by combining classical algorithmic frameworks with quantum subroutines. Additionally, we characterize the constants and logarithmic factors involved in our algorithms as well as related classical algorithms, otherwise obscured by $\tilde{O}$ notation, revealing that advances are needed to render both the quantum and classical algorithms practical.

Keywords: quantum computing; distributed computing; complexity; Steiner tree; directed minimum spanning tree; arborescence

1. Introduction

The classical CONGEST-CLIQUE model (henceforth referred to as cCCM) in distributed computing has been carefully studied as a model central to the field, e.g., [1,2,3,4,5,6]. In this model, processors in a network solve a problem whose input is distributed across the nodes under significant communication limitations, described in detail in Section 2. For example, a network of aircraft or spacecraft, satellites, and control stations, all with large distances between them, may have communication bandwidth so limited that it is naturally modeled in this way. The quantum version of this model, in which quantum bits can be sent between processors, the quantum CONGEST-CLIQUE model (qCCM), as well as the quantum CONGEST model, have been the subjects of recent research [7,8,9,10] in an effort to understand how quantum communication may help in these distributed computing frameworks. For the quantum CONGEST model, however, ref. [10] showed that many problems cannot be solved more quickly than in the classical model. These include shortest paths, minimum spanning trees, Steiner trees, min-cut, and more; the computational advantages of quantum communication are, thus, severely limited in the CONGEST setting, though a notable positive result is sub-linear diameter computation in [11]. No comparable negative results exist for the qCCM, and in fact, ref. [7] provides an asymptotic quantum speedup for computing all-pairs shortest path (APSP, henceforth) distances. Hence, the negative results of [10] cannot transfer over to the qCCM, so investigating these problems in the qCCM presents an opportunity for contributing to the understanding of how quantum communication may help in these distributed computing frameworks. In this paper, we contribute to this understanding by formulating algorithms in the qCCM for finding approximately optimal Steiner trees and exact directed minimum spanning trees using

\tilde{O} (n^{1 / 4})

rounds, asymptotically fewer than any known classical algorithm. This is done by augmenting the APSP algorithm of [7] with an efficient routing table scheme, which is necessary to make use of the shortest paths themselves rather than only the APSP distances, and using the resulting subroutine with existing classical algorithmic frameworks. Beyond asymptotics, we also characterize the complexity of our algorithms, as well as those of [2,3,7,12], including the logarithmic and constant factors involved, in order to estimate the scales at which they would be practical; this was not done in previous work. It should be noted that, similar to APSP, these problems cannot see quantum speedups in the CONGEST (non-clique) setting, as shown in [10]. Our Steiner tree algorithm is approximate and based on a classical polynomial-time centralized algorithm of [13]. Our directed minimum spanning tree algorithm follows an approach similar to [3], which effectively has its centralized roots in [14].

2. Background and Setting

This section provides the necessary background for our algorithms’ settings and the problems they solve.

2.1. The CONGEST and CONGEST-CLIQUE Models of Distributed Computing

In the standard CONGEST model, we consider a graph of n processor nodes whose edges represent communication channels. Initially, each node knows only its neighbors in the graph and associated edge weights. In rounds, each processor node executes computation locally and then communicates with its neighbors before executing further local computation. The congestion limitation restricts this communication, with each node able to send only one message of

O (log (n))

classical bits in each round to its neighbors, though the messages to each neighbor may differ. In the cCCM, we separate the communication graph from the problem’s input graph by allowing all nodes to communicate with each other, though the same

O (log (n))

bits-per-message congestion limitation remains. Hence, a processor node could send

n - 1

different messages to the other

n - 1

nodes in the graph, with a single node distributing up to

O (n \cdot log (n))

bits of information in a single round. Taking advantage of this way of dispersing information to the network is paramount in many efficient CONGEST-CLIQUE algorithms. The efficiency of algorithms in these distributed models is commonly measured in terms of the round complexity, or the number of rounds of communication used in an algorithm to solve the problem in question. A good overview of these distributed models can be found in [15].

2.2. Quantum Versions of CONGEST and CONGEST-CLIQUE

The quantum models we work in are obtained via the following modification: Instead of restricting to messages of

O (log (n))

classical bits, we allow messages to consist of

O (log (n))

quantum bits, i.e., qubits. For background on qubits and the fundamentals of quantum computing, we refer the reader to [16]. We formally define the qCCM, the setting for our algorithms, as follows:

Definition 1.

(Quantum CONGEST-CLIQUE) The quantum CONGEST-CLIQUE model (qCCM) is a distributed computation model in which an input graph

G = (V, E, W)

is distributed over a network of n processors, where each processor is represented by a node in V. Each node is assigned a unique ID number in

[n]

. Time passes in rounds, each of which consists of the following:

Each node may execute unlimited local computation.
Each node may send a message consisting of either a register of $O (log n)$ qubits or a string of $O (log n)$ classical bits to every other node in the network. Each of those messages may be distinct.
Each node receives and saves the messages the other nodes send it.

The input graph G is distributed across the nodes as follows: Each node knows its own ID number, the ID numbers of its neighbors in G, the number of nodes n in G, and the weights corresponding to the edges it is incident upon. The output solution to a problem must be given by having each node

u \in V

return the restriction of the global output to

N_{G} (u) : = {v : u v \in E}

, its neighborhood in G. No entanglement is shared across nodes initially.

This is an analog of the cCCM, except that quantum bits may be sent in place of classical bits. To clarify the output requirement, in the Steiner tree problem, we require node u to output the edges of the solution tree that are incident upon u. Since many messages in our algorithms need not be sent as qubits, we define the qCCM slightly unconventionally, allowing either quantum or classical bits to be sent. We specify those that may be sent classically. However, even without this modification, the quantum versions of CONGEST and cCCM are at least as powerful as their classical counterparts. This is because any n-bit classical message can be instead sent as an n-qubit message of unentangled qubits; for a classical bit reading 0 or 1, we can send a qubit in the state

| 0 〉

or

| 1 〉

, respectively, and then take measurements with respect to the

{| 0 〉, | 1 〉}

basis to read the same message the classical bits would have communicated. Hence, one can also freely make use of existing classical algorithms in the qCCM. Further, the assumption that IDs are in

[n]

, with n known, is not necessary but is convenient; without this assumption, we could have all nodes broadcast their IDs to the entire network and then assign a new label in

[n]

to each node according to an ordering of the original IDs, resulting in our assumed situation.
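The relabeling step just described can be sketched as follows. This is a minimal illustration, assuming every node has already received all original IDs in one broadcast round; the function name `relabel_ids` and the sample IDs are hypothetical and not taken from the paper.

```python
def relabel_ids(original_ids):
    """Map arbitrary unique IDs to labels 1..n by their rank in sorted order.

    Every node can run this locally on the same broadcast list, so all nodes
    agree on the new labels without further communication.
    """
    ordered = sorted(original_ids)
    return {orig: rank for rank, orig in enumerate(ordered, start=1)}


# Hypothetical original IDs gathered after the broadcast round.
labels = relabel_ids([907, 12, 4501, 33])
```

Because the ordering is deterministic, the resulting labels lie in $[n]$ and are consistent across the network.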

Remark 1.

Definition 1 does not account for how the information needs to be stored. In this paper, it suffices for all information regarding the input graph to be stored classically as long as there is quantum access to that data. We provide some details on this in Appendix A.4.

Remark 2.

No initial entanglement being shared across nodes, as outlined in Definition 1, results in quantum teleportation not being a straightforward method to solve problems in the qCCM.

Example 1.

To provide some intuition on how allowing communication through qubits in this distributed setting can be helpful, we now describe and give an example of the distributed Grover search, first described in [11]. The high-level intuition for why quantum computing provides an advantage for search is that quantum operations use quantum interference effects to have canceling effects among non-solutions. Grover search has a generalization referred to as “amplitude amplification” that we will use; see [16] for details on these algorithms. Now, for a processor node u in the network and a Boolean function

g : X \to {0, 1}

, suppose that there exists a classical procedure

C

in the cCCM that allows u to compute

g (x)

, for any

x \in X

in r rounds. The quantum speedup will come from computing

C

in a quantum superposition, which enables g to be evaluated with inputs in superposition so that amplitude amplification can be used for inputs to g. Let

A_{i} : = \{x \in X : g (x) = i\}

, for

i = 0, 1

, and suppose that

0 < | A_{1} | \leq | X | / 2

. Classically, node u can then find an

x \in A_{1}

in

Θ (r | X |)

rounds by checking each element of X. Using the quantum distributed Grover search of [11] enables u to find such an x with high probability in only

\tilde{O} (r \sqrt{| X |})

rounds by evaluating the result of computing g on a superposition of inputs.

We illustrate this procedure in an example case where node u wants to inquire whether one of its edges

u v

is part of a triangle in G. We first describe a classical procedure for this, followed by the corresponding quantum-distributed search version. For

v \in N_{G} (u)

, we denote by

I_{v} : V \to {0, 1}

the indicator function of

N_{G} (v)

, and by

g_{u v} : N_{G} (u) \to {0, 1}

its restriction to inputs in

N_{G} (u)

. Classically, node u can evaluate

g_{u v} (w)

in two rounds for any

w \in N_{G} (u)

by sending the ID of w (of length

log n

) to v, and having v send back the answer

I_{v} (w)

. u can then check

g_{u v} (w)

for each

w \in N_{G} (u)

one at a time to determine whether

u v

is part of a triangle in G or not in

2 \cdot | N_{G} (u) |

rounds.

For the distributed quantum implementation, u can instead initialize a register of

log n

qubits as

{| ψ 〉}_{0} : = \frac{1}{\sqrt{| N_{G} (u) |}} \sum_{x \in N_{G} (u)} | x 〉

, all the inputs for

g_{u v}

in equal superposition. To conduct a Grover search, u needs to be able to evaluate

g_{u v}

with inputs

| ψ 〉

in superposition. For the quantum implementation of

C

, u sends a quantum register in state

| ψ 〉 | 0 〉

to node v, and has node v evaluate a quantum implementation of

I_{v}

, which we will consider as a call to an oracle mapping

| x 〉 | 0 〉

to

| x 〉 | I_{v} (x) 〉

for all

x \in V

. Node v sends back the resulting qubit register, and node u evaluates

g_{u v} (| ψ 〉)

in two rounds. Now, since u can evaluate

g_{u v}

in superposition, node u may proceed using standard amplitude amplification, using two rounds of communication for each evaluation of

g_{u v}

, so that u can find an element

w \in N_{G} (u)

satisfying

g_{u v} (w) = 1

with high probability in

\tilde{O} (r \sqrt{| N_{G} (u) |})

rounds if one exists. We note that in this example, v cannot execute this procedure by itself since it does not know

N_{G} (u)

(and sending this information to v would take

| N_{G} (u) |

rounds), though it is able to evaluate

I_{v}

in superposition for any

w \in N_{G} (u)

. For any classical procedure

C

evaluating a function g other than this specific one (provided it can be efficiently implemented classically and, therefore, translated to an efficient quantum implementation), the same idea yields the square-root advantage in finding an element on which g evaluates to 1.
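To make the round counts in this example concrete, the following sketch simulates the classical procedure and tallies its rounds, next to a rough estimate for the quantum search. It is only an illustration under stated assumptions: the graph `adj` is made up, the function names are hypothetical, and the quantum estimate uses the textbook figure of about $(\pi / 4) \sqrt{| X |}$ oracle calls for a single marked element, ignoring the logarithmic factors needed for high success probability.

```python
import math


def classical_triangle_check(adj, u, v):
    """Return (witness or None, rounds used) when u queries v once per neighbor w."""
    rounds = 0
    for w in sorted(adj[u]):
        rounds += 2  # one round to send the ID of w to v, one for the reply I_v(w)
        if w in adj[v]:
            return w, rounds
    return None, rounds


def grover_round_estimate(adj, u, rounds_per_query=2):
    """Rough estimate: about (pi/4) * sqrt(|N(u)|) oracle calls, two rounds each."""
    return rounds_per_query * math.ceil(math.pi / 4 * math.sqrt(len(adj[u])))


# Hypothetical 4-node graph: triangle 0-1-2 plus a pendant edge 0-3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
hit = classical_triangle_check(adj, 0, 1)    # edge 01 lies in triangle 0-1-2
miss = classical_triangle_check(adj, 0, 3)   # edge 03 lies in no triangle
quantum_rounds = grover_round_estimate(adj, 0)
```

In the unsuccessful case the classical loop uses the full $2 \cdot | N_{G} (u) |$ rounds, which is the quantity the quantum search improves to roughly its square root.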

2.3. Notation and Problem Definitions

For an integer-weighted graph

G = (V, E, W)

, we will denote

n : = | V |, m : = | E |,

and

W_{e}

the weight of an edge

e \in E

throughout the paper. Let

δ (v) \subset E

be the set of edges incident on node v, and

N_{G} (u) : = {v : u v \in E}

the neighborhood of

u \in G

. Denote by

d_{G} (u, v)

the shortest-path distance in G from u to v. For a graph

G = (V, E, W)

, given two sets of nodes, U and

U^{'}

, let

P_{G} (U, U^{'}) : = \{u v \in E : u \in U, v \in U^{'}\}

be the set of edges connecting U to

U^{'}

. Let

P (U) : = P (U, U)

as shorthand. All logarithms will be taken with respect to base 2, unless otherwise stated.

Definition 2.

(Steiner tree problem) Given a weighted, undirected graph

G = (V, E, W)

, and a set of nodes

Z \subset V

, referred to as Steiner terminals, output the minimum weight tree in G that contains

Z

.

Definition 3.

(Approximate Steiner tree) For a Steiner tree problem with terminals

Z

and a solution

S_{O P T}

with edge set

E_{S_{O P T}}

, a tree T in G containing

Z

with edge set

E_{T}

, such that

\sum_{u v \in E_{T}} W_{u v} \leq r \cdot \sum_{u v \in E_{S_{O P T}}} W_{u v}

is called an approximate Steiner tree with approximation factor r.

Definition 4.

(Directed minimum spanning tree problem (DMST)) Given a directed, weighted graph

G = (V, E, W)

and a root node

r \in V

, output the minimum weight directed spanning tree for G rooted at r. This is also known as the minimum weight arborescence problem.

3. Contributions

We provide an algorithm for the qCCM that produces an approximate Steiner tree with high probability (w.h.p.) in

\tilde{O} (n^{1 / 4})

rounds and an algorithm that produces an exact directed minimum spanning tree w.h.p. in

\tilde{O} (n^{1 / 4})

rounds. To do this, we enhance the quantum APSP algorithm of [7] in an efficient way to compute APSP distances as well as the corresponding routing tables (described in Section 4) that our algorithms rely on. Furthermore, in addition to these

\tilde{O}

results, in Section 4.7, Section 5.4, and Section 6.3, we characterize the constants and logarithmic factors involved in our algorithms as well as related classical algorithms to contribute to the community’s understanding of their implementability. This reveals that the factors commonly obscured by the

\tilde{O}

notation in related literature, especially the logarithms, have a severe impact on practicality.

We summarize the algorithmic results in the following two theorems:

Theorem 1.

There exists an algorithm in the quantum CONGEST-CLIQUE model that, given an integer-weighted input graph

G = (V, E, W)

, outputs a

2 (1 - 1 / l)

approximate Steiner tree with a probability of at least

1 - \frac{1}{p o l y (n)}

, and uses

\tilde{O} (n^{1 / 4})

rounds of computation, where l denotes the number of terminal leaf nodes in the optimal Steiner tree.

Theorem 2.

There exists an algorithm in the quantum CONGEST-CLIQUE model that, given a directed and integer-weighted input graph

G = (V, E, W)

, produces an exact directed minimum spanning tree with high probability, of at least

1 - \frac{1}{p o l y (n)}

, and uses

\tilde{O} (n^{1 / 4})

rounds of computation.

4. APSP and Routing Tables

We first describe an algorithm for the APSP problem with routing tables in the qCCM, for which we combine an algorithm of [7] with a routing table computation from [17]. For this, we reduce APSP with routing tables to triangle finding by utilizing distance products, as demonstrated in [12].

4.1. Distance Products and Routing Tables

Definition 5.

A routing table for a node v is a function

R_{v} : V \to V

mapping a vertex u to the first node visited in the shortest path from v to u, other than v itself.

Definition 6.

The distance product between two

n \times n

matrices A and B is defined as the

n \times n

matrix

A ★ B

with entries:

{(A ★ B)}_{i j} = \min_{k} \{A_{i k} + B_{k j}\} . \qquad (1)

The distance product is also sometimes referred to as the min-plus or tropical product. For shortest paths, we will repeatedly square the graph adjacency matrix with respect to the distance product. For a

n \times n

matrix W and an integer k, let us denote

W^{k, ★} : = W ★ (W ★ (\dots (W ★ W)) \dots)

as the

k^{t h}

power of the distance product. For a graph

G = (V, E, W)

with weighted adjacency matrix W (assigning

W_{u v} = \infty

if

u v \notin E

),

W_{u v}^{k, ★}

is the length of the shortest path from u to v in G using at most k hops. Hence, for any

N \geq n

,

W^{N, ★}

contains all the shortest-path distances between nodes in G. As these distance products obey standard exponent rules, we may take

N = 2^{⌈ log n ⌉}

to recursively compute the APSP distances by taking

⌈ log n ⌉

distance product squares:

W^{2, ★} = W ★ W, \quad W^{4, ★} = {(W^{2, ★})}^{2, ★}, \quad \dots, \quad W^{2^{⌈ \log n ⌉}, ★} = {(W^{2^{⌈ \log n ⌉ - 1}, ★})}^{2, ★} . \qquad (2)

This procedure reduces computing APSP distances to computing

⌈ log n ⌉

distance products. In the context of the CONGEST-CLIQUE model, each node needs to learn the row of

W^{N, ★}

that represents it. As we also require nodes to learn their routing tables, we provide a scheme in Section 4.3 that is well-suited for our setting to extend [7] to also compute routing tables.
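The repeated-squaring reduction of Equation (2) can be illustrated with a small centralized sketch. It is not the distributed algorithm: it computes each distance product by brute force, and it assumes a weight matrix with zeros on the diagonal and `math.inf` for missing edges. The names `distance_product` and `apsp_by_squaring` and the example matrix are hypothetical.

```python
import math

INF = math.inf


def distance_product(A, B):
    """Min-plus product: (A * B)[i][j] = min_k A[i][k] + B[k][j]."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def apsp_by_squaring(W):
    """Square W under the distance product ceil(log2 n) times."""
    n = len(W)
    D = [row[:] for row in W]
    for _ in range(math.ceil(math.log2(n))):
        D = distance_product(D, D)
    return D


# Hypothetical directed 3-node graph: 0 -> 1 (3), 1 -> 2 (1), 2 -> 0 (2).
W = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
D = apsp_by_squaring(W)
```

With zeros on the diagonal, each squaring doubles the number of hops covered, which is why $⌈ \log n ⌉$ squarings suffice.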

4.2. Distance Products via Triangle Finding

Having established reductions to distance products, we turn to their efficient computation. The main idea is that we can reduce distance products to a binary search in which each step in the search finds negative triangles. This procedure corresponds to ([18] Proposition 2), which we describe here, restricting to finding the distance product square needed for Equation (2).

A negative triangle in a weighted graph is a set of edges

Δ^{-} = (u v, v w, w u) \subset E^{3}

, such that

\sum_{e \in Δ^{-}} W_{e} < 0

. Let us denote the set of all negative triangles in a graph G as

Δ_{G}^{-}

. Specifically, we will be interested in each node v being able to output edges

v u \in δ (v)

, such that

v u

is involved in at least one negative triangle in G. Let us call this problem FindEdges, and define it formally as follows:

FindEdges

Input: An integer-weighted (directed or undirected) graph $G = (V, E, W)$ distributed among the nodes, with each node v knowing $N_{G} (v)$ , as well as the weights $W_{v u}$ for each $u \in N_{G} (v)$ .
Output: For each node v, its output is all the edges $v u \in E$ that are involved in at least one negative triangle in G.

Proposition 1.

If FindEdges on an n-node integer-weighted graph

G = (V, E, W)

can be solved in

T (n)

rounds, then the distance product

A ★ B

of two

n \times n

matrices A and B with entries in

[M]

can be computed in

T (3 n) \cdot ⌈ {log}_{2} (2 M) ⌉

rounds.

Proof.

Let A and B be arbitrary

n \times n

integer-valued matrices, and D be an

n \times n

matrix initialized to

0

. Let each

u \in V

simulate three copies of itself,

u_{1}, u_{2}, u_{3}

, writing

V_{1}, V_{2}, V_{3}

as the sets of copies of nodes in V. Consider the graph

G^{'} = (V_{1} \cup V_{2} \cup V_{3}, E^{'}, W^{'})

, by letting

u_{i} v_{j} \in E^{'}

for

u_{i} \in V_{i}, v_{j} \in V_{j}, i \neq j

, taking

W_{u_{1} v_{2}}^{'} = A_{u v}

for

u_{1} \in V_{1}, v_{2} \in V_{2}

,

W_{u_{2} v_{3}}^{'} = B_{u v}

for

u_{2} \in V_{2}, v_{3} \in V_{3}

, and

W_{u_{3} v_{1}}^{'} = D_{u v}

for

u_{3} \in V_{3}, v_{1} \in V_{1}

. An edge

z v

is part of a negative triangle in

G^{'}

exactly whenever

min_{u \in V} {A_{v u} + B_{u z}} < - D_{z v} .

Assuming we can compute FindEdges for a k-node graph in

T (k)

rounds, with a non-positive matrix

D = 0

initialized, we can apply simultaneous binary searches on

D_{z v}

, with values between

[- 2 M, 0]

, updating it for each node v after each run of FindEdges to find

{min}_{u \in V} {A_{v u} + B_{u z}}

for every other node z in

T (3 n) \cdot ⌈ \log ({max}_{v, z \in V} {{min}_{u \in V} {A_{v u} + B_{u z}}}) ⌉

rounds, since

G^{'}

is a tripartite graph with

3 n

nodes. □

Remark 3.

This procedure can be realized in a single n-node distributed graph by letting each node represent the three copies of itself since

G^{'}

is tripartite. The

T (3 n)

stems from each processor node possibly needing to send one message for each node it is simulating in each round of FindEdges. If the bandwidth per message is large enough (i.e., three times the bandwidth needed for solving FindEdges in

T (n)

rounds), then this can be conducted in

T (n)

rounds.

Thus, for this binary search, each node v initializes and locally stores

D_{v z} = 0

for each other

z \in V

; subsequently, we solve FindEdges on

G^{'}

. The node then updates each

D_{v z}

according to whether or not the edge copies of

v z

were part of a negative triangle in

G^{'}

, after which, FindEdges is computed with the updated values for D. This is repeated until all the

{min}_{u \in V} {A_{v u} + B_{u z}}

are determined.
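The simultaneous binary search described above can be sketched centrally as follows. This is an illustration only: `find_edges` is a brute-force stand-in for the distributed FindEdges subroutine, the entries of A and B are assumed to be integers in $[0, M]$, and all function names and the example matrices are hypothetical.

```python
def find_edges(A, B, D):
    """Stand-in oracle: pairs (v, z) whose edge lies in a negative triangle of G'."""
    n = len(A)
    return {(v, z) for v in range(n) for z in range(n)
            if any(A[v][u] + B[u][z] + D[z][v] < 0 for u in range(n))}


def distance_product_via_find_edges(A, B, M):
    """Recover min_u A[v][u] + B[u][z] for all (v, z) with one oracle call per step."""
    n = len(A)
    lo = [[0] * n for _ in range(n)]       # invariant: lo <= true value <= hi
    hi = [[2 * M] * n for _ in range(n)]
    calls = 0
    while any(lo[v][z] < hi[v][z] for v in range(n) for z in range(n)):
        mid = [[(lo[v][z] + hi[v][z] + 1) // 2 for z in range(n)] for v in range(n)]
        # D[z][v] = -mid[v][z], so the edge is negative exactly when value < mid.
        D = [[-mid[v][z] for v in range(n)] for z in range(n)]
        negative = find_edges(A, B, D)
        calls += 1
        for v in range(n):
            for z in range(n):
                if lo[v][z] < hi[v][z]:
                    if (v, z) in negative:
                        hi[v][z] = mid[v][z] - 1
                    else:
                        lo[v][z] = mid[v][z]
    return lo, calls


# Hypothetical 2 x 2 inputs with entries bounded by M = 4.
A = [[0, 3], [2, 0]]
B = [[0, 1], [4, 0]]
product, calls = distance_product_via_find_edges(A, B, M=4)
```

All entries are searched in parallel, so the number of oracle calls grows with the logarithm of the entry range rather than with the number of entries.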

4.3. Routing Tables via Efficient Computation of Witness Matrices

For the routing table entries, we also need each node v to know the intermediate node u that is being used to attain

{min}_{u \in V} {W_{v u} + W_{u z}}

.

Definition 7.

For a distance product

A ★ B

of two

n \times n

matrices

A, B

, a witness matrix C is an

n \times n

matrix, such that

C_{i j} \in \arg\min_{k \in [n]} \{A_{i k} + B_{k j}\} .

To put it simply, a witness matrix contains the intermediate entries used to attain the values in the resulting distance product. We present here a simple way of computing witness matrices along with the distance product by modifying the matrix entries appropriately, first considered by [17]. The approach is well-suited for our algorithm, as we only incur

O (log n)

additional calls to FindEdges for a distance product computation with a witness matrix.

For an

n \times n

integer matrix W, obtain matrices

W^{'}

and

W^{″}

by taking

W_{i j}^{'} = n W_{i j} + j - 1

and

W_{j i}^{″} = n W_{j i}

. Set

K = W^{'} ★ W^{″}

.

Claim 1.

With

W, W^{'}, W^{″},

and K as defined immediately above,

(i): $⌊ \frac{K}{n} ⌋ = W^{2, ★}$
(ii): $(K mod n) + 1$ is a witness matrix for $W^{2, ★} .$

The claim follows from routine calculations of the quantities involved and can be found in Appendix A.1.
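Claim 1 can be checked numerically on a small instance. The sketch below is a centralized illustration, assuming finite integer entries (a large finite sentinel would replace infinite weights) and 0-indexed nodes, so the offset $j - 1$ becomes `j` and the witness is `K % n` without the final $+ 1$. The function name and the example matrix are hypothetical.

```python
def distance_square_with_witness(W):
    """Return (W squared under the distance product, a witness matrix), 0-indexed."""
    n = len(W)
    # W1 encodes the column index in the low-order part of each entry.
    W1 = [[n * W[i][j] + j for j in range(n)] for i in range(n)]
    W2 = [[n * W[i][j] for j in range(n)] for i in range(n)]
    K = [[min(W1[i][k] + W2[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    square = [[K[i][j] // n for j in range(n)] for i in range(n)]
    witness = [[K[i][j] % n for j in range(n)] for i in range(n)]
    return square, witness


# Hypothetical symmetric 3-node weight matrix with zero diagonal.
W = [[0, 5, 1],
     [5, 0, 1],
     [1, 1, 0]]
square, witness = distance_square_with_witness(W)
```

Each witness entry names an intermediate node attaining the minimum, so the distance and the witness are recovered from a single distance product on the scaled matrices.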

Hence, we can obtain witness matrices by simply changing the entries of our matrices by no more than a multiplicative factor of n and an addition of n. Since the complexity of our method depends logarithmically on the magnitude of the entries of W, we need only logarithmically many additional calls to FindEdges to obtain witness matrices along with the distance products, making this simple method well-suited for our approach. More precisely, we can compute

W^{2, ★}

with a witness matrix using

⌈ \log (2 n \cdot \max_{i, j} \{W_{i j}^{2, ★} : W_{i j}^{2, ★} < \infty\}) ⌉

calls to FindEdges. We obtain the following corollary to Proposition 1 to characterize the exact number of rounds needed:

Corollary 1.

If FindEdges on an n-node integer-weighted graph

G = (V, E, W)

can be solved in

T (n)

rounds, then the distance product square

W^{2, ★}

, along with a witness matrix H, can be computed in

T (3 n) \cdot ⌈ {log}_{2} (n \cdot {max}_{v, z \in V} {{min}_{u \in V} {W_{v u} + W_{u z}}} + n) ⌉

rounds.

Proof.

This follows from Claim 1 and Proposition 1 upon observing that

{max}_{v, z \in V} {{min}_{u \in V} {W_{v u}^{'} + W_{u z}^{″}}} \leq n \cdot {max}_{v, z \in V} {{min}_{u \in V} {W_{v u} + W_{u z}}} + n

. □

Once we obtain witness matrices along with the distance product computations, constructing the routing tables for each node along the way of computing APSP is straightforward. In each squaring of W in Equation (2), each node updates its routing table entries according to the corresponding witness matrix entry observed. It is worth noting that these routing table entries only need to be stored and accessed classically so that we avoid using unnecessary quantum data storage.
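The routing table update during repeated squaring can be sketched as follows. This is a centralized illustration under stated assumptions: non-negative weights, zeros on the diagonal, `math.inf` for missing edges, and witnesses found by direct enumeration rather than through FindEdges. The update rule used here, copying the first hop toward the witness whenever a distance strictly improves, is one simple way to realize the idea; the names and the example graph are hypothetical.

```python
import math

INF = math.inf


def apsp_with_routing(W):
    """Return (distances, next-hop table) via repeated distance-product squaring."""
    n = len(W)
    dist = [row[:] for row in W]
    # Initially the first hop toward a direct neighbor is that neighbor itself.
    nxt = [[j if i != j and W[i][j] < INF else None for j in range(n)]
           for i in range(n)]
    for _ in range(math.ceil(math.log2(n))):
        new_dist = [row[:] for row in dist]
        new_nxt = [row[:] for row in nxt]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    if dist[i][k] + dist[k][j] < new_dist[i][j]:
                        new_dist[i][j] = dist[i][k] + dist[k][j]
                        new_nxt[i][j] = nxt[i][k]  # first hop toward witness k
        dist, nxt = new_dist, new_nxt
    return dist, nxt


# Hypothetical directed 3-node graph: 0 -> 1 (3), 1 -> 2 (1), 2 -> 0 (2).
W = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
dist, nxt = apsp_with_routing(W)
```

Row i of `nxt` plays the role of the routing table of node i, and it is plain classical data throughout.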

4.4. Triangle Finding

Given the results from Section 4.2 and Section 4.3, we have reduced finding both the routing tables and the distance product to having each node learn which of its incident edges are involved in a negative triangle in the graph. This section will, thus, describe the procedure to solve the FindEdges subroutine. We state here a central result from [7]:

Proposition 2.

There exists an algorithm in the quantum CONGEST-CLIQUE model that solves the FindEdges subroutine in

\tilde{O} (n^{1 / 4})

rounds.

We now describe each step of the algorithm in order to state its precise round complexity beyond the

\tilde{O} (n^{1 / 4})

, to characterize the constants involved in the interest of assessing the future implementability of our algorithms.

As a preliminary, we provide a message routing lemma of [5] for the congested clique, which will be used repeatedly:

Lemma 1.

Suppose each node in G is the source and destination for at most n messages of size

O (log n)

and that the sources and destinations of each message are known in advance to all nodes. All messages can then be routed to their destinations in 2 rounds.

We introduce the subproblem FindEdgesWithPromise (FEWP, henceforth). Let

Γ (u, v)

denote the number of nodes

w \in V

, such that

(u, v, w)

forms a negative triangle in G.

FEWP

Input: An integer-weighted graph $G = (V, E, W)$ distributed among the nodes and a set $S \subset P (V)$ , with each node v knowing $N_{G} (v)$ and S.
Promise: For each $u v \in S, Γ (u, v) \leq 90 log n .$
Output: For each node v, its outputs are the edges $v u \in S$ that satisfy $Γ (u, v) > 0$ .

We present here a description of the procedure of [7] to solve FindEdges, given an algorithm

A

to solve FEWP. Let

ε_{A}

be the failure probability of algorithm

A

for an instance of FEWP.

FindEdgesViaFEWP

1: $S : = P (V); M : = \emptyset; i : = 0$.

2: WHILE $60 \cdot 2^{i} \log n \leq n$:

(a): Each node samples each of its edges with probability $\sqrt{\frac{60 \cdot 2^{i} \log n}{n}}$, so that we obtain a distributed subgraph $G^{'}$ of G consisting of the sampled edges.
(b): Run $A$ on $(G^{'}, S)$. Denote the output by $S^{'}$.
(c): $S \leftarrow S ∖ S^{'}; M \leftarrow M \cup S^{'}; i \leftarrow i + 1$.

3: Run $A$ on $(G, S)$, and call $S^{″}$ the output.

4: Output $M \cup S^{″}$.

From step 2 of the above algorithm, it is straightforward to check that the procedure requires at most

c_{n} : = ⌈ log (\frac{n}{60 log n}) ⌉ + 1

calls to the

A

subroutine to solve FEWP. Further, it succeeds with a probability of at least

1 - c_{n} / n^{3} - c_{n} / n^{28} - (c_{n} + 1) ε_{A}

. We refer the reader to ([7] Section 3) for the proof of correctness. We now turn toward constructing an efficient algorithm for FEWP.
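The loop condition and the quantity $c_{n}$ can be explored numerically with the sketch below. It is an illustration only, assuming base-2 logarithms as elsewhere in the paper; the function names `fewp_call_schedule` and `c_n` are hypothetical, and no sampling or call to the subroutine for FEWP is actually performed.

```python
import math


def fewp_call_schedule(n):
    """Return the edge-sampling probabilities of step 2 and the total number of calls."""
    probs = []
    i = 0
    while 60 * 2**i * math.log2(n) <= n:
        probs.append(math.sqrt(60 * 2**i * math.log2(n) / n))
        i += 1
    return probs, len(probs) + 1  # plus the final call of step 3 on the full graph


def c_n(n):
    """The bound on the number of calls stated in the text."""
    return math.ceil(math.log2(n / (60 * math.log2(n)))) + 1


n = 2**20
probs, total_calls = fewp_call_schedule(n)
bound = c_n(n)
```

The sampling probability grows by a factor of $\sqrt{2}$ per iteration and stays at most 1 by the loop condition. Note also that the loop does not run at all unless $n \geq 60 \log n$, which already hints at the scales involved.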

To solve this subroutine, we must first introduce an additional labeling scheme over the nodes that will determine how the search for negative triangles will be split up to avoid communication congestion in the network. Assume for simplicity that

n^{1 / 4}, \sqrt{n}, n^{3 / 4}

are integers. Let

M = [n^{1 / 4}] \times [n^{1 / 4}] \times [\sqrt{n}]

. Clearly,

| M | = n

, and

M

admits a total ordering lexicographically. Since we assume each node

v_{i} \in V

is labeled with a unique integer ID

i \in [n]

,

v_{i}

can select the element in

M

that has place i in the lexicographic ordering of

M

without communication occurring. Hence, each node

v \in V

is associated with a unique triple

(i, j, k) \in M

. We will refer to the unique node associated with

(i, j, k) \in M

as node

v_{(i, j, k)}

.
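The local computation of the triple from a node ID can be sketched as follows, assuming IDs in $[n]$ with n a perfect fourth power. The function name `id_to_triple` is hypothetical, and the integer root is taken by rounding, which is adequate for illustration sizes.

```python
def id_to_triple(node_id, n):
    """Return the triple (i, j, k) at position node_id in lexicographic order."""
    q = round(n ** 0.25)          # n^(1/4)
    s = q * q                     # sqrt(n)
    assert q ** 4 == n, "this sketch assumes n is a perfect fourth power"
    idx = node_id - 1             # switch to a 0-based position
    i, rest = divmod(idx, q * s)  # first coordinate changes slowest
    j, k = divmod(rest, s)
    return (i + 1, j + 1, k + 1)


n = 16
triples = [id_to_triple(node_id, n) for node_id in range(1, n + 1)]
```

Since the map is a bijection between $[n]$ and the set of triples, no communication is needed for nodes to agree on it.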

The next ingredient is a partitioning scheme of the space of possible triangles. Let

U

be a partition of V into

n^{1 / 4}

subsets containing

n^{3 / 4}

nodes each, by taking

U_{i} : = \{v_{j} : j \in \{(i - 1) \cdot n^{3 / 4} + 1, \dots, i \cdot n^{3 / 4}\}\}

for

i = 1, \dots, n^{1 / 4}

, and

U : = \{U_{1}, \dots, U_{n^{1 / 4}}\}

. Apply the same idea to create a partition

U^{'}

of

\sqrt{n}

sets of size

\sqrt{n}

, by taking

U_{i}^{'} : = \{v_{j} : j \in \{(i - 1) \cdot \sqrt{n} + 1, \dots, i \cdot \sqrt{n}\}\}

for

i = 1, \dots, \sqrt{n}

, and

U^{'} : = \{U_{1}^{'}, \dots, U_{\sqrt{n}}^{'}\}

. Let

V = U \times U \times U^{'}

. Each node

v_{(i, j, k)}

can then locally determine its association with the element

(U_{i}, U_{j}, U_{k}^{'}) \in V

since

| V | = n

. Further, if we use one round to have all nodes broadcast their IDs to all other nodes, each node

v_{(i, j, k)}

can locally compute the

(U_{i}, U_{j}, U_{k}^{'})

it is assigned to, so this assignment can be conducted in one round.
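A small sketch of the two partitions, assuming node IDs $1, \dots, n$, n a perfect fourth power, and blocks of consecutive IDs that are pairwise disjoint. The function name `build_partitions` is hypothetical.

```python
def build_partitions(n):
    """Return (U, U_prime): n^(1/4) blocks of size n^(3/4), and sqrt(n) blocks of size sqrt(n)."""
    q = round(n ** 0.25)   # n^(1/4)
    s = q * q              # sqrt(n)
    block = q ** 3         # n^(3/4)
    assert q ** 4 == n, "this sketch assumes n is a perfect fourth power"
    U = [list(range((i - 1) * block + 1, i * block + 1)) for i in range(1, q + 1)]
    U_prime = [list(range((i - 1) * s + 1, i * s + 1)) for i in range(1, s + 1)]
    return U, U_prime


n = 16
U, U_prime = build_partitions(n)
num_triples = len(U) * len(U) * len(U_prime)
```

The number of block triples equals n, which is what allows exactly one triple to be assigned to each node.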

We present here the algorithm ComputePairs used to solve the FEWP subroutine.

ComputePairs

Input: An integer-weighted graph

G = (V, E, W)

distributed among the nodes, a partition of

V \times V \times V

into blocks

(U_{i}, U_{j}, U_{k}^{'})

associated with each node as above, and a set

S \subset P (V)

, such that for

u v \in S, Γ (u, v) \leq 90 log n .

Output: For each node v, its output is the edges

v u \in S

that satisfy

Γ (u, v) > 0

.

1: Every node $v_{(i, j, k)}$ receives the weights $W_{u v}$, $W_{v w}$ for all $u v \in P (U_{i}, U_{j})$ and $v w \in P (U_{j}, U_{k}^{'})$.
2: Every node $v_{(i, j, k)}$ constructs the set $Λ_{k} (U_{i}, U_{j}) \subset P (U_{i}, U_{j})$ by selecting every $u v \in P (U_{i}, U_{j})$ with probability $10 \cdot \frac{\log n}{\sqrt{n}}$. If $| \{v \in U_{j} : u v \in Λ_{k} (U_{i}, U_{j})\} | > 100 n^{1 / 4} \log n$ for some $u \in U_{i}$, abort the algorithm and report failure. Otherwise, $v_{(i, j, k)}$ keeps all pairs $u v \in Λ_{k} (U_{i}, U_{j}) \cap S$ and receives the weights $W_{u v}$ for all of those pairs. Denote those elements of $Λ_{k} (U_{i}, U_{j}) \cap S$ as $u_{1}^{k} v_{1}^{k}, \dots, u_{m}^{k} v_{m}^{k}$.
3: Every node $v_{(i, j, k)}$ checks, for each $l \in [m]$, whether there is some node $w \in U_{k}^{'}$ such that $(u_{l}^{k}, v_{l}^{k}, w)$ forms a negative triangle, and outputs all pairs $u_{l}^{k} v_{l}^{k}$ for which a negative triangle was found.

With a probability of at least

1 - 2 / n

, the algorithm ComputePairs does not terminate at step 2 and every pair

(u, v) \in S

appears in at least one

Λ_{k} (U_{i}, U_{j})

. The details for this result can be found in ([7] Lemma 2).

Step 1 requires

2 n^{1 / 4} ⌈ \frac{log W}{log n} ⌉

rounds and can be fully implemented classically without any qubit communication. Step 2 requires at most

200 log n ⌈ \frac{log W}{log n} ⌉

rounds and can also be implemented classically. Step 3 can be implemented in

\tilde{O} (n^{1 / 4})

rounds, quantumly taking advantage of the distributed Grover search, but would take

O (\sqrt{n})

steps to implement classically. The remainder of this section is devoted to illustrating how this step can be conducted in

\tilde{O} (n^{1 / 4})

rounds.

Define the following quantity:

Definition 8.

For node

v_{(i, j, k)}

, let

Δ (i, j, k) : = {(u, v) \in P (U_{i}, U_{j}) \cap S : \exists w \in U_{k}^{'} with (u, v, w) forming a negative triangle in G}
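For intuition, the set in Definition 8 can be computed by brute force on a single machine. The sketch below assumes that a negative triangle is one whose three edge weights sum to a negative number, that W is a symmetric matrix, and that S holds pairs as sorted tuples; the function name `delta` is illustrative only.

```python
from itertools import product


def delta(W, U_i, U_j, U_k_prime, S):
    """Pairs (u, v) in P(U_i, U_j) ∩ S closing a negative triangle with some w in U'_k."""
    found = set()
    for u, v in product(U_i, U_j):
        pair = (min(u, v), max(u, v))
        if u == v or pair not in S:
            continue
        # Assumed definition: the triangle is negative if its weights sum below zero.
        if any(w not in pair and W[u][v] + W[v][w] + W[w][u] < 0 for w in U_k_prime):
            found.add(pair)
    return found


W = [[0, -5, 1, 10],
     [-5, 0, 1, 10],
     [1, 1, 0, 10],
     [10, 10, 10, 0]]
result = delta(W, U_i=[0], U_j=[1, 3], U_k_prime=[2], S={(0, 1), (0, 3)})
```

The inner loop over `U_k_prime` is the part that the distributed quantum search replaces.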

For simultaneous quantum searches, we divide the nodes into different classes based on the number of negative triangles that they are a part of with the following routine:

IdentifyClass

Input: An integer-weighted graph

G = (V, E, W)

distributed among the nodes, and a set

S \subset E

, as in FEWP.

Output: For each node v, a class

α

to which the node belongs.

1:: Every node $u_{(i, j, k)} \in V$ samples each node in ${v \in V : (u_{(i, j, k)}, v) \in S}$ with probability $\frac{10 log n}{n}$ , creating a set $Λ (u)$ of sampled vertices. If ${max}_{u} | Λ (u) | > 20 log n$ , abort the algorithm and report a failure. Otherwise, have each node broadcast $Λ (u)$ to all other nodes, and take $R : = \cup_{u \in V} {u v | v \in Λ (u)}$ .
2:: Each $v_{(i, j, k)} \in V$ computes $d_{i, j, k} : = | {u v \in P (i, j) \cap R : \exists w \in U_{k}^{'}$ , such that ${u, v, w}$ forms a negative triangle in $G} |$ , then determines its class $α$ to be $\min {c \in N : d_{i, j, k} < 10 \cdot 2^{c} log n}$ .

This uses, at most,

20 log n

rounds (each node sends (at most) that many IDs to every other node) and can be implemented so that all exchanged messages consist solely of classical bits. Using a Chernoff bound, one can show that the procedure succeeds with a probability of at least

1 - 1 / n

, as seen in ([18] Proposition 5).
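The class assignment in step 2 of IdentifyClass amounts to a small local computation. The following sketch assumes base-2 logarithms and that the classes start at c = 0; `identify_class` is a hypothetical helper name.

```python
import math


def identify_class(d: int, n: int) -> int:
    """Smallest c in {0, 1, 2, ...} with d < 10 * 2**c * log2(n)."""
    c = 0
    while d >= 10 * (2 ** c) * math.log2(n):
        c += 1
    return c


# With n = 16 the thresholds are 40, 80, 160, ...
classes = [identify_class(d, 16) for d in (0, 39, 40, 100)]
```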

Let us make the convenient assumption that

α = 0

for all

v_{i, j, k}

, which avoids some technicalities around congestion in the forthcoming triangle search. Note that

α \leq \frac{1}{2} log n

, so we can run successive searches for each

α

for nodes in class

α

in the general case. The general case is discussed in Appendix A.2 and can also be found in [7], but this case is sufficient to convey the central ideas.

We have all the necessary ingredients to describe the implementation of step 3 of the ComputePairs procedure.

3.1:: Each node executes the IdentifyClass procedure.
3.2:: For each $α$ , for every $l \in [m]$ , every node $v_{(i, j, k)}$ in class $α$ executes a quantum search to find whether there is a $U_{k}^{'} \in U^{'}$ with some $w \in U_{k}^{'}$ forming a negative triangle $(u_{l}^{k}, v_{l}^{k}, w)$ in G, and then reports all the pairs $u_{l}^{k} v_{l}^{k}$ for which such a $U_{k}^{'}$ is found.

This provides the basis of the triangle-searching strategy. To summarize the intuition of the asymptotic speedup in this paper: Since

U_{k}^{'}

has size

\sqrt{n}

(recall that

| U^{'} | = \sqrt{n}

), if each node using a quantum search can search through its assigned

U_{k}^{'}

in

\tilde{O} (n^{1 / 4})

rounds, simultaneously, we will obtain our desired complexity. We will complete this argument in Section 4.6 and first describe the quantum searches used therein in the following subsection.

4.5. Distributed Quantum Searches

With this intuition in mind, we now state two useful theorems of [7] for distributed quantum searches. Let X denote a finite set throughout this subsection.

Theorem 3.

Let

g : X \to {0, 1}

. If node u can compute

g (x)

in r rounds in the CONGEST-CLIQUE model for any

x \in X

, then there exists an algorithm in the quantum CONGEST-CLIQUE that has u output some

x \in X

with

g (x) = 1

with high probability, using

\tilde{O} (r \sqrt{| X |})

rounds.

This basic theorem concerns only single searches; however, we need a framework that can perform multiple simultaneous searches. Let

g_{1}, \dots, g_{m} : X \to {0, 1}

and

A_{i}^{0} : = {x \in X : g_{i} (x) = 0}, A_{i}^{1} : = {x \in X : g_{i} (x) = 1}, \forall i \in [m] .

Assume that there exists an r-round classical distributed algorithm

C_{m}

that allows a node u upon an input

χ = (x_{1}, \dots, x_{m}) \in X^{m}

to determine and output

(g_{1} (x_{1}), \dots, g_{m} (x_{m}))

. In our use of distributed searches, X will consist of nodes in the network, and searches will need to communicate with those nodes for which the functions

g_{i}

are evaluated. To avoid congestion, we will have to carefully consider those

χ \in X^{m}

that have many repeated entries. We introduce a notation for this first. Define the quantity

α (χ) : = max_{I \subset [m]} | {χ_{i} = χ_{j} \forall i, j \in I} |,

as the maximum number of entries in

χ

that are all identical.
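In code, this quantity is simply the largest multiplicity of any entry. A minimal sketch (the names `alpha` and `within_promise` are illustrative):

```python
from collections import Counter


def alpha(chi) -> int:
    """Largest number of identical entries in the tuple chi."""
    return max(Counter(chi).values(), default=0)


def within_promise(chi, beta: int) -> bool:
    """True if chi has at most beta repetitions of any single element."""
    return alpha(chi) <= beta


example = ("w1", "w2", "w1", "w3", "w1")
```

Inputs with a small value of this quantity spread the evaluation requests over many nodes, which is what keeps congestion under control.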

Next, given some

β \in N

, assume that in place of

C_{m}

, we now have a classical algorithm

{\tilde{C}}_{m, β}

, such that upon input

χ = (x_{1}, \dots, x_{m}) \in X^{m}

, a node u outputs

g_{1} (x_{1}), \dots, g_{m} (x_{m})

if

α (χ) \leq β

and an arbitrary output otherwise. The following theorem states that such a

{\tilde{C}}_{m, β}

with sufficiently large

β

is enough to maintain a quantum speedup as seen in the previous theorem:

Theorem 4.

For set X with

| X | < m / (36 log m)

, suppose there exists such an evaluation algorithm

{\tilde{C}}_{m, β}

for some

β > 8 m / | X |

and that

α (χ) \leq β

for all

χ \in A_{1}^{1} \times \cdots \times A_{m}^{1}

. Then, there is a

\tilde{O} (r \sqrt{| X |})

-round quantum algorithm that outputs an element of

A_{1}^{1} \times \cdots \times A_{m}^{1}

with a probability of at least

1 - 2 / m^{2}

.

The proof can be found in ([7] Theorem 3).

4.6. Final Steps of the Triangle Finding

We continue here to complete step 3.2 of the ComputePairs procedure, armed with Theorem 4. We need simultaneous searches to be executed by each node

v_{(i, j, k)}

to determine the triangles in

U_{i} \times U_{j} \times U_{k}^{'}

. We provide a short lemma first that ensures the conditions for the quantum searches:

Lemma 2.

The following statements hold with a probability of at least

1 - 2 / n^{2}

:

(i):: $| Δ (i, j, k) | \leq 2 n$
(ii):: $| Λ_{k} (U_{i}, U_{j}) \cap Δ (i, j, k) | \leq 100 \cdot \sqrt{n} log n$ for $i, j \in [n^{1 / 4}]$ .

The proofs of these statements are technical but straightforward, making use of a Chernoff bound and union bounds; hence, we skip them here. To invoke Theorem 4, we describe a classical procedure first, beginning with an evaluation step, EvaluationA, implementable in

\tilde{O} (1)

rounds.

EvaluationA

Input: Every node

v_{(i, j, k)}

receives m elements

(u_{1}^{i, j, k}, \dots, u_{m}^{i, j, k})

of

U^{'} .

Promise: For every node

v_{i, j, k}

and every

w \in U^{'}, | L_{w}^{i, j, k} | \leq 800 \sqrt{n} log n

.

Output: Each node outputs a list of exactly those

u_{l}^{i, j, k}

, such that there is a negative triangle in

U_{i} \times U_{j} \times u_{l}^{i, j, k}

.

1:: Every node $v_{(i, j, k)}$ for each $t \in [\sqrt{n}]$ routes the list $L_{w}^{i, j, t}$ to node $v_{(i, j, t)}$ .
2:: Every node $v_{(i, j, k)}$ for each $v u$ it received in step 1 sends the truth value of the inequality

$\begin{matrix} \min_{w \in U_{k}^{'}} {W_{u w} + W_{w v}} \leq W_{v u} \end{matrix}$

(3)

to the node that sent $v u$ .

Each node is the source and destination of up to

800 n log n

messages in step 1, meaning that this step can be implemented in

1600 log n

rounds. The same goes for step 2, noting that the number of messages is the same, but they need to only be single-bit messages (the truth values of the inequalities). Hence, the evaluations for Theorem 4 can be implemented in

3200 log n

rounds. Now, applying the theorem with

X = U^{'}, β = 800 \sqrt{n} log n

, noting that then the assumptions of the theorem hold with a probability of at least

1 - 2 / n^{2}

due to Lemma 2, implies that step 3.2 is implementable in

\tilde{O} (n^{1 / 4})

rounds, with a success probability of at least

1 - 2 / m^{2}

.

For the general case in which we do not assume

α = 0

for all

i, j, k

in IdentifyClass, covered in the appendix, one needs to modify the EvaluationA procedure in order to implement load balancing and information duplication to avoid congestion in the simultaneous searches. These details can be found in the appendix, where a new labeling scheme and a different evaluation procedure EvaluationB are described; more information can also be found in [7].

4.7. Complexity

As noted previously and in [7], this APSP scheme uses

\tilde{O} (n^{1 / 4})

rounds. Let us characterize the constants and logarithmic factors involved to assess this algorithm’s practical utility. Suppose that in each round,

2 \cdot log n

qubits can be sent in each message (so that we can send two IDs or one edge with each message), where n is the number of nodes. For simplicity, let us assume

W ≪ n

and drop W.

1:: APSP with routing tables needs $log (n)$ distance products with witness matrices.
2:: Computing the $i^{th}$ distance product square for Equation (2) with a witness matrix needs up to $log (2^{i}) = i$ calls to FindEdges, since the entries of the matrix being squared may double each iteration. APSP and distance products together make $\sum_{i = 1}^{⌈ log n ⌉} i = \frac{⌈ log (n) ⌉ (⌈ log (n) ⌉ + 1)}{2}$ calls to FindEdges.
3:: Solving FindEdges needs $log (\frac{n}{60 log n})$ calls to FEWP, using FindEdgesViaFEWP.
4:: Step 1 of ComputePairs needs up to $2 \cdot n^{1 / 4}$ rounds and step 2 takes up to $200 log n$ rounds.
5:: Step 1 of IdentifyClass needs up to $20 log n$ rounds.
6:: In step 2 of IdentifyClass, the $c_{u v w}$ are up to $\frac{1}{2} log n$ large and, hence, $α$ may range up to $\frac{1}{2} log n$ .
7:: Step 0 of the EvaluationB procedure needs $n^{1 / 4}$ rounds. Steps 1 and 2 of the EvaluationB procedure (or EvaluationA, in the $α = 0$ case) use a total of $3200 log n$ rounds.
8:: The procedure EvaluationB (or EvaluationA) is called up to $log (n) n^{1 / 4}$ times for each value of $α$ in step 3.2 of ComputePairs.

Without any improvements, we have the following complexity, using

3 n

in place of n for the terms of steps 3–8 due to Corollary 1:

\begin{matrix} \frac{⌈ log (n) ⌉ (⌈ log (n) ⌉ + 1)}{2} log (\frac{3 n}{60 log 3 n}) (2 {(3 n)}^{1 / 4} + 220 log 3 n + 2 {(3 n)}^{1 / 4} + \\ \frac{1}{2} log 3 n \cdot log 3 n \cdot {(3 n)}^{1 / 4} 3200 (log 3 n)), \end{matrix}

(4)

which we will call

f (n)

, so that

f (n) = O (n^{1 / 4} {log}^{6} (n))

, with the largest term being about

800 {log}^{6} (n) n^{1 / 4}

, and we drop W to just consider the case

W ≪ n

. We can solve the problem trivially in the (quantum or classical) CONGEST-CLIQUE within

n log (W)

rounds by having each node broadcast its neighbors and the weights of the corresponding edges. Dropping W again for the case

W ≪ n

, the quantum algorithm gives a real speedup only if

f (n) < n,

which requires

n > 10^{18}

(even with the simpler under-approximation

800 {log}^{6} (n) n^{1 / 4}

in place of f). Hence, even with some potential improvements, the algorithm is impractical for a large regime of values of n, even when compared to the trivial CONGEST-CLIQUE n-round strategy.

For the algorithm of [7] computing only APSP distances, the first factor in Equation (4) becomes simply

⌈ log n ⌉

, so that when computing only APSP distances, the advantage over the trivial strategy begins at roughly

n \approx 10^{16}

.

Remark 4.

In light of logarithmic factors commonly being obscured by

\tilde{O}

notation, we point out that even an improved algorithm needing only

{log}^{4} (n) n^{1 / 4}

would not be practical unless

n > 10^{7}

, for the same reasons. Recall that n is the number of processors in the distributed network—tens of millions would be needed to make this algorithm worth implementing instead of the trivial strategy. Practitioners should mind the

\tilde{O}

if applications are of interest since even relatively few logarithmic factors can severely limit the practicality of algorithms, and researchers should be encouraged to fully write out the exact complexities of their algorithms for the same reason.
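The thresholds quoted in this subsection can be reproduced numerically. The sketch below assumes base-2 logarithms and uses only the simplified round counts from the text (the leading term of f and the hypothetical improved bound of Remark 4), not the full Equation (4); the helper `crossover` is illustrative.

```python
import math


def crossover(cost, lo=1e4, hi=1e30, iters=200):
    """Bisect on a log scale for the n with cost(n) = n.

    Assumes cost(lo) > lo, cost(hi) < hi, and a single crossing in between.
    """
    assert cost(lo) > lo and cost(hi) < hi
    for _ in range(iters):
        mid = math.sqrt(lo * hi)          # geometric midpoint
        if cost(mid) > mid:
            lo = mid
        else:
            hi = mid
    return hi


def quantum_leading(n):
    return 800 * math.log2(n) ** 6 * n ** 0.25


def improved(n):
    return math.log2(n) ** 4 * n ** 0.25


n_quantum = crossover(quantum_leading)
n_improved = crossover(improved)
print(f"leading term of f beats n rounds from n ~ {n_quantum:.2e}")
print(f"log^4(n) n^(1/4) beats n rounds from n ~ {n_improved:.2e}")
```

Under these assumptions, the first crossover lies between 10^18 and 10^19 and the second between 10^7 and 10^8, in line with the regimes stated above.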

4.7.1. Memory Requirements

Although in Definition 1, we make no assumption on the memory capacities of each node, the trivial n-round strategy uses at least

{2 log (n) | E |}^{2} \cdot log (W)

memory at the leader node that solves the problem. For the APSP problem in question, using the Floyd–Warshall algorithm results in memory requirements of

2 n^{2} log (n) \cdot log (n W)

at the leader node. Hence, we may ask whether the quantum APSP algorithm leads to lower memory requirements. The memory requirement is largely characterized by up to

720 n^{7 / 4} log (n) log (n W)

needed in step 0 of the EvaluationB procedure, which can be found in the appendix. This results in a memory advantage for quantum APSP over the trivial strategy beginning in the regime of

n > 1.6 \cdot 10^{10}

.
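The threshold follows from comparing the two memory terms directly. A short derivation, assuming both expressions carry the common factor log(n) log(nW) exactly as written above:

```latex
\begin{aligned}
720\, n^{7/4} \log(n) \log(nW) &< 2\, n^{2} \log(n) \log(nW) \\
\iff\quad 360 &< n^{1/4} \\
\iff\quad n &> 360^{4} \approx 1.68 \cdot 10^{10}.
\end{aligned}
```

This agrees with the regime stated above.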

4.7.2. Complexity of the Classical Analog

For completeness, we provide here a characterization of the complexity of a closely related classical algorithm for APSP with routing tables in the CONGEST-CLIQUE, as proposed in [12], which has complexity

\tilde{O} (n^{1 / 3})

. In their framework, the approach to finding witness matrices requires

O (log {(n)}^{3})

calls to the distance product ([12] Section 3.4), and similar to our approach,

log (n)

distance products are required. Their classical algorithm computes distance products in

O (n^{1 / 3})

rounds, or under

2 log n

message bandwidth in up to

\begin{matrix} 20 n^{1 / 3} log {(n)}^{4} = : g (n) \end{matrix}

(5)

rounds, the details of which can be found in Appendix A.2. Then,

g (n) > n

, up until about

n \approx 2.6 \cdot 10^{11}

. As with the quantum APSP, though this algorithm provides the best-known asymptotic complexity of

\tilde{O} (n^{1 / 3})

in the classical CONGEST-CLIQUE, it also fails to give any real improvement over the trivial strategy across a very large regime of values of n. Consequently, algorithms making use of this APSP algorithm, such as [2] or [3], suffer from the same problem of impracticality. However, the algorithm only requires

4 n^{4 / 3} log (n) log (n W) + n log (n) log (n W)

memory per node, which is less than what is required for the trivial strategy, even for

n \geq 4

.
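The memory comparison can be checked in a few lines. The sketch assumes the factor log(n) log(nW) is common to both memory bounds and therefore cancels; the function name is illustrative.

```python
def classical_needs_less_memory(n: int) -> bool:
    """Compare 4 n^{4/3} + n against 2 n^2 (common log factors cancelled)."""
    return 4 * n ** (4 / 3) + n < 2 * n ** 2


# Start at n = 2 so that the cancelled factor log(n) is nonzero.
smallest = next(n for n in range(2, 100) if classical_needs_less_memory(n))
```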

5. Approximately Optimal Steiner Tree Algorithm

5.1. Algorithm Overview

We present a high-level overview of the proposed algorithm to produce approximately optimal Steiner trees, divided into four steps.

Step 1 (APSP and routing tables): Solve the APSP problem, as in [7], and add an efficient routing table scheme via triangle finding in $\tilde{O} (n^{1 / 4})$ rounds, with success probability $(1 - 1 / p o l y (n))$ (this step determines the algorithm’s overall success probability).
Step 2 (shortest-path forest): Construct a shortest-path forest (SPF), where each tree consists of exactly one source terminal and the shortest paths to the vertices whose closest terminal is that source terminal. This step can be completed in one round and n messages, per ([2] Section 3.1). The messages can be in classical bits.
Step 3 (weight modifications): Modify the edge weights depending on whether they belong to a tree (set to 0), connect nodes in the same tree (set to ∞), or connect nodes from different trees (set to the shortest-path distance between root terminals of the trees that use the edge). This uses one round and n messages.
Step 4 (minimum spanning tree): Construct a minimum spanning tree (MST) on the modified graph in $O (1)$ rounds as in [6], and prune leaves of the MST that do not connect terminal nodes since these are not needed for the Steiner tree.

The correctness of the algorithm follows from the correctness of each step together with the analysis of the classical results of [13], which uses the same algorithmic steps of constructing a shortest-path forest and building it into an approximately optimal Steiner tree.

5.2. Shortest-Path Forest

After the APSP distances and routing tables are found, we construct a shortest-path forest (SPF) based on the terminals of the Steiner tree.

Definition 9.

(Shortest-path forest): For a weighted, undirected graph

G = (V, E, W)

together with a given set of terminal nodes

Z = {z_{1}, \dots, z_{k}}

, a subgraph

F = (V, E_{F}, W)

of G is called a shortest-path forest if it consists of

| Z |

disjoint trees

T_{z} = (V_{z}, E_{z}, W)

satisfying

(i): $z_{i} \in T_{z_{j}}$ if and only if $i = j$ , for $i, j \in [k]$ .
(ii): For each $v \in V_{z_{i}}, d_{G} (v, z_{i}) = {min}_{z \in Z} d_{G} (v, z)$ , and a shortest path connecting v to $z_{i}$ in G is contained in $T_{z_{i}}$ .
(iii): The $V_{z_{i}}$ form a partition of V, and $E_{z_{1}} \cup E_{z_{2}} \dots \cup E_{z_{k}} = E_{F} \subset E$ .

In other words, an SPF is a forest obtained by gathering, for each node, a shortest path in G, connecting it to the closest Steiner terminal node.

For a node v in a tree, we will let

p a r (v)

denote the parent node of v in that tree,

s (v)

the Steiner terminal in the tree that v will be in, and

I D (v) \in [n]

the ID of node

v \in V

. Let

Q (v) : = {z : d_{G} (v, z) = {min}_{z \in Z} d_{G} (v, z)}

be the set of Steiner terminals closest to node v. We make use of the following procedure for the SPF:

DistributedSPF

Input: For each node

v \in G

, APSP distances and the corresponding routing table

R_{v}

.

Output: An SPF distributed among the nodes.

1:: Each node v sets $s (v) : = {argmin}_{z \in Q (v)} I D (z)$ using the APSP information.
2:: Each node v sets $p a r (v) : = R_{v} (s (v))$ , $R_{v}$ being the routing table of v, and sends a message to $p a r (v)$ to indicate this choice. If v receives such a message from another node u, it registers u as its child in the SPF.

Step 1 in DistributedSPF requires no communication since each node already knows the shortest-path distances to all other nodes, including the Steiner terminals, meaning it can be executed locally. Each node v choosing

p a r (v)

in step 2 can also be conducted locally using routing table information and, thus, step 2 requires 1 round of communication in

n - | Z |

classical messages, since all non-Steiner nodes send one message.
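The two steps of DistributedSPF can be simulated centrally as follows. This is a sketch under the assumptions that node IDs are 0, …, n − 1, that `dist` holds the APSP distances, and that `next_hop[v][z]` plays the role of the routing table entry R_v(z); all names are illustrative.

```python
def distributed_spf(dist, next_hop, terminals):
    """Return (s, par, children) describing the shortest-path forest.

    dist[v][z]     : shortest-path distance from v to z (APSP output)
    next_hop[v][z] : first node after v on a shortest path to z (routing table R_v)
    terminals      : list of Steiner terminal IDs
    """
    n = len(dist)
    s, par = {}, {}
    children = {v: [] for v in range(n)}
    for v in range(n):                        # step 1: purely local
        d_min = min(dist[v][z] for z in terminals)
        s[v] = min(z for z in terminals if dist[v][z] == d_min)  # ties: smallest ID
        par[v] = None if s[v] == v else next_hop[v][s[v]]
    for v, p in par.items():                  # step 2: one message v -> par(v)
        if p is not None:
            children[p].append(v)
    return s, par, children


# Path 0 - 1 - 2 - 3 - 4 with unit weights and terminals 0 and 4.
n = 5
dist = [[abs(u - v) for v in range(n)] for u in range(n)]
next_hop = [[u + (v > u) - (v < u) for v in range(n)] for u in range(n)]
s, par, children = distributed_spf(dist, next_hop, [0, 4])
```

Node 2 is equidistant from both terminals, and the tie is broken toward the terminal with the smaller ID, as in step 1.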

Claim 2.

After executing the DistributedSPF procedure, the trees

T_{z_{k}} = (V_{z_{k}}, E_{z_{k}}, W)

with

V_{z_{k}} : = {v \in V : s (v) = z_{k}}

and

E_{z_{k}} : = {{v, p a r (v)} : v \in V_{z_{k}}}

form an SPF.

Proof.

(i) holds since each Steiner terminal is closest to itself; (iii) is immediate. To see that (ii) holds, note that for

v \in V_{z_{k}}

,

p a r (v) \in V_{z_{k}}

and

{v, p a r (v)} \in E_{z_{k}}

as well. Then,

p a r (p a r (\dots p a r (v) \dots)) = z_{k}

and the entire path to

z_{k}

lies in

T_{z_{k}}

. □

Hence, after this procedure, we have a distributed SPF across our graph, where each node knows its label, parent, and children of the tree it is in.

5.3. Weight-Modified MST and Pruning

Finally, we introduce a modification of the edge weights before constructing an MST on that new graph that will be pruned into an approximate Steiner tree. These remaining steps stem from a centralized algorithm first proposed by [13], whose steps can be implemented efficiently in the distributed setting, as in [2]. We first modify the edge weights as follows:

Partition the edges E into three sets: tree edges

E_{F}

, as in Definition 9, which are part of the edge set of the SPF; intra-tree edges

E_{I T}

, which are incident on two nodes in the same tree

T_{i}

of the SPF; and inter-tree edges

E_{X T}

, which are incident on two nodes in different trees of the SPF. Ensuring that each node knows which of these edges it belongs to can be accomplished in one round by having each node send its neighbors the ID of the terminal it chose as the root of the tree in the SPF it is part of. The edge weights are then modified as follows, denoting the modified weights as

W^{'}

:

(i):: For $e = (u, v) \in E_{F}, W^{'} (u, v) : = 0$
(ii):: For $e = (u, v) \in E_{I T}, W^{'} (u, v) : = \infty$
(iii):: For $e = (u, v) \in E_{X T}, W^{'} (u, v) : = d_{G} (u, s (u)) + W (u, v) + d_{G} (v, s (v))$ ,

noting that

d_{G} (u, s (u))

is the shortest-path distance in G from u to its closest Steiner terminal.
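The three cases can be written out as a short function. The sketch assumes the SPF is given by maps `s` (chosen terminal) and `par` (parent in the forest), as produced by DistributedSPF, and that edges are stored as a dictionary keyed by node pairs; the names are illustrative.

```python
import math


def modified_weights(edges, s, par, dist):
    """edges: dict {(u, v): W(u, v)}; returns dict {(u, v): W'(u, v)}."""
    new_w = {}
    for (u, v), w in edges.items():
        if par.get(u) == v or par.get(v) == u:   # tree edge of the SPF
            new_w[(u, v)] = 0
        elif s[u] == s[v]:                       # intra-tree edge
            new_w[(u, v)] = math.inf
        else:                                    # inter-tree edge
            new_w[(u, v)] = dist[u][s[u]] + w + dist[v][s[v]]
    return new_w


# Path 0 - 1 - 2 - 3 - 4 with unit weights, plus a chord (0, 2) of weight 5.
edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1, (0, 2): 5}
dist = [[abs(u - v) for v in range(5)] for u in range(5)]
s = {0: 0, 1: 0, 2: 0, 3: 4, 4: 4}
par = {0: None, 1: 0, 2: 1, 3: 4, 4: None}
new_w = modified_weights(edges, s, par, dist)
```

Here the only inter-tree edge is (2, 3), and its new weight equals the length of the terminal-to-terminal path through it.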

Next, we find a minimum spanning tree on the graph

G^{'} = (V, E, W^{'})

, for which we may implement the classical

O (1)

round algorithm proposed by [6]. On a high level, this constant-round complexity is achieved by sparsification techniques, reducing MST instances to sparse ones, and then solving those efficiently. We skip the details here and refer the interested reader to [6]. After this step, each node knows which of its edges are part of this weight-modified MST, as well as the parent–child relationships in the tree for those edges.

Finally, we prune this MST by removing non-terminal leaf nodes and the corresponding edges. This is conducted by each node v sending the ID of its parent in the MST to every other node in the graph. As a result, each node can locally compute the entire MST and then decide whether or not it connects two Steiner terminals. If it does, it decides it is part of the Steiner tree; otherwise, it broadcasts that it is to be pruned. Each node that has not been pruned then registers the edges connecting it to non-pruned neighbors as part of the Steiner tree. This pruning step takes 2 rounds and up to

n^{2} + n

classical messages.
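Once a node knows the whole MST, the pruning is a local computation. The sketch below assumes that pruning is repeated until no non-terminal leaf remains (an assumption about how the pruning is iterated); `prune_steiner` is a hypothetical name.

```python
def prune_steiner(tree_edges, terminals):
    """Repeatedly delete non-terminal leaves from a tree given as a list of edges."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    terminals = set(terminals)
    leaves = [v for v in adj if len(adj[v]) == 1 and v not in terminals]
    while leaves:
        v = leaves.pop()
        if len(adj[v]) != 1:           # lost its last neighbor in the meantime
            continue
        (u,) = adj.pop(v)              # unique neighbor of the leaf
        adj[u].discard(v)
        if len(adj[u]) == 1 and u not in terminals:
            leaves.append(u)
    return {(u, v) for u in adj for v in adj[u] if u < v}


mst = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]
steiner = prune_steiner(mst, terminals=[0, 3])
```

In this example the branch 1 - 4 - 5 contains no terminal and is removed leaf by leaf.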

5.4. Overall Complexity and Correctness

In the algorithm of Section 5.1, after step 1, steps 2 and 3 can each be conducted within 2 rounds. Walking through [6] reveals that the MST for step 4 can be found in 54 rounds, with an additional 2 rounds sufficing for the pruning. The overall complexity therefore remains dominated by Equation (4), and the round complexity is

\tilde{O} (n^{1 / 4})

, which is faster than any known classical CONGEST-CLIQUE algorithm capable of producing an approximate Steiner tree of the same approximation ratio. However, as a consequence of the full complexity obtained in Section 4.7, the regime of n in which this algorithm beats the trivial strategy of sending all information to a single node is also

n > 10^{18}

. For the same reason, the classical algorithm provided in [2], making use of the APSP subroutine from [12], as discussed in Section 4.7.2, has its complexity mostly characterized by Equation (5), so that the regime in which it provides an advantage over the trivial strategy lies in

n > 10^{11}

. Our algorithm’s correctness follows from the correctness of each step together with the correctness of the algorithm by [13], which implements these steps in a classical, centralized manner.

6. Directed Minimum Spanning Tree Algorithm

This section focuses on establishing Theorem 2 for the directed minimum spanning tree (DMST) problem, in Definition 4. Similar to [3], we follow the algorithmic ideas first proposed by [14], implementing them in the quantum CONGEST-CLIQUE. Specifically, we will use

log n

calls to the APSP and routing tables scheme described in Section 4, so that in our case, we retrieve complexity

\tilde{O} (n^{1 / 4})

and success probability

{(1 - \frac{1}{p o l y (n)})}^{log n} = 1 - \frac{1}{p o l y (n)}

.
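One way to see the last equality is Bernoulli's inequality. As a sketch, assume each APSP call fails with probability at most n^{-c} for some constant c > 1 (the constant c is an illustrative assumption) and that the calls number ⌈log n⌉ ≤ n:

```latex
\left(1 - n^{-c}\right)^{\lceil \log n \rceil}
  \;\geq\; 1 - \lceil \log n \rceil \, n^{-c}
  \;\geq\; 1 - n^{-(c-1)} .
```

The success probability therefore remains of the form 1 − 1/poly(n), with the exponent reduced by at most one.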

Before describing the algorithm, we need to establish some preliminaries and terminology for the procedures executed during the algorithm, especially the concept of shrinking vertices into super vertices and tracking a set H of specific edges, as first described in [19]. We use the following language to discuss super vertices and related objects.

Definition 10.

A super vertex set

V^{*} : = {V_{1}^{*}, \dots, V_{t}^{*}}

for a graph

G = (V, E, W)

is a partition of V, and each

V_{i}^{*}

is called a super vertex. We will call a super vertex simple if

V_{i}^{*}

is a singleton. The corresponding minor

G^{*} : = (V^{*}, E^{*}, W^{*})

is the graph obtained by creating edges

(V_{i}^{*}, V_{j}^{*})

with weight

W^{*} (V_{i}^{*}, V_{j}^{*}) : = min {W (v_{i}, v_{j}) : v_{i} \in V_{i}^{*}, v_{j} \in V_{j}^{*}}

.

Notably, we continue to follow the convention of an edge of weight ∞ being equivalent to not having an edge. We will refer to creating a super vertex

V^{*}

as contracting the vertices in

V^{*}

into a super vertex.
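The minor of Definition 10 can be computed directly from the partition. The sketch below stores directed edges in a dictionary and omits pairs of super vertices whose minimum weight would be ∞; the names are illustrative.

```python
import math


def minor_weights(W, super_vertices):
    """W: dict {(u, v): weight} of directed edges; super_vertices: list of disjoint sets.

    Returns {(a, b): W*(V_a, V_b)} over ordered pairs of distinct super vertices.
    """
    index = {v: a for a, blk in enumerate(super_vertices) for v in blk}
    W_star = {}
    for (u, v), w in W.items():
        a, b = index[u], index[v]
        # Keep the minimum weight among all edges running from V_a to V_b.
        if a != b and w < W_star.get((a, b), math.inf):
            W_star[(a, b)] = w
    return W_star


W = {(0, 1): 3, (1, 2): 4, (0, 2): 1, (2, 0): 7}
W_star = minor_weights(W, [{0, 1}, {2}])
```

Edges inside a super vertex, such as (0, 1) here, disappear from the minor.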

6.1. Edmonds’ Centralized DMST Algorithm

We provide a brief overview of the algorithm proposed in [19], which presents the core ideas of the super vertex-based approach. The following algorithm produces a DMST for G:

Edmonds’ DMST Algorithm

Input: An integer-weighted digraph and a root node r.

Output: A DMST for G rooted at r.

1:: Initialize a subgraph H with the same vertex set as G by subtracting for each node the minimum incoming edge weight from all its incoming edges, and selecting exactly one incoming zero-weight edge for each non-root node of G. Set $G_{0} = G, H_{0} = H, t = 0$ .
2:: WHILE $H_{t}$ is not a tree:
(a) For each cycle of $H_{t}$ , contract the nodes on that cycle into a super vertex. Consider all non-contracted nodes as simple super vertices, and obtain a new graph $G_{t + 1}$ as the resulting minor.
(b) If there is a non-root node of $G_{t + 1}$ with no incoming edges, report a failure. Otherwise, obtain a subgraph $H_{t + 1}$ by, for each non-root node of $G_{t + 1}$ , subtracting the minimum incoming edge weight from all its incoming edges, and selecting exactly one incoming zero-weight edge for each non-root node, updating $t \leftarrow t + 1$ .
3:: Let $B_{t} = H_{t}$ . FOR $k \in (t, t - 1, \dots, 1)$ :
(a) Obtain $B_{k - 1}$ by expanding the non-simple super vertices of $B_{k}$ and, for each of the previously contracted cycles of $H_{k}$ , selecting all but one of its edges to add to $B_{k - 1}$ .
4:: Return $B_{0}$ .

Note that the edge weight modifications modify the weight of all directed spanning trees equally, so optimality is unaffected. In step 2, if

H_{t}

is a tree, it is an optimal DMST for the current graph

G_{t}

. Otherwise, it contains at least one directed cycle, so step 2 is valid. Hence, at the beginning of step 3,

B_{t}

is a DMST for

G_{t}

. The first iteration then produces

B_{t - 1}

, a DMST for

G_{t - 1}

since only edges of zero weight were added, and

B_{t - 1}

will have no cycles. The same holds for

B_{t - 2}, B_{t - 3}, \dots, B_{0}

, for which

B_{0}

corresponds to the DMST for the original graph G. If the algorithm reports a failure at some point, no spanning tree rooted at r exists for the graph, since a failure is reported only when there is an isolated non-root connected component in

G_{t + 1}

.
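The weight reduction, edge selection, and cycle detection that drive each iteration can be sketched as follows. The code assumes nodes 0, …, n − 1, no self-loops, and a dictionary of directed edges; it covers only step 1 and the cycle search of step 2, not the contraction or expansion phases, and the function names are illustrative.

```python
def reduce_and_select(n, edges, root):
    """edges: dict {(u, v): weight} for directed edges u -> v.

    Returns (reduced, parent): weights after subtracting each node's minimum
    incoming weight, and one incoming zero-weight edge per non-root node.
    """
    min_in = {}
    for (u, v), w in edges.items():
        if v != root and w < min_in.get(v, float("inf")):
            min_in[v] = w
    if len(min_in) < n - 1:
        raise ValueError("some non-root node has no incoming edge")
    reduced = {(u, v): (w - min_in[v] if v != root else w)
               for (u, v), w in edges.items()}
    parent = {}
    for (u, v), w in reduced.items():
        if v != root and w == 0 and v not in parent:
            parent[v] = u
    return reduced, parent


def find_cycles(parent):
    """Cycles of H, where each non-root node v has the single in-edge parent[v] -> v."""
    state, cycles = {}, []             # state: 1 = on current walk, 2 = finished
    for start in parent:
        path, v = [], start
        while v in parent and v not in state:
            state[v] = 1
            path.append(v)
            v = parent[v]
        if state.get(v) == 1:          # walked back into the current path
            cycles.append(path[path.index(v):])
        for x in path:
            state[x] = 2
    return cycles


edges = {(0, 1): 5, (2, 1): 1, (1, 2): 1, (0, 2): 4, (2, 3): 2}
reduced, parent = reduce_and_select(4, edges, root=0)
cycles = find_cycles(parent)
```

In this example the selected zero-weight edges contain the cycle on nodes 1 and 2, so H is not yet a tree and a contraction step would follow.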

Note that in iteration t of step 2, H has one cycle for each of its connected components that does not contain the root node. Hence, the drawback of this algorithm is that we may apply up to

O (n)

steps of shrinking cycles. This shortcoming is remedied by a more efficient method of selecting how to shrink nodes into super vertices in [14], such that only

log n

cycle-shrinking steps take place.

6.2. Lovasz’s Shrinking Iterations

We devote this subsection to discussing the shrinking step of [14] that will be repeated

log n

times in place of step 2 of Edmonds’ algorithm to obtain Lovasz’s DMST algorithm.

Lovasz’s Shrinking Iteration LSI

Input: A directed, weighted graph

G = (V, E, W)

and a root node

r \in V

.

Output: Either a new graph

G^{*}

, or a success flag and a DMST H of G.

1:: If there is a non-root node of G with no incoming edges, report a failure. Otherwise, for each non-root node of G, subtract the minimum incoming edge weight from all its incoming edges. Select exactly one incoming zero-weight edge for each non-root node to create a subgraph H of G with those edges.
2:: Find all cycles of H, and denote them $H_{1}, \dots, H_{C} .$ If H has no cycles, abort the iteration and return (SUCCESS, H). For $j = 1, \dots, C$ , find the set $V_{j}$ of nodes reachable by dipaths in H from $H_{j}$ .
3:: Compute the all-pairs shortest path distances in G.
4:: For each node $v \in V$ , denote $d_{j} (v) : = \min {d (v, u) : u \in H_{j}}$ . For each $j = 1, \dots, C$ , set $β_{j} : = \min {d_{j} (v) : v \in V (G) ∖ V_{j}}$ and $U_{j} : = {u \in V_{j} : d_{j} (u) \leq β_{j}}$ .
5:: Create a minor $G^{*}$ by contracting each $U_{j}$ into a super vertex $U_{j}^{*}$ , considering all other vertices of G as simple super vertices $V_{1}^{*}, \dots, V_{k}^{*}$ . For each vertex $N^{*}$ of $G^{*}$ , let the edge weights in $G^{*}$ be:

$\begin{matrix} W_{N^{*} U_{j}^{*}}^{*} & = \min {W_{v u} : v \in N^{*}, u \in U_{j}^{*}} - β_{j} + \min {d_{j} (u) : u \in U_{j}^{*}} \\ \text{for all } j = 1, \dots, C, \text{ and} \\ W_{N^{*} V^{*}}^{*} & = \min {W_{v V^{*}} : v \in N^{*}} \\ \text{for all the simple super vertices } V^{*} \text{ of } G^{*} . \end{matrix}$
6:: Return $G^{*}$ .

To summarize these iterations: The minimum-weight incoming edge of each node is selected. That weight is subtracted from the weights of every incoming edge to that node, and one of those edges with new weight 0 is selected for each node to create a subgraph H. If H is a tree, we are done. Otherwise, we find all cycles of the resulting directed subgraph, then compute APSP, and determine the

V_{j}, U_{j},

and

β_{j}

, which we use to define a new graph with some nodes of the original G contracted into super vertices.
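Step 4 of LSI, which determines the thresholds and the sets to contract, can be sketched as a local computation once the APSP distances are known. The sketch assumes `dist[v][u]` is the shortest-path distance from v to u, and that every set V_j excludes at least one node (the root is never reachable from a cycle of H); the function name is illustrative.

```python
def shrink_sets(dist, cycles, reach):
    """dist[v][u]: APSP distance from v to u; cycles[j]: nodes of H_j; reach[j]: V_j.

    Returns a list of (beta_j, U_j) following step 4 of LSI.
    """
    nodes = range(len(dist))
    out = []
    for H_j, V_j in zip(cycles, reach):
        d_j = {v: min(dist[v][u] for u in H_j) for v in nodes}
        beta_j = min(d_j[v] for v in nodes if v not in V_j)
        U_j = {u for u in V_j if d_j[u] <= beta_j}
        out.append((beta_j, U_j))
    return out


inf = float("inf")
# Four nodes with root 0; the cycle is on nodes 1 and 2, and node 3 hangs off node 2.
dist = [[0, 3, 3, 3],
        [inf, 0, 0, 0],
        [inf, 0, 0, 0],
        [inf, inf, inf, 0]]
result = shrink_sets(dist, cycles=[[1, 2]], reach=[{1, 2, 3}])
```

Node 3 cannot reach the cycle at all, so it stays outside the contracted set even though it belongs to V_1.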

The main result for the DMST problem in [14] states that replacing (a) and (b) of step 2 in Edmonds’ DMST Algorithm, taking the new H obtained at each iteration as

H_{t + 1}

and

G^{*}

as

G_{t + 1}

, leads to no more than

⌈ log n ⌉

shrinking iterations needed before success is reported.

6.2.1. Quantum Distributed Implementation

Our goal is to implement the Lovasz iterations in the quantum distributed setting in

\tilde{O} (n^{1 / 4})

rounds by making use of quantum APSP from Section 4. In the distributed setting, processor nodes cannot directly be shrunk into super vertices. As in [3], we reconcile this issue by representing the super vertex contractions within the nodes through soft contractions.

First, note that a convenient way to track what nodes we wish to merge into a super vertex involves maintaining a mapping

s I D : V \to S

, where S is a set of super vertex IDs, which we can just take to be the IDs of the original nodes. We will refer to a pair of

(G, s I D)

as an annotated graph. An annotated graph naturally corresponds to some minor of G, namely, the minor obtained by contracting all vertices sharing a super vertex ID into a super vertex.

Definition 11.

(Soft Contractions) For an annotated graph

(G, s I D)

, a set of active edges H, and active component

H_{i}

with corresponding weight modifiers

β_{i}

, and a subset

A \subset S

of super vertices, the soft contraction of

H_{i}

in G is the annotated graph

(G^{H_{i}}, s I D^{'})

obtained by taking

G^{H_{i}} = (V, E, W^{'})

with

$W_{u v}^{'} = 0$ if $s I D (u) = s I D (v)$
$W_{u v}^{'} = W_{u v} + d i s t_{G (A)} (v, C (H_{i})) - β_{i}$ if $u \in V ∖ A$ and $v \in A$
$W_{u v}^{'} = W_{u v}$ otherwise

and updating the mapping

s I D

to

s I D^{'}

defined by

s I D^{'} (v) = s I D (v), \forall v \notin A

,

s I D^{'} (v) = min {s I D (u) : u \in A}, \forall v \in A

.

6.2.2. Quantum Distributed Implementation of Lovasz’s Iteration

We provide here a quantum distributed implementation of Lovasz’s iteration, which will form the core of our DMST algorithm.

Quantum distributed implementation of Lovasz’s iteration QDLSI

Input: A directed, weighted, graph

G = (V, E, W)

with annotations

s I D

and a subgraph H.

Output: A new graph

G^{*}

with annotations

s I D^{'}

, or a success flag and a DMST H of G.

1:

Have all nodes learn all edges of H, as well as the current super vertices.

2:

For each connected component

H_{i} \subset H

, denote by

C (H_{i})

the cycle of

H_{i}

. Let

c (H_{i})

be the node with maximal ID in

C (H_{i})

, which each node can locally compute.

3:

Run the quantum algorithm for APSP and routing tables described in Section 4 on this graph, or report a failure if it fails.

4:

For each i, determine an edge

v_{i} u_{i}

,

v_{i} \notin H_{i}, u_{i} \in H_{i}

minimizing

β_{i} : = W_{v_{i} u_{i}} + d_{G} (u_{i}, c (H_{i}))

, and broadcast both to all nodes in

H_{i}

.

5:

Each node

v_{i}

in each

H_{i}

applies the following updates

locally

:

- Soft-contract $H_{i}$ at level $β_{i}$ , merging all super vertices with distance $β_{i}$ to $C (H_{i})$ into one super vertex, with each contracted node updating its super vertex ID to $c (H_{i})$ .
- Add edge $v_{i} u_{i}$ to H, effectively merging $H_{i}$ with another active component of H.
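Step 4 of QDLSI is a plain minimization once the APSP distances of step 3 are available. The sketch below is one way to write it; `min_entering_edge`, `dist`, and `anchor` are hypothetical names, `dist[u][anchor]` stands for $d_{G} (u, c (H_{i}))$, and ties are broken lexicographically, which the text does not specify.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[int, int]


def min_entering_edge(
    weights: Dict[Edge, float],
    component: Set[int],
    dist: Dict[int, Dict[int, float]],
    anchor: int,
) -> Optional[Tuple[float, int, int]]:
    """Return (beta, v, u) for the edge (v, u) entering `component` that
    minimises W[v, u] + dist[u][anchor], or None if no edge enters it."""
    best: Optional[Tuple[float, int, int]] = None
    for (v, u), w in weights.items():
        # Only edges from outside the component into the component qualify.
        if v in component or u not in component:
            continue
        candidate = (w + dist[u][anchor], v, u)
        if best is None or candidate < best:
            best = candidate
    return best
```

In the distributed setting each node would evaluate only its own incident edges and the component would then agree on the minimum; the loop above collapses that into a single centralized pass.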

We can follow the steps of Lovasz’s DMST algorithm, distributedly, by replacing steps 2–5 with LSI with this quantum-distributed version. The following ensues:

Lemma 3.

If none of the APSP and routing table subroutines fail, within

⌈ log n ⌉

iterations of QDLSI, H is a single connected component.

Lemma 4.

With probability

{(1 - \frac{1}{p o l y (n)})}^{log n}

, all the APSP and routing table subroutines in step 3 succeed.
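The probability in Lemma 4 stays close to 1 because only logarithmically many subroutine calls are made. As a short worked bound, writing $p (n)$ for the polynomial hidden in $p o l y (n)$ (a symbol introduced here only for illustration), a union bound over the at most $⌈ log n ⌉$ calls gives:

```latex
\Pr[\text{some call fails}]
  \;\leq\; \sum_{t = 1}^{\lceil \log n \rceil} \frac{1}{p(n)}
  \;=\; \frac{\lceil \log n \rceil}{p(n)},
\qquad\text{so}\qquad
\Pr[\text{all calls succeed}]
  \;\geq\; 1 - \frac{\lceil \log n \rceil}{p(n)} .
```

The union bound does not require the calls to be independent, and the extra logarithmic factor can be absorbed by lowering the degree of $p (n)$ by any positive constant.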

Lemmas 3 and 4 then together imply Theorem 2. Within

⌈ log n ⌉

iterations, only one active component remains, i.e., the root component. This active component can then be expanded to a full DMST on G within

O (log n)

rounds, as detailed in ([3] Section 7) or the Unpacking procedure in Appendix A.3. All messages in the algorithm, other than those for computing the APSP in QDLSI, may be classical. We provide here the full algorithm for completeness:

Quantum DMST Algorithm

Input: An integer-weighted digraph G and a root node r.

Output: A DMST for G rooted at r.

1: Initialize a subgraph H with the same vertex set as G by subtracting, for each node, the minimum incoming edge weight from all of its incoming edges, and then selecting exactly one incoming zero-weight edge for each non-root node of G. Set $t = 0, H_{0} = H$ , and $G_{0} = G$ with annotations $s I D_{0}$ set to the identity mapping.
2: WHILE $H_{t}$ is not a single component:
(a) Run QDLSI with inputs $H_{t}$ , $(G_{t}, s I D_{t})$ to obtain $H_{t + 1}$ , $(G_{t + 1}, s I D_{t + 1})$ as outputs. Increment $t \leftarrow t + 1$ .
3: Let $T_{t} : = H_{t}$ . For $k = t, \dots, 1$ : for each super vertex of the $k^{t h}$ iteration of QDLSI applied, simultaneously run the Unpacking procedure with input tree $T_{k}$ to obtain $T_{k - 1}$ .
4: Return $T_{0}$ as the directed minimum spanning tree.
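The initialization step of the algorithm above can be illustrated centrally as follows. This is a minimal sketch, not the distributed implementation: `initialize` is a hypothetical name, the graph is a dictionary of edge weights, and ties between equally cheap incoming edges are broken by the smaller source ID, which is an assumption.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[int, int]


def initialize(
    weights: Dict[Edge, int], nodes: Iterable[int], root: int
) -> Tuple[Dict[Edge, int], Dict[int, Edge]]:
    """Subtract each node's minimum incoming weight from its incoming edges and
    pick one zero-weight incoming edge for every non-root node."""
    reduced = dict(weights)
    chosen: Dict[int, Edge] = {}
    for v in nodes:
        incoming = [(w, u) for (u, x), w in weights.items() if x == v]
        if not incoming:
            continue
        min_w, min_u = min(incoming)
        for w, u in incoming:
            reduced[(u, v)] = w - min_w
        if v != root:
            # After the subtraction this edge has weight zero.
            chosen[v] = (min_u, v)
    return reduced, chosen
```

The chosen edges form the initial subgraph H: every non-root node has exactly one incoming edge, so each component of H that does not contain the root contains exactly one cycle.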

6.3. Complexity

In QDLSI, all steps other than the APSP step 3 of the quantum Lovasz iteration can be implemented within 2 rounds. In particular, to have all nodes know some tree on G for which each node knows its parent, every node can simply broadcast its parent edge and weight. Since this iteration is used up to

⌈ log (n) ⌉

times and expanding the DMST at the end of the algorithm also takes logarithmically many rounds, we obtain a complexity dominated by the APSP computation of

\tilde{O} (n^{1 / 4})

, a better asymptotic rate than any known classical CONGEST-CLIQUE algorithm. However, beyond the

\tilde{O}

, the complexity is largely characterized by

log (n) \cdot f (n)

, with

f (n)

as in Equation (4). In order to have

log (n) f (n) < n

to improve upon the trivial strategy of having a single node solve the problem, we then need

n > 10^{21}

. Using the classical APSP from [12] in place of the quantum APSP of Section 4, as conducted in [3] to attain the

\tilde{O} (n^{1 / 3})

complexity in the cCCM, one would need

log (n) \cdot g (n) < n

to beat the trivial strategy, with g as in Equation (5), which requires

n > 10^{14}

.

7. Discussion and Future Work

We provided algorithms in the quantum CONGEST-CLIQUE model for computing approximately optimal Steiner trees and exact directed minimum spanning trees that use asymptotically fewer rounds than their known classical counterparts. As Steiner trees and minimum spanning trees cannot benefit from quantum communication in the CONGEST (non-clique) model, the algorithms reveal how quantum communication can be exploited thanks to the CONGEST-CLIQUE setting.

A few open questions remain. In particular, there exist many generalizations of the Steiner tree problem, which may be a natural starting point for attempts to generalize the results; a helpful overview of Steiner-type problems can be found in [20]. Regarding the DMST, it may be difficult to generalize a similar approach to closely related problems. Since the standard MST can be solved in a (relatively small) constant number of rounds in the classical CONGEST-CLIQUE, no significant quantum speedup is possible. Other interesting MST-type problems are the bounded-degree and minimum-degree spanning tree problems. However, even the bounded-degree decision problem on an unweighted graph, “does G have a spanning tree of degree at most k?”, is NP-complete, unlike the DMST, so we suspect that other techniques would need to be employed. Reference [21] provides a classical distributed approximation algorithm for the problem.

Additionally, we traced many constants and log factors throughout our description of the above algorithms, which, as shown, would need to be significantly improved for these and related algorithms to be practical. Hence, a natural avenue for future work is to pursue such practical improvements. Beyond the scope of the particular algorithms involved, we hope to help the community recognize how severely the practicality of algorithms is affected by logarithmic factors that may be obscured by the

\tilde{O}

notation and, thus, encourage fellow researchers to present the full complexity of their algorithms beyond asymptotics. Particularly in a model such as CONGEST-CLIQUE, where problems can always be solved trivially in n rounds, these logarithmic factors should clearly not be taken lightly. Further, a question of potential practical interest is the following: which algorithms solving the discussed problems are the most efficient, with respect to the number of rounds needed in the CONGEST-CLIQUE, in the regimes of n in which the discussed algorithms are impractical?

Author Contributions

Conceptualization, E.G.R.; Methodology, P.K. and D.E.B.N.; Software, P.K.; Formal analysis, P.K., D.E.B.N. and Z.G.I.; Investigation, P.K., D.E.B.N., Z.G.I. and E.G.R.; Writing – original draft, P.K.; Writing—review & editing, D.E.B.N., Z.G.I. and E.G.R.; Supervision, D.E.B.N. and E.G.R.; Funding acquisition, E.G.R. All authors have read and agreed to the published version of the manuscript.

Funding

We are grateful for support from the NASA Ames Research Center, from the NASA SCaN program, and from DARPA under IAA 8839, Annex 130. PK and DB acknowledge support from the NASA Academic Mission Services (contract NNA16BD14C).

Acknowledgments

The authors thank Ojas Parekh for helpful input and discussions regarding the arborescence problem, Shon Grabbe for ongoing useful discussions, and Filip Maciejewski for helpful feedback on the work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Claim 1

For an

n \times n

integer matrix W, obtain matrices

W^{'}

and

W^{''}

by taking

W_{i j}^{'} = n W_{i j} + j - 1

and

W_{j i}^{''} = n W_{j i}

. Set

D = W^{'} ★ W^{''}

. We aim to show that

⌊ \frac{D}{n} ⌋ = W^{2, ★}

and

(D mod n) + 1

is a witness matrix for

W^{2, ★} .

Proof.

(i): We have

$\begin{matrix} ⌊ \frac{D_{i j}}{n} ⌋ & = ⌊ min_{k \in [n]} \{n W_{i k} + k - 1 + n W_{k j}\} / n ⌋ = ⌊ min_{k \in [n]} \{W_{i k} + W_{k j} + \frac{k - 1}{n}\} ⌋ \\ & = W_{i j}^{2} + ⌊ min_{k \in [n]} \{\frac{k - 1}{n} : W_{i k} + W_{k j} = W_{i j}^{2}\} ⌋ = W_{i j}^{2} . \end{matrix}$
(ii): Next,

$\begin{matrix} D_{i j} = n W_{i j}^{2} + ⌊ min_{k \in [n]} \{k - 1 : W_{i k} + W_{k j} = W_{i j}^{2}\} ⌋ \end{matrix}$

gives us

$(D_{i j} mod n) + 1 = ⌊ min_{k \in [n]} \{k - 1 : W_{i k} + W_{k j} = W_{i j}^{2}\} ⌋ + 1 = min_{k \in [n]} \{k : W_{i k} + W_{k j} = W_{i j}^{2}\},$

which proves the claim.

□
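The encoding used in this proof is easy to check numerically. The sketch below assumes finite integer entries and uses 0-based indices, so the stored offset is k rather than k - 1 and the witness is `D % n` without the final + 1; the function names are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def min_plus(a: Matrix, b: Matrix) -> Matrix:
    """Distance (min-plus) product of two square integer matrices."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def square_with_witnesses(w: Matrix) -> Tuple[Matrix, Matrix]:
    """Return (W squared under min-plus, witness matrix) from a single product."""
    n = len(w)
    # W'[i][k] = n * W[i][k] + k: the low-order part records the middle index.
    w1 = [[n * w[i][k] + k for k in range(n)] for i in range(n)]
    # W''[k][j] = n * W[k][j].
    w2 = [[n * w[k][j] for j in range(n)] for k in range(n)]
    d = min_plus(w1, w2)
    # Floor division recovers the distance, the remainder recovers the witness.
    square = [[d[i][j] // n for j in range(n)] for i in range(n)]
    witness = [[d[i][j] % n for j in range(n)] for i in range(n)]
    return square, witness
```

Python's `//` and `%` round toward negative infinity, so the decoding also works for negative weights; in a language with truncating division the remainder would need to be adjusted.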

Appendix A.2. The α > 0 Case

The strategy will be to assign each

v_{(i, j, k)} \in V

into classes in accordance with approximately how many negative triangles are in

U_{i} \times U_{j} \times U_{k}^{'}

before starting the search.

To assign each node to a class, we use the routine IdentifyClass of [7], also described in the main text.

The main body of this paper discusses the special case, assuming

α = 0

. Hence, we now consider the

α > 0

case.

For each

α \in N

, let us denote

c_{i, j, k}

as the smallest nonnegative integer c satisfying

d_{i, j, k} < 10 \cdot 2^{c} log n

, and

\begin{matrix} V_{α} & : = \{v_{(i, j, k)} : c_{i, j, k} = α\} \end{matrix}

(A1)

\begin{matrix} V_{α} [i, j] & : = \{U_{k}^{'} \in U^{'} : v_{(i, j, k)} \in V_{α}\} \end{matrix}

(A2)

for any

i, j \in [n^{1 / 4}]

. Notably,

P (i, j)

contains, at most,

\sqrt{n}

edges, so that

d_{i, j, k} \leq \sqrt{n}

as well. Hence,

c = \frac{1}{2} log n

provides an upper bound for the minimum in step 2. The important immediate consequence is that we only need to consider

V_{α}

up to, at most,

α = \frac{1}{2} log n

.
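As a small illustration of how the class index is determined, the following sketch computes the smallest c with $d < 10 \cdot 2^{c} log n$ . It assumes base-2 logarithms, which the text does not state explicitly, and `class_index` is a hypothetical name.

```python
import math


def class_index(d: float, n: int) -> int:
    """Smallest nonnegative integer c with d < 10 * 2**c * log2(n)."""
    c = 0
    while d >= 10 * (2 ** c) * math.log2(n):
        c += 1
    return c
```

For example, with n = 65536 the threshold for c = 0 is 160, so any node with d below 160 lands in class 0, and a node with d equal to sqrt(n) = 256 lands in class 1, well below the general cap of (1/2) log n = 8.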

Lemma A1.

The IdentifyClass algorithm and the resulting

V_{α}

satisfy the following statements with a probability of at least

1 - 2 / n

:

(i): The algorithm does not abort.
(ii): $| Δ (i, j, k) | \leq 2 n$ .
(iii): For $α > 0$ , $v_{(i, j, k)} \in V_{α}$ , we have $2^{α - 3} n \leq | Δ (i, j, k) | \leq 2^{α + 1} n$ .
(iv): $| Λ_{x} (i, j) \cap Δ (i, j, k) | \leq 100 \cdot 2^{α} \sqrt{n} log n$ for $i, j \in [n^{1 / 4}]$ and $α \in N$ .

This provides an adapted version of Lemma 2 for the

α > 0

case.

The following lemma provides a tool that will allow for "duplication" of information to avoid message congestion in the network in the EvaluationB procedure.

Lemma A2.

For all

α \geq 0

and

i, j \in [n^{1 / 4}]

,

\begin{matrix} | V_{α} [i, j] | \leq \frac{720 \sqrt{n} log n}{2^{α}} \end{matrix}

(A3)

Proof.

The

α = 0

case is immediate since

| U^{'} | = \sqrt{n}

, so consider

α \geq 1

. The “promise” in the FEWP subroutine we are in guarantees that for all

(u, v) \in S, Γ (u, v) \leq 90 log n

, so that for any

i, j \in [n^{1 / 4}]

, each edge in

P (U_{i}, U_{j}) \cap S

has, at most,

90 log n

other nodes forming a negative triangle with it, leading to the inequality

\sum_{k : v_{(i, j, k)} \in V_{α}} | Δ (i, j, k) | \leq 90 n^{3 / 2} log n .

Using

| Δ (i, j, k) | \geq 2^{α - 3} n

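from part (iii) of Lemma A1 for each of the $| V_{α} [i, j] |$ terms of the sum, we obtain $| V_{α} [i, j] | \cdot 2^{α - 3} n \leq 90 n^{3 / 2} log n$ , and dividing by $2^{α - 3} n$ gives the claimed bound. □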

We now describe the implementation of step 3 of the ComputePairs procedure for the

α > 0

case.

3.1: Each node executes the IdentifyClass procedure.
3.2: For each $α$ :
For every $l \in [m]$ , every node $v_{(i, j, k)}$ executes a quantum search to find whether there is a $U_{k}^{'} \in V_{α} [U_{i}, U_{j}]$ with some $w \in U_{k}^{'}$ forming a negative triangle $(u_{l}^{k}, v_{l}^{k}, w)$ in G, and then reports all the pairs $u_{l}^{k} v_{l}^{k}$ for which such a $U_{k}^{'}$ was found.

The

α = 0

case was described in the main text. We proceed to describe the classical procedure for invoking Theorem 4 to obtain the speedup for the general

α

case, as in ([7] Section 5.3.2). Some technical precautions must be taken to avoid the congestion of messages between nodes. This crucially relies on information duplication to effectively increase the bandwidth between nodes. Lemma A2 provides a strong bound for the size of each

V_{α}

. For this duplication of the information stored by the relevant nodes, a new labeling scheme is convenient. Suppose for simplicity that

C_{α} : = 2^{α} / (720 log n)

is an integer, and assign each node a label

(u, v, w, y) \in V_{α} \times [C_{α}]

, which is possible due to the bound of Lemma A2. The following EvaluationB, which is implementable in

O (log n)

rounds (using a slightly sharper complexity analysis than [7]), can then be used for invoking Theorem 4:

EvaluationB

Input: A list

(w_{1}^{k}, \dots, w_{m}^{k})

of elements of

V_{α} [u, v]

assigned to each node

k = (u, v, x)

.

Promise:

| L_{w}^{k} | \leq 800 \cdot 2^{α} \sqrt{n} log n

for each node k and all

w \in V_{α} [u, v]

.

Output: Every node

k = (u, v, x)

outputs for each

ℓ \in [m]

whether some

w \in w_{ℓ}^{k}

forms a negative triangle

{u_{ℓ}^{k}, v_{ℓ}^{k}, w}

.

0: Every node $(u, v, w) \in V_{α}$ broadcasts the edge information loaded in step 1 of ComputePairs to $(u, v, w, y)$ for each $y \in [C_{α}]$ .
1: Every node $(u, v, x)$ splits each $L_{w}^{k}$ into smaller sublists $L_{w, 1}^{k}, \dots, L_{w, C_{α}}^{k}$ for each $w$ , with each sublist containing up to $⌈ | L_{w}^{k} | / C_{α} ⌉ = ⌈ 800 \cdot 720 \sqrt{n} {log}^{2} n ⌉$ elements, and sends each $L_{w, y}^{k}$ to node $(u, v, w, y)$ along with the relevant edge weights.
2: Every $(u, v, w, y)$ node returns the truth value

$min_{w \in w} {W_{u w} + W_{w v}} \leq W_{v u}$

to node k for each $u v \in L_{w, y}^{k}$ received in step 1.
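The load balancing in step 1 of EvaluationB amounts to cutting one long list into $C_{α}$ pieces of nearly equal size, one per duplicate node. A minimal sketch, with `split_into_sublists` and `copies` as hypothetical names for the operation and for $C_{α}$ :

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_sublists(items: Sequence[T], copies: int) -> List[List[T]]:
    """Split `items` into `copies` consecutive sublists, each holding at most
    ceil(len(items) / copies) elements (trailing sublists may be empty)."""
    size = math.ceil(len(items) / copies)
    return [list(items[y * size:(y + 1) * size]) for y in range(copies)]
```

Sublist y would then be sent to the duplicate labeled $(u, v, w, y)$ , so no single receiver handles more than a $1 / C_{α}$ fraction of the list, rounded up.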

For each value of

α

, we separately solve step 3.2 of the ComputePairs procedure. Since Lemma A2 tells us that there are

C_{α}

times more nodes not in

V_{α}

than there are in

V_{α}

, every node in

V_{α}

can use

C_{α}

of those nodes not in

V_{α}

to relay messages and effectively increase its message bandwidth, which is exactly what EvaluationB takes advantage of. Steps 1 and 2 of the procedure take up to

2 \cdot ⌈ | L_{w}^{k} | ⌉ / n \leq 1600 \cdot log n

rounds, since lists of size

⌈ | L_{w}^{k} | / C_{α} ⌉

are sent to

C_{α}

nodes, and the bound on

α

gives

⌈ | L_{w}^{k} | ⌉ \leq 800 n log (n)

.

Complexity of the Classical Analog

This subsection of the appendix serves to provide some supplemental information to Section 4.7.2, discussing the complexity of an algorithm for APSP with routing tables in the CONGEST-CLIQUE as proposed in [12], which has complexity

\tilde{O} (n^{1 / 3})

. Note that Corollary 6, in [12], is applied to APSP distance computations only, whereas the routing table computations are discussed in [12], Section 3.4. As shown there,

O ({log}^{3} n)

distance products (without witnesses) are needed to compute one distance product with a witness matrix. More precisely:

Obtaining a witness matrix when witnesses are unique requires $log (n)$ distance products.
The procedure for finding witnesses in the general case calls the procedure to find witnesses in the unique witness case $O ({log}^{2} n)$ times, or $2 \cdot {log}^{2} n$ times if $c = 2$ is deemed sufficient for the success probability.
For the APSP algorithm with routing tables, $log n$ distance products with witnesses are needed.

Then

2 {log}^{4} n

distance products are computed in total for APSP with routing tables, obtained by multiplying the three counts above. The distance product via the semi-ring matrix multiplication algorithm of [12], Section 2.1, uses

10 n^{1 / 3}

rounds (

4 n^{1 / 3}

for each of steps 1 and 2, and

2 n^{1 / 3}

for step 3) using Lemma 1; hence, one obtains the full round complexity of

\begin{matrix} 10 n^{1 / 3} \cdot 2 log {(n)}^{4} = g (n) . \end{matrix}

(A4)
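To see where Equation (A4) places the classical algorithm relative to the trivial n-round strategy, one can solve $log (n) \cdot g (n) < n$ numerically, which is the comparison made in Section 6.3. The sketch below assumes base-2 logarithms (the text does not fix the base) and bisects on the exponent of n; all function names are illustrative.

```python
import math


def g(n: float) -> float:
    """Round count of Equation (A4): 10 n^(1/3) * 2 log(n)^4, with log base 2."""
    return 10 * n ** (1 / 3) * 2 * math.log2(n) ** 4


def beats_trivial(n: float) -> bool:
    """True when log(n) * g(n) < n, i.e. fewer rounds than the trivial strategy."""
    return math.log2(n) * g(n) < n


def crossover_exponent(lo: float = 4.0, hi: float = 30.0, tol: float = 1e-6) -> float:
    """Bisect on e = log10(n) for the point where n = 10**e starts to win.
    Assumes the comparison fails at 10**lo and holds at 10**hi."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beats_trivial(10 ** mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Under the base-2 assumption the crossover exponent falls between 14 and 15, in line with the requirement of more than $10^{14}$ nodes stated in the main text; a different logarithm base would shift this threshold.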

Appendix A.3. Expanding the DMST in the Distributed Setting

We handle the expansion of the DMST in the same way as in ([3] Section 7), borrowing much of their discussion for our description here. However, as we have computed APSP distances along the way in place of SSSP, ‘unpacking’ the DMST becomes a bit simpler in our case.

Consider a component

H_{i}

in one of the iterations of QDLSI, with the input graph for the iteration being

G_{i}

. For each contraction in QDLSI, we determined edges

v_{i} u_{i}

,

v_{i} \notin H_{i}, u_{i} \in H_{i}

minimizing

β_{i} : = W_{v_{i} u_{i}} + d_{G} (u_{i}, c (H_{i}))

to contract nodes. Recall that what happens in the iteration is that the cycle

C (H_{i})

and all nodes that have distance

β_{i}

to

c (H_{i})

are contracted into one super vertex. Denote that super vertex by

V_{H_{i}, β_{i}}^{*}

. Let

G_{i + 1}

denote the graph obtained after this contraction. Our goal, given a DMST

T_{i + 1}

for

G_{i + 1}

, is to recover

G_{i}

along with a DMST

T_{i}

for

G_{i}

. We make use of the following Unpacking operation of ([3] Section 7):

Unpacking

Input: A digraph

G_{i + 1}

with a DMST

T_{i + 1}

with root r, a set of edges

H_{i}

as in Quantum DMST Algorithm, a node

V_{H_{i}, β_{i}}^{*}

of

G_{i + 1}

, which is marked as a super vertex, a set

c (H_{i})

of the nodes contracted into it, and

G_{i}

is the graph before contracting

c (H_{i})

.

Output: A DMST

T_{i}

for

G_{i}

rooted at r.

1:

For any

v_{1}, v_{2} \notin V_{H_{i}, β_{i}}^{*}

, let edge

v_{1} v_{2} \in T_{i}

if

v_{1} v_{2} \in T_{i + 1}

.

2:

For

u V_{H_{i}, β_{i}}^{*} \in T_{i + 1}

, which exists since

T_{i + 1}

is a DMST for

G_{i + 1}

, denote the edge

u v^{*} : = a r g m i n_{u v : v \in V_{H_{i}, β_{i}}^{*}, u : \exists u v \in G_{i + 1}} W_{v u} + d_{G} (u, c (H_{i}))

. Add

u v^{*}

and the shortest path

ζ

connecting

v^{*}

to

c (H_{i})

to

T_{i}

.

3:

For any edge

V_{H_{i}, β_{i}}^{*} u \in T_{i + 1}

outgoing from the contracted super vertex, add the edge

$a r g m i n_{v u : v \in V_{H_{i}, β_{i}}^{*}} W_{v u}^{G_{i}}$ to $T_{i}$ .

4:

Add all edges

H_{i} ∖ δ^{i n} (ζ)

to

T_{i}

, where

δ^{i n} (ζ)

denotes all edges incoming on

ζ

.

At the end of this procedure,

T_{i}

is a DMST for

G_{i}

([3], lemma 8). We now describe how it can be implemented distributedly, needing only classical messages and information. For every contracted super vertex, the following steps can be implemented at the same time, as will become clear in how the steps are executed for the nodes of each contracted super vertex. Let us focus on unpacking one super vertex

V_{H_{i}, β_{i}}^{*}

. Each node knows its neighbors in

G_{i}

, and every node’s super vertex ID in

G_{i}

and

G_{i + 1}

, since each node stores this information before the initial contraction to

G_{i + 1}

in QDLSI happens. Hence, step 1 can be conducted locally at each node without any communication. Step 2 can be conducted by first having each node

v \in V_{H_{i}, β_{i}}^{*}

send

β (u, v)

to the other nodes in

V_{H_{i}, β_{i}}^{*}

, in one round, and then having each node of

V_{H_{i}, β_{i}}^{*}

send to

v^{*}

the routing table entry corresponding to its shortest path to

c (H_{i})

in

G_{i}

, also in one round (the nodes have already computed this information in QDLSI).

v^{*}

then notifies the nodes that are part of

ζ

, which can then add the appropriate edge to

T_{i}

, needing yet another round, so that step 2 can be conducted in three rounds of classical communication only. Step 3 is handled similarly. For the outgoing edge, each node in

V_{H_{i}, β_{i}}^{*}

sends

W_{v u}^{G_{i}}

to the other nodes in

V_{H_{i}, β_{i}}^{*}

so that the appropriate edge to add to

T_{i}

can be determined (in case of a tie, the node with a smaller ID can be the one to add the edge), so this can be conducted in one round. For step 4, every node in

ζ

notifies its neighbors that it is in

ζ

, after which, every node can determine which edges to add to

T_{i}

. For the unpacking of

V_{H_{i}, β_{i}}^{*}

, the information and communication for implementing its unpacking is contained in the nodes of

V_{H_{i}, β_{i}}^{*}

, so we can indeed unpack all vertices synchronously to obtain

G_{i}

, even when multiple super vertices are contracted to obtain

G_{i + 1}

. Hence, one layer of unpacking using this procedure can be implemented in 5 rounds (making use of the APSP and routing table information computed earlier before the contractions in QDLSI). Since there are, at most,

⌈ log n ⌉

contraction steps, the unpacking procedure can be implemented in

5 \cdot ⌈ log n ⌉

rounds.
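Putting the pieces together, the overall round count can be tallied as follows. This is a rough sketch under stated assumptions: `apsp_rounds` is a placeholder for the cost of one APSP call from Section 4, the remaining steps of each QDLSI iteration are charged 2 rounds in total as in Section 6.3, and logarithms are taken base 2.

```python
import math


def dmst_round_estimate(n: int, apsp_rounds: int) -> int:
    """Rounds for ceil(log2 n) QDLSI iterations plus the unpacking phase."""
    iterations = math.ceil(math.log2(n))
    # Each iteration: one APSP call plus 2 rounds for the remaining steps.
    contraction = iterations * (apsp_rounds + 2)
    # Unpacking: 5 rounds per layer, one layer per contraction step.
    unpacking = 5 * iterations
    return contraction + unpacking
```

For any realistic value of `apsp_rounds` the contraction term dominates, which is why the APSP subroutine determines the practicality of the whole algorithm.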

Appendix A.4. Information Access

In Remark 1, we mention that it suffices for all information regarding the input graph to be stored classically, with quantum access to it. Here, we expand on what we mean by that and refer the interested reader to [22] for further details.

While our algorithms use quantum subroutines, the problem instances and their solutions are encoded as classical information. The required quantum access implies the capability to interact with the classical data in a way that allows computation in the superposition of that data. For instance, in the standard (non-distributed) Grover search algorithm, with a problem instance described by a function

g : X \to \{0, 1\}

, we need the ability to apply the unitary

U_{w} | x 〉 = {(- 1)}^{g (x)} | x 〉

to the uniform superposition state over N basis states

| s 〉 = \frac{1}{\sqrt{N}} \sum_{x = 0}^{N - 1} | x 〉

. This unitary is also referred to as the “oracle”, and the act of calling it is called a “query”. If we wish to use the distributed Grover search in Example 1, in which the node u leading the search attempts to determine whether each edge

u v

incident on it is part of a triangle in graph G, the unitary that node v must be able to evaluate is the indicator function of its neighborhood, and u must be able to apply the Grover diffusion unitary restricted to its neighborhood. Then, after initializing the equal superposition, nodes u and v can send a register of qubits back and forth between each other, with v evaluating the unitary corresponding to the indicator of its neighborhood and u applying the Grover diffusion operator restricted to its neighborhood. The same ideas transfer over to a distributed quantum implementation of the EvaluationA or EvaluationB procedures. There, instead of evaluating unitaries corresponding to indicators, in step 2, each node

v_{(i, j, k)}

evaluates the unitary corresponding to the truth values of inequality 3 for the evaluation steps. That information is then returned to the node that sent it, which can then apply the appropriate Grover diffusion operator.
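The alternation of oracle and diffusion described above can be simulated classically on the amplitude vector for small N. The sketch below is a centralized illustration only, not the distributed protocol: `marked` plays the role of the set where $g (x) = 1$ , the iteration count is the standard choice of roughly $(π / 4) \sqrt{N / M}$ for M marked items, and the function name is hypothetical.

```python
import math
from typing import Set


def grover_success_probability(n_items: int, marked: Set[int]) -> float:
    """Simulate Grover iterations on real amplitudes and return the probability
    of measuring a marked item afterwards."""
    amp = [1 / math.sqrt(n_items)] * n_items  # equal superposition |s>
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items / len(marked)))
    for _ in range(iterations):
        # Oracle U_w: flip the sign of every marked basis state.
        amp = [-a if x in marked else a for x, a in enumerate(amp)]
        # Diffusion 2|s><s| - I: reflect every amplitude about the mean.
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    return sum(amp[x] ** 2 for x in marked)
```

In the two-node setting of Example 1, the oracle line corresponds to the work done by node v and the diffusion lines to the work done by node u, with the register passed between them once per half-step.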

In general, quantum random access memory (QRAM) is the data structure that allows queries to the oracle. We can use circuit QRAM in our protocols or could make use of special-purpose hardware QRAM if it were to be realized. This choice does not affect the number of communication rounds but would affect the computation efficiency at each node. A main component of the distributed algorithms discussed in this work is the provision of quantum query access for each node to its list of edges and their associated weights in a given graph G. This information is stored in memory, and the QRAM that implements the query to retrieve it can be accessed in

O (log n)

time, resulting in a limited overhead for our algorithms. This retrieval of information takes place locally at each node; hence, this overhead does not add to the round complexity of our algorithms in the CONGEST-CLIQUE setting. We refer to [23] for more details on QRAM.

References

1. Korhonen, J.H.; Suomela, J. Towards a complexity theory for the congested clique. In Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures, Vienna, Austria, 16–18 July 2018.
2. Saikia, P.; Karmakar, S. Distributed Approximation Algorithms for Steiner Tree in the CONGESTED CLIQUE. arXiv 2019, arXiv:1907.12011.
3. Fischer, O.; Oshman, R. A distributed algorithm for directed minimum-weight spanning tree. Distrib. Comput. 2021, 36, 57–87.
4. Lenzen, C. Optimal Deterministic Routing and Sorting on the Congested Clique; ACM: New York, NY, USA, 2012.
5. Dolev, D.; Lenzen, C.; Peled, S. “Tri, Tri Again”: Finding Triangles and Small Subgraphs in a Distributed Setting. In International Symposium on Distributed Computing; Springer: Berlin/Heidelberg, Germany, 2012.
6. Nowicki, K. A Deterministic Algorithm for the MST Problem in Constant Rounds of Congested Clique. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, Phoenix, AZ, USA, 23–26 June 2019.
7. Izumi, T.; Le Gall, F. Quantum Distributed Algorithm for the All-Pairs Shortest Path Problem in the CONGEST-CLIQUE Model. In Proceedings of the 2019 ACM Symposium on PODC, Toronto, ON, Canada, 29 July–2 August 2019.
8. Censor-Hillel, K.; Fischer, O.; Le Gall, F.; Leitersdorf, D.; Oshman, R. Quantum Distributed Algorithms for Detection of Cliques. arXiv 2022, arXiv:2201.03000.
9. van Apeldoorn, J.; de Vos, T. A Framework for Distributed Quantum Queries in the CONGEST Model. In Proceedings of the 2022 ACM Symposium on Principles of Distributed Computing, Salerno, Italy, 25–29 July 2022.
10. Elkin, M.; Klauck, H.; Nanongkai, D.; Pandurangan, G. Can Quantum Communication Speed Up Distributed Computation? In Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, Madeira, Portugal, 16–18 July 2012.
11. Le Gall, F.; Magniez, F. Sublinear-Time Quantum Computation of the Diameter in CONGEST Networks. In Proceedings of the 2018 ACM Symposium on PODC, Egham, UK, 23–27 July 2018.
12. Censor-Hillel, K.; Kaski, P.; Korhonen, J.H.; Lenzen, C.; Paz, A.; Suomela, J. Algebraic methods in the congested clique. Distrib. Comput. 2016, 32, 461–478.
13. Kou, L.T.; Markowsky, G.; Berman, L. A fast algorithm for Steiner trees. Acta Inform. 1981, 15, 141–145.
14. Lovasz, L. Computing ears and branchings in parallel. In Proceedings of the 26th Annual Symposium on Foundations of Computer Science (sfcs 1985), Washington, DC, USA, 21–23 October 1985; pp. 464–467.
15. Ghaffari, M. Distributed Graph Algorithms (Lecture Notes). 2020. Available online: https://disco.ethz.ch/courses/fs21/podc/lecturenotes/DGA.pdf (accessed on 15 June 2023).
16. Rieffel, E.; Polak, W. Quantum Computing: A Gentle Introduction, 1st ed.; The MIT Press: Cambridge, MA, USA, 2011.
17. Zwick, U. All Pairs Shortest Paths using Bridging Sets and Rectangular Matrix Multiplication. arXiv 2000, arXiv:cs/0008011.
18. Izumi, T.; Le Gall, F.; Magniez, F. Quantum Distributed Algorithm for Triangle Finding in the CONGEST Model. In Proceedings of the 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020), Montpellier, France, 10–13 March 2020.
19. Edmonds, J. Optimum branchings. J. Res. Natl. Bur. Stand. B 1967, 71, 233–240.
20. Hauptmann, M.; Karpinski, M. A Compendium on Steiner Tree Problems. 2015. Available online: https://theory.cs.uni-bonn.de/info5/steinerkompendium/ (accessed on 15 June 2023).
21. Dinitz, M.; Halldorsson, M.M.; Izumi, T.; Newport, C. Distributed Minimum Degree Spanning Trees. In Proceedings of the 2019 ACM Symposium on PODC, New York, NY, USA, 29 July–2 August 2019; pp. 511–520.
22. Booth, K.E.C.; O’Gorman, B.; Marshall, J.; Hadfield, S.; Rieffel, E. Quantum-accelerated constraint programming. Quantum 2021, 5, 550.
23. Giovannetti, V.; Lloyd, S.; Maccone, L. Quantum Random Access Memory. Phys. Rev. Lett. 2008, 100, 160501.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
