Can a Quantum Walk Tell Which Is Which? A Study of Quantum Walk-Based Graph Similarity

We consider the problem of measuring the similarity between two graphs using continuous-time quantum walks and comparing their time-evolution by means of the quantum Jensen-Shannon divergence. Contrary to previous works that focused solely on undirected graphs, here we consider the case of both directed and undirected graphs. We also consider the use of alternative Hamiltonians as well as the possibility of integrating additional node-level topological information into the proposed framework. We set up a graph classification task and we provide empirical evidence that: (1) our similarity measure can effectively incorporate the edge directionality information, leading to a significant improvement in classification accuracy; (2) the choice of the quantum walk Hamiltonian does not have a significant effect on the classification accuracy; (3) the addition of node-level topological information improves the classification accuracy in some but not all cases. We also theoretically prove that under certain constraints, the proposed similarity measure is positive definite and thus a valid kernel measure. Finally, we describe a fully quantum procedure to compute the kernel.


Introduction
In recent years, we have observed rapid advancements in the fields of machine learning and quantum computation. An increasing number of researchers are looking at challenges emerging at the intersection of these two fields, from quantum annealing as an alternative to classical simulated annealing [1,2] to quantum parallelism as a source of algorithmic speedup [3,4]. Quantum walks [5,6], the quantum mechanical analogue of classical random walks, have been shown to provide the potential for exponential speedups over classical computation, in part thanks to an array of exotic properties not exhibited by their classical counterpart, such as interference. The rising popularity of quantum walks can be further understood by looking at the work of Childs [7], who showed that any quantum computation can be efficiently simulated by a quantum walk on a sparse and unweighted graph, thus elevating quantum walks to the status of universal computational primitives.
In machine learning and pattern recognition, graphs are used as convenient representations for systems that are best described in terms of their structure, and classical random walks have been repeatedly and successfully used to analyse such structure [8][9][10][11][12]. However, the rich expressiveness of these representations comes with several issues when applying standard machine learning and pattern recognition techniques to them. In fact, these techniques usually require graphs to be mapped to corresponding vectorial representations, which in turn requires a canonical node order to be established first. Graphs with different numbers of nodes and edges present yet another challenge, as the dimension of the resulting vectorial representation would have to vary accordingly.

Our main contributions are the following:
• We define a novel kernel for directed graphs based on [18] and the work of Chung [26] on directed Laplacians;
• We extend the work of [18] by incorporating additional node-level topological information using two well-known structural signatures, the Heat Kernel Signature [27] and the Wave Kernel Signature [28];
• We give a formal proof of the positive definiteness of the undirected kernel for the case where the Hamiltonian is the graph Laplacian and the starting state satisfies several constraints;
• We propose a simple yet efficient quantum algorithm to compute the kernel;
• We perform an empirical comparison of the performance of the kernel for both directed and undirected graphs and for different choices of the Hamiltonian.
We perform an extensive set of experimental evaluations and we find that:
• Adding the edge directionality information allows us to better discriminate between the different classes, even when compared with other commonly used kernels for directed graphs;
• In most cases the incorporation of the node-level topological information results in a significant improvement over the performance of the original kernel;
• The optimal Hamiltonian (in terms of classification accuracy) depends on the dataset, as already suggested in [18];
• The constraints we enforce to ensure the positive definiteness of the kernel disrupt the phase of the initial state, leading to a decrease in classification accuracy.
The remainder of this paper is organised as follows. Section 2 reviews the fundamental concepts of graph theory and quantum mechanics needed to understand the present paper. Section 3 introduces the quantum-walk based similarity measure, analyses it from a theoretical perspective, and illustrates how it can be adapted to work on both undirected and directed graphs. Finally, Section 4 discusses the results of our experimental evaluation and Section 5 concludes the paper.

Graphs and Quantum Walks
In this section we introduce several fundamental graph-theoretic and quantum mechanical notions. In particular, we show how to associate a density matrix to a graph using continuous-time quantum walks. The density matrix representation of the graphs will then be used in Section 3 to define the kernel for both directed and undirected graphs.

Elementary Graph-Theoretic Concepts
Let G(V, E) denote a directed graph, where V is a set of n vertices and E ⊆ V × V is a set of edges such that each e = (u, v) ∈ E has a start-vertex u ∈ V and an end-vertex v ∈ V. Please note that we sometimes omit (V, E) and simply write G when the vertex and edge sets are implicitly defined. The adjacency matrix A of G has elements A(u, v) = 1 if (u, v) ∈ E, and A(u, v) = 0 otherwise. If A(u, v) = A(v, u) = 1 we say that the edge e = (u, v) is bidirectional. If every edge of a graph G is bidirectional, i.e., the associated adjacency matrix A is symmetric, we say that G is an undirected graph.
For a node u, the degree of u is the number of nodes connected to u by an edge. For undirected graphs this is d_u = ∑_{v∈V} A(u, v), while for directed graphs we distinguish between the in-degree d^in_u = ∑_{v∈V} A(v, u) and the out-degree d^out_u = ∑_{v∈V} A(u, v), which can be added up to give the total degree d^tot_u = d^in_u + d^out_u. For undirected graphs, we can define the degree matrix D as the matrix with the degrees d_u as the diagonal elements and zero elsewhere. The graph Laplacian of G is then defined as L = D − A, and it can be interpreted as a combinatorial analogue of the discrete Laplace-Beltrami operator [29]. Similarly, we can define the normalised Laplacian L̃ = D^(−1/2) L D^(−1/2). For directed graphs, things are less straightforward. In Section 3.1 we illustrate how to define both the Laplacian and the normalised Laplacian on directed graphs based on the work of Chung [26].
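As a concrete illustration of these definitions, the following sketch (using NumPy; the 4-cycle is an arbitrary example graph) builds the degree matrix, the Laplacian, and the normalised Laplacian from an adjacency matrix:

```python
import numpy as np

# Adjacency matrix of a small undirected graph (a 4-cycle).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)              # degrees d_u = sum_v A(u, v)
D = np.diag(d)                 # degree matrix
L = D - A                      # graph Laplacian L = D - A

# Normalised Laplacian: D^{-1/2} L D^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt
```

For a connected graph, the rows of L sum to zero and the diagonal of the normalised Laplacian is identically one.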

Quantum Walks on Graphs
Using the Dirac notation to denote by |u⟩ the basis state corresponding to the walk being at vertex u ∈ V, the state of a continuous-time quantum walk on G at time t is

|ψ_t⟩ = ∑_{u∈V} α_u(t) |u⟩,

where the amplitude α_u(t) ∈ C and |ψ_t⟩ ∈ C^|V| are both complex. Moreover, we have that α_u(t)α*_u(t) gives the probability that at time t the walker is at the vertex u, and thus ∑_{u∈V} α_u(t)α*_u(t) = 1 and α_u(t)α*_u(t) ∈ [0, 1], for all u ∈ V, t ∈ R^+. The state of the walk evolves through time according to the Schrödinger equation

d/dt |ψ_t⟩ = −iH |ψ_t⟩,

where H denotes the time-independent Hamiltonian. Given an initial state |ψ_0⟩, the solution of the previous equation for time t is then |ψ_t⟩ = e^(−iHt) |ψ_0⟩.
In the case of undirected graphs, the Laplacian matrix is usually chosen as the system Hamiltonian, i.e., H = L. However, in theory any Hermitian operator encoding the graph structure can be used instead, e.g., the adjacency matrix or the normalised Laplacian matrix. Directed graphs pose an additional challenge, as the asymmetry of the adjacency matrix implies that it cannot be used as the system Hamiltonian. However, it is possible to define symmetric Laplacian and normalised Laplacian matrix representations for directed graphs, as explained in Section 3.1.
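The closed-form solution |ψ_t⟩ = e^(−iHt)|ψ_0⟩ is easy to simulate numerically; a minimal sketch, with the Laplacian of a 3-node path graph as the Hamiltonian and the starting vertex chosen arbitrarily:

```python
import numpy as np
from scipy.linalg import expm

# |psi_t> = exp(-iHt) |psi_0> on a 3-node path graph, with H = L.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.diag(A.sum(axis=1)) - A

psi0 = np.zeros(3, dtype=complex)
psi0[0] = 1.0                      # walker initially localised at vertex 0

t = 0.7
psi_t = expm(-1j * H * t) @ psi0   # unitary time evolution

probs = np.abs(psi_t) ** 2         # alpha_u(t) * conj(alpha_u(t))
```

Since the evolution is unitary, the probabilities sum to one at every time t.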

From Quantum Walks to Graph Density Matrices
The density matrix representation is introduced in quantum mechanics as a way to describe a system that is a statistical ensemble of pure states |ψ_i⟩, each with probability p_i, i.e.,

ρ = ∑_i p_i |ψ_i⟩⟨ψ_i|.

Please note that density matrices are positive semidefinite matrices with unit trace. Given a density matrix ρ with eigenvalues λ_1, ..., λ_n, we can compute its von Neumann entropy [30] S(ρ) as

S(ρ) = −∑_{i=1}^n λ_i log λ_i.

Moreover, given two density matrices ρ and σ we can compute their quantum Jensen-Shannon divergence (QJSD) [23,24] as

QJSD(ρ, σ) = S((ρ + σ)/2) − (S(ρ) + S(σ))/2.

The QJSD is always well defined, symmetric and positive definite [23,24]. It can also be shown that QJSD(ρ, σ) is bounded, i.e., 0 ≤ QJSD(ρ, σ) ≤ 1, with the latter equality attained only if ρ and σ have support on orthogonal subspaces [18].
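Both quantities are straightforward to compute classically from the eigenvalues of the density matrices. A minimal sketch, assuming logarithms in base 2 so that the QJSD is bounded by 1:

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -sum_i lambda_i log2(lambda_i), over non-zero eigenvalues."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def qjsd(rho, sigma):
    """QJSD(rho, sigma) = S((rho + sigma)/2) - (S(rho) + S(sigma))/2."""
    return von_neumann_entropy((rho + sigma) / 2) \
        - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))

# Two pure states with orthogonal supports attain the upper bound QJSD = 1.
rho = np.diag([1.0, 0.0])
sigma = np.diag([0.0, 1.0])
```

With these two orthogonal pure states, `qjsd(rho, sigma)` evaluates to 1, while `qjsd(rho, rho)` is 0.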
The QJSD has been shown to have several interesting properties required for a good distinguishability measure between quantum states [23,24]. In pattern recognition, it has been used to define several graph kernels [17,18,31]. This in turn is achieved by associating quantum states to graphs and using the QJSD between the states as a proxy of the similarity between the graphs.
More specifically, given a graph G and a continuous-time quantum walk with starting state |ψ_0⟩, the idea is to associate a quantum state to G as follows. First, we let the quantum walk evolve according to Equation (4) until a time T is reached. Then, we define the quantum state with density matrix representation

ρ_T = (1/T) ∫_0^T |ψ_t⟩⟨ψ_t| dt.

This state represents a uniform ensemble of pure states |ψ_t⟩, each corresponding to a different stage of the quantum walk evolution. Different versions of QJSD-based graph kernels are based on different definitions of this density matrix [17,18]. In this paper we focus on the kernel of Rossi et al. [18], which was shown to have several advantages compared to its alternatives. However, while the original work of Rossi et al. [18] focused only on undirected graphs, in the next sections we show how it can be extended to directed graphs. We also study how the choice of the Hamiltonian influences the kernel performance in a graph classification task and we investigate whether the addition of structural information at the node level has any benefit. While Rossi et al. [18] were unable to prove the positive definiteness of the kernel, we are able to provide a proof under the assumption that the Hamiltonian is the graph Laplacian and the starting state satisfies certain conditions. Finally, we propose a fully quantum procedure to compute the kernel.
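For intuition, ρ_T can be approximated numerically by averaging the projectors |ψ_t⟩⟨ψ_t| over a discretised time grid. This is a sketch only (the kernel itself is computed in closed form via the infinite-time limit), with an arbitrary example graph and a uniform starting state:

```python
import numpy as np
from scipy.linalg import expm

# rho_T = (1/T) * integral_0^T |psi_t><psi_t| dt, via a simple Riemann sum.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
H = np.diag(A.sum(axis=1)) - A            # Laplacian Hamiltonian

psi0 = np.ones(3, dtype=complex) / np.sqrt(3)

T, steps = 10.0, 500
rho_T = np.zeros((3, 3), dtype=complex)
for t in np.linspace(0, T, steps, endpoint=False):
    psi_t = expm(-1j * H * t) @ psi0
    rho_T += np.outer(psi_t, psi_t.conj())
rho_T /= steps
```

The result is, up to discretisation error, a valid density matrix: Hermitian, positive semidefinite, and with unit trace.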

Graph Similarity from Quantum Walks
Given two undirected graphs G_1(V_1, E_1) and G_2(V_2, E_2), we merge them into a larger structure by establishing a complete set of connections between the two node sets. More specifically, we construct a new graph G(V, E) where V = V_1 ∪ V_2 and E contains all the edges in E_1 and E_2, together with a new edge (u, v) for each pair of nodes u ∈ V_1 and v ∈ V_2. Figure 1 shows an example pair of graphs and their union. Figure 1. Given two graphs G_1(V_1, E_1) and G_2(V_2, E_2) we build a new graph G(V, E) where V = V_1 ∪ V_2, E = E_1 ∪ E_2, and we add a new edge (u, v) between each pair of nodes u ∈ V_1 and v ∈ V_2.
The intuition of Rossi et al. [18] was to define two mixed quantum states on this new graph describing the time-evolution of two quantum walks specifically designed to amplify constructive and destructive interference effects. To this end, we define two independent quantum walks with starting states

|ψ±_0⟩ = (1/C) ( ∑_{u∈V_1} d_u |u⟩ ± ∑_{v∈V_2} d_v |v⟩ ),

where d_u denotes the degree of node u in the merged graph and C is a normalisation constant. We can then associate the following quantum states to the graph,

ρ±_T = (1/T) ∫_0^T |ψ±_t⟩⟨ψ±_t| dt.

Given these two states, we finally make use of the QJSD to compute the similarity between the input graphs in terms of the dissimilarity between ρ−_T and ρ+_T, i.e.,

k_T(G_1, G_2) = QJSD(ρ−_T, ρ+_T).

Since the quantum states ρ−_T and ρ+_T represent the evolution of two quantum walks that emphasize destructive and constructive interference, respectively, we expect that the more similar the input graphs are, the more dissimilar ρ−_T and ρ+_T are. In the extreme case where G_1 and G_2 are isomorphic, it can be shown that k_T(G_1, G_2) = 1 [18].
In [18], the authors suggest to consider the limit of ρ+_T for T → ∞, which in turn allows us to rewrite Equation (11) as

ρ+_∞ = ∑_{λ∈Λ(H)} P_λ |ψ+_0⟩⟨ψ+_0| P_λ,

where Λ(H) is the set of distinct eigenvalues of the Hamiltonian H, each eigenvalue λ having multiplicity µ(λ), and P_λ = ∑_{k=1}^{µ(λ)} |φ_{λ,k}⟩⟨φ_{λ,k}| is the projection operator on the subspace spanned by the µ(λ) eigenvectors |φ_{λ,k}⟩ associated with λ. Similarly, one can work out the limit of ρ−_T for T → ∞. This in turn has the effect of making the computation of the kernel easier, in addition to allowing us to drop the time parameter T, yielding

k_∞(G_1, G_2) = QJSD(ρ−_∞, ρ+_∞).
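Putting the pieces together, the infinite-time kernel can be computed classically from the eigendecomposition of the merged-graph Laplacian. The sketch below is our own illustrative implementation: it assumes degree-based starting states with equal/opposite phase on the two graphs, base-2 logarithms, and a simple tolerance-based grouping of degenerate eigenvalues:

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy (base 2) from the eigenvalues of rho."""
    lam = np.linalg.eigvalsh(rho)
    lam = lam[lam > 1e-12]
    return float(-np.sum(lam * np.log2(lam)))

def avg_mixing(H, psi0):
    """Infinite-time average: sum over the distinct eigenvalues lambda of H
    of P_lambda |psi0><psi0| P_lambda."""
    w, V = np.linalg.eigh(H)
    rho0 = np.outer(psi0, psi0.conj())
    rho = np.zeros_like(rho0)
    for lam in np.unique(np.round(w, 6)):
        idx = np.isclose(w, lam, atol=1e-6)
        P = V[:, idx] @ V[:, idx].conj().T
        rho += P @ rho0 @ P
    return rho

def qjsd_kernel(A1, A2):
    n, m = len(A1), len(A2)
    # Merge the graphs with a complete set of edges between the node sets.
    A = np.block([[A1, np.ones((n, m))],
                  [np.ones((m, n)), A2]])
    H = np.diag(A.sum(axis=1)) - A          # Laplacian Hamiltonian
    d = A.sum(axis=1)
    plus = d.astype(complex)                # equal phase on both graphs
    minus = d.astype(complex)
    minus[n:] *= -1                         # opposite phase on G2
    plus /= np.linalg.norm(plus)
    minus /= np.linalg.norm(minus)
    rp = avg_mixing(H, plus)
    rm = avg_mixing(H, minus)
    return entropy((rp + rm) / 2) - 0.5 * (entropy(rp) + entropy(rm))

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
k_iso = qjsd_kernel(triangle, triangle)     # isomorphic inputs
k_diff = qjsd_kernel(triangle, path)
```

As expected, the kernel evaluates to 1 for a pair of isomorphic graphs and stays within [0, 1] otherwise.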

Extension to Directed Graphs
The kernel described in the previous subsection can only be computed for undirected graphs. In order to cope with directed graphs, we propose using the directed graph Laplacian introduced by Chung [26]. We first merge the two directed input graphs G_1(V_1, E_1) and G_2(V_2, E_2) as done in the undirected case, i.e., we create a complete set of undirected edges connecting the nodes of V_1 to the nodes of V_2. Note, however, that the resulting merged graph is directed, as a consequence of the edges in E_1 and E_2 being directed (see Figure 2). Hence we define the starting states |ψ−_0⟩ and |ψ+_0⟩ similarly to Equation (10), but using the out-degree instead:

|ψ±_0⟩ = (1/C) ( ∑_{u∈V_1} d^out_u |u⟩ ± ∑_{v∈V_2} d^out_v |v⟩ ).

To compute the Hamiltonian and thus ρ−_T and ρ+_T, we proceed as follows. Let us define the classical random walk transition matrix M for the directed graph G as the matrix with elements M_uv = A_uv / d^out_u, where we use the convention that when the out-degree of u is zero, i.e., d^out_u = 0, we let M_uv = 0. From the Perron-Frobenius theorem we know that for a strongly connected directed graph, i.e., a directed graph where there exists a path between every pair of vertices and there are no sinks, the transition matrix M has a unique non-negative left eigenvector π [32], i.e., πM = λπ, where λ is the eigenvalue associated with π. According to the Perron-Frobenius theorem, if M is aperiodic we also have that λ = 1, and thus πM = π, implying that the random walk on the directed graph G with transition matrix M will converge to a stationary probability distribution.
Let Π be the diagonal matrix with the elements of π on the diagonal. Then Chung [26] defines the Laplacian matrix of the directed graph G as

L = Π − (ΠM + MᵀΠ)/2.

Similarly, the normalised Laplacian of G is defined as

L̃ = I − (Π^(1/2) M Π^(−1/2) + Π^(−1/2) Mᵀ Π^(1/2))/2,

where I denotes the identity matrix. Please note that while the adjacency matrix of a directed graph is clearly not Hermitian, both the normalised Laplacian and the Laplacian defined in Equations (16) and (17) are symmetric and thus can be used as the Hamiltonian governing the quantum walk evolution. An alternative would have been to symmetrize the graph edges, effectively making the graph undirected; however, as we will show in the experimental part, this causes the loss of important structural information.
We would like to stress that the method of Chung [26] is not the only way to associate a (normalised) Laplacian matrix to a directed graph. We are aware of at least one different method proposed by Bauer [33]. However we decide to focus on the definition proposed by Chung [26] as this has been successfully applied to the analysis of directed graphs in pattern recognition and network science [34][35][36].
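A sketch of Chung's construction for a small strongly connected, aperiodic digraph (the example graph is our own; the stationary distribution is obtained as the left Perron eigenvector of the transition matrix):

```python
import numpy as np

# A small strongly connected, aperiodic digraph: 0->1, 1->2, 2->0, 2->1.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 1, 0]], dtype=float)

d_out = A.sum(axis=1)
M = A / d_out[:, None]             # transition matrix M_uv = A_uv / d_out(u)

# Stationary distribution pi: left eigenvector of M for eigenvalue 1.
w, V = np.linalg.eig(M.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

Pi = np.diag(pi)
L_dir = Pi - (Pi @ M + M.T @ Pi) / 2                 # Chung's Laplacian

sqrt_pi = np.diag(np.sqrt(pi))
inv_sqrt_pi = np.diag(1.0 / np.sqrt(pi))
L_norm = np.eye(3) - (sqrt_pi @ M @ inv_sqrt_pi
                      + inv_sqrt_pi @ M.T @ sqrt_pi) / 2  # normalised version
```

Both matrices are symmetric by construction, so either can serve as the Hamiltonian of the quantum walk.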

Integrating Local Topological Information
In [31] it was shown that, for the case of undirected attributed graphs G_1(V_1, E_1, f_1) and G_2(V_2, E_2, f_2), where f_i is a function assigning attributes to the nodes of the graph G_i, it is possible to effectively incorporate the information on the pairwise similarity between the attributes of the two graphs by allowing the adjacency matrix of the merged graph to be weighted. More specifically, for each u ∈ V_1 and v ∈ V_2, the authors proposed to label the edge (u, v) with a real value ω(f_1(u), f_2(v)) representing the similarity between f_1(u) and f_2(v).
While in this paper we only deal with unattributed graphs, we propose using a similar method to incorporate additional node-level structural information into the kernel. More specifically, given an undirected graph G(V, E) and a node v ∈ V, we capture the structure of the graph from the perspective of v using two well-known spectral signatures based on the graph Laplacian, the Heat Kernel Signature (HKS) [27] and the Wave Kernel Signature (WKS) [28]. The signatures simulate a heat diffusion and a wave propagation process on the graph, respectively. The result is in both cases a series of vectors characterising the graph topology centered at each node. In graph matching, particularly in computer vision applications [27,28], this is used to match pairs of nodes with similar (in a Euclidean sense) vectorial representations, under the assumption that these describe structurally similar nodes in the original graph.
Given two nodes u, v ∈ V, we compute their spectral signatures f(u) and f(v) (each computed with respect to the original graph the node belongs to) using either the HKS or the WKS, and we label the edge (u, v) with the real value ω(f(u), f(v)). Please note that the signatures are computed on the original graphs, not on the merged one. Then the adjacency matrix of the graph obtained by merging G_1 and G_2 becomes a weighted matrix, whose element for the pair of nodes u and v is the similarity between their signatures. Figure 3 shows an example of two graphs with 2-dimensional signatures and their merged graph. For ease of presentation, the weight of each edge is shown by varying its thickness, i.e., the thicker the edge the higher the weight and the more similar the signatures of the connected nodes. Here we decided to compute the similarity between each pair of nodes in V, effectively turning G_1 and G_2 into two weighted cliques, i.e., complete weighted graphs, where the weights on the edges encode the pairwise node similarities. While at first sight it may seem like we are discarding the original structure of G_1 and G_2, note that this is actually encoded in the node signatures and thus in the edge weights.
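The construction can be sketched as follows. The HKS of node v at time t is HKS_t(v) = ∑_i e^(−λ_i t) φ_i(v)², with (λ_i, φ_i) the eigenpairs of the Laplacian; the similarity function ω below (a Gaussian-type function of the Euclidean distance) is our own illustrative choice, as any reasonable similarity can be plugged in:

```python
import numpy as np

def heat_kernel_signature(A, times):
    """HKS_t(v) = sum_i exp(-lambda_i * t) * phi_i(v)**2, computed from the
    eigendecomposition of the graph Laplacian L = D - A."""
    L = np.diag(A.sum(axis=1)) - A
    lam, phi = np.linalg.eigh(L)
    return np.stack([(np.exp(-lam * t) * phi**2).sum(axis=1) for t in times],
                    axis=1)

def omega(x, y):
    # Hypothetical similarity between two signature vectors.
    return float(np.exp(-np.linalg.norm(x - y)))

A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)  # triangle
A2 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path

times = [0.1, 1.0, 10.0]
f1 = heat_kernel_signature(A1, times)   # signatures computed on the
f2 = heat_kernel_signature(A2, times)   # original graphs, not the merged one

# Fully weighted merged graph: every pair of distinct nodes is connected
# by an edge weighted by the similarity of the corresponding signatures.
sig = np.vstack([f1, f2])
N = len(sig)
W = np.array([[omega(sig[u], sig[v]) if u != v else 0.0
               for v in range(N)] for u in range(N)])
```

Note that the three nodes of the triangle are structurally equivalent, so their signatures coincide and the corresponding edge weights equal 1.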
In Section 4 we run an extensive set of experiments to determine if integrating node-level structural information improves the ability of the kernel to distinguish between structurally similar graphs.

Kernel Properties
Consider a pair of graphs G_1 and G_2, with n and m nodes respectively. Here we consider the case where we take the graph Laplacian to be the Hamiltonian of the system and we assume that the input graphs are undirected. Let L_1 and L_2 be the Laplacians of G_1 and G_2 respectively. Then the Laplacian of the merged graph is

L = [ L_1 + mI_n    −J_{n×m} ; −J_{m×n}    L_2 + nI_m ],

where J denotes the all-ones matrix. We can provide a full characterization of the eigensystem of L. Let |v⟩ and |w⟩ be eigenvectors of L_1 and L_2, respectively, with corresponding eigenvalues v and w. Further, assume that |v⟩ and |w⟩ are not Fiedler vectors, i.e., ⟨1_n|v⟩ = 0 and ⟨1_m|w⟩ = 0. Then |v, 0_m⟩ is an eigenvector of L with eigenvalue v + m and |0_n, w⟩ is an eigenvector of L with eigenvalue w + n. The remaining eigenvectors of L are |1_{m+n}⟩ with eigenvalue 0 and |m1_n, −n1_m⟩ with eigenvalue m + n. In the following, we will denote by P¹_λ the orthogonal projector on the eigenspace of L_1 of eigenvalue λ, if λ is an eigenvalue of L_1, and 0 otherwise. For the special case of the 0 eigenspace, we eliminate the constant eigenvector |1_n⟩. P²_λ is similarly defined for L_2.
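This characterisation is easy to verify numerically. The sketch below (our own toy example) builds the merged Laplacian for a triangle (n = 3) and a single edge (m = 2) and checks the stated eigenvectors:

```python
import numpy as np

# Merged Laplacian of a triangle (n = 3) and a single edge (m = 2),
# joined by a complete bipartite set of edges.
A1 = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
A2 = np.array([[0, 1], [1, 0]], dtype=float)
n, m = 3, 2

A = np.block([[A1, np.ones((n, m))], [np.ones((m, n)), A2]])
L = np.diag(A.sum(axis=1)) - A

# Lift an eigenvector of L1 that is orthogonal to the all-ones vector.
L1 = np.diag(A1.sum(axis=1)) - A1
w1, V1 = np.linalg.eigh(L1)           # ascending order, so w1[0] = 0
v = V1[:, 1]                          # eigenvalue w1[1], with <1_n|v> = 0
lifted = np.concatenate([v, np.zeros(m)])

ones = np.ones(n + m)                                          # eigenvalue 0
extremal = np.concatenate([m * np.ones(n), -n * np.ones(m)])   # eigenvalue m + n
```

One can check that L maps `lifted` to (w1[1] + m) times itself, annihilates the constant vector, and scales `extremal` by m + n, exactly as claimed.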
Under this eigendecomposition we can write ρ+_∞ and ρ−_∞ as the following sums of rank-1 matrices:

ρ+_∞ = ∑_{λ∈Λ(L)} P_λ |ψ+_0⟩⟨ψ+_0| P_λ,

with the eigenvalues of L being of the form λ = µ + m + n. Similarly,

ρ−_∞ = ∑_{λ∈Λ(L)} P_λ |ψ−_0⟩⟨ψ−_0| P_λ.

In what follows we write the initial states as |ψ±_0⟩ = |ξ_0, ±χ_0⟩, where ξ_0 and χ_0 denote the components of the initial amplitude vector on G_1 and G_2, respectively. Please note that each matrix in the summation is a rank-1 matrix and the matrices are orthogonal to each other, as a consequence of the projectors being orthogonal to each other. This in turn implies that the size of the spectra of ρ+_∞ and ρ−_∞ is the same and it is equal to the number of distinct eigenvalues in (Λ(L_1) − n) ∪ (Λ(L_2) − m). More precisely, for each µ ∈ (Λ(L_1) − n) ∪ (Λ(L_2) − m), excluding those derived from the two Fiedler vectors of L_1 and L_2, there exists a non-zero eigenvalue of both ρ+_∞ and ρ−_∞ of the form ⟨ξ_0| P¹_{µ+n} |ξ_0⟩ + ⟨χ_0| P²_{µ+m} |χ_0⟩.
In addition, we have two eigenvalues of the form

(⟨1_n|ξ_0⟩ ± ⟨1_m|χ_0⟩)² / (m + n),

associated with the constant eigenvector |1_{m+n}⟩, and

(m⟨1_n|ξ_0⟩ ∓ n⟨1_m|χ_0⟩)² / (mn(m + n)),

associated with the eigenvector |m1_n, −n1_m⟩, where the upper signs refer to ρ+_∞ and the lower signs to ρ−_∞. In other words, the two density matrices, despite having different eigenspaces, have identical spectra induced from the eigenspaces orthogonal to the Fiedler vector and differ only in the values due to the extremal eigenvalues of L. Turning our attention to the arithmetic mean of the two density matrices in Equation (14), we have

(ρ+_∞ + ρ−_∞)/2 = ∑_{λ∈Λ(L)} P_λ ( |ξ_0⟩⟨ξ_0| ⊕ |χ_0⟩⟨χ_0| ) P_λ.

Again, this is a summation of rank-1 matrices orthogonal to one another, resulting in eigenvalues of the form ⟨ξ_0| P¹_{µ+n} |ξ_0⟩ and ⟨χ_0| P²_{µ+m} |χ_0⟩, corresponding to the (distinct) eigenvalues of L_1 and L_2, respectively, excluding the eigenspace component induced by the Fiedler vectors, as well as the two eigenvectors from the extremal eigenvalues of L. We can now compute the QJSD kernel between the graphs G_1 and G_2 as

k_∞(G_1, G_2) = S((ρ+_∞ + ρ−_∞)/2) − (S(ρ+_∞) + S(ρ−_∞))/2.

Assuming that the initial states are normalised in such a way that ⟨1_n|ξ_0⟩ = ⟨1_m|χ_0⟩ = 0 and ⟨ξ_0|ξ_0⟩ = ⟨χ_0|χ_0⟩ = 1/2, the terms due to the extremal eigenvalues disappear. This leads us to the following final observation. Let f_1 and f_2 be two discrete distributions with values in Λ(L_1) − n and Λ(L_2) − m, respectively, defined as

f_1(x) = 2 ∑_{µ∈Λ(L_1)−n} ⟨ξ_0| P¹_{µ+n} |ξ_0⟩ δ(x − µ),
f_2(x) = 2 ∑_{µ∈Λ(L_2)−m} ⟨χ_0| P²_{µ+m} |χ_0⟩ δ(x − µ),

where δ(x) is the Dirac delta function. Then we have

k_∞(G_1, G_2) = 1 − CJSD(f_1, f_2),

where CJSD denotes the classical Jensen-Shannon divergence. Now we can prove the following:

Theorem 1. Using the Laplacian as the Hamiltonian and normalising |ξ_0⟩ and |χ_0⟩ such that ⟨ξ_0|ξ_0⟩ = ⟨χ_0|χ_0⟩ = 1/2 and ⟨1_n|ξ_0⟩ = ⟨1_m|χ_0⟩ = 0, the quantum Jensen-Shannon kernel k_∞(G_1, G_2) is positive definite.

Proof. The proof descends directly from the derivations above and from the fact that the classical Jensen-Shannon divergence is conditionally negative-definite [37].

Kernel Computation
The issue of efficiently computing the QJSD kernel is that of computing the entropies of ρ−_∞, ρ+_∞, and of their mean. Here we will discuss a general quantum approach to estimate the entropy of an infinite-time average mixing matrix starting from a pure state. Following [18], we write the infinite-time average mixing matrix as

ρ_∞ = ∑_{λ∈Λ(H)} P_λ ρ_0 P_λ,

where H is the Hamiltonian and P_λ is the projector onto its λ eigenspace. If ρ_0 = |φ⟩⟨φ| is a pure state, then the eigenvectors corresponding to non-zero eigenvalues of ρ_∞ are of the form

P_λ |φ⟩ / ‖P_λ |φ⟩‖

for λ ∈ Λ(H), and the corresponding eigenvalues are ⟨φ| P_λ |φ⟩. We already used this property in the previous section for the special case of the graph Laplacian as the Hamiltonian, but it holds in general.
It is important to note that the values λ here are the energy levels of the system, i.e., the possible outcomes when measuring the Hamiltonian as an observable, and the probability of observing the energy level λ from the state ρ_0 = |φ⟩⟨φ| is ⟨φ| P_λ |φ⟩. This means that the von Neumann entropy of ρ_∞ is equivalent to the classical Shannon entropy of the energy levels of the (pure) initial state, i.e., the entropy of the possible observations from the pure initial state using the Hamiltonian as the observable.
The quantum algorithm to estimate the von Neumann entropy of the infinite-time average mixing matrix ρ_∞ is as follows. First, we prepare several systems in the initial pure state |φ⟩ and we measure their energy using H as the observable. Then we use any entropy estimator for finite (discrete) distributions, using the measured energies as samples. Estimators such as [38] or [39] are particularly interesting because they work well in the under-sampled regime.
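A classical simulation of this procedure can be sketched as follows. The plug-in (maximum likelihood) estimator and the path-graph example are our own illustrative choices; [38,39] describe better-behaved estimators for the under-sampled regime:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hamiltonian: Laplacian of a 4-node path graph (an arbitrary example,
# chosen so that all energy levels are non-degenerate).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
H = np.diag(A.sum(axis=1)) - A
w, V = np.linalg.eigh(H)

phi = np.zeros(4)
phi[0] = 1.0                                  # pure initial state |phi>

# Born probabilities of each energy level: <phi| P_lambda |phi>
probs = (V.T @ phi) ** 2

# Simulated energy measurements on 20000 identically prepared copies,
# followed by the plug-in entropy estimate on the empirical frequencies.
samples = rng.choice(w, size=20000, p=probs)
_, counts = np.unique(np.round(samples, 9), return_counts=True)
freq = counts / counts.sum()
entropy_est = float(-np.sum(freq * np.log2(freq)))

# Exact Shannon entropy of the energy distribution, for comparison.
p = probs[probs > 1e-12]
entropy_exact = float(-np.sum(p * np.log2(p)))
```

With enough samples the estimate converges to the exact Shannon entropy of the energy distribution, which by the argument above equals the von Neumann entropy of ρ_∞.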
In the special case considered in Theorem 1, we can make the estimation even more efficient by noting that we can observe independently from L_1 − nI_n and L_2 − mI_m, and in particular we can perform a single set of observations for each graph and reuse them in every pairwise computation of the kernel. In this case, the samples from the first graph are used to estimate the entropy S(f_1), the samples from the second graph are used to estimate S(f_2), and the two sample sets are merged to estimate S((f_1 + f_2)/2).

Experiments
In this section, we evaluate the accuracy of the proposed kernel in several classification tasks. Our aim is to investigate (1) the importance of the choice of the Hamiltonian; (2) the integration of node-level structural signatures; and (3) the ability to incorporate edge directionality information using the Laplacians proposed by Chung [26]. To this end, we make use of the following datasets of undirected graphs (see Table 1): MUTAG [40] is a dataset originally consisting of 230 chemical compounds tested for mutagenicity in Salmonella typhimurium [41]. Among the 230 compounds, however, only 188 (125 positive, 63 negative) are considered to be learnable and thus are used in our simulations. The 188 chemical compounds are represented by graphs. The aim is to predict whether each compound possesses mutagenicity.
PPIs (Protein-Protein Interaction) is a dataset collecting protein-protein interaction networks related to histidine kinase [42] (40 PPIs from Acidovorax avenae and 46 PPIs from Acidobacteria) [43]. The graphs describe the interaction relationships between histidine kinase in different species of bacteria. Histidine kinase is a key protein in the development of signal transduction. If two proteins have direct (physical) or indirect (functional) association, they are connected by an edge. The original dataset comprises 219 PPIs from 5 different kinds of bacteria with the following evolution order (from older to more recent): Aquifex (4 PPIs from Aquifex aeolicus), Thermotoga (4 PPIs from Thermotoga maritima), Gram-Positive (52 PPIs from Staphylococcus aureus), Cyanobacteria (73 PPIs from Anabaena variabilis) and Proteobacteria (40 PPIs from Acidovorax avenae). There is an additional class (Acidobacteria, 46 PPIs) whose placement in the bacterial evolution order is more controversial. A further dataset [44] consists of very small and sparse graphs. We select the graphs of Male Rats (MR) for evaluation; there are 344 test graphs in the MR class.
COIL (Columbia Object Image Library) consists of images of 100 3D objects [45]. There are 72 images per object, taken so as to obtain 72 views from equally spaced viewing directions. For each view a graph was built by triangulating the extracted Harris corner points. In our experiments, we use the gray-scale images of five objects.
NCI1 The anti-cancer activity prediction dataset consists of undirected graphs representing chemical compounds screened for activity against non-small cell lung cancer lines [46]. Here we use only the connected graphs in the dataset (3530 out of 4110).
We also make use of the following directed graph datasets: Shock The Shock dataset consists of graphs from a database of 2D shapes [47]. Each graph is a medial axis-based representation of the differential structure of the boundary of a 2D shape. There are 150 graphs divided into 10 classes, each containing 15 graphs. The original version contains directed trees, each with a root node; the undirected version has been created by removing the edge directionality.
Alzheimer The dataset is obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) [48] and concerns the interregional connectivity structure of functional magnetic resonance imaging (fMRI) activation networks for normal and Alzheimer subjects. Each image volume is acquired every two seconds with blood oxygenation level dependent (BOLD) signals. The fMRI voxels here have been aggregated into larger regions of interest (ROIs). The different ROIs correspond to different anatomical regions of the brain and are assigned anatomical labels to distinguish them. There are 96 anatomical regions in each fMRI image. The correlation between the average time series in different ROIs represents the degree of functional connectivity between regions, which is driven by neural activities [49]. Subjects fall into four categories according to their degree of disease severity: AD (full Alzheimer's, 30 subjects), LMCI (Late Mild Cognitive Impairment, 34 subjects), EMCI (Early Mild Cognitive Impairment, 47 subjects), and HC (Normal Healthy Controls, 38 subjects). The LMCI subjects are more severely affected and close to full Alzheimer's, while the EMCI subjects are closer to the healthy control group. A directed graph with 96 nodes is constructed for each patient based on the magnitude of the correlation and the sign of the time-lag between the time series for different anatomical regions. To model causal interaction among ROIs, the directed graph uses the time-lagged cross-correlation coefficients for the average time series for pairs of ROIs. We detect directed edges by finding the time-lag that results in the maximum value of the cross-correlation coefficient. The direction of the edge depends on whether the time lag is positive or negative. We then apply a threshold to the maximum values to retain directed edges with the top 40% of correlation coefficients. This yields a binary directed adjacency matrix for each subject, where the diagonal elements are set to zero.
Those ROIs which have missing time series data are discarded. In order to fairly evaluate the influence of edge directionality, an undirected copy of each graph has been created as well. In particular, let A_d be the adjacency matrix of a directed graph. Then its projection onto the space of symmetric matrices is given by (A_d + A_dᵀ)/2.

Experimental Framework
With these datasets to hand, we aim to solve a graph classification task using a binary C-Support Vector Machine (C-SVM) [19]. We perform 10-fold cross validation, where for each sample we independently tune the value of C, the SVM regularizer constant, by considering the training data from that sample. The process is averaged over 100 random partitions of the data, and the results are reported in terms of average accuracy ± standard error. Recall that in classification tasks the accuracy is the fraction of data occurrences that are assigned the correct class label; in our case, an occurrence is a graph. Moreover, we contrast the performance of the kernel with that of other well-established graph kernels, namely the shortest-path kernel [15] (SP), the classic random walk kernel [9] (RW) and the Weisfeiler-Lehman subtree kernel [16] (WL).
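For concreteness, the cross-validation protocol with a precomputed kernel matrix can be sketched as follows. To keep the example dependency-free we substitute a kernel nearest-neighbour classifier for the C-SVM and a synthetic Gaussian Gram matrix for the QJSD kernel values; everything here is illustrative, not the paper's actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data and a Gaussian Gram matrix standing in for the
# precomputed pairwise kernel values between graphs.
n_per_class = 20
X = np.vstack([rng.normal(0, 1, (n_per_class, 5)),
               rng.normal(2, 1, (n_per_class, 5))])
y = np.array([0] * n_per_class + [1] * n_per_class)

K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))

def kernel_1nn_cv(K, y, folds=10, seed=0):
    """k-fold CV accuracy of a 1-NN classifier in the kernel-induced metric
    d(i, j)^2 = K(i, i) + K(j, j) - 2 K(i, j)."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    accs = []
    for f in range(folds):
        test = idx[f::folds]
        train = np.setdiff1d(idx, test)
        d2 = K[np.ix_(test, test)].diagonal()[:, None] \
            + K.diagonal()[train][None, :] - 2 * K[np.ix_(test, train)]
        pred = y[train[np.argmin(d2, axis=1)]]
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

acc = kernel_1nn_cv(K, y)
```

The same loop structure applies unchanged when the Gram matrix is fed to an SVM with a precomputed kernel, with the per-fold tuning of C added on the training portion.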
We divide our experiments into two parts. We first evaluate the influence of the Hamiltonian on the classification accuracy, where the Hamiltonian is chosen to be one of (1) the adjacency matrix (QJSD_A); (2) the Laplacian (QJSD_L); or (3) the normalised Laplacian of the graph (QJSD_NL).
Here we also consider the positive definite version of the kernel discussed in Section 3.3 (QJSD*). In the same experiment, we test whether the addition of node signatures benefits the kernel performance, where the signature is either (1) the HKS or (2) the WKS, denoted as hk and wk respectively. In the second experiment we test the performance of our kernel for the classification of directed graphs. Please note that all the alternative kernels we consider are also able to cope with directed graphs. Here we test both our competitiveness against these kernels and the increase or decrease in performance compared to the case where we remove the edge directionality information. Table 2 shows the results of the classification experiments on undirected graphs. The best and second best performing kernels are highlighted in bold and italic, respectively. While the kernels based on the quantum Jensen-Shannon divergence yield the highest accuracy in 4 out of 5 datasets, it is clear that no specific combination of Hamiltonian and structural signatures is optimal in all cases. However, we observe that, with the exception of the PPI dataset, both the choice of the Hamiltonian and the use (or not) of structural signatures have little or no impact on the performance of the kernel, which remains highly competitive when compared to the shortest-path, random walk, graphlet, and Weisfeiler-Lehman kernels. The performance gain is particularly evident when compared to the kernel based on classical random walks, which is consistently outperformed by the kernels based on quantum walks. This in turn confirms the increased ability of quantum walks to capture subtle structural similarities with respect to their classical counterpart, as well as the robustness of the proposed kernel to the choice of its parameters.
We do observe a tendency of the Laplacian + WKS combination to perform better on datasets with average graph size >100 (PPI and COIL); however, this would need more extensive experiments to be validated. We would like to stress that there are indeed alternative strategies to the one proposed in Section 3.2 for integrating the node signature information. For example, instead of creating a pair of weighted cliques, one could retain the structure of the original graphs and compute the similarity only between nodes connected by edges in E_1, E_2, and E_12. Another approach would have been to restrict the weighting to the edges in E_12 only, similarly to what was done by Rossi et al. in [31]. However, we found little or no difference in terms of classification accuracy between these alternative solutions, and thus we decided to omit these results for ease of presentation.
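These alternative weighting strategies can be sketched as follows; this is purely illustrative code of ours, which assumes per-node signature vectors (e.g. HKS or WKS values), a simple exponential similarity function, and a merged graph in which E_12 connects every node of G_1 to every node of G_2 (the exact construction is the one of Section 3.2):

```python
import numpy as np

def weighted_merged_graph(A1, A2, S1, S2, mode="all"):
    """Adjacency matrix of the merged graph of G1 and G2.

    S1, S2 hold one signature vector per node.  With mode="all" the
    original edges (E1, E2) and the connecting edges (E12) are all
    weighted by signature similarity; with mode="e12only" the original
    edges keep unit weight and only E12 is weighted."""
    sim = lambda x, y: np.exp(-np.linalg.norm(x - y))  # assumed similarity
    n1, n2 = len(A1), len(A2)
    M = np.zeros((n1 + n2, n1 + n2))
    for i in range(n1):                 # E1: edges of G1
        for j in range(n1):
            if A1[i, j]:
                M[i, j] = sim(S1[i], S1[j]) if mode == "all" else 1.0
    for i in range(n2):                 # E2: edges of G2
        for j in range(n2):
            if A2[i, j]:
                M[n1 + i, n1 + j] = sim(S2[i], S2[j]) if mode == "all" else 1.0
    for i in range(n1):                 # E12: connecting edges, always weighted
        for j in range(n2):
            M[i, n1 + j] = M[n1 + j, i] = sim(S1[i], S2[j])
    return M
```

The two modes correspond to the two alternatives discussed above; in our experiments they led to essentially the same classification accuracy.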

Undirected Graphs
Finally, the performance of the positive definite kernel QJSD* deserves a closer inspection. In contrast to the other quantum-walk based kernels, in this case the performance is often significantly lower and the dependency on the addition of the node-level structural signatures is higher. Unlike the other kernels, here we had to enforce several constraints on the initial state in order to guarantee the positive definiteness of the resulting kernel (see Section 3.3). However, one of these constraints, i.e., the requirement that the two parts of the initial amplitude vector corresponding to G_1 and G_2 both sum to zero, causes the phase of the initial state to differ from that of the other kernels. The initial states defined in Equation (10) are such that ψ+0 has equal phase on both G_1 and G_2, while ψ−0 has opposite phase on G_1 and G_2; enforcing the constraint of Section 3.3, however, leads to a change of sign (and thus of phase) for the smallest (in absolute value) components of the initial state. This in turn invalidates the design of ρ+∞ and ρ−∞, which are meant to stress constructive and destructive interference, respectively, and thus it affects the ability of the kernel to distinguish between similar and dissimilar graphs.

Directed Graphs
Table 3 shows the results of the classification experiments on directed graphs. We compare the kernels on the two directed graph datasets, as well as on their undirected versions, and we highlight again the best performing kernel for each dataset. Please note that when the graphs are directed we cannot choose the (potentially asymmetric) adjacency matrix as the Hamiltonian. We first observe that preserving the edge directionality information leads to a higher or similar classification accuracy for most kernels, with the exception of the shortest-path kernel on the Shock dataset.
Please note that the graphs in this dataset are trees, and as such in the directed version there may be pairs of nodes for which a shortest path does not exist, leading to the observed decrease in classification performance. In general, we observe that both in the directed and in the undirected case our kernel is the best performing one, with the normalised Laplacian being the Hamiltonian that leads to the highest classification accuracy in 3 out of 4 datasets. Table 3. Classification accuracy (±standard error) on directed graph datasets, where d and u denote the directed and undirected versions of the datasets, respectively. The best and second best performing kernels are highlighted in bold and italic, respectively.
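For reference, the main quantities computed by the kernels evaluated above can be sketched classically with NumPy. The snippet below is our own illustrative code, not the implementation used in the experiments: it approximates the infinite-time average density operator of a walk by a finite-time average and computes the quantum Jensen-Shannon divergence between the operators obtained from the two initial states; the merged-graph Hamiltonian H and the exact initial states ψ+0 and ψ−0 are assumed to be built as in Section 3:

```python
import numpy as np

def von_neumann_entropy(rho):
    # Entropy (in bits) from the eigenvalues, ignoring numerical zeros.
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def time_averaged_density(H, psi0, T=10.0, steps=200):
    # Finite-time approximation of the average density operator
    # (1/T) \int_0^T |psi(t)><psi(t)| dt, with |psi(t)> = exp(-iHt)|psi0>.
    w, V = np.linalg.eigh(H)          # H must be Hermitian
    c = V.conj().T @ psi0             # amplitudes in the eigenbasis
    n = len(psi0)
    rho = np.zeros((n, n), dtype=complex)
    for t in np.linspace(0.0, T, steps):
        psi = V @ (np.exp(-1j * w * t) * c)
        rho += np.outer(psi, psi.conj())
    return rho / steps

def qjsd(rho_a, rho_b):
    # Quantum Jensen-Shannon divergence between two density operators.
    mix = 0.5 * (rho_a + rho_b)
    return von_neumann_entropy(mix) - 0.5 * (
        von_neumann_entropy(rho_a) + von_neumann_entropy(rho_b))
```

Given H and the two normalised initial amplitude vectors, the kernel value is then obtained as `qjsd(time_averaged_density(H, psi_plus), time_averaged_density(H, psi_minus))`, which lies in [0, 1] when entropies are measured in bits.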

Conclusions
We studied the use of continuous-time quantum walks and the quantum Jensen-Shannon divergence for the purpose of measuring the similarity between two graphs. We performed a thorough analysis of the similarity measure originally introduced in [18], showing that: (1) the kernel can be extended to deal with directed graphs, allowing it to better discriminate between the different classes, even when compared with other commonly used kernels for directed graphs; (2) while it is possible to incorporate additional node-level topological information in the kernel, this results in a significant improvement over the original kernel only in some cases; (3) the optimal Hamiltonian (in terms of classification accuracy) depends on the dataset; (4) when appropriate conditions are met, the kernel is provably positive definite; however, (5) its performance suffers from the resulting loss of structure in the design of the initial state phase; and (6) the kernel can be computed efficiently using a fully quantum procedure.
Despite the important progress made toward a better understanding of the quantum Jensen-Shannon divergence kernel, several questions remain unanswered. Future work should therefore aim at understanding the role played by the initial state in the performance of the kernel, as well as at considering alternative divergence measures and different ways of encoding the edge directionality information. Another idea could be to investigate whether the similarity between two graphs can be linked to the quantumness [50] of a quantum walk on their merged graph, based on the observation that isomorphic graphs result in a highly symmetric merged graph, where strong interference effects can lead the quantum walk to behave significantly differently from its classical counterpart.