A Comparison Between Invariant and Equivariant Classical and Quantum Graph Neural Networks

Machine learning algorithms are heavily relied on to understand the vast amounts of data from high-energy particle collisions at the CERN Large Hadron Collider (LHC). The data from such collision events can naturally be represented with graph structures. Therefore, deep geometric methods, such as graph neural networks (GNNs), have been leveraged for various data analysis tasks in high-energy physics. One typical task is jet tagging, where jets are viewed as point clouds with distinct features and edge connections between their constituent particles. The increasing size and complexity of the LHC particle datasets, as well as the computational models used for their analysis, greatly motivate the development of alternative fast and efficient computational paradigms such as quantum computation. In addition, to enhance the validity and robustness of deep networks, one can leverage the fundamental symmetries present in the data through the use of invariant inputs and equivariant layers. In this paper, we perform a fair and comprehensive comparison between classical graph neural networks (GNNs) and equivariant graph neural networks (EGNNs), and their quantum counterparts: quantum graph neural networks (QGNNs) and equivariant quantum graph neural networks (EQGNNs). The four architectures were benchmarked on a binary classification task to classify the parton-level particle initiating the jet. Based on their AUC scores, the quantum networks were shown to outperform the classical networks. However, seeing the computational advantage of the quantum networks in practice may have to wait for the further development of quantum technology and its associated APIs.


Introduction
Through the measurement of the byproducts of particle collisions, the Large Hadron Collider (LHC) collects a substantial amount of information about fundamental particles and their interactions. The data produced from these collisions can be analyzed using various supervised and unsupervised machine learning methods [1–5]. Jet tagging is a key task in high-energy physics, which seeks to identify the likely parton-level particle from which the jet originated. By viewing individual jets as point clouds with distinct features and edge connections between their constituent particles, a graph neural network (GNN) is considered a well-suited architecture for jet tagging [2,3].
Classified as deep geometric networks, GNNs have the potential to draw inferences about a graph structure, including the interactions among the elements in the graph [6,7]. Graph neural networks are typically thought of as generalizations of convolutional neural networks (CNNs), which are predominantly used for image recognition, pattern recognition, and computer vision [8,9]. This can be attributed to the fact that in an image, each pixel is connected to its nearest neighboring pixels, whereas in a general dataset, one would ideally like to construct an arbitrary graph structure among the samples. Many instances in nature can be described well in terms of graphs, including molecules, maps, social networks, and the brain. For example, in molecules, the nodal data can be attributed to the atoms, the edges can be characterized as the strength of the bond between atoms, and the features embedded within each node can be the atom's characteristics, such as reactivity.
Generally, graphically structured problems involve unordered sets of elements with a learnable embedding of the input features. Useful information can be extracted from such graphically structured data by embedding them within GNNs. Many subsequent developments have been made to GNNs since their first implementation in 2005. These developments have included graph convolutional, recurrent, message-passing, graph attention, and graph transformer architectures [2,6,10,11].
To enhance the validity and robustness of deep networks, invariant and equivariant networks have been constructed to learn the symmetries embedded within a dataset by preserving an oracle in the former and by enforcing weight sharing across filter orientations in the latter [12,13]. Utilizing analytical invariant quantities characteristic of physical symmetry representations, computational methods have successfully rediscovered fundamental Lie group structures, such as the SO(n), SO(1,3), and U(n) groups [14–17]. Nonlinear symmetry discovery methods have also been applied to classification tasks in data domains [18]. The simplest and most useful embedded symmetry transformations include translations, rotations, and reflections, which have been the primary focus in invariant (IGNN) and equivariant (EGNN) graph neural networks [19–21].
The learned representations from the collection of these network components can be used to understand unobservable causal factors, uncover fundamental physical principles governing these processes, and possibly even discover statistically significant hidden anomalies. However, with increasing amounts of available data and the computational cost of these deep learning networks, large computing resources will be required to efficiently run these machine learning algorithms. The extension of classical networks, which rely on bit-wise computation, to quantum networks, which rely on qubit-wise computation, is already underway as a solution to this complexity problem. Due to superposition and entanglement among qubits, quantum networks are able to store the equivalent of 2^n characteristics from n two-dimensional complex vectors. In other words, while the expressivity of the classical network scales linearly, that of the quantum network scales exponentially with the sample size n [22]. Many APIs, including Xanadu's Pennylane, Google's Cirq, and IBM's Qiskit, have been developed to allow for the testing of quantum circuits and quantum machine learning algorithms running on these quantum devices.
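As a concrete illustration of this scaling, the short sketch below (a NumPy illustration, not taken from the paper's code) combines n two-dimensional complex qubit states into a single product state and confirms that the result carries 2^n amplitudes.

```python
import numpy as np

def product_state(qubits):
    """Kronecker product of single-qubit states -> full 2**n state vector."""
    state = qubits[0]
    for q in qubits[1:]:
        state = np.kron(state, q)
    return state

# n normalized two-dimensional complex vectors (one per qubit)
n = 3
rng = np.random.default_rng(0)
qubits = []
for _ in range(n):
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    qubits.append(v / np.linalg.norm(v))

psi = product_state(qubits)
# n vectors of dimension 2 yield a state with 2**n = 8 amplitudes,
# and the Kronecker product of unit vectors remains normalized.
```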
In the quantum graph structure, classical nodes can be mapped to the quantum states of the qubits, real-valued features to the complex-valued entries of the states, edges to the interactions between states, and edge attributes to the strengths of the interactions between the quantum states. Through a well-defined Hamiltonian operator, the larger structure of a classical model can then be embedded into the quantum model. The unitary operator constructed from this parameterized Hamiltonian determines the temporal evolution of the quantum system by acting on the fully entangled quantum state of the graph. Following several layers of application, a final state measurement of the quantum system can then be made to reach a final prediction. The theory and application of unsupervised and supervised learning tasks involving quantum graph neural networks (QGNNs), quantum graph recurrent neural networks (QGRNNs), and quantum graph convolutional neural networks (QGCNNs) have already been developed [23,24]. These models have been extended to arbitrarily sized graphs with the implementation of ego-graph-based quantum graph neural networks (egoQGNNs) [25]. Quantum analogs of other advanced classical architectures, including generative adversarial networks (GANs), transformers, natural language processors (NLPs), and equivariant networks, have also been proposed [23,26–32].
With the rapid development of quantum deep learning, this paper intends to offer a fair and comprehensive comparison between classical GNNs and their quantum counterparts. To classify whether a particle jet has originated from a quark or a gluon, a binary classification task was carried out using four different architectures: a GNN, an SE(2) EGNN, a QGNN, and a permutation EQGNN. Each quantum model was fine-tuned to have a structure analogous to its classical form. In order to provide a fair comparison, all models used similar hyperparameters as well as a similar number of total trainable parameters. The final results across each architecture were recorded using identical training, validation, and testing sets. We found that the QGNN and EQGNN outperformed their classical analogs on the particular binary classification task described above. Although these results seem promising for the future of quantum computing, further development of quantum APIs is required to allow for more general implementations of quantum architectures.

Data
The jet tagging binary classification task is illustrated with the high-energy physics (HEP) dataset Pythia8 Quark and Gluon Jets for Energy Flow [33]. This dataset contains data from two million particle collision jets, split equally into one million jets that originated from a quark and one million jets that originated from a gluon. These jets resulted from LHC collisions with total center-of-mass energy √s = 14 TeV and were selected to have transverse momenta p_T^jet between 500 and 550 GeV and rapidities |y^jet| < 1.7. The jet kinematic distributions are shown in Figure 1. For each jet α, the classification label is provided as either a quark with y_α = 1 or a gluon with y_α = 0. Each particle i within the jet is listed with its transverse momentum p_{T,α}^{(i)}, rapidity y_α^{(i)}, azimuthal angle ϕ_α^{(i)}, and particle ID I_α^{(i)}.

Graphically Structured Data
A graph G is typically defined as a set of nodes V and edges E, i.e., G = {V, E}. Each node v^(i) ∈ V is connected to its neighboring nodes v^(j) via edges e^(ij) ∈ E. In our case, each jet α can be considered as a graph J_α = {V_α, E_α} composed of the jet's constituent particles as the nodes v_α^{(i)} ∈ V_α, with node features h_α^{(il)} and edge connections e_α^{(ij)} ∈ E_α between the nodes in J_α with edge features a_α^{(ij)}. In general, each node v_α^{(i)} has an associated feature vector h_α^{(il)}, and the entire graph has an associated edge attribute tensor a_α^{(ijr)} describing r different relationships between node v_α^{(i)} and its neighbors v_α^{(j)}. It should be noted that the number of nodes within a graph can vary. This is especially true for the case of particle jets, where the number of particles within each jet can vary greatly. Each jet J_α can be considered as a collection of m_α particles with l distinct features per particle. An illustration of graphically structured data and an example jet in the (ϕ, y) plane are shown in Figure 2.

Feature Engineering
We used the Particle package [34] to find the particle masses m_α^{(i)} from the respective particle IDs I_α^{(i)}. From the available kinematic information for each particle i, we constructed new physically meaningful kinematic variables [35], which were used as additional features in the analysis. In particular, we considered the transverse mass m_{T,α}^{(i)}, the energy E_α^{(i)}, and the momentum components p_{x,α}^{(i)}, p_{y,α}^{(i)}, and p_{z,α}^{(i)}, defined, respectively, as

m_{T,α}^{(i)} = √((m_α^{(i)})² + (p_{T,α}^{(i)})²),
E_α^{(i)} = m_{T,α}^{(i)} cosh(y_α^{(i)}),
p_{x,α}^{(i)} = p_{T,α}^{(i)} cos(ϕ_α^{(i)}),
p_{y,α}^{(i)} = p_{T,α}^{(i)} sin(ϕ_α^{(i)}),
p_{z,α}^{(i)} = m_{T,α}^{(i)} sinh(y_α^{(i)}).   (1)

The original kinematic information in the dataset was then combined with the additional kinematic variables (1) into a feature set h_α^{(il)}, l = 0, 1, 2, . . ., 7, as follows:

h_α^{(il)} = (p_{T,α}^{(i)}, y_α^{(i)}, ϕ_α^{(i)}, m_{T,α}^{(i)}, E_α^{(i)}, p_{x,α}^{(i)}, p_{y,α}^{(i)}, p_{z,α}^{(i)}).   (2)

These features were then max-scaled by their maximum value across all jets α and particles i, i.e., h_α^{(il)} → h_α^{(il)} / max_{α,i}(h_α^{(il)}). Edge connections are formed via the Euclidean distance ∆R = √(∆ϕ² + ∆y²) between one particle and its neighbor in (ϕ, y) space. Therefore, the edge attribute matrix for each jet can be expressed as

a_α^{(ij)} = ∆R_α^{(ij)} = √((ϕ_α^{(i)} − ϕ_α^{(j)})² + (y_α^{(i)} − y_α^{(j)})²).   (3)
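The kinematic reconstruction described above can be sketched as follows. The function names and the massless test value are illustrative assumptions, but the formulas follow Equation (1) and the ∆R definition; a useful sanity check is that the reconstructed four-momentum stays on-shell, E² − p² = m².

```python
import numpy as np

def kinematic_features(pt, y, phi, m):
    """Derive (m_T, E, p_x, p_y, p_z) from (p_T, y, phi, m) as in Eq. (1)."""
    m_t = np.sqrt(m**2 + pt**2)   # transverse mass
    e   = m_t * np.cosh(y)        # energy
    px  = pt * np.cos(phi)
    py  = pt * np.sin(phi)
    pz  = m_t * np.sinh(y)        # longitudinal momentum
    return m_t, e, px, py, pz

def delta_r(phi_i, y_i, phi_j, y_j):
    """Euclidean distance in (phi, y) space used as the edge attribute."""
    return np.hypot(phi_i - phi_j, y_i - y_j)

# e.g. a massless particle: m_T reduces to p_T and E to p_T * cosh(y)
m_t, e, px, py, pz = kinematic_features(pt=500.0, y=0.5, phi=1.0, m=0.0)
```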

Training, Validation, and Testing Sets
We considered jets with at least 10 particles. This left us with N = 1,997,445 jets, 997,805 of which were quark jets. While the classical GNN is more flexible in terms of its hidden features, the size of the quantum state and the Hamiltonian scale as 2^n, where n is the number of qubits. As we shall see, the number of qubits is given by the number of nodes n_α in the graph, i.e., the number of particles in the jet. Therefore, jets with large particle multiplicity require prohibitively complex quantum networks. Thus, we limited ourselves to the case of n_α = 3 particles per jet by only considering the three highest transverse momentum p_T particles within each jet. In other words, each graph contained the feature set h_α = (h_α^{(1l)}, h_α^{(2l)}, h_α^{(3l)}), where each h_α^{(il)} contains the eight features defined above. For training, we randomly picked 12,500 jets and used the first 10,000 for training, the next 1250 for validation, and the last 1250 for testing. These sets happened to contain 4982, 658, and 583 quark jets, respectively.
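The per-jet truncation to the three highest-p_T constituents can be sketched as below. The (m, l) array layout with p_T in the first column is a hypothetical choice for illustration; the actual dataset layout may differ.

```python
import numpy as np

def top_k_constituents(jet, k=3):
    """Keep the k highest-p_T particles of a jet.

    `jet` is assumed to be an array of shape (m, l) whose first
    column holds p_T (an illustrative layout, not the dataset's).
    """
    order = np.argsort(jet[:, 0])[::-1]   # sort rows by p_T, descending
    return jet[order[:k]]

# toy jet with four particles: (p_T, some other feature)
jet = np.array([[10.0, 0.1],
                [50.0, 0.2],
                [30.0, 0.3],
                [20.0, 0.4]])
top3 = top_k_constituents(jet)
# rows of top3 are ordered by descending p_T: 50, 30, 20
```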

Models
The four different models described below, including a GNN, an EGNN, a QGNN, and an EQGNN, were constructed to perform graph classification. The binary classification task was to determine whether a jet J_α originated from a quark or a gluon.

Invariance and Equivariance
By making a network invariant or equivariant to particular symmetries within a dataset, a more robust architecture can be developed. In order to introduce invariance and equivariance, one must assume or learn a certain symmetry group G of transformations on the dataset. A function φ : X → Y is equivariant with respect to a set of group transformations T_g : X → X, g ∈ G, acting on the input vector space X, if there exists a set of transformations S_g : Y → Y that similarly transform the output space Y, i.e.,

φ(T_g(x)) = S_g(φ(x)) for all x ∈ X.

A model is said to be invariant when, for all g ∈ G, S_g becomes the set containing only the trivial mapping, i.e., S_g = {I_G}, where I_G ∈ G is the identity element of the group G [12,36].
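A small numeric check of this definition, using the pairwise distance (the invariant embedding adopted below) as φ: under an SE(2) transformation of the inputs, S_g is the identity, so φ(gx) = φ(x). The helper names here are illustrative.

```python
import numpy as np

def phi(x_i, x_j):
    """Pairwise Euclidean distance: invariant under rotations and translations."""
    return np.linalg.norm(x_i - x_j)

def rotate(x, theta):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return R @ x

x_i, x_j = np.array([1.0, 2.0]), np.array([3.0, -1.0])
t = np.array([0.5, 0.5])
g = lambda x: rotate(x, 0.7) + t   # a group element g of SE(2)

# Invariance: S_g is trivial, so phi(g x_i, g x_j) == phi(x_i, x_j)
invariant = np.isclose(phi(g(x_i), g(x_j)), phi(x_i, x_j))
```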
Invariance performs better as an input embedding, whereas equivariance can be more easily incorporated into the model layers. For each model, the invariant component corresponds to the translation- and rotation-invariant embedding distance φ ≡ ∆R. Here, the function φ : R² × R² → R makes up the edge attribute matrix a_α^{(ij)}, as defined in Equation (3). This distance is used as opposed to solely incorporating the raw coordinates. Equivariance has been accomplished through the use of simple nontrivial functions along with higher-order methods involving the use of spherical harmonics to embed the equivariance within the network [37,38]. Equivariance takes different forms in each model presented here.

Graph Neural Network
Classical GNNs take in a collection of graphs {G_α}, each with nodes v_α^{(i)}. Here, we can define N(i) as the set of neighbors of node v_α^{(i)} and take r = 1, as we only consider one edge attribute. In other words, the edge attribute tensor a_α^{(ijr)} becomes a matrix a_α^{(ij)}. The edge attributes are typically formed from the features corresponding to each node and its neighbors.
In the layer structure of a GNN, multilayer perceptrons (MLPs) are used to update the node features and edge attributes before aggregating, or mean pooling, the updated node features for each graph to make a final prediction. To simplify notation, we omit the graph index α, lower the node index i, and introduce a model layer index l. The first MLP is the edge MLP ϕ_e, which, at each layer l, collects the features h_i^l, its neighbors' features h_j^l, and the edge attribute a_ij corresponding to the pair. Once the new edge matrix m_ij is formed, we sum along the neighbor dimension j to create a new node feature m_i. This extra feature is then added to the original node features h_i before being input into a second node-updating MLP ϕ_h to form new node features h_i^{l+1} [8,10,21]. Therefore, a GNN is defined as

m_ij = ϕ_e(h_i^l, h_j^l, a_ij),
m_i = Σ_{j∈N(i)} m_ij,
h_i^{l+1} = ϕ_h(h_i^l, m_i).   (4)

Here, h_i^l ∈ R^k is the k-dimensional embedding of node v_i at layer l, and m_ij is typically referred to as the message-passing function. Once the data are sent through the P graph layers of the GNN, the updated nodes h_i^P are aggregated via mean pooling for each graph to form a set of final features (1/n_α) Σ_{i=1}^{n_α} h_i^P. These final features are sent through a fully connected neural network (NN) to output the predictions. Typically, a fixed number of hidden features k = N_h is defined for both the edge and node MLPs. The described GNN architecture is pictorially shown in the left panel in Figure 3.
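A minimal sketch of one such graph layer in PyTorch, assuming a dense (fully connected) neighborhood and SiLU activations; this is an illustration of the message-passing scheme above, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class GNNLayer(nn.Module):
    """One message-passing layer:
    m_ij = phi_e(h_i, h_j, a_ij); m_i = sum_j m_ij; h_i' = phi_h(h_i, m_i)."""
    def __init__(self, k):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * k + 1, k), nn.SiLU())
        self.phi_h = nn.Sequential(nn.Linear(2 * k, k), nn.SiLU())

    def forward(self, h, a):
        n, k = h.shape
        h_i = h.unsqueeze(1).expand(n, n, k)   # features of node i, tiled
        h_j = h.unsqueeze(0).expand(n, n, k)   # features of neighbor j, tiled
        m_ij = self.phi_e(torch.cat([h_i, h_j, a.unsqueeze(-1)], dim=-1))
        m_i = m_ij.sum(dim=1)                  # aggregate over neighbors j
        return self.phi_h(torch.cat([h, m_i], dim=-1))

# toy graph: n = 3 nodes, k = 8 hidden features, dense edge-attribute matrix
h = torch.randn(3, 8)
a = torch.rand(3, 3)
layer = GNNLayer(k=8)
out = layer(h, a)          # updated node features, shape (3, 8)
pooled = out.mean(dim=0)   # mean pooling over nodes -> graph-level feature
```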

SE(2) Equivariant Graph Neural Network
For the classical EGNN, the approach used here was informed by the successful implementation of SE(3), i.e., rotational, translational, and permutational, equivariance on dynamical systems and the QM9 molecular dataset [21]. It should be noted that GNNs are naturally permutation-equivariant, and in particular permutation-invariant, when averaging over the final node feature outputs of the graph layers [39]. An SE(2) EGNN takes the same form as a GNN; however, the coordinates are equivariantly updated within each graph layer, i.e., x_i → x_i^l, where x_i = (ϕ_i, y_i) in our case. The new form of the network becomes

m_ij = ϕ_e(h_i^l, h_j^l, |x_i^l − x_j^l|, a_ij),
x_i^{l+1} = x_i^l + C Σ_{j∈N(i)} (x_i^l − x_j^l) ϕ_x(m_ij),
m_i = Σ_{j∈N(i)} m_ij,
h_i^{l+1} = ϕ_h(h_i^l, m_i).   (5)

Since the coordinates x_i^l of each node v_i are equivariantly evolving, we also introduce a second invariant embedding |x_i^l − x_j^l|, based on the equivariant coordinates, into the edge MLP ϕ_e. The coordinates x_i are updated by adding the summed difference between the coordinates of x_i and its neighbors x_j. This added term is suppressed by a factor of C, which we take to be C(n_α) = 1/ln(2n_α). The term is further multiplied by a coordinate MLP ϕ_x, which takes as input the message-passing function m_ij between node i and its neighbors j. For a proof of the rotational and translational equivariance of x_i^{l+1}, see Appendix A. The described EGNN architecture is pictorially shown in the right panel in Figure 3.
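The coordinate update and its SE(2) equivariance can be checked numerically. In the sketch below, the key assumption is that ϕ_x(m_ij) is stubbed with fixed weights w; in the real network these would be the outputs of the coordinate MLP, which are themselves invariant because they depend only on invariant inputs.

```python
import numpy as np

def coord_update(x, w):
    """x_i' = x_i + C * sum_j (x_i - x_j) * w_ij, with C = 1 / ln(2n)."""
    n = len(x)
    C = 1.0 / np.log(2 * n)
    diff = x[:, None, :] - x[None, :, :]              # (n, n, 2) pairwise x_i - x_j
    return x + C * (diff * w[:, :, None]).sum(axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=(3, 2))     # node coordinates (phi, y)
w = rng.random((3, 3))          # stand-in for the invariant phi_x(m_ij)

theta = 0.9
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
t = np.array([1.0, -2.0])

# Equivariance: updating rotated+translated coordinates equals
# rotating+translating the updated coordinates.
lhs = coord_update(x @ R.T + t, w)
rhs = coord_update(x, w) @ R.T + t
```

The translations cancel inside the pairwise differences, and the rotation factors out of the sum, which is exactly the argument made formally in Appendix A.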

Quantum Graph Neural Network
For a QGNN, the input data, as a collection of graphs {G_α}, are the same as described above. We fix the number of qubits n to be the number of nodes n_α in each graph. For the quantum algorithm, we first form a normalized qubit product state from an embedding MLP ϕ_{|ψ⁰⟩}, which takes in the features h_i of each node v_i and reduces each of them to a qubit state |ϕ_{|ψ⁰⟩}(h_i)⟩ ∈ C² [40]. The initial product state describing the system then becomes |ψ_α⁰⟩ = ⊗_{i=1}^n |ϕ_{|ψ⁰⟩}(h_i)⟩ ∈ C^{2^n}, which we normalize by √(⟨ψ_α⁰|ψ_α⁰⟩). A fully parameterized Hamiltonian can then be constructed based on the adjacency matrix a_ij, or trainable interaction weights W_ij, and node features h_i, or trainable feature weights M_i [24]. Here, for the coupling term of the Hamiltonian H_C, we utilize the edge matrix a_ij connected to two coupled Pauli-Z operators, σ_i^z and σ_j^z, which act on the Hilbert spaces of qubits i and j, respectively. Since we embed the quantum state |ψ⁰⟩ based on the node features h_i, we omit the self-interaction term, which utilizes the chosen features applied to the Pauli-Z operator σ_i^z acting on the Hilbert space of qubit i. We also introduce a transverse term H_T for each node in the form of a Pauli-X operator, σ_i^x, with a fixed or learnable constant coefficient Q_0, which we take to be Q_0 = 1. Note that the Hamiltonian H contains O(2^n × 2^n) entries due to the Kronecker products between Pauli operators. To best express the properties of each graph, we take the Hamiltonian of the form

H = H_C + H_T = Σ_{(i,j)∈E} a_ij σ_i^z σ_j^z + Q_0 Σ_{i∈V} σ_i^x,   (7)

where the 8 × 8 matrix representations of H_C and H_T are shown in Figure 4.
We generate the unitary form of the operator via complex exponentiation of the Hamiltonian with real learnable coefficients γ_lq ∈ R^{P×Q}, which can be thought of as infinitesimal parameters running over the Q = 2 Hamiltonian terms and the P layers of the network. Therefore, the QGNN can be defined as

|ψ^{l+1}⟩ = U_θ^l ∏_{q=1}^{Q} exp(−i γ_lq H_q) |ψ^l⟩,   (8)

where U_θ^l = (θ′ − iI)(θ′ + iI)^{−1} is a parameterized unitary Cayley transformation, in which we force θ′ = θ + θ† to be self-adjoint, i.e., θ′ = θ′†, with θ ∈ C^{2^n × 2^n} as the trainable parameters. The QGNN evolves the quantum state |ψ⁰⟩ by applying unitarily transformed ansatz Hamiltonians with Q terms to the state over P layers. The final product state |ψ^P⟩ ∈ C^{2^n} is measured for output, which is sent to a classical fully connected NN to make a prediction. The analogy between the quantum unitary interaction function ϕ_u and the classical edge MLP ϕ_e, as well as between the quantum unitary state update function ϕ_{|ψ⟩} and the classical node update MLP ϕ_h, should be clear. For a reduction in the coupling Hamiltonian H_C in Equation (7), see Appendix B. The described QGNN architecture is pictorially shown in the left panel in Figure 5.
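A NumPy sketch of the two ingredients described above: the coupling and transverse Hamiltonians built from Kronecker products of Pauli operators, one ansatz evolution step exp(−iγH), and the Cayley-transform unitary. The toy edge-attribute matrix and the value of γ are illustrative assumptions, not the trained values.

```python
import numpy as np

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])                       # Pauli-Z
X = np.array([[0.0, 1.0], [1.0, 0.0]])         # Pauli-X

def op_on(ops, n):
    """Kronecker product placing ops[i] on qubit i and the identity elsewhere."""
    out = np.array([[1.0]])
    for i in range(n):
        out = np.kron(out, ops.get(i, I2))
    return out

n = 3                                          # qubits = nodes in the graph
a = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 0.2],
              [0.5, 0.2, 0.0]])                # toy edge-attribute matrix

# Coupling and transverse Hamiltonians in the spirit of Eq. (7), with Q_0 = 1
H_C = sum(a[i, j] * op_on({i: Z, j: Z}, n)
          for i in range(n) for j in range(i + 1, n))
H_T = sum(op_on({i: X}, n) for i in range(n))
H = H_C + H_T                                  # Hermitian, 2**n x 2**n

# One evolution step U_evo = exp(-i * gamma * H) via the eigendecomposition of H
gamma = 0.1
w, V = np.linalg.eigh(H)
U_evo = V @ np.diag(np.exp(-1j * gamma * w)) @ V.conj().T

# Cayley-transform unitary from a trainable complex matrix theta
rng = np.random.default_rng(2)
theta = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
theta_p = theta + theta.conj().T               # force self-adjointness
U = (theta_p - 1j * np.eye(2**n)) @ np.linalg.inv(theta_p + 1j * np.eye(2**n))
```

Both U_evo and U are unitary by construction: the first because H is Hermitian, and the second because the Cayley transform maps self-adjoint matrices to unitaries.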

Permutation Equivariant Quantum Graph Neural Network
The EQGNN takes the same form as the QGNN; however, we aggregate the final elements of the product state via mean pooling, (1/2^n) Σ_{k=1}^{2^n} ψ_k^P, before sending this complex value to a fully connected NN [31,40,41]. See Appendix C for a proof of the permutation equivariance of the quantum product state over the sum of its elements. The resulting EQGNN architecture is shown in the right panel in Figure 5.
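The permutation invariance of this pooled readout (proven in Appendix C) can be verified directly: the mean of the 2^n amplitudes of a Kronecker product state does not change when the qubit states are reordered, because the sum of the elements factorizes into a product of per-qubit sums. A small NumPy check:

```python
import numpy as np

def product_state(qubits):
    """Kronecker product of single-qubit states -> full 2**n state vector."""
    state = qubits[0]
    for q in qubits[1:]:
        state = np.kron(state, q)
    return state

rng = np.random.default_rng(4)
qubits = [rng.normal(size=2) + 1j * rng.normal(size=2) for _ in range(3)]

psi = product_state(qubits)                                   # order (1, 2, 3)
psi_perm = product_state([qubits[2], qubits[0], qubits[1]])   # order (3, 1, 2)

# Mean pooling over the 2**n amplitudes is permutation-invariant
pooled, pooled_perm = psi.mean(), psi_perm.mean()
```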

Results and Analysis
For each model, a range of total parameter counts was tested; however, the overall comparison test was conducted using the largest optimal number of total parameters for each network. A feed-forward NN was used to reduce each network's graph-layered output to a binary one, followed by the softmax activation function to obtain the logits in the classical case and the norm of the complex values to obtain the logits in the quantum case. Each model trained over 20 epochs with the Adam optimizer with a learning rate of η = 10^−3 and a cross-entropy loss function. The classical networks were trained with a batch size of 64 and the quantum networks with a batch size of one due to the limited capabilities of broadcasting unitary operators in the quantum APIs. After epoch 15, the best model weights were saved when the validation AUC of the true positive rate (TPR) as a function of the false positive rate (FPR) was maximized. The results of the training for the largest optimal total number of parameters |Θ| ≈ 5100 are shown in Figure 6, with more details listed in Table 1. Recall that for each model, we varied the number of hidden features N_h in the P graph layers. Since we fixed the number of nodes at n_α = 3 per jet, the hidden feature number was fixed at N_h = 2³ = 8 in the quantum case, and, therefore, we also varied the parameters of the encoder ϕ_{|ψ⁰⟩} and decoder NN in the quantum algorithms.
Based on the AUC scores, the EGNN outperformed both the classical and quantum GNNs; however, it was in turn outperformed by the EQGNN, with a 7.29% increase in AUC, demonstrating the strength of the EQGNN. Although the GNN outperformed the QGNN in the final parameter test by 1.93%, the QGNN performed more consistently and outperformed the GNN in the mid-parameter range |Θ| ∈ (1500, 4000). Through the variation in the number of parameters, the AUC scores were recorded for each case, where the number of parameters taken for each point corresponded to |Θ| ≈ {500, 1200, 1600, 2800, 3500, 5100}, as shown in the right panel in Figure 7.

Conclusions
Through several computational experiments, the quantum GNNs exhibited enhanced classifier performance compared with their classical GNN counterparts, based on the best test AUC scores produced after training, while relying on a similar number of parameters, similar hyperparameters, and similar model structures. These results seem promising for a quantum advantage over classical models. Despite this result, the quantum algorithms took over 100 times longer to train than the classical networks. This was primarily because we ran our quantum simulations on classical computers and not on actual quantum hardware. In addition, we were hindered by the limited capabilities of the quantum APIs, where the inability to train with broadcastable unitary operators forced the quantum models to use a batch size of one, i.e., to train on a single graph at a time.
The action of the equivariance in the EGNN and EQGNN could be further explored and developed. This is especially true in the quantum case, where more general permutation and SU(2) equivariance have been explored [40–43]. Expanding the flexibility of the networks to an arbitrary number of nodes per graph should also offer increased robustness; however, this may continue to pose challenges in the quantum case due to the current limited flexibility of quantum software. A variety of different forms of the networks can also be explored. Potential ideas include introducing attention components and altering the structure of the quantum graph layers to achieve enhanced performance of both classical and quantum GNNs. In particular, one can further parameterize the quantum graph layer structure to better align with the total number of parameters used in the classical structures. These avenues will be explored in future work.

Software and Code
PyTorch and Pennylane were the primary APIs used in the formation and testing of these algorithms. The code corresponding to this study can be found at https://github.com/royforestano/2023_gsoc_ml4sci_qmlhep_gnn (accessed on 5 February 2024).
Appendix A. Rotational and Translational Equivariance of the Coordinate Update

Proof. Let a general transformation g ∈ T_g act on X by gX = RX + T, where R ∈ T_g denotes a general rotation and T ∈ T_g denotes a general translation. Then, under the transformation g on X, the coordinate update function φ defined above transforms as

φ(gx_i) = Rx_i + T + C Σ_{j∈N(i)} ((Rx_i + T) − (Rx_j + T)) ϕ_x(m_ij)
        = R (x_i + C Σ_{j∈N(i)} (x_i − x_j) ϕ_x(m_ij)) + T
        = Rφ(x_i) + T
        = gφ(x_i),

where φ(gx) = gφ(x) shows that φ transforms equivariantly under transformations g ∈ T_g acting on X.

Appendix B. Coupling Hamiltonian Simplification
The coupling Hamiltonian H_C in Equation (7) reduces term by term into Kronecker products of single-qubit operators. Each coupling σ_j^z σ_k^z acts nontrivially only on qubits j and k: multiplying on the left by the identity operators on qubits 1, . . ., j − 1 and on the right by the identity operators on qubits k + 1, . . ., n produces

σ_j^z σ_k^z = Î^{⊗(j−1)} ⊗ σ^z ⊗ Î^{⊗(k−j−1)} ⊗ σ^z ⊗ Î^{⊗(n−k)},

so that H_C is a sum of such 2^n × 2^n diagonal matrices weighted by the edge attributes a_jk.

Appendix C. Permutation Equivariance of the Summed Product State

For the base case of n = 2 qubit states |ψ_1⟩ = (v_1^1, v_2^1)^T and |ψ_2⟩ = (v_1^2, v_2^2)^T, the product state is

|ψ_1⟩ ⊗ |ψ_2⟩ = (v_1^1 v_1^2, v_1^1 v_2^2, v_2^1 v_1^2, v_2^1 v_2^2)^T,

where the sum of elements becomes (v_1^1 + v_2^1)(v_1^2 + v_2^2), which is equivalent to the sum of the elements for commutative spaces where v_i^j ∈ C and {1, 2}, {2, 1} should be regarded as ordered sets. This shows the sum of the state elements remains unchanged when the qubit states switch positions in the product. We now assume the statement is true for n = N final qubit states and proceed to show the N + 1 case is true. The quantum product state over N elements becomes ⊗_{i=1}^N |ψ_i⟩, which we assume to be permutation-equivariant over the sum of its elements. We can rewrite the form of this state as ⊗_{i=1}^N |ψ_i⟩ = (A_1, A_2, . . ., A_{2^N})^T, where each A_j defines one of the 2^N terms in the final product state. Replacing the (i + 1)-th entry of the Kronecker product above with a new (N + 1)-th state |ψ_{N+1}⟩ = (v_1^{N+1}, v_2^{N+1})^T, this new state, consisting of 2^{N+1} elements with the N + 1 state in the (i + 1)-th entry of the product, can be written in terms of the old state with groupings of the new elements in 2^{N+1−i} batches of 2^i elements, which, when summed, gives (v_1^{N+1} + v_2^{N+1}) Σ_{j=1}^{2^N} A_j. However, the (i + 1)-th entry is arbitrary, and, due to the summation permutation equivariance of the initial state ⊗_{i=1}^N |ψ_i⟩, the sum Σ_{j=1}^{2^N} A_j is equivariant, in fact invariant, under all reorderings of the elements |ψ_i⟩ in the product ⊗_{i=1}^N |ψ_i⟩. Therefore, we conclude that ⊗_{i=1}^{N+1} |ψ_i⟩ is permutation-equivariant with respect to the sum of its elements.

To show a simple illustration of why this is true, let us take two initial states and see what happens when we insert a new state between them, i.e., in the 2nd entry of the product. This should lead to 2^{2+1−1} = 2² = 4 groupings of 2¹ = 2 elements. To begin, we have

|ψ_1⟩ ⊗ |ψ_2⟩ = (v_1^1 v_1^2, v_1^1 v_2^2, v_2^1 v_1^2, v_2^1 v_2^2)^T,

and when we insert the new third state |ψ_3⟩ in the 2nd entry of the product above, the sum of the elements of |ψ_1⟩ ⊗ |ψ_3⟩ ⊗ |ψ_2⟩ becomes (v_1^1 + v_2^1)(v_1^3 + v_2^3)(v_1^2 + v_2^2), which is independent of where |ψ_3⟩ was inserted.

Figure 1. Distributions of the jet transverse momenta p_T, total momenta p, and energies E.

Figure 2. A visualization of graphically structured data (left) and a sample jet shown in the (ϕ, y) plane (right), with each particle color-coded by its transverse momentum p_{T,α}^{(i)}.

Figure 4. The 8 × 8 matrix representations of the coupling and transverse Hamiltonians defined in Equation (7).

Figure 7. Model ROC curves (left) and AUC scores as a function of parameters (right).

Table 1. Metric comparison between the classical and quantum graph models.