Z2 × Z2 Equivariant Quantum Neural Networks: Benchmarking against Classical Neural Networks

This paper presents a comprehensive comparative analysis of the performance of Equivariant Quantum Neural Networks (EQNNs) and Quantum Neural Networks (QNNs), juxtaposed against their classical counterparts: Equivariant Neural Networks (ENNs) and Deep Neural Networks (DNNs). We evaluate the performance of each network on three toy examples for a binary classification task, focusing on model complexity (measured by the number of parameters) and the size of the training dataset. Our results show that the Z2 × Z2 EQNN and the QNN provide superior performance for smaller parameter sets and modest training data samples.


Introduction
The rapidly evolving convergence of machine learning (ML) and high-energy physics (HEP) offers a range of opportunities and challenges for the HEP community. Beyond simply applying traditional ML methods to HEP issues, a fresh cohort of experts skilled in both areas is pioneering innovative and potentially groundbreaking approaches. ML methods based on symmetries play a crucial role in improving data analysis as well as expediting the discovery of new physics [1,2]. In particular, classical Equivariant Neural Networks (ENNs) exploit the underlying symmetry structure of the data, ensuring that the input and output transform consistently under the symmetry [3]. ENNs have been widely used in various applications, including deep convolutional neural networks for computer vision [4], AlphaFold for protein structure prediction [5], Lorentz equivariant neural networks for particle physics [6], and many other HEP applications [7][8][9][10][11].
Meanwhile, the rise of readily available noisy intermediate-scale quantum computers [12] has sparked considerable interest in using quantum algorithms to tackle high-energy physics problems. Modern quantum computers boast impressive quantum volume and are capable of executing highly complex computations, driving a collaborative effort within the community [13,14] to explore their applications in quantum physics, particularly in addressing theoretical challenges in particle physics. Recent research on quantum algorithms for particle physics at the Large Hadron Collider (LHC) covers a range of tasks, including the evaluation of Feynman loop integrals [15], simulation of parton showers [16] and structure [17], development of quantum algorithms for helicity amplitude assessments [18], and simulation of quantum field theories [19][20][21][22][23][24].
In this paper we benchmark the performance of EQNNs against various classical and/or non-equivariant alternatives on three two-dimensional toy datasets, which exhibit a Z2 × Z2 symmetry structure. Such patterns often appear in high-energy physics data, e.g., as kinematic boundaries in the high-dimensional phase space describing the final state [40,41]. By a clever choice of the kinematic variables for the analysis, these boundaries can be preserved in projections onto a lower-dimensional feature space [42][43][44][45][46]. For example, one can form various combinations of the possible invariant masses for the generic decay chain considered in Ref. [44]. In this study, we consider simplified two-dimensional datasets that mimic the data arising in such projections. This setup allows us to focus on the comparison between the different methods, avoiding unnecessary issues that arise when dealing with actual particle physics simulation data, such as sampling statistics, parton distribution functions, an unknown particle mass spectrum, unknown widths, detector effects, etc. We explore EQNNs and benchmark them against classical neural network models. We find that the variational quantum circuits learn the data better with a smaller number of parameters and a smaller training dataset than their classical counterparts.

Dataset Description
In all three examples, we consider two-dimensional data (x1, x2) on the square −1 ≤ xi ≤ 1. The data points belong to two classes: y = +1 (blue points) and y = −1 (red points).
(i) Symmetric case: In the first example (Figure 1), the labels are generated by the function in Equation (1), where H(x) is the Heaviside step function, and for definiteness we choose R = 1.1.
The function (1) respects a Z2 × Z2 symmetry, where the first Z2 is given by a reflection about the x1 = x2 diagonal, while the second Z2 corresponds to a reflection about the x1 = −x2 diagonal. This Z2 × Z2 example was studied in Ref. [37], and we shall refer to it as the symmetric case, since the y label is invariant.

(ii) Anti-symmetric case: The second example is illustrated in Figure 2. The labels are generated by the function in Equation (4). The first Z2 is still realized as in (2). However, this time, the labels are flipped under a reflection along the x1 = −x2 diagonal, which is why we shall refer to this case as anti-symmetric.

(iii) Fully anti-symmetric case: The last example is depicted in Figure 3. The labels are generated by the function in Equation (6), where H(x) is again the Heaviside step function, and for definiteness we choose R = 1. In this case, the labels are flipped under both the reflection along the x1 = −x2 diagonal and the reflection along the x1 = x2 diagonal, which is why we shall refer to this case as fully anti-symmetric. As we will see later, it is straightforward to incorporate both symmetric and anti-symmetric properties in variational quantum circuits, while it is not obvious how to handle the anti-symmetric cases in classical neural networks.
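The three transformation behaviors can be checked numerically. The label functions below are hypothetical stand-ins (they are not Equations (1), (4), and (6) of the text), chosen only to realize the same symmetry properties under the two diagonal reflections:

```python
import numpy as np

# Reflections generating the Z2 x Z2 group on the plane:
# s1: reflection about the x1 = x2 diagonal,  (x1, x2) -> (x2, x1)
# s2: reflection about the x1 = -x2 diagonal, (x1, x2) -> (-x2, -x1)
s1 = lambda x: (x[1], x[0])
s2 = lambda x: (-x[1], -x[0])

# Hypothetical stand-in label functions with the same transformation
# properties as the three datasets (NOT the paper's actual equations):
sym      = lambda x: np.sign(x[0] * x[1])        # invariant under s1 and s2
anti     = lambda x: np.sign(x[0] + x[1])        # invariant under s1, flips under s2
fullanti = lambda x: np.sign(x[0]**2 - x[1]**2)  # flips under both s1 and s2

rng = np.random.default_rng(0)
for _ in range(1000):
    x = tuple(rng.uniform(-1, 1, size=2))
    assert sym(s1(x)) == sym(x) and sym(s2(x)) == sym(x)
    assert anti(s1(x)) == anti(x) and anti(s2(x)) == -anti(x)
    assert fullanti(s1(x)) == -fullanti(x) and fullanti(s2(x)) == -fullanti(x)
print("all symmetry properties hold")
```

Any label function with these transformation rules would serve equally well for defining the symmetric, anti-symmetric, and fully anti-symmetric classification tasks.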

Network Architectures
To assess the importance of embedding the symmetry in the network, and to compare the classical and quantum versions of the networks, we study the performance of the following four architectures: (i) Deep Neural Network (DNN), (ii) Equivariant Neural Network (ENN), (iii) Quantum Neural Network (QNN), and (iv) Equivariant Quantum Neural Network (EQNN). In each case, we adjust the hyperparameters to ensure that the number of network parameters is roughly the same.

(i) Deep Neural Networks:
In our DNN, for the symmetric (anti-symmetric) case, we use one (two) hidden layer(s) with four neurons. For both types of classical networks, we use the softmax activation function, the Adam optimizer, and a learning rate of 0.1. We use the binary cross-entropy loss for both the DNN and the ENN.

(ii) Equivariant Neural Networks:
A given map f : x ∈ X → f(x) ∈ Y between an input space X and an output space Y is said to be equivariant under a group G if it satisfies the relation f(g_in · x) = g_out · f(x) (Equation (7)), where g_in (g_out) is a representation of a group element g ∈ G acting on the input (output) space. In the special case when g_out is the trivial representation, the map is called invariant under the group G, i.e., a symmetry transformation acting on the input data x does not change the output of the map. The goal of ENNs, or equivariant learning models in general, is to design a trainable map f which always satisfies Equation (7). In tasks where the symmetry is known, such equivariant models are believed to have an advantage in terms of the number of parameters and the training complexity. Several studies in high-energy physics have used classical equivariant neural networks [6,[47][48][49][50]]. Our ENN model utilizes four Z2 × Z2 symmetric copies of each data point, which are fed into the input layer, followed by one equivariant layer with three (two) neurons and one dense layer with four (four) neurons in the symmetric (anti-symmetric) case.

(iii) Quantum Neural Networks:
For the QNN, we utilize the one-qubit data-reuploading model [51], as shown in Figure 4, with depth four (eight) for the symmetric (anti-symmetric and fully anti-symmetric) case, using angle embedding and three parameters at each depth. This choice leads to a similar number of parameters as in the classical networks. We use the Adam optimizer and the loss defined for any choice of two orthogonal operators O1 and O2 (see Ref. [52] for more details).
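The four Z2 × Z2 symmetric copies fed into the ENN input layer can be generated as the group orbit of a data point. A minimal numpy sketch (the helper name `z2z2_orbit` is ours):

```python
import numpy as np

def z2z2_orbit(x1, x2):
    """Return the four Z2 x Z2 symmetric copies of a data point, as fed
    into the ENN input layer: the identity, the two diagonal reflections,
    and their composition (a rotation by 180 degrees)."""
    return np.array([
        [ x1,  x2],   # identity
        [ x2,  x1],   # reflection about x1 = x2
        [-x2, -x1],   # reflection about x1 = -x2
        [-x1, -x2],   # composition of the two reflections
    ])

orbit = z2z2_orbit(0.3, -0.7)
# The orbit is closed: reflecting every copy about x1 = x2 only permutes it.
reflected = orbit[:, ::-1]          # swap the coordinates of each copy
assert {tuple(p) for p in reflected} == {tuple(p) for p in orbit}
print(orbit)
```

Feeding the whole orbit to the network is one standard way to make the input representation symmetric before any trainable layer acts on it.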
We use this loss for all three datasets considered in this paper.

(iv) Equivariant Quantum Neural Networks:
In EQNN models, symmetry transformations acting on the embedding space of the input features are realized as finite-dimensional unitary transformations U_g, g ∈ G.
Consider the simplest case, where one trainable operator U(θ, x) acts on a state |ψ⟩: U(θ, x)|ψ⟩. If, for a symmetry transformation U_g, the condition U(θ, x) U_g = U_g U(θ, x) (Equation (10)) is satisfied, then the operator U is equivariant, i.e., the equivariant gate commutes with the symmetry. In general, the U_g operators on the two sides of Equation (10) do not necessarily have to be in the same representation, but they are often assumed to be, for simplicity. The output of a QNN is the measurement of the expectation value of the state with respect to some observable O. If the gates are equivariant and we apply some symmetry transformation U_g, then this is equivalent to measuring the observable U†_g O U_g. Hence, if O commutes with the symmetry U_g, the model as a whole is invariant under U_g, which is the case in our symmetric example. Otherwise, the model is equivariant, as in our anti-symmetric examples.
Our EQNN uses the two-qubit quantum circuit depicted in Figure 5 for depth 1. This circuit is repeated five (ten) times with different parameters for the symmetric (anti-symmetric and fully anti-symmetric) case. The two RZ gates embed x1 and x2, respectively. The RX gates share the same parameter (θ1), and the RZZ gate uses another parameter (θ2). The invariant model (for the symmetric case) uses the same observable O for both classes in the data. In the anti-symmetric case, we use two different observables, O1 and O2, that correspond to the two labels. They transform into one another under the reflection g_r, i.e., O2 = U†_{g_r} O1 U_{g_r}.
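The commutation condition can be checked explicitly for the gate set of Figure 5. Representing the reflection g_r along x1 = −x2 as U_{g_r} = SWAP · (X ⊗ X) is our assumption here, a natural choice for an RZ-angle embedding since X RZ(θ) X = RZ(−θ):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]])

def rz(t):  # single-qubit rotation about Z
    return np.diag([np.exp(-1j*t/2), np.exp(1j*t/2)])
def rx(t):  # single-qubit rotation about X
    c, s = np.cos(t/2), np.sin(t/2)
    return np.array([[c, -1j*s], [-1j*s, c]])
def rzz(t):  # two-qubit gate exp(-i t/2 Z(x)Z)
    return np.diag(np.exp(-1j*t/2*np.diag(np.kron(Z, Z))))

# Assumed unitary representation of the reflection g_r along x1 = -x2:
# (x1, x2) -> (-x2, -x1) is SWAP composed with bit flips X(x)X.
Ug = SWAP @ np.kron(X, X)

th, x1, x2 = 0.83, 0.3, -0.7
# The trainable layer commutes with the symmetry ...
for G in (np.kron(rx(th), rx(th)), rzz(th)):
    assert np.allclose(Ug @ G, G @ Ug)
# ... while conjugating the embedding maps the embedding of (x1, x2)
# to the embedding of the reflected point (-x2, -x1).
emb = lambda a, b: np.kron(rz(a), rz(b))
assert np.allclose(Ug @ emb(x1, x2) @ Ug.conj().T, emb(-x2, -x1))
print("gates equivariant under g_r")
```

The same check with SWAP alone verifies equivariance under the other reflection, about the x1 = x2 diagonal.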
In the symmetric case, we use the binary cross-entropy loss, assuming the true label y is either 0 or 1, with the observable O and the reflection U_{g_r} along x1 = −x2 chosen accordingly. In the anti-symmetric and fully anti-symmetric cases, we use the same loss as in the QNN. For the anti-symmetric case, O1 (O2) is the observable corresponding to y = 1 (y = 0). For the fully anti-symmetric case, we use another set of observables, chosen so that one transforms into the other under a reflection along either of the two diagonals. Since the model is anti-symmetric with respect to each of the diagonals, the result is invariant if both reflections are applied. It is difficult to build a classical equivariant neural network using these anti-symmetries, since classical equivariant models are built on the assumption that the target is invariant under certain transformations. In the theory of classical equivariant machine learning, models that transform non-trivially under the symmetry group are often discussed mathematically, but rarely implemented in code. For our classical model on the partially anti-symmetric data, we therefore only implemented the invariant part of the symmetry (Z2) and ignored the anti-symmetric portion of the data. While it may not be impossible to handle such asymmetric cases in classical neural networks, the implementation can be quite involved.
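A minimal sketch of the binary cross-entropy loss evaluated on a quantum expectation value; mapping ⟨O⟩ ∈ [−1, 1] to a probability via p = (1 + ⟨O⟩)/2 is a common convention and an assumption here, not the paper's stated prescription:

```python
import numpy as np

def bce_from_expectation(exp_O, y):
    """Binary cross-entropy from an expectation value <O> in [-1, 1],
    mapped to a probability p = (1 + <O>) / 2 (assumed convention)."""
    p = np.clip((1 + exp_O) / 2, 1e-7, 1 - 1e-7)  # clip for numerical safety
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# A confident correct prediction is penalized less than a confident wrong one.
assert bce_from_expectation(0.99, 1) < bce_from_expectation(-0.99, 1)
print(bce_from_expectation(0.0, 1))  # maximally uncertain prediction
```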
On the other hand, it is straightforward to build equivariant quantum models. For this purpose, we only need to exploit the transformation properties of the observables: if one observable transforms into the other under the transformation of interest (here, a reflection along a diagonal), then a measurement of the first observable on the transformed input is equivalent to a measurement of the second observable on the original input.
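This measurement equivalence, ⟨U_g ψ| O1 |U_g ψ⟩ = ⟨ψ| U†_g O1 U_g |ψ⟩ = ⟨ψ| O2 |ψ⟩, can be verified numerically; the state, the observable, and the concrete reflection below are placeholders, not the paper's specific choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# A random two-qubit state and a random Hermitian observable O1.
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
O1 = A + A.conj().T

# One concrete unitary reflection: SWAP composed with bit flips.
X = np.array([[0, 1], [1, 0]])
SWAP = np.array([[1,0,0,0],[0,0,1,0],[0,1,0,0],[0,0,0,1]])
Ug = SWAP @ np.kron(X, X)

# Define O2 as the image of O1 under the reflection.
O2 = Ug.conj().T @ O1 @ Ug

# Measuring O1 on the transformed state equals measuring O2 on the original:
lhs = np.vdot(Ug @ psi, O1 @ (Ug @ psi)).real
rhs = np.vdot(psi, O2 @ psi).real
assert abs(lhs - rhs) < 1e-10
print(lhs, rhs)
```

This is exactly the mechanism that lets the quantum model absorb an anti-symmetric label flip into a swap of the two output observables.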
We can also view equivariant quantum models with anti-symmetric transformations from the point of view of representation theory. The fully invariant (symmetric) case corresponds to a model that transforms under the trivial representation of the group, where none of the group transformations changes the output of the model. The asymmetric (anti-symmetric or fully anti-symmetric) cases considered here can be interpreted as models transforming under some other (one-dimensional) representation of the group, where some transformations flip the sign of the output, while others leave it unchanged.

Results
The left panels of Figure 6 show the receiver operating characteristic (ROC) curves for each network with N_train = 200 and N_test = 2000 samples for the symmetric (top), anti-symmetric (middle), and fully anti-symmetric (bottom) dataset. The results for the DNN, ENN, QNN, and EQNN are shown in (green, dotted), (yellow, dot-dashed), (red, dashed), and (blue, solid), respectively. As expected, the networks with an equivariant structure (EQNN and ENN) improve on the performance of the corresponding networks without the symmetry (QNN and DNN). We also observe that the quantum networks perform better than their classical analogs. In the legends, the numerical values following the network acronyms represent the number of parameters used for each network. For the symmetric example, the EQNN uses only 10 parameters; thus, for a fair comparison, we constructed the other networks with O(10) parameters as well. For the anti-symmetric example, we use 20 parameters for the EQNN.

To further quantify the performance of our quantum networks, in Figure 7 we show the AUC (area under the ROC curve) as a function of the number of parameters (left panels) with a fixed size of the training data (N_train = 200), and as a function of the number of training samples (right panels) with a fixed number of parameters (N_params = 20). The top, middle, and bottom panels show results for the symmetric, anti-symmetric, and fully anti-symmetric dataset, respectively. As the number of parameters increases, the performance of all networks improves. All AUC values become similar at N_params ≈ 20 (N_params ≈ 40) for the symmetric (anti-symmetric) case. As shown in the right panels, the performances of all networks become comparable once the size of the training data reaches ∼400, except in the fully anti-symmetric case. We observe that from the top panels to the bottom panels, the relative improvement from the QNN to the EQNN grows, indicating the importance of implementing the symmetry in the network. A similar relative improvement exists from the DNN to the QNN, emphasizing the importance of quantum algorithms. Note that the ENN curves are missing in the bottom panels of both Figures 6 and 7. This is due to the non-trivial implementation of the anti-symmetric property in classical ENNs. Finally, Table 1 shows the accuracy of the DNN for the fully anti-symmetric dataset. The different rows and columns represent different choices of the number of parameters and the number of training samples, respectively. These numbers can be compared against those in the bottom-right panel of Figure 7.
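The AUC values reported above can be computed directly from classifier scores, e.g., via the rank-based (Mann-Whitney) formulation; a minimal numpy sketch:

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney formulation: the
    probability that a random positive scores above a random negative,
    counting ties as one half."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
labels = np.array([1,   1,   0,   1,   0,   0  ])
print(auc(scores, labels))  # one misordered pair out of nine
```

For large test sets, a library routine such as scikit-learn's `roc_auc_score` would typically be used instead; the pairwise version above is O(N_pos × N_neg) but makes the definition explicit.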

Conclusions
In this paper, we examined the performance of Equivariant Quantum Neural Networks and Quantum Neural Networks, compared against their classical counterparts, Equivariant Neural Networks and Deep Neural Networks, on three toy examples for a binary classification task. Our study demonstrates that EQNNs and QNNs outperform their classical counterparts, particularly in scenarios with fewer parameters and smaller training datasets. This highlights the potential of quantum architectures in resource-constrained settings. This point was also emphasized in a recent, similar study, Ref. [35], which showed that an EQNN outperforms its non-equivariant counterpart in terms of generalization power, especially for small training set sizes. We note a more significant enhancement in the performance of the EQNN and QNN over the ENN and DNN in the anti-symmetric examples than in the symmetric one. This underscores the robustness of quantum algorithms. The code used for this study is publicly available at https://github.com/ZhongtianD/EQNN/tree/main (accessed on 9 March 2024).
While our current study has primarily focused on EQNNs with discrete symmetries, it is crucial to acknowledge the significant role that continuous symmetries, such as Lorentz symmetry or gauge symmetries, play in particle physics. In our future research, we aim to compare EQNNs with continuous symmetries against classical neural networks. Exploring more complex datasets with high-dimensional features is another direction we plan to pursue. However, handling such examples would necessitate an increase in the number of network parameters, prompting an investigation into related issues such as overparameterization, barren plateaus, and others.

Here, particles A, B, C, and D are hypothetical new-physics particles beyond the Standard Model, with masses {m_A, m_B, m_C, m_D}, while the corresponding Standard Model decay products consist of a jet j, a "near" lepton ℓ±_n, and a "far" lepton ℓ±_f. The two-dimensional (bivariate) distribution d²Γ/(dR_ij dR_kl) shows patterns similar to those in Figures 1-3, where R_ij ≡ m²_i/m²_j is the mass-squared ratio. Symmetric, anti-symmetric, or non-symmetric structures provide information about the particle masses involved in the cascade decays.

Figure 1 .
Figure 1. Pictorial illustration of the first dataset used in this study-the symmetric case (1).

Figure 2 .
Figure 2. Pictorial illustration of the second dataset used in this study-the anti-symmetric case (4).

Figure 3 .
Figure 3. Pictorial illustration of the third dataset used in this study-the fully anti-symmetric case (6).

Figure 4 .
Figure 4. Illustration of the quantum circuit used for QNN at depth 2. This circuit is repeated up to depth four (five) times with different parameters for the symmetric (anti-symmetric and fully anti-symmetric) case.The data points ⃗ x = (x 1 , x 2 , 0) are loaded via angle embedding with rotation gates, followed by another rotation R(⃗ x) = R Z (0)R Y (x 2 )R Z (x 1 ) with arbitrary angle parameters.

Figure 5 .
Figure 5. Illustration of the quantum circuit used for the EQNN at depth 1. This circuit is repeated five (ten) times with different parameters for the symmetric (anti-symmetric and fully anti-symmetric) case. The data points (x1, x2) are loaded via angle embedding with two RZ gates, RZ(x1) and RZ(x2). The remaining circuit is parameterized by RX(θ1) and RZZ(θ2).

Figure 6 .
Figure 6. ROC (left) and accuracy (right) curves for the symmetric (top), anti-symmetric (middle), and fully anti-symmetric (bottom) example. The right panels show the evolution of the accuracy during training and testing, using the same color scheme, but with solid curves representing the training accuracy and dashed curves the test accuracy. The accuracy converges faster (after only 5 epochs) for the QNN and EQNN than for their classical counterparts (10-20 epochs).

Figure 7 .
Figure 7. AUC as a function of the number of parameters (left) for fixed N_train = 200 and N_test = 2000, and as a function of N_train (right) with a fixed number of parameters, as shown in the legend, for the symmetric (top), anti-symmetric (middle), and fully anti-symmetric (bottom) example.

The EQNN achieves 0.95 accuracy with 20 parameters and 200 training samples, while the DNN requires more parameters and/or more training samples to reach the same accuracy.

Table 1 .
Accuracy of the DNN for the fully anti-symmetric dataset. The different rows (columns) represent different choices of the number of parameters (the number of training samples).