1. Introduction
The rapid advancements in both machine learning (ML) and quantum computing have led to the emergence of quantum machine learning (QML), a promising interdisciplinary field that leverages quantum mechanics to enhance computational capabilities [1]. Among the many applications of ML, image classification has seen remarkable progress in recent decades, becoming increasingly crucial across diverse sectors, including medical diagnosis and autonomous driving. Convolutional Neural Networks (CNNs) [2] have played a central role in this success, excelling at hierarchically extracting spatial features from visual data.
Numerous researchers have developed various concepts for QML. Early work explored quantum algorithms for support vector machines (SVMs), demonstrating potential speedups over their classical counterparts [3]. Initial theoretical explorations also investigated how fundamental neural-network components, such as artificial neurons (i.e., the Rosenblatt perceptron [4]), could be realised on quantum hardware [5]. More recently, Tacchino et al. [6] revisited the quantum perceptron using parameterised quantum circuits (PQCs) with hardware-efficient single-qubit gates that can be stacked into trainable networks suitable for Noisy Intermediate-Scale Quantum (NISQ) [7] devices, which are characterised by a limited number of qubits and susceptibility to noise. The crucial step of encoding classical data into quantum states has been rigorously analysed: Schuld and Killoran [8] and Havlíček et al. [9] concurrently and independently formalised these encodings as quantum feature maps that underpin kernel methods and variational classifiers. Cong et al. [10] introduced a Quantum Convolutional Neural Network (QCNN) that uses only $O(\log N)$ variational parameters for $N$ qubits and can recognise quantum phases.
Variational quantum algorithms emerged as a natural progression in this field. Farhi and Neven [11] proposed a quantum neural network suitable for near-term processors. Mitarai et al. [12] introduced quantum circuit learning, demonstrating how parameterised quantum circuits can approximate nonlinear functions. Addressing practical concerns, McClean et al. [13] highlighted barren-plateau trainability issues, and subsequent work demonstrated gradient-preserving circuit designs [14] and noise-resilient optimisation strategies [15].
Recent efforts in QML have focused on integrating quantum components into classical neural network architectures. Liu et al. [16] embedded PQCs into classical convolutional blocks, yielding the hybrid quantum–classical convolutional neural network (QCCNN) and noting its robustness to moderate hardware noise [15]. Henderson et al. [17] developed the Quantum Convolutional (Quanvolutional) Neural Network (QNN) by incorporating quantum layers with arbitrary quantum filters into CNN pipelines, while Mari’s [18] PennyLane demo distilled this idea to a single quantum layer whose outputs feed a shallow artificial neural network. Beyond classification, QML has also been explored for generative modelling [19], and the community has begun to formalise benchmarking practice, e.g., Bowles et al. [20] on the pitfalls and design principles of QML benchmarks.
Building upon the work of Henderson et al. [17] and Mari [18], Riaz et al. [21] introduced the concept of quantum pre-processing filters (QPFs), demonstrating their potential to improve image classification accuracy within a simple quantum kernel filter combined with a small neural network model. However, despite the demonstrated utility of QPFs, the underlying reasons for their variable performance across different datasets, sometimes leading to improved accuracy and other times to degradation, remain largely unexplored. Understanding these influencing factors is critical for the robust development and deployment of hybrid quantum–classical architectures.
To address this knowledge gap, we investigated the factors contributing to the performance of QPFs, with a specific focus on their impact on validation and test accuracy. Our primary hypothesis centred on the role of entanglement. To systematically examine this, we introduced the concept of spatial symmetry within the QPF circuit, which governs the qubit entanglement patterns introduced by combining rotation gates with CNOT gates. We took the exact QPF circuit designed by Riaz et al. [21] and systematically modified the control-target qubit configurations of the CNOT gates. Through experimentation with 24 combinations, we identified three distinct spatial symmetries: diagonal, vertical, and horizontal (see Section 2 for detailed circuit configurations). Furthermore, we explored the relationship between classification accuracy and the degree of entanglement generated by each QPF symmetry for various datasets through the lens of von Neumann entropy [22]. Our approach utilises a hybrid quantum–classical framework suitable for NISQ devices.
This study makes three contributions. First, we perform a controlled sweep of all 24 pixel-to-wire permutations of a fixed four-qubit QPF. Second, we group these permutations into diagonal, vertical, and horizontal symmetry classes based on the spatial pixel pairings induced by the CNOT gates. Third, we compare these symmetry-conditioned feature maps across multiple datasets and analyse whether their average von Neumann entropy correlates with downstream classification performance. This distinguishes our work from earlier quanvolutional and QPF studies by focusing on the behaviour and interpretability of QPF design choices, rather than proposing a new quantum architecture.
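The grouping of permutations into symmetry classes can be illustrated with a short sketch. It assumes a 2×2 pixel patch indexed row-major and CNOT gates acting on the wire pairs (0, 1) and (2, 3); the wire layout, the names `PIXEL_POS`, `CNOT_WIRES`, and `symmetry_class`, and the labelling rule are illustrative assumptions rather than the exact circuit of Riaz et al. [21].

```python
from collections import Counter
from itertools import permutations

# Hypothetical 2x2 patch, pixels indexed row-major:
#   0 1
#   2 3
PIXEL_POS = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

# ASSUMPTION: the QPF applies CNOTs on wire pairs (0, 1) and (2, 3),
# written as (control, target). The real circuit may differ.
CNOT_WIRES = [(0, 1), (2, 3)]


def symmetry_class(perm):
    """Label a pixel-to-wire permutation; perm[w] is the pixel on wire w."""
    control, target = CNOT_WIRES[0]
    (row_c, col_c) = PIXEL_POS[perm[control]]
    (row_t, col_t) = PIXEL_POS[perm[target]]
    if row_c == row_t:
        return "horizontal"  # paired pixels share a row
    if col_c == col_t:
        return "vertical"    # paired pixels share a column
    return "diagonal"        # paired pixels sit on a diagonal


# In a 2x2 patch the second CNOT pair is the complement of the first,
# so it always falls in the same class and need not be checked separately.
all_perms = list(permutations(range(4)))
counts = Counter(symmetry_class(p) for p in all_perms)
print(len(all_perms), dict(counts))
```

Under these assumptions the 24 permutations split evenly, eight per class, which is consistent with the class sizes reported in Section 3.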
This paper presents our findings on the impact of QPF spatial symmetries on image classification performance and their correlation with entanglement levels. We detail the experimental setup and methodology in Section 2, and present and discuss the classification results and entanglement analysis in Section 3. Finally, Section 4 concludes the paper with a summary of key insights.
3. Results and Discussion
We tested the 24 QPF permutations on six datasets: eMNIST Digits [25], Fashion MNIST [26], MNIST [27], PneumoniaMNIST, OCTMNIST, and BreastMNIST [28].
We present boxplots of validation accuracy for two datasets—eMNIST Digits and Fashion MNIST—across hybrid models with QPFs of three symmetry types (diagonal, vertical, and horizontal). These plots exemplify both scenarios: where the QPF enhances validation performance, and where it leads to a decline when compared with the baseline. For each symmetry, we evaluated these models with four different classical architectures, varying the number of hidden layers from 0 to 3.
As described in Section 2.1, each of the 24 pixel-to-wire permutations was treated as a separate QPF configuration. For each dataset and network depth, each configuration was trained independently using a newly initialised neural network. The permutations were then assigned to one of the three spatial symmetry classes (diagonal, vertical, or horizontal), with eight configurations per class. For each dataset and network depth, we collected the validation accuracies from the final 15 training epochs of each independently trained configuration. These values were pooled only for visualising symmetry-level performance. Thus, each QPF boxplot contains $8 \times 15 = 120$ validation accuracy values, corresponding to eight independently trained permutations and 15 late-stage epochs per permutation. The plain neural-network baseline contains 15 validation accuracy values from its final 15 epochs since it has no permutation variants.
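The pooling step can be sketched as follows. The accuracy histories here are random placeholders, not results from this study, and the total epoch count (`N_EPOCHS`) and the helper names are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PERMS = 8      # permutations per symmetry class (from the text)
WINDOW = 15      # final epochs kept per run (from the text)
N_EPOCHS = 50    # ASSUMPTION: total epochs per run, not stated in this section

# Placeholder histories with shape (permutations, epochs); NOT real results.
qpf_histories = rng.uniform(0.90, 0.95, size=(N_PERMS, N_EPOCHS))
baseline_history = rng.uniform(0.90, 0.95, size=(1, N_EPOCHS))


def pool_last_epochs(histories, window=WINDOW):
    """Keep the last `window` epochs of every run and flatten them."""
    return np.asarray(histories)[:, -window:].ravel()


def box_summary(values):
    """Median, interquartile range, and extremes, as drawn in a boxplot."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {
        "n": int(values.size),
        "median": float(median),
        "iqr": float(q3 - q1),
        "min": float(values.min()),
        "max": float(values.max()),
    }


qpf_pooled = pool_last_epochs(qpf_histories)          # 8 * 15 values
baseline_pooled = pool_last_epochs(baseline_history)  # 1 * 15 values
print(box_summary(qpf_pooled)["n"], box_summary(baseline_pooled)["n"])
```

Because the pooled values mix epochs from the same run, they are not independent samples; the summary is descriptive, in line with how the boxplots are used here.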
Figure 4 shows our results for the eMNIST Digits dataset. With no hidden layers, every symmetry-enhanced model surpasses the plain baseline; the diagonal filter yields the highest validation accuracy, followed by the horizontal and vertical symmetries. Adding a single hidden layer improves all models, and the baseline surpasses the horizontal and vertical QPF symmetries, although it still lags behind the diagonal symmetry. A second hidden layer raises all accuracies again, and the diagonal symmetry remains ahead; the horizontal and vertical accuracies converge, and the baseline edges past both but cannot match the diagonal. Introducing a third hidden layer yields only marginal gains; the diagonal symmetry remains dominant, and the baseline plateaus above the horizontal and vertical symmetries.
We occasionally observe step-like increases in validation accuracy. Many occur around epochs where the learning rate is reduced by the ReduceLROnPlateau callback, but similar behaviour can also arise from ordinary training variability (e.g., mini-batch noise and checkpointing). We therefore summarise performance using medians, interquartile ranges, and maxima/minima, and treat outliers as descriptive only. Outliers in the boxplots reflect epoch-level variability within the last-15-epoch window and should not be over-interpreted.
The pattern for Fashion MNIST in Figure 5 is noticeably different. With zero hidden layers, using the best validation accuracy as the ranking criterion (the same criterion used for checkpoint selection), the horizontal symmetry performs best among all models, and the diagonal symmetry offers a slight improvement over the baseline, while the vertical symmetry underperforms both the other QPF variants and the plain network.
Once a hidden layer is introduced, overall accuracy climbs, but the plain network now overtakes every QPF symmetry. Among the QPF symmetries, the vertical symmetry now performs best, a reversal from the zero-hidden-layer scenario. This reversal may indicate that, without hidden layers, the NN cannot learn the differences between the QPF symmetries for this specific dataset. A second hidden layer keeps the vertical filter narrowly ahead of the other symmetries, yet the baseline, despite a small dip, remains the overall leader. Depth beyond two layers yields no substantive gains: performance stabilises, the vertical symmetry continues to top the QPF symmetries, and the baseline model finishes with the highest accuracy.
These two behaviours show that the learning process changes depending on the type of symmetry used to extract features. This suggests that the features are not the same and supports the idea that they offer new information that can be utilised to train neural networks.
Figure 6 shows that across datasets, accuracy rises with depth for both curves, but QPFs are most beneficial in shallow regimes: at 0 hidden layers the QPF variants noticeably lift MNIST and eMNIST Digits, and remain competitive on Fashion MNIST, indicating that the filters act as front-end feature extractors that compensate for limited capacity. As depth increases (from one or two hidden layers onward), the curves flatten and dataset-specific preferences persist (diagonal for handwritten digits; vertical as the strongest QPF on Fashion MNIST), while the plain baseline can match or surpass QPFs, most clearly on Fashion MNIST. Taken together, the figure shows that symmetry choice is dataset dependent and its benefit interacts with model capacity: QPFs help most when the classifier is shallow, a property that could be exploited in larger architectures by pairing lightweight QPF blocks with deeper downstream modules.
Table 2 distils the box-plot and trend-line results into a single snapshot of peak validation accuracy for every dataset–depth pair. Three patterns emerge. First, diagonal symmetry dominates the handwriting datasets: it edges out both the baseline and the other two symmetries on MNIST and eMNIST Digits, regardless of depth. Second, Fashion MNIST breaks that rule—vertical symmetry gives the highest QPF accuracy once at least one hidden layer is present, though the plain network still finishes on top. Third, the horizontal kernel is consistently best for the three medical-imaging benchmarks (PneumoniaMNIST, OCTMNIST, and BreastMNIST), and its relative gain grows with network depth. These results reinforce the message that each dataset has a “preferred” spatial symmetry during training.
Overall, we observed that most of the datasets used in this study benefit from improved accuracy when leveraging one of the symmetries of the QPF, with the notable exception of Fashion MNIST. We explore a possible explanation for this anomaly in the following sections.
Table 3 shows the evaluation results of the best model for each dataset/symmetry on the test-set (depth
). On handwritten digits (MNIST and eMNIST Digits), QPF variants are comparable with the baseline: differences are well within the reported uncertainty, indicating no measurable gain. On Fashion MNIST, the baseline is clearly higher than all QPF variants, exceeding the uncertainty bands and suggesting that QPFs do not help at this depth for this dataset. For the medical sets, QPFs tend to improve over baseline: PneumoniaMNIST shows an uplift with diagonal (similar for vertical/horizontal), OCTMNIST peaks with horizontal, and BreastMNIST favours diagonal. Because the medical datasets exhibit wider intervals, these gains are suggestive rather than definitive, but they align with the pattern that QPF symmetries can be beneficial on medical imagery while offering little to no improvement on digit recognition and may even underperform on Fashion MNIST. Overall, the table reinforces the dataset-dependent value of QPFs and the absence of a universally best symmetry. For the digit datasets and Fashion MNIST, validation and test accuracies are closely aligned, indicating that validation performance is representative of test-time behaviour. In contrast, the MedMNIST sets show pronounced drops from validation to test. These gaps suggest reduced generalisation on the medical distributions, possibly due to overfitting or a smaller dataset.
3.1. Datasets’ Entanglement Level
To look for correlations between the performance of the NN models that use the QPFs and the level of entanglement generated by these QPFs, we recorded the entropy of the QPF-encoded states for all images in each dataset, then averaged over the entire dataset. We repeated this for every dataset presented in this study. The resulting value for the best-performing configuration for each symmetry is shown in Table 4.
Within-dataset analysis shows no clear link between average entanglement and validation accuracy. The three symmetries often differ by only a small amount in average entropy, yet they can lead to different validation accuracies. This indicates that average von Neumann entropy alone is not sufficient to explain QPF performance. Physically, this is expected because the downstream neural network does not receive the complete quantum state generated by the circuit. It receives only the classical measurement outputs, namely the single-qubit expectation values used as features. Therefore, even if two QPF configurations generate similar levels of entanglement, they may still produce different measured feature maps. The usefulness of those features depends on which pixels are paired and whether the resulting single-pixel and pairwise terms align with the spatial structure of the dataset.
This distinction helps explain why higher entropy does not guarantee improved classification. Entanglement entropy quantifies the nonseparability of the quantum state, but it does not directly measure class separability in the final measured feature space. A symmetry may generate slightly higher average entropy while producing measured features that are less useful for the downstream classifier. Conversely, a lower-entropy configuration may preserve more discriminative local intensity information. Therefore, QPF performance depends not only on the amount of entanglement generated but also on how the induced pixel correlations interact with the dataset structure and the architecture’s classical components.
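The gap between entropy and measured features can be seen on a single CNOT pair. The sketch below assumes each pixel is encoded by an RY rotation on |0⟩ followed by one CNOT; this encoding, the angle values, and the helper names (`pair_state`, `control_entropy`, `z_expectations`) are illustrative assumptions, not the exact QPF of this study.

```python
import numpy as np


def ry(theta):
    """Single-qubit RY rotation matrix."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])


# CNOT with the first qubit as control (basis order |00>, |01>, |10>, |11>).
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)


def pair_state(theta_control, theta_target):
    """ASSUMED encoding: RY on each qubit from |0>, then one CNOT."""
    zero = np.array([1.0, 0.0])
    psi = np.kron(ry(theta_control) @ zero, ry(theta_target) @ zero)
    return CNOT @ psi


def control_entropy(psi):
    """Von Neumann entropy (in bits) of the control qubit's reduced state."""
    m = psi.reshape(2, 2)          # rows: control, columns: target
    rho = m @ m.conj().T           # partial trace over the target
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]   # drop zeros to avoid log(0)
    return float(-np.sum(evals * np.log2(evals)))


def z_expectations(psi):
    """Pauli-Z expectation values of (control, target)."""
    probs = np.abs(psi.reshape(2, 2)) ** 2
    z_control = probs[0].sum() - probs[1].sum()
    z_target = probs[:, 0].sum() - probs[:, 1].sum()
    return float(z_control), float(z_target)


# Two inputs whose control angles are mirrored about pi/2.
psi_a = pair_state(np.pi / 3, np.pi / 4)
psi_b = pair_state(2 * np.pi / 3, np.pi / 4)
print(control_entropy(psi_a), z_expectations(psi_a))
print(control_entropy(psi_b), z_expectations(psi_b))
```

In this toy setting the two states have the same entanglement entropy, yet their measured expectation values differ in sign. A classifier fed only these readouts would therefore see different features at identical entropy.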
Future research should investigate whether an optimal “sweet spot” of entanglement exists and how it interacts with factors such as dataset complexity and network capacity. While developing a state-of-the-art quantum neural network model is beyond the scope of this work, gaining a deeper understanding of these quantum filters could pave the way for hybrid models that integrate quantum and classical filtering techniques.
For each dataset and symmetry, we computed the mean single-qubit von Neumann entropy on the corresponding split and plotted it on the horizontal axis against the accuracy delta on the vertical axis, i.e., the signed gap between a QPF model and the plain baseline (positive bars: QPF > baseline; negative bars: QPF < baseline), as shown in Figure 8. The left panel reports validation results; the right panel reports test results. This analysis is not about absolute performance but about the effect of feature extraction: how much a symmetry-conditioned QPF helps or hurts relative to no filtering, and whether that effect covaries with the amount of entanglement injected by the filter.
Across datasets, we do not observe a clear global correlation between entropy and the accuracy delta: both high and low entanglement can coincide with small positive or negative accuracy gaps. This supports the interpretation that entanglement is not a direct performance predictor in this readout setting. The measured QPF features contain a mixture of single-pixel responses and pairwise interaction terms, and the relevance of these terms varies by dataset. A weak positive pattern appears only within the MedMNIST group (PneumoniaMNIST, OCTMNIST, BreastMNIST), where higher entropies tend to align with larger positive test deltas. This suggests that, for some medical images, local pairwise correlations may provide useful additional structure for generalisation. However, this trend is not universal. Fashion MNIST remains an important counterexample: despite moderate entropy values, all three symmetries produce negative deltas, indicating that the induced pairwise feature map is less useful than the unfiltered representation for this dataset. Overall, entanglement should be understood as one component of the QPF transformation, not as the sole mechanism determining classification accuracy.
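A minimal sketch of this delta-versus-entropy comparison is given below. The numbers are hypothetical placeholders chosen only to exercise the code, not values from Table 4 or Figure 8. The function names are assumptions, and the Pearson coefficient is one possible way to quantify covariation rather than the analysis used in this study.

```python
import numpy as np


def accuracy_delta(qpf_acc, baseline_acc):
    """Signed gap: positive when the QPF model beats the plain baseline."""
    return np.asarray(qpf_acc, dtype=float) - float(baseline_acc)


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


# HYPOTHETICAL placeholders for one dataset (diagonal, vertical, horizontal).
mean_entropy = np.array([0.30, 0.35, 0.40])
qpf_accuracy = np.array([0.91, 0.89, 0.92])
baseline_accuracy = 0.90

delta = accuracy_delta(qpf_accuracy, baseline_accuracy)
r = pearson_r(mean_entropy, delta)
print(delta, r)
```

With only three symmetries per dataset, any such coefficient is a rough descriptive indicator and cannot support a significance claim.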
We also probed whether entanglement could reduce readout dimensionality by measuring only the target qubits at the circuit output, under the premise that correlations would encode control-qubit information into those targets and make the control readouts redundant. The results show that this is not the case. This can be understood from the structure of the measured features: for each CNOT pair, the control-qubit readout preserves a single-pixel response, while the target-qubit readout contains a pairwise interaction term. Measuring only the targets therefore removes part of the local intensity information that remains useful for classification. Empirically, target-only readout showed no advantage over full readout; accuracies were slightly below or above but very close to the baseline across depths. Thus, in our setting, the target measurements do not reliably preserve all discriminative information carried by the control measurements. The model performs better when all qubits are measured, suggesting that useful QPF representations require both single-pixel and pairwise features. As future work, we will investigate entanglement-aware dimensionality reduction, such as joint measurements, learned projective readouts, or redesigned QPF kernels, to concentrate correlated information into fewer measured features without discarding discriminative signal.
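The split between single-pixel and pairwise terms can be made explicit with a short worked derivation. It assumes that each pixel is encoded by an $R_y$ rotation on $|0\rangle$ and that a single CNOT couples control qubit $c$ and target qubit $t$; the symbols $\theta_c$ and $\theta_t$ for the encoded pixel angles are introduced here for illustration only.

```latex
% Assumed encoding: R_y(\theta)|0\rangle = \cos(\theta/2)|0\rangle + \sin(\theta/2)|1\rangle
\begin{align}
|\psi\rangle
  &= \mathrm{CNOT}_{c \to t}\,
     \bigl(R_y(\theta_c)|0\rangle\bigr) \otimes \bigl(R_y(\theta_t)|0\rangle\bigr) \\
  &= \cos\tfrac{\theta_c}{2}\,|0\rangle
     \bigl(\cos\tfrac{\theta_t}{2}|0\rangle + \sin\tfrac{\theta_t}{2}|1\rangle\bigr)
   + \sin\tfrac{\theta_c}{2}\,|1\rangle
     \bigl(\sin\tfrac{\theta_t}{2}|0\rangle + \cos\tfrac{\theta_t}{2}|1\rangle\bigr), \\[4pt]
\langle Z_c \rangle
  &= \cos^2\tfrac{\theta_c}{2} - \sin^2\tfrac{\theta_c}{2}
   = \cos\theta_c, \\[4pt]
\langle Z_t \rangle
  &= \bigl(\cos^2\tfrac{\theta_c}{2} - \sin^2\tfrac{\theta_c}{2}\bigr)
     \bigl(\cos^2\tfrac{\theta_t}{2} - \sin^2\tfrac{\theta_t}{2}\bigr)
   = \cos\theta_c \cos\theta_t .
\end{align}
```

Under this assumed encoding, the control readout depends on one pixel only, while the target readout is a product of both. Recovering $\cos\theta_t$ from $\langle Z_t \rangle$ alone requires dividing by $\cos\theta_c$, which is not available when the control is left unmeasured. This is consistent with the observation that target-only readout discards part of the local intensity information.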
3.2. Synthesis and Implications
This study investigated the impact of QPFs, with designs based on spatial symmetries, on hybrid quantum–classical neural networks across multiple image datasets. Increasing model depth consistently improved validation accuracy across all datasets. However, the efficacy of QPF symmetries proved to be highly dataset-dependent. Diagonal symmetry QPFs consistently performed well on simpler datasets like MNIST and eMNIST Digits, suggesting their effectiveness in capturing key features. For medical imaging datasets (PneumoniaMNIST, OCTMNIST, and BreastMNIST), horizontal symmetry consistently yielded the best performance. In contrast, for the more complex Fashion MNIST dataset, the classical baseline neural network generally outperformed QPF-enhanced models, though the vertical symmetry QPF (with at least one hidden layer) demonstrated valuable representational power, isolating specific visual features that other symmetries failed to capture.
Our findings also clarify the role of entanglement in this setting. Entanglement is physically present in the QPF and changes the local feature map, but its average magnitude does not by itself determine classification performance. The downstream classifier receives classical measurement outputs, not the full quantum state. Therefore, the relevant question is whether the measurement-induced features improve class separability for a given dataset. This explains why similar entropy values can lead to different accuracies across symmetries, and why higher entropy does not guarantee improvement over the baseline. The results suggest that QPF design should focus not only on generating entanglement but on designing entangling patterns and readout strategies that preserve task-relevant information while introducing useful nonlinear pixel correlations.
3.3. Limitations and Future Work
Our conclusions should be viewed in light of several limitations. First, the experiments were conducted on simulated noise-free circuits; gate errors on NISQ hardware may reduce performance. Second, the entanglement metric was restricted to pairwise von Neumann entropy, leaving multi-qubit correlations unexplored. Addressing these issues—by benchmarking on real devices and by extending the information-theoretic analysis—will be the focus of future work. A promising direction is to develop a selection framework that automatically matches dataset characteristics to the most effective QPF symmetry (or kernel).