1. Introduction
COVID-19, characterized as a pandemic by the World Health Organization in March 2020, exposed critical vulnerabilities in large-scale diagnostic infrastructures [1]. Although reverse transcription polymerase chain reaction (RT-PCR) remains the clinical gold standard, its diagnostic performance is constrained by suboptimal early sensitivity, processing delays, and reliance on specialized laboratory facilities [2]. These limitations motivated the expanded use of medical imaging, particularly chest computed tomography (CT) and chest X-ray (CXR), as complementary modalities for screening, triage, and disease monitoring. CT demonstrates high sensitivity for hallmark manifestations, such as ground-glass opacities and multifocal consolidations [3,4], while CXR offers greater accessibility and cost-effectiveness for large-scale deployment [3]. Artificial intelligence (AI), especially deep learning based on convolutional neural networks (CNNs), has been extensively explored to automate COVID-19 image-based diagnosis. Transfer learning with pretrained backbones such as ResNet, DenseNet, VGG, and EfficientNet has achieved strong performance across CT and CXR datasets [4,5,6,7,8].
Nevertheless, persistent methodological limitations and barriers to clinical translation remain. Dataset scarcity and class imbalance increase the risk of overfitting and inflated performance estimates [2,8]. Generalization across imaging centers, scanners, and acquisition protocols is often weak [5,9]. Large-scale or three-dimensional CT models introduce substantial computational and memory overhead [10], while the limited interpretability of deep neural networks continues to hinder clinical trust and regulatory acceptance [11]. These challenges highlight the need for alternative or complementary computational paradigms that enhance representation efficiency and robustness without excessive architectural scaling.
Quantum machine learning (QML) has recently emerged as a complementary paradigm that may enrich feature representations through high-dimensional Hilbert-space embeddings [12]. Parameterized quantum circuits can, in principle, model complex nonlinear correlations within comparatively compact parameter spaces. However, current quantum devices operate in the noisy intermediate-scale quantum (NISQ) regime, characterized by limited qubit counts, shallow circuit depths, restricted connectivity, and sensitivity to noise and decoherence [13]. These constraints render end-to-end quantum processing of high-resolution medical images impractical. To reconcile theoretical expressivity with hardware limitations, hybrid quantum–classical (HQC) frameworks have been proposed. These approaches embed compact quantum modules, such as variational quantum circuits, quantum convolutional operators, or quantum classifier heads, within classical deep learning pipelines [14,15,16]. Early investigations report encouraging parameter efficiency and binary classification performance under simulation settings. However, multi-class diagnostic robustness, real-hardware validation, and multi-site evaluation remain limited, and reported improvements are often sensitive to encoding strategies, circuit design, and optimization settings.
Several survey articles have broadly reviewed quantum machine learning or summarized artificial intelligence techniques for COVID-19 diagnosis. As summarized in Table 1, existing reviews predominantly focus on algorithmic categories, cross-domain applications, or high-level trends. While some mention hybrid quantum–classical approaches or COVID-19 imaging, none provide a dedicated architecture-centric analysis of how quantum computation is structurally integrated into medical imaging pipelines for COVID-19 classification.
Despite the growing number of HQC applications in COVID-19 imaging, the literature lacks a unified, architecture-centric framework for systematically organizing and comparing these approaches. Most contributions are presented as isolated algorithmic proposals, emphasizing performance metrics rather than structural integration within the end-to-end learning pipeline. Consequently, a fundamental design question remains: what functional role does the quantum component play in hybrid systems, and how does its architectural placement influence representational capacity, optimization behavior, computational cost, and scalability under NISQ constraints?
To guide the reader, we introduce an architecture-centric taxonomy of hybrid quantum–classical models, as illustrated in Figure 1. Rather than grouping studies by model name or reported performance, the reviewed works are classified according to the dominant functional role of the quantum module within the diagnostic pipeline. Specifically, the taxonomy is based on the quantum module’s structural position, the type of input it receives, the type of output it produces, and the component responsible for the final diagnostic decision. Based on these criteria, the reviewed studies are organized into three main archetypes. Archetype A comprises quanvolution-based or patch-level quantum preprocessing models, in which quantum circuits operate at the early feature-extraction stage and act as localized, nonlinear feature transformers over small image regions. Archetype B includes classical feature-extractor–quantum-classifier-head architectures, in which a classical backbone first learns compact image embeddings and a quantum module performs decision-stage classification or decision refinement. Archetype C corresponds to quantum feature-extractor architectures with classical classifiers, in which the quantum circuit transforms compact classical representations into quantum-derived discriminative features that are subsequently processed by a lightweight classical classifier.
Figure 1 provides a process-oriented visual summary of the proposed taxonomy and highlights the main structural differences among the three HQC archetypes. The figure presents the archetypes as three vertically stacked pipelines, so that the placement and functional role of the quantum module can be compared directly across architectures. Archetype A represents early-stage quanvolutional processing, where quantum circuits operate after classical preprocessing and before classical feature aggregation and classification. Archetype B represents a late-stage quantum decision architecture, where classical preprocessing and feature-map generation are followed by a quantum classifier head. Archetype C represents an intermediate quantum feature-extraction architecture, where compact classical representations are transformed by a quantum feature extractor before final classification by a classical classifier. This architectural perspective supports a more systematic comparison of quantum-module placement, classical–quantum coupling, encoding strategies, computational implications, reproducibility requirements, and diagnostic behavior. It also clarifies why some HQC designs achieve promising binary screening results while still facing challenges in multiclass generalization, scalability, and clinical robustness.
The remainder of this article is organized as follows: Section 2 reviews prior studies and outlines the literature search strategy; Section 3 introduces the theoretical foundations of hybrid quantum–classical computation; Section 4 presents a detailed technical analysis of the three architectural archetypes; Section 5 provides a comparative synthesis supported by consolidated tables; Section 6 discusses key methodological and translational challenges; Section 7 outlines future research directions; and Section 8 concludes with the main insights and implications for the development of clinically credible hybrid quantum–classical diagnostic systems.
2. Review of Prior Studies and Literature Search
This section provides a structured overview of prior work related to hybrid quantum–classical (HQC) models for COVID-19 medical imaging. It first describes the literature search and study-selection process using a PRISMA-inspired workflow, followed by a review of the main research streams relevant to this survey, including classical deep-learning approaches for COVID-19 imaging, quantum machine learning principles, hybrid quantum–classical frameworks in healthcare, COVID-19-specific HQC imaging models, and the remaining research gaps that motivate the proposed architecture-centric taxonomy.
2.1. Literature Search Strategy and Study Selection
To ensure transparency and reproducibility, this survey followed a structured PRISMA-inspired search and selection workflow, as summarized in Figure 2. The literature search covered the period from January 2016 to January 2026 and was conducted across IEEE Xplore, PubMed, Scopus, Web of Science, and arXiv, using English-language search queries combining terms related to hybrid quantum–classical learning, quantum machine learning, medical imaging, and COVID-19 diagnosis. Representative search terms included “hybrid quantum–classical,” “quantum machine learning,” “variational quantum circuit,” “quanvolution,” “quantum classifier,” “medical imaging,” “image classification,” “chest X-ray,” “CT,” “COVID-19,” “SARS-CoV-2,” and “coronavirus.” The initial database search identified 1248 records, and 37 additional records were obtained through reference checking and manual search, resulting in 1285 total records. After removing 312 duplicates, 973 records were screened by title and abstract, of which 778 were excluded as out of scope. The full texts of the remaining 195 potentially relevant studies were then assessed for eligibility. Studies were included if they proposed an original HQC model, focused on COVID-19 medical-imaging diagnosis or classification, provided sufficient technical detail to identify the role and placement of the quantum module, and reported quantitative results or an experimentally evaluable framework. Studies were excluded if they were reviews, editorials, or purely theoretical papers; addressed non-COVID-19 imaging; did not propose a genuine HQC model; lacked sufficient architectural detail; or were conference papers, preprints, duplicates, or inaccessible full texts. Among the 195 full-text articles assessed, 185 were excluded for the reasons summarized in Figure 2, including lack of an HQC architecture, lack of relevance to COVID-19 diagnosis, insufficient technical detail, review or theoretical status, conference or preprint status, or inaccessible or duplicate full text. Finally, 10 peer-reviewed journal studies were retained for qualitative and comparative synthesis. These studies constitute the evidence base for the architecture-centric taxonomy, comparative tables, and cross-study analyses presented in the remainder of this survey.
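The record counts reported above form a simple subtraction chain that readers can re-check independently. The minimal Python sketch below recomputes each PRISMA stage from the figures stated in the text; the function name `prisma_flow` and its parameter names are illustrative only.

```python
def prisma_flow(database, other_sources, duplicates, excluded_screening, excluded_full_text):
    """Recompute PRISMA stage counts from the numbers reported in the search description."""
    identified = database + other_sources        # all records found
    screened = identified - duplicates           # records left after de-duplication
    full_text = screened - excluded_screening    # survivors of title/abstract screening
    included = full_text - excluded_full_text    # survivors of full-text eligibility review
    return {"identified": identified, "screened": screened,
            "full_text": full_text, "included": included}


flow = prisma_flow(1248, 37, 312, 778, 185)
print(flow)  # {'identified': 1285, 'screened': 973, 'full_text': 195, 'included': 10}
```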
2.2. Classical AI for COVID-19 Medical Imaging
Classical COVID-19 imaging diagnosis is dominated by transfer learning and CNN-based classification/segmentation pipelines evaluated on public CXR and CT datasets. A major CXR benchmark effort introduced an open-access dataset and a tailored CNN design, reporting strong COVID-19 detection performance and emphasizing transparency in dataset curation; the benchmark scale (CXR images and patient cases) is explicitly documented and widely reused for subsequent comparisons [8]. Subsequent classical studies built CXR classifiers with various backbones and transfer-learning strategies, typically reporting high accuracy/sensitivity in multi-class settings (COVID vs. normal vs. pneumonia) but showing sensitivity to class imbalance and dataset origin; representative works include robust detection frameworks and transfer-learning-based pipelines that report strong performance under standard splits [26]. For CT-based diagnosis, several influential works created large-scale CT benchmarks and tailored CNN architectures, explicitly reporting dataset sizes and patient counts. One benchmark is documented at the scale of 104,009 CT slices across 1489 patients (COVIDx-CT), enabling reproducible evaluation and consistent reporting across studies using this dataset [27]. In parallel, smaller but highly cited public CT collections were introduced to catalyze research when large CT datasets were scarce; for example, an open dataset reports 349 COVID CT images from 216 patients and 463 non-COVID CT images, along with baseline experiments and metrics for reference [28]. Another widely used CT dataset reports 2482 CT scans split between 1252 COVID-positive and 1230 non-COVID cases; multiple classical CT pipelines (including end-to-end DL and feature-engineering hybrids) report very high classification accuracy on this dataset, sometimes approaching ~99% under certain experimental settings. Additional CT studies emphasize clinical slice selection (radiologist-guided informative-slice selection) and curated institutional datasets, reporting improved reliability but at the cost of reduced comparability due to private data and differing protocols [29]. Across these classical studies, high headline accuracy is common, but methodological reviews and critiques show that results can be inflated by non-patient-wise splitting, hidden dataset leakage, and spurious correlations arising from multi-source aggregation (scanner/site artifacts). These issues directly affect clinical generalization and must be treated as first-order variables when judging any “improvement,” including quantum/hybrid claims [8].
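The patient-wise splitting concern raised above can be made concrete with a short sketch in which every image from one patient is assigned to the same partition, so slices from a single patient cannot appear in both the training and the test set. The record layout `(patient_id, image_id, label)`, the function name, and the 20% test fraction are assumptions for illustration; site-level external validation would instead group records by institution or scanner.

```python
import random


def patient_wise_split(samples, test_fraction=0.2, seed=0):
    """Split (patient_id, image_id, label) records so that no patient is in both sets.

    Splitting at the image or slice level would let near-duplicate slices from one
    patient leak into both partitions, which can inflate reported accuracy.
    """
    patients = sorted({patient_id for patient_id, _, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train = [s for s in samples if s[0] not in test_patients]
    test = [s for s in samples if s[0] in test_patients]
    return train, test


# Hypothetical records: ten patients with five CT slices each.
records = [(f"patient{p}", f"slice{p}_{k}", p % 2) for p in range(10) for k in range(5)]
train, test = patient_wise_split(records)
print(len(train), len(test))  # 40 10
```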
2.3. Hybrid Quantum–Classical Frameworks in Healthcare
Hybrid quantum–classical frameworks have increasingly been explored alongside classical methods for medical image analysis and healthcare-related classification tasks. From a theoretical perspective, the trainability of variational quantum circuits can be affected by barren plateau phenomena, which may hinder optimization as circuit complexity increases [30].
To improve visual feature extraction, quanvolutional neural networks have been proposed as quantum-enhanced alternatives to classical convolutional operations [31]. More recent studies have explored medical image classification on real quantum hardware [32] and introduced specialized architectures, including hybrid quantum–classical convolutional neural networks [33], federated quantum convolutional neural networks for privacy-preserving learning [34], and noise-aware quantum neural networks designed to improve robustness under realistic quantum conditions [35]. In parallel, broader quantum image classification frameworks [36] and transfer-learning-based quantum neural networks for early disease detection [37] further illustrate the growing interest in applying quantum learning paradigms to healthcare and medical imaging applications.
Hybrid quantum–classical learning in healthcare extends beyond imaging to clinical decision support, biosignal interpretation, and risk prediction, where tabular EHR-like datasets are common and dimensionality can be controlled. Broad healthcare reviews synthesize applications spanning diagnosis assistance, imaging, genomics, and operational optimization, and consistently frame hybrid designs as the most viable path under NISQ constraints [38]. A systematic review of digital health assesses whether QML (including hybrid models) outperforms classical methods for clinical decision-making and health service delivery, summarizes thousands of screened studies, and concludes that a consistent advantage has not yet been established due to heterogeneous baselines, proxy datasets, and uneven reporting [14]. In heart-disease prediction, hybrid/QML evaluations on standard cardiology datasets report incremental accuracy improvements and/or faster training compared to classical baselines under particular settings, highlighting sensitivity to feature selection and evaluation choices [39]. For diabetes prediction on the Pima Indians Diabetes Dataset, multiple QML studies test variational quantum classifiers and quantum support vector approaches, reporting performance competitive with classical ML but often below strong deep baselines; some works emphasize that optimizer choice, feature maps (e.g., ZZ), and error-aware tuning can noticeably change reported metrics [40]. In biosignal analysis, hybrid quantum–classical ECG/arrhythmia pipelines propose quantum transforms or quantum-enhanced feature stages followed by classical learners, often evaluated on large ECG corpora (e.g., MIT-BIH-style settings) and reporting high accuracy in arrhythmia classification when paired with strong classical feature extractors [41]. In oncology-style classification, breast cancer benchmarks (e.g., WDBC-type datasets) are repeatedly used to test VQC/QSVM variants and compare them against classical baselines; these studies show that careful circuit/ansatz design and tuning are necessary for stability, and that improvements can be small or dataset-dependent [42]. In healthcare decision-making, the most credible hybrid studies (i) evaluate on widely recognized clinical datasets, (ii) report multiple clinically meaningful metrics (not only accuracy), and (iii) explicitly state quantum resources and whether results are simulator- or hardware-based. Systematic evidence indicates that many “advantages” are not yet robust across datasets or protocols, underscoring the need for standardized reporting and stronger external validation, lessons that directly translate to COVID imaging [14].
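Criterion (ii) above, together with the observation that aggregate accuracy can hide class-wise weaknesses, can be illustrated with one-vs-rest metrics computed from a confusion matrix. The counts below are hypothetical, not taken from any reviewed study, and were chosen only to show how an imbalanced three-class test set can yield high overall accuracy while COVID-19 sensitivity is markedly lower.

```python
def _ratio(num, den):
    """Safe division: returns NaN when a class has no relevant cases."""
    return num / den if den else float("nan")


def per_class_metrics(confusion, labels):
    """One-vs-rest sensitivity, specificity, and precision from a square confusion matrix.

    confusion[i][j] counts cases whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    report = {"accuracy": sum(confusion[k][k] for k in range(len(labels))) / total}
    for k, name in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                                  # missed cases of this class
        fp = sum(confusion[i][k] for i in range(len(labels))) - tp  # other classes predicted as this one
        tn = total - tp - fn - fp
        report[name] = {
            "sensitivity": _ratio(tp, tp + fn),
            "specificity": _ratio(tn, tn + fp),
            "precision": _ratio(tp, tp + fp),
        }
    return report


labels = ["COVID-19", "normal", "pneumonia"]
confusion = [
    [44, 3, 3],      # hypothetical: 50 COVID-19 cases
    [2, 500, 8],     # hypothetical: 510 normal cases
    [1, 10, 429],    # hypothetical: 440 pneumonia cases
]
report = per_class_metrics(confusion, labels)
print(round(report["accuracy"], 3))                 # 0.973
print(round(report["COVID-19"]["sensitivity"], 3))  # 0.88
```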
2.4. Hybrid Quantum–Classical Models for COVID-19 Imaging
Early hybrid quantum–classical (HQC) models for COVID-19 imaging predominantly explored patch-wise quantum processing via quantum convolutional designs. A representative framework is introduced in [43], in which a random quantum circuit-based quanvolutional layer is embedded in a classical convolutional neural network. In this approach, small chest X-ray patches are angle-encoded into shallow quantum circuits, and the resulting expectation values are aggregated as quantum-enhanced feature maps for subsequent classical convolutional layers. The model is evaluated across multiple chest X-ray datasets with sample sizes ranging from approximately 7000 to 11,000 images, in both binary and multi-class classification settings. While binary COVID–normal classification achieves accuracy exceeding 98%, performance in the multi-class scenario (COVID, normal, pneumonia) degrades to the high-80% range, indicating reduced separability among visually similar classes. A closely related random quanvolution-based hybrid CNN is further investigated in [44], with a more systematic evaluation across multiple classification scenarios. In this design, the quantum layer explicitly replaces the first classical convolutional stage and feeds into a conventional CNN backbone. Experiments conducted on chest X-ray datasets for binary COVID–normal and multi-class COVID–pneumonia–normal classification report binary accuracy around 98% and multi-class accuracy in the range of 90–93%. These results suggest that random quantum feature maps can enhance early-stage representation learning, but their contribution diminishes as classification complexity increases. Moving beyond fixed random circuits, a parameterized quanvolutional variant is proposed in [45], where trainable quantum convolutional layers are integrated into a hybrid CNN. Image patches are encoded using rotation gates into variational quantum circuits, whose parameters are optimized jointly with classical network weights. Evaluated on chest X-ray datasets for binary COVID-19 detection, the model reports simulator-based accuracy above 97%. However, the introduction of trainable quantum parameters increases sensitivity to optimization instability, particularly under shallow-circuit and limited-qubit constraints.
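A minimal sketch of the random quanvolutional filter idea described above follows, assuming one pixel per qubit, RY angle encoding with angle π·x, fixed random RY layers each followed by a CNOT chain, and Pauli-Z readout. These gate choices and all helper names are illustrative assumptions, not the exact circuits of [43] or [44]; the statevector is simulated directly in pure Python, so no quantum SDK is required.

```python
import math
import random


def ry(theta):
    """Single-qubit RY rotation as a real 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q (qubit q corresponds to bit q of the basis index)."""
    new = list(state)
    step = 1 << q
    for i in range(len(state)):
        if not i & step:
            j = i | step
            a, b = state[i], state[j]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[j] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target):
    """Flip the target bit on basis states whose control bit is 1."""
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def z_expectation(state, q):
    """<Z_q> = P(bit q = 0) - P(bit q = 1)."""
    return sum(abs(a) ** 2 * (-1 if (i >> q) & 1 else 1) for i, a in enumerate(state))


def random_circuit(n_qubits, n_layers, seed):
    """Random RY angles drawn once and never trained (hypothetical stand-in)."""
    rng = random.Random(seed)
    return [[rng.uniform(0, 2 * math.pi) for _ in range(n_qubits)] for _ in range(n_layers)]


def quanv_filter(patch, circuit):
    """Map a flattened 2x2 patch with pixel values in [0, 1] to 4 output channels."""
    n = len(patch)
    state = [0.0] * (2 ** n)
    state[0] = 1.0                                      # start in |0...0>
    for q, x in enumerate(patch):
        state = apply_1q(state, ry(math.pi * x), q)     # angle encoding, one pixel per qubit
    for layer in circuit:
        for q, theta in enumerate(layer):
            state = apply_1q(state, ry(theta), q)
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)         # nearest-neighbour entanglement
    return [z_expectation(state, q) for q in range(n)]


circuit = random_circuit(n_qubits=4, n_layers=2, seed=0)
features = quanv_filter([0.1, 0.8, 0.4, 0.0], circuit)  # four values in [-1, 1]
```

Sliding this filter over every 2 × 2 patch with stride 2 would produce a four-channel feature map at half the spatial resolution, which the subsequent classical CNN layers then consume.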
In contrast to patch-wise quantum processing, ref. [46] introduces a quantum feature extraction architecture with a compact classifier (HQF-CC). In this framework, chest X-ray images from a large radiography dataset with over 15,000 samples are first processed using classical feature-extraction pipelines. The resulting compact feature vectors are then encoded into a variational quantum circuit to generate quantum-enhanced representations, which are classified using a lightweight custom classical classifier. Reported test accuracy reaches approximately 98.8%, with COVID-class sensitivity around 88–89% and specificity above 97%, demonstrating that high aggregate accuracy may coexist with uneven class-wise performance. Hybrid quantum transfer learning for CT imaging is examined in [47], in which a pretrained classical CNN backbone extracts high-dimensional CT embeddings, which are subsequently processed by a dressed quantum circuit (DQC) classifier. Multi-class CT classification involving COVID-19, community-acquired pneumonia, and normal cases is evaluated using reduced feature vectors encoded into 4–8 qubits via angle encoding. The results show that the DQC-based hybrid achieves competitive multi-class accuracy while maintaining more stable training behavior than purely quantum classifier heads, particularly under restricted qubit budgets. A closely related hybrid classical–quantum transfer-learning study is presented in [48], although it targets cardiomegaly detection rather than COVID-19 diagnosis. In this work, chest X-ray images are processed using a pretrained DenseNet-121 backbone, and a parameterized quantum circuit is integrated into the downstream classification pipeline. The study uses the CheXpert repository to construct a balanced dataset of 2436 posteroanterior chest X-rays, including cardiomegaly and control cases. The hybrid models are implemented using Qiskit and PennyLane and evaluated with k-fold cross-validation under a state-vector simulation setting. Reported results show ROC-AUC values up to 0.93 and accuracies up to 0.87, comparable to the classical baseline. The study demonstrates how compact quantum modules can be coupled with pretrained classical feature extractors in radiographic diagnosis. However, since it is not COVID-19-specific and is evaluated under simulation, it should be interpreted as supporting evidence for hybrid classical–quantum transfer learning in medical imaging rather than direct evidence for COVID-19 classification. Finally, ref. [49] presents a CT-based hybrid framework that combines a strong pretrained classical encoder, such as a VGG-type architecture, with a quantum neural network classifier head implemented using modern quantum software toolchains. Evaluated on a CT dataset of approximately 2500 images using an 80/20 train–test split, the model reports accuracy near 96–97%, precision above 98%, recall around 95%, and high specificity. These findings indicate that quantum classifier heads can be effectively integrated into established deep learning pipelines, although the resulting gains remain incremental rather than transformative.
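The dressed-quantum-circuit pattern attributed to [47] above (a backbone embedding compressed to a few angle-encoded qubits, a variational circuit, and a classical readout) can be sketched as a forward pass. The layer sizes, the tanh-scaled angles, the Hadamard-plus-RY encoding, and the CNOT/RY variational layers are hypothetical choices rather than the configuration of [47]; no training loop is shown, and the embedding is a random vector standing in for pretrained CNN features.

```python
import math
import random

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q of a statevector (qubit q = bit q of the index)."""
    new = list(state)
    step = 1 << q
    for i in range(len(state)):
        if not i & step:
            j = i | step
            a, b = state[i], state[j]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[j] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target):
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def z_expectation(state, q):
    return sum(abs(a) ** 2 * (-1 if (i >> q) & 1 else 1) for i, a in enumerate(state))


def linear(x, weights, bias):
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def init_params(d_in, n_qubits, n_layers, n_classes, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"w_in": mat(n_qubits, d_in), "b_in": [0.0] * n_qubits,
            "theta": mat(n_layers, n_qubits),
            "w_out": mat(n_classes, n_qubits), "b_out": [0.0] * n_classes}


def dressed_forward(embedding, params):
    n = len(params["b_in"])
    # 1) classical input layer: compress the backbone embedding to n rotation angles
    pre = linear(embedding, params["w_in"], params["b_in"])
    angles = [math.pi / 2 * math.tanh(v) for v in pre]
    # 2) quantum layer: H + RY angle encoding, then variational CNOT/RY layers
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    for q in range(n):
        state = apply_1q(state, HADAMARD, q)
        state = apply_1q(state, ry(angles[q]), q)
    for layer in params["theta"]:
        for q in range(n - 1):
            state = apply_cnot(state, q, q + 1)
        for q, t in enumerate(layer):
            state = apply_1q(state, ry(t), q)
    z = [z_expectation(state, q) for q in range(n)]
    # 3) classical output layer: expectation values -> class probabilities
    return softmax(linear(z, params["w_out"], params["b_out"]))


rng = random.Random(1)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for CNN features
params = init_params(d_in=16, n_qubits=4, n_layers=3, n_classes=3)
probs = dressed_forward(embedding, params)  # e.g. COVID-19 / pneumonia / normal
```

In practice the input layer, the rotation angles, and the output layer would be trained jointly, while the pretrained backbone typically stays frozen.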
2.5. Gap Analysis and Motivation of This Survey
Although hybrid quantum–classical (HQC) models have attracted growing attention for COVID-19 medical imaging, the existing literature remains fragmented and difficult to compare. Most studies report binary COVID-19 screening results using heterogeneous datasets, split strategies, and evaluation metrics, making cross-study comparison difficult. A central gap is the lack of a unified, architecture-centric perspective that clarifies the functional role of quantum modules within classical pipelines. The interaction among architectural placement, encoding strategies, and circuit scale under NISQ constraints has rarely been systematically and quantitatively analyzed. Moreover, multi-class diagnostic robustness and class-wise sensitivity—critical for clinical deployment—are often underexplored compared to binary accuracy. Generalization and bias also remain insufficiently addressed. Heavy reliance on small or curated datasets, limited external validation, and instability in optimization further raise concerns about reproducibility and clinical credibility. Existing surveys typically focus on general quantum machine learning or classical COVID-19 imaging, without providing a targeted, architecture-driven, and quantitative synthesis of HQC frameworks tailored to medical imaging workflows. Motivated by these gaps, this survey contributes in three main ways:
It introduces a clear taxonomy of HQC architectures based on the functional placement of quantum modules, enabling structured comparison across studies with similar architecture types.
It provides a quantitative comparative analysis of ten representative HQC models, covering datasets, encoding strategies, quantum resource usage, and performance in both binary and multi-class settings. This approach enables more meaningful comparisons, mitigating the effects of dataset heterogeneity and other confounding factors.
It identifies key challenges and open research directions under NISQ constraints, linking algorithmic design decisions to validation rigor and practical deployment feasibility.
By addressing these issues, this survey establishes a structured baseline for evaluating HQC approaches and provides guidance for advancing hybrid models from simulation-based studies toward clinically credible diagnostic systems while alleviating difficulties caused by fragmented datasets and inconsistent protocols.
4. Hybrid Quantum–Classical Models in COVID-19 Diagnosis
Hybrid quantum–classical learning has emerged as a pragmatic strategy for COVID-19 and respiratory medical-imaging diagnosis, aiming to combine the representational strength of classical deep learning with the expressive potential of quantum circuits. In the reviewed literature, quantum models are not used as fully standalone diagnostic systems; instead, they are integrated into classical pipelines as quantum convolutional layers, quantum feature extractors, or quantum classifier heads operating on compressed representations derived from chest X-ray (CXR) or computed tomography (CT) images. This section reviews ten peer-reviewed journal studies that form the final comparative corpus of this survey and uses them to define and instantiate the three architectural archetypes analyzed in Section 4.1, Section 4.2 and Section 4.3.
The first group of studies uses quantum circuits in an early feature-processing role. Ref. [43] proposed an HQ-CNN model for COVID-19 prediction from CXR images, in which random quantum circuits operate as quanvolutional filters before classical CNN-based aggregation. The model achieved strong binary classification performance, but its multiclass performance decreased when COVID-19, viral pneumonia, bacterial pneumonia, and normal cases were jointly considered. Ref. [64] extended a similar hybrid quantum–classical convolutional idea to CT imaging, using a random quantum convolutional layer within a CNN pipeline and evaluating the model on two public COVID-19 CT datasets. Their study reported high binary diagnostic performance and included both a train/validation/test split and 5-fold cross-validation. Ref. [65] proposed a quantum machine learning architecture in which CGAN-generated synthetic CT images are combined with a 4-qubit quanvolutional/QNN classifier, addressing data scarcity but also introducing a potential dependence on synthetic data quality. Ref. [66] investigated hybrid quantum–classical CNNs for multiclass lung X-ray classification using Rotation and Pauli gates; although this study is not a direct COVID-19 detector, it is retained as a recent COVID-related lung-imaging comparator because it evaluates hybrid QCNN behavior in respiratory X-ray classification.
The second group of studies places the quantum module after classical feature extraction, using it as a classifier or decision-refinement component. Ref. [45] proposed a pre-trained QCNN architecture for COVID-19 CT classification, combining classical feature extraction, notably VGG16, with a Qiskit EstimatorQNN based on a ZZFeatureMap and a RealAmplitudes ansatz. Ref. [67] developed a hybrid framework for respiratory lung disease detection from CXR images, where a custom CNN extracts features and MMS/MSMS quantum classifiers perform final classification; this work is particularly important because it includes IBM Q-QASM circuit validation in addition to simulation. Ref. [68] introduced the MLDC framework for multi-lung disease classification, combining a CNN-based feature extractor with either an ANN classifier or an MMS-based quantum classifier, and reported improved performance for the quantum classifier over the ANN counterpart. Ref. [69] proposed the DI-QL approach for COVID-19 CT classification, using Hadamard and coupling gates within a quantum-assisted deep learning framework and reporting strong binary classification performance.
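To make the feature-map/ansatz pairing mentioned for [45] concrete, the sketch below hand-simulates a two-qubit circuit that follows, to our understanding, the gate pattern of Qiskit's `ZZFeatureMap` (Hadamards, single-qubit phases P(2x_q), and a CNOT–phase–CNOT block using the default pairwise map (π − x_0)(π − x_1)), followed by a one-repetition `RealAmplitudes` ansatz. It is not Qiskit code: an `EstimatorQNN` would obtain the expectation value through an estimator primitive, whereas here it is computed directly from the statevector. The Z⊗Z observable, the qubit count, and all parameter values are assumptions.

```python
import cmath
import math


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def hadamard():
    s = 1 / math.sqrt(2)
    return [[s, s], [s, -s]]


def phase(lam):
    """P(lambda) = diag(1, exp(i * lambda))."""
    return [[1, 0], [0, cmath.exp(1j * lam)]]


def apply_1q(state, gate, q):
    """Apply a 2x2 gate to qubit q (qubit q = bit q of the basis index)."""
    new = list(state)
    step = 1 << q
    for i in range(len(state)):
        if not i & step:
            j = i | step
            a, b = state[i], state[j]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[j] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target):
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def zz_feature_map(state, x):
    """Two qubits, one repetition of the ZZ-feature-map gate pattern."""
    for q in range(2):
        state = apply_1q(state, hadamard(), q)
        state = apply_1q(state, phase(2 * x[q]), q)
    phi = (math.pi - x[0]) * (math.pi - x[1])   # pairwise data map
    state = apply_cnot(state, 0, 1)
    state = apply_1q(state, phase(2 * phi), 1)
    state = apply_cnot(state, 0, 1)
    return state


def real_amplitudes(state, theta):
    """RY layer, CNOT, RY layer: four trainable angles for two qubits."""
    state = apply_1q(state, ry(theta[0]), 0)
    state = apply_1q(state, ry(theta[1]), 1)
    state = apply_cnot(state, 0, 1)
    state = apply_1q(state, ry(theta[2]), 0)
    state = apply_1q(state, ry(theta[3]), 1)
    return state


def zz_expectation(state):
    """<Z x Z>: +1 on even-parity basis states, -1 on odd-parity ones."""
    return sum(abs(a) ** 2 * (1 if bin(i).count("1") % 2 == 0 else -1)
               for i, a in enumerate(state))


def qnn_output(x, theta):
    state = [1 + 0j, 0j, 0j, 0j]
    state = zz_feature_map(state, x)
    state = real_amplitudes(state, theta)
    return zz_expectation(state)


score = qnn_output([0.3, 1.2], [0.1, -0.4, 0.7, 0.2])  # value in [-1, 1]
```

Training would adjust the four angles in `theta` so that the sign of the output separates the two classes. With all angles at zero the output is exactly zero for any input, because the feature map changes only phases and the remaining CNOT merely permutes amplitudes of equal magnitude; this is one reason the ansatz rotations are essential.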
A third line of work assigns the quantum component a feature-generation or feature-extraction role between classical preprocessing and final classification. Ref. [70] proposed the HQF-CC framework for CXR-based respiratory disease detection, where an MMS quantum feature extractor generates representations that are subsequently classified by a custom classical classifier. This design differs from decision-stage quantum classifiers because the quantum circuit contributes directly to feature formation rather than only to the final decision boundary. Ref. [47] also investigated a quantum neural network framework for CT-based COVID-19 prognostic analysis and compared it against DNN, CNN, and 2D-CNN baselines, reporting improved accuracy and reduced runtime. Although the exact patient-level validation details remain limited, this study is included because it represents an early peer-reviewed attempt to evaluate quantum neural networks for COVID-19 CT image analysis.
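An Archetype C pipeline can be sketched in the same spirit: a compact classical vector is angle-encoded, entangled, and read out as a vector of per-qubit expectation values, which a separate classical classifier then uses. The circuit here is a generic stand-in (it is not the MMS extractor of [70]), and a nearest-centroid rule stands in for the custom classical classifier; both, along with the toy inputs, are assumptions.

```python
import math


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def apply_1q(state, gate, q):
    new = list(state)
    step = 1 << q
    for i in range(len(state)):
        if not i & step:
            j = i | step
            a, b = state[i], state[j]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[j] = gate[1][0] * a + gate[1][1] * b
    return new


def apply_cnot(state, control, target):
    new = list(state)
    for i in range(len(state)):
        if (i >> control) & 1 and not (i >> target) & 1:
            j = i | (1 << target)
            new[i], new[j] = state[j], state[i]
    return new


def z_expectation(state, q):
    return sum(abs(a) ** 2 * (-1 if (i >> q) & 1 else 1) for i, a in enumerate(state))


def quantum_features(x, theta):
    """Quantum feature extractor: encode, entangle, rotate, return <Z_q> for every qubit."""
    n = len(x)
    state = [0.0] * (2 ** n)
    state[0] = 1.0
    for q, v in enumerate(x):
        state = apply_1q(state, ry(v), q)       # angle encoding of the compact vector
    for q in range(n - 1):
        state = apply_cnot(state, q, q + 1)     # CNOT chain mixes neighbouring features
    for q, t in enumerate(theta):
        state = apply_1q(state, ry(t), q)       # fixed or separately trained rotations
    return [z_expectation(state, q) for q in range(n)]


def fit_centroids(feature_vectors, labels):
    """Classical classifier: store the mean quantum-feature vector of each class."""
    sums, counts = {}, {}
    for f, y in zip(feature_vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids, f):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(f, centroids[y])))


theta = [0.0, 0.0, 0.0]
train_x = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.0], [math.pi, math.pi, math.pi], [2.9, 3.0, 3.1]]
train_y = [0, 0, 1, 1]
centroids = fit_centroids([quantum_features(x, theta) for x in train_x], train_y)
print(predict(centroids, quantum_features([0.1, 0.05, 0.0], theta)))  # 0
print(predict(centroids, quantum_features([3.0, 3.1, 3.0], theta)))   # 1
```

With all rotation angles in `theta` set to zero, the features reduce to running products of cosines of the inputs (for example, the second feature equals cos x_0 · cos x_1), which shows how the entangling layer produces cross-feature terms that the downstream classical classifier can exploit.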
The reviewed hybrid quantum–classical studies differ not only in the type of medical image used or in the reported diagnostic performance, but more fundamentally in the functional position and responsibility of the quantum module within the diagnostic pipeline. Therefore, the taxonomy adopted in this review is architecture-based rather than performance-based. Each study is assigned to an archetype based on four structural criteria: where the quantum module is inserted, the type of input it receives, the type of output it produces, and the component responsible for the final diagnostic decision.
Figure 3 provides a conceptual comparison of the three proposed hybrid quantum–classical architectural archetypes. In Archetype A, the quantum module is placed at the front end and operates on local image patches before classical CNN aggregation. In Archetype B, a classical CNN or pretrained backbone first extracts a compact feature embedding, and the quantum circuit acts as a decision-stage classifier head. In Archetype C, the quantum circuit is placed between a compact classical feature-construction stage and a final classical classifier; thus, it acts primarily as a quantum feature extractor rather than as the final decision module.
The proposed taxonomy assigns each study according to the dominant functional role of the quantum module rather than the model name or reported performance, using the four structural criteria introduced above: the position at which the quantum module is inserted, the input it receives, the output it produces, and the component responsible for the final diagnostic decision. Additional mechanisms such as CGAN-based augmentation, transfer learning, or custom feature extraction are recorded as secondary operational tags. The assignment criteria are summarized in Table 2, which makes explicit that the taxonomy is determined not by the model name, dataset, accuracy, or publication year but by the dominant function of the quantum component inside the hybrid pipeline.
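The four-criterion assignment rule can be expressed as a small decision function. The string labels, the field names, and the fallback value are hypothetical conventions introduced only for illustration; the sketch restates the criteria and is not a reproduction of how Table 2 was compiled. The `tags` field is deliberately ignored, mirroring the rule that operational mechanisms such as CGAN augmentation or transfer learning do not change the primary archetype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantumModuleProfile:
    """The four structural criteria used to place a study in an archetype."""
    placement: str        # "front_end", "intermediate", or "decision_stage"
    input_type: str       # e.g. "image_patch" or "compact_embedding"
    output_type: str      # e.g. "local_feature_map", "feature_vector", "class_score"
    decision_owner: str   # "quantum" or "classical"
    tags: tuple = ()      # secondary operational tags; not used for assignment


def assign_archetype(p: QuantumModuleProfile) -> str:
    """Assign the primary archetype from the dominant quantum function only."""
    if p.placement == "front_end" and p.input_type == "image_patch":
        return "A"  # patch-level quanvolution feeding a classical CNN
    if p.placement == "decision_stage" and p.decision_owner == "quantum":
        return "B"  # quantum classifier head on a classical embedding
    if (p.placement == "intermediate" and p.output_type == "feature_vector"
            and p.decision_owner == "classical"):
        return "C"  # quantum feature extractor followed by a classical classifier
    return "needs manual review"


quanv_with_gan = QuantumModuleProfile("front_end", "image_patch", "local_feature_map",
                                      "classical", tags=("cgan_augmentation",))
print(assign_archetype(quanv_with_gan))  # A
```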
For studies with mixed design elements, the dominant quantum function is used to assign the primary archetype. For example, the presence of CGAN-based data augmentation, pretrained transfer learning, or custom feature construction is treated as an operational mechanism rather than a separate archetype. This prevents the taxonomy from mixing architectural placement with training strategy or dataset augmentation. Accordingly, the classification is based on the structural role of the quantum module: whether it transforms local image patches, refines the final decision boundary, or generates intermediate quantum-derived features. Based on this taxonomy, the following subsections describe the three archetypes in detail: Section 4.1 discusses patch-level quanvolutional architectures, Section 4.2 explains classical feature extractors coupled with quantum classifier heads, and Section 4.3 presents quantum feature-extraction architectures followed by classical classifiers.
4.1. Archetype A: Quanvolution/Patch-Level Quantum Preprocessing
Archetype A represents hybrid quantum–classical architectures in which the quantum module is inserted at an early stage of image processing. In this design, the quantum circuit does not act as the final classifier. Instead, it functions as a local nonlinear transformation block applied to small image patches. The measured outputs of these patch-wise quantum circuits are reconstructed as quantum-generated feature maps, which a conventional classical CNN then processes for global feature aggregation and final diagnostic prediction. The conceptual workflow of this archetype is illustrated in Figure 4. The architecture can be divided into three functional blocks: a classical front-end, a quantum local feature-transformation module, and a classical decision stage.
Let the input medical image be
$$X \in \mathbb{R}^{H \times W \times C},$$
where $H$, $W$, and $C$ denote image height, width, and number of channels. For chest X-ray images, $C$ is often one after grayscale conversion, while CT slices may be treated as single-channel images after preprocessing. Classical preprocessing includes resizing, normalization, optional denoising, and intensity scaling:
$$\tilde{X} = \mathcal{P}(X),$$
where $\mathcal{P}(\cdot)$ denotes the classical preprocessing operator.
The preprocessed image is then partitioned into local patches:
$$\{x_i\}_{i=1}^{N} = \mathrm{Patch}(\tilde{X}),$$
where $x_i$ is the $i$-th image patch and $N$ is the total number of patches. Each patch is vectorized before quantum encoding:
$$v_i = \mathrm{vec}(x_i) \in \mathbb{R}^{d}.$$
Thus, $v_i$ forms the classical-to-quantum interface. In most practical quantum computational models, small $p \times p$ patches such as $2 \times 2$ are used to keep the input dimension $d = p^2 C$, the qubit count, and the circuit depth compatible with near-term quantum devices.
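The partition-and-vectorize step can be sketched in NumPy as follows. This is a minimal illustration assuming non-overlapping $p \times p$ patches (stride equal to $p$) and min-max intensity scaling as one possible choice of $\mathcal{P}$; the function names `preprocess` and `extract_patches` are hypothetical.

```python
import numpy as np


def preprocess(image: np.ndarray) -> np.ndarray:
    """One possible P(X): min-max scaling of intensities to [0, 1]."""
    lo, hi = image.min(), image.max()
    return (image - lo) / (hi - lo) if hi > lo else np.zeros_like(image, dtype=float)


def extract_patches(image: np.ndarray, p: int) -> np.ndarray:
    """Split an H x W x C image into non-overlapping p x p patches.

    Returns an (N, d) array with N = (H/p)(W/p) and d = p*p*C; row i is v_i = vec(x_i).
    """
    H, W, C = image.shape
    if H % p or W % p:
        raise ValueError("H and W must be divisible by the patch size p")
    blocks = image.reshape(H // p, p, W // p, p, C)  # (row block, a, col block, b, channel)
    blocks = blocks.transpose(0, 2, 1, 3, 4)          # (row block, col block, a, b, channel)
    return blocks.reshape(-1, p * p * C)              # patches in row-major order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = preprocess(rng.integers(0, 256, size=(224, 224, 1)).astype(float))
    v = extract_patches(x, p=2)
    print(v.shape)  # (12544, 4): N = 112 * 112 patches, d = 4 values per 2 x 2 patch
```

Even a moderate 224 × 224 input yields more than twelve thousand patches, each of which becomes a separate quantum-circuit input in Archetype A.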
Each patch vector $v_i$ is encoded into an $n$-qubit quantum state using an encoding map $E(\cdot)$:
$$|\psi_i\rangle = E(v_i).$$
A commonly used encoding strategy is angle encoding, where each feature controls the rotation angle of a single-qubit gate:
$$|\psi_i\rangle = \bigotimes_{j=1}^{n} R_y(\alpha_j v_{i,j})\,|0\rangle,$$
where $v_{i,j}$ is the $j$-th component of the vectorized patch and $\alpha_j$ are scaling constants used to map pixel values into valid angular ranges.
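As a short worked example of why the scaling constants matter, consider a single qubit with the convention $R_y(\gamma) = \begin{pmatrix}\cos(\gamma/2) & -\sin(\gamma/2)\\ \sin(\gamma/2) & \cos(\gamma/2)\end{pmatrix}$ and pixel values normalized to $[0, 1]$. The choice $\alpha_j = \pi$ below is an illustrative assumption, not a value taken from the reviewed studies.

```latex
R_y(\alpha_j v_{i,j})\,|0\rangle
  = \cos\!\Big(\tfrac{\alpha_j v_{i,j}}{2}\Big)|0\rangle + \sin\!\Big(\tfrac{\alpha_j v_{i,j}}{2}\Big)|1\rangle
\quad\Longrightarrow\quad
\langle Z_j \rangle
  = \cos^2\!\Big(\tfrac{\alpha_j v_{i,j}}{2}\Big) - \sin^2\!\Big(\tfrac{\alpha_j v_{i,j}}{2}\Big)
  = \cos(\alpha_j v_{i,j}).
% With \alpha_j = \pi:  v_{i,j} = 0 \mapsto +1,\quad v_{i,j} = 0.5 \mapsto 0,\quad v_{i,j} = 1 \mapsto -1.
```

Because $\cos$ is injective only on $[0, \pi]$, keeping $\alpha_j v_{i,j}$ within that interval ensures that distinct pixel intensities remain distinguishable after encoding; a wider range would map different intensities to the same expectation value.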
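After encoding, a shallow quantum circuit transforms the encoded patch:
$$|\phi_i\rangle = U(\theta)\,|\psi_i\rangle.$$
The unitary $U(\theta)$ may be a fixed random quantum circuit or a trainable parameterized quantum circuit. A generic shallow circuit can be expressed as
$$U(\theta) = \prod_{l=1}^{D} U_{\mathrm{ent}}\, U_{\mathrm{rot}}(\theta_l),$$
where $U_{\mathrm{rot}}(\theta_l)$ contains single-qubit rotations, $U_{\mathrm{ent}}$ contains entangling gates such as CNOT, and $D$ is the circuit depth.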
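The transformed state is measured through a set of observables $\{O_k\}_{k=1}^{M}$:
$$z_{i,k} = \langle \phi_i | O_k | \phi_i \rangle, \quad k = 1, \dots, M.$$
The measurement vector for the $i$-th patch is
$$z_i = [z_{i,1}, z_{i,2}, \dots, z_{i,M}]^{\top} \in \mathbb{R}^{M}.$$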
After all patches are processed, the quantum-derived vectors are rearranged according to their original spatial locations to form a quantum-generated feature tensor:
$$Z = \mathrm{Reshape}\big(\{z_i\}_{i=1}^{N}\big).$$
For non-overlapping $p \times p$ patches, $N = (H/p)(W/p)$ and
$$Z \in \mathbb{R}^{(H/p) \times (W/p) \times M}.$$
This tensor plays a role analogous to the output of a classical convolutional layer, but the local transformation is generated by quantum state preparation, unitary evolution, and measurement rather than by learned classical kernels.
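The full patch-level pipeline, from encoding to the reassembled tensor $Z$, can be simulated classically for small patches. The sketch below is a minimal NumPy statevector simulation under stated assumptions: $2 \times 2$ grayscale patches ($n = M = 4$), $R_y(\pi v_{i,j})$ angle encoding, a fixed random circuit in which each of the $D$ layers applies $R_y$ rotations followed by a ring of CNOTs, and exact Pauli-$Z$ expectations in place of finite shots. These choices and all function names are illustrative, not the configuration of any reviewed study.

```python
import numpy as np


def ry(angle: float) -> np.ndarray:
    """Single-qubit R_y rotation matrix."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def apply_1q(state: np.ndarray, gate: np.ndarray, q: int) -> np.ndarray:
    """Apply a 2x2 gate to qubit q of a state stored as a (2,)*n tensor."""
    state = np.tensordot(gate, state, axes=([1], [q]))  # new axis 0 is qubit q
    return np.moveaxis(state, 0, q)


def apply_cnot(state: np.ndarray, control: int, target: int) -> np.ndarray:
    """Flip the target qubit on the slice where the control qubit is |1>."""
    state = state.copy()
    idx = [slice(None)] * state.ndim
    idx[control] = 1
    t = target if target < control else target - 1   # target axis inside the slice
    state[tuple(idx)] = np.flip(state[tuple(idx)], axis=t).copy()
    return state


def expval_z(state: np.ndarray, q: int) -> float:
    """Exact <Z_q> from the marginal probabilities of qubit q."""
    probs = np.abs(state) ** 2
    marg = probs.sum(axis=tuple(a for a in range(state.ndim) if a != q))
    return float(marg[0] - marg[1])


def quanv_patch(v, thetas: np.ndarray, alpha: float = np.pi) -> np.ndarray:
    """Encode one vectorized patch, apply D layers of U_ent U_rot, return z_i."""
    v = np.asarray(v, dtype=float)
    n = v.size
    state = np.zeros((2,) * n)
    state[(0,) * n] = 1.0                                # |0...0>
    for j in range(n):                                   # angle encoding E(v_i)
        state = apply_1q(state, ry(alpha * v[j]), j)
    for layer in thetas:                                 # one circuit layer per row of thetas
        for j in range(n):
            state = apply_1q(state, ry(layer[j]), j)     # U_rot(theta_l)
        for j in range(n):
            state = apply_cnot(state, j, (j + 1) % n)    # U_ent: CNOT ring (needs n >= 2)
    return np.array([expval_z(state, j) for j in range(n)])


def quanvolution(image: np.ndarray, p: int = 2, depth: int = 1, seed: int = 0) -> np.ndarray:
    """Map an H x W grayscale image in [0, 1] to Z of shape (H/p, W/p, p*p)."""
    H, W = image.shape
    n = p * p
    thetas = np.random.default_rng(seed).uniform(0, 2 * np.pi, size=(depth, n))  # fixed, not trained
    Z = np.zeros((H // p, W // p, n))
    for r in range(H // p):
        for c in range(W // p):
            v = image[r * p:(r + 1) * p, c * p:(c + 1) * p].reshape(-1)
            Z[r, c] = quanv_patch(v, thetas)             # z_i placed at its spatial location
    return Z
```

Every patch triggers its own circuit simulation, which is exactly the $N$-fold repetition that drives the cost of Archetype A; on hardware, each exact expectation would be replaced by an estimate from $S$ shots.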
The quantum feature tensor is passed to a classical CNN:
$$\hat{y} = f_{\mathrm{CNN}}(Z; \omega),$$
where $f_{\mathrm{CNN}}(\cdot; \omega)$ denotes the classical CNN with parameters $\omega$ and $\hat{y}$ is the diagnostic prediction. This stage aggregates local quantum-derived responses, learns global spatial structure, and performs the final classification.
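Two training regimes are possible. In fixed quanvolution, the quantum circuit is not optimized, and only the classical parameters are trained while $\theta$ is held fixed:
$$\omega^{*} = \arg\min_{\omega} \sum_{(X, y) \in \mathcal{D}} \mathcal{L}\big(f_{\mathrm{CNN}}(Z; \omega),\, y\big).$$
In trainable quanvolution, both the quantum and classical parameters are optimized:
$$(\theta^{*}, \omega^{*}) = \arg\min_{\theta, \omega} \sum_{(X, y) \in \mathcal{D}} \mathcal{L}\big(f_{\mathrm{CNN}}(Z(\theta); \omega),\, y\big),$$
where $\mathcal{D}$ is the labeled training set, $\mathcal{L}$ is the classification loss, and $Z(\theta)$ makes explicit that the quantum feature tensor depends on the circuit parameters.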
The fixed setting is generally more stable and computationally efficient, while the trainable setting provides greater task adaptivity but increases optimization cost and measurement sensitivity. The main advantage of Archetype A is that the quantum module directly participates in early local representation learning. This makes it attractive for texture-sensitive diagnostic tasks, such as COVID-19 screening from chest X-ray or CT images. However, because the quantum circuit is applied patch by patch, the total quantum execution cost scales with the number of patches, measurement shots, circuit depth, and measured observables:
$$\mathcal{C}_{A} = \mathcal{O}(N \cdot S \cdot D \cdot M),$$
where $N$ is the number of patches, $S$ is the number of shots, $D$ is the circuit depth, and $M$ is the number of measured observables.
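To make this scaling concrete, the proxy $N \cdot S \cdot D \cdot M$ can be evaluated for a few image sizes. The numbers are purely illustrative: $S = 1000$ matches one of the shot counts reported in [43], while $D = 4$, $M = 4$, the image sizes, and the helper name `quanvolution_cost` are assumptions chosen for the example. The result is a workload count, not a measured runtime.

```python
def quanvolution_cost(H: int, W: int, p: int, shots: int, depth: int, observables: int):
    """Return (N, N*S*D*M) for non-overlapping p x p patches; a proxy, not a runtime."""
    n_patches = (H // p) * (W // p)
    return n_patches, n_patches * shots * depth * observables


if __name__ == "__main__":
    for side in (28, 64, 224):
        n, cost = quanvolution_cost(side, side, p=2, shots=1000, depth=4, observables=4)
        print(f"{side}x{side}: N = {n:>6}, N*S*D*M = {cost:,}")
```

Doubling the image side quadruples $N$, and therefore the proxy cost, which is why resolution is the dominant scaling factor for patch-level designs.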
In summary, Archetype A is a patch-level quantum preprocessing architecture. The quantum module enriches local image patches, while the classical CNN performs global aggregation and final diagnosis. As shown in Figure 4, this architecture has a clear classical–quantum–classical structure, but its scalability depends strongly on patch count, shot efficiency, and hardware-aware circuit execution.
4.2. Archetype B: Classical Feature Extractor with Quantum Classifier Head
Archetype B describes hybrid architectures in which the quantum module is placed at the decision stage of the diagnostic pipeline. Unlike Archetype A, the quantum circuit does not process raw image patches. Instead, a classical CNN or pretrained backbone first learns a high-level feature representation from the input image. This representation is then compressed into a low-dimensional vector and passed to a quantum classifier head. The conceptual structure of this archetype is shown in Figure 5. This architecture reflects a practical constraint of near-term quantum computing: high-dimensional medical images cannot be directly processed by small quantum circuits. Therefore, the classical network performs the main representation-learning task, while the quantum module performs compact nonlinear decision mapping. The architecture consists of four functional stages: classical feature extraction, feature compression, quantum decision mapping, and classical output formation.
After classical preprocessing, $\tilde{X} = \mathcal{P}(X)$, the image is passed through a classical feature extractor:
$$h = F_{\beta}(\tilde{X}) \in \mathbb{R}^{d_h},$$
where $F_{\beta}(\cdot)$ denotes a CNN or a pretrained backbone with parameters $\beta$, and $h$ is a high-dimensional semantic embedding. This stage learns global visual patterns, including opacity distribution, texture changes, and disease-related lung structures.
Since the embedding dimension $d_h$ is usually larger than the feasible quantum input dimension, a classical adapter compresses $h$ into a compact vector:
$$u = A_{\eta}(h) \in \mathbb{R}^{r}, \quad r \ll d_h.$$
The adapter may be a dense layer, projection layer, bottleneck block, or dimensionality-reduction module. The vector $u$ is the classical-to-quantum interface.
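One concrete choice of dimensionality-reduction adapter $A_{\eta}$ is principal component analysis (PCA) fitted on training embeddings. The NumPy sketch below assumes $d_h = 512$ and $r = 4$, uses random vectors as stand-ins for backbone embeddings, and uses the hypothetical names `fit_pca_adapter` and `apply_adapter`; a trainable dense layer would be an equally valid adapter and is what end-to-end fine-tuning would require.

```python
import numpy as np


def fit_pca_adapter(H_train: np.ndarray, r: int):
    """Fit a PCA projection on training embeddings (rows = images). Returns (mean, components)."""
    mean = H_train.mean(axis=0)
    _, _, vt = np.linalg.svd(H_train - mean, full_matrices=False)
    return mean, vt[:r]                      # components: (r, d_h), orthonormal rows


def apply_adapter(h: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """u = A(h): project centered embeddings onto the top-r principal directions."""
    return (h - mean) @ components.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H_train = rng.normal(size=(200, 512))   # stand-in for backbone embeddings h
    mean, comps = fit_pca_adapter(H_train, r=4)
    u = apply_adapter(H_train, mean, comps)
    print(u.shape)                           # (200, 4): one compact vector per image
```

The projected components are unbounded, so they would still need rescaling (for example through the constants $\alpha_j$) before angle encoding, and the mean and components should be fitted on training data only to avoid leakage into the test split.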
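The compressed vector $u$ is encoded into an $n$-qubit state:
$$|\psi(u)\rangle = E(u).$$
For angle encoding,
$$|\psi(u)\rangle = \bigotimes_{j=1}^{n} R_y(\alpha_j u_j)\,|0\rangle,$$
where $u_j$ is the $j$-th component of the compressed embedding. Under this encoding, $n$ typically equals $r$ (one feature per qubit), so $r$ is usually selected according to the available qubit budget. The encoded state is processed by a variational quantum classifier:
$$|\phi(u)\rangle = U(\theta)\,|\psi(u)\rangle,$$
where $U(\theta)$ is a trainable quantum circuit. A generic variational classifier can be written as
$$U(\theta) = \prod_{l=1}^{D} U_{\mathrm{ent}}\, U_{\mathrm{rot}}(\theta_l),$$
where $U_{\mathrm{rot}}(\theta_l)$ contains parameterized rotations, $U_{\mathrm{ent}}$ contains entangling gates, and $D$ is the circuit depth. The final quantum state is measured through observables $\{O_k\}_{k=1}^{M}$:
$$q_k = \langle \phi(u) | O_k | \phi(u) \rangle, \quad k = 1, \dots, M.$$
The resulting measurement vector is
$$q = [q_1, q_2, \dots, q_M]^{\top} \in \mathbb{R}^{M}.$$
In this archetype, $q$ is interpreted as decision evidence, class-related features, or a compact representation for final logit formation.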
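A forward pass through such a head can be sketched with a small dense statevector simulation. The NumPy example below rests on stated assumptions: $n = r = 4$ qubits, $R_y(\pi u_j)$ angle encoding, $D$ layers of $R_y$ rotations followed by a CNOT ring, exact Pauli-$Z$ expectations ($M = n$), and a linear map standing in for the lightweight output layer described next. The weights are placeholders and the function names are hypothetical.

```python
import numpy as np


def ry(angle: float) -> np.ndarray:
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def kron_all(mats) -> np.ndarray:
    """Tensor product with qubit 0 as the most significant bit."""
    out = np.array([[1.0]])
    for m in mats:
        out = np.kron(out, m)
    return out


def cnot_ring(n: int) -> np.ndarray:
    """Matrix for CNOT(0,1), CNOT(1,2), ..., CNOT(n-1,0) applied in that order."""
    dim = 2 ** n
    U = np.eye(dim)
    for control in range(n):
        target = (control + 1) % n
        P = np.zeros((dim, dim))
        for b in range(dim):
            flip = (b >> (n - 1 - control)) & 1
            P[b ^ (flip << (n - 1 - target)), b] = 1.0
        U = P @ U                                    # later gates act after earlier ones
    return U


def z_expectations(state: np.ndarray, n: int) -> np.ndarray:
    """Exact <Z_j> for every qubit j."""
    bits = (np.arange(2 ** n)[:, None] >> (n - 1 - np.arange(n))) & 1
    return (np.abs(state) ** 2) @ (1 - 2 * bits)


def quantum_head(u, thetas, W_out, b_out, alpha=np.pi):
    """u -> E(u) -> U(theta) -> q -> logits s -> predicted class index."""
    n = len(u)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = kron_all([ry(alpha * x) for x in u]) @ state        # angle encoding
    ent = cnot_ring(n)
    for layer in thetas:                                          # D variational layers
        state = ent @ (kron_all([ry(t) for t in layer]) @ state)
    q = z_expectations(state, n)                                  # measurement vector q
    logits = W_out @ q + b_out                                    # linear output layer
    return q, logits, int(np.argmax(logits))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.uniform(0, 1, size=4)                       # compressed embedding, assumed in [0, 1]
    thetas = rng.uniform(0, 2 * np.pi, size=(2, 4))     # D = 2 layers (untrained here)
    W_out, b_out = rng.normal(size=(2, 4)), np.zeros(2) # K = 2 classes
    q, logits, pred = quantum_head(u, thetas, W_out, b_out)
    print(q.round(3), logits.round(3), pred)
```

Only one circuit is simulated per image, in contrast with the per-patch loop of Archetype A; in training, $\theta$ and the output weights would be optimized jointly with (or on top of) the classical backbone.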
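A lightweight classical output layer maps the quantum measurement vector to class logits:
$$s = G_{\omega}(q) \in \mathbb{R}^{K},$$
where $G_{\omega}(\cdot)$ may be a linear layer or small dense classifier, and $K$ is the number of diagnostic classes. The final predicted class can be obtained as
$$\hat{y} = \arg\max_{c \in \{1, \dots, K\}} s_c.$$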
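This setting includes two common variants. In a variational quantum classifier head, the quantum circuit directly produces decision features or class evidence. In a dressed quantum circuit head, small classical layers are placed before and after the quantum circuit to improve trainability, stabilize feature scaling, and calibrate logits. The supervised training objective can be expressed as
$$\mathcal{J}(\beta, \eta, \theta, \omega) = \frac{1}{|\mathcal{D}|} \sum_{(X, y) \in \mathcal{D}} \mathcal{L}\Big(G_{\omega}\big(\mathcal{M}\big[U(\theta)\, E\big(A_{\eta}(F_{\beta}(\mathcal{P}(X)))\big)\big]\big),\, y\Big),$$
where $F_{\beta}$, $A_{\eta}$, $U(\theta)$, and $G_{\omega}$ are the backbone, adapter, quantum circuit, and output layer, $E$ is the encoding map, $\mathcal{M}[\cdot]$ denotes quantum measurement, $\mathcal{L}$ is the classification loss, and $\mathcal{D}$ is the labeled training set. Depending on data availability, the classical backbone may be frozen at its (e.g., pretrained) parameters $\beta_0$, and only the adapter and quantum head may be trained:
$$(\eta^{*}, \theta^{*}, \omega^{*}) = \arg\min_{\eta, \theta, \omega} \mathcal{J}(\beta_0, \eta, \theta, \omega),$$
or the entire hybrid model may be fine-tuned end-to-end:
$$(\beta^{*}, \eta^{*}, \theta^{*}, \omega^{*}) = \arg\min_{\beta, \eta, \theta, \omega} \mathcal{J}(\beta, \eta, \theta, \omega).$$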
Archetype B typically requires only one or a few quantum-circuit evaluations per image. Its quantum execution cost can be approximated as
$$\mathcal{C}_{B} = \mathcal{O}(S \cdot D \cdot M),$$
where $S$ is the number of shots, $D$ is the circuit depth, and $M$ is the number of measured observables. Unlike Archetype A, this cost does not scale directly with the number of image patches. In summary, Archetype B is a classical-representation/quantum-decision architecture. The classical backbone learns high-level image representations, while the quantum head performs compact nonlinear decision refinement. As shown in Figure 5, this structure is more scalable than patch-wise quanvolution, but its quantum contribution depends strongly on the quality of the classical embedding. Therefore, fair evaluation requires strong classical baselines and ablation studies that isolate the effect of the quantum classifier head.
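The practical gap between the cost expressions of Archetypes A and B can be illustrated by fixing the same hypothetical settings for both ($S = 1000$ shots, $D = 4$, $M = 4$, $2 \times 2$ patches on a $224 \times 224$ image). The helper name `per_image_cost` is hypothetical, and the counts are proxies for quantum workload rather than measured latencies.

```python
def per_image_cost(circuit_calls: int, shots: int, depth: int, observables: int) -> int:
    """Proxy workload: circuit calls x shots x depth x measured observables."""
    return circuit_calls * shots * depth * observables


if __name__ == "__main__":
    side, p = 224, 2
    n_patches = (side // p) ** 2                     # Archetype A: one circuit per patch
    cost_a = per_image_cost(n_patches, 1000, 4, 4)
    cost_b = per_image_cost(1, 1000, 4, 4)           # Archetype B: one circuit per image
    print(cost_a, cost_b, cost_a // cost_b)          # 200704000 16000 12544
```

Under equal circuit settings the ratio equals the number of patches $N$, so the advantage of a decision-stage head grows with image resolution.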
4.3. Archetype C: Quantum Feature Extractor with Classical Classifier
Archetype C represents hybrid architectures in which the quantum circuit is positioned as an intermediate feature extractor. In this design, the image is first converted into a compact classical representation. The quantum circuit then transforms this compact representation into a quantum-derived feature vector. Unlike Archetype B, the quantum module is not primarily treated as the final classifier. Instead, the final decision is made by a downstream classical classifier operating on the quantum-generated feature space. The conceptual structure of this archetype is shown in Figure 6. This architecture occupies a middle position between Archetype A and Archetype B. It avoids the patch-wise scaling cost of quanvolution while assigning the quantum circuit a more explicit feature-generation role than a decision-stage quantum head.
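Let the input medical image be $X \in \mathbb{R}^{H \times W \times C}$. After classical preprocessing, $\tilde{X} = \mathcal{P}(X)$, a lightweight classical mapping constructs an initial compact representation:
$$a = F_{\beta}(\tilde{X}) \in \mathbb{R}^{d_a},$$
where $F_{\beta}(\cdot)$ may be a shallow CNN, handcrafted feature extractor, small embedding network, or feature-construction block.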
Because the quantum circuit requires a small-dimensional input, an additional compression step is applied:
$$u = A_{\eta}(a) \in \mathbb{R}^{r}, \quad r \ll d_a.$$
The vector $u$ forms the classical-to-quantum interface. The quality of this vector is critical because excessive compression can remove clinically relevant information before quantum processing.
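The compact vector $u$ is encoded into a quantum state:
$$|\psi(u)\rangle = E(u).$$
Using angle encoding,
$$|\psi(u)\rangle = \bigotimes_{j=1}^{n} R_y(\alpha_j u_j)\,|0\rangle,$$
where $u_j$ is the $j$-th component of the compact vector.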
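The encoded state is processed by a parameterized quantum feature extractor:
$$|\phi(u)\rangle = U(\theta)\,|\psi(u)\rangle.$$
The quantum circuit can be represented as
$$U(\theta) = \prod_{l=1}^{D} U_{\mathrm{ent}}\, U_{\mathrm{rot}}(\theta_l),$$
where $U_{\mathrm{rot}}(\theta_l)$ contains trainable rotation gates, $U_{\mathrm{ent}}$ introduces entanglement, and $D$ is the circuit depth. The purpose of this circuit is to generate a nonlinear quantum-derived representation, not to directly produce final class logits.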
The circuit output is measured through a set of observables $\{O_k\}_{k=1}^{m}$:
$$q_k = \langle \phi(u) | O_k | \phi(u) \rangle, \quad k = 1, \dots, m.$$
The resulting quantum-derived feature vector is
$$q = [q_1, q_2, \dots, q_m]^{\top} \in \mathbb{R}^{m}.$$
Here, $m$ is the number of quantum-derived features and does not necessarily equal the number of classes. This distinction is important: in Archetype C, $q$ is treated as a feature embedding rather than as the final diagnostic output.
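To illustrate that $m$ need not equal the number of classes, the sketch below extracts $m = 2n - 1$ features from an $n = 4$ qubit circuit, namely the single-qubit expectations $\langle Z_j \rangle$ and the nearest-neighbour correlators $\langle Z_j Z_{j+1} \rangle$, and feeds them to a linear softmax classifier with $K = 3$ classes. The observable set, the circuit layout (rotation layers followed by a CNOT ring), the placeholder weights, and all function names are assumptions for illustration only.

```python
import numpy as np


def ry(angle: float) -> np.ndarray:
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])


def layer_matrix(angles) -> np.ndarray:
    """Tensor product of R_y gates, qubit 0 = most significant bit."""
    out = np.array([[1.0]])
    for a in angles:
        out = np.kron(out, ry(a))
    return out


def cnot_ring(n: int) -> np.ndarray:
    """Matrix for CNOT(0,1), ..., CNOT(n-1,0) applied in that order."""
    dim = 2 ** n
    U = np.eye(dim)
    for control in range(n):
        target = (control + 1) % n
        P = np.zeros((dim, dim))
        for b in range(dim):
            flip = (b >> (n - 1 - control)) & 1
            P[b ^ (flip << (n - 1 - target)), b] = 1.0
        U = P @ U
    return U


def quantum_features(u, thetas, alpha=np.pi) -> np.ndarray:
    """Return q = [<Z_0>..<Z_{n-1}>, <Z_0 Z_1>..<Z_{n-2} Z_{n-1}>], so m = 2n - 1."""
    n = len(u)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    state = layer_matrix(alpha * np.asarray(u)) @ state         # angle encoding E(u)
    ent = cnot_ring(n)
    for layer in thetas:                                          # U(theta), D layers
        state = ent @ (layer_matrix(layer) @ state)
    probs = np.abs(state) ** 2
    bits = (np.arange(2 ** n)[:, None] >> (n - 1 - np.arange(n))) & 1
    z = 1 - 2 * bits                                              # Z eigenvalue per basis state
    singles = probs @ z
    pairs = np.array([probs @ (z[:, j] * z[:, j + 1]) for j in range(n - 1)])
    return np.concatenate([singles, pairs])


def classical_classifier(q, W, b) -> np.ndarray:
    """Linear softmax classifier G(q): returns class probabilities."""
    logits = W @ q + b
    e = np.exp(logits - logits.max())
    return e / e.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = quantum_features(rng.uniform(0, 1, 4), rng.uniform(0, 2 * np.pi, (2, 4)))
    probs = classical_classifier(q, rng.normal(size=(3, q.size)), np.zeros(3))
    print(q.size, probs.round(3))   # m = 7 quantum features, K = 3 class probabilities
```

The classifier sees a 7-dimensional quantum-derived embedding and produces 3 class scores, which is the separation of roles that distinguishes Archetype C from a decision-stage quantum head.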
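The feature vector $q$ is passed to a classical classifier:
$$s = G_{\omega}(q) \in \mathbb{R}^{K},$$
where $G_{\omega}(\cdot)$ may be a small multilayer perceptron, logistic regression layer, support vector machine, or custom classifier, and $K$ is the number of diagnostic classes. The final class prediction is
$$\hat{y} = \arg\max_{c \in \{1, \dots, K\}} s_c.$$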
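The supervised learning objective can be written as
$$\min_{\beta, \eta, \theta, \omega} \frac{1}{|\mathcal{D}|} \sum_{(X, y) \in \mathcal{D}} \mathcal{L}\Big(G_{\omega}\big(\mathcal{M}\big[U(\theta)\, E\big(A_{\eta}(F_{\beta}(\mathcal{P}(X)))\big)\big]\big),\, y\Big),$$
where $\mathcal{M}[\cdot]$ denotes measurement of the $m$ observables, $\mathcal{L}$ is the classification loss, and $\mathcal{D}$ is the labeled training set.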
In small-data scenarios, the classical front-end may be fixed or lightly tuned, while the quantum feature extractor and classical classifier are optimized. When sufficient data are available, all components can be trained jointly. The quantum execution cost of Archetype C can be approximated as
$$\mathcal{C}_{C} = \mathcal{O}(S \cdot D \cdot m),$$
where $S$ is the number of shots, $D$ is the circuit depth, and $m$ is the number of measured quantum features. Similar to Archetype B, this cost is bounded because the quantum circuit is not repeatedly applied to all image patches. However, unlike Archetype B, the quantum output is interpreted as a feature representation rather than as a decision head. In summary, Archetype C is a quantum-feature/classical-decision architecture. The classical front-end prepares compact inputs, the quantum circuit generates a discriminative feature embedding, and the classical classifier performs the final diagnosis. As shown in Figure 6, this design provides a balanced hybrid structure: it avoids patch-wise scaling while giving the quantum module a direct representational role. Its effectiveness depends on whether the quantum-derived feature space improves class separability beyond a strong classical feature baseline.
4.4. Instantiation of the Three Hybrid Architectural Archetypes in Reviewed Studies
The taxonomy introduced in Section 4.1, Section 4.2 and Section 4.3 categorizes hybrid quantum–classical architectures for COVID-19 and related medical-imaging diagnosis according to the functional role and placement of the quantum module within the diagnostic pipeline. Accordingly, the quantum component may operate at the early image-processing stage on local image patches, serve as a quantum classifier or decision-refinement head at the end of the network, or function as an intermediate quantum feature-extraction or feature-generation module. To operationalize this taxonomy, the ten peer-reviewed journal studies selected for the final comparative analysis are mapped in Table 3 by imaging modality, dataset context, diagnostic task, classical component, quantum component, hybrid coupling location, and corresponding architectural archetype.
Table 3 shows that current hybrid quantum–classical medical-imaging studies are concentrated mainly in Archetype A and Archetype B, whereas Archetype C remains comparatively less explored. Each study is categorized by the dominant functional role of the quantum module in the diagnostic pipeline, rather than by model name, dataset type, or reported performance. Secondary operational tags capture mixed design elements, such as CGAN-based augmentation, pretrained transfer learning, custom CNN feature extraction, or direct QNN-based classification, without introducing additional archetypes.
The distribution of studies in Table 3 reflects two dominant design philosophies. The first strategy places the quantum module early in the pipeline, where it functions as a quantum-enhanced local image transformer. The second strategy retains a strong classical feature extractor and introduces the quantum circuit only at the classification or decision-refinement stage. A third, less frequently adopted strategy uses the quantum circuit as an intermediate feature extractor, passing the quantum-generated features to a classical classifier.
Archetype A includes models in which quantum circuits process local image regions before the main classical classifier. Refs. [43,64] are representative examples of this design. In both cases, local CXR or CT image patches are encoded into shallow quantum circuits and transformed into feature maps that are subsequently processed by classical CNN layers. Ref. [65] is also assigned to Archetype A because the dominant role of the quantum module is quanvolutional or QNN-based image processing after CGAN-assisted CT data generation; the CGAN component is therefore captured as a secondary operational tag rather than as a separate archetype. Ref. [66] also falls into this group as a hybrid QCNN lung-imaging comparator, although it is not a direct COVID-19 detection study because its task involves normal, lung opacity, and viral pneumonia classification. The main advantage of Archetype A is that the quantum circuit participates directly in early feature formation, potentially enriching local nonlinear representations before classical aggregation. However, this benefit comes with an important scalability cost: patch-wise quantum processing requires repeated circuit evaluations across many local image regions, thereby increasing the computational burden with image resolution, patch count, measurement shots, and circuit depth.
Archetype B includes models in which a classical network first extracts compact feature representations, and the quantum module is then used as a classifier or decision-refinement head. Refs. [45,47,67,68,69] follow this strategy in different forms. In these architectures, the quantum circuit does not process raw image patches directly; instead, it receives lower-dimensional representations generated by a pretrained CNN, a custom CNN, a deep learning pipeline, or classical preprocessing. This design is more compatible with near-term quantum constraints because the quantum module operates on compressed feature embeddings rather than high-dimensional image data. It also reduces the number of quantum circuit invocations per image compared with Archetype A. The trade-off is that the quantum component may contribute mainly to decision refinement rather than to low-level image representation, making it more difficult to determine whether performance gains arise from the quantum module itself or from the strength of the classical feature extractor.
Archetype C is represented by models that use the quantum component as an intermediate feature-generation or feature-extraction mechanism before a classical classifier. Rao and Rajitha’s HQF-CC model [70] is the clearest example of this category. In this architecture, an MMS quantum-based feature extractor generates quantum-enhanced representations, which are then passed to a custom classical classifier for the final decision. This design differs from Archetype B because the quantum module is not merely a final classifier head; instead, it explicitly constructs an intermediate feature representation. It also differs from Archetype A because the quantum circuit is not repeatedly applied to dense local image patches. As a result, Archetype C offers a compromise between representational involvement and computational scalability. Nevertheless, the limited number of studies in this category suggests that intermediate quantum-feature generation strategies remain underdeveloped in COVID-19 and respiratory imaging.
Overall, Table 3 supports three key observations. First, most reviewed studies do not use quantum models as standalone diagnostic systems; rather, quantum circuits are embedded within classical medical-imaging pipelines. Second, the placement of the quantum component strongly determines the computational behavior of the model: patch-level quantum preprocessing maximizes direct quantum participation but is less scalable, whereas decision-stage quantum classifiers are more computationally bounded but may provide less interpretable quantum contributions. Third, the secondary operational tags reveal that hybrid quantum–classical models are methodologically heterogeneous even within the same archetype. Studies differ in modality, task formulation, dataset source, feature-extraction strategy, data augmentation mechanism, and hybrid coupling design. This heterogeneity motivates a more detailed evaluation in the following tables, where diagnostic performance, quantum-resource reporting, validation rigor, and deployment feasibility are analyzed separately.
4.5. Diagnostic Performance and Evaluation Context of Reviewed HQC Models
Table 4 summarizes the reported diagnostic performance and evaluation protocols for the 10 reviewed hybrid quantum–classical or quantum-assisted models for COVID-19 and related respiratory/lung imaging tasks. Each study is contextualized by its dataset and evaluation protocol, baseline or comparator models, best-reported performance, reported clinical metrics, and main empirical observation.
Table 4 shows that the strongest reported results are generally obtained in binary COVID-19 classification, particularly in CT- and CXR-based studies, where the classification boundary is limited to COVID-19 versus normal or COVID-19 versus non-COVID cases. For example, Ref. [43] reports approximately 98–99% performance in binary CXR experiments, while Ref. [64] reports 99.39% and 97.91% accuracy across two CT datasets. Ref. [69] also reports very high binary COVID-CT performance using DI-QL. These results suggest that current hybrid quantum–classical pipelines are most effective when the diagnostic task is well constrained and the number of target classes is limited. However, Table 4 also shows that high binary performance does not necessarily translate into robust multiclass respiratory classification. The clearest example is [43], where binary experiments achieve high performance, but the multiclass experiment drops to 82.6% accuracy.
Similarly, the lung X-ray comparator study in [66] shows that adding a quantum convolutional layer does not automatically improve performance: the classical CNN achieves 91% accuracy, whereas the best hybrid QCNN reaches 87%. This is an important cautionary result because it indicates that quantum integration alone does not guarantee improved diagnostic accuracy.
Across the reviewed studies, baseline selection strongly affects how the quantum contribution should be interpreted. Models that combine a strong pretrained or custom CNN feature extractor with a quantum decision stage, such as [45,67,68], often report strong performance. Nevertheless, in these cases, part of the improvement may come from the representational power of the classical backbone rather than from the quantum module alone. Therefore, fair interpretation requires comparing each hybrid model not only with weak classical baselines but also with strong CNNs, pretrained CNNs, and classical machine learning alternatives.
Table 4 further highlights substantial heterogeneity in evaluation protocols. Some studies use fixed train–test splits, others use random hold-out validation, while Ref. [64] additionally reports 5-fold cross-validation. Patient-wise separation, independent external validation, and confidence intervals are inconsistently reported, which limits direct comparability across studies. Consequently, reported accuracy values should not be interpreted as direct evidence of quantum advantage unless the dataset source, split strategy, baseline strength, and clinical validation rigor are considered together.
Overall, Table 4 supports three conclusions. First, hybrid quantum–classical models show promising diagnostic performance, especially for binary COVID-19 screening. Second, multiclass classification and broader respiratory-disease discrimination remain more challenging and reveal performance instability. Third, claims of quantum advantage remain preliminary because most studies differ in datasets, baselines, validation protocols, and metric completeness. These issues motivate the subsequent analyses of quantum-resource reporting, validation rigor, and deployment feasibility in Table 5, Table 6 and Table 7.
5. Cross-Study Synthesis: Quantum Resources, Validation Rigor, and Deployment Feasibility
Building on Table 3 and Table 4, this section synthesizes the reviewed HQC studies across quantum-resource reporting, validation rigor, and deployment readiness. These dimensions are essential because high internal accuracy alone does not establish clinical relevance, practical quantum utility, or real-world feasibility.
5.1. Quantum-Resource Reporting and Reproducibility
Table 5 summarizes the quantum resource and implementation characteristics of the reviewed studies. The comparison includes the encoding or feature-map strategy, qubit count, circuit depth, circuit or ansatz structure, backend and shot reporting, trainable quantum parameters, and quantum invocation pattern. These factors are central to reproducibility because they determine whether a reported hybrid model can be independently implemented, simulated, or migrated to near-term quantum hardware.
Across the reviewed studies, small-qubit and shallow-circuit designs dominate. Here, small-qubit refers to quantum modules with a limited number of qubits, typically around four qubits in several reviewed models, while shallow-circuit denotes circuits with only a small number of sequential gate layers. This design choice is consistent with NISQ-era constraints, where limited qubit availability, gate noise, decoherence, and measurement uncertainty restrict the feasible width and depth of quantum circuits. Patch-level quanvolutional models typically encode local image regions into compact quantum circuits, often using 2 × 2 patches or low-dimensional feature blocks; Refs. [43,64,65,66] are representative of this design family. In contrast, decision-stage models such as [45,67,68,69] apply quantum circuits after classical feature extraction, thereby reducing the dimensionality of the quantum input and limiting the number of quantum calls per image.
A key observation from Table 5 is that quantum-resource reporting remains inconsistent. Some studies provide useful implementation details, including the number of qubits, Qdepth, rotational gates, CNOT entanglement, Pauli-Z measurements, and PennyLane/Qiskit implementation. For example, Ref. [70] reports Qdepth values of 4 and 6 with 72 and 114 parameters, respectively, while Ref. [68] reports a four-qubit classifier with Qdepth 4 and 72 total parameters. Ref. [43] also reports shot-based experimentation with 500 and 1000 shots. However, several studies do not fully report the exact qubit count, transpiled circuit depth, number of shots, backend configuration, noise model, error-mitigation strategy, or per-image quantum latency.
This lack of standardization limits the interpretability of the claimed quantum advantage. Without explicit information about circuit depth, shot count, backend, and noise assumptions, it is difficult to determine whether reported gains originate from the quantum module, the classical backbone, dataset characteristics, or simulation-specific conditions. Therefore, the reviewed evidence supports the need for a minimum quantum-reporting checklist that includes encoding strategy; qubit count; circuit depth before and after transpilation; trainable quantum parameters; shots; backend; noise model; error mitigation; and the number of circuit calls per image.
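The proposed checklist can be made machine-checkable. The Python dataclass below is one hypothetical way to record the listed items and flag missing ones; the class name and field names are illustrative and do not correspond to any existing reporting standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Union


@dataclass
class QuantumResourceReport:
    """One field per checklist item; None means 'not reported'."""
    encoding: Optional[str] = None
    n_qubits: Optional[int] = None
    depth_before_transpilation: Optional[int] = None
    depth_after_transpilation: Optional[int] = None
    trainable_quantum_params: Optional[int] = None
    shots: Union[int, str, None] = None          # e.g. 1000, or "exact" for statevector simulation
    backend: Optional[str] = None
    noise_model: Optional[str] = None            # e.g. "none (ideal simulator)"
    error_mitigation: Optional[str] = None       # e.g. "none"
    circuit_calls_per_image: Optional[int] = None

    def missing_items(self) -> List[str]:
        """Names of checklist items that were left unreported."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


if __name__ == "__main__":
    report = QuantumResourceReport(encoding="angle", n_qubits=4, shots=1000)
    print(report.missing_items())
```

Recording "none" explicitly, rather than leaving a field empty, distinguishes "no error mitigation was used" from "error mitigation was not reported".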
5.2. Validation Rigor and Medical-Imaging Evidence Quality
Table 6 evaluates the reviewed studies in terms of patient-level separation, external validation, dataset bias, metric completeness, and generalization risk. These factors are critical because image-level leakage, slice-level correlation, synthetic augmentation, and class imbalance can inflate reported performance.
The main limitation is that patient-wise splitting is rarely stated explicitly. Several studies report 70/30, 80/20, random hold-out, or 7:1:2 protocols without clarifying whether images or CT slices from the same patient were kept in the same subset. This is especially important for CT studies, where correlated slices may lead to overestimated generalization.
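Patient-wise separation is straightforward to enforce when each image carries a patient identifier. The plain-Python sketch below splits at the patient level, so that all images or CT slices from one patient land in the same subset; the `(image_id, patient_id)` record format and the function name are assumptions, and class stratification is omitted for brevity.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def patient_wise_split(records: List[Tuple[str, str]], test_fraction: float = 0.3,
                       seed: int = 0) -> Tuple[List[str], List[str]]:
    """records: (image_id, patient_id) pairs. Returns (train_ids, test_ids) with disjoint patients."""
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for image_id, patient_id in records:
        by_patient[patient_id].append(image_id)
    patients = sorted(by_patient)                    # deterministic order before shuffling
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test_patients = set(patients[:n_test])
    train, test = [], []
    for pid in patients:
        (test if pid in test_patients else train).extend(by_patient[pid])
    return train, test


if __name__ == "__main__":
    # 10 hypothetical patients with 5 CT slices each
    recs = [(f"p{i}_slice{j}", f"p{i}") for i in range(10) for j in range(5)]
    train_ids, test_ids = patient_wise_split(recs)
    print(len(train_ids), len(test_ids))             # 35 15
```

Note that `test_fraction` applies to patients, not images; when patients contribute different numbers of slices, the image-level proportions will differ from the nominal ratio, which is an acceptable price for preventing slice-level leakage.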
External validation is also limited. Most studies rely on internal splits of public, private, or constructed datasets, and prospective clinical validation is absent. Metric reporting varies across studies: some report sensitivity, specificity, F1-score, AUC, Kappa, MCC, or computational cost, but confidence intervals and uncertainty estimates are generally missing. Overall, Table 6 shows that validation design remains a major barrier to clinical credibility.
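As a minimal example of the metric set and uncertainty reporting that is often missing, the functions below compute sensitivity, specificity, F1-score, and MCC from a binary confusion matrix, together with Wilson score 95% intervals for sensitivity and specificity. The Wilson interval is one standard choice among several (bootstrap intervals are a common alternative); the function names and the counts in the usage example are illustrative.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if total == 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, F1 and MCC, with Wilson intervals for the first two."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


if __name__ == "__main__":
    m = binary_metrics(tp=90, fp=20, tn=80, fn=10)
    print(round(m["sensitivity"], 3), [round(x, 3) for x in m["sensitivity_ci"]])
```

With 100 positive cases, a sensitivity of 0.90 still carries an interval roughly 12 percentage points wide, which illustrates why point estimates alone are hard to compare across small test sets.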
5.3. Computational Scaling and Deployment Feasibility
Table 7 summarizes computational scaling and deployment readiness. HQC models introduce additional costs beyond conventional CNNs, including circuit construction, repeated execution, measurement shots, backend latency, and hardware noise.
A key trade-off is between quantum involvement and deployment feasibility. Patch-wise models give the quantum circuit a direct feature-extraction role, but their cost scales with image resolution, patch count, shots, and circuit evaluations. This makes the models in [43,64,65,66] less suitable for latency-sensitive deployment unless hardware-aware optimization is reported. In contrast, decision-stage models usually invoke the quantum circuit once or only a few times per image, making [45,67,68,69] more plausible candidates for early hardware testing.
Hardware-aware reporting remains limited. Most studies rely on simulators, and only a few report real or near-real backend validation. Study [67] is notable for IBM Q-QASM circuit validation, but full end-to-end clinical latency is still not established. Overall, deployment readiness remains preliminary.
5.4. Integrated Methodological Synthesis
Taken together, Table 3, Table 4, Table 5, Table 6 and Table 7 show that current HQC models for COVID-19 and respiratory imaging are best interpreted as quantum-augmented classical pipelines rather than replacements for classical deep learning. The reviewed studies demonstrate that quantum modules can be placed at different points of the diagnostic workflow, but the practical consequences of this placement differ substantially. Patch-level quantum preprocessing provides the most direct quantum involvement in image representation, but it scales poorly with image resolution and measurement cost. Decision-stage quantum classifiers are more computationally bounded and more compatible with NISQ limitations, but their contribution may be incremental relative to the classical feature extractor. Intermediate quantum feature-generation models offer a potentially useful compromise, but this design remains less explored in the reviewed literature.
Empirically, the strongest results are concentrated in binary COVID-19 screening. Multiclass respiratory classification remains challenging and often results in performance degradation. This suggests that current HQC models may be more effective at separating strongly distinct categories than at performing fine-grained differential diagnosis among visually overlapping respiratory diseases. Methodologically, the most important unresolved issues are validation and reproducibility. Patient-wise splitting, external validation, confidence intervals, and hardware-aware quantum-resource reporting are inconsistently addressed. Without these elements, it is difficult to determine whether reported improvements reflect genuine hybrid quantum–classical utility or are driven by dataset-specific effects, strong classical backbones, synthetic augmentation, or permissive evaluation protocols. Therefore, progress in HQC medical imaging should not be measured only by higher reported accuracy. Future studies must combine strong classical baselines, transparent quantum-resource reporting, patient-level validation, independent external testing, and realistic hardware-aware deployment analysis. Only under these conditions can the field meaningfully assess whether hybrid quantum–classical models provide clinically relevant advantages over well-optimized classical deep learning systems.
6. Challenges and Open Issues in Hybrid Quantum–Classical COVID-19 Imaging
Despite encouraging experimental results, hybrid quantum–classical (HQC) models for COVID-19 and related respiratory medical imaging remain at an exploratory stage. The comparative analyses in Section 4 and Section 5 show that reported performance depends strongly on the architectural placement of the quantum module, dataset characteristics, evaluation protocol, quantum resource availability, and validation rigor. Therefore, the main challenges are not purely algorithmic; they span data quality, medical-imaging evidence standards, NISQ hardware limitations, quantum-resource reporting, optimization stability, interpretability, and deployment feasibility.
Table 8 summarizes the principal open issues identified in the comparative synthesis. Each challenge is linked to the evidence presented in Tables 3, 4, 5, and 6, including diagnostic performance heterogeneity, quantum-resource reporting, validation quality, and computational deployment constraints.
Table 8 indicates that the main bottlenecks of HQC COVID-19 imaging systems are systemic rather than only architectural. Although several studies report high binary COVID-19 screening performance, these results are difficult to compare directly because the reviewed works differ in modality, dataset composition, task formulation, split strategy, validation protocol, backend, and baseline models. This heterogeneity makes it difficult to attribute reported gains solely to the quantum component.
From a medical-imaging perspective, validation rigor remains a major limitation. Patient-wise splitting is not consistently reported, and external validation is rare. This is particularly important for CT-based studies, where multiple slices from the same patient may be highly correlated. If slices from the same patient appear in both training and testing subsets, performance may be inflated. Moreover, accuracy alone is insufficient for assessing clinical utility; sensitivity, specificity, AUROC, F1-score, confidence intervals, calibration, and per-class results are necessary for evaluating diagnostic reliability.
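To illustrate the leakage problem, the following minimal Python sketch contrasts a slice-level random split with a patient-wise split. It assumes a hypothetical record format of `(slice_id, patient_id)` pairs and a synthetic cohort; the function names are illustrative and are not taken from any reviewed study. The patient-wise variant assigns whole patients, rather than individual slices, to the test subset, so no patient can contribute slices to both sides.

```python
import random
from collections import defaultdict

def slice_wise_split(records, test_fraction=0.2, seed=0):
    # Naive split: shuffles individual slices and ignores patient identity.
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]

def patient_wise_split(records, test_fraction=0.2, seed=0):
    # Group slices by patient so that each patient lands in exactly one subset.
    by_patient = defaultdict(list)
    for slice_id, patient_id in records:
        by_patient[patient_id].append((slice_id, patient_id))
    patients = sorted(by_patient)  # fixed order before the seeded shuffle
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(test_fraction * len(patients)))
    test = [r for p in patients[:n_test] for r in by_patient[p]]
    train = [r for p in patients[n_test:] for r in by_patient[p]]
    return train, test

def leaked_patients(train, test):
    # Patients that contribute slices to both subsets (should be empty).
    return {p for _, p in train} & {p for _, p in test}

# Hypothetical cohort: 10 patients with 30 CT slices each.
records = [(f"p{p}_s{s}", f"p{p}") for p in range(10) for s in range(30)]

train, test = slice_wise_split(records)
print("slice-wise leaked patients:", len(leaked_patients(train, test)))

train, test = patient_wise_split(records)
print("patient-wise leaked patients:", len(leaked_patients(train, test)))
```

In practice, library utilities such as scikit-learn's `GroupShuffleSplit` or `GroupKFold`, with patient identifiers passed as groups, serve the same purpose.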
From a quantum-computing perspective, the reviewed models remain constrained by NISQ-era limitations. Most architectures use small-qubit, shallow-circuit designs, which are practical on current simulators and devices but limit expressivity. Quantum-resource reporting is also inconsistent. Several studies do not fully report circuit depth, number of shots, backend, noise model, error-mitigation strategy, trainable quantum parameters, or the number of quantum calls per image. Without these details, reproducibility and practical feasibility are difficult to assess.
Computational scaling is another important challenge. Patch-level quanvolutional models provide the most direct quantum involvement in image representation, but their cost grows with the number of image patches, measurement shots, circuit depth, and measured observables. In contrast, decision-stage and intermediate quantum feature-extraction models have a more bounded computational cost because the quantum circuit is invoked only once or a few times per image. However, these models depend heavily on the quality of the classical feature representation, and their quantum contribution must be validated using strong classical baselines and ablation studies.
Overall, progress toward clinically credible HQC systems will require more than new circuit designs or higher reported accuracy. Future studies should prioritize standardized benchmarking, transparent reporting of quantum resources, patient-level and external validation, uncertainty-aware metrics, hardware-aware evaluation, and interpretability analysis. These steps are necessary to determine whether hybrid quantum–classical models can provide practical, clinically meaningful advantages beyond well-optimized classical deep learning systems.
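The scaling contrast between patch-level and decision-stage designs can be made concrete with simple arithmetic. The sketch below assumes a quanvolutional filter applied to non-overlapping 2 × 2 patches (kernel 2, stride 2, no padding), one circuit evaluation per patch and filter, and a fixed shot count per evaluation. The `measurement_settings` parameter is an illustrative assumption covering the case where non-commuting observables must be measured in separate circuit runs. Circuit depth, transpilation, and queueing overheads are not modeled.

```python
def quanv_calls_per_image(height, width, kernel=2, stride=2, n_filters=1):
    """Circuit evaluations needed to apply `n_filters` quanvolutional filters
    to every valid (unpadded) kernel x kernel patch of one image."""
    rows = (height - kernel) // stride + 1
    cols = (width - kernel) // stride + 1
    return rows * cols * n_filters

def circuit_executions_per_image(calls_per_image, shots, measurement_settings=1):
    """Total backend executions: each call needs one run per measurement
    setting, each repeated `shots` times (shots=None: analytic simulation)."""
    return calls_per_image * measurement_settings * (shots if shots else 1)

# Patch-level quanvolution: cost grows with the number of patches.
for size in (28, 224, 512):
    calls = quanv_calls_per_image(size, size)
    print(size, calls, circuit_executions_per_image(calls, shots=1000))
# 28 196 196000
# 224 12544 12544000
# 512 65536 65536000

# Decision-stage quantum classifier head: one call per image.
print("head", circuit_executions_per_image(1, shots=1000))  # head 1000
```

Under these assumptions, moving from 28 × 28 to 224 × 224 inputs multiplies the number of quantum calls by 64, whereas a decision-stage head stays at one call per image regardless of input resolution.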
7. Future Research Directions
The transition of hybrid quantum–classical (HQC) medical-imaging models from exploratory prototypes to clinically meaningful systems requires progress along multiple interconnected directions. The analyses in Sections 4, 5, and 6 show that future work should not focus only on increasing reported accuracy or circuit complexity. Instead, progress depends on improving dataset quality, validation rigor, transparency of quantum resources, hardware-aware evaluation, interpretability, and fair benchmarking against strong classical baselines.
Table 9 summarizes the main future research directions derived from the observed limitations in the reviewed literature.
7.1. Near-Term Priorities
In the near term, the most urgent priority is to improve the methodological reliability of HQC evaluation. Future studies should use larger and more diverse datasets, explicitly report patient-wise train–test separation, and avoid slice-level leakage, particularly in CT-based studies. Performance should be reported using clinically meaningful metrics, including sensitivity, specificity, AUROC, F1-score, calibration, confidence intervals, and per-class results, rather than relying primarily on accuracy. A second near-term priority is standardized quantum-resource reporting. Every HQC study should report the encoding strategy, number of qubits, circuit depth, ansatz or gate structure, number of shots, simulator or hardware backend, noise model, error-mitigation strategy, trainable quantum parameters, and quantum calls per image. Without this information, it is difficult to reproduce the method or assess feasibility under NISQ constraints.
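As a minimal example of reporting beyond accuracy, the sketch below computes sensitivity and specificity from binary predictions and attaches percentile-bootstrap 95% confidence intervals. It assumes one prediction per patient (1 = COVID-19, 0 = non-COVID); when predictions are made per slice, resampling should be performed over patients rather than slices. The labels and predictions in the usage lines are synthetic placeholders, not results from any reviewed study.

```python
import random

def sensitivity_specificity(y_true, y_pred):
    # Binary confusion counts with 1 as the positive (COVID-19) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    # Percentile bootstrap: resample cases with replacement, recompute the metric.
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
        if value == value:  # drop NaN resamples (no positives or negatives drawn)
            stats.append(value)
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper

# Synthetic example: 50 positive and 50 negative cases.
y_true = [1] * 50 + [0] * 50
y_pred = [1] * 45 + [0] * 5 + [0] * 40 + [1] * 10

sens, spec = sensitivity_specificity(y_true, y_pred)
sens_ci = bootstrap_ci(y_true, y_pred, lambda t, p: sensitivity_specificity(t, p)[0])
spec_ci = bootstrap_ci(y_true, y_pred, lambda t, p: sensitivity_specificity(t, p)[1])
print(f"sensitivity {sens:.2f}, 95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f}")
print(f"specificity {spec:.2f}, 95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f}")
```

The percentile bootstrap is used here only because it needs no distributional assumptions; other interval methods (for example, Wilson intervals for proportions) are equally valid choices.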
7.2. Mid-Term Methodological Advancements
Mid-term progress should focus on fair attribution of quantum contribution. Hybrid models must be compared against strong and matched classical baselines, including CNNs, pretrained backbones, and classical dense classifier heads with similar parameter budgets. Ablation studies are essential for determining whether improvements arise from the quantum module, the classical backbone, data augmentation, feature compression, or the evaluation protocol. Another key direction is the development of domain-aware hybrid interfaces. Current HQC models often use generic feature encodings, while medical images contain structured anatomical and pathological information. Future models should explore ROI-guided encoding, lung-mask-based patch selection, pathology-aware feature compression, and clinically informed quantum feature maps. These strategies may reduce unnecessary dimensionality while preserving diagnostically relevant patterns. Hardware-aware design is also necessary. Patch-level quanvolutional architectures should optimize patch sampling, shot efficiency, and circuit reuse to reduce inference cost. Decision-stage and intermediate feature-extraction models should be tested under noisy simulation and real-device constraints to evaluate whether their compact quantum modules remain stable outside idealized simulators.
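One possible form of the lung-mask-based patch selection mentioned above is sketched below. It assumes a binary lung mask (1 inside the lungs) produced by a separate segmentation step, 2 × 2 patches with stride 2, and a hypothetical coverage threshold of 0.5; only the selected patches would then be passed to a quanvolutional circuit, reducing the number of quantum calls per image. The function name and threshold are illustrative assumptions, not a method taken from the reviewed studies.

```python
def select_lung_patches(mask, kernel=2, stride=2, min_coverage=0.5):
    """Return top-left (row, col) positions of kernel x kernel patches whose
    fraction of lung pixels is at least `min_coverage`."""
    height, width = len(mask), len(mask[0])
    selected = []
    for r in range(0, height - kernel + 1, stride):
        for c in range(0, width - kernel + 1, stride):
            lung_pixels = sum(
                mask[r + i][c + j] for i in range(kernel) for j in range(kernel)
            )
            if lung_pixels / (kernel * kernel) >= min_coverage:
                selected.append((r, c))
    return selected

# Toy 8 x 8 mask: lung tissue occupies the left half of the image.
mask = [[1 if col < 4 else 0 for col in range(8)] for _ in range(8)]

all_patches = select_lung_patches(mask, min_coverage=0.0)
lung_patches = select_lung_patches(mask)
print(len(all_patches), len(lung_patches))  # 16 8
```

The threshold trades quantum cost against the risk of discarding findings near the lung boundary, so in a real pipeline it would itself require validation.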
7.3. Long-Term Translational Development
Long-term development should move beyond proof-of-concept accuracy reports toward clinically and operationally validated HQC systems. This requires multi-center testing, prospective validation, uncertainty-aware decision support, failure-mode analysis, and integration with explainability tools. Explainable quantum AI will be especially important because clinicians and regulators must understand not only model outputs but also whether quantum-derived features have meaningful diagnostic relevance. Future studies should also evaluate HQC frameworks beyond COVID-19. Potential domains include lung cancer screening, pneumonia subtyping, tuberculosis detection, tumor classification, segmentation-assisted diagnosis, and neurological imaging. Such studies would help determine whether HQC models offer generalizable advantages or whether their benefits are limited to narrow benchmark settings. Overall, future progress in HQC medical imaging will depend less on increasing qubit count alone and more on principled architecture design, transparent resource reporting, rigorous validation, and hardware-aware benchmarking. These steps are necessary before claims of clinical utility or quantum advantage can be made with confidence.
8. Conclusions
Hybrid quantum–classical (HQC) models for COVID-19 medical imaging are best viewed as quantum-augmented classical pipelines, not standalone quantum diagnostic systems. This review proposes an architecture-centric taxonomy that classifies HQC models by the functional role and placement of the quantum module into three archetypes: patch-level quantum convolutional preprocessing, classical feature extraction with quantum classifier heads, and quantum feature extraction followed by classical classification. Across the 10 peer-reviewed studies analyzed, HQC models show promising results, particularly for binary COVID-19 screening using CXR or CT images. However, performance is less stable in multiclass respiratory diagnosis, where COVID-19 must be separated from visually similar conditions such as pneumonia, lung opacity, or tuberculosis. Thus, high binary accuracy alone is not sufficient evidence of clinical robustness. The review also shows that current evidence remains limited by simulator-based evaluation, incomplete reporting of quantum resources, unclear patient-wise splitting, limited external validation, and heterogeneous baselines. Key details such as qubit count, circuit depth, shots, backend, noise model, error mitigation, trainable parameters, and quantum calls per image are often missing, making it difficult to verify claims of quantum contribution. From a deployment perspective, patch-wise quanvolution gives the quantum circuit a direct representational role but scales poorly with image resolution and patch count. Decision-stage and intermediate feature-extraction models are more NISQ-compatible, but their added value must be demonstrated against strong classical baselines. Overall, there is as yet no conclusive evidence of task-level quantum advantage for COVID-19 medical imaging.
Future progress requires standardized benchmarking, transparent quantum-resource reporting, patient-wise and multi-center validation, uncertainty-aware metrics, robust ablation studies, and hardware-aware latency evaluation. Only then can HQC models be fairly assessed for clinically meaningful benefit beyond optimized classical deep learning.