Article

Binary and Multi-Class Classification of Colorectal Polyps Using CRP-ViT: A Comparative Study Between CNNs and QNNs

by Jothiraj Selvaraj 1, Fadhiyah Almutairi 2, Shabnam M. Aslam 3,* and Snekhalatha Umapathy 1,*

1 Department of Biomedical Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu 603203, India
2 Department of Information Systems, College of Computer and Information Sciences, Majmaah University, Al Majmaah 11952, Saudi Arabia
3 Department of Information Technology, College of Computer and Information Sciences, Majmaah University, Al Majmaah 11952, Saudi Arabia
* Authors to whom correspondence should be addressed.
Life 2025, 15(7), 1124; https://doi.org/10.3390/life15071124
Submission received: 28 May 2025 / Revised: 9 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025
(This article belongs to the Special Issue Current Progress in Medical Image Segmentation)

Abstract

Background: Colorectal cancer (CRC) is a major contributor to cancer mortality on a global scale, with polyps being critical precursors. The accurate classification of colorectal polyps (CRPs) from colonoscopy images is essential for the timely diagnosis and treatment of CRC. Methods: This research proposes a novel hybrid model, CRP-ViT, integrating ResNet50 with Vision Transformers (ViTs) to enhance feature extraction and improve classification performance. This study conducted a comprehensive comparison of the CRP-ViT model against traditional convolutional neural networks (CNNs) and emerging quantum neural networks (QNNs). Experiments were conducted for binary classification to predict the presence of polyps and multi-class classification to predict specific polyp types (hyperplastic, adenomatous, and serrated). Results: The results demonstrate that CRPQNN-ViT achieved superior classification performance while maintaining computational efficiency. CRPQNN-ViT achieved accuracies of 98.18% for training and 97.73% for validation on the binary classification task, and 98.13% for training and 97.92% for validation on the multi-class classification task. In addition to the key metrics, computational parameters were compared, with CRPQNN-ViT excelling in computational time. Conclusions: This comparative analysis reveals the potential of integrating quantum computing into medical image analysis and underscores the effectiveness of transformer-based architectures for CRP classification.

1. Introduction

Colorectal cancer (CRC) ranks among the prevalent and fatal cancers across the globe, with its early stages often marked by the presence of colorectal polyps [1,2]. The timely detection and classification of these polyps during colonoscopy examinations are crucial for preventing malignant transformation [2,3,4]. However, the manual interpretation of colonoscopy images can be subjective, labor-intensive, and susceptible to differences in interpretation among observers [5,6]. As a result, the integration of Artificial Intelligence (AI) into clinical diagnostics has gained increasing attention for improving the accurate detection of anomalies [7,8,9,10].
Convolutional neural networks (CNNs) have traditionally been the backbone of image classification tasks, including medical imaging [11,12]. Their hierarchical feature extraction capabilities have enabled them to detect fine-grained patterns in endoscopic images [13]. Also, Vision Transformers (ViTs) have gained prominence as an effective alternative, utilizing self-attention mechanisms to model long-range interactions and spatial relationships within an image [14,15].
In this study, we propose a novel hybrid architecture, CRP-ViT, that combines the strengths of convolutional layers for local feature learning with the global attention capabilities of Vision Transformers. This unified model is designed to enhance the classification of CRP from colonoscopy images. In addition to this, we explore the emerging paradigm of quantum neural networks (QNNs), which utilize principles of quantum computation to enable complex pattern recognition and data encoding within quantum states. While still in their infancy, QNNs offer promising advantages in terms of computational parallelism and robustness in high-dimensional feature spaces.

1.1. CNN

CNNs represent a specialized architecture within the broader domain of deep learning (DL), particularly tailored for image analysis and computer vision tasks [16]. Inspired by the structure and functioning of the human visual cortex, CNNs are adept at recognizing and interpreting spatial hierarchies and patterns within images [17]. Their architecture is composed of several interconnected layers, each performing a specific function that contributes to the model’s ability to learn from image inputs [12,18]. The core components of a CNN include convolutional layers, activation functions, pooling layers, fully connected layers, backpropagation, and optimization [19]. The fundamental building block of a CNN is the convolutional layer, which employs a series of learnable filters or kernels that convolve across the input image to detect local features such as edges, textures, corners, and more complex shapes [12]. The core operation of a 2D convolution layer in a CNN is provided in Figure 1.
The input is a two-dimensional grid representing an image or feature map upon which a sliding window, defined by a kernel or filter, moves across the input data, performing element-wise multiplication followed by summation at each location [12]. Each filter is sensitive to a particular kind of visual pattern, and multiple filters allow the network to learn a diverse set of features at various levels of abstraction [20]. Following the convolution, activation functions, typically non-linear functions such as the Rectified Linear Unit (ReLU), are applied to introduce non-linearity into the model, which is crucial for learning intricate patterns and relationships in the data [21]. To manage the high dimensionality of feature maps and to make the network more computationally efficient, pooling layers are introduced [22]. These layers perform a downsampling operation, reducing the spatial dimensions of the feature maps while preserving the most important information [23]. After successive stages of convolution, activation, and pooling, the resulting features are flattened into a one-dimensional vector and fed into one or more fully connected layers (dense layers) [19,24]. These layers function as the decision-making component of the network, integrating the learned spatial features to perform classification tasks [25]. The learning process in CNNs involves forward propagation, where inputs pass through the network to produce an output, and back propagation, a mechanism that adjusts the model parameters by propagating the error gradients backward through the network [26]. This optimization process is typically governed by algorithms such as Adam, which iteratively minimize the loss function to improve prediction accuracy [27,28].
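To make these operations concrete, the following minimal NumPy sketch applies a single convolution filter, a ReLU activation, and 2 × 2 max pooling to a toy grayscale patch; the kernel values, stride, and pooling size are illustrative and are not taken from the CRP-ViT configuration.

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide a kernel over a 2D input and sum the element-wise products (valid padding)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def relu(x: np.ndarray) -> np.ndarray:
    """Non-linear activation applied element-wise."""
    return np.maximum(0, x)

def max_pool(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Downsample by keeping the maximum value in each non-overlapping window."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return x[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

# Toy example: an 8x8 grayscale patch and an illustrative vertical-edge filter
image = np.random.rand(8, 8)
kernel = np.array([[1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
features = max_pool(relu(conv2d(image, kernel)))  # convolution -> activation -> pooling
print(features.shape)  # (3, 3)
```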

1.2. QNN

Recent advancements in quantum computing have paved new pathways in the field of AI, particularly in the development of quantum neural networks (QNNs) [29]. Unlike classical neural networks, QNNs harness quantum mechanical phenomena—namely superposition, entanglement, and quantum interference—for processing information in a high-dimensional Hilbert space, thereby enabling more complex pattern recognition capabilities with fewer resources [30]. The quantum workflow, provided in Figure 2a, begins with Quantum Data Encoding, where the pixel intensities of the input image are transformed into quantum states through angle encoding schemes, allowing quantum systems to represent large amounts of information compactly [31,32]. Instead of traditional neurons and weights, QNNs use quantum gates organized in quantum circuit layers to transform quantum states [33]. These transformations create entangled and superposed states that enable parallel computation across multiple possibilities [34]. After quantum operations, measurements are taken to collapse the quantum state into classical outcomes, which are interpreted for tasks like classification [35]. Due to the current limitations of quantum hardware, many QNNs are implemented as hybrid models [36]. A quantum circuit acts as a layer within a classical neural network, and optimization is performed using classical techniques, guided by measurement outcomes from the quantum layer [37]. The Quanvolutional layers used in this study were implemented using PennyLane with a Qiskit quantum simulator backend, which executes real quantum circuit operations on classical hardware. The design adheres to valid quantum logic gate principles, simulating actual qubit behavior within the constraints of current simulation environments.
In this study, the 4-qubit QNN architecture was utilized for the classification of polyps in colonoscopy images, as shown in Figure 2b. Each qubit in the network represents a quantum bit capable of existing in a superposition of states, which exponentially increases the representational capacity of the system [38]. The adoption of a 4-qubit design balances computational complexity and expressiveness, rendering it compatible with present Noisy Intermediate-Scale Quantum (NISQ) devices while maintaining scalability for future hardware improvements [39]. The input medical images are first preprocessed using classical techniques. Subsequently, angle encoding is used to map classical features x_i ∈ [0, 1] into quantum states using rotation gates. Specifically, each feature is embedded into a qubit using the parameterized gates R_y(πx_i) and R_z(πx_i) [40]. The complete normalized feature vector is represented as per Equation (1).
\mathbf{x} = [x_1, x_2, \ldots, x_n]   (1)
where x_i denotes the i-th normalized scalar feature, x_i ∈ [0, 1], and n indicates the number of features.
Each x_i is encoded into a quantum state using parameterized rotation gates applied to qubits initialized in the ground state |0⟩. The Y-axis and Z-axis rotation gates R_y(πx_i) and R_z(πx_i) are represented by Equations (2) and (3), respectively.
R_y(\pi x_i) = \begin{bmatrix} \cos(\theta_i/2) & -\sin(\theta_i/2) \\ \sin(\theta_i/2) & \cos(\theta_i/2) \end{bmatrix}   (2)

R_z(\pi x_i) = \begin{bmatrix} e^{-i\theta_i/2} & 0 \\ 0 & e^{i\theta_i/2} \end{bmatrix}   (3)

where \theta_i = \pi x_i is the rotation angle encoded from the normalized feature x_i.
For multiple features, a system of n qubits is used. The full composite quantum state of the system is given by the tensor product as provided in Equation (4).
|\Psi\rangle = \bigotimes_{i=1}^{n} R_y(\pi x_i)\, R_z(\pi x_i)\, |0\rangle   (4)
The symbol ⨂ represents the tensor (Kronecker) product operator, combining individual qubit states into a multi-qubit system.
Following the encoding, CNOT entanglement operations are applied to capture complex inter-feature dependencies. The QNN is trained using a variational quantum circuit (VQC) paradigm, where the parameters of the rotation gates are iteratively updated using classical optimization algorithms [41]. The output from quantum measurements is used to calculate class probabilities for multi-class classification.
In the present investigation, a shallow quantum circuit was utilized due to noise sensitivity and computational complexity in simulating quantum operations. Specifically, each 4-qubit circuit consisted of rotation gates for encoding followed by entanglement via CNOT gates. Deeper circuits were avoided to maintain stability and computational feasibility on classical simulators, acknowledging the exponential growth in resources required as the qubit count or depth increases.
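As an illustration of the circuit described above, the following PennyLane sketch encodes a flattened 2 × 2 patch of four normalized pixel values using the R_y(πx_i) R_z(πx_i) scheme of Equations (2)–(4), entangles the qubits with a CNOT chain, and measures Pauli-Z expectation values. The `default.qubit` simulator and the single trainable rotation layer are simplifying assumptions for brevity, not the exact configuration used with the Qiskit backend in this study.

```python
import pennylane as qml
import numpy as np

n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)

@qml.qnode(dev)
def quanv_circuit(features, weights):
    # Angle encoding: map each normalized feature x_i in [0, 1] to R_y(pi*x_i) R_z(pi*x_i), Eqs. (2)-(4)
    for i in range(n_qubits):
        qml.RY(np.pi * features[i], wires=i)
        qml.RZ(np.pi * features[i], wires=i)
    # CNOT entanglement chain to capture inter-feature dependencies
    for i in range(n_qubits - 1):
        qml.CNOT(wires=[i, i + 1])
    # Trainable rotation layer of the variational quantum circuit (VQC)
    for i in range(n_qubits):
        qml.RY(weights[i], wires=i)
    # Collapse to classical outcomes: one Pauli-Z expectation value per qubit
    return [qml.expval(qml.PauliZ(i)) for i in range(n_qubits)]

# Example: encode a 2x2 image patch flattened to 4 normalized pixel intensities
patch = np.array([0.1, 0.7, 0.4, 0.9])
weights = np.random.uniform(0, np.pi, size=n_qubits)
print(quanv_circuit(patch, weights))
```

In a hybrid pipeline, the measured expectation values are passed to classical layers, and the circuit parameters are updated by a classical optimizer guided by the measurement outcomes.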

1.3. Survey of Literature

The integration of AI in gastrointestinal endoscopy has significantly enhanced the detection and classification of CRPs, a key step in the prevention of CRC [42]. Over the past decade, various DL models have been deployed to improve the diagnostic accuracy of colonoscopy procedures, with CNNs emerging as the dominant approach due to their high performance in image classification tasks [43,44].
Several studies have demonstrated the utility of CNNs in CRP classification. For instance, Urban et al. [45] developed a real-time DL system that achieved expert-level performance in detecting polyps in colonoscopy videos. Krenzer et al. [46] developed an automated method to classify polyps according to the NICE and Paris classifications. Similarly, Balasubramani et al. [47] trained a CNN model on the Kvasir and CVC-ClinicDB datasets, achieving high precision and recall rates in distinguishing adenomatous from non-adenomatous polyps. Yengec-Tasdemir et al. [48] classified polyp categories from histopathological images, which require an invasive biopsy procedure to obtain. The authors achieved an accuracy of 87.1% with a custom dataset and 70.3% with the publicly available UniToPatho dataset, and limited their work to hyperplastic and adenomatous polyps. Sasmal et al. [49] proposed generative adversarial networks for the UniToPatho dataset and achieved an accuracy of 87.50%. The study carried out by De Carvalho et al. [50] focused on the NICE classification, categorizing polyps as hyperplastic, adenomatous, or cancerous. Trained with multiple datasets, their classifier produced 92% accuracy on a custom dataset and 88% on the Piccolo dataset. Despite their success, CNNs are inherently limited by their confined receptive fields, which may hinder their ability to capture global contextual information crucial for complex image structures like polyps with subtle variations.
To overcome these limitations, Vision Transformers (ViTs) have been introduced as a viable alternative. Dosovitskiy et al. [51] pioneered the use of transformer-based models for image recognition by applying the self-attention mechanism to image patches. Haq et al. [52] applied ViT architectures for gastrointestinal disease detection, reporting improvements in classification accuracy over traditional CNNs. However, transformers generally demand extensive training datasets and significant computational power, which may limit their application in resource-constrained clinical settings.
Parallel to these advancements, quantum neural networks (QNNs) have emerged as a novel computational paradigm that combines quantum mechanics and neural networks [53]. Leveraging quantum bits (qubits) and quantum gates, QNNs have the potential to encode complex feature spaces with fewer resources, enabling exponential speed-ups for certain tasks [54]. Schuld and Killoran [55] proposed a hybrid quantum–classical model that successfully performed classification tasks on medical datasets. Ovalle-Magallanes et al. [40] introduced a learnable quantum angle encoding framework for the classification of ECG and EEG signals, which are critical for neurological and cardiac imaging diagnostics. While not limited to imaging alone, Meghanath et al. [32] proposed a quantum deep CNN for real-time, high-dimensional visual data analysis in safety-critical systems; their approach has potential translational relevance for diagnostic radiology, where real-time and accurate visual pattern recognition is essential. Awujoola et al. [53] applied QNNs to the early detection of breast and lung cancer by classifying histopathological and cytology images, showcasing improved sensitivity in differentiating malignant from benign tissues. More recently, QNNs have been explored for medical image analysis, such as in cancer detection and genomics, although most studies remain at a proof-of-concept level due to current limitations in quantum hardware.
Despite the promise shown by individual models, very few studies have undertaken a comparative evaluation of CNNs, ViT-based models, and QNNs for CRP classification. Furthermore, hybrid models combining CNNs and ViTs remain underexplored in the gastrointestinal imaging domain. This research aims to bridge this gap by proposing a CRP-ViT hybrid model and evaluating its performance against both classical CNNs and emerging QNNs, thereby offering a comprehensive perspective on the strengths and limitations of each approach in CRP classification.

1.4. Problem Identified from the Literature

Most existing studies on polyp detection from colonoscopy images primarily rely on classical deep learning architectures. In contrast, this work explores the application of quantum computing to enhance both computational efficiency and detection performance. Furthermore, the majority of prior research focuses on binary classification, aiming only to identify the presence or absence of polyps, rather than distinguishing between specific polyp types. Even in studies that address multi-class classification, the problem is often decomposed into a series of binary tasks. Additionally, these multi-class approaches typically categorize polyps based on superficial attributes such as color and texture, without providing insight into the clinical severity or pathological significance of the lesions.

1.5. Aim and Objective of This Article

The aim of this study is to conduct a comparative evaluation between two architectures: CRP-ViT integrated with a classical CNN and its quantum-enhanced counterpart employing a QNN. The models were assessed on independent image datasets beyond those used during training and validation to examine their generalization capabilities. The objective of this research is to highlight the potential advantages of quantum-assisted learning over conventional CNN- and transformer-based methods. Building upon the previously published CRPCNN-ViT framework, the authors introduce a modified architecture, CRPQNN-ViT, by replacing the convolutional layers with Quanvolutional layers. The goal was to train and validate this network for both binary and multi-class classification tasks and benchmark its performance against results reported in the base reference study. The findings are intended to contribute to the advancement of more accurate and generalizable diagnostic tools for CRC screening.

1.6. Contribution of This Article

The contributions of the authors include the following:
  • A diverse CRP image dataset is aggregated by combining publicly available sources with self-curated real-time clinical colonoscopy images to enhance the dataset’s variability in training and evaluating DL models in the detection of CRP.
  • A novel architectural modification is proposed by replacing classical convolutional layers in the original CRPCNN-ViT model with Quanvolutional layers, resulting in a hybrid quantum model named CRPQNN-ViT. This integration aims to deploy quantum computational advantages such as parallelism and entanglement for improved feature representation and classification performance.
  • Both the classical (CRPCNN-ViT) and quantum (CRPQNN-ViT) models are employed for binary (polyp or normal) and multi-class classification (hyperplastic or adenoma or serrated or normal).
  • A detailed comparative analysis is conducted between the CRPCNN-ViT and CRPQNN-ViT models. Performance is evaluated using standard classification metrics, as well as computational complexity parameters, to highlight the trade-offs between classical and quantum-enhanced models in practical deployment scenarios.
  • An ablation study is performed on the CRPQNN-ViT model to systematically evaluate the impact of quantum components on CRP-ViT architectures.

1.7. Outline of This Article

This article is organized to systematically present the development and evaluation of a hybrid classical–quantum model for CRP classification. Section 1 is the Introduction, providing foundational insights into CNNs and QNNs, followed by a comprehensive literature survey, the identification of existing research gaps, and a clear statement of the study’s aims, objectives, and contributions. Section 2 provides the Materials and Methods, detailing the methodology in two phases including binary and multi-class classification, followed by the design of the proposed CRPQNN-ViT architecture, performance evaluation strategies including cross-validation, and a comparative analysis between the CRPCNN-ViT and CRPQNN-ViT models. The quantitative findings from both classification phases are presented in Section 3. This is followed by Section 4 in which the results are interpreted in greater depth including the current limitations and challenges faced in implementing quantum-based models. Finally, Section 5 summarizes the key findings and underscores the future potential of integrating quantum computing with deep learning for medical image analysis.

2. Materials and Methods

The pipeline of this research is divided into two phases, each contributing to the comparative analysis between classical and quantum deep learning models, as illustrated in Figure 3. The initial stage in both phases involves the collection of datasets representative of CRPs, followed by preprocessing. Phase 1 focuses on the binary classification of CRP (polyp vs. non-polyp), while phase 2 addresses the multi-class classification of polyp subtypes. Both phases adopt the CRP-ViT framework, where two distinct DL paradigms, CNNs and QNNs, are explored.
In the previous works carried out by the authors, CRP-ViT integrated with a CNN backbone was developed to address both binary [56] and multi-class [57] classification tasks in CRP analysis, involving identifying the presence or absence of polyps and distinguishing between polyp types, respectively. Building upon this foundational work, the current study introduces an enhanced model, CRPQNN-ViT, which incorporates a Quanvolutional neural network (QNN) as its feature extraction backbone. The QNN-based approach leverages quantum-inspired convolutional operations to potentially capture more intricate patterns and relationships within medical image data, offering an innovative alternative to traditional CNN architectures. To ensure an equitable performance comparison between the proposed CRPQNN-ViT and the previously developed CRPCNN-ViT models, the experimental design adheres to the same protocols, datasets, preprocessing steps, and evaluation strategies established in the earlier study.
All deep learning (DL) experiments, including model training and evaluation, were implemented and executed on the Google Colaboratory (Colab) platform. The entire implementation was carried out using the Python (V3.10) programming language, utilizing various open-source libraries and frameworks optimized for DL workflows.
This section outlines the dataset used for experimentation, preprocessing techniques, the architecture of the proposed CRP-ViT model, and the baseline CNN and QNN architectures used for comparative analysis. It also defines the evaluation metrics employed to measure model performance.

2.1. Phase 1: Binary Classification

The workflow of the phase 1 involving binary classification is detailed in Figure 4. The dataset employed in this study integrates four distinct sources to ensure comprehensive and diverse representation for CRP analysis. The first source is a real-time clinical database developed through collaboration with SRM Medical College Hospital & Research Centre (SRMC & RC), comprising a total of 182 annotated colonoscopy images (82 polyp and 100 normal), obtained under institutional ethical approval (IEC No.: 8677/IEC/2023). The manual annotation of clinical images is inherently subjective, often leading to inter-observer variability, even among experts. While expert validation enhances the quality of annotations, it cannot entirely eliminate the bias introduced by either individual interpretation styles or clinical experience. Such variability can result in noisy labels, which may impact the consistency of the training data and ultimately affect the generalization ability of DL models. To mitigate these issues, our study employed a two-stage annotation strategy. In Stage 1, two medical practitioners with four years of postgraduate experience in gastroenterology annotated the data. In Stage 2, two senior professors with a combined decade of experience in medical education and clinical practice reviewed and verified the annotations. All annotators had access to corresponding pathology and biopsy reports to support their decisions. To reduce subjectivity and enhance label reliability, experts in each stage collaboratively reviewed each image and any disagreements were resolved through discussion and consensus. The finalized dataset comprised 82 polyp images and 100 normal images, serving as the real-time dataset (DS-1) [56].
In addition to this real-time dataset, three publicly available benchmark datasets were utilized: CVC-ColonDB [58], CVC-ClinicDB [59], and the UAH DB [60]. The CVC-ColonDB includes 380 annotated frames extracted from colonoscopy videos of 13 patients, providing detailed pixel-wise polyp annotations. The CVC-ClinicDB contributes a larger dataset of 612 images sourced from 23 patients, offering a broader and more diverse sample of polyp appearances across various anatomical regions. The UAH DB comprises 76 high-resolution colonoscopy images, each clinically annotated and categorized into specific histological types: hyperplastic, adenomatous, and sessile serrated polyps. However, since phase 1 of this study was dedicated to binary classification—distinguishing between polyp and non-polyp (normal) classes—all histological subtypes in the UAH DB were unified and treated collectively under the single polyp category.
To ensure consistency and to standardize the spatial dimensions of the input images provided to CRP-ViT, all the colonoscopy images considered in this study were uniformly resized. Subsequently, each resized image was normalized to scale pixel intensity values between 0 and 1 [4]. Notably, 100 images (50 polyp, 50 normal) out of the 182 real-time images were reserved as an unseen test set for model evaluation. The remaining images were split 80:20 for training and validation, respectively. All the open-source datasets contained only polyp images; hence, normal-class images were supplemented via augmentation techniques to balance the training and validation setup. The normal images of the real-time dataset were augmented using 21 techniques to produce 1100 normal images in total. CRPCNN-ViT and CRPQNN-ViT were validated with performance metric assessment.
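A minimal sketch of this preprocessing and splitting step is given below; the 224 × 224 target resolution, the directory layout, and the use of a stratified split are assumptions for illustration rather than values reported in this study.

```python
import numpy as np
from PIL import Image
from pathlib import Path
from sklearn.model_selection import train_test_split

TARGET_SIZE = (224, 224)  # assumed input resolution, for illustration only

def load_and_preprocess(path: Path) -> np.ndarray:
    """Resize to a fixed spatial dimension and normalize pixel intensities to [0, 1]."""
    img = Image.open(path).convert("RGB").resize(TARGET_SIZE)
    return np.asarray(img, dtype=np.float32) / 255.0

# Hypothetical directory layout: data/normal/*.png and data/polyp/*.png
paths, labels = [], []
for label, cls in enumerate(["normal", "polyp"]):
    for p in Path("data", cls).glob("*.png"):
        paths.append(p)
        labels.append(label)

images = np.stack([load_and_preprocess(p) for p in paths])

# 80:20 split for training and validation (the unseen test images are held out separately)
x_train, x_val, y_train, y_val = train_test_split(
    images, np.array(labels), test_size=0.2, stratify=labels, random_state=42
)
```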

2.2. Phase 2: Multi-Classification

In phase 2, a multi-classification task was undertaken, targeting four categories: hyperplastic, adenomatous, serrated, and normal. It is crucial to emphasize that the datasets used in phase 1 (DS-2 and DS-3) do not include the specific classes of polyps and were therefore inapplicable to the current multi-class classification paradigm. The workflow of phase 2 encompassing multi-classification is illustrated in Figure 5. Following the same ethical clearance obtained for phase 1, additional images were acquired to expand the number of CRP images from 82 to 100. Phase 2 utilized two datasets: (1) a self-curated real-time dataset consisting of 200 colonoscopy images (100 CRP and 100 normal) and (2) a publicly available dataset from the UAH DB (DS-4) [60], contributing 76 CRP images. All images underwent a preprocessing pipeline, including data normalization (DN) and dimensional resizing (DR). Following preprocessing, 40 normal images from the real-time dataset and the UAH DB CRP images were reserved for testing. In multi-classification tasks, data imbalance poses a significant challenge by causing the model to become biased toward majority classes, thereby compromising its ability to accurately learn and classify minority-class instances. The limited representation of minority classes in training data leads to inadequate feature learning, increasing the likelihood of misclassification and reducing the model’s generalization capability. Data augmentation aids in mitigating the effects of data imbalance in multi-classification tasks by increasing the number of training samples for underrepresented classes. In the present study, a two-stage augmentation approach was employed: Stage 1 involved balancing the dataset by equalizing the number of images across all classes, while Stage 2 focused on further augmenting the training samples to enhance the model’s learning capacity and robustness. After augmentation, the dataset was partitioned with an 80:20 split for training and validation, respectively. The complete data-splitting strategy has been detailed in the authors’ previously published work [57].
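The two-stage augmentation strategy can be sketched as follows; the specific transforms (mirroring and small rotations) and the expansion factor are placeholders for illustration and do not correspond to the exact augmentation techniques applied in this study.

```python
import random
from PIL import Image, ImageOps

def augment(img: Image.Image) -> Image.Image:
    """Apply a random horizontal mirror and a small rotation (placeholder transforms)."""
    if random.random() < 0.5:
        img = ImageOps.mirror(img)
    return img.rotate(random.uniform(-15, 15))

def balance_classes(class_images: dict) -> dict:
    """Stage 1: augment minority classes until every class matches the largest class size."""
    target = max(len(imgs) for imgs in class_images.values())
    return {cls: imgs + [augment(random.choice(imgs)) for _ in range(target - len(imgs))]
            for cls, imgs in class_images.items()}

def expand_training_set(class_images: dict, factor: int = 3) -> dict:
    """Stage 2: further augment every class to enlarge the training pool by a fixed factor."""
    return {cls: imgs + [augment(random.choice(imgs)) for _ in range((factor - 1) * len(imgs))]
            for cls, imgs in class_images.items()}

# Example with hypothetical per-class image lists loaded elsewhere:
# balanced = balance_classes({"hyperplastic": hyper_imgs, "adenomatous": adeno_imgs,
#                             "serrated": serr_imgs, "normal": normal_imgs})
# augmented = expand_training_set(balanced)
```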

2.3. Proposed Architecture

Figure 6 illustrates the proposed CRPQNN-ViT framework designed to classify colonoscopy images into four diagnostic categories considered in this research. CRPQNN-ViT is similar to CRPCNN-ViT in its architecture except that Quanvolutional layers are used instead of convolutional layers. The CRPQNN-ViT model integrates Quanvolutional- (ResNet50) and transformer- (ViT) based components to leverage both local and global feature representations. The architecture initiates with an input stage, where colonoscopy images are processed through a series of Quanvolutional layers. The core processing component incorporates a four-stage Quanvolutional neural network (QNN) backbone, modeled on the ResNet-50 architecture and structured using bottleneck residual blocks. Each stage comprises multiple Quanvolutional layers that include combinations of 1 × 1 and 3 × 3 Quanvolutions, as detailed in Table 1. These layers are configured with progressively increasing filter depths to capture hierarchical feature representations. To efficiently manage the spatial dimensions and facilitate deeper learning, stride Quanvolutions are employed for downsampling at strategically selected stages within the network. Upon the completion of feature extraction in the QNN backbone, the resulting feature maps are flattened and converted into a sequence of fixed-size embedded patches, which serve as input to the transformer module. To preserve spatial dependencies across the image domain, positional encodings are integrated with the embedded patches. Furthermore, a learnable classification token is prepended to the sequence, enabling the ViT to aggregate global contextual information necessary for decision-making. Both the CRPCNN-ViT and CRPQNN-ViT models were trained, and their performance metrics were evaluated and compared comprehensively.
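The transformer stage of this pipeline can be sketched in PyTorch as shown below; the embedding dimension, number of heads and encoder layers, and the 7 × 7 × 2048 backbone output are illustrative assumptions rather than the exact CRP-ViT hyperparameters, and the Quanvolutional backbone is abstracted away as a generic feature map.

```python
import torch
import torch.nn as nn

class ViTHead(nn.Module):
    """Turns backbone feature maps into patch embeddings and classifies them with a transformer encoder."""

    def __init__(self, in_channels: int = 2048, embed_dim: int = 256,
                 num_heads: int = 8, num_layers: int = 4, num_classes: int = 4):
        super().__init__()
        # 1x1 projection: each spatial location of the feature map becomes one embedded "patch"
        self.proj = nn.Conv2d(in_channels, embed_dim, kernel_size=1)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))   # learnable classification token
        self.pos_embed = nn.Parameter(torch.zeros(1, 50, embed_dim))  # positional encodings for 49 patches + CLS
        encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) feature map from the (Quan)convolutional backbone
        x = self.proj(feat).flatten(2).transpose(1, 2)        # (B, H*W, embed_dim) patch sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)        # prepend the classification token
        x = torch.cat([cls, x], dim=1)
        x = x + self.pos_embed[:, : x.size(1)]                # add positional encodings
        x = self.encoder(x)
        return self.head(x[:, 0])                             # classify from the CLS token

# Example: a ResNet-50-style backbone output of 7x7x2048 for a batch of 2 images
logits = ViTHead()(torch.randn(2, 2048, 7, 7))
print(logits.shape)  # torch.Size([2, 4])
```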

2.4. Performance Evaluation and Cross-Validation

Building upon previously published research [56,57] that employed the CRPCNN-ViT framework, this study evaluates the performance of the proposed CRPQNN-ViT architecture as shown in Figure 7. The objective of this study was to compare the classification capabilities of both the CRPCNN-ViT and CRPQNN-ViT architectures in the context of CRP detection from colonoscopy images. A comprehensive comparative analysis was conducted between the CRPQNN-ViT and the CRPCNN-ViT models to assess improvements in classification accuracy in CRP detection. To ensure a robust and unbiased evaluation, a comprehensive cross-validation strategy was employed. This included partitioning the dataset into 5 folds to facilitate repeated training and validation across the data splits, thereby reducing variance and mitigating overfitting [4]. Both models, CRPCNN-ViT and CRPQNN-ViT, were trained using the same dataset splits and hyperparameter configurations to ensure a fair and controlled comparison.
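A minimal sketch of this 5-fold protocol is given below, using a logistic regression classifier on synthetic data as a stand-in for the actual CRP-ViT models so that the fold-wise training, evaluation, and mean ± standard deviation reporting are self-contained and runnable.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy stand-in data: 200 flattened "images" with binary labels (the real pipeline feeds colonoscopy images)
rng = np.random.default_rng(42)
X = rng.random((200, 64))
y = rng.integers(0, 2, size=200)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_acc = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y), start=1):
    # A fresh model per fold with identical hyperparameters (logistic regression as a stand-in classifier)
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[val_idx], model.predict(X[val_idx]))
    fold_acc.append(acc)
    print(f"Fold {fold}: accuracy = {acc:.4f}")

# Report mean and standard deviation across folds, as done for the CRP-ViT models
print(f"Mean ± SD: {np.mean(fold_acc):.4f} ± {np.std(fold_acc):.4f}")
```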

2.5. Comparative Analysis Between CNN and QNN

In addition to the conventional performance metrics, a comprehensive comparative analysis was conducted to evaluate the architectural efficiency of the proposed CRPCNN-ViT and CRPQNN-ViT models. This assessment focused on key computational and resource-oriented parameters that significantly impact the practical deployment and scalability of DL models, particularly in real-time or resource-constrained healthcare environments. The parameters considered in this investigation included computational speed, floating-point operations (FLOPs), total trainable parameters, and memory usage.
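A minimal PyTorch sketch of how the parameter count, weight memory, and per-image inference time can be measured is shown below; the model is an arbitrary stand-in for CRPCNN-ViT or CRPQNN-ViT, and MAC/FLOP counts in practice are typically obtained with dedicated profiling utilities, which are not shown here.

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(  # arbitrary stand-in model; replace with CRPCNN-ViT or CRPQNN-ViT
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 2)
)

# Total trainable parameters
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {n_params}")

# Approximate memory footprint of the weights (float32 = 4 bytes per parameter)
print(f"Parameter memory: {n_params * 4 / 1024:.1f} KiB")

# Average per-image inference time over a small batch
model.eval()
x = torch.randn(32, 3, 224, 224)
with torch.no_grad():
    start = time.perf_counter()
    model(x)
    elapsed = time.perf_counter() - start
print(f"Inference time: {elapsed / x.size(0) * 1000:.2f} ms per image")
```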

3. Results

The experimental results reported in this study are derived from the implementation and evaluation of the proposed CRPQNN-ViT architecture. For a fair and consistent comparison, the performance metrics associated with the CRPCNN-ViT model are adopted from our previously published work [56,57].

3.1. Phase 1: Binary Classification

The performance achieved by the CRPQNN-ViT using an 80:20 split is detailed in Table 2. During the training and validation phase, the model achieved accuracies of 98.18% and 97.73%, demonstrating its capability to effectively learn discriminative features. The sensitivity and NPV were recorded at 98.64% and 98.61%, respectively, during validation, indicating a strong ability to correctly identify positive cases and minimize false negatives. The specificity reached 96.82%, while the precision was 97.88%, highlighting the model’s balanced performance in detecting both positive and negative instances.
The performance of the CRPQNN-ViT model for binary classification between CRP and normal images is illustrated in Figure 8. The training and validation accuracy curves indicate rapid convergence within 23 epochs, with both curves stabilizing above 98%, suggesting effective learning and minimal overfitting. The loss curves further confirm model stability, with the training and validation losses remaining consistently low throughout the training process. The Receiver Operating Characteristic (ROC) analysis demonstrates the model’s strong discriminative ability, with AUC values of 98.18% for training and 97.73% for validation. Additionally, the confusion matrix of test images highlights the classification performance, with 49 out of 50 CRP and 49 out of 50 normal samples correctly identified, resulting in a testing classification accuracy of 98%.
The classification performance across the five folds is summarized in Table 3. Similar performance trends were observed across all folds of the 5-fold cross-validation. The mean and standard deviation of the performance metrics across the folds indicate that the CRPQNN-ViT model maintained stable performance across different data partitions.
The difference in performance between the 80:20 split and 5-fold cross-validation is tabulated in Table 4. The observed differences between the two approaches were minimal, while accuracy remained identical. These negligible deviations demonstrate the model’s high degree of stability across different validation techniques and reinforce its potential applicability in clinical settings for CRP classification.

3.2. Phase 2: Multi-Classification

The CRPQNN-ViT model was further assessed for its effectiveness in performing multi-class classification tasks, as presented in Table 5. Specifically, the model was employed to accurately differentiate and predict multiple categories, including various types of CRP as well as normal tissue. The class-wise performance achieved by CRPQNN-ViT using an 80:20 split during both the training and validation phases, with training converging at epoch 29 of 50 under the Adam optimizer configuration, is presented. The overall accuracies during training and validation were 98.13% and 97.92%, respectively.
The multi-class classification performance of the CRPQNN-ViT model is depicted in Figure 9, showcasing its training behavior and evaluation metrics. The accuracy curves reveal that both training and validation accuracies converge quickly, indicating the model’s strong generalization capability across multiple polyp classes. The loss plot demonstrates a consistent reduction in both training and validation losses, reflecting efficient learning and model stability. The ROC curves for each class highlight high discriminative power, with AUC scores exceeding 97% across all classes. The confusion matrix of the test images further validates model performance, showing high classification accuracy across all classes, with minimal misclassifications.
The class-wise performance achieved by CRPQNN-ViT during 5-fold cross-validation is provided in Table 6. Across the five folds (K = 1 to K = 5), the CRPQNN-ViT consistently demonstrated high classification performance. The overall classification accuracy across all classes and folds was approximately 97.94% ± 0.35. The variance among the folds was minimal, indicating that CRPQNN-ViT maintained its performance across all the folds irrespective of the data partition. The difference in performance of CRPQNN-ViT using an 80:20 split and 5-fold cross-validation is provided in Table 7. The observed consistency in the performance of CRPQNN-ViT using an 80:20 split and K-fold cross-validation again confirms the model’s strong generalization capability.

4. Discussion

4.1. Phase 1: Binary Classification

Table 8 provides a comparative evaluation of the computational complexity and execution time for the CRPCNN-ViT and CRPQNN-ViT models considered for binary classification. The CRPQNN-ViT model exhibits a more lightweight architecture, with a total of 86 million parameters, which are significantly fewer than the 132 million parameters required by CRPCNN-ViT. This reduction is reflected in computational demands, as CRPQNN-ViT performs only 8.5 Giga Multiply–Accumulate Operations (MACs) and 17 Giga floating-point operations (FLOPs), compared to 11.6 Giga MACs and 23.2 Giga FLOPs for the classical model. Furthermore, the QNN-based model achieves faster processing times, with training and validation requiring 10.61 milliseconds per image and testing requiring 1.79 milliseconds per image.
A comparative performance analysis between the CRPCNN-ViT and CRPQNN-ViT models to evaluate the impact of incorporating Quanvolutional operations on classification performance is presented in Table 9. The CRPQNN-ViT model consistently outperformed the CRPCNN-ViT baseline, demonstrating improved training accuracy (98.18% vs. 97.44%) and validation accuracy (97.73% vs. 96.59%), with respective gains of 0.74% and 1.14%. The results clearly demonstrate that the CRPQNN-ViT model achieved a superior performance when compared to the CRPCNN-ViT model across all key metrics. To further investigate the performance variability of the models, a comparison was carried out using mean ± standard deviation values across five folds for both CRPCNN-ViT and CRPQNN-ViT architectures, as tabulated in Table 10. The CRPQNN-ViT model demonstrated a notable improvement in overall classification accuracy, achieving 97.73% ± 0.31, compared to 96.60% ± 0.03 for the CRPCNN-ViT model, reflecting an enhancement of +1.13%. While specificity and precision exhibited slight reductions of 0.08% and 0.10%, respectively, these differences fall within acceptable variance limits and do not negatively impact the model’s reliability. These findings highlight that the integration of quantum-inspired Quanvolutional layers into the CRPQNN-ViT architecture contributes to improved feature learning and generalization, thereby enhancing the model’s effectiveness in the classification of CRP.

4.2. Phase 2: Multi-Classification

The computational complexity and processing time for the CRPCNN-ViT and CRPQNN-ViT models are compared in Table 11. CRPQNN-ViT, which integrates quantum components, demonstrates notable improvements in efficiency over its classical counterpart. Specifically, CRPQNN-ViT requires only 86 million parameters compared to 132 million, as required by CRPCNN-ViT. Correspondingly, it achieves reductions in MACs and FLOPs, with values of 8.5 Giga MACs and 17 Giga FLOPs, respectively. In terms of processing speed, CRPQNN-ViT also offers faster inference and training times, requiring only 11.36 milliseconds for training and validation per image and 2.11 milliseconds for testing per image.
A detailed class-wise performance comparison carried out between the CRPCNN-ViT and CRPQNN-ViT models to assess the influence of Quanvolutional operations on multi-class classification outcomes is presented in Table 12. The CRPQNN-ViT model consistently exhibited improved or comparable performance across most evaluation metrics and class categories when compared to the CRPCNN-ViT, with the exception of a reduction in specificity and NPV for class 3. Importantly, the overall training and validation accuracy of the CRPQNN-ViT model reached 98.13% and 97.92%, respectively, outperforming CRPCNN-ViT, whose values were 97.28% and 96.02%, respectively. Table 13 tabulates the comparison between the two DL models under 5-fold cross-validation. For most of the classes, the gains were more pronounced, though a marginal reduction was noted in NPV for class 2 and class 3.
While CNNs are highly optimized on classical hardware using GPUs and TPUs, they often face several limitations. CNNs typically exhibit high FLOP counts due to repeated convolutional and pooling operations, and deep architectures can contain millions of parameters, making them prone to overfitting and requiring extensive regularization. Additionally, memory usage in CNNs scales with model depth and input data size, leading to substantial resource demands.
QNNs, by contrast, leverage intrinsic properties of quantum computing such as superposition, entanglement, and quantum parallelism. These principles allow QNNs to perform multiple computations simultaneously, enabling more compact and efficient circuit representations. Quantum gates perform unitary transformations analogous to classical operations, but often with fewer operations when optimized. As a result, QNNs can operate with significantly fewer trainable parameters and reduced memory usage for model representation, as quantum states are encoded in the amplitudes of qubits.
Moreover, QNNs can represent complex functions in high-dimensional Hilbert spaces using fewer layers and less redundancy than classical deep networks, enhancing their theoretical representational power and learning capacity. However, simulating QNNs on classical machines remains computationally intensive as compared to quantum systems due to the exponential scaling of qubit states. This leads to high memory and processing demands, and the latency of simulated quantum gates does not match the efficiency expected from actual quantum hardware. Once mature, physical quantum systems are anticipated to alleviate these limitations through qubit-efficient computation and reduced memory footprints.

4.3. Challenges and Limitations

While the proposed CRP-ViT model has demonstrated strong performance in CRP classification, several challenges and limitations emerged during the course of this research. One of the primary concerns is data limitation. Additionally, the computational complexity of CRPCNN-ViT imposes significant demands on processing power and memory, which may restrict its deployment in real-time or resource-limited clinical settings. In general, high-resolution images are typically not used as direct input for either CNNs or QNNs for several reasons related to computational efficiency, memory limitations, and overfitting risk that diminishes the performance of the model. CRPQNN-ViT is constrained by the limited number of qubits available on near-term quantum devices, which restricts the model’s capacity to process high-dimensional medical imaging data directly. Scaling a QNN beyond four qubits introduces several challenges, including exponential growth in the Hilbert space, increased circuit depth, and susceptibility to noise and decoherence on the current hardware. As quantum hardware evolves to support more qubits with lower noise levels and improved gate fidelity, the CRPQNN-ViT framework can be systematically scaled by modular circuit expansion, qubit-efficient design patterns, and layer-wise quantum training techniques, thereby making it suitable for complex, high-dimensional medical imaging tasks.
Even though the performance of CRPQNN-ViT is superior, it imposes a barrier involving quantum hardware constraints, since current quantum processors are still in the early stages of development, with limited qubit fidelity and scalability. Future extensions of the quantum work will explore error mitigation techniques such as Zero-Noise Extrapolation (ZNE), and measurement error correction will be integrated to enhance inference fidelity. In the architecture, qubit-efficient encodings and hardware-efficient ansatzes will be incorporated to align circuit structures with native gate sets and the connectivity constraints of specific NISQ backends. These enhancements are expected to make CRPQNN-ViT more resilient to noise and thus more viable for deployment on near-term quantum hardware. Additionally, in future work, we intend to explore and systematically evaluate alternative entanglement strategies such as CZ, iSWAP, or hardware-efficient entanglement patterns through empirical experimentation and comparative analysis to better understand their impact on classification performance and circuit expressiveness.

5. Conclusions

This study presents a novel hybrid architecture, CRPQNN-ViT, which integrates quantum neural networks (QNNs) into a convolutional residual pathway-enhanced Vision Transformer framework for efficient CRP classification. The proposed model outperforms its classical counterpart, CRPCNN-ViT, in terms of both classification accuracy and computational efficiency across binary and multi-class tasks. Notably, the QNN-based model not only outperformed the classical models but also generalized well to unseen data. This may be attributed to the ability of quantum circuits to represent complex data distributions with fewer parameters. However, the limitations of current quantum simulators, noise in quantum gates, and the lack of optimized quantum hardware may have impacted the full potential of the QNN model. Nonetheless, the results highlight that QNNs, when mature, could offer a powerful alternative for high-dimensional medical imaging tasks, especially where classical models face scalability issues.

Author Contributions

Conceptualization, J.S. and S.U.; methodology, J.S. and S.U.; software, J.S. and S.U.; validation, F.A., S.M.A. and S.U.; formal analysis, F.A., S.M.A. and S.U.; investigation, J.S. and S.U.; resources, J.S. and S.U.; data curation, J.S. and S.U.; writing—original draft preparation, J.S. and S.U.; writing—review and editing, F.A., S.M.A., and S.U.; visualization, J.S. and S.U.; supervision, S.U.; project administration, S.U.; funding acquisition, F.A., S.M.A., and S.U. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

This study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Ethics Committee of SRM Medical College Hospital and Research Centre, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, India, 603203 (ethical approval number: 8677/IEC/2023; approval date: 19 July 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in this study.

Data Availability Statement

The data that support the findings of this study are openly available at the following. (1) URL: https://figshare.com/articles/figure/Polyp_DataSet_zip/21221579?file=37636550 (accessed on 16 July 2025); DOI: https://doi.org/10.1016/j.patcog.2012.03.002. (2) URL: https://www.kaggle.com/datasets/balraj98/cvcclinicdb (accessed on 16 July 2025); DOI: https://doi.org/10.1016/j.compmedimag.2015.02.007. (3) URL: http://www.depeca.uah.es/colonoscopy_dataset/ (accessed on 16 July 2025); DOI: https://doi.org/10.1109/tmi.2016.2547947.

Acknowledgments

The authors would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under project number R-2025-1889. We gratefully acknowledge the use of Servier Medical Art’s open access image repository, which has been highly valuable to the development of figures for this project. We also extend our profound appreciation to the distinguished gastroenterologists for their expert input in annotation, which substantially enriched the quality and accuracy of this work. Additionally, we are thankful to the participants whose involvement has supported the successful completion of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Marabotto, E.; Kayali, S.; Buccilli, S.; Levo, F.; Bodini, G.; Giannini, E.G.; Savarino, V.; Savarino, E.V. Colorectal cancer in inflammatory bowel diseases: Epidemiology and prevention: A review. Cancers 2022, 14, 4254. [Google Scholar] [CrossRef] [PubMed]
  2. Sullivan, B.A.; Noujaim, M.; Roper, J. Cause, epidemiology, and histology of polyps and pathways to colorectal cancer. Gastrointest. Endosc. Clin. 2022, 32, 177–194. [Google Scholar] [CrossRef] [PubMed]
  3. Kanth, P.; Inadomi, J.M. Screening and prevention of colorectal cancer. BMJ 2021, 374, n1855. [Google Scholar] [CrossRef] [PubMed]
  4. Selvaraj, J.; Umapathy, S. CRPU-NET: A deep learning model based semantic segmentation for the detection of colorectal polyp in lower gastrointestinal tract. Biomed. Phys. Eng. Express 2023, 10, 015018. [Google Scholar] [CrossRef] [PubMed]
  5. Liu, Y.; Wei, X.; Feng, X.; Liu, Y.; Feng, G.; Du, Y. Repeatability of radiomics studies in colorectal cancer: A systematic review. BMC Gastroenterol. 2023, 23, 125. [Google Scholar] [CrossRef] [PubMed]
  6. Porter, R.J.; Din, S.; Bankhead, P.; Oniscu, A.; Arends, M.J. QuPath algorithm accurately identifies MLH1-deficient inflammatory bowel disease-associated colorectal cancers in a tissue microarray. Diagnostics 2023, 13, 1890. [Google Scholar] [CrossRef] [PubMed]
  7. Karalis, V.D. The integration of artificial intelligence into clinical practice. Appl. Biosci. 2024, 3, 14–44. [Google Scholar] [CrossRef]
  8. Sharma, N.; Kaushik, P. Integration of AI in Healthcare Systems—A Discussion of the Challenges and Opportunities of Integrating AI in Healthcare Systems for Disease Detection and Diagnosis. In AI in Disease Detection: Advancements and Applications; Wiley: Hoboken, NJ, USA, 2025; pp. 239–263. [Google Scholar] [CrossRef]
  9. Uddin, K.M.M.; Bhuiyan, M.T.A.; Saad, M.N.; Islam, A.; Islam, M.M. Ensemble Machine Learning-Based Approach to Predict Cervical Cancer with Hyperparameter Tuning and Model Explainability. Biomed. Mater. Devices 2025, 3, 1463–1490. [Google Scholar] [CrossRef]
  10. Ilmi, H.R.; Khalaf, E.T. Blaze Pose Graph Neural Networks and Long Short-Term Memory for Yoga Posture Recognition. IJACI Int. J. Adv. Comput. Inform. 2025, 1, 79–88. [Google Scholar] [CrossRef]
  11. Yu, H.; Yang, L.T.; Zhang, Q.; Armstrong, D.; Deen, M.J. Convolutional neural networks for medical image analysis: State-of-the-art, comparisons, improvement and perspectives. Neurocomputing 2021, 444, 92–110. [Google Scholar] [CrossRef]
  12. Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
  13. Wei, H.; Yi, D.; Hu, S.; Zhu, G.; Ding, Y.; Pang, M. Multi-granularity classification of upper gastrointestinal endoscopic images. Neurocomputing 2025, 626, 129564. [Google Scholar] [CrossRef]
  14. Hassija, V.; Palanisamy, B.; Chatterjee, A.; Mandal, A.; Chakraborty, D.; Pandey, A.; Chalapathi, G.; Kumar, D. Transformers for Vision: A Survey on Innovative Methods for Computer Vision. IEEE Access 2025, 13, 95496–95523. [Google Scholar] [CrossRef]
  15. Ali, A.M.; Benjdira, B.; Koubaa, A.; El-Shafai, W.; Khan, Z.; Boulila, W. Vision transformers in image restoration: A survey. Sensors 2023, 23, 2385. [Google Scholar] [CrossRef] [PubMed]
  16. Bhatt, D.; Patel, C.; Talsania, H.; Patel, J.; Vaghela, R.; Pandya, S.; Modi, K.; Ghayvat, H. CNN variants for computer vision: History, architecture, application, challenges and future scope. Electronics 2021, 10, 2470. [Google Scholar] [CrossRef]
  17. Davoodi, P.; Ezoji, M.; Sadeghnejad, N. Classification of natural images inspired by the human visual system. Neurocomputing 2023, 518, 60–69. [Google Scholar] [CrossRef]
  18. Chen, L.; Li, S.; Bai, Q.; Yang, J.; Jiang, S.; Miao, Y. Review of image classification algorithms based on convolutional neural networks. Remote Sens. 2021, 13, 4712. [Google Scholar] [CrossRef]
  19. Cong, S.; Zhou, Y. A review of convolutional neural network architectures and their optimizations. Artif. Intell. Rev. 2023, 56, 1905–1969. [Google Scholar] [CrossRef]
  20. Elharrouss, O.; Akbari, Y.; Almadeed, N.; Al-Maadeed, S. Backbones-review: Feature extractor networks for deep learning and deep reinforcement learning approaches in computer vision. Comput. Sci. Rev. 2024, 53, 100645. [Google Scholar] [CrossRef]
  21. Dubey, S.R.; Singh, S.K.; Chaudhuri, B.B. Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing 2022, 503, 92–108. [Google Scholar] [CrossRef]
  22. Zafar, A.; Aamir, M.; Mohd Nawi, N.; Arshad, A.; Riaz, S.; Alruban, A.; Dutta, A.K.; Almotairi, S. A comparison of pooling methods for convolutional neural networks. Appl. Sci. 2022, 12, 8643. [Google Scholar] [CrossRef]
  23. He, L.; Wang, M. SliceSamp: A Promising Downsampling Alternative for Retaining Information in a Neural Network. Appl. Sci. 2023, 13, 11657. [Google Scholar] [CrossRef]
  24. Zhao, X.; Wang, L.; Zhang, Y.; Han, X.; Deveci, M.; Parmar, M. A review of convolutional neural networks in computer vision. Artif. Intell. Rev. 2024, 57, 99. [Google Scholar] [CrossRef]
  25. Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617. [Google Scholar] [CrossRef]
  26. Vakalopoulou, M.; Christodoulidis, S.; Burgos, N.; Colliot, O.; Lepetit, V. Deep learning: Basics and convolutional neural networks (CNNs). In Machine Learning for Brain Disorders; Springer Nature: New York, NY, USA, 2023; pp. 77–115. [Google Scholar]
  27. Reyad, M.; Sarhan, A.M.; Arafa, M. A modified Adam algorithm for deep neural network optimization. Neural Comput. Appl. 2023, 35, 17095–17112. [Google Scholar] [CrossRef]
  28. Hassan, E.; Shams, M.Y.; Hikal, N.A.; Elmougy, S. The effect of choosing optimizer algorithms to improve computer vision tasks: A comparative study. Multimed. Tools Appl. 2023, 82, 16591–16633. [Google Scholar] [CrossRef] [PubMed]
  29. Seetohul, V.; Jahankhani, H.; Kendzierskyj, S.; Will Arachchige, I.S. Quantum Reinforcement Learning: Advancing AI Agents Through Quantum Computing. In Space Law Principles and Sustainable Measures; Springer Nature: Berlin/Heidelberg, Germany, 2024; pp. 55–73. [Google Scholar]
  30. Barreto, A.G.; Fanchini, F.F.; Papa, J.P.; de Albuquerque, V.H.C. Why consider quantum instead classical pattern recognition techniques? Appl. Soft Comput. 2024, 165, 112096. [Google Scholar] [CrossRef]
  31. Innan, N.; Behera, B.K.; Al-Kuwari, S.; Farouk, A. QNN-VRCS: A Quantum Neural Network for Vehicle Road Cooperation Systems. IEEE Trans. Intell. Transp. Syst. 2025, 1–10. [Google Scholar] [CrossRef]
  32. Meghanath, A.; Das, S.; Behera, B.K.; Khan, M.A.; Al-Kuwari, S.; Farouk, A. QDCNN: Quantum Deep Learning for Enhancing Safety and Reliability in Autonomous Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2025, 1–11. [Google Scholar] [CrossRef]
  33. Acampora, G.; Schiattarella, R. Deep neural networks for quantum circuit mapping. Neural Comput. Appl. 2021, 33, 13723–13743. [Google Scholar] [CrossRef]
  34. Khrennikov, A. Roots of quantum computing supremacy: Superposition, entanglement, or complementarity? Eur. Phys. J. Spec. Top. 2021, 230, 1053–1057. [Google Scholar] [CrossRef]
  35. González, F.A.; Vargas-Calderón, V.; Vinck-Posada, H. Classification with quantum measurements. J. Phys. Soc. Jpn. 2021, 90, 044002. [Google Scholar] [CrossRef]
  36. Guarasci, R.; De Pietro, G.; Esposito, M. Quantum natural language processing: Challenges and opportunities. Appl. Sci. 2022, 12, 5651. [Google Scholar] [CrossRef]
  37. Hur, T.; Kim, L.; Park, D.K. Quantum convolutional neural network for classical data classification. Quantum Mach. Intell. 2022, 4, 3. [Google Scholar] [CrossRef]
  38. Hermans, S.; Pompili, M.; Beukers, H.; Baier, S.; Borregaard, J.; Hanson, R. Qubit teleportation between non-neighbouring nodes in a quantum network. Nature 2022, 605, 663–668. [Google Scholar] [CrossRef] [PubMed]
  39. Hughes, C.; Isaacson, J.; Perry, A.; Sun, R.F.; Turner, J. What is a qubit? In Quantum Computing for the Quantum Curious; Springer Nature: Berlin/Heidelberg, Germany, 2021; pp. 7–16. [Google Scholar]
  40. Ovalle-Magallanes, E.; Alvarado-Carrillo, D.E.; Avina-Cervantes, J.G.; Cruz-Aceves, I.; Ruiz-Pinales, J. Quantum angle encoding with learnable rotation applied to quantum–classical convolutional neural networks. Appl. Soft Comput. 2023, 141, 110307. [Google Scholar] [CrossRef]
  41. Wang, Y.; Wang, Y.; Chen, C.; Jiang, R.; Huang, W. Development of variational quantum deep neural networks for image recognition. Neurocomputing 2022, 501, 566–582. [Google Scholar] [CrossRef]
  42. Fasulo, E.; D’Amico, F.; Zilli, A.; Furfaro, F.; Cicerone, C.; Parigi, T.L.; Peyrin-Biroulet, L.; Danese, S.; Allocca, M. Advancing Colorectal Cancer Prevention in Inflammatory Bowel Disease (IBD): Challenges and Innovations in Endoscopic Surveillance. Cancers 2024, 17, 60. [Google Scholar] [CrossRef] [PubMed]
  43. Ali, H.; Muzammil, M.A.; Dahiya, D.S.; Ali, F.; Yasin, S.; Hanif, W.; Gangwani, M.K.; Aziz, M.; Khalaf, M.; Basuli, D. Artificial intelligence in gastrointestinal endoscopy: A comprehensive review. Ann. Gastroenterol. 2024, 37, 133. [Google Scholar] [CrossRef] [PubMed]
  44. Tontini, G.E.; Rimondi, A.; Vernero, M.; Neumann, H.; Vecchi, M.; Bezzio, C.; Cavallaro, F. Artificial intelligence in gastrointestinal endoscopy for inflammatory bowel disease: A systematic review and new horizons. Ther. Adv. Gastroenterol. 2021, 14, 17562848211017730. [Google Scholar] [CrossRef] [PubMed]
  45. Urban, G.; Tripathi, P.; Alkayali, T.; Mittal, M.; Jalali, F.; Karnes, W.; Baldi, P. Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology 2018, 155, 1069–1078.e8. [Google Scholar] [CrossRef] [PubMed]
  46. Krenzer, A.; Heil, S.; Fitting, D.; Matti, S.; Zoller, W.G.; Hann, A.; Puppe, F. Automated classification of polyps using deep learning architectures and few-shot learning. BMC Med. Imaging 2023, 23, 59. [Google Scholar] [CrossRef] [PubMed]
  47. Balasubramani, S.; Renjith, P.; Kavisankar, L.; Rajavel, R.; Malarvel, M.; Shankar, A. A Quantum-Enhanced Artificial Neural Network Model for Efficient Medical Image Compression. IEEE Access 2025, 13, 31809–31828. [Google Scholar] [CrossRef]
  48. Yengec-Tasdemir, S.B.; Aydin, Z.; Akay, E.; Dogan, S.; Yilmaz, B. An effective colorectal polyp classification for histopathological images based on supervised contrastive learning. Comput. Biol. Med. 2024, 172, 108267. [Google Scholar] [CrossRef] [PubMed]
  49. Sasmal, P.; Sharma, V.; Prakash, A.J.; Bhuyan, M.K.; Patro, K.K.; Samee, N.A.; Alamro, H.; Iwahori, Y.; Tadeusiewicz, R.; Acharya, U.R. Semi-supervised generative adversarial networks for improved colorectal polyp classification using histopathological images. Inf. Sci. 2024, 658, 120033. [Google Scholar] [CrossRef]
  50. De Carvalho, T.; Kader, R.; Brandao, P.; Lovat, L.B.; Mountney, P.; Stoyanov, D. NICE polyp feature classification for colonoscopy screening. Int. J. Comput. Assist. Radiol. Surg. 2025, 20, 1015–1024. [Google Scholar] [CrossRef] [PubMed]
  51. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
  52. Haq, E.U.; Yong, Q.; Yuan, Z.; Jianjun, H.; Haq, R.U.; Qin, X. Accurate multiclassification and segmentation of gastric cancer based on a hybrid cascaded deep learning model with a vision transformer from endoscopic images. Inf. Sci. 2024, 670, 120568. [Google Scholar] [CrossRef]
  53. Awujoola, J.O.; Enem, T.A.; Owolabi, J.A.; Akusu, O.C.; Abioye, O.; AbidemiAwujoola, E.; OlayinkaAdelegan, R. Exploring the Intersection of Quantum Neural Networks and Classical Neural Networks for Early Cancer Identification. In Quantum Computing; Auerbach Publications: Abingdon-on-Thames, UK, 2025; pp. 147–169. [Google Scholar]
  54. Khan, M.A.; Aman, M.N.; Sikdar, B. Beyond bits: A review of quantum embedding techniques for efficient information processing. IEEE Access 2024, 12, 46118–46137. [Google Scholar] [CrossRef]
  55. Schuld, M.; Killoran, N. Is quantum advantage the right goal for quantum machine learning? Prx Quantum 2022, 3, 030101. [Google Scholar] [CrossRef]
  56. Selvaraj, J.; Umapathy, S.; Rajesh, N.A. Artificial intelligence based real time colorectal cancer screening study: Polyp segmentation and classification using multi-house database. Biomed. Signal Process. Control 2025, 99, 106928. [Google Scholar] [CrossRef]
  57. Selvaraj, J.; Sadaf, K.; Aslam, S.M.; Umapathy, S. Multiclassification of Colorectal Polyps from Colonoscopy Images Using AI for Early Diagnosis. Diagnostics 2025, 15, 1285. [Google Scholar] [CrossRef] [PubMed]
  58. Bernal, J.; Sánchez, J.; Vilarino, F. Towards automatic polyp detection with a polyp appearance model. Pattern Recognit. 2012, 45, 3166–3182. [Google Scholar] [CrossRef]
  59. Bernal, J.; Sánchez, F.J.; Fernández-Esparrach, G.; Gil, D.; Rodríguez, C.; Vilariño, F. WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 2015, 43, 99–111. [Google Scholar] [CrossRef] [PubMed]
  60. Mesejo, P.; Pizarro, D.; Abergel, A.; Rouquette, O.; Beorchia, S.; Poincloux, L.; Bartoli, A. Computer-aided classification of gastrointestinal lesions in regular colonoscopy. IEEE Trans. Med. Imaging 2016, 35, 2051–2063. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Illustration of a 2D convolution operation in a CNN using a 3 × 3 kernel.
Figure 2. Quantum circuit for DL: (a) workflow; (b) parameterized quantum circuit architecture.
Figure 3. Workflow of the proposed research study.
Figure 4. Workflow of phase 1: binary classification.
Figure 5. Workflow of phase 2: multi-classification.
Figure 6. Workflow of the proposed architecture (CRPQNN-ViT).
Figure 7. Performance evaluation and 5-fold cross-validation.
Figure 8. Visualization of CRPQNN-ViT binary classification performance. (a) Accuracy plot, (b) loss plot, (c) ROC plot, and (d) confusion matrix for test results.
Figure 9. Visualization of CRPQNN-ViT multi-classification performance. (a) Accuracy plot, (b) loss plot, (c) ROC plot, and (d) confusion matrix for test results.
Table 1. Summary of the proposed CRPQNN-ViT architecture.

| Stage | Layer Type | Details/Description | No. of Layers | Output Size |
|---|---|---|---|---|
| Input | Quantum-Encoded Input Image | 256 × 256 × 3 | | 256 × 256 × 3 |
| Stage 0 | Quantum Conv + Quantum Pooling | 7 × 7 QConv, 64 filters, stride 2 + 3 × 3 QPool | 2 | 56 × 56 × 64 |
| Stage 1 | Quantum Conv Block + 2 Identity QBlocks | [1 × 1, 64] → [3 × 3, 64] → [1 × 1, 256] | 3 blocks (9 layers) | 56 × 56 × 256 |
| Stage 2 | Quantum Conv Block + 3 Identity QBlocks | [1 × 1, 128] → [3 × 3, 128] → [1 × 1, 512] | 4 blocks (12 layers) | 28 × 28 × 512 |
| Stage 3 | Quantum Conv Block + 5 Identity QBlocks | [1 × 1, 256] → [3 × 3, 256] → [1 × 1, 1024] | 6 blocks (18 layers) | 14 × 14 × 1024 |
| Stage 4 | Quantum Conv Block + 2 Identity QBlocks | [1 × 1, 512] → [3 × 3, 512] → [1 × 1, 2048] | 3 blocks (9 layers) | 7 × 7 × 2048 |
| Pooling | Quantum Global Average Pooling | Reduces feature map to 1 × 1 × 2048 | 1 | 1 × 1 × 2048 |
| FC Layer | Quantum Fully Connected + QSigmoid/QSoftmax | Dense layer with QSigmoid/QSoftmax (intermediate feature vector) | 1 | 2048 or custom size |
| Patch Embedding | Patchify + Linear Projection | Convert 2D features to patch tokens (e.g., 16 × 16 patches) | 1 | N × D (e.g., 196 × 768) |
| Transformer Encoder | Multi-Head Self-Attention + MLP | Multiple transformer blocks with LayerNorm and MLP | 12 blocks (typical) | N × D (e.g., 196 × 768) |
| Classification Head | MLP Head + Sigmoid for binary/MLP Head + Softmax for multi-class | Final classification from [CLS] token | 1 | 2 for binary/4 for multi-class |
Table 2. Performance metrics achieved by CRPQNN-ViT (binary class) using 80:20 split.

| Task | Epoch | Accuracy | Sensitivity | Specificity | Precision | NPV |
|---|---|---|---|---|---|---|
| Training | 23 | 98.18 | 98.41 | 97.95 | 97.96 | 98.40 |
| Validation | | 97.73 | 98.64 | 96.82 | 96.88 | 98.61 |

Note: All performance metrics are expressed in percentage format.
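For reference, the metrics reported in Tables 2–4 (accuracy, sensitivity, specificity, precision, and NPV) follow the standard definitions derived from a 2 × 2 confusion matrix. The snippet below is an illustrative computation using scikit-learn and is not the authors' evaluation script; label 1 is assumed to denote the polyp-present class.

```python
# Standard binary metrics from a 2x2 confusion matrix (illustrative sketch,
# not the authors' evaluation code). Labels: 1 = polyp present, 0 = no polyp.
import numpy as np
from sklearn.metrics import confusion_matrix

def binary_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy":    100 * (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": 100 * tp / (tp + fn),   # recall on the polyp class
        "specificity": 100 * tn / (tn + fp),
        "precision":   100 * tp / (tp + fp),   # positive predictive value
        "npv":         100 * tn / (tn + fn),   # negative predictive value
    }

# Toy example with arbitrary labels:
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1])
print(binary_metrics(y_true, y_pred))
```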
Table 3. Performance metrics achieved by CRPQNN-ViT (binary class) using 5-fold cross-validation.

| K-Fold | Accuracy | Sensitivity | Specificity | Precision | NPV |
|---|---|---|---|---|---|
| K = 1 | 98.18 | 99.09 | 97.27 | 97.27 | 99.07 |
| K = 2 | 97.95 | 99.09 | 96.82 | 96.82 | 99.07 |
| K = 3 | 97.27 | 98.18 | 96.36 | 96.36 | 98.15 |
| K = 4 | 97.5 | 98.64 | 96.36 | 96.36 | 98.6 |
| K = 5 | 97.73 | 98.64 | 96.82 | 96.82 | 98.61 |
| Mean ± SD | 97.73 ± 0.31 | 98.73 ± 0.38 | 96.73 ± 0.33 | 96.73 ± 0.33 | 98.70 ± 0.36 |

Note: All performance metrics are expressed in percentage format.
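The per-fold scores in Table 3 are summarized as mean ± standard deviation. A minimal sketch of this aggregation, assuming the usual stratified 5-fold protocol from scikit-learn (the fold generator shown is an assumption, not the authors' exact split), is given below.

```python
# Aggregating per-fold scores into mean +/- SD, as in Table 3 (illustrative sketch).
import numpy as np
from sklearn.model_selection import StratifiedKFold

def crossval_summary(scores):
    scores = np.asarray(scores, dtype=float)
    return f"{scores.mean():.2f} ± {scores.std():.2f}"

# Example with the accuracy column of Table 3:
fold_accuracies = [98.18, 97.95, 97.27, 97.5, 97.73]
print(crossval_summary(fold_accuracies))

# Generating the folds themselves (X = images, y = polyp/no-polyp labels) would
# typically look like this; the random seed is an arbitrary placeholder:
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# for train_idx, val_idx in skf.split(X, y):
#     train and evaluate one fold, then collect its metrics
```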
Table 4. Difference between 80:20 split and 5-fold cross-validation of CRPQNN-ViT (binary class).

| Method | Accuracy | Sensitivity | Specificity | Precision | NPV |
|---|---|---|---|---|---|
| 80:20 Split (A) | 97.73 | 98.64 | 96.82 | 96.88 | 98.61 |
| 5-Fold (B) | 97.73 | 98.73 | 96.73 | 96.73 | 98.7 |
| Difference between (A and B) | 0.00 | 0.09 | 0.09 | 0.15 | 0.09 |

Note: All performance metrics are expressed in percentage format.
Table 5. Performance metrics achieved by CRPQNN-ViT (multi-class) using 80:20 split.

| Task | Epoch | Class | Accuracy | Sensitivity | Specificity | Precision | NPV | Overall Accuracy |
|---|---|---|---|---|---|---|---|---|
| Training | 29/500 | 0 | 99.15 | 98.59 | 99.72 | 99.15 | 99.53 | 98.13 |
| | | 1 | 98.48 | 97.38 | 99.49 | 98.48 | 99.11 | |
| | | 2 | 97.82 | 97.73 | 99.27 | 97.82 | 99.24 | |
| | | 3 | 97.06 | 98.84 | 99.03 | 97.06 | 99.62 | |
| Validation | | 0 | 98.86 | 98.49 | 99.62 | 98.87 | 98.86 | 97.92 |
| | | 1 | 98.11 | 97.37 | 98.49 | 98.11 | 98.11 | |
| | | 2 | 97.73 | 97.36 | 98.11 | 97.73 | 97.73 | |
| | | 3 | 96.97 | 98.46 | 95.09 | 96.97 | 96.97 | |

Note: All performance metrics are expressed in percentage format.
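For the multi-class results in Tables 5–7, per-class metrics are conventionally obtained in one-vs-rest fashion from the 4 × 4 confusion matrix, while the overall accuracy is the trace of that matrix divided by the number of samples. The sketch below illustrates this convention; it is not taken from the authors' code, and the example labels are arbitrary.

```python
# One-vs-rest per-class metrics from a multi-class confusion matrix
# (illustrative sketch of the convention behind the class 0-3 rows in Tables 5-7).
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_metrics(y_true, y_pred, n_classes=4):
    cm = confusion_matrix(y_true, y_pred, labels=list(range(n_classes)))
    total = cm.sum()
    rows = {}
    for c in range(n_classes):
        tp = cm[c, c]
        fn = cm[c, :].sum() - tp
        fp = cm[:, c].sum() - tp
        tn = total - tp - fn - fp
        rows[c] = {
            "accuracy":    100 * (tp + tn) / total,
            "sensitivity": 100 * tp / (tp + fn) if tp + fn else 0.0,
            "specificity": 100 * tn / (tn + fp) if tn + fp else 0.0,
            "precision":   100 * tp / (tp + fp) if tp + fp else 0.0,
            "npv":         100 * tn / (tn + fn) if tn + fn else 0.0,
        }
    overall_accuracy = 100 * np.trace(cm) / total   # "Overall Accuracy" column
    return rows, overall_accuracy

# Toy example with arbitrary class labels 0-3:
y_true = np.array([0, 1, 2, 3, 0, 1, 2, 3, 1, 2])
y_pred = np.array([0, 1, 2, 3, 0, 1, 3, 3, 1, 2])
print(per_class_metrics(y_true, y_pred))
```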
Table 6. Performance metrics achieved by CRPQNN-ViT (multi-class) using 5-fold cross-validation.

| K-Fold | Class | Accuracy | Sensitivity | Specificity | Precision | NPV | Overall Accuracy |
|---|---|---|---|---|---|---|---|
| K = 1 | 0 | 98.48 | 98.48 | 99.15 | 98.48 | 99.15 | 97.54 |
| | 1 | 96.64 | 96.64 | 99.05 | 96.64 | 99.05 | |
| | 2 | 96.98 | 96.98 | 98.86 | 97.36 | 98.84 | |
| | 3 | 98.07 | 98.07 | 97.59 | 96.21 | 98.86 | |
| K = 2 | 0 | 98.5 | 98.5 | 99.62 | 99.62 | 98.5 | 98.39 |
| | 1 | 97.75 | 97.75 | 98.86 | 98.49 | 98.12 | |
| | 2 | 98.11 | 98.11 | 98.11 | 98.11 | 98.11 | |
| | 3 | 99.22 | 99.22 | 98.42 | 96.97 | 99.36 | |
| K = 3 | 0 | 98.49 | 98.49 | 98.86 | 98.49 | 98.86 | 97.82 |
| | 1 | 97.74 | 97.74 | 98.11 | 98.11 | 97.74 | |
| | 2 | 96.98 | 96.98 | 97.35 | 97.35 | 96.98 | |
| | 3 | 98.08 | 98.08 | 96.97 | 96.97 | 98.08 | |
| K = 4 | 0 | 98.5 | 98.5 | 99.24 | 99.24 | 98.5 | 98.2 |
| | 1 | 98.11 | 98.11 | 98.48 | 98.48 | 98.11 | |
| | 2 | 97.73 | 97.73 | 97.73 | 97.73 | 97.73 | |
| | 3 | 98.47 | 98.47 | 97.35 | 97.35 | 98.47 | |
| K = 5 | 0 | 98.12 | 98.12 | 98.86 | 98.86 | 98.12 | 97.73 |
| | 1 | 97.36 | 97.36 | 97.73 | 97.73 | 97.36 | |
| | 2 | 97.35 | 97.35 | 97.35 | 97.35 | 97.35 | |
| | 3 | 98.08 | 98.08 | 96.97 | 96.97 | 98.08 | |
| Mean ± SD | 0 | 98.42 ± 0.17 | 98.42 ± 0.17 | 99.15 ± 0.32 | 98.94 ± 0.49 | 98.63 ± 0.39 | 97.94 ± 0.35 |
| | 1 | 97.52 ± 0.56 | 97.52 ± 0.56 | 98.45 ± 0.54 | 97.89 ± 0.77 | 98.08 ± 0.63 | |
| | 2 | 97.43 ± 0.49 | 97.43 ± 0.49 | 97.88 ± 0.63 | 97.58 ± 0.34 | 97.80 ± 0.72 | |
| | 3 | 98.38 ± 0.50 | 98.38 ± 0.50 | 97.46 ± 0.60 | 96.89 ± 0.42 | 98.57 ± 0.55 | |

Note: All performance metrics are expressed in percentage format.
Table 7. Difference between 80:20 split and 5-fold cross-validation of CRPQNN-ViT (multi-class).

| Method | Class | Accuracy | Sensitivity | Specificity | Precision | NPV | Overall Accuracy |
|---|---|---|---|---|---|---|---|
| 80:20 Split (A) | 0 | 98.86 | 98.49 | 99.62 | 98.87 | 98.86 | 97.92 |
| | 1 | 98.11 | 97.74 | 98.49 | 98.11 | 98.11 | |
| | 2 | 97.73 | 97.36 | 98.11 | 97.73 | 97.73 | |
| | 3 | 96.97 | 98.08 | 95.85 | 96.97 | 96.97 | |
| 5-Fold (B) | 0 | 98.42 | 98.42 | 99.15 | 98.94 | 98.63 | 97.94 |
| | 1 | 97.52 | 97.52 | 98.45 | 97.89 | 98.08 | |
| | 2 | 97.43 | 97.43 | 97.88 | 97.58 | 97.80 | |
| | 3 | 98.38 | 98.38 | 97.46 | 96.89 | 98.57 | |
| Difference between (A and B) | 0 | 0.44 | 0.07 | 0.47 | −0.07 | 0.23 | −0.02 |
| | 1 | 0.59 | 0.22 | 0.04 | 0.22 | 0.03 | |
| | 2 | 0.3 | −0.07 | 0.23 | 0.15 | −0.07 | |
| | 3 | −1.41 | −0.30 | −1.61 | 0.08 | −1.60 | |

Note: All performance metrics are expressed in percentage format.
Table 8. Comparison of computational parameters of CRPCNN-ViT and CRPQNN-ViT (binary class).

| Method | Model | Total Parameters (Million) | MACs (Giga) | FLOPs (Giga) | Training and Validation Time per Image (ms) | Testing Time per Image (ms) |
|---|---|---|---|---|---|---|
| CRPCNN-ViT [56] | CNN | 132 | 11.6 | 23.2 | 17.69 | 3.58 |
| CRPQNN-ViT | QNN | 86 | 8.5 | 17 | 10.6 | 1.79 |
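The computational figures compared in Tables 8 and 11 (parameter count, MACs, FLOPs, and per-image latency) can be reproduced for any PyTorch module along the following lines. The use of the third-party `thop` profiler and the FLOPs ≈ 2 × MACs convention are assumptions for illustration, not a description of the authors' measurement setup.

```python
# Illustrative sketch of how model size, MACs/FLOPs, and per-image latency
# (as in Tables 8 and 11) can be measured for a PyTorch model.
import time
import torch
from thop import profile  # third-party profiler, assumed available (pip install thop)

def computational_profile(model, input_size=(1, 3, 256, 256), runs=50):
    model.eval()
    x = torch.randn(*input_size)
    macs, params = profile(model, inputs=(x,), verbose=False)
    with torch.no_grad():
        for _ in range(5):            # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        latency_ms = 1000 * (time.perf_counter() - start) / runs
    return {
        "params_million": params / 1e6,
        "macs_giga": macs / 1e9,
        "flops_giga": 2 * macs / 1e9,   # FLOPs commonly approximated as 2 x MACs
        "test_ms_per_image": latency_ms,
    }

# Usage: computational_profile(my_model) for any torch.nn.Module.
```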
Table 9. Comparison of key metrics of CRPCNN-ViT and CRPQNN-ViT (binary class) using 80:20 split.

| Method | Epoch | Accuracy (Train) | Sensitivity (Train) | Specificity (Train) | Precision (Train) | NPV (Train) | Accuracy (Val) | Sensitivity (Val) | Specificity (Val) | Precision (Val) | NPV (Val) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CRPCNN-ViT [56] | 20 | 97.44 | 97.61 | 97.28 | 97.27 | 97.61 | 96.59 | 96.38 | 96.80 | 96.82 | 96.36 |
| CRPQNN-ViT | 23 | 98.18 | 98.41 | 97.95 | 97.96 | 98.4 | 97.73 | 98.64 | 96.82 | 96.88 | 98.61 |
| Difference | 3 | 0.74 | 0.8 | 0.67 | 0.69 | 0.79 | 1.14 | 2.26 | 0.02 | 0.06 | 2.25 |

Note: All performance metrics are expressed in percentage format.
Table 10. Comparison of key metrics of CRPCNN-ViT and CRPQNN-ViT (binary class) using 5-fold cross-validation.

| Method | Accuracy | Sensitivity | Specificity | Precision | NPV |
|---|---|---|---|---|---|
| CRPCNN-ViT (A) [56] | 96.60 ± 0.03 | 96.41 ± 0.03 | 96.81 ± 0.02 | 96.83 ± 0.02 | 96.37 ± 0.02 |
| CRPQNN-ViT (B) | 97.73 ± 0.31 | 98.73 ± 0.38 | 96.73 ± 0.33 | 96.73 ± 0.33 | 98.70 ± 0.36 |
| Difference between (A and B) | +1.13 ± 0.31 | +2.32 ± 0.38 | −0.08 ± 0.33 | −0.10 ± 0.33 | +2.33 ± 0.36 |

Note: All performance metrics are expressed in percentage format.
Table 11. Comparison of computational parameters of CRPCNN-ViT and CRPQNN-ViT (multi-class).

| Method | Model | Total Parameters (Million) | MACs (Giga) | FLOPs (Giga) | Training and Validation Time per Image (ms) | Testing Time per Image (ms) |
|---|---|---|---|---|---|---|
| CRPCNN-ViT [57] | CNN | 132 | 11.6 | 23.2 | 18.93 | 4.20 |
| CRPQNN-ViT | QNN | 86 | 8.5 | 17 | 11.36 | 2.11 |
Table 12. Comparison of key metrics of CRPCNN-ViT and CRPQNN-ViT (multi-class) using 80:20 split.

| Model | Epoch | Class | Accuracy (Train) | Sensitivity (Train) | Specificity (Train) | Precision (Train) | NPV (Train) | Overall Accuracy (Train) | Accuracy (Val) | Sensitivity (Val) | Specificity (Val) | Precision (Val) | NPV (Val) | Overall Accuracy (Val) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CRPCNN-ViT [57] | 22/500 | 0 | 97.92 | 97.55 | 99.31 | 97.44 | 98.9 | 97.28 | 98.01 | 97.75 | 98.29 | 98.86 | 98.51 | 96.02 |
| | | 1 | 96.44 | 96.44 | 99.13 | 97.25 | 98.5 | | 95.93 | 93.8 | 97.97 | 97.35 | 96.2 | |
| | | 2 | 97.16 | 97.16 | 99.15 | 96.5 | 98.75 | | 96.57 | 95.08 | 96.68 | 95.08 | 97.04 | |
| | | 3 | 97.98 | 97.98 | 99.43 | 97.98 | 99 | | 96.97 | 97.61 | 96.8 | 92.8 | 99.02 | |
| CRPQNN-ViT | 29/500 | 0 | 99.15 | 98.59 | 99.72 | 99.15 | 99.53 | 98.13 | 98.86 | 98.49 | 99.62 | 98.87 | 98.86 | 97.92 |
| | | 1 | 98.48 | 97.38 | 99.49 | 98.48 | 99.11 | | 98.11 | 97.37 | 98.49 | 98.11 | 98.11 | |
| | | 2 | 97.82 | 97.73 | 99.27 | 97.82 | 99.24 | | 97.73 | 97.36 | 98.11 | 97.73 | 97.73 | |
| | | 3 | 97.06 | 98.84 | 99.03 | 97.06 | 99.62 | | 96.97 | 98.46 | 95.09 | 96.97 | 96.97 | |
| Difference | 7 | 0 | 1.23 | 1.04 | 0.41 | 1.71 | 0.63 | 0.85 | 0.85 | 0.74 | 1.33 | −0.01 | 0.35 | 1.9 |
| | | 1 | 2.04 | 0.94 | 0.36 | 1.23 | 0.61 | | 2.18 | 3.57 | 0.52 | 0.76 | 1.91 | |
| | | 2 | 0.66 | 0.57 | 0.12 | 1.32 | 0.49 | | 1.16 | 2.28 | 1.43 | 2.65 | 0.69 | |
| | | 3 | −0.92 | 0.86 | −0.40 | −0.92 | 0.62 | | 0 | 0.85 | −1.71 | 4.17 | −2.05 | |

Note: All performance metrics are expressed in percentage format.
Table 13. Comparison of key metrics of CRPCNN-ViT and CRPQNN-ViT (multi-class) using 5-fold cross-validation.

| Method | Class | Accuracy | Sensitivity | Specificity | Precision | NPV | Overall Accuracy |
|---|---|---|---|---|---|---|---|
| CRPCNN-ViT (A) [57] | 0 | 98.22 ± 0.94 | 96.87 ± 0.86 | 98.64 ± 0.85 | 97.88 ± 1.24 | 98.19 ± 0.95 | 96.27 ± 1.06 |
| | 1 | 97.07 ± 1.02 | 95.24 ± 1.22 | 97.70 ± 1.21 | 96.29 ± 1.65 | 97.42 ± 0.81 | |
| | 2 | 96.77 ± 1.06 | 95.61 ± 1.04 | 97.15 ± 1.33 | 95.00 ± 1.61 | 97.94 ± 0.98 | |
| | 3 | 96.89 ± 1.55 | 97.42 ± 1.45 | 96.43 ± 1.68 | 94.09 ± 1.19 | 98.99 ± 0.69 | |
| CRPQNN-ViT (B) | 0 | 98.42 ± 0.17 | 98.42 ± 0.17 | 99.15 ± 0.32 | 98.94 ± 0.49 | 98.63 ± 0.39 | 97.94 ± 0.35 |
| | 1 | 97.52 ± 0.56 | 97.52 ± 0.56 | 98.45 ± 0.54 | 97.89 ± 0.77 | 98.08 ± 0.63 | |
| | 2 | 97.43 ± 0.49 | 97.43 ± 0.49 | 97.88 ± 0.63 | 97.58 ± 0.34 | 97.80 ± 0.72 | |
| | 3 | 98.38 ± 0.50 | 98.38 ± 0.50 | 97.46 ± 0.60 | 96.89 ± 0.42 | 98.57 ± 0.55 | |
| Difference between (A and B) | 0 | +0.20 ± 0.96 | +1.55 ± 0.88 | +0.51 ± 0.91 | +1.06 ± 1.34 | +0.44 ± 1.03 | +1.67 ± 1.11 |
| | 1 | +0.45 ± 1.16 | +2.28 ± 1.35 | +0.75 ± 1.33 | +1.60 ± 1.83 | +0.66 ± 1.03 | |
| | 2 | +0.66 ± 1.17 | +1.82 ± 1.14 | +0.73 ± 1.47 | +2.58 ± 1.65 | −0.14 ± 1.23 | |
| | 3 | +1.49 ± 1.63 | +0.96 ± 1.55 | +1.03 ± 1.78 | +2.80 ± 1.26 | −0.42 ± 0.88 | |

Note: All performance metrics are expressed in percentage format.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
