Author Contributions
Conceptualization: S.S., M.S.H., and M.B.A.M.; data curation, formal analysis, investigation, methodology: S.S., M.S.H., M.B.A.M., M.F.A.M., and Z.M.; funding acquisition, project administration: Z.M., M.B.A.M., and M.F.A.M.; resources, software: S.S., M.S.H., and M.B.A.M.; validation, visualization: M.B.A.M., A.A., M.F.A.M., and Z.M.; writing—original draft: S.S., M.S.H., and M.B.A.M.; writing—review and editing: M.B.A.M., A.A., Z.M., and M.F.A.M. All authors have read and agreed to the published version of the manuscript.
Figure 1.
System architecture of the proposed hybrid classification model, including the preprocessing, feature extraction, and classification stages.
Figure 2.
Representative sample images from each class in the (a) CRC-VAL-HE-7K, (b) NCT-CRC-HE-100K, and (c) LC25000 datasets.
Figure 3.
Image enhancement process applied to histopathological samples, including gamma correction, bilateral filtering, and CLAHE to improve contrast and tissue structure visibility.
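The first stage of the enhancement pipeline in this caption, gamma correction, can be sketched with a lookup table. This is a minimal illustration only: the convention `out = 255 * (in / 255) ** (1 / gamma)` and the value `gamma=1.5` are assumptions, not the settings used in the paper. Bilateral filtering and CLAHE are omitted here; they are usually taken from an image library (for example OpenCV's `cv2.bilateralFilter` and `cv2.createCLAHE`).

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table for 8-bit gamma correction.

    Assumed convention: out = 255 * (in / 255) ** (1 / gamma),
    so gamma > 1 brightens mid-tones and gamma < 1 darkens them.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    inv = 1.0 / gamma
    return [round(255.0 * (v / 255.0) ** inv) for v in range(256)]


def apply_gamma(pixels, gamma):
    """Apply gamma correction to a 2D list of 8-bit intensities."""
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in pixels]


# Hypothetical 2 x 3 grayscale patch; gamma=1.5 is an illustrative value.
patch = [[0, 32, 64], [128, 192, 255]]
brightened = apply_gamma(patch, gamma=1.5)
```

Black and white are left unchanged by this mapping, while intermediate intensities are raised, which is the contrast-lifting effect the caption describes.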
Figure 4.
Original versus enhanced histopathological images from the (a) CRC-VAL-HE-7K, (b) NCT-CRC-HE-100K, and (c) LC25000 datasets, demonstrating the visual improvements introduced by the enhancement pipeline.
Figure 5.
Overall architecture of the proposed hybrid model: (a) Xception backbone for hierarchical feature extraction, (b) Convolutional Block Attention Module (CBAM) for spatial–channel attention refinement, (c) Transformer block for capturing long-range spatial dependencies, and (d) Classification head for final prediction across all target classes.
Figure 6.
Proposed model architecture with attention: (a) residual block, (b) multi-head self-attention, and (c) feed-forward network (FFN).
Figure 7.
The web-based Graphical User Interface (GUI) deployed for model inference and accessibility.
Figure 8.
Confusion matrix of the proposed model on the CRC-VAL-HE-7K dataset.
Figure 9.
Confusion matrix of the proposed model on the NCT-CRC-HE-100K dataset.
Figure 10.
Confusion matrix of the proposed model on the LC25000 dataset.
Figure 11.
The proposed model’s training and validation accuracy for CRC-VAL-HE-7K.
Figure 12.
The proposed model’s training and validation loss for CRC-VAL-HE-7K.
Figure 13.
The proposed model’s training and validation accuracy for NCT-CRC-HE-100K.
Figure 14.
The proposed model’s training and validation loss for NCT-CRC-HE-100K.
Figure 15.
The proposed model’s training and validation accuracy for LC25000.
Figure 16.
The proposed model’s training and validation loss for LC25000.
Figure 17.
t-SNE visualization of deep feature representations (CRC-VAL-HE-7K).
Figure 18.
t-SNE visualization of deep feature representations (NCT-CRC-HE-100K).
Figure 19.
t-SNE visualization of deep feature representations (LC25000).
Figure 20.
ROC curve for the proposed model (CRC-VAL-HE-7K).
Figure 21.
ROC curve for the proposed model (NCT-CRC-HE-100K).
Figure 22.
ROC curve for the proposed model (LC25000).
Figure 23.
The proposed model’s layer-wise feature extraction process on the input images.
Figure 24.
The proposed model’s Grad-CAM visualization across three different datasets: (a) CRC-VAL-HE-7K; (b) NCT-CRC-HE-100K; and (c) LC25000.
Figure 25.
Fold-wise performance metrics of the proposed model on the LC25000 dataset.
Figure 26.
Ablation analysis of different preprocessing strategies on the CRC-VAL-HE-7K dataset.
Table 1.
Comparison of existing histopathological image classification approaches, highlighting their advantages and limitations.
| Model Category | Advantages | Limitations |
|---|---|---|
| Standard CNNs [16,17,18] | Excellent at extracting hierarchical local features (cell, nucleus morphology). Computationally efficient. | Limited effective receptive field fails to capture global tissue context. Highly sensitive to stain and color variations. |
| Pure Transformers [19] | Excellent at modeling global context and long-range spatial relationships. | Requires massive training datasets. May lose fine-grained local texture details captured by CNNs. |
| Standard Hybrid Models [15,20] | Combines CNN local feature power with Transformer global context (represents current SOTA). | Can be overly complex. Often lacks a validated preprocessing stage and may not use fine-grained, low-level attention mechanisms (CBAM/SE). |
| Stain Invariance Networks [21] | Explicitly minimizes the variability caused by inconsistent H&E staining, improving generalization across different scanning centers. | Primary focus on color consistency may neglect morphological feature enhancement. Still reliant on local feature extractors. |
| Multiple Instance Learning (MIL) [22] | Handles extremely large Whole-Slide Images (WSIs) by aggregating information from numerous small patches. Often includes a form of patch-level attention. | Aggregation layer may lose crucial spatial relationships between patches. Computationally intensive due to sequential tile processing. |
| Proposed Model (Enhancement + Xception–CBAM–Transformer) | Designed to address the limitations listed above: (1) Preprocessing handles stain variance. (2) Xception captures local detail. (3) CBAM refines feature focus. (4) Transformer models global context. | - |
Table 2.
Class distribution of the CRC-VAL-HE-7K dataset following an 80–20% train–validation split.
| Class | Total | Training (80%) | Validation (20%) |
|---|---|---|---|
| ADI | 1338 | 1070 | 268 |
| BACK | 847 | 678 | 169 |
| DEB | 339 | 271 | 68 |
| LYM | 634 | 507 | 127 |
| MUC | 1035 | 828 | 207 |
| MUS | 592 | 474 | 118 |
| NORM | 741 | 593 | 148 |
| STR | 421 | 337 | 84 |
| TUM | 1233 | 986 | 247 |
| Total | 7180 | 5744 | 1436 |
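The per-class training and validation counts in this table are consistent with rounding 0.8 × (class total) to the nearest integer and assigning the remainder to validation. The sketch below reproduces that arithmetic; the helper name `split_counts` is hypothetical, and the rounding rule is an inference from the table rather than a documented step of the authors' pipeline.

```python
# Per-class totals copied from the table above.
totals = {
    "ADI": 1338, "BACK": 847, "DEB": 339, "LYM": 634, "MUC": 1035,
    "MUS": 592, "NORM": 741, "STR": 421, "TUM": 1233,
}


def split_counts(n, train_fraction=0.8):
    """Return (train, validation) counts for a class with n images."""
    train = round(n * train_fraction)
    return train, n - train


split = {cls: split_counts(n) for cls, n in totals.items()}

# Column sums over the nine classes.
n_total = sum(totals.values())
n_train = sum(train for train, _ in split.values())
n_val = sum(val for _, val in split.values())
```

Summing the per-class rows this way gives 7180 images in total, 5744 for training, and 1436 for validation.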
Table 3.
Class distribution of the NCT-CRC-HE-100K dataset following an 80–20% train–validation split.
| Class | Total | Training (80%) | Validation (20%) |
|---|---|---|---|
| ADI | 15,020 | 12,016 | 3004 |
| BACK | 10,566 | 8453 | 2113 |
| DEB | 11,512 | 9210 | 2302 |
| LYM | 11,557 | 9246 | 2311 |
| MUC | 8896 | 7117 | 1779 |
| MUS | 13,536 | 10,829 | 2707 |
| NORM | 8763 | 7010 | 1753 |
| STR | 10,446 | 8357 | 2089 |
| TUM | 14,317 | 11,454 | 2863 |
| Total | 104,113 | 83,192 | 20,921 |
Table 4.
Class distribution of the LC25000 dataset following an 80–20% train–validation split.
| Class | Total | Training (80%) | Validation (20%) |
|---|---|---|---|
| colon_aca | 5000 | 4000 | 1000 |
| colon_bnt | 5000 | 4000 | 1000 |
| lung_aca | 5000 | 4000 | 1000 |
| lung_scc | 5000 | 4000 | 1000 |
| lung_bnt | 5000 | 4000 | 1000 |
| Total | 25,000 | 20,000 | 5000 |
Table 5.
Average image quality metrics of original images in the CRC-VAL-HE-7K dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| ADI | 5.3087 | 13.5822 | 100.00 | 1.0 |
| BACK | 3.9238 | 4.9472 | 100.00 | 1.0 |
| DEB | 6.8641 | 18.8359 | 100.00 | 1.0 |
| LYM | 7.3665 | 24.9039 | 100.00 | 1.0 |
| MUC | 6.9295 | 20.7820 | 100.00 | 1.0 |
| MUS | 6.8906 | 17.7147 | 100.00 | 1.0 |
| NORM | 7.4485 | 17.6426 | 100.00 | 1.0 |
| STR | 7.0325 | 19.4184 | 100.00 | 1.0 |
| TUM | 7.1196 | 15.6653 | 100.00 | 1.0 |
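Two of the metrics in this table can be sketched directly. The constant 100.00 dB PSNR and IQI of 1.0 are what one would expect when each original image is compared with itself; the code below assumes that the 100 dB value is a cap returned when the mean squared error is zero (where PSNR is otherwise undefined). That cap, the function names, and the use of base-2 entropy over 8-bit intensities are assumptions for illustration, not the authors' documented implementation.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy (bits) of a flat sequence of pixel intensities."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def psnr(reference, test, max_value=255.0, identical_db=100.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Returns identical_db when the inputs match exactly (MSE == 0);
    this cap is an assumed convention.
    """
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return identical_db
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 8-bit data: every intensity appears exactly once.
flat = list(range(256))
entropy_uniform = shannon_entropy(flat)   # upper bound for 8-bit images
psnr_self = psnr(flat, flat)              # original compared with itself
```

Under these definitions, entropy for an 8-bit image lies between 0 and 8 bits, which matches the range of the values reported here.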
Table 6.
Average image quality metrics of enhanced images in the CRC-VAL-HE-7K dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| ADI | 5.9625 | 14.0326 | 26.9067 | 0.9943 |
| BACK | 5.1629 | 4.6107 | 18.8282 | 0.9316 |
| DEB | 7.4049 | 21.5364 | 21.9243 | 0.9744 |
| LYM | 7.7233 | 29.3981 | 21.9710 | 0.9750 |
| MUC | 7.3384 | 20.7820 | 23.4738 | 0.9814 |
| MUS | 7.3787 | 19.8815 | 22.2676 | 0.9813 |
| NORM | 7.7302 | 20.4912 | 21.9938 | 0.9749 |
| STR | 7.4978 | 22.5964 | 21.1507 | 0.9762 |
| TUM | 7.5359 | 17.6582 | 21.7172 | 0.9762 |
Table 7.
Average image quality metrics of original images in the NCT-CRC-HE-100K dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| ADI | 5.1162 | 16.2056 | 100.00 | 1.0 |
| BACK | 3.7543 | 4.9802 | 100.00 | 1.0 |
| DEB | 6.7772 | 19.1007 | 100.00 | 1.0 |
| LYM | 7.3677 | 21.6227 | 100.00 | 1.0 |
| MUC | 7.0829 | 17.7169 | 100.00 | 1.0 |
| MUS | 6.8061 | 18.9369 | 100.00 | 1.0 |
| NORM | 7.3585 | 19.1077 | 100.00 | 1.0 |
| STR | 6.9971 | 20.0641 | 100.00 | 1.0 |
| TUM | 7.1653 | 20.4176 | 100.00 | 1.0 |
Table 8.
Average image quality metrics of enhanced images in the NCT-CRC-HE-100K dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| ADI | 5.7392 | 16.4826 | 26.8352 | 0.9960 |
| BACK | 5.0444 | 4.6428 | 22.3114 | 0.9522 |
| DEB | 7.2333 | 21.5301 | 22.8247 | 0.9813 |
| LYM | 7.7413 | 27.3297 | 22.0819 | 0.9744 |
| MUC | 7.4602 | 19.4021 | 22.5264 | 0.9827 |
| MUS | 7.2743 | 21.3647 | 22.6810 | 0.9833 |
| NORM | 7.6924 | 22.1051 | 22.0992 | 0.9773 |
| STR | 7.4583 | 23.2537 | 22.1608 | 0.9812 |
| TUM | 7.5758 | 23.6460 | 21.9519 | 0.9793 |
Table 9.
Average image quality metrics of original images in the LC25000 dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| colon_aca | 7.0919 | 28.2666 | 100.00 | 1.0 |
| colon_bnt | 7.1505 | 28.0476 | 100.00 | 1.0 |
| lung_aca | 7.0451 | 14.2314 | 100.00 | 1.0 |
| lung_bnt | 6.6160 | 13.7855 | 100.00 | 1.0 |
| lung_scc | 6.7575 | 13.9977 | 100.00 | 1.0 |
Table 10.
Average image quality metrics of enhanced images in the LC25000 dataset.
| Class | Entropy | SI | PSNR (dB) | IQI |
|---|---|---|---|---|
| colon_aca | 7.6567 | 34.2878 | 17.2238 | 0.9511 |
| colon_bnt | 7.7199 | 32.7993 | 16.1384 | 0.9375 |
| lung_aca | 7.5795 | 19.8720 | 21.1361 | 0.9680 |
| lung_bnt | 7.2235 | 18.2911 | 21.6530 | 0.9780 |
| lung_scc | 7.4720 | 20.4731 | 20.7948 | 0.9687 |
Table 11.
Hardware specifications used for training the proposed model.
| Hardware Specifications | Details |
|---|---|
| Platform | Jupyter Notebook |
| Processor | AMD Ryzen 5 3600 |
| Memory (RAM) | 64 GB |
| Operating System | Ubuntu 23.10, 64 bit |
| Graphics Card | NVIDIA GeForce GTX 1660 (VRAM 6 GB) |
Table 12.
Hyperparameters used in training the proposed model.
| Hyperparameter | Value |
|---|---|
| Input image size | 224 × 224 × 3 |
| Number of classes | 5, 9, and 9 |
| Batch size | 16 |
| Number of epochs | 50, 100, and 100 |
| Backbone | Xception |
| Frozen layers | First 100 layers |
| Attention module | CBAM (Channel + Spatial) |
| Token embedding dimension | 128 |
| Number of tokens | H × W (CNN feature map) |
| Positional encoding | Learned |
| Transformer encoder blocks | 2 |
| Attention heads | 4 |
| FFN hidden dimension | 256 |
| Transformer dropout | 0.1 |
| Classifier dropout | 0.3 |
| Optimizer | Adam (learning rate 1 × 10⁻⁴) |
| Loss function | Categorical Cross-entropy |
| Pooling layer | Global Average Pooling |
| Final activation | Softmax |
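As a rough cross-check of the transformer settings in this table, the sketch below derives the token count, the per-head dimension, and an approximate parameter count for the encoder stack. It rests on assumptions that are not stated in the table: a 7 × 7 final feature map (the usual Xception output for a 224 × 224 input at stride 32), and a standard encoder block with biased query, key, value, and output projections, a two-layer FFN, and two layer normalizations. It is not the authors' parameter accounting.

```python
def encoder_block_params(d_model, ffn_dim):
    """Approximate trainable parameters in one standard Transformer encoder block."""
    # Query, key, value, and output projections: d_model x d_model weights plus biases.
    attention = 4 * (d_model * d_model + d_model)
    # Two dense layers: d_model -> ffn_dim -> d_model, each with biases.
    ffn = (d_model * ffn_dim + ffn_dim) + (ffn_dim * d_model + d_model)
    # Two layer normalizations, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d_model)
    return attention + ffn + layer_norms


# Values from the hyperparameter table.
d_model, num_heads, ffn_dim, num_blocks = 128, 4, 256, 2

# Assumed spatial size of the CNN feature map for a 224 x 224 input.
feature_map_hw = (7, 7)

num_tokens = feature_map_hw[0] * feature_map_hw[1]   # H x W tokens
head_dim = d_model // num_heads                      # per-head width
per_block = encoder_block_params(d_model, ffn_dim)
total = num_blocks * per_block
```

Under these assumptions the two encoder blocks operate on 49 tokens with 32-dimensional heads and contribute roughly 0.26 M parameters, a small share of the full model.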
Table 13.
Comparative performance of pre-trained models and the proposed model on CRC-VAL-HE-7K.
| Model | Accuracy | Precision | Recall | F1-Score | MCC | Kappa |
|---|---|---|---|---|---|---|
| DenseNet-121 | 99.26 | 99.07 | 99.10 | 99.30 | 99.38 | 99.37 |
| MobileNetV2 | 97.76 | 96.73 | 96.89 | 96.81 | 96.60 | 96.61 |
| Xception | 99.20 | 98.77 | 98.84 | 98.80 | 98.78 | 98.79 |
| InceptionV3 | 97.27 | 97.14 | 97.89 | 96.38 | 96.63 | 95.82 |
| VGG-16 | 98.18 | 97.54 | 96.70 | 97.12 | 97.08 | 97.09 |
| NASNetMobile | 94.35 | 95.08 | 93.30 | 94.19 | 98.32 | 93.86 |
| Proposed Model | 99.58 | 99.10 | 99.40 | 99.40 | 99.40 | 99.40 |
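The MCC and Kappa columns can be computed from a multi-class confusion matrix. The sketch below uses the standard multi-class Matthews correlation coefficient and Cohen's kappa, with rows as true classes and columns as predicted classes. The function name and the toy matrix are hypothetical, and the results are fractions, whereas the table appears to report percentages.

```python
import math


def mcc_and_kappa(cm):
    """Multi-class MCC and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    s = sum(sum(row) for row in cm)                              # total samples
    c = sum(cm[i][i] for i in range(k))                          # correct predictions
    t = [sum(cm[i]) for i in range(k)]                           # true count per class
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]      # predicted count per class

    sum_pt = sum(pj * tj for pj, tj in zip(p, t))
    denom = math.sqrt((s * s - sum(pj * pj for pj in p)) *
                      (s * s - sum(tj * tj for tj in t)))
    mcc = (c * s - sum_pt) / denom if denom else 0.0

    observed = c / s                  # observed agreement (accuracy)
    expected = sum_pt / (s * s)       # agreement expected by chance
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0
    return mcc, kappa


# Hypothetical two-class example with one error in each direction.
toy_mcc, toy_kappa = mcc_and_kappa([[4, 1], [1, 4]])
```

Both statistics equal 1 for a perfect classifier and fall toward 0 as predictions approach chance level, which is why they are reported alongside accuracy.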
Table 14.
Per-class performance metrics for each histopathological class on CRC-VAL-HE-7K.
| Class | Precision | Recall | F1-Score | Per-Class Acc. | MCC | Support |
|---|---|---|---|---|---|---|
| ADI | 1.0000 | 0.9925 | 0.9962 | 0.9986 | 0.9954 | 268 |
| BACK | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 169 |
| DEB | 1.0000 | 0.9851 | 0.9925 | 0.9993 | 0.9921 | 68 |
| LYM | 1.0000 | 0.9921 | 0.9960 | 0.9993 | 0.9956 | 127 |
| MUC | 1.0000 | 0.9952 | 0.9976 | 0.9993 | 0.9972 | 207 |
| MUS | 1.0000 | 0.9831 | 0.9915 | 0.9986 | 0.9907 | 118 |
| NORM | 1.0000 | 0.9932 | 0.9966 | 0.9993 | 0.9962 | 148 |
| STR | 0.9545 | 1.0000 | 0.9767 | 0.9972 | 0.9756 | 84 |
| TUM | 0.9840 | 1.0000 | 0.9919 | 0.9972 | 0.9903 | 247 |
| Macro Avg | 0.9932 | 0.9935 | 0.9932 | 0.9988 | 0.9926 | – |
| Weighted Avg | 0.9946 | 0.9944 | 0.9944 | 0.9987 | 0.9937 | 1436 |
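The per-class columns follow from one-vs-rest counts. As a worked check, the sketch below reproduces the STR row under the assumption that the class had 84 true positives, 4 false positives, and no false negatives, with N = 1436 taken as the sum of the per-class supports. These counts are inferred from the reported precision, recall, and support; they are not taken from the confusion matrix itself.

```python
import math


def one_vs_rest_metrics(tp, fp, fn, n):
    """Per-class metrics from one-vs-rest counts over n samples."""
    tn = n - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / n
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Inferred counts for the STR class (assumption, see the text above).
str_row = one_vs_rest_metrics(tp=84, fp=4, fn=0, n=1436)
str_rounded = {name: round(value, 4) for name, value in str_row.items()}
```

Rounded to four decimals, these counts give precision 0.9545, recall 1.0, F1 0.9767, per-class accuracy 0.9972, and MCC 0.9756, in agreement with the STR row.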
Table 15.
Comparative performance of pre-trained models and the proposed model on NCT-CRC-HE-100K.
| Model | Accuracy | Precision | Recall | F1-Score | MCC | Kappa |
|---|---|---|---|---|---|---|
| DenseNet-121 | 96.81 | 96.81 | 96.81 | 96.81 | 96.40 | 96.40 |
| MobileNetV2 | 96.30 | 96.25 | 96.29 | 96.29 | 95.83 | 95.82 |
| Xception | 98.16 | 98.12 | 98.18 | 98.14 | 97.92 | 97.92 |
| InceptionV3 | 95.20 | 95.17 | 95.13 | 95.14 | 94.58 | 94.58 |
| VGG-16 | 97.02 | 96.94 | 97.06 | 96.98 | 96.65 | 96.64 |
| NASNetMobile | 94.20 | 94.16 | 94.10 | 94.12 | 93.46 | 93.45 |
| Proposed Model | 99.33 | 99.27 | 99.26 | 99.27 | 99.17 | 99.17 |
Table 16.
Per-class performance metrics for each histopathological class on NCT-CRC-HE-100K.
| Class | Precision | Recall | F1-Score | Per-Class Acc. | MCC | Support |
|---|---|---|---|---|---|---|
| ADI | 0.9981 | 0.9981 | 0.9981 | 0.9996 | 0.9979 | 3004 |
| BACK | 0.9991 | 0.9986 | 0.9988 | 0.9997 | 0.9987 | 2113 |
| DEB | 0.9858 | 0.9952 | 0.9905 | 0.9978 | 0.9893 | 2302 |
| LYM | 0.9996 | 0.9944 | 0.9970 | 0.9993 | 0.9966 | 2311 |
| MUC | 0.9977 | 0.9826 | 0.9901 | 0.9982 | 0.9892 | 1779 |
| MUS | 0.9937 | 0.9948 | 0.9943 | 0.9984 | 0.9934 | 2707 |
| NORM | 0.9897 | 0.9897 | 0.9897 | 0.9982 | 0.9887 | 1753 |
| STR | 0.9871 | 0.9885 | 0.9878 | 0.9974 | 0.9864 | 2089 |
| TUM | 0.9882 | 0.9923 | 0.9902 | 0.9972 | 0.9886 | 2863 |
| Macro Avg | 0.9932 | 0.9927 | 0.9929 | 0.9984 | 0.9921 | – |
| Weighted Avg | 0.9930 | 0.9930 | 0.9930 | 0.9984 | 0.9921 | 20,921 |
Table 17.
Comparative performance of pre-trained models and the proposed model on the LC25000 dataset.
| Model | Accuracy | Precision | Recall | F1-Score | MCC | Kappa |
|---|---|---|---|---|---|---|
| DenseNet-121 | 99.26 | 99.52 | 99.52 | 99.52 | 99.40 | 99.25 |
| MobileNetV2 | 99.26 | 99.37 | 99.38 | 99.37 | 99.25 | 99.20 |
| Xception | 99.68 | 99.68 | 99.68 | 99.68 | 99.60 | 99.42 |
| InceptionV3 | 97.86 | 97.88 | 97.84 | 97.83 | 97.31 | 97.32 |
| VGG-16 | 99.72 | 99.78 | 99.78 | 99.78 | 99.72 | 99.60 |
| NASNetMobile | 97.74 | 97.82 | 97.78 | 97.78 | 97.32 | 97.45 |
| Proposed Model | 99.98 | 99.98 | 99.98 | 99.98 | 99.75 | 99.75 |
Table 18.
Per-class performance metrics for colon and lung cancer classification on the LC25000 dataset.
| Class | Precision | Recall | F1-Score | Per-Class Acc. | MCC | Support |
|---|---|---|---|---|---|---|
| colon_aca | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1000 |
| colon_bnt | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1000 |
| lung_aca | 0.9990 | 1.0000 | 0.9995 | 0.9996 | 0.9988 | 1000 |
| lung_bnt | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1000 |
| lung_scc | 1.0000 | 0.9990 | 0.9996 | 0.9996 | 0.9987 | 1000 |
| Macro Avg | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9995 | – |
| Weighted Avg | 0.9998 | 0.9998 | 0.9998 | 0.9998 | 0.9995 | 5000 |
Table 19.
Step-by-step implementation approaches of the proposed model on the LC25000 dataset.
| Model Variant | Params | FLOPs (G) | Inference (ms) | Accuracy (%) | F1 (%) | Acc. vs. Base (%) |
|---|---|---|---|---|---|---|
| Ensemble (MobileNetV2 + Xception + VGG16) | 164.8 M | 24.2 | 112 | 98.76 ± 0.21 | 96.74 ± 0.23 | 3.24 |
| Xception (ImageNet) | 21.9 M | 4.2 | 7.66 | 99.68 ± 0.17 | 99.68 ± 0.26 | 0.30 |
| Xception Conv5 | 7.6 M | 3.8 | 3.6 | 97.58 ± 0.38 | 97.48 ± 0.62 | 2.20 |
| Xception + Spatial Attention | 0.66 M | 1.5 | 0.7 | 98.54 ± 0.04 | 98.38 ± 0.34 | 1.11 |
| Xception (like-CNN) + SE + CBAM | 3.01 M | 2.45 | 11.3 | 98.53 ± 0.15 | 98.46 ± 0.36 | 1.45 |
| Xception + CBAM Attention + Transformer | 20.1 M | 3.9 | 7.1 | 99.98 ± 0.01 | 99.98 ± 0.01 | – |
Table 20.
Performance comparison between prior work and the proposed model on the NCT-CRC-HE-100K dataset.
| Model | Dataset | Accuracy | Precision | Recall | F1-Score | MCC | Kappa | Ref. |
|---|---|---|---|---|---|---|---|---|
| CRCCN-NET | NCT-CRC-HE-100K | 96.26 | 96.44 | 96.34 | 96.38 | 96.00 | 96.00 | [32] |
| CNN + SWIN Transformer | NCT-CRC-HE-100K | 95.80 | 97.90 | 97.63 | 97.76 | 97.61 | 97.64 | [33] |
| VGG19 | NCT-CRC-HE-100K | 96.40 | 94.22 | 94.44 | 94.44 | NA | NA | [34] |
| Ensemble CNN | NCT-CRC-HE-100K | 96.16 | 96.17 | 96.15 | NA | NA | NA | [35] |
| GAN + Inception | NCT-CRC-HE-100K | 89.54 | 86.84 | 86.62 | 98.70 | NA | NA | [36] |
| Proposed Model | NCT-CRC-HE-100K | 99.33 | 99.27 | 99.26 | 99.27 | 99.17 | 99.17 | |
Table 21.
Performance comparison between prior work and the proposed model on CRC-VAL-HE-7K.
| Model | Dataset | Accuracy | Precision | Recall | F1-Score | MCC | Kappa | Ref. |
|---|---|---|---|---|---|---|---|---|
| ResNet50 + Kernel Polynomial | CRC-VAL-HE-7K | 97.01 | 98.20 | 98.20 | 98.20 | 96.50 | 98.10 | [37] |
| FineTuned-VGG16 | CRC-VAL-HE-7K | 97.92 | 98.02 | 97.38 | 97.65 | 97.62 | 97.61 | [38] |
| CNN-adam | CRC-VAL-HE-7K | 90.00 | 89.00 | 87.00 | 87.00 | NA | NA | [39] |
| Proposed Model | CRC-VAL-HE-7K | 99.58 | 99.10 | 99.40 | 99.40 | 99.40 | 99.40 | |
Table 22.
Performance comparison between prior work and the proposed model on the LC25000 dataset.
| Model | Dataset | Accuracy | Precision | Recall | F1-Score | MCC | Kappa | Ref. |
|---|---|---|---|---|---|---|---|---|
| Fine-tuned ResNet101 | LC25000 | 99.94 | 99.84 | 99.85 | 99.84 | NA | NA | [40] |
| LW-MS-CCN | LC25000 | 99.20 | 99.16 | 99.36 | 99.29 | NA | NA | [41] |
| CNN | LC25000 | 96.33 | 96.39 | 96.37 | 96.38 | 95.44 | 95.41 | [42] |
| CNN + GC attention block | LC25000 | 99.76 | 99.76 | 99.40 | 99.70 | 99.50 | 99.50 | [43] |
| HIELCC-EDL | LC25000 | 99.60 | 99.00 | 99.00 | 99.00 | 99.00 | 99.20 | [44] |
| Self-ONN | LC25000 | 99.89 | 99.74 | 99.74 | 99.74 | 99.84 | 99.78 | [45] |
| CNN + ImageNet | LC25000 | 99.96 | 99.96 | 99.96 | 99.96 | 99.96 | 98.36 | [46] |
| Ensemble (MobileNet + Xception) | LC25000 | 99.44 | 99.42 | 99.43 | 99.42 | 99.43 | 99.30 | [47] |
| Ensemble (ResNet + NasNet + EfficientNet) | LC25000 | 99.94 | 99.84 | 99.84 | 99.84 | 99.78 | 99.88 | [48] |
| Proposed Model | LC25000 | 99.98 | 99.98 | 99.98 | 99.98 | 99.95 | 99.98 | |
Table 23.
Patient-level five-fold cross-validation performance (mean ± standard deviation) on the LC25000 dataset.
| Fold | Accuracy | Precision | Recall | F1-Score | MCC | Kappa |
|---|---|---|---|---|---|---|
| Fold-1 | 99.98 | 99.98 | 99.98 | 99.98 | 99.98 | 99.97 |
| Fold-2 | 99.96 | 99.96 | 99.96 | 99.96 | 99.95 | 99.95 |
| Fold-3 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Fold-4 | 99.92 | 99.92 | 99.92 | 99.92 | 99.90 | 99.90 |
| Fold-5 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 |
| Average ± SD | 99.97 ± 0.03 | 99.97 ± 0.03 | 99.97 ± 0.03 | 99.97 ± 0.03 | 99.97 ± 0.0004 | 99.97 ± 0.0004 |
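The summary row can be reproduced from the fold-wise values. The sketch below does this for the accuracy column using the sample standard deviation (n − 1 denominator). Whether the authors used the sample or the population form is not stated; for these five values both round to 0.03.

```python
import statistics

# Fold-wise accuracy (%) copied from the table above.
fold_accuracy = [99.98, 99.96, 100.00, 99.92, 100.00]

mean_acc = statistics.mean(fold_accuracy)
sd_sample = statistics.stdev(fold_accuracy)       # n - 1 denominator
sd_population = statistics.pstdev(fold_accuracy)  # n denominator

summary = f"{mean_acc:.2f} ± {sd_sample:.2f}"
```

Here `summary` evaluates to `99.97 ± 0.03`, matching the accuracy entry in the Average ± SD row.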
Table 24.
Ablation study evaluating the effect of different preprocessing configurations on classification performance on the CRC-VAL-HE-7K dataset.
| Preprocessing Strategy | Accuracy | Precision | Recall | F1-Score | MCC | Kappa |
|---|---|---|---|---|---|---|
| No preprocessing | 99.32 | 99.26 | 99.11 | 99.19 | 99.19 | 99.26 |
| Stain normalization only | 98.95 | 98.66 | 98.83 | 98.71 | 98.80 | 98.80 |
| Spatially Adaptive NLM + Edge-Aware Sharpening | 97.65 | 97.26 | 97.32 | 97.25 | 97.44 | 97.10 |
| Gamma correction only | 99.36 | 99.36 | 99.23 | 99.29 | 99.29 | 99.44 |
| Gamma + Bilateral filtering | 99.51 | 99.36 | 99.29 | 99.33 | 99.44 | 99.44 |
| Gamma + Bilateral + CLAHE | 99.58 | 99.10 | 99.40 | 99.40 | 99.40 | 99.40 |