Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks
Abstract
1. Introduction
1.1. Background and Motivation
1.2. Structure of the Review
- Section 2: Review Methodology. Outlines the approach followed to collect, select, and analyze the studies included in this review.
- Section 3: Theoretical Background. Provides an overview of CNN and CapsNet architectures and explains how CapsNets address some of the major limitations of CNNs.
- Section 4: Applications of CNNs and CapsNets in Medical Imaging. Introduces the general topic of medical imaging and surveys applications involving CNNs and CapsNets.
- Section 5: Comparative Analysis of CNNs and CapsNets in Medical Image Analysis. Compares the performance of CNNs and CapsNets in terms of accuracy, precision, computational efficiency, and scalability, and closes by discussing the main advantages and shortcomings of both approaches.
- Section 6: Real-World Applications and Comparative Results. Highlights implementations and case studies that report comparative results for systems based on CNNs and CapsNets.
- Section 7: Discussion and Future Perspectives. Recapitulates the findings, identifies emerging research trends, and discusses potential future enhancements in the use of AI to support medical decision-making.
- Section 8: Conclusions. Summarizes the major comparative insights and their potential impact on future clinical and research developments.
- We provide a structured and pedagogical comparison between CNNs and CapsNets, describing their mathematical foundations, architectural principles, and operational differences.
- We synthesize recent medical imaging applications of both architectures across major diagnostic tasks, including classification, detection, and segmentation.
- We analyze the technical and clinical barriers that currently prevent CapsNets from achieving real-world deployment, highlighting computational constraints, lack of regulatory validation, and integration challenges in clinical workflows.
- We discuss emerging hybrid paradigms that combine CNNs, CapsNets, and attention-based models, and outline future research directions toward explainable, efficient, and clinically reliable AI decision-support systems.
2. Review Methodology
2.1. Scope and Temporal Coverage
2.2. Sources and Search Strategy
2.3. Inclusion and Exclusion Criteria
- Published in English between 2018 and 2025;
- Related to the application or comparison of CNNs and CapsNets in medical imaging tasks such as classification, segmentation, or disease diagnosis;
- Systematic reviews specifically addressing CNN or CapsNets approaches;
- Reported measurable performance metrics (e.g., accuracy, sensitivity, specificity, precision, or F1-score);
- Used benchmark or publicly available datasets (e.g., BraTS, INbreast, ISIC);
- Contributed to research on interpretability, robustness, or hybrid deep learning architectures.
- Were unrelated to biomedical imaging;
- Were purely theoretical without experimental validation;
- Were editorials, brief communications, or not peer-reviewed.
2.4. Analytical Approach
- Architectural Analysis: Examining the model organization, feature retrieval procedures, and spatial encoding strategies.
- Performance Assessment: Evaluating performance metrics and computational efficiency as presented in the literature.
- Clinical Relevance: Discussing model interpretability, robustness to morphological variability, and potential applicability in real-world decision-support systems.
3. Theoretical Background
3.1. Fundamentals of CNNs
3.2. Fundamentals of Capsule Networks
3.3. Illustrative Example of Dynamic Routing
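To make the routing-by-agreement procedure of Sabour et al. [99] concrete, the following minimal NumPy sketch iterates the coupling coefficients toward higher-level capsules whose outputs agree with the lower-level prediction vectors. Shapes, values, and function names are illustrative only, not taken from any reviewed implementation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity: maps vector length into [0, 1) while preserving direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def dynamic_routing(u_hat, num_iters=3):
    """Routing-by-agreement.

    u_hat: prediction vectors of shape (num_in, num_out, dim_out), i.e. each
           lower-level capsule's prediction for each higher-level capsule.
    Returns the output capsule vectors, shape (num_out, dim_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                            # routing logits
    for _ in range(num_iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)   # softmax over output capsules
        s = np.einsum('ij,ijk->jk', c, u_hat)                  # weighted sum of predictions
        v = squash(s)                                          # output capsule vectors
        b += np.einsum('ijk,jk->ij', u_hat, v)                 # agreement update
    return v

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(8, 3, 4))      # 8 input capsules, 3 output capsules, 4-D poses
v = dynamic_routing(u_hat)
print(v.shape)                          # (3, 4)
print(np.linalg.norm(v, axis=-1))      # each length lies in [0, 1)
```

The capsule lengths returned by `squash` can be read as presence probabilities, which is the property exploited by the margin loss during training.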
4. Applications of CNNs and CapsNets in Medical Imaging
4.1. Disease Classification and Detection
4.2. Medical Image Segmentation
- V-Net: An extension of U-Net for 3D image segmentation [57], V-Net incorporates residual learning and is particularly useful for volumetric medical imaging data.
- SegNet: This architecture uses an encoder-decoder design in which the decoder unpools feature maps using the max-pooling indices stored by the encoder [58], allowing for more efficient upsampling.
- Feature extraction: CNNs can learn and extract relevant features directly from the data, reducing the need for manual feature engineering.
- Contextual awareness: The hierarchical nature of CNNs allows them to capture both local and global context, crucial for accurate segmentation.
- Adaptability: CNNs can be fine-tuned for specific medical imaging modalities and anatomical structures.
- End-to-end learning: CNNs can be trained in an end-to-end manner, optimizing the entire segmentation pipeline simultaneously.
- Limited training data: Medical imaging datasets are often small due to privacy concerns and the cost of annotation.
- Class imbalance: In many medical segmentation tasks, the region of interest may be much smaller than the background.
- Three-dimensional data handling: While 2D CNNs have shown success, efficiently processing 3D volumetric data remains challenging.
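Class imbalance in particular is commonly mitigated with overlap-based objectives rather than pixel-wise cross-entropy. As an illustrative sketch (not drawn from any specific study in this review), a soft Dice loss weights a small lesion by its overlap with the prediction rather than by its pixel count:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    pred:   predicted foreground probabilities, any shape.
    target: binary ground-truth mask, same shape.
    Returns 1 - Dice; lower is better. Because the score is driven by
    overlap rather than pixel counts, a tiny foreground region is not
    swamped by the large background class.
    """
    pred, target = pred.ravel(), target.ravel()
    intersection = np.sum(pred * target)
    return 1.0 - (2.0 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)

# A tiny 4x4 mask where the "lesion" occupies only 2 of 16 pixels:
target = np.zeros((4, 4))
target[1, 1] = target[1, 2] = 1.0
perfect = soft_dice_loss(target, target)          # ~0.0 (exact overlap)
miss = soft_dice_loss(np.zeros((4, 4)), target)   # ~1.0 (lesion entirely missed)
print(round(perfect, 4), round(miss, 4))          # 0.0 1.0
```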
4.3. Multimodal Analysis and Computer-Aided Diagnosis
5. Comparative Analysis of CNNs and CapsNets in Medical Image Analysis
5.1. Performance Metrics
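The metrics reported across the reviewed studies (accuracy, sensitivity, specificity, precision, F1-score) follow standard confusion-matrix definitions. The sketch below uses hypothetical counts purely to illustrate the formulas:

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics from true/false positive/negative counts."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)             # recall / true-positive rate
    specificity = tn / (tn + fp)             # true-negative rate
    precision   = tp / (tp + fp)             # positive predictive value
    f1          = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# Hypothetical screening result: 90 TP, 10 FP, 85 TN, 15 FN
m = classification_metrics(tp=90, fp=10, tn=85, fn=15)
print({k: round(v, 3) for k, v in m.items()})
```

Note that accuracy alone can be misleading on imbalanced medical datasets, which is why sensitivity, specificity, and F1-score are typically reported alongside it.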
5.2. Accuracy, Precision, Computational Efficiency, and Scalability
5.3. Strengths and Weaknesses of CNNs and CapsNets
6. Real-World Applications and Comparative Results
7. Discussion and Future Perspectives
Emerging Hybrid Architectures and Future Directions
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kumar, Y.; Koul, A.; Singla, R.; Ijaz, M.F. Artificial intelligence in disease diagnosis: A systematic literature review, synthesizing framework and future research agenda. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 8459–8486. [Google Scholar] [CrossRef]
- Chan, H.P.; Hadjiiski, L.M.; Samala, R.K. Computer-aided diagnosis in the era of deep learning. Med. Phys. 2020, 47, e218–e227. [Google Scholar] [CrossRef]
- Battineni, G.; Sagaro, G.G.; Chinatalapudi, N.; Amenta, F. Applications of machine learning predictive models in the chronic disease diagnosis. J. Pers. Med. 2020, 10, 21. [Google Scholar] [CrossRef]
- Zhou, S.K.; Greenspan, H.; Davatzikos, C.; Duncan, J.S.; Van Ginneken, B.; Madabhushi, A.; Summers, R.M. A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises. Proc. IEEE 2021, 109, 820–838. [Google Scholar] [CrossRef]
- Patrick, M.K.; Adekoya, A.F.; Mighty, A.A.; Edward, B.Y. Capsule networks—A survey. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 1295–1310. [Google Scholar] [CrossRef]
- Sun, Z.; Zhao, G.; Scherer, R.; Wei, W.; Woźniak, M. Overview of capsule neural networks. J. Internet Technol. 2022, 23, 33–44. [Google Scholar]
- Mazzia, V.; Salvetti, F.; Chiaberge, M. Efficient-capsnet: Capsule network with self-attention routing. Sci. Rep. 2021, 11, 14634. [Google Scholar] [CrossRef] [PubMed]
- Goceri, E. CapsNet topology to classify tumours from brain images and comparative evaluation. IET Image Process. 2020, 14, 882–889. [Google Scholar] [CrossRef]
- Ali, R.; Manikandan, A.; Xu, J. A novel framework of adaptive fuzzy-GLCM segmentation and fuzzy with capsules network (F-CapsNet) classification. Neural Comput. Appl. 2023, 35, 22133–22149. [Google Scholar] [CrossRef]
- Yuan, Y.; Chu, J.; Leng, L.; Miao, J.; Kim, B.G. A scale-adaptive object-tracking algorithm with occlusion detection. EURASIP J. Image Video Process. 2020, 2020, 7. [Google Scholar] [CrossRef]
- Zeng, D.; Veldhuis, R.; Spreeuwers, L. A survey of face recognition techniques under occlusion. IET Biom. 2021, 10, 581–606. [Google Scholar] [CrossRef]
- Lan, Z.; Cai, S.; He, X.; Wen, X. FixCaps: An improved capsules network for diagnosis of skin cancer. IEEE Access 2022, 10, 76261–76267. [Google Scholar] [CrossRef]
- Albraikan, A.A.; Nemri, N.; Alkhonaini, M.A.; Hilal, A.M.; Yaseen, I.; Motwakel, A. Automated deep learning based melanoma detection and classification using biomedical dermoscopic images. Comput. Mater. Contin. 2023, 74, 2443–2459. [Google Scholar] [CrossRef]
- Haghanifar, A. Automated Teeth Extraction and Dental Caries Detection in Panoramic X-Ray. Ph.D. Thesis, University of Saskatchewan, Saskatoon, SK, Canada, 2022. [Google Scholar]
- Alwakid, G.; Gouda, W.; Humayun, M.; Sama, N.U. Melanoma detection using deep learning-based classifications. Healthcare 2022, 10, 2481. [Google Scholar] [CrossRef]
- Aziz, M.J.; Zade, A.A.T.; Farnia, P.; Alimohamadi, M.; Makkiabadi, B.; Ahmadian, A.; Alirezaie, J. Accurate automatic glioma segmentation in brain MRI images based on CapsNet. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Guadalajara, Mexico, 1–5 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3882–3885. [Google Scholar]
- Akinyelu, A.A.; Bah, B. COVID-19 diagnosis in computerized tomography (CT) and X-ray scans using capsule neural network. Diagnostics 2023, 13, 1484. [Google Scholar] [CrossRef] [PubMed]
- Rahman, H.; Naik Bukht, T.F.; Ahmad, R.; Almadhor, A.; Javed, A.R. Efficient breast cancer diagnosis from complex mammographic images using deep convolutional neural network. Comput. Intell. Neurosci. 2023, 2023, 7717712. [Google Scholar] [CrossRef]
- Saraiva, M.J.M.; Afonso, J.; Ribeiro, T.; Ferreira, J.; Cardoso, H.; Andrade, A.P.; Macedo, G. Deep learning and capsule endoscopy: Automatic identification and differentiation of small bowel lesions with distinct haemorrhagic potential using a convolutional neural network. BMJ Open Gastroenterol. 2021, 8, e000753. [Google Scholar] [CrossRef] [PubMed]
- An, Q.; Chen, W.; Shao, W. A deep convolutional neural network for pneumonia detection in x-ray images with attention ensemble. Diagnostics 2024, 14, 390. [Google Scholar] [CrossRef] [PubMed]
- Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [Google Scholar] [CrossRef]
- Yin, H.; Gong, Y.; Qiu, G. Fast and efficient implementation of image filtering using a side window convolutional neural network. Signal Process. 2020, 176, 107717. [Google Scholar] [CrossRef]
- Romero, D.W.; Kuzina, A.; Bekkers, E.J.; Tomczak, J.M.; Hoogendoorn, M. Ckconv: Continuous kernel convolution for sequential data. arXiv 2021, arXiv:2102.02611. [Google Scholar]
- Taye, M.M. Theoretical understanding of convolutional neural network: Concepts, architectures, applications, future directions. Computation 2023, 11, 52. [Google Scholar] [CrossRef]
- Santra, S.; Hsieh, J.W.; Lin, C.F. Gradient descent effects on differential neural architecture search: A survey. IEEE Access 2021, 9, 89602–89618. [Google Scholar] [CrossRef]
- Jagadeesan, M.; Razenshteyn, I.; Gunasekar, S. Inductive bias of multi-channel linear convolutional networks with bounded weight norm. In Proceedings of the Conference on Learning Theory, London, UK, 2–5 July 2022; pp. 2276–2325. [Google Scholar]
- Debnath, T.; Reza, M.M.; Rahman, A.; Beheshti, A.; Band, S.S.; Alinejad-Rokny, H. Four-layer ConvNet to facial emotion recognition with minimal epochs and the significance of data diversity. Sci. Rep. 2022, 12, 6991. [Google Scholar] [CrossRef]
- Jena, B.; Nayak, G.K.; Saxena, S. Convolutional neural network and its pretrained models for image classification and object detection: A survey. Concurr. Comput. Pract. Exp. 2022, 34, e6767. [Google Scholar] [CrossRef]
- Zhao, Q.; Shang, Z. Deep learning and its development. J. Phys. Conf. Ser. 2021, 1948, 012023. [Google Scholar] [CrossRef]
- Akhtar, N.; Mian, A.; Kardan, N.; Shah, M. Advances in adversarial attacks and defenses in computer vision: A survey. IEEE Access 2021, 9, 155161–155196. [Google Scholar] [CrossRef]
- Hinton, G. How to represent part-whole hierarchies in a neural network. Neural Comput. 2023, 35, 413–452. [Google Scholar] [CrossRef]
- Taher, O.; Özacar, K. HeCapsNet: An enhanced capsule network for automated heel disease diagnosis using lateral foot X-Ray images. Int. J. Imaging Syst. Technol. 2024, 34, e23084. [Google Scholar] [CrossRef]
- Sabour, S.; Frosst, N.; Hinton, G.E. Dynamic routing between capsules. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Shiri, P. Optimizing Capsule Networks. Ph.D. Thesis, Department of Electrical and Computer Engineering, University of Victoria, Victoria, BC, Canada, 2022. [Google Scholar]
- Sood, K.; Fiaidhi, J. Capsule Networks: An Alternative Approach to Image Classification Using Convolutional Neural Networks. TechRxiv 2020. [Google Scholar] [CrossRef]
- Tran, M.; Vo-Ho, V.K.; Quinn, K.; Nguyen, H.; Luu, K.; Le, N. CapsNet for medical image segmentation. In Deep Learning for Medical Image Analysis; Academic Press: Cambridge, MA, USA, 2024; pp. 75–97. [Google Scholar]
- Hinton, G.E.; Sabour, S.; Frosst, N. Matrix Capsules with EM Routing. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018; Available online: https://openreview.net/pdf?id=HJWLfGWRb (accessed on 14 December 2025).
- Bhardwaj, P.; Kaur, A. A novel and efficient deep learning approach for COVID-19 detection using X-ray imaging modality. Int. J. Imaging Syst. Technol. 2021, 31, 1775–1791. [Google Scholar] [CrossRef] [PubMed]
- Hilmizen, N.; Bustamam, A.; Sarwinda, D. The multimodal deep learning for diagnosing COVID-19 pneumonia from chest CT-scan and X-ray images. In Proceedings of the 2020 3rd International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 10–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 26–31. [Google Scholar]
- Huang, J.; Fang, Y.; Wu, Y.; Wu, H.; Gao, Z.; Li, Y.; Del Ser, J.; Xia, J.; Yang, G. Swin transformer for fast MRI. Neurocomputing 2022, 493, 281–304. [Google Scholar] [CrossRef]
- Li, T.; Bo, W.; Hu, C.; Kang, H.; Liu, H.; Wang, K.; Fu, H. Applications of deep learning in fundus images: A review. Med. Image Anal. 2021, 69, 101971. [Google Scholar] [CrossRef]
- Solnik, M.; Paduszyńska, N.; Czarnecka, A.M.; Synoradzki, K.J.; Yousef, Y.A.; Chorągiewicz, T.; Rejdak, R.; Toro, M.D.; Zweifel, S.; Dyndor, K.; et al. Imaging of uveal melanoma—Current standard and methods in development. Cancers 2022, 14, 3147. [Google Scholar] [CrossRef]
- Szabó, V.; Orhan, K.; Dobó-Nagy, C.; Veres, D.S.; Manulis, D.; Ezhov, M.; Sanders, A.; Szabó, B.T. Deep Learning-Based Periapical Lesion Detection on Panoramic Radiographs. Diagnostics 2025, 15, 510. [Google Scholar] [CrossRef]
- Çetinkaya, İ.; Çatmabacak, E.D.; Öztürk, E. Detection of Fractured Endodontic Instruments in Periapical Radiographs: A Comparative Study of YOLOv8 and Mask R-CNN. Diagnostics 2025, 15, 653. [Google Scholar] [CrossRef]
- Alwateer, M.; Bamaqa, A.; Farsi, M.; Aljohani, M.; Shehata, M.; Elhosseini, M.A. Transformative Approaches in Breast Cancer Detection: Integrating Transformers into Computer-Aided Diagnosis for Histopathological Classification. Bioengineering 2025, 12, 212. [Google Scholar] [CrossRef]
- Abhisheka, B.; Biswas, S.K.; Purkayastha, B.; Das, D.; Escargueil, A. Recent trend in medical imaging modalities and their applications in disease diagnosis: A review. Multimed. Tools Appl. 2024, 83, 43035–43070. [Google Scholar] [CrossRef]
- Deng, Y.; Lu, L.; Aponte, L.; Angelidi, A.M.; Novak, V.; Karniadakis, G.E.; Mantzoros, C.S. Deep transfer learning and data augmentation improve glucose levels prediction in type 2 diabetes patients. npj Digit. Med. 2021, 4, 109. [Google Scholar] [CrossRef] [PubMed]
- Sufian, A.; Ghosh, A.; Sadiq, A.S.; Smarandache, F. A survey on deep transfer learning to edge computing for mitigating the COVID-19 pandemic. J. Syst. Archit. 2020, 108, 101830. [Google Scholar] [CrossRef]
- Yadav, S.; Dhage, S. TE-CapsNet: Time efficient capsule network for automatic disease classification from medical images. Multimed. Tools Appl. 2024, 83, 49389–49418. [Google Scholar] [CrossRef]
- Saif, A.F.M.; Imtiaz, T.; Rifat, S.; Shahnaz, C.; Zhu, W.P.; Ahmad, M.O. CapsCovNet: A modified capsule network to diagnose Covid-19 from multimodal medical imaging. IEEE Trans. Artif. Intell. 2021, 2, 608–617. [Google Scholar] [CrossRef] [PubMed]
- Aksoy, B.; Salman, O.K.M. Detection of COVID-19 Disease in Chest X-Ray Images with capsul networks: Application with cloud computing. J. Exp. Theor. Artif. Intell. 2021, 33, 527–541. [Google Scholar] [CrossRef]
- Shah, M.; Bhavsar, N.; Patel, K.; Gautam, K.; Chauhan, M. Modern Challenges and Limitations in Medical Science Using Capsule Networks: A Comprehensive Review. In Proceedings of the International Conference on Image Processing and Capsule Networks, Bangkok, Thailand, 10–11 March 2023; Springer Nature: Singapore, 2023; pp. 1–25. [Google Scholar]
- Wu, Y.; Cen, L.; Kan, S.; Xie, Y. Multi-layer capsule network with joint dynamic routing for fire recognition. Image Vis. Comput. 2023, 139, 104825. [Google Scholar] [CrossRef]
- Chen, R.; Shen, H.; Zhao, Z.Q.; Yang, Y.; Zhang, Z. Global routing between capsules. Pattern Recognit. 2024, 148, 110142. [Google Scholar] [CrossRef]
- Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-net and its variants for medical image segmentation: A review of theory and applications. IEEE Access 2021, 9, 82031–82057. [Google Scholar] [CrossRef]
- Du, X.; Cheng, K.; Zhang, J.; Wang, Y.; Yang, F.; Zhou, W.; Lin, Y. Infrared Small Target Detection Algorithm Based on Improved Dense Nested U-Net Network. Sensors 2025, 25, 814. [Google Scholar] [CrossRef]
- Mohammed, K.K.; Hassanien, A.E.; Afify, H.M. A 3D image segmentation for lung cancer using V. Net architecture based deep convolutional networks. J. Med. Eng. Technol. 2021, 45, 337–343. [Google Scholar] [CrossRef]
- He, F.; Wang, W.; Ren, L.; Zhao, Y.; Liu, Z.; Zhu, Y. CA-SegNet: A channel-attention encoder-decoder network for histopathological image segmentation. Biomed. Signal Process. Control 2024, 96, 106590. [Google Scholar] [CrossRef]
- Li, Y.; Li, P.; Wang, H.; Gong, X.; Fang, Z. CAML-PSPNet: A Medical Image Segmentation Network Based on Coordinate Attention and a Mixed Loss Function. Sensors 2025, 25, 1117. [Google Scholar] [CrossRef]
- Yamada, T.; Yoshimura, T.; Ichikawa, S.; Sugimori, H. Improving Cerebrovascular Imaging with Deep Learning: Semantic Segmentation for Time-of-Flight Magnetic Resonance Angiography Maximum Intensity Projection Image Enhancement. Appl. Sci. 2025, 15, 3034. [Google Scholar] [CrossRef]
- Ranjbarzadeh, R.; Caputo, A.; Tirkolaee, E.B.; Ghoushchi, S.J.; Bendechache, M. Brain tumor segmentation of MRI images: A comprehensive review on the application of artificial intelligence tools. Comput. Biol. Med. 2023, 152, 106405. [Google Scholar] [CrossRef] [PubMed]
- Halder, A.; Chatterjee, S.; Dey, D.; Kole, S.; Munshi, S. An adaptive morphology based segmentation technique for lung nodule detection in thoracic CT image. Comput. Methods Programs Biomed. 2020, 197, 105720. [Google Scholar] [CrossRef]
- Deng, Z.; Gao, W.; Gong, Z.; Gan, R.; Chen, L.; Zhang, S.; Ma, L. A Fundus Image Dataset for AI-based Artery-Vein Vessel Segmentation. Sci. Data 2025, 12, 1298. [Google Scholar] [CrossRef] [PubMed]
- Poiret, C.; Bouyeure, A.; Patil, S.; Boniteau, C.; Duchesnay, E.; Grigis, A.; Lemaitre, F.; Noulhiane, M. Attention-gated 3D CapsNet for robust hippocampal segmentation. J. Med. Imaging 2024, 11, 014003. [Google Scholar] [CrossRef]
- Avesta, A.; Hossain, S.; Lin, M.; Aboian, M.; Krumholz, H.M.; Aneja, S. Comparing 3D, 2.5D, and 2D approaches to brain image auto-segmentation. Bioengineering 2023, 10, 181. [Google Scholar] [CrossRef]
- Kourounis, G.; Elmahmudi, A.A.; Thomson, B.; Hunter, J.; Ugail, H.; Wilson, C. Computer image analysis with artificial intelligence: A practical introduction to convolutional neural networks for medical professionals. Postgrad. Med. J. 2023, 99, 1287–1294. [Google Scholar] [CrossRef]
- Olveres, J.; González, G.; Torres, F.; Moreno-Tagle, J.C.; Carbajal-Degante, E.; Valencia-Rodríguez, A.; Escalante-Ramírez, B. What is new in computer vision and artificial intelligence in medical image analysis applications. Quant. Imaging Med. Surg. 2021, 11, 3830. [Google Scholar] [CrossRef]
- Hassan, E.; Shams, M.Y.; Hikal, N.A.; Elmougy, S. A novel convolutional neural network model for malaria cell images classification. Comput. Mater. Contin. 2022, 72, 5889–5907. [Google Scholar] [CrossRef]
- Kodipalli, A.; Fernandes, S.L.; Dasar, S.K.; Ismail, T. Computational framework of inverted fuzzy C-means and quantum convolutional neural network towards accurate detection of ovarian tumors. Int. J. E-Health Med. Commun. (IJEHMC) 2023, 14, 1–16. [Google Scholar] [CrossRef]
- Vadlamudi, S.H.; Sai Souhith Reddy, Y.; Ajith Sai Kumar Reddy, P.; Periasamy, P.; Vali Mohamad, N.M. Automatic liver tumor segmentation and identification using fully connected convolutional neural network from CT images. Concurr. Comput. Pract. Exp. 2022, 34, e7212. [Google Scholar] [CrossRef]
- Babulal, K.S.; Das, A.K. Deep learning-based object detection: An investigation. In Proceedings of the Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021, Dehradun, India, 14–15 December 2022; Springer Nature: Singapore, 2022; pp. 697–711. [Google Scholar]
- Kaur, A.; Singh, Y.; Neeru, N.; Kaur, L.; Singh, A. A survey on deep learning approaches to medical images and a systematic look up into real-time object detection. Arch. Comput. Methods Eng. 2022, 29, 2071–2111. [Google Scholar] [CrossRef]
- Tayal, A.; Gupta, J.; Solanki, A.; Bisht, K.; Nayyar, A.; Masud, M. DL-CNN-based approach with image processing techniques for diagnosis of retinal diseases. Multimed. Syst. 2022, 28, 1417–1438. [Google Scholar] [CrossRef]
- Arabahmadi, M.; Farahbakhsh, R.; Rezazadeh, J. Deep learning for smart Healthcare—A survey on brain tumor detection from medical imaging. Sensors 2022, 22, 1960. [Google Scholar] [CrossRef] [PubMed]
- Salehi, A.W.; Khan, S.; Gupta, G.; Alabduallah, B.I.; Almjally, A.; Alsolai, H.; Mellit, A. A study of CNN and transfer learning in medical imaging: Advantages, challenges, future scope. Sustainability 2023, 15, 5930. [Google Scholar] [CrossRef]
- Jabbar, A.; Naseem, S.; Mahmood, T.; Saba, T.; Alamri, F.S.; Rehman, A. Brain tumor detection and multi-grade segmentation through hybrid caps-VGGNet model. IEEE Access 2023, 11, 72518–72536. [Google Scholar] [CrossRef]
- Celebi, A.R.C.; Bulut, E.; Sezer, A. Artificial intelligence based detection of age-related macular degeneration using optical coherence tomography with unique image preprocessing. Eur. J. Ophthalmol. 2023, 33, 65–73. [Google Scholar] [CrossRef]
- Reis, H.C.; Turk, V. COVID-DSNet: A novel deep convolutional neural network for detection of coronavirus (SARS-CoV-2) cases from CT and Chest X-Ray images. Artif. Intell. Med. 2022, 134, 102427. [Google Scholar] [CrossRef]
- Kaur, P.; Harnal, S.; Tiwari, R.; Alharithi, F.S.; Almulihi, A.H.; Noya, I.D.; Goyal, N. A hybrid convolutional neural network model for diagnosis of COVID-19 using chest X-ray images. Int. J. Environ. Res. Public Health 2021, 18, 12191. [Google Scholar] [CrossRef]
- Swaraj, K.K.; Kiruthiga, G.; Madhu, K.P. Detection of liver cancer from CT images using CAPSNET. ICTACT J. Image Video Process. 2021, 12, 2601–2604. [Google Scholar] [CrossRef]
- Wang, Q.; Chen, A.; Xue, Y. Liver ct image recognition method based on capsule network. Information 2023, 14, 183. [Google Scholar] [CrossRef]
- Iyyanar, G.; Gunasekaran, K.; George, M. Hybrid Approach for Effective Segmentation and Classification of Glaucoma Disease Using UNet++ and CapsNet. Rev. d’Intell. Artif. 2024, 38, 613–621. [Google Scholar] [CrossRef]
- Kalyani, G.; Janakiramaiah, B.; Karuna, A.; Prasad, L.N. Diabetic retinopathy detection and classification using capsule networks. Complex Intell. Syst. 2023, 9, 2651–2664. [Google Scholar] [CrossRef]
- Mascarenhas, M.; Ribeiro, T.; Afonso, J.; Ferreira, J.P.; Cardoso, H.; Andrade, P.; Macedo, G. Deep learning and colon capsule endoscopy: Automatic detection of blood and colonic mucosal lesions using a convolutional neural network. Endosc. Int. Open 2022, 10, E171–E177. [Google Scholar] [CrossRef] [PubMed]
- Afriyie, Y.; Weyori, B.A.; Opoku, A.A. Gastrointestinal tract disease recognition based on denoising capsule network. Cogent Eng. 2022, 9, 2142072. [Google Scholar] [CrossRef]
- Hasnain, M.A.; Ali, S.; Malik, H.; Irfan, M.; Maqbool, M.S. Deep learning-based classification of dental disease using X-rays. J. Comput. Biomed. Inform. 2023, 5, 82–95. [Google Scholar]
- AlSayyed, A.; Taqateq, A.; Al-Sayyed, R.; Suleiman, D.; Shukri, S.; Alhenawi, E.; Albsheish, A. Employing CNN ensemble models in classifying dental caries using oral photographs. Int. J. Data Netw. Sci. 2023, 7, 1535–1550. [Google Scholar] [CrossRef]
- Adla, D.; Reddy, G.V.R.; Nayak, P.; Karuna, G. Deep learning-based computer aided diagnosis model for skin cancer detection and classification. Distrib. Parallel Databases 2022, 40, 717–736. [Google Scholar] [CrossRef]
- U.S. Food and Drug Administration (FDA). 510(k) Premarket Notification: Transpara (K181704). Available online: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K181704 (accessed on 14 December 2025).
- U.S. Food and Drug Administration (FDA). 510(k) Premarket Notification: BriefCase (K203508), Aidoc Medical, Ltd. Available online: https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfpmn/pmn.cfm?ID=K203508 (accessed on 14 December 2025).

| Aspect | CNNs | CapsNets |
|---|---|---|
| Core Principle | Hierarchical feature extraction using convolution and pooling; spatial hierarchies captured implicitly via shared weights. | Groups neurons into capsules; explicitly encodes part-whole relationships and spatial configuration. |
| Transformation Handling | Invariance mainly through pooling and extensive data augmentation. | Equivariance to pose and rotation via capsule vectors and routing-by-agreement. |
| Interpretability | Moderate; relies on post-hoc saliency/attribution methods. | Higher; capsule outputs encode presence and pose, supporting more structured explanations. |
| Computational Efficiency | Highly optimized; scalable and efficient on large datasets. | More expensive due to iterative routing and vector operations. |
| Small/Imbalanced Data | Tends to overfit; improved by transfer learning and regularization. | Often more robust on limited data owing to richer internal representations. |
| Generalization | Strong on large, well-curated datasets; less robust to unseen geometric changes. | Improved robustness to viewpoint and deformation through explicit spatial modeling. |
| Clinical Applicability | Widely adopted in CAD pipelines for detection and segmentation. | Promising but not yet routine; complexity and lack of mature toolchains hinder deployment. |

| Characteristic | CNNs | CapsNets |
|---|---|---|
| Architecture | Uses convolutional layers, pooling layers, and fully connected layers | Uses capsules (groups of neurons) and dynamic routing algorithms |
| Spatial Relationships | Limited in capturing spatial hierarchies | Excels at capturing spatial hierarchies and relationships |
| Feature Detection | Detects features through convolutional filters | Encodes both the activation and the spatial relationship of features |
| Performance | Generally higher performance due to mature development and optimization | Currently less optimized, often resulting in lower performance |
| Hardware and Software Efficiency | Efficient and widely supported | Computationally intensive, with limited hardware and software support |
| Explainability | Limited interpretability | Potential for improved interpretability due to structured representation |
| Robustness to Variations | Sensitive to small variations in spatial structures | More robust to variations, capturing pose and orientation information |
| Applications in Medical Decision-Making | Widely applied in disease detection, classification, and image segmentation | Emerging applications, with potential for improved outcomes in similar tasks |
| Training Complexity | Well-established training methods and optimization techniques | More complex training, requiring dynamic routing and advanced optimization |
| Adoption in Clinical Settings | Widely adopted and validated in numerous clinical studies | Still in early stages of adoption, with ongoing research to validate effectiveness |
| Example Models | AlexNet, VGGNet, ResNet, Inception | CapsNet, SegCaps, MCTDCapsNet |
| Use Case Examples | Tumor detection, organ segmentation, disease classification | Pose estimation, robust disease detection, spatial relationship analysis |
| Scalability | Highly scalable, well-suited for large datasets | Scalability challenges due to computational complexity |
| Transfer Learning Capability | Excellent, with many pre-trained models available | Limited, as the field is still developing |
| Data Augmentation Requirements | Moderate to high | Potentially lower due to inherent rotational invariance |
| Handling of 3D Imaging | Requires specialized architectures (e.g., 3D CNNs) | Potentially more natural handling of 3D structures |
| Multimodal Integration | Well-established techniques available | Promising, but less explored |
| Uncertainty Quantification | Requires additional techniques (e.g., dropout) | Potentially more inherent through capsule activations |
| Few-shot Learning | Generally requires large datasets | Potentially better with limited data due to efficient parameter sharing |
| Computational Resource Requirements | Well-optimized, moderate requirements | Higher requirements, especially during training |
| Handling of Class Imbalance | Requires specific techniques (e.g., weighted loss) | May handle imbalance better due to equivariance properties |
| Interpretability Tools | Various tools available (e.g., Grad-CAM) | Emerging tools, potentially more intuitive due to capsule structure |
| Regulatory Compliance | Established frameworks for validation | Frameworks still developing |
| Longitudinal Analysis Capability | Requires specific architectures | Potentially better suited due to pose preservation |
| Modification Type | Modification | Description | Impact on Robustness to Affine Transformations |
|---|---|---|---|
| CapsNet modifications | Dynamic routing with RBA | Routing-by-agreement (RBA) variation of dynamic routing. | Enhances capsule consensus and adaptability to variations in position, rotation, and scale. |
| | Aff-CapsNets | Affine CapsNets. | Reduces the number of parameters while increasing resistance to affine changes. |
| | Transformation-aware capsules | Capsules explicitly designed to handle affine transformations. | Detects and applies suitable transformations, boosting invariance to affine changes. |
| | Capsule-capsule transformation (CCT) | Adaptive transformation between capsules. | Effectively manages varying degrees of affine transformations. |
| | Margin loss regularization | Adds margin loss terms during training. | Promotes larger margins between capsules, enhancing resistance to affine changes. |
| | Capsule routing with EM routing | Utilizes an EM-like algorithm for capsule routing. | Strengthens the capsule agreement process and feature learning, improving robustness to affine transformations. |
| | Self-routing | A supervised, non-iterative routing method. | Eliminates the need for inter-capsule agreement, similar to mixture of experts (MoE), thereby enhancing robustness. |
| | Adversarial capsule networks | Combines CapsNet with adversarial training techniques. | Trains against adversarial affine transformations, enhancing feature robustness. |
| | Capsule dropouts | Applies dropout to capsules during training. | Improves generalization and robustness by preventing capsule co-adaptations. |
| | Capsule reconstruction | Augments CapsNet with a reconstruction loss term. | Preserves spatial information, improving robustness to affine transformations. |
| | Capsule attention mechanism | Incorporates attention mechanisms into capsules. | Focuses on important features, aiding robustness against affine transformations. |
| CNN modifications | Spatial transformer networks (STN) | Introduces learnable modules to perform spatial transformations. | Enhances the ability to learn transformations, improving invariance to affine changes. |
| | Data augmentation | Applies random transformations (e.g., rotation, scaling) to training data. | Trains on diverse affine-transformed images, increasing robustness. |
| | Deformable convolutions | Modifies convolutional filters to adapt to spatial transformations. | Enhances feature extraction by allowing filters to deform, improving robustness to affine changes. |
| | Affine invariant layers | Uses layers specifically designed to be affine invariant. | Integrates affine invariance directly into the network, improving resilience to affine changes. |
| | Local binary patterns (LBP) | Extracts texture features that are robust to affine transformations. | Focuses on texture rather than spatial arrangement, increasing robustness. |
| | Rotation equivariant CNNs | Incorporates rotation-equivariant filters in the architecture. | Enhances handling of rotational transformations, improving robustness to affine changes. |
| | Adversarial training | Trains the network with adversarial examples, including affine transformations. | Learns robust features by resisting adversarial and affine transformations. |
| | Pooling techniques (e.g., max pooling) | Uses pooling layers to reduce spatial dimensions. | Provides partial invariance to translations and distortions, aiding robustness to affine changes. |
| | SIFT-like feature extraction | Utilizes SIFT-like (Scale-Invariant Feature Transform) methods. | Enhances invariance to scale and rotation, improving robustness to affine changes. |
| | Attention mechanisms | Incorporates attention layers to focus on important features. | Improves focus on relevant features, enhancing robustness to affine transformations. |
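Two of the capsule-side entries above, margin loss regularization and the squashing non-linearity that underlies routing-by-agreement, can be sketched in a few lines of NumPy. The formulation and constants (m+ = 0.9, m− = 0.1, λ = 0.5) follow the original CapsNet proposal by Sabour et al.; the array shapes and toy inputs are illustrative assumptions, not taken from any study in this review:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing non-linearity: preserves vector orientation while
    mapping the vector length into [0, 1), so length can act as a
    probability of entity presence."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def margin_loss(v_norms, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over output-capsule lengths.

    v_norms: (N, K) lengths of the K output capsules
    targets: (N, K) one-hot class labels
    """
    pos = targets * np.maximum(0.0, m_pos - v_norms) ** 2
    neg = lam * (1.0 - targets) * np.maximum(0.0, v_norms - m_neg) ** 2
    return float(np.mean(np.sum(pos + neg, axis=1)))

# A vector of length 5 is squashed to length 25 / 26 ~= 0.96.
v = squash(np.array([3.0, 4.0]))

# True class confident (0.95), wrong class weak (0.2): small loss remains.
loss = margin_loss(np.array([[0.95, 0.2]]), np.array([[1.0, 0.0]]))
```

Because the positive and negative terms are both squared hinge penalties, the loss pushes the correct capsule's length above m+ and every other capsule's length below m−, which is what the table means by "larger margins between capsules".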
| Study | Application/Problem Definition | Architecture and Parameters | Dataset | Performance Metrics | Recommendations/Limitations |
|---|---|---|---|---|---|
| Goceri [8] | Brain tumor classification (pituitary, glioma, meningioma) | 3 fully connected layers, dynamic routing, SASGradD method | 120 T1-weighted contrast-enhanced brain MR images | Accuracy: 92.65% | Manual segmentation is time-consuming and subjective; further refinement and larger datasets needed |
| Aziz et al. [16] | Glioma tumor segmentation in MR images | SegCaps, fewer training images than U-Net | BraTS2020 dataset (MRI scans from 369 patients) | DSC: 87.96%, Parameters: 1.5 M | Slower routing algorithms, higher computational complexity; further optimization needed |
| Akinyelu & Bah [17] | COVID-19 detection based on CT and X-ray imaging | Conv layers, primary capsule layers, digit capsule layer | CT and X-ray images | Accuracy: 99.929% (CT), 94.721% (X-ray) | Decreased accuracy on augmented datasets; further research for generalization and robustness needed |
| Reis & Turk [78] | COVID-19 diagnosis using medical imaging | Depthwise separable convolution, residual networks | CT, chest X-ray, hybrid CT + CXR images | Accuracy: 97.60% | Small datasets, potential image noise; suggests data augmentation and transfer learning |
| Kaur et al. [79] | COVID-19 diagnosis from chest X-ray images | InceptionV4, multiclass SVM classifier | Chest X-ray images | Accuracy: 96.24% (four classes) | Needs larger datasets; potential observer variability in manual diagnoses |
| Rahman et al. [18] | Breast cancer diagnosis from mammographic images | ResNet-50 | INbreast dataset | Accuracy: 93%, Specificity: 93.86%, Sensitivity: 93.83% | Potential performance variability across datasets; exploring alternative networks (VGG, AlexNet) recommended |
| Swaraj et al. [80] | Liver cancer classification from CT images | CapsNet, 41 layers | 3D-IRCADb-01 dataset | Accuracy: above 86% | Suggests using false positive filters or larger datasets to mitigate false positives |
| Wang et al. [81] | Liver cancer recognition | CapsNet | Liver CT images | Accuracy: 92.9% (CapsNet), 87.6% (CNN) | Potential overfitting; further validation on larger datasets recommended |
| Iyyanar et al. [82] | Glaucoma segmentation and classification | UNet++, CapsNet | ORIGA dataset | Accuracy: 97.6% | Integrating Generative Adversarial Networks to enhance dataset availability and applicability |
| Kalyani et al. [83] | Diabetic retinopathy detection and classification | Reformed CapsNet architecture, avoids pooling layers | Messidor dataset | Accuracy: 97.98% (healthy retina) | Further training on additional datasets recommended for more stages of diabetic retinopathy |
| Mascarenhas et al. [84] | Detection of colonic mucosal lesions and blood in CCE images | CNN model, Xception | CCE images | Sensitivity: 96.3%, Specificity: 98.2% | Larger multicenter studies recommended for validation and enhancing clinical applications |
| Afriyie et al. [85] | Gastrointestinal tract disease recognition | Dn-CapsNets | Kvasir-v2 dataset | Accuracy: 94.16% | Further improvements on larger and more complex datasets like HyperKvasir recommended |
| Saraiva et al. [19] | Identification and differentiation of small bowel lesions | CNN model, Xception | Capsule endoscopy (CE) images | Accuracy: 99% | Larger studies recommended to assess clinical impact and enhance generalizability |
| Mascarenhas et al. [84] | Detection of colonic mucosal lesions and blood in CCE images | CNN model | CCE images | Similar performance metrics to prior study | Prospective studies recommended to confirm clinical applicability and enhance model robustness |
| Hasnain et al. [86] | Dental disease classification | Deep learning-based approach | X-ray images | Accuracy: 97.87% | Small dataset size; further research to enhance performance and generalizability recommended |
| Haghanifar [14] | Dental caries detection in panoramic radiography | PaXNet | Panoramic radiography | Accuracy: 86.05% | Suggests expanding the dataset and improving segmentation methods for better accuracy |
| AlSayyed et al. [87] | Dental caries classification using oral photographs | CNN ensemble models | Oral photographs | Accuracy: 97% | Larger, higher-quality datasets recommended; suggests extending framework to other medical domains |
| Lan et al. [12] | Skin cancer diagnosis | FixCaps | HAM10000 dataset | Accuracy: 96.49% | Further exploration of generalization performance recommended due to limitation in current evaluation scope |
| Albraikan et al. [13] | Melanoma detection and classification | ADL-MDC model | ISIC dataset | Accuracy: 98.27% | Further improvements in classification performance through advanced deep learning-based image segmentation techniques suggested |
| Adla et al. [88] | Skin cancer detection | Deep learning-based CAD model | Skin images | Accuracy: 98.50% | Testing on larger datasets and in IoT environments recommended; challenges in segmenting lesions due to variations in texture, size, and color |
| Alwakid et al. [15] | Melanoma detection using dermoscopic images | Deep learning-based approach | HAM10000 dataset | Accuracy: 86% (CNN) | Further experiments on larger datasets suggested; incorporating additional types of skin lesions to enhance model robustness |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Khoulqi, I.; El Ouazzani, Z. Advancing Medical Decision-Making with AI: A Comprehensive Exploration of the Evolution from Convolutional Neural Networks to Capsule Networks. J. Imaging 2026, 12, 17. https://doi.org/10.3390/jimaging12010017

