Search Results (689)

Search Parameters:
Keywords = AI in medical imaging

24 pages, 624 KiB  
Systematic Review
Integrating Artificial Intelligence into Perinatal Care Pathways: A Scoping Review of Reviews of Applications, Outcomes, and Equity
by Rabie Adel El Arab, Omayma Abdulaziz Al Moosa, Zahraa Albahrani, Israa Alkhalil, Joel Somerville and Fuad Abuadas
Nurs. Rep. 2025, 15(8), 281; https://doi.org/10.3390/nursrep15080281 (registering DOI) - 31 Jul 2025
Abstract
Background: Artificial intelligence (AI) and machine learning (ML) have been reshaping maternal, fetal, neonatal, and reproductive healthcare by enhancing risk prediction, diagnostic accuracy, and operational efficiency across the perinatal continuum. However, no comprehensive synthesis has yet been published. Objective: To conduct a scoping review of reviews of AI/ML applications spanning reproductive, prenatal, postpartum, neonatal, and early child-development care. Methods: We searched PubMed, Embase, the Cochrane Library, Web of Science, and Scopus through April 2025. Two reviewers independently screened records, extracted data, and assessed methodological quality using AMSTAR 2 for systematic reviews, ROBIS for bias assessment, SANRA for narrative reviews, and JBI guidance for scoping reviews. Results: Thirty-nine reviews met our inclusion criteria. In preconception and fertility treatment, convolutional neural network-based platforms can identify viable embryos and key sperm parameters with over 90 percent accuracy, and machine-learning models can personalize follicle-stimulating hormone regimens to boost mature oocyte yield while reducing overall medication use. Digital sexual-health chatbots have enhanced patient education, pre-exposure prophylaxis adherence, and safer sexual behaviors, although data-privacy safeguards and bias mitigation remain priorities. During pregnancy, advanced deep-learning models can segment fetal anatomy on ultrasound images with more than 90 percent overlap compared to expert annotations and can detect anomalies with sensitivity exceeding 93 percent. Predictive biometric tools can estimate gestational age to within one week and fetal weight to within approximately 190 g. In the postpartum period, AI-driven decision-support systems and conversational agents can facilitate early screening for depression and can guide follow-up care. Wearable sensors enable remote monitoring of maternal blood pressure and heart rate to support timely clinical intervention. Within neonatal care, the Heart Rate Observation (HeRO) system has reduced mortality among very low-birth-weight infants by roughly 20 percent, and additional AI models can predict neonatal sepsis, retinopathy of prematurity, and necrotizing enterocolitis with area-under-the-curve values above 0.80. From an operational standpoint, automated ultrasound workflows deliver biometric measurements at about 14 milliseconds per frame, and dynamic scheduling in IVF laboratories lowers staff workload and per-cycle costs. Home-monitoring platforms for pregnant women are associated with 7–11 percent reductions in maternal mortality and preeclampsia incidence. Despite these advances, most evidence derives from retrospective, single-center studies with limited external validation. Low-resource settings, especially in Sub-Saharan Africa, remain under-represented, and few AI solutions are fully embedded in electronic health records. Conclusions: AI holds transformative promise for perinatal care but will require prospective multicenter validation, equity-centered design, robust governance, transparent fairness audits, and seamless electronic health record integration to translate these innovations into routine practice and improve maternal and neonatal outcomes.

23 pages, 5770 KiB  
Article
Assessment of Influencing Factors and Robustness of Computable Image Texture Features in Digital Images
by Diego Andrade, Howard C. Gifford and Mini Das
Tomography 2025, 11(8), 87; https://doi.org/10.3390/tomography11080087 (registering DOI) - 31 Jul 2025
Abstract
Background/Objectives: There is significant interest in using texture features to extract hidden image-based information. In medical imaging applications using radiomics, AI, or personalized medicine, the quest is to extract patient- or disease-specific information while being insensitive to other system or processing variables. While we use digital breast tomosynthesis (DBT) to show these effects, our results would be generally applicable to a wider range of other imaging modalities and applications. Methods: We examine factors in texture estimation methods, such as quantization, pixel distance offset, and region of interest (ROI) size, that influence the magnitudes of these readily computable and widely used image texture features (specifically Haralick’s gray level co-occurrence matrix (GLCM) textural features). Results: Our results indicate that quantization is the most influential of these parameters, as it controls the size of the GLCM and range of values. We propose a new multi-resolution normalization (by either fixing ROI size or pixel offset) that can significantly reduce quantization magnitude disparities. We show reduction in mean differences in feature values by orders of magnitude; for example, reducing the difference to 7.34% between quantizations of 8–128, while preserving trends. Conclusions: When combining images from multiple vendors in a common analysis, large variations in texture magnitudes can arise due to differences in post-processing methods like filters. We show that significant changes in GLCM magnitude variations may arise simply due to the filter type or strength. These trends can also vary based on estimation variables (like offset distance or ROI size) that can further complicate analysis and robustness. We show pathways to reduce sensitivity to such variations due to estimation methods while increasing the desired sensitivity to patient-specific information such as breast density. Finally, we show that our results obtained from simulated DBT images are consistent with those from clinical DBT images.
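The quantization effect described above is easy to reproduce. Below is a minimal, illustrative sketch (not the authors' pipeline) of a GLCM and its Haralick contrast computed at two quantization levels; the synthetic image, offset, and ROI size are arbitrary stand-ins:

```python
import numpy as np

def glcm(img, levels, offset=(0, 1)):
    """Build a normalized gray-level co-occurrence matrix.

    img (values in [0, 1)) is quantized into `levels` bins; `offset`
    is the (dy, dx) displacement between co-occurring pixel pairs.
    """
    q = np.clip(np.floor(img * levels).astype(int), 0, levels - 1)
    dy, dx = offset
    m = np.zeros((levels, levels))
    h, w = q.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y, x], q[y + dy, x + dx]] += 1
    return m / m.sum()

def contrast(p):
    """Haralick contrast: sum over p(i, j) * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float((p * (i - j) ** 2).sum())

rng = np.random.default_rng(0)
roi = rng.random((64, 64))  # stand-in for a DBT ROI
c8, c128 = contrast(glcm(roi, 8)), contrast(glcm(roi, 128))
# Contrast magnitudes grow with the number of gray levels, so raw
# values from different quantizations are not directly comparable.
print(c8, c128, c128 / c8)
```

The roughly quadratic growth of contrast with the number of levels is what makes a normalization step (such as the multi-resolution normalization the paper proposes) necessary before pooling features across quantizations.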

21 pages, 22884 KiB  
Data Descriptor
An Open-Source Clinical Case Dataset for Medical Image Classification and Multimodal AI Applications
by Mauro Nievas Offidani, Facundo Roffet, María Carolina González Galtier, Miguel Massiris and Claudio Delrieux
Data 2025, 10(8), 123; https://doi.org/10.3390/data10080123 - 31 Jul 2025
Abstract
High-quality, openly accessible clinical datasets remain a significant bottleneck in advancing both research and clinical applications within medical artificial intelligence. Case reports, often rich in multimodal clinical data, represent an underutilized resource for developing medical AI applications. We present an enhanced version of MultiCaRe, a dataset derived from open-access case reports on PubMed Central. This new version addresses the limitations identified in the previous release and incorporates newly added clinical cases and images (totaling 93,816 and 130,791, respectively), along with a refined hierarchical taxonomy featuring over 140 categories. Image labels have been meticulously curated using a combination of manual and machine learning-based label generation and validation, ensuring a higher quality for image classification tasks and the fine-tuning of multimodal models. To facilitate its use, we also provide a Python package for dataset manipulation, pretrained models for medical image classification, and two dedicated websites. The updated MultiCaRe dataset expands the resources available for multimodal AI research in medicine. Its scale, quality, and accessibility make it a valuable tool for developing medical AI systems, as well as for educational purposes in clinical and computational fields.

40 pages, 3463 KiB  
Review
Machine Learning-Powered Smart Healthcare Systems in the Era of Big Data: Applications, Diagnostic Insights, Challenges, and Ethical Implications
by Sita Rani, Raman Kumar, B. S. Panda, Rajender Kumar, Nafaa Farhan Muften, Mayada Ahmed Abass and Jasmina Lozanović
Diagnostics 2025, 15(15), 1914; https://doi.org/10.3390/diagnostics15151914 - 30 Jul 2025
Abstract
Healthcare data is growing rapidly, and patients seek customized, effective healthcare services. Smart healthcare systems enabled by big data and machine learning (ML) hold revolutionary potential. Unlike previous reviews that separately address AI or big data, this work synthesizes their convergence through real-world case studies, cross-domain ML applications, and a critical discussion on ethical integration in smart diagnostics. The review focuses on the role of big data analysis and ML towards better diagnosis, improved efficiency of operations, and individualized care for patients. It explores the principal challenges of data heterogeneity, privacy, and computational complexity, and advanced methods such as federated learning (FL) and edge computing. Applications in real-world settings, such as disease prediction, medical imaging, drug discovery, and remote monitoring, illustrate how ML methods, such as deep learning (DL) and natural language processing (NLP), enhance clinical decision-making. A comparison of ML models highlights their value in dealing with large and heterogeneous healthcare datasets. In addition, the use of nascent technologies such as wearables and the Internet of Medical Things (IoMT) is examined for their role in supporting real-time data-driven delivery of healthcare. The paper emphasizes the pragmatic application of intelligent systems by highlighting case studies that reflect up to 95% diagnostic accuracy and cost savings. The review ends with future directions that seek to develop scalable, ethical, and interpretable AI-powered healthcare systems. It bridges the gap between ML algorithms and smart diagnostics, offering critical perspectives for clinicians, data scientists, and policymakers.
(This article belongs to the Special Issue Machine-Learning-Based Disease Diagnosis and Prediction)

13 pages, 3685 KiB  
Article
A Controlled Variation Approach for Example-Based Explainable AI in Colorectal Polyp Classification
by Miguel Filipe Fontes, Alexandre Henrique Neto, João Dallyson Almeida and António Trigueiros Cunha
Appl. Sci. 2025, 15(15), 8467; https://doi.org/10.3390/app15158467 (registering DOI) - 30 Jul 2025
Abstract
Medical imaging is vital for diagnosing and treating colorectal cancer (CRC), a leading cause of mortality. Classifying colorectal polyps and CRC precursors remains challenging due to operator variability and expertise dependence. Deep learning (DL) models show promise in polyp classification but face adoption barriers due to their ‘black box’ nature, limiting interpretability. This study presents an example-based explainable artificial intelligence (XAI) approach using Pix2Pix to generate synthetic polyp images with controlled size variations and LIME to explain classifier predictions visually. EfficientNet and Vision Transformer (ViT) were trained on datasets of real and synthetic images, achieving strong baseline accuracies of 94% and 96%, respectively. Image quality was assessed using PSNR (18.04), SSIM (0.64), and FID (123.32), while classifier robustness was evaluated across polyp sizes. Results show that Pix2Pix effectively controls image attributes like polyp size despite limitations in visual fidelity. LIME integration revealed classifier vulnerabilities, underscoring the value of complementary XAI techniques. This enhances the interpretability of DL models and deepens understanding of their behaviour. The findings contribute to developing explainable AI tools for polyp classification and CRC diagnosis. Future work will improve synthetic image quality and refine XAI methodologies for broader clinical use.
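For context, PSNR, one of the image-quality metrics reported above, can be computed in a few lines. This is a generic sketch with synthetic images, not the study's data or evaluation code; SSIM and FID are omitted since they need more machinery:

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference image and
    a generated one, both scaled to [0, max_val]."""
    mse = np.mean((ref - test) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * np.log10(max_val ** 2 / mse)

rng = np.random.default_rng(1)
real = rng.random((128, 128))                       # stand-in for a real polyp image
synthetic = real + rng.normal(0, 0.12, real.shape)  # noisy stand-in for a Pix2Pix output
print(round(psnr(real, np.clip(synthetic, 0, 1)), 2))
```

With a noise standard deviation around 0.12, PSNR lands near the high-teens range the paper reports, which illustrates why such values indicate limited pixel-level fidelity even when attribute control (here, polyp size) succeeds.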

13 pages, 311 KiB  
Article
Diagnostic Performance of ChatGPT-4o in Analyzing Oral Mucosal Lesions: A Comparative Study with Experts
by Luigi Angelo Vaira, Jerome R. Lechien, Antonino Maniaci, Andrea De Vito, Miguel Mayo-Yáñez, Stefania Troise, Giuseppe Consorti, Carlos M. Chiesa-Estomba, Giovanni Cammaroto, Thomas Radulesco, Arianna di Stadio, Alessandro Tel, Andrea Frosolini, Guido Gabriele, Giannicola Iannella, Alberto Maria Saibene, Paolo Boscolo-Rizzo, Giovanni Maria Soro, Giovanni Salzano and Giacomo De Riu
Medicina 2025, 61(8), 1379; https://doi.org/10.3390/medicina61081379 - 30 Jul 2025
Abstract
Background and Objectives: This pilot study aimed to evaluate the diagnostic accuracy of ChatGPT-4o in analyzing oral mucosal lesions from clinical images. Materials and Methods: A total of 110 clinical images, including 100 pathological lesions and 10 healthy mucosal images, were retrieved from Google Images and analyzed by ChatGPT-4o using a standardized prompt. An expert panel of five clinicians established a reference diagnosis, categorizing lesions as benign or malignant. The AI-generated diagnoses were classified as correct or incorrect and further categorized as plausible or not plausible. The accuracy, sensitivity, specificity, and agreement with the expert panel were analyzed. The Artificial Intelligence Performance Instrument (AIPI) was used to assess the quality of AI-generated recommendations. Results: ChatGPT-4o correctly diagnosed 85% of cases. Among the 15 incorrect diagnoses, 10 were deemed plausible by the expert panel. The AI misclassified three malignant lesions as benign but did not categorize any benign lesions as malignant. Sensitivity and specificity were 91.7% and 100%, respectively. The AIPI score averaged 17.6 ± 1.73, indicating strong diagnostic reasoning. The McNemar test showed no significant differences between AI and expert diagnoses (p = 0.084). Conclusions: In this proof-of-concept pilot study, ChatGPT-4o demonstrated high diagnostic accuracy and strong descriptive capabilities in oral mucosal lesion analysis. A residual 8.3% false-negative rate for malignant lesions underscores the need for specialist oversight; however, the model shows promise as an AI-powered triage aid in settings with limited access to specialized care.
(This article belongs to the Section Dentistry and Oral Health)
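The sensitivity and specificity reported above follow directly from confusion-matrix counts. The sketch below uses counts consistent with the stated rates (3 malignant lesions read as benign, no benign lesion read as malignant); the split into 36 malignant and 74 non-malignant cases is inferred from the 91.7% figure, not taken from the paper:

```python
def sens_spec(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts consistent with the abstract: 3 false negatives
# (malignant called benign), 0 false positives, and a malignant-class
# size of 36, which reproduces the reported 91.7% sensitivity.
sensitivity, specificity = sens_spec(tp=33, fn=3, tn=74, fp=0)
print(round(100 * sensitivity, 1), round(100 * specificity, 1))  # 91.7 100.0
```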
50 pages, 937 KiB  
Review
Precision Neuro-Oncology in Glioblastoma: AI-Guided CRISPR Editing and Real-Time Multi-Omics for Genomic Brain Surgery
by Matei Șerban, Corneliu Toader and Răzvan-Adrian Covache-Busuioc
Int. J. Mol. Sci. 2025, 26(15), 7364; https://doi.org/10.3390/ijms26157364 - 30 Jul 2025
Abstract
Precision neurosurgery is rapidly evolving as a medical specialty by merging genomic medicine, multi-omics technologies, and artificial intelligence (AI) technology, while at the same time, society is shifting away from the traditional, anatomic model of care to consider a more precise, molecular model of care. The general purpose of this review is to contemporaneously reflect on how these advances will impact neurosurgical care by providing us with more precise diagnostic and treatment pathways. We hope to provide a relevant review of the recent advances in genomics and multi-omics in the context of clinical practice and highlight their transformational opportunities in the existing models of care, where improved molecular insights can support improvements in clinical care. More specifically, we will highlight how genomic profiling, CRISPR-Cas9, and multi-omics platforms (genomics, transcriptomics, proteomics, and metabolomics) are increasing our understanding of central nervous system (CNS) disorders. Achievements with transformational technologies such as single-cell RNA sequencing and intraoperative mass spectrometry exemplify real-time molecular diagnostics that enable a more directed choice of surgical options. We will also explore how identifying specific biomarkers (e.g., IDH mutations and MGMT promoter methylation) became a tipping point in the care of glioblastoma and allowed for the establishment of a new taxonomy of tumors applicable for surgeons, where a change in practice prompted a different surgical resection approach and subsequently stratified the adjuvant therapies undertaken after surgery. Furthermore, we reflect on how the novel genomic characterization of mutations like DEPDC5 and SCN1A transformed the pre-surgery selection of surgical candidates for refractory epilepsy when conventional imaging did not define an epileptogenic zone, thus reducing resective surgery occurring in clinical practice. While we are atop the crest of an exciting wave of advances, we recognize that we must also be diligent about the challenges we must navigate to implement genomic medicine in neurosurgery, including ethical and technical challenges that could arise when genomic mutation-based therapies require the concurrent application of multi-omics data collection to be realized in practice for the benefit of patients, as well as the constraints from the blood–brain barrier. The primary challenges also relate to the possible gene privacy implications around genomic medicine and equitable access to technology-based practice-disrupting interventions. We hope the contribution from this review will not just be situational consolidation and integration of knowledge but also a stimulus for new lines of research and clinical practice. We also hope to stimulate mindful discussions about future possibilities for conscientious and sustainable progress in our evolution toward a genomic model of precision neurosurgery. In the spirit of providing a critical perspective, we hope that we are also adding to the larger opportunity to embed molecular precision into neuroscience care, striving to promote better practice and better outcomes for patients in a global sense.
(This article belongs to the Special Issue Molecular Insights into Glioblastoma Pathogenesis and Therapeutics)

26 pages, 14606 KiB  
Review
Attribution-Based Explainability in Medical Imaging: A Critical Review on Explainable Computer Vision (X-CV) Techniques and Their Applications in Medical AI
by Kazi Nabiul Alam, Pooneh Bagheri Zadeh and Akbar Sheikh-Akbari
Electronics 2025, 14(15), 3024; https://doi.org/10.3390/electronics14153024 - 29 Jul 2025
Abstract
One of the largest future applications of computer vision is in the healthcare industry. Computer vision tasks are generally implemented in diverse medical imaging scenarios, including detecting or classifying diseases, predicting potential disease progression, analyzing cancer data for advancing future research, and conducting genetic analysis for personalized medicine. However, a critical drawback of using Computer Vision (CV) approaches is their limited reliability and transparency. Clinicians and patients must comprehend the rationale behind predictions or results to ensure trust and ethical deployment in clinical settings. This motivates the adoption of Explainable Computer Vision (X-CV), which enhances the interpretability of vision models. Among various methodologies, attribution-based approaches are widely employed by researchers to explain medical imaging outputs by identifying influential features. This article aims to explore how attribution-based X-CV methods work in medical imaging, what they are good for in real-world use, and what their main limitations are. This study evaluates X-CV techniques by conducting a thorough review of relevant reports, peer-reviewed journals, and methodological approaches to obtain an adequate understanding of attribution-based approaches. It explores how these techniques tackle computational complexity issues, improve diagnostic accuracy, and aid clinical decision-making processes. This article intends to present a path that generalizes the concept of trustworthiness towards AI-based healthcare solutions.
(This article belongs to the Special Issue Artificial Intelligence-Driven Emerging Applications)
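Occlusion sensitivity is one of the simplest attribution techniques in the family this review surveys: mask part of the input and measure how much the model's score drops. A toy, self-contained sketch in which the "model" is a hypothetical scoring function rather than a trained CNN:

```python
import numpy as np

def occlusion_map(img, score_fn, patch=4):
    """Attribution by occlusion: slide a neutral patch over the image
    and record the score drop at each patch location."""
    base = score_fn(img)
    h, w = img.shape
    attr = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = img.mean()  # neutral fill
            attr[i // patch, j // patch] = base - score_fn(occluded)
    return attr

# Toy "model": responds to the mean brightness of the top-left quadrant,
# standing in for a lesion-sensitive classifier logit.
def score_fn(img):
    return img[:8, :8].mean()

img = np.zeros((16, 16))
img[:8, :8] = 1.0  # the "lesion"
attr = occlusion_map(img, score_fn)
# Attribution concentrates on the quadrant the score actually depends on.
print(attr.round(3))
```

Gradient-based attribution methods (saliency maps, Grad-CAM, integrated gradients) follow the same contract, attributing an output score to input regions, but use backpropagated gradients instead of repeated forward passes.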

25 pages, 2887 KiB  
Article
Federated Learning Based on an Internet of Medical Things Framework for a Secure Brain Tumor Diagnostic System: A Capsule Networks Application
by Roman Rodriguez-Aguilar, Jose-Antonio Marmolejo-Saucedo and Utku Köse
Mathematics 2025, 13(15), 2393; https://doi.org/10.3390/math13152393 - 25 Jul 2025
Abstract
Artificial intelligence (AI) has already played a significant role in the healthcare sector, particularly in image-based medical diagnosis. Deep learning models have produced satisfactory and useful results for accurate decision-making. Among the various types of medical images, magnetic resonance imaging (MRI) is frequently utilized in deep learning applications to analyze detailed structures and organs in the body, using advanced intelligent software. However, challenges related to performance and data privacy often arise when using medical data from patients and healthcare institutions. To address these issues, new approaches have emerged, such as federated learning. This technique ensures the secure exchange of sensitive patient and institutional data. It enables machine learning or deep learning algorithms to establish a client–server relationship, whereby specific parameters are securely shared between models while maintaining the integrity of the learning tasks being executed. Federated learning has been successfully applied in medical settings, including diagnostic applications involving medical images such as MRI data. This research introduces an analytical intelligence system based on an Internet of Medical Things (IoMT) framework that employs federated learning to provide a safe and effective diagnostic solution for brain tumor identification. By utilizing specific brain MRI datasets, the model enables multiple local capsule networks (CapsNet) to achieve improved classification results. The average accuracy rate of the CapsNet model exceeds 97%. The precision rate indicates that the CapsNet model performs well in accurately predicting true classes. Additionally, the recall findings suggest that this model is effective in detecting the target classes of meningiomas, pituitary tumors, and gliomas. The integration of these components into an analytical intelligence system that supports the work of healthcare personnel is the main contribution of this work. Evaluations have shown that this approach is effective for diagnosing brain tumors while ensuring data privacy and security. Moreover, it represents a valuable tool for enhancing the efficiency of the medical diagnostic process.
(This article belongs to the Special Issue Innovations in Optimization and Operations Research)
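The parameter-sharing scheme described above can be illustrated with a minimal federated averaging (FedAvg) round. This is a generic sketch, not the paper's IoMT/CapsNet implementation; the layer shapes and per-site dataset sizes are made up:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One federated averaging round: the server combines each client's
    model parameters weighted by local dataset size. Only parameters
    travel; raw MRI data never leaves the clients."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# Three hypothetical hospitals share only parameter tensors (here: one
# weight matrix and one bias vector per site), not patient images.
rng = np.random.default_rng(0)
sites = [[rng.normal(size=(4, 2)), rng.normal(size=2)] for _ in range(3)]
global_model = fedavg(sites, client_sizes=[120, 300, 80])
print(global_model[0].shape, global_model[1].shape)  # (4, 2) (2,)
```

In a full system, each site would then continue local training from the returned global model, and the server would repeat the round.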

19 pages, 3862 KiB  
Article
Estimation of Total Hemoglobin (SpHb) from Facial Videos Using 3D Convolutional Neural Network-Based Regression
by Ufuk Bal, Faruk Enes Oguz, Kubilay Muhammed Sunnetci, Ahmet Alkan, Alkan Bal, Ebubekir Akkuş, Halil Erol and Ahmet Çağdaş Seçkin
Biosensors 2025, 15(8), 485; https://doi.org/10.3390/bios15080485 - 25 Jul 2025
Abstract
Hemoglobin plays a critical role in diagnosing various medical conditions, including infections, trauma, hemolytic disorders, and Mediterranean anemia, which is particularly prevalent in Mediterranean populations. Conventional measurement methods require blood sampling and laboratory analysis, which are often time-consuming and impractical during emergency situations with limited medical infrastructure. Although portable oximeters enable non-invasive hemoglobin estimation, they still require physical contact, posing limitations for individuals with circulatory or dermatological conditions. Additionally, reliance on disposable probes increases operational costs. This study presents a non-contact and automated approach for estimating total hemoglobin levels from facial video data using three-dimensional regression models. A dataset was compiled from 279 volunteers, with synchronized acquisition of facial video and hemoglobin values using a commercial pulse oximeter. After preprocessing, the dataset was divided into training, validation, and test subsets. Three 3D convolutional regression models, including 3D CNN, channel attention-enhanced 3D CNN, and residual 3D CNN, were trained, and the most successful model was implemented in a graphical interface. Among these, the residual model achieved the most favorable performance on the test set, yielding an RMSE of 1.06, an MAE of 0.85, and a Pearson correlation coefficient of 0.73. This study offers a novel contribution by enabling contactless hemoglobin estimation from facial video using 3D CNN-based regression techniques.
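The reported RMSE, MAE, and Pearson correlation can be computed as below. The hemoglobin values are simulated purely to exercise the metrics, with noise chosen to land near the paper's reported error range; they are not drawn from the study's dataset:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and Pearson r, the metrics reported for the model."""
    err = y_pred - y_true
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r = float(np.corrcoef(y_true, y_pred)[0, 1])
    return rmse, mae, r

# Illustrative values only: simulated hemoglobin levels in g/dL.
rng = np.random.default_rng(2)
hb_true = rng.uniform(10, 17, 200)
hb_pred = hb_true + rng.normal(0, 1.05, 200)
rmse, mae, r = regression_metrics(hb_true, hb_pred)
print(round(rmse, 2), round(mae, 2), round(r, 2))
```

Note that MAE is always at most RMSE, so a reported pair like 1.06/0.85 is internally consistent.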

28 pages, 4702 KiB  
Article
Clinical Failure of General-Purpose AI in Photographic Scoliosis Assessment: A Diagnostic Accuracy Study
by Cemre Aydin, Ozden Bedre Duygu, Asli Beril Karakas, Eda Er, Gokhan Gokmen, Anil Murat Ozturk and Figen Govsa
Medicina 2025, 61(8), 1342; https://doi.org/10.3390/medicina61081342 - 25 Jul 2025
Abstract
Background and Objectives: General-purpose multimodal large language models (LLMs) are increasingly used for medical image interpretation despite lacking clinical validation. This study evaluates the diagnostic reliability of ChatGPT-4o and Claude 2 in photographic assessment of adolescent idiopathic scoliosis (AIS) against radiological standards. This study examines two critical questions: whether families can derive reliable preliminary assessments from LLMs through analysis of clinical photographs and whether LLMs exhibit cognitive fidelity in their visuospatial reasoning capabilities for AIS assessment. Materials and Methods: A prospective diagnostic accuracy study (STARD-compliant) analyzed 97 adolescents (74 with AIS and 23 with postural asymmetry). Standardized clinical photographs (nine views/patient) were assessed by two LLMs and two orthopedic residents against reference radiological measurements. Primary outcomes included diagnostic accuracy (sensitivity/specificity), Cobb angle concordance (Lin’s CCC), inter-rater reliability (Cohen’s κ), and measurement agreement (Bland–Altman LoA). Results: The LLMs exhibited hazardous diagnostic inaccuracy: ChatGPT misclassified all non-AIS cases (specificity 0% [95% CI: 0.0–14.8]), while Claude 2 generated 78.3% false positives. Systematic measurement errors exceeded clinical tolerance: ChatGPT overestimated thoracic curves by +10.74° (LoA: −21.45° to +42.92°), exceeding tolerance by >800%. Both LLMs showed inverse biomechanical concordance in thoracolumbar curves (CCC ≤ −0.106). Inter-rater reliability fell below random chance (ChatGPT κ = −0.039). Universal proportional bias (slopes ≈ −1.0) caused severe curve underestimation (e.g., 10–15° error for 50° deformities). Human evaluators demonstrated superior bias control (0.3–2.8° vs. 2.6–10.7°) but suboptimal specificity (21.7–26.1%) and hazardous lumbar concordance (CCC: −0.123). Conclusions: General-purpose LLMs demonstrate clinically unacceptable inaccuracy in photographic AIS assessment, contraindicating clinical deployment. Catastrophic false positives, systematic measurement errors exceeding tolerance by 480–1074%, and inverse diagnostic concordance necessitate urgent regulatory safeguards under frameworks like the EU AI Act. Neither LLMs nor photographic human assessment achieve reliability thresholds for standalone screening, mandating domain-specific algorithm development and integration of 3D modalities.
(This article belongs to the Special Issue Diagnosis and Treatment of Adolescent Idiopathic Scoliosis)

15 pages, 1758 KiB  
Article
Eye-Guided Multimodal Fusion: Toward an Adaptive Learning Framework Using Explainable Artificial Intelligence
by Sahar Moradizeyveh, Ambreen Hanif, Sidong Liu, Yuankai Qi, Amin Beheshti and Antonio Di Ieva
Sensors 2025, 25(15), 4575; https://doi.org/10.3390/s25154575 - 24 Jul 2025
Viewed by 212
Abstract
Interpreting diagnostic imaging and identifying clinically relevant features remain challenging tasks, particularly for novice radiologists who often lack structured guidance and expert feedback. To bridge this gap, we propose an Eye-Gaze Guided Multimodal Fusion framework that leverages expert eye-tracking data to enhance learning and decision-making in medical image interpretation. By integrating chest X-ray (CXR) images with expert fixation maps, our approach captures radiologists’ visual attention patterns and highlights regions of interest (ROIs) critical for accurate diagnosis. The fusion model utilizes a shared backbone architecture to jointly process image and gaze modalities, thereby minimizing the impact of noise in fixation data. We validate the system’s interpretability using Gradient-weighted Class Activation Mapping (Grad-CAM) and assess both classification performance and explanation alignment with expert annotations. Comprehensive evaluations, including robustness under gaze noise and expert clinical review, demonstrate the framework’s effectiveness in improving model reliability and interpretability. This work offers a promising pathway toward intelligent, human-centered AI systems that support both diagnostic accuracy and medical training. Full article
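A minimal Grad-CAM sketch, in the spirit of the visualization this abstract validates against; this is an assumption-laden toy, not the paper's implementation. Given one convolutional layer's activations and the gradients of the target class score with respect to them, Grad-CAM weights each channel by its average gradient and keeps only positive evidence (ReLU), yielding a coarse saliency map.

```python
import numpy as np

def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """activations, gradients: (channels, H, W) arrays for one conv layer."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: GAP of grads
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k
    cam = np.maximum(cam, 0.0)                        # ReLU: positive evidence
    if cam.max() > 0:
        cam /= cam.max()                              # normalize to [0, 1]
    return cam

# Toy example: channel 0 supports the class (positive gradients),
# channel 1 opposes it (negative gradients).
A = np.array([[[1.0, 0.0], [0.0, 0.0]],
              [[0.0, 0.0], [0.0, 2.0]]])
G = np.array([[[1.0, 1.0], [1.0, 1.0]],
              [[-1.0, -1.0], [-1.0, -1.0]]])
print(grad_cam(A, G))  # only the pixel backed by positive evidence survives
```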
(This article belongs to the Section Sensing and Imaging)
22 pages, 4406 KiB  
Article
Colorectal Cancer Detection Tool Developed with Neural Networks
by Alex Ede Danku, Eva Henrietta Dulf, Alexandru George Berciu, Noemi Lorenzovici and Teodora Mocan
Appl. Sci. 2025, 15(15), 8144; https://doi.org/10.3390/app15158144 - 22 Jul 2025
Viewed by 245
Abstract
In the last two decades, there has been a considerable surge in the development of artificial intelligence. Imaging is most frequently employed for the diagnostic evaluation of patients, as it is regarded as one of the most precise methods for identifying the presence of a disease. However, a study indicates that approximately 800,000 individuals in the USA die or incur permanent disability because of misdiagnosis. The present study is based on computer-aided diagnosis of colorectal cancer. The objective is to develop a practical, low-cost, AI-based decision-support tool that integrates clinical test data (blood/stool) and, if needed, colonoscopy images to help reduce misdiagnosis and improve early detection of colorectal cancer for clinicians. Convolutional neural networks (CNNs) and artificial neural networks (ANNs) are used in conjunction with a graphical user interface (GUI) that caters to individuals lacking programming expertise. The performance of the ANN is measured using the mean squared error (MSE) metric, yielding an MSE of 7.38. For the CNN, two distinct cases are considered: one with two outputs and one with three outputs. The precision of the models is 97.2% for RGB and 96.7% for grayscale images in the first case, and 83% for RGB and 82% for grayscale in the second. However, using a pretrained network yielded superior performance: 99.5% for the 2-output models and 93% for the 3-output models. The GUI is composed of two panels, one using the best ANN model and the other the best CNN model. The primary function of the tool is to assist medical personnel in reducing the time required to make decisions and the probability of misdiagnosis. Full article
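For context on the precision figures quoted in this abstract, the sketch below derives per-class and macro-averaged precision from a confusion matrix; the counts are hypothetical, not the study's data.

```python
# Rows are true classes, columns are predicted classes.

def per_class_precision(cm):
    """precision_c = TP_c / (TP_c + FP_c) = diagonal / column sum."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return [cm[c][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]

def macro_precision(cm):
    """Unweighted mean of the per-class precisions."""
    p = per_class_precision(cm)
    return sum(p) / len(p)

cm = [[5, 1],   # class 0: 5 correct, 1 misclassified as class 1
      [1, 3]]   # class 1: 1 misclassified as class 0, 3 correct
print(per_class_precision(cm))  # -> [0.8333..., 0.75]
print(macro_precision(cm))
```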
16 pages, 2557 KiB  
Article
Explainable AI for Oral Cancer Diagnosis: Multiclass Classification of Histopathology Images and Grad-CAM Visualization
by Jelena Štifanić, Daniel Štifanić, Nikola Anđelić and Zlatan Car
Biology 2025, 14(8), 909; https://doi.org/10.3390/biology14080909 - 22 Jul 2025
Viewed by 293
Abstract
Oral cancer is typically diagnosed through histological examination; however, the primary issue with this type of procedure is tumor heterogeneity, where a subjective aspect of the examination may have a direct effect on the treatment plan for a patient. To reduce inter- and intra-observer variability, artificial intelligence algorithms are often used as computational aids in tumor classification and diagnosis. This research proposes a two-step approach for automatic multiclass grading using oral histopathology images (the first step) and Grad-CAM visualization (the second step) to assist clinicians in diagnosing oral squamous cell carcinoma. The Xception architecture achieved the highest classification values: a macro-averaged AUC of 0.929 (σ = 0.087) and a micro-averaged AUC of 0.942 (σ = 0.074). Additionally, Grad-CAM provided visual explanations of the model’s predictions by highlighting the precise areas of histopathology images that influenced the model’s decision. These results emphasize the potential of integrated AI algorithms in medical diagnostics, offering a more precise, dependable, and effective method for disease analysis. Full article
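The macro- vs micro-averaged AUC distinction quoted in this abstract can be illustrated with the pairwise rank definition of AUC under a one-vs-rest scheme; this is a toy sketch with invented scores, not the paper's evaluation code.

```python
def binary_auc(labels, scores):
    """P(score_pos > score_neg) over all pos/neg pairs; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            total += 1.0 if p > q else (0.5 if p == q else 0.0)
    return total / (len(pos) * len(neg))

def macro_micro_auc(y_true, score_matrix):
    """y_true: class indices; score_matrix[i][c]: score of class c for case i."""
    n_classes = len(score_matrix[0])
    # Macro: average the per-class one-vs-rest AUCs.
    per_class = [
        binary_auc([int(y == c) for y in y_true],
                   [row[c] for row in score_matrix])
        for c in range(n_classes)
    ]
    # Micro: pool every (case, class) pair into one binary problem.
    flat_labels, flat_scores = [], []
    for y, row in zip(y_true, score_matrix):
        for c, s in enumerate(row):
            flat_labels.append(int(y == c))
            flat_scores.append(s)
    return sum(per_class) / n_classes, binary_auc(flat_labels, flat_scores)

print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # -> 0.75
print(macro_micro_auc([0, 1], [[0.9, 0.1], [0.2, 0.8]]))
```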
12 pages, 2353 KiB  
Article
Intergrader Agreement on Qualitative and Quantitative Assessment of Diabetic Retinopathy Severity Using Ultra-Widefield Imaging: INSPIRED Study Report 1
by Eleonora Riotto, Wei-Shan Tsai, Hagar Khalid, Francesca Lamanna, Louise Roch, Medha Manoj and Sobha Sivaprasad
Diagnostics 2025, 15(14), 1831; https://doi.org/10.3390/diagnostics15141831 - 21 Jul 2025
Viewed by 290
Abstract
Background/Objectives: Discrepancies in diabetic retinopathy (DR) grading are well-documented, with retinal non-perfusion (RNP) quantification posing greater challenges. This study assessed intergrader agreement in DR evaluation, focusing on qualitative severity grading and quantitative RNP measurement. We aimed to improve agreement through structured consensus meetings. Methods: A retrospective analysis of 100 comparisons from 50 eyes (36 patients) was conducted. Two paired medical retina fellows graded ultra-widefield color fundus photographs (CFP) and fundus fluorescein angiography (FFA) images. CFP assessments included DR severity using the International Clinical Diabetic Retinopathy (ICDR) grading system, DR Severity Scale (DRSS), and predominantly peripheral lesions (PPL). FFA-based RNP was defined as capillary loss with grayscale matching the foveal avascular zone. Weekly adjudication by a senior specialist resolved discrepancies. Intergrader agreement was evaluated using Cohen’s kappa (qualitative DRSS) and intraclass correlation coefficients (ICC) (quantitative RNP). Bland–Altman analysis assessed bias and variability. Results: After eight consensus meetings, CFP grading agreement improved to excellent: kappa = 91% (ICDR DR severity), 89% (DRSS), and 89% (PPL). FFA-based PPL agreement reached 100%. For RNP, the non-perfusion index (NPI) showed moderate overall ICC (0.49), with regional ICCs ranging from 0.40 to 0.57 (highest in the nasal region, ICC = 0.57). Bland–Altman analysis revealed a mean NPI difference of 0.12 (limits: −0.11 to 0.35), indicating acceptable variability despite outliers. Conclusions: Structured consensus training achieved excellent intergrader agreement for DR severity and PPL grading, supporting the clinical reliability of ultra-widefield imaging. However, RNP measurement variability underscores the need for standardized protocols and automated tools to enhance reproducibility. 
This process is critical for developing robust AI-based screening systems. Full article
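The Bland–Altman quantities reported in this abstract (bias and limits of agreement, LoA = bias ± 1.96 × SD of the paired differences) can be sketched as follows; the non-perfusion index readings below are hypothetical, not the study's data.

```python
import math

def bland_altman(grader_a, grader_b):
    """Return (bias, lower LoA, upper LoA) for two graders' paired readings."""
    diffs = [a - b for a, b in zip(grader_a, grader_b)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Sample standard deviation (n - 1 denominator) of the differences.
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

npi_grader_a = [0.30, 0.45, 0.20, 0.50]
npi_grader_b = [0.20, 0.25, 0.10, 0.50]
bias, lo, hi = bland_altman(npi_grader_a, npi_grader_b)
print(round(bias, 3), round(lo, 3), round(hi, 3))
```

A wide LoA interval relative to the clinically tolerable difference, as in the study's (−0.11, 0.35) band, is what flags the measurement as too variable for standalone use.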
(This article belongs to the Special Issue New Advances in Retinal Imaging)