AI in Radiology and Nuclear Medicine: Challenges and Opportunities

A special issue of Diagnostics (ISSN 2075-4418). This special issue belongs to the section "Machine Learning and Artificial Intelligence in Diagnostics".

Deadline for manuscript submissions: closed (31 December 2025) | Viewed by 5965

Special Issue Editor


Guest Editor
Medical Physics Department, Faculty of Medicine, School of Health Sciences, University of Thessaly, Larissa, Greece
Interests: medical physics; radiology; nuclear medicine; artificial intelligence in biomedical imaging and radiotherapy

Special Issue Information

Dear Colleagues,

This Special Issue, titled “AI in Radiology and Nuclear Medicine: Challenges and Opportunities”, delves into the ever-evolving intersection of artificial intelligence with the fields of radiology and nuclear medicine. It compiles a range of manuscripts that explore the cutting-edge applications of AI in enhancing diagnostic accuracy, optimizing workflow efficiency, and facilitating personalized treatment plans. From deep learning algorithms that augment image analysis to AI-driven decision support systems, this Special Issue highlights both the transformative potential and the inherent challenges faced in integrating AI into clinical practice.

Dr. Ioannis M. Tsougos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Diagnostics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • diagnosis
  • prognosis
  • radiology
  • nuclear medicine
  • artificial intelligence (AI)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (5 papers)


Research


24 pages, 2692 KB  
Article
Domain Shift in Breast DCE-MRI Tumor Segmentation: A Balanced LoCoCV Study on the MAMA-MIA Dataset
by Munid Alanazi and Bader Alsharif
Diagnostics 2026, 16(2), 362; https://doi.org/10.3390/diagnostics16020362 - 22 Jan 2026
Viewed by 139
Abstract
Background and Objectives: Accurate breast tumor segmentation in dynamic contrast-enhanced MRI (DCE-MRI) is crucial for treatment planning, therapy monitoring, and quantitative studies of breast cancer response. However, deep learning models often have worse performance when applied to new hospitals because scanner hardware, acquisition protocols, and patient populations differ from those in the training data. This study investigates how such center-related domain shift affects automated breast DCE-MRI tumor segmentation on the multi-center MAMA-MIA dataset. Methods: We trained a standard 3D U-Net for primary tumor segmentation under two evaluation settings. First, we constructed a random patient-wise split that mixes cases from the three main MAMA-MIA center groups (ISPY2, DUKE, NACT) and used this as an in-distribution reference. Second, we designed a balanced leave-one-center-out cross-validation (LoCoCV) protocol in which each center is held out in turn, while training, validation, and test sets are matched in size across folds. Performance was assessed using the Dice similarity coefficient, 95th percentile Hausdorff distance (HD95), sensitivity, specificity, and related overlap measures. Results: On the mixed-center random split, the best three-channel model achieved a mean Dice of about 0.68 and a mean HD95 of about 19.7 mm on the held-out test set, indicating good volumetric overlap and boundary accuracy when training and test distributions match. Under balanced LoCoCV, the one-channel model reached a mean Dice of about 0.45 and a mean HD95 of about 41 mm on unseen centers, with similar averages for the three-channel variant. Compared with the random split baseline, Dice and sensitivity decreased, while HD95 nearly doubled, showing that boundary errors become larger and segmentations less reliable when the model is applied to new centers. 
Conclusions: A model that performs well on mixed-center random splits can still suffer a substantial loss of accuracy on completely unseen institutions. The balanced LoCoCV design makes this out-of-distribution penalty visible by separating center-related effects from sample size effects. These findings highlight the need for robust multi-center training strategies and explicit cross-center validation before deploying breast DCE-MRI segmentation models in clinical practice.
(This article belongs to the Special Issue AI in Radiology and Nuclear Medicine: Challenges and Opportunities)
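
The two evaluation ideas in this abstract, overlap scoring and holding out one center at a time, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, the empty-mask convention is an assumption, and the fold generator omits the size matching that makes the study's LoCoCV protocol "balanced".

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2|A and B| / (|A| + |B|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom


def leave_one_center_out(centers):
    """Yield (held_out, train_idx, test_idx), holding out each center in turn.

    `centers` gives the center label of every case. Unlike the balanced
    protocol in the abstract, fold sizes are NOT equalized here.
    """
    centers = np.asarray(centers)
    for held_out in sorted(set(centers.tolist())):
        test_idx = np.flatnonzero(centers == held_out)
        train_idx = np.flatnonzero(centers != held_out)
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Toy masks: one overlapping voxel, mask sizes 2 and 1.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    # Toy case list using the three center groups named in the abstract.
    for name, train, test in leave_one_center_out(["ISPY2", "DUKE", "NACT", "DUKE"]):
        print(name, train.tolist(), test.tolist())
```

Because every test case in a fold comes from a center the model never saw during training, scores from such folds estimate out-of-distribution performance rather than in-distribution performance.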

24 pages, 2472 KB  
Article
Beyond Radiomics Alone: Enhancing Prostate Cancer Classification with ADC Ratio in a Multicenter Benchmarking Study
by Dimitrios Samaras, Georgios Agrotis, Alexandros Vamvakas, Maria Vakalopoulou, Marianna Vlychou, Katerina Vassiou, Vasileios Tzortzis and Ioannis Tsougos
Diagnostics 2025, 15(19), 2546; https://doi.org/10.3390/diagnostics15192546 - 9 Oct 2025
Viewed by 1150
Abstract
Background/Objectives: Radiomics enables extraction of quantitative imaging features to support non-invasive classification of prostate cancer (PCa). Accurate detection of clinically significant PCa (csPCa; Gleason score ≥ 3 + 4) is crucial for guiding treatment decisions. However, many studies explore limited feature selection, classifier, and harmonization combinations, and lack external validation. We aimed to systematically benchmark modeling pipelines and evaluate whether combining radiomics with the lesion-to-normal ADC ratio improves classification robustness and generalizability in multicenter datasets. Methods: Radiomic features were extracted from ADC maps using IBSI-compliant pipelines. Over 100 model configurations were tested, combining eight feature selection methods, fifteen classifiers, and two harmonization strategies across two scenarios: (1) repeated cross-validation on a multicenter dataset and (2) nested cross-validation with external testing on the PROSTATEx dataset. The ADC ratio was defined as the mean lesion ADC divided by contralateral normal tissue ADC, by placing two identical ROIs on each side, enabling patient-specific normalization. Results: In Scenario 1, the best model combined radiomics, ADC ratio, LASSO, and Naïve Bayes (AUC-PR = 0.844 ± 0.040). In Scenario 2, the top-performing configuration used Recursive Feature Elimination (RFE) and Boosted GLM (a generalized linear model trained with boosting), generalizing well to the external set (AUC-PR = 0.722; F1 = 0.741). ComBat harmonization improved calibration but not external discrimination. Frequently selected features were texture-based (GLCM, GLSZM) from wavelet- and LoG-filtered ADC maps. Conclusions: Integrating radiomics with the ADC ratio improves csPCa classification and enhances generalizability, supporting its potential role as a robust, clinically interpretable imaging biomarker in multicenter MRI studies.
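
The ADC ratio defined in this abstract is a simple quotient of two ROI means, sketched below in Python. This is an illustration under stated assumptions, not the study's pipeline: the function name `adc_ratio` and the voxel values are hypothetical, and the ROIs are assumed to be supplied already as arrays of ADC values.

```python
import numpy as np


def adc_ratio(lesion_roi, normal_roi):
    """Mean lesion ADC divided by mean contralateral normal-tissue ADC.

    Both arguments are arrays of ADC values sampled from two ROIs of
    identical size; the ratio is unitless, which is what makes it a
    patient-specific normalization.
    """
    lesion_mean = float(np.mean(lesion_roi))
    normal_mean = float(np.mean(normal_roi))
    if normal_mean == 0:
        raise ValueError("normal-tissue ROI has zero mean ADC")
    return lesion_mean / normal_mean


if __name__ == "__main__":
    # Hypothetical ADC samples, both ROIs in the same (arbitrary) units.
    lesion = [700.0, 800.0, 900.0]
    normal = [1500.0, 1600.0, 1700.0]
    print(adc_ratio(lesion, normal))
```

Dividing by the patient's own contralateral tissue cancels scanner- and protocol-dependent scaling that affects both ROIs alike, which is a plausible reason for the ratio to transfer across centers.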

13 pages, 1859 KB  
Article
Enhanced Malignancy Prediction of Small Lung Nodules in Different Populations Using Transfer Learning on Low-Dose Computed Tomography
by Jyun-Ru Chen, Kuei-Yuan Hou, Yung-Chen Wang, Sen-Ping Lin, Yuan-Heng Mo, Shih-Chieh Peng and Chia-Feng Lu
Diagnostics 2025, 15(12), 1460; https://doi.org/10.3390/diagnostics15121460 - 8 Jun 2025
Viewed by 1246
Abstract
Background: Predicting malignancy in small lung nodules (SLNs) across diverse populations is challenging due to significant demographic and clinical variations. This study investigates whether transfer learning (TL) can improve malignancy prediction for SLNs using low-dose computed tomography across datasets from different countries. Methods: We collected two datasets: an Asian dataset (669 SLNs from Cathay General Hospital, CGH, Taiwan) and an American dataset (600 SLNs from the National Lung Screening Trial, NLST, America). Initial U-Net models for malignancy prediction were trained on each dataset, followed by the application of TL to transfer model parameters across datasets. Model performance was evaluated using accuracy, specificity, sensitivity, and the area under the receiver operating characteristic curve (AUC). Results: Significant demographic differences (p < 0.001) were observed between the CGH and NLST datasets. Initial models trained on one dataset showed a substantial performance decline of 15.2% to 97.9% when applied to the other dataset. TL enhanced model performance across datasets by 21.1% to 159.5% (p < 0.001), achieving an accuracy of 0.86–0.91, sensitivity of 0.81–0.96, specificity of 0.89–0.92, and an AUC of 0.90–0.97. Conclusions: TL enhances SLN malignancy prediction models by addressing population variations and enabling their application across diverse international datasets.

12 pages, 3173 KB  
Article
Information Extraction from Lumbar Spine MRI Radiology Reports Using GPT4: Accuracy and Benchmarking Against Research-Grade Comprehensive Scoring
by Katharina Ziegeler, Virginie Kreutzinger, Michelle W. Tong, Cynthia T. Chin, Emma Bahroos, Po-Hung Wu, Noah Bonnheim, Aaron J. Fields, Jeffrey C. Lotz, Thomas M. Link and Sharmila Majumdar
Diagnostics 2025, 15(7), 930; https://doi.org/10.3390/diagnostics15070930 - 4 Apr 2025
Cited by 2 | Viewed by 2351
Abstract
Background/Objectives: This study aimed to create a pipeline for standardized data extraction from lumbar-spine MRI radiology reports using a large language model (LLM) and assess the agreement of the extracted data with research-grade semi-quantitative scoring. Methods: We included a subset of data from a multi-site NIH-funded cohort study of chronic low back pain (cLBP) participants. After initial prompt development, a secure application programming interface (API) deployment of OpenAI's GPT-4 was used to extract different classes of pathology from the clinical radiology report. Unsupervised UMAP and agglomerative clustering of the pathology terms’ embeddings provided insight into model comprehension for optimized prompt design. Model extraction was benchmarked against human extraction (gold standard) with F1 scores and false-positive and false-negative rates (FPR/FNR). Then, an expert MSK radiologist provided comprehensive research-grade scores of the images, and agreement with report-extracted data was calculated using Cohen’s kappa. Results: Data from 230 patients with cLBP were included (mean age 53.2 years, 54% women). The overall model performance for extracting data from clinical reports was excellent, with a mean F1 score of 0.96 across pathologies. The mean FPR was marginally higher than the FNR (5.1% vs. 3.0%). Agreement with comprehensive scoring was moderate (kappa 0.424), and the underreporting of lateral recess stenosis (FNR 63.6%) and overreporting of disc pathology (FPR 42.7%) were noted. Conclusions: LLMs can accurately extract highly detailed information on lumbar spine imaging pathologies from radiology reports. Moderate agreement between the LLM and comprehensive scores underscores the need for less subjective, machine-based data extraction from imaging.
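
The benchmarking metrics named in this abstract (F1, FPR, FNR, and Cohen's kappa) follow standard textbook definitions, shown here as a minimal Python sketch. The function names and the toy inputs are hypothetical, the counts are assumed to be large enough that no denominator is zero, and nothing here reproduces the study's data or code.

```python
from collections import Counter


def extraction_rates(tp, fp, fn, tn):
    """F1, false-positive rate, and false-negative rate from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # share of true negatives wrongly flagged
        "fnr": fn / (fn + tp),  # share of true positives that were missed
    }


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two label sequences."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / n**2
    if p_e == 1.0:
        # Assumed convention: both raters used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(extraction_rates(tp=8, fp=2, fn=2, tn=88))
    print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```

Kappa corrects raw agreement for the agreement expected by chance, so a high F1 against the report text and only moderate kappa against image-based scoring are not contradictory: they compare the extraction with two different references.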

Other


10 pages, 680 KB  
Systematic Review
Diagnostic Performance of Artificial Intelligence in Predicting Malignant Upgrade of B3 Breast Lesions: Systematic Review and Meta-Analysis
by Romuald Ferre and Cherie M. Kuzmiak
Diagnostics 2026, 16(1), 75; https://doi.org/10.3390/diagnostics16010075 - 25 Dec 2025
Viewed by 448
Abstract
Background/Objectives: High-risk (B3) breast lesions are a heterogeneous group with uncertain malignant potential. Methods: We systematically reviewed and meta-analyzed the ability of artificial-intelligence (AI) models to predict malignant upgrades (a ductal carcinoma in situ or an invasive carcinoma) after biopsy. A comprehensive search of medical and engineering databases through 27 July 2025 identified retrospective studies that developed or validated AI models for upgrade prediction in cohorts with ≥20 B3 lesions and confirmed outcomes at surgical excision or after ≥24 months of follow-up. Results: Three single-center studies (557 lesions, 91 upgrades) met the eligibility criteria. Pooled analysis focused on clinically meaningful operating points rather than raw accuracy metrics. Models tuned for high sensitivity achieved high negative predictive values (pooled 0.95), suggesting reliable identification of lesions suitable for surveillance, but positive predictive values were modest and heterogeneous (0.15–1.00), reflecting trade-offs between avoiding missed upgrades and reducing unnecessary excisions. Only two studies reported area-under-the-receiver-operating-characteristic curves, which pooled to 0.72, indicating moderate discrimination. Conclusions: Although limited by small sample sizes and single-center designs, these findings suggest that AI could aid decision-making for B3 lesion management. Prospective multicenter validation and standardized reporting are needed to evaluate clinical utility.
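
The trade-off between negative and positive predictive value described in this abstract can be made concrete with a short Python sketch. The function name and the confusion counts are hypothetical and chosen only for illustration; they are not taken from the reviewed studies.

```python
def predictive_values(tp, fp, fn, tn):
    """Sensitivity, PPV, and NPV from confusion counts at one operating point."""
    return {
        "sensitivity": tp / (tp + fn),  # upgrades correctly flagged
        "ppv": tp / (tp + fp),          # flagged lesions that truly upgrade
        "npv": tn / (tn + fn),          # unflagged lesions that truly do not
    }


if __name__ == "__main__":
    # Hypothetical high-sensitivity threshold: few missed upgrades (fn),
    # at the cost of many false alarms (fp).
    print(predictive_values(tp=9, fp=30, fn=1, tn=60))
```

With these illustrative counts, a threshold that misses only one upgrade yields a high NPV and a low PPV, the same pattern the abstract reports for models tuned toward sensitivity.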
