Review

Applications of Artificial Intelligence in Selected Internal Medicine Specialties: A Critical Narrative Review of the Latest Clinical Evidence

by Aleksandra Łoś 1, Dorota Bartusik-Aebisher 2, Wiktoria Mytych 1 and David Aebisher 3,*

1 English Division Science Club, Collegium Medicum, Faculty of Medicine, University of Rzeszów, 35-310 Rzeszów, Poland
2 Department of Biochemistry and General Chemistry, Collegium Medicum, Faculty of Medicine, University of Rzeszów, 35-310 Rzeszów, Poland
3 Department of Photomedicine and Physical Chemistry, Collegium Medicum, Faculty of Medicine, University of Rzeszów, 35-310 Rzeszów, Poland
* Author to whom correspondence should be addressed.
Algorithms 2026, 19(1), 54; https://doi.org/10.3390/a19010054
Submission received: 2 December 2025 / Revised: 1 January 2026 / Accepted: 4 January 2026 / Published: 7 January 2026
(This article belongs to the Special Issue AI-Assisted Medical Diagnostics)

Abstract

Background: Artificial intelligence (AI) is rapidly transforming clinical medicine by enabling earlier disease detection, personalized risk stratification, precision diagnostics, and optimized therapeutic decision-making across multiple specialties. Methods: This narrative review synthesizes the most recent evidence from prospective randomized controlled trials, large cohort studies, and real-world implementations of AI in cardiology, pulmonology, neurology, hepatology, pancreatic diseases, and other key areas of internal medicine. Studies were selected based on clinical impact, external validation, and regulatory approval status where applicable. Results: AI systems now outperform traditional clinical tools in numerous high-stakes applications: >88% freedom from atrial fibrillation at 1 year with AI-guided ablation, noninferior stent optimization versus OCT guidance, >95% sensitivity for atrial fibrillation and low ejection fraction detection on single-lead ECG, substantial increases in adenoma detection rate and melanoma triage accuracy, automated pancreatic cancer detection on routine CT with 89–90% sensitivity, and significant improvements in palliative care consultation rates and post-PCI outcomes using AI-supported telemedicine. Over 850 FDA-cleared AI devices exist as of November 2025, with cardiology and radiology dominating clinical adoption. Conclusions: AI has transitioned from experimental to clinically indispensable in multiple specialties, delivering measurable reductions in mortality, morbidity, hospitalizations, and healthcare resource utilization. Remaining challenges include external validation gaps, bias mitigation, and the need for large-scale prospective trials before universal implementation.

1. Introduction

Artificial intelligence (AI) is rapidly transforming internal medicine specialties. This review examines the current applications of AI in selected fields, with a focus on high-impact trials and real-world implementations [1]. The number of PubMed-indexed publications containing the term “artificial intelligence” in the title or abstract has increased more than 30-fold since 2010 [2,3]. The widespread adoption of wearable devices and the Internet of Medical Things has enabled continuous, non-invasive collection of physiological data [4,5]. AI algorithms analyze ECG signals, heart rate, oxygen saturation, physical activity, sleep patterns, and biochemical parameters (Figure 1). Meta-analyses published between 2023 and 2025 report that deep learning models achieve >95% sensitivity and >98% specificity in detecting atrial fibrillation using data from devices such as Apple Watch or Fitbit [6,7]. Recent studies have also explored multimodal health monitoring using federated learning to preserve patient privacy [8,9]. Predictive AI models frequently outperform traditional risk assessment tools. For example, multimodal neural networks combining genetic, laboratory, imaging, and lifestyle data from the UK Biobank achieved an AUC > 0.92 for 10-year prediction of cardiovascular disease, type 2 diabetes, and selected cancers [10,11].
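As a point of reference for the headline figures above (sensitivity, specificity, AUC), the following minimal Python sketch shows how these metrics are defined and computed from model outputs. The data are synthetic and purely illustrative; they are not drawn from any cited study.

```python
# Illustrative only: definitions of the metrics quoted above, on synthetic data.

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case is
    ranked above a randomly chosen negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic example: 4 AF cases (label 1) and 4 sinus-rhythm controls (label 0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.75, 0.3, 0.4, 0.2, 0.35, 0.1]  # model probabilities
y_pred = [int(s >= 0.5) for s in scores]             # 0.5 decision threshold

sens, spec = sensitivity_specificity(y_true, y_pred)  # 0.75, 1.0
roc_auc = auc(y_true, scores)                         # 0.875
```

Note that sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking performance across all thresholds; this is why studies typically report both.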
In 2025, the first prospective randomized clinical trials demonstrated that AI-driven disease prediction in primary care was associated with a 20–30% reduction in hospitalizations for acute cardiovascular events [12,13]. In oncology, cardiology, and intensive care, AI-based prognostic models often outperform conventional scoring systems. Examples include the DeepMind/Google Health system for predicting acute kidney injury 48 h in advance [14,15,16,17] and foundation models such as Med-PaLM-M and GatorTron for in-hospital mortality prediction from unstructured electronic health records [18,19,20]. Medical imaging remains the most mature area of AI application, with over 850 FDA-cleared AI-enabled medical devices as of November 2025, of which more than 70% are related to imaging [21].

2. Materials and Methods

This is a narrative review synthesizing the most clinically impactful applications of AI in internal medicine, with a focus on high-quality evidence published in recent years. The narrative format was chosen because the field is rapidly evolving, with heterogeneous study designs, outcomes, and regulatory pathways, making a formal meta-analysis or systematic review with quantitative synthesis currently premature and impractical. Studies were selected through a comprehensive review of PubMed, Google Scholar, and relevant conference proceedings (up to November 2025), focusing on high-impact publications (e.g., prospective trials, randomized controlled trials where available, and real-world evidence studies) reporting AI applications in clinical settings. Inclusion criteria prioritized studies demonstrating clinical impact, defined here as improvements in diagnostic accuracy, prognostic stratification, workflow efficiency, or hard clinical outcomes (e.g., reduced hospitalizations, major adverse events, or mortality). Retrospective studies were included for emerging fields, but emphasis was placed on prospective validations and FDA-cleared devices. Summary tables compile key recent reports per specialty.
Inclusion criteria
Studies were included if they met at least one of the following:
- Prospective randomized controlled trials (RCTs) evaluating AI interventions
- Large prospective cohort studies with real-world implementation of AI tools
- Post hoc analyses of major prospective trials that incorporated AI-based analyses
- Studies demonstrating improvement in clinically relevant outcomes or resulting in regulatory approval
- Publication date from 2010 onward
Exclusion criteria
- Purely retrospective studies without prospective external validation
- Proof-of-concept or preclinical studies lacking patient-centered clinical endpoints
- Non-English language publications
No formal quantitative synthesis or meta-analysis was performed. The selection of studies was guided by clinical impact, methodological rigor, and regulatory relevance rather than exhaustive screening of all potentially eligible records.

3. Results

Artificial intelligence shows increasingly mature clinical deployment across internal medicine specialties, with cardiology leading in both volume and outcome evidence. As of November 2025, over 850 AI-enabled medical devices have received FDA clearance, of which more than 70% are imaging-related. Emerging evidence from prospective randomized trials suggests that AI integration into routine care can reduce hard clinical endpoints (hospitalizations, major adverse cardiovascular events, missed cancers) while potentially decreasing radiologist and clinician workload. The following sections present specialty-specific results from the most recent high-impact trials and real-world implementations, organized by organ system and supported by dedicated summary tables.

3.1. Cardiology

Artificial intelligence is increasingly integrated into cardiology, supporting early diagnosis, risk stratification, personalized treatment, and disease management (Figure 2). Table 1 summarizes selected high-impact prospective studies published in recent years [22,23]. Cardiology has been at the forefront of AI adoption in clinical practice. Regarding heart failure (HF), AI leveraging machine learning and deep neural networks enables early diagnosis, phenotyping, and stratification of disease severity by integrating multimodal data, including imaging, ECG, and electronic health records. These approaches achieve significantly better prediction of adverse outcomes and therapeutic response compared to conventional methods [24,25]. Moreover, AI facilitates the identification of asymptomatic structural changes and disease progression, supporting personalized care planning and timely interventions that are increasingly transforming routine clinical practice [26,27]. Diagnostic accuracy in heart failure reaches approximately 85% when using raw imaging data from echocardiography, computed tomography, or magnetic resonance imaging [27].
While AI provides substantial support to clinicians, challenges remain, including discrepancies between diagnostic algorithms (e.g., up to 40% disagreement between the HFA-PEFF and H2FPEF scores in HF with preserved ejection fraction), over-reliance on left ventricular ejection fraction, and interpretive limitations [28,29]. These issues highlight the need for resolution prior to widespread implementation. Beyond heart failure, AI enhances imaging quality and diagnostics in congenital heart defects by reducing inter-observer variability, improving segmentation, and predicting long-term outcomes [30]. In postoperative settings, it detects subtle trends for early complication identification, such as catheter thrombosis. Additionally, AI contributes to drug development and targeted therapies [31], while at the population level, it supports early detection, resource allocation, remote monitoring, and large-scale research through registry and biobank analysis [32].
Table 1. Summary of the latest reports in the field of cardiology.
Authors | Methodology | Results | Significance
Deisenhofer, I., et al. [33] | Multicenter, randomized, controlled, double-blind superiority trial; patients with drug-refractory persistent/long-standing persistent AF were 1:1 randomized to either conventional anatomical PVI-only or PVI plus AI-guided ablation of areas showing spatio-temporal electrogram dispersion. | At 12 months after a single procedure, freedom from documented AF was 88% in the tailored (AI-guided + PVI) arm vs. 70% in the PVI-only arm (log-rank p < 0.0001). No significant difference in freedom from any atrial arrhythmia. Procedure and ablation times were twice as long in the tailored arm; safety outcomes were similar between groups. | AI-guided targeting of spatio-temporal dispersion areas significantly improves 1-year AF-free survival compared to PVI alone in persistent/long-standing persistent AF. This establishes a new, more effective ablation strategy beyond standard PVI, although longer procedures and potential need for additional tachycardia ablation should be considered.
Kim, Y., et al. [34] | Multicenter, randomized (1:1), controlled trial involving 400 patients undergoing PCI; comparison of fully automated real-time AI-based quantitative coronary angiography (AI-QCA)-assisted PCI versus intravascular OCT-guided PCI. Primary endpoint: post-PCI minimal stent area (MSA) measured by OCT, tested for noninferiority (margin 0.8 mm2). | Post-PCI MSA was 6.3 ± 2.2 mm2 (AI-QCA) vs. 6.2 ± 2.2 mm2 (OCT) (difference −0.16 mm2; 95% CI −0.59 to 0.28; P for noninferiority < 0.001). Most OCT-defined endpoints (stent underexpansion, dissection, untreated reference disease) were similar; stent malapposition was higher in the AI-QCA group (13.6% vs. 5.6%, p = 0.007). | AI-QCA-assisted PCI is noninferior to OCT-guided PCI in achieving optimal stent expansion (MSA) while being faster and not requiring additional imaging equipment or expertise. It offers a practical, fully automated alternative to intravascular imaging guidance for everyday PCI with comparable stent optimization outcomes.
Liu, W. T., et al. [35] | Open-label, cluster-randomized controlled trial at two hospitals in Taiwan; noncardiologists were randomized by cluster to either receive real-time AI-ECG alerts for undetected AF in at-risk patients (CHA2DS2-VASc ≥ 1 M/≥2 F) or usual care without alerts (NCT05127460). | In patients with AI-detected AF, NOAC prescription within 90 days was significantly higher in the intervention group (23.3% vs. 12.0%; HR 1.85, 95% CI 1.11–3.07). New AF diagnosis rate was also higher (HR 1.40, 95% CI 1.03–1.90). No differences in echocardiogram ordering, cardiology referrals, ischemic stroke, CV death, or all-cause death. | Simple AI-ECG alerts substantially increased AF detection and guideline-directed NOAC prescribing by noncardiologists, narrowing the care gap with cardiologists. This low-cost, scalable intervention can improve stroke prevention in undiagnosed AF without increasing downstream testing or hard clinical events.
Tsai, D. J., et al. [36] | Pragmatic randomized controlled trial at a single academic center in Taiwan; 13,631 inpatients under non-cardiologist care were 1:1 randomized to AI-ECG interpretation (low-EF probability displayed) versus standard ECG care without AI results. | New low EF (≤50%) diagnoses within 30 days were significantly higher in the intervention group (1.5% vs. 1.1%; HR 1.50, 95% CI 1.11–2.03). Effect was stronger in AI-flagged high-risk patients (13.0% vs. 8.9%; HR 1.55). Positive predictive value of echocardiograms for low EF rose from 20.2% to 34.2% (p < 0.001) with no increase in overall echo utilization; cardiology consultations increased in high-risk patients. | A simple AI-ECG tool significantly improved early detection of low ejection fraction in routine inpatient care without raising resource use. It enhanced diagnostic yield of downstream testing and facilitated more appropriate specialist referral, demonstrating an efficient way to close the gap in heart failure diagnosis by non-cardiologists.
Kolossváry, M., et al. [37] | Post hoc analysis of the SCOT-HEART trial; coronary CT angiography from 1750 patients was segmented and analyzed for both conventional attenuation-based plaque burden and advanced radiomic features (eigen radiomic descriptors of plaque morphology). Univariable and multivariable Cox models plus Harrell’s C-statistic and time-dependent AUC with cross-validation assessed incremental prognostic value for fatal/nonfatal myocardial infarction over median 8.6 years. | 82 myocardial infarctions occurred. Eight radiomic features remained independently associated with MI after adjustment for cardiovascular risk score and plaque burden. Adding plaque burden to a clinical model did not improve discrimination (C-statistic 0.70 → 0.70), but further adding radiomic features increased performance to C-statistic 0.74, with significantly higher cumulative/dynamic AUC after year 5. | Radiomics-based detailed plaque morphology characterization substantially improves long-term MI risk prediction beyond clinical factors, calcium score, stenosis, and simple plaque burden. This precision-phenotyping approach from routine coronary CTA identifies higher-risk plaques and could enable better risk stratification and targeted prevention.
Trivedi, R., et al. [38] | Qualitative semistructured interviews with purposive sampling of 30 patients with atrial fibrillation who completed a 6-month fully automated voice-based conversational AI intervention (weekly AI phone calls with speech recognition and natural language processing) as part of the CHAT-AF trial; thematic analysis of transcribed interviews. | Four main themes emerged: (1) AI interactions felt human-like yet limited by scripted responses and trusted because hospital-delivered; (2) engagement depended on personalization, novel content, manageable information volume, and multichannel flexibility; (3) AI improved perceived access to continuous AF care and information; (4) patients felt empowered in self-management through reminders and reassurance from linked rhythm-monitoring devices. | Patients with AF found conversational AI an acceptable and engaging tool for education and self-management support, particularly when personalized and hospital affiliated. Findings highlight the value of voice-based AI in bridging care gaps while identifying key areas (natural dialog flow, tailored content, and information dosing) for future improvement.
Mekonnen, D., et al. [39] | Post hoc imaging sub-analysis of the RESUS-AMI trial; AI-assisted echocardiographic software (CAAS Qardia 2.0) was used to measure GLS (fully automated and semi-automated), LVEF, and volumes in 169 patients after primary PCI for STEMI. Results were correlated with CMR-derived infarct size, LVEF, and volumes (n = 81); intra- and inter-observer reproducibility of AI-derived parameters was assessed using ICC, bias, and limits of agreement. | AI-derived GLS showed moderate-to-good correlation with CMR infarct size (r = 0.58 automated, r = 0.64 semi-automated; both p < 0.001) and CMR LVEF (r = −0.63 and −0.65). Correlation with echo-derived LVEF was r = −0.51 (automated) and r = −0.67 (semi-automated). Inter- and intra-observer reproducibility of GLS was excellent (ICC 0.93–0.94). | AI-assisted GLS provides a reproducible and reliable marker of infarct size and LV systolic function after STEMI, with good correlation to gold-standard CMR. It enables fast, operator-independent strain assessment in routine post-PCI echocardiography, potentially improving risk stratification and follow-up.
Williams, M. C., et al. [40] | Retrospective analysis of the SCOT-HEART trial (n = 1769); two separate XGBoost machine learning models with 10-fold cross-validation and grid-search hyperparameter tuning were trained on clinical variables (symptoms, demographics, risk factors, ECG, exercise tolerance testing) to predict (1) any coronary artery disease and (2) increased low-attenuation plaque (LAP) burden on CCTA. | ML model predicted any CAD significantly better than the 10-year CV risk score alone (AUC 0.80 vs. 0.75, p = 0.004); key features: CV risk score, age, sex, total cholesterol, abnormal ETT. The model predicting high LAP burden showed no significant improvement over the CV risk score (AUC 0.75 vs. 0.72, p = 0.08). | Machine learning using readily available clinical data meaningfully improves pre-test prediction of obstructive CAD on CCTA, potentially optimizing selection for imaging. However, clinical variables alone are insufficient to reliably predict high-risk (low-attenuation) plaque burden, suggesting imaging-derived features remain essential for identifying vulnerable plaques.
Saklica, D., et al. [41] | Randomized controlled trial with 52 CAD patients allocated to three groups: telerehabilitation (TRG, n = 18), mobile app-based rehabilitation (MAG, n = 13), or control (physical activity advice only, CG, n = 21). All intervention groups followed a 12-week supervised calisthenic/resistance program (3×/week). Outcomes: exercise capacity (Incremental Shuttle Walk Test), QoL (SF-36), adherence, and patient feedback analyzed with fine-tuned BERT NLP model plus anomaly detection. | Both TRG (+87.2 m) and MAG (+89.4 m) significantly improved ISWT distance vs. CG (+10.9 m; p = 0.001). Adherence was markedly higher in TRG (100%) and MAG (80%) than CG (30%; p < 0.001). NLP-analyzed patient satisfaction strongly correlated with ISWT gains (r = 0.75, p < 0.001); AI anomaly detection identified adherence–outcome mismatches. | Technology-supported cardiac rehabilitation (tele- or app-based) substantially outperforms usual-care advice in improving exercise capacity and adherence. AI tools (NLP for sentiment analysis and anomaly detection) provide objective, scalable enhancement of outcome evaluation and patient engagement monitoring in cardiac rehabilitation.
Trivedi, R., et al. [42] | Single-blinded, 4:1 randomized controlled feasibility trial; 103 post-discharge AF patients allocated to 6 months of fully automated conversational AI phone calls (speech recognition + NLP) with self-management support, symptom monitoring, triggered clinical alerts, supplementary SMS/email surveys, nudges, and an educational website versus usual care. | Trial stopped early (103/385 planned). No significant between-group difference in AFEQT QoL score at 6 months (adjusted mean difference 2.08, 95% CI −7.79 to 11.96; p = 0.46). Within the intervention group, AFEQT improved significantly from baseline (69.9 to 79.9; p = 0.01). Engagement was moderate (average 4/7 outreaches completed); 88.4% of completed contacts rated useful. | Conversational AI delivered via phone calls is feasible, acceptable, and engaging for post-discharge AF support, with high perceived usefulness and preliminary evidence of within-group QoL improvement. Despite early termination preventing definitive efficacy assessment, it establishes proof-of-concept for scalable, fully automated patient support in chronic AF management.
Fiolet, A. T. L., et al. [43] | Cross-sectional subanalysis of the LoDoCo2 trial; 151 patients with chronic coronary disease on stable therapy underwent coronary CTA after median 28.2 months of blinded low-dose colchicine (0.5 mg/day) or placebo. AI-enabled software quantified pericoronary adipose tissue (PCAT) attenuation and total/detailed plaque volumes (non-calcified, low-attenuation, calcified, dense-calcified) across the entire coronary tree. | No difference in pericoronary inflammation (PCAT attenuation: −79.5 HU colchicine vs. −78.7 HU placebo, p = 0.236). Colchicine group showed significantly higher calcified plaque volume (169.6 vs. 113.1 mm3, p = 0.041), calcified plaque burden (9.6% vs. 7.0%, p = 0.035), and dense calcified plaque volume (192.8 vs. 144.3 mm3, p = 0.048). Low-attenuation plaque burden was lower with colchicine only in patients on low-intensity statins (p-interaction = 0.037). | Low-dose colchicine does not reduce pericoronary adipose tissue inflammation but promotes coronary plaque calcification and increases dense calcified plaque—features associated with greater plaque stability. This provides a mechanistic explanation for the observed cardiovascular event reduction in LoDoCo2 and supports plaque stabilization as a key anti-atherosclerotic effect of colchicine.
Li, G., et al. [44] | Post hoc blinded analysis of the prospective CAREER trial (NCT04665817); fully automatic AI-based CCTA reconstruction and CT-μFR computation (Murray-law based quantitative flow ratio) performed in 242 patients (657 vessels) who had invasive coronary angiography with FFR or μFR within 30 days. Reference standard: invasive FFR ≤ 0.80 or μFR ≤ 0.80. | Fully automatic CT-μFR was successful in all cases with mean analysis time 1.60 ± 0.34 min/patient. CT-μFR showed good correlation (r = 0.62) and agreement (bias −0.01 ± 0.10) with invasive physiology. Patient-level diagnostic accuracy was 83.0% (sensitivity 84.2%, specificity 81.9%, PPV 82.1%, NPV 84.0%). | Fully automatic, AI-powered CT-μFR provides rapid (~1.6 min), operator-independent functional assessment directly from routine CCTA with high diagnostic performance comparable to invasive FFR. It enables accurate pre-catheterization identification of hemodynamically significant stenoses, potentially reducing unnecessary invasive procedures.
Yu, X., et al. [45] | Single-center, open-label, randomized controlled trial (n = 2086 post-PCI CHD patients in China); 1:1 randomization to a comprehensive web-based telemedicine platform (personalized education, medication reminders, vital sign monitoring, AI-assisted consultations) plus usual care versus usual care alone (phone follow-up at 1, 3, 6, 12 months). Primary endpoint: 1-year MACCE (cardiac death, MI, stroke, target vessel revascularization). | Telemedicine significantly reduced 1-year MACCE (3.5% vs. 5.3%, p = 0.04), driven by lower cardiac death (1.0% vs. 2.3%, p = 0.02) and MI (0.8% vs. 1.8%, p = 0.03). Serious bleeding (BARC 3–5) was lower (0.6% vs. 1.6%, p = 0.03). Intervention group achieved better BP control, higher adherence to aspirin and ACEI/ARB/ARNI, and reduced alcohol consumption; smoking showed a favorable trend. | A multicomponent AI-supported telemedicine program significantly lowered hard clinical events (MACCE) and improved secondary prevention metrics at 1 year after PCI. This demonstrates that scalable, technology-driven remote management can close implementation gaps and meaningfully improve long-term outcomes in high-risk CHD patients.
Ishiguchi, H., et al. [46] | Post hoc analysis of the WARCEF trial; 2213 HFrEF patients without atrial fibrillation were included. Nine machine learning models (including SVM, XGBoost, LightGBM) were trained on 12 selected clinical/demographic variables to predict incident ischaemic stroke during mean 3.3-year follow-up (74 events). Model performance was compared to CHA2DS2-VASc using AUC and decision curve analysis; feature importance assessed via SHAP values. | ML models strongly outperformed CHA2DS2-VASc (AUC 0.643). Best-performing models were SVM (AUC 0.874, 95% CI 0.769–0.959) and XGBoost (AUC 0.873, 95% CI 0.783–0.953), with SVM and LightGBM showing consistent net clinical benefit. Top predictive features across models: creatinine clearance, blood urea nitrogen, and warfarin use. | Machine learning substantially improves risk stratification for ischaemic stroke in HFrEF patients in sinus rhythm compared to traditional scores. Renal function markers (CrCl, BUN) and anticoagulation status emerged as dominant predictors, highlighting new targets for stroke prevention beyond AF in this high-risk population.
Recent 2025 studies demonstrate progress in the application of artificial intelligence in cardiology, while also highlighting several important limitations and areas requiring further evaluation. In atrial fibrillation ablation, AI improves efficacy [33], but at the cost of procedure times roughly twice as long. AI-ECG increases the detection of AF and low ejection fraction [35,36], but this has not yet translated into a reduction in strokes or deaths; early detection does not always mean real prevention. Voice-based AI for patients with AF is promising [38,42], but the studies are small, often uncontrolled, and sometimes terminated early, so the evidence remains weak. In coronary artery disease, AI-QCA produces results comparable to OCT [34], but the higher rate of stent malapposition raises concerns about long-term safety. Radiomics with CTA improves myocardial infarction prediction [37], but the C-statistic gain is modest (0.70 → 0.74) and may be insufficient in clinical practice. Automated CT-μFR is fast and accurate [44], but requires validation in large, diverse populations. ML models predict stroke in HFrEF better than CHA2DS2-VASc [46], but the lack of randomization and prospective validation limits their readiness for clinical use. Tele- and app-based rehabilitation and telemedicine improve adherence and exercise capacity [41,45], but the studies are small or conducted in specific settings, so real-world scalability remains uncertain. Overall, AI offers valuable tools, but most evidence comes from studies of limited scale, short follow-up, or without hard endpoints (death, stroke, myocardial infarction); statistical differences are often demonstrated without clinically significant improvements in prognosis. Large, multicenter, randomized trials with long-term follow-up and hard endpoints are needed before routine implementation. For now, AI is a valuable addition to cardiology, but far from a true revolution.
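For context on the rule-based comparator that the ML stroke models in [46] were benchmarked against, the following sketch implements the standard CHA2DS2-VASc point assignment. The function name and signature are our own illustrative choices, not taken from any cited study.

```python
# Standard CHA2DS2-VASc point assignment (range 0-9).
# Illustrative sketch; field names are our own.

def cha2ds2_vasc(age, female, chf, hypertension, diabetes,
                 prior_stroke_tia, vascular_disease):
    score = 0
    score += 2 if age >= 75 else (1 if 65 <= age < 75 else 0)  # A2 / A
    score += 1 if female else 0              # Sc (sex category)
    score += 1 if chf else 0                 # C (congestive heart failure)
    score += 1 if hypertension else 0        # H
    score += 1 if diabetes else 0            # D
    score += 2 if prior_stroke_tia else 0    # S2 (prior stroke/TIA)
    score += 1 if vascular_disease else 0    # V
    return score

# Example: a 70-year-old woman with hypertension scores 3 (age 1 + sex 1 + HTN 1).
```

The ML models in [46] effectively replace this fixed point scheme with weights learned from 12 clinical variables, which is where the reported discrimination gain over the traditional score comes from.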

3.2. Pulmonology

Building on its success in cardiovascular imaging, AI is now making significant inroads in respiratory medicine (Table 2). It is contributing to advancements in the diagnosis and management of conditions such as lung cancer, interstitial lung disease (ILD), and obstructive lung diseases [47]. Primarily through analysis of chest computed tomography, deep learning algorithms, particularly convolutional neural networks (CNNs), enable automatic detection of pulmonary abnormalities, malignancy risk assessment, and longitudinal monitoring [48]. These models often match or approach expert-level accuracy in lung cancer classification, facilitating faster diagnosis, improved prognosis, personalized treatment planning, and even simulation of treatment response using high-resolution reconstructions [49]. Despite these promising results, ongoing challenges include the need for robust clinical validation, seamless system integration, and resolution of ethical concerns.
In recent years, AI has begun to play an increasingly important role in various stages of the development and implementation of respiratory disease therapies—from the discovery of new molecular targets, through drug design and dose optimization, to diagnostics and treatment monitoring. The first groundbreaking example is the path taken by the TNIK kinase inhibitor INS018_055 (later named rentosertib) from AI-assisted target identification to obtaining clinical evidence in Phase 2. Ren and colleagues [50] were the first to demonstrate that generative AI can complete the entire pathway in just 18 months: from selecting a high-potency target (TNIK) in fibrotic diseases, through designing a new chemical molecule, to nominating a clinical candidate. Importantly, INS018_055 demonstrated potent pan-organ activity (lungs, liver, skin, kidneys) in preclinical models and additionally possessed anti-inflammatory properties. Phase I studies in healthy volunteers confirmed an excellent safety profile and pharmacokinetics after both oral and inhaled/topical administration. The same molecule was subsequently evaluated in a randomized, placebo-controlled phase 2a study in patients with idiopathic pulmonary fibrosis (IPF) [51]. Despite the short treatment period (12 weeks) and small number of patients (n = 71), a dose-dependent clinical signal was observed: at the highest dose (60 mg once daily), there was an increase in FVC of +98.4 mL compared to a decrease of −20.3 mL in the placebo group. Crucially, the safety profile remained very favorable. This is the world’s first fully AI-generated novel small molecule (new target + new chemical entity) to achieve clinical proof in phase 2 in a disease with a high unmet need and poor prognosis. It demonstrates that generative AI can dramatically shorten the time from concept to clinical evidence (traditionally 10–15 years → here ~2–3 years to Phase 2). This paves the way for faster treatment of rare and challenging fibrotic diseases. 
Another area where AI is beginning to provide tangible clinical benefits is hemodynamic management during surgery. Habicher et al. [52] conducted a randomized trial in patients undergoing lung surgery with single-lung ventilation—a very challenging hemodynamic situation. The group using the HPI (Hypotension Prediction Index) algorithm had significantly fewer episodes of hypotension, shorter hypotension duration, and less exposure to low mean arterial pressure (MAP < 65 mmHg). Although the incidence of acute kidney injury (AKI) did not differ significantly, a clear trend towards fewer episodes of myocardial injury after noncardiac surgery (MINS) and postoperative infections was observed. This demonstrates that AI can be proactive, predicting hypotension minutes in advance and enabling staff to intervene quickly. Even if not all complications were significantly reduced, the trends suggest real potential for improved outcomes in high-risk patients. Hong et al. [53] went further, examining whether AI can improve quality of life and clinical outcomes in outpatient COPD patients. In two independent randomized trials (12-month and 9-month), an AI-based intervention (telemedicine, education, exercise, reminders) resulted in significant improvements in quality of life and emotional well-being and a reduction in the number of exacerbations requiring hospitalization after 9–12 months. These effects only became apparent after a longer period, which is typical of behavioral and educational interventions. This demonstrates that AI can effectively support long-term adherence and secondary prevention in chronic respiratory diseases. Ladbury et al. [54] demonstrated that ML with SHAP explainability can determine safe dose thresholds in NSCLC radiotherapy, achieving better AUC than classical logistic regression and yielding specific, clinically useful toxicity thresholds. Pasipanodya et al.
[55] also demonstrated ML for detecting complex pharmacokinetic interactions in tuberculosis treatment—explaining why abbreviated regimens with gatifloxacin do not always work and indicating how to improve them. Ding et al. [56] and Bosman et al. [57] used AI combined with biomarkers, or AI alone, to improve lung nodule diagnosis and tuberculosis screening with very high accuracy and the potential to reduce unnecessary invasive procedures. Together, these studies show a true paradigm shift in pulmonology from “AI as a curiosity” to “AI as an integral part of modern respiratory care”.
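Many of the studies above are summarized by threshold metrics (sensitivity, specificity) and the threshold-free AUC. As a point of reference only, the following is a minimal, self-contained sketch of how these quantities are computed for a binary diagnostic model; the data are toy values, not drawn from any cited trial.

```python
def sensitivity_specificity(y_true, y_pred):
    """Fraction of true positives / true negatives correctly flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a random diseased case scores
    higher than a random healthy case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: 1 = disease present; scores = model probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec, auc(y_true, scores))
```

Note that sensitivity and specificity depend on the chosen decision threshold (0.5 here), while AUC summarizes ranking quality across all thresholds—which is why trials often report both.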

3.3. Neurology

The application of AI extends naturally to neurology, where the complexity of brain imaging and signal processing creates opportunities for advanced computational tools. Amid rapid progress in neuroscience and computer science, AI is emerging as a key asset in managing neurological disorders (Figure 3) [58]. This growing collaboration between AI developers and clinicians addresses the demand for precise, objective tools capable of detecting even subtle changes, such as those seen in Alzheimer’s disease or epilepsy [59,60].
Primary applications include analysis of MRI scans for anomaly detection, tracking progression of neurodegenerative diseases, decoding EEG signals for brain–computer interfaces (BCI), and supporting cognitive stimulation (Table 3) [61,62].
Recent years have shown that artificial intelligence in neurology and neurosurgery is moving from experimental applications to real, clinically significant benefits, although still with clear limitations and the need for further refinement. A groundbreaking example is the first documentation of individual, stable electrophysiological biomarkers following deep-brain stimulation (DBS) in patients with treatment-resistant depression (TRD). Alagapan and colleagues [64] demonstrated that 90% of patients with DBS in the subcallosal cingulate (SCC) achieved a clinical response after 24 weeks, and 70% achieved remission. Explainable AI enabled the extraction of patient-specific LFP biomarkers from chronic recordings, which precisely tracked clinical status, distinguished therapeutic from transient effects, and responded to programming changes. Furthermore, the trajectory of improvement was strongly associated with preoperative white matter network integrity, and objective mood changes were reflected in automated analysis of facial expressions. This is the first evidence that DBS in TRD can be performed in a biomarker-guided manner, with minimal reliance on subjective patient assessment—paving the way for personalized, precise neuromodulation in severe psychiatric disorders. Concurrently, AI is beginning to support DBS programming in Parkinson’s disease. Boutet et al. [65] demonstrated that optimal stimulation induces a characteristic pattern of motor network activation visible in fMRI, and a machine learning model trained on these patterns achieved 88% accuracy in distinguishing optimal from suboptimal settings—and, most importantly, generalized to treatment-naive patients. This is proof of concept that functional imaging can become an objective tool for rapid and precise selection of stimulation parameters, potentially reducing the time and number of visits needed to optimize DBS. In neurological diagnostics, AI also achieves results close to those of experts. Cowan et al.
[66] demonstrated that a fully automated, web-based diagnostic engine (CDE) based on the ICHD-3 criteria achieved excellent agreement (κ = 0.83) with a telephone interview conducted by specialists in diagnosing migraine, with very high sensitivity (90%) and specificity (96%). This tool can significantly shorten diagnostic time and reduce specialist workload—especially in countries with limited access to neurologists. At the same time, research shows that AI’s success is not automatic. Gorenshtein et al. [67], in a randomized trial comparing physician-assisted electromyography (EDX) interpretation with AI-assisted interpretation, demonstrated that the addition of AI did not improve the quality of reports compared to experts, and physicians rated the tool as inconvenient and of limited use (low ratings for effectiveness and workflow integration). This is an important signal that even advanced AI systems must be designed with real clinical needs and physician ergonomics in mind. In neurosurgical education, Davidovic et al. [68] demonstrated the clear advantage of a hybrid approach: in simulated brain tumor resection training, students who received personalized instructions from a human tutor supported by real-time AI data (errors detected by the algorithm) achieved significantly better results than with an AI tutor alone or traditional training. This indicates that AI is best suited as a precise support for human learning, not as a complete replacement. In clinical trial recruitment, Hassan et al. [69] demonstrated that automated, real-time AI-assisted head CT scan analysis (Viz RECRUIT SDH) increased the rate of patient enrollment in a randomized trial of subdural hematomas by 36%, eliminated enrollment errors, and achieved a PPV of >80%. This demonstrates how AI can accelerate clinical trials in neurosurgery and neurology. 
AI in neurology and neurosurgery already provides personalized biomarkers, supports the precise programming of implantable devices, accelerates diagnostics and research recruitment, and in education, it works best in hybrid form with humans. At the same time, it is becoming clear that not every AI application automatically brings benefits—success depends on explainability, adaptability to workflows, and collaboration with clinicians. We are on the threshold of an era in which AI is becoming an integral element of precise, personalized neurology and neurosurgery.
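The agreement statistic quoted for the migraine diagnostic engine (κ = 0.83) is Cohen’s kappa, which corrects raw rater agreement for the agreement expected by chance. A minimal sketch with toy labels (not the study data) illustrates the calculation.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of cases where both raters concur.
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Expected agreement if the raters labelled cases independently.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Toy example: 1 = migraine, 0 = no migraine, for 10 patients.
ai_engine = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
specialist = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(cohens_kappa(ai_engine, specialist))
```

Here the raters agree on 8 of 10 cases (80%), but because 50% agreement is expected by chance alone, kappa is 0.6; values above 0.8, as in the cited study, indicate near-perfect agreement.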

3.4. Hepatology

Similarly, in hepatology, AI is proving valuable for handling large datasets and complex imaging interpretations (Table 4). Using machine learning, convolutional neural networks, and deep learning, AI effectively analyzes clinical data, imaging, and histopathological studies, frequently matching or surpassing traditional techniques [71]. These capabilities translate into potential enhancements in diagnosis, prognosis, treatment optimization (Figure 4), and automated detection/classification of lesions on ultrasound, CT, or MRI [72].
In liver transplantation, AI aids donor–recipient matching and risk assessment for rejection, potentially improving outcomes by enabling early complication detection from electronic records [73].
Table 4. Summary of the latest reports in the field of hepatology.
Authors | Methodology | Results | Significance
Ratziu, V., et al. [74]. Methodology: Post hoc analysis of digitized liver biopsies from 251 patients with biopsy-confirmed NASH (F1–F3) enrolled in a 72-week phase 2 RCT of once-daily semaglutide (NCT02970942). Paired baseline and week-72 biopsies were scored by two expert pathologists using conventional NASH CRN criteria and independently analyzed by PathAI’s machine-learning (ML) NASH model. Both categorical (ordinal) scores and continuous quantitative feature scores for fibrosis, steatosis, inflammation, and ballooning were generated. Treatment effects on the two co-primary endpoints (NASH resolution without worsening of fibrosis; fibrosis improvement ≥ 1 stage without worsening of NASH) were compared between pathologist and ML assessments. Results: Both pathologist and ML categorical scoring detected significantly higher rates of NASH resolution with semaglutide 0.4 mg versus placebo (pathologist: 58.5% vs. 22.0%, p < 0.0001; ML: 36.9% vs. 11.9%, p = 0.0015). Fibrosis improvement trended higher but was not significant with either method. ML-derived continuous scores revealed significant semaglutide-induced reductions in fibrosis (p = 0.0099) and other features that were not detectable by conventional categorical pathologist or ML ordinal assessments. Significance: ML-based digital pathology reproduces expert pathologist categorical assessments of treatment response in NASH but provides superior sensitivity via continuous quantitative scoring, uncovering an antifibrotic effect of semaglutide missed by traditional histopathology. This demonstrates the value of AI-powered continuous metrics as more responsive endpoints in NASH clinical trials.
Tiyarattanachai, T., et al. [75]. Methodology: Single-center prospective randomized controlled trial (TCTR20201230003); 504 patients (260 with FLLs for non-experts, 244 for experts) underwent real-time ultrasound twice: with and without AI assistance from a CNN-based system. Detection rates and false positives were compared using McNemar’s test; non-experts were trainees; experts were board-certified radiologists. Results: For non-experts, AI assistance increased FLL detection rate (36.9% vs. 21.4%, p < 0.001) without significantly raising false positives (14.2% vs. 9.2%, p = 0.08). For experts, AI did not significantly improve detection (66.7% vs. 63.3%, p = 0.32) or alter false positives (8.6% vs. 9.0%, p = 0.85). Significance: AI assistance significantly boosts focal liver lesion detection during ultrasound for non-expert operators, potentially enabling high-quality HCC surveillance in resource-limited settings with limited expert availability. No added benefit for experts suggests AI’s primary value in democratizing access to reliable imaging.
Zhang, Y., et al. [76]. Methodology: Prospective randomized diagnostic trial: 100 patients with suspected small hepatocellular carcinoma (≤3 cm) were 1:1:1:1 randomized to four ultrasound modalities: color Doppler alone, contrast-enhanced ultrasound (CEUS) alone, elastography alone, or multimodal (all three combined). All images were processed and segmented by a Mask R-CNN deep-learning algorithm. Diagnostic performance was compared against pathological biopsy as gold standard. Additionally, EZH2 and p57 expression were measured by immunohistochemistry in tumor tissue, peritumoral tissue, and normal liver. Results: Mask R-CNN achieved the highest segmentation accuracy (97.23%) and average precision (71.90%). The multimodal ultrasound group outperformed single-modality groups with sensitivity 88.87%, specificity 90.91%, accuracy 89.47%, and Cohen’s κ 0.68 (all p < 0.05). Multimodal imaging features of malignancy: irregular shape, unclear borders, uneven internal echo, grade 1–2 blood flow, elasticity score 4–5, and fast-in fast-out contrast pattern. EZH2 was overexpressed (75.95% positive) and p57 underexpressed (80.79% negative) in tumor tissue; p57 negativity was significantly higher in poorly differentiated HCC. Significance: Mask R-CNN-enhanced multimodal ultrasound (Doppler + CEUS + elastography) provides excellent diagnostic performance for small HCC, significantly superior to any single modality. High EZH2 and low p57 expression are strongly associated with oncogenesis and poor differentiation in small HCC, supporting their potential as diagnostic and prognostic biomarkers. This AI-augmented multimodal approach offers a highly accurate, non-invasive tool for early HCC detection.
Briceño, J., et al. [77]. Methodology: Multicenter retrospective analysis of 1003 liver transplants from 11 Spanish centers. Sixty-four donor and recipient variables were used to develop two complementary artificial neural network (ANN) models via Neural Net Evolutionary Programming (NNEP): a positive-survival model (NN-CCR) and a negative-loss model (NN-MS) to predict 3-month graft survival/loss for each donor–recipient pair. Performance was compared against six established prognostic scores (MELD, D-MELD, DRI, P-SOFT, SOFT, BAR) using ROC curves and AUROC. Results: ANN models significantly outperformed all conventional scores. NN-CCR predicted 3-month graft survival with 90.79% accuracy (AUROC 0.80) and NN-MS predicted graft loss with 71.42% accuracy (AUROC 0.82). AUROCs of traditional scores ranged from 0.41 to 0.67 (all p < 0.001 vs. ANN). ANN also outperformed multiple regression models. Significance: Artificial neural networks provide substantially superior, individualized prediction of 3-month graft survival compared to all currently validated prognostic scores. This ANN-based approach offers a more objective, accurate, and equitable tool for donor–recipient matching and organ allocation, with potential to optimize justice, utility, and transplant outcomes in clinical practice.
In recent years, artificial intelligence has been revolutionizing hepatology and liver transplantation, both in diagnosis and in assessing treatment response and optimizing organ allocation. Among the most groundbreaking is a post hoc analysis of a phase 2 clinical trial of semaglutide in NASH [74]. In 251 patients with liver biopsies before and after 72 weeks of treatment, PathAI’s machine-learning NASH model analyzed scanned slides in parallel with two expert pathologists. Traditional categorical ratings (CRN) confirmed a higher rate of NASH resolution without worsening fibrosis in the semaglutide 0.4 mg group (58.5% vs. 22.0% by pathologists; 36.9% vs. 11.9% by ML). However, only continuous, quantitative metrics generated by AI revealed a significant reduction in fibrosis (p = 0.0099) and improvement in other histological features (steatosis, inflammation, ballooning)—an antifibrotic effect of semaglutide that had completely eluded traditional categorical methods. This demonstrates that AI-assisted digital pathology not only replicates expert assessment but is significantly more sensitive and can detect subtle changes that determine drug success in NASH clinical trials. As a result, AI may accelerate drug development and change the way histological endpoints are defined in hepatology. Simultaneously, AI is beginning to significantly improve liver imaging diagnostics. In a randomized trial involving 504 patients, Tiyarattanachai et al. [75] demonstrated that assisting non-experts (residents) with a real-time CNN system during ultrasound examination increased the detection rate of focal liver lesions (FLL) from 21.4% to 36.9% without a significant increase in false positives. In experts (certified radiologists), AI did not provide additional benefit, meaning that its greatest value lies in the hands of less experienced operators. This paves the way for improved HCC screening in countries with limited access to specialists. Zhang et al.
[76] went further and demonstrated that multimodal ultrasound (Doppler + CEUS + elastography) aided by Mask R-CNN segmentation achieved excellent diagnostic accuracy for small HCCs (≤3 cm): sensitivity 88.9%, specificity 90.9%, and accuracy 89.5%. This result was significantly better than with any single modality alone. Furthermore, EZH2 overexpression and p57 deficiency in tumor tissue were confirmed to be strongly associated with carcinogenesis and poorer prognosis—a finding that may support both diagnostics and risk stratification in the future. In transplantology, Briceño et al. [77] developed two complementary neural network models (NN-CCR and NN-MS) that, based on 64 donor and recipient variables, predict graft survival or loss within 3 months with an accuracy of 90.8% (AUROC 0.80) and 71.4% (AUROC 0.82), respectively. Both models significantly outperformed all previous classical scales (MELD, DRI, SOFT, BAR—AUROC 0.41–0.67). This is the first evidence that AI can offer individual, precise prediction of transplant prognosis, potentially enabling more equitable and efficient organ allocation. AI in hepatology is thus moving from supporting diagnostics to providing new, more sensitive endpoints in clinical trials, improving the detection of hepatocellular carcinoma by less experienced physicians, and finally to precise, individual donor–recipient matching in transplantology. We are witnessing a paradigm shift in which AI not only supports but is beginning to redefine standards for diagnosis, treatment, and organ allocation in liver diseases.

3.5. Pancreatic Disease

AI applications continue in the challenging domain of pancreatic diseases (Table 5), where early detection is critical. Across conditions ranging from pancreatic cancer to polyps and acute pancreatitis, AI primarily leverages imaging modalities such as endoscopic ultrasound (EUS), CT, and MRI to enable rapid identification of pathological changes [78,79]. Notably, 3D-CNN models detect pancreatic ductal adenocarcinoma (PDAC) and preinvasive lesions with sensitivity exceeding 90%, even on pre-diagnostic scans [80,81,82]. Deep learning achieves Dice coefficients above 0.92 for automatic segmentation of solid lesions and cysts. In assessing malignancy risk for intraductal papillary mucinous neoplasms (IPMN), radiomics-based models yield AUC values of 0.88–0.95 [83,84]. Furthermore, tools like the EASY-APP model predict multiple organ failure risk within the first 24 h of hospitalization using clinical and biochemical data, with an accuracy of 85–92% [85].
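The Dice coefficient cited above for lesion segmentation measures the overlap between a predicted mask and the ground-truth mask: twice the intersection divided by the total size of both masks, with 1.0 a perfect match. A minimal sketch on toy pixel sets (purely illustrative, not any cited model's output) shows the computation.

```python
def dice(pred, truth):
    """Dice coefficient for two segmentation masks given as sets of
    (row, col) pixels labelled as lesion."""
    if not pred and not truth:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Toy masks: a 4x4 ground-truth lesion and a prediction shifted one column.
truth = {(r, c) for r in range(4) for c in range(4)}
pred = {(r, c) for r in range(4) for c in range(1, 5)}
print(dice(pred, truth))
```

With 12 of 16 pixels overlapping, the score is 0.75; the Dice coefficients above 0.92 reported for pancreatic lesion segmentation therefore correspond to near-complete overlap with expert annotations.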
Table 5. Summary of the latest reports in the field of pancreatic diseases.
Authors | Methodology | Results | Significance
Cui, H., et al. [86]. Methodology: Multicenter randomized crossover trial (4 centers in China, Jan–Jun 2023); 12 endoscopists (junior and senior) diagnosed 130 prospective patients with solid pancreatic lesions twice: conventionally and with assistance from a multimodal joint-AI model. The AI was trained/validated on EUS images + clinical data from 439 internal patients (2014–2022) and externally tested on 189 patients from 3 other institutions. Primary outcome: diagnostic accuracy with vs. without AI; secondary: AUC of the joint-AI model and human-AI interaction effects. Results: Joint-AI model achieved outstanding performance: internal AUC 0.996, external AUCs 0.955–0.976. In prospective crossover testing, AI assistance significantly improved the diagnostic accuracy of novice endoscopists (p < 0.001) with no decline in senior endoscopists. Explainability features reduced skepticism among experienced users, demonstrating positive human-AI collaboration. Significance: The multimodal joint-AI model (EUS images + clinical data) dramatically outperforms single-modality AI and significantly boosts real-world diagnostic accuracy, especially for less-experienced endoscopists. It establishes a new benchmark for AI-assisted EUS diagnosis of solid pancreatic lesions and shows that well-designed, explainable multimodal AI can be readily adopted across expertise levels in clinical practice.
Chen, P. T., et al. [87]. Methodology: Retrospective multicenter development and validation of an end-to-end deep learning (DL) system combining a segmentation CNN and a classifier ensemble of five CNNs. Training/validation used 546 pancreatic cancer CTs (2006–2018) and 733 normal pancreas CTs (2004–2019). Internal testing and comparison with original radiologist reports were performed, followed by nationwide external real-world validation on 1473 CT studies from institutions across Taiwan. Results: Internal test set: DL tool achieved 89.9% sensitivity (98/109) and 95.9% specificity (141/147), AUC 0.96, with no significant difference from original radiologist sensitivity (96.1%, p = 0.11). Nationwide real-world test set (n = 1473): sensitivity 89.7% (600/669), specificity 92.8% (746/804), AUC 0.95. For tumors < 2 cm, sensitivity was 74.7% (68/91). Significance: This fully automated DL tool reliably detects pancreatic cancer on contrast-enhanced CT, including a large proportion of sub-2 cm tumors that are frequently missed by radiologists in routine practice. It offers performance comparable to or exceeding human interpretation and has proven generalizability across Taiwan, establishing it as a robust second-reader or triage tool to reduce missed pancreatic cancers in clinical workflow.
Kovatchev, B., et al. [88]. Methodology: Pilot randomized crossover feasibility trial: 15 adults with T1D on commercial AID systems underwent two identical 20 h supervised hotel stays. The University of Virginia Model-Predictive Control (UMPC) algorithm was encoded into a neural network to create its Neural-Net Artificial Pancreas (NAP) approximation. Participants were randomly assigned to receive either NAP or the original UMPC algorithm during each session (crossover design with washout). Results: NAP and UMPC achieved nearly identical glycemic outcomes: TIR 86% vs. 87% (adjusted difference 1 percentage point), time < 70 mg/dL 2.0% vs. 1.8%, and CV 29.3% vs. 29.1%. Mean absolute difference in insulin delivery was only 0.031 U/h under identical inputs. NAP required sixfold lower computational resources than UMPC. No serious adverse events occurred with either controller. Significance: A neural-network-encoded version of a clinically validated model-predictive control AID algorithm (NAP) replicated the performance of the original UMPC algorithm in real-world conditions while dramatically reducing computational burden. This first-in-human demonstration opens regulatory and clinical pathways for modern machine-learning techniques to replace traditional control algorithms in artificial pancreas systems, enabling faster innovation and deployment of next-generation AID.
Atlas, E., et al. [89]. Methodology: Early feasibility clinical study of the MD-Logic Artificial Pancreas (MDLAP), a closed-loop system based on fuzzy logic theory that mimics diabetes caregiver reasoning using control-to-range and control-to-target strategies. Seven young adults with well-controlled T1D (age 19–30, A1C 6.6 ± 0.7%) underwent a total of 14 supervised sessions: 8 h sessions (overnight fasting + meal challenges) and 24 h full closed-loop sessions in a clinical research center. Results: Mean postprandial peak glucose was 224 ± 22 mg/dL, returning to <180 mg/dL within 2.6 ± 0.6 h and remaining stable in range for ≥1 h. During 24 h closed-loop control, 73% of sensor values were in 70–180 mg/dL, 27% > 180 mg/dL, and 0% < 70 mg/dL. No symptomatic hypoglycemia occurred in any trial. Significance: The MD-Logic Artificial Pancreas, one of the earliest fuzzy-logic-based fully closed-loop systems, demonstrated safe and effective automated glucose control in a controlled setting, with good postprandial handling and zero hypoglycemia despite meal challenges. This proof-of-concept study established the clinical feasibility of non-model-based, expert-knowledge-driven artificial pancreas technology and paved the way for subsequent real-world and long-term trials.
AI is already achieving results that not only support physicians but, in many cases, equal or exceed their performance, while simultaneously paving the way for new generations of medical devices. The most spectacular results are seen in endoscopic diagnostics of the pancreas. Cui et al. [86], in a randomized, crossover multicenter study involving 12 endoscopists and 130 patients with solid pancreatic lesions, demonstrated that a multimodal AI model (combining EUS images with clinical data) achieved an external AUC of 0.955–0.976. In real-world conditions, AI assistance significantly improved the diagnostic accuracy of junior endoscopists (p < 0.001) without compromising the performance of experienced specialists. Importantly, explainability elements reduced skepticism among seniors and enabled seamless human-AI collaboration. This demonstrates that a well-designed, multimodal, and explainable AI system can become the standard in pancreatic endoscopy, especially in centers with varying levels of experience. Equally impressive results are achieved in the diagnosis of pancreatic cancer using computed tomography. Chen et al. [87] developed a fully automated deep-learning system that, in a nationwide validation using 1473 CT examinations in Taiwan, achieved a sensitivity of 89.7% and a specificity of 92.8% (AUC 0.95)—comparable to original radiologist reports. Crucially, the tool detected 74.7% of tumors smaller than 2 cm, lesions that are often missed in daily practice. This result confirms that AI can be a reliable second reader or triage tool that significantly reduces the number of missed pancreatic cancers. In diabetes, AI is also revolutionizing automated glucose control systems. Kovatchev and colleagues [88], in a first-in-human study, demonstrated that a neural network approximating a clinically validated model-predictive control algorithm (UMPC), named NAP, achieved nearly identical glycemic control (TIR 86% vs. 87%) with six times less computational demand.
This breakthrough paves the way for replacing traditional control algorithms with artificial intelligence, which could significantly accelerate the development and implementation of future generations of artificial pancreas systems. The foundations for this path were previously demonstrated by Atlas et al. [89] in a feasibility study of the MD-Logic Artificial Pancreas system based on fuzzy logic. Under controlled, 24 h closed-loop conditions, a TIR of 73% was achieved without any episodes of symptomatic hypoglycemia, despite mealtime challenges. Although these results fall short of current systems, this study demonstrated that a model-free, expert-knowledge-based closed-loop approach is safe and effective, opening the door to later, more advanced solutions. AI is evolving from a supportive role to a key diagnostic and therapeutic tool: multimodal AI in EUS improves accuracy even for less experienced physicians, automated DL systems on CT save lives by detecting small pancreatic tumors, and neural approximations of traditional control algorithms enable the faster development of modern automated glucose control systems. We are witnessing a paradigm shift, with AI becoming an integral part of the precise diagnosis and treatment of pancreatic diseases and type 1 diabetes.
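The time-in-range (TIR) endpoint used in both closed-loop studies is simply the percentage of continuous glucose monitoring (CGM) samples falling within the 70–180 mg/dL target band. A minimal sketch on a synthetic trace (not trial data) illustrates the calculation.

```python
def time_in_range(glucose_mg_dl, low=70, high=180):
    """Percentage of CGM samples within the [low, high] mg/dL band."""
    in_range = sum(1 for g in glucose_mg_dl if low <= g <= high)
    return 100.0 * in_range / len(glucose_mg_dl)

# Synthetic 24 h trace sampled every 3 h (8 readings, mg/dL).
trace = [110, 95, 150, 210, 175, 130, 88, 65]
print(time_in_range(trace))
```

Here two of eight readings (210 and 65 mg/dL) fall outside the band, giving a TIR of 75%; the 86–87% figures reported for NAP and UMPC correspond to substantially tighter control over a full session.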

3.6. Other Applications of AI (Table 6)

Finally, AI demonstrates broad utility across additional medical domains. A growing body of evidence highlights its effectiveness in diagnostic imaging for cancers of the breast, colon, stomach, and melanoma, where it improves detection rates while reducing clinician workload [90,91]. Emerging roles in behavioral therapy and palliative care suggest potential outcome improvements, whereas results in medical training and patient self-care remain mixed. AI offers scalability but does not consistently outperform established standards [92,93,94,95]. Overall, AI appears safe and effective in addressing healthcare workforce shortages, though further real-world studies are essential to confirm long-term benefits.
Table 6. Other examples of AI applications in medicine.
Authors | Methodology | Results | Significance
Lång, K., et al. [96]. Methodology: Prospective, population-based, two-arm randomized controlled trial (MASAI, NCT04838756) at four Swedish screening sites. 80,033 women aged 40–80 were randomized 1:1 to AI-supported screening (Transpara v1.7.0 risk score 1–10 used to triage: scores 1–9 single reading, score 10 double reading; CAD marks shown for scores 8–10) versus standard double reading without AI. Participants and radiographers were masked; radiologists were not. Prespecified clinical safety analysis performed after 80,000 enrolments, focusing on cancer detection rate, recall rate, false-positive rate, PPV, and screen-reading workload. Results: AI-supported arm (n = 39,996): cancer detection 6.1/1000 (244 cancers), recall 2.2%, false-positive rate 1.5%, PPV 28.3%, with 44.3% reduction in screen readings (46,345 vs. 83,231). Standard double-reading arm (n = 40,024): cancer detection 5.1/1000 (203 cancers), recall 2.0%, false-positive rate 1.5%, PPV 24.8%. Cancer detection ratio 1.2 (95% CI 1.0–1.5, p = 0.052), exceeding the prespecified safety threshold (>3/1000 in AI arm). Proportion of invasive vs. in situ cancers was similar. Significance: In the first large-scale randomized trial of AI in mammography screening, AI-supported reading proved clinically safe with a comparable (slightly higher) cancer detection rate, similar recall/false-positive rates, and a 44% reduction in radiologist workload. This establishes AI as a viable and efficient alternative to standard double reading, with potential to address global mammography workforce shortages while maintaining or improving screening performance.
Dembrower, K., et al. [97]. Methodology: Prospective, population-based, paired-screening non-inferiority trial (ScreenTrustCAD, NCT04778670) at a single mammography unit in Stockholm. 55,581 consecutive screening women aged 40–74 years (Apr 2021–Jun 2022) had their mammograms independently read by two radiologists (standard double reading) and by AI (Transpara) + one radiologist, AI alone, and two radiologists + AI. Non-inferiority margin was a ≤15% relative reduction in cancer detection rate compared with standard double reading by two radiologists. Results: 269 screen-detected cancers (0.49%). One radiologist + AI: 261 cancers (relative proportion 1.04, 95% CI 1.00–1.09)—non-inferior and numerically 4% higher. AI alone: 246 cancers (relative proportion 0.98, 95% CI 0.93–1.04)—non-inferior. Two radiologists + AI: 269 cancers (relative proportion 1.08, 95% CI 1.04–1.11)—superior. False-positive rates and recall rates were comparable across strategies. Significance: In the first prospective trial of AI as an independent reader in population-based screening, replacing one of the two radiologists with AI maintained (and slightly improved) cancer detection while halving radiologist workload. AI alone was also non-inferior to standard double reading. These findings support safe, controlled implementation of AI to address radiologist shortages and increase screening capacity without compromising detection performance.
Sadeh-Sharvit, S., et al. [98]. Methodology: Single-site randomized controlled trial at a U.S. community mental health clinic; 47 adults with primary depression or anxiety disorders starting outpatient individual CBT were 1:1 randomized to therapy augmented by the Eleos Health AI platform or treatment-as-usual (TAU) for the first 2 months. The AI platform automatically transcribed sessions, measured fidelity to evidence-based practices, integrated patient-reported outcome measures, and auto-drafted progress notes. Results: Patients in the AI-augmented group attended 67% more sessions (mean 5.24 vs. 3.14). Depression (PHQ-9) scores decreased 34% in the AI group vs. 20% in TAU, and anxiety (GAD-7) scores decreased 29% vs. 8%, both with large effect sizes favoring AI-augmented therapy. Therapists using the AI platform submitted progress notes 55 h earlier on average (p < 0.001). Treatment satisfaction and perceived helpfulness were equivalent between groups. Significance: In the first randomized trial of an AI platform designed specifically for behavioral health, AI-augmented CBT significantly improved patient engagement, symptom reduction, and clinician documentation efficiency compared with standard care. These results establish clinical proof-of-concept that AI tools can meaningfully enhance the delivery and outcomes of routine outpatient psychotherapy in real-world community settings.
Repici, A., et al. [99] Prospective, randomized, controlled non-inferiority trial (AID-2); 10 non-expert endoscopists (<2000 lifetime colonoscopies) performed 660 screening/surveillance/diagnostic colonoscopies in patients aged 40–80 years, randomized 1:1 to high-definition colonoscopy with or without real-time AI CADe (GI Genius, Medtronic). Primary endpoint: adenoma detection rate (ADR). Post hoc pooled analysis combined these data with the previously published AID-1 trial (6 expert endoscopists, similar design). In non-experts (AID-2), CADe significantly increased ADR (53.3% vs. 44.5%; RR 1.22, 95% CI 1.04–1.40; p < 0.01 for non-inferiority, p = 0.02 for superiority), adenomas per colonoscopy, and detection of small/distal lesions, without increasing non-neoplastic resections. Pooled analysis of 1020 patients (AID-1 + AID-2) confirmed CADe as a strong independent predictor of higher ADR (RR 1.29, 95% CI 1.16–1.42), while endoscopist experience level was not significant (RR 1.02, 95% CI 0.89–1.16). Real-time AI CADe substantially improves ADR in less experienced colonoscopists to levels exceeding those of unaided experts, and pooled data show that the benefit of CADe is largely independent of physician experience. This supports universal implementation of CADe to standardize and elevate colonoscopy quality across all skill levels, especially during training periods.
Marcuzzi, A., et al. [100] Single-center, three-arm randomized clinical trial at a Danish multidisciplinary specialist outpatient clinic. 294 adults (≥18 years) with persistent neck and/or low back pain on the waiting list for specialist care were randomized 1:1:1 to:
SELFBACK app + usual care (AI-based, individually tailored weekly exercise, physical-activity, and education plans)
e-Help + usual care (non-tailored generic web-based self-management information)
Usual care alone
Primary outcome: change in Musculoskeletal Health Questionnaire (MSK-HQ) score at 3 months. Follow-up: 6 weeks, 3 months, 6 months.
At 3 months (82.7% complete data):
Adjusted mean difference SELFBACK vs. usual care: +0.62 MSK-HQ points (95% CI −1.66 to 2.90; p = 0.60)
SELFBACK vs. e-Help: +1.08 points (95% CI −1.24 to 3.41; p = 0.36)
No significant differences were found in the primary outcome or any secondary outcomes (pain-related disability, pain intensity, catastrophizing, fear-avoidance, quality of life) at any time point.
In patients already referred to specialist care, adding an AI-based, individually tailored self-management app (SELFBACK) to usual care did not improve musculoskeletal health, pain, or function more than usual care alone or simple nontailored web-based information. This negative trial suggests that highly personalized digital self-management support may offer little additional benefit in a secondary/tertiary care setting where patients already expect specialist intervention, highlighting the importance of context when deploying digital health tools.
Wallace, M. B., et al. [101] Multicenter (Italy, UK, US), randomized (1:1), tandem colonoscopy trial in 230 screening/surveillance patients. Same-day back-to-back procedures were performed with or without real-time AI assistance (deep-learning CADe), with order randomized (AI-first vs. colonoscopy-first). Primary endpoint: adenoma miss rate (AMR) = adenomas detected only on second colonoscopy/total adenomas from both colonoscopies. Secondary endpoints: mean adenomas at second colonoscopy, false-negative patient rate, and adverse events. Overall AMR was significantly lower when AI was used first (15.5% vs. 32.4%; adjusted OR 0.38, 95% CI 0.23–0.62). AI benefit was greatest for ≤5 mm (15.9% vs. 35.8%; OR 0.34) and nonpolypoid lesions (16.8% vs. 45.8%; OR 0.24), and consistent in proximal (18.3% vs. 32.5%) and distal colon. Mean adenomas missed at second colonoscopy were halved with AI-first (0.33 vs. 0.70, p < 0.001). False-negative patient rate dropped from 29.6% to 6.8% (OR 0.17). No difference in adverse events. In this rigorous tandem-colonoscopy design, real-time AI approximately halved the miss rate of colorectal adenomas, especially small/subtle lesions that commonly escape human perception. This provides direct evidence that AI reduces perceptual errors during standard colonoscopy, supporting its role in substantially improving colorectal cancer prevention through higher neoplasia detection.
Liaw, S. Y., et al. [102] Single-center randomized controlled trial with 64 nursing students 1:1 allocated to sepsis team training in virtual reality simulation with either an AI-powered virtual doctor or a human-controlled virtual doctor (played by medical students). Both groups underwent identical sepsis scenarios. Outcomes assessed: sepsis care knowledge, interprofessional communication knowledge, self-efficacy in interprofessional communication (pre- and post-intervention), and objective performance in sepsis care and communication during a post-intervention simulation-based test. Both groups significantly improved communication knowledge and self-efficacy from baseline. Only the AI-powered group significantly improved sepsis care knowledge (p < 0.001 vs. p = 0.16 in human-controlled group).
Posttest sepsis knowledge: AI-powered group significantly higher (mean 9.06 vs. 7.75, p = 0.009)
Sepsis care performance: no difference (13.63 vs. 12.75, p = 0.39)
Interprofessional communication performance: no difference (29.34 vs. 27.06, p = 0.21)
Self-efficacy in interprofessional communication: human-controlled group higher (69.6 vs. 60.1, p = 0.008).
An AI-powered virtual doctor was not inferior to a human-controlled (medical student) virtual doctor in training nursing students for sepsis care and interprofessional communication, demonstrating feasibility and scalability of AI-driven team training when medical student availability is limited. The AI group achieved superior sepsis knowledge retention, but human partners provided a greater boost in perceived communication self-efficacy. This supports hybrid models that combine AI scalability with human sociability for optimal interprofessional sepsis training.
Luo, H., et al. [103] Multicenter diagnostic study across six Chinese hospitals of varying tiers. A total of 1,036,496 standard white-light endoscopy images from 84,424 patients were used to develop and validate GRAIDS, a deep-learning convolutional neural network system for real-time detection of upper gastrointestinal cancers (oesophageal and gastric).
Training and tuning: images from Sun Yat-sen University Cancer Center
Validation: internal, prospective (same centre), and five external datasets from primary-care hospitals. Performance of GRAIDS was directly compared with endoscopists of three expertise levels (expert, competent, trainee).
GRAIDS achieved:
Diagnostic accuracy: 91.5–97.8% across all validation sets (internal 95.5%, prospective 92.7%)
Sensitivity: 94.2% (not significantly different from expert endoscopists 94.5%, p = 0.692)
Sensitivity markedly superior to competent (85.8%, p < 0.0001) and trainee (72.2%, p < 0.0001) endoscopists
Specificity: 91–98%
Positive predictive value: 81.4%
Negative predictive value: 97.8%
GRAIDS is the first deep-learning system prospectively validated across multiple tiers of hospitals for real-time detection of upper gastrointestinal cancers using standard endoscopy. It performs at expert-endoscopist level for sensitivity and significantly outperforms non-expert endoscopists, offering a scalable, high-performance tool to improve early cancer detection in community and primary-care settings where endoscopic expertise is often limited. This represents a major step toward AI-assisted universal upper GI cancer screening.
Wilson, P. M., et al. [104] Pragmatic, cluster-randomized, stepped-wedge trial across 12 inpatient units at two U.S. hospitals (Aug 2019–Nov 2020). An AI/ML tool continuously screened the electronic health record and predicted need for palliative care consultation. Units were sequentially randomized to activate the intervention: when triggered, the tool sent a real-time Best Practice Advisory to the primary team recommending palliative care consultation. Control periods used usual care (no AI alert). Primary outcome: documented palliative care consultation. Secondary outcomes: hospital length of stay, ICU transfers, and 30-/60-/90-day readmissions. 3183 hospitalizations enrolled: 1717 retained for analysis (1212 intervention, 1332 usual care). Palliative care consultation rate was significantly higher with AI support (IRR 1.44, 95% CI 1.11–1.92).
Exploratory analyses showed reduced 60-day (OR 0.75, 95% CI 0.57–0.97) and 90-day readmissions (OR 0.72, 95% CI 0.55–0.93). No significant differences in length of stay or ICU transfers.
This is the first large-scale randomized trial demonstrating that a real-time AI/ML decision-support tool safely and effectively increases palliative care consultation rates in general hospital inpatients. The associated reduction in 60- and 90-day readmissions suggests that proactive, AI-triggered palliative care involvement improves downstream utilization and potentially quality of care. The pragmatic design and integration into routine workflow support scalability across diverse hospital settings.
Papachristou, P., et al. [105] Prospective multicentre diagnostic trial at 36 primary care centres in Sweden. Primary care physicians used a smartphone-based AI clinical decision support tool (dermoscopic photo → dichotomous text output: “evidence of melanoma” or not) on 253 skin lesions of concern in 228 patients. All lesions were managed according to standard care (excision or specialist referral) regardless of app result. Final histopathological or specialist diagnoses were retrieved from medical records and compared with the AI output. 253 lesions: 21 melanomas (11 invasive, 10 in situ)
Overall melanoma detection: AUROC 0.960 (95% CI 0.928–0.980) → max sensitivity 95.2%, specificity 84.5%
Invasive melanomas only: AUROC 0.988 (95% CI 0.965–0.997) → max sensitivity 100%, specificity 92.6%
This is one of the first prospective trials of an AI-based melanoma detection tool used by primary care physicians in real-world clinical practice. The smartphone app achieved extremely high diagnostic accuracy (near perfect for invasive melanomas) when assessing dermoscopic images of lesions already flagged as suspicious. These results suggest that AI decision support can safely and effectively augment primary care physicians’ ability to triage and detect melanoma, potentially reducing diagnostic delay and unnecessary excisions in primary care settings.
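A caveat when reading predictive values such as those reported for GRAIDS above: unlike sensitivity and specificity, PPV and NPV depend strongly on disease prevalence in the tested population. The minimal sketch below makes this concrete (the 94.2% sensitivity mirrors the GRAIDS figure, but the 95% specificity and both prevalence values are assumed purely for illustration):

```python
def ppv_npv(sensitivity, specificity, prevalence):
    """Derive predictive values from test characteristics via Bayes' rule."""
    tp = sensitivity * prevalence              # true positives per person tested
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    fn = (1 - sensitivity) * prevalence        # false negatives
    tn = specificity * (1 - prevalence)        # true negatives
    return tp / (tp + fp), tn / (tn + fn)

# Same (hypothetical) test characteristics at an enriched vs. a screening-level prevalence
for prev in (0.20, 0.02):
    ppv, npv = ppv_npv(0.942, 0.95, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, a test with a PPV above 80% in an enriched referral population drops below 30% in a low-prevalence screening setting, which is why reported predictive values should always be interpreted alongside the study population.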
In diagnostic imaging and clinical care, AI has achieved breakthrough results in recent years, not only supporting physicians but, in many cases, maintaining or improving the quality of care while significantly reducing the workload of medical staff. The most important evidence comes from large randomized trials of mammography screening. In the MASAI study [96], the first multicenter RCT involving over 80,000 women, AI-assisted mammogram triage (Transpara) detected more cancers (6.1 vs. 5.1 per 1000) with similar false-positive and recall rates, while reducing the number of readings required of radiologists by 44%. Similar conclusions were drawn from the ScreenTrustCAD study [97], in which replacing one radiologist with AI in double reading did not reduce cancer detection rates but slightly increased them, and AI as a stand-alone reader was non-inferior to two human readers. These results demonstrate that AI can safely and effectively address the shortage of radiologists in breast cancer screening programs, increasing the availability and quality of examinations.
AI has shown equally impressive results in colonoscopy. Repici et al. [99], in the AID-2 study, demonstrated that the CADe system (GI Genius) significantly improved adenoma detection (ADR 53.3% vs. 44.5%) among inexperienced endoscopists, and a pooled analysis with the AID-1 study showed that the benefit of AI was independent of operator experience. Wallace et al. [101], in a tandem colonoscopy study, confirmed that AI halved the rate of missed adenomas (AMR 15.5% vs. 32.4%), particularly for small and flat lesions, which is critical for colorectal cancer prevention. Together, these studies demonstrate that real-time CADe is becoming a tool for standardizing and improving the quality of colonoscopy regardless of the physician's experience.
AI is also bringing tangible benefits in other areas. Sadeh-Sharvit et al. [98], in a randomized trial, demonstrated that the Eleos Health platform (automatic transcription of CBT sessions, assessment of fidelity to evidence-based practices, and note generation) increased session attendance by 67%, improved symptoms of depression and anxiety, and accelerated documentation by roughly 55 h on average. Luo et al. [103] developed the GRAIDS system, which, in prospective validation across hospitals of various tiers, achieved expert-level sensitivity for detecting esophageal and gastric cancer (94.2%), significantly outperforming trainee and competent endoscopists. Papachristou et al. [105] demonstrated that a smartphone app in the hands of primary care physicians achieved an AUROC of 0.988 for invasive melanoma, opening the door to rapid triage of skin lesions in primary care. Not all trials yielded positive results: Marcuzzi et al. [100] showed that the individually tailored SELFBACK app did not improve outcomes for patients with back and neck pain already referred to a specialist, illustrating that the success of AI depends on clinical context. Wilson et al. [104] demonstrated that proactive AI alerts increased palliative care consultation rates by 44% and reduced readmissions at 60 and 90 days, confirming the potential of AI to improve the quality of hospital care. Well-designed AI systems for mammography, colonoscopy, and upper gastrointestinal endoscopy already meet or exceed expert performance, significantly reduce staff workload, and improve cancer detection (Table 7). In psychiatry and palliative care, AI increases patient engagement and treatment efficiency, and in medical education and primary care it opens new opportunities for scalable support.

4. Discussion

Artificial intelligence offers significant potential to support diagnosis, prognosis, treatment planning, and workflow efficiency in medicine. Successful applications are characterized by high-quality training data, external validation, integration into clinical workflows, and collaboration with clinicians rather than replacement of human judgment. Credible validation of AI models is a core principle of their safe clinical use. One central threat is overfitting, in which a model achieves high performance on the training cohort but performs poorly in new patients. Data leakage can aggravate this problem: accidental sharing of information between the training and test sets yields spuriously high results. Learning curve analysis complements the evaluation of effectiveness by showing whether a model's performance improves as the volume of data increases. Calibration is equally essential, as it indicates whether predicted probabilities correspond to actual clinical risk; high values of discrimination measures such as the AUC alone cannot guarantee reliability in clinical practice. A trustworthy evaluation must therefore verify that the model identifies meaningful patterns in the data rather than simply memorizing training cases, which involves learning curve analysis, testing on separate cohorts, and checking the calibration of the modeled risk. The representativeness of the training data also determines effectiveness: without population diversity, diagnostic errors may appear and practical value remains limited. All of these factors need to be considered when evaluating the plausibility and viability of AI implementation in clinical care.
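The distinction between discrimination and calibration can be made concrete with a small simulation. In the illustrative sketch below (plain Python; the simulated cohort and the "overconfident" probability transformation are invented for the example), two models rank patients identically, so their AUC is the same, yet one reports clinically misleading risk estimates:

```python
import random

random.seed(0)

def auc(y, p):
    """Probability that a random positive case is ranked above a random negative (Mann-Whitney)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((pi > pj) + 0.5 * (pi == pj) for pi in pos for pj in neg)
    return wins / (len(pos) * len(neg))

def expected_calibration_error(y, p, bins=10):
    """Mean absolute gap between observed event rate and mean predicted risk per probability bin."""
    err, n = 0.0, len(y)
    for b in range(bins):
        lo, hi = b / bins, (b + 1) / bins
        idx = [i for i in range(n) if lo <= p[i] < hi or (b == bins - 1 and p[i] == 1.0)]
        if not idx:
            continue
        obs = sum(y[i] for i in idx) / len(idx)    # observed event rate in bin
        pred = sum(p[i] for i in idx) / len(idx)   # mean predicted risk in bin
        err += len(idx) / n * abs(obs - pred)
    return err

# Simulated cohort: each patient's true risk is known, outcomes drawn accordingly
risk = [random.random() for _ in range(2000)]
y = [1 if random.random() < r else 0 for r in risk]

p_calibrated = risk                        # reports the true risk
p_overconfident = [r ** 3 for r in risk]   # same ranking, distorted probabilities

print("AUC:", round(auc(y, p_calibrated), 3), round(auc(y, p_overconfident), 3))
print("ECE:", round(expected_calibration_error(y, p_calibrated), 3),
      round(expected_calibration_error(y, p_overconfident), 3))
```

Because any strictly monotone transformation of predicted probabilities leaves the AUC essentially unchanged, a model can discriminate perfectly while still systematically misstating risk, which is why calibration must be assessed separately from discrimination.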
In cardiology, pulmonology, neurology, hepatology, and gastroenterology, AI frequently achieves diagnostic accuracy comparable to or exceeding that of human experts in selected tasks. However, evidence for a reduction in hard clinical endpoints (mortality, major morbidity) remains limited. At the current pace of AI growth in medicine, validation of the published models is crucial. The studies reviewed here show a distinct divide. Some provide solid evidence from prospective, randomized clinical trials with external validation (e.g., TAILORED-AF [33], FLASH [34], MASAI [96], or Phase II studies of a drug identified with the help of generative AI [50,51]), which reduces the risk of overfitting and establishes a real clinical effect. Conversely, many post hoc and retrospective cohort studies (e.g., radiomics analyses in SCOT-HEART [37], predictive ML models in WARCEF [46], or subanalyses of RTOG 0617 [54]) report primarily final performance scores (e.g., a C-statistic increase of 0.04 or AUC > 0.90) derived from in-house data or with limited cross-validation. In such cases the likelihood of overfitting increases, especially with high-dimensional features (e.g., radiomics), creating unwarranted expectations about generalizability and actual clinical utility. This difference highlights the necessity of prospective, multicenter RCTs with solid endpoints before AI is routinely applied in clinical practice. The absence of such validation in many otherwise promising publications weakens the evidence and demands careful interpretation of the reported advantages. Generalizability across diverse populations, bias mitigation, and legal responsibility require further attention.
The generalizability of AI models trained in one setting to new, local settings (so-called domain shift), driven by differences in equipment, imaging protocols, patient demographics, and clinical practices, remains a challenge. Algorithms developed on data from highly specialized centers often underperform in other environments, which restricts their safe use. Another important task is identifying patient subgroups (e.g., by age, sex, race/ethnicity, or socioeconomic status) in which a model performs suboptimally, since such blind spots can translate into care disparities. Subgroup analyses should be performed routinely, either with explainable AI methods (e.g., SHAP) or by using the data themselves to detect subgroups with worse performance. The absence of such evaluation in most studies (mostly retrospective or post hoc) is a serious drawback and underlines the need for prospective, multicenter validation and extended post-implementation monitoring of outcomes. Although the use of artificial intelligence in medicine is rising, its practical clinical use still has pitfalls. Existing regulatory frameworks and recommendations are too broad and offer no reliable guidance for situations in which an AI tool's recommendation conflicts with the opinion of an experienced clinician, a serious problem for everyday clinical practice. When AI recommendations challenge a clinician's judgment, the physician may follow their own judgment and experience, record the discrepancy, and examine the potential sources of error [148]. Implementing these systems does not relieve physicians of responsibility for their decisions, and errors may ultimately burden the clinicians and organizations deploying them [149].
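A data-driven subgroup audit of the kind described above can be simple in principle. The sketch below (plain Python; the cohort, group labels, and detection rates are entirely synthetic and chosen so that one group is underserved by the simulated model) flags subgroups whose sensitivity falls notably below the overall value:

```python
import random

random.seed(1)

def sensitivity(y_true, y_pred):
    """Fraction of true cases the model detects (true-positive rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# Synthetic cohort of (subgroup label, true outcome, model prediction);
# the simulated model detects fewer true cases in group C
cohort = []
for _ in range(3000):
    group = random.choice(["A", "B", "C"])
    y = 1 if random.random() < 0.3 else 0
    hit_rate = 0.9 if group != "C" else 0.6
    pred = 1 if ((y == 1 and random.random() < hit_rate)
                 or (y == 0 and random.random() < 0.1)) else 0
    cohort.append((group, y, pred))

overall = sensitivity([y for _, y, _ in cohort], [p for _, _, p in cohort])
print(f"overall sensitivity: {overall:.2f}")
for g in ["A", "B", "C"]:
    ys = [y for grp, y, _ in cohort if grp == g]
    ps = [p for grp, _, p in cohort if grp == g]
    s = sensitivity(ys, ps)
    flag = "  <-- underperforming" if s < overall - 0.05 else ""
    print(f"group {g}: sensitivity {s:.2f}{flag}")
```

In practice the same audit would run on held-out clinical data with real demographic attributes, and explainability methods such as SHAP could then help explain why a flagged subgroup underperforms.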
Challenges also arise when algorithms are tested at new, local clinical sites where the training data differ from the real patient population; this requires local validation and continuous post-implementation monitoring. External validation on data from other centers or populations can establish an algorithm's effectiveness, and multicenter or prospective validation should precede clinical implementation [150,151]. Unequal model performance across particular patient groups is discussed in the research community, yet the number of studies offering concrete solutions to eliminate these inequities in clinical practice is still low [152]. Tools such as AI-guided atrial fibrillation ablation [33], automated PCI optimization [34], and radiomics analysis of atherosclerotic plaques in CCTA [37] highlight the potential of AI to exceed the limitations of standard anatomical parameters. Low-threshold solutions such as AI-ECG, which supports non-cardiologists in detecting AF and reduced ejection fraction [35,36], demonstrate that even simple algorithms can effectively narrow diagnostic gaps. Growing patient experience with conversational AI [38] and automated echocardiographic measurements [39] emphasizes the importance of interpretability, personalization, and workflow integration for ongoing clinical adoption. The effectiveness of these tools depends primarily on appropriate data selection and clinical context. Models based solely on clinical variables can support patient selection for diagnosis [40], but imaging studies remain essential for identifying high-risk features. The greatest clinical benefits are observed with tools that combine analytics with education, monitoring, and patient support in rehabilitation and outpatient care [41,42,43,45].
Fully automated tools such as CT-μFR [44] or stroke prediction models in patients with HFrEF [46] outperform traditional scales and reveal new risk determinants. In pulmonology, AI is useful at every stage of care, from drug discovery through treatment optimization to disease management. Generative AI in antifibrotic drug discovery [50,51] still requires confirmation of long-term durability and safety. Predictive models in surgery and radiotherapy [52,54] likewise require interpretability and clinical accountability. In chronic pulmonary diseases such as COPD, the benefits of AI only become apparent with long-term follow-up [53]. Hybrid tools combining imaging results and biomarkers, as in the diagnosis of pulmonary nodules [56] or tuberculosis screening [57], achieve the greatest clinical value. In neurology, the value of AI depends not only on accuracy but also on how results are presented and personalized to the patient. Explainable AI demonstrates that results can be interpreted differently by physicians and laypeople [63]. AI-guided neuromodulation enables objective DBS programming in treatment-resistant depression and Parkinson's disease [64,65], reducing reliance on subjective assessment. The main limits to its effectiveness include negative experiences with electrodiagnostics [67], the limited effectiveness of signal analysis from external devices [70], and the lack of integration into workflows. Tools supporting decisions and care organization, e.g., in the diagnosis of migraines [66] and in hybrid neurosurgical training [68], reveal the greatest potential of AI in neurology. In hepatology and transplantology, AI overcomes the traditional limitations of expert assessment through continuous, quantitative measures and complex decision models. Digital analyses in NASH have shown that continuous scoring reveals subtle therapeutic effects undetectable by classical tools [74,75].
In diagnostic imaging, AI narrows competence gaps between operators [76], and the integration of deep learning algorithms with a multimodal approach allows detection of small hepatocellular tumors with high accuracy [77]. Neural networks in transplantology outperform traditional prognostic scales [78], underscoring the need for transparency and decision-making accountability. The primary value of AI comes from endpoint control, risk stratification, and clinical decision support, rather than from automated single assessments. In gastroenterology and diabetology, the greatest clinical benefits come from multimodal, validated systems that support, rather than replace, physician decision-making. Combining clinical data with EUS images significantly improves the diagnostic accuracy of less experienced endoscopists [86], and DL systems for pancreas imaging maintain their performance in nationwide validation [87]. Modern algorithms for type 1 diabetes match the effectiveness of traditional control algorithms while reducing computational complexity and maintaining safety [88,89]. In diagnostic imaging, endoscopy, mental health, and palliative care, AI works best when embedded in clinical processes and designed as part of the care team rather than as an automatic decision-maker. In mammography, AI can safely replace one of the readers, maintaining cancer detection rates and reducing staffing burden [96,97]. In endoscopy, advanced CADe and GRAIDS systems reduce perception errors and compensate for differences in operator experience [99,101,103]. Algorithms do not always deliver benefits in populations already receiving care, as evidenced by the negative results of SELFBACK [100]. In mental health and palliative care, AI improves patient engagement and treatment outcomes and supports clinical decision-making [98,102,104].
In everyday primary care practice, AI shortens the diagnostic process for melanoma and supports correct diagnosis [105]. The use of artificial intelligence in medicine shows significant potential to improve diagnosis, treatment, and the efficiency of healthcare in many areas. Successful AI implementation requires not only precise algorithms but also sustainable generalizability, integration into daily workflows, and a clear definition of clinical responsibility.

5. Limitations

This narrative review has several important limitations. The rapid evolution of artificial intelligence in medicine means that some of the most recent studies, particularly those published after the final literature search, may not be included. The field is dominated by imaging-based applications, which are overrepresented in the current literature due to their relative maturity and regulatory approvals, while evidence in other domains (e.g., long-term outcome prediction, non-imaging biomarkers, or real-world implementation in low-resource settings) remains more limited. Many of the cited studies are single-center, have relatively short follow-up periods, or lack large-scale, multicenter prospective validation. Long-term clinical outcome data, particularly hard endpoints such as mortality or major morbidity, are still scarce for most AI tools. Additionally, potential publication bias toward positive results and the predominance of high-income country settings may limit generalizability. Finally, while this review focuses on clinically impactful, high-quality evidence, the heterogeneity of study designs and outcomes precludes formal meta-analysis, and important ongoing trials or emerging foundation models may alter the landscape soon. These limitations should be considered when interpreting the current state of AI in clinical medicine.

6. Conclusions

AI, particularly ML and DL, shows growing integration into clinical practice, offering tools with high diagnostic and prognostic accuracy across multiple internal medicine specialties. This review summarizes current applications in cardiology, pulmonology, neurology, hepatology, pancreatic diseases, gastroenterology, and related fields. Promising evidence from recent studies demonstrates improvements in diagnostic performance as well as workflow efficiency. As of late 2025, over 1300 AI-enabled medical devices have received FDA clearance, predominantly in imaging-related applications (more than 75%), with cardiology and radiology leading in regulatory approvals. Despite strong performance metrics in controlled settings (AUC often 0.90–0.98) and increasing real-world implementations, most of the evidence derives from retrospective studies or external validations, with a limited number of randomized controlled trials. Direct improvements in hard clinical outcomes remain preliminary and require further confirmation. Widespread clinical adoption is currently constrained by challenges, including the need for robust prospective validation, generalizability across diverse populations and settings, bias mitigation, interpretability, ethical considerations, legal liability, and clinician acceptance. Large-scale, multicenter, prospective randomized controlled trials are essential to establish sustained benefits on patient outcomes, optimize integration into clinical workflows, and realize the full potential of AI as a supportive tool that enhances rather than replaces human clinical judgment. Successful future implementation will depend on high-quality data, transparent algorithms, multidisciplinary collaboration, and ongoing post-market surveillance.

Author Contributions

Conceptualization, A.Ł., D.B.-A., W.M. and D.A.; methodology, A.Ł., D.B.-A., W.M. and D.A.; software, A.Ł., D.B.-A., W.M. and D.A.; validation, A.Ł., D.B.-A., W.M. and D.A.; formal analysis, A.Ł., D.B.-A., W.M. and D.A.; resources, A.Ł., D.B.-A., W.M. and D.A.; data curation, A.Ł., D.B.-A., W.M. and D.A.; writing—original draft preparation, A.Ł., D.B.-A., W.M. and D.A.; writing—review and editing, A.Ł., D.B.-A., W.M. and D.A.; visualization, A.Ł., D.B.-A., W.M. and D.A.; supervision, D.A.; project administration, A.Ł., D.B.-A., W.M. and D.A.; funding acquisition, D.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data have been included in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AF: Atrial fibrillation
AI: Artificial intelligence
AKI: Acute kidney injury
AUC: Area under the (receiver operating characteristic) curve
CADe: Computer-aided detection
CAD: Coronary artery disease
CNN: Convolutional neural network
CT: Computed tomography
DL: Deep learning
ECG: Electrocardiogram
EHR: Electronic health record
EUS: Endoscopic ultrasound
FDA: U.S. Food and Drug Administration
HF: Heart failure
HFpEF: Heart failure with preserved ejection fraction
HPI: Hypotension Prediction Index
IPMN: Intraductal papillary mucinous neoplasm
LVEF: Left ventricular ejection fraction
MACCE: Major adverse cardiac and cerebrovascular events
ML: Machine learning
MRI: Magnetic resonance imaging
NOAC: Non-vitamin K antagonist oral anticoagulant
NSCLC: Non-small cell lung cancer
PCI: Percutaneous coronary intervention
PDAC: Pancreatic ductal adenocarcinoma
PPV: Positive predictive value
QoL: Quality of life
RCT: Randomized controlled trial
T1D: Type 1 diabetes

References

  1. Bajwa, J.; Munir, U.; Nori, A.; Williams, B. Artificial intelligence in healthcare: Transforming the practice of medicine. Future Healthc. J. 2021, 8, e188–e194. [Google Scholar] [CrossRef]
  2. Shi, J.; Bendig, D.; Vollmar, H.C.; Rasche, P. Mapping the Bibliometrics Landscape of AI in Medicine: Methodological Study. J. Med. Internet Res. 2023, 25, e45815. [Google Scholar] [CrossRef] [PubMed]
  3. Howard, F.M.; Li, A.; Riffon, M.F.; Garrett-Mayer, E.; Pearson, A.T. Characterizing the Increase in Artificial Intelligence Content Detection in Oncology Scientific Abstracts From 2021 to 2023. JCO Clin. Cancer Inform. 2024, 8, e2400077. [Google Scholar] [CrossRef] [PubMed]
  4. Shajari, S.; Kuruvinashetti, K.; Komeili, A.; Sundararaj, U. The Emergence of AI-Based Wearable Sensors for Digital Health Technology: A Review. Sensors 2023, 23, 9498. [Google Scholar] [CrossRef] [PubMed]
  5. Huang, G.; Chen, X.; Liao, C. AI-Driven Wearable Bioelectronics in Digital Healthcare. Biosensors 2025, 15, 410. [Google Scholar] [CrossRef] [PubMed]
  6. Hannun, A.Y.; Rajpurkar, P.; Haghpanahi, M.; Tison, G.H.; Bourn, C.; Turakhia, M.P.; Ng, A.Y. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nat. Med. 2019, 25, 65–69. [Google Scholar] [CrossRef] [PubMed]
  7. Virella Pérez, Y.I.; Medlow, S.; Ho, J.; Steinbeck, K. Mobile and Web-Based Apps That Support Self-Management and Transition in Young People with Chronic Illness: Systematic Review. J. Med. Internet Res. 2019, 21, e13579. [Google Scholar] [CrossRef] [PubMed]
  8. Rieke, N.; Hancox, J.; Li, W.; Milletarì, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. npj Digit. Med. 2020, 3, 119. [Google Scholar] [CrossRef] [PubMed]
  9. Ziller, A.; Usynin, D.; Braren, R.; Makowski, M.; Rueckert, D.; Kaissis, G. Medical imaging deep learning with differential privacy. Sci. Rep. 2021, 11, 13524. [Google Scholar] [CrossRef] [PubMed]
  10. Yadalam, A.K.; Liu, C.; Hui, Q.; Razavi, A.C.; Sperling, L.S.; Quyyumi, A.A.; Sun, Y.V. Large-Scale Proteomics-Based Risk Score for the Prediction of Incident Cardio-Kidney-Metabolic Disease Risk. Circ. Genom. Precis. Med. 2025, 18, e005125. [Google Scholar] [CrossRef]
  11. Singh, M.; Kumar, A.; Khanna, N.N.; Laird, J.R.; Nicolaides, A.; Faa, G.; Johri, A.M.; Mantella, L.E.; Fernandes, J.F.E.; Teji, J.S.; et al. Artificial intelligence for cardiovascular disease risk assessment in personalised framework: A scoping review. EClinicalMedicine 2024, 73, 102660. [Google Scholar] [CrossRef]
  12. Hadida Barzilai, D.; Sudri, K.; Goshen, G.; Klang, E.; Zimlichman, E.; Barbash, I.; Cohen Shelly, M. Randomized Controlled Trials Evaluating Artificial Intelligence in Cardiovascular Care: A Systematic Review. JACC Adv. 2025, 4, 102152. [Google Scholar] [CrossRef]
  13. Komorowski, M.; Cecconi, M. Deploying AI in the ICU: Learning from successes and failures. Intensive Care Med. 2025, 51, 2410–2413. [Google Scholar] [CrossRef] [PubMed]
  14. Tomašev, N.; Glorot, X.; Rae, J.W.; Zielinski, M.; Askham, H.; Saraiva, A.; Mottram, A.; Meyer, C.; Ravuri, S.; Protsyuk, I.; et al. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 2019, 572, 116–119. [Google Scholar] [CrossRef] [PubMed]
  15. Kellum, J.A.; Bihorac, A. Artificial intelligence to predict AKI: Is it a breakthrough? Nat. Rev. Nephrol. 2019, 15, 663–664. [Google Scholar] [CrossRef]
  16. Tiwari, A.; Mishra, S.; Kuo, T.R. Current AI technologies in cancer diagnostics and treatment. Mol. Cancer 2025, 24, 159. [Google Scholar] [CrossRef] [PubMed]
  17. Zhang, J.; Che, Y.; Liu, R.; Wang, Z.; Liu, W. Deep learning-driven multi-omics analysis: Enhancing cancer diagnostics and therapeutics. Brief. Bioinform. 2025, 26, bbaf440. [Google Scholar] [CrossRef]
  18. Olang, O.; Mohseni, S.; Shahabinezhad, A.; Hamidianshirazi, Y.; Goli, A.; Abolghasemian, M.; Shafiee, M.A.; Aarabi, M.; Alavinia, M.; Shaker, P. Artificial Intelligence-Based Models for Prediction of Mortality in ICU Patients: A Scoping Review. J. Intensive Care Med. 2025, 40, 1240–1246. [Google Scholar] [CrossRef]
  19. Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Amin, M.; Hou, L.; Clark, K.; Pfohl, S.R.; Cole-Lewis, H.; et al. Toward expert-level medical question answering with large language models. Nat. Med. 2025, 31, 943–950. [Google Scholar] [CrossRef]
  20. Yang, X.; Chen, A.; PourNejatian, N.; Shin, H.C.; Smith, K.E.; Parisien, C.; Compas, C.; Martin, C.; Costa, A.B.; Flores, M.G.; et al. A large language model for electronic health records. npj Digit. Med. 2022, 5, 194. [Google Scholar] [CrossRef]
  21. Sivakumar, R.; Lue, B.; Kundu, S. FDA Approval of Artificial Intelligence and Machine Learning Devices in Radiology: A Systematic Review. JAMA Netw. Open 2025, 8, e2542338. [Google Scholar] [CrossRef] [PubMed]
  22. Shuja, M.H.; Shakil, F.; Shuja, S.H.; Hasan, M.; Edhi, M.; Abbasi, A.F.; Jawaid, A.; Shakil, S. Harnessing Artificial Intelligence in Cardiology: Advancements in Diagnosis, Treatment, and Patient Care. Heart Views 2024, 25, 241–248. [Google Scholar] [CrossRef]
  23. Ahmad, A.; Ahmad, S.; Ahmad, R.; Bodi, J.; Mohamed, A.; Wasim, A. Artificial Intelligence in Cardiovascular Diagnosis: Innovations and Impact on Disease Screenings. J. Pharm. Bioallied Sci. 2025, 17, S1900–S1903. [Google Scholar] [CrossRef]
  24. Medhi, D.; Kamidi, S.R.; Mamatha Sree, K.P.; Shaikh, S.; Rasheed, S.; Thengu Murichathil, A.H.; Nazir, Z. Artificial Intelligence and Its Role in Diagnosing Heart Failure: A Narrative Review. Cureus 2024, 16, e59661. [Google Scholar] [CrossRef] [PubMed]
  25. Udoy, I.A.; Hassan, O. AI-Driven Technology in Heart Failure Detection and Diagnosis: A Review of the Advancement in Personalized Healthcare. Symmetry 2025, 17, 469. [Google Scholar] [CrossRef]
  26. Almansouri, N.E.; Awe, M.; Rajavelu, S.; Jahnavi, K.; Shastry, R.; Hasan, A.; Hasan, H.; Lakkimsetti, M.; AlAbbasi, R.K.; Gutiérrez, B.C.; et al. Early Diagnosis of Cardiovascular Diseases in the Era of Artificial Intelligence: An In-Depth Review. Cureus 2024, 16, e55869. [Google Scholar] [CrossRef] [PubMed]
  27. Yasmin, F.; Shah, S.M.I.; Naeem, A.; Shujauddin, S.M.; Jabeen, A.; Kazmi, S.; Siddiqui, S.A.; Kumar, P.; Salman, S.; Hassan, S.A.; et al. Artificial intelligence in the diagnosis and detection of heart failure: The past, present, and future. Rev. Cardiovasc. Med. 2021, 22, 1095–1113. [Google Scholar] [CrossRef]
  28. Sanders-van Wijk, S.; Barandiarán Aizpurua, A.; Brunner-La Rocca, H.P.; Henkens, M.T.H.M.; Weerts, J.; Knackstedt, C.; Uszko-Lencer, N.; Heymans, S.; van Empel, V. The HFA-PEFF and H2 FPEF scores largely disagree in classifying patients with suspected heart failure with preserved ejection fraction. Eur. J. Heart Fail. 2021, 23, 838–840. [Google Scholar] [CrossRef]
  29. Bourazana, A.; Xanthopoulos, A.; Briasoulis, A.; Magouliotis, D.; Spiliopoulos, K.; Athanasiou, T.; Vassilopoulos, G.; Skoularigis, J.; Triposkiadis, F. Artificial Intelligence in Heart Failure: Friend or Foe? Life 2024, 14, 145. [Google Scholar] [CrossRef] [PubMed]
  30. Holt, D.B.; El-Bokl, A.; Stromberg, D.; Taylor, M.D. Role of Artificial Intelligence in Congenital Heart Disease and Interventions. J. Soc. Cardiovasc. Angiogr. Interv. 2025, 4, 102567. [Google Scholar] [CrossRef]
  31. Paul, D.; Sanap, G.; Shenoy, S.; Kalyane, D.; Kalia, K.; Tekade, R.K. Artificial intelligence in drug discovery and development. Drug Discov. Today 2021, 26, 80–93. [Google Scholar] [CrossRef]
  32. Meder, B.; Asselbergs, F.W.; Ashley, E. Artificial intelligence to improve cardiovascular population health. Eur. Heart J. 2025, 46, 1907–1916. [Google Scholar] [CrossRef]
  33. Deisenhofer, I.; Albenque, J.P.; Busch, S.; Gitenay, E.; Mountantonakis, S.E.; Roux, A.; Horvilleur, J.; Bakouboula, B.; Oza, S.; Abbey, S.; et al. Artificial intelligence for individualized treatment of persistent atrial fibrillation: A randomized controlled trial. Nat. Med. 2025, 31, 1286–1293. [Google Scholar] [CrossRef]
  34. Kim, Y.; Yoon, H.J.; Suh, J.; Kang, S.H.; Lim, Y.H.; Jang, D.H.; Park, J.H.; Shin, E.S.; Bae, J.W.; Lee, J.H.; et al. Artificial Intelligence-Based Fully Automated Quantitative Coronary Angiography vs Optical Coherence Tomography-Guided PCI: The FLASH Trial. JACC. Cardiovasc. Interv. 2025, 18, 187–197. [Google Scholar] [CrossRef]
  35. Liu, W.T.; Lin, C.; Lee, C.C.; Chang, C.H.; Fang, W.H.; Tsai, D.J.; Lin, W.Y.; Hung, Y.; Chen, K.C.; Lee, C.H.; et al. Artificial Intelligence-Enabled ECGs for Atrial Fibrillation Identification and Enhanced Oral Anticoagulant Adoption: A Pragmatic Randomized Clinical Trial. J. Am. Heart Assoc. 2025, 14, e042106. [Google Scholar] [CrossRef] [PubMed]
  36. Tsai, D.J.; Lin, C.; Liu, W.T.; Lee, C.C.; Chang, C.H.; Lin, W.Y.; Liu, Y.L.; Chang, D.W.; Hsieh, P.H.; Tsai, C.S.; et al. Artificial intelligence-assisted diagnosis and prognostication in low ejection fraction using electrocardiograms in inpatient department: A pragmatic randomized controlled trial. BMC Med. 2025, 23, 342. [Google Scholar] [CrossRef] [PubMed]
  37. Kolossváry, M.; Lin, A.; Kwiecinski, J.; Cadet, S.; Slomka, P.J.; Newby, D.E.; Dweck, M.R.; Williams, M.C.; Dey, D. Coronary Plaque Radiomic Phenotypes Predict Fatal or Nonfatal Myocardial Infarction: Analysis of the SCOT-HEART Trial. JACC Cardiovasc. Imaging 2025, 18, 308–319. [Google Scholar] [CrossRef]
  38. Trivedi, R.; Shaw, T.; Sheahen, B.; Chow, C.K.; Laranjo, L. Patient Perspectives on Conversational Artificial Intelligence for Atrial Fibrillation Self-Management: Qualitative Analysis. J. Med. Internet Res. 2025, 27, e64325. [Google Scholar] [CrossRef] [PubMed]
  39. Mekonnen, D.; Spitzer, E.; McFadden, E.P.; Caplice, N.M.; Ren, C.B. Artificial intelligence-assisted left ventricular global longitudinal strain assessment in patients with acute myocardial infarction: A RESUS-AMI trial sub-analysis. Int. J. Cardiovasc. Imaging 2025, 41, 1225–1236. [Google Scholar] [CrossRef]
  40. Williams, M.C.; Guimaraes, A.R.M.; Jiang, M.; Kwieciński, J.; Weir-McCall, J.R.; Adamson, P.D.; Mills, N.L.; Roditi, G.H.; van Beek, E.J.R.; Nicol, E.; et al. Machine learning to predict high-risk coronary artery disease on CT in the SCOT-HEART trial. Open Heart 2025, 12, e003162. [Google Scholar] [CrossRef]
  41. Saklica, D.; Vardar-Yagli, N.; Saglam, M.; Yuce, D.; Ates, A.H.; Yorgun, H. The Impact of Technology-Based Cardiac Rehabilitation on Exercise Capacity and Adherence in Patients with Coronary Artery Disease: An Artificial Intelligence Analysis. Arq. Bras. Cardiol. 2025, 122, e20240765. [Google Scholar] [CrossRef]
  42. Trivedi, R.; Laranjo, L.; Marschner, S.; Thiagalingam, A.; Thomas, S.; Kumar, S.; Shaw, T.; Chow, C.K. Conversational AI Phone Calls to Support Patients with Atrial Fibrillation: Randomized Controlled Trial. JMIR Cardio 2025, 9, e64326. [Google Scholar] [CrossRef]
  43. Fiolet, A.T.L.; Lin, A.; Kwiecinski, J.; Tutein Nolthenius, J.; McElhinney, P.; Grodecki, K.; Kietselaer, B.; Opstal, T.S.; Cornel, J.H.; Knol, R.J.; et al. Effect of low-dose colchicine on pericoronary inflammation and coronary plaque composition in chronic coronary disease: A subanalysis of the LoDoCo2 trial. Heart 2025, 111, 1156–1163. [Google Scholar] [CrossRef] [PubMed]
  44. Li, G.; Weng, T.; Sun, P.; Li, Z.; Ding, D.; Guan, S.; Han, W.; Gan, Q.; Li, M.; Qi, L.; et al. Diagnostic performance of fully automatic coronary CT angiography-based quantitative flow ratio. J. Cardiovasc. Comput. Tomogr. 2025, 19, 40–47. [Google Scholar] [CrossRef]
  45. Yu, X.; Cao, J.; Xu, J.; Xu, Q.; Chen, H.; Yu, D.; Ou, A.; Hu, Y.; Ma, L. Efficacy of Telemedical Interventional Management in Patients with Coronary Heart Disease Undergoing Percutaneous Coronary Intervention: Randomized Controlled Trial. J. Med. Internet Res. 2025, 27, e63350. [Google Scholar] [CrossRef] [PubMed]
  46. Ishiguchi, H.; Chen, Y.; Huang, B.; Gue, Y.; Correa, E.; Homma, S.; Thompson, J.L.P.; Qian, M.; Lip, G.Y.H.; Abdul-Rahim, A.H. Machine learning for stroke in heart failure with reduced ejection fraction but without atrial fibrillation: A post-hoc analysis of the WARCEF trial. Eur. J. Clin. Investig. 2025, 55, e14360. [Google Scholar] [CrossRef] [PubMed]
  47. Sindhu, A.; Jadhav, U.; Ghewade, B.; Bhanushali, J.; Yadav, P. Revolutionizing Pulmonary Diagnostics: A Narrative Review of Artificial Intelligence Applications in Lung Imaging. Cureus 2024, 16, e57657. [Google Scholar] [CrossRef]
  48. Cellina, M.; Cacioppa, L.M.; Cè, M.; Chiarpenello, V.; Costa, M.; Vincenzo, Z.; Pais, D.; Bausano, M.V.; Rossini, N.; Bruno, A.; et al. Artificial Intelligence in Lung Cancer Screening: The Future Is Now. Cancers 2023, 15, 4344. [Google Scholar] [CrossRef]
  49. Ma, K.; Zheng, M.; Chen, W.; Qi, Y.; Rong, H. Research progress in computer-aided diagnosis systems for lung cancer. npj Digit. Med. 2025, 8, 722. [Google Scholar] [CrossRef]
  50. Ren, F.; Aliper, A.; Chen, J.; Zhao, H.; Rao, S.; Kuppe, C.; Ozerov, I.V.; Zhang, M.; Witte, K.; Kruse, C.; et al. A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models. Nat. Biotechnol. 2025, 43, 63–75. [Google Scholar] [CrossRef]
  51. Xu, Z.; Ren, F.; Wang, P.; Cao, J.; Tan, C.; Ma, D.; Zhao, L.; Dai, J.; Ding, Y.; Fang, H.; et al. A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: A randomized phase 2a trial. Nat. Med. 2025, 31, 2602–2610. [Google Scholar] [CrossRef]
  52. Habicher, M.; Denn, S.M.; Schneck, E.; Akbari, A.A.; Schmidt, G.; Markmann, M.; Alkoudmani, I.; Koch, C.; Sander, M. Perioperative goal-directed therapy with artificial intelligence to reduce the incidence of intraoperative hypotension and renal failure in patients undergoing lung surgery: A pilot study. J. Clin. Anesth. 2025, 102, 111777. [Google Scholar] [CrossRef]
  53. Hong, L.; Cheng, X.; Zheng, D. Application of Artificial Intelligence in Emergency Nursing of Patients with Chronic Obstructive Pulmonary Disease. Contrast Media Mol. Imaging 2021, 2021, 6423398. [Google Scholar] [CrossRef] [PubMed]
  54. Ladbury, C.; Li, R.; Danesharasteh, A.; Ertem, Z.; Tam, A.; Liu, J.; Hao, C.; Li, R.; McGee, H.; Sampath, S.; et al. Explainable Artificial Intelligence to Identify Dosimetric Predictors of Toxicity in Patients with Locally Advanced Non-Small Cell Lung Cancer: A Secondary Analysis of RTOG 0617. Int. J. Radiat. Oncol. Biol. Phys. 2023, 117, 1287–1296. [Google Scholar] [CrossRef] [PubMed]
  55. Pasipanodya, J.G.; Smythe, W.; Merle, C.S.; Olliaro, P.L.; Deshpande, D.; Magombedze, G.; McIlleron, H.; Gumbo, T. Artificial intelligence-derived 3-Way Concentration-dependent Antagonism of Gatifloxacin, Pyrazinamide, and Rifampicin During Treatment of Pulmonary Tuberculosis. Clin. Infect. Dis. 2018, 67, S284–S292. [Google Scholar] [CrossRef]
  56. Ding, Y.; Zhang, J.; Zhuang, W.; Gao, Z.; Kuang, K.; Tian, D.; Deng, C.; Wu, H.; Chen, R.; Lu, G.; et al. Improving the efficiency of identifying malignant pulmonary nodules before surgery via a combination of artificial intelligence CT image recognition and serum autoantibodies. Eur. Radiol. 2023, 33, 3092–3102. [Google Scholar] [CrossRef]
  57. Bosman, S.; Ayakaka, I.; Muhairwe, J.; Kamele, M.; van Heerden, A.; Madonsela, T.; Labhardt, N.D.; Sommer, G.; Bremerich, J.; Zoller, T.; et al. Evaluation of C-Reactive Protein and Computer-Aided Analysis of Chest X-rays as Tuberculosis Triage Tests at Health Facilities in Lesotho and South Africa. Clin. Infect. Dis. 2024, 79, 1293–1302. [Google Scholar] [CrossRef] [PubMed]
  58. Kalani, M.; Anjankar, A. Revolutionizing Neurology: The Role of Artificial Intelligence in Advancing Diagnosis and Treatment. Cureus 2024, 16, e61706. [Google Scholar] [CrossRef]
  59. Kale, M.; Wankhede, N.; Pawar, R.; Ballal, S.; Kumawat, R.; Goswami, M.; Khalid, M.; Taksande, B.; Upaganlawar, A.; Umekar, M.; et al. AI-driven innovations in Alzheimer’s disease: Integrating early diagnosis, personalized treatment, and prognostic modelling. Ageing Res. Rev. 2024, 101, 102497. [Google Scholar] [CrossRef]
  60. Onciul, R.; Tataru, C.-I.; Dumitru, A.V.; Crivoi, C.; Serban, M.; Covache-Busuioc, R.-A.; Radoi, M.P.; Toader, C. Artificial Intelligence and Neuroscience: Transformative Synergies in Brain Research and Clinical Applications. J. Clin. Med. 2025, 14, 550. [Google Scholar] [CrossRef]
  61. Zhang, H.; Jiao, L.; Yang, S.; Li, H.; Jiang, X.; Feng, J.; Zou, S.; Xu, Q.; Gu, J.; Wang, X.; et al. Brain-computer interfaces: The innovative key to unlocking neurological conditions. Int. J. Surg. 2024, 110, 5745–5762. [Google Scholar] [CrossRef]
  62. Buttar, A.M.; Shaheen, Z.; Gumaei, A.H.; Mosleh, M.A.A.; Gupta, I.; Alzanin, S.M.; Akbar, M.A. Enhanced neurological anomaly detection in MRI images using deep convolutional neural networks. Front. Med. 2024, 11, 1504545. [Google Scholar] [CrossRef] [PubMed]
  63. Gombolay, G.Y.; Silva, A.; Schrum, M.; Gopalan, N.; Hallman-Cooper, J.; Dutt, M.; Gombolay, M. Effects of explainable artificial intelligence in neurology decision support. Ann. Clin. Transl. Neurol. 2024, 11, 1224–1235. [Google Scholar] [CrossRef] [PubMed]
  64. Alagapan, S.; Choi, K.S.; Heisig, S.; Riva-Posse, P.; Crowell, A.; Tiruvadi, V.; Obatusin, M.; Veerakumar, A.; Waters, A.C.; Gross, R.E.; et al. Cingulate dynamics track depression recovery with deep brain stimulation. Nature 2023, 622, 130–138. [Google Scholar] [CrossRef]
  65. Boutet, A.; Madhavan, R.; Elias, G.J.B.; Joel, S.E.; Gramer, R.; Ranjan, M.; Paramanandam, V.; Xu, D.; Germann, J.; Loh, A.; et al. Predicting optimal deep brain stimulation parameters for Parkinson’s disease using functional MRI and machine learning. Nat. Commun. 2021, 12, 3043. [Google Scholar] [CrossRef]
  66. Cowan, R.P.; Rapoport, A.M.; Blythe, J.; Rothrock, J.; Knievel, K.; Peretz, A.M.; Ekpo, E.; Sanjanwala, B.M.; Woldeamanuel, Y.W. Diagnostic accuracy of an artificial intelligence online engine in migraine: A multi-center study. Headache 2022, 62, 870–882. [Google Scholar] [CrossRef] [PubMed]
  67. Gorenshtein, A.; Weisblat, Y.; Khateb, M.; Kenan, G.; Tsirkin, I.; Fayn, G.; Geller, S.; Shelly, S. AI-Based EMG Reporting: A Randomized Controlled Trial. J. Neurol. 2025, 272, 586. [Google Scholar] [CrossRef]
  68. Davidovic, V.; Giglio, B.; Albeloushi, A.; Alhaj, A.K.; Alhantoobi, M.; Saeedi, R.; Deraiche, S.; Yilmaz, R.; Tee, T.; Fazlollahi, A.M.; et al. Effect of Artificial Intelligence-Augmented Human Instruction on Feedback Frequency and Surgical Performance During Simulation Training. J. Surg. Educ. 2025, 82, 103743. [Google Scholar] [CrossRef]
  69. Hassan, A.E.; Ravi, S.; Desai, S.; Saei, H.M.; Mckennon, E.; Tekle, W.G. An artificial intelligence (AI)-based approach to clinical trial recruitment: The impact of Viz RECRUIT on enrollment in the EMBOLISE trial. Interv. Neuroradiol. J. Perither. Neuroradiol. Surg. Proced. Relat. Neurosci. 2025, 31, 739–744. [Google Scholar] [CrossRef]
  70. Macea, J.; Heremans, E.R.M.; Proost, R.; De Vos, M.; Van Paesschen, W. Automated Sleep Staging in Epilepsy Using Deep Learning on Standard Electroencephalogram and Wearable Data. J. Sleep Res. 2025, 34, e70061. [Google Scholar] [CrossRef]
  71. Kröner, P.T.; Engels, M.M.; Glicksberg, B.S.; Johnson, K.W.; Mzaik, O.; van Hooft, J.E.; Wallace, M.B.; El-Serag, H.B.; Krittanawong, C. Artificial intelligence in gastroenterology: A state-of-the-art review. World J. Gastroenterol. 2021, 27, 6794–6824. [Google Scholar] [CrossRef]
  72. Bian, Y.; Li, J.; Ye, C.; Jia, X.; Yang, Q. Artificial intelligence in medical imaging: From task-specific models to large-scale foundation models. Chin. Med. J. 2025, 138, 651–663. [Google Scholar] [CrossRef]
  73. Vivek, K.; Papalois, V. AI and Machine Learning in Transplantation. Transplantology 2025, 6, 23. [Google Scholar] [CrossRef]
  74. Ratziu, V.; Francque, S.; Behling, C.A.; Cejvanovic, V.; Cortez-Pinto, H.; Iyer, J.S.; Krarup, N.; Le, Q.; Sejling, A.S.; Tiniakos, D.; et al. Artificial intelligence scoring of liver biopsies in a phase II trial of semaglutide in nonalcoholic steatohepatitis. Hepatology 2024, 80, 173–185. [Google Scholar] [CrossRef]
  75. Tiyarattanachai, T.; Apiparakoon, T.; Chaichuen, O.; Sukcharoen, S.; Yimsawad, S.; Jangsirikul, S.; Chaikajornwat, J.; Siriwong, N.; Burana, C.; Siritaweechai, N.; et al. Artificial intelligence assists operators in real-time detection of focal liver lesions during ultrasound: A randomized controlled study. Eur. J. Radiol. 2023, 165, 110932. [Google Scholar] [CrossRef]
  76. Zhang, Y.; Cui, J.; Wan, W.; Liu, J. Multimodal Imaging under Artificial Intelligence Algorithm for the Diagnosis of Liver Cancer and Its Relationship with Expressions of EZH2 and p57. Comput. Intell. Neurosci. 2022, 2022, 4081654. [Google Scholar] [CrossRef]
  77. Briceño, J.; Cruz-Ramírez, M.; Prieto, M.; Navasa, M.; Ortiz de Urbina, J.; Orti, R.; Gómez-Bravo, M.Á.; Otero, A.; Varo, E.; Tomé, S.; et al. Use of artificial intelligence as an innovative donor-recipient matching model for liver transplantation: Results from a multicenter Spanish study. J. Hepatol. 2014, 61, 1020–1028. [Google Scholar] [CrossRef] [PubMed]
  78. Yang, M.; Zhao, Y.; Li, C.; Weng, X.; Li, Z.; Guo, W.; Jia, W.; Feng, F.; Hu, J.; Sun, H.; et al. Multimodal integration of liquid biopsy and radiology for the noninvasive diagnosis of gallbladder cancer and benign disorders. Cancer Cell 2025, 43, 398–412.e4. [Google Scholar] [CrossRef]
  79. Goyal, H.; Mann, R.; Gandhi, Z.; Perisetti, A.; Zhang, Z.; Sharma, N.; Saligram, S.; Inamdar, S.; Tharian, B. Application of artificial intelligence in pancreaticobiliary diseases. Ther. Adv. Gastrointest. Endosc. 2021, 14, 2631774521993059. [Google Scholar] [CrossRef] [PubMed]
  80. Podină, N.; Gheorghe, E.C.; Constantin, A.; Cazacu, I.; Croitoru, V.; Gheorghe, C.; Balaban, D.V.; Jinga, M.; Țieranu, C.G.; Săftoiu, A. Artificial Intelligence in Pancreatic Imaging: A Systematic Review. United Eur. Gastroenterol. J. 2025, 13, 55–77. [Google Scholar] [CrossRef]
  81. Lopez-Ramirez, F.; Syailendra, E.A.; Tixier, F.; Kawamoto, S.; Fishman, E.K.; Chu, L.C. Early detection of pancreatic cancer on computed tomography: Advancements with deep learning. Radiol. Adv. 2025, 2, umaf028. [Google Scholar] [CrossRef]
  82. Korfiatis, P.; Suman, G.; Patnam, N.G.; Trivedi, K.H.; Karbhari, A.; Mukherjee, S.; Cook, C.; Klug, J.R.; Patra, A.; Khasawneh, H.; et al. Automated Artificial Intelligence Model Trained on a Large Data Set Can Detect Pancreas Cancer on Diagnostic Computed Tomography Scans As Well As Visually Occult Preinvasive Cancer on Prediagnostic Computed Tomography Scans. Gastroenterology 2023, 165, 1533–1546.e4. [Google Scholar] [CrossRef]
  83. Udriștoiu, A.L.; Podină, N.; Ungureanu, B.S.; Constantin, A.; Georgescu, C.V.; Bejinariu, N.; Pirici, D.; Burtea, D.E.; Gruionu, L.; Udriștoiu, S.; et al. Deep learning segmentation architectures for automatic detection of pancreatic ductal adenocarcinoma in EUS-guided fine-needle biopsy samples based on whole-slide imaging. Endosc. Ultrasound 2024, 13, 335–344. [Google Scholar] [CrossRef]
  84. Shi, Y.J.; Zhang, H.; Wang, L.L.; Liu, Y.L.; Zhu, H.T.; Li, X.T.; Wei, Y.Y.; Sun, Y.S. Deep learning automatic segmentation and radiomics model for diagnosing pancreatic solid neoplasms in MRI. BMC Cancer 2025, 25, 1563. [Google Scholar] [CrossRef]
  85. Kui, B.; Pintér, J.; Molontay, R.; Nagy, M.; Farkas, N.; Gede, N.; Vincze, Á.; Bajor, J.; Gódi, S.; Czimmer, J.; et al. EASY-APP: An artificial intelligence model and application for early and easy prediction of severity in acute pancreatitis. Clin. Transl. Med. 2022, 12, e842. [Google Scholar] [CrossRef]
  86. Cui, H.; Zhao, Y.; Xiong, S.; Feng, Y.; Li, P.; Lv, Y.; Chen, Q.; Wang, R.; Xie, P.; Luo, Z.; et al. Diagnosing Solid Lesions in the Pancreas With Multimodal Artificial Intelligence: A Randomized Crossover Trial. JAMA Netw. Open 2024, 7, e2422454. [Google Scholar] [CrossRef] [PubMed]
  87. Chen, P.T.; Wu, T.; Wang, P.; Chang, D.; Liu, K.L.; Wu, M.S.; Roth, H.R.; Lee, P.C.; Liao, W.C.; Wang, W. Pancreatic Cancer Detection on CT Scans with Deep Learning: A Nationwide Population-based Study. Radiology 2023, 306, 172–182. [Google Scholar] [CrossRef] [PubMed]
  88. Kovatchev, B.; Castillo, A.; Pryor, E.; Kollar, L.L.; Barnett, C.L.; DeBoer, M.D.; Brown, S.A. Neural-Net Artificial Pancreas: A Randomized Crossover Trial of a First-in-Class Automated Insulin Delivery Algorithm. Diabetes Technol. Ther. 2024, 26, 375–382. [Google Scholar] [CrossRef]
  89. Atlas, E.; Nimri, R.; Miller, S.; Grunberg, E.A.; Phillip, M. MD-logic artificial pancreas system: A pilot study in adults with type 1 diabetes. Diabetes Care 2010, 33, 1072–1076. [Google Scholar] [CrossRef] [PubMed]
  90. Ali, A.; Alghamdi, M.; Marzuki, S.S.; Tengku Din, T.A.D.A.A.; Yamin, M.S.; Alrashidi, M.; Alkhazi, I.S.; Ahmed, N. Exploring AI Approaches for Breast Cancer Detection and Diagnosis: A Review Article. Breast Cancer 2025, 17, 927–947. [Google Scholar] [CrossRef]
  91. Debellotte, O.; Dookie, R.L.; Rinkoo, F.; Kar, A.; Salazar González, J.F.; Saraf, P.; Aflahe Iqbal, M.; Ghazaryan, L.; Mukunde, A.C.; Khalid, A.; et al. Artificial Intelligence and Early Detection of Breast, Lung, and Colon Cancer: A Narrative Review. Cureus 2025, 17, e79199. [Google Scholar] [CrossRef]
  92. Barrett, M.; Boyne, J.; Brandts, J.; Brunner-La Rocca, H.P.; De Maesschalck, L.; De Wit, K.; Dixon, L.; Eurlings, C.; Fitzsimons, D.; Golubnitschaja, O.; et al. Artificial intelligence supported patient self-care in chronic heart failure: A paradigm shift from reactive to predictive, preventive and personalised care. EPMA J. 2019, 10, 445–464. [Google Scholar] [CrossRef]
  93. Garzonis, K.; Mann, E.; Wyrzykowska, A.; Kanellakis, P. Improving Patient Outcomes: Effectively Training Healthcare Staff in Psychological Practice Skills: A Mixed Systematic Literature Review. Eur. J. Psychol. 2015, 11, 535–556. [Google Scholar] [CrossRef]
  94. Arioz, U.; Allsop, M.J.; Goodman, W.D.; Timmons, S.; Simbirtseva, K.; Mlakar, I.; Mocnik, G. Artificial intelligence-based approaches for advance care planning: A scoping review. BMC Palliat. Care 2025, 24, 268. [Google Scholar] [CrossRef]
  95. Bienefeld, N.; Keller, E.; Grote, G. AI Interventions to Alleviate Healthcare Shortages and Enhance Work Conditions in Critical Care: Qualitative Analysis. J. Med. Internet Res. 2025, 27, e50852. [Google Scholar] [CrossRef]
  96. Lång, K.; Josefsson, V.; Larsson, A.M.; Larsson, S.; Högberg, C.; Sartor, H.; Hofvind, S.; Andersson, I.; Rosso, A. Artificial intelligence-supported screen reading versus standard double reading in the Mammography Screening with Artificial Intelligence trial (MASAI): A clinical safety analysis of a randomised, controlled, non-inferiority, single-blinded, screening accuracy study. Lancet Oncol. 2023, 24, 936–944. [Google Scholar] [CrossRef] [PubMed]
  97. Dembrower, K.; Crippa, A.; Colón, E.; Eklund, M.; Strand, F.; ScreenTrustCAD Trial Consortium. Artificial intelligence for breast cancer detection in screening mammography in Sweden: A prospective, population-based, paired-reader, non-inferiority study. Lancet Digit. Health 2023, 5, e703–e711. [Google Scholar] [CrossRef] [PubMed]
  98. Sadeh-Sharvit, S.; Camp, T.D.; Horton, S.E.; Hefner, J.D.; Berry, J.M.; Grossman, E.; Hollon, S.D. Effects of an Artificial Intelligence Platform for Behavioral Interventions on Depression and Anxiety Symptoms: Randomized Clinical Trial. J. Med. Internet Res. 2023, 25, e46781. [Google Scholar] [CrossRef]
  99. Repici, A.; Spadaccini, M.; Antonelli, G.; Correale, L.; Maselli, R.; Galtieri, P.A.; Pellegatta, G.; Capogreco, A.; Milluzzo, S.M.; Lollo, G.; et al. Artificial intelligence and colonoscopy experience: Lessons from two randomised trials. Gut 2022, 71, 757–765. [Google Scholar] [CrossRef]
  100. Marcuzzi, A.; Nordstoga, A.L.; Bach, K.; Aasdahl, L.; Nilsen, T.I.L.; Bardal, E.M.; Boldermo, N.Ø.; Falkener Bertheussen, G.; Marchand, G.H.; Gismervik, S.; et al. Effect of an Artificial Intelligence-Based Self-Management App on Musculoskeletal Health in Patients with Neck and/or Low Back Pain Referred to Specialist Care: A Randomized Clinical Trial. JAMA Netw. Open 2023, 6, e2320400. [Google Scholar] [CrossRef] [PubMed]
  101. Wallace, M.B.; Sharma, P.; Bhandari, P.; East, J.; Antonelli, G.; Lorenzetti, R.; Vieth, M.; Speranza, I.; Spadaccini, M.; Desai, M.; et al. Impact of Artificial Intelligence on Miss Rate of Colorectal Neoplasia. Gastroenterology 2022, 163, 295–304.e5. [Google Scholar] [CrossRef]
  102. Liaw, S.Y.; Tan, J.Z.; Bin Rusli, K.D.; Ratan, R.; Zhou, W.; Lim, S.; Lau, T.C.; Seah, B.; Chua, W.L. Artificial Intelligence Versus Human-Controlled Doctor in Virtual Reality Simulation for Sepsis Team Training: Randomized Controlled Study. J. Med. Internet Res. 2023, 25, e47748. [Google Scholar] [CrossRef] [PubMed]
  103. Luo, H.; Xu, G.; Li, C.; He, L.; Luo, L.; Wang, Z.; Jing, B.; Deng, Y.; Jin, Y.; Li, Y.; et al. Real-time artificial intelligence for detection of upper gastrointestinal cancer by endoscopy: A multicentre, case-control, diagnostic study. Lancet Oncol. 2019, 20, 1645–1654. [Google Scholar] [CrossRef] [PubMed]
  104. Wilson, P.M.; Ramar, P.; Philpot, L.M.; Soleimani, J.; Ebbert, J.O.; Storlie, C.B.; Morgan, A.A.; Schaeferle, G.M.; Asai, S.W.; Herasevich, V.; et al. Effect of an Artificial Intelligence Decision Support Tool on Palliative Care Referral in Hospitalized Patients: A Randomized Clinical Trial. J. Pain Symptom Manag. 2023, 66, 24–32. [Google Scholar] [CrossRef]
  105. Papachristou, P.; Söderholm, M.; Pallon, J.; Taloyan, M.; Polesie, S.; Paoli, J.; Anderson, C.D.; Falk, M. Evaluation of an artificial intelligence-based decision support for the detection of cutaneous melanoma in primary care: A prospective real-life clinical trial. Br. J. Dermatol. 2024, 191, 125–133. [Google Scholar] [CrossRef] [PubMed]
  106. Kazemzadeh, K. Artificial intelligence in ophthalmology: Opportunities, challenges, and ethical considerations. Med. Hypothesis Discov. Innov. Ophthalmol. J. 2025, 14, 255–272. [Google Scholar] [CrossRef]
  107. Hosny, A.; Parmar, C.; Quackenbush, J.; Schwartz, L.H.; Aerts, H.J.W.L. Artificial intelligence in radiology. Nat. Rev. Cancer 2018, 18, 500–510. [Google Scholar] [CrossRef]
  108. Bhandari, A. Revolutionizing Radiology with Artificial Intelligence. Cureus 2024, 16, e72646. [Google Scholar] [CrossRef]
  109. Försch, S.; Klauschen, F.; Hufnagl, P.; Roth, W. Artificial Intelligence in Pathology. Dtsch. Arztebl. Int. 2021, 118, 194–204. [Google Scholar] [CrossRef]
  110. Shafi, S.; Parwani, A.V. Artificial intelligence in diagnostic pathology. Diagn. Pathol. 2023, 18, 109. [Google Scholar] [CrossRef]
  111. Sajithkumar, A.; Thomas, J.; Saji, A.M.; Ali, F.; E.K, H.H.; Adampulan, H.A.G.; Sarathchand, S. Artificial Intelligence in pathology: Current applications, limitations, and future directions. Ir. J. Med. Sci. 2024, 193, 1117–1121. [Google Scholar] [CrossRef]
  112. Hashemian, H.; Peto, T.; Ambrósio, R., Jr.; Lengyel, I.; Kafieh, R.; Muhammed Noori, A.; Khorrami-Nejad, M. Application of Artificial Intelligence in Ophthalmology: An Updated Comprehensive Review. J. Ophthalmic Vis. Res. 2024, 19, 354–367. [Google Scholar] [CrossRef]
  113. Lim, J.I.; Regillo, C.D.; Sadda, S.R.; Ipp, E.; Bhaskaranand, M.; Ramachandra, C.; Solanki, K. Artificial Intelligence Detection of Diabetic Retinopathy: Subgroup Comparison of the EyeArt System with Ophthalmologists’ Dilated Examinations. Ophthalmol. Sci. 2022, 3, 100228. [Google Scholar] [CrossRef]
  114. Li, Z.; Wang, L.; Wu, X.; Jiang, J.; Qiang, W.; Xie, H.; Zhou, H.; Wu, S.; Shao, Y.; Chen, W. Artificial intelligence in ophthalmology: The path to the real-world clinic. Cell Rep. Med. 2023, 4, 101095. [Google Scholar] [CrossRef]
  115. Johnson, K.W.; Torres Soto, J.; Glicksberg, B.S.; Shameer, K.; Miotto, R.; Ali, M.; Ashley, E.; Dudley, J.T. Artificial Intelligence in Cardiology. J. Am. Coll. Cardiol. 2018, 71, 2668–2679. [Google Scholar] [CrossRef] [PubMed]
  116. Karatzia, L.; Aung, N.; Aksentijevic, D. Artificial intelligence in cardiology: Hope for the future and power for the present. Front. Cardiovasc. Med. 2022, 9, 945726. [Google Scholar] [CrossRef]
  117. Patrascanu, O.S.; Tutunaru, D.; Musat, C.L.; Dragostin, O.M.; Fulga, A.; Nechita, L.; Ciubara, A.B.; Piraianu, A.I.; Stamate, E.; Poalelungi, D.G.; et al. Future Horizons: The Potential Role of Artificial Intelligence in Cardiology. J. Pers. Med. 2024, 14, 656. [Google Scholar] [CrossRef] [PubMed]
  118. Lotter, W.; Hassett, M.J.; Schultz, N.; Kehl, K.L.; Van Allen, E.M.; Cerami, E. Artificial Intelligence in Oncology: Current Landscape, Challenges, and Future Directions. Cancer Discov. 2024, 14, 711–726. [Google Scholar] [CrossRef]
  119. Kann, B.H.; Hosny, A.; Aerts, H.J.W.L. Artificial intelligence for clinical oncology. Cancer Cell 2021, 39, 916–927. [Google Scholar] [CrossRef] [PubMed]
  120. Shimizu, H.; Nakayama, K.I. Artificial intelligence in oncology. Cancer Sci. 2020, 111, 1452–1460. [Google Scholar] [CrossRef]
  121. Rizzo, M.; Dawson, J.D. AI in Neurology: Everything, Everywhere, All at Once Part 1: Principles and Practice. Ann. Neurol. 2025, 98, 211–230. [Google Scholar] [CrossRef]
  122. Khalilian, M.; Godefroy, O.; Roussel, M.; Mousavi, A.; Aarabi, A. Post-stroke outcome prediction based on lesion-derived features. NeuroImage Clin. 2025, 45, 103747. [Google Scholar] [CrossRef] [PubMed]
  123. Voigtlaender, S.; Pawelczyk, J.; Geiger, M.; Vaios, E.J.; Karschnia, P.; Cudkowicz, M.; Dietrich, J.; Haraldsen, I.R.J.H.; Feigin, V.; Owolabi, M.; et al. Artificial intelligence in neurology: Opportunities, challenges, and policy implications. J. Neurol. 2024, 271, 2258–2273. [Google Scholar] [CrossRef]
  124. Shrestha, U.K. Emerging role of artificial intelligence in gastroenterology and hepatology. World J. Gastroenterol. 2025, 31, 111495. [Google Scholar] [CrossRef]
  125. Urquhart, S.A.; Christof, M.; Coelho-Prabhu, N. The impact of artificial intelligence on the endoscopic assessment of inflammatory bowel disease-related neoplasia. Ther. Adv. Gastroenterol. 2025, 18, 17562848251348574. [Google Scholar] [CrossRef] [PubMed]
  126. El-Sayed, A.; Lovat, L.B.; Ahmad, O.F. Clinical Implementation of Artificial Intelligence in Gastroenterology: Current Landscape, Regulatory Challenges, and Ethical Issues. Gastroenterology 2025, 169, 518–530. [Google Scholar] [CrossRef]
  127. Nahm, W.J.; Sohail, N.; Burshtein, J.; Goldust, M.; Tsoukas, M. Artificial Intelligence in Dermatology: A Comprehensive Review of Approved Applications, Clinical Implementation, and Future Directions. Int. J. Dermatol. 2025, 64, 1568–1583. [Google Scholar] [CrossRef] [PubMed]
  128. Young, A.T.; Xiong, M.; Pfau, J.; Keiser, M.J.; Wei, M.L. Artificial Intelligence in Dermatology: A Primer. J. Investig. Dermatol. 2020, 140, 1504–1512. [Google Scholar] [CrossRef]
  129. Fliorent, R.; Fardman, B.; Podwojniak, A.; Javaid, K.; Tan, I.J.; Ghani, H.; Truong, T.M.; Rao, B.; Heath, C. Artificial intelligence in dermatology: Advancements and challenges in skin of color. Int. J. Dermatol. 2024, 63, 455–461. [Google Scholar] [CrossRef]
  130. Cruz-Gonzalez, P.; He, A.W.; Lam, E.P.; Ng, I.M.C.; Li, M.W.; Hou, R.; Chan, J.N.; Sahni, Y.; Vinas Guasch, N.; Miller, T.; et al. Artificial intelligence in mental health care: A systematic review of diagnosis, monitoring, and intervention applications. Psychol. Med. 2025, 55, e18. [Google Scholar] [CrossRef]
  131. Prégent, J.; Chung, V.H.; El Adib, I.; Désilets, M.; Hudon, A. Applications of Artificial Intelligence in Psychiatry and Psychology Education: Scoping Review. JMIR Med. Educ. 2025, 11, e75238. [Google Scholar] [CrossRef] [PubMed]
  132. Garcia, G. The role of AI in transforming psychiatric-mental health care: Enhancing the role of psychiatric-mental health nurse practitioners. Nurs. Outlook 2025, 73, 102461. [Google Scholar] [CrossRef]
  133. Niazi, S.K.; Mariam, Z. Artificial intelligence in drug development: Reshaping the therapeutic landscape. Ther. Adv. Drug Saf. 2025, 16, 20420986251321704. [Google Scholar] [CrossRef]
  134. Zhang, K.; Yang, X.; Wang, Y.; Yu, Y.; Huang, N.; Li, G.; Li, X.; Wu, J.C.; Yang, S. Artificial intelligence in drug development. Nat. Med. 2025, 31, 45–59. [Google Scholar] [CrossRef]
  135. Fu, C.; Chen, Q. The future of pharmaceuticals: Artificial intelligence in drug discovery and development. J. Pharm. Anal. 2025, 15, 101248. [Google Scholar] [CrossRef]
  136. Baghbani, S.; Mehrabi, Y.; Movahedinia, M.; Babaeinejad, E.; Joshaghanian, M.; Amiri, S.; Shahrezaee, M. The revolutionary impact of artificial intelligence in orthopedics: Comprehensive review of current benefits and challenges. J. Robot. Surg. 2025, 19, 511. [Google Scholar] [CrossRef]
  137. Wu, S.; Miao, Y.; Mei, J.; Xiong, S. The Rise of Artificial Intelligence in Orthopedics: A Bibliometric and Visualization Analysis. J. Multidiscip. Healthc. 2025, 18, 6037–6050. [Google Scholar] [CrossRef]
  138. Song, J.; Wang, G.C.; Wang, S.C.; He, C.R.; Zhang, Y.Z.; Chen, X.; Su, J.C. Artificial intelligence in orthopedics: Fundamentals, current applications, and future perspectives. Mil. Med. Res. 2025, 12, 42. [Google Scholar] [CrossRef]
  139. Smith, M.E.; Zalesky, C.C.; Lee, S.; Gottlieb, M.; Adhikari, S.; Goebel, M.; Wegman, M.; Garg, N.; Lam, S.H.F. Artificial Intelligence in Emergency Medicine: A Primer for the Nonexpert. J. Am. Coll. Emerg. Physicians Open 2025, 6, 100051. [Google Scholar] [CrossRef] [PubMed]
  140. Stewart, J.; Sprivulis, P.; Dwivedi, G. Artificial intelligence and machine learning in emergency medicine. Emerg. Med. Australas. EMA 2018, 30, 870–874. [Google Scholar] [CrossRef] [PubMed]
  141. Amiot, F.; Potier, B. Artificial Intelligence (AI) and Emergency Medicine: Balancing Opportunities and Challenges. JMIR Med. Inform. 2025, 13, e70903. [Google Scholar] [CrossRef] [PubMed]
  142. Kerth, J.L.; Bischops, A.C.; Hagemeister, M.; Reinhart, L.; Konrad, K.; Heinrichs, B.; Meissner, T. Künstliche Intelligenz in der Gesundheitsvorsorge von Kindern und Jugendlichen—Anwendungsmöglichkeiten und Akzeptanz [Artificial intelligence in preventive medicine for children and adolescents-applications and acceptance]. Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2025, 68, 907–914. [Google Scholar] [CrossRef] [PubMed]
  143. Dinc, R.; Ardic, N. The Next Frontiers in Preventive and Personalized Healthcare: Artificial Intelligent-powered Solutions. J. Prev. Med. Public Health = Yebang Uihakhoe Chi 2025, 58, 441–452. [Google Scholar] [CrossRef] [PubMed]
  144. Whiteson, H.Z.; Frishman, W.H. Artificial Intelligence in the Prevention and Detection of Cardiovascular Disease. Cardiol. Rev. 2025, 33, 239–242. [Google Scholar] [CrossRef]
  145. Lopes, S.; Rocha, G.; Guimarães-Pereira, L. Artificial intelligence and its clinical application in Anesthesiology: A systematic review. J. Clin. Monit. Comput. 2024, 38, 247–259. [Google Scholar] [CrossRef]
  146. Dost, A.; Alaraj, R.; Mayet, R.; Agrawal, D.K. Reshaping Anesthesia with Artificial Intelligence: From Concept to Reality. Anesth. Crit. Care 2025, 7, 77–90. [Google Scholar] [CrossRef]
  147. Hashimoto, D.A.; Witkowski, E.; Gao, L.; Meireles, O.; Rosman, G. Artificial Intelligence in Anesthesiology: Current Techniques, Clinical Applications, and Limitations. Anesthesiology 2020, 132, 379–394. [Google Scholar] [CrossRef]
  148. Scott, I.A.; van der Vegt, A.; Lane, P.; McPhail, S.; Magrabi, F. Achieving large-scale clinician adoption of AI-enabled decision support. BMJ Health Care Inform. 2024, 31, e100971. [Google Scholar] [CrossRef]
  149. Fihn, S.; Saria, S.; Mendonça, E.; Hain, S.; Matheny, M.; Shah, N.; Liu, H.; Auerbach, A. Deploying artificial intelligence in clinical settings. In Artificial Intelligence in Health Care: The Hope, the Hype, the Promise, the Peril; Whicher, D., Ahmed, M., Israni, S.T., Eds.; National Academies Press: Washington, DC, USA, 2023. Available online: https://www.ncbi.nlm.nih.gov/books/NBK605954/ (accessed on 20 December 2025).
  150. Bahl, M. Artificial intelligence in clinical practice: Implementation considerations and barriers. J. Breast Imaging 2022, 4, 632–639. [Google Scholar] [CrossRef]
  151. de Hond, A.A.H.; Leeuwenberg, A.M.; Hooft, L.; Kant, I.M.J.; Nijman, S.W.J.; van Os, H.J.A.; Aardoom, J.J.; Debray, T.P.A.; Schuit, E.; van Smeden, M.; et al. Guidelines and quality criteria for artificial intelligence-based prediction models in healthcare: A scoping review. npj Digit. Med. 2022, 5, 2. [Google Scholar] [CrossRef]
  152. Liu, M.; Ning, Y.; Teixayavong, S.; Liu, X.; Mertens, M.; Shang, Y.; Li, X.; Miao, D.; Liao, J.; Xu, J.; et al. A scoping review and evidence gap analysis of clinical AI fairness. npj Digit. Med. 2025, 8, 360. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic representation of the integration of artificial intelligence in medicine: from multidimensional patient data collection to clinical application.
Figure 2. Main applications of artificial intelligence in cardiology.
Figure 3. The role of AI in clinical neurology: supporting diagnostic imaging, personalizing treatment, predicting risk, and continuously monitoring patients to optimize clinical decisions.
Figure 4. Advantages and disadvantages of using AI in the diagnosis and treatment of liver diseases.
Table 2. Summary of the latest reports in the field of pulmonology.
Authors | Methodology | Results | Significance
Ren, F., et al. [50] | AI-driven target identification selected TNIK as a high-confidence anti-fibrotic target. A generative AI platform designed and optimized the small-molecule TNIK inhibitor INS018_055. The compound was tested in multiple preclinical in vivo models of lung, liver, skin, and kidney fibrosis (oral, inhaled, and topical routes) and in anti-inflammatory models. Safety, tolerability, and pharmacokinetics were assessed in two randomized, double-blind, placebo-controlled phase I trials (NCT05154240 and CTR20221542) involving 78 healthy volunteers. | INS018_055 showed potent pan-organ anti-fibrotic activity and additional anti-inflammatory effects across preclinical models via all tested administration routes. Both phase I trials confirmed favorable safety, tolerability, and pharmacokinetic profiles with no serious adverse events. The entire process from target identification to clinical candidate nomination was completed in approximately 18 months. | This study demonstrates the first successful clinical translation of a generative AI-discovered drug for idiopathic pulmonary fibrosis. By identifying TNIK as a novel anti-fibrotic target and rapidly delivering a safe, orally/inhalably bioavailable inhibitor (INS018_055), it validates an ultra-fast AI-powered drug discovery pipeline capable of dramatically accelerating therapeutic development for high-unmet-need fibrotic diseases.
Xu, Z., et al. [51] | Multicenter, double-blind, randomized, placebo-controlled phase 2a trial (NCT05938920); 71 patients with idiopathic pulmonary fibrosis were randomized 1:1:1:1 to rentosertib (AI-discovered first-in-class TNIK inhibitor) 30 mg QD, 30 mg BID, 60 mg QD, or placebo for 12 weeks. Primary endpoint: safety and tolerability (treatment-emergent adverse events). Key secondary endpoints: pharmacokinetics and change in forced vital capacity (FVC). | Safety profile was favorable and comparable across groups (TEAEs 70–83%); serious treatment-related AEs were rare, with liver toxicity and diarrhea as the main reasons for discontinuation. At the highest dose (60 mg QD), FVC increased by +98.4 mL (95% CI 10.9–185.9) versus a decline of −20.3 mL (95% CI −116.1 to 75.6) with placebo, indicating a dose-dependent signal of lung function stabilization/improvement. | Rentosertib is the first fully AI-generated novel small molecule (new target + new chemical entity) to demonstrate clinical proof-of-concept in a phase 2 trial. The positive safety profile and early evidence of FVC benefit in IPF validate generative AI as a viable end-to-end drug discovery engine capable of delivering clinical-stage candidates for previously intractable diseases.
Habicher, M., et al. [52] | Single-center, single-blinded randomized controlled trial; 150 patients undergoing lung surgery with single-lung ventilation were randomized 1:1 to either AI-based goal-directed hemodynamic therapy using the Hypotension Prediction Index (HPI) or standard care without a specific protocol. | The HPI-guided intervention significantly reduced the number of hypotensive episodes (0 [0–1] vs. 1 [0–2], p = 0.01), duration of hypotension (0 vs. 2.33 min, p = 0.01), area under MAP < 65 mmHg (0 vs. 10.67 mmHg·min, p < 0.01), and time-weighted average below MAP 65 (0 vs. 0.07 mmHg, p < 0.01). Postoperative AKI rates were similar (6.7% vs. 4.2%, p = 0.72); MINS showed a strong trend toward reduction (17.1% vs. 31.8%, p = 0.07), with a similar trend for postoperative infections. | AI-driven predictive hemodynamic management using the HPI effectively minimized intraoperative hypotension during single-lung ventilation surgery. Although AKI was not significantly reduced, the observed trends toward lower myocardial injury and infections suggest potential clinical benefit, supporting broader adoption of predictive AI tools for perioperative hemodynamic optimization.
Hong, L., et al. [53] | Two prospective randomized controlled trials in COPD patients: (1) a 12-month RCT with 447 patients assessing an AI-based medical intervention vs. standard care, with quality-of-life (QoL) and psychological outcomes measured at 4 and 12 months; (2) a separate 9-month RCT with 101 patients randomized to a web-based AI-driven educational and exercise program for prevention of acute exacerbations versus control. | At 4 months, no significant QoL improvement was observed. At 12 months, the AI-intervention group showed significantly better quality of life, emotional status, and psychological well-being compared with controls. In the second trial, the AI-supported group had lower hospitalization rates and shorter length of hospital stay than the control group. Single-factor analysis did not reach statistical significance for all outcomes, but overall results were positive. | AI-supported medical and educational interventions appear feasible and effective in long-term COPD management, improving quality of life and psychological health and reducing acute exacerbations and hospitalizations after 9–12 months. Although some individual endpoints lacked statistical power, the trials provide preliminary evidence supporting the integration of artificial intelligence into routine COPD care and secondary prevention.
Ladbury, C., Li, R., et al. [54] | Secondary analysis of the RTOG 0617 trial; multiple machine learning models (including XGBoost, random forest, and naive Bayes) were trained on clinical and dosimetric variables to predict grade ≥ 3 pulmonary, cardiac, and esophageal toxicity after definitive chemoradiation for locally advanced NSCLC. The best models were interpreted using SHAP values to identify and quantify dosimetric thresholds, validated with logistic regression and bootstrapping. | XGBoost best predicted pulmonary toxicity (AUC 0.739), random forest cardiac toxicity (AUC 0.706), and naive Bayes esophageal toxicity (AUC 0.721), all outperforming traditional logistic regression. Key thresholds: lung mean dose > 18 Gy and V20 > 37% for pulmonary toxicity (OR 2.47 and 2.72); esophageal mean dose > 34 Gy and V20 > 37% for esophageal toxicity (OR 4.01 and 3.73). No significant cardiac thresholds were identified. | Machine learning combined with explainable AI (SHAP) validated known, and identified new, clinically actionable dosimetric thresholds for radiation-induced toxicity, outperforming conventional statistical methods. This approach enables data-driven, precise optimization of radiotherapy planning constraints in lung cancer chemoradiation.
Pasipanodya, J. G., et al. [55] | Nested pharmacokinetic substudy within the OFLOTUB trial; 126 patients with drug-susceptible pulmonary TB received a 4-month gatifloxacin-containing regimen. Intensive PK sampling was performed on two days in the first 2 months. Therapy failure was defined as failure to culture-convert, relapse (confirmed by spoligotyping), or death within 24 months. An ensemble machine-learning approach (multiple algorithms) was used to rank 27 clinical/laboratory/PK variables and detect interactions predicting outcome. | 19/126 patients (15%) had unfavorable outcomes. Machine learning ranked pyrazinamide and rifampicin exposure (Cmax and AUC) as more important than gatifloxacin exposure. A significant antagonistic 3-way interaction between low concentrations of pyrazinamide, gatifloxacin, and rifampicin was identified; this negative interaction disappeared when rifampicin Cmax exceeded 7 mg/L. Drug concentrations explained 31–75% of outcome variance across sites. | Concentration-dependent antagonism among the three key drugs contributes to treatment failure in shortened TB regimens but can be overcome by higher rifampicin exposure (>7 mg/L). The findings provide a mechanistic explanation for prior trial failures and strongly support dose optimization of both rifampicin and fluoroquinolones to improve efficacy of short-course TB regimens.
Ding, Y., et al. [56] | Retrospective analysis of 424 patients with pulmonary nodules undergoing surgical resection; all had preoperative CT-based AI malignancy probability score, 7-autoantibody (7-AAB) panel, and CEA testing. Patients were randomly split 1:1 into training (n = 212) and validation (n = 212) sets. A logistic regression-based nomogram was built using forward stepwise selection of significant predictors (age, AI score, 7-AAB result, CEA) and internally validated. | The nomogram achieved AUC 0.899, sensitivity 82.3%, specificity 90.5%, and PPV 97.2% in the validation set. It significantly outperformed 7-AAB alone (sensitivity 82.3% vs. 35.9%, p < 0.001), CEA alone (sensitivity 82.3% vs. 18.8%, p < 0.001), and standalone CT-AI (specificity 90.5% vs. 69.0%, p = 0.022). For nodules ≤ 2 cm, nomogram specificity remained high at 90.0% vs. 67.5% for AI alone (p = 0.022). | Integration of CT-based AI, the 7-AAB panel, age, and CEA into a simple nomogram substantially improves diagnostic accuracy for pulmonary nodules, especially small (≤2 cm) lesions, offering higher sensitivity than biomarkers alone and higher specificity than AI alone. This non-invasive, clinically practical tool can reduce unnecessary surgeries while maintaining excellent malignancy detection.
Bosman, S., et al. [57] | Prospective diagnostic accuracy study at health facilities in Lesotho and South Africa; 1392 symptomatic adults (≥1 TB symptom) underwent digital chest X-ray with CAD4TBv7 analysis, point-of-care CRP, and microbiological confirmation (Xpert MTB/RIF Ultra + liquid culture as composite reference standard). CAD4TBv7 performance was compared to CRP and expert radiologist reading. | CAD4TBv7 AUC was 0.87 (95% CI 0.84–0.91) vs. 0.80 for CRP (95% CI 0.76–0.84). At ≥90% sensitivity, CAD4TBv7 specificity was 68.2% (95% CI 65.4–71.0%), nearly meeting the WHO TPP (>70%), while CRP specificity was only 38.2%. CAD4TBv7 performed equivalently to expert radiologist interpretation. | CAD4TBv7 (version 7) is a highly accurate, non-sputum triage test that nearly achieves WHO TPP criteria in high TB/HIV-burden settings, enabling rapid rule-out of TB and prioritization of confirmatory testing. It significantly outperforms CRP and matches expert radiology, supporting its deployment as a scalable frontline TB screening tool.
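Several rows above (Bosman et al.; Ding et al.) report specificity at a fixed sensitivity floor, as required by the WHO target product profile for TB triage (≥90% sensitivity, >70% specificity). The sketch below illustrates that threshold-selection logic in a minimal form; the `operating_point` function and the synthetic scores and labels are our own illustration, not part of CAD4TBv7 or any study's code.

```python
# Illustrative sketch (not the CAD4TBv7 algorithm): pick the triage
# threshold that keeps sensitivity >= 90%, then read off the resulting
# specificity. Scores and labels below are synthetic.

def operating_point(scores, labels, min_sensitivity=0.90):
    """Return (threshold, sensitivity, specificity) for the highest
    threshold whose sensitivity meets the floor."""
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        if sens >= min_sensitivity:
            # First (highest) qualifying threshold is the most specific
            # operating point compatible with the sensitivity floor.
            return (t, sens, spec)
    return None

# Synthetic scores: disease-positive cases tend to score higher.
positives = [0.95, 0.90, 0.88, 0.85, 0.80, 0.78, 0.75, 0.70, 0.60, 0.30]
negatives = [0.55, 0.50, 0.45, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10]
scores = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

t, sens, spec = operating_point(scores, labels)
print(f"threshold={t}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because sensitivity only grows and specificity only shrinks as the threshold is lowered, scanning from high to low and stopping at the first threshold meeting the sensitivity floor yields the best achievable specificity for a rule-out test.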
Table 3. Summary of the latest reports in the field of neurology.
Authors | Methodology | Results | Significance
Gombolay, G. Y., et al. [63] | Randomized, blinded vignette-based study comparing 81 neurologists/child neurologists (members of the Child Neurology Society and American Academy of Neurology) with 284 general-population participants. Each received AI-based diagnostic recommendations accompanied by one of eight randomly assigned explainable AI (xAI) methods (decision tree, crowd-sourced agreement, case-based reasoning, probability scores, counterfactuals, feature importance, templated explanations, or no explanation). Primary outcomes: task performance, perceived explainability, trust, and social competence of the DSS. | Decision trees were rated significantly more explainable by neurologists than by the general population (p < 0.01) and more explainable than probability scores among neurologists (p < 0.001). Higher clinical experience and higher perceived explainability paradoxically correlated with worse task performance (p = 0.0214). Performance was driven by perceived explainability rather than the specific xAI technique used. | Different xAI techniques are not equally effective across clinicians and laypeople; neurologists strongly prefer decision-tree explanations, and there is no universal “best” xAI method. Perceived explainability can negatively affect diagnostic accuracy in experts, highlighting the need for clinician-centered, personalized xAI design rather than one-size-fits-all approaches in clinical decision-support systems.
Alagapan, S., et al. [64] | Prospective study in 10 patients with treatment-resistant depression receiving subcallosal cingulate (SCC) DBS with a bidirectional implant allowing chronic local field potential (LFP) recording (NCT01984710). SCC LFPs from 6 patients were analyzed using explainable AI to derive individualized electrophysiological biomarkers of clinical state. Preoperative structural/functional connectivity of the target network was quantified with tractography and resting-state fMRI; objective mood changes were measured via automated video-based facial expression analysis. | At 24 weeks, 90% of patients were responders and 70% achieved remission. Explainable AI identified stable, patient-specific SCC LFP biomarkers that accurately tracked clinical state, discriminated therapeutic from transient stimulation effects, and responded to programming changes. Recovery trajectories strongly correlated with preoperative integrity of the white-matter treatment network and were objectively mirrored by changes in data-driven facial expression metrics. | This is the first demonstration of chronic, individualized electrophysiological biomarkers from the SCC DBS target that objectively track clinical state in TRD. Combined with preoperative connectivity and automated behavioral analysis, these biomarkers enable personalized, biomarker-guided DBS management, reduce reliance on subjective reporting, and explain inter-individual variability in long-term outcome of SCC DBS for severe depression.
Boutet, A., et al. [65] | Prospective observational trial in 67 PD patients with chronic DBS; 3T fMRI was performed under clinically optimized (ON) and deliberately non-optimal (OFF) stimulation settings. Brain response patterns were compared, and a machine learning classifier (trained on 39 patients with a priori known optimal settings) was built to predict optimal vs. non-optimal stimulation from whole-brain fMRI activation maps. | Optimal DBS produced a distinct fMRI signature with strong engagement of the motor network. The ML model classified optimal vs. non-optimal settings with 88% accuracy in the training cohort and successfully generalized to unseen held-out data, including stimulation-naïve patients, correctly predicting the clinically optimal settings. | fMRI-based brain response patterns serve as an objective, patient-specific biomarker of therapeutic DBS efficacy in Parkinson’s disease. This proof-of-concept demonstrates the feasibility of imaging-guided DBS programming, potentially reducing the number of clinical visits and accelerating the identification of optimal stimulation parameters.
Cowan, R. P., et al. [66] | Cross-sectional diagnostic accuracy study at three academic headache centers; 212 adults completed both a self-administered web-based Computer-based Diagnostic Engine (CDE) and a semi-structured telephone interview (SSI) by headache specialists, both strictly applying ICHD-3 criteria. Order of administration was randomized; concordance and accuracy metrics (Cohen’s kappa, sensitivity, specificity, PPV/NPV) were calculated using SSI as the reference standard. | Excellent concordance between CDE and SSI for migraine/probable migraine diagnosis (κ = 0.83, 95% CI 0.75–0.91). CDE showed sensitivity 90.1% (95% CI 83.6–94.6%), specificity 95.8% (95% CI 88.1–99.1%), PPV 97.0%, and NPV 86.6% at 60% study prevalence. At a population prevalence of 10%, PPV was 70.3% and NPV 98.9%. | A fully automated, self-administered online diagnostic tool using ICHD-3 logic achieves near-specialist-level accuracy in migraine diagnosis. It reliably rules in migraine (high specificity) and rules out migraine (high sensitivity), offering a scalable, valid solution to reduce diagnostic delay and the need for specialist interviews in both clinical and research settings.
Gorenshtein, A., et al. [67] | Prospective single-center randomized controlled trial; 200 patients referred for electrodiagnostic (EDX) studies were randomized 1:1 to physician-only interpretation (control) or physician + AI-assisted interpretation using the multi-agent INSPIRE framework (intervention). Three board-certified physicians rotated across arms. In the intervention arm, physicians reviewed and integrated an AI-generated preliminary report. Primary outcome: report quality measured by the AIGERS score (0–1). Secondary outcomes: physician-rated AI integration (PAIR) and a usability survey. | AI-generated preliminary reports showed only moderate consistency. Final integrated physician-AI reports did not significantly outperform physician-only reports on AIGERS (no statistical difference, p > 0.05). Physicians rated trust in AI suggestions moderately (3.7/5) but scored efficiency (2.0/5), ease of use (1.7/5), and workload reduction (1.7/5) poorly, citing interpretability issues and workflow disruption. | In real-world clinical use, the tested AI-assisted multi-agent framework (INSPIRE) did not improve EDX report quality over expert physician interpretation alone and was perceived as cumbersome. Current limitations in usability and workflow integration highlight that AI tools for complex interpretive tasks like EDX require substantial improvement before they can deliver meaningful clinical benefit.
Davidovic, V., et al. [68] | Cross-sectional cohort follow-up of a randomized controlled trial (NCT06273579) at McGill University Neurosurgical Simulation Centre; 87 medical students were block-randomized to three training conditions on the NeuroVR virtual reality simulator: pure AI-tutor feedback, scripted human instruction, or AI-augmented personalized human instruction (human instructor received real-time AI-detected error data). Participants performed repeated simulated tumor resections. Primary measure: feedback frequency (as proxy for errors); secondary: objective performance metrics (healthy tissue removal, instrument separation, aspirator force). | By the third repetition, the AI-augmented personalized instruction group required significantly fewer total feedback instances (IRR 1.50, 95% CI 1.16–1.94, p < 0.001) and high-force aspirator corrections (IRR 1.71, 95% CI 1.15–2.55, p = 0.002) than in earlier repetitions. Compared to pure AI-tutor instruction, AI-augmented human instruction achieved significantly less healthy tissue removal (p = 0.01), smaller instrument tip separation (mean ratio 1.25, p = 0.008), and lower aspirator force (mean ratio 1.68, p < 0.001), with sustained improvement across all metrics from baseline. | Real-time AI error detection fed to a human instructor (AI-augmented personalized instruction) markedly reduces feedback frequency (indicating fewer errors) and produces superior technical skill acquisition compared to either AI-only or traditional scripted human teaching. This hybrid approach leverages the strengths of both AI precision and human pedagogical adaptability, establishing a new benchmark for effective VR-based surgical training.
Hassan, A. E., et al. [69] | Observational before-and-after study within the ongoing EMBOLISE trial (NCT04402632) at a single large comprehensive stroke center. Pre-AI period: 153 days (5 May–6 October 2021); post-AI period: 316 days after activation of Viz RECRUIT SDH (6 October 2021–18 August 2022). The AI platform automatically analyzed all non-contrast head CTs, flagged suspected subacute/chronic SDH, and calculated volume, thickness, and midline shift. All AI alerts were manually reviewed to assess positive predictive value (PPV) and enrollment impact. | Pre-AI: 5 patients enrolled (0.99/month), 1 screen failure. Post-AI: 14 patients enrolled (1.35/month), representing a 36% increase in enrollment rate and zero screen failures. Of 6244 processed CTs, 207 SDH detections (3% prevalence); PPV of the algorithm was 81.4% (95% CI 75.3–86.7%). Median response time to alerts: 50% viewed within 1 h, 35% within 10 min. | Real-time AI-based automated screening of routine head CTs (Viz RECRUIT SDH) significantly accelerated patient identification and enrollment in a randomized trial for chronic/subacute SDH by 36%, eliminated screen failures, and demonstrated high real-world performance (PPV > 80%) and rapid clinical response. This validates AI-driven mobile platforms as powerful tools to improve recruitment efficiency in time-sensitive neurosurgical trials.
Macea, J., et al. [70] | Prospective observational study within the SeizeIT2 trial (NCT04284072); 223 in-hospital overnight recordings from 50 epilepsy patients were simultaneously captured with full polysomnography (including standard EEG) and a wearable EEG + accelerometry device. A single deep-learning model performed automated 30 s epoch sleep staging on both modalities. Automated scoring was compared against consensus clinical expert scoring on 20 nights (one per patient) using accuracy, Cohen’s kappa, F1-scores, and Bland–Altman analysis. Mixed-effects models compared sleep macrostructure between patients with and without in-hospital seizures. | Automated staging showed moderate agreement with expert scoring on standard EEG (accuracy 0.73, κ = 0.59) and lower agreement on wearable data (accuracy 0.61, κ = 0.43). Sensitivity was poor for N1 across both modalities; wearable-based staging systematically underestimated total sleep time and most stages except N2. Patients with seizures slept significantly longer (6.37 h, 95% CI 5.86–7.87) than those without (5.68 h, 95% CI 5.24–6.13; p = 0.001) and spent more time in N2. | Wearable-based deep learning can perform automated sleep staging in epilepsy patients with moderate accuracy, sufficient to detect clinically relevant differences (longer sleep and more N2 in patients with seizures). However, current performance (especially poor N1 detection and systematic underestimation by the wearable) requires further model improvement before reliable clinical or research use in epilepsy monitoring.
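The Cowan et al. row above reports that PPV falls from 97.0% at the study's 60% migraine prevalence to 70.3% at a 10% population prevalence, while NPV rises from 86.6% to 98.9%. This shift follows directly from Bayes' rule applied to the reported sensitivity (90.1%) and specificity (95.8%). The minimal sketch below approximately reproduces those figures; the `ppv_npv` helper is our own illustration, not the study's code.

```python
# How PPV and NPV shift with prevalence for a fixed test, via Bayes' rule.
# Sensitivity and specificity are taken from the Cowan et al. row above.

def ppv_npv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence            # true positives (per unit population)
    fp = (1 - specificity) * (1 - prevalence)  # false positives
    tn = specificity * (1 - prevalence)      # true negatives
    fn = (1 - sensitivity) * prevalence      # false negatives
    return tp / (tp + fp), tn / (tn + fn)

sens, spec = 0.901, 0.958  # CDE vs. specialist interview (SSI) as reference

for prev in (0.60, 0.10):
    ppv, npv = ppv_npv(sens, spec, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The small residual discrepancy at 10% prevalence (70.4% here vs. the reported 70.3%) reflects rounding of the published sensitivity and specificity; the qualitative point stands: a test that rules in reliably in a specialty clinic loses much of its positive predictive value in low-prevalence screening.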
Table 7. A summary of applications of artificial intelligence in various fields of medicine—an overview of current methods, goals and implementation status.
Medical Area | AI Methods | Purposes | Status
Radiology | Deep Learning (DL), Convolutional Neural Networks (CNN), Diffusion Models, Machine Learning (ML), Computer Vision, Large Multimodal Models (LMMs) | Image analysis, diagnosis (e.g., cancer detection in MRI/CT), report generation, anomaly detection, image enhancement, segmentation, triaging acute diseases (e.g., stroke, pneumothorax). | Established, with emerging generative AI for synthetic imaging and bias mitigation [106,107,108].
Pathology | DL, CNN, ML, Neural Networks, Virtual Staining | Histopathology classification, mutation prediction, diagnosis (e.g., tumor subclassification), prognosis prediction, biomarker analysis. | Established, with emerging applications in precision pathology and immunotherapy outcome prediction [109,110,111].
Ophthalmology | DL, ML, CNN | Diabetic retinopathy screening, macular edema detection, glaucoma staging, refractive error detection. | Emerging, fastest-growing field with high publication growth [112,113,114].
Cardiology | ML, DL, LMMs, CNN | ECG interpretation, arrhythmia detection, cardiovascular risk prediction, echocardiography analysis. | Established, with emerging multimodal analysis for improved accuracy [115,116,117].
Oncology | DL, ML, CNN, Generative AI | Cancer detection/subtyping, treatment planning, prognosis, drug response prediction, precision therapy. | Established, emerging in genomic subtyping and AI for chemotherapy dose optimization [118,119,120].
Neurology | DL, LMMs, Transformer-based Models, ML | Lesion detection, brain condition distinction, stroke triage, neurosurgery outcome prediction. | Emerging, with multimodal integration for diagnostics [121,122,123].
Gastroenterology | CNN, Support Vector Machines (SVM), ML, DL | Neoplasia detection, polyp identification, inflammatory bowel disease prediction, endoscopy assistance. | Established, emerging integration with genomic data [124,125,126].
Dermatology | CNN, DL, ML | Skin lesion classification, melanoma detection, triage of lesions. | Established, addressing health disparities through data mining [127,128,129].
Mental Health/Psychiatry | NLP, LLMs, ML, Chatbots | Therapy delivery, psychosis prediction, mental health monitoring, patient engagement. | Emerging, with AI chatbots for accessible care and relapse prediction [130,131,132].
Drug Discovery & Development | ML, DL, NLP, Generative AI | Drug repurposing, target identification, toxicity prediction, vaccine design, clinical trial management. | Emerging, accelerating personalized medicine and synthetic data generation [133,134,135].
Orthopedics | ML, CNN, Diffusion Models | Fracture detection, osteoarthritis prediction, surgical outcome prediction, gait analysis. | Emerging, patient-specific biomechanical testing [136,137,138].
Emergency Medicine | ML, CNN, DL | Patient triage, risk stratification, hyperglycemic crises prediction, decision support. | Established, faster triage for acute conditions [139,140,141].
Preventive Medicine | ML, DL, Predictive Analytics | Risk assessment, disease progression prediction, population health management. | Emerging, second-fastest-growing field with wearable integration [142,143,144].
Anesthesiology | ML, DL | Depth of anesthesia monitoring, pain identification, intraoperative support. | Established, with emerging decision support techniques [145,146,147].

Share and Cite

MDPI and ACS Style

Łoś, A.; Bartusik-Aebisher, D.; Mytych, W.; Aebisher, D. Applications of Artificial Intelligence in Selected Internal Medicine Specialties: A Critical Narrative Review of the Latest Clinical Evidence. Algorithms 2026, 19, 54. https://doi.org/10.3390/a19010054


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
