Review

Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions

1 Clinical Pharmacy & Pharmacology, RAK College of Pharmacy, RAK Medical and Health Sciences University, Ras Al Khaimah 11172, United Arab Emirates
2 Amity Institute of Pharmacy, Amity University, Panchgaon, Gurgaon 122412, India
3 Department of Management Studies, Jamia Hamdard, New Delhi 110062, India
4 Royal Cornwall Hospital Trust, NHS, Truro TR1 3LJ, UK
5 Department of Pharmacy, Jagannath University, Bahadurgarh 124507, India
6 Geeta Institute of Pharmacy, Geeta University, Naultha, Panipat 132145, India
* Authors to whom correspondence should be addressed.
BioMedInformatics 2025, 5(3), 37; https://doi.org/10.3390/biomedinformatics5030037
Submission received: 12 May 2025 / Revised: 22 June 2025 / Accepted: 1 July 2025 / Published: 7 July 2025

Abstract

Generative artificial intelligence (AI) has been rapidly transforming healthcare systems since the public release of OpenAI’s ChatGPT in late 2022. It encompasses a class of machine learning techniques designed to create new content and is broadly classified into large language models (LLMs) for text generation and image-generating models for creating or enhancing visual data. These generative AI models have shown widespread applications in clinical practice and research, ranging from medical documentation and diagnostics to patient communication and drug discovery. They are capable of generating text messages, answering clinical questions, interpreting CT and MRI images, assisting in rare diagnoses, discovering new molecules, and providing medical education and training. Early studies have indicated that generative AI models can improve efficiency, reduce administrative burdens, and enhance patient engagement, although most findings are preliminary and require rigorous validation. However, the technology also raises serious concerns around accuracy, bias, privacy, ethical use, and clinical safety. Regulatory bodies, including the FDA and EMA, are beginning to define governance frameworks, while academic institutions and healthcare organizations emphasize the need for transparency, supervision, and evidence-based implementation. Generative AI is not a replacement for medical professionals but a potential partner, augmenting decision-making, streamlining communication, and supporting personalized care. Its responsible integration into healthcare could mark a paradigm shift toward more proactive, precise, and patient-centered systems.

1. Introduction

Generative artificial intelligence (AI) produces new content through algorithms that generate text, images, audio, and synthetic data resembling human-created output. It has gained rapid medical attention over the last few years following OpenAI’s launch of ChatGPT in late 2022. These models, trained on vast biomedical and general datasets, have shown capabilities that numerous experts predict will transform healthcare delivery and research [1]. ChatGPT performed at or near the passing threshold on the United States Medical Licensing Examination (USMLE) without receiving any training for medical domains. The medical community reacted with astonishment to this achievement because it demonstrated AI’s exceptional ability to process complex medical inquiries [2]. Therefore, quick adoption and experimentation began with available generative AI tools, including ChatGPT. These tools have shown the potential to enhance healthcare delivery [3]. The medical field acknowledges both the potential benefits of generative AI and the need for thorough clinical testing of these systems. The healthcare industry continues to face challenges, including high expenses, delayed drug development processes, exhausted medical staff, and unequal healthcare delivery, which generative AI systems have the potential to address [1]. AI tools have the potential to reduce provider and patient burdens through their ability to automate documentation tasks, speed up therapy discovery, and enhance clinical decision support, as depicted in Figure 1.
At the same time, the risks of premature adoption are significant: generative models can produce incorrect or fabricated information (“hallucinations”) that could be harmful in medical settings [4]. Ensuring patient safety and maintaining trust will require careful validation of these technologies under real-world conditions. This review provides a comprehensive overview of generative AI in healthcare, focusing on practical applications for clinicians and researchers. Furthermore, the review covers recent advancements, current applications across clinical domains, emerging evidence from trials, and the challenges that must be navigated to integrate generative AI safely into medicine. The aim is to equip clinicians, researchers, and health policymakers with an in-depth understanding of how generative AI is being applied in practice today and what future directions are on the horizon.

2. Methodology

A systematic literature search was performed to identify relevant peer-reviewed studies published from November 2022 to present on the use of generative artificial intelligence (AI) in healthcare. Preliminary searches were conducted to identify appropriate Medical Subject Headings (MeSH), keyword variants, and emerging terminology associated with generative AI technologies in healthcare. A comprehensive list of keywords and search terms was developed, which included the following: generative artificial intelligence, generative AI, large language models, LLMs, ChatGPT, GPT-4, transformer models, diffusion models, AI-generated content, synthetic data generation, medical AI, clinical decision support, healthcare automation, AI in diagnostics, AI in therapeutics, AI ethics in healthcare, implementation barriers, and digital health technologies. Searches were carried out across major scientific databases including PubMed, Scopus, ProQuest, and Embase. Additionally, reference lists of all eligible studies were manually screened to identify further relevant articles not captured in the initial database search.

3. Different Generative AI Tools

Generative AI encompasses a class of machine learning techniques designed to create new content that is statistically similar to the data on which they were trained. This contrasts with discriminative AI systems that primarily make predictions or classifications. Key advances in recent years have produced generative models capable of highly sophisticated outputs. Broadly, two families of generative models are most relevant in healthcare: large language models (LLMs) for text generation and image-generating models for creating or enhancing visual data (Figure 2).
Large Language Models (LLMs): LLMs are neural networks trained on enormous corpora of text to predict and generate language. Modern LLMs such as GPT-3.5 and GPT-4 (by OpenAI), PaLM 2 (by Google), and LLaMA (by Meta) are built on the transformer architecture and contain billions of parameters. When provided with prompts, these models can compose human-like natural language responses. In a medical context, LLMs can draft clinical notes, summarize patient histories, answer questions, or even engage in dialogues emulating a clinician’s reasoning. Notably, one of the earliest signs of LLMs’ medical potential was a study in which ChatGPT (GPT-3.5) was tested on USMLE questions: it performed at greater than 50% accuracy across all exam steps and even exceeded the approximate passing threshold (around 60%) in some analyses [3]. This performance suggested that a general-purpose LLM, with no specialized medical tuning, could recall and reason over a wide breadth of medical knowledge comparable to a new physician.
Following closely on these findings, medical-specific LLMs have been developed. For example, Google’s Med-PaLM 2, an LLM tuned on medical domain data, achieved about 85–90% accuracy on expert-level medical exam questions, surpassing the passing score and approaching the performance of physicians [5]. Such results, reported in 2023–2024, mark a breakthrough moment wherein AI systems can demonstrate “expert” level competence on written medical knowledge assessments [5]. These models not only recall facts but can also interpret clinical vignettes and provide reasoning. In practical terms, this opens possibilities for AI assistants to support clinicians by synthesizing medical literature, formulating differential diagnoses from case descriptions, or suggesting evidence-based answers to clinical questions. However, it must be emphasized that exam performance is an imperfect proxy for clinical acumen; real-world decision-making involves nuances that go beyond textbook knowledge [6]. Still, the rapid improvement of LLM capabilities over just the past three years is a remarkable advance, and ongoing research is exploring ways to further align these models with clinical reasoning and up-to-date medical knowledge.
Image Generative Models: Alongside text, generative AI has made inroads in medical imaging via models like generative adversarial networks (GANs) and diffusion models. These systems learn the distribution of complex image data (such as radiographs, pathology slides, or MRI scans) and can produce new synthetic images or transform existing ones. In a GAN, for example, a generator network creates fake images that a discriminator network tries to distinguish from real images; through this adversarial training, GANs can generate highly realistic medical images. A recent systematic review confirms that GAN-based techniques have been successfully applied across imaging modalities—including MRI, CT, X-rays, ultrasound, and histopathology—to create lifelike synthetic images for research and training purposes [7].
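To make the adversarial setup concrete, below is a minimal sketch of one GAN training step in PyTorch. The tiny fully connected networks, the 64 × 64 single-channel image size, and the hyperparameters are illustrative assumptions, not a published medical-imaging architecture.

```python
# Minimal GAN training step (illustrative sketch, not a clinical model).
import torch
import torch.nn as nn

IMG_DIM = 64 * 64      # assumed flattened 64x64 grayscale image
LATENT_DIM = 100       # assumed noise-vector size

generator = nn.Sequential(
    nn.Linear(LATENT_DIM, 256), nn.ReLU(),
    nn.Linear(256, IMG_DIM), nn.Tanh(),          # outputs a fake image
)
discriminator = nn.Sequential(
    nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),                           # real-vs-fake logit
)

loss_fn = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_images: torch.Tensor) -> None:
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Discriminator: learn to separate real scans from generator output.
    noise = torch.randn(batch, LATENT_DIM)
    fake_images = generator(noise).detach()
    d_loss = loss_fn(discriminator(real_images), real_labels) + \
             loss_fn(discriminator(fake_images), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Generator: try to make the discriminator label fakes as real.
    noise = torch.randn(batch, LATENT_DIM)
    g_loss = loss_fn(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# Example usage with a random stand-in batch of "images":
train_step(torch.randn(32, IMG_DIM))
```

In practice, medical-imaging GANs use convolutional generators and discriminators and substantial regularization, but the alternating discriminator/generator updates follow this same pattern.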
Diffusion models, a newer class of generative models, iteratively refine random noise into coherent images and have also demonstrated impressive results in medical imaging tasks, such as generating high-resolution synthetic MRIs or enhancing image quality by denoising. These image generators are particularly valuable in scenarios where annotated medical images are scarce or privacy-protected. Generative models can augment training datasets for diagnostic algorithms without risking patient confidentiality by generating realistic synthetic data that retains the statistical properties of real patient data. A proof-of-concept study in neuroimaging showed that using GAN-synthesized MRI scans of patients with multiple sclerosis to augment the training set significantly improved an AI model’s ability to detect new brain lesions on external data [8]. The model trained with GAN-generated images achieved an area under the ROC curve (AUC) of 93.3% on an independent external dataset, compared to 83.6% for the model trained on real images alone. This represented a substantial improvement in generalizability [8]. Such findings underscore how generative AI can address the notorious problem of “domain shift” in medical imaging by providing a diverse array of training examples.
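As a minimal illustration of this augmentation idea, the sketch below mixes a real image dataset with GAN-generated synthetic images before training a downstream classifier; the folder paths and transforms are placeholders and do not reproduce the cited study’s pipeline.

```python
# Sketch: augmenting a training set with synthetic images (paths are placeholders).
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((128, 128)),
    transforms.ToTensor(),
])

real_ds = datasets.ImageFolder("data/real_mri", transform=transform)            # real labeled scans
synthetic_ds = datasets.ImageFolder("data/gan_synthetic", transform=transform)  # GAN-generated scans

# Combining both sources exposes the classifier to a broader image distribution,
# which is the mechanism behind the generalizability gains described above.
train_loader = DataLoader(ConcatDataset([real_ds, synthetic_ds]),
                          batch_size=32, shuffle=True)

for images, labels in train_loader:
    ...  # feed batches to the lesion-detection model's training loop
```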
Beyond data augmentation, image generative models can perform modality-to-modality translation, for example, generating a synthetic CT scan from an MRI to aid in radiation therapy planning, or converting a pathology image from one staining technique to another. They are also being explored for image enhancement, such as increasing the resolution of low-quality scans or removing noise. A notable application is in accelerated MRI: diffusion models have been trained to fill in missing data from accelerated MRI protocols, enabling much faster scans while maintaining image fidelity, though this approach is still under investigation. These advancements in image generation are highly technical, but the practical upshot is that clinicians may soon have access to AI-augmented imaging tools that improve diagnostic accuracy (by reducing noise or artifacts) and expand datasets in underrepresented disease areas (through realistic synthetic examples).
Foundation Models and Multimodal Generative AI: The term “foundation model” has been used to describe large-scale models (like GPT-4 or Stable Diffusion) that can be adapted to many tasks. Increasingly, researchers are working on multimodal generative AI that can handle more than one type of data—for example, models that take both imaging and textual input. A preliminary case study in late 2023 demonstrated that GPT-4’s vision-enabled variant (GPT-4V) could analyze clinical images (such as skin lesion photos or chest X-rays) in combination with text and provide diagnostic impressions [1]. While these multimodal capabilities are in their nascent stages, they point toward a future where a single AI system might interpret a patient’s radiology images, lab results, and clinical notes together to generate an integrated report or clinical summary. The technical underpinnings of such systems involve combining vision transformers for image analysis with LLMs for reasoning and explanation. Although there is excitement here, so far these remain research prototypes; rigorous validation is required before multimodal generative AI can be trusted for frontline clinical use.

4. Applications in Clinical Practice and Research

Generative AI can play a role in almost every part of healthcare, from examining patients to conducting research. In this section, we explore the major application domains: clinical documentation and administrative tasks, patient communication, diagnosis and decision support (including medical imaging and pathology), drug discovery and biomedical research, and medical education (Figure 3). For each, we emphasize practical use cases that have been piloted or implemented, and we anchor the discussion in the findings from recent clinical studies (2022–2025) that evaluate these tools.

4.1. Clinical Documentation and Administrative Workflow

One of the most immediate and tangible uses of generative AI for clinicians is in managing documentation. Physicians consistently report that electronic health record (EHR) documentation is a significant contributor to burnout and lost clinical time. Generative AI offers a way to automate and streamline many of these routine textual tasks. Current-generation LLMs are already being used to draft clinical notes, consultation letters, and insurance pre-authorization documents from structured inputs. For example, a large language model can take the transcript of a physician–patient encounter (captured via a microphone or video) and automatically generate a formatted clinical note or referral letter. Early evidence suggests these tools can improve efficiency [9].
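As a rough sketch of how such a drafting step can be wired up, the snippet below sends a visit transcript to an OpenAI-style chat-completion endpoint and asks for a structured note draft. The model name, prompt wording, and workflow are illustrative assumptions rather than a description of any specific commercial product, and the output is only a draft for clinician review.

```python
# Sketch: drafting a SOAP-style note from a visit transcript with an LLM.
# Assumes the OpenAI Python SDK (>=1.0) and an API key in the environment.
from openai import OpenAI

client = OpenAI()

def draft_clinical_note(transcript: str) -> str:
    """Return an AI-drafted note that a clinician must still review and edit."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "You are a documentation assistant. Draft a concise "
                        "SOAP note (Subjective, Objective, Assessment, Plan) "
                        "from the encounter transcript. Do not invent findings."},
            {"role": "user", "content": transcript},
        ],
        temperature=0.2,  # keep drafts conservative and reproducible
    )
    return response.choices[0].message.content

draft = draft_clinical_note("Patient reports 3 days of productive cough ...")
print(draft)  # clinician reviews, edits, and signs before it enters the EHR
```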
In one study, Ali et al. tested ChatGPT’s ability to write patient clinic letters from brief clinician prompts and found that the AI-generated drafts were coherent and contained the key required information, with the potential to save clinicians’ time in letter writing [10]. Beyond such encouraging anecdotal reports, more robust evaluations have been undertaken. Zaretsky and colleagues evaluated the ability of LLMs to transform discharge summaries into a more patient-friendly format. The transformed summaries had significantly improved readability (Flesch–Kincaid Grade Level: 6.2 vs. 11.0) and understandability (PEMAT: 81% vs. 13%). However, concerns about omissions and inaccuracies highlighted the need for physician review before implementation [11].
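Readability metrics such as the Flesch–Kincaid grade level used in that evaluation can be computed automatically; the short sketch below uses the open-source textstat package to compare an original discharge-summary sentence with an AI-simplified version (both example sentences are invented).

```python
# Sketch: comparing readability before and after LLM simplification.
# Requires: pip install textstat
import textstat

original = ("The patient was admitted for acute decompensated heart failure "
            "and diuresed with intravenous furosemide prior to discharge.")
simplified = ("You were in the hospital because your heart was not pumping "
              "well and fluid built up. We gave you a water pill to remove it.")

print("Original grade level:  ", textstat.flesch_kincaid_grade(original))
print("Simplified grade level:", textstat.flesch_kincaid_grade(simplified))
```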
A landmark quality-improvement study at Stanford Health Care in 2023 examined the real-world adoption of an AI assistant for drafting patient messages and clinical notes in an EHR-integrated workflow [12]. In this 5-week pilot involving 162 physicians and other healthcare professionals, a HIPAA-compliant LLM (based on GPT-3.5) was integrated into the EHR system to generate draft replies to patient portal messages (e.g., emails from patients about symptoms or follow-up questions). Clinicians had the option to use these AI-generated drafts as a starting point when responding. The study found a mean AI draft utilization rate of 20%, and with this relatively modest usage, participants reported a significant reduction in perceived workload and burnout, as assessed by validated survey scales. Specifically, the task load score (a composite measure of effort and frustration with inbox management) decreased by ~14 points and work-related emotional exhaustion scores also improved (both with p < 0.001) [12].
Another nonrandomized clinical study evaluated an AI-powered clinical documentation tool for its impact on physician workload and experience. The AI users reported reduced time spent on electronic health records (EHRs) at home and after visits and experienced less frustration with EHR use [9].
One cohort study evaluated LLM-generated emergency-to-inpatient handoff notes and compared them with physician-written notes in 1600 patient records. The AI-generated notes demonstrated higher lexical similarity and detail but were slightly less useful and safe according to physician reviews [13].
These studies suggest that integrating generative AI for drafting routine communications can meaningfully alleviate the documentation burden on clinicians. Clinicians still had to review and edit AI-generated drafts (which explains why total time might not drop), but having a draft to work from appeared to make the process feel less onerous. However, some clinicians desired improvements in making the tone more personalized and less “robotic” [12]. Aside from patient messages, generative AI is being applied to other documentation tasks. Some hospitals have begun experimenting with AI scribes—systems where an AI listens during patient visits (or takes the transcript of a recorded visit) and generates the clinical note for that encounter. These notes can then be reviewed and signed by the physician. Compared to human medical scribes, AI scribes offer 24/7 availability and lower long-term cost, though their accuracy and reliability must meet high standards. Early versions are promising but not yet perfect; errors of omission or incorrect details in AI-generated notes have been observed, necessitating careful physician review (the legal responsibility for accurate records remains with the clinician).
One such prospective study evaluated the impact of scribing tools on clinical documentation efficiency. Clinicians reported improved engagement with patients and reduced documentation burden, with a significant reduction in time spent on notes and after-hours work [14]. Nevertheless, as LLMs like GPT-4 have improved factual accuracy and incorporated medical knowledge, their generated notes are becoming more polished. In fact, generative models can be fine-tuned on a specific institution’s data to better capture the preferred style and content of that organization’s notes [12]. For example, an AI note writer could be customized to always include certain required elements (problem lists, medication changes, follow-up instructions) in a format familiar to that clinic.
Beyond notes, another administrative area is insurance and billing: generative AI can help populate billing codes or draft justifications for procedures by analyzing the clinical documentation, tasks that typically consume considerable time. Some EHR vendors are developing generative AI features that, for instance, scan a patient’s chart and generate a succinct summary or problem-oriented synopsis for quick provider review. This could help during care transitions or handoffs. Indeed, the largest EHR vendor in the US, Epic Systems, announced in 2023 a partnership with OpenAI to integrate LLM technology into its software for exactly these kinds of purposes [6].
In summary, generative AI, particularly large language models (LLMs), is rapidly transforming clinical documentation by alleviating the administrative burden associated with electronic health records (EHRs). Studies have shown that AI-generated drafts for clinical notes, discharge summaries, patient messages, and handoff communications can improve efficiency, readability, and provider satisfaction, though physician oversight remains essential to ensure accuracy. Early pilots, including the one at Stanford Health Care, demonstrated reduced perceived workload, emotional exhaustion, and after-hours documentation time. AI scribes and documentation assistants also show promise in enhancing real-time notetaking and billing support. Despite this progress, issues like incomplete information, impersonal tone, and legal responsibility highlight the ongoing need for human review. As LLMs become more refined and customized to institutional needs, their role in supporting clinicians is expected to expand—streamlining documentation while preserving quality and safety in care delivery.

4.2. Patient Communication and Education

Effective communication with patients is central to clinical care, and here too, generative AI is beginning to play a role. Patients increasingly use digital platforms to seek medical information or advice—from patient portals to online forums. Generative AI models, especially LLM-based chatbots, can be deployed to interact with patients in natural language, answering questions, providing education, and even offering counseling support.
One of the most striking early examples of generative AI in patient communication comes from a study that compared physician vs. AI responses to patient questions on a public forum [15]. A 2023 cross-sectional study analyzed 195 patient questions from Reddit’s r/AskDocs, comparing ChatGPT-generated responses with physician answers. A blinded panel of healthcare professionals preferred the AI’s responses 78.6% of the time, rating them significantly higher in quality and empathy. While physician responses were often brief, ChatGPT provided structured, detailed, and reassuring replies. The study suggested AI could assist clinicians in drafting patient responses, particularly in managing high message volumes in patient portals, while still requiring human oversight for accuracy and personalization [15].
Building on that concept, the previously mentioned Stanford pilot study not only measured clinician burden but also provided a real-world test of AI-assisted patient communication in the form of drafting patient inbox messages. The finding was that clinicians voluntarily used AI for 20% of their patient replies during the pilot [12]. In another NLP-based model, patient message routing was automated in an electronic health record (EHR) system, significantly reducing response times (to less than 1 h) and resolution times and decreasing staff message interactions in comparison with unrouted messages. The model demonstrated high accuracy (97.8%) in classifying messages, improving workflow efficiency and communication [16].
The MySurgeryRisk AI system accurately predicted postoperative complications using real-time EHR data, with AUROC values ranging from 0.78 to 0.91 across various complications. Its mobile app version improved accessibility and speed, enabling real-time risk assessments for surgeons [17].
Small et al. evaluated primary care physicians’ perceptions of GenAI-drafted patient messages in an EHR system. GenAI responses were rated higher for communication style and empathy than physician-generated responses though they were more complex and less readable. The findings suggest GenAI could enhance patient–provider communication while highlighting concerns about accessibility [18].
Another study evaluated ChatGPT-3.5’s ability to answer radiation oncology patient care questions compared to expert answers. The AI model performed similarly or better in correctness (94%), completeness (77%), and conciseness (91%), with minimal risk of harm [19]. However, not every AI tool has delivered efficiency gains: a study assessing Nuance’s Dragon Ambient eXperience (DAX) Copilot, an AI-based clinical documentation tool, found no significant improvements in EHR efficiency or financial outcomes among clinicians, although exploratory findings indicated slight reductions in documentation time for high DAX users, low-volume providers, and family medicine practitioners [20].
Large language models (LLMs) exhibit strong zero-shot capabilities for various natural language tasks but face challenges in clinical adoption due to inaccuracies and potential harm. One study evaluated and compared Almanac, an LLM enhanced with medical resource retrieval, against standard models for medical guideline recommendations. A panel of clinicians assessed the responses to 314 clinical questions across nine specialties. Almanac outperformed standard LLMs in factuality, completeness, user preference, and safety. The findings highlight the potential of domain-specific LLMs in clinical decision-making while emphasizing the need for rigorous validation before deployment [21]. One randomized clinical trial evaluated voice-based conversational AI for basal insulin management in type 2 diabetes patients. The AI group achieved optimal insulin dosing faster (median 15 vs. >56 days, p = 0.006), had better adherence (82.9% vs. 50.2%, p = 0.01), and improved glycemic control (81.3% vs. 25.0%, p = 0.005) [22].
One cross-sectional study compared ChatGPT-generated ophthalmology advice with responses from American Academy of Ophthalmology (AAO)-affiliated ophthalmologists using 200 patient questions from an online medical forum. Expert evaluators struggled to distinguish AI from human responses, and chatbot-generated advice was comparable to human-written responses in accuracy, likelihood of harm, and adherence to medical consensus [23].
Generative AI chatbots are being tested to help patients understand their symptoms and decide what to do next. They can chat with patients on websites or apps, ask questions, and give advice in simple language. Some hospitals are trying them out for things like after-hours help or mental health support. Early results are promising, but there are still concerns about safety. As a safeguard, some systems use a hybrid approach where the AI drafts a response, but a human clinician or nurse approves it before it is sent to the patient—essentially an AI–human team in customer service for healthcare.
Another domain of patient communication is health education and literacy. Generative AI can take medical jargon or complex explanations and translate them into simpler language for patients. A study by Blease et al. (2023) highlighted that LLMs could be used to generate accessible explanations of medical information for patients, potentially improving health literacy [6]. For instance, after an office visit, a patient might receive an “AI-augmented” summary of their visit, where the clinician’s notes are translated into lay terms by an AI and supplemented with educational content (e.g., an illustration or a quick primer on their diagnosis). This could empower patients to better understand their conditions and treatments. Indeed, the World Health Organization has emphasized the importance of clear communication in healthcare, and AI could be a tool to further that goal—provided the content is vetted for accuracy. Multimodal AI chatbots, which process both medical images and text, have the potential to improve clinical diagnostics compared to text-only chatbots. However, more research is needed to optimize multimodal AI for improved oncology-specific accuracy [24].
Mental health is a particularly sensitive area of patient communication where generative AI has both potential and pitfalls. There has been exploration into AI therapy chatbots that can engage in empathic dialogue with users, simulating aspects of counseling. For example, apps like Woebot (a CBT-based chatbot) existed even before GPT, but now LLMs are making such interactions more fluid and human-like. Some users have reported that speaking with an AI chatbot can feel therapeutic, offering a nonjudgmental space to discuss problems and even helping alleviate loneliness [25].
Serious illness conversations (SICs) in outpatient care can enhance mood and quality of life for cancer patients while reducing the likelihood of intensive end-of-life treatments. A machine learning-driven, behaviorally informed intervention encouraging SICs resulted in end-of-life cost savings for cancer patients by reducing systemic therapy and outpatient expenses [26].
However, experts caution that these systems are not a replacement for trained mental health professionals. They currently lack the ability to handle crises appropriately; a recent analysis found that many AI chatbots failed to recognize signs of suicidal ideation and did not consistently provide safe guidance in such cases [27]. There are also ethical concerns about AI giving psychological advice without understanding the full context of a person’s life. The consensus in the field is that AI can augment mental healthcare (for instance, by guiding patients through evidence-based self-help exercises or by providing check-ins between therapy sessions), but it should not operate independently in clinical mental healthcare at this stage. The FDA has not approved any AI chatbot as a medical device for therapy, and any such use would require rigorous clinical trials to ensure safety and efficacy.
The rapid integration of artificial intelligence (AI) in healthcare raises concerns about patient trust and engagement. While AI adoption is advancing quickly, patient perspectives on its responsible use remain underexplored. One cross-sectional study surveyed a representative sample of US adults to assess their trust in health systems using AI responsibly and preventing AI-related harm. Findings revealed low trust levels, with experiences of discrimination negatively impacting confidence in AI. General trust in the healthcare system, but not health literacy or AI knowledge, was associated with AI trust perceptions. These results highlight the need for improved communication and institutional efforts to build trust in AI-driven healthcare [28].
To summarize, generative AI has emerged as a valuable tool for enhancing patient communication, supporting clinicians in managing high volumes of patient interactions and directly engaging patients through conversational agents. LLM-based systems are being deployed to automate patient message triage, explain complex medical concepts in lay terms, and assist in disease management tasks such as insulin titration and symptom assessment. Early studies report benefits such as reduced clinician workload and better patient satisfaction, and domain-specific models like Almanac have shown higher factual accuracy and safety in guideline-based recommendations. However, challenges remain regarding accessibility, overcomplexity, and trust, particularly in mental health contexts where AI cannot yet safely replace human judgment. Hybrid models, in which clinicians review AI-generated drafts before they reach patients, are emerging as a practical safeguard. Ultimately, generative AI holds promise for supporting scalable, empathetic, and informative patient communication, but its use must be carefully governed to ensure safety, trust, and equity.

4.3. Clinical Decision Support and Diagnostics

Perhaps the most ambitious application of generative AI in healthcare is clinical decision support, which employs AI to assist with diagnosis, treatment planning, and other complex medical decisions. Generative models bring new capabilities to this arena by synthesizing large volumes of heterogeneous information and even generating novel hypotheses or insights. In clinical practice, an AI might review a patient’s history, lab results, and imaging, and then produce a summary or suggest possible diagnoses and management plans for the clinician to consider, as depicted in Figure 4. This is akin to having a very knowledgeable, tireless consultant who can rapidly draft a differential diagnosis or summarize the latest research relevant to a case.
Clinicians often struggle with probabilistic reasoning in diagnostic decision-making. Large language models have shown they can mimic the clinical reasoning process to some extent. For instance, given a case description, an LLM can be prompted to list possible diagnoses and its rationale for each. Some preliminary studies have benchmarked LLMs on diagnostic challenge sets or medical board-style questions. Researchers are testing LLMs using real hospital data to assess their diagnostic and clinical decision-making abilities. A 2023 study found that LLMs often matched discharge diagnoses and aligned with clinical guidelines but also occasionally hallucinated false details. This exposes a key risk in AI-generated content. To address this, newer approaches like retrieval-augmented generation are being explored to ground AI outputs in actual patient records and medical literature [5].
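A bare-bones version of that retrieval-augmented pattern is sketched below: candidate guideline snippets are embedded with a sentence-transformer, the passages most similar to the clinical question are retrieved, and only those passages are placed in the prompt so the model’s answer can be grounded in (and cite) source text. The snippet contents and model names are illustrative assumptions.

```python
# Sketch: retrieval-augmented generation to ground answers in guideline text.
# Requires: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

# Toy "knowledge base" of guideline snippets (placeholders, not real guidance).
guideline_passages = [
    "Guideline A: first-line therapy for condition X is drug Y at dose Z.",
    "Guideline B: imaging is recommended when red-flag symptoms are present.",
    "Guideline C: screening interval for condition W is every two years.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
passage_vecs = embedder.encode(guideline_passages, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k guideline passages most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = passage_vecs @ q_vec          # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [guideline_passages[i] for i in top]

question = "What is the first-line treatment for condition X?"
context = "\n".join(retrieve(question))

# The retrieved context is prepended to the prompt so the LLM's answer is
# grounded in the guideline text rather than in parametric memory alone.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```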
AI has potential clinical applications in diagnosing complex cases, especially in low-resource settings lacking specialist care. One study found that GPT-4 could achieve performance on clinical case questions comparable to that of senior medical residents, successfully identifying the correct diagnosis in complex cases around 80% of the time, whereas GPT-3.5 did so around 60% of the time.
Another study evaluated GPT-4’s diagnostic accuracy in six elderly patients with delayed diagnoses, showing higher accuracy (83.3% with differential diagnoses) compared to clinicians (50%) and a traditional decision support system (33.3%). While GPT-4 suggested relevant diagnoses earlier, its limitations in detecting multifocal infections and certain imaging recommendations highlight the need for careful clinical correlation [29].
Prompt identification of ST-elevation myocardial infarction (STEMI) is essential for effectively treating patients with acute coronary syndrome. In a randomized controlled trial conducted at Tri-Service General Hospital, Taiwan, AI-ECG-assisted STEMI detection was evaluated for its ability to minimize treatment delays for STEMI patients. AI-ECG improved the timely detection of STEMI in a large clinical trial, significantly reducing door-to-balloon and ECG-to-balloon times. The AI-ECG system demonstrated high predictive accuracy and was associated with a reduction in cardiac deaths. These findings highlight its potential to enhance STEMI triage and treatment efficiency [30].
The impact of an LLM on physicians’ diagnostic reasoning compared to conventional resources was assessed in a randomized study. The LLM alone outperformed both physician groups but its availability did not significantly enhance physicians’ reasoning [31]. The performance of AI algorithms for breast cancer detection on digital breast tomosynthesis (DBT) was rigorously evaluated in an international challenge that brought together methods from multiple research groups. The highest-ranking model attained a sensitivity of 0.957, underscoring its robust capability to identify malignant lesions. Moreover, the competition yielded several key resources including a standardized set of evaluation benchmarks, a publicly released clinical DBT image repository, and open-source code for all submitted algorithms, thereby establishing a reproducible framework and accelerating future AI developments in DBT analysis [32]. A recent diagnostic investigation evaluated an XGBoost-based machine learning model for distinguishing Kawasaki disease (KD) from other febrile illnesses in pediatric patients. In a retrospective cohort of 74,641 children presenting with fever, the model demonstrated excellent discriminative performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.980. At the optimal operating point, sensitivity was 92.5% and specificity 97.3%, indicating a high degree of accuracy in identifying KD among diverse febrile presentations [33].
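For readers unfamiliar with the gradient-boosted models cited above, the following self-contained sketch trains an XGBoost classifier on synthetic tabular data and reports an AUROC; it stands in for, and does not reproduce, the published Kawasaki disease model.

```python
# Sketch: gradient-boosted classifier with AUROC evaluation (synthetic data).
# Requires: pip install xgboost scikit-learn
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for structured EHR features (labs, vitals, demographics).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    stratify=y, random_state=0)

model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.05,
                      eval_metric="auc")
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]
print("AUROC:", round(roc_auc_score(y_test, probs), 3))
```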
In a retrospective analysis of 2054 thyroid-nodule ultrasounds, a personalized AI decision-aid workflow cut diagnostic time for senior radiologists without sacrificing accuracy, while junior radiologists saw greater gains using the conventional AI-assisted approach [34]. Diagnoses of schizophrenia and bipolar disorders are often delayed, hindering their timely treatment. An XGBoost model applied to EHRs from 24,449 patients predicted diagnostic progression with an overall AUROC of 0.64–0.80 for schizophrenia and 0.62 for bipolar disorder, leveraging clinical notes to boost early detection [35].
A smartphone-based machine learning app was developed for neonatal jaundice screening by analyzing skin color to predict bilirubin levels. Tested on 546 neonates, it showed a strong correlation with total serum bilirubin (TSB) and achieved 100% sensitivity with an AUC of 0.89, suggesting its potential as a noninvasive neonatal jaundice screening tool, pending further validation for clinical implementation [36].
In a cohort of 241 children with autism spectrum disorder (ASD), a video-based AI algorithm automatically detected 92.5% of manually annotated stereotypical motor movements, showing strong concordance with clinician assessments [37].
In a retrospective cohort of 116,495 women, a commercial AI tool flagged those at high risk of breast cancer up to 4–6 years before diagnosis. Breasts that later developed cancer had significantly higher AI risk scores, underscoring its promise for personalized screening and early detection [38].
A diagnostic study assessed machine learning models applied to EHRs to predict perinatal mood and anxiety disorders, incorporating bias-mitigation techniques [39]. James et al. tested an AI tool on chest X-rays to identify pneumothorax and tension pneumothorax, achieving AUCs of 0.979 and 0.987. It reached sensitivities of 94.3–94.5% and specificities of 92.0–95.3%, on par with radiologist consensus [40].
The integration of AI-derived histologic features into pathology practice might also improve risk stratification in colon cancer. One prognostic study evaluated the score of a tumor adipose feature (TAF) in colon cancer that was identified using machine learning and assessed by pathologists. TAF presence was associated with reduced survival, with strong agreement among pathologists (90%) [41].
One diagnostic study assessed four LLMs (ChatGPT, Galactica, Perplexity, and BioMedLM) as precision oncology support tools by comparing their treatment recommendations for fictional cancer cases with those of one expert physician. While LLMs identified more treatment options than human experts and provided unique and useful suggestions, indicating their potential for assisting clinical decision-making and literature screening in precision oncology, their accuracy and credibility were lower (F1 scores: 0.04–0.19) [42]. Another study explored the use of deep learning (DL) methods to quantify joint attention for detecting autism spectrum disorder (ASD) and assessing symptom severity. The DL model demonstrated high predictive accuracy in distinguishing ASD from typical development and evaluating symptom severity based on behavioral responses. These findings suggest that AI-assisted methods could enable scalable, automated ASD detection and assessment [42].
In one multicentric, block-randomized, double-blind, placebo-controlled clinical trial in women, the use of ensemble machine learning with augmented inverse probability weighting (AIPW) for estimating per-protocol effects was demonstrated. The Effects of Aspirin in Gestation and Reproduction (EAGeR) trial was analyzed, showing that adherence to a low-dose aspirin regimen (≥5 days/week) was associated with an 8.0 per 100 increase in hCG-detected pregnancies, compared to 4.3 per 100 in the intention-to-treat analysis [43].
Table 1 summarizes generative AI applications in clinical practice. Generative AI is starting to show value in clinical diagnosis, especially as a supportive tool for identifying complex or rare conditions. In a notable case, a patient with a rare condition—tethered spinal cord—went undiagnosed by several doctors. When the case was entered into ChatGPT, it correctly suggested the condition, leading to further testing and confirmation. While anecdotal, this case illustrates how AI, trained on a vast amount of medical literature and case data, can surface diagnostic possibilities that even experienced clinicians might miss. To assess this potential more rigorously, a 2024 study compared physician teams and ChatGPT on a set of challenging diagnostic cases with known outcomes. While the physicians performed slightly better overall, the AI identified some correct diagnoses that humans overlooked, and vice versa. This suggests a complementary relationship where AI could function as a “diagnostic safety net” or second opinion generator. For instance, in an emergency room, a clinician encountering a patient having nonspecific but concerning symptoms could ask the AI for differential diagnoses. The AI might suggest rare but important possibilities and explain its reasoning based on the case details. However, for this to be viable in practice, the AI must consistently demonstrate high accuracy, avoid suggesting harmful errors, and be transparent about its uncertainty—challenges that ongoing research and validation efforts aim to address.
All these findings show that generative AI acts as a powerful tool in transforming clinical decision support, offering capabilities such as synthesizing complex patient data, suggesting differential diagnoses, and assisting with personalized treatment planning. Studies show that large language models like GPT-4 can match or even outperform physicians in diagnostic accuracy, particularly in complex or delayed diagnoses, and enhance decision-making by providing context-aware treatment suggestions. Various applications range from AI-assisted ECG interpretation and oncology planning to rare disease detection and risk stratification in colon cancer. AI models have also demonstrated utility in imaging analysis, neonatal and psychiatric screening, and the management of polypharmacy and chronic conditions. However, while results are promising, issues like hallucinations, overreliance, and variability in model performance necessitate rigorous validation, human oversight, and integration with clinical workflows. Overall, AI’s role as a “diagnostic safety net” is increasingly evident, supporting clinicians in delivering more accurate, timely, and individualized care.

4.4. Medical Imaging Interpretation

AI has already advanced radiology and pathology through tasks like image classification and segmentation. Generative AI is now extending these capabilities by enabling image interpretation with multimodal LLMs that can generate full-text radiology reports, not just fixed labels. In 2023, researchers showed that an LLM, fine-tuned on radiology reports, could turn image findings into coherent, human-like impressions, for example, describing a chest X-ray as showing “right lower lobe pneumonia”. This approach could standardize and speed up reporting, especially in high-volume settings. However, AI-generated reports may miss subtle findings or introduce errors, limiting current use to draft reports that radiologists review and edit. Some vendors now offer AI “report drafting assistants” that support rather than replace radiologists in routine cases [4].
In one cross-sectional study, the ability of two AI chatbots (Bard and GPT-4) to simplify pathology reports for patients was evaluated. Both chatbots significantly improved readability, with GPT-4 achieving a lower grade level and higher accuracy (97.4%) than Bard (87.6%). However, occasional inaccuracies and hallucinations were observed, highlighting the need for clinician review before patient use [51]. Sebastian and colleagues developed a deep learning AI model to differentiate colon carcinoma (CC) and acute diverticulitis (AD) on CT images. AI support improved radiologists’ sensitivity from 77.6% to 85.6% and specificity from 81.6% to 91.3%, reducing false-negative and false-positive diagnoses [52]. In another diagnostic study, a deep learning model was developed to detect retinal hemorrhage (RH) in pediatric head CT scans, aiding in the diagnosis of abusive head trauma (AHT) and achieving good sensitivity and specificity [53].
In the RA2-DREAM initiative, researchers created machine learning models that automatically quantify joint damage on rheumatoid arthritis radiographs. These algorithms demonstrated high accuracy and hold potential for seamless integration into electronic health records, supporting both clinical decision making and research [54]. The BrainNERD model is a natural language processing (NLP) tool that accurately extracts and summarizes acute brain injury data from head CT reports. With high precision and recall, it enables large-scale analysis of radiographic findings, supporting clinical research and decision-making [55].
AI models have also proved helpful for the early detection of trisomy 21: a deep learning model using ultrasonographic images accurately screened for trisomy 21 in the first trimester, outperforming traditional nuchal translucency and maternal age-based screening [56].
An AI tool raised pulmonary nodule detection on chest X-rays by 6.4% across all difficulty levels. Junior radiologists improved sensitivity the most, while seniors saw similar specificity gains. This suggests AI can boost lung cancer screening, offering reliable support for radiologists of any experience level [57]. Lee et al. trained AI models on routine biometry images and ultrasound videos to fine-tune gestational age estimates and boost fetal health monitoring. Their ensemble approach outperformed traditional fetal biometry, with the largest gains in small-for-GA fetuses demonstrating AI’s promise in improving prenatal care [58]. Researchers developed EchoNet-Liver, a deep learning model that analyzes echocardiograms to spot cirrhosis and steatotic liver disease, in an observational study. Trained on extensive datasets, it delivered strong diagnostic performance across diverse cohorts. This AI-driven approach enables opportunistic chronic liver disease screening, supporting earlier diagnosis and intervention [59].
Based on a similar concept, another study of AI-based anomaly detection (AD) in histopathology demonstrated its effectiveness in identifying rare gastrointestinal (GI) diseases using real-world biopsy datasets. The model, trained only on common diseases, accurately detected infrequent pathologies and cancers with high AUROC scores (up to 97.7%). This approach can prioritize cases, reduce misdiagnoses, and enhance AI adoption in clinical pathology [60].
In another example, Sahar and colleagues evaluated AI for detecting pulmonary tuberculosis and other chest X-ray abnormalities in a population with a high TB/HIV burden. The AI system’s sensitivity and specificity were comparable to those of radiologists and met WHO specificity targets but fell short on sensitivity. Additionally, AI effectively detected other CXR abnormalities, supporting its potential use in TB triage and broader diagnostic applications [61]. Another cross-sectional study tested a deep learning-based AI model to detect myopia, strabismus, and ptosis in children using smartphone images. The model was trained on 1419 images from 476 patients, and it demonstrated high sensitivity and accuracy, suggesting its potential for early and convenient home-based screening of pediatric eye diseases [62].
A multimodal AI model was also used for integrating clinical data and EUS images to improve the diagnosis of solid pancreatic lesions. The model demonstrated high accuracy across multiple datasets and significantly enhanced novice endoscopists’ diagnostic performance [63].
Aklilu et al. built a computer-vision model to parse 243 laparoscopic cholecystectomy videos, linking surgeons’ fine-grained actions to outcomes. The tool predicted blood loss (AUROC 0.81) and operator expertise (AUROC 0.78), uncovering how specific techniques affect performance and underscoring AI’s promise for objective, scalable skill assessment [64]. Jong et al. evaluated the impact of AI-assisted interpretation on radiologists’ performance in detecting chest radiograph abnormalities. AI improved sensitivity for pneumonia, nodules, pleural effusion, and pneumothorax while reducing reporting time by 10% [65].
Another diagnostic study assessed an AI-based software tool for assisting radiologists in detecting amyloid-related imaging abnormalities (ARIAs) in Alzheimer’s patients undergoing monoclonal antibody therapy, thereby enhancing ARIA monitoring and aiding treatment decisions and patient safety [66]. Major generative AI applications in medical imaging and data augmentation are mentioned in Table 2.
Generative AI methods are increasingly employed for image enhancement and reconstruction in medical imaging, improving diagnostic utility while reducing patient risk. For example, they convert low-dose CT acquisitions into standard-quality images by inferring missing details, aiding the identification of small lesions with lower radiation exposure. In MRI, generative frameworks reconstruct full volumetric datasets from partial acquisitions, markedly shortening scan times. With rigorous validation, these approaches could streamline imaging workflows, enhance patient comfort, and maintain diagnostic accuracy [4]. In pathology, generative models can perform stain-style transfer—converting an H&E-stained slide to an immunohistochemistry slide virtually, for instance—which could assist pathologists by highlighting certain features without needing additional lab tests. They can also generate synthetic pathological images for rare tumors to augment training data for pathology AI algorithms [67,68].
Table 2. Use of Generative AI in Medical Imaging and Data Augmentation.
| Application | Description | Benefits | References |
|---|---|---|---|
| Synthetic Image Generation | Use of GANs and diffusion models to generate synthetic MRI, CT, X-ray, ultrasound, and pathology images. | Augments datasets, preserves patient privacy, improves model generalization. | [8,67] |
| Image Reconstruction | Generative models such as GANs used to enhance image quality and reconstruct missing parts of images. | Improved image clarity; better diagnostics. | [69,70] |
| Data Augmentation | GAN-generated synthetic data to augment training datasets; synthetic images to balance class distribution and enhance model performance and robustness. | Increases diagnostic model accuracy; addresses domain shift; reduces overfitting. | [8,71,72] |
| Image Quality Enhancement | Denoising low-dose CT/MRI images and super-resolution MRI using generative models. | Reduces scan time and radiation exposure; improves image clarity. | [73] |
| Modality-to-Modality Translation | Generating synthetic CT images from MRIs or virtual histochemical staining in pathology using GANs. | Reduces need for multiple scans; enhances surgical/radiation planning. | [68,74] |
| Disease Detection | AI models trained on generatively augmented data for more accurate detection of diseases in scans. | Early detection; better patient outcomes. | [75,76] |
| Image Segmentation | Segmentation models using generative techniques for precise delineation of structures in medical images. | Enhanced surgical planning; faster analysis. | [77,78] |
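To illustrate the denoising-style enhancement listed in Table 2 in the simplest terms, the sketch below defines a small convolutional network that maps a noisy (e.g., low-dose) image to a cleaned-up version and runs one supervised training step on random stand-in tensors; the architecture and data are illustrative assumptions rather than a validated reconstruction model.

```python
# Sketch: a tiny convolutional denoiser for image enhancement (illustrative only).
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Maps a 1-channel noisy image to a denoised image of the same size."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# In practice the training pairs would be (low-dose scan, standard-dose scan);
# here random tensors stand in for a single training step.
noisy = torch.randn(8, 1, 128, 128)
clean = torch.randn(8, 1, 128, 128)

prediction = model(noisy)
loss = loss_fn(prediction, clean)
optimizer.zero_grad(); loss.backward(); optimizer.step()
print("training loss:", loss.item())
```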

4.5. Clinical Decision Support Systems (CDSSs)

Beyond diagnosis, generative AI can support treatment decisions and care planning. For example, an oncologist could use AI to scan clinical trial protocols and identify suitable trials for a patient or summarize optimal treatment options based on tumor genetics and medical history. An LLM can integrate textual guidelines, research articles, and patient-specific data to produce a tailored summary. IBM’s Watson for Oncology attempted something similar years ago using a more rules-based AI but with mixed results. Modern generative models are more flexible in assimilating data and might overcome some limitations of earlier systems.
Large language models (LLMs) are increasingly being recognized as valuable tools for evaluating complex therapeutic trade-offs in clinical practice. By leveraging their capacity to process vast quantities of biomedical literature, clinical trial outcomes, electronic health records, and pharmacovigilance data, LLMs can help synthesize information on both the efficacy and adverse effect profiles of various treatment options. This is particularly important in cases involving polypharmacy, chronic disease management, and precision oncology, where clinicians must weigh benefits against potential risks in real time. Moreover, generative AI can contribute to personalized treatment planning by contextualizing outcomes based on patient-specific factors such as comorbidities, age, or genetic markers. As these models are further integrated with clinical decision support systems, they hold promise for improving shared decision-making by providing balanced, data-driven summaries of treatment options in accessible formats for both providers and patients [79].
One comparative study in clinical oncology evaluated five large language models (LLMs) on 2044 oncology-related questions to assess their accuracy and reliability in clinical applications. GPT-4 outperformed other models, scoring above the 50th percentile compared to human benchmarks, though performance varied across oncology subdomains. Continuous benchmarking remains essential to ensure safe and effective AI integration into clinical oncology [80]. Timely interventions like antibiotics and IV fluids can significantly reduce sepsis-induced mortality. AI models that effectively predict the risk of sepsis onset could help accelerate the administration of these treatments. Earlier, various AI models were used to predict sepsis, but those models had many limitations that restricted their clinical use. In a recent cohort study conducted at the University of Michigan’s academic medical center, the Epic sepsis model (ESM) was evaluated using a large inpatient dataset to assess its predictive performance before clinical recognition. The model achieved an AUROC of 0.62 when including all predictions but dropped to 0.47 when excluding those made after clinical recognition, highlighting the need for improved early prediction models [81].
Machine learning models enable the targeted identification of high-risk individuals for hepatitis C virus (HCV) screening, improving efficiency over traditional methods. Using retrospective data, an XGBoost-based model was developed and deployed nationwide in Israel for identifying active HCV carriers. The model identified HCV carriers with over 100 times greater efficiency than standard screening, highlighting its potential for more effective disease detection [82].
Seung et al. developed an AI model that combines preoperative and intraoperative data to flag patients at high risk of massive transfusion during surgery, achieving an AUROC > 0.94 and enabling earlier intervention [83]. Such models also help to predict neurologic morbidity in critically ill pediatric patients, using data from a quaternary pediatric ICU. One prognostic study demonstrated strong predictive performance, with external validation confirming its reliability. Moreover, the model’s predictions significantly correlated with brain injury biomarkers, highlighting its potential for early risk identification and intervention [84].
Trust and accountability are vital in decision support—clinicians need more than a “black-box” recommendation; they want clear explanations. Generative AI addresses this by offering natural language justifications, for example, linking a drug choice to clinical guidelines and patient factors. Even so, confident-sounding rationales can be flawed if the AI’s logic is off. Ensuring explanation accuracy, often called interpretability or faithfulness, is an active research area. One promising solution is retrieval-based systems that cite relevant guidelines or literature to back their recommendations [4]. Such tools can save clinicians time and help them make confident, evidence-based choices [4].

4.6. Emergency and Triage

Another emerging application of generative AI is in acute care settings, supporting triage and decision-making. For example, an AI could listen to an emergency department (ED) triage interview and generate a risk assessment, flagging patients with subtle signs of serious conditions like chest pain suggestive of acute coronary syndrome, even if those signs are missed by less experienced staff.
An AI-informed triage clinical decision support (CDS) system was implemented across multiple emergency departments (EDs) to improve patient risk stratification and flow. It helped identify critical care cases, adjusted the triage acuity distribution, and reduced waiting times. However, differences in nurse agreement with the AI tool affected triage outcomes, emphasizing the need for better alignment between AI recommendations and clinical expertise [85]. In ICUs, electroencephalography (EEG) monitoring is essential for detecting brain injuries but is limited by clinician availability and subjective interpretation.
Barnett et al. built an interpretable deep learning system trained on over 50,000 EEG samples to classify six harmful patterns with case-based explanations, boosting ICU diagnostic accuracy from 47% to 71%. The model, achieving AUROCs of 0.87–0.96, outperformed black-box approaches, and its visualizations supported the ictal–interictal injury continuum hypothesis, underscoring AI's promise for enhancing EEG-based critical care diagnostics [86]. AI could also enhance training and accuracy in trauma assessment. One study investigated the impact of AI guidance on the quality of FAST (Focused Assessment with Sonography in Trauma) ultrasonography performed by novice operators. AI guidance significantly improved diagnostic quality scores (p = 0.02) and the rate of acceptable images (p < 0.001), though it initially increased examination time [87].

4.7. Medical Imaging and Pathology

Medical imaging and pathology are visual domains where generative AI techniques, especially GANs and diffusion models, have found fertile ground. The motivations for using generative AI in these fields include increasing the availability of training data via synthetic images, improving image quality, preserving patient privacy through generated datasets, and even creating educational simulators.

4.7.1. Data Augmentation and Synthetic Datasets

High-quality labeled imaging data is essential for developing effective AI diagnostic models for tasks such as tumor detection on MRI or diabetic retinopathy grading on retinal photographs. However, obtaining sufficiently large and diverse datasets is challenging, particularly for rare diseases or unusual imaging patterns. Generative models can help by producing realistic synthetic images that augment real datasets.
A 2022 study in ophthalmology used a GAN to generate synthetic fundus images of rare retinal diseases (like different stages of age-related macular degeneration) [71]. In a related educational application, Waikel et al. assessed pediatric residents' ability to recognize Kabuki and Noonan syndromes after educational interventions that used either real photographs or AI-generated images. Real images were most effective, but AI-generated images performed comparably, improving recognition and confidence. The findings suggested that AI-generated images could be a useful adjunct for teaching rare genetic conditions [88].
In the ophthalmology study, clinicians rated most of the synthetic fundus images as indistinguishable from real patient images. When an image classification model was trained on a combination of real and GAN-generated fundus images, its performance in detecting the disease on a test set improved compared to training on real images alone, indicating the synthetic data added useful signal. Similarly, in digital pathology, researchers have generated synthetic histology slides of tumors. One group focused on pediatric cancers where data was scarce: they trained a GAN on a small set of tumor histology images and generated many additional slides. Training a diagnostic model on the expanded set led to higher accuracy in cross-validation than using the original limited data [67]. Synthetic images can also balance datasets that have class imbalances (e.g., far more normal images than abnormal). By generating abnormal examples, one can ensure an AI sees enough pathology during training to learn robust features. Regulators will likely scrutinize models trained on synthetic data to ensure no unintended biases are introduced, but so far the concept is proving beneficial in research settings [89].
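As a simple illustration of this augmentation step, the hedged sketch below mixes real images with synthetic minority-class images produced by an already-trained generator before building a training loader. The dataset sizes, tensor shapes, and placeholder generator are assumptions chosen only for illustration.

```python
# Minimal sketch: balance a skewed imaging dataset with synthetic minority-class
# images before training a classifier. Sizes, shapes, and the pretrained
# generator are hypothetical placeholders.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Real dataset: e.g., 900 "normal" and 100 "abnormal" images as tensors.
real_images = torch.randn(1000, 1, 128, 128)
real_labels = torch.cat([torch.zeros(900), torch.ones(100)]).long()
real_ds = TensorDataset(real_images, real_labels)

# Assume a trained generator that produces synthetic "abnormal" images from noise.
class DummyGenerator(torch.nn.Module):
    def forward(self, z):
        return torch.tanh(torch.randn(z.size(0), 1, 128, 128))

generator = DummyGenerator()
with torch.no_grad():
    synthetic_images = generator(torch.randn(800, 64))   # 800 synthetic abnormals
synthetic_labels = torch.ones(800).long()
synthetic_ds = TensorDataset(synthetic_images, synthetic_labels)

# Combined dataset now has roughly 900 normal vs. 900 abnormal examples.
train_ds = ConcatDataset([real_ds, synthetic_ds])
loader = DataLoader(train_ds, batch_size=32, shuffle=True)
print("Training examples after augmentation:", len(train_ds))
```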
One particularly powerful use of generative synthetic data is to enable data sharing across institutions without violating privacy. Institutions often cannot share patient images freely due to privacy concerns. However, if one institution trains a generative model on their data and then shares only the model (or the model’s output images which are synthetic), other institutions can benefit from the variance in that data without ever seeing the real patients’ images. Synthetic data, if truly de-identified and irreversibly generated, might sidestep some HIPAA restrictions, though this is an area under careful ethical review. The ideal is to create a “virtual cohort” of synthetic patients whose data can be open source, enabling many researchers to develop and test algorithms without risk to real patients.

4.7.2. Image Quality Enhancement

Generative models excel at learning the underlying structure of images, which allows them to fill in gaps or correct degradation. Denoising is one such application, particularly useful for MRI or CT images, which often contain noise due to fast acquisition or low radiation dose. Generative models can be trained on pairs of images (one noisy, one clean) and learn to produce the clean version from a noisy input. A generative denoising model for low-dose CT reduced noise such that radiologists rated the images as diagnostically equivalent to normal-dose CT, potentially allowing dose reduction of 25–50% in certain scans [4].
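A minimal sketch of this paired-training idea is shown below, assuming a small convolutional network and placeholder tensors standing in for registered low-dose and normal-dose slices; a practical system would use a far larger architecture, perceptual or adversarial losses, and rigorous reader studies.

```python
# Minimal sketch of paired-image denoising: learn to map low-dose (noisy) CT
# slices to normal-dose (clean) slices. Data and architecture are placeholders.
import torch
import torch.nn as nn

class SmallDenoiser(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, x):
        # Predict the residual noise and subtract it (residual learning).
        return x - self.net(x)

model = SmallDenoiser()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder batch of paired slices: clean targets plus simulated low-dose noise.
clean = torch.rand(8, 1, 64, 64)
noisy = clean + 0.1 * torch.randn_like(clean)

for step in range(5):                 # a real run would iterate over a full dataset
    optimizer.zero_grad()
    denoised = model(noisy)
    loss = loss_fn(denoised, clean)
    loss.backward()
    optimizer.step()
print("final MSE:", loss.item())
```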
In MRI, there is growing interest in super-resolution techniques that reconstruct high-resolution images from lower-resolution scans. For example, this could enable 1.5 Tesla MRI scanners to produce images with quality approaching that of 3 Tesla scanners through AI-driven enhancement. Early studies in brain MRI show promise, with diffusion models capable of upsampling images to reveal finer anatomical details; however, radiologists must remain vigilant to ensure AI does not introduce fabricated structures. Generally, these enhancements are used as an adjunct, not a replacement: a radiologist might view both the original and the AI-enhanced image side by side [4].

4.7.3. Image-to-Image Translation

This refers to converting one type of image to another. In radiology, a prime example is synthesizing CT images from MRI for surgical planning or radiation planning. MRI provides superior soft tissue detail but lacks the bone detail of CT that is often needed for these tasks. A GAN or diffusion model can be trained on patients who had both MRI and CT to learn a mapping, and then when a new patient has only MRI, the model can generate a pseudo-CT. In 2023, an orthopedic research group successfully generated CT images from MRI for patients with pelvic fractures. Surgeons found that the synthetic CTs had a sufficiently accurate depiction of the bone structures to plan certain surgeries, though they would not yet rely on them alone [74]. The potential lies in reducing reliance on multiple imaging modalities, thereby lowering both costs and patient radiation exposure. Pathology offers similar applications, for instance, using AI to virtually convert an H&E-stained slide into a special stain like PAS or trichrome, enabling digital histochemical staining without additional lab work.
As of 2025, however, generative AI in imaging is largely confined to research and pilot projects. No generative model is yet FDA-approved for autonomous clinical image interpretation or generation (some discriminative AI for detection is approved, but not generative). That said, the FDA has approved certain AI-based image enhancement devices (such as an MRI reconstruction algorithm) under its device pathways, so it is plausible that a generative enhancement tool could gain clearance if proven not to introduce errors.
The radiology community is actively assessing these tools, with journals like Radiology: Artificial Intelligence publishing numerous studies on GANs enhancing AI model generalizability, along with surveys highlighting recent advancements [8]. There is growing agreement that generative AI can be a valuable tool in imaging if used carefully, but radiologist oversight is crucial to catch errors, like a GAN accidentally adding a fake lesion that could be mistaken for disease [7]. Some studies have reported an increase in burnout prevalence with the use of AI in radiology due to emotional exhaustion. A dose-response relationship was observed, with higher burnout among those with greater AI use, high workload, and lower AI acceptance, suggesting the need for further research to optimize AI integration in radiology practice [90].
Therefore, AI tools are expected to assist rather than replace radiologists and pathologists, and their impact will grow with digital adoption, especially as slide digitization enables advanced features like image enhancement and virtual staining [4]. Table 3 summarizes the primary and secondary outcomes of clinical studies using artificial intelligence.

4.8. Drug Discovery and Biomedical Research

Beyond clinical care, generative AI is rapidly advancing biomedical research, especially in drug discovery and molecular design. This is an area where generative models have been used for several years (even preceding the recent LLM explosion), but recent advances have greatly accelerated progress. Generative AI is attractive in drug discovery because it can explore vast chemical spaces far more efficiently than traditional methods.
Large language models (LLMs) have also emerged as a valuable tool for shaping strategic research directions. The vast and ever-expanding volume of biomedical literature poses challenges for manual synthesis and priority setting. LLMs can assist in systematically analyzing published literature, extracting latent trends, identifying underexplored areas, and flagging overrepresented domains. This capability can guide the design of more equitable and forward-looking medical research agendas. Recent reports from major funding agencies such as the European Research Council (ERC) and the National Science Foundation (NSF) have highlighted imbalances in funded research domains, with certain fields receiving disproportionate attention while others remain underrepresented. In this context, LLMs can support evidence-based decision-making by enabling meta-research analyses across multiple disciplines. For example, generative AI tools can map the density of research efforts, funding patterns, or citation impact across therapeutic areas, thereby helping stakeholders rebalance research priorities toward unmet medical needs or neglected populations [96]. Furthermore, these models can be fine-tuned to assess global health priorities, translational impact, and alignment with Sustainable Development Goals (SDGs). With appropriate human oversight, transparency, and bias-mitigation mechanisms, generative AI has the potential to support funding agencies, research institutions, and policymakers in designing more inclusive, impactful, and data-driven medical research strategies [97].

4.8.1. De Novo Molecule Generation

Generative models (including GANs, variational autoencoders, and transformer-based models) can be trained on datasets of known bioactive molecules and then generate novel chemical structures with desired properties. In practice, a researcher can set specific criteria such as targeting a particular protein or avoiding toxic substructures, and the AI generates candidate molecules that match or closely approximate those requirements. This approach reverses the traditional process: instead of humans designing molecules for testing, AI designs a wide range of options, and researchers focus on testing the most promising ones. In the last three years, there have been remarkable successes attributed to this approach. As of late 2023, roughly 70 new drug candidates designed with the help of AI (many using generative approaches) were reported to be in various stages of clinical trials [4]. While none of these AI-designed drugs has reached the market yet, some have progressed to Phase II trials, indicating they passed initial safety and dosing studies in humans [98]. A notable example is Insilico Medicine's compound INS018_055, which was discovered and designed using generative AI for a novel anti-fibrotic target. This drug completed Phase I trials in early 2023 with positive results and has entered Phase II trials for patients with idiopathic pulmonary fibrosis [98]. According to press releases, it is described as the first AI-discovered drug to enter Phase II human trials, marking a major milestone in the field [98]. The key point is that generative algorithms rapidly scanned millions of molecules and identified a candidate in months, a task that usually takes years.
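A small, hedged sketch of the downstream triage step is shown below: generated candidate structures are checked for chemical validity and filtered by simple drug-likeness criteria before any are advanced to assays. The SMILES strings and cutoffs are illustrative assumptions, and real pipelines would add target-specific scoring and toxicity alerts.

```python
# Sketch: triage AI-generated candidate molecules with simple property filters.
# The SMILES strings and cutoffs are illustrative, not from any published pipeline.
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

generated_smiles = [
    "CC(=O)Oc1ccccc1C(=O)O",        # aspirin, standing in for a "generated" molecule
    "CCN(CC)CCNC(=O)c1ccc(N)cc1",   # procainamide-like structure
    "C1=CC=CC=C1XX",                # invalid string a model might emit
]

def passes_filters(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                  # reject chemically invalid generations
        return False
    mw = Descriptors.MolWt(mol)
    logp = Descriptors.MolLogP(mol)
    drug_likeness = QED.qed(mol)
    return 150 <= mw <= 500 and logp <= 5 and drug_likeness >= 0.5

shortlist = [s for s in generated_smiles if passes_filters(s)]
print("Candidates passing initial filters:", shortlist)
```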
Another AI-designed drug, DSP-1181, developed by Exscientia and Sumitomo Dainippon Pharma for OCD, entered Phase I trials in 2020, using earlier AI methods. By 2024, many pharma companies had fully adopted generative AI to create drug candidates, which are then improved through lab testing and model feedback [99].
Generative AI is also used to design new proteins and peptides with desired functions. In 2022, researchers used a generative model called ProGen (a language model for protein sequences) to create millions of enzyme sequences, synthesized a few of the top-rated ones, and found that some of these AI-generated enzymes were functional in vitro; one even matched the activity of natural enzymes despite having a completely new sequence. In 2023, diffusion models were applied to protein structure generation (hallucinating new protein folds atom-by-atom), enabling the design of proteins that bind to specific targets, a key step in biologic drug development. One such approach produced novel proteins that could bind a cancer-relevant target with high affinity, effectively proposing a new class of biologic therapeutics. These are still in the laboratory testing phase, but the speed at which generative models can ideate new therapeutic candidates is unprecedented [4].

4.8.2. Drug Optimization and ADMET

Another role of generative models is the optimization of existing drug candidates for better pharmacological profiles. A model might start with a known active molecule and generate analogs predicted to be more potent or have better absorption, distribution, metabolism, excretion, and toxicity (ADMET) characteristics. For instance, if a compound is effective but has liver toxicity, a generative model can try to modify the structure in ways that remove the toxic substructure while retaining activity. This is akin to having a thousand virtual medicinal chemists brainstorming modifications around the clock. Some pharma companies reported that by using such AI tools, they reduced the number of compounds they needed to synthesize by 50% to find a preclinical drug candidate, translating to significant time and cost savings [100].

4.8.3. Clinical Trial Design and Data Augmentation

Besides designing molecules, generative AI is also helping to improve the design and analysis of clinical trials. One intriguing use is generating “digital patient” data to simulate clinical trials. Companies are developing generative models that learn from past trial data and electronic health records to create realistic virtual patient populations. By doing this, they can simulate how a trial might play out (for example, what outcomes the placebo group might have) to better understand trial power or to identify which patient subgroups might benefit most from a treatment [4].
In some cases, these digital twins of patients can serve as a proxy for a control arm, potentially allowing smaller real-trial arms. For instance, instead of enrolling 100 placebo patients, a company might enroll 50 and supplement with 50 AI-simulated patients, though regulators have not yet accepted this approach and it remains experimental. One step in that direction is using historical data to strengthen control groups; generative models can create data that match past patterns, helping assess if historical controls are suitable for a trial [4].
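The trial-simulation idea can be illustrated with a brief sketch in which the generative model is abstracted as a simple outcome sampler fit to historical data; all effect sizes, outcome scales, and arm sizes below are assumptions chosen only to show how power might be explored with a hybrid control arm.

```python
# Sketch: explore trial power when simulated ("digital") controls supplement a
# smaller concurrent placebo arm. Effect sizes and variability are assumptions,
# and the generative model is abstracted as a simple outcome sampler.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def simulate_trial(n_treated=100, n_real_controls=50, n_synthetic_controls=50,
                   effect=-5.0):
    treated = rng.normal(120 + effect, 15, n_treated)          # e.g., systolic BP
    real_controls = rng.normal(120, 15, n_real_controls)
    # "Generative" controls: drawn from a model fit to historical data.
    synthetic_controls = rng.normal(121, 16, n_synthetic_controls)
    controls = np.concatenate([real_controls, synthetic_controls])
    return stats.ttest_ind(treated, controls).pvalue

# Estimate power as the fraction of simulated trials detecting the effect.
p_values = [simulate_trial() for _ in range(2000)]
power = np.mean(np.array(p_values) < 0.05)
print(f"Estimated power with a hybrid control arm: {power:.2f}")
```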
The process of screening participants for clinical trials is time-consuming, resource-intensive, and prone to errors. The advanced natural language processing capabilities of large language models like GPT-4 offer the potential to improve the efficiency and accuracy of the screening process. One clinical study evaluated RECTIFIER, a retrieval-augmented generation (RAG)-enabled GPT-4 system, for screening patients in a trial of symptomatic heart failure; the system improved accuracy, efficiency, and cost-effectiveness in identifying eligible patients. AI-assisted screening can thus enhance trial workflows but requires clinician oversight [101].
The U.S. FDA has shown openness to innovative trial designs (like Bayesian approaches and synthetic controls in rare diseases), so generative AI may soon become part of the trial design toolbox, especially in early-phase or exploratory studies where ethical and practical constraints encourage creative methods.

4.8.4. Genomics and Precision Medicine

Generative AI can also be applied in genomics research, for example, by generating synthetic genomic sequences that obey certain properties. One example is designing mRNA sequences for vaccines or therapeutics with optimal protein expression and stability. An LLM can be trained on known mRNA sequence-protein pairs and then generate new mRNA sequences for a given protein that are predicted to be more efficiently translated in human cells (by optimizing codon usage, secondary structure, etc.).
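As a small, hedged example of the kind of automated sanity checks such a pipeline needs, the sketch below verifies that a generated coding sequence translates to the intended protein and reports GC content as a crude proxy; the sequences are toy examples, and real workflows would also model secondary structure and codon adaptation.

```python
# Sketch: sanity-check an AI-generated mRNA coding sequence. Verify that it
# encodes the intended protein and report GC content as a rough stability proxy.
# Sequences are toy examples, not real design outputs.
from Bio.Seq import Seq

target_protein = "MKT"                      # toy peptide
candidate_cds = "ATGAAGACC"                 # a generated coding sequence

def check_candidate(cds, protein):
    translated = str(Seq(cds).translate())
    gc_content = (cds.count("G") + cds.count("C")) / len(cds)
    return translated == protein, gc_content

ok, gc = check_candidate(candidate_cds, target_protein)
print(f"Encodes target protein: {ok}, GC content: {gc:.2f}")
```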
Diagnosing genetic disorders involves the meticulous manual review and analysis of candidate variants, which is a time-consuming and complex process even for experienced geneticists. AI-MARRVEL (AIM) is a machine learning tool that is designed to enhance genetic disorder diagnosis by integrating expert-engineered features into a random forest classifier trained on over 3.5 million variants. AIM outperformed current methods, doubling the number of diagnosed cases across three independent cohorts and reaching a 98% precision rate in identifying diagnosable cases [102].
During the COVID-19 pandemic, researchers used AI to design optimized mRNA vaccine sequences, some of which were taken forward for preclinical studies. Another fascinating area is using generative models to propose new CRISPR guide RNA designs or even entirely new base-editing enzymes, essentially inventing new biotech tools via AI.

4.8.5. Biomedical Literature and Knowledge Synthesis

For medical researchers and clinicians, a practical use of generative AI is helping manage the overwhelming amount of scientific literature. LLMs, when fine-tuned on biomedical publications (like BioGPT, PubMedGPT, or Google's scientific model), can summarize papers, extract key points, and even generate initial drafts of literature reviews. Researchers can use these models to quickly obtain up-to-date summaries on topics like CAR-T cell therapy for lymphoma along with references and to organize major findings for a paper's introduction. However, it is necessary to verify all content and references (since models can generate references that look real but are not). Nevertheless, these tools can significantly speed up the preliminary drafting process. Some scientists use ChatGPT as a smart research assistant to write data analysis code or troubleshoot lab protocols by asking targeted questions. Researchers are also exploring how LLMs can mine scientific literature to generate new hypotheses, for example, uncovering hidden links between genes, drugs, and diseases across millions of papers that no single person could fully process and suggesting experiments to test these ideas. While still in the early stages, this "discovery generation" points to AI not just speeding up research but potentially contributing creatively to scientific discovery. The progress in AI-driven drug discovery has led to substantial investment and the rise of several biotech startups focused on AI. By 2024, several AI-discovered drugs had entered human trials, with at least one, an insulin analog, approaching Phase III [99].
Pharmaceutical companies have started to form partnerships or acquire AI startups to strengthen their drug development process. AI is expected to shorten the preclinical research phase by years and boost success rates by identifying better drug targets and molecules, resulting in fewer failures in Phase II/III clinical trials. However, caution is needed: AI can design promising molecules in silico, but real-world biology often brings unexpected results. Some AI-generated compounds that seem promising in simulations may fail due to unmodeled complexities like metabolites or immune reactions. While AI is not a magic bullet to eliminate drug development risk, it does appear to improve the chances of success [103].
From a clinician’s view, AI in drug discovery may not be noticeable until new treatments reach the market, but it could mean more, faster, and broader options, including for rare diseases. For clinician-researchers, AI helps explore big data and the literature and even generate hypotheses like repurposing existing drugs by linking disease mechanisms to drug actions. In fact, AI has already suggested generic drugs for new uses, with some now in investigator-led trials. The real test will be how many translate to clinical breakthroughs. Even a few successful Phase III approvals could validate the approach and pave the way for an era where AI plays a key role in developing new treatments.

4.9. Patient Monitoring and Telehealth Integration

Remote patient monitoring (RPM) enhanced by AI is transforming patient care, especially for chronic conditions and postoperative recovery [104]. It enables continuous, real-time monitoring of health metrics like heart rate and glucose levels through wearable devices, providing immediate data for healthcare providers. This technology is crucial in chronic disease management, allowing for real-time interventions and personalized medication and lifestyle adjustments [105]. The Tenovi company in California, US, has developed an AI-powered RPM system to assist patients with diabetes. The system continuously monitors key metrics such as blood glucose, physical activity, and dietary intake and leverages AI to generate personalized meal and exercise plans. This tailored approach enhances glycemic control and reduces complication risks. For postoperative patients, RPM aids in tracking recovery and reducing hospital visits.
AI-powered telehealth is transforming care delivery, especially in underserved regions. By offering virtual consultations and automated initial assessments, it bridges distance gaps and speeds up access. It also streamlines patient triage, ensuring people are swiftly guided to the right level of care based on their individual needs [106]. In chronic disease management, AI-powered telehealth enables round-the-clock monitoring and personalized care plans that reduce the need for in-person visits. It also supports mental well-being by offering chatbots and virtual therapists for timely emotional assistance [107]. One specific example is an AI system used by doctors in India, designed by the US-based company Welltok. This AI system provided real-time analysis of the doctor’s interactions with patients and made recommendations on improving care. This led to a more efficient patient management process, reducing call volume for health plans, providers, and employer benefit managers while offering an on-demand, customized experience [47]. The AI chatbot used in this system, known as Concierge, helped increase resource efficiency, provide cost transparency, and direct customers to lower-cost alternatives. It achieved an accuracy rate of 98% and was found to save consumers time by over 60%.

4.10. Medical Education and Training

The application of generative AI in medical education and training is noteworthy and relevant for both clinicians and researchers. Generative AI can serve as a powerful educational tool by personalizing learning and providing on-demand teaching in ways that were not previously possible.

4.10.1. Education for Trainees

Medical students and residents can use generative AI as a supplement to their learning. One straightforward use is as a tutoring system. LLMs like ChatGPT can be asked to explain complex concepts in simple terms. For example, a student might ask “Explain the Frank–Starling mechanism as if I’m a college biology major”, and the AI would give a tailored response, allowing for follow-up questions like an interactive textbook. While ChatGPT’s medical explanations are generally accurate, they can include minor errors or oversimplifications, so key points should be cross-checked with reliable sources [108]. Another use case of AI is generating practice questions or clinical vignettes. Early evidence suggests these AI-generated items can closely resemble board exam questions, aiding exam prep. Some faculties are already using AI to help create quiz questions and OSCE (objective structured clinical examination) scenarios [6].

4.10.2. Simulation and Case-Based Learning

Generative AI can simulate realistic patient encounters, providing valuable training opportunities for medical learners. For example, an LLM-based chatbot can role-play a patient with a specific condition, allowing trainees to conduct virtual interviews and practice history-taking. The AI can respond realistically, showing emotions or hesitation, creating a safe space to build clinical reasoning and communication skills. It can also generate complete case studies with patient details, symptoms, and synthetic lab results or images for discussion. Educators can use AI-generated cases to train students in differential diagnosis and management, offering controlled, diverse scenarios including rare conditions often missed in limited clinical rotations. A 2023 pilot at a medical school found that students who used AI-simulated patient interviews alongside their standard curriculum reported greater confidence and slightly better performance in standardized patient exams. Students noted that the AI patients sometimes felt realistic and even more challenging, since they could push back with questions like "Doctor, why are you asking that?", prompting students to explain their reasoning clearly.

4.10.3. Continuing Education and Knowledge Update

Generative AI offers clinicians a way to stay updated on new research and clinical guidelines specific to their field. For example, an oncologist could ask “What are the major updates in Stage III lung cancer treatment this year?” and obtain a clear, concise summary. Likewise, AI can generate personalized learning modules such as an introduction to medical genetics along with structured content and self-assessment quizzes.

4.10.4. Assessment and Feedback

Generative AI might also assist faculties in evaluating students. AI can be used to evaluate student essays or clinical notes, offering feedback on medical accuracy and organization. It might flag missing elements in a differential diagnosis or suggest clearer phrasing. Early studies show that AI can grade short-answer responses comparably to human assessors, making it a potential tool for large-scale education. However, caution is needed to avoid reinforcing biases in the training data that may shape AI judgments about what constitutes a “good” answer.
There are some cautions for using generative AI in education. A key concern with the use of generative AI by students is maintaining academic integrity and ensuring accuracy. Overreliance on AI may hinder critical thinking and risk acceptance of incorrect information. In response, institutions are setting guidelines restricting AI use in formal assessments but allowing it as a study aid with full disclosure and fact-checking. There is also a push to educate students about AI’s strengths and limitations, with some programs integrating basic AI literacy to prepare future clinicians for its responsible use.
While generative AI’s role in healthcare often centers on clinical tasks, its potential in education and training is rapidly expanding. By offering personalized, interactive learning experiences, AI can effectively supplement traditional methods by helping develop clinicians and researchers who are both well-informed and digitally proficient.

4.11. Explainability in Machine Learning in Contrast to Traditional Statistical Methods

Conventional statistical models, such as logistic regression or Cox proportional hazards models, provide inherent transparency through interpretable parameters, p-values, and confidence intervals—making them widely trusted in clinical and epidemiological research. However, these models are often constrained by assumptions of linearity, limited covariates, and predefined relationships. In contrast, ML models, particularly deep learning and large language models, can model highly complex, nonlinear interactions and extract patterns from unstructured data but have been criticized for their “black-box” nature. This lack of intuitive explainability can hinder clinical adoption, particularly in high-stakes decision-making.
To address this, the field of explainable artificial intelligence (XAI) has emerged, offering tools such as SHAP (SHapley Additive exPlanations), LIME (Local Interpretable Model-agnostic Explanations), attention visualizations, and counterfactual reasoning to demystify model predictions. These tools provide local and global interpretability, allowing clinicians to understand the rationale behind specific outputs, detect biases, and assess reliability in context. While statistical models offer a global understanding rooted in theoretical constructs, explainable ML provides case-level insights and dynamic explanations tailored to individual predictions, an important feature in personalized and precision medicine. A hybrid approach that integrates the interpretability of statistical models with the flexibility of ML may offer the most robust solutions for transparent and trustworthy clinical AI [109].
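To illustrate the case-level explanations described above, the sketch below applies SHAP to a small tree-based risk model trained on synthetic data; the features, data, and model are assumptions used only to show how a single prediction can be decomposed into per-feature contributions.

```python
# Sketch: case-level explanation of a clinical risk model using SHAP values.
# Features, data, and the model are synthetic; the aim is to show how one
# prediction can be decomposed into per-feature contributions.
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "age": rng.integers(30, 90, 500),
    "creatinine": rng.normal(1.0, 0.4, 500),
    "systolic_bp": rng.normal(130, 20, 500),
})
# Synthetic outcome loosely driven by age and creatinine.
y = ((0.03 * X["age"] + 1.5 * X["creatinine"] + rng.normal(0, 1, 500)) > 4).astype(int)

model = XGBClassifier(n_estimators=100, max_depth=3).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[[0]])   # explain one patient (log-odds scale)
for feature, value in zip(X.columns, np.ravel(shap_values)):
    print(f"{feature}: {value:+.3f}")
```

Outputs like these give a clinician a per-patient account of which variables pushed the risk estimate up or down, complementing the population-level coefficients familiar from traditional statistical models.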

4.12. Comparative Insights of Generative AI with Traditional Methods or Human Experts

A comparative evaluation of generative AI against traditional diagnostic tools and clinical expertise reveals both promise and limitations in its real-world adoption. Across multiple domains ranging from differential diagnosis and radiological interpretation to early detection of mental health disorders and drug discovery, generative AI models have frequently demonstrated diagnostic performance on par with or superior to human experts, particularly in structured decision-making tasks and image-heavy domains. For instance, GPT-4 matched or exceeded the diagnostic accuracy of senior medical residents in complex cases, while specialized AI tools in radiology achieved sensitivities exceeding 95%, emulating expert consensus. Similarly, machine learning-based diagnostic models for conditions like Kawasaki disease and schizophrenia showed high discriminative performance, suggesting AI's potential to reduce diagnostic delays in vulnerable populations. However, these advances must be contextualized within broader clinical workflows, as model hallucinations, interpretability challenges, and variability in real-world datasets can impact reliability. Hence, while generative AI serves as a compelling adjunct to human expertise, its optimal role may be as a decision support collaborator rather than a standalone solution, especially in high-stakes clinical scenarios requiring nuanced judgment [110,111].

5. Challenges and Ethical Considerations

Despite its immense promise, generative AI in healthcare comes with significant challenges and risks that must be addressed. The transition from exciting AI demos to safe, effective clinical tools is nontrivial. As generative AI becomes more embedded in healthcare, its safe and effective use will depend on careful regulation, evidence-based insights, and proactive policymaking. There are several challenges and ethical considerations involved with the use of generative AI in healthcare systems, which are depicted in Figure 5 and discussed below.

5.1. Accuracy and Hallucinations

One of the most important issues with generative AI, especially LLMs, is their tendency to generate incorrect output that is superficially plausible. In healthcare systems, AI-generated misinformation, often called “hallucinations”, can have critical consequences, especially if used without proper verification [108]. For instance, an AI might fabricate a lab result or imaging finding while summarizing a case or propose an unproven treatment as part of a patient care plan. AI systems have been known to generate citations for medical journal articles that do not exist, constructing them by combining legitimate-sounding author names and titles. In medicine, hallucinated content could mislead clinicians or patients if not caught. The primary reason for these mistakes is that generative models focus on producing the most likely word or image sequences, not necessarily the most accurate. If their training data contains contradictions or lacks certain information, the models tend to fill those gaps with plausible assumptions. Additionally, they lack an inherent fact-checking mechanism or understanding of truth—they do not “know” in the human sense; they just correlate patterns.
Ensuring accuracy is therefore essential. Possible solutions include integrating verification mechanisms such as cross-checking AI-generated outputs against trusted databases, limiting outputs to source-backed information through retrieval-augmented models, and continuously fine-tuning models with user feedback. To address these risks, some institutions are adopting a "bounded" use of generative AI, limiting it to tasks like summarizing or rephrasing content strictly from existing transcripts or charts, without generating new or speculative information. Similarly, patient chatbots are being designed to include frequent disclaimers like "I am not a medical professional, please consult one for any serious symptoms" and to alert or redirect users to emergency help if they mention things like self-harm.
Despite the promise of generative AI in healthcare, its implementation in real-world clinical settings faces several critical limitations. One of the most pressing concerns is the tendency of large language models (LLMs) to produce hallucinations that are plausible-sounding but factually incorrect or misleading. This poses a substantial risk in clinical environments, where accuracy is paramount and misinformation can lead to harmful consequences. Hallucination rates vary across models and use cases. Early versions of LLMs, such as GPT-3.5, demonstrated higher hallucination frequencies, particularly in complex, domain-specific tasks. In contrast, newer models like GPT-4 have shown improvements in factual accuracy and consistency. For example, a 2023 benchmarking study evaluating medical question-answering found that GPT-4 had a hallucination rate of approximately 8–10% in clinical vignettes, compared to 20–30% for earlier models. However, even low error rates can have significant clinical ramifications, particularly when generative AI is used to support diagnosis, treatment planning, or patient communication.
The clinical consequences of hallucinations can range from minor confusion to serious adverse events, depending on how the information is used. If an AI model suggests an incorrect drug dosage or misinterprets a diagnostic image summary, clinicians relying on such outputs without critical oversight may unknowingly propagate errors. These risks underscore the importance of maintaining human oversight and integrating AI as an assistive, not autonomous, tool in clinical workflows.
To address hallucinations and other risks, several mitigation strategies are under development. Retrieval-augmented generation (RAG) architectures combine generative AI with fact retrieval from trusted medical databases or electronic health records (EHRs), anchoring outputs in verified sources. Additionally, domain-specific fine-tuning using curated medical datasets improves the factual robustness of LLMs. Techniques such as prompt engineering, reinforcement learning with human feedback (RLHF), and postprocessing filters are also being employed to reduce hallucinations. Furthermore, regulatory frameworks and validation standards for clinical AI tools remain in their early stages. At present, no generative AI system has received FDA approval for unsupervised clinical decision-making. Therefore, rigorous testing, external validation, and continuous monitoring are essential to ensure safety, reliability, and ethical deployment [108,109].
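A minimal sketch of the retrieval-augmented pattern is shown below, assuming a tiny in-memory snippet store and a toy lexical scorer in place of the vector search and hosted LLM a production system would use; the guideline texts are hypothetical.

```python
# Minimal retrieval-augmented generation (RAG) sketch: ground an answer in
# retrieved guideline snippets instead of letting the model answer unaided.
# The snippet store, scorer, and downstream LLM call are simplified placeholders.
from dataclasses import dataclass

@dataclass
class Snippet:
    source: str
    text: str

GUIDELINE_STORE = [
    Snippet("Hypothetical sepsis guideline 3.2",
            "Administer broad-spectrum antibiotics within 1 hour of recognition."),
    Snippet("Hypothetical sepsis guideline 4.1",
            "Give 30 mL/kg IV crystalloid for hypotension or lactate >= 4 mmol/L."),
]

def retrieve(question, store, k=2):
    # Toy lexical overlap score; production systems use vector embeddings.
    q_terms = set(question.lower().split())
    scored = sorted(store, key=lambda s: -len(q_terms & set(s.text.lower().split())))
    return scored[:k]

def build_prompt(question, snippets):
    context = "\n".join(f"[{s.source}] {s.text}" for s in snippets)
    return (f"Answer using only the sources below and cite them.\n"
            f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:")

question = "What IV crystalloid volume is recommended for hypotension in sepsis?"
prompt = build_prompt(question, retrieve(question, GUIDELINE_STORE))
print(prompt)   # this grounded prompt is then sent to the LLM for generation
```

The essential design choice is that the model is asked to answer only from retrieved, citable context, which makes unsupported statements easier for a clinician reviewer to detect and audit.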

5.2. Bias and Health Equity

Generative AI in healthcare can unintentionally reinforce biases present in its training data, leading to unequal performance across different demographic groups. This can affect everything from symptom interpretation to drug design, particularly when certain populations are underrepresented. For instance, an AI trained mostly on lighter skin tones may miss conditions on darker skin, or one trained on adult data may underperform in pediatric cases. To address this, developers must use diverse datasets, evaluate performance across subgroups, and incorporate explainable AI to make biases easier to detect. Ensuring fairness requires ongoing oversight, inclusive design, and transparency throughout development and deployment [110].
Bias in generative AI models can arise from nonrepresentative training data, flawed labeling, or historical inequities embedded in clinical datasets. These biases may result in disparities in diagnostic accuracy, treatment recommendations, or patient communication quality—particularly affecting marginalized or underrepresented groups. To mitigate such risks, several strategies are being adopted:
Dataset Diversification: Curating diverse and representative datasets is a foundational step. This includes demographic diversity (race, ethnicity, age, gender, socioeconomic status), as well as clinical diversity across disease types, comorbidities, and care settings. Efforts are underway to build open-access, high-quality datasets with better subgroup representation, such as the NIH’s All of Us program.
Algorithmic Fairness Techniques: Techniques such as reweighting, adversarial debiasing, and fairness-constrained learning are used during training to ensure equitable performance across subgroups. These techniques explicitly penalize disparities in predictions between protected groups.
Bias Audits and Evaluation Metrics: Regular audits using fairness metrics (e.g., equal opportunity, demographic parity, disparate impact ratio) are essential during both model development and post-deployment phases; a brief worked example of two such metrics follows this list. Tools like Model Cards and Datasheets for Datasets improve transparency regarding model performance across different populations.
Synthetic Data Augmentation: Generative models can be used to create synthetic examples of underrepresented cases (e.g., rare diseases or minority populations), which can supplement training datasets and reduce skew.
Post-Deployment Monitoring: Continuous monitoring of AI outputs in real-world clinical environments helps detect emerging biases due to data drift or contextual misalignment. Feedback loops involving clinicians and patients can guide recalibration.
Stakeholder Engagement and Ethical Review: Involving diverse stakeholders, including ethicists, patient advocates, and clinicians from different backgrounds, throughout the AI development life cycle ensures broader perspectives are considered and potential harms are anticipated.
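The worked example below illustrates two of the audit metrics named above, demographic parity and the disparate impact ratio, on synthetic predictions; the group labels, prediction rates, and the 0.8 flagging threshold (the familiar four-fifths rule) are illustrative assumptions rather than recommended clinical thresholds.

```python
# Sketch: two common fairness checks on a model's binary predictions across a
# protected attribute. Predictions and group labels are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(7)
group = rng.choice(["A", "B"], size=1000, p=[0.7, 0.3])   # protected attribute
pred = np.where(group == "A",
                rng.random(1000) < 0.30,                   # ~30% positive rate for A
                rng.random(1000) < 0.18)                   # ~18% positive rate for B

rate_a = pred[group == "A"].mean()
rate_b = pred[group == "B"].mean()

demographic_parity_gap = abs(rate_a - rate_b)
disparate_impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)

print(f"Positive rate A: {rate_a:.2f}, B: {rate_b:.2f}")
print(f"Demographic parity gap: {demographic_parity_gap:.2f}")
print(f"Disparate impact ratio: {disparate_impact_ratio:.2f}")  # <0.8 often flags concern
```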
Despite these strategies, challenges remain, such as balancing fairness with model accuracy and the lack of standardized fairness benchmarks in medical AI. Regulatory bodies and professional organizations are beginning to issue guidance, but more consensus is needed to implement and enforce fairness principles at scale. By integrating these strategies, generative AI can move closer to delivering equitable healthcare solutions that do not reinforce or exacerbate existing disparities [110].

5.3. Privacy and Security

Generative AI models in healthcare raise significant concerns about patient privacy and consent, as they rely on vast amounts of training data, often drawn from patient records. Even if de-identified, there is a small but real risk that models could memorize and inadvertently reproduce sensitive information. Cases have shown that models can regurgitate details like Social Security numbers or chunks of rare-case narratives from training sets. This has prompted calls for strong de-identification practices, safeguards against protected health information (PHI) leakage, and limits on using real patient data. Technologies like federated learning, where data never leaves the institution, and clear usage guidelines for clinicians, especially around avoiding third-party tools like public AI services for PHI, are being emphasized. Legal frameworks such as HIPAA require any AI vendor handling PHI to have formal agreements in place, and until healthcare-specific, secure AI systems become more common, this remains a critical ethical and compliance issue [112].
Beyond privacy, integrating generative AI into clinical workflows introduces cybersecurity vulnerabilities. These systems could become targets for “prompt attacks” where malicious inputs lead to unsafe outputs or system manipulation. The hypothetical risk of AI-generated misinformation, like false drug recall alerts, illustrates the need for strict guardrails, authentication protocols, and human oversight. Intellectual property risks also emerge when researchers use AI with sensitive data, posing concerns about data leakage or unauthorized use by the AI’s creators. Solutions include deploying AI models locally within secure environments or using privacy-preserving training techniques. Policymakers, including the U.S. GAO, have recognized these risks and are working toward regulations that ensure compliance with privacy laws while fostering innovation in healthcare AI [112].

5.4. Accountability and Legal Liability

As AI becomes more integrated into clinical decision-making, questions around responsibility, regulation, and transparency are increasingly important. While clinicians are currently accountable for outcomes involving AI, the line blurs when AI tools automatically generate clinical content, raising concerns about oversight and liability. Regulatory bodies like the FDA are exploring how to manage adaptive, learning-based AI systems, especially those providing direct diagnostic or treatment support. Ethically, there is ongoing debate about whether patients should be informed when AI is involved in their care, with transparency seen as key to maintaining trust. These evolving issues highlight the need for clear guidelines to ensure safe, responsible, and equitable use of AI in healthcare [113].

5.5. Implementation Challenges

Besides technical and ethical concerns, implementing generative AI in healthcare faces significant practical challenges, such as integrating with legacy IT systems, training staff, and ensuring AI tools fit smoothly into clinical workflows without causing delays or usability issues. Poor interface design, slow response times, or cumbersome processes can lead to clinician abandonment. Human factors engineering is essential to make AI support seamless, ideally embedding it within existing systems. Cost is another barrier, as developing and deploying AI can be expensive, potentially widening the gap between well-resourced and under-resourced healthcare systems. While cloud-based AI solutions could help smaller practices access advanced tools, they also raise concerns about privacy and data security [113].
Generative AI in healthcare faces significant challenges; recognizing and addressing them through collaboration among clinicians, engineers, ethicists, and regulators is key (Figure 5). Ensuring patient safety, equity, and trust remains central as the field evolves.
The regulatory oversight of generative AI in healthcare is still in its formative stages. Currently, most regulatory agencies—including the FDA and EMA—evaluate AI tools under the broader category of Software as a Medical Device (SaMD). The FDA’s existing guidance (e.g., the “Proposed Regulatory Framework for Modifications to AI/ML-Based Software as a Medical Device” issued in 2019) primarily applies to traditional machine learning models and does not yet fully encompass the unique characteristics of generative AI, such as dynamic content generation, hallucinations, and context-sensitive outputs.
The FDA’s Digital Health Center of Excellence (DHCoE) has begun to explore adaptive AI systems, but as of now, no generative AI model (e.g., LLMs like GPT) has received FDA clearance for autonomous use in clinical decision-making. Instead, most tools are expected to operate under clinician oversight, and their use is generally considered assistive rather than authoritative. In Europe, the EMA and the upcoming EU AI Act propose a risk-based classification system. Under this framework, generative AI tools intended for diagnostic or therapeutic support may fall under “high-risk” AI systems, requiring stringent conformity assessments, transparency, and real-world performance monitoring. However, specific provisions for LLMs and generative models remain unclear, and harmonized guidance for clinical validation is lacking.
Globally, the International Medical Device Regulators Forum (IMDRF) and organizations like the World Health Organization (WHO) have acknowledged the need for adaptive regulatory frameworks that can accommodate the continuous learning capabilities and complex output nature of GenAI systems. To address upcoming challenges, there is a growing call for multistakeholder collaboration between regulators, developers, clinicians, and ethicists. Developing domain-specific standards (e.g., for radiology, oncology, or mental health) and creating audit tools for generative outputs could facilitate safer and more trustworthy adoption.

5.6. Sustainable Development Goals (SDGs): Climate Action

Training and deploying large language models require substantial computational resources, which contribute to energy consumption and carbon emissions. For instance, recent analyses suggest that training a single large-scale AI model can emit hundreds of metric tons of CO2 equivalent, depending on the energy source and data center efficiency. These emissions raise ethical questions about the environmental trade-offs of deploying AI at scale, even in high-priority sectors like healthcare, and emphasize the need for green AI strategies, such as optimizing model architectures, improving energy efficiency in data centers, and adopting renewable-energy-powered computation, to align AI-driven healthcare innovations with broader sustainability goals [114,115].
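The scale of these figures can be illustrated with a back-of-the-envelope estimate; every input below (accelerator count, run length, power draw, data-center overhead, and grid carbon intensity) is an assumption rather than a measurement of any specific model.

```python
# Back-of-the-envelope training-emissions estimate. All inputs are illustrative
# assumptions, not measurements of any specific model or data center.
gpus = 1000                 # accelerators used
hours = 30 * 24             # a 30-day training run
power_per_gpu_kw = 0.4      # average draw per accelerator (kW)
pue = 1.2                   # data-center power usage effectiveness
grid_kg_co2_per_kwh = 0.4   # grid carbon intensity (kg CO2e per kWh)

energy_kwh = gpus * hours * power_per_gpu_kw * pue
emissions_tonnes = energy_kwh * grid_kg_co2_per_kwh / 1000

print(f"Energy: {energy_kwh:,.0f} kWh, emissions: {emissions_tonnes:,.0f} t CO2e")
```

Under these assumed inputs the run consumes roughly 350,000 kWh and emits on the order of 100-150 tonnes of CO2 equivalent, which is consistent with the order of magnitude cited above and highly sensitive to the energy mix powering the data center.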

5.7. Cost-Effectiveness and Scalability Considerations

While generative AI offers significant clinical promise, its real-world deployment must be evaluated in terms of cost-effectiveness and scalability. The implementation costs of generative AI solutions include initial investments in computing infrastructure (particularly for large language models requiring GPU clusters or cloud access), integration with electronic health records (EHRs), staff training, cybersecurity measures, and ongoing maintenance or fine-tuning.
For instance, AI documentation tools such as DAX Copilot or Suki AI have demonstrated some reduction in clinician burnout and documentation time, but large-scale implementation remains costly for smaller or resource-limited institutions. Similarly, deploying AI models at the point of care in rural or under-resourced regions often requires mobile-compatible, lightweight versions of models or cloud-based access, which introduces challenges in bandwidth, data privacy, and affordability. Preliminary cost-benefit analyses from pilot deployments suggest that generative AI systems can be cost-effective over time by reducing clinician workload, administrative costs, and medical errors. However, these findings are largely confined to well-funded health systems in high-income countries. Scalability to low- and middle-income countries (LMICs) requires careful adaptation—such as open-source models, multilingual capabilities, and modular AI solutions that can operate on minimal infrastructure.
To address these disparities, global initiatives like the World Bank’s Digital Health Initiative and the WHO Global Strategy on Digital Health emphasize the development of scalable, equitable AI tools tailored for diverse healthcare contexts. Further studies evaluating longitudinal cost outcomes and health economic modeling will be critical in determining the sustainable integration of generative AI in healthcare systems worldwide.

6. Future Directions and Conclusions

Generative AI in healthcare has rapidly advanced between 2022 and 2025, marked by a surge of pilot projects and innovations from automated note writing to AI-designed drugs. For generative AI to reach its full potential in healthcare, it must be seamlessly integrated into clinical workflows rather than treated as a separate tool. Future EHRs and order entry systems are likely to feature built-in AI assistants that offer real-time suggestions and automate routine tasks. This integration will depend on close collaboration between AI developers and health IT vendors, with a strong focus on intuitive, user-centered design. Ultimately, AI support should feel as natural and unobtrusive as spell-check, enhancing efficiency without disrupting clinical work [116].
Generative AI systems in healthcare can improve over time with continued use, offering a unique opportunity but also posing regulatory challenges. Future deployments may include AI tools that adapt locally or through federated learning, refining performance based on clinician feedback. To ensure safety and effectiveness, hospitals will need ongoing monitoring systems, or "model validation hubs", to track metrics and manage updates. This approach supports a "learning healthcare system", where AI evolves like a clinician gaining experience [116].
Future generative AI systems in healthcare will move beyond text and images to integrate diverse data types such as genomics, wearable data, environmental inputs, and more for a comprehensive, personalized view of each patient. These systems could detect subtle health trends across multiple data sources to predict issues like heart failure exacerbations and proactively communicate with patients. Realizing this vision will require overcoming data silos and interoperability challenges, but it opens the door to creating "digital twins": AI models that simulate individual health trajectories and treatment outcomes, an approach already being explored in critical care [117].
Moreover, future generative AI in healthcare may be personalized not just for patient populations but for each individual patient, adapting to each person's communication style, health literacy, and preferences. An AI health coach could tailor its messages using visuals or text, technical or simple language, based on what resonates best with each user. Similarly, clinicians could benefit from AI that formats outputs according to their preferred style, such as bullet points or narratives. This level of personalization could be achieved through fine-tuning on user-specific data, with privacy maintained via on-device models and user consent [117].
In the coming years, clearer regulatory and ethical guidelines for generative AI in healthcare are expected from bodies like the AMA and FDA, with frameworks addressing disclosure, supervision, and documentation. The EU's AI Act already classifies healthcare AI as high-risk, requiring strong oversight and transparency. Hospitals may create dedicated AI oversight committees to evaluate tools before deployment, assessing factors like training data and potential biases. Additionally, malpractice insurers might adjust risk assessments based on AI use, possibly offering incentives for validated systems that demonstrably reduce clinical errors [118].
The conceptual shift from viewing generative AI as a mere “co-scientist” to positioning it as a “co-doctor” reflects its expanding role in augmenting real-time clinical decision-making, communication, and workflow efficiency. As large language models become increasingly capable of drafting clinical notes, triaging cases, composing patient communications, and even assisting in diagnostic reasoning, their integration into frontline care settings demands robust ethical and operational guidelines. To responsibly guide this transformation, AI tools must be embedded within a human-in-the-loop framework, where clinicians retain ultimate accountability for all medical decisions and documentation. Key principles include ensuring the explainability of outputs, bias mitigation through inclusive training datasets, and transparency in model behavior. Moreover, adherence to data governance regulations (e.g., HIPAA, GDPR) and maintenance of patient confidentiality must remain central. Clinical settings should define clear boundaries for generative AI involvement—differentiating between the automation of routine tasks (e.g., draft generation) and high-stakes judgments (e.g., final diagnosis or treatment planning) [119].
Clinicians will need fundamental AI literacy, prompting medical education to include training on using and interpreting AI tools responsibly. A new role of clinical AI auditor or medical AI specialist may be needed to bridge the gap between healthcare teams and AI systems, ensuring safe and effective use. Similar to clinical pharmacists, these specialists would oversee implementation, troubleshoot problems, and guide best practices. Hospitals may also incorporate AI use into quality assurance reviews to evaluate its real-world impact and identify any risks.
While the promise of generative AI in healthcare is substantial, its future must be understood in the context of current tangible advancements. For instance, Microsoft’s Nuance Dragon Ambient eXperience (DAX) Copilot is currently being piloted in U.S. healthcare systems for ambient clinical documentation, and initial findings suggest a modest but meaningful reduction in clinician workload. Similarly, Google’s Med-PaLM M, a multimodal large language model capable of processing both text and medical images (e.g., chest X-rays and dermatological images), is being tested for clinical reasoning tasks and diagnostic imaging support. Early trials have shown improved diagnostic accuracy in image interpretation when paired with textual clinical context.
Another promising area is voice-based AI for chronic disease management. A randomized controlled trial recently demonstrated that voice-interactive AI for insulin titration in type 2 diabetes led to faster achievement of target glycemic control and improved adherence compared to usual care. On the regulatory front, the FDA’s Digital Health Software Precertification Program and EMA’s guidelines for adaptive machine learning-based medical devices are being actively refined to accommodate generative AI tools. Several generative AI applications, including documentation and triage assistants, are already under review for formal clinical integration.
These examples underscore that generative AI is progressing beyond the conceptual stage into early clinical implementation. Continued validation through prospective trials and regulatory collaboration will be essential in realizing their full clinical potential safely and equitably [118,120].
Future AI models are expected to become more efficient, transparent, and specialized, with smaller, task-specific models replacing large general ones. These models could either be individually fine-tuned for specific roles, such as oncology counseling or ER triage, or feature modular components for different knowledge domains. This shift would likely improve performance, reduce computational costs, and democratize access, allowing smaller, locally deployed models to handle many tasks without relying on large APIs from companies like OpenAI or Google [120].
More importantly, generative AI in healthcare could help build a robust evidence base through prospective trials and outcome studies. AI tools might improve patient outcomes, safety, or efficiency in real-world settings, for example by reducing documentation errors and burnout or by improving diagnostic speed and accuracy. Over the next few years, such studies may begin to inform clinical guidelines and standards of care, and positive results could lead to AI tools being recommended or even mandated in specific areas (Figure 6). Furthermore, the incorporation of AI into healthcare depends on patient acceptance: tools must meet patient expectations without provoking fear or mistrust. Success stories help build trust and reassure patients, and engaging patient advocacy groups in AI design (co-development) could ensure that the tools address patient needs and concerns.
In conclusion, generative AI is likely to become a vital part of healthcare in the 21st century. While progress has been exciting, its use in real clinical settings is still in its early stages, much as evidence-based medicine and medical imaging once were. Moving forward, its success will depend on rigorous research and careful use. As this review has detailed, numerous applications in clinical documentation, patient communication, diagnostics, imaging, drug discovery, and beyond are actively being explored, with growing evidence of benefit in some areas (e.g., documentation efficiency, drug candidate identification) [12]. While generative AI holds promise for enhancing healthcare, its use must be guided by caution, ethical standards, and evidence-based practice. Clinicians should engage with AI thoughtfully, neither overly trusting nor dismissive, and contribute feedback and research to shape its development. When used properly, AI can support clinicians by handling routine tasks, allowing them to focus on the human, ethical, and complex aspects of care that AI cannot replace. Over the next decade, generative AI is expected to move from pilot projects to routine use in healthcare, with the potential to boost efficiency, reduce paperwork, and enable more personalized, proactive care [121]. However, realizing these benefits will require safeguards such as human oversight, ongoing validation, and equitable access. As experts note, implementing AI will be an ongoing, iterative process, refined through real-world testing and continuous learning.

Author Contributions

Conceptualization, S.A.R. and S.S.; methodology, M.E.-T. and S.S.R.; software, S.A.R., S.S. and S.S.R.; validation, S.A.R. and S.S.; formal analysis, S.S.R. and Y.E.-T.; investigation, S.A.R. and S.S.; resources, S.S.R.; data curation, S.A.R. and S.S.; writing—original draft preparation, S.A.R. and S.S.; writing—review and editing, S.A.R. and S.S.; visualization, R.K. and M.S.; supervision, M.E.-T. and Y.E.-T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

To enhance the clarity and coherence of this manuscript, an AI-assisted tool (ChatGPT by OpenAI) was employed during the drafting and revision process to support language refinement and writing flow.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, K.; Meng, X.; Yan, X.; Ji, J.; Liu, J.; Xu, H.; Zhang, H.; Liu, D.; Wang, J.; Wang, X.; et al. Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine. J. Med. Internet Res. 2025, 27, e59069. [Google Scholar] [CrossRef] [PubMed]
  2. Hacking, S. ChatGPT and Medicine: Together We Embrace the AI Renaissance. JMIR Bioinform. Biotechnol. 2024, 5, e52700. [Google Scholar] [CrossRef] [PubMed]
  3. Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models. PLoS Digit. Health 2023, 2, e0000198. [Google Scholar] [CrossRef]
  4. Allaway, R.J. Science & Tech Spotlight: Generative AI in Health Care 2024. Available online: https://www.gao.gov/products/gao-24-107634 (accessed on 30 April 2025).
  5. Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Amin, M.; Hou, L.; Clark, K.; Pfohl, S.R.; Cole-Lewis, H.; et al. Toward Expert-Level Medical Question Answering with Large Language Models. Nat. Med. 2025, 31, 943–950. [Google Scholar] [CrossRef]
  6. Torous, J.; Blease, C. Generative Artificial Intelligence in Mental Health Care: Potential Benefits and Current Challenges. World Psychiatry 2024, 23, 1–2. [Google Scholar] [CrossRef]
  7. Ng, C.K.C. Generative Adversarial Network (Generative Artificial Intelligence) in Pediatric Radiology: A Systematic Review. Children 2023, 10, 1372. [Google Scholar] [CrossRef] [PubMed]
  8. Brugnara, G.; Jayachandran Preetha, C.; Deike, K.; Haase, R.; Pinetz, T.; Foltyn-Dumitru, M.; Mahmutoglu, M.A.; Wildemann, B.; Diem, R.; Wick, W.; et al. Addressing the Generalizability of AI in Radiology Using a Novel Data Augmentation Framework with Synthetic Patient Image Data: Proof-of-Concept and External Validation for Classification Tasks in Multiple Sclerosis. Radiol. Artif. Intell. 2024, 6, e230514. [Google Scholar] [CrossRef]
  9. Liu, T.-L.; Hetherington, T.C.; Stephens, C.; McWilliams, A.; Dharod, A.; Carroll, T.; Cleveland, J.A. AI-Powered Clinical Documentation and Clinicians’ Electronic Health Record Experience: A Nonrandomized Clinical Trial. JAMA Netw. Open 2024, 7, e2432460. [Google Scholar] [CrossRef] [PubMed]
  10. Ali, S.R.; Dobbs, T.D.; Hutchings, H.A.; Whitaker, I.S. Using ChatGPT to Write Patient Clinic Letters. Lancet Digit. Health 2023, 5, e179–e181. [Google Scholar] [CrossRef]
  11. Zaretsky, J.; Kim, J.M.; Baskharoun, S.; Zhao, Y.; Austrian, J.; Aphinyanaphongs, Y.; Gupta, R.; Blecker, S.B.; Feldman, J. Generative Artificial Intelligence to Transform Inpatient Discharge Summaries to Patient-Friendly Language and Format. JAMA Netw. Open 2024, 7, e240357. [Google Scholar] [CrossRef]
  12. Garcia, P.; Ma, S.P.; Shah, S.; Smith, M.; Jeong, Y.; Devon-Sand, A.; Tai-Seale, M.; Takazawa, K.; Clutter, D.; Vogt, K.; et al. Artificial Intelligence–Generated Draft Replies to Patient Inbox Messages. JAMA Netw. Open 2024, 7, e243201. [Google Scholar] [CrossRef] [PubMed]
  13. Hartman, V.; Zhang, X.; Poddar, R.; McCarty, M.; Fortenko, A.; Sholle, E.; Sharma, R.; Campion, T.; Steel, P.A.D. Developing and Evaluating Large Language Model–Generated Emergency Medicine Handoff Notes. JAMA Netw. Open 2024, 7, e2448723. [Google Scholar] [CrossRef] [PubMed]
  14. Duggan, M.J.; Gervase, J.; Schoenbaum, A.; Hanson, W.; Howell, J.T.; Sheinberg, M.; Johnson, K.B. Clinician Experiences With Ambient Scribe Technology to Assist With Documentation Burden and Efficiency. JAMA Netw. Open 2025, 8, e2460637. [Google Scholar] [CrossRef]
  15. Ayers, J.W.; Poliak, A.; Dredze, M.; Leas, E.C.; Zhu, Z.; Kelley, J.B.; Faix, D.J.; Goodman, A.M.; Longhurst, C.A.; Hogarth, M.; et al. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum. JAMA Intern. Med. 2023, 183, 589–596. [Google Scholar] [CrossRef] [PubMed]
  16. Anderson, B.J.; Zia Ul Haq, M.; Zhu, Y.; Hornback, A.; Cowan, A.D.; Mott, M.; Gallaher, B.; Harzand, A. Development and Evaluation of a Model to Manage Patient Portal Messages. NEJM AI 2025, 2, AIoa2400354. [Google Scholar] [CrossRef]
  17. Ren, Y.; Loftus, T.J.; Datta, S.; Ruppert, M.M.; Guan, Z.; Miao, S.; Shickel, B.; Feng, Z.; Giordano, C.; Upchurch, G.R.; et al. Performance of a Machine Learning Algorithm Using Electronic Health Record Data to Predict Postoperative Complications and Report on a Mobile Platform. JAMA Netw. Open 2022, 5, e2211973. [Google Scholar] [CrossRef]
  18. Small, W.R.; Wiesenfeld, B.; Brandfield-Harvey, B.; Jonassen, Z.; Mandal, S.; Stevens, E.R.; Major, V.J.; Lostraglio, E.; Szerencsy, A.; Jones, S.; et al. Large Language Model–Based Responses to Patients’ In-Basket Messages. JAMA Netw. Open 2024, 7, e2422399. [Google Scholar] [CrossRef]
  19. Yalamanchili, A.; Sengupta, B.; Song, J.; Lim, S.; Thomas, T.O.; Mittal, B.B.; Abazeed, M.E.; Teo, P.T. Quality of Large Language Model Responses to Radiation Oncology Patient Care Questions. JAMA Netw. Open 2024, 7, e244630. [Google Scholar] [CrossRef]
  20. Liu, T.-L.; Hetherington, T.C.; Dharod, A.; Carroll, T.; Bundy, R.; Nguyen, H.; Bundy, H.E.; Isreal, M.; McWilliams, A.; Cleveland, J.A. Does AI-Powered Clinical Documentation Enhance Clinician Efficiency? A Longitudinal Study. NEJM AI 2024, 1, AIoa2400659. [Google Scholar] [CrossRef]
  21. Zakka, C.; Shad, R.; Chaurasia, A.; Dalal, A.R.; Kim, J.L.; Moor, M.; Fong, R.; Phillips, C.; Alexander, K.; Ashley, E.; et al. Almanac—Retrieval-Augmented Language Models for Clinical Medicine. NEJM AI 2024, 1, AIoa2300068. [Google Scholar] [CrossRef]
  22. Nayak, A.; Vakili, S.; Nayak, K.; Nikolov, M.; Chiu, M.; Sosseinheimer, P.; Talamantes, S.; Testa, S.; Palanisamy, S.; Giri, V.; et al. Use of Voice-Based Conversational Artificial Intelligence for Basal Insulin Prescription Management Among Patients With Type 2 Diabetes: A Randomized Clinical Trial. JAMA Netw. Open 2023, 6, e2340232. [Google Scholar] [CrossRef] [PubMed]
  23. Bernstein, I.A.; Zhang, Y.; Govil, D.; Majid, I.; Chang, R.T.; Sun, Y.; Shue, A.; Chou, J.C.; Schehlein, E.; Christopher, K.L.; et al. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions. JAMA Netw. Open 2023, 6, e2330320. [Google Scholar] [CrossRef] [PubMed]
  24. Chen, D.; Huang, R.S.; Jomy, J.; Wong, P.; Yan, M.; Croke, J.; Tong, D.; Hope, A.; Eng, L.; Raman, S. Performance of Multimodal Artificial Intelligence Chatbots Evaluated on Clinical Oncology Cases. JAMA Netw. Open 2024, 7, e2437711. [Google Scholar] [CrossRef]
  25. Dolan, E. Generative AI Chatbots like ChatGPT Can Act as an Emotional Sanctuary for Mental Health 2025. Available online: https://www.psypost.org/generative-ai-chatbots-like-chatgpt-can-act-as-an-emotional-sanctuary-for-mental-health/ (accessed on 30 April 2025).
  26. Patel, T.A.; Heintz, J.; Chen, J.; LaPergola, M.; Bilker, W.B.; Patel, M.S.; Arya, L.A.; Patel, M.I.; Bekelman, J.E.; Manz, C.R.; et al. Spending Analysis of Machine Learning–Based Communication Nudges in Oncology. NEJM AI 2024, 1, AIoa2300228. [Google Scholar] [CrossRef]
  27. Krohmer, K.; Naumann, E.; Tuschen-Caffier, B.; Svaldi, J. Mirror Exposure in Binge-Eating Disorder: Changes in Eating Pathology and Attentional Biases. J. Consult. Clin. Psychol. 2022, 90, 613–625. [Google Scholar] [CrossRef]
  28. Nong, P.; Platt, J. Patients’ Trust in Health Systems to Use Artificial Intelligence. JAMA Netw. Open 2025, 8, e2460628. [Google Scholar] [CrossRef]
  29. Shea, Y.-F.; Lee, C.M.Y.; Ip, W.C.T.; Luk, D.W.A.; Wong, S.S.W. Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis. JAMA Netw. Open 2023, 6, e2325000. [Google Scholar] [CrossRef]
  30. Lin, C.; Liu, W.-T.; Chang, C.-H.; Lee, C.-C.; Hsing, S.-C.; Fang, W.-H.; Tsai, D.-J.; Chen, K.-C.; Lee, C.-H.; Cheng, C.-C.; et al. Artificial Intelligence–Powered Rapid Identification of ST-Elevation Myocardial Infarction via Electrocardiogram (ARISE)—A Pragmatic Randomized Controlled Trial. NEJM AI 2024, 1, AIoa2400190. [Google Scholar] [CrossRef]
  31. Goh, E.; Gallo, R.; Hom, J.; Strong, E.; Weng, Y.; Kerman, H.; Cool, J.A.; Kanjee, Z.; Parsons, A.S.; Ahuja, N.; et al. Large Language Model Influence on Diagnostic Reasoning: A Randomized Clinical Trial. JAMA Netw. Open 2024, 7, e2440969. [Google Scholar] [CrossRef]
  32. Konz, N.; Buda, M.; Gu, H.; Saha, A.; Yang, J.; Chłędowski, J.; Park, J.; Witowski, J.; Geras, K.J.; Shoshan, Y.; et al. A Competition, Benchmark, Code, and Data for Using Artificial Intelligence to Detect Lesions in Digital Breast Tomosynthesis. JAMA Netw. Open 2023, 6, e230524. [Google Scholar] [CrossRef]
  33. Tsai, C.-M.; Lin, C.-H.R.; Kuo, H.-C.; Cheng, F.-J.; Yu, H.-R.; Hung, T.-C.; Hung, C.-S.; Huang, C.-M.; Chu, Y.-C.; Huang, Y.-H. Use of Machine Learning to Differentiate Children with Kawasaki Disease from Other Febrile Children in a Pediatric Emergency Department. JAMA Netw. Open 2023, 6, e237489. [Google Scholar] [CrossRef] [PubMed]
  34. Tong, W.-J.; Wu, S.-H.; Cheng, M.-Q.; Huang, H.; Liang, J.-Y.; Li, C.-Q.; Guo, H.-L.; He, D.-N.; Liu, Y.-H.; Xiao, H.; et al. Integration of Artificial Intelligence Decision Aids to Reduce Workload and Enhance Efficiency in Thyroid Nodule Management. JAMA Netw. Open 2023, 6, e2313674. [Google Scholar] [CrossRef] [PubMed]
  35. Hansen, L.; Bernstorff, M.; Enevoldsen, K.; Kolding, S.; Damgaard, J.G.; Perfalk, E.; Nielbo, K.L.; Danielsen, A.A.; Østergaard, S.D. Predicting Diagnostic Progression to Schizophrenia or Bipolar Disorder via Machine Learning. JAMA Psychiatry 2025, 82, 459. [Google Scholar] [CrossRef]
  36. Ngeow, A.J.H.; Moosa, A.S.; Tan, M.G.; Zou, L.; Goh, M.M.R.; Lim, G.H.; Tagamolila, V.; Ereno, I.; Durnford, J.R.; Cheung, S.K.H.; et al. Development and Validation of a Smartphone Application for Neonatal Jaundice Screening. JAMA Netw. Open 2024, 7, e2450260. [Google Scholar] [CrossRef] [PubMed]
  37. Barami, T.; Manelis-Baram, L.; Kaiser, H.; Ilan, M.; Slobodkin, A.; Hadashi, O.; Hadad, D.; Waissengreen, D.; Nitzan, T.; Menashe, I.; et al. Automated Analysis of Stereotypical Movements in Videos of Children With Autism Spectrum Disorder. JAMA Netw. Open 2024, 7, e2432851. [Google Scholar] [CrossRef] [PubMed]
  38. Gjesvik, J.; Moshina, N.; Lee, C.I.; Miglioretti, D.L.; Hofvind, S. Artificial Intelligence Algorithm for Subclinical Breast Cancer Detection. JAMA Netw Open 2024, 7, e2437402. [Google Scholar] [CrossRef]
  39. Wong, E.F.; Saini, A.K.; Accortt, E.E.; Wong, M.S.; Moore, J.H.; Bright, T.J. Evaluating Bias-Mitigated Predictive Models of Perinatal Mood and Anxiety Disorders. JAMA Netw. Open 2024, 7, e2438152. [Google Scholar] [CrossRef]
  40. Hillis, J.M.; Bizzo, B.C.; Mercaldo, S.; Chin, J.K.; Newbury-Chaet, I.; Digumarthy, S.R.; Gilman, M.D.; Muse, V.V.; Bottrell, G.; Seah, J.C.Y.; et al. Evaluation of an Artificial Intelligence Model for Detection of Pneumothorax and Tension Pneumothorax in Chest Radiographs. JAMA Netw. Open 2022, 5, e2247172. [Google Scholar] [CrossRef]
  41. L’Imperio, V.; Wulczyn, E.; Plass, M.; Müller, H.; Tamini, N.; Gianotti, L.; Zucchini, N.; Reihs, R.; Corrado, G.S.; Webster, D.R.; et al. Pathologist Validation of a Machine Learning–Derived Feature for Colon Cancer Risk Stratification. JAMA Netw. Open 2023, 6, e2254891. [Google Scholar] [CrossRef]
  42. Benary, M.; Wang, X.D.; Schmidt, M.; Soll, D.; Hilfenhaus, G.; Nassir, M.; Sigler, C.; Knödler, M.; Keller, U.; Beule, D.; et al. Leveraging Large Language Models for Decision Support in Personalized Oncology. JAMA Netw. Open 2023, 6, e2343689. [Google Scholar] [CrossRef]
  43. Zhong, Y.; Brooks, M.M.; Kennedy, E.H.; Bodnar, L.M.; Naimi, A.I. Use of Machine Learning to Estimate the Per-Protocol Effect of Low-Dose Aspirin on Pregnancy Outcomes: A Secondary Analysis of a Randomized Clinical Trial. JAMA Netw. Open 2022, 5, e2143414. [Google Scholar] [CrossRef] [PubMed]
  44. Ramírez-Baraldes, E.; García-Gutiérrez, D.; García-Salido, C. Artificial Intelligence in Nursing: New Opportunities and Challenges. Eur. J. Educ. 2025, 60, e70033. [Google Scholar] [CrossRef]
  45. Huang, J.; Neill, L.; Wittbrodt, M.; Melnick, D.; Klug, M.; Thompson, M.; Bailitz, J.; Loftus, T.; Malik, S.; Phull, A.; et al. Generative Artificial Intelligence for Chest Radiograph Interpretation in the Emergency Department. JAMA Netw. Open 2023, 6, e2336100. [Google Scholar] [CrossRef]
  46. Chen, X. Generative Models in Protein Engineering: A Comprehensive Survey. 2024. Available online: https://openreview.net/pdf?id=Xc7l84S0Ao (accessed on 30 April 2025).
  47. Khalifa, M.; Albadawy, M.; Iqbal, U. Advancing Clinical Decision Support: The Role of Artificial Intelligence across Six Domains. Comput. Methods Programs Biomed. Update 2024, 5, 100142. [Google Scholar] [CrossRef]
  48. Wang, D.; Huang, X. Transforming Education through Artificial Intelligence and Immersive Technologies: Enhancing Learning Experiences. Interact. Learn. Environ. 2025, 1–20. [Google Scholar] [CrossRef]
  49. Haque, A.; Akther, N.; Khan, I.; Agarwal, K.; Uddin, N. Artificial Intelligence in Retail Marketing: Research Agenda Based on Bibliometric Reflection and Content Analysis (2000–2023). Informatics 2024, 11, 74. [Google Scholar] [CrossRef]
  50. Blease, C.R.; Locher, C.; Gaab, J.; Hägglund, M.; Mandl, K.D. Generative Artificial Intelligence in Primary Care: An Online Survey of UK General Practitioners. BMJ Health Care Inform. 2024, 31, e101102. [Google Scholar] [CrossRef] [PubMed]
  51. Steimetz, E.; Minkowitz, J.; Gabutan, E.C.; Ngichabe, J.; Attia, H.; Hershkop, M.; Ozay, F.; Hanna, M.G.; Gupta, R. Use of Artificial Intelligence Chatbots in Interpretation of Pathology Reports. JAMA Netw. Open 2024, 7, e2412767. [Google Scholar] [CrossRef] [PubMed]
  52. Ziegelmayer, S.; Reischl, S.; Havrda, H.; Gawlitza, J.; Graf, M.; Lenhart, N.; Nehls, N.; Lemke, T.; Wilhelm, D.; Lohöfer, F.; et al. Development and Validation of a Deep Learning Algorithm to Differentiate Colon Carcinoma From Acute Diverticulitis in Computed Tomography Images. JAMA Netw. Open 2023, 6, e2253370. [Google Scholar] [CrossRef]
  53. Gunturkun, F.; Bakir-Batu, B.; Siddiqui, A.; Lakin, K.; Hoehn, M.E.; Vestal, R.; Davis, R.L.; Shafi, N.I. Development of a Deep Learning Model for Retinal Hemorrhage Detection on Head Computed Tomography in Young Children. JAMA Netw. Open 2023, 6, e2319420. [Google Scholar] [CrossRef]
  54. Sun, D.; Nguyen, T.M.; Allaway, R.J.; Wang, J.; Chung, V.; Yu, T.V.; Mason, M.; Dimitrovsky, I.; Ericson, L.; Li, H.; et al. A Crowdsourcing Approach to Develop Machine Learning Models to Quantify Radiographic Joint Damage in Rheumatoid Arthritis. JAMA Netw. Open 2022, 5, e2227423. [Google Scholar] [CrossRef] [PubMed]
  55. Torres-Lopez, V.M.; Rovenolt, G.E.; Olcese, A.J.; Garcia, G.E.; Chacko, S.M.; Robinson, A.; Gaiser, E.; Acosta, J.; Herman, A.L.; Kuohn, L.R.; et al. Development and Validation of a Model to Identify Critical Brain Injuries Using Natural Language Processing of Text Computed Tomography Reports. JAMA Netw. Open 2022, 5, e2227109. [Google Scholar] [CrossRef]
  56. Zhang, L.; Dong, D.; Sun, Y.; Hu, C.; Sun, C.; Wu, Q.; Tian, J. Development and Validation of a Deep Learning Model to Screen for Trisomy 21 During the First Trimester From Nuchal Ultrasonographic Images. JAMA Netw. Open 2022, 5, e2217854. [Google Scholar] [CrossRef] [PubMed]
  57. Homayounieh, F.; Digumarthy, S.; Ebrahimian, S.; Rueckel, J.; Hoppe, B.F.; Sabel, B.O.; Conjeti, S.; Ridder, K.; Sistermanns, M.; Wang, L.; et al. An Artificial Intelligence–Based Chest X-Ray Model on Human Nodule Detection Accuracy From a Multicenter Study. JAMA Netw. Open 2021, 4, e2141096. [Google Scholar] [CrossRef]
  58. Lee, C.; Willis, A.; Chen, C.; Sieniek, M.; Watters, A.; Stetson, B.; Uddin, A.; Wong, J.; Pilgrim, R.; Chou, K.; et al. Development of a Machine Learning Model for Sonographic Assessment of Gestational Age. JAMA Netw. Open 2023, 6, e2248685. [Google Scholar] [CrossRef] [PubMed]
  59. Sahashi, Y.; Vukadinovic, M.; Amrollahi, F.; Trivedi, H.; Rhee, J.; Chen, J.; Cheng, S.; Ouyang, D.; Kwan, A.C. Opportunistic Screening of Chronic Liver Disease with Deep-Learning–Enhanced Echocardiography. NEJM AI 2025, 2, AIoa2400948. [Google Scholar] [CrossRef]
  60. Dippel, J.; Prenißl, N.; Hense, J.; Liznerski, P.; Winterhoff, T.; Schallenberg, S.; Kloft, M.; Buchstab, O.; Horst, D.; Alber, M.; et al. AI-Based Anomaly Detection for Clinical-Grade Histopathological Diagnostics. NEJM AI 2024, 1, AIoa2400468. [Google Scholar] [CrossRef]
  61. Kazemzadeh, S.; Kiraly, A.P.; Nabulsi, Z.; Sanjase, N.; Maimbolwa, M.; Shuma, B.; Jamshy, S.; Chen, C.; Agharwal, A.; Lau, C.T.; et al. Prospective Multi-Site Validation of AI to Detect Tuberculosis and Chest X-Ray Abnormalities. NEJM AI 2024, 1, AIoa2400018. [Google Scholar] [CrossRef]
  62. Shu, Q.; Pang, J.; Liu, Z.; Liang, X.; Chen, M.; Tao, Z.; Liu, Q.; Guo, Y.; Yang, X.; Ding, J.; et al. Artificial Intelligence for Early Detection of Pediatric Eye Diseases Using Mobile Photos. JAMA Netw. Open 2024, 7, e2425124. [Google Scholar] [CrossRef]
  63. Cui, H.; Zhao, Y.; Xiong, S.; Feng, Y.; Li, P.; Lv, Y.; Chen, Q.; Wang, R.; Xie, P.; Luo, Z.; et al. Diagnosing Solid Lesions in the Pancreas With Multimodal Artificial Intelligence: A Randomized Crossover Trial. JAMA Netw. Open 2024, 7, e2422454. [Google Scholar] [CrossRef]
  64. Aklilu, J.G.; Sun, M.W.; Goel, S.; Bartoletti, S.; Rau, A.; Olsen, G.; Hung, K.S.; Mintz, S.L.; Luong, V.; Milstein, A.; et al. Artificial Intelligence Identifies Factors Associated with Blood Loss and Surgical Experience in Cholecystectomy. NEJM AI 2024, 1, AIoa2300088. [Google Scholar] [CrossRef]
  65. Ahn, J.S.; Ebrahimian, S.; McDermott, S.; Lee, S.; Naccarato, L.; Di Capua, J.F.; Wu, M.Y.; Zhang, E.W.; Muse, V.; Miller, B.; et al. Association of Artificial Intelligence–Aided Chest Radiograph Interpretation With Reader Performance and Efficiency. JAMA Netw. Open 2022, 5, e2229289. [Google Scholar] [CrossRef]
  66. Sima, D.M.; Phan, T.V.; Van Eyndhoven, S.; Vercruyssen, S.; Magalhães, R.; Liseune, A.; Brys, A.; Frenyo, P.; Terzopoulos, V.; Maes, C.; et al. Artificial Intelligence Assistive Software Tool for Automated Detection and Quantification of Amyloid-Related Imaging Abnormalities. JAMA Netw. Open 2024, 7, e2355800. [Google Scholar] [CrossRef] [PubMed]
  67. Carmody, S.; John, D. On Generating Synthetic Histopathology Images Using Generative Adversarial Networks. In Proceedings of the 2023 34th Irish Signals and Systems Conference (ISSC), Dublin, Ireland, 13–14 June 2023; pp. 1–5. [Google Scholar]
  68. Van Booven, D.J.; Chen, C.-B.; Malpani, S.; Mirzabeigi, Y.; Mohammadi, M.; Wang, Y.; Kryvenko, O.N.; Punnen, S.; Arora, H. Synthetic Genitourinary Image Synthesis via Generative Adversarial Networks: Enhancing Artificial Intelligence Diagnostic Precision. J. Pers. Med. 2024, 14, 703. [Google Scholar] [CrossRef]
  69. Pinaya, W.H.L.; Graham, M.S.; Kerfoot, E.; Tudosiu, P.-D.; Dafflon, J.; Fernandez, V.; Sanchez, P.; Wolleb, J.; da Costa, P.F.; Patel, A.; et al. Generative AI for Medical Imaging: Extending the MONAI Framework 2023. Available online: https://arxiv.org/abs/2307.15208 (accessed on 30 April 2025).
  70. Shende, P. A Brief Review on: MRI Images Reconstruction Using GAN; IEEE: Piscataway, NJ, USA, 2019. [Google Scholar]
  71. Wang, Z.; Lim, G.; Ng, W.Y.; Tan, T.-E.; Lim, J.; Lim, S.H.; Foo, V.; Lim, J.; Sinisterra, L.G.; Zheng, F.; et al. Synthetic Artificial Intelligence Using Generative Adversarial Network for Retinal Imaging in Detection of Age-Related Macular Degeneration. Front. Med. 2023, 10, 1184892. [Google Scholar] [CrossRef]
  72. Kebaili, A.; Lapuyade-Lahorgue, J.; Ruan, S. Deep Learning Approaches for Data Augmentation in Medical Imaging: A Review. J. Imaging 2023, 9, 81. [Google Scholar] [CrossRef]
  73. Waddle, S.; Ganji, S.; Wang, D.; Chao, T.C.; Browne, J.; Leiner, T. Feasibility of Ai-Denoising and Ai-Super-Resolution to Accelerate Cardiac Cine Imaging: Qualitative and Quantitative Analysis. J. Cardiovasc. Magn. Reson. 2024, 26, 100989. [Google Scholar] [CrossRef]
  74. Li, Y.; Xu, S.; Lu, Y.; Qi, Z. CT Synthesis from MRI with an Improved Multi-Scale Learning Network. Front. Phys. 2023, 11, 1088899. [Google Scholar] [CrossRef]
  75. Sakthivel, B.; Vanathi, P.; Sri, M.R.; Subashini, S.; Sonthi, V.K.; Sathish, C. Generative AI Models and Capabilities in Cancer Medical Imaging and Applications. In Proceedings of the 2024 3rd International Conference on Sentiment Analysis and Deep Learning (ICSADL), Bhimdatta, Nepal, 13 March 2024; pp. 349–355. [Google Scholar]
  76. Abbasi, N.; Fnu, N.; Zeb, S. Md Fardous Generative AI in Healthcare: Revolutionizing Disease Diagnosis, Expanding Treatment Options, and Enhancing Patient Care. J. Knowl. Learn. Sci. Technol. 2024, 3, 127–138. [Google Scholar] [CrossRef]
  77. Iqbal, A.; Sharif, M.; Yasmin, M.; Raza, M.; Aftab, S. Generative Adversarial Networks and Its Applications in the Biomedical Image Segmentation: A Comprehensive Survey. Int. J. Multimed. Inf. Retr. 2022, 11, 333–368. [Google Scholar] [CrossRef]
  78. Ali, M.; Ali, M.; Hussain, M.; Koundal, D. Generative Adversarial Networks (GANs) for Medical Image Processing: Recent Advancements. Arch. Comput. Methods Eng. 2025, 32, 1185–1198. [Google Scholar] [CrossRef]
  79. Khosravi, M.; Zare, Z.; Mojtabaeian, S.M.; Izadi, R. Artificial Intelligence and Decision-Making in Healthcare: A Thematic Analysis of a Systematic Review of Reviews. Health Serv. Res. Manag. Epidemiol. 2024, 11, 23333928241234863. [Google Scholar] [CrossRef] [PubMed]
  80. Rydzewski, N.R.; Dinakaran, D.; Zhao, S.G.; Ruppin, E.; Turkbey, B.; Citrin, D.E.; Patel, K.R. Comparative Evaluation of LLMs in Clinical Oncology. NEJM AI 2024, 1, AIoa2300151. [Google Scholar] [CrossRef] [PubMed]
  81. Kamran, F.; Tjandra, D.; Heiler, A.; Virzi, J.; Singh, K.; King, J.E.; Valley, T.S.; Wiens, J. Evaluation of Sepsis Prediction Models before Onset of Treatment. NEJM AI 2024, 1, AIoa2300032. [Google Scholar] [CrossRef]
  82. Dagan, N.; Magen, O.; Leshchinsky, M.; Makov-Assif, M.; Lipsitch, M.; Reis, B.Y.; Yaron, S.; Netzer, D.; Balicer, R.D. Prospective Evaluation of Machine Learning for Public Health Screening: Identifying Unknown Hepatitis C Carriers. NEJM AI 2024, 1, AIoa2300012. [Google Scholar] [CrossRef]
  83. Lee, S.M.; Lee, G.; Kim, T.K.; Le, T.; Hao, J.; Jung, Y.M.; Park, C.-W.; Park, J.S.; Jun, J.K.; Lee, H.-C.; et al. Development and Validation of a Prediction Model for Need for Massive Transfusion During Surgery Using Intraoperative Hemodynamic Monitoring Data. JAMA Netw. Open 2022, 5, e2246637. [Google Scholar] [CrossRef]
  84. Horvat, C.M.; Barda, A.J.; Perez Claudio, E.; Au, A.K.; Bauman, A.; Li, Q.; Li, R.; Munjal, N.; Wainwright, M.S.; Boonchalermvichien, T.; et al. Interoperable Models for Identifying Critically Ill Children at Risk of Neurologic Morbidity. JAMA Netw. Open 2025, 8, e2457469. [Google Scholar] [CrossRef]
  85. Taylor, R.A.; Chmura, C.; Hinson, J.; Steinhart, B.; Sangal, R.; Venkatesh, A.K.; Xu, H.; Cohen, I.; Faustino, I.V.; Levin, S. Impact of Artificial Intelligence–Based Triage Decision Support on Emergency Department Care. NEJM AI 2025, 2, AIoa2400296. [Google Scholar] [CrossRef]
  86. Barnett, A.J.; Guo, Z.; Jing, J.; Ge, W.; Kaplan, P.W.; Kong, W.Y.; Karakis, I.; Herlopian, A.; Jayagopal, L.A.; Taraschenko, O.; et al. Improving Clinician Performance in Classifying EEG Patterns on the Ictal–Interictal Injury Continuum Using Interpretable Machine Learning. NEJM AI 2024, 1, AIoa2300331. [Google Scholar] [CrossRef]
  87. Chiu, I.-M.; Lin, C.-H.R.; Yau, F.-F.F.; Cheng, F.-J.; Pan, H.-Y.; Lin, X.-H.; Cheng, C.-Y. Use of a Deep-Learning Algorithm to Guide Novices in Performing Focused Assessment With Sonography in Trauma. JAMA Netw. Open 2023, 6, e235102. [Google Scholar] [CrossRef]
  88. Waikel, R.L.; Othman, A.A.; Patel, T.; Ledgister Hanchard, S.; Hu, P.; Tekendo-Ngongang, C.; Duong, D.; Solomon, B.D. Recognition of Genetic Conditions After Learning With Images Created Using Generative Artificial Intelligence. JAMA Netw. Open 2024, 7, e242609. [Google Scholar] [CrossRef] [PubMed]
  89. Dolezal, J.M.; Wolk, R.; Hieromnimon, H.M.; Howard, F.M.; Srisuwananukorn, A.; Karpeyev, D.; Ramesh, S.; Kochanny, S.; Kwon, J.W.; Agni, M.; et al. Deep Learning Generates Synthetic Cancer Histology for Explainability and Education. npj Precis. Oncol. 2023, 7, 49. [Google Scholar] [CrossRef]
  90. Liu, H.; Ding, N.; Li, X.; Chen, Y.; Sun, H.; Huang, Y.; Liu, C.; Ye, P.; Jin, Z.; Bao, H.; et al. Artificial Intelligence and Radiologist Burnout. JAMA Netw. Open 2024, 7, e2448714. [Google Scholar] [CrossRef]
  91. Fajtl, J.; Welikala, R.A.; Barman, S.; Chambers, R.; Bolter, L.; Anderson, J.; Olvera-Barrios, A.; Shakespeare, R.; Egan, C.; Owen, C.G.; et al. Trustworthy Evaluation of Clinical AI for Analysis of Medical Images in Diverse Populations. NEJM AI 2024, 1, AIoa2400353. [Google Scholar] [CrossRef]
  92. He, Y.; Guo, Y.; Lyu, J.; Ma, L.; Tan, H.; Zhang, W.; Ding, G.; Liang, H.; He, J.; Lou, X.; et al. Disorder-Free Data Are All You Need—Inverse Supervised Learning for Broad-Spectrum Head Disorder Detection. NEJM AI 2024, 1, AIoa2300137. [Google Scholar] [CrossRef]
  93. Lehmann, V.; Zueger, T.; Maritsch, M.; Notter, M.; Schallmoser, S.; Bérubé, C.; Albrecht, C.; Kraus, M.; Feuerriegel, S.; Fleisch, E.; et al. Machine Learning to Infer a Health State Using Biomedical Signals—Detection of Hypoglycemia in People with Diabetes While Driving Real Cars. NEJM AI 2024, 1, AIoa2300013. [Google Scholar] [CrossRef]
  94. Natesan, D.; Eisenstein, E.L.; Thomas, S.M.; Eclov, N.C.W.; Dalal, N.H.; Stephens, S.J.; Malicki, M.; Shields, S.; Cobb, A.; Mowery, Y.M.; et al. Health Care Cost Reductions with Machine Learning-Directed Evaluations during Radiation Therapy—An Economic Analysis of a Randomized Controlled Study. NEJM AI 2024, 1, AIoa2300118. [Google Scholar] [CrossRef] [PubMed]
  95. Wu, K.; Wu, E.; Theodorou, B.; Liang, W.; Mack, C.; Glass, L.; Sun, J.; Zou, J. Characterizing the Clinical Adoption of Medical AI Devices through U.S. Insurance Claims. NEJM AI 2024, 1, AIoa2300030. [Google Scholar] [CrossRef]
  96. Mazzucato, M.; Li, H.L. Rebalancing the research agenda: From discovery to societal impact. Nature 2023, 615, 415–418. [Google Scholar]
  97. National Science Foundation. NSF Report on Equity and Research Prioritization. 2023. Available online: https://www.nsf.gov (accessed on 30 April 2025).
  98. First Drug Discovered and Designed with Generative AI Enters Phase 2 Clinical Trials; EurekAlert! 2023; Available online: https://www.eurekalert.org/news-releases/993844 (accessed on 30 April 2025).
  99. Brazil, R. How AI Is Transforming Drug Discovery. The Pharmaceutical Journal, 27 June 2023. Available online: https://pharmaceutical-journal.com/article/feature/how-ai-is-transforming-drug-discovery (accessed on 30 April 2025).
  100. AI Drug Discovery 2023 Trends. 2023. Available online: https://www.drugdiscoverytrends.com/ai-drug-discovery-2023-trends/ (accessed on 30 April 2025).
  101. Unlu, O.; Shin, J.; Mailly, C.J.; Oates, M.F.; Tucci, M.R.; Varugheese, M.; Wagholikar, K.; Wang, F.; Scirica, B.M.; Blood, A.J.; et al. Retrieval-Augmented Generation–Enabled GPT-4 for Clinical Trial Screening. NEJM AI 2024, 1, AIoa2400181. [Google Scholar] [CrossRef]
  102. Mao, D.; Liu, C.; Wang, L.; AI-Ouran, R.; Deisseroth, C.; Pasupuleti, S.; Kim, S.Y.; Li, L.; Rosenfeld, J.A.; Meng, L.; et al. AI-MARRVEL—A Knowledge-Driven AI System for Diagnosing Mendelian Disorders. NEJM AI 2024, 1, AIoa2300009. [Google Scholar] [CrossRef]
  103. Tang, X.; Dai, H.; Knight, E.; Wu, F.; Li, Y.; Li, T.; Gerstein, M. A Survey of Generative AI for de Novo Drug Design: New Frontiers in Molecule and Protein Generation. Brief. Bioinform. 2024, 25, bbae338. [Google Scholar] [CrossRef]
  104. Helm, J.M.; Swiergosz, A.M.; Haeberle, H.S.; Karnuta, J.M.; Schaffer, J.L.; Krebs, V.E.; Spitzer, A.I.; Ramkumar, P.N. Machine Learning and Artificial Intelligence: Definitions, Applications, and Future Directions. Curr. Rev. Musculoskelet. Med. 2020, 13, 69–76. [Google Scholar] [CrossRef] [PubMed]
  105. Wang, W.-H.; Hsu, W.-S. Integrating Artificial Intelligence and Wearable IoT System in Long-Term Care Environments. Sensors 2023, 23, 5913. [Google Scholar] [CrossRef] [PubMed]
  106. Amjad, A.; Kordel, P.; Fernandes, G. A Review on Innovation in Healthcare Sector (Telehealth) through Artificial Intelligence. Sustainability 2023, 15, 6655. [Google Scholar] [CrossRef]
  107. Ahmed, A.; Aziz, S.; Khalifa, M.; Shah, U.; Hassan, A.; Abd-Alrazaq, A.; Househ, M. Thematic Analysis on User Reviews for Depression and Anxiety Chatbot Apps: Machine Learning Approach. JMIR Form. Res. 2022, 6, e27654. [Google Scholar] [CrossRef] [PubMed]
  108. Alkaissi, H.; McFarlane, S.I. Artificial Hallucinations in ChatGPT: Implications in Scientific Writing. Cureus 2023, 15, e35179. [Google Scholar]
  109. Tonekaboni, S.; Joshi, S.; McCradden, M.D.; Goldenberg, A. What clinicians want: Contextualizing explainable machine learning for clinical end use. Proc. Mach. Learn. Res. 2019, 106, 359–380. [Google Scholar]
  110. Currie, G.M.; Hawk, K.E.; Rohren, E.M. Generative Artificial Intelligence Biases, Limitations and Risks in Nuclear Medicine: An Argument for Appropriate Use Framework and Recommendations. Semin. Nucl. Med. 2025, 55, 423–436. [Google Scholar] [CrossRef]
  111. Farhud, D.D.; Zokaei, S. Ethical Issues of Artificial Intelligence in Medicine and Healthcare. Iran. J. Public Health 2021, 50, i–v. [Google Scholar] [CrossRef]
  112. Chen, Y.; Esmaeilzadeh, P. Generative AI in Medical Practice: In-Depth Exploration of Privacy and Security Challenges. J. Med. Internet Res. 2024, 26, e53008. [Google Scholar] [CrossRef]
  113. Habli, I.; Lawton, T.; Porter, Z. Artificial Intelligence in Health Care: Accountability and Safety. Bull. World Health Organ. 2020, 98, 251–256. [Google Scholar] [CrossRef]
  114. Strubell, E.; Ganesh, A.; McCallum, A. Energy and Policy Considerations for Deep Learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July 2019. [Google Scholar] [CrossRef]
  115. Patterson, D.; Gonzalez, J.; Le, Q.; Liang, C.; Munguia, L.-M.; Rothchild, D.; So, D.; Texier, M.; Dean, J. Carbon Emissions and Large Neural Network Training. arXiv 2021, arXiv:2104.10350. [Google Scholar]
  116. Reddy, S. Generative AI in Healthcare: An Implementation Science Informed Translational Path on Application, Integration and Governance. Implement. Sci. 2024, 19, 27. [Google Scholar] [CrossRef] [PubMed]
  117. Hanna, M.G.; Pantanowitz, L.; Dash, R.; Harrison, J.H.; Deebajah, M.; Pantanowitz, J.; Rashidi, H.H. Future of Artificial Intelligence—Machine Learning Trends in Pathology and Medicine. Mod. Pathol. 2025, 38, 100705. [Google Scholar] [CrossRef]
  118. Maleki Varnosfaderani, S.; Forouzanfar, M. The Role of AI in Hospitals and Clinics: Transforming Healthcare in the 21st Century. Bioengineering 2024, 11, 337. [Google Scholar] [CrossRef] [PubMed]
  119. Nundy, S.; Montgomery, T.; Wachter, R.M. Promoting trust between patients and physicians in the era of artificial intelligence. JAMA 2022, 327, 521–522. [Google Scholar] [CrossRef]
  120. Lu, Z.; Peng, Y.; Cohen, T.; Ghassemi, M.; Weng, C.; Tian, S. Large Language Models in Biomedicine and Health: Current Research Landscape and Future Directions. J. Am. Med. Inform. Assoc. 2024, 31, 1801–1811. [Google Scholar] [CrossRef]
  121. Bhuyan, S.S.; Sateesh, V.; Mukul, N.; Galvankar, A.; Mahmood, A.; Nauman, M.; Rai, A.; Bordoloi, K.; Basu, U.; Samuel, J. Generative Artificial Intelligence Use in Healthcare: Opportunities for Clinical Excellence and Administrative Efficiency. J. Med. Syst. 2025, 49, 10. [Google Scholar] [CrossRef]
Figure 1. Various generative AI tools and their application in healthcare.
Figure 2. Generative AI models, including Large Language Models (LLMs) and Image Generation Models (IGMs), are transforming clinical workflows. LLMs support text-based applications such as clinical documentation, diagnosis formulation, medical literature synthesis, and evidence-based responses to clinical questions. IGMs enhance imaging tasks, including image-to-image translation, medical image synthesis, image enhancement, and generation of synthetic MRI data, highlighting the dual role of generative AI in advancing both the textual and visual dimensions of healthcare.
Figure 3. Applications of Generative AI in clinical practice.
Figure 4. An overview of the integration of AI-driven clinical decision support systems (AI CDSSs) across various healthcare settings. (Top left): AI can interpret imaging to support the diagnosis of brain tumors. (Top right): Personalized treatment recommendations can be generated based on patient history and comorbidities. (Bottom left): AI can analyze multimodal inputs (biometrics, CT, labs) to assist in stroke diagnosis and emergency decisions. (Bottom right): Real-time AI monitoring at the bedside can detect critical events like cardiac arrest and guide intervention, such as initiating ventilation. Together, these use cases illustrate the multifaceted applications of AI CDSSs in enhancing clinical efficiency and patient safety.
Figure 5. Challenges of Generative AI in healthcare.
Figure 6. Futuristic Landscape of Generative Artificial Intelligence in Healthcare, showcasing its evolution from foundational model development to advanced applications in personalized medicine, autonomous diagnostics, clinical decision support, and predictive healthcare analytics.
Table 1. Tabular Summary of Generative AI Applications in Clinical Practice.
| Domain | Application | Outcomes | References |
| --- | --- | --- | --- |
| Clinical Documentation | Drafting clinical notes, discharge summaries, and patient letters | Improved clinician efficiency; reduced burnout | [10,44] |
| Patient Communication | Drafting responses to patient messages and health education | AI responses rated higher in empathy; improved satisfaction and understanding | [15,19,21] |
| Clinical Decision Support | Assisting in diagnosis and management suggestions | Comparable or improved accuracy in diagnostic reasoning compared to physicians | [29,31,32] |
| Medical Imaging Interpretation | AI that can “read” an image and generate a report | Generating reports from radiology images | [45] |
| Drug Discovery and Biomedical Research | Assisting in drug discovery and development | Generating novel molecules, optimizing drug candidates, and designing clinical trials | [46] |
| Patient Monitoring and Telehealth Integration | Transforming patient care, especially for chronic conditions | Remote patient monitoring systems, AI powered telehealth | [47] |
| Medical Education and Training | Enhancing medical education | AI as an adjunct for learning | [48] |
| Mental Health Support | Chatbots offering conversational support or behavioral coaching | Early evidence of utility as a supportive tool; still requires human oversight | [49,50] |
Table 3. Different Clinical Studies Reporting the Implementation of Artificial Intelligence in Healthcare.
| Author | Clinical Study | Primary Outcomes | Secondary Outcomes | Inference | Reference |
| --- | --- | --- | --- | --- | --- |
| Aklilu et al. | Artificial Intelligence Identifies Factors Associated with Blood Loss and Surgical Experience in Cholecystectomy | The study’s primary objective was to identify specific surgical maneuvers associated with positive indicators of surgical performance and high surgical skill, with a particular focus on factors contributing to blood loss during cholecystectomy. | The secondary objective was to examine additional elements influencing surgical outcomes. | The AI model demonstrated the capability to identify factors—such as surgical experience and technique—associated with intraoperative outcomes, particularly blood loss during cholecystectomy. | [64] |
| Barnett et al. | Improving Clinician Performance in Classifying EEG Patterns on the Ictal–Interictal Injury Continuum Using Interpretable Machine Learning | Developed an interpretable deep learning system that accurately classifies six patterns of potentially harmful EEG activity: seizures, lateralized periodic discharges (LPDs), generalized periodic discharges (GPDs), lateralized rhythmic delta activity (LRDA), generalized rhythmic delta activity (GRDA), and other patterns, while providing faithful case-based explanations of its predictions. | Identification and characterization of strategies to bolster confidence in model-generated responses. | Users demonstrated significant improvements in pattern classification accuracy with the assistance of this interpretable deep learning model. The interpretable design facilitates effective human–AI collaboration; this system may improve diagnosis and patient care in clinical settings. The model may also provide a better understanding of how EEG patterns relate to each other along the ictal–interictal injury continuum. | [86] |
| Fajtl et al. | Methodology for independent evaluation of algorithms for automated analysis of medical images for trustworthy and equitable deployment of clinical AI in diverse population screening programmes | The study outlines a transferable methodology for the independent evaluation of algorithms, using a routine, high-volume, multiethnic national diabetic eye-screening program as an exemplar. | The secondary objective was to evaluate the practical aspects of implementing these AI systems in real-world screening programs, including the time required for algorithm installation, image-processing durations, and the overall scalability of deploying such systems in large-scale, routine screening settings. | The methodology was shown to be transferable for algorithm evaluation in large-scale, multiethnic screening programs. | [91] |
| He et al. | Disorder-Free Data Are All You Need: Inverse Supervised Learning for Broad-Spectrum Head Disorder Detection | The primary objective was to develop and evaluate an AI-based system capable of accurately detecting a wide range of head disorders without requiring any disorder-containing data for training, achieved by introducing a novel learning algorithm, inverse supervised learning (ISL), which learns exclusively from disorder-free head CT scans. | The adaptability of the ISL-based system to other medical imaging modalities was assessed; performance on pulmonary CT and retinal optical coherence tomography (OCT) images achieved AUC values of 0.893 and 0.895, respectively. | Inverse supervised learning can be effective for broad-spectrum head disorder detection. | [92] |
| Hiesinger et al. | Almanac: Retrieval-Augmented Language Models for Clinical Medicine | The study develops Almanac, a large language model framework augmented with retrieval capabilities to provide medical guideline and treatment recommendations; the primary outcome was to demonstrate significant improvements in factuality across all specialties. | Secondary outcomes included improvements in completeness and the safety of the recommendations, with performance evaluated on a novel dataset of clinical scenarios (n = 130). | Performance on a novel dataset of clinical scenarios demonstrates that large language models can be effective tools in the clinical decision-making process, showing significant increases in factuality (mean of 18%, p < 0.05) across all specialties, along with improvements in completeness and safety, highlighting the need for careful testing and deployment. | [21] |
| Kamran et al. | Evaluation of Sepsis Prediction Models before Onset of Treatment | Evaluation of sepsis prediction model performance when predictions are restricted to the period before treatment onset. | | The accuracy of AI sepsis predictions varies depending on the timing of the prediction relative to treatment initiation. | [81] |
| Kazemzadeh et al. | Prospective Multi-Site Validation of AI to Detect Tuberculosis and Chest X-Ray Abnormalities | Noninferiority of AI detection to radiologist performance. | AI detection compared with WHO targets; the abnormality AI was non-inferior to the high-sensitivity benchmark. | The CXR TB AI was noninferior to radiologists for active pulmonary TB triaging in a population with a high TB and HIV burden. Neither the TB AI nor the radiologists met WHO recommendations for sensitivity in the study population. AI can also be used to detect other CXR abnormalities in the same population. | [61] |
| Lehmann et al. | Machine learning to infer a health state using biomedical signals—detection of hypoglycemia in people with diabetes while driving real cars | The primary outcome was the detection of hypoglycemia using a machine learning approach. | The secondary outcome was the diagnostic accuracy of this approach, quantified by the area under the receiver operating characteristic curve (AUROC). | Machine learning can effectively and noninvasively detect hypoglycemia in people with diabetes during real-world driving scenarios, using driving behavior and gaze/head motion data. | [93] |
| Lin et al. | Artificial Intelligence-Powered Rapid Identification of ST-Elevation Myocardial Infarction via Electrocardiogram (ARISE): A Pragmatic Randomized Controlled Trial | To evaluate the potential of AI-ECG-assisted detection of STEMI to reduce treatment delays for patients with STEMI. | The secondary objective was to evaluate the sensitivity and specificity of the AI algorithm in accurately identifying STEMI from 12-lead ECGs. | In patients with STEMI, AI-ECG-assisted triage decreased the door-to-balloon time for patients presenting to the emergency department and decreased the ECG-to-balloon time for patients in the emergency room and inpatients. | [30] |
| Natesan et al. | Health Care Cost Reductions with Machine Learning-Directed Evaluations during Radiation Therapy—An Economic Analysis of a Randomized Controlled Study | Healthcare cost reduction. | Acute care visit costs; inpatient costs. | Machine learning-directed evaluations during radiation therapy can lead to significant healthcare cost reductions. | [94] |
| Patel et al. | Spending Analysis of Machine Learning-Based Communication Nudges in Oncology | Total medical costs. | Acute care visit costs. | Machine learning-based communication nudges may lead to cost reductions in oncology care. | [26] |
| Rydzewski et al. | Comparative Evaluation of LLMs in Clinical Oncology | To conduct comprehensive evaluations of LLMs in the field of oncology and to identify and characterize strategies to bolster confidence in a model’s response. | The secondary objective was to evaluate LLM performance on a novel validation set of 50 oncology questions. | LLMs, particularly GPT-4 Turbo and Gemini 1.0 Ultra, demonstrated high accuracy in answering clinical oncology questions, with GPT-4 achieving the highest performance among those tested; however, all models exhibited clinically significant error rates. | [80] |
| Wu et al. | Characterizing the Clinical Adoption of Medical AI Devices through U.S. Insurance Claims | The primary objective was to quantify the adoption and usage of medical AI devices in the United States. | Analysis of the prevalence of medical AI devices based on submitted claims. | Medical AI device adoption is still nascent, with most usage driven by a handful of leading devices. Zip codes with higher income levels, metropolitan areas, and academic medical centers are more likely to have medical AI usage. | [95] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
