1. Introduction
In healthcare settings, the primary goal is to improve patients’ health through prevention, diagnosis, and treatment. This is facilitated by exploring and analysing Electronic Health Records (EHRs). EHRs can represent a huge amount of textual and image data to process, particularly in community and acute settings where time and resources can be limited. As a result, harnessing large volumes of data can be overwhelming and expensive for public health organizations in terms of human resources [
1]. GenAI has the potential to perform such tasks at a lower cost [
2]. Within healthcare, GenAI leverages EHR insights to provide evidence-based clinical diagnosis and decision-making to support clinicians and improve healthcare services. GenAI has made significant progress in the rapidly evolving healthcare landscape in recent years and is gradually reshaping disease diagnosis [
3], management, and treatment [
2]. While Explainable AI (XAI) improves the interpretability and transparency of medical diagnosis, GenAI enhances personalization in the treatment and management of diseases. GenAI is becoming increasingly popular in the healthcare industry. According to Sai et al. [
4], a survey by Market.us estimated that the market value of GenAI in the healthcare industry will reach about USD 17 billion by 2032. The review highlighted the capacity of GenAI tools such as ChatGPT, DALL-E, Midjourney, and Stable Diffusion to generate and analyze text and images in formats intelligible to humans. Liu et al. [
5], Grünebaum et al. [
6], and Halawani et al. [
7] indicated that GenAI holds the potential to enhance healthcare decision making, treatment, and education. In light of the continued expansion of GenAI in healthcare and the potential highlighted by existing studies, our study explored its current landscape, commencing with a bibliometric assessment. Bibliometric analysis, the statistical analysis of bibliographic data, is widely used by scholars to analyse article metadata for various purposes, such as depicting evolutionary nuances and trends in subject areas, a journal’s performance, or collaboration patterns [
8].
Existing studies have focused on specific areas of healthcare. A bibliometric analysis study examined the use of GenAI in mental health, underlining emotional support, psychological assessment and personalized psychological intervention as key applications in this area [
9]. Other studies have focused on AI in healthcare more generally, acknowledging machine learning, natural language processing (NLP), and EHR [
10], then diabetes, heart failure, Alzheimer disease, depression, and cancer [
11], as trending research hotspots. Traditional AI enables the user to analyze big data, recognize patterns, and make predictions. GenAI, on the other hand, creates new content in response to the user’s prompt [
4]. In this work, we are focused specifically on the use of GenAI in healthcare, and our study strives to answer the following research questions:
- RQ1: What are the most influential institutions and countries promoting the advancement of GenAI in healthcare?
- RQ2: How are the studies distributed over the years?
- RQ3: What is the intellectual structure of knowledge on GenAI in a healthcare setting?
- RQ4: What are the key growth trends of GenAI in healthcare?
- RQ5: What authors and articles in the literature of GenAI in healthcare have had the greatest citation impact?
- RQ6: What are the most influential journals applying GenAI in healthcare?
- RQ7: What areas are conceptually promising and are currently under-explored?
- RQ8: What are the challenges for GenAI in healthcare?
  2. Background
  2.1. What Is GenAI and How Does It Work?
Unlike traditional AI systems, which focus on learning, recognition, or classification of patterns, GenAI is designed to produce new, contextually relevant content in human language by learning from various data structures such as text, sounds, videos, or images [
12]. The capacity of GenAI to produce easily understandable content for healthcare patients holds significant potential for supporting physicians, specialists, and other healthcare practitioners. The rapid integration of large language models (LLMs) and the evolution of the Generative Pre-trained Transformer (GPT) have resulted in the proliferation of GenAI models. In 2024, a scoping review identified nine GenAI models utilized in healthcare [
13]. These models included Google Gemini utilized in 10% of studies, Microsoft Bing AI utilized in 7% of studies, and ChatGPT utilized in 74%, thereby appearing as the most prevalent, as further supported by [
4]. GenAI models are designed to engage in conversation with users in natural human language. Unlike traditional search engines, which match a user’s keywords to their database content, GenAI models can converse with humans while retaining context. They can explain sophisticated concepts and theories in plain language. They are trained on large sets of data from which they learn; moreover, they can participate in conversations and generate answers to user questions as indicated by [
14] and further supported by [
4]. However, they are only as good as the quality of the data on which they are trained. Poor data quality can lead to inconsistencies and potential misinformation [
4], which may have serious consequences in healthcare settings. Additionally, their accuracy is reported to be variable based on the users’ language [
15].
  2.2. Why GenAI in Healthcare?
The primary goal of healthcare service providers is to improve patient care, both physically and mentally. However, achieving this purpose can be difficult at times due to a lack of healthcare human resources, as observed during the COVID-19 pandemic [
16]. Healthcare practitioners are often constrained to work for longer hours, jeopardizing their health and impacting their work–life balance. Such situations often make room for medical mistakes due to excessive fatigue and work pressure [
17]. A study including 61 physicians from 29 practices assessed how physicians spend their time at work [
1], and it was found that physicians spend about 44.9% and 66.5% of their work time on EHR processing and patient care, respectively. GenAI models hold the potential to minimize such situations, given their features and capabilities. They are available at all times and can be of great support in assisting clinicians and patients in many clinical and administrative tasks as indicated by [
18] and further supported by [
19]. They can also help patients and the general public answer healthcare-related questions. Moreover, they can support patients and healthcare personnel in navigating complex healthcare systems, thus helping to save time, which is a critical resource in the ever-busy healthcare environment [
13].
  3. Methodology and Materials
As previously discussed, the primary research method used in this work is a bibliometric analysis. A search was performed on the Scopus database to extract papers related to the use of GenAI in healthcare. In this section, we detail the search criteria, factors used to determine whether work was included or excluded in the analysis, and the tools used to analyse the set of papers that resulted from our search.
  3.1. Search Criteria and Screening
A systematic search was performed to gather the relevant information used for the bibliometric analysis. The search followed a series of well-defined steps to improve the transparency and comprehensiveness of the study as seen in the PRISMA diagram (
Figure 1).
The dataset was obtained by searching documents in the Scopus database using keywords related to GenAI and healthcare. Past studies have used databases such as Web of Science, PubMed, and Scopus for bibliometric analysis or review [
10,
13], but Scopus was chosen for this study owing to its interdisciplinary nature, comprehensiveness, and up-to-date characteristics as indicated by [
20] and further supported by [
21]. The data extraction was performed in February 2025. The search string used was
((Generative AND artificial AND intelligence) OR (large AND language AND model) OR GenAI OR LLM) AND (health OR healthcare)
This initial stage of the search resulted in the identification of 2675 papers. A set of restrictions was then applied to these search results, as follows. The search was limited to open-access, English-language articles in the subject areas of computer science, artificial intelligence and subjects related to health professions (such as nursing, psychology, medicine, dentistry). Results were also restricted to journal articles that had been published (rather than in press).
Following this, manual screening using the title, abstract, and, where necessary, the full text, was performed independently by two researchers to ensure that the resulting papers were relevant to the area of GenAI in healthcare. The final result was a dataset of 267 articles.
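The grouping of the search terms can be made explicit by assembling the string programmatically. The following is a minimal sketch only: the helper names `all_of`, `any_of`, and `is_balanced` are hypothetical and are not part of Scopus or any bibliometric tool, and the snippet only builds and sanity-checks the text of the query rather than submitting it to a database.

```python
def all_of(*terms):
    """Join terms with AND inside one pair of parentheses."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms):
    """Join terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(terms) + ")"


def is_balanced(query):
    """Return True if every opening parenthesis has a matching close."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close appeared before its open
                return False
    return depth == 0


# GenAI-related terms, combined with OR
genai_block = any_of(
    all_of("Generative", "artificial", "intelligence"),
    all_of("large", "language", "model"),
    "GenAI",
    "LLM",
)
# Health-related terms, combined with OR
health_block = any_of("health", "healthcare")

# A document must match both blocks
query = genai_block + " AND " + health_block
print(query)
```

Building the query from two blocks makes the intended logic visible: a record must match at least one GenAI-related term and at least one health-related term.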
  3.2. Data Analysis
The data analysis of the selected articles was performed using two bibliographic software tools: VOSviewer (v1.6.20) and Biblioshiny (v5.1.1) [
22]. Microsoft Power BI (v13.0.26732.48) was also used to perform some quantitative analysis. These tools were used to create visualizations, explore the networks of citations, collaboration, and co-occurrence of keywords to obtain qualitative and quantitative insights discussed in the next section.
  4. Bibliometric Analysis Results
  4.1. Dataset Overview
The data set includes 267 articles from 199 sources, published between January 2023 and February 2025. We note that during the data collection process, no restrictions were set on the time period, as a means to include the maximum number of articles that met the selection criteria. This means that the first article to meet our search terms and inclusion criteria was from 2023, indicating that GenAI within healthcare only began appearing as a topic from this time onwards.
Table 1 shows a three-year time range and an average article age of 0.891 years, indicating that the research area is relatively young. Its emergence was perhaps accelerated by the COVID-19 pandemic, which likely motivated the application of LLMs in healthcare research and practice and exposed the need for automation and the extension of digital health services. For instance, a recent study evaluated the application of pretrained generative large language models, including LLaMA2-7b and Flan-T5-XL, for assessing COVID-19 severity [
23]. The study acknowledged that they performed well in providing no-code, real-time assessment in low-data settings with unstructured inputs. The annual growth rate of publications of 44.02% suggests a rapid increase in research into GenAI for healthcare over this period.
   4.2. Trend in Article Publication
GenAI became a focal point in the research community around the year 2010 with the creation of early virtual assistants like Siri, then Alexa and Cortana in 2014, followed by the first GPT model in 2018 and Bard in 2023 [
24].
According to Al-Amin et al. [
24], GPT-3.5 was released in 2022. 
Figure 2 shows that by 2023, there were 27 published articles on the use of GenAI in healthcare. GenAI continued to gain popularity the following year, recording 184 publications related to healthcare. This represents a more than six-fold increase in output in just this short time period. In just the first two months of 2025, there were 56 relevant articles published, already about double the total for 2023. We will reflect on this further when discussing the limitations of this research in 
Section 7.
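The yearly counts above can be related to the annual growth rate of 44.02% reported in the dataset overview. The sketch below assumes that the rate is a compound annual growth rate computed from the first and last yearly counts; this is an assumption about how the figure was derived, although the formula does reproduce the reported value. The function and variable names are illustrative.

```python
def compound_annual_growth_rate(first_count, last_count, n_years):
    """Percentage growth per year between the first and last yearly counts."""
    return ((last_count / first_count) ** (1 / (n_years - 1)) - 1) * 100


# Yearly article counts reported in this section (2025 covers two months only)
articles_per_year = {2023: 27, 2024: 184, 2025: 56}
years = sorted(articles_per_year)

growth_rate = compound_annual_growth_rate(
    articles_per_year[years[0]], articles_per_year[years[-1]], len(years)
)

# Simple ratios against the 2023 baseline
ratio_2024_vs_2023 = articles_per_year[2024] / articles_per_year[2023]
ratio_2025_vs_2023 = articles_per_year[2025] / articles_per_year[2023]

print(f"Annual growth rate: {growth_rate:.2f}%")
print(f"2024 vs 2023: {ratio_2024_vs_2023:.2f}x")
print(f"2025 (partial) vs 2023: {ratio_2025_vs_2023:.2f}x")
```

Because the last year is incomplete, a rate computed this way likely understates the underlying growth; the 2024 to 2023 ratio is a more direct indication of the increase in output.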
  4.3. Co-Authorship Distribution by Countries
Figure 3 presents the network of collaboration of authors across countries. In generating this figure, we set a minimum threshold of five articles per country. Of the 77 countries represented in the data set, 17 countries met this criterion and are represented in the diagram. The nodes represent the countries that meet the minimum threshold, and the size of each node is proportional to the number of collaborations with other countries.
 It is observed in 
Figure 3 that there are five clusters, represented by nodes of the same colour. The United States (US) leads the co-authorship distribution with 15 collaborations spread widely across the clusters. The US is followed by the United Kingdom and Canada, with seven and six collaborations, respectively. This output suggests that the US is the country with the highest number of collaborations in GenAI applied to healthcare. Moreover, 
Figure 3 also suggests that authors from the United Kingdom, Canada, and Israel frequently collaborate with authors from the US. The prominence of the US is not uncommon in such analyses, with a 2023 bibliometric analysis study reporting the US as the country with the highest number of publications related to AI applied to healthcare [
10]. The prominence of the US in the co-authorship map could be related to the size of its population, as other highly populated countries, such as India and China, are also observed in 
Figure 3.
  4.4. Co-Authorship Network
The co-authorship network shows the links between authors in the field, representing the degree of collaboration among authors. A spherical network plot was chosen to display the links among authors, due to its capacity to highlight the intensity of the collaborations among authors, represented by nodes. The thickness of the links between the nodes increases with the number of collaborations among authors. 
Figure 4 presents the co-author network and shows five clusters of authors who regularly publish together (illustrated by nodes of the same colour).
The largest group with the most collaborations is made up of six authors:
- Harada Yukinori, Senior Assistant Professor, Dokkyo Medical University 
- Shimizu Taro, Professor, Dokkyo Medical University, Department of Diagnostic and Generalist Medicine 
- Suzuki Tomoharu, Medical Doctor, Urasoe General Hospital 
- Mizuta Kazuya, Dokkyo Medical University, Department of Diagnostic and Generalist Medicine 
- Hirosawa Takanobu, Senior Assistant Professor, Dokkyo Medical University, Department of Diagnostic and Generalist Medicine 
- Kazuki Tokumasu, Assistant Professor, Okayama University, Department of General Medicine 
The six authors leading this strong network of collaboration are medical domain experts based in Japan. The second group, of five authors, are from the US (Yale University, Weill Cornell Medical School, the University of Pittsburgh, Tulane University School of Medicine and the University of Illinois).
  4.5. Author Productivity Through Lotka’s Law
In the scientific research landscape, the relationship between authors and the number of articles they publish is described by Lotka’s law [
25]. Lotka’s Law states that “The number of authors making n contributions is about $1/n^{2}$ of those making one; and the proportion of all contributors, that makes a single contribution, is about 60 percent” [
26]. In other words, for every 100 authors who produce one article, approximately 25 will produce two, 11 will produce three, 6 will produce four, 4 will produce five, and so on.
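As a rough illustration of the inverse-square relationship, the sketch below computes the expected number of authors with n articles, taking 100 single-article authors as the baseline. The function name and the default exponent of 2 are assumptions made for illustration; fitted exponents in real datasets often differ from 2.

```python
import math


def lotka_expected(single_authors, max_n, exponent=2.0):
    """Expected number of authors with n articles, for n = 1..max_n."""
    return {n: single_authors / n ** exponent for n in range(1, max_n + 1)}


expected = lotka_expected(single_authors=100, max_n=5)
rounded = {n: round(value) for n, value in expected.items()}
print(rounded)

# Share of all authors with exactly one article when the series
# 1 + 1/4 + 1/9 + ... is summed over all n (its limit is pi^2 / 6)
single_share = 6 / math.pi ** 2
print(f"{single_share:.1%}")
```

The second calculation shows where the "about 60 percent" in the quotation comes from: under an exact inverse-square law, single-article authors make up roughly 61% of all authors.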
The productivity of authors in our study is presented in 
Table 2. It is observed that the proportion of authors with more than three publications (0.015%) is very low when compared to the proportion of authors with one publication (93.8%). In other words, the percentage of authors decreases as the quantity of their articles increases. In particular, only one author has produced six articles, and two have produced seven. This may be a result of the early stage of research into GenAI in healthcare, and percentages may, in time, trend towards those predicted by Lotka’s Law.
  4.6. Top Corresponding Author’s Countries
The corresponding author’s countries give an overview of the number of articles for which the corresponding author is from a particular country (and thus an indication of which countries are leading this research) and also the degree of intra- and inter-country collaborations. 
Figure 5 shows the top 30 corresponding authors’ countries. The figure suggests that there are more collaborations among authors from the same country than among those from different countries. The US tops the table with 115 papers, including 20 MCP (Multiple Country Publications) and 95 SCP (Single Country Publications), followed by China with 6 MCP and 13 SCP. Additionally, some countries have only MCP and others only SCP. Examples are Chile and Cyprus, with one MCP each, and Spain and Ireland, with one SCP each.
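The balance between intra- and inter-country collaboration can be summarised as the share of a country's papers that involve more than one country. The sketch below uses only the MCP and SCP counts quoted above for the US and China; the function name `mcp_ratio` is illustrative.

```python
def mcp_ratio(scp, mcp):
    """Share of a country's papers that involve authors from several countries."""
    return mcp / (scp + mcp)


# (SCP, MCP) counts quoted in the text
countries = {"US": (95, 20), "China": (13, 6)}

for name, (scp, mcp) in countries.items():
    print(f"{name}: {scp + mcp} papers, MCP ratio {mcp_ratio(scp, mcp):.3f}")
```

On these figures, about 17% of US papers and about 32% of Chinese papers involve international co-authors, so the US leads in volume while a larger share of China's output is international.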
  4.7. Collaboration Network by Institutions
The institution collaboration map allows us to visualise the links between institutions. The top 80 institutions with the most collaborations are shown in 
Figure 6. The six institutions that collaborate the most are Harvard Medical School, University of California, National Institutes of Health, Vanderbilt University Medical Center, University of Pennsylvania and the University of Pittsburgh. These institutions are also among the top 10 influential affiliations presented in 
Table 3.
  4.8. Most Cited Countries
Figure 7 shows the top 10 most-cited countries. The US leads this with 543 citations. China is the second most-cited country with 146 citations, followed by Germany with 133 citations.
   4.9. Most Cited Articles
A document citation analysis was performed on the 267 articles included in this study to discover the most-cited articles. 
Table 4 shows the five articles with the highest citation counts.
The most cited article recorded 75 citations and was published early in 2023 by Cai et al. [
27] in the American Journal of Ophthalmology. The article investigated the ability of GenAI to answer board-style ophthalmology questions. In their experiment, the authors compared the ability of three large language models (Bing Chat from Microsoft, ChatGPT3.5, and ChatGPT4.0) to answer 250 questions related to basic science and clinical science. The LLMs’ performance was compared with that of human respondents. It was established that human respondents had an average accuracy of 72.2%, while Bing Chat, ChatGPT3.5, and ChatGPT4.0 had an average accuracy of 71.2%, 58.8%, and 71.6%, respectively. However, the authors also reported that Bing Chat and ChatGPT4.0 struggled with image interpretation while ChatGPT3.5 had the highest rate of hallucination and non-logical reasoning (42.4%).
The second-most cited paper [
28] was authored by Andrew T. Gabrielson et al. and was published in the 
Journal of Urology in 2023. The authors present the capabilities and drawbacks of ChatGPT for urologists. The study presents the benefits of GenAI working in concert with physicians, while recognising its limitations and raising concerns that physicians may be replaced by AI systems. The study acknowledges ChatGPT as an emerging technology that can help mitigate physician burnout by reducing fatigue and increasing face time with patients.
The third-most cited study [
29] was a review presenting the potential of ChatGPT and other LLMs in education, research and ophthalmology. The study deepened the understanding of LLMs by presenting the evolution of GPT-based models and their features. Adopting an ophthalmology-specific lens, the study explored the perspectives of different stakeholders regarding the integration of LLMs in eye care. From the patient’s viewpoint, the study contended that LLMs could enhance patient-centred care and communication during consultation, and acknowledged that ChatGPT could enable multilingual translation and simplify medical terminology. For practitioners, the study argued that LLMs hold the promise of augmenting training and education, semi-automating administrative tasks, enhancing efficiency, and supporting literature search and manuscript writing. From the policymakers’ perspective, the study stated that LLMs could be useful in proofreading and refining documents and guidelines, document writing, analysis, and evidence synthesis. While highlighting the applications of LLMs in eye care, the manuscript also presented the foreseeable challenges associated with the integration of LLMs in clinical settings, including accuracy, bias, security, and interpretability. The study concluded by emphasising the need for multidisciplinary collaboration among stakeholders in order to enhance efficiency, ethics, and safety.
The fourth-most cited article was published in 2023 by Chervenak et al. [
30]. It presents the advantages and consequences of using ChatGPT for clinical information acquisition. The study prompted ChatGPT with fertility-related questions from three different sources and evaluated its responses in terms of factual content, length, and sentiment. It was reported that ChatGPT performed well in providing factual content with an error rate of 6.1% for the first source. For the second source of questions, ChatGPT outperformed the average patient on fertility knowledge. With the third source of questions, ChatGPT showed great performance, and the study concluded by acknowledging its ability to assist in fertility-related clinical inquiries, but emphasised limitations such as its unpredictable unreliability and the inability of users to evaluate the uncertainty in responses.
The fifth-most cited article was published in the 
Journal of Medical Internet Research in 2023 by Giannakopoulos et al. [
31]. This paper examined the use of LLMs in evidence-based dentistry. They found that LLMs have some potential for use in evidence-based dentistry, but their limitations can lead to potentially harmful healthcare decisions, so they recommended that LLMs should not replace the dentist’s expertise and knowledge and that there is a requirement for further research and clinical validation of medical LLMs.
  4.10. Journal Analysis
This study includes papers from 199 journals, and a source analysis allows us to highlight the core journals. Bradford’s Law can help to recognise the influential journals in a field [
32]. It describes the distribution of the literature across sources and enables us to identify the sources that are most engaged in a particular field. For this study, we used Bradford’s Law to categorise the sources into three zones representing sources with high, average, and low engagement in the field. Of the 199 sources recorded for this study, 27 belong to the first zone and represent the high-engagement journals.
The top five most productive journals over time are presented in 
Table 5. The 
Journal of Medical Internet Research (JMIR) stands as the most productive journal, with 1, 12, and 12 articles published in 2023, 2024, and 2025, respectively. 
IEEE Access and 
JMIR Mental Health show similar productivity performances with 1, 6, and 6 published articles in 2023, 2024, and 2025, respectively.
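The zoning idea behind Bradford’s Law, as used earlier in this subsection, can be sketched as follows: sources are ranked by productivity and split into zones that each hold roughly one third of the articles. This is a simplified approximation for illustration, not the exact procedure implemented in Biblioshiny, and the source names and article counts are hypothetical.

```python
def bradford_zones(articles_per_source, n_zones=3):
    """Assign each source to a zone so that zones hold similar article totals."""
    # Rank sources from most to least productive
    ranked = sorted(articles_per_source.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(articles_per_source.values())
    zones = {}
    cumulative = 0
    for source, count in ranked:
        # The zone depends on how many articles the higher-ranked sources hold
        zone = min((cumulative * n_zones) // total + 1, n_zones)
        zones[source] = zone
        cumulative += count
    return zones


# Hypothetical article counts per journal (18 articles in total)
toy_counts = {"A": 6, "B": 4, "C": 2, "D": 2, "E": 1, "F": 1, "G": 1, "H": 1}
zones = bradford_zones(toy_counts)

sources_per_zone = {z: sum(1 for v in zones.values() if v == z) for z in (1, 2, 3)}
print(zones)
print(sources_per_zone)
```

In this toy example, each zone holds six articles, but the number of journals needed to supply them grows from one to two to five, which is the pattern of a small productive core surrounded by a long tail that Bradford’s Law describes.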
  4.11. Most-Cited References
The most-cited references are those works that are most commonly cited by the papers in our dataset. Some of these are within our dataset, while others are from outside. This shows that there are a number of works outside of GenAI for healthcare that are potentially of use to researchers working on the topic of GenAI tools in healthcare.
The 10 most-cited references are presented in 
Table 6 below. The most cited references are studies leveraging ChatGPT in areas such as healthcare education [
33,
34,
35] and diagnostics [
36], highlighting the attention given to ChatGPT by scholars seeking to harness its potential in these areas. These studies also pointed out the limitations of ChatGPT in these application areas [
36,
37].
The most cited reference was a single-author systematic review by Sallam [
33]. That review covered 60 articles about potential uses and challenges of ChatGPT, with 31 articles (51.7%) focusing on academic/scientific writing, 20 articles (33.3%) on scientific research, 14 articles (23.3%) on healthcare practice, 7 articles (11.7%) on healthcare education, and 2 articles (3.3%) highlighting the free availability of ChatGPT as an accessible supportive tool for healthcare. The second most cited reference, also published in 2023, demonstrated the artificial hallucination of ChatGPT through an evidence-based case study on bone metabolism and homocysteine. ChatGPT was prompted with tasks, including writing a brief essay on liver involvement in late-onset Pompe disease (LOPD). It was observed that ChatGPT’s responses were only partially accurate and the references it provided were not consistent [
37]. In addition to discussing the use of LLMs in healthcare, two further references [
36,
38] examined LLM drawbacks relating to ethics, regulation and trust, which are extremely important given the sensitive nature of the healthcare ecosystem, where patients’ well-being is involved.
      
  
    
  
  
Table 6. Top 10 most cited references.

| Rank | Cited Reference |
|---|---|
| 1 | Sallam [33] |
| 2 | Alkaissi and McFarlane [37] |
| 3 | Ayers et al. [34] |
| 4 | Cooper [35] |
| 5 | Kanjee et al. [36] |
| 6 | Kung et al. [39] |
| 7 | Meskó [38] |
| 8 | Thirunavukarasu et al. [40] |
| 9 | Biden [41] |
| 10 | Eysenbach [42] |
      
 
  4.12. Co-Word Analysis
The keyword co-occurrence network allows us to understand the connections between topics related to GenAI in healthcare. This network is built using the “keyword plus” method, which includes not just the keywords in the articles themselves, but also those in their cited references, thus enhancing comprehensiveness. 
Figure 8 shows a network map of keywords based on the Leiden Algorithm [
43]. A star network layout was chosen to simplify the visualisation and highlight the topics with the highest level of centrality, such as “medical education” in this case. Keywords that regularly appear together are placed in clusters, indicated by using the same colours and being connected to each other by lines, with thicker lines indicating more connections.
The keywords network showed that “clinical decision making” was directly connected to key topics like “patient education”, “medical education”, and “hallucination”. Moreover, the network map also highlighted some important topics explored by scholars in healthcare domains such as higher education, medical ethics, radiology, diagnosis accuracy, mental health, health disparity and equity, patient care, and physician–patient relationship. From this diagram, it can also be seen that terms such as medical education, patient care, medical ethics, ethical dilemma and health equity are connected and clustered together, indicating a strong relationship between these terms in the literature.
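The edge weights of a keyword co-occurrence network can be illustrated by counting how often two keywords appear in the same article. The sketch below is a simplified illustration with hypothetical keyword lists; it covers only the counting step and does not reproduce the clustering or layout performed by the bibliometric tools used in this study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how many articles each unordered keyword pair appears in."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and remove duplicates within a single article
        unique = sorted({k.strip().lower() for k in keywords})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical keyword lists, one per article
articles = [
    ["Clinical decision making", "Medical education", "Hallucination"],
    ["clinical decision making", "patient education"],
    ["Medical education", "clinical decision making"],
]

edges = cooccurrence_counts(articles)
for (first, second), weight in edges.most_common():
    print(f"{first} -- {second}: {weight}")
```

Each pair corresponds to a line in the network map, and the count corresponds to its thickness: pairs that co-occur in more articles are drawn with thicker lines.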
  4.13. Thematic Map, Word Cloud, and Dendrogram
A thematic map and word cloud of authors’ keywords were created to analyse and categorise topics based on their relevance and degree of development, as shown in 
Figure 9 and 
Figure 10. A thematic map is a two-dimensional plot with four compartments revealing the theme’s present status and future direction in the subject area. The first dimension of the plot represents the centrality. It refers to the “relevance of the themes estimated by external associations with keywords”. On the other hand, the second dimension represents the density, referring to “the degree of development of the themes as measured by the internal associations among the keywords” [
44].
- The basic themes are those with low density and high centrality. In this case, only emergency medicine shows as a basic theme. 
- The motor themes are those with high density and centrality. These have been more explored by scholars and in this study include topics such as mental health, diagnosis, health literacy, patient education, nursing education, and ethical implications. Note that many of these themes are also observed in the word cloud in Figure 10.
- The niche themes represent less connected but more developed topics, with high-density but low centrality. Niche themes found here include ophthalmology and retrieval-augmented generation. 
- The emerging themes represent themes with low density and low centrality. For this study, emerging themes include medical education, AI ethics, communication and dentistry. 
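The four quadrants described above can be expressed as a simple rule on the two dimensions. The sketch below is illustrative only: the cut-off values and the example coordinates are hypothetical, and in practice the cut-offs are typically derived from the distribution of all themes (for example, their mean or median), which is an assumption here rather than a detail taken from this study.

```python
def classify_theme(centrality, density, centrality_cut, density_cut):
    """Map a theme's centrality and density to a thematic-map quadrant."""
    high_centrality = centrality >= centrality_cut
    high_density = density >= density_cut
    if high_centrality and high_density:
        return "motor"     # well developed and strongly connected
    if high_centrality:
        return "basic"     # strongly connected but less developed
    if high_density:
        return "niche"     # well developed but weakly connected
    return "emerging"      # neither developed nor strongly connected


# Hypothetical (centrality, density) coordinates with cut-offs at 0.5
examples = {
    "theme_a": (0.8, 0.7),
    "theme_b": (0.8, 0.2),
    "theme_c": (0.2, 0.7),
    "theme_d": (0.2, 0.2),
}
labels = {
    name: classify_theme(c, d, centrality_cut=0.5, density_cut=0.5)
    for name, (c, d) in examples.items()
}
print(labels)
```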
A general view of the prevalence of different topics is given by the word cloud in 
Figure 10. Combining the word cloud and the thematic map gives a more comprehensive view of the state of the art. Clinical decision making, medical and patient education, mental health, diagnosis accuracy, medical ethics, and data privacy all have a high frequency and thus appear to be among the most actively researched topics.
Figure 11 presents a topic dendrogram, hierarchically illustrating the correlation among authors’ keywords. The height of each branch increases with the level of dissimilarity between themes. Similar themes are placed at lower heights, while dissimilar themes are placed at higher heights. The connecting horizontal line between themes illustrates the degree of similarity between them.
   4.14. Summary
This study included 267 articles published between 2023 and 2025. While the short time frame illustrates the recent emergence of GenAI in public health, the number of articles indicates the healthcare industry’s high interest in GenAI, suggesting that there is an avenue for GenAI to bring changes and innovations. Based on the themes that emerged in the bibliometric analysis, the following section of this paper will present some of the key application areas and challenges of GenAI in healthcare. These themes indicate where research is currently focused, some potential benefits that GenAI can provide in the healthcare industry, and the challenges that need to be addressed for its integration within the healthcare sector.
  5. Applications of GenAI in Healthcare
In this section, we detail some of the main application areas of GenAI in healthcare that showed up in the bibliometric analysis.
  5.1. Patient Care and Diagnosis
The rapid growth of GenAI has created a paradigm shift from traditional machine and deep learning methods to LLMs for clinical diagnosis. Their ability to perform sophisticated tasks with minimal human intervention is a potential game-changer in the dynamic medical environment. Clinicians can use GenAI systems to harness the content of EHRs and suggest diagnoses, treatment and care plans, along with justifications for the suggestions made [
45]. Research by Liu et al. [
5] has shown, for example, that LLMs have the potential to generate clinical radiology reports of limb X-rays for the detection of abnormalities in the emergency department. The study argued that local LLMs outperformed commercial LLMs in clinical textual data generation and are a more suitable option because they better comply with patient data protection regulations. Similarly, Li et al. [
46] explored the potential of LLMs for generating and mining clinical data. Their study investigated the performance of LLMs in clinical data augmentation for the diagnosis of Alzheimer’s symptoms from EHRs. Their work established that LLMs can generate clinical text to enhance the diagnosis of Alzheimer’s Disease with the support of expert-based knowledge. Pagano et al. [
3] conducted a study involving 115 orthopedic patients. Their study compared the diagnosis performance of five different LLMs, including four versions of ChatGPT and two of Gemini, to experienced orthopedic clinicians’ diagnoses. It was found that GPT-4o achieved the highest diagnostic sensitivity at 92.3%, significantly outperforming other LLMs. Another study demonstrated the ability of ChatGPT to accurately generate hospital discharge summaries, outperforming UK junior doctors [
47]. Researchers have also examined the potential use of LLMs in other specialties, such as obstetrics and gynecology [
6], neurology [
48], and otolaryngology [
49]. The authors of these studies compared the LLMs’ diagnostic performance to that of domain experts. Although the LLMs demonstrated generally positive performance, the authors highlighted the need for responsible use of LLMs in order to minimize misinformation and errors. This need was further supported by a recent study on personalized nutrition [
50].
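To make the sensitivity figure reported above concrete, the following is a minimal sketch of how diagnostic sensitivity can be computed when the clinicians’ diagnosis is treated as the reference standard. It assumes a simplified binary setting (condition present or absent); the function name and the toy labels are hypothetical and are not taken from any of the cited studies.

```python
def sensitivity(reference, predicted):
    """Share of reference-positive cases that the model also labels positive."""
    if len(reference) != len(predicted):
        raise ValueError("both label lists must have the same length")
    true_pos = sum(1 for r, p in zip(reference, predicted) if r and p)
    false_neg = sum(1 for r, p in zip(reference, predicted) if r and not p)
    if true_pos + false_neg == 0:
        raise ValueError("reference contains no positive cases")
    return true_pos / (true_pos + false_neg)


# Hypothetical toy labels (1 = condition present), for illustration only.
clinician_labels = [1, 1, 1, 0, 1, 0, 1, 1]
llm_labels = [1, 1, 0, 0, 1, 1, 1, 1]

print(f"sensitivity = {sensitivity(clinician_labels, llm_labels):.1%}")
```

Sensitivity ignores false positives, so a study comparing LLMs with clinicians would normally report it alongside other measures such as specificity.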
  5.2. Clinical Decision-Making
Decision-making is a critical aspect of clinical care; the outcome of any medical intervention is underpinned by the timing and adequacy of the healthcare practitioner’s technical and ethical judgment and abilities. This task is generally complex and resource-intensive, and it differs from one patient to another. GenAI offers tools to facilitate and improve the decision-making process in healthcare settings [
51].
A recent study explored the decision-making process of two LLMs, ChatGPT-4 and Claude-3-Opus, in suggesting treatments for prostate cancer patients [
52]. The authors of that study reported that both LLMs demonstrated 93% adherence to the treatment recommendations of the multidisciplinary team of experts. Nevertheless, there was a discrepancy between the LLMs’ recommendations and those of the experts in 9% of cases due to a lack of clinical information provided to the LLMs, while in 3% of cases the LLMs were not in line with professional guidelines, even though they had full access to all relevant patient information.
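As an illustration of how an adherence figure of this kind might be derived, the sketch below computes simple percent agreement between expert and LLM recommendations and lists the cases that disagree so that they can be reviewed manually. This is an assumed, simplified reading of "adherence"; the function names and the toy recommendations are hypothetical and do not reproduce the cited study’s method or data.

```python
def percent_agreement(expert, model):
    """Percentage of cases in which the model matches the expert recommendation."""
    if len(expert) != len(model):
        raise ValueError("both lists must have the same length")
    if not expert:
        raise ValueError("at least one case is required")
    matches = sum(e == m for e, m in zip(expert, model))
    return 100.0 * matches / len(expert)


def discrepant_cases(expert, model):
    """Indices of cases where the two recommendations differ."""
    return [i for i, (e, m) in enumerate(zip(expert, model)) if e != m]


# Hypothetical toy recommendations, for illustration only.
board = ["surgery", "radiotherapy", "surveillance", "surgery", "radiotherapy"]
llm = ["surgery", "radiotherapy", "surgery", "surgery", "radiotherapy"]

print(percent_agreement(board, llm))  # share of matching cases, in percent
print(discrepant_cases(board, llm))   # cases to hand to a clinician for review
```

Listing the discrepant cases, rather than reporting only the aggregate rate, supports the kind of case-by-case explanation of disagreements described above.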
In another study, a randomised controlled trial was conducted to investigate the efficacy of ChatGPT in providing clinical guidance during cardiac arrest situations [
53]. The study compared the efficacy of ChatGPT, clinician-supervised ChatGPT and traditional paper-based instructions on cardiac arrest and CPR. The results indicated that the clinician-supervised ChatGPT was more accurate than both ChatGPT alone and the paper-based instructions. However, it was reported that ChatGPT recommended a risky option in one instance, thus highlighting the need for clinical specialist supervision.
In the same vein, Schmidl et al. [
54] assessed the performance of two versions of ChatGPT as tumour board decision-making tools for recurrent/metastatic head and neck cancer cases. Both LLMs were used to generate therapy recommendations for 100 (50 recurrent and 50 metastatic) head and neck cancer cases. The authors assessed the LLMs’ output in terms of clinical summarisation, explanation and recommendation. They established that both LLMs were able to provide answers related to surgery, palliative care or systemic therapy. However, the authors again highlighted the necessity for validation by experienced clinicians, as both LLMs provided incorrect results in some cases.
LLMs offer an avenue for change and innovations in clinical decision-making. However, as seen in the studies presented here, there is a risk of incorrect recommendations [
30]. Therefore, constant clinical supervision is recommended in order to minimise the risk of life-threatening clinical decisions.
  5.3. Patient and Clinician Education
From the patient’s perspective, GenAI tools can be harnessed for tasks related to customer service, such as appointment booking or general inquiries about healthcare services and providers. In this regard, they can be valuable tools in mitigating healthcare practitioners’ burnout and high turnover. The global healthcare practitioner crisis reported by the World Health Organisation (WHO) [
55] suggests that a different approach to providing access to care is needed. A shift from traditional care to patient-centred care is needed to improve healthcare outcomes. However, for patient-centred care approaches to be effective, patients must be educated about their conditions and empowered to manage them. Enhancing healthcare accessibility and communication would be valuable in achieving this goal, as it would give patients, most of whom are not familiar with medical jargon, a clearer picture of their condition [
51].
A study conducted by Halawani et al. [
7] demonstrated that LLMs could help to empower patients through renal cancer education. According to the authors, these LLMs generated reports that proved to be readable and accurate, with only minor omissions of detail. Consequently, the study acknowledged the potential of LLMs to simplify medical communication to patients while emphasising the necessity for cautious use.
Similarly, Scquizzato et al. [
56] explored the potential of ChatGPT in providing answers to laypeople’s questions related to cardiac arrest and cardiopulmonary resuscitation (CPR). The study acknowledged the potential of ChatGPT to provide largely accurate, relevant, and comprehensive answers to questions related to cardiac arrest, while pointing out ChatGPT’s low-quality responses to CPR-related questions, suggesting the importance of clinical specialist monitoring.
Patient engagement is a pivotal aspect of prevention and treatment therapy. However, lack of access to medical information seems to be a challenge. GenAI holds the potential to fill this gap by helping patients enhance their literacy in various medical topics, with studies examining this for areas such as infectious diseases [
57], orthopedics [
58] and drug discovery [
59].
  5.4. Administrative Tasks and Workflow Management
In the healthcare landscape, administration and workflow management are areas that require a lot of expertise. They include operational, human, and financial resource management. Healthcare practitioners, leaders and managers are responsible for optimizing the human, financial, and technical resources available to meet key performance indicators related to the areas of specialization under their remit. One common issue they face is patient waiting times due to an imbalance between demand and capacity. In this context, Biesheuvel et al. [
51] highlighted the potential of LLMs in critical care logistics, with a focus on aspects such as medication supply coordination and intensive care unit (ICU) bed capacity prediction.
Furthermore, physicians also handle a significant number of tasks not necessarily related to patient care. For example, it was reported by Toscano et al. [
1] that physicians in the outpatient department spend more than 44.9% of their clinical time on activities outside of patient care. GenAI offers the potential to reduce the administrative workload [
60]. By streamlining and optimizing workflow processes, GenAI can reduce the administrative burden in areas including, but not limited to, clinical note summarisation [
4,
60,
61] and clinical triage [
4].
  6. Challenges of GenAI in Healthcare
As noted by many of the papers included in the bibliometric analysis, the use of GenAI in healthcare comes with a number of challenges and limitations. Healthcare is a sensitive application domain for GenAI as it is so deeply connected to a patient’s life. Consequently, deploying GenAI systems requires the consideration of aspects such as reliability, effectiveness, accuracy, privacy, and safety. In this section, we detail some of the challenges that arose as themes in our analysis.
  6.1. Low Medical Accuracy and Misinformation
GenAI systems can provide pertinent health information, but they might also produce incorrect information given their probabilistic nature [
4]. As a result, their accuracy remains an issue, particularly in areas such as clinical diagnosis [
3,
46,
47], decision making [
52,
53,
54], and patient care [
7,
56]. LLMs are often trained on large amounts of data from the internet and are only as accurate as the data they are trained on. Accordingly, LLMs may provide erroneous information and recommendations which could endanger patients’ lives and potentially lead to death [
62]. The outputs generated by LLMs and GenAI systems in general ought to be reviewed and validated by clinical domain experts to minimize medical misinformation and irreversible mistakes. As such, the literature suggests using GenAI tools as assistants that augment healthcare practitioners rather than replace them [
12,
63].
  6.2. Ethical Consent and Regulatory Issues
Ethics, regulations, and governance are key issues in the healthcare domain. Patients’ healthcare information is extremely sensitive, and the analysis and sharing of this data is often tightly governed. These aspects must also be taken into account when designing GenAI systems for use within the healthcare industry.
  6.2.1. Data Privacy and Security
GenAI could pose a threat to data privacy and security in healthcare: a vast amount of data is required to train the models, which implies the collection and processing of sensitive patient data for training purposes. These data may come from patients who are not necessarily aware of this use, and could be used maliciously if improperly disclosed during training or deployment [
64].
The privacy issues of GenAI models should be addressed to maintain the safety and confidentiality of patients’ personal data. Unlike commercial data such as retail transaction data, medical data are non-perishable; they remain valuable over time, so the risk of a privacy or confidentiality breach remains an ongoing concern. In this light, Brown et al. [
65] argued that de-identifying data for LLMs could be quite challenging, especially when dealing with medical images [
66]. In use cases where LLMs interact with data that include medical histories or diagnoses, they might gather and store this information, leaving room for skilled attackers to steal or leak it if the platforms hosting the LLMs are not sufficiently secure. A possible example is given by Hacker et al. [
67], highlighting the malicious use of improperly secured LLMs to generate fake patients for insurance claims.
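One common mitigation for the risks described above is to redact obvious identifiers from free text before it is passed to an LLM. The sketch below is a minimal, rule-based illustration only: the pattern names, the `redact` function and the sample note are all hypothetical, and regular expressions of this kind do not catch names, addresses or image-embedded identifiers, so they fall well short of the de-identification that data protection regulations require.

```python
import re

# Hypothetical patterns, for illustration only. Order matters: the broader
# PHONE pattern runs last so that it cannot swallow dates or record numbers.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
    ("DATE", re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text):
    """Replace each matched identifier with a bracketed placeholder tag."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


# Hypothetical clinical note; none of these details refer to a real person.
note = (
    "Patient seen on 12/03/2024, MRN: 00123456. "
    "Contact jane.doe@example.org or +353 1 234 5678."
)
print(redact(note))
```

In practice, a step like this would be one layer among several, alongside access controls on the hosting platform and expert review of what leaves the clinical system.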
  6.2.2. Transparency and Trustworthiness
Studies point out the opacity of GenAI models [
68], leading to a lack of transparency and trust [
19]. The black-box nature of GenAI prevents the user from understanding the decision-making process. Given the delicate nature of the healthcare environment, GenAI tools should be able to provide clarity on their decision-making process so that users can comprehend the outputs generated. Additionally, improving explainability in health-specific GenAI models can enhance the transparency and trustworthiness of medical diagnosis, as well as enhance personalization in the treatment and management of diseases [
69]. Transparency can also help to mitigate the issue of algorithmic bias, as discussed in the next section, and can improve the confidence level of the users towards the GenAI tools.
The issue of trust in GenAI in healthcare has been linked to the absence of the human factor in GenAI systems. For example, it was noted by Sharma et al. [
19] and Sweeney et al. [
70] that the lack of a human factor in GenAI tools made users feel less connected when using them. Finally, the misinformation and hallucinations observed in LLMs hinder their trustworthiness, for instance when they offer harmful suggestions in mental health contexts, as reported by Navarro et al. [
69].
  6.2.3. Algorithmic Bias, Fairness and Responsibility
Algorithmic bias and fairness in GenAI models are huge challenges for their use in the healthcare industry, as they can potentially lead to healthcare disparities and discrimination [
69]. Algorithmic bias mostly arises during the design and implementation phases of GenAI models. It may originate in the data or in the LLMs themselves, and can intensify cultural and social bias or even lead to unfair treatment of a specific group of people. Studies identify data bias as one cause of algorithmic bias but equally highlight that it could be due to biased decision-making processes within the GenAI model itself [
68].
GenAI is emerging in telehealth with the use of conversational chatbots and virtual assistants. Companies make use of these methods to enhance digital health solutions, improve knowledge access and patient engagement. However, GenAI models lack the human status to bear any legal responsibility in case of misinformation or negative outcomes, therefore creating an ethical concern [
4,
70]. To promote the responsible use of GenAI in healthcare, a human–AI collaboration approach is needed [
71].
Moreover, practitioners and patients should understand their roles and responsibilities when using GenAI systems, as this will help to acknowledge and mitigate potential biases and define responsibilities. Similarly, the developers of GenAI models need to be responsible and accountable for the quality of design, implementation and performance of their products.
  7. Limitations and Future Research
As noted in 
Section 3, the data gathering for this study was performed only on the Scopus database. A consideration of other popular databases like Web of Science and PubMed, in addition to Scopus, would potentially result in retrieving a greater number of relevant papers.
The topic of GenAI in healthcare is relatively new, meaning that the papers analysed dated from January 2023 to February 2025. The swift growth of work in this area means that many more papers are being published on this topic. Performing the same search in July 2025 would have increased the number of papers found by 290. Given this, there is scope for similar studies to be performed using additional databases, or at a later date, to reveal how trends in the topic change.
  8. Discussion
From a bibliometric perspective, this study explored the knowledge structure and progress of GenAI in healthcare. Through the analysis of related bibliographic data, various insights were obtained, patterns were uncovered, and the benefits of GenAI were discussed in conjunction with its limitations. The 267 articles included in this research were retrieved in February 2025. These articles were published between January 2023 and February 2025, with an annual growth rate of 44.02%, suggesting rapid growth and a high level of interest in the field.
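For readers unfamiliar with the growth-rate figure, the sketch below shows one common way such a rate is calculated, assuming it is a compound annual growth rate over yearly publication counts. This is an assumption about the metric rather than a confirmed description of how the figure above was produced; the function name and the counts in the example are hypothetical, since per-year counts are not given in this section.

```python
def compound_annual_growth_rate(first_count, last_count, n_intervals):
    """Compound growth rate in percent between the first and last yearly counts.

    n_intervals is the number of year-to-year steps (e.g. 2 for 2023 -> 2025).
    """
    if first_count <= 0 or last_count < 0:
        raise ValueError("first_count must be positive and last_count non-negative")
    if n_intervals <= 0:
        raise ValueError("n_intervals must be positive")
    return ((last_count / first_count) ** (1 / n_intervals) - 1) * 100


# Hypothetical yearly counts, for illustration only.
rate = compound_annual_growth_rate(first_count=50, last_count=72, n_intervals=2)
print(f"{rate:.2f}%")
```

Under this assumed definition, a rate of 44.02% over two year-to-year intervals would correspond to a final-year count roughly 2.07 times the first-year count.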
While GenAI is gaining prominence in healthcare, its exploration is currently limited to a small number of developed countries. The United States of America is leading research in this area, as the most influential country in terms of co-authorship, followed by the United Kingdom and Canada. The US invests a lot of financial resources to fund AI research and promote life-enhancing, cutting-edge technologies [
4,
10]. As a result, the two most influential affiliations are both in the US, which is also the most-cited country, with 543 citations, followed by Germany and China.
However, the co-authorship network was dominated by a group of six medical experts from Japan. They were followed by a group of five authors from the US, also medical experts. This finding reinforces the idea that medical domain knowledge is essential for the efficient exploration and application of GenAI capabilities in healthcare, suggesting that interdisciplinary collaboration between AI experts and medical professionals is necessary to efficiently and effectively harness GenAI to enhance public health services [
9].
The most cited articles and references were related to the use, benefits, and limitations of ChatGPT. The majority of these studies compared ChatGPT’s abilities to those of medical professionals and students, and in some cases to those of other LLMs such as Google’s Gemini or Microsoft’s Bing Chat [
27,
28,
29]. These studies showed the potential of LLMs for the healthcare domain, while emphasizing their limitations and the importance of keeping medical professionals in the loop [
30]. A co-word analysis showed that GenAI research in healthcare was trending towards topics including medical education, patient education, patient care, clinical decision-making, and medical ethics, all representing the applications and, to some extent, the limitations of GenAI in healthcare.
GenAI has the potential to revolutionise the healthcare industry in areas such as clinical diagnosis, decision-making, knowledge access, and workflow management. However, the challenges of GenAI in healthcare need to be acknowledged and addressed. Adopting a human–GenAI collaboration approach could facilitate the practical integration of GenAI, as clinical expert validation of LLM-generated output will enhance patient safety [
69,
71].
  9. Conclusions
The future of GenAI in healthcare seems promising, offering potential tools in a range of healthcare areas. However, researchers need to acknowledge and address the limitations of GenAI, as these can seriously hinder its integration in clinical practice and thus block the potential benefits for practitioners and patients alike. Perceiving GenAI systems as supporting tools rather than a replacement for healthcare professionals can foster a sustainable human–GenAI collaboration approach guided by ethical and regulatory policies.
Considering the dynamic nature of the healthcare industry and the rapid evolution of GenAI technologies, it is important to continuously monitor and evaluate the performance of GenAI systems. This will minimise inaccuracies and enable health professionals to take corrective measures. Moreover, attention needs to be paid to sanitising patient data when dealing with LLMs and other GenAI tools. Medical practitioners and GenAI developers are invited to find a balance between harnessing the potential of GenAI to improve health outcomes and maintaining core healthcare values, including integrity, accountability, empathy, and quality of care.
    Author Contributions
Conceptualization, M.T.M. and V.K.N.; methodology, M.T.M. and V.K.N.; data curation, V.K.N.; writing—original draft preparation, V.K.N.; writing—review and editing, M.T.M.; visualization, M.T.M. and V.K.N.; supervision, M.T.M.; funding acquisition, M.T.M. and V.K.N. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by Taighde Éireann-Research Ireland grant number 18/CRT/6223.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Toscano, F.; O’Donnell, E.; Broderick, J.E.; May, M.; Tucker, P.; Unruh, M.A.; Messina, G.; Casalino, L.P. How physicians spend their work time: An ecological momentary assessment. J. Gen. Intern. Med. 2020, 35, 3166–3172.
- Maity, S.; Saikia, M.J. Large Language Models in Healthcare and Medical Applications: A Review. Bioengineering 2025, 12, 631.
- Pagano, S.; Strumolo, L.; Michalk, K.; Schiegl, J.; Pulido, L.C.; Reinhard, J.; Maderbacher, G.; Renkawitz, T.; Schuster, M. Evaluating ChatGPT, Gemini and other Large Language Models (LLMs) in orthopaedic diagnostics: A prospective clinical study. Comput. Struct. Biotechnol. J. 2025, 28, 9–15.
- Sai, S.; Gaur, A.; Sai, R.; Chamola, V.; Guizani, M.; Rodrigues, J.J. Generative AI for transformative healthcare: A comprehensive study of emerging models, applications, case studies, and limitations. IEEE Access 2024, 12, 31078–31106.
- Liu, J.; Koopman, B.; Brown, N.J.; Chu, K.; Nguyen, A. Generating synthetic clinical text with local large language models to identify misdiagnosed limb fractures in radiology reports. Artif. Intell. Med. 2025, 159, 103027.
- Grünebaum, A.; Chervenak, J.; Pollet, S.L.; Katz, A.; Chervenak, F.A. The exciting potential for ChatGPT in obstetrics and gynecology. Am. J. Obstet. Gynecol. 2023, 228, 696–705.
- Halawani, A.; Almehmadi, S.G.; Alhubaishy, B.A.; Alnefaie, Z.A.; Hasan, M.N. Empowering patients: How accurate and readable are large language models in renal cancer education. Front. Oncol. 2024, 14, 1457516.
- Donthu, N.; Kumar, S.; Mukherjee, D.; Pandey, N.; Lim, W.M. How to conduct a bibliometric analysis: An overview and guidelines. J. Bus. Res. 2021, 133, 285–296.
- Zhang, W.; Zhang, Q.; Wang, P.; Zhou, X.; Yulan, W. The application of Generative Artificial Intelligence in Mental Health Care: A Bibliometric and Visualized Analysis. Asian J. Psychiatry 2025, 110, 104596.
- Jimma, B.L. Artificial intelligence in healthcare: A bibliometric analysis. Telemat. Inform. Rep. 2023, 9, 100041.
- Guo, Y.; Hao, Z.; Zhao, S.; Gong, J.; Yang, F. Artificial intelligence in health care: Bibliometric analysis. J. Med. Internet Res. 2020, 22, e18228.
- Reddy, S. Generative AI in healthcare: An implementation science informed translational path on application, integration and governance. Implement. Sci. 2024, 19, 27.
- Moulaei, K.; Yadegari, A.; Baharestani, M.; Farzanbakhsh, S.; Sabet, B.; Afrash, M.R. Generative artificial intelligence in healthcare: A scoping review on benefits, challenges and applications. Int. J. Med. Inform. 2024, 188, 105474.
- Zhou, C.; Li, Q.; Li, C.; Yu, J.; Liu, Y.; Wang, G.; Zhang, K.; Ji, C.; Yan, Q.; He, L.; et al. A comprehensive survey on pretrained foundation models: A history from bert to ChatGPT. Int. J. Mach. Learn. Cybern. 2024.
- García-Porta, N.; Vaughan, M.; Rendo-González, S.; Gómez-Varela, A.I.; O’Donnell, A.; de Moura, J.; Novo-Bujan, J.; Ortega-Hortas, M. Are artificial intelligence chatbots a reliable source of information about contact lenses? Contact Lens Anterior Eye 2024, 47, 102130.
- McNeill, M. Extraordinary Impacts on the Healthcare Workforce: COVID-19 and Aging. Del. J. Public Health 2022, 8, 164.
- Gandhi, T.K.; Classen, D.; Sinsky, C.A.; Rhew, D.C.; Vande Garde, N.; Roberts, A.; Federico, F. How can artificial intelligence decrease cognitive and work burden for front line practitioners? JAMIA Open 2023, 6, ooad079.
- Wilson, L.; Marasoiu, M. The development and use of chatbots in public health: Scoping review. JMIR Hum. Factors 2022, 9, e35882.
- Sharma, D.; Kaushal, S.; Kumar, H.; Gainder, S. Chatbots in healthcare: Challenges, technologies and applications. In Proceedings of the 2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), Delhi, India, 9–10 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6.
- Baas, J.; Schotten, M.; Plume, A.; Côté, G.; Karimi, R. Scopus as a curated, high-quality bibliometric data source for academic research in quantitative science studies. Quant. Sci. Stud. 2020, 1, 377–386.
- Pranckutė, R. Web of Science (WoS) and Scopus: The titans of bibliographic information in today’s academic world. Publications 2021, 9, 12.
- Aria, M.; Cuccurullo, C. bibliometrix: An R-tool for comprehensive science mapping analysis. J. Inf. 2017, 11, 959–975.
- Roshani, M.A.; Zhou, X.; Qiang, Y.; Suresh, S.; Hicks, S.; Sethuraman, U.; Zhu, D. Generative large language model—powered conversational ai app for personalized risk assessment: Case study in covid-19. JMIR AI 2025, 4, e67363.
- Al-Amin, M.; Ali, M.S.; Salam, A.; Khan, A.; Ali, A.; Ullah, A.; Alam, M.N.; Chowdhury, S.K. History of generative Artificial Intelligence (AI) chatbots: Past, present, and future development. arXiv 2024, arXiv:2402.05122.
- Nicholls, P.T. Bibliometric modeling processes and the empirical validity of Lotka’s law. J. Am. Soc. Inf. Sci. 1989, 40, 379–385.
- Nagaiah, M.; Thanuskodi, S.; Alagu, A. Application of Lotka’s Law to the Research Productivity in the field of Open Educational Resources during 2011–2020. Libr. Philos. Pract. 2021, 2021, 6365.
- Cai, L.Z.; Shaheen, A.; Jin, A.; Fukui, R.; Jonathan, S.Y.; Yannuzzi, N.; Alabiad, C. Performance of generative large language models on ophthalmology board–style questions. Am. J. Ophthalmol. 2023, 254, 141–149.
- Gabrielson, A.T.; Odisho, A.Y.; Canes, D. Harnessing generative artificial intelligence to improve efficiency among urologists: Welcome ChatGPT. J. Urol. 2023, 209, 827–829.
- Tan, T.F.; Thirunavukarasu, A.J.; Campbell, J.P.; Keane, P.A.; Pasquale, L.R.; Abramoff, M.D.; Kalpathy-Cramer, J.; Lum, F.; Kim, J.E.; Baxter, S.L.; et al. Generative artificial intelligence through ChatGPT and other large language models in ophthalmology: Clinical applications and challenges. Ophthalmol. Sci. 2023, 3, 100394.
- Chervenak, J.; Lieman, H.; Blanco-Breindel, M.; Jindal, S. The promise and peril of using a large language model to obtain clinical information: ChatGPT performs strongly as a fertility counseling tool with limitations. Fertil. Steril. 2023, 120, 575–583.
- Giannakopoulos, K.; Kavadella, A.; Aaqel Salim, A.; Stamatopoulos, V.; Kaklamanos, E.G. Evaluation of the performance of generative AI large language models ChatGPT, Google Bard, and Microsoft Bing Chat in supporting evidence-based dentistry: Comparative mixed methods study. J. Med. Internet Res. 2023, 25, e51580.
- Nash-Stewart, C.E.; Kruesi, L.M.; Del Mar, C.B. Does Bradford’s Law of Scattering predict the size of the literature in Cochrane Reviews? J. Med. Libr. Assoc. JMLA 2012, 100, 135.
- Sallam, M. ChatGPT utility in healthcare education, research, and practice: Systematic review on the promising perspectives and valid concerns. Healthcare 2023, 11, 887.
- Ayers, J.W.; Poliak, A.; Dredze, M.; Leas, E.C.; Zhu, Z.; Kelley, J.B.; Faix, D.J.; Goodman, A.M.; Longhurst, C.A.; Hogarth, M.; et al. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern. Med. 2023, 183, 589–596.
- Cooper, G. Examining science education in ChatGPT: An exploratory study of generative artificial intelligence. J. Sci. Educ. Technol. 2023, 32, 444–452.
- Kanjee, Z.; Crowe, B.; Rodman, A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 2023, 330, 78–80.
- Alkaissi, H.; McFarlane, S.I. Artificial hallucinations in ChatGPT: Implications in scientific writing. Cureus 2023, 15, e35179.
- Meskó, B. Prompt engineering as an important emerging skill for medical professionals: Tutorial. J. Med. Internet Res. 2023, 25, e50638.
- Kung, T.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digital Health 2023, 2, e0000198.
- Thirunavukarasu, A.; Ting, D.; Elangovan, K.; Gutierrez, L.; Tan, T.; Ting, D. Large language models in medicine. Nat. Med. 2023, 29, 1930–1940.
- Biden, J. Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. 2023. Available online: https://digitalcommons.unl.edu/scholcom/263/ (accessed on 3 March 2025).
- Eysenbach, G. The role of ChatGPT, generative language models, and artificial intelligence in medical education: A conversation with ChatGPT and a call for papers. JMIR Med. Educ. 2023, 9, e46885.
- Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233.
- Wilczewski, M.; Alon, I. Language and communication in international students’ adaptation: A bibliometric and content analysis review. High. Educ. 2023, 85, 1235–1256.
- Barnett, G.O.; Cimino, J.J.; Hupp, J.A.; Hoffer, E.P. DXplain: An evolving diagnostic decision-support system. JAMA 1987, 258, 67–74.
- Li, R.; Wang, X.; Yu, H. Two Directions for Clinical Data Generation with Large Language Models: Data-to-Label and Label-to-Data. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Kerrville, TX, USA, 2023.
- Clough, R.A.J.; Sparkes, W.A.; Clough, O.T.; Sykes, J.T.; Steventon, A.T.; King, K. Transforming healthcare documentation: Harnessing the potential of AI to generate discharge summaries. BJGP Open 2024, 8, BJGPO.2023.0116.
- Romano, M.F.; Shih, L.C.; Paschalidis, I.C.; Au, R.; Kolachalama, V.B. Large language models in neurology research and future practice. Neurology 2023, 101, 1058–1067.
- Mayo-Yáñez, M.; Lechien, J.R.; Maria-Saibene, A.; Vaira, L.A.; Maniaci, A.; Chiesa-Estomba, C.M. Examining the Performance of ChatGPT 3.5 and Microsoft Copilot in Otolaryngology: A Comparative Study with Otolaryngologists’ Evaluation. Indian J. Otolaryngol. Head Neck Surg. 2024, 76, 3465–3469.
- Kaçar, H.K.; Kaçar, Ö.F.; Avery, A. Diet Quality and Caloric Accuracy in AI-Generated Diet Plans: A Comparative Study Across Chatbots. Nutrients 2025, 17, 206.
- Biesheuvel, L.A.; Workum, J.D.; Reuland, M.; van Genderen, M.E.; Thoral, P.; Dongelmans, D.; Elbers, P. Large language models in critical care. J. Intensive Med. 2024, 5, 113–118.
- Kaiser, P.; Yang, S.; Bach, M.; Breit, C.; Mertz, K.; Stieltjes, B.; Ebbing, J.; Wetterauer, C.; Henkel, M. The interaction of structured data using openEHR and large Language models for clinical decision support in prostate cancer. World J. Urol. 2025, 43, 67.
- Harari, R.E.; Altaweel, A.; Ahram, T.; Keehner, M.; Shokoohi, H. A randomized controlled trial on evaluating clinician-supervised generative AI for decision support. Int. J. Med. Inform. 2025, 195, 105701.
- Schmidl, B.; Hütten, T.; Pigorsch, S.; Stögbauer, F.; Hoch, C.C.; Hussain, T.; Wollenberg, B.; Wirth, M. Assessing the role of advanced artificial intelligence as a tool in multidisciplinary tumor board decision-making for recurrent/metastatic head and neck cancer cases—The first study on ChatGPT 4o and a comparison to ChatGPT 4.0. Front. Oncol. 2024, 14, 1455413.
- McIsaac, M.; Buchan, J.; Abu-Agla, A.; Kawar, R.; Campbell, J. Global Strategy on Human Resources for Health: Workforce 2030—A Five-Year Check-In. Hum. Resour. Health 2024, 22, 68.
- Scquizzato, T.; Semeraro, F.; Swindell, P.; Simpson, R.; Angelini, M.; Gazzato, A.; Sajjad, U.; Bignami, E.G.; Landoni, G.; Keeble, T.R.; et al. Testing ChatGPT ability to answer laypeople questions about cardiac arrest and cardiopulmonary resuscitation. Resuscitation 2024, 194, 110077.
- Meo, S.A.; Alotaibi, M.; Meo, M.Z.S.; Meo, M.O.S.; Hamid, M. Medical knowledge of ChatGPT in public health, infectious diseases, COVID-19 pandemic, and vaccines: Multiple choice questions examination based performance. Front. Public Health 2024, 12, 1360597.
- Ghanem, D.; Shu, H.; Bergstein, V.; Marrache, M.; Love, A.; Hughes, A.; Sotsky, R.; Shafiq, B. Educating patients on osteoporosis and bone health: Can “ChatGPT” provide high-quality content? Eur. J. Orthop. Surg. Traumatol. 2024, 34, 2757–2765.
- Zeng, X.; Wang, F.; Luo, Y.; Kang, S.g.; Tang, J.; Lightstone, F.C.; Fang, E.F.; Cornell, W.; Nussinov, R.; Cheng, F. Deep generative molecular design reshapes drug discovery. Cell Rep. Med. 2022, 3, 100794.
- Schoonbeek, R.; Workum, J.; Schuit, S.C.; Doornberg, J.; van der Laan, T.P.; Bootsma-Robroeks, C.M. Completeness, Correctness and Conciseness of Physician-Written Versus Large Language Model Generated Patient Summaries Integrated in Electronic Health Records. 2024. Available online: https://ssrn.com/abstract=4835935 (accessed on 15 October 2025).
- Madden, M.G.; McNicholas, B.A.; Laffey, J.G. Assessing the usefulness of a large language model to query and summarize unstructured medical notes in intensive care. Intensive Care Med. 2023, 49, 1018–1020.
- Munn, L.; Magee, L.; Arora, V. Truth machines: Synthesizing veracity in AI language models. AI Soc. 2024, 39, 2759–2773.
- Sezgin, E. Artificial intelligence in healthcare: Complementing, not replacing, doctors and healthcare providers. Digital Health 2023, 9, 20552076231186520.
- Xu, L.; Sanders, L.; Li, K.; Chow, J.C. Chatbot for health care and oncology applications using artificial intelligence and machine learning: Systematic review. JMIR Cancer 2021, 7, e27850.
- Brown, H.; Lee, K.; Mireshghallah, F.; Shokri, R.; Tramèr, F. What does it mean for a language model to preserve privacy? In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, 21–24 June 2022; pp. 2280–2292.
- Kim, B.N.; Dolz, J.; Jodoin, P.M.; Desrosiers, C. Privacy-net: An adversarial approach for identity-obfuscated segmentation of medical images. IEEE Trans. Med. Imaging 2021, 40, 1737–1749.
- Hacker, P.; Engel, A.; Mauer, M. Regulating ChatGPT and other large generative AI models. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, Chicago, IL, USA, 12–15 June 2023; pp. 1112–1123. [Google Scholar]
- Wang, C.; Liu, S.; Yang, H.; Guo, J.; Wu, Y.; Liu, J. Ethical considerations of using ChatGPT in health care. J. Med. Internet Res. 2023, 25, e48009. [Google Scholar] [CrossRef] [PubMed]
- Navarro, H.J.; Sandoval, C.L.; Galpin, I. Large language models in medicine: A systematic review of applications in medical, healthcare, and educational contexts. Period. Eng. Nat. Sci. 2025, 13, 629–670. [Google Scholar] [CrossRef]
- Sweeney, C.; Potts, C.; Ennis, E.; Bond, R.; Mulvenna, M.D.; O’neill, S.; Malcolm, M.; Kuosmanen, L.; Kostenius, C.; Vakaloudis, A.; et al. Can chatbots help support a person’s mental health? Perceptions and views from mental healthcare professionals and experts. ACM Trans. Comput. Healthc. 2021, 2, 1–15. [Google Scholar] [CrossRef]
- Wang, L.; Wang, D.; Tian, F.; Peng, Z.; Fan, X.; Zhang, Z.; Yu, M.; Ma, X.; Wang, H. Cass: Towards building a social-support chatbot for online health community. Proc. ACM-Hum.-Comput. Interact. 2021, 5, 3449083. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
      
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).