Systematic Review

Generative AI Chatbots Across Domains: A Systematic Review

Computer Science Department, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11220; https://doi.org/10.3390/app152011220
Submission received: 26 September 2025 / Revised: 11 October 2025 / Accepted: 14 October 2025 / Published: 20 October 2025

Abstract

The rapid advancement of large language models (LLMs) has significantly transformed the development and deployment of generative AI chatbots across various domains. This systematic literature review (SLR) analyzes 39 primary studies published between 2020 and 2025 to explore how these models are utilized, the sectors in which they are deployed, and the broader trends shaping their use. The findings reveal that models such as GPT-3.5, GPT-4, and LLaMA variants have been widely adopted, with applications spanning education, healthcare, business services, and beyond. As adoption increases, research continues to emphasize the need for more adaptable, context-aware, and responsible chatbot systems. The insights from this review aim to guide the effective integration of LLM-based chatbots, highlighting best practices such as domain-specific fine-tuning, retrieval-augmented generation (RAG), and multi-modal interaction design. This review maps the current landscape of LLM-based chatbot development, explores the sectors and primary use cases in each domain, analyzes the types of generative AI models used in chatbot applications, and synthesizes the reported limitations and future directions to guide effective strategies for their design and deployment across domains.

1. Introduction

Artificial intelligence (AI) has become one of the most important and rapidly growing fields in technology. It plays a critical role in various domains such as healthcare, education, business, and customer service [1]. In recent years, AI has undergone significant advancements, particularly with the emergence of LLMs such as GPT, BERT, and PaLM 2 [2], which have expanded the capabilities of machines to simulate understanding and generate human-like text [3].
One significant application of these advances is the development of AI-driven conversational agents, commonly known as chatbots [4]. These systems are now widely used to communicate with users, answer questions, provide information, and assist in various fields, including customer support, education, and healthcare [5]. Although the concept of chatbots dates back to the 1960s [6], their capabilities have evolved remarkably in recent years [7]. Today’s chatbots are smarter, more interactive, and capable of holding natural, human-like conversations. In the healthcare sector, for example, AI chatbots are used to assist patients, offer health advice, and support mental health services [4]. In the business world, companies use chatbots to respond quickly to customer inquiries, reduce employee workload, and improve overall service efficiency [8]. Similarly, in education, chatbots provide on-demand learning resources and answer academic questions [2].
Several prior systematic reviews have examined AI-driven chatbots, but their scopes were often confined to specific sectors or narrow model categories. For example, the review presented in [5] provided a comprehensive analysis of AI-driven conversational chatbot methodologies from 1999 to 2022, focusing mainly on implementation techniques and challenges in general customer service and education contexts. While insightful, this review did not analyze LLM-based generative chatbots. Similarly, the systematic literature review in [1] emphasized the design frameworks and application trends of conversational agents up to early 2024. However, it did not explore cross-domain comparisons or the evolution of generative architectures such as GPT-3.5 [9], GPT-4 [9], or LLaMA [10].
Despite the widespread use of LLM-based generative AI chatbots, there remains a lack of consolidated evidence on how they are being implemented, the specific benefits they offer, and the challenges they face in real-world settings [6]. To address this gap, this study presents a systematic literature review (SLR) that investigates the capabilities and limitations of AI chatbots in various domains. Unlike previous SLRs, which often focused on a single domain or a narrow range of applications, this review analyzes emerging trends, identifies technological approaches, and summarizes best practices that contribute to effective human-AI interaction and domain-specific innovation.
Building upon these prior reviews, this paper seeks to address the following research questions: RQ1: In which sectors have generative AI chatbots been implemented, and what are their primary use cases in each domain? RQ2: What types of generative AI models are used in chatbot applications across different domains? RQ3: What are the reported limitations and challenges in applying generative AI chatbots? RQ4: What directions does current research suggest for future development and deployment of generative AI chatbots?
The rest of the paper is organized as follows: Section 2 presents the research methodology, including the PRISMA-based search strategy and inclusion criteria. Section 3 provides a digest of the review results, while Section 4 is a discussion based on the reflections gleaned from the review. The final section concludes the paper.

2. Research Methodology

Our study followed the PRISMA 2020 [11] guidelines, which were designed to improve the transparency and quality of reporting in systematic reviews. A comprehensive literature search was conducted across multiple electronic databases, including IEEE Xplore, ScienceDirect, and Web of Science. Our search was performed using a combination of keywords and Boolean operators: (“generative AI” OR “large language model” OR “LLM”) AND (“chatbot” OR “conversational agent” OR “CA” OR “intelligence chatbot” OR “AI chatbot”). The search was limited to publications from January 2020 to May 2025 to capture recent advancements in the field.
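For readers who wish to reproduce or adapt the search, the short Python sketch below simply assembles the Boolean string reported above from its two keyword groups. It is an illustrative convenience, not part of the original review protocol, and database-specific syntax (field tags, wildcards) would still need to be adjusted manually for each source.

```python
# Illustrative sketch: assemble the Boolean search string from the two keyword groups.
model_terms = ["generative AI", "large language model", "LLM"]
chatbot_terms = ["chatbot", "conversational agent", "CA",
                 "intelligence chatbot", "AI chatbot"]

def or_group(terms):
    """Join quoted terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

query = f"{or_group(model_terms)} AND {or_group(chatbot_terms)}"
print(query)
# ("generative AI" OR "large language model" OR "LLM") AND ("chatbot" OR
# "conversational agent" OR "CA" OR "intelligence chatbot" OR "AI chatbot")
```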

2.1. Study Selection Criteria

The selection of studies for this systematic review was guided by explicit inclusion and exclusion criteria to ensure methodological rigor and relevance. Studies were eligible for inclusion if they focused on the development, application, or evaluation of generative AI chatbots leveraging LLMs; addressed applications in one or more target domains (such as healthcare, education, or customer service); were published in peer-reviewed journals, conferences, or reputable technical reports; were written in English; featured contributions from LLMs; and involved text-based chatbot systems.
Conversely, studies were excluded if they were purely theoretical without implementation or evaluation, consisted mainly of surveys or meta-analyses of chatbot impact, focused exclusively on non-generative chatbot architectures (e.g., rule-based or retrieval-based chatbots), were not peer-reviewed (such as blogs or advertisements), or did not meet the above domain and methodological requirements.
The initial database search yielded a total of 2250 articles. Relevance screening was carried out in multiple stages: first, titles were reviewed to remove clearly irrelevant records; next, abstracts were assessed against the inclusion and exclusion criteria; and finally, full-text articles were examined in detail to determine eligibility for the final synthesis. Through this multi-stage process, documented in the accompanying PRISMA flow diagram (Figure 1), a total of 39 primary studies were selected for inclusion in the review.

2.2. Reviewing Process

Each study included in the review was read in full to ensure a thorough understanding of its aims, approach, and contributions. Data were systematically extracted from these selected studies using a standardized data extraction form, and all collected information was compiled into a structured Excel document for consistency and easy reference. For every study, the review team gathered key bibliographic details such as the title and year of publication, as well as specifics regarding the study’s objectives and domain focus. Information on the datasets used was recorded to enable comparison of data sources and research contexts. In addition, details about the chatbot’s primary functionality and its mode of user interaction were noted, along with the particular type of large language model (LLM) utilized in the implementation and how the chatbot was technically realized.
Quantitative and qualitative evaluation outcomes were carefully documented, capturing not only the metrics and results reported by each study but also any stated limitations and proposed areas for future work. This comprehensive process ensured that each study was evaluated on a common set of criteria, facilitating meaningful synthesis across the literature. Once all relevant data were extracted, they were analyzed and synthesized to address the review’s overarching research questions. By drawing together findings from diverse studies using this consistent process, the review was able to identify trends, capture sectoral insights, and highlight prevailing challenges and gaps in the deployment of generative AI chatbots across various domains.
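As a concrete illustration of the extraction form described above, the sketch below models one record with the fields listed in this subsection. The field names are our own shorthand rather than the review team's actual spreadsheet schema, and the example values are drawn from the VaxBot-HPV study [12] as reported later in Tables 1 and 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ExtractionRecord:
    """One row of the standardized extraction form (illustrative field names)."""
    title: str
    year: int
    objective: str
    domain: str                        # e.g., "Healthcare/Medical"
    dataset: Optional[str]             # data sources used, if reported
    chatbot_function: str              # primary functionality
    interaction_mode: str              # e.g., "text"
    llm_model: str                     # LLM used in the implementation
    implementation: str                # how the chatbot was technically realized
    evaluation_results: Optional[str]  # reported metrics and outcomes
    limitations: List[str] = field(default_factory=list)
    future_work: List[str] = field(default_factory=list)

# Example record populated with values reported for VaxBot-HPV [12]:
record = ExtractionRecord(
    title="VaxBot-HPV", year=2025,
    objective="Answer HPV vaccine-related questions",
    domain="Healthcare/Medical",
    dataset="451 docs; 202 + 39 Q-A pairs",
    chatbot_function="Vaccine awareness", interaction_mode="text",
    llm_model="GPT-3.5 / GPT-4",
    implementation="Retrieval over a curated HPV corpus",
    evaluation_results="0.80 (GPT-3.5) and 0.83 (GPT-4) answer relevancy",
)
```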

3. Results and Findings

This section presents the findings of the reviewed studies, organized according to the four guiding research questions. It explores the sectors and the primary use cases in each domain, the types of generative AI models used in chatbot applications, the reported challenges and limitations, and the directions suggested for future development and deployment.

3.1. In Which Sectors Have Generative AI Chatbots Been Implemented, and What Are Their Primary Use Cases in Each Domain?

The studies can be categorized according to their primary focus sectors, such as healthcare, education, cybersecurity, media and journalism, and other miscellaneous sectors. Figure 2 illustrates the sector-wise distribution of generative AI chatbot applications based on the 39 articles reviewed.
The healthcare/medical sector emerges as the dominant area of application, accounting for 22 (approximately 56.4%) of the reviewed articles. This highlights a significant focus on the use of generative AI chatbots for various medical and health-related purposes. Following healthcare, the education sector shows the second-highest number of applications with five articles, indicating a growing interest in integrating these technologies into learning environments. The cybersecurity and media and journalism sectors also demonstrate notable adoption, with four and three articles, respectively. Industry and manufacturing, e-commerce, customer service, technical support service, and enterprise and management each feature in only one article, representing smaller but emerging areas of application. This distribution underscores the diverse, yet concentrated, areas where generative AI chatbots are currently being explored and implemented.
Table 1 provides a detailed overview of the primary use cases for generative AI chatbots across the identified sectors, including the corresponding sample sizes reported in each study. The distribution of studies across these categories reveals an uneven research landscape, where certain application areas have attracted substantial scholarly attention, while others remain in their early stages of exploration. This pattern underscores the evolving nature of the field and highlights emerging domains that warrant further empirical investigation.
In the healthcare and medical sector, generative AI chatbots are being implemented for a wide array of applications, reflecting the sector’s complexity and the potential for AI to enhance patient care and operational efficiency. Key use cases include providing vaccine awareness [12], bioinformatics support [13], and aiding in neurology diagnosis [14]. They are also utilized for personalized risk assessment, such as for COVID-19 [15], cancer proteomics analysis [16], and early dementia detection [17]. Chatbots serve as symptom checkers and medical triage support [18], offer explainable diabetes risk prediction [19], and facilitate general medical consultations [20].
Furthermore, they support specialized areas like Ayurvedic consultation [21], act as AI therapy assistants [22], and provide support for autism [23]. Other critical applications involve chronic disease auxiliary diagnosis [24], smoking cessation support [25], dental support [26], and pilgrim health support [27]. They also contribute to clinical case analysis [28], physical and mental health diagnosis support [29], nutrition guidance [30], mental health support [31], sexual harassment victim support [32], and psychotherapy support [33]. These applications demonstrate the potential of generative AI chatbots to assist in diagnosis, provide personalized health information, offer therapeutic support, and streamline various medical processes.
The education sector is leveraging generative AI chatbots to enhance learning experiences and administrative tasks. Primary use cases identified are: Programming Q&A Support [34], Educational Assistant [10,35], and University Information Assistant [36,37]. These applications aim to provide immediate support to students, automate routine queries, and facilitate a more interactive learning environment.
In the critical domain of cybersecurity, generative AI chatbots are being developed to address emerging threats and ensure the secure deployment of AI systems. Specific use cases include LLM Safety/Security [38,39], Cyber Incident Response Chatbot [40], and Security Log Summarization Chatbot [41]. These applications are crucial for identifying vulnerabilities, mitigating risks, and enhancing the overall security posture of AI-driven systems. The media and journalism sector is exploring generative AI chatbots for content creation, analysis, and dissemination. The primary use cases identified are in journalism, media, and news analysis [42,43,44]. This indicates the potential for AI to assist journalists in research, content generation, and understanding media trends.
In the technical support service sector, generative AI chatbots are increasingly deployed as IT Helpdesk Assistants [45], streamlining the resolution of common technical issues and providing immediate support to users. Also, generative AI chatbots are being widely adopted in customer service to improve efficiency and customer satisfaction. The main use case is Automotive Manual Query Assistant [46]. These chatbots can handle routine inquiries, provide instant solutions, and free up human agents for more complex issues. In the industry and manufacturing sector, generative AI chatbots are being utilized to optimize operations and provide on-demand support. The identified use case is Factory Troubleshooting Support [47]. This application helps in quickly diagnosing and resolving issues on the factory floor, minimizing downtime and improving productivity.
The e-commerce sector is leveraging generative AI chatbots to enhance the online shopping experience. The primary use case is Online Shopping Customer Support [48]. These chatbots can assist customers with product inquiries, order tracking, and personalized recommendations, leading to improved sales and customer loyalty. In the enterprise and management domain, generative AI chatbots are being applied to optimize business processes and decision-making. The identified use case is Business Cost Optimization [49]. This application demonstrates the potential for AI to analyze financial data, identify cost-saving opportunities, and support strategic planning.

3.2. What Types of Generative AI Models Are Used in Chatbot Applications Across Different Domains?

The review of 39 primary studies highlights a wide range of generative AI models employed in chatbot applications across diverse domains. The models include both proprietary and open-source LLMs, with varying scales, architectures, and customization strategies. These LLMs have been used in sectors such as healthcare, education, cybersecurity, customer service, and nutrition, among others. In several cases, studies leveraged general-purpose pre-trained models, while others adopted smaller or instruction-tuned variants, depending on the domain requirements and computational constraints. A number of implementations involved hybrid systems that combined LLMs with retrieval-augmented generation (RAG), which grounds responses in fact by retrieving relevant documents from external knowledge sources and using them as context for generation [50], as well as domain-specific databases or external reasoning modules.
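To make the retrieve-then-generate pattern concrete, the minimal sketch below shows the core RAG loop. The keyword-overlap retriever and the placeholder generate() function are deliberate simplifications standing in for the embedding-based retrievers and LLM APIs used in the reviewed systems, and the toy knowledge base is invented purely for illustration.

```python
# Minimal, self-contained sketch of retrieval-augmented generation (RAG).
# A deployed chatbot would use embedding-based retrieval and a real LLM call;
# a keyword-overlap scorer and a stub generator keep this example runnable.

KNOWLEDGE_BASE = [
    "The HPV vaccine is recommended for adolescents aged 11-12.",
    "Retrieval-augmented generation grounds answers in retrieved documents.",
    "Triage chatbots classify reported symptoms by urgency level.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (toy retriever)."""
    q_words = set(query.lower().split())
    return sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., GPT or LLaMA in the reviewed studies)."""
    return f"[LLM answer conditioned on a {len(prompt)}-character prompt]"

def rag_answer(question: str) -> str:
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    prompt = (f"Answer using only the context below.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}")
    return generate(prompt)

print(rag_answer("Who should receive the HPV vaccine?"))
```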
Table 2 summarizes the distribution of LLM usage and the corresponding performance scores across domains. The analysis reveals that GPT-3.5 and GPT-3.5 Turbo [9] represent the most frequently utilized models across the reviewed studies, with GPT-3.5 adopted in six studies and GPT-3.5 Turbo in five. Following these, GPT-4 is utilized in four studies. Among open-source alternatives, LLaMA 3 [51] stands out with usage in three studies. Other models, such as LLaMA2-7b [51], GPT [9], Mistral 7B Instruct [52], Mistral 7B Instruct v0.2 [53], Microsoft Copilot [54], Google Gemini [55], and GPT-4o [9], are employed in two studies across domains like education, healthcare, and enterprise services. In contrast, models like Llama-3-3B-Instruct [51], Llama 3.1 [51], and Llama-2-13B [51] appear only once, often in exploratory or domain-specific research.

3.3. What Are the Reported Limitations and Challenges in Applying Generative AI Chatbots?

While generative AI chatbots have demonstrated significant promise across various domains, the literature consistently highlights key limitations that hinder their effectiveness and widespread deployment. One of the most critical challenges is the prevalence of hallucinations, where the models generate factually incorrect or misleading responses that still seem contextually fluent. This issue becomes especially problematic in high-stakes settings, where accuracy and trustworthiness are paramount [13,15]. In addition to hallucinations, generative chatbots often produce outputs that are unreliable or factually inaccurate, further eroding their credibility and limiting their practical utility.
Another significant hurdle lies in the adaptability of these systems, which are often built on static, narrowly scoped datasets. This reliance on fixed knowledge bases severely limits their ability to respond to real-time information or adjust to evolving contexts. For instance, systems like VaxBot, developed for specific domains such as healthcare or education, struggle with maintaining up-to-date relevance and are hindered when applied to new, dynamic situations [12,42,43]. Such limitations make it difficult for the models to generalize beyond their original scope, preventing them from transferring effectively to new domains or responding to emerging trends.
Scalability also remains a major concern, as many chatbot evaluations are conducted using small-scale or synthetic datasets that fail to capture the complexity of real-world environments [14,19,36,46]. This lack of robust testing restricts the ability of these systems to perform reliably when scaled up or deployed in broader, more varied contexts. Furthermore, many chatbots operate in static, pre-scripted environments, making it challenging for them to adapt to real-time user interactions or integrate live data streams [19,48,49]. Without the ability to dynamically respond to changes in user context, the chatbot’s relevance and responsiveness are significantly diminished.
Taken together, these challenges underscore the need for continued research to reduce factual errors, develop more adaptable and scalable architectures, and enhance the contextual awareness of generative AI chatbots. Addressing these issues is critical to ensuring that these systems can support trustworthy, real-world applications across a wide range of domains.

3.4. What Directions Does Current Research Suggest for Future Development and Deployment of Generative AI Chatbots?

Current research outlines several important directions for improving the development and deployment of generative AI chatbots. A key area of focus is making these systems more personalized and adaptable, allowing them to respond effectively to individual users and specific contexts [31,35,39,49]. For example, VaxBot-HPV [12] and DrBioRight 2.0 [16] demonstrate personalization through domain-specific fine-tuning and adaptive prompting that tailor responses to user profiles in healthcare settings. Similarly, Edubot [35] and the LLaMA-3 Student Assistant [37] illustrate educational personalization by combining retrieval-augmented generation (RAG) with learning context memory to provide individualized academic support.
To enhance robustness, recent implementations such as SafetyRAG [38] and IntellBot [39] apply RAG frameworks that integrate verified external knowledge bases, reducing hallucination and improving factual consistency in cybersecurity and enterprise applications. These systems exemplify how domain-grounded retrieval layers can make chatbot outputs more trustworthy in real deployments.
There is also a strong focus on enabling more natural and coherent interactions through multi-turn conversational capabilities that maintain context across extended dialogue [31,45]. Expanding interaction modes beyond text—including voice, multilingual, and multimodal inputs—is seen as essential for broader accessibility and engagement [10,17,20,32]. Moreover, grounding chatbot behavior in instructional, cognitive, or therapeutic frameworks ensures that generated responses align with meaningful human outcomes, as observed in mental health assistants like TheraBot [33] and cognitive training bots for autism assessment [23]. Finally, comprehensive evaluation through standardized metrics and long-term studies is needed to assess system reliability, adaptability, and trust across domains and deployment contexts [18,25,42].

4. Discussion

In this research, the analysis was based on four key dimensions: application domains, model integration patterns, design limitations, and future directions. By synthesizing these findings, this section highlights the major challenges facing generative AI chatbot systems and maps out targeted strategies for addressing them. Figure 3 summarizes the key challenge–solution pairs identified across the reviewed literature.

4.1. Challenges and Sectoral Insights

Generative AI chatbots have gained significant traction in healthcare and education. In healthcare, chatbots support mental health interventions, clinical triage, and patient education, aligning with global efforts toward digital health augmentation. In education, they offer personalized tutoring, adaptive feedback, and engagement support, particularly in STEM fields.
Despite these successes, several sector-specific barriers persist. For instance, content hallucinations and inadequate domain adaptation compromise the reliability of chatbot outputs, especially in high-stakes contexts. As depicted in Figure 3, addressing these issues requires integrating domain-specific knowledge bases and applying fine-tuned training strategies tailored to each sector’s constraints.

4.2. Model Usage and Integration Patterns

Model selection across studies was pragmatic and context-dependent. Although GPT-3.5 emerged as the most commonly implemented model, it is not necessarily the most current; alternative LLMs, both proprietary and open source, were employed based on factors such as cost, licensing, and fit for purpose. A notable pattern was the growing reliance on RAG architectures, which combine LLMs with external data sources to enhance factual accuracy. This approach directly addresses the challenge of hallucination, as noted in Figure 3, by grounding chatbot outputs in verifiable information.

4.3. Design Limitations and Evaluation Gaps

Another challenge involves the static nature of many systems, which are disconnected from real-time data or user context. This reduces adaptability and long-term relevance. Additionally, limited attention to personalization and interface design restricts accessibility and user engagement.
A significant gap is the absence of longitudinal evaluation. Most studies rely on short-term metrics, providing little insight into sustained performance or user trust over time. As shown in Figure 3, advancing long-term evaluation frameworks is essential to better understand real-world deployment dynamics and system robustness.

4.4. Future Directions: Toward Human-Centered, Adaptive Systems

The field is moving toward human-centric, context-aware generative chatbot systems. Figure 3 outlines five key challenge areas and the corresponding future directions that can guide development and evaluation.
  • Enhancing domain-specific accuracy and reducing hallucination: through the adoption of RAG frameworks, real-time data integration, and domain-specific fine-tuning.
  • Bridging domain adaptation gaps: developing training strategies that incorporate sector-specific datasets, terminology, and task objectives.
  • Improving evaluation methodology: implementing long-term, user-centered studies to assess chatbot reliability, adaptability, and trust over time.
  • Advancing personalization and usability: building adaptive, multimodal interfaces that can respond to diverse user needs and interaction preferences.
  • Strengthening privacy, security, and ethical safeguards: including encryption, explainability, liveness detection, and user education mechanisms to ensure responsible deployment.
These directions represent a strategic roadmap for developing the next generation of generative AI chatbot systems that are not only capable and intelligent but also transparent and sustainable across real-world contexts.

4.5. Ethical Considerations

Ethical considerations in the reviewed studies, such as privacy, safety, and fairness, are often secondary to technical performance. Few papers implement safety measures; for instance, SafetyRAG integrates safety-aware retrieval to mitigate harmful outputs [38]. Some healthcare studies recognize ethical guardrails, with VaxBot-HPV focusing on trustworthy vaccine information [12] and DrBioRight 2.0 promoting transparency [16]. However, many studies, particularly in news and clinical decision-making support, neglect privacy and bias controls [42,43].
Privacy: Most studies either avoid processing personally identifiable information (PII) or lack data protection measures. High-risk sectors like healthcare demonstrate minimal documentation on data minimization or encryption practices. Some designs prioritize content accuracy over legal compliance [12,38].
Bias and Equity: Few studies perform bias audits or subgroup analyses. Clinical systems often report aggregate accuracy without demographic stratification, leaving equity implications under-explored [14,28]. Safety-oriented retrieval does not replace bias evaluation [38].
Regulatory Compliance: Documentation for regulatory alignment, such as HIPAA or GDPR, is largely absent. Authors rarely provide compliance artifacts, and healthcare studies often omit critical oversight details [14,28]. Overall, ethical considerations are sporadic, with robust privacy and bias governance remaining rare.

5. Conclusions

This paper presents an SLR of 39 primary studies published between 2020 and 2025, focusing on generative AI chatbots across diverse domains. The review revealed that healthcare remains the most prominent domain for generative AI chatbot deployment. Education, cybersecurity, media, and industry followed as emerging fields, each demonstrating unique adoption patterns and use cases. Additionally, the review highlighted several challenges and limitations in real-world implementations. The review found that generative AI chatbots powered by LLMs, of which GPT-3.5 variants were the most widely adopted, have contributed to improved user satisfaction and interaction quality. However, gaps in standardization, evaluation methods, and domain-specific effectiveness continue to limit the reliability and generalizability of current systems. Compared to previous systematic reviews, this paper offers a more comprehensive cross-domain synthesis, expanding the analysis beyond single-sector perspectives and highlighting new insights on explainability, scalability, and external validation that enhance the overall generalizability and contribution of generative AI chatbot research.

Author Contributions

L.A., F.A. (Fay Aljomah), M.A., F.A. (Fawzia Alanazi), F.T. and S.A. conceived the project, performed the review, analyzed and interpreted the data, and drafted the manuscript; A.A.-N. validated, supervised, reviewed, and edited the manuscript and contributed to the discussion. All authors have read and agreed to the published version of the manuscript.

Funding

This scientific paper is derived from a research grant funded by the Research, Development, and Innovation Authority (RDIA)—Kingdom of Saudi Arabia—with grant number (13461-imamu-2023-IMIU-R-3-1-HW-).

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
LLMs: Large Language Models
RAG: Retrieval-Augmented Generation
STEM: Science, Technology, Engineering, and Mathematics
SLR: Systematic Literature Review

References

  1. Samonte, M.J.; Arlando, R.D.C.; Joquiño, N.A.P.; Manongas, J.B.; Poblete, J.O. Applications of Artificial Intelligence in Conversational Agents: A Systematic Literature Review of AI in Chatbots. In Proceedings of the 2024 IEEE 4th International Conference on Software Engineering and Artificial Intelligence (SEAI), Xiamen, China, 21–23 June 2024; pp. 289–295. [Google Scholar] [CrossRef]
  2. Tanvir, S.H.; Kim, G.J. WIP: Generative and Custom Chatbots in Computer Programming Education and Their Effectiveness A Systematic Literature Review. In Proceedings of the 2024 IEEE Frontiers in Education Conference (FIE), Washington, DC, USA, 13–16 October 2024; pp. 1–5. [Google Scholar] [CrossRef]
  3. Arabelli, R.; Gupta, S.; Prakash, N.; Ali, Z. Natural Language Generation in AI: Developing Human-like Text Through Deep Learning. In Proceedings of the 2025 First International Conference on Advances in Computer Science, Electrical, Electronics, and Communication Technologies (CE2CT), Bhimtal, Nainital, India, 21–22 February 2025; pp. 1411–1415. [Google Scholar] [CrossRef]
  4. Talyshinskii, A.; Naik, N.; Hameed, B.M.Z.; Juliebø-Jones, P.; Somani, B.K. Potential of AI-Driven Chatbots in Urology: Revolutionizing Patient Care Through Artificial Intelligence. Curr. Urol. Rep. 2024, 25, 9–18. [Google Scholar] [CrossRef]
  5. Lin, C.C.; Huang, A.Y.Q.; Yang, S.J.H. A Review of AI-Driven Conversational Chatbots Implementation Methodologies and Challenges (1999–2022). Sustainability 2023, 15, 4012. [Google Scholar] [CrossRef]
  6. Elhiny, L.; Ye, X.; Speidel, U.; Manoharan, S. A Systematic Review of Recent Research on Chatbots. In Proceedings of the 2023 International Conference on Electrical, Communication and Computer Engineering (ICECCE), Dubai, United Arab Emirates, 30–31 December 2023; pp. 1–9. [Google Scholar] [CrossRef]
  7. Miklosik, A.; Evans, N.; Qureshi, A.M.A. The Use of Chatbots in Digital Business Transformation: A Systematic Literature Review. IEEE Access 2021, 9, 106530–106539. [Google Scholar] [CrossRef]
  8. Nicolescu, L.; Tudorache, M.T. Human-Computer Interaction in Customer Service: The Experience with AI Chatbots—A Systematic Literature Review. Electronics 2022, 11, 1579. [Google Scholar] [CrossRef]
  9. OpenAI. GPT-3.5. Web Page, OpenAI. 2023. Available online: https://openai.com (accessed on 1 August 2023).
  10. Khan, U.H.; Khan, M.H.; Ali, R. Large Language Model based Educational Virtual Assistant using RAG Framework. Procedia Comput. Sci. 2025, 252, 905–911. [Google Scholar] [CrossRef]
  11. Liberati, A.; Altman, D.G.; Tetzlaff, J.; Mulrow, C.; Gøtzsche, P.C.; Ioannidis, J.P.A.; Clarke, M.; Devereaux, P.J.; Kleijnen, J.; Moher, D. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. J. Clin. Epidemiol. 2009, 62, e1–e34. [Google Scholar] [CrossRef] [PubMed]
  12. Li, Y.; Li, J.; Li, M.; Yu, E.; Rhee, D.; Amith, M.; Tang, L.; Savas, L.S.; Cui, L.; Tao, C. VaxBot-HPV: A GPT-based chatbot for answering HPV vaccine-related questions. JAMIA Open 2025, 8, ooaf005. [Google Scholar] [CrossRef] [PubMed]
  13. Cinquin, O. Steering veridical large language model analyses by correcting and enriching generated database queries: First steps toward ChatGPT bioinformatics. Briefings Bioinform. 2025, 26, bbaf045. [Google Scholar] [CrossRef]
  14. Barrit, S.; Torcida, N.; Mazeraud, A.; Boulogne, S.; Benoit, J.; Carette, T.; Carron, T.; Delsaut, B.; Diab, E.; Kermorvant, H.; et al. Specialized Large Language Model Outperforms Neurologists at Complex Diagnosis in Blinded Case-Based Evaluation. Brain Sci. 2025, 15, 347. [Google Scholar] [CrossRef]
  15. Roshani, M.A.; Zhou, X.; Qiang, Y.; Suresh, S.; Hicks, S.; Sethuraman, U.; Zhu, D. Generative Large Language Model—Powered Conversational AI App for Personalized Risk Assessment: Case Study in COVID-19. JMIR AI 2025, 4, e67363. [Google Scholar] [CrossRef]
  16. Liu, W.; Li, J.; Tang, Y.; Zhao, Y.; Liu, C.; Song, M.; Ju, Z.; Kumar, S.V.; Lu, Y.; Akbani, R.; et al. DrBioRight 2.0: An LLM-powered bioinformatics chatbot for large-scale cancer functional proteomics analysis. Nat. Commun. 2025, 16, 2256. [Google Scholar] [CrossRef]
  17. Routray, S.; Samir, D.; Pushya, G.; Srivastava, J. Early Detection of Dementia Using a Large Language Model-Powered Chatbot. In Proceedings of the 2024 Eighth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, Solan, India, 18–20 December 2024; pp. 719–724. [Google Scholar] [CrossRef]
  18. Fraser, H.; Crossland, D.; Bacher, I.; Ranney, M.; Madsen, T.; Hilliard, R. Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study. JMIR mHealth uHealth 2023, 11, e49995. [Google Scholar] [CrossRef] [PubMed]
  19. Elfayoumi, M.; AbouElazm, M.; Mohamed, O.; Abuhmed, T.; El-Sappagh, S. Knowledge Augmented Significant Language Model-Based Chatbot for Explainable Diabetes Mellitus Prediction. In Proceedings of the 2025 19th International Conference on Ubiquitous Information Management and Communication (IMCOM), Bangkok, Thailand, 3–5 January 2025; pp. 1–8. [Google Scholar] [CrossRef]
  20. Amrutkar, G.; Awari, O.; Chikmurge, D.; Kharat, S. Medical Chatbot using Gamma LLMV2 and Comparison Using BERT Models. In Proceedings of the 2025 3rd International Conference on Intelligent Systems, Advanced Computing and Communication (ISACC), Silchar, India, 27–28 February 2025; pp. 1055–1060. [Google Scholar] [CrossRef]
  21. Patil, R.; Yeolekar, S.; Khade, O.; Kadam, Y.; Ingole, P.; Patil, S. AyUR-bot: A Revolutionary Ayurvedic Chatbot Empowered by Generative AI. In Proceedings of the 2024 8th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 23–24 August 2024; pp. 1–6. [Google Scholar] [CrossRef]
  22. Gupta, D.; Swami, V.; Shukla, D.; Nimala, K. Design and Implementation of an AI-Driven Mental Health Chatbot: A Generative AI Model. In Proceedings of the 2024 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 12–13 December 2024; pp. 1–7. [Google Scholar] [CrossRef]
  23. Fallah, A.; Keramati, A.; Nazari, M.A.; Mirfazeli, F.S. Automating Theory of Mind Assessment with a LLaMA-3-Powered Chatbot: Enhancing Faux Pas Detection in Autism. In Proceedings of the 2024 14th International Conference on Computer and Knowledge Engineering (ICCKE), Mashhad, Iran, 19–20 November 2024; pp. 365–372. [Google Scholar] [CrossRef]
  24. Zhang, S.; Song, J. A chatbot based question and answer system for the auxiliary diagnosis of chronic diseases based on large language model. Sci. Rep. 2024, 14, 17118. [Google Scholar] [CrossRef] [PubMed]
  25. Bricker, J.B.; Sullivan, B.; Mull, K.; Santiago-Torres, M.; Ferres, J.M.L. Conversational Chatbot for Cigarette Smoking Cessation: Results From the 11-Step User-Centered Design Development Process and Randomized Controlled Trial. JMIR mHealth uHealth 2024, 12, e57318. [Google Scholar] [CrossRef] [PubMed]
  26. Vidivelli, S.; Ramachandran, M.; Dharunbalaji, A. Efficiency-Driven Custom Chatbot Development: Unleashing LangChain, RAG, and Performance-Optimized LLM Fusion. Comput. Mater. Contin. 2024, 80, 2423–2442. [Google Scholar] [CrossRef]
  27. Alghamdi, H.M.; Mostafa, A. Towards Reliable Healthcare LLM Agents: A Case Study for Pilgrims during Hajj. Information 2024, 15, 371. [Google Scholar] [CrossRef]
  28. Liu, W.; Kan, H.; Jiang, Y.; Geng, Y.; Nie, Y.; Yang, M. MED-ChatGPT CoPilot: A ChatGPT medical assistant for case mining and adjunctive therapy. Front. Med. 2024, 11, 1460553. [Google Scholar] [CrossRef]
  29. Sree, Y.B.; Sathvik, A.; Hema Akshit, D.S.; Kumar, O.; Pranav Rao, B.S. Retrieval-Augmented Generation Based Large Language Model Chatbot for Improving Diagnosis for Physical and Mental Health. In Proceedings of the 2024 6th International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), Pattaya, Thailand, 23–23 November 2024; pp. 1–8. [Google Scholar] [CrossRef]
  30. Peñas, R.T.L.; Cajote, R.D. Philippine Region-Based Food and Nutrition Cross-Reference Using Fine-Tuned Generative ChatBot. In Proceedings of the 2024 2nd World Conference on Communication & Computing (WCONF), Raipur, India, 12–14 July 2024; pp. 1–5. [Google Scholar] [CrossRef]
  31. Siddique, S.; Alsayoud, F. Multi-Tiered RAG-Based Chatbot for Mental Health Support. In Proceedings of the 2025 Eighth International Women in Data Science Conference at Prince Sultan University (WiDS PSU), Riyadh, Saudi Arabia, 13–14 April 2025; pp. 1–6. [Google Scholar] [CrossRef]
  32. Vakayil, S.; Juliet, D.S.; J, A.; Vakayil, S. RAG-Based LLM Chatbot Using Llama-2. In Proceedings of the 2024 7th International Conference on Devices, Circuits and Systems (ICDCS), Coimbatore, India, 23–24 April 2024; pp. 1–5. [Google Scholar] [CrossRef]
  33. Kang, C.; Novak, D.; Urbanova, K.; Cheng, Y.; Hu, Y. Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Seoul, Republic of Korea, 14–19 April 2024; pp. 351–355. [Google Scholar] [CrossRef]
  34. Vasaniya, R.; Visodiya, M.; Patel, A.K. TechAssist: A RAG-Based Chatbot for Accessing Technical Information from StackOverflow. In Proceedings of the 2025 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 18–19 January 2025; pp. 1–6. [Google Scholar] [CrossRef]
  35. Pandey, T.; Yadav, A.; Sharma, J.; Singhal, S. Design based Retrieval Augmented Generation Oriented Education Chatbot: Edubot. In Proceedings of the 2024 2nd International Conference on Advances in Computation, Communication and Information Technology (ICAICCIT), Faridabad, India, 28–29 November 2024; Volume 1, pp. 1278–1283. [Google Scholar] [CrossRef]
  36. Neupane, S.; Hossain, E.; Keith, J.; Tripathi, H.; Ghiasi, F.; Golilarz, N.A.; Amirlatifi, A.; Mittal, S.; Rahimi, S. From Questions to Insightful Answers: Building an Informed Chatbot for University Resources. In Proceedings of the 2024 IEEE Frontiers in Education Conference (FIE), Washington, DC, USA, 13–16 October 2024; pp. 1–9. [Google Scholar] [CrossRef]
  37. Comia, L.V. Chatbot for Student Discipline Handbook-Related Queries: A RAG-Based LLM Using Llama-3 Approach. In Proceedings of the 2025 11th International Conference on Web Research (ICWR), Tehran, Iran, 16–17 April 2025; pp. 306–312. [Google Scholar] [CrossRef]
  38. Omri, S.; Abdelkader, M.; Hamdi, M. SafetyRAG: Towards Safe Large Language Model-Based Application through Retrieval-Augmented Generation. J. Adv. Inf. Technol. 2025, 16, 243–250. [Google Scholar] [CrossRef]
  39. Arikkat, D.R.; Abhinav, M.; Binu, N.; Parvathi, M.; Biju, N.; Arunima, K.S.; Vinod, P.; KA, R.R.; Conti, M. IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery. In Proceedings of the 2024 IEEE 16th International Conference on Computational Intelligence and Communication Networks (CICN), Indore, India, 22–23 December 2024; pp. 644–651. [Google Scholar] [CrossRef]
  40. Rosari, H.S.; Girinoto; Novita Yasa, R.; Qomariasih, N.; Anindya Wijayanti, R. SiJesi: Large Language Model Chatbot with Augmented Retrieval Approach Generation and Prompt Engineering. In Proceedings of the 2024 International Conference on Computer Engineering, Network, and Intelligent Multimedia (CENIM), Surabaya, Indonesia, 19–20 November 2024; pp. 1–6. [Google Scholar] [CrossRef]
  41. Balasubramanian, P.; Seby, J.; Kostakos, P. CYGENT: A cybersecurity conversational agent with log summarization powered by GPT-3. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence for Internet of Things (AIIoT), Vellore, India, 3–4 May 2024; pp. 1–6. [Google Scholar] [CrossRef]
  42. Sufi, F. Just-in-Time News: An AI Chatbot for the Modern Information Age. AI 2025, 6, 22. [Google Scholar] [CrossRef]
  43. Sufi, F.; Alsulami, M. AI-Driven Chatbot for Real-Time News Automation. Mathematics 2025, 13, 850. [Google Scholar] [CrossRef]
  44. Baviskar, V.; Sabale, T.; Jachak, P.; Tapkir, C. News Article Analysis Chatbot using Generative AI. In Proceedings of the 2025 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), Chennai, India, 28–29 March 2025; pp. 1–6. [Google Scholar] [CrossRef]
  45. Lee, H.C.; Hung, K.; Man, G.M.T.; Ho, R.; Leung, M. Development of an RAG-Based LLM Chatbot for Enhancing Technical Support Service. In Proceedings of the TENCON 2024 - 2024 IEEE Region 10 Conference (TENCON), Singapore, 1–4 December 2024; pp. 1080–1083. [Google Scholar] [CrossRef]
  46. Medeiros, T.; Medeiros, M.; Azevedo, M.; Silva, M.; Silva, I.; Costa, D.G. Analysis of Language-Model-Powered Chatbots for Query Resolution in PDF-Based Automotive Manuals. Vehicles 2023, 5, 1384–1399. [Google Scholar] [CrossRef]
  47. Kiangala, K.S.; Wang, Z. An experimental hybrid customized AI and generative AI chatbot human machine interface to improve a factory troubleshooting downtime in the context of Industry 5.0. Int. J. Adv. Manuf. Technol. 2024, 132, 2715–2733. [Google Scholar] [CrossRef]
  48. Patel, V.; Tejani, P.; Parekh, J.; Huang, K.; Tan, X. Developing A Chatbot: A Hybrid Approach Using Deep Learning and RAG. In Proceedings of the 2024 IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Bangkok, Thailand, 9–12 December 2024; pp. 273–280. [Google Scholar] [CrossRef]
  49. Faruqui, N.; Raju, N.V.D.S.S.V.P.; Sivakumar, S.; Patel, N.; Vengaramkode Bhaskaran, S.; Khanam, S.; Bhuiyan, T. Gen-Optimizer: A Generative AI Framework for Strategic Business Cost Optimization. Computers 2025, 14, 59. [Google Scholar] [CrossRef]
  50. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2021, arXiv:2005.11401. [Google Scholar] [CrossRef]
  51. Meta. Llama. Web Page, Meta. 2024. Available online: https://llama.meta.com (accessed on 1 August 2024).
  52. Mistral AI. Mistral 7B Instruct. Web Page, Mistral AI. 2023. Available online: https://mistral.ai (accessed on 1 August 2024).
  53. Mistral AI. Mistral-7B-Instruct-v0.2. Web Page, Mistral AI/Hugging Face. 2023. Available online: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 (accessed on 1 August 2024).
  54. Microsoft. Copilot. Web Page, Microsoft. 2023. Available online: https://copilot.microsoft.com/ (accessed on 1 August 2024).
  55. Google. Gemini. Web Page, Google. 2024. Available online: https://gemini.google.com/ (accessed on 1 August 2024).
Figure 1. PRISMA flow diagram.
Figure 2. Sector-wise distribution of generative AI chatbot applications. Source: Based on data from 39 reviewed articles.
Figure 3. Summary of key challenges and future directions in generative AI chatbot systems.
Table 1. Overview of chatbot use cases across sectors.
Sector | Use Case | Sample Size (Concise) | Articles
Healthcare/Medical | Vaccine Awareness | 451 docs; 202 + 39 Q–A pairs | [12]
Healthcare/Medical | Bioinformatics Support | Not reported | [13]
Healthcare/Medical | Neurology Diagnosis | 13 neurologists; 5 samples; 20 AI evals | [14]
Healthcare/Medical | Personalized Risk Assessment (COVID-19) | 393 participants | [15]
Healthcare/Medical | Cancer Proteomics Analysis | ≈9000 samples | [16]
Healthcare/Medical | Early Dementia Detection | 500 participants | [17]
Healthcare/Medical | Symptom Checker/Medical Triage Support | 40 participants (30 diagnosis; 37 triage) | [18]
Healthcare/Medical | Explainable Diabetes Risk Prediction | 100,000 samples | [19]
Healthcare/Medical | General Medical Consultation | Not reported | [20]
Healthcare/Medical | Ayurvedic Consultation | Not reported | [21]
Healthcare/Medical | AI Therapy Assistant | 10 evaluated samples | [22]
Healthcare/Medical | Autism Support | 1 participant + translator | [23]
Healthcare/Medical | Chronic Disease Auxiliary Diagnosis | 64 participants + 1200 samples | [24]
Healthcare/Medical | Smoking Cessation Support | 404 participants | [25]
Healthcare/Medical | Dental Support | Not reported | [26]
Healthcare/Medical | Pilgrim Health Support | 50 participants + 150 synthetic samples | [27]
Healthcare/Medical | Clinical Case Analysis | 241 samples (306 papers; 300 MedQA Qs) | [28]
Healthcare/Medical | Physical and Mental Health Diagnosis Support | Not reported | [29]
Healthcare/Medical | Nutrition Guidance | 30 web sources + 100 Q–A pairs | [30]
Healthcare/Medical | Mental Health Support | 7 evaluated samples | [31]
Healthcare/Medical | Sexual Harassment Victim Support | Not reported | [32]
Healthcare/Medical | Psychotherapy Support | 1179 transcripts | [33]
Education | Programming Q&A Support | 10M Q–A pairs | [34]
Education | Educational Assistant | 50 participants | [35]
Education | Educational Assistant | Not reported | [10]
Education | University Information Assistant | 50 participants | [36]
Education | University Information Assistant | 7 evaluated samples | [37]
Cybersecurity | LLM Safety/Security | 100 samples | [38]
Cybersecurity | LLM Safety/Security | 2447 docs + 400 eval samples | [39]
Cybersecurity | Cyber Incident Response Chatbot | 19 participants | [40]
Cybersecurity | Security Log Summarization Chatbot | 101 samples | [41]
Media and Journalism | Journalism, Media, News Analysis | 35 participants; 989k samples | [42]
Media and Journalism | Journalism, Media, News Analysis | 1,306,518 samples | [43]
Media and Journalism | Journalism, Media, News Analysis | Not reported | [44]
Technical Support Service | IT Helpdesk Assistant | 75 samples | [45]
Customer Service | Automotive Manual Query Assistant | 1 sample (4.8 MB) | [46]
Industry and Manufacturing | Factory Troubleshooting Support | 15,000 samples | [47]
E-commerce | Online Shopping Customer Support | 10,000 samples | [48]
Enterprise and Management | Business Cost Optimization | 25,000 Q–A pairs | [49]
Table 2. LLM usage across domains.
Domain | LLM Model | Performance Score | Articles
Healthcare/Medical | ChatGPT 3.5 | 59% Triage Agreement | [18]
Healthcare/Medical | ChatGPT 4.0 | 76% Triage Agreement | [18]
Healthcare/Medical | Claude + GPT | 87% ACC | [17]
Healthcare/Medical | Flan-T5-xl-T | 0.69 Zero-Shot AUC and 0.70 32-Shot AUC | [15]
Healthcare/Medical | GPT | - | [29]
Healthcare/Medical | GPT-2 | 0.97 ACC | [24]
Healthcare/Medical | GPT-3.5 (base) | 0.80 Answer Relevancy | [12]
Healthcare/Medical | GPT-3.5 (fine-tuned) | - | [25]
Healthcare/Medical | GPT-3.5 (Prompt) | 0.76 ACC | [19]
Healthcare/Medical | GPT-4 | 0.83 Answer Relevancy / 58.1% Success Rate | [12,16]
Healthcare/Medical | GPT-4 (GPT-4-1106-preview) | 77.33% ACC | [28]
Healthcare/Medical | GPT-4 Turbo | 86.17% ACC | [14]
Healthcare/Medical | GPT-4o | - | [13,16]
Healthcare/Medical | Gamma LLM v2 | 95% ACC | [20]
Healthcare/Medical | Gemma2 (Prompt) | 0.83 ACC | [19]
Healthcare/Medical | Gemma2 (RAG) | 0.85 ACC | [19]
Healthcare/Medical | LLaMA | 93% ACC | [22]
Healthcare/Medical | LLaMA-3-70b-chat-hf | - | [23]
Healthcare/Medical | LLaMA2-7b | 0.59 Zero-Shot AUC and 0.67 32-Shot AUC | [15]
Healthcare/Medical | Llama 2 | 0.87 F1-Score | [21]
Healthcare/Medical | Llama 3 | ≈90% Success Rate / 87% ACC | [16,27]
Healthcare/Medical | Llama 3.1 (Baseline) | 0.77 ACC | [19]
Healthcare/Medical | Llama 3.1 (Prompt) | 0.86 ACC | [19]
Healthcare/Medical | Llama 3.1 (RAG) | 0.85 ACC | [19]
Healthcare/Medical | T0-3b-T | 0.75 Zero-Shot AUC and 0.65 32-Shot AUC | [15]
Healthcare/Medical | T0pp(8bit)-T | 0.67 Zero-Shot AUC | [15]
Healthcare/Medical | TinyLlama-1.1B-Chat-v1.0 | 91% ROUGE | [26]
Mental Health | ChatGLM2-6B | 32.8 Rouge and 56.4 Fluency | [33]
Mental Health | GPT-3.5-Turbo | 0.94 Faith | [31]
Mental Health | GPT-4 | - | [33]
Mental Health | Llama-2-7b (Meta) | 95% ACC | [32]
Mental Health | Fine-tuned LLaMA-2-7B | 22.4 Rouge and 30.3 Fluency | [33]
Food and Nutrition | Meta LLaMA 2 7B HF | 0.63 BERTScore F1 | [30]
Food and Nutrition | Meta LLaMA 3 7B Instruct | 0.69 BERTScore F1 | [30]
Food and Nutrition | Mistral 7B Instruct v0.2 | 0.64 BERTScore F1 | [30]
Education | GPT-3.5-turbo | 0.96 RAGAS mean score | [36]
Education | Llama-3-3B-Instruct | 0.92 mean similarity score | [37]
Education | Meta-LLaMA-2-7B-Chat-HF | ≈0.84 BLEU (avg) | [10]
Education | Mistral 7B Instruct | 91% ACC | [34]
Education | Mistral x86 model | ≈0.89 contextual relevance score | [35]
Cybersecurity | GPT-3 family | 97% BERTScore | [41]
Cybersecurity | GPT-3.5-turbo | 94.8% ACC / 87% ACC | [38,39]
Cybersecurity | GPT-3.5-turbo-012 | 86.47% UAT Score | [40]
Cybersecurity | Gemma-7B | 61% ACC | [38]
Cybersecurity | Llama-2-13B | 58% ACC | [38]
Cybersecurity | Mistral-7B-instruct | 56% ACC | [38]
Media and Journalism | Google Gemini + Microsoft Copilot | 97% ACC | [42]
Media and Journalism | Google Gemini | 0.94 F1 | [43]
Media and Journalism | LLaMA 3 | - | [44]
Media and Journalism | Microsoft Copilot | - | [43]
Enterprise and Management | Custom Transformer-based | 93.2% ACC | [49]
Customer Service | GPT-3 (text-davinci-003) | - | [46]
E-commerce | GPT-3.5 | 99.7% ACC | [48]
Manufacturing | GPT-3.5 | - | [47]
Technical Support Service | GPT-3.5-turbo | 0.35 ROUGE-L | [45]

