Sign in to use this feature.

Years

Between: -

Subjects

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Journals

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Article Types

Countries / Regions

remove_circle_outline
remove_circle_outline
remove_circle_outline
remove_circle_outline

Search Results (398)

Search Parameters:
Keywords = factuality

Order results
Result details
Results per page
Select all
Export citation of selected articles as:
20 pages, 1818 KiB  
Article
Image Captioning Model Based on Multi-Step Cross-Attention Cross-Modal Alignment and External Commonsense Knowledge Augmentation
by Liang Wang, Meiqing Jiao, Zhihai Li, Mengxue Zhang, Haiyan Wei, Yuru Ma, Honghui An, Jiaqi Lin and Jun Wang
Electronics 2025, 14(16), 3325; https://doi.org/10.3390/electronics14163325 - 21 Aug 2025
Viewed by 75
Abstract
To address the semantic mismatch between limited textual descriptions in image captioning training datasets and the multi-semantic nature of images, as well as the underutilized external commonsense knowledge, this article proposes a novel image captioning model based on multi-step cross-attention cross-modal alignment and [...] Read more.
To address the semantic mismatch between limited textual descriptions in image captioning training datasets and the multi-semantic nature of images, as well as the underutilized external commonsense knowledge, this article proposes a novel image captioning model based on multi-step cross-attention cross-modal alignment and external commonsense knowledge enhancement. The model employs a backbone architecture comprising CLIP’s ViT visual encoder, Faster R-CNN, BERT text encoder, and GPT-2 text decoder. It incorporates two core mechanisms: a multi-step cross-attention mechanism that iteratively aligns image and text features across multiple rounds, progressively enhancing inter-modal semantic consistency for more accurate cross-modal representation fusion. Moreover, the model employs Faster R-CNN to extract region-based object features. These features are mapped to corresponding entities within the dataset through entity probability calculation and entity linking. External commonsense knowledge associated with these entities is then retrieved from the ConceptNet knowledge graph, followed by knowledge embedding via TransE and multi-hop reasoning. Finally, the fused multimodal features are fed into the GPT-2 decoder to steer caption generation, enhancing the lexical richness, factual accuracy, and cognitive plausibility of the generated descriptions. In the experiments, the model achieves CIDEr scores of 142.6 on MSCOCO and 78.4 on Flickr30k. Ablations confirm both modules enhance caption quality. Full article
Show Figures

Figure 1

20 pages, 1466 KiB  
Article
Towards Controllable and Explainable Text Generation via Causal Intervention in LLMs
by Jie Qiu, Quanrong Fang and Wenhao Kang
Electronics 2025, 14(16), 3279; https://doi.org/10.3390/electronics14163279 - 18 Aug 2025
Viewed by 248
Abstract
Large Language Models (LLMs) excel in diverse text generation tasks but still face limited controllability, opaque decision processes, and frequent hallucinations. This paper presents a structural causal intervention framework that models input–hidden–output dependencies through a structural causal model and performs targeted interventions on [...] Read more.
Large Language Models (LLMs) excel in diverse text generation tasks but still face limited controllability, opaque decision processes, and frequent hallucinations. This paper presents a structural causal intervention framework that models input–hidden–output dependencies through a structural causal model and performs targeted interventions on hidden representations. By combining counterfactual sample construction with contrastive training, our method enables precise control of style, sentiment, and factual consistency while providing explicit causal explanations for output changes. Experiments on three representative tasks demonstrate consistent and substantial improvements: style transfer accuracy reaches 92.3% (+7–14 percentage points over strong baselines), sentiment-controlled generation achieves 90.1% accuracy (+1.3–10.9 points), and multi-attribute conflict rates drop to 3.7% (a 40–60% relative reduction). Our method also improves causal attribution scores to 0.83–0.85 and human agreement rates to 87–88%, while reducing training and inference latency by 25–30% through sparse masking that modifies ≤10% of hidden units per attribute. These results confirm that integrating structural causal intervention with counterfactual training advances controllability, interpretability, and efficiency in LLM-based generation, offering a robust foundation for deployment in reliability-critical and resource-constrained applications. Full article
Show Figures

Figure 1

22 pages, 3187 KiB  
Article
Automated Clinical Trial Data Analysis and Report Generation by Integrating Retrieval-Augmented Generation (RAG) and Large Language Model (LLM) Technologies
by Sheng-Ming Kuo, Shao-Kuo Tai, Hung-Yu Lin and Rung-Ching Chen
AI 2025, 6(8), 188; https://doi.org/10.3390/ai6080188 - 15 Aug 2025
Viewed by 734
Abstract
Retrieval-Augmented Generation (RAG) combined with Large Language Models (LLMs) introduces a new paradigm for clinical-trial data analysis that is both real-time and knowledge-traceable. This study targets a multi-site, real-world data environment. It builds a hierarchical RAG pipeline spanning an electronic health record (EHR), [...] Read more.
Retrieval-Augmented Generation (RAG) combined with Large Language Models (LLMs) introduces a new paradigm for clinical-trial data analysis that is both real-time and knowledge-traceable. This study targets a multi-site, real-world data environment. It builds a hierarchical RAG pipeline spanning an electronic health record (EHR), National Health Insurance (NHI) billing codes, and image-vector indices. The LLM is optimized through lightweight LoRA/QLoRA fine-tuning and reinforcement-learning-based alignment. The system first retrieves key textual and imaging evidence from heterogeneous data repositories and then fuses these artifacts into the contextual window for clinical report generation. Experimental results show marked improvements over traditional manual statistics and prompt-only models in retrieval accuracy, textual coherence, and response latency while reducing human error and workload. In evaluation, the proposed multimodal RAG-LLM workflow achieved statistically significant gains in three core metrics—recall, factual consistency, and expert ratings—and substantially shortened overall report-generation time, demonstrating clear efficiency advantages versus conventional manual processes. However, LLMs alone often face challenges such as limited real-world grounding, hallucination risks, and restricted context windows. Similarly, RAG systems, while improving factual consistency, depend heavily on retrieval quality and may yield incoherent synthesis if evidence is misaligned. These limitations underline the complementary nature of integrating RAG and LLM architectures in a clinical reporting context. Quantitatively, the proposed system achieved a Composite Quality Index (CQI) of 78.3, outperforming strong baselines such as Med-PaLM 2 (72.6) and PMC-LLaMA (74.3), and reducing the report drafting time by over 75% (p < 0.01). These findings confirm the practical feasibility of the framework to support fully automated clinical reporting. Full article
Show Figures

Figure 1

22 pages, 1165 KiB  
Article
AI-Assisted Exam Variant Generation: A Human-in-the-Loop Framework for Automatic Item Creation
by Charles MacDonald Burke
Educ. Sci. 2025, 15(8), 1029; https://doi.org/10.3390/educsci15081029 - 11 Aug 2025
Viewed by 338
Abstract
Educational assessment relies on well-constructed test items to measure student learning accurately, yet traditional item development is time-consuming and demands specialized psychometric expertise. Automatic item generation (AIG) offers template-based scalability, and recent large language model (LLM) advances promise to democratize item creation. However, [...] Read more.
Educational assessment relies on well-constructed test items to measure student learning accurately, yet traditional item development is time-consuming and demands specialized psychometric expertise. Automatic item generation (AIG) offers template-based scalability, and recent large language model (LLM) advances promise to democratize item creation. However, fully automated approaches risk introducing factual errors, bias, and uneven difficulty. To address these challenges, we propose and evaluate a hybrid human-in-the-loop (HITL) framework for AIG that combines psychometric rigor with the linguistic flexibility of LLMs. In a Spring 2025 case study at Franklin University Switzerland, the instructor collaborated with ChatGPT (o4-mini-high) to generate parallel exam variants for two undergraduate business courses: Quantitative Reasoning and Data Mining. The instructor began by defining “radical” and “incidental” parameters to guide the model. Through iterative cycles of prompt, review, and refinement, the instructor validated content accuracy, calibrated difficulty, and mitigated bias. All interactions (including prompt templates, AI outputs, and human edits) were systematically documented, creating a transparent audit trail. Our findings demonstrate that a HITL approach to AIG can produce diverse, psychometrically equivalent exam forms with reduced development time, while preserving item validity and fairness, and potentially reducing cheating. This offers a replicable pathway for harnessing LLMs in educational measurement without sacrificing quality, equity, or accountability. Full article
Show Figures

Figure 1

21 pages, 396 KiB  
Article
A Hybrid Approach to Literature-Based Discovery: Combining Traditional Methods with LLMs
by Judita Preiss
Appl. Sci. 2025, 15(16), 8785; https://doi.org/10.3390/app15168785 - 8 Aug 2025
Viewed by 387
Abstract
We present a novel hybrid approach to literature-based discovery (LBD) which exploits large language models (LLMs) to enhance traditional LBD methodologies. We explore the use of LLMs to address significant LBD challenges: (1) the extraction of factual subject–predicate–object relations from publication abstracts using [...] Read more.
We present a novel hybrid approach to literature-based discovery (LBD) which exploits large language models (LLMs) to enhance traditional LBD methodologies. We explore the use of LLMs to address significant LBD challenges: (1) the extraction of factual subject–predicate–object relations from publication abstracts using few-shot learning and (2) the filtering of unpromising candidate hidden knowledge pairs (CHKPs) using a variant of the LLM-as-a-judge paradigm with and without the addition of domain-specific information using retrieval augmented generation. The approach produces relations with greater coverage and results in a lower number of CHKPs compared to LBD based on relations extracted with, e.g., SemRep, improving the prediction and efficiency of knowledge discovery. We demonstrate the utility of the method using a drug-repurposing case study and suggest that emerging AI technologies can be used to assist in knowledge discovery from the ever-growing volume of the scientific literature. Full article
(This article belongs to the Special Issue Text Mining and Data Mining)
Show Figures

Figure 1

18 pages, 1226 KiB  
Article
Addressing Gaps in Ontario’s Sexual Health Education: Supporting Healthy Sexual Lives in Young Adults with Disabilities
by Rsha Soud, Adam Davies, Justin Brass and Shoshanah Jacobs
Sexes 2025, 6(3), 42; https://doi.org/10.3390/sexes6030042 - 4 Aug 2025
Viewed by 437
Abstract
This study examines how Ontario’s Health and Physical Education curriculum addresses the needs of young adults with disabilities. A total of 54 individuals aged 18–35 years old with developmental, learning, or physical disabilities who had completed secondary school in Ontario participated in a [...] Read more.
This study examines how Ontario’s Health and Physical Education curriculum addresses the needs of young adults with disabilities. A total of 54 individuals aged 18–35 years old with developmental, learning, or physical disabilities who had completed secondary school in Ontario participated in a cross-sectional mixed-methods survey. Participants were recruited through disability-focused community networks and a university psychology participant pool. They completed the Sex Education subscale of the Sexual Knowledge, Experience, Feelings and Needs Scale, a 35-item sexual knowledge questionnaire, and open-ended questions. Quantitative data were analyzed using descriptive statistics and independent samples t-tests; qualitative responses were examined using thematic analysis. Participants reported limited factual knowledge, minimal classroom representation, and heavy reliance on independent learning. Barriers included inaccessible materials, teacher discomfort, and the absence of disability narratives in sexuality units. Findings point to three priorities: revising curriculum content, expanding educator training, and creating disability-affirming resources. These measures will help ensure comprehensive and rights-based sexuality education that supports the autonomy and well-being of students with disabilities. Full article
Show Figures

Figure 1

15 pages, 915 KiB  
Article
Armenian Architectural Legacy in Henry F. B. Lynch’s Travel Writing
by Martin Harutyunyan and Gaiane Muradian
Arts 2025, 14(4), 86; https://doi.org/10.3390/arts14040086 - 4 Aug 2025
Viewed by 356
Abstract
The study of historical monuments within both architectural and literary frameworks reveals a dynamic interplay between scientific observation and artistic interpretation—a vital characteristic of travel writing/the travelogue. This approach, exemplified by British traveler and writer Henry Finnis Blosse Lynch (1862–1913), reflects how factual [...] Read more.
The study of historical monuments within both architectural and literary frameworks reveals a dynamic interplay between scientific observation and artistic interpretation—a vital characteristic of travel writing/the travelogue. This approach, exemplified by British traveler and writer Henry Finnis Blosse Lynch (1862–1913), reflects how factual detail and creative representation are seamlessly integrated in depictions of sites, landscapes, and cultural scenes. This case study highlights Lynch as a pioneering explorer who authored the first comprehensive volume on Armenian architecture and as a writer who vividly portrayed Armenian monuments through both verbal description and photographic imagery, becoming the first traveler to document such sites using photography. Additionally, this paper emphasizes the significance of Lynch’s detailed accounts of architectural monuments, churches, monasteries, cities, villages, populations, religious communities, and educational institutions in vivid language. The careful study of his work can contribute meaningfully to the investigation of the travelogue as a literary genre and to the preservation and protection of the architectural heritage of historical and contemporary Armenia, particularly in regions facing cultural or political threats. Full article
Show Figures

Figure 1

38 pages, 371 KiB  
Article
How ChatGPT’s Semantic Parrotting (Compared to Gemini’s) Impacts Text Summarization with Literary Text
by Rodolfo Delmonte, Giulia Marchesini and Nicolò Busetto
Information 2025, 16(8), 623; https://doi.org/10.3390/info16080623 - 22 Jul 2025
Viewed by 572
Abstract
In this paper we explore ChatGPT’s ability to produce a summary, a precis, and/or an essay on the basis of excerpts from a novel—The Solid Mandala—by Nobel Prize Australian writer Patrick White. We use a number of prompts to test a [...] Read more.
In this paper we explore ChatGPT’s ability to produce a summary, a precis, and/or an essay on the basis of excerpts from a novel—The Solid Mandala—by Nobel Prize Australian writer Patrick White. We use a number of prompts to test a number of functions related to narrative analysis from the point of view of the “sujet”, the “fable”, and the style. In the paper, we illustrate extensively a number of recurrent semantic mistakes that can badly harm the understanding of the contents of the novel. We made a list of 12 different types of semantic mistakes or parrotting we found GPT made, which can be regarded as typical for stochastic-based generation. We then tested Gemini for the same 12 mistakes and found a marked improvement in all critical key issues. The conclusion for ChatGPT is mostly negative. We formulate an underlying hypothesis for its worse performance, the influence of vocabulary size, which in Gemini is seven times higher than in GPT. Full article
11 pages, 386 KiB  
Article
Benchmarking AI Chatbots for Maternal Lactation Support: A Cross-Platform Evaluation of Quality, Readability, and Clinical Accuracy
by İlke Özer Aslan and Mustafa Törehan Aslan
Healthcare 2025, 13(14), 1756; https://doi.org/10.3390/healthcare13141756 - 20 Jul 2025
Viewed by 523
Abstract
Background and Objective: Large language model (LLM)–based chatbots are increasingly utilized by postpartum individuals seeking guidance on breastfeeding. However, the chatbots’ content quality, readability, and alignment with clinical guidelines remain uncertain. This study was conducted to evaluate and compare the quality, readability, and [...] Read more.
Background and Objective: Large language model (LLM)–based chatbots are increasingly utilized by postpartum individuals seeking guidance on breastfeeding. However, the chatbots’ content quality, readability, and alignment with clinical guidelines remain uncertain. This study was conducted to evaluate and compare the quality, readability, and factual accuracy of responses generated by three publicly accessible AI chatbots—ChatGPT-4o Pro, Gemini 2.5 Pro, and Copilot Pro—when prompted with common maternal questions related to breast-milk supply. Methods: Twenty frequently asked breastfeeding-related questions were submitted to each chatbot in separate sessions. The responses were paraphrased to enable standardized scoring and were then evaluated using three validated tools: ensuring quality information for patients (EQIP), the simple measure of gobbledygook (SMOG), and the global quality scale (GQS). Factual accuracy was benchmarked against WHO, ACOG, CDC, and NICE guidelines using a three-point rubric. Additional user experience metrics included response time, character count, content density, and structural formatting. Statistical comparisons were performed using the Kruskal–Wallis and Wilcoxon rank-sum tests with Bonferroni correction. Results: ChatGPT-4o Pro achieved the highest overall performance across all primary outcomes: EQIP score (85.7 ± 2.4%), SMOG score (9.78 ± 0.22), and GQS rating (4.55 ± 0.50), followed by Gemini 2.5 Pro and Copilot Pro (p < 0.001 for all comparisons). ChatGPT-4o Pro also demonstrated the highest factual alignment with clinical guidelines (95%), while Copilot showed more frequent omissions or simplifications. Differences in response time and formatting quality were statistically significant, although not always clinically meaningful. Conclusions: ChatGPT-4o Pro outperforms other chatbots in delivering structured, readable, and guideline-concordant breastfeeding information. However, substantial variability persists across the platforms, and none should be considered a substitute for professional guidance. Importantly, the phenomenon of AI hallucinations—where chatbots may generate factually incorrect or fabricated information—remains a critical risk that must be addressed to ensure safe integration into maternal health communication. Future efforts should focus on improving the transparency, accuracy, and multilingual reliability of AI chatbots to ensure their safe integration into maternal health communications. Full article
Show Figures

Figure 1

22 pages, 1642 KiB  
Article
Artificial Intelligence and Journalistic Ethics: A Comparative Analysis of AI-Generated Content and Traditional Journalism
by Rimma Zhaxylykbayeva, Aizhan Burkitbayeva, Baurzhan Zhakhyp, Klara Kabylgazina and Gulmira Ashirbekova
Journal. Media 2025, 6(3), 105; https://doi.org/10.3390/journalmedia6030105 - 15 Jul 2025
Viewed by 1793
Abstract
This article presents a comparative study of content generated by artificial intelligence (AI) and articles authored by professional journalists, focusing on the perspective of a Kazakhstani audience. The analysis was conducted based on several key criteria, including the structure of the article, writing [...] Read more.
This article presents a comparative study of content generated by artificial intelligence (AI) and articles authored by professional journalists, focusing on the perspective of a Kazakhstani audience. The analysis was conducted based on several key criteria, including the structure of the article, writing style, factual accuracy, citation of sources, and completeness of the information. The study spans a variety of topics, such as politics, economics, law, sports, education, and social issues. The results indicate that AI-generated articles tend to exhibit greater structural clarity and neutrality. On the other hand, articles written by journalists score higher in terms of factual accuracy, analytical depth, and the use of verified sources. Furthermore, the research explores the significance of journalistic ethics in ensuring transparency and information completeness in content production. Ultimately, the findings emphasize the importance of upholding rigorous journalistic standards when integrating AI into media practices. Full article
Show Figures

Figure 1

11 pages, 1132 KiB  
Article
Custom-Tailored Radiology Research via Retrieval-Augmented Generation: A Secure Institutionally Deployed Large Language Model System
by Michael Welsh, Julian Lopez-Rippe, Dana Alkhulaifat, Vahid Khalkhali, Xinmeng Wang, Mario Sinti-Ycochea and Susan Sotardi
Inventions 2025, 10(4), 55; https://doi.org/10.3390/inventions10040055 - 8 Jul 2025
Viewed by 600
Abstract
Large language models (LLMs) show promise in enhancing medical research through domain-specific question answering. However, their clinical application is limited by hallucination risk, limited domain specialization, and privacy concerns. Public LLMs like GPT-4-Consensus pose challenges for use with institutional data, due to the [...] Read more.
Large language models (LLMs) show promise in enhancing medical research through domain-specific question answering. However, their clinical application is limited by hallucination risk, limited domain specialization, and privacy concerns. Public LLMs like GPT-4-Consensus pose challenges for use with institutional data, due to the inability to ensure patient data protection. In this work, we present a secure, custom-designed retrieval-augmented generation (RAG) LLM system deployed entirely within our institution and tailored for radiology research. Radiology researchers at our institution evaluated the system against GPT-4-Consensus through a blinded survey assessing factual accuracy (FA), citation relevance (CR), and perceived performance (PP) using 5-point Likert scales. Our system achieved mean ± SD scores of 4.15 ± 0.99 for FA, 3.70 ± 1.17 for CR, and 3.55 ± 1.39 for PP. In comparison, GPT-4-Consensus obtained 4.25 ± 0.72, 3.85 ± 1.23, and 3.90 ± 1.12 for the same metrics, respectively. No statistically significant differences were observed (p = 0.97, 0.65, 0.42), and 50% of participants preferred our system’s output. These results validate that secure, local RAG-based LLMs can match state-of-the-art performance while preserving privacy and adaptability, offering a scalable tool for medical research environments. Full article
(This article belongs to the Special Issue Machine Learning Applications in Healthcare and Disease Prediction)
Show Figures

Figure 1

36 pages, 1084 KiB  
Article
Quantifying Claim Robustness Through Adversarial Framing: A Conceptual Framework for an AI-Enabled Diagnostic Tool
by Christophe Faugere
AI 2025, 6(7), 147; https://doi.org/10.3390/ai6070147 - 7 Jul 2025
Viewed by 1293
Abstract
Objectives: We introduce the conceptual framework for the Adversarial Claim Robustness Diagnostics (ACRD) protocol, a novel tool for assessing how factual claims withstand ideological distortion. Methods: Based on semantics, adversarial collaboration, and the devil’s advocate approach, we develop a three-phase evaluation process combining [...] Read more.
Objectives: We introduce the conceptual framework for the Adversarial Claim Robustness Diagnostics (ACRD) protocol, a novel tool for assessing how factual claims withstand ideological distortion. Methods: Based on semantics, adversarial collaboration, and the devil’s advocate approach, we develop a three-phase evaluation process combining baseline evaluations, adversarial speaker reframing, and dynamic AI calibration along with quantified robustness scoring. We introduce the Claim Robustness Index that constitutes our final validity scoring measure. Results: We model the evaluation of claims by ideologically opposed groups as a strategic game with a Bayesian-Nash equilibrium to infer the normative behavior of evaluators after the reframing phase. The ACRD addresses shortcomings in traditional fact-checking approaches and employs large language models to simulate counterfactual attributions while mitigating potential biases. Conclusions: The framework’s ability to identify boundary conditions of persuasive validity across polarized groups can be tested across important societal and political debates ranging from climate change issues to trade policy discourses. Full article
(This article belongs to the Special Issue AI Bias in the Media and Beyond)
Show Figures

Figure 1

29 pages, 1812 KiB  
Article
Innovative Guardrails for Generative AI: Designing an Intelligent Filter for Safe and Responsible LLM Deployment
by Olga Shvetsova, Danila Katalshov and Sang-Kon Lee
Appl. Sci. 2025, 15(13), 7298; https://doi.org/10.3390/app15137298 - 28 Jun 2025
Viewed by 1489
Abstract
This paper proposes a technological framework designed to mitigate the inherent risks associated with the deployment of artificial intelligence (AI) in decision-making and task execution within the management processes. The Agreement Validation Interface (AVI) functions as a modular Application Programming Interface (API) Gateway [...] Read more.
This paper proposes a technological framework designed to mitigate the inherent risks associated with the deployment of artificial intelligence (AI) in decision-making and task execution within the management processes. The Agreement Validation Interface (AVI) functions as a modular Application Programming Interface (API) Gateway positioned between user applications and LLMs. This gateway architecture is designed to be LLM-agnostic, meaning it can operate with various underlying LLMs without requiring specific modifications for each model. This universality is achieved by standardizing the interface for requests and responses and applying a consistent set of validation and enhancement processes irrespective of the chosen LLM provider, thus offering a consistent governance layer across a diverse LLM ecosystem. AVI facilitates the orchestration of multiple AI subcomponents for input–output validation, response evaluation, and contextual reasoning, thereby enabling real-time, bidirectional filtering of user interactions. A proof-of-concept (PoC) implementation of AVI was developed and rigorously evaluated using industry-standard benchmarks. The system was tested for its effectiveness in mitigating adversarial prompts, reducing toxic outputs, detecting personally identifiable information (PII), and enhancing factual consistency. The results demonstrated that AVI reduced successful fast injection attacks by 82%, decreased toxic content generation by 75%, and achieved high PII detection performance (F1-score ≈ 0.95). Furthermore, the contextual reasoning module significantly improved the neutrality and factual validity of model outputs. Although the integration of AVI introduced a moderate increase in latency, the overall framework effectively enhanced the reliability, safety, and interpretability of LLM-driven applications. AVI provides a scalable and adaptable architectural template for the responsible deployment of generative AI in high-stakes domains such as finance, healthcare, and education, promoting safer and more ethical use of AI technologies. Full article
Show Figures

Figure 1

24 pages, 1501 KiB  
Review
Large Language Models in Medical Chatbots: Opportunities, Challenges, and the Need to Address AI Risks
by James C. L. Chow and Kay Li
Information 2025, 16(7), 549; https://doi.org/10.3390/info16070549 - 27 Jun 2025
Viewed by 1954
Abstract
Large language models (LLMs) are transforming the capabilities of medical chatbots by enabling more context-aware, human-like interactions. This review presents a comprehensive analysis of their applications, technical foundations, benefits, challenges, and future directions in healthcare. LLMs are increasingly used in patient-facing roles, such [...] Read more.
Large language models (LLMs) are transforming the capabilities of medical chatbots by enabling more context-aware, human-like interactions. This review presents a comprehensive analysis of their applications, technical foundations, benefits, challenges, and future directions in healthcare. LLMs are increasingly used in patient-facing roles, such as symptom checking, health information delivery, and mental health support, as well as in clinician-facing applications, including documentation, decision support, and education. However, as a study from 2024 warns, there is a need to manage “extreme AI risks amid rapid progress”. We examine transformer-based architectures, fine-tuning strategies, and evaluation benchmarks specific to medical domains to identify their potential to transfer and mitigate AI risks when using LLMs in medical chatbots. While LLMs offer advantages in scalability, personalization, and 24/7 accessibility, their deployment in healthcare also raises critical concerns. These include hallucinations (the generation of factually incorrect or misleading content by an AI model), algorithmic biases, privacy risks, and a lack of regulatory clarity. Ethical and legal challenges, such as accountability, explainability, and liability, remain unresolved. Importantly, this review integrates broader insights on AI safety, drawing attention to the systemic risks associated with rapid LLM deployment. As highlighted in recent policy research, including work on managing extreme AI risks, there is an urgent need for governance frameworks that extend beyond technical reliability to include societal oversight and long-term alignment. We advocate for responsible innovation and sustained collaboration among clinicians, developers, ethicists, and regulators to ensure that LLM-powered medical chatbots are deployed safely, equitably, and transparently within healthcare systems. Full article
Show Figures

Graphical abstract

18 pages, 967 KiB  
Article
A Data-Driven Analysis of Engineering Contract Risk Characterization Based on Judicial Cases of Disputes
by Yongcheng Zhang, Ziyi Wu, Chaohua Xiong, Jianwei Wang and Maxwell Fordjour Antwi-Afari
Buildings 2025, 15(13), 2245; https://doi.org/10.3390/buildings15132245 - 26 Jun 2025
Viewed by 372
Abstract
Engineering contract management is a critical component of project management systems, serving as a key mechanism for ensuring successful project implementation. This study systematically analyzes 349 s-instance judicial cases related to construction engineering contract disputes in the Yangtze River Delta Economic Zone from [...] Read more.
Engineering contract management is a critical component of project management systems, serving as a key mechanism for ensuring successful project implementation. This study systematically analyzes 349 s-instance judicial cases related to construction engineering contract disputes in the Yangtze River Delta Economic Zone from 2017 to 2021, based on data obtained from the China Judgments Online database. The research identifies contractual risk characteristics across dimensions such as regional distribution, dispute terminology, legal citation patterns, and appellate role transitions. The key findings include the following: (1) Primary risks involve payment disputes, quality assurance failures, contractual validity issues, and schedule compliance challenges. (2) Litigation patterns reveal complex interdependencies between contracting parties and stakeholders, posing significant risk management challenges. (3) High second-instance modification rates stem from procedural irregularities, new evidence, improper legal application, and factual errors in initial trials. The study proposes stratified risk mitigation strategies, including governmental regulatory improvements and enterprise-level management optimizations. These findings offer valuable insights into advancing risk governance in construction contract administration, particularly through an enhanced understanding of dispute complexity and systemic vulnerabilities. Full article
(This article belongs to the Section Construction Management, and Computers & Digitization)
Show Figures

Figure 1

Back to TopTop