Search Results (85)

Search Parameters:
Keywords = medical question answering system

19 pages, 6095 KiB  
Article
MERA: Medical Electronic Records Assistant
by Ahmed Ibrahim, Abdullah Khalili, Maryam Arabi, Aamenah Sattar, Abdullah Hosseini and Ahmed Serag
Mach. Learn. Knowl. Extr. 2025, 7(3), 73; https://doi.org/10.3390/make7030073 - 30 Jul 2025
Abstract
The increasing complexity and scale of electronic health records (EHRs) demand advanced tools for efficient data retrieval, summarization, and comparative analysis in clinical practice. MERA (Medical Electronic Records Assistant) is a Retrieval-Augmented Generation (RAG)-based AI system that addresses these needs by integrating domain-specific retrieval with large language models (LLMs) to deliver robust question answering, similarity search, and report summarization functionalities. MERA is designed to overcome key limitations of conventional LLMs in healthcare, such as hallucinations, outdated knowledge, and limited explainability. To ensure both privacy compliance and model robustness, we constructed a large synthetic dataset using state-of-the-art LLMs, including Mistral v0.3, Qwen 2.5, and Llama 3, and further validated MERA on de-identified real-world EHRs from the MIMIC-IV-Note dataset. Comprehensive evaluation demonstrates MERA’s high accuracy in medical question answering (correctness: 0.91; relevance: 0.98; groundedness: 0.89; retrieval relevance: 0.92), strong summarization performance (ROUGE-1 F1-score: 0.70; Jaccard similarity: 0.73), and effective similarity search (METEOR: 0.7–1.0 across diagnoses), with consistent results on real EHRs. The similarity search module empowers clinicians to efficiently identify and compare analogous patient cases, supporting differential diagnosis and personalized treatment planning. By generating concise, contextually relevant, and explainable insights, MERA reduces clinician workload and enhances decision-making. To our knowledge, this is the first system to integrate clinical question answering, summarization, and similarity search within a unified RAG-based framework.
(This article belongs to the Special Issue Advances in Machine and Deep Learning)
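The summarization metrics quoted above (ROUGE-1 F1 and Jaccard similarity) are straightforward to compute over unigrams. A generic sketch with made-up example strings, not MERA's evaluation code:

```python
# Toy illustration of ROUGE-1 F1 and Jaccard similarity between a
# generated summary and a reference, computed over lowercased unigrams.
# The example strings are hypothetical, not from the MERA dataset.
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of clipped unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # Counter & = clipped (min) counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def jaccard(candidate: str, reference: str) -> float:
    """Jaccard similarity over the sets of unique tokens."""
    a, b = set(candidate.lower().split()), set(reference.lower().split())
    return len(a & b) / len(a | b)

ref = "patient admitted with acute chest pain"
hyp = "patient admitted with chest pain"
print(round(rouge1_f1(hyp, ref), 3))
print(round(jaccard(hyp, ref), 3))
```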

30 pages, 1251 KiB  
Article
Large Language Models in Medical Image Analysis: A Systematic Survey and Future Directions
by Bushra Urooj, Muhammad Fayaz, Shafqat Ali, L. Minh Dang and Kyung Won Kim
Bioengineering 2025, 12(8), 818; https://doi.org/10.3390/bioengineering12080818 - 29 Jul 2025
Abstract
The integration of vision and language processing into a cohesive system has already shown promise with the application of large language models (LLMs) in medical image analysis. Their capabilities encompass the generation of medical reports, disease classification, visual question answering, and segmentation, providing yet another approach to interpreting multimodal data. This survey compiles known applications of LLMs in medical image analysis, spotlighting their promise alongside critical challenges and future avenues. We introduce the concept of X-stage tuning, a framework for fine-tuning LLMs across multiple stages (zero-stage, one-stage, and multi-stage), where each stage corresponds to task complexity and available data. The survey describes issues such as data sparsity, hallucinated outputs, privacy concerns, and the need for dynamic knowledge updating. Alongside these, we cover prospective directions, including the integration of LLMs with decision support systems, multimodal learning, and federated learning for privacy-preserving model training. The goal of this work is to provide structured guidance to its target audience, demystifying the prospects of LLMs in medical image analysis.
(This article belongs to the Special Issue Deep Learning in Medical Applications: Challenges and Opportunities)
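The X-stage taxonomy described above can be sketched schematically: the number of fine-tuning stages is chosen according to task complexity and available data. The stage names and thresholds below are assumptions for illustration, not the survey's definitions:

```python
# Schematic sketch of an X-stage tuning planner. The stage names
# ("modality_alignment", "domain_adaptation", "task_finetune") and the
# branching logic are hypothetical, inferred loosely from the abstract.
def plan_tuning(labeled_examples: int, task_complexity: str) -> list[str]:
    if labeled_examples == 0:
        return []  # zero-stage: prompt the frozen LLM directly
    if task_complexity == "simple":
        return ["task_finetune"]  # one-stage: a single supervised pass
    # multi-stage: align modalities first, then specialize progressively
    return ["modality_alignment", "domain_adaptation", "task_finetune"]

print(plan_tuning(0, "simple"))
print(plan_tuning(5000, "complex"))
```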

11 pages, 1132 KiB  
Article
Custom-Tailored Radiology Research via Retrieval-Augmented Generation: A Secure Institutionally Deployed Large Language Model System
by Michael Welsh, Julian Lopez-Rippe, Dana Alkhulaifat, Vahid Khalkhali, Xinmeng Wang, Mario Sinti-Ycochea and Susan Sotardi
Inventions 2025, 10(4), 55; https://doi.org/10.3390/inventions10040055 - 8 Jul 2025
Abstract
Large language models (LLMs) show promise in enhancing medical research through domain-specific question answering. However, their clinical application is limited by hallucination risk, limited domain specialization, and privacy concerns. Public LLMs like GPT-4-Consensus pose challenges for use with institutional data, due to the inability to ensure patient data protection. In this work, we present a secure, custom-designed retrieval-augmented generation (RAG) LLM system deployed entirely within our institution and tailored for radiology research. Radiology researchers at our institution evaluated the system against GPT-4-Consensus through a blinded survey assessing factual accuracy (FA), citation relevance (CR), and perceived performance (PP) using 5-point Likert scales. Our system achieved mean ± SD scores of 4.15 ± 0.99 for FA, 3.70 ± 1.17 for CR, and 3.55 ± 1.39 for PP. In comparison, GPT-4-Consensus obtained 4.25 ± 0.72, 3.85 ± 1.23, and 3.90 ± 1.12 for the same metrics, respectively. No statistically significant differences were observed (p = 0.97, 0.65, 0.42), and 50% of participants preferred our system’s output. These results validate that secure, local RAG-based LLMs can match state-of-the-art performance while preserving privacy and adaptability, offering a scalable tool for medical research environments.
(This article belongs to the Special Issue Machine Learning Applications in Healthcare and Disease Prediction)
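The mean ± SD summaries and significance tests above follow a standard pattern for Likert-scale comparisons. A hedged, stdlib-only sketch with hypothetical ratings (not the authors' analysis code or data), using a permutation test in place of whatever test the study actually applied:

```python
# Sketch: summarize two groups of 5-point Likert ratings by mean +/- SD
# and compare them with an approximate two-sided permutation test on the
# difference in means. All rating values below are made up.
import random
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation)."""
    return statistics.mean(scores), statistics.stdev(scores)

def permutation_p(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.mean(perm_a) - statistics.mean(perm_b)) >= observed:
            hits += 1
    return hits / n_iter

local_fa = [5, 4, 4, 3, 5, 4, 5, 3, 4, 4]  # hypothetical local-RAG FA ratings
gpt_fa = [5, 4, 5, 4, 4, 4, 5, 4, 4, 4]    # hypothetical GPT-4-Consensus ratings
m, s = summarize(local_fa)
print(f"local FA: {m:.2f} +/- {s:.2f}")
print(f"p = {permutation_p(local_fa, gpt_fa):.2f}")
```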

40 pages, 2828 KiB  
Review
Generative Artificial Intelligence in Healthcare: Applications, Implementation Challenges, and Future Directions
by Syed Arman Rabbani, Mohamed El-Tanani, Shrestha Sharma, Syed Salman Rabbani, Yahia El-Tanani, Rakesh Kumar and Manita Saini
BioMedInformatics 2025, 5(3), 37; https://doi.org/10.3390/biomedinformatics5030037 - 7 Jul 2025
Abstract
Generative artificial intelligence (AI) has been rapidly transforming healthcare systems since the release of OpenAI’s ChatGPT in 2022. It encompasses a class of machine learning techniques designed to create new content and is classified into large language models (LLMs) for text generation and image-generating models for creating or enhancing visual data. These generative AI models have shown widespread applications in clinical practice and research, ranging from medical documentation and diagnostics to patient communication and drug discovery. They are capable of generating text messages, answering clinical questions, interpreting CT scan and MRI images, assisting in rare diagnoses, discovering new molecules, and providing medical education and training. Early studies have indicated that generative AI models can improve efficiency, reduce administrative burdens, and enhance patient engagement, although most findings are preliminary and require rigorous validation. However, the technology also raises serious concerns around accuracy, bias, privacy, ethical use, and clinical safety. Regulatory bodies, including the FDA and EMA, are beginning to define governance frameworks, while academic institutions and healthcare organizations emphasize the need for transparency, supervision, and evidence-based implementation. Generative AI is not a replacement for medical professionals but a potential partner, augmenting decision-making, streamlining communication, and supporting personalized care. Its responsible integration into healthcare could mark a paradigm shift toward more proactive, precise, and patient-centered systems.

24 pages, 3720 KiB  
Article
A Comparative Study of the Accuracy and Readability of Responses from Four Generative AI Models to COVID-19-Related Questions
by Zongjing Liang, Yun Kuang, Xiaobo Liang, Gongcheng Liang and Zhijie Li
COVID 2025, 5(7), 99; https://doi.org/10.3390/covid5070099 - 30 Jun 2025
Abstract
The purpose of this study is to compare the accuracy and readability of Coronavirus Disease 2019 (COVID-19) prevention and control knowledge texts generated by four current generative artificial intelligence (AI) models, two international (ChatGPT and Gemini) and two domestic (Kimi and Ernie Bot), and to evaluate other performance characteristics of the texts they generate. This paper uses the questions and answers in the COVID-19 prevention guidelines issued by the U.S. Centers for Disease Control and Prevention (CDC) as the evaluation criteria. The accuracy, readability, and comprehensibility of the texts generated by each model are scored against the CDC standards. A neural network model is then used to identify the factors that affect readability, and the medical topics of the generated texts are analyzed using text-analysis techniques. Finally, a questionnaire-based manual scoring approach was used to evaluate the AI-generated texts and compared against automated machine scoring. Accuracy: domestic models have higher textual accuracy, while international models have higher reliability. Readability: domestic models produced more fluent and publicly accessible language; international models generated more standardized and formally structured texts with greater consistency. Comprehensibility: domestic models offered superior readability, while international models were more stable in output. Readability factors: average words per sentence (AWPS) emerged as the most significant factor influencing readability across all models. Topic analysis: ChatGPT emphasized epidemiological knowledge; Gemini focused on general medical and health topics; Kimi provided more multidisciplinary content; and Ernie Bot concentrated on clinical medicine. The empirical results show that manual and machine scoring are highly consistent on the SimHash and FKGL indicators, supporting the effectiveness of the evaluation method proposed in this paper. Conclusion: texts generated by domestic models are more accessible and better suited for public education, clinical communication, and health consultations. In contrast, the international models have higher accuracy in generating expert knowledge, especially in epidemiological studies and in assessing the literature on disease severity. The inclusion of manual evaluations confirms the reliability of the proposed assessment framework. It is therefore recommended that future AI-generated knowledge systems for infectious disease control balance professional rigor with public comprehensibility, in order to provide reliable and accessible reference materials during major infectious disease outbreaks.
(This article belongs to the Section COVID Public Health and Epidemiology)
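The readability measures named above (AWPS and FKGL) have simple closed forms. A rough sketch, not the study's implementation: the FKGL formula is standard, but the syllable counter here is a crude vowel-group heuristic rather than a dictionary lookup, so its scores are approximate:

```python
# Illustrative computation of average words per sentence (AWPS) and the
# Flesch-Kincaid Grade Level (FKGL). Syllables are estimated by counting
# vowel groups, which is a rough heuristic, not a linguistic rule.
import re

def count_syllables(word: str) -> int:
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    awps = len(words) / len(sentences)
    syllables = sum(count_syllables(w) for w in words)
    # Standard FKGL: 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    fkgl = 0.39 * awps + 11.8 * (syllables / len(words)) - 15.59
    return awps, fkgl

text = "Wash your hands often. Wear a mask in crowded indoor spaces."
awps, fkgl = readability(text)
print(f"AWPS = {awps:.1f}, FKGL = {fkgl:.2f}")
```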

25 pages, 2296 KiB  
Article
Multimedia Graph Codes for Fast and Semantic Retrieval-Augmented Generation
by Stefan Wagenpfeil
Electronics 2025, 14(12), 2472; https://doi.org/10.3390/electronics14122472 - 18 Jun 2025
Abstract
Retrieval-Augmented Generation (RAG) has become a central approach to enhance the factual consistency and domain specificity of large language models (LLMs) by incorporating external context at inference time. However, most existing RAG systems rely on dense vector-based similarity, which fails to capture complex semantic structures, relational dependencies, and multimodal content. In this paper, we introduce Graph Codes—a matrix-based encoding of Multimedia Feature Graphs—as an alternative retrieval paradigm. Graph Codes preserve semantic topology by explicitly encoding entities and their typed relationships from multimodal documents, enabling structure-aware and interpretable retrieval. We evaluate our system in two domains: multimodal scene understanding (200 annotated image-question pairs) and clinical question answering (150 real-world medical queries with 10,000 structured knowledge snippets). Results show that our method outperforms dense retrieval baselines in precision (+9–15%), reduces hallucination rates by over 30%, and yields higher expert-rated answer quality. Theoretically, this work demonstrates that symbolic similarity over typed semantic graphs provides a more faithful alignment mechanism than latent embeddings. Practically, it enables interpretable, modality-agnostic retrieval pipelines deployable in high-stakes domains such as medicine or law. We conclude that Graph Code-based RAG bridges the gap between structured knowledge representation and neural generation, offering a robust and explainable alternative to existing approaches.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
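The core idea of symbolic similarity over typed semantic graphs can be illustrated in a few lines. This is a minimal sketch in the spirit of the paper, with the representation and scoring assumed (documents as sets of typed edges, scored by Jaccard overlap), not the actual Graph Code matrix encoding:

```python
# Structure-aware retrieval sketch: each document is a set of typed edges
# (subject, relation, object), and similarity is the Jaccard overlap of
# those typed edges rather than a dense-vector distance. The example
# graphs and relation names are hypothetical.
def graph_similarity(edges_a, edges_b):
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if a | b else 0.0

query_graph = {
    ("patient", "has_symptom", "chest pain"),
    ("patient", "has_history", "hypertension"),
}
doc_graphs = {
    "doc1": {("patient", "has_symptom", "chest pain"),
             ("patient", "has_history", "diabetes")},
    "doc2": {("patient", "has_symptom", "headache")},
}
# Rank candidate documents by typed-edge overlap with the query graph.
ranked = sorted(doc_graphs,
                key=lambda d: graph_similarity(query_graph, doc_graphs[d]),
                reverse=True)
print(ranked)
```

Because matches are exact typed edges, every retrieval decision is inspectable: one can list precisely which (subject, relation, object) triples the query and document share.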

21 pages, 1796 KiB  
Article
A Study of NLP-Based Speech Interfaces in Medical Virtual Reality
by Mohit Nayak, Jari Kangas and Roope Raisamo
Multimodal Technol. Interact. 2025, 9(6), 50; https://doi.org/10.3390/mti9060050 - 26 May 2025
Abstract
Applications of virtual reality (VR) have grown in significance in medicine, as they are able to recreate real-life scenarios in 3D while posing reduced risks to patients. However, there are several interaction challenges to overcome when moving from 2D screens to 3D VR environments, such as complex controls and slow user adaptation. More intuitive techniques are needed for enhanced user experience. Our research explored the potential of intelligent speech interfaces to enhance user interaction while conducting complex medical tasks. We developed a speech-based assistant within a VR application for maxillofacial implant planning, leveraging natural language processing (NLP) to interpret user intentions and to execute tasks such as obtaining surgical equipment or answering questions related to the VR environment. The objective of the study was to evaluate the usability and cognitive load of the speech-based assistant. We conducted a mixed-methods within-subjects user study with 20 participants and compared the voice-assisted approach to traditional interaction methods, such as button panels on the VR view, across various tasks. Our findings indicate that NLP-driven speech-based assistants can enhance interaction and accessibility in medical VR, especially in areas such as locating controls, ease of control, user comfort, and intuitive interaction. These findings highlight the potential benefits of augmenting traditional controls with speech interfaces, particularly in complex VR scenarios where conventional methods may limit usability. We identified key areas for future research, including improving the intelligence, accuracy, and user experience of speech-based systems. Addressing these areas could facilitate the development of more robust, user-centric, voice-assisted applications in virtual reality environments.
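The kind of intent resolution such an assistant performs can be sketched very simply. This is a toy illustration of the general idea (map a transcribed utterance to the best-matching intent), with intent names and keyword sets invented for the example, not taken from the study's implementation:

```python
# Toy intent resolver for transcribed voice commands: score each intent
# by how many of its keywords appear in the utterance and pick the best.
# The intent names and keyword lists below are hypothetical.
INTENTS = {
    "fetch_instrument": {"get", "bring", "hand", "drill", "scalpel", "implant"},
    "answer_question": {"what", "where", "how", "why", "explain"},
    "navigate_view": {"zoom", "rotate", "move", "show", "view"},
}

def resolve_intent(utterance: str) -> str:
    tokens = set(utterance.lower().split())
    # Count keyword overlaps per intent; ties fall back to insertion order.
    scores = {name: len(tokens & kws) for name, kws in INTENTS.items()}
    return max(scores, key=scores.get)

print(resolve_intent("please bring the implant drill"))
```

A production assistant would use a trained intent classifier and slot filling rather than keyword overlap, but the input/output contract is the same: text in, intent label out.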

19 pages, 704 KiB  
Article
Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion
by Junkai Zhang, Bin Li and Shoujun Zhou
Appl. Sci. 2025, 15(9), 4712; https://doi.org/10.3390/app15094712 - 24 Apr 2025
Abstract
Medical Visual Question Answering (Med-VQA) is designed to accurately answer medical questions by analyzing medical images when given both a medical image and its corresponding clinical question. Designing Med-VQA systems holds profound importance in assisting clinical diagnosis and enhancing diagnostic accuracy. Building upon this foundation, hierarchical Med-VQA extends the task by organizing medical questions into a hierarchical structure and making level-specific predictions to handle fine-grained distinctions. Recently, many studies have proposed hierarchical Med-VQA tasks and established datasets. However, several issues remain: (1) imperfect hierarchical modeling leads to poor differentiation between question levels, resulting in semantic fragmentation across hierarchies; (2) excessive reliance on implicit learning in Transformer-based cross-modal self-attention fusion methods can obscure crucial local semantic correlations in medical scenarios. To address these issues, this study proposes Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion (HiCA-VQA). Specifically, the hierarchical modeling includes two modules: hierarchical prompting for fine-grained medical questions and hierarchical answer decoders. The hierarchical prompting module pre-aligns hierarchical text prompts with image features to guide the model in focusing on specific image regions according to question type, while the hierarchical decoders make separate predictions for questions at different levels to improve accuracy across granularities. The framework also incorporates a cross-attention fusion module in which images serve as queries and text as key-value pairs. This approach avoids the irrelevant signals introduced by global interactions while achieving lower computational complexity than global self-attention fusion modules. Experiments on the Rad-Restruct benchmark demonstrate that HiCA-VQA outperforms existing state-of-the-art methods in answering hierarchical fine-grained questions, achieving in particular an 18 percent improvement in the F1 score. This study provides an effective pathway for hierarchical visual question answering systems, advancing medical image understanding.
(This article belongs to the Special Issue New Trends in Natural Language Processing)
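The fusion described above, with image features as queries and text features as keys/values, is scaled dot-product cross-attention. A minimal pure-Python sketch of that mechanism (dimensions and feature values are illustrative, not HiCA-VQA's actual architecture):

```python
# Minimal scaled dot-product cross-attention: image tokens attend over
# text tokens. Each output row is a weighted mix of the text values,
# weighted by query-key similarity. All numbers below are toy values.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """queries: image tokens [n_q][d]; keys/values: text tokens [n_k][d]."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

img = [[1.0, 0.0], [0.0, 1.0]]    # two image-region queries
txt_k = [[1.0, 0.0], [0.0, 1.0]]  # two text-token keys
txt_v = [[10.0, 0.0], [0.0, 10.0]]  # corresponding text values
fused = cross_attention(img, txt_k, txt_v)
```

Each image query ends up dominated by the value of the text token it aligns with, which is exactly the local image-text correlation the cross-attention module is meant to preserve.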

29 pages, 549 KiB  
Review
Generative Models in Medical Visual Question Answering: A Survey
by Wenjie Dong, Shuhao Shen, Yuqiang Han, Tao Tan, Jian Wu and Hongxia Xu
Appl. Sci. 2025, 15(6), 2983; https://doi.org/10.3390/app15062983 - 10 Mar 2025
Abstract
Medical Visual Question Answering (MedVQA) is a crucial intersection of artificial intelligence and healthcare. It enables systems to interpret medical images—such as X-rays, MRIs, and pathology slides—and respond to clinical queries. Early approaches primarily relied on discriminative models, which select answers from predefined candidates. However, these methods struggle to effectively address open-ended, domain-specific, or complex queries. Recent advancements have shifted the focus toward generative models, leveraging autoregressive decoders, large language models (LLMs), and multimodal large language models (MLLMs) to generate more nuanced and free-form answers. This review comprehensively examines the paradigm shift from discriminative to generative systems: it surveys generative MedVQA works in terms of their model architectures and training processes, summarizes evaluation benchmarks and metrics, and highlights key advances and techniques that propel the development of generative MedVQA, such as concept alignment, instruction tuning, and parameter-efficient fine-tuning (PEFT), alongside strategies for data augmentation and automated dataset creation. Finally, we propose future directions to enhance clinical reasoning and interpretability, build robust evaluation benchmarks and metrics, and employ scalable training strategies and deployment solutions. By analyzing the strengths and limitations of existing generative MedVQA approaches, we aim to provide valuable insights for researchers and practitioners working in this domain.
(This article belongs to the Special Issue Feature Review Papers in "Computing and Artificial Intelligence")

15 pages, 1758 KiB  
Article
The Extent to Which Artificial Intelligence Can Help Fulfill Metastatic Breast Cancer Patient Healthcare Needs: A Mixed-Methods Study
by Yvonne W. Leung, Jeremiah So, Avneet Sidhu, Veenaajaa Asokan, Mathew Gancarz, Vishrut Bharatkumar Gajjar, Ankita Patel, Janice M. Li, Denis Kwok, Michelle B. Nadler, Danielle Cuthbert, Philippe L. Benard, Vikaash Kumar, Terry Cheng, Janet Papadakos, Tina Papadakos, Tran Truong, Mike Lovas and Jiahui Wong
Curr. Oncol. 2025, 32(3), 145; https://doi.org/10.3390/curroncol32030145 - 2 Mar 2025
Abstract
The Artificial Intelligence Patient Librarian (AIPL) was designed to meet the psychosocial and supportive care needs of Metastatic Breast Cancer (MBC) patients with HR+/HER2− subtypes. AIPL provides conversational patient education, answers user questions, and offers tailored online resource recommendations. This study, conducted in three phases, assessed AIPL’s impact on patients’ ability to manage their advanced disease. In Phase 1, educational content was adapted for chatbot delivery, and over 100 credible online resources were annotated using a Convolutional Neural Network (CNN) to drive recommendations. Phase 2 involved 42 participants who completed pre- and post-surveys after using AIPL for two weeks. The surveys measured patient activation using the Patient Activation Measure (PAM) tool and evaluated user experience with the System Usability Scale (SUS). Phase 3 included focus groups to explore user experiences in depth. Of the 42 participants, 36 completed the study, with 10 participating in focus groups. Most participants were aged 40–64. PAM scores showed no significant differences between pre-survey (mean = 59.33, SD = 5.19) and post-survey (mean = 59.22, SD = 6.16), while SUS scores indicated good usability. Thematic analysis revealed four key themes: AIPL offers basic wellness and health guidance, provides limited support for managing relationships, offers limited condition-specific medical information, and is unable to offer hope to patients. Despite showing no impact on the PAM, possibly due to high baseline activation, AIPL demonstrated good usability and met basic information needs, particularly for newly diagnosed MBC patients. Future iterations will incorporate a large language model (LLM) to provide more comprehensive and personalized assistance.
(This article belongs to the Section Breast Cancer)

32 pages, 3661 KiB  
Systematic Review
Explainable AI in Diagnostic Radiology for Neurological Disorders: A Systematic Review, and What Doctors Think About It
by Yasir Hafeez, Khuhed Memon, Maged S. AL-Quraishi, Norashikin Yahya, Sami Elferik and Syed Saad Azhar Ali
Diagnostics 2025, 15(2), 168; https://doi.org/10.3390/diagnostics15020168 - 13 Jan 2025
Abstract
Background: Artificial intelligence (AI) has recently made unprecedented contributions in every walk of life, but it has not been able to work its way into diagnostic medicine and standard clinical practice yet. Although data scientists, researchers, and medical experts have been working in the direction of designing and developing computer aided diagnosis (CAD) tools to serve as assistants to doctors, their large-scale adoption and integration into the healthcare system still seems far-fetched. Diagnostic radiology is no exception. Imaging techniques like magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) scans have been widely and very effectively employed by radiologists and neurologists for the differential diagnoses of neurological disorders for decades, yet no AI-powered systems to analyze such scans have been incorporated into the standard operating procedures of healthcare systems. Why? It is absolutely understandable that in diagnostic medicine, precious human lives are on the line, and hence there is no room even for the tiniest of mistakes. Nevertheless, with the advent of explainable artificial intelligence (XAI), the old-school black boxes of deep learning (DL) systems have been unraveled. Would XAI be the turning point for medical experts to finally embrace AI in diagnostic radiology? This review is a humble endeavor to find the answers to these questions. Methods: In this review, we present the journey and contributions of AI in developing systems to recognize, preprocess, and analyze brain MRI scans for differential diagnoses of various neurological disorders, with special emphasis on CAD systems embedded with explainability. A comprehensive review of the literature from 2017 to 2024 was conducted using host databases. We also present medical domain experts’ opinions and summarize the challenges up ahead that need to be addressed in order to fully exploit the tremendous potential of XAI in its application to medical diagnostics and serve humanity. Results: Forty-seven studies were summarized and tabulated with information about the XAI technology and datasets employed, along with performance accuracies. The strengths and weaknesses of the studies have also been discussed. In addition, the opinions of seven medical experts from around the world have been presented to guide engineers and data scientists in developing such CAD tools. Conclusions: Current CAD research was observed to be focused on the enhancement of the performance accuracies of the DL regimens, with less attention being paid to the authenticity and usefulness of explanations. A shortage of ground truth data for explainability was also observed. Visual explanation methods were found to dominate; however, they might not be enough, and more thorough, human professor-like explanations would be required to build the trust of healthcare professionals. Special attention to these factors, along with the legal, ethical, safety, and security issues, can bridge the current gap between XAI and routine clinical practice.

33 pages, 2332 KiB  
Review
Explainable Machine Learning in Critical Decision Systems: Ensuring Safe Application and Correctness
by Julius Wiggerthale and Christoph Reich
AI 2024, 5(4), 2864-2896; https://doi.org/10.3390/ai5040138 - 11 Dec 2024
Abstract
Machine learning (ML) is increasingly used to support or automate decision processes in critical decision systems, such as self-driving cars or systems for medical diagnosis. These systems require decisions in which human lives are at stake, and those decisions should therefore be well founded and very reliable. This need for reliability contrasts with the black-box nature of many ML models, making it difficult to ensure that they always behave as intended. In the face of the high stakes involved, the resulting uncertainty is a significant challenge. Explainable artificial intelligence (XAI) addresses the issue by making black-box models more interpretable, often to increase user trust. However, many current XAI applications focus more on transparency and usability than on enhancing the safety of ML applications. In this work, we therefore conduct a systematic literature review to examine how XAI can be leveraged to increase the safety of ML applications in critical decision systems. We investigate for what purposes XAI is currently used in such systems, what the most common XAI techniques are, and how XAI can be harnessed to increase safety. Using the SPAR-4-SLR protocol, we answer these questions and provide a foundational resource for researchers and practitioners seeking to mitigate the risks of ML applications. In particular, we identify promising XAI approaches that go beyond increasing trust to actively ensuring the correctness of decisions. Our findings propose a three-layered framework, consisting of Reliability, Validation, and Verification, to enhance the safety of ML in critical decision systems by means of XAI. Furthermore, we point out gaps in research and propose future directions for XAI research on enhancing the safety of ML applications in critical decision systems.

20 pages, 3148 KiB  
Article
Influence of Spirituality on Bitter Kola Consumption Among Pretoria Residents in Response to COVID-19 and Related Illnesses
by Daniel Orogun and Harold G. Koenig
Religions 2024, 15(12), 1508; https://doi.org/10.3390/rel15121508 - 11 Dec 2024
Viewed by 2111
Abstract
The agrarian continent of Africa has many fruits with nutritional, medicinal and spiritual value. Nevertheless, Africa leads global statistics on poor healthcare. Two major challenges in Africa’s healthcare systems are poor access and the high cost of medical care. The effects of such challenges include, among others, low responsiveness to medical treatment and a high mortality rate. However, it seems the nosophobia that accompanied the global mortality rate during the COVID-19 pandemic may have triggered a spiritually influenced alternative. One traditional alternative was the use of Garcinia kola, popularly known as Bitter Kola (BK). This article, focusing on spiritual rather than psychological influence, raised a hypothetical question: does spirituality influence Africans’ traditional response to COVID-19? To answer this question, Sunnyside in Pretoria was chosen as the study area in which to investigate the hypothesis. Data were collected via mixed research methods. There were 16 qualitative respondents, including sellers, herbalists and clergy, and 75 consumers as quantitative respondents under probability sampling. The results, analysed using Excel and regression analysis in Python, demonstrated strong connections between consumers’ spiritual motivations, the sales period, the sales rate, and the swift traditional response to the pandemic and related illnesses. The outcome validated the influence of spirituality on 60.9% of quantitative respondents and showed that 25–72% responded to COVID-19 symptoms with BK. Likewise, 87.5% of qualitative respondents consumed BK on the basis of indigenous spiritual knowledge in response to the pandemic. Subsequently, this article discusses the benefits, limitations and lessons of spiritual influence on BK consumption in the post-COVID-19 era. Full article
(This article belongs to the Special Issue The Role of Religion and Spirituality in Times of Crisis)
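The abstract mentions that results were analysed with regression in Python. A minimal sketch of that kind of analysis is ordinary least squares on two variables; the data and variable names below are purely illustrative, not the study's data:

```python
# Minimal ordinary-least-squares sketch (hypothetical data; the study's
# actual variables and tooling are not specified beyond "regression in Python").

def ols_fit(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# e.g. a weekly pandemic-severity index vs. weekly BK sales (made-up numbers)
severity = [1, 2, 3, 4, 5]
sales = [10, 14, 19, 22, 27]
a, b = ols_fit(severity, sales)
```

A positive fitted slope would correspond to the reported connection between the pandemic period and the sales rate.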

22 pages, 1650 KiB  
Article
Interpretable Conversation Routing via the Latent Embeddings Approach
by Daniil Maksymenko and Oleksii Turuta
Computation 2024, 12(12), 237; https://doi.org/10.3390/computation12120237 - 1 Dec 2024
Cited by 2 | Viewed by 1315
Abstract
Large language models (LLMs) are rapidly being deployed in question answering and support systems to automate customer experience across all domains, including medical use cases. Models in such environments should solve multiple problems, such as general knowledge questions, queries to external sources, function calling and many others. Some cases might not even require full text generation; they may need different prompts or even different models. All of this can be managed by a routing step. This paper focuses on interpretable few-shot approaches to conversation routing, such as latent embeddings retrieval. The work presents a benchmark, an error analysis, and a set of visualizations of how latent embeddings routing works for long-context conversations in a multilingual, domain-specific environment. The results show that the latent embeddings router achieves performance on the same level as LLM-based routers, with additional interpretability and a higher level of control over model decision-making. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Public Health)
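The latent-embeddings routing the abstract describes can be sketched as nearest-example retrieval: each route is defined by a few example utterances, and an incoming message goes to the route with the most similar example in embedding space. Everything below is illustrative (route names, examples, and a toy bag-of-words `embed()` standing in for a real sentence encoder):

```python
# Few-shot conversation routing via latent embeddings: embed a handful of
# examples per route, embed the incoming message, and pick the route with
# the highest cosine similarity to any of its examples.

import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding; a real router would use a sentence encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

ROUTES = {  # hypothetical route names and few-shot examples
    "medical_faq": ["what are the side effects", "is this drug safe"],
    "function_call": ["book an appointment", "cancel my appointment"],
}

def route(message):
    """Return the route whose examples are most similar to the message."""
    q = embed(message)
    return max(
        ROUTES,
        key=lambda r: max(cosine(q, embed(e)) for e in ROUTES[r]),
    )
```

Because routing reduces to similarity scores against concrete examples, each decision is directly inspectable, which is the interpretability advantage the paper highlights over LLM-based routers.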

15 pages, 5607 KiB  
Article
Auto-Rad: End-to-End Report Generation from Lumbar Spine MRI Using Vision–Language Model
by Mohammed Yeasin, Kazi Ashraf Moinuddin, Felix Havugimana, Lijia Wang and Paul Park
J. Clin. Med. 2024, 13(23), 7092; https://doi.org/10.3390/jcm13237092 - 23 Nov 2024
Viewed by 1773
Abstract
Background: Lumbar spinal stenosis (LSS) is a major cause of chronic lower back and leg pain, and is traditionally diagnosed through labor-intensive analysis of magnetic resonance imaging (MRI) scans by radiologists. This study aims to streamline the diagnostic process by developing an automated radiology report generation (ARRG) system using a vision–language (VL) model. Methods: We utilized a Generative Image-to-Text (GIT) model, originally designed for visual question answering (VQA) and image captioning. The model was fine-tuned to generate diagnostic reports directly from lumbar spine MRI scans using a modest set of annotated data. Additionally, GPT-4 was used to convert semistructured text into coherent paragraphs for better comprehension by the GIT model. Results: The model effectively generated semantically accurate and grammatically coherent reports. The performance was evaluated using METEOR (0.37), BERTScore (0.886), and ROUGE-L (0.3), indicating its potential to produce clinically relevant content. Conclusions: This study highlights the feasibility of using vision–language models to automate report generation from medical imaging, potentially reducing the diagnostic workload for radiologists. Full article
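Of the reported metrics, ROUGE-L (0.3) is the simplest to compute: it is the F-score over the longest common subsequence (LCS) of generated and reference tokens. A minimal sketch of the standard definition follows (not necessarily the exact tooling the study used; the example strings are invented):

```python
# ROUGE-L sketch: F1 over the longest common subsequence of candidate and
# reference token sequences (standard definition, whitespace tokenization).

def lcs_length(a, b):
    """Dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate, reference):
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision = lcs / len(c)
    recall = lcs / len(r)
    return 2 * precision * recall / (precision + recall)
```

METEOR and BERTScore, the other two reported metrics, additionally account for synonymy and contextual similarity, which is why they tend to score higher than ROUGE-L on paraphrased report text.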
