Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems

Park, Chaelim; Lee, Hayoung; Lee, Seonghee; Jeong, Okran

doi:10.3390/math13060949

Open AccessArticle

Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems

¹

School of Computing, Gachon University, Seongnam-si 13120, Republic of Korea

²

Artificial Intelligence Convergence Research Section, Electronics and Telecommunications Research Institute (ETRI), Seongnam-si 13488, Republic of Korea

^*

Authors to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Mathematics 2025, 13(6), 949; https://doi.org/10.3390/math13060949

Submission received: 18 February 2025 / Revised: 2 March 2025 / Accepted: 5 March 2025 / Published: 13 March 2025

(This article belongs to the Special Issue Application of Artificial Intelligence, Machine Learning and Data Science in Industrial and Medical Domains)

Download

Browse Figures

Versions Notes

Abstract

Despite the excellent generalization capabilities of large-scale language models (LLMs), their severe limitations, such as illusions, lack of domain-specific knowledge, and ambiguity in the reasoning process, challenge their direct application to clinical decision support systems (CDSSs). To address these challenges, this study proposes a synergistic joint model that integrates knowledge graphs (KGs) and LLMs to enhance domain-specific knowledge and improve explainability in CDSSs. The proposed model leverages KGs to provide structured, domain-specific insights while utilizing LLMs’ generative capabilities to dynamically extract, refine, and expand medical knowledge. This bi-directional interaction ensures that CDSS recommendations remain both clinically accurate and contextually comprehensive. Performance evaluation of the joint model for mental health etiology, stress detection, and emotion recognition tasks of the CDSS showed up to a 12.0% increase in accuracy and an 8.6% increase in F1 score when compared to the standalone LLM model, with additional significant improvements when using the model with medical domain knowledge. Thus, the reliable and up-to-date domain knowledge obtained through the joint model not only improves the task performance of the CDSS, but also provides direct evidence of how such decisions were made. These findings validate the broad applicability and effectiveness of our KG–LLM joint model, highlighting its potential in real-world clinical decision support scenarios.

Keywords:

knowledge graph; large language model; clinical decision support system; explainable AI; information extraction

MSC:

68T50

1. Introduction

Clinical decision support systems (CDSSs) [1] are designed to support clinicians in diagnosing and treating diseases based on clinical information from patients. CDSSs combine patient medical records, the latest medical research findings, and treatment guidelines to facilitate various clinical decisions, such as diagnosis, treatment planning, and prognosis. As the healthcare environment becomes more sophisticated and specialized, the need for decision support systems that can encompass the entire patient care process is becoming increasingly important, with Artificial Intelligence (AI) being a promising solution [2]. However, despite the rapid advancements in AI, several limitations must be addressed before it can be effectively applied to CDSSs.

In the medical field, accuracy and reliability are critical, as clinical decisions directly impact patients’ lives. In this context, shortcomings in large language models (LLMs) can lead to catastrophic consequences. For instance, the hallucination problem where LLMs generate incorrect or misleading information could result in serious errors when used for medical diagnosis or treatment recommendations [3]. In addition, since LLMs are primarily trained on general knowledge, they often lack the specialized medical expertise required to interpret complex clinical information accurately and to provide precise guidance to users [4]. Finally, LLMs are often criticized for being ‘black-box’ models, unable to transparently show how their outputs are derived [5]. These challenges significantly hinder the adoption of LLMs in the healthcare domain. Therefore, in order to utilize the full potential of an LLM, a rigorous system is needed to support your conclusions.

Knowledge graphs (KGs) have emerged as a potential solution to address these limitations of LLMs. By providing structured, domain-specific knowledge, KGs can compensate for the lack of specialized knowledge within LLMs [6]. KGs represent data in a structured graph format, where concepts and entities are nodes, and relationships between them are edges. Therefore, KGs can generate more accurate and reliable responses through the process of Retrieval-augmented generation (RAG) [7]. Furthermore, KGs offer a transparent basis for reasoning, enhancing the credibility of LLMs in medical contexts. Due to these properties, KGs can mitigate the black-box nature of LLMs and play a crucial role in reducing hallucinations, thereby improving their applicability in the healthcare domain [8].

However, KGs also have some limitations. Utilizing KGs is relatively complex and requires users to possess a high level of technical expertise. Users need to perform additional steps, such as querying to retrieve relevant triples and effectively utilizing the results [9]. Also, KGs struggle to reflect dynamically changing knowledge in real time. For instance, new terms like “COVID”, which emerged in 2020, are not included in existing KGs, making it challenging for CDSSs to provide timely responses. Previously accepted knowledge changes over time, and if graphs are not properly updated, there is a risk of perpetuating outdated or incorrect information because of this static nature of KGs [10].

In this study, we propose a novel joint model that addresses these limitations by leveraging the complementary strengths of LLMs and KGs to build a synergistic system. LLMs and KGs are inherently interconnected and can mutually enhance each other’s capabilities. When the KG can provide more accurate guidance to the LLM and present clear decision-making processes and evidence, the LLM can in turn extract relevant knowledge for building and updating the KG, thereby improving its overall quality. As a result, the integration of the LLM and KG can serve as a powerful model for knowledge representation and reasoning, overcoming the shortcomings of each system through the circular structure illustrated in Figure 1.

To synergize LLMs and KGs, we propose the integration of a circular structure. While LLMs are pre-trained on large datasets and perform very well on a variety of tasks, they have black-box and hallucination problems. Therefore, extracting sub-graphs from KGs can make the evidence of a decision identifiable and thus more explainable.

However, a KG is incomplete and not continuously updated. In this work, we compensate by extending the graph with triples extracted from relevant recent documents using an LLM, which perform well in natural language-processing tasks. This can improve the limited coverage and accuracy of the KG [6]. LLMs are expensive to train for new knowledge and have difficulty relearning, but KGs can be easily updated and expanded without retraining, making it easy to maintain up-to-date knowledge [11].

Therefore, by scoring the relevance of triples based on the LLM’s embeddings for the sub-graphs obtained, the most relevant triples can be concatenated to achieve the desired result. The circular structure of overcoming the drawbacks of LLMs with KGs, and overcoming the drawbacks of KGs with LLMs, can improve the performance of various downstream tasks while creating synergistic effects.

In Figure 1, as a motivating example, when an LLM is used standalone in (a), the model misclassifies ‘emotional confusion’ and fails to explain the inference results. In contrast, (b) is an example of a joint model of an LLM and KG. After retrieving the relevant sub-graph from the KG, we update this sub-graph by extracting new triples of (COVID, cause, stress) and (stress, cause, anxiety) related to COVID-19 from external documents. We then select the top k relevant triples and utilize them to obtain the correct output, ‘Social Relationships’. Thus, the proposed joint model not only provides explainability through the KG, but also demonstrates the ability of the LLM to utilize the expanded KG to acquire new knowledge and produce accurate conclusions.

Our proposed synergistic joint model can support the decision-making process and improve the interpretability of the AI system results. It can also effectively adapt to new knowledge and overcome existing limitations. To validate our approach, we conducted experiments on various CDSS applications and tasks, including stress cause or factor detection, status detection, and emotion recognition from daily conversations. The experimental results demonstrate that the proposed joint model significantly extends the applicability of AI in the healthcare domain and confirms its feasibility for enhancing CDSSs.

The contributions of the research are as follows:

Development of a Synergistic Joint Model: Proposes a joint model that addresses the limitations of LLMs and KGs, enhancing explainability and mitigating hallucinations for reliable use in the medical field;
Improved CDSS Performance: Demonstrates enhanced performance in clinical tasks like mental health detection and emotion classification, supporting diagnosis and treatment processes;
Scalability and Adaptability: Establishes a scalable and adaptive system capable of dynamically incorporating new knowledge, making it applicable to various domains.

In the following sections, we provide a description of the proposed model. The remainder of this paper is organized as follows: Section 2 introduces a motivating example of our proposed model. Section 3 describes the structure of the proposed model and the implementation of each layer of the joint model in detail with formulas. In Section 4, we present the results of experiments conducted to demonstrate the validity of the joint model and prove the effectiveness of the proposed method. In Section 5, we discuss various scalable applications of the proposed joint model and evaluate the effectiveness of the methodology based on the experimental results. Finally, Section 6 concludes the paper and proposes future research directions.

2. Related Work

2.1. Clinical Decision Support System (CDSS)

Clinical decision support systems (CDSSs) are tools designed to assist medical professionals in making informed decisions throughout patient care. CDSSs combine patient medical records, the latest research findings, and treatment guidelines to support various clinical decisions, such as diagnosis, treatment planning, and prognosis. By utilizing a data-driven approach, these systems help standardize medical practices, reduce errors, and ultimately enhance the quality of patient care [12].

2.1.1. Evolution of CDSSs and Key Features

CDSSs initially began as rule-based systems, where medical knowledge was represented as predefined rules to provide recommendations for specific conditions. However, these systems were limited in their ability to address complex clinical scenarios and required considerable effort to update the rules [13]. With recent advancements in artificial intelligence (AI) and machine learning (ML), CDSSs have evolved into predictive models that leverage data analysis and pattern recognition.

For instance, IBM Watson for Oncology analyzes vast amounts of medical literature to recommend personalized treatment options for cancer patients, enabling healthcare professionals to make more informed decisions [14]. Additionally, mobile applications like Epocrates offer real-time information on drug interactions, disease diagnoses, and the latest treatment guidelines, providing healthcare professionals with easy access to essential information. These advancements illustrate the growing role of AI and data analytics technologies in transforming CDSSs and enhancing decision making in healthcare.

2.1.2. Converging CDSSs and AI

The convergence of AI and CDSSs is demonstrating significant potential in the medical field. ML algorithms can analyze large volumes of patient data to develop various predictive models, such as those for early disease detection, predicting treatment responses, and assessing the likelihood of hospital readmissions. These capabilities enable healthcare professionals to more accurately monitor patients’ health statuses and create personalized treatment plans [15].

For example, DeepMind’s Streams application utilizes AI to analyze data from patients with kidney disease and alert clinicians to early signs of acute kidney injury, allowing for timely intervention [16]. Another example is Aidoc, an AI-powered imaging analytics solution that processes radiology results in real time to identify emergencies and prioritize patient care, significantly improving patient management efficiency, particularly in emergency departments [17].

Additionally, Butterfly Network (https://www.butterflynetwork.com) has developed an AI-powered handheld ultrasound device that analyzes images in real time during the diagnostic process, providing immediate diagnostic support to healthcare professionals. This technology is particularly valuable in resource-limited settings, enabling faster and more accurate diagnoses.

2.1.3. Limitations of Traditional CDSSs

Although CDSSs have made significant progress in the medical field, several limitations remain [18]. First, many CDSSs are based on static, rule-based systems, making it challenging to accommodate real-time changes in patient conditions or incorporate the latest medical knowledge. Second, the information provided is often derived from general or standardized data, limiting the system’s ability to offer personalized recommendations for complex patient conditions. Third, existing CDSSs lack transparency in their reasoning processes, making it difficult for healthcare professionals to fully trust the system’s recommendations.

To address these challenges, a synergistic system that combines LLMs and KGs is needed to enhance the capabilities of CDSSs. LLMs can analyze large volumes of medical literature and patient data to discover new knowledge and patterns, while KGs can organize and structure this information, improving the reliability of clinical decision making.

For instance, LLM-based CDSSs can analyze a patient’s symptoms and medical history to suggest potential diagnoses, while KGs provide a clear rationale to support these suggestions, thereby assisting healthcare professionals in making better-informed decisions. Additionally, the continuous updating of KGs ensures that CDSSs reflect the most up-to-date medical knowledge.

This integrated approach can significantly enhance the accuracy and reliability of CDSS recommendations. Given the importance of precise data and dependable inferences in the medical field, the integration of LLMs and KGs has the potential to substantially improve the performance and trustworthiness of CDSSs.

2.2. Integrating LLMs with KGs

To better describe existing research on integrating LLMs with KGs, we have categorized notable studies into three primary approaches: KG-enhanced LLMs, LLM-augmented KGs, and joint optimization models. The following Table 1 summarizes the main research efforts across these categories and provides insights into their application domains, knowledge utilization strategies, primary models, and explainability features.

2.2.1. KG-Enhanced LLMs

Knowledge graph-enhanced LLMs integrate KGs during the pre-training and inference phases, allowing models to internalize domain-specific knowledge and access real-time information. This approach strengthens the model’s knowledge base and improves its applicability, particularly in specialized fields like healthcare. Notable examples of integrating KGs in the pre-training phase include K-BERT [19] and ERNIE [20].

K-BERT incorporates information from KGs into BERT models, enriching the model’s internal knowledge by embedding entities and their relationships into the learning process. Similarly, ERNIE leverages Chinese KGs during pre-training to enhance domain-specific understanding.

For integrating KGs in the inference phase, the RAG model [21] and the KEPLER model [22] are representative approaches. RAG retrieves relevant information from KGs in real time and incorporates it into the LLM’s generation process, thereby improving access to up-to-date information, which is particularly beneficial in the medical field. On the other hand, KEPLER uses the structure of KGs to refine the accuracy of generated text, producing more sophisticated responses by incorporating relevant KG search results during sentence generation.

Table 1. Notable studies categorized into three primary approaches: KG-enhanced LLMs, LLM-augmented KGs, and joint optimization models.

Research (Author, Year)	KG–LLM Integration Type	Application Domain	Knowledge Utilization	Primary Model	Explainability Features
K-BERT (2020) [19]	KG-enhanced LLM	NLP	Pre-training with KG	BERT + ConceptNet
ERNIE (2021) [20]	KG-enhanced LLM	NLP	Pre-training with KG	BERT + Domain KG
RAG (2020) [21]	KG-enhanced LLM	Healthcare/NLP	Retrieval-augmented generation	T5 + OpenKG
KEPLER (2021) [22]	KG-enhanced LLM	KG completion	KG-structured fine-tuning	BERT + KG embeddings
Gao et al. (2023) [8]	KG-enhanced LLM	Clinical diagnosis	KG-guided fine-tuning	LLM + UMLS	√
Remy et al. (2023) [23]	KG-enhanced LLM	Biomedical NLP	Pre-training with KG	BioLORD-2023	√
Yang et al. (2024) [24]	KG-enhanced LLM	Medical QA	KG-based retrieval and ranking	KG-Rank
COMET (2019) [25]	LLM-augmented KG	Commonsense reasoning	KG expansion using GPT	GPT-2 + Commonsense KG
ATOMIC (2019) [26]	LLM-augmented KG	Causal reasoning	Causal knowledge extraction	GPT-2 + Event-Based KG
BERT-KGQA (2020) [27]	LLM-augmented KG	KGQA	KG-driven question answering	BERT + Knowledge Integration	√
KG-BERT (2020) [28]	LLM-augmented KG	KG completion	KG textual representation learning	BERT
Jia et al. (2024) [29]	Joint optimization	Clinical decision support	KG–LLM integration	MedIKAL	√
Zuo et al. (2024) [30]	Joint optimization	Medical diagnosis	Automated KG construction	KG4Diagnosis	√

KGs enhance LLMs by integrating medical knowledge graphs from the Unified Medical Language System (UMLS) to augment LLMs with structured clinical knowledge [8], and utilizing Dr. Know [31] to mimic clinical diagnostic reasoning to improve automated diagnostic accuracy and reduce errors in medical decision support.

While these methods significantly enhance LLM performance, they also come with drawbacks, such as increased model complexity and higher computational costs. Additionally, they face challenges in reflecting real-time information and may suffer from overfitting when applied to general tasks outside of specialized domains.

2.2.2. LLM-Augmented KGs

The LLM-based approach for building, completing, and expanding KGs leverages the language processing capabilities of LLMs to significantly enhance the utility of KGs. By converting textual data into graph structures and supporting tasks such as knowledge graph question answering (KGQA) and graph completion, this approach improves the accuracy and completeness of KGs.

Representative models include COMET [25] and ATOMIC [26], which extend KGs using LLMs. COMET, based on the GPT model, extracts implicit knowledge from large text corpora and automatically adds commonsense knowledge to the graph. ATOMIC focuses on extracting causal knowledge from everyday situations and converting it into structured graph data, enriching the KG with human commonsense reasoning.

In addition, a study introduced LLM-enhanced KGs tailored for biomedical NLP by structuring domain-specific knowledge extracted from clinical texts into KGs [23]. This approach enabled more accurate medical information retrieval and contributed to improving natural language understanding in the healthcare domain.

For KGQA and graph-completion tasks, models like BERT-KGQA [27] and KG-BERT [28] are noteworthy. BERT-KGQA improves the accuracy of question answering within the KG by using BERT to extract and integrate information from both the text and the graph, resulting in more precise responses. KG-BERT, on the other hand, utilizes BERT for KG completion by adding new relationships extracted from text, thereby enhancing the graph’s overall completeness.

Although these models play a crucial role in constructing and expanding KGs, they have limitations, such as reliability issues with LLM-generated information and a high dependency on the initial accuracy of the KG. Additionally, as KGs dynamically expand, their structural complexity increases, making them challenging to maintain and manage.

2.2.3. Joint Optimization Models

While KG-enhanced LLMs and LLM-enhanced KG approaches each have their own strengths, they also present clear limitations. Integrating KGs into LLMs increases model complexity and computational costs, potentially leading to performance degradation in general tasks outside of specific domains. Conversely, using LLMs to extend KGs can result in issues related to information reliability and graph maintainability.

MedIKAL [29] proposed a joint optimization approach that integrates KG retrieval with LLM-based reasoning in a clinical decision support environment. The framework ensures that LLM-generated answers are based on structured medical knowledge while dynamically updating the KG based on newly gained insights from real-world medical cases.

KG4Diagnosis [30] introduced a hierarchical multi-agent framework to automate KG construction while leveraging LLMs for medical diagnosis. The model combines the strengths of structured knowledge discovery and generative reasoning to demonstrate high accuracy in diagnosing complex medical conditions.

To address these challenges, a synergized LLMs + KGs approach is needed, which integrates LLMs and KGs in a complementary manner to enable bi-directional reasoning. This approach allows LLMs and KGs to compensate for each other’s weaknesses and enhances the accuracy of complex knowledge representation and reasoning processes. Particularly in complex domains such as healthcare, such integrated models can play a critical role in building more reliable decision support systems by enabling real-time data updates, transparent reasoning processes, and continuous knowledge expansion.

3. Proposed Method

In this section, we introduce a synergistic joint model that integrates LLMs and KGs into a circular structure to address their respective limitations, applied to an XAI-based CDSS and its related applications. A limitation of KGs, which struggle to incorporate new knowledge, can be mitigated by leveraging LLMs’ strength in general knowledge to extract relevant triples from newly available documents and expand the KG. Conversely, the lack of explainability and domain-specific knowledge in LLMs is addressed by incorporating retrieval mechanisms using the KG. Furthermore, we construct a synergistic joint model tailored for CDSSs by enhancing the explainability of the KG through embedding scores derived from LLMs. We provide an overview of the proposed model’s architecture and explain the methods implemented for each layer in detail.

3.1. Structure of the Proposed Method

LLMs are models trained on large-scale data that demonstrate strong generalization capabilities in broad knowledge domains. However, they often lack domain-specific expertise and can generate misleading information, a phenomenon known as hallucination. To address these limitations, KGs have recently emerged as a promising alternative. KGs store domain-specific knowledge in the form of entities linked by relationships and can provide explainable evidence for model decisions by extracting relevant sub-graphs [3].

General RAG retrieves information from documents or databases, so the quality of the answer depends on the documents retrieved. Compared to document search, KG-based RAG can be more accurate because it provides structured knowledge of entities and relationships. It can also reason over relationships in the KG and utilize the appropriate information to guide the generation of logically consistent responses. Therefore, by extracting reliable KG information and including it directly in the prompt, LLMs are encouraged to refer to it when generating responses, reducing hallucinations and increasing the probability of generating more reliable responses [32].

In this paper, we propose a joint model with a circular structure that leverages the strengths of both LLMs and KGs, while compensating for their individual shortcomings, as illustrated in Figure 2. When a text input is provided to the model, the Sub-Graph Extraction (SGE) layer first extracts the sub-graphs related to the input text from the KG. However, because KGs are static and do not inherently expand, they cannot handle new terms. If an entity does not exist in the KG, the input is processed through the Graph Expansion layer, where the LLM extracts relevant triples from related documents to expand the KG, thereby forming a complete sub-graph. Subsequently, the top-k triples with the highest cosine similarity to the input text are selected through the Triple Scoring layer and passed through the Concat layer to perform for various downstream tasks in the CDSS.

Figure 2 illustrates the structure of the proposed model, which consists of four layers. First, when a text is entered into the joint model, then tokenized, and the main keywords are extracted using medical Named-Entity Recognition (NER) in the Sub-Graph Extraction (SGE) layer. The model then checks whether these keywords exist in the KG and proceeds to the Graph Expansion layer only if they do not exist. In the Graph Expansion layer, triples are extracted from related documents using natural language techniques, information extraction, and entity linking, and then combined to form a sub-graph.

Once the sub-graph is completed for the given text, the top-k triples are selected in the Triple Scoring layer using cosine similarity with S-BERT. Finally, in the Concat layer, the original text tokens and extracted triples are concatenated with special tokens to form a single input sequence, which is fed into the LLM. This layered structure allows the model to produce both accurate classification results and explainable evidence for its decisions.

With the emergence of ChatGPT, chat-based User eXperience (UX) became widely popular, giving users the impression that they could ask anything and receive an answer. This trend led to the development of LLMs with vast amounts of knowledge, which required significant time and resources to build. However, it became clear that using excessive data for specific tasks is inefficient, and it was determined that optimizing smaller models for targeted tasks through fine-tuning would be a more effective approach [33]. Therefore, we adapted the proposed joint model to specialized CDSS applications, such as detecting stress states and levels, extracting cause and factors, and recognizing emotions in daily conversations, rather than focusing on general QA-type tasks [34].

3.2. Layers That Comprise the Joint Model

3.2.1. Notation

We denote a sentence

s = [w_{0}, w_{1}, w_{2}, \dots, w_{n}]

as a sequence of tokens, where n represents the length of the sentence. Each token

w_{i}

belongs to a vocabulary

V

, which is a collection of words. The knowledge graph G used to construct the joint model is represented as a set of triples

(h, r, t)

, where

h

(head entity),

r

(relation), and

t

(tail entity), are elements contained in

V

.

G = \{(h, r, t)| h, r, t \in V}

(1)

3.2.2. Sub-Graph Extraction Layer

The joint model proposed in this paper leverages KGs and LLMs to perform downstream tasks related to the medical domain based on a given text. To extract and utilize only the relevant portions of a large KG, a sub-graph extraction process is employed, which consists of two main steps.

First, before extracting relevant triples from the KG, a Named-Entity Recognition (NER) task is conducted to identify searchable entity types from the text in the dataset. Named entities include people, organizations, locations, and other specific categories. Recent advancements in NER often utilize LLMs to capture contextual understanding and linguistic patterns, enabling accurate entity recognition and classification. Thus, these models are trained to identify entities in raw text through specialized recognition tasks.

To align with the downstream tasks in the CDSS domain, we extract key medical-related terms from the sentences. Therefore, we utilize the Medical NER model, a fine-tuned version of the DeBERTa model [35], which is specialized for NER tasks and trained on the PubMED dataset [36]. If the model fails to extract the relevant keywords, we use the SpaCy library (https://spacy.io/), a Python-based open-source natural language processing tool, to extract the keywords. The extracted keyword set is then filtered through lemmatization, special symbol removal, and other preprocessing techniques to make it searchable in the KG. We also use fuzzy matching to retrieve similar entities even if the strings do not match exactly, considering word variations and misspellings. Therefore, given an input sentence

s

to the medical NER model, we obtain

K

, the set of the main keywords

k

.

K = M e d i c a l N E R (s) = {k_{1}, k_{2}, \dots k_{m}}

(2)

The second step is to collect related triples, named sub-graph

G_{s u b}

, with a depth of 1 for each entity corresponding to the extracted keywords in the KG. We use ConceptNet (https://conceptnet.io/), a common sense-based KG. It originated from the MIT Media Lab’s OMCS (Open Mind Common Sense) project and was built through crowdsourcing with several public datasets, including WordNet, Wikipedia, and DBpedia. Unlike traditional hierarchical ontologies, ConceptNet adopts a flexible graph structure of a semi-structured graph to represent various concepts and relationships. It is expressed in natural language as an RDF (Resource Description Framework)-style triple structure in the form of (subject concept, relationship, object concept), which is easy for humans to understand and easy to integrate between systems. It contains a wide range of concepts and relationships that help computers understand the meaning of the words used in everyday contexts [37].

On the other hand, injecting too much knowledge into the LLM can result in knowledge noise—a phenomenon where irrelevant or misleading information is emphasized, distorting the actual meaning [19]. The main cause of knowledge noise is the inclusion of less relevant knowledge triples selected in the Triple Scoring layer. Specifically, some relations within the knowledge graph, such as ‘related_to’, ‘similar_to’, or ‘is_a’, are too general and do not contribute meaningfully to the context, which can interfere with the LLM’s inference process. Additionally, the Graph Expansion layer, which generates new triples for keywords that are not present in the existing knowledge graph, can inadvertently introduce inappropriate triples that negatively impact the overall model performance.

To address this issue, we implemented a filtering strategy in this study to reduce knowledge noise during the integration of the synergistic joint model. Specifically, as a first method to reduce knowledge noise, we filtered out relations that were too general or irrelevant to the given downstream task [38]. Out of the 34 relations in ConceptNet, we selected only 16 relations that were highly relevant, such as ‘causes’, ‘created_by’, and ‘motivated_by_goal’ [39].

G_{s u b} = \{(h, r, t)| h = k o r t = k, (h, r, t) \in S G E l a y e r (K, G)}

(3)

In the medical domain, KGs such as UMLS [40] are available. However, they primarily focus on drug descriptions or in-depth medical knowledge. The goal of this paper, however, is to enhance the performance of CDSS-related downstream tasks and address the limitations of AI models. Therefore, we use ConceptNet, a commonsense KG, to perform the Sub-Graph Extraction (SGE) layer.

3.2.3. Graph Expansion Layer

ConceptNet is a static KG that does not continuously expand, making it difficult to accommodate new or emerging terms such as “COVID”. Additionally, it lacks domain-specific knowledge, such as specialized medical terminology, as it is built primarily on general commonsense information. These limitations are critical for systems like CDSS, where up-to-date information and domain-specific knowledge are essential.

To address these limitations, we introduce the Graph Expansion layer, which expands the KG by generating new triples when relevant information is absent, even if the keywords in the text contain critical insights. This layer utilizes LLMs to create new triples, denoted as

T^{'} = (h^{'}, r^{'}, t^{'})

, for keywords that do not have corresponding triples in the sub-graphs extracted through the SGE layer. This process is essential for filling in missing information and broadening the KG’s coverage.

The triple extraction process operates as follows: First, the SGE layer selects data from triples that can be extracted for keywords that do not exist in the KG. This data are typically sourced from documents that include trending or domain-specific terms. Using this curated data, we design prompts to generate new triples with OpenAI’s GPT-3.5-turbo-instruct model [41].

The prompts are structured to include the following elements:
Text Conversion: Converts the target text to lowercase and replaces spaces with underscores to ensure a consistent format;
Keyword Processing: Standardizes all keywords and intermediary terms to maintain uniformity in the model’s input, enabling the accurate recognition of key elements within the text;
I have checked and revised all.

Based on the designed prompt, a request to generate triples is made through the API. The resulting triples are in the format of (subject, relation, object) and are structured using ConceptNet-style relationships. The generated triples are stored in a data frame for each index and further validated for subsequent processing.

Next, the triples are refined and adjusted for format consistency. This step is crucial before integrating the LLM-generated triples into the KG.

The refinement process includes the following steps:

Triple Format Validation: Uses regular expressions to ensure that the extracted triples follow the correct (subject, relation, object) structure. This step prevents incorrectly formatted triples from being added to the KG;
Format Consistency Handling: Triples that pass the format validation are converted into a list and stored. If a triple string contains errors or does not match the expected format, the GPT-3.5-turbo-instruct model is called again to refine and standardize the format;
Final Validation and Storage: The refined triples are stored in a data frame for each index and prepared for integration into the KG. Errors and malformed data are automatically handled in this step, minimizing inconsistencies.

The generated triples

T^{'}

are then integrated into the set of

G_{s u b}

triples extracted from the SGE layer to create a new set of triples

T

that can accommodate new knowledge.

T = \{(h, r, t)| (h, r, t) {\in G}_{s u b}} \cup \{(h^{'}, r^{'}, t^{'})| (h^{'}, r^{'}, t^{'}) \in T^{'}}

(4)

This approach overcomes the limitations of static KGs and contributes to building a more comprehensive and scalable joint model. Consequently, the Graph Expansion layer becomes a powerful tool for integrating the latest information and knowledge into domain-specific systems, such as CDSSs.

Thus, the Graph Expansion layer plays a crucial role in enhancing the robustness of the KG by effectively incorporating new terms and knowledge. This layer improves the overall system’s accuracy and reliability, contributing to the continuous development and sustainability of the KG.

3.2.4. Triple Scoring Layer

The SGE layer and Graph Expansion layer extract the set of triples T related to the input text. However, due to the large size of ConceptNet and the limited number of input tokens for the LLM, additional filtering is required. To increase the efficiency of LLM and reduce knowledge noise, the Triple Scoring layer is employed to select the most relevant triples from the sub-graph.

Previous approaches to scoring KGs have typically relied on various embedding techniques. For instance, TransE [42] is a method that represents entities and relationships in a continuous embedding space and scores triples based on the proximity between them. This approach is effective in capturing simple relational patterns between entities. Similarly, ComplEx [43] utilizes a complex vector space to compute scores, measuring the distance and similarity between entities and their corresponding relations. These methods have been widely used in KG applications due to their ability to model structured relationships within a graph.

However, traditional KG Embedding (KGE) techniques face a significant limitation: they cannot generate embeddings for new or unseen terms that are not present in the existing KG. This makes it challenging to incorporate newly emerging entities and relations into the graph, thereby restricting their applicability to dynamic domains where new knowledge is continuously introduced.

Since our joint model relies on a KG that dynamically expands based on new knowledge generated by LLMs, traditional KG Embeddings (KGEs) are not suitable for use as a triple scoring function. To address this, we implement the Triple Scoring layer using Sentence-BERT (S-BERT) [44], also known as Sentence Transformers, a state-of-the-art Python framework for sentence and text embeddings specifically designed for LLM-based applications.

S-BERT is trained on two main tasks: classifying sentence pairs for problems such as natural language inference (NLI) and solving regression problems for sentence pairs, as seen in the semantic textual similarity (STS) task. This training enables S-BERT to capture semantic relationships between sentences and phrases effectively. To score the relevance of a sentence s and the triples

T

extracted from the KG, we first pass s and each triple through S-BERT to obtain their respective embeddings. The sentence embedding is denoted as

e_{s}

, while the triple embeddings are represented as

E_{T} = [e_{t_{1}}, e_{t_{2}}, \dots {e_{t}}_{l}]

. Using these embeddings, we calculate the cosine similarity score between

s

and each triple in

T

to determine the most relevant triples, leveraging S-BERT’s strong performance in capturing semantic similarity.

e_{s} = S - B E R T (s), e_{t_{i}} = S - B E R T (t_{i})

(5)

s c o r e (s, t_{i}) = \frac{e_{s} \cdot e_{t_{i}}}{‖e_{s}‖ ‖e_{t_{i}}‖}, i = 1, 2, \dots, l

(6)

By applying the Triple Scoring layer, we extract the top 3 triples to simplify

G_{s u b}

, thereby reducing knowledge noise and enhancing the downstream task. Rather than injecting excessive knowledge, it utilizes the relevant knowledge triples obtained from the Triple Scoring layer results. Thus, by reducing knowledge noise, an efficient combination of KG and LLM can be accomplished.

T_{f i l t e r e d} = \{t_{i}| t_{i} \in T o p - 3 (s c o r e (s, t_{i})), t_{i} \in T}

(7)

3.2.5. Concat Layer

Finally, we apply a Concat layer to incorporate the structured information from the KG as input to the LLM, which will be used for downstream tasks. While the knowledge in the text corpus is typically implicit and unstructured, the knowledge in the KG is explicit and structured. Therefore, to effectively leverage the top-k triples extracted from the previous layers and enhance the performance of downstream tasks, it is essential to reflect the structural characteristics of the KG in the LLM input. Therefore, by concatenating the triple structure, it reduces knowledge noise and achieves logical consistency and accuracy effects better than regular RAG.

To achieve this, we utilize the ‘special_token’ feature provided by the Huggingface Tokenizer to construct the input text

x

, ensuring that the structured information is appropriately encoded for the LLM.

x = [[C L S] < H E A D > h < R E L > r < T A I L > t \dots [S E P] w_{0}, w_{1}, \dots w_{n}]

(8)

To differentiate the triples

T_{f i l t e r e d}

from the plain text sentence

s

in the KG, we use the

[S E P]

token to separate and combine them. The input result

x

is represented as a sequence of segment tags and is formatted as

[a, a, \dots, a, b, b, \dots, b]

, where

a

corresponds to the triple tokens and

b

to the sentence tokens. Additionally, the

[C L S]

token is placed at the beginning of

x

, and its final hidden state is fed into the classifier, which is an LLM, to predict the downstream task’s outcome

r

.

r = σ (M L P (e_{[C L S]}))

(9)

4. Evaluation

We conducted a series of experiments to evaluate the effectiveness of the proposed model. This section details the experimental setup, results, and ablation studies.

4.1. Dataset

To evaluate the performance of the joint model in addressing the limitations of LLMs and KGs, we conducted a classification task, a common benchmark widely used in existing LLM studies. Given that our joint model is tailored for medical-related tasks, we focused on specific applications such as emotion classification from text, as well as the classification of causes and conditions related to stress and depression.

(1) Mental Health Cause/Factor Detection: To build a joint model capable of detecting causes and factors related to mental health from text, we utilized two datasets. First, the SAD dataset [45] is designed to classify various types of stress causes and provide appropriate advice for stress management during conversations. It consists of 6850 samples categorized into 9 types of stressors: work, school, financial problems, family issues, emotional turmoil, social relationships, health/fatigue or physical pain, everyday decision making, and others. This dataset is widely used to identify stressor categories in everyday conversations.

Second, the Causal Analysis for Mental illness in Social media posts (CAMS) dataset [46] is intended for causal inference, explanation extraction, and causal categorization. It includes 5052 samples, each classified into one of the following categories: bias/abuse, jobs/careers, medication, relationships, and alienation.

(2) Mental Health Status/Level Detection: For this task, we used datasets collected from social media platforms that provide an authentic view of people’s thoughts on mental health issues such as depression and stress. The dataset consists of 190,000 posts across five categories from the Reddit community, with a total of 3.5 million segments labeled from 3000 posts using Amazon Mechanical Turk.

For the same text, two key datasets were used to detect mental health status and levels. The DR dataset [47] classifies the stress index into four levels: moderate, minimal, mild, and severe. Meanwhile, the Dreaddit dataset [48] is designed for binary classification, identifying whether a post is stressful or not. These datasets enable more a detailed analysis of mental health conditions based on social media text.

(3) Emotion Recognition: The GoEmotions dataset [49], which is used for emotion recognition tasks, consists of 58,000 carefully curated comments extracted from Reddit. The dataset includes 27 distinct emotion categories along with an additional “neutral” label, making a total of 28 categories. Each comment has been annotated by human reviewers to capture nuanced emotional content. Furthermore, the dataset is split into training, validation, and test sets, as summarized in Table 2. Originally introduced in 2020, this dataset is widely used for emotion detection and serves as a benchmark for evaluating the scalability of models designed for stress detection.

The GoEmotions dataset is labeled with 28 emotion types, but the texts often express complex human emotions that cannot be easily categorized into a single label. As a result, the dataset follows a multi-label annotation scheme (1:N structure) where each text can be simultaneously tagged with multiple emotions, rather than a single emotion (1:1 structure). Therefore, the task performed using the GoEmotions dataset is a multi-label classification, where the model’s performance is evaluated based on its ability to correctly predict the top-n emotions with the highest probability from the 28 emotion categories.

4.2. Experiment Setups

We used the PyTorch (v. 2.5.1) framework (https://pytorch.org/) and Huggingface library (https://huggingface.co) to validate the performance of the integrated joint model combining the LLM and KG proposed in this study. All experiments are conducted on GeForce RTX 4070 Ti SUPER. For training consistency, we employ the Adam optimizer with a learning rate of 1 × 10⁻⁵ and a weight decay of 0.01. The performance of the model was evaluated using precision, recall, and F1 score, which are standard metrics commonly used for assessing classification tasks.

4.3. Experiment and Results

The experimental results validating the performance of the joint model across the three different datasets are presented in Table 3. For comparison, we evaluated our model against RoBERTa [50], a widely recognized LLM known for its strong performance in classification tasks, and Mental RoBERTa [51], a domain-specific variant that is fine-tuned using mental health-related datasets.

The joint model proposed in this study not only enhances the explainability of AI models but also enables them to adapt to new knowledge. Furthermore, it effectively addresses the limitations of both LLMs and KGs by leveraging their respective strengths. The experimental results in Table 3 demonstrate that the performance of the proposed joint model is generally superior to that of the baseline models.

The improved results, regardless of the specific type of LLM used in the joint model, indicate that the circular integration of LLMs and KGs successfully compensates for the shortcomings of each component. Consequently, the experimental findings validate the effectiveness and robustness of the joint model proposed in this research.

When comparing the performance of the original models, it is evident that the Mental RoBERTa model, a specialized LLM tailored for mental health, outperforms the general RoBERTa model. The Mental RoBERTa model [51] is specifically focused on the mental health domain and is pre-trained on Reddit data, using subreddits that include keywords such as “depression”, “suicide watch”, “anxiety”, and “mental illness”. This pre-training enables the model to perform well on tasks related to mental health, such as identifying causes, status, and severity levels. This result highlights that while the general RoBERTa model has strong general knowledge, it lacks the specialized domain knowledge required for mental health applications.

However, when examining the performance of the joint models, we observed that the RoBERTa-based joint model, despite being a general LLM, surpassed the Mental RoBERTa-based joint model by compensating for the lack of domain-specific knowledge through the expanded KG. This outcome demonstrates that even a general-purpose LLM can achieve superior performance by integrating structured domain knowledge and enhancing explainability through the use of a KG. Consequently, we show that vertical LLMs can also benefit from this joint approach, resulting in more structured knowledge representation and improved overall performance.

4.4. Ablation Study

To demonstrate the effectiveness of the proposed joint model, we further analyzed its experimental performance by assessing the impact of including or excluding key layers within the model. Additionally, we evaluated its performance based on the number of key triples extracted, examining how varying the quantity of extracted triples influences the model’s overall effectiveness.

We conducted additional experiments by excluding the Graph Expansion layer and the Sub-Graph Extraction layer, which are the core components of the joint model. The experiments shown in Table 4 were performed using RoBERTa, the model with the highest variance, and the results are presented based on whether the layers were included and whether the KG was expanded.

w/o Graph Expansion layer: It utilizes only the static state of a KG without extending the KG with an LLM. It is still a joint model using an LLM and a KG, but it deals poorly with recent or new knowledge.

w/o Sub-Graph Extraction layer: It is the same as a standalone LLM that does not use a KG at all, and does not have any of the benefits of KGs such as reduced hallucinations and explainability.

The results in Table 4 indicate a performance drop across all datasets, when either layer was excluded. This suggests that constructing an expanded KG to incorporate new information not only enhances explainability but also positively impacts various downstream tasks in the CDSS domain. These findings highlight the importance of each layer in the proposed joint model.

4.5. Analysis of Knowledge Noise

There are many factors that can affect knowledge noise in the experimental results of the joint model that combines a KG and an LLM. For example, knowledge noise usually occurs when unnecessary information is appended in the Concat layer to merge text and key triples, or when the information obtained from the knowledge graph is not useful. In other words, knowledge noise occurs when excessive or irrelevant knowledge is included, and the embedding emphasizes less important parts of the text, reducing the overall model effectiveness. Therefore, if noise is introduced during the process of combining a KG and an LLM, it can negatively impact the performance of the joint model.

To determine where the knowledge noise mainly comes from and how it affects the joint model, we ran several experiments. First, we ran additional experiments to see if we could reduce the noise by varying the number of top

k

triples in the process of merging the LLM and KG. These experiments were performed using the RoBERTa model to determine whether knowledge noise was a general phenomenon in the context of this study.

Table 5 shows the results of the experiments as a function of the number of knowledge triples used. Overall, we can see that as the number of triples decreases, the amount of knowledge injected decreases, resulting in a decrease in performance. This result suggests that the parameter value of

k = 3

set in this study is optimal and can efficiently pass domain knowledge to the LLM without knowledge noise.

We also aimed to demonstrate that providing knowledge in the structured form of head entities, relations, and tail entities, which is the main advantage of KGs, is effective for learning and reducing knowledge noise. We compared our experimental results with the structured form of triples and their normalized versions on the LLM’s input. We used the T5 model [52], which is excellent for transfer learning, to convert triple forms into regular sentences.

Table 6 shows that the experimental results show that the method proposed in this paper, which used special tokens to structure the input using a triple structure, outperforms the LLM when the KG search results are provided in natural language. The result shows that the structured form of knowledge graphs is robust to knowledge noise.

In this paper, we added a filtering operation to exclude non-significant realizations of the relations in ConceptNet to reduce knowledge noise. Therefore, finally, we compared our work with a version that used all relations in the KG without filtering the type of relation to see if the relation was affected by knowledge noise.

In Table 7, it is observed that both the RoBERTa and Mental RoBERTa models perform substantially better with relation filtering applied. Given the vast amount of information in ConceptNet, relation filtering serves as an effective initial step in eliminating extraneous information. In particular, triples containing excluded relations can have nearly identical head and tail entities—such as (’rejection’, ’related_to’, ’reject’)—for relations like ’SimilarTo’, ’RelatedTo’, or ’Synonym’. In such instances, the information derived from the knowledge graph contributes minimal value to the LLM and may introduce knowledge noise. Thus, relation filtering, while fundamental, can significantly mitigate knowledge noise.

5. Applications and Discussions

In this section, we explore the potential applications of CDSS and the proposed approach in various healthcare scenarios. As described in Figure 3, The model enhances CDSS performance by integrating LLMs and KGs, providing more accurate and context-specific support in disease detection, bedside decision support, treatment and prescription recommendations, and overall clinical practice. Additionally, we discuss practical deployment challenges and the role of explainable AI in improving clinical trust and decision making.

5.1. CDSS Applications

The circular joint model proposed in this study can support a wide range of clinical decision-making processes, including prevention, diagnosis, treatment, prescription, and prognosis. To illustrate its versatility, we present examples of how the joint model can be applied to various scenarios such as disease detection, decision support, personalized treatment and prescription recommendations, clinical practice, and other healthcare applications. The following subsections illustrate its versatility with specific applications.

5.1.1. Disease Detection

The model proposed in this study can be utilized to analyze complex medical data, such as brain imaging, by integrating LLMs and KGs. For instance, rather than relying solely on static KGs, the proposed CDSS dynamically integrates new findings from recent medical literature, ensuring that disease detection is based on the latest available knowledge.

5.1.2. Beside Decision Support

The CDSS can process real-time patient data to assist in immediate clinical decision making. By continuously analyzing patient vitals and cross-referencing them with structured medical knowledge, the system can recommend optimal treatment actions, such as adjusting medication dosages or identifying early warning signs of deterioration [53].

5.1.3. Treatment and Prescription

The proposed model can be applied to analyze a patient’s personal medical records to recommend personalized treatment plans. By dynamically incorporating the latest treatment options from newly published medical studies, the system can offer more precise recommendations, reducing the reliance on outdated guidelines [54].

5.1.4. Clinical Practice

In clinical practice, it is crucial to comprehensively analyze various test results and real-time patient data to make rapid and accurate decisions. The approach proposed in this study is particularly effective in such scenarios. The CDSS synthesizes real-time patient data with continuously updated medical guidelines, presenting structured insights to clinicians in a transparent and interpretable manner [55].

5.2. Discussion

The proposed KG–LLM model addresses critical challenges in CDSSs by improving knowledge integration and adaptability. However, several key factors must be considered for effective real-world deployment.

(1) Computational Overhead: The deployment of KG–LLM models in clinical settings is constrained by high processing demands and latency concerns, which can limit scalability and real-world applicability. To enhance feasibility, optimization techniques such as model distillation and resource-efficient pruning can be used to reduce computational burdens while retaining essential reasoning capabilities [56,57]. In addition, hybrid inference approaches, such as KV caching, optimize memory usage and improve response times, enabling AI-driven decision making to be more responsive to clinical needs [58]. However, a key challenge remains in balancing computational efficiency with clinical reasoning; enhancing one often compromises the other, requiring continuous refinement to ensure that KG–LLM models can be both scalable and practically deployable in hospital environments [59].

(2) Real-Time Adaptability: Effective deployment of CDSSs requires the ability to integrate and respond to evolving medical knowledge in real time. Self-updating knowledge graphs (KGs) and adaptive learning mechanisms enable continuous refinement of recommendations, ensuring alignment with the latest clinical guidelines and research findings [60]. Dynamic learning frameworks enhance CDSS capabilities by leveraging real-time data streams to optimize decision making, resource allocation, and patient-specific interventions [60]. Additionally, integrating real-world evidence strengthens the model’s reliability by ensuring that new updates are validated against retrospective clinical data before being applied in practice [61]. While these techniques improve adaptability, maintaining the accuracy and trustworthiness of continuously updated models remains a critical challenge requiring robust validation pipelines and clinician oversight [62].

(3) Interpretability and Trust: The successful deployment of AI-driven CDSSs in clinical environments depends on healthcare professionals’ ability to understand and trust AI-generated recommendations. Transparency is essential for widespread adoption, and structured reasoning models that provide step-by-step explanations can significantly improve trust in AI outputs [63]. Therefore, integration with KGs is being actively considered to increase trusted knowledge and accuracy. Additionally, methods for visualizing the reasoning process of LLMs, such as LIME and SHAP, can be applied to provide interpretable insights into AI predictions to increase clinician confidence in model outputs [64]. Furthermore, expanding and updating the knowledge graph to reflect the continuous generation of new knowledge over time and the changing nature of previously true knowledge can further increase the reliability of AI results. Future research should refine these techniques to balance interpretability and predictive performance, while ensuring that studies are conducted in compliance with regulatory standards and ethical considerations.

6. Conclusions

In this study, we proposed a novel joint model that combines explainability and adaptability to new knowledge through the complementary integration of LLMs and KGs. By maximizing the strengths of the LLM’s generalization capabilities and the KG’s structured knowledge characteristics, we successfully integrated the LLM and KG by establishing a circular structure. This integration provides a robust tool to support decision making in complex scenarios, such as those found in the medical domain. By extending KGs beyond what traditional LLM models offer, the proposed model compensates for the lack of domain-specific knowledge and enhances explainability in the reasoning process through providing a basis for decision making.

The proposed joint model demonstrated superior performance compared to existing approaches, particularly in tasks such as emotion recognition, mental health cause detection, and stress level detection within the context of CDSS. The experimental results indicate that the joint model outperforms standalone LLMs or KG-based systems overall, effectively addressing their respective limitations and leveraging their synergies. These findings suggest that the proposed model has the potential to be applied in various domains and can contribute to supporting reliable decision making, especially in healthcare. This research could ultimately contribute to building more reliable decision support systems and improving the performance of AI systems in diverse contexts.

In future work, it is necessary to advance the model further by introducing specialized mental health KGs, to enhance its ability to support more complex decision making in CDSSs. Additionally, we also need to devise new approaches to minimize knowledge noise, optimize joint models, and improve robustness and overall performance. Therefore, applying research on reducing knowledge noise to circular joint models will have significant potential for a wide range of applications.

Author Contributions

Conceptualization, C.P., H.L., S.L. and O.J.; methodology, C.P. and H.L.; validation, H.L.; formal analysis, C.P. and H.L.; investigation, C.P.; resources, C.P.; writing—original draft preparation, C.P. and H.L.; writing—review and editing, C.P., H.L., S.L. and O.J.; visualization, H.L.; supervision, S.L. and O.J.; project administration, S.L. and O.J.; funding acquisition, S.L. and O.J. All authors have read and agreed to the published version of the manuscript.

Funding

This study was conducted as part of the Electronics and Telecommunications Research Institute research operation support project (basic project) and Gachon University Research Fund 2023 (No. 25ZT1100, GCU-202110010001).

Data Availability Statement

The dataset source is given in the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Correction Statement

This article has been republished with a minor correction to the Funding statement. This change does not affect the scientific content of the article.

References

Khalifa, M. Clinical Decision Support: Strategies for Success. Procedia Comput. Sci. 2014, 37, 422–427. [Google Scholar] [CrossRef]
Wang, D.; Wang, L.; Zhang, Z.; Wang, D.; Zhu, H.; Gao, Y.; Fan, X.; Tian, F. “Brilliant AI Doctor” in Rural Clinics: Challenges in AI-Powered Clinical Decision Support System Deployment. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; pp. 1–18. [Google Scholar] [CrossRef]
Dang, L.D.; Phan, U.T.P.; Nguyen, N.T.H. GENA: A Knowledge Graph for Nutrition and Mental Health. J. Biomed. Inform. 2023, 145, 104460. [Google Scholar] [CrossRef]
Zhou, H.; Liu, F.; Gu, B.; Zou, X.; Huang, J.; Wu, J.; Li, Y.; Chen, S.S.; Zhou, P.; Liu, J.; et al. A Survey of Large Language Models in Medicine: Progress, Application, and Challenge. arXiv 2024, arXiv:2311.05112. [Google Scholar] [CrossRef]
Chao, P.; Robey, A.; Dobriban, E.; Hassani, H.; Pappas, G.J.; Wong, E. Jailbreaking Black Box Large Language Models in Twenty Queries. arXiv 2024, arXiv:2310.08419. [Google Scholar] [CrossRef]
Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599. [Google Scholar] [CrossRef]
Bahr, L.; Wehner, C.; Wewerka, J.; Bittencourt, J.; Schmid, U.; Daub, R. Knowledge Graph Enhanced Retrieval-Augmented Generation for Failure Mode and Effects Analysis. arXiv 2024, arXiv:2406.18114. [Google Scholar] [CrossRef]
Gao, Y.; Li, R.; Caskey, J.; Dligach, D.; Miller, T.; Churpek, M.M.; Afshar, M. Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction. arXiv 2023, arXiv:2308.14321. [Google Scholar] [CrossRef]
Chandak, P.; Huang, K.; Zitnik, M. Building a Knowledge Graph to Enable Precision Medicine. Sci. Data 2023, 10, 67. [Google Scholar] [CrossRef]
Garg, M. Mental Health Analysis in Social Media Posts: A Survey. Arch. Comput. Methods Eng. 2023, 30, 1819–1842. [Google Scholar] [CrossRef] [PubMed]
Sun, J.; Xu, C.; Tang, L.; Wang, S.; Lin, C.; Gong, Y.; Ni, L.M.; Shum, H.-Y.; Guo, J. Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. arXiv 2024, arXiv:2307.07697. [Google Scholar] [CrossRef]
Musen, M.A.; Shahar, Y.; Shortliffe, E.H. Clinical Decision-Support Systems. In Biomedical Informatics: Computer Applications in Health Care and Biomedicine; Shortliffe, E.H., Cimino, J.J., Eds.; Springer: New York, NY, USA, 2006; pp. 698–736. [Google Scholar] [CrossRef]
Shortliffe, E.H.; Sepúlveda, M.J. Clinical Decision Support in the Era of Artificial Intelligence. JAMA 2018, 320, 2199–2200. [Google Scholar] [CrossRef]
Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A.A.; Lally, A.; Murdock, J.W.; Nyberg, E.; Prager, J.; et al. Building Watson: An Overview of the DeepQA Project. AI Mag. 2010, 31, 59–79. [Google Scholar] [CrossRef]
Rajkomar, A.; Dean, J.; Kohane, I. Machine Learning in Medicine. N. Engl. J. Med. 2019, 380, 1347–1358. [Google Scholar] [CrossRef]
McKinney, S.M.; Sieniek, M.; Godbole, V.; Godwin, J.; Antropova, N.; Ashrafian, H.; Back, T.; Chesus, M.; Corrado, G.S.; Darzi, A.; et al. International Evaluation of an AI System for Breast Cancer Screening. Nature 2020, 577, 89–94. [Google Scholar] [CrossRef]
Lakhani, P.; Sundaram, B. Deep Learning at Chest Radiography: Automated Classification of Pulmonary Tuberculosis by Using Convolutional Neural Networks. Radiology 2017, 284, 574–582. [Google Scholar] [CrossRef]
Sutton, R.T.; Pincock, D.; Baumgart, D.C.; Sadowski, D.C.; Fedorak, R.N.; Kroeker, K.I. An Overview of Clinical Decision Support Systems: Benefits, Risks, and Strategies for Success. NPJ Digit. Med. 2020, 3, 17. [Google Scholar] [CrossRef] [PubMed]
Liu, W.; Zhou, P.; Zhao, Z.; Wang, Z.; Ju, Q.; Deng, H.; Wang, P. K-BERT: Enabling Language Representation with Knowledge Graph. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 2901–2908. [Google Scholar] [CrossRef]
Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced Language Representation with Informative Entities. arXiv 2019, arXiv:1905.07129. [Google Scholar] [CrossRef]
Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
Wang, X.; Gao, T.; Zhu, Z.; Zhang, Z.; Liu, Z.; Li, J.; Tang, J. KEPLER: A Unified Model for Knowledge Embedding and Pre-Trained Language Representation. Trans. Assoc. Comput. Linguist. 2021, 9, 176–194. [Google Scholar] [CrossRef]
Remy, F.; Demuynck, K.; Demeester, T. BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights. arXiv 2023, arXiv:2311.16075. [Google Scholar]
Yang, R.; Liu, H.; Marrese-Taylor, E.; Zeng, Q.; Ke, Y.H.; Li, W.; Cheng, L.; Chen, Q.; Caverlee, J.; Matsuo, Y.; et al. KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques. arXiv 2024, arXiv:2403.05881. [Google Scholar]
Bosselut, A.; Rashkin, H.; Sap, M.; Malaviya, C.; Celikyilmaz, A.; Choi, Y. COMET: Commonsense Transformers for Automatic Knowledge Graph Construction. arXiv 2019, arXiv:1906.05317. [Google Scholar] [CrossRef]
Sap, M.; Bras, R.L.; Allaway, E.; Bhagavatula, C.; Lourie, N.; Rashkin, H.; Roof, B.; Smith, N.A.; Choi, Y. ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3027–3035. [Google Scholar] [CrossRef]
Yasunaga, M.; Ren, H.; Bosselut, A.; Liang, P.; Leskovec, J. QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering. arXiv 2022, arXiv:2104.06378. [Google Scholar] [CrossRef]
Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for Knowledge Graph Completion. arXiv 2019, arXiv:1909.03193. [Google Scholar] [CrossRef]
Jia, M.; Duan, J.; Song, Y.; Wang, J. medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs. arXiv 2024, arXiv:2406.14326. [Google Scholar]
Zuo, K.; Jiang, Y.; Mo, F.; Liò, P. KG4Diagnosis: A Hierarchical Multi-Agent LLM Framework with Knowledge Graph Enhancement for Medical Diagnosis. arXiv 2024, arXiv:2412.16833. [Google Scholar]
Khumrin, P.; Ryan, A.; Juddy, T.; Verspoor, K. DrKnow: A Diagnostic Learning Tool with Feedback from Automated Clinical Decision Support. In Proceedings of the AMIA Annual Symposium, San Francisco, CA, USA, 3–7 November 2018; Volume 2018, pp. 1348–1357. [Google Scholar]
Agrawal, G.; Kumarage, T.; Alghamdi, Z.; Liu, H. Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey. arXiv 2024, arXiv:2311.07914. [Google Scholar] [CrossRef]
Yoo, S.; Jeong, O. Automating the Expansion of a Knowledge Graph. Expert Syst. Appl. 2020, 141, 112965. [Google Scholar] [CrossRef]
Xu, Y.; He, S.; Chen, J.; Wang, Z.; Song, Y.; Tong, H.; Liu, G.; Liu, K.; Zhao, J. Generate-on-Graph: Treat LLM as Both Agent and KG in Incomplete Knowledge Graph Question Answering. arXiv 2024, arXiv:2404.14741. [Google Scholar] [CrossRef]
He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-Enhanced BERT with Disentangled Attention. arXiv 2021, arXiv:2006.03654. [Google Scholar] [CrossRef]
Dernoncourt, F.; Lee, J.Y. PubMed 200k RCT: A Dataset for Sequential Sentence Classification in Medical Abstracts. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers); Kondrak, G., Watanabe, T., Eds.; Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; pp. 308–313. [Google Scholar]
Speer, R.; Chin, J.; Havasi, C. ConceptNet 5.5: An Open Multilingual Graph of General Knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31. [Google Scholar] [CrossRef]
Liu, A.; Wang, B.; Tan, Y.; Zhao, D.; Huang, K.; He, R.; Hou, Y. MTGP: Multi-Turn Target-Oriented Dialogue Guided by Generative Global Path with Flexible Turns. In Findings of the Association for Computational Linguistics: ACL 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 259–271. [Google Scholar] [CrossRef]
Wang, P.; Peng, N.; Ilievski, F.; Szekely, P.; Ren, X. Connecting the Dots: A Knowledgeable Path Generator for Commonsense Question Answering. arXiv 2020, arXiv:2005.00691. [Google Scholar] [CrossRef]
Bodenreider, O. The Unified Medical Language System (UMLS): Integrating Biomedical Terminology. Nucleic Acids Res. 2004, 32, D267–D270. [Google Scholar] [CrossRef] [PubMed]
Ye, J.; Chen, X.; Xu, N.; Zu, C.; Shao, Z.; Liu, S.; Cui, Y.; Zhou, Z.; Gong, C.; Shen, Y.; et al. A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models. arXiv 2023, arXiv:2303.10420. [Google Scholar] [CrossRef]
Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795. [Google Scholar]
Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, E.; Bouchard, G. Complex Embeddings for Simple Link Prediction. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 2071–2080. [Google Scholar]
Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 3982–3992. [Google Scholar] [CrossRef]
Mauriello, M.L.; Lincoln, T.; Hon, G.; Simon, D.; Jurafsky, D.; Paredes, P. SAD: A Stress Annotated Dataset for Recognizing Everyday Stressors in SMS-like Conversational Systems. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems; ACM: Yokohama, Japan, 2021; pp. 1–7. [Google Scholar] [CrossRef]
Garg, M.; Saxena, C.; Saha, S.; Krishnan, V.; Joshi, R.; Mago, V. CAMS: An Annotated Corpus for Causal Analysis of Mental Health Issues in Social Media Posts. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., et al., Eds.; European Language Resources Association: Marseille, France, 2022; pp. 6387–6396. [Google Scholar]
Pirina, I.; Çöltekin, Ç. Identifying Depression on Reddit: The Effect of Training Data. In Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task, Brussels, Belgium, 31 October–1 November 2018; Gonzalez-Hernandez, G., Weissenbacher, D., Sarker, A., Paul, M., Eds.; Association for Computational Linguistics: Brussels, Belgium, 2018; pp. 9–12. [Google Scholar] [CrossRef]
Turcan, E.; McKeown, K. Dreaddit: A Reddit Dataset for Stress Analysis in Social Media. arXiv 2019, arXiv:1911.00133. [Google Scholar] [CrossRef]
Demszky, D.; Movshovitz-Attias, D.; Ko, J.; Cowen, A.; Nemade, G.; Ravi, S. GoEmotions: A Dataset of Fine-Grained Emotions. arXiv 2020, arXiv:2005.00547. [Google Scholar] [CrossRef]
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar] [CrossRef]
Murarka, A.; Radhakrishnan, B.; Ravichandran, S. Detection and Classification of Mental Illnesses on Social Media Using RoBERTa. arXiv 2020, arXiv:2011.11226. [Google Scholar] [CrossRef]
Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv 2023, arXiv:1910.10683. [Google Scholar] [CrossRef]
Pendyala, S.K. Real-Time Analytics and Clinical Decision Support Systems: Transforming Emergency Care. Int. J. Multidiscip. Res. 2024, 6. [Google Scholar] [CrossRef]
Yazzourh, S.; Savy, N.; Saint-Pierre, P.; Kosorok, M.R. Medical Knowledge Integration into Reinforcement Learning Algorithms for Dynamic Treatment Regimes. arXiv 2024, arXiv:2407.00364. [Google Scholar] [CrossRef]
Aiello, T.F.; Teijon-Lumbreras, C.; Gallardo-Pizarro, A.; Monzó-Gallo, P.; Martinez-Urrea, A.; Cuervo, G.; Rio, A.D.; Hernández-Meneses, M.; Morata, L.; Mensa, J.; et al. Strengths and Weaknesses of Computerized Clinical Decision Support Systems: Insights from a Digital Control Center (C3 COVID-19) for Early and Personalized Treatment for COVID-19. Rev. Esp. Quim. 2025, 38, 1–7. [Google Scholar] [CrossRef]
Qu, Y.; Dai, Y.; Yu, S.; Tanikella, P.; Schrank, T.; Hackman, T.; Li, D.; Wu, D. A Novel Compact LLM Framework for Local, High-Privacy EHR Data Applications. arXiv 2024, arXiv:2412.02868. [Google Scholar] [CrossRef]
Puccioni, L.; Farshin, A.; Scazzariello, M.; Wang, C.; Chiesa, M.; Kostic, D. Deriving Coding-Specific Sub-Models from LLMs Using Resource-Efficient Pruning. arXiv 2025, arXiv:2501.05248. [Google Scholar] [CrossRef]
Jiang, C.; Gao, L.; Zarch, H.E.; Annavaram, M. Efficient LLM Inference with I/O-Aware Partial KV Cache Recomputation. arXiv 2024, arXiv:2411.17089. [Google Scholar] [CrossRef]
AbdElhameed, M.; Halim, P. Inference Scaling vs Reasoning: An Empirical Analysis of Compute-Optimal LLM Problem-Solving. arXiv 2024, arXiv:2412.16260. [Google Scholar] [CrossRef]
Jacinth, J.; Krishnaraj, K.; Sekar, P.C.; Kanth, T.V.R. Dynamic Learning-Driven Software Ecosystems: Revolutionizing Healthcare Solutions through Real-Time Adaptation. Int. J. BIM Eng. Sci. 2025, 10, 7–17. [Google Scholar] [CrossRef]
Torres, J.; Alonso, E.; Larburu, N. Real-World Evidence Inclusion in Guideline-Based Clinical Decision Support Systems: Breast Cancer Use Case. In Artificial Intelligence in Medicine; Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A., Eds.; Springer Nature Switzerland: Cham, Switzerland, 2023; pp. 357–361. [Google Scholar] [CrossRef]
Choudhuri, A.; Jang, H.; Segre, A.M.; Polgreen, P.M.; Jha, K.; Adhikari, B. Continually-Adaptive Representation Learning Framework for Time-Sensitive Healthcare Applications. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 4538–4544. [Google Scholar] [CrossRef]
Alkan, M.; Zakariyya, I.; Leighton, S.; Sivangi, K.B.; Anagnostopoulos, C.; Deligianni, F. Artificial Intelligence-Driven Clinical Decision Support Systems. arXiv 2025, arXiv:2501.09628. [Google Scholar] [CrossRef]
Liu, Y.-K.; Tsai, Y.-C. Explainable AI for Trustworthy Clinical Decision Support: A Case-Based Reasoning System for Nursing Assistants. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–28 December 2024; pp. 6502–6509. [Google Scholar] [CrossRef]

Figure 1. Motivating example of the circular structure of the synergistic joint model of the LLM and KG. (a) describes a simple application of an LLM to a CDSS, which produces incorrect classification results and cannot explain how the results were derived. On the other hand, (b) shows the application of our proposed synergistic joint model to a CDSS, which can incorporate new knowledge as the KG is expanded and provides clear explanations for its decisions.

Figure 2. Structure of the joint model with KGs and LLMs.

Figure 3. Various applications of CDSSs.

Table 2. Statistics for datasets.

Dataset	Cause/Factor Detection		Status/Level Detection		Emotion Recognition
Dataset	SAD	CAMS	DR	Dreaddit	GoEmotions
#Train	5335	3946	2839	2267	43,410
#Valid	667	493	355	567	5426
#Test	667	494	355	715	5427
#Total	6850	5052	3553	3553	54,263
#Label	9	6	4	2	28

Table 3. CDSS downstream task experiment results.

Model			Cause/Factor Detection		Status/Level Detection		Emotion Recognition
Model			SAD	CAMS	DR	Dreaddit	GoEmotions
Original model	RoBERTa	Precision	0.5722	0.1828	0.4795	0.6824	0.6853
		Recall	0.5652	0.2871	0.6366	0.6825	0.4133
		F1 score	0.5633	0.221	0.5468	0.6821	0.5306
	Mental RoBERTa	Precision	0.5805	0.2912	0.4841	0.69	0.6947
		Recall	0.5817	0.4069	0.6451	0.6911	0.4325
		F1 score	0.5733	0.3339	0.553	0.6912	0.5457
Joint model	RoBERTa	Precision	0.6124	0.3109	0.6071	0.6995	0.7407
		Recall	0.6177	0.415	0.6761	0.6993	0.5032
		F1 score	0.6131	0.3501	0.6077	0.6988	0.581
	Mental RoBERTa	Precision	0.6405	0.3863	0.5725	0.7024	0.7391
		Recall	0.6567	0.4312	0.6986	0.7021	0.4881
		F1 score	0.6452	0.3786	0.6292	0.7014	0.5733

Table 4. Impact of KG expansion and layer configuration on joint model performance.

	Cause/Factor Detection		Status/Level Detection		Emotion Recognition
	SAD	CAMS	DR	Dreaddit	GoEmotions
Joint Model	0.6131	0.3501	0.6077	0.6988	0.581
- Graph Expansion Layer	0.5956	0.3003	0.5725	0.6831	0.5625
- Sub-Graph Extraction Layer	0.5633	0.221	0.5468	0.6821	0.5306

Table 5. Experimental results comparing performance based on the number of top-k triples used in the Concat layer.

	Cause/Factor Detection		Status/Level Detection		Emotion Recognition
	SAD	CAMS	DR	Dreaddit	GoEmotions
k = 3	0.6131	0.3501	0.6077	0.6988	0.581
k = 1	0.5847	0.3268	0.598	0.6923	0.5464
Original	0.5633	0.221	0.5468	0.6821	0.5306

Table 6. Experimental results comparing performance based on the input text structure used in the Concat layer.

		Cause/Factor Detection		Status/Level Detection		Emotion Recognition
		SAD	CAMS	DR	Dreaddit	GoEmotions
RoBERTa	Sentence with T5	0.5882	0.3063	0.5803	0.6844	0.5803
RoBERTa	Triple(proposed)	0.6131	0.3501	0.6077	0.6988	0.581
Mental RoBERTa	Sentence with T5	0.6027	0.3494	0.6054	0.694	0.5602
Mental RoBERTa	Triple(proposed)	0.6452	0.3786	0.6292	0.7014	0.5733

Table 7. Experimental results comparing performance based on whether relation filtering was performed on the Sub-Graph Extraction layer.

		Cause/Factor Detection		Status/Level Detection		Emotion Recognition
		SAD	CAMS	DR	Dreaddit	GoEmotions
RoBERTa	All relations	0.5882	0.3063	0.5803	0.6844	0.5803
RoBERTa	Filtering(proposed)	0.6131	0.3501	0.6077	0.6988	0.581
Mental RoBERTa	All relations	0.6027	0.3494	0.6054	0.694	0.5602
Mental RoBERTa	Filtering(proposed)	0.6452	0.3786	0.6292	0.7014	0.5733

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Park, C.; Lee, H.; Lee, S.; Jeong, O. Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems. Mathematics 2025, 13, 949. https://doi.org/10.3390/math13060949

AMA Style

Park C, Lee H, Lee S, Jeong O. Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems. Mathematics. 2025; 13(6):949. https://doi.org/10.3390/math13060949

Chicago/Turabian Style

Park, Chaelim, Hayoung Lee, Seonghee Lee, and Okran Jeong. 2025. "Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems" Mathematics 13, no. 6: 949. https://doi.org/10.3390/math13060949

APA Style

Park, C., Lee, H., Lee, S., & Jeong, O. (2025). Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems. Mathematics, 13(6), 949. https://doi.org/10.3390/math13060949

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Synergistic Joint Model of Knowledge Graph and LLM for Enhancing XAI-Based Clinical Decision Support Systems

Abstract

1. Introduction

2. Related Work

2.1. Clinical Decision Support System (CDSS)

2.1.1. Evolution of CDSSs and Key Features

2.1.2. Converging CDSSs and AI

2.1.3. Limitations of Traditional CDSSs

2.2. Integrating LLMs with KGs

2.2.1. KG-Enhanced LLMs

2.2.2. LLM-Augmented KGs

2.2.3. Joint Optimization Models

3. Proposed Method

3.1. Structure of the Proposed Method

3.2. Layers That Comprise the Joint Model

3.2.1. Notation

3.2.2. Sub-Graph Extraction Layer

3.2.3. Graph Expansion Layer

3.2.4. Triple Scoring Layer

3.2.5. Concat Layer

4. Evaluation

4.1. Dataset

4.2. Experiment Setups

4.3. Experiment and Results

4.4. Ablation Study

4.5. Analysis of Knowledge Noise

5. Applications and Discussions

5.1. CDSS Applications

5.1.1. Disease Detection

5.1.2. Beside Decision Support

5.1.3. Treatment and Prescription

5.1.4. Clinical Practice

5.2. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Correction Statement

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI