Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Biomedical Question Answering
Abstract
1. Introduction
Problem Statement and Research Goals
- Reducing Data Gaps and Hallucinations: We aim to reduce inaccuracies and fabrications in LLM responses by grounding the LLMs in Knowledge Graphs and by applying a query-checking algorithm that verifies and corrects LLM-generated Cypher queries, targeting common syntax and schema-alignment errors. We also optimize prompts to guide LLMs toward more accurate queries, which makes responses more reliable.
- Evaluating LLM Performance: We assess the performance of several LLMs, including GPT-4 Turbo and Llama 3:70b, on a custom benchmark dataset to identify model strengths and weaknesses and explore improvements for open-source models via prompt engineering.
- Creating a Benchmark Dataset: We developed a dataset of 50 biomedical questions based on a subset of PrimeKG for evaluating LLMs’ ability to generate accurate Cypher queries for a specific biomedical Knowledge Graph, providing a foundation for future research in this domain.
- Designing a User-Friendly Interface: To make the system accessible, we created a web-based interface where users can enter natural language questions, view the generated and corrected Cypher queries, and inspect the results.
2. A New Approach for LLM-Based Knowledge Graph Queries
- The user’s question, along with the graph schema, is passed to the LLM, which generates a Cypher query.
- This generated query is then subjected to the query-checking algorithm for validation and potential correction.
- Finally, the validated query is executed on the Knowledge Graph, and the results are returned.
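The three steps above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the function name `answer_question`, the prompt layout, and the three stand-in callables are assumptions introduced here so the flow can be run without an LLM or a graph database.

```python
from typing import Callable, Dict, List


def answer_question(
    question: str,
    schema: str,
    generate_cypher: Callable[[str], str],
    check_query: Callable[[str], str],
    run_query: Callable[[str], List[str]],
) -> Dict[str, object]:
    """Run generation, checking, and execution; keep intermediates for inspection."""
    # Step 1: the question is passed to the LLM together with the graph schema.
    # The prompt layout is a placeholder, not the prompt used in the paper.
    prompt = f"Schema:\n{schema}\n\nQuestion: {question}\nCypher:"
    generated = generate_cypher(prompt)
    # Step 2: the generated query is validated and, where possible, corrected.
    checked = check_query(generated)
    # Step 3: the validated query is executed on the Knowledge Graph.
    results = run_query(checked)
    return {"generated": generated, "checked": checked, "results": results}


# Hypothetical stand-ins so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return ('MATCH (d:disease {name:"multiple sclerosis"})'
            "<-[:contraindication]-(dr:drug) RETURN dr")


def fake_checker(query: str) -> str:
    return query.replace("RETURN dr", "RETURN dr.name")


def fake_graph(query: str) -> List[str]:
    return ["Ascorbic acid", "Zinc gluconate"] if query.endswith("dr.name") else []


out = answer_question(
    "What are the names of the drugs that are contraindicated "
    "when a patient has multiple sclerosis?",
    "(drug)-[:contraindication]->(disease)",
    fake_llm,
    fake_checker,
    fake_graph,
)
```

Passing the three stages in as callables keeps the generated and the corrected query available side by side, which is what the interface described later displays.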
2.1. Step 1: Initial Cypher Query Generation
2.2. Step 2: Query Checker
- “Name” attribute missing from the returned node: The output RETURN dr; would return the entire drug node, including all its properties, rather than just the drug names. Thus, the syntax checker would refine the query to RETURN dr.name;.
- Wrong node type: In (d:pathway {name:"multiple sclerosis"}), the LLM incorrectly identifies “multiple sclerosis” as a pathway instead of a disease. The node checker would correct this error by modifying the Cypher query to (d:disease {name:"multiple sclerosis"}).
- Relationship direction error: The query incorrectly directs the “contraindication” relationship as -[:contraindication]->, pointing from the disease to the drug. The correct direction should have the relationship pointing from the drug to the disease. This will be corrected by the relation checker to: <-[:contraindication]-.
- Syntax Node Checker: This component ensures that the Cypher query returns node names by appending the .name property to each node variable in the RETURN statement. It also verifies that every variable in the RETURN clause is bound to its node type in the MATCH statement, in the form node:node_type.
- Node Checker: This checker validates the types of nodes referenced in the query, ensuring they are the correct type for the item that was extracted from the question. If an incorrect node type is identified, it automatically substitutes the correct type. It also evaluates whether the relationships involving the corrected node type are appropriate and adjusts them if necessary, unless no compatible relationships exist, in which case the step is skipped.
- Relation Checker: This component verifies the directionality of relationships between nodes, ensuring that they are oriented correctly within the query. If any relationships are found to be reversed, the Relation Checker automatically corrects their direction to maintain the integrity of the query’s logic.
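The three checks can be sketched on the running example with a few regular expressions. This is a simplified illustration under stated assumptions, not the algorithm from the paper: `ENTITY_TYPES` and `VALID_EDGES` are a hypothetical two-entry schema excerpt, every relationship is assumed to be directed, and only a single node-relationship-node pattern is handled.

```python
import re

# Hypothetical schema excerpt; a real checker would read this from the graph.
ENTITY_TYPES = {"multiple sclerosis": "disease"}
# (source type, relation, target type) combinations that exist in the graph.
VALID_EDGES = {("drug", "contraindication", "disease")}

# Matches (var:label) or (var:label {props}).
NODE = r"\((\w+):(\w+)(\s*\{[^}]*\})?\)"


def fix_return_names(query: str) -> str:
    """Append .name to bare node variables in the RETURN clause."""
    head, sep, tail = query.rpartition("RETURN")
    if not sep:
        return query
    items = [v.strip() for v in tail.strip().rstrip(";").split(",")]
    items = [v if "." in v else f"{v}.name" for v in items]
    return f"{head}RETURN {', '.join(items)};"


def fix_node_types(query: str) -> str:
    """Replace a node label when the named entity has a different known type."""
    def repl(m: re.Match) -> str:
        var, label, props = m.group(1), m.group(2), m.group(3) or ""
        name = re.search(r'name\s*:\s*"([^"]+)"', props)
        if name and name.group(1) in ENTITY_TYPES:
            label = ENTITY_TYPES[name.group(1)]
        return f"({var}:{label}{props})"
    return re.sub(NODE, repl, query)


def fix_relation_directions(query: str) -> str:
    """Flip an arrow when only the reversed direction exists in the schema."""
    pattern = re.compile(NODE + r"(<?-)\[:(\w+)\](->?)" + NODE)

    def repl(m: re.Match) -> str:
        left = f"({m.group(1)}:{m.group(2)}{m.group(3) or ''})"
        right = f"({m.group(7)}:{m.group(8)}{m.group(9) or ''})"
        rel = m.group(5)
        points_right = m.group(6) == "->"
        src, dst = ((m.group(2), m.group(8)) if points_right
                    else (m.group(8), m.group(2)))
        if (src, rel, dst) not in VALID_EDGES and (dst, rel, src) in VALID_EDGES:
            points_right = not points_right
        arrow = f"-[:{rel}]->" if points_right else f"<-[:{rel}]-"
        return f"{left}{arrow}{right}"

    return pattern.sub(repl, query)


def check_query(query: str) -> str:
    # Node types are fixed before directions, since direction depends on type.
    return fix_relation_directions(fix_node_types(fix_return_names(query)))


llm_output = ('MATCH (d:pathway {name:"multiple sclerosis"})'
              "-[:contraindication]->(dr:drug) RETURN dr;")
corrected = check_query(llm_output)
print(corrected)
```

On the faulty query assembled from the three error examples, the sketch produces the corrected form described in the text. A production checker would need a proper Cypher parser to cover multi-hop chains, relationship variables, and undirected patterns.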
2.3. Step 3: Querying the Knowledge Graph
3. Data
3.1. PrimeKG-Based Knowledge Graph
3.2. Tailored Question/Answer Set
3.2.1. One-Hop
- Structure 1: A single, direct relationship between two nodes, typically excluding bidirectional relations. Example question: What are the names of the drugs that are contraindicated when a patient has multiple sclerosis?
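For illustration, a one-hop question of this kind maps to a single MATCH pattern. The helper below is a hypothetical sketch (the function `one_hop_query` and its parameters are not from the paper); the node labels and the relation name are taken from the query fragments shown in Section 2.2.

```python
def one_hop_query(
    anchor_label: str,
    anchor_name: str,
    relation: str,
    target_label: str,
    target_points_to_anchor: bool = True,
) -> str:
    """Build a Cypher query that returns the names of nodes one hop from an anchor."""
    anchor = f'(a:{anchor_label} {{name:"{anchor_name}"}})'
    target = f"(t:{target_label})"
    # The arrow direction must match how the relation is stored in the graph.
    if target_points_to_anchor:
        arrow = f"<-[:{relation}]-"
    else:
        arrow = f"-[:{relation}]->"
    return f"MATCH {anchor}{arrow}{target} RETURN t.name;"


query = one_hop_query("disease", "multiple sclerosis", "contraindication", "drug")
print(query)
```

String interpolation keeps the sketch short; when executing against a live database, passing the entity name as a query parameter is the safer choice.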
3.2.2. Two-Hop
- Structure 2: One entity is directly connected to two other entities; see Figure 2. Example question: What side effects does a drug have that is indicated for Richter syndrome?
- Structure 3: A linear arrangement (chain) where one entity is connected to a second, which in turn is connected to a third; see Figure 3. Example question: What are phenotypes that gene POMC is associated with that also occur in neuromyelitis optica?
3.2.3. Three-Hop
- Structure 4: A sequential chain of four connected items; see Figure 4. Example question: What pathways do the exposures that can lead to multiple sclerosis interact with?
- Structure 5: An entity has relationships interfacing with two other entities, one of which interfaces with a fourth entity; see Figure 5. Example question: Which biological processes are affected by the gene APOE and are also affected by an exposure to something that is linked to multiple sclerosis?
4. Experimental Setup
Evaluation Metrics
5. Cypher Generation with Zero-Shot Prompting
5.1. Results
5.2. Influence of the Query Checker
5.3. Influence of Paraphrased Questions
6. Cypher Generation with Optimized Prompting
6.1. Multi-Shot Prompts
6.2. Promptcrafting
- Simplified Prompt: Removed detailed instructions to evaluate the effect of minimal guidance (Appendix D.4).
- Syntax Emphasis Prompt: Added a sentence instructing the model to focus on correct syntax usage (Appendix D.5).
- Social Engineering Prompt: Introduced a scenario where the prompter could face consequences (e.g., being fired) if the LLM made mistakes (Appendix D.6).
- Expert Role Prompt: Positioned the LLM as a Cypher query expert to encourage more accurate query generation (Appendix D.7).
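The four variants can be thought of as small edits to one base prompt. The sketch below only illustrates that idea: all prompt wording is hypothetical (the actual prompts are in Appendix D), and `build_prompt` and `VARIANT_TEXT` are names introduced here.

```python
# Hypothetical base instructions; not the wording used in the paper.
DETAILED = (
    "Generate a Cypher query for the question below. "
    "Use only node types and relationships from the schema. "
    "Return only the query."
)

# Each variant changes one aspect of the base prompt.
VARIANT_TEXT = {
    "baseline": DETAILED,
    # Simplified: detailed instructions removed.
    "simplified": "Write a Cypher query for the question.",
    # Syntax emphasis: one added sentence about syntax.
    "syntax": DETAILED + " Pay close attention to correct Cypher syntax.",
    # Social engineering: consequences for the prompter.
    "social": DETAILED + " If the query contains mistakes, I could be fired.",
    # Expert role: the model is positioned as a Cypher expert.
    "expert": "You are an expert in writing Cypher queries. " + DETAILED,
}


def build_prompt(variant: str, schema: str, question: str) -> str:
    """Assemble a full prompt for one variant; raises KeyError if unknown."""
    instructions = VARIANT_TEXT[variant]
    return f"{instructions}\n\nSchema:\n{schema}\n\nQuestion: {question}\nCypher:"


prompts = {
    name: build_prompt(name, "(drug)-[:indication]->(disease)",
                       "Which drugs cause Alkalosis as a side effect?")
    for name in VARIANT_TEXT
}
```

Keeping the schema and question slots identical across variants means any difference in the generated queries can be attributed to the instruction text alone.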
7. User Interface
8. Related Work
- Direct Translation and Fine-Tuning of Language Models
- Chain-of-Thought Prompting and Few-Shot Learning
- Semantic Parsing and Template-Based Methods
- Subgraph Extraction and Contextualization
- Entity and Relation Matching Techniques
9. Conclusions
10. Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. PrimeKG Structure and Changes
| Type of Change | PrimeKG | Adjusted Version | Reason |
|---|---|---|---|
| 2-hop subgraph around ‘multiple sclerosis’ | ∼8 M triples, 30 distinct relations, ∼130k unique items | ∼45k triples, 22 distinct relations, ∼7400 unique items | Makes querying the graph faster |
| Relation direction | All relations bidirectional | Only self-relations bidirectional | Shrinks the graph; a single direction reads more naturally as a sentence |
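The 2-hop reduction in the first row can be sketched as a breadth-first expansion from the seed node. This is one possible reading, labeled as an assumption: the sketch keeps every triple whose two endpoints both lie within two hops of the seed, and the function name and toy node names (`n1`, `n2`, `n3`, `r`) are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def k_hop_subgraph(triples: List[Triple], seed: str, hops: int = 2) -> List[Triple]:
    """Keep triples whose endpoints are both within `hops` steps of `seed`."""
    # Direction is ignored when measuring distance from the seed.
    neighbors = defaultdict(set)
    for head, _, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)

    reached = {seed}
    frontier = {seed}
    for _ in range(hops):
        # Expand one hop and drop nodes that were already visited.
        frontier = {n for node in frontier for n in neighbors[node]} - reached
        reached |= frontier

    return [t for t in triples if t[0] in reached and t[2] in reached]


toy = [
    ("n1", "r", "multiple sclerosis"),  # n1 is one hop away
    ("n1", "r", "n2"),                  # n2 is two hops away
    ("n2", "r", "n3"),                  # n3 is three hops away
]
sub = k_hop_subgraph(toy, "multiple sclerosis")
```

In the toy graph the third triple is dropped because `n3` lies three hops from the seed.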
Appendix B. Questions and Answers
| Question | Answer |
|---|---|
| What are the names of the drugs that are contraindicated when a patient has multiple sclerosis? | Ascorbic acid, Zinc gluconate |
| Which drugs are contraindicated when I have dermatitis? | Hydrocortisone, Cortisone acetate, Triamcinolone |
| Which drugs cause Alkalosis as a side effect? | Methylprednisolone, Prednisone |
| Which drugs have Anxiety as side effect? | Methylprednisolone, Prednisone, Hydrocortisone, Dexamethasone, Betamethasone |
| Which genes are expressed in the eye? | CBLB, RBPJ, KCNJ10, SLC11A1, CLEC16A |
| What genes are expressed in the nasopharynx? | RBPJ |
| What proteins interact with GRB2? | CBLB, GC, IL7R, P2RX7, TNFRSF1A, VCAM1 |
| What proteins interact with the protein KPNA2? | CBLB, IL7R, IL10, VCAM1 |
| What are off-label uses of Zinc gluconate? | methemoglobinemia, sulfhemoglobinemia, attention deficit-hyperactivity disorder, attention deficit hyperactivity disorder, inattentive type, gastroenteritis, acrodermatitis enteropathica |
| What are off-label uses for Ascorbic acid? | Coronavinae infectious disease |
| In which cellular components is a protein expressed that is associated with the Spasticity phenotype? | plasma membrane, cytoplasm, integral component of plasma membrane, membrane raft, actin cytoskeleton, glutamatergic synapse, mitochondrial outer membrane, integral component of presynaptic membrane, GABA-ergic synapse, growth cone, integral component of mitochondrial membrane |
| In which cellular components is a protein expressed that is associated with Nausea? | cytoplasm, extracellular space, extracellular region, secretory granule, secretory granule lumen |
| What side effects does a drug have that is indicated for Richter syndrome? | Abdominal distention, Abdominal pain, Adrenal insufficiency, Alkalosis, Alopecia, Alopecia of scalp, Anaphylactic shock, Poor appetite, Anxiety, Arrhythmia, Arthralgia, Hypertrophic cardiomyopathy, Corneal ulceration, Delusions, Inflammatory abnormality of the skin, Atopic dermatitis, Vertigo, Bruising susceptibility, Edema, Abnormality of the endocrine system, Seizure, Abnormality of the eye, Fatigue, Fever, Erythema, Recurrent fractures, Gastrointestinal hemorrhage, Glycosuria, Hallucinations, Headache, Cardiac arrest, Cardiomegaly, Congestive heart failure, Hepatomegaly, Hirsutism, Hypercholesterolemia, Hyperglycemia, Hypernatremia, Hyperthyroidism, Hypothyroidism, Abnormal joint morphology, Arthropathy, Lethargy, Leukocytosis, Nausea, Abnormal peripheral nervous system morphology, Polyneuropathy, Peripheral neuropathy, Avascular necrosis, Osteoporosis, Generalized osteoporosis, Pancreatitis, Optic neuritis, Papilledema, Paraplegia, Paresthesia, Peptic ulcer, Petechiae, Pruritus, Pulmonary edema, Facial erythema, Loss of consciousness, Syncope, Tachycardia, Telangiectasia, Thrombophlebitis, Vasculitis, Vomiting, Increased body weight, Agitation, Emotional lability, Mood swings, Mood changes, Dermal atrophy, EEG abnormality, Impaired glucose tolerance, Growth delay, Increased intracranial pressure, Muscle weakness, Tendon rupture, Striae distensae, Irregular menstruation, Malnutrition, Abnormality of the skin, Paraparesis, Myalgia, Polyphagia, Memory impairment, Ocular hypertension, Subcapsular cataract, Personality changes, Vertebral compression fractures, Myopathy, Lipoatrophy, Mania, Blurred vision, Scaling skin, Hyperactivity, Hyperkinetic movements, Bradycardia, Dementia, Facial edema, Hyperhidrosis, Dry skin |
| If I take drugs for dry eye syndrome, what side effects will they have? | Abdominal distention, Anaphylactic shock, Arrhythmia, Hypertrophic cardiomyopathy, Corneal ulceration, Atopic dermatitis, Bruising susceptibility, Edema, Inflammatory abnormality of the skin, Headache, Cardiac arrest, Cardiomegaly, Congestive heart failure, Hepatomegaly, Hirsutism, Hypernatremia, Abnormal joint morphology, Arthropathy, Keratitis, Mydriasis, Nausea, Abnormal peripheral nervous system morphology, Polyneuropathy, Peripheral neuropathy, Osteoporosis, Generalized osteoporosis, Pancreatitis, Optic neuritis, Papilledema, Paraplegia, Paresthesia, Peptic ulcer, Petechiae, Pulmonary edema, Seizure, Vertigo, Loss of consciousness, Syncope, Tachycardia, Thrombophlebitis, Vasculitis, Increased body weight, Hypokalemic alkalosis, Emotional lability, Mood swings, Mood changes, Avascular necrosis, Increased intracranial pressure, Muscle weakness, Tendon rupture, Striae distensae, Irregular menstruation, Paraparesis, Polyphagia, Ocular hypertension, Subcapsular cataract, Erythema, Personality changes, Vertebral compression fractures, Myopathy, Blurred vision, Bradycardia, Visual impairment, Pain, Hyperhidrosis, Dry skin |
| In what anatomical structures is there no expression of proteins that interact with leukocyte migration? | cerebellar vermis |
| In what anatomical structures is there no expression of proteins that interact with cell migration? | vastus lateralis, cerebellar vermis |
| What genes and biological processes does an exposure to Tobacco Smoke Pollution interact with? | regulation of blood pressure, triglyceride metabolic process, respiratory system process, gene expression, DNA methylation, spermatogenesis, cognition, regulation of DNA methylation, developmental growth, immune response, cholesterol metabolic process, behavior, lipid metabolic process, DNA methylation on cytosine within a CG sequence, inflammatory response, regulation of gene silencing by miRNA, regulation of respiratory gaseous exchange, DNA metabolic process, respiratory gaseous exchange by respiratory system, menopause, circulatory system process, mRNA methylation, feeding behavior, hypersensitivity, estrone secretion, alanine metabolic process, lactate metabolic process, IFNG, IL1B, ARNT, ATF6B, BNIP3L, DDB2, FTH1, GADD45A, RAD51, TP53, TXN, AHRR, CNTNAP2, CYP1A1, EXT1, GFI1, HLA-DPB2, MYO1G, RUNX1, TTC7B, F2RL3, SLC7A8, C11orf52, FRMD4A, IL1B, IFNG, IL4, TNF, SERPINE1 |
| What genes and biological processes does an exposure to Lead interact with? | regulation of blood pressure, gene expression, cognition, head development, regulation of DNA methylation, glucose metabolic process, mitochondrial DNA metabolic process, behavior, lipid metabolic process, DNA methylation on cytosine within a CG sequence, regulation of heart rate, memory, regulation of systemic arterial blood pressure, response to oxidative stress, psychomotor behavior, hemoglobin biosynthetic process, visual perception, developmental process involved in reproduction, metabolic process, regulation of humoral immune response mediated by circulating immunoglobulin, glomerular filtration, cortisol metabolic process, detection of oxidative stress, tissue homeostasis, social behavior, calcium ion homeostasis, heart contraction, humoral immune response, lymphocyte mediated immunity, transport, ethanolamine metabolic process, glutamate metabolic process, urea metabolic process, regulation of cortisol secretion, inositol metabolic process, DNA methylation involved in gamete generation, regulation of multicellular organism growth, regulation of amyloid-beta formation, regulation of genetic imprinting, positive regulation of multicellular organism growth, sensory perception of sound, homocysteine metabolic process, response to auditory stimulus, renal filtration, response to lead ion, detection of mechanical stimulus involved in sensory perception, cellular amine metabolic process, choline metabolic process, creatine metabolic process, brain development, ICAM1, ADAM9, LRPAP1, RTN4, APP, IL6, TNFRSF1B, CRP, ICAM1, H19, HYMAI, IGF2, PEG3, PLAGL1, MIR10A, MIR146A, MIR190B, MIR431, MIR651, IGF1, HEXB, B2M, MIR222, ALB, PON1 |
| With which pathways do proteins interact that are associated with sleep-wake disorder? | Interleukin-1 processing, Pyroptosis, CLEC7A/inflammasome pathway, Interleukin-10 signaling, Interleukin-4 and Interleukin-13 signaling, Interleukin-1 signaling, Purinergic signaling in leishmaniasis infection, Opioid Signalling, Androgen biosynthesis, Glucocorticoid biosynthesis, G-protein activation, Peptide hormone biosynthesis, Endogenous sterols, Peptide ligand-binding receptors, G alpha (s) signalling events, G alpha (i) signalling events, Defective ACTH causes obesity and POMCD, FOXO-mediated transcription of oxidative stress, metabolic and neuronal genes, ADORA2B mediated anti-inflammatory cytokines production |
| With which pathways do proteins interact that are associated with sickle cell anemia? | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell, Integrin cell surface interactions, Interleukin-4 and Interleukin-13 signaling, Interferon gamma signaling |
| What are phenotypes that gene POMC is associated with that also occur in neuromyelitis optica? | Ocular pain, Nausea |
| What are phenotypes that gene IFNG is associated with that also occur in neuromyelitis optica? | Nausea |
| What drugs should I take if I have a disease because of an exposure to Lead? | Methylprednisolone, Prednisone, Dalfampridine, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Natalizumab, Teriflunomide, Ozanimod, Triamcinolone |
| What drugs should I take if I have a disease because of an exposure to Tobacco Smoke Pollution? | Methylprednisolone, Prednisone, Dalfampridine, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Natalizumab, Teriflunomide, Ozanimod, Triamcinolone |
| What diseases are the diseases where Dalfampridine is contraindicated for related to? | brain disease |
| What diseases are the diseases where Morphine is contraindicated for related to? | multiple sclerosis, megalencephalic leukoencephalopathy with cysts, encephalopathy, acute, infection-induced, diabetic encephalopathy, hydrocephalus, brain compression, cerebral sarcoidosis, hepatic encephalopathy, visual pathway disease, central nervous system origin vertigo, cerebellar disease, olfactory nerve disease, thalamic disease, pituitary gland disease, disorder of optic chiasm, basal ganglia disease, epilepsy, mental disorder, subarachnoid hemorrhage (disease), central nervous system cyst (disease), migraine disorder, prion disease, delayed encephalopathy after acute carbon monoxide poisoning, cerebral malaria, akinetic mutism, Reye syndrome, brain edema, encephalomalacia, intracranial hypertension, intracranial hypotension, kernicterus, Wernicke encephalopathy, encephalopathy, recurrent, of childhood, progressive bulbar palsy, cerebrovascular disorder, disorder of medulla oblongata, brain inflammatory disease, narcolepsy-cataplexy syndrome, meningoencephalocele, cerebral sinovenous thrombosis, autoimmune encephalopathy with parasomnia and obstructive sleep apnea, neurometabolic disease, cerebral organic aciduria, narcolepsy without cataplexy, cerebral lipidosis with dementia, brain neoplasm, colpocephaly, corpus callosum agenesis of blepharophimosis robin type, corpus callosum dysgenesis X-linked recessive, corpus callosum dysgenesis cleft spasm, corpus callosum dysgenesis hypopituitarism, cerebral degeneration, brain injury, encephalopathy, cluster headache syndrome, cerebral cortex disease, midbrain disease, central nervous system disease |
| Which cellular components do the proteins an exposure to Lead affects interact with? | extracellular space, extracellular exosome, collagen-containing extracellular matrix, cell surface, plasma membrane, membrane, integral component of plasma membrane, external side of plasma membrane, membrane raft, focal adhesion, immunological synapse |
| Which cellular components do the proteins an exposure to Tobacco Smoke Pollution affects interact with? | extracellular space, extracellular region, cytosol, lysosome |
| What genes are associated with diseases that are linked to an exposure to Lead? | APOE, BCHE, CASP1, CBLB, CD6, CD40, CD58, CNR1, GC, HLA-DPB1, HLA-DQB1, HLA-DRA, HLA-DRB1, ICAM1, IRF8, IFNB1, IFNG, RBPJ, IL1B, IL1RN, IL2RA, IL7, IL7R, IL10, IL12A, IL17A, KCNJ10, MCAM, CLDN11, P2RX7, PDCD1, POMC, NECTIN2, SELE, SLC11A1, STAT4, TNFAIP3, TNFRSF1A, TYK2, VCAM1, VDR, TNFSF14, KIF1B, CLEC16A, NLRP3 |
| What genes are associated with diseases that are linked to an exposure to Mercury? | APOE, BCHE, CASP1, CBLB, CD6, CD40, CD58, CNR1, GC, HLA-DPB1, HLA-DQB1, HLA-DRA, HLA-DRB1, ICAM1, IRF8, IFNB1, IFNG, RBPJ, IL1B, IL1RN, IL2RA, IL7, IL7R, IL10, IL12A, IL17A, KCNJ10, MCAM, CLDN11, P2RX7, PDCD1, POMC, NECTIN2, SELE, SLC11A1, STAT4, TNFAIP3, TNFRSF1A, TYK2, VCAM1, VDR, TNFSF14, KIF1B, CLEC16A, NLRP3 |
| What side effects of the drug Methylprednisolone are similar to the multiple sclerosis phenotype? | Emotional lability, Paresthesia, Muscle Weakness, Paraplegia, Optic neuritis, Nausea |
| What side effects of the drug Prednisone are similar to the multiple sclerosis phenotype? | Paresthesia, Optic neuritis, Muscle weakness, Emotional lability, Nausea, Paraplegia |
| Which drug is contraindicated in a disease that was linked to an exposure to something that interacts with the protein IFNG? | Ascorbic acid, Zinc gluconate, Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Triamcinolone |
| Which drug is contraindicated in a disease that was linked to an exposure to something that interacts with the protein IL1B? | Ascorbic acid, Zinc gluconate, Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Triamcinolone |
| What pathways do the exposures that can lead to multiple sclerosis interact with? | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell, Integrin cell surface interactions, Interleukin-10 signaling, Interleukin-4 and Interleukin-13 signaling, Interferon gamma signaling, Regulation of IFNG signaling, RUNX1 and FOXP3 control the development of regulatory T lymphocytes (Tregs), Gene and protein expression by JAK-STAT signaling after Interleukin-12 stimulation, Interleukin-1 processing, Pyroptosis, CLEC7A/inflammasome pathway, Interleukin-1 signaling, Purinergic signaling in leishmaniasis infection |
| What pathways do the exposures that can lead to atopic eczema interact with? | Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell, Integrin cell surface interactions, Interleukin-10 signaling, Interleukin-4 and Interleukin-13 signaling, Interferon gamma signaling |
| Which exposure can affect drugs that are approved for off-label-use for dermatitis? | Chlorpyrifos, glyphosate, Insecticides, Organophosphates, Pesticides, Lead, Tobacco Smoke Pollution |
| Which exposure can affect drugs that are approved for off-label-use for heart disease? | Chlorpyrifos, glyphosate, Insecticides, Organophosphates, Pesticides, Lead, Tobacco Smoke Pollution |
| Which drugs have synergistic interactions with drugs that are affected by proteins that CASP1 has protein–protein interactions with? | Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Zinc gluconate, Betamethasone, Natalizumab, Teriflunomide, Ozanimod, Triamcinolone |
| Which drugs have synergistic interactions with drugs that are affected by proteins that IL1B has protein–protein interactions with? | Methylprednisolone, Prednisone, Dalfampridine, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Zinc gluconate, Betamethasone, Triamcinolone |
| Which biological processes are affected by the gene APOE which are also affected by an exposure to something that is linked to multiple sclerosis? | cholesterol homeostasis, triglyceride metabolic process, cholesterol metabolic process, gene expression |
| Which biological processes are affected by the gene IL1B which are also affected by an exposure to something that is linked to multiple sclerosis? | inflammatory response, immune response |
| What drugs should I not take for a disease that I got because exposure to Tobacco Smoke Pollution interacts with a protein relevant to that disease? | Ascorbic acid, Zinc gluconate, Methylprednisolone, Prednisone, Dalfampridine, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Ozanimod, Triamcinolone, Tolvaptan, Nelarabine |
| What drugs should I not take for a disease that I got because exposure to Lead interacts with a protein relevant to that disease? | Ascorbic acid, Zinc gluconate, Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Hydrocortisone acetate, Dexamethasone, Betamethasone, Ozanimod, Triamcinolone, Tolvaptan, Nelarabine |
| What side effects does Prednisone have that also occur when a protein is expressed that is influenced by exposure to Tobacco Smoke Pollution? | Memory impairment, Fever, Leukocytosis, Lethargy, Cardiomegaly, Nausea, Seizure, Vomiting |
| What side effects does Dexamethasone have that also occur when a protein is expressed that is influenced by exposure to Tobacco Smoke Pollution? | Fever, Leukocytosis, Lethargy, Cardiomegaly, Nausea, Seizure, Vomiting |
| What drugs can I take that are indicated for a disease whose phenotype is associated with the gene POMC? | Eculizumab |
| What drugs can I take that are indicated for a disease whose phenotype is associated with the gene IFNG? | Eculizumab |
| What drugs can I take that are approved for off-label-use for a disease that I got because exposure to Tobacco Smoke Pollution interacts with a protein relevant to that disease? | Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Dexamethasone, Betamethasone, Triamcinolone |
| What drugs can I take that are approved for off-label-use for a disease that I got because exposure to Particulate Matter interacts with a protein relevant to that disease? | Methylprednisolone, Prednisone, Prednisolone, Hydrocortisone, Cortisone acetate, Dexamethasone, Betamethasone, Triamcinolone |
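A returned result can be compared with a gold answer from this table as a set of names. The scoring below is an assumption made for illustration, since this section does not state how answers are compared; `score_answer` is a hypothetical helper. Names are lower-cased before comparison because capitalization varies between rows (for example “Muscle Weakness” and “Muscle weakness”).

```python
from typing import Dict, Iterable, Union


def score_answer(returned: Iterable[str],
                 gold: Iterable[str]) -> Dict[str, Union[float, bool]]:
    """Compare two collections of node names as case-insensitive sets."""
    r = {name.strip().lower() for name in returned}
    g = {name.strip().lower() for name in gold}
    hits = len(r & g)
    precision = hits / len(r) if r else 0.0
    recall = hits / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "exact": r == g}


# Gold answer for the first question in the table; the returned list is made up.
gold = ["Ascorbic acid", "Zinc gluconate"]
partial = score_answer(["ascorbic acid"], gold)
full = score_answer(["Zinc gluconate", "Ascorbic acid"], gold)
```

Gold answers are passed as lists rather than split from the table cell, because several names in the table contain commas themselves.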
Appendix C. Paraphrased Questions
| Question | Paraphrase |
|---|---|
| What are the names of the drugs that are contraindicated when a patient has multiple sclerosis? | Which medications are contraindicated for patients with multiple sclerosis? |
| Which drugs are contraindicated when I have dermatitis? | Which medications are contraindicated for patients with dermatitis? |
| Which drugs cause Alkalosis as a side effect? | Which medications list Alkalosis as an adverse effect? |
| Which drugs have Anxiety as side effect? | Which medications list Anxiety as a side effect? |
| Which genes are expressed in the eye? | Which genes show expression in the eye? |
| What genes are expressed in the nasopharynx? | Which genes show expression in the nasopharynx? |
| What proteins interact with GRB2? | Which proteins interact with GRB2? |
| What proteins interact with the protein KPNA2? | Which proteins interact with KPNA2? |
| What are off-label uses of Zinc gluconate? | Which off-label indications are reported for Zinc gluconate? |
| What are off-label uses for Ascorbic acid? | Which off-label indications are reported for Ascorbic acid? |
| In which cellular components is a protein expressed that is associated with the Spasticity phenotype? | Which cellular components show expression of the protein associated with Spasticity? |
| In which cellular components is a protein expressed that is associated with Nausea? | Which cellular components show expression of the protein associated with Nausea? |
| What side effects does a drug have that is indicated for Richter syndrome? | What adverse effects are associated with a drug indicated for Richter syndrome? |
| If I take drugs for dry eye syndrome, what side effects will they have? | What adverse effects occur with medications used to treat dry eye syndrome? |
| In what anatomical structures is there no expression of proteins that interact with leukocyte migration? | Which anatomical structures lack expression of proteins that interact with leukocyte migration? |
| In what anatomical structures is there no expression of proteins that interact with cell migration? | Which anatomical structures lack expression of proteins that interact with cell migration? |
| What genes and biological processes does an exposure to Tobacco Smoke Pollution interact with? | Which genes and biological processes interact with exposure to Tobacco Smoke Pollution? |
| What genes and biological processes does an exposure to Lead interact with? | Which genes and biological processes interact with Lead exposure? |
| With which pathways do proteins interact that are associated with sleep-wake disorder? | Which pathways do proteins associated with sleep-wake disorder interact with? |
| With which pathways do proteins interact that are associated with sickle cell anemia? | Which pathways do proteins associated with sickle cell anemia interact with? |
| What are phenotypes that gene POMC is associated with that also occur in neuromyelitis optica? | Which phenotypes linked to POMC also occur in neuromyelitis optica? |
| What are phenotypes that gene IFNG is associated with that also occur in neuromyelitis optica? | Which phenotypes linked to the gene IFNG also occur in neuromyelitis optica? |
| What drugs should I take if I have a disease because of an exposure to Lead? | Which medications are indicated for a disease caused by Lead exposure? |
| What drugs should I take if I have a disease because of an exposure to Tobacco Smoke Pollution? | Which medications are indicated for a disease caused by Tobacco Smoke Pollution exposure? |
| What diseases are the diseases where Dalfampridine is contraindicated for related to? | Which diseases are related to the diseases for which Dalfampridine is contraindicated? |
| What diseases are the diseases where Morphine is contraindicated for related to? | Which diseases are related to the diseases for which Morphine is contraindicated? |
| Which cellular components do the proteins an exposure to Lead affects interact with? | Which cellular components do proteins affected by Lead exposure interact with? |
| Which cellular components do the proteins an exposure to Tobacco Smoke Pollution affects interact with? | Which cellular components do proteins affected by Tobacco Smoke Pollution exposure interact with? |
| What genes are associated with diseases that are linked to an exposure to Lead? | Which genes are associated with diseases linked to Lead exposure? |
| What genes are associated with diseases that are linked to an exposure to Mercury? | Which genes are associated with diseases linked to Mercury exposure? |
| What side effects of the drug Methylprednisolone are similar to the multiple sclerosis phenotype? | Which adverse effects of Methylprednisolone overlap with multiple sclerosis phenotypes? |
| What side effects of the drug Prednisone are similar to the multiple sclerosis phenotype? | Which adverse effects of Prednisone overlap with multiple sclerosis phenotypes? |
| Which drug is contraindicated in a disease that was linked to an exposure to something that interacts with the protein IFNG? | Which drug should be avoided for a disease linked to an exposure that interacts with the protein IFNG? |
| Which drug is contraindicated in a disease that was linked to an exposure to something that interacts with the protein IL1B? | Which drug should be avoided for a disease linked to an exposure that interacts with IL1B? |
| What pathways do the exposures that can lead to multiple sclerosis interact with? | Which pathways are modulated by exposures associated with multiple sclerosis? |
| What pathways do the exposures that can lead to atopic eczema interact with? | Which pathways are modulated by exposures associated with atopic eczema? |
| Which exposure can affect drugs that are approved for off-label-use for dermatitis? | Which exposure can modify the response to off-label dermatitis medications? |
| Which exposure can affect drugs that are approved for off-label-use for heart disease? | Which exposure can modify the response to off-label heart disease medications? |
| Which drugs have synergistic interactions with drugs that are affected by proteins that CASP1 has protein–protein interactions with? | Which agents show synergy with drugs affected by CASP1-interacting proteins? |
| Which drugs have synergistic interactions with drugs that are affected by proteins that IL1B has protein–protein interactions with? | Which agents show synergy with drugs affected by IL1B-interacting proteins? |
| Which biological processes are affected by the gene APOE which are also affected by an exposure to something that is linked to multiple sclerosis? | Which processes regulated by the gene APOE are likewise affected by exposures associated with multiple sclerosis? |
| Which biological processes are affected by the gene IL1B which are also affected by an exposure to something that is linked to multiple sclerosis? | Which processes regulated by the gene IL1B are likewise affected by exposures associated with multiple sclerosis? |
| What drugs should I not take for a disease that I got because exposure to Tobacco Smoke Pollution interacts with a protein relevant to that disease? | Which medications should I avoid for a disease that arose because exposure to Tobacco Smoke Pollution interacts with a disease-relevant protein? |
| What drugs should I not take for a disease that I got because exposure to Lead interacts with a protein relevant to that disease? | Which medications should I avoid for a disease that arose because exposure to Lead interacts with a disease-relevant protein? |
| What side effects does Prednisone have that also occur when a protein is expressed that is influenced by exposure to Tobacco Smoke Pollution? | Which adverse effects of Prednisone also occur when a phenotype influenced by Tobacco Smoke Pollution is expressed? |
| What side effects does Dexamethasone have that also occur when a protein is expressed that is influenced by exposure to Tobacco Smoke Pollution? | Which adverse effects of Dexamethasone also occur when a phenotype influenced by Tobacco Smoke Pollution is expressed? |
| What drugs can I take that are indicated for a disease whose phenotype is associated with the gene POMC? | Which medications are indicated for a disease whose phenotype is associated with the gene POMC? |
| What drugs can I take that are indicated for a disease whose phenotype is associated with the gene IFNG? | Which medications are indicated for a disease whose phenotype is associated with the gene IFNG? |
| What drugs can I take that are approved for off-label-use for a disease that I got because exposure to Tobacco Smoke Pollution interacts with a protein relevant to that disease? | Which medications approved for off-label use for the disease are suitable when exposure to Tobacco Smoke Pollution interacts with a disease-relevant protein? |
| What drugs can I take that are approved for off-label-use for a disease that I got because exposure to Particulate Matter interacts with a protein relevant to that disease? | Which medications approved for off-label use for the disease are suitable when exposure to Particulate Matter interacts with a disease-relevant protein? |
Appendix D. Cypher Generation Prompts
Appendix D.1. Zero-Shot
Appendix D.2. One-Shot
Appendix D.3. Few-Shot
Appendix D.4. Simple Prompt
Appendix D.5. Syntax Prompt
Appendix D.6. Social Engineering Prompt
Appendix D.7. Role Prompt
References
- Xu, Z.; Jain, S.; Kankanhalli, M. Hallucination is inevitable: An innate limitation of large language models. arXiv 2024, arXiv:2401.11817.
- Sequeda, J.; Allemang, D.; Jacob, B. Knowledge Graphs as a source of trust for LLM-powered enterprise question answering. J. Web Semant. 2025, 85, 100858.
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1–55.
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge Graph Prompting for Multi-Document Question Answering. Proc. AAAI Conf. Artif. Intell. 2024, 38, 19206–19214.
- LangChain. Available online: https://github.com/langchain-ai/langchain (accessed on 11 September 2025).
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Chandak, P.; Huang, K.; Zitnik, M. Building a knowledge graph to enable precision medicine. Sci. Data 2023, 10, 67.
- Neo4j. Neo4jGraph. Available online: https://api.python.langchain.com/en/latest/graphs/langchain_community.graphs.neo4j_graph.Neo4jGraph.html (accessed on 11 September 2025).
- LangChain. LangChain Expression Language (LCEL). Available online: https://python.langchain.com/v0.1/docs/expression_language/ (accessed on 11 September 2025).
- Liu, H.; Li, C.; Li, Y.; Lee, Y.J. Improved Baselines with Visual Instruction Tuning. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024.
- Hartford, E.; Atkins, L.; Fernandes, F.; Cognitive Computations. Dolphin Llama 3. Available online: https://huggingface.co/dphn/dolphin-2.9-llama3-8b (accessed on 11 September 2025).
- Malartic, Q.; Chowdhury, N.R.; Cojocaru, R.; Farooq, M.; Campesan, G.; Djilali, Y.A.D.; Narayan, S.; Singh, A.; Velikanov, M.; Boussaha, B.E.A.; et al. Falcon2-11B Technical Report. arXiv 2024, arXiv:2407.14885.
- Gemma Team, T.M.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Sifre, L.; Rivière, M.; Kale, M.S.; Love, J.; Tafti, P.; Hussenot, L.; et al. Gemma. 2024. Available online: https://www.kaggle.com/models/google/gemma (accessed on 15 October 2025).
- Alpindale. Goliath. Available online: https://huggingface.co/alpindale/goliath-120b (accessed on 11 September 2025).
- OpenAI. GPT-4 Technical Report. 2023. Available online: https://openai.com/research/gpt-4 (accessed on 11 September 2025).
- OpenAI. Introducing GPT-4 Turbo. 2023. Available online: https://openai.com/product/gpt-4 (accessed on 11 September 2025).
- OpenAI. Introducing GPT-5, 7 August 2025. Available online: https://openai.com/index/introducing-gpt-5/ (accessed on 31 October 2025).
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288.
- Liu, Z.; Ping, W.; Roy, R.; Xu, P.; Lee, C.; Shoeybi, M.; Catanzaro, B. ChatQA: Surpassing GPT-4 on Conversational QA and RAG. arXiv 2024, arXiv:2401.10225.
- Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Yang, A.; Fan, A.; et al. The llama 3 herd of models. arXiv 2024, arXiv:2407.21783.
- Meta AI. Llama 3.3 70B Instruct - Model Card, 6 December 2024. Available online: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct (accessed on 31 October 2025).
- llSourcell. medllama2. Available online: https://huggingface.co/llSourcell/medllama2_7b (accessed on 11 September 2025).
- Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; Bamford, C.; Chaplot, D.S.; Casas, D.d.l.; Bressand, F.; Lengyel, G.; Lample, G.; Saulnier, L.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
- Jiang, A.Q.; Sablayrolles, A.; Roux, A.; Mensch, A.; Savary, B.; Bamford, C.; Chaplot, D.S.; Casas, D.d.l.; Hanna, E.B.; Bressand, F.; et al. Mixtral of experts. arXiv 2024, arXiv:2401.04088.
- Pankajmathur. orca mini 70b. Available online: https://huggingface.co/pankajmathur/orca_mini_v3_70b (accessed on 11 September 2025).
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609.
- Zhu, B.; Frick, E.; Wu, T.; Zhu, H.; Ganesan, K.; Chiang, W.-L.; Zhang, J.; Jiao, J. Starling-7B: Improving helpfulness and harmlessness with RLAIF. In Proceedings of the First Conference on Language Modeling, Philadelphia, PA, USA, 7–9 October 2024.
- Lmsys. vicuna-33b. Available online: https://huggingface.co/lmsys/vicuna-33b-v1.3 (accessed on 11 September 2025).
- Xu, C.; Sun, Q.; Zheng, K.; Geng, X.; Zhao, P.; Feng, J.; Tao, C.; Jiang, D. Wizardlm: Empowering large language models to follow complex instructions. arXiv 2023, arXiv:2304.12244.
- LangChain. Neo4j DB QA Chain. 2025. Available online: https://python.langchain.com/docs/tutorials/graph/ (accessed on 11 September 2025).
- Streamlit. Available online: https://streamlit.io/ (accessed on 11 September 2025).
- Neo4j. Available online: https://neo4j.com (accessed on 11 September 2025).
- Ollama. Available online: https://ollama.com/ (accessed on 11 September 2025).
- Wang, R.; Zhang, Z.; Rossetto, L.; Ruosch, F.; Bernstein, A. NLQxform: A Language Model-based Question to SPARQL Transformer. arXiv 2023, arXiv:2311.07588.
- Luo, H.; E, H.; Tang, Z.; Peng, S.; Guo, Y.; Zhang, W.; Ma, C.; Dong, G.; Song, M.; Lin, W.; et al. Chatkbqa: A generate-then-retrieve framework for knowledge base question answering with fine-tuned large language models. arXiv 2023, arXiv:2310.08975.
- Zahera, H.M.; Ali, M.; Sherif, M.A.; Moussallem, D.; Ngomo, A.C.N. Generating sparql from natural language using chain-of-thoughts prompting. In Proceedings of the SEMANTICS 2024, Amsterdam, The Netherlands, 17–19 September 2024.
- Avila, C.V.S.; Casanova, M.A.; Vidal, V.M. A Framework for Question Answering on Knowledge Graphs Using Large Language Models. In Proceedings of the European Semantic Web Conference (ESWC), Hersonissos, Greece, 26–30 May 2024.
- Perez-Beltrachini, L.; Jain, P.; Monti, E.; Lapata, M. Semantic parsing for conversational question answering over knowledge graphs. arXiv 2023, arXiv:2301.12217.
- Agarwal, D.; Das, R.; Khosla, S.; Gangadharaiah, R. Bring your own kg: Self-supervised program synthesis for zero-shot kgqa. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, Mexico City, Mexico, 16–21 June 2024; pp. 896–919.
- Pliukhin, D.; Radyush, D.; Kovriguina, L.; Mouromtsev, D. Improving Subgraph Extraction Algorithms for One-Shot SPARQL Query Generation with Large Language Models. In Proceedings of the QALD/SemREC@ ISWC, Athens, Greece, 6–10 November 2023.
- Soman, K.; Rose, P.W.; Morris, J.H.; Akbas, R.E.; Smith, B.; Peetoom, B.; Villouta-Reyes, C.; Cerono, G.; Shi, Y.; Rizk-Jackson, A.; et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics 2024, 40, btae560.
- Morris, J.H.; Soman, K.; Akbas, R.E.; Zhou, X.; Smith, B.; Meng, E.C.; Huang, C.C.; Cerono, G.; Schenk, G.; Rizk-Jackson, A.; et al. The scalable precision medicine open knowledge engine (SPOKE): A massive knowledge graph of biomedical information. Bioinformatics 2023, 39, btad080.
- Steinigen, D.; Teucher, R.; Ruland, T.H.; Rudat, M.; Flores-Herr, N.; Fischer, P.; Milosevic, N.; Schymura, C.; Ziletti, A. Fact Finder–Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs. arXiv 2024, arXiv:2408.03010.
- Jia, R.; Zhang, B.; Méndez, S.J.R.; Omran, P.G. Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph. arXiv 2024, arXiv:2405.15374.

| LLM | Number of Parameters | Open/Closed Source | Context Length |
|---|---|---|---|
| bakllava:7b [10] | 7b | open | 32 K |
| dolphin-llama3:8b [11] | 8b | open | 8 K |
| dolphin-llama3:70b [11] | 70b | open | 8 K |
| falcon2:11b [12] | 11b | open | 2 K |
| gemma2:9b [13] | 9b | open | 8 K |
| goliath:120b [14] | 120b | open | 4 K |
| gpt-4 turbo [15] | unknown | closed | 128 K |
| gpt-4o [16] | unknown | closed | 128 K |
| gpt-5 [17] | unknown | closed | 400 K |
| llama2:70b [18] | 70b | open | 4 K |
| llama3-chatqa:70b [19] | 70b | open | 8 K |
| llama3:70b [20] | 70b | open | 8 K |
| llama3:8b [20] | 8b | open | 8 K |
| llama3.3:70b [21] | 70b | open | 128 K |
| medllama2:7b [22] | 7b | open | 4 K |
| mistral:7b [23] | 7b | open | 32 K |
| mixtral:8x7b [24] | 8x7b | open | 32 K |
| orca-mini:70b [25] | 70b | open | 4 K |
| qwen:32b [26] | 32b | open | 32 K |
| qwen:110b [26] | 110b | open | 32 K |
| starling-lm:7b [27] | 7b | open | 8 K |
| vicuna:33b [28] | 33b | open | 2 K |
| wizardlm2:7b [29] | 7b | open | 32 K |

| LLM | Number Correct | Accuracy | Wilson 95% CI |
|---|---|---|---|
| bakllava:7b | 0 | 0 | [0, 0.071] |
| medllama2:7b | 0 | 0 | [0, 0.071] |
| mistral:7b | 9 | 0.18 | [0.098, 0.31] |
| starling-lm:7b | 0 | 0 | [0, 0.071] |
| wizardlm2:7b | 5 | 0.1 | [0.043, 0.21] |
| dolphin-llama3:8b | 8 | 0.16 | [0.083, 0.29] |
| llama3:8b | 11 | 0.22 | [0.13, 0.35] |
| gemma2:9b | 0 | 0 | [0, 0.071] |
| falcon2:11b | 0 | 0 | [0, 0.071] |
| qwen:32b | 20 | 0.4 | [0.28, 0.54] |
| vicuna:33b | 1 | 0.02 | [0.0035, 0.1] |
| mixtral:8x7b | 1 | 0.02 | [0.0035, 0.1] |
| dolphin-llama3:70b | 16 | 0.32 | [0.21, 0.46] |
| llama2:70b | 11 | 0.22 | [0.13, 0.35] |
| llama3:70b | 23 | 0.46 | [0.33, 0.6] |
| llama3.3:70b | 31 | 0.62 | [0.48, 0.74] |
| llama3-chatqa:70b | 8 | 0.16 | [0.083, 0.29] |
| orca-mini:70b | 15 | 0.3 | [0.19, 0.44] |
| qwen:110b | 18 | 0.36 | [0.24, 0.5] |
| goliath:120b | 19 | 0.38 | [0.26, 0.52] |
| gpt-4 turbo | 45 | 0.9 | [0.79, 0.96] |
| gpt-4o | 40 | 0.8 | [0.67, 0.89] |
| gpt-5 | 42 | 0.84 | [0.71, 0.92] |
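
The Wilson 95% intervals in the table above can be recomputed from the number of correct answers and the benchmark size of 50 questions. The following is a minimal sketch, assuming the standard Wilson score interval for a binomial proportion; the function name `wilson_interval` is illustrative and is not taken from the authors' code.

```python
import math
from statistics import NormalDist


def wilson_interval(k: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for k successes out of n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Two-sided critical value of the standard normal (about 1.96 for 95%).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = k / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# 45 of 50 correct (the gpt-4 turbo row): rounds to the reported [0.79, 0.96].
low, high = wilson_interval(45, 50)
print(f"[{low:.2f}, {high:.2f}]")
```

Unlike the simpler normal-approximation interval, the Wilson interval stays inside [0, 1] and remains informative when a model answers no question correctly (0 of 50 yields an upper bound of about 0.071).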

| | llama3:8b | llama3:70b | starling_lm:7b | gpt-5 | gpt4-turbo | dolphin_llama3:70b | llama3_chatqa:70b | wizardlm2:7b | mistral:7b | medllama2:7b | qwen:110b | llama3.3:70b | gemma:9b | dolphin_llama3:8b | bakllava:7b | falcon2:11b | llama2:70b | mixtral:8x7b | vicuna:33b | gpt-4o | orca_mini:70b | goliath:120b | qwen:32b |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| llama3:8b | - | | +0.22 | −0.62 * | −0.68 * | | +0.06 | +0.12 | +0.04 | +0.22 | | −0.40 * | +0.22 | +0.06 | +0.22 | +0.22 | 0.00 | +0.20 | +0.20 | −0.58 * | | | |
| llama3:70b | +0.24 | - | +0.46 * | −0.38 * | −0.44 * | +0.14 | +0.30 | +0.36 * | +0.28 | +0.46 * | +0.10 | | +0.46 * | +0.30 * | +0.46 * | +0.46 * | +0.24 | +0.44 * | +0.44 * | | +0.16 | +0.08 | +0.06 |
| starling_lm:7b | | −0.46 * | - | −0.84 * | −0.90 * | −0.32 * | | | | 0.00 | −0.36 * | −0.62 * | 0.00 | | 0.00 | 0.00 | | | | −0.80 * | −0.30 * | −0.38 * | −0.40 * |
| gpt-5 | +0.62 * | +0.38 * | +0.84 * | - | | +0.52 * | +0.68 * | +0.74 * | +0.66 * | +0.84 * | +0.48 * | +0.22 | +0.84 * | +0.68 * | +0.84 * | +0.84 * | +0.62 * | +0.82 * | +0.82 * | +0.04 | +0.54 * | +0.46 * | +0.44 * |
| gpt4-turbo | +0.68 * | +0.44 * | +0.90 * | +0.06 | - | +0.58 * | +0.74 * | +0.80 * | +0.72 * | +0.90 * | +0.54 * | +0.28 | +0.90 * | +0.74 * | +0.90 * | +0.90 * | +0.68 * | +0.88 * | +0.88 * | +0.10 | +0.60 * | +0.52 * | +0.50 * |
| dolphin_llama3:70b | +0.10 | | +0.32 * | −0.52 * | −0.58 * | - | +0.16 | +0.22 | +0.14 | +0.32 * | | | +0.32 * | +0.16 | +0.32 * | +0.32 * | +0.10 | +0.30 * | +0.30 * | −0.48 * | +0.02 | | |
| llama3_chatqa:70b | | | +0.16 | −0.68 * | −0.74 * | | - | +0.06 | | +0.16 | | −0.46 * | +0.16 | 0.00 | +0.16 | +0.16 | | +0.14 | +0.14 | −0.64 * | | | |
| wizardlm2:7b | | −0.36 * | +0.10 | −0.74 * | −0.80 * | | | - | | +0.10 | −0.26 * | −0.52 * | +0.10 | | +0.10 | +0.10 | | +0.08 | +0.08 | −0.70 * | | | |
| mistral:7b | | | +0.18 | −0.66 * | −0.72 * | | +0.02 | +0.08 | - | +0.18 | | −0.44 * | +0.18 | +0.02 | +0.18 | +0.18 | | +0.16 | +0.16 | −0.62 * | | | |
| medllama2:7b | | −0.46 * | 0.00 | −0.84 * | −0.90 * | −0.32 * | | | | - | −0.36 * | −0.62 * | 0.00 | | 0.00 | 0.00 | | | | −0.80 * | −0.30 * | −0.38 * | −0.40 * |
| qwen:110b | +0.14 | | +0.36 * | −0.48 * | −0.54 * | +0.04 | +0.20 | +0.26 * | +0.18 | +0.36 * | - | | +0.36 * | +0.20 | +0.36 * | +0.36 * | +0.14 | +0.34 * | +0.34 * | −0.44 * | +0.06 | | |
| llama3.3:70b | +0.40 * | +0.16 | +0.62 * | | | +0.30 | +0.46 * | +0.52 * | +0.44 * | +0.62 * | +0.26 | - | +0.62 * | +0.46 * | +0.62 * | +0.62 * | +0.40 * | +0.60 * | +0.60 * | | +0.32 | +0.24 | +0.22 |
| gemma:9b | | −0.46 * | 0.00 | −0.84 * | −0.90 * | −0.32 * | | | | 0.00 | −0.36 * | −0.62 * | - | | 0.00 | 0.00 | | | | −0.80 * | −0.30 * | −0.38 * | −0.40 * |
| dolphin_llama3:8b | | −0.30 * | +0.16 | −0.68 * | −0.74 * | | 0.00 | +0.06 | | +0.16 | | −0.46 * | +0.16 | - | +0.16 | +0.16 | | +0.14 | +0.14 | −0.64 * | | | |
| bakllava:7b | | −0.46 * | 0.00 | −0.84 * | −0.90 * | −0.32 * | | | | 0.00 | −0.36 * | −0.62 * | 0.00 | | - | 0.00 | | | | −0.80 * | −0.30 * | −0.38 * | −0.40 * |
| falcon2:11b | | −0.46 * | 0.00 | −0.84 * | −0.90 * | −0.32 * | | | | 0.00 | −0.36 * | −0.62 * | 0.00 | | 0.00 | - | | | | −0.80 * | −0.30 * | −0.38 * | −0.40 * |
| llama2:70b | 0.00 | | +0.22 | −0.62 * | −0.68 * | | +0.06 | +0.12 | +0.04 | +0.22 | | −0.40 * | +0.22 | +0.06 | +0.22 | +0.22 | - | +0.20 | +0.20 | −0.58 * | | | |
| mixtral:8x7b | | −0.44 * | +0.02 | −0.82 * | −0.88 * | −0.30 * | | | | +0.02 | −0.34 * | −0.60 * | +0.02 | | +0.02 | +0.02 | | - | 0.00 | −0.78 * | −0.28 * | −0.36 * | −0.38 * |
| vicuna:33b | | −0.44 * | +0.02 | −0.82 * | −0.88 * | −0.30 * | | | | +0.02 | −0.34 * | −0.60 * | +0.02 | | +0.02 | +0.02 | | 0.00 | - | −0.78 * | | −0.36 * | −0.38 * |
| gpt-4o | +0.58 * | +0.34 | +0.80 * | | | +0.48 * | +0.64 * | +0.70 * | +0.62 * | +0.80 * | +0.44 * | +0.18 | +0.80 * | +0.64 * | +0.80 * | +0.80 * | +0.58 * | +0.78 * | +0.78 * | - | +0.50 * | +0.42 * | +0.40 * |
| orca_mini:70b | +0.08 | | +0.30 * | −0.54 * | −0.60 * | | +0.14 | +0.20 | +0.12 | +0.30 * | | | +0.30 * | +0.14 | +0.30 * | +0.30 * | +0.08 | +0.28 * | +0.28 | −0.50 * | - | | |
| goliath:120b | +0.16 | | +0.38 * | −0.46 * | −0.52 * | +0.06 | +0.22 | +0.28 | +0.20 | +0.38 * | +0.02 | | +0.38 * | +0.22 | +0.38 * | +0.38 * | +0.16 | +0.36 * | +0.36 * | −0.42 * | +0.08 | - | |
| qwen:32b | +0.18 | | +0.40 * | −0.44 * | −0.50 * | +0.08 | +0.24 | +0.30 * | +0.22 | +0.40 * | +0.04 | | +0.40 * | +0.24 | +0.40 * | +0.40 * | +0.18 | +0.38 * | +0.38 * | −0.40 * | +0.10 | +0.02 | - |

| Model | Correct w/o Checker | Corrected by Checker | Wrong, No Checking Opportunities | Wrong Despite Checking | Total Correct | Total False | Accuracy Uplift | Help Rate | Help Rate 95% Wilson CI | Residual Error Rate |
|---|---|---|---|---|---|---|---|---|---|---|
| llama3:8b | 6 | 5 | 14 | 25 | 11 | 39 | 0.10 | 0.11 | [0.05, 0.24] | 0.78 |
| llama3:70b | 1 | 22 | 8 | 19 | 23 | 27 | 0.44 | 0.45 | [0.32, 0.59] | 0.54 |
| starling_lm:7b | 0 | 0 | 0 | 50 | 0 | 50 | 0.00 | 0.00 | [0, 0.071] | 1.00 |
| gpt-4 turbo | 34 | 11 | 4 | 1 | 45 | 5 | 0.22 | 0.69 | [0.44, 0.86] | 0.10 |
| gpt-5 | 1 | 41 | 0 | 8 | 42 | 8 | 0.82 | 0.84 | [0.71, 0.91] | 0.16 |
| dolphin_llama3:70b | 10 | 6 | 8 | 26 | 16 | 34 | 0.12 | 0.15 | [0.071, 0.29] | 0.68 |
| llama3_chatqa:70b | 7 | 1 | 10 | 32 | 8 | 42 | 0.02 | 0.02 | [0.0041, 0.12] | 0.84 |
| wizardlm2:7b | 3 | 2 | 16 | 29 | 5 | 45 | 0.04 | 0.04 | [0.012, 0.14] | 0.90 |
| mistral:7b | 7 | 2 | 16 | 25 | 9 | 41 | 0.04 | 0.05 | [0.013, 0.15] | 0.82 |
| medllama2:7b | 0 | 0 | 19 | 31 | 0 | 50 | 0.00 | 0.00 | [0, 0.071] | 1.00 |
| qwen:110b | 13 | 5 | 10 | 22 | 18 | 32 | 0.10 | 0.14 | [0.059, 0.28] | 0.64 |
| gemma2:9b | 0 | 0 | 0 | 50 | 0 | 50 | 0.00 | 0.00 | [0, 0.071] | 1.00 |
| dolphin_llama3:8b | 7 | 1 | 29 | 13 | 8 | 42 | 0.02 | 0.02 | [0.0041, 0.12] | 0.84 |
| bakllava:7b | 0 | 0 | 0 | 50 | 0 | 50 | 0.00 | 0.00 | [0, 0.071] | 1.00 |
| falcon2:11b | 0 | 0 | 0 | 50 | 0 | 50 | 0.00 | 0.00 | [0, 0.071] | 1.00 |
| llama2:70b | 10 | 1 | 14 | 25 | 11 | 39 | 0.02 | 0.02 | [0.0044, 0.13] | 0.78 |
| mixtral:8x7b | 1 | 0 | 1 | 48 | 1 | 49 | 0.00 | 0.00 | [0, 0.073] | 0.98 |
| vicuna:33b | 0 | 1 | 23 | 26 | 1 | 49 | 0.02 | 0.02 | [0.0035, 0.1] | 0.98 |
| gpt-4o | 1 | 39 | 0 | 10 | 40 | 10 | 0.78 | 0.80 | [0.66, 0.89] | 0.20 |
| orca_mini:70b | 3 | 12 | 6 | 29 | 15 | 35 | 0.24 | 0.26 | [0.15, 0.4] | 0.70 |
| goliath:120b | 5 | 14 | 13 | 18 | 19 | 31 | 0.28 | 0.31 | [0.2, 0.46] | 0.62 |
| qwen:32b | 14 | 6 | 13 | 17 | 20 | 30 | 0.12 | 0.17 | [0.079, 0.32] | 0.60 |
| llama3.3:70b | 28 | 3 | 8 | 11 | 31 | 19 | 0.06 | 0.14 | [0.047, 0.33] | 0.38 |
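
The derived columns of the table above appear to follow directly from the four outcome counts per model. The sketch below states these relationships explicitly; the definitions are inferred from the reported numbers (for example, help rate as the share of initially wrong queries that the checker corrected) and are an assumption, not the authors' published formulas. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckerOutcome:
    """Per-model outcome counts, mirroring the four count columns of the table."""
    correct_without_checker: int
    corrected_by_checker: int
    wrong_no_opportunity: int
    wrong_despite_checking: int


def summarize(o: CheckerOutcome) -> dict[str, float]:
    """Derive the summary columns (assumed definitions, inferred from the table)."""
    total = (o.correct_without_checker + o.corrected_by_checker
             + o.wrong_no_opportunity + o.wrong_despite_checking)
    total_correct = o.correct_without_checker + o.corrected_by_checker
    initially_wrong = total - o.correct_without_checker
    return {
        "total_correct": total_correct,
        "total_false": total - total_correct,
        # Gain in accuracy attributable to the checker.
        "accuracy_uplift": o.corrected_by_checker / total,
        # Fraction of initially wrong queries that the checker repaired.
        "help_rate": o.corrected_by_checker / initially_wrong if initially_wrong else 0.0,
        # Fraction of questions still answered incorrectly after checking.
        "residual_error_rate": (total - total_correct) / total,
    }


# gpt-4 turbo row: 34 correct without the checker, 11 corrected, 4 + 1 still wrong.
print(summarize(CheckerOutcome(34, 11, 4, 1)))
```

Under these assumed definitions the gpt-4 turbo row gives an accuracy uplift of 0.22, a help rate of 11/16 (about 0.69), and a residual error rate of 0.10, in line with the reported values.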

| LLM | Question Type | Correct | Number Total | Accuracy | Wilson 95% CI |
|---|---|---|---|---|---|
| gpt-4 turbo | normal | 45 | 50 | 0.9 | [0.79, 0.96] |
| llama3.3:70b | normal | 31 | 50 | 0.62 | [0.48, 0.74] |
| gpt-4 turbo | paraphrased | 41 | 50 | 0.82 | [0.69, 0.9] |
| llama3.3:70b | paraphrased | 32 | 50 | 0.64 | [0.5, 0.76] |
| gpt-4 turbo | both | 86 | 100 | 0.86 | [0.78, 0.91] |
| llama3.3:70b | both | 63 | 100 | 0.63 | [0.54, 0.72] |

| | gpt-4 Turbo Normal | llama3.3:70b Normal | gpt-4 Turbo Paraphrased | llama3.3:70b Paraphrased |
|---|---|---|---|---|
| gpt-4 turbo normal | - | +0.28 * | +0.08 | +0.26 * |
| llama3.3:70b normal | −0.28 * | - | | |
| gpt-4 turbo paraphrased | | +0.20 | - | +0.18 |
| llama3.3:70b paraphrased | −0.26 * | +0.02 | | - |

| | gpt-4 Turbo | llama3.3:70b |
|---|---|---|
| gpt-4 turbo | - | +0.23 * |
| llama3.3:70b | −0.23 * | - |

| LLM | Zero-Shot k | Zero-Shot Accuracy | Zero-Shot CI | One-Shot k | One-Shot Accuracy | One-Shot CI | Few-Shot k | Few-Shot Accuracy | Few-Shot CI |
|---|---|---|---|---|---|---|---|---|---|
| gpt-4 turbo | 45 | 0.9 | [0.79, 0.96] | 42 | 0.84 | [0.71, 0.92] | 42 | 0.84 | [0.71, 0.92] |
| llama3:70b | 23 | 0.46 | [0.33, 0.6] | 32 | 0.64 | [0.5, 0.76] | 35 | 0.7 | [0.56, 0.81] |

| | llama3:70b Zero-Shot | llama3:70b One-Shot | llama3:70b Few-Shot | gpt-4 Turbo Zero-Shot | gpt-4 Turbo One-Shot | gpt-4 Turbo Few-Shot |
|---|---|---|---|---|---|---|
| llama3:70b Zero-Shot | — | | * | * | * | * |
| llama3:70b One-Shot | | — | | * | | |
| llama3:70b Few-Shot | * | | — | | | |
| gpt-4 turbo Zero-Shot | * | * | | — | | |
| gpt-4 turbo One-Shot | * | | | | — | 0.00 |
| gpt-4 turbo Few-Shot | * | | | | 0.00 | — |

| Prompt | gpt-4 Turbo k | gpt-4 Turbo Accuracy | gpt-4 Turbo CI | llama3:70b k | llama3:70b Accuracy | llama3:70b CI |
|---|---|---|---|---|---|---|
| Standard | 45 | 0.90 | [0.79, 0.96] | 23 | 0.46 | [0.33, 0.6] |
| Simplified | 34 | 0.68 | [0.54, 0.79] | 19 | 0.38 | [0.26, 0.52] |
| Syntax Emphasis | 45 | 0.90 | [0.79, 0.96] | 24 | 0.48 | [0.35, 0.61] |
| Social Engineering | 46 | 0.92 | [0.81, 0.97] | 29 | 0.58 | [0.44, 0.71] |
| Expert Role | 46 | 0.92 | [0.81, 0.97] | 26 | 0.52 | [0.39, 0.65] |

| | llama3:70b Standard | llama3:70b Simple | llama3:70b Syntax | llama3:70b Social | llama3:70b Expert | gpt-4 Turbo Standard | gpt-4 Turbo Simple | gpt-4 Turbo Syntax | gpt-4 Turbo Social | gpt-4 Turbo Expert |
|---|---|---|---|---|---|---|---|---|---|---|
| llama3:70b Standard | — | | | | | * | | * | * | * |
| llama3:70b Simple | | — | | | | * | * | * | * | * |
| llama3:70b Syntax | | | — | | | * | | * | * | * |
| llama3:70b Social | | | | — | | * | | * | * | * |
| llama3:70b Expert | | | | | — | * | | * | * | * |
| gpt-4 turbo Standard | * | * | * | * | * | — | | 0.0 | | |
| gpt-4 turbo Simple | | * | | | | | — | | * | * |
| gpt-4 turbo Syntax | * | * | * | * | * | 0.0 | | — | | |
| gpt-4 turbo Social | * | * | * | * | * | | * | | — | 0.00 |
| gpt-4 turbo Expert | * | * | * | * | * | | * | | | — |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pusch, L.; Conrad, T.O.F. Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Biomedical Question Answering. BioMedInformatics 2025, 5, 70. https://doi.org/10.3390/biomedinformatics5040070

