Mitigating Hallucinations in Knowledge Graph Completion via Embedding-Guided Instruction Tuning

Zhang, Pengfei; Xu, Xing; Wu, Junying; Lu, Xin; Shi, Jiahao; Zhang, Xiaodong; Cui, Dezhi; Peng, Xiuxian; He, Sihao; Zong, Ping; Zhang, Guoxin; Ou, Zhonghong; Song, Meina; Zhu, Yifan

doi:10.3390/info17020207

by Pengfei Zhang ¹, Xing Xu ¹, Junying Wu ¹, Xin Lu ¹, Jiahao Shi ², Xiaodong Zhang ², Dezhi Cui ², Xiuxian Peng ², Sihao He ², Ping Zong ², Guoxin Zhang ², Zhonghong Ou ³,*, Meina Song ² and Yifan Zhu ²

¹ State Grid Hebei Information and Telecommunication Branch, Shijiazhuang 050011, China

² School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China

³ State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

* Author to whom correspondence should be addressed.

Information 2026, 17(2), 207; https://doi.org/10.3390/info17020207

Submission received: 15 January 2026 / Revised: 6 February 2026 / Accepted: 11 February 2026 / Published: 16 February 2026

(This article belongs to the Section Artificial Intelligence)

Abstract

Real-world Knowledge Graphs (KGs) are inherently incomplete, which hinders effective downstream reasoning. While Large Language Models (LLMs) possess powerful semantic capabilities, directly applying them to Knowledge Graph Completion (KGC) often leads to hallucinations and a lack of structural awareness. To address these challenges, we propose Embedding-Guided Instruction Tuning (EGIT), a novel framework that synergizes the structural precision of embedding models with the semantic reasoning of LLMs. Our approach operates in three key stages: (1) utilizing pre-trained embedding models to automatically synthesize high-quality, annotation-free instruction data; (2) fine-tuning the LLM with these structure-aware instructions to adapt it to the KGC task; and (3) employing a joint inference mechanism where the embedding model retrieves candidates and the fine-tuned LLM performs the final selection, thereby significantly reducing hallucinations. In extensive experiments, the best variant of EGIT achieves 7.0% and 2.5% improvements in Hits@1 on the FB15k-237 and WN18RR datasets, respectively.

Keywords: knowledge graph completion; large language models; instruction fine-tuning; embedding models; hallucination reduction

1. Introduction

In the era of data-driven intelligence, Knowledge Graphs (KGs) have established themselves as the backbone of structured knowledge representation and management by addressing limitations of traditional unstructured data processing (e.g., weak knowledge association, low reasoning interpretability). Their ability to model complex relationships between real-world entities enables support for a wide range of mission-critical applications across domains. For instance, in Question Answering (QA), KGs provide precise logical chains to answer complex natural language queries, significantly improving the accuracy and interpretability of QA systems [1]. In Recommender Systems, rich side information is used to alleviate the cold-start problem, enabling personalized recommendations even with sparse user-item interactions [2]. Furthermore, in scientific domains like Drug Discovery, KGs model intricate interactions between proteins and molecules, accelerating the identification of potential therapeutic targets [3,4].

Despite their immense utility, real-world KGs are notoriously incomplete. For example, over 70% of person nodes in Freebase [5] lack a birthplace relation, and the density of long-tail entities in open-source graphs is often insufficient. This sparsity severely hinders the reasoning capability of downstream models, making Knowledge Graph Completion (KGC)—the task of inferring missing links based on existing facts—a pivotal research focus [6,7].

For the past decade, the dominant paradigm for KGC has been Knowledge Graph Embedding (KGE). Traditional methods map entities and relations into low-dimensional vector spaces, scoring triples based on geometric distances or tensor factorization. While these models excel at capturing local structural patterns, they suffer from a fundamental “semantic gap.” They typically treat entities as discrete, independent identifiers (IDs), discarding the rich semantic information embedded in textual descriptions, which limits their generalization ability in inductive settings [8].

Recently, Large Language Models (LLMs) have revolutionized this landscape. By linearizing triples into natural-language sequences, LLMs can leverage their vast pre-trained knowledge to predict missing links, theoretically overcoming the semantic limitations of KGEs [9]. However, applying generative LLMs to the rigid task of link prediction introduces a new, critical challenge: Hallucination [10]. The core conflict lies in the objective mismatch: LLMs are designed for open-ended probabilistic generation, whereas KGC requires precise closed-set discrimination. Without strict constraints, an LLM might confidently generate a plausible-sounding but non-existent entity, leading to factually incorrect predictions [11]. This issue is exacerbated when the LLM’s internal knowledge conflicts with the specific schema of the target KG [12].

In the context of knowledge graph completion, hallucination refers to model predictions that violate the structural constraints of the target knowledge graph under the closed-world assumption. Specifically, this includes: (1) generating entities or relations that do not belong to the predefined entity or relation set of the graph; and (2) selecting structurally implausible entities that appear semantically plausible due to over-reliance on the language model’s parametric knowledge rather than graph evidence. Unlike hallucination in open-ended natural language generation, hallucination in KGC is fundamentally a structure violation problem rather than a linguistic inconsistency problem.

Existing attempts to bridge this gap generally fall into two categories: In-Context Learning (ICL) and Adapter-based tuning. Methods like KICGPT [13] retrieve relevant triples as few-shot demonstrations to prompt the LLM. While cost-effective, ICL is highly sensitive to prompt selection and limited by the context window size [14]. Adapter-based methods, such as KoPA [15], inject structural embeddings via soft prompts. However, these methods primarily focus on feature alignment rather than enabling the model to effectively discriminate between true facts and “hard negatives”.

To address these limitations, we propose a novel hybrid framework: Embedding-Guided Instruction Tuning (EGIT). Our core insight is to utilize the lightweight KGE model as a “structural anchor” to guide the semantic reasoning of the LLM. Specifically, we design an automated instruction generation pipeline. Before fine-tuning, a pre-trained KGE model is employed to retrieve a candidate list containing both the ground truth and “hard negatives.” These candidates are then incorporated into a multiple-choice instruction template. This strategy transforms the unconstrained generation task into a constrained discrimination task, enabling the LLM to better discern the subtle distinctions between valid facts and structural noise. We further employ Low-Rank Adaptation (LoRA) [16] to efficiently adapt the LLM to this structure-aware task. During inference, we adopt a “Retrieve-then-Rerank” pipeline: the KGE model acts as a high-recall retriever to narrow the search space, and the fine-tuned LLM acts as a high-precision reranker to make the final prediction [17].

EGIT differs from existing instruction-tuning or retrieval-augmented approaches in three fundamental aspects. First, unlike RAG, which only injects retrieved knowledge at inference, EGIT uses embedding-based retrieval to construct hard-negative-aware instructions during training, thereby transforming KGC into a structure-constrained discrimination task. Second, we introduce a structure-guided attention bias that explicitly alters the attention distribution inside the Transformer, rather than treating structural signals as passive context. Third, EGIT enforces training–inference alignment, where the candidate set mechanism used during fine-tuning is identical to that used at inference, reducing distribution mismatch and hallucination risk.

Although EGIT adopts a retrieval-and-reranking style pipeline at inference, it is fundamentally different from existing retrieve-and-rerank or instruction-tuning approaches in its training paradigm. Conventional methods typically apply retrieval only at inference time to narrow the candidate space, while the language model itself is trained with open-ended or weakly structured supervision. In contrast, EGIT leverages embedding-based retrieval to construct hard-negative-aware instruction data during training, thereby converting knowledge graph completion into a structure-constrained discrimination task. As a result, the LLM is explicitly trained to reason over structurally plausible candidates rather than generate free-form answers, which significantly reduces hallucination risk and improves alignment with the closed-world assumption of knowledge graphs.

The main contributions of this work are summarized as follows:

We propose a novel instruction generation strategy that utilizes pre-trained KGE models to automatically synthesize high-quality, structure-aware fine-tuning data, effectively eliminating the need for manual annotation.
We develop a specialized instruction-tuning protocol that adapts LLMs to the structural constraints of KGC tasks, significantly enhancing their discriminative capabilities while mitigating generative hallucinations.
We introduce a joint prediction pipeline that synergizes the high-recall retrieval of KGE models with the high-precision reasoning of fine-tuned LLMs to ensure robust and accurate link prediction.
We demonstrate through empirical validation on standard benchmarks that our approach outperforms state-of-the-art baselines, achieving absolute Hits@1 improvements of 7.0% on FB15k-237 and 2.5% on WN18RR compared to the baseline.

2. Related Work

2.1. Geometric and Factorization-Based KGE

Knowledge Graph Embedding (KGE) aims to map entities and relations to continuous low-dimensional vector spaces while preserving the inherent structural information of the graph. These methods can be broadly classified into translational distance models and semantic matching models.

Translational Distance Models. The representative model, TransE [18], interprets a relation as a translation vector connecting head and tail entities, assuming $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. While computationally efficient, TransE struggles with complex relational patterns such as 1-to-N, N-to-1, and symmetric relations. To overcome these limitations, TransH [19] projects entities onto relation-specific hyperplanes, allowing an entity to have distinct representations under different relations. TransR [20] further extends this by modeling entities and relations in separate spaces, using projection matrices to map entities from the entity space to the relation space. More recently, RotatE [21] defines relations as rotations in a complex vector space. By leveraging Euler’s identity, it theoretically proves its ability to model symmetry, antisymmetry, inversion, and composition patterns. HAKE [22] introduces a hierarchy-aware knowledge graph embedding model that maps entities into polar coordinates to better capture semantic hierarchies in taxonomy graphs.

Semantic Matching Models. These models calculate plausibility scores using similarity-based functions. DistMult [23] employs a bilinear diagonal model, capturing interactions through element-wise products. However, its symmetric scoring function restricts it from modeling asymmetric relations. ComplEx [24] solves this by introducing complex-valued embeddings and utilizing the Hermitian product to handle asymmetry. TuckER [25] proposes a more generalized framework based on Tucker decomposition, which learns a core tensor to model interactions between entity and relation embeddings, subsuming DistMult and ComplEx as special cases. SimplE [26] provides another fully expressive model based on Canonical Polyadic decomposition, allowing for independent embeddings for head and tail roles.
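The symmetry limitation of DistMult, and the way complex-valued embeddings remove it, can be checked numerically. The following is a minimal sketch in plain Python; the embedding values are arbitrary toy numbers rather than outputs of a trained model, and the function names are illustrative.

```python
def distmult_score(h, r, t):
    """DistMult: sum_i h_i * r_i * t_i (element-wise trilinear product)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def complex_score(h, r, t):
    """ComplEx: real part of sum_i h_i * r_i * conj(t_i)."""
    return sum(hi * ri * ti.conjugate() for hi, ri, ti in zip(h, r, t)).real


# Toy embeddings with arbitrary illustrative values.
h_real, r_real, t_real = [1.0, 2.0], [0.5, -1.0], [3.0, 0.5]
h_cplx, r_cplx, t_cplx = [1 + 2j, 0.5 - 1j], [1j, 1 + 1j], [2 - 1j, 1 + 3j]

# DistMult cannot tell (h, r, t) from (t, r, h).
distmult_forward = distmult_score(h_real, r_real, t_real)
distmult_reverse = distmult_score(t_real, r_real, h_real)

# ComplEx assigns different scores to the two directions.
complex_forward = complex_score(h_cplx, r_cplx, t_cplx)
complex_reverse = complex_score(t_cplx, r_cplx, h_cplx)

print(distmult_forward, distmult_reverse)
print(complex_forward, complex_reverse)
```

Swapping head and tail leaves the DistMult score unchanged because the product is commutative, whereas the conjugation on the tail makes the ComplEx score direction-dependent.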

While these embedding-based methods are effective at modeling structural patterns, they lack rich semantic knowledge and struggle with long-tail entities and complex reasoning, often resulting in suboptimal performance in open-domain scenarios.

2.2. Neural and Graph-Based Architectures

Beyond shallow linear models, deep neural architectures have been introduced to capture non-linear interactions and graph topology.

CNN-based Models. ConvE [27] is the first model to use 2D convolutional neural networks for KGC. It reshapes head and relation embeddings into 2D images and applies convolution filters to extract local interaction features, significantly improving performance on link prediction. ConvKB [28] adopts 1D convolution to capture global relationships among entities and relations, treating triples as sequence data.

GNN-based Models. Graph Neural Networks (GNNs) explicitly leverage the graph connectivity. R-GCN [29] generalizes graph convolutions to multi-relational data, aggregating neighborhood information to update entity representations. CompGCN [30] integrates composition operations (e.g., subtraction, multiplication) into GCNs to handle multi-relational data effectively. NBFNet [31] parameterizes the Bellman-Ford algorithm with neural networks, capturing path-based reasoning and offering interpretability. Furthermore, RED-GNN [32] utilizes subgraph reasoning to improve inductive performance, demonstrating the power of topological awareness in KGC. Despite their efficiency, these structure-based methods treat entities as opaque IDs, failing to leverage the rich semantic information contained in textual names and descriptions [33].

2.3. Pre-Trained Language Models

With the advent of Pre-trained Language Models (PLMs) like BERT, researchers have started to integrate textual semantics into KGC.

Cross-Encoder Paradigm. KG-BERT [34] treats triples as text sequences and employs BERT to classify their plausibility. StAR [35] adopts a two-tower architecture to encode structural and textual information separately, interacting them via complex matching networks to reduce computational overhead. Structure-BERT [36] extends BERT pre-training tasks to include entity alignment and triple classification, further aligning language representations with KG structures. These methods achieve high accuracy but suffer from high computational costs during inference, as every candidate triple must be fed into the heavy PLM.

Bi-Encoder Paradigm. To improve efficiency, SimKGC [37] utilizes a bi-encoder architecture with contrastive learning. It encodes the query (head, relation) and the candidate (tail) independently, allowing for fast retrieval via Maximum Inner Product Search (MIPS). KGT5 [38] proposes a sequence-to-sequence framework that unifies KGC and QA, significantly reducing model size while maintaining competitive accuracy. Despite their semantic awareness, these PLM-based methods operate in a discriminative manner—ranking a fixed set of candidates—and lack the flexibility to generate answers for open-ended queries or explain their reasoning.

2.4. Generative LLMs and Hallucination Mitigation

The emergence of Large Language Models (LLMs) has shifted the focus from discrimination to generation. Generative KGC aims to directly output the target entity given a query.

Generative KGC. GenKGC [39] explores prompting generative models for link prediction using schema-aware prompts. AutoKG [40] employs autonomous agents to construct and complete KGs. InstructKGC [41] demonstrates that instruction tuning can unlock the few-shot capabilities of LLMs for KGC tasks, aligning LLMs with the format of triple completion.

The Hallucination Challenge. A critical bottleneck for generative KGC is “hallucination”, where LLMs generate plausible but factually incorrect entities [10]. This stems from the mismatch between the LLM’s open-world pre-training and the KGC task’s closed-world constraints [11]. Recent works attempt to mitigate this via retrieval-augmented generation (RAG) [42] or post-hoc verification. KICGPT [13] employs In-Context Learning (ICL) by retrieving relevant triples as few-shot demonstrations. KoPA [15] injects structural embeddings via prefix adapters. However, ICL is limited by context window size, and adapters often fail to fully constrain the generation space. Ultra [43] introduces a foundation model for reasoning on graphs to improve generalization. In contrast, our work proposes an Embedding-Guided Instruction Tuning strategy. By constructing “hard-negative aware” instructions derived from lightweight KGE models, we explicitly enable the LLM to adhere to structural constraints during the fine-tuning stage [17,44].

Existing LLM-based KGC methods can be broadly categorized into retrieval-augmented generation, in-context learning, and instruction-tuning paradigms. Retrieval-augmented or retrieve-and-rerank methods primarily rely on external retrieval modules at inference time, while the LLM remains largely unconstrained during training. Instruction-tuning approaches adapt LLMs to KGC formats but typically lack explicit structural grounding derived from the graph topology. In contrast, EGIT introduces embedding-guided retrieval during instruction construction, embedding hard negatives directly into the training data. This design enforces structural constraints throughout both training and inference, distinguishing EGIT from prior approaches that combine retrieval and LLM reasoning in a decoupled manner.

3. Preliminaries

3.1. Notation Definitions

Let $G = (E, R, T)$ denote a knowledge graph, where $E = \{e_{1}, e_{2}, \dots, e_{N}\}$ is the set of entities, $R = \{r_{1}, r_{2}, \dots, r_{M}\}$ is the set of relations, and $T = \{(h, r, t)\}$ is the set of triples with $h \in E$ (head entity), $r \in R$ (relation), and $t \in E$ (tail entity). For the knowledge graph completion task, an incomplete triple is defined as $\tau = (h, r, ?)$ (missing tail entity) or $\tau = (?, r, t)$ (missing head entity).

Let $M_{\mathrm{EMB}}$ be a knowledge graph embedding model that maps entities and relations into a low-dimensional vector space, i.e., $\mathbf{h}, \mathbf{t} \in \mathbb{R}^{d}$ and $\mathbf{r} \in \mathbb{R}^{d}$, where d is the embedding dimension. $M_{\mathrm{LLM}}$ represents a pre-trained large language model, and $M_{\mathrm{LLM}}^{\mathrm{FT}}$ denotes the model after instruction tuning; $D_{\mathrm{FT}}$ is the instruction tuning dataset; $C(h, r, ?)$ is the candidate entity list of length K corresponding to the incomplete triple $(h, r, ?)$.

3.2. Definition of Knowledge Graph Completion Task

The goal of knowledge graph completion is to predict the missing components of incomplete triples, which can be divided into three subtasks:

1. Prediction of the tail entity: Given the head entity h and the relation r, predict the tail entity $t^{*}$ that satisfies $(h, r, t^{*}) \in T$:

$$t^{*} = \arg\max_{t \in E} P(t \mid h, r; \Theta_{\mathrm{EMB}}), \tag{1}$$

where $\Theta_{\mathrm{EMB}}$ is the set of parameters of the embedding model, and $P(t \mid h, r; \Theta_{\mathrm{EMB}})$ is the probability that the entity t is the correct tail entity.

2. Prediction of the head entity: Given the relation r and the tail entity t, predict the head entity $h^{*}$ that satisfies $(h^{*}, r, t) \in T$:

$$h^{*} = \arg\max_{h \in E} P(h \mid r, t; \Theta_{\mathrm{EMB}}). \tag{2}$$

3. Prediction of the relation: Given the head entity h and the tail entity t, predict the relation $r^{*}$ that satisfies $(h, r^{*}, t) \in T$:

$$r^{*} = \arg\max_{r \in R} P(r \mid h, t; \Theta_{\mathrm{EMB}}). \tag{3}$$

In the closed-domain knowledge graph completion task, the predicted entities or relations must belong to the original entity set $E$ or the relation set $R$; in the open-domain scenario, the model can predict entities or relations outside the original sets.

3.3. Knowledge Graph Embedding

The core idea of knowledge graph embedding is to project entities and relations into a continuous vector space and measure the plausibility of triples through a scoring function. For example, in the classic TransE model, the scoring function is as follows:

$$f(h, r, t) = \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert_{2}^{2}. \tag{4}$$

A smaller score indicates a more plausible triple. The model is trained by minimizing the margin-based ranking loss:

$$L_{\mathrm{EMB}} = \sum_{(h, r, t) \in T} \sum_{(h', r', t') \in T_{\mathrm{neg}}} \max\bigl(0, \gamma + f(h, r, t) - f(h', r', t')\bigr), \tag{5}$$

where $T_{\mathrm{neg}}$ is the set of negative triples, and $\gamma$ is the margin hyperparameter.
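As a worked illustration of the TransE score and the margin-based ranking loss, the sketch below evaluates both on hand-picked 2-dimensional toy vectors. The vectors, the margin value, and the function names are assumptions made for the example; no training loop or negative sampler is shown.

```python
def transe_score(h, r, t):
    """Squared L2 distance ||h + r - t||_2^2; lower means more plausible."""
    return sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def margin_ranking_loss(positives, negatives, margin=1.0):
    """Sum of max(0, margin + f(pos) - f(neg)) over all positive/negative pairs."""
    total = 0.0
    for pos in positives:
        for neg in negatives:
            total += max(0.0, margin + transe_score(*pos) - transe_score(*neg))
    return total


# Each triple is (h, r, t) given directly as toy embedding vectors.
positive = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.0])       # h + r == t
easy_negative = ([0.0, 0.0], [1.0, 0.0], [3.0, 0.0])  # far from h + r
hard_negative = ([0.0, 0.0], [1.0, 0.0], [1.0, 0.5])  # close to h + r

loss = margin_ranking_loss([positive], [easy_negative, hard_negative], margin=1.0)
print(loss)
```

In this toy case the easy negative already lies outside the margin and contributes nothing, so the whole loss comes from the hard negative. This is the same property that makes hard negatives informative when they are later placed in the instruction candidates.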

3.4. Instruction Tuning of Large Language Models

Instruction tuning adapts pre-trained LLMs to specific tasks through supervised learning on instruction datasets containing natural language prompts and reference responses. For KGC, instructions describe completion tasks, and the model learns to output correct entities or relations by minimizing cross-entropy loss:

$$L_{\mathrm{FT}} = - \sum_{(q, a) \in D_{\mathrm{FT}}} \log P(a \mid q; \Theta_{\mathrm{FT}}), \tag{6}$$

where q is the instruction (including incomplete triples and candidate lists), a is the correct answer, and $\Theta_{\mathrm{FT}}$ is the fine-tuning parameter set. We employ Low-Rank Adaptation (LoRA) [16] for parameter-efficient fine-tuning, freezing the pre-trained LLM parameters and training only the low-rank update matrices $\Delta W = BA$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$. Here $d \times k$ is the shape of the adapted weight matrix and r is the adapter rank, not the relation symbol used above.
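To make the low-rank update concrete, the following sketch applies $W + BA$ to an input vector and compares parameter counts. It is a plain-Python illustration under assumed toy shapes, not the authors' training code; the zero initialization of B follows common LoRA practice and is not stated in this article, and any scaling factor on the update is omitted.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, w_frozen, lora_b, lora_a):
    """Compute (W + B A) x as W x + B (A x), never forming the d x k update."""
    base = matvec(w_frozen, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + u for b, u in zip(base, update)]


def parameter_counts(d, k, r):
    """Return (full fine-tuning parameters, LoRA parameters) for one d x k weight."""
    return d * k, d * r + r * k


# Toy shapes: d = 3, k = 4, r = 1 (values are arbitrary).
w_frozen = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
lora_a = [[0.1, 0.2, 0.3, 0.4]]  # r x k
lora_b = [[0.0], [0.0], [0.0]]   # d x r, zero so that the update starts at zero
x = [1.0, 2.0, 3.0, 4.0]

y = lora_forward(x, w_frozen, lora_b, lora_a)
full, low_rank = parameter_counts(4096, 4096, 8)
print(y, full, low_rank)
```

With B at zero the adapted layer reproduces the frozen layer exactly, and for a 4096 × 4096 weight with rank 8 the adapter trains 65,536 parameters instead of 16,777,216.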

4. Method

4.1. Overview

The overall framework of Embedding-Guided Instruction Tuning (EGIT) is illustrated in Figure 1. To effectively bridge the gap between structural embeddings and semantic reasoning, EGIT operates through a coherent three-stage pipeline, transforming the KGC task from open-ended generation into structure-constrained discrimination.

Stage 1: Embedding-Guided Data Synthesis. First, we train a lightweight Knowledge Graph Embedding (KGE) model (e.g., TransE, SimKGC) to capture the global structural patterns of the graph. We employ similarity-based retrieval and ranking, which are well-established techniques for KGE models, to predict candidate lists for missing triples. This model serves as a structural prior to identify both ground truth and hard negatives. These candidates are then populated into designed templates to automatically synthesize high-quality, structure-aware instruction datasets, bypassing the need for manual annotation.

Stage 2: Structure-Aware Instruction Tuning. Next, we employ the synthesized data to fine-tune a Large Language Model (LLM). We build upon instruction tuning and QLoRA, and unlike standard instruction tuning, our novel protocol specifically adapts the LLM to the closed-world constraints of KGC. By learning to select the correct entity from the provided candidate list, the LLM aligns its semantic reasoning capabilities with the structural logic of the knowledge graph.

Stage 3: Joint Inference Mechanism. Finally, we implement a “Retrieve-then-Rerank” joint prediction strategy, adopting the two-stage inference architecture. During inference, the KGE model acts as a high-recall retriever, leveraging its strength in entity matching to narrow the search space to a top-K candidate set and filter out massive amounts of irrelevant noise. The fine-tuned LLM then acts as a high-precision reranker, leveraging its semantic knowledge to identify the optimal completion. This collaborative mechanism effectively combines the structural reliability of KGEs with the reasoning power of LLMs, significantly mitigating hallucinations.

4.2. Embedding-Guided Instruction Generation

To efficiently synthesize high-quality training corpora and enforce structural consistency, we design an automated instruction generation pipeline. This module leverages the global structural patterns captured by KGE models to construct annotation-free fine-tuning data.

4.2.1. Instruction Template Design

We first design a structured instruction template to align the KGC task with the generative paradigm of LLMs. As shown in Figure 2a, each instruction consists of four components, and a concrete instance of such a fine-tuning instruction is shown in Figure 2b.

Query: Describes the task (e.g., tail entity prediction) and provides the incomplete triple $(h, r, ?)$ with a natural language prompt.
Entities: A candidate entity list retrieved by the KGE model using similarity-based ranking, which is a well-established technique. This list serves as the “options” for the LLM, transforming the open-ended generation into a multiple-choice-like discrimination task.
Info: Contextual information to aid reasoning. In this work, we utilize the textual descriptions of entities provided by the dataset. To filter out irrelevant noise, we select descriptions based on TF-IDF similarity with the query.
Answer: The ground truth entity, used as the supervision signal for fine-tuning.

To help the LLM parse the structure, we introduce special marker tokens (e.g., [QUERY], [ENTITY]) into the vocabulary.

4.2.2. Automated Data Synthesis

The core of our approach is to utilize a pre-trained KGE model to populate the Entities field in the template. The process involves three steps:

Structural Prior Learning. We first train a lightweight KGE model (e.g., TransE, SimKGC) on the training set $T_{\mathrm{train}}$ to learn the vector representations of entities and relations. The training objective is to minimize the margin-based ranking loss or cross-entropy loss, ensuring that the model captures the global topology of the graph.

Candidate Retrieval and Scoring. For each incomplete triple $(h, r, ?)$ in the training set, we use the trained KGE model to compute the plausibility score for all entities in the entity set $E$. This employs similarity-based retrieval and ranking, which are well-established techniques in KGE. The scoring function $f(h, r, t)$ depends on the specific KGE architecture (e.g., distance-based for TransE or semantic matching for ComplEx). To quantify the relevance, we normalize the scores into a probability distribution. For a candidate entity $t_{c}$, its probability is calculated as follows:

$$P(t_{c} \mid h, r; \Theta_{\mathrm{EMB}}) = \frac{\exp(f(h, r, t_{c}) / \tau)}{\sum_{t' \in E} \exp(f(h, r, t') / \tau)}, \tag{7}$$

where $\tau$ is a temperature hyperparameter (unrelated to the incomplete-triple symbol of Section 3.1). Equation (7) presumes a score for which larger values indicate higher plausibility; a distance-based score such as Equation (4) would need to be negated before normalization. Alternatively, for distance-based models, similarity can be directly measured via cosine similarity between the query embedding $(\mathbf{h} + \mathbf{r})$ and the candidate embedding $\mathbf{t}_{c}$:

$$\mathrm{sim}(h, r, t_{c}) = \frac{(\mathbf{h} + \mathbf{r})^{\top} \mathbf{t}_{c}}{\lVert \mathbf{h} + \mathbf{r} \rVert_{2} \cdot \lVert \mathbf{t}_{c} \rVert_{2}}. \tag{8}$$
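The two scoring options above, cosine similarity to the query $h + r$ and temperature-scaled softmax normalization, can be sketched as follows. The embeddings are 2-dimensional toy values, the entity names are placeholders, and the helper names are hypothetical; the softmax treats larger scores as more plausible.

```python
import math


def softmax_probabilities(scores, temperature=1.0):
    """Normalize an {entity: score} map with a temperature-scaled softmax."""
    scaled = {e: s / temperature for e, s in scores.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {e: math.exp(s - peak) for e, s in scaled.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(h, r, entity_embeddings, k):
    """Rank entities by cosine similarity to the query h + r and keep the top k."""
    query = [hi + ri for hi, ri in zip(h, r)]
    sims = {e: cosine_similarity(query, emb) for e, emb in entity_embeddings.items()}
    top = sorted(sims, key=sims.get, reverse=True)[:k]
    return top, sims


# Toy 2-d embeddings (arbitrary values; entity names are placeholders).
entity_embeddings = {"a": [2.0, 2.0], "b": [1.0, 0.0], "c": [-1.0, -1.0]}
top_k, sims = rank_candidates([1.0, 0.0], [0.0, 1.0], entity_embeddings, k=2)
probs = softmax_probabilities(sims, temperature=0.5)
print(top_k, probs)
```

Lowering the temperature sharpens the distribution toward the best-scoring entity, while the ranking itself is unaffected by it.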

Filtering and Synthesis. To construct the final instruction, we sort all entities based on their scores (or probabilities) in descending order. We retain the top-K entities to form the candidate list $C_{\mathrm{cand}}$. This truncation strategy serves two purposes: it ensures the inclusion of ground truth and hard negatives (high-ranking but incorrect entities), while filtering out massive amounts of irrelevant noise that might confuse the LLM. Finally, the triplet, the top-K candidate list, the retrieved description (Info), and the ground truth are filled into the template to generate a complete instruction sample. This process is repeated for all triples in $T_{\mathrm{train}}$, yielding a large-scale dataset $D_{\mathrm{FT}}$ without manual annotation. The overall procedure of this synthesis phase is summarized in Algorithm 1.

Algorithm 1: Fine-tuning Instruction Generation Algorithm Based on Knowledge Graph Embedding Model
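The body of Algorithm 1 is not reproduced in this text. Based on the three steps described in Section 4.2.2, the synthesis loop might look roughly like the sketch below. Several details are assumptions made for illustration: the prompt wording, the `[INFO]` marker, the use of the head entity's description in place of TF-IDF selection, and the rule that swaps in the ground truth when it is missing from the top-K list. The scores are hand-written stand-ins for a trained KGE model.

```python
def build_instruction(head, relation, candidates, info, answer):
    """Fill the Query / Entities / Info / Answer fields of one training sample."""
    options = " ".join(f"[ENTITY] {c}" for c in candidates)
    prompt = (
        f"[QUERY] Predict the tail entity of ({head}, {relation}, ?). "
        f"{options} [INFO] {info}"
    )
    return {"prompt": prompt, "answer": answer}


def synthesize_dataset(train_triples, entities, score_fn, descriptions, k=3):
    """Build one multiple-choice instruction per training triple."""
    dataset = []
    for head, relation, tail in train_triples:
        # Rank every entity by the KGE score, highest first, and keep the top k.
        ranked = sorted(entities, key=lambda e: score_fn(head, relation, e), reverse=True)
        candidates = ranked[:k]
        # Assumption: if the ground truth was not retrieved, swap it in for the
        # lowest-ranked candidate so that every sample stays answerable.
        if tail not in candidates:
            candidates[-1] = tail
        info = descriptions.get(head, "")  # stand-in for the TF-IDF selection
        dataset.append(build_instruction(head, relation, candidates, info, tail))
    return dataset


# Hand-written toy scores standing in for a trained KGE model.
toy_scores = {"Apple": 0.9, "Microsoft": 0.7, "Tesla": 0.6, "Paris": 0.1}
samples = synthesize_dataset(
    train_triples=[("Steve Jobs", "co-founded", "Apple")],
    entities=list(toy_scores),
    score_fn=lambda h, r, e: toy_scores[e],
    descriptions={"Steve Jobs": "American entrepreneur."},
    k=3,
)
print(samples[0]["prompt"])
```

The low-scoring entity "Paris" is filtered out, while "Microsoft" and "Tesla" remain as hard negatives next to the ground truth.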

4.3. Structure-Aware Instruction Tuning

After synthesizing the instruction dataset, the critical challenge is to adapt the LLM to the strict constraints of the KGC task. Standard LLMs, trained on open-ended text, are prone to hallucinations—generating plausible but non-existent entities. To mitigate this, we propose a novel Structure-Aware Instruction Tuning protocol that explicitly grounds the model’s generation in the provided structural candidates. While we build upon instruction tuning and QLoRA as foundational adaptation techniques, our protocol introduces two key innovations specifically designed for KGC tasks: (1) structural token initialization and (2) attention-enhanced QLoRA, which enable the LLM to better internalize the closed-domain constraints and suppress hallucinations from within the model.

4.3.1. Structural Token Initialization

Standard LLMs treat structural markers (e.g., [ENTITY]) as random new tokens, lacking semantic grounding. This ambiguity often confuses the model, leading to unconstrained generation. To address this, we propose a novel semantic-aware initialization strategy. For a marker m, its embedding $\mathbf{e}_{m}$ is initialized as the centroid of its semantic class:

$$\mathbf{e}_{m} = \frac{1}{|S_{m}|} \sum_{w \in S_{m}} \mathbf{e}_{w}, \tag{9}$$

where $S_{m}$ denotes task-relevant words (e.g., {“entity”, “object”}). This initialization mechanism represents a key methodological innovation in our work. By providing an explicit semantic anchor, it helps the model strictly distinguish between “instructional markers” and “content,” reducing the likelihood of the model misinterpreting the task instruction and drifting into hallucination. Consider an LLM reading: “Predict the missing entity: [Steve Jobs] was the CEO of [ENTITY] Apple.” Without proper initialization, the model might misunderstand [ENTITY] as part of the answer (e.g., “[ENTITY] Steve Ballmer”). By initializing [ENTITY] with the semantic meaning of “entity marker,” the model learns to treat it as a placeholder tag indicating “the answer goes here” rather than actual content.
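The centroid initialization amounts to averaging the embeddings of the seed words. A minimal sketch follows, assuming a toy 3-dimensional embedding table in which each seed word maps to a single vector; with a real tokenizer a word may split into several subword tokens, which this example does not handle.

```python
def centroid(vectors):
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def init_marker_embedding(embedding_table, seed_words):
    """Initialize a new marker token from the embeddings of its seed words."""
    return centroid([embedding_table[word] for word in seed_words])


# Toy 3-d embedding table (arbitrary values).
embedding_table = {"entity": [1.0, 0.0, 2.0], "object": [3.0, 2.0, 0.0]}
embedding_table["[ENTITY]"] = init_marker_embedding(embedding_table, ["entity", "object"])
print(embedding_table["[ENTITY]"])
```

The new marker starts midway between its seed words instead of at a random point in the embedding space.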

4.3.2. Attention-Enhanced QLoRA

We adopt QLoRA [45] as our foundational fine-tuning strategy, which is an existing technique that enables efficient model adaptation by freezing the 4-bit quantized LLM backbone and optimizing only a small set of low-rank adapter parameters.

Limitation of Vanilla QLoRA: While efficient, standard QLoRA inherits the vanilla self-attention mechanism of Transformers, which computes attention scores based solely on semantic correlations. This treats all tokens with equal potential importance. However, in KGC tasks, specific structural tokens (i.e., the retrieved candidate entities in $C_{\mathrm{cand}}$) carry the critical evidence required for correct discrimination, whereas other context tokens are often mere syntactic noise.

Consider the instruction: “Given (Steve Jobs, CEO, ?), select from: [ENTITY] Tim Cook, [ENTITY] Satya Nadella, [ENTITY] Elon Musk.” Vanilla QLoRA might pay equal attention to words like “Given,” “from,” and the entity names, diluting the signal. Our attention bias tells the model: “Focus more heavily on the actual candidate entities (Tim Cook, Satya Nadella, Elon Musk) since these contain the answer, and pay less attention to the generic template words.”

To address this limitation, we propose Attention-Enhanced QLoRA. This is a core methodological innovation that enhances QLoRA with structure-guided attention mechanisms. Specifically, we introduce a Structure-Guided Attention Bias to explicitly recalibrate the model’s focus. During the fine-tuning phase, we modulate the attention weights to enforce structural awareness. The enhanced attention score $\hat{\alpha}_{i,j}$ is defined as

$$\hat{\alpha}_{i,j} = \mathrm{Softmax}\left(\frac{q_{i} k_{j}^{\top}}{\sqrt{d}} + B_{i,j}^{\mathrm{struct}}\right) \tag{10}$$

where $B_{i,j}^{\mathrm{struct}}$ is the structural bias term:

$$B_{i,j}^{\mathrm{struct}} = \begin{cases} \gamma, & \text{if } w_{j} \in C_{\mathrm{cand}} \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

where $C_{\mathrm{cand}}$ represents the set of tokens belonging to the candidate entities provided by the KGE model, and $\gamma$ is a learnable or fixed scalar (e.g., $\gamma = 0.3$) that amplifies the attention signal. By boosting attention weights specifically for candidate entity tokens, the model is forced to focus its reasoning on the actual valid options provided by the KGE model rather than on irrelevant context words or distractors.
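To make Equations (10) and (11) concrete, the following minimal sketch computes one row of attention weights with and without the structural bias. It is an illustration under simplifying assumptions (a single query position, hand-picked logits, a fixed γ), not the authors' implementation; the function name `biased_attention_row` is hypothetical.

```python
import math
from typing import List, Sequence


def biased_attention_row(
    logits: Sequence[float], is_candidate: Sequence[bool], gamma: float = 0.3
) -> List[float]:
    """Softmax over (logit_j + gamma) for candidate tokens, logit_j otherwise."""
    shifted = [s + (gamma if c else 0.0) for s, c in zip(logits, is_candidate)]
    m = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# One query position attending to four key tokens with equal scaled logits
# q_i k_j^T / sqrt(d); the last two tokens belong to candidate entities.
logits = [0.0, 0.0, 0.0, 0.0]
mask = [False, False, True, True]

plain = biased_attention_row(logits, mask, gamma=0.0)
biased = biased_attention_row(logits, mask, gamma=0.3)
```

With equal logits and γ = 0.3, the weight on each candidate token rises from 0.25 to about 0.29, and each non-candidate token falls to about 0.21. The bias shifts attention toward the candidates without masking out the rest of the prompt.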

Anti-Hallucination Discussion: This mechanism acts as a “soft structural constraint.” By boosting the attention weights of valid candidates, Attention-Enhanced QLoRA suppresses noise from the LLM’s internal parametric memory, so that the model’s reasoning is grounded in the explicitly retrieved context rather than in open-world hallucinations. It also helps the LLM internalize the constraints of the KGC task during fine-tuning.

4.3.3. Optimization Objective

Finally, we optimize the model to minimize the negative log-likelihood of the ground truth entity $a$ given the structure-constrained context $q$:

$$L_{\mathrm{FT}} = -\sum_{(q,a) \in D_{\mathrm{FT}}} \log P\left(a \mid q; \Theta_{\mathrm{LoRA}}, \Theta_{\mathrm{bias}}\right) \tag{12}$$

This loss function forces the LLM to learn that “my answer must come from the constrained candidate set provided by the KGE model,” effectively transforming it from a stochastic generator into a discriminative reasoner and aligning its output space with the closed-world knowledge graph.
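The objective in Equation (12) can be illustrated with a small numeric sketch. The probabilities below are hypothetical values that a model might assign to the ground truth entity of each fine-tuning example; the function name `fine_tuning_loss` is an assumption made for illustration.

```python
import math
from typing import Sequence


def fine_tuning_loss(gold_probs: Sequence[float]) -> float:
    """Sum of -log P(a | q) over the fine-tuning examples."""
    return -sum(math.log(p) for p in gold_probs)


# Hypothetical probabilities assigned to the gold entity of three examples.
confident = fine_tuning_loss([0.9, 0.8, 0.95])
uncertain = fine_tuning_loss([0.4, 0.3, 0.5])
```

The loss is lower when the model places most of its probability mass on the correct candidate, which is the behavior the fine-tuning stage rewards.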

4.4. Joint Inference Mechanism

In the link prediction phase, we define a division of labor between the KGE model and the LLM. We adopt the well-established “retrieve-then-rerank” two-stage inference architecture; our contribution lies in how the two models collaborate for KGC tasks: the KGE model serves as a high-recall retriever, leveraging its strength in entity matching, while the fine-tuned LLM acts as a high-precision reranker, drawing on its pre-trained semantic knowledge. This pairing combines their complementary strengths and provides a systematic way to mitigate hallucinations.

For the input incomplete triple, the trained knowledge graph embedding model is first used to generate a predicted candidate entity list. The candidate selection is based on the embedding model’s ranking score:

$$\mathrm{Rank}(t_{c}) = \sum_{\substack{t' \in E \\ f(h, r, t') < f(h, r, t_{c})}} 1, \tag{13}$$

where $\mathrm{Rank}(t_{c})$ is the rank of candidate $t_{c}$ (smaller rank indicates higher relevance), and only candidates with $\mathrm{Rank}(t_{c}) \leq K$ are retained. The candidate list is then filled into the discriminant instruction template and input to the fine-tuned large model. The large model outputs the entity with the highest predicted probability, where the probability of candidate $t_{c}$ being the correct answer is:

$$P\left(t_{c} \mid q; M_{\mathrm{LLM}}^{\mathrm{FT}}\right) = \frac{\exp\left(\mathrm{score}(t_{c})\right)}{\sum_{t' \in C(h, r, ?)} \exp\left(\mathrm{score}(t')\right)} \tag{14}$$

where $\mathrm{score}(t_{c})$ is the model’s output logit for the token sequence corresponding to $t_{c}$. The prediction result is ensured to belong to the entity space of the knowledge graph through structured constraints: $t^{*} \in E$.
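The two steps in Equations (13) and (14) can be sketched as follows. The sketch follows Equation (13) as written, so a smaller score f is treated as better and the best entity receives rank 0; whether the authors' implementation uses 0-based or 1-based ranks is not stated. All entity names, scores, and function names are hypothetical.

```python
import math
from typing import Dict, List


def rank_of(candidate: str, f_scores: Dict[str, float]) -> int:
    """Equation (13): number of entities whose score is strictly smaller."""
    return sum(1 for s in f_scores.values() if s < f_scores[candidate])


def retrieve(f_scores: Dict[str, float], k: int) -> List[str]:
    """Keep entities with Rank <= k, ordered from best to worst."""
    kept = [e for e in f_scores if rank_of(e, f_scores) <= k]
    return sorted(kept, key=lambda e: f_scores[e])


def candidate_probabilities(llm_scores: Dict[str, float]) -> Dict[str, float]:
    """Equation (14): softmax of the LLM scores over the candidate set only."""
    m = max(llm_scores.values())
    exps = {e: math.exp(s - m) for e, s in llm_scores.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


# Hypothetical KGE scores f(h, r, t') for a query such as (Steve Jobs, CEO, ?).
f_scores = {"Apple": 0.2, "Microsoft": 0.5, "Tesla": 0.9, "Paris": 3.1}
candidates = retrieve(f_scores, k=2)

# Hypothetical LLM scores for the retained candidates.
llm_scores = {"Apple": 4.0, "Microsoft": 1.0, "Tesla": 0.5}
probs = candidate_probabilities({e: llm_scores[e] for e in candidates})
prediction = max(probs, key=probs.get)
```

Because the softmax is normalized over the retained candidates only, the prediction is always one of the entities returned by the retriever.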

From a hallucination perspective, EGIT explicitly constrains the output space of the large language model during inference. Since the LLM is required to select an answer from the candidate entity set retrieved by the embedding model, all predicted entities are guaranteed to belong to the entity vocabulary of the target knowledge graph. As a result, out-of-graph or schema-violating hallucinations are structurally impossible in EGIT. This design contrasts with unconstrained generative KGC approaches, where the model may produce entities outside the graph schema due to open-ended decoding.

After completing instruction tuning, we evaluate the fine-tuned large model on the structured link prediction task. For each test triple, the trained KGE model generates a candidate entity list sorted in descending order of the score in Equation (7), and only the top-K entities are retained. These candidates fill the discriminant instruction template, which, unlike the fine-tuning instructions, does not contain the correct answer. The large model then outputs one of the candidate entities as its prediction. Prediction confidence is defined as

$$\mathrm{Conf}(t^{*}) = P\left(t^{*} \mid q; M_{\mathrm{LLM}}^{\mathrm{FT}}\right) - \max_{t_{c} \neq t^{*}} P\left(t_{c} \mid q; M_{\mathrm{LLM}}^{\mathrm{FT}}\right), \tag{15}$$

where a higher $\mathrm{Conf}(t^{*})$ indicates a more reliable prediction.
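As a small worked example of Equation (15), the sketch below computes the margin between the predicted entity and its strongest competitor. The probability values and the function name `confidence` are hypothetical, and the function assumes at least two candidates.

```python
from typing import Dict


def confidence(probs: Dict[str, float]) -> float:
    """Equation (15): top probability minus the best competing probability."""
    ordered = sorted(probs.values(), reverse=True)
    return ordered[0] - ordered[1]


# Two hypothetical candidate distributions produced by the reranker.
clear_case = confidence({"Apple": 0.90, "Microsoft": 0.06, "Tesla": 0.04})
close_case = confidence({"Apple": 0.40, "Microsoft": 0.38, "Tesla": 0.22})
```

Both cases predict the same entity, but the margin is 0.84 in the first and 0.02 in the second, so only the first would count as a reliable prediction under this measure.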

By comparing the model prediction results with the correct answers of the test set, various performance indicators of the model can be calculated, including Mean Reciprocal Rank (MRR) and Hits@K:

$$\begin{aligned} \mathrm{MRR} &= \frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} \frac{1}{\mathrm{Rank}(t_{i}^{*})}, \\ \mathrm{Hits}@K &= \frac{1}{N_{\mathrm{test}}} \sum_{i=1}^{N_{\mathrm{test}}} \mathbb{I}\left(\mathrm{Rank}(t_{i}^{*}) \leq K\right), \end{aligned} \tag{16}$$

where $N_{\mathrm{test}}$ is the number of incomplete triples in the test set, and $t_{i}^{*}$ is the correct answer for the i-th test triple. This combination of the embedding model and the large language model can effectively avoid the hallucination problem that may occur when using the large language model directly for completion, while fully leveraging the embedding model’s advantages in entity matching and the large model’s massive pre-trained knowledge. The pseudocode of the algorithm is shown in Algorithm 2.

Algorithm 2: Link Prediction Algorithm Combining Embedding Model and Large Language Model
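The retrieve-then-rerank flow described in this section can be sketched as below. This is a minimal illustration of the two stages, not the authors' Algorithm 2: the prompt wording imitates the example instruction in Section 4.3.2, the KGE retriever is replaced by a pre-ranked entity list, and the fine-tuned LLM is replaced by a lookup table of hypothetical scores.

```python
from typing import Callable, List, Sequence


def build_prompt(head: str, relation: str, candidates: List[str]) -> str:
    """Fill a (hypothetical) discriminative instruction template."""
    options = ", ".join(f"[ENTITY] {c}" for c in candidates)
    return f"Given ({head}, {relation}, ?), select from: {options}."


def joint_inference(
    head: str,
    relation: str,
    ranked_entities: Sequence[str],
    llm_score: Callable[[str, str], float],
    k: int,
) -> str:
    """Stage 1 keeps the top-k KGE entities; stage 2 lets the LLM pick one."""
    candidates = list(ranked_entities[:k])  # best-first list from the KGE model
    prompt = build_prompt(head, relation, candidates)
    # The answer is an argmax over candidates, so it cannot leave the entity set.
    return max(candidates, key=lambda c: llm_score(prompt, c))


# Stand-in for the fine-tuned LLM: a fixed table of hypothetical scores.
def toy_llm_score(prompt: str, candidate: str) -> float:
    return {"Apple": 3.2, "Pixar": 2.9, "NeXT": 1.0}.get(candidate, 0.0)


ranked = ["Pixar", "Apple", "NeXT", "Paris"]  # hypothetical KGE ranking
answer = joint_inference("Steve Jobs", "CEO", ranked, toy_llm_score, k=3)
```

Here the reranker overrides the retriever's first choice, yet the result is still drawn from the retrieved list. With `k=1` the function can only return the retriever's top entity, which shows how K trades recall against the LLM's freedom to correct the ranking.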

5. Experiments

5.1. Datasets

To evaluate the proposed method, we conduct experiments on two widely used benchmark datasets for knowledge graph completion: FB15k-237 [18] and WN18RR [27]. The FB15k-237 dataset is derived from FB15k by removing inverse relations and contains 14,541 entities, 237 relations, and 310,116 triplets. The WN18RR dataset contains 40,943 entities, 11 relations, and 93,003 triplets. Both datasets are split into training, validation, and test sets. The detailed statistics are shown in Table 1.

5.2. Evaluation Metrics

We adopt standard evaluation metrics for knowledge graph completion: Hits@1, Hits@3, Hits@10, and Mean Reciprocal Rank (MRR). Hits@k measures the proportion of correct entities ranked in the top-k predictions. MRR computes the average reciprocal rank of the correct entity across all test triplets. Higher values indicate better performance.
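For readers who want to check the metric definitions numerically, the following sketch computes MRR and Hits@k from a list of ranks. The ranks are hypothetical and assumed to be 1-based (the correct entity in first place has rank 1).

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test triples whose correct entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical 1-based ranks of the correct entity for five test triples.
ranks = [1, 2, 1, 10, 4]
mrr = mean_reciprocal_rank(ranks)
hits1, hits3, hits10 = (hits_at_k(ranks, k) for k in (1, 3, 10))
```

For these five ranks, MRR is 0.57, Hits@1 is 0.4, Hits@3 is 0.6, and Hits@10 is 1.0.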

5.3. Experimental Setup

We compare EGIT with several baseline models, categorized into three groups: (1) Traditional triple-based knowledge graph embedding models (TransE [18], ConvE [27], SimKGC [37], RESCAL [46], ComplEx [24]); (2) GNN-based methods (CompGCN [30], NBFNet [31]); and (3) LLM-based methods (KICGPT [13]).

For our EGIT framework, we use three knowledge graph embedding models (TransE, SimKGC, ComplEx) to generate candidate lists, and two large language models (Llama-3-8B-Instruct, Llama-3.1-8B-Instruct) for fine-tuning and prediction. The QLoRA fine-tuning hyperparameters are listed in Table 2.

5.4. Main Results

The comparative results on FB15k-237 and WN18RR are summarized in Table 3 and Table 4, respectively. Overall, our proposed EGIT framework consistently outperforms both traditional embedding-based baselines and recent LLM-based approaches. We highlight three key observations from the empirical analysis:

Synergy of Structure and Semantics. Compared to traditional KGE baselines (e.g., TransE, ComplEx, SimKGC), EGIT achieves significant performance gains. On the complex FB15k-237 dataset, the best variant of EGIT improves Hits@1 by an absolute margin of 7.0% over the base ComplEx (0.341 vs. 0.271). Traditional KGEs rely solely on structural topology and struggle with complex relations that require semantic understanding. By integrating a fine-tuned LLM as a reranker, EGIT successfully injects semantic reasoning into the prediction process, allowing the model to distinguish correct entities based on contextual descriptions even when structural cues are ambiguous.

Mitigation of Hallucinations. A critical advantage of EGIT is evident when compared with the LLM-based baseline, KICGPT. On the WN18RR dataset, KICGPT reaches an MRR of only 0.564, likely because graph sparsity causes In-Context Learning to hallucinate. In contrast, the best EGIT variant maintains robust performance (MRR 0.674), outperforming KICGPT by 11.0 percentage points in MRR. This validates the effectiveness of our Structure-Aware Instruction Tuning. By constraining the LLM’s output space to the candidate list provided by the KGE model and using Attention-Enhanced QLoRA to focus on these candidates, EGIT effectively suppresses the open-world hallucinations that plague standard generative methods.

Robustness Across Backbones. EGIT consistently improves performance across different KGE backbones (TransE, SimKGC, ComplEx). Even with simple TransE candidates, Llama-3 achieves competitive results, demonstrating the model-agnostic nature of our “Retrieve-then-Rerank” paradigm, where the LLM acts as a universal semantic refiner.

Table 3. Results on the FB15k-237 dataset.

Method	LLM Backbone	MRR	Hits@1	Hits@3	Hits@10
Triple-based methods
TransE [18]	-	0.279	0.198	0.376	0.441
ConvE [27]	-	0.320	0.240	0.350	0.490
SimKGC [37]	-	0.338	0.252	0.364	0.390
RESCAL [46]	-	0.356	0.266	0.390	0.535
GenKGC [39]	-	-	0.192	0.355	0.439
ComplEx [24]	-	0.366	0.271	0.401	0.557
KGTuner [47]	-	0.345	0.252	0.381	0.534
KG-Mixup [48]	-	0.359	0.265	0.395	0.547
UniGE [49]	-	0.343	0.257	0.375	0.523
GNN-based methods
CompGCN [30]	-	0.355	0.264	0.390	0.535
NBFNet [31]	-	0.415	0.321	0.454	0.599
CSProm-KG [50]	-	0.355	0.261	0.389	0.531
LLM-based methods
KICGPT [13]	-	0.412	0.327	0.448	0.581
EGIT
+ TransE	Llama-3-8B	0.372	0.317	0.398	0.514
+ SimKGC	Llama-3-8B	0.398	0.329	0.446	0.539
+ ComplEx	Llama-3-8B	0.418	0.339	0.461	0.577
+ TransE	Llama-3.1-8B	0.368	0.313	0.394	0.512
+ SimKGC	Llama-3.1-8B	0.384	0.327	0.413	0.523
+ ComplEx	Llama-3.1-8B	0.416	0.341	0.454	0.562

Table 4. Results on the WN18RR dataset.

Method	LLM Backbone	MRR	Hits@1	Hits@3	Hits@10
Triple-based methods
TransE [18]	-	0.243	0.043	0.441	0.532
ConvE [27]	-	0.430	0.390	0.440	0.510
SimKGC [37]	-	0.671	0.595	0.719	0.802
RESCAL [46]	-	0.467	0.439	0.478	0.516
GenKGC [39]	-	-	0.287	0.403	0.535
ComplEx [24]	-	0.487	0.441	0.501	0.580
KGTuner [47]	-	0.481	0.438	0.499	0.556
KG-Mixup [48]	-	0.488	0.443	0.505	0.541
UniGE [49]	-	0.491	0.447	0.512	0.563
GNN-based methods
CompGCN [30]	-	0.479	0.443	0.494	0.546
NBFNet [31]	-	0.551	0.497	0.573	0.666
CSProm-KG [50]	-	0.569	0.520	0.590	0.675
LLM-based methods
KICGPT [13]	-	0.564	0.478	0.612	0.677
EGIT
+ TransE	Llama-3-8B	0.508	0.496	0.517	0.571
+ SimKGC	Llama-3-8B	0.674	0.620	0.723	0.796
+ ComplEx	Llama-3-8B	0.610	0.569	0.626	0.692
+ TransE	Llama-3.1-8B	0.494	0.479	0.506	0.564
+ SimKGC	Llama-3.1-8B	0.654	0.580	0.725	0.797
+ ComplEx	Llama-3.1-8B	0.604	0.568	0.625	0.689

5.5. Ablation Study

Analysis on Individual Component. To rigorously assess the contribution of individual components in the EGIT framework, we conduct an ablation study on the FB15k-237 dataset. We benchmark the full model against three strategic variants to isolate the effects of structural priors and task-specific adaptation: (1) w/o Fine-tuning, which utilizes the LLM in an inference-only mode with KGE-retrieved candidates but without parameter updates; (2) w/o Candidate List, which fine-tunes the LLM solely on triples without the structural constraints provided by the candidate set; and (3) Direct LLM Prediction, which serves as the lower bound where the vanilla LLM predicts the tail entity directly from the query.

The results are summarized in Table 5. We observe that both the structural candidate retrieval and the instruction tuning are indispensable. First, removing the candidate list (w/o Candidate List) leads to a precipitous drop in performance (MRR decreases from 0.418 to 0.342). This shows that purely semantic reasoning is insufficient for KGC; the KGE-retrieved candidates serve as a necessary “structural anchor” to narrow the search space and mitigate open-world hallucinations. Second, even with candidates provided, the lack of fine-tuning (w/o Fine-tuning) results in suboptimal performance (0.381 MRR). This underscores the necessity of our structure-aware adaptation, which aligns the LLM’s reasoning patterns with the discriminatory logic of selecting from a closed set. Notably, the full EGIT framework achieves the highest performance across all metrics, confirming the synergistic effect of combining structural priors with semantic-aware instruction tuning.

Analysis on Hallucination Mitigation. We note that existing benchmark datasets for knowledge graph completion, such as FB15k-237 and WN18RR, do not provide explicit hallucination annotations. Under the standard closed-world evaluation protocol, hallucination is implicitly reflected by ranking-based metrics such as Mean Reciprocal Rank (MRR) and Hits@K, since any incorrect ranking corresponds to selecting a non-factual or structurally invalid entity. Moreover, generative hallucinations such as out-of-vocabulary entities or schema violations cannot occur in EGIT due to its candidate-constrained inference mechanism, whereas such errors may arise in unconstrained generative LLM-based KGC methods.

5.6. Training Cost and Efficiency

To assess the practicality of EGIT, we detail the computational costs involved. All experiments were conducted on a server equipped with an Intel(R) Xeon(R) Platinum 8468V CPU and 128 GB of RAM, utilizing four NVIDIA RTX 3090 GPUs (24 GB VRAM each). The training process consists of two main phases: (1) KGE Model Training: Training embedding models like TransE or ComplEx on FB15k-237 typically converges within 2–4 h. (2) LLM Fine-tuning: The instruction tuning of the Llama-3-8B model using QLoRA (with hyperparameters in Table 2) is highly efficient. On a single GPU, the fine-tuning process for one dataset completes within 6–8 h, leveraging the significant parameter reduction (e.g., updating only 0.1% of parameters with rank = 64) offered by the low-rank adapters of QLoRA [16,45]. This demonstrates that EGIT achieves its performance gains without prohibitive computational overhead.
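The source of the parameter reduction can be seen from a per-matrix count. The sketch below uses the standard low-rank adapter form (two factors of shapes d_out × r and r × d_in); the 4096 × 4096 matrix size is a hypothetical example, and the function name `lora_param_count` is an assumption made for illustration.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of one adapter: B (d_out x r) plus A (r x d_in)."""
    return rank * (d_out + d_in)


# Hypothetical 4096 x 4096 projection matrix adapted with rank 64 (Table 2).
full = 4096 * 4096
adapter = lora_param_count(4096, 4096, rank=64)
fraction = adapter / full
```

For this single matrix the adapter has 524,288 trainable parameters against 16,777,216 frozen ones, about 3%. The model-wide fraction depends on which weight matrices carry adapters, which is not specified here, so the sketch does not attempt to reproduce the 0.1% figure.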

6. Conclusions

In this paper, we presented EGIT, a novel framework that synergizes the structural reliability of KGEs with the semantic reasoning capabilities of LLMs. First, we introduced an Embedding-Guided Instruction Generation pipeline, which synthesizes high-quality, annotation-free training data. Furthermore, we proposed a Structure-Aware Instruction Tuning protocol that incorporates semantic token initialization and Attention-Enhanced QLoRA to ground the model’s reasoning. Finally, we implemented a Joint Inference Mechanism using a “Retrieve-then-Rerank” strategy, which effectively transforms the LLM into a constrained discriminator to mitigate hallucinations. By explicitly constraining both the training and inference processes to structurally valid candidate entities, EGIT mitigates hallucination in KGC by design, aligning large language model reasoning with the closed-world assumptions of knowledge graphs.

Extensive experiments on standard benchmarks (FB15k-237 and WN18RR) demonstrate that EGIT achieves state-of-the-art performance, outperforming both traditional embedding models and recent LLM-based methods. We note that the inference latency introduced by the LLM reranking stage is relatively high compared with that of lightweight embedding models. In the future, we aim to explore knowledge distillation techniques to accelerate the inference process and extend our framework to inductive settings.

Author Contributions

Conceptualization, P.Z. (Pengfei Zhang), X.X., J.W., J.S., M.S., and Y.Z.; methodology, P.Z. (Pengfei Zhang) and J.S.; software, X.L.; validation, J.S.; writing—original draft preparation, P.Z. (Pengfei Zhang), J.S., X.Z., D.C., X.P., S.H., and G.Z.; writing—review and editing, S.H., P.Z. (Ping Zong), G.Z., Z.O., M.S., and Y.Z.; supervision, Z.O., M.S., and Y.Z.; project administration, Z.O.; funding acquisition, P.Z. (Pengfei Zhang), X.X., J.W., and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Science and Technology Project of State Grid Hebei Information and Telecommunication Branch, grant number kj2024-018. The APC was funded by the Science and Technology Project of State Grid Hebei Information and Telecommunication Branch.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/thunlp/OpenKE/tree/master/benchmarks (GitHub—OpenKE Benchmark Datasets) accessed on 15 January 2026.

Acknowledgments

During the preparation of this manuscript, the authors used Gemini 3 solely for language refinement, including grammar, phrasing, and text editing. The AI tool did not participate in the research design, data analysis, experimental development, scientific interpretation, or the generation of technical content. All AI-assisted text was fully reviewed, verified, and revised by the authors, who take full responsibility for the final scientific content. All authors have agreed to this acknowledgment.

Conflicts of Interest

Authors Pengfei Zhang, Xing Xu, Junying Wu, and Xin Lu were employed by the company State Grid Hebei Information and Telecommunication Branch. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from the Science and Technology Project of State Grid Hebei Information and Telecommunication Branch. The funder had the following involvement with the study: writing of the original draft.

References

1. Yani, M.; Krisnadhi, A.A. Challenges, techniques, and trends of simple knowledge graph question answering: A survey. Information 2021, 12, 271.
2. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A survey on knowledge graph-based recommender systems. IEEE Trans. Knowl. Data Eng. 2020, 34, 3549–3568.
3. Zhang, O.; Lin, H.; Zhang, X.; Wang, X.; Wu, Z.; Ye, Q.; Zhao, W.; Wang, J.; Ying, K.; Kang, Y.; et al. Graph Neural Networks in Modern AI-Aided Drug Discovery. Chem. Rev. 2025, 125, 10001–10103.
4. Bongini, P.; Bianchini, M.; Scarselli, F. Molecular generative graph neural networks for drug discovery. Neurocomputing 2021, 450, 242–252.
5. Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; Taylor, J. Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 1247–1250.
6. Bian, H. LLM-empowered knowledge graph construction: A survey. arXiv 2025, arXiv:2510.20345.
7. Ji, S.; Pan, S.; Cambria, E.; Marttinen, P.; Yu, P.S. A survey on knowledge graphs: Representation, acquisition, and applications. IEEE Trans. Neural Networks Learn. Syst. 2021, 33, 494–514.
8. Ali, M.; Berrendorf, M.; Hoyt, C.T.; Vermue, L.; Sharifzadeh, S.; Tresp, V.; Lehmann, J. PyKEEN 1.0: A python library for training and evaluating knowledge graph embeddings. J. Mach. Learn. Res. 2021, 22, 1–6.
9. Yao, L.; Peng, J.; Mao, C.; Luo, Y. Exploring large language models for knowledge graph completion. In Proceedings of the ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
10. Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023, 55, 1–38.
11. Zhang, Y.; Li, Y.; Cui, L.; Cai, D.; Liu, L.; Fu, T.; Huang, X.; Zhao, E.; Zhang, Y.; Chen, Y.; et al. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. Comput. Linguist. 2025, 51, 1373–1418.
12. Wagner, R.; Kitzelmann, E.; Boersch, I. Mitigating hallucination by integrating knowledge graphs into LLM inference–a systematic literature review. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, 27 July–1 August 2025; Volume 4: Student Research Workshop, pp. 795–805.
13. Wei, Y.; Huang, Q.; Zhang, Y.; Kwok, J. Kicgpt: Large language model with knowledge in context for knowledge graph completion. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 8667–8683.
14. Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Ma, J.; Li, R.; Xia, H.; Xu, J.; Wu, Z.; Chang, B.; et al. A survey on in-context learning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 1107–1128.
15. Zhang, Y.; Chen, Z.; Guo, L.; Xu, Y.; Zhang, W.; Chen, H. Making large language models perform better in knowledge graph completion. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024; pp. 233–242.
16. Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. ICLR 2022, 1, 3.
17. Liu, Y.; Cao, Y.; Lin, X.; Shang, Y.; Wang, S.; Pan, S. Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, 4–9 November 2025; pp. 20981–20995.
18. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26, 2787–2795.
19. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec, QC, Canada, 27–31 July 2014; Volume 28.
20. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015.
21. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. Rotate: Knowledge graph embedding by relational rotation in complex space. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
22. Zhang, Z.; Cai, J.; Zhang, Y.; Wang, J. Learning hierarchy-aware knowledge graph embeddings for link prediction. In Proceedings of the AAAI conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3065–3072.
23. Yang, B.; Yih, W.T.; He, X.; Gao, J.; Deng, L. Embedding entities and relations for learning and inference in knowledge bases. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
24. Trouillon, T.; Welbl, J.; Riedel, S.; Gaussier, É.; Bouchard, G. Complex embeddings for simple link prediction. In Proceedings of the International Conference on Machine Learning. PMLR, New York, NY, USA, 20–22 June 2016; pp. 2071–2080.
25. Balažević, I.; Allen, C.; Hospedales, T. Tucker: Tensor factorization for knowledge graph completion. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5185–5194.
26. Kazemi, S.M.; Poole, D. SimplE Embedding for Link Prediction in Knowledge Graphs. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8 December 2018; Volume 31.
27. Dettmers, T.; Minervini, P.; Stenetorp, P.; Riedel, S. Convolutional 2d knowledge graph embeddings. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
28. Nguyen, T.D.; Nguyen, D.Q.; Phung, D. A novel embedding model for knowledge base completion based on convolutional neural network. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 2 (Short Papers), pp. 327–333.
29. Schlichtkrull, M.; Kipf, T.N.; Bloem, P.; Van Den Berg, R.; Titov, I.; Welling, M. Modeling relational data with graph convolutional networks. In Proceedings of the European Semantic Web Conference; Springer: Berlin/Heidelberg, Germany, 2018; pp. 593–607.
30. Vashishth, S.; Sanyal, S.; Nitin, V.; Talukdar, P. Composition-based multi-relational graph convolutional networks. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
31. Zhu, Z.; Zhang, Z.; Xhonneux, L.P.; Tang, J. Neural bellman-ford networks: A general graph neural network framework for link prediction. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021.
32. Zhang, Y.; Yao, Q. Knowledge graph reasoning with relational digraph. In Proceedings of the Web Conference (WWW), Austin, TX, USA, 30 April–4 May 2022.
33. Xiao, H.; Huang, M.; Meng, L.; Zhu, X. SSP: Semantic space projection for knowledge graph embedding with text descriptions. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
34. Yao, L.; Mao, C.; Luo, Y. KG-BERT: BERT for knowledge graph completion. arXiv 2019, arXiv:1909.03193.
35. Wang, B.; Shen, T.; Long, G.; Zhou, T.; Wang, Y.; Chang, Y. Structure-augmented text representation learning for efficient knowledge graph completion. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 1737–1748.
36. Wang, W.; Bi, B.; Yan, M.; Wu, C.; Bao, Z.; Xia, J.; Peng, L.; Si, L. Structbert: Incorporating language structures into pre-training for deep language understanding. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
37. Wang, L.; Zhao, W.; Wei, Z.; Liu, J. SimKGC: Simple Contrastive Knowledge Graph Completion with Pre-trained Language Models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022.
38. Saxena, A.; Kochsiek, A.; Gemulla, R. Sequence-to-Sequence Knowledge Graph Completion and Question Answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2814–2828.
39. Xie, X.; Zhang, N.; Li, Z.; Deng, S.; Chen, H.; Xiong, F.; Chen, M.; Chen, H. From discrimination to generation: Knowledge graph completion with generative transformer. In Proceedings of the Companion Proceedings of the Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 162–165.
40. Chen, B.; Bertozzi, A.L. AutoKG: Efficient automated knowledge graph generation for language models. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3117–3126.
41. Chen, J.; Ma, L.; Li, X.; Thakurdesai, N.; Xu, J.; Cho, J.H.; Nag, K.; Korpeoglu, E.; Kumar, S.; Achan, K. Knowledge graph completion models are few-shot learners: An empirical study of relation labeling in e-commerce with llms. arXiv 2023, arXiv:2305.09858.
42. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
43. Galkin, M.; Yuan, X.; Mostafa, H.; Tang, J.; Zhu, Z. Towards Foundation Models for Knowledge Graph Reasoning. In Proceedings of the NeurIPS 2023 Workshop: New Frontiers in Graph Learning, New Orleans, LA, USA, 15 December 2023.
44. Guan, X.; Liu, Y.; Lin, H.; Lu, Y.; He, B.; Han, X.; Sun, L. Mitigating large language model hallucinations via autonomous knowledge graph-based retrofitting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 18126–18134.
45. Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. Qlora: Efficient finetuning of quantized llms. Adv. Neural Inf. Process. Syst. 2023, 36, 10088–10115.
46. Nickel, M.; Tresp, V.; Kriegel, H.P. A three-way model for collective learning on multi-relational data. In Proceedings of the International Conference on Machine Learning, Bellevue, WA, USA, 28 June–2 July 2011; Volume 11, pp. 3104482–3104584.
47. Zhang, Y.; Zhou, Z.; Yao, Q.; Li, Y. KGTuner: Efficient Hyper-parameter Search for Knowledge Graph Learning. arXiv 2022, arXiv:2205.02460.
48. Shomer, H.; Jin, W.; Wang, W.; Tang, J. Toward Degree Bias in Embedding-Based Knowledge Graph Completion. In Proceedings of the ACM Web Conference 2023, New York, NY, USA, 30 April–4 May 2023; pp. 705–715.
49. Liu, Y.; Cao, Z.; Gao, X.; Zhang, J.; Yan, R. Bridging the Space Gap: Unifying Geometry Knowledge Graph Embedding with Optimal Transport. In Proceedings of the ACM Web Conference 2024, New York, NY, USA, 13–17 May 2024; pp. 2128–2137.
50. Chen, C.; Wang, Y.; Sun, A.; Li, B.; Lam, K.Y. Dipping PLMs Sauce: Bridging Structure and Text for Effective Knowledge Graph Completion via Conditional Soft Prompting. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 11489–11503.

Figure 1. The overall framework of EGIT. (a) Instruction Generation: A pre-trained KGE model retrieves structural candidates to automatically synthesize high-quality, annotation-free instruction data. (b) Joint Prediction: A “Retrieve-then-Rerank” pipeline where the KGE model filters the search space and the fine-tuned LLM performs high-precision reasoning, effectively mitigating hallucinations.

Figure 2. (a) An example of a fine-tuning instruction. (b) An instance of fine-tuning instruction.

Table 1. Statistics of the datasets used in the experiments.

Dataset	#Entities	#Relations	#Triplets	#Train	#Valid	#Test
FB15k-237	14,541	237	310,116	272,115	17,535	20,466
WN18RR	40,943	11	93,003	86,835	3034	3134

Table 2. Hyperparameter settings for QLoRA fine-tuning.

Hyperparameter	Value
rank	64
alpha	16
dropout	0.1
precision	bf16
quantization precision	INT4

Table 5. Ablation study on FB15k-237 dataset (using ComplEx as embedding model and Llama-3-8B).

Variant	MRR	Hits@1	Hits@3	Hits@10
EGIT (Full Model)	0.418	0.339	0.461	0.577
w/o Fine-tuning	0.381	0.302	0.423	0.541
w/o Candidate List	0.342	0.265	0.381	0.498
Direct LLM Prediction	0.287	0.210	0.325	0.443
