Review

Knowledge Integrity in Large Language Models: A State-of-The-Art Review

1 Department of Computing and Information Systems, Faculty of Computing, Sabaragamuwa University of Sri Lanka, Belihuloya 70140, Sri Lanka
2 School of Engineering and Technology, Central Queensland University, Rockhampton, QLD 4701, Australia
* Author to whom correspondence should be addressed.
Information 2025, 16(12), 1076; https://doi.org/10.3390/info16121076
Submission received: 29 September 2025 / Revised: 10 November 2025 / Accepted: 25 November 2025 / Published: 4 December 2025

Abstract

Large Language Models (LLMs) are emerging technologies and a growing research trend in Artificial General Intelligence (AGI), which envisions a future where machines can think and learn like humans across a wide range of tasks. Information generated by LLMs is essentially the prediction of next tokens in Natural Language Processing (NLP) tasks. However, the generated content is always subject to issues of truthfulness and hallucinations. The information and knowledge integrity of LLM-generated content therefore remains uncertain. Exploring recent literature on the integrity of LLMs in a systematic manner is both timely and essential. Moreover, ensuring the reliability of LLMs in real-world applications is critical. Various approaches have been explored to promote information and knowledge integrity in LLMs, including adversarial training, data augmentation, and calibration methods. However, beyond these techniques, other strategies also contribute to maintaining knowledge integrity. This paper specifically focuses on three such approaches: knowledge distillation, semantic integrity, and provenance tracking, which play essential roles in ensuring that LLMs generate accurate, consistent, and trustworthy information. Knowledge distillation enhances model efficiency by transferring knowledge from larger models to smaller ones while preserving essential learning without compromising knowledge integrity, thereby reducing hallucinations. Semantic integrity safeguards consistency and strengthens the robustness of generated outputs by checking that they remain meaningful with respect to their context. Provenance tracking improves transparency and trustworthiness through mechanisms such as data lineage and explainability, thereby ensuring the credibility of LLM-generated responses. This review suggests that knowledge distillation, semantic integrity, and provenance tracking can enhance the reliability of LLM outputs, with prior studies reporting reductions in hallucination rates, improvements in robustness, and gains in factual consistency.

1. Introduction

The development and advancement of Large Language Models (LLMs) have revolutionized various domains, from conversational AI [1,2] to automated content generation [3]. However, these models are highly susceptible to security risks [4], including adversarial attacks [5], data leakage [6], and the propagation of misinformation [7]. A number of approaches have been introduced to enhance the robustness, trustworthiness, and efficiency of LLMs without sacrificing knowledge integrity.
While security, robustness, and trustworthiness are often discussed in LLM research, these aspects are inherently connected to the broader notion of knowledge integrity. Knowledge integrity goes beyond conventional security; it focuses on ensuring that the information learned, represented, and generated by LLMs remains accurate, consistent, and verifiable. In other words, maintaining integrity is not only about protecting data from attacks or leakage, but also about safeguarding the correctness and reliability of the model’s internal and output knowledge. In this context, the approaches discussed in this paper, including knowledge distillation, semantic integrity, and provenance tracking, are examined as key mechanisms that directly contribute to preserving the integrity of knowledge in LLMs.
Various methods have been proposed to enhance the reliability and robustness of LLMs, including adversarial training [8], data augmentation [9], and calibration techniques [10]. Complementing these are approaches that more directly target knowledge integrity, such as knowledge distillation [11], semantic integrity [12], and provenance tracking [13]. Of these, knowledge distillation [14,15] serves a critical role by transferring knowledge from large-capacity models to smaller, more efficient counterparts. Furthermore, ensuring semantic integrity [12] and implementing provenance tracking [13] further support the verification of authenticity and correctness of AI-generated content, thereby reducing the risks of manipulation and misinformation.
Understanding these mechanisms also requires situating them within the broader conceptual frameworks of information security [4] and knowledge management [16], which share critical interconnections with knowledge integrity. Together, these domains address how information is compressed, efficiently transferred across systems, validated for trustworthiness, and traced by various stakeholders. Ultimately, information must be protected throughout its entire lifecycle; thus, maintaining information and knowledge integrity necessitates attention to both security and management dimensions.
Within this context, knowledge distillation focuses on transferring essential information from larger models to smaller ones while maintaining accuracy, semantic integrity ensures that meaning remains consistent across transformations, and provenance tracking monitors the origins and modifications of information to safeguard its integrity.
This review therefore provides a comprehensive analysis of these approaches, proposing a holistic framework for maintaining information trustworthiness where vulnerabilities in any single domain can undermine the overall system. Such an interconnected perspective is increasingly crucial as organizations and researchers confront growing challenges in preserving the integrity of information and knowledge in AI-driven environments.
The main research questions guiding this study are as follows:

1.1. Research Questions

1. RQ1: What techniques are currently used to ensure information and knowledge integrity in Large Language Models (LLMs)?
2. RQ2: How does knowledge distillation contribute to preserving integrity in LLMs?
3. RQ3: What approaches are proposed to safeguard semantic integrity in LLM outputs?
4. RQ4: What methods are employed to implement provenance tracking in LLMs?
It is noteworthy that this review specifically examines the mechanisms that support information and knowledge integrity in LLMs, with an emphasis on their practical and theoretical contributions.

1.2. Contributions of This Review

  • A comprehensive taxonomy of LLM information and knowledge integrity, focusing on knowledge distillation, semantic integrity, and provenance tracking.
  • An analysis of knowledge distillation techniques to promote hallucination-free and content-aware responses in smaller LLMs.
  • An examination of semantic integrity and provenance tracking approaches to mitigate threats, ensure legitimate outputs, and enhance the authenticity and traceability of LLM-generated content.

2. Background

2.1. General Overview

2.1.1. Large Language Models

Large Language Models (LLMs) [17] such as GPT [18,19,20,21], LLaMa [22], Mistral, Gemma [23], Gemini [24,25], PaLM [26], BERT [27], T5 [28], and their derivatives have revolutionized natural language processing (NLP). They enable machines to generate human-like text, reason over complex problems, and understand context at scale. Built upon transformer architectures [29] and trained on vast corpora, LLMs have fundamentally altered information processing and knowledge representation. They demonstrate remarkable capabilities in natural language understanding, generation, question answering, summarization, translation, and reasoning, which are milestones seen as early steps towards Artificial General Intelligence (AGI) [30].
Beyond their impressive performance, the scale and architecture [29] of LLMs are what enable such capabilities. These models typically contain billions, or even trillions, of parameters and are trained on massive, diverse corpora collected from the internet, books, and other sources. This scale allows them to capture subtle patterns in language and generalize across tasks without task-specific programming. However, this same reliance on broad, uncurated data makes them susceptible to encoding biases, misinformation, and harmful content that may surface in generated outputs. Their ability to generalize also introduces unpredictability, as responses can vary significantly across contexts even when the prompts appear similar.
As LLMs are increasingly deployed in domains such as healthcare [31,32], finance [33], and education [34], their reliability is under greater scrutiny. Hallucinations, adversarial vulnerabilities, and opaque reasoning raise critical concerns around trustworthiness.

2.1.2. Information Security in Large Language Models

Information security in the context of LLMs spans multiple dimensions beyond conventional cybersecurity. The novel architecture and operational characteristics of LLMs expose them to specific vulnerabilities and attack vectors. The wider use of LLMs introduces a range of information security challenges [35], including the risk of data leakage, model inversion attacks, and prompt injection. Because LLMs are trained on large datasets, they may inadvertently retain or regenerate harmful information.
Ensuring the security of both the training data and model behavior is vital in any domain. LLMs face distinctive security threats, including prompt injection attacks [36], where maliciously crafted inputs can manipulate model behavior; such inputs are used to bypass safety measures or extract sensitive information from LLMs. Data poisoning [37] represents another critical concern, where adversarial examples in training data can compromise model integrity and lead to biased or harmful outputs. Likewise, model inversion attacks [38] pose significant risks to the privacy of training data, a vulnerability that allows attackers to reconstruct sensitive information from model parameters. Additionally, LLMs may memorize and reproduce training data [39], leading to potential privacy breaches when sensitive information is exposed through model outputs. As a result, model outputs must be filtered at the system level, with reference to the underlying knowledge representation, to prevent the leakage of sensitive information.

2.1.3. Knowledge Management in Large Language Models

Knowledge management in LLMs represents a shift from traditional knowledge systems, which are largely based on structured databases or expert systems. LLMs instead store knowledge implicitly within their neural parameters [40], which creates both opportunities and challenges for effective knowledge governance.
Effective knowledge management in LLMs is tied to the model’s ability to acquire, organize, update, and utilize information accurately. Here, efficiency is not the primary objective: because LLMs are static after training [41], managing evolving knowledge presents a major challenge. Without active knowledge management, models may produce outdated or incorrect outputs, which can lead to uncertainty.
Another challenge stems from the fact that LLMs encode knowledge through distributed representations learnt during training [42]. These learnt representations make it difficult to verify, update, or control specific pieces of information. Such uncontrollable knowledge storage complicates traditional knowledge management practices, such as knowledge auditing [43] and selective knowledge removal [44]. This challenge is further compounded by the fact that LLMs may show emergent knowledge behaviors that were not explicitly present in their training data.
Traditional knowledge management systems allow for precise updates and corrections to stored information. Modifying knowledge [45] in LLMs, by contrast, typically requires expensive retraining or fine-tuning processes. As a result, the knowledge of an LLM is almost impossible for anyone other than its creators to modify. This limitation raises questions about knowledge credibility, accuracy maintenance, and the ability to correct misinformation.
Information security and knowledge management in LLMs give rise to the concept of information and knowledge integrity [46]. This combination of processes encompasses the assurance that the information processed and the knowledge represented by LLMs remain accurate, consistent, and trustworthy throughout the system’s lifecycle.
Information and knowledge integrity depend on both information security and knowledge management. Emphasizing the accuracy, consistency, and trustworthiness of the knowledge handled by LLMs once they are created is therefore important for knowledge integrity. Maintaining knowledge integrity is critical for applications where factual correctness and the protection of sensitive information are non-negotiable.

2.1.4. Integrity in LLM Context

Information integrity in LLMs refers to the preservation of data accuracy and consistency during processing, storage, and retrieval operations. Knowledge integrity extends this concept to the correctness and reliability of the knowledge representations learned by the model. These concepts form the foundation for trustworthy AI systems [16], which can be relied upon for critical decision-making processes.
Assessing LLMs requires a multidimensional approach that considers:
  • Truthfulness, including factual accuracy, consistency, and source reliability;
  • Safety, by addressing risks such as toxicity, bias, and misinformation;
  • Robustness, by evaluating performance under adversarial prompts, out-of-distribution data, and noisy inputs;
  • Fairness, by ensuring equitable outcomes across group, individual, and counterfactual perspectives;
  • Privacy, by protecting against data leakage, membership inference, and attribute inference.
Together, these dimensions ensure that model outputs remain accurate, reliable, unbiased, resilient, and respectful of sensitive information, collectively forming the foundation of integrity, as outlined in Table 1.
Maintaining integrity in LLMs faces several challenges. For example, the nature of neural networks makes it difficult to trace how specific inputs influence outputs and how knowledge is being represented internally. Another example stems from the statistical nature of LLM responses, where identical inputs might produce varying outputs. In other words, obtaining different outputs for the same prompt in different instances can complicate traditional integrity verification methods.

2.1.5. Bridging Knowledge Management and Information Security

In this review, it is evident that existing studies typically address knowledge management and information security in LLMs as separate concerns. To the best of our knowledge, no study has explicitly explored their combined role in shaping the trustworthiness of LLMs in the context of integrity.
In this work, we introduce the perspective that knowledge integrity emerges at the intersection of knowledge management and information security, illustrated in Figure 1. Knowledge management ensures that LLMs acquire, update, and organize information reliably, while information security protects that knowledge from hallucinations and misinformation.
Knowledge Integrity (KI) = Knowledge Management (KM) + Information Security (IS)    (1)
We define Knowledge Integrity (KI) as the integration of multiple dimensions that collectively ensure reliable, secure, and trustworthy outputs from LLMs. Equation (1) captures our perspective that knowledge management and information security are both critical for the knowledge integrity of an LLM.
These components form a conceptual framework in which knowledge integrity is achieved by balancing efficiency, transparency, contextual meaning, security, accountability, trustworthiness, and resilience.
Together, these components provide a foundation for defining integrity in the LLM context, which we conceptualize and explore in the following subsection.

2.1.6. Knowledge Distillation

Knowledge Distillation (KD) [47] has been introduced as a promising approach for preserving information and knowledge integrity in LLMs. The technique transfers knowledge from a larger, more complex teacher model to a smaller model [48], with the ultimate goal of creating a more efficient student model. It also provides potential opportunities for integrity verification and enhancement.
KD provides a mechanism for controlled knowledge transfer, allowing practitioners to selectively transfer specific types of knowledge while excluding others. This selective transfer capability can be leveraged to enhance integrity by ensuring that only verified, high-quality knowledge is propagated to production systems.

2.1.7. Semantic Integrity

Semantic integrity focuses on preserving the meaning and contextual accuracy of information and knowledge within LLM systems [49]. This dimension of integrity is particularly critical given LLMs’ primary function of processing and generating natural language with semantic content.
Semantic integrity requires that the meaning of information is preserved across all system operations, from input processing through knowledge representation to output generation. This includes maintaining consistency in how concepts are represented and ensuring that relationships between different pieces of knowledge remain coherent and accurate.
LLMs must maintain semantic integrity across different contexts and applications. This involves ensuring that knowledge remains contextually appropriate and that the model’s understanding of concepts remains consistent, regardless of how those concepts are presented or queried. Achieving this level of semantic integrity requires sophisticated approaches to knowledge representation and integrity verification. Therefore, separate modules for coherence, consistency, and completeness can enable semantic integrity.

2.1.8. Provenance Tracking in LLMs

Provenance tracking represents a critical component of integrity assurance in LLM systems [50], providing mechanisms to trace the origin, transformation, and lineage of information and knowledge throughout the system’s lifecycle.
Tracing the exact source of information generated by an LLM is nearly impossible; however, the factual correctness of individual pieces of information can be audited by invoking a knowledge base. Establishing clear provenance tracking requires comprehensive documentation of data sources [51]. Documenting preprocessing steps is also complicated, and it becomes even more difficult for training procedures. This lineage information is essential for integrity verification, and provenance tracking further supports bias detection and compliance with regulatory requirements. The complex and distributed nature of knowledge in LLMs makes traditional provenance tracking methods insufficient.

2.2. Framework for Knowledge Integrity

The intersection of information security and knowledge management in LLMs is a unique area that falls within the scope of information and knowledge integrity. Current approaches to integrity assurance in traditional systems are insufficient for the unique characteristics of LLMs, necessitating the development of novel frameworks and methodologies.
The integration of knowledge distillation, provenance tracking, and semantic integrity mechanisms offers promising directions for developing comprehensive integrity assurance frameworks for LLM systems. Further research is needed to develop practical, scalable solutions that can be deployed in real-world applications while maintaining the performance and capabilities that make LLMs valuable.
While knowledge distillation, semantic integrity, and provenance tracking target different aspects of LLM performance, they are interdependent and complementary within the proposed framework. Knowledge distillation reduces errors and preserves core knowledge at the model level, semantic integrity ensures outputs are contextually meaningful and consistent, and provenance tracking provides transparency and accountability of information sources. Together, these mechanisms create a layered approach that collectively enhances the reliability, trustworthiness, and overall integrity of LLM-generated knowledge.
Understanding and addressing these challenges is crucial for the continued adoption and trustworthy deployment of LLMs in critical applications where information and knowledge integrity is paramount.
Figure 2 depicts the workflow of a response system designed to retrieve information while enforcing knowledge integrity through a layered approach. A user’s prompt is initially answered by a student model trained via knowledge distillation from a more complex teacher model, ensuring that core knowledge is retained in a compact, efficient format. To preserve knowledge integrity, the generated answer undergoes semantic validation, where coherence, completeness, and consistency checks ensure the response is logical and accurate. Simultaneously, information integrity is maintained by a provenance tracker that traces the origin of the data, guaranteeing the response is trustworthy and backed by verifiable sources. This multi-layer process reinforces the generation of meaningful, source-backed, and logically consistent answers, minimizing risks such as hallucinations or misinformation in AI outputs.
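To make the layered workflow concrete, the sketch below wires the three layers together in Python. It is only a minimal illustration of the idea in Figure 2, not an implementation of any existing system: the component names (student_generate, semantic_checks, trace_sources) are hypothetical placeholders for a distilled student model, a semantic validator, and a provenance store.

```python
# Minimal sketch of the layered workflow in Figure 2 (hypothetical component names).

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ValidatedAnswer:
    text: str
    semantic_ok: bool
    sources: List[str] = field(default_factory=list)


def answer_with_integrity(
    prompt: str,
    student_generate: Callable[[str], str],             # distilled student model
    semantic_checks: List[Callable[[str, str], bool]],  # coherence, consistency, completeness
    trace_sources: Callable[[str], List[str]],          # provenance tracker
) -> ValidatedAnswer:
    """Generate an answer, then validate its meaning and trace its sources."""
    draft = student_generate(prompt)

    # Semantic integrity layer: every check must pass for the draft to be accepted.
    semantic_ok = all(check(prompt, draft) for check in semantic_checks)

    # Information integrity layer: record where the supporting data came from.
    sources = trace_sources(draft)

    return ValidatedAnswer(text=draft, semantic_ok=semantic_ok, sources=sources)
```

In a fuller implementation, a failed semantic check or an empty source list could trigger regeneration or escalation to the teacher model, consistent with the layered approach described above.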
The remainder of this paper is structured as follows: Section 3 discusses and compares existing work in this domain. Section 4 outlines the methodology adopted for this review. Section 5 presents our key findings from the review. Finally, Section 6 concludes the paper and offers recommendations for future research while highlighting several limitations in this study.

3. Comparison with Existing Work

We have reviewed several existing studies in the field of LLMs that focused on information and knowledge integrity.
This review explores key techniques and methodologies aimed at enhancing the security and integrity of Large Language Models (LLMs). With the increasing reliance on LLMs in critical applications, concerns regarding model efficiency, data integrity, adversarial attacks, and provenance tracking have gained significant attention. Specifically, this paper systematically examines three crucial areas:
  • Knowledge Distillation: Improving model efficiency while retaining performance.
  • Semantic Integrity: Ensuring the consistency and correctness of generated outputs.
  • Provenance Tracking: Enhancing traceability and accountability in model training and usage.
Several recent surveys have addressed different aspects of LLMs that are relevant to our work. Wang et al. [52] provide a comprehensive survey on factuality in LLMs, particularly in the context of retrieval augmentation, highlighting issues such as hallucinations and truthfulness, thereby supporting research on semantic integrity. Similarly, Wang et al. [53] examine strategies for enhancing LLMs through knowledge expansion methods such as continual learning and retrieval-based adaptation, which aligns with the broader goal of preserving information and knowledge integrity. However, their focus is primarily on applying external knowledge to LLMs, with discussions of the associated challenges, rather than on integrity itself.
Li et al. [54] investigate the boundaries of LLM knowledge by categorizing knowledge types and identifying limitations in knowledge retention and accuracy. Their work contributes a perspective on the constraints of information integrity with LLMs.
In another survey, Wang et al. [42] review the integration of LLMs with Knowledge Representation Learning (KRL), demonstrating how LLMs can enhance KRL and, in turn, uphold knowledge integrity. This line of research opens new possibilities for improving the representation and management of knowledge within LLMs. Yang et al. [50] provide an in-depth analysis of knowledge distillation for LLMs, distinguishing between white-box and black-box approaches to improve efficiency.
Huang et al. [55] shift the focus towards safety and trustworthiness, discussing vulnerabilities in LLMs and examining how traditional verification and validation techniques can be applied to reinforce integrity. Similarly, Xu et al. [14] provide a comprehensive taxonomy of knowledge distillation, covering algorithms, methods, elicitation techniques, and opportunities for improving LLM efficiency and reliability.
By carefully considering these works, as summarized in Table 2, we conclude that there is currently no review dedicated to addressing knowledge integrity in LLMs. While some surveys touch on information security by identifying vulnerabilities and flaws in LLMs, none explicitly investigate the combined importance of information security and knowledge management in this context. This motivates our investigation into how knowledge distillation, semantic integrity, and provenance tracking collectively enhance LLM reliability.
Furthermore, most existing surveys provide broad discussions of potential directions and emerging applications. In contrast, this paper proposes a review specifically focused on information and knowledge integrity in LLMs. Our review intentionally concentrates on three key mechanisms: knowledge distillation, semantic integrity, and provenance tracking.

4. Methodology

This review involves identifying all relevant works based on predefined selection criteria. For this review, we primarily relied on Google Scholar, which provides broad access to academic publications across multiple domains, including artificial intelligence and Large Language Models (LLMs) (see Table 3 for the search strategy).
Google Scholar was chosen for this review because it provides broad coverage across multiple disciplines and extensive indexing of journals and conference proceedings.
To meet the review objectives, search criteria were designed around three primary objectives:
  • Identifying published studies on Knowledge Distillation in LLMs.
  • Including studies that address Provenance Tracking in AI systems.
  • Including studies that examine Semantic Integrity and its role in maintaining the trustworthiness of LLM outputs.
The core search query combined the following criteria:
(“Knowledge Distillation” OR “KD”) AND (“Provenance Tracking” OR “PT” OR “Trustworthy” OR “Authenticity”) AND (“Semantic Integrity” OR “SI”) AND (“LLM” OR “Large Language Models”).
The query returned varying numbers of papers, which were refined using predefined inclusion and exclusion criteria. The inclusion and exclusion criteria followed established guidelines for conducting systematic reviews in computing and information systems research [56,57,58,59].
The inclusion criteria were: (i) peer-reviewed journal or conference papers, (ii) publications in English, and (iii) papers focused on knowledge integrity in LLMs (knowledge management and information security). Exclusion criteria included: (i) studies not related to LLMs, (ii) abstracts without full text, and (iii) non-peer-reviewed articles. This process yielded refined findings suitable for analysis.
The initial Google Scholar search retrieved over 3360 records using the allintitle filter.
Limiting the publication date to 2023–2025 reduced this number to 343 papers (See Figure 3):
  • Knowledge Distillation in LLMs: 77 papers
  • Semantic Integrity in LLMs: 199 papers
  • Provenance Tracking in LLMs: 4 papers
  • Trustworthiness in LLMs: 59 papers
  • Authenticity in LLMs: 4 papers
Figure 3. Donut chart illustrating the range of the papers taken for the literature review.
We then applied additional filtering by relevance to our research objectives. Studies focusing primarily on information management, domain-specific LLM implementations, and medical/diagnostic applications were excluded. After applying these criteria, a final set of 61 papers was selected (see Figure 4 for the PRISMA process, and see Figure 5 for the year of publication of the paper taken for the research).

5. Key Findings of the Review

Several techniques have emerged across different aspects of model development to address information and knowledge integrity in Large Language Models (LLMs). These include knowledge distillation methods that compress and transfer information from large models to smaller ones without compromising accuracy, semantic integrity techniques that ensure consistent and truthful outputs, and provenance tracking mechanisms that trace the origin and transformation of data. The entire taxonomy is displayed in Figure 6. Collectively, these approaches enhance the reliability, security, and trustworthiness of LLMs by safeguarding the integrity of the knowledge they learn and generate.

5.1. Knowledge Distillation

Knowledge Distillation (KD) [47,48] transfers knowledge from a large teacher model to a smaller student model, preserving accuracy while reducing computational costs. It has been applied to large language models [14].
Key KD techniques include teacher–student learning, where the student model learns from the teacher’s soft probabilities rather than hard labels. Model compression methods like quantization and pruning further minimize size and complexity while retaining essential knowledge. Multi-teacher distillation enhances robustness by incorporating insights from multiple teacher models, thereby improving generalization. In summary, smaller models inherit the capabilities of larger models.

5.1.1. Teacher–Student Learning

Teacher–student learning is a fundamental technique in Knowledge Distillation (KD), where a pre-trained teacher model [61,62] transfers knowledge to a smaller student model while preserving information integrity and security [63]. This process enhances model efficiency without compromising the accuracy or robustness passed from the teacher to the student.
To maintain knowledge integrity [64], the student model learns not just from hard labels [65] but also from the teacher’s soft probability distributions [66,67], ensuring essential decision-making patterns [68] are retained. The distillation loss [64], computed via cross-entropy [69,70,71] and Kullback–Leibler (KL) divergence [66,72], secures accurate knowledge transfer while minimizing information loss.
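As a concrete illustration of this loss, the sketch below combines the two terms described above, soft-target KL divergence and hard-label cross-entropy, into a single distillation objective. It is a minimal PyTorch sketch of the standard formulation; the temperature and alpha values are assumed hyperparameters, not values prescribed by the cited works.

```python
# Sketch of a standard distillation loss: soft-label KL divergence plus
# hard-label cross-entropy (hyperparameters are illustrative assumptions).

import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Blend soft-target KL divergence with hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # scaled by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Weighted combination of distilled (soft) and supervised (hard) signals.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```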
From a security standpoint, teacher–student learning improves model resilience against adversarial attacks by promoting generalization over overfitting. Additional techniques further enhance security, which include:
  • Confidential Distillation: Encrypting knowledge transfer to prevent unauthorized access [73]. The goal is to prevent unauthorized access to knowledge while still allowing a student model to learn from a teacher model.
  • Federated Distillation: Decentralized learning where students learn from a teacher while preserving data privacy [74,75]. The teacher’s knowledge is shared in the form of distilled outputs like soft labels or logits, not the raw model parameters or datasets.
Teacher–student learning preserves knowledge integrity. Students absorb soft predictions from teachers, maintaining essential decision-making patterns.

5.1.2. Model Compression

Model compression is a key technique in Knowledge Distillation (KD) that reduces computational complexity [76] and memory usage while preserving information integrity and security. It enables efficient usage of models in resource-constrained environments without compromising performance or reliability.
To maintain knowledge integrity, compression techniques focus on retaining essential semantic representations [11] while removing redundancies. The primary methods are:
  • Quantization: Reducing parameter precision, for example from 32-bit floats to 8-bit integers, to enhance efficiency. Secure quantization prevents adversarial manipulation while maintaining predictive accuracy. In addition, models such as TinyBERT and MicroBERT have been successfully enhanced in this way [77,78,79,80,81]. Accuracy is preserved by applying calibration and fine-tuning strategies during quantization-aware training.
  • Pruning: Removing redundant neurons, layers, or parameters that minimally impact performance. Structured pruning preserves critical knowledge while reducing potential attack surfaces [82,83].
Model compression retains knowledge integrity. Techniques like quantization and pruning reduce model size while preserving core knowledge and performance.
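The sketch below illustrates the two compression ideas above, pruning and quantization, on a toy PyTorch module: magnitude-based pruning zeroes the smallest weights, and dynamic quantization stores linear-layer weights as 8-bit integers. It is only an assumption-laden toy example; production LLM compression relies on specialised toolchains, and the layer sizes here are arbitrary.

```python
# Toy demonstration of magnitude pruning followed by dynamic 8-bit quantization.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy stand-in for one feed-forward block of a transformer.
model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Pruning: zero out the 30% smallest-magnitude weights in each linear layer.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruned weights permanent

# Dynamic quantization: store Linear weights as 8-bit integers for inference.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```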

5.1.3. Multi-Teacher Distillation

Multi-Teacher Distillation (MTD) is an advanced knowledge distillation technique [84] where a student model learns from multiple teachers, enhancing information integrity, security, and robustness [85,86].
Multi-teacher distillation supports knowledge integrity by aggregating outputs from multiple teachers, ensuring well-rounded and consistent knowledge transfer. This enriches the knowledge of a student LLM with insights from various task-specific teacher LLMs.
Key strategies are logit-based fusion, feature-based distillation [61], and attention-based aggregation [11]. In logit-based fusion, the student learns from the multiple teachers’ soft predictions. In feature-based distillation, the student absorbs intermediate representations from the teachers, capturing the underlying patterns rather than only the final outputs. In attention-based aggregation, the student dynamically weighs the knowledge contributed by each teacher.
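A minimal sketch of logit-based fusion is shown below: the temperature-scaled soft predictions of several teachers are averaged, or optionally weighted, into a single target distribution for the student. The equal-weighting default is an illustrative assumption; the cited works describe a range of fusion strategies.

```python
# Sketch of logit-based fusion for multi-teacher distillation.

import torch.nn.functional as F


def fused_teacher_targets(teacher_logits_list, weights=None, temperature=2.0):
    """Average temperature-scaled teacher distributions into one soft target."""
    probs = [F.softmax(logits / temperature, dim=-1) for logits in teacher_logits_list]
    if weights is None:
        # Equal weighting by default; task-specific teachers could be weighted differently.
        weights = [1.0 / len(probs)] * len(probs)
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused  # used as the target distribution in the student's distillation loss
```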
By leveraging multiple knowledge sources, MTD enhances model robustness while improving explainability. Multi-teacher distillation reinforces knowledge integrity. Students learn from multiple teachers, gaining balanced and robust insights.
The above discussion addresses the second research question (RQ2) by showing that knowledge distillation directly contributes to preserving integrity in LLMs. The strengths, weaknesses/limitations, applications/implementations, and data/sources of the key papers are summarized in Table 4. Techniques such as teacher–student learning, model compression, and multi-teacher distillation reduce complexity while ensuring that decision-making patterns and semantic representations are faithfully retained. By combining efficiency with resilience against adversarial threats, KD emerges as a key strategy for maintaining knowledge integrity in compact and secure models.

5.2. Semantic Integrity

Semantic integrity ensures that knowledge learned and transferred within a machine learning model remains consistent, accurate, and resistant to manipulation [88,89]. In the context of Large Language Models (LLMs), preserving semantic integrity is important for maintaining truthfulness and fairness in responses whose correctness may be partly subjective. Semantic integrity directly impacts information and knowledge integrity and security, ensuring that models retain factual correctness, consistency, and reliability in their outputs.

5.2.1. Consistency Verification

Consistency verification involves mechanisms that ensure a model’s outputs remain stable across different inputs and variations [51,90]. It prevents contradictions [91], hallucinations [92], and semantic drifts [93] in LLM responses. Methods include:
Fact-Checking Models: Cross-referencing outputs with external knowledge bases to verify truthfulness and coherence [94]. These models validate generated outputs against trusted external knowledge sources, such as databases and curated knowledge graphs. By checking key claims against reliable references, they reduce the likelihood of hallucinations and fabricated responses in large language models.
Cross-Model Agreement: Comparing predictions from multiple models to ensure semantic stability and consistency [95]. This technique improves reliability by comparing the outputs of multiple models for the same input. When different models converge on similar predictions, the result is considered both more stable and more trustworthy. This method helps to detect inconsistencies and biases that arise in individual models, ensuring that the final response maintains semantic stability.
These techniques improve information integrity by preventing models from generating misleading or fabricated knowledge, reinforcing trustworthiness.
Consistency verification preserves semantic integrity. Fact-checking models and cross-model agreement prevent contradictions, hallucinations, and semantic drift, ensuring reliable outputs.
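To illustrate the cross-model agreement idea described above, the sketch below accepts an answer only if most independently queried models broadly agree with it. The token-overlap similarity is a deliberately crude placeholder chosen for a self-contained example; real systems would use stronger semantic similarity or entailment metrics.

```python
# Sketch of a cross-model agreement check (token overlap is an illustrative proxy).

from typing import Callable, List


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a crude proxy for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)


def cross_model_agreement(prompt: str,
                          models: List[Callable[[str], str]],
                          threshold: float = 0.6) -> bool:
    """Accept an answer only if most models broadly agree with the first one."""
    answers = [model(prompt) for model in models]
    reference = answers[0]
    agreeing = sum(token_overlap(reference, other) >= threshold
                   for other in answers[1:])
    # Require a majority of the remaining models to agree with the reference.
    return agreeing >= len(answers[1:]) / 2
```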

5.2.2. Adversarial Robustness

Adversarial attacks attempt to manipulate LLMs by introducing carefully crafted inputs that mislead the model into generating incorrect or biased responses. Adversarial robustness mitigates such attacks and preserves knowledge integrity [96,97]. Enhancing adversarial robustness involves:
Adversarial Training: Exposing models to perturbed inputs to fortify their resistance against adversarial manipulation. Studies have identified various adversarial attacks, for which adversarial training serves as a counter mechanism [98]. It works by intentionally generating manipulated inputs, often by adding small, imperceptible noise designed to fool the model, and including these examples in the training set. By repeatedly exposing the model to such adversarial samples, the training process forces it to learn more robust decision boundaries that are less sensitive to malicious perturbations.
Input Sanitization: Detecting and filtering adversarial inputs using preprocessing techniques such as perturbation detection [99]. This is a preventive mechanism that works against adversarial inputs before they interact with the model, applying preprocessing techniques to detect whether an input has been tampered with.
Robust Embedding Techniques: Ensuring model embeddings capture true semantic meaning while remaining resilient to adversarial distortions. Studies have analyzed perturbations and hallucinations [100]. The aim is to design embeddings that remain stable under small input perturbations and to enforce constraints during embedding learning. Other approaches apply regularization strategies that penalize sensitivity to noise.
These approaches uphold knowledge integrity by securing the model against misinformation injection and malicious exploits.
In summary, adversarial robustness maintains knowledge integrity. Adversarial training, input sanitization, and robust embeddings secure LLMs against manipulated inputs and malicious exploits.
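The sketch below illustrates the adversarial training idea described above with a single FGSM-style step on input embeddings: a small perturbation in the gradient direction is added, and the model is trained on both clean and perturbed inputs. It assumes a PyTorch model whose forward pass accepts embeddings directly; the names and the epsilon value are illustrative only.

```python
# Sketch of one adversarial (FGSM-style) training step on input embeddings.

import torch


def adversarial_step(model, embeddings, labels, loss_fn, epsilon=1e-3):
    """One training step on clean plus FGSM-perturbed input embeddings."""
    # First pass: gradient of the loss with respect to the input embeddings.
    emb = embeddings.clone().detach().requires_grad_(True)
    loss = loss_fn(model(emb), labels)
    grad = torch.autograd.grad(loss, emb)[0]

    # Perturb the embeddings in the direction that increases the loss.
    adv_emb = (emb + epsilon * grad.sign()).detach()

    # Train on clean and adversarial embeddings together.
    total_loss = loss_fn(model(embeddings), labels) + loss_fn(model(adv_emb), labels)
    return total_loss  # the caller runs total_loss.backward() and an optimizer step
```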

5.2.3. Bias Detection and Mitigation

Biases in LLMs can compromise semantic integrity, leading to unfair, misleading, or discriminatory outputs [101]. Mitigating bias enhances both knowledge security and ethical AI development. Approaches include fairness-aware training on balanced datasets and adversarial debiasing [102] techniques to reduce unintended biases.
Explainable AI (XAI) Techniques: Providing interpretability mechanisms to detect biased decision patterns [103,104]. By highlighting which features most strongly influence predictions, XAI helps researchers and practitioners identify biased decision patterns that may otherwise remain hidden within complex models. This forms a baseline for mechanistic interpretability approaches.
Post-Training Bias Audits: Evaluating models against ethical AI benchmarks to ensure unbiased knowledge generation [105]. These audits take a retrospective approach by systematically testing the model’s outputs across diverse contextual scenarios.
By safeguarding semantic integrity through these techniques, LLMs maintain trustworthy, secure, and fair knowledge representation, ensuring high-quality information dissemination without compromising information security and integrity.
Bias detection and mitigation reinforce semantic integrity. Fairness-aware training, XAI techniques, and post-training audits reduce unintended biases and promote trustworthy knowledge generation.
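As one way to operationalize a post-training bias audit of the kind described above, the sketch below fills the same prompt template with different group terms and compares the scores of the resulting outputs. The generate and score functions are hypothetical placeholders (for example, a model wrapper and a sentiment or toxicity scorer), and the gap threshold is an illustrative assumption rather than an established benchmark.

```python
# Sketch of a counterfactual post-training bias audit (placeholder callables).

from itertools import combinations
from typing import Callable, Dict, List


def counterfactual_audit(generate: Callable[[str], str],
                         score: Callable[[str], float],
                         template: str,
                         groups: List[str],
                         max_gap: float = 0.1) -> Dict[str, float]:
    """Score the same prompt template across groups and flag large gaps."""
    scores = {g: score(generate(template.format(group=g))) for g in groups}
    for a, b in combinations(groups, 2):
        gap = abs(scores[a] - scores[b])
        if gap > max_gap:
            print(f"Potential bias: '{a}' vs '{b}' differ by {gap:.2f}")
    return scores
```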
This addresses the third research question (RQ3) by demonstrating that safeguarding semantic integrity requires consistency verification, adversarial robustness, and bias mitigation. Together, these approaches reduce hallucinations, prevent manipulation, and promote fairness in LLM outputs. Table 5 presents the insights of the papers taken for the review in semantic integrity in the scope of domain, method, and application. By ensuring stable, reliable, and unbiased responses, semantic integrity directly reinforces the trustworthiness and security of knowledge representation in LLMs.

5.3. Provenance Tracking

Provenance tracking is essential for ensuring the integrity and security of information and knowledge in large language models (LLMs). It involves maintaining a traceable record of the origins [49], transformations, and decision-making processes of the data, thereby improving trust, accountability, and transparency in AI-generated knowledge. Provenance tracking prevents data manipulation [106], misinformation propagation [7], and model misinterpretation, making it a crucial aspect of securing LLM outputs.

Data Lineage Tracking

Provenance tracking ensures that every piece of information processed by an LLM can be traced back to its original source, thereby preventing the use of unverified or manipulated data. It encompasses several complementary mechanisms that collectively uphold transparency and trustworthiness:
  • Source Attribution: Identifying and recording the origins of both training and inference data to ensure verifiable and trustworthy knowledge generation [13]. Maintaining explicit records of data sources enables researchers and users to assess the reliability, credibility, and potential biases that influence model outputs.
  • Data Versioning: Maintaining historical records of data modifications to track changes over time and prevent tampering. This ensures that model behavior and results can be accurately reproduced or audited.
  • Dependency Tracking: Mapping data relationships and dependencies to prevent the propagation of errors or misinformation across multiple knowledge sources.
Together, these mechanisms form a robust data lineage framework that reinforces knowledge integrity by making every transformation, update, or dependency transparent and verifiable. Provenance tracking, therefore, enhances both accountability and trust by enabling stakeholders to validate the authenticity and correctness of LLM outputs.
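A minimal sketch of such a lineage record is shown below, combining source attribution, content hashing for versioning, and parent links for dependency tracking. The field names are illustrative and do not correspond to any standard provenance schema.

```python
# Sketch of a minimal data-lineage record (illustrative fields, not a standard schema).

import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceRecord:
    """One node in a data lineage graph for LLM training or inference data."""
    source_id: str                                    # source attribution
    content: str                                      # the data item itself
    parents: List[str] = field(default_factory=list)  # hashes of upstream records (dependency tracking)
    version_hash: str = ""                            # tamper-evident version identifier

    def __post_init__(self):
        # Hash the content together with parent hashes so that any upstream
        # modification changes this record's version (data versioning).
        digest = hashlib.sha256()
        digest.update(self.content.encode("utf-8"))
        for parent_hash in self.parents:
            digest.update(parent_hash.encode("utf-8"))
        self.version_hash = digest.hexdigest()
```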
In relation to Research Question 4 (RQ4), these techniques highlight provenance tracking as a foundational pillar for accountability and traceability in LLMs. Through comprehensive data lineage tracking, source attribution, and dependency monitoring, provenance-based methods ensure that each piece of knowledge can be verified against its origin. By preventing manipulation and enabling transparent auditing, provenance tracking plays a critical role in securing trustworthy and explainable LLM outcomes.
Collectively, these discussions address the first research question (RQ1) by identifying the key techniques that ensure information and knowledge integrity in LLMs. In addition, knowledge distillation (RQ2), semantic integrity (RQ3), and provenance tracking (RQ4) serve as complementary strategies which, when integrated, provide a comprehensive framework for maintaining trustworthy, transparent, and secure LLM outputs (see Table 6).

6. Conclusions and Future Research

This review explored existing strategies aimed at ensuring information and knowledge integrity in large language models (LLMs). The literature highlights that achieving integrity involves minimizing hallucinations, maintaining semantic consistency, and enabling provenance tracking of information sources. Three recurring mechanisms were identified across prior works, namely knowledge distillation, semantic integrity, and provenance tracking. Together, these mechanisms provide a layered perspective on how LLMs can be made more reliable, transparent, and trustworthy.
Current methods that enforce knowledge integrity of LLMs face challenges related to scalability, domain adaptability, and nuanced validation. Provenance tracking ensures transparency, but it struggles to scale effectively because continuously verifying and storing provenance data for massive and dynamic datasets is computationally expensive.
Semantic integrity emphasizes maintaining consistency, but it encounters difficulties in nuanced validation since semantic correctness is often subjective, context-dependent, and hard to evaluate automatically.
Knowledge distillation also faces scalability limitations because distilled models frequently fail to retain the full breadth of domain knowledge, particularly when extended to multiple specialized domains.
This review demonstrates that knowledge distillation (RQ2), semantic integrity (RQ3), and provenance tracking (RQ4) collectively address the overarching research question (RQ1) by identifying key techniques to ensure knowledge integrity in large language models (LLMs).
Building on this analysis, the review contributes a comprehensive taxonomy of approaches that promote information and knowledge integrity in LLMs. It focuses on the interplay among knowledge distillation, semantic integrity, and provenance tracking as complementary mechanisms for achieving reliable, hallucination-free, and content-aware language models. The paper also examines methods that mitigate security and trust threats while enhancing the authenticity, transparency, and traceability of AI-generated content. In doing so, it identifies ongoing challenges such as scalability limitations, domain adaptability, and the nuanced validation of semantic correctness and provenance data.
Future research can advance these foundations in several key directions. First, developing empirical frameworks that integrate these mechanisms and testing them in real-world environments would provide valuable practical insights. Second, incorporating real-time fact-checking systems within LLM pipelines could substantially strengthen response reliability. Third, extending validation techniques to multilingual and multimodal contexts would improve the robustness of semantic integrity across diverse applications. Finally, emerging directions such as self-correcting LLMs, which iteratively refine outputs using feedback from provenance and semantic validators, present promising opportunities for enhancing knowledge integrity in next-generation language models.

Limitations

This review has certain limitations. First, the proposed layered approach relies heavily on predefined semantic validation rules, which may restrict adaptability across diverse contexts. Second, much of the existing work on provenance tracking assumes the availability of structured data sources, which are not universally present in real-world scenarios. Third, our analysis of provenance mechanisms was limited to data lineage tracing and did not extend to other dimensions such as dynamic source verification or broader auditability frameworks. These limitations should be addressed in future research to strengthen the generalizability of knowledge integrity mechanisms. Additionally, it should be noted that this review is subject to certain methodological limitations. Our search relied primarily on Google Scholar, which, while comprehensive, may not capture all relevant publications indexed in other databases such as Scopus or Web of Science. The choice of keywords, including “LLM” and “Large Language Model,” and limiting searches to titles may have excluded some relevant studies. Furthermore, the analyzed period (2023–2025) may omit earlier foundational work. These factors introduce potential biases in the paper selection process, which should be considered when interpreting the findings.

Author Contributions

V.A.: Conceptualization, Methodology, Visualization, Writing—Original Draft, Writing—Review and Editing; F.S.: Conceptualization, Methodology, Supervision, Writing—Original Draft, Writing—Review and Editing; P.K.: Supervision, Writing—Original Draft, Writing—Review and Editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kulkarni, P.; Mahabaleshwarkar, A.; Kulkarni, M.; Sirsikar, N.; Gadgil, K. Conversational AI: An overview of methodologies, applications & future scope. In Proceedings of the IEEE 2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 19–21 September 2019; pp. 1–7. [Google Scholar]
  2. Gao, J.; Galley, M.; Li, L. Neural approaches to conversational AI. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 1371–1374. [Google Scholar]
  3. Kumar, P.; Manikandan, S.; Kishore, R. Ai-driven Text Generation: A Novel Gpt-based Approach for Automated Content Creation. In Proceedings of the IEEE 2024 2nd International Conference on Networking and Communications (ICNWC), Chennai, India, 2–4 April 2024; pp. 1–6. [Google Scholar]
  4. Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confid. Comput. 2024, 4, 100211. [Google Scholar] [CrossRef]
  5. Xu, X.; Kong, K.; Liu, N.; Cui, L.; Wang, D.; Zhang, J.; Kankanhalli, M. An llm can fool itself: A prompt-based adversarial attack. arXiv 2023, arXiv:2310.13345. [Google Scholar] [CrossRef]
  6. Wu, Y.; Li, Z.; Zhang, J.M.; Liu, Y. Condefects: A new dataset to address the data leakage concern for llm-based fault localization and program repair. arXiv 2023, arXiv:2310.16253. [Google Scholar]
  7. Chen, M.; Wei, L.; Cao, H.; Zhou, W.; Hu, S. Can large language models understand content and propagation for misinformation detection: An empirical study. arXiv 2023, arXiv:2311.12699. [Google Scholar] [CrossRef]
  8. Xhonneux, S.; Sordoni, A.; Günnemann, S.; Gidel, G.; Schwinn, L. Efficient adversarial training in llms with continuous attacks. Adv. Neural Inf. Process. Syst. 2024, 37, 1502–1530. [Google Scholar]
  9. Zhang, X.; Zhang, J.; Mo, F.; Wang, D.; Fu, Y.; Liu, K. LEKA: LLM-Enhanced Knowledge Augmentation. arXiv 2025, arXiv:2501.17802. [Google Scholar]
  10. Wang, Z.; Shi, Z.; Zhou, H.; Gao, S.; Sun, Q.; Li, J. Towards Objective Fine-tuning: How LLMs’ Prior Knowledge Causes Potential Poor Calibration? arXiv 2025, arXiv:2505.20903. [Google Scholar] [CrossRef]
  11. Hu, S.; Zou, G.; Yang, S.; Lin, S.; Gan, Y.; Zhang, B.; Chen, Y. Large language model meets graph neural network in knowledge distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 17295–17304. [Google Scholar]
  12. Rajan, S.S.; Soremekun, E.; Chattopadhyay, S. Knowledge-based consistency testing of large language models. arXiv 2024, arXiv:2407.12830. [Google Scholar]
  13. Wang, J.; Lu, X.; Zhao, Z.; Dai, Z.; Foo, C.S.; Ng, S.K.; Low, B.K.H. Source Attribution for Large Language Model-Generated Data. arXiv 2023, arXiv:2310.00646. [Google Scholar]
  14. Xu, X.; Li, M.; Tao, C.; Shen, T.; Cheng, R.; Li, J.; Xu, C.; Tao, D.; Zhou, T. A survey on knowledge distillation of large language models. arXiv 2024, arXiv:2402.13116. [Google Scholar]
  15. Wang, L.; Yoon, K.J. Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3048–3068. [Google Scholar] [CrossRef]
  16. You, D.; Chon, D. Trust & Safety of LLMs and LLMs in Trust & Safety. arXiv 2024, arXiv:2412.02113. [Google Scholar] [CrossRef]
  17. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  18. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  19. Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-training. OpenAI Technical Report. 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 10 November 2025).
  20. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  21. Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
  22. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar] [CrossRef]
  23. Team, G.; Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Pathak, S.; Sifre, L.; Rivière, M.; Kale, M.S.; Love, J.; et al. Gemma: Open models based on gemini research and technology. arXiv 2024, arXiv:2403.08295. [Google Scholar] [CrossRef]
  24. Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: A family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar] [CrossRef]
  25. Team, G.; Georgiev, P.; Lei, V.I.; Burnell, R.; Bai, L.; Gulati, A.; Tanzer, G.; Vincent, D.; Pan, Z.; Wang, S.; et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530. [Google Scholar] [CrossRef]
  26. Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling language modeling with pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
  27. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), (NAACL-HLT 2019), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  28. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  30. Zhong, T.; Liu, Z.; Pan, Y.; Zhang, Y.; Zhou, Y.; Liang, S.; Wu, Z.; Lyu, Y.; Shu, P.; Yu, X.; et al. Evaluation of openai o1: Opportunities and challenges of agi. arXiv 2024, arXiv:2409.18486. [Google Scholar] [CrossRef]
  31. Nazi, Z.A.; Peng, W. Large language models in healthcare and medical domain: A review. Informatics 2024, 11, 57. [Google Scholar] [CrossRef]
  32. Shool, S.; Adimi, S.; Saboori Amleshi, R.; Bitaraf, E.; Golpira, R.; Tara, M. A systematic review of large language model (LLM) evaluations in clinical medicine. BMC Med. Inform. Decis. Mak. 2025, 25, 117. [Google Scholar] [CrossRef]
  33. Chen, Z.Z.; Ma, J.; Zhang, X.; Hao, N.; Yan, A.; Nourbakhsh, A.; Yang, X.; McAuley, J.; Petzold, L.; Wang, W.Y. A survey on large language models for critical societal domains: Finance, healthcare, and law. arXiv 2024, arXiv:2405.01769. [Google Scholar] [CrossRef]
  34. Chu, Z.; Wang, S.; Xie, J.; Zhu, T.; Yan, Y.; Ye, J.; Zhong, A.; Hu, X.; Liang, J.; Yu, P.S.; et al. Llm agents for education: Advances and applications. arXiv 2025, arXiv:2503.11733. [Google Scholar]
  35. Gong, C.; Li, Z.; Li, X. Information security based on llm approaches: A review. arXiv 2025, arXiv:2507.18215. [Google Scholar] [CrossRef]
  36. Kumar, S.S.; Cummings, M.; Stimpson, A. Strengthening LLM trust boundaries: A survey of prompt injection attacks. In Proceedings of the 2024 IEEE 4th International Conference on Human-Machine Systems (ICHMS), Toronto, ON, Canada, 15–17 May 2024; pp. 1–6. [Google Scholar]
  37. Zhao, P.; Zhu, W.; Jiao, P.; Gao, D.; Wu, O. Data poisoning in deep learning: A survey. arXiv 2025, arXiv:2503.22759. [Google Scholar] [CrossRef]
  38. Fang, H.; Qiu, Y.; Yu, H.; Yu, W.; Kong, J.; Chong, B.; Chen, B.; Wang, X.; Xia, S.T.; Xu, K. Privacy leakage on dnns: A survey of model inversion attacks and defenses. arXiv 2024, arXiv:2402.04013. [Google Scholar] [CrossRef]
  39. Leybzon, D.; Kervadec, C. Learning, forgetting, remembering: Insights from tracking llm memorization during training. In Proceedings of the 7th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, Miami, FL, USA, 15 November 2024; pp. 43–57. [Google Scholar]
  40. Pan, J.Z.; Razniewski, S.; Kalo, J.C.; Singhania, S.; Chen, J.; Dietze, S.; Jabeen, H.; Omeliyanenko, J.; Zhang, W.; Lissandrini, M.; et al. Large language models and knowledge graphs: Opportunities and challenges. arXiv 2023, arXiv:2308.06374. [Google Scholar] [CrossRef]
  41. Du, H.; Li, W.; Cai, M.; Saraipour, K.; Zhang, Z.; Lakkaraju, H.; Sun, Y.; Zhang, S. How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence. arXiv 2025, arXiv:2504.02904. [Google Scholar] [CrossRef]
  42. Wang, X.; Chen, Z.; Wang, H.; Hou U, L.; Li, Z.; Guo, W. Large language model enhanced knowledge representation learning: A survey. Data Sci. Eng. 2025, 10, 315–338. [Google Scholar] [CrossRef]
  43. Mökander, J.; Schuett, J.; Kirk, H.R.; Floridi, L. Auditing large language models: A three-layered approach. AI Ethics 2024, 4, 1085–1115. [Google Scholar] [CrossRef]
  44. Veldanda, A.K.; Zhang, S.X.; Das, A.; Chakraborty, S.; Rawls, S.; Sahu, S.; Naphade, M. Llm surgery: Efficient knowledge unlearning and editing in large language models. arXiv 2024, arXiv:2409.13054. [Google Scholar]
  45. Wang, S.; Zhu, Y.; Liu, H.; Zheng, Z.; Chen, C.; Li, J. Knowledge editing for large language models: A survey. ACM Comput. Surv. 2024, 57, 1–37. [Google Scholar] [CrossRef]
  46. Xu, R.; Qi, Z.; Guo, Z.; Wang, C.; Wang, H.; Zhang, Y.; Xu, W. Knowledge conflicts for llms: A survey. arXiv 2024, arXiv:2403.08319. [Google Scholar] [CrossRef]
  47. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge distillation: A survey. Int. J. Comput. Vis. 2021, 129, 1789–1819. [Google Scholar] [CrossRef]
  48. Tian, Y.; Pei, S.; Zhang, X.; Zhang, C.; Chawla, N. Knowledge distillation on graphs: A survey. ACM Comput. Surv. 2023, 57, 1–16. [Google Scholar] [CrossRef]
  49. Zuo, F.; Rhee, J.; Choe, Y.R. Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection. arXiv 2025, arXiv:2503.18316. [Google Scholar] [CrossRef]
  50. Yang, C.; Zhu, Y.; Lu, W.; Wang, Y.; Chen, Q.; Gao, C.; Yan, B.; Chen, Y. Survey on knowledge distillation for large language models: Methods, evaluation, and application. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–27. [Google Scholar] [CrossRef]
  51. Ghosh, B.; Hasan, S.; Arafat, N.A.; Khan, A. Logical Consistency of Large Language Models in Fact-checking. arXiv 2024, arXiv:2412.16100. [Google Scholar] [CrossRef]
  52. Wang, C.; Liu, X.; Yue, Y.; Tang, X.; Zhang, T.; Jiayang, C.; Yao, Y.; Gao, W.; Hu, X.; Qi, Z.; et al. Survey on factuality in large language models: Knowledge, retrieval and domain-specificity. arXiv 2023, arXiv:2310.07521. [Google Scholar] [CrossRef]
  53. Wang, M.; Stoll, A.; Lange, L.; Adel, H.; Schütze, H.; Strötgen, J. Bring Your Own Knowledge: A Survey of Methods for LLM Knowledge Expansion. arXiv 2025, arXiv:2502.12598. [Google Scholar] [CrossRef]
  54. Li, M.; Zhao, Y.; Deng, Y.; Zhang, W.; Li, S.; Xie, W.; Ng, S.K.; Chua, T.S. Knowledge Boundary of Large Language Models: A Survey. arXiv 2024, arXiv:2412.12472. [Google Scholar] [CrossRef]
  55. Huang, X.; Ruan, W.; Huang, W.; Jin, G.; Dong, Y.; Wu, C.; Bensalem, S.; Mu, R.; Qi, Y.; Zhao, X.; et al. A survey of safety and trustworthiness of large language models through the lens of verification and validation. Artif. Intell. Rev. 2024, 57, 175. [Google Scholar] [CrossRef]
  56. Kitchenham, B. Procedures for Performing Systematic Reviews; Keele University: Keele, UK, 2004; Volume 33, pp. 1–26. [Google Scholar]
  57. Petticrew, M.; Roberts, H. Systematic Reviews in the Social Sciences: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
  58. Haddaway, N.R.; Woodcock, P.; Macura, B.; Collins, A. Making literature reviews more reliable through application of lessons from systematic reviews. Conserv. Biol. 2015, 29, 1596–1605. [Google Scholar] [CrossRef]
  59. Mallett, R.; Hagen-Zanker, J.; Slater, R.; Duvendack, M. The benefits and challenges of using systematic reviews in international development research. J. Dev. Eff. 2012, 4, 445–455. [Google Scholar] [CrossRef]
  60. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2021, 372, n71. [Google Scholar] [CrossRef]
  61. Shirgaonkar, A.; Pandey, N.; Abay, N.C.; Aktas, T.; Aski, V. Knowledge Distillation Using Frontier Open-source LLMs: Generalizability and the Role of Synthetic Data. arXiv 2024, arXiv:2410.18588. [Google Scholar] [CrossRef]
  62. Li, J.; Nag, S.; Liu, H.; Tang, X.; Sarwar, S.M.; Cui, L.; Gu, H.; Wang, S.; He, Q.; Tang, J. Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data. In Findings of the Association for Computational Linguistics, Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL 2025), Albuquerque, NM, USA, 29 April–4 May 2025; Chiruzzo, L., Ritter, A., Wang, L., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 2627–2641. [Google Scholar] [CrossRef]
  63. Hu, C.; Li, X.; Liu, D.; Wu, H.; Chen, X.; Wang, J.; Liu, X. Teacher-student architecture for knowledge distillation: A survey. arXiv 2023, arXiv:2308.04268. [Google Scholar] [CrossRef]
  64. Liu, J.; Zhang, C.; Guo, J.; Zhang, Y.; Que, H.; Deng, K.; Liu, J.; Zhang, G.; Wu, Y.; Liu, C.; et al. Ddk: Distilling domain knowledge for efficient large language models. Adv. Neural Inf. Process. Syst. 2024, 37, 98297–98319. [Google Scholar]
  65. Nguyen, H.; He, Z.; Gandre, S.A.; Pasupulety, U.; Shivakumar, S.K.; Lerman, K. Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation. arXiv 2025, arXiv:2502.11306. [Google Scholar] [CrossRef]
  66. Gu, Y.; Dong, L.; Wei, F.; Huang, M. MiniLLM: Knowledge distillation of large language models. arXiv 2023, arXiv:2306.08543. [Google Scholar]
  67. Anshumann, A.; Zaidi, M.A.; Kedia, A.; Ahn, J.; Kwon, T.; Lee, K.; Lee, H.; Lee, J. Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; pp. 18085–18108. [Google Scholar] [CrossRef]
  68. Chen, D.; Zhang, S.; Gao, F.; Zhuang, Y.; Tang, S.; Liu, Q.; Xu, M. Logic Distillation: Learning from Code Function by Function for Planning and Decision-making. arXiv 2024, arXiv:2407.19405. [Google Scholar] [CrossRef]
  69. Yang, Y.; Tian, B.; Yu, F.; He, Y. An Anomaly Detection Model Training Method Based on LLM Knowledge Distillation. In Proceedings of the IEEE 2024 International Conference on Networking and Network Applications (NaNA), Yinchuan City, China, 9–12 August 2024; pp. 472–477. [Google Scholar]
  70. Di Palo, F.; Singhi, P.; Fadlallah, B. Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale. arXiv 2024, arXiv:2411.05045. [Google Scholar] [CrossRef]
  71. Lee, T.; Bang, J.; Kwon, S.; Kim, T. Multi-aspect Knowledge Distillation with Large Language Model. arXiv 2025, arXiv:2501.13341. [Google Scholar] [CrossRef]
  72. Wu, T.; Tao, C.; Wang, J.; Yang, R.; Zhao, Z.; Wong, N. Rethinking kullback-leibler divergence in knowledge distillation for large language models. arXiv 2024, arXiv:2404.02657. [Google Scholar] [CrossRef]
  73. Song, Y.; Zhang, J.; Tian, Z.; Yang, Y.; Huang, M.; Li, D. LLM-based privacy data augmentation guided by knowledge distillation with a distribution tutor for medical text classification. arXiv 2024, arXiv:2402.16515. [Google Scholar] [CrossRef]
  74. Li, L.; Gou, J.; Yu, B.; Du, L.; Tao, Z.Y.D. Federated distillation: A survey. arXiv 2024, arXiv:2404.08564. [Google Scholar] [CrossRef]
  75. Qin, L.; Zhu, T.; Zhou, W.; Yu, P.S. Knowledge distillation in federated learning: A survey on long lasting challenges and new solutions. arXiv 2024, arXiv:2406.10861. [Google Scholar] [CrossRef]
  76. Huangpu, Q.; Gao, H. Efficient Model Compression and Knowledge Distillation on Llama 2: Achieving High Performance with Reduced Computational Cost. 2024. Available online: https://osf.io/preprints/osf/hax36 (accessed on 10 November 2025).
  77. Du, D.; Zhang, Y.; Cao, S.; Guo, J.; Cao, T.; Chu, X.; Xu, N. Bitdistiller: Unleashing the potential of sub-4-bit llms via self-distillation. arXiv 2024, arXiv:2402.10631. [Google Scholar]
  78. Fan, A.; Stock, P.; Graham, B.; Grave, E.; Gribonval, R.; Jegou, H.; Joulin, A. Training with quantization noise for extreme model compression. arXiv 2020, arXiv:2004.07320. [Google Scholar]
  79. Liu, Z.; Oguz, B.; Zhao, C.; Chang, E.; Stock, P.; Mehdad, Y.; Shi, Y.; Krishnamoorthi, R.; Chandra, V. Llm-qat: Data-free quantization aware training for large language models. arXiv 2023, arXiv:2305.17888. [Google Scholar]
  80. Latif, E.; Fang, L.; Ma, P.; Zhai, X. Knowledge distillation of LLM for automatic scoring of science education assessments. arXiv 2023, arXiv:2312.15842. [Google Scholar]
  81. Zheng, D.; Li, J.; Yang, Y.; Wang, Y.; Pang, P.C.I. MicroBERT: Distilling MoE-Based Knowledge from BERT into a Lighter Model. Appl. Sci. 2024, 14, 6171. [Google Scholar] [CrossRef]
  82. Sreenivas, S.T.; Muralidharan, S.; Joshi, R.; Chochowski, M.; Mahabaleshwarkar, A.S.; Shen, G.; Zeng, J.; Chen, Z.; Suhara, Y.; Diao, S.; et al. Llm pruning and distillation in practice: The minitron approach. arXiv 2024, arXiv:2408.11796. [Google Scholar] [CrossRef]
  83. Muralidharan, S.; Turuvekere Sreenivas, S.; Joshi, R.; Chochowski, M.; Patwary, M.; Shoeybi, M.; Catanzaro, B.; Kautz, J.; Molchanov, P. Compact language models via pruning and knowledge distillation. Adv. Neural Inf. Process. Syst. 2024, 37, 41076–41102. [Google Scholar]
  84. Mansourian, A.M.; Ahmadi, R.; Ghafouri, M.; Babaei, A.M.; Golezani, E.B.; Ghamchi, Z.Y.; Ramezanian, V.; Taherian, A.; Dinashi, K.; Miri, A.; et al. A Comprehensive Survey on Knowledge Distillation. arXiv 2025, arXiv:2503.12067. [Google Scholar] [CrossRef]
  85. Zhao, Z.; Xie, Z.; Zhou, G.; Huang, J.X. MTMS: Multi-teacher Multi-stage Knowledge Distillation for Reasoning-Based Machine Reading Comprehension. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 1995–2005. [Google Scholar]
  86. Tian, Y.; Han, Y.; Chen, X.; Wang, W.; Chawla, N.V. Beyond Answers: Transferring Reasoning Capabilities to Smaller LLMs Using Multi-Teacher Knowledge Distillation. arXiv 2024, arXiv:2402.04616. [Google Scholar]
  87. Li, Z.; Xu, P.; Chang, X.; Yang, L.; Zhang, Y.; Yao, L.; Chen, X. When object detection meets knowledge distillation: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 10555–10579. [Google Scholar] [CrossRef] [PubMed]
  88. Yang, M.; Chen, Y.; Liu, Y.; Shi, L. DistillSeq: A Framework for Safety Alignment Testing in Large Language Models Using Knowledge Distillation. In Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, 16–20 September 2024; pp. 578–589. [Google Scholar]
  89. Guo, S.; Wang, Y.; Ye, J.; Zhang, A.; Zhang, P.; Xu, K. Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models. IEEE Trans. Mach. Learn. Commun. Netw. 2025, 3, 232–245. [Google Scholar] [CrossRef]
  90. Lee, A.W.; Chan, J.; Fu, M.; Kim, N.; Mehta, A.; Raghavan, D.; Cetintemel, U. Semantic Integrity Constraints: Declarative Guardrails for AI-Augmented Data Processing Systems. arXiv 2025, arXiv:2503.00600. [Google Scholar] [CrossRef]
  91. Raj, H.; Gupta, V.; Rosati, D.; Majumdar, S. Semantic consistency for assuring reliability of large language models. arXiv 2023, arXiv:2308.09138. [Google Scholar] [CrossRef]
  92. Galitsky, B.; Chernyavskiy, A.; Ilvovsky, D. Truth-o-meter: Handling multiple inconsistent sources repairing LLM hallucinations. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 2817–2821. [Google Scholar]
  93. Roe, A.; Richardson, S.; Schneider, J.; Cummings, A.; Forsberg, N.; Klein, J. Semantic drift mitigation in large language model knowledge retention using the residual knowledge stability concept. TechRxiv Preprint 2024. [Google Scholar] [CrossRef]
  94. Yao, J.; Sun, H.; Xue, N. Fact-checking AI-generated news reports: Can LLMs catch their own lies? arXiv 2025, arXiv:2503.18293. [Google Scholar]
  95. Chanenson, J.; Pickering, M.; Apthorpe, N. Automating governing knowledge commons and contextual integrity (GKC-CI) privacy policy annotations with large language models. arXiv 2023, arXiv:2311.02192. [Google Scholar] [CrossRef]
  96. Zhu, K.; Wang, J.; Zhou, J.; Wang, Z.; Chen, H.; Wang, Y.; Yang, L.; Ye, W.; Zhang, Y.; Gong, N.; et al. Promptrobust: Towards evaluating the robustness of large language models on adversarial prompts. In Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis, Salt Lake City, UT, USA, 14–18 October 2023; pp. 57–68. [Google Scholar]
  97. Liu, S.; Chen, J.; Ruan, S.; Su, H.; Yin, Z. Exploring the robustness of decision-level through adversarial attacks on llm-based embodied models. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; pp. 8120–8128. [Google Scholar]
  98. Zou, J.; Zhang, S.; Qiu, M. Adversarial attacks on large language models. In Proceedings of the International Conference on Knowledge Science, Engineering and Management, Birmingham, UK, 16–18 August 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 85–96. [Google Scholar]
  99. Wang, C.; Zhang, W.; Su, Z.; Xu, X.; Zhang, X. Sanitizing Large Language Models in Bug Detection with Data-Flow. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; pp. 3790–3805. [Google Scholar]
  100. Singh, A.; Singh, N.; Vatsal, S. Robustness of llms to perturbations in text. arXiv 2024, arXiv:2407.08989. [Google Scholar] [CrossRef]
  101. Huang, D.; Zhang, J.M.; Bu, Q.; Xie, X.; Chen, J.; Cui, H. Bias testing and mitigation in llm-based code generation. ACM Trans. Softw. Eng. Methodol. 2025, 34, 1–30. [Google Scholar] [CrossRef]
  102. Peng, B.; Chen, K.; Li, M.; Feng, P.; Bi, Z.; Liu, J.; Niu, Q. Securing large language models: Addressing bias, misinformation, and prompt attacks. arXiv 2024, arXiv:2409.08087. [Google Scholar] [CrossRef]
  103. Ecker, J.E. Explainable AI for Large Language Models via Context-Aware Word Embeddings. In Proceedings of the 2025 AIAA Science and Technology Forum and Exposition (AIAA SCITECH Forum), Orlando, FL, USA, 6–10 January 2025; p. 1916. [Google Scholar]
  104. Mumuni, F.; Mumuni, A. Explainable artificial intelligence (XAI): From inherent explainability to large language models. arXiv 2025, arXiv:2501.09967. [Google Scholar] [CrossRef]
  105. Marks, S.; Treutlein, J.; Bricken, T.; Lindsey, J.; Marcus, J.; Mishra-Sharma, S.; Ziegler, D.; Ameisen, E.; Batson, J.; Belonax, T.; et al. Auditing language models for hidden objectives. arXiv 2025, arXiv:2503.10965. [Google Scholar] [CrossRef]
  106. Singh, S.; Vorster, L. LLM Supply Chain Provenance: A Blockchain-Based Approach. In Proceedings of the International Conference on AI Research, Lisbon, Portugal, 5–6 December 2024. [Google Scholar]
Figure 1. Intersection of knowledge management and information security in our proposed perspective of knowledge integrity in LLMs.
Figure 2. Explanatory diagram illustrating LLM knowledge integrity preservation.
Figure 4. Identification of relevant studies following the protocol of PRISMA 2020 [60].
Figure 5. Number of papers published per year on Knowledge Distillation (KD), Semantic Integrity (SI), and Provenance/Trust in LLM research (2023–2025).
Figure 6. Taxonomy of LLM knowledge integrity.
Table 1. Classification of knowledge integrity dimensions by Knowledge Management (KM) and Information Security (IS).

| Dimension | Aspects |
|---|---|
| Truthfulness | Fact-checking accuracy, consistency, source reliability (KM); source reliability (IS) |
| Robustness | Adversarial robustness, out-of-distribution robustness |
| Safety | Toxicity, bias and discrimination, misinformation |
| Fairness | Group fairness, individual fairness, counterfactual fairness |
| Privacy | Data privacy, membership inference, attribute inference |
Table 2. Comparison with existing work.

| Reference | Title | Highlight | Comparison with Our Work |
|---|---|---|---|
| Wang et al. [53] | Bring Your Own Knowledge: A Survey of Methods for LLM Knowledge Expansion | Reviews methods for enhancing LLMs with diverse knowledge, including continual learning and retrieval-based adaptation. | Our work extends integrity preservation by incorporating provenance and distillation within secure contexts. |
| Li et al. [54] | Knowledge Boundary of Large Language Models: A Survey | Explores LLM knowledge boundaries, categorizes knowledge types, and reviews methods to identify and address limitations in knowledge retention and accuracy. | Our work investigates the interplay between security, knowledge distillation, and provenance mechanisms to offer a comprehensive perspective on LLM output integrity. |
| Wang et al. [42] | Large Language Model Enhanced Knowledge Representation Learning: A Survey | Highlights how LLMs enhance Knowledge Representation Learning (KRL). | Our work shifts the focus by including distillation, provenance, and security to address integrity beyond KRL. |
| Yang et al. [50] | Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application | Surveys knowledge distillation for LLMs, categorizing techniques into white-box and black-box KD. | Our work connects knowledge distillation and semantic integrity to provenance and security for robust LLM integrity. |
| Huang et al. [55] | A survey of safety and trustworthiness of large language models through the lens of verification and validation | Reviews the safety and trustworthiness of LLMs, categorizing vulnerabilities and examining traditional verification and validation techniques. | Our work integrates safety and trustworthiness with provenance, distillation, and semantic integrity to strengthen LLM robustness. |
| Xu et al. [14] | A survey on knowledge distillation of large language models | Provides a comprehensive analysis of knowledge distillation for LLMs. | Our work extends knowledge distillation by emphasizing its role in integrity and provenance within LLMs. |
Table 3. Summary of search criteria.

| Criteria | Value |
|---|---|
| Source | Google Scholar |
| Type | Journal, Conference, Book Chapters |
| Search Field | Title |
| Sort Order | Relevance |
| Year Range | 2023–2025 |
Table 4. Summary of knowledge distillation papers with strengths, weaknesses, and implementation details.

| Paper | KD Type | Strengths | Weaknesses/Limitations | Applications/Implementation | Data/Resources |
|---|---|---|---|---|---|
| General/Survey KD | | | | | |
| [47] | Survey | Comprehensive overview of KD techniques | General survey; lacks experiments | Theoretical understanding | N/A |
| [87] | Object Detection KD Survey | Focused on object detection; highlights challenges | Limited to OD; not NLP-generalizable | Object detection | Detection datasets |
| [48] | Graph KD Survey | Handles structured graph data | Less explored for LLMs | Graph-based learning | Graph datasets |
| [74] | Federated KD Survey | Overview of federated KD methods | Survey only; no experiments | Distributed learning/federated learning | Federated datasets |
| [75] | Federated KD Survey | Highlights challenges and solutions | Mostly conceptual | Federated learning/NLP | Distributed nodes/data |
| Teacher–Student/LLM → Student | | | | | |
| [63] | Teacher–Student | Covers various architectures; adaptable | Mostly survey; minimal implementation | Model compression, NLP | GPU-heavy if teacher is large |
| [61] | Teacher–Student/LLM | Uses synthetic data; generalizable | Quality of synthetic data can affect student | NLP tasks; LLM distillation | Open-source LLM + synthetic data |
| [62] | LLM → Student via unlabeled data | Adaptive sample selection (LLKD); efficient | Depends on teacher confidence; complex selection | Text classification; NLP | LLM inference + student training |
| [64] | Domain Knowledge KD | Efficiently distills domain-specific knowledge | Limited to domain knowledge; scalability | Domain-specific NLP/LLM compression | Teacher + domain dataset |
| [65] | Smoothed KD | Reduces hallucinations in LLMs | Adds computational overhead | LLM hallucination mitigation | Teacher + student LLMs |
| [66] | LLM → Smaller LLM | Compresses LLMs effectively | May lose reasoning capability | Lightweight LLM deployment | Teacher LLM required |
| [68] | Logic Distillation | Function-by-function code distillation; task-specific | Limited to code/planning tasks | Planning, decision-making | Teacher LLM + code dataset |
| [69] | LLM → Student/Anomaly | Focused on anomaly detection | Specific to network anomalies | Anomaly detection models | LLM + anomaly dataset |
| [70] | LLM → Student/Text Classification | Performance-guided; scalable | Requires teacher inference | Text classification at scale | Teacher LLM + text datasets |
| [71] | Multi-aspect KD | Captures multiple aspects of teacher knowledge | Complexity in combining aspects | NLP; LLM reasoning | Teacher LLM + student model |
| [72] | KD Metrics/LLM | Rethinks KL divergence for LLM distillation | Only focuses on loss; needs experiments | LLM training/evaluation | Teacher + student LLM |
| [73] | Privacy-aware KD | Guides student with distribution tutor; protects privacy | Limited to medical text; complex tutor mechanism | Medical text classification | LLM + privacy dataset |
| [80] | LLM → Student/Assessment | Auto-scoring science assessments | Limited to education domain | Educational assessment scoring | Teacher LLM + student data |
| [81] | MoE → Lighter Model | Distills Mixture-of-Experts BERT to small model | MoE-specific; task limited | Lightweight NLP models | Teacher MoE BERT |
| [49] | LLM → Security | Semantic-augmented knowledge transfer for APT detection | Limited to provenance analysis; needs LLM teacher | Cybersecurity; APT detection | LLM + network logs |
| [61] | Teacher–Student LLM | Uses open-source LLMs; evaluates generalizability | Synthetic data quality critical | NLP model compression | LLM + synthetic data |
| [62] | LLM → Student | Efficient with unlabeled data | Pseudo-label noise | NLP | Teacher LLM + unlabeled dataset |
| Model Compression/Quantization/Pruning | | | | | |
| [76] | LLaMA2 Compression | High performance; low computation cost | Focused on LLaMA2 only | Model compression | GPU + quantization |
| [11] | GNN + LLM KD | Combines graph info | Limited to graph-based tasks | NLP + graph tasks | LLM + GNN data |
| [77] | Self-distillation/Sub-4-bit LLM | Very low-bit models; efficient | May degrade performance if too low-bit | LLM compression | GPU + quantized LLM |
| [78] | Quantization + KD | Extreme model compression | Only tested on small models | Model compression | Quantization noise training |
| [79] | Data-free QAT KD | Quantization aware training for LLM | Data-free assumptions may limit generality | LLM compression | Teacher + student LLM |
| [83] | Pruning + KD | Compact LMs; pruning + distillation | Needs careful pruning schedule | Efficient LLM deployment | Teacher + pruning framework |
| [82] | Pruning + KD (Minitron) | Practical LLM distillation workflow | Implementation heavy | LLM pruning | Teacher LLM + GPU |
| Multi-Teacher/Multi-Stage | | | | | |
| [85] | Multi-teacher Multi-stage | Combines multiple teachers; reasoning-focused | High computation; complex alignment | MRC (Machine Reading Comprehension) | Multiple teacher models |
| [86] | Multi-Teacher/Reasoning | Transfers reasoning capabilities; multi-teacher | Computationally expensive | Small LLM reasoning improvement | Teacher LLMs |
| [84] | Survey/Comprehensive | Covers most KD methods; up-to-date | No experiments; mainly conceptual | Reference for KD research | N/A |
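Most of the teacher–student entries in Table 4 optimize the same basic objective: the student is trained to mimic the teacher's temperature-softened output distribution while still fitting the ground-truth labels. The following minimal sketch (PyTorch) illustrates only that shared objective; the tiny placeholder networks, synthetic data, temperature, and loss weighting are illustrative assumptions and do not reproduce the setup of any cited paper.

```python
# Minimal teacher-student distillation sketch (PyTorch). The tiny MLPs and the
# synthetic data below are placeholders, not models from the cited papers.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

VOCAB = 100          # hypothetical output vocabulary / class count
TEMPERATURE = 2.0    # softening temperature T
ALPHA = 0.5          # weight between distillation loss and hard-label loss

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, VOCAB))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, VOCAB))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(256, 32)                 # placeholder inputs
y = torch.randint(0, VOCAB, (256,))      # placeholder hard labels

teacher.eval()
for step in range(100):
    with torch.no_grad():
        t_logits = teacher(x)            # teacher predictions act as fixed soft targets
    s_logits = student(x)

    # KL divergence between temperature-softened distributions (classic KD loss);
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd_loss = F.kl_div(
        F.log_softmax(s_logits / TEMPERATURE, dim=-1),
        F.softmax(t_logits / TEMPERATURE, dim=-1),
        reduction="batchmean",
    ) * (TEMPERATURE ** 2)
    ce_loss = F.cross_entropy(s_logits, y)
    loss = ALPHA * kd_loss + (1 - ALPHA) * ce_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In the LLM-specific works summarized above, the toy classifier is replaced by a full language model, and the methods differ mainly in the divergence used, how teacher outputs are sampled, and how training data are selected, as noted in the table.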
Table 5. Summary of insights from the semantic integrity papers categorized by consistency verification, adversarial robustness, and bias mitigation.

| Papers | Focus/Domain | Method/Approach | Application |
|---|---|---|---|
| Consistency Verification | | | |
| [88] | Safety Alignment | Knowledge Distillation framework for testing LLMs | Evaluating LLM safety and alignment |
| [89] | Semantic Communications | Importance-aware semantic correction using LLMs | Reliable communication |
| [51] | Logical Consistency | Consistency verification of LLM outputs | Fact-checking and verification |
| [90] | Semantic Integrity Constraints | Declarative guardrails for AI-augmented data processing | Ensuring reliable data transformations |
| [92] | Inconsistent Sources | Multi-source repair to mitigate hallucinations | Improving factual correctness of LLM outputs |
| [93] | Semantic Drift | Residual knowledge stability concept | Maintaining knowledge retention in LLMs |
| [91] | Semantic Consistency | Monitoring and enforcing consistency constraints | Reliable LLM outputs |
| [94] | Fact-Checking | LLM self-evaluation for detecting misinformation | Automated detection of AI-generated false reports |
| [95] | Contextual Integrity/Privacy | Automating privacy policy annotations using LLMs | Governance and compliance |
| Adversarial Robustness | | | |
| [96] | Adversarial Prompts | Evaluation of LLMs under adversarial prompts | Measuring LLM robustness |
| [97] | Adversarial Attacks | Decision-level robustness analysis | Assessing vulnerability of embodied LLMs |
| [98] | Adversarial Attacks | Attack strategies on LLMs | Understanding LLM weaknesses |
| [99] | Bug Detection | Data-flow sanitization with LLMs | Improving reliability of code-generation LLMs |
| [100] | Text Perturbations | Robustness testing | Ensuring stability of LLM outputs |
| Bias Detection and Mitigation | | | |
| [101] | Bias in Code Generation | Bias testing and mitigation in LLMs | Reducing unwanted bias in outputs |
| [102] | Bias | Addressing bias, misinformation, prompt attacks | Secure LLM deployment |
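Several of the consistency-verification entries in Table 5 reduce to the same underlying check: pose semantically equivalent prompts and measure how well the resulting answers agree. The sketch below illustrates only that idea; the stubbed model call, token-overlap similarity, and 0.5 threshold are placeholder assumptions, whereas the cited works use real models and stronger semantic similarity or entailment measures.

```python
# Minimal consistency-verification sketch: ask semantically equivalent prompts,
# then flag answers whose pairwise agreement falls below a threshold. The
# fake_llm stub and the Jaccard token overlap are placeholders for a real model
# and a real semantic similarity measure (e.g., embedding cosine or NLI).
from itertools import combinations


def fake_llm(prompt: str) -> str:
    # Placeholder for an actual LLM call.
    canned = {
        "What year was the transformer architecture introduced?": "It was introduced in 2017.",
        "In which year did the transformer architecture first appear?": "The transformer appeared in 2017.",
        "When was the transformer architecture proposed?": "Around 2015, roughly speaking.",
    }
    return canned.get(prompt, "I am not sure.")


def jaccard(a: str, b: str) -> float:
    # Crude lexical agreement between two answers.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def consistency_score(prompts: list[str]) -> float:
    answers = [fake_llm(p) for p in prompts]
    pairs = list(combinations(answers, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


paraphrases = [
    "What year was the transformer architecture introduced?",
    "In which year did the transformer architecture first appear?",
    "When was the transformer architecture proposed?",
]
score = consistency_score(paraphrases)
print(f"mean pairwise agreement: {score:.2f}")
if score < 0.5:  # threshold chosen only for illustration
    print("Low agreement: answers may be semantically inconsistent.")
```

A low agreement score does not identify which answer is wrong; in the surveyed approaches it triggers a downstream step such as fact-checking, multi-source repair, or abstention.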
Table 6. Summary of key techniques and sub-techniques for LLM information and knowledge integrity.

| Technique | Sub-Techniques | Purpose | Integrity Aspect |
|---|---|---|---|
| Knowledge Distillation | Teacher-Student | Preserves knowledge from teacher to student | Knowledge Management |
| | Model Compression | Reduces complexity, preserves knowledge | Knowledge Management |
| | Multi-Teacher | Integrates insights from multiple teachers for robustness | Knowledge Management |
| Semantic Integrity | Consistency Verification | Ensures outputs are logically coherent | Knowledge Management |
| | Adversarial Robustness | Protects model from manipulated inputs | Information Security |
| | Bias Mitigation | Reduces social, cultural, demographic biases | Knowledge Management and Information Security |
| Provenance Tracking | Data Lineage | Maintains traceable records of data usage | Information Security |
| | Source Attribution | Identifies origin of data and outputs | Information Security and Knowledge Management |
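For the provenance-tracking rows of Table 6, data lineage and source attribution amount to recording, for each generated output, which model produced it, from which prompt and sources, and when, in a form that can later be audited. The sketch below shows one possible shape of such a record; the field names, SHA-256 hashing choice, and in-memory log are illustrative assumptions rather than a schema proposed in the surveyed papers, and a blockchain-based approach such as [106] would replace the simple log with a distributed ledger.

```python
# Minimal provenance-record sketch: attach source attribution and a content
# hash to each generated answer so it can later be audited. Field names and
# the in-memory log are illustrative assumptions, not a standard schema.
import hashlib
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    model_id: str        # which model produced the output
    prompt: str          # the input that led to the output
    output: str          # the generated text
    sources: list[str]   # e.g., retrieved document IDs or URLs
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    @property
    def output_hash(self) -> str:
        # Hash lets a verifier detect post-hoc tampering with the stored output.
        return hashlib.sha256(self.output.encode("utf-8")).hexdigest()

    def to_json(self) -> str:
        record = asdict(self)
        record["output_hash"] = self.output_hash
        return json.dumps(record, indent=2)


# Append-only log; a production system might use a database or ledger instead.
log: list[ProvenanceRecord] = []

record = ProvenanceRecord(
    model_id="example-llm-7b",                        # hypothetical model name
    prompt="Summarize the retrieved policy document.",
    output="The policy restricts data retention to 90 days.",
    sources=["doc://policy/2024-03", "https://example.org/policy"],
)
log.append(record)
print(record.to_json())
```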