From Illusion to Insight: A Taxonomic Survey of Hallucination Mitigation Techniques in LLMs
Abstract
1. Introduction
2. Understanding Hallucinations
2.1. Definition of Hallucinations
2.2. Categories of Hallucinations
- Intrinsic hallucinations (factuality errors) occur when a model generates content that contradicts established facts, its training data, or the referenced input [2,3,31,49,50,52,53]. Following the taxonomic names in [5], the subtypes of this category may include (but are not limited to):
- Entity-error hallucinations, where the model generates non-existent entities or misrepresents their relationships (e.g., inventing fake individuals, non-existent biographical details [4] or non-existent research papers), often measured via entity-level consistency metrics [60], as shown in [3,31,57,61].
- Extrinsic hallucinations (faithfulness errors) appear when the generated content deviates from the provided input or user prompt. These hallucinations are generally characterized by output that cannot be verified: it may or may not be true, but in either case it is either not directly deducible from the user prompt or it contradicts itself [2,3,5,50,56,58]. Extrinsic hallucinations may manifest as:
- Emergent hallucinations, defined as those arising unpredictably in larger models due to scaling effects [63]. These can be attributed to cross-domain reasoning and modality fusion (especially in multi-modal settings or Chain of Thought (CoT) prompting scenarios) [3,63,64], multi-step inference errors [65], and abstraction or alignment issues, as shown in [50,57,64]. For instance, self-reflection demonstrates mitigation capabilities, effectively reducing hallucinations only in models above a certain threshold (e.g., 70B parameters), while paradoxically increasing errors in smaller models due to limited self-diagnostic capacity [5].
2.3. Underlying Causes of Hallucinations
- data-related issues (e.g., noisy, biased, duplicated, or imbalanced training corpora),
- model and training-related issues (e.g., probabilistic decoding objectives, memorization, under-training, or alignment-induced effects), and
- usage-related issues (e.g., prompt design, distribution shift, and retrieval mismatches).
3. Related Works
4. Review Methodology, Proposed Taxonomy, Contributions and Limitations
4.1. Review Methodology
- Literature Retrieval: We systematically collected research papers from major electronic archives—including Google Scholar, ACM Digital Library, IEEE Xplore, Elsevier, Springer, and ArXiv—with a cutoff date of 12 August 2025. Eligible records were restricted to peer-reviewed journal articles, conference papers, preprints under peer review, and technical reports, while non-academic sources such as blogs or opinion pieces were excluded. A structured query was used, combining keywords: (“mitigation” AND “hallucination” AND “large language models”) OR “evaluation”. In addition, we examined bibliographies of retrieved works to identify further relevant publications.
- Screening: The screening process followed a two-stage approach. First, titles and abstracts were screened for topical relevance. Records passing this stage underwent a full-text review to assess eligibility. Out of 412 initially retrieved records, 83 were excluded as irrelevant at the screening stage. The 329 eligible papers were then examined in detail and further categorized into support studies, literature reviews, datasets/benchmarks, and works directly proposing hallucination detection or mitigation methods. The final set of 221 studies formed the basis of our taxonomy. This process is summarized in Figure 2.
- Paper-level tagging, where every study was assigned one or more tags corresponding to its employed mitigation strategies. Our review accounts for papers that propose multiple methodologies by assigning them multiple tags, ensuring a comprehensive representation of each paper’s contributions.
- Thematic clustering, where we consolidated those tags into six broad categories presented analytically in Section 4.2. This enabled us to generate informative visualizations that reflect the prevalence and trends among different mitigation techniques.
- Content-specific retrieval: To gain deeper insight into mitigation strategies, we developed a custom Retrieval-Augmented Generation (RAG) system based on the Mistral language model as an additional research tool, which enabled us to extract content-specific passages directly from the research papers.
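As an illustration of the kind of tool described above (not the authors' exact system), the following is a minimal sketch of a content-specific RAG helper. It assumes scikit-learn for TF-IDF retrieval; `generate_with_llm` is a hypothetical placeholder where a locally hosted Mistral model would be called.

```python
# Minimal sketch of a content-specific RAG helper, not the authors' exact system.
# Assumes scikit-learn; `generate_with_llm` is a hypothetical placeholder for a
# call to a locally hosted Mistral model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_passages(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Rank candidate passages by TF-IDF cosine similarity to the query."""
    matrix = TfidfVectorizer(stop_words="english").fit_transform(passages + [query])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(scores, passages), key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in ranked[:top_k]]

def generate_with_llm(prompt: str) -> str:
    """Placeholder for the language-model call."""
    return f"[answer conditioned on a {len(prompt)}-character prompt]"

def answer_from_papers(question: str, paper_chunks: list[str]) -> str:
    """Retrieve the most relevant chunks and ask the model to answer from them only."""
    context = "\n\n".join(retrieve_passages(question, paper_chunks))
    prompt = ("Answer strictly from the excerpts below; say 'not found' otherwise.\n\n"
              f"Excerpts:\n{context}\n\nQuestion: {question}")
    return generate_with_llm(prompt)

chunks = ["Contrastive decoding penalizes tokens favored by a weaker model.",
          "Semantic entropy clusters sampled answers by meaning.",
          "Retrieval-augmented generation grounds outputs in retrieved passages."]
print(answer_from_papers("How does contrastive decoding reduce hallucinations?", chunks))
```

Constraining the prompt to the retrieved excerpts, as in the sketch, is the same grounding principle discussed throughout Section 5.2.3.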
4.2. Proposed Taxonomy and Review Organization
- Training and Learning Approaches (Section 5.1): Encompasses diverse methodologies employed to train and refine AI models, shaping their capabilities and performance.
- Architectural Modifications (Section 5.2): Covers structural changes and enhancements made to AI models and their inference processes to improve performance, efficiency, and generation quality.
- Input/Prompt Optimization (Section 5.3): Focuses on strategies for crafting and refining the text provided to AI models to steer their behavior and output, often specifically to mitigate hallucinations.
- Post-Generation Quality Control (Section 5.4): Encompasses essential post-generation checks applied to text outputs, aiming to identify or correct inaccuracies.
- Interpretability and Diagnostic Approaches (Section 5.5): Encompasses methods that help researchers understand why and where a model may be hallucinating (e.g., Internal State Probing, Attribution-based diagnostics).
- Agent-based Orchestration (Section 5.6): Includes frameworks comprising single or multiple LLMs within multi-step loops, enabling iterative reasoning and tool usage.
4.3. Contributions and Key Findings
5. Methods for Mitigating Hallucinations
5.1. Training and Learning Approaches
5.1.1. Supervised and Semi-Supervised Learning
- Fine-Tuning with factuality objectives, where techniques such as FactPEGASUS make use of ranked factual summaries for factuality-aware fine-tuning [92] while FAVA generates synthetic training data using a pipeline involving error insertion and post-processing to address fine-grained hallucinations [93]. Faithful Finetuning applies weighted cross-entropy and fact-grounded QA losses to enhance faithfulness [94], while Principle Engraving fine-tunes LLaMA on self-aligned, principle-based responses [95]. Other work [5] examines the interplay between supervised fine-tuning and RLHF in mitigating hallucinations. Adversarial approaches build on Wasserstein GANs [90], with AFHN synthesizing features for new classes using labeled samples as context, supported by classification and anti-collapse regularizers to ensure feature discriminability and diversity.
- Synthetic Data and Weak Supervision, where studies automatically generated hallucinated data or weak labels for training. For instance, in [91] hallucinated tags are prepended to the model inputs so that it can learn from annotated examples to control hallucination levels while [96] uses BART and cross-lingual models with synthetic hallucinated datasets for token-level hallucination detection. Similarly, Petite Unsupervised Research and Revision (PURR) involves fine-tuning a compact model on synthetic data comprising corrupted claims and their denoised versions [97] while TrueTeacher uses labels generated by a teacher LLM to train a student model on factual consistency [98].
- Preference-Based Optimization and Alignment: In [99], a two-stage framework first applies supervised fine-tuning on curated legal QA data and then Hard Sample-aware Iterative Direct Preference Optimization (HIPO) to ensure factuality by leveraging human-preference signals, while in [80] a lightweight classifier is finetuned on contrastive pairs (hallucinated vs. non-hallucinated outputs). Similarly, mFACT—a metric for factual consistency—is derived from training classifiers in different target languages [100], while Contrastive Preference Optimization (CPO) combines a standard negative log-likelihood loss with a contrastive loss to finetune a model on a dataset of triplets (source, hallucinated translation, corrected translation) [101]; a minimal sketch of such a combined objective appears after this list. UPRISE employs a retriever model trained using signals from an LLM to select optimal prompts for zero-shot tasks, allowing the retriever to directly internalize alignment signals from the LLM [102]. Finally, behavioral tuning uses labeled data (dialogue history, knowledge sources, and corresponding responses) to improve alignment [103].
- Knowledge-Enhanced Adaptation: Techniques such as HALO inject Wikidata entity triplets or summaries via fine-tuning [104], while Joint Entity and Summary Generation fine-tunes a pre-trained Longformer model on the PubMed dataset to mitigate hallucinations through supervised adaptation and data filtering [105]. The impact of injecting new knowledge into LLMs via supervised fine-tuning, and the associated risk of hallucinations, is also studied in [106].
- Hallucination Detection Classifiers: In [107], a LLaMA-2-7B model is fine-tuned to classify hallucination-prone queries using labeled data, whereas [108] proposes a sample selection strategy that improves the efficiency of supervised fine-tuning, reducing annotation costs while preserving factuality through supervision.
- Training of factuality classifiers: Supervised fine-tuning is used to train models on labeled text from datasets such as HDMBench, TruthfulQA, and multilingual corpora, demonstrating improvements in task-specific performance and factual alignment [110,111,118]. Additionally, such training enables classifiers to detect properties such as honesty and lying within intermediate representations, resulting in increased accuracy and separability of these concepts, as shown in [83,112,117].
- Synthetic data creation, which involves injecting hallucinations into correct reasoning steps, as in FG-PRM, which trains Process Reward Models to detect specific hallucination types [113]. Approaches like RAGTruth provide human-annotated labels on response grounding to support supervised training and evaluation [114], while [124] introduces an entity-substitution framework that generates conflicting QA instances to address over-reliance on parametric knowledge.
- Refinement pipelines employ supervised training in various forms, such as critic models trained on LLM data with synthetic negatives [115], augmentation techniques like TOPICPREFIX for improved grounding [116], and models such as HVM trained on the FATE dataset to distinguish faithful from unfaithful text [119]. Related methods target truthful vs. untruthful representations [53,120,121], while self-training on synthetic data outperforms crowdsourced alternatives [122]. Finally, WizardLM fine-tunes LLaMA on generated instructions, enhancing generalization [123].
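To illustrate the kind of combined objective used by CPO [101] (referenced above), here is a minimal sketch assuming PyTorch. The `cpo_style_loss` function, the `beta` value, and the way sequence log-likelihoods are obtained are illustrative assumptions rather than the exact recipe of the cited work.

```python
# Minimal sketch of a CPO-style objective: a likelihood term on the corrected
# translation plus a contrastive preference term over (hallucinated, corrected)
# pairs. Assumes PyTorch; hyperparameters are illustrative only.
import torch
import torch.nn.functional as F

def cpo_style_loss(logp_corrected: torch.Tensor,
                   logp_hallucinated: torch.Tensor,
                   beta: float = 0.1) -> torch.Tensor:
    """logp_* hold summed token log-likelihoods of each sequence under the model."""
    # Preference term: push the corrected translation above the hallucinated one.
    preference = -F.logsigmoid(beta * (logp_corrected - logp_hallucinated))
    # Likelihood term: keep fitting the corrected reference directly.
    nll = -logp_corrected
    return (preference + nll).mean()

# Toy usage with fake sequence log-likelihoods for a batch of two triplets.
logp_good = torch.tensor([-12.3, -9.8], requires_grad=True)
logp_bad = torch.tensor([-10.1, -11.5])
loss = cpo_style_loss(logp_good, logp_bad)
loss.backward()  # in practice gradients flow into the model that produced logp_good
print(float(loss))
```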
5.1.2. Reinforcement Learning
5.1.3. Contrastive Learning
5.1.4. Knowledge Distillation
5.1.5. Instruction Tuning
- Factual alignment can be achieved through domain-specific fine-tuning, as in [99], where LLMs are trained on datasets of legal instructions and responses. Another approach, Curriculum-based Contrastive Learning Cross-lingual Chain-of-Thought (CCL-XCoT), integrates curriculum-based cross-lingual contrastive learning with instruction fine-tuning to transfer factual knowledge from high-resource to low-resource languages. Its Cross-lingual Chain-of-Thought (XCoT) strategy further enhances this process by guiding the model to reason before generating in the target language, effectively reducing hallucinations [160].
- Consistency alignment, which is achieved in [158] during a two-stage supervised fine-tuning process: The first step uses instruction–response pairs while in the second step, pairs of semantically similar instructions are used to enforce aligned responses across instructions.
- Data-centric grounding, where Self-Instruct introduces a scalable, semi-automated method for generating diverse data without human annotation [159]. It begins with a small set of instructions and uses a pre-trained LLM to generate new tasks and corresponding input-output examples, which are then filtered and used to fine-tune the model, thus generating more aligned and grounded outputs.
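The Self-Instruct bootstrap described above can be summarized by a short loop. The sketch below is illustrative only: `propose_instructions` is a hypothetical stand-in for prompting a pre-trained LLM with in-context seed tasks, and the token-overlap filter is a crude substitute for the ROUGE-based de-duplication used in [159].

```python
# Minimal sketch of a Self-Instruct-style bootstrap loop (illustrative only).
import random

def propose_instructions(seed_pool: list[str], n_new: int = 4) -> list[str]:
    """Placeholder for an LLM call that drafts new instructions from in-context seeds."""
    examples = random.sample(seed_pool, k=min(3, len(seed_pool)))
    return [f"Variant of: {ex}" for ex in examples][:n_new]

def too_similar(candidate: str, pool: list[str], threshold: float = 0.7) -> bool:
    """Reject candidates whose token overlap with an existing instruction is high."""
    cand_tokens = set(candidate.lower().split())
    for existing in pool:
        overlap = len(cand_tokens & set(existing.lower().split()))
        if overlap / max(1, len(cand_tokens)) > threshold:
            return True
    return False

def bootstrap(seed_pool: list[str], rounds: int = 3) -> list[str]:
    pool = list(seed_pool)
    for _ in range(rounds):
        for cand in propose_instructions(pool):
            if not too_similar(cand, pool):
                pool.append(cand)  # kept instructions later become fine-tuning data
    return pool

print(bootstrap(["Summarize the article.", "Translate the sentence to French."]))
```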
5.2. Architectural Modifications
5.2.1. Attention Mechanisms
5.2.2. Decoding Strategies
- Probabilistic Refinement and Confidence-Based Adjustments: Methods in this family adjust token selection to favor context-aligned, higher-confidence outputs. Context-aware decoding amplifies the gap between probabilities with vs. without context, down-weighting prior knowledge when stronger contextual evidence is present [29,173] (a toy numerical sketch of this contrast appears after this list). Entropy-based schemes penalize hallucination-prone tokens using cross-layer entropy or confidence signals [174], while CPMI rescales next-token scores toward tokens better aligned with the source [175]. Logit-level interventions refine decoding by interpreting/manipulating probabilities during generation [112]. Confidence-aware search variants—Confident Decoding and uncertainty-prioritized beam search—use epistemic uncertainty to steer beams toward more faithful continuations, with higher predictive uncertainty correlating with greater hallucination risk [76,177]. SEAL trains models to emit a special [REJ] token when outputs conflict with parametric knowledge and then leverages the [REJ] probability at inference to penalize uncertain trajectories [178]. Finally, factual-nucleus sampling adapts sampling randomness by sentence position, substantially reducing factual errors [116].
- Contrastive-inspired Decoding Strategies: A range of decoding methods build on contrastive principles to counter hallucinations. DeCoRe induces hallucinations by masking retrieval heads and contrasting outputs of the base LLM with its hallucination-prone variant [146], while Delta reduces hallucinations by masking random input spans and comparing distributions from original and masked prompts [147]. Contrastive Decoding replaces nucleus or top-k search by optimizing the log-likelihood gap between an LLM and a smaller model, introducing a plausibility constraint that filters low-probability tokens [17]. SH2 (Self-Highlighted Hesitation) manipulates token-level decisions by appending low-confidence tokens to the context, causing the decoder to hesitate before committing [179]. Spectral Editing of Activations (SEA) projects token representations onto directions of maximal information, amplifying factual signals and suppressing hallucinatory ones [180]. Induce-then-Contrast (ICD) fine-tunes a “factually weak LLM” on non-factual samples and uses its induced hallucinations as penalties to discourage untruthful predictions [181]. Active Layer Contrastive Decoding (ActLCD) applies reinforcement learning to decide when to contrast layers, treating decoding as a Markov decision process [182]. Finally, Self-Contrastive Decoding (SCD) down-weights overrepresented training tokens during generation, reducing knowledge overshadowing [54].
- Verification and Critic-Guided Mechanisms: Several strategies enhance decoding by incorporating verification signals or critic models. Critic-driven Decoding combines an LLM’s probabilistic outputs with a text critic classifier that evaluates generated text and steers decoding away from hallucinations [115]. Self-consistency samples multiple reasoning paths, selecting the most consistent answer; this not only improves reliability but also provides an uncertainty estimate for detecting hallucinations [21]. TWEAK treats generated sequences and their continuations as hypotheses, which are reranked by an NLI or Hypothesis Verification Model (HVM) [119]. Similarly, mFACT integrates a faithfulness metric into decoding, pruning candidate summaries that fall below a factuality threshold [100]. RHO (Reducing Hallucination in Open-domain Dialogues) generates candidate responses via beam search and re-ranks them for factual consistency by analyzing knowledge graph trajectories from external sources [183].
- Internal Representation Intervention and Layer Analysis: Understanding how LLMs encode replies in their early internal states is key to developing decoding strategies that mitigate hallucinations [30]. Hallucination-prone outputs often display diffuse activation patterns rather than activations concentrated on relevant references. In-context sharpness metrics address this by enforcing sharper token activations, ensuring predictions emerge from high-confidence knowledge areas [184]. Inference-Time Intervention (ITI) shifts activations along truth-correlated directions during decoding until the response is complete [185], while DoLa contrasts logits from early and later layers, emphasizing factual knowledge embedded in deeper layers over less reliable lower-layer signals [150]. Activation Decoding similarly constrains token probabilities using entropy-derived activations without retraining [184]. LayerSkip introduces self-speculative decoding, training models with layer dropout and early-exit loss so that earlier layer predictions are verified by later ones, thereby improving efficiency [186].
- RAG-based Decoding: RAG-based decoding strategies integrate external knowledge to enhance factual consistency and mitigate hallucinations [154,176]. For instance, REPLUG prepends a different retrieved document for every forward pass of the LLM and averages the probabilities from these individual passes, thus allowing the model to produce more accurate outputs by synthesizing information from multiple relevant contexts simultaneously [176]. Similarly, Retrieval in Decoder (RID) dynamically adjusts the decoding process based on the outcomes of the retrieval, allowing the model to adapt its generation based on the confidence and relevance of the retrieved information [154].
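To make the probability-contrast idea referenced above concrete, the following is a minimal numerical sketch of context-aware decoding in the spirit of [29,173]. It assumes only NumPy; the toy logit vectors and the choice of `alpha` are illustrative, not taken from any cited system.

```python
# Minimal sketch of context-aware decoding on a toy next-token distribution.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def context_aware_scores(logits_with_ctx: np.ndarray,
                         logits_without_ctx: np.ndarray,
                         alpha: float = 1.0) -> np.ndarray:
    """Amplify what the context adds: (1 + alpha) * with_ctx - alpha * without_ctx."""
    adjusted = (1.0 + alpha) * logits_with_ctx - alpha * logits_without_ctx
    return softmax(adjusted)

vocab = ["Paris", "Lyon", "Rome", "Berlin"]
with_ctx = np.array([2.6, 2.5, 0.5, 0.1])     # context supports "Lyon", but the prior leaks in
without_ctx = np.array([3.0, 1.0, 0.5, 0.2])  # parametric prior strongly favours "Paris"
probs = context_aware_scores(with_ctx, without_ctx)
print(vocab[int(np.argmax(probs))])  # the contrast flips the decision toward "Lyon"
```

The same subtraction-in-logit-space pattern underlies many of the contrastive decoding variants above, with the "weaker" distribution coming from a smaller model, a masked prompt, or an earlier layer instead of the context-free pass.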
5.2.3. Retrieval-Augmented Generation
5.2.4. Knowledge Representation Approaches
5.2.5. Specialized Architectural Mechanisms for Enhanced Generation
5.3. Input/Prompt Optimization
5.3.1. Prompt Engineering
- In dataset creation and evaluation, prompt engineering has been used to generate and filter references used for inference and evaluation [107,220,224,228], systematically induce, detect, or elicit imitative falsehoods [42,53,93,117,139,181,202,223], and even create specific types of code hallucinations to test research methodologies [229,230].
- For confidence assessment and behavioral guidance, it has been used to elicit verbalized confidence, consistency, or uncertainty, and test or guide model behavior and alignment [4,22,30,71,77,95,97,103,120,179,194,203,225,227], reduce corpus-based social biases [80], extract and verify claims [231] as well as investigate failure cascades like hallucination snowballing [65].
- In knowledge integration scenarios it has been combined with retrieval modules or factual constraints [17,73,114,232], in agentic environments where prompts guide the generation of states, actions, and transitions [233], or the alignment process between queries and external knowledge bases [205], and even in the training process of a model where they are used to inject entity summaries and triplets [104]. Additionally, prompts have also been explored as explicit, language-based feedback signals in reinforcement learning settings, where natural language instructions are parsed and used to fine-tune policy decisions during training [69].
- scalability issues which arise from the number of intermediate tasks or their complexity [235],
- context dilution which demonstrates that prompts often fail when irrelevant context is retrieved, especially in RAG scenarios [73],
- lack of standardized prompting workflows which makes prompt engineering a significant trial and error task not only for end-users but also for NLP experts [72], hindering reliable mitigation, and
5.3.2. Structured or Iterative Reasoning Prompting
- Structured reasoning prompts modify behavior within a single forward pass. The model follows the request to enumerate steps “in one shot”; typically, there is no separate controller deciding when to take steps or whether to call external tools.
- Iterative reasoning further improves generation by guiding decomposition into a series of steps, each of which builds on, refines, and supports previous steps before producing the final answer (a minimal loop sketch appears after this list).
- exploit the dialog capabilities of LLMs to detect logical inconsistencies by integrating deductive formal methods [82].
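A minimal sketch of the iterative-reasoning loop referenced above follows. `call_llm` is a hypothetical placeholder, and a real system would use a stopping criterion based on the model's output rather than a fixed step count.

```python
# Minimal sketch of an iterative-reasoning prompt loop (illustrative only).
def call_llm(prompt: str) -> str:
    """Placeholder for a model call; returns a canned string so the sketch runs."""
    return f"[model output for: {prompt[:40]}...]"

def iterative_reasoning(question: str, max_steps: int = 3) -> str:
    steps: list[str] = []
    for i in range(max_steps):
        context = "\n".join(f"Step {j + 1}: {s}" for j, s in enumerate(steps))
        prompt = (f"Question: {question}\n{context}\n"
                  f"Write step {i + 1}, building on and correcting the steps above.")
        steps.append(call_llm(prompt))
    final_prompt = f"Question: {question}\nSteps:\n" + "\n".join(steps) + "\nFinal answer:"
    return call_llm(final_prompt)

print(iterative_reasoning("Which year did the author of 'Dune' die?"))
```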
5.3.3. In-Context Prompting
- Pattern Reinforcement: Exposure to multiple demonstrations helps the model align its response style and factual consistency with the provided examples. For instance, Principle-Driven Self-Alignment supplies 5 in-context exemplars alongside 16 human-written principles, giving clear patterns for compliance and thereby aligning the model’s behavior with the desired norms [95].
- Bias Reduction: Balanced example selection can minimize systematic biases, particularly in ambiguous queries [254,255] while few-shot examples have been used to calibrate GPT-3’s responses, demonstrating how different sets of balanced vs. biased prompts significantly influence downstream performance [71].
5.3.4. Context Optimization
5.3.5. System Prompt Design
5.4. Post-Generation Quality Control
- Self-verification and Consistency Checking: Involves internal assessments of output quality, ensuring logical flow, and maintaining factual coherence within the generated content.
- External Fact-checking and Source Attribution: Validates information against outside authoritative sources or asks the model to explicitly name its sources.
- Reliability Quantification: a broader subcategory that encompasses:
- Uncertainty Estimation (quantifying the likelihood of claims) and
- Confidence Scoring (assigning an overall reliability score to the output).
- Output Refinement: Involves further shaping and iteratively polishing the generated text.
- Response Validation: Strictly focuses on confirming that the output meets specific, pre-defined criteria and constraints.
5.4.1. Self-Verification and Consistency Checking
5.4.2. External Fact-Checking
5.4.3. Uncertainty Estimation and Confidence Scoring
Uncertainty Estimation
- Entropy-based approaches: Real-time Hallucination Detection (RHD) flags high-entropy entities and triggers self-correction when unreliability is predicted [55]. Conditional Pointwise Mutual Information (CPMI) quantifies token-level conditional entropy, identifying hallucinated tokens as high-entropy states and reinforcing uncertainty as a useful proxy [175]. INSIDE computes an EigenScore—differential entropy in the sentence-embedding space—directly from hidden states and applies feature clipping to curb overconfident generations [26]. A PMI-based detector measures “overshadowing” via perturbations, using uncertainty as a cue to spot low-confidence conditions [54]. Beyond surface entropy, semantic formulations better capture meaning-level doubt. “Semantic entropy” clusters multiple sampled answers by meaning and computes entropy over clusters, estimating uncertainty over interpretations rather than word choices [25]. Semantic Entropy Probes (SEPs) infer a comparable signal from a single generation’s hidden states—eschewing multi-sample costs and outperforming log-probability or entropy baselines [275]. Hybrid estimators further improve reliability by pairing auxiliary models with decoding signals: an Epistemic Neural Network (ENN) extracts hidden features, trains a small MLP for next-token prediction, and fuses its outputs with contrastive-decoding logits to down-weight low-confidence generations [276]. CHOKE shows that high-certainty hallucinations occur even when the model “knows” the answer, under both semantic-entropy and token-probability measures [79]. Hence, these signals are valuable early-warning indicators but should be paired with external verification or contrastive/decoding controls to detect high-confidence failures.
- Sampling: Sampling-based uncertainty estimation treats variability across multiple generations as a proxy for uncertainty. In [127], the model is sampled repeatedly and output divergence is quantified to produce a resampling-derived confidence value (distinct from single-token probabilities); this scalar then serves both as a verification signal and as a reward during reinforcement learning. Similarly, Ref. [267] detects hallucination risk by measuring divergence among sampled responses, operationalized with metrics such as BERTScore and NLI-based agreement. Extending this idea, Ref. [77] combines three main components: sampling strategies to generate multiple responses, different prompting techniques to elicit the model’s uncertainty, and aggregation techniques that combine these responses and their associated confidence scores into a final, calibrated confidence score (a minimal sampling-based sketch appears after this list).
- Monte Carlo methods: Fundamental measures such as sequence log-probability and Monte-Carlo dropout dissimilarity are used as uncertainty signals to detect hallucinations and to drive downstream refinement, detection, and re-ranking, capturing variability and confidence in predictions [9]. In [233], reward estimation based on the log probability of actions effectively quantifies confidence in individual reasoning steps; MCTS then exploits these rewards to prioritize higher-plausibility paths. Although not labeled as “uncertainty estimation,” this setup substantially overlaps with it, since the reward function encodes a trust/uncertainty signal over the LLM’s reasoning traces [233].
- Explicit Verbalization: The core method in [155] trains models to verbalize epistemic uncertainty by distinguishing “uncertain” vs. “certain” data using both supervised and unsupervised signals, and evaluates calibration with ECE and AP, improving models’ ability to express self-doubt. SelfAware introduces a benchmark and method for detecting when a model should state uncertainty in response to unanswerable questions [277]. Similarly, Ref. [278] fine-tunes GPT-3 to output “verbalized probability,” a direct expression of epistemic uncertainty— a higher-order objective beyond raw softmax confidence. While we place [278] under Uncertainty Estimation, confidence scoring remains crucial: calibration is assessed with MSE and MAD, but these scores arise from the verbalized probabilities themselves, not from post hoc logits.
- Semantic analysis: In [78], semantic density quantifies uncertainty by measuring the similarity between a given response and multiple completions in embedding space. It operates response-wise (not prompt-wise), requires no retraining, and addresses limitations of earlier uncertainty methods (e.g., semantic entropy, P(True)). Although the authors consistently describe the resulting scalar as “confidence,” the [0, 1] score—with natural thresholds for filtering—is best viewed as the output of an uncertainty metric. In [232], semantic analysis is combined with logit-derived, token-level probabilities to compute a confidence score per atomic unit, which is then integrated with textual entailment probabilities to yield a refined score for detecting hallucinated spans. While termed “confidence” and useful for thresholding, this score functions within a broader hallucination-detection pipeline; thus, confidence scoring is effectively a means to uncertainty estimation in the authors’ framework.
- Training approaches: [152] explicitly links hard labels to model overconfidence and proposes soft labels to introduce uncertainty-aware supervision. By restructuring the training objective to reflect confidence calibration and evaluating overconfidence via the NLL of incorrect answers, the authors show that fine-tuning with soft labels reduces misplaced certainty—an important driver of hallucinations. Ref. [36] extends this idea by using smoothed soft labels (rather than hard labels) to mitigate hallucinations through knowledge distillation: a student model learns from a calibrated probability distribution, consistent with the maximum-entropy principle, yielding better factual grounding and reliability. As in [152], overconfidence is assessed by plotting NLL on incorrect answers, and soft-label fine-tuning is shown to reduce unwarranted certainty.
- Composite methods: In [175], epistemic uncertainty directly guides decoding via a modified beam search that prioritizes low-uncertainty continuations, reducing incorrect or nonexistent facts. Ref. [279] uses a proxy model to compute token- and sentence-level hallucination scores from uncertainty metrics; these signals are sharpened by (i) emphasizing keywords, (ii) propagating uncertainty through attention weights, and (iii) correcting token probabilities by entity type and frequency to address over- and under-confidence. Finally, in [168] the authors use the attention mechanism as a self-knowledge probe. Specifically, they design an uncertainty estimation head, which is essentially a lightweight attention head that relies on attention-derived features such as token-to-token attention maps and lookback ratios, serving as indicators of hallucination likelihood.
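To make the sampling-based estimators above concrete, here is a minimal semantic-entropy-style sketch. Clustering by normalized string equality is a deliberately crude stand-in for the meaning/NLI clustering used in [25], and the sampled answers are hard-coded for illustration.

```python
# Minimal sketch of a sampling-based, semantic-entropy-style uncertainty signal.
import math
from collections import Counter

def normalize(answer: str) -> str:
    """Crude meaning-cluster key: lowercase, strip punctuation/whitespace."""
    return " ".join(answer.lower().strip().rstrip(".").split())

def semantic_entropy(sampled_answers: list[str]) -> float:
    """Entropy over answer clusters; higher values suggest higher hallucination risk."""
    clusters = Counter(normalize(a) for a in sampled_answers)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())

samples = ["Paris.", "paris", "Paris", "Lyon", "Paris."]
print(round(semantic_entropy(samples), 3))  # low entropy: the samples mostly agree
```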
Confidence Scoring
5.4.4. Output Refinement
- RAG/Web-search–based output refinement retrieves external evidence to correct or revise model outputs. CRAG uses a lightweight retrieval evaluator and a decompose–then–recompose strategy to judge the relevance of retrieved documents for a given query [280]. EVER validates generations against verified sources and iteratively fixes intrinsic errors or reformulates extrinsic hallucinations [246]. FAVA conducts span-level hallucination detection and editing, marking inaccurate or subjective spans for deletion and proposing corrected replacements to refine the final text [93].
- Structured knowledge sources integrate and reason over formal data—e.g., knowledge graphs (KGs)—to refine and validate LLM outputs. Ref. [167] couples text with relational signals via GNN-based probabilistic inference to improve factual grounding, while [183] re-ranks conversational candidates using walks over KG subgraphs to enhance reasoning. Neural Path Hunter (NPH) [188] follows a generate-then-refine loop, detecting hallucinated entities in dialogue and replacing them via targeted KG queries. During FLEEK’s revision phase [207], a fact-revision module proposes corrections to dubious triples using verified KG/Web evidence. Complementing graph-centric methods, Ref. [82] blends deductive formal verification with the inductive strengths of LLMs, using logical tests to expose hallucinations.
- External Feedback and Verification: These methods refine outputs using outside signals—human feedback, retrieved evidence, or verified knowledge. CRITIC [256] exploits CoT and few-shot prompting to revise hallucinated answers based on external feedback (free-form QA, mathematical reasoning, toxicity reduction). Chain of Knowledge (CoK) [270] applies a three-stage pipeline—reasoning preparation, dynamic knowledge adapting, and answer consolidation—and, when no majority consensus emerges, corrects rationales by integrating heterogeneous sources. Verify-and-Edit [268] specifically post-edits CoT chains: the model produces an initial answer and CoT, generates verification questions, and retrieves external knowledge to answer them; it then adjusts the original CoT and final response to fix unsupported or incorrect claims. Within DRAD [55], the SEK module acts when RHD flags a likely hallucination: SEK formulates a query from the local context, retrieves relevant evidence, truncates the output at the error point, and regenerates the continuation using the retrieved knowledge.
- Filtering Based on External Grounding: These methods filter outputs by checking them against external documents or ground truth. HAR [42] uses Factuality Filtering and Attribution Filtering to retain only answers explicitly supported by the provided document. HaluEval-Wild [107] applies filtering, manual verification of hallucination-prone queries, and selection of difficult cases to refine outputs and ensure the evaluation set contains only well-grounded examples.
- Agent-Based Interaction with External Context: These involve agents that interact with external environments or receive structured external feedback for refinement. For instance, the mitigation agent in [220] is designed to refine and improve the output by interpreting an Open Voice Network (OVON) JSON message. This JSON message contains crucial information, including the estimated hallucination level and detailed reasons for potential hallucinations, which guides the refinement process [139].
- Model Tuning/Refinement with External Knowledge: Methods that explicitly use external knowledge during their training or refinement phase to improve model outputs. In [157], methods like refusal tuning, open-book tuning, and discard tuning are leveraged to refine the outputs of the model, thus ensuring consistency with external and intrinsic knowledge. The PURR model refines its outputs through a process akin to conditional denoising by learning to correct faux hallucinations—intentionally corrupted text that has been used to fine-tune an LLM. The refinement happens as PURR denoises these corruptions by incorporating relevant evidence, resulting in more accurate and attributable outputs [97].
- Iterative self-correction methods refine outputs through repeated, prompt-driven revision and internal checks (a minimal loop sketch appears after this list). An adaptive framework in [247] performs defect analysis, guided optimization, and response comparison via prompt-based voting. Self-Checks [266] rephrase prompts or ask related questions to test internal consistency, while [281] uses in-context prompting to incorporate the model’s self-generated feedback for iterative correction. Self-Reflection [45] rewrites answers to improve factuality, consistency, and entailment, and [67] prompts the model to identify and adjust self-contradictions within its own text. Finally, Tree of Thoughts (ToT) [234] employs structured search to explore and evaluate intermediate reasoning branches, enabling staged self-evaluation and refinement of the reasoning path.
- Self-Regulation during Generation/Decoding, where the model re-adjusts its own output or decision-making process in real-time during generation. For instance, the Self-highlighted Hesitation method (SH2) presented in [179] refines the model’s output by iteratively recalibrating the token probabilities through hesitation and contrastive decoding, while the Hypothesis Verification Model (HVM) estimates faithfulness scores during decoding, refining the output at each step [119].
- Self-Generated Data for Improvement, where the LLM generates data or instructions which are subsequently used to fine-tune itself. For instance, the Self-Instruct framework bootstraps off the LLM’s own generations to create a diverse set of instructions for fine-tuning [159], while in WizardLM such instructions are iteratively evolved and refined, with an elimination step that discards low-quality instructions, to ensure a diverse dataset for instruction fine-tuning [123].
- Model-based techniques and tuning: Approaches in this group refine outputs by adding evaluators, rerankers, or specialized training. LaMDA uses a generate-then-rerank pipeline with discriminators that score safety and quality, selecting the top candidate [19]. Dehallucinator overwrites flagged translations by sampling Monte Carlo-dropout hypotheses, scoring them, and choosing the best translation [282]. A dual-model setup [213] pairs a generator with an evaluator that applies token-level confidence scoring and probabilistic anomaly detection; a feedback loop flags problematic spans and iteratively adjusts the output. SC2 (Structured Comparative reasoning) combines approximate inference with pairwise comparison to pick the most consistent structured representation from multiple intermediates [242]. Ref. [77] improves reliability by sampling multiple responses and aggregating them for consistency. Verbose Cloning [95] uses tailored prompts and context distillation to make answers more comprehensive, reducing overly brief or indirect outputs. A corrector model [23] iteratively upgrades a base model’s hypotheses via value-improving triplets (input, hypothesis, correction), yielding gains in math program synthesis, lexically constrained generation, and toxicity removal. Finally, an MoE architecture [212] refines outputs through expert consensus—using majority voting to filter erroneous responses and retain only agreement-backed generations.
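The iterative self-correction loop referenced above can be sketched as follows. `call_llm` is a hypothetical placeholder, and a deployed system would parse a structured critique rather than match the literal string "NO ISSUES".

```python
# Minimal sketch of an iterative self-correction loop (illustrative only).
def call_llm(prompt: str) -> str:
    """Placeholder for a model call so the sketch runs end to end."""
    return "NO ISSUES" if "List factual problems" in prompt else f"[draft for: {prompt[:30]}]"

def self_correct(question: str, max_rounds: int = 2) -> str:
    draft = call_llm(f"Answer the question: {question}")
    for _ in range(max_rounds):
        critique = call_llm(
            f"List factual problems in this answer, or reply NO ISSUES.\n"
            f"Question: {question}\nAnswer: {draft}"
        )
        if critique.strip() == "NO ISSUES":
            break  # the model judges its own draft as consistent
        draft = call_llm(
            f"Rewrite the answer to fix these problems.\n"
            f"Question: {question}\nAnswer: {draft}\nProblems: {critique}"
        )
    return draft

print(self_correct("Who wrote 'The Selfish Gene'?"))
```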
5.4.5. Response Validation
5.5. Interpretability and Diagnostic Approaches
5.5.1. Internal State Probing
5.5.2. Neuron Activation and Layer Analysis
5.5.3. Attribution-Based Diagnostics
5.6. Agent-Based Orchestration
5.6.1. Reflexive/Self-Reflective Agents
5.6.2. Modular and Multi-Agent Architectures
6. Benchmarks for Evaluating Hallucinations
- Factual Verification Benchmarks: These benchmarks focus on assessing the factual accuracy of LLM outputs by comparing them against established ground truth.
- ANAH is a bilingual dataset for fine-grained hallucination annotation in large language models, providing sentence-level annotations for hallucination type and correction [57].
- BoolQ: A question-answering dataset which focuses on yes/no questions, requiring models to understand the context before deciding [310].
- DiaHalu is introduced as the first dialogue-level hallucination evaluation benchmark for LLMs, designed to move beyond purely factual errors by spanning four multi-turn settings—knowledge-grounded, task-oriented, chit-chat, and reasoning [260].
- FACTOR (Factuality Assessment Corpus for Text and Reasoning) is a benchmark for evaluating LLM factuality with an emphasis on multi-hop reasoning and evidence retrieval [311].
- FACTSCORE [255] is a fine-grained, atomic-level metric that assesses factual precision in long-form outputs by labeling each claim as supported, unsupported, or unverifiable.
- FELM (Factuality Evaluation of Large Language Models) is a benchmark dataset for testing factuality evaluators on long-form LLM outputs across five domains—world knowledge, science/tech, math, reasoning, and writing/recommendation—by measuring their ability to detect factual errors [312].
- FEVER (Fact Extraction and Verification): FEVER is a 185,445-claim dataset that serves as a challenging benchmark—requiring multi-hop reasoning and evidence retrieval—to test models’ ability to gather relevant evidence and determine claim veracity [313].
- FEWL (Factuality Evaluation Without Labels): FEWL is a methodology for measuring and reducing hallucinations in large language models without relying on gold-standard answers [243].
- The FRANK benchmark for abstractive summarization provides fine-grained error annotations on summaries from nine systems, enabling rigorous evaluation and comparison of factuality metrics [302].
- HADES (HAllucination DEtection dataset) is a reference-free hallucination-detection dataset for QA, built by perturbing Wikipedia text and human-annotating via a model-in-the-loop process, enabling detection of hallucinations without ground-truth references [85].
- HalluEditBench is a benchmark built from verified hallucinations across multiple domains and topics; it measures editing performance across Efficacy, Generalization, Portability, Locality, and Robustness [314].
- HalluLens is a hallucination-focused benchmark that covers intrinsic and extrinsic tasks, dynamically generates test sets to curb data leakage, and aims for task-aligned detection by treating hallucinations as inconsistency with training/user input rather than absolute truth [315].
- HALOGEN (Hallucinations of Generative Models) is a multi-domain hallucination benchmark that evaluates LLMs on hallucination frequency, refusal behavior, and utility, shedding light on error types and their likely pretraining-data sources [316].
- HaluEval: HaluEval is a large-scale benchmark designed to evaluate the hallucination tendencies of large language models (LLMs). It measures how well LLMs can generate factually accurate content and identify information that is hallucinated or incorrect [224].
- HaluEval 2.0: HaluEval 2.0 is an enhanced version of the original HaluEval benchmark, containing 8770 questions from diverse domains offering wider coverage and more rigorous evaluation metrics for assessing factuality hallucinations [5].
- HaluEval-Wild: HaluEval-Wild is a benchmark designed to evaluate hallucinations within dynamic, real-world user interactions as opposed to other benchmarks that focus on controlled NLP tasks like question answering or summarization [107].
- HDMBench is a benchmark designed for hallucination detection across diverse knowledge-intensive tasks [111]. It includes span-level and sentence-level annotations, covering hallucinations grounded in both context and common knowledge.
- Head-to-Tail: Head-to-Tail delves into the nuances of factual recall by categorizing information based on popularity [203]. It consists of 18,000 question-answer pairs, with knowledge segmented by entity popularity.
- NQ (Natural Questions): NQ is a large-scale dataset of real questions asked by users on Google, paired with corresponding long-form answers from Wikipedia [317]. It tests the ability to retrieve and understand information from a large corpus.
- RAGTruth is a dataset tailored for analyzing word-level hallucinations within standard RAG frameworks for LLM applications. It comprises nearly 18,000 naturally generated responses from various LLMs that can also be used to benchmark hallucination frequencies [114].
- SelfCheckGPT is a zero-resource, black-box benchmark for hallucination detection. It assesses model consistency by sampling multiple responses and measuring their similarity, without needing external databases or model internals [267].
- TriviaQA is a question-answering dataset that contains over 650,000 question-answer-evidence triplets that were created by combining trivia questions from various web sources [318].
- TruthfulQA: TruthfulQA is a benchmark designed to assess the capability of LLMs in distinguishing between truthful and false statements, particularly those crafted to be adversarial or misleading [53].
- The UHGEval benchmark offers a large-scale, Chinese-language dataset for evaluating hallucinations under unconstrained generation settings. UHGEval captures naturally occurring hallucinations from five LLMs and applies a rigorous annotation pipeline, making it a more realistic and fine-grained resource for factuality evaluation [319].
- Domain-Specific Benchmarks: These benchmarks target specific domains, testing the model’s knowledge and reasoning abilities within those areas.
- PubMedQA: This benchmark focuses on medical question answering, evaluating the accuracy and reliability of LLMs in the medical domain [240].
- SciBench: This benchmark verifies scientific reasoning and claim consistency, assessing the ability of LLMs to understand and apply scientific principles [320].
- LegalBench: This benchmark examines legal reasoning and interpretation, evaluating the performance of LLMs on legal tasks [10].
- Code Generation Benchmarks (e.g., HumanEval, Codex): These benchmarks assess the ability of LLMs to generate correct and functional code, which requires both factual accuracy and logical reasoning [230].
Benchmark Selection Guidance
7. Practical Implications
- Keep humans in the loop with role clarity: Use Self-Reflection and external fact-checking pipelines to route low-confidence or conflicting outputs to a designated reviewer; require dual sign-off for irreversible actions when sources disagree.
7.1. Operationalization in High-Stakes Domains
- Healthcare Applications: For healthcare, a retrieve → generate → verify → abstain/revise workflow seems appropriate. The retrieval step could be restricted to reliable, up-to-date sources, with citations provided at the span level. It may be wise to constrain the model’s generation to only the retrieved information. Low confidence might trigger a human review, and final outputs should explicitly note any uncertainty. End-to-end auditability can be maintained by logging prompts, model versions, retrieved passages, and final decisions. There are trade-offs to consider, such as balancing accuracy and safety against latency and computational cost. A schematic sketch of such a gated workflow appears after this list.
- Legal Applications: For legal contexts, a workflow of scoped retrieve → structured reasoning → cite-check → redline might be preferred. Retrieval could be limited by jurisdiction and authority. Prompts might enforce a structured analysis, and a secondary checker could validate quotes against authoritative texts. Provenance and rationale should be logged to support audits. Key tensions here could include balancing citation precision against coverage and rigorous verification against speed.
- General Deployment Practices: Across these settings, you may want to locally calibrate retrieval depth, verification thresholds, and abstention policies. It could be beneficial to pilot evaluations that report not only accuracy but also a triplet of outcomes: factuality, robustness to noisy inputs, and latency. In practice, modular or agentic designs that separate the generate, verify, and refine stages can provide better control and traceability, though this comes with added complexity. These costs should be explicitly documented in a safety case.
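The gated retrieve → generate → verify → abstain/revise workflow described above for healthcare might look roughly like the following. `retrieve`, `generate`, and `verify` are hypothetical placeholders, and the 0.7 threshold is an illustrative value that would need local calibration.

```python
# Minimal schematic of a gated retrieve -> generate -> verify -> abstain pipeline.
# All three stage functions are placeholders standing in for real components.
from dataclasses import dataclass

@dataclass
class Verdict:
    answer: str
    confidence: float      # e.g., an entailment or consistency score in [0, 1]
    citations: list[str]

def retrieve(query: str) -> list[str]:
    return ["Guideline excerpt A", "Guideline excerpt B"]      # placeholder retriever

def generate(query: str, evidence: list[str]) -> str:
    return f"[answer grounded in {len(evidence)} passages]"    # placeholder generator

def verify(answer: str, evidence: list[str]) -> float:
    return 0.55                                                # placeholder verifier score

def clinical_pipeline(query: str, threshold: float = 0.7) -> Verdict:
    evidence = retrieve(query)
    answer = generate(query, evidence)
    score = verify(answer, evidence)
    if score < threshold:
        # Abstain and route to a human reviewer instead of returning a risky answer.
        answer = "Escalated for human review: verification confidence too low."
    return Verdict(answer=answer, confidence=score, citations=evidence)

print(clinical_pipeline("Is drug X contraindicated with warfarin?"))
```

Keeping the verify-and-abstain gate as a separate stage also makes the logging and audit requirements noted above easier to satisfy, since each stage's inputs and outputs can be recorded independently.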
7.2. Does RAG Help or Harm?
7.3. Computational Trade-Offs
8. Extended Discussion
9. Challenges
- Lack of Standardized Benchmarks and Metrics: The evaluation of hallucination mitigation is fragmented, using ad hoc datasets and inconsistent metrics. This makes it difficult to compare different methods. Creating shared benchmarks that include various hallucination types and languages is necessary for establishing reliable baselines and systematic evaluation.
- Interpretability and Attribution Difficulties: It is challenging to understand how transformer-based models create hallucinations due to their complexity. This makes it hard to pinpoint the cause of errors, especially in multi-method systems. Improved interpretability tools and causal analysis are crucial for building trust in these applications.
- Robustness Under Distribution Shifts and Adversarial Inputs: Large Language Models (LLMs) are often vulnerable to hallucinations when they encounter data or prompts that differ from their training data. Ensuring the models are resilient under these conditions requires improved uncertainty estimation and new training methods.
- Computational Trade-offs and Latency Constraints: Many mitigation strategies, such as retrieval-augmented generation, add significant computational overhead, creating a conflict between accuracy and speed. Future research must focus on efficient mitigation techniques that are both reliable and practical for real-world use.
- Knowledge Limitations and Updating: Hallucinations often happen because models use outdated or incomplete training data. While retrieval-based methods help, they can still be affected by errors in external sources. More robust strategies for continuously updating knowledge and noise-aware retrieval are needed.
- Ethical and Epistemological Concerns: Hallucination mitigation is also an ethical issue. Researchers must navigate the line between factual reliability and creative generation, especially in sensitive fields like healthcare and law. This requires both technical safeguards and governance frameworks.
10. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Glossary
| A | |
| ActLCD | Active Layer-Contrastive Decoding—Decoding that contrasts intermediate layers to steer token selection toward more factual continuations. |
| Activation Decoding | Constrained decoding that adjusts next-token probabilities using activation/uncertainty signals to suppress hallucinations. |
| AFHN | Adversarial Feature Hallucination Networks—Adversarial training to produce features and examples that stress models and reduce hallucinations. |
| AggTruth | Attention aggregation across heads/layers to flag unsupported spans and improve factual consistency checks. |
| ALIGNed-LLM | Aligns external knowledge (e.g., KG/entity embeddings) with the model’s representation space to ground generations. |
| ALTI+ | Attribution method that quantifies how much each input token contributes to generated tokens for interpretability/factuality analysis. |
| ANAH | Bilingual hallucination dataset with sentence-level annotations and suggested corrections. |
| ATF | Adaptive Token Fusion—Merges redundant/similar tokens early to retain meaning while reducing noise and hallucination risk. |
| AutoHall | Automatic pipeline to synthesize, detect, and evaluate hallucinations for training and benchmarking. |
| AutoRAG-LoRA | Lightweight LoRA adaptation to better couple retrieval and generation in RAG systems. |
| B | |
| BERTScore | Semantic similarity metric using contextual embeddings to evaluate generated text. |
| BLEU | N-gram overlap metric; useful for surface similarity but not a direct measure of factuality. |
| BoolQ | Yes/no question-answering dataset often used in factuality experiments. |
| C | |
| CCL-XCoT | Cross-lingual transfer of Chain of Thought traces to improve reasoning and reduce hallucinations across languages. |
| CD | Contrastive Decoding—Penalizes tokens favored by a weaker/contrast model to filter implausible continuations. |
| CoK | Chain of Knowledge—Grounds reasoning by explicitly incorporating external knowledge into intermediate steps. |
| COMET-QE | Reference-free MT quality estimation used as a proxy signal for consistency. |
| Confident Decoding | Incorporates uncertainty estimates into beam/nucleus procedures to favor low-uncertainty continuations. |
| CoNLI | Chain of NLI—Cascaded entailment checks over partial outputs to prune unsupported content. |
| CoT | Chain of Thought—Prompting that elicits step-by-step reasoning before the final answer. |
| CoVe | Chain of Verification—Prompting scheme in which the model drafts an answer, plans verification questions, answers them independently, and revises the final response. |
| CPM | Conditional Entropy Mechanism—Uses token-level entropy to detect and avoid uncertain/hallucination-prone outputs. |
| CPMI | Conditional Pointwise Mutual Information—Decoding re-scoring that rewards tokens better supported by the source/context. |
| CPO | Contrastive Preference Optimization—Preference optimization that uses contrastive signals to align outputs with faithful behavior. |
| CRAG | Corrective Retrieval-Augmented Generation—Adds corrective/revision steps atop RAG to fix unsupported claims. |
| CRITIC | A verify-and-edit framework where a “critic” process checks claims against evidence and proposes fixes. |
| Critic-driven Decoding | Decoding guided by a trained critic/verifier that down-weights unsupported next tokens. |
| D | |
| D&Q | Decompose-and-Query—Decomposes a question into sub-questions and retrieves evidence for each before answering. |
| DeCoRe | Decoding by Contrasting Retrieval Heads—Contrasts retrieval-conditioned signals to suppress ungrounded tokens. |
| Dehallucinator | Detect-then-rewrite approach that edits hallucinated spans into grounded alternatives. |
| Delta | Compares outputs under masked vs. full context to detect and penalize hallucination-prone continuations. |
| DiaHalu | Dialogue-level hallucination benchmark covering multiple multi-turn domains. |
| DoLa | Decoding by Contrasting Layers—Uses differences between early vs. late layer logits to promote factual signals. |
| DPO | Direct Preference Optimization—RL-free preference tuning that directly optimizes for chosen responses. |
| DRAD | Retrieval-augmented framework pairing Real-time Hallucination Detection (RHD) with Self-correction based on External Knowledge (SEK) to detect hallucinations and regenerate the affected spans during generation. |
| DreamCatcher | Detects and corrects hallucinations by cross-checking outputs against external evidence/tools. |
| DrHall | Lightweight, fast hallucination detection targeted at real-time scenarios. |
| E | |
| EigenScore | Uncertainty/factuality signal derived from the spectrum of hidden-state representations. |
| EntailR | Entailment-based verifier used to check whether generated claims follow from retrieved evidence. |
| EVER | Evidence-based verification/rectification that validates claims and proposes fixes during/after generation. |
| F | |
| F2 | Faithful Finetuning—Direct finetuning objective to increase faithfulness of generations. |
| FacTool | Tool-augmented factuality checking that extracts claims and verifies them against sources. |
| FactPEGASUS | Summarization variant emphasizing factual consistency with the source document. |
| FactRAG | RAG design focused on retrieving and citing evidence that supports each claim. |
| FACTOR | Benchmark emphasizing multi-hop factuality and evidence aggregation. |
| FAVA | Corrupt-and-denoise training pipeline to teach models to correct fabricated content. |
| FELM | Benchmark for evaluating factuality evaluators on long-form outputs. |
| FEVER | Large-scale fact verification dataset (Supported/Refuted/Not Enough Info). |
| FG-PRM | Fine-Grained Process Reward Model—Process-level reward modeling for stepwise supervision of reasoning. |
| FRANK | Fine-grained factual error taxonomy and benchmark for summarization. |
| FreshLLMs | Uses live retrieval/search refresh to reduce outdated or stale knowledge. |
| FactScore | Atomic, claim-level factuality scoring/benchmark for long-form text. |
| G | |
| GAN | Generative Adversarial Network—Adversarial training framework used to stress and correct model behaviors. |
| GAT | Graph Attention Network—Graph neural network with attention; used to propagate grounded evidence. |
| GNN | Graph Neural Network—Neural architectures over graphs for structured reasoning/grounding. |
| GoT | Graph-of-Thoughts—Represents reasoning as a graph of states/operations to explore multiple paths. |
| Grad-CAM | Gradient-based localization on intermediate features for interpretability of decisions. |
| Gradient × Input | Simple attribution method multiplying gradients by inputs to estimate token importance. |
| Graph-RAG | RAG that leverages knowledge graphs/graph structure for retrieval and grounding. |
| G-Retriever | Graph-aware retriever designed to recall evidence that reduces hallucinations. |
| H | |
| HADES | Reference-free hallucination detection dataset for QA. |
| HALO | Estimation and reduction framework for hallucinations in open source LLMs. |
| HALOGEN | Multi-domain hallucination benchmark evaluating hallucination frequency, refusal behavior, and utility, with analysis of error types and likely pretraining-data sources. |
| HalluciNot | Retrieval-assisted span verification to detect and mitigate hallucinations. |
| HaluBench | Benchmark suite for evaluating hallucinations across tasks or RAG settings. |
| HaluEval | Large-scale hallucination evaluation benchmark. |
| HaluEval-Wild | Benchmark for evaluating hallucinations on challenging real-world user queries (“in the wild”) rather than controlled NLP tasks. |
| HaluSearch | Retrieval-in-the-loop detection/mitigation pipeline that searches evidence while generating. |
| HAR | Hallucination Augmented Recitations—Produces recitations/snippets that anchor generation to evidence. |
| HDM-2 | Hallucination Detection Method 2—Modular multi-detector system targeting specific hallucination types. |
| HERMAN | Checks entities/quantities in outputs against source to avoid numerical/entity errors. |
| HILL | Human-factors-oriented hallucination identification framework/benchmark. |
| HIPO | Hard-sample-aware iterative preference optimization to improve robustness. |
| HMC | Hidden Markov Chains—Sequential state models used to analyze latent dynamics associated with hallucinations. |
| HSP | Hierarchical Semantic Piece—Hierarchical text segmentation/representation to stabilize retrieval and grounding. |
| HybridRAG | Combines multiple retrieval sources/strategies (e.g., dense + sparse + KG) for stronger grounding. |
| HumanEval | Code generation benchmark often used in hallucination-sensitive program synthesis. |
| HVM | Hypothesis Verification Model—Classifier/verifier that filters candidates by textual entailment with evidence. |
| I | |
| ICD | Induce-then-Contrast Decoding—Induces errors with a weaker model and contrasts to discourage hallucinated tokens. |
| INSIDE | Internal-state-based uncertainty estimation with interventions to reduce overconfidence. |
| Input Erasure | Attribution by removing/ablating input spans to see their effect on outputs. |
| InterrogateLLM | Detects hallucinations via inconsistency across multiple answers/contexts. |
| Iter-AHMCL | Iterative decoding with hallucination-aware contrastive learning to refine outputs. |
| ITI | Inference-Time Intervention—Nudges specific heads/activations along truth-aligned directions during decoding. |
| J | |
| Joint Entity and Summary Generation | Summarization that jointly predicts entities and the abstract to reduce unsupported content. |
| K | |
| KB | Knowledge Base—External repository of facts used for grounding/verification. |
| KCA | Knowledge-Consistent Alignment—Aligns model outputs with retrieved knowledge via structured prompting/objectives. |
| KG | Knowledge Graph—Graph-structured facts used for retrieval, verification, and attribution. |
| KGR | Knowledge Graph Retrofitting—Injects/retrofits KG-verified facts into outputs or intermediate representations. |
| KL-divergence | Divergence measure used in calibration/regularization and to compare layer distributions. |
| Knowledge Overshadowing | When parametric priors dominate over context, causing the model to ignore given evidence. |
| L | |
| LaBSE | Multilingual sentence encoder used for cross-lingual matching/verification. |
| LASER | Language-agnostic sentence embeddings for multilingual retrieval/entailment. |
| LAT | Linear Artificial Tomography—Linear probes/edits to reveal and steer latent concept directions. |
| LayerSkip | Self-speculative decoding with early exits/verification by later layers. |
| LID | Local Intrinsic Dimension—Dimensionality measure of hidden states linked to uncertainty/truthfulness. |
| LinkQ | Forces explicit knowledge-graph queries to ground answers. |
| LLM Factoscope | Probing/visualization of hidden-state clusters to distinguish factual vs. fabricated content. |
| LLM-AUGMENTER | Orchestrates retrieval/tools around an LLM to improve grounding and reduce errors. |
| Logit Lens | Projects intermediate residual streams to the vocabulary space to inspect token preferences. |
| Lookback Lens | Attention-only method that checks whether outputs attend to relevant context. |
| LoRA | Low-rank adapters for efficient finetuning, commonly used in factuality/hallucination pipelines. |
| LQC | Lightweight Query Checkpoint—Predicts when a query needs verification or retrieval before answering. |
| LRP | Layer-wise Relevance Propagation—Decomposes predictions to attribute token-level contributions. |
| M | |
| MARL | Multi-Agent Reinforcement Learning—Multiple agents coordinate/critique each other to improve reliability. |
| MC | Monte Carlo—Stochastic sampling used for uncertainty estimation and search. |
| MCTS | Monte Carlo Tree Search—Guided tree exploration used in deliberate, plan-and-verify reasoning. |
| METEOR | MT metric leveraging synonymy/stemming; not a direct factuality measure. |
| mFACT | Decoding-integrated factuality signal to prune low-faithfulness candidates. |
| MixCL | Mixed contrastive learning (with hard negatives) to reduce dialog hallucinations. |
| MoCo | Momentum contrast representation learning used to build stronger encoders. |
| MoE | Mixture-of-Experts—Sparse expert routing to localize knowledge and reduce interference. |
| N | |
| NEER | Neural evidence-based evaluation/repair methods that use entailment or retrieved evidence to improve outputs. |
| Neural Path Hunter | Analyzes reasoning paths/graphs to locate error-prone segments for correction. |
| Neural-retrieval-in-the-loop | Integrates a trainable retriever during inference to stabilize grounding. |
| NL-ITI | Nonlinear version of ITI with richer probes and multi-token interventions. |
| NLU | Natural Language Understanding—Models/components (e.g., NLI, QA) used as verifiers or critics. |
| Nucleus Sampling | Top-p decoding that samples from the smallest set whose cumulative probability exceeds p (see the sketch after this glossary). |
| O | |
| OVON | Open-Vocabulary Object Navigation; task setting where language directs navigation to open-set objects, used in agent/LLM evaluations. |
| P | |
| PCA | Principal Component Analysis—Projects activations to principal subspaces to analyze truth/lie separability. |
| PGFES | Psychology-guided two-stage editing and sampling along “truthfulness” directions in latent space. |
| Persona drift | When a model’s stated persona/stance shifts across sessions or contexts. |
| PoLLMgraph | Probabilistic/graph model over latent states to track hallucination dynamics. |
| PMI | Pointwise Mutual Information—Signal for overshadowing/low-confidence conditions during decoding. |
| Principle Engraving | Representation-editing to imprint desired principles into activations. |
| Principle-Driven Self-Alignment | Self-alignment method that derives rules/principles and tunes behavior accordingly. |
| ProbTree | Probabilistic Tree-of-Thought—ToT reasoning with probabilistic selection/evaluation of branches. |
| PURR | Trains on corrupted vs. corrected claims to produce a compact, factuality-aware model. |
| Q | |
| Q2 | Factual consistency measure comparing outputs to retrieved references. |
| R | |
| R-Tuning | Tuning models to abstain or say “I don’t know” when unsure. |
| RAG | Retrieval-Augmented Generation—Augments generation with document retrieval for grounding. |
| RAG-KG-IL | RAG integrated with knowledge-graph and incremental-learning components. |
| RAG-Turn | Turn-aware retrieval for multi-turn tasks. |
| RAGTruth | Human-annotated data for evaluating/teaching RAG factuality. |
| RAP | Reasoning viA Planning—Planning-style reasoning that structures problem solving before answering. |
| RARR | Retrieve-and-Revise pipeline that edits outputs to add citations and fix unsupported claims. |
| RBG | Read-Before-Generate—Reads/retrieves first, then conditions generation on the evidence. |
| REPLUG | Prepends retrieved text and averages probabilities across retrieval passes to ground decoding. |
| RepE | Representation Engineering—Editing/steering latent directions to improve honesty/faithfulness. |
| RefChecker | Reference-based fine-grained hallucination checker and diagnostic benchmark. |
| Reflexion | Self-critique loop where the model reflects on errors and retries. |
| RID | Retrieval-In-Decoder—Retrieval integrated directly into the decoder loop. |
| RHO | Reranks candidates by factual consistency with retrieved knowledge or graph evidence. |
| RHD | Real-time Hallucination Detection—Online detection and optional self-correction during generation. |
| RLCD | Reinforcement Learning with Contrastive Decoding—RL variant that pairs contrastive objectives with decoding. |
| RLHF | Reinforcement Learning from Human Feedback—Uses human preference signals to align model behavior. |
| RLAIF | Reinforcement Learning from AI Feedback—Uses AI-generated preference signals to scale alignment. |
| RLKF | Reinforcement-Learning-based Knowledge Filtering that favors context-grounded generation. |
| ROUGE | Overlap-based summarization metric (e.g., ROUGE-L). |
| RaLFiT | Reinforcement-learning-style fine-tuning aimed at improving truthfulness/factuality. |
| S | |
| SC2 | Structured Comparative Reasoning—Compares structured alternatives and selects the most consistent one. |
| SCOTT | Self-Consistent Chain-of-Thought Distillation—Samples multiple CoTs and distills the consistent answer. |
| SCD | Self-Contrastive Decoding—Penalizes over-represented priors to counter knowledge overshadowing. |
| SEA | Spectral Editing of Activations—Projects activations along truth-aligned directions while suppressing misleading ones. |
| SEAL | Selective Abstention Learning—Teaches models to abstain (e.g., emit a reject token) when uncertain. |
| SEBRAG | Structured Evidence-Based RAG—RAG variant that structures evidence and grounding steps. |
| SEK | Evidence selection/structuring module used to verify or revise outputs. |
| SEPs | Semantic Entropy Probes—Fast probes that estimate uncertainty from hidden states. |
| Self-Checker | Pipeline that extracts and verifies claims using tools or retrieval. |
| Self-Checks | Generic self-verification passes (consistency checks, regeneration, or critique). |
| Self-Consistency | Samples multiple reasoning paths and selects the majority-consistent result (see the sketch after this glossary). |
| Self-Familiarity | Calibrates outputs based on what the model “knows it knows” vs. uncertain areas. |
| Self-Refine | Iterative refine-and-feedback loop where the model improves its own draft. |
| Self-Reflection | The model reflects on its reasoning and revises responses accordingly. |
| SELF-RAG | Self-reflective RAG where a critic guides retrieval and edits drafts. |
| SelfCheckGPT | Consistency-based hallucination detector using multiple sampled outputs. |
| SH2 | Self-Highlighted Hesitation—Injects hesitation/abstention mechanisms at uncertain steps. |
| SimCLR | Contrastive representation learning framework used to build stronger encoders. |
| SimCTG | Contrastive text generation that constrains decoding to avoid degenerate outputs. |
| Socratic Prompting | Uses guided questions to elicit intermediate reasoning and evidence. |
| SVD | Singular Value Decomposition—Matrix factorization used to analyze or edit latent directions. |
| T | |
| ToT | Tree-of-Thought—Branch-and-evaluate reasoning over a tree of intermediate states. |
| TOPICPREFIX | Prompt/prefix-tuning that encodes topics to stabilize context adherence. |
| TrueTeacher | Teacher-style training that builds a factual evaluator and uses it to guide student outputs. |
| Truth Forest | Learns orthogonal “truth” representations and intervenes along those directions. |
| TruthfulQA | Benchmark evaluating resistance to common falsehoods. |
| TruthX | Latent editing method that nudges activations toward truthful directions. |
| Tuned Lens | Learns linear mappings from hidden states to logits to study/steer layer-wise predictions. |
| TWEAK | Think While Effectively Articulating Knowledge—Hypothesis-and-NLI-guided reranking that prefers supported continuations. |
| U | |
| UHGEval | Hallucination evaluation benchmark for unconstrained generation in Chinese and related settings. |
| UPRISE | Uses LLM signals to train a retriever that selects stronger prompts/evidence. |
| V | |
| Verbose Cloning | Prompting/aggregation technique that elicits explicit, fully specified answers to reduce ambiguity. |
| X | |
| XCoT | Cross-lingual Chain-of-Thought prompting/transfer. |
| XNLI | Cross-lingual NLI benchmark commonly used for entailment-based verification. |
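To make a few of the more algorithmic glossary entries concrete, the short sketches below illustrate them under clearly stated assumptions; they are illustrative toy examples, not the implementations used in the cited works. First, Gradient × Input attribution over a toy embedding-plus-linear scorer (the model, token ids, and shapes are placeholders):

```python
import torch

# Toy scorer: token embeddings -> mean pool -> linear score (stand-in for an LLM head).
torch.manual_seed(0)
embedding = torch.nn.Embedding(num_embeddings=50, embedding_dim=8)
score_head = torch.nn.Linear(8, 1)

token_ids = torch.tensor([3, 17, 42, 5])                      # hypothetical input tokens
embedded = embedding(token_ids).detach().requires_grad_(True)  # leaf tensor we attribute to
score = score_head(embedded.mean(dim=0)).sum()                 # scalar output to explain
score.backward()

# Gradient x Input: elementwise product of the gradient with the input embeddings,
# summed over the embedding dimension, gives one importance score per token.
token_attribution = (embedded.grad * embedded).sum(dim=-1)
print(token_attribution.tolist())
```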
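Next, nucleus (top-p) sampling as described in the glossary: keep the smallest set of highest-probability tokens whose cumulative mass reaches p, renormalize, and sample from that set. The vocabulary size and logits here are arbitrary placeholders.

```python
import torch

def nucleus_sample(logits: torch.Tensor, p: float = 0.9) -> int:
    """Sample one token id from the smallest top-probability set whose mass reaches p."""
    probs = torch.softmax(logits, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Count tokens whose cumulative mass is still below p, then include the next one.
    cutoff = int((cumulative < p).sum().item()) + 1
    kept = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize the nucleus
    choice = torch.multinomial(kept, num_samples=1)
    return int(sorted_ids[choice].item())

# Usage with random logits over a 10-token toy vocabulary:
print(nucleus_sample(torch.randn(10), p=0.9))
```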
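Finally, the self-consistency idea (sample several reasoning paths and keep the majority answer), sketched around a hypothetical `sample_answer` callable that stands in for one chain-of-thought call to an LLM:

```python
import random
from collections import Counter

def self_consistency(sample_answer, question: str, n_samples: int = 10) -> str:
    """Majority vote over final answers from n independently sampled reasoning paths."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer

# Usage with a stubbed sampler in place of an LLM call (answers are made up):
stub = lambda question: random.choice(["42", "42", "41"])
print(self_consistency(stub, "What is 6 * 7?", n_samples=11))
```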
Appendix A. Hallucination Mitigation Subcategories Comparison Table

Appendix B. Summary Table of Benchmarks Used in Hallucination Detection and Mitigation

References
- Tonmoy, S.M.T.I.; Zaman, S.M.M.; Jain, V.; Rani, A.; Rawte, V.; Chadha, A.; Das, A. A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models. January 2024. Available online: http://arxiv.org/abs/2401.01313 (accessed on 12 August 2025).
- Rawte, V.; Sheth, A.; Das, A. A Survey of Hallucination in Large Foundation Models. September 2023. Available online: http://arxiv.org/abs/2309.05922 (accessed on 12 August 2025).
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. ACM Trans. Inf. Syst. 2024, 43, 1–55. [Google Scholar] [CrossRef]
- Agrawal, A.; Suzgun, M.; Mackey, L.; Kalai, A.T. Do Language Models Know When They’re Hallucinating References? May 2023. Available online: http://arxiv.org/abs/2305.18248 (accessed on 12 August 2025).
- Li, J.; Chen, J.; Ren, R.; Cheng, X.; Zhao, W.X.; Nie, J.; Wen, J. The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models. January 2024. Available online: http://arxiv.org/abs/2401.03205 (accessed on 12 August 2025).
- Garcia-Carmona, A.M.; Prieto, M.-L.; Puertas, E.; Beunza, J.-J. Enhanced Medical Data Extraction: Leveraging LLMs for Accurate Retrieval of Patient Information from Medical Reports. JMIR AI. November 2024. Available online: https://www.researchgate.net/publication/382224134_Enhanced_Medical_Data_Extraction_Leveraging_LLMs_for_Accurate_Retrieval_of_Patient_Information_from_Medical_Reports (accessed on 12 August 2025).
- Kim, Y.; Jeong, H.; Chen, S.; Li, S.S.; Lu, M.; Alhamoud, K.; Mun, J.; Grau, C.; Jung, M.; Gameiro, R.; et al. Medical Hallucinations in Foundation Models and Their Impact on Healthcare. February 2025. Available online: http://arxiv.org/abs/2503.05777 (accessed on 12 August 2025).
- Magesh, V.; Surani, F.; Dahl, M.; Suzgun, M.; Manning, C.D.; Ho, D.E. Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools. May 2024. Available online: http://arxiv.org/abs/2405.20362 (accessed on 12 August 2025).
- Dahl, M.; Magesh, V.; Suzgun, M.; Ho, D.E. Large Legal Fictions: Profiling Legal Hallucinations in Large Language Models. J. Leg. Anal. 2024, 16, 64–93. [Google Scholar] [CrossRef]
- Guha, N.; Nyarko, J.; Ho, D.E.; Ré, C.; Chilton, A.; Narayana, A.; Chohlas-Wood, A.; Peters, A.; Waldon, B.; Rockmore, D.N.; et al. LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models. August 2023. Available online: http://arxiv.org/abs/2308.11462 (accessed on 12 August 2025).
- Shrivastava, A.; Hullman, J.; Lamparth, M. Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations. October 2024. Available online: http://arxiv.org/abs/2410.13204 (accessed on 12 August 2025).
- Kalai, A.T.; Vempala, S.S. Calibrated Language Models Must Hallucinate. November 2023. Available online: http://arxiv.org/abs/2311.14648 (accessed on 12 August 2025).
- Xu, Z.; Jain, S.; Kankanhalli, M. Hallucination is Inevitable: An Innate Limitation of Large Language Models. January 2024. Available online: http://arxiv.org/abs/2401.11817 (accessed on 12 August 2025).
- Banerjee, S.; Agarwal, A.; Singla, S. LLMs Will Always Hallucinate, and We Need to Live with This. 2024. Available online: https://arxiv.org/abs/2409.05746 (accessed on 12 August 2025).
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training Language Models to Follow Instructions with Human Feedback. March 2022. Available online: http://arxiv.org/abs/2203.02155 (accessed on 12 August 2025).
- Sun, W.; Shi, Z.; Gao, S.; Ren, P.; de Rijke, M.; Ren, Z. Contrastive Learning Reduces Hallucination in Conversations. December 2022. Available online: http://arxiv.org/abs/2212.10400 (accessed on 12 August 2025).
- Li, X.L.; Holtzman, A.; Fried, D.; Liang, P.; Eisner, J.; Hashimoto, T.; Zettlemoyer, L.; Lewis, M. Contrastive Decoding: Open-ended Text Generation as Optimization. October 2022. Available online: http://arxiv.org/abs/2210.15097 (accessed on 12 August 2025).
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. May 2020. Available online: http://arxiv.org/abs/2005.11401 (accessed on 12 August 2025).
- Thoppilan, R.; Freitas, D.D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. LaMDA: Language Models for Dialog Applications. January 2022. Available online: http://arxiv.org/abs/2201.08239 (accessed on 12 August 2025).
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. January 2022. Available online: http://arxiv.org/abs/2201.11903 (accessed on 12 August 2025).
- Wang, X.; Wei, J.; Schuurmans, D.; Le, Q.; Chi, E.H.; Narang, S.; Chowdhery, A.; Zhou, D. Self-Consistency Improves Chain of Thought Reasoning in Language Models. May 2023. Available online: https://arxiv.org/abs/2203.11171 (accessed on 12 August 2025).
- Li, M.; Peng, B.; Galley, M.; Gao, J.; Zhang, Z. Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models. May 2023. Available online: http://arxiv.org/abs/2305.14623 (accessed on 12 August 2025).
- Welleck, S.; Lu, X.; West, P.; Brahman, F.; Shen, T.; Khashabi, D.; Choi, Y. Generating Sequences by Learning to Self-Correct. October 2022. Available online: http://arxiv.org/abs/2211.00053 (accessed on 12 August 2025).
- Leiser, F.; Eckhardt, S.; Knaeble, M.; Maedche, A.; Schwabe, G.; Sunyaev, A. From ChatGPT to FactGPT: A Participatory Design Study to Mitigate the Effects of Large Language Model Hallucinations on Users. In Proceedings of the ACM International Conference Proceeding Series, New York, NY, USA, 26 September 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 81–90. [Google Scholar] [CrossRef]
- Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024, 630, 625–630. [Google Scholar] [CrossRef]
- Chen, C.; Liu, K.; Chen, Z.; Gu, Y.; Wu, Y.; Tao, M.; Fu, Z.; Ye, J. INSIDE: LLMs’ Internal States Retain the Power of Hallucination Detection. February 2024. Available online: http://arxiv.org/abs/2402.03744 (accessed on 12 August 2025).
- Bilal, A.; Mohsin, M.A.; Umer, M.; Bangash, M.A.K.; Jamshed, M.A. Meta-Thinking in LLMs via Multi-Agent Reinforcement Learning: A Survey. April 2025. Available online: http://arxiv.org/abs/2504.14520 (accessed on 12 August 2025).
- Shinn, N.; Cassano, F.; Berman, E.; Gopinath, A.; Narasimhan, K.; Yao, S. Reflexion: Language Agents with Verbal Reinforcement Learning. March 2023. Available online: http://arxiv.org/abs/2303.11366 (accessed on 12 August 2025).
- Xu, Z. Context-Aware Decoding Reduces Hallucination in Query-Focused Summarization. December 2023. Available online: http://arxiv.org/abs/2312.14335 (accessed on 12 August 2025).
- Slobodkin, A.; Goldman, O.; Caciularu, A.; Dagan, I.; Ravfogel, S. The Curious Case of Hallucinatory (Un)Answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models. October 2023. Available online: http://arxiv.org/abs/2310.11877 (accessed on 12 August 2025).
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.; Chen, D.; Dai, W.; et al. Survey of Hallucination in Natural Language Generation. July 2024. Available online: https://arxiv.org/pdf/2202.03629 (accessed on 12 August 2025).
- Berberette, E.; Hutchins, J.; Sadovnik, A. Redefining ‘Hallucination’ in LLMs: Towards a Psychology-Informed Framework for Mitigating Misinformation. February 2024. Available online: http://arxiv.org/abs/2402.01769 (accessed on 12 August 2025).
- van Deemter, K. The Pitfalls of Defining Hallucination. January 2024. Available online: http://arxiv.org/abs/2401.07897 (accessed on 12 August 2025).
- Lee, K.; Ippolito, D.; Nystrom, A.; Zhang, C.; Eck, D.; Callison-Burch, C.; Carlini, N. Deduplicating Training Data Makes Language Models Better. July 2021. Available online: http://arxiv.org/abs/2107.06499 (accessed on 12 August 2025).
- Carlini, N.; Ippolito, D.; Jagielski, M.; Lee, K.; Tramer, F.; Zhang, C. Quantifying Memorization Across Neural Language Models. February 2022. Available online: http://arxiv.org/abs/2202.07646 (accessed on 12 August 2025).
- McKenna, N.; Li, T.; Cheng, L.; Hosseini, M.J.; Johnson, M.; Steedman, M. Sources of Hallucination by Large Language Models on Inference Tasks. May 2023. Available online: http://arxiv.org/abs/2305.14552 (accessed on 12 August 2025).
- Lin, Z.; Guan, S.; Zhang, W.; Zhang, H.; Li, Y.; Zhang, H. Towards Trustworthy LLMs: A Review on Debiasing and Dehallucinating in Large Language Models. Artif. Intell. Rev. 2024, 57, 243. [Google Scholar] [CrossRef]
- Hoffmann, J.; Borgeaud, S.; Mensch, A.; Buchatskaya, E.; Cai, T.; Rutherford, E.; Casas, D.d.L.; Hendricks, L.A.; Welbl, J.; Clark, A.; et al. Training Compute-Optimal Large Language Models. March 2022. Available online: http://arxiv.org/abs/2203.15556 (accessed on 12 August 2025).
- Dziri, N.; Milton, S.; Yu, M.; Zaiane, O.; Reddy, S. On the Origin of Hallucinations in Conversational Models: Is it the Datasets or the Models? April 2022. Available online: http://arxiv.org/abs/2204.07931 (accessed on 12 August 2025).
- Li, J.; Consul, S.; Zhou, E.; Wong, J.; Farooqui, N.; Ye, Y.; Manohar, N.; Wei, Z.; Wu, T.; Echols, B.; et al. Banishing LLM Hallucinations Requires Rethinking Generalization. June 2024. Available online: http://arxiv.org/abs/2406.17642 (accessed on 12 August 2025).
- Yao, J.-Y.; Ning, K.-P.; Liu, Z.-H.; Ning, M.-N.; Liu, Y.-Y.; Yuan, L. LLM Lies: Hallucinations Are Not Bugs, but Features as Adversarial Examples. October 2023. Available online: http://arxiv.org/abs/2310.01469 (accessed on 12 August 2025).
- Köksal, A.; Aksitov, R.; Chang, C.-C. Hallucination Augmented Recitations for Language Models. November 2023. Available online: http://arxiv.org/abs/2311.07424 (accessed on 12 August 2025).
- He, Z.; Zhang, B.; Cheng, L. Shakespearean Sparks: The Dance of Hallucination and Creativity in LLMs’ Decoding Layers. March 2025. Available online: http://arxiv.org/abs/2503.02851 (accessed on 12 August 2025).
- Gundogmusler, A.; Bayindiroglu, F.; Karakucukoglu, M. Mathematical Foundations of Hallucination in Transformer-Based Large Language Models for Improvisation. TechRxiv 2024. [Google Scholar] [CrossRef] [PubMed]
- Ji, Z.; Yu, T.; Xu, Y.; Lee, N.; Ishii, E.; Fung, P. Towards Mitigating Hallucination in Large Language Models via Self-Reflection. October 2023. Available online: http://arxiv.org/abs/2310.06271 (accessed on 12 August 2025).
- McIntosh, T.R.; Liu, T.; Susnjak, T.; Watters, P.; Ng, A.; Halgamuge, M.N. A Culturally Sensitive Test to Evaluate Nuanced GPT Hallucination. IEEE Access 2024, 12, 51555–51572. [Google Scholar] [CrossRef]
- Shah, S.V. Accuracy, Consistency, and Hallucination of Large Language Models When Analyzing Unstructured Clinical Notes in Electronic Medical Records. JAMA Netw. Open 2024, 7, e2425953. [Google Scholar] [CrossRef]
- Maleki, N.; Padmanabhan, B.; Dutta, K. AI Hallucinations: A Misnomer Worth Clarifying. January 2024. Available online: http://arxiv.org/abs/2401.06796 (accessed on 12 August 2025).
- Yin, Z. A review of methods for alleviating hallucination issues in large language models. Appl. Comput. Eng. 2024, 76, 258–266. [Google Scholar] [CrossRef]
- Ye, H.; Liu, T.; Zhang, A.; Hua, W.; Jia, W. Cognitive Mirage: A Review of Hallucinations in Large Language Models. September 2023. Available online: http://arxiv.org/abs/2309.06794 (accessed on 12 August 2025).
- Zhang, W.; Zhang, J. Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics 2025, 13, 856. [Google Scholar] [CrossRef]
- Perković, G.; Drobnjak, A.; Botički, I. Hallucinations in LLMs: Understanding and Addressing Challenges. In Proceedings of the 2024 47th ICT and Electronics Convention, MIPRO 2024-Proceedings, Opatija, Croatia, 20–24 May 2024; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2024; pp. 2084–2088. [Google Scholar] [CrossRef]
- Lin, S.; Hilton, J.; Evans, O. TruthfulQA: Measuring How Models Mimic Human Falsehoods. Long Papers. May 2022. Available online: https://arxiv.org/abs/2109.07958 (accessed on 12 August 2025).
- Zhang, Y.; Li, S.; Liu, J.; Yu, P.; Fung, Y.R.; Li, J.; Li, M.; Ji, H. Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models. July 2024. Available online: http://arxiv.org/abs/2407.08039 (accessed on 12 August 2025).
- Su, W.; Tang, Y.; Ai, Q.; Wang, C.; Wu, Z.; Liu, Y. Mitigating Entity-Level Hallucination in Large Language Models. In Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, Washington DC, USA, 14–18 December 2024; ACM: New York, NY, USA, 2024; pp. 23–31. [Google Scholar] [CrossRef]
- Zhang, Y.; Li, Y.; Cui, L.; Cai, D.; Liu, L.; Fu, T.; Huang, X.; Zhao, E.; Zhang, Y.; Chen, Y.; et al. Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models. September 2023. Available online: http://arxiv.org/abs/2309.01219 (accessed on 12 August 2025).
- Ji, Z.; Gu, Y.; Zhang, W.; Lyu, C.; Lin, D.; Chen, K. ANAH: Analytical Annotation of Hallucinations in Large Language Models. May 2024. Available online: http://arxiv.org/abs/2405.20315 (accessed on 12 August 2025).
- Maynez, J.; Narayan, S.; Bohnet, B.; McDonald, R. On Faithfulness and Factuality in Abstractive Summarization. May 2020. Available online: http://arxiv.org/abs/2005.00661 (accessed on 12 August 2025).
- Rawte, V.; Chakraborty, S.; Pathak, A.; Sarkar, A.; Tonmoy, S.M.T.I.; Chadha, A.; Sheth, A.P.; Das, A. The Troubling Emergence of Hallucination in Large Language Models—An Extensive Definition, Quantification, and Prescriptive Remediations. October 2023. Available online: http://arxiv.org/abs/2310.04988 (accessed on 12 August 2025).
- Nan, F.; Nallapati, R.; Wang, Z.; dos Santos, C.N.; Zhu, H.; Zhang, D.; McKeown, K.; Xiang, B. Entity-level Factual Consistency of Abstractive Text Summarization. February 2021. Available online: http://arxiv.org/abs/2102.09130 (accessed on 12 August 2025).
- Guan, X.; Liu, Y.; Lin, H.; Lu, Y.; He, B.; Han, X.; Sun, L. Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting. 2024. Available online: http://arxiv.org/abs/2311.13314 (accessed on 12 August 2025).
- Vu, T.; Iyyer, M.; Wang, X.; Constant, N.; Wei, J.; Wei, J.; Tar, C.; Sung, Y.; Zhou, D.; Le, Q. FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation. October 2023. Available online: http://arxiv.org/abs/2310.03214 (accessed on 12 August 2025).
- Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. Emergent Abilities of Large Language Models. June 2022. Available online: http://arxiv.org/abs/2206.07682 (accessed on 12 August 2025).
- Besta, M.; Blach, N.; Kubicek, A.; Gerstenberger, R.; Podstawski, M.; Gianinazzi, L.; Gajda, J.; Lehmann, T.; Niewiadomski, H.; Nyczyk, P.; et al. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. Proc. AAAI Conf. Artif. Intell. 2023, 38, 17682–17690. [Google Scholar] [CrossRef]
- Zhang, M.; Press, O.; Merrill, W.; Liu, A.; Smith, N.A. How Language Model Hallucinations Can Snowball. May 2023. Available online: http://arxiv.org/abs/2305.13534 (accessed on 12 August 2025).
- Zhang, Z.; Wang, Y.; Wang, C.; Chen, J.; Zheng, Z. LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation. September 2024. Available online: http://arxiv.org/abs/2409.20550 (accessed on 12 August 2025).
- Mündler, N.; He, J.; Jenko, S.; Vechev, M. Self-Contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation. May 2023. Available online: http://arxiv.org/abs/2305.15852 (accessed on 12 August 2025).
- Li, Y.; Li, Z.; Hung, K.; Wang, W.; Xie, H.; Li, Y. Ambiguity processing in Large Language Models: Detection, resolution, and the path to hallucination. Nat. Lang. Process. J. 2025, 100173. [Google Scholar] [CrossRef]
- Sharma, M.; Tong, M.; Korbak, T.; Duvenaud, D.; Askell, A.; Bowman, S.R.; Cheng, N.; Durmus, E.; Hatfield-Dodds, Z.; Johnston, S.R.; et al. Towards Understanding Sycophancy in Language Models. October 2023. Available online: http://arxiv.org/abs/2310.13548 (accessed on 12 August 2025).
- Turpin, M.; Michael, J.; Perez, E.; Bowman, S.R. Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. May 2023. Available online: http://arxiv.org/abs/2305.04388 (accessed on 12 August 2025).
- Si, C.; Gan, Z.; Yang, Z.; Wang, S.; Wang, J.; Boyd-Graber, J.; Wang, L. Prompting GPT-3 To Be Reliable. October 2022. Available online: http://arxiv.org/abs/2210.09150 (accessed on 12 August 2025).
- Zamfirescu-Pereira, J.D.; Wong, R.Y.; Hartmann, B.; Yang, Q. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. In Proceedings of the Conference on Human Factors in Computing Systems-Proceedings, Association for Computing Machinery, Hamburg, Germany, 23–28 April 2023. [Google Scholar] [CrossRef]
- Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-Shot Learners. June 2021. Available online: http://arxiv.org/abs/2012.15723 (accessed on 12 August 2025).
- Shuster, K.; Poff, S.; Chen, M.; Kiela, D.; Weston, J. Retrieval Augmentation Reduces Hallucination in Conversation. April 2021. Available online: http://arxiv.org/abs/2104.07567 (accessed on 12 August 2025).
- Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The Curious Case of Neural Text Degeneration. April 2019. Available online: http://arxiv.org/abs/1904.09751 (accessed on 12 August 2025).
- Tian, R.; Narayan, S.; Sellam, T.; Parikh, A.P. Sticking to the Facts: Confident Decoding for Faithful Data-to-Text Generation. October 2019. Available online: http://arxiv.org/abs/1910.08684 (accessed on 12 August 2025).
- Xiong, M.; Hu, Z.; Lu, X.; Li, Y.; Fu, J.; He, J.; Hooi, B. Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. June 2023. Available online: http://arxiv.org/abs/2306.13063 (accessed on 12 August 2025).
- Qiu, X.; Miikkulainen, R. Semantic Density: Uncertainty Quantification for Large Language Models Through Confidence Measurement in Semantic Space. May 2024. Available online: http://arxiv.org/abs/2405.13845 (accessed on 12 August 2025).
- Simhi, A.; Itzhak, I.; Barez, F.; Stanovsky, G.; Belinkov, Y. Trust Me, I’m Wrong: High-Certainty Hallucinations in LLMs. February 2025. Available online: http://arxiv.org/abs/2502.12964 (accessed on 12 August 2025).
- Schick, T.; Udupa, S.; Schütze, H. Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP. February 2021. Available online: http://arxiv.org/abs/2103.00453 (accessed on 12 August 2025).
- Bai, Y.; Kadavath, S.; Kundu, S.; Askell, A.; Kernion, J.; Jones, A.; Chen, A.; Goldie, A.; Mirhoseini, A.; McKinnon, C.; et al. Constitutional AI: Harmlessness from AI Feedback. December 2022. Available online: http://arxiv.org/abs/2212.08073 (accessed on 12 August 2025).
- Jha, S.; Jha, S.K.; Lincoln, P.; Bastian, N.D.; Velasquez, A.; Neema, S. Dehallucinating Large Language Models Using Formal Methods Guided Iterative Prompting. In Proceedings of the 2023 IEEE International Conference on Assured Autonomy, ICAA 2023, Laurel, MD, USA, 6–8 June 2023; Institute of Electrical and Electronics Engineers Inc.: New York, NY, USA, 2023; pp. 149–152. [Google Scholar] [CrossRef]
- Azaria, A.; Mitchell, T. The Internal State of an LLM Knows When It’s Lying. April 2023. Available online: http://arxiv.org/abs/2304.13734 (accessed on 12 August 2025).
- Luo, J.; Xiao, C.; Ma, F. Zero-Resource Hallucination Prevention for Large Language Models. September 2023. Available online: http://arxiv.org/abs/2309.02654 (accessed on 12 August 2025).
- Luo, J.; Li, T.; Wu, D.; Jenkin, M.; Liu, S.; Dudek, G. Hallucination Detection and Hallucination Mitigation: An Investigation. January 2024. Available online: http://arxiv.org/abs/2401.08358 (accessed on 12 August 2025).
- Liu, F.; Liu, Y.; Shi, L.; Huang, H.; Wang, R.; Yang, Z.; Zhang, L.; Li, Z.; Ma, Y. Exploring and Evaluating Hallucinations in LLM-Powered Code Generation. April 2024. Available online: http://arxiv.org/abs/2404.00971 (accessed on 12 August 2025).
- Zhao, Y.; Liu, Z.; Zheng, Y.; Lam, K.-Y. Attribution Techniques for Mitigating Hallucination in RAG-based Question-Answering Systems: A Survey. TechRxiv 2025. [Google Scholar] [CrossRef]
- Agrawal, G.; Kumarage, T.; Alghamdi, Z.; Liu, H. Can Knowledge Graphs Reduce Hallucinations in LLMs? A Survey. March 2024. Available online: http://arxiv.org/abs/2311.07914 (accessed on 12 August 2025).
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. May 2020. Available online: http://arxiv.org/abs/2005.14165 (accessed on 12 August 2025).
- Li, K.; Zhang, Y.; Li, K.; Fu, Y. Adversarial Feature Hallucination Networks for Few-Shot Learning. 2020. Available online: http://arxiv.org/abs/2003.13193 (accessed on 12 August 2025).
- Filippova, K. Controlled Hallucinations: Learning to Generate Faithfully from Noisy Data. October 2020. Available online: http://arxiv.org/abs/2010.05873 (accessed on 12 August 2025).
- Wan, D.; Bansal, M. FactPEGASUS: Factuality-Aware Pre-training and Fine-tuning for Abstractive Summarization. May 2022. Available online: http://arxiv.org/abs/2205.07830 (accessed on 12 August 2025).
- Mishra, A.; Asai, A.; Balachandran, V.; Wang, Y.; Neubig, G.; Tsvetkov, Y.; Hajishirzi, H. Fine-grained Hallucination Detection and Editing for Language Models. January 2024. Available online: http://arxiv.org/abs/2401.06855 (accessed on 12 August 2025).
- Hu, M.; He, B.; Wang, Y.; Li, L.; Ma, C.; King, I. Mitigating Large Language Model Hallucination with Faithful Finetuning. June 2024. Available online: http://arxiv.org/abs/2406.11267 (accessed on 12 August 2025).
- Sun, Z.; Shen, Y.; Zhou, Q.; Zhang, H.; Chen, Z.; Cox, D.; Yang, Y.; Gan, C. Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. May 2023. Available online: http://arxiv.org/abs/2305.03047 (accessed on 12 August 2025).
- Zhou, C.; Neubig, G.; Gu, J.; Diab, M.; Guzmán, F.; Zettlemoyer, L.; Ghazvininejad, M. Detecting Hallucinated Content in Conditional Neural Sequence Generation. 2021. Available online: https://arxiv.org/abs/2011.02593 (accessed on 12 August 2025).
- Chen, A.; Pasupat, P.; Singh, S.; Lee, H.; Guu, K. PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions. May 2023. Available online: http://arxiv.org/abs/2305.14908 (accessed on 12 August 2025).
- Gekhman, Z.; Herzig, J.; Aharoni, R.; Elkind, C.; Szpektor, I. TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models. May 2023. Available online: http://arxiv.org/abs/2305.11171 (accessed on 12 August 2025).
- Hu, Y.; Gan, L.; Xiao, W.; Kuang, K.; Wu, F. Fine-tuning Large Language Models for Improving Factuality in Legal Question Answering. January 2025. Available online: http://arxiv.org/abs/2501.06521 (accessed on 12 August 2025).
- Qiu, Y.; Ziser, Y.; Korhonen, A.; Ponti, E.M.; Cohen, S.B. Detecting and Mitigating Hallucinations in Multilingual Summarisation. May 2023. Available online: http://arxiv.org/abs/2305.13632 (accessed on 12 August 2025).
- Tang, Z.; Chatterjee, R.; Garg, S. Mitigating Hallucinated Translations in Large Language Models with Hallucination-focused Preference Optimization. January 2025. Available online: http://arxiv.org/abs/2501.17295 (accessed on 12 August 2025).
- Cheng, D.; Huang, S.; Bi, J.; Zhan, Y.; Liu, J.; Wang, Y.; Sun, H.; Wei, F.; Deng, D.; Zhang, Q. UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation. March 2023. Available online: http://arxiv.org/abs/2303.08518 (accessed on 12 August 2025).
- Razumovskaia, E.; Vulić, I.; Marković, P.; Cichy, T.; Zheng, Q.; Wen, T.; Budzianowski, P. Dial BEINFO for Faithfulness: Improving Factuality of Information-Seeking Dialogue via Behavioural Fine-Tuning. November 2023. Available online: http://arxiv.org/abs/2311.09800 (accessed on 12 August 2025).
- Elaraby, M.; Lu, M.; Dunn, J.; Zhang, X.; Wang, Y.; Liu, S.; Tian, P.; Wang, Y.; Wang, Y. Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models. August 2023. Available online: http://arxiv.org/abs/2308.11764 (accessed on 12 August 2025).
- Rehman, T.; Mandal, R.; Agarwal, A.; Sanyal, D.K. Hallucination Reduction in Long Input Text Summarization. September 2023. Available online: http://arxiv.org/abs/2309.16781 (accessed on 12 August 2025).
- Gekhman, Z.; Yona, G.; Aharoni, R.; Eyal, M.; Feder, A.; Reichart, R.; Herzig, J. Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? May 2024. Available online: http://arxiv.org/abs/2405.05904 (accessed on 12 August 2025).
- Zhu, Z.; Yang, Y.; Sun, Z. HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild. March 2024. Available online: http://arxiv.org/abs/2403.04307 (accessed on 12 August 2025).
- Xia, Y.; Liu, X.; Yu, T.; Kim, S.; Rossi, R.A.; Rao, A.; Mai, T.; Li, S. Hallucination Diversity-Aware Active Learning for Text Summarization. April 2024. Available online: http://arxiv.org/abs/2404.01588 (accessed on 12 August 2025).
- Cao, H.; An, Z.; Feng, J.; Xu, K.; Chen, L.; Zhao, D. A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models. November 2023. Available online: http://arxiv.org/abs/2311.07491 (accessed on 12 August 2025).
- Goodrich, B.; Rao, V.; Liu, P.J.; Saleh, M. Assessing the Factual Accuracy of Generated Text. In Proceedings of the KDD’19: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar] [CrossRef]
- Paudel, B.; Lyzhov, A.; Joshi, P.; Anand, P. HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification. April 2025. Available online: http://arxiv.org/abs/2504.07069 (accessed on 12 August 2025).
- Zou, A.; Phan, L.; Chen, S.; Campbell, J.; Guo, P.; Ren, R.; Pan, A.; Yin, X.; Mazeika, M.; Dombrowski, A.; et al. Representation Engineering: A Top-Down Approach to AI Transparency. October 2023. Available online: http://arxiv.org/abs/2310.01405 (accessed on 12 August 2025).
- Li, R.; Luo, Z.; Du, X. FG-PRM: Fine-grained Hallucination Detection and Mitigation in Language Model Mathematical Reasoning. October 2024. Available online: http://arxiv.org/abs/2410.06304 (accessed on 12 August 2025).
- Niu, C.; Wu, Y.; Zhu, J.; Xu, S.; Shum, K.; Zhong, R.; Song, J.; Zhang, T. RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models. December 2023. Available online: http://arxiv.org/abs/2401.00396 (accessed on 12 August 2025).
- Lango, M.; Dušek, O. Critic-Driven Decoding for Mitigating Hallucinations in Data-to-text Generation. 2023. Available online: https://arxiv.org/abs/2310.16964 (accessed on 12 August 2025).
- Lee, N.; Ping, W.; Xu, P.; Patwary, M.; Fung, P.; Shoeybi, M.; Catanzaro, B. Factuality Enhanced Language Models for Open-Ended Text Generation. June 2022. Available online: http://arxiv.org/abs/2206.04624 (accessed on 12 August 2025).
- Pacchiardi, L.; Chan, A.J.; Mindermann, S.; Moscovitz, I.; Pan, A.Y.; Gal, Y.; Evans, O.; Brauner, J. How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions. September 2023. Available online: http://arxiv.org/abs/2309.15840 (accessed on 12 August 2025).
- Pfeiffer, J.; Piccinno, F.; Nicosia, M.; Wang, X.; Reid, M.; Ruder, S. mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations. May 2023. Available online: http://arxiv.org/abs/2305.14224 (accessed on 12 August 2025).
- Qiu, Y.; Embar, V.; Cohen, S.B.; Han, B. Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation. November 2023. Available online: http://arxiv.org/abs/2311.09467 (accessed on 12 August 2025).
- Chen, Z.; Sun, X.; Jiao, X.; Lian, F.; Kang, Z.; Wang, D.; Xu, C. Truth Forest: Toward Multi-Scale Truthfulness in Large Language Models through Intervention without Tuning. December 2023. Available online: http://arxiv.org/abs/2312.17484 (accessed on 12 August 2025).
- Zhang, S.; Yu, T.; Feng, Y. TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space. February 2024. Available online: http://arxiv.org/abs/2402.17811 (accessed on 12 August 2025).
- Lewis, A.; White, M.; Liu, J.; Koike-Akino, T.; Parsons, K.; Wang, Y. Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents. February 2025. Available online: http://arxiv.org/abs/2502.19545 (accessed on 12 August 2025).
- Xu, C.; Sun, Q.; Zheng, K.; Geng, X.; Zhao, P.; Feng, J.; Tao, C.; Jiang, D. WizardLM: Empowering Large Language Models to Follow Complex Instructions. April 2023. Available online: http://arxiv.org/abs/2304.12244 (accessed on 12 August 2025).
- Longpre, S.; Perisetla, K.; Chen, A.; Nikhil, R.; Dubois, C.; Singh, S. Entity-Based Knowledge Conflicts in Question Answering. 2021. Available online: https://arxiv.org/abs/2109.05052 (accessed on 3 August 2025).
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction. 2015. Available online: https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf (accessed on 12 August 2025).
- Roit, P.; Ferret, J.; Shani, L.; Aharoni, R.; Cideron, G.; Dadashi, R.; Geist, M.; Girgin, S.; Hussenot, L.; Keller, O.; et al. Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback. May 2023. Available online: http://arxiv.org/abs/2306.00186 (accessed on 12 August 2025).
- Tian, K.; Mitchell, E.; Yao, H.; Manning, C.D.; Finn, C. Fine-Tuning Language Models for Factuality. November 2023. Available online: http://arxiv.org/abs/2311.08401 (accessed on 12 August 2025).
- Lightman, H.; Kosaraju, V.; Burda, Y.; Edwards, H.; Baker, B.; Lee, T.; Leike, J.; Schulman, J.; Sutskever, I.; Cobbe, K. Let’s Verify Step by Step. May 2023. Available online: http://arxiv.org/abs/2305.20050 (accessed on 12 August 2025).
- Christiano, P.; Leike, J.; Brown, T.B.; Martic, M.; Legg, S.; Amodei, D. Deep Reinforcement Learning from Human Preferences. June 2017. Available online: http://arxiv.org/abs/1706.03741 (accessed on 12 August 2025).
- Ji, J.; Qiu, T.; Chen, B.; Zhang, B.; Lou, H.; Wang, K.; Duan, Y.; He, Z.; Vierling, L.; Hong, D.; et al. AI Alignment: A Comprehensive Survey. October 2023. Available online: http://arxiv.org/abs/2310.19852 (accessed on 12 August 2025).
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. March 2023. Available online: http://arxiv.org/abs/2303.08774 (accessed on 12 August 2025).
- Yang, Y.; Chern, E.; Qiu, X.; Neubig, G.; Liu, P. Alignment for Honesty. October 2024. Available online: http://arxiv.org/abs/2312.07000 (accessed on 12 August 2025).
- Perez, E.; Ringer, S.; Lukošiūtė, K.; Nguyen, K.; Chen, E.; Heiner, S.; Pettit, C.; Olsson, C.; Kundu, S.; Kadavath, S.; et al. Discovering Language Model Behaviors with Model-Written Evaluations. December 2022. Available online: http://arxiv.org/abs/2212.09251 (accessed on 12 August 2025).
- Lee, H.; Phatale, S.; Mansoor, H.; Mesnard, T.; Ferret, J.; Lu, K.; Bishop, C.; Hall, E.; Carbune, V.; Rastogi, A.; et al. RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback. September 2023. Available online: http://arxiv.org/abs/2309.00267 (accessed on 12 August 2025).
- Liang, Y.; Song, Z.; Wang, H.; Zhang, J. Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation. January 2024. Available online: http://arxiv.org/abs/2401.15449 (accessed on 12 August 2025).
- Cheng, X.; Li, J.; Zhao, W.X.; Wen, J.-R. Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking. January 2025. Available online: http://arxiv.org/abs/2501.01306 (accessed on 12 August 2025).
- Lin, S.; Gao, L.; Oguz, B.; Xiong, W.; Lin, J.; Yih, W.; Chen, X. FLAME: Factuality-Aware Alignment for Large Language Models. May 2024. Available online: http://arxiv.org/abs/2405.01525 (accessed on 12 August 2025).
- Parcalabescu, L.; Frank, A. On Measuring Faithfulness or Self-consistency of Natural Language Explanations. November 2023. Available online: http://arxiv.org/abs/2311.07466 (accessed on 12 August 2025).
- Gosmar, D.; Dahl, D.A. Hallucination Mitigation Using Agentic AI Natural Language-Based Frameworks. January 2025. Available online: http://arxiv.org/abs/2501.13946 (accessed on 12 August 2025).
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. February 2020. Available online: http://arxiv.org/abs/2002.05709 (accessed on 12 August 2025).
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. November 2019. Available online: http://arxiv.org/abs/1911.05722 (accessed on 12 August 2025).
- Chern, I.-C.; Wang, Z.; Das, S.; Sharma, B.; Liu, P.; Neubig, G. Improving Factuality of Abstractive Summarization via Contrastive Reward Learning. July 2023. Available online: http://arxiv.org/abs/2307.04507 (accessed on 12 August 2025).
- Robinson, J.; Chuang, C.-Y.; Sra, S.; Jegelka, S. Contrastive Learning with Hard Negative Samples. October 2020. Available online: http://arxiv.org/abs/2010.04592 (accessed on 12 August 2025).
- Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.W.; Salakhutdinov, R.; Manning, C.D. HotpotQA: A Dataset for Diverse, Explainable Multi-Hop Question Answering. September 2018. Available online: https://arxiv.org/abs/1809.09600 (accessed on 12 August 2025).
- Wu, H.; Li, X.; Xu, X.; Wu, J.; Zhang, D.; Liu, Z. Iter-AHMCL: Alleviate Hallucination for Large Language Model via Iterative Model-level Contrastive Learning. October 2024. Available online: http://arxiv.org/abs/2410.12130 (accessed on 12 August 2025).
- Gema, A.P.; Jin, C.; Abdulaal, A.; Diethe, T.; Teare, P.; Alex, B.; Minervini, P.; Saseendran, A. DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations. October 2024. Available online: http://arxiv.org/abs/2410.18860 (accessed on 12 August 2025).
- Huang, C.P.; Chen, H.-Y. Delta—Contrastive Decoding Mitigates Text Hallucinations in Large Language Models. February 2025. Available online: http://arxiv.org/abs/2502.05825 (accessed on 12 August 2025).
- He, J.; Gong, Y.; Chen, K.; Lin, Z.; Wei, C.; Zhao, Y. LLM Factoscope: Uncovering LLMs’ Factual Discernment through Inner States Analysis. December 2023. Available online: http://arxiv.org/abs/2312.16374 (accessed on 12 August 2025).
- Wang, P.; Wang, Z.; Li, Z.; Gao, Y.; Yin, B.; Ren, X. SCOTT: Self-Consistent Chain-of-Thought Distillation. 2023. Available online: http://arxiv.org/abs/2305.01879 (accessed on 12 August 2025).
- Chuang, Y.-S.; Xie, Y.; Luo, H.; Kim, Y.; Glass, J.; He, P. DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. September 2023. Available online: http://arxiv.org/abs/2309.03883 (accessed on 12 August 2025).
- Xu, W.; Agrawal, S.; Briakou, E.; Martindale, M.J.; Carpuat, M. Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection. January 2023. Available online: http://arxiv.org/abs/2301.07779 (accessed on 12 August 2025).
- Nguyen, H.; He, Z.; Gandre, S.A.; Pasupulety, U.; Shivakumar, S.K.; Lerman, K. Smoothing Out Hallucinations: Mitigating LLM Hallucination with Smoothed Knowledge Distillation. February 2025. Available online: http://arxiv.org/abs/2502.11306 (accessed on 12 August 2025).
- Liu, W.; Li, G.; Zhang, K.; Du, B.; Chen, Q.; Hu, X.; Xu, H.; Chen, J.; Wu, J. Mind’s Mirror: Distilling Self-Evaluation Capability and Comprehensive Thinking from Large Language Models. November 2023. Available online: http://arxiv.org/abs/2311.09214 (accessed on 12 August 2025).
- Feng, J.; Wang, Q.; Qiu, H.; Liu, L. Retrieval In Decoder benefits generative models for explainable complex question answering. Neural Netw. 2025, 181, 106833. [Google Scholar] [CrossRef] [PubMed]
- Zhang, H.; Diao, S.; Lin, Y.; Fung, Y.R.; Lian, Q.; Wang, X.; Chen, Y.; Ji, H.; Zhang, T. R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’. November 2023. Available online: http://arxiv.org/abs/2311.09677 (accessed on 12 August 2025).
- Chung, H.W.; Hou, L.; Longpre, S.; Zoph, B.; Tay, Y.; Fedus, W.; Li, Y.; Wang, X.; Dehghani, M.; Brahma, S.; et al. Scaling Instruction-Finetuned Language Models. October 2022. Available online: http://arxiv.org/abs/2210.11416 (accessed on 12 August 2025).
- Wan, F.; Huang, X.; Cui, L.; Quan, X.; Bi, W.; Shi, S. Knowledge Verification to Nip Hallucination in the Bud. January 2024. Available online: http://arxiv.org/abs/2401.10768 (accessed on 12 August 2025).
- Zhao, Y.; Yan, L.; Sun, W.; Xing, G.; Wang, S.; Meng, C.; Cheng, Z.; Ren, Z.; Yin, D. Improving the Robustness of Large Language Models via Consistency Alignment. March 2024. Available online: http://arxiv.org/abs/2403.14221 (accessed on 12 August 2025).
- Wang, Y.; Kordi, Y.; Mishra, S.; Liu, A.; Smith, N.A.; Khashabi, D.; Hajishirzi, H. Self-Instruct: Aligning Language Models with Self-Generated Instructions. December 2022. Available online: http://arxiv.org/abs/2212.10560 (accessed on 12 August 2025).
- Zheng, W.; Lee, R.K.-W.; Liu, Z.; Wu, K.; Aw, A.; Zou, B. CCL-XCoT: An Efficient Cross-Lingual Knowledge Transfer Method for Mitigating Hallucination Generation. July 2025. Available online: http://arxiv.org/abs/2507.14239 (accessed on 12 August 2025).
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. September 2014. Available online: http://arxiv.org/abs/1409.0473 (accessed on 12 August 2025).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. June 2017. Available online: http://arxiv.org/abs/1706.03762 (accessed on 12 August 2025).
- Michel, P.; Levy, O.; Neubig, G. Are Sixteen Heads Really Better than One? November 2019. Available online: http://arxiv.org/abs/1905.10650 (accessed on 12 August 2025).
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar] [CrossRef]
- Li, K.; Liu, T.; Bashkansky, N.; Bau, D.; Viégas, F.; Pfister, H.; Wattenberg, M. Measuring and Controlling Instruction (In)Stability in Language Model Dialogs. February 2024. Available online: http://arxiv.org/abs/2402.10962 (accessed on 12 August 2025).
- Hoscilowicz, J.; Wiacek, A.; Chojnacki, J.; Cieslak, A.; Michon, L.; Urbanevych, V.; Janicki, A. Non-Linear Inference Time Intervention: Improving LLM Truthfulness. March 2024. Available online: http://arxiv.org/abs/2403.18680 (accessed on 12 August 2025).
- Fairburn, S.; Ainsworth, J. Mitigate Large Language Model Hallucinations with Probabilistic Inference in Graph Neural Networks. 1 July 2024. Available online: https://www.authorea.com/users/798018/articles/1147827-mitigate-large-language-model-hallucinations-with-probabilistic-inference-in-graph-neural-networks?commit=59e46cef9e4db14a5daf553bcbf96ff7ebab29be (accessed on 12 August 2025).
- Shelmanov, A.; Fadeeva, E.; Tsvigun, A.; Tsvigun, I.; Xie, Z.; Kiselev, I.; Daheim, N.; Zhang, C.; Vazhentsev, A.; Sachan, M.; et al. A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs. May 2025. Available online: http://arxiv.org/abs/2505.08200 (accessed on 12 August 2025).
- Guo, L.; Fang, Y.; Chen, F.; Liu, P.; Xu, S. Large Language Models with Adaptive Token Fusion: A Novel Approach to Reducing Hallucinations and Improving Inference Efficiency. 24 October 2024. Available online: https://www.authorea.com/users/847419/articles/1235237-large-language-models-with-adaptive-token-fusion-a-novel-approach-to-reducing-hallucinations-and-improving-inference-efficiency?commit=8e85c59f4f49cf8895c0b1eb937d89a716932d4c (accessed on 12 August 2025).
- Yuksekgonul, M.; Chandrasekaran, V.; Jones, E.; Gunasekar, S.; Naik, R.; Palangi, H.; Kamar, E.; Nushi, B. Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models. September 2023. Available online: http://arxiv.org/abs/2309.15098 (accessed on 12 August 2025).
- Nie, F.; Yao, J.-G.; Wang, J.; Pan, R.; Lin, C.-Y. A Simple Recipe towards Reducing Hallucination in Neural Surface Realisation. Association for Computational Linguistics. 2019. Available online: https://aclanthology.org/P19-1256.pdf (accessed on 12 August 2025).
- Matys, P.; Eliasz, J.; Kiełczyński, K.; Langner, M.; Ferdinan, T.; Kocoń, J.; Kazienko, P. AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs. June 2025. [CrossRef]
- Shi, W.; Han, X.; Lewis, M.; Tsvetkov, Y.; Zettlemoyer, L.; Yih, S.W. Trusting Your Evidence: Hallucinate Less with Context-aware Decoding. May 2023. Available online: http://arxiv.org/abs/2305.14739 (accessed on 12 August 2025).
- Wu, J.; Shen, Y.; Liu, S.; Tang, Y.; Song, S.; Wang, X.; Cai, L. Improve Decoding Factuality by Token-wise Cross Layer Entropy of Large Language Models. 2025. Available online: https://arxiv.org/abs/2502.03199 (accessed on 12 August 2025).
- van der Poel, L.; Cotterell, R.; Meister, C. Mutual Information Alleviates Hallucinations in Abstractive Summarization. October 2022. Available online: http://arxiv.org/abs/2210.13210 (accessed on 12 August 2025).
- Shi, W.; Min, S.; Yasunaga, M.; Seo, M.; James, R.; Lewis, M.; Zettlemoyer, L.; Yih, W. REPLUG: Retrieval-Augmented Black-Box Language Models. January 2023. Available online: http://arxiv.org/abs/2301.12652 (accessed on 12 August 2025).
- Xiao, Y.; Wang, W.Y. On Hallucination and Predictive Uncertainty in Conditional Language Generation. March 2021. Available online: http://arxiv.org/abs/2103.15025 (accessed on 12 August 2025).
- Huang, L.; Feng, X.; Ma, W.; Fan, Y.; Feng, X.; Gu, Y.; Ye, Y.; Zhao, L.; Zhong, W.; Wang, B.; et al. Alleviating Hallucinations from Knowledge Misalignment in Large Language Models via Selective Abstention Learning. 2025. Available online: https://aclanthology.org/2025.acl-long.1199.pdf (accessed on 4 August 2025).
- Kai, J.; Zhang, T.; Hu, H.; Lin, Z. SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully. January 2024. Available online: http://arxiv.org/abs/2401.05930 (accessed on 12 August 2025).
- Qiu, Y.; Zhao, Z.; Ziser, Y.; Korhonen, A.; Ponti, E.M.; Cohen, S.B. Spectral Editing of Activations for Large Language Model Alignment. May 2024. Available online: http://arxiv.org/abs/2405.09719 (accessed on 12 August 2025).
- Zhang, Y.; Cui, L.; Bi, W.; Shi, S. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. December 2023. Available online: http://arxiv.org/abs/2312.15710 (accessed on 12 August 2025).
- Zhang, H.; Chen, H.; Chen, M.; Zhang, T. Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation. June 2025. Available online: http://arxiv.org/abs/2505.23657 (accessed on 12 August 2025).
- Ji, Z.; Liu, Z.; Lee, N.; Yu, T.; Wilie, B.; Zeng, M.; Fung, P. RHO (ρ): Reducing Hallucination in Open-Domain Dialogues with Knowledge Grounding. May 2023. Available online: https://arxiv.org/abs/2212.01588 (accessed on 12 August 2025).
- Chen, S.; Xiong, M.; Liu, J.; Wu, Z.; Xiao, T.; Gao, S.; He, J. In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation. March 2024. Available online: http://arxiv.org/abs/2403.01548 (accessed on 12 August 2025).
- Li, K.; Patel, O.; Viégas, F.; Pfister, H.; Wattenberg, M. Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. June 2023. Available online: http://arxiv.org/abs/2306.03341 (accessed on 12 August 2025).
- Elhoushi, M.; Shrivastava, A.; Liskovich, D.; Hosmer, B.; Wasti, B.; Lai, L.; Mahmoud, A.; Acun, B.; Agrawal, S.; Roman, A.; et al. LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding. 2024. Available online: https://arxiv.org/abs/2404.16710 (accessed on 3 August 2025).
- Chen, J.; Lin, H.; Han, X.; Sun, L. Benchmarking Large Language Models in Retrieval-Augmented Generation. September 2023. Available online: http://arxiv.org/abs/2309.01431 (accessed on 12 August 2025).
- Dziri, N.; Madotto, A.; Zaiane, O.; Bose, A.J. Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding. April 2021. Available online: http://arxiv.org/abs/2104.08455 (accessed on 12 August 2025).
- Yu, H.Q.; McQuade, F. RAG-KG-IL: A Multi-Agent Hybrid Framework for Reducing Hallucinations and Enhancing LLM Reasoning Through RAG and Incremental Knowledge Graph Learning Integration. March 2025. Available online: http://arxiv.org/abs/2503.13514 (accessed on 12 August 2025).
- Gao, L.; Dai, Z.; Pasupat, P.; Chen, A.; Chaganty, A.T.; Fan, Y.; Zhao, V.Y.; Lao, N.; Lee, H.; Juan, D.; et al. RARR: Researching and Revising What Language Models Say, Using Language Models. October 2022. Available online: http://arxiv.org/abs/2210.08726 (accessed on 12 August 2025).
- Karpukhin, V.; Oğuz, B.; Min, S.; Lewis, P.; Wu, L.; Edunov, S.; Chen, D.; Yih, W. Dense Passage Retrieval for Open-Domain Question Answering. April 2020. Available online: https://arxiv.org/abs/2004.04906 (accessed on 12 August 2025).
- Mala, C.S.; Gezici, G.; Giannotti, F. Hybrid Retrieval for Hallucination Mitigation in Large Language Models: A Comparative Analysis. February 2025. Available online: http://arxiv.org/abs/2504.05324 (accessed on 12 August 2025).
- Peng, B.; Galley, M.; He, P.; Cheng, H.; Xie, Y.; Hu, Y.; Huang, Q.; Liden, L.; Yu, Z.; Chen, W.; et al. Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. February 2023. Available online: http://arxiv.org/abs/2302.12813 (accessed on 12 August 2025).
- CH-Wang, S.; van Durme, B.; Eisner, J.; Kedzie, C. Do Androids Know They’re Only Dreaming of Electric Sheep? December 2023. Available online: http://arxiv.org/abs/2312.17249 (accessed on 12 August 2025).
- Barry, M.; Caillaut, G.; Halftermeyer, P.; Qader, R.; Mouayad, M.; Cariolaro, D.; Deit, F.L.; Gesnouin, J. GraphRAG: Leveraging Graph-Based Efficiency to Minimize Hallucinations in LLM-Driven RAG for Finance Data. 2025. Available online: https://aclanthology.org/2025.genaik-1.6.pdf (accessed on 12 August 2025).
- Asai, A.; Wu, Z.; Wang, Y.; Sil, A.; Hajishirzi, H. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. October 2023. Available online: http://arxiv.org/abs/2310.11511 (accessed on 12 August 2025).
- Dwivedi, K.; Mishra, P.P. AutoRAG-LoRA: Hallucination-Triggered Knowledge Retuning via Lightweight Adapters. July 2025. Available online: http://arxiv.org/abs/2507.10586 (accessed on 12 August 2025).
- Cao, S.; Zhang, J.; Shi, J.; Lv, X.; Yao, Z.; Tian, Q.; Li, J.; Hou, L. Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions. November 2023. Available online: http://arxiv.org/abs/2311.13982 (accessed on 12 August 2025).
- Su, D.; Li, X.; Zhang, J.; Shang, L.; Jiang, X.; Liu, Q.; Fung, P. Read before Generate! Faithful Long Form Question Answering with Machine Reading. 2022. Available online: https://arxiv.org/abs/2203.00343 (accessed on 12 May 2025).
- Signé, Q.; Boughanem, M.; Moreno, J.G.; Belkacem, T. A Substring Extraction-Based RAG Method for Minimising Hallucinations in Aircraft Maintenance Question Answering. In Proceedings of the 2025 International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), New York, NY, USA, 18 July 2025; ACM: New York, NY, USA, 2025; pp. 513–521. [Google Scholar] [CrossRef]
- Lv, Q.; Wang, J.; Chen, H.; Li, B.; Zhang, Y.; Wu, F. Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models. October 2024. Available online: http://arxiv.org/abs/2410.15116 (accessed on 12 August 2025).
- Nonkes, N.; Agaronian, S.; Kanoulas, E.; Petcu, R. Leveraging Graph Structures to Detect Hallucinations in Large Language Models. July 2024. Available online: http://arxiv.org/abs/2407.04485 (accessed on 12 August 2025).
- Sun, K.; Xu, Y.E.; Zha, H.; Liu, Y.; Dong, X.L. Head-to-Tail: How Knowledgeable are Large Language Models (LLMs)? A.K.A. Will LLMs Replace Knowledge Graphs? August 2023. Available online: http://arxiv.org/abs/2308.10168 (accessed on 12 August 2025).
- Lavrinovics, E.; Biswas, R.; Bjerva, J.; Hose, K.K. Knowledge Graphs, Large Language Models, and Hallucinations: An NLP Perspective. 2024. Available online: http://arxiv.org/abs/2411.14258 (accessed on 12 August 2025).
- Zhang, S.; Pan, L.; Zhao, J.; Wang, W.Y. The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models. May 2023. Available online: http://arxiv.org/abs/2305.13669 (accessed on 12 August 2025).
- Reddy, G.P.; Kumar, Y.V.P.; Prakash, K.P. Hallucinations in Large Language Models (LLMs). In Proceedings of the 2024 IEEE Open Conference of Electrical, Electronic and Information Sciences, Vilnius, Lithuania, 25 April 2024; eStream 2024-Proceedings. Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024. [Google Scholar] [CrossRef]
- Bayat, F.F.; Qian, K.; Han, B.; Sang, Y.; Belyi, A.; Khorshidi, S.; Wu, F.; Ilyas, I.F.; Li, Y. FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge. October 2023. Available online: http://arxiv.org/abs/2310.17119 (accessed on 12 August 2025).
- Sherif, S.; Saad, D.; Silva, S.; Gomes, V. Graph-Enhanced RAG: A Survey of Methods, Architectures, and Performance. 2025. Available online: https://www.researchgate.net/publication/393193258 (accessed on 12 August 2025).
- Li, H.; Appleby, G.; Alperin, K.; Gomez, S.R.; Suh, A. Mitigating LLM Hallucinations with Knowledge Graphs: A Case Study. April 2025. Available online: http://arxiv.org/abs/2504.12422 (accessed on 12 August 2025).
- Nishat, N.A.Z.; Coletta, A.; Bellomarini, L.; Amouzouvi, K.; Lehmann, J.; Vahdati, S. Aligning Knowledge Graphs and Language Models for Factual Accuracy. July 2025. Available online: http://arxiv.org/abs/2507.13411 (accessed on 12 August 2025).
- He, X.; Tian, Y.; Sun, Y.; Chawla, N.V.; Laurent, T.; LeCun, Y.; Bresson, X.; Hooi, B. G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. May 2024. Available online: http://arxiv.org/abs/2402.07630 (accessed on 12 August 2025).
- Suzuoki, S.; Hatano, K. Reducing Hallucinations in Large Language Models: A Consensus Voting Approach Using Mixture of Experts. TechRxiv 2024. [Google Scholar] [CrossRef]
- Behore, S.; Dumont, L.; Venkataraman, J. Enhancing Reliability in Large Language Models: Self-Detection of Hallucinations with Spontaneous Self-Checks. 9 September 2024. Available online: https://www.authorea.com/users/829447/articles/1223513-enhancing-reliability-in-large-language-models-self-detection-of-hallucinations-with-spontaneous-self-checks?commit=5c3caaa663d1123b079882ae7501d480e3831a68 (accessed on 12 August 2025).
- Chrysostomou, G.; Zhao, Z.; Williams, M.; Aletras, N. Investigating Hallucinations in Pruned Large Language Models for Abstractive Summarization. November 2023. Available online: https://arxiv.org/pdf/2311.09335 (accessed on 12 August 2025).
- Jacobs, R.A.; Jordan, M.I.; Nowlan, S.J.; Hinton, G.E. Adaptive Mixtures of Local Experts. Neural Comput. 1991, 3, 79–87. [Google Scholar] [CrossRef]
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. January 2017. Available online: http://arxiv.org/abs/1701.06538 (accessed on 12 August 2025).
- Li, J.; Mao, Z.; Wang, Q. Alleviating Hallucinations in Large Language Models via Truthfulness-Driven Rank-adaptive LoRA. July 2025. Available online: https://aclanthology.org/2025.findings-acl.103.pdf (accessed on 4 August 2025).
- Wang, C.; Zhao, Y.; Liu, Y.; Zhu, H. Enhancing Latent Diffusion in Large Language Models for High-Quality Implicit Neural Representations with Reduced Hallucinations. 2024. Available online: https://osf.io/preprints/osf/9utwy_v1 (accessed on 29 June 2025).
- Feldman, P.; Foulds, J.R.; Pan, S. Trapping LLM Hallucinations Using Tagged Context Prompts. June 2023. Available online: http://arxiv.org/abs/2306.06085 (accessed on 12 August 2025).
- Lei, D.; Li, Y.; Hu, M.; Wang, M.; Yun, V.; Ching, E.; Kamal, E. Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations. October 2023. Available online: http://arxiv.org/abs/2310.03951 (accessed on 12 August 2025).
- White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. February 2023. Available online: http://arxiv.org/abs/2302.11382 (accessed on 12 August 2025).
- Kaddour, J.; Harris, J.; Mozes, M.; Bradley, H.; Raileanu, R.; McHardy, R. Challenges and Applications of Large Language Models. July 2023. Available online: http://arxiv.org/abs/2307.10169 (accessed on 12 August 2025).
- Cheng, Q.; Sun, T.; Zhang, W.; Wang, S.; Liu, X.; Zhang, M.; He, J.; Huang, M.; Yin, Z.; Chen, K.; et al. Evaluating Hallucinations in Chinese Large Language Models. October 2023. Available online: http://arxiv.org/abs/2310.03368 (accessed on 12 August 2025).
- Li, J.; Cheng, X.; Zhao, W.X.; Nie, J.-Y.; Wen, J.-R. HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models. May 2023. Available online: http://arxiv.org/abs/2305.11747 (accessed on 12 August 2025).
- Leiser, F.; Eckhardt, S.; Leuthe, V.; Knaeble, M.; Maedche, A.; Schwabe, G.; Sunyaev, A. HILL: A Hallucination Identifier for Large Language Models. In Proceedings of the CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024; ACM: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
- Levinstein, B.A.; Herrmann, D.A. Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks. Philos. Stud. 2023, 182, 1539–1565. [Google Scholar] [CrossRef]
- Varshney, N.; Raj, S.; Mishra, V.; Chatterjee, A.; Sarkar, R.; Saeidi, A.; Baral, C. Investigating and Addressing Hallucinations of LLMs in Tasks Involving Negation. June 2024. Available online: http://arxiv.org/abs/2406.05494 (accessed on 12 August 2025).
- Cao, Z.; Yang, Y.; Zhao, H. AutoHall: Automated Hallucination Dataset Generation for Large Language Models. September 2023. Available online: http://arxiv.org/abs/2310.00259 (accessed on 12 August 2025).
- Agarwal, V.; Pei, Y.; Alamir, S.; Liu, X. CodeMirage: Hallucinations in Code Generated by Large Language Models. August 2024. Available online: http://arxiv.org/abs/2408.08333 (accessed on 12 August 2025).
- Chen, M.; Tworek, J.; Jun, H.; Yuan, Q.; Pinto, H.P.d.O.; Kaplan, J.; Edwards, H.; Burda, Y.; Joseph, N.; Brockman, G.; et al. Evaluating Large Language Models Trained on Code. July 2021. Available online: http://arxiv.org/abs/2107.03374 (accessed on 12 August 2025).
- Hu, X.; Ru, D.; Qiu, L.; Guo, Q.; Zhang, T.; Xu, Y.; Luo, Y.; Liu, P.; Zhang, Y.; Zhang, Z. RefChecker: Reference-Based Fine-grained Hallucination Checker and Benchmark for Large Language Models. May 2024. Available online: http://arxiv.org/abs/2405.14486 (accessed on 12 August 2025).
- Elchafei, P.; Abu-Elkheir, M. Span-Level Hallucination Detection for LLM-Generated Answers. April 2025. Available online: http://arxiv.org/abs/2504.18639 (accessed on 12 August 2025).
- Hao, S.; Gu, Y.; Ma, H.; Hong, J.J.; Wang, Z.; Wang, D.Z.; Hu, Z. Reasoning with Language Model is Planning with World Model. May 2023. Available online: http://arxiv.org/abs/2305.14992 (accessed on 12 August 2025).
- Yao, S.; Yu, D.; Zhao, J.; Shafran, I.; Griffiths, T.L.; Cao, Y.; Narasimhan, K. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. May 2023. Available online: http://arxiv.org/abs/2305.10601 (accessed on 12 August 2025).
- Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. April 2021. Available online: https://arxiv.org/abs/2104.08691 (accessed on 12 August 2025).
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; Wang, K.; Liu, Y. Jailbreaking ChatGPT via Prompt Engineering: An Empirical Study. May 2023. Available online: http://arxiv.org/abs/2305.13860 (accessed on 12 August 2025).
- Dhuliawala, S.; Komeili, M.; Xu, J.; Raileanu, R.; Li, X.; Celikyilmaz, A.; Weston, J. Chain-of-Verification Reduces Hallucination in Large Language Models. September 2023. Available online: http://arxiv.org/abs/2309.11495 (accessed on 12 August 2025).
- Braverman, A.; Zhang, W.; Gu, Q. Mitigating Hallucination in Large Language Models with Explanatory Prompting. 2024. Available online: https://neurips.cc/virtual/2024/105546 (accessed on 12 August 2025).
- Kıcıman, E.; Ness, R.; Sharma, A.; Tan, C. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. April 2023. Available online: http://arxiv.org/abs/2305.00050 (accessed on 12 August 2025).
- Jin, Q.; Dhingra, B.; Liu, Z.; Cohen, W.W.; Lu, X. PubMedQA: A Dataset for Biomedical Research Question Answering. September 2019. Available online: http://arxiv.org/abs/1909.06146 (accessed on 12 August 2025).
- Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q.; et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. May 2022. Available online: http://arxiv.org/abs/2205.10625 (accessed on 12 August 2025).
- Yan, J.N.; Liu, T.; Chiu, J.T.; Shen, J.; Qin, Z.; Yu, Y.; Zhao, Y.; Lakshmanan, C.; Kurzion, Y.; Rush, A.M.; et al. Predicting Text Preference Via Structured Comparative Reasoning. November 2023. Available online: http://arxiv.org/abs/2311.08390 (accessed on 12 August 2025).
- Wei, J.; Yao, Y.; Ton, J.-F.; Guo, H.; Estornell, A.; Liu, Y. Measuring and Reducing LLM Hallucination without Gold-Standard Answers. February 2024. Available online: http://arxiv.org/abs/2402.10412 (accessed on 12 August 2025).
- Chern, I.; Chern, S.; Chen, S.; Yuan, W.; Feng, K.; Zhou, C.; He, J.; Neubig, G.; Liu, P. FacTool: Factuality Detection in Generative AI—A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios. July 2023. Available online: http://arxiv.org/abs/2307.13528 (accessed on 12 August 2025).
- Li, N.; Li, Y.; Liu, Y.; Shi, L.; Wang, K.; Wang, H. Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models. In Proceedings of the ACM on Programming Languages; ACM: New York, NY, USA, May 2024. [Google Scholar] [CrossRef]
- Kang, H.; Ni, J.; Yao, H. Ever: Mitigating Hallucination in Large Language Models through Real-Time Verification and Rectification. November 2023. Available online: http://arxiv.org/abs/2311.09114 (accessed on 12 August 2025).
- Yan, T.; Xu, T. Refining the Responses of LLMs by Themselves. May 2023. Available online: http://arxiv.org/abs/2305.04039 (accessed on 12 August 2025).
- Du, L.; Wang, Y.; Xing, X.; Ya, Y.; Li, X.; Jiang, X.; Fang, X. Quantifying and Attributing the Hallucination of Large Language Models via Association Analysis. September 2023. Available online: http://arxiv.org/abs/2309.05217 (accessed on 12 August 2025).
- Chang, E.Y. Prompting Large Language Models with the Socratic Method. February 2023. Available online: http://arxiv.org/abs/2303.08769 (accessed on 12 August 2025).
- Yehuda, Y.; Malkiel, I.; Barkan, O.; Weill, J.; Ronen, R.; Koenigstein, N. InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers. August 2024. Available online: http://arxiv.org/abs/2403.02889 (accessed on 12 August 2025).
- Cohen, R.; Hamri, M.; Geva, M.; Globerson, A. LM vs LM: Detecting Factual Errors via Cross Examination. May 2023. Available online: http://arxiv.org/abs/2305.13281 (accessed on 12 August 2025).
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large Language Models are Zero-Shot Reasoners. January 2023. Available online: http://arxiv.org/abs/2205.11916 (accessed on 12 August 2025).
- Jones, E.; Palangi, H.; Simões, C.; Chandrasekaran, V.; Mukherjee, S.; Mitra, A.; Awadallah, A.; Kamar, E. Teaching Language Models to Hallucinate Less with Synthetic Tasks. October 2023. Available online: http://arxiv.org/abs/2310.06827 (accessed on 12 August 2025).
- Zhao, T.Z.; Wallace, E.; Feng, S.; Klein, D.; Singh, S. Calibrate Before Use: Improving Few-Shot Performance of Language Models. June 2021. Available online: http://arxiv.org/abs/2102.09690 (accessed on 12 August 2025).
- Min, S.; Krishna, K.; Lyu, X.; Lewis, M.; Yih, W.; Koh, P.W.; Iyyer, M.; Zettlemoyer, L.; Hajishirzi, H. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. May 2023. Available online: http://arxiv.org/abs/2305.14251 (accessed on 12 August 2025).
- Gou, Z.; Shao, Z.; Gong, Y.; Shen, Y.; Yang, Y.; Duan, N.; Chen, W. CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. May 2023. Available online: http://arxiv.org/abs/2305.11738 (accessed on 12 August 2025).
- Suzgun, M.; Kalai, A.T. Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding. January 2024. Available online: http://arxiv.org/abs/2401.12954 (accessed on 12 August 2025).
- Grayson, M.; Patterson, C.; Goldstein, B.; Ivanov, S.; Davidson, M. Mitigating Hallucinations in Large Language Models Using a Channel-Aware Domain-Adaptive Generative Adversarial Network (CADAGAN). 30 September 2024. Available online: https://www.researchsquare.com/article/rs-5164079/v1 (accessed on 12 August 2025).
- Joshi, N.; Rando, J.; Saparov, A.; Kim, N.; He, H. Personas as a Way to Model Truthfulness in Language Models. October 2023. Available online: http://arxiv.org/abs/2310.18168 (accessed on 12 August 2025).
- Chen, K.; Chen, Q.; Zhou, J.; He, Y.; He, L. DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models. March 2024. Available online: http://arxiv.org/abs/2403.00896 (accessed on 12 August 2025).
- Xu, R.; Lin, B.S.; Yang, S.; Zhang, T.; Shi, W.; Zhang, T.; Fang, Z.; Xu, W.; Qiu, H. The Earth is Flat Because…: Investigating LLMs’ Belief Towards Misinformation via Persuasive Conversation. December 2023. Available online: http://arxiv.org/abs/2312.09085 (accessed on 12 August 2025).
- Chen, R.; Arditi, A.; Sleight, H.; Evans, O.; Lindsey, J. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. July 2025. Available online: http://arxiv.org/abs/2507.21509 (accessed on 12 August 2025).
- Li, X.L.; Liang, P. Prefix-Tuning: Optimizing Continuous Prompts for Generation. January 2021. Available online: http://arxiv.org/abs/2101.00190 (accessed on 12 August 2025).
- Kadavath, S.; Conerly, T.; Askell, A.; Henighan, T.; Drain, D.; Perez, E.; Schiefer, N.; Hatfield-Dodds, Z.; DasSarma, N.; Tran-Johnson, E.; et al. Language Models (Mostly) Know What They Know. November 2022. Available online: http://arxiv.org/abs/2207.05221 (accessed on 12 August 2025).
- Wu, W.; Cao, Y.; Yi, N.; Ou, R.; Zheng, Z. Detecting and Reducing the Factual Hallucinations of Large Language Models with Metamorphic Testing. In Proceedings of the ACM on Software Engineering; ACM: New York, NY, USA, 2025; Volume 2, pp. 1432–1453. Available online: https://dl.acm.org/doi/pdf/10.1145/3715784 (accessed on 12 August 2025).
- Harrington, F.; Rosenthal, E.; Swinburne, M. Mitigating Hallucinations in Large Language Models with Sliding Generation and Self-Checks. TechRxiv 2024. [Google Scholar] [CrossRef]
- Manakul, P.; Liusie, A.; Gales, M.J.F. SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models. March 2023. Available online: http://arxiv.org/abs/2303.08896 (accessed on 12 August 2025).
- Zhao, R.; Li, X.; Joty, S.; Qin, C.; Bing, L. Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework. May 2023. Available online: http://arxiv.org/abs/2305.03268 (accessed on 12 August 2025).
- Zhao, Z.; Cohen, S.B.; Webber, B. Reducing Quantity Hallucinations in Abstractive Summarization. September 2020. Available online: http://arxiv.org/abs/2009.13312 (accessed on 12 August 2025).
- Li, X.; Zhao, R.; Chia, Y.K.; Ding, B.; Joty, S.; Poria, S.; Bing, L. Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources. May 2023. Available online: http://arxiv.org/abs/2305.13269 (accessed on 12 August 2025).
- Zablocki, P.; Gajewska, Z. Assessing Hallucination Risks in Large Language Models Through Internal State Analysis. Authorea 2024. [Google Scholar] [CrossRef]
- Dale, D.; Voita, E.; Barrault, L.; Costa-jussà, M.R. Detecting and Mitigating Hallucinations in Machine Translation: Model Internal Workings Alone Do Well, Sentence Similarity Even Better. December 2022. Available online: http://arxiv.org/abs/2212.08597 (accessed on 12 August 2025).
- Liu, Y.; Yang, Q.; Tang, J.; Guo, T.; Wang, C.; Li, P.; Xu, S.; Gao, X.; Li, Z.; Liu, J.; et al. Reducing Hallucinations of Large Language Models via Hierarchical Semantic Piece. Complex Intell. Syst. 2025, 11, 231. [Google Scholar] [CrossRef]
- Ross, J.J.; Khramtsova, E.; van der Vegt, A.; Koopman, B.; Zuccon, G. RARR Unraveled: Component-Level Insights into Hallucination Detection and Mitigation. In Proceedings of the SIGIR 2025, 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, Padua, Italy, 13–18 July 2025; Association for Computing Machinery, Inc.: New York, NY, USA, 2025; pp. 3286–3295. [Google Scholar] [CrossRef]
- Kossen, J.; Han, J.; Razzak, M.; Schut, L.; Malik, S.; Gal, Y. Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs. 2024. Available online: https://arxiv.org/pdf/2406.15927 (accessed on 3 May 2025).
- Verma, S.; Tran, K.; Ali, Y.; Min, G. Reducing LLM Hallucinations using Epistemic Neural Networks. December 2023. Available online: http://arxiv.org/abs/2312.15576 (accessed on 12 August 2025).
- Yin, Z.; Sun, Q.; Guo, Q.; Wu, J.; Qiu, X.; Huang, X. Do Large Language Models Know What They Don’t Know? 2023. Available online: https://github.com/yinzhangyue/SelfAware (accessed on 12 August 2025).
- Lin, S.; Hilton, J.; Evans, O. Teaching Models to Express Their Uncertainty in Words. May 2022. Available online: http://arxiv.org/abs/2205.14334 (accessed on 12 August 2025).
- Zhang, T.; Qiu, L.; Guo, Q.; Deng, C.; Zhang, Y.; Zhang, Z.; Zhou, C.; Wang, X.; Fu, L. Enhancing Uncertainty-Based Hallucination Detection with Stronger Focus. November 2023. Available online: http://arxiv.org/abs/2311.13230 (accessed on 12 August 2025).
- Yan, S.-Q.; Gu, J.-C.; Zhu, Y.; Ling, Z.-H. Corrective Retrieval Augmented Generation. January 2024. Available online: http://arxiv.org/abs/2401.15884 (accessed on 12 August 2025).
- Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-Refine: Iterative Refinement with Self-Feedback. March 2023. Available online: http://arxiv.org/abs/2303.17651 (accessed on 12 August 2025).
- Guerreiro, N.M.; Voita, E.; Martins, A.F.T. Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation. August 2022. Available online: http://arxiv.org/abs/2208.05309 (accessed on 12 August 2025).
- Alain, G.; Bengio, Y. Understanding Intermediate Layers Using Linear Classifier Probes. October 2016. Available online: http://arxiv.org/abs/1610.01644 (accessed on 12 August 2025).
- Rateike, M.; Cintas, C.; Wamburu, J.; Akumu, T.; Speakman, S. Weakly Supervised Detection of Hallucinations in LLM Activations. December 2023. Available online: http://arxiv.org/abs/2312.02798 (accessed on 12 August 2025).
- Chuang, Y.-S.; Qiu, L.; Hsieh, C.-Y.; Krishna, R.; Kim, Y.; Glass, J. Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps. October 2024. Available online: http://arxiv.org/abs/2407.07071 (accessed on 12 August 2025).
- Zhu, D.; Chen, D.; Li, Q.; Chen, Z.; Ma, L.; Grossklags, J.; Fritz, M. PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics. April 2024. Available online: http://arxiv.org/abs/2404.04722 (accessed on 12 August 2025).
- Marks, S.; Tegmark, M. The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. October 2023. Available online: http://arxiv.org/abs/2310.06824 (accessed on 12 August 2025).
- Chen, L.; Wu, X.; Xiong, Z.; Kang, X. Two Stage Psychology-Guided Fine-Grained Editing and Sampling Approach for Mitigating Hallucination in Large Language Models. 2025. Available online: https://escholarship.org/uc/item/0gn8m1qq (accessed on 4 August 2025).
- Son, M.; Jang, J.; Kim, M. Lightweight Query Checkpoint: Classifying Faulty User Queries to Mitigate Hallucinations in Large Language Model Question Answering. July 2025. Available online: https://openreview.net/pdf?id=n9C8u6tpT4 (accessed on 4 August 2025).
- Yin, F.; Srinivasa, J.; Chang, K.-W. Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension. February 2024. Available online: http://arxiv.org/abs/2402.18048 (accessed on 12 August 2025).
- Dai, D.; Dong, L.; Hao, Y.; Sui, Z.; Chang, B.; Wei, F. Knowledge Neurons in Pretrained Transformers. March 2022. Available online: http://arxiv.org/abs/2104.08696 (accessed on 12 August 2025).
- Raunak, V.; Menezes, A.; Junczys-Dowmunt, M. The Curious Case of Hallucinations in Neural Machine Translation. April 2021. Available online: http://arxiv.org/abs/2104.06683 (accessed on 12 August 2025).
- Jiang, C.; Qi, B.; Hong, X.; Fu, D.; Cheng, Y.; Meng, F.; Yu, M.; Zhou, B.; Zhou, J. On Large Language Models’ Hallucination with Regard to Known Facts. March 2024. Available online: http://arxiv.org/abs/2403.20009 (accessed on 12 August 2025).
- Ji, Z.; Chen, D.; Ishii, E.; Cahyawijaya, S.; Bang, Y.; Wilie, B.; Fung, P. LLM Internal States Reveal Hallucination Risk Faced with a Query. July 2024. Available online: http://arxiv.org/abs/2407.03282 (accessed on 12 August 2025).
- Yin, K.; Neubig, G. Interpreting Language Models with Contrastive Explanations. February 2022. Available online: http://arxiv.org/abs/2202.10419 (accessed on 12 August 2025).
- Yu, L.; Cao, M.; Cheung, J.C.K.; Dong, Y. Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations. June 2024. Available online: http://arxiv.org/abs/2403.18167 (accessed on 12 August 2025).
- Kapoor, S.; Stroebl, B.; Siegel, Z.S.; Nadgir, N.; Narayanan, A. AI Agents That Matter. July 2024. Available online: http://arxiv.org/abs/2407.01502 (accessed on 12 August 2025).
- Thórisson, K.; Helgason, H. Cognitive Architectures and Autonomy: A Comparative Review. J. Artif. Gen. Intell. 2012, 3, 1–30. [Google Scholar] [CrossRef]
- Du, Y.; Li, S.; Torralba, A.; Tenenbaum, J.B.; Mordatch, I. Improving Factuality and Reasoning in Language Models through Multiagent Debate. May 2023. Available online: http://arxiv.org/abs/2305.14325 (accessed on 12 August 2025).
- Amer, A.E.; Amer, M. Using Multi-Agent Architecture to Mitigate the Risk of LLM Hallucinations. July 2025. Available online: https://arxiv.org/pdf/2507.01446 (accessed on 5 July 2025).
- Huh, D.; Mohapatra, P. Multi-Agent Reinforcement Learning: A Comprehensive Survey. July 2024. Available online: http://arxiv.org/abs/2312.10256 (accessed on 12 August 2025).
- Pagnoni, A.; Balachandran, V.; Tsvetkov, Y. Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics. April 2021. Available online: http://arxiv.org/abs/2104.13346 (accessed on 12 August 2025).
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, 29 June 2005; pp. 65–72. Available online: https://aclanthology.org/W05-0909/ (accessed on 12 August 2025).
- Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Proceedings of the ACL Workshop on Text Summarization Branches Out, Barcelona, Spain, 25–26 July 2004; pp. 74–81. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. April 2019. Available online: http://arxiv.org/abs/1904.09675 (accessed on 12 August 2025).
- Chaturvedi, A.; Bhar, S.; Saha, S.; Garain, U.; Asher, N. Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering. Comput. Linguist. 2024, 50, 119–155. [Google Scholar] [CrossRef]
- Kryściński, W.; McCann, B.; Xiong, C.; Socher, R. Evaluating the Factual Consistency of Abstractive Text Summarization. October 2019. Available online: http://arxiv.org/abs/1910.12840 (accessed on 12 August 2025).
- Ramprasad, S.; Ferracane, E.; Lipton, Z.C. Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends. June 2024. Available online: http://arxiv.org/abs/2406.03487 (accessed on 12 August 2025).
- Hong, G.; Gema, A.P.; Saxena, R.; Du, X.; Nie, P.; Zhao, Y.; Perez-Beltrachini, L.; Ryabinin, M.; He, X.; Fourrier, C.; et al. The Hallucinations Leaderboard—An Open Effort to Measure Hallucinations in Large Language Models. April 2024. Available online: http://arxiv.org/abs/2404.05904 (accessed on 12 August 2025).
- Clark, C.; Lee, K.; Chang, M.; Kwiatkowski, T.; Collins, M.; Toutanova, K. BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. May 2019. Available online: https://arxiv.org/abs/1905.10044 (accessed on 12 August 2025).
- Muhlgay, D.; Ram, O.; Magar, I.; Levine, Y.; Ratner, N.; Belinkov, Y.; Abend, O.; Leyton-Brown, K.; Shashua, A.; Shoham, Y. Generating Benchmarks for Factuality Evaluation of Language Models. July 2023. Available online: http://arxiv.org/abs/2307.06908 (accessed on 12 August 2025).
- Chen, S.; Zhao, Y.; Zhang, J.; Chern, I.; Gao, S.; Liu, P.; He, J. FELM: Benchmarking Factuality Evaluation of Large Language Models. November 2023. Available online: http://arxiv.org/abs/2310.00741 (accessed on 12 August 2025).
- Thorne, J.; Vlachos, A.; Christodoulopoulos, C.; Mittal, A. FEVER: A Large-Scale Dataset for Fact Extraction and VERification. March 2018. Available online: http://arxiv.org/abs/1803.05355 (accessed on 12 August 2025).
- Huang, B.; Chen, C.; Xu, X.; Payani, A.; Shu, K. Can Knowledge Editing Really Correct Hallucinations? March 2025. Available online: http://arxiv.org/abs/2410.16251 (accessed on 12 August 2025).
- Bang, Y.; Ji, Z.; Schelten, A.; Hartshorn, A.; Fowler, T.; Zhang, C.; Cancedda, N.; Fung, P. HalluLens: LLM Hallucination Benchmark. April 2025. Available online: http://arxiv.org/abs/2504.17550 (accessed on 12 August 2025).
- Ravichander, A.; Ghela, S.; Wadden, D.; Choi, Y. HALoGEN: Fantastic LLM Hallucinations and Where to Find Them. January 2025. Available online: http://arxiv.org/abs/2501.08292 (accessed on 12 August 2025).
- Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; et al. Natural Questions: A Benchmark for Question Answering Research. Trans. Assoc. Comput. Linguist. 2019, 7, 452–466. [Google Scholar] [CrossRef]
- Joshi, M.; Choi, E.; Weld, D.S.; Zettlemoyer, L. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. May 2017. Available online: http://arxiv.org/abs/1705.03551 (accessed on 12 August 2025).
- Liang, X.; Song, S.; Niu, S.; Li, Z.; Xiong, F.; Tang, B.; Wang, Y.; He, D.; Cheng, P.; Wang, Z.; et al. UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation. November 2023. Available online: https://huggingface.co/papers/2311.15296 (accessed on 12 August 2025).
- Wang, X.; Hu, Z.; Lu, P.; Zhu, Y.; Zhang, J.; Subramaniam, S.; Loomba, A.R.; Zhang, S.; Sun, Y.; Wang, W. SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. July 2023. Available online: http://arxiv.org/abs/2307.10635 (accessed on 12 August 2025).
- Chen, Y.; Fu, Q.; Yuan, Y.; Wen, Z.; Fan, G.; Liu, D.; Zhang, D.; Li, Z.; Xiao, Y. Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models. In Proceedings of the International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 245–255. Available online: https://arxiv.org/pdf/2407.04121 (accessed on 12 August 2025).
- Laban, P.; Kryściński, W.; Agarwal, D.; Fabbri, A.R.; Xiong, C.; Joty, S.; Wu, C. LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond. May 2023. Available online: http://arxiv.org/abs/2305.14540 (accessed on 12 August 2025).
- Yoran, O.; Wolfson, T.; Ram, O.; Berant, J. Making Retrieval-Augmented Language Models Robust to Irrelevant Context. October 2023. Available online: http://arxiv.org/abs/2310.01558 (accessed on 12 August 2025).
- Honovich, O.; Choshen, L.; Aharoni, R.; Neeman, E.; Szpektor, I.; Abend, O. Q2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering. April 2021. Available online: http://arxiv.org/abs/2104.08202 (accessed on 12 August 2025).
- Malin, B.; Kalganova, T.; Boulgouris, N. A Review of Faithfulness Metrics for Hallucination Assessment in Large Language Models. 2024. Available online: http://arxiv.org/abs/2501.00269 (accessed on 12 August 2025).
- Huo, S.; Arabzadeh, N.; Clarke, C.L.A. Retrieving Supporting Evidence for LLMs Generated Answers. 2023. Available online: http://arxiv.org/abs/2306.13781 (accessed on 12 August 2025).
- Naseem, T.; Xu, G.; Swaminathan, S.; Yehudai, A.; Chaudhury, S.; Florian, R.; Astudillo, R.; Munawar, A. A Grounded Preference Model for LLM Alignment. 2024. Available online: https://aclanthology.org/2024.findings-acl.10 (accessed on 12 August 2025).
- Chen, E.; Kaushik, D.; Dhillon, G.; Wang, Y.; Hadsell, R.; Cohen, W.W. Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations. 2025. Available online: https://arxiv.org/abs/2504.14150 (accessed on 12 August 2025).
- Lanham, T.; Chen, A.; Radhakrishnan, A.; Steiner, B.; Denison, C.; Hernandez, D.; Li, D.; Durmus, E.; Hubinger, E.; Kernion, J.; et al. Measuring Faithfulness in Chain-of-Thought Reasoning. 2023. Available online: https://arxiv.org/abs/2307.13702 (accessed on 12 August 2025).
- Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71. Available online: https://www.bmj.com/content/372/bmj.n71 (accessed on 26 September 2025).

| Strategy | Extra LM Calls per Query | Context Growth | Other Modules | Latency Impact | Memory Impact | Notes |
|---|---|---|---|---|---|---|
| Prompt Engineering | +0 | ±0 | — | Low | Low | Sensitive to prompt design; cheapest mitigation. |
| Decoding constraints/contrastive decoding | +0 | ±0 | per-token ops | Low–Med | Low | Per-token overhead (≈10–40%) depending on constraint strength. |
| RAG (BM25/embedding + reranker + generator) | +0–1 (reranker) | High (k × chunk_len) | retriever, index I/O | Med–High | Med | Cost scales with k (docs) and chunk size; reranker adds one pass. |
| RAG + post-gen verifier (claim checker, NLI) | +1 (verifier) | High | retriever + verifier | High | Med–High | Better precision; ~1 extra LM pass for verification. |
| Self-verification/critic (same/smaller LM) | +1 (critic) | ±0 | — | Med–High | Low–Med | One additional pass; loops increase cost linearly. |
| Agentic pipelines | +m (m stages) | High | tool calls/IO | High–Very High | Med–High | Cost ≈ m × single-pass + retrieval/verification overheads. |
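
To make the scaling relationships in the table concrete, the following minimal Python sketch estimates a rough per-query token cost for each strategy family. All constants (prompt and answer lengths, chunk size, number of retrieved documents k, number of agent stages m) are illustrative assumptions for this sketch, not figures reported in the surveyed papers.

```python
# Back-of-the-envelope per-query token cost for the mitigation strategies above.
# Constants below are illustrative assumptions, not measured values.

BASE_PROMPT_TOKENS = 500   # assumed instructions + user question
BASE_ANSWER_TOKENS = 300   # assumed generated answer length


def lm_pass(prompt_tokens: int, output_tokens: int) -> int:
    """Tokens processed by one LM call (prompt read + tokens generated)."""
    return prompt_tokens + output_tokens


def prompt_engineering_cost() -> int:
    # +0 extra LM calls, ~no context growth: a single pass.
    return lm_pass(BASE_PROMPT_TOKENS, BASE_ANSWER_TOKENS)


def rag_cost(k: int = 5, chunk_len: int = 400, verifier: bool = False) -> int:
    # Context grows by k retrieved chunks; an optional post-generation
    # verifier (claim checker / NLI) adds roughly one more LM pass.
    generation = lm_pass(BASE_PROMPT_TOKENS + k * chunk_len, BASE_ANSWER_TOKENS)
    verification = lm_pass(BASE_ANSWER_TOKENS + k * chunk_len, 100) if verifier else 0
    return generation + verification


def self_critic_cost(rounds: int = 1) -> int:
    # One extra pass per critique round; cost grows linearly with loop count.
    first = lm_pass(BASE_PROMPT_TOKENS, BASE_ANSWER_TOKENS)
    critiques = rounds * lm_pass(BASE_PROMPT_TOKENS + BASE_ANSWER_TOKENS, BASE_ANSWER_TOKENS)
    return first + critiques


def agentic_cost(m: int = 4, per_stage_overhead: int = 200) -> int:
    # Roughly m single passes plus per-stage tool/retrieval overhead.
    return m * (lm_pass(BASE_PROMPT_TOKENS, BASE_ANSWER_TOKENS) + per_stage_overhead)


if __name__ == "__main__":
    print("prompt engineering :", prompt_engineering_cost())
    print("RAG (k=5)          :", rag_cost())
    print("RAG + verifier     :", rag_cost(verifier=True))
    print("self-critic (1x)   :", self_critic_cost())
    print("agentic (m=4)      :", agentic_cost())
```

Under these assumptions the ordering matches the table: prompt engineering stays at a single short pass, retrieval inflates the prompt in proportion to k × chunk_len, verification and self-critique each add roughly one full pass, and agentic pipelines multiply the single-pass cost by the number of stages.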
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).