Mind the Link: Discourse Link-Aware Hallucination Detection in Summarization
Abstract
1. Introduction
2. Related Work
2.1. Hallucination Detection in Summarization
2.2. Atomic Content Unit Decomposition
2.3. AMR Graph
3. Methods
3.1. Discourse Link-Aware Content Unit Decomposition Using AMR Graphs
3.1.1. Subgraph Extraction Based on Discourse Link
3.1.2. Post-Processing
| Algorithm 1 DL-ACU: Discourse Link-Aware ACU Decomposition | 
| Require: Summary S; AMR parser STOG; AMR-to-text generator GTOS; NLI entailment scorer fNLI_ENTAIL; coreference resolver Coref; thresholds τ, δ | 
| Require: AMR role sets: predicate detector isPredicate(·); discourse labels; rule set; skip set (:wiki, :name, :op) | 
| Require: Baseline ACUs (from the host hallucination detection system) | 
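As a rough structural sketch of how the inputs in the Require lines could fit together — hypothetical helper names and stub models, not the authors' implementation — the decomposition might look like:

```python
# Hypothetical sketch of a DL-ACU-style pipeline implied by the Require lines.
# stog/gtos/nli_entail/coref_resolve stand in for real AMR parsing, AMR-to-text,
# NLI, and coreference models; the stubs below are illustrative placeholders.

DISCOURSE_LABELS = {":condition", ":purpose", ":cause"}  # assumed example labels
SKIP_ROLES = {":wiki", ":name", ":op"}

def stog(sentence):
    # AMR parser stub: sentence -> graph as a list of (head, role, dependent) edges
    return [("believe-01", ":condition", "notice-01")]

def gtos(subgraph):
    # AMR-to-text stub: subgraph -> surface sentence
    return "a couple was fighting in a car"

def nli_entail(premise, hypothesis):
    # NLI entailment-probability stub
    return 0.9

def coref_resolve(text, summary):
    # coreference resolver stub: replace pronouns using summary context
    return text

def dl_acu(summary, baseline_acus, tau=0.5):
    """Augment baseline ACUs with units extracted at discourse-link roles."""
    acus = list(baseline_acus)
    for sentence in summary.split(". "):
        graph = stog(sentence)
        # subgraph extraction: follow edges whose role is a discourse label
        for head, role, dep in graph:
            if role in SKIP_ROLES or role not in DISCOURSE_LABELS:
                continue
            candidate = coref_resolve(gtos([(head, role, dep)]), summary)
            # post-processing: keep only units entailed by the source sentence
            if nli_entail(sentence, candidate) >= tau:
                acus.append(candidate)
    return acus
```

The δ threshold from the Require line (presumably used in post-processing filtering) is omitted here for brevity.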
3.1.3. Discussion and Practical Considerations
3.2. Selective Document-Atomic Content Unit Decomposition
3.2.1. Entailment-Based Selection
3.2.2. Atomic Content Unit Decomposition
3.2.3. Discussion and Practical Considerations
4. Experiments
4.1. Experimental Setup
4.2. Evaluation Benchmarks
5. Results
5.1. Discourse Link-Aware Content Unit
5.2. Selective Document-Atomic Content Unit
5.3. Synergistic Effects and Overall Performance
6. Conclusions
7. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| ACU | Atomic Content Unit | 
| DL-ACU | Discourse Link-Aware Content Unit | 
| SD-ACU | Selective Document-Atomic Content Unit | 
References
- Liu, Y.; Fabbri, A.; Liu, P.; Zhao, Y.; Nan, L.; Han, R.; Han, S.; Joty, S.; Wu, C.S.; Xiong, C.; et al. Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 4140–4170. [Google Scholar]
- Yang, J.; Yoon, S.; Kim, B.; Lee, H. FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 30–45. [Google Scholar]
- Scirè, A.; Ghonim, K.; Navigli, R. FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 14148–14161. [Google Scholar]
- Zhang, H.; Xu, Y.; Perez-Beltrachini, L. Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), St. Julian’s, Malta, 17–22 March 2024; pp. 1701–1722. [Google Scholar]
- Pagnoni, A.; Balachandran, V.; Tsvetkov, Y. Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 4812–4829. [Google Scholar]
- Luo, G.; Fan, W.; Li, M.; He, Y.; Yang, Y.; Bao, F. On the Intractability to Synthesize Factual Inconsistencies in Summarization. In Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, St. Julian’s, Malta, 17–22 March 2024; pp. 1026–1037. [Google Scholar]
- Qiu, H.; Huang, K.H.; Qu, J.; Peng, N. AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 16–21 June 2024; pp. 594–608. [Google Scholar]
- Laban, P.; Schnabel, T.; Bennett, P.N.; Hearst, M.A. SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization. Trans. Assoc. Comput. Linguist. 2022, 10, 163–177. [Google Scholar] [CrossRef]
- Zha, Y.; Yang, Y.; Li, R.; Hu, Z. AlignScore: Evaluating Factual Consistency with A Unified Alignment Function. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 11328–11348. [Google Scholar]
- Stacey, J.; Minervini, P.; Dubossarsky, H.; Camburu, O.M.; Rei, M. Atomic Inference for NLI with Generated Facts as Atoms. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 10188–10204. [Google Scholar]
- Mitra, A.; Corro, L.D.; Mahajan, S.; Codas, A.; Simoes, C.; Agarwal, S.; Chen, X.; Razdaibiedina, A.; Jones, E.; Aggarwal, K.; et al. Orca 2: Teaching Small Language Models How to Reason. arXiv 2023, arXiv:2311.11045. [Google Scholar] [CrossRef]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; Schneider, N. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 8–9 August 2013; pp. 178–186. [Google Scholar]
- Chen, Y.; Eger, S. MENLI: Robust Evaluation Metrics from Natural Language Inference. Trans. Assoc. Comput. Linguist. 2023, 11, 804–825. [Google Scholar] [CrossRef]
- Kryscinski, W.; McCann, B.; Xiong, C.; Socher, R. Evaluating the Factual Consistency of Abstractive Text Summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 9332–9346. [Google Scholar]
- Bao, F.S.; Li, M.; Qu, R.; Luo, G.; Wan, E.; Tang, Y.; Fan, W.; Tamber, M.S.; Kazi, S.; Sourabh, V.; et al. FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Albuquerque, NM, USA, 29 April–4 May 2025; pp. 448–461. [Google Scholar]
- Scialom, T.; Dray, P.A.; Lamprier, S.; Piwowarski, B.; Staiano, J.; Wang, A.; Gallinari, P. QuestEval: Summarization Asks for Fact-based Evaluation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 6594–6604. [Google Scholar]
- Fabbri, A.; Wu, C.S.; Liu, W.; Xiong, C. QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 2587–2601. [Google Scholar]
- Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. G-Eval: NLG Evaluation using Gpt-4 with Better Human Alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 2511–2522. [Google Scholar]
- Wan, D.; Sinha, K.; Iyer, S.; Celikyilmaz, A.; Bansal, M.; Pasunuru, R. ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 10036–10056. [Google Scholar]
- Ribeiro, L.F.R.; Liu, M.; Gurevych, I.; Dreyer, M.; Bansal, M. FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 3238–3253. [Google Scholar]
- Goyal, T.; Durrett, G. Evaluating Factuality in Generation with Dependency-level Entailment. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 3592–3603. [Google Scholar]
- Nenkova, A.; Passonneau, R. Evaluating Content Selection in Summarization: The Pyramid Method. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, Boston, MA, USA, 2–7 May 2004; pp. 145–152. [Google Scholar]
- Goodman, M.W. Penman: An Open-Source Library and Tool for AMR Graphs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; pp. 312–319. [Google Scholar]
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880. [Google Scholar]
- Wein, S.; Opitz, J. A Survey of AMR Applications. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 6856–6875. [Google Scholar]
- Nawrath, M.; Nowak, A.; Ratz, T.; Walenta, D.; Opitz, J.; Ribeiro, L.; Sedoc, J.; Deutsch, D.; Mille, S.; Liu, Y.; et al. On the Role of Summary Content Units in Text Summarization Evaluation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Mexico City, Mexico, 16–21 June 2024; pp. 272–281. [Google Scholar]
- Martinelli, G.; Barba, E.; Navigli, R. Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 13380–13394. [Google Scholar]
- Kamoi, R.; Goyal, T.; Diego Rodriguez, J.; Durrett, G. WiCE: Real-World Entailment for Claims in Wikipedia. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 7561–7583. [Google Scholar]
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
- Schuster, T.; Fisch, A.; Barzilay, R. Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 624–643. [Google Scholar]
- Pradhan, S.; Moschitti, A.; Xue, N.; Uryupina, O.; Zhang, Y. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Proceedings of the Joint Conference on EMNLP and CoNLL, Shared Task, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 1–40. [Google Scholar]
- Tang, L.; Goyal, T.; Fabbri, A.; Laban, P.; Xu, J.; Yavuz, S.; Kryscinski, W.; Rousseau, J.; Durrett, G. Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 11626–11644. [Google Scholar]
- Fabbri, A.R.; Kryściński, W.; McCann, B.; Xiong, C.; Socher, R.; Radev, D. SummEval: Re-evaluating Summarization Evaluation. Trans. Assoc. Comput. Linguist. 2021, 9, 391–409. [Google Scholar] [CrossRef]
- Cao, S.; Wang, L. CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 6633–6649. [Google Scholar]
- Maynez, J.; Narayan, S.; Bohnet, B.; McDonald, R. On Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1906–1919. [Google Scholar]
- Huang, D.; Cui, L.; Yang, S.; Bao, G.; Wang, K.; Xie, J.; Zhang, Y. What Have We Achieved on Text Summarization? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 446–469. [Google Scholar]
- Goyal, T.; Durrett, G. Annotating and Modeling Fine-grained Factuality in Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 1449–1462. [Google Scholar]
- Cao, M.; Dong, Y.; Cheung, J. Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 3340–3354. [Google Scholar]
- Wang, A.; Cho, K.; Lewis, M. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5008–5020. [Google Scholar]
- See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1073–1083. [Google Scholar]
- Gehrmann, S.; Deng, Y.; Rush, A. Bottom-Up Abstractive Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4098–4109. [Google Scholar]
- Liu, Y.; Lapata, M. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3730–3740. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P.J. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020. [Google Scholar]
- Hermann, K.M.; Kočiský, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching Machines to Read and Comprehend. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 1693–1701. [Google Scholar]
- Narayan, S.; Cohen, S.B.; Lapata, M. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1797–1807. [Google Scholar]
- Falke, T.; Ribeiro, L.F.R.; Utama, P.A.; Dagan, I.; Gurevych, I. Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2214–2220. [Google Scholar]
- Adams, G.; Nguyen, B.; Smith, J.; Xia, Y.; Xie, S.; Ostropolets, A.; Deb, B.; Chen, Y.J.; Naumann, T.; Elhadad, N. What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 10520–10542. [Google Scholar]
- Cohan, A.; Dernoncourt, F.; Kim, D.S.; Bui, T.; Kim, S.; Chang, W.; Goharian, N. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 615–621. [Google Scholar]
- Huang, L.; Cao, S.; Parulian, N.; Ji, H.; Wang, L. Efficient Attentions for Long Document Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 1419–1436. [Google Scholar]
- Fabbri, A.; Li, I.; She, T.; Li, S.; Radev, D. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1074–1084. [Google Scholar]
- Zhong, M.; Yin, D.; Yu, T.; Zaidi, A.; Mutuma, M.; Jha, R.; Awadallah, A.H.; Celikyilmaz, A.; Liu, Y.; Qiu, X.; et al. QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 5905–5921. [Google Scholar]

| Method | CoGenSumm | FactCC | FRANK | SummEval | Polytope | XSumFaith | AVG | 
|---|---|---|---|---|---|---|---|
| Other comparators | |||||||
| DAE | 63.40 | 75.90 | 61.70 | 70.30 | 62.80 | 50.80 | 64.15 | 
| FactCC | 65.40 | 77.30 | 59.40 | 58.53 | 58.20 | 56.54 | 62.56 | 
| SummaC-ZS | 70.40 | 83.81 | 78.51 | 78.70 | 62.00 | 58.40 | 71.97 | 
| SummaC-Conv | 64.70 | 89.06 | 81.62 | 81.70 | 62.70 | 65.44 | 74.20 | 
| MENLI | 55.20 | 58.17 | 75.11 | 52.73 | 57.07 | 67.15 | 60.91 | 
| MFMA | 64.13 | 84.88 | 80.62 | 75.50 | 58.31 | 55.07 | 69.75 | 
| AlignScore | 70.16 | 85.88 | 82.06 | 56.42 | 60.64 | 73.44 | 71.43 | 
| InFusE | 75.61 | 87.59 | 79.39 | 73.54 | 54.44 | 67.64 | 73.04 | 
| FIZZ family | |||||||
| FIZZ (baseline) | 59.18 | 78.66 | 80.44 | 62.92 | 58.09 | 72.33 | 68.60 | 
| FIZZ + DL-ACU | 62.43 | 79.46 | 80.49 | 63.09 | 58.47 | 72.72 | 69.44 | 
| FIZZ + SD-ACU | 63.23 | 81.50 | 80.70 | 64.17 | 60.14 | 72.55 | 70.38 | 
| FIZZ + DL-ACU + SD-ACU | 63.45 | 81.52 | 80.68 | 63.35 | 58.89 | 72.97 | 70.14 | 
| FENICE family | |||||||
| FENICE (baseline) | 76.50 | 84.29 | 83.05 | 72.32 | 65.48 | 72.90 | 75.76 | 
| FENICE + DL-ACU | 76.43 | 84.29 | 82.34 | 72.29 | 65.49 | 72.62 | 75.58 | 
| FENICE + SD-ACU | 78.02 | 85.33 | 83.28 | 76.66 | 66.79 | 72.66 | 77.12 | 
| FENICE + DL-ACU + SD-ACU | 77.87 | 84.98 | 83.33 | 74.89 | 66.24 | 73.83 | 76.86 | 
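For reference, the AVG column appears to be the unweighted mean of the six benchmark scores; this can be checked against, e.g., the FIZZ baseline row:

```python
# Recompute an AVG cell as the unweighted mean of the six benchmark scores
# (FIZZ baseline row: CoGenSumm, FactCC, FRANK, SummEval, Polytope, XSumFaith).
fizz_baseline = [59.18, 78.66, 80.44, 62.92, 58.09, 72.33]
avg = round(sum(fizz_baseline) / len(fizz_baseline), 2)
print(avg)  # 68.6, matching the table's 68.60
```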
| Method | LinkE | CorefE | GramE | EntE | CircE | RelE | OutE | OtherE | 
|---|---|---|---|---|---|---|---|---|
| FIZZ family | ||||||||
| FIZZ (baseline) | 45.45 | 70.75 | 34.69 | 30.10 | 21.79 | 15.13 | 5.35 | 75.00 | 
| FIZZ + DL-ACU | 38.64 | 66.98 | 33.33 | 25.37 | 20.51 | 14.47 | 4.71 | 60.00 | 
| Reduction vs. baseline (pp) | 6.81 | 3.77 | 1.36 | 4.73 | 1.28 | 0.66 | 0.64 | 15.00 | 
| FENICE family | ||||||||
| FENICE (baseline) | 90.91 | 94.34 | 72.11 | 65.92 | 55.13 | 49.34 | 32.33 | 90.00 | 
| FENICE + DL-ACU | 79.55 | 93.40 | 73.47 | 68.16 | 60.26 | 49.34 | 34.48 | 90.00 | 
| Reduction vs. baseline (pp) | 11.36 | 0.94 | −1.36 | −2.24 | −5.13 | 0 | −2.15 | 0 | 
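The per-error-type comparison rows follow the convention baseline minus variant, so a positive value is a percentage-point reduction and a negative value an increase — consistent with every entry in both tables:

```python
# Percentage-point comparison convention used in the error-type tables:
# baseline minus variant (positive = reduction, negative = increase).
def pp_reduction(baseline, variant):
    return round(baseline - variant, 2)

print(pp_reduction(90.91, 79.55))  # 11.36 (FENICE, LinkE: reduced)
print(pp_reduction(72.11, 73.47))  # -1.36 (FENICE, GramE: increased)
```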
| Method | AggreFact-Cnn-FtSota | AggreFact-XSum-FtSota | AVG | 
|---|---|---|---|
| Other comparators | |||
| DAE | 59.40 | 73.10 | 66.25 | 
| FactCC | 57.60 | 54.88 | 56.24 | 
| SummaC-ZS | 65.19 | 54.08 | 59.64 | 
| SummaC-Conv | 61.72 | 63.52 | 62.62 | 
| MENLI | 62.24 | 65.30 | 63.77 | 
| MFMA | 61.86 | 55.00 | 58.43 | 
| AlignScore | 62.72 | 69.44 | 66.08 | 
| InFusE | 64.51 | 65.82 | 65.16 | 
| FIZZ family | |||
| FIZZ (baseline) | 65.86 | 69.25 | 67.56 | 
| FIZZ + DL-ACU | 67.11 | 69.82 | 68.47 | 
| FIZZ + SD-ACU | 66.46 | 69.44 | 67.95 | 
| FIZZ + DL-ACU + SD-ACU | 67.71 | 70.18 | 68.95 | 
| FENICE family | |||
| FENICE (baseline) | 66.23 | 73.83 | 70.03 | 
| FENICE + DL-ACU | 66.29 | 74.08 | 70.19 | 
| FENICE + SD-ACU | 68.43 | 73.44 | 70.94 | 
| FENICE + DL-ACU + SD-ACU | 67.17 | 74.86 | 71.02 | 
| Method | CSM | MNW | QMS | AXV | GOV | AVG | 
|---|---|---|---|---|---|---|
| Other comparators | ||||||
| FactCC | 50.36 | 34.41 | 46.62 | 61.87 | 66.97 | 52.05 | 
| SummaC-ZS | 59.36 | 46.72 | 44.92 | 68.16 | 72.58 | 58.75 | 
| SummaC-Conv | 53.76 | 52.70 | 49.44 | 61.50 | 71.13 | 57.71 | 
| MENLI | 53.19 | 60.53 | 44.45 | 66.85 | 34.39 | 51.48 | 
| MFMA | 60.34 | 49.94 | 43.33 | 72.30 | 60.32 | 57.65 | 
| AlignScore | 58.59 | 42.69 | 59.21 | 72.77 | 85.25 | 63.30 | 
| InFusE | 46.04 | 40.05 | 46.71 | 70.49 | 78.19 | 56.70 | 
| FIZZ family | ||||||
| FIZZ (baseline) | 54.01 | 38.67 | 51.22 | 62.15 | 64.16 | 54.44 | 
| FIZZ + DL-ACU | 53.34 | 38.20 | 49.34 | 61.34 | 64.52 | 53.34 | 
| FIZZ + SD-ACU | 54.27 | 39.82 | 51.79 | 62.81 | 66.20 | 54.98 | 
| FIZZ + DL-ACU + SD-ACU | 54.89 | 40.16 | 52.44 | 62.62 | 66.02 | 55.23 | 
| FENICE family | ||||||
| FENICE (baseline) | 54.83 | 43.50 | 55.92 | 75.94 | 74.34 | 60.91 | 
| FENICE + DL-ACU | 52.60 | 43.96 | 56.11 | 76.13 | 70.23 | 59.41 | 
| FENICE + SD-ACU | 51.57 | 47.18 | 50.00 | 73.83 | 77.47 | 63.57 | 
| FENICE + DL-ACU + SD-ACU | 50.60 | 45.45 | 52.16 | 72.27 | 75.84 | 59.26 | 
| Method | FaithBench | 
|---|---|
| Other comparators | |
| FactCC | 49.54 | 
| SummaC-ZS | 47.52 | 
| SummaC-Conv | 52.16 | 
| MENLI | 49.78 | 
| MFMA | 52.09 | 
| AlignScore | 48.33 | 
| InFusE | 49.87 | 
| FIZZ family | |
| FIZZ (baseline) | 52.50 | 
| FIZZ + DL-ACU | 53.32 | 
| FIZZ + SD-ACU | 54.27 | 
| FIZZ + DL-ACU + SD-ACU | 54.57 | 
| FENICE family | |
| FENICE (baseline) | 59.05 | 
| FENICE + DL-ACU | 60.20 | 
| FENICE + SD-ACU | 60.81 | 
| FENICE + DL-ACU + SD-ACU | 61.00 | 
| ID | FRANK Summary | FIZZ (Orca-2) | FENICE (T5-base) | Our Method (DL-ACU) | 
|---|---|---|---|---|
| 1 | However, the shooter tried to ram the gates before firing at the guard at least once. | | | | 
| 2 | Police believe the shooter barricaded himself inside after noticing a couple fighting in a car. | | | | 
| 3 | French scientists say they have found a way to hide the earth’s vast mountains of Mont Blanc. | | | | 
| 4 | The banks in the Indian capital, Delhi, have been shut down because of corruption and corruption, the BBC has learned. | | | | 
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, D.; Jung, H.; Choi, Y.S. Mind the Link: Discourse Link-Aware Hallucination Detection in Summarization. Appl. Sci. 2025, 15, 10506. https://doi.org/10.3390/app151910506