Mind the Link: Discourse Link-Aware Hallucination Detection in Summarization
Abstract
1. Introduction
2. Related Works
2.1. Hallucination Detection in Summarization
2.2. Atomic Content Unit Decomposition
2.3. AMR Graph
3. Methods
3.1. Discourse Link-Aware Content Unit Decomposition Using AMR Graphs
3.1.1. Subgraph Extraction Based on Discourse Link
3.1.2. Post-Processing
Algorithm 1 DL-ACU: Discourse Link-Aware ACU Decomposition
Require: Summary S; STOG; GTOS; NLI scorer fNLI_ENTAIL; coreference resolver Coref; thresholds τ, δ
Require: AMR role sets: predicate detector isPredicate(·); discourse labels; rule set; skip set (:wiki, :name, :op)
Require: Baseline ACUs (from the host hallucination detection system)
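The requirements above outline the DL-ACU pipeline's inputs; the full algorithm body is not reproduced here. The core of the subgraph-extraction step (Section 3.1.1) can nevertheless be sketched: cut the summary's AMR graph at discourse-link edges so that each resulting connected component becomes a candidate content unit. The function name, the triple encoding, and the concrete discourse role set below are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
from collections import defaultdict

# Hypothetical discourse-link role set; the paper's actual label set may differ.
DISCOURSE_ROLES = {":cause", ":condition", ":purpose", ":concession"}

def split_on_discourse_links(triples, discourse_roles=DISCOURSE_ROLES):
    """Partition an AMR graph (given as (source, role, target) triples)
    into subgraphs by cutting edges whose role is a discourse link."""
    # Drop :instance triples; they label nodes rather than connect them.
    edges = [(s, r, t) for s, r, t in triples if r != ":instance"]
    # Keep only non-discourse edges; discourse edges are the cut points.
    kept = [(s, t) for s, r, t in edges if r not in discourse_roles]

    # Union-find over nodes connected by the kept edges.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    nodes = {n for s, t in [(s, t) for s, _, t in edges] for n in (s, t)}
    for n in nodes:
        find(n)
    for s, t in kept:
        union(s, t)

    # Each connected component is one candidate content unit.
    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())
```

In the full method, each node group would then be rendered back to text with the graph-to-sentence model (GTOS) and post-processed as in Section 3.1.2; this sketch covers only the cut-and-partition step.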
3.1.3. Discussion and Practical Considerations
3.2. Selective Document-Atomic Content Unit Decomposition
3.2.1. Entailment-Based Selection
3.2.2. Atomic Content Unit Decomposition
3.2.3. Discussion and Practical Considerations
4. Experiments
4.1. Experimental Setup
4.2. Evaluation Benchmarks
5. Results
5.1. Discourse Link-Aware Content Unit
5.2. Selective Document-Atomic Content Unit
5.3. Synergistic Effects and Overall Performance
6. Conclusions
7. Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
ACU | Atomic Content Unit |
DL-ACU | Discourse Link-Aware Content Unit |
SD-ACU | Selective Document-Atomic Content Unit |
References
- Liu, Y.; Fabbri, A.; Liu, P.; Zhao, Y.; Nan, L.; Han, R.; Han, S.; Joty, S.; Wu, C.S.; Xiong, C.; et al. Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 4140–4170.
- Yang, J.; Yoon, S.; Kim, B.; Lee, H. FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 30–45.
- Scirè, A.; Ghonim, K.; Navigli, R. FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 14148–14161.
- Zhang, H.; Xu, Y.; Perez-Beltrachini, L. Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), St. Julian’s, Malta, 17–22 March 2024; pp. 1701–1722.
- Pagnoni, A.; Balachandran, V.; Tsvetkov, Y. Understanding Factuality in Abstractive Summarization with FRANK: A Benchmark for Factuality Metrics. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 4812–4829.
- Luo, G.; Fan, W.; Li, M.; He, Y.; Yang, Y.; Bao, F. On the Intractability to Synthesize Factual Inconsistencies in Summarization. In Proceedings of the Findings of the Association for Computational Linguistics: EACL 2024, St. Julian’s, Malta, 17–22 March 2024; pp. 1026–1037.
- Qiu, H.; Huang, K.H.; Qu, J.; Peng, N. AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 16–21 June 2024; pp. 594–608.
- Laban, P.; Schnabel, T.; Bennett, P.N.; Hearst, M.A. SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization. Trans. Assoc. Comput. Linguist. 2022, 10, 163–177.
- Zha, Y.; Yang, Y.; Li, R.; Hu, Z. AlignScore: Evaluating Factual Consistency with A Unified Alignment Function. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 11328–11348.
- Stacey, J.; Minervini, P.; Dubossarsky, H.; Camburu, O.M.; Rei, M. Atomic Inference for NLI with Generated Facts as Atoms. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 10188–10204.
- Mitra, A.; Corro, L.D.; Mahajan, S.; Codas, A.; Simoes, C.; Agarwal, S.; Chen, X.; Razdaibiedina, A.; Jones, E.; Aggarwal, K.; et al. Orca 2: Teaching Small Language Models How to Reason. arXiv 2023, arXiv:2311.11045.
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
- Banarescu, L.; Bonial, C.; Cai, S.; Georgescu, M.; Griffitt, K.; Hermjakob, U.; Knight, K.; Koehn, P.; Palmer, M.; Schneider, N. Abstract Meaning Representation for Sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, Sofia, Bulgaria, 8–9 August 2013; pp. 178–186.
- Chen, Y.; Eger, S. MENLI: Robust Evaluation Metrics from Natural Language Inference. Trans. Assoc. Comput. Linguist. 2023, 11, 804–825.
- Kryscinski, W.; McCann, B.; Xiong, C.; Socher, R. Evaluating the Factual Consistency of Abstractive Text Summarization. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 9332–9346.
- Bao, F.S.; Li, M.; Qu, R.; Luo, G.; Wan, E.; Tang, Y.; Fan, W.; Tamber, M.S.; Kazi, S.; Sourabh, V.; et al. FaithBench: A Diverse Hallucination Benchmark for Summarization by Modern LLMs. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Albuquerque, NM, USA, 29 April–4 May 2025; pp. 448–461.
- Scialom, T.; Dray, P.A.; Lamprier, S.; Piwowarski, B.; Staiano, J.; Wang, A.; Gallinari, P. QuestEval: Summarization Asks for Fact-based Evaluation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 6594–6604.
- Fabbri, A.; Wu, C.S.; Liu, W.; Xiong, C. QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 2587–2601.
- Liu, Y.; Iter, D.; Xu, Y.; Wang, S.; Xu, R.; Zhu, C. G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 2511–2522.
- Wan, D.; Sinha, K.; Iyer, S.; Celikyilmaz, A.; Bansal, M.; Pasunuru, R. ACUEval: Fine-grained Hallucination Evaluation and Correction for Abstractive Summarization. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 10036–10056.
- Ribeiro, L.F.R.; Liu, M.; Gurevych, I.; Dreyer, M.; Bansal, M. FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 3238–3253.
- Goyal, T.; Durrett, G. Evaluating Factuality in Generation with Dependency-level Entailment. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 3592–3603.
- Nenkova, A.; Passonneau, R. Evaluating Content Selection in Summarization: The Pyramid Method. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004, Boston, MA, USA, 2–7 April 2004; pp. 145–152.
- Goodman, M.W. Penman: An Open-Source Library and Tool for AMR Graphs. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; pp. 312–319.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7871–7880.
- Wein, S.; Opitz, J. A Survey of AMR Applications. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 6856–6875.
- Nawrath, M.; Nowak, A.; Ratz, T.; Walenta, D.; Opitz, J.; Ribeiro, L.; Sedoc, J.; Deutsch, D.; Mille, S.; Liu, Y.; et al. On the Role of Summary Content Units in Text Summarization Evaluation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Mexico City, Mexico, 16–21 June 2024; pp. 272–281.
- Martinelli, G.; Barba, E.; Navigli, R. Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; pp. 13380–13394.
- Kamoi, R.; Goyal, T.; Diego Rodriguez, J.; Durrett, G. WiCE: Real-World Entailment for Claims in Wikipedia. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 7561–7583.
- Zhang, T.; Kishore, V.; Wu, F.; Weinberger, K.Q.; Artzi, Y. BERTScore: Evaluating Text Generation with BERT. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020.
- Schuster, T.; Fisch, A.; Barzilay, R. Get Your Vitamin C! Robust Fact Verification with Contrastive Evidence. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 624–643.
- Pradhan, S.; Moschitti, A.; Xue, N.; Uryupina, O.; Zhang, Y. CoNLL-2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes. In Proceedings of the Joint Conference on EMNLP and CoNLL, Shared Task, Jeju Island, Republic of Korea, 12–14 July 2012; pp. 1–40.
- Tang, L.; Goyal, T.; Fabbri, A.; Laban, P.; Xu, J.; Yavuz, S.; Kryscinski, W.; Rousseau, J.; Durrett, G. Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 11626–11644.
- Fabbri, A.R.; Kryściński, W.; McCann, B.; Xiong, C.; Socher, R.; Radev, D. SummEval: Re-evaluating Summarization Evaluation. Trans. Assoc. Comput. Linguist. 2021, 9, 391–409.
- Cao, S.; Wang, L. CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 6633–6649.
- Maynez, J.; Narayan, S.; Bohnet, B.; McDonald, R. On Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1906–1919.
- Huang, D.; Cui, L.; Yang, S.; Bao, G.; Wang, K.; Xie, J.; Zhang, Y. What Have We Achieved on Text Summarization? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 446–469.
- Goyal, T.; Durrett, G. Annotating and Modeling Fine-grained Factuality in Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 1449–1462.
- Cao, M.; Dong, Y.; Cheung, J. Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 3340–3354.
- Wang, A.; Cho, K.; Lewis, M. Asking and Answering Questions to Evaluate the Factual Consistency of Summaries. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 5008–5020.
- See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1073–1083.
- Gehrmann, S.; Deng, Y.; Rush, A. Bottom-Up Abstractive Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4098–4109.
- Liu, Y.; Lapata, M. Text Summarization with Pretrained Encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3730–3740.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language Models are Unsupervised Multitask Learners. OpenAI Blog 2019, 1, 9.
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P.J. PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020.
- Hermann, K.M.; Kočiský, T.; Grefenstette, E.; Espeholt, L.; Kay, W.; Suleyman, M.; Blunsom, P. Teaching Machines to Read and Comprehend. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 1693–1701.
- Narayan, S.; Cohen, S.B.; Lapata, M. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1797–1807.
- Falke, T.; Ribeiro, L.F.R.; Utama, P.A.; Dagan, I.; Gurevych, I. Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2214–2220.
- Adams, G.; Nguyen, B.; Smith, J.; Xia, Y.; Xie, S.; Ostropolets, A.; Deb, B.; Chen, Y.J.; Naumann, T.; Elhadad, N. What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 10520–10542.
- Cohan, A.; Dernoncourt, F.; Kim, D.S.; Bui, T.; Kim, S.; Chang, W.; Goharian, N. A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, LA, USA, 1–6 June 2018; pp. 615–621.
- Huang, L.; Cao, S.; Parulian, N.; Ji, H.; Wang, L. Efficient Attentions for Long Document Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 1419–1436.
- Fabbri, A.; Li, I.; She, T.; Li, S.; Radev, D. Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1074–1084.
- Zhong, M.; Yin, D.; Yu, T.; Zaidi, A.; Mutuma, M.; Jha, R.; Awadallah, A.H.; Celikyilmaz, A.; Liu, Y.; Qiu, X.; et al. QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 5905–5921.
Method | CoGenSumm | FactCC | FRANK | SummEval | Polytope | XSumFaith | AVG |
---|---|---|---|---|---|---|---|
Other comparators | |||||||
DAE | 63.40 | 75.90 | 61.70 | 70.30 | 62.80 | 50.80 | 64.15 |
FactCC | 65.40 | 77.30 | 59.40 | 58.53 | 58.20 | 56.54 | 62.56 |
SummaC-ZS | 70.40 | 83.81 | 78.51 | 78.70 | 62.00 | 58.40 | 71.97 |
SummaC-Conv | 64.70 | 89.06 | 81.62 | 81.70 | 62.70 | 65.44 | 74.20 |
MENLI | 55.20 | 58.17 | 75.11 | 52.73 | 57.07 | 67.15 | 60.91 |
MFMA | 64.13 | 84.88 | 80.62 | 75.50 | 58.31 | 55.07 | 69.75 |
AlignScore | 70.16 | 85.88 | 82.06 | 56.42 | 60.64 | 73.44 | 71.43 |
InFusE | 75.61 | 87.59 | 79.39 | 73.54 | 54.44 | 67.64 | 73.04 |
FIZZ family | |||||||
FIZZ (baseline) | 59.18 | 78.66 | 80.44 | 62.92 | 58.09 | 72.33 | 68.60 |
FIZZ + DL-ACU | 62.43 | 79.46 | 80.49 | 63.09 | 58.47 | 72.72 | 69.44 |
FIZZ + SD-ACU | 63.23 | 81.50 | 80.70 | 64.17 | 60.14 | 72.55 | 70.38 |
FIZZ + DL-ACU + SD-ACU | 63.45 | 81.52 | 80.68 | 63.35 | 58.89 | 72.97 | 70.14 |
FENICE family | |||||||
FENICE (baseline) | 76.50 | 84.29 | 83.05 | 72.32 | 65.48 | 72.90 | 75.76 |
FENICE + DL-ACU | 76.43 | 84.29 | 82.34 | 72.29 | 65.49 | 72.62 | 75.58 |
FENICE + SD-ACU | 78.02 | 85.33 | 83.28 | 76.66 | 66.79 | 72.66 | 77.12 |
FENICE + DL-ACU + SD-ACU | 77.87 | 84.98 | 83.33 | 74.89 | 66.24 | 73.83 | 76.86 |
Method | LinkE | CorefE | GramE | EntE | CircE | RelE | OutE | OtherE |
---|---|---|---|---|---|---|---|---|
FIZZ family | ||||||||
FIZZ (baseline) | 45.45 | 70.75 | 34.69 | 30.10 | 21.79 | 15.13 | 5.35 | 75.00 |
FIZZ + DL-ACU | 38.64 | 66.98 | 33.33 | 25.37 | 20.51 | 14.47 | 4.71 | 60.00 |
Error reduction vs. baseline (pp) | 6.81 | 3.77 | 1.36 | 4.73 | 1.28 | 0.66 | 0.64 | 15.00 |
FENICE family | ||||||||
FENICE (baseline) | 90.91 | 94.34 | 72.11 | 65.92 | 55.13 | 49.34 | 32.33 | 90.00 |
FENICE + DL-ACU | 79.55 | 93.40 | 73.47 | 68.16 | 60.26 | 49.34 | 34.48 | 90.00 |
Error reduction vs. baseline (pp) | 11.36 | 0.94 | −1.36 | −2.24 | −5.13 | 0 | −2.15 | 0 |
Method | AggreFact-Cnn-FtSota | AggreFact-XSum-FtSota | AVG |
---|---|---|---|
Other comparators | |||
DAE | 59.40 | 73.10 | 66.25 |
FactCC | 57.60 | 54.88 | 56.24 |
SummaC-ZS | 65.19 | 54.08 | 59.64 |
SummaC-Conv | 61.72 | 63.52 | 62.62 |
MENLI | 62.24 | 65.30 | 63.77 |
MFMA | 61.86 | 55.00 | 58.43 |
AlignScore | 62.72 | 69.44 | 66.08 |
InFusE | 64.51 | 65.82 | 65.16 |
FIZZ family | |||
FIZZ (baseline) | 65.86 | 69.25 | 67.56 |
FIZZ + DL-ACU | 67.11 | 69.82 | 68.47 |
FIZZ + SD-ACU | 66.46 | 69.44 | 67.95 |
FIZZ + DL-ACU + SD-ACU | 67.71 | 70.18 | 68.95 |
FENICE family | |||
FENICE (baseline) | 66.23 | 73.83 | 70.03 |
FENICE + DL-ACU | 66.29 | 74.08 | 70.19 |
FENICE + SD-ACU | 68.43 | 73.44 | 70.94 |
FENICE + DL-ACU + SD-ACU | 67.17 | 74.86 | 71.02 |
Method | CSM | MNW | QMS | AXV | GOV | AVG |
---|---|---|---|---|---|---|
Other comparators | ||||||
FactCC | 50.36 | 34.41 | 46.62 | 61.87 | 66.97 | 52.05 |
SummaC-ZS | 59.36 | 46.72 | 44.92 | 68.16 | 72.58 | 58.75 |
SummaC-Conv | 53.76 | 52.70 | 49.44 | 61.50 | 71.13 | 57.71 |
MENLI | 53.19 | 60.53 | 44.45 | 66.85 | 34.39 | 51.48 |
MFMA | 60.34 | 49.94 | 43.33 | 72.30 | 60.32 | 57.65 |
AlignScore | 58.59 | 42.69 | 59.21 | 72.77 | 85.25 | 63.30 |
InFusE | 46.04 | 40.05 | 46.71 | 70.49 | 78.19 | 56.70 |
FIZZ family | ||||||
FIZZ (baseline) | 54.01 | 38.67 | 51.22 | 62.15 | 64.16 | 54.44 |
FIZZ + DL-ACU | 53.34 | 38.20 | 49.34 | 61.34 | 64.52 | 53.34 |
FIZZ + SD-ACU | 54.27 | 39.82 | 51.79 | 62.81 | 66.20 | 54.98 |
FIZZ + DL-ACU + SD-ACU | 54.89 | 40.16 | 52.44 | 62.62 | 66.02 | 55.23 |
FENICE family | ||||||
FENICE (baseline) | 54.83 | 43.50 | 55.92 | 75.94 | 74.34 | 60.91 |
FENICE + DL-ACU | 52.60 | 43.96 | 56.11 | 76.13 | 70.23 | 59.41 |
FENICE + SD-ACU | 51.57 | 47.18 | 50.00 | 73.83 | 77.47 | 63.57 |
FENICE + DL-ACU + SD-ACU | 50.60 | 45.45 | 52.16 | 72.27 | 75.84 | 59.26 |
Method | FaithBench |
---|---|
Other comparators | |
FactCC | 49.54 |
SummaC-ZS | 47.52 |
SummaC-Conv | 52.16 |
MENLI | 49.78 |
MFMA | 52.09 |
AlignScore | 48.33 |
InFusE | 49.87 |
FIZZ family | |
FIZZ (baseline) | 52.50 |
FIZZ + DL-ACU | 53.32 |
FIZZ + SD-ACU | 54.27 |
FIZZ + DL-ACU + SD-ACU | 54.57 |
FENICE family | |
FENICE (baseline) | 59.05 |
FENICE + DL-ACU | 60.20 |
FENICE + SD-ACU | 60.81 |
FENICE + DL-ACU + SD-ACU | 61.00 |
ID | FRANK Summary | FIZZ (ORCA-2) | FENICE (T5-base) | Our Method (DL-ACU) |
---|---|---|---|---|
1 | However, the shooter tried to ram the gates before firing at the guard at least once. | | | |
2 | Police believe the shooter barricaded himself inside after noticing a couple fighting in a car. | | | |
3 | French scientists say they have found a way to hide the earth’s vast mountains of Mont Blanc. | | | |
4 | The banks in the Indian capital, Delhi, have been shut down because of corruption and corruption, the BBC has learned. | | | |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lee, D.; Jung, H.; Choi, Y.S. Mind the Link: Discourse Link-Aware Hallucination Detection in Summarization. Appl. Sci. 2025, 15, 10506. https://doi.org/10.3390/app151910506