Soft-Community Kernel Rényi Spectrum for Semantic Uncertainty Estimation in Large Language Models
Abstract
1. Introduction
- We propose a soft-community formulation for semantic uncertainty estimation in LLMs, representing multiple sampled generations as a weighted semantic graph and inferring soft community memberships instead of enforcing hard semantic equivalence classes.
- We introduce a kernel-based Rényi spectral uncertainty estimator, which quantifies semantic uncertainty via the Rényi entropy of the semantic kernel spectrum, generalizing von Neumann entropy and enabling tunable sensitivity to dominant and long-tail semantic modes.
- We present a unified information-theoretic framework that decouples semantic structure discovery from uncertainty quantification, providing a principled and extensible perspective on semantic uncertainty estimation for LLMs.
- Extensive experiments demonstrate that the proposed method yields more stable, discriminative, and sample-efficient uncertainty estimates under limited sampling budgets and noisy semantic judgments compared to existing semantic entropy approaches.
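The kernel-based Rényi spectral estimator named in the contributions can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's exact implementation: the unit-trace normalization, the eigenvalue clipping threshold, and the explicit von Neumann branch at α → 1 are implementation assumptions.

```python
import numpy as np

def renyi_spectral_uncertainty(K, alpha=2.0):
    """Renyi entropy of the spectrum of a semantic kernel (sketch).

    K:     symmetric PSD kernel over sampled generations (assumed input).
    alpha: Renyi order; the alpha -> 1 limit recovers von Neumann entropy.
    """
    K_hat = K / np.trace(K)                # normalize to unit trace (density-like)
    lam = np.linalg.eigvalsh(K_hat)        # spectrum of the normalized kernel
    lam = np.clip(lam, 0.0, None)          # clip tiny negative numerical noise
    lam = lam[lam > 1e-12]                 # drop numerically zero modes
    if abs(alpha - 1.0) < 1e-8:
        # von Neumann (Shannon) limit of the Renyi family
        return float(-np.sum(lam * np.log(lam)))
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))
```

A uniform spectrum (e.g., the identity kernel over 4 modes) attains the maximum log 4, while a rank-1 kernel (all generations semantically identical) gives zero uncertainty, matching the intended behavior of a semantic uncertainty score.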
2. Related Work
2.1. Uncertainty Estimation in Large Language Models
2.2. Hallucinations and Confabulations in Large Language Models
3. Method
3.1. Problem Setup
3.2. Semantic Similarity and Graph Construction
3.3. Soft Community Representation
3.4. Semantic Kernel and Rényi Spectral Uncertainty
| Algorithm 1: Soft-Community Rényi Semantic Uncertainty |
1. Sampling. Sample $N$ responses $s_1, \dots, s_N \sim p_\theta(\cdot \mid x)$ from the LLM conditioned on the input $x$.
2. Semantic similarity. Obtain sentence embeddings $e_i$ for each response $s_i$. Compute the embedding similarity $a_{ij}^{\mathrm{emb}}$ (rescaled to $[0,1]$) and the symmetric NLI similarity $a_{ij}^{\mathrm{nli}}$. Fuse the similarities to form the graph weights $w_{ij} = \beta\, a_{ij}^{\mathrm{emb}} + (1-\beta)\, a_{ij}^{\mathrm{nli}}$.
3. Spectral embedding. Compute the degree matrix $D$ with $D_{ii} = \sum_j w_{ij}$ and the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$. Compute the $C$ smallest eigenpairs of $L$ and form $U = [u_1, \dots, u_C] \in \mathbb{R}^{N \times C}$.
4. Soft communities. Compute soft memberships by row-wise softmax: $Q_{ic} = \exp(U_{ic}/\tau) \big/ \sum_{c'} \exp(U_{ic'}/\tau)$.
5. Kernel construction. Compute community prototypes and weights from $Q$, construct the semantic kernel $K$, and normalize it to unit trace: $\hat{K} = K / \operatorname{tr}(K)$.
6. Rényi spectral uncertainty. Compute the eigenvalues $\{\mu_i\}$ of $\hat{K}$ and return $U_\alpha = \frac{1}{1-\alpha} \log \sum_i \mu_i^{\alpha}$.
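The end-to-end pipeline of Algorithm 1 can be sketched as follows, with precomputed similarity matrices standing in for the embedding and NLI models. The fusion weight, softmax temperature, community count, and the kernel construction via $Q^\top W Q$ are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def soft_community_rsu(A_emb, A_nli, beta=0.5, n_communities=3,
                       tau=0.1, alpha=2.0):
    """Sketch of soft-community Renyi semantic uncertainty (assumed notation).

    A_emb: embedding similarities rescaled to [0, 1], shape (N, N).
    A_nli: symmetrized NLI similarities, shape (N, N).
    """
    # Fuse the two similarity views into graph weights W
    W = beta * A_emb + (1.0 - beta) * A_nli
    np.fill_diagonal(W, 1.0)

    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt

    # Spectral embedding: eigenvectors of the C smallest eigenvalues
    _, eigvecs = np.linalg.eigh(L)          # eigh returns ascending order
    U = eigvecs[:, :n_communities]

    # Soft community memberships via row-wise softmax (temperature tau)
    logits = U / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    Q = np.exp(logits)
    Q /= Q.sum(axis=1, keepdims=True)

    # Community-level semantic kernel (one assumed construction), unit trace
    K = Q.T @ W @ Q
    K_hat = K / np.trace(K)

    # Renyi entropy of the kernel spectrum
    lam = np.clip(np.linalg.eigvalsh(K_hat), 0.0, None)
    lam = lam[lam > 1e-12]
    if abs(alpha - 1.0) < 1e-8:
        return float(-np.sum(lam * np.log(lam)))  # von Neumann limit
    return float(np.log(np.sum(lam ** alpha)) / (1.0 - alpha))
```

As a sanity check, a set of mutually identical responses (all similarities equal to one) collapses to a rank-one kernel and yields zero uncertainty, whereas mutually dissimilar responses spread mass across several spectral modes and yield strictly higher uncertainty.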
3.5. Theoretical Properties of Rényi Spectral Uncertainty
4. Experiments
4.1. Experimental Setup
4.2. Evaluation Tasks and Data
4.3. Evaluation Metrics and Baselines
4.3.1. Evaluation Metrics
4.3.2. Baselines
4.4. Main Results
4.4.1. AUROC Performance
4.4.2. AUARC Performance
4.5. Ablation Studies
4.5.1. Effect of Rényi Order
4.5.2. Effect of Fusion Weight
4.5.3. Soft vs. Hard Semantic Community Assignment
4.6. Sensitivity Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A. Proofs of Theoretical Properties
References
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. Emergent Abilities of Large Language Models. arXiv 2022, arXiv:2206.07682. [Google Scholar] [CrossRef]
- Shorinwa, O.; Mei, Z.; Lidard, J.; Ren, A.Z.; Majumdar, A. A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions. ACM Comput. Surv. 2025, 58, 63. [Google Scholar] [CrossRef]
- Liu, X.; Chen, T.; Da, L.; Chen, C.; Lin, Z.; Wei, H. Uncertainty quantification and confidence calibration in large language models: A survey. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V. 2; Association for Computing Machinery: New York, NY, USA, 2025; pp. 6107–6117. [Google Scholar]
- Penny-Dimri, J.C.; Bachmann, M.; Cooke, W.R.; Mathewlynn, S.; Dockree, S.; Tolladay, J.; Kossen, J.; Li, L.; Gal, Y.; Jones, G.D. Measuring large language model uncertainty in women’s health using semantic entropy and perplexity: A comparative study. Lancet Obstet. Gynaecol. Women’s Health 2025, 1, e47–e56. [Google Scholar] [CrossRef]
- Dahl, M.; Magesh, V.; Suzgun, M.; Ho, D.E. Large legal fictions: Profiling legal hallucinations in large language models. J. Leg. Anal. 2024, 16, 64–93. [Google Scholar] [CrossRef]
- Hu, H.; He, C.; Xie, X.; Zhang, Q. LRP4RAG: Detecting hallucinations in retrieval-augmented generation via layer-wise relevance propagation. arXiv 2024, arXiv:2408.15533. [Google Scholar] [CrossRef]
- Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning; PMLR: New York, NY, USA, 2016; pp. 1050–1059. [Google Scholar]
- Kendall, A.; Gal, Y. What uncertainties do we need in Bayesian deep learning for computer vision? In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Farquhar, S.; Kossen, J.; Kuhn, L.; Gal, Y. Detecting hallucinations in large language models using semantic entropy. Nature 2024, 630, 625–630. [Google Scholar] [CrossRef]
- Nikitin, A.; Kossen, J.; Gal, Y.; Marttinen, P. Kernel language entropy: Fine-grained uncertainty quantification for LLMs from semantic similarities. Adv. Neural Inf. Process. Syst. 2024, 37, 8901–8929. [Google Scholar]
- Qiu, X.; Miikkulainen, R. Semantic density: Uncertainty quantification for large language models through confidence measurement in semantic space. Adv. Neural Inf. Process. Syst. 2024, 37, 134507–134533. [Google Scholar]
- Bowman, S.; Angeli, G.; Potts, C.; Manning, C.D. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing; Association for Computational Linguistics: Stroudsburg, PA, USA, 2015; pp. 632–642. [Google Scholar]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP); Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; p. 3982. [Google Scholar]
- Von Neumann, J. Mathematical Foundations of Quantum Mechanics: New Edition; Princeton University Press: Princeton, NJ, USA, 2018. [Google Scholar]
- Giraldo, L.G.S.; Rao, M.; Principe, J.C. Measures of entropy from data using infinitely divisible kernels. IEEE Trans. Inf. Theory 2014, 61, 535–548. [Google Scholar] [CrossRef]
- Bach, F. Information theory with kernel methods. IEEE Trans. Inf. Theory 2022, 69, 752–775. [Google Scholar] [CrossRef]
- Blondel, V.D.; Guillaume, J.L.; Lambiotte, R.; Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp. 2008, 2008, P10008. [Google Scholar] [CrossRef]
- Traag, V.A.; Waltman, L.; Van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233. [Google Scholar] [CrossRef]
- Lin, Z.; Trivedi, S.; Sun, J. Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models. arXiv 2024, arXiv:2305.19187. [Google Scholar] [CrossRef]
- Da, L.; Chen, T.; Cheng, L.; Wei, H. LLM uncertainty quantification through directional entailment graph and claim level response augmentation. arXiv 2024, arXiv:2407.00994. [Google Scholar] [CrossRef]
- Grewal, Y.S.; Bonilla, E.V.; Bui, T.D. Improving uncertainty quantification in large language models via semantic embeddings. arXiv 2024, arXiv:2410.22685. [Google Scholar] [CrossRef]
- Li, Z.; Shen, S.; Yang, W.; Jin, R.; Chen, H.; Ren, J. Enhancing Uncertainty Quantification in Large Language Models through Semantic Graph Density. In Proceedings of the 41st Conference on Uncertainty in Artificial Intelligence; PMLR: New York, NY, USA, 2025. [Google Scholar]
- Kuhn, L.; Gal, Y.; Farquhar, S. Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Zhang, C.; Liu, F.; Basaldella, M.; Collier, N. LUQ: Long-text Uncertainty Quantification for LLMs. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 5244–5262. [Google Scholar]
- Jiang, M.; Ruan, Y.; Sattigeri, P.; Roukos, S.; Hashimoto, T. Graph-based uncertainty metrics for long-form language model generations. Adv. Neural Inf. Process. Syst. 2024, 37, 32980–33006. [Google Scholar]
- Fang, X.; Huang, Z.; Tian, Z.; Fang, M.; Pan, Z.; Fang, Q.; Wen, Z.; Pan, H.; Li, D. Zero-resource hallucination detection for text generation via graph-based contextual knowledge triples modeling. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2025; Volume 39, pp. 23868–23877. [Google Scholar]
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 42. [Google Scholar] [CrossRef]
- Berglund, L.; Tong, M.; Kaufmann, M.; Balesni, M.; Stickland, A.C.; Korbak, T.; Evans, O. The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A”. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Zheng, L.; Chiang, W.L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.; et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Adv. Neural Inf. Process. Syst. 2023, 36, 46595–46623. [Google Scholar]
- Sui, Y.; Ren, J.; Tan, H.; Chen, H.; Li, Z.; Wang, J. Enhancing LLM’s Reliability by Iterative Verification Attributions with Keyword Fronting. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Berlin/Heidelberg, Germany, 2024; pp. 251–268. [Google Scholar]
- Cohen, R.; Hamri, M.; Geva, M.; Globerson, A. LM vs LM: Detecting Factual Errors via Cross Examination. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023. [Google Scholar]
- Azaria, A.; Mitchell, T. The Internal State of an LLM Knows When It’s Lying. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 967–976. [Google Scholar]
- Williams, A.; Nangia, N.; Bowman, S. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2018; pp. 1112–1122. [Google Scholar]
- He, P.; Liu, X.; Gao, J.; Chen, W. DeBERTa: Decoding-enhanced BERT with disentangled attention. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 4 May 2021. [Google Scholar]
- Reddy, S.; Chen, D.; Manning, C.D. CoQA: A conversational question answering challenge. Trans. Assoc. Comput. Linguist. 2019, 7, 249–266. [Google Scholar] [CrossRef]
- Joshi, M.; Choi, E.; Weld, D.S.; Zettlemoyer, L. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Stroudsburg, PA, USA, 2017; pp. 1601–1611. [Google Scholar]
- Tsatsaronis, G.; Balikas, G.; Malakasiotis, P.; Partalas, I.; Zschunke, M.; Alvers, M.R.; Weissenborn, D.; Krithara, A.; Petridis, S.; Polychronopoulos, D.; et al. An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinform. 2015, 16, 138. [Google Scholar] [CrossRef]
- Kwiatkowski, T.; Palomaki, J.; Redfield, O.; Collins, M.; Parikh, A.; Alberti, C.; Epstein, D.; Polosukhin, I.; Devlin, J.; Lee, K.; et al. Natural questions: A benchmark for question answering research. Trans. Assoc. Comput. Linguist. 2019, 7, 453–466. [Google Scholar] [CrossRef]

AUROC (%) across models and datasets (mean ± standard deviation; higher is better).
| Dataset | Entropy-Based Methods | Graph-Based Methods | Consistency-Based Methods | Ours | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| SE | DSE | KLE | Ecc | EigV | Deg | D-UE | NSS | SEU | RSU | |
| Llama-3.2-1B | ||||||||||
| NQ | 77.48 ± 0.55 | 76.50 ± 0.53 | 75.36 ± 0.50 | 76.66 ± 0.62 | 75.87 ± 0.57 | 77.18 ± 0.51 | 71.93 ± 0.63 | 76.43 ± 0.56 | 67.38 ± 0.72 | 77.90 ± 0.54 |
| CoQA | 73.73 ± 0.21 | 73.22 ± 0.22 | 74.59 ± 0.28 | 73.84 ± 0.29 | 70.82 ± 0.25 | 75.73 ± 0.27 | 73.95 ± 0.28 | 72.51 ± 0.22 | 69.80 ± 0.27 | 75.75 ± 0.28 |
| BioASQ | 86.87 ± 0.45 | 86.76 ± 0.47 | 86.73 ± 0.40 | 86.79 ± 0.45 | 85.53 ± 0.42 | 87.25 ± 0.39 | 85.62 ± 0.44 | 86.36 ± 0.44 | 78.78 ± 0.51 | 87.55 ± 0.39 |
| TriviaQA | 82.17 ± 0.18 | 81.15 ± 0.16 | 80.41 ± 0.18 | 81.13 ± 0.18 | 78.84 ± 0.16 | 81.64 ± 0.16 | 79.25 ± 0.13 | 80.51 ± 0.15 | 77.04 ± 0.14 | 82.23 ± 0.16 |
| Average | 80.06 | 79.41 | 79.27 | 79.61 | 77.77 | 80.45 | 77.69 | 78.95 | 73.25 | 80.86 |
| Llama-3.1-8B | ||||||||||
| NQ | 78.30 ± 0.43 | 77.88 ± 0.47 | 77.55 ± 0.44 | 77.73 ± 0.46 | 76.26 ± 0.41 | 78.64 ± 0.44 | 75.00 ± 0.42 | 77.48 ± 0.47 | 71.03 ± 0.44 | 78.86 ± 0.43 |
| CoQA | 75.26 ± 0.36 | 74.89 ± 0.35 | 78.92 ± 0.27 | 76.97 ± 0.40 | 71.75 ± 0.33 | 80.04 ± 0.24 | 77.90 ± 0.30 | 74.14 ± 0.35 | 72.71 ± 0.33 | 80.32 ± 0.27 |
| BioASQ | 83.40 ± 0.47 | 83.35 ± 0.47 | 84.28 ± 0.45 | 83.03 ± 0.58 | 81.32 ± 0.42 | 84.73 ± 0.46 | 82.59 ± 0.57 | 82.45 ± 0.51 | 74.81 ± 0.74 | 84.92 ± 0.48 |
| TriviaQA | 85.95 ± 0.11 | 85.23 ± 0.13 | 85.67 ± 0.12 | 84.97 ± 0.24 | 83.27 ± 0.12 | 86.23 ± 0.12 | 84.51 ± 0.38 | 84.42 ± 0.13 | 81.95 ± 0.13 | 87.11 ± 0.13 |
| Average | 80.73 | 80.34 | 81.61 | 80.68 | 78.15 | 82.41 | 80.00 | 79.62 | 75.13 | 82.80 |
| Mistral-7B-v0.3 | ||||||||||
| NQ | 76.88 ± 0.60 | 76.88 ± 0.60 | 77.58 ± 0.55 | 77.24 ± 0.46 | 76.62 ± 0.37 | 77.42 ± 0.56 | 76.15 ± 0.45 | 76.67 ± 0.57 | 71.85 ± 0.43 | 77.79 ± 0.48 |
| CoQA | 75.82 ± 0.33 | 75.76 ± 0.29 | 77.60 ± 0.21 | 78.11 ± 0.35 | 72.18 ± 0.27 | 79.61 ± 0.28 | 78.44 ± 0.26 | 75.32 ± 0.28 | 73.47 ± 0.25 | 80.21 ± 0.24 |
| BioASQ | 80.86 ± 0.53 | 80.90 ± 0.50 | 83.66 ± 0.41 | 83.05 ± 0.50 | 82.66 ± 0.50 | 83.57 ± 0.53 | 80.54 ± 0.55 | 80.98 ± 0.49 | 67.84 ± 0.41 | 84.38 ± 0.52 |
| TriviaQA | 83.76 ± 0.29 | 83.53 ± 0.28 | 83.86 ± 0.28 | 83.74 ± 0.11 | 82.80 ± 0.12 | 85.04 ± 0.28 | 83.58 ± 0.28 | 82.98 ± 0.28 | 79.59 ± 0.12 | 85.22 ± 0.26 |
| Average | 79.33 | 79.27 | 80.68 | 80.54 | 78.57 | 81.41 | 79.68 | 78.99 | 73.19 | 81.90 |
| Mistral-Nemo-12B | ||||||||||
| NQ | 76.78 ± 0.59 | 76.35 ± 0.57 | 77.78 ± 0.58 | 76.55 ± 0.47 | 76.28 ± 0.58 | 76.92 ± 0.52 | 73.04 ± 0.44 | 75.84 ± 0.56 | 69.53 ± 0.39 | 78.38 ± 0.51 |
| CoQA | 76.08 ± 0.19 | 75.72 ± 0.24 | 78.09 ± 0.19 | 77.25 ± 0.19 | 71.11 ± 0.24 | 79.10 ± 0.14 | 77.01 ± 0.16 | 75.05 ± 0.23 | 72.41 ± 0.20 | 78.88 ± 0.23 |
| BioASQ | 81.66 ± 0.48 | 81.58 ± 0.56 | 84.54 ± 0.44 | 82.20 ± 0.39 | 81.90 ± 0.61 | 83.60 ± 0.38 | 79.55 ± 0.44 | 80.91 ± 0.57 | 69.64 ± 0.49 | 84.24 ± 0.41 |
| TriviaQA | 85.44 ± 0.10 | 84.88 ± 0.19 | 86.10 ± 0.10 | 84.61 ± 0.14 | 83.31 ± 0.11 | 86.29 ± 0.11 | 84.29 ± 0.11 | 84.07 ± 0.09 | 81.47 ± 0.11 | 86.93 ± 0.11 |
| Average | 79.99 | 79.63 | 81.63 | 80.15 | 78.15 | 81.48 | 78.47 | 78.97 | 73.26 | 82.11 |
AUARC (%) across models and datasets (mean ± standard deviation; higher is better).
| Dataset | Entropy-Based Methods | Graph-Based Methods | Consistency-Based Methods | Ours | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| SE | DSE | KLE | Ecc | EigV | Deg | D-UE | NSS | SEU | RSU | |
| Llama-3.2-1B | ||||||||||
| NQ | 27.54 ± 0.63 | 27.43 ± 0.63 | 27.08 ± 0.58 | 27.75 ± 0.49 | 26.50 ± 0.53 | 28.10 ± 0.57 | 25.91 ± 0.52 | 27.20 ± 0.54 | 23.70 ± 0.50 | 28.66 ± 0.56 |
| CoQA | 86.97 ± 0.28 | 86.32 ± 0.16 | 87.65 ± 0.11 | 87.25 ± 0.14 | 85.28 ± 0.12 | 87.98 ± 0.28 | 87.45 ± 0.10 | 86.08 ± 0.26 | 85.80 ± 0.11 | 87.89 ± 0.16 |
| BioASQ | 71.25 ± 0.86 | 71.01 ± 0.85 | 70.64 ± 0.93 | 70.63 ± 0.90 | 69.37 ± 0.88 | 70.88 ± 0.83 | 70.11 ± 0.86 | 70.62 ± 0.91 | 65.72 ± 0.81 | 71.33 ± 0.82 |
| TriviaQA | 52.04 ± 0.24 | 51.48 ± 0.23 | 51.39 ± 0.37 | 51.55 ± 0.24 | 49.35 ± 0.21 | 52.22 ± 0.24 | 50.47 ± 0.21 | 50.89 ± 0.23 | 48.30 ± 0.22 | 52.46 ± 0.26 |
| Average | 59.45 | 59.06 | 59.19 | 59.30 | 57.63 | 59.80 | 58.49 | 58.70 | 55.88 | 60.09 |
| Llama-3.1-8B | ||||||||||
| NQ | 51.74 ± 1.02 | 51.10 ± 1.11 | 51.39 ± 1.06 | 51.11 ± 0.90 | 49.51 ± 1.00 | 52.10 ± 0.92 | 49.66 ± 0.97 | 50.71 ± 1.20 | 46.83 ± 0.81 | 52.27 ± 0.88 |
| CoQA | 94.74 ± 0.29 | 94.79 ± 0.34 | 96.02 ± 0.19 | 95.80 ± 0.17 | 94.13 ± 0.24 | 96.30 ± 0.15 | 95.92 ± 0.16 | 94.69 ± 0.18 | 95.15 ± 0.15 | 96.39 ± 0.15 |
| BioASQ | 82.48 ± 0.77 | 82.30 ± 0.80 | 83.57 ± 0.67 | 82.97 ± 0.63 | 81.05 ± 0.79 | 83.94 ± 0.53 | 82.82 ± 0.54 | 83.21 ± 0.37 | 81.84 ± 0.36 | 84.23 ± 0.49 |
| TriviaQA | 84.12 ± 0.31 | 83.60 ± 0.33 | 84.18 ± 0.32 | 83.85 ± 0.32 | 82.50 ± 0.33 | 84.49 ± 0.29 | 83.57 ± 0.33 | 83.21 ± 0.37 | 81.84 ± 0.36 | 84.77 ± 0.32 |
| Average | 78.27 | 77.95 | 78.79 | 78.43 | 76.80 | 79.21 | 77.99 | 77.96 | 76.42 | 79.42 |
| Mistral-7B-v0.3 | ||||||||||
| NQ | 51.96 ± 0.61 | 51.49 ± 0.71 | 52.53 ± 0.50 | 52.22 ± 0.60 | 51.16 ± 0.56 | 52.75 ± 0.48 | 52.01 ± 0.51 | 51.27 ± 0.70 | 49.52 ± 0.64 | 53.19 ± 0.58 |
| CoQA | 92.83 ± 0.26 | 93.11 ± 0.22 | 94.29 ± 0.20 | 94.17 ± 0.16 | 91.95 ± 0.32 | 94.47 ± 0.13 | 94.32 ± 0.16 | 93.02 ± 0.18 | 93.28 ± 0.17 | 94.63 ± 0.16 |
| BioASQ | 80.46 ± 0.77 | 80.07 ± 0.66 | 82.27 ± 0.70 | 81.07 ± 0.59 | 80.34 ± 0.86 | 81.65 ± 0.63 | 80.18 ± 0.61 | 80.05 ± 0.78 | 73.21 ± 0.62 | 82.18 ± 0.61 |
| TriviaQA | 82.58 ± 0.25 | 82.53 ± 0.31 | 83.02 ± 0.25 | 82.31 ± 0.26 | 81.81 ± 0.33 | 83.11 ± 0.30 | 82.11 ± 0.30 | 82.23 ± 0.37 | 79.48 ± 0.35 | 84.26 ± 0.33 |
| Average | 76.96 | 76.80 | 78.03 | 77.44 | 76.32 | 78.00 | 77.16 | 76.64 | 73.87 | 78.57 |
| Mistral-Nemo-12B | ||||||||||
| NQ | 51.32 ± 1.30 | 51.27 ± 1.17 | 52.12 ± 1.16 | 51.18 ± 1.10 | 50.47 ± 1.28 | 51.70 ± 1.12 | 49.83 ± 1.12 | 50.82 ± 1.39 | 47.12 ± 1.02 | 52.28 ± 1.14 |
| CoQA | 93.35 ± 0.24 | 93.15 ± 0.24 | 94.24 ± 0.18 | 94.15 ± 0.17 | 91.82 ± 0.17 | 94.54 ± 0.13 | 94.10 ± 0.14 | 93.02 ± 0.23 | 93.03 ± 0.18 | 94.98 ± 0.16 |
| BioASQ | 82.31 ± 0.56 | 82.00 ± 0.66 | 83.55 ± 0.53 | 82.33 ± 0.53 | 81.63 ± 0.70 | 83.50 ± 0.45 | 81.49 ± 0.56 | 81.73 ± 0.59 | 76.35 ± 0.54 | 84.28 ± 0.60 |
| TriviaQA | 85.35 ± 0.26 | 85.07 ± 0.38 | 85.85 ± 0.32 | 85.14 ± 0.26 | 84.14 ± 0.32 | 85.93 ± 0.27 | 85.14 ± 0.27 | 84.63 ± 0.27 | 83.31 ± 0.27 | 86.34 ± 0.30 |
| Average | 78.08 | 77.87 | 78.94 | 78.20 | 77.02 | 78.92 | 77.64 | 77.55 | 74.95 | 79.47 |
Effect of the Rényi order α on AUROC (%).
| Rényi Order α | 0.2 | 0.5 | 1.01 | 2 | 5 | 10 |
|---|---|---|---|---|---|---|
| NQ | 73.25 ± 0.68 | 74.68 ± 0.64 | 75.64 ± 0.52 | 77.90 ± 0.54 | 76.91 ± 0.54 | 76.32 ± 0.56 |
| CoQA | 73.11 ± 0.38 | 74.33 ± 0.32 | 74.81 ± 0.28 | 75.75 ± 0.28 | 75.36 ± 0.29 | 74.78 ± 0.30 |
| BioASQ | 86.20 ± 0.48 | 86.77 ± 0.49 | 87.23 ± 0.40 | 87.55 ± 0.39 | 87.36 ± 0.40 | 87.34 ± 0.42 |
| TriviaQA | 79.92 ± 0.26 | 80.18 ± 0.26 | 80.76 ± 0.17 | 82.23 ± 0.16 | 81.45 ± 0.16 | 81.42 ± 0.18 |
| Average | 78.12 | 78.99 | 79.61 | 80.86 | 80.27 | 79.97 |
Effect of the fusion weight on AUROC (%).
| Fusion Weight | 0.01 | 0.25 | 0.5 | 0.75 | 0.99 |
|---|---|---|---|---|---|
| NQ | 76.98 ± 0.54 | 77.39 ± 0.55 | 77.90 ± 0.54 | 76.23 ± 0.62 | 75.31 ± 0.58 |
| CoQA | 75.21 ± 0.30 | 75.43 ± 0.28 | 75.75 ± 0.28 | 75.02 ± 0.33 | 74.71 ± 0.36 |
| BioASQ | 87.41 ± 0.42 | 87.57 ± 0.41 | 87.55 ± 0.39 | 87.23 ± 0.39 | 86.45 ± 0.40 |
| TriviaQA | 81.88 ± 0.16 | 82.16 ± 0.17 | 82.23 ± 0.16 | 81.38 ± 0.19 | 81.24 ± 0.20 |
| Average | 80.37 | 80.64 | 80.86 | 79.97 | 79.43 |
AUROC (%) for soft versus hard community assignment.
| Dataset | Soft Community | Hard Community |
|---|---|---|
| NQ | 77.90 ± 0.54 | 75.69 ± 0.55 |
| CoQA | 75.75 ± 0.28 | 74.31 ± 0.30 |
| BioASQ | 87.55 ± 0.39 | 86.36 ± 0.40 |
| TriviaQA | 82.23 ± 0.16 | 81.23 ± 0.17 |
| Average | 80.86 | 79.40 |
Effect of the number of sampled responses N on AUROC (%).
| N | 3 | 5 | 10 | 20 | 50 |
|---|---|---|---|---|---|
| NQ | 69.38 ± 0.56 | 74.55 ± 0.56 | 77.90 ± 0.54 | 77.91 ± 0.54 | 77.92 ± 0.56 |
| CoQA | 67.51 ± 0.33 | 73.42 ± 0.30 | 75.75 ± 0.28 | 75.82 ± 0.29 | 75.81 ± 0.30 |
| BioASQ | 78.29 ± 0.41 | 83.61 ± 0.40 | 87.55 ± 0.39 | 87.55 ± 0.41 | 87.52 ± 0.43 |
| TriviaQA | 76.45 ± 0.22 | 78.22 ± 0.20 | 82.23 ± 0.16 | 82.20 ± 0.16 | 82.28 ± 0.17 |
| Average | 72.91 | 77.45 | 80.86 | 80.87 | 80.88 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, Z.; Du, J. Soft-Community Kernel Rényi Spectrum for Semantic Uncertainty Estimation in Large Language Models. Entropy 2026, 28, 442. https://doi.org/10.3390/e28040442