Measuring Semantic Coherence of RAG-Generated Abstracts Through Complex Network Metrics
Abstract
1. Introduction
- RQ1: Can graph-theoretic robustness metrics capture the semantic coherence of abstracts generated via retrieval-augmented generation (RAG)?
- RQ2: Is there an observable alignment, even if partial, between these graph-based metrics and human expert judgments of coherence and importance?
- H1: Abstracts with higher graph-theoretic robustness (e.g., global efficiency, spectral radius, algebraic connectivity) will be perceived by experts as more semantically coherent.
- H2: The configuration that maximizes robustness metrics will also correspond to higher levels of human agreement on coherence and importance.
- Main contributions.
- We propose a two-phase framework that separates generation from evaluation, in which a simple retrieval-augmented generation (RAG) system produces scientific abstracts, and their semantic coherence is assessed independently using graph-theoretic analysis.
- Each abstract is modeled as a semantic co-occurrence network, characterized through seven robustness metrics (e.g., global efficiency, spectral radius, algebraic connectivity), providing interpretable fingerprints of thematic coherence.
- We conduct a comprehensive experimental study across multiple LLMs, embeddings, and prompting strategies, showing that optimal configurations maximize graph robustness while also yielding higher human inter-rater agreement (weighted $\kappa$).
2. State of the Art
2.1. Retrieval-Augmented Generation
2.2. Complex Graph Networks
2.3. Semantic Graphs for Post-Generation Evaluation
3. Methodology
3.1. Data Collection
Listing 1. The query targets documents containing the phrase “mineral processing” in the title, abstract, or keywords (TITLE-ABS-KEY), restricted to open-access publications (OA, “all”).

```
TITLE-ABS-KEY (mineral processing) AND (LIMIT-TO (OA, "all"))
```
1. Title: document title.
2. DOI (digital object identifier): unique and persistent document identifier.
3. Abstract: summary of the document’s content.
3.2. Pipeline
3.2.1. Embedding Generation
- sentence-transformers/all-mpnet-base-v2: A small, efficient English model designed primarily for information retrieval and short-text clustering, with an input limit of 384 tokens. It produces 768-dimensional embeddings. This model was chosen for its strong balance between performance and size, making it suitable for resource-constrained environments [40].
- dunzhang/stella_en_400M_v5: A relatively small model with 400 million parameters, based on Alibaba-NLP/gte-Qwen2-1.5B-instruct and trained using Matryoshka Representation Learning (MRL) [41]. It supports English texts up to 512 tokens and generates 1024-dimensional embeddings. This model offers high-quality representations without the computational burden of larger models [42].
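For illustration, the sketch below shows how abstracts can be embedded with the sentence-transformers library. It is a minimal example under stated assumptions, not the study’s actual code: the loading call, batching, and sample texts are ours.

```python
# Minimal sketch (not the authors' code) of the embedding step in Section 3.2.1.
from sentence_transformers import SentenceTransformer

# Either of the two models compared in the study can be loaded this way.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

abstracts = [
    "Tailings are the residual material left after mineral extraction...",
    "Geotechnical stability of tailings dams depends on water content...",
]

# all-mpnet-base-v2 yields 768-dimensional vectors; normalization makes the
# dot score equivalent to cosine similarity (see Appendix A).
embeddings = model.encode(abstracts, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768)
```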
3.2.2. RAG
Listing 2. Example query submitted to an LLM within the RAG system. This prompt poses a question to the LLM, which generates an answer in the form of an abstract.

```
"What are tailings, and how do environmental, chemical, and geotechnical factors influence sustainable tailings management in mineral processing operations?"
```
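The retrieval step pairs such a query with a nearest-neighbor search over the corpus embeddings. The sketch below illustrates this with faiss (listed among the study’s key libraries); the index type, the value of k, and all names are assumptions rather than the paper’s implementation.

```python
# Hedged sketch of the retrieve-then-generate loop implied by Section 3.2.2.
import faiss
import numpy as np

def retrieve(query_vec: np.ndarray, index: faiss.Index, k: int = 5):
    """Return ids and scores of the k nearest abstracts (inner product = dot score)."""
    scores, ids = index.search(query_vec.reshape(1, -1).astype("float32"), k)
    return ids[0], scores[0]

# Build an inner-product index over precomputed corpus embeddings of shape [N, d].
corpus = np.random.rand(100, 768).astype("float32")  # stand-in for real embeddings
index = faiss.IndexFlatIP(corpus.shape[1])
index.add(corpus)

query = np.random.rand(768).astype("float32")  # stand-in for the embedded question
ids, scores = retrieve(query, index, k=5)
# The k retrieved abstracts are then inserted into Prompt A, B, or C and passed
# to the LLM, which generates the new abstract.
```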
3.2.3. Large Language Models
- meta-llama/Llama-3.2-3B-Instruct: Part of Meta’s Llama family, this 3-billion-parameter (3B) model is designed to follow instructions. Its relatively small size makes it efficient for deployment in resource-constrained environments without significantly compromising response quality in focused tasks [45].
- Qwen/Qwen2.5-3B-Instruct: Developed by Alibaba Cloud, this 3B model is also instruction-tuned. Qwen is known for its strong performance across a variety of language tasks, often outperforming larger models. Its inclusion enables a direct comparison with Llama-3.2-3B-Instruct, given their similar size and purpose, which is useful for evaluating the performance of different LLM architectures within the same parameter class [46,47].
- google/gemma-2b-it: This Google model, with 2 billion parameters, is the most size-comparable instruct version within the Gemma family. Built upon the same research as the Gemini models, Gemma is designed to be lightweight and efficient, making it suitable for local or resource-limited deployments. Despite its smaller size, it performs competitively on reasoning and well-scoped generation tasks [48].
- Prompt A: This prompt employs the most direct and basic strategy, known as zero-shot prompting. It provides only the retrieved abstracts and a clear instruction to generate a new abstract. The aim is to establish a performance baseline by assessing the model’s inherent ability to synthesize information without additional guidance. It serves as the control condition against which more advanced techniques are compared.
- Prompt B: This version introduces the instruction “take your time before answering.” It is inspired by Chain-of-Thought (CoT) prompting, which aims to improve reasoning by encouraging the model to reflect before generating an answer [52]. The prompt implicitly guides the model to (1) identify key points, (2) organize them logically, and (3) synthesize a coherent summary. Known as Zero-shot-CoT, this approach enhances fidelity and structure without requiring exemplars, simply by reformulating the instruction to promote deliberate processing [53].
- Prompt C: This prompt uses a role-assignment strategy, instructing the model to act as a “postdoctoral researcher.” The goal is to align the model’s output with an expert-level tone, terminology, and analytical perspective [54]. By adopting this role, the model is expected not only to summarize content but to do so with the rigor, structure, and stylistic conventions of academic discourse. This strategy provides deep contextual framing to elicit more sophisticated and domain-appropriate outputs [55].
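Because the verbatim prompt texts are given in Table A1 rather than reproduced here, the templates below are purely illustrative of the three strategies; only the “take your time before answering” cue (Prompt B) and the “postdoctoral researcher” role (Prompt C) come from the descriptions above.

```python
# Illustrative templates only; NOT the study's verbatim prompts (see Table A1).
CONTEXT = "{retrieved_abstracts}"  # filled in later via str.format()

PROMPT_A = (  # zero-shot baseline
    "Based on the following abstracts, write a new scientific abstract.\n" + CONTEXT
)
PROMPT_B = (  # Zero-shot-CoT cue quoted in the paper
    "Take your time before answering. Based on the following abstracts, "
    "write a new scientific abstract.\n" + CONTEXT
)
PROMPT_C = (  # role-assignment strategy
    "You are a postdoctoral researcher. Based on the following abstracts, "
    "write a new scientific abstract.\n" + CONTEXT
)

filled = PROMPT_B.format(retrieved_abstracts="<top-k retrieved abstracts here>")
```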
3.3. Complex Network Construction
3.3.1. Text Preprocessing
3.3.2. Complex Network
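As an illustrative sketch of this construction, the function below builds a word co-occurrence network with networkx, linking tokens that appear within a sliding window; the window size of 2 (adjacent words) is an assumption, and the paper’s exact preprocessing is described in Section 3.3.1.

```python
# Minimal sketch: co-occurrence network from a preprocessed token sequence.
import networkx as nx

def cooccurrence_graph(tokens: list[str], window: int = 2) -> nx.Graph:
    G = nx.Graph()
    G.add_nodes_from(set(tokens))
    for i, w in enumerate(tokens):
        # Connect each token to its neighbors inside the window.
        for j in range(i + 1, min(i + window, len(tokens))):
            if w != tokens[j]:
                G.add_edge(w, tokens[j])
    return G

tokens = "tailings management requires geotechnical and chemical monitoring".split()
G = cooccurrence_graph(tokens)
print(G.number_of_nodes(), G.number_of_edges())
```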
3.4. Structural Evaluation of the Generated Abstract
3.5. Robustness Metrics for Graphs
3.6. Experimental Setup
3.7. Evaluation of Thematic Coherence and Importance of RAG-Generated Abstracts
3.8. Experimental Design
- The first component represents the selected LLM architecture included in the study.
- The second component indicates the prompt type used, i.e., the textual input strategy guiding model generation. Variants are described in Table A1.
- The third component refers to the embedding model employed for the vector representation of text, used in retrieval or semantic comparison stages.
- The fourth component specifies the temperature applied during generation, which controls the model’s randomness; the scale ranges from near-deterministic (low-temperature) to more stochastic (high-temperature) values. The enumeration of the resulting grid is sketched below.
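A minimal sketch of this enumeration follows, with the component order (LLM, embedding model, prompt, temperature) and option counts taken from the options table in the appendix; the variable names are illustrative.

```python
# Full factorial grid of Section 3.8: 3 LLMs x 2 embeddings x 3 prompts
# x 10 temperatures = 180 configurations.
from itertools import product

llms = [
    "meta-llama/Llama-3.2-3B-Instruct",
    "Qwen/Qwen2.5-3B-Instruct",
    "google/gemma-2b-it",
]
embeddings = ["sentence-transformers/all-mpnet-base-v2", "dunzhang/stella_en_400M_v5"]
prompts = ["A", "B", "C"]
temperatures = [round(0.1 * t, 1) for t in range(1, 11)]

configs = list(product(llms, embeddings, prompts, temperatures))
print(len(configs))  # 180
```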
3.9. Metric Quantification
3.9.1. Difference for a Specific Metric
The area of the radar polygon formed by the normalized metrics is computed with the shoelace formula,

$$A_{\mathrm{radar}} = \frac{1}{2}\left|\sum_{i=1}^{n}\left(x_i\, y_{i+1} - x_{i+1}\, y_i\right)\right|, \qquad x_i = m_i \cos\theta_i,\quad y_i = m_i \sin\theta_i,$$

where:
- $m_i$ are the normalized metric values;
- $\theta_i$ are the angles assigned to each metric, uniformly distributed around a circle ($0$ to $2\pi$ radians);
- for the last point ($i = n$), $x_{n+1} = x_1$ and $y_{n+1} = y_1$ to close the polygon.
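A direct implementation of this area computation (shoelace formula over the normalized metric vector; the function name is illustrative) might read:

```python
# Sketch of the radar-polygon area from Section 3.9.1.
import numpy as np

def radar_area(m: np.ndarray) -> float:
    n = len(m)
    theta = 2 * np.pi * np.arange(n) / n             # uniform angles in [0, 2*pi)
    x, y = m * np.cos(theta), m * np.sin(theta)      # polygon vertices
    x, y = np.append(x, x[0]), np.append(y, y[0])    # close the polygon
    return 0.5 * abs(np.sum(x[:-1] * y[1:] - x[1:] * y[:-1]))

m = np.array([0.8, 0.6, 0.9, 0.7, 0.5, 0.85, 0.75])  # seven normalized metrics
print(radar_area(m))
```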
3.9.2. Design of Kruskal–Wallis Tests
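These tests compare radar-area distributions between groups. A minimal sketch with scipy.stats.kruskal is given below; the samples are synthetic stand-ins, and the H-based eta-squared effect size is an assumption consistent with the values reported in the results tables.

```python
# Sketch of a two-group Kruskal-Wallis comparison (Section 3.9.2).
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
area_avg = rng.normal(1.0, 0.3, 100)  # synthetic stand-ins for radar areas
area_new = rng.normal(1.2, 0.3, 100)

H, p = kruskal(area_avg, area_new)
n, k = len(area_avg) + len(area_new), 2
eta_sq = (H - k + 1) / (n - k)  # eta-squared computed from the H statistic
print(f"H={H:.4f}, p={p:.4g}, eta^2={eta_sq:.4f}")
```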
4. Results
4.1. Experiment 1
4.1.1. Analysis of Graphical Robustness Metrics Experiment 1
4.1.2. Metrics Experiment 1
4.1.3. Normality of the Data Experiment 1
4.1.4. Statistical Significance Experiment 1
4.2. Experiment 2
4.2.1. Graph Robustness Metrics Analysis Experiment 2
4.2.2. Metrics Experiment 2
4.2.3. Normality of the Data Experiment 2
4.2.4. Statistical Significance Experiment 2
4.3. Expert Evaluation: Meaningfulness & Importance
Convergent Validity with Expert Judgments
4.4. Synthesis of Findings
4.5. Discussion
4.6. Limitations
4.7. Future Work
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Symbol | Description |
|---|---|
| $A$ | Adjacency matrix of graph G |
| $A_{\mathrm{radar}}$ | Area of the radar polygon (for configuration comparison) |
| $C_G$ | Effective conductance of G |
| $D$ | Degree matrix of G |
| $d_j$ | Metric difference at iteration j |
| $\bar{d}$ | Mean of differences across iterations |
| $d_{ij}$ | Shortest-path length between nodes i and j |
| $E$ | Edge set of G |
| $E_{\mathrm{glob}}$ | Global efficiency of G |
| $G$ | Graph (semantic co-occurrence network) |
| $H$ | Kruskal–Wallis test statistic |
| $L$ | Graph Laplacian, $L = D - A$ |
| $N$ | Number of retrieved abstracts |
| $n$ | Number of nodes (context: graph or embedding dimension) |
| $R_G$ | Effective graph resistance |
| $u, v$ | Query and document embedding vectors |
| $\lVert u \rVert, \lVert v \rVert$ | Euclidean norms of u and v |
| $V$ | Vertex set of G |
| $\cos(u, v)$ | Cosine similarity |
| $u \cdot v$ | Dot-product similarity |
| $\bar{b}_e$ | Average edge betweenness in G |
| $\bar{\lambda}$ | Natural connectivity of G |
| $\kappa$ | Cohen’s kappa (inter-rater agreement) |
| $\Delta\lambda$ | Spectral gap ($\lambda_1 - \lambda_2$) |
| $\mu_2$ | Algebraic connectivity (Fiedler value) |
| $\lambda_1$ | Spectral radius |
| $\sigma$ | Standard deviation |
| $\tau$ | LLM temperature |
| $\theta_i$ | Angle for metric i in the radar chart |

| Acronym | Description |
|---|---|
| BERTScore | BERT-based semantic similarity metric |
| CoT | Chain-of-thought (prompting) |
| CSV | Comma-separated values |
| DOI | Digital object identifier |
| KG | Knowledge graph |
| LLM | Large language model |
| MRL | Matryoshka representation learning |
| NLP | Natural language processing |
| NPMI | Normalized pointwise mutual information |
| OA | Open access |
| PMI | Pointwise mutual information |
| PPMI | Positive pointwise mutual information |
| RAG | Retrieval-augmented generation |
| ROUGE | Recall-Oriented Understudy for Gisting Evaluation |
| SciBERT | Pretrained language model for scientific text |
| TF–IDF | Term frequency–inverse document frequency |
Appendix A
- Dot score, used with all-mpnet-base-v2, which produces normalized embeddings.
- Cosine similarity, used with other models that require prior normalization [73].
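Both scoring functions reduce to a dot product; the short sketch below (variable names illustrative) makes the distinction explicit.

```python
# Similarity functions from Appendix A; u = query embedding, v = document embedding.
import numpy as np

def dot_score(u: np.ndarray, v: np.ndarray) -> float:
    # Appropriate when embeddings are already L2-normalized (all-mpnet-base-v2).
    return float(np.dot(u, v))

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Normalizes explicitly, for models whose outputs are not unit-length.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```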


- Metrics

1. Global efficiency ($E_{\mathrm{glob}}$): The efficiency between two vertices i and j is defined as $\varepsilon_{ij} = 1/d_{ij}$ for all $i \neq j$, where $d_{ij}$ is the shortest path length between vertices i and j. The global efficiency of a graph G is denoted $E_{\mathrm{glob}}(G)$ and is calculated as the average of the efficiencies over all pairs of vertices:
$$E_{\mathrm{glob}}(G) = \frac{1}{n(n-1)} \sum_{i \neq j} \frac{1}{d_{ij}}.$$
This measure captures the overall information flow efficiency of the network, as proposed by Latora and Marchiori [62].

2. Average edge betweenness ($\bar{b}_e$): This measure is defined as the number of shortest paths that pass through an edge e out of the total possible shortest paths:
$$b_e = \sum_{s \neq t} \frac{\sigma_{st}(e)}{\sigma_{st}}, \qquad \bar{b}_e = \frac{1}{|E|} \sum_{e \in E} b_e,$$
where $\sigma_{st}(e)$ is the number of shortest paths between s and t that pass through e, and $\sigma_{st}$ is the total number of shortest paths between s and t. The smaller the average edge betweenness, the more robust the graph, since the shortest paths are more evenly distributed across the edges rather than relying on a few central edges [63].

3. Spectral gap ($\Delta\lambda$): This metric evaluates the efficiency with which information can flow through various routes in a graph. It is computed as the difference between the two largest eigenvalues of the adjacency matrix, $\Delta\lambda = \lambda_1 - \lambda_2$. A large $\Delta\lambda$ indicates a robust graph in which information can readily traverse alternative paths, suggesting minimal bottlenecks or weak links [64].

4. Natural connectivity ($\bar{\lambda}$): Natural connectivity can be interpreted as the “average eigenvalue” of the adjacency matrix A, and it is defined as follows:
$$\bar{\lambda} = \ln\left(\frac{1}{n} \sum_{i=1}^{n} e^{\lambda_i}\right).$$
It effectively measures the redundancy of pathways through the weighted count of closed walks. This metric is closely linked to the graph’s overall topology and its dynamics [65]. In simpler terms, a higher natural connectivity indicates the presence of more alternative paths, enhancing the graph’s robustness against disruptions.

5. Spectral radius ($\lambda_1$): The largest eigenvalue, $\lambda_1$, of an adjacency matrix A is called the spectral radius. It is closely linked to the graph’s capacity to manage information flow through its paths and loops [64]. In simple terms, the greater the number of distinct paths between nodes, the better connected the graph is. A graph with many loops and alternative routes will exhibit a larger spectral radius. Graphs with a high spectral radius tend to be more resilient against failures or attacks, as information can continue to flow efficiently via alternate paths.

6. Effective conductance ($C_G$): The effective resistance $R_{ij}$ quantifies the robustness of a graph by accounting for both the number of parallel paths and the length of each path between node pairs. Specifically, $R_{ij}$ is defined as the potential difference between nodes i and j when a unit current is injected at node i and withdrawn at node j. The effective graph resistance $R_G$ is the sum of $R_{ij}$ over all pairs of nodes in the graph. An effective method for computing $R_G$ is to express it in terms of the Laplacian eigenvalues:
$$R_G = n \sum_{i=2}^{n} \frac{1}{\mu_i},$$
where the $\mu_i$ ($i \geq 2$) are the non-zero eigenvalues of L, ordered so that $0 = \mu_1 \leq \mu_2 \leq \dots \leq \mu_n$. In this work, we employ a normalized version of $R_G$, termed conductance, which is defined as follows:
$$C_G = \frac{n-1}{R_G},$$
with $0 < C_G \leq 1$; a larger value of $C_G$ indicates a higher level of robustness [67].

7. Algebraic connectivity ($\mu_2$): The second smallest eigenvalue of the Laplacian matrix, also known as the algebraic connectivity or the Fiedler value, is a crucial measure of graph robustness [66]. Because the Laplacian L is symmetric and positive semidefinite, with each row summing to zero, its eigenvalues are real and non-negative. The smallest eigenvalue is always zero, and its multiplicity corresponds to the number of connected components in the graph. A higher $\mu_2$ indicates a more robust graph, meaning that networks with greater algebraic connectivity are more resistant to disconnection.
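A consolidated sketch of the seven metrics follows, based on the implementations listed in the metric table later in this appendix (networkx and NumPy). The conductance normalization $(n-1)/R_G$ reflects our reading of Equation (A7) and should be treated as an assumption.

```python
# Sketch: the seven robustness metrics of Section 3.5 / Appendix A.
import networkx as nx
import numpy as np

def robustness_metrics(G: nx.Graph) -> dict:
    A = nx.to_numpy_array(G)
    lam = np.sort(np.linalg.eigvalsh(A))[::-1]                 # adjacency spectrum, descending
    mu = np.linalg.eigvalsh(nx.laplacian_matrix(G).toarray())  # Laplacian spectrum, ascending
    n = G.number_of_nodes()
    R_G = n * np.sum(1.0 / mu[mu > 1e-10])                     # effective graph resistance
    return {
        "global_efficiency": nx.global_efficiency(G),
        "avg_edge_betweenness": float(np.mean(list(nx.edge_betweenness_centrality(G).values()))),
        "spectral_radius": float(lam[0]),
        "spectral_gap": float(lam[0] - lam[1]),
        "natural_connectivity": float(np.log(np.mean(np.exp(lam)))),
        "algebraic_connectivity": float(mu[1]),                # Fiedler value (connected graph)
        "conductance": float((n - 1) / R_G),                   # assumed normalization
    }

print(robustness_metrics(nx.karate_club_graph()))
```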
Table A1. Prompt variants (Prompts A, B, and C) used to guide abstract generation; the three strategies are described in Section 3.2.3.
- Mean of differences ($\bar{d}$)
- Standard deviation of differences ($\sigma_d$)
- Experimental Server
| Component | Specification |
|---|---|
| Operating system | Ubuntu 22.04.5 LTS (Jammy Jellyfish) |
| Kernel | Linux 6.8.0-65-generic |
| CPU | Intel Core i7-14700K (20 cores, 28 threads; base 3.4 GHz, turbo up to 5.6 GHz; 33 MB Smart Cache) |
| RAM | 128 GiB DDR5 3200 MHz (125 GiB visible to OS) |
| GPU | NVIDIA GeForce RTX 4080 (AD103, 16 GiB GDDR6X) |
| Storage | NVMe PCIe 4.0 ×4 SSD (1 TB) |
| Python environment | CPython 3.12.6 + pip 24.2 |
| Key libraries | PyTorch 2.5.1, Transformers 4.52.4, faiss-cpu 1.8, networkx 3.4.2, nvidia-cuda-toolkit 12.4.127 |




| Component | Available options |
|---|---|
| w—LLM | 0—meta-llama/Llama-3.2-3B-Instruct 1—Qwen/Qwen2.5-3B-Instruct 2—google/gemma-2b-it |
| x—Embedding model | 0—sentence-transformers/all-mpnet-base-v2 1—dunzhang/stella_en_400M_v5 |
| y—Prompt type | 0—Prompt A 1—Prompt B 2—Prompt C |
| z—Temperature | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 |


References
- Biemann, C.; Roos, S.; Weihe, K. Quantifying Semantics using Complex Network Analysis. In Proceedings of the COLING 2012, Mumbai, India, 8–15 December 2012; pp. 263–278.
- Edge, D.; Trinh, H.; Cheng, N.; Bradley, J.; Chao, A.; Mody, A.; Truitt, S.; Metropolitansky, D.; Ness, R.O.; Larson, J. From Local to Global: A Graph RAG Approach to Query-Focused Summarization. arXiv 2025, arXiv:2404.16130.
- Han, H.; Wang, Y.; Shomer, H.; Guo, K.; Ding, J.; Lei, Y.; Halappanavar, M.; Rossi, R.A.; Mukherjee, S.; Tang, X.; et al. Retrieval-Augmented Generation with Graphs (GraphRAG). arXiv 2025, arXiv:2501.00309.
- Havemann, F.; Scharnhorst, A. Bibliometric Networks. arXiv 2012, arXiv:1212.5211.
- Patel, A.; Summers, J.; Kumar, P.; Edwards, S. Investigating the Use of Concept Maps and Graph-Based Analysis to Evaluate Learning. In Proceedings of the 2024 ASEE Annual Conference & Exposition, Portland, OR, USA, 23–26 June 2024; Available online: https://www.scopus.com (accessed on 1 July 2025).
- Cohan, A.; Goharian, N. Revisiting Summarization Evaluation for Scientific Articles. arXiv 2016, arXiv:1604.00400.
- Lapata, M.; Barzilay, R. Automatic Evaluation of Text Coherence: Models and Representations. In Proceedings of the IJCAI 2005, Edinburgh, UK, 30 July–5 August 2005; pp. 1085–1090.
- Zhao, W.; Strube, M.; Eger, S. DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia, 2–6 May 2023; pp. 3865–3883.
- Pogorilyy, S.; Kramov, A. Assessment of Text Coherence by Constructing the Graph of Semantic, Lexical, and Grammatical Consistency of Phrases of Sentences. Cybern. Syst. Anal. 2020, 56, 893–899.
- Fan, W.; Ding, Y.; Ning, L.; Wang, S.; Li, H.; Yin, D.; Chua, T.S.; Li, Q. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. arXiv 2024, arXiv:2405.06211.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.-t.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2021, arXiv:2005.11401.
- Ayala, O.; Bechard, P. Reducing hallucination in structured outputs via Retrieval-Augmented Generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 16–21 June 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; Volume 6: Industry Track, pp. 228–238.
- Siriwardhana, S.; Weerasekera, R.; Wen, E.; Kaluarachchi, T.; Rana, R.; Nanayakkara, S. Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering. Trans. Assoc. Comput. Linguist. 2023, 11, 1–17.
- Kang, M.; Kwak, J.M.; Baek, J.; Hwang, S.J. Knowledge Graph-Augmented Language Models for Knowledge-Grounded Dialogue Generation. arXiv 2023, arXiv:2305.18846.
- Xiong, G.; Jin, Q.; Lu, Z.; Zhang, A. Benchmarking Retrieval-Augmented Generation for Medicine. arXiv 2024, arXiv:2402.13178.
- Luo, L.; Zhao, Z.; Haffari, G.; Phung, D.; Gong, C.; Pan, S. GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation. arXiv 2025, arXiv:2502.01113.
- Peng, B.; Zhu, Y.; Liu, Y.; Bo, X.; Shi, H.; Hong, C.; Zhang, Y.; Tang, S. Graph Retrieval-Augmented Generation: A Survey. arXiv 2024.
- Wang, X.; Chen, G. Complex networks: Small-world, scale-free and beyond. IEEE Circuits Syst. Mag. 2003, 3, 6–20.
- Borge-Holthoefer, J.; Arenas, A. Semantic Networks: Structure and Dynamics. Entropy 2010, 12, 1264–1302.
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3606–3611.
- Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599.
- Yang, R.; Yang, B.; Feng, A.; Ouyang, S.; Blum, M.; She, T.; Jiang, Y.; Lecue, F.; Lu, J.; Li, I. Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective. arXiv 2025, arXiv:2410.17600.
- Lehmann, F. Semantic networks. Comput. Math. Appl. 1992, 23, 1–50.
- Ma, N.; Politowicz, A.; Mazumder, S.; Chen, J.; Liu, B.; Robertson, E.; Grigsby, S. Semantic Novelty Detection in Natural Language Descriptions. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 866–882.
- Jeon, D.; Lee, J.; Ahn, J.M.; Lee, C. Measuring the novelty of scientific publications: A fastText and local outlier factor approach. J. Inf. 2023, 17, 101450.
- Ferrer-i Cancho, R.; Solé, R. The small world of human language. Proc. Biol. Sci./R. Soc. 2001, 268, 2261–2265.
- Masucci, A.; Rodgers, G. Network properties of written human language. Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 2006, 74, 026102.
- Pereira, H.; Fadigas, I.; Senna, V.; Moret, M. Semantic networks based on titles of scientific papers. Phys. A Stat. Mech. Its Appl. 2011, 390, 1192–1197.
- Amancio, D.R.; Machicao, J.; Quispe, L.V.C. Probing the statistical properties of enriched co-occurrence networks. arXiv 2024, arXiv:2412.02664.
- Serrano, M.; Boguñá, M.; Vespignani, A. Extracting the Multiscale Backbone of Complex Weighted Networks. Proc. Natl. Acad. Sci. USA 2009, 106, 6483–6488.
- Quispe, L.; Tohalino, J.; Amancio, D. Using virtual edges to improve the discriminability of co-occurrence text networks. Phys. A Stat. Mech. Its Appl. 2021, 562, 125344.
- Wang, K.; Ding, Y.; Han, S. Graph neural networks for text classification: A survey. Artif. Intell. Rev. 2024, 57, 190.
- Bullinaria, J.; Levy, J. Extracting semantic representations from word co-occurrence statistics: A computational study. Behav. Res. Methods 2007, 39, 510–526.
- Levy, O.; Goldberg, Y. Dependency-Based Word Embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, MD, USA, 22–27 June 2014; Volume 2, pp. 302–308.
- Church, K.; Hanks, P. Word Association Norms, Mutual Information, and Lexicography. Comput. Linguist. 2002, 16, 76–83.
- Dunning, T. Accurate Methods for the Statistics of Surprise and Coincidence. Comput. Linguist. 1993, 19, 61–74.
- Scopus. 2025. Available online: https://www.scopus.com (accessed on 1 July 2025).
- Wang, X.; Koç, Y.; Derrible, S.; Ahmad, S.N.; Pino, W.J.; Kooij, R.E. Multi-criteria robustness analysis of metro networks. Phys. A Stat. Mech. Its Appl. 2017, 474, 19–31.
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, M.; Wang, H. Retrieval-Augmented Generation for Large Language Models: A Survey. arXiv 2024, arXiv:2312.10997.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
- Kusupati, A.; Bhatt, G.; Rege, A.; Wallingford, M.; Sinha, A.; Ramanujan, V.; Howard-Snyder, W.; Chen, K.; Kakade, S.; Jain, P.; et al. Matryoshka Representation Learning. arXiv 2024, arXiv:2205.13147.
- Zhang, D.; Li, J.; Zeng, Z.; Wang, F. Jasper and Stella: Distillation of SOTA embedding models. arXiv 2025, arXiv:2412.19048.
- Muennighoff, N.; Tazi, N.; Magne, L.; Reimers, N. MTEB: Massive Text Embedding Benchmark. arXiv 2023, arXiv:2210.07316.
- Li, Z.; Zhang, X.; Zhang, Y.; Long, D.; Xie, P.; Zhang, M. Towards general text embeddings with multi-stage contrastive learning. arXiv 2023, arXiv:2308.03281.
- Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The Llama 3 Herd of Models. arXiv 2024, arXiv:2407.21783.
- Qwen Team. Qwen2.5: A Party of Foundation Models; GitHub: San Francisco, CA, USA, 2024.
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen Technical Report. arXiv 2023, arXiv:2309.16609.
- Gemma Team. Gemma; Kaggle: San Francisco, CA, USA, 2024.
- Sanh, V.; Webson, A.; Raffel, C.; Bach, S.H.; Sutawika, L.; Alyafeai, Z.; Chaffin, A.; Stiegler, A.; Scao, T.L.; Raja, A.; et al. Multitask Prompted Training Enables Zero-Shot Task Generalization. arXiv 2022, arXiv:2110.08207.
- Weller, O.; Van Durme, B.; Lawrie, D.; Paranjape, A.; Zhang, Y.; Hessel, J. Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models. arXiv 2024, arXiv:2409.11136.
- Yang, C.; Wang, X.; Lu, Y.; Liu, H.; Le, Q.V.; Zhou, D.; Chen, X. Large Language Models as Optimizers. arXiv 2024, arXiv:2309.03409.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
- Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.
- Wang, Z.M.; Peng, Z.; Que, H.; Liu, J.; Zhou, W.; Wu, Y.; Guo, H.; Gan, R.; Ni, Z.; Yang, J.; et al. RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models. arXiv 2024, arXiv:2310.00746.
- Kong, A.; Zhao, S.; Chen, H.; Li, Q.; Qin, Y.; Sun, R.; Zhou, X.; Wang, E.; Dong, X. Better Zero-Shot Reasoning with Role-Play Prompting. arXiv 2024, arXiv:2308.07702.
- Vijayarani, S.; Janani, R. Text mining: Open source tokenization tools-an analysis. Adv. Comput. Intell. Int. J. (ACII) 2016, 3, 37–47.
- Sun, Y.; Platoš, J. A method for constructing word sense embeddings based on word sense induction. Sci. Rep. 2023, 13, 12945.
- Ma, R.; Jin, L.; Liu, Q.; Chen, L.; Yu, K. Addressing the polysemy problem in language modeling with attentional multi-sense embeddings. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020.
- Saeidi, M.; Milios, E.; Zeh, N. Biomedical Word Sense Disambiguation with Contextualized Representation Learning. In Proceedings of the WWW ’22: Companion Proceedings of the Web Conference 2022, Virtual Event, 25–29 April 2022; pp. 843–848.
- Hagberg, A.; Conway, D. NetworkX: Network Analysis with Python. 2020, pp. 1–48. Available online: https://networkx.github.io (accessed on 1 July 2025).
- Liu, H.; Cong, J. Language clustering with word co-occurrence networks based on parallel texts. Chin. Sci. Bull. 2013, 58, 1139–1144.
- Latora, V.; Marchiori, M. Efficient behavior of small-world networks. Phys. Rev. Lett. 2001, 87, 198701.
- Freeman, L.C. A set of measures of centrality based on betweenness. Sociometry 1977, 40, 35–41.
- Van Mieghem, P. Graph Spectra for Complex Networks; Cambridge University Press: Cambridge, UK, 2023.
- Jun, W.; Barahona, M.; Yue-Jin, T.; Hong-Zhong, D. Natural connectivity of complex networks. Chin. Phys. Lett. 2010, 27, 078902.
- Fiedler, M. Algebraic connectivity of graphs. Czechoslov. Math. J. 1973, 23, 298–305.
- Ellens, W.; Spieksma, F.M.; Van Mieghem, P.; Jamakovic, A.; Kooij, R.E. Effective graph resistance. Linear Algebra Its Appl. 2011, 435, 2491–2506.
- Warrens, M.J. Cohen’s linearly weighted kappa is a weighted average. Adv. Data Anal. Classif. 2012, 6, 67–79.
- Renze, M. The effect of sampling temperature on problem solving in large language models. In Findings of the Association for Computational Linguistics: EMNLP 2024; Association for Computational Linguistics: Miami, FL, USA, 2024; pp. 7346–7356.
- Star, S.L.; Griesemer, J.R. Institutional ecology, translations and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39. Soc. Stud. Sci. 1989, 19, 387–420.
- Barbierato, E.; Gatti, A.; Incremona, A.; Pozzi, A.; Toti, D. Breaking Away From AI: The Ontological and Ethical Evolution of Machine Learning. IEEE Access 2025, 13, 55627–55647.
- Basile, P.; Siciliani, L.; Musacchio, E.; Semeraro, G. Exploring the Word Sense Disambiguation Capabilities of Large Language Models. arXiv 2025, arXiv:2503.08662.
- Toshevska, M.; Stojanovska, F.; Kalajdjieski, J. Comparative Analysis of Word Embeddings for Capturing Word Similarities. In Proceedings of the 6th International Conference on Natural Language Processing (NATP 2020), Copenhagen, Denmark, 25–26 April 2020; Aircc Publishing Corporation: Chennai, India, 2020; pp. 9–24.

| Metric | Definition | Reference | Implementation/Function |
|---|---|---|---|
| Global Efficiency ($E_{\mathrm{glob}}$) | Mean inverse shortest-path length between all node pairs; measures overall information flow efficiency. | [62] | nx.global_efficiency(G) |
| Average Edge Betweenness ($\bar{b}_e$) | Mean fraction of all-pairs shortest paths crossing each edge; its inverse reflects redundancy of routes. | [63] | nx.edge_betweenness_centrality(G) |
| Spectral Radius ($\lambda_1$) | Largest eigenvalue of the adjacency matrix A; indicates overall connectivity strength and resilience. | [64] | np.linalg.eigvals(nx.to_numpy_array(G)) |
| Spectral Gap ($\Delta\lambda$) | Difference between the two largest adjacency eigenvalues; captures alternative path diversity. | [64] | Computed from eigenvalues of A |
| Natural Connectivity ($\bar{\lambda}$) | Logarithm of the average of $e^{\lambda_i}$ across adjacency eigenvalues; measures redundancy of closed walks. | [65] | Custom NumPy implementation (Equation (A5)) |
| Algebraic Connectivity ($\mu_2$) | Second-smallest Laplacian eigenvalue (Fiedler value); quantifies robustness against disconnection. | [66] | nx.algebraic_connectivity(G) |
| Effective Conductance ($C_G$) | Normalized inverse of effective graph resistance (Equation (A7)); gauges ease of flow between all nodes. | [67] | Custom from Laplacian eigenvalues |
| Inverse Average Edge Betweenness ($1/\bar{b}_e$) | Inverse of $\bar{b}_e$ to ensure higher values ⇒ greater robustness. | [63] | nx.edge_betweenness_centrality(G) |
| Component | Available Options |
|---|---|
| LLM | meta-llama/Llama-3.2-3B-Instruct Qwen/Qwen2.5-3B-Instruct google/gemma-2b-it |
| Prompt type | Prompt A Prompt B Prompt C |
| Embedding model | sentence-transformers/all-mpnet-base-v2 dunzhang/stella_en_400M_v5 |
| Temperature | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 |
| LLM | Prompt | Embedding | Mean | Std | Min | Max | Median |
|---|---|---|---|---|---|---|---|
| Qwen | A | mpnet | 0.403 | 0.636 | −1.155 | 1.447 | 0.531 |
| | A | stella | −1.057 | 0.502 | −1.649 | −0.019 | −1.125 |
| | B | mpnet | 0.895 | 0.557 | −0.310 | 1.734 | 1.022 |
| | B | stella | 0.133 | 0.727 | −1.560 | 1.335 | 0.319 |
| | C | mpnet | 0.717 | 0.575 | −0.816 | 1.655 | 0.773 |
| | C | stella | 0.198 | 0.930 | −1.779 | 1.642 | 0.340 |
| Llama | A | mpnet | −0.104 | 0.756 | −1.392 | 1.253 | −0.211 |
| | A | stella | −1.652 | 0.129 | −1.840 | −1.322 | −1.671 |
| | B | mpnet | −0.187 | 0.590 | −1.179 | 0.845 | −0.081 |
| | B | stella | −0.364 | 0.571 | −1.222 | 0.827 | −0.334 |
| | C | mpnet | −0.219 | 0.584 | −1.246 | 0.975 | −0.241 |
| | C | stella | −0.734 | 0.671 | −1.715 | 0.597 | −0.919 |
| Gemma | A | mpnet | 1.148 | 0.471 | −0.314 | 1.817 | 1.304 |
| | A | stella | 0.369 | 0.627 | −0.702 | 1.337 | 0.504 |
| | B | mpnet | 1.400 | 0.251 | 0.653 | 1.815 | 1.421 |
| | B | stella | 0.658 | 0.794 | −1.128 | 1.769 | 0.845 |
| | C | mpnet | 1.174 | 0.508 | −0.740 | 1.657 | 1.277 |
| | C | stella | 0.654 | 0.795 | −1.456 | 1.688 | 0.859 |
| LLM | Prompt | Embedding | Global Efficiency (↑) | Avg. Edge Betweenness (↑) | Spectral Radius (↑) | Spectral Gap (↑) | Natural Connectivity (↑) | Algebraic Connectivity (↑) | Conductance (↑) |
|---|---|---|---|---|---|---|---|---|---|
| Qwen | A | mpnet | 0.0100 ± 0.023 | −0.0169 ± 0.025 | 0.0058 ± 0.115 | 0.0068 ± 0.007 | 0.0022 ± 0.030 | 0.0009 ± 0.001 | 0.0038 ± 0.006 |
| | A | stella | −0.0283 ± 0.024 | 0.0244 ± 0.027 | −0.0374 ± 0.139 | −0.0083 ± 0.007 | −0.0201 ± 0.031 | −0.0016 ± 0.001 | −0.0108 ± 0.006 |
| | B | mpnet | 0.0321 ± 0.027 | −0.0394 ± 0.031 | 0.2425 ± 0.188 | 0.0112 ± 0.012 | 0.0537 ± 0.043 | 0.0015 ± 0.002 | 0.0070 ± 0.006 |
| | B | stella | 0.0122 ± 0.027 | −0.0173 ± 0.029 | 0.1501 ± 0.153 | 0.0019 ± 0.010 | 0.0301 ± 0.038 | −0.0002 ± 0.001 | −0.0000 ± 0.007 |
| | C | mpnet | 0.0211 ± 0.029 | −0.0235 ± 0.029 | 0.0346 ± 0.109 | 0.0082 ± 0.009 | 0.0159 ± 0.032 | 0.0018 ± 0.002 | 0.0103 ± 0.010 |
| | C | stella | 0.0169 ± 0.040 | −0.0164 ± 0.043 | 0.0584 ± 0.131 | 0.0025 ± 0.013 | 0.0192 ± 0.039 | 0.0009 ± 0.003 | 0.0062 ± 0.013 |
| Llama | A | mpnet | −0.0041 ± 0.031 | 0.0050 ± 0.032 | −0.1127 ± 0.077 | 0.0012 ± 0.007 | −0.0206 ± 0.018 | 0.0007 ± 0.001 | 0.0038 ± 0.008 |
| | A | stella | −0.0528 ± 0.013 | 0.0586 ± 0.017 | −0.1794 ± 0.069 | −0.0154 ± 0.002 | −0.0484 ± 0.012 | −0.0022 ± 0.000 | −0.0153 ± 0.002 |
| | B | mpnet | −0.0052 ± 0.019 | −0.0055 ± 0.018 | −0.0613 ± 0.107 | 0.0030 ± 0.005 | −0.0185 ± 0.020 | 0.0002 ± 0.001 | −0.0018 ± 0.006 |
| | B | stella | −0.0092 ± 0.021 | −0.0028 ± 0.021 | −0.0628 ± 0.081 | 0.0031 ± 0.008 | −0.0238 ± 0.018 | −0.0002 ± 0.001 | −0.0052 ± 0.007 |
| | C | mpnet | −0.0106 ± 0.021 | 0.0033 ± 0.023 | −0.1162 ± 0.058 | 0.0025 ± 0.006 | −0.0263 ± 0.012 | 0.0004 ± 0.001 | −0.0002 ± 0.006 |
| | C | stella | −0.0188 ± 0.026 | 0.0133 ± 0.028 | −0.1410 ± 0.065 | −0.0029 ± 0.007 | −0.0368 ± 0.016 | −0.0005 ± 0.001 | −0.0052 ± 0.008 |
| Gemma | A | mpnet | 0.0565 ± 0.028 | −0.0473 ± 0.032 | 0.0262 ± 0.122 | 0.0133 ± 0.012 | 0.0322 ± 0.035 | 0.0045 ± 0.003 | 0.0280 ± 0.013 |
| | A | stella | 0.0252 ± 0.031 | −0.0161 ± 0.033 | −0.0504 ± 0.103 | 0.0029 ± 0.010 | 0.0035 ± 0.029 | 0.0019 ± 0.002 | 0.0146 ± 0.012 |
| | B | mpnet | 0.0606 ± 0.026 | −0.0625 ± 0.024 | 0.2657 ± 0.215 | 0.0189 ± 0.010 | 0.0823 ± 0.057 | 0.0035 ± 0.003 | 0.0212 ± 0.017 |
| | B | stella | 0.0359 ± 0.034 | −0.0461 ± 0.035 | 0.1444 ± 0.220 | 0.0160 ± 0.019 | 0.0403 ± 0.063 | 0.0019 ± 0.003 | 0.0078 ± 0.010 |
| | C | mpnet | 0.0479 ± 0.028 | −0.0506 ± 0.028 | 0.0954 ± 0.177 | 0.0176 ± 0.012 | 0.0389 ± 0.045 | 0.0039 ± 0.003 | 0.0203 ± 0.013 |
| | C | stella | 0.0319 ± 0.033 | −0.0330 ± 0.037 | 0.0146 ± 0.149 | 0.0125 ± 0.014 | 0.0160 ± 0.041 | 0.0030 ± 0.003 | 0.0133 ± 0.012 |
| Variable | Statistic | p-Value | Result |
|---|---|---|---|
| area_avg | 0.8981 | | rejected |
| area_new | 0.7908 | | rejected |
| LLM | Prompt | Embedding | H Statistic | p-Value | Significant | Effect Size ($\eta^2$) | Median Diff CI (95%) |
|---|---|---|---|---|---|---|---|
| Qwen | A | stella | 84.3593 | | True | 0.149927 | [−2.250, −1.947] |
| | A | mpnet | 4.1236 | | True | 0.005618 | [−0.929, 0.096] |
| | B | mpnet | 7.1796 | | True | 0.011114 | [−0.880, −0.214] |
| | B | stella | 13.0589 | | True | 0.021689 | [−1.200, −0.378] |
| | C | mpnet | 7.9486 | | True | 0.012497 | [−0.954, −0.318] |
| | C | stella | 27.5571 | | True | 0.047765 | [−1.524, −1.052] |
| Llama | A | stella | 46.1523 | | True | 0.081209 | [−1.908, −1.312] |
| | A | mpnet | 0.6671 | | False | −0.000599 | [−0.215, 0.515] |
| | B | mpnet | 16.1933 | | True | 0.027326 | [0.364, 1.034] |
| | B | stella | 0.5391 | | False | −0.000829 | [−0.360, 0.255] |
| | C | mpnet | 8.0134 | | True | 0.012614 | [0.218, 0.813] |
| | C | stella | 0.0062 | | False | −0.001787 | [−0.509, 0.550] |
| Gemma | A | mpnet | 34.3988 | | True | 0.060070 | [0.795, 1.261] |
| | A | stella | 0.3358 | | False | −0.001195 | [−0.357, 0.491] |
| | B | mpnet | 58.1348 | | True | 0.102760 | [1.031, 1.367] |
| | B | stella | 7.0208 | | True | 0.010829 | [0.124, 0.871] |
| | C | mpnet | 37.0708 | | True | 0.064875 | [0.890, 1.276] |
| | C | stella | 7.3154 | | True | 0.011359 | [0.363, 0.844] |
| Temperature | Mean | Std | Min | Max | Median |
|---|---|---|---|---|---|
| 0.1 | 1.6304 | 0.1198 | 1.3165 | 1.7828 | 1.6505 |
| 0.2 | 1.6871 | 0.1177 | 1.3676 | 1.8195 | 1.7186 |
| 0.3 | 1.6101 | 0.2434 | 0.8369 | 1.9059 | 1.6683 |
| 0.4 | 1.5071 | 0.3502 | 0.1120 | 1.8619 | 1.5912 |
| 0.5 | 1.4690 | 0.2975 | 0.7317 | 1.9828 | 1.53916 |
| 0.6 | 1.3091 | 0.5699 | −1.1531 | 1.9490 | 1.4966 |
| 0.7 | 1.2705 | 0.4150 | 0.1570 | 1.8244 | 1.3115 |
| 0.8 | 1.2516 | 0.4710 | −0.0947 | 1.7563 | 1.3751 |
| 0.9 | 1.2534 | 0.4486 | −0.1275 | 1.8040 | 1.3116 |
| 1.0 | 1.0463 | 0.5275 | −0.1907 | 1.9082 | 1.1453 |
| Temperature | Global Efficiency (↑) | Avg. Edge Betweenness (↑) | Spectral Radius (↑) | Spectral Gap (↑) | Natural Connectivity (↑) | Algebraic Connectivity (↑) | Conductance (↑) |
|---|---|---|---|---|---|---|---|
| 0.1 | 0.0671 ± 0.018 | −0.0756 ± 0.019 | 0.5403 ± 0.110 | 0.0207 ± 0.007 | 0.1421 ± 0.034 | 0.0040 ± 0.001 | 0.0211 ± 0.007 |
| 0.2 | 0.0796 ± 0.020 | −0.0856 ± 0.023 | 0.5490 ± 0.156 | 0.0229 ± 0.010 | 0.1589 ± 0.045 | 0.0047 ± 0.002 | 0.0271 ± 0.008 |
| 0.3 | 0.0754 ± 0.024 | −0.0813 ± 0.022 | 0.4981 ± 0.224 | 0.0215 ± 0.009 | 0.1436 ± 0.066 | 0.0045 ± 0.002 | 0.0259 ± 0.013 |
| 0.4 | 0.0727 ± 0.026 | −0.0771 ± 0.024 | 0.3560 ± 0.187 | 0.0222 ± 0.010 | 0.1073 ± 0.053 | 0.0046 ± 0.003 | 0.0254 ± 0.017 |
| 0.5 | 0.0661 ± 0.032 | −0.0697 ± 0.032 | 0.3542 ± 0.259 | 0.0199 ± 0.011 | 0.1099 ± 0.083 | 0.0041 ± 0.003 | 0.0243 ± 0.017 |
| 0.6 | 0.0602 ± 0.033 | −0.0650 ± 0.038 | 0.3123 ± 0.233 | 0.0185 ± 0.012 | 0.0889 ± 0.066 | 0.0032 ± 0.002 | 0.0176 ± 0.013 |
| 0.7 | 0.0606 ± 0.026 | −0.0625 ± 0.024 | 0.2657 ± 0.215 | 0.0189 ± 0.010 | 0.0823 ± 0.057 | 0.0035 ± 0.003 | 0.0212 ± 0.017 |
| 0.8 | 0.0561 ± 0.035 | −0.0585 ± 0.039 | 0.1746 ± 0.210 | 0.0212 ± 0.012 | 0.0573 ± 0.050 | 0.0040 ± 0.004 | 0.0199 ± 0.023 |
| 0.9 | 0.0535 ± 0.029 | −0.0590 ± 0.025 | 0.1718 ± 0.232 | 0.0211 ± 0.010 | 0.0565 ± 0.061 | 0.0037 ± 0.003 | 0.0186 ± 0.016 |
| 1.0 | 0.0461 ± 0.047 | −0.0484 ± 0.029 | 0.1070 ± 0.173 | 0.0172 ± 0.011 | 0.0415 ± 0.061 | 0.0037 ± 0.005 | 0.0188 ± 0.030 |
| Variable | Statistic | p-Value | Result |
|---|---|---|---|
| area_avg | 0.8904 | | rejected |
| area_new | 0.4321 | | rejected |
| Temperature | H Statistic | p-Value | Significant | Effect Size ($\eta^2$) | Median Diff CI (95%) |
|---|---|---|---|---|---|
| 0.1 | 10.9475 | | True | 0.032297 | [0.082, 0.225] |
| 0.2 | 26.0081 | | True | 0.081195 | [0.164, 0.283] |
| 0.3 | 12.2268 | | True | 0.036451 | [0.097, 0.261] |
| 0.4 | 2.2331 | | False | 0.004004 | [−0.037, 0.190] |
| 0.5 | 0.0451 | | False | −0.003100 | [−0.099, 0.126] |
| 0.6 | 1.6901 | | False | 0.002241 | [−0.248, 0.061] |
| 0.7 | 5.7724 | | True | 0.015495 | [−0.392, −0.133] |
| 0.8 | 6.6564 | | True | 0.018365 | [−0.332, −0.047] |
| 0.9 | 7.2239 | | True | 0.020207 | [−0.350, −0.050] |
| 1.0 | 21.5433 | | True | 0.066699 | [−0.566, −0.248] |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).