4. Results and Discussion
This section outlines the results of the lexical and semantic similarity metrics for the text generated by the baseline LLMs and RAG-Augmented LLMs in response to the user queries. We ran these tests with Python v3.12.3 on a machine with an Intel i9 processor, 64 GB of RAM, and an Nvidia RTX 4090 GPU, as well as on Google Colab Pro (4 CPUs, 13 GB of RAM, and an Nvidia T4 Tensor Core GPU). For each LLM, a single query was executed 11 times to capture 11 outputs, each evaluated using seven evaluation metrics; the same analysis was then performed for 11 distinct queries. Overall, for each evaluation metric, we gathered 121 results per LLM, facilitating a statistical analysis of LLM performance. This type of analysis provides a comprehensive understanding of LLM performance and lays the groundwork for the development of more effective and reliable LLMs. As source material, we used research work and publications from the month preceding our experiments, i.e., published after the training cutoff dates of the LLMs. The results have been categorized into two sections.
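To make the aggregation scheme concrete, the following is a minimal sketch of the evaluation loop described above: 11 runs of each of 11 queries per LLM configuration, scored with seven metrics and then averaged. The llm_generate and evaluate helpers are hypothetical placeholders for illustration, not our actual implementation.

```python
# Minimal sketch of the 11-query x 11-run evaluation loop (121 scores
# per metric per LLM configuration). Helper functions are placeholders.
from statistics import mean, stdev

N_QUERIES = 11
N_RUNS = 11
METRICS = ["BLEU", "ROUGE-1", "ROUGE-2", "ROUGE-L",
           "BERT-Precision", "BERT-Recall", "BERT-F1"]

def evaluate(output: str, reference: str) -> dict[str, float]:
    """Placeholder: score one LLM output against its reference text."""
    raise NotImplementedError  # plug in real metric implementations

def aggregate(llm_generate, queries, references):
    """Collect 11 x 11 = 121 scores per metric; report mean and stdev."""
    scores = {m: [] for m in METRICS}
    for query, reference in zip(queries, references):
        for _ in range(N_RUNS):
            output = llm_generate(query)   # one LLM response per run
            result = evaluate(output, reference)
            for m in METRICS:
                scores[m].append(result[m])
    return {m: (mean(v), stdev(v)) for m, v in scores.items()}
```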
The first section compares the baseline LLMs with their RAG-Augmented counterparts, highlighting the performance improvement obtained with RAG. In the second section, a comparative analysis lists the trade-off possibilities of a small RAG-Augmented LLM relative to larger-parameter baseline LLMs. By evaluating four LLMs in both baseline and RAG-Augmented configurations, with 121 outputs per configuration, our study provides a comprehensive assessment of 968 LLM outputs across seven evaluation metrics.
4.2. Comparing RAG-Augmented Small LLMs with Large-Scale Baseline LLMs
In this section, we compare smaller RAG-Augmented LLMs with larger-parameter baseline LLMs using the evaluation metrics listed in Table 1, to explore the impact of, and trade-off between, using RAG and increasing the LLM parameter count. A comparative analysis was conducted for three cases:
Baseline Llama 3.1 8B vs. RAG-Augmented Mistral 7B;
Baseline Llama 1 13B vs. RAG-Augmented Mistral 7B;
Baseline Llama 1 13B vs. RAG-Augmented Llama 3.1 8B.
In the first comparison, RAG-Augmented Mistral 7B was evaluated against a larger baseline LLM, Llama 3.1 8B, to explore the potential trade-off across parameter scale. Our comparative analysis, as shown in Figure 5, indicates that RAG-Augmented Mistral 7B acts as an effective trade-off to Llama 3.1 8B. The performance improvement of RAG-Augmented Mistral 7B over Llama 3.1 8B is presented in Table 4.
RAG-Augmented Mistral 7B retrieves nearly verbatim phrases from the source text, resulting in high n-gram overlap and thereby significantly elevating the BLEU score. Similarly, the ROUGE scores show an increase in unigram, bigram, and longest-common-subsequence overlap, as RAG retrieves larger segments. In contrast, the baseline Llama 3.1 8B demonstrated minimal direct word overlap with the reference text, resulting in low lexical similarity scores. While the semantic similarity exhibited modest gains with RAG, this improvement primarily reflects refinement in phrasing rather than significant changes in meaning, indicating that the LLM output already conveyed the core concepts. In conclusion, if the primary focus is semantic fidelity alone, RAG-Augmentation offers only moderate benefits, while its impact on lexical accuracy is substantial. Therefore, for tasks requiring high lexical precision, RAG-Augmented Mistral 7B presents a compelling and resource-efficient alternative to the larger Llama 3.1 8B.
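For readers reproducing these scores, the snippet below is a hedged sketch of how the lexical (BLEU, ROUGE) and semantic (BERTScore) metrics can be computed with the commonly used nltk, rouge-score, and bert-score packages; the example strings are illustrative, and this is not necessarily the exact tooling used in our pipeline.

```python
# Sketch of lexical vs. semantic scoring, assuming the nltk,
# rouge-score, and bert-score packages are installed
# (pip install nltk rouge-score bert-score). Strings are illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer
from bert_score import score as bert_score

reference = "RAG retrieves larger segments of the source text."
candidate = "RAG retrieves large segments from the source text."

# BLEU: n-gram precision; high when phrases are reused nearly verbatim.
bleu = sentence_bleu([reference.split()], candidate.split(),
                     smoothing_function=SmoothingFunction().method1)

# ROUGE-1/2/L: unigram, bigram, and longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
rouge = scorer.score(reference, candidate)

# BERTScore: embedding-based semantic similarity (Precision, Recall, F1).
P, R, F1 = bert_score([candidate], [reference], lang="en")

print(f"BLEU={bleu:.3f}  ROUGE-L F1={rouge['rougeL'].fmeasure:.3f}  "
      f"BERT F1={F1.item():.3f}")
```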
In the second comparison, RAG-Augmented Mistral 7B was evaluated against a larger baseline LLM, Llama 1 13B, to explore the potential trade-off across parameter scale. This comparison reveals that RAG-Augmented Mistral 7B outperformed Llama 1 13B in both lexical and semantic similarity scores, as shown in Figure 6 and Table 5. Compared to Llama 1 13B, RAG-Augmented Mistral 7B showed a notable improvement in BLEU and ROUGE scores, indicating a stronger n-gram overlap. The improvement in the BERT Recall and F1 scores came out to 8.4%, indicating that while both LLMs capture the core concepts of the target responses, RAG-Augmented Mistral 7B refines the responses, resulting in more polished and coherent outputs.
In the third comparison, RAG-Augmented Llama 3.1 8B was evaluated against the same baseline LLM, Llama 1 13B, to explore the trade-off across parameter scale. We found that RAG-Augmented Llama 3.1 8B also improved on baseline Llama 1 13B, with gains in both lexical and semantic similarity scores, as shown in Figure 7. This indicates that RAG-Augmentation serves as a real trade-off to larger-parameter baseline LLMs, increasing the n-gram overlap by reusing phrases from the retrieved contents as well as integrating contextually accurate information (Table 6). Nevertheless, there was a slight decrease in BERT Precision for RAG-Augmented Llama 3.1 8B, reflecting the strong semantic capacity already embedded in Llama 1 13B owing to its substantial internalized knowledge and contextual understanding. RAG-Augmented Llama 3.1 8B can still be considered a viable trade-off to Llama 1 13B, especially when the goal is to reduce resource requirements while maintaining strong performance.
It is also interesting to note that, with RAG-Augmented LLMs, the lexical scores improved significantly owing to the richer context provided by RAG, even though the outputs of the RAG-Augmented and baseline LLMs were already semantically close, as indicated by the more modest improvement in BERT scores.
Overall, we can offer the following insights based on the two analyses. For LLMs under 10 billion parameters, it is advisable to consistently incorporate retrieval, as the resulting quality increase surpasses that of moving to the next larger dense model tier. Where the GPU budget is constrained, we recommend Mistral 7B combined with RAG over any standard 13-billion-parameter LLM, as it represents the optimal cost-to-quality ratio in our evaluation. Compared with a larger model such as Llama 1 13B, which uses 13 GB of memory, RAG-Augmented Mistral 7B and Llama 3.1 8B achieve comparable performance with a lower memory requirement (7-8 GB), demonstrating a practical quality-compute trade-off. For minimal resource requirements with acceptable quality, use RAG with Mistral 7B for a balance between efficiency and high quality; for high-resource capacity with near-maximum quality, use Llama 1 13B while retaining retrieval for tasks where recall is critical. Notably, retrieval enables smaller LLMs to climb in output quality without an increase in LLM parameters, in turn limiting the resource requirements.
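As a rough sanity check on the memory figures above, the sketch below approximates an LLM's weight-only inference footprint as parameter count times bytes per parameter. Assuming roughly one byte per parameter (e.g., 8-bit quantized weights), an assumption we make purely for illustration, the estimates line up with the 13 GB and 7-8 GB figures reported; activations and the KV cache would add to these numbers.

```python
# Back-of-the-envelope weight-only memory estimate for inference.
# Assumption (ours, for illustration): ~1 byte per parameter, e.g.,
# 8-bit quantized weights; fp16 weights would double these figures.
def approx_weight_memory_gb(params_billions: float,
                            bytes_per_param: float = 1.0) -> float:
    """Approximate weight memory in GB: params x bytes per parameter."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

for name, params in [("Mistral 7B", 7.0),
                     ("Llama 3.1 8B", 8.0),
                     ("Llama 1 13B", 13.0)]:
    print(f"{name}: ~{approx_weight_memory_gb(params):.1f} GB")
# Prints ~6.5 GB, ~7.5 GB, and ~12.1 GB, consistent with the 7-8 GB
# and 13 GB figures discussed above.
```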
Author Contributions
Conceptualization, A.V.K., S.S., and R.K.; methodology, A.V.K., S.S., and R.K.; software, A.V.K. and S.S.; validation, A.V.K., S.S., and R.K.; formal analysis, A.V.K., S.S., and R.K.; investigation, A.V.K., S.S., and R.K.; resources, A.V.K.; data curation, A.V.K.; writing—original draft preparation, A.V.K.; writing—review and editing, S.S. and R.K.; supervision, S.S. and R.K. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
AI | Artificial Intelligence
BERT | Bidirectional Encoder Representations from Transformers
BLEU | Bilingual Evaluation Understudy
FAISS | Facebook AI Similarity Search
FiD | Fusion-in-Decoder
GenAI | Generative Artificial Intelligence
GRAFT | Graph Retrieval Augmented Fine-Tuning
HyDE | Hypothetical Document Embeddings
LLM | Large Language Model
NLP | Natural Language Processing
OER | Open Educational Resource
PEFT | Parameter-Efficient Fine-Tuning
RAG | Retrieval Augmented Generation
ROUGE | Recall-Oriented Understudy for Gisting Evaluation
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).