Performance Modeling of Lightweight Retrieval-Augmented Large Language Models for Low-Resource Plastic Surgery Settings
Abstract
1. Introduction
1.1. Background
1.2. Research Objectives
2. Methods
2.1. Query Generation
2.2. Model Configuration
2.2.1. Base Language Model
2.2.2. Embedder
2.2.3. Database Size
2.2.4. Chunk Size
2.2.5. Hop Type
2.3. Performance Evaluation
2.4. Predictive Modeling
3. Results
3.1. Overall Performance
3.2. Combined Mixed-Effects Model
3.3. Main Effects by Base Large Language Model
4. Discussion
4.1. Interpretation of the Results
4.2. Clinical Implications
4.3. Ethical Considerations
4.4. Strengths and Limitations of This Study
4.5. Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References


| Size | Number of 128-Token Chunks |
|---|---|
| 1 | 287 |
| 2 | 574 |
| 3 | 1148 |
| 4 | 2296 |
| 5 | 4591 |
| 6 | 9182 |
| 7 | 18,364 |
| 8 | 36,728 |
| 9 | 73,456 |
| 10 | 146,912 |
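
The chunk counts in the table above are consistent with dividing the full 146,912-chunk database by successive powers of two and rounding up. The sketch below reproduces the ten sizes under that assumption. The rounding rule and the names `FULL_DATABASE_CHUNKS` and `chunks_for_size` are illustrative and are not taken from the authors' subsampling procedure.

```python
import math

FULL_DATABASE_CHUNKS = 146_912  # Size 10 in the table above
NUM_SIZES = 10


def chunks_for_size(size: int) -> int:
    """Return the chunk count for a size level (1 = smallest, 10 = full database).

    Assumption: each level holds half the chunks of the next level,
    with fractional results rounded up.
    """
    if not 1 <= size <= NUM_SIZES:
        raise ValueError(f"size must be between 1 and {NUM_SIZES}")
    halvings = NUM_SIZES - size
    return math.ceil(FULL_DATABASE_CHUNKS / 2 ** halvings)


for size in range(1, NUM_SIZES + 1):
    print(f"Size {size:>2}: {chunks_for_size(size):>7,} chunks")
```

Under this assumption, Size 5 is the last level that divides evenly (146,912 / 32 = 4591), which is why the smaller counts are not exact halves of it.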

| Base LLM | Embedder | Hop Type | Chunk Size (Tokens) | Optimal Database Size | Mean Semantic Similarity | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|---|---|---|
| BioMistral | Bio_ClinicalBERT | Multi-hop | 128 | Size 2 | 0.404 | 0.278 | 0.531 |
| BioMistral | Bio_ClinicalBERT | Multi-hop | 256 | Size 3 | 0.404 | 0.290 | 0.519 |
| Phi-3-mini | Bio_ClinicalBERT | Single-hop | 128 | Size 10 | 0.420 | 0.272 | 0.568 |
| Phi-3-mini | Bio_ClinicalBERT | Multi-hop | 128 | Size 5 | 0.466 | 0.351 | 0.582 |
| BioMistral | bge-large-en-v1.5 | Multi-hop | 128 | Size 3 | 0.523 | 0.406 | 0.640 |
| Phi-3-mini | Bio_ClinicalBERT | Single-hop | 256 | Size 3 | 0.529 | 0.394 | 0.663 |
| BioMistral | all-MiniLM-L6-v2 | Multi-hop | 128 | Size 7 | 0.547 | 0.459 | 0.635 |
| Phi-3-mini | Bio_ClinicalBERT | Multi-hop | 256 | Size 1 | 0.599 | 0.508 | 0.689 |
| BioMistral | Bio_ClinicalBERT | Single-hop | 256 | Size 3 | 0.607 | 0.466 | 0.748 |
| BioMistral | bge-large-en-v1.5 | Multi-hop | 256 | Size 5 | 0.615 | 0.520 | 0.710 |
| BioMistral | all-MiniLM-L6-v2 | Multi-hop | 256 | Size 5 | 0.624 | 0.538 | 0.709 |
| Phi-3-mini | all-MiniLM-L6-v2 | Single-hop | 128 | Size 9 | 0.637 | 0.520 | 0.754 |
| BioMistral | Bio_ClinicalBERT | Single-hop | 128 | Size 4 | 0.644 | 0.533 | 0.756 |
| Phi-3-mini | all-MiniLM-L6-v2 | Multi-hop | 128 | Size 5 | 0.650 | 0.574 | 0.726 |
| Phi-3-mini | all-MiniLM-L6-v2 | Multi-hop | 256 | Size 9 | 0.651 | 0.564 | 0.738 |
| Phi-3-mini | bge-large-en-v1.5 | Multi-hop | 128 | Size 5 | 0.670 | 0.581 | 0.760 |
| Phi-3-mini | bge-large-en-v1.5 | Multi-hop | 256 | Size 9 | 0.680 | 0.599 | 0.760 |
| Phi-3-mini | bge-large-en-v1.5 | Single-hop | 128 | Size 8 | 0.701 | 0.569 | 0.833 |
| Phi-3-mini | all-MiniLM-L6-v2 | Single-hop | 256 | Size 5 | 0.714 | 0.581 | 0.847 |
| BioMistral | all-MiniLM-L6-v2 | Single-hop | 256 | Size 8 | 0.728 | 0.633 | 0.822 |
| Phi-3-mini | bge-large-en-v1.5 | Single-hop | 256 | Size 6 | 0.746 | 0.645 | 0.848 |
| BioMistral | all-MiniLM-L6-v2 | Single-hop | 128 | Size 7 | 0.750 | 0.645 | 0.855 |
| BioMistral | bge-large-en-v1.5 | Single-hop | 256 | Size 6 | 0.755 | 0.674 | 0.876 |
| BioMistral | bge-large-en-v1.5 | Single-hop | 128 | Size 10 | 0.786 | 0.682 | 0.889 |
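
If the 95% confidence intervals in the table above are symmetric around the mean (an assumption, since the interval method is not stated here), each mean should sit at the midpoint of its bounds up to rounding. The sketch below applies that check to the last four rows of the table. The row labels, `TOLERANCE`, and `midpoint_gap` are hypothetical names introduced for illustration.

```python
# (label, mean, lower 95% CI, upper 95% CI), copied from the last four table rows
ROWS = [
    ("Phi-3-mini / bge-large-en-v1.5 / single-hop / 256", 0.746, 0.645, 0.848),
    ("BioMistral / all-MiniLM-L6-v2 / single-hop / 128", 0.750, 0.645, 0.855),
    ("BioMistral / bge-large-en-v1.5 / single-hop / 256", 0.755, 0.674, 0.876),
    ("BioMistral / bge-large-en-v1.5 / single-hop / 128", 0.786, 0.682, 0.889),
]

# Three-decimal rounding of the mean and both bounds can shift the gap by about 0.001.
TOLERANCE = 0.0015


def midpoint_gap(mean: float, lower: float, upper: float) -> float:
    """Absolute distance between the reported mean and the interval midpoint."""
    return abs((lower + upper) / 2 - mean)


flagged = [
    label
    for label, mean, lower, upper in ROWS
    if midpoint_gap(mean, lower, upper) > TOLERANCE
]

for label in flagged:
    print("Mean and interval midpoint disagree:", label)
```

On these four rows the check flags only the BioMistral, bge-large-en-v1.5, single-hop, 256-token configuration, where the midpoint of the bounds is 0.775 against a reported mean of 0.755. That row may be worth checking against the published table.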
| Base LLM | Chunk Size (Tokens) | Mean Semantic Similarity | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| BioMistral | 128 | 0.558 | 0.513 | 0.603 |
| Phi-3-mini | 128 | 0.541 | 0.496 | 0.585 |
| BioMistral | 256 | 0.569 | 0.524 | 0.614 |
| Phi-3-mini | 256 | 0.605 | 0.560 | 0.650 |

| Variable | Term | Estimate | p-Value |
|---|---|---|---|
| Hop Type | Single-hop (reference) | | |
| | Multi-hop | −0.208 | <0.001 |
| Chunk Size | 128 (reference) | | |
| | 256 | −0.005 | 0.62437 |
| Embedder | all-MiniLM-L6-v2 (reference) | | |
| | bge-large-en-v1.5 | 0.023 | 0.00159 |
| | Bio_ClinicalBERT | −0.169 | <0.001 |
| Database Size | 287 chunks (reference) | | |
| | 574 chunks | 0.018 | 0.17595 |
| | 1148 chunks | 0.019 | 0.15272 |
| | 2296 chunks | 0.006 | 0.64924 |
| | 4591 chunks | 0.043 | 0.00119 |
| | 9182 chunks | 0.033 | 0.1323 |
| | 18,364 chunks | 0.029 | 0.02869 |
| | 36,728 chunks | 0.035 | 0.00854 |
| | 73,456 chunks | 0.034 | 0.01083 |
| | 146,912 chunks | 0.030 | 0.02429 |
| Interaction Terms | Model (Phi-3-mini) × Chunk Size (256) | 0.053 | <0.001 |
| | Model (Phi-3-mini) × Hop Type (Multi-hop) | 0.188 | <0.001 |
| | Chunk Size (256) × Hop Type (Multi-hop) | 0.033 | 0.00540 |
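
With interaction terms in the model, a main-effect estimate applies only at the reference levels of the other factors. The sketch below combines the coefficients from the table above to obtain the implied multi-hop and chunk-size effects for each configuration. It assumes treatment (dummy) coding with BioMistral as the reference base model, which is inferred from the interaction terms being named for Phi-3-mini, and no interactions beyond those listed. The dictionary keys and function names are illustrative.

```python
# Fixed-effect estimates copied from the combined mixed-effects table above.
COEF = {
    "multi_hop": -0.208,
    "chunk_256": -0.005,
    "phi3_x_chunk_256": 0.053,
    "phi3_x_multi_hop": 0.188,
    "chunk_256_x_multi_hop": 0.033,
}


def multi_hop_effect(phi3: bool, chunk_256: bool) -> float:
    """Implied change in semantic similarity when moving from single-hop to multi-hop."""
    effect = COEF["multi_hop"]
    if phi3:
        effect += COEF["phi3_x_multi_hop"]
    if chunk_256:
        effect += COEF["chunk_256_x_multi_hop"]
    return round(effect, 3)


def chunk_256_effect(phi3: bool, multi_hop: bool) -> float:
    """Implied change in semantic similarity when moving from 128- to 256-token chunks."""
    effect = COEF["chunk_256"]
    if phi3:
        effect += COEF["phi3_x_chunk_256"]
    if multi_hop:
        effect += COEF["chunk_256_x_multi_hop"]
    return round(effect, 3)


for phi3 in (False, True):
    model = "Phi-3-mini" if phi3 else "BioMistral"
    for chunk_256 in (False, True):
        chunk = 256 if chunk_256 else 128
        print(f"{model}, chunk {chunk}: multi-hop effect = {multi_hop_effect(phi3, chunk_256):+.3f}")
```

Under these assumptions the multi-hop penalty is −0.208 for BioMistral with 128-token chunks but only −0.020 for Phi-3-mini, since the 0.188 interaction offsets most of the main effect.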

| Variable | Term | All Embedders: Estimate | All Embedders: p-Value | Bio_ClinicalBERT: Estimate | Bio_ClinicalBERT: p-Value | all-MiniLM-L6-v2: Estimate | all-MiniLM-L6-v2: p-Value | bge-large-en-v1.5: Estimate | bge-large-en-v1.5: p-Value |
|---|---|---|---|---|---|---|---|---|---|
| Hop Type | Single-hop (reference) | | | | | | | | |
| | Multi-hop | −0.004 | 0.942 | 0.041 | 0.379 | −0.003 | 0.955 | −0.049 | 0.411 |
| Chunk Size | 128 (reference) | | | | | | | | |
| | 256 | 0.065 | <0.001 | 0.117 | <0.001 | 0.055 | <0.001 | 0.021 | 0.069 |
| Embedder | all-MiniLM-L6-v2 (reference) | | | | | | | | |
| | bge-large-en-v1.5 | 0.029 | 0.003 | | | | | | |
| | Bio_ClinicalBERT | −0.178 | <0.001 | | | | | | |
| Database Size | 287 chunks (reference) | | | | | | | | |
| | 574 chunks | 0.031 | 0.086 | 0.018 | 0.582 | 0.024 | 0.337 | 0.050 | 0.057 |
| | 1148 chunks | 0.015 | 0.389 | −0.015 | 0.647 | 0.030 | 0.235 | 0.031 | 0.229 |
| | 2296 chunks | 0.007 | 0.713 | −0.014 | 0.678 | 0.030 | 0.233 | 0.003 | 0.902 |
| | 4591 chunks | 0.050 | 0.006 | 0.014 | 0.666 | 0.081 | 0.002 | 0.054 | 0.039 |
| | 9182 chunks | 0.047 | 0.010 | 0.007 | 0.843 | 0.063 | 0.014 | 0.070 | 0.007 |
| | 18,364 chunks | 0.028 | 0.122 | −0.021 | 0.536 | 0.038 | 0.139 | 0.066 | 0.011 |
| | 36,728 chunks | 0.039 | 0.031 | −0.016 | 0.631 | 0.060 | 0.019 | 0.073 | 0.005 |
| | 73,456 chunks | 0.050 | 0.006 | −0.007 | 0.840 | 0.075 | 0.003 | 0.080 | 0.002 |
| | 146,912 chunks | 0.041 | 0.024 | 0.008 | 0.798 | 0.055 | 0.032 | 0.059 | 0.024 |

| Variable | Term | All Embedders: Estimate | All Embedders: p-Value | Bio_ClinicalBERT: Estimate | Bio_ClinicalBERT: p-Value | all-MiniLM-L6-v2: Estimate | all-MiniLM-L6-v2: p-Value | bge-large-en-v1.5: Estimate | bge-large-en-v1.5: p-Value |
|---|---|---|---|---|---|---|---|---|---|
| Hop Type | Single-hop (reference) | | | | | | | | |
| | Multi-hop | −0.192 | 0.001 | −0.195 | 0.002 | −0.169 | 0.006 | −0.212 | 0.001 |
| Chunk Size | 128 (reference) | | | | | | | | |
| | 256 | 0.012 | 0.145 | −0.027 | 0.076 | 0.021 | 0.045 | 0.041 | <0.001 |
| Embedder | all-MiniLM-L6-v2 (reference) | | | | | | | | |
| | bge-large-en-v1.5 | 0.017 | 0.082 | | | | | | |
| | Bio_ClinicalBERT | −0.161 | <0.001 | | | | | | |
| Database Size | 287 chunks (reference) | | | | | | | | |
| | 574 chunks | 0.005 | 0.767 | 0.011 | 0.742 | 0.012 | 0.591 | −0.008 | 0.752 |
| | 1148 chunks | 0.023 | 0.201 | 0.037 | 0.273 | 0.013 | 0.580 | 0.018 | 0.482 |
| | 2296 chunks | 0.006 | 0.755 | −0.010 | 0.771 | 0.022 | 0.349 | 0.005 | 0.848 |
| | 4591 chunks | 0.037 | 0.037 | −0.010 | 0.768 | 0.059 | 0.011 | 0.062 | 0.015 |
| | 9182 chunks | 0.019 | 0.270 | −0.045 | 0.192 | 0.045 | 0.050 | 0.058 | 0.022 |
| | 18,364 chunks | 0.030 | 0.085 | −0.009 | 0.794 | 0.061 | 0.009 | 0.040 | 0.116 |
| | 36,728 chunks | 0.031 | 0.077 | −0.013 | 0.697 | 0.051 | 0.029 | 0.056 | 0.025 |
| | 73,456 chunks | 0.018 | 0.301 | −0.043 | 0.212 | 0.037 | 0.114 | 0.061 | 0.016 |
| | 146,912 chunks | 0.019 | 0.273 | −0.046 | 0.180 | 0.040 | 0.086 | 0.064 | 0.011 |

| Base LLM | Hop Type | Mean Semantic Similarity | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| BioMistral | Multi-hop | 0.468 | 0.405 | 0.530 |
| Phi-3-mini | Multi-hop | 0.571 | 0.509 | 0.634 |
| BioMistral | Single-hop | 0.659 | 0.597 | 0.722 |
| Phi-3-mini | Single-hop | 0.575 | 0.512 | 0.637 |

| Chunk Size (Tokens) | Hop Type | Mean Semantic Similarity | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|
| 128 | Multi-hop | 0.492 | 0.430 | 0.555 |
| 256 | Multi-hop | 0.547 | 0.484 | 0.609 |
| 128 | Single-hop | 0.606 | 0.544 | 0.669 |
| 256 | Single-hop | 0.628 | 0.565 | 0.690 |
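
The mean semantic similarities by base LLM and hop type, and by chunk size and hop type, can be cross-checked against the interaction estimates in the combined mixed-effects table (0.188 for Phi-3-mini × multi-hop, 0.033 for chunk size 256 × multi-hop) by taking a difference in differences. The sketch below assumes the tabulated means are model-based marginal means, so small discrepancies from rounding are expected. The function and variable names are illustrative.

```python
def difference_in_differences(ref_base: float, ref_alt: float,
                              other_base: float, other_alt: float) -> float:
    """(alt - base) in the non-reference group minus (alt - base) in the reference group."""
    return (other_alt - other_base) - (ref_alt - ref_base)


# Base LLM x hop type: BioMistral is treated as the reference model (assumption).
model_by_hop = difference_in_differences(
    ref_base=0.659,    # BioMistral, single-hop
    ref_alt=0.468,     # BioMistral, multi-hop
    other_base=0.575,  # Phi-3-mini, single-hop
    other_alt=0.571,   # Phi-3-mini, multi-hop
)

# Chunk size x hop type: 128-token chunks are the reference level.
chunk_by_hop = difference_in_differences(
    ref_base=0.606,    # 128 tokens, single-hop
    ref_alt=0.492,     # 128 tokens, multi-hop
    other_base=0.628,  # 256 tokens, single-hop
    other_alt=0.547,   # 256 tokens, multi-hop
)

print(f"Model x hop type interaction from means: {model_by_hop:.3f}")
print(f"Chunk size x hop type interaction from means: {chunk_by_hop:.3f}")
```

The two results, 0.187 and 0.033, agree with the reported interaction estimates to within rounding of the tabulated means.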
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sun, N.Y.; Genovese, A.; Prabha, S.; Gomez-Cabello, C.A.; Haider, S.A.; Collaco, B.; Pan, T.; Wood, N.G.; Forte, A.J. Performance Modeling of Lightweight Retrieval-Augmented Large Language Models for Low-Resource Plastic Surgery Settings. Bioengineering 2026, 13, 378. https://doi.org/10.3390/bioengineering13040378

