Applications and Challenges of Retrieval-Augmented Generation (RAG) in Maternal Health: A Multi-Axial Review of the State of the Art in Biomedical QA with LLMs
Abstract
1. Introduction
1.1. Background
1.2. Large Language Models (LLMs)
1.3. Retrieval-Augmented Generation (RAG)
1.4. Biomedical Question Answering (QA)
1.5. Maternal Health in Telemedicine
2. Review Methodology
2.1. Thematic Axes of the Review
- Fundamentals of RAG in biomedical QA systems: architecture, retrieval mechanisms, and traceability;
- Biomedical LLMs and clinical QA generation: performance, training domains, and applicability;
- Use of QA as input for medical conversational agents: datasets, fine-tuning, and health-related interaction;
- Clinical validation and explainability of QA systems: clinical concordance, traceability, and transparency;
- Specific applications of QA and NLP in maternal health: tools, use cases, and population coverage.
2.2. Search Strategy and Sources
2.3. Inclusion and Exclusion Criteria
3. State-of-the-Art Development by Thematic Axes
3.1. Axis 1: RAG and Advanced RAG in Biomedical QA Systems
3.2. Axis 2: Development of LLMs Trained in Biomedical Domains
3.3. Axis 3: The Use of QA as Input for Clinical Conversational Agents
3.4. Axis 4: Methods for Clinical Validation and Explainability in QA Systems
3.5. Axis 5: Specific Applications in the Domain of Maternal Health
4. Synthesis and Visual Analytics
4.1. Technology Prominence in the Literature
4.2. Clinical Validation Attributes by Model
4.3. Comparative Capabilities of Biomedical Language Models
5. Cross-Sectional Analysis and Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Use of Artificial Intelligence
References
- Alkhalaf, M.; Yu, P.; Yin, M.; Deng, C. Applying generative AI with retrieval augmented generation to summarize and extract key clinical information from electronic health records. J. Biomed. Inform. 2024, 156, 104662. [Google Scholar] [CrossRef] [PubMed]
- Gaber, F.; Shaik, M.; Allega, F.; Bilecz, A.J.; Busch, F.; Goon, K.; Franke, V.; Akalin, A. Evaluating large language model workflows in clinical decision support incorporating RAG on real-world cases. npj Digit. Med. 2025, 8, 16. [Google Scholar] [CrossRef] [PubMed]
- Amugongo, L.M.; Mascheroni, P.; Brooks, S.; Doering, S.; Seidel, J. Retrieval augmented generation for large language models in healthcare: A systematic review. PLoS Digit. Health 2025, 4, e0000877. [Google Scholar] [CrossRef] [PubMed]
- Ozmen, B.B.; Mathur, P. Evidence-based artificial intelligence: Implementing retrieval-augmented generation models to enhance clinical decision support in plastic surgery. J. Plast. Reconstr. Aesthetic Surg. 2025, 104, 414–416. [Google Scholar] [CrossRef]
- Ji, Y.; Zhang, H.; Wang, Y. Evaluating bias in retrieval-augmented medical question-answering systems. arXiv 2025, arXiv:2503.15454. [Google Scholar] [CrossRef]
- Lin, M.; Lin, L.; Lin, L.; Lin, Z.; Yan, X. A bibliometric analysis of the advance of artificial intelligence in medicine. Front. Med. 2025, 12, 1504428. [Google Scholar] [CrossRef]
- Khan, M.J.; Duta, I.; Albert, B.; Cooke, W.; Vatish, M.; Jones, G.D. The OxMat dataset: A multimodal resource for the development of AI-driven technologies in maternal and newborn child health. arXiv 2024, arXiv:2404.08024. [Google Scholar] [CrossRef]
- Park, C.; Moon, H.; Park, C.; Lim, H. MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation. arXiv 2025, arXiv:2504.17137. [Google Scholar] [CrossRef]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S. On the opportunities and risks of foundation models. Nat. Mach. Intell. 2022, 4, 189–191. [Google Scholar] [CrossRef]
- World Health Organization. Pulse Survey on Continuity of Essential Health Services During the COVID-19 Pandemic; WHO: Geneva, Switzerland, 2020; Available online: https://www.who.int/publications/i/item/WHO-2019-nCoV-EHS_continuity-survey-2020.1 (accessed on 7 February 2025).
- Ohannessian, E.; Duong, A.; Odone, A. Global telemedicine implementation and integration within health systems to fight the COVID-19 pandemic: A call to action. JMIR Public Health Surveill. 2020, 6, e18810. [Google Scholar] [CrossRef]
- Portnoy, M.A. Telemedicine in the COVID-19 era: A balancing act to avoid harm. J. Allergy Clin. Immunol. Pract. 2020, 8, 2459–2461. [Google Scholar] [CrossRef]
- Dávila, L.S.; Rivera, R.R.; Tapia, J.H.; Asanza, W.R. Inteligencia artificial aplicada a la oftalmología: ResNet-50 y VGG-19 en el diagnóstico de catarata y glaucoma. Inform. Sist. Rev. Tecnol. Inform. Las Comun. 2024, 8, 52–59. [Google Scholar] [CrossRef]
- Schwab, K. The Fourth Industrial Revolution; World Economic Forum: Geneva, Switzerland, 2016; Available online: https://www.weforum.org/about/the-fourth-industrial-revolution-by-klaus-schwab (accessed on 8 February 2025).
- Topol, E. Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again; Basic Books: New York, NY, USA, 2019. [Google Scholar]
- World Health Organization. WHO Guideline: Recommendations on Digital Interventions for Health System Strengthening; WHO: Geneva, Switzerland, 2019; Available online: https://www.who.int/publications/i/item/9789241550505 (accessed on 8 February 2025).
- Agarwal, S.; LeFevre, A.E.; Lee, J.; L’Engle, K.; Mehl, G.; Sinha, C.; Labrique, A. Guidelines for reporting of health interventions using mobile phones: Mobile health (mHealth) evidence reporting and assessment (mERA) checklist. BMJ 2016, 352, i1174. [Google Scholar] [CrossRef] [PubMed]
- Ramakrishnan, R.; Rao, S.; He, J.-R. Perinatal health predictors using artificial intelligence: A review. Women’s Health 2021, 17, 17455065211046132. [Google Scholar] [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A Survey of Large Language Models. arXiv 2023, arXiv:2303.18223. [Google Scholar] [CrossRef] [PubMed]
- Bhattarai, K.; Oh, I.Y.; Sierra, J.M.; Tang, J.; Payne, P.R.O.; Abrams, Z.; Lai, A.M. Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: A performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy’s rule-based and machine learning-based methods. JAMIA Open 2024, 7, ooae060. [Google Scholar] [CrossRef]
- Mahyoub, M.; Dougherty, K.; Shukla, A. Extracting pulmonary embolism diagnoses from radiology impressions using GPT-4o: Large language model evaluation study. JMIR Med. Inform. 2025, 13, e67706. [Google Scholar] [CrossRef]
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [Google Scholar] [CrossRef]
- Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Amin, M.; Hou, L.; Clark, K.; Pfohl, S.R.; Cole-Lewis, H.; et al. Toward expert-level medical question answering with large language models. Nat. Med. 2025, 31, 943–950. [Google Scholar] [CrossRef]
- Ktena, I.; Wiles, O.; Albuquerque, I.; Rebuffi, S.-A.; Tanno, R.; Roy, A.G.; Azizi, S.; Belgrave, D.; Kohli, P.; Cemgil, T.; et al. Los modelos generativos mejoran la equidad de los clasificadores médicos en los cambios de distribución. Nat. Med. 2024, 30, 1166–1173. [Google Scholar] [CrossRef]
- Bedi, S.; Liu, Y.; Orr-Ewing, L.; Dash, D.; Koyejo, S.; Callahan, A.; Fries, J.A.; Wornow, M.; Swaminathan, A.; Lehmann, L.S.; et al. A Systematic Review of Testing and Evaluation of Healthcare Applications of Large Language Models (LLMs). medRxiv 2024. [Google Scholar] [CrossRef]
- Gargari, O.K.; Habibi, G. Mejora de la IA médica con generación aumentada por recuperación: Una mini revisión narrativa. Digit. Health 2025, 11, 1–7. [Google Scholar]
- Zheng, Y.; Yan, Y.; Chen, S.; Cai, Y.; Ren, K.; Liu, Y.; Zhuang, J.; Zhao, M. Integración de la generación aumentada de recuperación para mejorar las recomendaciones personalizadas de los médicos en los servicios médicos basados en la web: Estudio de desarrollo de modelos. Front. Public Health 2025, 13, 1501408. [Google Scholar] [CrossRef]
- Bora, A.; Cuayáhuitl, H. Systematic analysis of retrieval-augmented generation-based LLMs for medical chatbot applications. Mach. Learn. Knowl. Extr. 2024, 6, 2355–2374. [Google Scholar] [CrossRef]
- Li, Y.; Shen, X.; Yang, C.; Cao, Z.; Du, R.; Yu, M.; Wang, J.; Wang, M. Novel electronic health records applied for prediction of pre-eclampsia: Machine-learning algorithms. Pregnancy Hypertens. Int. J. Women’s Cardiovasc. Health 2021, 26, 102–109. [Google Scholar] [CrossRef] [PubMed]
- Ge, J.; Sun, S.; Owens, J.; Galvez, V.; Gologorskaya, O.; Lai, J.C.; Pletcher, M.J.; Lai, K. Development of a Liver Disease-Specific Large Language Model Chat Interface using Retrieval Augmented Generation. medRxiv 2023. [Google Scholar] [CrossRef]
- Chen, X.; Zhang, W.; Zhao, Z.; Xu, P.; Zheng, Y.; Shi, D.; He, M. ICGA-GPT: Report generation and question answering for indocyanine green angiography images. Br. J. Ophthalmol. 2024, 1208, 1450–1456. [Google Scholar] [CrossRef]
- K2View. What Are AI Hallucinations? Available online: https://www.k2view.com/what-are-ai-hallucinations/ (accessed on 2 June 2025).
- Kumar, A.; Lee, S.; Park, J. Adoption of biomedical large language models: A scoping review of applications and challenges. J. Biomed. Inform. 2024, 157, 104703. [Google Scholar] [CrossRef]
- Dorfner, F.J.; Dada, A.; Busch, F.; Makowski, M.R.; Han, T.; Truhn, D.; Kleesiek, J.; Sushil, M.; Lammert, J.; Adams, L.C.; et al. Biomedical large language models seem not to be superior to generalist models on unseen medical data. arXiv 2024, arXiv:2408.13833. [Google Scholar] [CrossRef]
- Carvallo, M.; Peso, J.; Zapata-Toloza, R.; Andalaft, C. Telehealth and telemedicine in Latin America: A scoping review. Salud Cienc. Tecnol. 2025, 5, 1185. [Google Scholar] [CrossRef]
- Dirección General de Comunicación Social, UNAM. Mayor Relevancia de la Telemedicina en Atención Durante el Embarazo (Boletín UNAM-DGCS-712). Universidad Nacional Autónoma de México. Available online: https://www.dgcs.unam.mx/boletin/bdboletin/2021_712.html (accessed on 29 August 2021).
- Valencia, S.A.; Barrientos, J.G.; Silva, E.A.T.; Díaz, E.S. Impacto en los resultados en salud de la telesalud aplicada para la atención y seguimiento ambulatorio del alto riesgo obstétrico: Revisión narrativa de la literatura. Med. UPB 2024, 43, 43–51. [Google Scholar] [CrossRef]
- NUBIX. (s. f.). La Teleginecología y sus Beneficios en la era de la Telemedicina. Available online: https://nubix.cloud/radiologia/la-teleginecologia-y-sus-beneficios-en-la-era-de-la-telemedicina (accessed on 1 February 2025).
- Gargari, O.K.; Habibi, G. Enhancing medical AI with retrieval-augmented generation: A mini narrative review. Digit. Health 2025, 11, 20552076251337177. [Google Scholar] [CrossRef]
- Wan, N.; Jin, Q.; Chan, J.; Xiong, G.; Applebaum, S.; Gilson, A.; McMurry, R.; Taylor, R.A.; Zhang, A.; Chen, Q.; et al. Humans Continue to Outperform Large Language Models in Complex Clinical Decision-Making: A Study with Medical Calculators. arXiv 2024, arXiv:2411.05897. [Google Scholar] [CrossRef]
- Yang, R.; Zeng, Q.; You, K.; Qiao, Y.; Huang, L.; Hsieh, C.-C.; Rosand, B.; Goldwasser, J.; Dave, A.; Keenan, T.; et al. Ascle—Un kit de herramientas de procesamiento del lenguaje natural de Python para la generación de textos médicos: Estudio de desarrollo y evaluación. J. Med. Internet Res. 2024, 26, e60601. [Google Scholar] [CrossRef]
- Tecnoloblog. Qué es un Sistema RAG y cómo Funciona: Guía Exhaustiva y Actualizada. Available online: https://www.tecnoloblog.com/sistemas-rag/ (accessed on 8 February 2025).
- León, M.C.C.; Núñez, y.J.E.R. Análisis de Modelos de Inteligencia Artificial Aplicados a Sistemas Biomédicos e Internet de Objetos Médicos. Universidad Politécnica Salesiana. 2024. Available online: https://dspace.ups.edu.ec/bitstream/123456789/27874/1/UPS-GT005362.pdf (accessed on 5 February 2025).
- Aliste, F.A. INTELIGENCIA ARTIFICIAL GENERATIVA: LLMS en Medicina. SECOIR. 2025. Available online: https://secoir.org/wp-content/uploads/2025/05/10.7-Monografia-SECOIR-2025-V1.pdf (accessed on 5 March 2025).
- Guanoluisa, J.M.; Chicaiza, R.P.M.; Avalos, C.J.B. Agente Conversacional para Consultas Sobre Servicio Médico en una Clínica Privada. 3C Tecnol. 2021, 10, 47–71. Available online: https://dialnet.unirioja.es/descarga/articulo/8044473.pdf (accessed on 5 March 2025). [CrossRef]
- Aloy-Duch, A.; Vila, M.S.; Ramos-D’Angelo, F.; Calo, L.A.; Llaneza-Velasco, M.E.; Fortuny-Organs, B.; Apezetxea-Celaya, A. Desarrollo y Validación de Estándares para Unidades de Calidad de Centros Sanitarios. J. Healthc. Qual. Res. 2023, 38, 366–375. Available online: https://www.elsevier.es/es-revista-journal-healthcare-quality-research-257-articulo-desarrollo-validacion-estandares-unidades-calidad-S260364792300057X (accessed on 15 February 2025). [CrossRef]
- Capasso, A.; de Mucio, B.; Ramírez, D.; Colomar, M.; Serruya, Y.S. Salud Digital en Salud Materna: Avances y Desafíos en América Latina y el Caribe. OPS. 2024. Available online: https://www.paho.org/es/noticias/7-3-2024-salud-digital-salud-materna-avances-desafios-america-latina-caribe (accessed on 8 October 2025).
- Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.J.; Madotto, A.; Fung, P. Survey of hallucination in natural language generation. ACM Comput. Surv. 2023, 55, 1–38. [Google Scholar] [CrossRef]
- Zhang, Y. Siren’s song in the AI ocean: A survey on hallucination in large language models. arXiv 2023, arXiv:2309.01219. [Google Scholar] [CrossRef]
- Joshi, S. Retrieval Augmented Generation for Medical Question-Answering with Llama-2–7b. Medium. Available online: https://medium.com/@sauravjoshi23/retrieval-augmented-generation-for-medical-question-answering-with-llama-2-7b-82486847d089 (accessed on 6 March 2025).
- Bodenreider, O. The Unified Medical Language System (UMLS). National Library of Medicine. Available online: https://www.nlm.nih.gov/research/umls/index.html (accessed on 15 July 2025).
- Xiong, G.; Jin, Q.; Lu, Z.; Zhang, A. Benchmarking Retrieval-Augmented Generation for Medicine. In Findings of the Association for Computational Linguistics: ACL; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 6233–6251. Available online: https://teddy-xionggz.github.io/benchmark-medical-rag/ (accessed on 5 March 2025).
- Soman, K.; Rose, P.W.; Morris, J.H.; Akbas, R.E.; Smith, B.; Peetoom, B.; Villouta-Reyes, C.; Cerono, G.; Shi, Y.; Rizk-Jackson, A.; et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics 2024, 40, btae560. [Google Scholar] [CrossRef]
- Li, M.; Kilicoglu, H.; Xu, H.; Zhang, R. BiomedRAG: A retrieval augmented large language model for biomedicine. J. Biomed. Inform. 2025, 162, 104769. [Google Scholar] [CrossRef]
- Xiong, G.; Jin, Q.; Wang, X.; Zhang, M.; Lu, Z.; Zhang, A. Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions. In Biocomputing 2025: Proceedings of the Pacific Symposium; World Scientific: Singapore, 2025. [Google Scholar]
- Matsumoto, N.; Moran, J.; Choi, H.; Hernandez, M.E.; Venkatesan, M.; Wang, P.; Moore, J.H. KRAGEN: A Knowledge Graph-Enhanced RAG Framework for Biomedical Problem Solving Using Large Language Models. Bioinform. Adv. 2024, 40, btae353. Available online: https://github.com/EpistasisLab/KRAGEN (accessed on 8 February 2025). [CrossRef]
- Wu, J.; Zhu, J.; Qi, Y.; Chen, J.; Xu, M.; Menolascina, F.; Grau, V. Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation. 2024. Available online: https://github.com/MedicineToken/Medical-Graph-RAG (accessed on 8 February 2025).
- Rezaei, M.R.; Fard, R.S.; Parker, J.; Krishnan, R.G.; Lankarany, M. Adaptive Knowledge Graphs Enhance Medical Question Answering: Bridging the Gap Between LLMs and Evolving Medical Knowledge. arXiv 2025, arXiv:2502.13010. [Google Scholar] [CrossRef]
- Guan, L.; Huang, Y.; Liu, J. Biomedical Question Answering via Multi-Level Summarization on a Local Knowledge Graph. arXiv 2025, arXiv:2504.01309. [Google Scholar] [CrossRef]
- Delile, J.; Mukherjee, S.; Van Pamel, A.; Zhukov, L. Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge. arXiv 2024, arXiv:2402.12352. [Google Scholar] [CrossRef]
- Lyu, T.; Liang, C.; Liu, J.; Campbell, B.; Hung, P.; Shih, Y.; Ghumman, N.; Li, X.; Haendel, M.A.; Chute, C.G. Temporal Events Detector for Pregnancy Care (TED-PC): A rule-based algorithm to infer gestational age and delivery date from electronic health records of pregnant women with and without COVID-19. PLoS ONE 2022, 17, e0276923. [Google Scholar] [CrossRef]
- Zhu, Y.; Ren, C.; Xie, S.; Liu, S.; Ji, H.; Wang, Z.; Sun, T.; He, L.; Li, Z.; Zhu, X. REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models. arXiv 2024, arXiv:2402.07016. [Google Scholar] [CrossRef]
- Zhao, Z.; Yuan, H.; Liu, J.; Chen, H.; Ying, H.; Zhou, S.; Yu, S. Evaluating Entity Retrieval in Electronic Health Records: A Semantic Gap Perspective. arXiv 2025, arXiv:2502.06252. [Google Scholar] [CrossRef]
- He, J.; Zhang, B.; Rouhizadeh, H.; Chen, Y.; Yang, R.; Lu, J.; Chen, X.; Liu, N.; Li, I.; Teodoro, D. Retrieval-Augmented Generation in Biomedicine: A Survey of Technologies, Datasets, and Clinical Applications. arXiv 2025, arXiv:2505.01146. [Google Scholar] [CrossRef]
- Luo, R.; Sun, L.; Xia, Y.; Qin, T.; Zhang, S.; Poon, H.; Liu, T.-Y. BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining. arXiv 2022, arXiv:2210.10341. [Google Scholar] [CrossRef]
- Kim, S. MedBioLM: Optimizing Medical and Biological QA with Fine-Tuned Large Language Models and Retrieval-Augmented Generation. arXiv 2025, arXiv:2502.03004. [Google Scholar] [CrossRef]
- Sohn, J.; Park, Y.; Yoon, C.; Park, S.; Hwang, H.; Sung, M.; Kim, H.; Kang, J. Rationale-Guided Retrieval Augmented Generation for Medical Question Answering. 2024. Available online: https://github.com/dmis-lab/RAG2 (accessed on 8 February 2025).
- Wu, C.; Lin, W.; Zhang, X.; Zhang, Y.; Xie, W.; Wang, Y. PMC-LLaMA: Toward Building Open-Source Language Models for Medicine. J. Am. Med. Inform. Assoc. 2024, 31, 1833–1843. [Google Scholar] [CrossRef]
- Luo, L.; Ning, J.; Zhao, Y.; Wang, Z.; Ding, Z.; Chen, P.; Fu, W.; Han, Q.; Xu, G.; Qiu, Y.; et al. Taiyi: A bilingual fine-tuned large language model for diverse biomedical tasks. J. Am. Med. Inform. Assoc. 2024, 31, 1865–1874. [Google Scholar] [CrossRef]
- Bardhan, J.; Roberts, K.; Wang, D.Z. Question Answering for Electronic Health Records: Scoping Review of Datasets and Models. J. Med. Internet Res. 2024, 26, 53636. [Google Scholar] [CrossRef]
- Chen, S.; Li, Y.; Lu, S.; Van, H.; Aerts, H.J.; Savova, G.K.; Bitterman, D.S. Evaluating the ChatGPT family of models for biomedical reasoning and classification. J. Am. Med. Inform. Assoc. 2024, 31, 940–948. [Google Scholar] [CrossRef] [PubMed]
- Vani, M.S.; Sudhakar, R.V.; Mahendar, A.; Ledalla, S.; Radha, M.; Sunitha, M. Personalized health monitoring using explainable AI: Bridging trust in predictive healthcare. Sci. Rep. 2025, 15, 31892. [Google Scholar] [CrossRef]
- Laskar, I.J.; Peng, C.; Huang, J. A comprehensive evaluation of large language models on benchmark biomedical text processing tasks. Comput. Biol. Med. 2024, 171, 108189. [Google Scholar] [CrossRef] [PubMed]
- Wang, C.; Li, M.; He, J.; Wang, Z.; Darzi, E.; Chen, Z.; Ye, J.; Li, T.; Su, Y.; Ke, J.; et al. A survey for Large Language Models in Biomedicine. arXiv 2024, arXiv:2409.00133. [Google Scholar] [CrossRef]
- Ullah, E.; Parwani, A.; Baig, M.M.; Singh, R. Diagnostic Pathology Team. Challenges and barriers of using large language models such as ChatGPT for diagnostic medicine with a focus on digital pathology: A scoping review. Diagn. Pathol. 2024, 19, 1464. [Google Scholar] [CrossRef]
- Yu, L.; Fan, L.; Li, L.; Zhou, J.; Ma, Z.; Xian, L.; Hua, W.; He, S.; Jin, M.; Zhang, Y.; et al. Large Language Models in Biomedical and Health Informatics: A Review with Bibliometric Analysis. arXiv 2024, arXiv:2403.16303. [Google Scholar] [CrossRef]
- Spatharou, A.; Hieronimus, S.; Jenkins, J. Transforming Healthcare with AI: The Impact on the Workforce and Organizations, McKinsey & Company. 2020. Available online: https://www.mckinsey.com/industries/healthcare/our-insights/transforming-healthcare-with-ai (accessed on 8 October 2025).
- Singhal, K.; Tu, T.; Gottweis, J.; Sayres, R.; Wulczyn, E.; Amin, M.; Hou, L.; Clark, K.; Pfohl, S.R.; Cole-Lewis, H.; et al. Towards expert-level medical question answering with Med-PaLM 2. Nature 2023, 620, 113–122. [Google Scholar] [CrossRef]
- Li, D.; Williams, P.; Wang, W.; Sahay, S. Towards building ethical and safe conversational agents for health applications. ACM Trans. Comput.-Hum. Interact. (TOCHI) 2021, 28, 1–36. [Google Scholar] [CrossRef]
- Jeong, S.W.; Kim, C.G.; Whangbo, T.K. Question Answering System for Healthcare Information based on BERT and GPT. In Proceedings of the 2023 Joint International Conference on Digital Arts, Media and Technology with ECTI Northern Section Conference on Electrical, Electronics, Computer and Telecommunications Engineering (ECTI DAMT & NCON), Phuket, Thailand, 22–25 March 2023; pp. 1–6. [Google Scholar]
- Anjum, K.; Sameer, M.; Kumar, S. AI Enabled NLP based Text to Text Medical Chatbot. In Proceedings of the 2023 3rd International Conference on Innovative Practices in Technology and Management (ICIPTM), Uttar Pradesh, India, 22–24 February 2023; pp. 1–6. [Google Scholar]
- Esther, C.; Kanisshka, U.P.; Ananya, G.S.; Tamizhmalar, D.; Elangovan, V.; Ishan Raghavender, N. Biomedical Chat Assistant with Personalized Document Reader Using BioMistral and RAG. In Proceedings of the 2025 International Conference on Computing and Communication Technologies (ICCCT), Chennai, India, 16–17 April 2025; pp. 1–6. [Google Scholar]
- Carl, N.; Haggenmüller, S.; Wies, C.; Nguyen, L.; Winterstein, J.T.; Hetz, M.J.; Mangold, M.H.; Hartung, F.O.; Grüne, B.; Holland-Letz, T.; et al. Evaluating interactions of patients with large language models for medical information. BJU Int. 2025, 135, 1010–1017. [Google Scholar] [CrossRef] [PubMed]
- Crema, C.; Verde, F.; Tiraboschi, P.; Marra, C.; Arighi, A.; Fostinelli, S.; Giuffré, G.M.; Dal Maschio, V.P.; L’abbate, F.; Solca, F.; et al. Medical Information Extraction With NLP-Powered QABots: A Real-World Scenario. IEEE J. Biomed. Health Inform. 2024, 28, 6906–6918. [Google Scholar] [CrossRef]
- Denecke, K.; Reichenpfader, D.; Willi, D.; Kennel, K.; Bonel, H.; Nairz, K.; Cihoric, N.; Papaux, D.; von Tengg-Kobligk, H. Person-based design and evaluation of MIA, a digital medical interview assistant for radiology. Front. Artif. Intell. 2024, 7, 1431156. [Google Scholar] [CrossRef]
- Medani, I.E.; Hakami, A.M.; Chourasia, U.H.; Rahamtalla, B.; Adawi, N.M.; Fadailu, M.; Salih, A.; Abdelmola, A.; Hashim, K.N.; Dawelbait, A.M.; et al. Telemedicine in Obstetrics and Gynecology: A Scoping Review of Enhancing Access and Outcomes in Modern Healthcare. Healthcare 2024, 13, 2036. [Google Scholar] [CrossRef]
- Manes, I.; Ronn, N.; Cohen, D.; Ber, R.I.; Horowitz-Kugler, Z.; Stanovsky, G. K-QA: A Real-World Medical Q&A Benchmark. arXiv 2024, arXiv:2401.14493. [Google Scholar] [CrossRef]
- Chowdhury, M.; He, Y.V.; Higham, A.; Lim, E. ASTRID–An Automated and Scalable TRIaD for the Evaluation of RAG-based Clinical Question Answering Systems. arXiv 2025, arXiv:2501.08208. [Google Scholar] [CrossRef]
- Kell, G.; Roberts, A.; Umansky, S.; Qian, L.; Ferrari, D.; Soboczenski, F.; Wallace, B. Question answering systems for health professionals at the point of care–A systematic review. arXiv 2024, 31, 1009–1024. [Google Scholar] [CrossRef]
- Sadeghi, Z.; Alizadehsani, R.; CIFCI, M.A.; Kausar, S.; Rehman, R.; Mahanta, P.; Bora, P.K.; Almasri, A.; Alkhawaldeh, R.S.; Hussain, S.; et al. A review of Explainable Artificial Intelligence in healthcare. Comput. Electr. Eng. Int. J. 2024, 118, 109370. [Google Scholar] [CrossRef]
- Chiatti, A.; Bernardini, S.; Piccolo, L.S.G.; Schiaffonati, V.; Matteucci, M. Mapping user trust in Vision Language Models: Research landscape, challenges, and prospects. arXiv 2025, arXiv:2505.05318. [Google Scholar] [CrossRef]
- Li, H.; Chen, Z.; Zhang, J.; Wang, Q. Rationale-enhanced clinical QA with multi-level explanation generation. Artif. Intell. Med. 2024, 142, 102570. [Google Scholar] [CrossRef]
- Wang, Y.; Mercer, R.E.; Rudzicz, F.; Roy, S.S.; Ren, P.; Chen, Z.; Wang, X. Trustworthy medical question answering: An evaluation-centric survey. arXiv 2025, arXiv:2506.03659. [Google Scholar] [CrossRef]
- Sekar, T.; Kushal, K.; Shankar, S.; Mohammed, S.; Fiaidhi, J. Investigations on using evidence-based GraphRAG pipeline using LLM tailored for answering USMLE medical exam questions. medRxiv 2025. [Google Scholar] [CrossRef]
- Hao, Y.; Alhamoud, K.; Jeong, H.; Zhang, H.; Puri, I.; Torr, P.; Schaekermann, M.; Stern, A.D.; Ghassemi, M. MedPAIR: Measuring physicians and AI relevance alignment in medical question answering. arXiv 2025, arXiv:2505.24040. [Google Scholar] [CrossRef]
- Wang, J.; Yao, Z.; Yang, Z.; Zhou, H.; Li, R.; Wang, X.; Xu, Y.; Yu, H. NoteChat: A dataset of synthetic doctor-patient conversations conditioned on clinical notes. arXiv 2024, arXiv:2310.15959. [Google Scholar]
- Albassam, D. Toward human-centered interactive clinical question answering system. arXiv 2025, arXiv:2505.18928. [Google Scholar]
- Li, D.; He, S.; Hu, B.; Chen, Q. Towards explainable medical machine reading comprehension with rationale generation. IEEE Trans. Audio Speech Lang. Process. 2025, 33, 1675–1683. [Google Scholar] [CrossRef]
- Mehrtash, A.; Wells, W.M., III; Tempany, C.M.; Abolmaesumi, P.; Kapur, T. Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Med. Imaging 2020, 39, 3868–3878. [Google Scholar] [CrossRef]
- Müller, V.; Reichert, C.; Scheuermann, H. Explainability in Clinical AI: Designing Transparent Decision Support Tools. Artif. Intell. Med. 2021, 117, 102111. [Google Scholar] [CrossRef]
- Ji, M.; Genchev, G.Z.; Huang, H.; Xu, T.; Lu, H.; Yu, G. Evaluation framework for successful artificial intelligence-enabled clinical decision support systems: Mixed methods study. J. Med. Internet Res. 2021, 23, e25929. [Google Scholar] [CrossRef]
- Khosravi, M.; Zare, Z.; Mojtabaeian, S.M.; Izadi, R. Artificial intelligence and decision-making in healthcare: A thematic analysis of a systematic review of reviews. Health Serv. Res. Manag. Epidemiol. 2024, 11, 23333928241234863. [Google Scholar] [CrossRef]
- Xiong, G. MedRAG: A Systematic Toolkit for Retrieval-Augmented Generation on Medical Question Answering. GitHub. 2024. Available online: https://github.com/Teddy-XiongGZ/MedRAG (accessed on 7 February 2025).
- Ncube, M. Incomplete Chronicles: Unveiling Data Bias in Maternal Health. Mozilla Foundation. 2024. Available online: https://www.mozillafoundation.org/en/research/library/incomplete-chronicles-unveiling-data-bias-in-maternal-health/ (accessed on 10 March 2025).
- Joshi, A. Big data and AI for gender equality in health: Bias is a big challenge. Front. Big Data 2024, 7, 1436019. [Google Scholar] [CrossRef] [PubMed]
- Neha, F.; Bhati, D.; Shukla, D.K. Retrieval-Augmented Generation (RAG) in healthcare: A comprehensive review. AI 2025, 6, 226. [Google Scholar] [CrossRef]
- Yuan, S.; Yang, Z.; Li, J.; Wu, C.; Liu, S. AI-Powered early warning systems for clinical deterioration significantly improve patient outcomes: A meta-analysis. BMC Med. Inform. Decis. Mak. 2025, 25, 203. [Google Scholar] [CrossRef]
- Hu, T.; Zhou, X.-H. Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions. arXiv 2024, arXiv:2404.09135. [Google Scholar] [CrossRef]
- Boga, Z.; Sándor, C.; Kovács, P. A Multidimensional Particle Swarm Optimization-Based Algorithm for Brain MRI Tumor Segmentation. Sensors 2025, 25, 2800. [Google Scholar] [CrossRef]
- Gao, Z.; Peromingo Peromingo, D.; Cubillo Romero, J. Generación de Imágenes Mediante Modelos de Difusión. 2024. Available online: https://docta.ucm.es/entities/publication/66fa4de2-e0b2-4d2b-88cf-3885523dc16e (accessed on 10 March 2025).
- Park, S.H. Artificial intelligence for ultrasonography: Unique opportunities and challenges. Ultrasonography 2021, 40, 3–6. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- Yan, Y.; Wang, K.; Feng, B.; Yao, J.; Jiang, T.; Jin, Z.; Zheng, Y.; Zhou, Y.; Chen, C.; Sui, L.; et al. The use of large language models in detecting Chinese ultrasound report errors. npj Digit. Med. 2025, 8, 66. [Google Scholar] [CrossRef]
- Centro de Investigación de la Universidad San Agustín (USAT). La Inteligencia Artificial como Desafío en la Salud Materna Perinatal. USA. 2023. Available online: https://www.usat.edu.pe/articulos/la-inteligencia-artificial-como-desafio-en-la-salud-materna-perinatal/ (accessed on 20 February 2025).
- INS–Instituto Nacional de Salud. Protocolo de Vigilancia en Salud Pública de Morbilidad Materna Extrema; Versión 4; Instituto Nacional de Salud: Bogotá, Colombia, 2022. [CrossRef]
- Consultorsalud. Interoperabilidad de la Historia Clínica Electrónica. 2021. Available online: https://consultorsalud.com/interoperabilidad-de-la-historia-clinica-electronica/ (accessed on 15 March 2025).
- Eunice Kennedy Shriver National Institute of Child Health and Human Development. Guía Materna: Artículo de Interés: Una guía Desarrollada por el NICHD Establece un Marco para Vincular los Datos de Salud Materna y Salud Infantil. NIH. 2023. Available online: https://espanol.nichd.nih.gov/noticias/prensa/062623-guia-materna (accessed on 26 June 2023).
- ¿Cuáles Son Los Retos De Implementar Inteligencia Artificial En Los Sistemas De Salud Y Cómo Manejarlos Eficientemente? Atlantis University. 2024. Available online: https://atlantisuniversity.edu/es/au_blog/retos-inteligencia-artificial-en-salud/ (accessed on 15 June 2025).
- Atlantis University. La Inteligencia Artificial en el Sector Salud ¿Están en Riesgo Algunos Trabajos? AU Blog. 2023. Available online: https://atlantisuniversity.edu/es/au_blog/inteligencia-artificial-para-sector-salud/ (accessed on 1 March 2025).
- Mhatre, A.; Warhade, S.R.; Pawar, O.; Kokate, S.; Jain, S.; Emmanuel, M. Leveraging LLM: Implementing an Advanced AI Chatbot for Healthcare. Int. J. Innov. Sci. Res. Technol. 2024, 9, 3144–3151. [Google Scholar] [CrossRef]
- Torres, L.F. XGBoost: The King of Machine Learning Algorithms|by Luís Fernando Torres|LatinXinAI|Medium. Medium. 2023. Available online: https://medium.com/latinxinai/xgboost-the-king-of-machine-learning-algorithms-6b5c0d4acd87 (accessed on 8 February 2025).
- Abbas, S.R.; Abbas, Z.; Zahir, A.; Lee, S.W. Federated Learning in Smart Healthcare: A Comprehensive Review on Privacy, Security, and Predictive Analytics with IoT Integration. Healthcare 2024, 12, 2587. [Google Scholar] [CrossRef]
- Salud Colsubsidio Lanza el Primer modelo de Patología 100% Digital en Colombia y Transforma el Diagnóstico Clínico en el país. Available online: https://consultorsalud.com/salud-colsubsidio-modelo-patologia-digital/ (accessed on 15 June 2025).
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient Finetuning of Quantized LLMs. NeurIPS. 2023. Available online: https://github.com/artidoro/qlora (accessed on 8 February 2025).
- Mapari, S.A.; Shrivastava, D.; Dave, A.; Bedi, G.N.; Gupta, A.; Sachani, P.; Kasat, P.R.; Pradeep, U. Revolutionizing Maternal Health: The Role of Artificial Intelligence in Enhancing Care and Accessibility. Cureus 2024, 16, e69555. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
- OPS Conmemora el Día Mundial de la Salud Destacando Avances y Desafíos en salud Materna y Neonatal en la Región-OPS/OMS|Organización Panamericana de la Salud. Available online: https://www.paho.org/es/noticias/7-4-2025-ops-conmemora-dia-mundial-salud-destacando-avances-desafios-salud-materna (accessed on 15 June 2025).
- Organización Panamericana de la Salud (OPS). La OPS destaca avances en la reducción de la mortalidad materna en las Américas, pero advierte sobre desafíos persistentes. OPS/OMS 2025, b56.
- El CIE se une a la OMS para Priorizar la Salud Materna y del Recién Nacido Mediante la Inversión en Enfermería. ICN-International Council of Nurses. Available online: https://www.icn.ch/es/noticias/el-cie-se-une-la-oms-para-priorizar-la-salud-materna-y-del-recien-nacido-mediante-la (accessed on 15 June 2025).
- UNFPA América Latina y el Caribe. Salud Materna. UNFPA LAC. Available online: https://lac.unfpa.org/es/topics/salud-matern (accessed on 15 June 2025).
- Chen, Y.; Zhang, W.; Liu, K. GraphRAG: Enhancing Biomedical QA with Knowledge Graph Grounding. J. Biomed. Inform. 2024, 154, 104217.
- Qiao, S.; Fang, X.; Garrett, C.; Zhang, R.; Li, X.; Kang, Y. Generative AI for qualitative analysis in a maternal health study: Coding in-depth interviews using Large Language Models (LLMs). medRxiv 2024.





| Thematic Axis | Database | Search Equation |
|---|---|---|
| Axis 1: Fundamentals of RAG in Medicine | PubMed | (“retrieval augmented generation”[Title/Abstract] OR RAG[Title/Abstract]) AND (medicine OR biomedical) |
| Axis 1: Fundamentals of RAG in Medicine | IEEE Xplore | “retrieval augmented generation” AND (medical OR clinical) |
| Axis 2: Biomedical LLMs and QA Generation | Scopus | TITLE-ABS-KEY(“biomedical language model” OR “BioGPT”) AND “question answering” AND PUBYEAR > 2022 |
| Axis 2: Biomedical LLMs and QA Generation | PubMed | (“biomedical language model” OR “Med-PaLM” OR “BioGPT”) AND (“question answering”) |
| Axis 3: QA as Input for Intelligent Agents | PubMed | (“question answering” AND “chatbot” OR “conversational agent”) AND (healthcare OR telemedicine) |
| Axis 4: Clinical Validation and Explainability | Scopus | TITLE-ABS-KEY(“clinical validation” OR “traceability” OR “explainable AI”) AND (“medical question answering”) AND PUBYEAR > 2022 |
| Axis 5: Applications in Maternal Health | Google Scholar | “maternal health QA system” OR “telemedicine agents pregnancy care” AND 2023..2025 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Noguera, A.; Mogollón-Benavides, A.L.; Niño-Mojica, M.D.; Rua, S.; Sanin-Villa, D.; Tejada, J.C. Applications and Challenges of Retrieval-Augmented Generation (RAG) in Maternal Health: A Multi-Axial Review of the State of the Art in Biomedical QA with LLMs. Sci 2025, 7, 148. https://doi.org/10.3390/sci7040148

