Generative Simulation and Summarization of Neonatal Patient Data
Abstract
1. Introduction
- Development of a GAN-driven NICU patient simulator that produces realistic neonatal respiration rate (RR) and heart rate (HR) data, optionally with arrhythmia. The simulator also generates realistic clinical and routine care events and accounts for correlations between them. Together with the user-entered post-conceptual age, sex, and weight of the patient, these data are exported as a JSON data structure suitable for ingestion by the NPSS.
- Development of a modular LLM-based summarization pipeline (NPSS) that uses retrieval-augmented generation (RAG) to support clinical accuracy and contextual relevance across multiple use cases.
- Validation of the summarization tools using synthetic patient care and health status data generated by the simulator. Groundedness and relevance are measured with an LLM-as-a-judge approach across three judge LLMs with different architectures.
- Demonstration of practical NPSS use cases, including nurse-to-nurse handovers, automated charting, and real-time parental communication. In each use case, the generated summaries show a tone and level of technical detail specific to the intended user.
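As an illustration of the kind of JSON export described in the first contribution, the following minimal sketch bundles demographics, vital-sign series, and care events into one structure. All field names, units, and example values here are hypothetical and are not taken from the simulator's actual schema.

```python
import json


def build_patient_record(age_weeks, sex, weight_g, hr_bpm, rr_bpm, events):
    """Bundle demographics, vital-sign series, and care events into one dict.

    The key names below are illustrative assumptions, not the NPSS schema.
    """
    return {
        "patient": {
            "post_conceptual_age_weeks": age_weeks,
            "sex": sex,
            "weight_g": weight_g,
        },
        "vitals": {"hr_bpm": list(hr_bpm), "rr_bpm": list(rr_bpm)},
        "events": [dict(e) for e in events],
    }


# Toy values for demonstration only.
record = build_patient_record(
    age_weeks=32,
    sex="F",
    weight_g=1650,
    hr_bpm=[128, 131, 127],
    rr_bpm=[44, 46, 43],
    events=[{"timestep": 1, "type": "feeding"}],
)
payload = json.dumps(record, indent=2)
```

A downstream consumer can recover the same structure with `json.loads(payload)`, which is what makes a plain JSON export convenient as the interface between a simulator and a summarization pipeline.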
1.1. Background and Related Work
Research Gaps Motivating This Study
2. Materials and Methods
2.1. NICU Patient Simulator Development
2.1.1. GAN Architecture and Training
2.1.2. Data Sources and Preprocessing for GAN Training
2.1.3. Intervention Modeling
2.2. Text Summarization Pipeline
2.2.1. Embedding & Retrieval
2.2.2. Prompt Engineering & Summarization Modules
2.3. Experimental Validation
2.3.1. Simulator Evaluation Methodology
Distributional Similarity Metrics
Summary Statistics
Arrhythmia Simulation
2.3.2. NPSS Validation Framework
3. Results
3.1. NICU Patient Simulator Results
3.1.1. Synthetic Vital Sign Realism
3.1.2. Simulation of Arrhythmia
3.2. NPSS Results
3.3. Ablation Study: Impact of RAG
3.4. Compute Resources Used in This Study
4. Discussion
4.1. Clinical Implications & Contributions
4.2. Limitations of the Study
4.3. Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest







| Parameter | Patient Data Retriever | Reference Material Retriever |
|---|---|---|
| Chunking Strategy | RecursiveCharacterTextSplitter | RecursiveCharacterTextSplitter |
| Chunk Size | 5000 characters | 250 characters |
| Chunk Overlap | 200 characters | 0 characters |
| Retrieval Method | Cosine Similarity | Cosine Similarity |
| Top-k | 5 | 1 |
| Embedding Model | text-embedding-ada-002 | text-embedding-ada-002 |
| Vector Store | InMemoryVectorStore | InMemoryVectorStore |
| Reranking | None | None |
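To make the retriever parameters above concrete, the following sketch applies the chunk size, chunk overlap, and top-k values from the table using only the Python standard library. It is a simplified stand-in under stated assumptions: `chunk_text` uses fixed character windows rather than the recursive, separator-aware splitting of `RecursiveCharacterTextSplitter`, and the vectors are toy values rather than `text-embedding-ada-002` embeddings. The function and dictionary names are hypothetical.

```python
import math

# Values copied from the table; the dictionary layout itself is an assumption.
PATIENT_RETRIEVER = {"chunk_size": 5000, "chunk_overlap": 200, "top_k": 5}
REFERENCE_RETRIEVER = {"chunk_size": 250, "chunk_overlap": 0, "top_k": 1}


def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k):
    """Return indices of the k chunk vectors most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# A 600-character document split with the reference-material settings.
chunks = chunk_text(
    "x" * 600,
    REFERENCE_RETRIEVER["chunk_size"],
    REFERENCE_RETRIEVER["chunk_overlap"],
)

# Toy 2-D "embeddings" standing in for real embedding vectors.
best = top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]], REFERENCE_RETRIEVER["top_k"])
```

The contrast between the two configurations is visible in the parameters alone: large overlapping chunks with top-k of 5 for patient data, and small non-overlapping chunks with a single retrieved passage for reference material.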

| Metric | Real Data | Synthetic Data |
|---|---|---|
| Mean (BPM) | 128.4 | 127.0 |
| Standard deviation (STD) | 12.7 | 13.5 |
| Skewness | 0.12 | 0.15 |
| Kurtosis | 2.85 | 3.10 |
| KS statistic/p-value/WD | 0.085/0.320/2.30 | |
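The moment-based rows of this table (mean, standard deviation, skewness, kurtosis) can be computed as in the sketch below. It assumes population (not sample) moments and non-excess (Pearson) kurtosis, for which a normal distribution scores 3; the source does not state which convention was used, and `summary_stats` is a hypothetical helper name.

```python
import math


def summary_stats(values):
    """Return mean, population std, skewness, and non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for a constant series")
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


# Toy series for demonstration only, not data from the study.
stats = summary_stats([1, 2, 3, 4, 5])
```

Applying the same function to the real and the synthetic series yields the two columns that are compared side by side.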

| Metric | Real Data | Synthetic Data |
|---|---|---|
| Mean (BPM) | 127.26 | 126.86 |
| Standard deviation (STD) | 10.81 | 12.29 |
| Skewness | 0.23 | |
| Kurtosis | | |
| KS statistic/p-value/WD | 0.168/0.118/3.761 | |
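The KS statistic and Wasserstein distance (WD) in the last row compare the real and synthetic distributions directly. A minimal standard-library sketch of both, built on empirical CDFs, is given below. It omits the p-value, which in practice could be obtained from a statistics package such as `scipy.stats.ks_2samp`; the source does not state which implementation was used, so the helper names here are assumptions.

```python
from bisect import bisect_right


def _ecdf(sorted_sample, x):
    """Empirical CDF of a sorted sample evaluated at x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)


def wasserstein_distance(a, b):
    """1-D Wasserstein-1 distance: area between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a + b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both CDFs are constant on [left, right), so the area is a rectangle.
        total += abs(_ecdf(a, left) - _ecdf(b, left)) * (right - left)
    return total


# Toy samples: the second is the first shifted by one unit.
real = [1, 2, 3]
synthetic = [2, 3, 4]
ks = ks_statistic(real, synthetic)
wd = wasserstein_distance(real, synthetic)
```

For these toy samples the Wasserstein distance equals the size of the shift, which illustrates why it is reported in the units of the signal, whereas the KS statistic is unitless and bounded by 1.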

| Metric | Real Data | Synthetic Data |
|---|---|---|
| Count of episodes (8 h) | 57 | 59 |
| Frequency (episodes/h) | 7.13 | 7.38 |
| Mean duration (timesteps) | 15.77 | 15.85 |
| Mean maximum BPM | 213.59 | 213.15 |
| KS statistic/p-value (duration) | 0.081/0.785 | |
| KS statistic/p-value (max BPM) | 0.093/0.682 | |

| Metric | Real Data | Synthetic Data |
|---|---|---|
| Count of episodes (8 h) | 42 | 45 |
| Frequency (episodes/h) | 5.25 | 5.63 |
| Mean duration (timesteps) | 10.20 | 10.60 |
| Mean minimum BPM | 78.40 | 77.90 |
| KS statistic/p-value (duration) | 0.094/0.713 | |
| KS statistic/p-value (min BPM) | 0.089/0.653 | |
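Episode-level rows such as count, frequency, mean duration, and mean extreme BPM can be derived from a heart rate series by thresholding, as sketched below. The thresholds, the rule that an episode is a contiguous run beyond the threshold, and the function names are illustrative assumptions; the source tables do not state the detection criteria. The frequency row is simply the count divided by the recording length (for example, 42 episodes over 8 h gives 5.25 episodes/h).

```python
def find_episodes(hr_bpm, threshold, above=True):
    """Return (start, end_exclusive) index pairs for contiguous runs beyond threshold."""
    episodes, start = [], None
    for i, value in enumerate(hr_bpm):
        beyond = value > threshold if above else value < threshold
        if beyond and start is None:
            start = i
        elif not beyond and start is not None:
            episodes.append((start, i))
            start = None
    if start is not None:
        # The series ended while still inside an episode.
        episodes.append((start, len(hr_bpm)))
    return episodes


def episode_summary(hr_bpm, threshold, above, hours):
    """Summarize episode count, hourly frequency, mean duration, and mean extreme BPM."""
    episodes = find_episodes(hr_bpm, threshold, above)
    extreme = max if above else min
    durations = [end - start for start, end in episodes]
    extremes = [extreme(hr_bpm[start:end]) for start, end in episodes]
    n = len(episodes)
    return {
        "count": n,
        "per_hour": n / hours,
        "mean_duration": sum(durations) / n if n else 0.0,
        "mean_extreme_bpm": sum(extremes) / n if n else 0.0,
    }


# Toy series with a hypothetical high-rate threshold of 200 BPM.
hr = [130, 205, 215, 210, 128, 126, 202, 131]
summary = episode_summary(hr, threshold=200, above=True, hours=8)
```

Passing `above=False` with a low threshold gives the corresponding low-rate summary, with the mean minimum BPM in place of the mean maximum.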

| Judge Model | Groundedness | Relevance |
|---|---|---|
| o3-mini | | |
| Llama3 | | |
| Mistral | | |
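A table of this shape can be produced by averaging per-summary scores for each judge. The sketch below assumes each judge returns a numeric groundedness and relevance score per summary on a 0 to 1 scale; the scale, the input layout, the judge labels, and the scores are all hypothetical and are not results from the study.

```python
from statistics import mean

METRICS = ("groundedness", "relevance")


def aggregate_judge_scores(scores):
    """Average each metric per judge.

    scores maps a judge name to a list of per-summary dicts,
    e.g. {"judge_a": [{"groundedness": 1.0, "relevance": 0.5}, ...]}.
    """
    return {
        judge: {metric: mean(row[metric] for row in rows) for metric in METRICS}
        for judge, rows in scores.items()
    }


# Toy scores for two placeholder judges, for demonstration only.
toy_scores = {
    "judge_a": [
        {"groundedness": 1.0, "relevance": 0.5},
        {"groundedness": 0.5, "relevance": 1.0},
    ],
    "judge_b": [
        {"groundedness": 1.0, "relevance": 1.0},
    ],
}
table = aggregate_judge_scores(toy_scores)
```

Reporting one row per judge, rather than pooling all judges, keeps any systematic disagreement between judge models visible.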

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Levine, J.; Riarh, G.; Green, J.R. Generative Simulation and Summarization of Neonatal Patient Data. Information 2026, 17, 261. https://doi.org/10.3390/info17030261

