QAMT: An LLM-Based Framework for Quality-Assured Medical Time-Series Data Generation
Abstract
1. Introduction
- Challenge 1: Joint Generation of Temporal Data and Static Event Data. Medical time-series data consist of diverse data types, each with unique characteristics. Among them, static event data are generally high-dimensional and discrete (e.g., demographics, clinical outcomes), whereas temporal data tend to be lower-dimensional and continuous (e.g., vital signs, laboratory measurements). Therefore, jointly generating medical time-series data that include both static and temporal components is essential for producing comprehensive and realistic datasets.
- Challenge 2: Clinical Constraints and Variable Dependencies. Many variables in medical time-series data are governed by clinical constraints, and their values often exhibit strong interdependencies. For example, a systolic blood pressure reading of zero is clinically impossible. Likewise, if a patient’s systolic blood pressure is consistently above 150 mmHg, the final diagnosis would not be hypotension. Accurately modeling these constraints and dependencies is essential to ensure the clinical plausibility of generated data.
- Challenge 3: Need for Interpretability. The generation process of medical time-series data should be interpretable. In clinical research and practice, it is crucial for stakeholders to understand how the data are produced in order to evaluate their quality, support downstream applications, and maintain trust in data-driven healthcare systems.
- QAMT jointly generates medical time-series data, which include both continuous temporal data and discrete static event data.
- QAMT assures the quality of the generated data by accounting for real-world clinical constraints and variable dependencies.
- QAMT enables interpretability in the medical time-series data generation process.
- QAMT is evaluated on the eICU and MIMIC-III datasets, and demonstrates superior performance compared to state-of-the-art models in terms of fidelity, utility, and privacy.
2. Preliminaries and Related Work
2.1. Definition
2.2. Medical Time-Series Data Generation
2.3. Medical Time-Series Data Quality Assurance
2.4. Interpretability of Data Generation
3. QAMT Overview
- (a) A health knowledge graph (HKG) is constructed from real-world medical time-series datasets and open knowledge bases using an existing Health Knowledge Graph Builder (HKGB). This knowledge graph serves as a domain-specific knowledge resource for the downstream LLMs (Section 4.1).
- (b) HKG-CoT, a chain-of-thought (CoT) reasoning process enriched with clinical knowledge from the HKG, is constructed to provide healthcare-specific inference capabilities (Section 4.2).
- (c) The static event data generation module uses Retrieval-Augmented Generation (RAG) guided by the HKG to generate static event data (Section 5.1); this approach is referred to as Health Knowledge Graph-based Retrieval-Augmented Generation (HKG-RAG).
- (d) The medical time-series data quality assurance module evaluates the generated static event data using HKG-CoT (Section 6.1), yielding constrained static event data that satisfy clinical constraints and are logically consistent. The evaluation results are fed back to the static event data generation module.
- (e) The temporal data generation module uses a GAN to generate temporal data (Section 5.2), with the constrained static event data as conditional guidance.
- (f) The quality assurance module then validates the generated temporal data against the Concept Knowledge Graph (CKG), a subgraph of the HKG (Section 6.2), checking clinical value-range constraints and plausibility to obtain constrained temporal data.
- (g) The constrained temporal data are input to an LLM-based diagnostic model, LLM-EvPredict, which predicts the corresponding static event data. The quality assurance module then compares the predicted static event data with the previously constrained static event data using LLM-TSAssure (Section 6.3).
- (h) Finally, if the predicted and constrained static event data are deemed consistent, the static event data and temporal data are considered to satisfy the variable dependencies and are jointly assembled into the final, reliable synthetic medical time-series data (Section 6.3).
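The control flow of steps (c) to (h) can be sketched as a small generate-check-retry loop. This is only an illustrative sketch under stated assumptions: the callables stand in for HKG-RAG, HKG-CoT, the GAN, the CKG check, and LLM-EvPredict; the retry budget `max_rounds`, the plain equality test used in place of LLM-TSAssure, and the behavior on a failed consistency check are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SyntheticRecord:
    static_events: dict  # constrained static event data
    temporal: list       # constrained temporal data


def generate_record(
    gen_static: Callable[[Optional[str]], dict],                # stands in for HKG-RAG
    check_static: Callable[[dict], Tuple[bool, Optional[str]]],  # stands in for HKG-CoT
    gen_temporal: Callable[[dict], list],                       # stands in for the GAN
    constrain_temporal: Callable[[list], list],                 # stands in for the CKG check
    predict_events: Callable[[list], dict],                     # stands in for LLM-EvPredict
    max_rounds: int = 3,  # assumed retry budget, not from the paper
) -> Optional[SyntheticRecord]:
    feedback = None
    for _ in range(max_rounds):
        static = gen_static(feedback)                        # step (c)
        ok, feedback = check_static(static)                  # step (d)
        if not ok:
            continue                                         # feedback returns to (c)
        temporal = constrain_temporal(gen_temporal(static))  # steps (e) and (f)
        predicted = predict_events(temporal)                 # step (g)
        if predicted == static:  # equality is a placeholder for LLM-TSAssure
            return SyntheticRecord(static, temporal)         # step (h)
        feedback = "temporal data do not support the static events"
    return None


# Toy stand-ins so the sketch runs end to end; none of this is clinical logic.
seen_feedback = []


def toy_gen_static(feedback):
    seen_feedback.append(feedback)
    label = "hypotension" if feedback is None else "hypertension"
    return {"diagnosis": label}


def toy_check_static(static):
    ok = static["diagnosis"] == "hypertension"
    return ok, (None if ok else "diagnosis conflicts with patient history")


def toy_gen_temporal(static):
    return [160.0, 0.0, 155.0]  # systolic BP in mmHg, including an impossible 0


def toy_constrain_temporal(series):
    return [v for v in series if 50.0 <= v <= 250.0]  # hypothetical value range


def toy_predict_events(series):
    mean = sum(series) / len(series)
    return {"diagnosis": "hypertension" if mean > 140.0 else "hypotension"}


record = generate_record(toy_gen_static, toy_check_static, toy_gen_temporal,
                         toy_constrain_temporal, toy_predict_events)
print(record)
```

In the toy run, the first static draft is rejected, the feedback string is passed to the second generation attempt, and the impossible zero reading is filtered out before the consistency check.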
4. Health Knowledge Graph Module
4.1. HKG
4.2. HKG-CoT
4.3. HKG-RAG
5. Medical Time-Series Data Generation Module
5.1. Static Event Data Generation Module
5.2. Temporal Data Generation Module
6. Medical Time-Series Data Quality Assurance Module
6.1. Clinical Constraint Assurance in Static Event Data
6.2. Clinical Constraint Assurance in Temporal Data
6.3. Assurance of Variable Dependencies
7. The Interpretability of Medical Time-Series Data Generation
8. Experimental Results
8.1. Experimental Setup
8.1.1. Datasets
8.1.2. Evaluation Metrics
- Fidelity. For static event data, we assessed fidelity using the probabilities of unigrams, bigrams, and trigrams within each visit, as well as the probabilities of sequential bigrams across consecutive visits. For example, the probability of a sequential bigram was computed by dividing its frequency by the total number of patients. We then calculated the Pearson correlation between the probabilities of the top 1000 n-grams in the real and generated datasets to evaluate the similarity of their distributions.
- Utility. We assessed the utility of the generated data on four downstream tasks involving two disease types: sepsis clustering [56], sepsis treatment strategy modeling [57], ARDS (Acute Respiratory Distress Syndrome) prediction [58], and ARDS treatment strategy modeling [59]. The smaller the gap between results obtained with generated data and with real data, the higher the utility. For the sepsis clustering task, the original study [56] evaluated results before and after applying its proposed method using the Sum of Squares Error (SSE) metric; accordingly, we adopted the difference in SSE improvement between real and generated data as our evaluation metric. For the sepsis treatment strategy task, we measured the difference in patient condition improvement between models trained on generated data and on real data. For ARDS prediction, we compared the AUROC scores of classifiers trained on real data and on generated data within a 12 h window before onset. For ARDS treatment strategy modeling, we compared the average reduction in mortality achieved by reinforcement learning algorithms trained on real or generated data.
- Privacy. We adopted the Membership Inference Attack (MIA) as the privacy evaluation metric, which tests whether specific data points were included in the training data [27]. We fit a K-Nearest Neighbors (KNN) model on the generated data and on the real dataset and calculated the nearest-neighbor distance for each patient. A significant disparity between the distance distributions of the generated and real sets indicates lower privacy. We used the Hamming distance for static event sequences and the Euclidean distance for temporal embeddings. We then fit Gaussian distributions to these distances and assessed the difference between the two distributions using the Wasserstein Distance (WD), Jensen–Shannon Divergence (JSD), and Area Under the Receiver Operating Characteristic curve (AUROC).
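The n-gram fidelity score described under Fidelity above can be sketched as follows. This is a minimal sketch, not the authors' evaluation code. It assumes each patient is a list of visits and each visit a list of codes, that within-visit n-grams are unordered code combinations, and that the top n-grams are ranked by their probability in the real data (with missing n-grams in the generated data counted as 0). All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def within_visit_ngram_probs(patients, n):
    """Count each set of n codes co-occurring in one visit; divide by #patients."""
    counts = Counter()
    for visits in patients:
        for visit in visits:
            counts.update(combinations(sorted(set(visit)), n))
    return {gram: c / len(patients) for gram, c in counts.items()}


def sequential_bigram_probs(patients):
    """Count (code in visit t, code in visit t+1) pairs; divide by #patients."""
    counts = Counter()
    for visits in patients:
        for prev, nxt in zip(visits, visits[1:]):
            counts.update((a, b) for a in set(prev) for b in set(nxt))
    return {gram: c / len(patients) for gram, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; assumes neither sequence is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ngram_fidelity(real_probs, gen_probs, top_k=1000):
    """Correlate real vs. generated probabilities over the top_k real n-grams."""
    top = sorted(real_probs, key=real_probs.get, reverse=True)[:top_k]
    return pearson([real_probs[g] for g in top],
                   [gen_probs.get(g, 0.0) for g in top])


# Toy data: three patients per dataset, codes are placeholder letters.
real = [[["A", "B"], ["C"]], [["A"], ["C"]], [["A", "B"], ["B"]]]
gen = [[["A", "B"], ["C"]], [["A"], ["B"]], [["A"], ["C"]]]

real_uni = within_visit_ngram_probs(real, 1)
gen_uni = within_visit_ngram_probs(gen, 1)
score = ngram_fidelity(real_uni, gen_uni)
print(f"unigram fidelity: {score:.3f}")
```

The same `ngram_fidelity` call applies unchanged to the bigram, trigram, and sequential-bigram probability tables, which is why the probability computation and the correlation step are kept separate.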
8.1.3. Baselines
8.1.4. Experimental Details
8.2. Medical Time-Series Data Fidelity Evaluation
8.3. Medical Time-Series Data Utility Evaluation
8.4. Medical Time-Series Data Privacy Evaluation
8.5. Robustness Analysis
8.5.1. Statistical Significance Test
8.5.2. Noise Robustness Analysis
8.6. Parameter Sensitivity Analysis
8.7. Ablation Experiments
8.7.1. Fidelity Evaluation
8.7.2. Utility Evaluation
8.7.3. Privacy Evaluation
9. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Cruz-Vega, I.B.; Ávila Vanzzini, N.; González-Gómez, G.H.; Springall, R.; Echeverría, J.C.; Lerma, C. Dynamic Response of Heart Rate Variability to Active Standing in Aortic Valve Disease: Insights from Recurrence Quantification Analysis. Sensors 2025, 25, 1535.
- Fan, Y.; Dang, Y.; Guo, Y. Fault Identification Model Using Convolutional Neural Networks with Transformer Architecture. Sensors 2025, 25, 3897.
- Rupprechter, S.; Morinan, G.; Peng, Y.; Foltynie, T.; Sibley, K.; Weil, R.S.; Leyland, L.A.; Baig, F.; Morgante, F.; Gilron, R.; et al. A Clinically Interpretable Computer-Vision Based Method for Quantifying Gait in Parkinson’s Disease. Sensors 2021, 21, 5437.
- Kalahasty, R.; Yerrapragada, G.; Lee, J.; Gopalakrishnan, K.; Kaur, A.; Muddaloor, P.; Sood, D.; Parikh, C.; Gohri, J.; Panjwani, G.A.R.; et al. A Novel You Only Listen Once (YOLO) Deep Learning Model for Automatic Prominent Bowel Sounds Detection: Feasibility Study in Healthy Subjects. Sensors 2025, 25, 4735.
- Dang, T.H.; Kim, S.m.; Choi, M.s.; Hwan, S.n.; Min, H.k.; Bien, F. An Automated Algorithm for Obstructive Sleep Apnea Detection Using a Wireless Abdomen-Worn Sensor. Sensors 2025, 25, 2412.
- Randazzo, V.; Caligari, S.; Pasero, E.; Giustetto, C.; Saglietto, A.; Bertarello, W.; Averbuch, A.; Marcus-Kalish, M.; Zheludev, V.; Gaita, F. A Vision Transformer Model for the Prediction of Fatal Arrhythmic Events in Patients with Brugada Syndrome. Sensors 2025, 25, 824.
- Pollard, T.J.; Johnson, A.E.; Raffa, J.D.; Celi, L.A.; Mark, R.G.; Badawi, O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci. Data 2018, 5, 180178.
- Johnson, A.E.; Pollard, T.J.; Shen, L.; Lehman, L.w.H.; Feng, M.; Ghassemi, M.; Moody, B.; Szolovits, P.; Anthony Celi, L.; Mark, R.G. MIMIC-III, a freely accessible critical care database. Sci. Data 2016, 3, 160035.
- Theodorou, B.; Xiao, C.; Sun, J. Synthesize high-dimensional longitudinal electronic health records via hierarchical autoregressive language model. Nat. Commun. 2023, 14, 5305.
- Karami, H.; Atienza Alonso, D.; Ionescu, A. SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records using Decoder-Only Transformers. In Proceedings of the 38th Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024.
- El Emam, K.; Buckeridge, D.; Tamblyn, R.; Neisa, A.; Jonker, E.; Verma, A. The re-identification risk of Canadians from longitudinal demographics. BMC Med. Inform. Decis. Mak. 2011, 11, 46.
- Benitez, K.; Malin, B. Evaluating re-identification risks with respect to the HIPAA privacy rule. J. Am. Med. Inform. Assoc. 2010, 17, 169–177.
- Abbas, S.R.; Abbas, Z.; Zahir, A.; Lee, S.W. Federated learning in smart healthcare: A comprehensive review on privacy, security, and predictive analytics with IoT integration. Healthcare 2024, 12, 2587.
- Kingma, D.P.; Welling, M. An introduction to variational autoencoders. Found. Trends® Mach. Learn. 2019, 12, 307–392.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Yan, C.; Zhang, Z.; Nyemba, S.; Malin, B.A. Generating Electronic Health Records with Multiple Data Types and Constraints. In Proceedings of the AMIA 2020, American Medical Informatics Association Annual Symposium, Virtual, 14–18 November 2020.
- Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional gan. arXiv 2019, arXiv:1907.00503.
- Pradhan, P.K.; Das, A.; Kumar, A.; Baruah, U.; Sen, B.; Ghosal, P. SwinSight: A hierarchical vision transformer using shifted windows to leverage aerial image classification. Multim. Tools Appl. 2024, 83, 86457–86478.
- Wang, Z.; Sun, J. PromptEHR: Conditional Electronic Healthcare Records Generation with Prompt Learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2873–2885.
- Zhang, Z.; Yan, C.; Lasko, T.A.; Sun, J.; Malin, B.A. SynTEG: A framework for temporal structured electronic health data simulation. J. Am. Med. Inform. Assoc. 2021, 28, 596–604.
- Baowaly, M.K.; Lin, C.; Liu, C.; Chen, K. Synthesizing electronic health records using improved generative adversarial networks. J. Am. Med. Inform. Assoc. 2019, 26, 228–241.
- Lu, C.; Reddy, C.K.; Wang, P.; Nie, D.; Ning, Y. Multi-Label Clinical Time-Series Generation via Conditional GAN. IEEE Trans. Knowl. Data Eng. 2024, 36, 1728–1740.
- Nikolentzos, G.; Vazirgiannis, M.; Xypolopoulos, C.; Lingman, M.; Brandt, E.G. Synthetic electronic health records generated with variational graph autoencoders. Npj Digit. Med. 2023, 6, 83.
- Pang, C.; Jiang, X.; Pavinkurve, N.P.; Kalluri, K.S.; Minto, E.L.; Patterson, J.; Zhang, L.; Hripcsak, G.; Elhadad, N.; Natarajan, K. CEHR-GPT: Generating Electronic Health Records with Chronological Patient Timelines. arXiv 2024, arXiv:2402.04400.
- Esteban, C.; Hyland, S.L.; Rätsch, G. Real-valued (Medical) Time Series Generation with Recurrent Conditional GANs. arXiv 2017, arXiv:1706.02633.
- Karami, H.; Hartley, M.; Atienza, D.; Ionescu, A. TimEHR: Image-based Time Series Generation for Electronic Health Records. arXiv 2024, arXiv:2402.06318.
- Yoon, J.; Mizrahi, M.J.; Ghalaty, N.F.; Jarvinen, T.; Ravi, A.S.; Brune, P.; Kong, F.; Anderson, D.; Lee, G.; Meir, A.; et al. EHR-Safe: Generating high-fidelity and privacy-preserving synthetic electronic health records. Npj Digit. Med. 2023, 6, 141.
- Lee, Y.; Chae, Y.; Jung, K. Leveraging VQ-VAE tokenization for autoregressive modeling of medical time series. Artif. Intell. Med. 2024, 154, 102925.
- Zhou, X.; Jia, Q.; Hu, Y.; Xie, R.; Huang, T.; Yu, F.R. GenG: An LLM-Based Generic Time Series Data Generation Approach for Edge Intelligence via Cross-Domain Collaboration. In Proceedings of the IEEE INFOCOM 2024—IEEE Conference on Computer Communications Workshops, Vancouver, BC, Canada, 20 May 2024.
- Zhang, Y.; Sheng, M.; Zhou, R.; Wang, Y.; Han, G.; Zhang, H.; Xing, C.; Dong, J. HKGB: An inclusive, extensible, intelligent, semi-auto-constructed knowledge graph framework for healthcare with clinicians’ expertise incorporated. Inf. Process. Manag. 2020, 57, 102324.
- Borisov, V.; Seßler, K.; Leemann, T.; Pawelczyk, M.; Kasneci, G. Language Models are Realistic Tabular Data Generators. In Proceedings of the Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, 1–5 May 2023.
- Hernandez, M.; Epelde, G.; Alberdi, A.; Cilla, R.; Rankin, D. Synthetic data generation for tabular health records: A systematic review. Neurocomputing 2022, 493, 28–45.
- Viana, D.; Teixeira, R.; Baptista, J.; Pinto, T. Synthetic Data Generation Models for Time Series: A Literature Review. In Proceedings of the International Conference on Electrical, Computer and Energy Technologies, ICECET 2024, Sydney, Australia, 25–27 July 2024.
- Inan, M.S.K.; Hossain, S.; Uddin, M.N. Data augmentation guided breast cancer diagnosis and prognosis using an integrated deep-generative framework based on breast tumor’s morphological information. Inform. Med. Unlocked 2023, 37, 101171.
- Chu, Z.; Chen, J.; Chen, Q.; Yu, W.; He, T.; Wang, H.; Peng, W.; Liu, M.; Qin, B.; Liu, T. Navigate through enigmatic labyrinth a survey of chain of thought reasoning: Advances, frontiers and future. arXiv 2023, arXiv:2309.15402.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240.
- Peng, Y.; Yan, S.; Lu, Z. Transfer learning in biomedical natural language processing: An evaluation of BERT and ELMo on ten benchmarking datasets. arXiv 2019, arXiv:1906.05474.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. Adv. Neural Inf. Process. Syst. 2022, 35, 24824–24837.
- Lin, B.Y.; Chen, X.; Chen, J.; Ren, X. Kagnet: Knowledge-aware graph networks for commonsense reasoning. arXiv 2019, arXiv:1909.02151.
- Soman, K.; Rose, P.W.; Morris, J.H.; Akbas, R.E.; Smith, B.; Peetoom, B.; Villouta-Reyes, C.; Cerono, G.; Shi, Y.; Rizk-Jackson, A.; et al. Biomedical knowledge graph-optimized prompt generation for large language models. Bioinformatics 2024, 40, btae560.
- Zhao, R.; Zhao, F.; Wang, L.; Wang, X.; Xu, G. Kg-cot: Chain-of-thought prompting of large language models over knowledge graphs for knowledge-aware question answering. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI-24), Jeju, Republic of Korea, 3–9 August 2024; pp. 6642–6650.
- Matsumoto, N.; Moran, J.; Choi, H.; Hernandez, M.E.; Venkatesan, M.; Wang, P.; Moore, J.H. KRAGEN: A knowledge graph-enhanced RAG framework for biomedical problem solving using large language models. Bioinformatics 2024, 40, btae353.
- Shao, S.; Lin, S.; Huang, Z. A Medical Consultation System for Geriatric Disease Based on Multi-agent Architecture and Knowledge Graph. In Proceedings of the Health Information Science-13th International Conference, HIS 2024, Hong Kong, China, 8–10 December 2024; Proceedings; Siuly, S., Xing, C., Li, X., Zhou, R., Eds.; Lecture Notes in Computer Science; Springer: Singapore, 2024; Volume 15336, pp. 313–325.
- Rae, J.W.; Borgeaud, S.; Cai, T.; Millican, K.; Hoffmann, J.; Song, F.; Aslanides, J.; Henderson, S.; Ring, R.; Young, S.; et al. Scaling language models: Methods, analysis & insights from training gopher. arXiv 2021, arXiv:2112.11446.
- Ma, K.; Cheng, H.; Liu, X.; Nyberg, E.; Gao, J. Open-domain question answering via chain of reasoning over heterogeneous knowledge. arXiv 2022, arXiv:2210.12338.
- Xia, Y.; Wang, R.; Liu, X.; Li, M.; Yu, T.; Chen, X.; McAuley, J.; Li, S. Beyond chain-of-thought: A survey of chain-of-x paradigms for llms. arXiv 2024, arXiv:2404.15676.
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474.
- Zhao, X.; Liu, S.; Yang, S.Y.; Miao, C. Medrag: Enhancing retrieval-augmented generation with knowledge graph-elicited reasoning for healthcare copilot. In Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 4442–4457.
- Jiang, J.; Zhou, K.; Zhao, W.X.; Li, Y.; Wen, J.R. ReasoningLM: Enabling structural subgraph reasoning in pre-trained language models for question answering over knowledge graph. arXiv 2023, arXiv:2401.00158.
- Kang, M.; Kwak, J.M.; Baek, J.; Hwang, S.J. Knowledge graph-augmented language models for knowledge-grounded dialogue generation. arXiv 2023, arXiv:2305.18846.
- Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using siamese bert-networks. arXiv 2019, arXiv:1908.10084.
- Luo, Y.; Sheng, M.; Liu, X.; Wang, K.; Zhang, Y.; Zhao, H. Ltgan: Multi-label time-series gan with constraints for electronic health records generation. In Proceedings of the International Conference on Health Information Science, Hong Kong, China, 8–10 December 2024; Springer: Singapore, 2025; pp. 36–47.
- Kim, Y.; Xu, X.; McDuff, D.; Breazeal, C.; Park, H.W. Health-llm: Large language models for health prediction via wearable sensor data. arXiv 2024, arXiv:2401.06866.
- Jin, M.; Yu, Q.; Shu, D.; Zhang, C.; Fan, L.; Hua, W.; Zhu, S.; Meng, Y.; Wang, Z.; Du, M.; et al. Health-LLM: Personalized retrieval-augmented disease prediction system. arXiv 2024, arXiv:2402.00746.
- Harutyunyan, H.; Khachatrian, H.; Kale, D.C.; Ver Steeg, G.; Galstyan, A. Multitask learning and benchmarking with clinical time series data. Sci. Data 2019, 6, 96.
- Hao, R.; Sheng, M.; Zhang, Y.; Zhao, H.; Hao, C.; Li, W.; Wang, L.; Li, C. Enhancing clustering performance in sepsis time series data using gravity field. In Proceedings of the International Conference on Health Information Science, Melbourne, Australia, 23–24 October 2023; Springer: Singapore, 2023; pp. 199–212.
- Wang, Z.; Zhao, H.; Ren, P.; Zhou, Y.; Sheng, M. Learning optimal treatment strategies for sepsis using offline reinforcement learning in continuous space. In Proceedings of the International Conference on Health Information Science, Virtual, 28–30 October 2022; Springer: Cham, Switzerland, 2022; pp. 113–124.
- Le, S.; Pellegrini, E.; Green-Saxena, A.; Summers, C.; Hoffman, J.; Calvert, J.; Das, R. Supervised machine learning for the early prediction of acute respiratory distress syndrome (ARDS). J. Crit. Care 2020, 60, 96–102.
- Zheng, H.; Zhu, J.; Xie, W.; Zhong, J. Reinforcement learning assisted oxygen therapy for COVID-19 patients under intensive care. BMC Med. Inform. Decis. Mak. 2021, 21, 350.
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288.
- Team, G.; Georgiev, P.; Lei, V.I.; Burnell, R.; Bai, L.; Gulati, A.; Tanzer, G.; Vincent, D.; Pan, Z.; Wang, S.; et al. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv 2024, arXiv:2403.05530.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
- Hao, C.; Hao, R.; Zhao, H.; Zhang, Y.; Sheng, M.; An, Y. Identification and validation of sepsis subphenotypes using time-series data. Heliyon 2024, 10, e28520.
Method | Type | Domain | Joint Generation | Quality Assurance 1 | Interpretability |
---|---|---|---|---|---|
SynTEG [20] | GAN-based | Medical | ✗ | - | ✗ |
BGAN [21] | GAN-based | Medical | ✗ | - | ✗ |
MTGAN [22] | GAN-based | Medical | ✗ | - | ✗ |
VGAE [23] | VAE-based | Medical | ✗ | - | ✗ |
PromptEHR [19] | LLM-based | Medical | ✗ | CC, VD | ✗ |
CEHR-GPT [24] | LLM-based | Medical | ✗ | CC | ✗ |
RTSGAN [25] | GAN-based | Medical | ✗ | - | ✗ |
TimEHR [26] | GAN-based | Medical | ✗ | - | ✗ |
EHR-Safe [27] | GAN-based | Medical | ✗ | - | ✗ |
CodeAR [28] | VAE-based | Medical | ✗ | - | ✗ |
GenG [29] | LLM-based | Energy | ✗ | - | ✗ |
HGAN [16] | GAN-based | Medical | ✓ | CC, VD | ✗ |
HALO [9] | LLM-based | Medical | ✓ | CC | ✗ |
SynEHRgy [10] | LLM-based | Medical | ✓ | - | ✗ |
QAMT (ours) | LLM-based | Medical | ✓ | CC, VD | ✓ |
Relation | Source Node 1,2 | Target Node 1,2 | Example |
---|---|---|---|
Concept → Concept | |||
_has_symptom_ | Dis | Sym | COPD -_has_symptom_→ fever |
_is_treated_by_ | Dis/Sym | Med/Tre | COPD -_is_treated_by_→ -agonist |
_indicated_by_ | Dis | CI | Diabetes -_indicated_by_→ HbA1c |
_progresses_to_ | Dis | Dis | Prediabetes -_progresses_to_→ Type 2 Diabetes |
_treated_ | Med/Tre | Dis/Sym | -agonist -_treated_→ COPD |
_cause_ | Sym | Dis | fever -_cause_→ COPD |
_measured_ | CI | Dis | PaO2 -_measured_→ COPD |
_constraints_ | CI | CC | PaO2 -_constraints_→ 20–50 mmHg |
… | |||
Entity/Event → Concept | |||
_conforms_to_ | Entity/Event | CC | EUI:67178.Lab Result -_conforms_to_→ HbA1c > 7% |
_diagnoses_ | Event | Dis | EUI:43613.Description -_diagnoses_→ Hypertension |
_prescribes_ | Event | Med/Tre | EUI:54241.Prescription -_prescribes_→ Amoxicillin |
_associated_with_ | Event | CI | EUI:67129.Indicators -_associated_with_→ |
… | |||
Entity/Event → Entity/Event | |||
_participate_in_ | Entity | Event | IUI:7657 -_participate_in_→ EUI:67129 |
_associate_with_ | Entity (A/Eth) | Event | Age 65 -_associate_with_→ EUI:412546 |
_occurs_at_ | Event | Entity (L) | EUI:63415 -_occurs_at_→ Beijing |
_time_is_ | Event | Entity (T) | EUI:132807 -_time_is_→ Year 2010 |
_event_type_is_ | Event | Entity (ET) | EUI:67129 -_event_type_is_→ Surgery |
_after_/_before_ | Event | Event | EUI:67129 -_after_→ EUI:162308 |
… |
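The relation schema in the table above can be pictured as typed triples. The following is a minimal sketch, assuming a plain in-memory list of triples and assuming that a `_constraints_` target is stored as a string of the form "low–high unit", as in the table's example row. The class and function names are hypothetical, and treating an indicator without a known constraint as acceptable is a design choice of this sketch, not something stated in the paper.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    source: str
    relation: str
    target: str


# A few triples copied from the example column of the table.
HKG: List[Triple] = [
    Triple("COPD", "has_symptom", "fever"),
    Triple("Diabetes", "indicated_by", "HbA1c"),
    Triple("Prediabetes", "progresses_to", "Type 2 Diabetes"),
    Triple("PaO2", "constraints", "20–50 mmHg"),
]


def value_range(indicator: str, graph: List[Triple]) -> Optional[Tuple[float, float]]:
    """Look up the numeric range attached to a clinical indicator, if any."""
    for t in graph:
        if t.source == indicator and t.relation == "constraints":
            bounds = t.target.split()[0]   # e.g. "20–50"
            low, high = bounds.split("–")  # en dash, as written in the table
            return float(low), float(high)
    return None


def in_range(indicator: str, value: float, graph: List[Triple]) -> bool:
    """True if the value satisfies the stored range (or no range is known)."""
    rng = value_range(indicator, graph)
    return rng is None or rng[0] <= value <= rng[1]


print(value_range("PaO2", HKG))
print(in_range("PaO2", 35.0, HKG), in_range("PaO2", 80.0, HKG))
```

A lookup of this kind is one way to read the Concept → Concept `_constraints_` edges when checking value ranges in generated temporal data.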
Method | Unigram | Bigram | Trigram | Sequential Bigram |
---|---|---|---|---|
MIMIC-III dataset | ||||
HGAN | 0.832 | 0.445 | 0.513 | 0.487 |
SynEHRgy | 0.907 | 0.717 | 0.738 | 0.571 |
HALO | 0.872 | 0.287 | 0.313 | 0.521 |
SynTEG | 0.858 | 0.501 | 0.647 | 0.562 |
QAMT | 0.928 | 0.721 | 0.718 | 0.631 |
eICU dataset | ||||
HGAN | 0.799 | 0.319 | 0.483 | 0.366 |
SynEHRgy | 0.848 | 0.763 | 0.711 | 0.500 |
HALO | 0.769 | 0.293 | 0.281 | 0.474 |
SynTEG | 0.787 | 0.579 | 0.663 | 0.492 |
QAMT | 0.897 | 0.755 | 0.736 | 0.592 |
Method | Precision | Recall | Density | Coverage | |
---|---|---|---|---|---|
MIMIC-III dataset | |||||
SynEHRgy | 0.781 (0.011) | 0.853 (0.003) | 0.711 (0.016) | 0.852 (0.008) | 0.036 |
HGAN | 0.731 (0.023) | 0.617 (0.028) | 0.745 (0.045) | 0.315 (0.004) | 0.083 |
HALO | 0.503 (0.038) | 0.461 (0.002) | 0.372 (0.029) | 0.215 (0.009) | 0.075 |
SynTEG | 0.610 (0.013) | 0.721 (0.006) | 0.672 (0.031) | 0.507 (0.005) | 0.045 |
QAMT | 0.811 (0.009) | 0.859 (0.011) | 0.739 (0.036) | 0.638 (0.003) | 0.024 |
eICU dataset | |||||
SynEHRgy | 0.814 (0.018) | 0.822 (0.005) | 0.701 (0.011) | 0.714 (0.007) | 0.042 |
HGAN | 0.669 (0.037) | 0.691 (0.042) | 0.728 (0.032) | 0.594 (0.004) | 0.045 |
HALO | 0.400 (0.061) | 0.417 (0.024) | 0.296 (0.018) | 0.395 (0.013) | 0.062 |
SynTEG | 0.523 (0.020) | 0.806 (0.012) | 0.633 (0.022) | 0.678 (0.006) | 0.051 |
QAMT | 0.863 (0.017) | 0.817 (0.009) | 0.698 (0.025) | 0.743 (0.005) | 0.033 |
Training data | Sepsis Clustering (%) | Sepsis Treatment | ARDS Prediction (AUROC) | ARDS Treatment (%) |
---|---|---|---|---|
MIMIC-III dataset | ||||
Real data | −32.37 | 0.217 | 0.809 | −2.33 |
HGAN | −28.67 | 0.189 | 0.764 | −1.73 |
SynEHRgy | −28.11 | 0.203 | 0.818 | −2.17 |
HALO | −27.72 | 0.195 | 0.793 | −2.11 |
SynTEG | −28.02 | 0.191 | 0.799 | −2.13 |
QAMT | −29.91 | 0.214 | 0.801 | −2.28 |
eICU dataset | ||||
Real data | −40.82 | 0.172 | 0.813 | −2.49 |
HGAN | −38.82 | 0.161 | 0.825 | −2.01 |
SynEHRgy | −37.41 | 0.167 | 0.816 | −2.31 |
HALO | −43.58 | 0.165 | 0.814 | −2.16 |
SynTEG | −40.07 | 0.162 | 0.820 | −2.20 |
QAMT | −42.17 | 0.173 | 0.807 | −2.24 |
Static Event Data | Temporal Data | |||||||
---|---|---|---|---|---|---|---|---|
Method | JSD | WD | AUROC | Method | JSD | WD | AUROC | |
MIMIC-III Dataset | ||||||||
HGAN | 0.015 | 0.001 | 0.482 | HGAN | 0.001 | 0.003 | 0.482 | |
SynEHRgy | 0.014 | 0.001 | 0.461 | SynEHRgy | 0.002 | 0.002 | 0.492 | |
HALO | 0.013 | 0.000 | 0.477 | HALO | 0.003 | 0.001 | 0.493 | |
SynTEG | 0.014 | 0.000 | 0.469 | SynTEG | 0.002 | 0.001 | 0.488 | |
QAMT | 0.013 | 0.000 | 0.456 | QAMT | 0.001 | 0.002 | 0.477 | |
eICU Dataset | ||||||||
HGAN | 0.015 | 0.002 | 0.496 | HGAN | 0.001 | 0.001 | 0.497 | |
SynEHRgy | 0.015 | 0.002 | 0.479 | SynEHRgy | 0.001 | 0.002 | 0.508 | |
HALO | 0.015 | 0.001 | 0.486 | HALO | 0.003 | 0.002 | 0.509 | |
SynTEG | 0.015 | 0.001 | 0.481 | SynTEG | 0.002 | 0.002 | 0.506 | |
QAMT | 0.015 | 0.001 | 0.456 | QAMT | 0.001 | 0.002 | 0.504 |
Data | Metric | Value | 95% CI | p |
---|---|---|---|---|
Static event data | Unigram | 0.928 | [0.905, 0.951] | < 0.05 |
Static event data | Bigram | 0.721 | [0.712, 0.730] | < 0.05 |
Temporal data | Precision | 0.811 | [0.802, 0.820] | < 0.05 |
Temporal data | Recall | 0.859 | [0.852, 0.866] | < 0.05 |
Time-series data | Sepsis Clustering (%) | −29.91 | [−30.27, −29.55] | < 0.05 |
Time-series data | Sepsis Treatment | 0.214 | [0.211, 0.217] | < 0.01 |
Noise Intensity | Precision | Recall | Density | Coverage | |
---|---|---|---|---|---|
0% | 0.811 | 0.859 | 0.739 | 0.638 | 0.024 |
10% | 0.806 | 0.848 | 0.735 | 0.636 | 0.025 |
20% | 0.803 | 0.841 | 0.728 | 0.633 | 0.025 |
Influence | 0.6% | 2.1% | 1.5% | 0.8% | 4.2% |
k | Precision | Recall | Density | Coverage | |
---|---|---|---|---|---|
3 | 0.790 | 0.837 | 0.728 | 0.615 | 0.022 |
5 | 0.811 | 0.859 | 0.739 | 0.638 | 0.024 |
7 | 0.814 | 0.871 | 0.743 | 0.644 | 0.025 |
10 | 0.815 | 0.873 | 0.744 | 0.646 | 0.025 |
Static Event Data | Temporal Data | |||||||
---|---|---|---|---|---|---|---|---|
Method | JSD | WD | AUROC | Method | JSD | WD | AUROC | |
QAMT | 0.013 | 0.000 | 0.456 | QAMT | 0.001 | 0.002 | 0.477 | |
QAMT4 | 0.016 | 0.004 | 0.457 | QAMT4 | 0.002 | 0.002 | 0.478 | |
QAMT3 | 0.044 | 0.013 | 0.462 | QAMT3 | 0.005 | 0.006 | 0.484 | |
QAMT2 | 0.050 | 0.014 | 0.465 | QAMT2 | 0.005 | 0.007 | 0.486 | |
QAMT1 | 0.071 | 0.019 | 0.470 | QAMT1 | 0.006 | 0.008 | 0.491 | |
QAMT0 | 0.068 | 0.018 | 0.469 | QAMT0 | 0.006 | 0.008 | 0.492 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Luo, Y.; Zhang, Y.; Xing, C.; Ren, P.; Liu, X. QAMT: An LLM-Based Framework for Quality-Assured Medical Time-Series Data Generation. Sensors 2025, 25, 5482. https://doi.org/10.3390/s25175482