One Report, Multifaceted Views: Multi-Expert Rewriting for ECG Interpretation
Abstract
1. Introduction
2. Related Work
2.1. Traditional Text Augmentation Methods
2.2. Prompt-Based Text Augmentation Methods
2.3. Recent Advances in LLM-Based Clinical Text Augmentation
3. Methodology
3.1. Dataset
3.2. Multi-Expert Perspective Augmentation
3.3. BiomedBERT-Based Gradient Boosting Classification Model
4. Experimental Results
4.1. Evaluation
4.2. Experimental Setup
4.3. Performance on Original vs. Augmented Data
4.4. Comparison with Other Augmentation Techniques
4.5. Comparison with Other LLMs
4.6. Comparison with Traditional Gradient Boosting Models
5. Discussion
5.1. Methodological Strengths and Limitations
5.2. Ensuring Clinical Accuracy
5.3. Computational Efficiency and Real-World Applicability
5.4. Limitations Related to Data Source and Generalizability
5.5. Incorporating Explainable AI for Model Interpretability
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1
Prompt |
---|
{“role”: “system”, “content”: “”” You are a medical expert specializing in ECG interpretation. Your task is to rewrite an ECG report in a concise and machine-style format.\n\n **Instructions:**\n - Rewrite the report into five distinct versions.\n - Each version should reflect the perspective of a different specialist: cerebrovascular specialist, neurologist, cardiologist, radiologist, and circulatory specialist. - Use medically appropriate but concise phrasing. - Avoid redundant explanations or detailed speculation. - Do not explicitly state whether the patient has had a stroke. - Keep the style aligned with machine-generated reports: brief, objective, and direct.\n\n - **Output must be in valid JSON format only. Any deviation is not allowed.**\n\n **Format:**\n {“Cerebrovascular Specialist”: “<Report>”, “Neurologist”: “<Report>”, “Cardiologist”: “<Report>”, “Radiologist”: “<Report>”, “Circulatory Specialist”: “<Report>” }\n” “””}, {“role”: “user”, “content”: f”Here is the original ECG interpretation:\n\n\”{original_report}\”\n\n” “Please generate five distinct variations as per the given instructions.”} |
Prompt |
---|
{“role”: “system”, “content”: “You are a helpful assistant that augments ECG interpretation data based on author perspective changes.”}, {“role”: “user”, “content”: f“““ \“{original_report}\” Please think step by step: 1. What are some other attributes of the above sentence except “Author: Machine”? 2. How to rewrite this sentence with the same attributes, but with “Author: {author}”? 3. The rewritten sentence should remain concise, retain the machine-generated format, and reflect the perspective of the new author. Write only the final sentence without any explanation. ”””} |
Prompt for Rewrite | Prompt for Generation |
---|---|
{“role”: “user”, “content”: (“Rewrite the following medical report in a different way without changing its meaning.\n” “- Output only the rewritten sentence. Do not include any explanation, heading, or prefix.\n” “- Do not use quotation marks or markdown.\n” “- Keep the output medically consistent.\n\n” f”{original_text}”)} | {“role”: “user”, “content”: (f”You are a clinical report writer. Your task is to write a short ECG report that would likely belong to class {label}.\n” f”- Write a report of about {max_words} words.\n” “- The content must be medically realistic.\n” “- Output only the report text. Do not include any explanation, prefix, quotes, or markdown.\n” “- Output should be a single paragraph.”)} |
Appendix A.2
References
- Chen, X.; Du, Y. Enhancing Medical Text Classification with GAN-Based Data Augmentation and Multi-Task Learning in BERT. Sci. Rep. 2025, 15, 13854. [Google Scholar] [CrossRef] [PubMed]
- Amin-Nejad, A.; Ive, J.; Velupillai, S. Exploring Transformer Text Generation for Medical Dataset Augmentation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; Calzolari, N., Béchet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mariani, J., et al., Eds.; European Language Resources Association: Marseille, France, 2020; pp. 4699–4708. [Google Scholar]
- Sufi, F. Addressing Data Scarcity in the Medical Domain: A GPT-Based Approach for Synthetic Data Generation and Feature Extraction. Information 2024, 15, 264. [Google Scholar] [CrossRef]
- Bayer, M.; Kaufhold, M.-A.; Reuter, C. A Survey on Data Augmentation for Text Classification. ACM Comput. Surv. 2023, 55, 1–39. [Google Scholar] [CrossRef]
- Van Nooten, J.; Daelemans, W. Improving Dutch Vaccine Hesitancy Monitoring via Multi-Label Data Augmentation with GPT-3. In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 14 July 2023; Barnes, J., De Clercq, O., Klinger, R., Eds.; Toronto, ON, Canada; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 251–270. [Google Scholar]
- Lu, Q.; Dou, D.; Nguyen, T.H. Textual Data Augmentation for Patient Outcomes Prediction. In Proceedings of the 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Houston, TX, USA, 9–12 December 2021; pp. 2817–2821. [Google Scholar]
- Bird, J.J.; Pritchard, M.; Fratini, A.; Ekárt, A.; Faria, D.R. Synthetic Biological Signals Machine-Generated by GPT-2 Improve the Classification of EEG and EMG Through Data Augmentation. IEEE Robot. Autom. Lett. 2021, 6, 3498–3504. [Google Scholar] [CrossRef]
- Abdin, M.; Aneja, J.; Behl, H.; Bubeck, S.; Eldan, R.; Gunasekar, S.; Harrison, M.; Hewett, R.J.; Javaheripi, M.; Kauffmann, P.; et al. Phi-4 Technical Report. arXiv 2024, arXiv:2412.08905. [Google Scholar] [CrossRef]
- Gu, Y.; Tinn, R.; Cheng, H.; Lucas, M.; Usuyama, N.; Liu, X.; Naumann, T.; Gao, J.; Poon, H. Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing. ACM Trans. Comput. Healthc. 2021, 3, 1–23. [Google Scholar] [CrossRef]
- Wei, J.; Zou, K. EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Inui, K., Jiang, J., Ng, V., Wan, X., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 6382–6388. [Google Scholar]
- Huong, T.H.; Hoang, V.T. A Data Augmentation Technique Based on Text for Vietnamese Sentiment Analysis. In Proceedings of the 11th International Conference on Advances in Information Technology, IAIT ’20, Bangkok, Thailand, 1–3 July 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–5. [Google Scholar]
- Qiu, S.; Xu, B.; Zhang, J.; Wang, Y.; Shen, X.; de Melo, G.; Long, C.; Li, X. EasyAug: An Automatic Textual Data Augmentation Platform for Classification Tasks. In Companion Proceedings of the Web Conference 2020; WWW ’20; Association for Computing Machinery: New York, NY, USA, 2020; pp. 249–252. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Improving Neural Machine Translation Models with Monolingual Data. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; Erk, K., Smith, N.A., Eds.; Association for Computational Linguistics: Berlin, Germany, 2016; pp. 86–96. [Google Scholar]
- Wu, X.; Lv, S.; Zang, L.; Han, J.; Hu, S. Conditional BERT Contextual Augmentation. In Computational Science–ICCS 2019; Rodrigues, J.M.F., Cardoso, P.J.S., Monteiro, J., Lam, R., Krzhizhanovskaya, V.V., Lees, M.H., Dongarra, J.J., Sloot, P.M.A., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 84–95. [Google Scholar]
- Shi, B.; Zhang, L.; Huang, J.; Zheng, H.; Wan, J.; Zhang, L. MDA: An Intelligent Medical Data Augmentation Scheme Based on Medical Knowledge Graph for Chinese Medical Tasks. Appl. Sci. 2022, 12, 10655. [Google Scholar] [CrossRef]
- Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y. Lexical Data Augmentation for Sentiment Analysis. J. Assoc. Inf. Sci. Technol. 2021, 72, 1432–1447. [Google Scholar] [CrossRef]
- Kumar, A.; Bhattamishra, S.; Bhandari, M.; Talukdar, P. Submodular Optimization-Based Diverse Paraphrasing and Its Effectiveness in Data Augmentation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; Burstein, J., Doran, C., Solorio, T., Eds.; Association for Computational Linguistics: Minneapolis, MN, USA, 2019; pp. 3609–3619. [Google Scholar]
- Liu, T.; Sun, Y. End-to-End Adversarial Sample Generation for Data Augmentation. In Findings of the Association for Computational Linguistics: EMNLP 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 11359–11368. [Google Scholar]
- Piedboeuf, F.; Langlais, P. Is ChatGPT the Ultimate Data Augmentation Algorithm? In Findings of the Association for Computational Linguistics: EMNLP 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 15606–15615. [Google Scholar]
- Ubani, S.; Polat, S.O.; Nielsen, R. ZeroShotDataAug: Generating and Augmenting Training Data with ChatGPT. arXiv 2023, arXiv:2304.14334. [Google Scholar]
- Zhao, H.; Chen, H.; Ruggles, T.A.; Feng, Y.; Singh, D.; Yoon, H.-J. Improving Text Classification with Large Language Model-Based Data Augmentation. Electronics 2024, 13, 2535. [Google Scholar] [CrossRef]
- Peng, L.; Zhang, Y.; Shang, J. Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation. In Findings of the Association for Computational Linguistics: ACL 2024; Ku, L.-W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 1–16. [Google Scholar]
- Liu, Y.; Sharma, P.; Oswal, M.J.; Xia, H.; Huang, Y. PersonaFlow: Boosting Research Ideation with LLM-Simulated Expert Personas. arXiv 2024, arXiv:2409.12538v1. [Google Scholar]
- Li, Z.; Chang, Y.; Le, X. Simulating Expert Discussions with Multi-Agent for Enhanced Scientific Problem Solving. In Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024), Bangkok, Thailand, 16 August 2024; Ghosal, T., Singh, A., Waard, A., Mayr, P., Naik, A., Weller, O., Lee, Y., Shen, S., Qin, Y., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 243–256. [Google Scholar]
- Liu, J.; Koopman, B.; Brown, N.J.; Chu, K.; Nguyen, A. Generating Synthetic Clinical Text with Local Large Language Models to Identify Misdiagnosed Limb Fractures in Radiology Reports. Artif. Intell. Med. 2025, 159, 103027. [Google Scholar] [CrossRef] [PubMed]
- Wei, Y.; Li, Q.; Pillai, J. Structured LLM Augmentation for Clinical Information Extraction. Stud. Health Technol. Inform. 2025, 329, 971–976. [Google Scholar] [PubMed]
- Šuvalov, H.; Lepson, M.; Kukk, V.; Malk, M.; Ilves, N.; Kuulmets, H.-A.; Kolde, R. Using Synthetic Health Care Data to Leverage Large Language Models for Named Entity Recognition: Development and Validation Study. J. Med. Internet Res. 2025, 27, e66279. [Google Scholar] [CrossRef] [PubMed]
- Badirli, S.; Liu, X.; Xing, Z.; Bhowmik, A.; Doan, K.; Keerthi, S.S. Gradient Boosting Neural Networks: GrowNet. arXiv 2020, arXiv:2002.07971. [Google Scholar] [CrossRef]
- Naver-Hyperclovax. HyperCLOVAX-SEED-Vision-Instruct-3B. Hugging Face. Available online: https://huggingface.co/naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B (accessed on 17 July 2025).
- Google. Gemma Kaggle. Available online: https://www.kaggle.com/models/google/gemma-2 (accessed on 17 July 2025).
- Freund, Y.; Schapire, R.E. A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
- Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; KDD ’16, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794. [Google Scholar]
- Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; KDD ’16, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 1135–1144. [Google Scholar]
Normal | Stroke |
---|---|
Normal sinus rhythm Normal ECG | Normal sinus rhythm Inferior infarct, age undetermined Abnormal ECG |
Text Type | Text |
---|---|
Original text | Sinus tachycardia Low voltage QRS Borderline ECG |
Cerebrovascular Specialist | Sinus tachycardia observed. Notable low voltage QRS amplitude, suggesting potential cardiac involvement or volume depletion. ECG borderline, further cerebrovascular evaluation may be warranted. |
Neurologist | Sinus tachycardia detected. Low voltage QRS complex indicates possible neurological-cardiac interaction. Borderline ECG findings, recommend monitoring for related neurological implications. |
Cardiologist | Presentation of sinus tachycardia with low voltage QRS complexes. ECG findings are borderline, indicating a need for further cardiac assessment and potential imaging. |
Radiologist | Sinus tachycardia with low voltage QRS complex. ECG borderline, suggesting need for diagnostic imaging to evaluate cardiac structure and possible underlying conditions. |
Circulatory Specialist | Sinus tachycardia identified. Low voltage QRS complexes and borderline ECG may indicate circulatory system disturbances. Further hemodynamic evaluation recommended. |
Dataset | Accuracy | Precision | Recall | F1-Score | AUROC |
---|---|---|---|---|---|
Original Data | 0.5496 (0.4953–0.6012) | 0.5549 (0.5000–0.6062) | 0.9827 (0.9609–1.0000) | 0.7089 (0.6625–0.7510) | 0.6290 (0.5675–0.6911) |
+ Expert Augmentation (Ours) | 0.8434 (0.8037–0.8816) | 0.8373 (0.7857–0.8876) | 0.8935 (0.8491–0.9351) | 0.8642 (0.8264–0.8985) | 0.9025 (0.8690–0.9319) |
Augmentation Techniques | Accuracy | Precision | Recall | F1-Score | AUROC |
---|---|---|---|---|---|
CoTAM [22] | 0.7653 (0.7165–0.8100) | 0.7359 (0.6786–0.7917) | 0.9052 (0.8587–0.9441) | 0.8114 (0.7684–0.8523) | 0.7950 (0.7391–0.8395) |
DA [21] | 0.7484 (0.6978–0.7944) | 0.7920 (0.7262–0.8483) | 0.7460 (0.6793–0.8103) | 0.7678 (0.7138–0.8134) | 0.8887 (0.8499–0.9211) |
Expert (Ours) | 0.8434 (0.8037–0.8816) | 0.8373 (0.7857–0.8876) | 0.8935 (0.8491–0.9351) | 0.8642 (0.8264–0.8985) | 0.9025 (0.8690–0.9319) |
Model | Accuracy | Precision | Recall | F1-Score | AUROC |
---|---|---|---|---|---|
HyperCLOVA X Seed 3B [29] | 0.7137 (0.6604–0.7602) | 0.7431 (0.6743–0.8022) | 0.7458 (0.6833–0.8125) | 0.7439 (0.6869–0.7926) | 0.8600 (0.8189–0.8984) |
Gemma-2 9B [30] | 0.8344 (0.7944–0.8723) | 0.8245 (0.7704–0.8733) | 0.8943 (0.8483–0.9344) | 0.8576 (0.8195–0.8911) | 0.8745 (0.8365–0.9089) |
Phi-4 14B [8] | 0.8434 (0.8037–0.8816) | 0.8373 (0.7857–0.8876) | 0.8935 (0.8491–0.9351) | 0.8642 (0.8264–0.8985) | 0.9025 (0.8690–0.9319) |
Model | Accuracy | Precision | Recall | F1-Score | AUROC |
---|---|---|---|---|---|
AdaBoost | 0.5386 (0.4825–0.5944) | 0.7847 (95% CI: 0.5383–1.0000) | 0.0794 (95% CI: 0.0362–0.1259) | 0.1434 (95% CI: 0.0685–0.2180) | 0.5319 (95% CI: 0.4588–0.6004) |
XGBoost | 0.5836 (95% CI: 0.5280–0.6399) | 0.9184 (95% CI: 0.7894–1.0000) | 0.1639 (95% CI: 0.1026–0.2333) | 0.2769 (95% CI: 0.1840–0.3736) | 0.5312 (95% CI: 0.4583–0.6002) |
Ours | 0.8434 (0.8037–0.8816) | 0.8373 (0.7857–0.8876) | 0.8935 (0.8491–0.9351) | 0.8642 (0.8264–0.8985) | 0.9025 (0.8690–0.9319) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kim, Y.-H.; Kim, C.; Kim, Y.-S. One Report, Multifaceted Views: Multi-Expert Rewriting for ECG Interpretation. Appl. Sci. 2025, 15, 9376. https://doi.org/10.3390/app15179376
Kim Y-H, Kim C, Kim Y-S. One Report, Multifaceted Views: Multi-Expert Rewriting for ECG Interpretation. Applied Sciences. 2025; 15(17):9376. https://doi.org/10.3390/app15179376
Chicago/Turabian StyleKim, Yu-Hyeon, Chulho Kim, and Yu-Seop Kim. 2025. "One Report, Multifaceted Views: Multi-Expert Rewriting for ECG Interpretation" Applied Sciences 15, no. 17: 9376. https://doi.org/10.3390/app15179376
APA StyleKim, Y.-H., Kim, C., & Kim, Y.-S. (2025). One Report, Multifaceted Views: Multi-Expert Rewriting for ECG Interpretation. Applied Sciences, 15(17), 9376. https://doi.org/10.3390/app15179376