ChatGPT Applications in Heart Failure: Patient Education, Readability Enhancement, and Clinical Utility
Abstract
1. Introduction
2. Methods
2.1. Study Design and Reporting Guidelines
2.2. Search Strategy
2.3. Inclusion and Exclusion Criteria
2.4. Study Selection, Data Extraction and Critical Appraisal
2.5. Synthesis
3. Results
4. Discussion
4.1. Key Findings and Thematic Synthesis
4.2. Implications for Heart Failure Care
4.3. Drawbacks and Limitations of ChatGPT in HF Care
4.4. Strengths and Limitations of the Evidence
4.5. Ethical Considerations
4.6. Future Directions
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Heidenreich, P.A.; Albert, N.M.; Allen, L.A.; Bluemke, D.A.; Butler, J.; Fonarow, G.C.; Ikonomidis, J.S.; Khavjou, O.; Konstam, M.A.; Maddox, T.M. Forecasting the impact of heart failure in the United States: A policy statement from the American Heart Association. Circ. Heart Fail. 2013, 6, 606–619.
- Heidenreich, P.A.; Bozkurt, B.; Aguilar, D.; Allen, L.A.; Byun, J.J.; Colvin, M.M.; Deswal, A.; Drazner, M.H.; Dunlay, S.M.; Evers, L.R.; et al. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation 2022, 145, e895–e1032.
- McCullough, P.A.; Mehta, H.S.; Barker, C.M.; Van Houten, J.; Mollenkopf, S.; Gunnarsson, C.; Ryan, M.; Cork, D.P. Mortality and guideline-directed medical therapy in real-world heart failure patients with reduced ejection fraction. Clin. Cardiol. 2021, 44, 1192–1198.
- Ross, J.S.; Chen, J.; Lin, Z.; Bueno, H.; Curtis, J.P.; Keenan, P.S.; Normand, S.L.T.; Schreiner, G.; Spertus, J.A.; Vidán, M.T.; et al. Recent national trends in readmission rates after heart failure hospitalization. Circ. Heart Fail. 2010, 3, 97–103.
- Gautam, N.; Ghanta, S.N.; Clausen, A.; Saluja, P.; Sivakumar, K.; Dhar, G.; Chang, Q.; DeMazumder, D.; Rabbat, M.G.; Greene, S.J. Contemporary Applications of Machine Learning for Device Therapy in Heart Failure. JACC Heart Fail. 2022, 10, 603–622.
- Khan, M.S.; Arshad, M.S.; Greene, S.J.; Van Spall, H.G.C.; Pandey, A.; Vemulapalli, S.; Perakslis, E.; Butler, J. Artificial Intelligence and Heart Failure: A State-of-the-Art Review. Eur. J. Heart Fail. 2023, 25, 1507–1525.
- Dimitriadis, F.; Alkagiet, S.; Tsigkriki, L.; Kleitsioti, P.; Sidiropoulos, G.; Efstratiou, D.; Askaldidi, T.; Tsaousidis, A.; Siarkos, M.; Giannakopoulou, P.; et al. ChatGPT and patients with heart failure. Angiology 2025, 76, 796–801.
- King, R.C.; Samaan, J.S.; Yeo, Y.H.; Mody, B.; Lombardo, D.M.; Ghashghaei, R. Appropriateness of ChatGPT in answering heart failure related questions. Heart Lung Circ. 2024, 33, 1314–1318.
- Anaya, F.; Prasad, R.; Bashour, M.; Yaghmour, R.; Alameh, A.; Balakumaran, K. Evaluating ChatGPT platform in delivering heart failure educational material: A comparison with the leading national cardiology institutes. Curr. Probl. Cardiol. 2024, 49, 102797.
- Workman, T.E.; Ahmed, A.; Sheriff, H.M.; Raman, V.K.; Zhang, S.; Shao, Y.; Faselis, C.; Fonarow, G.C.; Zeng-Treitler, Q. ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records. Prog. Cardiovasc. Dis. 2024, 87, 44–49.
- Kozaily, E.; Geagea, M.; Akdogan, E.R.; Atkins, J.; Elshazly, M.B.; Guglin, M.; Tedford, R.J.; Wehbe, R.M. Accuracy and consistency of online large language model-based artificial intelligence chat platforms in answering patients’ questions about heart failure. Int. J. Cardiol. 2024, 408, 132115.
- King, R.C.; Samaan, J.S.; Haquang, J.; Bharani, V.; Margolis, S.; Srinivasan, N.; Peng, Y.; Yeo, Y.H.; Ghashghaei, R. Improving the readability of institutional heart failure-related patient education materials using GPT-4: Observational study. JMIR Cardio 2025, 9, e68817.
- Bhupathi, M.; Kareem, J.M.; Mediboina, A.; Janapareddy, K. Assessing information provided by ChatGPT: Heart failure versus patent ductus arteriosus. Cureus 2025, 17, e86365.
- McDonagh, T.A.; Metra, M.; Adamo, M.; Gardner, R.S.; Baumbach, A.; Böhm, M.; Burri, H.; Butler, J.; Čelutkienė, J.; Chioncel, O.; et al. 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur. Heart J. 2021, 42, 3599–3726.
- Funk, P.F.; Hoch, C.C.; Knoedler, S.; Knoedler, L.; Cotofana, S.; Sofo, G.; Bashiri Dezfouli, A.; Wollenberg, B.; Guntinas-Lichius, O.; Alfertshofer, M. ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions. Eur. J. Investig. Health Psychol. Educ. 2024, 14, 657–668.
- Ghanta, S.N.; Al’Aref, S.J.; Lala-Trindade, A.; Nadkarni, G.N.; Ganatra, S.; Dani, S.S.; Mehta, J.L. Applications of ChatGPT in Heart Failure Prevention, Diagnosis, Management, and Research: A Narrative Review. Diagnostics 2024, 14, 2393.
- Fabbri, M.; Yost, K.; Finney Rutten, L.J.; Manemann, S.M.; Boyd, C.M.; Jensen, D.; Weston, S.A.; Jiang, R.; Roger, V.L. Health Literacy and Outcomes in Patients with Heart Failure: A Prospective Community Study. Mayo Clin. Proc. 2018, 93, 9–15.
- Sharma, A.; Medapalli, T.; Alexandrou, M.; Brilakis, E.; Prasad, A. Exploring the Role of ChatGPT in Cardiology: A Systematic Review of the Current Literature. Cureus 2024, 16, e58936.
- Temperley, H.C.; O’Sullivan, N.J.; Mac Curtain, B.M.; Corr, A.; Meaney, J.F.; Kelly, M.E.; Brennan, I. Current applications and future potential of ChatGPT in radiology: A systematic review. J. Med. Imaging Radiat. Oncol. 2024, 68, 257–264.
- Cannata, A.; Mizani, M.A.; Bromage, D.I.; Piper, S.E.; Hardman, S.M.C.; Scott, P.A.; Gardner, R.S.; Clark, A.L.; Cleland, J.G.F.; McDonagh, T.A.; et al. Heart failure specialist care and long-term outcomes for patients admitted with acute heart failure. JACC Heart Fail. 2025, 13, 402–413.
- McMurray, J.J.V.; Packer, M.; Desai, A.S.; Gong, J.; Lefkowitz, M.P.; Rizkala, A.R.; Rouleau, J.L.; Shi, V.C.; Solomon, S.D.; Swedberg, K.; et al. Angiotensin–neprilysin inhibition versus enalapril in heart failure. N. Engl. J. Med. 2014, 371, 993–1004.
- Saenger, J.A.; Hunger, J.; Boss, A.; Richter, J. Delayed diagnosis of a transient ischemic attack caused by ChatGPT. Wien. Klin. Wochenschr. 2024, 136, 236–238.
- Harskamp, R.E.; De Clercq, L. Performance of ChatGPT as an AI-assisted decision support tool in medicine: A proof-of-concept study for interpreting symptoms and management of common cardiac conditions (AMSTELHEART-2). Acta Cardiol. 2024, 79, 358–366.
- Sarraju, A.; Bruemmer, D.; Van Iterson, E.; Cho, L.; Rodriguez, F.; Laffin, L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023, 329, 842–844.

| Study | Primary Domain(s) | Sample Size | Intervention | Comparator | Key Outcomes |
|---|---|---|---|---|---|
| Dimitriadis et al. [7] | Patient Education and Question-Answering | 47 questions | ChatGPT-4 | None (observational) | 100% accuracy in key areas such as lifestyle advice, medication mechanisms, and symptom management |
| King et al. (appropriateness) [8] | Patient Education and Question-Answering | 107 questions | GPT-3.5 and GPT-4 | GPT-3.5 vs. GPT-4 | 98.1% appropriateness for GPT-3.5, with occasional hallucinations (1.9%) |
| Anaya et al. [9] | Patient Education and Question-Answering; Readability Enhancement | 12 questions | ChatGPT-3 | Leading US institutes (ACC, AHA, HFSA) | 78% actionability score, with competitive 75% PEMAT readability but the lowest actionability among the compared materials |
| Workman et al. [10] | Clinical Documentation/Symptom Extraction | 1999 snippets + 102 synthetic | ChatGPT-4 zero-shot learning (ZSL) | ML and rule-based baselines | 95% F1 score for ZSL, outperforming baselines |
| Kozaily et al. [11] | Patient Education and Question-Answering | 30 questions | ChatGPT-3.5 and Bard | ChatGPT vs. Bard | 90% appropriateness for ChatGPT |
| King et al. (readability) [12] | Readability Enhancement | 143 PEMs | GPT-4 | Original materials | Improved readability to a 6th–7th grade level (see the readability sketch below) |
| Bhupathi et al. [13] | Patient Education and Question-Answering | 21 questions (10 HF, 11 PDA) | ChatGPT-3.5 | AHA/ACC guidelines | Mean accuracy 5.4/6, 83.75% PEMAT-P understandability |
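
Several of the studies summarized above report readability as a Flesch-Kincaid grade level, which is a fixed formula over sentence, word, and syllable counts: grade = 0.39 × (words/sentences) + 11.8 × (syllables/words) − 15.59. The minimal Python sketch below shows how such a score can be computed; the regex-based syllable counter is a crude stand-in for the dictionary-backed counters used by established readability calculators, and the example sentences are illustrative rather than taken from any of the studies.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels.
    Dedicated readability tools use dictionaries and handle
    silent-e and other exceptions; this is an approximation."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(1, len(sentences)))
            + 11.8 * (syllables / max(1, len(words)))
            - 15.59)

# Illustrative comparison of an original-style PEM sentence with a
# simplified rewrite (hypothetical text, not from the studies).
original = ("Guideline-directed medical therapy necessitates simultaneous "
            "pharmacological optimisation across multiple drug classes.")
simplified = "Your doctor will slowly adjust several heart medicines together."
print(round(flesch_kincaid_grade(original), 1))    # higher grade level
print(round(flesch_kincaid_grade(simplified), 1))  # lower grade level
```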

| Study | Expert Evaluation | ChatGPT Version | Evaluation Method | Metrics Used | Limitations Noted |
|---|---|---|---|---|---|
| Dimitriadis et al. [7] | 2 cardiologist researchers independently assessed responses for similarity, relevance, and reliability per guidelines; overall evaluation by the primary supervisor. | GPT-4 | Expert assessment by cardiologists on accuracy and comprehensiveness | Accuracy (%), comprehensiveness | Less comprehensive; limited to observational design |
| King et al. (appropriateness) [8] | 2 board-certified cardiologists independently graded responses using a predefined scale for accuracy and comprehensiveness; differences resolved by a 3rd reviewer. | GPT-3.5 and GPT-4 | Graded by board-certified cardiologists using a 4-point scale (comprehensive to incorrect) | Appropriateness (%), reproducibility (%), comprehensive knowledge (%) | Occasional hallucinations (1.9%); small sample of questions; no patient outcomes |
| Anaya et al. [9] | 4 advanced HF attendings conducted a blind assessment. | GPT-3 | Blind assessment with PEMAT; readability calculators (Flesch-Kincaid, etc.) | PEMAT readability/actionability (%), grade level, word difficulty (%) | Longer responses at higher educational levels; lower actionability; observational only |
| Workman et al. [10] | 2 clinicians independently annotated snippets; discrepancies resolved via discussion and consensus. | GPT-4 | Zero-shot learning with prompt engineering; compared to ML/rule-based baselines | Precision (%), recall (%), F1 score (%); see the sketch after this table | Reliance on synthetic snippets; no real EHR validation; prompt sensitivity |
| Kozaily et al. [11] | 2 HF experts independently and blindly evaluated answers for accuracy and consistency. | GPT-3.5 | Expert evaluation by HF specialists; consistency across runs | Appropriateness (%), consistency (%) | Inadequacy in advanced topics; heterogeneity in comparators (Bard); small question set |
| King et al. (readability) [12] | 1 board-certified cardiologist (not blinded) assessed accuracy and comprehensiveness. | GPT-4 | Expert grading for accuracy/comprehensiveness; readability scores | Flesch-Kincaid grade level, accuracy (%), comprehensiveness increase (%) | Institutional materials bias; no long-term impact assessment |
| Bhupathi et al. [13] | 2 independent evaluators assessed responses against AHA information; disparities cross-checked against ACC guidelines. | GPT-3.5 | Likert scales for accuracy/completeness; PEMAT for understandability | Accuracy score (mean/6), completeness (mean/3), PEMAT understandability (%) | Slightly lower accuracy for rarer conditions; no RCTs; potential data-abundance bias |
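
The precision, recall, and F1 figures reported for the symptom-extraction study follow the standard set-overlap definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean. Below is a minimal sketch of a snippet-level calculation; the symptom labels are hypothetical examples for illustration, not data from Workman et al. [10].

```python
def precision_recall_f1(predicted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Compare model-extracted symptom mentions for one snippet
    against clinician-annotated gold labels."""
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean
    return precision, recall, f1

# Hypothetical gold annotations and model output for one snippet.
gold = {"dyspnea", "orthopnea", "edema"}
predicted = {"dyspnea", "edema", "fatigue"}         # one miss, one spurious
p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.67 recall=0.67 f1=0.67
```

In practice such scores are aggregated over all annotated snippets (micro- or macro-averaged) rather than reported per snippet.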

