Evaluating Large Language Models in Cardiology: A Comparative Study of ChatGPT, Claude, and Gemini
Abstract
1. Introduction
2. Methods
2.1. Artificial Intelligence Platforms
2.2. Prompt Formulation and Scenario Design
- Pre-diagnostic: Presentation of symptoms with a request for diagnostic reasoning.
- Post-diagnostic: Established diagnosis with queries regarding treatment and disease management.
- Layperson (simulated patient);
- General practitioner (simulated physician requesting specialist input).
“I experience chest pain during exercise. Could this be angina? As an expert cardiologist, please explain the condition and how it is diagnosed in absolutely no more than 300 words. Could you please provide a detailed explanation in plain text without using lists or bullet points?”
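To make the scenario design concrete, the sketch below shows one way prompts covering the two diagnostic stages and the two user profiles could be assembled programmatically. It is a minimal illustration, not the authors' workflow: apart from the example prompt quoted above, the template wording and all function and variable names are assumptions.

```python
# Illustrative sketch (not the authors' code): assembling prompts across the
# 2 x 2 design of diagnostic stage (pre- vs. post-diagnostic) and user profile
# (layperson vs. general practitioner). Template wording is hypothetical.

STAGE_TEXT = {
    "pre": "I experience chest pain during exercise. Could this be angina?",
    "post": "I have been diagnosed with stable angina.",
}

PROFILE_TEXT = {
    "patient": "As an expert cardiologist, please explain the condition and how it is diagnosed",
    "doctor": "As an expert cardiologist advising a general practitioner, please outline the recommended work-up and management",
}

CONSTRAINTS = (
    "in absolutely no more than 300 words. Could you please provide a detailed "
    "explanation in plain text without using lists or bullet points?"
)

def build_prompt(stage: str, profile: str) -> str:
    """Combine scenario stage, user profile, and the shared formatting constraints."""
    return f"{STAGE_TEXT[stage]} {PROFILE_TEXT[profile]} {CONSTRAINTS}"

if __name__ == "__main__":
    for stage in STAGE_TEXT:
        for profile in PROFILE_TEXT:
            print(f"--- {stage} / {profile} ---")
            print(build_prompt(stage, profile), end="\n\n")
```

Keeping the formatting constraint in a single shared string ensures that every scenario imposes the same 300-word, plain-text requirement regardless of stage or user profile.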
2.3. Evaluation Criteria and Rating Protocol
- Scientific accuracy: Correctness of clinical information;
- Completeness: Inclusion of all relevant diagnostic or therapeutic elements;
- Clarity: Use of accessible language appropriate for the intended audience;
- Coherence: Internal consistency and logical flow of information. An illustrative tabulation of ratings on these criteria follows this list.
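As an illustration of how ratings on these four criteria could be organized for analysis, the sketch below tabulates synthetic scores and reproduces the Mean/Median/SD/IQR summary format used in Section 3.2. The field names are hypothetical, and the 1–5 ordinal scale is an assumption consistent with the medians and interquartile ranges reported in the Results.

```python
# Illustrative sketch (synthetic ratings, hypothetical field names): how scores on
# the four criteria could be tabulated and summarized per model, matching the
# Mean / Median / SD / IQR columns of the descriptive-statistics table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
criteria = ["Accuracy", "Completeness", "Clarity", "Coherence"]
models = ["ChatGPT", "Claude", "Gemini"]

# One record per (model, criterion, rating event), each scored on a 1-5 ordinal scale.
records = [
    {"model": m, "criterion": c, "score": int(rng.integers(1, 6))}
    for m in models
    for c in criteria
    for _ in range(30)
]
df = pd.DataFrame(records)

summary = df.groupby(["model", "criterion"])["score"].agg(
    Mean="mean",
    Median="median",
    SD="std",
    IQR=lambda s: s.quantile(0.75) - s.quantile(0.25),
)
print(summary.round(2))
```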
2.4. Pilot Study and Power Analysis
2.5. Inter-Rater Reliability Analysis
- Kendall’s W (coefficient of concordance) was computed separately for each evaluation criterion to assess the consistency of rankings across the three raters. Values range from 0 (no agreement) to 1 (perfect agreement); values above 0.6 are generally interpreted as substantial agreement in medical studies (Table S2).
- Quadratic weighted kappa was used for pairwise comparisons between raters. This measure accounts for the ordinal nature of the ratings and penalizes larger disagreements more heavily than smaller ones. Interpretation thresholds were based on standard benchmarks: <0.20 (poor), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (substantial), and >0.80 (almost perfect agreement). An illustrative computation of both statistics follows this list.
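The following is a minimal sketch of the two agreement statistics, run on synthetic ratings from three reviewers rather than the study data. The ties correction for Kendall's W and any confidence-interval estimation are omitted, and the pairwise quadratic weighted kappa relies on scikit-learn's implementation.

```python
# Illustrative sketch of Kendall's W and quadratic weighted kappa on synthetic data.
import numpy as np
from scipy.stats import rankdata
from sklearn.metrics import cohen_kappa_score

def kendalls_w(ratings: np.ndarray) -> float:
    """Kendall's coefficient of concordance (no ties correction).
    ratings: array of shape (n_items, n_raters) with ordinal scores."""
    n_items, n_raters = ratings.shape
    # Rank the items separately within each rater's column.
    ranks = np.apply_along_axis(rankdata, 0, ratings)
    rank_sums = ranks.sum(axis=1)
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12.0 * s / (n_raters**2 * (n_items**3 - n_items))

# Synthetic example: 10 responses rated 1-5 by 3 reviewers.
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(10, 3))

print("Kendall's W:", round(kendalls_w(scores), 3))

# Pairwise quadratic weighted kappa between reviewers 1 and 2.
kappa_12 = cohen_kappa_score(scores[:, 0], scores[:, 1], weights="quadratic")
print("Quadratic weighted kappa (R1 vs. R2):", round(kappa_12, 3))
```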
2.6. Comparative Performance Analysis
- Diagnostic stage (pre- vs. post-diagnostic);
- User profile (doctor vs. patient scenario). An illustrative sketch of these subgroup comparisons follows below.
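The subgroup tables in Section 3.4 report, for each criterion, the difference in mean scores (Δ), a 95% CI, a Mann-Whitney U statistic, and a p value. The sketch below shows, on synthetic scores, one way such a comparison could be computed; the percentile bootstrap used for the CI is an assumption, since the exact interval method is not stated in this excerpt.

```python
# Illustrative sketch (synthetic data, not the study scores): comparing ratings
# between two subgroups with the Mann-Whitney U test and a bootstrap 95% CI for
# the difference in mean scores (delta).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
pre = rng.integers(2, 6, size=120)    # e.g., pre-diagnostic accuracy scores (1-5)
post = rng.integers(2, 6, size=120)   # e.g., post-diagnostic accuracy scores (1-5)

u_stat, p_value = mannwhitneyu(pre, post, alternative="two-sided")

# Percentile bootstrap for the difference in mean scores.
boot = [
    rng.choice(pre, pre.size, replace=True).mean()
    - rng.choice(post, post.size, replace=True).mean()
    for _ in range(5000)
]
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])

print(f"delta = {pre.mean() - post.mean():.2f}")
print(f"U = {u_stat:.0f}, p = {p_value:.3f}")
print(f"95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```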
2.7. Sensitivity Analysis
2.8. Statistical Tools and Implementation
3. Results
3.1. Inter-Rater Reliability
3.2. Descriptive Statistics and Score Distributions
3.3. Comparative Analysis Between AI Models
3.4. Subgroup Analysis by Diagnostic Phase and User Type
3.5. Sensitivity Analysis
4. Discussion
4.1. Clinical and Practical Implications
4.2. Ethical and Regulatory Considerations in Clinical Deployment
4.3. Methodological Strengths and Limitations
4.4. Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Criterion | Reviewer Pair | Weighted Kappa | Interpretation
---|---|---|---
Accuracy | Reviewer 1 vs. Reviewer 2 | 0.679 | Substantial (0.61–0.80)
Accuracy | Reviewer 1 vs. Reviewer 3 | 0.411 | Moderate (0.41–0.60)
Accuracy | Reviewer 2 vs. Reviewer 3 | 0.557 | Moderate (0.41–0.60)
Completeness | Reviewer 1 vs. Reviewer 2 | 0.722 | Substantial (0.61–0.80)
Completeness | Reviewer 1 vs. Reviewer 3 | 0.584 | Moderate (0.41–0.60)
Completeness | Reviewer 2 vs. Reviewer 3 | 0.672 | Substantial (0.61–0.80)
Clarity | Reviewer 1 vs. Reviewer 2 | 0.666 | Substantial (0.61–0.80)
Clarity | Reviewer 1 vs. Reviewer 3 | 0.471 | Moderate (0.41–0.60)
Clarity | Reviewer 2 vs. Reviewer 3 | 0.639 | Substantial (0.61–0.80)
Coherence | Reviewer 1 vs. Reviewer 2 | 0.683 | Substantial (0.61–0.80)
Coherence | Reviewer 1 vs. Reviewer 3 | 0.447 | Moderate (0.41–0.60)
Coherence | Reviewer 2 vs. Reviewer 3 | 0.623 | Substantial (0.61–0.80)
Model | Criterion | Mean | Median | SD | IQR
---|---|---|---|---|---
ChatGPT | Accuracy | 3.7 | 4 | 0.86 | 1.0
ChatGPT | Completeness | 3.7 | 4 | 0.89 | 1.0
ChatGPT | Clarity | 4.0 | 4 | 0.81 | 1.0
ChatGPT | Coherence | 4.2 | 4 | 0.72 | 1.0
Claude | Accuracy | 3.6 | 4 | 0.90 | 1.0
Claude | Completeness | 3.4 | 4 | 0.96 | 1.0
Claude | Clarity | 3.8 | 4 | 0.85 | 1.0
Claude | Coherence | 4.0 | 4 | 0.79 | 2.0
Gemini | Accuracy | 3.4 | 3 | 0.90 | 1.0
Gemini | Completeness | 2.9 | 3 | 1.20 | 2.0
Gemini | Clarity | 3.5 | 4 | 0.93 | 1.0
Gemini | Coherence | 3.7 | 4 | 0.90 | 1.0
Criterion | Model A | Model B | Pairwise Δ [95% CI] | p Value
---|---|---|---|---
Accuracy | ChatGPT | Claude | 0.124 [−0.043 to 0.286] | 0.512
Accuracy | ChatGPT | Gemini | 0.352 [0.186 to 0.524] | <0.001
Accuracy | Claude | Gemini | 0.228 [0.052 to 0.400] | 0.018
Completeness | ChatGPT | Claude | 0.282 [0.100 to 0.457] | 0.019
Completeness | ChatGPT | Gemini | 0.761 [0.557 to 0.967] | <0.001
Completeness | Claude | Gemini | 0.480 [0.271 to 0.690] | <0.001
Clarity | ChatGPT | Claude | 0.153 [−0.010 to 0.310] | 0.241
Clarity | ChatGPT | Gemini | 0.457 [0.290 to 0.619] | <0.001
Clarity | Claude | Gemini | 0.304 [0.133 to 0.471] | 0.002
Coherence | ChatGPT | Claude | 0.167 [0.024 to 0.314] | 0.066
Coherence | ChatGPT | Gemini | 0.467 [0.314 to 0.624] | <0.001
Coherence | Claude | Gemini | 0.300 [0.133 to 0.462] | 0.003
Criterion | Mean Pre | Mean Post | Δ | 95% CI | U Statistic | p
---|---|---|---|---|---|---
Accuracy | 3.65 | 3.46 | 0.19 | [0.056, 0.330] | 43,719 | 0.006
Completeness | 3.49 | 3.21 | 0.28 | [0.116, 0.450] | 42,722 | 0.002
Clarity | 3.89 | 3.71 | 0.19 | [0.052, 0.325] | 44,389 | 0.015
Coherence | 4.03 | 3.91 | 0.12 | [0.011, 0.248] | 46,481 | 0.143
Criterion | Mean Patient | Mean Doctor | Δ | 95% CI | U Statistic | p
---|---|---|---|---|---|---
Accuracy | 3.68 | 3.37 | 0.31 | [0.168, 0.450] | 57,722 | <0.001
Completeness | 3.47 | 3.18 | 0.29 | [0.113, 0.457] | 55,665 | <0.001
Clarity | 3.99 | 3.55 | 0.44 | [0.307, 0.585] | 61,083 | <0.001
Coherence | 4.09 | 3.79 | 0.30 | [0.178, 0.439] | 57,434 | <0.001