The Quality of AI-Generated CABG Counseling: A Blinded Comparison of Two Language Models
Abstract
1. Introduction
2. Materials and Methods
2.1. Ethics Statement
2.2. Question Development and Response Collection
2.3. Expert Panel
2.4. Statistical Analysis
3. Results
4. Discussion
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| CABG | Coronary artery bypass grafting |
| AI | Artificial intelligence |
| LLM | Large language model |
| IRB | Institutional review board |
| R | Reviewer |
| No. | Question |
|---|---|
| 1 | How many days will I stay in the hospital after coronary artery bypass surgery? |
| 2 | Is it necessary to use a chest corset after coronary artery bypass surgery? |
| 3 | Will I need to take blood thinning medication for life after coronary bypass surgery? |
| 4 | Do I need to wear compression stockings after coronary bypass surgery? |
| 5 | How long after coronary bypass surgery can I start driving? |
| 6 | When can I return to work after coronary bypass surgery? |
| 7 | How long after coronary bypass surgery can I resume sexual activity? |
| 8 | When can I travel by airplane after coronary bypass surgery? |
| 9 | Will I experience sleep problems after coronary bypass surgery? |
| 10 | Can I swim in the sea after coronary bypass surgery? |
| 11 | Can every coronary bypass surgery be performed using a minimally invasive approach? |
| 12 | In minimally invasive bypass surgery, can more than one vessel be treated? |
| 13 | Is minimally invasive coronary bypass surgery safe? |
| 14 | Is there a risk of converting to a conventional incision during minimally invasive bypass surgery? |
| 15 | Does minimally invasive coronary bypass surgery require special equipment? |
| 16 | During minimally invasive bypass surgery, is the leg vein also harvested using a closed technique? |
| 17 | How large will the scar be after minimally invasive coronary bypass surgery? |
| 18 | What are the advantages of minimally invasive bypass surgery? |
| 19 | I have previously undergone heart surgery. Can I have minimally invasive bypass surgery? |
| 20 | When can I resume sexual activity after minimally invasive coronary bypass surgery? |
| 21 | In elderly patients, is minimally invasive or conventional coronary bypass surgery safer? |
| 22 | For patients with diabetes, is minimally invasive or conventional coronary bypass surgery more appropriate? |
| 23 | In obese patients, is minimally invasive or conventional coronary bypass surgery safer? |
| 24 | For younger patients requiring bypass surgery, is the minimally invasive or conventional approach more suitable? |
| 25 | Can both conventional and minimally invasive coronary bypass surgery be performed without stopping the heart? |
| 26 | Is there a difference in operative time between minimally invasive and conventional coronary bypass surgery? |
| 27 | Is the risk of stroke lower with minimally invasive coronary bypass compared with the conventional approach? |
| 28 | Is there a difference in blood transfusion requirements between minimally invasive and conventional coronary bypass surgery? |
| 29 | Is the risk of wound infection lower with minimally invasive coronary bypass compared with the conventional method? |
| 30 | Is there a difference in intensive care unit stay between minimally invasive and conventional coronary bypass surgery? |
| 31 | Is there a difference in postoperative pain between minimally invasive and conventional coronary bypass surgery? |
| 32 | Is recovery from anesthesia easier after minimally invasive bypass compared with the conventional method? |
| 33 | Is the risk of postoperative depression or low mood different between minimally invasive and conventional coronary bypass surgery? |
| 34 | Is there a difference in the time to resume driving between minimally invasive and conventional coronary bypass surgery? |
| 35 | Is there a difference in the time to return to work between minimally invasive and conventional coronary bypass surgery? |
| 36 | Is there a difference in the time to resume sexual activity between minimally invasive and conventional coronary bypass surgery? |
| 37 | Is there a difference in the time to return to sports or heavy exercise between minimally invasive and conventional coronary bypass surgery? |
| 38 | Is there a difference in medication use after discharge between minimally invasive and conventional coronary bypass surgery? |
| 39 | After minimally invasive versus conventional bypass surgery, are follow up hospital visits more frequent? |
| 40 | In the long term, is the need for repeat stenting or a second bypass surgery lower with the minimally invasive approach compared with the conventional method? |
| Measure (Reviewer) | ChatGPT | DeepSeek | Test Statistic | p | Effect Size |
|---|---|---|---|---|---|
| Accuracy (R1) | 5 (2–5) | 5 (3–5) | Z = −1.416 | 0.16 | r = −0.224 |
| Comprehensibility (R1) | 4 (3–5) | 4 (3–5) | Z = −0.159 | 0.87 | r = −0.025 |
| Unnecessary Detail (R1) | 5 (3–5) | 5 (3–5) | Z = −0.513 | 0.60 | r = −0.081 |
| Accuracy (R2) | 5 (3–5) | 5 (3–5) | Z = −1.213 | 0.22 | r = −0.192 |
| Comprehensibility (R2) | 5 (3–5) | 4 (3–5) | Z = −2.294 | 0.02 | r = −0.363 |
| Unnecessary Detail (R2) | 5 (3–5) | 4 (3–5) | Z = −0.853 | 0.39 | r = −0.135 |
| Accuracy (R3) | 4 (3–5) | 4 (3–5) | Z = −1.127 | 0.26 | r = −0.178 |
| Comprehensibility (R3) | 4 (2–5) | 4 (3–5) | Z = −1.091 | 0.27 | r = −0.172 |
| Unnecessary Detail (R3) | 4 (3–5) | 4 (3–5) | Z = −1.127 | 0.26 | r = −0.178 |
| Accuracy (R4) | 4 (3–5) | 4.5 (3–5) | Z = −2.599 | 0.009 | r = −0.411 |
| Comprehensibility (R4) | 4 (3–5) | 4 (3–5) | Z = 0.000 | 1.00 | r = 0.000 |
| Unnecessary Detail (R4) | 4 (2–5) | 4 (2–5) | Z = −0.354 | 0.72 | r = −0.056 |
| Accuracy (Overall) | 4.25 (3.5–4.75) | 4.5 (3.75–5) | Z = −2.847 | 0.004 | r = −0.450 |
| Comprehensibility (Overall) | 4.38 (3–5) | 4.25 (3.25–5) | Z = −0.439 | 0.66 | r = −0.069 |
| Unnecessary Detail (Overall) | 4.25 (3–5) | 4.25 (3.25–5) | Z = −0.711 | 0.48 | r = −0.112 |
| Mean of Three Categories (Overall) | 4.27 ± 0.3 | 4.32 ± 0.28 | t = −0.967 | 0.34 | d = −0.153 |
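
The effect sizes in the table above appear to follow the common conventions r = Z / √N for the Z-based comparisons and d = t / √n for the paired comparison of means, with N = 40 taken as the number of questions. The test names and the formula are not stated in this extract, so that reading is an assumption, although the reported values are consistent with it. A minimal Python sketch that recomputes a few rows (the function names are hypothetical and chosen for illustration only):

```python
import math

# Assumption: N is the number of paired questions (40), as listed in the question table.
N_QUESTIONS = 40


def effect_size_r(z: float, n: int = N_QUESTIONS) -> float:
    """Effect size r = Z / sqrt(N) for a Z-based rank test."""
    return z / math.sqrt(n)


def cohens_d_paired(t: float, n: int = N_QUESTIONS) -> float:
    """Cohen's d for a paired design, computed as d = t / sqrt(n)."""
    return t / math.sqrt(n)


# (label, Z from the table, r reported in the table)
rows = [
    ("Comprehensibility (R2)", -2.294, -0.363),
    ("Accuracy (R4)", -2.599, -0.411),
    ("Accuracy (Overall)", -2.847, -0.450),
]

for label, z, reported in rows:
    computed = effect_size_r(z)
    print(f"{label}: computed r = {computed:.3f}, reported r = {reported:.3f}")

# Last table row: paired comparison of the mean of the three categories.
print(f"Mean of Three Categories: d = {cohens_d_paired(-0.967):.3f}")
```

Under this reading, the only comparison approaching a medium-sized effect is overall accuracy (|r| ≈ 0.45), while the difference in the mean of the three categories stays small (|d| ≈ 0.15).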
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Özbakkaloğlu, A.; Rahman, Ö.F.; Keleş, E.; Daylan, A.; Cansu, D.; Bozok, Ş. The Quality of AI-Generated CABG Counseling: A Blinded Comparison of Two Language Models. J. Clin. Med. 2026, 15, 3896. https://doi.org/10.3390/jcm15103896

