Quantifying Readability in Chatbot-Generated Medical Texts Using Classical Linguistic Indices: A Review
Abstract
1. Background
2. Materials and Methods
3. Results
Readability Patterns Across Medical Specialities
4. Discussion
5. Future Directions
5.1. Dynamic, Readability-Aware Text Generation
5.2. Beyond Surface Metrics: Hybrid Readability Models
5.3. Cross-Linguistic and Cross-Cultural Readability Evaluation
5.4. User-Based Comprehension Studies
6. Limitations and Strengths
7. Conclusions
Supplementary Materials
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
Countries of origin of the included studies.

| Country | Number of Studies |
|---|---|
| USA | 60 |
| Turkey | 34 |
| China | 6 |
| India | 6 |
| Australia | 5 |
| Canada | 5 |
| Germany | 3 |
| Denmark | 2 |
| Ireland | 2 |
| Italy | 2 |
| Belgium | 1 |
| Brazil | 1 |
| Croatia | 1 |
| Egypt | 1 |
| Netherlands | 1 |
| Poland | 1 |
| Saudi Arabia | 1 |
| Singapore | 1 |
| South Korea | 1 |
| Spain | 1 |
| United Kingdom | 1 |
Chatbots evaluated in the included studies.

| Chatbot | Number of Studies |
|---|---|
| ChatGPT-4/GPT-4o | 94 |
| ChatGPT-3.5 | 83 |
| Google Bard/Gemini | 52 |
| Microsoft Copilot/Microsoft Copilot Pro/Bing AI | 39 |
| Perplexity AI/Perplexity Pro | 26 |
| Claude 2.0/Claude 3.5/Claude Sonnet | 12 |
| Meta AI Assistant | 4 |
| ChatSonic 1.0.2 | 3 |
| DeepSeek-R1 | 3 |
| DeepSeek-V3 | 2 |
| DocsGPT 0.15.0 | 2 |
| ChatSpot Alpha | 1 |
| Ernie Bot 4.0 | 1 |
| Llama 3.1 | 1 |
| Llama 3.1 Large | 1 |
| MediSearch Version 1.5.10 | 1 |
| Open Evidence 2.0 | 1 |
| Pi AI 1.0.53 | 1 |
| Vello | 1 |
| Vello Pro | 1 |
Readability indices applied in the included studies.

| Readability Scale | Number of Studies |
|---|---|
| Flesch–Kincaid Grade Level | 117 |
| Flesch Reading Ease Score | 95 |
| Gunning Fog Index | 41 |
| Simple Measure of Gobbledygook | 39 |
| Coleman–Liau Index | 22 |
| Automated Readability Index | 14 |
| FORCAST | 4 |
| Dale–Chall Readability | 3 |
| Fry Readability Graph | 2 |
| Fry Readability Score | 2 |
| Läsbarhetsindex | 2 |
| Linsear Write | 2 |
| Raygor Readability Estimate | 2 |
| Lix Readability Index | 1 |
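For reference, the six most frequently applied indices above are closed-form functions of surface statistics (sentence, word, syllable, and character counts). Their standard published formulas are:

```latex
\begin{aligned}
\mathrm{FRE}  &= 206.835 - 1.015\left(\tfrac{\text{words}}{\text{sentences}}\right) - 84.6\left(\tfrac{\text{syllables}}{\text{words}}\right) \\
\mathrm{FKGL} &= 0.39\left(\tfrac{\text{words}}{\text{sentences}}\right) + 11.8\left(\tfrac{\text{syllables}}{\text{words}}\right) - 15.59 \\
\mathrm{GFI}  &= 0.4\left(\tfrac{\text{words}}{\text{sentences}} + 100\,\tfrac{\text{complex words}}{\text{words}}\right) \\
\mathrm{SMOG} &= 1.043\sqrt{30\,\tfrac{\text{polysyllables}}{\text{sentences}}} + 3.1291 \\
\mathrm{CLI}  &= 0.0588\,L - 0.296\,S - 15.8 \\
\mathrm{ARI}  &= 4.71\left(\tfrac{\text{characters}}{\text{words}}\right) + 0.5\left(\tfrac{\text{words}}{\text{sentences}}\right) - 21.43
\end{aligned}
```

Here "complex" and "polysyllabic" words are those with three or more syllables, L is the average number of letters per 100 words, and S is the average number of sentences per 100 words. FRE is a 0–100 ease score (higher means easier to read); the remaining indices approximate a US school grade level (lower means easier to read).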
Summary readability scores by chatbot (mean ± SD where available; blank cells indicate the index was not reported for that chatbot).

| Chatbot | Flesch Reading Ease | Flesch–Kincaid Grade Level | Gunning Fog Index | SMOG Index | Coleman–Liau Index | Automated Readability Index | Linsear Write | Dale–Chall Score | FORCAST | Fry Graph | Fry Readability Score | Läsbarhetsindex | Lix Readability Index | Raygor Estimate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ChatGPT-4 | 37.55 ± 17.76 | 13.85 ± 8.10 | 14.49 ± 3.60 | 12.94 ± 2.74 | 14.61 ± 2.91 | 11.67 ± 2.38 | 9.61 ± 2.33 | 9.90 | 12.60 ± 0.42 | 13.55 ± 0.64 | 9.50 ± 0.71 | 36.49 ± 38.91 | 72.00 | 13.80 ± 0.28 |
| ChatGPT-3.5 | 35.16 ± 13.59 | 15.45 ± 8.78 | 15.57 ± 3.26 | 13.11 ± 1.92 | 15.43 ± 2.16 | 14.06 ± 1.62 | 13.95 ± 1.81 | 10.25 ± 0.35 | 12.48 ± 0.12 | | | | | |
| Microsoft Copilot | 35.66 ± 12.01 | 13.66 ± 8.02 | 14.57 ± 2.94 | 13.64 ± 2.87 | 14.25 ± 2.38 | 11.95 ± 2.20 | 11.90 ± 1.27 | 10.30 | 12.30 | | | | | |
| Google Gemini | 39.61 ± 14.73 | 13.14 ± 8.31 | 14.29 ± 4.13 | 12.65 ± 2.41 | 13.33 ± 2.66 | 11.23 ± 2.45 | 11.71 ± 2.39 | 11.60 | 11.21 ± 1.41 | | | | | |
| Perplexity | 31.31 ± 11.27 | 19.62 ± 13.51 | 16.58 ± 2.63 | 14.02 ± 2.40 | 14.68 ± 2.07 | 14.06 ± 3.14 | 14.76 ± 5.11 | | | | | | | |
| Meta AI | 28.38 ± 21.83 | 11.97 ± 1.79 | 11.60 | 12.40 | 19.10 | 13.50 | 13.80 | | | | | | | |
| Claude | 40.11 ± 21.18 | 11.22 ± 2.87 | 10.31 | 10.31 | | | | | | | | | | |
| Pi AI | 16.30 | 15.90 | 20.00 | 11.90 | | | | | | | | | | |
| DeepSeek-V3 | 53.35 ± 7.00 | 8.45 ± 0.35 | 16.40 | 15.10 | | | | | | | | | | |
| ChatSpot | 23.10 | 15.00 | 18.20 | 11.30 | | | | | | | | | | |
| DeepSeek | 76.43 | 12.26 | 15.40 | | | | | | | | | | | |
| DocsGPT | 72.00 | 9.75 ± 5.73 | 12.10 | | | | | | | | | | | |
| Llama 3.1 Large | 20.10 | 24.10 | | | | | | | | | | | | |
| Llama 3.1 | 23.70 | 34.20 | | | | | | | | | | | | |
| Ernie Bot 4.0 | 37.50 | 12.90 | | | | | | | | | | | | |
| DeepSeek-R1 | 61.40 | 7.20 | | | | | | | | | | | | |
| MediSearch | 18.30 | | | | | | | | | | | | | |
| ChatSonic | 21.65 ± 16.77 | | | | | | | | | | | | | |
| Open Evidence | 17.09 ± 0.56 | | | | | | | | | | | | | |
| Vello | 29.00 | | | | | | | | | | | | | |
| Vello Pro | 17.40 | | | | | | | | | | | | | |
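Scores like those above can be reproduced for any chatbot response with a few lines of code. The sketch below is a minimal illustration, not the calculator used in the underlying studies: it assumes regex tokenization and a naive vowel-group syllable counter, and it treats every word of three or more syllables as "complex" for the Gunning Fog Index, whereas the original formula excludes proper nouns and familiar compounds.

```python
import re

def split_sentences(text):
    # Naive split on terminal punctuation; published calculators use
    # more careful sentence tokenizers (simplification for brevity).
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def split_words(text):
    return re.findall(r"[A-Za-z']+", text)

def count_syllables(word):
    # Rough heuristic: count vowel groups, drop a trailing silent 'e'.
    # Dictionary-based syllabification is more accurate but needs data.
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability(text):
    n_sent = max(len(split_sentences(text)), 1)   # guard empty input
    words = split_words(text)
    n_word = max(len(words), 1)
    n_syll = sum(count_syllables(w) for w in words)
    n_char = sum(len(w) for w in words)
    n_poly = sum(1 for w in words if count_syllables(w) >= 3)
    wps = n_word / n_sent                          # words per sentence
    spw = n_syll / n_word                          # syllables per word
    return {
        "FRE":  206.835 - 1.015 * wps - 84.6 * spw,
        "FKGL": 0.39 * wps + 11.8 * spw - 15.59,
        # Counts all 3+-syllable words as "complex" (simplification).
        "GFI":  0.4 * (wps + 100 * n_poly / n_word),
        # SMOG is calibrated for samples of at least 30 sentences.
        "SMOG": 1.043 * (30 * n_poly / n_sent) ** 0.5 + 3.1291,
        "CLI":  0.0588 * (100 * n_char / n_word)
                - 0.296 * (100 * n_sent / n_word) - 15.8,
        "ARI":  4.71 * (n_char / n_word) + 0.5 * wps - 21.43,
    }

sample = ("Hypertension, often called high blood pressure, usually causes "
          "no symptoms. Regular monitoring helps you and your doctor act early.")
print(readability(sample))
```

Because all of these indices reward short sentences and short words, interventions such as prompting for a sixth-grade reading level tend to move every column of the table above in the same direction.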
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.

