Communication
Peer-Review Record

Evaluating the Utility of a Large Language Model in Answering Common Patients’ Gastrointestinal Health-Related Questions: Are We There Yet?

Diagnostics 2023, 13(11), 1950; https://doi.org/10.3390/diagnostics13111950
by Adi Lahat 1,*, Eyal Shachar 1, Benjamin Avidan 1, Benjamin Glicksberg 2 and Eyal Klang 3
Submission received: 27 March 2023 / Revised: 28 May 2023 / Accepted: 1 June 2023 / Published: 2 June 2023

Round 1

Reviewer 1 Report

This is an emerging concept, and the authors have done well to conduct this study.

  1. The study does add to the novelty of this emerging concept of AI.
  2. This study and further studies in this domain are likely to generate reader interest, provided tangible data with clinical relevance are obtained in the future.

The following may be discussed to enhance the discussion:

a. Lahat A, Shachar E, Avidan B, Shatz Z, Glicksberg BS, Klang E. Evaluating the use of large language model in identifying top research questions in gastroenterology. Sci Rep. 2023;13(1):4164. Published 2023 Mar 13. doi:10.1038/s41598-023-31412-2

b. Ge J, Lai JC. Artificial intelligence-based text generators in hepatology: ChatGPT is just the beginning. Hepatol Commun. 2023 Mar 24;7(4):e0097. doi: 10.1097/HC9.0000000000000097. PMID: 36972383; PMCID: PMC10043591.

c. Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study. Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378. PMID: 36834073; PMCID: PMC9967747.

d. Eysenbach G. The Role of ChatGPT, Generative Language Models, and Artificial Intelligence in Medical Education: A Conversation With ChatGPT and a Call for Papers. JMIR Med Educ. 2023 Mar 6;9:e46885. doi: 10.2196/46885. PMID: 36863937; PMCID: PMC10028514.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

This is an interesting study evaluating the use of OpenAI's ChatGPT chatbot in answering common questions in the area of gastroenterology and hepatology. Given the current interest in and development of artificial intelligence and chatbots, this is a timely topic.

The abstract is complete, with results provided clearly. The methods would benefit from clarification of the process of comparing the answers by ChatGPT and the physicians. It appears the physicians rated the chatbot's answers, and these were not actually compared to answers that the physicians might have given. As written, this is misleading and suggests a different method than the one used. The conclusion is that further development of the chatbot is needed; however, there should also be consideration of the material available online, as that is the source of the information. If information online is incomplete, answers will be incomplete.

The background is sufficient, with information about AI chatbots and their possible use by patients in answering gastrointestinal health-related questions.

The methods section has a good description of the questions identified and the use of the AI chatbot to obtain answers. The evaluation component of the study is not clear. It states that the answers provided by the AI chatbot were compared to those of 3 gastroenterologists; however, the gastroenterologists rated the answers. They did not provide answers that were then compared by other parties. The gastroenterologists rated the answers for accuracy, clarity, up-to-date information, and effectivity. It is not clear what is meant by effectivity. Do you mean effectiveness? This term should be defined. The methods note that all 3 gastroenterologists rated the answers and grades were summarized, but it also says consensus was reached. Did they rate independently or reach consensus? The answers would appear to indicate consensus, as there is a single, whole number for each item. Please clarify the process and rating used.

The data are presented clearly and it was nice that the questions and answers were provided in the supplemental material.

The results section has some of the methods repeated or added. Some comments in the results indicate opinion rather than an objective assessment of the results. For example, on page 5 the authors note that answers related to treatment options "varied wildly" between topics. It would be better to state the variance rather than use a subjective term.

The discussion is fairly complete. One factor that is not considered is the source of the information for the AI chatbot, namely electronic sources. The results would also indicate that detailed information related to the questions is missing from electronic sources. This would be consistent with what most gastroenterology departments or health care institutions post on these topics; research publications would not include answers to these basic questions. The use of some of the acronyms in the questions should also be addressed: as students are aware, the use of acronyms in web-based searches is problematic, so why were those included as acronyms and not spelled out? This should also be noted in the discussion.

Limitations are clearly addressed.

The conclusion is that the AI chatbot should be used with caution. It would be important to note the source of information used as well.

 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

Review: "Evaluating the Utility of a Large Language Model in Answering Common Patients' Gastrointestinal Health-Related Questions: Are We There Yet?"

General comments

What was the research question?

With the emergence of large language models (chatbots) and the tendency of patients to first look for answers to health-related questions in search engines rather than in books, or to ask health professionals directly, it is important that chatbots provide accurate, clear, and up-to-date information. The article tries to answer the question of whether OpenAI's ChatGPT is accurate and up-to-date in answering patients' questions about gastrointestinal health.

Research method:

110 questions were grouped into three categories: questions regarding symptoms, diagnosis, and treatment.

A panel of 3 gastroenterologists graded the answers.

Measured variables: accuracy, clarity, ability to handle a wide range of questions, and overall effectiveness of the model in addressing patients' concerns and providing information for an informed health care decision.

Q1: How are accuracy, clarity, up-to-dateness, and effectiveness defined by the panel of gastroenterologists? Is the grading based on a documented scheme? How can these results be reproduced?

Controls: No controls.

Q2: It would be interesting to see how a general practice provider or gastroenterologist would be graded by the panel of experts, to give a better understanding of how well ChatGPT performs. In this scenario the panel is blinded to the origin of the answer. A real baseline for accuracy, clarity, and ability to answer questions would then be established.

Detailed comments:

 

Q3: Methods:

1. Statistical analysis was based on the non-parametric Kruskal-Wallis test. At the same time, the average and standard deviation are reported in a parametric manner. These two methods do not fit together. Please also provide the median, minimum, maximum, and interquartile range.

Figure 1 gives an indication that the data are not normally distributed, which makes the average, and subsequently the standard deviation, biased.
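To illustrate the requested reporting, a minimal sketch, assuming the grades are available as plain lists per category; the data, variable names, and 1-5 scale here are hypothetical, not taken from the manuscript:

```python
# Hypothetical sketch: non-parametric summaries plus a Kruskal-Wallis test,
# as an alternative to reporting mean +/- standard deviation.
import numpy as np
from scipy import stats

# Placeholder expert grades (assumed 1-5 scale) per question category.
grades = {
    "treatment": [4, 3, 5, 4, 2, 4, 3],
    "symptoms": [5, 4, 4, 3, 5, 4, 4],
    "diagnosis": [3, 3, 4, 2, 4, 3, 5],
}

# Median, min, max, and interquartile range for each category.
for name, g in grades.items():
    q1, med, q3 = np.percentile(g, [25, 50, 75])
    print(f"{name}: median={med}, min={min(g)}, max={max(g)}, IQR={q3 - q1}")

# Kruskal-Wallis H-test across the three categories.
h, p = stats.kruskal(*grades.values())
print(f"Kruskal-Wallis H={h:.2f}, p={p:.3f}")
```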

Q4: Results:

1. Tables 1 to 3 give the grades for treatment-, symptom-, and diagnosis-related questions as they were graded by the experts. These tables present no summarisation of the data. Please consider graphs, i.e., boxplots, in addition to the average and standard deviation, to summarise the data and facilitate interpretation of the results (see the sketch after this list).

2. What does Figure 1 actually show? What is the purpose of this figure? Are the grades for all four measured categories (accuracy, up-to-date, etc.) for all questions in each category (treatment, symptoms, and diagnostics) now aggregated? To which grade are the authors referring? Or is it an overall grade? Please label the x-axis accordingly. A histogram-based figure may be more informative here than one based on density plots.

3. Results are described as informative, shallow, worrisome, helpful, stereotypical, etc. How is this measured? I would say this belongs in the discussion, not the results. The authors wanted to describe the accuracy, clarity, up-to-dateness, and efficacy (as stated in the tables) or effectiveness (as stated in the introduction) of the model in answering patients' questions.
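A minimal sketch of the boxplot suggestion from point 1 above, again with hypothetical placeholder data and labels rather than the study's actual grades:

```python
# Hypothetical sketch of a boxplot summary per question category,
# using matplotlib; data and labels are placeholders.
import matplotlib.pyplot as plt

grades = {
    "treatment": [4, 3, 5, 4, 2, 4, 3],
    "symptoms": [5, 4, 4, 3, 5, 4, 4],
    "diagnosis": [3, 3, 4, 2, 4, 3, 5],
}

fig, ax = plt.subplots()
ax.boxplot(list(grades.values()))  # boxplot places boxes at x = 1, 2, 3
ax.set_xticklabels(grades.keys())
ax.set_xlabel("Question category")
ax.set_ylabel("Expert grade (1-5)")
ax.set_title("Distribution of expert grades per category")
plt.show()
```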

 

Q5: Discussion:

1. Line 199 ff.: the authors state that there is no literature assessing the capabilities of ChatGPT in the medical field. A short PubMed search yields a diverse list of literature addressing exactly that. I would suggest renewing the literature search and including current literature.

The newest literature cited is from December 2022. Since then, a number of articles have been published on the accuracy and reliability of ChatGPT on patients' questions in different fields of medicine (PMIDs: 371598147 (breast augmentation), 37129631 (vernal keratoconjunctivitis), 37106269 (bariatric surgery), 37095384 (rhinoplasty), 36946005 (cirrhosis and hepatocellular carcinoma), 36929393 (cancer), 36909565 (diverse), 36834073 (pilot study on differential diagnoses)).

 

Q6: Institutional Review Board Statement is incomplete

Q7: Informed Consent Statement is incomplete

 

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Dear authors,

Thank you for the article and your work.

My comments have been addressed sufficiently and I'm happy to support the publication.

 
