Communication
Peer-Review Record

Evaluation of the Applicability of ChatGPT in Patient Education on Obstructive Sleep Apnea

J. Respir. 2025, 5(1), 3; https://doi.org/10.3390/jor5010003
by Cristina López-Riolobos 1,*, Juan Riestra-Ayora 2,3, Beatriz Raboso Moreno 1,3, Nora Lebrato Rubio 4, José María Diaz García 1,3, Cristina Vaduva 2,3, Indira Astudillo Rodríguez 4, Leonardo Saldaña Pérez 1,3, Fernando García Prieto 1,3, Sara Calero Pardo 1,3 and Araceli Abad Fernández 1,3
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 October 2024 / Revised: 2 February 2025 / Accepted: 24 February 2025 / Published: 4 March 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Dear Authors

Although this study's concept is exciting and innovative, the manuscript presents several significant issues that limit its potential for publication.

Lack of essential information regarding the qualifications of the interviewed and evaluating physicians: What are the ages of the two expert physicians in sleep medicine? How many years of experience do they have in sleep medicine? For the six evaluating physicians, information on age, gender, academic degrees, years of experience, country of origin, and other professional qualifications (e.g., relevant publications) is missing.

The number of interviewed experts and evaluators is limited, and this small sample size undermines the generalisability of the findings.

The date of access to the chatbot is missing. Specifying when the chatbot responses were generated is crucial, as the AI model is constantly updated.

Details regarding the questions are missing. Were the questions validated sleep-related questionnaires or non-validated?

Ethical considerations are missing. Was ethical approval sought for this study? If not, the reasons should be explained.

The manuscript implies the language used might have been Spanish; this should be explicitly clarified.

The timing of the evaluation is missing. When was the assessment conducted? This is important to contextualise the findings.

Statistical analysis is inappropriate. The closed-ended questions (yes/no responses) are categorical variables for which a chi-squared or Fisher’s exact test would be more appropriate.
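The suggested test can be sketched in Python; the 2×2 contingency table below uses purely hypothetical counts, not data from the study:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table of yes/no answers
# (rows: response source A vs. B; columns: "yes" vs. "no").
# All counts are illustrative, not study data.
table = [[10, 2],
         [8, 4]]

# Fisher's exact test is preferred over chi-squared when cell counts are small.
odds_ratio, p_value = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
```

For larger samples with all expected cell counts above five, `scipy.stats.chi2_contingency` would be the conventional choice.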

I strongly recommend that the authors continue this study by increasing the sample size of evaluated subjects and evaluators. Additionally, the statistical analysis should be revised, and a larger sample size should be used to improve the statistical power of the findings.

Comments on the Quality of English Language

The English could be improved to express the research more clearly.

Author Response

Dear Reviewer 1,

We appreciate your detailed and thoughtful feedback. Your comments have significantly contributed to refining our manuscript, and we have carefully addressed each point below:

  1. "Lack of essential information regarding the qualifications of the interviewed and evaluating physicians."

Response: We have added detailed information regarding the age, experience, academic degrees, gender, country of origin, and other relevant qualifications of both the interviewed physicians and the evaluators.

Change in Manuscript: This information has been incorporated into the Materials and Methods section.

Added Sentence: "The two expert physicians had over five years of clinical experience in treating OSA, held academic teaching positions, and actively participated in training programs. The evaluators represented diverse specialties (Pulmonology, Neurophysiology, and Otorhinolaryngology) with extensive clinical experience and academic backgrounds."

  2. "The small sample size undermines the generalisability of the findings."

Response: We acknowledge the small sample size as a limitation of our study. However, we ensured a multidisciplinary representation among the evaluators. We have suggested that future research should include a larger and more diverse sample to enhance generalizability.

Change in Manuscript: This has been discussed in the Limitations section.

Added Sentence: "We acknowledge the limited sample size as a limitation. Future studies should include a larger and more diverse cohort to improve generalizability."

  3. "The date of access to the chatbot is missing."

Response: The chatbot responses were generated in August 2024, and the evaluation by the six experts was conducted in September 2024.

Change in Manuscript: This information has been added to the Materials and Methods section.

Added Sentence: "The responses from ChatGPT 4.0® were generated in August 2024, and the evaluation process by the six experts took place during September 2024."

  4. "Were the questions validated sleep-related questionnaires or non-validated?"

Response: The questions were not derived from validated questionnaires but were instead formulated by experienced clinicians based on common patient concerns encountered in daily consultations.

Change in Manuscript: This clarification has been added to the Materials and Methods section.

Added Sentence: "The questions were developed based on the most frequently asked concerns observed in clinical practice by experienced sleep medicine specialists, rather than being derived from standardized questionnaires."

  5. "Ethical considerations are missing."

Response: Ethical approval was not required, as there was no direct patient involvement or intervention. However, the study adhered to standard ethical principles for scientific research.

Change in Manuscript: This clarification has been included in the Materials and Methods section.

Added Sentence: "As no direct patient involvement occurred, formal ethical approval was not required. The study adhered to ethical principles for scientific research."

  6. "The English could be improved."

Response: The manuscript has undergone a thorough English revision to improve clarity, fluency, and adherence to academic style.

  7. "The language of the manuscript."

Response: The questions were asked in Spanish, in the colloquial register that Spanish-speaking patients typically use when requesting information at the sleep unit of our hospital. The expert answers were prepared by native Spanish speakers, originally from Spain and Latin America.

Change in Manuscript: This clarification has been included in the Materials and Methods section.

We are confident that the revisions address your concerns and have strengthened our manuscript.

Sincerely,
The Authors

Reviewer 2 Report

Comments and Suggestions for Authors

Congratulations on your work. AI is a very hot topic in medicine, and studies in this area are still few.

Author Response

Thank you very much for your comment. We are very excited about our work, and we believe it is important to understand how new technologies can be used in the daily practice of medicine.

Reviewer 3 Report

Comments and Suggestions for Authors

This is a very interesting study comparing the responses of Chat-GPT with the responses of clinicians to questions that patients commonly ask about obstructive sleep apnoea. It is most timely as artificial intelligence is increasingly being used in clinical practice both by health care professionals and patients themselves.

METHODS: I think that the description of the methods needs more information. I note that the authors are Spanish. What language was used in this study? Did this match the primary language of the experts? Is there data on how ChatGPT works with different languages? Were patients directly involved in formulating the questions?

Were the specialists assessing the responses blinded to who or what had produced the responses? Were they asked to indicate the sources of each response to see if they were truly blinded? I presume that patients were not asked to comment on what they thought of the responses.

RESULTS: How did the six assessors compare with each other? Were they consistent in their assessment of the responses?

Author Response

Dear Reviewer 3,

We sincerely appreciate your constructive feedback and the time you dedicated to reviewing our manuscript. Your insightful comments have helped us improve the clarity and depth of our work. Below, we address each of your points individually:

  1. "What language was used in this study? Did this match the primary language of the experts?"
  • Response: The study was conducted entirely in Spanish, including both the formulation of questions and the evaluation of responses. This matched the native language of the medical experts involved in the study. Regarding ChatGPT’s multilingual capabilities, while there are studies suggesting variability in performance across languages, our analysis was limited to Spanish.
  • Change in Manuscript: A clarification has been added in the Materials and Methods section.
  • Added Sentence: "The questions and responses were formulated and evaluated entirely in Spanish, aligning with the native language of the medical experts involved."

  2. "Is there data on how ChatGPT works with different languages?"

  • Response: ChatGPT performs best in English, since most of its training data is in that language. Spanish, the language used in this study, is also well represented. However, this imbalance can introduce linguistic or cultural biases, producing less accurate or inappropriate responses for non-English speakers or underrepresented groups, as noted by Neha et al. in ChatGPT: Transforming Healthcare with AI. AI 2024, 5, 2618-2650. https://doi.org/10.3390/ai5040126

  3. "Were patients directly involved in formulating the questions?"

  • Response: Patients were not directly involved in formulating the questions. Instead, the questions were compiled by senior physicians with over 5 years of experience based on the most frequently asked concerns observed during clinical consultations.
  • Change in Manuscript: This point has been clarified in the Materials and Methods section.
  • Added Sentence: "The questions were developed by experienced sleep medicine specialists based on the most common concerns observed during daily consultations."

 

  4. "Were the specialists assessing the responses blinded to who or what had produced the responses? Were they asked to indicate the sources of each response to see if they were truly blinded?"

  • Response: The evaluators were completely blinded to the origin of the responses (AI or expert physicians). They were not asked to identify the source to avoid introducing any potential bias.
  • Change in Manuscript: This clarification has been included in the Materials and Methods section.
  • Added Sentence: "The evaluators were blinded to the source of the responses and were not asked to identify their origin to ensure an unbiased assessment."

  5. "I presume that patients were not asked to comment on what they thought of the responses."

  • Response: Correct; the answers were not presented to the patients themselves to assess their clarity and comprehensibility. Doing so would be an interesting extension for a second round of the study.

  6. "How did the six assessors compare with each other? Were they consistent in their assessment of the responses?"

  • Response: An ANOVA analysis was conducted to assess the consistency among the six evaluators. The results showed no significant differences (p > 0.05) between the evaluators' scores, indicating a high degree of consistency in their assessments.
  • Change in Manuscript: This analysis has been detailed in the Results section.
  • Added Sentence: "An ANOVA analysis revealed no significant differences among the evaluators' scores (p > 0.05), indicating consistency in their assessments."
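A one-way ANOVA across evaluators, as described above, can be sketched as follows; the 1-5 scores shown are hypothetical, not the study's actual data:

```python
from scipy.stats import f_oneway

# Hypothetical 1-5 quality scores given by three evaluators
# to the same set of answers (illustrative values only).
evaluator_1 = [4, 5, 4, 5, 4, 3]
evaluator_2 = [5, 4, 4, 4, 5, 4]
evaluator_3 = [4, 4, 5, 3, 4, 4]

# One-way ANOVA: are the mean scores of the evaluators significantly different?
f_stat, p_value = f_oneway(evaluator_1, evaluator_2, evaluator_3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```

A p-value above 0.05 here is consistent with the evaluators scoring the responses similarly; an intraclass correlation coefficient would be an alternative way to quantify inter-rater agreement.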

We greatly value your constructive feedback and trust that the revisions address your concerns.

Sincerely,
The Authors

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have clarified the qualifications of the participating physicians, indicating that they are experienced sleep medicine specialists with over five years of practice. Specifically, they hold a specialist certification in somnology. Furthermore, the manuscript specifies that the questions were submitted to ChatGPT 4.0® in August 2024, adequately addressing this comment.

However, the limited sample size remains a significant factor that constrains the generalisability of the results. The authors have acknowledged this issue and confirmed that including a larger number of evaluators and assessed subjects could reduce the risk of bias. Nevertheless, this acknowledgement does not fully resolve the concern. A statistical power analysis would be recommended to determine the sample size required to detect differences between the two groups; this methodological step would also reduce the risk of Type I errors (false positives) and Type II errors (false negatives).
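The recommended calculation can be sketched with the standard normal-approximation formula for two independent groups; the effect size, alpha, and power values below are illustrative assumptions, not figures from the study:

```python
from math import ceil
from scipy.stats import norm

# Approximate per-group sample size for a two-sample comparison,
# using the normal-approximation formula. All parameter values
# are illustrative choices, not derived from the study.
d = 0.5        # assumed medium standardized effect size (Cohen's d)
alpha = 0.05   # two-sided Type I error rate
power = 0.80   # 1 - Type II error rate

z_alpha = norm.ppf(1 - alpha / 2)  # critical value, ~1.96
z_beta = norm.ppf(power)           # ~0.84

# n per group = 2 * ((z_alpha + z_beta) / d)^2
n_per_group = ceil(2 * ((z_alpha + z_beta) / d) ** 2)
print(f"~{n_per_group} subjects per group")
```

With these assumptions the formula yields roughly 63 subjects per group; smaller assumed effect sizes drive the requirement up sharply.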

The manuscript does not clarify whether the questions were derived from validated instruments. Greater clarity regarding the process adopted to ensure the reliability of the questions would help strengthen the study.

 

The authors state that the study did not require approval from an ethics committee, as there was no direct patient involvement. However, it would be helpful to discuss further the ethical implications of using artificial intelligence in medical contexts. Specifically, data privacy and the risk of inaccuracies in AI-generated responses should be explored further.

Therefore, I would ask the authors to discuss the issues raised further.

Regards

Author Response

Thank you very much for taking the time to review this manuscript. Below you will find the detailed responses, with the corresponding corrections highlighted.

Author Response File: Author Response.pdf
