Assessing the Quality and Accuracy of ChatGPT-3.5 Responses to Patient Questions About Hip Arthroscopy
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design and Data Generation
2.2. Response Collection
2.3. Evaluation Protocol
2.4. Statistical Analysis
3. Results
3.1. Descriptive Statistics
3.2. Inter-Rater Reliability and Agreement
3.3. Content Quality Assessment
3.4. Qualitative Findings
4. Discussion
4.1. Primary Findings
4.2. Consistency and Inter-Rater Agreement
4.3. Comparison with Prior Studies
4.4. Limitations and Clinical Implications
4.5. Interpretation of Statistical Findings
4.6. Clinical Utility and Future Directions
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| FAIS | Femoroacetabular Impingement Syndrome |
| HAS | Hip Arthroscopy |
| ICC | Intraclass Correlation Coefficient |
| LLMs | Large Language Models |
| SD | Standard Deviation |
| MDPI | Multidisciplinary Digital Publishing Institute |


| No. | Question |
|---|---|
| 1 | What is FAIS? |
| 2 | What causes FAIS? |
| 3 | What are the symptoms of FAIS? |
| 4 | How is FAIS diagnosed? |
| 5 | Do I always need surgery for FAIS? |
| 6 | What types of FAIS are there? |
| 7 | What is hip arthroscopy and when is it used? |
| 8 | What happens during hip arthroscopy for FAIS? |
| 9 | What are the benefits of hip arthroscopy compared to open surgery? |
| 10 | What are the risks or complications of hip arthroscopy? |
| 11 | When can I return to sports or full activity? |
| 12 | Will my leg be shorter or longer after surgery? |
| 13 | Will I develop arthritis later if I have FAIS? |
| 14 | How do I prepare for hip arthroscopy? |
| 15 | What happens if I don’t have surgery? |
| 16 | How successful is hip arthroscopy for FAIS? |
| 17 | What factors affect the outcome of surgery? |
| 18 | Can I live a normal life after surgery? |
| 19 | How long will the benefits of surgery last? |
| 20 | What should I ask my surgeon before surgery? |

| Domain | Mean ± SD | ICC (2,1) [95% CI] | Exact Agreement (%) |
|---|---|---|---|
| Relevance | 5.00 ± 0.00 | not estimable | 100 |
| Accuracy | 4.98 ± 0.11 | 0.00 [0.00–0.05] | 90 |
| Clarity | 4.98 ± 0.11 | 0.01 [0.00–0.06] | 90 |
| Completeness | 4.85 ± 0.24 | 0.03 [0.00–0.12] | 70 |

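
The summary table pairs near-zero ICCs with 70–100% exact agreement. The worked equation below is a sketch that assumes the two-way random-effects, absolute-agreement, single-rater model (ICC(2,1) in Shrout and Fleiss's notation, ICC(A,1) in McGraw and Wong's). Here $n$ is the number of questions, $k$ the number of raters, and $MS_R$, $MS_C$, and $MS_E$ the between-question, between-rater, and residual mean squares. Applied to the Accuracy ratings in the per-question table that follows (Rater 1 scored Question 5 as 4; every other rating is 5), the single discrepancy raises the between-question and residual mean squares by the same amount, so the numerator vanishes:

$$
\mathrm{ICC}(2,1) = \frac{MS_R - MS_E}{MS_R + (k-1)\,MS_E + \dfrac{k}{n}\,(MS_C - MS_E)}
$$

$$
\begin{aligned}
\bar{x} &= \tfrac{39 \cdot 5 + 4}{40} = 4.975,\\
MS_R &= \tfrac{k\sum_i(\bar{x}_{i\cdot}-\bar{x})^2}{n-1} = \tfrac{2 \times 0.2375}{19} = 0.025,\\
MS_C &= \tfrac{n\sum_j(\bar{x}_{\cdot j}-\bar{x})^2}{k-1} = \tfrac{20 \times (0.025^2 + 0.025^2)}{1} = 0.025,\\
MS_E &= \tfrac{SS_T - SS_R - SS_C}{(n-1)(k-1)} = \tfrac{0.975 - 0.475 - 0.025}{19} = 0.025,\\
\mathrm{ICC}(2,1) &= \tfrac{0.025 - 0.025}{0.025 + 0.025 + 0.1\,(0.025 - 0.025)} = 0.
\end{aligned}
$$

Because nearly every rating sits at the top of the scale, there is almost no between-question variance for the ICC to attribute to true differences, so the coefficient collapses toward zero even though both raters give near-identical scores. For Relevance, all 40 ratings are 5, every mean square is zero, and the ratio becomes 0/0, which is why that ICC is not estimable.
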
| Question | Relevance (Rater 1) | Relevance (Rater 2) | Accuracy (Rater 1) | Accuracy (Rater 2) | Clarity (Rater 1) | Clarity (Rater 2) | Completeness (Rater 1) | Completeness (Rater 2) |
|---|---|---|---|---|---|---|---|---|
| Question 1 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 2 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 3 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 4 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 5 | 5 | 5 | 4 | 5 | 4 | 5 | 5 | 5 |
| Question 6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 7 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 8 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 9 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 10 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 11 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 12 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 13 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 14 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 15 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 |
| Question 16 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 17 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 18 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 19 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| Question 20 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
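
The Mean ± SD, Exact Agreement, and ICC columns can in principle be recomputed from the per-question ratings above. The Python sketch below is a hypothetical re-analysis, not the authors' code. It assumes (i) the SD is the sample SD of the 20 per-question mean scores, an assumption that reproduces the tabulated SDs; (ii) exact agreement is the share of questions on which both raters gave the same score; and (iii) the ICC is the two-way random-effects, absolute-agreement, single-rater form. The helper names `describe`, `exact_agreement`, and `icc_2_1` are illustrative, and confidence intervals are not computed.

```python
from statistics import mean, stdev

# Completeness ratings for Questions 1-20, transcribed from the per-question table.
RATER1 = [5] * 20
RATER2 = [5, 4, 5, 4, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5, 4, 5, 5, 5, 5, 5]


def describe(r1, r2):
    """Mean and sample SD of the per-question mean scores (assumed basis of Mean ± SD)."""
    per_question = [(a + b) / 2 for a, b in zip(r1, r2)]
    return mean(per_question), stdev(per_question)


def exact_agreement(r1, r2):
    """Percentage of questions on which both raters gave the identical score."""
    matches = sum(a == b for a, b in zip(r1, r2))
    return 100 * matches / len(r1)


def icc_2_1(*raters):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    Each argument is one rater's scores across the same questions.
    Returns None when every rating is identical, because the ratio is then 0/0.
    """
    rows = list(zip(*raters))  # one row per question, one column per rater
    n, k = len(rows), len(raters)
    grand = sum(sum(row) for row in rows) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    if ss_total == 0:
        return None
    ss_rows = k * sum((sum(row) / k - grand) ** 2 for row in rows)
    ss_cols = n * sum((sum(col) / n - grand) ** 2 for col in raters)
    ss_error = ss_total - ss_rows - ss_cols
    ms_r = ss_rows / (n - 1)              # between-question mean square
    ms_c = ss_cols / (k - 1)              # between-rater mean square
    ms_e = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_r - ms_e) / (ms_r + (k - 1) * ms_e + k * (ms_c - ms_e) / n)


m, sd = describe(RATER1, RATER2)
print(f"Completeness: {m:.2f} ± {sd:.2f}, exact agreement {exact_agreement(RATER1, RATER2):.0f}%")
# prints: Completeness: 4.85 ± 0.24, exact agreement 70%
```

The ICC helper returns `None` when all ratings are identical (as for Relevance), mirroring the "not estimable" entry; its point estimates may differ from the published values depending on the software and estimation options used.
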

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Published by MDPI on behalf of the Lithuanian University of Health Sciences. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Heinz, M.; Hakam, H.T.; Salzmann, M.; Prill, R.; Ramadanov, N. Assessing the Quality and Accuracy of ChatGPT-3.5 Responses to Patient Questions About Hip Arthroscopy. Medicina 2025, 61, 2080. https://doi.org/10.3390/medicina61122080
Heinz M, Hakam HT, Salzmann M, Prill R, Ramadanov N. Assessing the Quality and Accuracy of ChatGPT-3.5 Responses to Patient Questions About Hip Arthroscopy. Medicina. 2025; 61(12):2080. https://doi.org/10.3390/medicina61122080
Chicago/Turabian Style: Heinz, Maximilian, Hassan Tarek Hakam, Mikhail Salzmann, Robert Prill, and Nikolai Ramadanov. 2025. "Assessing the Quality and Accuracy of ChatGPT-3.5 Responses to Patient Questions About Hip Arthroscopy." Medicina 61, no. 12: 2080. https://doi.org/10.3390/medicina61122080
APA Style: Heinz, M., Hakam, H. T., Salzmann, M., Prill, R., & Ramadanov, N. (2025). Assessing the Quality and Accuracy of ChatGPT-3.5 Responses to Patient Questions About Hip Arthroscopy. Medicina, 61(12), 2080. https://doi.org/10.3390/medicina61122080

