Automated Assessment of Reporting Completeness in Orthodontic Research Using LLMs: An Observational Study
Abstract
1. Introduction
2. Materials and Methods
2.1. Sample Selection
2.2. Identification of Relevant Articles
2.3. Selection of Studies
2.4. Quality Assessment of RCT Abstracts
2.5. Quality Assessment of Systematic Review Abstracts
2.6. Prompt Design and Model Instructions
2.7. Chain-Of-Thought Prompting and Quality Control
2.8. Consistency and Data Management
2.9. Statistical Analysis
3. Results
3.1. Quality Assessment of RCT Abstracts
3.2. Quality Assessment of Systematic Review Abstracts
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

Variable | N | ChatGPT, N = 20 | Human, N = 20 | p-Value |
---|---|---|---|---|
1. Title: identification of the study as randomized | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
2. Authors: contact details for the corresponding author | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
3. Trial design | 40 | | | 0.11 |
Reported | | 20 (100%) | 16 (80%) | |
Not reported | | 0 (0%) | 4 (20%) | |
4. Participants: eligibility criteria for participants and the settings | 40 | | | >0.9 |
Reported | | 20 (100%) | 19 (95%) | |
Not reported | | 0 (0%) | 1 (5.0%) | |
5. Interventions intended for each group | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
6. Objective | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
7. Outcome clearly defined | 40 | | | 0.11 |
Reported | | 20 (100%) | 16 (80%) | |
Not reported | | 0 (0%) | 4 (20%) | |
8. Randomization/how participants were allocated to interventions | 40 | | | 0.001 |
Reported | | 20 (100%) | 11 (55%) | |
Not reported | | 0 (0%) | 9 (45%) | |
9. Blinding | 40 | | | 0.7 |
Reported | | 8 (40%) | 9 (45%) | |
Not reported | | 12 (60%) | 11 (55%) | |
10. Number of participants randomized to each group | 40 | | | 0.5 |
Reported | | 16 (80%) | 14 (70%) | |
Not reported | | 4 (20%) | 6 (30%) | |
11. Recruitment: trial status and period or duration | 40 | | | <0.001 |
Reported | | 20 (100%) | 0 (0%) | |
Not applicable | | 0 (0%) | 20 (100%) | |
12. Number of participants analyzed in each group | 40 | | | 0.2 |
Reported | | 15 (75%) | 11 (55%) | |
Not reported | | 5 (25%) | 9 (45%) | |
13. Outcome result for each group and estimated effect size | 40 | | | >0.9 |
Reported | | 20 (100%) | 19 (95%) | |
Not reported | | 0 (0%) | 1 (5.0%) | |
14. Harms/important adverse events or side effects | 40 | | | 0.082 |
Reported | | 8 (40%) | 3 (15%) | |
Not reported | | 11 (55%) | 17 (85%) | |
Not applicable | | 1 (5.0%) | 0 (0%) | |
15. Conclusions/general interpretation of the results | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
16. Trial registration | 40 | | | 0.5 |
Reported | | 9 (45%) | 11 (55%) | |
Not reported | | 11 (55%) | 9 (45%) | |
17. Funding | 40 | | | >0.9 |
Reported | | 4 (20%) | 3 (15%) | |
Not reported | | 16 (80%) | 17 (85%) | |

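
The p-values for the sparse rows of the table above (for example item 3, trial design, where the ChatGPT "Not reported" count is zero) are consistent with a two-sided Fisher exact test. The source does not state which test produced each p-value, so the choice of test is an assumption, and the helper name `fisher_exact_two_sided` is hypothetical. A minimal sketch in Python, using only the standard library:

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (not stated in the source):
    rows = rater (ChatGPT, Human), columns = (Reported, Not reported).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )


# Item 3 (trial design): ChatGPT 20 reported / 0 not, Human 16 / 4.
print(round(fisher_exact_two_sided(20, 0, 16, 4), 2))  # 0.11
# Item 8 (randomization): ChatGPT 20 / 0, Human 11 / 9.
print(round(fisher_exact_two_sided(20, 0, 11, 9), 3))  # 0.001
```

Both values match the tabulated p-values after rounding, which supports the assumption but does not confirm the authors' exact procedure.
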

Variable | N | ChatGPT, N = 20 | Human, N = 20 | p-Value |
---|---|---|---|---|
1. Identify the report as a systematic review | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
2. Objectives | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
3. Eligibility criteria | 40 | | | 0.028 |
Reported | | 18 (90%) | 12 (60%) | |
Not reported | | 2 (10%) | 8 (40%) | |
4. Information sources | 40 | | | 0.11 |
Reported | | 20 (100%) | 16 (80%) | |
Not reported | | 0 (0%) | 4 (20%) | |
5. Risk of bias | 40 | | | >0.9 |
Reported | | 15 (75%) | 15 (75%) | |
Not reported | | 5 (25%) | 5 (25%) | |
6. Methods of synthesis results | 40 | | | 0.14 |
Reported | | 17 (85%) | 13 (65%) | |
Not reported | | 3 (15%) | 7 (35%) | |
7. Included studies | 40 | | | >0.9 |
Reported | | 20 (100%) | 19 (95%) | |
Not reported | | 0 (0%) | 1 (5.0%) | |
8. Synthesis of results | 40 | | | 0.2 |
Reported | | 18 (90%) | 14 (70%) | |
Not reported | | 2 (10%) | 6 (30%) | |
9. Limitation of evidence | 40 | | | >0.9 |
Reported | | 17 (85%) | 17 (85%) | |
Not reported | | 3 (15%) | 3 (15%) | |
10. Interpretation | 40 | | | |
Reported | | 20 (100%) | 20 (100%) | |
11. Funding | 40 | | | 0.091 |
Reported | | 6 (30%) | 1 (5.0%) | |
Not reported | | 14 (70%) | 19 (95%) | |
12. Registration | 40 | | | 0.7 |
Reported | | 9 (45%) | 8 (40%) | |
Not reported | | 11 (55%) | 12 (60%) | |

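
For rows of the table above where every expected cell count is at least 5 (for example item 3, eligibility criteria), the tabulated p-value is consistent with a Pearson chi-square test without continuity correction. The source does not say which test was applied to which row, so this is an assumption, and the function name `chi_square_2x2` is hypothetical. A minimal sketch in Python, using only the standard library:

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). Assumes every row and column total
    is non-zero, otherwise an expected count would be zero.
    """
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Item 3 (eligibility criteria): ChatGPT 18 reported / 2 not, Human 12 / 8.
statistic, p_value = chi_square_2x2(18, 2, 12, 8)
print(round(statistic, 1), round(p_value, 3))  # 4.8 0.028
```

The result matches the tabulated 0.028 after rounding. Rows with a zero or very small cell, such as item 4, would instead call for an exact test.
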
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Alharbi, F.; Asiri, S. Automated Assessment of Reporting Completeness in Orthodontic Research Using LLMs: An Observational Study. Appl. Sci. 2024, 14, 10323. https://doi.org/10.3390/app142210323