AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Data Collection and Analysis
3. Results
3.1. Classification Results
3.2. Surgical Management Results from Sensitivity Testing
3.3. Replicability Results
4. Discussion
4.1. Ethical Considerations
- Autonomy: Healthcare providers must ensure that the use of LLMs respects patient autonomy by facilitating informed decision making and honoring patients’ preferences and values, especially throughout the surgical process [57,58]. Patient autonomy may be compromised if patients are not adequately informed about the limitations, biases, and role of LLMs in their care [1,59].
- Beneficence: LLMs have the potential to significantly benefit patient care by providing timely and accurate information that can empower both healthcare professionals and patients to make more informed decisions. However, the implementation of LLMs must be guided by a commitment to maximizing these potential benefits while minimizing harm [48,59].
- Nonmaleficence: While LLMs can offer valuable assistance, they also carry inherent risks, including the potential for errors, biases, and misinformation [1,3,11,48]. Healthcare providers must critically evaluate and verify LLM-generated recommendations. Additionally, measures should be in place to mitigate the risk of LLMs propagating misinformation or perpetuating bias and healthcare disparities [57,59]. This entails the ongoing monitoring and evaluation of LLM performance, as well as efforts to address any identified issues or limitations.
- Justice: The fair distribution of resources requires equitable access to this technology and its benefits [2]. Failing to address LLM bias and disparities in LLM utilization could exacerbate existing inequities in healthcare access and outcomes [1,2,52,57]. Therefore, it is imperative for healthcare systems to implement policies and initiatives aimed at promoting equitable access to LLM technology.
4.2. Limitations
4.3. Future Research and Next Steps
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Miller, R.; Farnebo, S.; Horwitz, M.D. Insights and trends review: Artificial intelligence in hand surgery. J. Hand Surg. Eur. Vol. 2023, 48, 396–403.
- Topol, E.J. High-performance medicine: The convergence of human and artificial intelligence. Nat. Med. 2019, 25, 44–56.
- Dave, T.; Athaluri, S.A.; Singh, S. ChatGPT in medicine: An overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front. Artif. Intell. 2023, 6, 1169595.
- Ulusoy, I.; Yılmaz, M.; Kıvrak, A. How Efficient Is ChatGPT in Accessing Accurate and Quality Health-Related Information? Cureus 2023, 15, e46662.
- Mikolov, T.; Karafiát, M.; Burget, L.; Cernocký, J.; Khudanpur, S. Recurrent neural network based language model. In Interspeech; ISCA: Chiba, Japan, 2010; pp. 1045–1048.
- Jin, Z. Analysis of the Technical Principles of ChatGPT and Prospects for Pre-trained Large Models. In Proceedings of the 2023 IEEE 3rd International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA), Chongqing, China, 26–28 May 2023; pp. 1755–1758.
- Google. Gemini. Available online: https://gemini.google.com/app (accessed on 10 March 2024).
- OpenAI. ChatGPT. Available online: https://chat.openai.com/chat (accessed on 10 March 2024).
- Abi-Rafeh, J.; Xu, H.H.; Kazan, R.; Tevlin, R.; Furnas, H. Large Language Models and Artificial Intelligence: A Primer for Plastic Surgeons on the Demonstrated & Potential Applications, Promises, and Limitations of ChatGPT. Aesthet. Surg. J. 2023, 44, 329–343.
- Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; De Leon, L.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLoS Digit. Health 2023, 2, e0000198.
- Ghanem, D.; Nassar, J.; El Bachour, J.; Hanna, T. ChatGPT Earns American Board Certification in Hand Surgery. Hand Surg. Rehabil. 2024, 101688.
- Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887.
- Keller, M.; Guebeli, A.; Thieringer, F.; Honigmann, P. Artificial intelligence in patient-specific hand surgery: A scoping review of literature. Int. J. Comput. Assist. Radiol. Surg. 2023, 18, 1393–1403.
- Gummesson, C.; Ward, M.M.; Atroshi, I. The shortened disabilities of the arm, shoulder and hand questionnaire (Quick DASH): Validity and reliability based on responses within the full-length DASH. BMC Musculoskelet. Disord. 2006, 7, 1–7.
- Poerbodipoero, S.; Steultjens, M.; Van der Beek, A.; Dekker, J. Pain, disability in daily activities and work participation in patients with traumatic hand injury. Br. J. Hand Ther. 2007, 12, 40–47.
- Schier, J.S.; Chan, J. Changes in life roles after hand injury. J. Hand Ther. 2007, 20, 57–69.
- Smith, M.E.; Auchincloss, J.M.; Ali, M.S. Causes and consequences of hand injury. J. Hand Surg. Br. 1985, 10, 288–292.
- Angly, B.; Constantinescu, M.A.; Kreutziger, J.; Juon, B.H.; Vögelin, E. Early versus delayed surgical treatment in open hand injuries: A paradigm revisited. World J. Surg. 2012, 36, 826–829.
- del Pinal, F. Severe mutilating injuries to the hand: Guidelines for organizing the chaos. J. Plast. Reconstr. Aesthet. Surg. 2007, 60, 816–827.
- Gustilo, R.B.; Anderson, J.T. Prevention of infection in the treatment of one thousand and twenty-five open fractures of long bones: Retrospective and prospective analyses. J. Bone Joint Surg. Am. 1976, 58, 453–458.
- Salazar Botero, S.; Hidalgo Diaz, J.J.; Benaïda, A.; Collon, S.; Facca, S.; Liverneaux, P.A. Review of Acute Traumatic Closed Mallet Finger Injuries in Adults. Arch. Plast. Surg. 2016, 43, 134–144.
- Wong, K.; von Schroeder, H.P. Delays and Poor Management of Scaphoid Fractures: Factors Contributing to Nonunion. J. Hand Surg. 2011, 36, 1471–1474.
- Yoong, P.; Johnson, C.A.; Yoong, E.; Chojnowski, A. Four hand injuries not to miss: Avoiding pitfalls in the emergency department. Eur. J. Emerg. Med. 2011, 18, 186–191.
- Leypold, T.; Schäfer, B.; Boos, A.; Beier, J.P. Can AI Think Like a Plastic Surgeon? Evaluating GPT-4’s Clinical Judgment in Reconstructive Procedures of the Upper Extremity. Plast. Reconstr. Surg. Glob. Open 2023, 11, e5471.
- Crook, B.S.; Park, C.N.; Hurley, E.T.; Richard, M.J.; Pidgeon, T.S. Evaluation of Online Artificial Intelligence-Generated Information on Common Hand Procedures. J. Hand Surg. Am. 2023, 48, 1122–1127.
- Seth, I.; Xie, Y.; Rodwell, A.; Gracias, D.; Bulloch, G.; Hunter-Smith, D.J.; Rozen, W.M. Exploring the Role of a Large Language Model on Carpal Tunnel Syndrome Management: An Observation Study of ChatGPT. J. Hand Surg. Am. 2023, 48, 1025–1033.
- Al Rawi, Z.M.; Kirby, B.J.; Albrecht, P.A.; Nuelle, J.A.V.; London, D.A. Experimenting With the New Frontier: Artificial Intelligence-Powered Chat Bots in Hand Surgery. Hand 2024, 15589447241238372.
- Cooney, W.P., 3rd. Scaphoid fractures: Current treatments and techniques. Instr. Course Lect. 2003, 52, 197–208.
- Cooney, W.P.; Dobyns, J.H.; Linscheid, R.L. Fractures of the scaphoid: A rational approach to management. Clin. Orthop. Relat. Res. 1980, 149, 90–97.
- Eaton, R.G.; Malerich, M.M. Volar plate arthroplasty of the proximal interphalangeal joint: A review of ten years’ experience. J. Hand Surg. Am. 1980, 5, 260–268.
- Geissler, W.B. Arthroscopic management of scapholunate instability. J. Wrist Surg. 2013, 2, 129–135.
- Green, D.P.; O’Brien, E.T. Fractures of the thumb metacarpal. South. Med. J. 1972, 65, 807–814.
- Gustilo, R.B.; Mendoza, R.M.; Williams, D.N. Problems in the management of type III (severe) open fractures: A new classification of type III open fractures. J. Trauma 1984, 24, 742–746.
- Herbert, T.J.; Fisher, W.E. Management of the fractured scaphoid using a new bone screw. J. Bone Joint Surg. Br. 1984, 66, 114–123.
- Hintermann, B.; Holzach, P.J.; Schütz, M.; Matter, P. Skier’s thumb--the significance of bony injuries. Am. J. Sports Med. 1993, 21, 800–804.
- Kleinert, H.E.; Verdan, C. Report of the Committee on Tendon Injuries. J. Hand Surg. 1983, 8, 794–798.
- Leddy, J.P.; Packer, J.W. Avulsion of the profundus tendon insertion in athletes. J. Hand Surg. Am. 1977, 2, 66–69.
- Lichtman, D.M.; Pientka, W.F., 2nd; Bain, G.I. Kienböck Disease: A New Algorithm for the 21st Century. J. Wrist Surg. 2017, 6, 2–10.
- Mayfield, J.K.; Johnson, R.P.; Kilcoyne, R.K. Carpal dislocations: Pathomechanics and progressive perilunar instability. J. Hand Surg. Am. 1980, 5, 226–241.
- Carlà, M.M.; Gambini, G.; Baldascino, A.; Boselli, F.; Giannuzzi, F.; Margollicci, F.; Rizzo, S. Large language models as assistance for glaucoma surgical cases: A ChatGPT vs. Google Gemini comparison. Graefes Arch. Clin. Exp. Ophthalmol. 2024.
- Carlà, M.M.; Gambini, G.; Baldascino, A.; Giannuzzi, F.; Boselli, F.; Crincoli, E.; D’Onofrio, N.C.; Rizzo, S. Exploring AI-chatbots’ capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases. Br. J. Ophthalmol. 2024.
- Koga, S.; Martin, N.B.; Dickson, D.W. Evaluating the performance of large language models: ChatGPT and Google Bard in generating differential diagnoses in clinicopathological conferences of neurodegenerative disorders. Brain Pathol. 2023, 34, e13207.
- Kumari, A.; Kumari, A.; Singh, A.; Singh, S.K.; Juhi, A.; Dhanvijay, A.K.D.; Pinjar, M.J.; Mondal, H. Large Language Models in Hematology Case Solving: A Comparative Study of ChatGPT-3.5, Google Bard, and Microsoft Bing. Cureus 2023, 15, e43861.
- Lim, Z.W.; Pushpanathan, K.; Yew, S.M.E.; Lai, Y.; Sun, C.H.; Lam, J.S.H.; Chen, D.Z.; Goh, J.H.L.; Tan, M.C.J.; Sheng, B.; et al. Benchmarking large language models’ performances for myopia care: A comparative analysis of ChatGPT-3.5, ChatGPT-4.0, and Google Bard. EBioMedicine 2023, 95, 104770.
- Rahsepar, A.A.; Tavakoli, N.; Kim, G.H.J.; Hassani, C.; Abtin, F.; Bedayat, A. How AI Responds to Common Lung Cancer Questions: ChatGPT vs Google Bard. Radiology 2023, 307, e230922.
- Gan, R.K.; Ogbodo, J.C.; Wee, Y.Z.; Gan, A.Z.; González, P.A. Performance of Google bard and ChatGPT in mass casualty incidents triage. Am. J. Emerg. Med. 2024, 75, 72–78.
- Zúñiga Salazar, G.; Zúñiga, D.; Vindel, C.L.; Yoong, A.M.; Hincapie, S.; Zúñiga, A.B.; Zúñiga, P.; Salazar, E.; Zúñiga, B. Efficacy of AI Chats to Determine an Emergency: A Comparison Between OpenAI’s ChatGPT, Google Bard, and Microsoft Bing AI Chat. Cureus 2023, 15, e45473.
- Berg, H.T.; van Bakel, B.; van de Wouw, L.; Jie, K.E.; Schipper, A.; Jansen, H.; O’Connor, R.D.; van Ginneken, B.; Kurstjens, S. ChatGPT and Generating a Differential Diagnosis Early in an Emergency Department Presentation. Ann. Emerg. Med. 2024, 83, 83–86.
- Franc, J.M.; Cheng, L.; Hart, A.; Hata, R.; Hertelendy, A. Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale. CJEM 2024, 26, 40–46.
- Funk, P.F.; Hoch, C.C.; Knoedler, S.; Knoedler, L.; Cotofana, S.; Sofo, G.; Bashiri Dezfouli, A.; Wollenberg, B.; Guntinas-Lichius, O.; Alfertshofer, M. ChatGPT’s Response Consistency: A Study on Repeated Queries of Medical Examination Questions. Eur. J. Investig. Health Psychol. Educ. 2024, 14, 657–668.
- Fraser, H.; Crossland, D.; Bacher, I.; Ranney, M.; Madsen, T.; Hilliard, R. Comparison of Diagnostic and Triage Accuracy of Ada Health and WebMD Symptom Checkers, ChatGPT, and Physicians for Patients in an Emergency Department: Clinical Data Analysis Study. JMIR Mhealth Uhealth 2023, 11, e49995.
- Barash, Y.; Klang, E.; Konen, E.; Sorin, V. ChatGPT-4 Assistance in Optimizing Emergency Department Radiology Referrals and Imaging Selection. J. Am. Coll. Radiol. 2023, 20, 998–1003.
- Günay, S.; Öztürk, A.; Özerol, H.; Yiğit, Y.; Erenler, A.K. Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment. Am. J. Emerg. Med. 2024, 80, 51–60.
- van Leerdam, R.H.; Krijnen, P.; Panneman, M.J.; Schipper, I.B. Incidence and treatment of hand and wrist injuries in Dutch emergency departments. Eur. J. Trauma Emerg. Surg. 2022, 48, 4327–4332.
- Rizwan, A.; Sadiq, T. The Use of AI in Diagnosing Diseases and Providing Management Plans: A Consultation on Cardiovascular Disorders With ChatGPT. Cureus 2023, 15, e43106.
- Sun, Y.X.; Li, Z.M.; Huang, J.Z.; Yu, N.Z.; Long, X. GPT-4: The Future of Cosmetic Procedure Consultation? Aesthet. Surg. J. 2023, 43, NP670–NP672.
- Oleck, N.C.; Naga, H.I.; Nichols, D.S.; Morris, M.X.; Dhingra, B.; Patel, A. Navigating the Ethical Landmines of ChatGPT: Implications of Intelligent Chatbots in Plastic Surgery Clinical Practice. Plast. Reconstr. Surg. Glob. Open 2023, 11, e5290.
- Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Haider, C.; Forte, A.J. AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research. Healthcare 2024, 12, 825.
- Keskinbora, K.H. Medical ethics considerations on artificial intelligence. J. Clin. Neurosci. 2019, 64, 277–282.
- Li, W.; Zhang, Y.; Chen, F. ChatGPT in Colorectal Surgery: A Promising Tool or a Passing Fad? Ann. Biomed. Eng. 2023, 51, 1892–1897.
Classification System | LLM | Correct | Partially Correct | Incorrect | Mean Score | Standard Deviation |
---|---|---|---|---|---|---|
Eaton classification for volar plate avulsion injuries | ChatGPT-4 | 0 | 1 | 7 | 0.13 | 0.35 |
Eaton classification for volar plate avulsion injuries | Gemini | 8 | 0 | 0 | 2.00 | 0.00 |
Geissler arthroscopic classification for carpal instability | ChatGPT-4 | 3 | 0 | 5 | 0.75 | 1.04 |
Geissler arthroscopic classification for carpal instability | Gemini | 4 | 0 | 4 | 1.00 | 1.07 |
Green and O’Brien’s classification of thumb metacarpal fractures | ChatGPT-4 | 1 | 1 | 10 | 0.25 | 0.62 |
Green and O’Brien’s classification of thumb metacarpal fractures | Gemini | 11 | 1 | 0 | 1.92 | 0.29 |
Gustilo-Anderson classification of open fractures | ChatGPT-4 | 7 | 1 | 2 | 1.50 | 0.85 |
Gustilo-Anderson classification of open fractures | Gemini | 5 | 3 | 2 | 1.30 | 0.82 |
Herbert and Fisher classification of scaphoid fractures | ChatGPT-4 | 4 | 6 | 10 | 0.70 | 0.80 |
Herbert and Fisher classification of scaphoid fractures | Gemini | 20 | 0 | 0 | 2.00 | 0.00 |
Hintermann et al.’s classification of ulnar collateral ligament (UCL) injury of the thumb | ChatGPT-4 | 0 | 0 | 12 | 0.00 | 0.00 |
Hintermann et al.’s classification of ulnar collateral ligament (UCL) injury of the thumb | Gemini | 10 | 0 | 2 | 1.67 | 0.78 |
Kleinert and Verdan’s zone classification of flexor tendon injuries | ChatGPT-4 | 3 | 6 | 7 | 0.75 | 0.77 |
Kleinert and Verdan’s zone classification of flexor tendon injuries | Gemini | 4 | 2 | 10 | 0.63 | 0.89 |
Leddy and Packer classification of avulsion injury of the flexor digitorum profundus (FDP) | ChatGPT-4 | 4 | 0 | 8 | 0.67 | 0.98 |
Leddy and Packer classification of avulsion injury of the flexor digitorum profundus (FDP) | Gemini | 6 | 0 | 6 | 1.00 | 1.04 |
Lichtman classification of Kienböck disease (osteonecrosis of the lunate) | ChatGPT-4 | 8 | 3 | 1 | 1.58 | 0.67 |
Lichtman classification of Kienböck disease (osteonecrosis of the lunate) | Gemini | 8 | 0 | 4 | 1.33 | 0.98 |
Mayfield classification for carpal instability | ChatGPT-4 | 4 | 1 | 3 | 1.13 | 0.99 |
Mayfield classification for carpal instability | Gemini | 6 | 0 | 2 | 1.50 | 0.93 |
Mayo classification of scaphoid fractures | ChatGPT-4 | 0 | 0 | 10 | 0.00 | 0.00 |
Mayo classification of scaphoid fractures | Gemini | 8 | 0 | 2 | 1.60 | 0.84 |
Tubiana classification for mallet finger | ChatGPT-4 | 2 | 0 | 6 | 0.50 | 0.93 |
Tubiana classification for mallet finger | Gemini | 6 | 0 | 2 | 1.50 | 0.93 |
Total | ChatGPT-4 | 36 | 19 | 81 | 0.67 | 0.87 |
Total | Gemini | 96 | 6 | 34 | 1.46 | 0.87 |
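The Mean Score and Standard Deviation columns can be recomputed from the three count columns. The sketch below assumes a 2/1/0 scoring scheme (correct = 2, partially correct = 1, incorrect = 0) and the sample standard deviation. Neither assumption is stated in the table itself, although together they reproduce the Total rows. The name `score_summary` is illustrative only.

```python
from statistics import mean, stdev

# Assumed point values per response; the table does not state them.
POINTS = {"correct": 2, "partial": 1, "incorrect": 0}


def score_summary(correct, partial, incorrect):
    """Return (mean, sample SD) of per-response scores rebuilt from counts."""
    scores = (
        [POINTS["correct"]] * correct
        + [POINTS["partial"]] * partial
        + [POINTS["incorrect"]] * incorrect
    )
    return mean(scores), stdev(scores)


# Counts copied from the Total rows of the table above.
chatgpt_mean, chatgpt_sd = score_summary(36, 19, 81)
gemini_mean, gemini_sd = score_summary(96, 6, 34)

print(f"ChatGPT-4: mean={chatgpt_mean:.2f}, SD={chatgpt_sd:.2f}")
print(f"Gemini:    mean={gemini_mean:.2f}, SD={gemini_sd:.2f}")
```

Under these assumptions, a row in which every response is correct has a mean of 2 and a standard deviation of 0, which matches the Gemini rows for the Eaton and Herbert and Fisher classifications.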
Value | ChatGPT | Gemini |
---|---|---|
Sensitivity | 0.980 | 0.888 |
Specificity | 0.684 | 0.947 |
Positive Predictive Value (PPV) | 0.889 | 0.978 |
Negative Predictive Value (NPV) | 0.929 | 0.766 |
Positive Likelihood Ratio (LR+) | 3.102 | 16.867 |
Negative Likelihood Ratio (LR−) | 0.030 | 0.118 |
Accuracy | 0.897 | 0.904 |
F1 score | 0.932 | 0.930 |
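Every value in this table is a function of the four cells of a 2x2 confusion matrix. The sketch below shows the standard formulas. The confusion-matrix counts are not given in this section, so the counts used here are hypothetical and were chosen only because they round to the reported values. The function name `diagnostic_metrics` is likewise illustrative.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic accuracy metrics from 2x2 counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts (not from the source), consistent with the table.
chatgpt = diagnostic_metrics(tp=96, fp=12, tn=26, fn=2)
gemini = diagnostic_metrics(tp=87, fp=2, tn=36, fn=11)

for name, metrics in (("ChatGPT", chatgpt), ("Gemini", gemini)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

The formulas make the trade-off in the table explicit: ChatGPT's higher sensitivity comes with more false positives, which lowers its specificity and LR+, while Gemini's higher specificity drives its much larger LR+. Note that LR+ is undefined when specificity equals 1, so a production version would need to guard against division by zero.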
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pressman, S.M.; Borna, S.; Gomez-Cabello, C.A.; Haider, S.A.; Forte, A.J. AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries. J. Clin. Med. 2024, 13, 2832. https://doi.org/10.3390/jcm13102832