This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Open Access Article
Flexing ChatGPT-4o’s Diagnostic Muscle: Detection of Fractures in the Ossifying Pediatric Elbow on Radiographs
by Jonathan Kia-Sheng Phua and Timothy Shao Ern Tan *
Department of Diagnostic and Interventional Imaging, KK Women’s and Children’s Hospital, 100 Bukit Timah Road, Singapore 229899, Singapore
* Author to whom correspondence should be addressed.
Diagnostics 2025, 15(22), 2882; https://doi.org/10.3390/diagnostics15222882 (registering DOI)
Submission received: 25 September 2025 / Revised: 2 November 2025 / Accepted: 12 November 2025 / Published: 13 November 2025
Abstract
Background/Objectives: Elbow fractures are the most common injuries in children and are frequently evaluated with plain radiographs in the acute setting. As dedicated pediatric radiology services are not widely available, diagnosis of fractures could be delayed. Since 2023, ChatGPT-4 has offered image analysis capabilities, which have untapped potential for radiographic analysis. This study represents the first evaluation of ChatGPT-4o, a multimodal large language model, in interpreting pediatric elbow radiographs for fracture detection, thereby demonstrating its potential as a generalist AI tool distinct from domain-specific pediatric models. Methods: A curated set of 200 pediatric elbow radiographs (100 normal and 100 abnormal with at least one fracture site; 105 right elbow and 95 left elbow), acquired between October 2023 and March 2024 at a tertiary pediatric hospital, was analyzed in this case–control study. Each anonymized radiograph was evaluated by ChatGPT-4o via a standardized prompt. ChatGPT-4o’s prediction outputs (fracture vs. no fracture) were subsequently compared against verified radiology reports (ground truth). Diagnostic performance metrics, namely sensitivity, specificity, accuracy, positive predictive value (PPV), negative predictive value (NPV), and F1 score, were calculated. Results: ChatGPT-4o achieved an overall accuracy of 85% in detecting elbow fractures on pediatric radiographs, with a sensitivity of 87% and a specificity of 82%. The PPV and NPV were 83% and 86%, respectively. The F1 score was 0.85. ChatGPT-4o correctly identified the fracture site in 68 (78%) of the 87 studies in which it had correctly detected a fracture. Cohen’s kappa coefficient was 0.69, indicating substantial agreement with the ground-truth diagnoses. Conclusions: This study highlights the potential of ChatGPT-4o as a point-of-care tool to aid the detection of pediatric elbow fractures in emergency settings, particularly where specialist access is limited.
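The metrics reported in the abstract can be cross-checked with a short calculation. The sketch below is not the authors' analysis code. It assumes a 2×2 confusion matrix inferred from the abstract: with 100 fracture and 100 normal radiographs, a sensitivity of 87% and a specificity of 82% imply 87 true positives, 13 false negatives, 82 true negatives, and 18 false positives. The names `ConfusionMatrix`, `diagnostic_metrics`, and `INFERRED_COUNTS` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # fracture present, model says fracture
    fn: int  # fracture present, model says no fracture
    tn: int  # no fracture, model says no fracture
    fp: int  # no fracture, model says fracture


def diagnostic_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Compute standard binary diagnostic metrics and Cohen's kappa."""
    n = cm.tp + cm.fn + cm.tn + cm.fp
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    accuracy = (cm.tp + cm.tn) / n
    ppv = cm.tp / (cm.tp + cm.fp)
    npv = cm.tn / (cm.tn + cm.fn)
    f1 = 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn)

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    expected = (
        (cm.tp + cm.fp) * (cm.tp + cm.fn) + (cm.tn + cm.fn) * (cm.tn + cm.fp)
    ) / n**2
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
        "kappa": kappa,
    }


# Assumption: counts inferred from the abstract, not taken from a published table.
INFERRED_COUNTS = ConfusionMatrix(tp=87, fn=13, tn=82, fp=18)

for name, value in diagnostic_metrics(INFERRED_COUNTS).items():
    print(f"{name}: {value:.3f}")
```

Under these inferred counts, the computed values round to the figures stated in the abstract, which suggests that the reported metrics are internally consistent. The fracture-site localization rate follows the same arithmetic: 68 of 87 is about 78%.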