Diagnostic Performance of ChatGPT-5 for Detecting Pediatric Pneumothorax on Chest Radiographs: A Multi-Prompt Evaluation
Abstract
1. Introduction
2. Materials and Methods
2.1. Ethical Statement and Informed Consent
2.2. Study Design and Settings
2.3. Study Participants
2.4. Data Collection
- ACCP definition: small pneumothorax, apex-to-cupola distance < 3 cm; large pneumothorax, distance ≥ 3 cm
- BTS definition: small pneumothorax, interpleural distance < 2 cm at the level of the hilum; large pneumothorax, distance ≥ 2 cm
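
Both definitions above reduce to a cutoff applied to one measured distance. The following is a minimal sketch, assuming the relevant distance has already been measured in centimetres on the radiograph. The function name `classify_size` and the string labels are illustrative and are not taken from the study.

```python
def classify_size(distance_cm: float, system: str) -> str:
    """Label a pneumothorax as 'small' or 'large' under ACCP or BTS criteria.

    ACCP: apex-to-cupola distance, cutoff 3 cm.
    BTS: interpleural distance at the level of the hilum, cutoff 2 cm.
    The caller is responsible for passing the distance that matches the system.
    """
    cutoffs_cm = {"ACCP": 3.0, "BTS": 2.0}  # thresholds as stated in the text
    if system not in cutoffs_cm:
        raise ValueError(f"unknown system: {system!r}")
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    # A distance exactly at the cutoff counts as large (>= in both definitions).
    return "small" if distance_cm < cutoffs_cm[system] else "large"
```

Because the two systems measure different distances, the same radiograph can receive a different label under each system.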
2.5. Data Input for ChatGPT-5
- Prompt A (instructional): “Whether pneumothorax is present.”
- Prompt B (role-based): Prompt A preceded by “You are an experienced pediatric radiologist.”
- Prompt C (clinical context): Prompt A preceded by “You are an experienced pediatric radiologist working in an emergency department.”
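
The three prompts share one base instruction and differ only in the sentence placed before it. The sketch below shows one way to assemble them. Joining the prefix and the base prompt with a single space is an assumption, since the exact formatting submitted to the model is not stated here, and the names `BASE_PROMPT`, `ROLE`, `ROLE_WITH_CONTEXT`, and `build_prompts` are hypothetical.

```python
# Prompt texts quoted from the list above; the variable names are illustrative.
BASE_PROMPT = "Whether pneumothorax is present."
ROLE = "You are an experienced pediatric radiologist."
ROLE_WITH_CONTEXT = (
    "You are an experienced pediatric radiologist "
    "working in an emergency department."
)


def build_prompts() -> dict[str, str]:
    """Return the three prompt variants keyed by their letter."""
    return {
        "A": BASE_PROMPT,                           # instructional
        "B": f"{ROLE} {BASE_PROMPT}",               # role-based
        "C": f"{ROLE_WITH_CONTEXT} {BASE_PROMPT}",  # clinical context
    }
```

Keeping the base instruction in one constant ensures that any difference in model output between prompts can only come from the prefix.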
2.6. Statistical Analyses
3. Results
3.1. Study Population
3.2. Overall Diagnostic Performance of ChatGPT-5
3.3. Impact of Pneumothorax Size on Sensitivity
4. Discussion
4.1. Summary of Principal Findings
4.2. Comparison with Prior Multimodal LLM Studies
4.3. Comparison with Specialized AI Models
4.4. Toward Hybrid Integration of LLM and CNN Frameworks
4.5. Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| CNN | Convolutional Neural Network |
| LLM | Large Language Model |
| STROBE | Strengthening the Reporting of Observational Studies in Epidemiology |
| CXR | Chest X-ray |
| EMR | Electronic Medical Record |
| PACS | Picture Archiving and Communication System |
| ICD | International Classification of Diseases |
| PA | Posterior–anterior |
| IQR | Interquartile Range |
| ACCP | American College of Chest Physicians |
| BTS | British Thoracic Society |
| CI | Confidence Interval |
| AUROC | Area Under the Receiver Operating Characteristic Curve |
| FDA | Food and Drug Administration |
| PPV | Positive Predictive Value |
| NPV | Negative Predictive Value |



| Characteristic | Control (n = 190) | Pneumothorax (n = 190) | p-Value |
|---|---|---|---|
| Age, years | 16.8 (16.2–17.3) | 16.8 (16.0–17.5) | 0.41 † |
| Male, No. (%) | 170 (89.5) | 170 (89.5) | >0.99 ‡ |
| Pneumothorax Laterality, No. (%) | | | |
| Left | N/A | 127 (66.8) | |
| Right | N/A | 63 (33.2) | |
| Pneumothorax Size (ACCP), No. (%) | | | |
| Small (<3 cm) | N/A | 60 (31.6) | |
| Large (≥3 cm) | N/A | 130 (68.4) | |
| Pneumothorax Size (BTS), No. (%) | | | |
| Small (<2 cm) | N/A | 122 (64.2) | |
| Large (≥2 cm) | N/A | 68 (35.8) | |

| Prompt | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Conditional Side Accuracy * (95% CI) |
|---|---|---|---|---|
| A | 0.77 (0.72–0.81) | 0.57 (0.49–0.64) | 0.96 (0.93–0.99) | 0.98 (0.93–1.00) |
| B | 0.77 (0.73–0.81) | 0.57 (0.50–0.65) | 0.97 (0.93–0.99) | 0.96 (0.91–0.99) |
| C | 0.79 (0.75–0.83) | 0.61 (0.54–0.68) | 0.98 (0.95–0.99) | 0.97 (0.93–0.99) |
| A2 | 0.79 (0.74–0.83) | 0.61 (0.53–0.68) | 0.97 (0.93–0.99) | 0.97 (0.93–0.99) |
| A3 | 0.77 (0.73–0.81) | 0.58 (0.51–0.65) | 0.96 (0.92–0.98) | 0.98 (0.94–1.00) |
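
The accuracy, sensitivity, and specificity columns above are linked by simple arithmetic, which gives a quick consistency check on the table. The sketch below is illustrative only. It takes the group sizes (190 per group) from the characteristics table, uses the rounded rates as printed, and assumes a count of 108 detected cases for Prompt A (0.57 × 190 ≈ 108), which is not reported directly. The Wilson score interval is one common choice for a binomial proportion; the interval method the authors used is not stated in this section, so the bounds need not match the table exactly.

```python
import math


def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy as the case-weighted mean of sensitivity and specificity."""
    true_pos = sensitivity * n_pos   # expected correctly flagged pneumothorax cases
    true_neg = specificity * n_neg   # expected correctly cleared controls
    return (true_pos + true_neg) / (n_pos + n_neg)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Prompt A, using the rounded rates printed in the table.
acc_a = accuracy_from_rates(0.57, 0.96, n_pos=190, n_neg=190)

# Assumed count: 108 of 190 pneumothorax cases detected (not reported directly).
sens_lo, sens_hi = wilson_interval(108, 190)
```

With equal group sizes the accuracy is simply the average of sensitivity and specificity, so `acc_a` evaluates to about 0.765, in line with the reported 0.77 once rounding of the inputs is taken into account.
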

| Prompt | System | Size | Sensitivity (95% CI) |
|---|---|---|---|
| A | ACCP | Small | 0.18 (0.10–0.30) |
| A | ACCP | Large | 0.75 (0.66–0.82) |
| A | BTS | Small | 0.41 (0.33–0.50) |
| A | BTS | Large | 0.85 (0.75–0.93) |
| B | ACCP | Small | 0.18 (0.10–0.30) |
| B | ACCP | Large | 0.75 (0.67–0.83) |
| B | BTS | Small | 0.41 (0.32–0.50) |
| B | BTS | Large | 0.87 (0.76–0.94) |
| C | ACCP | Small | 0.22 (0.12–0.34) |
| C | ACCP | Large | 0.79 (0.71–0.86) |
| C | BTS | Small | 0.46 (0.37–0.55) |
| C | BTS | Large | 0.88 (0.78–0.95) |
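
The size-stratified sensitivities should recombine into the overall sensitivity when each stratum is weighted by its number of cases. The sketch below checks this for Prompt A, using the stratum sizes from the characteristics table (ACCP: 60 small, 130 large; BTS: 122 small, 68 large). The function name `pooled_sensitivity` is hypothetical, and the inputs are the rounded values as printed, so agreement is only expected to two decimal places.

```python
def pooled_sensitivity(strata):
    """Case-weighted overall sensitivity from (n_cases, sensitivity) pairs."""
    total_cases = sum(n for n, _ in strata)
    if total_cases == 0:
        raise ValueError("at least one stratum must contain cases")
    detected = sum(n * sens for n, sens in strata)  # expected detected cases
    return detected / total_cases


# Prompt A, ACCP strata: small (n = 60), large (n = 130).
accp_a = pooled_sensitivity([(60, 0.18), (130, 0.75)])

# Prompt A, BTS strata: small (n = 122), large (n = 68).
bts_a = pooled_sensitivity([(122, 0.41), (68, 0.85)])
```

Both values come out close to 0.57, the overall sensitivity reported for Prompt A, which suggests the two stratifications describe the same set of detected cases.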
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, C.-H.; Lin, P.-C.; Shih, S.-L.; Tsai, P.-S.; Huang, W.-H. Diagnostic Performance of ChatGPT-5 for Detecting Pediatric Pneumothorax on Chest Radiographs: A Multi-Prompt Evaluation. Diagnostics 2026, 16, 232. https://doi.org/10.3390/diagnostics16020232

