The Accuracy of ChatGPT-4o in Interpreting Chest and Abdominal X-Ray Images
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Data Collection
2.3. AI-Based Image Analysis
Statistical Analysis
3. Results
Diagnostic Accuracy
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| AI | Artificial Intelligence |
| LLM | Large Language Model |
| CXR | Chest X-ray |
| AXR | Abdominal X-ray |
| ED | Emergency Department |
| ICU | Intensive Care Unit |
| GPT-4V | Generative Pretrained Transformer 4 Vision |
| GPT-4o | Generative Pretrained Transformer 4 omni |
| CNN | Convolutional Neural Network |
| SVM | Support Vector Machine |
| CTA | Computed Tomography Angiography |
| BO | Bowel Obstruction |
| R/U/B | Renal/Ureter/Bladder |
| CM | Cardiomegaly |
| Med. | Mediastinum |
| Fx | Fracture |
| Eff | Effusion |
| IQR | Interquartile Range |
| N | Number (of cases) |
| SD | Standard Deviation |
| MACEs | Major Adverse Cardiovascular Events |
| ASCVD | Atherosclerotic Cardiovascular Disease |
| USMLE | United States Medical Licensing Examination |





| CXR Pathology | N | Detection Rate, n/N (%) | Confidence Score, Median (IQR) |
|---|---|---|---|
| Pneumonia | 50 | 37/50 (74%) | 3 (IQR 2) |
| Pulmonary edema | 30 | 30/30 (100%) | 3 (IQR 0.5) |
| Pleural effusion | 30 | 21/30 (70%) | 2.5 (IQR 1) |
| Lung tumors | 10 | 9/10 (90%) | 3.5 (IQR 0.75) |
| Emphysema | 11 | 9/11 (81.8%) | 4 (IQR 1) |
| Cardiomegaly | 44 | 32/44 (72.7%) | 3 (IQR 2) |
| Enlarged mediastinum | 11 | 6/11 (54.5%) | 3 (IQR 1.25) |
| Rib fracture | 20 | 0/20 (0%) | 3.5 (IQR 2) |
| Pneumothorax | 51 | 21/51 (41.2%) | 3 (IQR 1) |
| Total | 257 | 170/257 (66.15%); 95% CI: 60.16–71.66 | Median 4 (IQR 3); Mean 2.48 ± 1.45 |

| AXR Pathology | N | Detection Rate, n/N (%) | Confidence Score, Median (IQR) |
|---|---|---|---|
| Small/Large BO | 60 | 54/60 (90.9%) | 3.5 (IQR 2) |
| Pneumoperitoneum | 30 | 10/30 (33.3%) | 4 (IQR 0) |
| R/U/B calculi or gallstones | 73 | 43/73 (59.7%) | 4 (IQR 1) |
| Diverticulitis (Barium) | 40 | 27/40 (67.5%) | 4 (IQR 1) |
| Foreign bodies | 41 | 40/41 (97.6%) | 4 (IQR 0) |
| Total | 243 | 175/243 (72.02%); 95% CI: 66.06–77.28 | Median 4 (IQR 1); Mean 3.45 ± 1.1 |
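
As a quick check on the pooled figures above, the reported 95% confidence intervals (60.16–71.66% for 170/257 CXR cases; 66.06–77.28% for 175/243 AXR cases) are consistent with a Wilson score interval for a binomial proportion. The minimal Python sketch below reproduces them; the choice of the Wilson method and the use of statsmodels' `proportion_confint` are assumptions for illustration, not the authors' stated procedure.

```python
# Sketch reproducing the pooled detection-rate confidence intervals above.
# Assumption: a Wilson score interval was used (the method is not stated in
# this excerpt), computed here with statsmodels for illustration.
from statsmodels.stats.proportion import proportion_confint

pooled = {
    "CXR (all pathologies)": (170, 257),  # correctly detected / total cases
    "AXR (all pathologies)": (175, 243),
}

for label, (detected, n) in pooled.items():
    rate = detected / n
    lo, hi = proportion_confint(detected, n, alpha=0.05, method="wilson")
    print(f"{label}: {detected}/{n} = {rate:.2%} (95% CI {lo:.2%}-{hi:.2%})")
```

Running this prints 66.15% (95% CI 60.16–71.66%) for CXR and 72.02% (95% CI 66.06–77.28%) for AXR, matching the totals reported in the tables.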
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lacaita, P.G.; Galijasevic, M.; Swoboda, M.; Gruber, L.; Scharll, Y.; Barbieri, F.; Widmann, G.; Feuchtner, G.M. The Accuracy of ChatGPT-4o in Interpreting Chest and Abdominal X-Ray Images. J. Pers. Med. 2025, 15, 194. https://doi.org/10.3390/jpm15050194

