Evaluation of ChatGPT-5 for Automated ASPECTS Assessment on Non-Contrast CT in Acute Ischemic Stroke
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Design and Patient Selection
2.2. Imaging Protocol
2.3. ASPECTS Assessment by Human Readers
2.4. AI-Based ASPECTS Assessment Using ChatGPT-5
2.5. Clinical and Angiographic Data
2.6. Statistical Analysis
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AIS | Acute ischemic stroke |
| AI | Artificial intelligence |
| ASPECTS | Alberta Stroke Program Early CT Score |
| AUC | Area under the curve |
| DL | Deep learning |
| EIC | Early ischemic change |
| HT | Hemorrhagic transformation |
| ICC | Intraclass correlation coefficient |
| LLM | Large language model |
| mRS | Modified Rankin Scale |
| mTICI | Modified Thrombolysis in Cerebral Infarction |
| NCCT | Non-contrast computed tomography |
| OR | Odds ratio |
| RAG | Retrieval-augmented generation |
| ROC | Receiver operating characteristic |
| SD | Standard deviation |




| Variable | Value |
|---|---|
| Age, years (mean ± SD) | 70.2 ± 12.5 |
| Sex, male/female, n (%) | 101/98 (50.8/49.2) |
| Hypertension, n (%) | 135 (67.8) |
| Diabetes mellitus, n (%) | 68 (34.2) |
| Coronary artery disease, n (%) | 63 (31.7) |
| Hyperlipidemia, n (%) | 24 (12.1) |
| Atrial fibrillation, n (%) | 76 (38.2) |
| Intravenous thrombolysis, n (%) | 21 (10.6) |
| Side of stroke, right/left, n (%) | 111/88 (55.8/44.2) |
| Occlusion site, n (%) | |
| ICA | 28 (14.1) |
| ICA + MCA M1 | 24 (12.1) |
| ICA + MCA M2 | 3 (1.5) |
| MCA M1 | 129 (64.8) |
| MCA M2 | 14 (7.0) |
| MCA M2–M3 | 1 (0.5) |

| Comparison | ICC (95% CI) * | p-Value | Interpretation | Cohen’s κ † (95% CI) | Agreement Level |
|---|---|---|---|---|---|
| Radiologist vs. Neurologist | 0.854 (0.797–0.894) | <0.001 | Excellent | 0.83 (0.75–0.89) | Almost perfect |
| ChatGPT-5 vs. Consensus | 0.845 (0.792–0.884) | <0.001 | Excellent | 0.79 (0.71–0.86) | Substantial |
| ChatGPT-5 vs. Radiologist | 0.821 (0.765–0.868) | <0.001 | Good | 0.76 (0.67–0.84) | Substantial |
| ChatGPT-5 vs. Neurologist | 0.807 (0.745–0.856) | <0.001 | Good | 0.74 (0.65–0.82) | Substantial |
| Overall (three raters) | 0.451 (0.335–0.555) | <0.001 | Poor overall multi-rater consistency | — | — |
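
As a reading aid for the κ column, the following is a minimal sketch of how Cohen's κ between two readers can be computed and mapped to the descriptive labels of Landis and Koch, which are consistent with the agreement levels shown above. The source does not state whether κ was weighted or how scores were grouped, so the `weighted` option, the function names, and the example scores are illustrative assumptions, not the authors' procedure.

```python
from itertools import product


def cohen_kappa(rater_a, rater_b, categories, weighted=False):
    """Cohen's kappa for two raters scoring the same cases.

    categories: ordered list of every possible score (assumed 0-10 for ASPECTS).
    weighted=True applies linear weights, so near-misses on an ordinal
    scale are penalised less than distant disagreements.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of cases.")
    k, n = len(categories), len(rater_a)
    if k < 2:
        raise ValueError("At least two categories are required.")
    index = {c: i for i, c in enumerate(categories)}

    # Joint proportion table: rows are rater A's scores, columns rater B's.
    joint = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        joint[index[a]][index[b]] += 1.0 / n
    row = [sum(joint[i]) for i in range(k)]
    col = [sum(joint[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight for a pair of category positions.
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = list(product(range(k), repeat=2))
    observed = sum(penalty(i, j) * joint[i][j] for i, j in cells)
    expected = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0.0:
        raise ValueError("Kappa is undefined when chance disagreement is zero.")
    return 1.0 - observed / expected


def landis_koch(kappa):
    """Map kappa to the Landis and Koch descriptive label."""
    if kappa < 0:
        return "Poor"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical ASPECTS readings for ten scans (not study data).
reader_1 = [10, 9, 8, 7, 7, 6, 10, 9, 5, 8]
reader_2 = [10, 9, 7, 7, 8, 6, 10, 8, 5, 8]
kappa = cohen_kappa(reader_1, reader_2, list(range(11)), weighted=True)
print(round(kappa, 2), landis_koch(kappa))
```

Under these thresholds, a κ of 0.83 is labelled "Almost perfect" and values of 0.74 to 0.79 are labelled "Substantial", matching the last column of the table.
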

| Variable | ChatGPT-5 ASPECTS, OR (95% CI) | p-Value (ChatGPT-5) | Consensus ASPECTS, OR (95% CI) | p-Value (consensus) |
|---|---|---|---|---|
| ASPECTS (per 1-point increase) | 1.28 (1.09–1.52) | 0.004 | 1.31 (1.11–1.54) | 0.003 |
| Age (years) | 0.95 (0.92–0.98) | 0.002 | 0.95 (0.92–0.98) | 0.002 |
| TICI 2b–3 (successful reperfusion) | 2.65 (1.33–5.28) | 0.006 | 2.67 (1.35–5.31) | 0.005 |
| Onset-to-groin time (min) | 0.88 (0.74–1.04) | 0.12 | 0.87 (0.73–1.03) | 0.11 |
| Sex (male vs. female) | 1.12 (0.64–1.98) | 0.70 | 1.10 (0.63–1.96) | 0.72 |
| Diabetes mellitus | 0.93 (0.51–1.70) | 0.81 | 0.92 (0.50–1.68) | 0.82 |
| Hypertension | 0.87 (0.47–1.61) | 0.65 | 0.85 (0.46–1.58) | 0.66 |
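
To help interpret the odds ratios above, the sketch below back-calculates the log-odds coefficient, its standard error, and a Wald p-value from a reported OR and 95% CI. It assumes the interval has the usual form exp(β ± 1.96·SE), which the source does not confirm. The function names are hypothetical, and rounding of the published values makes the result approximate.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_from_or(odds_ratio, ci_low, ci_high):
    """Recover beta, SE, Wald z and a two-sided p-value from an OR and its 95% CI.

    Assumes the CI is symmetric on the log scale: exp(beta ± Z_95 * SE).
    """
    if odds_ratio <= 0 or not 0 < ci_low < ci_high:
        raise ValueError("Expected positive values with ci_low < ci_high.")
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


def or_for_increase(odds_ratio, units):
    """Odds ratio implied by a change of `units` in the predictor."""
    return odds_ratio ** units


# ChatGPT-5 ASPECTS row of the table: OR 1.28 (1.09-1.52) per 1-point increase.
beta, se, z, p = wald_from_or(1.28, 1.09, 1.52)
print(f"beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.4f}")
print(f"OR for a 2-point increase: {or_for_increase(1.28, 2):.2f}")
```

For the ChatGPT-5 ASPECTS row this gives a p-value of roughly 0.004, in line with the reported value. Because odds ratios multiply on the odds scale, a 2-point higher score corresponds to an OR of about 1.64 under the same model.
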

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Genez, S.; Özer, H.; Buz Yaşar, A.; Yılmazsoy, Y.; Soydan, T.; Sarıoğlu, A.E.; Ersoy, S. Evaluation of ChatGPT-5 for Automated ASPECTS Assessment on Non-Contrast CT in Acute Ischemic Stroke. Diagnostics 2025, 15, 3160. https://doi.org/10.3390/diagnostics15243160

