Variability of ChatGPT in Interpreting the Lexicon of ACR-TIRADS, EU-TIRADS, and K-TIRADS
Abstract
1. Introduction
2. Materials and Methods
2.1. Conduct of the Study
2.2. TIRADS Characteristics
2.3. Study Design
2.4. Ethics Approval
2.5. Statistical Analysis
3. Results
3.1. Study Series
3.2. ChatGPT Assessment of Cases According to the Three Systems
3.3. ChatGPT Assessment Crossing the Lexicons and the TIRADSs
3.4. Analysis of the Discordant Cases
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest



| Comparison | ACR-TIRADS Lexicon | EU-TIRADS Lexicon | K-TIRADS Lexicon |
|---|---|---|---|
| ACR vs. EU | 0.50 | 0.47 | 0.55 |
| ACR vs. K | 0.54 | 0.41 | 0.55 |
| K vs. EU | 0.48 | 0.43 | 0.60 |

| Lexicon | Descriptor | Cases | Discordances: Number | Discordances: ACR vs. EU | Discordances: ACR vs. K | Discordances: EU vs. K |
|---|---|---|---|---|---|---|
| ACR-TIRADS | Cystic or almost completely cystic | 128 | 163 | 52 | 54 | 57 |
| | Smooth | 128 | 147 | 50 | 51 | 46 |
| | Spongiform | 128 | 143 | 47 | 46 | 50 |
| | Anechoic | 128 | 138 | 48 | 47 | 43 |
| EU-TIRADS | Spongiform appearance | 320 | 459 | 158 | 140 | 161 |
| | Smooth margin | 320 | 372 | 124 | 130 | 118 |
| | Ill-defined margin | 320 | 371 | 101 | 133 | 137 |
| | Cystic | 320 | 364 | 113 | 131 | 120 |
| | Halo/rim | 320 | 357 | 121 | 123 | 113 |
| | Round | 400 | 442 | 139 | 152 | 151 |
| | Egg shell calcification | 400 | 429 | 139 | 148 | 142 |
| | Hyperechoic | 400 | 427 | 139 | 157 | 131 |
| | Isoechoic | 400 | 423 | 130 | 148 | 145 |
| K-TIRADS | Smooth/regular/circumscribed | 160 | 178 | 69 | 58 | 51 |
| | Spongiform/honeycomb | 96 | 105 | 33 | 41 | 31 |
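
In each row of the discordance table above, the "Number" column appears to equal the sum of the three pairwise columns (ACR vs. EU, ACR vs. K, EU vs. K). The following is a minimal Python sketch for checking that arithmetic when transcribing the table. The names `ROWS` and `check_totals` are illustrative and not from the article, and the tuple layout is an assumption made for this example.

```python
# Rows transcribed from the discordance table:
# (descriptor, cases, total discordances, ACR vs. EU, ACR vs. K, EU vs. K)
# The tuple layout is an assumption for this sketch, not the authors' format.
ROWS = [
    ("Cystic or almost completely cystic", 128, 163, 52, 54, 57),
    ("Smooth", 128, 147, 50, 51, 46),
    ("Spongiform", 128, 143, 47, 46, 50),
    ("Anechoic", 128, 138, 48, 47, 43),
    ("Spongiform appearance", 320, 459, 158, 140, 161),
    ("Smooth margin", 320, 372, 124, 130, 118),
    ("Ill-defined margin", 320, 371, 101, 133, 137),
    ("Cystic", 320, 364, 113, 131, 120),
    ("Halo/rim", 320, 357, 121, 123, 113),
    ("Round", 400, 442, 139, 152, 151),
    ("Egg shell calcification", 400, 429, 139, 148, 142),
    ("Hyperechoic", 400, 427, 139, 157, 131),
    ("Isoechoic", 400, 423, 130, 148, 145),
    ("Smooth/regular/circumscribed", 160, 178, 69, 58, 51),
    ("Spongiform/honeycomb", 96, 105, 33, 41, 31),
]


def check_totals(rows):
    """Return the rows whose reported total differs from the pairwise sum."""
    return [row for row in rows if row[2] != sum(row[3:6])]


mismatches = check_totals(ROWS)
print(f"{len(ROWS)} rows checked, {len(mismatches)} mismatches")
```

Run as written, this prints `15 rows checked, 0 mismatches`, which is consistent with reading the total as the sum of the pairwise discordances.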
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Trimboli, P.; Colombo, A.; Ruinelli, L.; Leoncini, A. Variability of ChatGPT in Interpreting the Lexicon of ACR-TIRADS, EU-TIRADS, and K-TIRADS. Diagnostics 2025, 15, 2694. https://doi.org/10.3390/diagnostics15212694

