Multicenter, Multinational, and Multivendor Validation of an Artificial Intelligence Application for Acute Cervical Spine Fracture Detection on CT
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Collection
2.2. AI Application
2.3. Reference Standard
2.4. Statistical Analysis
3. Results
3.1. Data Characteristics
3.2. Diagnostic Performance of AI Application
3.3. Subgroup Analysis According to Data Sources, Patient Age, and CT Scanner Manufacturers
3.4. Per-Bounding Box Analysis and Cervical Spinal Level Labeling Validation
3.5. Analysis of Discrepant Cases Between the First Two Radiologists
3.6. Analysis of False Positive and False Negative Cases
3.7. Per-Case Time-to-Notification of AI Application
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| AI | Artificial intelligence |
| AUC | Area under the receiver operating characteristic curve |
| CSFx | Cervical spine fractures |







| Characteristic | U.S. Teleradiology Company (n = 176) | U.S. University Hospital (n = 22) | French Teleradiology Company (n = 130) | p Value |
|---|---|---|---|---|
| Age (y) * | 63 ± 20 (19–80) | 52 ± 23 (22–90) | 48 ± 23 (18–101) | 0.02 |
| Age Group | | | | |
| 18–45 years | 39 (22.2) | 10 (45.5) | 64 (49.2) | <0.001 |
| 46–74 years | 56 (31.8) | 6 (27.3) | 45 (34.6) | 0.62 |
| ≥75 years | 81 (46.0) | 6 (27.3) | 21 (16.2) | 0.09 |
| Sex | | | | |
| Male | 108 (61.4) | 15 (68.2) | 72 (55.4) | 0.35 |
| Female | 56 (31.8) | 6 (27.3) | 57 (43.8) | 0.62 |
| NA | 12 (6.8) | 1 (4.5) | 1 (0.8) | <0.001 |
| Presence of Acute Cervical Spine Fractures | | | | |
| Positive Cases | 127 (72.2) | 22 (100) | 6 (4.6) | 0.002 |
| Negative Cases | 49 (27.8) | NA | 124 (95.4) | <0.001 |

| Manufacturer | U.S. Teleradiology Company (n = 176) | U.S. University Hospital (n = 22) | French Teleradiology Company (n = 130) | Total (n = 328) |
|---|---|---|---|---|
| GE | 73 | 0 | 29 | 102 |
| Philips | 16 | 22 | 33 | 71 |
| Siemens | 45 | 0 | 49 | 94 |
| Canon (Toshiba) | 41 | 0 | 19 | 60 |
| Fujifilm | 1 | 0 | 0 | 1 |

| Metric | U.S. Teleradiology Company | U.S. University Hospital | French Teleradiology Company | Overall | p Value |
|---|---|---|---|---|---|
| Sensitivity (%) | 90.6 (84.1, 95.0) [115/127] | 90.9 (70.8, 98.9) [20/22] | 83.3 (35.9, 99.6) [5/6] | 90.3 (84.5, 94.5) [140/155] | 0.84 |
| Specificity (%) | 89.8 (77.8, 96.6) [44/49] | NA | 92.7 (86.7, 96.6) [115/124] | 91.9 (86.8, 95.5) [159/173] | 0.53 * |
| Accuracy (%) | 90.3 (85.0, 94.3) [159/176] | 90.9 (70.8, 98.9) [20/22] | 92.3 (86.3, 96.2) [120/130] | 91.2 (87.5, 94.0) [299/328] | 0.83 |
| AUC | 0.90 (0.85, 0.94) | NA | 0.88 (0.812, 0.931) | 0.91 (0.87, 0.94) | 0.56 * |
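
Each percentage in the table above is a proportion of case counts with a 95% confidence interval. The reported intervals are consistent with exact (Clopper-Pearson) binomial intervals: 1/1 gives a lower bound of 2.5%, and 5/6 gives 35.9% to 99.6%. The Python sketch below reproduces that calculation with the standard library only. The use of the exact method is an assumption inferred from the reported values, and the function names (`binom_cdf`, `clopper_pearson`, `summarize`) are illustrative, not taken from the study. The usage block also shows that the Overall column pools case counts across sites rather than averaging site-level percentages.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve(f: Callable[[float], float], target: float, increasing: bool) -> float:
    """Bisection on [0, 1] for f(p) == target, assuming f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        # Keep the half-interval that still brackets the root.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) confidence interval for the proportion x/n."""
    # Lower bound: p at which P(X >= x) = alpha/2 (this tail grows with p).
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1.0 - binom_cdf(x - 1, n, p), alpha / 2, True)
    # Upper bound: p at which P(X <= x) = alpha/2 (this tail shrinks with p).
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def summarize(label: str, x: int, n: int) -> str:
    """Format as 'label: pct (lower, upper) [x/n]' with one decimal place."""
    lo, hi = clopper_pearson(x, n)
    return f"{label}: {100 * x / n:.1f} ({100 * lo:.1f}, {100 * hi:.1f}) [{x}/{n}]"


if __name__ == "__main__":
    # Overall column: counts pooled across the three data sources.
    tp, pos = 115 + 20 + 5, 127 + 22 + 6  # flagged fractures / fracture-positive cases
    tn, neg = 44 + 115, 49 + 124          # cleared cases / fracture-negative cases
    # (The U.S. University Hospital contributed no fracture-negative cases.)
    for label, x, n in [("Sensitivity", tp, pos), ("Specificity", tn, neg),
                        ("Accuracy", tp + tn, pos + neg)]:
        print(summarize(label, x, n))
```

Bisection is used instead of an inverse beta function to stay within the standard library. With SciPy available, the same bounds are usually obtained from beta-distribution quantiles.
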

| Metric | 18–45 Years | 46–74 Years | ≥75 Years | p Value |
|---|---|---|---|---|
| Sensitivity (%) | 90.2 (76.9, 97.3) [37/41] | 89.1 (76.4, 96.4) [41/46] | 91.2 (81.8, 96.7) [62/68] | 0.94 |
| Specificity (%) | 91.7 (82.7, 96.9) [66/72] | 95.1 (86.3, 99.0) [58/61] | 87.5 (73.2, 95.8) [35/40] | 0.78 |
| Accuracy (%) | 91.2 (84.3, 95.7) [103/113] | 92.5 (85.8, 96.7) [99/107] | 89.8 (82.5, 94.8) [97/108] | 0.98 |
| AUC | 0.91 (0.841, 0.955) | 0.92 (0.853, 0.964) | 0.893 (0.82, 0.945) | 0.82 |
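
The AUC values reported in these tables match the mean of sensitivity and specificity in every column. For the ≥75-year group, (62/68 + 35/40)/2 ≈ 0.893. This is what the Mann-Whitney form of the AUC reduces to when each case receives only a binary positive or negative call, because a tie between a positive and a negative case counts as one half. The tables do not state how AUC was computed, so the sketch below assumes such a binary output. The helper names (`binary_scores`, `balanced_accuracy`) are hypothetical. The sketch expands confusion-matrix counts into 0/1 scores and computes the AUC pairwise.

```python
from itertools import product
from typing import List, Tuple


def mann_whitney_auc(pos_scores: List[float], neg_scores: List[float]) -> float:
    """AUC = P(random positive scores above random negative), ties counted as 1/2."""
    wins = 0.0
    for s_pos, s_neg in product(pos_scores, neg_scores):
        if s_pos > s_neg:
            wins += 1.0
        elif s_pos == s_neg:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def binary_scores(tp: int, fn: int, tn: int, fp: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: expand confusion-matrix counts into 0/1 scores per case."""
    positives = [1.0] * tp + [0.0] * fn  # fracture cases: flagged (1) or missed (0)
    negatives = [0.0] * tn + [1.0] * fp  # non-fracture cases: cleared (0) or flagged (1)
    return positives, negatives


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


if __name__ == "__main__":
    # >=75-year group: 62 of 68 fracture cases flagged, 35 of 40 negatives cleared.
    counts = dict(tp=62, fn=6, tn=35, fp=5)
    auc = mann_whitney_auc(*binary_scores(**counts))
    print(f"AUC = {auc:.3f}, (Se + Sp)/2 = {balanced_accuracy(**counts):.3f}")
    # prints: AUC = 0.893, (Se + Sp)/2 = 0.893
```

Algebraically, with P = TP + FN and N = TN + FP, the pairwise count gives AUC = (TP·TN + ½(TP·FP + FN·TN)) / (P·N), which simplifies to (TP/P + TN/N)/2. A single operating point therefore carries no information beyond sensitivity and specificity.
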

| Manufacturer | Sensitivity (%) | Specificity (%) |
|---|---|---|
| GE | 90.4 (79.0, 96.8) [47/52] | 92.0 (80.8, 97.8) [46/50] |
| Philips | 91.2 (76.3, 98.1) [31/34] | 91.9 (78.1, 98.3) [34/37] |
| Siemens | 87.2 (72.6, 95.7) [34/39] | 90.9 (80.0, 97.0) [50/55] |
| Canon (Toshiba) | 93.3 (77.9, 99.2) [28/30] | 93.3 (77.9, 99.2) [28/30] |
| Fujifilm | 100.0 (2.5, 100.0) [1/1] | NA |
| p Value | 0.91 | 0.98 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sung, J.; Chang, P.D.; Ayobi, A.; Cotena, M.; Roca-Sogorb, M.; Jang, J.; Chow, D.S.; Chaibi, Y. Multicenter, Multinational, and Multivendor Validation of an Artificial Intelligence Application for Acute Cervical Spine Fracture Detection on CT. Diagnostics 2026, 16, 194. https://doi.org/10.3390/diagnostics16020194

