Transformer-Based Deep Learning for Multiplanar Cervical Spine MRI Interpretation: Comparison with Spine Surgeons and Radiologists
Abstract
1. Introduction
2. Materials and Methods
2.1. Model Training
2.2. Dataset Labelling
2.3. Demographics
2.4. Statistical Analysis
3. Results
3.1. Model Performance
3.2. Internal Testing
3.3. External Testing
4. Discussion
4.1. Model Performance and Generalizability
4.2. Inter-Rater Performance
4.3. Clinical Implications
4.4. Limitations and Future Work
4.5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| DCS | Degenerative cervical spondylosis |
| DLM | Deep learning model |
| MRI | Magnetic resonance imaging |
| ROI | Region of interest |
| FFN | Feed-forward network |
| ATSS | Adaptive Training Sample Selection |
| FCOS | Fully convolutional one-stage object detection |
| R-CNN | Region-based convolutional neural network |
| IoU | Intersection over union |
| STIR | Short-tau inversion recovery |
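IoU, listed above, is the usual overlap score between a predicted bounding box and a reference box in object detection. The following is a minimal sketch, assuming axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates; the box format and the function name `iou` are illustrative choices, not taken from the study's code.

```python
def iou(box_a: tuple[float, float, float, float],
        box_b: tuple[float, float, float, float]) -> float:
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle; clamped to 0 when the boxes are disjoint
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10 x 10 boxes offset by half their width overlap in a 5 x 10 strip:
    # 50 / (100 + 100 - 50) = 1/3
    print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```

Anchor-based detectors such as Faster R-CNN use IoU thresholds to assign training samples to objects, and ATSS sets that threshold adaptively for each object; the IoU criterion, if any, used to match DLM detections to reference ROIs in this study is not given in this excerpt.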




| Characteristics | Training/Validation and Internal Testing (n = 723) | Internal Testing (n = 75) | External Testing (n = 75) |
|---|---|---|---|
| Age, years: mean ± standard deviation (range) | 58 ± 13.7 (19–90) | 56 ± 14.3 (20–84) | 60 ± 13.2 (24–95) |
| Women | 319 (44%) | 40 (53%) | 17 (23%) |
| Indication for MRI | | | |
| Neck pain | 174 (35%) | 14 (19%) | 1 (1%) |
| Unilateral radiculopathy | 199 (39%) | 20 (27%) | 16 (21%) |
| Bilateral radiculopathy | 24 (5%) | 4 (5%) | 3 (4%) |
| Myelopathy | 296 (59%) | 35 (47%) | 51 (68%) |
| Others | 30 (6%) | 2 (3%) | 4 (5%) |


| Test Set | Grade | Axial Spinal Canal | Sagittal Spinal Canal | Axial Neural Foramina ¹ |
|---|---|---|---|---|
| Internal Testing | Grade 0 | 1642 (58.7%) | 877 (57.0%) | 627 (62.0%) |
| | Grade 1 | 757 (27.0%) | 436 (28.3%) | 186 (18.4%) |
| | Grade 2 | 216 (7.7%) | 142 (9.2%) | 199 (19.7%) |
| | Grade 3 | 180 (6.4%) | 83 (5.4%) | N/A |
| | Total | 2795 | 1538 | 1012 |
| | DLM Total (Recall) | 2777 (99.4%) | 1492 (95.7%) | 899 (88.8%) |
| External Testing | Grade 0 | 978 (39.4%) | 892 (47.4%) | 370 (41.4%) |
| | Grade 1 | 842 (33.9%) | 526 (27.9%) | 258 (28.9%) |
| | Grade 2 | 330 (13.3%) | 229 (12.2%) | 265 (29.7%) |
| | Grade 3 | 332 (13.4%) | 236 (12.5%) | N/A |
| | Total | 2482 | 1883 | 893 |
| | DLM Total (Recall) | 2477 (99.8%) | 1791 (95.1%) | 774 (86.7%) |

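The recall figures in the table above are the DLM Total divided by the Total for each region. A short check, assuming "DLM Total" counts the reference ROIs that the model detected; the dictionary `EXTERNAL` and the helper `recall` are hypothetical names, and the counts are copied from the external testing rows.

```python
# Hypothetical container for the external testing rows of the table above:
# region -> (reference ROIs, ROIs detected by the DLM)
EXTERNAL = {
    "Axial Spinal Canal": (2482, 2477),
    "Sagittal Spinal Canal": (1883, 1791),
    "Axial Neural Foramina": (893, 774),
}


def recall(detected: int, total: int) -> float:
    """Fraction of reference ROIs that were detected, i.e. TP / (TP + FN)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


if __name__ == "__main__":
    for region, (total, detected) in EXTERNAL.items():
        print(f"{region}: {100 * recall(detected, total):.1f}%")
    # Axial Spinal Canal: 99.8%
    # Sagittal Spinal Canal: 95.1%
    # Axial Neural Foramina: 86.7%
```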

| Rater | Grading | Axial Spinal Canal | p-Value Against DLM | Sagittal Spinal Canal | p-Value Against DLM | Axial Neural Foramina | p-Value Against DLM |
|---|---|---|---|---|---|---|---|
| Deep Learning Model | All-class | 0.80 (0.72–0.82) | N/A | 0.83 (0.81–0.85) | N/A | 0.81 (0.77–0.84) | N/A |
| | Dichotomous | 0.95 (0.93–0.96) | N/A | 0.95 (0.94–0.96) | N/A | 0.90 (0.88–0.93) | N/A |
| Spine Surgeon 1 | All-class | 0.65 (0.63–0.67) | <0.001 | 0.70 (0.67–0.73) | <0.001 | 0.70 (0.66–0.74) | <0.001 |
| | Dichotomous | 0.93 (0.92–0.94) | 0.037 | 0.92 (0.91–0.94) | 0.033 | 0.86 (0.83–0.88) | 0.002 |
| Spine Surgeon 2 | All-class | 0.60 (0.58–0.62) | <0.001 | 0.65 (0.62–0.68) | <0.001 | 0.65 (0.61–0.69) | <0.001 |
| | Dichotomous | 0.91 (0.90–0.93) | <0.001 | 0.93 (0.91–0.94) | 0.058 | 0.84 (0.81–0.86) | <0.001 |
| Neuroradiologist | All-class | 0.67 (0.65–0.69) | <0.001 | 0.70 (0.67–0.73) | <0.001 | 0.65 (0.61–0.68) | <0.001 |
| | Dichotomous | 0.90 (0.88–0.91) | <0.001 | 0.90 (0.88–0.92) | <0.001 | 0.82 (0.79–0.86) | <0.001 |
| Musculoskeletal Radiologist | All-class | 0.72 (0.70–0.74) | <0.001 | 0.70 (0.67–0.72) | <0.001 | 0.72 (0.69–0.76) | <0.001 |
| | Dichotomous | 0.92 (0.91–0.93) | 0.001 | 0.88 (0.86–0.90) | <0.001 | 0.87 (0.84–0.89) | 0.009 |
| Radiology Resident 1 | All-class | 0.67 (0.64–0.69) | <0.001 | 0.80 (0.78–0.82) | 0.041 | 0.72 (0.68–0.75) | <0.001 |
| | Dichotomous | 0.93 (0.91–0.94) | 0.017 | 0.94 (0.93–0.95) | 0.411 | 0.89 (0.86–0.91) | 0.333 |
| Radiology Resident 2 | All-class | 0.67 (0.65–0.69) | <0.001 | 0.68 (0.66–0.71) | <0.001 | 0.68 (0.64–0.71) | <0.001 |
| | Dichotomous | 0.90 (0.89–0.92) | <0.001 | 0.90 (0.88–0.92) | <0.001 | 0.84 (0.81–0.87) | <0.001 |

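This excerpt does not include the caption that names the agreement statistic in the table above. Purely as an illustration of why all-class and dichotomous values differ, the sketch below uses unweighted Cohen's kappa, one common chance-corrected agreement measure, and assumes that "dichotomous" means grades 0–1 versus grades 2–3. The study may have used a different statistic or cut-off, and the helpers `cohen_kappa` and `dichotomize` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each rater's marginal label frequencies
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)


def dichotomize(grades: Sequence[int], cutoff: int = 2) -> list[int]:
    """Collapse ordinal grades to 0 (below cutoff) or 1 (cutoff and above)."""
    return [int(g >= cutoff) for g in grades]


if __name__ == "__main__":
    reader_a = [0, 0, 1, 1, 2, 3]  # illustrative grades, not study data
    reader_b = [0, 1, 1, 2, 3, 3]
    print(round(cohen_kappa(reader_a, reader_b), 3))  # 0.333
    print(round(cohen_kappa(dichotomize(reader_a), dichotomize(reader_b)), 3))  # 0.667
```

In this toy example every all-class disagreement is between adjacent grades, and collapsing the grades removes two of the three disagreements, so kappa rises from 0.333 to 0.667. The same mechanism is one plausible reason why each rater's dichotomous value in the table exceeds the all-class value.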

| Region of Interest | Reader | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | AUROC (×100) |
|---|---|---|---|---|---|---|
| Axial Spinal Canal | Deep Learning Model | 86.8 | 97.4 | 84.5 | 97.8 | 92.1 |
| | Spine Surgeon 1 | 87.6 | 95.8 | 77.6 | 97.9 | 91.7 |
| | Spine Surgeon 2 | 89.1 | 94.5 | 72.6 | 98.1 | 91.8 |
| | Neuroradiologist | 93.9 | 92.5 | 67.4 | 98.9 | 93.2 |
| | Musculoskeletal Radiologist | 81.1 | 96.1 | 77.4 | 96.9 | 88.6 |
| | Radiology Resident 1 | 90.2 | 95.2 | 75.6 | 98.3 | 92.7 |
| | Radiology Resident 2 | 91.4 | 93.1 | 68.7 | 98.5 | 92.3 |
| Axial Neural Foramina | Deep Learning Model | 89.1 | 94.6 | 80.8 | 97.1 | 91.8 |
| | Spine Surgeon 1 | 55.3 | 97.8 | 85.9 | 89.9 | 76.5 |
| | Spine Surgeon 2 | 42.2 | 98.8 | 89.4 | 87.5 | 70.5 |
| | Neuroradiologist | 36.7 | 98.9 | 89.0 | 86.5 | 67.8 |
| | Musculoskeletal Radiologist | 53.3 | 99.1 | 93.8 | 89.7 | 76.2 |
| | Radiology Resident 1 | 76.9 | 96.1 | 82.7 | 94.4 | 86.5 |
| | Radiology Resident 2 | 43.7 | 99.0 | 91.6 | 87.8 | 71.4 |
| Sagittal Spinal Canal | Deep Learning Model | 85.0 | 97.9 | 86.7 | 97.6 | 91.5 |
| | Spine Surgeon 1 | 91.6 | 95.0 | 75.7 | 98.5 | 93.3 |
| | Spine Surgeon 2 | 96.0 | 94.4 | 74.7 | 99.3 | 95.2 |
| | Neuroradiologist | 93.3 | 92.6 | 68.4 | 98.8 | 93.0 |
| | Musculoskeletal Radiologist | 93.3 | 91.4 | 65.0 | 98.8 | 92.4 |
| | Radiology Resident 1 | 82.2 | 97.8 | 96.5 | 97.0 | 90.0 |
| | Radiology Resident 2 | 94.2 | 92.9 | 69.5 | 99.0 | 93.6 |

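Two relationships help when reading the table above. Within rounding, every AUROC value equals (sensitivity + specificity)/2, which is the area under an ROC curve with a single operating point, as produced by one binary call per ROI. PPV and NPV, unlike sensitivity and specificity, also depend on how common positive ROIs are. The sketch below shows both; the function names are hypothetical, and using the internal-test share of grade 2 and grade 3 axial canal ROIs as the prevalence is an assumption, since this excerpt does not state the cut-off or test set behind the table.

```python
def single_point_auroc(sens: float, spec: float) -> float:
    """AUROC of a binary call: trapezoidal area under (0,0)-(1-spec,sens)-(1,1).

    Left segment: (1 - spec) * sens / 2; right segment: spec * (sens + 1) / 2.
    The two sum to (sens + spec) / 2.
    """
    return (sens + spec) / 2


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and prevalence (Bayes' theorem)."""
    tp = sens * prevalence              # expected true-positive fraction
    fp = (1 - spec) * (1 - prevalence)  # expected false-positive fraction
    tn = spec * (1 - prevalence)        # expected true-negative fraction
    fn = (1 - sens) * prevalence        # expected false-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    # Deep learning model, axial spinal canal row of the table above
    sens, spec = 0.868, 0.974
    print(f"AUROC = {single_point_auroc(sens, spec):.3f}")  # AUROC = 0.921
    # Assumed prevalence: grade 2-3 share of internal axial canal ROIs, (216 + 180) / 2795
    ppv, npv = predictive_values(sens, spec, (216 + 180) / 2795)
    print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")  # PPV = 0.846, NPV = 0.978
```

Under this assumption the computed PPV and NPV are close to the tabulated 84.5% and 97.8%, and the small gap is within what rounding of the tabulated sensitivity and specificity can produce. Because PPV falls as prevalence falls, predictive values from one test set should not be carried over unchanged to a population with a different proportion of positive ROIs.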
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lee, A.; Wu, J.; Liu, C.; Makmur, A.; Ting, Y.H.; Lee, Y.J.; Ong, W.; Kuah, T.; Huang, J.; Ge, S.; et al. Transformer-Based Deep Learning for Multiplanar Cervical Spine MRI Interpretation: Comparison with Spine Surgeons and Radiologists. AI 2025, 6, 308. https://doi.org/10.3390/ai6120308
Chicago/Turabian Style: Lee, Aric, Junran Wu, Changshuo Liu, Andrew Makmur, Yong Han Ting, You Jun Lee, Wilson Ong, Tricia Kuah, Juncheng Huang, Shuliang Ge, et al. 2025. "Transformer-Based Deep Learning for Multiplanar Cervical Spine MRI Interpretation: Comparison with Spine Surgeons and Radiologists" AI 6, no. 12: 308. https://doi.org/10.3390/ai6120308
APA Style: Lee, A., Wu, J., Liu, C., Makmur, A., Ting, Y. H., Lee, Y. J., Ong, W., Kuah, T., Huang, J., Ge, S., Teo, A. Q. A., Beh, J. C. Y., Lim, D. S. W., Low, X. Z., Teo, E. C., Yap, Q. V., Lin, S., Tan, J. J. H., Kumar, N., ... Hallinan, J. T. P. D. (2025). Transformer-Based Deep Learning for Multiplanar Cervical Spine MRI Interpretation: Comparison with Spine Surgeons and Radiologists. AI, 6(12), 308. https://doi.org/10.3390/ai6120308

