Article

Improved Productivity Using Deep Learning-Assisted Major Coronal Curve Measurement on Scoliosis Radiographs

by Xi Zhen Low 1,*,†, Mohammad Shaheryar Furqan 2,†, Kian Wei Ng 3, Andrew Makmur 1,4, Desmond Shi Wei Lim 1, Tricia Kuah 1, Aric Lee 1, You Jun Lee 1, Ren Wei Liu 1, Shilin Wang 5, Hui Wen Natalie Tan 5, Si Jian Hui 5, Xinyi Lim 5, Dexter Seow 5, Yiong Huak Chan 6, Premila Hirubalan 7, Lakshmi Kumar 7, Jiong Hao Jonathan Tan 5, Leok-Lim Lau 5 and James Thomas Patrick Decourcy Hallinan 1,4
1 Department of Diagnostic Imaging, National University Hospital, National University Health System, Singapore 119074, Singapore
2 Division of Biomedical Informatics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 119228, Singapore
3 Academic Informatics Office, National University Health System, Singapore 117602, Singapore
4 Department of Diagnostic Radiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
5 Department of Orthopaedic Surgery, National University Hospital, National University Health System, Singapore 119074, Singapore
6 Biostatistics Unit, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117597, Singapore
7 Youth Preventive Health Service Division, Youth Preventive Service, Health Promotion Board, Singapore 168937, Singapore
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
AI 2025, 6(12), 318; https://doi.org/10.3390/ai6120318
Submission received: 23 August 2025 / Revised: 2 December 2025 / Accepted: 2 December 2025 / Published: 5 December 2025
(This article belongs to the Special Issue AI-Driven Innovations in Medical Computer Engineering and Healthcare)

Abstract

Background: Deep learning models have the potential to enable fast and consistent interpretations of scoliosis radiographs. This study aims to assess the impact of deep learning assistance on the speed and accuracy of clinicians in measuring major coronal curves on scoliosis radiographs. Methods: We utilized a deep learning model (Context Axial Reverse Attention Network, or CaraNet) to assist in measuring Cobb’s angles on scoliosis radiographs in a simulated clinical setting. Four trainee radiologists with no prior experience and four trainee orthopedists with four to six months of prior experience analyzed the radiographs retrospectively, both with and without deep learning assistance, using a six-week washout period. We recorded the interpretation time and mean angle differences, with a consultant spine surgeon providing the reference standard. The dataset consisted of 640 radiographs from 640 scoliosis patients, aged 10–18 years; we divided the dataset into 75% for training, 16% for validation, and 9% for testing. Results: Deep learning assistance yielded statistically non-significant improvements in mean accuracy of 0.32 degrees for trainee orthopedists (95% CI −1.4 to 0.8, p > 0.05) and 0.43 degrees for trainee radiologists (95% CI −1.6 to 0.8, p > 0.05), with non-inferiority across all readers. Mean interpretation time decreased by 13.25 s for trainee radiologists but increased by 3.85 s for trainee orthopedists (p = 0.005). Conclusions: Deep learning assistance for measuring Cobb’s angles was non-inferior in accuracy to unaided interpretation, with small, non-significant improvements in measurement accuracy. It increased the interpretation speed of trainee radiologists but slightly slowed trainee orthopedists, suggesting that its effect on speed depended on prior experience.

1. Introduction

Adolescent idiopathic scoliosis (AIS) has a global prevalence of 0.5–5.2% [1] and can lead to significant morbidity if left untreated [2]. Several countries have implemented national scoliosis screening programs to detect moderate AIS early in children and adolescents. Early detection of scoliosis enables the use of conservative treatments, such as bracing, to prevent curve progression and potentially reduce the need for surgical intervention [3]. In Singapore, more than 140,000 children are screened for AIS annually, of which approximately 8000 are identified for further radiographic evaluation [4].
Radiographic evaluation of suspected scoliosis is crucial for diagnosis and further management [5]. On a standing posteroanterior spinal radiograph, scoliosis is defined as a coronal curvature with a Cobb’s angle of 10 degrees or greater [6]. Figure 1 illustrates an example of the measurement of Cobb’s angle, which refers to the greatest angle between two lines passing through the superior and inferior endplates of two appropriately selected vertebrae. The severity of AIS is classified according to the angle of the major coronal curve, with curves of approximately 10–24°, 25–40° and >40° representing mild, moderate and severe scoliosis, respectively [7,8]. This classification is essential for prognostication and treatment planning [9,10].
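Purely as an illustrative aid (and not part of the study’s software), the severity thresholds above can be expressed as a small Python helper; the function name and exact boundary handling are assumptions for illustration.

def classify_scoliosis(cobb_angle_deg: float) -> str:
    """Map a major coronal Cobb's angle (in degrees) to a severity category.

    Thresholds follow the classification described above: ~10-24 degrees mild,
    25-40 degrees moderate, >40 degrees severe; below 10 degrees does not meet
    the definition of scoliosis. Boundary handling is an illustrative assumption.
    """
    if cobb_angle_deg < 10:
        return "below scoliosis threshold"
    if cobb_angle_deg < 25:
        return "mild"
    if cobb_angle_deg <= 40:
        return "moderate"
    return "severe"

print(classify_scoliosis(32))  # prints "moderate"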
Methods of measurement of the major coronal curve include manual (i.e., using printed radiographic films and Cobbometers) or computer-assisted techniques. Manual measurement performed by an experienced reader is the most widely used technique, but it requires significant technical expertise and may be vulnerable to inaccuracies, with varying intra- and inter-reader agreement [11]. In addition, manual measurement methods are time-consuming and labor-intensive, requiring dedicated reading sessions and potentially reducing the time available for physician-patient interaction. Computer-assisted techniques can facilitate semi-automated measurements of Cobb’s angle, but they require accurate input data in terms of vertebral endplate selection. The correct selection of vertebral endplates remains challenging, especially for inexperienced readers performing the initial screening [12].
Deep learning techniques have found applications in the management of various spinal diseases, such as the automated detection of spinal stenosis on magnetic resonance imaging (MRI) scans [13,14], prediction of adverse events after spinal surgery [15], and prognostication of walking ability after spinal cord injury [16,17]. Deep learning solutions, utilizing convolutional neural networks (CNNs), have also been developed for the assessment of scoliotic curves on radiographic images [18,19,20,21,22]. These solutions involve manual segmentation of the vertebrae for CNN training and incorporate various methods for deriving the major coronal curve, such as spline techniques [23]. Overall, these deep learning solutions show significant promise, and a recent study by Ha et al. (2022) [22] demonstrated a mean difference of 7.3° between major coronal curve angles derived by deep learning methods and those obtained by the interpretations of expert readers. This is within the reported range of inter-observer discrepancies (three to five degrees) in Cobb’s angle measurements performed by expert readers [22,24,25,26].
These studies appear to demonstrate the accuracy of deep learning techniques in the measurement of Cobb’s angle; however, the translational impact of these deep learning solutions, when incorporated into actual clinical practice workflows to augment physician capabilities, remains unclear. There is a relative scarcity of studies examining the actual clinical impact of deep learning-augmented diagnostics; recent deep learning scoliosis studies have largely focused on advancing the technical performance of existing analytical tools. In a recent systematic review by Goldman et al. (2024) [27], studies investigating the use of deep learning techniques for imaging analysis of scoliosis primarily focused on the prediction of Cobb angles, or the progression of adolescent idiopathic scoliosis. The review found that only a small proportion of studies investigated the use of artificial intelligence in clinical decision support, such as treatment management or diagnosis of AIS, and that most studies (up to 62.5%) did not report practical strategies or guidelines for the clinical implementation of their deep learning techniques. In another systematic review, Chan et al. (2024) [28] also reported that most existing machine learning models are not clinically deployable and require further external validation.
In this study, we utilized a deep learning model in a simulated clinical setting to assist physicians in the assessment and grading of scoliotic curves. This proof-of-concept study aims to investigate the impact of deep learning assistance on the interpretation times and accuracies of clinicians in the assessment of scoliosis radiographs. We hypothesized that deep learning assistance would reduce the interpretation times of scoliosis radiographs, especially for less experienced readers.

2. Materials and Methods

The study was conducted in accordance with the Declaration of Helsinki, and the local Institutional Review Board (IRB) of the National Healthcare Group (NHG), Singapore approved the study protocol (IRB approval number—2021/01084, date of approval: 17 February 2022). The IRB granted the study a waiver of consent due to the study’s retrospective nature and minimal risk involved.

2.1. Scoliosis Dataset

This study included all consecutive pediatric patients (10–18 years old) diagnosed with AIS, who presented at the National University Hospital, Singapore during the period of January 2018–January 2019 and who had available standing, full-length, posteroanterior spinal radiographs. The study excluded adult patients (aged 18 years or older), as well as patients with spinal instrumentation and other skeletal or neuromuscular disorders.
We retrospectively extracted the radiographs from the platforms of various vendors (e.g., EOS® Imaging System, Paris, France; AGFA, Mortsel, Belgium; and Fujifilm, Tokyo, Japan) and anonymized them for the purpose of this study. The selection of a wide range of radiographic images allowed the deep learning model to be trained to process images with varying levels of noise, brightness, and contrast, across a range of scoliotic curves.
We divided the dataset of scoliosis radiographs into 75% for training, 16% for validation and 9% for testing (480, 100 and 60 radiographs, respectively), a split consistent with previous studies [29,30].

2.2. Context Axial Reverse Attention Network (CaraNet) Object Detection Model

In this study, we used the Context Axial Reverse Attention Network (CaraNet) to perform vertebral segmentation. CaraNet has been applied in many segmentation tasks and has been shown to segment objects from images accurately, utilizing axial reverse attention and channel-wise feature pyramid modules to capture global and local feature information [31].

2.3. Deep Learning Model Development

Four experienced radiologists, including two musculoskeletal radiologists (J.T.P.D.H., with 12 years of experience, and D.S.W.L., with two years of experience) and two neuroradiologists (A.M., with seven years of experience, and X.Z.L., with four years of experience) manually labeled the radiographs in the training and validation sets.
Pre-processing ensured that the images were consistent in size and shape before model training. After pre-processing, the image dataset was divided into training, validation and test sets (480, 100 and 60 radiographs, respectively). Standard image augmentation techniques (e.g., flipping and rotation) increased the dataset size from 580 to more than 7000 images.
Each radiologist labeled at least 50 whole-spine posteroanterior radiographs using an annotation tool, Darwin (available at: https://darwin.v7labs.com/login, accessed on 1 May 2022) (V7, London, UK), performing manual segmentation of all corners of the vertebral bodies from C7 to L5 (or the last unfused vertebral body). Manual labeling generated approximately 36,550 object annotations, which were used to train the CaraNet model to identify the corners of each vertebra, with a Dice coefficient of 0.93 on the training data and 0.88 on the validation set. We trained the CaraNet model using the Adam optimizer and a batch size of 32, with a learning rate of 0.001 over 500 training epochs (a minimal sketch of this configuration is shown below). The high volume of medical imaging data necessitated the use of an institutional supercomputer to train the deep learning model; the total model training time was 130 h.
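As a rough illustration of the reported training configuration (Adam optimizer, batch size 32, learning rate 0.001, 500 epochs, Dice-based evaluation), the sketch below shows a minimal PyTorch-style training loop. The stand-in network, dummy data, and soft Dice loss are assumptions for illustration and do not reproduce the actual CaraNet architecture or training pipeline.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class TinySegNet(nn.Module):
    """Stand-in for CaraNet: any network mapping a radiograph to a 1-channel mask."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def dice_loss(logits, target, eps=1e-6):
    """Soft Dice loss for binary segmentation masks (illustrative)."""
    pred = torch.sigmoid(logits)
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return 1.0 - ((2.0 * inter + eps) / (union + eps)).mean()

# Dummy tensors standing in for the 512 x 512 radiographs and vertebral masks.
images = torch.rand(8, 1, 512, 512)
masks = (torch.rand(8, 1, 512, 512) > 0.5).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=32, shuffle=True)  # reported batch size 32

model = TinySegNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam optimizer, reported learning rate 0.001

for epoch in range(500):  # reported number of training epochs
    for x, y in loader:
        optimizer.zero_grad()
        loss = dice_loss(model(x), y)
        loss.backward()
        optimizer.step()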
The process above produced deep learning model masks of the whole spine (showing individual vertebrae), which were compared to the manual annotations. The model achieved a Dice similarity coefficient of 0.88 and an Intersection over Union (IOU) of 84% on validation, and a Dice coefficient of 0.619 and an IOU of 72% on testing. Using the masks, the deep learning model determined the center of each vertebra and extrapolated a polynomial curve along the centers. The spline technique, i.e., the exhaustive assessment of the maximum angles between vertebral center-line pairs to determine the largest angle, was the method of choice for the measurement of Cobb’s angle (a sketch of this computation follows below). Once the model had identified the correct points on the line, the reader could view the model prediction as extended lines and text in the top right-hand corner of the image. For physicians’ review and augmentation, the model (available at the following web server: https://radweb.sha.endeavour-poc.ai/, accessed on 1 May 2022) annotated the largest possible angle in the output images, as shown in Figure 2 and Figure 3. Appendix A provides more details on the deep learning model.
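The sketch below illustrates one way to implement the spline-based measurement described above: fit a polynomial through the vertebral centers, sample tangent slopes along the fitted curve, and exhaustively compare pairs of tangents to find the largest angle. The polynomial degree, sampling density, and toy data are assumptions for illustration, not the authors’ exact implementation.

import numpy as np

def cobb_angle_from_centers(centers_xy, degree=6, n_samples=200):
    """Estimate the major coronal curve angle from ordered vertebral center points.

    centers_xy: (N, 2) array of (x, y) vertebral centers, ordered cranio-caudally.
    Fits lateral deviation x as a polynomial of the vertical coordinate y, samples
    tangent slopes along the fitted curve, and exhaustively compares all pairs of
    tangents to find the largest angle. Degree and sampling density are
    illustrative choices.
    """
    centers_xy = np.asarray(centers_xy, dtype=float)
    x, y = centers_xy[:, 0], centers_xy[:, 1]

    coeffs = np.polyfit(y, x, degree)   # x as a function of vertical position y
    dcoeffs = np.polyder(coeffs)        # derivative dx/dy (tangent slope vs. vertical)

    ys = np.linspace(y.min(), y.max(), n_samples)
    tangent_angles = np.arctan(np.polyval(dcoeffs, ys))  # tangent angle relative to the vertical axis

    # The largest pairwise difference between tangent angles approximates the
    # Cobb's angle of the major curve.
    diffs = np.abs(tangent_angles[:, None] - tangent_angles[None, :])
    return float(np.degrees(diffs.max()))

# Toy example: a gently curved column of 17 vertebral centers.
ys_demo = np.linspace(0, 400, 17)
xs_demo = 30 * np.sin(ys_demo / 400 * np.pi)
print(round(cobb_angle_from_centers(np.column_stack([xs_demo, ys_demo])), 1))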

2.4. Reference Standard

For the held-out test set, a consultant orthopedic surgeon experienced in scoliosis assessment and surgery (J.H.J.T., with six years of experience) provided the reference standard in determining the major coronal curve angle and curve location for each radiograph, without access to the deep learning model.

2.5. Study Design

Eight residents performed the assessment of the major coronal curve with and without the assistance of the deep learning model on the held-out test set.
The readers comprised four trainee radiologists (R1–R4), and four trainee orthopedic surgeons (O1–O4). Each orthopedist possessed at least six months of experience working in a specialist scoliosis clinic and interpreting scoliosis radiographs, including the determination of major coronal curve angles. Therefore, the trainee orthopedists had an intermediate level of experience for major coronal curve assessment (intermediate readers), as compared to the trainee radiologists, who had little to no experience (novice readers).
The relationship between reader experience and performance on deep learning-assisted reads remains a relatively under-explored area in artificial intelligence research. Evidence from prior research is inconclusive, and the effects of deep learning augmentation on the performance of readers with varying levels of experience are inconsistent and heterogeneous [32]. Hence, another aim of this study is to quantify the impact of the experience level of observers on deep learning-assisted reads, through the comparison of orthopedic and radiology trainees.
Prior to the study, each reader utilized a training set of 10 radiographs for illustration and instruction on major coronal curve measurement. The readers performed manual measurements first, after which they accessed the deep learning model and the reference standard measurements for comparison.
The study divided the readers equally into two groups, each consisting of two trainee radiologists and two trainee orthopedists. First, Group A (R1, R2, O1 and O2) interpreted 60 scoliosis radiographs without deep learning assistance, while Group B (R3, R4, O3 and O4) performed the radiographic interpretations with assistance from the deep learning model. After a washout period of six weeks [33], we reshuffled the test set at random to prevent familiarization and reversed the utilization of the deep learning model: Group A interpreted the radiographs with deep learning assistance, and Group B performed the interpretations without. Figure 4 shows the study design, which aimed to minimize carryover effects [14]. The readers independently assessed the radiographs in a dedicated reading room with a high-resolution monitor (BARCO MDCC-6430, 6 MP, Belgium). This study recorded both the major coronal curve angle and the time taken for assessment of each radiograph. Interpretation time was the duration from the opening of the radiograph to the transcription of Cobb’s angle by the reader. The system automatically logged the start and end of each interpretation (as defined) using time stamps and calculated the elapsed time. We anonymized all images and blinded the readers to the clinical histories, patient characteristics and prior studies.

2.6. Radiographic Assessment

The readers performed manual measurements of the major coronal curve angle without assistance from the deep learning model using the digital angle tool on the Picture Archiving and Communication System (PACS) (Centricity, GE Healthcare, Chicago, IL, USA), with the results overlaid on the images. When interpreting the radiographs with assistance from the deep learning model, the readers accessed the automated major coronal curve calculations and annotations overlaid on the images (as illustrated in Figure 2 and Figure 3). The readers could accept the model predictions or perform manual calculations using the same digital tool as per their clinical judgment, especially if the center curve alignment did not match the image.

2.7. Statistical Analysis

A biostatistician (Y.H.C.) performed all statistical analyses using Stata (Version 16, StataCorp, College Station, TX, USA). The main variable of interest was the time required to measure the major coronal curve on each radiograph, comparing readings with and without support from the deep learning model.
Assuming a moderate effect size of 0.5, as per Cohen’s criteria, for the time difference (measured in seconds) between deep learning-assisted and unassisted reads, an a priori analysis showed that at least 40 scoliosis radiographs were needed to achieve a statistical power of 80% at a 5% two-tailed significance level.
We assessed differences in angle measurements and in the time required for major coronal curve measurement between deep learning-assisted and unassisted reads by comparing paired mean differences. This study utilized linear mixed model analysis to evaluate the time differences (sketched below), treating readers as a random effect to adjust for any sequence effects, and treating the use of deep learning assistance (i.e., unassisted vs. assisted reads) as a fixed effect. Normalization of the timing data with the Fisher–Yates (rank-based normal scores) transformation confirmed the validity of the linear mixed model p-values.
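The study’s analysis was performed in Stata; purely as an illustration of the model specification (reader as random intercept, deep learning assistance as fixed effect), a roughly equivalent linear mixed model could be fitted in Python with statsmodels as sketched below. The simulated values, effect sizes, and column names are assumptions for illustration only.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
readers = [f"R{i}" for i in range(1, 5)] + [f"O{i}" for i in range(1, 5)]

rows = []
for reader in readers:
    base = rng.uniform(10, 50)          # reader-specific baseline reading speed (simulated)
    for assisted in (0, 1):
        for _ in range(60):             # 60 test radiographs per condition
            rows.append({
                "reader": reader,
                "assisted": assisted,
                "time_s": base - 8 * assisted + rng.normal(0, 5),
            })
data = pd.DataFrame(rows)

# Linear mixed model: deep learning assistance as a fixed effect and reader as a
# random intercept, mirroring the model described above (the study itself used
# Stata; the simulated data here exist only to make the example runnable).
result = smf.mixedlm("time_s ~ assisted", data, groups=data["reader"]).fit()
print(result.summary())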
Standard statistical methods of analysis included the t-test, Mann–Whitney U test, and analysis of variance (ANOVA) for continuous variables. This study determined the means and standard deviations of continuous variables and presented their 95% confidence intervals. The definition of statistical significance was two-tailed p < 0.05.

3. Results

3.1. Patient Characteristics

This study excluded 160 patients out of 800 consecutive patients—40 adult patients, 100 patients with spinal instrumentation, and 20 patients with other skeletal or neuromuscular disorders. The study included a total of 640 patients with 640 posteroanterior scoliosis radiographs in the Digital Imaging and Communications in Medicine (DICOM) format.
The study divided the data randomly into the training, validation and test sets—580 patients with 580 posteroanterior spinal radiographs in the training and validation set, and 60 patients with 60 radiographs in the internal test set. Mean age of the 60 patients in the test set was 12.6 ± 2.0 (10–18) years. There were 43 (71.7%) female patients with a mean age of 11.8 ± 1.6 years (range 10–18 years) and 17 (28.3%) male patients with a mean age of 14.4 ± 1.8 years (range 12–18 years) in the internal test set. Figure 5 shows a flow chart of the study design. Table 1 shows the characteristics of the patients included in this study.

3.2. Scoliotic Curve Characteristics

Table 1 shows the reference standard major coronal curve angles for the 60 radiographs in the test set. Overall, major coronal curves of <25 degrees (mild scoliosis) accounted for 50.0% (30/60) of the cases. Moderate scoliosis (Cobb’s angle in the range of 25–40°) and severe scoliosis (Cobb’s angle >40°) accounted for 41.6% (25/60) and 8.3% (5/60) of cases, respectively.

3.3. Major Coronal Curve Accuracy

On the 60 test set radiographs, the deep learning model demonstrated a mean angle difference of 3.9 ± 4.9° (95% CI: −5.9 to 8.8°), when compared to the reference standard.
Table 2 shows the mean angle differences of the individual readers, with or without deep learning assistance. Mean angle differences of readers R1, R4 and O3 showed significant improvements with the use of deep learning assistance, as compared to without. R1 had a mean angle difference of −2.0 ± 4.2° (95% CI −3.1 to −1.0°) without deep learning assistance, compared to an improved mean angle difference of 1.4 ± 3.8° (95% CI 0.4 to 2.4°) after utilizing deep learning assistance (p < 0.001). R4 had a mean angle difference of −3.2 ± 4.1° (95% CI −4.2 to −2.1°) without deep learning assistance, compared to an improved mean angle difference of 1.0 ± 5.9° (95% CI −0.5 to 2.6°) after utilizing deep learning assistance (p < 0.001). Mean angle difference of O3 also improved from −2.4 ± 6.1° (95% CI −4.0 to −0.8°), without deep learning assistance, to −0.5 ± 3.3° (95% CI −1.4 to 0.3°), with deep learning assistance (p = 0.039). The remaining five readers (R2, R3, O1, O2 and O4) showed no significant improvements in mean angle differences after utilizing deep learning model assistance, when compared against the reference standard (p-values ranging from 0.31 to 0.94).
Overall, when comparing the trainee radiologists and the trainee orthopedists, there were no significant differences in mean angle differences with or without assistance from the deep learning model (Table 3). For unassisted reads, the mean angle differences ranged from −0.1 to −3.2° for the trainee radiologists, as compared to −0.5 to −2.4° for the trainee orthopedists. When assisted by the deep learning model, the mean angle differences ranged from −0.2 to −1.4° for the trainee radiologists, as compared to 0.5 to −0.8° for the trainee orthopedists.

3.4. Interpretation Time

Overall, the use of deep learning assistance improved the mean interpretation time per radiograph, with a mean improvement of 4.7 s across all readers; however, there were significant differences between the two groups of inexperienced and experienced readers (Table 4).
Deep learning assistance significantly reduced the interpretation times of all trainee radiologists, R1 to R4 (p < 0.001). The trainee radiologists achieved an average reduction of 13.3 s per radiograph. R2 showed the greatest improvement in mean interpretation time and achieved a reduction of 16.0 ± 10.5 s (95% CI −18.7 to −13.3) with the utilization of deep learning assistance (deep learning-assisted interpretation time of 13.0 ± 9.5 s, compared to an unassisted interpretation time of 29.0 ± 6.9 s) (Table 4); this represented a 55.2% saving in reading time (p < 0.001). R1 showed the smallest improvement in mean interpretation time and achieved a reduction of 10.4 ± 11.0 s (95% CI −13.2 to −7.6) with the use of deep learning assistance (deep learning-assisted interpretation time of 8.9 ± 6.4 s, compared to an unassisted interpretation time of 19.3 ± 10.1 s) (Table 4); this represented a 53.9% saving in reading time (p < 0.001). Figure 6 illustrates the impact of deep learning assistance on the mean interpretation times of individual readers.
As for the trainee orthopedists, only O1 had a small absolute time saving for deep learning-assisted reads, as compared to unassisted reads (Table 4). O1 showed a reduction in mean interpretation time of 3.9 ± 3.8 s (95% CI −4.9 to −2.9) (deep learning-assisted interpretation time of 5.8 ± 2.6 s, as compared to an unassisted interpretation time of 9.7 ± 2.8 s), which represented a 40.2% saving in reading time (p < 0.001). On the other hand, O3 showed an increase in mean interpretation time of 11.4 ± 43.4 s (95% CI 0.1 to 22.6) (deep learning-assisted interpretation time of 62.2 ± 35.9 s, as compared to an unassisted interpretation time of 50.8 ± 18.8 s), which represented a 22.4% mean time increase (p = 0.047). O2 and O4 had no significant differences in mean interpretation times between deep learning-assisted reads and unassisted reads.

4. Discussion

Radiographic evaluation of scoliosis and accurate measurement of the major coronal curve angle are essential for diagnosis and treatment planning [5]. In the clinic, manual measurement of Cobb’s angle is time-consuming, repetitive, and requires significant technical expertise (for accurate selection of the most angulated vertebral endplates). Deep learning techniques could potentially improve the efficiency and accuracy of major coronal curve angle measurements. This multi-reader study compared the accuracy and interpretation times of four trainee radiologists, without prior experience in the radiographic assessment of scoliosis (i.e., novice readers), to those of four trainee orthopedists with prior experience (i.e., intermediate readers), before and after the use of the deep learning model. Three of eight readers showed improvements in the accuracy of their major coronal curve angle measurements, as compared to a reference standard, after utilizing the deep learning model. This was greatest for R4—mean angle difference improved from −3.2° (95% CI −4.2 to −2.1°) to 1.0° (95% CI −0.5 to 2.6°) after using the deep learning model (p < 0.001). The remaining five readers showed no significant changes in mean angle differences after using the deep learning model (p-values ranging from 0.31 to 0.94). Overall, when comparing deep learning-assisted and unassisted major coronal curve assessments between the groups, the trainee radiologists had a mean time saving of 13.3 s (95% CI −19.6 to −6.9), compared to a slight time increase of 3.9 s (95% CI −2.9 to 10.6) for the trainee orthopedists (p = 0.005).
We believe that our study, serving as a proof of concept, has demonstrated the feasibility of the utilization of deep learning assistance by clinicians to augment their interpretations of scoliosis radiographs. Deep learning assistance has yielded non-inferior results in terms of accuracy, and has also resulted in statistically significant reductions in interpretation times for the group of less experienced readers. There is room for future studies to externally validate our study findings, and truly quantify the clinical utility of the deep learning model when it is applied in a variety of real-world, clinical practice settings.

4.1. Time Savings and Improved Diagnostic Efficiency

Prior studies have also demonstrated potential time savings by using deep learning models to perform repetitive tasks. Eng et al. [34] utilized a deep learning model for bone age assessment and achieved a reduced interpretation time of 102 s with deep learning assistance, as compared to an interpretation time of 140 s without deep learning assistance. Similarly, Lim et al. [14] were also able to achieve a reduction in mean interpretation time for detecting lumbar spinal stenosis on MRI scans by using a deep learning model, with up to 74% time savings for trainee radiologists (71 s with deep learning assistance, vs. 274 s without) and equivalent or superior inter-observer agreement for all stenosis gradings, as compared to radiologists who were not assisted by deep learning. Ahn et al. [35] also evaluated the impact of deep learning assistance on the interpretation of chest radiographs and the detection of four target findings (pneumonia, nodule, pneumothorax, and pleural effusion); the use of deep learning assistance achieved a 10% reduction in mean reporting time (36.9 s with deep learning assistance, compared to 40.8 s without). There was also an improvement in reader sensitivities, but no negative impact on specificities. In our study, the trainee radiologists had reduced interpretation times with deep learning assistance, with a mean time saved of up to 16 s, or 55.2% (13.0 s with deep learning assistance, compared to 29.0 s without). Overall, three readers (R1, R4 and O3) showed increased interpretation accuracy with deep learning assistance, with R4 having the greatest reduction in mean angle difference, from −3.2 to 1.0°. Although the mean angle difference of −3.2° (without deep learning assistance) was the largest in magnitude among all readers, it was still within the reported range of differences between expert readers (up to 10°); it was also smaller than the mean difference of our developed deep learning model (3.9°) and that of a previously published deep learning model by Ha et al. (7.3°) [22].
When comparing deep learning-assisted and unassisted major coronal curve measurements between the groups, the trainee radiologists had a mean time saving of 13.3 s (95% CI −19.6 to −6.9), compared to a slight time increase of 3.9 s (95% CI −2.9 to 10.6) for the trainee orthopedists (p = 0.005) (Table 3). We acknowledge that the effects are modest; however, they can accumulate over many cases, potentially translating into substantial savings in time and labor, when these deep learning techniques are formally incorporated into actual clinical workflows on an institutional or national level. In Singapore, more than 140,000 children are screened for adolescent idiopathic scoliosis annually, of which approximately 8000 are identified for further radiographic evaluation [4]. The scale of the national scoliosis screening program illustrates the necessity for enhanced productivity in the interpretation and reporting of scoliosis radiographs, a time-consuming and labor-intensive process. Increased automation of the radiographic interpretation process through the incorporation of deep learning techniques can potentially reduce the high workload and significant technical demands of radiographic analysis. Importantly, the readers also performed the measurements in an ideal environment with specialized PACS algorithms; we would expect greater time savings with the rudimentary manual measurements at the Health Promotion Board (the government agency of Singapore that conducts the national scoliosis screening program). A further prospective study will be required to validate this hypothesis.
Furthermore, there is room for further refinement of our deep learning model—with the inclusion of larger datasets and better model fitting, we can further improve and optimize the diagnostic accuracy of the model.

4.2. Clinical Impact and Cost-Effectiveness Thresholds

The thresholds of clinical significance of deep learning diagnostic techniques, and the cost-effectiveness of these diagnostics, remain an area of ongoing research. Future work should include cost-effectiveness analysis to fully elucidate the potential impact of integrating these deep learning techniques into actual clinical practice. In order to determine the thresholds of clinical significance or importance, future studies should balance the benefits of utilizing the deep learning model to augment existing diagnostic processes against the potential administrative, logistical and financial costs of implementation. Preliminary evidence from cost-effectiveness studies published by other authors on the utilization of artificial intelligence in other fields of radiology is encouraging. Prior research has shown that deep learning solutions may lead to substantial reductions in total operating costs, as compared to other capacity-enhancing methods [36]; the utilization of artificial intelligence to augment diagnostic tests can potentially result in significant cost savings and improve patient outcomes. Garg et al. (2025) [37] conducted a retrospective cost analysis on the use of deep learning-augmented chest radiography to detect active tuberculosis cases in Nigeria, and found that the use of chest radiographs and artificial intelligence in combination with symptom screening resulted in more cases detected at a lower cost per case, even after accounting for the implementation costs of artificial intelligence. Another study by Reginster et al. (2025) [38] examining the use of deep learning models for opportunistic screening of osteoporosis on chest radiographs found that the cost per quality-adjusted life year gained from screening was far below the threshold of cost-effectiveness.
Both of these studies demonstrated that the beneficial impact on patient outcomes, such as increased accuracy and efficiency of disease detection, earlier treatment initiation, and reduction in future complications, far outweigh the potential logistical and administrative costs of implementing these novel technologies.
Improved diagnostic processes will likely have a positive impact on the outcomes of patients with adolescent idiopathic scoliosis, by facilitating early treatment with conservative measures, such as bracing, to prevent curve progression, and reducing the need for costly surgical intervention. Studies of clinical utility should thus also take these factors into consideration, and, in this regard, longitudinal studies examining the long-term clinical impact of these deep learning-enhanced diagnostic processes would be of benefit, to allow for the accurate characterization of the thresholds of clinical importance and significance. Given the scale of the national scoliosis screening program in Singapore, where more than 140,000 children are screened for adolescent idiopathic scoliosis annually, we expect that the implementation of these deep learning techniques will result in significant time and cost savings. The public health impact of improved scoliosis diagnostics is also likely to be significant, given the prevalence and incidence of adolescent idiopathic scoliosis in Singapore. Furthermore, with the inclusion of larger datasets and better model fitting, we can further improve the accuracy and efficiency of our deep learning model, leading to greater time and cost savings.
In addition, the preliminary results of our proof-of-concept study demonstrated that the use of machine learning techniques may have a beneficial impact in the interpretations of scoliosis radiographs, by allowing clinicians to determine Cobb’s angle more efficiently and consistently. Beyond screening contexts, this also has direct implications on treatment planning, by allowing the accurate identification of patients who are at greatest risk of curve progression, and hence may require surgical intervention in future. A recent systematic review by Wong et al. (2022) has found strong and consistent evidence for Cobb’s angle to be one of the most significant predictive factors for curve progression [39].
The work of other researchers also demonstrated that artificial intelligence has an increasingly important role in other aspects of surgical planning, being utilized for various purposes such as the prognostication of surgical outcomes [40], prediction of compensatory postoperative changes in spinopelvic parameters [41], simulation of deformity correction [42] and the intraoperative insertion of spinal instrumentation (e.g., pedicle screws) [43].
Future work could build upon the preliminary results of this study, and investigate the use of the deep learning model and artificial intelligence techniques in clinical practice to guide treatment decisions. For example, Yahara et al. (2022) [44] developed a diagnostic tool using a deep convolutional neural network to predict the risk of curve progression and thus guide decision-making for therapeutic interventions in patients with adolescent idiopathic scoliosis.
We believe that the results from this study demonstrate the promise of deep learning-assisted diagnostics, which should be externally validated further in future studies.

4.3. Impact of Experience Level of Readers and Clinical Implications

In this study, the differences in interpretation times between the trainee radiologists and the trainee orthopedists could possibly be attributed to differences in experience and judgment [45]. The trainee radiologists were novice readers with no prior experience in the radiographic evaluation of scoliosis and may have been more likely to accept the deep learning model interpretations without confirmatory manual measurements. In comparison, the trainee orthopedists, possessing prior experience in the assessment of scoliosis radiographs, may have been more likely to rely on their manual measurements, rather than accept the deep learning model interpretations. Overall, despite the differences in prior experience and interpretation times, there were no significant differences in the accuracies of major coronal curve assessments between the trainee radiologists and trainee orthopedists, on both unassisted and deep learning-assisted reads (p > 0.05).
Previous studies have also explored the effects of the experience level of readers on the utilization of artificial intelligence. Current available evidence is relatively heterogeneous, but appears to weakly suggest an inverse relationship between the experience level of readers and the likelihood of acceptance of deep learning model predictions. Deep learning augmentation is also less likely to have a measurable impact on the performance of more experienced readers. In a study by Sung et al. (2021) [46] investigating the use of deep learning-based detection systems in the interpretation of chest radiographs, the non-radiology resident showed the greatest improvement in metrics such as area-under-the-curve analysis, as compared to experienced specialist thoracic radiologists, who showed the smallest increase, without significance, in per-lesion sensitivity. Similarly, in a study by Jang et al. (2020) [47], less experienced radiologists also contributed a higher-than-average incidence of report changes, when utilizing deep learning assistance to detect lung nodules on chest radiographs. Murata et al. (2019) [48] found that the use of computer-assisted diagnostics for the evaluation of maxillary sinusitis on panoramic radiographs was able to augment the performance of inexperienced readers to resemble that of experienced readers; however, experienced readers demonstrated high performance even without the deep learning system, and the use of computer-assisted diagnostics did not improve their performance.
This relationship (between reader experience and performance on deep learning-assisted reads) remains a relatively under-explored area in artificial intelligence research, but it has clinically relevant implications on the implementation of artificial intelligence in real-world practice. Based on current evidence from existing published studies, deep learning assistance may benefit inexperienced readers the most, and the implementation of deep learning-enhanced diagnostics in clinical settings that are lacking in resources [49], such as advanced imaging equipment or specialist physician expertise, would maximize their effectiveness. In these resource-scarce settings, artificial intelligence may have an integral role to play in enhancing the accuracy and efficiency of existing diagnostic tools, thus reducing the cognitive burden of physicians, and directly impacting patient outcomes. Future work should include more external validation studies to investigate the clinical utility of deep learning diagnostics, when deployed across a wide range of clinical settings and healthcare facilities.

4.4. Study Limitations

Our study has a few limitations. Firstly, the study focused on the evaluation of the major coronal curve and did not include the assessment of other features, such as secondary minor curves, or the Risser score for grading skeletal maturity.
Secondly, we used the spline technique of fitting a polynomial curve through the vertebral centers to develop the deep learning model. We chose the spline technique as it was superior to end-plate detection in the deep learning algorithm, especially on low-dose EOS radiographs such as those utilized at the Health Promotion Board of Singapore (which conducts Singapore’s national scoliosis screening program for children and adolescents). We postulate that the resolution for end-plate detection may be insufficient on low-dose radiographs, and that the spline technique, adopting the best fit curve, may reduce local precision, but increase the overall accuracy of major coronal curve measurement. This differs from human readers, who calculate the major coronal curve using the vertebral endplates, and may have led to reduced trust of the more experienced readers in the model interpretations.
Thirdly, as this was a retrospective study simulating a screening cohort of scoliosis radiographs, the study sample may have under-represented severe scoliosis cases, limiting the generalizability of the results. The performance of our deep learning model in the radiological assessment of severe scoliosis cases is uncertain, as it was largely trained on and fitted to mild or moderate cases—mirroring the cases encountered at a true screening center. Data from Singapore’s Health Promotion Board (HPB) confirmed the proportion of cases in the current study sample. In other clinical contexts, such as an outpatient scoliosis clinic, the proportion of severe scoliosis cases may be greater. With the inclusion of more severe cases, re-fitting of our model may be necessary. The potential for greater inter- or intra-reader variability may also have an impact on the reads of less experienced observers. Our current test set fulfilled the sample size requirement determined by a priori power analysis; nevertheless, a larger test set would be beneficial in further increasing statistical power. Future studies could include larger test sets to determine more accurately the impact of deep learning assistance.
Fourthly, the readers accessed only a small training set of 10 cases prior to the formal interpretations, and this may have reduced some of the readers’ trust in the deep learning model’s interpretations. The study primarily involved trainees, and the lack of a more diverse group of readers may limit the generalizability of the findings. Less experienced readers, who often encounter difficulties in selecting the correct vertebral endplates for Cobb angle measurement, would likely benefit most from integrating deep learning into clinical workflows, through improved interpretation time and accuracy. Future studies should include larger and more heterogeneous reader groups to allow subgroup analysis, including assessing how deep learning affects accuracy among experienced readers across mild, moderate, and severe scoliosis. Based on preliminary data and existing literature, we hypothesize that experienced observers may achieve greater accuracy gains for severe, complex cases compared to mild or moderate ones, a hypothesis that future studies should test.
Finally, only a single expert orthopedic surgeon provided the reference standard for major coronal curve angle measurements. As most cases were mild or moderate, the use of a single reader as the reference standard may be acceptable (and is often the case in a true clinical scenario), but an expert panel is needed for complex cases to address inter- and intra-reader variability in Cobb angle measurement. Seah et al. (2021) [50] employed seven thoracic radiologists to establish ground truth for chest radiographs, showing that the optimal number of readers depends on data complexity; nevertheless, consensus among expert panels remains essential to mitigate inter- and intra-reader variability [51,52,53]. The inclusion of experts from multiple specialties (e.g., radiology and orthopedic surgery) may reduce bias. Future validation of this deep learning model in severe scoliosis should therefore involve multidisciplinary expert panels and larger test sets to assess inter- and intra-observer reliability.
In summary, future studies are required to address the generalizability gap, externally validate the results of our study, and confirm the utility of the deep learning model, when applied in a variety of real-world contexts, settings and populations. Further work could include additional features in the deep learning model; more extensive training sessions could also be conducted for the readers to enhance their understanding of the deep learning model’s interpretations [54]. Future studies should also prospectively evaluate the use of this deep learning model across multiple scoliosis screening centers where unassisted digital and physical measurements of the Cobb’s angle are both performed. The complete automation of the interpretation process, with the removal of human involvement, represents the ultimate goal of the development of these deep learning techniques, which will require successive, incremental steps to be achieved.

5. Conclusions

This study has demonstrated potential time savings for trainee radiologists utilizing a deep learning model (CaraNet) for measuring major coronal curve angles on posteroanterior scoliosis radiographs. Overall, trainee orthopedists with prior experience in the interpretation of scoliosis radiographs showed no time savings using the CaraNet model and did not demonstrate improved reading accuracy, as compared to trainee radiologists. This study also yielded an observation of interest for behavioral and implementation science, highlighting differences in potential bias between expert and non-expert readers.

Author Contributions

Conceptualization, X.Z.L., M.S.F., J.T.P.D.H., A.M., J.H.J.T. and L.-L.L.; methodology, X.Z.L., M.S.F., J.T.P.D.H., A.M., J.H.J.T. and L.-L.L.; software, X.Z.L., M.S.F., J.T.P.D.H., A.M. and K.W.N.; validation, X.Z.L., M.S.F., J.T.P.D.H., A.M., J.H.J.T., L.-L.L. and K.W.N.; formal analysis, Y.H.C., X.Z.L. and M.S.F.; investigation, X.Z.L., M.S.F., K.W.N., A.M., D.S.W.L., T.K., A.L., Y.J.L., R.W.L., S.W., H.W.N.T., S.J.H., X.L., D.S., Y.H.C., P.H., L.K., J.H.J.T., L.-L.L. and J.T.P.D.H.; resources, X.Z.L., M.S.F., J.T.P.D.H., A.M., J.H.J.T., L.-L.L., K.W.N., P.H. and L.K.; data curation, X.Z.L., M.S.F., A.M., D.S.W.L., T.K., A.L., Y.J.L., R.W.L., H.W.N.T., S.J.H., X.L., D.S., J.H.J.T. and J.T.P.D.H.; writing— X.Z.L., M.S.F., J.T.P.D.H., J.H.J.T. and S.W.; writing—review and editing, X.Z.L., M.S.F., K.W.N., A.M., D.S.W.L., T.K., A.L., Y.J.L., R.W.L., S.W., H.W.N.T., S.J.H., X.L., D.S., Y.H.C., P.H., L.K., J.H.J.T., L.-L.L. and J.T.P.D.H.; visualization, X.Z.L., M.S.F., Y.H.C. and S.W.; supervision, X.Z.L., M.S.F., J.T.P.D.H., A.M. and J.H.J.T.; project administration, X.Z.L., M.S.F., J.T.P.D.H., A.M. and J.H.J.T.; funding acquisition, X.Z.L. and J.T.P.D.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was directly funded by MOH/NMRC, Singapore. Specifically, this study received support from the Singapore Ministry of Health National Medical Research Council under the NMRC Clinician Innovator Award (CIA). The grant was awarded for the project titled “From Prototype to Full Deployment: A Comprehensive Deep Learning Pipeline for Whole-Spine MRI” (Grant ID: CIAINV25jan-0005, J.T.P.D.H.). The research was also supported by the Population Health Research Grant-New Investigator Grant (PHRG-NIG), Grant Title: Deep learning application for Cobb’s angle assessment in Singapore (MOH-001653-00) (X.Z.L.).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the local Institutional Review Board (IRB) of the National Healthcare Group (NHG), Singapore (IRB approval number: 2021/01084, date of approval: 17 February 2022).

Informed Consent Statement

A waiver of consent was granted by the Institutional Review Board, in accordance with the Human Biomedical Research Act of Singapore, due to the study’s retrospective nature and minimal risk involved.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to confidentiality and ethical issues.

Conflicts of Interest

The authors declare no conflicts of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
AIS: Adolescent idiopathic scoliosis
ANOVA: Analysis of variance
CaraNet: Context Axial Reverse Attention Network
CI: Confidence interval
CNN: Convolutional neural networks
DICOM: Digital Imaging and Communications in Medicine
DL: Deep learning
MRI: Magnetic resonance imaging
PACS: Picture Archiving and Communication System
SD: Standard deviation

Appendix A

Appendix A.1. Deep Learning Model Development

The study utilized the Context Axial Reverse Attention Network (CaraNet) model to perform vertebral segmentation. CaraNet has been used in many applications and is found to be accurate in segmenting objects from images [31]. This appendix describes the main steps in the development of this deep learning model.
Prior to model training, image pre-processing allowed the dataset to be standardized and augmented. The standardization steps ensured that the resultant images were consistent in terms of size and shape. We centered and cropped the images to a standard size of 512 × 512, in JPEG format.
After pre-processing, the image dataset was divided into two parts: a training and validation set (580 images) and a test set (60 images). As can be seen from the dataset size, the number of annotated images was limited. Standard image augmentation techniques (e.g., flipping and rotation), as sketched below, increased the dataset size from 580 to more than 7000 images.
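A minimal sketch of the standardization and augmentation steps described above (consistent 512 × 512 inputs; flips and rotations), using torchvision transforms. The specific probability and rotation range are assumptions, and in practice the same geometric transform would need to be applied jointly to each image and its vertebral mask for segmentation training.

from torchvision import transforms

# Standardization: bring every radiograph to a consistent 512 x 512 input size.
standardize = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize(512),
    transforms.CenterCrop(512),
    transforms.ToTensor(),
])

# Augmentation: simple flips and small rotations, as mentioned above. The
# probability and rotation range here are illustrative assumptions.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(degrees=10),
])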
Following augmentation of the training dataset, we trained the CaraNet model using the Adam optimizer and a batch size of 32. The model learning rate was 0.001; the number of training epochs was 500. The high volume of medical imaging data necessitated the use of an institutional supercomputer to train the deep learning model. The total model training time was 130 h.
The process above produced deep learning model masks of the whole spine (showing individual vertebrae), which were compared to the manual annotations. The model achieved a Dice similarity coefficient of 0.88 and an Intersection over Union (IOU) of 84% on validation, and a Dice coefficient of 0.619 and an IOU of 72% on testing. Using the masks, the model determined the center of each vertebra and fitted a polynomial curve to calculate Cobb’s angle. The spline technique, i.e., the exhaustive assessment of the maximum angles between vertebral center-line pairs to determine the largest angle, was the method of choice for the measurement of Cobb’s angle. Once the model had identified the correct points on the line, the reader could view the model prediction as extended lines and text in the top right-hand corner of the image. Figure A1 summarizes the steps in the development of the deep learning model.
Figure A1. Deep learning model development from data engineering to model testing. Abbreviations: CARA-Net—Context Axial Reverse Attention Network, DICE—Dice similarity coefficient, DL—deep learning, IOU—intersection over union, JSON—JavaScript Object Notation.

References

  1. Dunn, J.; Henrikson, N.B.; Morrison, C.C.; Blasi, P.R.; Nguyen, M.; Lin, J.S. Screening for Adolescent Idiopathic Scoliosis: Evidence Report and Systematic Review for the US Preventive Services Task Force. JAMA 2018, 319, 173–187. [Google Scholar] [CrossRef]
  2. Jinnah, A.H.; Lynch, K.A.; Wood, T.R.; Hughes, M.S. Adolescent Idiopathic Scoliosis: Advances in Diagnosis and Management. Curr. Rev. Musculoskelet. Med. 2025, 18, 54–60. [Google Scholar] [CrossRef]
  3. Thomas, J.J.; Stans, A.A.; Milbrandt, T.A.; Treder, V.M.; Kremers, H.M.; Shaughnessy, W.J.; Larson, A.N. Does School Screening Affect Scoliosis Curve Magnitude at Presentation to a Pediatric Orthopedic Clinic? Spine Deform 2018, 6, 403–408. [Google Scholar] [CrossRef]
  4. Wong, H.K.; Hui, J.H.; Rajan, U.; Chia, H.P. Idiopathic scoliosis in Singapore schoolchildren: A prevalence study 15 years into the screening program. Spine 2005, 30, 1188–1196. [Google Scholar] [CrossRef]
  5. Addai, D.; Zarkos, J.; Bowey, A.J. Current concepts in the diagnosis and management of adolescent idiopathic scoliosis. Childs Nerv. Syst. 2020, 36, 1111–1119. [Google Scholar] [CrossRef]
  6. Cheng, J.C.; Castelein, R.M.; Chu, W.C.; Danielsson, A.J.; Dobbs, M.B.; Grivas, T.B.; Gurnett, C.A.; Luk, K.D.; Moreau, A.; Newton, P.O.; et al. Adolescent idiopathic scoliosis. Nat. Rev. Dis. Primers 2015, 1, 15030. [Google Scholar] [CrossRef]
  7. Oakley, P.A.; Ehsani, N.N.; Harrison, D.E. The Scoliosis Quandary: Are Radiation Exposures from Repeated X-Rays Harmful? Dose Response 2019, 17, 1559325819852810. [Google Scholar] [CrossRef]
  8. Oba, H.; Watanabe, K.; Asada, T.; Matsumura, A.; Sugawara, R.; Takahashi, S.; Ueda, H.; Suzuki, S.; Doi, T.; Takeuchi, T.; et al. Effects of Physiotherapeutic Scoliosis-Specific Exercise for Adolescent Idiopathic Scoliosis Cobb Angle: A Systematic Review. Spine Sur. Rel. Res. 2025, 9, 120–129. [Google Scholar] [CrossRef]
  9. Blevins, K.; Battenberg, A.; Beck, A. Management of Scoliosis. Adv. Pediatr. 2018, 65, 249–266. [Google Scholar] [CrossRef] [PubMed]
  10. Tambe, A.D.; Panikkar, S.J.; Millner, P.A.; Tsirikos, A.I. Current concepts in the surgical management of adolescent idiopathic scoliosis. Bone Jt. J. 2018, 100-b, 415–424. [Google Scholar] [CrossRef]
  11. Elfiky, T.; Patil, N.; Shawky, M.; Siam, A.; Ragab, R.; Allam, Y. Oxford Cobbometer Versus Computer Assisted-Software for Measurement of Cobb Angle in Adolescent Idiopathic Scoliosis. Neurospine 2020, 17, 304–311. [Google Scholar] [CrossRef]
  12. Sun, Y.; Xing, Y.; Zhao, Z.; Meng, X.; Xu, G.; Hai, Y. Comparison of manual versus automated measurement of Cobb angle in idiopathic scoliosis based on a deep learning keypoint detection technology. Eur. Spine J. 2022, 31, 1969–1978. [Google Scholar] [CrossRef]
  13. Hallinan, J.T.P.D.; Zhu, L.; Yang, K.; Makmur, A.; Algazwi, D.A.R.; Thian, Y.L.; Lau, S.; Choo, Y.S.; Eide, S.E.; Yap, Q.V.; et al. Deep Learning Model for Automated Detection and Classification of Central Canal, Lateral Recess, and Neural Foraminal Stenosis at Lumbar Spine MRI. Radiology 2021, 300, 130–138. [Google Scholar] [CrossRef] [PubMed]
  14. Lim, D.S.W.; Makmur, A.; Zhu, L.; Zhang, W.; Cheng, A.J.L.; Sia, D.S.Y.; Eide, S.E.; Ong, H.Y.; Jagmohan, P.; Tan, W.C.; et al. Improved Productivity Using Deep Learning–assisted Reporting for Lumbar Spine MRI. Radiology 2022, 305, 160–166. [Google Scholar] [CrossRef]
  15. Han, S.S.; Azad, T.D.; Suarez, P.A.; Ratliff, J.K. A machine learning approach for predictive models of adverse events following spine surgery. Spine J. 2019, 19, 1772–1781. [Google Scholar] [CrossRef]
  16. DeVries, Z.; Hoda, M.; Rivers, C.S.; Maher, A.; Wai, E.; Moravek, D.; Stratton, A.; Kingwell, S.; Fallah, N.; Paquet, J.; et al. Development of an unsupervised machine learning algorithm for the prognostication of walking ability in spinal cord injury patients. Spine J. 2020, 20, 213–224. [Google Scholar] [CrossRef]
  17. Azimi, P.; Yazdanian, T.; Benzel, E.C.; Aghaei, H.N.; Azhari, S.; Sadeghi, S.; Montazeri, A. A Review on the Use of Artificial Intelligence in Spinal Diseases. Asian Spine J. 2020, 14, 543–571. [Google Scholar] [CrossRef]
  18. Horng, M.H.; Kuok, C.P.; Fu, M.J.; Lin, C.J.; Sun, Y.N. Cobb Angle Measurement of Spine from X-Ray Images Using Convolutional Neural Network. Comput. Math. Methods Med. 2019, 2019, 6357171. [Google Scholar] [CrossRef]
  19. Wu, H.; Bailey, C.; Rasoulinejad, P.; Li, S. Automated comprehensive Adolescent Idiopathic Scoliosis assessment using MVC-Net. Med. Image Anal. 2018, 48, 1–11. [Google Scholar] [CrossRef] [PubMed]
  20. Zhang, K.; Xu, N.; Guo, C.; Wu, J. MPF-net: An effective framework for automated cobb angle estimation. Med. Image Anal. 2022, 75, 102277. [Google Scholar] [CrossRef]
  21. Caesarendra, W.; Rahmaniar, W.; Mathew, J.; Thien, A. Automated Cobb Angle Measurement for Adolescent Idiopathic Scoliosis Using Convolutional Neural Network. Diagnostics 2022, 12, 396. [Google Scholar] [CrossRef]
  22. Ha, A.Y.; Do, B.H.; Bartret, A.L.; Fang, C.X.; Hsiao, A.; Lutz, A.M.; Banerjee, I.; Riley, G.M.; Rubin, D.L.; Stevens, K.J.; et al. Automating Scoliosis Measurements in Radiographic Studies with Machine Learning: Comparing Artificial Intelligence and Clinical Reports. J. Digit. Imaging 2022, 35, 524–533. [Google Scholar] [CrossRef]
  23. Bernstein, P.; Metzler, J.; Weinzierl, M.; Seifert, C.; Kisel, W.; Wacker, M. Radiographic scoliosis angle estimation: Spline-based measurement reveals superior reliability compared to traditional COBB method. Eur. Spine J. 2021, 30, 676–685. [Google Scholar] [CrossRef]
  24. Prestigiacomo, F.G.; Hulsbosch, M.; Bruls, V.E.J.; Nieuwenhuis, J.J. Intra- and inter-observer reliability of Cobb angle measurements in patients with adolescent idiopathic scoliosis. Spine Deform. 2022, 10, 79–86. [Google Scholar] [CrossRef]
  25. Lucasti, C.; Haider, M.N.; Marshall, I.P.; Thomas, R.; Scott, M.M.; Ferrick, M.R. Inter- and intra-reliability of Cobb angle measurement in pediatric scoliosis using PACS (picture archiving and communication systems methods) for clinicians with various levels of experience. AME Surg. J. 2023, 3, 12. [Google Scholar] [CrossRef]
  26. Lechner, R.; Putzer, D.; Dammerer, D.; Liebensteiner, M.; Bach, C.; Thaler, M. Comparison of two- and three-dimensional measurement of the Cobb angle in scoliosis. Int. Orthop. 2017, 41, 957–962. [Google Scholar] [CrossRef] [PubMed]
  27. Goldman, S.N.; Hui, A.T.; Choi, S.; Mbamalu, E.K.; Tirabady, P.; Eleswarapu, A.S.; Gomez, J.A.; Alvandi, L.M.; Fornari, E.D. Applications of artificial intelligence for adolescent idiopathic scoliosis: Mapping the evidence. Spine Deform. 2024, 12, 1545–1570. [Google Scholar] [CrossRef] [PubMed]
  28. Chan, W.W.-Y.; Fu, S.-N.; Zheng, Y.-P.; Parent, E.C.; Cheung, J.P.Y.; Zheng, D.K.Y.; Wong, A.Y.L. A Systematic Review of Machine Learning Models for Predicting Curve Progression in Teenagers with Idiopathic Scoliosis. JOSPT Open 2024, 2, 202–224. [Google Scholar] [CrossRef]
  29. England, J.R.; Cheng, P.M. Artificial Intelligence for Medical Image Analysis: A Guide for Authors and Reviewers. AJR Am. J. Roentgenol. 2019, 212, 513–519. [Google Scholar] [CrossRef]
  30. Willemink, M.J.; Koszek, W.A.; Hardell, C.; Wu, J.; Fleischmann, D.; Harvey, H.; Folio, L.R.; Summers, R.M.; Rubin, D.L.; Lungren, M.P. Preparing Medical Imaging Data for Machine Learning. Radiology 2020, 295, 4–15. [Google Scholar] [CrossRef]
  31. Lou, A.; Guan, S.; Ko, H.; Loew, M.H. CaraNet: Context axial reverse attention network for segmentation of small medical objects. In Proceedings of Medical Imaging 2022: Image Processing; SPIE: Bellingham, WA, USA, 2022. [Google Scholar]
  32. Yu, F.; Moehring, A.; Banerjee, O.; Salz, T.; Agarwal, N.; Rajpurkar, P. Heterogeneity and predictors of the effects of AI assistance on radiologists. Nat. Med. 2024, 30, 837–849. [Google Scholar] [CrossRef] [PubMed]
  33. Park, A.; Chute, C.; Rajpurkar, P.; Lou, J.; Ball, R.L.; Shpanskaya, K.; Jabarkheel, R.; Kim, L.H.; McKenna, E.; Tseng, J.; et al. Deep Learning–Assisted Diagnosis of Cerebral Aneurysms Using the HeadXNet Model. JAMA Netw. Open 2019, 2, e195600. [Google Scholar] [CrossRef] [PubMed]
  34. Eng, D.K.; Khandwala, N.B.; Long, J.; Fefferman, N.R.; Lala, S.V.; Strubel, N.A.; Milla, S.S.; Filice, R.W.; Sharp, S.E.; Towbin, A.J.; et al. Artificial Intelligence Algorithm Improves Radiologist Performance in Skeletal Age Assessment: A Prospective Multicenter Randomized Controlled Trial. Radiology 2021, 301, 692–699. [Google Scholar] [CrossRef]
  35. Ahn, J.S.; Ebrahimian, S.; McDermott, S.; Lee, S.; Naccarato, L.; Di Capua, J.F.; Wu, M.Y.; Zhang, E.W.; Muse, V.; Miller, B.; et al. Association of Artificial Intelligence-Aided Chest Radiograph Interpretation with Reader Performance and Efficiency. JAMA Netw. Open 2022, 5, e2229289. [Google Scholar] [CrossRef] [PubMed]
  36. Brix, M.A.K.; Järvinen, J.; Bode, M.K.; Nevalainen, M.; Nikki, M.; Niinimäki, J.; Lammentausta, E. Financial impact of incorporating deep learning reconstruction into magnetic resonance imaging routine. Eur. J. Radiol. 2024, 175, 111434. [Google Scholar] [CrossRef]
  37. Garg, T.; John, S.; Abdulkarim, S.; Ahmed, A.D.; Kirubi, B.; Rahman, M.T.; Ubochioma, E.; Creswell, J. Implementation costs and cost-effectiveness of ultraportable chest X-ray with artificial intelligence in active case finding for tuberculosis in Nigeria. PLOS Digit. Health 2025, 4, e0000894. [Google Scholar] [CrossRef]
  38. Reginster, J.Y.; Schmidmaier, R.; Alokail, M.; Hiligsmann, M. Cost-effectiveness of opportunistic osteoporosis screening using chest radiographs with deep learning in Germany. Aging Clin. Exp. Res. 2025, 37, 149. [Google Scholar] [CrossRef]
  39. Wong, L.P.K.; Cheung, P.W.H.; Cheung, J.P.Y. Curve type, flexibility, correction, and rotation are predictors of curve progression in patients with adolescent idiopathic scoliosis undergoing conservative treatment: A systematic review. Bone Jt. J. 2022, 104-b, 424–432. [Google Scholar] [CrossRef]
  40. Benzakour, A.; Altsitzioglou, P.; Lemée, J.M.; Ahmad, A.; Mavrogenis, A.F.; Benzakour, T. Artificial intelligence in spine surgery. Int. Orthop. 2023, 47, 457–465. [Google Scholar] [CrossRef]
  41. Ngo, J.; Athreya, K.; Ehrlich, B.; Sayrs, L.; Morphew, T.; Halvorson, S.; Aminian, A. Use of machine learning predictive models in sagittal alignment planning in adolescent idiopathic scoliosis surgery. Eur. Spine J. 2025. [Google Scholar] [CrossRef]
  42. Tachi, H.; Kato, K.; Abe, Y.; Kokabu, T.; Yamada, K.; Iwasaki, N.; Sudo, H. Surgical Outcome Prediction Using a Four-Dimensional Planning Simulation System with Finite Element Analysis Incorporating Pre-bent Rods in Adolescent Idiopathic Scoliosis: Simulation for Spatiotemporal Anatomical Correction Technique. Front. Bioeng. Biotechnol. 2021, 9, 746902. [Google Scholar] [CrossRef]
  43. Zhang, H.; Huang, C.; Wang, D.; Li, K.; Han, X.; Chen, X.; Li, Z. Artificial Intelligence in Scoliosis: Current Applications and Future Directions. J. Clin. Med. 2023, 12, 7382. [Google Scholar] [CrossRef]
  44. Yahara, Y.; Tamura, M.; Seki, S.; Kondo, Y.; Makino, H.; Watanabe, K.; Kamei, K.; Futakawa, H.; Kawaguchi, Y. A deep convolutional neural network to predict the curve progression of adolescent idiopathic scoliosis: A pilot study. BMC Musculoskelet. Disord. 2022, 23, 610. [Google Scholar] [CrossRef] [PubMed]
  45. Rainey, C.; O’Regan, T.; Matthew, J.; Skelton, E.; Woznitza, N.; Chu, K.Y.; Goodman, S.; McConnell, J.; Hughes, C.; Bond, R.; et al. UK reporting radiographers’ perceptions of AI in radiographic image interpretation-Current perspectives and future developments. Radiography 2022, 28, 881–888. [Google Scholar] [CrossRef]
  46. Sung, J.; Park, S.; Lee, S.M.; Bae, W.; Park, B.; Jung, E.; Seo, J.B.; Jung, K.-H. Added Value of Deep Learning–based Detection System for Multiple Major Findings on Chest Radiographs: A Randomized Crossover Study. Radiology 2021, 299, 450–459. [Google Scholar] [CrossRef]
  47. Jang, S.; Song, H.; Shin, Y.J.; Kim, J.; Kim, J.; Lee, K.W.; Lee, S.S.; Lee, W.; Lee, S.; Lee, K.H. Deep Learning-based Automatic Detection Algorithm for Reducing Overlooked Lung Cancers on Chest Radiographs. Radiology 2020, 296, 652–661. [Google Scholar] [CrossRef] [PubMed]
  48. Murata, M.; Ariji, Y.; Ohashi, Y.; Kawai, T.; Fukuda, M.; Funakoshi, T.; Kise, Y.; Nozawa, M.; Katsumata, A.; Fujita, H.; et al. Deep-learning classification using convolutional neural network for evaluation of maxillary sinusitis on panoramic radiography. Oral Radiol. 2019, 35, 301–307. [Google Scholar] [CrossRef]
  49. Kim, J.H.; Han, S.G.; Cho, A.; Shin, H.J.; Baek, S.E. Effect of deep learning-based assistive technology use on chest radiograph interpretation by emergency department physicians: A prospective interventional simulation-based study. BMC Med. Inform. Decis. Mak. 2021, 21, 311. [Google Scholar] [CrossRef]
  50. Seah, J.C.Y.; Tang, C.H.M.; Buchlak, Q.D.; Holt, X.G.; Wardman, J.B.; Aimoldin, A.; Esmaili, N.; Ahmad, H.; Pham, H.; Lambert, J.F.; et al. Effect of a comprehensive deep-learning model on the accuracy of chest x-ray interpretation by radiologists: A retrospective, multireader multicase study. Lancet Digit. Health 2021, 3, e496–e506. [Google Scholar] [CrossRef]
  51. Hayashi, D.; Regnard, N.E.; Ventre, J.; Marty, V.; Clovis, L.; Lim, L.; Nitche, N.; Zhang, Z.; Tournier, A.; Ducarouge, A.; et al. Deep learning algorithm enables automated Cobb angle measurements with high accuracy. Skelet. Radiol. 2025, 54, 1469–1478. [Google Scholar] [CrossRef]
  52. Molière, S.; Hamzaoui, D.; Granger, B.; Montagne, S.; Allera, A.; Ezziane, M.; Luzurier, A.; Quint, R.; Kalai, M.; Ayache, N.; et al. Reference standard for the evaluation of automatic segmentation algorithms: Quantification of inter observer variability of manual delineation of prostate contour on MRI. Diagn. Interv. Imaging 2024, 105, 65–73. [Google Scholar] [CrossRef] [PubMed]
  53. van Eekelen, L.; Spronck, J.; Looijen-Salamon, M.; Vos, S.; Munari, E.; Girolami, I.; Eccher, A.; Acs, B.; Boyaci, C.; de Souza, G.S.; et al. Comparing deep learning and pathologist quantification of cell-level PD-L1 expression in non-small cell lung cancer whole-slide images. Sci. Rep. 2024, 14, 7136. [Google Scholar] [CrossRef]
  54. Filice, R.W.; Ratwani, R.M. The Case for User-Centered Artificial Intelligence in Radiology. Radiol. Artif. Intell. 2020, 2, e190095. [Google Scholar] [CrossRef] [PubMed]
Figure 1. An example of Cobb's angle measured by the deep learning model used in this study. The angle of the major coronal curve (i.e., Cobb's angle) is the greatest angle between two lines passing through the superior and inferior vertebral endplates. Proper vertebral endplate selection is crucial for accurate measurement of Cobb's angle but remains challenging for inexperienced readers.
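Figure 1 defines Cobb's angle geometrically as the largest angle between the superior and inferior endplate lines of the end vertebrae. The Python sketch below illustrates that geometry only; the endplate coordinates are hypothetical and the functions are not part of the study's software.

```python
import numpy as np

def line_angle_deg(p1, p2):
    """Inclination of the line through p1 and p2, in degrees from the horizontal."""
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    return np.degrees(np.arctan2(dy, dx))

def cobb_angle(upper_endplate, lower_endplate):
    """Acute angle between the superior endplate line of the upper end vertebra
    and the inferior endplate line of the lower end vertebra (Cobb's angle)."""
    diff = abs(line_angle_deg(*upper_endplate) - line_angle_deg(*lower_endplate)) % 180.0
    return min(diff, 180.0 - diff)

# Hypothetical endplate endpoints in image (pixel) coordinates.
upper = ((120.0, 310.0), (210.0, 290.0))  # superior endplate, upper end vertebra
lower = ((115.0, 520.0), (205.0, 560.0))  # inferior endplate, lower end vertebra
print(f"Cobb's angle: {cobb_angle(upper, lower):.1f} degrees")
```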
Figure 2. Posteroanterior whole-spine radiograph for scoliosis assessment (left) with the deep learning model predictions overlaid on the processed image (right). The deep learning model fitted a polynomial curve along the centers of the vertebrae and highlighted the predicted Cobb's angle for the reader at the point of maximum curvature. This is an example of a good model prediction, with less than three degrees of difference from the reference standard.
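Figure 2 shows the model fitting a polynomial curve through the vertebral centers and reporting the angle of the major curve. As an illustration of this curve-based approach (not the study's actual post-processing code), the sketch below fits a polynomial to synthetic vertebral-center coordinates and takes the largest difference between tangent inclinations as the curve estimate.

```python
import numpy as np

# Synthetic vertebral-center coordinates (pixels): y is craniocaudal position,
# x is lateral position, ordered from the most cranial to the most caudal vertebra.
y = np.linspace(100, 800, 17)
x = 400 + 60 * np.sin((y - 100) / 700 * np.pi)  # a single synthetic coronal curve

# Fit a polynomial x = f(y) through the vertebral centers, as in Figure 2.
coeffs = np.polyfit(y, x, deg=5)
slopes = np.polyval(np.polyder(coeffs), y)       # dx/dy at each vertebral level
angles = np.degrees(np.arctan(slopes))           # tangent inclination per level

# The curve-based estimate of the major coronal curve is the largest difference
# between tangent inclinations along the fitted curve.
print(f"Estimated major coronal curve: {angles.max() - angles.min():.1f} degrees")
```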
Figure 3. Posteroanterior whole-spine radiograph for scoliosis assessment (left) with the deep learning model polynomial curve (fitted to the centers of the vertebrae) and Cobb's angle prediction overlaid on the image (right). This is an example of a poor model prediction: the curve was not fitted to the inferior curve, resulting in more than 10 degrees of difference from the reference standard Cobb's angle.
Figure 4. Crossover study design. DL—deep learning, O—trainee orthopedist, R—trainee radiologist.
Figure 5. Flow chart of the study design. Training, validation, and testing of the deep learning model were performed on separate data. * Clinicians measured Cobb's angles with and without assistance from the deep learning model.
Figure 6. Impact of deep learning assistance on the mean interpretation times of individual readers. Abbreviations: DL—deep learning, O—trainee orthopedist, R—trainee radiologist.
Table 1. Patient demographics and scoliotic curve characteristics.
Characteristics | Test Set (n = 60)
Age (years), mean ± standard deviation (range) | 12.6 ± 2.0 (10–18)
Sex, n (%) |
    Female | 43 (71.7)
    Male | 17 (28.3)
Reference standard scoliosis grading (Cobb's angle), n (%) |
    Mild (10–24°) | 30 (50.0)
    Moderate (25–40°) | 25 (41.6)
    Severe (>40°) | 5 (8.3)
Table 2. Mean Cobb’s angle differences for each reader with and without assistance from the deep learning model.
Reader | DL Assistance | Mean Angle Difference (°) | 95% CI | F-Statistic (DL-Assisted vs. Unassisted) | p-Value (DL-Assisted vs. Unassisted)
DL | - | 3.9 | −5.9 to 8.8 | - | -
R1 | No | −2.0 | −3.1 to −1.0 | 21.65 | <0.001 *
R1 | Yes | 1.4 | 0.4 to 2.4 | |
R2 | No | 0.1 | −0.9 to 1.1 | 0.23 | 0.630
R2 | Yes | 0.5 | −0.5 to 1.4 | |
R3 | No | −0.1 | −1.0 to 0.8 | 0.17 | 0.900
R3 | Yes | −0.2 | −1.1 to 0.7 | |
R4 | No | −3.2 | −4.2 to −2.1 | 20.46 | <0.001 *
R4 | Yes | 1.0 | −0.5 to 2.6 | |
O1 | No | −0.5 | −1.6 to 0.6 | 1.03 | 0.310
O1 | Yes | 0.5 | −1.1 to 2.0 | |
O2 | No | −0.9 | −1.9 to 0.2 | 0.01 | 0.940
O2 | Yes | −0.8 | −1.7 to 0.1 | |
O3 | No | −2.4 | −4.0 to −0.8 | 4.37 | 0.039 *
O3 | Yes | −0.5 | −1.4 to 0.3 | |
O4 | No | 1.2 | 0.3 to 2.1 | 0.98 | 0.330
O4 | Yes | 0.5 | −0.5 to 1.6 | |
Footnote: paired differences; mixed model with reader as the random factor. Abbreviations: 95% CI—95% confidence interval, ANOVA—analysis of variance, DL—deep learning, O—trainee orthopedist, R—trainee radiologist. * Statistically significant.
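The footnote above describes a mixed model with reader as the random factor for comparing DL-assisted and unassisted measurements. A minimal sketch of such an analysis with statsmodels is shown below, using entirely synthetic data in a hypothetical long format (reader, case, assisted, angle_error); it is not the authors' actual model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
readers = [f"R{i}" for i in range(1, 5)] + [f"O{i}" for i in range(1, 5)]

# Synthetic long-format data: one row per reader, radiograph, and condition, with
# angle_error = reader measurement minus the reference-standard Cobb's angle.
rows = []
for reader in readers:
    for case in range(60):
        for assisted in (0, 1):
            rows.append({
                "reader": reader,
                "case": case,
                "assisted": assisted,
                "angle_error": rng.normal(loc=-1.0 + 0.7 * assisted, scale=2.0),
            })
df = pd.DataFrame(rows)

# Linear mixed model: fixed effect of DL assistance on the signed angle error,
# with a random intercept per reader (analogous in spirit to Table 2).
result = smf.mixedlm("angle_error ~ assisted", data=df, groups=df["reader"]).fit()
print(result.summary())
```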
Table 3. Differences in interpretation times and accuracies of trainee radiologists (novice readers) and trainee orthopedists (intermediate readers), between deep learning-assisted reads and unassisted reads.
Metric | Reader Group | Mean Difference | 95% CI | p-Value
Timing (s) 1 | Ortho | 3.9 | −2.9 to 10.6 | 0.005
Timing (s) 1 | Radio | −13.3 | −19.6 to −6.9 |
Accuracy (°) 2 | Ortho | −0.3 | −1.4 to 0.8 | >0.05
Accuracy (°) 2 | Radio | −0.4 | −1.6 to 0.8 |
Footnote: 1 Difference in mean individual timings (in seconds) between deep learning-assisted reads and unassisted reads. 2 Angle accuracy: differences in mean individual angles (in degrees) between deep learning-assisted reads and unassisted reads. Abbreviations: 95% CI—95% confidence interval, Ortho—trainee orthopedists, Radio—trainee radiologists.
Table 4. Individual mean Cobb's angle interpretation time (in seconds) per radiograph with and without assistance from the deep learning model.
Reader | DL Assistance | Interpretation Time (s), Mean ± SD | Mean Difference (s) | 95% CI | p-Value
R1 | No | 19.3 ± 10.1 | −10.4 | −13.2 to −7.6 | <0.001 *
R1 | Yes | 8.9 ± 6.4 | | |
R2 | No | 29.0 ± 6.9 | −16.0 | −18.7 to −13.3 | <0.001 *
R2 | Yes | 13.0 ± 9.5 | | |
R3 | No | 39.1 ± 12.9 | −11.5 | −16.3 to −6.8 | <0.001 *
R3 | Yes | 27.6 ± 14.8 | | |
R4 | No | 30.4 ± 13.3 | −15.1 | −21.2 to −9.0 | <0.001 *
R4 | Yes | 15.3 ± 18.3 | | |
O1 | No | 9.7 ± 2.8 | −3.9 | −4.9 to −2.9 | <0.001 *
O1 | Yes | 5.8 ± 2.6 | | |
O2 | No | 37.9 ± 13.4 | 4.9 | −2.4 to 12.2 | 0.186
O2 | Yes | 42.8 ± 25.9 | | |
O3 | No | 50.8 ± 18.8 | 11.4 | 0.1 to 22.6 | 0.047 *
O3 | Yes | 62.2 ± 35.9 | | |
O4 | No | 26.9 ± 11.8 | −2.1 | −5.4 to 1.3 | 0.220
O4 | Yes | 24.9 ± 6.9 | | |
Footnote—paired differences. Abbreviations: 95% CI—95% confidence interval, DL—deep learning, O—trainee orthopedist, R—trainee radiologist, SD—standard deviation. * Statistically significant.
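Table 4 reports paired per-reader differences in interpretation time between DL-assisted and unassisted reads. The sketch below shows a paired comparison of this kind for a single reader using SciPy; the timing data are synthetic and the effect size is illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic per-radiograph interpretation times (seconds) for one reader,
# paired by radiograph: unassisted versus DL-assisted.
unassisted = rng.normal(loc=30.0, scale=10.0, size=60).clip(min=5.0)
assisted = (unassisted - 13.0 + rng.normal(scale=8.0, size=60)).clip(min=3.0)

# Paired comparison analogous to the per-reader rows of Table 4.
diff = assisted - unassisted
t_stat, p_value = stats.ttest_rel(assisted, unassisted)
ci_low, ci_high = stats.t.interval(0.95, len(diff) - 1,
                                   loc=diff.mean(), scale=stats.sem(diff))
print(f"Mean difference: {diff.mean():.1f} s "
      f"(95% CI {ci_low:.1f} to {ci_high:.1f}), p = {p_value:.4f}")
```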
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
