Automated Cobb Angle Measurement for Adolescent Idiopathic Scoliosis Using Convolutional Neural Network

The Cobb angle measurement of the scoliotic spine is prone to inter- and intra-observer variation in the clinical setting. This paper proposes a deep learning architecture that detects spinal vertebrae in X-ray images to evaluate the Cobb angle automatically. The public AASCE MICCAI 2019 anterior-posterior X-ray image dataset and local images were used to train and test the proposed convolutional neural network architecture. Sixty-eight landmarks were detected in each input image to locate seventeen vertebrae. The vertebra locations obtained were then processed to measure the Cobb angle automatically. The proposed method measured the Cobb angle with an accuracy of up to 93.6% and showed excellent reliability compared with clinicians' measurements (intraclass correlation coefficient > 0.95). The proposed deep learning architecture may be used as a tool to augment Cobb angle measurement in X-ray images of patients with adolescent idiopathic scoliosis in a real-world clinical setting.


Introduction
Scoliosis is a structural abnormality in which the spine curves from side to side and rotates. Children aged 10 to 17 years who present with scoliosis of unknown cause are categorized as having Adolescent Idiopathic Scoliosis (AIS) [1]. Patients with mild deformities are usually asymptomatic. However, if the curvature progresses during the growth spurt, discomfort, pain, shoulder height asymmetry, and symptoms related to abnormal chest wall growth can reduce quality of life [2]. AIS can cause respiratory symptoms, such as shortness of breath, when the curvature exceeds 50°, and patients are at high risk of significant lung function abnormalities if the curvature exceeds 100° [3].
The Cobb method is the gold standard for quantifying the scoliotic curve. The Cobb Angle (CA) is measured between the most tilted vertebrae (end vertebrae) above and below the apex (the most laterally displaced vertebra) of the curve on coronal-plane radiographs taken in either the anterior-posterior or the posterior-anterior view [4]. In general, the manual procedure requires lines to be drawn on hard-copy radiographic films and the angle between the two lines to be measured with a protractor. As a result, CA measurement can be time-consuming and is prone to inter-observer and intra-observer variation. Reported CA measurement variability ranges from 2° to 11° [5][6][7], and measurements can differ by up to 5° even when the same end vertebrae are selected [2,8].
Semi-automatic CA assessment has become possible with the digitization of computed radiography. The Picture Archiving and Communications System (PACS) provides a built-in function that allows users to digitally draw lines for the required

Materials and Methods
This study used two datasets: an open-source dataset and a local dataset. A CNN was used for automated spine detection and CA measurement. A total of 551 X-ray images were evaluated: 481 for training and 70 for testing. For ease of calculation, the collected data were divided into four groups: (1) CA < 10°; (2) CA 10° to 25°; (3) CA > 25° to 40°; and (4) CA > 40°.
A detailed description of the datasets and the methods are presented in the following subsections.
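The four CA groups above amount to a simple binning rule. The Python sketch below is illustrative only: the function name `ca_category` is hypothetical, and assigning exactly 10° and 25° to group 2 and exactly 40° to group 3 is our reading of the stated ranges.

```python
def ca_category(ca_deg):
    """Map a Cobb angle in degrees to the study's four groups (1-4).

    Hypothetical helper. Boundaries follow the text:
    (1) < 10, (2) 10 to 25, (3) > 25 to 40, (4) > 40.
    """
    if ca_deg < 10:
        return 1
    if ca_deg <= 25:
        return 2
    if ca_deg <= 40:
        return 3
    return 4
```

For example, a 79° curve (the largest reported in this study) falls into group 4.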

Open-Source Datasets
The spinal images and their labels were obtained from the public AASCE MICCAI 2019 anterior-posterior X-ray image dataset [18]. The input images range in size from 359 × 973 to 1427 × 3755. Because the training images cover a wide range of conditions, including different noise levels, contrast, lighting conditions, and spines with high CA (Figure 1), some challenging images can be handled. Each image contains 17 vertebrae from the thoracic (upper spine) and lumbar (lower spine) regions. The input image resolution is set to 1024 × 512 for algorithm development. Each vertebra is located by four corner landmarks, and the dataset provides the ground truth for the 68 landmarks (17 vertebrae × 4 corners) in each image.
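To make the landmark layout concrete, the following Python sketch groups the 68 landmarks into 17 four-corner vertebrae, computes each vertebra's center, and rescales points to the 1024 × 512 input. It assumes the landmarks are stored as a flat list of (x, y) pairs with four consecutive points per vertebra, and that 1024 × 512 means height × width; these assumptions and all function names are hypothetical, not taken from the dataset documentation.

```python
def group_landmarks(points, per_vertebra=4):
    """Split a flat list of (x, y) landmarks into one 4-corner group per vertebra."""
    if len(points) % per_vertebra:
        raise ValueError("landmark count must be a multiple of 4")
    return [points[i:i + per_vertebra] for i in range(0, len(points), per_vertebra)]


def vertebra_centers(groups):
    """Center of each vertebra as the mean of its four corners."""
    return [
        (sum(x for x, _ in grp) / len(grp), sum(y for _, y in grp) / len(grp))
        for grp in groups
    ]


def rescale_landmarks(points, src_hw, dst_hw=(1024, 512)):
    """Rescale (x, y) points from an image of size src_hw to dst_hw, both (height, width)."""
    sy = dst_hw[0] / src_hw[0]  # vertical scale factor
    sx = dst_hw[1] / src_hw[1]  # horizontal scale factor
    return [(x * sx, y * sy) for x, y in points]
```

With 68 input points, `group_landmarks` returns 17 groups, matching the 17 vertebrae per image.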

Local Datasets
Patients with AIS who attended the scoliosis clinic at Pantai Jerudong Specialist Centre from 1 November 2018 to 4 September 2020 were identified from the institution's scoliosis database. As part of routine clinical management, these patients had standard standing anterior-posterior X-rays showing the seventh cervical vertebra (C7) to the femoral heads and the entire rib cage from right to left. The X-ray images were retrieved from the institution's PACS, anonymized, and exported in JPEG format, as shown in Figure 2. The CA on each X-ray image was measured (1) by two neurosurgeons (Observer 1 specializes in scoliosis; Observer 2 does not), who were blinded to each other and to the ACAMM, using the built-in PACS function, and (2) by the ACAMM.

Proposed Methods
The proposed automated vertebrae detection and CA measurement consists of sequential stages, presented in detail in Figure 3. To automate the CA measurement, this study was performed in three main stages: (1) development of an algorithm to automatically crop the X-ray images and standardize their dimensions; (2) development of CNN-based vertebrae detection on the local X-ray images, improving on previously described methods [19,20]; and (3) development of an algorithm to identify the apex vertebra and the superior and inferior end vertebrae to measure the CA. The following subsections describe the three main stages in detail.

Stage 1: Pre-Processing of the X-ray Image Size
Local Binary Patterns (LBP), a type of visual descriptor used for classification in computer vision, were combined with a cascade classifier [21] to standardize the X-ray images and automatically crop each image between the cervical spine and the sacrum. An example of the result is shown in Figure 4. The histogram of LBP code labels captures the pixel-level distribution of edges and other local features in the image. This descriptor was chosen because it derives direction from a binary gradient pattern, which makes it well suited to gray-scale X-ray images. A cascade classifier is then used to select features from the X-ray image that define the body area. The body region is selected and used as the positive input for training the cascade classifier; its boundaries extend from the seventh cervical vertebra (C7) to the sacrum at the lumbosacral junction, and laterally to the right and left outermost parts of the body. Negative images, taken from outside the body region of interest, were also obtained. Each pixel in the body region is labeled with the LBP code

$$\mathrm{LBP} = \sum_{s=0}^{S-1} g(z)\,2^{s}, \qquad z = N_s - N_c$$

where S is the number of sampling points in a small circular neighborhood of radius r = 1, s indexes the sampling points, N_s is the value of the s-th neighborhood pixel, and N_c is the value of the neighborhood's center pixel. The binary threshold function g is defined as

$$g(z) = \begin{cases} 1, & z \ge 0 \\ 0, & z < 0 \end{cases}$$

The body region is then detected with a sliding-window method, and each area traversed by a sub-window is labeled as positive or negative by a boosted classifier defined over X, the set of training images; α, the weight vector; β, the weighting parameter computed from the error associated with the classifier; and i = 1, 2, ..., I, the training iteration number. If a stage classifies a sub-window as positive, the detection is passed to the next stage.
A sub-window that passes every stage of the cascade is accepted as the body region, which is then used as the input image for Stage 2.
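The LBP labeling and the stage-by-stage rejection described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: it samples the 8 pixels of the 3 × 3 square neighborhood (a common approximation of the circular neighborhood with r = 1) in an assumed clockwise order, and `cascade_accepts` is a hypothetical helper in which each stage sums weighted weak-classifier votes (the weights playing the role of β) and compares the sum with a stage threshold.

```python
# Neighbor offsets (dy, dx) for the 8 pixels around the center,
# clockwise from top-left (assumed ordering; sampling point s gets weight 2**s).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def g(z):
    """Binary threshold function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def lbp_image(img):
    """LBP code for every interior pixel of a 2-D list of gray levels."""
    height, width = len(img), len(img[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            n_c = img[y][x]                    # center pixel N_c
            code = 0
            for s, (dy, dx) in enumerate(OFFSETS):
                n_s = img[y + dy][x + dx]      # neighbor pixel N_s
                code += g(n_s - n_c) << s      # g(z) * 2**s with z = N_s - N_c
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(codes, bins=256):
    """Histogram of LBP code labels, used as the texture descriptor."""
    hist = [0] * bins
    for row in codes:
        for c in row:
            hist[c] += 1
    return hist


def cascade_accepts(features, stages):
    """Hypothetical attentional cascade for one sliding-window sub-window.

    stages: list of (weak_classifiers, threshold); each weak classifier is a
    (weight, h) pair where h(features) returns 1 (positive) or 0 (negative).
    """
    for weak_classifiers, threshold in stages:
        score = sum(weight * h(features) for weight, h in weak_classifiers)
        if score < threshold:
            return False  # rejected: later stages are never evaluated
    return True           # passed every stage: sub-window is kept
```

Early rejection is what makes the sliding window affordable: most background sub-windows fail a cheap early stage and never reach the later, more expensive ones.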

Stage 2: Vertebrae Detection on the Local X-ray Images Using CNN
The deep learning architecture was a CNN for automated spine segmentation that locates the vertebrae and determines the center and corner offsets of each vertebra, based on a previously described method [19]. The pre-processing method and parameter selection were modified for the local datasets. In addition, we extended the previous method to enable automatic CA measurement.
In the proposed method, a 152-layer ResNet [22] is used as the backbone network to classify the 68 landmarks and obtain the corner offsets of the spine. This CNN [23][24][25] consists of convolutional layers that learn local image features and generate the classifications. ResNet-152 is organized into four stages of 3-layer bottleneck blocks and uses more of these blocks than shallower ResNets to achieve higher accuracy [26,27]. Although it is deeper than the ResNet-50 used in the previous method [19], its bottleneck design keeps its complexity below that of VGG-16/19 networks [22]. The resulting feature map is passed to a fully connected layer with a sigmoid function to obtain the classification scores. As shown in Figure 5, the proposed network includes pooling layers (average pooling and max pooling), feature-map classification (a fully connected layer and sigmoid function), and corner offsets. The model was initialized with weights pre-trained on ImageNet and trained with the Adam optimizer at a learning rate of 0.0001, with a batch size of 2 and 150 epochs. The classification parameters were optimized with the focal loss [19], computed over the feature-map positions m = 1, 2, ..., M, where ρ_m and τ_m are the predicted and ground-truth values at position m, respectively. Center-offset and corner-offset maps for landmark localization were constructed with convolutional layers. Because the network's output feature map is downsampled, the center and corner offsets are mapped to new locations, and these offsets are trained with an L1 loss. After the object detection step, a bounding box was placed on each vertebra in the X-ray image, and the corner and center coordinates of each box were obtained, as shown in Figure 6.
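For readers unfamiliar with heatmap-based landmark detection, the following sketch shows one common form of the focal loss over positions m with predictions ρ_m and ground truth τ_m, together with an L1 loss for the offsets. It assumes the penalty-reduced variant used in CenterNet-style keypoint detectors, with exponents `alpha = 2` and `beta = 4` (unrelated to the α and β of Stage 1); the exact form used in [19] may differ, and the function names are hypothetical.

```python
import math


def focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced focal loss over heatmap positions m = 1..M (assumed form).

    pred:   predicted values rho_m in (0, 1)
    target: ground-truth values tau_m in [0, 1]; 1 marks a landmark peak
    """
    pos_loss = neg_loss = 0.0
    num_pos = 0
    for rho, tau in zip(pred, target):
        rho = min(max(rho, eps), 1.0 - eps)    # clamp so log() stays finite
        if tau == 1.0:                          # landmark position
            pos_loss += (1.0 - rho) ** alpha * math.log(rho)
            num_pos += 1
        else:                                   # background, down-weighted near peaks
            neg_loss += (1.0 - tau) ** beta * rho ** alpha * math.log(1.0 - rho)
    return -(pos_loss + neg_loss) / max(num_pos, 1)


def l1_offset_loss(pred_offsets, true_offsets):
    """Mean absolute error between predicted and ground-truth offsets."""
    diffs = [abs(p - t) for p, t in zip(pred_offsets, true_offsets)]
    return sum(diffs) / len(diffs)
```

The `(1 - tau) ** beta` factor reduces the penalty for predictions near a true landmark, where the ground-truth heatmap is high but not exactly 1.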
Landmark detection errors were evaluated by comparing the detected landmark locations d_t = (x, y) with the ground-truth landmark locations p_t, for t = 1, 2, ..., T, where T is the total number of detected landmarks.
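One common way to summarize landmark error is the mean Euclidean distance between detected and ground-truth points. The sketch below assumes that definition, which may differ from the exact metric used in this study; `mean_landmark_error` is a hypothetical name.

```python
import math


def mean_landmark_error(detected, ground_truth):
    """Mean Euclidean distance between detected points d_t and ground truth p_t."""
    if not detected or len(detected) != len(ground_truth):
        raise ValueError("need the same, non-zero number of detected and ground-truth landmarks")
    total = sum(math.dist(d, p) for d, p in zip(detected, ground_truth))
    return total / len(detected)
```

`math.dist` (Python 3.8+) returns the Euclidean distance between two points given as coordinate sequences.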

Stage 3: Cobb Angle (CA) Measurement
Curve fitting was used to select the appropriate boxes. Boxes with a prediction score greater than 0.5 were retained. The center point of each vertebra was then found from the locations of the detected boxes, and outliers were removed on the basis of spinal anatomy: adjacent vertebrae should not be far apart. If the x-coordinate of a box's center lies more than half the box width from the x-coordinates of the centers of its two closest neighbors (above and below), the box is rejected as an outlier; otherwise, its position is re-evaluated based on the positions of the nearest boxes. The steps for measuring the CA are presented in Figure 7. The apex of the spinal curve is identified as the deepest part of the curve. For each box above the apex, the tilt of the vertebra is measured from the slope between its top-left and top-right corners to find the most tilted vertebra above the apex. For each box below the apex, the tilt is measured from the slope between its bottom-left and bottom-right corners to find the most tilted vertebra below the apex. The superior and inferior end vertebrae are identified as the most tilted vertebra above the apex (aa) and below the apex (ba), respectively. The CA is measured as the angle of intersection between the lines drawn through the end vertebrae aa and ba. Figure 9 shows an example of a measurement performed by the ACAMM.
Vertebrae anatomy illustrating how the apex, superior end vertebra, and inferior end vertebra are found.
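The Stage 3 logic can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the study's code: each box is a dict with a `score` and four `corners` ordered top-left, top-right, bottom-left, bottom-right; a box is treated as an outlier only when it lies more than half its width from both nearest neighbors (our reading of the rule); the apex is taken as the vertebra whose center is farthest horizontally from the line joining the top and bottom vertebra centers; and the curve-fitting and position re-evaluation steps are omitted. All function names are hypothetical.

```python
import math


def center(box):
    """Center of a box as the mean of its four corners."""
    xs, ys = zip(*box["corners"])
    return sum(xs) / 4.0, sum(ys) / 4.0


def box_width(box):
    xs = [x for x, _ in box["corners"]]
    return max(xs) - min(xs)


def filter_boxes(boxes, min_score=0.5):
    """Keep confident boxes, sort top to bottom, and drop lateral outliers."""
    kept = sorted((b for b in boxes if b["score"] > min_score), key=lambda b: center(b)[1])
    result = []
    for i, box in enumerate(kept):
        cx = center(box)[0]
        neighbors = [kept[j] for j in (i - 1, i + 1) if 0 <= j < len(kept)]
        # Outlier: more than half its own width from BOTH nearest neighbors.
        if neighbors and all(abs(cx - center(n)[0]) > box_width(box) / 2 for n in neighbors):
            continue
        result.append(box)
    return result


def edge_angle(p, q):
    """Angle in degrees of the line from corner p to corner q."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def apex_index(boxes):
    """Hypothetical apex rule: center farthest from the top-to-bottom chord."""
    (x0, y0), (x1, y1) = center(boxes[0]), center(boxes[-1])

    def offset(box):
        cx, cy = center(box)
        x_chord = x0 if y1 == y0 else x0 + (x1 - x0) * (cy - y0) / (y1 - y0)
        return abs(cx - x_chord)

    return max(range(len(boxes)), key=lambda i: offset(boxes[i]))


def cobb_angle(boxes):
    """CA from the most tilted upper endplate above and lower endplate below the apex."""
    k = apex_index(boxes)
    # corners: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right
    upper = [edge_angle(b["corners"][0], b["corners"][1]) for b in boxes[: k + 1]]
    lower = [edge_angle(b["corners"][2], b["corners"][3]) for b in boxes[k:]]
    aa = max(upper, key=abs)  # tilt of the superior end vertebra
    ba = max(lower, key=abs)  # tilt of the inferior end vertebra
    return abs(aa - ba)
```

In this sketch the apex vertebra is included in both the upper and lower search ranges, so a spine whose apex is the first or last detected vertebra still yields a result.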

Statistical Analysis
All statistical analyses were performed with SPSS version 20 (IBM Corporation, Armonk, New York, United States of America). The chi-squared test and the Mann-Whitney U test were used for nominal and non-normally distributed variables, respectively. Using Observer 1 as the reference, the median percentage accuracy (interquartile range, IQR), the median CA measurement difference (IQR), and the proportion of CA measurements within ±5° were calculated for the ACAMM and for Observer 2. The percentage accuracy was calculated as

$$\text{Percentage accuracy (\%)} = 100 - 100\left(\frac{C - \acute{C}}{C}\right) \tag{6}$$

where C is the Observer 1 CA measurement and Ć is the ACAMM (or Observer 2) CA measurement. The reliability of the ACAMM using the proposed CNN was assessed with the Intraclass Correlation Coefficient (ICC) and the Pearson Correlation Coefficient (PCC). ICC values are generally rated as poor (< 0.50), fair (0.50 to 0.75), good (> 0.75 to 0.90), or excellent (> 0.90). The significance level for the study was set at p < 0.05.
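The summary statistics described above can be computed with Python's standard `statistics` module. This sketch applies Eq. (6) literally, so an over-estimate relative to Observer 1 yields an accuracy above 100%; it also assumes that the CA differences and the ±5° criterion use absolute differences. The function names are hypothetical, and `statistics.quantiles` (default "exclusive" method) may give slightly different quartiles from SPSS.

```python
import statistics


def percentage_accuracy(c_ref, c_test):
    """Eq. (6): 100 - 100 * (C - C') / C, where C is Observer 1's CA."""
    return 100.0 - 100.0 * (c_ref - c_test) / c_ref


def summarize(ref, test, tol=5.0):
    """Median (IQR) accuracy, median (IQR) absolute difference, share within +/-tol."""
    acc = [percentage_accuracy(c, t) for c, t in zip(ref, test)]
    diff = [abs(c - t) for c, t in zip(ref, test)]
    acc_q1, _, acc_q3 = statistics.quantiles(acc, n=4)
    diff_q1, _, diff_q3 = statistics.quantiles(diff, n=4)
    return {
        "median_accuracy": statistics.median(acc),
        "accuracy_iqr": (acc_q1, acc_q3),
        "median_abs_difference": statistics.median(diff),
        "difference_iqr": (diff_q1, diff_q3),
        "within_tol": sum(d <= tol for d in diff) / len(diff),
    }
```

For example, with Observer 1 measuring 20° and the ACAMM 19°, Eq. (6) gives 100 − 100 × (1/20) = 95%.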

Vertebrae Detection Results
The network was trained on the datasets using an NVIDIA RTX 2060 GPU with an Intel Core i7 processor. The proposed CNN architecture accurately detected the location of each of the 17 vertebrae in the spine X-rays, and the bounding boxes agreed well with the vertebra positions. This performance was sufficient to provide the information needed to detect the superior and inferior end vertebrae, enabling the CA to be evaluated correctly.
The detection results also showed that the proposed architecture could identify vertebrae in X-ray images with different contrast and lighting conditions (see Appendix A, Figure A1); tests on several images with poor contrast and lighting yielded good results. Importantly, because of our curve-fitting and quantification method, CA measurement and curve classification remained accurate even when the detection process failed to identify one or two vertebrae. Previous CNN-based studies [14,16] focused on vertebrae detection and CA measurement under specific conditions, whereas the proposed method measured CA in spines ranging from normal to severely scoliotic (up to 79°). This robustness is key because, in the clinical setting, X-ray images vary in contrast and lighting quality depending on the severity of the curve and the patient's body habitus.

Evaluation of Automated CA Measurement Results and Ground Truth
The results of the ACAMM were compared with the measurements of two neurosurgeons (Observer 1 specializes in scoliosis; Observer 2 does not), as shown in Tables. Overall, the ACAMM agreed closely with the CA assessments of both observers (Table 6). The ICC between the ACAMM and Observer 1 was 0.995, and between the ACAMM and Observer 2 was 0.954. The PCC also showed high correlation between the ACAMM and Observer 1 (0.991, p < 0.001) and Observer 2 (0.931, p < 0.001).
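To illustrate how agreement figures like these are obtained, the sketch below computes an ICC and the PCC in plain Python. It assumes the two-way random-effects, absolute-agreement, single-measure ICC(2,1) of Shrout and Fleiss; the text does not state which ICC model was selected in SPSS, so this choice is an assumption, and the function names are hypothetical.

```python
import math


def icc_2_1(ratings):
    """ICC(2,1). ratings: one row per X-ray, one column per rater (k >= 2)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(r) for r in ratings) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between X-rays
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The contrast between the two is useful: a constant 2° offset between raters leaves the PCC at 1.0 but lowers an absolute-agreement ICC, because the ICC also penalizes systematic bias.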

Discussions
Our proposed method for the automatic assessment of CA on spine X-ray images used ResNet-152 as the backbone to improve accuracy, together with feature-map classification using a sigmoid function. Network depth has previously been shown to benefit classification accuracy; however, as a network becomes deeper, its accuracy can saturate and then degrade rapidly. The ResNet framework addresses this issue by adding a shortcut connection across every three convolutional layers of the deep network. These shortcut connections perform identity mapping and add neither extra parameters nor computational complexity. By simplifying network optimization during training, this design enables ResNet to gain accuracy from deeper networks in image segmentation tasks. Curve fitting and quantification were used to compensate for errors in vertebrae detection so that accurate CA measurements could still be obtained.
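The identity shortcut can be illustrated with a toy residual block in plain Python. Here three small dense layers stand in for the 1 × 1, 3 × 3, and 1 × 1 convolutions of a ResNet bottleneck block (a deliberate simplification, not the network used in this study); the point is that the shortcut only adds the input back, so it carries no weights, and if the residual branch outputs zero the block reduces to an identity mapping for non-negative inputs. The function names are hypothetical.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def bottleneck_block(x, w1, w2, w3):
    """Toy residual block: y = ReLU(F(x) + x), F = three weight layers.

    The shortcut simply adds x back, so it has no weights of its own.
    """
    h = relu(matvec(w1, x))   # stands in for the 1x1 "reduce" convolution
    h = relu(matvec(w2, h))   # stands in for the 3x3 convolution
    f = matvec(w3, h)         # stands in for the 1x1 "expand" convolution
    return relu([fi + xi for fi, xi in zip(f, x)])
```

With all-zero weights the block returns its (non-negative) input unchanged, which is why adding residual blocks need not make a deeper network worse than a shallower one.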
Reproducibility remains a common problem in CA measurement because of high intra-observer and inter-observer variability. An objective, reliable method for determining CA is crucial, because this measurement guides clinical decisions on diagnosis, curve progression, and management, including surgical options. Our ACAMM showed good CA measurement accuracy (93.6%) when assessed against a neurosurgeon with expertise in scoliosis. Importantly, all but one (98.6%) of the automated measurements were within ±5° of the expert neurosurgeon's CA measurements. This variation is generally within the accepted threshold for CA measurement in the clinical setting [5,8]. In contrast, only 54.3% of the second observer's CA measurements were within ±5°. Furthermore, the reliability of our proposed method for measuring CA was excellent (ICC > 0.95, PCC > 0.93). These results, which agree closely with the assessments of both neurosurgeons, indicate that this CNN method has high potential for use in the real-world setting.
This study has limitations. First, the results were obtained in a non-clinical setting. Further tests of reliability and accuracy in the clinical setting, with real-time comparison against clinicians' CA measurements and with more test samples and observers, are therefore warranted. Second, the current method can only measure a single major curve in the spine and cannot detect other minor curves in the same X-ray, which may be clinically relevant. Future work will explore enabling the algorithm to detect all curves in a scoliotic spine X-ray.

Conclusions
We developed a pre-processing method and a deep learning architecture based on a convolutional neural network for spine segmentation and vertebrae detection to automatically measure the Cobb angle in adolescent idiopathic scoliosis. The vertebrae detection network uses ResNet-152 as the backbone, together with feature-map classification and corner offset compensation, to improve accuracy. The proposed method achieved good measurement accuracy compared with an expert in scoliosis and an excellent reliability rating, indicating that it is a promising method for automatic Cobb angle measurement in a real-world setting.
Figure A1. The CA measured by the CNN is shown in red text on each X-ray image; the measurements by Observer 1 and Observer 2 are shown at the bottom left and bottom right, respectively.