Validity and Absolute Reliability of Axial Vertebral Rotation Measurements in Thoracic and Lumbar Vertebrae
Appl. Sci. 2021, 11, 11084

Abstract: Axial vertebral rotation (AVR) and the Cobb angle are essential parameters for analysing different types of scoliosis, including adolescent idiopathic scoliosis. The literature shows significant discrepancies in the validity and reliability of AVR measurements on radiographs, depending on the type of vertebra. This study evaluated the validity and absolute reliability of AVR measurements of thoracic and lumbar vertebrae on digital X-rays, using validated software based on Raimondi's method that measures with smaller error than traditional manual methods. Twelve independent evaluators used the software to measure AVR on the 74 most rotated vertebrae in 42 X-rays on three separate occasions, at one-month intervals. A gold standard AVR was obtained for each vertebra. The validity and reliability of the thoracic and lumbar measurements were studied separately. Measurements performed on lumbar vertebrae were 3.6 times more valid than those performed on thoracic vertebrae, with almost equal reliability (thoracic, 1.38° ± 1.88°; lumbar, −0.38° ± 1.83°). We conclude that AVR measurements of the thoracic vertebrae show a larger mean bias error than, and very similar reliability to, those of the lumbar vertebrae.


Introduction
Most authors accept adolescent idiopathic scoliosis (AIS) as a three-dimensional deformity involving the axial, sagittal and frontal planes [1]. AIS can progress over the years, especially during growth, and can cause musculoskeletal, pulmonary and psychological problems, as well as significant pain in adulthood [2]. Axial vertebral rotation (AVR) is an essential parameter in the study of AIS [3,4]. AVR is defined as the rotation of a vertebra around its longitudinal axis when projected onto the transverse image plane [5]. Its measurement is necessary to assess the severity of scoliosis and to quantify the risk of progression [4,6–8], to select treatment [6,9,10], and to analyse orthopaedic and surgical procedures [4,8,9,11–13]. The term "rotation" is less appropriate than "torsion" but is widely used in the literature [5].
There are several methods for assessing AVR on conventional X-rays, based on identifying the position of certain vertebral anatomical structures and their relationships. One method widely used [14,15] for its simplicity and reliability [15,16] is the Raimondi method, which uses templates (Raimondi's tables) to determine the degree of AVR on X-ray films [17–20]. Another widely used method is the Perdriolle method, recommended by the Scoliosis Research Society [15,21], which also measures the degree of AVR but uses a ruler graduated in 5° intervals, whereas the Raimondi method uses 2° intervals.
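As a rough, idealised illustration of what the graduation interval alone implies, the sketch below assumes an observer who reads the nearest graduation perfectly and ignores landmark-placement error (which in practice is usually the larger source of error). Under that assumption, a 2° scale bounds the rounding error at 1° and a 5° scale at 2.5°. The helper names are hypothetical.

```python
def read_off_scale(true_angle_deg: float, step_deg: float) -> float:
    """Snap an angle to the nearest graduation of a scale with the given step (idealised reader)."""
    return step_deg * round(true_angle_deg / step_deg)


def worst_case_error(step_deg: float) -> float:
    """The nearest graduation is never more than half a step away."""
    return step_deg / 2


if __name__ == "__main__":
    for step in (2.0, 5.0):  # Raimondi-style vs Perdriolle-style graduation
        reading = read_off_scale(13.7, step)
        print(f"step {step} deg: reads {reading} deg, worst-case rounding error {worst_case_error(step)} deg")
```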
Computed tomography (CT) allows a three-dimensional reconstruction of the spine and measurement of AVR with high accuracy [22]. However, CT is unsuitable for monitoring scoliotic progression because of the excessive, repeated radiation exposure it involves (e.g., an estimated dose of 5.2 mSv per study [22]). Radiographic imaging, especially standing frontal full-length spinal X-rays [9,23,24], is the method of choice for diagnosing and monitoring scoliosis [25].
To the best of our knowledge, there is no consensus on the validity and reliability of AVR measurements on radiographic images for the different types of vertebrae [6,16,26]. These discrepancies could be due to the use of different measurement instruments (e.g., Raimondi, Perdriolle, or Nash & Moe) [14–16,21], to the type of imaging medium used (conventional radiography, digital X-rays with different characteristics, and others) [27], and to the number of observers and measurements, which in some studies results in relatively low statistical power [6,16,26,28].
Advances in digital radiology have fostered the development of software tools for evaluating medical images, with which manual measurement methods can be applied more quickly and easily, and with less intra- and inter-observer variability [29–34].
Hurtado-Avilés et al. recently conducted a study in which twelve independent evaluators with different levels of experience used software to measure 33 scoliotic curves in 21 X-rays on three separate occasions, one month apart. Three months later, the observers re-measured the same radiographic studies, this time manually on X-ray films. Compared with conventional manual Raimondi measurements, the software with the built-in equation increased the validity of AVR measurements on digital X-rays 1.7-fold and their absolute reliability 1.9-fold [28].
Our study aimed to evaluate the differences in quality (validity and reliability) between AVR measurements of the thoracic and lumbar vertebrae on digital frontal full-spine radiographs of patients with idiopathic scoliosis, using an improved version of Raimondi's method implemented in validated computer-aided diagnosis (CAD) software, with a design that meets the criteria for absolute validity [35,36].
We hypothesised that AVR measurements on two-dimensional medical images differ in validity and reliability between thoracic and lumbar vertebrae, possibly because of anatomical differences (e.g., the size of the thoracic vertebrae or rib overlap).

Software
To calculate the AVR on digital X-ray images, we used software based on equations [37] that applies an improved version of Raimondi's method [28]. The software (registered under the name TraumaMeter v. 873) was developed in C++ in the Microsoft Visual Studio 2019 development environment, using the OpenCV 3.4.10 computer vision libraries and the DCMTK libraries from the OFFIS Institute for Information Technology to handle DICOM (Digital Imaging and Communications in Medicine) files. The software includes additional tools, such as zooming in on regions of interest and adjusting the contrast (the fractional difference in optical density or brightness between two regions of an image) of the digitised X-ray image.
The observer selected the rotated vertebra on the X-ray image to perform the AVR calculation with the software (when in doubt between two vertebrae, the observer measured both). The observer enlarged the frontal projection of the vertebra and, with mouse clicks, selected the two closest points on the lateral sides of the vertebral body and the two opposite edges of the shadow of the pedicle that is rotated towards the centre of the vertebra. From the positions of these four points on the anteroposterior projection, the software calculated the width of the vertebral body (D) and the distance from the centre of the pedicle to the side of the vertebral body (d) (Figure 1), and then obtained the vertebral rotation using the equation [Equation (1)] published by our group [37].
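The geometric step from four clicked points to D and d can be sketched as follows. This is a hypothetical helper, not TraumaMeter's code, and it assumes that the pedicle centre is the midpoint of the two clicked pedicle edges and that d is measured along the body-width axis to the nearer lateral side. It stops at the ratio d/D: Equation (1) of [37], which converts the measurements to degrees, is not reproduced here. Because d/D is dimensionless, the pixel size of the image cancels out.

```python
import math

Point = tuple[float, float]  # (x, y) in pixels on the frontal image


def raimondi_distances(body_left: Point, body_right: Point,
                       pedicle_edge_a: Point, pedicle_edge_b: Point) -> tuple[float, float, float]:
    """Return (D, d, d/D) from the four reference points.

    Assumptions (hypothetical helper):
    - D is the distance between the two lateral points of the vertebral body;
    - the pedicle centre is the midpoint of the two clicked pedicle edges;
    - d is measured along the body-width axis, from the pedicle centre to
      the nearer lateral side of the vertebral body.
    """
    ux, uy = body_right[0] - body_left[0], body_right[1] - body_left[1]
    D = math.hypot(ux, uy)
    if D == 0:
        raise ValueError("the two vertebral-body points coincide")
    cx = (pedicle_edge_a[0] + pedicle_edge_b[0]) / 2
    cy = (pedicle_edge_a[1] + pedicle_edge_b[1]) / 2
    # Scalar projection of the pedicle centre onto the width axis, measured from body_left.
    t = ((cx - body_left[0]) * ux + (cy - body_left[1]) * uy) / D
    if not 0 <= t <= D:
        raise ValueError("pedicle centre projects outside the vertebral body")
    d = min(t, D - t)
    return D, d, d / D


D, d, ratio = raimondi_distances((100, 200), (140, 200), (128, 195), (134, 205))
```

The coordinates in the usage line are made up for illustration.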

Study Design and Measurement Protocol
We performed a prospective, observational study of 42 selected frontal spinal radiographs of patients with AIS, all of equivalent image quality and free of defects (e.g., image noise or inadequate voltage).
The radiographs had been acquired during the routine medical care of AIS patients and were collected retrospectively from an image repository.
Our study followed the ethical guidelines of the World Medical Association Declaration of Helsinki, as revised in 2013. The study was exempt from ethical approval because the complete and irreversible anonymisation of the images meant that no patient data were processed.
The radiographs were natively in digital format (in DICOM format, with 283.46 pixels/mm resolution).
The study was carried out by twelve independent evaluators with different levels of experience in using Raimondi's method. Before the software measurements, a 5 h briefing provided comprehensive information on the study and training in the software.
Each observer measured the 42 X-rays with the software on three occasions, one month apart. In each measurement round, the study coordinator randomly assigned the order in which the radiographs were presented and kept the randomisation key confidential.
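One way to generate such presentation sequences is sketched below. It assumes an independent random order for every observer in every round (the text only states that the order was randomised in each round), and the seed is a hypothetical stand-in for the coordinator's confidential randomisation key.

```python
import random


def presentation_orders(xray_ids, n_observers=12, n_rounds=3, seed=None):
    """Return {(observer, round): shuffled list of radiograph IDs}.

    Assumption: an independent random order for every observer in every round.
    The seed stands in for the coordinator's confidential randomisation key.
    """
    rng = random.Random(seed)
    orders = {}
    for rnd in range(1, n_rounds + 1):
        for observer in range(1, n_observers + 1):
            order = list(xray_ids)
            rng.shuffle(order)
            orders[(observer, rnd)] = order
    return orders


# Hypothetical IDs XR01..XR42 for the 42 radiographs.
orders = presentation_orders([f"XR{i:02d}" for i in range(1, 43)], seed=2021)
```

Keeping the seed private reproduces the role of a confidential key: the coordinator can regenerate the sequences, while observers cannot predict them.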
To establish a gold standard (Table 1), a specialist in orthopaedic surgery, a specialist in physical medicine and rehabilitation (both with more than 35 years of professional experience in the field of the spine and regular users of the Raimondi method), and the engineer who designed the software jointly measured all the vertebrae on the same computer. This gold standard served as the reference against which the observers' AVR value for each vertebra was compared (the mean of the 36 measurements obtained by the 12 observers over the three rounds).
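A minimal sketch of the comparison with the gold standard follows, assuming that validity (MBE) and reliability (SD) are computed over the per-vertebra errors of the observers' mean. The text does not say whether the errors were pooled per vertebra or per individual reading, so this is one plausible reading. The toy data use three readings per vertebra instead of 36 for brevity, and the function names are hypothetical.

```python
from statistics import mean, stdev


def vertebra_errors(measurements, gold_standard):
    """Per-vertebra error of the observers' mean against the gold standard.

    measurements: {vertebra_id: AVR readings in degrees (12 observers x 3 rounds = 36)}
    gold_standard: {vertebra_id: consensus AVR in degrees}
    """
    return {v: mean(readings) - gold_standard[v] for v, readings in measurements.items()}


def mbe_and_sd(errors):
    """Validity as the mean bias error (MBE) and reliability as the sample SD of the errors."""
    values = list(errors.values())
    return mean(values), stdev(values)


errs = vertebra_errors({"T8": [10, 11, 12], "L2": [5, 5, 5], "T10": [20, 22, 24]},
                       {"T8": 10, "L2": 6, "T10": 21})
mbe, sd = mbe_and_sd(errs)
```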

Statistics
We used the Statistical Package for the Social Sciences (SPSS), version 25 for Windows (SPSS, Inc., Chicago, IL, USA) for the statistical analysis.
We estimated the agreement between the measurements obtained and each gold standard in terms of validity (MBE, mean bias error), reliability (SD), standard error of measurement (SEM), minimal detectable change (MDC95), and the absolute-agreement intraclass correlation coefficient from a two-way random-effects model (ICC(2,1)) [38]. We also used Bland–Altman plots to show graphically the agreement between the measurements obtained and the gold standard (Figures 2 and 3). We interpreted the strength of agreement according to Landis and Koch's criteria (<0, no agreement; 0.00–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; 0.81–1.00, almost perfect or perfect agreement) [39].
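The agreement statistics named above can be sketched in plain Python. The ICC(2,1) function follows the Shrout and Fleiss two-way random-effects, absolute-agreement, single-measurement formula; on their small 6 × 4 example data set it gives approximately 0.29. The SEM and MDC95 helpers use the common forms SD·√(1 − ICC) and 1.96·√2·SEM, which is an assumption, since the text does not state which formulas SPSS was set up to use. How the study's readings were arranged into rating columns is also not stated. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: one row per subject (vertebra), one column per rater or measurement source.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def sem(sd, icc):
    """Standard error of measurement, using the common SD * sqrt(1 - ICC) form (assumption)."""
    return sd * sqrt(1 - icc)


def mdc95(sem_value):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem_value


def landis_koch(value):
    """Verbal label for an agreement coefficient, using the bands quoted in the text."""
    if value < 0:
        return "no agreement"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")):
        if value <= upper:
            return label
    return "almost perfect"
```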
To improve the distributions of the measurements, values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR (IQR, interquartile range) were considered outliers and removed from each distribution.
We removed outliers on statistical grounds because they caused the data distributions to depart from normality, and normality is required for the inferential statistical methods applied.
Table S1 (Supplementary Material) lists the outliers removed from each distribution.
We used the Shapiro–Wilk test to check normality: for every distribution, the p-value was above the 0.05 significance level, so the null hypothesis that the data follow a normal distribution was not rejected.
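The outlier rule above (Tukey fences) can be sketched with the standard library alone. One caveat: this sketch uses the default "exclusive" method of Python's `statistics.quantiles`, and statistical packages (including SPSS) may define quartiles slightly differently, so borderline values could be classified differently. The function name is hypothetical.

```python
from statistics import quantiles


def tukey_filter(values, k=1.5):
    """Split values into (kept, removed) using Tukey fences Q1 - k*IQR and Q3 + k*IQR.

    Quartiles use statistics.quantiles' default 'exclusive' method; other packages
    may define quartiles slightly differently.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if v < low or v > high]
    return kept, removed


kept, removed = tukey_filter([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
```

Normality of the filtered values could then be checked with, for example, `scipy.stats.shapiro(kept)`, which returns the W statistic and its p-value.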

Results
For both types of vertebrae, the measurements showed good validity and reliability, and the intraclass correlation coefficients indicated almost perfect agreement (greater than 0.8).
Table S2 (Supplementary Material) shows the rotation of each vertebra measured by each observer in each round of measurements.
The measurements taken on the lumbar vertebrae showed 3.6 times higher validity than those taken on the thoracic vertebrae, with almost equal reliability (thoracic, 1.38° ± 1.88°; lumbar, −0.38° ± 1.83°).
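Reading "validity" as the magnitude of the mean bias error and "reliability" as the SD, the 3.6 factor and the near-equality of reliability follow directly from the reported values:

```latex
\frac{\lvert \mathrm{MBE}_{\mathrm{thoracic}} \rvert}{\lvert \mathrm{MBE}_{\mathrm{lumbar}} \rvert}
  = \frac{1.38^\circ}{\lvert -0.38^\circ \rvert} \approx 3.63 \approx 3.6,
\qquad
\frac{\mathrm{SD}_{\mathrm{thoracic}}}{\mathrm{SD}_{\mathrm{lumbar}}}
  = \frac{1.88^\circ}{1.83^\circ} \approx 1.03
```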
The Bland–Altman plots (Figures 2 and 3) of the agreement between the gold standard values and the thoracic and lumbar measurement distributions show no bias in the agreement for either type of vertebra.
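The quantities drawn in a Bland–Altman plot can be sketched as below: for each pair, the mean of the two values (x-axis) and their difference (y-axis), plus the bias and 95% limits of agreement (bias ± 1.96 SD of the differences). Pairing each observer value with its gold standard is an assumption about how Figures 2 and 3 were built; the function name and the toy numbers are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(measured, reference, z=1.96):
    """Return the plot coordinates (means, differences), the bias and the 95% limits of agreement."""
    if len(measured) != len(reference):
        raise ValueError("measured and reference must have the same length")
    diffs = [m - r for m, r in zip(measured, reference)]
    means = [(m + r) / 2 for m, r in zip(measured, reference)]
    bias = mean(diffs)
    spread = z * stdev(diffs)
    return means, diffs, bias, (bias - spread, bias + spread)


means, diffs, bias, (lo, hi) = bland_altman([10, 12, 14, 16], [9, 12, 13, 17])
```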
According to Student's t-test for equality of means, the difference between the thoracic and lumbar measurement errors is statistically significant at the 95% confidence level (t = 3.683, two-tailed p = 0.001). Figure 4 shows this difference graphically: the 95% confidence intervals of the two means do not overlap.
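For reference, the pooled-variance (equal-variance) Student t statistic behind this comparison can be computed as sketched below, assuming the two inputs are the thoracic and lumbar measurement errors. The two-tailed p-value additionally needs the t distribution, which is available, for example, through `scipy.stats.ttest_ind(a, b, equal_var=True)` or in SPSS. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Pooled-variance (equal-variance) Student t statistic for two independent samples.

    Returns (t, degrees_of_freedom); the p-value needs the t distribution.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


t, df = student_t([1, 2, 3], [4, 5, 6])
```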

Discussion
The present study shows that the error in AVR measurements of the thoracic vertebrae on radiographic images is greater than that of the lumbar vertebrae, whereas the reliability of the thoracic and lumbar measurements is practically the same (SD = 1.88° and 1.83°, respectively). However, the mean error of the thoracic measurements is significantly higher (1.38° versus −0.38°).
Nevertheless, we obtained almost perfect agreement, according to Landis and Koch [39], between the measurement distributions of both types of vertebrae and their respective gold standards (ICC(2,1) = 0.985, 95% confidence interval 0.97–0.993, for the lumbar vertebrae; ICC(2,1) = 0.974, 95% confidence interval 0.897–0.99, for the thoracic vertebrae).
Knowledge of a measuring instrument's quality is essential, as diagnosis and treatment decisions may depend on its performance. The literature reports significant discrepancies in the validity and reliability of AVR measurements on radiographs depending on the type of vertebra imaged. These discrepancies could be explained by the use of different measurement instruments (Raimondi, Perdriolle, Nash & Moe, etc., each with a different validity and reliability), by the type of image support (physical radiographs, digital X-rays with different characteristics, and others), and by the number of observers and measurements, which in some studies results in relatively low statistical power.
For example, Eijgenraam et al. [6], using the Perdriolle method on X-rays of two thoracic and one lumbar cadaveric vertebrae, found no significant differences between vertebral levels. Cerny et al. [26], using the Perdriolle method on X-rays of five lumbar and five thoracic cadaveric vertebrae, obtained higher reliability for the lumbar vertebrae. Defino et al. [16], using X-rays of one lumbar and one thoracic cadaveric vertebra, obtained with the Raimondi method better agreement between the thoracic measurements and the true values, whereas with the Nash & Moe method the thoracic measurements were worse. Mangone et al. [3] studied AVR on X-rays of 25 patients with AIS using the Raimondi method and found an error of 9.18° ± 3.33° in the thoracic vertebrae and 10.18° ± 5.9° in the lumbar vertebrae. Our study made a larger number of measurements than those of Eijgenraam et al. [6] or Cerny et al. [26]. In addition, to our knowledge, it is the analysis with the largest number of evaluators.
In our study, we calculated the absolute validity and reliability of AVR measurements for both thoracic and lumbar vertebrae; this requires a minimum of 30 clinical cases, measured by at least six blinded observers, with a minimum of three tests per observer spaced at least two weeks apart [35,36].
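The design minimums quoted above can be expressed as a simple checklist. This is a hypothetical helper; counting the 42 radiographs as the clinical cases and treating "one month" as at least 28 days are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class StudyDesign:
    clinical_cases: int
    blinded_observers: int
    tests_per_observer: int
    min_days_between_tests: int


def absolute_criteria_report(design: StudyDesign) -> dict[str, bool]:
    """Check a design against the minimums quoted in the text [35,36]."""
    return {
        "at least 30 clinical cases": design.clinical_cases >= 30,
        "at least 6 blinded observers": design.blinded_observers >= 6,
        "at least 3 tests per observer": design.tests_per_observer >= 3,
        "tests at least 14 days apart": design.min_days_between_tests >= 14,
    }


# 42 radiographs, 12 observers, 3 rounds; "one month" taken as >= 28 days (assumption).
this_study = StudyDesign(clinical_cases=42, blinded_observers=12, tests_per_observer=3,
                         min_days_between_tests=28)
```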
The medical images used, plain full-spine radiographs in DICOM format, are considered the standard for quantifying AVR in AIS patients [9,23,24,29–32].
Being able to enlarge a vertebra with the mouse scroll wheel allows the reference points to be located correctly. Determining the width of the vertebra is often difficult because of overlapping structures, particularly in the thoracic (dorsal) vertebrae; referring to the width of the adjacent vertebrae can make this easier. With conventional measurement methods, the main cause of selecting the wrong level, and hence of poorer reliability, is that the vertebra to be measured (the one with the greatest torsion) is often chosen by eye before it is measured. With the system we propose, at least the two most twisted vertebrae at the apex of the scoliotic curve are measured (three in case of doubt), and the value of the vertebra with the greatest twist is used; the software calculates the degree of torsion automatically once the four reference points used by this method have been marked. We used an improved version of the Raimondi method for measuring AVR (Prosperini et al., 2010; Kadoury et al., 2009; Weiss, 1995), implemented in software (TraumaMeter v. 873). This software has been validated and measures AVR more accurately than the traditional manual method with Raimondi tables on plain radiographs (validity and reliability of 0.53° ± 1.9°; mean ICC(2,1) = 0.913, 95% confidence interval 0.87–0.949) [28].
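The selection rule described above (measure two or three apical candidates, keep the most rotated) can be sketched as follows. The level labels are hypothetical, and comparing magnitudes is an assumption so that the sign convention for rotation direction, which the text does not specify, does not matter.

```python
def apex_vertebra(candidate_avr: dict[str, float]) -> tuple[str, float]:
    """Pick the most rotated of the measured apical candidates.

    candidate_avr maps a (hypothetical) level label to its measured AVR in degrees.
    Magnitudes are compared so the rotation-direction sign does not matter (assumption).
    """
    if len(candidate_avr) < 2:
        raise ValueError("measure at least the two most rotated vertebrae at the apex")
    level = max(candidate_avr, key=lambda lv: abs(candidate_avr[lv]))
    return level, candidate_avr[level]


level, avr = apex_vertebra({"T8": 12.0, "T9": 14.5, "T10": -9.0})
```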
The differences found between AVR measurements on the two types of vertebrae may be due to the overlap of the ribs and costovertebral joints, which makes it difficult to establish the lateral limits of the vertebral body, and to the smaller size of the thoracic vertebrae compared with the lumbar vertebrae.
Regarding terminology, the term "rotation" is commonly used as a synonym for "torsion" in the medical literature. We consider rotation to be the physiological mobility of the vertebrae relative to each other in the axial plane, and torsion to be a pathological deformity of a vertebra fixed in a rotated position in the axial plane [5]. Although axial vertebral torsion (AVT) would be the more correct term, we have used axial vertebral rotation (AVR) in the present article because it is the most common in the literature.
Inaccurate AVR measurements may affect the assessment of scoliosis severity. The authors suggest that evaluating AVR with TraumaMeter v. 873, an improved digital version of Raimondi's method, would help increase the accuracy of AVR measurements in clinical practice, despite the significantly lower accuracy of thoracic AVR measurements compared with lumbar ones.
Our study had some limitations. First, we did not account for each evaluator's computer equipment (e.g., visible image size, screen resolution, luminance, contrast ratio, and mouse or touchpad characteristics), which may have influenced the accuracy of the measurements to some extent.
Second, we removed outliers from some of the study distributions (arising from imperfect measurements, errors in recording values in the database provided by each observer, or incorrect selection of the most rotated vertebra).
On the other hand, we do not consider the evaluators' level of experience a limitation, since the intra-group and inter-group agreement of the software measurements shows variation equal to or smaller than that of the manual method across measurement sessions, independently of experience level.
Despite these limitations, the authors consider the results of the study valuable because, to our knowledge, there are no published studies on the "absolute reliability" of measurements made on different types of vertebrae, or made with a more accurate and precise measuring instrument than those traditionally used.

Conclusions
We conclude that software-assisted AVR measurements on frontal spinal X-rays of patients with scoliosis are more accurate and valid at the lumbar level than at the thoracic level. In an assessment meeting absolute reliability and validity criteria, AVR measurements of the thoracic vertebrae showed a mean bias error 3.6 times higher than those of the lumbar vertebrae, despite very similar reliability (SD 1.88° vs. 1.83°). Despite this difference, the degree of inter-rater agreement with the TraumaMeter v. 873 software was almost perfect at both the thoracic and lumbar levels.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/app112311084/s1, Table S1: Outliers removed from each measurement distribution, Table S2: Rotation of each vertebra measured by each observer and in each round of measurements.
Institutional Review Board Statement: Ethical review and approval were waived for this study because the complete and irreversible anonymisation of the images did not involve patient data processing.
Informed Consent Statement: Patient consent was waived due to the complete and irreversible anonymisation of the images.