Study on the Reliability and Accuracy of Scolioscope, a New Digital Scoliometer

Early detection of scoliosis through school screening and quick, easy, and reliable assessment of its progression are of paramount importance in the management of patients. Several tools have been described, the most common being the analog scoliometer. Most recently, smartphone applications have entered this area, with and without the use of sleeves for the device. No research has evaluated the accuracy of measurements to both the left and the right in either digital or analog devices. In this study, we evaluated the reliability and validity of a new digital scoliometer called the Scolioscope. Thirty subjects were included in the intra-rater reliability study. ICC values >0.9 were calculated both for same-day and between-day measurements. The device was highly accurate, with an average difference from the angles set on the sine bar of 0.03° for right-side measurements and 0.18° for the left. These measurements suggest a highly accurate and reliable tool.


Introduction
The term scoliosis (scoliosis = crooked) was used by Hippocrates to describe a deformity of the spine. Scoliosis is a three-dimensional deformity [1,2] of the spine with a disorder in the frontal, sagittal, and transverse plane [3]. In the majority of cases (80-85%) the etiology of scoliosis is not known, and the disease is characterized as idiopathic. Idiopathic scoliosis, depending on the age of the patient at the time of diagnosis, is characterized as infantile (0-3 years), childhood (3-10 years), adolescent (10 years-skeletal maturation) and adult scoliosis. The most representative and common form of idiopathic scoliosis is adolescent idiopathic scoliosis with an incidence, according to various studies, in the general population between 0.47% and 5.2% [4]. During the clinical examination for scoliosis, the Adams test is a rough and easy clinical method [5,6]; however, it presents specific weaknesses in terms of its individual application [7].
Simple radiological examination (frontal and sagittal radiographs of the spine) shows three-dimensional deformation over two spinal segments. It firstly offers the possibility of direct measurement of the frontal and sagittal deformity, most commonly with the Cobb method, and secondly allows the indirect assessment of spinal deformation in the transverse plane (rotation of the vertebrae) using various radiological indicators (Lippman-Cobb, Nash-Moe). According to the Scoliosis Research Society (SRS), scoliosis is considered present when the Cobb angle is above 10° [8]. The disadvantages of radiological examination for the detection of scoliotic deformities in the general population are the need for special equipment (radiological machine), with the corresponding financial cost, and the exposure of the examinee to harmful radiation [3].
Diagnostics 2022, 12, 142
The international literature reports the usefulness of early detection of scoliosis in the student population through school screening [9], resulting in an increase (tripling) of patients who could be treated conservatively in time, and a decrease in those who would require future surgery [10,11]. Therefore, reliable and easy-to-use diagnostic tools are necessary for the clinical evaluation of scoliosis [12].
Various complementary clinical examination tools for the detection of scoliosis have been described in the international literature [13], the most widely used being the analog scoliometer (or Bunnell scoliometer), which measures the axial trunk rotation (ATR) during the Adams test [14-16]. The reliability of scoliometer measurements [17] has been rated from very good to excellent [10,16], and their validity, when correlated with the Cobb angle, from competent to good [10,18]. That is, the validity of this tool has been evaluated with regard to whether it can detect the presence of scoliosis and not with regard to the accuracy of the measured angle in degrees, which is the definition of validity [19]. It is worth noting that no research has evaluated the accuracy of scoliometer measurements by comparing them with a given angle, such as one set with a sine bar or by another method. Additionally, the original analog scoliometer (Orthopedics Systems Inc., Union City, CA, USA), which is the version that has been researched, has a relatively large ball that travels in a tube filled with fluid, and so the actual measurement is open to interpretation [6]. In addition, it is rather expensive and not readily available in most countries.
With smartphones entering the market, many relevant applications have been created, including ones measuring ATR. The most researched of these applications is the Scoliogauge (Ockendon Partners Ltd., Shrewsbury, UK). Studies have assessed this application's validity both without [20-22] and with [23] a special phone adapter that has a notch for the spinous process. These studies suggested that the application is reliable and valid against the scoliometer. However, the application can only be used on an iPhone (Apple Inc., Cupertino, CA, USA), which is influenced by software and hardware updates, and the findings therefore cannot be generalized to other electronic devices or similar applications.
There seemed to be a need for a purpose-built device capable of providing clear digital measurements that are not influenced by software or hardware upgrades. The purpose of this study was to assess the accuracy and reliability of a new scoliometer providing digital readouts, called the Scolioscope.

Materials and Methods
This study had two parts. The first part assessed the accuracy of measurements against a given angle, and the second, the reliability of repeated measurements. The full trial protocol was registered at an international clinical trial database (https://clinicaltrials.gov/, NCT04764136, accessed on 6 December 2021). Recruitment took place between July 2021 and October 2021. The study was approved by the ethics committee of the Attikon University General Hospital (338/1-7-2021), and all volunteers (or their guardians/parents if they were under 18 years of age) signed an informed consent form before their participation in the study. The new device has a similar shape to the Bunnell scoliometer, with the difference that it presents the measurement in digital format. The Scolioscope comprises an outer shell made of marine plywood that was designed using three-dimensional design software and built in Greece by G.Kr., using a Computerized Numerical Control (CNC) milling machine. The external dimensions of the shell are 180 × 70 × 15 mm. The digital measuring unit was imported (Shahe, Wenzhou, China) and attached securely to the recess of the shell.

Accuracy Study
In order to evaluate the accuracy of the measurements, the instrument was placed on a 5" sine bar, which in turn was placed on a 30 by 40 cm, 2" thick piece of granite to provide a stable surface plate (Figure 1). The plate was leveled using a laser level unit (Dewalt DW089K, Leola, PA, USA) and four micro-adjustable feet attached to its underside. Fifteen angle values were randomly selected ranging from 0° to 30°, as this is the range measured by the analog scoliometer. For the randomization process, we used https://randomizer.org/ (accessed 15 June 2021). To set the angle on the sine bar, precision machined gauge blocks (grade 0, Milton tools, Quandong, Figure 2) were utilized that raised the end of the sine bar to the height derived by the formula opposite = sin θ × hypotenuse (Figures 3 and 4). The angles, their sines, and the calculated opposites can be found in Table 1. The measurements were assessed in both directions (left and right), and this was done three times for each set angle. The average value of the three measurements was compared against the sine bar angle with the use of the t-test and the Pearson correlation coefficient. In addition, the Bland-Altman plot [24] was used to compare the difference between each coupled value.
The process was performed in a temperature-controlled environment steadily kept at 20 °C, as this was the temperature at which the dimensions of the gauge blocks were initially measured, and any other temperature might potentially have minutely changed them and therefore influenced the results [25].
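The sine bar relation used above (opposite = sin θ × hypotenuse) can be illustrated with a minimal sketch. It assumes the 5" figure is the effective length of the sine bar (the hypotenuse); the function names and the sample angles are illustrative only and are not the fifteen randomized values listed in Table 1.

```python
import math

# Assumption: the 5" sine bar length is the hypotenuse of the triangle.
SINE_BAR_LENGTH_IN = 5.0


def gauge_block_height(angle_deg, bar_length=SINE_BAR_LENGTH_IN):
    """Gauge-block stack height (same unit as bar_length) for a set angle."""
    return math.sin(math.radians(angle_deg)) * bar_length


def angle_from_height(height, bar_length=SINE_BAR_LENGTH_IN):
    """Inverse relation: angle in degrees produced by a given stack height."""
    return math.degrees(math.asin(height / bar_length))


# Illustrative angles inside the 0-30 degree range, not the study's Table 1.
for angle in (0, 5, 22, 30):
    print(f"{angle:>2} deg -> {gauge_block_height(angle):.4f} in")
```

The inverse function is included because it shows how a small error in the stack height translates into an error in the set angle, which is the reason the gauge blocks were kept at their calibration temperature.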

Reliability Study
Power analysis [26] indicated the inclusion of 30 subjects. Inclusion criteria were a Cobb angle ≥ 10° and the ability to bend forward. The exclusion criterion was previous spinal surgery. Demographic characteristics are summarized in Table 2 and full details are given in Table A1 (Appendix A). All participants were recruited from two physiotherapy clinics in Athens, Greece. One observer at each clinic measured the ATR of participants during the Adams forward bend test [14-16]. The observers placed the Scolioscope at the level that they judged to have the maximal rotational deformation during the test, on either the thoracic or the lumbar spine. Both observers were physiotherapists with more than 20 years of experience, five of which were spent assessing patients with scoliosis. Subjects were measured three times at each of two separate sessions, and the observers chose not to mark the level.
Only the intra-rater reliability was assessed at this time. The Intra-class Correlation Coefficient (ICC) was calculated using the "two-way mixed for consistency" model to evaluate the repeatability of the measurements on the same day (intra-rater reliability), and the "two-way random for absolute agreement" model to evaluate the reliability of measurements taken at a separate time (test-retest reliability). For the same day, the three measurements were used to calculate the ICC. To determine reliability between days, the mean value of the three measurements from day 1 and the corresponding value from day 2 were used to calculate the ICC. The standard error of measurement (SEM = √(residual mean square from ANOVA)) and the minimum detectable change (MDC95%CI = SEM × √2 × 1.96) were also calculated both for the same day and between days. For all statistical analyses, we used the IBM (Armonk, NY, USA) SPSS software package version 26.
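As an illustration of the two ICC models named above, the following sketch derives both coefficients from the mean squares of a two-way ANOVA without replication. It is not the SPSS procedure used in the study. It assumes the single-measure forms (the text does not state whether single or average measures were reported), and the readings are hypothetical.

```python
def two_way_mean_squares(data):
    """Return (MS_subjects, MS_sessions, MS_residual) for an n x k table."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))


def icc_consistency(data):
    """Single-measure ICC, consistency definition (two-way mixed model)."""
    k = len(data[0])
    msr, _, mse = two_way_mean_squares(data)
    return (msr - mse) / (msr + (k - 1) * mse)


def icc_absolute_agreement(data):
    """Single-measure ICC, absolute-agreement definition (two-way random model)."""
    n, k = len(data), len(data[0])
    msr, msc, mse = two_way_mean_squares(data)
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical ATR readings in degrees: one row per subject, one column per repetition.
readings = [
    [7.0, 7.1, 7.0],
    [12.3, 12.2, 12.4],
    [5.1, 5.0, 5.1],
    [9.8, 9.9, 9.8],
]
print(round(icc_consistency(readings), 3), round(icc_absolute_agreement(readings), 3))
```

The consistency form ignores systematic shifts between repetitions, whereas the absolute-agreement form penalizes them through the session mean square, which is why the latter is the stricter choice for between-day comparisons.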

Results

Accuracy Study
There was no statistically significant difference between the sine bar set angle and the one measured by the Scolioscope (p > 0.05) for either the left or the right measurements. In addition, the Pearson correlation analysis showed perfect correlation (r = 1) between the actual and the average measured values for both left and right measurements. The average difference between the sine bar set angle and the device was 0.03° (SD 0.03°) for the measurements to the right and 0.18° (SD 0.15°) for those to the left. These results indicate a highly accurate device.
The Bland-Altman plot for measurements to the right (Figure 5) shows an average difference of 0.03° with a 95% confidence interval (CI) from −0.045° to 0.105°. All measurements fell within the upper and lower limits of the CI, except the measurement at 22°, where the difference was 0.12°. The Bland-Altman plot for measurements to the left (Figure 6) shows an average difference of 0.18° with a 95% CI from −0.11° to 0.48°. All measurements fell within the upper and lower limits of the CI.
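The limits shown in a Bland-Altman plot are conventionally computed as the mean difference ± 1.96 standard deviations of the differences. The sketch below follows that convention under two assumptions: the difference is taken as measured minus set angle (the sign convention is not stated in the text), and the example values are hypothetical rather than the study's data.

```python
from statistics import mean, stdev


def bland_altman_limits(reference, measured, z=1.96):
    """Return (bias, lower limit, upper limit) of the paired differences."""
    diffs = [m - r for r, m in zip(reference, measured)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


def outside_limits(reference, measured):
    """Reference angles whose difference falls outside the computed limits."""
    _, lower, upper = bland_altman_limits(reference, measured)
    return [r for r, m in zip(reference, measured) if not lower <= (m - r) <= upper]


# Hypothetical set angles and mean device readings, in degrees.
set_angles = [2, 7, 11, 16, 22, 27]
device_means = [2.00, 7.05, 11.03, 16.00, 22.12, 27.03]
bias, lower, upper = bland_altman_limits(set_angles, device_means)
print(f"bias {bias:.3f}, limits {lower:.3f} to {upper:.3f}")
print("outside limits:", outside_limits(set_angles, device_means))
```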
Figure 5. The Bland-Altman plot for measurements to the right. The red line represents the average difference between the sine bar set angles and the mean measurements, and the green lines, the 95% CI.
Figure 6. The Bland-Altman plot for measurements to the left. The red line represents the average difference between the sine bar set angles and the mean measurements, and the green lines, the 95% CI.

Reliability Study
For the same-day measurements, the ICC was calculated at 0.998 (Table 3), which is an excellent ICC value [27]. The residual mean square from the ANOVA table was calculated at 0.029°, and therefore the SEM was 0.17° and the MDC95%CI was 0.472°.
Table 3 notes: two-way mixed-effects model in which people effects are random and measures effects are fixed. (a) The estimator is the same whether the interaction effect is present or not. (b) Type C intraclass correlation coefficients using a consistency definition; the between-measure variance is excluded from the denominator variance. (c) This estimate is computed assuming the interaction effect is absent, because it is not estimable otherwise.
Regarding between-day reliability, the ICC was calculated at 0.997 (Table 4), which again is an excellent ICC value [27]. The residual mean square from the ANOVA table was 0.1°, and therefore the SEM was 0.316° and the MDC95%CI was 0.876°.
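The reported SEM and MDC values follow from the residual mean squares through the formulas given in the Methods (SEM = √(residual mean square), MDC95%CI = SEM × √2 × 1.96). The short check below reproduces that arithmetic; the function names are illustrative. Results agree with the reported figures to within rounding, and the last digit can differ depending on whether the SEM is rounded before the MDC is computed.

```python
import math


def sem_from_residual_ms(residual_mean_square):
    """Standard error of measurement from the ANOVA residual mean square."""
    return math.sqrt(residual_mean_square)


def mdc95(sem):
    """Minimum detectable change at the 95% confidence level."""
    return sem * math.sqrt(2) * 1.96


# Residual mean squares as reported in the text.
for label, residual_ms in (("same day", 0.029), ("between days", 0.1)):
    sem = sem_from_residual_ms(residual_ms)
    print(f"{label}: SEM = {sem:.3f}, MDC95 = {mdc95(sem):.3f}")
```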

Discussion
This study provided evidence towards the accuracy and reliability of a new digital scoliometer called the Scolioscope.

Accuracy
Regarding the accuracy of the measurements, the mean differences between the actual and the measured angle of 0.03° to the right and 0.18° to the left are minimal for this type of assessment and below the between-day standard error of measurement of 0.316° (Section 3.2). Measurements to the right were more accurate than those to the left overall. However, both sides exhibited results within the upper and lower CI boundaries, except the measurement at 22° to the right, where the difference from the sine bar set angle was 0.12°, which is 0.015° above the upper CI boundary and 0.09° from the mean difference. The difference between the left and right measurements, albeit small, could be associated with either the forming of the wooden shell or the measuring unit, whose manufacturer advertises accuracy within 0.2°. In the clinical environment, and especially when measuring ATR, these differences in accuracy would not be considered significant, and therefore the Scolioscope constitutes a highly accurate tool.
Balg et al. [21] assessed the reliability and validity of a smartphone application (Scoliogauge) to measure ATR. The smartphone (iPhone 4s) was used without an adapter sleeve to cater for spinous process protrusion, and the device was placed directly on the back of the participant. Positioning the smartphone without a sleeve could have produced error, especially in patients with a low body mass index. They used the Bunnell scoliometer as the gold standard against which to measure validity, and reported excellent correlation between the two (>0.9). The mean difference between the devices was 0.3° for the thoracic spine and 0.4° for the lumbar spine. Both the mean difference and the CIs were larger than the ones in this study. However, the results cannot be directly compared, as the authors [21] used the analog scoliometer as the gold standard, for which they provided evidence not of its accuracy in measuring angles but rather of its capability in predicting patients with scoliosis.
Guardia et al. [28] assessed the accuracy of two different smartphones (iPhones 4 and 5) using the application Scolioscreen (Spinologics, Montreal, Canada) against gauge blocks and plaster casts. They assessed both left and right angles and reported consistent measurements for both devices. However, one of the two (it is not reported which) consistently measured the gauge blocks incorrectly, by an average of −1.1° (−0.7° to −1.6°). The variability when measuring the casts was greater, with a maximum difference of 3.3°. This paper [28] was, however, published as an excerpt from an oral presentation, and due to the lack of further information we could not make a direct comparison of the results.
The authors [28] concluded that caution should be used if different phones are used to take measurements or if the iPhone is tilted. In addition, both these applications (Scoliogauge and Scolioscreen) were only available for Apple devices, and up to a few years ago only those had gone through the rigor of research. It was unknown whether Android-based applications and devices could provide accurate measurements.
Naziri et al. [29] tried to cover this knowledge gap by testing the accuracy of four different smartphone applications (two on iOS and two on Android) using the sine bar as the gold standard. They used the Scoliogauge and ScoliTrack on an iPhone 5, and Scoliometer and Scoliosis Measurement on a Samsung Galaxy S3. A manual (analog?) scoliometer also appears in the tables of their results section, although it is not described in their methods section or otherwise commented on, and it is unclear what it was. The authors [29] did not indicate whether they assessed both directions, and reported mean difference values of −0.14° (SD 0.31°) for the ScoliTrack, −0.12° (SD 0.12°) for the Scoliogauge, 0.82° (SD 1.52°) for the Android Scoliometer, and 2.26° (SD 1.28°) for the Android Scoliosis Measurement. The iOS applications on the specific iPhone device produced results similar to those of this study for measurements to the left (mean difference 0.18°, SD 0.15°), but less similar to those for measurements to the right (mean difference 0.03°, SD 0.03°). Additionally, these results are better than the ones from Balg et al. [21]. The Android applications, on the other hand, produced far less accurate results than those of this study. The authors [29] did not use any other iPhone or Android device, and according to Guardia et al. [28], this might have influenced the results. Additionally, as the angle increased, so did the mean error. However, this finding was observed and calculated across all four applications, and therefore it is unknown whether a specific device or application was responsible.
Regarding the validity of the analog scoliometer, Amendt et al. [30] assessed the sensitivity and specificity of the device using an angle of 5° or more (5°-10°) of ATR as a cut-off point in determining patients with scoliosis. The authors [30] reported good predictive values but did not comment on the accuracy of the measurements, and therefore their findings cannot be directly compared to those of this study.
Côté et al. [6] used the Cobb angle as the gold standard. They [6] reported that the scoliometer is more likely than the Adams test to detect thoracic scoliosis, but less likely to detect lumbar scoliosis. The authors [6] concluded that the device had poor precision and inadequate diagnostic accuracy, cannot be used to monitor curve progression due to inherent error, and therefore should not be used as a screening tool. However, they did not measure the actual precision against a given angle, but rather the ability of the device to predict the existence of scoliosis.
There is evidence to suggest [31-33] that an angle of 5° or more (5°-10°) in ATR measured by the scoliometer is necessary to screen for scoliosis in a non-invasive manner. The research available on the analog scoliometer appears inconclusive, while the research on smartphone-based scoliometers is dependent on device and application, and methodological limitations do not allow the results to be generalized. The accuracy of the Scolioscope™ appears better in comparison to both the analog and application versions, and it could therefore be used as a screening tool.

Reliability
The results from this study suggest high intra-rater reliability, with ICC values exceeding 0.9. The between-days SEM was 0.316° and the MDC95%CI was 0.876°. Amendt et al. [30] reported high intra-rater reliability of the analog scoliometer. However, the authors [30] chose the Pearson correlation method, which could overestimate agreement [34], and not the ICC, which evidence suggests is better [35]. In addition, the SEM and MDC95%CI were not reported, and therefore a direct comparison was not possible.
Côté et al. [6] reported excellent inter-rater ICC values for the scoliometer in the thoracic region, and substantial values in the lumbar region. They measured the inter-examiner error at 4.9°. The MDC was not reported. They concluded that the analog scoliometer has poor precision and diagnostic accuracy, and that the Adams test and spinal radiograph should remain the methods of choice to determine patients with scoliosis. As this study examined intra-rater reliability, a direct comparison cannot be made with Côté et al. [6].
On assessing the intra-rater reliability of a smartphone application (Scoliogauge), Balg et al. [21] reported an ICC value of 0.952 for thoracic measurements and 0.966 for lumbar measurements. These measurements were taken on the same day, and even though they were excellent, they were still lower than the ones observed in this study (0.998). No between-days measurements were taken, and neither the SEM nor the MDC was calculated; only a CI of 2.7° was reported. The Scolioscope™ had a SEM of 0.17° for the same-day measurements and, in comparison, shows a much closer spread of measured values.
The reliability of Scoliogauge was also evaluated by Getnet et al. [22], who used the iPhone 4 to evaluate intra- and inter-rater ICC. The observers chose not to place the device in a sleeve, but rather used their thumbs to balance it on the back of their participants. As the thumb is oval and not round, this might have introduced error in their findings [21]. The intra-rater ICC ranged from 0.871 to 0.932 (depending on the spinal segment), with a mean standard error of 5.97°. Even though the authors did not calculate the MDC, using the formula MDC95%CI = SEM × √2 × 1.96, the result is 16.54°. Although the authors [22] provided evidence towards the excellent intra-rater reliability of Scoliogauge, the high standard error and the derived MDC suggest that this application, in combination with the manner used to assess ATR, is not suitable for clinical application [6].
Although excellent reliability has also been observed for both the analog and the smartphone-based scoliometers, the minimal SEM and MDC95%CI of the Scolioscope™ suggest that it is a highly reliable tool.

Limitations/Suggestions
This study assessed intra-rater reliability only. It would be useful to assess inter-rater ICC. It should be noted that the two different observers used two different devices, both of which provided evidence towards the calculation of ICC, SEM, and MDC95%CI.
The average BMI was 20.4, which is considered normal for both genders. Perhaps the inclusion of more overweight or obese participants might have influenced the results, and should therefore be considered in another study.
The literature contains conflicting data regarding the correlation between the angle of trunk rotation and spinal deformity. There is evidence to suggest a positive correlation between the rib hump and the Cobb angle [3,36,37], whereas others [38] reported that there is no clear linear relationship between the rib hump and vertebral rotation, Cobb angle, and vertebral-rib angle. A recent study [39] found a strong correlation between the Cobb angle predicted by a formula using the scoliometer readings and the actual Cobb angle. Further research correlating Scolioscope ATR measurements with the Cobb angle could provide further insight into the tool's clinical usefulness.
This device has just been introduced and is not readily available in the market yet. Anyone wishing to perform further evaluation of the device can acquire it by contacting G. Kr. This device, as any new device, also needs to go through the rigor of time and use to determine its robustness and longevity.

Conclusions
To determine the existence of scoliosis, inexpensive, non-invasive tools and methods with accurate and repeatable measurements are needed. There was a need for a new device that is accurate and reliable and does not depend on software or hardware upgrades. This study provided evidence towards this, indicating that the Scolioscope yields highly accurate and reliable measurements. Further studies including a larger and more inclusive cohort and also assessing inter-rater reliability could provide further evidence towards the clinical usefulness of this new device.