Inter-rater reliability between musculoskeletal radiologists and orthopedic surgeons on computed tomography imaging features of spinal metastases

On autopsy, 70%–80% of cancer patients show bone metastases 1. The most common site of bony metastasis is the spine 2, and associated back pain can result from tumour-related and mechanical causes. Tumour-related pain may be caused by inflammatory mediators, tumour stretching the periosteum of the vertebral body, and nerve-root compression 3; mechanical pain results from structural abnormalities of the spine, such as a pathologic compression fracture. The most effective sequence of interventions, that is, the use of radiotherapy (rt) or minimally invasive surgical procedures, depends on reliable evaluation of metastatic spine involvement and its features 4–6. Computed tomography (ct) is the imaging modality most commonly used for evaluating the bony spine. However, despite continuing advancement in ct image resolution and better visibility of tumours, variability in interpretation of the target volume remains a major source of error 7,8. Bowden et al. found that application of a delineation protocol improved accuracy in identifying target volume. Their improved protocol includes guidelines concerning

ABSTRACT


Introduction
The primary objective of this pilot study was to examine the inter-rater reliability in scoring the computed tomography (ct) imaging features of spinal metastases in patients referred for radiotherapy (rt) for bone pain.

Methods
In a retrospective review, 3 musculoskeletal radiologists and 2 orthopedic spinal surgeons independently evaluated ct imaging features for 41 patients with spinal metastases treated with rt in an outpatient radiation clinic from January 2007 to October 2008. The evaluation used spinal assessment criteria that had been developed in-house, with reference to osseous and soft tissue tumour extent, presence of a pathologic fracture, severity of vertebral height loss, and presence of kyphosis.
The Cohen kappa coefficient between the two specialties was calculated.
The decision to proceed to surgery is often made from an evaluation of ct imaging by the orthopedic spinal surgeon (oss) involved. In 1989, Mirels proposed an innovative image-based scoring system for impending pathologic fractures of long bones 10; that system has been widely used since its introduction. A similar prognostic tool, developed within a multidisciplinary setting, for spinal compromise secondary to metastatic disease is not currently available.
The purpose of the present pilot study was to assess reliability in the scoring of ct imaging features between musculoskeletal radiologists (msks) and osss.

METHODS
We retrospectively reviewed 41 patients with spinal metastases who were receiving rt in an outpatient palliative clinic at a tertiary care hospital from January 2007 to October 2008. Given the retrospective nature of this study, the assessment was performed using the available ct imaging from routine rt simulation (3-mm slices). Features of the ct images were independently evaluated by 3 msks and 2 osss, using in-house spinal assessment criteria that included features that both expert groups thought important to capture (Appendix A). The spinal assessment criteria included
• radiated site, including cervical, thoracic, lumbar, and sacral spine;
• extent of tumour involvement;
• type of lesion (that is, sclerotic, lytic, mixed);
• presence of pathologic fracture;
• height loss;
• column involvement;
• soft-tissue component;
• nerve-root compression; and
• kyphosis.
The msk scoring was considered the clinical standard.

Statistical Analysis
Descriptive statistics are expressed as means and standard deviations for quantitative variables and as proportions for qualitative variables. The percentage agreement between msks and osss was calculated for each spinal metastasis assessment criterion. The weighted Cohen kappa coefficient (κ) was also calculated, after adjusting for weighting information, to test agreement between the two specialties at the 95% confidence level. The weight used in the calculation was a binary variable indicating whether cancer was seen (1 = Yes, 0 = No), because some cancers were seen only by the msk group. Primary cancer type was recorded from the pathology report and captured in the demographic data. Some weighted kappa values were not calculated because of the low number of cells in the cross table. A kappa value of 1 implies perfect agreement; values less than 1 imply less-than-perfect agreement. These were the agreement categories used in the study 11
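The two agreement measures described above can be sketched in a few lines. The following is a minimal illustration, not the authors' analysis code: it computes percentage agreement and the unweighted Cohen kappa for two raters, and it does not reproduce the study's weighting by the "cancer seen" variable. The function names and the `msk` and `oss` ratings are hypothetical and invented for the example.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Percentage of cases on which the two raters give the same score."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(ratings_a)
    # Observed proportion of agreement.
    p_o = percent_agreement(ratings_a, ratings_b) / 100.0
    # Chance agreement from each rater's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical yes/no scores for "pathologic fracture present" in 10 cases.
msk = ["yes", "yes", "no", "no", "yes", "no", "no", "no", "yes", "no"]
oss = ["yes", "no", "no", "no", "yes", "no", "yes", "no", "yes", "no"]

print(f"agreement = {percent_agreement(msk, oss):.1f}%")
print(f"kappa     = {cohen_kappa(msk, oss):.3f}")
```

In this invented example the raters agree on 8 of 10 cases, yet the kappa is noticeably lower than the raw agreement suggests, because part of that agreement is expected by chance.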

RESULTS
Agreement was 94.7% for kyphosis, 97.5% for vertebral body involvement (yes/no), and 84.6% for the type of lesion (lytic, sclerotic, or mixed). For these last three criteria, the Cohen kappa could not be calculated because the sample size was almost negligible. Tables ii and iii respectively set out the percentage agreement between the msks and osss and the Cohen kappa coefficient for inter-rater agreement on the spinal metastases assessment criteria. Overall, high percentage agreement was observed in most areas, whereas the inter-rater agreement between the two specialties, as measured by kappa, ranged from moderate to poor (Figure 1).
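
High percentage agreement alongside a low kappa value is a general property of the statistic, because kappa discounts the agreement expected by chance. As a hedged illustration using hypothetical counts (not data from this study), suppose two raters score 40 cases as yes or no, agree on "no" in 35 cases and on "yes" in 1 case, and disagree in 4 cases split evenly, so that each rater scores "yes" 3 times and "no" 37 times. With $p_o$ the observed agreement and $p_e$ the agreement expected by chance:

```latex
\begin{align*}
p_o &= \frac{35 + 1}{40} = 0.90, \\
p_e &= \left(\frac{3}{40}\right)^{2} + \left(\frac{37}{40}\right)^{2}
     = \frac{1378}{1600} \approx 0.861, \\
\kappa &= \frac{p_o - p_e}{1 - p_e}
        \approx \frac{0.90 - 0.861}{1 - 0.861} \approx 0.28.
\end{align*}
```

In this invented case, 90% raw agreement corresponds to a kappa of only about 0.28, because one category dominates and most of the agreement would be expected by chance.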

DISCUSSION
Our study investigated the inter-rater reliability between oss and msk specialists in the assessment of disruption in the bony architecture secondary to spinal metastases. Simulation ct images with 3-mm slices were used in the assessments. The following findings are salient:
• Moderate agreement in differentiating pedicle or lamina involvement
• Poor agreement for the degree of pedicle or lamina involvement
• Fair agreement regarding fracture type
• Poor agreement regarding vertebral height loss
• Poor agreement for identifying nerve-root compression
The poor agreement for lamina, pedicle, and anterior or posterior column involvement is most likely a result of the known difficulty in quantifying metastatic bone disease. To effectively quantify metastatic tumour involvement in the spine, accurate segmentation of the vertebra is required. Manual segmentation can be accurate, but involves extensive and time-consuming user interaction 12. Hardisty et al. proposed an algorithm that allows for semi-automated quantification of bone involvement by tumour; however, their method is still time-consuming, useful only in the hands of an experienced user, and not widely available 13. Differentiation between wedge and burst patterns in a pathologic fracture was also poor, but such differentiation is somewhat of a gray area. The definition of a burst fracture is based on retropulsion of the posterior cortex. With a wedge fracture, the appearance of the posterior cortex can be similar if the tumour extends posteriorly.
Agreement was also poor with regard to height loss. That lack of agreement raises concerns, because patients with vertebral collapse have been shown to benefit from surgical intervention 4,6. Walraevans et al. 14 showed that, for precise identification of height loss, objective scoring systems are required. These scoring systems often depend on experience and the discipline involved.
Another area of poor agreement was the identification of nerve-root compression. Compared with ct, magnetic resonance imaging is well known to be a more sensitive modality for evaluating the spinal canal and nerve roots, and that difference could perhaps explain the level of discordance in scores based on ct imaging 15,16.
The identified areas of poor agreement are central to clinical decision-making, and thus highlight the need for objective measures to quantify disease, to validate clinical outcome, to contribute to the efficiency of clinical trials, and to raise the degree of certainty for clinicians attempting to correlate interval change with true change in the clinical status of the patient 17. The manner in which different clinicians "see" the tumour burden in a vertebra forms their perception of the clinical issues of stability and prognosis. If the tumour is variably measured, then clinical judgments can be expected to vary similarly. The present study may serve as an early investigational step in determining the need for a prognostic tool to evaluate metastases to the spine. Our findings might be important in formulating consensus-based protocols for a multidisciplinary approach to managing challenging vertebral injuries secondary to spinal metastases.
Our study is limited by the sample size and the diagnostic quality of the imaging. Specifically, we note that the kappa values were influenced by the sample size rather than by variance in the interpretation between the two specialties 11. Validation of this study in a larger cohort of patients undergoing either surgery or radiation therapy should be considered for the future. Lack of agreement in the scoring of ct imaging features could prove to be a critical factor in therapeutic decision-making. A standardized method of characterizing spinal metastases with explicit guidelines would be helpful in triaging patients to the most appropriate treatments.

[ ] Cervical Spine Level ____ (specify)
[ ] Thoracic Spine Level ____ (specify)
[ ] Lumbar Spine Level ____ (specify)
2. Extent of tumor involvement:
Vertebral body only = ____% involvement
[ ] Posterior elements
    [ ] Pedicle = ____% involvement
        [ ] Due to tumor  [ ] Due to fracture  [ ] Due to both
    [ ] Laminar = ____% involvement
        [ ] Due to tumor  [ ] Due to fracture  [ ] Due to both
3. Type of lesion:

table iii   Cohen kappa coefficient for inter-rater agreement between musculoskeletal radiologists and orthopedic spinal surgeons on spinal metastases assessment criteria

figure 1   Cohen kappa coefficient for inter-rater agreement between musculoskeletal radiologists and orthopedic spinal surgeons on spinal metastases assessment criteria. R = right; L = left.

Current Oncology, Volume 18, Number 6, e286. Copyright © 2011 Multimed Inc. Following publication in Current Oncology, the full text of each article is available immediately and archived in PubMed Central (PMC).