Rasch analysis of the Edmonton Symptom Assessment System and research implications

ABSTRACT

Background
Reliable and valid assessment of the disease burden across all forms of cancer is critical to the evaluation of treatment effectiveness and patient progress. The Edmonton Symptom Assessment System (ESAS) is used for routine evaluation of people attending for cancer care. In the present study, we used Rasch analysis to explore the measurement properties of the ESAS and to determine the effect of using Rasch-proposed interval-level ESAS scoring compared with traditional scoring when evaluating the effects of an exercise program for cancer survivors.

Methods
Polytomous Rasch analysis (Andrich's rating-scale model) was applied to data from 26,645 ESAS questionnaires completed at the Juravinski Cancer Centre. The fit of the ESAS to the polytomous Rasch model was investigated, including evaluations of differential item functioning for sex, age, and disease group. The research implication was investigated by comparing the results of an observational research study previously analysed using a traditional approach with the results obtained by Rasch-proposed interval-level ESAS scoring.

Results
The Rasch reliability index was 0.73, falling short of the desired 0.80-0.90 level. However, the ESAS was found to fit the Rasch model, including the criteria for uni-dimensional data. The analysis suggests that the current ESAS scoring system of 0-10 could be collapsed to a 6-point scale. Use of the Rasch-proposed interval-level scoring yielded results that were different from those calculated using summarized ordinal-level ESAS scores. Differential item functioning was not found for sex, age, or diagnosis groups.

BACKGROUND
Cancer and its treatments (including surgery, chemotherapy, radiation, and hormone therapy) are associated with decreased physical function and adverse psychological and emotional effects. Physical symptoms commonly include fatigue, nausea, pain, and decreased strength and endurance [1][2][3][4]. The importance of appreciating the consequences of cancer-related side effects is magnified as the number of people living with cancer continues to increase [5]. Evaluating the impact of cancer-related (or treatment-related) side effects on patients has evolved beyond physiologic evaluations (for example, changes in hemoglobin levels) to a more holistic view that considers the constellation of symptoms that patients present (for example, pain, nausea, and anxiety) [6].
The Edmonton Symptom Assessment System (ESAS) was developed to assess a variety of symptoms often reported by patients in the palliative care setting regardless of their specific diagnosis [7]. The ESAS consists of 10 questions evaluating symptoms commonly associated with cancer, including pain, fatigue, nausea, depression, anxiety, drowsiness, appetite, well-being, shortness of breath, and an "other" condition identified as important by the patient [7]. Each item is scored on a scale ranging from 0 to 10 (maximum score of 100 if all ESAS items are included), with higher scores indicating a higher burden of disease. The ESAS does not claim to measure health status or quality of life; rather, it focuses on bothersome symptoms that burden cancer survivors and are likely to interfere with quality of life [7]. Clinicians are able to address patient-specific symptoms identified on the ESAS and to record change over time by plotting symptom scores on a graph. Conversely, researchers have used ESAS total scores ("symptom distress score") to evaluate the effects of interventions such as exercise [8] or a combination of symptom management strategies on overall disease burden [9].
The ESAS has been extensively studied throughout the cancer care continuum in a variety of cancer diagnostic groups, and its validity and reliability have been studied in various settings and with various patient populations [10][11][12][13]. The ESAS has also been incorporated as a "standard of care assessment" by Ontario's cancer centres, which suggests that it should be used routinely in practice and research [http://www.cancercare.on.ca/cms/one.aspx?portalId=1377&pageId=57699 (accessed February 7, 2013)]. Chang et al. [14] conducted a validation study relating the ESAS to the Memorial Symptom Assessment Scale and the Functional Assessment of Cancer Therapy. In a sample of 233 cancer survivors, those authors reported that the ESAS had an overall Cronbach alpha of 0.79 with a 1-day test-retest reliability (Spearman correlation) in the range 0.39-0.86, depending on the symptom measured [14]. However, in a 15-year retrospective review of ESAS validation studies, Nekolaichuk et al. [10] located 13 publications specifically evaluating various versions of the ESAS and concluded that, although this tool has been extensively adopted into clinical use, psychometric validation is limited in scope and that further studies are required to address the gap.
To date, the literature validating the ESAS or using the scale as an outcome measure has used mostly statistical methods such as regression, correlations, and traditional descriptive statistics such as means, standard deviations, and frequencies [6,7,14,15]. Those statistical methods are grounded in classical test theory [16] and are based on associated theoretic assumptions such as that estimates of reliability and validity apply only to the population tested, and that a greater number of questions or items reduces the variability directly attributable to random error. In contrast, Rasch analysis is a statistical method in the tradition of item-response theory. It therefore includes the assumptions that the easier the item on a scale, the more likely it is that respondents will obtain a positive score on that item; that respondents with more ability are more likely to obtain a higher score on any given item; and that the measurements do not depend on the study population if the Rasch model can be applied (that is, items on the scale are locally independent, meaning that no single data point has influence on the value of another) [17]. Further, Rasch analysis is based on mathematical modelling that supports the use of ordinal scaling (such as the scaling used in the ESAS) for interval-level calculations (such as overall score summation) if the scale is found to fit the Rasch model [17]. Pallant and Tennant [18] published an introduction to the Rasch model that included a review of some of the model's mathematical formulation. Tennant and Conaghan [19] also summarized the Rasch measurement model and discussed why and when Rasch should be used. Bond and Fox [20] provide a basic introduction to Rasch analysis, and Velozo et al. [21] describe the use of Rasch to produce scale-free measurements of functional ability. Rasch analysis has also been used in the development and evaluation of physical abilities questionnaires [22,23].
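As an illustration of the Rasch assumptions described above, the following minimal Python sketch computes category probabilities under Andrich's rating-scale model for one item scored 0-10. It is a sketch only: the function names, the person measure `theta`, the item difficulty `delta`, and the threshold values in `taus` are hypothetical and are not estimates from this study.

```python
import math


def rsm_category_probs(theta, delta, taus):
    """Category probabilities for one item under Andrich's rating-scale model.

    theta: person measure in logits
    delta: item difficulty in logits
    taus:  thresholds shared by all items (one per step between categories)
    Returns a list of probabilities for categories 0..len(taus).
    """
    # Category 0 has an empty sum; category k accumulates (theta - delta - tau_j)
    # over the first k thresholds.
    cumulative = [0.0]
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Model-expected item score given the category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical thresholds: 10 steps for an 11-category (0-10) item.
taus = [0.0] * 10
low = rsm_category_probs(theta=-1.0, delta=0.0, taus=taus)
high = rsm_category_probs(theta=1.0, delta=0.0, taus=taus)
```

With these illustrative values, a person located below the item (`theta` less than `delta`) is most likely to choose the lowest categories, and the expected score rises as `theta` increases, which is the ordering property the model assumes.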
To date, we are unaware of any published examinations of the measurement properties of the ESAS based on Rasch analysis. That gap is a critical one, given that most published studies that have used the ESAS have used parametric statistics whose assumptions would be violated if the scale is ordinal. The purposes of the present study were to use Rasch analysis to estimate the measurement properties of the ESAS, and to determine whether a Rasch-driven scoring metric provides results different from those of the current scoring metric in an evaluation of the effects of an exercise program for people with cancer.

Participants
At the Juravinski Cancer Centre (Hamilton, ON), ESAS scores are collected electronically from patients attending appointments as part of the standard program of care. At electronic kiosks located at the entrance to the cancer centre, all patients sign in and score the ESAS based on their symptoms at the time of completion. Patients completing the ESAS might be visiting the cancer centre at any stage of their cancer journey, including for initial assessment before or after a cancer diagnosis, for chemotherapy or radiation therapy treatments, and for short- or long-term medical follow-up. The completed ESAS scores are forwarded to the medical team for review at the patient's appointment; the needs identified by the results guide the care provided. After permission was obtained from the joint Ethics Board of Hamilton Health Sciences and McMaster University, all ESAS questionnaires completed between November 1, 2010, and October 31, 2012, were retrieved for inclusion in the present retrospective study. All completed ESAS forms were included in the analysis. Each patient in the sample contributed only one ESAS to the current analysis.

Rasch Analysis
Rasch analysis was designed a priori and included these components: Andrich's rating-scale model ("polytomous Rasch analysis") was used because ESAS scoring is polytomous. The fit of items and persons to the Rasch model was evaluated against InFit and OutFit mean-square bounds of 0.7 and 1.4 [24]. The assumptions of local independence and uni-dimensionality, evaluation of differential item functioning (DIF), and person separation reliability were analyzed according to the guidelines provided by Linacre [24]. To evaluate the current measurement properties of the ESAS, item deletion or re-scoring was not performed. Response reordering was not performed in the current study [25]; however, item misfit was evaluated [24].
Current Oncology, Volume 21, Number 2, April 2014. Copyright © 2014 Multimed Inc. Following publication in Current Oncology, the full text of each article is available immediately and archived in PubMed Central (PMC).
After the Rasch analysis, ESAS scores reported by cancer survivors participating in a community-based exercise program (CanWell program) [8] were converted to standardized scores proposed by the Rasch analysis results and reanalyzed.
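The InFit and OutFit criteria mentioned in this section can be sketched in a few lines of Python. This is a minimal illustration using the standard definitions of the two mean squares (OutFit as the unweighted mean of squared standardized residuals, InFit as the information-weighted version), not the Winsteps implementation; the function names and the example values are hypothetical.

```python
def fit_mean_squares(observed, expected, variance):
    """Return (infit, outfit) mean squares for one item or one person.

    observed: observed scores
    expected: model-expected scores for the same responses
    variance: model variance of each response
    """
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # OutFit: plain average of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variance)) / len(sq_resid)
    # InFit: squared residuals weighted by the model variance (information).
    infit = sum(sq_resid) / sum(variance)
    return infit, outfit


def within_bounds(mnsq, lower=0.7, upper=1.4):
    """Check a mean square against the 0.7-1.4 bounds used in this study."""
    return lower <= mnsq <= upper


# Hypothetical responses whose squared residuals equal the model variance,
# so both mean squares come out at the ideal value of 1.0.
infit, outfit = fit_mean_squares([1, 3], [2.0, 2.0], [1.0, 1.0])
```

A mean square near 1.0 indicates that the observed variation is about what the model predicts; values below the lower bound suggest responses that are more predictable than expected, and values above the upper bound suggest noise.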

Statistical Analysis
All ESAS scores were exported from electronic patient charts to an Excel database and were subsequently imported into IBM SPSS Statistics (version 20.0: IBM, Armonk, NY, U.S.A.) for analysis of demographic information, including date of birth, diagnosis type, visit date, and sex. Rasch analysis was completed using the Winsteps Rasch measurement software (version 3.80.1: Linacre JM, Beaverton, OR, U.S.A.). To explore any potential changes in the conclusion about the effects of exercise on cancer disease burden in CanWell participants, a traditional (ordinal-level scoring) repeated-measures analysis and a proposed Rasch (interval-level scoring) analysis of the ESAS scores were performed using IBM SPSS Statistics. Statistical analyses were two-sided, with significance set at p < 0.05.

RESULTS
During the study period, 26,645 completed ESAS forms were collected as already outlined. Patients identified as "non-cancer respondents" (n = 13,178) were those awaiting pathology confirmation of their cancer diagnosis. Of all patients completing the ESAS, 1.7% (n = 443) did not report their cancer diagnosis. Of those reporting a diagnosis, the largest group was women with breast cancer (12.1%), followed by patients diagnosed with genitourinary cancers, including prostate cancer (10.2%). Table I presents full demographic information for the study sample. Regardless of the symptom measured, the most common ESAS response in the study sample was 0 (Table II).
Table III presents Rasch summary statistics. Extreme scores defined by the Rasch software were omitted by the software because they do not provide additional information about the relative difficulty of the ESAS categories, and they reduce reliability scores [24]. The mean score on the ESAS as a whole was 18.7 out of 90, which indicates that the average response on each ESAS item was roughly 2 out of 10, demonstrating that patients with cancer in this sample generally score at the low end of the response range (0-10, with 10 representing worse symptoms) on each item. A general pattern of responses at the lower end of the continuum is further supported by the negative person mean for the measure in its entirety (-0.81). Considering the separation statistic (range: 1.53-1.66) and the patient reliability (0.73) together might indicate that the ESAS is not sufficiently sensitive to distinguish between patients who score high and those who score low. Ideally, separation values would be greater than 2, and reliability would be at least 0.80 or, better yet, 0.90 or greater [24]. In contrast, the classical test theory-based calculation of the Cronbach alpha on raw patient scores yielded a reliability coefficient of 0.88. When the extreme scores were included, separation and reliability values were further reduced.
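The separation and reliability figures quoted above are two views of the same quantity. A minimal Python sketch, assuming the usual Rasch relationship reliability = G^2 / (1 + G^2) for a separation index G (the function names are hypothetical), shows how the reported values line up:

```python
def reliability_from_separation(g):
    """Convert a separation index G to a reliability coefficient."""
    return g * g / (1.0 + g * g)


def separation_from_reliability(r):
    """Convert a reliability coefficient back to a separation index."""
    return (r / (1.0 - r)) ** 0.5


# Separation range reported in this study.
lower = reliability_from_separation(1.53)
upper = reliability_from_separation(1.66)

# Average response per item implied by a mean total of 18.7 over nine items.
mean_per_item = 18.7 / 9
```

Under that relationship, a separation of 2 corresponds to a reliability of 0.80 and a separation of 3 to 0.90, which is why the two desired thresholds are usually quoted together; the upper end of the reported separation range (1.66) corresponds to a reliability of about 0.73.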
Considering item fit criteria (that is, the ability of each individual item to measure a unique level of cancer symptoms), the OutFit mean square range is expected to fall between 0.7 and 1.4 [26], with values in the range 0.5-1.7 considered to be appropriate for clinical observations, and values in the range 0.5-1.5 being productive for Rasch measurement [27]. For the present study, all items but "wellbeing" (item 8 mean square: 0.65) had an acceptable item fit within the Rasch model, indicating that we are able to predict with a certain degree of accuracy how any given patient will respond to a given item. Further evidence that the items are consistently discriminative comes from the calculated point-measure correlations, which ranged up to 0.74 [28].
For a measure to fit the Rasch model, an important condition is that the tool in question is uni-dimensional [24] (it measures one theoretical construct). Uni-dimensionality is substantiated within the Rasch model if the unexplained variance in the first contrast is less than 4 times the total unexplained variance [24]. In the present analysis, the unexplained variance in the first contrast was 8.6% and the total unexplained variance was 45.5%. Additionally, the eigenvalue of the first contrast was calculated to be 1.6 (should be less than 3 to support uni-dimensionality [24]). Those results support the contention that the ESAS measures one overall construct, that is, disease burden. Item maps are used to indicate the degree of difficulty of each individual item relative to both the scale and the construct of interest. Figure 1 presents the ESAS item map. The figure demonstrates that the patients in our sample are less "able" than the items are "difficult" (the person mean is roughly -1, and the item mean is centred on 0). The importance of that finding is discussed later in this paper.
Another key component of Rasch analysis involves an evaluation of the actual measurement or scoring used. The ESAS score for each item is in the range 0-10, offering the respondent 11 options from which to select. Figure 2 demonstrates that patients are not utilizing or discriminating well between response options 2-7 (indicated by clustering of the responses).
The DIF was evaluated by looking at patient age, sex, and diagnostic groups (as outlined in Table I).
The DIF is considered to be a factor when a DIF size of 0.43 logits or more is calculated [24]. In the present study, the sex DIF ranged from -0.16 to 0.11. All DIF logits calculated for the diagnosis groups and age groups were also less than 0.43 (data not shown).
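The DIF criterion described above amounts to comparing the absolute size of each item's between-group contrast with 0.43 logits. A minimal Python sketch follows; the function name and the item labels are hypothetical, and only the two end points of the reported sex DIF range (-0.16 and 0.11) come from this study.

```python
DIF_THRESHOLD_LOGITS = 0.43


def flag_dif(contrasts, threshold=DIF_THRESHOLD_LOGITS):
    """Return the items whose absolute DIF contrast meets the threshold.

    contrasts: mapping of item label -> DIF contrast in logits
               (item measure in one group minus the other group).
    """
    return [item for item, size in contrasts.items() if abs(size) >= threshold]


# Hypothetical item labels carrying the extremes of the reported sex DIF range.
sex_dif = {"item_with_lowest_contrast": -0.16, "item_with_highest_contrast": 0.11}
flagged = flag_dif(sex_dif)
```

Because both extremes fall well inside the threshold, no item is flagged, which matches the conclusion that DIF by sex was not found.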
To evaluate the research implications of developing Rasch-proposed ESAS scoring (Table IV), the ESAS scores reported by CanWell participants [8] were converted to interval-level scores and subjected to a repeated-measures analysis. Contrary to the initial results, which showed participants reporting statistically significant reductions in overall disease burden [F (2, 102) = 3.37; p < 0.05; power: 0.6], the reanalysis (Table V) failed to show statistically significant changes [F (2, 102) = 1.6, p = 0.21].
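The two p values reported above can be checked directly from the F statistics. The Python sketch below uses the closed form of the F distribution's upper-tail probability that holds when the numerator degrees of freedom equal 2, namely p = (1 + 2F/df2)^(-df2/2); it assumes no sphericity correction was applied to the degrees of freedom, and the function name is hypothetical.

```python
def f_upper_tail_df1_2(f_value, df2):
    """Upper-tail probability of an F(2, df2) statistic.

    Valid only for 2 numerator degrees of freedom, where the F survival
    function reduces to (1 + 2 * F / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Traditional ordinal-level scoring: F(2, 102) = 3.37
p_ordinal = f_upper_tail_df1_2(3.37, 102)

# Rasch-proposed interval-level scoring: F(2, 102) = 1.6
p_interval = f_upper_tail_df1_2(1.6, 102)
```

Under that assumption, the ordinal-level result falls below the 0.05 significance level and the interval-level result rounds to the reported 0.21, so the change in conclusion follows from the change in F rather than from any change in degrees of freedom.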

DISCUSSION
The present study establishes that the ESAS fits the requirements of the Rasch model in that it measures a single construct (disease burden) and that ESAS scores can be converted to an interval-level scoring metric. Our study adds to previous studies demonstrating that the ESAS has moderate measurement reliability and provides important new information about the structure and scoring of the ESAS.
Several studies have used a variety of statistical methods to investigate the reliability and validity of the ESAS [10,15]. In two studies of patients with kidney disease and of patients on hemodialysis [15,29], the reliability of the ESAS was reported to be moderate-to-high (intraclass correlation coefficient: 0.70; p < 0.01). However, in a 15-year narrative review of ESAS validation studies, Nekolaichuk et al. [10] concluded that the instrument lacks the psychometric evidence for such reliability, possibly because of the various ESAS formats used in the validity studies [10], and that more validation studies are needed. Richardson and Jones [11] conducted a narrative review of the reliability and validity of the ESAS. They found 33 studies that evaluated the reliability or validity of the ESAS in patients with cancer and concluded that the ESAS is a reliable tool (correlation coefficients in the range 0.56-0.74) with restricted validity [11]. Neither review located a study evaluating the ESAS using Rasch analysis.
In the present study, the results of the Rasch analysis support use of the ESAS as a global measure of disease burden. Although the ESAS has items that might be considered to primarily evaluate physical health (pain, fatigue, nausea, drowsiness, appetite, and shortness of breath) and emotional health (depression, anxiety, well-being), our analysis demonstrated that, in patients with cancer, the ESAS total score can assess one factor: overall symptom burden. That is, the ESAS can be considered uni-dimensional. Although the symptoms measured by the ESAS are diverse, they are common complaints attributed both to the disease and to the treatment process that are often independently used to direct patient care. For example, high scores on the pain item might trigger a referral to the pain management team.
A possible question then arises: Should the ESAS be used as an evaluation of each item independently or of the total score?
The answer depends on the setting in which the ESAS is used. In the clinical setting, plotting change in the independent ESAS items over time provides the clinician with more important information than the total score does. Conversely, when evaluating the overall effect of an intervention (exercise, for example), the total ESAS score might be more relevant. Although our study has demonstrated that the ESAS is uni-dimensional, a consideration of the constructs being evaluated and of the purpose of the measurement tool is important [20,24].
We calculated the Cronbach alpha of the raw ESAS scores as 0.88, which is similar to results reported in other studies [11,14]. However, the observed patient reliability of 0.73 calculated using Rasch analysis indicates some error in how Rasch assigns person ability [24]. A possible reason for variations in ESAS reliability is that the overall scale consists of only nine items, and that study samples vary, creating some instability in the findings. Increasing the number of items might increase reliability, but might also potentially increase the response burden. Such a change would conflict with the intended purpose of the ESAS to provide a tool that is easy for patients to complete [7]. Richardson and Jones [11] suggested adding 2-3 items evaluating other symptoms to the ESAS to potentially increase the tool's reliability and to provide important additional information for the health care team. To strengthen content validity, such items could be generated from participative research. However, additions of this kind should include well-written questions that do not detract from the overall fit of the ESAS with the Rasch model, ensuring that the tool continues to be uni-dimensional in evaluating disease burden. The addition of new questions at the higher levels of the ESAS might increase the spread of the patients who respond to the ESAS, further improving its reliability.
The review by Richardson and Jones [11] discussed the fact that some patients interpreted the ESAS 11-point scale as having fewer categories. Our patient respondents had similar difficulties. As demonstrated in Figure 2, response options 2-7 are clustered for most items, indicating that patients respond to them similarly. That observation suggests that the 11 response options on the scale are too many, and that patients are not easily able to discriminate between them. Future iterations of Rasch analysis of the ESAS might want to consider collapsing the response categories to assess whether the fit to the model and the tool reliability change.
Rasch analysis allows for an examination of bias from individual factors such as age and sex, and an exploration of the effects on ESAS scores of a specific cancer diagnosis (for example, breast vs. prostate cancer). Although it is reasonable to assume that people with different cancers might interpret the ESAS differently, our study did not identify DIF between diagnosis groups, or sex and age groups. Given that the data for the present study were collected in an ambulatory cancer centre setting, it is unknown whether patients in a palliative setting would have a different distribution of ESAS scores. Future research might include data from the palliative setting in combination with our data to examine whether DIF is present for the two clinical settings. Such an analysis would contribute important information to the understanding of how ESAS scores might be interpreted by different patient groups [30]. It would also support generalizability of results, because a major advantage of the Rasch tradition is that the item and person measures are not sample-dependent if the data can be shown to fit the Rasch model after adjustment for DIF [26].
The research implications of the present study are highlighted using data from the CanWell study [8]. Rasch-proposed interval-level ESAS scores (Table IV) were used to review the results of that study, in which the effects of a supervised community-based exercise program for people with cancer were evaluated. The authors reported statistically significant reductions in overall ESAS scores after completion of the 12-week exercise program [8]. However, using Rasch-produced interval-level scoring for the ESAS, a revised analysis found no statistically significant change in ESAS scores.
Although the overall conclusion of that study (that exercise does not have a negative effect on people with cancer) did not change, the conclusion that exercise reduced overall disease burden in the patient sample might have to be modified. Other researchers are encouraged to use the Rasch-proposed interval-level ESAS scores when evaluating intervention effects on the overall disease burden in cancer.

Limitations
It is important to acknowledge that, although our study demonstrated that the ESAS as a whole fits the Rasch model, some issues, such as the lower patient reliability, remain. Newer versions of the ESAS have added short explanations under items such as wellbeing ("wellbeing = how you feel overall") to help improve reliability and validity. The data collected in the present study antedate those changes. It is important that research and clinical settings use the newest versions of the ESAS to allow for ongoing evaluation and improvements. Future research should include Rasch analysis of data collected with the revised version of the ESAS currently used in routine care.
When the data for our study were collated, cancer stage was not available for the responding patients, limiting the ability of readers to compare their patients with ours. It is possible that patients in a palliative setting might respond differently. However, considering that one of the major advantages of Rasch analysis is that items and person measures are sample-independent within the same population, ambulatory and palliative patients would both be able to be positioned on an "ESAS ruler" and identified using DIF analysis.

CONCLUSIONS
Our study demonstrates that the ESAS fits the Rasch model and can be converted to an interval-level scale.
This study also supports the notion that the ESAS can be summed to produce an overall disease burden score. However, future research is needed to evaluate whether cancer survivors in different clinical settings interpret ESAS scores differently. Additionally, when interpreting ESAS scores in research, it is important to consider converting patient-reported ordinal results to interval-level scores before conducting parametric statistical testing.

Figure 1. Item map of the Edmonton Symptom Assessment System (ESAS). The Rasch measure, ranging from -3 to 2 (negative values indicate either patients of lesser "ability" or items that are "easier" for the patient to endorse at high levels, and vice versa for positive values), is shown on the far left. Patients are shown immediately to the left of the vertical line, and the ESAS items, to the right. The letter "M" in the figure indicates the mean for patients and items. "S" is one standard deviation from the mean, and "T" indicates two standard deviations from the mean.

Figure 2. Keyform demonstrating response consolidation around response options 2-7 of the Edmonton Symptom Assessment System.

Table II. Response frequencies for the Edmonton Symptom Assessment System. a Percentage of all respondents (n = 26,645).

Table III. Rasch person summary statistics, excluding extreme scores (n = 22,871). MNSQ = mean squares; ZSTD = Z-standardized; SD = standard deviation; RMSE = root mean square error.

Table IV. Raw score-to-interval scale conversion, calibrated to 0-90 scoring. SE = standard error.

Table V. Edmonton Symptom Assessment System means, current and re-scored data. Statistically significant reduction over the 12 weeks of the CanWell exercise program.
a SD = standard deviation.