A Cross-Cultural Adaptation of the ICECAP-O: Test–Retest Reliability and Item Relevance in Swedish 70-Year-Olds

Background: While there is a plethora of Quality of Life (QoL) measures, the Investigating Choice Experiments for the Preferences of Older People-CAPability index (ICECAP-O) is one of the few that taps into the concept of capability, i.e., opportunities to ‘do’ and ‘be’ the things that one deems important in life. We aimed to examine test–retest reliability of the ICECAP-O in a Swedish context and to study item relevance. Methods: Thirty-nine 70-year-olds who took part in a population-based health study completed the Swedish version of the ICECAP-O on two occasions. We analyzed the test–retest reliability for the index and for the individual items. Participants also rated the relevance of each item on a visual analogue scale (0–100). Results: Test–retest agreement for the index score was good, with an ICC of 0.80 (95% CI 0.62–0.90). However, Kappa was low for each item and ranged from 0.18 (control) to 0.41 (role). For attachment, we found a systematic disagreement with lower ratings at the second test occasion. Participants gave their highest relevance rating to attachment and lowest to enjoyment. Conclusion: The Swedish version of the ICECAP-O had good test–retest agreement, similar to that observed for the English version. Item level agreement was problematic, however, highlighting a need for future research.


Introduction
The concept of capability provides an alternative approach to the study of human well-being. Capability can be described as people's ability to perform actions in order to reach goals that they have reason to value [1]. At the core of this theoretical framework are a person's functionings (beings and doings) and capabilities (the genuine opportunities or freedoms to realize these functionings). The distinction between functionings and capabilities is between achievements, on the one hand, and freedoms or valuable opportunities on the other [1]. The concept of capability has been highly influential in welfare economics, philosophy, and social and political science [2].
Although the capability approach is difficult to apply empirically because of the subjectivity of the concept, a capability-based measure of general Quality of Life (QoL) has recently been developed. The Investigating Choice Experiments for the Preferences of Older People-CAPability index (ICECAP-O) [3–5] is conceptually linked to Amartya Sen's capability approach [3,6] and designed to reflect QoL in its broadest sense [7].
To date, there is no Swedish language QoL questionnaire that taps into the concept of capability. As the ICECAP-O encompasses areas identified as highly valued in the wellbeing of community-dwelling [8], as well as frailer older persons [9,10] living in Sweden, we wanted to adapt the questionnaire to the Swedish setting.
The main aim of this paper, which is the first in a series, is to evaluate test-retest reliability of the ICECAP-O in a Swedish context. A further aim is to study participants' perceptions of the relevance of the individual ICECAP-O items.

Setting and Participants
Participants were recruited from the ongoing comprehensive population-based study of health among older persons, the Gerontological and Geriatric Population study in Gothenburg, Sweden (H70) [11]. A convenience subsample (n = 40) from a birth-cohort of 70-year-olds born in 1944 and examined in 2015 took part in the test-retest procedure and rated the relevance of the individual ICECAP-O items.

Ethics, Consent, and Permissions
The study was approved by the Regional Ethical Review Board (Dnr 869-13 and T139-15). Written informed consent was obtained after the participants had received a complete description of the study.

The ICECAP-O
The ICECAP-O is a five-item instrument (attachment, security, role, enjoyment, and control) with four-level response options that are described as statements representing: none, a little, a lot, and full capability [5]. The instrument was developed based on findings from rigorous qualitative and quantitative research with older persons in the UK [3,4,7,12]. Values are anchored with a best-worst scaling, ranging from 1.00 (full capability) to 0.00 (no capability). A total index score, based on a tariff computed from population-based values in the UK, is thus obtained [4,13]. The ICECAP-O questionnaire is freely available at the University of Birmingham's webpage [5]. The Swedish version can also be accessed through a direct link: http://www.birmingham.ac.uk/Documents/collegemds/haps/projects/icecap/Icecap-o/SWEDISH-ORIGINAL-ICECAP-O-Helena-Hörder.docx.

Procedure
We followed guidelines for cross-cultural adaptation of health-related quality of life measures, including forward-backward translation, committee review, and pre-test [14]. For assessment of test-retest reliability, we administered the final Swedish version of the ICECAP-O on two occasions: first as one of several self-report questionnaires included in the above-described H70 study, and second as a single postal questionnaire, sent 1 to 2 weeks after the research appointment. The time interval was chosen to minimize recall bias as well as bias due to actual change.
At the second occasion, participants were asked to "[r]ate the relevance of each item for [their] well-being" on a 0-100 visual analogue scale (VAS), ranging from "not important at all" to "could not be more important."

Analysis
We analyzed the test-retest reliability for both the index and the individual items. For the index (continuous data), we calculated the intraclass correlation coefficient (ICC), using a two-way mixed model of absolute agreement. The ICC can range from 0.00 (no stability/agreement) to 1.00 (perfect agreement). An ICC of at least 0.70 is considered to be acceptable [15]. For each item, we calculated percentage agreement (PA), PA ± 1 (proportion of persons with answers within a scale score range of ±1) and Cohen's weighted kappa [16]. For a deeper understanding, we also calculated systematic disagreements in relative position (RP) using a rank-based statistical method for paired ordinal data [17]. RP indicates to which extent the distribution between the two test occasions is systematically shifted towards higher or lower scale categories. Values can range from −1 to 1, with a value close to 0 representing a small systematic disagreement. Statistically significant values are indicated by a 95% confidence interval (CI) that does not include the zero value.
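As a minimal sketch of the index-level analysis, the function below computes an ICC for absolute agreement from a two-way ANOVA decomposition of the paired index scores. The single-measures form, the function name `icc_absolute_agreement`, and the input layout (one row per participant, one column per test occasion) are assumptions made for illustration; the text does not state which form or software was used.

```python
def icc_absolute_agreement(ratings):
    """Single-measures ICC for absolute agreement (two-way model).

    ratings: list of rows, one per participant; each row holds that
    participant's score at each test occasion (k columns, here k = 2).
    Assumption: single-measures form; the paper does not specify this.
    """
    n = len(ratings)          # number of participants
    k = len(ratings[0])       # number of test occasions
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Sums of squares for participants, occasions, and residual error
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-participant mean square
    msc = ss_cols / (k - 1)               # between-occasion mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Absolute agreement: occasion variance is counted in the denominator
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Hypothetical scores, not study data: identical ratings at both occasions
same = [[0.9, 0.9], [0.8, 0.8], [0.7, 0.7]]
# Hypothetical scores with a constant shift between occasions
shifted = [[1, 2], [2, 3], [3, 4]]
print(icc_absolute_agreement(same), icc_absolute_agreement(shifted))
```

Because the absolute-agreement form counts a systematic shift between occasions as disagreement, the shifted example yields an ICC below 1 even though the ranking of participants is unchanged.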
VAS ratings of the relevance of each of the individual ICECAP-O items are presented as mean (±1 SD) and range.

Results
All participants, 21 women and 19 men, had complete data on all five items on both occasions. One person was not included in the analyses due to an actual change in conditions that seriously affected capability, yielding a total of 39 paired ratings.
The mean ICECAP-O index score was 0.86 (SD 0.10) at test occasion one and 0.84 (SD 0.11) at test occasion two. The ICC for the index was 0.80 (95% CI 0.62-0.90). For details, see Figure 1.
Societies 2016, 6, 30
Test-retest ratings were consistent for all five items for eight participants (21%). Seven persons (18%) reported a lower rating on one or more items at the second test occasion, and eight (21%) a higher rating on one or more items. A mix of higher and lower ratings was observed in 16 persons (41%).
The agreement on item level is presented in Table 1. The agreement was highest for Role (Kappa 0.41, PA 67%), followed by Attachment, Enjoyment, Security, and Control. The PA ± 1 was ≥ 95% for all items. A systematic change in RP was seen for Attachment only. For this item, the pattern of change in position indicated a small but significant probability of lower scores at the second test occasion.
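The item-level agreement measures can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `agreement_stats` is hypothetical, ratings are coded 1 (no capability) to 4 (full capability), and linear disagreement weights are assumed for the weighted kappa, since the weighting scheme is not stated in the text.

```python
def agreement_stats(first, second, n_levels=4):
    """Return (PA, PA +/- 1, linearly weighted kappa) for paired ratings.

    first, second: ratings coded 1..n_levels at test occasions one and two.
    Assumption: linear weights |i - j| / (n_levels - 1).
    """
    n = len(first)
    pairs = list(zip(first, second))

    pa = sum(a == b for a, b in pairs) / n                # exact agreement
    pa_within_1 = sum(abs(a - b) <= 1 for a, b in pairs) / n

    # Contingency table: rows = occasion one, columns = occasion two
    obs = [[0] * n_levels for _ in range(n_levels)]
    for a, b in pairs:
        obs[a - 1][b - 1] += 1
    row_tot = [sum(row) for row in obs]
    col_tot = [sum(obs[i][j] for i in range(n_levels)) for j in range(n_levels)]

    observed_dis = 0.0   # weighted observed disagreement
    expected_dis = 0.0   # weighted disagreement expected by chance
    for i in range(n_levels):
        for j in range(n_levels):
            w = abs(i - j) / (n_levels - 1)
            observed_dis += w * obs[i][j]
            expected_dis += w * row_tot[i] * col_tot[j] / n

    # Kappa is undefined when all ratings fall in a single category
    kappa = float("nan") if expected_dis == 0 else 1 - observed_dis / expected_dis
    return pa, pa_within_1, kappa


# Hypothetical ratings for six persons, not study data
occasion_one = [4, 4, 3, 3, 2, 4]
occasion_two = [4, 3, 3, 4, 2, 4]
print(agreement_stats(occasion_one, occasion_two))
```

When most responses fall in the two highest categories, chance-expected agreement is high, so kappa can be low even if PA ± 1 is close to 100%.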
Table 2 shows details on changes in positions for each item. Disagreements were mainly due to response shifts from "full" (rating 4) to "a lot" (rating 3). Sub-analyses indicated that men were more likely than women to report a lower capability in Attachment at the second test occasion (results not shown). In Table 2, the x-axis represents test occasion one and the y-axis test occasion two; values in bold indicate absolute agreement between the two occasions.
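The relative position (RP) measure can be sketched as the difference between two probabilities derived from the marginal distributions of the paired ratings: the probability that a rating at occasion two falls in a higher category than a rating at occasion one, minus the probability of the reverse. The sketch below assumes that this formulation matches the rank-based method cited as [17]; the function name `relative_position` and the sign convention (negative values meaning lower ratings at occasion two) are assumptions for illustration.

```python
def relative_position(first, second, n_levels=4):
    """Systematic shift in position between two test occasions.

    Uses only the marginal distributions of the two occasions.
    Assumed sign convention: negative = occasion two shifted towards
    lower categories; positive = shifted towards higher categories.
    """
    n = len(first)
    x = [sum(1 for a in first if a == lvl) for lvl in range(1, n_levels + 1)]
    y = [sum(1 for b in second if b == lvl) for lvl in range(1, n_levels + 1)]

    higher = 0   # count of (x, y) combinations with occasion two higher
    lower = 0    # count of (x, y) combinations with occasion two lower
    cum_x = 0    # occasion-one ratings in categories below the current one
    cum_y = 0    # occasion-two ratings in categories below the current one
    for i in range(n_levels):
        higher += y[i] * cum_x
        lower += x[i] * cum_y
        cum_x += x[i]
        cum_y += y[i]

    return (higher - lower) / n ** 2


# Hypothetical ratings, not study data: shifts from "full" (4) to "a lot" (3)
print(relative_position([4, 4, 4, 3], [3, 3, 4, 3]))
```

Because RP depends only on the marginal distributions, individual changes that cancel each other out (some persons moving up, others moving down) give a value near 0. This is why a low kappa can coexist with a non-significant RP.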
Results of the VAS ratings for item relevance are shown in Figure 2. The highest rating was observed for Attachment and the lowest for Enjoyment.

Discussion
This cross-cultural adaptation of the ICECAP-O indicates that the index has good test-retest reliability, similar to that observed in a Dutch study that focused on frail older adults [18]. On the other hand, the absolute agreement for each item was low to moderate in our study.
Good reliability for the index despite the low absolute agreement for individual items could be explained by the fact that most item changes involved shifts from "full" (level 4) to "a lot" (level 3), and these levels have very similar weighting in the tariff, in contrast to the much larger difference in weighting between the two lowest levels [4]. Further, about 40% had both lower and higher ratings on individual items at the second test occasion, resulting in a relatively consistent index score. This is the first study to examine test-retest agreement for individual ICECAP-O items. A partial explanation for the observed item inconsistency might be related to the age of our participants. In a general population-based British study that utilized the ICECAP-A (adult), higher age was associated with inconsistent item ratings [19].
Another possible explanation for test-retest item inconsistency might be related to differences in available time for completion of the questionnaire. On the first occasion, the ICECAP-O was included in an extensive questionnaire packet that was administered in connection with a comprehensive health examination. In contrast, the retest was completed at home at the participant's leisure. This meant that participants were free to take their time and reflect on ICECAP-O item response options, which could affect their interpretation and choice of response. In a think-aloud study, persons were shown to vary in their interpretations when rating capabilities [20]. Previous general population-based research has shown slightly lower reliability for the ICECAP index compared to EuroQoL [19,21]. As highlighted by others [20], more thorough guidance might be one way to achieve more consistent interpretations of capabilities.
Participants' rating of item relevance showed that attachment was valued highest, followed by control, role, security, and enjoyment. This order is similar to that observed in the original UK study on older persons [4].
Some important study limitations need to be mentioned. The sample was relatively homogeneous and small in size. Reliability testing is necessary for other age groups. A wider variation in medical conditions and functional abilities would be anticipated in older age groups, and this would be expected to result in a larger variation in item responses.

Conclusions
The ICECAP-O provides a promising self-report capability approach to the assessment of QoL. The Swedish version presented here showed high test-retest reliability for the index score, but agreement for individual items was problematic. Further testing is needed to better understand effects of testing environments, contexts, and time frames.

Figure 1. Correlation between ICECAP-O index scores at test occasion one (x-axis) and test occasion two (y-axis). Possible range from 1.00 (full capability) to 0.00 (no capability).

Figure 2. VAS ratings (0-100) of relevance of individual items included in ICECAP-O (mean ± 1 SD and range). 0 = not important at all, 100 = could not be more important.

Table 1. Test-retest reliability of the ICECAP-O on item level in 70-year-olds (n = 39).
Percentages of agreement of less than 60% indicate poor agreement. Relative position values close to 0 represent a high level of reliability. Values in bold indicate significant changes. CI = confidence interval.

Table 2. Distributions of change in positions between test occasion one and test occasion two for ICECAP-O items (1 = no capability to 4 = full capability), n = 39.
