Validation of the Canadian Version of the Shame and Stigma Scale for Head and Neck Cancer Patients

Cancers of the head and neck and their treatment can cause disfigurement and loss of functioning, with a profound negative impact on the person’s self-image and psychosocial wellbeing. This can lead to experiences of shame and stigma, which are important targets for psychosocial interventions. Accurate measurement and identification of these problems enable clinicians to offer appropriate interventions and monitor patients’ progress. This study aimed to validate the Canadian version of the Shame and Stigma Scale (SSS) among French- and English-speaking head and neck cancer patients. Data from 254 patients from two major Canadian hospitals were analysed. The existing four-factor structure of the SSS was supported, with the following subscales: Shame with Appearance, Sense of Stigma, Regret, and Social/Speech Concerns. The Canadian SSS showed adequate convergent and divergent validity and test–retest reliability. Rasch analysis suggested that the scale could be improved by removing two misfitting items and two items with differential functioning between French- and English-speaking patients. The final 16-item version showed adequate fit to the Rasch model. The SSS provides more accurate measures for people with high levels of shame and stigma, and thus has utility in identifying patients with more severe symptoms who may be in need of psychosocial interventions.


Introduction
Cancers of the head and neck and their treatment can cause various degrees of disfigurement and loss of functioning, with a profound negative impact on the person's self-image and psychological and social wellbeing [1–4]. This can lead to experiences of shame and stigma [5]. Shame is a psychological state in which the person experiences a sense of disgrace, dishonour or humiliation, and a desire to cover up, hide, or escape [6]. Stigma involves social disapproval regarding the person's identity or disfigurement [7]. Experiences of shame and stigma in cancer patients are associated with low self-esteem, depression, social isolation [1,8–10], and deteriorating family relationships despite physical improvement [11]. Head and neck cancer patients also often experience guilt and regret about past behaviours [12], such as smoking or drinking, that may have contributed to the cancer [13,14].
Shame, stigma, and regret are important targets for psychosocial interventions with head and neck cancer patients, helping them modify their negative perceptions and develop acceptance and accommodation [15]. Accurate measurement [16] and identification of these problems enable clinicians to offer appropriate interventions and monitor patients' progress. They also contribute to a better understanding of the prevalence of shame and stigma and their impact on patients' lives [15].
In response to the above needs, Kissane et al. [15] developed and validated the Shame and Stigma Scale (SSS) at the Memorial Sloan-Kettering Cancer Center in the US. Since then, the scale has been translated and validated in seven different countries [17] and has been found to be a reliable and valid measure of shame and stigma.
The aim of the current study was to validate the scale among French- and English-speaking Canadian head and neck cancer patients. It is important to explore the scale's functioning in different cultural and language contexts, and to adapt it to these specific needs if necessary. This appears particularly important considering the known detrimental effects of shame and stigma on physical and mental health outcomes, and the recognition of stigma as a public health concern by Canada's Chief Public Health Officer [18].

Study Population and Procedure
Patients were recruited through the Otolaryngology-Head and Neck Surgery departments of two large Canadian hospitals affiliated with McGill University. Eligible patients had undergone disfiguring surgery less than three years before the time of referral. The type of surgery was defined in consultation with medical teams. Patients were mailed the study questionnaires, which they returned by mail in prestamped, preaddressed envelopes. Two weeks later, patients were sent the questionnaires again to assess test–retest reliability. Completion of the questionnaires required 20–25 min. Patients could complete the questionnaires at a time convenient to them, to minimise burden and reduce fatigue. The recruitment procedure is described in more detail elsewhere, in a study led by one of the authors, Melissa Henry [19].

Statistical Analysis
To assess the factor structure of the SSS, exploratory factor analysis was conducted. The factor analysis used the matrix of polychoric correlations of the items rather than Pearson correlations, because polychoric correlations are more appropriate for ordinal data [34]. An oblique promax rotation was used, allowing for correlations between the factors. Eigenvalues ≥1, the scree plot, and the conceptual interpretability of the factors were all considered in factor selection.
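The eigenvalue ≥1 rule (Kaiser criterion) used in factor selection can be illustrated with a minimal pure-Python sketch. It assumes the item correlation matrix (polychoric in this study) has already been estimated elsewhere; estimating polychoric correlations and applying the promax rotation are not shown. The helpers `jacobi_eigenvalues` and `kaiser_count` are hypothetical names written for this illustration, and the toy matrix is invented, not study data.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, in descending order."""
    a = [list(map(float, row)) for row in matrix]  # work on a float copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_count(correlation_matrix):
    """Number of factors suggested by the eigenvalue >= 1 rule."""
    return sum(1 for ev in jacobi_eigenvalues(correlation_matrix) if ev >= 1.0)


# Toy 4-item matrix with two independent pairs of correlated items.
toy = [
    [1.0, 0.7, 0.0, 0.0],
    [0.7, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.7],
    [0.0, 0.0, 0.7, 1.0],
]
print(jacobi_eigenvalues(toy))  # approximately [1.7, 1.7, 0.3, 0.3]
print(kaiser_count(toy))  # 2
```

In practice the eigenvalue rule is only one input: as in the study, the scree plot and the interpretability of the rotated factors also guide how many factors to retain.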
Descriptive statistics, such as means, medians, and response frequencies, were derived for each SSS subscale and for each item. Internal consistency reliabilities (Cronbach's α) and test-retest reliabilities were also calculated. Spearman's rho was used for the test-retest reliability due to the skewness of the data.
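The two reliability statistics named above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the Stata code used in the study: Cronbach's α is computed from item and total-score variances, and Spearman's rho is computed as the Pearson correlation of ranks, with tied scores (common in skewed data with many zeros) given the average of their ranks. The score lists are hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item (same respondent order)."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # each respondent's total score
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical subscale totals from the first mailing and the retest two weeks later.
time1 = [0, 0, 2, 5, 7, 3, 0, 9]
time2 = [0, 1, 2, 4, 8, 3, 0, 7]
print(round(spearman_rho(time1, time2), 2))
```

Because rho depends only on ranks, it is not distorted by the long right tail of the score distribution, which is the reason given above for preferring it to Pearson's r.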
Convergent and divergent validity were examined using calculation of Spearman's rho correlations between the SSS and its subscales and the other measures used in the study. It was expected that the SSS would have stronger associations with appearance-related scales, such as the BICSI, BIS, FACT-G, and FACT H&N, as well as at least moderate correlations with psychological distress (the HADS and CES-D) and interference of illness with life (IIRS). The SSS was expected to have relatively weaker associations with aggression (PAQ) and social desirability (MCSD) scales.
Rasch [35] analysis was used to further investigate the psychometric properties of the SSS based on item-level information and the respondents' pattern of responses to each item. Rasch is a probabilistic model that estimates each respondent's ability (or level of the measured attribute) and each item's difficulty (or level of the attribute measured by the item) on the same scale, with logits as the measuring units [36]. Rasch analysis was used to evaluate overall model fit, item fit, unidimensionality, potential item bias between subgroups of respondents, and how well the items targeted respondents. A good overall model fit is indicated by a nonsignificant χ² (with a Bonferroni adjustment for the number of items) and an overall item fit residual standard deviation below the accepted limit of 1.5. Individual item fit is indicated by standardised item fit residuals within the accepted range of −2.5 to +2.5. High item interdependence is indicated by item residual correlations ≥0.2 [37].
Internal consistency was assessed with the Person Separation Index (PSI) estimated using Rasch, and with Cronbach's α. The PSI is interpreted similarly to α, with values ≥0.70 considered good [37]. Targeting (the alignment of item difficulty and person ability levels) was assessed by examining the person-item threshold distribution. Differential item functioning (DIF) refers to item bias, that is, an item performing differently across subgroups of respondents with the same level of the measured attribute [38]. DIF by the language in which the scale was completed (French or English), sex, age, and education was examined with analysis of variance, comparing Rasch scores for each level of these variables across each item. Item thresholds (the points between adjacent response categories at which either of the two responses is equally likely) were tested for disordering, that is, for responses that are not ordered as expected from low to high.
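For polytomous items such as the SSS items, with five response categories scored 0–4, the Rasch model in its partial credit form gives the probability of each category from the person location θ and the item thresholds δ₁…δₘ. The sketch below is a minimal illustration under that assumption, not RUMM2030's implementation; the helper names and threshold values are hypothetical. It also shows the threshold definition given above (adjacent categories equally likely at θ = δₖ), a simple disordering check, and the person-item standardised residual from which item fit statistics are built (RUMM2030 aggregates these with its own transformation, which is not reproduced here).

```python
import math


def pcm_probabilities(theta, thresholds):
    """Partial credit model category probabilities for one item.

    P(X = x) is proportional to exp(sum over k <= x of (theta - delta_k)),
    with an empty sum (0) for category 0.
    """
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    top = max(cumulative)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(value - top) for value in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    probs = pcm_probabilities(theta, thresholds)
    return sum(x * p for x, p in enumerate(probs))


def standardized_residual(observed, theta, thresholds):
    """(observed - expected) / SD of the model-implied score distribution."""
    probs = pcm_probabilities(theta, thresholds)
    expected = sum(x * p for x, p in enumerate(probs))
    var = sum((x - expected) ** 2 * p for x, p in enumerate(probs))
    return (observed - expected) / math.sqrt(var)


def thresholds_ordered(thresholds):
    """True if the thresholds increase strictly, i.e. they are not disordered."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


item = [-1.0, 0.0, 1.0, 2.0]  # hypothetical thresholds for a 0-4 item, in logits
probs = pcm_probabilities(0.0, item)  # theta equals the second threshold here
print([round(p, 3) for p in probs])  # categories 1 and 2 are equally likely
print(thresholds_ordered(item), thresholds_ordered([0.5, -0.3, 1.0, 2.0]))
```

With a single threshold the function reduces to the dichotomous Rasch model, P(X = 1) = exp(θ − δ) / (1 + exp(θ − δ)).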
The unidimensionality of each subscale and the whole scale was assessed using t-tests comparing the Rasch-derived person estimates from subsets of items identified by principal component analysis of the residual correlations. Unidimensionality is supported if the proportion of significant t-tests is not significantly greater than 5% [39]. A minimum of 150 participants is required to estimate item difficulty to within ±0.5 logits [40].
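The t-test procedure for unidimensionality can be sketched as follows, assuming each person's location and standard error have already been estimated separately from the two item subsets. It uses the commonly applied criterion that unidimensionality is supported when the lower bound of a 95% binomial confidence interval (normal approximation) for the proportion of significant tests is below 5%. The function names and data are hypothetical, for illustration only.

```python
import math


def person_t(theta_a, se_a, theta_b, se_b):
    """t for one person: difference between location estimates from two item subsets."""
    return (theta_a - theta_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def unidimensionality_check(t_values, critical=1.96, nominal=0.05):
    """Proportion of significant person t-tests and its lower 95% binomial bound.

    Unidimensionality is treated as supported when the lower bound
    (normal approximation) falls below the nominal 5% rate.
    """
    n = len(t_values)
    proportion = sum(1 for t in t_values if abs(t) > critical) / n
    half_width = 1.96 * math.sqrt(proportion * (1 - proportion) / n)
    lower_bound = proportion - half_width
    return proportion, lower_bound, lower_bound < nominal


# Hypothetical run: 12 of 250 persons differ significantly between the subsets.
t_values = [2.4] * 12 + [0.3] * 238
print(unidimensionality_check(t_values))
```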
Rasch analysis was conducted with RUMM2030 [41] software using the partial credit model [42]. All other analyses were conducted with Stata 17 [43].

Description of the Study Population
Data from 254 patients were collected and analysed. Of the study population, 55% were recruited from the Jewish General Hospital and 44% from the McGill University Health Centre. The questionnaires were completed in French by 53% and in English by 47% of the sample. Of the sample, 67% were male, the median age was 67 years (range 23–98 years), 64% were partnered, 34% had completed at least high school, and 42% had tertiary education. The most common forms of cancer were cutaneous (38%), oral cavity (21%), and oropharynx (12%). Of the sample, 14% had advanced (stage IV) cancer, 29% stage III, 23% stage II, and 18% stage I. The demographics of the study population are described in detail elsewhere [19].

Factor Analysis of the Shame and Stigma Scale
Four factors were derived from the exploratory factor analysis (Table 1), consistent with Kissane et al.'s [15] original validation of the scale. Several items cross-loaded, with a lower or similar loading on a second factor; these were retained in the factor they fitted best conceptually, in line with the original scale. Item 9 was the only item that did not load as expected. Overall, the results supported the four-factor structure of the original scale. The four factors were: Sense of Stigma (eight items), Shame with Appearance (six items), Regret (three items), and Social/Speech Concerns (three items). The correlations between the factors were moderate in size, ranging from 0.27 to 0.46, and similar to those obtained in Kissane et al.'s [15] original validation.
Table 1. Exploratory factor analysis using correlations for ordinal data (polychoric correlations) with promax rotation (allowing correlations between the factors).

Descriptive Statistics for the Subscales and Items of the SSS
Table 2 shows that the first three subscales had good internal consistency (Cronbach's α from 0.76 to 0.82) and test–retest reliability (Spearman's rho from 0.60 to 0.73). The Social/Speech Concerns subscale had lower internal consistency (α = 0.52). The Sense of Stigma subscale had a substantial floor effect, with nearly half the scores being zero. The medians and means of the subscales and the total scale were fairly low. Table 3 shows good corrected correlations between the items and their subscales and the total scale. Item 20 had the lowest correlation with the total scale (0.24). For most items, very low proportions of participants obtained high scores above 3, and many items showed a substantial floor effect. The results in Tables 2 and 3 indicate skewed data, with only a small proportion of participants scoring in the high range of the item response categories.

Convergent and Divergent Validity
The convergent and divergent validity correlations (Table 4) followed the expected pattern. The Shame with Appearance subscale had the strongest correlations with the appearance-related scales (BICSI, BIS, FACT-G, and FACT H&N). All SSS subscales were weakly to moderately correlated with the depression and anxiety scales (HADS and CES-D) and with illness interference with life (IIRS). The weakest correlations for all subscales were with the aggression (PAQ) and social desirability (MCSD) scales.

Rasch Analysis
A summary of the Rasch analysis for each SSS subscale and the total scale is presented in Table 5. The Rasch analysis suggested that the Canadian version of the SSS would be improved by removing four items: 8, 9, 17, and 20. On the Shame with Appearance subscale, item 8 ("I am distressed by the changes in my face and neck") showed significant DIF by language: respondents who completed the scale in English consistently scored higher than expected on this item, and those who completed it in French scored lower. After item 8 was removed, no DIF remained, and the model fit and PSI were adequate.
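The DIF test described in the Statistical Analysis section compares how respondents in different groups (here, French and English) deviate from the model on the same item. A simplified sketch is a one-way analysis of variance of an item's standardised person-item residuals by language group. This is an assumption-laden simplification: software such as RUMM2030 typically uses a two-way ANOVA that also stratifies persons into class intervals along the trait, which is not reproduced here. The residual values and the number of tests are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA; `groups` is a list of lists of values."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests


# Hypothetical standardised residuals on one item, by questionnaire language.
english = [0.9, 1.2, 0.4, 1.5, 0.8, 1.1]
french = [-0.7, -1.0, -0.2, -1.3, -0.6, -0.9]
f_stat = one_way_anova_f([english, french])
print(round(f_stat, 2), bonferroni_alpha(0.05, 20))
```

A p-value for the F statistic can be obtained from the F distribution with (k − 1, n − k) degrees of freedom, for example with `scipy.stats.f.sf`, and compared with the Bonferroni-adjusted level.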
The Sense of Stigma subscale initially showed some misfit, with a significant χ² for the overall model fit, although the item residual standard deviation and the PSI were within acceptable limits. This subscale contained item 9 ("I feel others consider me responsible for my cancer"), which in the factor analysis loaded only on the Regret factor. In the original validation of the SSS, item 9 was included in the Sense of Stigma subscale for conceptual reasons, although in the factor analysis it also loaded on Regret and Social/Speech Concerns. In the Canadian version, removing item 9 improved the fit of the Sense of Stigma subscale: the overall model fit χ² became nonsignificant, and the item residual standard deviation and the PSI improved slightly.
Item 17 ("I feel sorry about the things I have done in the past") of the Regret subscale showed significant DIF by language, with French speakers scoring higher than expected and English speakers lower. Removing item 17 reduced the PSI from 0.56 to 0.37; however, α remained adequate at 0.71. Since this is only a two-item subscale with items targeted towards high-scoring respondents, the model fit was considered acceptable despite the low PSI.
The Social/Speech Concerns subscale showed an overall model misfit with a highly significant χ², and the PSI could not be estimated properly. Cronbach's α was also low (0.52). All items showed misfit with a highly significant χ² (p < 0.0001), with item 20 ("I am able to join conversations") having a large fit residual (2.35), close to the upper limit of 2.5. After deleting item 20, an adequate model fit was obtained and α improved to 0.79, suggesting good internal consistency. Item 20 also showed misfit in Kissane et al.'s [15] original validation of the SSS, with an indication that some patients may have responded to it as a negatively worded item. Item 20 is a positively worded item that follows a run of 12 negatively worded items, which may have confused respondents.
The PSIs for the Sense of Stigma, Regret, and Social/Speech Concerns subscales were below the acceptable value of 0.70, while Shame with Appearance and the total scale had acceptable PSIs. All subscales and the total scale had acceptable Cronbach's α values. Both the PSI and α are measures of internal consistency reliability. While α is calculated from raw scores, the PSI is based on estimated Rasch person locations. If the scale's item difficulties and person abilities are well aligned, α and the PSI are close in value. In a skewed distribution, misalignment produces a discrepancy between α and the PSI, because with more extreme person scores the error variance in the PSI increases while α remains more stable [41]. In the current sample, a large proportion of respondents mainly endorsed zero or the low end of the item categories, there were few high-scoring respondents, and the subscales had a small number of items. This would explain the discrepancy between the PSI and α, as well as the low PSIs for the most misaligned subscales. In the total SSS, with more items covering a wider range of person abilities, the PSI improved. Overall, the acceptable α values, combined with good fit of the models, indicate that the subscales had adequate psychometric properties. However, they are better targeted at a population with higher levels of shame and stigma than the current sample. Figure 1, which shows the item-person location distributions, illustrates that most items were targeted towards the higher range of difficulties, while persons were mainly located at the lower end of the distribution.
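The mechanism described above, in which larger measurement error for extreme person estimates lowers the PSI, can be seen in one common formulation of the PSI: the share of the variance of the estimated person locations that is not attributable to measurement error. The sketch below assumes person locations and their standard errors are already available; RUMM2030's exact computation may differ in detail, and the numbers are hypothetical.

```python
from statistics import mean, pvariance


def person_separation_index(locations, standard_errors):
    """PSI: share of variance in person locations not attributable to measurement error."""
    observed_variance = pvariance(locations)
    error_variance = mean(se ** 2 for se in standard_errors)
    return (observed_variance - error_variance) / observed_variance


locations = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical person estimates (logits)
print(person_separation_index(locations, [0.5] * 5))  # 0.875
print(person_separation_index(locations, [1.0] * 5))  # 0.5
```

Keeping the same spread of locations but doubling the standard errors, as happens for persons at the floor of a poorly targeted scale, reduces the PSI from 0.875 to 0.5, whereas α, computed from raw scores, has no such error term.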
All subscales and the total scale were unidimensional, with none having significantly more than 5% of significant unidimensionality t-tests.
All items except item 5 ("I feel people stare at me") had disordered thresholds. Collapsing the item response options into fewer categories (Never, Seldom/Sometimes, Often/All the time) resulted in ordered thresholds but did not affect the fit of the models. Collapsing categories, however, results in a loss of statistical information and decreases reliability. Disordered thresholds do not always represent a violation of the intended order of item response categories; they may result from low response frequencies in some categories [42,44–46]. In our sample, there was a high proportion of zero ("Never") responses and, for most items, low proportions of "Often" or "All the time" responses. In Kissane et al.'s [15] original validation of the SSS, in which the data were less skewed, the thresholds were ordered. Kissane et al.'s sample was restricted to oral cavity cancers with potentially more severe symptoms, whereas in our sample, 38% of patients had cutaneous cancers with less severe symptoms, thus restricting the range of SSS scores. Furthermore, in our data, the average Rasch-estimated person abilities increased monotonically as item response categories advanced from 0 to 4 for each SSS item. This suggests that patients responded to the items as intended and that the item categories represent advancing levels of shame and stigma [45,46]. Based on this evidence, we decided to retain the existing item categories.
The final version of the Canadian SSS is presented in Appendix A.
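The two checks on the response categories discussed in the threshold analysis, collapsing five categories into three and testing whether the mean person location rises with each category, can be sketched as follows. The coding 0 = Never, 1 = Seldom, 2 = Sometimes, 3 = Often, 4 = All the time follows the 0–4 scoring described in the text; the helper names, responses, and locations are hypothetical.

```python
from statistics import mean

# 0 = Never, 1 = Seldom, 2 = Sometimes, 3 = Often, 4 = All the time
COLLAPSED = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2}  # Never | Seldom/Sometimes | Often/All the time


def collapse(responses, mapping=COLLAPSED):
    """Recode responses into the collapsed three-category scoring."""
    return [mapping[r] for r in responses]


def mean_location_by_category(responses, locations):
    """Average person location (logits) among respondents choosing each category."""
    groups = {}
    for response, location in zip(responses, locations):
        groups.setdefault(response, []).append(location)
    return {category: mean(values) for category, values in sorted(groups.items())}


def increases_monotonically(category_means):
    """True if mean locations rise strictly from one observed category to the next."""
    values = list(category_means.values())
    return all(a < b for a, b in zip(values, values[1:]))


# Hypothetical responses to one item and the respondents' Rasch locations.
responses = [0, 0, 0, 1, 2, 2, 3, 4]
locations = [-2.1, -1.7, -1.9, -0.6, 0.1, 0.3, 1.2, 2.2]
means = mean_location_by_category(responses, locations)
print(means, increases_monotonically(means))
print(collapse(responses))
```

Categories that nobody chose are simply absent from the result, which mirrors the practical problem noted above: sparse top categories give little information for estimating thresholds.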

Discussion
Despite the morbidity that the diagnosis and treatment of head and neck cancer bring, there has been a relative paucity of validated measures that are specific to the needs of this population [47]. Here, we take one such measure and validate it for the mixed language needs of a Canadian population. The only other related measure that has been validated in English and French in a Canadian context is the McGill Body Image Concerns Scale for use in head and neck oncology (MBIS-HNC). The latter comprises two subscales: social discomfort and negative self-image. It focuses more specifically on changes in appearance, function, and senses as a result of head and neck cancer treatments [19]. Rodriguez et al. [19] reported moderate to high correlations between the MBIS-HNC and the SSS subscales (0.49 with Regret, 0.50 with Social/Speech Concerns, 0.55 with Sense of Stigma, and 0.72 with Shame with Appearance). This suggests that the two scales have good convergent validity but measure different constructs.
We have shown the Shame and Stigma Scale to be a reliable and valid measure with adequate psychometric properties in a Canadian population of head and neck cancer patients. The factor structure of the original SSS was supported, with the existing four subscales: Shame with Appearance, Sense of Stigma, Regret, and Social/Speech Concerns. The Canadian version of the SSS showed adequate convergent and divergent validity and test–retest reliability. The SSS followed the expected pattern, with stronger associations with measures of concerns about appearance, psychological distress, and life interference from illness, and weaker associations with less related constructs, such as aggression and social desirability.
We suggest improving the scale by removing two misfitting items (items 9 and 20) and two items with differential functioning between French- and English-speaking patients (items 8 and 17).
Consistent with our study, validations in five other countries also supported the four-factor structure of the SSS. The Taiwanese [48] and Malay [49] versions extracted five and three factors, respectively. In the Malay [49] version, all four positively worded items (1, 4, 7, and 20) were removed, and in the Chinese [50] version, items 1 and 4 were removed because they did not perform well. The Malay and Chinese studies raised concerns about translating oppositely worded items across cultures. We note that the positively worded item 20 also showed problematic fit in our study and in the original validation [15].
Our study, as well as Kissane et al.'s [15] original validation, showed that the items of the SSS are targeted towards higher levels of shame and stigma severity. This means that the scale provides more accurate measures for people with high levels of shame and stigma. Thus, the SSS has utility in identifying patients with severe levels of shame and stigma who may be in need of psychosocial intervention. The SSS could also be useful for monitoring the progress of interventions, such as counselling or psychosocial therapies.

Strengths and Limitations
The study had a sample of adequate size and representativeness. Our sample size of 254 enabled accurate Rasch estimation of item difficulty, to within at least ±0.5 logits. Patients from two different health settings and a range of socioeconomic groups were recruited, and French- and English-speaking participants were approximately equally represented. A limitation of this study is that most patients had low levels of shame and stigma, while the SSS is targeted towards the high end of the distribution. The scale would therefore provide less precise assessment for people with low levels of shame and stigma. For many of the SSS items, the highest categories, "Often" and "All the time", had very low response frequencies. This may have contributed to disordered thresholds in the Rasch model [42,44–46]. Nevertheless, our data indicated that the response options functioned as intended: the Rasch-estimated person abilities (levels of shame and stigma) increased monotonically with higher response categories for all items [45,46]. However, further investigation of how the item response categories and Rasch-estimated thresholds function with less skewed data is warranted in future studies. Further work is also needed to investigate how the SSS generalises to other types of cancer and other cultural groups.

Conclusions
This study provided evidence of the reliability and validity of the SSS in a Canadian population of head and neck cancer patients. The SSS is a promising tool for identifying patients with high levels of shame and stigma in clinical settings in Canada so that appropriate psychosocial interventions can be offered. In future research with the SSS, it would be important to investigate differences in experiences of head and neck cancers between men and women, as well as across various sociodemographic and cultural groups.

Informed Consent Statement:
Our manuscript does not contain any case studies using individual people with identifying information. All reported data are de-identified and aggregated and cannot be traced back to individual patients.

Data Availability Statement:
Data not publicly available due to privacy and ethical restrictions.

Conflicts of Interest:
The authors declare no conflict of interest.