Psychometric Properties of the Berger HIV Stigma Scale: A Systematic Review

Addressing HIV-related stigma requires the use of psychometrically sound measures. However, despite the Berger HIV stigma scale (HSS) being among the most widely used measures for assessing HIV-related stigma, no study has systematically summarised its psychometric properties. This review investigated the psychometric properties of the HSS. A systematic review of articles published between 2001 and August 2021 was undertaken (CRD42020220305) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Additionally, we searched the grey literature and screened the reference lists of the included studies. Of the total 1241 studies that were screened, 166 were included in the review, of which 24 were development and/or validation studies. The rest were observational or experimental studies. All the studies except two reported some aspect of the scale’s reliability. The reported internal consistency ranged from acceptable to excellent (Cronbach’s alpha ≥ 0.70) in 93.2% of the studies. Only eight studies reported test–retest reliability, and the reported reliability was adequate, except for one study. Only 36 studies assessed and established the HSS’s validity. The HSS appears to be a reliable and valid measure of HIV-related stigma. However, the validity evidence came from only 36 studies, most of which were conducted in North America and Europe. Consequently, more validation work is necessary for more precise insights.


Introduction
The HIV/AIDS pandemic continues to be a major public health burden, affecting millions of people globally, with significant morbidity and mortality being reported. According to UNAIDS, approximately 37.7 million people were living with HIV/AIDS globally in 2020 [1]. Furthermore, there were approximately 680,000 HIV/AIDS-related deaths across the globe in 2020 [1]. This puts HIV/AIDS among the top 20 leading causes of death globally [2].
HIV-related stigma remains a significant impediment to the eradication of the HIV/AIDS pandemic. Across the globe, HIV-related stigma has been a contributing factor in delays in HIV testing [3,4] and engagement with HIV care [5]. Moreover, among people living with HIV (PLWH), it has had a role in suboptimal adherence to antiretroviral therapy (ART) [6] as well as disengagement from HIV care [7,8]. This has led to poor outcomes such as non-viral suppression [9] and faster infection progression [10]. Furthermore, HIV-related stigma has been associated with negative consequences that may further impede progress towards eradicating the pandemic, such as non-disclosure of HIV-positive status [11] and poor mental health functioning [9].
To adequately address HIV-related stigma, it needs to be measured using appropriate measurement tools [12]. Several tools have been developed for this purpose. These include the Internalized AIDS-related Stigma Scale [13], the Stigma and Social Impact Scale [14], the T.B. and HIV/AIDS Stigma Scale [15], the HIV Stigma Scale developed by Sowell and colleagues [16], the HIV Stigma Scales developed by Visser and colleagues [12], the HIV/AIDS Stigma Instrument [17], the Stigma mechanisms of the HIV stigma framework [18], the Internalized HIV Stigma Measure [19], and Berger's HIV Stigma Scale (HSS) [20], among others. To quantify the burden of HIV-related stigma adequately and accurately, these tools should be psychometrically sound across the diverse population of PLWH from different settings. Psychometric properties describe a scale's reliability and validity for use in a given population [21].
One of the most commonly used tools is the HSS developed by Berger and colleagues [20]. This tool is a 40-item measure that assesses perceived stigma in PLWH using a four-point Likert scale (strongly disagree = 1, disagree = 2, agree = 3, strongly agree = 4). The scale consists of four subscales that assess the various mechanisms through which PLWH experience stigma: personalised stigma, disclosure concerns, negative self-image, and concern with public attitudes. The personalised stigma subscale assesses the perceived consequences of other people knowing about an individual's HIV status. The disclosure concerns subscale assesses an individual's concerns or worries about disclosing their HIV status. The negative self-image subscale assesses an individual's negative feelings towards oneself due to HIV. Finally, the concern with public attitudes subscale assesses people's attitudes towards people with HIV.
During its development, the scale was shown to have excellent reliability and validity [20]. The scale, including its subscales and abbreviated versions, has since been validated and used widely among different HIV-positive populations in different settings, such as among the youth in Thailand [22], children in Sweden [23], men who have sex with men (MSM) in the United States [24], and women in Indonesia [25]. In these studies, it was observed that the scale was reliable and/or valid for use in these diverse sub-populations and settings.
Despite its extensive use and evidence of adequate psychometric properties across different settings, data on the scale's psychometric robustness has not been systematically summarised. For meaningful and accurate data that can inform the development and evaluation of HIV-stigma reduction interventions, researchers and related practitioners involved in HIV research and care who intend to use this scale require information on its psychometric robustness. Therefore, to address the above-mentioned gap, this study aims to systematically summarise the available data on the psychometric properties of the HSS in terms of reliability, content and face validity, construct validity, convergent and divergent validity, discriminant validity, and cross-cultural adaptation.

Protocol and Registration
This study's protocol was developed and registered in the International Prospective Register of Systematic Reviews (PROSPERO) under registration number CRD42020220305. This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [26].

Search Strategy
We performed a comprehensive bibliographic search on PubMed, Web of Science Core Collection, PsycINFO, Scopus, and Embase (Excerpta Medica dataBASE) for relevant articles that were published from 2001 (when the scale was first published) to 24 August 2021 (when the last search was conducted).
Our search structure included keywords such as "Stigma", "HIV infections", "Scale", and "Berger" combined by the Boolean operator "AND". Respective synonyms for these keywords were joined using the "OR" Boolean operator. Where applicable, Medical Subject Headings (MeSH) terms were used. The search strategy was adapted to fit the specifications of the different databases. Supplementary File S1 provides the search string used in the PubMed database.
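The Boolean structure described above can be sketched in a few lines of Python. This is only an illustration: the four leading keywords come from the text, but the synonyms listed under each concept are hypothetical placeholders, and the actual PubMed string is the one given in Supplementary File S1.

```python
# Each inner list is one concept. The first term is a keyword named in the text;
# the remaining synonyms are hypothetical and only illustrate the pattern.
concept_groups = [
    ["Stigma", "discrimination"],
    ["HIV infections", "HIV", "AIDS"],
    ["Scale", "instrument", "questionnaire"],
    ["Berger"],
]


def build_query(groups):
    """Join synonyms with OR inside parentheses, then join concepts with AND."""
    clauses = []
    for terms in groups:
        quoted = [f'"{term}"' for term in terms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


query = build_query(concept_groups)
print(query)
```

In practice each database needs its own field tags and controlled vocabulary (for example MeSH terms in PubMed), so a string built this way would still have to be adapted per database.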
The search was limited to articles published in the English language where a database could allow this filter. All the identified references were retrieved and uploaded to the EPPI Reviewer Web software (https://eppi.ioe.ac.uk/EPPIReviewer-Web/Main, accessed on 21 September 2021) for data management. Additionally, we manually searched the reference lists of the included studies for additional relevant articles. We also searched the Open Grey database for potential grey literature that met our inclusion criteria.

Eligibility Criteria: Inclusion and Exclusion Criteria
To be included in the review, studies had to fulfil pre-determined inclusion and exclusion criteria. We included studies where the HSS (including abbreviated versions and subscales) was being developed and/or validated for use among PLWH or in the HIV-negative but affected population. We also included any studies (observational or experimental) that used any version of the HSS to assess HIV-related stigma and reported a psychometric measure of reliability and/or validity.
We excluded studies that used the HSS but did not report psychometric properties. We also excluded studies that used the HSS and reported the psychometric properties of the original scale or an earlier version of the scale. Studies that adapted the HSS for use in a population other than PLWH or HIV-affected participants were excluded. Studies that constructed scales using a mix of items from the HSS and other scales were also excluded. Studies published in languages other than English, qualitative studies, reviews, and studies for which the full text could not be found were all excluded. For duplicate reports from the same project, only the main and more comprehensive paper was considered.

Screening of Articles by Inclusion and Exclusion Criteria
For potential inclusion, all the identified articles from the database search were independently screened by two reviewers (S.W.W. and E.K.T.) in two steps: (i) by title and abstract and (ii) by full text. The reviewers held a meeting at the end of every step to resolve disagreements. Disagreements between the reviewers were consistently low and were resolved through consensus.

Data Extraction
Data extraction was conducted in the EPPI-Reviewer Web software by S.W.W. and E.K.T., who shared the included studies equally. The following information was extracted from the included studies: (i) article details: the name of the first author, title, and year of publication; (ii) study information: country, study design, study setting, sampling method, and source of the sample; (iii) sample characteristics: the population involved, sample size, age (mean, median, or range), and sex (proportion of females); (iv) characteristics of the scale used: version of the scale used, the number of items in the scale, and mode of administration; (v) outcome: the reported psychometric properties of the scales used.

Quality Assessment
Two reviewers (S.W.W. and E.K.T.) independently assessed the quality of the included studies using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) checklist [27-29]. The two reviewers then resolved any disagreements in the quality rating through consensus. The COSMIN checklist contains standards (in terms of design requirements and preferred statistical methods) that assess the methodological quality of studies on measurement properties. The checklist is organised into boxes, each containing the quality standards for a specific measurement property. The quality of each included study is evaluated by rating the quality standards of individual measurement properties on a four-point rating scale (inadequate, doubtful, adequate, very good) [30]. The overall rating of each study's quality is determined by taking the lowest rating of any standard in a box ("worst score counts" principle) [30].
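The lowest-rating rule described above is simple to express in code. The following is a minimal sketch, not the COSMIN tooling itself; the function name and the example ratings are hypothetical.

```python
# The four COSMIN rating levels, ordered from worst to best.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def box_rating(standard_ratings):
    """Return the lowest rating among the standards rated in one box."""
    if not standard_ratings:
        raise ValueError("a box needs at least one rated standard")
    # list.index raises ValueError for a label outside the four-point scale.
    return min(standard_ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards in a single box.
example_box = ["very good", "adequate", "doubtful", "very good"]
overall = box_rating(example_box)
print(overall)
```

A single weak standard therefore caps the rating of the whole box, which is why studies not designed as validation studies tend to score poorly under this rule.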
In this review, we only assessed the quality of the development and/or validation studies. Because the primary aim of the included observational or experimental studies was not scale development and/or validation, they did not report the information needed for quality assessment with the COSMIN checklist. We therefore anticipated that these studies would be rated poorly even though they may be of good quality.

Data Analysis
The included studies were heterogeneous in terms of the versions of the scale used and the sub-populations involved. Therefore, we summarised the data narratively by the reported psychometric properties. We categorised the countries where the included studies were conducted into their respective continents and used descriptive statistics (frequencies and percentages) to summarise their distribution. Descriptive statistics were also used to summarise the years of publication, the versions of the scale used, the populations involved, and the study types.

Results of Database Search
The initial electronic search yielded 1241 records from the different databases (Embase, n = 94; PubMed, n = 87; PsycINFO, n = 162; Scopus, n = 845; and Web of Science, n = 53). A search of the reference lists of the included articles yielded 14 additional articles. We also searched the Open Grey database for grey literature but did not find any relevant articles. After removing duplicates and screening the articles by the eligibility criteria, 166 articles were included in the review [6,20,[22][23][24][25]. Figure 1 shows the PRISMA flowchart for the systematic review process.
In terms of the versions of the HSS used, 51 studies used or validated the full 40-item version of the scale (Supplementary Table S1). Three studies [22,69,91] validated the full scale and further developed an abbreviated version of the scale. The remaining studies (n = 112) used study-specific abbreviated versions or subscales of the HSS to assess HIV-related stigma across various contexts.
In summary, all but two [144,146] of the included studies reported an aspect of reliability of the HSS.

Internal Consistency and Test-Retest Reliability
Of the 164 studies that reported an aspect of reliability, all reported the internal consistency of the HSS except one study [166], which only reported the test-retest reliability of a 40-item version used in a US sample of PLWH. Of these, seven studies [20,39,52,88,91,92,121] additionally reported the scale's test-retest reliability. In one study [39], split-half reliability was additionally assessed and reported to be adequate (0.93). No study reported the intra- or inter-rater reliability of the scale.
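For readers less familiar with the internal consistency statistic most of these studies report, Cronbach's alpha can be computed from item-level responses as k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below uses only the Python standard library; the response matrix is hypothetical four-point Likert data, not data from any included study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical responses: 5 respondents x 4 items, scored 1 to 4.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [2, 1, 2, 1],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, meets 0.70 threshold: {alpha >= 0.70}")
```

Because alpha tends to rise with the number of items, values for short abbreviated versions and for individual subscales are not directly comparable with values for the full 40-item scale.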
For internal consistency, all the studies, except a 4-item scale in one study [127] and some subscales in ten studies [23,52,68,69,91,111,170,172,176,189], reported Cronbach's alphas above the recommended acceptable threshold of 0.70 [191]. Two studies [34,95] reported the internal consistency of the HSS using McDonald's omega instead of Cronbach's alpha and reported good reliability (Table 1 and Supplementary Table S2). In one study [111], Cronbach's alpha (0.83), ordinal alpha (0.88), and omega alpha (0.93) were reported as measures of internal consistency. For test-retest reliability, the intraclass correlations in all the studies were above the recommended cut-off score of 0.40 [191], except for the concern with public attitudes subscale in the 17-item HSS used in Puerto Rico, which reported an intraclass correlation coefficient of 0.27 [92].
Table 1 and Supplementary Table S2 present the validity of the HSS as reported by the development and/or validation studies and observational or experimental studies, respectively, in detail. In total, 36 of the included studies assessed an aspect of the validity of the HSS (Table 1 and Supplementary Table S2).
Abbreviations: PCA, principal component analysis; CFA, confirmatory factor analysis; EFA, exploratory factor analysis; CFI, comparative fit index; DIF, differential item functioning; GFI, goodness of fit index; AGFI, adjusted goodness of fit index; AIC, Akaike information criterion; χ2, Chi-square; df, degrees of freedom; RMSEA, root mean square error of approximation; SRMR, standardized root mean square residual; TLI, Tucker-Lewis index; α, Cronbach's alpha; ICC, intraclass correlation; r, Pearson correlation coefficient; KMO, Kaiser-Meyer-Olkin measure; FGDs, focus group discussions; PLWH, people living with HIV; NR, not reported.

Content and Face Validity
Fourteen studies [20,35,40,63,70,91,97,104,107,111,118,137,147,184] evaluated the content validity of the HSS. Of these, three studies [35,70,91] also evaluated the scale's face validity. In three of these studies [35,104,184], content validity was assessed using the content validity index, with reported indices of 1.00 [184], 0.82 [35], and 0.87 [104]. For the rest of the studies, the items in the scales used were judged to be relevant, comprehensive, clear, or comprehensible by the participants [40,63,70,91,107,118], experts [20,147], or both participants and experts [97,111]. In one study [137], the items were selected from a previously published and validated scale to ensure content validity. To assess face validity, item relevance was judged by the participants or the experts in two studies [70,91]. One study [35] assessed face validity using the face validity index and reported a face validity index of 0.56.
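As an illustration of how a content validity index is commonly obtained, the sketch below computes an item-level index (the proportion of experts rating an item 3 or 4 on a four-point relevance scale) and averages it across items to give a scale-level index. This is an assumption about the general approach: the included studies may have used a different variant, and the expert ratings shown are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """Proportion of experts who rated the item as relevant (3 or 4 by default)."""
    return sum(r >= relevant_from for r in ratings) / len(ratings)


def scale_cvi_average(rating_matrix):
    """Average of the item-level indices; rating_matrix holds one list per item."""
    item_indices = [item_cvi(ratings) for ratings in rating_matrix]
    return sum(item_indices) / len(item_indices)


# Hypothetical relevance ratings (1 to 4) from five experts for four items.
expert_ratings = [
    [4, 4, 3, 4, 3],
    [4, 3, 2, 4, 3],
    [2, 3, 4, 4, 4],
    [4, 4, 4, 3, 1],
]
scale_index = scale_cvi_average(expert_ratings)
print(f"scale-level CVI = {scale_index:.2f}")
```

An index of 1.00, as reported in one study [184], would correspond to every expert rating every item as relevant under this definition.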

Construct Validity
In total, 27 studies assessed the factor structure of the HSS [20,22,23,40,45,60,69-71,91-93,95,97,107,111,121,143,147,156,171,176,178,180,184,190]. During the development of the initial scale, Berger et al. [20] derived a four-factor solution from the exploratory factor analysis (EFA) that accounted for 46% of the variance. The four-factor structure was replicated in 14 of these studies [45,60,69-71,91,92,107,111,147,156,171,178,184], often with acceptable-to-good model fit statistics (see Table 1 and Supplementary Table S2). In two studies that assessed construct validity in two versions of the HSS [22,97], the 12- and 10-item versions of the scale also replicated the four-factor structure of the original scale. However, EFA suggested a five-factor solution in the 40-item version of the scale in a study by Rongkavilit et al. [22] and a three-factor solution in the 13-item version of the scale in a study by Kamitani et al. [97]. Factor analysis for the remaining studies suggested a three-factor solution in two studies [23,176], a two-factor solution in six studies [40,93,95,121,177,180], and a one-factor solution in two studies [143,190].
In the studies that assessed convergent validity, the observed associations between the HSS and theoretically related measures were as hypothesised (see Table 1 and Supplementary Table S2 for correlation coefficients). Overall, there were low to high positive correlations between the scale and measures of depression, anxiety, social conflict, peer problems, bullying victimisation, stigmatisation, discrimination, fear of discovery, sexual abuse, sexuality problems, perceived side effects, HIV symptoms, detectable viral load, previous incarceration, female gender, heterosexual orientation, inconsistent condom use, and alcohol use.
Similarly, but in the inverse direction, there were low to high negative correlations between the scale and measures of self-esteem, social support, social integration, quality of life, life satisfaction, physical, psychological, and emotional well-being, self-efficacy, adherence, CD4 count level, disclosure of HIV status, spirituality/religiousness, overall health status, and acculturation. To assess divergent validity, Kamitani et al. [97] explored the associations of the 10- and 13-item versions of the HSS with education level in a sample of Asians living with HIV in the US and found no association (B = −0.28, p = 0.93).

Discriminant Validity
Only four studies [40,70,111,171] assessed the discriminant validity of the HSS. First, Boyes et al. [40] evaluated the discriminant validity of a 10-item scale adapted for measuring stigma-by-association among South African youth affected by HIV by comparing its performance among HIV-affected and unaffected youth. As expected, stigma scores were significantly higher among HIV-affected youth (p < 0.001).
Second, in a Spanish sample of PLWH, Fuster-RuizdeApodaca et al. [70] evaluated the discriminant validity of a 30-item scale by comparing its performance among participants with and without a history of AIDS-related opportunistic infection. As expected, stigma scores were significantly higher among participants with a history of AIDS-related opportunistic infection (p = 0.003).
Third, using a 21-item Spanish version of the scale in Mexico, Valle et al. [171] demonstrated discriminant validity of the scale by finding significant differences between participants with lower and higher stigma scores on the scale (p < 0.01). Finally, Luz et al. [111] determined the discriminant validity of a 12-item scale in a Brazilian sample of PLWH by comparing stigma scores based on antiretroviral treatment and adherence status. As expected, in specific samples or across the entire sample, participants who were not on treatment or those with poor adherence showed statistically significant higher subscale scores.

Cross-Cultural Adaptation and/or Validity
Three studies [111,144,146] assessed the cross-cultural validity of the HSS by assessing the differential item functioning (DIF) of its items across cultural and ethnic groups. In a 40-item version of the HSS, Rao et al. [144] found 11 items that functioned differently across a sample of black and white American participants. Reinius et al. [146] assessed the DIF of a 32-item version of the scale across Indian, Swedish, and US cohorts of PLWH. This study found nine items that functioned differently between cultures and one item that functioned differently across gender. Luz et al. [111] assessed the DIF of a 12-item version of the scale in a diverse sample of Brazilian participants recruited from different social media platforms (Grindr, Hornet, and WhatsApp/Facebook). They found one item that functioned differently across the three samples.
The cross-cultural adaptation process of the HSS was conducted in 34 studies (Table 1 and Supplementary Table S2). In 23 of these studies [36,39,40,69,70,91,92,94,95,97,107,111,118,121,125,126,129,132,137,156,159,162,184], the scales were adapted to the local contexts following forward and back-translations and review by experts and/or feedback from participant interviews and discussions. Ten of the studies [22,82,98,103-105,118,127,136,164] just translated the scale to the target native languages, while one study [122] adapted a scale that had already been translated in the study setting. Nine studies [23,35,61,88,116,147,160,170,175] did not adapt the scale but used previously adapted versions.

Quality of the Included Studies
Supplementary Table S3 presents the quality ratings of the included development and/or validation studies per the reported psychometric measures using the COSMIN checklist. Three studies [92,144,146] were rated to be of very good quality, eight studies [22,45,69,93,95,121,178,184] to be of adequate quality, nine studies [20,40,70,91,97,111,118,147,171] to be of doubtful quality, and four studies [23,39,63,107] to be of inadequate quality.

Discussion
This review aimed to systematically summarise the available evidence on the psychometric properties of the Berger HIV Stigma Scale. One hundred and sixty-six studies met our inclusion criteria and were reviewed. The largest share of the included studies (46.4%) was conducted in North America, particularly in the United States. This is not surprising, considering that the scale was first developed and validated for use in this setting. Therefore, a reliable and valid tool for use in this setting was available to researchers. The lack of adequately validated tools for use in a given setting can contribute to the under-investigation or underreporting of patient-reported health outcomes. For instance, the paucity of validated screening tools for anxiety disorders has previously been cited as a reason for the lower screening of anxiety disorders among young people living with HIV from sub-Saharan Africa [192].
Most of the included studies (91.6%) were published from 2010 onwards. This highlights an encouraging increase in the scientific knowledge of HIV-related stigma over the past decade. It is also encouraging to note that the included studies recruited diverse samples of PLWH, including key populations such as MSM, female sex workers, and adolescent girls and young women. This exemplifies the efforts that have been put in place to end HIV/AIDS, including efforts towards understanding HIV-related stigma in such key populations that have been known to be significant contributors to the pandemic [193,194].

Psychometric Properties of the HSS
The internal consistency of the HSS ranged from acceptable to excellent (Cronbach's alpha ≥ 0.70 [191]) in 93.2% of the included studies. This suggests that the scale is largely reliable for use across contexts and the diverse population of HIV-positive and HIV-negative but affected individuals. Moreover, for the few studies (n = 8) [20,39,52,88,91,92,121,166] that reported the test-retest reliability of the scale, the reported intraclass correlation coefficients met the acceptable threshold (≥0.40 [191]) in seven of these studies. This suggests that the HSS is also stable over time and provides further evidence of the scale's reliability. However, since this was only assessed in a few studies, future studies also need to incorporate this aspect of reliability to confidently ascertain the temporal stability of this scale.
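To make the test-retest statistic concrete, the sketch below computes one common form of the intraclass correlation coefficient, the two-way random-effects, absolute-agreement, single-measurement form often written ICC(2,1), from scores at two occasions. The choice of this form is an assumption, since the included studies do not necessarily use the same variant, and the scores are hypothetical totals on a 40 to 160 range.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per participant, one column per occasion."""
    n = len(scores)      # participants
    k = len(scores[0])   # measurement occasions (2 in a simple test-retest design)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores for five participants at test and retest.
test_retest_scores = [[80, 82], [95, 93], [110, 112], [70, 75], [100, 98]]
icc = icc_2_1(test_retest_scores)
print(f"ICC(2,1) = {icc:.2f}, meets 0.40 cut-off: {icc >= 0.40}")
```

Other ICC forms (for example consistency rather than absolute agreement) can give different values on the same data, which is one reason coefficients are hard to compare across studies that do not state the form used.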
It appears that the four-factor structure of the HSS is stable across contexts and the diverse population of PLWH. Of the 26 studies that assessed the factor structure of the HSS besides the original study, 16 studies (61.5%) replicated the four-factor structure of the original scale. Although different factor-solutions were suggested in some studies, the suggested solutions were consistent with the number of subscales used [143,177,180]. Where different factor solutions were suggested for the full or abbreviated versions, some factors in the suggested solutions were consistent with factors on the original scale. While this suggests that the experiences of HIV-related stigma might be similar across contexts, the differences in factor solutions might be due to cultural, linguistic, and sociodemographic differences [23,176] and differences in the number of items used [121].
Across studies reporting the convergent validity of the HSS, the correlations between the scale and the theoretically related measures were all in the expected direction, providing evidence of convergent validity. Of note, the correlations between the HSS and these measures were not only in the expected direction but also significant in most of the included studies. The observed associations are consistent with findings from a systematic review that assessed the association between HIV-related stigma and various health outcomes [195].
The discriminant validity of the HSS was established in four studies [40,70,111,171]. Although limited to only a few studies, the HSS appears to have the ability to distinguish between people with a higher and lower risk of HIV-related stigma. This suggests that the scale is sensitive to HIV-related stigma and provides strong evidence of validity [196]. However, four studies constitute a very thin evidence base; more research is needed that correlates the HSS with biomarkers of HIV, e.g., medical adherence, viral load, and CD4 count. Additionally, the HSS could be used in longitudinal studies to examine the extent to which it is sensitive to change, since this is what will make it useful for interventions.
Although the HSS has been translated and/or adapted to target settings, evidence of cross-cultural validity remains limited. Only three of the included studies [111,144,146] assessed and established the cross-cultural validity of the HSS through DIF. More cross-cultural validation work of the HSS is needed to determine its performance across different cultures and contexts. Despite this, the scale appears to have content and face validity across contexts, although this is limited as it was only assessed in 14 of the included studies. Overall, the evidence of reliability and validity (although limited) from this review suggests that the HSS is psychometrically sound to assess HIV-related stigma across contexts and the diverse population of HIV-positive and HIV-negative but affected people.

Limitations of the Review
This review was not without limitations. First, we restricted our search to studies published in the English language. Consequently, we may have left out relevant studies published in other languages. Second, we could not rate the quality of the included observational or experimental studies because of the nature of our chosen rating tool. Because of this, it is difficult to ascertain the strength of evidence of the results from these studies. Finally, we could not conduct a meta-analysis because of the heterogeneity in the versions of the HSS used as well as the sub-populations involved. Because we could not conduct a meta-analysis, we also could not assess and report on publication bias.

Implications of the Findings to Practice and Future Research
There is a need for more validation work to ascertain the validity of the HSS across contexts. Only 21.7% (n = 36) of the included studies reported an aspect of the validity of the HSS, while the remaining studies only reported reliability. This validation work is needed even more in African, Asian, Oceanian, and South American settings, where there is minimal information on the validity of the HSS. Of the 36 studies reporting on an aspect of validity, more than half (58.3%, n = 21) came from North America and Europe. The rest were distributed across Africa [40,133,137], Asia [22,35,39,91,95,104,143,180,184], and South America [69,111,121]. No study was from Oceania. The distribution of the validation studies reflects a bias towards the global North. This may be because most of the funding for research is in the global North [197], although the highest disease burden is in low- and middle-income countries [1]. This highlights a need for more research investments in low- and middle-income countries. That said, where reported, the HSS appears to have good psychometric properties, particularly reliability, across contexts and across the diverse population of HIV-positive and HIV-negative but affected people. Therefore, researchers and related practitioners may use this measure in their contexts following some validation work as needed.
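The proportions quoted in this paragraph can be re-derived from the counts given in the text. The short check below does so; the regional counts for Africa, Asia, and South America are taken from the number of citations listed for each region, which is an assumption about how those studies were tallied.

```python
total_included = 166          # studies included in the review
validity_studies = 36         # studies reporting an aspect of validity
north_america_europe = 21     # validity studies from North America and Europe

# Counted from the citation lists given for each region in the text.
other_regions = {"Africa": 3, "Asia": 9, "South America": 3, "Oceania": 0}

pct_validity = round(100 * validity_studies / total_included, 1)
pct_global_north = round(100 * north_america_europe / validity_studies, 1)
regional_total = north_america_europe + sum(other_regions.values())

print(f"{pct_validity}% of included studies reported validity")
print(f"{pct_global_north}% of those came from North America and Europe")
print(f"regional counts sum to {regional_total} of {validity_studies}")
```

The regional counts add up to the 36 validity studies, so the percentages in the paragraph are internally consistent.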

Conclusions
The measurement of HIV-related stigma has the potential to help identify individuals at risk of poor psychological outcomes and inform the development and evaluation of HIV-stigma reduction interventions. However, this requires the availability of psychometrically sound measures for use among the target populations. The HSS is an example of such measures and is one of the most widely used measures of HIV-related stigma. According to this review, the HSS appears to be a reliable and valid measure of HIV-related stigma across contexts. However, evidence of validity is limited, particularly for African, Asian, Oceanian, and South American settings. This calls for more context-specific validation work of the HSS. The HSS may be used to assess HIV-related stigma following context-specific work as needed.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijerph182413074/s1, Supplementary File S1: Search string used in PubMed, Table S1: Characteristics of included studies, Table S2: Psychometric properties of the HSS as reported by the observational or experimental studies, Table S3: Quality ratings of the included development and/or validation studies.