Validation and Psychometric Properties of the Spanish Version of the Hopkins Symptom Checklist-25 Scale for Depression Detection in Primary Care

Depression constitutes a major public health problem due to its high prevalence and difficulty in diagnosis. The Hopkins Symptom Checklist-25 (HSCL-25) scale has been identified as valid, reproducible, effective, and easy to use in primary care (PC). The purpose of the study was to assess the psychometric properties of the HSCL-25 and validate its Spanish version. A multicenter cross-sectional study was carried out at six PC centers in Spain. Validity and reliability were assessed against the structured Composite International Diagnostic Interview (CIDI). Out of the 790 patients, 767 completed the HSCL-25; 738 answered all the items. Global Cronbach’s alpha was 0.92 (0.88 for the depression dimension and 0.83 for the anxiety dimension). Confirmatory factor analysis (CFA) showed one global factor and two correlated factors with a correlation of 0.84. Area under the curve (AUC) was 0.89 (CI 95%, 0.86–0.93). For a 1.75 cutoff point, sensitivity was 88.1% (CI 95%, 77.1–95.1%) and specificity was 76.7% (CI 95%, 73.3–79.8%). The Spanish version of the HSCL-25 has a high response percentage, validity, and reliability and is well-accepted by PC patients.


Introduction
Depression is a common condition among adults and can lead to harmful consequences. Worldwide, it is considered the third cause of years lost to disability [1], with a prevalence that increased by 17.8% between 2005 and 2015 [2]. In Europe, studies carried out in primary care (PC) settings have reported an incidence from 9.6% [3] to 20.2% [4]. The prevalence of depression in Spain is higher than the European mean and is associated

Participants
The selection criteria were those employed in the EIRA study. Eligible participants were patients aged between 45 and 75 years who had two or more of the following unhealthy behaviors: tobacco use, low adherence to the Mediterranean dietary pattern, and insufficient physical activity. Exclusion criteria were advanced serious illness, cognitive impairment, dependence in basic everyday activities, severe mental illness, inclusion in a long-term home healthcare program, treatment for cancer, end-of-life care, or no plan to reside in the area during the intervention period.

Recruitment and Sample Size
Recruitment was carried out by consecutive sampling of patients who met the selection criteria and attended the primary healthcare center (PHC) for any reason. The recruitment period lasted 6 months during 2017.
The COSMIN guide [23] was followed to calculate the sample size. It states that seven completed questionnaires are needed for each item of the scale and that at least 100 completed questionnaires are required to assess psychometric properties. As the HSCL-25 has 25 items, and taking into account a 10% possibility of missing values, 193 patients were needed to complete the questionnaire.
In order to estimate the sample size required to compare the HSCL-25 with the Composite International Diagnostic Interview (CIDI), the receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC) were calculated with the BIOSOFT application (http://www.biosoft.hacettepe.edu.tr/easyROC/, accessed on 31 January 2021) employing the following parameters:
A type I error of 5% and a power of 95% were selected. Thus, 87 cases and 174 controls were needed (new AUC test: 0.80, standard AUC test: 0.74; case/control ratio: 2).
Taking into account that the estimated prevalence in PC is 16.3% [4], 533 patients were required to complete the scale to obtain 87 cases.
To evaluate test-retest reliability, the same considerations and a 20% possibility of missing values were taken into account; 26 patients were needed to reach an acceptable correlation coefficient of 0.7 [24]. All the included patients were invited to participate in the telephone retest.

Variables
Sociodemographic data (sex, age, nationality, marital status, current employment, and education level) were gathered from the participants. They were asked to complete the self-administered HSCL-25 questionnaire and other forms related to the EIRA study. Afterwards, trained professionals blinded to the HSCL-25 results conducted the CIDI with all the participants. Training consisted of a general presentation of the interview procedure, a question-by-question reading, role-playing with the interviewers, and resolution of doubtful situations. The HSCL-25 retest was conducted by telephone to improve feasibility; it was carried out between 1 and 3 months later.

Hopkins Symptom Checklist-25 (HSCL-25)
The HSCL-25 is a self-administered questionnaire that takes from five to ten minutes to complete [13]. It consists of 25 items on a four-point Likert scale: 1 = "Not at all," 2 = "A little," 3 = "Quite a bit," 4 = "Extremely". The tool has two well-known dimensions: items 1 to 10 belong to the anxiety dimension, whereas items 11 to 25 constitute the depression dimension. The HSCL-25 score is calculated by dividing the sum of the item ratings by the number of items answered, so the final score can range from 1 to 4. A cutoff value of 1.75 is generally used for diagnosis of major depression, defined as "a case in need of treatment". This cutoff point is recommended as a valid predictor of mental disorder [15,17,25]. Our study was carried out using the Spanish version of the HSCL-25 obtained by means of translation and cultural adaptation of the original English version [12].
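The scoring rule described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, treating a score equal to the cutoff as positive is an assumption, and applying the same averaging to a subset of items (for example, the anxiety items) is an assumption that the text does not state.

```python
ANXIETY_ITEMS = range(1, 11)      # items 1 to 10
DEPRESSION_ITEMS = range(11, 26)  # items 11 to 25
CUTOFF = 1.75

def mean_score(responses, items=range(1, 26)):
    """Mean rating over the answered items.

    `responses` maps item number (1..25) to a rating from 1 to 4,
    or to None when the item was left blank.
    """
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if not answered:
        raise ValueError("no answered items")
    return sum(answered) / len(answered)

def is_probable_case(score, cutoff=CUTOFF):
    # Assumption: scores equal to the cutoff count as positive.
    return score >= cutoff

# Toy respondent: rating 3 on the anxiety items, 1 on the depression items.
responses = {i: 3 for i in ANXIETY_ITEMS}
responses.update({i: 1 for i in DEPRESSION_ITEMS})
total = mean_score(responses)  # (10*3 + 15*1) / 25 = 1.8
print(total, is_probable_case(total))
```

Because the divisor is the number of items answered rather than 25, a respondent who skips an item still receives a score on the 1 to 4 range.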

Composite International Diagnostic Interview (CIDI)
The CIDI is a standardized structured diagnostic interview created by the World Health Organization (WHO) according to the DSM-IV and International Classification of Diseases (ICD-10) definitions and criteria. Used by trained interviewers for mental disorder assessment in the general population [26], it has demonstrated high validity and reliability [27]. Whilst the original CIDI was in English, it has been adapted into and validated for many languages using a common procedure overseen by the WHO [28]. Questions related to depression symptoms can be found in section E of the CIDI. In this study, it was considered the gold standard to assess the HSCL-25.

Patient Health Questionnaire (PHQ)
The PHQ is a well-known self-administered questionnaire used for common mental disorders. The PHQ-9 is the depression module in which each of the nine items is rated with a Likert scale that ranges from 0 to 3 [29]. The total score can vary from 0 to 27. Scores of 15 or more indicate major depression. For this study, the validated Spanish version [30] was employed.

Statistical Analysis
Analysis was conducted using Stata version 15 (StataCorp LLC, College Station, TX, USA).

Missing Data
Missing item responses were imputed with the mean of the individual's responses to the remaining scale items (the participant's most representative value). Subjects who answered fewer than 50% of the items were excluded. The same imputation was carried out for the retest values.
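The imputation rule above amounts to person-mean imputation with an exclusion threshold. The following is a minimal sketch under that reading; the function name and the use of `None` to mark blank items are illustrative choices, not part of the study.

```python
def impute_person_mean(ratings, min_fraction=0.5):
    """Replace blank items (None) with the mean of the respondent's answered
    items. Returns None when fewer than `min_fraction` of the items were
    answered, meaning the respondent is excluded."""
    answered = [r for r in ratings if r is not None]
    if len(answered) < min_fraction * len(ratings):
        return None
    person_mean = sum(answered) / len(answered)
    return [person_mean if r is None else r for r in ratings]

print(impute_person_mean([1, 2, None, 3]))       # [1, 2, 2.0, 3]
print(impute_person_mean([2] * 12 + [None] * 13))  # None (12 of 25 answered)
```

Under this rule, a respondent who answered 12 of the 25 items (48%) is excluded, whereas one who answered 13 is retained. Person-mean imputation leaves the respondent's mean score unchanged.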

Responding Process and Item Analysis
The responding process was analyzed by looking for patterns of nonresponse and by examining the frequency distribution of item responses by category and sex. The discriminatory capacity of the items was assessed by comparing the two extreme groups. The discrimination index (DI) of each item was calculated as the difference between the item means of the two groups. Given that the response options have four possible categories, the DI could vary from −3 to +3.
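The discrimination index can be illustrated as follows. The text does not state how the two extreme groups were formed, so this sketch assumes the conventional choice of the top and bottom 27% of respondents ranked by total score; the function name and the `fraction` parameter are hypothetical.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Difference between the item mean in the highest-scoring group and the
    item mean in the lowest-scoring group.

    Assumption: groups are the top and bottom `fraction` of respondents,
    ranked by total scale score.
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))
    order = sorted(range(n), key=lambda i: total_scores[i])
    low, high = order[:k], order[-k:]
    mean = lambda idx: sum(item_scores[i] for i in idx) / len(idx)
    return mean(high) - mean(low)

# Toy data: six respondents, one item rated 1..4, and their total scores.
item = [1, 1, 2, 3, 4, 4]
totals = [25, 30, 40, 55, 80, 90]
print(discrimination_index(item, totals))  # 3.0
```

With ratings from 1 to 4, the difference between two group means is bounded by −3 and +3, as stated above.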

Internal Structure
Confirmatory factor analysis (CFA) was carried out based on the structure of the original English version. Factor loadings were calculated for two models: one with a single factor and one with two correlated factors (anxiety and depression). The robust maximum likelihood mean-adjusted method was employed to carry out the factor analysis of the standardized values. To evaluate the fit of the estimated model, the absolute fit was assessed with the chi-squared test. Given that this value may be affected by the sample size, complementary indices were employed, including the root mean square error of approximation (RMSEA), the standardized root mean square residual (SRMR), and the coefficient of determination (CD). In addition, comparative indices such as the comparative fit index (CFI) and the Tucker-Lewis index (TLI) were employed.

Criterion Validity
Criterion validity was measured by calculating the ROC curve for the HSCL-25 scale in comparison with the gold-standard CIDI. The AUC was estimated with 95% CI. Sensitivity, specificity, positive and negative predictive values, Youden index, and the best cutoff point were also assessed.
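The quantities named above can be sketched for a toy data set. This is an illustration under stated assumptions, not the Stata procedure used in the study: the function names are hypothetical, a score at or above the cutoff is treated as positive, and the AUC is computed as the proportion of case/non-case pairs in which the case has the higher score (ties count as one half).

```python
def sensitivity_specificity(scores, is_case, cutoff):
    pairs = list(zip(scores, is_case))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

def youden(sens, spec):
    return sens + spec - 1

def best_cutoff(scores, is_case):
    """Observed score that maximizes the Youden index."""
    candidates = sorted(set(scores))
    return max(candidates,
               key=lambda c: youden(*sensitivity_specificity(scores, is_case, c)))

def auc(scores, is_case):
    pos = [s for s, c in zip(scores, is_case) if c]
    neg = [s for s, c in zip(scores, is_case) if not c]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: scale scores and the reference-standard diagnosis.
scores = [1.2, 1.5, 1.7, 1.8, 2.0, 2.5]
is_case = [False, False, False, True, False, True]
print(best_cutoff(scores, is_case))                   # 1.8
print(sensitivity_specificity(scores, is_case, 1.8))  # (1.0, 0.75)
print(auc(scores, is_case))                           # 0.875
```

Applied to the sensitivity (88.1%) and specificity (76.7%) reported for the 1.75 cutoff, `youden(0.881, 0.767)` gives approximately 0.648.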
Concordance with the PHQ-9 was measured with the Pearson correlation coefficient and the prevalence- and bias-adjusted kappa, taking into account cutoff points of 1.75 and 15 for the HSCL-25 and PHQ-9, respectively.
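For two binary classifications, the prevalence- and bias-adjusted kappa depends only on the observed proportion of agreement: PABAK = 2·Po − 1. A minimal sketch follows; the function names are hypothetical.

```python
def pabak_from_agreement(observed_agreement):
    """PABAK for two binary ratings, given the proportion of agreement."""
    return 2 * observed_agreement - 1

def pabak(ratings_a, ratings_b):
    """PABAK computed from two equally long lists of binary classifications."""
    agree = sum(1 for a, b in zip(ratings_a, ratings_b) if a == b)
    return pabak_from_agreement(agree / len(ratings_a))

# Toy data: four patients classified as positive/negative by two scales.
print(pabak([True, True, False, False], [True, False, False, False]))  # 0.5
```

An observed agreement of 77.7% corresponds to a PABAK of about 0.554 under this formula.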

Internal Consistency
The contribution of the items to the internal consistency was analyzed with indicators based on correlation (homogeneity), covariance (Cronbach's alpha coefficient), and regression (R²). The total Cronbach's alpha and one for each of the two subscales were calculated. A value ≥0.7 was considered adequate [24].
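Cronbach's alpha and the "alpha if item deleted" statistic can be sketched with the standard formula, alpha = k/(k − 1) · (1 − Σ item variances / variance of the total score). The function names are hypothetical and the data are toy values, not study data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """`rows` holds one list of item ratings per respondent."""
    k = len(rows[0])
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def alpha_if_item_deleted(rows, item_index):
    """Alpha recomputed after dropping one item (0-based index)."""
    reduced = [row[:item_index] + row[item_index + 1:] for row in rows]
    return cronbach_alpha(reduced)

# Toy data: four respondents, three perfectly consistent items.
rows = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
print(round(cronbach_alpha(rows), 6))  # 1.0
```

An item whose removal raises alpha contributes least to internal consistency; an item whose removal lowers alpha the most contributes most.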

Test-Retest Reliability
Test-retest reliability was assessed by calculating the intraclass correlation coefficient (ICC) using the mean of the two evaluations (test and retest), absolute agreement, and a two-way mixed-effects model.
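The ICC described above can be sketched from two-way ANOVA mean squares. This sketch assumes the average-measures, absolute-agreement form, ICC = (MSR − MSE) / (MSR + (MSC − MSE)/n), where MSR, MSC, and MSE are the mean squares for subjects, occasions, and error and n is the number of subjects. The function name is hypothetical and the code is not the Stata routine used in the study.

```python
def icc_absolute_average(data):
    """`data` holds one row per subject and one column per occasion
    (for example, test and retest)."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(col) / n for col in zip(*data)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

# Toy data: identical test and retest scores give perfect agreement.
print(icc_absolute_average([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

Because absolute agreement is required, a systematic shift between test and retest lowers the coefficient even when the ranking of subjects is preserved: `[[1, 2], [2, 3], [3, 4]]` yields about 0.8.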

Participants
A total of 790 patients were selected for the HSCL-25 and 767 patients completed it (97.1% response rate). The participants' mean age was 58.4 years (± 8.2) without significant gender differences; 54.4% were women. Table 1 depicts the sociodemographic characteristics of the sample. Women and men differed in marital status and current employment.

Responding Process and Item Analysis
Of the 23 participants excluded from the analysis (2.9%), 22 did not answer any of the items and one responded to only 12 of the 25 items. Thirty participants (3.8%) left between one and five items blank; these missing values were imputed. No systematic non-response pattern was found; Supplementary Materials Table S1 shows the non-response data in detail.
The mean score of the items, the response percentages for each category, and the DI are depicted in Table 2. Item 20 "Worrying too much" with a mean of 2.14 had the highest global rating; it was followed by item 4 "Nervousness" (mean = 2.03). In contrast, item 18 "Thinking of ending one's life" (mean = 1.09) had the lowest value, followed by item 9 "Feeling panic" (mean = 1.17). Women scored higher in all the items. The greatest difference between genders was observed in item 14 "Losing sexual interest" with a statistically significant difference of 0.73. The item that varied the least between genders was item 24 "Poor appetite," and it was non-significantly different.
With respect to response frequency distribution, 60.9% of the responses were found in the lowest response category "Not at all," with the rating of 1. A floor effect was observed in item 18 "Thinking of ending one's life," with 93.6% of responses in the lowest category. None of the items presented a ceiling effect.
All the items showed a positive DI. Item 4 "Nervousness" discriminated the best, with a DI of 1.43. In contrast, the item with the worst discrimination value was item 18 "Thinking of ending one's life" (DI = 0.22).

Internal Structure: Confirmatory Factorial Analysis
The Satorra-Bentler chi-squared test was significant. Overall, the other indices showed that the proposed one-factor and two-correlated-factor models were reasonably acceptable. Table 3 depicts the fit indices for each model. Table 4 shows the factor loadings for each model and the correlation in the two-factor model. All the factor loadings were positive, statistically significant (p < 0.001), and ≥0.30. Only item 24 "Poor appetite" had a loading below 0.4. With respect to the two correlated factors, the standardized values ranged from 0.3 for item 24 to 0.84 for item 17 "Feeling blue," both in the depression dimension.

Criterion Validity: Relationship with the Gold-Standard CIDI
Of the 767 patients who completed the HSCL-25 scale, 736 also participated in the CIDI interview (96.0%). The 31 patients who did not take part in the interview were excluded from the following analysis. According to the CIDI, the global depression prevalence was 8.0% (CI 95%, 6.2-10.2%): 4.7% (CI 95%, 2.7-7.5%) in men and 10.8% (CI 95%, 8.0-14.4%) in women. With respect to the HSCL-25, the global prevalence was 28.5% (CI 95%, 25.3-31.9%) for the 1.75 cutoff point. Table 5 shows the different indices and values globally and by sex. The differing optimum cutoff points for women (1.76) and men (1.84) are noteworthy. Sensitivity was similar for both genders whilst specificity was better in men. The global ROC curve is depicted in Figure 1 and the curves by gender in Figure 2. The global AUC was 0.892 (CI 95%, 0.856-0.928); in the gender analysis, it was greater in men. The optimum cutoff point for the Spanish version of the HSCL-25 was 1.76, with a Youden index of 64.8%.

Criterion Validity: Relationship with PHQ-9 External Criteria
The HSCL-25 scale and the PHQ-9 were completed by 761 patients. The Pearson coefficient for the values of both scales was 0.780 (CI 95%, 0.750-0.806). Considering both variables as categorical with cutoff points of 1.75 and 15 for the HSCL-25 and the PHQ-9, respectively, the prevalence-and bias-adjusted kappa (PABAK) value was 0.553, with the global agreement of 77.7% (CI 95%, 74.6-80.5%).

Reliability: Internal Consistency and Test-Retest Reliability
Intercorrelation between the items can be observed in Table S2 of the Supplementary Materials. Cronbach's alpha coefficient for the total values and for each subscale, the item-total correlations, and the determination coefficients (R²) are depicted in Table 6. The coefficient obtained when each item was removed was also calculated, as shown in the middle column of Table 6. All the resulting values were lower, with the exception of item 24 "Poor appetite," the elimination of which increased the global coefficient from the fourth decimal place. Both the item-total correlation and the R² of this item were lower than those of the other items, as it was the least consistent one. The most homogeneous item was item 17 "Feeling blue": when it was eliminated, the internal consistency of the scale decreased to its lowest value, and this item presented the highest item-total correlation and R².

Discussion
A major finding of our study is that the Spanish version of the HSCL-25 is an instrument with good acceptability and a high response rate among PC patients. It is reliable in measuring depression and shows considerable sensitivity and specificity when compared to the CIDI interview. The CFA demonstrated that the structure of the Spanish version is similar to that of the original English one.
For most of the Spanish population, PC consultations are the gateway to the healthcare system. Due to the high prevalence of depression [5], it is crucial that easy-to-use, viable tools are available for the PC environment. As the HSCL-25 meets these requirements [9], awareness of its psychometric properties is relevant, in particular of those items that contribute most to detecting symptoms and thus permit discrimination between the healthy population and the potentially depressed one. In addition, PC professionals should be informed of the reliability of the scale and of its sensitivity and specificity values, which are key to establishing its diagnostic utility.
The study participants were PC patients aged 45-75 years who had taken part in the more extensive EIRA study [22]. Whilst this implied a restricted age range, which might constitute a limitation, the sample was considered sufficiently representative of such individuals. The sample size was greater than the minimum required for the analysis according to the COSMIN guidelines [23], which are taken as the reference in the field of psychometrics. The statistical analysis was carried out based on the same recommendations. The content validity of the Spanish version of the HSCL-25 had been previously evaluated when it was translated and transculturally adapted to Spanish and other official languages of the country [12].
With respect to item analysis, the response percentage was high, and no definite pattern of non-response was observed. As a consequence, the questionnaire appears to be widely accepted by PC patients, without any items that may cause discomfort or difficulty in understanding. As the study was carried out with patients attending the PHC for any reason, a high percentage of responses in the low-rating categories was expected. In addition, a floor effect was foreseen for item 18 "Thinking of ending one's life," which concerns suicidal ideation. Taking into account the definition of depression according to the DSM-5 [31], it is not surprising that the item that best discriminated between the healthy population and the one with depressive symptoms referred to sadness. Item 17 "Feeling blue" was shown to be the most homogeneous in all the analyses, with the highest correlation with the other scale items. It presented the highest coefficient of determination (that is to say, it could be predicted from the rest of the items) and contributed most to increasing global internal consistency.
The scale's factorial structure was analyzed with CFA, as the HSCL-25 has been widely studied with one single factor or two correlated ones, even though other models have been proposed [15,32,33]. The fit indices for both models were acceptable, and the results indicated moderately high factor loadings. In the two-factor model, there was a factorial correlation of 0.84, which indicated that the depression and anxiety dimensions were strongly and positively correlated. This figure is higher than that detected in previous studies [15]. The correlation is understandable as symptoms of anxiety are often observed in patients diagnosed with depression; moreover, anxiety and depression are frequently found to be associated comorbidities [34,35].
In other studies that compared the HSCL-25 scale with structured psychiatric interviews, a subsample of participants was selected for the latter to improve feasibility [13,16,17,20]. A strength of our work is that all the participants who responded to the HSCL-25 scale also took part in the structured CIDI interview conducted by trained professionals. We obtained 736 patients who fully answered both the scale and the gold-standard CIDI. Criterion validity was considerable: the global AUC was 0.892 (higher in men than in women). Sensitivity and specificity were high both globally and by gender. The former was greater than that found in previous studies [13,16,17,20,25], whilst the latter was similar to the 73% reported by Nettelbladt et al. [20] and the 78% observed by Lundin et al. [16], both in Swedish populations. Other authors have described higher values [13,17]. Despite the high number of false positives obtained, in a similar manner to other studies [36], the negative predictive value was greater than 97% for both genders. Such a finding indicates that the scale is a good tool for depression screening. With respect to the optimum cutoff point, both the global figure and the one for women were very similar to the 1.75 proposed in the original version and employed in other studies [21,37]. Nevertheless, the value of 1.84 for men was higher, and contrasted with the findings of other authors, where the cutoff point was greater for women [25]. When the total HSCL-25 score was compared with that of the PHQ-9 depression module [29,38], a high correlation was obtained, and the PABAK was acceptable. This analysis reinforces the high criterion validity found.
For the one-factor HSCL-25 scale, Cronbach's alpha coefficient was 0.92, similar to the 0.93 obtained in the French version [13]. Nunnally et al. established the critical level of reliability at 0.70; they stated, however, that for key individual decisions, such as the diagnosis of depression, reliability should be raised to 0.90 [39]. Cronbach's alpha for the subscales of depression and anxiety taken separately was greater than 0.80. Such findings demonstrate the strong reliability of the scale to measure depression, especially when employed as a single-dimension instrument. The test-retest reliability was greater than 0.90, higher than that observed in other studies [18], which indicates that the ratings are stable over time. The time interval between the baseline interview and the retest was considered adequate and the test conditions acceptable, despite the retest being carried out by telephone to avoid overwhelming the participants.
Shortened versions of the scale with five and 10 items [40,41] presenting acceptable reliability have been proposed. They could be of use, taking into account the characteristics of the PC environment. These studies have been performed in other languages, and it might be of interest to translate the shortened versions into Spanish.
Our findings indicate that, in the future, the Spanish version of the HSCL-25 scale could be employed as a diagnostic tool for depression in PC consultations. Our study has taken place within the framework of a European project [8,9], in which a common methodology has been used for translation and adaptation into different languages. We believe, therefore, that the HSCL-25 is a good tool to carry out research concerning the prevalence of depression at the European level once the various language versions have been validated.

Conclusions
The Spanish version of the Hopkins Symptom Checklist-25 is well-accepted by patients and shows high validity and reliability to detect depression symptoms in primary care. It has a similar factorial structure to the original English version and can be used in daily practice and for research.