Development and Validation of the EspaiJove.net Mental Health Literacy (EMHL) Test for Spanish Adolescents

There is evidence of the effectiveness of implementing mental health literacy (MHL) programs. However, there are substantial limitations in the instruments available for measuring MHL. This study aimed to develop and validate the EspaiJove.net MHL test (EMHL) for Spanish adolescents by assessing its psychometric properties. The EMHL test was developed through item pool generation and a pilot study. A convenience sample of students aged 13–15 years (n = 355) participated in the validity study. Reliability was assessed through internal consistency and test-retest. Convergent validity was evaluated by comparing the effect sizes among known groups with different levels of mental health knowledge, the correlation with mental health-related instruments, and the item discrimination index. A final 35-item version of the EMHL test was obtained with two parts: (i) a binary choice format (yes/no) for the identification of mental disorders; (ii) multiple choice questions with four possible answer options. Internal consistency was acceptable in the first part (Cronbach’s alpha = 0.744; Guttman’s lambda 2 = 0.773) and almost acceptable in the second part (Cronbach’s alpha = 0.615; Guttman’s lambda 2 = 0.643). The test-retest evaluation supported the stability of the test (first part, ICC = 0.578; second part, ICC = 0.422). No ceiling or floor effects were found. The EMHL test scores discriminated between known groups with different levels of mental health knowledge and were associated with several MHL-related constructs. Conclusions: The EMHL test is a relevant measure for assessing MHL in adolescents in the Spanish context, with acceptable validity and stability.


Introduction
It is estimated that 75% of all people suffering from a mental disorder have experienced its onset by 25 years of age [1,2] and 50% during adolescence [3]. Promoting mental health to prevent mental disorders and their consequences is one of the main goals in public health [4]. A lack of mental health literacy (MHL) is associated with mental illness and delays in seeking help, so an increase in the community's MHL is needed to empower the community to improve mental health [5,6].
MHL is defined as "a set of knowledge and beliefs about mental disorders which aid their recognition, management or prevention" [5]. MHL involves: (a) the ability to recognize the development of mental disorders, (b) knowledge and beliefs about risk factors, the causes of mental disorders and how to prevent them, as well as (c) knowledge of how to seek professional help and effective available treatments [5].
Some MHL interventions for adolescents and young people have been developed in recent years in several countries [7][8][9][10][11]. These interventions suggest an improvement in mental health knowledge, easier monitoring and help-seeking, an increase in the self-recognition of mental disorders, and a reduction in mental health-related stigma. In Spain, the MHL program "EspaiJove.net: a space for mental health" (EspaiJove.net) [12] has been developed in secondary schools in Barcelona.
A systematic review of the measurement properties of tools measuring MHL [13] showed that there are only two specific instruments targeting adolescents: one was only for depression [14] and the other one focused on improving beliefs and attitudes towards mental health [15]. This highlights the need for the development, evaluation, and validation of tools addressing mental health knowledge specifically for adolescents who are vulnerable to developing a mental illness. So, there is a validation gap in measuring MHL categories and the related psychometric properties of these instruments. Furthermore, no validated MHL measures addressing knowledge of positive/good mental health have been developed [16]. Developing an assessment in a test format that comprehensively evaluates the main contents of the MHL (negative and positive mental health, and help-seeking) would allow a more specific and rigorous assessment of MHL levels in adolescents, and the effectiveness of MHL interventions.
To our knowledge, no specific MHL questionnaire has been properly validated. The aim of this study is to develop and validate the EspaiJove.net mental health literacy (EMHL) test for the assessment of MHL in Spanish adolescents. EspaiJove.net is a universal MHL intervention which aims to promote mental health, prevent mental disorders, and facilitate help-seeking behaviors among secondary school students in the Spanish context.

Materials and Methods
The EMHL test is a maximum performance test (criterion-referenced test, CRT) based on thematic content from EspaiJove.net. The item pool was generated from the contents of the EspaiJove.net thematic modules, which include: (1) the concepts of mental health and mental disorders, (2) the mental health multidisciplinary team network and the use of health services, (3) healthy and risk behaviors in mental health, (4) social skills and antisocial behavior, bullying and cyber-bullying, (5) anxiety, (6) depression, (7) self-harm and suicidal behaviors, (8) eating disorders, (9) alcohol and substance use, and (10) psychotic disorders.
The EMHL test development process involved two phases, as shown in Figure 1. The content of the EMHL test was developed using a literature review and focus groups. For more information, see Appendix A.

The EMHL Test Score
To obtain the EMHL total test score for each part of the test, the formula (A − E)/(n − 1) is used, where A = number of correct answers, E = number of errors (including missing values), and n = number of options for each item. Thus, for the first part of the EMHL test, the formula is (A − E)/(2 − 1), and for the second part it is (A − E)/(4 − 1), where each correct answer adds one point to the total score, and each incorrect answer results in zero points (uncorrected total score) [17]. To facilitate the interpretation of results, both sections were converted into continuous scales ranging from 0 to 10 (corrected scores) using the formula ((U − m)/(M − m)) × 10, where U = uncorrected total score, m = the minimum score allowed, and M = the maximum score allowed. So, for the first part of the EMHL test, the formula is ((U − (−15))/(15 − (−15))) × 10, which is ((U + 15)/30) × 10, and for the second part it is ((U − (−6.67))/(6.67 − (−6.67))) × 10, which is ((U + 6.67)/13.34) × 10. A higher score means greater mental health knowledge.
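The scoring rules above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the item counts (15 binary items and 20 four-option items) are an assumption inferred from the stated score ranges (±15 and ±6.67) and the 35-item total; they are not given explicitly in this section.

```python
def uncorrected_score(n_correct, n_errors, n_options):
    """Uncorrected total score (A - E) / (n - 1), as stated in the text."""
    return (n_correct - n_errors) / (n_options - 1)


def corrected_score(u, n_items, n_options):
    """Rescale an uncorrected score to 0-10 via ((U - m) / (M - m)) * 10."""
    max_score = uncorrected_score(n_items, 0, n_options)  # every item correct
    min_score = uncorrected_score(0, n_items, n_options)  # every item wrong or missing
    return (u - min_score) / (max_score - min_score) * 10


# Assumed item counts, inferred from the score ranges reported in the text
PART1 = {"n_items": 15, "n_options": 2}
PART2 = {"n_items": 20, "n_options": 4}

# Hypothetical respondent: 12 correct and 3 errors in the first part
u1 = uncorrected_score(12, 3, PART1["n_options"])
print(u1, round(corrected_score(u1, **PART1), 2))  # 9.0 8.0
```

Deriving the minimum and maximum from the item count, rather than hard-coding 15 and 6.67, avoids the rounding in 6.67 and keeps the two parts on the same code path.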

Sample
The validation process consisted of administering the final version of the EMHL test to a non-randomized convenience sample of high school students aged 14–15 years (N = 355) in 6 schools in Barcelona, Spain, and analyzing the results. Informed consent was signed by both adolescents and parents. Exclusion criteria included: (1) students with special educational needs and/or with cognitive problems; and (2) no understanding of Spanish or Catalan. Nurses and psychologists who were members of the EspaiJove.net team informed the participants about the content of the study and administered the EMHL test.
Written informed consent from all adults participating (teachers, university students, primary care physicians and nurses and mental health professionals), high school students, and the parents of all students participating in the study was requested.

Main Validity Measures
We hypothesized that specific variables would be associated with the level of MHL, with varying degrees of strength.
Stigma was measured using two questionnaires: (1) the Spanish version of the Community Attitudes toward the Mentally Ill (CAMI) scale [18], which consists of 40 items divided into four dimensions (authoritarianism, benevolence, community mental health ideology, and social restrictiveness). We only used the authoritarianism dimension (10 items). In place of the social restrictiveness dimension, the four future-behavior items of the RIBS were chosen, since both address the same concepts. Higher scores mean greater agreement with the stated attitude; (2) the Reported and Intended Behavior Scale (RIBS), which consists of 8 items. The first four are designed to assess the prevalence (past and current) of behaviors in each of four contexts (1. living with; 2. working with; 3. living nearby; and 4. being in a relationship with someone with a mental health problem), while items 5-8 ask about intended (future) behaviors within the same contexts [19]. We selected items 5 to 8 (future behaviors). Higher scores indicate greater agreement with engaging in the stated behaviors. We hypothesized that higher mental health knowledge would be associated with less stigma.
Mental health. The Strengths and Difficulties Questionnaire (SDQ) was used [20]. The SDQ consists of 25 items which generate scores along five dimensions: emotional symptoms, conduct problems, hyperactivity/inattention, peer problems, and prosocial behavior (positive mental health). We hypothesized that adolescents with higher emotional symptoms and peer and conduct problems would have lower mental health knowledge, and that those with more prosocial behaviors would have higher mental health knowledge. No a priori relationship was expected for hyperactivity/inattention because there is no item in the EMHL test regarding this construct.
Health-related quality of life (HRQoL). The 5-level EQ-5D is a brief, multi-attribute, generic, preference-based health status measure [21,22]. We used the Spanish version of the EQ-5D-5L and time trade-off preference values from the Catalan general population [23]. EQ-5D-5L scores range from negative values to 1, with higher scores indicating better health status, and 0 being equal to death. We hypothesized that more anxiety/depression, as indicated by EQ-5D dimension 5, would mean lower mental health knowledge; however, the remaining HRQoL dimensions of the EQ-5D were not expected to show this relationship.
Bullying and cyberbullying. We developed a 4-item scale specifically for this study to assess bullying victimization and perpetration. Two items assess whether an adolescent has been bullied or cyberbullied, and two items assess whether the adolescent has engaged in bullying or cyberbullying behaviors. We hypothesized that adolescents who have been bullied or who have bullied others would have lower mental health knowledge.

Known-Groups Validity Assessment
We recruited high school teachers, nursing and psychology university students, primary care physicians and nurses, and mental health professionals (psychiatrists, psychologists and nurses) (n = 213). We hypothesized that some groups, in particular health professionals and teachers, would have significantly higher mental health knowledge than high school students.

Reliability
Missing values were assessed. The distribution of the item responses from complete responders was analyzed in order to detect highly skewed distributions and floor or ceiling effects of correct answers. Internal consistency was estimated using the phi (lambda) coefficient [24,25], which is specific to CRTs, takes values between 0 and 1, and is interpreted like a Cronbach's alpha coefficient; 95% confidence intervals (95% CI) were calculated [26]. One-month test-retest reliability was assessed with the intraclass correlation coefficient (ICC), using a two-way random model and 95% CI, which tested for absolute agreement between the first and second administration of the scale. Values below 0.40 were considered poor, between 0.40 and 0.59 fair, between 0.60 and 0.74 good, and 0.75 or above excellent [27]. In the case of negative item-to-total correlations, each such item was deleted in turn and the alpha coefficient was assessed again to measure the consistency of the EMHL test.
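As an illustration of the reliability indices described above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix and applies the ICC interpretation bands from the text. It is not the authors' analysis code: the function names and the toy data are hypothetical, and the phi (lambda) coefficient itself is not implemented here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of respondents, each a list of item scores."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([resp[i] for resp in item_scores]) for i in range(k)]
    # Sample variance of the respondents' total scores
    total_var = variance([sum(resp) for resp in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def icc_band(icc):
    """Interpretation bands for the ICC as stated in the text."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical toy data: 4 respondents x 3 binary items (1 = correct)
toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(cronbach_alpha(toy), 3))    # 0.75
print(icc_band(0.578), icc_band(0.422))  # fair fair
```

Re-running `cronbach_alpha` on the matrix with one item column removed gives the "alpha if item deleted" value used to judge items with negative item-to-total correlations.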

Convergent Validity
The ability of the EMHL test's uncorrected total score to distinguish among different groups was assessed. Differences across known groups were assessed with the ANOVA parametric test. The magnitude of the association was estimated with the effect size (ES) to compare average differences in the MHL mean between subgroups in categorical variables. The cutoffs for the interpretation of ES were low (0.20 ≤ |ES| ≤ 0.50), moderate (0.50 < |ES| ≤ 0.80), and high (|ES| > 0.80) [28,29]. In the case of continuous measures, the magnitude of the association was assessed by cut-offs for Pearson correlation coefficients: very weak (< 0.20), weak (≥ 0.20 to < 0.40), moderate (≥ 0.40 to < 0.60), strong (≥ 0.60 to < 0.80), and very strong (≥ 0.80) [30]. Significance tests were all evaluated at the 0.05 level. Additionally, we assessed the alpha coefficient for all validity measures administered in our sample to check for measurement error in the proxies used to assess the MHL construct.
The item discrimination index was also assessed. It evaluates how well an individual question separates those who have mastered the material from those who have not. It is based on comparing the performance of the extreme groups (low and high) in the test scores. The number of participants who answered correctly in the high group (mastered) was compared to the number in the low proficiency group (non-mastered). We selected 36% of the sample. The discrimination capacity of each item was assessed by these cut-offs: items that must be deleted
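The cut-offs above can be written as two small lookup functions. This is a sketch only: the function names and the label returned for values under the lowest effect-size cut-off are hypothetical, and the boundaries follow one reading of the stated cut-offs (0.20 to 0.50 low, above 0.50 to 0.80 moderate, above 0.80 high), applied to absolute values.

```python
def effect_size_label(es):
    """Classify an effect size by its absolute value, using the cut-offs in the text."""
    size = abs(es)
    if size > 0.80:
        return "high"
    if size > 0.50:
        return "moderate"
    if size >= 0.20:
        return "low"
    return "below lowest cut-off"  # hypothetical label; the text defines no band here


def correlation_label(r):
    """Classify a Pearson correlation by its absolute value, using the cut-offs in the text."""
    size = abs(r)
    if size >= 0.80:
        return "very strong"
    if size >= 0.60:
        return "strong"
    if size >= 0.40:
        return "moderate"
    if size >= 0.20:
        return "weak"
    return "very weak"


print(effect_size_label(0.812))   # high
print(correlation_label(-0.246))  # weak
print(correlation_label(-0.122))  # very weak
```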

Reliability
No item had missing values. A visual inspection of item response frequencies showed very few skewed distributions in both parts of the EMHL test. Nevertheless, the Kolmogorov-Smirnov normality test was significant for both uncorrected and corrected total scores (p < 0.001), most likely due to the large sample size studied. We assumed that both parts of the EMHL test have a normal distribution of total scores (see Figure 2). Median (25-75 percentile) and mean (SD) corrected total scores were 8.7 (8.3-9.3) and 8.8 (0.7) for the first part, and 4.5 (3.5-5) and 4.3 (1.1) for the second part, respectively.
The first-part items showed varying levels of difficulty with a large range of correct responses (28.9% to 97.7%), and the second-part items were more demanding, ranging from 11.9% to 83.0% of correct responses. Thus, only 5.9% of the total sample scored the maximum possible score (ceiling effect) for the first part and 0.6% for the second part of the EMHL test. For the floor effect, the proportion of the total sample that scored the minimum possible score was 0.3% for both the first and second part of the EMHL test.
The first part of the EMHL test showed internal consistency values ≥ 0.70: Cronbach's alpha was 0.744 and Guttman's lambda 2 was 0.773. However, the second part was below 0.70, with a Cronbach's alpha of 0.615 and a Guttman's lambda 2 of 0.643. Corrected item-to-total correlations ranged from 0.162 to 0.526 for the first part and from −0.206 to 0.411 for the second part (Table 1). The items with negative item-to-total correlations (items 2 and 4) were deleted separately and Cronbach's alpha was assessed again. Results showed an increase in the alpha coefficient (Cronbach's α when deleting item 2 = 0.660; Cronbach's α when deleting item 4 = 0.644). However, the increase was not considerable.
Test-retest reliability, measured with the ICC on uncorrected scores, was fair for both parts of the EMHL test: first part, ICC (95% CI) = 0.578 (0.326-0.736), p < 0.001, and second part, ICC (95% CI) = 0.422 (0.072-0.639), p = 0.012.
Table 2 shows the EMHL uncorrected total test scores by known groups. Results show significant differences across subgroups (F(4, 563) = 140.459; p < 0.001; η² = 0.499). For the first part of the EMHL test, all groups differed significantly from high school students, with a moderate magnitude of association (range ES, 0.466 to 0.640; p < 0.001). In the second part of the EMHL test, teachers, university students, and primary care physicians and nurses differed significantly from high school students with a moderate magnitude of association (range ES, 0.485 to 0.692; p < 0.001), and the comparison with mental health professionals showed a significant and high magnitude of association (ES, 0.812; p < 0.001) (see Table 2).
Table 3 shows correlations between several scales and EMHL uncorrected test scores. In the first part of the EMHL test, uncorrected MHL scores of high school students showed a weak but significant correlation with CAMI's authoritarianism dimension (r = −0.246; p < 0.01), and very weak but significant correlations with HRQoL in the anxiety/depression domain of EQ-5D-5L (r = −0.122; p < 0.05) and with the emotional symptoms domain of SDQ (r = 0.140; p = 0.009). In the second part of the EMHL test, uncorrected MHL scores showed weak but significant correlations with CAMI's authoritarianism dimension (r = −0.222; p < 0.01) and RIBS' future behavior (r = <0.201; p < 0.01), and very weak but significant correlations with the conduct problems domain of SDQ (r = −0.121; p < 0.05), HRQoL in the self-care domain of EQ-5D-5L and bullying behaviors (r = −0.130; p < 0.05).
Other domains and scales were not significantly related to the total scores of either part of the EMHL test.
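As a consistency check on the known-groups ANOVA reported above, η² for a one-way ANOVA can be recovered from the F statistic and its degrees of freedom. The sketch below assumes degrees of freedom of 4 and 563 (five groups; 355 students plus 213 adults, i.e., 568 participants); the function name is hypothetical.

```python
def eta_squared_from_f(f_value, df_between, df_within):
    """Eta squared for a one-way ANOVA: (F * df1) / (F * df1 + df2)."""
    return (f_value * df_between) / (f_value * df_between + df_within)


# Assumed degrees of freedom: 5 groups -> 4; 568 participants - 5 groups -> 563
eta2 = eta_squared_from_f(140.459, 4, 563)
print(round(eta2, 3))  # 0.499
```

Under these assumptions the result agrees with the reported η² of 0.499.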

Convergent Validity
Finally, we used alpha coefficients to check for measurement error in each validity measure. Results suggest measurement error in most of them (CAMI's authoritarianism dimension: 0.141; SDQ total difficulties score: 0.469; EQ-5D: 0.596) (see Table 3).
All items of the first part and almost all items of the second part of the EMHL test showed a powerful discrimination index when using 36% of the sample and comparing the subsamples with the highest and lowest performance levels (mastered vs. non-mastered). In the second part of the test, only item 18 (delusions) had an acceptable level of discrimination (D = 0.33), and item 4 (healthy behaviors) had a low level of discrimination (D = 0.24) (see Table 4).
Table 4. Item discrimination index of the EMHL test comparing the highest performance levels (mastered) and the lowest (non-mastered) subsamples and using 36% of the total sample.
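The extreme-groups discrimination index can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function name is hypothetical, D is taken as the difference in the proportion of correct answers between the high and low groups, and each extreme group is assumed to contain 36% of the sample (the text does not say whether 36% refers to each group or to both combined).

```python
def discrimination_index(total_scores, item_correct, fraction=0.36):
    """D = proportion correct in the high group minus proportion correct in the low group."""
    n_group = max(1, round(len(total_scores) * fraction))
    # Respondent indices ordered from lowest to highest total test score
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    low, high = order[:n_group], order[-n_group:]
    p_high = sum(item_correct[i] for i in high) / n_group
    p_low = sum(item_correct[i] for i in low) / n_group
    return p_high - p_low


# Hypothetical data: 10 respondents, total scores and correctness (1/0) on one item
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
print(discrimination_index(totals, item))  # 0.75
```

With 10 respondents, 36% rounds to groups of four, so the example compares the four lowest scorers (one correct) with the four highest (all correct).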

Discussion
This study described the development of the EMHL test and its psychometric properties in a sample of Spanish adolescents, showing good validity properties and stability over time. The EMHL test is a measure created to assess the effectiveness of the EspaiJove.net intervention and, more generally, to evaluate knowledge and beliefs about mental health and mental disorders, knowledge of the risk behaviors for a mental disorder, and help-seeking behaviours. In accordance with the development process, the EMHL test was designed to deliver a clinically relevant measurement adapted to the Spanish context, supported by extensive recommendations from mental health experts and taking into account the opinion of adolescents, using non-offensive and adolescent-adapted vocabulary based on the content of the EspaiJove.net universal MHL intervention. Regarding validity, the EMHL test was capable of distinguishing between known groups with different levels of mental health knowledge, correlated with some variables related to mental health, related behaviors and HRQoL, and was able to differentiate between those adolescents who have mastered the materials of EspaiJove.net and those who have not. These results suggest that the test will be appropriate for assessing the efficacy of the intervention. However, some reliability measures showed slightly low scores in the EMHL test.

Reliability of the EMHL Test
The score distribution for the EMHL test showed appropriate ceiling and floor effects and stability over time: fewer than 6% of the adolescents answered all the items correctly in either part of the EMHL test. Regarding reliability, these results suggest that this test can be used over time for the assessment of the efficacy of interventions aimed at increasing MHL among Spanish adolescents, such as the EspaiJove.net intervention, and among the adult population with an estimated higher level of MHL. However, internal consistency was low for the second part of the EMHL test, although it almost reached the ≥0.70 value which is considered to be acceptable. Internal consistency ranged from poor to fair in both parts of the EMHL test, and the corrected item-to-total correlations showed the same pattern. Furthermore, two items (item 2, where to go for help; item 4, healthy behaviours) had negative values, which would recommend deleting both items. However, mental health professionals considered both items clinically relevant and recommended keeping them in the EMHL test. According to a previous systematic review of the quality of developed MHL instruments [13], out of fifteen MHL instruments that assessed internal consistency or reliability, five of them demonstrated poor quality properties, two in the adolescent population for assessing mental disorders and depression, respectively [14,15], and six for the general population [32][33][34][35][36][37], one for schizophrenia, two for depression, and three for mental disorders, respectively. One hypothesis regarding these low values is that the EMHL test covers a wide range of mental health constructs.
In fact, this instrument asks about mental health services, healthy and risky behaviors, conduct problems and antisocial behaviors among peers, and several mental disorders and problems such as anxiety, mood, eating and behavioral disorders, substance use and psychotic disorders, and self-harm and suicidal behaviors. So, it is most likely that the capacity of the EMHL test to cover the main clinically relevant questions to promote mental health and to prevent mental disorders has considerably decreased its internal consistency. Nevertheless, during the development process, skilled mental health professionals and high school students were asked about the items that they considered most relevant to be used with adolescents for improving mental health. Another hypothesis is that the second part of the EMHL test uses multiple choice questions whose distractors are based on stereotypes, prejudices and erroneous statements about mental health, which increases the difficulty of the test and the variability of the answers and decreases its reliability. In conclusion, although some reliability values showed poor properties, we consider that the opinion of skilled professionals and the targeted adolescent population prevails over statistical analyses for the final inclusion of the items in the EMHL test.

Convergent Validity of the EMHL Test
Both parts discriminated between a wide range of groups (adolescents, postgraduate students, high school teachers, primary care professionals and mental health professionals) with different levels of mental health knowledge. The discriminative capacity of the EMHL test between these groups makes it possible to conduct studies among youths and even in the adult population, and to assess the effectiveness of MHL interventions aimed at raising mental health knowledge towards the level of mental health professionals, which would be the gold standard of MHL and expertise. Although most validity measures showed poor or fair reliability, such as CAMI's authoritarianism dimension for stigma, mental health symptoms and HRQoL, the EMHL test also showed significant correlations with many mental health-related constructs, and our a priori hypotheses regarding the pattern of correlations between MHL and mental health-related stigma, emotional symptoms, anxiety and depression HRQoL, conduct problems and bullying behaviors were generally satisfied. As predicted, adolescents with higher MHL levels had less mental health-related stigma, anxiety and depression HRQoL, conduct problems and bullying behaviors, and higher emotional symptoms. Unexpectedly, higher self-care HRQoL correlated with higher MHL levels. This unexpected correlation is probably related to the positive mental health items in the EMHL test. These results suggest that higher MHL scores on the EMHL test are related to these constructs.

Strengths and Limitations
Our study has several strengths, including a large sample size, as well as the fact that the EMHL test addresses a gap in the assessment of adolescent MHL. Methodologically, this study proposes an evaluation method for MHL that has been scarcely explored so far. CRTs are generally used in education to find out to what extent each subject has mastered the criterion of interest. In this case, the selection of items is not based on individual differences and the variability of answers but on the purpose of the test: to identify those students who have mastered the domain and those who have not. Therefore, CRT decisions are based on acceptable performance rather than relative position [38].
On the other hand, our study has some limitations. First, although this instrument has been developed for the Spanish context, the EMHL test has been developed with adolescents living in Barcelona, so further validation should be conducted in other cities, regions and settings. Secondly, the broad scope of the EMHL test and the complexity of multiple choice questions could diminish its reliability. However, we consider that adolescents knowing the key questions about mental health prevention, which they themselves consider useful, is more important and clinically relevant than the statistically assessed consistency of the test.

Conclusions
In conclusion, this study provides a new valid instrument for the evaluation of MHL interventions. Although the EMHL test has so far only been used for the EspaiJove.net intervention, it could be a promising tool to inspire other MHL measurements.