Effects of Anonymity versus Examinee Name on a Measure of Depressive Symptoms in Adolescents

There is evidence in the literature that anonymity when investigating individual variables could increase the objectivity of the measurement of some psychosocial constructs. However, there is a significant gap in the literature on the theoretical and methodological usefulness of simultaneously assessing the same measurement instrument across two groups, with one group remaining anonymous and a second group revealing identities using names. Therefore, the aim of this study was to compare the psychometric characteristics of a measure of depressive symptoms in two groups of adolescents as a consequence of identification or anonymity at the time of answering the measuring instrument. The participants were 189 adolescents from Metropolitan Lima; classrooms were randomly assigned to the identified group (n = 89; application requesting that respondents write their own name) or to the anonymous group (n = 100; application under usual conditions), and both groups responded to the Children's Depression Inventory, short version (CDI-S). Univariate characteristics (mean, dispersion, distribution), dimensionality, reliability, and measurement invariance were analyzed. Specific results in each of the statistical and psychometric aspects evaluated indicated strong psychometric similarity. The practical and ethical implications of the present results for professional and research activity are discussed.


Introduction
According to systematic reviews on social desirability in the clinical context, this bias represents a pervasive risk in applied research based on self-report, and in clinical and forensic practice it seems to be attributable to the identification of those evaluated [1,2]. To mitigate response biases associated with self-report and to conform to ethical standards, participants are generally asked to respond anonymously. The requirement of response anonymity has a long history in surveys of all types, and it bears on compliance with the ethical standards of scientific research. However, this strategy is also exposed to particular effects due to the subjective value that respondents place on the privacy of their responses [3], and anonymity has influenced the quality of responses even in ethnocultural research contexts and clinical applications, overestimating scores [4][5][6]. On the other hand, in nonclinical samples, anonymity may reduce the sense of responsibility in the response process [7], even more so in the face of stigmatizing behaviors such as stealing, cheating, psychoactive substance use, and erotophilic behaviors [8][9][10][11][12]. Even in circumstances of respondent quasi-identity, perceived identity protection influences the possible contamination of scores related to gender, age, and place of origin [9]. Finally, response anonymity does not guarantee the absence of careless responses or insufficient effort (C/IE; [13]), given that this type of response is almost always present in anonymous surveys presented in pencil-paper format or on a web platform [13][14][15].
At this point, we arrive at the following question: Would there be an impact on mental health assessments of adolescents if their responses are anonymous or identified? Apparently, this question has not been asked before in the context of screening the adolescent community, nor does it seem to have been asked in published research. The anonymous responses of adolescents in assessments for research purposes do not appear to be problematic because no actual or masked identification of the evaluee is usually required; however, in screening assessments for emotional problems within an institution, accurate referral requires identifying the adolescent being assessed to refer him or her to appropriate clinical intervention services [16,17]. The identification of the symptomatology associated with childhood depression is essential to reduce its effects on the mental health of children and adolescents. The USA's National Institute of Mental Health [18] indicated that in the preadult stage (among children and adolescents), approximately three million individuals suffer from mental disorders and require proper identification to provide them with timely clinical services. This is more sensitive because the adolescent stage is vulnerable to mood alterations, social and school behavioral changes, and transition to new family roles [19,20]. Therefore, in the context of the assessment of children and adolescents, it is necessary to use screening instruments that are widely applied [17] and to use especially short scales, because they reduce irrelevant variance, potentially producing acceptable levels of specificity and sensitivity and thus improving control of Type I and Type II errors in identification and referral to clinical services [21][22][23][24][25]. Like longer measures with more items, short scales have advantages and limitations that the user must weigh in deciding on their use and the interpretation of their scores. However, in the context of mass use and given the purpose of screening, short scales with good evidence of validity may be the best option.
One of the instruments for the detection of childhood depression is the Children's Depression Inventory (CDI; [26]). This measure has a shorter version (Children's Depression Inventory-Short, CDI-S) that is used as a screening and treatment follow-up instrument. It takes between 5 and 10 min to administer and even less time to score. Overall, the CDI-S is a cost- and speed-efficient tool for assessing behaviors of low population prevalence [27] and for the assessment of adults with intellectual disabilities [28] and populations with physical disabilities [29], including in the school context [30]. The CDI-S can have better evaluative efficiency than the full version because of its intrinsic and psychometric characteristics, one of which is its dimensionality. That is, while studies using the full version yield different factorial solutions (between three and eight factors, possibly associated with the analysis strategies and criteria that the different authors applied; [27]), the dimensionality of the brief version seems less problematic due to the reduced number of items. If the internal structure of the CDI is modified in subsequent studies, the problem lies in the instability (a) of the construct, which limits its generalizability across contexts, and (b) of the content sampling of the construct as originally planned. Additionally, whenever the structure of the instrument changes, the interpretation is not always statistically or conceptually comparable across groups or studies.
The CDI-S uses the self-report method, and this type of procedure generally presents several challenges that the researcher or clinician must recognize and address. For example, one such limit is social desirability [3,8,12], which tends to interact with the examinee's perception of threat toward the evaluative situation [31,32]. Therefore, the aim of this research was to analyze the effect of anonymity and subject identification on the psychometric properties (internal structure, reliability, measurement invariance) of the CDI-S scores in the adolescent population. This was conducted in the context of the natural application of screening instruments to detect early symptoms of depression.

Participants
In total, 189 Peruvian adolescent students living in Metropolitan Lima were enrolled. All were enrolled in a public, tuition-free educational institution at the secondary elemental level. The majority (69.1%) lived in the same district in which the institution was located. The mean age was 13.23 and ranged from 11 to 17 years (SD = 1.14). The distribution of students in their grades of study was as follows: first (51, 26.0%), second (64, 32.7%), third (69, 35.2%), and fourth (12, 6.1%). The demographic characteristics of the adolescents are shown in Table 1. For the purposes of the study, the participants were divided into two groups, using the classroom as the unit to identify them and randomly assign the modification of the CDI-S filling instructions (see the Procedure section); the groups were identified as group A (those who received the unmodified instructions, n = 100) and group B (modified group, or those who received the modified instructions, n = 89).

Instruments
Children's Depression Inventory-Short (CDI-S; [33]). The Spanish version of the CDI-S [34] was used. This self-report measure, derived from the 27-item long version, is used to screen for depressive symptoms. The CDI-S can be applied to children and adolescents between the ages of 7 and 17 years individually or in groups. The CDI-S consists of 10 items selected by the author as the most representative of the construct, and its format is identical to that of the longer version. Each item has three phrases describing symptoms ranging from less intense (absence of the symptom) to more intense (severe presence of the symptom). The instructions ask participants to choose the sentence that best fits how they have felt in the last 15 days. Items 2, 4, 5, 6, and 10 are reverse scored. The internal consistency coefficient found in the adaptation of the Hispanic version was 0.71 [34].

Ethical Considerations
This study is a part of the research project (HIM/2015/017/SSA.1207; "Effects of mindfulness training on psychological distress and quality of life of the family caregiver") that was approved on 16 December 2014 by the Research, Ethics, and Biosafety Commissions of the Hospital Infantil de México Federico Gómez National Institute of Health in Mexico City. While conducting this study, the ethical rules and considerations for research with humans currently enforced in Mexico [35] and those outlined by the American Psychological Association [36] were followed. All family caregivers were informed of the objectives and scope of the research and their rights in accordance with the Declaration of Helsinki [37]. The caregivers who agreed to participate in the study signed an informed consent letter. Participation in this study was voluntary and did not involve payment. The caregivers who provided consent for their child to participate completed an informed consent letter. Youth participants provided assent and returned a survey if they wished to participate.

Procedure
The authorization of the directors of the educational institution was obtained, and the corresponding permissions were requested from the parents, who were informed of the research proposal and the data collection procedures. Once the directors and parents agreed to participate, the instrument was administered during class time. The students who provided assent completed the CDI-S. Classrooms were randomly assigned to groups A and B; these groups had different instructions for filling out the CDI-S: group "A" received instructions to fill out the CDI-S anonymously, while group "B" was asked to give their name in order to have a better identification at the time of collecting the completed questionnaires. The general instruction given to the adolescents emphasized that they could stop responding at any time, without consequence. All information on examinees in both groups was transferred to a database, but the names of the examinees in group B were not entered into this database. When the database was completed, the written names were removed from the paper questionnaires.

Data Analysis
The analysis consisted of univariate and multivariate analysis phases. First, several statistical aspects of the items, such as distribution, location (mean), dispersion (standard deviation), and floor and ceiling (minimum and maximum frequency of response), were analyzed. The statistical comparison between the distributions of each item was made using the KS-D test [38,39] for two independent samples, and the overlap coefficient (OVL; [40]) was used as a measure of the practical significance of the comparison of two distribution functions that are not necessarily normally distributed [41]; the model was used for different variances to ensure better precision.
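The item-level comparison can be illustrated with a minimal sketch. It assumes three response categories scored 0 to 2 and uses hypothetical response counts, not the study data. It also computes the overlap directly from the empirical category proportions, which is a simplification and not the unequal-variance OVL model referred to above; the function names are illustrative only.

```python
from collections import Counter

def category_proportions(responses, categories):
    """Proportion of responses falling in each response category."""
    counts = Counter(responses)
    n = len(responses)
    return [counts.get(c, 0) / n for c in categories]

def ks_d(p, q):
    """Two-sample KS D: largest absolute gap between the cumulative distributions."""
    cum_p = cum_q = 0.0
    d = 0.0
    for pi, qi in zip(p, q):
        cum_p += pi
        cum_q += qi
        d = max(d, abs(cum_p - cum_q))
    return d

def overlap(p, q):
    """Discrete overlap: probability mass shared by the two distributions (0 to 1)."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

# Hypothetical responses to one item (assumption, for illustration only).
categories = [0, 1, 2]
group_a = [0] * 60 + [1] * 30 + [2] * 10
group_b = [0] * 55 + [1] * 35 + [2] * 10

p = category_proportions(group_a, categories)
q = category_proportions(group_b, categories)
d_stat = ks_d(p, q)
ovl = overlap(p, q)
print(f"KS-D = {d_stat:.3f}, OVL = {ovl:.1%}")
```

A small D together with an overlap close to 100% would point to practically indistinguishable response distributions for that item.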
The internal structure of the CDI-S was examined by a confirmatory factor analysis, with the maximum likelihood method adjusted for item nonnormality (SB-χ²; [42]), on the matrix of interitem polychoric correlations; given the limited number of response categories, this approach can be a satisfactory estimation method [43][44][45]. The measurement invariance of the items was examined by means of two procedures. The first was the metric congruence of the items, in which the factor loadings of the items of each group were compared by means of the congruence coefficient (ϕ; [46]). The second procedure used differential item functioning analysis (DIF; [47]), with the following specifications: (a) the matching variable was the observed score, θ, and (b) the grouping variable (G) was the status of the group examined, where the reference group was "A" (anonymous group) and the focal group was "B" (provided names). The DIF analysis was implemented with ordinal logistic regression (OLR; [48]), in which each item was assumed to be a dependent and continuous latent variable (Z, standardized in logits). The independent variables were the measured attribute (or observed score, θ), subject grouping (G; group A vs. group B), and the attribute-group interaction (θ*G). Each represents a different type of DIF [49,50]. The OLR methodology consists of modeling three equations: one representing nonuniform DIF (OLR1: $Z = \beta_0 + \beta_1\theta + \beta_2 G + \beta_3(\theta \cdot G)$), one for uniform DIF (OLR2: $Z = \beta_0 + \beta_1\theta + \beta_2 G$), and another representing responses without DIF (OLR3: $Z = \beta_0 + \beta_1\theta$). The stepwise screening strategy [49,50] focused on the evaluation of practical and statistical significance, according to which, for each item, we first evaluated the difference in the −2 log likelihood (∆χ², df = 1, α = 0.05) between the OLR1 and OLR2 models for detection of nonuniform DIF (null hypothesis: OLR1 = OLR2) and then between OLR3 and OLR2 for detection of uniform DIF. The Bonferroni correction [50,51] was applied to adjust the nominal α according to the number of items (0.05/10 = 0.005). Results below this level ($\alpha_{\text{Bonferroni}}$ = 0.005) identified the impact of the interaction term (θ*G) and therefore the presence of nonuniform DIF. If the null hypothesis of the nonuniform DIF test was not rejected, the second step tested uniform DIF by the difference (∆) of the beta coefficients of the models OLR3 ($\beta_\theta$) and OLR2 ($\beta_G$). A result ≥10% indicated statistical significance at the nominal level α = 0.20 [50].
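The two-step screening rule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three models are assumed to have been fitted elsewhere, the −2 log likelihood values and coefficients are hypothetical, and the 10% criterion is read here as the relative change in the coefficient of θ between OLR3 and OLR2, which is an assumption about how the rule was operationalized.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def screen_item(neg2ll_olr1, neg2ll_olr2, beta_theta_olr2, beta_theta_olr3,
                n_items=10, alpha=0.05, beta_change_cutoff=0.10):
    """Classify one item as 'nonuniform DIF', 'uniform DIF', or 'no DIF'."""
    # Step 1: likelihood-ratio test of the interaction term (OLR1 vs. OLR2),
    # evaluated against the Bonferroni-adjusted alpha (0.05 / 10 = 0.005).
    alpha_bonferroni = alpha / n_items
    delta_chi2 = max(neg2ll_olr2 - neg2ll_olr1, 0.0)
    if chi2_sf_df1(delta_chi2) < alpha_bonferroni:
        return "nonuniform DIF"
    # Step 2 (assumed reading): relative change in the theta coefficient
    # when the grouping variable is added (OLR3 -> OLR2).
    change = abs(beta_theta_olr2 - beta_theta_olr3) / abs(beta_theta_olr3)
    if change >= beta_change_cutoff:
        return "uniform DIF"
    return "no DIF"

# Hypothetical fitted values for a single item (not taken from the study).
print(screen_item(neg2ll_olr1=249.2, neg2ll_olr2=250.0,
                  beta_theta_olr2=1.46, beta_theta_olr3=1.50))
```

Applying the function to each of the 10 items in turn reproduces the item-by-item logic described in the text.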
Finally, reliability was estimated by the α coefficient [52] and ω [53]; although ω tends to be more appropriate [54], the α coefficient was also reported because it is a measure of score reliability that (a) is still popular in behavioral science research, (b) serves for direct comparison with the Spanish validation study, and (c) allows for comparison with the ω coefficient to assess the impact of possible noncompliance with the basic assumption for using α [54].
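For reference, both coefficients can be computed with the standard formulas, as in the sketch below. The ω version shown assumes a one-factor model with standardized loadings and uncorrelated errors; the data and loadings are hypothetical, and the function names are illustrative.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha; scores is a list of respondents, each a list of k item scores."""
    k = len(scores[0])
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

def mcdonald_omega(loadings):
    """Omega from standardized loadings of a one-factor model with uncorrelated errors."""
    common = sum(loadings) ** 2
    unique = sum(1 - loading ** 2 for loading in loadings)
    return common / (common + unique)

# Hypothetical item responses (rows = respondents) and loadings, for illustration only.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 2],
    [0, 0, 0, 1],
    [1, 2, 1, 1],
]
loadings = [0.62, 0.55, 0.70, 0.48]

print(f"alpha = {cronbach_alpha(responses):.3f}")
print(f"omega = {mcdonald_omega(loadings):.3f}")
```

When the loadings are unequal, that is, when tau-equivalence does not hold, α will tend to fall below ω, which is the comparison referred to in point (c).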

Equivalence between Groups
The equivalence of characteristics in both groups was analyzed in an equivalence testing framework [55]. To maximize the sensitivity of the test for equivalence of means, the minimum standardized difference was set at d = 0.10 [56]. This showed that the average ages of the two groups were equivalent, t

Items Level
The univariate statistics for the items (Table 2) in both groups were similar, and the discrepancies can be considered small. The Pearson correlations of these descriptive statistics (M, SD, $g_1$, $g_2$, floor effect, and ceiling effect) between groups A and B were 0.96, 0.94, 0.95, 0.92, 0.95, and 0.97, respectively. These high magnitudes confirm that the pattern of descriptive statistics at the item level was similar between the compared groups. To verify this more rigorously, the statistics for each item were analyzed individually. In Table 3, the distributional differences in the response range of each item were not statistically significant (KS-D between 0.009 and 0.11), and the degree of overlap (OVL coefficient) between the distributions was greater than 79.2% and reached approximately 95%, suggesting that the items showed practically overlapping distributions between groups A and B (see Figure 1). Differences in location, that is, in the means (d between |0.000| and |0.210|; t-test < 1.50), and in variances ($F_L$; Levene [57]; nominal α with Bonferroni correction: 0.05/10 = 0.005) were essentially trivial and not statistically significant. These results, taken together, point to univariate similarity at the item level between the two groups.

Differential Item Functioning: Anonymity vs. Examinee Name
The metric congruence (equality of factor loadings between groups A and B) was ϕ = 0.989, which indicates substantial equality between them [46]. The analysis of nonuniform and uniform DIF (Table 4) in each item showed the absence of any type of DIF.
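The congruence coefficient compares two vectors of factor loadings as the ratio of their cross-product sum to the geometric mean of their sums of squares. A minimal sketch follows, with hypothetical loadings for the 10 items (not the values estimated in the study):

```python
import math

def congruence_coefficient(loadings_a, loadings_b):
    """Congruence coefficient (phi) between two sets of factor loadings."""
    cross = sum(a * b for a, b in zip(loadings_a, loadings_b))
    norm = math.sqrt(sum(a * a for a in loadings_a) * sum(b * b for b in loadings_b))
    return cross / norm

# Hypothetical one-factor loadings for groups A and B (assumption, for illustration).
group_a = [0.61, 0.55, 0.48, 0.70, 0.52, 0.66, 0.58, 0.45, 0.63, 0.50]
group_b = [0.58, 0.60, 0.42, 0.73, 0.49, 0.61, 0.62, 0.40, 0.66, 0.47]

print(f"phi = {congruence_coefficient(group_a, group_b):.3f}")
```

Values close to 1 indicate that the two loading patterns are proportional, which is the sense in which ϕ supports metric congruence.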

Reliability
The α coefficients for groups A and B were 0.741 and 0.633, respectively; the difference between them [58] was not statistically significant, W = 1.417, F(93, 81) = 1.43. The ω coefficients were 0.898 and 0.835 for groups A and B, respectively, and can also be considered similar.
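The reported W is numerically consistent with a ratio of the form (1 − α_B)/(1 − α_A), as used in Feldt-type tests for two independent α coefficients. Assuming that this is the statistic behind [58], the arithmetic can be checked as follows; the p value is omitted because it depends on the degrees of freedom reported above.

```python
def alpha_ratio_w(alpha_1, alpha_2):
    """Ratio (1 - alpha_2) / (1 - alpha_1) for two independent alpha coefficients."""
    return (1 - alpha_2) / (1 - alpha_1)

# Alpha coefficients reported for group A (anonymous) and group B (identified).
w = alpha_ratio_w(0.741, 0.633)
print(f"W = {w:.3f}")
```

Under this reading, W is referred to an F distribution, and values near 1 indicate similar reliability in the two groups.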

Discussion
The aim of this research was to analyze the effect of anonymity and identification by name of adolescents in a research context. This effect was examined on the statistical and psychometric properties (internal structure, reliability, measurement invariance) of the scores of an abbreviated measure of depressive symptoms, the CDI-S. The findings indicate that there are no effects on the psychometric properties and, consequently, on the interpretation of the CDI-S score. It can be stated that the Type I or II error that could be present in identification and referral decisions with the CDI-S would probably be little associated with the identification of the assessed person, so the scores obtained remain valid. Although a reduction in internal consistency was observed, this was statistically trivial and possibly without relevant effects on the standard error of measurement. This reduction was observed mainly for the α coefficient (0.741 vs. 0.633), while the ω coefficient showed a comparatively smaller reduction (0.898 vs. 0.835). It is possible that this difference interacted with the assumptions of coefficient α, namely tau equivalence and uncorrelated errors [52], but correlated errors were not detected in the modeling of the dimensionality of the CDI-S in either group. Even with this reduction in internal consistency as measured by the α coefficient, the lack of statistical significance suggests that it may be considered sampling error. A complementary finding is that, in contrast to the Hispanic study by del Barrio et al. [34], here a single latent dimension was supported for the CDI-S, with higher reliability; differences between the reliability estimates obtained from the two coefficients (α and ω) were also detected, which usually indicate noncompliance of the items with the tau-equivalence model [54].
The practical implications of the present results point to several potential consequences. First, the clinician using mass screening strategies now has evidence that respondent identification has a trivial effect on score variability. Second, and as a consequence of the above, the clinician can be reasonably confident that CDI-S results are little influenced by subject anonymity or identification. Finally, a no less important implication is of an ethical nature; in this context, the clinician must pay attention to the security of the tests applied and to the identification of respondents. Within an effective strategy to prevent unauthorized dissemination of the test applied and its results, identification of the examinee creates a more challenging situation than anonymous application.
The results should be interpreted in consideration of the specific limitations of the study. First, the sample size in each group limits the statistical power of each of the statistics applied and, even more so, the representativeness of the population to which the results can be generalized. This limitation requires a replication study as a necessary condition to verify this effect of anonymity/identification in surveys of school-attending adolescents, an issue that apparently has not been addressed in previous studies [9]. On the other hand, this limitation on sample size is offset in part by the fact that the robust method used here [42] has proven to be effective in estimating the parameters of interest (the factor loadings) and their statistical significance in challenging situations such as small sample size and distributional skewness of the items [43][44][45]. Therefore, the problem of the accuracy of the estimates may have been partially solved. Second, the sample size also prevented further partitioning beyond the main study variable (group A and group B), as it would have meant further reducing the samples compared; for example, the difference between males and females was not examined in interaction with the effect of anonymity, and the extent to which they affect response variability is not known.
With respect to sample size, previous studies have suggested that by applying multiple criteria (absolute number of cases according to expert opinion, the ratio of the number of cases to the number of parameters or number of observed variables, and statistical power), the range of minimum sample sizes varies from 16 to 2760 cases [59]. Other more sophisticated methods also produce divergent recommendations (e.g., on the basis of statistical power; [60,61]). Methodological research has shown that aspects such as the size of factor loadings, communality, and the number of dimensions [62,63] are stable criteria. On the basis of opportunity and contextual constraints in the present study and the minimum sample size for estimating the parameters of interest (e.g., factor loadings and communalities; [62,63]), our sample size may be sufficient (approximately 200).

Conclusions
The present study shows support for a unidimensional internal structure of the CDI-S in Peruvian adolescents. When specific conditions were imposed on the selection groups (anonymity/identification of the participants), no significant differences were found at the level of internal structure, and therefore both models were acceptable. The CDI-S can be considered a unidimensional measure for use in the general adolescent population (as is the case in our study) since experiencing some condition of dysphoria and/or negative self-esteem does not seem to be differentiable if there is no additive exposure to some clinical condition (e.g., institutionalization, chronic noncommunicable diseases, terminal illnesses); similar findings have been obtained in the literature with other depression assessment instruments (e.g., PHQ-9). Measurement invariance was corroborated, which would imply that the possible impact of anonymity would be closely related to socially inappropriate behaviors. The reliability of the CDI-S scores for both groups was not compromised. This would imply that the measurement bias would not be directly related to an identification condition but rather to other factors already identified in the literature. Due to sample size limitations in the groups of interest, further research is required on other conditions of intergroup variability, such as some sociodemographic variables and mental and physical health conditions versus each condition of anonymity vs. participant name.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.