Psychometric Properties of the Adult Self-Report: Data from over 11,000 American Adults

The first purpose of this study was to examine the factor structure of the Adult Self-Report (ASR) via traditional confirmatory factor analysis (CFA) and contemporary exploratory structural equation modeling (ESEM). The second purpose was to examine the measurement invariance of the ASR subscales across age groups. We used baseline data from the Adolescent Brain Cognitive Development study. ASR data from 11,773 participants were used to conduct the CFA and ESEM analyses and data from 11,678 participants were used to conduct measurement invariance testing. Fit indices supported both the CFA and ESEM solutions, with the ESEM solution yielding better fit indices. However, several items in the ESEM solution did not sufficiently load on their intended factors and/or cross-loaded on unintended factors. Results from the measurement invariance analysis suggested that the ASR subscales are robust and fully invariant across subgroups of adults formed on the basis of age (18–35 years vs. 36–59 years). Future research should seek to use both CFA and ESEM to provide a more comprehensive assessment of the ASR.


Introduction
The Achenbach System of Empirically Based Assessment (ASEBA) is a comprehensive evidence-based assessment system designed to measure adaptive and maladaptive functioning across individuals from age 1½ to over 90 years [1]. Quantitative data from ASEBA inventories can be conceptualized into functioning scales, syndrome scales, substance use scales, Diagnostic and Statistical Manual of Mental Disorders (DSM)-oriented scales, and critical item scales (items used primarily by clinicians). ASEBA inventories have been translated into over 110 languages and referenced and used in more than 8000 publications. They are widely used in various settings, including education, research, clinical, and healthcare (mental health and medical).
The Adult Self-Report (ASR) is one component of the ASEBA [2]. This self-report inventory comprises eight syndrome scales: anxious/depressed, withdrawn, somatic complaints, thought problems, attention problems, aggressive behavior, rule-breaking behavior, and intrusiveness. The syndrome scales can be examined independently, or aggregated into two broad-band scales, internalizing (i.e., anxious/depressed, withdrawn, and somatic complaints) and externalizing (i.e., aggressive behavior, rule-breaking behavior, and intrusiveness).
families and between $50,000-99,999 for 26% of the families; 52% of families identified as White, 15% as Black, 20% as Hispanic, 2% as Asian, and 11% as other (e.g., biracial). Complete recruitment details can be found elsewhere [14]. Study procedures received ethics board approval. Prior to participation, parents/guardians provided written informed consent and children provided assent.

Statistical Analysis
For the main analysis, we conducted CFA and ESEM using the latent variable modeling program Mplus Version 8.3. Data for these analyses were based on responses from 11,773 parents. A total of 102 participants were removed from the analyses because they had incomplete data on ASR items (n = 5) or because they exceeded the age threshold (i.e., >60 years; n = 97). We used the weighted least squares mean and variance-adjusted (WLSMV) estimator, which is suitable for ordered categorical (ordinal) variables for which normality is not assumed [15]. We used target rotation for the ESEM solution and set all unintended factor loadings (i.e., cross-loadings) to be close to zero. We used the following fit indices to evaluate the model fit of both the CFA and ESEM solutions: CFI, TLI, and RMSEA. Model fit was deemed acceptable if CFI and TLI values were ≥0.90 and RMSEA was ≤0.08 [16,17]. Model fit was considered good if CFI and TLI values were ≥0.95 and RMSEA was ≤0.06 [18]. We also reported the chi-square statistic (χ²), but because of its sensitivity to large sample sizes [19], we did not use it to gauge model fit. Factor loadings for both the CFA and ESEM solutions were required to be ≥0.30 for items to be considered salient and used in defining constructs. Loadings >0.71 were considered excellent, >0.63 very good, >0.55 good, >0.45 fair, and ≥0.30 poor [20,21]. Whereas cross-loadings were automatically zero for the CFA solution, they were required to be <0.30 for the ESEM solution. Had both the CFA and ESEM solutions met all criteria for an adequate fitting model, our intention was to compare the two solutions by examining the model fit statistics for both models. However, as described in detail in the Results section, the ESEM solution demonstrated issues with some of the factor loadings. Consequently, we deemed the CFA solution to be the more favorable model.
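The cutoffs above can be expressed as a small decision rule. The following Python sketch is illustrative only and is not part of the Mplus analysis: the function names, the "inadequate" and "non-salient" labels, the joint (all-indices) fit rule, and the use of absolute loading values are assumptions made for the example.

```python
def classify_fit(cfi, tli, rmsea):
    """Label overall model fit with the cutoffs stated in the text.

    Assumption: all three indices must meet a cutoff jointly; the paper
    itself also comments on each index separately.
    """
    if cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.06:
        return "good"
    if cfi >= 0.90 and tli >= 0.90 and rmsea <= 0.08:
        return "acceptable"
    return "inadequate"  # hypothetical label for fit below the acceptable cutoffs


def classify_loading(loading):
    """Label a standardized factor loading with the bands stated in the text.

    Assumption: the bands are applied to the absolute value of the loading.
    """
    value = abs(loading)
    if value > 0.71:
        return "excellent"
    if value > 0.63:
        return "very good"
    if value > 0.55:
        return "good"
    if value > 0.45:
        return "fair"
    if value >= 0.30:
        return "poor"
    return "non-salient"  # hypothetical label for loadings below 0.30


if __name__ == "__main__":
    # Fit indices reported in the Results section for the two solutions.
    print("CFA:", classify_fit(cfi=0.919, tli=0.917, rmsea=0.026))
    print("ESEM:", classify_fit(cfi=0.970, tli=0.965, rmsea=0.017))
    # Smallest and largest standardized CFA loadings reported in the Results.
    print(classify_loading(0.36), classify_loading(0.90))
```

Under the joint rule, a solution is labeled "good" only when CFI, TLI, and RMSEA all reach the stricter cutoffs, which is a more conservative reading than evaluating each index on its own.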
Because we deemed the CFA solution as the more favorable model, we tested for measurement invariance by examining changes in model fit statistics using CFA. Data for the invariance testing were based on responses from 11,678 participants (M age = 39.76, SD = 6.48). A total of 95 participants did not self-report their age and were therefore excluded from the analysis. We tested for configural, metric, and scalar invariance to determine whether the underlying factor structure of the ASR was the same across age groups (i.e., 18-35 years [n = 3094] vs. 36-59 years [n = 8584]). These age categories were selected in order to align with the normed scales for syndromes outlined in the ASEBA manual [2]. Configural invariance assesses the fit of the factor structure when there are no invariance constraints imposed across groups. Metric invariance assesses the invariance of factor loadings across groups. Scalar invariance assesses the invariance of both factor loadings and item intercepts across groups. We assessed metric invariance after configural invariance was established, and scalar invariance after metric invariance was established. Thus, this process involved testing a series of increasingly constrained models. There is support for a more constrained model when the CFI decreases by less than 0.010 and the RMSEA increases by less than 0.015 [22]. There is strong measurement invariance when all three tests demonstrate invariance across groups [23].
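The sequential decision rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names and the dictionary layout are hypothetical, the configural model is assumed to have already shown acceptable fit, and the example fit values are invented for demonstration (they are not the values reported in Table 2).

```python
def supports_constraint(less_constrained, more_constrained,
                        max_cfi_drop=0.010, max_rmsea_rise=0.015):
    """Return True if the more constrained model is supported.

    Uses the criteria stated in the text: CFI decreases by less than 0.010
    and RMSEA increases by less than 0.015 relative to the previous model.
    """
    cfi_drop = less_constrained["cfi"] - more_constrained["cfi"]
    rmsea_rise = more_constrained["rmsea"] - less_constrained["rmsea"]
    return cfi_drop < max_cfi_drop and rmsea_rise < max_rmsea_rise


def highest_invariance_level(fits):
    """Walk configural -> metric -> scalar and stop at the first failure.

    Assumption: the configural model itself already fits acceptably.
    """
    order = ["configural", "metric", "scalar"]
    reached = order[0]
    for previous, current in zip(order, order[1:]):
        if not supports_constraint(fits[previous], fits[current]):
            break
        reached = current
    return reached


if __name__ == "__main__":
    # Invented fit values for illustration only.
    example_fits = {
        "configural": {"cfi": 0.920, "rmsea": 0.026},
        "metric": {"cfi": 0.918, "rmsea": 0.027},
        "scalar": {"cfi": 0.915, "rmsea": 0.028},
    }
    print(highest_invariance_level(example_fits))
```

Reaching the "scalar" level in this sketch corresponds to what the text calls strong measurement invariance.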
Last, we calculated composite reliability scores for the latent factors from the standardized factor loadings [24]. Because alpha coefficients remain widely used as a measure of reliability, we also calculated them, although alpha has been increasingly criticized for this purpose [25,26].
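The text does not reproduce the composite reliability formula from [24]. A commonly used expression for a factor with standardized loadings λ and uncorrelated residuals is CR = (Σλ)² / [(Σλ)² + Σ(1 − λ²)]. The Python sketch below assumes that expression; the function name and the example loadings are hypothetical and are not taken from Table 1.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings of one factor.

    Assumptions: loadings are standardized, residuals are uncorrelated, and
    each item's residual variance equals 1 - loading**2.
    """
    if not loadings:
        raise ValueError("at least one loading is required")
    total = sum(loadings)
    residual_variance = sum(1.0 - value ** 2 for value in loadings)
    return total ** 2 / (total ** 2 + residual_variance)


if __name__ == "__main__":
    # Hypothetical loadings for a four-item factor.
    example_loadings = [0.72, 0.65, 0.80, 0.58]
    print(round(composite_reliability(example_loadings), 3))
```

Unlike alpha, this index does not require the items to load equally on the factor, which is one reason it is often preferred when loadings vary.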

Results
The results of the CFA indicated that the 8-factor model fit acceptably well based on CFI (0.919) and TLI (0.917) values and very well based on the RMSEA (0.026; 90% CI [0.026-0.026]), χ²(4724) = 42,010.26, p < 0.001. All standardized factor loadings for the CFA were significant (p < 0.001), ranging from 0.36 to 0.90 (see Table 1). The 8-factor model using ESEM had very good model fit based on the CFI (0.970), TLI (0.965), and RMSEA (0.017; 90% CI [0.017-0.017]), χ²(4087) = 17,706.41, p < 0.001. However, not all items loaded cleanly on their intended factors. The anxious/depressed factor had one intended item that loaded <0.30, and three intended items that cross-loaded ≥0.30 on unintended factors. The withdrawn factor had two intended items that loaded <0.30, and one intended item that cross-loaded >0.30 on an unintended factor. The somatic complaints factor had one intended item that loaded <0.30, yet no intended items that cross-loaded ≥0.30 on unintended factors. The thought problems factor was particularly problematic; seven of ten intended items loaded <0.30, and three intended items cross-loaded ≥0.30 on unintended factors. The attention problems factor had three intended items that loaded <0.30, and two intended items that cross-loaded >0.30 on an unintended factor. The aggressive behavior factor had three intended items that loaded <0.30, and one intended item that cross-loaded >0.30 on an unintended factor. The rule-breaking behavior factor was quite problematic; six of fourteen intended items loaded <0.30, and four intended items cross-loaded ≥0.30 on unintended factors. All intended items for the intrusiveness factor loaded >0.30, and one intended item cross-loaded >0.30 on an unintended factor. Notably, in some cases, secondary loadings were larger than the primary loadings. Factor loadings (including cross-loadings) for the ESEM solution are reported in Table 1.
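As a rough consistency check, the reported RMSEA values can be approximated from the reported chi-square statistics, degrees of freedom, and sample size. The Python sketch below assumes the standard single-group formula RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))) and N = 11,773; it is not a reproduction of how Mplus computes RMSEA under WLSMV, and the function name is hypothetical.

```python
import math


def rmsea(chi_square, df, n):
    """Approximate RMSEA from a chi-square statistic.

    Assumption: standard single-group formula with N - 1 in the denominator;
    software may use N instead, which barely matters at this sample size.
    """
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


if __name__ == "__main__":
    sample_size = 11773  # participants in the CFA and ESEM analyses
    print("CFA RMSEA:", round(rmsea(42010.26, 4724, sample_size), 3))
    print("ESEM RMSEA:", round(rmsea(17706.41, 4087, sample_size), 3))
```

Under these assumptions the approximations round to the RMSEA values reported for the CFA and ESEM solutions.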
The results of measurement invariance testing using CFA demonstrated evidence of acceptable model fit. All changes in fit indices pertaining to configural, metric, and scalar invariance were well within acceptable ranges [22]. Consequently, we found support for strong measurement invariance across age groups (i.e., 18-35 years vs. 36-59 years). These results are summarized in Table 2. Descriptive statistics, as well as latent factor correlations (polychoric) and internal consistencies for the CFA and ESEM solutions, are reported in Table 3. Reliability scores for the CFA solution were good (0.83 to 0.95 for composite reliability and 0.82 to 0.95 for alpha coefficient) and scores for the ESEM solution ranged from poor to good (0.59 to 0.88).
Note. CFA and ESEM correlations are presented below and above the diagonal, respectively. * Correlation is non-significant (p > 0.05); all remaining ESEM correlations are significant (p < 0.01). All CFA correlations are significant (p < 0.001).

Discussion
Scholars have argued that classic CFA models may not be appropriate for many well-established psychological measures. ESEM is a new and evolving methodological approach that has been shown to overcome the typical shortcomings of CFA. The purpose of the current study was to compare the factor structure of the ASR using both CFA and ESEM and to test for measurement invariance across age groups. Results of the current study showed that both measurement models met conventional model fit criteria, with the ESEM solution showing an improvement in fit over CFA but demonstrating poor factor loadings on intended factors and substantial cross-loading on unintended factors. Furthermore, invariance testing revealed that the ASR is robust and invariant across two age groups.
The ESEM solution showed an improvement in model fit statistics over the CFA solution. This finding is not surprising given that ESEM yields better fit in the presence of nontrivial cross-loadings [27,28]. However, several (n = 26) items in the ESEM solution did not sufficiently load on their intended factors (<0.30) and/or cross-loaded on unintended factors (≥0.30). There were nine instances wherein the cross-loading on the unintended factor was greater than the factor loading on the intended factor. Two factors that were particularly problematic were thought problems and rule-breaking behavior, with more than half of the items (13 of 24) being problematic. Moreover, factor loadings on intended factors were substantially lower in the ESEM solution compared to the CFA solution. The strength of factor loadings on intended factors in the ESEM model generally ranged from 'poor' to 'fair' and very few were considered 'excellent', whereas the strength of factor loadings in the CFA solution typically hovered around 'good' and 'excellent'. Overall, results of the ESEM solution revealed that: (a) factor loadings on intended factors were generally weak; (b) factor loadings did not sufficiently load on intended factors; (c) there were substantial cross-loadings on unintended factors; (d) in some cases, secondary loadings were larger than the primary loadings.
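The three item-level checks summarized above (weak primary loading, salient cross-loading, and a secondary loading exceeding the primary loading) can be sketched in Python as follows. The function name, the dictionary layout, and the example loadings are hypothetical and are not taken from Table 1; applying the checks to absolute loading values is also an assumption.

```python
def flag_item(loadings, intended_factor, cutoff=0.30):
    """Apply the item-level checks described in the text to one item.

    loadings: dict mapping factor name -> standardized loading for the item.
    Assumption: checks use absolute values of the loadings.
    """
    primary = abs(loadings[intended_factor])
    cross = {factor: abs(value) for factor, value in loadings.items()
             if factor != intended_factor}
    strongest_cross = max(cross.values(), default=0.0)
    return {
        "weak_primary": primary < cutoff,
        "cross_loads_on": sorted(f for f, v in cross.items() if v >= cutoff),
        "secondary_exceeds_primary": strongest_cross > primary,
    }


if __name__ == "__main__":
    # Hypothetical ESEM loadings for a single item intended for "thought problems".
    item_loadings = {
        "thought problems": 0.22,
        "anxious/depressed": 0.41,
        "attention problems": 0.12,
    }
    print(flag_item(item_loadings, intended_factor="thought problems"))
```

An item flagged on any of the three checks would count among the problematic items in a tally like the one reported above.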
Many studies examining the factorial validity of psychological measures seek to compare results from traditional CFA to the newer method of ESEM. In these studies, the conclusion that the ESEM solution was superior to the CFA solution is more common than not [11]. However, in some cases [29], and certainly in the current study, ESEM does not provide the best overall solution, a notion acknowledged within the statistical literature [6]. Booth and Hughes [29] used ESEM as an alternative to CFA and found little evidence supporting the application of ESEM over CFA. They acknowledged that there are several nuances that must be considered when selecting between the two analytic strategies. For example, they discuss how in ESEM the inclusion of all possible parameters ensures that no substantive parameters are omitted, but that this approach may also be interpreted as overly inclusive and contradictory to the principle of parsimony. In their study, additional parameters included in the ESEM solution added very little to the definition of latent constructs. Highlighting the advantages and disadvantages of both CFA and ESEM, Booth and Hughes illustrate how CFA and ESEM can be used together to improve measurement of constructs.
The results of measurement invariance analysis suggested that the ASR is robust and fully invariant across subgroups of adults formed on the basis of age (18-35 years vs. 36-59 years). That is, the factor structure was the same regardless of the adults' age (configural invariance), the strength of the factor loadings was similar across age (metric invariance), and no significant age differences were noted in the intercepts (full scalar invariance). This finding is consistent with previous research that showed the thought problems subscale was invariant across three age cohorts (12-18 years, 19-27 years, and 28-59 years) [13]. Results of the current study are encouraging and confirm that the ASR's psychometric properties are robust across adulthood.
The results of this study should be considered in light of some limitations. Results provide evidence of the ASR's factorial validity only. Subsequent research should consider examining other types of construct validity, such as criterion or concurrent. Furthermore, we were unable to test for gender invariance because data on the participants' gender were unavailable. Existing research with dyadic data (romantic couples) has shown that the ASR subscales are noninvariant across females and males [12]. Specifically, five subscales (withdrawn, attention problems, aggressive behavior, rule-breaking behavior, and intrusiveness) met full metric invariance, two subscales (anxious/depressed and somatic complaints) met partial metric invariance, and all subscales met partial scalar invariance [12]. Future research should seek to explore the measurement invariance of the ASR across other factors (e.g., socio-demographic status) as well as across multiple time points as more ABCD data is released.
The results of the current study should be carefully considered in future studies using the ASR. Results of the ESEM solution uncovered important findings regarding factor loadings on intended and unintended factors, especially for the thought problems and rule-breaking behavior syndromes. The ESEM results therefore may raise concern for some regarding the sole reliance on CFA to examine the factor structure of the ASR. It is hoped that future researchers use CFA and ESEM in harmony rather than in isolation. Previous works examining the factor structure of psychological measures have successfully illustrated how ESEM and CFA can be used to complement each other [28,29]. Future research should also explore the psychometric properties of the ASR using other modeling approaches, such as partial least squares SEM, bifactor models that incorporate a general factor encompassing all syndromes [30], and Bayesian structural equation modeling (BSEM) [31]. The latter approach is similar to ESEM in that it allows cross-loadings. A key difference between ESEM and BSEM is that prior knowledge (referred to as priors) can inform the statistical model via statements regarding model parameters (e.g., mean, path coefficient). Thus, the model is tested against a set of known parameters rather than against a null hypothesis. An obvious challenge for researchers seeking to use BSEM is the choice of the priors, since using improper priors can have a negative impact on estimates [31]. An advantage of BSEM is that it performs well with small sample sizes.

Conclusions
Findings showed that both the eight-syndrome CFA and ESEM solutions met model fit criteria. As anticipated and in line with previous research, the ESEM solution revealed an improvement in fit statistics and lower factor correlations than the CFA solution; however, examination of the item factor loadings revealed that several items had poor factor loadings on intended factors and/or significant cross-loadings on unintended factors. Future research should seek to use both CFA and ESEM to provide a more comprehensive assessment of the ASR. Measurement invariance testing revealed that the ASR subscales are equivalent for younger and older adults. To the best of our knowledge, this study was the first to explore the factor structure of the ASR using ESEM and the first to test measurement invariance of the individual items from all ASR subscales across age groups.