Applying SEM, Exploratory SEM, and Bayesian SEM to Personality Assessments

Despite the importance of demonstrating and evaluating how structural equation modeling (SEM), exploratory structural equation modeling (ESEM), and Bayesian structural equation modeling (BSEM) perform, research comparing these analytic techniques is limited, with few studies systematically comparing them to each other using correlated-factor, hierarchical, and bifactor models of personality. In this study, we evaluate the performance of SEM, ESEM, and BSEM across correlated-factor, hierarchical, and bifactor structures and multiple estimation techniques (maximum likelihood, robust weighted least squares, and Bayesian estimation) to test the internal structure of personality. Results across correlated-factor, hierarchical, and bifactor models highlighted the importance of controlling for scale coarseness and allowing small off-target loadings when using maximum likelihood (ML) and robust weighted least squares (WLSMV) estimation, and of including informative priors (IP) when using Bayesian estimation. In general, Bayesian-IP and WLSMV ESEM models provided noticeably better model fits than the alternatives. This study is expected to serve as a guide for professionals and applied researchers, identify the most appropriate ways to represent the structure of personality, and provide templates for future research into personality and other multidimensional representations of psychological constructs. We provide Mplus code for conducting the demonstrated analyses in the online supplement.


Introduction
Factor analysis has served as a critical tool for understanding the nature of constructs included in personality inventories. Over the years, factor analysis techniques have advanced from simple exploratory analyses to more sophisticated approaches that include (a) confirmatory factor analyses (CFAs); (b) estimation procedures that adjust for nonnormal distributions, correct for ordinal-level data, and enhance model convergence; and (c) extensions of simple-structure models, also called independent-clusters models (IC-CFA; [1]), to allow cross-loadings for non-targeted items via small-variance priors.
Maximum likelihood (ML) estimation is commonly used in factor analytic models but is affected by scale coarseness, whereas weighted least squares estimation with means and variances adjusted (WLSMV) corrects for such effects (e.g., [2,3]). Bayesian estimation allows for the incorporation of prior information in testing models (e.g., [4,5]). Traditional factor analytic procedures have forced off-target loadings to equal zero, but more recent exploratory structural equation modeling (ESEM) techniques allow such loadings to vary from zero to improve model fit and represent constructs more realistically ([6]).
Although it is important to demonstrate how CFA models estimated under maximum likelihood (CFA ML), CFA models estimated under mean- and variance-adjusted weighted least squares (CFA WLSMV), exploratory structural equation modeling under ML (ESEM ML), exploratory structural equation modeling under WLSMV (ESEM WLSMV), and Bayesian structural equation modeling (BSEM) each perform, research comparing these analytic techniques simultaneously is limited. To the best of our knowledge, no studies have systematically compared them to each other using more complex correlated-factor, hierarchical, and bifactor models of personality or other related constructs.
In this study, we demonstrate how structural equation modeling (SEM) can be applied with different estimation methods (maximum likelihood (ML), weighted least squares with mean and variance adjusted (WLSMV), and Bayesian estimation) and factor loading constraints (exploratory structural equation modeling; ESEM) to find the procedures that best represent the theoretical framework of personality, using online data for 447,500 respondents from the International Personality Item Pool (IPIP) database ([7]). The present study focuses on the most recent advances in factor analytic procedures to determine their relative effectiveness in representing constructs measured by a personality inventory. By implication, the present analyses are expected to provide templates for future research into personality and other similar multidimensional representations of psychological constructs.
The remainder of this paper is structured as follows: first, we provide an overview of factor analytic techniques, including exploratory factor analysis, confirmatory factor analysis, exploratory structural equation modeling, and Bayesian structural equation modeling, with an emphasis on correlated-factor, hierarchical, and bifactor structures. Next, we discuss (a) the methods used, data, and sample; (b) the results; (c) the discussion and implications; (d) recommendations for future studies; (e) limitations; and (f) conclusions.

An Overview of Factor Analytic Techniques
The primary goal of factor analysis is to determine the number and nature of latent variables or factors that explain variation and covariation among observed scores ([8]). Two main kinds of factor analysis fall under the umbrella of Thurstone's (1947) [9] common factor model: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA; [10,11]). Both methods seek to account for observed relationships among a set of indicators with a smaller set of latent variables or factors but are fundamentally different in other ways ([8,12]).

Exploratory Factor Analysis (EFA)
EFA is data-driven in the sense that neither the number of factors nor the pattern of associations between the latent factors and the indicators need be specified in advance. EFA is used as an exploratory or descriptive tool to find an appropriate number of common factors and to determine which observed measures serve as the best indicators of the latent factors, based on the magnitude of their factor loadings ([8,12]).
However, EFA has several shortcomings. These generally include the absence of fit indexes, being data-driven rather than theory-driven, the inability to account for method effects, the confounding of measurement error and indicator specificity, and the absence of procedures for testing measurement invariance and formal theoretical models ([13]). Confirmatory factor analysis is intended to alleviate these shortcomings.

Confirmatory Factor Analysis (CFA)
Confirmatory factor analysis (CFA) is a structural equation modeling (SEM) procedure that quantifies associations between observed measures or indicators (e.g., items) and latent variables or factors ([8]). CFA is a common statistical approach to analyzing complex multidimensional structures underlying personality inventories, including correlated-factor, hierarchical, and bifactor models. CFA is theory-driven, but it often fails to achieve acceptable model fit and can produce substantial parameter biases in the estimation of factor loadings and correlations because of its restrictive assumptions of exactly zero cross-loadings and residual covariances (e.g., [8,12,14,15]). The CFA model links the observed indicators to the latent factors via the measurement equation y_r = μ + Λη_r + ε_r, where r = 1, ..., R indexes observations, y_r is a p × 1 vector of observed indicators, μ is a p × 1 vector of intercepts, η_r is a q × 1 vector of latent variables, ε_r is a p × 1 vector of measurement errors, and Λ is a p × q matrix of factor loadings that links the latent variables to the observed indicators. Under standard assumptions of the CFA model, η_r and ε_r are normally distributed and independent, implying that E(y_r) = μ + ΛΞ and V(y_r) = ΛΦΛ⊤ + Θ, where Ξ denotes the mean vector of the latent variables, Φ the factor covariance matrix, and Θ the residual covariance matrix.
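The implied covariance structure V(y) = ΛΦΛ⊤ + Θ can be computed directly from the three parameter matrices. The following is a minimal numeric sketch of a standardized two-factor CFA with simple structure; all loading and covariance values are hypothetical, not estimates from the IPIP-NEO-120:

```python
import numpy as np

# Hypothetical standardized two-factor CFA with simple structure:
# each of six indicators loads on exactly one factor (zero cross-loadings).
Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.7],
                   [0.0, 0.6]])          # p x q factor loading matrix
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])             # q x q factor covariance matrix

# Residual variances chosen so that each standardized item variance is 1.
Theta = np.diag(1.0 - np.diag(Lambda @ Phi @ Lambda.T))

# Model-implied covariance matrix: V(y) = Lambda Phi Lambda' + Theta
Sigma = Lambda @ Phi @ Lambda.T + Theta
print(np.round(Sigma, 3))
```

Within-factor covariances are products of loadings (e.g., 0.8 × 0.7 = 0.56), while cross-factor covariances are additionally attenuated by the factor correlation (e.g., 0.8 × 0.3 × 0.8 = 0.192), which is exactly the structure ML or WLSMV estimation tries to match to the observed covariances.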
Common CFA Models: Correlated-Factor, Hierarchical, and Bifactor Models
In this study, three types of CFA models are relevant: correlated-factor, hierarchical, and bifactor models (see Figure 1). Within correlated-factor CFA models, factors are typically intercorrelated, but the researcher does not analyze the directions or patterns of factor interrelationships. Hierarchical factor analysis extends such modeling by further investigating higher-order factors that might account for intercorrelations among lower-order factors. Hierarchical CFA models, as the name implies, are useful when theory dictates that constructs are related in a hierarchical fashion in which lower-order factors account for the intercorrelations among observed indicators and higher-order factors in turn account for interrelationships among lower-order factors. As a result, the hierarchical model is more parsimonious but can fit the data no better than the correlated-factor model. Hierarchical models must include at least three first-order factors to differ from correlated-factor models. Additionally, first-order factors should have at least three indicators ([8,12,15-17]).
Bifactor (also known as nested-factor or general-specific) models serve as alternatives to hierarchical models ([17]). Such models have a latent structure that includes a general factor that accounts for the commonality among all indicators and independent group factors that represent systematic variance specific to nested and non-overlapping subsets of indicators ([8,12,16,18]). Unlike correlated-factor models, bifactor and hierarchical models both assume that global factors, in addition to more specific factors, explain covariation in observed scores ([13]). Chen et al. (2006) [17] enumerated the following advantages of the bifactor model. First, the bifactor model can work as a baseline model to be compared to the second-order model through a likelihood ratio test because the second-order model is nested within the bifactor model ([19,20]). Second, the bifactor model can be used to assess the importance of domain-specific factors that are orthogonal to the general factor. For example, in Spearman's (1927) [21] original conception of general and domain-specific factors, he assumed that there was a factor representing general intelligence as well as domain-specific factors reflecting separate abilities such as verbal, spatial, mathematical, and analytic abilities.
Third, bifactor models show the strength of association between the specific group factors and their corresponding indicators with the effects of the general factor partialed out, and vice versa. This is not the case with the hierarchical model because interpretations are level specific. Finally, the bifactor model can be used to predict outcomes using both the general and group factors while they mutually control for each other's effects. This cannot be done with the second-order model because the second-order factors directly affect the first-order factors, and thus residual variances of the first-order factors reflect the variability of the domain-specific factors that is not accounted for by the second-order factors ([8,17]).
Despite the interpretative advantages of the three types of CFA models described here, they are not without their shortcomings and often fail to yield acceptable fits ([14]).

Figure 1. Examples of correlated-factor, hierarchical, and bifactor measurement models. Note: F = factor; FG = general factor; HO = hierarchical factor. In (a), the correlated-factor model, covariation in the observed indicators is explained by six interrelated factors; in (b), the hierarchical model, covariation in the observed indicators is captured by lower-order factors whose interrelationship is in turn captured by the hierarchical factor; in (c), the bifactor model, covariation in the observed indicators is primarily captured by a general factor, with the specific factors capturing additional covariation not captured by the general factor. For simplicity, residual variances and intercepts are omitted.

Exploratory Structural Equation Modeling
Exploratory structural equation modeling (ESEM) was proposed by Asparouhov and Muthén (2009) [6] as an alternative method that addresses the problems described above by merging the methodological advantages of CFA, SEM, and EFA, allowing for assessment of goodness-of-fit, tests of multiple-group invariance, longitudinal differential item functioning, higher-order factor structures, growth modeling, and other applications ([6,13,14,24]).
ESEM Correlated-Factor, Hierarchical, and Bifactor Models
An appealing attribute of the ESEM framework is that it is applicable to the correlated-factor, hierarchical, and bifactor models of interest to this study and has been applied in numerous studies (see, e.g., [15,25]). Perera (2016) [26] investigated the construct validity of Social Provisions Scale data using correlated-factor, bifactor, and higher-order ESEM models and found that the bifactor ESEM model outperformed the other ESEM models and the IC-CFA model, which links each item to one factor only, with respect to model fit. Perera also found that the correlated-factor ESEM model produced much lower factor correlations than those in the corresponding IC-CFA model. Tóth-Király et al. (2018) [28] examined the multidimensionality of responses to the Basic Psychological Need Satisfaction and Frustration inventory using bifactor ESEM and found that most of the bifactor ESEM models achieved an acceptable level of model fit.

Figure 2 illustrates exploratory SEM (a) correlated-factor, (b) hierarchical, and (c) bifactor models. Note: F: factor; FG: general factor; HO: higher-order factor. For simplicity and illustrative purposes, the figures only contain a subset of cross-loadings (represented with the dashed lines).

Maximum Likelihood vs. Weighted Least Squares
With maximum likelihood (ML) estimation in factor modeling, observed item scores are assumed to fall on a continuous equal-interval scale that follows a multivariate normal distribution ([3,29,30]). In contrast, WLSMV treats item scores as ordered-categorical indicators and thereby corrects for the effects of scale coarseness (e.g., [2,3]).

Bayesian Structural Equation Modeling
Bayesian structural equation modeling (BSEM) has been proposed as an alternative approach to traditional CFA and SEM to improve model fit and provide less restrictive and more realistic representations of the nature of constructs measured within personality inventories ([6,34]). The term BSEM encompasses an extensive variety of Bayesian analyses conducted in SEM with numerous prior distributions ([35]). BSEM is less restrictive than maximum likelihood procedures and uses prior information to reflect a researcher's theories or prior knowledge in a way that the previously discussed estimation procedures do not. In contrast to BSEM, ML procedures rely on large-sample theory and normality assumptions ([34,36]). BSEM is not based on such assumptions and can be used with small samples and in situations where ML estimates do not converge or yield counterintuitive results (e.g., [37-39]).

Rooted in Bayes' theorem, BSEM treats the vector of model parameters θ as random variables that are assigned a prior distribution according to a researcher's previous knowledge, beliefs, or assumptions. The elements in θ include, but are not limited to, item intercepts, factor loadings, residual variances, and factor covariances. The distributions are collectively denoted as P(θ) and are referred to as priors. Bayesian inference combines the prior distribution P(θ) and the likelihood of the observed data P(D | θ) (i.e., the SEM model of interest) to produce the posterior distribution, denoted by P(θ | D) ([5]), which reflects updated knowledge about θ in light of the observed data. Bayes' theorem says that the posterior distribution is given by P(θ | D) = P(D | θ)P(θ) / P(D), where P(D) = ∫ P(D | θ)P(θ) dθ is the marginal distribution of the observed data after the parameters have been integrated over their respective parameter space. Except in simple cases, the marginal distribution of the observed data is generally not available in analytic form; however, because it is constant with respect to the parameter vector θ, it is not needed in Bayesian computations. Previous research on BSEM has found that a Bayesian approach provides better model fit than CFA/SEM ([34]). In addition, BSEM is also more flexible when models are complex and respondent samples are smaller, in comparison to ESEM, which typically requires larger samples to provide trustworthy results ([35,40]).
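How the prior and the likelihood combine can be seen in the simplest conjugate case: a normal prior on a single parameter with a normal likelihood, where the posterior mean is a precision-weighted average of the prior mean and the sample mean. This is a toy sketch, not the BSEM model used in this study; all numbers below are illustrative:

```python
def normal_posterior(prior_mean, prior_var, ybar, sigma2, n):
    """Posterior of a normal mean under a conjugate normal prior."""
    prior_prec = 1.0 / prior_var          # precision of the prior
    data_prec = n / sigma2                # precision of the sample mean
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * ybar)
    return post_mean, post_var

# A sceptical prior centered at 0 dominates with little data,
# but the likelihood takes over as the sample size grows.
m_small, _ = normal_posterior(0.0, 0.01, ybar=0.5, sigma2=1.0, n=10)
m_large, _ = normal_posterior(0.0, 0.01, ybar=0.5, sigma2=1.0, n=10_000)
print(round(m_small, 3), round(m_large, 3))  # → 0.045 0.495
```

The same mechanism is what pulls small cross-loadings toward zero under the small-variance priors discussed in the next section.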

Types of Priors
The prior distribution represents theories and previous knowledge or beliefs that a researcher has about the parameter values before gathering new data and conducting a new study ([34,41]). Priors are specified for each of the CFA model parameters. They can be informative (small prior variance) or non-informative (large prior variance; also called vague or diffuse).
Informative priors are used when there are sufficient previous findings about the nature of scales and shapes of distributions ([42,43]). BSEM provides a means to shrink noise parameters (e.g., trivial cross-loadings) toward zero in a sparse factor loading matrix. Many cross-loadings are constrained to zero, and only a few important or theoretically relevant cross-loadings are typically allowed to be non-zero ([35,44,45]). Informative priors on small-sized parameters such as cross-loadings are assigned and manipulated to achieve shrinkage. Shrinkage priors are intended to diminish the noise parameters (e.g., trivial cross-loadings) toward zero for the purpose of generating a more parsimonious factor model while keeping estimates strong for the signal parameters (e.g., targeted loadings and non-trivial targeted cross-loadings; [35,38]).
The small-variance normal prior is one of the shrinkage priors commonly used among various kinds of distributions in BSEM and is available in Mplus ([34,40,46]). In BSEM with small-variance normal distribution priors (BSEM-N), major loadings are estimated in a confirmatory approach by being assigned non-informative priors (e.g., N(0, 10^10)) or weakly informative priors if prior information is available from past analyses. Cross-loadings are estimated in an exploratory way and penalized by being assigned shrinkage priors with a mean of zero and a small variance ([35]). For example, utilizing a normal prior of N(0, 0.01) reflects "the prior belief that a 95% chance the true cross-loading falls between −0.196 and 0.196" ([38], p. 876). In comparison to the independent-clusters CFA model, which allows each item to load on one factor and sets all the cross-loadings to zero ([13]), Bayesian estimation is flexible in estimating models by regulating the variability in cross-loadings through controlling prior distributions. This enables models that cannot be computed by frequentist methods such as ML to be estimated ([38]).
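The ±0.196 bound quoted above follows directly from the normal quantiles: a variance of 0.01 corresponds to a standard deviation of 0.1, and 95% of the prior mass lies within ±1.96 standard deviations of zero. A quick check using only the Python standard library:

```python
from statistics import NormalDist

prior = NormalDist(mu=0.0, sigma=0.1)    # N(0, 0.01): variance 0.01 -> sd 0.1
lower = prior.inv_cdf(0.025)             # 2.5th percentile
upper = prior.inv_cdf(0.975)             # 97.5th percentile
print(round(lower, 3), round(upper, 3))  # → -0.196 0.196
```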
Non-informative priors are employed when we do not possess sufficient prior knowledge, information, or consistent findings or theories to inform posterior inferences. This lack of information is still important to quantify because it represents our cumulative understanding of the problem at hand. Non-informative priors are regarded as a standard method to quantify the lack of prior information.
One type of commonly used non-informative prior distribution is the uniform (flat) distribution. This prior is less informative (i.e., flatter) than other types of priors, so it allows the data to drive the estimation of the posteriors via their likelihood ([47,48]). This distribution indicates that, before any data are gathered, no parameter values are more plausible than others. A prior distribution with a large variance, such as a normal distribution with mean μ = 0 and variance σ^2 = 10^10, serves a similar purpose. This variance makes the prior probability distribution of the parameter values almost flat, which is the default setting in Mplus (see [49,50]). The authors of [53] recommended that several default priors be used to evaluate the effects of the choice of prior on model estimation.

The Markov chain Monte Carlo (MCMC) algorithm is often applied to empirically approximate posterior distributions. The words Monte Carlo indicate that a simulation process of repeatedly generating random draws is applied. The word chain indicates that the draws are linked, each taking place sequentially given the previous one. Thus, MCMC iteratively draws many samples from the posterior distributions of unknown model parameters. When the MCMC chains converge to a stationary distribution, Bayesian parameter estimates can be generated from summary statistics (e.g., mean, mode, standard deviation, median, and quantiles) of the posterior distributions ([4,5,35,42]).
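The chain mechanics described above can be sketched with a minimal random-walk Metropolis sampler for a single parameter. The target below is a toy normal density chosen so the answer is known in closed form; real BSEM software samples far higher-dimensional posteriors, but the accept/reject logic is the same:

```python
import math
import random

def log_target(theta):
    # Toy posterior: N(0.5, 0.2^2), known in closed form for checking.
    return -0.5 * ((theta - 0.5) / 0.2) ** 2

def metropolis(n_iter, step=0.3, seed=1):
    rng = random.Random(seed)
    theta = 0.0                      # starting value of the chain
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)   # random-walk proposal
        # Accept with probability min(1, target(proposal) / target(theta)).
        if math.log(rng.random()) < log_target(proposal) - log_target(theta):
            theta = proposal
        draws.append(theta)
    return draws

draws = metropolis(20_000)
kept = draws[5_000:]                 # discard the first draws as burn-in
post_mean = sum(kept) / len(kept)
print(round(post_mean, 2))
```

Summary statistics of `kept` (mean, median, quantiles) then serve as the Bayesian parameter estimates, mirroring the burn-in and summary steps used in the analyses reported later.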

Bayesian Model Fit Evaluation
Model goodness-of-fit indices and information criteria (IC) used in conventional SEM have also been adapted for use in BSEM. These include the Bayesian root mean square error of approximation (BRMSEA; [7]), the Bayesian comparative fit index (BCFI), and the Bayesian Tucker-Lewis index (BTLI; [54]). The BRMSEA, BCFI, and BTLI indices of model fit are based on discrepancies between actual and replicated data at each MCMC iteration, in a similar way to the posterior predictive model checking technique (PPMC; e.g., [55]). PPMC replicates a dataset with the same sample size as the observed data at every MCMC iteration and quantifies a posterior predictive p-value (PPp; [56]) as the proportion of iterations in which the fit statistic calculated from the observed data does not exceed the fit statistic obtained from the replicated data. PPMC implies that the discrepancy between the observed and replicated data should be minimal when the model fits the data ([34,42]). BRMSEA indices have been shown to be effective in large samples (N > 1000). Unlike most frequentist equivalents, Bayesian adaptations of fit indices also include credibility intervals to quantify their uncertainty ([54]).
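Once the per-iteration fit statistics are in hand, the PPp itself is a one-line proportion. A sketch with simulated discrepancy values (hypothetical numbers standing in for per-iteration chi-square-type discrepancies; for a well-fitting model the observed-data and replicated-data discrepancies come from similar distributions, so the PPp should be near 0.5):

```python
import random

rng = random.Random(42)

# Hypothetical per-iteration discrepancy measures from PPMC for a
# well-fitting model: observed and replicated values are exchangeable.
n_iter = 4_000
d_obs = [rng.gauss(100.0, 10.0) for _ in range(n_iter)]
d_rep = [rng.gauss(100.0, 10.0) for _ in range(n_iter)]

# PPp: proportion of iterations in which the observed-data statistic
# does not exceed the replicated-data statistic.
ppp = sum(o <= r for o, r in zip(d_obs, d_rep)) / n_iter
print(round(ppp, 2))
```

Extreme PPp values (near 0 or 1) indicate that the observed data are systematically more (or less) discrepant than data the model itself generates, signaling misfit.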
In keeping with the cutoff criteria for acceptable model fit within a frequentist structural equation modeling framework stated earlier, RMSEA values less than 0.06 indicate excellent fit, and values less than 0.08 are acceptable with sample sizes greater than or equal to 1000. In contrast, the application of traditional cutoff values for BCFI and BTLI is not endorsed by Garnier-Villarreal and Jorgensen (2020) [54] because those criteria did not generalize across all samples.

Research on IPIP-NEO-120
The International Personality Item Pool-Neuroticism, Extraversion, and Openness (IPIP-NEO-120) questionnaire is a public-domain, 120-item inventory that assesses personality traits under the Big Five model. The domain scales (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) have 24 items each, with six nested four-item facet subscales under each domain ([57]). The IPIP-NEO-120 was assembled by Johnson as an abbreviated version of the 300-item IPIP-NEO inventory ([58]) and is intended to measure the same constructs assessed by the original 240-item NEO Personality Inventory (NEO PI-R; [59]). To measure personality more efficiently, researchers have created several reduced-length forms of the IPIP-NEO-300, including the IPIP-20, IPIP-50, IPIP-100, and IPIP-NEO-120, but only the IPIP-NEO-120 measures all six facets nested under each of the Big Five domains. The IPIP-NEO-120 is available from the IPIP website (http://ipip.ori.org, accessed on 28 November 2023).
The reliability indices of the IPIP-NEO-120 are commonly reported at both domain and facet levels. Overall, alpha coefficients ranged from 0.81 to 0.92 for domain scales and from 0.47 to 0.89 for facet scales. In general, the mean alpha coefficients for domain scales were higher than those for facet scales ([57,60-66]). Convergent validity evidence for the IPIP-NEO-120 was reported in two studies. Factor structures for IPIP-NEO-120 scale scores were also investigated using CFA, EFA, and maximum likelihood estimation in two studies ([61,63]). Kajonius and Giolla (2017) [61] used a worldwide sample of 130,602 respondents from 22 countries to examine hierarchical factor models using confirmatory factor analysis with maximum likelihood estimation for each of the five global domains. They reported comparative fit indices (CFIs) ranging from 0.87 to 0.92 for Neuroticism, 0.75 to 0.89 for Extraversion, 0.76 to 0.89 for Openness, 0.84 to 0.91 for Agreeableness, and 0.89 to 0.94 for Conscientiousness across the 22 countries.
As another example, Kajonius and Johnson (2019) [63] initially confirmed the five factors of the IPIP-NEO-120 using exploratory factor analysis with maximum likelihood estimation based on a sample of 320,128 respondents living in the United States.They then compared hierarchical and bifactor confirmatory factor analysis models utilizing maximum likelihood estimation for each of the IPIP-NEO-120 global domains.They found that bifactor models fit better than hierarchical models with the largest differences in CFIs between bifactor and hierarchical models observed for Extraversion (0.91 vs. 0.88) and Agreeableness (0.93 vs. 0.87).

Significance of the Current Study
Recent research into ESEM and BSEM has been primarily focused on simple factor analytic models and generally shows that these procedures fit data better than CFAs (e.g., [51,67]). Absent from these studies are weighted least squares with mean and variance adjusted (WLSMV) estimation, exploratory structural equation modeling, and Bayesian structural equation modeling analyses. Limited research has compared CFA, ESEM, and BSEM simultaneously, focusing only on simple factor analytic models of the Big Five (e.g., [34,67,68]). To the best of our knowledge, no research has been conducted within the more complex correlated-factor, hierarchical, and bifactor frameworks with the IPIP-NEO-120. Because the IPIP-NEO-120 was recently developed, evidence supporting the reliability and validity of its scores is quite limited. The present study focuses on the most recent advances in factor analytic procedures to determine their relative effectiveness in representing constructs measured by a personality inventory, allowing replication of analyses within a distinct domain of personality.

Illustrations of SEM, ESEM, and BSEM Techniques Using IPIP-NEO-120 Agreeableness Scale
In this study, we demonstrate and evaluate the effects of SEM estimation methods (maximum likelihood (ML) vs. weighted least squares with mean and variance adjusted (WLSMV) vs. Bayesian estimation) and factor loading constraints (exploratory structural equation modeling; ESEM) on model fit in correlated-factor, hierarchical, and bifactor models using the IPIP-NEO-120 Agreeableness scale. The following three research questions guide this study.

RQ1.
To what extent do parameter estimation methods (SEM ML, WLSMV, and Bayesian SEM) affect model fit for correlated-factor, bifactor, and hierarchical factor models?

RQ2.
To what extent do factor loading constraints (allowing vs. restricting weak off-target loadings; ESEM vs. CFA) affect model fit for correlated-factor, bifactor, and hierarchical factor models?

RQ3.
To what extent will the use of different priors in BSEM affect model fit for correlated-factor, bifactor, and hierarchical factor models?
Methods

Data and Measure

IPIP-NEO-120
The IPIP-NEO-120 ([57]) is a 120-item personality inventory that measures the Big Five personality factors (Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness to Experience). Each of the five domain scales has six nested four-item facet subscales. The Agreeableness subscale consists of 24 items measuring levels of trust, morality, altruism, cooperation, modesty, and sympathy. The Conscientiousness subscale has 24 items focused on self-efficacy, orderliness, dutifulness, achievement-striving, self-discipline, and cautiousness. The Extraversion subscale consists of 24 items with facets for friendliness, gregariousness, assertiveness, activity level, excitement-seeking, and cheerfulness. The Neuroticism subscale has 24 items that measure anxiety, anger, depression, self-consciousness, immoderation, and vulnerability. The Openness to Experience subscale includes 24 items for the facets of imagination, artistic interests, emotionality, adventurousness, intellect, and liberalism ([57]).
Items are answered using a five-point Likert-type rating scale in which 1 = Very Inaccurate, 2 = Moderately Inaccurate, 3 = Neither Accurate nor Inaccurate, 4 = Moderately Accurate, and 5 = Very Accurate (see Figure 3), with possible scores ranging from 24 to 120 for each 24-item domain scale and from 4 to 20 for each facet subscale. Missing values are coded as 0. The IPIP-NEO-120 consists of 65 positively phrased and 55 negatively phrased items. Among the domain scales, 29%, 46%, 75%, 71%, and 50% of items are positively phrased for Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness to Experience, respectively. Negatively phrased items are reverse scored by recoding (1 = 5, 2 = 4, 4 = 2, 5 = 1); this recoding is done automatically after respondents finish the questionnaire, so users need not do so themselves ([57]). For our illustrations, we used data from the Agreeableness subscale.
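The reverse-scoring rule described above (with 0 retained as the missing-value code) amounts to a simple recode; `item_scores` below is a made-up response vector for illustration:

```python
# Reverse-score a negatively phrased five-point item: 1<->5, 2<->4, 3 stays.
# 0 is the missing-value code in the IPIP-NEO-120 data and is left unchanged.
REVERSE = {0: 0, 1: 5, 2: 4, 3: 3, 4: 2, 5: 1}

def reverse_score(responses):
    return [REVERSE[r] for r in responses]

item_scores = [1, 2, 3, 4, 5, 0]          # hypothetical raw responses
print(reverse_score(item_scores))          # → [5, 4, 3, 2, 1, 0]
```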


Sample
Participants consisted of 447,500 respondents from the United States who voluntarily completed the International Personality Item Pool NEO-120 questionnaire (IPIP-NEO-120; [57] Johnson, 2014) at the IPIP website (data publicly available via the Open Science Framework at https://osf.io/tbmh5/wiki/home/, accessed on 28 November 2023). The sample included 39% male (N = 174,707) and 61% female (N = 272,793) respondents with an average age of 24.93 years (SD = 10.29).

Descriptive Statistics
Preliminary analyses included descriptive statistics for the Agreeableness domain and its facet scores (means and standard deviations) and conventional reliability estimates (alpha and omega; see Table 1). Item-scale means ranged from 3.10 to 4.13. The mean item-scale standard deviation across all facets was 0.82. Alpha reliability estimates ranged from 0.70 to 0.84 (M = 0.85 for the overall domain and M = 0.73 across facets). Omega coefficients ranged from 0.70 to 0.85 (M = 0.90 for the overall domain and M = 0.74 across facets).

In evaluating model fit, we treated CFI and TLI values ≥ 0.95 and low RMSEA values (Yu, 2002) as indicators of excellent or close fit, and values of 0 as perfect fit for RMSEA. We chose to adopt the more commonly used value of 0.06 for RMSEAs to signify excellent fit in this study. For Bayesian analyses, we investigated recently developed counterparts to the CFI, TLI, and RMSEA, namely the BRMSEA, BCFI, and BTLI, proposed by Garnier-Villarreal and Jorgensen (2020) [54].
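For reference, the alpha coefficients reported in Table 1 follow the standard Cronbach's alpha formula. This is a minimal sketch of that computation, not the authors' analysis code:

```python
# Sketch of the Cronbach's alpha computation underlying the reliability
# estimates in Table 1:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
def variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: list of k equal-length score lists, one list per item."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(it) for it in items) / variance(totals))
```
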
The Bayesian models were estimated using default priors for the target loadings and informative priors for the off-target loadings. The informative priors were small-variance normal distributions with a mean of zero; specifically, the priors assigned to the off-target loadings were N(0, 0.005), N(0, 0.01), N(0, 0.02), and N(0, 0.03). Two chains were specified, each running 10,000 Markov chain iterations, with the first 5000 iterations of each chain discarded as burn-in.
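To see how strongly these priors shrink off-target loadings toward zero, one can compute the prior probability mass near zero. The arithmetic below is our own illustration, not taken from the article:

```python
# Probability that an off-target loading drawn from each small-variance prior
# N(0, v) falls within +/-0.2 of zero (illustrative arithmetic only).
from math import erf, sqrt

def mass_within(bound, variance):
    """P(|x| <= bound) for x ~ N(0, variance)."""
    return erf(bound / sqrt(2 * variance))

for v in (0.005, 0.01, 0.02, 0.03):
    print(f"N(0, {v}): P(|loading| <= 0.2) = {mass_within(0.2, v):.3f}")
```

Smaller prior variances concentrate more mass near zero, so N(0, 0.005) constrains the off-target loadings most tightly.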

Correlated-Factor Models
The results in Table 3 for the correlated-factor model revealed that, when considering estimation methods as a whole, Bayesian estimation with informative priors produced superior model fit on average (BCFI = 0.983, BTLI = 0.983, and BRMSEA = 0.022) relative to ML and WLSMV estimation with respect to CFI, TLI, and RMSEA. The best-fitting models for CFAs versus ESEMs varied with the index considered. WLSMV estimates for ESEM models provided the best fit in relation to CFIs (=0.983), and Bayesian-IP estimates did so in relation to TLIs (=0.983) and RMSEAs (=0.022), which penalized model complexity. Model fit under ML estimation was better than under WLSMV estimation in terms of CFIs, TLIs, and RMSEAs.
On average, WLSMV estimates for CFA models provided the poorest fit in terms of CFIs, TLIs, and RMSEAs. Based on the criteria of CFIs and TLIs ≥ 0.95 and RMSEAs ≤ 0.06, all Bayesian-IP models and all ESEM ML and WLSMV models yielded excellent fit.

Bifactor Models
The results in Table 3 for the bifactor model again showed that Bayesian-IP estimation produced better model fit on average than did ML and WLSMV in relation to CFIs, TLIs, and RMSEAs. For CFAs versus ESEMs, the best-fitting estimation procedure varied with the index considered. WLSMV estimates for ESEMs provided the best fit in relation to CFIs (=0.993), and Bayesian-IP estimates did so in relation to TLIs (=1.000) and RMSEAs (=0.001). Model fits for WLSMV were better than those for ML in terms of CFIs and TLIs, but the reverse was true for RMSEAs. On average, ML estimates for CFAs and Bayesian-NIP estimation provided the poorest fit in terms of CFIs and TLIs, and WLSMV estimates did so in terms of RMSEAs. Based on the criteria of CFIs and TLIs ≥ 0.95 and RMSEAs ≤ 0.06, all Bayesian-IP, ESEM ML, and ESEM WLSMV models yielded excellent fits.

Hierarchical Models
As with previous models, results for the hierarchical model in Table 3 show that Bayesian-IP estimation generated better model fit on average than did ML and WLSMV with respect to CFIs, TLIs, and RMSEAs. For CFAs versus ESEMs, the best-fitting estimation procedure varied with the index considered. WLSMV estimates for ESEM models provided the best fit in relation to CFIs (≥0.986), and Bayesian-IP estimates did so in relation to TLIs (≥0.978) and RMSEAs (≤0.025). Model fits for WLSMV were better than those for ML in terms of TLIs, but worse in terms of RMSEAs. On average, WLSMV estimates for CFAs provided the poorest fit in terms of CFIs, TLIs, and RMSEAs. Based on CFIs and TLIs ≥ 0.95 and RMSEAs ≤ 0.06, all Bayesian-IP, ESEM ML, and ESEM WLSMV models provided excellent fits.

Fit Differences for Estimation Procedures
We examined differences in average model fit indices between selected pairs of models to compare relative improvements in fit gained by altering estimation procedures and allowing off-target loadings. Table 4 provides mean differences in CFI, TLI, and RMSEA values between estimation methods within each type of factor model (correlated-factor, bifactor, and hierarchical) and also allows for direct comparisons of fit indices among the three types of models. The following pairs of estimation procedures are compared: (a) ML versus WLSMV estimation, (b) ML versus Bayesian estimation with informative priors, and (c) WLSMV versus Bayesian estimation with informative priors.
An important observation that applies to all comparisons to be discussed is that overall fit indices were best for the bifactor model, followed by the correlated-factor model, and then the hierarchical model. In addition, overall patterns of differences related to estimation procedure and allowance of non-zero off-target loadings were very consistent across these factor models. Results in Table 4 reveal that controlling for scale coarseness (WLSMV vs. ML) provided better fits in terms of TLIs (M difference = 0.001) but worse fits in terms of RMSEAs (M difference = 0.020).
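The CFI, TLI, and RMSEA values being compared throughout this section follow the standard chi-square-based definitions. For reference, here is a textbook sketch of those formulas (not the authors' code; `chi2_b`/`df_b` denote the baseline model and `n` the sample size):

```python
# Standard definitions of the fit indices compared in Tables 3-8
# (textbook formulas; chi2_b and df_b refer to the baseline model).
from math import sqrt

def cfi(chi2, df, chi2_b, df_b):
    """Comparative fit index: 1 - max(chi2-df, 0) / max(chi2-df, chi2_b-df_b, 0)."""
    d_m, d_b = max(chi2 - df, 0.0), max(chi2_b - df_b, 0.0)
    return 1.0 - d_m / max(d_m, d_b, 1e-12)

def tli(chi2, df, chi2_b, df_b):
    """Tucker-Lewis index, which penalizes model complexity via chi2/df ratios."""
    return (chi2_b / df_b - chi2 / df) / (chi2_b / df_b - 1.0)

def rmsea(chi2, df, n):
    """Root mean square error of approximation for a sample of size n."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))
```
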

ML vs. Bayesian Informative Priors
Differences in fit indices between ML and Bayesian estimation with informative priors shown in Table 4 ranged from 0.030 to 0.056 (M = 0.041) for CFIs, from 0.053 to 0.066 (M = 0.057) for TLIs, and from −0.020 to −0.036 (M = −0.026) for RMSEAs.

WLSMV vs. Bayesian Informative Priors
Differences in CFIs, TLIs, and RMSEAs between WLSMV and Bayesian estimation with informative priors were also noteworthy. In comparison to ML, corresponding differences for WLSMV in CFIs (M = 0.041 vs. 0.041) and TLIs (M = 0.057 vs. 0.056) were very similar, but differences in RMSEAs were larger (M = −0.046 vs. −0.026). Thus, when considering results in this section collectively, Bayesian estimation with informative priors provided the best average fits for CFIs, TLIs, and RMSEAs. WLSMV estimation on average yielded CFIs and TLIs similar to those of ML estimation, but larger RMSEAs.

Hierarchical Models
Consistent with results for correlated-factor and bifactor ESEMs, ESEMs for hierarchical models allowing non-zero off-target loadings fit better (CFI = 0.982, TLI = 0.970, and RMSEA = 0.036) on average than did corresponding hierarchical models (CFI = 0.872, TLI = 0.856, and RMSEA = 0.080) that restricted such loadings. As with the correlated-factor and bifactor ESEMs, all hierarchical ESEMs yielded excellent fit (CFI and TLI ≥ 0.95 and RMSEA ≤ 0.06). When considered collectively, fit results for models supported the inclusion of non-zero off-target loadings, with improvements in most fit statistics.

Fit Differences When Allowing Off-Target Loadings
In this section, we contrast differences in fit indices for ML and WLSMV estimation when restricting versus allowing off-target loadings. Results in Table 6 reveal that, for all comparisons, models allowing off-target loadings provided better model fits than models that restricted such loadings. For ML models, allowing off-target loadings raised mean CFIs by 0.077 and mean TLIs by 0.077, and reduced mean RMSEAs by 0.026. For WLSMV models, a similar pattern was evident, with mean changes in CFIs, TLIs, and RMSEAs equaling 0.093, 0.099, and −0.051, respectively. In general, these results emphasize that allowing off-target loadings noticeably improved average model fit for all models and fit indices.

The Effects of Different Priors in BSEM on Model Fits
Correlated-Factor Models. The results in Table 7 for the correlated-factor model reveal that the Bayesian-IP models on average provided noticeably better fits (BCFI = 0.983, BTLI = 0.983, and BRMSEA = 0.022) than Bayesian-NIP models (BCFI = 0.906, BTLI = 0.891, and BRMSEA = 0.055). Improvements in fit when using different informative priors were not noticeable, but as prior variance increased, BTLI increased somewhat (BTLI = 0.980 vs. 0.984) and BRMSEA decreased slightly (BRMSEA = 0.024 vs. 0.021). In keeping with the cutoff criteria of CFIs and TLIs ≥ 0.95 and RMSEAs ≤ 0.06, all Bayesian-IP models yielded excellent fits.

Fit Differences for ESEM WLSMV Models versus Bayesian Models with Informative Priors
As a final follow-up comparison, we contrast average fit results between WLSMV-ESEMs and Bayesian models with informative priors, the best-ranking models overall in model fit. Results in Table 8 reveal that, relative to the Bayesian models, WLSMV-ESEMs slightly raised mean CFIs (by 0.004), lowered mean TLIs (by 0.008), and raised mean RMSEAs (by 0.020).

The Effects of Parameter Estimation Methods on Model Fit
When considering model fit indices collectively, Bayesian-IP estimation provided the best average model fit in comparison to non-Bayesian (ML and WLSMV) procedures, whereas WLSMV estimation for CFAs provided the poorest overall fit. Based on the criteria of CFIs and TLIs ≥ 0.95 and RMSEAs ≤ 0.06 adopted here, all Bayesian-IP models yielded excellent fit.
The finding that Bayesian-IP models performed better than ML models is consistent with previous studies ([34,40,49,51] Asparouhov and Muthén, 2010; Muthén and Asparouhov, 2012; Reis, 2017; Xiao et al., 2019). Bayesian estimation is a less restrictive approach than maximum likelihood and does not depend on large-sample theory and normality assumptions. BSEM can be applied to complex models with small sample sizes where ML estimates often fail to converge or produce counterintuitive results (e.g., [35,37–39,68]). Findings also revealed that Bayesian-IP typically outperformed WLSMV, especially in relation to parsimony-favoring fit indices. In previous research, this was not always the case. Depaoli and Clifton (2015) [76] demonstrated that Bayesian models converged more frequently than WLSMV models within a multilevel SEM simulation study. Holtmann et al. (2016) [77], in another simulation study, found that Bayesian estimation performed better than WLSMV only when using highly informative priors, and this was true in the present study as well. Aside from fit considerations, Liang and Yang (2014) [78] noted that WLSMV is much more efficient than Bayesian-IP with respect to computational time.
We speculate that conflicting results for Bayesian estimation are largely due to the choice of priors. Researchers (see, e.g., [51–53]) have recommended that Bayesian estimation only be used when correct specifications of informative priors are made for cross-loadings. Consequently, methods for best specifying appropriate priors remain an important topic for future research. In comparison to ML, WLSMV estimation yielded similar or better comparative fit indices (CFIs) and Tucker-Lewis indices (TLIs), but worse root mean square errors of approximation (RMSEAs). These results are congruent with those reported by Beauducel and Herzberg (2006), Lei (2009), and Li (2016) [3,29,79]. Weaker fits for ML are often attributable to scale coarseness effects due to limited numbers of response options and/or unequal intervals between those options. Ark (2015), Rhemtulla et al. (2012), and Zumbo et al. (2007) [30,80,81] note that corrections for scale coarseness are most needed when scales have four or fewer response options. However, the present results indicated that such corrections were also warranted with the five-option Likert-style scales commonly used with self-report measures. Failure to make such corrections can produce imprecise test statistics, standard errors, and parameter estimates. In such cases, the use of WLSMV is strongly advocated ([2,3,82] Beauducel and Herzberg, 2006; Muthén, 1993; Muthén and Kaplan, 1985), and the present results further support this conclusion.
However, an interesting and consistent result observed here was that RMSEA values for WLSMV were generally larger than those for ML. Across previous studies, relationships between RMSEAs for WLSMV versus ML estimation have been inconsistent. Beauducel and Herzberg (2006) [3] found the same relationship observed here when the number of response categories exceeded four, whereas others ([83,84] Nye and Drasgow, 2011; Xia and Yang, 2019) found that WLSMV RMSEAs were smaller than ML RMSEAs. Under circumstances when RMSEAs for WLSMV do not reach satisfactory levels but CFIs and TLIs do, modification indices may highlight effective ways to lower RMSEAs. In the present analyses, when RMSEAs for WLSMV were higher than those for ML, they still met the present criteria for inferring excellent to adequate fits.
In this regard, researchers have routinely emphasized that final model selection should not be based solely on fit indices exceeding a set of cutoff values ([74,84] Marsh et al., 2004; Xia and Yang, 2019). Instead, RMSEA, CFI, and TLI values should serve as diagnostic means to improve fit. As suggested by Xia and Yang (2019) [84], researchers need to pursue alternative approaches to assessing goodness-of-fit statistics when ordered categorical data are employed, and this again represents an important area for further inquiry.

The Effects of Factor Loading Constraints on Model Fit
ESEMs have been adopted as an alternative to CFA models, which, though theory-driven, often fail to achieve acceptable model fit. Unlike CFA, ESEM is more data-driven and less restrictive. In ESEMs, items are not constrained to load only on prespecified factors, and cross-loadings are estimated freely ([14]). Nevertheless, ESEMs also have potential drawbacks worth mentioning. These include (a) being less parsimonious, especially in large, complicated models; (b) being more likely to have convergence issues in small samples with complicated models; and (c) being susceptible to confounding of constructs within factors that should be separated according to theory ([14,35,40,68,86] Kim and Wang, 2021; Liang, 2020; Marsh et al., 2014; Marsh et al., 2020; Reis, 2017). Although these issues did not come into play in the present analyses, they might in other situations, especially with small sample sizes.
Sellbom and Tellegen (2019) [87] have argued against comparing ESEMs and CFA models in terms of fit due to the large number of additional parameters estimated in an exploratory manner within ESEMs. Marsh et al. (2020) [86] also acknowledged that, because CFA models are nested within ESEM models, CFA models will always be more parsimonious. As a result, though not observed here, ESEMs may exhibit better fits than CFA models on indices that do not control for parsimony, such as CFI, whereas the opposite may be true for indices that do (e.g., TLI and RMSEA). Accordingly, Marsh et al. (2020) [86] recommended that when ESEMs and CFA models differ in parsimony, models should not be evaluated solely by goodness of fit. They recommended that if fit and parameter estimates for CFA models and ESEMs are similar, then CFA models are more desirable based on parsimony. If ESEMs provide substantial improvements in goodness of fit compared to CFA models, then ESEMs would be preferred despite their added complexity.
ESEM results can also be influenced by the choice of rotation method ([51,67] Guo et al., 2019; Xiao et al., 2019). ESEMs here were estimated using target rotations in which cross-loadings were targeted toward zero, whereas target loadings were freely estimated ([6,88] Asparouhov and Muthén, 2009; Marsh et al., 2013). When target rotations are used, ESEMs provide good estimates of target loadings and factor correlations. However, with alternative rotations such as Geomin, these relationships may vary and are thus worthy of further investigation.

Bayesian Structural Equation Models and Exploratory Structural Equation Models
Another key finding observed here was that Bayesian-IP models and ESEMs consistently provided noticeably better fits to the data than did CFA models. WLSMV-ESEMs yielded the best fits in terms of CFIs, whereas Bayesian-IP models did so in terms of the parsimony-favoring TLIs and RMSEAs. Consequently, if model fit is of primary interest, then one of those models would likely best serve that purpose, assuming satisfactory convergence and the absence of counterintuitive results.
Compared to CFA models, ESEM models allowing non-zero off-target loadings generally provided the greatest improvements in model fit. Specifically, for both ML and WLSMV models, allowing non-zero off-target loadings improved mean CFIs and mean TLIs and reduced mean RMSEAs. Using increases in CFIs and TLIs greater than 0.01 and decreases in RMSEAs greater than 0.015 suggested by Chen (2007) and Cheung and Rensvold (2002) [89,90] as rough guidelines, these differences would all be considered noteworthy changes in average fit.
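These rough guidelines can be stated compactly. The sketch below is our paraphrase of the thresholds cited from Chen (2007) and Cheung and Rensvold (2002), not those authors' exact operationalization:

```python
# Rough guideline sketch: changes in CFI or TLI greater than 0.01, or changes
# in RMSEA greater than 0.015, are treated as noteworthy differences in fit.
def noteworthy_change(d_cfi=0.0, d_tli=0.0, d_rmsea=0.0):
    """Return True if any fit-index change exceeds the rough guideline thresholds."""
    return abs(d_cfi) > 0.01 or abs(d_tli) > 0.01 or abs(d_rmsea) > 0.015

# Example: the mean changes reported earlier for ML models when allowing
# off-target loadings (CFI +0.077, TLI +0.077, RMSEA -0.026) are noteworthy.
```
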
Results contrasting ML and Bayesian-IP estimation were generally even more pronounced, revealing that models using Bayesian-IP fit better on average than did models with ML and WLSMV estimation with respect to CFIs, TLIs, and RMSEAs. According to guidelines suggested by Chen (2007) and Cheung and Rensvold (2002) [89,90], all of these differences would be considered important. When conflicts arise among fit indices, Marsh, Hau, and Grayson (2005) and Marsh et al. (2020) [23,86] favor the use of TLIs and RMSEAs over CFIs because the former indices favor parsimony by penalizing model complexity.

Implications
Overall, this study was centered on the most recent innovations in factor analytic techniques to assess their relative effectiveness in representing latent constructs measured by the IPIP-NEO-120 inventory. Incorporating these innovations into correlated-factor, bifactor, and hierarchical models significantly extended evidence of the psychometric quality of IPIP-NEO-120 scores, identified appropriate ways to represent the structure of personality, and provided guidelines for future research into personality and other multidimensional measures of psychological constructs.
The current results revealed that the structure of personality can be well represented from three theoretical perspectives (correlated-factor, bifactor, and hierarchical models). In general, Bayesian-IP and WLSMV ESEM models best represented each theoretical framework. Researchers in future studies might replicate the present modeling procedures to determine the extent to which they generalize to other measures.
To guide professionals and applied researchers in identifying suitable models for multidimensional representations of constructs, we suggest the following guidelines. First, Bayesian estimation is a less restrictive approach than maximum likelihood and does not depend on large-sample theory and normality assumptions. BSEM-IP models provide better fits and can be used when models are complex or sample sizes are small, situations in which ML or WLSMV often fail to converge or yield negative variance estimates. However, specifying appropriate informative priors is necessary to avoid counterintuitive results and to best enhance fit. Second, ESEM models are advantageous in allowing off-target loadings and providing better fits. With ML and WLSMV estimation, allowing non-zero off-target loadings noticeably improved model fit. If ESEMs show much better model fits and lower inter-factor correlations than CFAs and model fit is of foremost interest, ESEMs may be preferred. ESEM results may be influenced by the choice of rotation method, so further investigation with different rotation methods may be required. Third, WLSMV estimation may be needed to correct for scale coarseness, particularly when scales have four or fewer response options. Finally, if CFA and ESEM models provide similar fit and parameter estimates, CFA models are more desirable based on parsimony.
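The guidelines above can be summarized as a simple decision sketch. This is our own hedged paraphrase, not a definitive selection rule; all inputs are judgments the analyst must make:

```python
# Hedged decision sketch of the model-selection guidelines above.
# Inputs are analyst judgments; thresholds are deliberately not fixed here.
def suggested_model(cfa_esem_fit_similar, esem_much_better_fit,
                    small_sample_or_convergence_problems):
    if small_sample_or_convergence_problems:
        return "BSEM with carefully specified informative priors"
    if cfa_esem_fit_similar:
        return "CFA (preferred on parsimony grounds)"
    if esem_much_better_fit:
        return "ESEM (check sensitivity to rotation method)"
    return "compare further (fit indices alone are not decisive)"
```
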

Recommendations for Future Research
This study might be extended to assessing method effects for positively and negatively worded items ([1,15] Marsh et al., 2010b; Morin et al., 2016) or an overall acquiescence factor formed by adding a unit-weighted factor to all items, when feasible, in correlated-factor and hierarchical models ([91–93] Hofstee et al., 1998; Soto and John, 2017; Ten Berge, 1999). Alternative response metrics with additional options might be substituted for the current five-point scale to assess possible improvements in reliability, validity, and model fit. For example, Vispoel et al. (2019) [94] demonstrated that there were no meaningful differences between ML and WLSMV when eight response options were used with measures of self-concept.
Other areas for future inquiry include exploration of methods for creating appropriate priors within Bayesian analyses; use of promising but underused methods to correct for scale coarseness (e.g., pairwise maximum likelihood; [95] Katsikatsou et al., 2012); the most accurate criteria for determining model fit when using WLSMV and Bayesian estimation ([54,83,84] Garnier-Villarreal and Jorgensen, 2020; Nye and Drasgow, 2011; Xia and Yang, 2019); alternative rotation methods in ESEM analyses ([6,51,67,88] Asparouhov and Muthén, 2009; Guo et al., 2019; Marsh et al., 2013; Xiao et al., 2019); guidelines for the best uses of the bifactor model for theoretical inquiries and practical applications; and simulation studies for determining the most suitable model for various scenarios.

Limitations
While interpreting results from this study, a couple of limitations should be kept in mind. First, analyses relied on only one self-report personality measure collected from a volunteer sample of United States participants; therefore, the results may not apply to more heterogeneous groups. Second, it is important to note that the Bayesian models estimated many additional parameters when cross-loadings were included. The inclusion of cross-loadings necessarily increases model complexity, and it is possible that improvements in model fit could come at the cost of overfitting the data. Hence, future research should investigate the extent to which the inclusion of multiple cross-loadings affects model fit.

Conclusions
Our goal in this study was to demonstrate and systematically evaluate the effects of estimation procedures, scale metrics, and relaxed factor loading constraints within correlated-factor, bifactor, and hierarchical models using a recently developed, comprehensive, and well-constructed inventory of reasonable length that captures current theoretical consensus on the overall domains and underlying facets of personality. Analyses performed on a sample of well over 400,000 respondents from the United States also contributed new insights into the psychometric quality of IPIP-NEO-120 scores in representing constructs from several theoretical frameworks. The techniques demonstrated here are widely applicable to self-report measures in general and may serve as templates for future investigations into multidimensional psychological constructs. To help readers apply these techniques, we provide code in Mplus 8.8 for analyzing all factor models described in this study (see Supplementary Materials). We hope these resources prove helpful in choosing models that best represent constructs in multiple disciplines.

Figure 1.
Figure 1. Examples of correlated-factor, hierarchical, and bifactor measurement models. Note: F = factor; FG = general factor; HO = hierarchical factor. In (a), the correlated-factor model, covariation in the observed indicators is explained by six interrelated factors; in (b), the hierarchical model, covariation in the observed indicators is captured by lower-order factors whose interrelationships are in turn captured by the hierarchical factor; in (c), the bifactor model, covariation in the observed indicators is primarily captured by a general factor, with the specific factors capturing additional covariation not captured by the general factor. For simplicity, residual variances and intercepts are omitted.

Figure 2.
Figure 2. Examples of exploratory SEM (a) correlated-factor, (b) hierarchical, and (c) bifactor models. Note: F: factor; FG: general factor; HO: higher-order factor. For simplicity and illustrative purposes, the figures only contain a subset of cross-loadings (represented with dashed lines).

[57] Johnson (2014) evaluated the convergent validity of the IPIP-NEO-120 in the Eugene-Springfield sample by correlating subscale scores with those for the NEO PI-R ([59] Costa and McCrae, 1992). Correlation coefficients between IPIP-NEO-120 and NEO PI-R domain scores ranged from 0.76 to 0.87 (M = 0.82), and between facet scores from 0.53 to 0.76 (M = 0.66). Lace et al. (2019) [65] also reported concurrent validity coefficients for the IPIP-NEO-120 Anxiety and Depression scores with Anxiety and Depression scores from the Kessler Psychological Distress Scale (K10), with correlations of 0.65 for Anxiety and 0.75 for Depression, respectively.
The likelihood of misfit typically increases as the number of items representing each factor increases ([22–24] Booth and Hughes, 2014; Marsh et al., 2005; Marsh et al., 2010a). Misfit can occur because items typically do not exclusively represent the factors to which they are linked. In EFA, all cross-loadings are freely estimated to indicate potential systematic associations with other constructs, but such cross-loadings are forced to be zero in the basic IC-CFA models discussed here. Such restrictions can also result in inflated correlations among CFA factors, poor discriminant validity, and less accurate structural parameter estimates in misspecified SEMs ([6,14,15] Asparouhov and Muthén, 2009; Marsh et al., 2014; Morin et al., 2016).
Maximum Likelihood vs. Weighted Least Squares
With maximum likelihood (ML) estimation in factor modeling, observed item scores are assumed to fall on a continuous equal-interval scale that follows a multivariate normal distribution ([3,29,30] Beauducel and Herzberg, 2006; Li, 2016; Rhemtulla et al., 2012).

Table 3.
Goodness-of-Fit Statistics for the Factor Models Estimated.

Table 4.
Fit Differences for Estimation Procedures.

Table 5.
Goodness-of-Fit Statistics for the Factor Models Allowing vs. Restricting Weak Off-Target Loadings.

Table 6.
Fit Differences When Allowing Off-Target Loadings.
Note. CFI: comparative fit index; TLI: Tucker-Lewis index; RMSEA: root mean square error of approximation; ML: maximum likelihood; WLSMV: weighted least squares mean and variance adjusted; Diff in the first column: difference between CFA ML and ESEM ML; Diff in the second column: difference between CFA WLSMV and ESEM WLSMV.

Table 7.
Goodness-of-Fit Statistics for the Bayesian Factor Models with Different Priors.

Table 8.
Fit Differences for ESEM WLSMV Models versus Bayesian Models with Informative Priors.