# Evaluating Cluster-Level Factor Models with `lavaan` and Mplus


## Abstract


`lavaan` to reveal how their conclusions were dependent on their study conditions. Methods: We generated data sets from the so-called configural model and the simultaneous shared-and-configural model, both with and without nonzero residual variances at the cluster level. We fitted models to these data sets using different maximum likelihood estimation algorithms. Results: Stapleton and Johnson’s results were highly contingent on their confounded design factors. Convergence rates could be very different across algorithms, depending on whether between-level residual variances were zero in the population or in the fitted model. We discovered a worrying convergence issue with the default settings in Mplus, resulting in solutions that appear to have converged but actually have not. Rejection rates of the normal-theory test statistic were as expected, while rejection rates of the scaled test statistic were seriously inflated in several conditions. Conclusions: The defaults in Mplus carry specific risks that are easily checked but not well advertised. Our results also shine a different light on earlier advice on the use of measurement models for shared factors.

## 1. Introduction

`lavaan` [4], which have different default settings that could have important consequences for convergence problems and the quality of obtained results.

#### 1.1. Different Types of Two-Level Models

#### 1.1.1. Configural Model

#### 1.1.2. Unconstrained Model

#### 1.1.3. Simultaneous Shared-and-Configural Model

#### 1.2. Estimation of Two-Level Models

`lavaan`. We also discuss the estimation of between-level residual variances, which was confounded with model type in Stapleton and Johnson’s [2] simulation study.

#### 1.2.1. Maximum Likelihood Estimation Algorithms

`lavaan` [4] and Mplus [3] use normal-theory ML estimation by default for continuous variables, using the observed information matrix to derive SEs. Although also available in `lavaan`, only Mplus defaults to a χ^{2} statistic and SEs that are robust to nonnormality. The robust χ^{2} statistic provided by `lavaan` and Mplus is asymptotically equivalent to Yuan and Bentler’s $T_{2}^{*}$ statistic [13]. This robust test statistic (with SEs) is requested in `lavaan` with the argument `estimator = "MLR"` (or equivalently, `test = "yuan.bentler.mplus"` and `se = "robust.sem"`; see the `?lavOptions` help page) and in Mplus with the ANALYSIS option ESTIMATOR = MLR [3] (chapter 16).

`lavaan` maximizes the sum of log-likelihoods of the clusters (which Mplus refers to as the observed-data log-likelihood; ALGORITHM = ODLL) using a quasi-Newton (QN) algorithm, which is the same algorithm used for ML estimation with single-level data. The expectation–maximization (EM) algorithm is also available in both `lavaan` and Mplus to obtain ML estimates, although the implementation in `lavaan` is notably slower. The EM algorithm can be requested by passing the argument `optim.method = "em"` to `lavaan`, or with the ANALYSIS option ALGORITHM = EM in Mplus [3] (chapter 16). Mplus also has an accelerated EM algorithm (ALGORITHM = EMA), achieved by switching to QN when EM does not optimize quickly enough (i.e., when relative or absolute changes in log-likelihood do not decrease enough between iterations). Mplus can also switch between a Fisher-scoring (FS) algorithm and EM (with ALGORITHM = FS), but EMA is the default, and neither EMA nor FS is currently available in `lavaan`. The available options and their default settings are listed for `lavaan` and Mplus in Table 1.

`lavaan` is `nlminb` from the stats package, whose options can be set by passing a named list to the `control =` argument (see the `?lavOptions` help page). `lavaan`’s defaults are `list(iter.max = 10000, abs.tol = .Machine$double.eps * 10, rel.tol = 1e-10)`. When using EM, `lavaan` has its own parallel dedicated arguments, shown in the last example on the tutorial page: https://lavaan.ugent.be/tutorial/multilevel.html (accessed on 31 May 2021).

`lavaan` warns that the optimizer did not find a ML solution and “estimates below are most likely unreliable.” Likewise, Mplus output will contain the message: “The model estimation did not terminate normally due to a non-zero derivative of the observed-data loglikelihood,” referring to the first-order condition. Any nonzero element of the gradient indicates that the corresponding parameter estimate is not a ML estimate. But the reverse is not necessarily true: if the gradient consists only of zeros, the solution could be a minimum or a saddle point rather than a maximum. To verify that the solution is a maximum, the Hessian should be negative definite. Because the Hessian is computationally intensive to obtain, this “second-order condition” is rarely checked merely to verify convergence. However, multiplying the Hessian by −1 yields the information matrix, the inverse of which is the asymptotic covariance matrix of the estimated parameters (the diagonal of which contains the squared SEs). Thus, if the information matrix is not positive definite (and so cannot be inverted), a warning is issued that SEs cannot be calculated.
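The first- and second-order conditions can be illustrated on a toy problem. The following is a minimal sketch in Python, not the routine used by `lavaan` or Mplus: it takes a normal sample with parameters (μ, σ²), for which the gradient and Hessian of the log-likelihood are known in closed form, and contrasts it with the textbook saddle surface f(x, y) = x² − y². All function names and the data values are hypothetical.

```python
import math

def normal_loglik_derivatives(data, mu, sigma2):
    """Closed-form gradient and Hessian of the normal log-likelihood in (mu, sigma2)."""
    n = len(data)
    s1 = sum(x - mu for x in data)
    s2 = sum((x - mu) ** 2 for x in data)
    grad = (s1 / sigma2, -n / (2 * sigma2) + s2 / (2 * sigma2 ** 2))
    hess = (
        (-n / sigma2, -s1 / sigma2 ** 2),
        (-s1 / sigma2 ** 2, n / (2 * sigma2 ** 2) - s2 / sigma2 ** 3),
    )
    return grad, hess

def is_negative_definite_2x2(h):
    """Second-order condition for a 2 x 2 Hessian: h11 < 0 and det(H) > 0."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return h[0][0] < 0 and det > 0

def standard_errors_2x2(h):
    """SEs from the inverse of the information matrix (-H); None if not invertible as a covariance."""
    a, b, c, d = -h[0][0], -h[0][1], -h[1][0], -h[1][1]
    det = a * d - b * c
    if not (a > 0 and det > 0):
        return None  # information matrix is not positive definite
    return (math.sqrt(d / det), math.sqrt(a / det))

# Hypothetical sample; its ML estimates are mu = 4 and sigma2 = 2.
data = [2.0, 4.0, 4.0, 6.0]
grad, hess = normal_loglik_derivatives(data, mu=4.0, sigma2=2.0)
first_order_ok = all(abs(g) < 1e-12 for g in grad)
second_order_ok = is_negative_definite_2x2(hess)
ses = standard_errors_2x2(hess)

# Saddle surface f(x, y) = x^2 - y^2: the gradient is zero at the origin,
# but the Hessian is indefinite, so the point is not a maximum.
saddle_hessian = ((2.0, 0.0), (0.0, -2.0))
saddle_is_maximum = is_negative_definite_2x2(saddle_hessian)
saddle_ses = standard_errors_2x2(saddle_hessian)
```

In the first case both conditions hold and the SEs equal the familiar √(σ²/n) and σ²√(2/n). In the saddle case a gradient check alone would pass, which is why a zero gradient by itself does not establish that a maximum was found.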

#### 1.2.2. Between-Level Residual Variances in Two-Level Models

θ_{B}) can therefore be interpreted as differences in intercepts across clusters (measurement bias, also called cluster bias by Jak, Oort and Dolan [14]). Nonzero residual variance at the between level (θ_{B} > 0) means that the cluster-level differences in the indicators are not all explained by cluster-level differences in the common factor. In other words, variables other than what was intended to be measured cause differences in the indicator scores across clusters. In practice, it may not be realistic to expect exactly zero cluster bias for all indicators, similar to how exact invariance of intercepts generally does not hold [17]. That is, cluster invariance may hold only approximately, implying small θ_{B} instead of zero θ_{B}. Moreover, some indicators may be subject to cluster bias while other indicators are not (representing partial invariance [18]).

θ_{B} vary around the population values, so when they are (nearly) zero, estimates can frequently take negative values simply due to sampling error. Thus, if strong factorial invariance across clusters holds even approximately (θ_{B} ≅ 0), then estimating θ_{B} under non-negative constraints may lead to trouble with convergence in samples that would have contained at least one negative variance under unconstrained estimation. By default, `lavaan` does not restrict θ_{B} estimates to be positive when using QN, while Mplus does. The EM algorithm requires θ_{B} > 0 in both packages, because the EM algorithm requires the between-level model-implied covariance matrix to be positive definite. The minimum-variance requirement (set with the ANALYSIS option VARIANCE) must be between 0 and 1, so negative values for θ_{B} are not permitted in Mplus. This requirement may therefore result in nonconvergence for populations with θ_{B} ≅ 0.

θ_{B} ≅ 0 would prevent the ability to compare that model to one with strong factorial invariance across clusters (θ_{B} = 0). In our simulation study, we explicitly crossed these design factors: zero vs. nonzero θ_{B} in the population model and fixed vs. estimated θ_{B} in the fitted model. We focus on conditions where population θ_{B} is exactly zero, representing exact invariance, but we will show some results based on conditions with θ_{B} = 0.01 and θ_{B} = 0.0001 as well. Chen, Bollen, Paxton, Curran and Kirby [19] describe the possible causes of, consequences of, and possible strategies to handle inadmissible solutions in more detail. Negative variance estimates could result from either model misspecification or sampling error. So before fixing negative variance estimates to zero, one should first test the null hypothesis that the parameter is an admissible solution. For example, if the 95% confidence interval for a residual variance includes positive values, then one cannot reject the null hypothesis (using α = 0.05) that the true population value is indeed positive; in this case, if the model fits well and there are no other signs of misspecification, one could conclude that the true parameter is simply close enough to zero that sampling error occasionally yields a negative estimate. For more discussion about negative variance parameters and estimates see [20,21,22].
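The confidence-interval check described above can be sketched as follows. This is a minimal illustration that assumes a symmetric Wald-type interval (estimate ± z × SE); the text does not prescribe a particular interval type, and the function names and numeric values are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, se, level=0.95):
    """Symmetric Wald-type confidence interval (an assumed interval type)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * se, estimate + z * se

def admissible_value_not_rejected(estimate, se, level=0.95):
    """True if the interval for a variance includes positive values."""
    _, upper = wald_interval(estimate, se, level)
    return upper > 0

# Hypothetical negative residual-variance estimates with the same SE.
small_negative = admissible_value_not_rejected(-0.02, 0.03)  # interval reaches above 0
large_negative = admissible_value_not_rejected(-0.10, 0.03)  # interval entirely below 0
```

With these values, the first estimate is compatible with a small positive population variance, whereas the second is not, which would point toward misspecification rather than sampling error.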

#### 1.3. Overview of the Study

## 2. Materials and Methods

`lavaan` [4] (Version 0.6-7) and Mplus [3] (Version 8.5) to fit all models to those data. R syntax to generate and analyze the Monte Carlo results is available from the Open Science Framework: https://osf.io/sdwam/ (accessed on 31 May 2021).

#### 2.1. Data Generating Models

θ_{B} = 0, and a shared model with θ_{B} > 0. As this confounding of factor structure and presence of θ_{B} is not apparent from the article, their results are easily misinterpreted. We generated data from four different population models. These are models with or without the existence of an additional between-level construct, and with or without cluster bias (indicated by θ_{B} > 0).

θ_{B} values were either fixed to zero or chosen to standardize the between-level factor loadings (e.g., 1 − (0.40^{2} + 0.70^{2}) = 0.35 in the model with the shared factor). Stapleton and Johnson only generated data from the shared-and-configural model with θ_{B} = 0 and from the configural-only model with θ_{B} > 0, so these factors were confounded in their study design.
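The standardizing residual variance in the example above is a one-line computation. The sketch below assumes orthogonal between-level factors with unit variances and a total between-level indicator variance of 1, which is what the expression 1 − (0.40² + 0.70²) implies; the function name is hypothetical.

```python
def standardizing_residual_variance(loadings):
    """Residual variance that makes the total variance 1, assuming
    orthogonal factors with unit variances (an assumption of this sketch)."""
    return 1.0 - sum(loading ** 2 for loading in loadings)

# Loadings taken from the example in the text (model with the shared factor).
theta_b_shared = standardizing_residual_variance([0.40, 0.70])
print(round(theta_b_shared, 2))
```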

θ_{B} > 0. In conditions with θ_{B} = 0, item ICCs were 0.32 for the configural model conditions, and item ICCs were 0.39 for the shared model conditions. The ICC of the configural factor was 0.50 in all conditions.

#### 2.2. Sample Size Conditions

#### 2.3. Fitted Models

θ_{B} and fixed θ_{B} = 0. The unconstrained model is a two-level CFA model with one factor at each level and with freely estimated factor loadings at both levels. The factor variances of the unconstrained model are fixed at one at both levels. This model has df = 10 in the condition with freely estimated θ_{B}, and df = 15 in the condition with fixed θ_{B} = 0.

θ_{B}, and df = 19 in the condition with fixed θ_{B} = 0.

θ_{B}, and df = 14 in the condition with fixed θ_{B} = 0.

#### 2.4. Estimation Options

`lavaan` and Mplus. In a follow-up study holding the number of clusters constant at 100, we also compared QN to EMA (the default algorithm in Mplus) and EM in Mplus. A third study used MLR (the default in Mplus) to compare the rejection rates reported by Stapleton and Johnson [2].

#### 2.5. Number of Conditions and Replications

θ_{B} = 0 or θ_{B} > 0 in the population) × 3 (sample size) = 12 data conditions. We generated 1000 datasets per condition. For all 12,000 datasets, 3 (unconstrained, configural, or shared) × 2 (θ_{B} = 0 or θ_{B} > 0) = 6 models were fitted with ML estimation using the QN algorithm in both software packages. In the first follow-up study, the subset of 4000 datasets with 100 clusters was analyzed using ML estimation for the same six models in both software packages, but additionally using EMA and EM in Mplus. The second follow-up study used MLR, again with only QN in `lavaan` but QN and EMA in Mplus.
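The crossing of design factors can be enumerated to confirm the counts given above. This is a small bookkeeping sketch; the labels are hypothetical, and the three sample-size labels are placeholders because only the 100-cluster level is named in this section.

```python
from itertools import product

# Population (data-generating) factors.
data_models = ["configural", "shared-and-configural"]
population_theta_b = ["theta_B = 0", "theta_B > 0"]
sample_sizes = ["size_1", "size_2", "size_3"]  # placeholder labels

# Analysis factors.
fitted_models = ["unconstrained", "configural", "shared"]
fitted_theta_b = ["theta_B = 0", "theta_B > 0"]

replications = 1000

data_conditions = list(product(data_models, population_theta_b, sample_sizes))
analysis_models = list(product(fitted_models, fitted_theta_b))

n_datasets = len(data_conditions) * replications
n_fits_per_package = n_datasets * len(analysis_models)

# Follow-up studies: a single sample-size level (100 clusters).
n_followup_datasets = len(data_models) * len(population_theta_b) * replications
```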

#### 2.6. Expectations Regarding Convergence and Rejection Rates

θ_{B} > 0 was not taken into account (i.e., by fixing θ_{B} = 0 in the analysis), we expected high rejection rates (cells labeled ‘R’ in Table 2). In conditions with an overparameterized model (cells labeled ‘O’ in Table 2), such as when θ_{B} = 0 but was freely estimated, we expected more convergence problems, but nominal rejection rates for the converged cases.

θ_{B} = 0 when data were generated with θ_{B} > 0.

## 3. Results

#### 3.1. Convergence Rates in the Primary Study

`lavaan` only occurred when fitting the shared-and-configural models to datasets. Regardless of θ_{B} in the data-generating model, convergence was consistently >95% when estimating θ_{B}. When θ_{B} = 0 in the population, oddly, convergence problems only occurred in `lavaan` when appropriately fixing θ_{B} = 0, but convergence was still approximately 80% and improved in larger samples, whereas convergence in the same conditions was notably lower in Mplus and varied erratically across sample sizes.

θ_{B} despite θ_{B} = 0 in the population. In contrast, `lavaan` had 100% convergence in the same conditions (except the conditions described in the previous paragraph). In order to evaluate the effects of generating data with exact or approximate cluster invariance, we also evaluated the conditions with 100 clusters while fixing θ_{B} to 0.0001 (the minimum value for a variance parameter in Mplus) or 0.01 (a small but realistic amount of variance). Table A1 shows that the convergence problems when estimating θ_{B} freely persisted in conditions with approximate instead of exact cluster invariance, with 0–0.5% convergence in conditions with generated θ_{B} = 0.0001, and 12.4–43.1% convergence in conditions with generated θ_{B} = 0.01.

θ_{B} = 0 in the population and analysis models). For all conditions in Table 3, `lavaan` either converged more often than Mplus or both packages converged in 100% of samples.

#### 3.2. Rejection Rates in the Primary Study

θ_{B} > 0, rejection rates were as expected in both Mplus and `lavaan`. In fact, the rates are mostly identical in conditions where convergence rates were 100% for both `lavaan` and Mplus, reinforcing the expectation that the two software packages provide the same results when fitting the same model to the same data, using the same estimation routine and calculating the normal-theory χ^{2} test statistic. In the other conditions, differences in convergence rates cause small differences between the results obtained with `lavaan` and Mplus with ML.

θ_{B} = 0 yielded 100% power to reject the model, and freely estimating θ_{B} yielded rejection rates that did not appreciably differ from the nominal 5% and were closer in larger samples. In populations with θ_{B} = 0, however, rejection rates were nearly 0% across conditions and software (except when they could not be calculated in conditions where Mplus did not converge).
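To judge whether an observed rejection rate "appreciably differs" from the nominal 5%, it helps to know how much Monte Carlo error to expect. The sketch below uses a normal approximation to the binomial and assumes that all 1000 replications in a condition converged; with fewer converged replications the band is wider. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def rejection_rate_band(alpha=0.05, n_reps=1000, level=0.95):
    """Approximate range of observed rejection rates when the true rate is alpha."""
    se = math.sqrt(alpha * (1 - alpha) / n_reps)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return alpha - z * se, alpha + z * se

lower, upper = rejection_rate_band()
print(round(lower, 3), round(upper, 3))
```

Under these assumptions, observed rates between roughly 0.036 and 0.064 are compatible with a true rejection rate of 5%, whereas rates near 0% or above 10% are not.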

#### 3.3. Follow-Up Study Comparing ML Algorithms

θ_{B} > 0. When θ_{B} = 0 in the population, EMA converged in all samples, including the conditions that fitted θ_{B} > 0 (for which QN failed in all samples). Convergence rates for EM were consistently zero for fitted models with θ_{B} = 0, regardless of the population model. This implies some counter-intuitive results with EM. For example, fitting the correct configural model with θ_{B} = 0 leads to 0% convergence, while fitting the overparameterized configural model with θ_{B} > 0 (while θ_{B} = 0 in the population) converged in 90.4% of the replications. In addition, EM convergence rates were particularly low in conditions where the shared model with θ_{B} > 0 was fitted to data generated with θ_{B} = 0. For the converged cases, there were no notable differences in rejection rates across the different algorithms.

#### Nonconvergence Anomaly with Mplus

θ_{B} = 0, but θ_{B} is freely estimated in the fitted model. In this condition, EM failed to converge in 98.1% of the replications, QN failed to converge in 100% of the replications, yet EMA converged for each sample. We inspected the TECH8 output, which prints the optimization history for each replication, and found that all 998 converged replications using EMA contained the message: “The optimization algorithm has changed to the em algorithm”. As noted in Section 1.2.1, this occurs when a QN step (used to accelerate convergence with EM) fails to improve the log-likelihood.

χ^{2}(9) = 2.817, p = 0.971. But indeed the optimization history showed that the default EMA algorithm failed after 111 iterations, at which point Mplus switched to EM and appeared to converge after 243 iterations. However, when we explicitly selected ALGORITHM = EM in the Mplus input file, we saw that the MCONVERGENCE criterion of the EM algorithm was not fulfilled, despite having apparently converged when EM was used following the failure of EMA to converge. Close inspection of each iteration in the optimization history for this data set reveals that Iteration 243 of explicitly requested EM had the same log-likelihood as the final (apparently converging) Iteration 243 of EM following failure of EMA; however, when explicitly requested, the EM algorithm continued iterating until the maximum number of m-iterations (500) was reached and eventually failed to converge. When we ran the MONTECARLO analysis on all 1000 datasets in this condition with ALGORITHM = EM, the model did not converge on a solution in any of the data sets.

#### 3.4. Follow-Up Study Comparing ML Algorithms with Robust Corrections

χ^{2} statistic in the 100-cluster conditions using the QN and EMA algorithms in Mplus and QN in `lavaan`. The results obtained with MLR show that the rejection rates of the default test statistic in Mplus can be seriously inflated under certain conditions that we examined in this simulation study, namely, in populations for which there is strong invariance across clusters (θ_{B} = 0). The same inflation is not apparent when using MLR (with QN) in `lavaan`. However, the same inflation is apparent when using QN in Mplus, except in conditions where models did not converge, in which case rejection rates cannot be calculated. Because interpreting these differential results involves quite some detail across software packages, we focus first on how our results compare to the results presented by Stapleton and Johnson [2], and then discuss the differences between software packages.

#### 3.4.1. Comparison with Stapleton and Johnson (2019)

θ_{B} = 0 in the configural model. That is, when the correct model was fitted to data generated under the configural model with no residual variance, rejection rates were 11%. Also, when fitting the overparameterized unconstrained model to the same data, rejection rates were as highly inflated as Stapleton and Johnson reported, but only when the model was additionally overparameterized by freely estimating θ_{B}. When appropriately fixing θ_{B} = 0, the rejection rates were not nearly as inflated (i.e., 13% rather than 51%). Furthermore, the same pattern of results just described (for fitting the unconstrained model to configural-model data) can be seen not only when fitting the configural or shared models to the same data, but also when fitting any model to data from populations with a shared construct (but again, only when population θ_{B} = 0).

χ^{2} test criterion was lowered,” perhaps forgetting that the χ^{2} test statistic itself would also be lower in this case (on average, by as much as the df decrease). Table 6 shows that the unconstrained model’s rejection rates were not substantially larger than α = 5% when appropriately estimating θ_{B} > 0, matching Stapleton and Johnson’s results when fitting an unconstrained model to data from a population with a shared construct. Note that this is also consistent with our prediction that even the configural model (which is similar to but more constrained than the unconstrained model) should have rejection rates near the α level. When fitting the unconstrained model to data from a population without a shared construct, Stapleton and Johnson’s high rejection rates were instead due to unnecessarily fixing θ_{B} = 0.

θ_{B} = 0 when the population θ_{B} > 0. Appropriately estimating θ_{B} yielded 5% rejection rates, supporting our claim that the shared and configural factors would be confounded due to proportionally equivalent loadings across indicators. All in all, Stapleton and Johnson’s results are not generalizable because they did not hold constant whether θ_{B} = 0 either in the population or in the fitted model. Note that all of these inflated error rates occurred only when using the Mplus default setting (i.e., a scaled test statistic) because the same patterns were not found in Table 5. We will elaborate more on the inflated error rates found with the scaled test statistic in the discussion.

#### 3.4.2. Comparison of MLR Results with `lavaan` and Mplus

`lavaan` with MLR in Table 6, six conditions stand out where models were seriously misspecified because population θ_{B} > 0 but fitted θ_{B} = 0. In these conditions, `lavaan` did not provide a test statistic. Closer inspection of these results indicated that the scaled test statistic was actually not defined due to a negative trace involved in calculating the scaling factor (i.e., the trace of **UΓ** [23]). Users can obtain this quantity from `lavaan` using the function `lavInspect()`, with the second argument as `"UGamma"`, as well as the **U** or **Γ** matrix separately using `"UfromUGamma"` or `"gamma"`. Naturally, the same issue occurs for Mplus. However, instead of providing a warning, Mplus reports the unscaled test statistic and indicates ‘Undefined’ for the scaling correction factor. The MONTECARLO output of Mplus does not contain any information pointing to the scaled test statistic being undefined, so we only discovered this was happening because of our comparison with the results reported by `lavaan`. Given that the models are severely misspecified in these conditions, the uncorrected test statistics are very high and will not lead to wrong conclusions in practice. The second column in Table A2 indicates the number of samples per condition for which `lavaan` indicated that the test statistic was not available.
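Why a negative trace leaves the scaled statistic undefined can be seen from the general form of a scaling correction. The sketch below is schematic and is not the code of either package: it assumes the scaling factor is c = tr(**UΓ**)/df and that the scaled statistic is T/c, and it returns no value when c is not positive. The function name and the numeric inputs are hypothetical.

```python
def scaled_statistic(t_unscaled, trace_u_gamma, df):
    """Return (scaled statistic, scaling factor), or (None, None) if undefined.

    Assumes c = trace(U Gamma) / df and T_scaled = T / c (schematic form).
    """
    if df <= 0:
        raise ValueError("df must be positive")
    c = trace_u_gamma / df
    if c <= 0:
        # A non-positive scaling factor would flip the sign of the statistic
        # or divide by zero, so no scaled statistic is defined.
        return None, None
    return t_unscaled / c, c

# Hypothetical inputs: a well-defined case and a negative-trace case.
defined_case = scaled_statistic(30.0, 15.0, 10)
undefined_case = scaled_statistic(30.0, -4.0, 10)
```

In the undefined case, the behavior described above corresponds to `lavaan` reporting no test statistic and Mplus falling back to the unscaled statistic with the scaling correction factor marked as ‘Undefined’.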

θ_{B} > 0 and in the fitted model θ_{B} > 0, the results obtained with `lavaan` (with QN) and Mplus were identical. For population conditions with θ_{B} = 0 and fitted θ_{B} = 0, the rejection rates obtained with Mplus with QN are somewhat larger than the rejection rates obtained with `lavaan`. For example, the rejection rate was 0.004 in `lavaan` and 0.111 in Mplus when fitting the correct configural model with θ_{B} = 0. Since the rejection rates did not differ across `lavaan` and Mplus with QN or EMA for the uncorrected test statistic (reported in Table 5), the difference in results must be rooted in how the scaled test statistics are calculated. In this condition there were 3 samples for which the scaling correction factor was not defined, implying that the results for Mplus are partly based on unscaled test statistics. This may explain a small part of the difference across packages. Another source of differences may be found in how the packages proceed when the (augmented) observed information matrix is near-singular. Near-singularity of the observed information matrix often happens, and usually this is not a reason for concern. Neither `lavaan` nor Mplus prints a warning when (equality or inequality) constraints are part of the model, and the two programs probably use different approaches to handle these near-singular cases. While `lavaan` uses a generalized inverse, the solution of Mplus is less clear. In some cases Mplus gives the warning: “An adjustment to the estimation of the information matrix has been made”. The last column in Table A2 shows the number of replications per condition for which Mplus provided this warning.

θ_{B} = 0 and in the fitted model θ_{B} > 0, `lavaan` showed low rejection rates as expected. The 100% nonconvergence of Mplus with QN prevents comparison of results across software packages using the same QN algorithm. Mplus with EMA, however, resulted in severely inflated Type I error rates, ranging from 0.365 to 0.515. In Table A3 one can see that the rejection rates are also inflated in conditions where θ_{B} is freely estimated while population θ_{B} is not exactly zero, but 0.0001 or 0.01, although the inflation is less severe (but still around 12%) in the conditions with θ_{B} = 0.01.

## 4. Discussion

#### 4.1. Summary of the Results

`lavaan` either converged more often than Mplus or both packages converged in 100% of samples. Mplus never converged in conditions with population θ_{B} = 0 but fitted θ_{B} > 0. Rejection rates of the normal-theory χ^{2} test statistic were identical across packages. Our comparison of ML algorithms in Mplus showed that using EM did not converge in any condition for which fitted θ_{B} = 0. With the default Mplus settings for two-level models (MLR + EMA), Mplus often switches to the EM algorithm. When this switch is made, Mplus ignores one of the main convergence criteria (i.e., whether the algorithm in fact converged on a ML estimate, as revealed by the first derivative), meaning that the obtained results may be based on a non-converged solution. Users are not notified that convergence criteria are ignored, nor are they notified of this switch being made (unless they specifically request and pay attention to the TECH8 output, which seems unlikely to be common practice). In our second follow-up study, we found seriously inflated scaled test statistics in Mplus in populations with θ_{B} = 0.

θ_{B} = 0 either in the population or in the fitted model. In all conditions in which they reported high rejection rates, this was the result of incorrectly fixing θ_{B} = 0, or of inflated scaled test statistics obtained by using the default settings in Mplus.

#### 4.2. Recommendation for Practice

`lavaan` does not carry the same risks, given that the convergence rates were good, rejection rates appropriate, and one can be sure that there are no hidden changes of convergence checks or test statistics. Therefore, when Mplus users find evidence that their apparently converged model might not have actually converged on a maximum, we recommend fitting their model (when possible) with `lavaan`, whose defaults do not cause the same rates of convergence problems, nor are `lavaan`’s test statistics (and Type I error rates) inflated in the very conditions when Mplus results are dubious (Table 6; see also Appendix A).

χ^{2} statistic can be seriously inflated. This is in line with earlier findings [24,25]. Moreover, there exist many computational options for (scaled) test statistics, and more research is needed to evaluate which of those work best in which conditions. Also, the default implementations differ across packages. Savalei and Rosseel [23] provide an overview of computational variations and how to apply them using `lavaan`.

#### 4.3. Future Research

χ^{2} as implemented in Mplus over-rejects models when the sample size is not large enough, especially with large models [25]. In line with earlier findings based on two-level models [14,26], our simulation study shows that over-rejection by this test statistic is exacerbated when between-level residual variances are zero (or small, see Table A3) in the population. Because θ_{B} = 0 is to be expected when strong factorial invariance across clusters holds (i.e., when there is no cluster bias, even approximately), it is reasonable to assume that θ_{B} ≅ 0, at least for some of the indicators in a substantial part of research settings. In our study, we did not evaluate conditions with partial cluster invariance. Future research may investigate the performance of the scaled χ^{2} as implemented in Mplus when population residual variances are zero for some indicators but nonzero for others.

## Author Contributions

## Funding

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

**Table A1.** Convergence rates with 100 clusters for ML estimation with QN in Mplus with population θ_{B} being zero, 0.0001, and 0.01.

| Data Model | Fitted Model | Fitted θ_{B} | Population θ_{B} = 0 | Population θ_{B} = 0.0001 | Population θ_{B} = 0.01 |
|---|---|---|---|---|---|
| Config | Uncon | θ_{B} > 0 | 0 | 0.005 | 0.401 |
| Config | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 |
| Config | Config | θ_{B} > 0 | 0.001 | 0.004 | 0.431 |
| Config | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 |
| Config | Shared | θ_{B} > 0 | 0 | 0.001 | 0.245 |
| Config | Shared | θ_{B} = 0 | 0.227 | 0.237 | 0.722 |
| Shared | Uncon | θ_{B} > 0 | 0 | 0.005 | 0.200 |
| Shared | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 |
| Shared | Config | θ_{B} > 0 | 0 | 0.002 | 0.383 |
| Shared | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 |
| Shared | Shared | θ_{B} > 0 | 0 | 0 | 0.124 |
| Shared | Shared | θ_{B} = 0 | 0.215 | 0.446 | 0.952 |

θ_{B} = between-level residual variances.

**Table A2.** Frequency (in 1000 samples) of Mplus switch from EMA to EM, undefined scaling corrections and saddle point warnings.

| Data Model | Population θ_{B} | Fitted Model | Fitted θ_{B} | Switch EMA to EM | Scaling Factor Undefined | Saddle Point |
|---|---|---|---|---|---|---|
| Config | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0 | 0 | 0 |
| Config | θ_{B} > 0 | Uncon | θ_{B} = 0 | 0 | 1000 | 0 |
| Config | θ_{B} > 0 | Config | θ_{B} > 0 | 0 | 0 | 0 |
| Config | θ_{B} > 0 | Config | θ_{B} = 0 | 0 | 1000 | 0 |
| Config | θ_{B} > 0 | Shared | θ_{B} > 0 | 0 | 9 | 11 |
| Config | θ_{B} > 0 | Shared | θ_{B} = 0 | 0 | 1000 | 0 |
| Config | θ_{B} = 0 | Uncon | θ_{B} > 0 | 985 | 4 | 0 |
| Config | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0 | 3 | 0 |
| Config | θ_{B} = 0 | Config | θ_{B} > 0 | 981 | 3 | 0 |
| Config | θ_{B} = 0 | Config | θ_{B} = 0 | 0 | 3 | 0 |
| Config | θ_{B} = 0 | Shared | θ_{B} > 0 | 998 | 11 | 0 |
| Config | θ_{B} = 0 | Shared | θ_{B} = 0 | 518 | 7 | 0 |
| Shared | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0 | 0 | 0 |
| Shared | θ_{B} > 0 | Uncon | θ_{B} = 0 | 0 | 1000 | 0 |
| Shared | θ_{B} > 0 | Config | θ_{B} > 0 | 0 | 0 | 0 |
| Shared | θ_{B} > 0 | Config | θ_{B} = 0 | 0 | 1000 | 0 |
| Shared | θ_{B} > 0 | Shared | θ_{B} > 0 | 3 | 10 | 13 |
| Shared | θ_{B} > 0 | Shared | θ_{B} = 0 | 0 | 1000 | 0 |
| Shared | θ_{B} = 0 | Uncon | θ_{B} > 0 | 987 | 2 | 0 |
| Shared | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0 | 2 | 0 |
| Shared | θ_{B} = 0 | Config | θ_{B} > 0 | 984 | 2 | 0 |
| Shared | θ_{B} = 0 | Config | θ_{B} = 0 | 0 | 1 | 0 |
| Shared | θ_{B} = 0 | Shared | θ_{B} > 0 | 998 | 10 | 407 |
| Shared | θ_{B} = 0 | Shared | θ_{B} = 0 | 518 | 11 | 0 |

θ_{B} = between-level residual variances. Saddle point = Mplus with EMA warns that “The model estimation has reached a saddle point or a point where the observed and the expected information matrices do not match. An adjustment to the estimation of the information matrix has been made”.

**Table A3.** Rejection rates with 100 clusters for MLR estimation with EMA in Mplus with population θ_{B} being zero, 0.0001, and 0.01.

| Data Model | Fitted Model | Fitted θ_{B} | Population θ_{B} = 0 | Population θ_{B} = 0.0001 | Population θ_{B} = 0.01 |
|---|---|---|---|---|---|
| Config | Uncon | θ_{B} > 0 | 0.512 | 0.522 | 0.124 |
| Config | Uncon | θ_{B} = 0 | 0.131 | 0.127 | 0.940 |
| Config | Config | θ_{B} > 0 | 0.365 | 0.338 | 0.118 |
| Config | Config | θ_{B} = 0 | 0.111 | 0.108 | 0.905 |
| Config | Shared | θ_{B} > 0 | 0.445 | 0.450 | 0.123 |
| Config | Shared | θ_{B} = 0 | 0.190 | 0.161 | 0.719 |
| Shared | Uncon | θ_{B} > 0 | 0.515 | 0.498 | 0.119 |
| Shared | Uncon | θ_{B} = 0 | 0.122 | 0.135 | 0.927 |
| Shared | Config | θ_{B} > 0 | 0.357 | 0.353 | 0.103 |
| Shared | Config | θ_{B} = 0 | 0.104 | 0.112 | 0.899 |
| Shared | Shared | θ_{B} > 0 | 0.430 | 0.403 | 0.127 |
| Shared | Shared | θ_{B} = 0 | 0.170 | 0.160 | 0.713 |

θ_{B} = between-level residual variances. Reported convergence rates for these conditions were all higher than 0.973, but across θ_{B} conditions, the frequencies of Mplus switching to the EM algorithm (and ignoring one of the convergence criteria) are very similar to the frequencies reported in the first column of Table A2, so convergence status is effectively unknown.

## References

- Stapleton, L.M.; Yang, J.S.; Hancock, G.R. Construct Meaning in Multilevel Settings. J. Educ. Behav. Stat. **2016**, 41, 481–520.
- Stapleton, L.M.; Johnson, T.L. Models to Examine the Validity of Cluster-Level Factor Structure Using Individual-Level Data. Adv. Methods Pract. Psychol. Sci. **2019**, 2, 312–329.
- Muthén, B.O.; Muthén, L.K. Mplus User’s Guide, 8th ed.; Muthén & Muthén: Los Angeles, CA, USA, 1998.
- Rosseel, Y. `lavaan`: An R Package for Structural Equation Modeling. J. Stat. Softw. **2012**, 48, 1–36.
- van Schaik, S.D.M.; Leseman, P.P.M.; de Haan, M. Using a Group-Centered Approach to Observe Interactions in Early Childhood Education. Child Dev. **2018**, 89, 897–913.
- Asparouhov, T.; Muthén, B. General Random Effect Latent Variable Modeling: Random Subjects, Items, Contexts, and Parameters. In Proceedings of the Annual Meeting of the National Council on Measurement in Education, Vancouver, BC, Canada, 13–17 April 2012.
- Hox, J.J.; Moerbeek, M.; van de Schoot, R. Multilevel Analysis: Techniques and Applications, 3rd ed.; Routledge: London, UK, 2017; ISBN 978-1-317-30868-3.
- Jak, S. Cross-Level Invariance in Multilevel Factor Models. Struct. Equ. Model. Multidiscip. J. **2019**, 26, 607–622.
- Kim, E.S.; Dedrick, R.F.; Cao, C.; Ferron, J.M. Multilevel Factor Analysis: Reporting Guidelines and a Review of Reporting Practices. Multivar. Behav. Res. **2016**, 51, 881–898.
- Mehta, P.D.; Neale, M.C. People Are Variables Too: Multilevel Structural Equations Modeling. Psychol. Methods **2005**, 10, 259–284.
- Rabe-Hesketh, S.; Skrondal, A.; Pickles, A. Generalized Multilevel Structural Equation Modeling. Psychometrika **2004**, 69, 167–190.
- Muthén, B.O. Mean and Covariance Structure Analysis of Hierarchical Data. UCLA Statistics Series #62, August 1990. Available online: https://escholarship.org/uc/item/1vp6w4sr (accessed on 31 May 2021).
- Yuan, K.-H.; Bentler, P.M. 5. Three Likelihood-Based Methods for Mean and Covariance Structure Analysis with Nonnormal Missing Data. Sociol. Methodol. **2000**, 30, 165–200.
- Jak, S.; Oort, F.J.; Dolan, C.V. A Test for Cluster Bias: Detecting Violations of Measurement Invariance Across Clusters in Multilevel Data. Struct. Equ. Model. Multidiscip. J. **2013**, 20, 265–282.
- Muthén, B.; Asparouhov, T. Recent Methods for the Study of Measurement Invariance With Many Groups: Alignment and Random Effects. Sociol. Methods Res. **2018**, 47, 637–664.
- Jak, S.; Jorgensen, T.D. Relating Measurement Invariance, Cross-Level Invariance, and Multilevel Reliability. Front. Psychol. **2017**, 8.
- Muthén, B.; Asparouhov, T. Bayesian SEM: A More Flexible Representation of Substantive Theory. Psychol. Methods **2012**, 313–335.
- Byrne, B.M.; Shavelson, R.J.; Muthén, B. Testing for the Equivalence of Factor Covariance and Mean Structures: The Issue of Partial Measurement Invariance. Psychol. Bull. **1989**, 105, 456–466.
- Furlow, C.F.; Beretvas, S.N. Meta-Analytic Methods of Pooling Correlation Matrices for Structural Equation Modeling Under Different Patterns of Missing Data. Psychol. Methods **2005**, 10, 227–254.
- Lüdtke, O.; Marsh, H.W.; Robitzsch, A.; Trautwein, U. A 2 × 2 Taxonomy of Multilevel Latent Contextual Models: Accuracy–Bias Trade-Offs in Full and Partial Error Correction Models. Psychol. Methods **2011**, 16, 444–467.
- Zitzmann, S.; Lüdtke, O.; Robitzsch, A.; Marsh, H.W. A Bayesian Approach for Estimating Multilevel Latent Contextual Models. Struct. Equ. Model. Multidiscip. J. **2016**, 23, 661–679.
- Depaoli, S.; Clifton, J.P. A Bayesian Approach to Multilevel Structural Equation Modeling With Continuous and Dichotomous Outcomes. Struct. Equ. Model. Multidiscip. J. **2015**, 22, 327–351.
- Savalei, V.; Rosseel, Y. Computational Options for Standard Errors and Test Statistics with Incomplete Normal and Nonnormal Data in SEM. **2021**.
- Savalei, V. Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data. Psychol. Methods **2010**, 15, 352–367.
- Maydeu-Olivares, A. Maximum Likelihood Estimation of Structural Equation Models for Continuous Data: Standard Errors and Goodness of Fit. Struct. Equ. Model. Multidiscip. J. **2017**, 24, 383–394.
- Jak, S.; Oort, F.J.; Dolan, C.V. Using Two-Level Factor Analysis to Test for Cluster Bias in Ordinal Data. Multivar. Behav. Res. **2014**, 49, 544–553.
- Holtmann, J.; Koch, T.; Lochner, K.; Eid, M. A Comparison of ML, WLSMV, and Bayesian Methods for Multilevel Structural Equation Models in Small Samples: A Simulation Study. Multivar. Behav. Res. **2016**, 51, 661–680.
- Lüdtke, O.; Robitzsch, A.; Wagner, J. More Stable Estimation of the STARTS Model: A Bayesian Approach Using Markov Chain Monte Carlo Techniques. Psychol. Methods **2018**, 23, 570–593.
- Devlieger, I.; Rosseel, Y. Multilevel Factor Score Regression. Multivar. Behav. Res. **2020**, 55, 600–624.
- Zitzmann, S.; Helm, C. Multilevel Analysis of Mediation, Moderation, and Nonlinear Effects in Small Samples, Using Expected a Posteriori Estimates of Factor Scores. Struct. Equ. Model. Multidiscip. J. **2021**, 1–18.

Software Package | Require θ_{B} ≥ 0? | QN | EM | EMA | FS | ||||
---|---|---|---|---|---|---|---|---|---|

ML | MLR | ML | MLR | ML | MLR | ML | MLR | ||

lavaan | D | ✓ | ✓ | ✓ | |||||

Mplus | ✓ (D = 0.1^{4}) | ✓ | ✓ | ✓ | ✓ | ✓ | D | ✓ | ✓ |

θ_{B} = between-level residual variances. QN = quasi-Newton algorithm. EM = expectation–maximization algorithm. EMA = accelerated EM algorithm. FS = Fisher scoring algorithm. ML(R) = (robust) maximum likelihood. D = default setting.

**Table 2.** Overview of the four data generation conditions (rows), and the six fitted models (columns).

Data-Generating Model | Data θ_{B} | Fitted: Uncon, θ_{B} > 0 | Fitted: Uncon, θ_{B} = 0 | Fitted: Conf, θ_{B} > 0 | Fitted: Conf, θ_{B} = 0 | Fitted: Shared, θ_{B} > 0 | Fitted: Shared, θ_{B} = 0 |
---|---|---|---|---|---|---|---|
Configural | θ_{B} > 0 | O | R | T | R | O | R |
Configural | θ_{B} = 0 | O | O | O | T | O ^{a} | O |
Shared | θ_{B} > 0 | S | R | S | R | T ^{a} | R |
Shared | θ_{B} = 0 | S | S | S | S | O | T |

θ_{B} = between-level residual variances. T = True model fitted, R = θ_{B} not taken into account, O = Overparameterized model, S = Shared factor not explicitly modeled. Grey cells represent conditions evaluated by Stapleton and Johnson [2].

^{a} In contrast to Stapleton and Johnson [2], we did not fix the ICC of the configural factor.

**Table 3.** Convergence rates for the six models in the four data conditions and three sample size conditions.

Data Model | Data θ_{B} | Fitted Model | Fitted θ_{B} | lavaan (50 clusters) | Mplus (50 clusters) | lavaan (100 clusters) | Mplus (100 clusters) | lavaan (200 clusters) | Mplus (200 clusters) |
---|---|---|---|---|---|---|---|---|---|
Config | θ_{B} > 0 | Uncon | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Config | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.969 | 0.681 | 0.998 | 0.793 | 1.000 | 0.938 |
Config | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} = 0 | Uncon | θ_{B} > 0 | 1.000 | 0.002 | 1.000 | 0 | 1.000 | 0 |
Config | θ_{B} = 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} = 0 | Config | θ_{B} > 0 | 1.000 | 0.002 | 1.000 | 0.001 | 1.000 | 0.003 |
Config | θ_{B} = 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.955 | 0 | 0.970 | 0 | 0.989 | 0 |
Config | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.787 | 0.199 | 0.863 | 0.227 | 0.968 | 0.218 |
Shared | θ_{B} > 0 | Uncon | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Config | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.957 | 0.699 | 0.991 | 0.772 | 1.000 | 0.918 |
Shared | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} = 0 | Uncon | θ_{B} > 0 | 1.000 | 0 | 1.000 | 0 | 1.000 | 0 |
Shared | θ_{B} = 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} = 0 | Config | θ_{B} > 0 | 1.000 | 0 | 1.000 | 0 | 1.000 | 0.003 |
Shared | θ_{B} = 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.960 | 0 | 0.970 | 0 | 0.987 | 0 |
Shared | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.821 | 0.358 | 0.868 | 0.215 | 0.938 | 0.254 |

θ_{B} = between-level residual variances. Italicized = conditions in which the true model was fitted.

Data Model | Data θ_{B} | Fitted Model | Fitted θ_{B} | lavaan (50 clusters) | Mplus (50 clusters) | lavaan (100 clusters) | Mplus (100 clusters) | lavaan (200 clusters) | Mplus (200 clusters) |
---|---|---|---|---|---|---|---|---|---|
Config | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0.072 | 0.072 | 0.062 | 0.062 | 0.050 | 0.050 |
Config | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Config | θ_{B} > 0 | 0.072 | 0.072 | 0.057 | 0.057 | 0.058 | 0.058 |
Config | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.041 | 0.034 | 0.037 | 0.026 | 0.056 | 0.046 |
Config | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Config | θ_{B} = 0 | Uncon | θ_{B} > 0 | 0.003 | - | 0.006 | - | 0.006 | - |
Config | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0.001 | 0.001 | 0.001 | 0.001 | 0.006 | 0.004 |
Config | θ_{B} = 0 | Config | θ_{B} > 0 | 0.007 | - | 0.005 | - | 0.01 | - |
Config | θ_{B} = 0 | Config | θ_{B} = 0 | 0.003 | 0.003 | 0.003 | 0.003 | 0.009 | 0.009 |
Config | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.002 | - | 0.003 | - | 0.008 | - |
Config | θ_{B} = 0 | Shared | θ_{B} = 0 | 0 | 0 | 0 | 0 | 0.002 | 0.005 |
Shared | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0.071 | 0.071 | 0.044 | 0.044 | 0.044 | 0.044 |
Shared | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Config | θ_{B} > 0 | 0.068 | 0.068 | 0.049 | 0.049 | 0.050 | 0.050 |
Shared | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.024 | 0.016 | 0.038 | 0.026 | 0.058 | 0.051 |
Shared | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} = 0 | Uncon | θ_{B} > 0 | 0.002 | - | 0.004 | - | 0.003 | - |
Shared | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0.003 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 |
Shared | θ_{B} = 0 | Config | θ_{B} > 0 | 0.004 | - | 0.006 | - | 0.005 | - |
Shared | θ_{B} = 0 | Config | θ_{B} = 0 | 0.005 | 0.004 | 0.004 | 0.004 | 0.001 | 0.001 |
Shared | θ_{B} = 0 | Shared | θ_{B} > 0 | 0 | - | 0.003 | - | 0.004 | - |
Shared | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.001 | 0 | 0 | 0 | 0.001 | 0 |

θ_{B} = between-level residual variances. Italicized = conditions in which the true model was fitted.

Data Model | Data θ_{B} | Fitted Model | Fitted θ_{B} | Convergence: lavaan (QN) | Convergence: Mplus (QN) | Convergence: Mplus (EMA) | Convergence: Mplus (EM) | Rejection: lavaan (QN) | Rejection: Mplus (QN) | Rejection: Mplus (EMA) | Rejection: Mplus (EM) |
---|---|---|---|---|---|---|---|---|---|---|---|
Config | θ_{B} > 0 | Uncon | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 0.062 | 0.062 | 0.062 | 0.062 |
Config | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Config | θ_{B} > 0 | Config | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 0.057 | 0.057 | 0.057 | 0.057 |
Config | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Config | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.998 | 0.793 | 0.741 | 0.730 | 0.037 | 0.026 | 0.022 | 0.025 |
Config | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Config | θ_{B} = 0 | Uncon | θ_{B} > 0 | 1.000 | 0 | 1.000 | 0.897 | 0.006 | - | 0.009 | 0.008 |
Config | θ_{B} = 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 0.001 | 0.001 | 0.001 | - |
Config | θ_{B} = 0 | Config | θ_{B} > 0 | 1.000 | 0.001 | 1.000 | 0.904 | 0.005 | - | 0.009 | 0.009 |
Config | θ_{B} = 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 0.003 | 0.003 | 0.003 | - |
Config | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.970 | 0 | 1.000 | 0.028 | 0.003 | - | 0.007 | - |
Config | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.863 | 0.227 | 1.000 | 0 | 0 | 0 | 0 | - |
Shared | θ_{B} > 0 | Uncon | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 0.044 | 0.044 | 0.044 | 0.044 |
Shared | θ_{B} > 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Shared | θ_{B} > 0 | Config | θ_{B} > 0 | 1.000 | 1.000 | 1.000 | 1.000 | 0.049 | 0.049 | 0.049 | 0.049 |
Shared | θ_{B} > 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Shared | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.991 | 0.772 | 0.667 | 0.659 | 0.038 | 0.026 | 0.022 | 0.023 |
Shared | θ_{B} > 0 | Shared | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 1.000 | 1.000 | 1.000 | - |
Shared | θ_{B} = 0 | Uncon | θ_{B} > 0 | 1.000 | 0 | 1.000 | 0.907 | 0.004 | - | 0.012 | 0.009 |
Shared | θ_{B} = 0 | Uncon | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 0.003 | 0.003 | 0.003 | - |
Shared | θ_{B} = 0 | Config | θ_{B} > 0 | 1.000 | 0 | 1.000 | 0.909 | 0.006 | - | 0.012 | 0.009 |
Shared | θ_{B} = 0 | Config | θ_{B} = 0 | 1.000 | 1.000 | 1.000 | 0 | 0.004 | 0.004 | 0.004 | - |
Shared | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.970 | 0 | 0.998 | 0.019 | 0.003 | - | 0.010 | - |
Shared | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.868 | 0.215 | 1.000 | 0 | 0 | 0 | 0.003 | - |

θ_{B} = between-level residual variances. QN = quasi-Newton algorithm. EM = expectation–maximization algorithm. EMA = accelerated EM algorithm. ML(R) = (robust) maximum likelihood. Italicized = conditions in which the true model was fitted.

**Table 6.** Rejection rates of scaled χ^{2} statistic (MLR) with 100 clusters across estimation algorithms.

Data Model | Data θ_{B} | Fitted Model | Fitted θ_{B} | lavaan (QN) | Mplus (QN) | Mplus (EMA) | S&J |
---|---|---|---|---|---|---|---|
Config | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0.070 | 0.070 | 0.070 | |
Config | θ_{B} > 0 | Uncon | θ_{B} = 0 | NA | 1.000 | 1.000 | |
Config | θ_{B} > 0 | Config | θ_{B} > 0 | 0.065 | 0.065 | 0.065 | |
Config | θ_{B} > 0 | Config | θ_{B} = 0 | NA | 1.000 | 1.000 | |
Config | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.041 | 0.049 | 0.049 | |
Config | θ_{B} > 0 | Shared | θ_{B} = 0 | NA | 1.000 | 1.000 | |
Config | θ_{B} = 0 | Uncon | θ_{B} > 0 | 0.006 | - | 0.512 | 0.54 |
Config | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0.004 | 0.131 | 0.131 | |
Config | θ_{B} = 0 | Config | θ_{B} > 0 | 0.006 | - | 0.365 | |
Config | θ_{B} = 0 | Config | θ_{B} = 0 | 0.004 | 0.111 | 0.111 | 0.11 |
Config | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.014 | - | 0.445 | - |
Config | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.011 | 0.132 | 0.190 | |
Shared | θ_{B} > 0 | Uncon | θ_{B} > 0 | 0.052 | 0.052 | 0.052 | 0.09 |
Shared | θ_{B} > 0 | Uncon | θ_{B} = 0 | NA | 1.000 | 1.000 | |
Shared | θ_{B} > 0 | Config | θ_{B} > 0 | 0.058 | 0.058 | 0.058 | |
Shared | θ_{B} > 0 | Config | θ_{B} = 0 | NA | 1.000 | 1.000 | 1.000 |
Shared | θ_{B} > 0 | Shared | θ_{B} > 0 | 0.043 | 0.040 | 0.037 | 0.09 |
Shared | θ_{B} > 0 | Shared | θ_{B} = 0 | NA | 1.000 | 1.000 | |
Shared | θ_{B} = 0 | Uncon | θ_{B} > 0 | 0.005 | - | 0.515 | |
Shared | θ_{B} = 0 | Uncon | θ_{B} = 0 | 0.007 | 0.122 | 0.122 | |
Shared | θ_{B} = 0 | Config | θ_{B} > 0 | 0.006 | - | 0.357 | |
Shared | θ_{B} = 0 | Config | θ_{B} = 0 | 0.008 | 0.104 | 0.104 | |
Shared | θ_{B} = 0 | Shared | θ_{B} > 0 | 0.006 | - | 0.430 | |
Shared | θ_{B} = 0 | Shared | θ_{B} = 0 | 0.018 | 0.126 | 0.170 | |

θ_{B} = between-level residual variances. QN = quasi-Newton algorithm. EM = expectation–maximization algorithm. EMA = accelerated EM algorithm. MLR = robust maximum likelihood. S&J = Rejection rates reported by Stapleton and Johnson [2]. Italicized = conditions in which the true model was fitted. NA = Test statistic not available.
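As a rough aid for reading rejection rates such as those in Table 6, the following sketch computes the range of empirical rejection rates that Monte Carlo sampling error alone would produce when the true rejection rate equals the nominal α = 0.05. The number of replications (`n_reps = 1000`) is an assumption made for illustration and is not taken from this section; the function names are hypothetical, and the normal approximation to the binomial is a generic simplification rather than the procedure used in the study.

```python
import math


def mc_rejection_interval(alpha=0.05, n_reps=1000, z=1.96):
    """Approximate 95% range of empirical rejection rates for a correct test.

    alpha  : nominal Type I error rate
    n_reps : number of converged replications (hypothetical value)
    z      : standard-normal quantile (1.96 for a two-sided 95% range)
    """
    se = math.sqrt(alpha * (1 - alpha) / n_reps)  # binomial standard error
    return alpha - z * se, alpha + z * se


def within_sampling_error(rate, alpha=0.05, n_reps=1000):
    """Return True if an observed rejection rate lies inside the range."""
    lower, upper = mc_rejection_interval(alpha, n_reps)
    return lower <= rate <= upper


if __name__ == "__main__":
    lower, upper = mc_rejection_interval()
    print(f"Expected range at alpha = 0.05: [{lower:.3f}, {upper:.3f}]")
    # Two rates taken from the Mplus (EMA) column of Table 6
    for rate in (0.049, 0.512):
        print(rate, within_sampling_error(rate))
```

Because the range narrows with more replications and widens when fewer replications converge, `n_reps` should be set to the number of converged solutions in the condition being inspected.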

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Jak, S.; Jorgensen, T.D.; Rosseel, Y.
Evaluating Cluster-Level Factor Models with `lavaan` and M*plus*. *Psych* **2021**, *3*, 134-152.
https://doi.org/10.3390/psych3020012