Please note that, as of 22 March 2024, Psych has been renamed to Psychology International.

L0 and Lp Loss Functions in Model-Robust Estimation of Structural Equation Models

by Alexander Robitzsch 1,2
1 IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2 Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Psych 2023, 5(4), 1122-1139; https://doi.org/10.3390/psych5040075
Submission received: 26 August 2023 / Revised: 16 October 2023 / Accepted: 19 October 2023 / Published: 20 October 2023

Abstract: The Lp loss function has been used for model-robust estimation of structural equation models based on robustly fitting moments. This article addresses the choice of the tuning parameter ε that appears in the differentiable approximations of the nondifferentiable Lp loss functions. Moreover, model-robust estimation based on the Lp loss function is compared with a recently proposed differentiable approximation of the L0 loss function and a direct minimization of a smoothed version of the Bayesian information criterion in regularized estimation. In a simulation study, the L0 loss function slightly outperformed the Lp loss function in terms of bias and root mean square error. Furthermore, standard errors of the model-robust SEM estimators were analytically derived and exhibited satisfactory coverage rates.

1. Introduction

Structural equation models (SEMs) and confirmatory factor analysis (CFA) are important statistical methods for analyzing multivariate data in the social sciences [1,2,3,4,5]. In these models, a multivariate vector X = (X_1, …, X_I)′ of I continuous observed variables (also referred to as items or indicators) is modeled as a function of a vector of latent variables (i.e., factors or traits) η. SEMs represent the mean vector μ and the covariance matrix Σ of the random variable X as a function of an unknown parameter vector θ. In SEMs, constrained estimation of the moment structure of the multivariate normal distribution is applied [6].
The measurement model in an SEM is given as
X = ν + Λ η + ϵ.   (1)
We denote the covariance matrix Var(ϵ) = Ψ. The vectors η and ϵ are multivariate normally distributed and uncorrelated with each other. In CFA, the multivariate normal (MVN) distributions are represented as η ∼ MVN(α, Φ) and ϵ ∼ MVN(0, Ψ). Hence, one can represent the mean vector and the covariance matrix in CFA as
μ(θ) = ν + Λ α and Σ(θ) = Λ Φ Λ′ + Ψ.   (2)
In SEM, a matrix B of regression coefficients can additionally be specified such that
η = B η + ξ with E(ξ) = α and Var(ξ) = Φ.   (3)
Hence, the mean vector and the covariance matrix are represented in SEM as
μ(θ) = ν + Λ (I − B)^{−1} α and Σ(θ) = Λ (I − B)^{−1} Φ [(I − B)^{−1}]′ Λ′ + Ψ,   (4)
where I is the identity matrix.
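To make (4) concrete, the following sketch computes the model-implied moments for a hypothetical one-factor, three-item model; all parameter values (loadings, factor mean and variance, residual variances) are illustrative and not taken from the article.

```python
import numpy as np

def sem_implied_moments(nu, Lambda, B, alpha, Phi, Psi):
    """Return (mu(theta), Sigma(theta)) for the SEM parameterization in (4)."""
    I_mat = np.eye(B.shape[0])
    T = np.linalg.inv(I_mat - B)          # (I - B)^{-1}
    mu = nu + Lambda @ T @ alpha
    Sigma = Lambda @ T @ Phi @ T.T @ Lambda.T + Psi
    return mu, Sigma

# Hypothetical one-factor model with three items (values made up):
nu = np.zeros(3)                          # item intercepts
Lambda = np.array([[1.0], [0.8], [0.6]])  # factor loadings
B = np.zeros((1, 1))                      # no structural regressions
alpha = np.array([0.3])                   # factor mean
Phi = np.array([[1.2]])                   # factor variance
Psi = np.diag([1.0, 1.0, 1.0])            # residual variances

mu, Sigma = sem_implied_moments(nu, Lambda, B, alpha, Phi, Psi)
```

With B = 0, (4) reduces to the CFA moments in (2), so mu = Λα + ν and Sigma = ΛΦΛ′ + Ψ.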
Researchers often parsimoniously parameterize the mean vector and the covariance matrix using a parameter θ as a summary in an SEM. The model assumptions in SEMs are, at best, merely an approximation of a true data-generating model. In SEMs, model deviations (i.e., model errors) in covariances emerge as a difference between a population covariance matrix Σ and a model-implied covariance matrix Σ ( θ ) (see [7,8,9]). Simultaneously, model errors in the mean vector cause a difference between the population mean vector μ and the model-implied mean vector μ ( θ ) . As a result, the SEM is misspecified at the population level. It should be noted that the model errors are defined at the population level in infinite sample sizes. In real-data applications with limited sample sizes, the empirical covariance matrix S estimates the population covariance matrix Σ , while the mean vector x ¯ estimates the population mean vector μ .
In this work, estimators with some resistance to model deviations are investigated. More specifically, the presence of some amount of model error should have no impact on the parameter estimate θ. This robustness property is denoted as model robustness, and it adheres to robust statistics principles [10,11,12]. Model errors in SEMs appear as residuals in the modeled mean vector and the modeled covariance matrix, whereas in traditional robust statistics, observations (i.e., cases or subjects) that do not obey an imposed statistical model should be considered as outliers. That is, an estimator in an SEM should automatically recognize large deviations in μ − μ(θ) and Σ − Σ(θ) as outliers that should not significantly damage the estimated parameter θ.
In previous research, the Lp loss function has been used for model-robust estimation based on moments [13,14]. Non-robust estimators such as maximum likelihood or unweighted and weighted least squares will typically result in biased estimates in the presence of model error [14]. In this article, we more thoroughly discuss the choice of the tuning parameter ε in differentiable approximations of the nondifferentiable Lp loss functions. Furthermore, we compare the Lp loss function with a recently proposed differentiable approximation of the L0 loss function and a direct minimization of a smoothed version of the Bayesian information criterion [15] in regularized estimation. Notably, the L0 loss function minimizes the number of model deviations in a fitted model. If only a few entries in the modeled mean vector(s) or covariance matrix (or matrices) deviate from zero at the population level, while all other entries equal zero, the L0 loss function would be the most appropriate fit function. In contrast, if all model deviations differ from zero and unsystematically fluctuate around zero, the Lp or L0 loss functions with p ≤ 1 would be less appropriate. Finally, standard errors for the proposed model-robust estimators based on the delta method are derived in this article. Their performance is assessed by evaluating coverage rates.
To sum up, this article focuses on implementation details of SEM estimation based on the Lp (0 < p ≤ 1) and the newly proposed L0 loss functions, while [16] was devoted to regularized SEM estimation, which can also be utilized for model-robust estimation. A comparison of regularized estimation and robust loss functions can be found in [14].
The remainder of the article is organized as follows. Model-robust SEM estimation based on the robust L0 and Lp loss functions is treated in Section 2. Section 3 introduces direct BIC minimization as a special approach to regularized maximum likelihood estimation. Section 4 is devoted to details of the standard error computation. In Section 5, research questions are formulated that are addressed in two subsequent simulation studies. In Section 6, the bias and root mean square error of the model-robust SEM estimators are of interest. Section 7 reports findings on standard error estimation in terms of coverage rates. Finally, the article closes with a discussion in Section 8.

2. L0 and Lp Loss Functions in SEM Estimation

We now describe model-robust moment estimation of multiple-group SEMs. The treatment closely follows previous work in Refs. [14,16].
The empirical mean vector x̄ and the empirical covariance matrix S are sufficient statistics for estimating μ and Σ when modeling multivariate normally distributed data with no missing values. In particular, they are also sufficient statistics for μ(θ) and Σ(θ), which are constrained functions of a parameter vector θ. Hence, x̄ and S are also sufficient statistics for the parameter vector θ = (θ_1, …, θ_K) that contains K elements.
Now, assume that there are G groups with sample sizes N_g, mean vectors x̄_g, and covariance matrices S_g (g = 1, …, G). Let ξ_g = (x̄_g′, vech(S_g)′)′ be the vector of sufficient statistics in group g, where vech denotes the operator that stacks all nonredundant matrix entries on top of one another. Furthermore, the vector ξ = (ξ_1′, …, ξ_G′)′ contains the sufficient statistics of all G groups.
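The assembly of ξ_g can be sketched in a few lines; the numbers below are arbitrary, and vech is implemented here by stacking the nonredundant lower-triangular entries.

```python
import numpy as np

def vech(S):
    """Stack the nonredundant (lower-triangular, including diagonal) entries of S."""
    idx = np.tril_indices(S.shape[0])
    return S[idx]

# Toy sufficient statistics for one group (illustrative values):
xbar_g = np.array([0.1, 0.2])
S_g = np.array([[1.0, 0.3],
                [0.3, 1.5]])

xi_g = np.concatenate([xbar_g, vech(S_g)])  # (xbar_g', vech(S_g)')'
```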
The population mean vectors and covariance matrices are denoted by μ_g and Σ_g, respectively. The model-implied mean vectors and covariance matrices are denoted by μ_g(θ) and Σ_g(θ), respectively. It is worth noting that the parameter vector θ lacks an index g, indicating that there can be common and unique parameters across groups. Equal factor loadings and item intercepts across groups are frequently imposed in a multiple-group CFA (i.e., measurement invariance is specified [17,18]).
In the model-robust SEM estimation discussed in this article, the discrepancies x̄_g − μ_g(θ) and vech(S_g) − vech(Σ_g(θ)) are minimized according to a loss function ρ. There are two kinds of errors, which are, for simplicity, only discussed for the mean structure. We can express the discrepancy in the mean structure as
x̄_g − μ_g(θ) = (x̄_g − μ_g) + (μ_g − μ_g(θ)).   (5)
The first term, x̄_g − μ_g, describes a discrepancy due to sampling variation (i.e., with respect to the sampling of subjects). This term can typically be reduced when larger samples are drawn. The second term, μ_g − μ_g(θ), indicates a model error. This term exists at the population level and, therefore, does not vanish with increasing sample sizes. In model-robust estimation, a few entries in the model error are allowed to differ from zero (i.e., a sparsity assumption), corresponding to model misspecification. Note that the sparsity assumption is vital for the performance of model-robust estimators.
In robust moment estimation, the following fit function F_rob is minimized:
F_rob(θ; ξ) = ∑_{g=1}^{G} ∑_{i=1}^{I} w_{1g,i} ρ( x̄_{g,i} − μ_{g,i}(θ) ) + ∑_{g=1}^{G} ∑_{i=1}^{I} ∑_{j=i}^{I} w_{2g,ij} ρ( s_{g,ij} − σ_{g,ij}(θ) ).   (6)
In the first term on the right side of Equation (6), discrepancies of the sample means x̄_{g,i} from the model-implied means μ_{g,i}(θ) for item i in group g are considered. In the second term, discrepancies of the sample covariances s_{g,ij} from the model-implied covariances σ_{g,ij}(θ) for items i and j in group g are considered. The weights w_{1g,i} (i = 1, …, I) and w_{2g,ij} (i, j = 1, …, I) are known but can be set to one if all variables have (approximately) the same standard deviation in the sample comprising all groups or if the original scaling of the variables reflects the intended weighting of sampling and model errors. The loss function ρ in (6) should be chosen such that it is resistant to outlying effects in the mean and the covariance structure.
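A minimal sketch of (6) for a single group with unit weights might look as follows; `implied_mu` and `implied_sigma` are placeholders for the model-implied moment functions, and the toy model below (a common mean θ with a fixed identity covariance) is purely illustrative.

```python
import numpy as np

def f_rob(theta, xbar, S, implied_mu, implied_sigma, rho):
    """Robust fit function (6) for one group with unit weights."""
    mu, Sigma = implied_mu(theta), implied_sigma(theta)
    val = np.sum(rho(xbar - mu))              # mean discrepancies
    iu = np.triu_indices(S.shape[0])          # nonredundant entries (j >= i)
    val += np.sum(rho(S[iu] - Sigma[iu]))     # covariance discrepancies
    return val

# Toy example: common mean theta for two variables, fixed identity covariance.
xbar = np.array([0.1, 0.1])
S = np.eye(2)
val = f_rob(0.0, xbar, S,
            lambda t: np.full(2, t),          # implied mean vector
            lambda t: np.eye(2),              # implied covariance matrix
            np.abs)                           # MAD loss rho(x) = |x|
```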
The robust mean absolute deviation (MAD) loss function ρ(x) = |x| was examined in [19,20]. When compared to commonly employed SEM estimation approaches, this fit function is more robust to a few model violations, such as unmodeled item intercepts or unmodeled residual correlations (see [7]).
In this article, we investigate the L p loss function
ρ(x) = |x|^p for p > 0.   (7)
It has been shown that p < 1 provides more efficient model-robust estimates than p = 1 (see [7,13]). The Lp loss function with p = 2 is the square loss function ρ(x) = x², which corresponds to unweighted least squares (ULS) estimation. However, this loss function does not possess the model-robustness property [7]. The Lp loss function ρ(x) = √|x| = |x|^{0.5} (i.e., p = 0.5) is implemented in invariance alignment [21,22,23] and penalized structural equation modeling [24] in the popular Mplus software (Version 8.10, https://www.statmodel.com/support/index.shtml (accessed on 25 September 2023)). The critical aspect of the Lp loss function ρ defined in (7) is that it is nondifferentiable. Consequently, the fit function F_rob in (6) is also nondifferentiable, which precludes the application of general-purpose optimizers that rely on differentiable objective functions. As a remedy, the nondifferentiable Lp loss function ρ can be replaced by a differentiable approximation ρ_ε, which is close to ρ but differentiable on the entire real line. The approximating function ρ_ε is defined as
ρ_ε(x) = (x² + ε)^{p/2},   (8)
where ε > 0 is a tuning parameter that should be small enough such that ρ_ε is close to ρ but large enough to ensure estimation stability. Replacing the nondifferentiable ρ by the differentiable approximation ρ_ε has previously been recommended in [13,21,25,26].
The loss function ρ and its differentiable approximation ρ_ε are displayed for six different values of p in Figure 1. It can be seen (and shown) that ρ(x) ≤ ρ_ε(x) for all p > 0 and ε > 0. Furthermore, the loss function ρ is much steeper at x = 0 for a smaller p. With a larger ε, the approximation ρ_ε becomes smoother. Choosing an appropriate tuning parameter ε > 0 is therefore important when applying model-robust moment estimation based on the Lp loss function.
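The inequality ρ(x) ≤ ρ_ε(x) and the effect of ε can be checked numerically; the grid and ε values below are illustrative.

```python
import numpy as np

def rho(x, p):
    """Nondifferentiable L_p loss (7)."""
    return np.abs(x) ** p

def rho_eps(x, p, eps):
    """Differentiable approximation (8)."""
    return (x ** 2 + eps) ** (p / 2)

x = np.linspace(-2.0, 2.0, 401)
# Largest approximation error occurs at x = 0 and equals eps^{p/2}:
gap_large = np.max(rho_eps(x, 0.5, 1e-2) - rho(x, 0.5))
gap_small = np.max(rho_eps(x, 0.5, 1e-4) - rho(x, 0.5))
```

The maximal gap shrinks from about 0.32 (ε = 10^{-2}) to 0.1 (ε = 10^{-4}) for p = 0.5, illustrating the trade-off between closeness to ρ and smoothness.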
It might be tempting to use a very small p close to zero. Such a loss function is close to the L0 loss function, which takes the value 1 for all arguments different from x = 0 and the value 0 at x = 0. If the sparsity assumption of model errors holds, L0 would theoretically be the most desirable loss function [27,28]. However, as shown in Figure 1, ρ_ε for p = 0.01 does not have a clear minimum at zero, making this differentiable loss function difficult to apply in practical optimization.
O’Neill and Burke [15,29] proposed the following differentiable approximation χ_ε of the L0 loss function in recent work related to regularized estimation:
χ_ε(x) = x² / (x² + ε),   (9)
where ε > 0 is again a tuning parameter. The differentiable approximation χ_ε is displayed for different ε values in Figure 2. It can be seen that the functional form of χ_ε is much better suited to optimization than ρ_ε with a p close to 0. Hence, χ_ε might be a useful alternative robust loss function whose performance in the presence of model errors has to be evaluated.
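A few illustrative evaluations of (9) show its behavior: χ_ε equals 0 at x = 0, equals 1/2 at |x| = √ε, and approaches 1 once |x| is large relative to √ε (the ε value is illustrative).

```python
import numpy as np

def chi_eps(x, eps):
    """Smoothed L0 loss (9): 0 at x = 0, approaching 1 for |x| >> sqrt(eps)."""
    return x ** 2 / (x ** 2 + eps)

# With eps = 1e-2, sqrt(eps) = 0.1 marks the transition point:
vals = [chi_eps(x, 1e-2) for x in (0.0, 0.1, 1.0)]
```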
In the practical minimization of F_rob in (6), when ρ is replaced with ρ_ε (using an appropriate p) or χ_ε, it is advisable to use reasonable starting values and to minimize F_rob using a sequence of differentiable approximations with decreasing ε values (i.e., subsequently fitting with ε = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, while using the previously obtained parameter estimate as the initial value for the subsequent minimization problem).
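This continuation strategy can be sketched on a toy problem: a robust location estimate of four residual entries, one of which is a gross model error. The inner fitting algorithm below is a simple iteratively reweighted least-squares fixed point for the smoothed L_p objective, used here only for illustration (the article's software uses its own optimizer); all numbers are made up.

```python
import numpy as np

resid = np.array([0.02, -0.01, 0.03, 1.50])  # one gross model error (1.50)
p = 0.5

theta = float(np.median(resid))              # reasonable starting value
for eps in (1e-1, 1e-2, 1e-3, 1e-4):        # decreasing eps, warm-started
    for _ in range(200):
        # IRLS weights from the smoothed L_p loss (8); the update is a
        # weighted mean, a standard majorize-minimize step for p <= 2.
        w = ((resid - theta) ** 2 + eps) ** (p / 2 - 1)
        theta = float(np.sum(w * resid) / np.sum(w))
# theta settles near the bulk of the residuals instead of being dragged
# toward the outlying value 1.5
```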

3. A Direct BIC Minimization in Regularized Maximum Likelihood Estimation

Most frequently, SEMs are estimated with maximum likelihood (ML) estimation. This estimation method provides the most efficient estimates for correctly specified models. However, the efficiency properties are lost in the case of misspecified SEMs.
As an alternative, regularized ML estimation can be used, which introduces an overidentified SEM by allowing free group-specific item intercepts and residual covariances. To identify the model, a penalty function targeted at the sparsity structure of the model errors is imposed on the overidentified parameters. The methodological literature has extensively documented the regularized estimation of single-group and multiple-group SEMs [30,31,32,33]. Cross-loadings, residual covariances, or item intercepts are regularized in these applications. Regularized SEM estimation enables flexible yet parsimonious model specifications.
In ML estimation, the fit function F_ML is the negative log-likelihood function based on the multivariate normal distribution, defined as [2,4]
F_ML(θ; ξ) = ∑_{g=1}^{G} (N_g/2) [ I log(2π) + log |Σ_g(θ)| + tr( S_g Σ_g(θ)^{−1} ) + ( x̄_g − μ_g(θ) )′ Σ_g(θ)^{−1} ( x̄_g − μ_g(θ) ) ].   (10)
In empirical applications, the model-implied mean vectors μ_g and covariance matrices Σ_g will often be misspecified [34,35,36], and θ can be understood as a pseudo-true parameter that is defined as the minimizer of F_ML in (10).
In regularized SEM estimation, a penalty function P is added to the log-likelihood fit function F_ML that imposes a sparsity assumption on a subset of model parameters [31,33]. In order to enforce sparsity, the penalty function P is often chosen to be nondifferentiable. Define a known indicator ι_k ∈ {0, 1} for all parameters θ_k, where ι_k = 1 indicates that a penalty function is applied to the kth entry θ_k of θ. The penalized log-likelihood function is defined as
F_pen(θ, λ; ξ) = F_ML(θ; ξ) + N ∑_{k=1}^{K} ι_k P(|θ_k|, λ),   (11)
where λ > 0 is a regularization parameter, and N is a scaling factor that frequently equals the total sample size N = ∑_{g=1}^{G} N_g. The regularized (or penalized) ML estimate is defined as the minimizer of F_pen(θ, λ; ξ).
The least absolute shrinkage and selection operator (LASSO; ref. [37]) and the smoothly clipped absolute deviation (SCAD; ref. [38]) penalty functions have been frequently used in regularized SEM estimation. For a fixed value of λ, a subset of the θ_k parameters to which the penalty function is applied (i.e., ι_k = 1) will result in estimates of zero. That is, a sparse θ vector is obtained as the result of regularized estimation. However, the estimate of θ depends on the fixed regularization parameter λ; that is,
θ̃(λ) = argmin_θ F_pen(θ, λ; ξ).   (12)
As a result, the parameter estimate θ̃(λ) of θ depends on the unknown parameter λ. To avoid this problem, the regularized SEM can be repeatedly estimated on a finite grid of regularization parameters λ (e.g., on an equidistant grid between 0.01 and 1.00 with increments of 0.01). The Bayesian information criterion (BIC), defined by BIC = 2 F_ML(θ; ξ) + log(N) H, may be used to choose an optimal regularization parameter λ, where H denotes the number of parameters. Because the minimization of the BIC is equivalent to the minimization of BIC/2, the final parameter estimate θ̂ is determined as
θ̂ = θ̃(λ̂) with λ̂ = argmin_λ [ F_ML(θ̃(λ); ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ(θ̃_k(λ)) ],   (13)
where the function χ indicates whether |x| differs from 0:
χ(x) = 1 if |x| ≠ 0, and χ(x) = 0 if |x| = 0.   (14)
In particular, the quantity ∑_{k=1}^{K} ι_k χ(θ̃_k(λ)) in (13) counts the number of parameter estimates θ̃_k(λ) (k = 1, …, K) to which the penalty function is applied (i.e., ι_k = 1) and that differ from 0.
As becomes clear, regularized SEM estimation necessitates fitting an SEM on a grid of the regularization parameter λ. This approach is computationally intensive, especially for SEMs with a large number of parameters. The final parameter estimate is obtained by minimizing the BIC across all estimated regularized SEMs. A naïve idea might be to directly minimize the BIC to avoid the repeated estimation that regularization parameter selection involves. It should be noted that only the subset of parameters on which sparsity is imposed is relevant in the BIC computation. Hence, a parameter estimate θ̂ obtained by minimizing the BIC is given by
θ̂ = argmin_θ [ F_ML(θ; ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ(θ_k) ].   (15)
The optimization function in (15) employs an L0 penalty function [39,40] with a fixed regularization parameter log(N)/2. This optimization function contains the nondifferentiable indicator function χ that counts the number of regularized parameters that differ from 0. The ingenious idea of O’Neill and Burke [15] was to replace the nondifferentiable L0 loss function χ with its differentiable approximation χ_ε (see (9) and Ref. [16] for a more comprehensive treatment). Therefore, the parameter θ can be estimated as
θ̂ = argmin_θ F_DBIC(θ; ξ) with F_DBIC(θ; ξ) = F_ML(θ; ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ_ε(θ_k).   (16)
The estimation approach from (16) is referred to as the smoothed direct BIC minimization (DBIC) approach. This method has been used to estimate regularized distributional regression models [15].
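A toy sketch of the smoothed objective (16): here a simple quadratic in three group-specific item intercepts stands in for F_ML, so the objective separates per intercept and can be minimized on a grid. All numbers (N, sample means, ε) are illustrative and not from the article.

```python
import numpy as np

N = 1000
xbar = np.array([0.01, -0.02, 0.40])  # only the third intercept is markedly nonzero
eps = 1e-2

def chi_eps(x):
    """Smoothed L0 penalty (9)."""
    return x ** 2 / (x ** 2 + eps)

# Per-intercept DBIC objective: quadratic ML stand-in + (log N / 2) * chi_eps,
# minimized by brute force on a fine grid for each intercept separately.
grid = np.linspace(-1.0, 1.0, 20001)
est = np.array([grid[np.argmin(0.5 * N * (xb - grid) ** 2
                               + 0.5 * np.log(N) * chi_eps(grid))]
                for xb in xbar])
# the small intercepts are shrunk toward zero; the large one is retained
```

The penalty behaves like a fixed per-parameter cost of log(N)/2, so only intercepts whose data support outweighs that cost remain large.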
It has been shown for SEMs that the DBIC approach performs similarly to regularized estimation based on the indirect approach of minimizing the BIC on a finite grid of regularization parameters [16]. Hence, we confine ourselves in this article to comparing the model-robust moment estimation methods (with different values of the power p) with the DBIC estimator.

4. Computation of Standard Errors

In this section, the computation of the variance matrix of parameter estimates θ̂ from model-robust moment estimation and DBIC estimation using the fit functions in (6) and (16), respectively, is described (see also [14,16] for a similar treatment). Both methods minimize a differentiable (approximating) function F(θ, ξ) with respect to θ as a function of the sufficient statistics ξ (see also [41]). The vector of estimated sufficient statistics ξ̂ is approximately normally distributed (see [3]); that is,
ξ̂ − ξ_0 ∼ MVN(0, V_ξ)   (17)
for a true population parameter ξ_0 of sufficient statistics. Let F_θ = ∂F/∂θ be the vector of partial derivatives with respect to θ. The parameter estimate θ̂ fulfills the nonlinear equation F_θ(θ̂, ξ̂) = 0. The delta method [34] can be employed to derive the variance matrix of θ̂. Assume that there exists a (pseudo-)true parameter θ_0 such that F_θ(θ_0, ξ_0) = 0.
Now, we conduct a Taylor expansion of F_θ (see [3,5,42]). Denote by F_θθ and F_θξ the matrices of partial derivatives of F_θ with respect to θ and ξ, respectively. The Taylor expansion can be written as
F_θ(θ̂, ξ̂) = F_θ(θ_0, ξ_0) + F_θθ(θ_0, ξ_0) (θ̂ − θ_0) + F_θξ(θ_0, ξ_0) (ξ̂ − ξ_0) = 0.   (18)
By solving (18) for θ̂ and using F_θ(θ_0, ξ_0) = 0, we get the approximation
θ̂ − θ_0 = −F_θθ(θ_0, ξ_0)^{−1} F_θξ(θ_0, ξ_0) (ξ̂ − ξ_0).   (19)
By defining Â = −F_θθ(θ̂, ξ̂)^{−1} F_θξ(θ̂, ξ̂), that is, by substituting θ_0 and ξ_0 with θ̂ and ξ̂, respectively, we obtain by the multivariate delta method [34]
Var(θ̂) = Â V_ξ Â′.   (20)
The square roots of the diagonal elements of Var(θ̂) computed from (20) may be used as standard errors for the elements of θ̂.
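The computation (17)–(20) can be illustrated numerically on a toy M-estimator with a scalar θ, where F is a sum of squares: θ̂ is then the mean of ξ̂, and the matrix A should equal (1/3, 1/3, 1/3). The covariance V_ξ of the sufficient statistics is an assumed toy value, and the derivatives are taken numerically to mirror the general recipe.

```python
import numpy as np

def F_theta(theta, xi):
    """dF/dtheta for the toy objective F(theta, xi) = sum_j (xi_j - theta)^2."""
    return -2.0 * np.sum(xi - theta)

def num_deriv(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

xi_hat = np.array([0.1, 0.3, 0.2])
theta_hat = xi_hat.mean()                  # solves F_theta(theta, xi_hat) = 0

# Second-order derivatives of F_theta, evaluated at the estimates:
F_tt = num_deriv(lambda t: F_theta(t, xi_hat), theta_hat)
F_tx = np.array([
    num_deriv(lambda v: F_theta(theta_hat,
                                np.where(np.arange(3) == j, v, xi_hat)),
              xi_hat[j])
    for j in range(3)
])

A_hat = -F_tx / F_tt                       # A-hat = -F_tt^{-1} F_tx, as in (19)
V_xi = 0.04 * np.eye(3)                    # assumed Var(xi-hat)
var_theta = float(A_hat @ V_xi @ A_hat)    # Var(theta-hat) = A V_xi A', (20)
```

Since θ̂ is the sample mean here, the delta-method variance reduces to the familiar V/J = 0.04/3, which serves as a sanity check.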

Statistical Inference for Parameter Differences of Different Models Based on the Same Dataset

In this section, statistical inference for differences in parameters from different models based on the same dataset is discussed. For example, researchers could use the Lp loss function with p = 2, p = 0.5, and p = 0. It should then be evaluated whether the estimated factor means from the models that employ the different loss functions differ statistically significantly. Importantly, the different models rely on the same dataset and its vector of sufficient statistics ξ̂. Hence, the standard error of a difference of parameters from different models can be smaller than the standard error from a single model because the data are used twice. The M-estimation framework can also be utilized to derive the variance estimate of a parameter difference [43,44]. The different loss functions provide different estimates θ̂_m for models m = 1, …, M. At the population level, the parameters are denoted as θ_m. Note that the population parameters differ in the case of misspecified SEMs. In the following, we discuss the case M = 2 to reduce notation.
Following the lines of the variance derivation in the previous section, we can approximate the estimate of model m by using (19):
θ̂_m − θ_m = A_m (ξ̂ − ξ) for m = 1, 2,   (21)
where A_m is defined as in (19). Note that A_1 ≠ A_2 due to the choice of different loss functions. Researchers can now ask whether the parameter difference Δ̂ = θ̂_1 − θ̂_2 (or some of its entries) significantly differs from 0. From (21), we obtain
Δ̂ − Δ = (A_1 − A_2) (ξ̂ − ξ),   (22)
where Δ = θ_1 − θ_2. We then obtain the variance estimate of Δ̂ as
Var(Δ̂) = (A_1 − A_2) V_ξ (A_1 − A_2)′.   (23)
The unknown matrix A_1 − A_2 can be estimated by its sample analog Â_1 − Â_2.
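Equation (23) can be illustrated with made-up linearization vectors A_1 and A_2 for a scalar parameter. Note that the resulting variance is much smaller than the naïve sum of the two single-model variances, reflecting that both estimates use the same data.

```python
import numpy as np

# Hypothetical linearization vectors for two loss functions (toy values):
A1 = np.array([1 / 3, 1 / 3, 1 / 3])       # e.g., an ULS-type estimator
A2 = np.array([0.5, 0.25, 0.25])           # a robust fit weighs entries differently
V_xi = 0.04 * np.eye(3)                    # assumed Var(xi-hat)

d = A1 - A2
var_diff = float(d @ V_xi @ d)             # Var(Delta-hat) from (23)
var_naive = float(A1 @ V_xi @ A1 + A2 @ V_xi @ A2)  # ignores the shared data
```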

5. Research Purpose

In this article, several research questions connected to model-robust estimation are addressed. First, simulations should clarify which tuning parameter ε > 0 in the differentiable approximation should be chosen to minimize bias and variance in estimated structural SEM parameters. Second, to our knowledge, no research has compared the performance of the Lp loss function (approximation) with the newly proposed L0 loss function approximation of O’Neill and Burke in terms of bias and root mean square error (RMSE). Third, it should be examined whether the standard error computation based on the delta method provides valid standard error estimates regarding coverage rates even though a differentiable approximation of the involved model-robust loss function is used. The first two research questions are addressed in Simulation Study 1 (see Section 6), while the third is investigated in Simulation Study 2 (see Section 7).

6. Simulation Study 1: Bias and RMSE

Simulation Study 1 examined the impact of group-specific item intercepts in a multiple-group one-dimensional factor model on the bias and RMSE of factor means and factor variances. In the data-generating model (DGM), measurement invariance was violated. That is, differential item functioning (DIF; refs. [45,46]) occurred; hence, DIF effects in item intercepts were simulated.

6.1. Method

The DGM in the simulation study was identical to Simulation Study 2 in [14] and mimicked [21]. The data were simulated from a one-dimensional factor model involving five items and three groups. The factor variable η_1 was normally distributed with group means α_{1,1} = 0, α_{2,1} = 0.3, and α_{3,1} = 0.8 and group variances ϕ_{1,11} = 1, ϕ_{2,11} = 1.5, and ϕ_{3,11} = 1.2, respectively. All five factor loadings were set to 1, and all measurement error variances were set to 1 in all groups; the measurement errors were uncorrelated with each other. The factor variable and the residual variables were normally distributed.
Only a subset of the group-specific item intercepts was simulated to differ from zero. These nonzero item intercepts indicated measurement noninvariance (i.e., the presence of DIF effects). One of the five items in each group had a DIF effect, but different items were affected across the three groups. In the first group, the intercept of the fourth item had a DIF effect δ. In the second group, the first item had a DIF effect δ, while the second item had a DIF effect δ in the third group. The DIF effect δ was chosen as 0, 0.3, or 0.6. The value δ = 0 represented the situation of measurement invariance. The sample size per group was chosen as N = 250, N = 500, N = 1000, or N = 2000.
All analysis models were multiple-group one-factor models. For identification reasons, the mean of the factor variable in the first group was fixed at 0, and its standard deviation in the first group was fixed at 1. Invariant factor loadings and residual variances were specified across groups. In model-robust moment estimation (ME), we also assumed invariant item intercepts. We utilized the powers p = 0.5, p = 0.25, and p = 0.1 for the loss function ρ_ε defined in (8), combined with values of the tuning parameter ε chosen as ε = 10^{-2} (=0.01), ε = 10^{-3} (=0.001), and ε = 10^{-4} (=0.0001). The resulting estimators are denoted by ME0.5, ME0.25, and ME0.1, respectively, in the following Results sections. Moreover, we used the loss function χ_ε defined in (9) with the same ε tuning parameter values as for ρ_ε to approximate the L0 loss function (denoted by ME0 in the following Section 6.2). Furthermore, we used DBIC estimation in which all group-specific item intercepts were allowed to differ across groups. The indicator variables ι_k involved in the DBIC approach (see (16)) take the value 1 only for item intercepts; for all other elements of θ, they are set to 0. Therefore, the DBIC approach effectively minimizes the number of estimated item intercepts in its penalty. The tuning parameter ε was again chosen as 10^{-2}, 10^{-3}, and 10^{-4}.
We did not use the power p = 1 in model-robust moment estimation because it resulted in biased estimates in the presence of DIF [14]. However, we included the non-robust ML estimation method for a more comprehensive comparison of the estimation methods.
In total, R = 5000 replications were conducted for each of the 3 (DIF effect size δ) × 4 (sample size N) = 12 conditions of the simulation study. We investigated the estimation quality of the factor means (i.e., α_{2,1} and α_{3,1}) and factor variances (i.e., ϕ_{2,11} and ϕ_{3,11}) in the second and third groups. Bias, RMSE, and relative RMSE were computed to assess the performance of the different estimators. Let θ̂_r be a model parameter estimate in replication r = 1, …, R. The bias was estimated by
Bias(θ̂) = (1/R) ∑_{r=1}^{R} (θ̂_r − θ),   (24)
where θ denotes the true parameter value. The RMSE was estimated by
RMSE(θ̂) = √[ (1/R) ∑_{r=1}^{R} (θ̂_r − θ)² ].   (25)
Note that RMSE(θ̂) ≥ |Bias(θ̂)| holds because the mean square error (i.e., the square of the RMSE) is the sum of the squared bias and the variance of an estimator. A relative RMSE can be defined by dividing the RMSE of an estimator by the RMSE of a chosen reference model. To ease the reading of the numeric values of the relative RMSE, the values were multiplied by 100. This quantity can then easily be interpreted as a percentage gain or loss of a particular estimator compared to the reference model.
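The quantities (24)–(25) and the relative RMSE can be computed for simulated replications of two hypothetical estimators; the distributions below are chosen purely for illustration and have nothing to do with the simulation conditions of the article.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.3
R = 5000
est_a = theta_true + rng.normal(0.0, 0.10, size=R)           # unbiased
est_b = theta_true + 0.05 + rng.normal(0.0, 0.08, size=R)    # biased, less variable

def bias(est):
    """Bias estimate (24)."""
    return np.mean(est - theta_true)

def rmse(est):
    """RMSE estimate (25)."""
    return np.sqrt(np.mean((est - theta_true) ** 2))

# Relative RMSE (x 100), with estimator A as the reference model:
rel_rmse_b = 100.0 * rmse(est_b) / rmse(est_a)
```

Here estimator B trades bias for variance; since RMSE² = Bias² + Var, its RMSE of about √(0.05² + 0.08²) ≈ 0.094 still undercuts estimator A's 0.10, giving a relative RMSE below 100.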
The entire simulation study was carried out in the R [47] software (Version 4.3.1). The SEMs were estimated using the sirt::mgsem() function in the R package sirt (Version 4.0-19; ref. [48]). Information about model specification can be found in the material located at https://osf.io/ng6s3 (accessed on 25 September 2023).
Researchers interested in fitting a particular model without interest in studying the entire simulation code are referred to the manual help site of the mgsem function in the R package sirt [48]. They can type ?sirt::mgsem in the R console (or look at https://alexanderrobitzsch.r-universe.dev/sirt/doc/manual.html#mgsem (accessed on 25 September 2023)) and find an example for applying the ME estimator for an existing dataset.

6.2. Results

Figure 3 displays the RMSE of the factor mean α_{2,1} in the second group for the different estimators for DIF effect sizes δ = 0.3 and δ = 0.6 and sample sizes N = 500 and N = 1000 as a function of the tuning parameter ε. It can be seen that ε = 10^{-3} was optimal with respect to the RMSE for ME0.5, ME0.25, and ME0.1, while ε = 10^{-2} resulted in the smallest RMSE for ME0 and DBIC. The findings were very similar for the factor mean α_{3,1} in the third group and for the sample sizes N = 250 and N = 2000.
Table 1 displays the bias and relative RMSE of factor means and factor variances as a function of the DIF effect size and the sample size for the different estimation methods. We chose the tuning parameter ε = 10^{-3} for the estimators ME0.5, ME0.25, and ME0.1, and ε = 10^{-2} for the estimators ME0 and DBIC, because these values resulted in the smallest RMSE of factor means according to the findings in Figure 3.
Overall, all estimators except ML were approximately unbiased for factor variances in all conditions and for factor means in the absence of DIF effects (i.e., δ = 0, in which case measurement invariance holds), except for the sample size N = 250. Slightly biased estimates were obtained for ME with a larger p, such as p = 0.5 in ME0.5. However, this bias decreased with increasing sample size. ME0 and DBIC were unbiased across all conditions, followed by the estimators ME0.1, ME0.25, and ME0.5 with increasing absolute bias. There was substantial bias in the estimated factor means for ML estimation, while the ML estimates of factor variances were approximately unbiased.
To compare the estimation accuracy in terms of the relative RMSE, we chose ME0.5 (with ε = 10^{-3}) as the reference model. Hence, the relative RMSE was 100 for this method in Table 1. It turned out that ME0 and DBIC were superior to the other estimators, with non-negligible efficiency gains. For example, for the factor mean α_{2,1} in the second group with δ = 0.3 and N = 1000, the efficiency gain in terms of the RMSE was 6.4% (=100 − 93.6) for ME0 and 6.5% for DBIC. Across all conditions, no noteworthy differences between the ME0 and DBIC estimators were found.
ML estimates were slightly more efficient than ME estimates in the absence of DIF. However, the efficiency gains of ML decreased with increasing sample size.
To conclude, for large sample sizes such as N = 1000 or N = 2000, it seems promising to replace the L p loss function with power p = 0.5 (i.e., ME0.5) by the L 0 loss function implemented in ME0 or by regularized estimation with DBIC. The different estimators performed similarly for N = 500, while the power p = 0.5 was the frontrunner for N = 250.

7. Simulation Study 2: Coverage

Simulation Study 2 investigated the coverage rates of the confidence intervals for the model-robust moment estimators and the DBIC method.

7.1. Method

The same DGM as in Simulation Study 1 was employed to simulate data (see Section 6.1). We evaluated the standard error computation described in Section 4 for the five estimators using the tuning parameter ε that resulted in the smallest RMSE of factor means. In detail, we used ε = 10^−3 for the estimators ME0.5, ME0.25, and ME0.1, and ε = 10^−2 for the estimators ME0 and DBIC (see Figure 3). Moreover, we included ML estimation for reasons of comparison.
The same analysis models as in Simulation Study 1 were specified. Confidence intervals at the 95% confidence level were computed using a normal distribution approximation (i.e., the estimated confidence interval was θ̂ ± 1.96 × SE(θ̂)). The coverage rate was computed as the percentage of replications in which the computed confidence interval covered the true parameter value.
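The coverage computation described above can be sketched as a simplified Monte Carlo experiment. The sketch assumes a normally distributed estimator with a known standard error, which is of course much simpler than the article's SEM setting:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, se, n_rep = 0.5, 0.05, 5000

covered = 0
for _ in range(n_rep):
    # simulate a normally distributed estimate with standard error se
    theta_hat = theta_true + rng.normal(0.0, se)
    lower, upper = theta_hat - 1.96 * se, theta_hat + 1.96 * se
    covered += lower <= theta_true <= upper

coverage = 100 * covered / n_rep  # percentage of intervals covering the truth
print(round(coverage, 1))  # close to the nominal 95%
```

If the estimator is biased or the standard error is misestimated, the coverage rate departs from the nominal level, which is exactly what Table 2 shows for ML in the presence of DIF.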
As in Simulation Study 1, 5000 replications were conducted in each of the 12 simulation conditions (i.e., 3 (DIF effect size δ ) × 4 (sample size N) = 12 conditions). The SEMs with standard error estimates of model parameters were again estimated with the sirt::mgsem() function in the R [47] package sirt (Version 4.0-19; ref. [48]). Material for replication can be found at https://osf.io/ng6s3 (accessed on 25 September 2023).

7.2. Results

Table 2 shows the coverage rates of factor means and factor variances for the five estimators and ML estimation as a function of DIF effect size and sample size. Overall, the coverage rates of the model-robust estimators were acceptable because they were neither smaller than 91.0 nor larger than 98.0 (see [49]). Across all conditions and estimators (when excluding ML estimation), the coverage rates in Table 2 ranged between 92.1 and 97.8, with a mean of M = 96.16 and a standard deviation of SD = 0.93. The coverage rates of ME0.5 (M = 95.87, SD = 0.87), ME0 (M = 96.03, SD = 0.82), and DBIC (M = 95.81, SD = 1.09) were slightly closer to the nominal level than those of ME0.25 (M = 96.42, SD = 0.78) and ME0.1 (M = 96.67, SD = 0.76). The coverage rates for ML estimation (M = 74.57, SD = 33.43) were not acceptable for factor means, which had biased estimates in the presence of DIF effects.

8. Discussion

In this article, we compared model-robust moment estimation with a recently proposed variant of regularized ML estimation by O’Neill and Burke [15] that directly maximizes the BIC (i.e., the DBIC estimator). In the DBIC estimation, these authors suggested a differentiable approximation of the L 0 loss function, which was also used in model-robust moment estimation. Interestingly, the L 0 loss function outperformed L p loss functions for p > 0 regarding bias and RMSE. Furthermore, model-robust moment estimation with the L 0 loss function performed very similarly to the DBIC estimator. Moreover, the estimation of standard errors was successfully implemented for all estimators because coverage rates were acceptable for all parameters in all simulation conditions.
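The idea behind the DBIC estimator, replacing the discrete parameter count in the BIC penalty with a smoothed L 0 term so that the criterion becomes differentiable and can be minimized directly, can be sketched as follows. This is a minimal illustration assuming the x²/(x² + ε) smoothing; it is not the article's or O'Neill and Burke's actual implementation:

```python
import numpy as np

def smoothed_bic_penalty(params, N, eps=1e-2):
    """Smoothed count of nonzero parameters times log(N).

    Each parameter x contributes x^2 / (x^2 + eps), which is close to 0
    for x near zero and close to 1 otherwise, so the usual BIC penalty
    log(N) * (number of nonzero parameters) becomes differentiable.
    """
    params = np.asarray(params, dtype=float)
    k_smooth = np.sum(params ** 2 / (params ** 2 + eps))
    return np.log(N) * k_smooth

# Two clearly nonzero effects count as roughly two parameters:
penalty = smoothed_bic_penalty([0.0, 0.001, 0.8, -0.5], N=1000)
print(round(penalty, 2))  # slightly below log(1000) * 2
```

Adding this penalty to −2 times the log-likelihood yields a smooth surrogate of the BIC that standard gradient-based optimizers can handle.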
In line with previous studies, we anticipate that sample sizes must be sufficiently large in order to achieve the model-robustness properties of the L p and L 0 loss functions. If the sample size is too small (e.g., N = 100 subjects in a multiple-group SEM analysis), the sampling error in moments that are used as sufficient statistics in the SEM can exceed model errors (i.e., unmodeled group-specific item intercepts). In this case, model-robust methods are not expected to perform well. In fact, Simulation Study 1 revealed that p = 0.5 is preferable to other L p loss functions with p < 0.5 or the L 0 loss function for N = 250 , while the situation changes with larger sample sizes such as N = 1000 .
Simulation Study 1 indicated that the tuning parameter ε = 0.001 should be used in the differentiable approximation of the nondifferentiable L p loss function for p = 0.5, p = 0.25, or p = 0.1. In contrast, ε = 0.01 was found optimal for the L 0 loss function and for the direct BIC minimization approach in regularized estimation. We expect that these findings will transfer to other models that involve standardized variables. In our experience with regularized estimation and the invariance alignment approach [21], choosing a tuning parameter ε that is too small (such as ε = 10^−5 or smaller) should be avoided because it is more likely to result in convergence to local optima of the fit function.
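The local-optima issue with a very small ε can be illustrated in a one-dimensional analogue. The sketch below is a toy location problem with the smoothed L p loss for p = 0.5, not the article's SEM fit function; it counts (approximate) local minima of the objective on a grid and shows that shrinking ε preserves spurious local optima near individual observations:

```python
import numpy as np

# Toy data: a tight cluster plus one outlying "model error"
data = np.array([0.0, 0.05, -0.03, 0.02, 1.0])

def objective(mu, p, eps):
    # Smoothed L_p objective for a location parameter mu
    return np.sum(((data - mu[..., None]) ** 2 + eps) ** (p / 2), axis=-1)

grid = np.linspace(-0.5, 1.5, 2001)
for eps in (1e-2, 1e-3, 1e-8):
    vals = objective(grid, 0.5, eps)
    mu_hat = grid[np.argmin(vals)]
    # count interior grid points that are strict local minima
    is_min = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
    print(eps, round(float(mu_hat), 3), int(is_min.sum()))
```

The global minimizer stays near the cluster (the robustness property), but with ε near zero the objective develops multiple local minima, so a gradient-based optimizer started near the outlier can get stuck.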
In our simulation studies, we only considered model errors in item intercepts. Future simulation studies could also investigate model errors in the covariance structure (i.e., unmodeled residual correlations). Of course, model errors in the mean and the covariance structure can also be simultaneously examined.
A requirement to achieve model robustness of SEM estimators is that model errors in the mean or covariance structure are sparsely distributed. That is, only a few entries are allowed to differ from zero, while the majority of model errors must be (approximately) zero. If factor loadings were not invariant across groups, more densely distributed model errors would result. As a consequence, model-robust moment estimation will likely not work, and regularized maximum likelihood estimation might be preferred. However, the DBIC minimization method for regularized maximum likelihood estimation can also be utilized in this case if the number of group-specific factor loadings is counted in the BIC penalty term.
In this article, we only applied the L 0 and L p loss functions to continuous items. However, the principle directly transfers to SEMs of ordinal data that are based on fitting thresholds and polychoric correlations [50] instead of means and covariances for continuous data, respectively. Moreover, the model-robust estimators could also be applied to two-step estimation methods of multilevel structural equation models [51,52].
To sum up, model-robust SEM estimators based on the L p (for p < 1) and L 0 loss functions are attractive to researchers who do not want model estimates to be influenced by the presence of a few model deviations (i.e., model errors). In contrast, commonly employed (non-robust) SEM estimators such as maximum likelihood estimation are impacted by model errors. In this sense, misfitting models do not necessarily result in biased estimates when model-robust estimation is used.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BIC    Bayesian information criterion
CFA    confirmatory factor analysis
DBIC   direct BIC minimization
DGM    data-generating model
DIF    differential item functioning
ME     moment estimation
ML     maximum likelihood
RMSE   root mean square error
SEM    structural equation model

References

1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011.
2. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 1989.
3. Browne, M.W.; Arminger, G. Specification and estimation of mean- and covariance-structure models. In Handbook of Statistical Modeling for the Social and Behavioral Sciences; Arminger, G., Clogg, C.C., Sobel, M.E., Eds.; Springer: Boston, MA, USA, 1995; pp. 185–249.
4. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016.
5. Yuan, K.H.; Bentler, P.M. Structural equation modeling. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 297–358.
6. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; Wiley: New York, NY, USA, 2019.
7. Robitzsch, A. Comparing the robustness of the structural after measurement (SAM) approach to structural equation modeling (SEM) against local model misspecifications with alternative estimation approaches. Stats 2022, 5, 631–672.
8. Uanhoro, J.O. Modeling misspecification as a parameter in Bayesian structural equation models. Educ. Psychol. Meas. 2023. Epub ahead of print.
9. Wu, H.; Browne, M.W. Quantifying adventitious error in a covariance structure as a random effect. Psychometrika 2015, 80, 571–600.
10. Huber, P.J.; Ronchetti, E.M. Robust Statistics; Wiley: New York, NY, USA, 2009.
11. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; Wiley: New York, NY, USA, 2006.
12. Ronchetti, E. The main contributions of robust statistics to statistical science and a new challenge. Metron 2021, 79, 127–135.
13. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283.
14. Robitzsch, A. Model-robust estimation of multiple-group structural equation models. Algorithms 2023, 16, 210.
15. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71.
16. Robitzsch, A. Implementation aspects in regularized structural equation models. Algorithms 2023, 16, 446.
17. Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993, 58, 525–543.
18. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
19. Siemsen, E.; Bollen, K.A. Least absolute deviation estimation in structural equation modeling. Sociol. Methods Res. 2007, 36, 227–265.
20. Van Kesteren, E.J.; Oberski, D.L. Flexible extensions to structural equation models using computation graphs. Struct. Equ. Model. 2022, 29, 233–247.
21. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
22. Muthén, B.; Asparouhov, T. IRT studies of many groups: The alignment method. Front. Psychol. 2014, 5, 978.
23. Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psych. Test Assess. Model. 2020, 62, 303–334.
24. Asparouhov, T.; Muthén, B. Penalized Structural Equation Models; Technical Report, 2023. Available online: https://rb.gy/tbaj7 (accessed on 28 March 2023).
25. Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. 2020, 55, 811–824.
26. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120.
27. Davies, P.L. Data Analysis and Approximate Models; CRC Press: Boca Raton, FL, USA, 2014.
28. Davies, P.L.; Terbeck, W. Interactions and outliers in the two-way analysis of variance. Ann. Stat. 1998, 26, 1279–1305.
29. O’Neill, M.; Burke, K. Robust distributional regression with automatic variable selection. arXiv 2022, arXiv:2212.07317.
30. Geminiani, E.; Marra, G.; Moustaki, I. Single- and multiple-group penalized factor analysis: A trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika 2021, 86, 65–95.
31. Huang, P.H.; Chen, H.; Weng, L.J. A penalized likelihood method for structural equation modeling. Psychometrika 2017, 82, 329–354.
32. Huang, P.H. A penalized likelihood method for multi-group structural equation modelling. Brit. J. Math. Stat. Psychol. 2018, 71, 499–522.
33. Jacobucci, R.; Grimm, K.J.; McArdle, J.J. Regularized structural equation modeling. Struct. Equ. Model. 2016, 23, 555–566.
34. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
35. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157.
36. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
37. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015.
38. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
39. Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with L0-type penalties. Stat. Model. 2015, 15, 389–410.
40. Shen, X.; Pan, W.; Zhu, Y. Likelihood-based selection and sharp parameter estimation. J. Am. Stat. Assoc. 2012, 107, 223–232.
41. Shapiro, A. Statistical inference of covariance structures. In Current Topics in the Theory and Application of Latent Variable Models; Edwards, M.C., MacCallum, R.C., Eds.; Routledge: Milton Park, Abingdon, UK, 2012; pp. 222–240.
42. Shapiro, A. Statistical inference of moment structures. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 229–260.
43. Clogg, C.C.; Petkova, E.; Haritou, A. Statistical methods for comparing regression coefficients between models. Am. J. Sociol. 1995, 100, 1261–1293.
44. Mize, T.D.; Doan, L.; Long, J.S. A general framework for comparing predictions and marginal effects across models. Sociol. Methodol. 2019, 49, 152–189.
45. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
46. Mellenbergh, G.J. Item bias and item response theory. Int. J. Educ. Res. 1989, 13, 127–143.
47. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2023. Available online: https://www.R-project.org/ (accessed on 15 March 2023).
48. Robitzsch, A. sirt: Supplementary Item Response Theory Models, R Package Version 4.0-19; 2023. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 16 September 2023).
49. Muthén, L.K.; Muthén, B.O. How to use a Monte Carlo study to decide on sample size and determine power. Struct. Equ. Model. 2002, 9, 599–620.
50. Muthén, B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika 1984, 49, 115–132.
51. Muthén, B.O. Multilevel covariance structure analysis. Sociol. Methods Res. 1994, 22, 376–398.
52. Yuan, K.H.; Bentler, P.M. Multilevel covariance structure analysis by fitting multiple single-level models. Sociol. Methodol. 2007, 37, 53–82.
Figure 1. Loss function ρ_ε (see (8)) for different values of p as a function of the tuning parameter ε.
Figure 2. Loss function χ_ε (see (9)) as a function of the tuning parameter ε.
Figure 3. Simulation Study 1: Root mean square error for the factor mean α2,1 of the different model-robust moment estimation (ME0.5, ME0.25, ME0.1, ME0) and the direct BIC minimization (DBIC) estimators as a function of the tuning parameter ε, sample size N, and DIF effect size δ.
Table 1. Simulation Study 1: Bias and relative root mean square error (RMSE) of factor means and factor variances as a function of DIF effect size δ and sample size N.
Bias:

| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 0.001 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α2,1 | 0 | 500 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| α2,1 | 0 | 1000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| α2,1 | 0 | 2000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| α2,1 | 0.3 | 250 | −0.119 | −0.061 | −0.055 | −0.054 | −0.048 | −0.048 |
| α2,1 | 0.3 | 500 | −0.119 | −0.034 | −0.026 | −0.023 | −0.015 | −0.015 |
| α2,1 | 0.3 | 1000 | −0.120 | −0.021 | −0.014 | −0.011 | −0.005 | −0.005 |
| α2,1 | 0.3 | 2000 | −0.120 | −0.015 | −0.009 | −0.007 | −0.004 | −0.004 |
| α2,1 | 0.6 | 250 | −0.239 | −0.034 | −0.023 | −0.019 | −0.005 | −0.005 |
| α2,1 | 0.6 | 500 | −0.238 | −0.018 | −0.011 | −0.008 | 0.000 | 0.000 |
| α2,1 | 0.6 | 1000 | −0.239 | −0.013 | −0.007 | −0.005 | −0.001 | −0.001 |
| α2,1 | 0.6 | 2000 | −0.238 | −0.009 | −0.004 | −0.003 | 0.000 | 0.000 |
| α3,1 | 0 | 250 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.004 |
| α3,1 | 0 | 500 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α3,1 | 0 | 1000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α3,1 | 0 | 2000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| α3,1 | 0.3 | 250 | −0.110 | −0.058 | −0.053 | −0.051 | −0.045 | −0.045 |
| α3,1 | 0.3 | 500 | −0.113 | −0.033 | −0.025 | −0.022 | −0.014 | −0.014 |
| α3,1 | 0.3 | 1000 | −0.115 | −0.021 | −0.013 | −0.010 | −0.004 | −0.004 |
| α3,1 | 0.3 | 2000 | −0.115 | −0.015 | −0.009 | −0.007 | −0.003 | −0.003 |
| α3,1 | 0.6 | 250 | −0.218 | −0.032 | −0.021 | −0.017 | −0.003 | −0.004 |
| α3,1 | 0.6 | 500 | −0.217 | −0.017 | −0.009 | −0.007 | 0.002 | 0.001 |
| α3,1 | 0.6 | 1000 | −0.219 | −0.012 | −0.006 | −0.004 | 0.001 | 0.000 |
| α3,1 | 0.6 | 2000 | −0.220 | −0.009 | −0.005 | −0.003 | 0.000 | 0.000 |
| ϕ2,11 | 0 | 250 | 0.012 | 0.015 | 0.016 | 0.016 | 0.016 | 0.016 |
| ϕ2,11 | 0 | 500 | 0.007 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 |
| ϕ2,11 | 0 | 1000 | 0.007 | 0.008 | 0.008 | 0.008 | 0.007 | 0.007 |
| ϕ2,11 | 0 | 2000 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| ϕ2,11 | 0.3 | 250 | 0.017 | 0.020 | 0.021 | 0.021 | 0.021 | 0.021 |
| ϕ2,11 | 0.3 | 500 | 0.007 | 0.008 | 0.008 | 0.009 | 0.008 | 0.009 |
| ϕ2,11 | 0.3 | 1000 | 0.002 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 |
| ϕ2,11 | 0.3 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |
| ϕ2,11 | 0.6 | 250 | 0.018 | 0.020 | 0.021 | 0.022 | 0.023 | 0.023 |
| ϕ2,11 | 0.6 | 500 | 0.010 | 0.008 | 0.008 | 0.008 | 0.008 | 0.009 |
| ϕ2,11 | 0.6 | 1000 | 0.007 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| ϕ2,11 | 0.6 | 2000 | 0.004 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| ϕ3,11 | 0 | 250 | 0.011 | 0.012 | 0.013 | 0.013 | 0.013 | 0.013 |
| ϕ3,11 | 0 | 500 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| ϕ3,11 | 0 | 1000 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 |
| ϕ3,11 | 0 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.3 | 250 | 0.015 | 0.015 | 0.015 | 0.015 | 0.016 | 0.016 |
| ϕ3,11 | 0.3 | 500 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| ϕ3,11 | 0.3 | 1000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.3 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.6 | 250 | 0.015 | 0.015 | 0.016 | 0.016 | 0.016 | 0.016 |
| ϕ3,11 | 0.6 | 500 | 0.008 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 |
| ϕ3,11 | 0.6 | 1000 | 0.005 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| ϕ3,11 | 0.6 | 2000 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |

Relative RMSE:

| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 97.5 | 100.0 | 101.3 | 102.0 | 101.2 | 101.1 |
| α2,1 | 0 | 500 | 98.5 | 100.0 | 100.8 | 101.4 | 100.0 | 100.0 |
| α2,1 | 0 | 1000 | 99.1 | 100.0 | 100.5 | 100.9 | 99.5 | 99.5 |
| α2,1 | 0 | 2000 | 99.6 | 100.0 | 100.3 | 100.5 | 99.6 | 99.3 |
| α2,1 | 0.3 | 250 | 114.9 | 100.0 | 101.2 | 102.2 | 104.3 | 104.3 |
| α2,1 | 0.3 | 500 | 152.9 | 100.0 | 98.9 | 99.1 | 100.0 | 99.4 |
| α2,1 | 0.3 | 1000 | 211.9 | 100.0 | 97.3 | 96.7 | 93.4 | 93.2 |
| α2,1 | 0.3 | 2000 | 293.2 | 100.0 | 96.0 | 95.2 | 93.2 | 93.4 |
| α2,1 | 0.6 | 250 | 215.0 | 100.0 | 99.0 | 98.9 | 97.1 | 97.1 |
| α2,1 | 0.6 | 500 | 293.3 | 100.0 | 99.4 | 99.7 | 96.6 | 96.6 |
| α2,1 | 0.6 | 1000 | 417.8 | 100.0 | 98.7 | 98.7 | 96.5 | 96.5 |
| α2,1 | 0.6 | 2000 | 598.0 | 100.0 | 98.2 | 98.1 | 96.8 | 97.0 |
| α3,1 | 0 | 250 | 96.8 | 100.0 | 101.2 | 101.8 | 101.7 | 101.8 |
| α3,1 | 0 | 500 | 97.8 | 100.0 | 101.1 | 101.8 | 100.3 | 100.4 |
| α3,1 | 0 | 1000 | 98.9 | 100.0 | 100.7 | 101.1 | 99.4 | 99.5 |
| α3,1 | 0 | 2000 | 99.4 | 100.0 | 100.3 | 100.6 | 99.6 | 99.3 |
| α3,1 | 0.3 | 250 | 112.0 | 100.0 | 101.2 | 102.1 | 104.3 | 104.1 |
| α3,1 | 0.3 | 500 | 147.0 | 100.0 | 99.2 | 99.5 | 100.0 | 99.6 |
| α3,1 | 0.3 | 1000 | 203.3 | 100.0 | 97.3 | 96.9 | 93.7 | 93.5 |
| α3,1 | 0.3 | 2000 | 280.3 | 100.0 | 96.1 | 95.3 | 93.3 | 93.5 |
| α3,1 | 0.6 | 250 | 197.0 | 100.0 | 99.6 | 99.8 | 98.7 | 99.0 |
| α3,1 | 0.6 | 500 | 268.5 | 100.0 | 99.9 | 100.3 | 97.6 | 97.6 |
| α3,1 | 0.6 | 1000 | 376.6 | 100.0 | 99.2 | 99.4 | 97.2 | 97.2 |
| α3,1 | 0.6 | 2000 | 539.6 | 100.0 | 98.3 | 98.2 | 96.9 | 97.1 |
| ϕ2,11 | 0 | 250 | 97.2 | 100.0 | 101.0 | 101.7 | 102.6 | 102.6 |
| ϕ2,11 | 0 | 500 | 98.0 | 100.0 | 100.8 | 101.3 | 101.3 | 101.4 |
| ϕ2,11 | 0 | 1000 | 98.6 | 100.0 | 100.6 | 101.0 | 100.2 | 100.3 |
| ϕ2,11 | 0 | 2000 | 99.0 | 100.0 | 100.3 | 100.6 | 99.9 | 99.9 |
| ϕ2,11 | 0.3 | 250 | 96.9 | 100.0 | 100.9 | 101.4 | 102.7 | 102.8 |
| ϕ2,11 | 0.3 | 500 | 98.5 | 100.0 | 100.7 | 101.2 | 101.2 | 101.2 |
| ϕ2,11 | 0.3 | 1000 | 99.0 | 100.0 | 100.6 | 101.0 | 100.3 | 100.3 |
| ϕ2,11 | 0.3 | 2000 | 99.5 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ2,11 | 0.6 | 250 | 97.9 | 100.0 | 100.8 | 101.3 | 102.5 | 102.5 |
| ϕ2,11 | 0.6 | 500 | 99.1 | 100.0 | 100.8 | 101.3 | 101.2 | 101.4 |
| ϕ2,11 | 0.6 | 1000 | 99.7 | 100.0 | 100.6 | 101.1 | 100.1 | 100.1 |
| ϕ2,11 | 0.6 | 2000 | 100.2 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ3,11 | 0 | 250 | 96.7 | 100.0 | 101.4 | 102.2 | 103.3 | 103.2 |
| ϕ3,11 | 0 | 500 | 97.9 | 100.0 | 100.9 | 101.5 | 101.3 | 101.4 |
| ϕ3,11 | 0 | 1000 | 98.6 | 100.0 | 100.7 | 101.2 | 100.3 | 100.4 |
| ϕ3,11 | 0 | 2000 | 98.8 | 100.0 | 100.4 | 100.7 | 99.9 | 99.9 |
| ϕ3,11 | 0.3 | 250 | 97.3 | 100.0 | 101.0 | 101.7 | 103.2 | 103.1 |
| ϕ3,11 | 0.3 | 500 | 98.2 | 100.0 | 101.0 | 101.6 | 101.4 | 101.5 |
| ϕ3,11 | 0.3 | 1000 | 98.7 | 100.0 | 100.7 | 101.2 | 100.3 | 100.3 |
| ϕ3,11 | 0.3 | 2000 | 99.6 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ3,11 | 0.6 | 250 | 98.5 | 100.0 | 101.3 | 102.0 | 103.4 | 103.5 |
| ϕ3,11 | 0.6 | 500 | 98.8 | 100.0 | 101.1 | 101.6 | 101.4 | 101.5 |
| ϕ3,11 | 0.6 | 1000 | 99.4 | 100.0 | 100.8 | 101.3 | 100.2 | 100.2 |
| ϕ3,11 | 0.6 | 2000 | 99.9 | 100.0 | 100.4 | 100.6 | 99.9 | 99.9 |

Note. Par = parameter; αg,1 = factor mean in group g = 2, 3; ϕg,11 = factor variance in group g = 2, 3; ML = maximum likelihood estimation; MEp = robust moment estimation with p = 0.5 (with ε = 0.001), p = 0.25 (with ε = 0.001), p = 0.1 (with ε = 0.001), or p = 0 (with ε = 0.01); DBIC = direct BIC minimization (with ε = 0.01). Absolute biases larger than 0.015 are shown with a gray background. ME0.5 is the reference method in the computation of the relative RMSE. Relative RMSE values smaller than 98.0 are printed in bold font. Relative RMSE values larger than 102.0 are shown with a gray background.
Table 2. Simulation Study 2: Coverage rates (in percentages) of factor means and factor variances as a function of DIF effect size δ and sample size N.
| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 94.7 | 96.6 | 97.1 | 97.2 | 96.5 | 96.6 |
| α2,1 | 0 | 500 | 95.2 | 96.3 | 96.6 | 96.8 | 96.1 | 96.2 |
| α2,1 | 0 | 1000 | 94.9 | 95.5 | 95.7 | 96.0 | 95.3 | 95.3 |
| α2,1 | 0 | 2000 | 95.0 | 95.4 | 95.6 | 95.8 | 95.2 | 95.3 |
| α2,1 | 0.3 | 250 | 80.3 | 95.4 | 96.3 | 96.8 | 96.1 | 95.6 |
| α2,1 | 0.3 | 500 | 66.0 | 95.6 | 96.4 | 96.7 | 96.2 | 95.9 |
| α2,1 | 0.3 | 1000 | 39.8 | 95.0 | 96.0 | 96.3 | 95.7 | 95.7 |
| α2,1 | 0.3 | 2000 | 13.2 | 93.0 | 94.3 | 94.6 | 94.1 | 94.2 |
| α2,1 | 0.6 | 250 | 42.8 | 96.3 | 96.9 | 97.1 | 96.3 | 92.1 |
| α2,1 | 0.6 | 500 | 14.5 | 95.2 | 95.8 | 96.0 | 95.2 | 95.2 |
| α2,1 | 0.6 | 1000 | 1.0 | 95.6 | 96.3 | 96.4 | 95.7 | 95.7 |
| α2,1 | 0.6 | 2000 | 0.0 | 95.2 | 95.6 | 95.9 | 95.6 | 95.5 |
| α3,1 | 0 | 250 | 95.2 | 97.3 | 97.6 | 97.8 | 97.2 | 97.2 |
| α3,1 | 0 | 500 | 95.0 | 96.3 | 97.0 | 97.3 | 96.0 | 96.1 |
| α3,1 | 0 | 1000 | 95.3 | 96.0 | 96.3 | 96.6 | 95.6 | 95.6 |
| α3,1 | 0 | 2000 | 94.9 | 95.3 | 95.5 | 95.7 | 95.1 | 95.2 |
| α3,1 | 0.3 | 250 | 81.5 | 95.7 | 96.8 | 97.3 | 96.7 | 96.0 |
| α3,1 | 0.3 | 500 | 68.6 | 95.0 | 96.5 | 96.8 | 95.8 | 95.7 |
| α3,1 | 0.3 | 1000 | 44.9 | 95.4 | 96.3 | 96.7 | 95.7 | 95.8 |
| α3,1 | 0.3 | 2000 | 16.3 | 93.5 | 94.5 | 94.8 | 94.3 | 94.1 |
| α3,1 | 0.6 | 250 | 48.8 | 96.4 | 97.2 | 97.4 | 96.9 | 92.4 |
| α3,1 | 0.6 | 500 | 21.3 | 95.8 | 96.7 | 96.8 | 96.0 | 95.9 |
| α3,1 | 0.6 | 1000 | 2.5 | 95.3 | 96.3 | 96.7 | 95.6 | 95.6 |
| α3,1 | 0.6 | 2000 | 0.1 | 95.0 | 95.5 | 95.8 | 95.3 | 95.3 |
| ϕ2,11 | 0 | 250 | 94.8 | 96.8 | 97.2 | 97.5 | 97.2 | 97.2 |
| ϕ2,11 | 0 | 500 | 94.9 | 96.7 | 97.2 | 97.3 | 96.8 | 96.8 |
| ϕ2,11 | 0 | 1000 | 94.6 | 95.5 | 96.0 | 96.2 | 95.3 | 95.4 |
| ϕ2,11 | 0 | 2000 | 94.9 | 95.4 | 95.7 | 95.9 | 95.3 | 95.2 |
| ϕ2,11 | 0.3 | 250 | 94.6 | 96.7 | 97.2 | 97.4 | 97.2 | 97.2 |
| ϕ2,11 | 0.3 | 500 | 95.5 | 96.9 | 97.4 | 97.6 | 97.0 | 97.0 |
| ϕ2,11 | 0.3 | 1000 | 95.1 | 96.1 | 96.4 | 96.6 | 96.0 | 96.0 |
| ϕ2,11 | 0.3 | 2000 | 94.8 | 95.5 | 95.8 | 96.0 | 95.2 | 95.2 |
| ϕ2,11 | 0.6 | 250 | 94.9 | 96.8 | 97.3 | 97.4 | 97.2 | 97.1 |
| ϕ2,11 | 0.6 | 500 | 94.7 | 96.0 | 96.5 | 96.8 | 96.2 | 96.1 |
| ϕ2,11 | 0.6 | 1000 | 95.2 | 96.0 | 96.3 | 96.6 | 96.0 | 96.1 |
| ϕ2,11 | 0.6 | 2000 | 94.9 | 95.3 | 95.6 | 95.8 | 95.1 | 95.2 |
| ϕ3,11 | 0 | 250 | 94.6 | 97.0 | 97.5 | 97.8 | 97.4 | 97.3 |
| ϕ3,11 | 0 | 500 | 95.1 | 96.9 | 97.2 | 97.4 | 97.0 | 97.0 |
| ϕ3,11 | 0 | 1000 | 94.6 | 95.8 | 96.2 | 96.5 | 95.8 | 95.9 |
| ϕ3,11 | 0 | 2000 | 95.0 | 95.5 | 95.9 | 96.2 | 95.4 | 95.4 |
| ϕ3,11 | 0.3 | 250 | 94.4 | 96.8 | 97.3 | 97.5 | 97.3 | 97.2 |
| ϕ3,11 | 0.3 | 500 | 95.1 | 96.9 | 97.3 | 97.5 | 96.8 | 96.8 |
| ϕ3,11 | 0.3 | 1000 | 94.7 | 96.0 | 96.3 | 96.6 | 96.0 | 96.0 |
| ϕ3,11 | 0.3 | 2000 | 95.2 | 95.8 | 96.2 | 96.3 | 95.6 | 95.7 |
| ϕ3,11 | 0.6 | 250 | 94.8 | 97.1 | 97.6 | 97.8 | 97.4 | 97.3 |
| ϕ3,11 | 0.6 | 500 | 95.2 | 97.1 | 97.5 | 97.7 | 97.0 | 96.8 |
| ϕ3,11 | 0.6 | 1000 | 95.2 | 95.9 | 96.5 | 96.9 | 96.0 | 96.0 |
| ϕ3,11 | 0.6 | 2000 | 94.6 | 95.1 | 95.4 | 95.7 | 94.9 | 94.9 |

Note. Par = parameter; αg,1 = factor mean in group g = 2, 3; ϕg,11 = factor variance in group g = 2, 3; ML = maximum likelihood estimation; MEp = robust moment estimation with p = 0.5 (with ε = 0.001), p = 0.25 (with ε = 0.001), p = 0.1 (with ε = 0.001), or p = 0 (with ε = 0.01); DBIC = direct BIC minimization (with ε = 0.01). Coverage rates smaller than 91.0 or larger than 98.0 are shown with a gray background.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
