Please note that, as of 22 March 2024, Psych has been renamed to Psychology International.

L0 and Lp Loss Functions in Model-Robust Estimation of Structural Equation Models

by Alexander Robitzsch 1,2
1 IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2 Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Psych 2023, 5(4), 1122-1139; https://doi.org/10.3390/psych5040075
Submission received: 26 August 2023 / Revised: 16 October 2023 / Accepted: 19 October 2023 / Published: 20 October 2023

Abstract: The Lp loss function has been used for model-robust estimation of structural equation models based on robustly fitting moments. This article addresses the choice of the tuning parameter ε that appears in the differentiable approximations of the nondifferentiable Lp loss functions. Moreover, model-robust estimation based on the Lp loss function is compared with a recently proposed differentiable approximation of the L0 loss function and a direct minimization of a smoothed version of the Bayesian information criterion in regularized estimation. In a simulation study, the L0 loss function slightly outperformed the Lp loss function in terms of bias and root mean square error. Furthermore, standard errors of the model-robust SEM estimators were analytically derived and exhibited satisfactory coverage rates.

1. Introduction

Structural equation models (SEMs) and confirmatory factor analysis (CFA) are important statistical methods for analyzing multivariate data in the social sciences [1,2,3,4,5]. In these models, a multivariate vector X = (X_1, …, X_I)′ of I continuous observed variables (also referred to as items or indicators) is modeled as a function of a vector of latent variables (i.e., factors or traits) η. SEMs represent the mean vector μ and the covariance matrix Σ of the random variable X as a function of an unknown parameter vector θ. In SEMs, constrained estimation of the moment structure of the multivariate normal distribution is applied [6].
The measurement model in an SEM is given as
X = ν + Λ η + ϵ.   (1)
We denote the covariance matrix Var(ϵ) = Ψ. The vectors η and ϵ are multivariate normally distributed and uncorrelated with each other. In CFA, the multivariate normal (MVN) distributions are represented as η ∼ MVN(α, Φ) and ϵ ∼ MVN(0, Ψ). Hence, one can represent the mean vector and the covariance matrix in CFA as
μ(θ) = ν + Λ α and Σ(θ) = Λ Φ Λ′ + Ψ.   (2)
In SEM, a matrix B of regression coefficients can additionally be specified such that
η = B η + ξ with E(ξ) = α and Var(ξ) = Φ.   (3)
Hence, the mean vector and the covariance matrix are represented in SEM as
μ(θ) = ν + Λ (I − B)^{−1} α and Σ(θ) = Λ (I − B)^{−1} Φ [(I − B)^{−1}]′ Λ′ + Ψ,   (4)
where I is the identity matrix.
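To make (4) concrete, the following sketch computes the model-implied moments for a hypothetical one-factor, three-item model; all parameter values (loadings, factor mean and variance, residual variances) are illustrative and not taken from the article.

```python
import numpy as np

def sem_implied_moments(nu, Lambda, B, alpha, Phi, Psi):
    """Return (mu(theta), Sigma(theta)) for the SEM parameterization in (4)."""
    I_mat = np.eye(B.shape[0])
    T = np.linalg.inv(I_mat - B)          # (I - B)^{-1}
    mu = nu + Lambda @ T @ alpha
    Sigma = Lambda @ T @ Phi @ T.T @ Lambda.T + Psi
    return mu, Sigma

# Hypothetical one-factor model with three items (values made up):
nu = np.zeros(3)                          # item intercepts
Lambda = np.array([[1.0], [0.8], [0.6]])  # factor loadings
B = np.zeros((1, 1))                      # no structural regressions
alpha = np.array([0.3])                   # factor mean
Phi = np.array([[1.2]])                   # factor variance
Psi = np.diag([1.0, 1.0, 1.0])            # residual variances

mu, Sigma = sem_implied_moments(nu, Lambda, B, alpha, Phi, Psi)
```

With B = 0, (4) reduces to the CFA moments in (2), so mu = Λα + ν and Sigma = ΛΦΛ′ + Ψ.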
Researchers often parsimoniously parameterize the mean vector and the covariance matrix using a parameter θ as a summary in an SEM. The model assumptions in SEMs are, at best, merely an approximation of a true data-generating model. In SEMs, model deviations (i.e., model errors) in covariances emerge as a difference between a population covariance matrix Σ and a model-implied covariance matrix Σ ( θ ) (see [7,8,9]). Simultaneously, model errors in the mean vector cause a difference between the population mean vector μ and the model-implied mean vector μ ( θ ) . As a result, the SEM is misspecified at the population level. It should be noted that the model errors are defined at the population level in infinite sample sizes. In real-data applications with limited sample sizes, the empirical covariance matrix S estimates the population covariance matrix Σ , while the mean vector x ¯ estimates the population mean vector μ .
In this work, estimators with some resistance to model deviations are investigated. More specifically, the presence of some amount of model error should have no impact on the parameter estimate θ. This robustness property is denoted as model robustness, and it adheres to robust statistics principles [10,11,12]. Model errors in SEMs appear as residuals in the modeled mean vector and the modeled covariance matrix, whereas in traditional robust statistics, observations (i.e., cases or subjects) that do not obey an imposed statistical model should be considered as outliers. That is, an estimator in an SEM should automatically recognize large deviations in μ − μ(θ) and Σ − Σ(θ) as outliers that should not significantly damage the estimated parameter θ.
In previous research, the Lp loss function has been used for model-robust estimation based on moments [13,14]. Non-robust estimators such as maximum likelihood or unweighted and weighted least squares will typically result in biased estimates in the presence of model error [14]. In this article, we more thoroughly discuss the choice of the tuning parameter ε in differentiable approximations of the nondifferentiable Lp loss functions. Furthermore, we compare the Lp loss function with a recently proposed differentiable approximation of the L0 loss function and a direct minimization of a smoothed version of the Bayesian information criterion [15] in regularized estimation. Notably, the L0 loss function minimizes the number of model deviations in a fitted model. If only a few entries in the modeled mean vector(s) or covariance matrix (or matrices) deviate from zero at the population level, while all other entries equal zero, the L0 loss function would be the most appropriate fit function. In contrast, if all model deviations differ from zero and unsystematically fluctuate around zero, the Lp or L0 loss functions with p ≤ 1 would be less appropriate. Finally, standard errors for the proposed model-robust estimators based on the delta method are derived in this article. Their performance is assessed by evaluating coverage rates.
To sum up, this article focuses on implementation details of SEM estimation based on the Lp (0 < p ≤ 1) and the newly proposed L0 loss functions, while [16] was devoted to regularized SEM estimation, which can also be utilized for model-robust estimation. A comparison of regularized estimation and robust loss functions can be found in [14].
The remainder of the article is organized as follows. Model-robust SEM estimation based on the robust L0 and Lp loss functions is treated in Section 2. Section 3 introduces direct BIC minimization as a special approach to regularized maximum likelihood estimation. Section 4 is devoted to details of the standard error computation. In Section 5, research questions are formulated that are addressed in two subsequent simulation studies. In Section 6, the bias and root mean square error of the model-robust SEM estimators are of interest. Section 7 reports findings on standard error estimation in terms of coverage rates. Finally, the article closes with a discussion in Section 8.

2. L0 and Lp Loss Functions in SEM Estimation

We now describe model-robust moment estimation of multiple-group SEMs. The treatment closely follows previous work in Refs. [14,16].
The empirical mean vector x̄ and the empirical covariance matrix S are sufficient statistics for estimating μ and Σ when modeling multivariate normally distributed data with no missing values. In particular, they are also sufficient statistics for μ(θ) and Σ(θ), which are constrained functions of a parameter vector θ. Hence, x̄ and S are also sufficient statistics for the parameter vector θ = (θ_1, …, θ_K) that contains K elements.
Now, assume that there are G groups with sample sizes N_g, mean vectors x̄_g, and covariance matrices S_g (g = 1, …, G). Let ξ_g = (x̄_g′, vech(S_g)′)′ be the vector of sufficient statistics in group g, where vech denotes the operator that stacks all nonredundant matrix entries on top of one another. Furthermore, the vector ξ = (ξ_1′, …, ξ_G′)′ contains the sufficient statistics of all G groups.
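The assembly of ξ_g can be sketched in a few lines; the numbers below are arbitrary, and vech is implemented here by stacking the nonredundant lower-triangular entries.

```python
import numpy as np

def vech(S):
    """Stack the nonredundant (lower-triangular, including diagonal) entries of S."""
    idx = np.tril_indices(S.shape[0])
    return S[idx]

# Toy sufficient statistics for one group (illustrative values):
xbar_g = np.array([0.1, 0.2])
S_g = np.array([[1.0, 0.3],
                [0.3, 1.5]])

xi_g = np.concatenate([xbar_g, vech(S_g)])  # (xbar_g', vech(S_g)')'
```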
The population mean vectors and covariance matrices are denoted by μ_g and Σ_g, respectively. The model-implied mean vectors and covariance matrices are denoted by μ_g(θ) and Σ_g(θ), respectively. It is worth noting that the parameter vector θ lacks an index g, indicating that there can be common and unique parameters across groups. Equal factor loadings and item intercepts across groups are frequently imposed in a multiple-group CFA (i.e., measurement invariance is specified [17,18]).
In the model-robust SEM estimation discussed in this article, the discrepancies x̄_g − μ_g(θ) and vech(S_g) − vech(Σ_g(θ)) are minimized according to a loss function ρ. There are two kinds of errors, which are, for simplicity, only discussed for the mean structure. We can express the discrepancy in the mean structure as
x̄_g − μ_g(θ) = (x̄_g − μ_g) + (μ_g − μ_g(θ)).   (5)
The first term, x̄_g − μ_g, describes a discrepancy due to sampling variation (i.e., with respect to the sampling of subjects). This term can typically be reduced when larger samples are drawn. The second term, μ_g − μ_g(θ), indicates a model error. This term exists at the population level and, therefore, does not vanish with increasing sample sizes. In model-robust estimation, a few entries in the model error are allowed to differ from zero (i.e., a sparsity assumption), corresponding to model misspecification. Note that the sparsity assumption is vital for the performance of model-robust estimators.
In robust moment estimation, the following fit function F_rob is minimized:
F_rob(θ; ξ) = ∑_{g=1}^{G} ∑_{i=1}^{I} w_{1g,i} ρ( x̄_{g,i} − μ_{g,i}(θ) ) + ∑_{g=1}^{G} ∑_{i=1}^{I} ∑_{j=i}^{I} w_{2g,ij} ρ( s_{g,ij} − σ_{g,ij}(θ) ).   (6)
In the first term on the right side of Equation (6), discrepancies of the sample means x̄_{g,i} from the model-implied means μ_{g,i}(θ) for item i in group g are considered. In the second term, discrepancies of the sample covariances s_{g,ij} from the model-implied covariances σ_{g,ij}(θ) for items i and j in group g are considered. The weights w_{1g,i} (i = 1, …, I) and w_{2g,ij} (i, j = 1, …, I) are known but can be set to one if all variables have (approximately) the same standard deviation in the sample comprising all groups or if the original scaling of the variables reflects the intended weighting of sampling and model errors. The loss function ρ in (6) should be chosen such that it is resistant to outlying effects in the mean and the covariance structure.
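A minimal sketch of (6) for a single group with unit weights might look as follows; `implied_mu` and `implied_sigma` are placeholders for the model-implied moment functions, and the toy model below (a common mean θ with a fixed identity covariance) is purely illustrative.

```python
import numpy as np

def f_rob(theta, xbar, S, implied_mu, implied_sigma, rho):
    """Robust fit function (6) for one group with unit weights."""
    mu, Sigma = implied_mu(theta), implied_sigma(theta)
    val = np.sum(rho(xbar - mu))              # mean discrepancies
    iu = np.triu_indices(S.shape[0])          # nonredundant entries (j >= i)
    val += np.sum(rho(S[iu] - Sigma[iu]))     # covariance discrepancies
    return val

# Toy example: common mean theta for two variables, fixed identity covariance.
xbar = np.array([0.1, 0.1])
S = np.eye(2)
val = f_rob(0.0, xbar, S,
            lambda t: np.full(2, t),          # implied mean vector
            lambda t: np.eye(2),              # implied covariance matrix
            np.abs)                           # MAD loss rho(x) = |x|
```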
The robust mean absolute deviation (MAD) loss function ρ(x) = |x| was examined in [19,20]. When compared to commonly employed SEM estimation approaches, this fit function is more robust to a few model violations, such as unmodeled item intercepts or unmodeled residual correlations (see [7]).
In this article, we investigate the L p loss function
ρ(x) = |x|^p for p > 0.   (7)
It has been shown that p < 1 provides more efficient model-robust estimates than p = 1 (see [7,13]). The Lp loss function with p = 2 is the square loss function ρ(x) = x², which corresponds to unweighted least squares (ULS) estimation. However, this loss function does not possess the model-robustness property [7]. The Lp loss function ρ(x) = √|x| = |x|^{0.5} (i.e., p = 0.5) is implemented in invariance alignment [21,22,23] and penalized structural equation modeling [24] in the popular Mplus software (Version 8.10, https://www.statmodel.com/support/index.shtml (accessed on 25 September 2023)). The critical aspect of the Lp loss function ρ defined in (7) is that it is nondifferentiable. Consequently, the fit function F_rob in (6) is also nondifferentiable, which precludes the application of general-purpose optimizers that rely on differentiable objective functions. As a remedy, the nondifferentiable Lp loss function ρ can be replaced by a differentiable approximation ρ_ε, which is close to ρ but differentiable on the entire real line. The approximating function ρ_ε is defined as
ρ_ε(x) = (x² + ε)^{p/2},   (8)
where ε > 0 is a tuning parameter that should be small enough such that ρ_ε is close to ρ but large enough to ensure estimation stability. Replacing the nondifferentiable ρ by the differentiable approximation ρ_ε has previously been recommended in [13,21,25,26].
The loss function ρ and its differentiable approximation ρ_ε are displayed for six different values of p in Figure 1. It can be seen (and shown) that ρ(x) ≤ ρ_ε(x) for all p > 0 and ε > 0. Furthermore, the loss function ρ is much steeper at x = 0 for a smaller p. With a larger ε, the approximation ρ_ε becomes smoother. Choosing an appropriate tuning parameter ε > 0 is therefore important when applying model-robust moment estimation based on the Lp loss function.
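The inequality ρ(x) ≤ ρ_ε(x) and the effect of ε can be checked numerically; the grid and ε values below are illustrative.

```python
import numpy as np

def rho(x, p):
    """Nondifferentiable L_p loss (7)."""
    return np.abs(x) ** p

def rho_eps(x, p, eps):
    """Differentiable approximation (8)."""
    return (x ** 2 + eps) ** (p / 2)

x = np.linspace(-2.0, 2.0, 401)
# Largest approximation error occurs at x = 0 and equals eps^{p/2}:
gap_large = np.max(rho_eps(x, 0.5, 1e-2) - rho(x, 0.5))
gap_small = np.max(rho_eps(x, 0.5, 1e-4) - rho(x, 0.5))
```

The maximal gap shrinks from about 0.32 (ε = 10^{-2}) to 0.1 (ε = 10^{-4}) for p = 0.5, illustrating the trade-off between closeness to ρ and smoothness.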
It might be tempting to use a very small p close to zero. Such a loss function is close to the L0 loss function, which takes the value 1 for all arguments different from x = 0 and the value 0 at x = 0. If the sparsity assumption of model errors holds, L0 would theoretically be the most desirable loss function [27,28]. However, as shown in Figure 1, ρ_ε for p = 0.01 does not have a clear minimum at zero, making this differentiable loss function difficult to apply in practical optimization.
O’Neill and Burke [15,29] proposed the following differentiable approximation χ_ε of the L0 loss function in recent work related to regularized estimation:
χ_ε(x) = x² / (x² + ε),   (9)
where ε > 0 is again a tuning parameter. The differentiable approximation χ_ε is displayed for different ε values in Figure 2. It can be seen that the functional form of χ_ε is much better suited to optimization than ρ_ε with a p close to 0. Hence, χ_ε might be a useful alternative robust loss function whose performance in the presence of model errors has to be evaluated.
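A few illustrative evaluations of (9) show its behavior: χ_ε equals 0 at x = 0, equals 1/2 at |x| = √ε, and approaches 1 once |x| is large relative to √ε (the ε value is illustrative).

```python
import numpy as np

def chi_eps(x, eps):
    """Smoothed L0 loss (9): 0 at x = 0, approaching 1 for |x| >> sqrt(eps)."""
    return x ** 2 / (x ** 2 + eps)

# With eps = 1e-2, sqrt(eps) = 0.1 marks the transition point:
vals = [chi_eps(x, 1e-2) for x in (0.0, 0.1, 1.0)]
```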
In the practical minimization of F_rob in (6), when ρ is replaced with ρ_ε (using an appropriate p) or χ_ε, it is advisable to use reasonable starting values and to minimize F_rob using a sequence of differentiable approximations with decreasing ε values (i.e., subsequently fitting with ε = 10^{-1}, 10^{-2}, 10^{-3}, 10^{-4}, while using the previously obtained parameter estimate as the initial value for the subsequent minimization problem).
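This continuation strategy can be sketched on a toy problem: a robust location estimate of four residual entries, one of which is a gross model error. The inner fitting algorithm below is a simple iteratively reweighted least-squares fixed point for the smoothed L_p objective, used here only for illustration (the article's software uses its own optimizer); all numbers are made up.

```python
import numpy as np

resid = np.array([0.02, -0.01, 0.03, 1.50])  # one gross model error (1.50)
p = 0.5

theta = float(np.median(resid))              # reasonable starting value
for eps in (1e-1, 1e-2, 1e-3, 1e-4):        # decreasing eps, warm-started
    for _ in range(200):
        # IRLS weights from the smoothed L_p loss (8); the update is a
        # weighted mean, a standard majorize-minimize step for p <= 2.
        w = ((resid - theta) ** 2 + eps) ** (p / 2 - 1)
        theta = float(np.sum(w * resid) / np.sum(w))
# theta settles near the bulk of the residuals instead of being dragged
# toward the outlying value 1.5
```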

3. A Direct BIC Minimization in Regularized Maximum Likelihood Estimation

Most frequently, SEMs are estimated with maximum likelihood (ML) estimation. This estimation method provides the most efficient estimates for correctly specified models. However, the efficiency properties are lost in the case of misspecified SEMs.
As an alternative, regularized ML estimation can be used, which introduces an overidentified SEM by allowing free group-specific item intercepts and residual covariances. To identify the model, a penalty function targeted at the sparsity structure of the model errors is imposed on the overidentified parameters. The methodological literature has extensively documented the regularized estimation of single-group and multiple-group SEMs [30,31,32,33]. Cross-loadings, residual covariances, or item intercepts are regularized in these applications. Regularized SEM estimation enables flexible yet parsimonious model specifications.
In ML estimation, the fit function F_ML is the negative log-likelihood function based on the multivariate normal distribution, defined as [2,4]
F_ML(θ; ξ) = ∑_{g=1}^{G} (N_g/2) [ I log(2π) + log |Σ_g(θ)| + tr( S_g Σ_g(θ)^{−1} ) + ( x̄_g − μ_g(θ) )′ Σ_g(θ)^{−1} ( x̄_g − μ_g(θ) ) ].   (10)
In empirical applications, the model-implied mean vectors μ_g and covariance matrices Σ_g will often be misspecified [34,35,36], and θ can be understood as a pseudo-true parameter that is defined as the minimizer of F_ML in (10).
In regularized SEM estimation, a penalty function P is added to the log-likelihood fit function F_ML that imposes a sparsity assumption on a subset of model parameters [31,33]. In order to enforce sparsity, the penalty function P is often chosen to be nondifferentiable. Define a known indicator ι_k ∈ {0, 1} for all parameters θ_k, where ι_k = 1 indicates that a penalty function is applied to the kth entry θ_k of θ. The penalized log-likelihood function is defined as
F_pen(θ, λ; ξ) = F_ML(θ; ξ) + N ∑_{k=1}^{K} ι_k P(|θ_k|, λ),   (11)
where λ > 0 is a regularization parameter, and N is a scaling factor that frequently equals the total sample size N = ∑_{g=1}^{G} N_g. The regularized (or penalized) ML estimate is defined as the minimizer of F_pen(θ, λ; ξ).
The least absolute shrinkage and selection operator (LASSO; ref. [37]) and the smoothly clipped absolute deviation (SCAD; ref. [38]) penalty functions have been frequently used in regularized SEM estimation. For a fixed value of λ, a subset of the θ_k parameters to which the penalty function is applied (i.e., ι_k = 1) will result in estimates of zero. That is, a sparse θ vector is obtained as the result of regularized estimation. However, the estimate of θ depends on the fixed regularization parameter λ; that is,
θ̃(λ) = argmin_θ F_pen(θ, λ; ξ).   (12)
As a result, the parameter estimate θ̃(λ) of θ depends on the unknown parameter λ. To avoid this problem, the regularized SEM can be repeatedly estimated on a finite grid of regularization parameters λ (e.g., on an equidistant grid between 0.01 and 1.00 with increments of 0.01). The Bayesian information criterion (BIC), defined by BIC = 2 F_ML(θ; ξ) + log(N) H, may be used to choose an optimal regularization parameter λ, where H denotes the number of parameters. Because the minimization of the BIC is equivalent to the minimization of BIC/2, the final parameter estimate θ̂ is determined as
θ̂ = θ̃(λ̂) with λ̂ = argmin_λ [ F_ML(θ̃(λ); ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ(θ̃_k(λ)) ],   (13)
where the function χ indicates whether |x| differs from 0:
χ(x) = 1 if |x| ≠ 0, and χ(x) = 0 if |x| = 0.   (14)
In particular, the quantity ∑_{k=1}^{K} ι_k χ(θ̃_k(λ)) in (13) counts the number of parameter estimates θ̃_k(λ) (k = 1, …, K) to which the penalty function is applied (i.e., ι_k = 1) and that differ from 0.
As becomes clear, regularized SEM estimation necessitates fitting an SEM on a grid of the regularization parameter λ. This approach is computationally intensive, especially for SEMs with a large number of parameters. The final parameter estimate is obtained by minimizing the BIC across all estimated regularized SEMs. A naïve idea might be to directly minimize the BIC to avoid the repeated estimation that regularization parameter selection involves. It should be noted that only the subset of parameters on which sparsity is imposed is relevant in the BIC computation. Hence, a parameter estimate θ̂ obtained by minimizing the BIC is given by
θ̂ = argmin_θ [ F_ML(θ; ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ(θ_k) ].   (15)
The optimization function in (15) employs an L0 penalty function [39,40] with a fixed regularization parameter log(N)/2. This optimization function contains the nondifferentiable indicator function χ that counts the number of regularized parameters that differ from 0. The ingenious idea of O’Neill and Burke [15] was to replace the nondifferentiable L0 loss function χ with its differentiable approximation χ_ε (see (9) and Ref. [16] for a more comprehensive treatment). Therefore, the parameter θ can be estimated as
θ̂ = argmin_θ F_DBIC(θ; ξ) with F_DBIC(θ; ξ) = F_ML(θ; ξ) + (log(N)/2) ∑_{k=1}^{K} ι_k χ_ε(θ_k).   (16)
The estimation approach from (16) is referred to as the smoothed direct BIC minimization (DBIC) approach. This method has been used to estimate regularized distributional regression models [15].
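A toy sketch of the smoothed objective (16): here a simple quadratic in three group-specific item intercepts stands in for F_ML, so the objective separates per intercept and can be minimized on a grid. All numbers (N, sample means, ε) are illustrative and not from the article.

```python
import numpy as np

N = 1000
xbar = np.array([0.01, -0.02, 0.40])  # only the third intercept is markedly nonzero
eps = 1e-2

def chi_eps(x):
    """Smoothed L0 penalty (9)."""
    return x ** 2 / (x ** 2 + eps)

# Per-intercept DBIC objective: quadratic ML stand-in + (log N / 2) * chi_eps,
# minimized by brute force on a fine grid for each intercept separately.
grid = np.linspace(-1.0, 1.0, 20001)
est = np.array([grid[np.argmin(0.5 * N * (xb - grid) ** 2
                               + 0.5 * np.log(N) * chi_eps(grid))]
                for xb in xbar])
# the small intercepts are shrunk toward zero; the large one is retained
```

The penalty behaves like a fixed per-parameter cost of log(N)/2, so only intercepts whose data support outweighs that cost remain large.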
It has been shown for SEMs that the DBIC approach performs similarly to regularized estimation based on the indirect approach of minimizing the BIC on a finite grid of regularization parameters [16]. Hence, we confine ourselves in this article to comparing the model-robust moment estimation methods (with different values of the power p) with the DBIC estimator.

4. Computation of Standard Errors

In this section, the computation of the variance matrix of parameter estimates θ̂ from model-robust moment estimation and DBIC estimation using the fit functions in (6) and (16), respectively, is described (see also [14,16] for a similar treatment). Both methods minimize a differentiable (approximating) function F(θ, ξ) with respect to θ as a function of the sufficient statistics ξ (see also [41]). The vector of estimated sufficient statistics ξ̂ is approximately normally distributed (see [3]); that is,
ξ̂ − ξ_0 ∼ MVN(0, V_ξ)   (17)
for a true population parameter ξ_0 of sufficient statistics. Let F_θ = ∂F/∂θ be the vector of partial derivatives with respect to θ. The parameter estimate θ̂ fulfills the nonlinear equation F_θ(θ̂, ξ̂) = 0. The delta method [34] can be employed to derive the variance matrix of θ̂. Assume that there exists a (pseudo-)true parameter θ_0 such that F_θ(θ_0, ξ_0) = 0.
Now, we conduct a Taylor expansion of F_θ (see [3,5,42]). Denote by F_θθ and F_θξ the matrices of partial derivatives of F_θ with respect to θ and ξ, respectively. The Taylor expansion can be written as
F_θ(θ̂, ξ̂) = F_θ(θ_0, ξ_0) + F_θθ(θ_0, ξ_0) (θ̂ − θ_0) + F_θξ(θ_0, ξ_0) (ξ̂ − ξ_0) = 0.   (18)
By solving (18) for θ̂ and using F_θ(θ_0, ξ_0) = 0, we get the approximation
θ̂ − θ_0 = −F_θθ(θ_0, ξ_0)^{−1} F_θξ(θ_0, ξ_0) (ξ̂ − ξ_0).   (19)
By defining Â = −F_θθ(θ̂, ξ̂)^{−1} F_θξ(θ̂, ξ̂), that is, by substituting θ_0 and ξ_0 with θ̂ and ξ̂, respectively, we obtain by the multivariate delta method [34]
Var(θ̂) = Â V_ξ Â′.   (20)
The square roots of the diagonal elements of Var(θ̂) computed from (20) may be used as standard errors for the elements of θ̂.
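The computation (17)–(20) can be illustrated numerically on a toy M-estimator with a scalar θ, where F is a sum of squares: θ̂ is then the mean of ξ̂, and the matrix A should equal (1/3, 1/3, 1/3). The covariance V_ξ of the sufficient statistics is an assumed toy value, and the derivatives are taken numerically to mirror the general recipe.

```python
import numpy as np

def F_theta(theta, xi):
    """dF/dtheta for the toy objective F(theta, xi) = sum_j (xi_j - theta)^2."""
    return -2.0 * np.sum(xi - theta)

def num_deriv(f, x, h=1e-6):
    """Central finite difference."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

xi_hat = np.array([0.1, 0.3, 0.2])
theta_hat = xi_hat.mean()                  # solves F_theta(theta, xi_hat) = 0

# Second-order derivatives of F_theta, evaluated at the estimates:
F_tt = num_deriv(lambda t: F_theta(t, xi_hat), theta_hat)
F_tx = np.array([
    num_deriv(lambda v: F_theta(theta_hat,
                                np.where(np.arange(3) == j, v, xi_hat)),
              xi_hat[j])
    for j in range(3)
])

A_hat = -F_tx / F_tt                       # A-hat = -F_tt^{-1} F_tx, as in (19)
V_xi = 0.04 * np.eye(3)                    # assumed Var(xi-hat)
var_theta = float(A_hat @ V_xi @ A_hat)    # Var(theta-hat) = A V_xi A', (20)
```

Since θ̂ is the sample mean here, the delta-method variance reduces to the familiar V/J = 0.04/3, which serves as a sanity check.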

Statistical Inference for Parameter Differences of Different Models Based on the Same Dataset

In this section, statistical inference for differences in parameters from different models based on the same dataset is discussed. For example, researchers could use the Lp loss function with p = 2, p = 0.5, and p = 0. It should then be evaluated whether the estimated factor means from the models that employ the different loss functions differ statistically significantly. Importantly, the different models rely on the same dataset and its vector of sufficient statistics ξ̂. Hence, the standard error of a difference of parameters from different models can be smaller than the standard error from a single model because the data are used twice. The M-estimation framework can also be utilized to derive the variance estimate of a parameter difference [43,44]. The different loss functions provide different estimates θ̂_m for models m = 1, …, M. At the population level, the parameters are denoted as θ_m. Note that the population parameters differ in the case of misspecified SEMs. In the following, we discuss the case M = 2 to reduce notation.
Following the lines of the variance derivation in the previous section, we can approximate the estimate of model m by using (19):
θ̂_m − θ_m = A_m (ξ̂ − ξ) for m = 1, 2,   (21)
where A_m is defined as in (19). Note that A_1 ≠ A_2 due to the choice of different loss functions. Researchers can now ask whether the parameter difference Δ̂ = θ̂_1 − θ̂_2 (or some of its entries) significantly differs from 0. From (21), we obtain
Δ̂ − Δ = (A_1 − A_2) (ξ̂ − ξ),   (22)
where Δ = θ_1 − θ_2. We then obtain the variance estimate of Δ̂ as
Var(Δ̂) = (A_1 − A_2) V_ξ (A_1 − A_2)′.   (23)
The unknown matrix A_1 − A_2 can be estimated by its sample analog Â_1 − Â_2.
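Equation (23) can be illustrated with made-up linearization vectors A_1 and A_2 for a scalar parameter. Note that the resulting variance is much smaller than the naïve sum of the two single-model variances, reflecting that both estimates use the same data.

```python
import numpy as np

# Hypothetical linearization vectors for two loss functions (toy values):
A1 = np.array([1 / 3, 1 / 3, 1 / 3])       # e.g., an ULS-type estimator
A2 = np.array([0.5, 0.25, 0.25])           # a robust fit weighs entries differently
V_xi = 0.04 * np.eye(3)                    # assumed Var(xi-hat)

d = A1 - A2
var_diff = float(d @ V_xi @ d)             # Var(Delta-hat) from (23)
var_naive = float(A1 @ V_xi @ A1 + A2 @ V_xi @ A2)  # ignores the shared data
```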

5. Research Purpose

In this article, several research questions connected to model-robust estimation are addressed. First, simulations should clarify which tuning parameter ε > 0 in the differentiable approximation should be chosen to minimize bias and variance in estimated structural SEM parameters. Second, to our knowledge, no research has compared the performance of the Lp loss function (approximation) with the newly proposed L0 loss function approximation of O’Neill and Burke in terms of bias and root mean square error (RMSE). Third, it should be examined whether the standard error computation based on the delta method provides valid standard error estimates regarding coverage rates even though a differentiable approximation of the involved model-robust loss function is used. The first two research questions are addressed in Simulation Study 1 (see Section 6), while the third is investigated in Simulation Study 2 (see Section 7).

6. Simulation Study 1: Bias and RMSE

Simulation Study 1 examined the impact of group-specific item intercepts in a multiple-group one-dimensional factor model on the bias and RMSE of factor means and factor variances. In the data-generating model (DGM), measurement invariance was violated. That is, differential item functioning (DIF; refs. [45,46]) occurred; hence, DIF effects in item intercepts were simulated.

6.1. Method

The DGM in the simulation study was identical to Simulation Study 2 in [14] and mimicked [21]. The data were simulated from a one-dimensional factor model involving five items and three groups. The factor variable η_1 was normally distributed with group means α_{1,1} = 0, α_{2,1} = 0.3, and α_{3,1} = 0.8 and group variances ϕ_{1,11} = 1, ϕ_{2,11} = 1.5, and ϕ_{3,11} = 1.2, respectively. All five factor loadings were set to 1, and all measurement error variances were set to 1 in all groups; the measurement errors were uncorrelated with each other. The factor variable and the residual variables were normally distributed.
Only a subset of the group-specific item intercepts was simulated to differ from zero. These nonzero item intercepts indicated measurement noninvariance (i.e., the presence of DIF effects). One of the five items in each group had a DIF effect, but different items were affected across the three groups. In the first group, the intercept of the fourth item had a DIF effect δ. In the second group, the first item had a DIF effect δ, while the second item had a DIF effect δ in the third group. The DIF effect δ was chosen as 0, 0.3, or 0.6. The value δ = 0 represented the situation of measurement invariance. The sample size per group was chosen as N = 250, N = 500, N = 1000, or N = 2000.
All analysis models were multiple-group one-factor models. For identification reasons, the mean of the factor variable in the first group was fixed at 0, and its standard deviation in the first group was fixed at 1. Invariant factor loadings and residual variances were specified across groups. In model-robust moment estimation (ME), we also assumed invariant item intercepts. We utilized the powers p = 0.5, p = 0.25, and p = 0.1 for the loss function ρ_ε defined in (8), combined with values of the tuning parameter ε chosen as ε = 10^{-2} (=0.01), ε = 10^{-3} (=0.001), and ε = 10^{-4} (=0.0001). The resulting estimators are denoted by ME0.5, ME0.25, and ME0.1, respectively, in the following Results sections. Moreover, we used the loss function χ_ε defined in (9) with the same ε tuning parameter values as for ρ_ε to approximate the L0 loss function (denoted by ME0 in the following Section 6.2). Furthermore, we used DBIC estimation in which all group-specific item intercepts were allowed to differ across groups. The indicator variables ι_k involved in the DBIC approach (see (16)) take the value 1 only for item intercepts; for all other elements of θ, they are set to 0. Therefore, the DBIC approach effectively minimizes the number of estimated item intercepts in its penalty. The tuning parameter ε was again chosen as 10^{-2}, 10^{-3}, and 10^{-4}.
We did not use the power p = 1 in model-robust moment estimation because it resulted in biased estimates in the presence of DIF [14]. However, we included the non-robust ML estimation method for a more comprehensive comparison of the estimation methods.
In total, R = 5000 replications were conducted for each of the 3 (DIF effect size δ) × 4 (sample size N) = 12 conditions of the simulation study. We investigated the estimation quality of the factor means (i.e., α_{2,1} and α_{3,1}) and factor variances (i.e., ϕ_{2,11} and ϕ_{3,11}) in the second and third groups. Bias, RMSE, and relative RMSE were computed to assess the performance of the different estimators. Let θ̂_r be a model parameter estimate in replication r = 1, …, R. The bias was estimated by
Bias(θ̂) = (1/R) ∑_{r=1}^{R} (θ̂_r − θ),   (24)
where θ denotes the true parameter value. The RMSE was estimated by
RMSE(θ̂) = √[ (1/R) ∑_{r=1}^{R} (θ̂_r − θ)² ].   (25)
Note that RMSE(θ̂) ≥ |Bias(θ̂)| holds because the mean square error (i.e., the square of the RMSE) is the sum of the squared bias and the variance of an estimator. A relative RMSE can be defined by dividing the RMSE of an estimator by the RMSE of a chosen reference model. To ease the reading of the numeric values of the relative RMSE, the values were multiplied by 100. This quantity can then easily be interpreted as a percentage gain or loss of a particular estimator compared to the reference model.
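The quantities (24)–(25) and the relative RMSE can be computed for simulated replications of two hypothetical estimators; the distributions below are chosen purely for illustration and have nothing to do with the simulation conditions of the article.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 0.3
R = 5000
est_a = theta_true + rng.normal(0.0, 0.10, size=R)           # unbiased
est_b = theta_true + 0.05 + rng.normal(0.0, 0.08, size=R)    # biased, less variable

def bias(est):
    """Bias estimate (24)."""
    return np.mean(est - theta_true)

def rmse(est):
    """RMSE estimate (25)."""
    return np.sqrt(np.mean((est - theta_true) ** 2))

# Relative RMSE (x 100), with estimator A as the reference model:
rel_rmse_b = 100.0 * rmse(est_b) / rmse(est_a)
```

Here estimator B trades bias for variance; since RMSE² = Bias² + Var, its RMSE of about √(0.05² + 0.08²) ≈ 0.094 still undercuts estimator A's 0.10, giving a relative RMSE below 100.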
The entire simulation study was carried out in the R [47] software (Version 4.3.1). The SEMs were estimated using the sirt::mgsem() function in the R package sirt (Version 4.0-19; ref. [48]). Information about model specification can be found in the material located at https://osf.io/ng6s3 (accessed on 25 September 2023).
Researchers interested in fitting a particular model without interest in studying the entire simulation code are referred to the manual help site of the mgsem function in the R package sirt [48]. They can type ?sirt::mgsem in the R console (or look at https://alexanderrobitzsch.r-universe.dev/sirt/doc/manual.html#mgsem (accessed on 25 September 2023)) and find an example for applying the ME estimator for an existing dataset.

6.2. Results

Figure 3 displays the RMSE of the factor mean α_{2,1} in the second group for the different estimators for DIF effect sizes δ = 0.3 and δ = 0.6 and sample sizes N = 500 and N = 1000 as a function of the tuning parameter ε. It can be seen that ε = 10^{-3} was optimal with respect to the RMSE for ME0.5, ME0.25, and ME0.1, while ε = 10^{-2} resulted in the smallest RMSE for ME0 and DBIC. The findings were very similar for the factor mean α_{3,1} in the third group and for the sample sizes N = 250 and N = 2000.
Table 1 displays the bias and relative RMSE of factor means and factor variances as a function of the DIF effect size and the sample size for the different estimation methods. We chose the tuning parameter ε = 10^{-3} for the estimators ME0.5, ME0.25, and ME0.1, and ε = 10^{-2} for the estimators ME0 and DBIC, because these values resulted in the smallest RMSE of factor means according to the findings in Figure 3.
Overall, all estimators except ML were approximately unbiased for factor variances in all conditions and for factor means in the absence of DIF effects (i.e., δ = 0, in which case measurement invariance holds), except for the sample size N = 250. Slightly biased estimates were obtained for ME with a larger p, such as p = 0.5 in ME0.5. However, this bias decreased with increasing sample size. ME0 and DBIC were unbiased across all conditions, followed by the estimators ME0.1, ME0.25, and ME0.5 with increasing absolute bias. There was substantial bias in the estimated factor means for ML estimation, while the ML estimates of factor variances were approximately unbiased.
To compare the estimation accuracy in terms of the relative RMSE, we chose ME0.5 (with ε = 10^{-3}) as the reference model. Hence, the relative RMSE was 100 for this method in Table 1. It turned out that ME0 and DBIC were superior to the other estimators, with non-negligible efficiency gains. For example, for the factor mean α_{2,1} in the second group with δ = 0.3 and N = 1000, the efficiency gain in terms of the RMSE was 6.4% (=100 − 93.6) for ME0 and 6.5% for DBIC. Across all conditions, no noteworthy differences between the ME0 and DBIC estimators were found.
ML estimates were slightly more efficient than ME estimates in the absence of DIF. However, the efficiency gains of ML decreased with increasing sample size.
To conclude, for large sample sizes such as N = 1000 or N = 2000, it seems promising to replace the L p loss function with power p = 0.5 (i.e., ME0.5) by the L 0 loss function implemented in ME0 or by regularized estimation with DBIC. The different estimators performed similarly for N = 500, while the power p = 0.5 was the frontrunner for N = 250.

7. Simulation Study 2: Coverage

Simulation Study 2 investigated the coverage rates of the confidence intervals for the model-robust moment estimators and the DBIC method.

7.1. Method

The same DGM as in Simulation Study 1 was employed to simulate data (see Section 6.1). We evaluated the standard error computation described in Section 4 for the five estimators using the tuning parameter ε that resulted in the smallest RMSE of factor means. In detail, we used ε = 10^−3 for the estimators ME0.5, ME0.25, and ME0.1, and ε = 10^−2 for the estimators ME0 and DBIC (see Figure 3). Moreover, we included ML estimation for reasons of comparison.
The same analysis models as in Simulation Study 1 were specified. Confidence intervals at the 95% confidence level were computed using a normal distribution approximation (i.e., the estimated confidence interval was θ̂ ± 1.96 × SE(θ̂)). The coverage rate was computed as the percentage of replications in which the computed confidence interval covered the true parameter value.
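The coverage computation described above can be sketched as a simplified Monte Carlo experiment. The sketch assumes a normally distributed estimator with a known standard error, which is of course much simpler than the article's SEM setting:

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, se, n_rep = 0.5, 0.05, 5000

covered = 0
for _ in range(n_rep):
    # simulate a normally distributed estimate with standard error se
    theta_hat = theta_true + rng.normal(0.0, se)
    lower, upper = theta_hat - 1.96 * se, theta_hat + 1.96 * se
    covered += lower <= theta_true <= upper

coverage = 100 * covered / n_rep  # percentage of intervals covering the truth
print(round(coverage, 1))  # close to the nominal 95%
```

If the estimator is biased or the standard error is misestimated, the coverage rate departs from the nominal level, which is exactly what Table 2 shows for ML in the presence of DIF.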
As in Simulation Study 1, 5000 replications were conducted in each of the 12 simulation conditions (i.e., 3 (DIF effect size δ ) × 4 (sample size N) = 12 conditions). The SEMs with standard error estimates of model parameters were again estimated with the sirt::mgsem() function in the R [47] package sirt (Version 4.0-19; ref. [48]). Material for replication can be found at https://osf.io/ng6s3 (accessed on 25 September 2023).

7.2. Results

Table 2 shows the coverage rates of factor means and factor variances for the five estimators and ML estimation as a function of DIF effect size and sample size. Overall, the coverage rates of the model-robust estimators were acceptable because they were neither smaller than 91.0 nor larger than 98.0 (see [49]). Across all conditions and estimators (when excluding ML estimation), the coverage rates in Table 2 ranged between 92.1 and 97.8, with a mean of M = 96.16 and a standard deviation of SD = 0.93. The coverage rates of ME0.5 (M = 95.87, SD = 0.87), ME0 (M = 96.03, SD = 0.82), and DBIC (M = 95.81, SD = 1.09) were slightly closer to the nominal level than those of ME0.25 (M = 96.42, SD = 0.78) and ME0.1 (M = 96.67, SD = 0.76). The coverage rates for ML estimation (M = 74.57, SD = 33.43) were not acceptable for factor means, which had biased estimates in the presence of DIF effects.

8. Discussion

In this article, we compared model-robust moment estimation with a recently proposed variant of regularized ML estimation by O’Neill and Burke [15] that directly maximizes the BIC (i.e., the DBIC estimator). In the DBIC estimation, these authors suggested a differentiable approximation of the L 0 loss function, which was also used in model-robust moment estimation. Interestingly, the L 0 loss function outperformed L p loss functions for p > 0 regarding bias and RMSE. Furthermore, model-robust moment estimation with the L 0 loss function performed very similarly to the DBIC estimator. Moreover, the estimation of standard errors was successfully implemented for all estimators because coverage rates were acceptable for all parameters in all simulation conditions.
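The idea behind the DBIC estimator, replacing the discrete parameter count in the BIC penalty with a smoothed L 0 term so that the criterion becomes differentiable and can be minimized directly, can be sketched as follows. This is a minimal illustration assuming the x²/(x² + ε) smoothing; it is not the article's or O'Neill and Burke's actual implementation:

```python
import numpy as np

def smoothed_bic_penalty(params, N, eps=1e-2):
    """Smoothed count of nonzero parameters times log(N).

    Each parameter x contributes x^2 / (x^2 + eps), which is close to 0
    for x near zero and close to 1 otherwise, so the usual BIC penalty
    log(N) * (number of nonzero parameters) becomes differentiable.
    """
    params = np.asarray(params, dtype=float)
    k_smooth = np.sum(params ** 2 / (params ** 2 + eps))
    return np.log(N) * k_smooth

# Two clearly nonzero effects count as roughly two parameters:
penalty = smoothed_bic_penalty([0.0, 0.001, 0.8, -0.5], N=1000)
print(round(penalty, 2))  # slightly below log(1000) * 2
```

Adding this penalty to −2 times the log-likelihood yields a smooth surrogate of the BIC that standard gradient-based optimizers can handle.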
In line with previous studies, we anticipate that sample sizes must be sufficiently large in order to achieve the model-robustness properties of the L p and L 0 loss functions. If the sample size is too small (e.g., N = 100 subjects in a multiple-group SEM analysis), the sampling error in moments that are used as sufficient statistics in the SEM can exceed model errors (i.e., unmodeled group-specific item intercepts). In this case, model-robust methods are not expected to perform well. In fact, Simulation Study 1 revealed that p = 0.5 is preferable to other L p loss functions with p < 0.5 or the L 0 loss function for N = 250 , while the situation changes with larger sample sizes such as N = 1000 .
Simulation Study 1 indicated that the tuning parameter ε = 0.001 should be used in the differentiable approximation of the nondifferentiable L p loss function for p = 0.5, p = 0.25, or p = 0.1. In contrast, ε = 0.01 was found optimal for the L 0 loss function and for the direct BIC minimization approach in regularized estimation. We expect that these findings will transfer to other models that involve standardized variables. In our experience with regularized estimation and the invariance alignment approach [21], choosing a tuning parameter ε that is too small (such as ε = 10^−5 or smaller) should be avoided because it is more likely to result in convergence to local optima of the fit function.
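The local-optima issue with a very small ε can be illustrated in a one-dimensional analogue. The sketch below is a toy location problem with the smoothed L p loss for p = 0.5, not the article's SEM fit function; it counts (approximate) local minima of the objective on a grid and shows that shrinking ε preserves spurious local optima near individual observations:

```python
import numpy as np

# Toy data: a tight cluster plus one outlying "model error"
data = np.array([0.0, 0.05, -0.03, 0.02, 1.0])

def objective(mu, p, eps):
    # Smoothed L_p objective for a location parameter mu
    return np.sum(((data - mu[..., None]) ** 2 + eps) ** (p / 2), axis=-1)

grid = np.linspace(-0.5, 1.5, 2001)
for eps in (1e-2, 1e-3, 1e-8):
    vals = objective(grid, 0.5, eps)
    mu_hat = grid[np.argmin(vals)]
    # count interior grid points that are strict local minima
    is_min = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
    print(eps, round(float(mu_hat), 3), int(is_min.sum()))
```

The global minimizer stays near the cluster (the robustness property), but with ε near zero the objective develops multiple local minima, so a gradient-based optimizer started near the outlier can get stuck.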
In our simulation studies, we only considered model errors in item intercepts. Future simulation studies could also investigate model errors in the covariance structure (i.e., unmodeled residual correlations). Of course, model errors in the mean and the covariance structure can also be simultaneously examined.
A requirement to achieve model robustness of SEM estimators is that model errors in the mean or covariance structure are sparsely distributed. That is, only a few entries are allowed to differ from zero, while the majority of model errors must be (approximately) zero. If factor loadings were not invariant across groups, more densely distributed model errors would result. As a consequence, model-robust moment estimation will likely not work, and regularized maximum likelihood estimation might be preferred. However, the DBIC minimization method for regularized maximum likelihood estimation can also be utilized in this case if the number of group-specific factor loadings is counted in the BIC penalty term.
In this article, we only applied the L 0 and L p loss functions to continuous items. However, the principle directly transfers to SEMs of ordinal data that are based on fitting thresholds and polychoric correlations [50] instead of means and covariances for continuous data, respectively. Moreover, the model-robust estimators could also be applied to two-step estimation methods of multilevel structural equation models [51,52].
To sum up, model-robust SEM estimators based on the L p (for p < 1) and L 0 loss functions are attractive to researchers who do not want model estimates to be influenced by the presence of a few model deviations (i.e., model errors). In contrast, commonly employed (non-robust) SEM estimators such as maximum likelihood estimation are impacted by model errors. In this sense, misfitting models do not necessarily result in biased estimates when model-robust estimation is used.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BIC    Bayesian information criterion
CFA    confirmatory factor analysis
DBIC   direct BIC minimization
DGM    data-generating model
DIF    differential item functioning
ME     moment estimation
ML     maximum likelihood
RMSE   root mean square error
SEM    structural equation model

References

1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011.
2. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 1989.
3. Browne, M.W.; Arminger, G. Specification and estimation of mean- and covariance-structure models. In Handbook of Statistical Modeling for the Social and Behavioral Sciences; Arminger, G., Clogg, C.C., Sobel, M.E., Eds.; Springer: Boston, MA, USA, 1995; pp. 185–249.
4. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016.
5. Yuan, K.H.; Bentler, P.M. Structural equation modeling. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 297–358.
6. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics; Wiley: New York, NY, USA, 2019.
7. Robitzsch, A. Comparing the robustness of the structural after measurement (SAM) approach to structural equation modeling (SEM) against local model misspecifications with alternative estimation approaches. Stats 2022, 5, 631–672.
8. Uanhoro, J.O. Modeling misspecification as a parameter in Bayesian structural equation models. Educ. Psychol. Meas. 2023. Epub ahead of print.
9. Wu, H.; Browne, M.W. Quantifying adventitious error in a covariance structure as a random effect. Psychometrika 2015, 80, 571–600.
10. Huber, P.J.; Ronchetti, E.M. Robust Statistics; Wiley: New York, NY, USA, 2009.
11. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; Wiley: New York, NY, USA, 2006.
12. Ronchetti, E. The main contributions of robust statistics to statistical science and a new challenge. Metron 2021, 79, 127–135.
13. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283.
14. Robitzsch, A. Model-robust estimation of multiple-group structural equation models. Algorithms 2023, 16, 210.
15. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71.
16. Robitzsch, A. Implementation aspects in regularized structural equation models. Algorithms 2023, 16, 446.
17. Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993, 58, 525–543.
18. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
19. Siemsen, E.; Bollen, K.A. Least absolute deviation estimation in structural equation modeling. Sociol. Methods Res. 2007, 36, 227–265.
20. Van Kesteren, E.J.; Oberski, D.L. Flexible extensions to structural equation models using computation graphs. Struct. Equ. Model. 2022, 29, 233–247.
21. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
22. Muthén, B.; Asparouhov, T. IRT studies of many groups: The alignment method. Front. Psychol. 2014, 5, 978.
23. Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psych. Test Assess. Model. 2020, 62, 303–334.
24. Asparouhov, T.; Muthén, B. Penalized Structural Equation Models; Technical Report, 2023. Available online: https://rb.gy/tbaj7 (accessed on 28 March 2023).
25. Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. 2020, 55, 811–824.
26. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120.
27. Davies, P.L. Data Analysis and Approximate Models; CRC Press: Boca Raton, FL, USA, 2014.
28. Davies, P.L.; Terbeck, W. Interactions and outliers in the two-way analysis of variance. Ann. Stat. 1998, 26, 1279–1305.
29. O’Neill, M.; Burke, K. Robust distributional regression with automatic variable selection. arXiv 2022, arXiv:2212.07317.
30. Geminiani, E.; Marra, G.; Moustaki, I. Single- and multiple-group penalized factor analysis: A trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika 2021, 86, 65–95.
31. Huang, P.H.; Chen, H.; Weng, L.J. A penalized likelihood method for structural equation modeling. Psychometrika 2017, 82, 329–354.
32. Huang, P.H. A penalized likelihood method for multi-group structural equation modelling. Brit. J. Math. Stat. Psychol. 2018, 71, 499–522.
33. Jacobucci, R.; Grimm, K.J.; McArdle, J.J. Regularized structural equation modeling. Struct. Equ. Model. 2016, 23, 555–566.
34. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
35. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157.
36. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
37. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015.
38. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
39. Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with L0-type penalties. Stat. Model. 2015, 15, 389–410.
40. Shen, X.; Pan, W.; Zhu, Y. Likelihood-based selection and sharp parameter estimation. J. Am. Stat. Assoc. 2012, 107, 223–232.
41. Shapiro, A. Statistical inference of covariance structures. In Current Topics in the Theory and Application of Latent Variable Models; Edwards, M.C., MacCallum, R.C., Eds.; Routledge: Milton Park, Abingdon, UK, 2012; pp. 222–240.
42. Shapiro, A. Statistical inference of moment structures. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 229–260.
43. Clogg, C.C.; Petkova, E.; Haritou, A. Statistical methods for comparing regression coefficients between models. Am. J. Sociol. 1995, 100, 1261–1293.
44. Mize, T.D.; Doan, L.; Long, J.S. A general framework for comparing predictions and marginal effects across models. Sociol. Methodol. 2019, 49, 152–189.
45. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
46. Mellenbergh, G.J. Item bias and item response theory. Int. J. Educ. Res. 1989, 13, 127–143.
47. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2023. Available online: https://www.R-project.org/ (accessed on 15 March 2023).
48. Robitzsch, A. sirt: Supplementary Item Response Theory Models, R Package Version 4.0-19; 2023. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 16 September 2023).
49. Muthén, L.K.; Muthén, B.O. How to use a Monte Carlo study to decide on sample size and determine power. Struct. Equ. Model. 2002, 9, 599–620.
50. Muthén, B. A general structural equation model with dichotomous, ordered categorical, and continuous latent variable indicators. Psychometrika 1984, 49, 115–132.
51. Muthén, B.O. Multilevel covariance structure analysis. Sociol. Methods Res. 1994, 22, 376–398.
52. Yuan, K.H.; Bentler, P.M. Multilevel covariance structure analysis by fitting multiple single-level models. Sociol. Methodol. 2007, 37, 53–82.
Figure 1. Loss function ρ_ε (see (8)) for different values of p as a function of the tuning parameter ε.
Figure 2. Loss function χ_ε (see (9)) as a function of the tuning parameter ε.
Figure 3. Simulation Study 1: Root mean square error for the factor mean α2,1 of the different model-robust moment estimation (ME0.5, ME0.25, ME0.1, ME0) and the direct BIC minimization (DBIC) estimators as a function of the tuning parameter ε, sample size N, and DIF effect size δ.
Table 1. Simulation Study 1: Bias and relative root mean square error (RMSE) of factor means and factor variances as a function of DIF effect size δ and sample size N.
Bias:

| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 0.001 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α2,1 | 0 | 500 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| α2,1 | 0 | 1000 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| α2,1 | 0 | 2000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| α2,1 | 0.3 | 250 | −0.119 | −0.061 | −0.055 | −0.054 | −0.048 | −0.048 |
| α2,1 | 0.3 | 500 | −0.119 | −0.034 | −0.026 | −0.023 | −0.015 | −0.015 |
| α2,1 | 0.3 | 1000 | −0.120 | −0.021 | −0.014 | −0.011 | −0.005 | −0.005 |
| α2,1 | 0.3 | 2000 | −0.120 | −0.015 | −0.009 | −0.007 | −0.004 | −0.004 |
| α2,1 | 0.6 | 250 | −0.239 | −0.034 | −0.023 | −0.019 | −0.005 | −0.005 |
| α2,1 | 0.6 | 500 | −0.238 | −0.018 | −0.011 | −0.008 | 0.000 | 0.000 |
| α2,1 | 0.6 | 1000 | −0.239 | −0.013 | −0.007 | −0.005 | −0.001 | −0.001 |
| α2,1 | 0.6 | 2000 | −0.238 | −0.009 | −0.004 | −0.003 | 0.000 | 0.000 |
| α3,1 | 0 | 250 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.004 |
| α3,1 | 0 | 500 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α3,1 | 0 | 1000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| α3,1 | 0 | 2000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| α3,1 | 0.3 | 250 | −0.110 | −0.058 | −0.053 | −0.051 | −0.045 | −0.045 |
| α3,1 | 0.3 | 500 | −0.113 | −0.033 | −0.025 | −0.022 | −0.014 | −0.014 |
| α3,1 | 0.3 | 1000 | −0.115 | −0.021 | −0.013 | −0.010 | −0.004 | −0.004 |
| α3,1 | 0.3 | 2000 | −0.115 | −0.015 | −0.009 | −0.007 | −0.003 | −0.003 |
| α3,1 | 0.6 | 250 | −0.218 | −0.032 | −0.021 | −0.017 | −0.003 | −0.004 |
| α3,1 | 0.6 | 500 | −0.217 | −0.017 | −0.009 | −0.007 | 0.002 | 0.001 |
| α3,1 | 0.6 | 1000 | −0.219 | −0.012 | −0.006 | −0.004 | 0.001 | 0.000 |
| α3,1 | 0.6 | 2000 | −0.220 | −0.009 | −0.005 | −0.003 | 0.000 | 0.000 |
| ϕ2,11 | 0 | 250 | 0.012 | 0.015 | 0.016 | 0.016 | 0.016 | 0.016 |
| ϕ2,11 | 0 | 500 | 0.007 | 0.009 | 0.009 | 0.009 | 0.009 | 0.009 |
| ϕ2,11 | 0 | 1000 | 0.007 | 0.008 | 0.008 | 0.008 | 0.007 | 0.007 |
| ϕ2,11 | 0 | 2000 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 | 0.003 |
| ϕ2,11 | 0.3 | 250 | 0.017 | 0.020 | 0.021 | 0.021 | 0.021 | 0.021 |
| ϕ2,11 | 0.3 | 500 | 0.007 | 0.008 | 0.008 | 0.009 | 0.008 | 0.009 |
| ϕ2,11 | 0.3 | 1000 | 0.002 | 0.003 | 0.003 | 0.003 | 0.002 | 0.002 |
| ϕ2,11 | 0.3 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.001 | 0.001 |
| ϕ2,11 | 0.6 | 250 | 0.018 | 0.020 | 0.021 | 0.022 | 0.023 | 0.023 |
| ϕ2,11 | 0.6 | 500 | 0.010 | 0.008 | 0.008 | 0.008 | 0.008 | 0.009 |
| ϕ2,11 | 0.6 | 1000 | 0.007 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| ϕ2,11 | 0.6 | 2000 | 0.004 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |
| ϕ3,11 | 0 | 250 | 0.011 | 0.012 | 0.013 | 0.013 | 0.013 | 0.013 |
| ϕ3,11 | 0 | 500 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| ϕ3,11 | 0 | 1000 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 | 0.005 |
| ϕ3,11 | 0 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.3 | 250 | 0.015 | 0.015 | 0.015 | 0.015 | 0.016 | 0.016 |
| ϕ3,11 | 0.3 | 500 | 0.005 | 0.006 | 0.006 | 0.006 | 0.006 | 0.006 |
| ϕ3,11 | 0.3 | 1000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.3 | 2000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 |
| ϕ3,11 | 0.6 | 250 | 0.015 | 0.015 | 0.016 | 0.016 | 0.016 | 0.016 |
| ϕ3,11 | 0.6 | 500 | 0.008 | 0.007 | 0.007 | 0.007 | 0.007 | 0.007 |
| ϕ3,11 | 0.6 | 1000 | 0.005 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 |
| ϕ3,11 | 0.6 | 2000 | 0.002 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 |

Relative RMSE:

| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 97.5 | 100.0 | 101.3 | 102.0 | 101.2 | 101.1 |
| α2,1 | 0 | 500 | 98.5 | 100.0 | 100.8 | 101.4 | 100.0 | 100.0 |
| α2,1 | 0 | 1000 | 99.1 | 100.0 | 100.5 | 100.9 | 99.5 | 99.5 |
| α2,1 | 0 | 2000 | 99.6 | 100.0 | 100.3 | 100.5 | 99.6 | 99.3 |
| α2,1 | 0.3 | 250 | 114.9 | 100.0 | 101.2 | 102.2 | 104.3 | 104.3 |
| α2,1 | 0.3 | 500 | 152.9 | 100.0 | 98.9 | 99.1 | 100.0 | 99.4 |
| α2,1 | 0.3 | 1000 | 211.9 | 100.0 | 97.3 | 96.7 | 93.4 | 93.2 |
| α2,1 | 0.3 | 2000 | 293.2 | 100.0 | 96.0 | 95.2 | 93.2 | 93.4 |
| α2,1 | 0.6 | 250 | 215.0 | 100.0 | 99.0 | 98.9 | 97.1 | 97.1 |
| α2,1 | 0.6 | 500 | 293.3 | 100.0 | 99.4 | 99.7 | 96.6 | 96.6 |
| α2,1 | 0.6 | 1000 | 417.8 | 100.0 | 98.7 | 98.7 | 96.5 | 96.5 |
| α2,1 | 0.6 | 2000 | 598.0 | 100.0 | 98.2 | 98.1 | 96.8 | 97.0 |
| α3,1 | 0 | 250 | 96.8 | 100.0 | 101.2 | 101.8 | 101.7 | 101.8 |
| α3,1 | 0 | 500 | 97.8 | 100.0 | 101.1 | 101.8 | 100.3 | 100.4 |
| α3,1 | 0 | 1000 | 98.9 | 100.0 | 100.7 | 101.1 | 99.4 | 99.5 |
| α3,1 | 0 | 2000 | 99.4 | 100.0 | 100.3 | 100.6 | 99.6 | 99.3 |
| α3,1 | 0.3 | 250 | 112.0 | 100.0 | 101.2 | 102.1 | 104.3 | 104.1 |
| α3,1 | 0.3 | 500 | 147.0 | 100.0 | 99.2 | 99.5 | 100.0 | 99.6 |
| α3,1 | 0.3 | 1000 | 203.3 | 100.0 | 97.3 | 96.9 | 93.7 | 93.5 |
| α3,1 | 0.3 | 2000 | 280.3 | 100.0 | 96.1 | 95.3 | 93.3 | 93.5 |
| α3,1 | 0.6 | 250 | 197.0 | 100.0 | 99.6 | 99.8 | 98.7 | 99.0 |
| α3,1 | 0.6 | 500 | 268.5 | 100.0 | 99.9 | 100.3 | 97.6 | 97.6 |
| α3,1 | 0.6 | 1000 | 376.6 | 100.0 | 99.2 | 99.4 | 97.2 | 97.2 |
| α3,1 | 0.6 | 2000 | 539.6 | 100.0 | 98.3 | 98.2 | 96.9 | 97.1 |
| ϕ2,11 | 0 | 250 | 97.2 | 100.0 | 101.0 | 101.7 | 102.6 | 102.6 |
| ϕ2,11 | 0 | 500 | 98.0 | 100.0 | 100.8 | 101.3 | 101.3 | 101.4 |
| ϕ2,11 | 0 | 1000 | 98.6 | 100.0 | 100.6 | 101.0 | 100.2 | 100.3 |
| ϕ2,11 | 0 | 2000 | 99.0 | 100.0 | 100.3 | 100.6 | 99.9 | 99.9 |
| ϕ2,11 | 0.3 | 250 | 96.9 | 100.0 | 100.9 | 101.4 | 102.7 | 102.8 |
| ϕ2,11 | 0.3 | 500 | 98.5 | 100.0 | 100.7 | 101.2 | 101.2 | 101.2 |
| ϕ2,11 | 0.3 | 1000 | 99.0 | 100.0 | 100.6 | 101.0 | 100.3 | 100.3 |
| ϕ2,11 | 0.3 | 2000 | 99.5 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ2,11 | 0.6 | 250 | 97.9 | 100.0 | 100.8 | 101.3 | 102.5 | 102.5 |
| ϕ2,11 | 0.6 | 500 | 99.1 | 100.0 | 100.8 | 101.3 | 101.2 | 101.4 |
| ϕ2,11 | 0.6 | 1000 | 99.7 | 100.0 | 100.6 | 101.1 | 100.1 | 100.1 |
| ϕ2,11 | 0.6 | 2000 | 100.2 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ3,11 | 0 | 250 | 96.7 | 100.0 | 101.4 | 102.2 | 103.3 | 103.2 |
| ϕ3,11 | 0 | 500 | 97.9 | 100.0 | 100.9 | 101.5 | 101.3 | 101.4 |
| ϕ3,11 | 0 | 1000 | 98.6 | 100.0 | 100.7 | 101.2 | 100.3 | 100.4 |
| ϕ3,11 | 0 | 2000 | 98.8 | 100.0 | 100.4 | 100.7 | 99.9 | 99.9 |
| ϕ3,11 | 0.3 | 250 | 97.3 | 100.0 | 101.0 | 101.7 | 103.2 | 103.1 |
| ϕ3,11 | 0.3 | 500 | 98.2 | 100.0 | 101.0 | 101.6 | 101.4 | 101.5 |
| ϕ3,11 | 0.3 | 1000 | 98.7 | 100.0 | 100.7 | 101.2 | 100.3 | 100.3 |
| ϕ3,11 | 0.3 | 2000 | 99.6 | 100.0 | 100.3 | 100.5 | 99.9 | 99.9 |
| ϕ3,11 | 0.6 | 250 | 98.5 | 100.0 | 101.3 | 102.0 | 103.4 | 103.5 |
| ϕ3,11 | 0.6 | 500 | 98.8 | 100.0 | 101.1 | 101.6 | 101.4 | 101.5 |
| ϕ3,11 | 0.6 | 1000 | 99.4 | 100.0 | 100.8 | 101.3 | 100.2 | 100.2 |
| ϕ3,11 | 0.6 | 2000 | 99.9 | 100.0 | 100.4 | 100.6 | 99.9 | 99.9 |

Note. Par = parameter; αg,1 = factor mean in group g = 2, 3; ϕg,11 = factor variance in group g = 2, 3; ML = maximum likelihood estimation; MEp = robust moment estimation with p = 0.5 (with ε = 0.001), p = 0.25 (with ε = 0.001), p = 0.1 (with ε = 0.001), or p = 0 (with ε = 0.01); DBIC = direct BIC minimization (with ε = 0.01). Absolute biases larger than 0.015 are shown with a gray background. ME0.5 is the reference method in the computation of the relative RMSE. Relative RMSE values smaller than 98.0 are printed in bold font. Relative RMSE values larger than 102.0 are shown with a gray background.
Table 2. Simulation Study 2: Coverage rates (in percentages) of factor means and factor variances as a function of DIF effect size δ and sample size N.
| Par | δ | N | ML | ME0.5 | ME0.25 | ME0.1 | ME0 | DBIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| α2,1 | 0 | 250 | 94.7 | 96.6 | 97.1 | 97.2 | 96.5 | 96.6 |
| α2,1 | 0 | 500 | 95.2 | 96.3 | 96.6 | 96.8 | 96.1 | 96.2 |
| α2,1 | 0 | 1000 | 94.9 | 95.5 | 95.7 | 96.0 | 95.3 | 95.3 |
| α2,1 | 0 | 2000 | 95.0 | 95.4 | 95.6 | 95.8 | 95.2 | 95.3 |
| α2,1 | 0.3 | 250 | 80.3 | 95.4 | 96.3 | 96.8 | 96.1 | 95.6 |
| α2,1 | 0.3 | 500 | 66.0 | 95.6 | 96.4 | 96.7 | 96.2 | 95.9 |
| α2,1 | 0.3 | 1000 | 39.8 | 95.0 | 96.0 | 96.3 | 95.7 | 95.7 |
| α2,1 | 0.3 | 2000 | 13.2 | 93.0 | 94.3 | 94.6 | 94.1 | 94.2 |
| α2,1 | 0.6 | 250 | 42.8 | 96.3 | 96.9 | 97.1 | 96.3 | 92.1 |
| α2,1 | 0.6 | 500 | 14.5 | 95.2 | 95.8 | 96.0 | 95.2 | 95.2 |
| α2,1 | 0.6 | 1000 | 1.0 | 95.6 | 96.3 | 96.4 | 95.7 | 95.7 |
| α2,1 | 0.6 | 2000 | 0.0 | 95.2 | 95.6 | 95.9 | 95.6 | 95.5 |
| α3,1 | 0 | 250 | 95.2 | 97.3 | 97.6 | 97.8 | 97.2 | 97.2 |
| α3,1 | 0 | 500 | 95.0 | 96.3 | 97.0 | 97.3 | 96.0 | 96.1 |
| α3,1 | 0 | 1000 | 95.3 | 96.0 | 96.3 | 96.6 | 95.6 | 95.6 |
| α3,1 | 0 | 2000 | 94.9 | 95.3 | 95.5 | 95.7 | 95.1 | 95.2 |
| α3,1 | 0.3 | 250 | 81.5 | 95.7 | 96.8 | 97.3 | 96.7 | 96.0 |
| α3,1 | 0.3 | 500 | 68.6 | 95.0 | 96.5 | 96.8 | 95.8 | 95.7 |
| α3,1 | 0.3 | 1000 | 44.9 | 95.4 | 96.3 | 96.7 | 95.7 | 95.8 |
| α3,1 | 0.3 | 2000 | 16.3 | 93.5 | 94.5 | 94.8 | 94.3 | 94.1 |
| α3,1 | 0.6 | 250 | 48.8 | 96.4 | 97.2 | 97.4 | 96.9 | 92.4 |
| α3,1 | 0.6 | 500 | 21.3 | 95.8 | 96.7 | 96.8 | 96.0 | 95.9 |
| α3,1 | 0.6 | 1000 | 2.5 | 95.3 | 96.3 | 96.7 | 95.6 | 95.6 |
| α3,1 | 0.6 | 2000 | 0.1 | 95.0 | 95.5 | 95.8 | 95.3 | 95.3 |
| ϕ2,11 | 0 | 250 | 94.8 | 96.8 | 97.2 | 97.5 | 97.2 | 97.2 |
| ϕ2,11 | 0 | 500 | 94.9 | 96.7 | 97.2 | 97.3 | 96.8 | 96.8 |
| ϕ2,11 | 0 | 1000 | 94.6 | 95.5 | 96.0 | 96.2 | 95.3 | 95.4 |
| ϕ2,11 | 0 | 2000 | 94.9 | 95.4 | 95.7 | 95.9 | 95.3 | 95.2 |
| ϕ2,11 | 0.3 | 250 | 94.6 | 96.7 | 97.2 | 97.4 | 97.2 | 97.2 |
| ϕ2,11 | 0.3 | 500 | 95.5 | 96.9 | 97.4 | 97.6 | 97.0 | 97.0 |
| ϕ2,11 | 0.3 | 1000 | 95.1 | 96.1 | 96.4 | 96.6 | 96.0 | 96.0 |
| ϕ2,11 | 0.3 | 2000 | 94.8 | 95.5 | 95.8 | 96.0 | 95.2 | 95.2 |
| ϕ2,11 | 0.6 | 250 | 94.9 | 96.8 | 97.3 | 97.4 | 97.2 | 97.1 |
| ϕ2,11 | 0.6 | 500 | 94.7 | 96.0 | 96.5 | 96.8 | 96.2 | 96.1 |
| ϕ2,11 | 0.6 | 1000 | 95.2 | 96.0 | 96.3 | 96.6 | 96.0 | 96.1 |
| ϕ2,11 | 0.6 | 2000 | 94.9 | 95.3 | 95.6 | 95.8 | 95.1 | 95.2 |
| ϕ3,11 | 0 | 250 | 94.6 | 97.0 | 97.5 | 97.8 | 97.4 | 97.3 |
| ϕ3,11 | 0 | 500 | 95.1 | 96.9 | 97.2 | 97.4 | 97.0 | 97.0 |
| ϕ3,11 | 0 | 1000 | 94.6 | 95.8 | 96.2 | 96.5 | 95.8 | 95.9 |
| ϕ3,11 | 0 | 2000 | 95.0 | 95.5 | 95.9 | 96.2 | 95.4 | 95.4 |
| ϕ3,11 | 0.3 | 250 | 94.4 | 96.8 | 97.3 | 97.5 | 97.3 | 97.2 |
| ϕ3,11 | 0.3 | 500 | 95.1 | 96.9 | 97.3 | 97.5 | 96.8 | 96.8 |
| ϕ3,11 | 0.3 | 1000 | 94.7 | 96.0 | 96.3 | 96.6 | 96.0 | 96.0 |
| ϕ3,11 | 0.3 | 2000 | 95.2 | 95.8 | 96.2 | 96.3 | 95.6 | 95.7 |
| ϕ3,11 | 0.6 | 250 | 94.8 | 97.1 | 97.6 | 97.8 | 97.4 | 97.3 |
| ϕ3,11 | 0.6 | 500 | 95.2 | 97.1 | 97.5 | 97.7 | 97.0 | 96.8 |
| ϕ3,11 | 0.6 | 1000 | 95.2 | 95.9 | 96.5 | 96.9 | 96.0 | 96.0 |
| ϕ3,11 | 0.6 | 2000 | 94.6 | 95.1 | 95.4 | 95.7 | 94.9 | 94.9 |

Note. Par = parameter; αg,1 = factor mean in group g = 2, 3; ϕg,11 = factor variance in group g = 2, 3; ML = maximum likelihood estimation; MEp = robust moment estimation with p = 0.5 (with ε = 0.001), p = 0.25 (with ε = 0.001), p = 0.1 (with ε = 0.001), or p = 0 (with ε = 0.01); DBIC = direct BIC minimization (with ε = 0.01). Coverage rates smaller than 91.0 or larger than 98.0 are shown with a gray background.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
