Article

Implementation Aspects in Regularized Structural Equation Models

by
Alexander Robitzsch
1,2
1
IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Algorithms 2023, 16(9), 446; https://doi.org/10.3390/a16090446
Submission received: 21 August 2023 / Revised: 12 September 2023 / Accepted: 14 September 2023 / Published: 18 September 2023
(This article belongs to the Special Issue Mathematical Models and Their Applications IV)

Abstract: This article reviews several implementation aspects in estimating regularized single-group and multiple-group structural equation models (SEM). It is demonstrated that approximate estimation approaches that rely on a differentiable approximation of non-differentiable penalty functions perform similarly to the coordinate descent optimization approach for regularized SEMs. Furthermore, using a fixed regularization parameter can sometimes be superior to an optimal regularization parameter selected by the Bayesian information criterion when it comes to the estimation of structural parameters. Moreover, the widely used penalty functions of regularized SEM implemented in several R packages were compared with the estimation based on a recently proposed penalty function in the Mplus software. Finally, we also investigate the performance of a clever replacement of the optimization function in regularized SEM with a smoothed differentiable approximation of the Bayesian information criterion proposed by O’Neill and Burke in 2023. The findings were derived through two simulation studies and are intended to guide the practical implementation of regularized SEM in future software packages.

1. Introduction

Confirmatory factor analysis (CFA) and structural equation models (SEM) are among the most important statistical approaches for analyzing multivariate data in the social sciences [1,2,3,4,5,6,7]. In these models, a multivariate vector X = (X_1, …, X_I) of I observed variables (also referred to as items) is modeled as a function of a vector of latent variables (i.e., factors) η. SEMs represent the mean vector μ and the covariance matrix Σ of the random variable X as a function of an unknown parameter vector θ. In this sense, they apply constrained estimation for the moment structure of the multivariate normal distribution [8].
SEMs impose a measurement model that relates the observed variables X to latent variables η :
$$ X = \nu + \Lambda \eta + \epsilon . \tag{1} $$
In addition, we denote the covariance matrix Var ( ϵ ) = Ψ , and  η and ϵ are multivariate normally distributed random vectors. Moreover, η and ϵ are uncorrelated random vectors. The issue of model identification has to be evaluated on a case-by-case basis [9,10]. We now describe two different specifications: the CFA and the more general SEM approach.
In the CFA approach, the multivariate normal (MVN) distribution is assumed for the latent variables and residuals; that is, $\eta \sim \mathrm{MVN}(\alpha, \Phi)$ and $\epsilon \sim \mathrm{MVN}(0, \Psi)$. Hence, one can represent the mean vector μ(θ) and the covariance matrix Σ(θ) in CFA as a function of an unknown parameter vector θ as
$$ \mu(\theta) = \nu + \Lambda \alpha \quad \text{and} \quad \Sigma(\theta) = \Lambda \Phi \Lambda^\top + \Psi . \tag{2} $$
The parameter vector θ contains freely estimated elements of ν , Λ , α , Φ , and  Ψ .
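To make the CFA moment structure above concrete, here is a minimal NumPy sketch with hypothetical parameter values (three items, one factor); this is illustrative code, not code from the article:

```python
import numpy as np

# Illustrative sketch (hypothetical values): model-implied moments of a
# one-factor CFA with three items.
nu = np.array([0.0, 0.1, -0.1])        # item intercepts
Lam = np.array([[1.0], [0.8], [0.9]])  # factor loadings (simple structure)
alpha = np.array([0.5])                # factor mean
Phi = np.array([[1.2]])                # factor variance
Psi = np.diag([0.4, 0.5, 0.6])         # diagonal residual covariance matrix

# mu(theta) = nu + Lam alpha  and  Sigma(theta) = Lam Phi Lam' + Psi
mu = nu + Lam @ alpha
Sigma = Lam @ Phi @ Lam.T + Psi
```

Any SEM software evaluates exactly these two maps (and their derivatives) when fitting the moment structure to the observed means and covariances.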
In the general SEM approach, a matrix B of regression coefficients is specified such that
$$ \eta = B \eta + \xi \quad \text{with} \quad E(\xi) = \alpha \ \text{and} \ \mathrm{Var}(\xi) = \Phi . \tag{3} $$
Note that (3) can be rewritten as
$$ \eta = (I - B)^{-1} \xi \quad \text{with} \quad E(\xi) = \alpha \ \text{and} \ \mathrm{Var}(\xi) = \Phi , \tag{4} $$
where I denotes the identity matrix. Hence, the mean vector and the covariance matrix are represented in SEM as
$$ \mu(\theta) = \nu + \Lambda (I - B)^{-1} \alpha \quad \text{and} \quad \Sigma(\theta) = \Lambda (I - B)^{-1} \Phi \, \big[ (I - B)^{-1} \big]^\top \Lambda^\top + \Psi . \tag{5} $$
The estimation of SEMs often follows an ideal measurement model. For example, a simple-structure factor loading matrix Λ is desired in a multidimensional CFA. In this case, an item loads on one and only one factor η, meaning that the number of non-zero entries in a row of Λ is one. However, the assumption of a simple-structure loading matrix could be somewhat violated in practice. For this reason, some cross-loadings could be assumed to be different from zero. Such sparsity assumptions on SEM model parameters can be tackled with regularized SEM [11]. Moreover, deviations in the entries of the observed and modeled mean vector (i.e., μ − μ(θ)) can be quantified in non-zero entries of the vector of item intercepts ν (see [12]). Again, model errors could be sparsely distributed, which would allow for the application of regularized SEM. In a similar manner, model deviations Σ − Σ(θ) can be tackled by assuming sparsely distributed entries in the matrix of residual covariances Ψ. Notably, regularized SEM estimation is becoming more popular in the social sciences and is recognized as an important approach in the machine learning literature [13].
In this article, we review several implementation aspects in estimating regularized SEMs with single and multiple groups. A recent article by Orzek et al. [14] recommended avoiding differentiable approximations of the non-differentiable optimization function in regularized SEM. We critically evaluate the credibility of this recommendation. Furthermore, we compare the regularization estimation approach currently used in most software, such as the regsem R package [15], with a recently proposed optimization function in the commercial SEM software package Mplus [16]. Finally, we also investigate the performance of a clever replacement of the optimization function in regularized SEM with a smoothed differentiable approximation of the Bayesian information criterion [17]. The findings were derived through two simulation studies. They are intended to provide guidance for the practical implementation of regularized SEM in future software packages.
The remainder of the article is organized as follows. Different approaches of regularized maximum likelihood estimation methods of SEMs are reviewed in Section 2. In Section 3, research questions are formulated that are addressed in two subsequent simulation studies. In Section 4, results from a simulation study involving a multiple-group CFA model with violations of measurement invariance in item intercepts are presented. Section 5 reports findings from a simulation study of a single-group CFA in the presence of cross-loadings. In Section 6, the findings of the two simulation studies are summarized, and the research questions from Section 3 are answered. Finally, the article closes with a discussion in Section 7.

2. Estimation of Regularized Structural Equation Models

We now describe the regularized maximum likelihood (ML) estimation approach for multiple-group SEMs. Note that some identification constraints must be imposed to estimate the covariance structure model (5) (see [2]). For modeling multivariate normally distributed data without missing values, the empirical mean vector x̄ and the empirical covariance matrix S are sufficient statistics for estimating μ and Σ. Hence, they are also sufficient statistics for the parameter vector θ = (θ_1, …, θ_K).
Now, assume that there exist G groups with sample sizes N_g, empirical means x̄_g, and covariance matrices S_g (g = 1, …, G). The population mean vectors and covariance matrices are denoted by μ_g and Σ_g, respectively. The model-implied mean vectors and covariance matrices are denoted by μ_g(θ) and Σ_g(θ), respectively. Note that the parameter vector θ does not carry an index g, which indicates that there can be common and unique parameters across groups. In a multiple-group CFA, equal factor loadings and item intercepts across groups are frequently imposed (i.e., measurement invariance holds).
Let ξ_g = (x̄_g, S_g) be the sufficient statistics of group g. The combined vector containing all sufficient statistics for the multiple-group SEM is denoted by ξ = (ξ_1, …, ξ_G). The negative log-likelihood function l for the multiple-group SEM (see [2,4]) is given by
$$ l(\theta; \xi) = \sum_{g=1}^{G} \frac{N_g}{2} \Big[ I \log(2\pi) + \log |\Sigma_g(\theta)| + \mathrm{tr}\big( S_g \Sigma_g(\theta)^{-1} \big) + \big( \bar{x}_g - \mu_g(\theta) \big)^\top \Sigma_g(\theta)^{-1} \big( \bar{x}_g - \mu_g(\theta) \big) \Big] . \tag{6} $$
In empirical applications, the model-implied mean vectors and covariance matrices will frequently be misspecified [18,19,20], and θ can be interpreted as a pseudo-true parameter defined as the minimizer of the fitting function l in (6).
In regularized SEM estimation, a penalty function is added to the log-likelihood function that imposes some sparsity assumption on a subset of model parameters [11,12,21]. Frequently, the penalty function P is non-differentiable in order to impose sparsity. We define a known parameter δ k { 0 , 1 } for all parameters θ k , where δ k = 1 indicates that for the kth entry θ k in θ , a penalty function is applied. The penalized log-likelihood function is given by
$$ l_{\mathrm{pen}}(\theta, \lambda; \xi) = l(\theta; \xi) + N^{*} \sum_{k=1}^{K} \delta_k \, P\big( |\theta_k|^{p}, \lambda \big) , \tag{7} $$
where λ is a nonnegative regularization parameter, and N^{*} is a scaling factor that frequently equals the total sample size N = N_1 + … + N_G. The power p in the penalty function usually takes values in [0, 2]. Most of the literature on regularized SEMs employs the power p = 1, but p = 0.5 has recently been suggested [16] (but see also [22]). The minimizer of $l_{\mathrm{pen}}$ is denoted as the regularized (or penalized) ML estimate.
We now discuss typical choices of the penalty function P . For a scalar parameter x, the least absolute shrinkage and selection operator (LASSO) penalty is a popular penalty function used in regularization [23], and it is defined as
$$ P_{\mathrm{LASSO}}(x, \lambda) = \lambda |x| , \tag{8} $$
where λ is a nonnegative regularization parameter that controls the extent of sparsity in the obtained parameter estimate. Note that the LASSO penalty function combined with p = 0.5 is equivalent to the alignment loss function (ALF [16]):
$$ P_{\mathrm{ALF}}(x, \lambda) = \lambda \sqrt{|x|} . \tag{9} $$
It is known that the LASSO penalty introduces bias in estimated parameters. To circumvent this issue, the smoothly clipped absolute deviation (SCAD [24]) penalty has been proposed.
$$ P_{\mathrm{SCAD}}(x, \lambda) = \begin{cases} \lambda |x| & \text{if } |x| \le \lambda \\ -\big( x^2 - 2 a \lambda |x| + \lambda^2 \big) \big( 2 (a - 1) \big)^{-1} & \text{if } \lambda < |x| \le a \lambda \\ (a + 1) \lambda^2 / 2 & \text{if } |x| > a \lambda \end{cases} \tag{10} $$
In many studies, the recommended value of a = 3.7 (see [24]) has been adopted (e.g., [25,26]). The SCAD penalty retains the penalization rate and the induced bias of the lasso for model parameters close to zero, but continuously relaxes the rate of penalization as the absolute value of the model parameters increases. Note that P SCAD has the property of the lasso penalty around zero, but has zero derivatives for x values strongly differing from zero.
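As a sketch (assumed code, not from the article), the three penalty functions can be written with NumPy; the function names `p_lasso`, `p_alf`, and `p_scad` are hypothetical:

```python
import numpy as np

# Scalar penalty functions, vectorized over x (illustrative sketch).
def p_lasso(x, lam):
    return lam * np.abs(x)

def p_alf(x, lam):
    # LASSO penalty applied with power p = 0.5 (alignment loss function)
    return lam * np.sqrt(np.abs(x))

def p_scad(x, lam, a=3.7):
    ax = np.abs(x)
    small = lam * ax                                       # |x| <= lam
    mid = -(ax**2 - 2 * a * lam * ax + lam**2) / (2 * (a - 1))
    large = (a + 1) * lam**2 / 2                           # |x| > a * lam
    return np.where(ax <= lam, small, np.where(ax <= a * lam, mid, large))
```

A quick continuity check at the breakpoints |x| = λ and |x| = aλ confirms that the three SCAD pieces join smoothly, which is what makes the penalty flat (zero derivative) for large parameter values.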
Note that the minimizer of l pen is a function of the fixed regularization parameter λ ; that is,
$$ \tilde{\theta}(\lambda) = \arg\min_{\theta} \; l_{\mathrm{pen}}(\theta, \lambda; \xi) . \tag{11} $$
Hence, the parameter estimate $\tilde{\theta}(\lambda)$ of θ depends on a regularization parameter that must be known in advance. To circumvent this issue, the regularized SEM can be repeatedly estimated on a finite grid of regularization parameters λ (e.g., on an equidistant grid between 0.01 and 1.00 with increments of 0.01). The Bayesian information criterion (BIC), defined by BIC = 2 l(θ; ξ) + log(N) H, where H denotes the number of estimated parameters, can be used to select an optimal regularization parameter. Because the minimization of the BIC is equivalent to the minimization of BIC/2, the final parameter estimate θ̂ is obtained as
$$ \hat{\theta} = \tilde{\theta}(\hat{\lambda}) \quad \text{with} \quad \hat{\lambda} = \arg\min_{\lambda} \; l\big( \tilde{\theta}(\lambda); \xi \big) + \frac{\log(N)}{2} \sum_{k=1}^{K} \delta_k \, \chi_0\big( \tilde{\theta}_k(\lambda) \big) , \tag{12} $$
where the function χ_z is an indicator of whether |x| is larger than z for any z ≥ 0:
$$ \chi_z(x) = \begin{cases} 1 & \text{if } |x| > z \\ 0 & \text{if } |x| \le z \end{cases} . \tag{13} $$
In particular, the quantity $\sum_{k=1}^{K} \delta_k \, \chi_0\big( \tilde{\theta}_k(\lambda) \big)$ in (12) counts the number of parameter estimates $\tilde{\theta}_k(\lambda)$ for k = 1, …, K to which the penalty function is applied (i.e., δ_k = 1) and which differ from 0.
Note that the minimization of the BIC depends on two components. First, the model fit can be improved by minimizing the negative log-likelihood function while freely estimating more parameters. Second, sparse models are preferred in BIC minimization because the second term in (12) minimizes the number of estimated model parameters that are different from zero. Hence, there is always a trade-off between model fit improvement and parsimonious model estimation.
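The BIC/2-based selection of λ over a grid can be sketched as follows (toy numbers; `select_lambda` and the `fits` dictionary are hypothetical, and one fitted model per grid point is assumed to be available):

```python
import numpy as np

# Sketch: pick lambda minimizing negloglik + log(N)/2 * (# non-zero
# penalized parameters), i.e., BIC/2 restricted to penalized parameters.
def select_lambda(fits, N):
    """fits: dict lambda -> (negloglik, array of penalized estimates)."""
    def bic_half(lam):
        negll, theta_reg = fits[lam]
        n_nonzero = int(np.sum(np.abs(theta_reg) > 0))  # chi_0 indicator
        return negll + np.log(N) / 2 * n_nonzero
    return min(fits, key=bic_half)

fits = {
    0.05: (1000.0, np.array([0.30, 0.10, 0.02])),  # best fit, 3 non-zero
    0.10: (1002.0, np.array([0.28, 0.00, 0.00])),  # sparser, slightly worse fit
    0.20: (1015.0, np.array([0.00, 0.00, 0.00])),  # fully sparse, poor fit
}
lam_hat = select_lambda(fits, N=1000)
```

With these toy values, the middle grid point wins: its small loss in fit is more than offset by the sparsity reward, which is exactly the trade-off described above.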
It should be emphasized that BIC is frequently preferred over the Akaike information criterion (AIC) in regularized estimation [11,27]. In typical sample sizes, BIC imposes stronger penalization of the number of estimated parameters than AIC. In fact, alternative information criteria with even stronger penalization are discussed in regularization [25,28,29].
Regularized estimation of single-group and multiple-group SEMs are widespread in the methodological literature [11,21,30,31,32,33,34]. In these applications, cross-loadings, entries in the covariance matrix of residuals, or the vector of item intercepts are regularized. Applying regularized estimation in SEMs allows for flexible yet parsimonious model specifications.

2.1. Regularized SEM Estimation Approaches

Regularized estimation of (11) typically involves a non-differentiable optimization function because the penalty function is non-differentiable. In  [14], exact and approximate solutions are distinguished for minimizing the penalized log-likelihood function l pen in (11).
Exact estimation operates on the non-differentiable penalized log-likelihood function. In coordinate descent (CD), the penalized log-likelihood function is cyclically minimized across all entries of the parameter vector θ (see [23]). If the function $l_{\mathrm{pen}}$ is minimized in the kth coordinate θ_k of θ, the remaining entries in θ are fixed at their estimates from the previous iteration. This coordinate-wise estimation is repeated for all parameters and iterated until convergence is reached. The advantage of CD when using the LASSO or the SCAD penalty is that regularized parameters are exactly zero, while nonregularized parameters differ from zero. Hence, a sparse estimate of θ is obtained. However, CD can be computationally demanding [14]. In addition, it cannot generally be ensured that a global minimum (instead of a local minimum) is found with CD estimation.
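Since the SEM likelihood is too involved for a short snippet, the CD idea can be illustrated on a simpler analogue: a LASSO-penalized least-squares problem, where each coordinate update has a closed-form soft-thresholding solution that yields exact zeros. This is an illustrative sketch (the name `cd_lasso` is hypothetical), not the article's SEM algorithm:

```python
import numpy as np

# Coordinate descent with soft-thresholding for LASSO least squares:
# minimize (1/2n)||y - X beta||^2 + lam * sum |beta_j|.
def cd_lasso(X, y, lam, n_iter=200):
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        for j in range(k):
            r = y - X @ beta + X[:, j] * beta[j]   # residual excluding item j
            z = X[:, j] @ r / n
            c = X[:, j] @ X[:, j] / n
            # exact 1D minimizer: soft-thresholding sets small z to zero
            beta[j] = np.sign(z) * max(np.abs(z) - lam, 0.0) / c
    return beta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = X @ np.array([1.5, 0.0, 0.0, -1.0, 0.0]) + 0.1 * rng.standard_normal(200)
beta_hat = cd_lasso(X, y, lam=0.1)
```

The estimate is sparse (the three null coefficients are thresholded to exactly zero), while the non-null coefficients are recovered up to the familiar LASSO shrinkage bias of roughly λ.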
Alternatively, the non-differentiable optimization function can be replaced by a differentiable one [12,35,36,37,38]. The penalty function involves the non-differentiable absolute value function that can be replaced by
$$ |x| \approx (x^2 + \varepsilon)^{1/2} \quad \text{or, more generally,} \quad |x|^{p} \approx (x^2 + \varepsilon)^{p/2} \tag{14} $$
for a sufficiently small ε > 0, such as ε = 10^{-3} or ε = 10^{-4}. Fortunately, general-purpose optimizers that rely on derivatives can be used when differentiable approximations of the penalized log-likelihood function are employed. These optimizers are widely available in software and are reliable if good starting values are available. The disadvantage of the differentiable approximation (DA) approach is that no estimated parameters are exactly zero. To determine a parameter estimate θ̂, a threshold τ [14] must be specified that defines which small parameter estimates should be set to zero. Hence, the final parameter estimate in DA is given by
$$ \hat{\theta} = \tilde{\theta}(\hat{\lambda}) \quad \text{with} \quad \hat{\lambda} = \arg\min_{\lambda} \; l\big( \tilde{\theta}(\lambda); \xi \big) + \frac{\log(N)}{2} \sum_{k=1}^{K} \delta_k \, \chi_\tau\big( \tilde{\theta}_k(\lambda) \big) . \tag{15} $$
Note that the threshold τ is typically a function of ε [14], and  τ should be (much) larger than ε . In general, the penalized ML estimate based on DA defined in (15) relies on two tuning parameters, ε and τ , that must be properly chosen. Orzek et al. [14] argue that there is typically not enough knowledge on how to choose these tuning parameters in practical applications. Therefore, they generally prefer CD over DA.
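A minimal sketch of the two DA ingredients, the smooth surrogate for |x| and the thresholding step with τ (the helper names are hypothetical):

```python
import numpy as np

# Smooth surrogate (x^2 + eps)^(1/2) for |x|, and the thresholding step
# that maps near-zero DA estimates to exact zeros (illustrative sketch).
def abs_smooth(x, eps=1e-4):
    return np.sqrt(x**2 + eps)

def threshold(theta, tau=0.02):
    theta = np.asarray(theta, dtype=float)
    return np.where(np.abs(theta) > tau, theta, 0.0)

# the surrogate is close to |x| away from zero ...
err = abs(abs_smooth(0.5) - 0.5)
# ... and a DA estimate like 0.004 is treated as zero with tau = 0.02
sparse = threshold([0.30, 0.004, -0.15])
```

The sketch makes the two tuning parameters visible: ε controls how closely the surrogate tracks |x|, while τ decides which surviving small estimates are declared zero, and τ must clearly exceed the distortion introduced by ε.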

2.2. Direct BIC Minimization Approach of O’Neill and Burke

The estimation approaches described in Section 2.1 require repeatedly fitting a SEM on a grid of regularization parameters λ. Such an approach is computationally demanding, in particular for SEMs with a large number of parameters. The final parameter estimate is obtained by minimizing the BIC across all estimated regularized SEMs. A naïve idea might be to directly minimize the BIC in order to avoid introducing the penalty function and the unknown regularization parameter λ into the optimization. Only the subset of parameters on which sparsity should be imposed is relevant in the BIC computation. Hence, a parameter estimate obtained by minimizing the BIC is given by
$$ \hat{\theta} = \arg\min_{\theta} \; l(\theta; \xi) + \frac{\log(N)}{2} \sum_{k=1}^{K} \delta_k \, \chi_0(\theta_k) . \tag{16} $$
The optimization function in (16) employs an L_0 penalty function [39,40,41] with a fixed regularization parameter log(N)/2. This optimization function contains the non-differentiable indicator function χ_0. However, as in the DA of the non-differentiable penalty function, the function χ_0 can also be replaced by a differentiable approximation. O’Neill and Burke [17] had the brilliant idea of approximating the indicator function χ_0 by
$$ N(x) = \frac{x^2}{x^2 + \varepsilon} \tag{17} $$
for a sufficiently small ε > 0 . Hence, the minimization problem (16) can be replaced by
$$ \hat{\theta} = \arg\min_{\theta} \; l(\theta; \xi) + \frac{\log(N)}{2} \sum_{k=1}^{K} \delta_k \, N(\theta_k) . \tag{18} $$
The estimation approach from (18) is referred to as the smoothed direct BIC minimization (DIR) approach. This estimation approach has been applied to distributional regression models [17].
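The smooth indicator approximation can be checked numerically (illustrative sketch; `chi_smooth` is a hypothetical name):

```python
import numpy as np

# O'Neill-Burke style smooth approximation of the indicator chi_0:
# x^2 / (x^2 + eps) equals 0 at x = 0 and approaches 1 away from 0.
def chi_smooth(x, eps=0.01):
    x = np.asarray(x, dtype=float)
    return x**2 / (x**2 + eps)

vals = chi_smooth([0.0, 0.05, 0.3, 1.0])
```

Parameters near zero thus contribute almost nothing to the smoothed BIC term, while clearly non-zero parameters each contribute close to the full log(N)/2 penalty, mimicking the parameter count in the exact BIC.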

2.3. Standard Error Estimation

We now describe the computation of the variance matrix of parameter estimates θ̂ from penalized ML estimation for a fixed regularization parameter λ or from the direct BIC minimization approach. Both estimation approaches minimize a differentiable function (or a differentiable approximation) F(θ, ξ) with respect to θ as a function of the sufficient statistics ξ (see also [6]). The vector of sufficient statistics ξ̂ is approximately normally distributed (see [3]); that is,
$$ \hat{\xi} - \xi_0 \sim \mathrm{MVN}(0, V_{\xi}) \tag{19} $$
for the true population parameter ξ_0 of the sufficient statistics. We denote by $F_{\theta} = \partial F / \partial \theta$ the vector of partial derivatives of F with respect to θ. The parameter estimate θ̂ is given as the root of the non-linear equation
$$ F_{\theta}(\theta, \hat{\xi}) = 0 . \tag{20} $$
General M-estimation theory (i.e., the delta method [18]) can be applied to derive the variance matrix of θ ^ . Assume that there exists a (pseudo-)true parameter θ 0 such that
$$ F_{\theta}(\theta_0, \xi_0) = 0 . \tag{21} $$
We now derive the covariance matrix of θ ^ by utilizing a Taylor expansion of F θ . We denote by F θ θ and F θ ξ the matrices of second-order partial derivatives of F θ with respect to θ and ξ , respectively. We obtain
$$ 0 = F_{\theta}(\hat{\theta}, \hat{\xi}) \approx F_{\theta}(\theta_0, \xi_0) + F_{\theta\theta}(\theta_0, \xi_0) \, (\hat{\theta} - \theta_0) + F_{\theta\xi}(\theta_0, \xi_0) \, (\hat{\xi} - \xi_0) . \tag{22} $$
As the parameter estimate θ ^ is a non-linear function of ξ ^ , the Taylor expansion (22) provides the approximation
$$ \hat{\theta} - \theta_0 = - F_{\theta\theta}(\theta_0, \xi_0)^{-1} F_{\theta\xi}(\theta_0, \xi_0) \, (\hat{\xi} - \xi_0) . \tag{23} $$
By defining $A = - F_{\theta\theta}(\theta_0, \xi_0)^{-1} F_{\theta\xi}(\theta_0, \xi_0)$, we obtain by using the multivariate delta formula [18]:
$$ \mathrm{Var}(\hat{\theta}) = A \, V_{\xi} \, A^\top . \tag{24} $$
An estimate of A is obtained as $\hat{A} = - F_{\theta\theta}(\hat{\theta}, \hat{\xi})^{-1} F_{\theta\xi}(\hat{\theta}, \hat{\xi})$. This approach is ordinarily used for differentiable discrepancy functions in the SEM literature [3,7,42]. Standard errors for the entries of θ̂ can be obtained by taking the square roots of the diagonal elements of Var(θ̂) computed from (24).
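A scalar toy example of the delta-method formula Var(θ̂) = A V_ξ A′ may help fix ideas; the derivatives below come from a deliberately trivial criterion function, not from the SEM likelihood:

```python
import numpy as np

# Toy criterion F(theta, xi) = (theta - xi)^2, so that
# F_theta = 2(theta - xi), F_thetatheta = 2, F_thetaxi = -2,
# hence A = -F_tt^{-1} F_tx = 1 and Var(theta_hat) = V_xi.
F_tt = 2.0    # second derivative of F w.r.t. theta
F_tx = -2.0   # mixed derivative w.r.t. theta and xi
V_xi = 0.25   # sampling variance of the sufficient statistic
A = -(1.0 / F_tt) * F_tx
var_theta = A * V_xi * A
se_theta = np.sqrt(var_theta)
```

In this degenerate case θ̂ = ξ̂, so the formula correctly propagates the sampling variance of the statistic straight through to the parameter estimate; in a real SEM, F_θθ and F_θξ are matrices evaluated at (θ̂, ξ̂).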

3. Research Questions

In the following two simulation studies, several implementation and algorithmic aspects of regularized SEM estimation are investigated. Five research questions (RQ) are posed in this section that will be answered by means of the simulation studies.
The research questions are tackled through two simulation studies. The first, Simulation Study 1, considers the case of regularized multiple-group SEM estimation with noninvariant item intercepts. In the second, Simulation Study 2, regularized SEM estimation is applied to data simulated from a two-factor model in the presence of cross-loadings.

3.1. RQ1: Fixed or Estimated Regularization Parameter λ ?

In the first research question, RQ1, we consider the choice of the regularization parameter λ with regard to statistically efficient parameter estimation if structural parameters, such as factor means or factor correlations, are the primary analytical focus. We study whether a regularization parameter selected by an information criterion outperforms a well-chosen fixed regularization parameter. Using only a fixed value of the regularization parameter instead of estimating the regularized SEM on a sequence of regularization parameters would decrease the computational burden of the estimation.

3.2. RQ2: Exact Optimization or Differentiable Approximation?

In the second research question, RQ2, we compare exact optimization and approximate optimization approaches based on differentiable approximations for regularized SEMs. Previous work argued that the exact approach should be generally preferred. We thoroughly investigate whether this preference is justified. Notably, approximate optimization with differentiable optimization functions is easier to implement because general-purpose optimizers are widely available and provide reliable convergence guarantees if adequate starting values are used in the estimation.

3.3. RQ3: Direct BIC Minimization or Minimizing BIC Using a Grid of λ Values?

The third research question, RQ3, investigates whether the direct one-step BIC minimization approach provides comparable results to the indirect estimation approach that requires the estimation of the regularized SEM on a grid of regularization parameters. If the one-step BIC minimization approach provides similar findings to the indirect approach, substantial computational gains would be achieved, which eases the application of regularized SEM.

3.4. RQ4: Always Choosing the Power p = 1 in the Penalty Function?

In the fourth research question, RQ4, we investigate whether there are considerable differences in the choice of the power p in the penalty. While the majority of regularization approaches employ the absolute value function p = 1 , a recent implementation in the popular Mplus software utilizes p = 0.5 . The outcome of this comparison gives hints on how future regularized SEM software should be implemented.

3.5. RQ5: Does the Delta Method Work for Standard Error Estimation?

Finally, in the fifth research question, RQ5, the quality of standard error estimation in terms of coverage rates (see Section 2.3) is studied. It is of interest whether standard errors based on the delta method are reliable when applied to differentiable approximations of the optimization function in regularized SEM.

4. Simulation Study 1: Noninvariant Item Intercepts (DIF)

In Simulation Study 1, we investigated the impact of group-specific item intercepts in a multiple-group one-dimensional factor model. In the data-generating model (DGM), measurement invariance was violated.

4.1. Method

The setup of the simulation study mimics the one presented in [43]. Datasets were simulated from a one-dimensional factor model involving five items and three groups. The factor variable η_1 was normally distributed with group means α_{1,1} = 0, α_{2,1} = 0.3, and α_{3,1} = 0.8. The group variances were set to ϕ_{1,11} = 1, ϕ_{2,11} = 1.5, and ϕ_{3,11} = 1.2, respectively. All factor loadings were set to 1, and all measurement error variances were set to 1 in all groups; the measurement errors were uncorrelated with each other. The factor variable, as well as the residual variables, were normally distributed.
Some non-zero group-specific item intercepts were simulated that indicate measurement noninvariance. These differential item functioning (DIF [44]) effects in item intercepts were simulated in one and only one of the five items in each group. In the first group, the fourth item intercept had a DIF effect δ. In the second group, the first item had a DIF effect δ, while the second item had a DIF effect δ in the third group. The DIF effect δ was chosen as either 0.3 or 0.6. The sample size per group was chosen as N = 500 or N = 1000.
A regularized multiple-group one-dimensional SEM was specified as the analysis model. In this model, invariant factor loadings were assumed. For identification reasons, the mean of the factor variable in the first group was fixed at 0, and the standard deviation in the first group was fixed at 1. The SCAD penalty function was imposed on group-specific item intercepts. In the penalty function, the powers p = 1 and p = 0.5 were investigated. The SEM was estimated on a grid of regularization parameters between 0.025 and 0.40, with increments of 0.025. The exact estimation approach was implemented via coordinate descent (CD). In the differentiable approximation (DA) of the non-differentiable SCAD penalty function, we chose ε = 10^{-4}. The optimal regularization parameter λ was obtained by minimizing the BIC. Because no estimated item intercepts are exactly set to 0 in DA estimation, the thresholds τ = 0.01, 0.02, and 0.05 were chosen as cutoff values for treating model parameters as 0 in the BIC computation. Furthermore, the smoothed direct BIC minimization (DIR) approach of O’Neill and Burke was carried out using ε = 0.01. This relatively large value was found to be optimal in preliminary simulation studies in which the tuning parameter ε was varied as 0.1, 0.01, 0.001, and 0.0001. The lowest RMSE and a small bias for parameter estimates were obtained for ε = 0.01.
For the direct BIC minimization method DIR and regularized ML estimation for a set of fixed regularization parameters λ , standard errors were computed by means of the delta method described in Section 2.3. Confidence intervals were calculated based on the normal distribution assumption (i.e., the confidence interval of an estimate θ ^ was computed as [ θ ^ 1.96 · SE ( θ ^ ) , θ ^ + 1.96 · SE ( θ ^ ) ] , where SE ( θ ^ ) is the estimated standard error).
In total, 1000 replications were conducted for all 2 (DIF effect size δ) × 2 (sample size N) = 4 conditions of the simulation study. We investigated the estimation quality of factor means and factor variances. Bias and root mean square error (RMSE) were utilized to assess the performance of the different estimators. Let θ̂_r be a model parameter estimate in replication r = 1, …, R. The bias was estimated by
$$ \mathrm{Bias}(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} \big( \hat{\theta}_r - \theta \big) , \tag{25} $$
where θ denotes the true parameter value. The RMSE was estimated by
$$ \mathrm{RMSE}(\hat{\theta}) = \sqrt{ \frac{1}{R} \sum_{r=1}^{R} \big( \hat{\theta}_r - \theta \big)^2 } . \tag{26} $$
Coverage rates at the confidence level of 95% were computed as the percentage of replications in which the computed confidence interval covered the true parameter value. The models were estimated using the sirt::mgsem() function in the R [45] package sirt [46]. Replication material can be found in the directory “Simulation Study 1” located at https://osf.io/7kzgb (accessed on 21 August 2023).
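The bias and RMSE formulas above can be sketched in a few lines (toy replication values, not simulation output):

```python
import numpy as np

# Bias and RMSE across R replications, following the two formulas above.
def bias(estimates, true_value):
    estimates = np.asarray(estimates, dtype=float)
    return np.mean(estimates - true_value)

def rmse(estimates, true_value):
    estimates = np.asarray(estimates, dtype=float)
    return np.sqrt(np.mean((estimates - true_value) ** 2))

est = [0.32, 0.28, 0.35, 0.25]   # four toy replications of a mean estimate
b = bias(est, 0.30)
r = rmse(est, 0.30)
```

Note that the RMSE combines squared bias and sampling variance, which is why an estimator can have negligible bias and still differ markedly in RMSE across methods.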

4.2. Results

Figure 1 and Figure 2 display the absolute bias and the RMSE of the factor mean α_{2,1} in the second group as a function of the regularization parameter λ for the two sample sizes, N = 500 and N = 1000, and the two powers of the penalty function, p = 1 and p = 0.5, respectively. The two figures show that there is a range of values of the regularization parameter λ that results in unbiased and least variable (i.e., in terms of RMSE) estimates. The optimal fixed regularization parameter was larger for p = 0.5 than for p = 1. However, the minimal RMSE was similar for p = 1 and p = 0.5 in Simulation Study 1. Furthermore, the RMSE of the factor mean estimate based on the optimal regularization parameter selected by the minimal BIC did not generally outperform a well-chosen fixed regularization parameter λ.
Interestingly, Figure 2 illustrates that, in the condition with the larger DIF effect δ = 0.6, too small regularization parameters λ resulted in biased parameter estimates. This issue occurred for both p = 1 and p = 0.5. Moreover, by comparing Figure 1 and Figure 2, it is evident that the optimal regularization parameter is a function of the size of the DIF effects δ. That is, larger DIF effects δ resulted in larger optimal regularization parameters λ.
Table 1 presents the bias and the RMSE of the estimated group means of the second and the third group. In this table, the direct BIC minimization approach DIR is compared with the exact estimation approach (CD) and the differentiable approximation (DA) using the optimal regularization parameter λ based on the minimal BIC, as well as for the fixed regularization parameters λ = 0.05 for the power p = 1 and λ = 0.10 for p = 0.5. The DA estimation approach is shown using the threshold τ = 0.02. Values of the fixed regularization parameters were chosen based on the findings in Figure 1 and Figure 2. Overall, there was only a negligible bias in factor mean estimates. Regarding RMSE, all estimation methods resulted in relatively similar estimates. Furthermore, there were essentially no differences between the exact solution (CD) and the approximate solution (DA). If researchers know a close-to-optimal regularization parameter λ in advance, regularized ML estimation need not involve the choice of an optimal λ based on a minimal BIC. Finally, the computationally cheap direct BIC minimization approach (DIR) performed similarly to BIC estimation that requires fitting a regularized SEM at a sequence of regularization parameters λ. A slight increase in the RMSE of the DIR method was only observed for N = 500 and δ = 0.3, which was the consequence of slightly biased parameter estimates.
Table 2 compares the average number of regularized parameters of the exact approach (CD) and the differentiable estimation approach (DA) using the thresholds τ of 0.01, 0.02, and 0.04. It turned out that the number of regularized item intercepts in the selected models was very similar. Only for N = 500 and δ = 0.3 was the number of regularized parameters slightly too low. Notably, lower values of thresholds such as τ = 0.005 or τ = 0.001 would result in a substantially lower average number of regularized parameters.
Table 3 focuses on the coverage rates of selected model parameters. It can be seen that coverage rates for a model parameter were acceptable for both powers p = 1 and p = 0.5 if the respective parameter estimate was approximately unbiased. Interestingly, the coverage rates were also satisfactory for the direct BIC minimization approach (DIR).

5. Simulation Study 2: Two-Dimensional Factor Model with Cross-Loadings

In Simulation Study 2, regularized ML estimation of a two-dimensional factor model with cross-loadings was investigated.

5.1. Method

The data-generating model involves a two-dimensional factor model with ten manifest variables X_1, …, X_10 (i.e., items) and two latent (factor) variables η_1 and η_2. The data-generating model is graphically presented in Figure 3. The first five items load on the first factor, while the last five items load on the second factor. Three cross-loadings for items X_1, X_9, and X_10 were introduced.
All variables had zero means and were normally distributed. Furthermore, the latent variables η 1 and η 2 were standardized (i.e., they had a true variance of 1). The true factor correlation ϕ 12 of the two factor variables was set to 0.5. The primary factor loadings of the ten items were 1.000, 0.858, 0.782, 0.877, 0.888, 1.000, 0.815, 0.721, 0.880, and 0.749. The variances of the normally distributed residual error variables were chosen as 0.115, 0.464, 0.572, 0.345, 0.411, 0.122, 0.536, 0.680, 0.383, and 0.627.
All cross-loadings were simulated with the size δ . In the simulation, δ was chosen as 0.2 or 0.4. Furthermore, the sample size N was chosen to be either 500 or 1000.
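To make the data-generating model concrete, the following sketch simulates data from a model of this form with the loadings and residual variances listed above. The direction of each cross-loading (X 1 on the second factor; X 9 and X 10 on the first) is an assumption read off Figure 3, and the code is an illustration rather than the replication material:

```python
import math
import random

def simulate_data(n, delta, rng):
    """Draw n observations from the assumed two-factor data-generating model."""
    # Primary loadings and residual variances as listed in the text
    main = [1.000, 0.858, 0.782, 0.877, 0.888,
            1.000, 0.815, 0.721, 0.880, 0.749]
    psi = [0.115, 0.464, 0.572, 0.345, 0.411,
           0.122, 0.536, 0.680, 0.383, 0.627]
    data = []
    for _ in range(n):
        # Standardized factors with correlation 0.5 via a Cholesky factor
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        eta1, eta2 = z1, 0.5 * z1 + math.sqrt(0.75) * z2
        row = []
        for i in range(10):
            # Items 1-5 load on factor 1, items 6-10 on factor 2;
            # assumed cross-loadings of size delta: X1 on factor 2, X9/X10 on factor 1
            lam1 = main[i] if i < 5 else (delta if i in (8, 9) else 0.0)
            lam2 = main[i] if i >= 5 else (delta if i == 0 else 0.0)
            row.append(lam1 * eta1 + lam2 * eta2 + rng.gauss(0.0, math.sqrt(psi[i])))
        data.append(row)
    return data

sample = simulate_data(500, 0.4, random.Random(1))
```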
The two-dimensional factor model with the SCAD penalty function on the cross-loadings was specified as the analysis model. For identification reasons, the variances of the factor variables were fixed to 1. The estimation method followed that used in Simulation Study 1. Again, ε = 10 −4 was employed in the differentiable approximation method (DA), utilizing thresholds τ = 0.01 , 0.02, and 0.04 in the BIC minimization. The smoothed direct BIC minimization approach (DIR) was again conducted with ε = 0.01 .
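For reference, the SCAD penalty of Fan and Li [24] and the kind of differentiable approximation used in the DA approach can be sketched as follows. Replacing |x| with sqrt(x² + ε) is one standard smoothing device; the sketch illustrates the technique and is not the exact sirt implementation:

```python
import math

def scad(x, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001) with tuning constant a."""
    ax = abs(x)
    if ax <= lam:
        return lam * ax
    if ax <= a * lam:
        return -(ax ** 2 - 2.0 * a * lam * ax + lam ** 2) / (2.0 * (a - 1.0))
    return (a + 1.0) * lam ** 2 / 2.0

def scad_smooth(x, lam, a=3.7, eps=1e-4):
    """Differentiable approximation: the kink of |x| at zero is smoothed
    by substituting sqrt(x^2 + eps) for |x|."""
    return scad(math.sqrt(x * x + eps), lam, a)
```

Away from zero, the two functions practically coincide; near zero, the smoothed version is differentiable, so generic gradient-based optimizers become applicable.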
In total, 1000 replications were conducted for all 2 (size of cross-loadings δ ) × 2 (sample size N) = 4 conditions of the simulation study. We analyzed the estimation quality of model parameter estimates through bias, RMSE, and coverage rates.
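The three evaluation criteria can be computed per parameter across replications as follows. This is an illustrative sketch (not the simulation code of the study) that uses a nominal 95% Wald interval for the coverage computation:

```python
import math

def performance(estimates, ses, true_value, z=1.96):
    """Bias, RMSE, and coverage (in %) of one parameter across replications."""
    r = len(estimates)
    bias = sum(estimates) / r - true_value
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    # Coverage: share of Wald intervals estimate +/- z * SE containing the truth
    hits = sum(1 for e, s in zip(estimates, ses) if abs(e - true_value) <= z * s)
    coverage = 100.0 * hits / r
    return bias, rmse, coverage
```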
The models were again estimated using the sirt::mgsem() function in the R [45] package sirt [46]. Replication material and the data-generating parameters can be found in the directory “Simulation Study 2” located at https://osf.io/7kzgb (accessed on 21 August 2023).

5.2. Results

Figure 4 and Figure 5 display the absolute bias and the RMSE of the factor correlation ϕ 12 of the two factors as a function of the regularization parameter λ for the two sample sizes, N = 500 and N = 1000 , and the two powers of the penalty function, p = 1 and p = 0.5 , respectively. In contrast to Simulation Study 1, the two figures show that using a regularization parameter λ smaller than the optimal λ selected by the BIC resulted in parameter estimates with a smaller RMSE. This finding calls into question the standard procedure of searching for a parsimonious model according to the BIC when a structural parameter should be estimated with low variance. Furthermore, parameter estimates with the power p = 0.5 in the penalty function resulted in a lower RMSE than those with the power p = 1 .
Table 4 presents the bias and the RMSE of the estimated factor correlation and the four factor loadings of the first two items. In line with the findings in Figure 4 and Figure 5, we also display the parameter estimates for the fixed regularization parameter λ = 0.025 . The estimated parameters were unbiased. Hence, we focus on differences between the estimation approaches regarding the RMSE. As in Simulation Study 1, the exact estimation approach (CD) performed similarly to the differentiable estimation approach (DA). The only exception was the case of p = 0.5 with the fixed regularization parameter λ = 0.025 , in which the DA approach produced less variable estimates (i.e., a smaller RMSE) of the factor correlation ϕ 12 than the CD approach. Moreover, the direct BIC minimization approach (DIR) was similar or superior to the CD and DA estimation approaches based on the BIC. This is an interesting finding because repeatedly fitting the regularized SEM on a grid of regularization parameters is not required in DIR.
Table 5 displays the average number of regularized cross-loadings. It turned out that the choice of the threshold parameter τ was less critical for p = 0.5 than for p = 1 . Furthermore, using τ = 0.02 in the DA approach resulted in a similar average number of regularized parameters to the exact approach (CD).
Finally, Table 6 displays the coverage rates and bias of the estimated factor correlation and the factor loading of the first item. Coverage rates were satisfactory for the DIR approach as well as for the DA approach with fixed regularization parameters. There was a tendency toward overcoverage for the power p = 0.5 with the small regularization parameter λ = 0.025 .

6. Summary of Simulation Findings

In this section, the main findings of the two simulation studies are discussed regarding the research questions posed in Section 3.

6.1. RQ1: Using a Fixed Regularization Parameter λ Can Be Advantageous Regarding Bias and RMSE

First, the findings of Simulation Study 2 demonstrated that using a fixed regularization parameter λ instead of a λ chosen optimally by minimizing the BIC can result in more efficient estimates of structural parameters (research question RQ1). This finding underscores that obtaining efficient parameter estimates is not necessarily related to the search for a parsimonious model in terms of minimal information criteria.

6.2. RQ2: Differentiable Approximations of Penalty Functions Generally Work

Second, differentiable approximation approaches to the non-differentiable penalty function in regularized estimation, using appropriate tuning parameters, performed similarly to exact estimation approaches that employed coordinate descent (research question RQ2). This result contradicts recommendations in the recent literature that differentiable approximation approaches should generally be avoided. Note that differentiable approximation approaches can utilize widely available general-purpose optimizers. Moreover, regularized estimation with a differentiable approximation of the penalty function is frequently faster than estimation with specialized optimizers.
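As a concrete illustration of this point, the following sketch fits a single penalized parameter with plain gradient descent after smoothing the LASSO penalty |b| to sqrt(b² + ε). The one-parameter setup and all numeric values are hypothetical; the point is only that a generic first-order optimizer handles the smoothed problem without any specialized thresholding step:

```python
import math

def smooth_abs(x, eps=1e-4):
    """Differentiable stand-in for |x|."""
    return math.sqrt(x * x + eps)

def lasso_smooth(bhat, var, lam, eps=1e-4, lr=0.01, iters=20000):
    """Minimize (b - bhat)^2 / (2 * var) + lam * smooth_abs(b, eps)
    by plain gradient descent; once the penalty is smooth, a generic
    first-order optimizer is sufficient."""
    b = bhat
    for _ in range(iters):
        grad = (b - bhat) / var + lam * b / smooth_abs(b, eps)
        b -= lr * grad
    return b
```

For a small unpenalized estimate (e.g., bhat = 0.05 with var = 0.1 and lam = 1), the solution is shrunk essentially to zero; for a large one (bhat = 0.8), it is close to the exact soft-threshold solution bhat − lam·var = 0.7.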

6.3. RQ3: Direct BIC Minimization Is a Competitive Estimation Method

Third, even if differentiable approximations are used in regularized estimation, selecting the optimal regularization parameter λ based on the minimal BIC requires repeated estimation of the SEM, which can be computationally demanding. A recently proposed smooth direct BIC minimization approach by O’Neill and Burke [17] avoids the specification of a regularization parameter and directly minimizes a smoothed version of the BIC. In our simulation studies involving SEMs, the direct BIC minimization approach performed surprisingly well and had similar performance to the ordinarily employed indirect BIC minimization approach that requires repeated estimation (research question RQ3). This finding is remarkable because it could change how regularized estimation is implemented in practice.
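The idea can be illustrated for a single parameter: the count of nonzero parameters entering the BIC is replaced by a smooth approximation of the indicator 1{b ≠ 0} — here b²/(b² + ε²) — and the resulting criterion is minimized with a generic gradient method. This scalar sketch illustrates the technique and is not O’Neill and Burke’s exact estimator; the toy likelihood and step size are assumptions:

```python
import math

def smooth_bic_fit(bhat, se, n, eps=0.01, lr=1e-5, iters=60000):
    """Sketch of direct smooth-BIC minimization for a single parameter.

    The number of nonzero parameters in the BIC is replaced by the smooth
    approximation b^2 / (b^2 + eps^2), so the criterion
        (b - bhat)^2 / se^2 + log(n) * b^2 / (b^2 + eps^2)
    can be minimized directly with a gradient method; no grid of
    regularization parameters lambda is needed."""
    b = bhat
    for _ in range(iters):
        grad = 2.0 * (b - bhat) / se ** 2 \
             + math.log(n) * 2.0 * b * eps ** 2 / (b ** 2 + eps ** 2) ** 2
        b -= lr * grad
    return b
```

A large estimate relative to its standard error is essentially untouched, whereas a small estimate is shrunk to approximately zero — mimicking BIC-based model selection in a single optimization run.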

6.4. RQ4: The Power p = 0.5 in the Penalty Function Can Sometimes Be Beneficial

Fourth, the power p of the penalty function is ordinarily chosen as p = 1 , but a recent implementation used p = 0.5 in SEM. Our simulations demonstrated similar performance of both power values (research question RQ4). However, in Simulation Study 2, p = 0.5 with a particular fixed regularization parameter λ outperformed the estimation based on p = 1 . Moreover, when a differentiable approximation was utilized in regularized estimation, the determination of the number of estimated parameters depended less on the chosen threshold for p = 0.5 than for p = 1 .

6.5. RQ5: Reliable Standard Error Estimation Using the Delta Method

Fifth, our simulation studies demonstrated that standard error computation based on the delta method was satisfactory for the direct BIC minimization approach as well as for regularized estimation with a fixed regularization parameter λ (research question RQ5).
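As a reminder of the mechanics, the delta method propagates the covariance matrix of the parameter estimates through the gradient of a (possibly derived) parameter of interest. The following generic sketch (not the sirt implementation) uses a numeric central-difference gradient:

```python
import math

def delta_method_se(g, theta, V, h=1e-6):
    """Standard error of g(theta_hat) by the delta method.

    V is the covariance matrix of theta_hat; the gradient of g is taken
    numerically by central differences."""
    k = len(theta)
    grad = []
    for j in range(k):
        up, lo = list(theta), list(theta)
        up[j] += h
        lo[j] -= h
        grad.append((g(up) - g(lo)) / (2.0 * h))
    # SE = sqrt(grad' V grad)
    var = sum(grad[i] * V[i][j] * grad[j] for i in range(k) for j in range(k))
    return math.sqrt(var)

# Example: SE of the product of two uncorrelated estimates (2.0, 3.0)
se = delta_method_se(lambda t: t[0] * t[1], [2.0, 3.0],
                     [[0.01, 0.0], [0.0, 0.04]])
# se = 0.5, since grad = (3, 2) and 9 * 0.01 + 4 * 0.04 = 0.25
```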

7. Discussion and Conclusions

In this article, implementation aspects of regularized maximum likelihood estimation of SEMs were investigated. We obtained some insights into how regularized SEMs could be efficiently implemented in practice. In contrast to statements in the literature, differentiable approximations of the non-differentiable penalty functions in regularized SEM perform comparably well to specialized estimation methods if tuning parameters in these approximations are thoughtfully chosen.
Our preliminary conclusion from the simulation studies is that the direct BIC minimization approach and the fixed regularization parameter approach deserve more attention in future research on regularized SEM estimation. By focusing on these approaches, the computational burden of regularized SEM is noticeably reduced. Future research might investigate whether the findings obtained for SEMs transfer to other models involving latent variables, such as item response models [47,48,49,50,51,52], latent class models [28,53,54,55], or mixture models [56,57].
In this article, we focused on a differentiable approximation of the BIC. The same approximation technique could, however, be applied to estimating regularized SEMs that minimize the AIC. Note that we have preliminary simulation evidence that convergence issues appeared more frequently when minimizing the differentiable approximation of the AIC than of the BIC.
Hopefully, the availability of the direct BIC minimization approach could lead to more widespread use of regularized estimation. Nevertheless, the regularization approaches discussed in this paper still hinge on the assumption that there is sparsity with respect to regularized model parameters. Such sparse models or parameter deviations might not always be appropriate for modeling real-world datasets.
The simulation studies showed that the regularization parameter λ that is optimal regarding the bias and RMSE of the model parameters of interest does not necessarily coincide with the optimal λ obtained by minimizing the BIC. Determining an appropriate regularization parameter λ for a particular regularized SEM is, therefore, difficult for researchers. Perhaps only simulation studies involving a similarly complex model and a similar sample size could help to determine an appropriate λ . If the researcher’s interest lies in the interpretation of model parameters in a regularized SEM, it is unclear why model fitting should aim at minimizing a prediction error, as in the BIC, because such a criterion can be only weakly related to estimating optimal model parameters (see [58]).

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AIC    Akaike information criterion
BIC    Bayesian information criterion
CD     coordinate descent
CFA    confirmatory factor analysis
DA     differentiable approximation
DGM    data-generating model
DIF    differential item functioning
LASSO  least absolute shrinkage and selection operator
ML     maximum likelihood
RMSE   root mean square error
SCAD   smoothly clipped absolute deviation
SEM    structural equation model

References

1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011.
2. Bollen, K.A. Structural Equations with Latent Variables; Wiley: New York, NY, USA, 1989.
3. Browne, M.W.; Arminger, G. Specification and estimation of mean- and covariance-structure models. In Handbook of Statistical Modeling for the Social and Behavioral Sciences; Arminger, G., Clogg, C.C., Sobel, M.E., Eds.; Springer: Boston, MA, USA, 1995; pp. 185–249.
4. Jöreskog, K.G.; Olsson, U.H.; Wallentin, F.Y. Multivariate Analysis with LISREL; Springer: Basel, Switzerland, 2016.
5. Kaplan, D. Structural Equation Modeling: Foundations and Extensions; Sage: Thousand Oaks, CA, USA, 2009.
6. Shapiro, A. Statistical inference of covariance structures. In Current Topics in the Theory and Application of Latent Variable Models; Edwards, M.C., MacCallum, R.C., Eds.; Routledge: London, UK, 2012; pp. 222–240.
7. Yuan, K.H.; Bentler, P.M. Structural equation modeling. In Handbook of Statistics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; Volume 26, pp. 297–358.
8. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus With Applications in Statistics and Econometrics; Wiley: New York, NY, USA, 2019.
9. Bollen, K.A.; Davis, W.R. Two rules of identification for structural equation models. Struct. Equ. Model. 2009, 16, 523–536.
10. Drton, M.; Foygel, R.; Sullivant, S. Global identifiability of linear structural equation models. Ann. Stat. 2011, 39, 865–886.
11. Jacobucci, R.; Grimm, K.J.; McArdle, J.J. Regularized structural equation modeling. Struct. Equ. Model. 2016, 23, 555–566.
12. Robitzsch, A. Model-robust estimation of multiple-group structural equation models. Algorithms 2023, 16, 210.
13. Brandmaier, A.M.; Jacobucci, R.C. Machine learning approaches to structural equation modeling. In Handbook of Structural Equation Modeling; Hoyle, R.H., Ed.; Guilford Press: New York, NY, USA, 2023; pp. 722–739.
14. Orzek, J.H.; Arnold, M.; Voelkle, M.C. Striving for sparsity: On exact and approximate solutions in regularized structural equation models. Struct. Equ. Model. 2023. Epub ahead of print.
15. Li, X.; Jacobucci, R.; Ammerman, B.A. Tutorial on the use of the regsem package in R. Psych 2021, 3, 579–592.
16. Asparouhov, T.; Muthén, B. Penalized Structural Equation Models. Technical Report. 2023. Available online: https://rb.gy/tbaj7 (accessed on 28 March 2023).
17. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71.
18. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
19. Kolenikov, S. Biases of parameter estimates in misspecified structural equation models. Sociol. Methodol. 2011, 41, 119–157.
20. White, H. Maximum likelihood estimation of misspecified models. Econometrica 1982, 50, 1–25.
21. Huang, P.H.; Chen, H.; Weng, L.J. A penalized likelihood method for structural equation modeling. Psychometrika 2017, 82, 329–354.
22. Xu, Z.; Chang, X.; Xu, F.; Zhang, H. L1/2 regularization: A thresholding representation theory and a fast solver. IEEE Trans. Neur. Net. Lear. 2012, 23, 1013–1027.
23. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity: The Lasso and Generalizations; CRC Press: Boca Raton, FL, USA, 2015.
24. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
25. Fan, J.; Li, R.; Zhang, C.H.; Zou, H. Statistical Foundations of Data Science; Chapman and Hall/CRC: Boca Raton, FL, USA, 2020.
26. Zhang, H.; Li, S.J.; Zhang, H.; Yang, Z.Y.; Ren, Y.Q.; Xia, L.Y.; Liang, Y. Meta-analysis based on nonconvex regularization. Sci. Rep. 2020, 10, 5755.
27. Huang, P.H. A penalized likelihood method for multi-group structural equation modelling. Br. J. Math. Stat. Psychol. 2018, 71, 499–522.
28. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Regularized latent class analysis with application in cognitive diagnosis. Psychometrika 2017, 82, 660–692.
29. Zhang, Y.; Li, R.; Tsai, C.L. Regularization parameter selections via generalized information criterion. J. Am. Stat. Assoc. 2010, 105, 312–323.
30. Chen, J. Partially confirmatory approach to factor analysis with Bayesian learning: A LAWBL tutorial. Struct. Equ. Model. 2022, 22, 800–816.
31. Geminiani, E.; Marra, G.; Moustaki, I. Single- and multiple-group penalized factor analysis: A trust-region algorithm approach with integrated automatic multiple tuning parameter selection. Psychometrika 2021, 86, 65–95.
32. Hirose, K.; Terada, Y. Sparse and simple structure estimation via prenet penalization. Psychometrika 2022. Epub ahead of print.
33. Huang, P.H. lslx: Semi-confirmatory structural equation modeling via penalized likelihood. J. Stat. Softw. 2020, 93, 1–37.
34. Scharf, F.; Nestler, S. Should regularization replace simple structure rotation in exploratory factor analysis? Struct. Equ. Model. 2019, 26, 576–590.
35. Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. 2020, 55, 811–824.
36. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120.
37. Robitzsch, A. Comparing the robustness of the structural after measurement (SAM) approach to structural equation modeling (SEM) against local model misspecifications with alternative estimation approaches. Stats 2022, 5, 631–672.
38. Tutz, G.; Gertheiss, J. Regularized regression for categorical data. Stat. Model. 2016, 16, 161–200.
39. Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with L0-type penalties. Stat. Model. 2015, 15, 389–410.
40. Phan, D.T.; Idé, T. l0-regularized sparsity for probabilistic mixture models. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 172–180.
41. Shen, X.; Pan, W.; Zhu, Y. Likelihood-based selection and sharp parameter estimation. J. Am. Stat. Assoc. 2012, 107, 223–232.
42. Shapiro, A. Statistical inference of moment structures. In Handbook of Latent Variable and Related Models; Lee, S.Y., Ed.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 229–260.
43. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
44. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
45. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2023. Available online: https://www.R-project.org/ (accessed on 15 March 2023).
46. Robitzsch, A. sirt: Supplementary Item Response Theory Models; R package version 3.13-228. 2023. Available online: https://CRAN.R-project.org/package=sirt (accessed on 11 August 2023).
47. Belzak, W.C. The multidimensionality of measurement bias in high-stakes testing: Using machine learning to evaluate complex sources of differential item functioning. Educ. Meas. 2023, 42, 24–33.
48. Chen, Y.; Li, C.; Ouyang, J.; Xu, G. DIF statistical inference without knowing anchoring items. Psychometrika 2023. Epub ahead of print.
49. Robitzsch, A. Comparing robust linking and regularized estimation for linking two groups in the 1PL and 2PL models in the presence of sparse uniform differential item functioning. Stats 2023, 6, 192–208.
50. Sun, J.; Chen, Y.; Liu, J.; Ying, Z.; Xin, T. Latent variable selection for multidimensional item response theory models via L1 regularization. Psychometrika 2016, 81, 921–939.
51. Tutz, G.; Schauberger, G. A penalty approach to differential item functioning in Rasch models. Psychometrika 2015, 80, 21–43.
52. Zhang, S.; Chen, Y. Computation for latent variable model estimation: A unified stochastic proximal framework. Psychometrika 2022, 87, 1473–1502.
53. Chen, Y.; Liu, J.; Xu, G.; Ying, Z. Statistical analysis of Q-matrix based diagnostic classification models. J. Am. Stat. Assoc. 2015, 110, 850–866.
54. Robitzsch, A. Regularized latent class analysis for polytomous item responses: An application to SPM-LS data. J. Intell. 2020, 8, 30.
55. Xu, G.; Shang, Z. Identifying latent structures in restricted latent class models. J. Am. Stat. Assoc. 2018, 113, 1284–1295.
56. Robitzsch, A. Regularized mixture Rasch model. Information 2022, 13, 534.
57. Wallin, G.; Chen, Y.; Moustaki, I. DIF analysis with unknown groups and anchor items. arXiv 2023, arXiv:2305.00961.
58. Browne, M.W. Cross-validation methods. J. Math. Psychol. 2000, 44, 108–132.
Figure 1. Simulation Study 1: Absolute bias and root mean square error (RMSE) of the factor mean α 2 , 1 in the second group as a function of the regularization parameter λ for a DIF effect of the item intercept of δ = 0.3 for sample sizes N = 500 and N = 1000 and powers p = 1 and p = 0.5 of the penalty function. The RMSE of the estimate obtained by the optimal BIC and coordinate descent (BIC-CD) is displayed by the blue line. The location of the average optimal regularization parameter obtained by BIC-CD is displayed by the blue triangle.
Figure 2. Simulation Study 1: Absolute bias and root mean square error (RMSE) of the factor mean α 2 , 1 in the second group as a function of the regularization parameter λ for a DIF effect of the item intercept of δ = 0.6 for sample sizes N = 500 and N = 1000 and powers p = 1 and p = 0.5 of the penalty function. The RMSE of the estimate obtained by the optimal BIC and coordinate descent (BIC-CD) is displayed by the blue line. The location of the average optimal regularization parameter obtained by BIC-CD is displayed by the blue triangle.
Figure 3. Simulation Study 2: Data-generating model.
Figure 4. Simulation Study 2: Absolute bias and root mean square error (RMSE) of the factor correlation ϕ 12 as a function of the regularization parameter λ for a cross-loading of δ = 0.2 for sample sizes N = 500 and N = 1000 and powers p = 1 and p = 0.5 of the penalty function. The RMSE of the estimate obtained by the optimal BIC and coordinate descent (BIC-CD) is displayed by the blue line. The location of the average optimal regularization parameter obtained by BIC-CD is displayed by the blue triangle.
Figure 5. Simulation Study 2: Absolute bias and root mean square error (RMSE) of the factor correlation ϕ 12 as a function of the regularization parameter λ for a cross-loading of δ = 0.4 for sample sizes N = 500 and N = 1000 and powers p = 1 and p = 0.5 of the penalty function. The RMSE of the estimate obtained by the optimal BIC and coordinate descent (BIC-CD) is displayed by the blue line. The location of the average optimal regularization parameter obtained by BIC-CD is displayed by the blue triangle.
Table 1. Simulation Study 1: Bias and root mean square error (RMSE) of factor means as a function of sample size N and the size of the DIF effect of item intercepts δ .
Bias

| Par | N | δ | DIR | p=1 BIC CD | p=1 BIC DA | p=1 λ=0.05 CD | p=1 λ=0.05 DA | p=0.5 BIC CD | p=0.5 BIC DA | p=0.5 λ=0.10 CD | p=0.5 λ=0.10 DA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| α2,1 | 500 | 0.3 | −0.012 | −0.005 | −0.005 | −0.004 | −0.005 | −0.005 | −0.006 | −0.007 | −0.007 |
| α2,1 | 500 | 0.6 | 0.000 | 0.000 | 0.001 | −0.001 | −0.001 | 0.000 | 0.000 | −0.019 | −0.019 |
| α2,1 | 1000 | 0.3 | −0.005 | −0.001 | 0.000 | 0.000 | 0.000 | −0.001 | −0.001 | −0.001 | −0.001 |
| α2,1 | 1000 | 0.6 | 0.001 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | −0.010 | −0.010 |
| α3,1 | 500 | 0.3 | −0.013 | −0.007 | −0.006 | −0.006 | −0.006 | −0.007 | −0.007 | −0.008 | −0.008 |
| α3,1 | 500 | 0.6 | 0.001 | 0.001 | 0.002 | 0.000 | 0.000 | 0.001 | 0.001 | −0.017 | −0.017 |
| α3,1 | 1000 | 0.3 | −0.004 | 0.000 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| α3,1 | 1000 | 0.6 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | −0.008 | −0.008 |

RMSE

| Par | N | δ | DIR | p=1 BIC CD | p=1 BIC DA | p=1 λ=0.05 CD | p=1 λ=0.05 DA | p=0.5 BIC CD | p=0.5 BIC DA | p=0.5 λ=0.10 CD | p=0.5 λ=0.10 DA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| α2,1 | 500 | 0.3 | 0.090 | 0.086 | 0.086 | 0.086 | 0.086 | 0.083 | 0.083 | 0.082 | 0.082 |
| α2,1 | 500 | 0.6 | 0.081 | 0.081 | 0.081 | 0.081 | 0.081 | 0.080 | 0.080 | 0.084 | 0.084 |
| α2,1 | 1000 | 0.3 | 0.057 | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 | 0.056 |
| α2,1 | 1000 | 0.6 | 0.057 | 0.057 | 0.057 | 0.057 | 0.057 | 0.057 | 0.057 | 0.060 | 0.061 |
| α3,1 | 500 | 0.3 | 0.096 | 0.090 | 0.090 | 0.090 | 0.090 | 0.087 | 0.087 | 0.086 | 0.086 |
| α3,1 | 500 | 0.6 | 0.083 | 0.082 | 0.082 | 0.082 | 0.082 | 0.082 | 0.082 | 0.083 | 0.084 |
| α3,1 | 1000 | 0.3 | 0.058 | 0.058 | 0.058 | 0.058 | 0.058 | 0.057 | 0.057 | 0.057 | 0.057 |
| α3,1 | 1000 | 0.6 | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 | 0.061 | 0.061 |
Note. Par = parameter; α g,1 = factor mean in group g = 2, 3; BIC = estimation using the optimal regularization parameter selected by the BIC; p = power used in the penalty function; λ = fixed regularization parameter; DIR = direct BIC minimization using the differentiable approximation of O’Neill and Burke (2023) with ε = 0.01; CD = coordinate descent; DA = differentiable approximation using the threshold parameter τ = 0.02.
Table 2. Simulation Study 1: Average number of regularized item intercepts using the optimal regularization parameter λ based on the BIC as a function of sample size N and the size of the DIF effect of item intercepts δ .
| N | δ | p=1 CD | p=1 DA τ=0.01 | p=1 DA τ=0.02 | p=1 DA τ=0.04 | p=0.5 CD | p=0.5 DA τ=0.01 | p=0.5 DA τ=0.02 | p=0.5 DA τ=0.04 |
|---|---|---|---|---|---|---|---|---|---|
| 500 | 0.3 | 11.86 | 11.45 | 11.93 | 12.08 | 11.95 | 11.96 | 11.96 | 11.96 |
| 500 | 0.6 | 11.94 | 11.96 | 11.97 | 11.99 | 11.97 | 11.97 | 11.97 | 11.97 |
| 1000 | 0.3 | 11.95 | 11.87 | 11.98 | 12.00 | 11.95 | 11.96 | 11.96 | 11.96 |
| 1000 | 0.6 | 11.99 | 12.00 | 12.00 | 12.00 | 11.98 | 11.99 | 11.99 | 11.99 |
Note. p = power used in the penalty function; CD = coordinate descent; DA = differentiable approximation using a threshold parameter τ .
Table 3. Simulation Study 1: Bias and coverage rates as a function of sample size N and the size of the DIF effect of item intercepts δ .
| Par | N | δ | Bias DIR | Bias p=1 DA λ=0.05 | Bias p=1 DA λ=0.10 | Bias p=0.5 DA λ=0.10 | Bias p=0.5 DA λ=0.15 | Cov DIR | Cov p=1 DA λ=0.05 | Cov p=1 DA λ=0.10 | Cov p=0.5 DA λ=0.10 | Cov p=0.5 DA λ=0.15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| α2,1 | 500 | 0.3 | −0.02 | −0.01 | **−0.08** | −0.01 | **−0.06** | 94.1 | 93.7 | **81.9** | 92.4 | **82.8** |
| α2,1 | 500 | 0.6 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 | 95.4 | 95.0 | 95.1 | 95.2 | 95.3 |
| α2,1 | 1000 | 0.3 | 0.00 | 0.00 | **−0.09** | 0.00 | **−0.06** | 95.3 | 95.5 | **67.5** | 95.3 | **74.2** |
| α2,1 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.5 | 94.5 | 94.5 | 94.5 | 94.5 |
| α3,1 | 500 | 0.3 | −0.01 | 0.00 | **−0.08** | −0.01 | **−0.05** | 94.1 | 94.7 | **81.8** | 92.9 | **85.1** |
| α3,1 | 500 | 0.6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.5 | 94.9 | 95.0 | 94.5 | 94.9 |
| α3,1 | 1000 | 0.3 | 0.00 | 0.00 | **−0.09** | 0.00 | **−0.05** | 95.8 | 95.6 | **66.0** | 95.2 | **75.7** |
| α3,1 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.3 | 95.1 | 95.2 | 95.1 | 95.2 |
| ϕ2,11 | 500 | 0.3 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 95.4 | 95.3 | 95.3 | 95.2 | 95.3 |
| ϕ2,11 | 500 | 0.6 | 0.01 | 0.01 | 0.01 | 0.00 | 0.00 | 95.6 | 95.4 | 95.5 | 95.2 | 95.3 |
| ϕ2,11 | 1000 | 0.3 | 0.01 | 0.01 | 0.01 | 0.00 | 0.01 | 95.2 | 95.0 | 95.0 | 94.9 | 95.0 |
| ϕ2,11 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.5 | 95.2 | 95.2 | 95.2 | 95.2 |
| ϕ3,11 | 500 | 0.3 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 94.8 | 94.6 | 94.7 | 94.4 | 94.6 |
| ϕ3,11 | 500 | 0.6 | 0.01 | 0.01 | 0.01 | 0.01 | 0.01 | 95.2 | 95.2 | 95.2 | 95.1 | 95.1 |
| ϕ3,11 | 1000 | 0.3 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 95.2 | 95.2 | 95.4 | 95.4 | 95.3 |
| ϕ3,11 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.8 | 95.3 | 95.3 | 95.2 | 95.2 |
| ν2,1 | 500 | 0.3 | 0.02 | 0.01 | **0.16** | 0.02 | **0.12** | 94.5 | 93.9 | **35.1** | 91.6 | **48.3** |
| ν2,1 | 500 | 0.6 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 95.3 | 94.3 | 95.2 | **85.7** | 95.2 |
| ν2,1 | 1000 | 0.3 | 0.01 | 0.00 | **0.18** | 0.00 | **0.12** | 96.0 | 95.7 | **32.1** | 94.8 | **52.0** |
| ν2,1 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 95.1 | 94.7 | 94.5 | **85.3** | 94.5 |
| ν3,2 | 500 | 0.3 | 0.02 | 0.01 | **0.17** | 0.02 | **0.12** | 94.2 | 93.5 | **29.6** | 92.0 | **48.4** |
| ν3,2 | 500 | 0.6 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 95.8 | 95.1 | 95.4 | **86.1** | 95.4 |
| ν3,2 | 1000 | 0.3 | 0.00 | 0.00 | **0.19** | 0.00 | **0.11** | 95.6 | 95.2 | **23.7** | 94.5 | **52.1** |
| ν3,2 | 1000 | 0.6 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 95.3 | 95.0 | 94.9 | **85.9** | 95.0 |
Note. Par = parameter; α g,1 = factor mean in group g = 2, 3; ϕ g,11 = factor variance in group g = 2, 3; ν g,i = item intercept of item i in group g = 2, 3; BIC = estimation using the optimal regularization parameter selected by the BIC; p = power used in the penalty function; λ = fixed regularization parameter; DIR = direct BIC minimization using the differentiable approximation of O’Neill and Burke (2023) with ε = 0.01; DA = differentiable approximation using the threshold parameter τ = 0.02. Absolute biases larger than 0.04 are printed in bold. Coverage rates smaller than 91 or larger than 98 are printed in bold.
Table 4. Simulation Study 2: Bias and root mean square error (RMSE) of the estimated factor correlation and factor loadings as a function of sample size N and the size of cross-loadings δ .
Bias

| Par | N | δ | DIR | p=1 BIC CD | p=1 BIC DA | p=1 λ=0.025 CD | p=1 λ=0.025 DA | p=0.5 BIC CD | p=0.5 BIC DA | p=0.5 λ=0.025 CD | p=0.5 λ=0.025 DA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ϕ12 | 500 | 0.2 | 0.000 | 0.011 | 0.009 | −0.001 | −0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| ϕ12 | 500 | 0.4 | 0.000 | 0.001 | 0.000 | 0.000 | −0.001 | 0.001 | 0.001 | 0.000 | 0.000 |
| ϕ12 | 1000 | 0.2 | 0.000 | 0.002 | 0.001 | −0.001 | −0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| ϕ12 | 1000 | 0.4 | −0.001 | 0.000 | −0.001 | −0.001 | −0.001 | 0.000 | 0.000 | 0.000 | 0.000 |
| λ11 | 500 | 0.2 | −0.001 | −0.002 | −0.002 | −0.001 | −0.001 | 0.000 | −0.001 | −0.001 | −0.001 |
| λ11 | 500 | 0.4 | 0.000 | −0.001 | −0.001 | 0.000 | 0.000 | 0.000 | 0.000 | −0.001 | −0.001 |
| λ11 | 1000 | 0.2 | 0.000 | 0.000 | −0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| λ11 | 1000 | 0.4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | −0.001 |
| λ12 | 500 | 0.2 | −0.001 | −0.006 | −0.005 | 0.000 | 0.000 | −0.001 | −0.001 | 0.000 | 0.000 |
| λ12 | 500 | 0.4 | −0.001 | −0.002 | −0.001 | −0.001 | −0.001 | −0.002 | −0.001 | 0.000 | 0.000 |
| λ12 | 1000 | 0.2 | −0.001 | −0.002 | −0.001 | 0.000 | 0.000 | −0.001 | 0.000 | 0.000 | 0.000 |
| λ12 | 1000 | 0.4 | −0.002 | −0.002 | −0.002 | −0.002 | −0.001 | −0.002 | −0.002 | −0.001 | −0.001 |
| λ21 | 500 | 0.2 | −0.001 | −0.003 | −0.002 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 |
| λ21 | 500 | 0.4 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.002 |
| λ21 | 1000 | 0.2 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 | −0.001 |
| λ21 | 1000 | 0.4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| λ22 | 500 | 0.2 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 0.000 | 0.001 | 0.001 |
| λ22 | 500 | 0.4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.002 |
| λ22 | 1000 | 0.2 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.001 |
| λ22 | 1000 | 0.4 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |

RMSE

| Par | N | δ | DIR | p=1 BIC CD | p=1 BIC DA | p=1 λ=0.025 CD | p=1 λ=0.025 DA | p=0.5 BIC CD | p=0.5 BIC DA | p=0.5 λ=0.025 CD | p=0.5 λ=0.025 DA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ϕ12 | 500 | 0.2 | 0.040 | 0.049 | 0.048 | 0.044 | 0.043 | 0.041 | 0.041 | 0.028 | 0.024 |
| ϕ12 | 500 | 0.4 | 0.039 | 0.040 | 0.040 | 0.043 | 0.043 | 0.040 | 0.040 | 0.029 | 0.024 |
| ϕ12 | 1000 | 0.2 | 0.028 | 0.030 | 0.030 | 0.030 | 0.029 | 0.028 | 0.028 | 0.022 | 0.017 |
| ϕ12 | 1000 | 0.4 | 0.028 | 0.028 | 0.028 | 0.030 | 0.029 | 0.028 | 0.028 | 0.023 | 0.017 |
| λ11 | 500 | 0.2 | 0.040 | 0.042 | 0.042 | 0.041 | 0.041 | 0.040 | 0.040 | 0.034 | 0.034 |
| λ11 | 500 | 0.4 | 0.040 | 0.041 | 0.041 | 0.041 | 0.041 | 0.040 | 0.040 | 0.034 | 0.034 |
| λ11 | 1000 | 0.2 | 0.028 | 0.028 | 0.028 | 0.028 | 0.028 | 0.028 | 0.028 | 0.024 | 0.023 |
| λ11 | 1000 | 0.4 | 0.029 | 0.029 | 0.029 | 0.030 | 0.029 | 0.029 | 0.029 | 0.025 | 0.024 |
| λ12 | 500 | 0.2 | 0.031 | 0.038 | 0.037 | 0.035 | 0.035 | 0.032 | 0.032 | 0.029 | 0.027 |
| λ12 | 500 | 0.4 | 0.032 | 0.033 | 0.033 | 0.036 | 0.035 | 0.033 | 0.033 | 0.029 | 0.027 |
| λ12 | 1000 | 0.2 | 0.022 | 0.024 | 0.023 | 0.023 | 0.023 | 0.022 | 0.022 | 0.022 | 0.019 |
| λ12 | 1000 | 0.4 | 0.023 | 0.023 | 0.023 | 0.025 | 0.024 | 0.024 | 0.024 | 0.022 | 0.020 |
| λ21 | 500 | 0.2 | 0.042 | 0.043 | 0.043 | 0.045 | 0.045 | 0.042 | 0.042 | 0.043 | 0.042 |
| λ21 | 500 | 0.4 | 0.042 | 0.042 | 0.042 | 0.045 | 0.045 | 0.042 | 0.042 | 0.043 | 0.043 |
| λ21 | 1000 | 0.2 | 0.030 | 0.030 | 0.030 | 0.031 | 0.031 | 0.030 | 0.030 | 0.030 | 0.030 |
| λ21 | 1000 | 0.4 | 0.030 | 0.029 | 0.030 | 0.031 | 0.031 | 0.030 | 0.030 | 0.030 | 0.030 |
| λ22 | 500 | 0.2 | 0.010 | 0.018 | 0.018 | 0.029 | 0.029 | 0.013 | 0.012 | 0.036 | 0.035 |
| λ22 | 500 | 0.4 | 0.011 | 0.009 | 0.010 | 0.029 | 0.029 | 0.011 | 0.010 | 0.036 | 0.035 |
| λ22 | 1000 | 0.2 | 0.006 | 0.009 | 0.009 | 0.015 | 0.014 | 0.007 | 0.007 | 0.025 | 0.024 |
| λ22 | 1000 | 0.4 | 0.007 | 0.005 | 0.004 | 0.016 | 0.015 | 0.008 | 0.007 | 0.026 | 0.024 |
Note. Par = parameter; ϕ12 = factor correlation; λid = factor loading of the ith item on the dth factor; BIC = estimation using the optimal regularization parameter selected by the BIC; p = power used in the penalty function; λ = fixed regularization parameter; DIR = direct BIC minimization using the differentiable approximation of O'Neill and Burke (2023) with ε = 0.01; CD = coordinate descent; DA = differentiable approximation using the threshold parameter τ = 0.02.
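As background for the CD/DA contrast in the table note: the DA approach replaces the non-differentiable power penalty |x|^p (p = 1 gives the lasso) with a smooth surrogate so that standard gradient-based optimizers can be used. The following minimal Python sketch uses the common surrogate (x² + ε)^{p/2}; both this functional form and the value of ε are illustrative assumptions, not necessarily the exact smoothing used in the article.

```python
import numpy as np

def penalty(x, p):
    """Non-differentiable power penalty |x|^p (p = 1: lasso; p = 0.5: bridge)."""
    return np.abs(x) ** p

def penalty_smooth(x, p, eps=1e-4):
    """Differentiable surrogate of |x|^p; approaches |x|^p as eps -> 0.

    The surrogate has a well-defined gradient at x = 0, which is what
    makes direct quasi-Newton or gradient-descent optimization feasible.
    """
    return (x ** 2 + eps) ** (p / 2)

x = np.linspace(-0.1, 0.1, 5)
print(penalty(x, 1.0))
print(penalty_smooth(x, 1.0))  # close to |x| away from zero
```

The price of the smoothing is that estimates are shrunk toward zero but never exactly zero, which is why a separate thresholding step (the τ parameter in the note) is needed to decide which parameters count as regularized.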
Table 5. Simulation Study 2: Average number of regularized cross-loadings using the optimal regularization parameter λ based on the BIC as a function of sample size N and the size of cross-loadings δ.
| N | δ | CD (p=1) | DA τ=0.01 (p=1) | DA τ=0.02 (p=1) | DA τ=0.04 (p=1) | CD (p=0.5) | DA τ=0.01 (p=0.5) | DA τ=0.02 (p=0.5) | DA τ=0.04 (p=0.5) |
|---|---|---|---|---|---|---|---|---|---|
| 500 | 0.3 | 6.50 | 5.85 | 6.64 | 7.03 | 6.98 | 7.01 | 7.01 | 6.99 |
| | 0.6 | 6.92 | 6.72 | 6.94 | 6.98 | 6.94 | 6.95 | 6.95 | 6.94 |
| 1000 | 0.3 | 6.72 | 6.35 | 6.86 | 6.99 | 6.95 | 6.96 | 6.96 | 6.91 |
| | 0.6 | 6.97 | 6.95 | 6.99 | 6.99 | 6.95 | 6.95 | 6.95 | 6.90 |
Note. p = power used in the penalty function; CD = coordinate descent; DA = differentiable approximation using a threshold parameter τ .
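The counts in Table 5 rest on a simple classification rule: because the differentiable approximation never produces exact zeros, a cross-loading is treated as regularized (i.e., effectively removed from the model) when its absolute estimate falls below the threshold τ. A minimal sketch of this counting step; the loading values below are made up for illustration:

```python
import numpy as np

def count_regularized(loadings, tau):
    """Count estimates whose absolute value falls below the threshold tau.

    With coordinate descent these would be exact zeros; with the
    differentiable approximation they are merely very small, so a
    threshold is needed to classify them.
    """
    loadings = np.asarray(loadings, dtype=float)
    return int(np.sum(np.abs(loadings) < tau))

# Hypothetical estimated cross-loadings: two substantial, six near zero.
cross_loadings = [0.001, -0.015, 0.19, 0.004, -0.002, 0.008, 0.003, 0.21]
print(count_regularized(cross_loadings, tau=0.02))  # 6 of the 8 are below tau
```

This also explains why the DA counts in Table 5 depend visibly on τ: a threshold that is too small (τ = 0.01) misses some shrunken-but-nonzero loadings, while larger thresholds recover counts close to those of coordinate descent.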
Table 6. Simulation Study 2: Bias and coverage rates as a function of sample size N and the size of cross-loadings δ.
| Par | N | δ | Bias: DIR (BIC) | Bias: DA λ=0.025 (p=1) | Bias: DA λ=0.05 (p=1) | Bias: DA λ=0.025 (p=0.5) | Bias: DA λ=0.05 (p=0.5) | Cov: DIR (BIC) | Cov: DA λ=0.025 (p=1) | Cov: DA λ=0.05 (p=1) | Cov: DA λ=0.025 (p=0.5) | Cov: DA λ=0.05 (p=0.5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ϕ12 | 500 | 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.2 | 96.2 | 93.8 | 97.3 | 97.9 |
| | | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.3 | 96.0 | 94.7 | 98.0 | **98.3** |
| | 1000 | 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.1 | 95.9 | 94.7 | 97.0 | 97.0 |
| | | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.1 | 95.8 | 95.1 | 97.7 | 96.9 |
| λ11 | 500 | 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.4 | 95.4 | 94.6 | **99.6** | 96.7 |
| | | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.9 | 95.8 | 95.0 | **99.7** | 96.9 |
| | 1000 | 0.2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 94.7 | 94.9 | 94.5 | **99.8** | 95.9 |
| | | 0.4 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 95.1 | 95.4 | 95.1 | **99.9** | 96.4 |
Note. Par = parameter; ϕ12 = factor correlation; λid = factor loading of the ith item on the dth factor; BIC = estimation using the optimal regularization parameter selected by the BIC; p = power used in the penalty function; λ = fixed regularization parameter; DIR = direct BIC minimization using the differentiable approximation of O'Neill and Burke (2023) with ε = 0.01; DA = differentiable approximation using the threshold parameter τ = 0.02. Absolute biases larger than 0.04 are printed in bold. Coverage rates smaller than 91 or larger than 98 are printed in bold.
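The DIR column refers to direct minimization of a smoothed BIC. The core idea in O'Neill and Burke (2023) is to replace the parameter count in the BIC with a differentiable approximation of the L0 "norm", so that the information criterion itself becomes an objective amenable to gradient-based optimization. A sketch under the assumption that the nonzero-parameter indicator 1{θ ≠ 0} is approximated by θ²/(θ² + ε²); the log-likelihood value below is a placeholder, not output from a fitted SEM:

```python
import numpy as np

def smooth_df(theta, eps=0.01):
    """Differentiable approximation of the number of nonzero parameters.

    Each term theta_j**2 / (theta_j**2 + eps**2) is near 1 for clearly
    nonzero parameters and near 0 for parameters close to zero.
    """
    theta = np.asarray(theta, dtype=float)
    return float(np.sum(theta ** 2 / (theta ** 2 + eps ** 2)))

def smooth_bic(loglik, theta, n, eps=0.01):
    """Smoothed BIC: -2 log L + log(n) * approximate model dimension."""
    return -2.0 * loglik + np.log(n) * smooth_df(theta, eps)

theta = np.array([0.5, 0.0, 0.002, -0.3])
print(smooth_df(theta))               # close to 2: two clearly nonzero parameters
print(smooth_bic(-1234.5, theta, n=500))
```

Minimizing this objective over θ simultaneously estimates and selects parameters in a single optimization, avoiding the grid search over λ required by the penalized approaches compared in the tables above.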
