1. Introduction
In the comparison of multiple groups in confirmatory factor analysis (CFA), some identifying assumptions have to be made. It is frequently assumed that item parameters are equal across groups, which is denoted as measurement invariance [1]. The invariance concept has been very prominent in psychology and the social sciences in general [2,3]. For example, in international large-scale assessment studies in education, like the Programme for International Student Assessment (PISA), the necessity of invariance is strongly emphasized [4].
For situations in which measurement invariance is violated, the invariance alignment (IA) method [5,6] has been proposed to achieve approximate invariance [7]. That is, item parameters should be made as invariant as possible while allowing a few deviations from invariance. By doing so, group comparisons can be made more robust against violations of measurement invariance. Note that IA is also referred to as alignment optimization [8,9].
Although IA can be seen as a canonical method for handling measurement noninvariance in the social sciences [10,11], this method has not been thoroughly studied from a statistical and conceptual point of view. There are a few simulation studies that investigate the behavior of the IA method. Except for a few studies [12,13], all simulation studies were carried out with the popular but commercial (and closed-source) Mplus software [14]. Previous simulation studies for unidimensional factor models investigated the case of continuous items [5,9,15,16,17], dichotomous items [18,19], and polytomous items [13,20]. The extension of IA to multidimensional factor models with continuous items was discussed in [21,22]. IA was studied in longitudinal measurement models in [23,24,25,26]. The IA method has been extended to exploratory structural equation models [22,27]. Moreover, the optimization function used in IA has been generalized to a framework for the penalized maximum likelihood estimation of structural equation models [28].
IA has been applied in a wide range of disciplines. For example, IA has been utilized to compare European countries regarding attitudes towards migration [29,30]. Ref. [31] compares seven Latin American countries on the Purpose in Life Test through IA. IA was applied to study bullying among children and adolescents [32,33]. Questionnaire data from the PISA study [34,35] and the Trends in International Mathematics and Science Study (TIMSS) [36] were used in IA applications. Furthermore, the IA method was utilized to investigate overexcitability [37], homophobia [38], distributed leadership [39], and gender role attitudes [40].
In this article, we focus on the implementation aspects of the IA method. The IA optimization function is nondifferentiable. Software implementations of the IA method rely on differentiable approximations that depend on a tuning parameter $\varepsilon$. The default value of this tuning parameter is critically examined in this paper. Furthermore, the originally proposed IA method utilizes the $L_p$ loss function for $p=0.5$. This article investigates whether choices other than $p=0.5$ result in improved estimation performance of the IA method. Moreover, the IA method uses a particular linking function for determining factor standard deviations based on a quantification of differences in residual item loadings. We show in this article that the performance of IA can be improved by relying on a different linking function that employs logarithmized item loadings to estimate factor standard deviations. Finally, the performance of IA is compared with a recently proposed differentiable approximation of the $L_0$ loss function. It turns out that this loss function performed comparably to default IA implementations, if not better, regarding the root mean square error of parameter estimates.
The rest of this article is organized as follows. IA estimation based on the robust $L_p$ and $L_0$ loss functions is treated in Section 2. Section 3 discusses the standard error computation in IA. In Section 4, the research purpose of this article is outlined. Section 5, Section 6, and Section 7 contain three simulation studies that thoroughly investigate the choices in IA implementation. Finally, the article closes with a discussion in Section 8.
2. Loss Functions in Invariance Alignment
In this section, the statistical background of IA is reviewed. In particular, the choice of different loss functions in IA is discussed.
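Let $X_{ig}$ denote item $i$ ($i=1,\ldots,I$) in group $g$ ($g=1,\ldots,G$). A unidimensional factor model [41] is defined as
$$X_{ig} = \nu_{ig} + \lambda_{ig} F_g + \varepsilon_{ig}, \tag{1}$$
where $\lambda_{ig}$ is an item loading and $\nu_{ig}$ is an item intercept. Without loss of generality, item loadings can be assumed to be positive. The factor variables $F_g$ and all residual variables $\varepsilon_{ig}$ are independent and normally distributed. The factor variable $F_g$ has a factor mean $\mu_g$ and a factor standard deviation $\sigma_g$.
It must be emphasized that the model parameters in (1) are not identified. An identified model is obtained by assuming a standardized latent variable $F_g^\ast$ (i.e., with a mean of 0 and a standard deviation of 1):
$$X_{ig} = \nu_{ig}^\ast + \lambda_{ig}^\ast F_g^\ast + \varepsilon_{ig}. \tag{2}$$
The model parameters in (1) and (2) are related to each other by
$$\lambda_{ig}^\ast = \lambda_{ig}\,\sigma_g \quad \text{and} \quad \nu_{ig}^\ast = \nu_{ig} + \lambda_{ig}\,\mu_g. \tag{3}$$
A convenient property is measurement invariance [1,3], in which the same item loadings and item intercepts across groups can be assumed. That is, there exist item loadings $\lambda_i$ such that $\lambda_{ig}=\lambda_i$ for all $g=1,\ldots,G$ and item intercepts $\nu_i$ such that $\nu_{ig}=\nu_i$ for all $g=1,\ldots,G$, for all items $i=1,\ldots,I$. The absence of measurement invariance is also labeled as differential item functioning (DIF; [2,42]) in the literature. In the case of measurement invariance, Equation (3) can be rewritten as
$$\lambda_{ig}^\ast = \lambda_i\,\sigma_g \quad \text{and} \quad \nu_{ig}^\ast = \nu_i + \lambda_i\,\mu_g. \tag{4}$$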
The IA method of Asparouhov and Muthén [5,6] addresses situations with sparse violations of measurement invariance. In this case, a few item loadings or item intercepts are allowed to differ across groups, while the majority of items (approximately) fulfill the invariance assumption [43]. This situation is referred to as partial invariance [44].
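The relation between the unidentified and the identified parameterization can be made concrete with a few lines of code. The following is a minimal Python sketch under assumed notation (loading `lam`, intercept `nu`, factor mean `mu`, factor SD `sigma`); the function names and the numeric values are hypothetical and only serve as an illustration.

```python
def to_identified(lam, nu, mu, sigma):
    """Map item parameters of a model whose factor has mean mu and SD sigma
    to the parameters of the model with a standardized factor (mean 0, SD 1)."""
    lam_std = lam * sigma      # the loading absorbs the factor SD
    nu_std = nu + lam * mu     # the intercept absorbs the factor mean
    return lam_std, nu_std


def from_identified(lam_std, nu_std, mu, sigma):
    """Invert the mapping for given factor mean mu and factor SD sigma."""
    lam = lam_std / sigma
    nu = nu_std - lam * mu
    return lam, nu


# Hypothetical item with loading 1.0 and intercept 0.5 in a group with
# factor mean 0.3 and factor variance 1.5
lam_std, nu_std = to_identified(1.0, 0.5, mu=0.3, sigma=1.5 ** 0.5)
lam_back, nu_back = from_identified(lam_std, nu_std, mu=0.3, sigma=1.5 ** 0.5)
```

Because only the standardized-factor parameters can be estimated within a single group, the factor means and SDs have to be recovered from the pattern of these parameters across groups, which is the task of IA.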
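In IA, the unidimensional factor model (2) is separately estimated for all groups in the first step. The estimated item loadings $\hat{\lambda}_{ig}^\ast$ and item intercepts $\hat{\nu}_{ig}^\ast$ of the identified model ($i=1,\ldots,I$; $g=1,\ldots,G$) are used as the input of the IA. By rewriting (3) and inserting the estimated item loadings and item intercepts, we obtain
$$\lambda_{ig} = \frac{\hat{\lambda}_{ig}^\ast}{\sigma_g} \quad \text{and} \quad \nu_{ig} = \hat{\nu}_{ig}^\ast - \frac{\hat{\lambda}_{ig}^\ast}{\sigma_g}\,\mu_g. \tag{5}$$
These relations motivate the minimization of the following linking function in IA to determine group means $\boldsymbol{\mu}=(\mu_1,\ldots,\mu_G)$ and standard deviations $\boldsymbol{\sigma}=(\sigma_1,\ldots,\sigma_G)$:
$$H(\boldsymbol{\mu},\boldsymbol{\sigma}) = \sum_{i=1}^{I}\sum_{g<h} w_{1,gh}\,\rho\!\left(\frac{\hat{\lambda}_{ig}^\ast}{\sigma_g}-\frac{\hat{\lambda}_{ih}^\ast}{\sigma_h}\right) + \sum_{i=1}^{I}\sum_{g<h} w_{2,gh}\,\rho\!\left(\hat{\nu}_{ig}^\ast-\frac{\hat{\lambda}_{ig}^\ast}{\sigma_g}\,\mu_g-\hat{\nu}_{ih}^\ast+\frac{\hat{\lambda}_{ih}^\ast}{\sigma_h}\,\mu_h\right), \tag{6}$$
where the weights $w_{1,gh}$ and $w_{2,gh}$ are known, and $\rho$ is a loss function. Asparouhov and Muthén [5] proposed $w_{1,gh}=\sqrt{N_g N_h}$ and $w_{2,gh}=\sqrt{N_g N_h}$, where $N_g$ denotes the sample size of group $g$. In the minimization of (6), additional identification constraints must be imposed. In this article, we fix the moments in the first group; that is, we set $\mu_1=0$ and $\sigma_1=1$.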
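As an illustration of how such a linking function can be evaluated, the following minimal Python sketch sums the loss of all pairwise differences in aligned item loadings and aligned item intercepts. It is a sketch under stated assumptions, not the implementation used in Mplus or sirt: the function names are hypothetical, the loss defaults to the square root of the absolute value, and a single optional weight per pair of groups is used for both parts.

```python
import math


def sqrt_abs(x):
    """Loss rho(x) = |x|^0.5 (assumed default for this illustration)."""
    return math.sqrt(abs(x))


def linking_function(mu, sigma, lam_hat, nu_hat, rho=sqrt_abs, weights=None):
    """Evaluate an alignment-type linking function.

    lam_hat[g][i], nu_hat[g][i]: estimates from the identified model in group g.
    mu[g], sigma[g]: candidate factor means and factor SDs.
    weights[g][h]: optional pair weights (1 for every pair if omitted).
    """
    n_groups, n_items = len(lam_hat), len(lam_hat[0])
    total = 0.0
    for g in range(n_groups):
        for h in range(g + 1, n_groups):
            w = 1.0 if weights is None else weights[g][h]
            for i in range(n_items):
                # aligned loadings and intercepts for the candidate (mu, sigma)
                lam_g = lam_hat[g][i] / sigma[g]
                lam_h = lam_hat[h][i] / sigma[h]
                nu_g = nu_hat[g][i] - lam_g * mu[g]
                nu_h = nu_hat[h][i] - lam_h * mu[h]
                total += w * (rho(lam_g - lam_h) + rho(nu_g - nu_h))
    return total


# Hypothetical invariant item parameters in two groups
true_mu, true_sigma = [0.0, 0.3], [1.0, 1.2]
lam, nu = [1.0, 0.8, 1.1], [0.0, 0.2, -0.1]
lam_hat = [[l * s for l in lam] for s in true_sigma]
nu_hat = [[n + l * m for l, n in zip(lam, nu)] for m in true_mu]

h_true = linking_function(true_mu, true_sigma, lam_hat, nu_hat)
h_wrong = linking_function([0.0, 0.0], [1.0, 1.0], lam_hat, nu_hat)
```

Under exact invariance, the function is (numerically) zero at the true group means and SDs and positive at misaligned values, which is why its minimizer can serve as an estimate of the group moments.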
In the rest of the article, we ignore the weights in (6) for two reasons. First, omitting them simplifies the mathematical notation. Second, it is not obvious why one should choose weights related to the sample sizes of the groups. We think that model deviations should be equally weighted across groups.
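Note that the optimization function $H$ of IA, defined in (6), can be rewritten as
$$H(\boldsymbol{\mu},\boldsymbol{\sigma}) = H_1(\boldsymbol{\sigma}) + H_2(\boldsymbol{\mu},\boldsymbol{\sigma}), \tag{7}$$
where $H_1$ collects the terms involving item loadings and $H_2$ collects the terms involving item intercepts. It has been shown that the simultaneous minimization of $H$ with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$ can be viewed as a two-step minimization problem [45]. In more detail, a vector of estimated factor standard deviations $\hat{\boldsymbol{\sigma}}$ is obtained by minimizing
$$H_1(\boldsymbol{\sigma}) = \sum_{i=1}^{I}\sum_{g<h} \rho\!\left(\frac{\hat{\lambda}_{ig}^\ast}{\sigma_g}-\frac{\hat{\lambda}_{ih}^\ast}{\sigma_h}\right) \tag{8}$$
in the first step, where $\hat{\lambda}_{ig}^\ast$ and $\hat{\nu}_{ig}^\ast$ denote the estimated item loadings and item intercepts of the identified model (2). In the second step, a vector of estimated factor means $\hat{\boldsymbol{\mu}}$ is obtained by minimizing
$$H_2(\boldsymbol{\mu},\hat{\boldsymbol{\sigma}}) = \sum_{i=1}^{I}\sum_{g<h} \rho\!\left(\hat{\nu}_{ig}^\ast-\frac{\hat{\lambda}_{ig}^\ast}{\hat{\sigma}_g}\,\mu_g-\hat{\nu}_{ih}^\ast+\frac{\hat{\lambda}_{ih}^\ast}{\hat{\sigma}_h}\,\mu_h\right)$$
with respect to $\boldsymbol{\mu}$.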
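Equation (3) can be rewritten as
$$\log \lambda_{ig}^\ast = \log \lambda_{ig} + \log \sigma_g, \tag{9}$$
where $\lambda_{ig}^\ast$ denotes the item loading of the identified model (2). This motivates using an alternative optimization function $H_{1,\log}$ for determining standard deviations that employs logarithmized item loadings (see [45,46])
$$H_{1,\log}(\boldsymbol{\tau}) = \sum_{i=1}^{I}\sum_{g<h} \rho\!\left(\log\hat{\lambda}_{ig}^\ast-\tau_g-\log\hat{\lambda}_{ih}^\ast+\tau_h\right), \tag{10}$$
where $\tau_g=\log\sigma_g$ for $g=1,\ldots,G$. Due to the identification constraint, we fix $\tau_1=0$ (i.e., $\sigma_1=1$). By minimizing $H_{1,\log}$, a vector of standard deviations $\hat{\boldsymbol{\tau}}$ on the logarithm metric is obtained; that is, $\hat{\boldsymbol{\tau}}=(\log\hat{\sigma}_1,\ldots,\log\hat{\sigma}_G)$. The vector of estimated standard deviations $\hat{\boldsymbol{\sigma}}$ can be obtained by exponentiating all entries in $\hat{\boldsymbol{\tau}}$. The vector of estimated factor means $\hat{\boldsymbol{\mu}}$ can again be obtained by minimizing the intercept part $H_2$ of the linking function with the factor standard deviations fixed at $\hat{\boldsymbol{\sigma}}$.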
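The difference between aligning raw and aligning logarithmized item loadings can be sketched as follows. This is a minimal Python illustration under assumptions: the function names `h1_nol` and `h1_log` are hypothetical, the loss is taken as the square root of the absolute value, and the loading values are made up.

```python
import math


def h1_nol(sigma, lam_hat, rho):
    """Loading part of the linking function based on raw aligned loadings."""
    total = 0.0
    for g in range(len(lam_hat)):
        for h in range(g + 1, len(lam_hat)):
            for i in range(len(lam_hat[0])):
                total += rho(lam_hat[g][i] / sigma[g] - lam_hat[h][i] / sigma[h])
    return total


def h1_log(sigma, lam_hat, rho):
    """Loading part of the linking function based on logarithmized loadings."""
    total = 0.0
    for g in range(len(lam_hat)):
        for h in range(g + 1, len(lam_hat)):
            for i in range(len(lam_hat[0])):
                dev_g = math.log(lam_hat[g][i]) - math.log(sigma[g])
                dev_h = math.log(lam_hat[h][i]) - math.log(sigma[h])
                total += rho(dev_g - dev_h)
    return total


def rho(x):
    return abs(x) ** 0.5


# Two groups, three items; the third item of the first group has a DIF loading
lam_hat = [[1.0, 1.0, 2.0], [1.2, 1.2, 1.2]]
sigma = [1.0, 1.2]
doubled = [2.0 * s for s in sigma]

nol, nol_doubled = h1_nol(sigma, lam_hat, rho), h1_nol(doubled, lam_hat, rho)
log_, log_doubled = h1_log(sigma, lam_hat, rho), h1_log(doubled, lam_hat, rho)
```

In this sketch, multiplying all factor SDs by a common constant leaves the logarithmized deviations unchanged, whereas the raw deviations shrink with the constant. This is a property of the two functions as written here; it is not claimed to be the reason for the performance differences reported in the simulation studies.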
Hence, there are two estimation options for IA. The original approach of [5] minimizes $H_1$, which is based on untransformed item loadings, and is referred to as the “NOL” method (i.e., no logarithm for item loadings). The second approach obtains factor standard deviations by minimizing $H_{1,\log}$, which is based on logarithmized item loadings, and is referred to as the “LOG” method (i.e., taking the logarithmized item loadings for defining deviations).
As mentioned above, IA uses the loss function $\rho(x)=|x|^{0.5}$ as the default in the Mplus software package [14]. However, an alternative loss function is also available in Mplus [14]. The more general $L_p$ loss function $\rho(x)=|x|^p$ for $p>0$ was studied for IA in [12,45]. It has been shown that values of the power $p$ smaller than 0.5 can be advantageous in some situations [12]. An interesting case is the limiting case in which $p$ tends to zero. Effectively, $|x|^p$ then counts the number of parameter deviations that differ from zero [47,48], resulting in the $L_0$ loss function. In the practical minimization of $H$ involved in IA, the nondifferentiable $L_p$ loss function $\rho(x)=|x|^p$ (for $p\le 1$) is replaced by a differentiable approximation $\rho_{p,\varepsilon}$ (see [5,45])
$$\rho_{p,\varepsilon}(x) = \left(x^2+\varepsilon\right)^{p/2}, \tag{11}$$
where $\varepsilon>0$ is a tuning parameter that controls the error made when approximating $|x|^p$ by $\rho_{p,\varepsilon}(x)$. The approximation error becomes smaller with $\varepsilon$ values close to zero. However, the minimization of $H$ in IA gets more difficult when choosing too-small values of $\varepsilon$. Practical experience led to proposals $\varepsilon=0.01$ [5] or $\varepsilon=0.001$ [45]. The choice $\varepsilon=0.01$ is the default in Mplus (see [12]).
It is also tempting to consider the $L_0$ loss function $\rho(x)=\mathbf{1}(x\neq 0)$ that takes the value 1 if $x\neq 0$ and 0 for $x=0$. However, the differentiable approximation $\rho_{p,\varepsilon}$ in (11) performs poorly for $p$ values close to 0 because the minimization of $H$ becomes very difficult. O’Neill and Burke [49] proposed, in a recent work related to regularized estimation, the following differentiable approximation $\rho_{0,\varepsilon}$ of the $L_0$ loss function:
$$\rho_{0,\varepsilon}(x) = \frac{x^2}{x^2+\varepsilon}, \tag{12}$$
where $\varepsilon>0$ is again a tuning parameter that controls the approximation error and estimation stability of the differentiable approximation. This approximation (12) of the $L_0$ loss function has not yet been investigated in IA. As IA is particularly suited to sparse deviations in model parameters, the $L_0$ loss function should theoretically fit typical data-generating models frequently utilized in simulation studies that investigate the performance of IA.
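The role of the tuning parameter can be illustrated numerically. The following minimal Python sketch assumes the functional forms $(x^2+\varepsilon)^{p/2}$ for the smoothed $L_p$ loss and $x^2/(x^2+\varepsilon)$ for the smoothed $L_0$ loss; whether the tuning parameter enters the latter squared or unsquared is an assumption of this sketch, and the function names are hypothetical.

```python
def rho_p(x, p=0.5, eps=0.01):
    """Differentiable approximation of |x|^p (assumed form)."""
    return (x * x + eps) ** (p / 2)


def rho_0(x, eps=0.01):
    """Differentiable approximation of the L0 loss 1(x != 0) (assumed form)."""
    return x * x / (x * x + eps)


# Approximation error at x = 0, where |x|^p equals 0
error_eps_01 = rho_p(0.0, p=0.5, eps=0.01)     # equals 0.01 ** 0.25
error_eps_001 = rho_p(0.0, p=0.5, eps=0.001)   # equals 0.001 ** 0.25

# Far from zero, both approximations are close to their targets
lp_far = rho_p(1.0, p=0.5, eps=0.001)          # target |1|^0.5 = 1
l0_far = rho_0(1.0, eps=0.01)                  # target 1
l0_zero = rho_0(0.0, eps=0.01)                 # target 0
```

In this sketch, the smoothed $L_p$ loss evaluates to $\varepsilon^{p/2}$ at zero, so a smaller tuning parameter reduces the error at zero, while the smoothed $L_0$ loss is exactly zero at zero for any positive tuning parameter.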
In our experience, in the case of small $\varepsilon$ values, the optimization of the alignment function is very sensitive to starting values. Asparouhov and Muthén [5] remark that the linking function in invariance alignment is prone to multiple local minima. Moreover, they mention that these local minima often yield values of the linking function that are only slightly different from values at the global minimum. In the estimation of the IA approach, it is advised to choose a sequence of decreasing values of $\varepsilon$ in the optimization, each using the previous solution as initial values (see [50] for a similar approach). This strategy provides more suitable starting values and a more stable estimation of IA.
3. Standard Errors in Invariance Alignment
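In this section, the estimation of standard errors for the IA approach is described. IA minimizes the linking function $H(\boldsymbol{\mu},\boldsymbol{\sigma})$ with respect to $\boldsymbol{\mu}$ and $\boldsymbol{\sigma}$. Let $\boldsymbol{\theta}=(\boldsymbol{\mu},\boldsymbol{\sigma})$ be the parameter vector of interest. The estimated item loadings and item intercepts across all groups are collected in a vector $\hat{\boldsymbol{\gamma}}=(\hat{\boldsymbol{\gamma}}_1,\ldots,\hat{\boldsymbol{\gamma}}_G)$, where $\hat{\boldsymbol{\gamma}}_g$ contains the group-specific model parameter estimates from a unidimensional factor model in group $g=1,\ldots,G$. As a maximum likelihood estimate, $\hat{\boldsymbol{\gamma}}_g$ is approximately multivariate normally distributed. Because of the independence of subjects across groups, a multivariate normal distribution of the input parameter $\hat{\boldsymbol{\gamma}}$ in IA is obtained as
$$\hat{\boldsymbol{\gamma}} \sim \mathrm{MVN}(\boldsymbol{\gamma}_0, \mathbf{V}), \tag{13}$$
where $\mathbf{V}$ is a block-diagonal covariance matrix, and $\boldsymbol{\gamma}_0$ is a population parameter.
The distribution of the IA estimates $\hat{\boldsymbol{\theta}}$ is now derived using the delta method in M-estimation theory [51] by relying on the implicit function theorem [5]. We assume differentiability of the optimization function because the nondifferentiable loss function $\rho$ in IA is replaced by a differentiable approximation. The IA approach minimizes $H(\boldsymbol{\theta},\hat{\boldsymbol{\gamma}})$, where we now highlight the dependency on the input parameters $\hat{\boldsymbol{\gamma}}$. A parameter estimate $\hat{\boldsymbol{\theta}}$ is obtained by taking the partial derivative of $H$ with respect to $\boldsymbol{\theta}$ (i.e., $H_{\boldsymbol{\theta}}=\partial H/\partial\boldsymbol{\theta}$) and solving the nonlinear equation such that
$$H_{\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\gamma}}) = \mathbf{0}. \tag{14}$$
Note that there exists a population parameter $\boldsymbol{\theta}_0$ such that
$$H_{\boldsymbol{\theta}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0) = \mathbf{0}. \tag{15}$$
Now, a Taylor expansion of $H_{\boldsymbol{\theta}}$ around $(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0)$ can be carried out. Denote with $H_{\boldsymbol{\theta}\boldsymbol{\theta}}$ and $H_{\boldsymbol{\theta}\boldsymbol{\gamma}}$ the matrices of partial derivatives of $H_{\boldsymbol{\theta}}$ with respect to $\boldsymbol{\theta}$ and $\boldsymbol{\gamma}$, respectively (i.e., second-order partial derivatives of $H$). The Taylor expansion can be written as
$$\mathbf{0} = H_{\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\gamma}}) \approx H_{\boldsymbol{\theta}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0) + H_{\boldsymbol{\theta}\boldsymbol{\theta}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0)\,(\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_0) + H_{\boldsymbol{\theta}\boldsymbol{\gamma}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0)\,(\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}_0). \tag{16}$$
By solving (16) for $\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_0$ and using (15), we have the approximation
$$\hat{\boldsymbol{\theta}}-\boldsymbol{\theta}_0 \approx -H_{\boldsymbol{\theta}\boldsymbol{\theta}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0)^{-1}\,H_{\boldsymbol{\theta}\boldsymbol{\gamma}}(\boldsymbol{\theta}_0,\boldsymbol{\gamma}_0)\,(\hat{\boldsymbol{\gamma}}-\boldsymbol{\gamma}_0). \tag{17}$$
By defining $\mathbf{A}=-H_{\boldsymbol{\theta}\boldsymbol{\theta}}(\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\gamma}})^{-1}\,H_{\boldsymbol{\theta}\boldsymbol{\gamma}}(\hat{\boldsymbol{\theta}},\hat{\boldsymbol{\gamma}})$ when substituting $\boldsymbol{\theta}_0$ and $\boldsymbol{\gamma}_0$ with $\hat{\boldsymbol{\theta}}$ and $\hat{\boldsymbol{\gamma}}$, respectively, we have, by using the multivariate delta method [51],
$$\mathrm{Var}(\hat{\boldsymbol{\theta}}) = \mathbf{A}\,\mathbf{V}\,\mathbf{A}^\top. \tag{18}$$
Standard errors for elements in $\hat{\boldsymbol{\theta}}$ can be obtained by taking the square root of diagonal elements of $\mathrm{Var}(\hat{\boldsymbol{\theta}})$ computed from (18).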
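The delta-method computation can be sketched numerically for the simplest case of a scalar parameter. The following minimal Python sketch approximates the required second-order derivatives by central finite differences; the function names, the step size, the toy linking function, and the data are hypothetical, and a squared-error loss is used only because its standard error is known in closed form.

```python
def quadratic_link(theta, gamma):
    """Toy linking function with squared-error loss; minimized at mean(gamma)."""
    return sum((g - theta) ** 2 for g in gamma)


def delta_method_se(link, theta_hat, gamma_hat, cov_gamma, h=1e-4):
    """Standard error of a scalar theta_hat = argmin_theta link(theta, gamma_hat).

    cov_gamma is the covariance matrix of gamma_hat (list of lists)."""
    n = len(gamma_hat)
    # second derivative of the linking function with respect to theta
    h_tt = (link(theta_hat + h, gamma_hat) - 2.0 * link(theta_hat, gamma_hat)
            + link(theta_hat - h, gamma_hat)) / h ** 2
    a = []
    for j in range(n):
        up = list(gamma_hat)
        up[j] += h
        dn = list(gamma_hat)
        dn[j] -= h
        # mixed derivative with respect to theta and the j-th input parameter
        h_tg = (link(theta_hat + h, up) - link(theta_hat + h, dn)
                - link(theta_hat - h, up) + link(theta_hat - h, dn)) / (4.0 * h * h)
        a.append(-h_tg / h_tt)
    var = sum(a[j] * cov_gamma[j][k] * a[k] for j in range(n) for k in range(n))
    return var ** 0.5


# Four independent inputs, each with variance 0.04
gamma_hat = [0.2, 0.4, 0.3, 0.5]
theta_hat = sum(gamma_hat) / len(gamma_hat)
cov_gamma = [[0.04 if j == k else 0.0 for k in range(4)] for j in range(4)]
se = delta_method_se(quadratic_link, theta_hat, gamma_hat, cov_gamma)
```

For this toy case, the minimizer is the arithmetic mean of the inputs, so the result can be checked against the familiar standard error of a mean, $\sqrt{4 \cdot 0.04}/4 = 0.1$. With a smoothed robust loss, the same function could be applied to the corresponding minimizer, although the finite-difference step would then have to be chosen with more care.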
5. Simulation Study 1: Bias and RMSE in a Three-Group Example
In Simulation Study 1, IA is studied in a case with noninvariant item intercepts and in a case with noninvariant item loadings and noninvariant item intercepts. This study focuses on bias and root mean square error (RMSE).
5.1. Method
The data-generating models (DGMs) in the simulation study mimicked the DGM used in [5]. The data were simulated from a one-dimensional factor model involving five items (i.e., $I=5$) and three groups (i.e., $G=3$). The factor variable was normally distributed with group means 0, 0.3, and 0.8, and the group variances were 1, 1.5, and 1.2, respectively. All measurement error variances were set to one in all groups, and the measurement errors were uncorrelated with each other. The factor variable and residual variables were normally distributed.
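A data-generating step of this kind can be sketched in a few lines. The following minimal Python sketch draws data for three groups with the stated factor means and variances and unit residual variances. It is an illustration under assumptions: the function name, the seed, and the sample size of 1000 are hypothetical, and all item parameters are kept invariant (loadings of one, intercepts of zero) because the sketch does not attempt to reproduce the DIF effects of the DGMs.

```python
import random


def simulate_group(n, mu, var, loadings, intercepts, rng):
    """Draw n persons from a one-factor model with unit residual variances."""
    sd = var ** 0.5
    data = []
    for _ in range(n):
        f = rng.gauss(mu, sd)                       # factor score
        data.append([nu + lam * f + rng.gauss(0.0, 1.0)
                     for lam, nu in zip(loadings, intercepts)])
    return data


rng = random.Random(2023)                           # hypothetical seed
group_means = [0.0, 0.3, 0.8]
group_vars = [1.0, 1.5, 1.2]
loadings = [1.0] * 5                                # invariant baseline
intercepts = [0.0] * 5                              # group-specific DIF would be added here

groups = [simulate_group(1000, m, v, loadings, intercepts, rng)
          for m, v in zip(group_means, group_vars)]

# Observed mean of the first item in the third group (population value 0.8)
item1_mean_g3 = sum(row[0] for row in groups[2]) / len(groups[2])
```

In a full simulation, a one-factor model would be fitted to each simulated group, and the resulting loadings and intercepts would serve as the input of IA.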
Two DGMs were simulated that refer to a violation of measurement invariance. Group-specific item parameters that are noninvariant are referred to as DIF effects [
2].
In the first DGM (i.e., DGM1), only item intercepts were noninvariant. All item loadings were set to one, and only a subset of group-specific item intercepts were simulated differently from zero. Hence, data were simulated assuming partial invariance. In the first group, the fourth item intercept was . In the second group, the first item was , while the second item had an intercept of in the third group.
In the second DGM (i.e., DGM2), both item intercepts and item loadings were noninvariant. The same item intercepts as in DGM1 were used. Three group-specific item loadings were different from one. The item loading of the third item in the first group and the item loading of the fifth item in the third group were both 2.013. The second item in the second group had an item loading of 0.497.
The sample size per group was chosen as
,
,
, or
. IA was estimated using the NOL method (based on the
function defined in (
8)) and the LOG method (based on the
function defined in (
10)). The
loss function was employed using the powers
,
, and
and the differentiable approximation defined in (
11). The
loss function employed differentiable approximation (
12). We did not consider the power values $p=2$ or $p=1$ because they have been shown to result in severely biased estimates in the situation of partial invariance [12,45]. The reason is that noninvariant item intercepts (i.e., model errors) indicate a misspecified model. This kind of misspecification biases factor means if the $L_p$ loss function is used with $p=2$ because all item intercepts contribute to the estimation of factor means. The situation is known from robust statistics, where the unweighted mean is not robust to outlying observations. Moreover, the $L_p$ loss function with $p=1$ does not fully remove bias because it treats model errors (i.e., outlying observations) symmetrically. In contrast, the $L_p$ loss function with $p<1$ is more robust to asymmetrically distributed model errors.
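The analogy to robust location estimation can be sketched numerically. The following minimal Python illustration assumes a setting with two groups in which the mean difference is estimated as the minimizer of the summed smoothed $L_p$ loss of the intercept differences. The function names, the grid search, the tuning parameter, and the data (three differences around 0.30 and two outlying differences) are hypothetical.

```python
def smooth_lp(x, p, eps=1e-4):
    """Smoothed L_p loss (assumed form)."""
    return (x * x + eps) ** (p / 2)


def lp_location(values, p, eps=1e-4, grid_step=0.001):
    """Grid-search minimizer of sum_i smooth_lp(values[i] - m) over
    m between the smallest and largest value."""
    lo, hi = min(values), max(values)
    n_steps = int(round((hi - lo) / grid_step))
    best_m, best_f = lo, float("inf")
    for k in range(n_steps + 1):
        m = lo + k * grid_step
        f = sum(smooth_lp(v - m, p, eps) for v in values)
        if f < best_f:
            best_m, best_f = m, f
    return best_m


# Three intercept differences centered at 0.30 and two outlying differences
diffs = [0.28, 0.30, 0.32, 0.80, 0.85]
estimates = {p: lp_location(diffs, p) for p in (2, 1, 0.5)}
```

In this toy example, the estimate for $p=2$ coincides with the arithmetic mean of all differences, the estimate for $p=1$ is close to the median, and the estimate for $p=0.5$ lies closest to the center of the three non-outlying differences.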
All estimation methods were applied with the tuning parameters , , , and . The choice led to substantially biased parameter estimates, while the IA estimates based on had large variances. Hence, we only report findings for the tuning parameters and .
In total,
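replications were conducted for each cell of the simulation study. Bias, standard deviation (SD), RMSE, and relative RMSE were computed to assess the performance of the different estimators. Let $\hat{\theta}_r$ be a model parameter estimate in replication $r$ ($r=1,\ldots,R$, where $R$ denotes the number of replications) for the parameter $\theta$. The bias of the estimator $\hat{\theta}$ was estimated with
$$\mathrm{Bias}(\hat{\theta}) = \frac{1}{R}\sum_{r=1}^{R}\left(\hat{\theta}_r-\theta\right),$$
where $\theta$ denotes the true parameter value. The SD of an estimator $\hat{\theta}$ was calculated as
$$\mathrm{SD}(\hat{\theta}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\theta}_r-\bar{\theta}\right)^2} \quad \text{with} \quad \bar{\theta}=\frac{1}{R}\sum_{r=1}^{R}\hat{\theta}_r.$$
The RMSE of an estimator $\hat{\theta}$ was estimated with
$$\mathrm{RMSE}(\hat{\theta}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\theta}_r-\theta\right)^2}.$$
A relative RMSE can be defined by dividing the RMSE of an estimator by the RMSE of a chosen reference model. In Simulation Study 1 (and in Simulation Study 3), the Mplus default with the NOL method, $p=0.5$, and $\varepsilon=0.01$ is used as the reference model. To more easily grasp differences in the relative RMSE, the values were multiplied by 100. This quantity can then easily be converted into a percentage gain of a particular estimator compared with a reference model.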
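The performance measures can be computed along the following lines. This is a minimal Python sketch with hypothetical function names and made-up estimates; it assumes that the SD is computed with the divisor $R$ (rather than $R-1$), in which case the squared RMSE equals the sum of the squared bias and the squared SD.

```python
import math


def bias(estimates, true_value):
    """Average deviation of the estimates from the true parameter value."""
    return sum(estimates) / len(estimates) - true_value


def sd(estimates):
    """Standard deviation of the estimates across replications (divisor R)."""
    mean = sum(estimates) / len(estimates)
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))


def rmse(estimates, true_value):
    """Root mean square error of the estimates."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))


def relative_rmse(rmse_estimator, rmse_reference):
    """RMSE relative to a reference method, multiplied by 100."""
    return 100.0 * rmse_estimator / rmse_reference


# Hypothetical estimates from four replications with true value 1.0
est = [0.9, 1.1, 1.2, 1.0]
b, s, r = bias(est, 1.0), sd(est), rmse(est, 1.0)
rel = relative_rmse(0.1, 0.125)
```

A relative RMSE of 80, for instance, corresponds to an RMSE that is 20 percent smaller than that of the reference method.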
The entire simulation study was carried out in the R software [52]. IA was performed with the sirt::invariance.alignment() function (see also [12,53,54]) in the R package sirt (Version 4.0-19; [55]). Information about model specification can be found in the material located at https://osf.io/7kwqh/ (accessed on 17 September 2023).
5.2. Results
Table 1 reports the bias, SD, and relative RMSE of the factor mean
and the factor SD
of the second group in the DGM of noninvariant item intercepts (i.e., DGM1). It can be seen that the
loss function with
,
, and
showed some bias for
. Importantly, the bias was more substantial when using
instead of
. Furthermore, the extent of the bias in the estimated factor mean
decreased with increasing sample size. Notably, the pattern of the bias was similar for the NOL and LOG methods. Interestingly, the
loss function (i.e.,
) outperformed the other specifications regarding bias. While
would be preferable for
,
, and
, for
, the tuning parameter choice
would be preferred due to a smaller SD of the estimate. Notably,
had a slightly increased SD compared with
for the sample size
. However, this effect decreased with larger sample sizes. Moreover, it can be seen that the Mplus default
and
could be improved in terms of relative RMSE by using
and
for DGM1. Notably, the relative performance gains are more important in larger sample sizes. Additional smaller gains can be obtained by switching to
and
.
The factor SD was almost unbiasedly estimated in the condition of only noninvariant item intercepts. In this situation, the choice can be defended over for the Mplus default. The loss function could not outperform p values different from zero regarding the SD.
Table 2 displays bias, SD, and relative RMSE for the factor mean and the factor SD of the second group in the condition of noninvariant item loadings and noninvariant item intercepts. In this situation, the SD estimate was biased, particularly for the smallest sample size
. Importantly, the bias was substantially reduced when using the LOG method instead of the NOL method. Overall, the
loss function with
was the preferred method regarding the bias and relative RMSE for sample sizes of at least 500. Surprisingly, no bias occurred for the estimated factor mean in DGM2. However, this absence of bias appears to be coincidental and to result from the interplay of the different factors that determine the bias.
Table A1 in Appendix A reveals that the factor mean of the third group in DGM2 was also estimated with bias. Again, in this case,
resolved the issue, in particular for larger sample sizes.
As a preliminary conclusion from Simulation Study 1, one could argue that the case of noninvariant item loadings can induce bias in parameter estimates in default implementations of IA. The bias can be reduced by using the LOG method instead of the default NOL method. There is a tendency that the choice outperformed for . Finally, the loss function (i.e., ) had satisfactory performance for . However, it came at the price of increased variance in smaller sample sizes.
7. Simulation Study 3: Bias and RMSE in a Six-Group Example
In the last example, Simulation Study 3, different estimation methods of IA were examined for DGMs involving six groups and four items.
7.1. Method
The data were simulated from a one-dimensional factor model involving four items (i.e., $I=4$) and six groups (i.e., $G=6$). The factor variable was normally distributed with group means 0, −0.27, −0.46, 0.11, 0.21, and 0.49, and the group variances were 1, 0.95, 0.87, 1.23, 1.1, and 0.99, respectively. All measurement error variances were set to one in all groups, and the measurement errors were uncorrelated with each other. The factor variable and residual variables were normally distributed.
Two DGMs were simulated that refer to a violation of measurement invariance. Noninvariance only appeared in item loadings, while item intercepts were invariant.
In the first DGM of Simulation Study 3 (i.e., DGM3), the DIF effects in item loadings (, …) were unidirectional. That is, item loadings with DIF effects were all larger than one, while the invariant items had loadings equal to one. The item loadings with DIF effects were as follows: (i.e., first item in second group), , , and . The item intercepts were all set to 0 in DGM3.
In the second DGM of Simulation Study 3 (i.e., DGM4), the DIF effects in item loadings were bidirectional. That is, item loadings with DIF effects could be smaller or larger than one. The item loadings with DIF effects were as follows: , , , and . As in DGM3, the item intercepts were all set to 0 in DGM4.
As in Simulation Study 1 and Simulation Study 2, the sample size per group was chosen as , , , or . The same IA estimation methods as in the other two studies were utilized. To summarize the performance of the parameter estimates across the six groups, average absolute bias, average SD, and the average relative RMSE were computed, where the average was calculated for factor means and factor SDs , separately.
Again, the simulation study was carried out in the R software [52]. IA was estimated using the sirt::invariance.alignment() function in the R package sirt (Version 4.0-19; [55]). Replication material can be found at https://osf.io/7kwqh/ (accessed on 17 September 2023).
7.2. Results
Table 4 displays average absolute bias, average SD, and average relative RMSE for factor means $\mu_g$ and factor SDs $\sigma_g$ for the DGM with unidirectional effects of noninvariance (i.e., DGM3). As expected from Simulation Study 1, noninvariant item loadings mainly impacted factor standard deviations. Factor SDs had a larger absolute bias for the (default) NOL method compared with the LOG method. However, absolute bias decreased with larger sample sizes and by choosing the tuning parameter $\varepsilon=0.001$ instead of $\varepsilon=0.01$. Furthermore, the LOG method had a much smaller bias in estimated factor SDs. This finding also translated into findings for the average relative RMSE of the $\sigma_g$ estimates. Interestingly, the LOG method was also preferred for the estimated factor means. The LOG estimates for $\mu_g$ had, on average, smaller SDs than the NOL estimates.
Similar findings were obtained in the case of the bidirectional DIF effects in item loadings (DGM4) that are displayed in
Table A2 in
Appendix B. The LOG method was preferred over the NOL method regarding the estimation of factor SDs
. Furthermore, LOG resulted in slightly better estimates than the NOL method for factor means
. IA with
should preferably choose the tuning parameter
instead of
.
8. Discussion and Conclusions
In this article, we critically discussed implementation aspects in IA. Because IA is now widely applied in the social sciences, researchers should opt for appropriate estimation methods. We derived recommendations for software implementation and practical application of IA through three simulation studies.
In IA, the $L_p$ loss function with $p=0.5$ is the default choice in the popular Mplus software. A differentiable approximation of this loss function uses the tuning parameter $\varepsilon=0.01$ as a default in this software. Our simulations revealed that this default choice can induce bias in estimated factor means and factor standard deviations. The bias can be reduced by switching to the tuning parameter $\varepsilon=0.001$. Notably, biases in IA were particularly pronounced in small to moderate samples (i.e., persons per group). It turned out that bias in estimated factor standard deviations occurred in the presence of noninvariant item loadings. This bias can be reduced by using a modified IA optimization function in which logarithmized item loadings are aligned (i.e., the LOG method described in this paper). In general, we found that the LOG method improves on the default NOL method in IA (which uses no logarithmized item loadings) in the situations in which bias occurred and performed comparably to NOL in all other situations. Furthermore, the $L_0$ loss function recently proposed by O’Neill and Burke (i.e., $p=0$) showed the least bias across all simulated conditions. This method can be regarded as the frontrunner across all simulation conditions when used with the tuning parameter . Finally, statistical inference based on the delta method performed satisfactorily for all approximately unbiased estimates in terms of coverage rates.
In future research, the generalizability of the findings of this study to more groups or more items [17] can be examined. Moreover, implementation aspects of IA could also be investigated for dichotomous or ordinal items [19,20]. In particular, the performance of IA in small samples [57] requires additional consideration. It might be that regularized estimation [58,59,60,61] or confirmatory factor analysis estimation that uses robust loss functions (i.e., model-robust estimation; see [62,63]) have advantages over IA in small samples.