1. Introduction
When multiple groups are compared in confirmatory factor analysis (CFA) with respect to factor variables, some identifying assumptions have to be made. It is frequently assumed that item parameters are equal across groups, a property denoted as measurement invariance [1]. The concept of invariance has been very prominent in psychology and the social sciences in general [2,3]. For example, in international large-scale assessment studies in education, like the Programme for International Student Assessment (PISA), the necessity of invariance is strongly emphasized [4].
For situations in which measurement invariance is violated, the invariance alignment (IA) method [5,6] (also referred to as alignment optimization [7,8]) has been proposed. The IA method tries to make item parameters as invariant as possible while allowing a few deviations from invariance. By doing so, group comparisons can be made more robust against violations of measurement invariance.
Nowadays, the IA method is frequently applied in the social sciences for analyzing questionnaire data [9,10,11,12]. Unfortunately, most methodological developments of IA (but see [13,14,15] for exceptions) are strongly coupled to the popular but commercial (and closed-source) Mplus software [16]. Previous simulation studies for one-dimensional factor models investigated the case of continuous items [5,8,17,18,19], dichotomous items [20,21], and polytomous items [14,22]. The application of IA to multidimensional factor models with continuous items has been investigated in [23,24]. Moreover, IA was studied in longitudinal models in [25,26,27]. The optimization function used in IA has also been extended to a general framework for penalized structural equation models [28].
Besides the Mplus software, there exists an alternative implementation in the R package sirt [29]. However, several researchers have pointed out that there could be subtle differences between the IA implementations in Mplus and sirt. Unfortunately, there is no systematic comparison of the performance of the IA implementations in Mplus and sirt. This article tries to shed some light on these implementation differences. It turns out that different identification constraints are likely the cause of the different results of the software packages. After changing the default identification constraint in sirt, Mplus and sirt provided much more similar results. Moreover, the results from a simulation study also question the default choices of tuning parameters in the software packages.
The rest of this article is structured as follows. In Section 2, the background of IA is reviewed. Section 3 discusses the syntax code and estimation options of IA in Mplus and sirt. In Section 4, the two software packages are compared by means of a simulation study. An empirical example is presented in Section 5. Finally, the paper closes with a discussion in Section 6.
  2. Invariance Alignment
Let the random variable $X_{ig}$ denote item $i$ ($i = 1, \ldots, I$) in group $g$ ($g = 1, \ldots, G$). A one-dimensional factor model [30] is defined as
$$X_{ig} = \nu_{ig} + \lambda_{ig} \theta_g + e_{ig}, \tag{1}$$
where $\lambda_{ig}$ are item loadings, and $\nu_{ig}$ are item intercepts. Item loadings can be assumed to be positive. If some loading is negative, the corresponding random variable $X_{ig}$ must be multiplied by $-1$. The factor variables $\theta_g$ and all residual variables $e_{ig}$ are independent and univariate normally distributed. The factor variable $\theta_g$ has a factor mean $\mu_g$ and a factor standard deviation $\sigma_g$.
Without additional assumptions, the parameters in (1) are not identified. An identified model is obtained by assuming a standardized latent variable $\theta_g^\ast$ (i.e., with a mean of 0 and a standard deviation of 1):
$$X_{ig} = \nu_{ig,0} + \lambda_{ig,0} \theta_g^\ast + e_{ig}. \tag{2}$$
The parameters in (1) and (2) are related to each other by
$$\lambda_{ig,0} = \lambda_{ig} \sigma_g \quad \text{and} \quad \nu_{ig,0} = \nu_{ig} + \lambda_{ig} \mu_g. \tag{3}$$
In many applications, the factor means $\mu_g$ and factor standard deviations $\sigma_g$ should be compared across groups. To achieve this, a typical assumption in the social sciences is measurement invariance [1,3]. Measurement invariance presupposes that item loadings $\lambda_{ig}$ and item intercepts $\nu_{ig}$ are equal across groups. That is, there exist common item loadings $\lambda_i$ and common item intercepts $\nu_i$ such that $\lambda_{ig} = \lambda_i$ and $\nu_{ig} = \nu_i$ for all groups $g = 1, \ldots, G$ and all items $i = 1, \ldots, I$. The absence of measurement invariance is also labeled as differential item functioning (DIF; [2,31]) in the item response theory literature. If measurement invariance holds, (3) can be rewritten as
$$\lambda_{ig,0} = \lambda_i \sigma_g \quad \text{and} \quad \nu_{ig,0} = \nu_i + \lambda_i \mu_g. \tag{4}$$
The IA method of Asparouhov and Muthén [5,6] tackles situations with sparse violations of measurement invariance. In this case, a few item loadings or item intercepts are allowed to differ across groups, while the majority of items (approximately) fulfill the invariance assumption [32]. This situation is called partial invariance in the literature [33].
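As a small numerical illustration of partial invariance, the following base R sketch maps item parameters on the common factor metric to the parameters of the model with a standardized factor (item loading times factor SD, item intercept plus item loading times factor mean) and back again. The helper `to_standardized()` and all numerical values are hypothetical and only serve the illustration; they are not taken from Mplus or sirt.

```r
# Hypothetical helper (not part of sirt or Mplus): map item parameters on the
# common factor metric to those of the model with a standardized factor.
to_standardized <- function(lambda, nu, mu, sigma) {
  list(lambda0 = lambda * sigma,   # loading times factor SD
       nu0 = nu + lambda * mu)     # intercept plus loading times factor mean
}

# Made-up invariant item parameters for four items
lambda <- c(1.0, 0.8, 1.2, 0.9)
nu <- c(0.0, 0.2, -0.1, 0.3)

# Group 2 deviates from invariance only in the intercept of item 3
nu_g2 <- nu
nu_g2[3] <- nu[3] + 0.5
mu2 <- 0.3
sigma2 <- 1.2
g2 <- to_standardized(lambda, nu_g2, mu = mu2, sigma = sigma2)

# Undo the standardization with the true factor mean and SD of group 2
lambda_back <- g2$lambda0 / sigma2
nu_back <- g2$nu0 - lambda_back * mu2

# Deviations from the invariant parameters: zero except for item 3
dev_lambda <- lambda_back - lambda
dev_nu <- nu_back - nu
print(round(rbind(dev_lambda, dev_nu), 6))
```

With the correct factor mean and SD, only the single noninvariant intercept shows a deviation. This sparse pattern is what IA tries to recover when the factor means and SDs are unknown.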
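The IA estimation method proceeds in two steps. In the first step, the one-dimensional factor model (2) is separately estimated by the maximum likelihood method for all groups. The estimated item parameters $\hat{\lambda}_{ig,0}$ and $\hat{\nu}_{ig,0}$ ($i = 1, \ldots, I$; $g = 1, \ldots, G$) are used as the input of the IA. By rewriting (3) and inserting the estimated item loadings and item intercepts, we obtain
$$\lambda_{ig} = \frac{\hat{\lambda}_{ig,0}}{\sigma_g} \quad \text{and} \quad \nu_{ig} = \hat{\nu}_{ig,0} - \frac{\hat{\lambda}_{ig,0}}{\sigma_g} \mu_g. \tag{5}$$
These relations motivate the minimization of the following linking function in the second step of IA to determine group means $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_G)$ and standard deviations $\boldsymbol{\sigma} = (\sigma_1, \ldots, \sigma_G)$:
$$H(\boldsymbol{\mu}, \boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{g<h} w_{i,1,gh} \, \rho\!\left( \frac{\hat{\lambda}_{ig,0}}{\sigma_g} - \frac{\hat{\lambda}_{ih,0}}{\sigma_h} \right) + \sum_{i=1}^{I} \sum_{g<h} w_{i,2,gh} \, \rho\!\left( \hat{\nu}_{ig,0} - \frac{\hat{\lambda}_{ig,0}}{\sigma_g} \mu_g - \hat{\nu}_{ih,0} + \frac{\hat{\lambda}_{ih,0}}{\sigma_h} \mu_h \right), \tag{6}$$
where the weights $w_{i,1,gh}$ and $w_{i,2,gh}$ are known, and $\rho$ is a nonnegative, symmetric loss function with $\rho(0) = 0$ that is monotonically increasing for nonnegative $x$ values. Asparouhov and Muthén [5] proposed using $\rho(x) = \sqrt{|x|}$ and $w_{i,1,gh} = w_{i,2,gh} = \sqrt{N_g N_h}$, where $N_g$ denotes the sample size of group $g$.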
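The linking function can be sketched in a few lines of base R. The sketch below assumes the square-root loss and the pairwise weights given by the square root of the product of the two group sample sizes; the function name `align_H()`, its argument layout (groups in rows, items in columns), and the example values are assumptions made for illustration and do not reproduce the internals of Mplus or sirt.

```r
# Square-root loss rho(x) = sqrt(|x|)
rho_sqrt <- function(x) sqrt(abs(x))

# Hypothetical helper: value of the alignment function.
#   mu, sigma     candidate factor means and SDs (length G)
#   lambda0, nu0  G x I matrices of group-wise estimates (standardized factor)
#   N             group sample sizes (length G)
align_H <- function(mu, sigma, lambda0, nu0, N, rho = rho_sqrt) {
  lam <- lambda0 / sigma   # row g is divided by sigma[g]
  nu <- nu0 - lam * mu     # row g uses mu[g]
  G <- nrow(lambda0)
  H <- 0
  for (g in seq_len(G - 1)) {
    for (h in (g + 1):G) {
      w <- sqrt(N[g] * N[h])
      H <- H + w * sum(rho(lam[g, ] - lam[h, ])) +
               w * sum(rho(nu[g, ] - nu[h, ]))
    }
  }
  H
}

# Three groups, five invariant items (loadings 1, intercepts 0)
mu_true <- c(0, 0.3, 0.8)
sigma_true <- c(1, 1.225, 1.095)
lambda0 <- outer(sigma_true, rep(1, 5))   # loading times factor SD
nu0 <- outer(mu_true, rep(1, 5))          # intercept plus loading times mean
N <- rep(500, 3)

H_true <- align_H(mu_true, sigma_true, lambda0, nu0, N)
H_naive <- align_H(rep(0, 3), rep(1, 3), lambda0, nu0, N)
print(c(H_true = H_true, H_naive = H_naive))
```

Under full invariance, the function vanishes at the data-generating factor means and SDs, whereas ignoring group differences (all means 0, all SDs 1) yields a clearly positive value.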
In the minimization of (6), additional identification constraints must be imposed. As a first alternative, the distribution parameters of the first (or any other) group can be fixed. That is, we set $\mu_1 = 0$ and $\sigma_1 = 1$. As a second alternative, one can simultaneously constrain all estimated parameters. Then, the following identification constraints can be imposed:
$$\sum_{g=1}^{G} \mu_g = 0 \quad \text{and} \quad \prod_{g=1}^{G} \sigma_g = 1. \tag{7}$$
The constraints in (7) state that the arithmetic mean of the factor means equals zero, and the geometric mean of the factor standard deviations equals one.
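To see what the second set of constraints means for the reported metric, the following base R sketch linearly rescales a set of factor means and SDs so that the means sum to zero and the SDs have a product of one. The helper `rescale_to_constraints()` is hypothetical. Note that this only illustrates the metric defined by the constraints: with a loss function of the form $|x|^p$, multiplying all SDs by a common factor rescales the loading part of the alignment function, so estimating under different constraints need not give equivalent solutions.

```r
# Hypothetical helper: linear change of the factor metric such that the factor
# means sum to zero and the factor SDs have a geometric mean of one.
rescale_to_constraints <- function(mu, sigma) {
  c_sd <- exp(mean(log(sigma)))   # geometric mean of the factor SDs
  list(mu = (mu - mean(mu)) / c_sd, sigma = sigma / c_sd)
}

mu <- c(0, 0.3, 0.8)
sigma <- c(1, 1.225, 1.095)
res <- rescale_to_constraints(mu, sigma)

# Standardized difference between groups 2 and 1 before and after rescaling
d_before <- (mu[2] - mu[1]) / sigma[1]
d_after <- (res$mu[2] - res$mu[1]) / res$sigma[1]
print(c(sum_mu = sum(res$mu), prod_sigma = prod(res$sigma)))
```

Standardized group differences are unchanged by such a linear rescaling, which is why results obtained in different metrics can be compared after transforming them to a common reference.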
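Note that the optimization function $H$ of IA defined in (6) can be rewritten as
$$H(\boldsymbol{\mu}, \boldsymbol{\sigma}) = H_1(\boldsymbol{\sigma}) + H_2(\boldsymbol{\mu}, \boldsymbol{\sigma}),$$
where $H_1$ collects the terms in (6) that involve differences in item loadings, and $H_2$ collects the terms that involve differences in item intercepts. However, the function $H_1$ can be conveniently substituted by an alternative. Note that Equation (3) can be rewritten as
$$\log \lambda_{ig,0} = \log \lambda_{ig} + \log \sigma_g.$$
This motivates the alternative optimization function $H_{1,\log}$ for determining standard deviations, which employs logarithmized item loadings (see [34,35])
$$H_{1,\log}(\boldsymbol{\tau}) = \sum_{i=1}^{I} \sum_{g<h} w_{i,1,gh} \, \rho\!\left( \log \hat{\lambda}_{ig,0} - \log \hat{\lambda}_{ih,0} - \tau_g + \tau_h \right), \tag{11}$$
where $\tau_g = \log \sigma_g$ for $g = 1, \ldots, G$. Due to the required identification constraints, we fix $\tau_1 = 0$ (i.e., $\sigma_1 = 1$). By minimizing $H_{1,\log}$, a vector of standard deviations $\hat{\boldsymbol{\tau}}$ on the logarithm metric is obtained; that is, $\hat{\boldsymbol{\tau}} = \arg\min_{\boldsymbol{\tau}} H_{1,\log}(\boldsymbol{\tau})$. The vector of estimated standard deviations $\hat{\boldsymbol{\sigma}}$ can be obtained by exponentiating all entries in $\hat{\boldsymbol{\tau}}$.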
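A minimal base R sketch of the estimation on logarithmized loadings is given below. It assumes a smooth stand-in for the power loss, the pairwise sample-size weights, and a first group whose log-SD is fixed to zero. The function `fit_log_sd()`, the use of `optim()` with the BFGS method, and the example values are assumptions for illustration; this is not the algorithm implemented in sirt or Mplus.

```r
# Smooth stand-in for |x|^p so that a gradient-based optimizer can be used
rho_smooth <- function(x, p = 0.5, eps = 0.001) (x^2 + eps)^(p / 2)

# Hypothetical helper: estimate factor SDs from a G x I matrix of positive
# loadings, working with log-loadings and fixing the log-SD of group 1 to 0.
fit_log_sd <- function(lambda0, N, p = 0.5, eps = 0.001) {
  G <- nrow(lambda0)
  loglam <- log(lambda0)
  obj <- function(tau_free) {
    tau <- c(0, tau_free)   # first entry fixed for identification
    val <- 0
    for (g in seq_len(G - 1)) {
      for (h in (g + 1):G) {
        d <- (loglam[g, ] - tau[g]) - (loglam[h, ] - tau[h])
        val <- val + sqrt(N[g] * N[h]) * sum(rho_smooth(d, p, eps))
      }
    }
    val
  }
  fit <- optim(par = rep(0, G - 1), fn = obj, method = "BFGS")
  exp(c(0, fit$par))        # back-transform to the SD metric
}

# Three groups, five items with common loading 1; one noninvariant loading
sigma_true <- c(1, 1.225, 1.095)
lambda0 <- outer(sigma_true, rep(1, 5))
lambda0[2, 5] <- 1.3 * lambda0[2, 5]
sigma_hat <- fit_log_sd(lambda0, N = rep(500, 3))
print(round(sigma_hat, 3))
```

In this constructed example, the estimated SDs should lie close to the data-generating values despite the single deviating loading, because the loss function downweights large deviations.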
  2.1. Numerical Optimization
As mentioned above, IA uses the loss function $\rho(x) = \sqrt{|x|}$ as the default in the Mplus software package [16]. However, the loss function $\rho(x) = |x|^{1/4}$ is also available in Mplus [16]. The more general $L_p$ loss function $\rho(x) = |x|^p$ for $p > 0$ has been studied for IA in [13,35]. It has been shown that values of the power $p$ smaller than 0.5 can be advantageous in some situations [13].
In the practical minimization of $H$ involved in IA, the nondifferentiable $L_p$ loss function $\rho(x) = |x|^p$ (for $p \le 1$) is replaced by a differentiable approximation $\rho_D$ (see [5,35])
$$\rho_D(x) = (x^2 + \varepsilon)^{p/2}, \tag{12}$$
where $\varepsilon > 0$ is a tuning parameter that controls the approximation error of $\rho_D$ for $\rho$. The approximation error becomes smaller with $\varepsilon$ values close to zero. However, the minimization of $H$ in IA becomes more difficult when choosing too small values of $\varepsilon$. Practical experience led to the proposals $\varepsilon = 0.01$ [5] or $\varepsilon = 0.001$ [35]. The choice $\varepsilon = 0.01$ is the default in Mplus (see [13]).
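The effect of the tuning parameter can be inspected directly. The following base R sketch compares the power loss with a differentiable approximation of the form (x^2 + eps)^(p/2) for the two tuning parameter values 0.01 and 0.001 and the power 0.5; the function names and the evaluation points are chosen for illustration only.

```r
# Power loss and its differentiable approximation
rho_lp <- function(x, p = 0.5) abs(x)^p
rho_d <- function(x, p = 0.5, eps = 0.01) (x^2 + eps)^(p / 2)

# Approximation error at a few illustrative points
x <- c(0, 0.05, 0.2, 1)
err_01 <- rho_d(x, eps = 0.01) - rho_lp(x)
err_001 <- rho_d(x, eps = 0.001) - rho_lp(x)
print(round(cbind(x, err_01, err_001), 3))
```

The error is largest at zero, where the approximation equals eps^(p/2) instead of 0 (about 0.316 for 0.01 and about 0.178 for 0.001 with the power 0.5), and it shrinks for larger absolute values of the argument. A smaller tuning parameter tracks the power loss more closely at the price of a sharper kink near zero.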
  2.2. A More in-Depth Look into the Identification Constraint for Standard Deviations for Many Groups
The IA method measures the similarity between item loadings in the optimization function $H_1$ by
$$H_1(\boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{g<h} w_{i,1,gh} \, \rho\!\left( \frac{\hat{\lambda}_{ig,0}}{\sigma_g} - \frac{\hat{\lambda}_{ih,0}}{\sigma_h} \right). \tag{13}$$
As mentioned above, an identification constraint could be to fix the standard deviation of the first group to 1 or to fix the product of standard deviations to 1. Regarding the identification constraint chosen in their Mplus software, Asparouhov and Muthén [5] state that “[…] in Mplus by default the parameters are indeed reported in that metric, however, the alignment optimization is carried out using Equation (10) [i.e., the product identification constraint in (7)] to ensure full symmetry between the different groups”. To illustrate this motivation, we rewrite (13) as
$$H_1(\boldsymbol{\sigma}) = \sum_{i=1}^{I} \sum_{h=2}^{G} w_{i,1,1h} \, \rho\!\left( \frac{\hat{\lambda}_{i1,0}}{\sigma_1} - \frac{\hat{\lambda}_{ih,0}}{\sigma_h} \right) + \sum_{i=1}^{I} \sum_{2 \le g<h} w_{i,1,gh} \, \rho\!\left( \frac{\hat{\lambda}_{ig,0}}{\sigma_g} - \frac{\hat{\lambda}_{ih,0}}{\sigma_h} \right), \tag{14}$$
where we separated the terms that involve the first group from those that do not. If the optimization were carried out based only on the second term in (14), the optimization value would tend to zero as the standard deviations tend to infinity. Hence, fixing the standard deviation $\sigma_1$ to 1 prevents obtaining infinite estimates of $\sigma_g$ for $g = 2, \ldots, G$. If $\sigma_1 = 1$ is specified in the minimization of (14), it becomes clear that the first term in the sum, which involves the first group, becomes less relevant if the number of groups increases. Hence, there is a danger that estimated standard deviations are larger if more groups are involved in the analysis. For this reason, the identification constraint $\sigma_1 = 1$ is likely not appropriate in the case of many groups. In contrast, the constraint $\prod_{g=1}^{G} \sigma_g = 1$ would be preferable in this case. The behavior of IA for many groups is analyzed in a simulation study in Section 4 and an empirical example in Section 5.
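The diminishing role of the first group can be made concrete by counting pairs of groups. The following count assumes equal weights for all pairs, which is a simplification for illustration; with unequal group sample sizes, the shares differ.

```latex
\begin{align*}
\#\{\text{pairs } (g,h) \text{ with } g < h\} &= \frac{G(G-1)}{2}, \\
\#\{\text{pairs that involve group } 1\} &= G - 1, \\
\text{share of pairs that involve group } 1 &= \frac{G-1}{G(G-1)/2} = \frac{2}{G}.
\end{align*}
```

Under this simplification, pairs involving the first group make up 2/3 of all pairs for 3 groups, 1/6 for 12 groups, and 1/13 for 26 groups. The terms that anchor the standard deviations through the first group therefore carry less and less weight as the number of groups grows.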
  3. Implementation of Invariance Alignment in Mplus and sirt
We now describe how IA can be estimated with the commercial Mplus software (Version 8.9; [16]) and the R (Version 4.3; [36]) package sirt [29].
Listing 1 contains command-line syntax for the specification of IA in Mplus (see [16,37]). The dataset is locally saved in mydata.dat (see Line 4 in Listing 1) in an appropriate working directory. The IA method should be applied for five items I1, …, I5 (see Line 6 in Listing 1). The numeric grouping variable group is included in the dataset. The grouping variable has to be specified as a known class variable in Mplus (see Lines 8 and 9 in Listing 1).
Listing 1. Specification of invariance alignment in Mplus software.

     1  TITLE:
     2    Invariance Alignment;
     3  DATA:
     4    FILE IS mydata.dat;
     5  VARIABLE:
     6    NAMES ARE group I1 I2 I3 I4 I5;
     7    USEVARIABLES ARE group I1 I2 I3 I4 I5;
     8    CLASSES = c(3);
     9    KNOWNCLASS = c(group = 1 group = 2 group = 3);
    10  ANALYSIS:
    11    TYPE = MIXTURE;
    12    ESTIMATOR = MLR;
    13    ALIGNMENT = FIXED(2);  ! group=2 is reference group with zero mean;
    14                           ! ALIGNMENT = FREE for method 'FREE';
    15
    16    TOLERANCE = 0.01;      ! epsilon value;
    17    SIMPLICITY = SQRT;     ! for p=0.5;
    18                           ! SIMPLICITY = FOURTHRT for p=0.25;
    19  MODEL:
    20    %overall%
    21    f1 BY I1 I2 I3 I4 I5;
    22  OUTPUT:
    23    alignment;

Mplus has only implemented the product constraint $\prod_{g=1}^{G} \sigma_g = 1$ for standard deviations. The method FIXED (i.e., Line 13 in Listing 1 that states ALIGNMENT=FIXED) utilizes the zero constraint for the factor mean of the first group; that is, $\mu_1 = 0$. The reference group can be changed using the command ALIGNMENT=FIXED(2) (see Line 13 in Listing 1). In this case, Group 2 is used as the reference group. Alternatively, “the FREE alignment optimization estimates [$\mu_1$] as an additional parameter” [5]. This specification seems to be overparametrized, and Mplus must have implemented some fix to prevent nonconvergence of the IA optimization problem. The Mplus manual states, “In the FREE setting, all factor means are estimated. FREE is the most general approach” [16]. This statement certainly does not provide enough details for an independent implementation of the black-box algorithms in the Mplus software. Furthermore, the TOLERANCE argument in Line 16 in Listing 1 specifies the tuning parameter $\varepsilon$ that appears in the differentiable approximation (12). The default in Mplus is $\varepsilon = 0.01$. Finally, the SIMPLICITY argument can either choose the power $p = 0.5$ (i.e., square root SQRT) or $p = 0.25$ (i.e., fourth root FOURTHRT).
Listing 2 shows how IA can be estimated in the R package sirt [29,38,39]. In the first step, group-specific estimation of the one-dimensional factor models can be carried out with the function sirt::invariance_alignment_cfa_config() (see Line 5 in Listing 2). The group-specific estimated item loadings lambda and item intercepts nu can be extracted from the output of this function (see Lines 9 and 10 in Listing 2). Moreover, the weights $w_{i,1,gh}$ and $w_{i,2,gh}$ in IA (see Equation (6)) are specified in Line 14 in Listing 2. The specification in this listing ensures the same chosen weights as in Mplus. The function sirt::invariance.alignment() performs IA based on estimated item loadings lambda and item intercepts nu (see Line 17 in Listing 2). The power $p$ in IA can be separately chosen for item loadings (first entry in align.pow) and item intercepts (second entry in align.pow). If the power $p = 0.25$ instead of the default $p = 0.5$ should be used in the analysis, users have to specify the argument align.pow=c(0.25,0.25) in the sirt::invariance.alignment() function. The tuning parameter $\varepsilon$ in Equation (12) can be specified with the argument eps.
Listing 2. Specification of invariance alignment in the R package sirt.

     1  #* define items
     2  items <- paste0("I", 1:5)
     3
     4  #* separate estimate of factor model in groups
     5  prep <- sirt::invariance_alignment_cfa_config(dat=dat[,items],
     6                    group=dat$group )
     7
     8  # extract item loadings and item intercepts
     9  lambda <- prep$lambda
    10  nu <- prep$nu
    11
    12  #- define weights
    13  Ng <- prep$N
    14  wgts <- matrix(sqrt(Ng), length(Ng), ncol(nu))
    15
    16  #* perform invariance alignment
    17  res <- sirt::invariance.alignment(lambda=lambda, nu=nu,
    18                    align.pow=c(.5, .5), eps=0.01, wgt=wgts, meth=3)
    19
    20  #- extract estimated means and standard deviations
    21  res$pars

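Listing 2 supplies one weight per group and item (the square root of the group sample size), whereas the alignment function uses one weight per pair of groups. The following base R sketch shows that the two are compatible if pairwise weights are formed as products of the group-wise weights. That product rule is an assumption about how the weights are combined internally and is not documented in the text above; the sample sizes are made up.

```r
# Hypothetical group sample sizes and number of items
Ng <- c(250, 500, 1000)
n_items <- 5

# Group-wise weights, laid out as in Listing 2 (groups in rows, items in columns)
wgts <- matrix(sqrt(Ng), length(Ng), n_items)

# Assumed combination rule: pairwise weight = product of group-wise weights
w_pair <- outer(wgts[, 1], wgts[, 1])

# Pairwise weights of the form sqrt(N_g * N_h)
w_target <- sqrt(outer(Ng, Ng))
print(all.equal(w_pair, w_target))
```

Under this assumption, the group-wise specification reproduces the pairwise weights based on the product of the two group sample sizes.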
The IA function in the sirt package has four different estimation methods that can be requested with the argument meth. The default meth = 1 uses the optimization function (6) with the identification constraints $\mu_1 = 0$ and $\sigma_1 = 1$. The method meth = 2 performs IA on logarithmized item loadings (see Equation (11)), also using the constraints $\mu_1 = 0$ and $\sigma_1 = 1$. The method meth = 3 implements the product constraint $\prod_{g=1}^{G} \sigma_g = 1$ for standard deviations and the zero mean constraint for the first group (i.e., $\mu_1 = 0$). Hence, this method is expected to perform similarly to Mplus’ FIXED alignment method. Finally, meth = 4 also utilizes the product constraint for standard deviations but freely estimates the first group mean $\mu_1$. To identify the model, a penalty term
 is added to the optimization function, where 
W is the sum of the involved weights in the IA optimization function and 
 is a small factor to achieve convergence in optimization. Likely, this method has only conceptual similarity with Mplus’ 
FREE method, and no equivalent performance can be expected.
The estimated distribution parameters can be requested by the list entry $pars (see Line 21 in Listing 2).
  4. Simulation Study
  4.1. Method
The datasets in this simulation study were simulated from a one-dimensional factor model consisting of $I$ items and $G = 3$, 6, 9, or 12 groups. In the case of three groups, the group means were 0, 0.3, and 0.8, and the group standard deviations were 1, 1.225, and 1.095, respectively. With more than three groups, all parameters (i.e., distribution and item parameters) were replicated accordingly. For example, for six groups, the parameters were replicated twice.
All measurement error variances were set to 1 in all groups and uncorrelated with each other. The factor variable and residual variables were normally distributed. There was noninvariance in item intercepts and item loadings. All item intercepts had a value of zero except for a few cases. In the first group, the fifth item intercept was . In the second group, the first item intercept was , while the second item had an intercept of  in the third group. All item loadings had a value of one except for a few cases. In the first group, the third item loading was . In the second group, the fifth item loading was , while the fourth item loading was  in the third group. These parameters were duplicated with more than three groups as described above.
The sample size per group was chosen as , , , , or  (i.e., infinite sample size). In the case of an infinite sample size, there was no sampling error, and the population parameters were the data-generating parameters. The mean vectors and the covariance matrices are sufficient statistics for the IA method. Datasets with a sample size of , whose empirical means and covariances equaled the population means and covariances, respectively, were simulated in this case.
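One way to construct a dataset whose empirical means and covariances match prescribed population values exactly is sketched below in base R. The helper `exact_moment_data()`, the sample size, and the item parameters are assumptions for illustration (the error variances of 1 follow the description above). The sketch matches the covariance matrix computed with divisor n - 1, as returned by `cov()`; a maximum likelihood covariance with divisor n would require an additional rescaling. It is not claimed that the study generated its data in this way.

```r
# Hypothetical helper: data of size n whose empirical mean vector and
# covariance matrix (divisor n - 1) equal mean_vec and Sigma exactly.
exact_moment_data <- function(n, mean_vec, Sigma) {
  p <- length(mean_vec)
  Z <- matrix(rnorm(n * p), n, p)
  Z <- scale(Z, center = TRUE, scale = FALSE)   # exact zero column means
  Z <- Z %*% solve(chol(cov(Z)))                # exact identity covariance
  X <- Z %*% chol(Sigma)                        # impose target covariance
  sweep(X, 2, mean_vec, "+")                    # impose target means
}

# Model-implied moments of one group in a one-dimensional factor model
lambda <- rep(1, 5)   # item loadings (illustrative)
nu <- rep(0, 5)       # item intercepts (illustrative)
psi <- rep(1, 5)      # measurement error variances
mu <- 0.3             # factor mean
sigma <- 1.225        # factor SD
mean_vec <- nu + lambda * mu
Sigma <- sigma^2 * tcrossprod(lambda) + diag(psi)

set.seed(1)
X <- exact_moment_data(n = 200, mean_vec = mean_vec, Sigma = Sigma)
print(max(abs(cov(X) - Sigma)))
```

Because means and covariances are sufficient statistics here, such a dataset carries no sampling error in the input of IA.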
The IA method was applied in the Mplus software (Version 8.9; [16]) and with the function invariance.alignment() in the R package sirt (Version 4.1-15; [29]). Both software packages utilized the power $p = 0.5$ and the tuning parameter choices $\varepsilon = 0.01$ and $\varepsilon = 0.001$. Mplus was used with the FIXED or the FREE methods, while the method meth in sirt was specified as meth = 1, meth = 2, meth = 3, or meth = 4. To compare the performance across methods, the estimates were linearly transformed such that the mean and the SD of the first group were 0 and 1, respectively.
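The linear transformation to a common reference metric can be sketched as follows in base R. The helper `to_reference_metric()` and the input values are hypothetical.

```r
# Hypothetical helper: transform factor means and SDs such that the reference
# group has a mean of 0 and an SD of 1.
to_reference_metric <- function(mu, sigma, ref = 1) {
  list(mu = (mu - mu[ref]) / sigma[ref],
       sigma = sigma / sigma[ref])
}

# Made-up estimates obtained under some other identification constraint
est <- to_reference_metric(mu = c(-0.35, -0.05, 0.40),
                           sigma = c(0.90, 1.10, 0.99))
print(lapply(est, round, 3))
```

After the transformation, estimates obtained under different identification constraints are expressed in the same metric and can be compared directly.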
In total, 
 replications were conducted for each cell of the simulation study. Bias, standard deviation (SD), root mean square error (RMSE), and relative RMSE were computed to assess the performance of the different estimators for factor means and factor standard deviations. To ease the comparability between the different estimation methods, we computed a relative RMSE value, which was defined as the quotient of the RMSE for a particular method and the RMSE of a reference method. This quotient was multiplied by 100 afterward. The reference method was Mplus’ 
FIXED method with $p = 0.5$ and $\varepsilon = 0.01$, which is the default in this software package. We also computed the mean absolute difference between estimates of Mplus and sirt to determine possible differences between software packages. Information about model specifications can be found in the material located at
https://osf.io/84ne5 (accessed on 17 February 2024).
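The performance measures described above can be computed along the following lines in base R. The function names and the four made-up estimates per method are for illustration only; note that `sd()` uses the divisor given by the number of replications minus one.

```r
# Bias, SD, and RMSE of estimates across replications
perf <- function(est, true) {
  c(bias = mean(est) - true,
    sd = sd(est),
    rmse = sqrt(mean((est - true)^2)))
}

# Relative RMSE: RMSE of a method relative to a reference method, times 100
rel_rmse <- function(rmse, rmse_ref) 100 * rmse / rmse_ref

# Mean absolute difference between two sets of estimates
mean_abs_diff <- function(est_a, est_b) mean(abs(est_a - est_b))

# Made-up estimates of a factor mean with true value 0.3
est_a <- c(0.28, 0.33, 0.31, 0.26)   # some method
est_b <- c(0.30, 0.34, 0.29, 0.27)   # reference method
perf_a <- perf(est_a, true = 0.3)
perf_b <- perf(est_b, true = 0.3)
rel_a <- rel_rmse(perf_a[["rmse"]], perf_b[["rmse"]])
mad_ab <- mean_abs_diff(est_a, est_b)
print(round(c(perf_a, rel_rmse = rel_a, mean_abs_diff = mad_ab), 4))
```

A relative RMSE above 100 indicates that a method is less accurate than the reference method, and a value below 100 indicates that it is more accurate.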
  4.2. Results
In this section, we only present results for the distribution parameters for the second group. The findings for the other groups were very similar.
Table 1 contains the bias of the estimated factor mean $\hat{\mu}_2$ for different estimation methods in Mplus and sirt. Overall, noticeable bias occurred for $\varepsilon$ = 0.01 and p = 0.5. The bias decreased with increasing sample size but still appeared in infinite sample sizes. Moreover, note that the bias did not disappear with an increasing number of groups. Interestingly, bias was substantially reduced with the tuning parameter $\varepsilon$ = 0.001, particularly for sample sizes of at least 1000. For three, six, or nine groups, the method meth = 1 in sirt performed best in terms of bias. In general, the bias of both Mplus methods FIXED and FREE was similar to that obtained from the four methods implemented in sirt. However, sirt’s method meth = 1 had issues with an increasing number of groups. For $G = 12$ and N = 250, there was a large bias in estimated factor means, which showed that meth = 1 failed for a large number of groups.
 Table 2 shows the relative RMSE of the estimated factor mean $\hat{\mu}_2$ in the second group. The FREE method in Mplus was slightly inferior to the FIXED method in Mplus for more than three groups. The tuning parameter $\varepsilon = 0.001$ outperformed $\varepsilon = 0.01$ in terms of relative RMSE. This observation was primarily an effect of the larger bias for $\varepsilon = 0.01$. The simulation study also highlighted that the SD for the different estimates was larger for $\varepsilon = 0.001$ than for $\varepsilon = 0.01$.
 Table 3 presents the average absolute difference between the estimates of the factor mean in the second group between Mplus and sirt. It can be seen that Mplus’ FIXED method was closest to the sirt method meth = 3. The differences were larger for sirt’s meth = 1, which is the default in the R package sirt. Furthermore, the FREE method of Mplus turned out to perform most similarly to sirt’s meth = 4. However, the differences between the two methods were noticeable. Hence, it can be concluded that there is no equivalent implementation of the Mplus FREE method in the sirt package.
 Table 4 shows the bias for the factor SD of the second group for p = 0.5. As for the factor mean, the tuning parameter $\varepsilon$ = 0.001 had superior performance compared to $\varepsilon$ = 0.01. For the SD, the Mplus methods FIXED and FREE as well as sirt’s meth = 3 and meth = 4 coincided. Overall, the sirt method meth = 1 was preferable for three or six groups, while its performance deteriorated for a larger number of groups. It should be emphasized that the bias did not even disappear in infinite sample sizes for $\varepsilon = 0.01$.
 Table 5 presents the relative RMSE for the factor SD in the second group. The specifications with $\varepsilon = 0.001$ were generally preferable over $\varepsilon = 0.01$ in terms of RMSE. The Mplus and sirt methods performed very similarly. The bias issues of sirt’s meth = 1 for many groups (i.e., 9 or 12 groups) also translated into substantially increased RMSE values.
 Table 6 displays the mean absolute difference for the estimate of the factor SD in the second group between Mplus and sirt. The Mplus method FIXED had a similar performance to the sirt method meth = 3, while Mplus’ FREE method had comparable performance to sirt’s meth = 4.
 To conclude, this simulation study demonstrated that the performance of IA estimates in Mplus can be similar to sirt if an appropriate estimation method meth in sirt is chosen. The default sirt method meth = 1 resulted in larger differences to Mplus. However, sirt’s meth = 1 can be preferred over Mplus and the other sirt methods for three or six groups but cannot be recommended for many groups (i.e., at least nine groups). Overall, the tuning parameter $\varepsilon$ = 0.001 should be preferred over $\varepsilon$ = 0.01 in terms of bias and RMSE.
  5. Empirical Example: Asparouhov and Muthén (2014) Dataset
This empirical example uses a dataset that was previously analyzed in [5,40,41]. The dataset came from the European Social Survey (ESS) conducted in the year 2005 (ESS 2005), which included subjects from 26 countries. The factor variable of tradition and conformity was assessed by four items presented in portrait format, where the scale of the items is such that a high value represents a low level of tradition conformity. The wording of the four items was as follows (see [5]): It is important for him to be humble and modest. He tries not to draw attention to himself (item TR9); Tradition is important to him. He tries to follow the customs handed down by his religion or family (item TR20); He believes that people should do what they’re told. He thinks people should follow rules at all times, even when no one is watching (item CO7); and It is important for him to always behave properly. He wants to avoid doing anything people would say is wrong (item CO16). The dataset for this empirical example (and used in [5]) was downloaded from https://www.statmodel.com/Alignment.shtml (accessed on 17 February 2024).
  5.1. Original Data
We analyzed the original ESS dataset but included only subjects with no missing values on the four items. The dataset used in this article can be found at https://osf.io/84ne5 (accessed on 17 February 2024). In the 26 countries, the sample sizes ranged between 1031 and 2963 persons with a mean of 1869.5 and an SD of 454.7. The IA method was applied with the specifications
 and 
 in Mplus and sirt. The same six estimation methods (i.e., FIXED and FREE in Mplus as well as meth = 1, meth = 2, meth = 3, and meth = 4 in sirt) were applied to the dataset.
Table 7 shows the estimated factor means and SDs for the 26 countries and the six estimation methods. It can be seen that sirt’s default meth = 1 provides implausible estimates in this example with many groups. However, the sirt methods meth = 2, meth = 3, and meth = 4 performed comparably to Mplus’ FIXED and FREE methods. Mplus’ FIXED method was relatively close to sirt’s meth = 3 in terms of absolute differences in estimated factor means (M = 0.010, SD = 0.013, Min = 0.000, Max = 0.070). In addition, estimated factor means were also similar between the Mplus FIXED method and the sirt meth = 2 method (absolute differences: M = 0.012, SD = 0.014, Min = 0.000, Max = 0.068). Moreover, Mplus’ FREE method also performed similarly to sirt’s meth = 4 for estimated factor means (absolute differences: M = 0.010, SD = 0.016, Min = 0.000, Max = 0.086). There was also a close resemblance for estimated factor standard deviations between the Mplus FIXED and sirt meth = 3 methods (absolute differences: M = 0.007, SD = 0.006, Min = 0.000, Max = 0.020). However, the differences between the estimation methods FIXED and FREE in Mplus (or meth = 3 and meth = 4 in sirt) are noteworthy.
   5.2. Pseudo-Datasets
In this section, the original ESS dataset is used to create pseudo-datasets that should provide more insight into the different behavior of the estimation methods implemented in Mplus and sirt. The first five countries from the original dataset, with sample sizes of 1525, 1695, 2320, 1468, and 1031 subjects, are used in the creation of the pseudo-datasets. It is investigated whether the size of the estimates depends on the number of groups. To enable clean but idealized settings, we varied the number of included groups by replicating the dataset of these five countries accordingly. For example, with G = 10 groups, the first five groups were the original five countries, while groups six to ten were the same five countries but labeled as unique groups in the IA estimation. Usually, one would expect that the results of the first five groups should not change if the same dataset appears as duplications in the pseudo-dataset.
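The construction of such pseudo-datasets can be sketched in base R as follows. The helper `replicate_groups()`, the column name `group`, and the toy data frame are assumptions for illustration; the actual scripts are available in the online material referenced in this article.

```r
# Hypothetical helper: stack copies of a dataset and relabel the groups of each
# copy so that they enter the analysis as additional, distinct groups.
replicate_groups <- function(dat, times, group_var = "group") {
  G <- length(unique(dat[[group_var]]))
  copies <- lapply(seq_len(times), function(k) {
    d <- dat
    d[[group_var]] <- as.integer(factor(dat[[group_var]])) + (k - 1L) * G
    d
  })
  do.call(rbind, copies)
}

# Toy data: five groups with two observations each on one item
dat <- data.frame(group = rep(1:5, each = 2),
                  I1 = seq(0.1, 1.0, by = 0.1))
pseudo <- replicate_groups(dat, times = 2)
print(table(pseudo$group))
```

In the stacked data, group 6 contains exactly the observations of group 1, group 7 those of group 2, and so on.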
Table 8 presents estimated factor means and SDs for the third and the sixth group in the pseudo-datasets involving G = 5, 10, 15, 20, 25, or 30 groups. Note that the sixth group coincided with the first group in the pseudo-datasets and the first country in the original dataset. The distribution parameter estimates were transformed such that the mean and the SD of the first group were 0 and 1, respectively.
 The factor mean estimates changed as a function of the number of groups for the Mplus FIXED method and all sirt methods. Only for the Mplus FREE method were the estimates invariant with respect to the number of groups. In particular, large differences in the estimates were observed when comparing results in a model with 25 and 30 groups. Because the first group had a (transformed) mean of 0, it would also be expected that Group 6 would have factor mean estimates of 0. However, this was not the case for the estimation methods, except for Mplus’ FREE and sirt’s meth = 4 methods. Overall, this pattern is surprising because it implies that the choice of the reference group (i.e., the first group in our case) and the number of groups strongly affect the estimates of factor means. For the SD, only sirt’s meth = 1 had estimates that depended on the number of groups.
  6. Discussion
In this article, we compared the performance of IA estimates of the Mplus software and the R package sirt. There are two alternative identification constraints for estimating standard deviations $\sigma_g$. Mplus uses the product constraint $\prod_{g=1}^{G} \sigma_g = 1$, which is also used in the sirt methods meth = 3 and meth = 4. However, one can alternatively fix the standard deviation of the first group to 1. This is the default in the R package sirt (i.e., meth = 1). The differences between Mplus and the IA function in the sirt package can primarily be traced back to the different identification constraints for standard deviations. The difference between Mplus and sirt can be made smaller by choosing meth = 3, which mimics the identification constraint used in Mplus. Notably, the latter method is preferred for a large(r) number of groups (say, more than eight), while the default of meth = 1 might be preferable for at most six groups. The simulation study and the empirical example demonstrated that the default meth = 1 in the sirt package does not provide trustworthy results with many groups, and users are strongly advised to switch to meth = 2 or meth = 3 in this case.
Overall, it turned out in the simulation study that the tuning parameter $\varepsilon = 0.001$ generally outperforms the default Mplus choice $\varepsilon = 0.01$. A previous study indicated that the choice of $\varepsilon$ is more critical than the choice between the power $p = 0.5$ or $p = 0.25$ [15]. Minor reductions regarding bias can be obtained with the power $p = 0.25$ instead of $p = 0.5$. However, for reasonably large sample sizes (e.g., more than 500 subjects per group), an
$L_0$ loss function [42] can even outperform the $L_p$ loss function for $p = 0.5$ or $p = 0.25$ [15].
Regardless of the use of a particular estimation method in Mplus or sirt, we wonder whether the optimization function of IA is suitable in the case of many groups. The pairwise differences between model parameters in the optimization might lead to less stable estimates than a linear model specification that does not involve pairwise differences. There is some evidence that Haberman linking with the $L_p$ IA loss function could be superior in the estimation of many groups (say, more than 20 groups) in IA (see [35]). Further research is needed to explore possible adaptations of the IA method in the case of many groups.
In this article, we only examined estimation differences between Mplus and sirt for normally distributed data. It can be expected that estimation differences due to different identification constraints would similarly be present for ordinal data [6] because the IA method for ordinal data only differs in its model input: it uses item loadings and item thresholds from item response theory models instead of item loadings and item intercepts from a one-dimensional factor model based on the multivariate normal distribution.
The IA method can provide consistent estimation of factor means and standard deviations if there is a sparse pattern of parameters that are noninvariant across groups. It is debatable whether such a sparse pattern of noninvariant effects can be theoretically assumed in empirical datasets [43,44]. However, if researchers believe in such a sparsity assumption, IA can be deemed an effective data-driven method.
The simulation study conducted in this article assumed a sparse structure of noninvariant parameters. It could be that the differences between Mplus and the IA function in the sirt package were larger under different data-generating models. Future research could further investigate the software differences for more data-generating models and could also involve scenarios of a large number of groups.
As a cautionary remark, we would like to add that enough implementation details must appear in publications for commercial black-box software like Mplus to enable independent judgment, evaluation, and reimplementation of existing methods. We believe that undocumented or sparsely documented modeling approaches in commercial software, like the IA method in Mplus, should not be used in substantive and methodological publications because this practice fundamentally contradicts the principles of open science.