Article

Standard Error Estimation in Invariance Alignment

by Alexander Robitzsch 1,2
1 IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2 Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Mathematics 2025, 13(12), 1915; https://doi.org/10.3390/math13121915
Submission received: 5 May 2025 / Revised: 22 May 2025 / Accepted: 6 June 2025 / Published: 8 June 2025

Abstract: The invariance alignment (IA) method enables group comparisons in factor models involving either continuous or discrete items. This article evaluates the performance of the commonly used delta method for standard error estimation against alternative bootstrap confidence interval (CI) approaches for IA using the $L_{0.5}$ and $L_0$ loss functions. For IA applied to continuous items, both the delta method and all bootstrap methods yielded acceptable coverage rates. In contrast, for dichotomous items, only bias-corrected bootstrap CIs provided reliable statistical inference in moderate to large sample sizes. In small sample sizes with dichotomous items, none of the individual methods performed consistently well. However, a newly proposed average bootstrap CI approach—based on averaging the lower and upper CI limits from two bootstrap methods—achieved acceptable coverage rates.

1. Introduction

When comparing multiple groups in confirmatory factor analysis (CFA; [1]) or item response theory (IRT; [2,3]) models with respect to a factor variable, identification assumptions are required to separate group differences in item parameters from differences in factor variables [4]. A common assumption is that item parameters in CFA or IRT models are equal across groups. This property is referred to as measurement invariance [5,6], which is widely discussed in the social sciences [6,7]. In IRT, violations of this assumption are typically addressed as differential item functioning (DIF; [8,9]). The assumption of equal item parameters across groups is prevalent in the social sciences for two main reasons. First, there is ambiguity in how to impose identification constraints that define group differences in CFA or IRT models. For example, one might assume that the deviations of group-specific item intercepts from the overall item intercept average to zero—similar to the constraint in linear regression that residuals sum to zero. However, any weighted sum of deviations that equals zero could serve as a valid identification constraint. Second, it is often believed that the latent variable can only be meaningfully compared across groups if items operate equivalently across those groups ([10,11,12,13]; but see [4,14,15]).
To handle violations of measurement invariance, the invariance alignment (IA) method [16,17,18,19], also known as alignment optimization [20,21], has been proposed for comparing groups in unidimensional CFA or IRT models. IA seeks to identify a solution in which the majority of item parameters remain (approximately) invariant, allowing for small deviations. This enhances the robustness of group comparisons when invariance is only partially met, that is, in cases where only a limited number of item parameters differ across groups. For this reason, IA is considered advantageous over the strict invariance assumption by many researchers, as it relaxes that assumption [20,21,22]. The IA method has gained wide application in social science research using questionnaire data [23,24,25,26,27,28,29,30,31,32].
Despite its widespread empirical use, statistical inference for IA has primarily been investigated for continuous items, with only limited research for dichotomous or polytomous items. In particular, comparisons of the widely used delta method against alternative bootstrap methods for confidence interval estimation remain scarce. Moreover, there is a lack of comprehensive evaluation of statistical inference for IA using the recently proposed $L_0$ loss function [33], which serves as an alternative to the more commonly used $L_{0.5}$ loss function.
The adequacy of statistical inference is typically evaluated through simulation studies, focusing on coverage rates or standard error ratios. The following discussion is limited to IA simulation studies using maximum likelihood estimation. Asparouhov and Muthén [16] examined coverage rates for IA with continuous items and observed inflated coverage for group means—i.e., overcoverage—while coverage for group standard deviations was acceptable. They also reported that standard error ratios exceeded 1 for these parameters, indicating overestimated standard errors and explaining the overcoverage. In contrast, Asparouhov and Muthén [17,34] and Wen and Hu [35] did not observe inflated coverage for either continuous or dichotomous items. Flake and McCoach [36] found undercoverage of factor means when IA was applied to polytomous items. Finch [37] reported acceptable coverage for factor means in the case of dichotomous items. Robitzsch [38] observed slightly elevated coverage rates—above the nominal 95% level—for IA applied to continuous items.
This article compares the delta method for standard error estimation in IA with different bootstrap methods and examines whether the results differ between continuous and dichotomous items.
The remainder of the article is structured as follows. Section 2 provides an overview of the IA method. Section 3 introduces alternative methods for standard error and confidence interval estimation in IA. Results from a simulation study involving continuous and dichotomous items are presented in Section 4 and Section 5, respectively. The article concludes with a discussion in Section 6.

2. Invariance Alignment

This section reviews the IA method for unidimensional factor models involving continuous and dichotomous items. Let $X_g = (X_{1g}, \ldots, X_{Ig})$ denote the vector of $I > 1$ items in group $g = 1, \ldots, G$. The item vector $X_g$ is linked to a normally distributed factor variable $\theta_g$, commonly referred to as a latent variable in the social sciences. It is assumed that $\theta_g$ follows a normal distribution with mean $\mu_g$ and standard deviation (SD) $\sigma_g$ in group $g$. For identification purposes, the constraints $\mu_1 = 0$ and $\sigma_1 = 1$ are imposed in the first group.
In unidimensional factor models, the items $X_{ig}$ are assumed to be conditionally independent given the latent variable $\theta_g$. This conditional independence is formally expressed as
$$ P(X_{1g}, \ldots, X_{Ig} \mid \theta_g) = \prod_{i=1}^{I} P(X_{ig} \mid \theta_g) . \quad (1) $$
Equation (1) indicates that conditional on the latent variable $\theta_g$, the items are statistically independent. Consequently, all associations among the items are accounted for by this unidimensional latent variable.
The unidimensional factor model for continuous items in CFA is defined as
$$ X_{ig} = \nu_{ig} + \lambda_{ig} \theta_g + \xi_{ig} \quad \text{with} \quad \theta_g \sim N(\mu_g, \sigma_g^2) \ \text{and} \ \xi_{ig} \sim N(0, \omega_{ig}) , \quad (2) $$
where $\nu_{ig}$ denotes the item intercept and $\lambda_{ig}$ represents the item discrimination. The residual terms $\xi_{ig}$ are normally distributed, uncorrelated with the factor variable $\theta_g$, and have variance $\omega_{ig}$. The parameters of the unidimensional factor model in (2) can be estimated using maximum likelihood [1,39].
When the model (2) is estimated separately for each group $g = 1, \ldots, G$, the identification constraints $\mu_g = 0$ and $\sigma_g = 1$ must again be imposed. Under these constraints, the identified item parameters $\hat{\lambda}_{ig}$ and $\hat{\nu}_{ig}$ are given by
$$ \hat{\lambda}_{ig} = \lambda_{ig} \sigma_g \quad \text{and} \quad \hat{\nu}_{ig} = \nu_{ig} + \lambda_{ig} \mu_g = \nu_{ig} + \hat{\lambda}_{ig} \sigma_g^{-1} \mu_g . \quad (3) $$
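As a quick numerical check of the transformation in Equation (3), the following Python snippet (with illustrative parameter values chosen here, not taken from the article) verifies that the invariant parameters can be recovered from the identified ones:

```python
# Illustrative (hypothetical) values for one item in one group
lam, nu = 1.2, 0.4      # invariant discrimination and intercept
mu, sigma = 0.3, 1.5    # group mean and SD of the factor variable

# Identified parameters under the group-wise constraints mu_g = 0, sigma_g = 1
lam_hat = lam * sigma
nu_hat = nu + lam * mu

# Inverting the transformation recovers the invariant parameters
lam_rec = lam_hat / sigma
nu_rec = nu_hat - lam_hat * mu / sigma

print(lam_rec, nu_rec)  # recovers (lam, nu) up to floating-point error
```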
Consider now the case of dichotomous items, $X_g \in \{0, 1\}^I$. In this context, the unidimensional factor model is commonly referred to as an IRT model. The function $P_{ig}(\theta_g) = P(X_{ig} = 1 \mid \theta_g)$ for $i = 1, \ldots, I$ in group $g = 1, \ldots, G$ is known as the item response function (IRF). The IRF of the two-parameter logistic (2PL) model [40], which is employed here as the unidimensional factor model for dichotomous items, is defined as
$$ P_{ig}(\theta_g) = \Psi( \lambda_{ig} \theta_g + \nu_{ig} ) \quad \text{with} \quad \theta_g \sim N(\mu_g, \sigma_g^2) , \quad (4) $$
where $\lambda_{ig}$ and $\nu_{ig}$ denote the item discrimination and item intercept, respectively, and $\Psi(x) = (1 + \exp(-x))^{-1}$ is the logistic distribution function. The parameters of the 2PL model can be consistently estimated using marginal maximum likelihood (MML) estimation [41,42].
When model (4) is estimated separately for each group $g$, the identification constraints $\mu_g = 0$ and $\sigma_g = 1$ must be imposed. Under these constraints, the same transformation formulas as in (3) apply for the identified parameters $\hat{\lambda}_{ig}$ and $\hat{\nu}_{ig}$.
The optimization function used in the IA method proposed by Asparouhov and Muthén [16,18] is described below. The IA method has been discussed in the literature for continuous items [16], dichotomous items [18], and polytomous items [43].
The IA method estimates group means $\mu = (\mu_2, \ldots, \mu_G)$ and group SDs $\sigma = (\sigma_2, \ldots, \sigma_G)$ by minimizing a discrepancy function based on transformed item parameter differences across group pairs. For identification, define $\mu_1 = 0$ and $\sigma_1 = 1$. Let $\delta = (\mu, \sigma)$ denote the vector of distribution parameters of interest. The estimate $\hat{\delta}$ is obtained by minimizing
$$ H(\delta) = \sum_{i=1}^{I} \sum_{g < h} \rho\left( \frac{\hat{\lambda}_{ig}}{\sigma_g} - \frac{\hat{\lambda}_{ih}}{\sigma_h} \right) + \sum_{i=1}^{I} \sum_{g < h} \rho\left( \hat{\nu}_{ig} - \hat{\nu}_{ih} - \frac{\hat{\lambda}_{ig}}{\sigma_g} \mu_g + \frac{\hat{\lambda}_{ih}}{\sigma_h} \mu_h \right) , \quad (5) $$
where $\rho$ denotes a robust loss function. The original IA method also incorporates weights for group pairs $(g, h)$ to account for varying sample sizes in the objective function [16]. These weights are omitted here, as they are unnecessary when sample sizes are equal across groups.
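To make the optimization concrete, the following Python sketch minimizes an objective of the form of Equation (5) with the smoothed $L_{0.5}$ loss. The identified item parameters are random stand-ins generated here purely for illustration; the studies reported below used the sirt package in R, not this code.

```python
import numpy as np

rng = np.random.default_rng(0)
G, I = 3, 5
eps, p = 0.001, 0.5

# Random stand-ins for the identified item parameters (illustrative only)
lam_hat = rng.uniform(0.8, 1.2, size=(I, G))
nu_hat = rng.normal(0.0, 0.3, size=(I, G))

def rho(x):
    # Differentiable approximation of the L_p loss, Equation (6)
    return (x ** 2 + eps) ** (p / 2)

def H(delta):
    # delta stacks (mu_2, ..., mu_G, sigma_2, ..., sigma_G); mu_1 = 0, sigma_1 = 1
    mu = np.concatenate(([0.0], delta[:G - 1]))
    sigma = np.concatenate(([1.0], delta[G - 1:]))
    total = 0.0
    for i in range(I):
        for g in range(G):
            for h in range(g + 1, G):
                total += rho(lam_hat[i, g] / sigma[g] - lam_hat[i, h] / sigma[h])
                total += rho((nu_hat[i, g] - lam_hat[i, g] * mu[g] / sigma[g])
                             - (nu_hat[i, h] - lam_hat[i, h] * mu[h] / sigma[h]))
    return total

def num_grad(f, x, h=1e-6):
    # Central-difference gradient
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x); e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

# Gradient descent with backtracking: a minimal stand-in for a real optimizer
delta = np.array([0.0, 0.0, 1.0, 1.0])  # start at (mu_2, mu_3, sigma_2, sigma_3)
f_val = H(delta)
for _ in range(200):
    grad = num_grad(H, delta)
    step = 0.1
    while step > 1e-10 and H(delta - step * grad) >= f_val:
        step /= 2
    if step <= 1e-10:
        break
    delta = delta - step * grad
    f_val = H(delta)

print(delta)  # estimated (mu_2, mu_3, sigma_2, sigma_3)
```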
A robust loss function $\rho$ is employed in the minimization problem (5) to account for the assumption that only a subset of item parameters deviates across groups. This reflects a sparse pattern of deviations from full measurement invariance. Accordingly, the IA method is particularly well suited to situations involving partial invariance. In its original formulation, the IA method uses a differentiable approximation of the nondifferentiable $L_p$ loss function $x \mapsto |x|^p$ (see [44,45,46]), defined as (see [16])
$$ \rho(x) = \left( x^2 + \varepsilon \right)^{p/2} , \quad (6) $$
where $\varepsilon > 0$ is a tuning parameter that governs the approximation error [16,18]. In the default setting of the commercially available Mplus software (Version 8.11) by Muthén and Asparouhov [47], the power is $p = 0.5$, which is also adopted in this paper. Empirical studies suggest that a value of $\varepsilon = 0.001$ is effective [29,33], although Mplus applies a default of $\varepsilon = 0.01$.
As an alternative to the $L_{0.5}$ loss function, a differentiable approximation of the $L_0$ loss function $x \mapsto \mathbb{1}\{ x \neq 0 \}$—the indicator function that returns one if the argument differs from zero—has been employed within the IA framework [38]; see also [48]. The corresponding loss function is defined as
$$ \rho(x) = \frac{x^2}{x^2 + \varepsilon} , \quad (7) $$
where $\varepsilon > 0$ is a tuning parameter that controls the approximation error of $\rho$ relative to the nondifferentiable $L_0$ loss function. Empirical findings indicate that setting $\varepsilon = 0.01$ yields satisfactory performance in practical applications [33,49].
The $L_0$ loss function typically reduces bias relative to the $L_{0.5}$ loss function; however, this reduction in bias is accompanied by increased variance. The choice between the two loss functions depends on the bias-variance trade-off in a given context, as the root mean squared error (RMSE) may favor either one depending on the relative magnitudes of bias and variance [33].
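A small Python comparison of the two smoothed losses in Equations (6) and (7) makes the qualitative difference visible: the $L_{0.5}$ approximation keeps growing with the size of the deviation, while the $L_0$ approximation saturates near one for any clearly nonzero deviation.

```python
import numpy as np

def rho_l05(x, eps=0.001, p=0.5):
    # Smooth approximation of |x|^p, Equation (6)
    return (x ** 2 + eps) ** (p / 2)

def rho_l0(x, eps=0.01):
    # Smooth approximation of the indicator 1{x != 0}, Equation (7)
    return x ** 2 / (x ** 2 + eps)

x = np.array([0.0, 0.05, 0.2, 0.5, 2.0])
print(np.round(rho_l05(x), 3))  # keeps increasing with |x|
print(np.round(rho_l0(x), 3))   # approaches 1 once x is clearly nonzero
```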

3. Standard Error and Confidence Interval Estimation in Invariance Alignment

This section presents alternative methods for computing standard errors and confidence intervals (CIs) for IA. While the delta method discussed in Section 3.1 relies on asymptotic theory, the bootstrap methods outlined in Section 3.2 leverage computational resources to compute a CI.
Let $\delta = (\mu, \sigma)$ represent the vector of distribution parameters, with estimates denoted as $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$. The vector $\hat{\gamma}$ contains the estimated item parameters, including item discriminations $\hat{\lambda}_{ig}$ and item intercepts $\hat{\nu}_{ig}$. The corresponding estimated variance matrix is denoted by $V_{\hat{\gamma}}$, which is typically derived from statistical software that estimates the unidimensional factor models for continuous or dichotomous items separately in each of the $G$ groups.

3.1. Delta Method

The standard error of $\hat{\delta}$ and the corresponding confidence intervals (CIs) for its components are computed using the delta method (DM; see [16,17,50,51,52,53,54,55]). In IA, the estimate $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$ satisfies the estimating equation
$$ h_\delta( \hat{\delta}, \hat{\gamma} ) = 0 , \quad (8) $$
where $h_\delta$ is the partial derivative of $H$ with respect to $\delta$ in the IA minimization problem (5). The DM approach applies a Taylor expansion of $h_\delta$ around the population values $(\delta, \gamma)$, yielding
$$ h_\delta( \hat{\delta}, \hat{\gamma} ) \approx h_\delta( \delta, \gamma ) + h_{\delta\delta}( \delta, \gamma ) ( \hat{\delta} - \delta ) + h_{\delta\gamma}( \delta, \gamma ) ( \hat{\gamma} - \gamma ) . \quad (9) $$
Here, $h_{\delta\delta}$ and $h_{\delta\gamma}$ represent the Jacobians of $h_\delta$ with respect to $\delta$ and $\gamma$, respectively. From (8) and the identity $h_\delta( \delta, \gamma ) = 0$, expression (9) simplifies to
$$ \hat{\delta} - \delta = - h_{\delta\delta}( \delta, \gamma )^{-1} h_{\delta\gamma}( \delta, \gamma ) ( \hat{\gamma} - \gamma ) . \quad (10) $$
Letting $A = - h_{\delta\delta}( \delta, \gamma )^{-1} h_{\delta\gamma}( \delta, \gamma )$, the variance of $\hat{\delta}$ becomes
$$ \mathrm{Var}( \hat{\delta} ) = A V_{\hat{\gamma}} A^\top . \quad (11) $$
An estimate of $A$ is given by
$$ \hat{A} = - h_{\delta\delta}( \hat{\delta}, \hat{\gamma} )^{-1} h_{\delta\gamma}( \hat{\delta}, \hat{\gamma} ) , \quad (12) $$
resulting in the estimated variance matrix
$$ V_{\hat{\delta}} = \hat{A} V_{\hat{\gamma}} \hat{A}^\top . \quad (13) $$
Standard errors for the entries in $\hat{\delta}$ are computed as the square roots of the diagonal elements of $V_{\hat{\delta}}$. Confidence intervals can then be constructed assuming normality and utilizing these computed standard errors.
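The chain in Equations (8) to (13) can be implemented generically with numerical Jacobians. The Python sketch below assumes only that a function evaluating the gradient $h_\delta$ is available; `delta_method_var` is a hypothetical helper name chosen here, not a function from any package mentioned in the article.

```python
import numpy as np

def delta_method_var(h_delta, delta_hat, gamma_hat, V_gamma, h=1e-6):
    """Estimated variance matrix of delta_hat, Equations (12) and (13).

    h_delta(delta, gamma) must return the gradient of H with respect to delta;
    the Jacobians h_dd and h_dg are approximated by central finite differences.
    """
    d, k = len(delta_hat), len(gamma_hat)
    h_dd = np.zeros((d, d))
    h_dg = np.zeros((d, k))
    for j in range(d):
        e = np.zeros(d); e[j] = h
        h_dd[:, j] = (h_delta(delta_hat + e, gamma_hat)
                      - h_delta(delta_hat - e, gamma_hat)) / (2 * h)
    for j in range(k):
        e = np.zeros(k); e[j] = h
        h_dg[:, j] = (h_delta(delta_hat, gamma_hat + e)
                      - h_delta(delta_hat, gamma_hat - e)) / (2 * h)
    A_hat = -np.linalg.solve(h_dd, h_dg)   # Equation (12)
    return A_hat @ V_gamma @ A_hat.T       # Equation (13)

# Linear toy case: h_delta(delta, gamma) = delta - B @ gamma, so that
# delta_hat = B @ gamma_hat and Var(delta_hat) = B V_gamma B^T exactly
B = np.array([[1.0, 2.0], [0.5, -1.0]])
V_gamma = np.diag([0.04, 0.09])
V_delta = delta_method_var(lambda d, g: d - B @ g, np.zeros(2), np.zeros(2), V_gamma)
print(np.round(V_delta, 4))
```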

3.2. Bootstrap Methods

This section discusses the application of parametric bootstrap methods to compute confidence intervals for the parameter estimate $\hat{\delta}$ in IA. The estimate $\hat{\delta}$ satisfies the estimating equation $h_\delta( \hat{\delta}, \hat{\gamma} ) = 0$. The item parameter estimates $\hat{\gamma}$ have a variance matrix $V_{\hat{\gamma}}$, which is estimated separately for each group. The parametric bootstrap method resamples item parameters based on $V_{\hat{\gamma}}$, generating a distribution of IA parameter estimates $\hat{\delta}$.
To obtain bootstrap samples $b = 1, \ldots, B$, draw $\hat{\gamma}^{(b)}$ from a multivariate normal distribution with mean vector $\hat{\gamma}$ and variance matrix $V_{\hat{\gamma}}$. For each $b$th bootstrap sample, the estimate $\hat{\delta}^{(b)}$ satisfies the estimating equation $h_\delta( \hat{\delta}^{(b)}, \hat{\gamma}^{(b)} ) = 0$. Standard bootstrap techniques can then be applied to the resulting estimates $\hat{\delta}^{(b)}$ for $b = 1, \ldots, B$ (see [56]).
Let $\hat{\beta}$ represent an entry of $\hat{\delta}$, corresponding to an estimated group mean or group SD, and let $\hat{\beta}^{(b)}$ denote the corresponding parameter estimate in the $b$th bootstrap sample. Define $\hat{G}_{\hat{\beta}, \mathrm{boot}}$ as the empirical distribution function of the bootstrap estimates $\hat{\beta}^{(b)}$, derived from the parametric bootstrap using $B$ samples. The associated inverse distribution function (i.e., the quantile or percentile function) is denoted by $\hat{G}_{\hat{\beta}, \mathrm{boot}}^{-1}$.
The following outlines alternative methods for CI estimation at a confidence level  1 α (e.g., 1 α = 0.95 ), based on the bootstrap procedures described in [56,57].
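In Python, the resampling step can be sketched as follows. Here `fit_ia` is a hypothetical placeholder for solving $h_\delta(\hat{\delta}, \hat{\gamma}^{(b)}) = 0$, and the parameter values are invented for illustration; in the reported studies this step was carried out with the sirt package in R.

```python
import numpy as np

rng = np.random.default_rng(1)
B = 1000

# Hypothetical stand-ins for the estimated item parameters and their variance matrix
gamma_hat = np.array([1.0, 0.2, 0.9, -0.1])
V_gamma = np.diag([0.02, 0.01, 0.02, 0.01])

def fit_ia(gamma):
    # Placeholder for the alignment estimation step; a simple smooth function
    # of the item parameters stands in for the solution of h_delta = 0
    return np.array([gamma[1] - gamma[3], gamma[0] / gamma[2]])

# Draw gamma^(b) ~ N(gamma_hat, V_gamma) and re-estimate delta for each draw
gamma_boot = rng.multivariate_normal(gamma_hat, V_gamma, size=B)
delta_boot = np.array([fit_ia(g) for g in gamma_boot])
print(delta_boot.shape)  # (1000, 2): B bootstrap replicates of delta_hat
```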
The normal distribution bootstrap (BNO) CI assumes a normal distribution for the parameter estimate $\hat{\beta}$. Let $\Phi^{-1}$ denote the inverse distribution function (i.e., the quantile function) of the standard normal distribution. The BNO confidence interval is given by
$$ CI = \left[ \hat{\beta} - \Phi^{-1}( 1 - \alpha/2 )\, s_{\hat{\beta}} ,\; \hat{\beta} + \Phi^{-1}( 1 - \alpha/2 )\, s_{\hat{\beta}} \right] , \quad (14) $$
where $s_{\hat{\beta}}$ denotes the empirical standard deviation of the bootstrap estimates $\hat{\beta}^{(b)}$. For a confidence level of $1 - \alpha = 0.95$, the quantile in (14) is $\Phi^{-1}( 1 - \alpha/2 ) = 1.96$.
The percentile bootstrap (BPE) CI is based on the quantiles (i.e., percentiles) of the empirical distribution of the bootstrap parameter estimates $\hat{\beta}^{(b)}$. The CI is defined as
$$ CI = \left[ \hat{G}_{\hat{\beta}, \mathrm{boot}}^{-1}( \alpha/2 ) ,\; \hat{G}_{\hat{\beta}, \mathrm{boot}}^{-1}( 1 - \alpha/2 ) \right] . \quad (15) $$
An advantage of the BPE method is its applicability to cases where the distribution of $\hat{\beta}$ is asymmetric.
The bias-corrected bootstrap (BBC) CI accounts for potential bias in the bootstrap samples and accommodates asymmetric distributions of the parameter estimate $\hat{\beta}$. It is defined as
$$ CI = \left[ \hat{G}_{\hat{\beta}, \mathrm{boot}}^{-1}\!\left( \Phi\!\left( 2 B_{\hat{\beta}} - \Phi^{-1}( 1 - \alpha/2 ) \right) \right) ,\; \hat{G}_{\hat{\beta}, \mathrm{boot}}^{-1}\!\left( \Phi\!\left( 2 B_{\hat{\beta}} + \Phi^{-1}( 1 - \alpha/2 ) \right) \right) \right] , \quad (16) $$
where
$$ B_{\hat{\beta}} = \Phi^{-1}\!\left( \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}( \hat{\beta}^{(b)} < \hat{\beta} ) \right) \quad (17) $$
is a bias correction term and $\mathbb{1}$ denotes the indicator function.
If two confidence intervals (CIs), $CI_j = [ l_j, u_j ]$ for $j = 1, 2$, are available, an average bootstrap (BAV) CI can be computed as
$$ CI = \left[ \frac{l_1 + l_2}{2} ,\; \frac{u_1 + u_2}{2} \right] . \quad (18) $$
The rationale for using BAV is that if $CI_1$ results in overcoverage and $CI_2$ in undercoverage, averaging the lower and upper bounds of the two CIs yields a coverage rate that lies between those of the original CIs.
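The four CIs in Equations (14) to (18) can be computed from the bootstrap draws with the standard library's `NormalDist` for $\Phi$ and $\Phi^{-1}$. The function below is an illustrative sketch; `bootstrap_cis` is a name chosen here, not a function from any package mentioned in the article.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(beta_hat, beta_boot, alpha=0.05):
    """BNO, BPE, BBC, and BAV confidence intervals, Equations (14)-(18)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    s = beta_boot.std(ddof=1)
    bno = (beta_hat - z * s, beta_hat + z * s)                       # (14)
    bpe = tuple(np.quantile(beta_boot, [alpha / 2, 1 - alpha / 2]))  # (15)
    b0 = nd.inv_cdf(float(np.mean(beta_boot < beta_hat)))            # (17)
    bbc = (np.quantile(beta_boot, nd.cdf(2 * b0 - z)),               # (16)
           np.quantile(beta_boot, nd.cdf(2 * b0 + z)))
    bav = ((bno[0] + bbc[0]) / 2, (bno[1] + bbc[1]) / 2)             # (18)
    return {"BNO": bno, "BPE": bpe, "BBC": bbc, "BAV": bav}

rng = np.random.default_rng(2)
beta_boot = rng.normal(0.3, 0.1, size=1000)  # hypothetical bootstrap estimates
cis = bootstrap_cis(0.3, beta_boot)
for name, (lo, hi) in cis.items():
    print(f"{name}: [{lo:.3f}, {hi:.3f}]")
```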

3.3. Comparison

The delta method and parametric bootstrap methods discussed in Section 3.1 and Section 3.2, respectively, rely on the asymptotic normality of the item parameter estimates. These estimates are treated as input to the IA procedure, which is a nonlinear function of the item parameters. The delta method applies a linear Taylor approximation to compute confidence intervals for the IA estimates, whereas the bootstrap method does not rely on this approximation and instead constructs confidence intervals using computational resources. Although the bootstrap methods may be more accurate due to fewer distributional assumptions, they are computationally more demanding. In contrast, the delta method is more efficient but may be less accurate because of the linear approximation involved in the parameter transformation.

4. Simulation Study 1: Continuous Items

Simulation Study 1 compares the performance of confidence intervals (CIs) for IA using the $L_{0.5}$ and $L_0$ loss functions with continuous items.

4.1. Method

The data-generating model for this study was adapted from Asparouhov and Muthén [16]. Continuous item responses were simulated using a unidimensional CFA model with $G = 3$ groups. The normally distributed factor variable was measured by $I = 5$ normally distributed items.
In the simulation, the group means $\mu_g$ of the factor variable $\theta_g$ were set to 0, 0.3, and 1.0, while the group SDs $\sigma_g$ were set to 1, 1.5, and 1.2, respectively. Item responses included DIF in item discriminations and item intercepts. Each group had one out of five items with a noninvariant item intercept $\nu_{ig}$ and one with a noninvariant item discrimination $\lambda_{ig}$. For all groups, the invariant item discriminations and the residual variances of the indicator variables were set to 1 (i.e., $\lambda_{ig} = 1$ for $i = 1, \ldots, 5$), and invariant item intercepts were set to $\nu_{ig} = 0$. The noninvariant item parameters in the first group were $\nu_{51} = 0.5$ and $\lambda_{13} = 1.4$. The noninvariant item parameters in the second group were $\nu_{12} = 0.5$ and $\lambda_{52} = 0.5$. The noninvariant item parameters in the third group were $\nu_{23} = 0.5$ and $\lambda_{43} = 0.3$. The complete set of item parameters is also available at https://osf.io/8hfjc (accessed on 5 May 2025).
Sample sizes per group were selected as $N = 250$, 500, 1000, and 2000 to encompass a broad range of sample sizes, reflecting the variety of potential applications of IA.
In each of the four simulation conditions (defined by the sample size $N$), 5000 replications were conducted. IA was applied with $p = 0.5$ and $\varepsilon = 0.001$ (i.e., the $L_{0.5}$ loss function) and with $p = 0$ and $\varepsilon = 0.01$ (i.e., the $L_0$ loss function). For both IA specifications, CIs at a confidence level of $1 - \alpha = 0.95$ were computed using five methods: DM, BNO, BPE, BBC, and BAV (see Section 3). The BAV method averaged the CI limits from BNO and BBC. A total of $B = 1000$ bootstrap samples were used for the parametric bootstrap approaches. Bias and root mean squared error (RMSE) were evaluated for the estimated group means $\hat{\mu}_g$ and SDs $\hat{\sigma}_g$. Coverage rates were also assessed for the two IA methods, which were crossed with the five CI estimation approaches. For each IA method, a pseudo-true parameter was defined as the average parameter estimate across replications within a given simulation condition, isolating coverage performance from parameter bias. The coverage rate was defined as the proportion of replications in which the CI contained the pseudo-true parameter. Coverage rates between 91% and 98% were considered acceptable [58]. A coverage rate below 91% indicated undercoverage, while a rate above 98% signified overcoverage.
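The coverage criterion can be expressed compactly in code. The CI limits below are synthetic stand-ins (correctly calibrated normal-theory intervals with a known standard error, invented here), used only to show the computation:

```python
import numpy as np

rng = np.random.default_rng(3)
n_reps, pseudo_true = 5000, 0.3

# Synthetic replication results: estimates with known SE = 0.1 and
# matching 95% normal-theory CI limits
est = rng.normal(pseudo_true, 0.1, size=n_reps)
lower, upper = est - 1.96 * 0.1, est + 1.96 * 0.1

# Coverage rate: proportion of replications whose CI contains the pseudo-true value
coverage = np.mean((lower <= pseudo_true) & (pseudo_true <= upper))
print(round(100 * float(coverage), 1))  # close to the nominal 95
```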
All analyses for this simulation study were conducted using the statistical software R (Version 4.4.1; [59]). The unidimensional CFA model was fitted using the R function sirt::invariance_alignment_cfa_config() from the sirt package (Version 4.2-114; [60]). This function is a wrapper for the lavaan package (Version 0.6-19; [61,62]). IA estimation, including point estimates and standard errors based on DM, was performed using the R function sirt::invariance.alignment() from the sirt package. Custom functions were developed by the author to implement the parametric bootstrap approach. Replication materials for this simulation study are available at https://osf.io/8hfjc (accessed on 5 May 2025).

4.2. Results

Table 1 presents the bias and RMSE for the estimated group means and group SDs as a function of sample size for IA specified with the loss functions using powers $p = 0.5$ and $p = 0$. Notably, IA with $p = 0.5$ exhibited a larger bias than IA with $p = 0$. However, as the sample size increased, the bias diminished. Despite this, IA using the $L_0$ loss function outperformed the $L_{0.5}$ loss function in terms of RMSE across most simulation conditions.
Table 2 shows the coverage rates for the estimated group means and group SDs as a function of sample size. In this simulation study with continuous items, all five CI estimation methods resulted in acceptable coverage rates. However, the bootstrap methods BNO and BPE exhibited slightly inflated coverage rates. For BPE, two conditions with the smallest sample size ($N = 250$) resulted in overcoverage. The DM method produced coverage rates slightly above the target value of 95%, but this was not considered problematic.

5. Simulation Study 2: Dichotomous Items

Simulation Study 2 evaluates the performance of CIs for IA based on the $L_{0.5}$ and $L_0$ loss functions in the context of dichotomous items.

5.1. Method

As in Simulation Study 1 presented in Section 4, this study considered $G = 3$ groups. In contrast to the previous study, the present design involved dichotomous items simulated under the 2PL model rather than continuous items. The number of items was set to either $I = 15$ or $I = 30$.
In the simulation, the group means $\mu_g$ of the factor variable $\theta_g$ were set to 0, 0.3, and 0.8, and the group SDs $\sigma_g$ were specified as 1, 1.4, and 1.2, respectively. Item discriminations $\lambda_{ig}$ were held invariant across groups, whereas item intercepts $\nu_{ig}$ exhibited DIF with an absolute effect of 0.5. The specification of item parameters is described for the case of $I = 15$ items; for $I = 30$ items, the parameters were duplicated. The item discriminations for the 15 items were set to 0.78, 0.74, 1.27, 0.82, 1.40, 0.85, 0.84, 1.30, 1.34, 0.75, 1.16, 1.15, 1.10, 0.70, and 0.80, yielding an average discrimination of $M = 1.00$ with $SD = 0.25$. The invariant item intercepts were 1.45, −0.31, −1.09, −1.22, −1.64, 0.05, −0.49, 1.67, −0.07, −0.48, 0.01, −1.79, 0.36, −1.08, and 0.14, resulting in $M = -0.30$ and $SD = 0.97$. DIF in item intercepts was introduced for three items per group. In the first group, Items 1–3 had intercepts 1.95, −0.81, and −0.59. In the second group, Items 4–6 had intercepts −0.72, −1.14, and 0.55. In the third group, Items 7–9 had intercepts −0.99, 1.17, and −0.57. The full set of item parameters is available at https://osf.io/8hfjc (accessed on 5 May 2025).
In this simulation, sample sizes per group were set to $N = 500$, 1000, 2000, and 4000, reflecting the increased sample size requirements typically associated with estimating the 2PL model compared to the unidimensional CFA model for continuous items.
In each of the 4 (sample size $N$) × 2 (number of items $I$) = 8 simulation conditions, 5000 replications were performed. The same analysis models—IA with the $L_{0.5}$ and $L_0$ loss functions and the CI estimation methods—as in Simulation Study 1 were applied. The parametric bootstrap procedure again used $B = 1000$ bootstrap samples.
This simulation study was also conducted using the statistical software R (Version 4.4.1; [59]). The unidimensional 2PL model was fitted using the R function sirt::xxirt() from the sirt package (Version 4.2-114; [60]). IA estimation was performed using the sirt::invariance.alignment() function from the same package. The custom functions used to implement the parametric bootstrap approach were identical to those employed in Simulation Study 1. Replication materials for this simulation study are available at https://osf.io/8hfjc (accessed on 5 May 2025).

5.2. Results

Table 3 reports the bias and RMSE for the estimated group means and group SDs as a function of sample size, based on IA specified with the loss functions using powers $p = 0.5$ and $p = 0$. For the group means, IA with the $L_0$ loss function exhibited smaller bias than the specification with $p = 0.5$; however, this advantage did not extend to the estimation of group SDs. As observed in the continuous item setting, the bias associated with IA decreased as sample size increased.
Table 4 reports the coverage rates for estimated group means and group SDs. In contrast to the results obtained with continuous items, the study involving dichotomous items revealed that the DM method led to markedly inflated coverage rates (i.e., overcoverage) for both the $L_{0.5}$ and $L_0$ loss functions. The bootstrap methods BNO and BPE, which rely on normality assumptions or empirical percentiles, respectively, performed slightly better for IA with $p = 0.5$ than for $p = 0$. However, several simulation conditions still yielded overcoverage for these CI methods. The BBC method produced acceptable coverage rates for sample sizes of at least $N = 1000$, but showed undercoverage for $N = 500$. In the latter condition, the BAV method, which averages the CI limits of the BNO and BBC methods, demonstrated improved performance in terms of coverage. While BAV consistently resulted in acceptable coverage for IA with $p = 0.5$, it occasionally produced overcoverage for IA with $p = 0$, particularly at larger sample sizes.

6. Discussion

This study compared the commonly used delta method (DM) for standard error estimation in IA with alternative bootstrap-based approaches for computing CIs. For IA applied to continuous items, both the $L_{0.5}$ and $L_0$ loss functions yielded reliable statistical inference, with all CI methods producing acceptable coverage rates. In contrast, for dichotomous items, the DM method resulted in pronounced overcoverage, and bootstrap CIs based on the normal distribution (BNO) or empirical percentiles (BPE) also showed a tendency toward overcoverage. In such cases, bias-corrected bootstrap (BBC) CIs should be preferred for moderate to large sample sizes, whereas in smaller samples, a bootstrap CI method that averages the CI limits from BNO and BBC (BAV) demonstrated better coverage performance. Use of the BBC bootstrap is recommended for practitioners of the IA method, at least in the case of dichotomous items.
The finding that standard error assessment for IA using the delta method leads to inflated coverage rates for dichotomous items has not been clearly addressed in the literature. Further research is needed to understand why the results for dichotomous items differ substantially from those for continuous items. One possible explanation is that the normality assumption for estimated item parameters may be better satisfied with continuous items than with dichotomous ones. In addition, item parameter estimates for dichotomous items tend to have much larger standard errors. The linear Taylor approximation used in the delta method may be more appropriate when standard errors are small, and the normality assumption is more closely met.
Consistent with previous studies, IA using the $L_0$ loss function—rather than the commonly applied $L_{0.5}$ loss function—produced less biased and more precise estimates [29,33,38]. Although the Mplus software [47] also allows for the specification of the $L_{0.25}$ loss function, this alternative is similarly outperformed by the $L_0$ loss function. Therefore, greater use of the $L_0$ loss function is recommended in empirical research.
Future research could investigate whether the validity of statistical inference for IA with dichotomous items depends on the degree of invariance violations or the magnitude of DIF effects. Additionally, it would be informative to examine whether similar patterns of overcoverage occur when IA is applied to polytomous item responses [36,43].

Funding

This research received no external funding.

Data Availability Statement

Replication material for Simulation Study 1 in Section 4 and Simulation Study 2 in Section 5 can be found at https://osf.io/8hfjc (accessed on 5 May 2025).

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
2PL: two-parameter logistic
BAV: average bootstrap
BBC: bias-corrected bootstrap
BNO: bootstrap based on normal distribution
BPE: percentile bootstrap
CFA: confirmatory factor analysis
CI: confidence interval
DIF: differential item functioning
DM: delta method
IA: invariance alignment
IRF: item response function
IRT: item response theory
MML: marginal maximum likelihood
RMSE: root mean square error
SD: standard deviation

References

References

1. Bartholomew, D.J.; Knott, M.; Moustaki, I. Latent Variable Models and Factor Analysis: A Unified Approach; Wiley: New York, NY, USA, 2011.
2. Bock, R.D.; Gibbons, R.D. Item Response Theory; Wiley: New York, NY, USA, 2021.
3. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604.
4. Robitzsch, A.; Lüdtke, O. Why full, partial, or approximate measurement invariance are not a prerequisite for meaningful and valid group comparisons. Struct. Equ. Model. 2023, 30, 859–870.
5. Meredith, W. Measurement invariance, factor analysis and factorial invariance. Psychometrika 1993, 58, 525–543.
6. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
7. Mellenbergh, G.J. Item bias and item response theory. Int. J. Educ. Res. 1989, 13, 127–143.
8. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
9. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
10. Fischer, R.; Rudnev, M. From MIsgivings to MIse-en-scène: The role of invariance in personality science. Eur. J. Pers. 2024; online ahead of print.
11. Fischer, R.; Karl, J.A.; Luczak-Roesch, M.; Hartle, L. Why we need to rethink measurement invariance: The role of measurement invariance for cross-cultural research. Cross-Cult. Res. 2025, 47, 147–179.
12. Sterner, P.; Pargent, F.; Deffner, D.; Goretzko, D. A causal framework for the comparability of latent variables. Struct. Equ. Model. 2024, 31, 747–758.
13. Sterner, P.; Pargent, F.; Rudnev, M.; Goretzko, D. Do not let MI be misunderstood: Measurement invariance is more than a statistical assumption. PsyArXiv 2025.
14. Funder, D.C.; Gardiner, G. MIsgivings about measurement invariance. Eur. J. Pers. 2024, 38, 889–895.
15. Welzel, C.; Inglehart, R.F. Misconceptions of measurement equivalence: Time for a paradigm shift. Comp. Political Stud. 2016, 49, 1068–1094.
16. Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. 2014, 21, 495–508.
17. Asparouhov, T.; Muthén, B. Penalized structural equation models. Struct. Equ. Model. 2024, 31, 429–454.
18. Muthén, B.; Asparouhov, T. IRT studies of many groups: The alignment method. Front. Psychol. 2014, 5, 978.
19. Muthén, B.; Asparouhov, T. Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Sociol. Methods Res. 2018, 47, 637–664.
20. Cieciuch, J.; Davidov, E.; Schmidt, P. Alignment optimization: Estimation of the most trustworthy means in cross-cultural studies even in the presence of noninvariance. In Cross-Cultural Analysis: Methods and Applications; Davidov, E., Schmidt, P., Billiet, J., Eds.; Routledge: London, UK, 2018; pp. 571–592.
21. Pokropek, A.; Davidov, E.; Schmidt, P. A Monte Carlo simulation study to assess the appropriateness of traditional and newer approaches to test for measurement invariance. Struct. Equ. Model. 2019, 26, 724–744.
22. Sterner, P.; De Roover, K.; Goretzko, D. New developments in measurement invariance testing: An overview and comparison of EFA-based approaches. Struct. Equ. Model. 2025, 32, 117–135.
23. Fischer, R.; Karl, J.A. A primer to (cross-cultural) multi-group invariance testing possibilities in R. Front. Psychol. 2019, 10, 1507.
24. Han, H. Using measurement alignment in research on adolescence involving multiple groups: A brief tutorial with R. J. Res. Adolesc. 2024, 34, 235–242.
25. Lai, M.H.C. Adjusting for measurement noninvariance with alignment in growth modeling. Multivar. Behav. Res. 2023, 58, 30–47.
26. Leitgöb, H.; Seddig, D.; Asparouhov, T.; Behr, D.; Davidov, E.; De Roover, K.; Jak, S.; Meitinger, K.; Menold, N.; Muthén, B.; et al. Measurement invariance in the social sciences: Historical development, methodological challenges, state of the art, and future perspectives. Soc. Sci. Res. 2023, 110, 102805.
27. Luong, R.; Flake, J.K. Measurement invariance testing using confirmatory factor analysis and alignment optimization: A tutorial for transparent analysis planning and reporting. Psychol. Methods 2023, 28, 905–924.
28. Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psychol. Test Assess. Model. 2020, 62, 303–334.
29. Robitzsch, A. Comparing robust Haberman linking and invariance alignment. Stats 2025, 8, 3.
30. Sandoval-Hernandez, A.; Carasco, D.; Eryilmaz, N. Alignment optimization in international large-scale assessments: A scoping review and future directions. Educ. Methods Psychom. 2025, 3, 16.
31. Sideridis, G.; Alghamdi, M.H. Bullying in middle school: Evidence for a multidimensional structure and measurement invariance across gender. Children 2023, 10, 873.
32. Tsaousis, I.; Jaffari, F.M. Identifying bias in social and health research: Measurement invariance and latent mean differences using the alignment approach. Mathematics 2023, 11, 4007.
33. Robitzsch, A. Examining differences of invariance alignment in the Mplus software and the R package sirt. Mathematics 2024, 12, 770.
34. Asparouhov, T.; Muthén, B. Multiple group alignment for exploratory and structural equation models. Struct. Equ. Model. 2023, 30, 169–191.
35. Wen, C.; Hu, F. Investigating the applicability of alignment—A Monte Carlo simulation study. Front. Psychol. 2022, 13, 845721.
36. Flake, J.K.; McCoach, D.B. An investigation of the alignment method with polytomous indicators under conditions of partial measurement invariance. Struct. Equ. Model. 2018, 25, 56–70.
37. Finch, W.H. Detection of differential item functioning for more than two groups: A Monte Carlo comparison of methods. Appl. Meas. Educ. 2016, 29, 30–45.
38. Robitzsch, A. Implementation aspects in invariance alignment. Stats 2023, 6, 1160–1178.
39. Basilevsky, A.T. Statistical Factor Analysis and Related Methods: Theory and Applications; Wiley: New York, NY, USA, 2009.
40. Birnbaum, A. Some latent trait models and their use in inferring an examinee's ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
41. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236.
42. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459.
43. Mansolf, M.; Vreeker, A.; Reise, S.P.; Freimer, N.B.; Glahn, D.C.; Gur, R.E.; Moore, T.M.; Pato, C.N.; Pato, M.T.; Palotie, A.; et al. Extensions of multiple-group item response theory alignment: Application to psychiatric phenotypes in an international genomics consortium. Educ. Psychol. Meas. 2020, 80, 870–909.
44. Livadiotis, G. Expectation values and variance based on Lp-norms. Entropy 2012, 14, 2375–2396.
45. Livadiotis, G. On the convergence and law of large numbers for the non-euclidean Lp-means. Entropy 2017, 19, 217.
46. Lipovetsky, S. Optimal Lp-metric for minimizing powered deviations in regression. J. Mod. Appl. Stat. Methods 2007, 6, 20.
47. Muthén, L.; Muthén, B. Mplus User's Guide; Version 8.11; Muthén & Muthén: Los Angeles, CA, USA, 1998–2024. Available online: https://www.statmodel.com/ (accessed on 5 May 2025).
48. O'Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71.
49. Robitzsch, A. L0 and Lp loss functions in model-robust estimation of structural equation models. Psych 2023, 5, 1122–1139.
50. Ogasawara, H. Standard errors of item response theory equating/linking by response function methods. Appl. Psychol. Meas. 2001, 25, 53–67.
51. Battauz, M. Factors affecting the variability of IRT equating coefficients. Stat. Neerl. 2015, 69, 85–101.
52. Andersson, B. Asymptotic variance of linking coefficient estimators for polytomous IRT models. Appl. Psychol. Meas. 2018, 42, 192–205.
53. Zhang, Z. Asymptotic standard errors of equating coefficients using the characteristic curve methods for the graded response model. Appl. Meas. Educ. 2020, 33, 309–330.
54. Jewsbury, P.A. Generally applicable variance estimation methods for common-population linking. J. Educ. Behav. Stat. 2024; online ahead of print.
55. Robitzsch, A. Estimation of standard error, linking error, and total error for robust and nonrobust linking methods in the two-parameter logistic model. Stats 2024, 7, 592–612.
56. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
57. Davison, A.C.; Hinkley, D.V. Bootstrap Methods and Their Application; Cambridge University Press: Cambridge, UK, 1997.
58. Muthén, L.K.; Muthén, B.O. How to use a Monte Carlo study to decide on sample size and determine power. Struct. Equ. Model. 2002, 9, 599–620.
59. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2024. Available online: https://www.R-project.org (accessed on 15 June 2024).
60. Robitzsch, A. sirt: Supplementary Item Response Theory Models; R Package Version 4.2-114; 2025. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 7 April 2025).
61. Rosseel, Y. lavaan: An R package for structural equation modeling. J. Stat. Softw. 2012, 48, 1–36.
62. Rosseel, Y.; Jorgensen, T.D.; De Wilde, L.; Oberski, D.; Byrnes, J.; Vanbrabant, L.; Savalei, V.; Merkle, E.; Hallquist, M.; Rhemtulla, M.; et al. lavaan: Latent Variable Analysis; R Package Version 0.6-19; 2024. Available online: https://cran.r-project.org/web/packages/lavaan/lavaan.pdf (accessed on 26 September 2024).
Table 1. Simulation Study 1, continuous items: Bias and root mean square error for estimated group means and group SDs as a function of sample size N.

| Crit | N | μ2 (p=0.5) | μ2 (p=0) | μ3 (p=0.5) | μ3 (p=0) | σ2 (p=0.5) | σ2 (p=0) | σ3 (p=0.5) | σ3 (p=0) |
|------|------|--------|--------|--------|--------|--------|--------|--------|--------|
| Bias | 250 | −0.042 | −0.007 | 0.019 | 0.022 | −0.016 | 0.005 | −0.009 | 0.005 |
| | 500 | −0.027 | 0.000 | 0.007 | 0.006 | −0.013 | 0.002 | −0.007 | 0.004 |
| | 1000 | −0.020 | −0.003 | 0.004 | 0.003 | −0.009 | 0.001 | −0.007 | 0.001 |
| | 2000 | −0.012 | 0.000 | 0.003 | 0.002 | −0.007 | 0.001 | −0.005 | 0.000 |
| RMSE | 250 | 0.140 | 0.138 | 0.151 | 0.172 | 0.128 | 0.135 | 0.111 | 0.116 |
| | 500 | 0.092 | 0.088 | 0.101 | 0.105 | 0.084 | 0.085 | 0.073 | 0.074 |
| | 1000 | 0.066 | 0.063 | 0.068 | 0.068 | 0.057 | 0.056 | 0.051 | 0.050 |
| | 2000 | 0.046 | 0.044 | 0.046 | 0.046 | 0.039 | 0.038 | 0.035 | 0.034 |

Note. Crit = criterion (bias or RMSE); p = power used in the loss function ρ in invariance alignment. Values of absolute bias larger than 0.010 are printed in bold font.
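The bias and RMSE criteria reported in Tables 1 and 3 are standard Monte Carlo summaries of an estimator across replications. The simulations themselves were run in R with the sirt package; the following is only a minimal illustrative sketch of the two criteria, with the function name `bias_rmse` chosen here for exposition.

```python
import numpy as np

def bias_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of an estimator.

    estimates: estimates of one parameter across replications
    true_value: the data-generating (true) parameter value
    """
    est = np.asarray(estimates, dtype=float)
    # Bias: mean deviation of the estimate from the true value
    bias = est.mean() - true_value
    # RMSE: root of the mean squared deviation from the true value
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return bias, rmse
```

For example, estimates 1.0 and 3.0 of a true value 2.0 have zero bias but an RMSE of 1.0, which is why both criteria are reported side by side in the tables.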
Table 2. Simulation Study 1, continuous items: Coverage rates for estimated group means and group SDs as a function of sample size N.

| Par | N | DM (p=0.5) | BNO (p=0.5) | BPE (p=0.5) | BBC (p=0.5) | BAV (p=0.5) | DM (p=0) | BNO (p=0) | BPE (p=0) | BBC (p=0) | BAV (p=0) |
|-----|------|------|------|------|------|------|------|------|------|------|------|
| μ2 | 250 | 96.4 | 96.6 | 96.2 | 93.2 | 95.4 | 96.0 | 98.0 | 97.3 | 94.2 | 96.5 |
| | 500 | 96.9 | 96.7 | 96.2 | 95.1 | 96.2 | 96.6 | 97.3 | 97.0 | 95.7 | 96.8 |
| | 1000 | 96.0 | 95.8 | 95.4 | 94.1 | 95.1 | 95.7 | 96.2 | 95.8 | 95.4 | 96.0 |
| | 2000 | 96.0 | 95.8 | 95.5 | 95.0 | 95.6 | 95.5 | 95.8 | 95.5 | 95.4 | 95.6 |
| μ3 | 250 | 96.4 | 96.5 | 97.1 | 94.3 | 95.7 | 94.3 | 97.2 | 98.1 | 94.7 | 96.4 |
| | 500 | 97.0 | 97.1 | 96.8 | 95.6 | 96.5 | 96.2 | 98.3 | 97.2 | 96.5 | 97.7 |
| | 1000 | 96.4 | 96.6 | 96.6 | 95.2 | 95.9 | 96.3 | 97.5 | 96.7 | 96.2 | 97.0 |
| | 2000 | 96.1 | 96.0 | 96.0 | 95.4 | 95.7 | 95.4 | 96.1 | 95.7 | 95.8 | 96.0 |
| σ2 | 250 | 96.8 | 96.0 | 96.9 | 93.8 | 95.3 | 95.9 | 97.3 | 97.8 | 95.8 | 96.9 |
| | 500 | 96.9 | 96.5 | 97.0 | 95.2 | 96.0 | 96.1 | 97.5 | 97.6 | 96.4 | 97.1 |
| | 1000 | 96.4 | 96.3 | 96.4 | 94.9 | 95.8 | 95.9 | 96.5 | 96.5 | 96.1 | 96.4 |
| | 2000 | 96.2 | 96.3 | 96.3 | 95.7 | 96.2 | 95.6 | 96.2 | 95.9 | 95.7 | 96.1 |
| σ3 | 250 | 97.0 | 96.5 | 97.4 | 94.1 | 95.9 | 96.1 | 97.6 | 98.2 | 95.5 | 97.0 |
| | 500 | 96.7 | 96.5 | 97.2 | 95.0 | 95.9 | 96.4 | 97.5 | 97.8 | 96.5 | 97.0 |
| | 1000 | 96.3 | 96.1 | 96.4 | 94.9 | 95.6 | 95.8 | 96.7 | 96.7 | 96.0 | 96.4 |
| | 2000 | 96.0 | 96.1 | 96.2 | 95.5 | 95.8 | 95.7 | 96.1 | 96.2 | 96.0 | 96.1 |

Note. Par = parameter; p = power used in the loss function ρ in invariance alignment; DM = delta method; BNO = bootstrap CI based on normal distribution; BPE = percentile bootstrap CI; BBC = bias-corrected bootstrap CI; BAV = average of CI limits of BNO and BBC. Coverage rates smaller than 91.0 or larger than 98.0 are printed in bold font.
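The four bootstrap CI variants compared in Tables 2 and 4 can be illustrated with a short sketch. The original analyses were carried out in R; the Python code below is an assumption-laden illustration (function name `bootstrap_cis`, a bias-corrected CI without an acceleration term), not a reproduction of the article's implementation. BAV simply averages the corresponding lower and upper limits of BNO and BBC.

```python
import numpy as np
from statistics import NormalDist

def bootstrap_cis(theta_hat, boot, alpha=0.05):
    """Sketch of four bootstrap CI variants for an estimate theta_hat.

    boot: bootstrap replicates of the estimator (one value per resample)
    """
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    boot = np.asarray(boot, dtype=float)
    # BNO: normal-approximation CI using the bootstrap standard deviation
    se = boot.std(ddof=1)
    bno = (theta_hat - z * se, theta_hat + z * se)
    # BPE: percentile CI taken directly from the bootstrap distribution
    bpe = (np.quantile(boot, alpha / 2), np.quantile(boot, 1 - alpha / 2))
    # BBC: bias-corrected percentile CI (no acceleration correction here)
    prop = float(np.mean(boot < theta_hat))
    prop = min(max(prop, 1e-6), 1 - 1e-6)  # guard against degenerate 0 or 1
    z0 = nd.inv_cdf(prop)
    lo_p, hi_p = nd.cdf(2 * z0 - z), nd.cdf(2 * z0 + z)
    bbc = (np.quantile(boot, lo_p), np.quantile(boot, hi_p))
    # BAV: average the lower and upper limits of BNO and BBC
    bav = ((bno[0] + bbc[0]) / 2, (bno[1] + bbc[1]) / 2)
    return {"BNO": bno, "BPE": bpe, "BBC": bbc, "BAV": bav}
```

When the bootstrap distribution is symmetric around the point estimate, z0 is close to zero and BBC nearly coincides with BPE; the bias correction only matters when the replicates are shifted relative to theta_hat, which is the situation the dichotomous-item results suggest.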
Table 3. Simulation Study 2, dichotomous items: Bias and root mean square error for estimated group means and group SDs as a function of the number of items I and sample size N.

| Crit | I | N | μ2 (p=0.5) | μ2 (p=0) | μ3 (p=0.5) | μ3 (p=0) | σ2 (p=0.5) | σ2 (p=0) | σ3 (p=0.5) | σ3 (p=0) |
|------|----|------|--------|--------|--------|--------|--------|--------|--------|--------|
| Bias | 15 | 500 | 0.040 | 0.031 | −0.099 | −0.068 | 0.061 | 0.069 | 0.053 | 0.059 |
| | | 1000 | 0.025 | 0.011 | −0.064 | −0.018 | 0.028 | 0.032 | 0.027 | 0.029 |
| | | 2000 | 0.014 | 0.003 | −0.035 | 0.000 | 0.015 | 0.016 | 0.013 | 0.014 |
| | | 4000 | 0.008 | 0.001 | −0.023 | 0.000 | 0.007 | 0.007 | 0.006 | 0.006 |
| | 30 | 500 | 0.037 | 0.023 | −0.100 | −0.055 | 0.047 | 0.053 | 0.043 | 0.047 |
| | | 1000 | 0.022 | 0.006 | −0.061 | −0.009 | 0.022 | 0.023 | 0.020 | 0.021 |
| | | 2000 | 0.015 | 0.004 | −0.035 | 0.000 | 0.009 | 0.009 | 0.009 | 0.010 |
| | | 4000 | 0.008 | 0.001 | −0.022 | 0.000 | 0.004 | 0.004 | 0.005 | 0.005 |
| RMSE | 15 | 500 | 0.124 | 0.146 | 0.160 | 0.180 | 0.139 | 0.170 | 0.121 | 0.149 |
| | | 1000 | 0.084 | 0.091 | 0.110 | 0.114 | 0.088 | 0.104 | 0.076 | 0.091 |
| | | 2000 | 0.056 | 0.055 | 0.068 | 0.063 | 0.057 | 0.063 | 0.050 | 0.056 |
| | | 4000 | 0.037 | 0.036 | 0.045 | 0.038 | 0.038 | 0.039 | 0.034 | 0.035 |
| | 30 | 500 | 0.109 | 0.123 | 0.145 | 0.149 | 0.105 | 0.128 | 0.093 | 0.113 |
| | | 1000 | 0.072 | 0.073 | 0.094 | 0.081 | 0.067 | 0.075 | 0.058 | 0.066 |
| | | 2000 | 0.049 | 0.047 | 0.058 | 0.049 | 0.044 | 0.046 | 0.039 | 0.041 |
| | | 4000 | 0.033 | 0.032 | 0.039 | 0.032 | 0.030 | 0.030 | 0.026 | 0.026 |

Note. Crit = criterion (bias or RMSE); p = power used in the loss function ρ in invariance alignment. Values of absolute bias larger than 0.010 are printed in bold font.
Table 4. Simulation Study 2, dichotomous items: Coverage rates for estimated group means and group SDs as a function of the number of items I and sample size N.

| Par | I | N | DM (p=0.5) | BNO (p=0.5) | BPE (p=0.5) | BBC (p=0.5) | BAV (p=0.5) | DM (p=0) | BNO (p=0) | BPE (p=0) | BBC (p=0) | BAV (p=0) |
|-----|----|------|------|------|------|------|------|------|------|-------|------|------|
| μ2 | 15 | 500 | 99.3 | 97.9 | 98.9 | 92.0 | 95.4 | 98.2 | 98.8 | 99.6 | 91.3 | 96.0 |
| | | 1000 | 99.2 | 97.9 | 98.5 | 93.6 | 96.0 | 98.5 | 99.3 | 99.6 | 95.3 | 98.2 |
| | | 2000 | 98.8 | 97.9 | 97.9 | 95.1 | 96.9 | 98.3 | 98.9 | 99.0 | 96.7 | 98.2 |
| | | 4000 | 98.1 | 97.1 | 97.4 | 95.3 | 96.6 | 97.7 | 97.8 | 97.9 | 96.7 | 97.6 |
| | 30 | 500 | 99.2 | 96.8 | 97.9 | 92.1 | 94.8 | 98.6 | 98.4 | 99.4 | 92.0 | 96.2 |
| | | 1000 | 99.3 | 97.3 | 97.5 | 93.7 | 95.9 | 98.9 | 99.0 | 99.2 | 96.6 | 98.4 |
| | | 2000 | 98.6 | 97.0 | 97.0 | 94.8 | 96.4 | 98.6 | 98.5 | 98.5 | 97.0 | 97.9 |
| | | 4000 | 97.7 | 96.7 | 96.8 | 95.4 | 96.0 | 97.4 | 97.1 | 97.3 | 96.2 | 96.8 |
| μ3 | 15 | 500 | 99.3 | 97.6 | 99.4 | 88.3 | 94.0 | 96.8 | 98.6 | 99.9 | 86.3 | 93.5 |
| | | 1000 | 99.2 | 97.3 | 98.8 | 89.3 | 94.5 | 97.3 | 98.3 | 99.8 | 90.8 | 95.8 |
| | | 2000 | 99.4 | 98.6 | 97.7 | 93.6 | 97.2 | 99.1 | 99.5 | 99.5 | 96.5 | 98.9 |
| | | 4000 | 99.1 | 97.9 | 98.0 | 94.1 | 96.6 | 99.0 | 98.9 | 99.3 | 96.9 | 98.2 |
| | 30 | 500 | 99.6 | 96.5 | 98.3 | 89.5 | 93.8 | 98.0 | 97.9 | 99.6 | 87.2 | 94.0 |
| | | 1000 | 99.6 | 97.6 | 97.4 | 90.1 | 95.2 | 99.1 | 99.2 | 99.4 | 94.7 | 98.1 |
| | | 2000 | 99.3 | 98.0 | 96.3 | 92.9 | 96.6 | 99.2 | 99.2 | 99.3 | 97.1 | 98.4 |
| | | 4000 | 98.5 | 97.1 | 96.4 | 93.8 | 96.0 | 97.8 | 97.6 | 98.1 | 96.6 | 97.3 |
| σ2 | 15 | 500 | 99.8 | 98.8 | 99.0 | 89.3 | 95.7 | 99.0 | 99.8 | 100.0 | 89.7 | 97.3 |
| | | 1000 | 99.8 | 98.5 | 99.0 | 91.2 | 96.1 | 99.5 | 99.5 | 99.9 | 92.7 | 97.7 |
| | | 2000 | 99.6 | 98.2 | 98.9 | 93.3 | 96.3 | 99.0 | 99.6 | 99.8 | 95.4 | 98.3 |
| | | 4000 | 99.2 | 97.5 | 98.4 | 94.2 | 96.2 | 98.7 | 98.9 | 99.4 | 96.7 | 98.2 |
| | 30 | 500 | 99.8 | 97.9 | 97.4 | 90.9 | 95.3 | 99.3 | 99.2 | 99.9 | 90.6 | 96.7 |
| | | 1000 | 99.7 | 97.6 | 98.0 | 92.4 | 95.9 | 99.5 | 99.3 | 99.8 | 94.7 | 97.9 |
| | | 2000 | 99.3 | 97.3 | 97.6 | 93.8 | 95.7 | 99.1 | 98.9 | 99.2 | 96.1 | 97.9 |
| | | 4000 | 98.6 | 96.7 | 97.1 | 94.8 | 96.0 | 97.9 | 97.8 | 98.0 | 96.5 | 97.3 |
| σ3 | 15 | 500 | 99.8 | 98.7 | 98.9 | 89.7 | 95.9 | 99.1 | 99.5 | 99.9 | 90.1 | 97.2 |
| | | 1000 | 99.7 | 98.4 | 99.2 | 91.8 | 96.1 | 99.2 | 99.4 | 99.9 | 92.4 | 97.4 |
| | | 2000 | 99.7 | 98.3 | 98.8 | 93.5 | 96.5 | 99.5 | 99.6 | 99.9 | 95.9 | 98.5 |
| | | 4000 | 99.2 | 97.6 | 98.6 | 94.3 | 96.0 | 98.8 | 99.0 | 99.6 | 96.6 | 97.9 |
| | 30 | 500 | 99.9 | 97.9 | 97.4 | 90.3 | 95.5 | 99.6 | 99.5 | 99.8 | 90.5 | 96.9 |
| | | 1000 | 99.8 | 97.8 | 98.0 | 92.4 | 96.0 | 99.6 | 99.4 | 99.8 | 94.3 | 97.8 |
| | | 2000 | 99.5 | 97.4 | 97.9 | 93.3 | 96.1 | 99.2 | 99.1 | 99.5 | 96.2 | 97.9 |
| | | 4000 | 98.6 | 96.8 | 97.1 | 94.2 | 95.9 | 98.2 | 98.1 | 98.3 | 96.2 | 97.4 |

Note. Par = parameter; p = power used in the loss function ρ in invariance alignment; DM = delta method; BNO = bootstrap CI based on normal distribution; BPE = percentile bootstrap CI; BBC = bias-corrected bootstrap CI; BAV = average of CI limits of BNO and BBC. Coverage rates smaller than 91.0 or larger than 98.0 are printed in bold font.
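A coverage rate, the quantity tabulated in Tables 2 and 4, is the percentage of Monte Carlo replications in which the CI contains the true parameter; a nominal 95% interval should cover in roughly 95 out of 100 replications. The following is a minimal sketch of this computation (the helper name `coverage_rate` is chosen here for illustration, not taken from the article's code).

```python
import numpy as np

def coverage_rate(ci_lowers, ci_uppers, true_value):
    """Percentage of replications whose CI [lower, upper] covers true_value."""
    lo = np.asarray(ci_lowers, dtype=float)
    hi = np.asarray(ci_uppers, dtype=float)
    covered = (lo <= true_value) & (true_value <= hi)
    return 100.0 * covered.mean()
```

For instance, with intervals [0, 1], [0, 2], and [1.5, 2] and a true value of 1.0, the first two intervals cover and the third does not, giving a coverage rate of about 66.7.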