SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking

Abstract: Stocking–Lord (SL) linking is a popular linking method for group comparisons based on dichotomous item responses. This article proposes a bias correction technique based on the simulation extrapolation (SIMEX) method for SL linking in the two-parameter logistic (2PL) model in the presence of uniform differential item functioning (DIF). The SIMEX-based method is compared with analytical bias correction methods for SL linking. In a simulation study, SIMEX-based SL linking performed best; it is also easy to implement and can be adapted straightforwardly to other linking methods.


Introduction
Item response theory (IRT) models [1,2] are statistical models for multivariate discrete variables. IRT models are frequently applied in educational or psychological research. This article investigates group comparisons in unidimensional IRT models [3]. Let $X = (X_1, \ldots, X_I)$ be the vector of $I$ dichotomous (i.e., binary) random variables $X_i \in \{0, 1\}$ that are typically referred to as items in the psychometric literature. A unidimensional IRT model [4] is a statistical model for the probability distribution $P(X = x)$ for $x = (x_1, \ldots, x_I) \in \{0, 1\}^I$, where

$$P(X = x; \delta, \gamma) = \int \prod_{i=1}^{I} P_i(\theta; \gamma_i)^{x_i} \, (1 - P_i(\theta; \gamma_i))^{1 - x_i} \, \phi(\theta; \mu, \sigma) \, d\theta . \tag{1}$$

In (1), $\phi$ denotes the density of the normal distribution with mean $\mu$ and standard deviation $\sigma$. The parameters of the distribution for the latent (factor) variable $\theta$ (also labeled as a trait or ability) are contained in the vector $\delta = (\mu, \sigma)$ of the distribution parameters.
The vector $\gamma = (\gamma_1, \ldots, \gamma_I)$ contains all the estimated item parameters of the item response functions (IRFs) $P_i(\theta; \gamma_i) = P(X_i = 1 \mid \theta)$ ($i = 1, \ldots, I$). The two-parameter logistic (2PL) model [5] possesses the IRF

$$P_i(\theta; \gamma_i) = \Psi(a_i(\theta - b_i)) , \tag{2}$$

where $a_i$ and $b_i$ are the item discrimination and item difficulty, respectively, and $\Psi(x) = (1 + \exp(-x))^{-1}$ denotes the logistic distribution function. For independent and identically distributed realizations of the random variable $X$, the unknown model parameters in (1) can be estimated by (marginal) maximum likelihood estimation [6,7].

In many educational applications, IRT models are employed to compare the distribution of two groups in a test (i.e., on a set of items) regarding the factor variable $\theta$ in the IRT model (1). Linking methods [8–10] estimate the 2PL model separately in the two groups in the first step. In the second step, the linking methods process the estimated item parameters to calculate a difference regarding the mean $\mu$ and the standard deviation $\sigma$ between the two groups. The separate application of the 2PL model in each of the two groups enables the items to function differently across the groups. This property is termed differential item functioning (DIF; [11,12]) in the literature. It has been pointed out that the occurrence of DIF causes additional variability in the estimated mean $\hat{\mu}$ and standard deviation $\hat{\sigma}$ when applying a linking method [13–16]. Moreover, it has been shown that DIF can bias the group differences when applying a linking method [17].
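The 2PL item response function described above can be sketched in a few lines of Python. The function names and the example item (discrimination 1.2, difficulty 0.5) are hypothetical and chosen only for illustration.

```python
import math

def logistic(x: float) -> float:
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return logistic(a * (theta - b))

# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5
p_at_difficulty = irf_2pl(theta=0.5, a=1.2, b=0.5)  # ability equal to the difficulty
p_above = irf_2pl(theta=1.5, a=1.2, b=0.5)          # ability above the difficulty
```

When the ability equals the item difficulty, the response probability is exactly one half, which is why $b_i$ is interpreted as the location of the item on the ability scale.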
We now discuss group comparisons within the 2PL model in the presence of DIF. Assume that the 2PL model holds in the first group, with the item discriminations and item difficulties in the first group being defined as $a_{i1} = a_i$ and $b_{i1} = b_i$, respectively. For identification reasons, we assume $\theta \sim N(0, 1)$ in the first group. In the second group, the ability distribution $\theta \sim N(\mu, \sigma^2)$ is assumed, where $\mu$ and $\sigma$ are the mean and the standard deviation (SD), respectively. The item discriminations are assumed to be invariant across the two groups; that is, $a_{i2} = a_i$. A random uniform DIF [11] effect $e_i$ is assumed for the item difficulties [18,19]. Then, the item difficulties $b_{i2}$ in the second group are provided by

$$b_{i2} = b_i + e_i , \quad \text{where } E(e_i) = 0 \text{ and } \mathrm{Var}(e_i) = \tau^2 . \tag{3}$$

The variance $\tau^2$ is also called the DIF variance [20], and $\tau$ is labeled the DIF SD.

A linking method consists of two steps. In the first step, the 2PL model is separately fitted within the two groups while assuming a standard normal distribution $N(0, 1)$ for the ability variable $\theta$. Due to sampling errors (i.e., the sampling of persons), the estimated item parameters $\hat{a}_{i1}$ and $\hat{b}_{i1}$ will slightly differ from the data-generating parameters $a_{i1} = a_i$ and $b_{i1} = b_i$. In the second group, the original item parameters $a_{i2}$ and $b_{i2}$ are not recovered because the 2PL model is fitted with $\theta \sim N(0, 1)$, but the data-generating model imposes $\theta \sim N(\mu, \sigma^2)$. Writing $\theta = \mu + \sigma \theta^*$ with $\theta^* \sim N(0, 1)$, a simple calculation provides

$$a_{i2}(\theta - b_{i2}) = a_i(\sigma \theta^* + \mu - b_i - e_i) = a_i \sigma \left( \theta^* - \sigma^{-1}(b_i + e_i - \mu) \right) . \tag{4}$$

Therefore, the identified item parameters in the second group are provided by

$$a_{i2}^* = a_i \sigma \quad \text{and} \quad b_{i2}^* = \sigma^{-1}(b_i + e_i - \mu) . \tag{5}$$

In the first group, the identified item parameters are $a_{i1}^* = a_i$ and $b_{i1}^* = b_i$. Due to sampling errors, the estimated item parameters $\hat{a}_{i1}$ and $\hat{b}_{i1}$ (or $\hat{a}_{i2}$ and $\hat{b}_{i2}$) will slightly differ from $a_{i1}^*$ and $b_{i1}^*$ (or $a_{i2}^*$ and $b_{i2}^*$). In the presence of sampling errors or DIF, a linking function $H$ is chosen that estimates $\mu$ and $\sigma$ based on the estimated item parameters $\hat{a}_{ig}$ and $\hat{b}_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$.
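The reparameterization of the second group can be checked numerically. The following sketch assumes the relations $a^*_{i2} = a_i \sigma$ and $b^*_{i2} = (b_i + e_i - \mu)/\sigma$, which follow from substituting $\theta = \mu + \sigma\theta^*$ into the 2PL model; all numeric values and function names are hypothetical.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def identified_group2(a, b, e, mu, sigma):
    """Item parameters obtained when group 2 is scaled with theta* ~ N(0, 1)."""
    a_star = a * sigma
    b_star = (b + e - mu) / sigma
    return a_star, b_star

# Hypothetical values chosen only for illustration
a, b, e = 1.1, -0.4, 0.25
mu, sigma = 0.3, 1.2
a_star, b_star = identified_group2(a, b, e, mu, sigma)

theta_star = 0.7                  # ability on the standardized scale
theta = mu + sigma * theta_star   # the same person on the original scale

# Both parameterizations must give the same response probability
p_original = logistic(a * (theta - (b + e)))
p_identified = logistic(a_star * (theta_star - b_star))
```

The two probabilities coincide, so a group-wise 2PL fit with a standard normal ability distribution cannot distinguish between a change in $(\mu, \sigma)$ and a matching change in the item parameters; this is what the linking step has to resolve.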
The Stocking–Lord (SL; [21]) linking method determines the parameter of interest $\delta = (\mu, \sigma)$ as a minimizer of the weighted squared distance of the test characteristic functions

$$H(\mu, \sigma) = \sum_{t=1}^{T} \omega_t \left[ \sum_{i=1}^{I} \Psi\left( \hat{a}_{i1}(\theta_t - \hat{b}_{i1}) \right) - \sum_{i=1}^{I} \Psi\left( \hat{a}_{i2} \sigma^{-1} (\theta_t - \sigma \hat{b}_{i2} - \mu) \right) \right]^2 . \tag{6}$$

In (6), the IRFs are evaluated on a finite grid of $\theta$ values $\theta_1, \ldots, \theta_T$, and $\omega_t$ represents the known weight. Previous research highlighted that SL linking is superior to other linking methods, such as Haebara [22] linking [23–25]. However, this conclusion was reached in the absence of DIF. It has been shown in [17] that the presence of (uniform) DIF can lead to biased estimates of the mean $\mu$ and the SD $\sigma$, which quantify the differences between the two groups.
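A minimal sketch of SL linking in pure Python follows. It assumes the criterion is the weighted squared difference between the group-1 test characteristic function and the back-transformed group-2 test characteristic function, and it minimizes this criterion with a Gauss–Newton iteration with step halving. The optimizer, the $\theta$ grid from −4 to 4, the equal weights, and all function names are assumptions made for this illustration, not the implementation used in the article. The item parameters are the common item parameters reported for $I = 10$ items in Simulation Study 1.

```python
import math

def logistic(x):
    """Logistic distribution function Psi."""
    return 1.0 / (1.0 + math.exp(-x))

def sl_parts(mu, sigma, a1, b1, a2, b2, grid, weights):
    """SL criterion H plus the Gauss-Newton matrix A and gradient vector c."""
    h, A, c = 0.0, [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
    for theta, w in zip(grid, weights):
        diff = sum(logistic(a * (theta - b)) for a, b in zip(a1, b1))
        g = [0.0, 0.0]  # derivatives of the TCC difference w.r.t. (mu, sigma)
        for a, b in zip(a2, b2):
            p = logistic(a / sigma * (theta - sigma * b - mu))
            diff -= p
            slope = a * p * (1.0 - p)  # a * Psi'(z)
            g[0] += slope / sigma
            g[1] += slope * (theta - mu) / sigma ** 2
        h += w * diff ** 2
        for r in range(2):
            c[r] += w * g[r] * diff
            for s in range(2):
                A[r][s] += w * g[r] * g[s]
    return h, A, c

def sl_link(a1, b1, a2, b2, grid, weights, max_iter=200, tol=1e-10):
    """Minimize H(mu, sigma) by Gauss-Newton with step halving."""
    mu, sigma = 0.0, 1.0
    h, A, c = sl_parts(mu, sigma, a1, b1, a2, b2, grid, weights)
    for _ in range(max_iter):
        det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
        if det == 0.0:
            break
        # Solve the 2x2 system A * step = -c
        step = [-(A[1][1] * c[0] - A[0][1] * c[1]) / det,
                -(A[0][0] * c[1] - A[1][0] * c[0]) / det]
        scale, accepted = 1.0, False
        while scale > 1e-8:
            new_mu, new_sigma = mu + scale * step[0], sigma + scale * step[1]
            if new_sigma > 0.0:
                new_h, new_A, new_c = sl_parts(new_mu, new_sigma, a1, b1,
                                               a2, b2, grid, weights)
                if new_h <= h:
                    accepted = True
                    break
            scale /= 2.0
        if not accepted:
            break
        moved = abs(new_mu - mu) + abs(new_sigma - sigma)
        mu, sigma, h, A, c = new_mu, new_sigma, new_h, new_A, new_c
        if moved < tol:
            break
    return mu, sigma

# Common item parameters of the 10 base items from Simulation Study 1
a = [0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, 1.11]
b = [-1.74, -1.22, -0.22, 0.54, -0.04, -0.39, -0.73, 0.30, 0.83, -1.39]
mu0, sigma0 = 0.3, 1.2

# Identified group-2 parameters without DIF and without sampling error
a2 = [ai * sigma0 for ai in a]
b2 = [(bi - mu0) / sigma0 for bi in b]

grid = [-4.0 + 0.2 * t for t in range(41)]  # assumed theta grid
weights = [1.0] * len(grid)                 # assumed equal weights
mu_hat, sigma_hat = sl_link(a, b, a2, b2, grid, weights)
```

Without DIF and without sampling error, the criterion is zero at the true values, so the sketch should recover $\mu = 0.3$ and $\sigma = 1.2$ up to numerical tolerance. Adding DIF effects to `b2` makes the minimum nonzero and shifts the estimates, which is the bias studied in the remainder of the article.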
In this article, we aim to study bias correction methods for SL linking. Simulation extrapolation (SIMEX; [26]) is a method to correct for bias due to measurement error. It is particularly suited for nonlinear statistical models because it is entirely simulation-based and does not require analytical derivations that must be carried out for a particular model under study. This article interprets uniform DIF as a measurement error in applying the SIMEX method to the SL linking method, corresponding to a particular nonlinear model. Our proposed SIMEX method is compared with an alternative analytical bias correction that has been recently proposed for a wide class of linking methods [27].
The rest of the article is organized as follows. In Section 2, we review the analytical bias correction methods in SL linking. The newly proposed SIMEX-based bias correction method for SL linking is presented in Section 3. Section 4 presents a numerical illustration based on idealized datasets to demonstrate the bias in the original SL linking. The newly proposed SIMEX-based SL linking is compared with the original SL linking and other analytical bias correction SL methods through three simulation studies in Section 5. Finally, the article closes with a discussion in Section 6.

Analytical Bias Correction in Stocking-Lord Linking
In this section, analytical bias correction for a general linking method [27], including the SL linking method as an example, is reviewed. In a linking method, the vector $\delta = (\mu, \sigma)$ is the statistical parameter of interest. The parameter estimate for $\delta$ is denoted as $\hat{\delta} = (\hat{\mu}, \hat{\sigma})$. The linking function $H$ is defined as a function of $\delta$ and the estimated item parameters $\hat{\gamma} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_I)$ (where $\hat{\gamma}_i = (\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})$), and the parameter estimate is its minimizer:

$$\hat{\delta} = \arg\min_{\delta} H(\delta, \hat{\gamma}) , \tag{7}$$

where $H$ is a sufficiently smooth (i.e., at least three times differentiable) function. The SL linking function defined in (6) is an example of the general formulation (7). The parameter estimate $\hat{\delta}$ fulfills the estimating equation

$$H_\delta(\hat{\delta}; \hat{\gamma}) = \frac{\partial H}{\partial \delta}(\hat{\delta}; \hat{\gamma}) = 0 . \tag{8}$$

The linking method that is associated with the linking function $H$ recovers the true group mean $\mu_0$ and the true group SD $\sigma_0$ if there is no sampling error and no DIF (i.e., $\tau = 0$). This property is formalized as

$$H_\delta(\delta_0; \gamma_0) = 0 , \tag{9}$$

where $\delta_0 = (\mu_0, \sigma_0)$, $\gamma_0$ denotes the joint item parameters $a_i$ and $b_i$, and all uniform DIF effects $e_i$ are assumed to be zero.

The analytical bias correction due to the presence of DIF effects $e_i$ is based on a second-order Taylor expansion of $H_\delta$. The estimated item parameters $\hat{\gamma}_i$ for item $i$ are a function of the common item parameters $\gamma_i = (a_i, b_i)$ and the uniform DIF effect $e_i$. The estimating Equation (8) can be rewritten as

$$H_\delta(\hat{\delta}; \gamma, e) = 0 , \tag{10}$$

where $\gamma$ collects all item parameters and $e = (e_1, \ldots, e_I)$ denotes the vector of DIF effects.
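Note that $H_\delta(\delta_0; \gamma, 0) = 0$ for $e = 0$. By assuming independent DIF effects $e_i$ for $i = 1, \ldots, I$, we apply a first-order Taylor expansion with respect to $\delta$ and a second-order Taylor expansion with respect to the DIF effects $e_i$ to $H_\delta$ and obtain

$$H_\delta(\hat{\delta}; \gamma, e) \approx H_\delta(\delta_0; \gamma, 0) + H_{\delta\delta}(\delta_0; \gamma, 0)(\hat{\delta} - \delta_0) + \sum_{i=1}^{I} H_{\delta e_i}(\delta_0; \gamma, 0) \, e_i + \frac{1}{2} \sum_{i=1}^{I} H_{\delta e_i e_i}(\delta_0; \gamma, 0) \, e_i^2 = 0 . \tag{11}$$

Note that Taylor expansion (11) relies on the assumption of independent DIF effects $e_i$, under which the mixed second-order terms in $e_i e_j$ ($i \neq j$) have zero expectation and are omitted.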
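Because $H_\delta(\delta_0; \gamma, 0) = 0$, we obtain from (11)

$$\hat{\delta} - \delta_0 \approx - H_{\delta\delta}(\delta_0; \gamma, 0)^{-1} \left[ \sum_{i=1}^{I} H_{\delta e_i}(\delta_0; \gamma, 0) \, e_i + \frac{1}{2} \sum_{i=1}^{I} H_{\delta e_i e_i}(\delta_0; \gamma, 0) \, e_i^2 \right] . \tag{12}$$

Because DIF effects can be assumed as random variables with zero mean (i.e., $E(e_i) = 0$), the expected bias of the parameter estimate $\hat{\delta}$ can be computed as

$$\mathrm{Bias}(\hat{\delta}) = E(\hat{\delta}) - \delta_0 \approx - \frac{1}{2} H_{\delta\delta}(\delta_0; \gamma, 0)^{-1} \sum_{i=1}^{I} H_{\delta e_i e_i}(\delta_0; \gamma, 0) \, E(e_i^2) . \tag{13}$$

Equation (13) can be used to construct a bias correction term for $\hat{\delta}$. We compute estimated DIF effects $\hat{e}_i$ as a proxy for the true DIF effects $e_i$ as

$$\hat{e}_i = \hat{\sigma} \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} . \tag{14}$$

Then, we obtain a bias-corrected estimate $\hat{\delta}_{br}$ as an empirical version of $\delta_{br} = \hat{\delta} - \mathrm{Bias}(\hat{\delta})$ as

$$\hat{\delta}_{br} = \hat{\delta} + \frac{1}{2} H_{\delta\delta}(\hat{\delta}; \hat{\gamma})^{-1} \sum_{i=1}^{I} H_{\delta e_i e_i}(\hat{\delta}; \hat{\gamma}) \, \hat{e}_i^2 . \tag{15}$$

The vector $H_{\delta e_i e_i}$ contains second-order derivatives of $H_\delta$ with respect to $e_i$ and is provided by $H_{\delta e_i e_i} = (H_{\mu e_i e_i}, H_{\sigma e_i e_i})$.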
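An alternative bias correction estimate can be obtained by using

$$\hat{\delta}_{br} = \hat{\delta} + \frac{1}{2} \hat{\tau}^2 \, H_{\delta\delta}(\hat{\delta}; \hat{\gamma})^{-1} \sum_{i=1}^{I} H_{\delta e_i e_i}(\hat{\delta}; \hat{\gamma}) , \tag{16}$$

where $\hat{\tau}^2$ is an estimate of $\tau^2 = \mathrm{Var}(e_i) = E(e_i^2)$. We now present the required partial derivatives for the bias correction in SL linking. The estimating equations for SL linking defined in (6) for the mean $\mu$ and the SD $\sigma$ are obtained by setting the partial derivatives of $H$ to zero; that is, $H_\mu = \partial H / \partial \mu = 0$ and $H_\sigma = \partial H / \partial \sigma = 0$. The second-order derivatives $H_{\delta\delta}$ can be similarly obtained.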
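As a sketch of where these estimating equations come from, the first derivatives can be written out with the chain rule. The display assumes that the SL criterion is the weighted squared difference of the two test characteristic functions with the group-2 item parameters transformed back by $(\mu, \sigma)$; the shorthand $z_{it}$ and $D_t$ is introduced here only for illustration and is not notation from the original derivation.

$$
\begin{aligned}
z_{it} &= \hat{a}_{i2} \sigma^{-1} (\theta_t - \sigma \hat{b}_{i2} - \mu), \qquad
D_t = \sum_{i=1}^{I} \left[ \Psi\left( \hat{a}_{i1}(\theta_t - \hat{b}_{i1}) \right) - \Psi(z_{it}) \right], \qquad
H = \sum_{t=1}^{T} \omega_t D_t^2 , \\
H_\mu &= 2 \sigma^{-1} \sum_{t=1}^{T} \omega_t D_t \sum_{i=1}^{I} \hat{a}_{i2} \Psi'(z_{it}) , \\
H_\sigma &= 2 \sigma^{-2} \sum_{t=1}^{T} \omega_t (\theta_t - \mu) D_t \sum_{i=1}^{I} \hat{a}_{i2} \Psi'(z_{it}) ,
\end{aligned}
$$

with $\Psi'(x) = \Psi(x)(1 - \Psi(x))$. Both expressions use $\partial z_{it} / \partial \mu = -\hat{a}_{i2} / \sigma$ and $\partial z_{it} / \partial \sigma = -\hat{a}_{i2} (\theta_t - \mu) / \sigma^2$. Because the estimating equations only require $H_\mu = H_\sigma = 0$, constant factors such as the leading 2 do not affect the solution.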
The second-order derivatives of $H_\mu$ and $H_\sigma$ with respect to $e_i$ can be computed analogously; in these expressions, $\Psi'$ and $\Psi''$ denote the first and the second derivative of the logistic function $\Psi$, respectively. The resulting terms can be inserted into Formula (15) for the bias-corrected estimates.
The application of the bias correction estimate (15) to SL linking is denoted by SLA1 in the rest of this paper. If $\hat{\tau}^2$ in (16) is estimated using the empirical variance of the estimates $\hat{e}_i$, the corresponding bias correction estimate for SL linking is denoted by SLA2. The empirical variance of $\hat{e}_i$ also contains sampling errors due to the sampling of persons. By obtaining variance estimates $\mathrm{Var}(\hat{b}_{i1})$ and $\mathrm{Var}(\hat{b}_{i2})$ from the group-wise fitted 2PL models, the part of the variance $\mathrm{Var}(\hat{e}_i)$ due to sampling of persons can be computed. By computing the difference between the empirical variance and the average variance of the $\hat{e}_i$ estimates due to sampling error, we obtain a bias-corrected variance estimate $\hat{\tau}^2$. If this difference is negative, the estimate is set to zero. If this bias-corrected variance estimate $\hat{\tau}^2$ is used in the bias correction Formula (16) for SL linking, the corresponding estimate is denoted by SLA3.
Note that the second-order derivatives $H_{\delta e_i e_i}$ vanish in linear linking methods such as mean-geometric-mean linking [9]. Therefore, no bias would occur in these methods [27].
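The two DIF variance estimates used for SLA2 and SLA3 can be sketched as follows. Two choices in this sketch are assumptions rather than statements from the article: the empirical variance is taken as the mean of the squared $\hat{e}_i$ (using $E(e_i) = 0$), and the sampling variance of $\hat{e}_i = \hat{\sigma} \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu}$ is approximated by $\mathrm{Var}(\hat{b}_{i1}) + \hat{\sigma}^2 \mathrm{Var}(\hat{b}_{i2})$, treating $\hat{\mu}$ and $\hat{\sigma}$ as fixed. All numeric inputs are hypothetical.

```python
def dif_effects(b1, b2, mu_hat, sigma_hat):
    """Estimated uniform DIF effects e_i = sigma * b_i2 - b_i1 + mu."""
    return [sigma_hat * y - x + mu_hat for x, y in zip(b1, b2)]

def dif_variance(e_hat, var_b1=None, var_b2=None, sigma_hat=1.0):
    """DIF variance estimate; corrected for sampling error if variances are given."""
    n = len(e_hat)
    empirical = sum(e * e for e in e_hat) / n  # assumption: mean of squares
    if var_b1 is None or var_b2 is None:
        return empirical
    # Assumption: Var(e_hat_i) ~ Var(b_i1) + sigma^2 * Var(b_i2)
    sampling = sum(v1 + sigma_hat ** 2 * v2
                   for v1, v2 in zip(var_b1, var_b2)) / n
    return max(0.0, empirical - sampling)  # negative differences are set to zero

# Hypothetical item difficulties with known DIF effects
b1 = [-1.0, 0.0, 1.0]
true_e = [0.2, -0.1, -0.1]
mu_hat, sigma_hat = 0.3, 1.2
b2 = [(b + e - mu_hat) / sigma_hat for b, e in zip(b1, true_e)]

e_hat = dif_effects(b1, b2, mu_hat, sigma_hat)
tau2_empirical = dif_variance(e_hat)
tau2_corrected = dif_variance(e_hat, [0.002] * 3, [0.002] * 3, sigma_hat)
tau2_truncated = dif_variance(e_hat, [0.01] * 3, [0.01] * 3, sigma_hat)
```

In the last call, the assumed sampling variance exceeds the empirical variance, so the estimate is truncated at zero, mirroring the rule described above.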

SIMEX-Based Bias Correction in Stocking-Lord Linking
In this section, we describe how the simulation extrapolation (SIMEX) method [28–30] is adapted to conduct a bias correction for SL linking in the presence of uniform DIF. We now briefly describe the main idea of the SIMEX method. Let a dataset contain parts $Z$ and $W$, where the variables in the matrix $Z$ are measured without error and $W$ contains measurement error. Assume that rows in these matrices refer to independent observations such that $z_i$ and $w_i$ are the observed data of case $i$. In our application, we only consider a one-dimensional variable $W$ that is prone to measurement error. We assume the decomposition

$$w_i = w_i^* + e_i ,$$

where $w_i^*$ corresponds to an error-free measurement and $e_i$ denotes the measurement error. It is assumed that the measurement error variance $\mathrm{Var}(e_i) = \tau^2$ is known or can be consistently estimated from data. Furthermore, we assume that the measurement errors $e_i$ are normally distributed with zero mean.
A statistical procedure provides a parameter estimate $\hat{\delta} = f(Z, W)$ for a specified function $f$ to estimate an unknown parameter $\delta$. Frequently, the bias of the estimated parameter $\hat{\delta}$ depends on the measurement error variance $\tau^2$. The SIMEX method aims at removing (or at least reducing) the bias in $\hat{\delta}$ due to measurement error by means of a simulation-based method. The general idea is to induce additional measurement error in the data and to compute the parameter estimate $\hat{\delta}$ on these modified datasets that contain additional measurement error. More formally, the values $w_i$ of variable $W$ are modified as

$$\tilde{w}_i = w_i + \varepsilon_i ,$$

where $\varepsilon_i$ is a random draw from a normal distribution with zero mean and a variance of $\lambda \tau^2$. Hence, $\tilde{w}_i$ contains the measurement error $e_i + \varepsilon_i$, which has the variance $(1 + \lambda)\tau^2$. SIMEX evaluates the estimate $\hat{\delta}$ as a function of $\lambda$ at a grid of values. Ref. [31] recommended using the $\lambda$ grid of 0.5, 1.0, 1.5, and 2.0. To reduce Monte Carlo error, the estimation is applied for a repeated simulation of the dataset for a fixed $\lambda$ value. To this end, the parameter estimate is a function of $\lambda$, resulting in a parameter curve $\hat{\delta}(\lambda)$. This first part of the procedure refers to the simulation step in SIMEX.

The second part of SIMEX is the extrapolation step. A regression function $\hat{\delta}(\lambda)$ is estimated, and the predicted value at $\lambda = -1$ provides a parameter estimate for $\delta$ extrapolated to the case of no measurement error variance (i.e., $\tau^2 = 0$). The extrapolating function could be linear or quadratic:

$$\hat{\delta}(\lambda) = \beta_0 + \beta_1 \lambda \quad \text{or} \quad \hat{\delta}(\lambda) = \beta_0 + \beta_1 \lambda + \beta_2 \lambda^2 . \tag{23}$$

In the case of linear extrapolation, the final parameter estimate for $\delta$ is provided as $\hat{\beta}_0 - \hat{\beta}_1$. In the case of quadratic extrapolation, the final parameter estimate is provided by $\hat{\beta}_0 - \hat{\beta}_1 + \hat{\beta}_2$. The SIMEX method is a particularly flexible measurement error correction method because it can be applied to any class of statistical models [26,32].

We now interpret the application of Stocking–Lord (SL) linking in the presence of uniform DIF as a case of measurement error. Note that DIF, as a measurement error phenomenon, also occurs at the population level and is not restricted to the finite sample size of persons. The input data of SL linking are the item parameters $\{\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}\}_{i=1,\ldots,I}$. In what follows, we assume that item parameters do not contain sampling errors of persons and that there is only uniform DIF. Note that we have (see (5))

$$\hat{b}_{i2} = \sigma^{-1}(\hat{b}_{i1} + e_i - \mu) , \tag{24}$$

where $\delta = (\mu, \sigma)$ is the parameter vector of interest. The uniform DIF effect $e_i$ is considered as measurement error and has a variance $\tau^2$. An application of the original SL linking method provides initial estimates $\hat{\mu}$ and $\hat{\sigma}$ from which the DIF effects $e_i$ can be estimated as

$$\hat{e}_i = \hat{\sigma} \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} .$$

We discussed in Section 2 how $\tau^2$ can be estimated from the data. In the application of SIMEX in this paper, we employ the bias-corrected variance estimate $\hat{\tau}^2$ (see the end of Section 2). The general idea of applying the SIMEX method is to induce additional DIF in the data and to study the expected parameter estimate in SL linking for an increased DIF variance. That is, modified DIF effects $\tilde{e}_i$ are computed as

$$\tilde{e}_i = \hat{e}_i + \varepsilon_i ,$$

where $\varepsilon_i$ is a random draw from a normal distribution with zero mean and variance $\lambda \hat{\tau}^2$. Overall, the variable $\tilde{e}_i$ has a variance of $(1 + \lambda)\tau^2$. Finally, we recompute the item difficulties in the second group as (see (24))

$$\tilde{b}_{i2} = \hat{\sigma}^{-1}(\hat{b}_{i1} + \tilde{e}_i - \hat{\mu})$$

and subsequently apply the SL linking method to the set of modified item parameters $\{\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \tilde{b}_{i2}\}_{i=1,\ldots,I}$. As a consequence, modified parameter estimates $\tilde{\mu}$ and $\tilde{\sigma}$ are obtained for item parameters that contain more uniform DIF (i.e., more measurement error) than the original item parameters. For the parameter of interest $\delta = \mu$ or $\delta = \sigma$, the SIMEX method provides a parameter curve $\hat{\delta}(\lambda)$ from which a linear or quadratic regression function can be calculated (see (23)). The extrapolation of the regression function at $\lambda = -1$ is used as a final parameter estimate $\hat{\delta}$ that aims to reduce potential bias due to DIF.

The originally proposed SIMEX method simulates new datasets. To reduce Monte Carlo error, the simulation is repeatedly conducted for a fixed $\lambda$ value, and the parameter estimates are averaged afterward. In the case of SL linking, the critical issue is that there are only a few cases (i.e., items), leading to relatively large Monte Carlo errors. Furthermore, reducing the Monte Carlo error requires a (very) large number of simulations. However, the parameter estimates in the SL linking method should only be corrected for asymptotic bias. This kind of bias in SL linking would occur in a test of infinite length (i.e., $I \to \infty$) for a fixed DIF variance $\tau^2$. To this end, we propose a quasi-Monte Carlo variant of SIMEX in which SL linking is applied to a test that contains $MI$ pseudo-items for an integer $M$. The idea is to replicate item parameters and induce normally distributed uniform DIF systematically across all items. Let $\varepsilon_1, \ldots, \varepsilon_M$ be quantiles of the standard normal distribution such that the mean is approximately zero and the variance is approximately one. In this paper, we chose $M = 81$ and computed the inverse of the standard normal distribution function $\Phi^{-1}$ (i.e., quantiles) on an equidistant probability grid; that is, the $\varepsilon_m$ values are obtained in R using qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M)). For a fixed $\lambda$, the set of values $\sqrt{\lambda} \, \hat{\tau} \, \varepsilon_m$ has approximately zero mean and variance $\lambda \hat{\tau}^2$. The item difficulty for the pseudo-item $(i, m)$ for $i = 1, \ldots, I$ and $m = 1, \ldots, M$ is defined as

$$\tilde{b}_{(i,m)2} = \hat{\sigma}^{-1}\left( \hat{b}_{i1} + \hat{e}_i + \sqrt{\lambda} \, \hat{\tau} \, \varepsilon_m - \hat{\mu} \right) .$$

All other item parameters $\hat{a}_{i1}$, $\hat{b}_{i1}$, and $\hat{a}_{i2}$ are left unmodified across the values of $m = 1, \ldots, M$. By this method, a set of item parameters of pseudo-items is constructed that has an additional DIF variance of $\lambda \tau^2$. The SL linking method is applied to the set of pseudo-items for different $\lambda$ values, and a SIMEX parameter curve $\hat{\delta}(\lambda)$ is obtained. Note that our modified SIMEX method does not contain simulation error, but at the price of only removing (or reducing) asymptotic bias. Finally, the extrapolation of a regression function fitted to the values $\hat{\delta}(\lambda)$ at $\lambda = -1$ provides a final parameter estimate $\hat{\delta}$.
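The construction of pseudo-items can be sketched in Python as follows. `NormalDist().inv_cdf` from the standard library plays the role of R's `qnorm`. The sketch uses the fact that $\hat{\sigma} \hat{b}_{i2} = \hat{b}_{i1} + \hat{e}_i - \hat{\mu}$ by definition of $\hat{e}_i$, so the pseudo-item difficulty is assumed to equal $\hat{b}_{i2} + \hat{\sigma}^{-1} \sqrt{\lambda} \, \hat{\tau} \, \varepsilon_m$. Function names and the example inputs are hypothetical.

```python
from statistics import NormalDist

def normal_quantile_grid(M):
    """Python analogue of qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M))."""
    lo = 1.0 / (2 * M + 2)
    step = (1.0 - 2.0 * lo) / (M - 1)
    nd = NormalDist()
    return [nd.inv_cdf(lo + m * step) for m in range(M)]

def pseudo_item_difficulties(b2, sigma_hat, tau_hat, lam, eps):
    """Replicate each group-2 difficulty len(eps) times with added uniform DIF.

    The added DIF effect sqrt(lam) * tau_hat * eps_m lives on the scale of the
    first group, so it is divided by sigma_hat before it is added to b_i2.
    """
    shift = (lam ** 0.5) * tau_hat / sigma_hat
    return [[b + shift * e for e in eps] for b in b2]

M = 81
eps = normal_quantile_grid(M)
eps_mean = sum(eps) / M
eps_var = sum(x * x for x in eps) / M

# Hypothetical group-2 difficulties and estimates, for illustration only
pseudo = pseudo_item_difficulties(b2=[0.0, 1.0], sigma_hat=1.2,
                                  tau_hat=0.5, lam=1.0, eps=eps)
```

Each original item yields a row of $M$ pseudo-item difficulties that are symmetric around the original difficulty. Because the grid is deterministic, repeated runs give identical SIMEX curves, which is the sense in which the quasi-Monte Carlo variant is free of simulation error.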
In this article, we used the quadratic function for SIMEX extrapolation in SL linking (denoted as SLSQ).We also considered the linear extrapolation function in SIMEX for SL linking, abbreviated as SLSL.
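The extrapolation step behind SLSQ and SLSL can be sketched with a small least-squares polynomial fit. The parameter curve below is hypothetical, and including the original estimate at $\lambda = 0$ in the regression is an assumption of this sketch rather than a documented choice of the article.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients (beta_0, ..., beta_degree)."""
    k = degree + 1
    # Normal equations (X'X) beta = X'y, solved by Gaussian elimination
    A = [[sum(xi ** (r + s) for xi in x) for s in range(k)] for r in range(k)]
    rhs = [sum((xi ** r) * yi for xi, yi in zip(x, y)) for r in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for s in range(col, k):
                A[r][s] -= f * A[col][s]
            rhs[r] -= f * rhs[col]
    beta = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(A[r][s] * beta[s] for s in range(r + 1, k))
        beta[r] = (rhs[r] - tail) / A[r][r]
    return beta

def simex_extrapolate(lambdas, estimates, degree=2):
    """Fit a polynomial in lambda and evaluate it at lambda = -1."""
    beta = polyfit(lambdas, estimates, degree)
    return sum(b * (-1.0) ** j for j, b in enumerate(beta))

# Hypothetical SIMEX curve for an SD estimate that decreases with more DIF
lambdas = [0.0, 0.5, 1.0, 1.5, 2.0]
sd_curve = [1.140, 1.116, 1.094, 1.074, 1.056]

sd_quadratic = simex_extrapolate(lambdas, sd_curve, degree=2)  # SLSQ-type
sd_linear = simex_extrapolate(lambdas, sd_curve, degree=1)     # SLSL-type
```

For a curve that flattens with increasing $\lambda$, as in this hypothetical example, the quadratic extrapolation lies further from the uncorrected estimate than the linear one, which illustrates why the two variants can differ noticeably at large DIF variances.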
It should be emphasized that the described quasi-Monte Carlo SIMEX procedure can be applied to any linking method that uses item parameters as input and is potentially biased in the presence of DIF.

Numerical Illustration
In this section, we demonstrate the bias in SL linking for an idealized dataset in the presence of uniform DIF. Because the bias occurs at the population level, there is no need to simulate item responses. Instead, we show the bias with computations that are solely based on item parameters.
We consider the situation of linking two groups using the 2PL model. The item discriminations $a_i$ ($= a_{i1} = a_{i2}$) are chosen equal to 1 in both groups. Eleven base items are defined with equidistant item difficulties $-2, -1.6, \ldots, 1.6$, and $2$. These item parameters are duplicated eleven times. As described in the definition of pseudo-items in our SIMEX modification to SL linking, we define a grid of values $\varepsilon_m$ that is computed by qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M)), where $M = 11$ (i.e., M <- 11). These values are approximately standard normally distributed; that is, they have zero mean and a standard deviation of approximately one. Overall, $11 \times 11$ items $(i, m)$ are used in this illustration. In the first group, the item parameters are duplicated $M = 11$ times. In the second group, for item $(i, m)$, the item discrimination $a_{(i,m)2}$ equals $a_i$. The item difficulty $b_{(i,m)2}$ is defined as

$$b_{(i,m)2} = b_i + \tau \varepsilon_m .$$

By this construction, the DIF variance in the test is (approximately) $\tau^2$. Note that we constructed DIF deterministically in a systematic way such that items of all difficulties are crossed with all levels of uniform DIF. We computed sets of item parameters for values of $\tau^2$ between 0 and 1. True group differences were simulated by setting $\mu = 0.3$ and $\sigma = 1.2$ in this illustration. The original SL linking method was applied to these datasets of item parameters. This method was compared with the two SIMEX-based SL methods SLSQ (quadratic extrapolation) and SLSL (linear extrapolation), as well as the analytical bias correction SLA1. We studied the parameter estimates $\hat{\mu}$ and $\hat{\sigma}$ as a function of the DIF variance $\tau^2$. SIMEX was applied with the four $\lambda$ values 0.5, 1.0, 1.5, and 2.0.
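The idealized item parameters of this illustration can be generated as in the following sketch for $\tau^2 = 0.55$. Passing the group-2 parameters through the identification step $a^* = a\sigma$ and $b^* = (b - \mu)/\sigma$ before linking is an assumption about how the true group difference enters the computation; variable names are hypothetical.

```python
from statistics import NormalDist, fmean

M = 11
lo = 1.0 / (2 * M + 2)
step = (1.0 - 2.0 * lo) / (M - 1)
eps = [NormalDist().inv_cdf(lo + m * step) for m in range(M)]  # qnorm grid

base_b = [-2.0 + 0.4 * i for i in range(11)]  # -2.0, -1.6, ..., 2.0
tau2 = 0.55
tau = tau2 ** 0.5
mu, sigma = 0.3, 1.2

# Every base item is crossed with every DIF level (11 x 11 = 121 items)
items_group1 = [(1.0, b) for b in base_b for _ in eps]
items_group2 = [(1.0, b + tau * e) for b in base_b for e in eps]

# Assumption: group-2 parameters as identified under theta ~ N(0, 1)
identified_group2 = [(a * sigma, (b - mu) / sigma) for a, b in items_group2]

dif = [g2[1] - g1[1] for g1, g2 in zip(items_group1, items_group2)]
dif_mean = fmean(dif)
dif_variance = fmean([d * d for d in dif])
```

With only 11 quantile points, the realized DIF variance is close to, and slightly below, the nominal value $\tau^2$, because the variance of the coarse quantile grid is somewhat smaller than one.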
Figure 1 displays the SIMEX parameter curves $\hat{\mu}(\lambda)$ and $\hat{\sigma}(\lambda)$ for the set of item parameters with the DIF variance $\tau^2 = 0.55$ (which corresponds to a DIF SD $\tau = 0.74$). It can be seen in Figure 1 that the linear SIMEX regression curve slightly differed from the quadratic SIMEX regression curve. The estimated mean $\hat{\mu}$ for the original SL linking method was 0.285, which led to a bias of $0.285 - 0.3 = -0.015$. In contrast, SIMEX-based linking based on quadratic extrapolation (SLSQ) resulted in an almost unbiased estimate of 0.299. The other methods, however, were close, at 0.297 (SLA1) and 0.296 (SLSL).
The differences between the linking methods were more pronounced for the estimated SD $\hat{\sigma}$. SL linking resulted in an estimate of 1.140, which led to a bias of $1.140 - 1.2 = -0.060$. The SIMEX-based linking method SLSQ resulted in a $\hat{\sigma}$ estimate of 1.196, which again was almost unbiased. In contrast, the methods SLA1 (with 1.188) and SLSL (with 1.183) resulted in slight biases. However, all three bias correction SL methods clearly outperformed the original SL linking method in terms of bias.
Figure 2 presents the parameter estimates $\hat{\mu}$ and $\hat{\sigma}$ as a function of the DIF variance $\tau^2$. It is evident that SL linking provided (strongly) biased estimates for $\mu$ and $\sigma$. The bias is an approximately linear function of $\tau^2$. Notably, the bias correction SL methods SLSQ, SLA1, and SLSL were superior to SL because they resulted in parameter estimates close to the true values $\mu = 0.3$ and $\sigma = 1.2$. For large DIF variances, the SIMEX-based linking method SLSQ should be preferred over the alternative bias-corrected SL linking methods in terms of bias. It should be noted that the bias of SL linking for the estimated SD is much larger than for the estimated mean.

Simulation Study 1

Method
In this simulation study, the 2PL model was used to simulate item responses in two groups. The mean and the standard deviation of the normally distributed ability variable $\theta$ in the first group were set to 0 and 1, respectively. The mean $\mu$ and the SD $\sigma$ for the normally distributed ability variable $\theta$ in the second group were set to 0.3 and 1.2, respectively.
The number of items $I$ in the simulation was varied as 10, 20, and 40. The group-specific item parameters $a_{ig}$ and $b_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$ relied on common item parameters that were fixed in the simulation and a random uniform and normally distributed DIF effect that was simulated in each replication of the simulation study. The common item discriminations $a_i$ in the case of $I = 10$ items were chosen as 0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, and 1.11, resulting in a mean M = 1.01 and an SD = 0.16. The common item difficulties $b_i$ were chosen as −1.74, −1.22, −0.22, 0.54, −0.04, −0.39, −0.73, 0.30, 0.83, and −1.39, resulting in M = −0.41 and SD = 0.86. For item numbers as multiples of 10, we duplicated the item parameters of the 10 items accordingly. The item parameters in the second group included a uniform DIF effect $e_i$ that was added to the common item difficulty $b_i$. The item difficulty $b_{i2}$ in the second group was simulated as $b_{i2} = b_i + e_i$ for $i = 1, \ldots, I$, where the DIF effects $e_i$ were independently and identically normally distributed with a mean of zero and a DIF SD $\tau$. The DIF SD $\tau$ was chosen as 0, 0.25, and 0.5 (corresponding to DIF variances $\tau^2$ of 0, 0.0625, and 0.25), indicating no DIF, moderate DIF, and large DIF.
Item responses were simulated according to the 2PL model for sample sizes N per group of 500, 1000, 2000, and 4000.
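One replication of the data-generating step can be sketched as follows for $I = 10$, $N = 500$, and $\tau = 0.5$. The seed, the function name, and the use of Python's `random` module are choices made for this illustration; the simulation in the article was carried out in R.

```python
import math
import random

def simulate_group(n, a, b, mu, sigma, rng):
    """Simulate n dichotomous response vectors from the 2PL model."""
    data = []
    for _ in range(n):
        theta = rng.gauss(mu, sigma)  # ability of one person
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-ai * (theta - bi))) else 0
               for ai, bi in zip(a, b)]
        data.append(row)
    return data

rng = random.Random(2024)  # arbitrary seed for reproducibility

# Common item parameters of the 10 base items
a = [0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, 1.11]
b = [-1.74, -1.22, -0.22, 0.54, -0.04, -0.39, -0.73, 0.30, 0.83, -1.39]

tau = 0.5                                   # large DIF condition
e = [rng.gauss(0.0, tau) for _ in b]        # random uniform DIF effects
b2 = [bi + ei for bi, ei in zip(b, e)]      # group-2 difficulties

group1 = simulate_group(500, a, b, 0.0, 1.0, rng)
group2 = simulate_group(500, a, b2, 0.3, 1.2, rng)
prop_item1_group1 = sum(row[0] for row in group1) / len(group1)
```

In a full replication, the 2PL model would then be fitted separately to `group1` and `group2`, and the estimated item parameters would be passed to the linking methods.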
Six different linking methods were studied in the simulation study to estimate the mean $\mu$ and the SD $\sigma$ in the second group: the original Stocking–Lord linking (SL) method, the SIMEX-based SL linking method based on quadratic (SLSQ) or linear (SLSL) extrapolation, and the three analytical bias correction SL methods SLA1, SLA2, and SLA3. The SIMEX-based linking methods relied on the bias-corrected DIF variance estimate $\hat{\tau}^2$.
In each of the 4 (sample size $N$) × 3 (DIF standard deviation $\tau$) × 3 (number of items $I$) = 36 cells of the simulation, 2500 replications were conducted. We computed the empirical bias and the root mean square error (RMSE) for the estimated mean $\hat{\mu}$ and the estimated standard deviation $\hat{\sigma}$. A relative percentage RMSE was computed as 100 times the ratio of the RMSE of a particular linking method to the RMSE of the SIMEX-based SLSQ linking method.
The R software (Version 4.2.3) [33] was used for the entire analysis in this simulation study. The 2PL model was fitted using the sirt::xxirt() function in the R package sirt [34]. The author of this article wrote dedicated R functions for the original and bias-corrected SL linking methods. These functions and replication material for Simulation Study 1 can be found at https://osf.io/kjusm (accessed on 12 May 2024).
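The performance measures can be computed as in the following sketch. The three estimates per method are hypothetical stand-ins for the replications of one simulation cell, and the function names are chosen for illustration.

```python
import math

def bias(estimates, true_value):
    """Empirical bias: average estimate minus the true value."""
    return sum(estimates) / len(estimates) - true_value

def rmse(estimates, true_value):
    """Root mean square error around the true value."""
    return math.sqrt(sum((x - true_value) ** 2 for x in estimates) / len(estimates))

def relative_rmse(estimates, reference_estimates, true_value):
    """RMSE of a method as a percentage of the RMSE of the reference method."""
    return 100.0 * rmse(estimates, true_value) / rmse(reference_estimates, true_value)

# Hypothetical SD estimates across three replications (true SD = 1.2)
sd_true = 1.2
sd_sl = [1.10, 1.14, 1.18]     # hypothetical uncorrected SL estimates
sd_slsq = [1.18, 1.20, 1.22]   # hypothetical SLSQ estimates (reference)

bias_sl = bias(sd_sl, sd_true)
rel_rmse_sl = relative_rmse(sd_sl, sd_slsq, sd_true)
rel_rmse_ref = relative_rmse(sd_slsq, sd_slsq, sd_true)
```

By construction, the reference method has a relative RMSE of 100, so values above 100 indicate a less precise method than SLSQ.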

Results
Table 1 displays the bias and the relative RMSE of the estimated group mean $\hat{\mu}$ as a function of the uniform DIF SD $\tau$, the number of items $I$, and sample size $N$. It turned out that there was only a noticeable bias in $\hat{\mu}$ for SL linking in the large DIF condition of $\tau = 0.5$. Interestingly, although SL was biased in these conditions, the relative RMSE of SL had the largest difference to bias-corrected SL linking methods.

Simulation Study 2

Method
For the first group, the sample size $N_1$ was chosen as 250, 500, 1000, and 2000. The sample size $N_2$ for the second group was chosen from the same values, but only sample size combinations $(N_1, N_2)$ fulfilling $N_1 \leq N_2$ were simulated, resulting in 10 conditions. Moreover, the uniform DIF SD $\tau$ was chosen as 0, 0.25, and 0.5.
In total, 2000 replications were conducted in each of the 10 (sample size combinations $(N_1, N_2)$) × 3 (DIF standard deviation $\tau$) = 30 cells of the simulation. The same analysis strategy as in Simulation Study 1 was chosen.
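The design cells of this study can be enumerated with a short sketch; the variable names are hypothetical.

```python
from itertools import combinations_with_replacement, product

sizes = [250, 500, 1000, 2000]

# Because `sizes` is sorted, every pair (N1, N2) satisfies N1 <= N2
size_pairs = list(combinations_with_replacement(sizes, 2))

dif_sds = [0.0, 0.25, 0.5]
cells = list(product(size_pairs, dif_sds))
```

The enumeration yields 10 sample size combinations and 30 design cells, matching the counts stated above.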

Results
Table 3 presents the bias and relative RMSE of the estimated group mean $\hat{\mu}$. Overall, the results in the case of unbalanced group sizes followed those of balanced group sizes. The performance of the different estimators was mainly driven by the group with a smaller sample size. It turned out that SIMEX-based bias corrections (SLSQ and SLSL) had particular advantages over the analytical bias corrections (SLA1, SLA2, and SLA3) for the conditions with a small sample size $N_1 = 250$. Table 4 presents bias and relative RMSE for the estimated SD $\hat{\sigma}$. Again, the findings in the case of unbalanced sample sizes were similar to those in the case of balanced sample sizes. Importantly, the analytical bias correction did not perform well in the conditions with the smallest sample size $N_1 = 250$. Moreover, SIMEX-based bias correction methods also had issues for small sample sizes.

Note. SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\tau}^2$; ‡ the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.

Simulation Study 3

Method
In Simulation Study 3, we investigated the impact of misspecified IRT models. Item responses in this simulation study were simulated according to the logistic positive exponent (LPE) model [35–38]. The IRF of the LPE model is provided by

$$P_i(\theta; \gamma_i) = \Psi(a_i(\theta - b_i))^{\xi_i} , \tag{30}$$

where $\xi_i$ is the positive exponent of the logistic IRF. Note that (30) corresponds to the 2PL model with $\xi_i = 1$. The linking was based on the misspecified 2PL model. To reduce computation time, we only simulated one condition for the number of items with $I = 20$ items.
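The item response function of the LPE model can be sketched as the logistic 2PL probability raised to the power $\xi_i$. The function names and the example item are hypothetical.

```python
import math

def irf_2pl(theta, a, b):
    """2PL item response function."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def irf_lpe(theta, a, b, xi):
    """LPE item response function: the 2PL probability raised to the power xi."""
    return irf_2pl(theta, a, b) ** xi

# Hypothetical item evaluated at theta = 0.5
theta, a, b = 0.5, 1.0, 0.0
p_2pl = irf_2pl(theta, a, b)
p_xi_1 = irf_lpe(theta, a, b, xi=1.0)   # reduces to the 2PL model
p_xi_2 = irf_lpe(theta, a, b, xi=2.0)   # item becomes harder
p_xi_half = irf_lpe(theta, a, b, xi=0.5)  # item becomes easier
```

Exponents above one lower the response probability at every ability level, and exponents below one raise it, so the LPE model yields asymmetric IRFs that the 2PL model can only approximate.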
Item discriminations $a_i$ and item difficulties $b_i$ were the same as in Simulation Study 1 and Simulation Study 2. We now describe the choice of the positive exponent parameter $\xi_i$. The item parameters for items 1 to 10 were the same as for items 11 to 20. Two types of $\xi_i$ parameters in the LPE IRT model were simulated: unbalanced and balanced $\xi_i$ parameters around the value $\xi_i = 1$, which corresponds to the 2PL model. In the unbalanced case, all $\xi_i$ parameters were equal to a common parameter $\xi_0$ that was chosen as either 0.5 or 2. In the balanced case, the $\xi_i$ parameters of the first five items (items 1 to 5) were $1/\xi_0$, while they were $\xi_0$ for the second five items (items 6 to 10); here, the common parameter $\xi_0$ was chosen as 0.5, 1, or 2. Note that the condition $\xi_0 = 1$ corresponds to the 2PL model. Therefore, five conditions of item parameters in the LPE model were simulated.
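The five exponent conditions can be written down explicitly as in the following sketch. It assumes that the exponents of items 11 to 20 repeat those of items 1 to 10, in line with the duplication of the other item parameters; the function and dictionary names are hypothetical.

```python
def xi_vector(xi0, balanced, n_items=20):
    """Exponent parameters for one condition, repeated in blocks of 10 items."""
    if balanced:
        block = [1.0 / xi0] * 5 + [xi0] * 5  # items 1-5 and items 6-10
    else:
        block = [xi0] * 10                   # all items share xi0
    return block * (n_items // 10)

conditions = {
    ("unbalanced", 0.5): xi_vector(0.5, balanced=False),
    ("unbalanced", 2.0): xi_vector(2.0, balanced=False),
    ("balanced", 0.5): xi_vector(0.5, balanced=True),
    ("balanced", 1.0): xi_vector(1.0, balanced=True),
    ("balanced", 2.0): xi_vector(2.0, balanced=True),
}
```

In the balanced conditions, exponents above and below one offset each other across items, whereas in the unbalanced conditions all items deviate from the 2PL model in the same direction.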
Balanced group sizes with sample sizes $N$ = 500, 1000, 2000, and 4000 were simulated. Moreover, uniform DIF for the item difficulties $b_i$ was simulated with a DIF SD $\tau$ of 0, 0.25, and 0.5. For reasons of space, we do not report the condition $\tau = 0.25$ in the following subsection.
In each of the 4 (sample size $N$) × 3 (DIF standard deviation $\tau$) × 5 (choice of $\xi_i$ parameter) = 60 cells of the simulation, 3000 replications were conducted. The same analysis strategy as in Simulation Study 1 was chosen.

Results
Table 5 shows the bias and relative RMSE for the estimated group mean $\hat{\mu}$ for item responses simulated from the LPE model. In the case of balanced $\xi_i$ parameters, the group mean was essentially unbiased. However, there was bias in the estimated group mean in two conditions with unbalanced $\xi_i$ parameters. This was unsurprising because the 2PL analysis model was misspecified. If the 2PL model were intentionally chosen as the preferred scaling model, the pseudo-true population parameter would be the group mean in an infinite sample size with a DIF SD $\tau = 0$. Hence, the adequacy of the bias correction methods for the 2PL model in the DIF condition $\tau = 0.5$ should be judged on whether the estimated group mean would be similar to the no-DIF condition $\tau = 0$ with a large sample size $N = 4000$. It can be seen in Table 5 that the SIMEX-based bias correction method SLSQ was quite successful in this respect.
Table 6 presents the bias and relative RMSE for the estimated group SD $\hat{\sigma}$. As for the estimated group mean $\hat{\mu}$, the average estimates from the 2PL model differed from the LPE model when item responses were simulated with unbalanced $\xi_i$ parameters. Again, the average estimates based on the SIMEX-based and analytical bias correction methods had values similar to those in the no-DIF condition with a large sample size.
To conclude, performing (SIMEX-based) bias correction methods in SL linking is still beneficial in conditions where the 2PL model was misspecified.In these cases, the distribution parameters from the 2PL models estimate a pseudo-true population parameter that is defined for an infinite sample size and no DIF.

Discussion
In this article, different bias correction estimators for SL linking were compared.The previous literature highlighted that SL linking results in a substantial negative bias for the estimated group standard deviation and moderately biased group mean.This article proposed a SIMEX-based bias correction for SL linking that removed most of the bias in (large) DIF conditions and did not lead to practical efficiency losses in no-DIF conditions.Overall, the SIMEX-based SL methods had slight advantages over the analytical bias corrections for SL linking.However, the main advantage of the SIMEX method is that it is entirely computational and does not require analytical work.Hence, SIMEX-based bias correction can be applied to any linking method that could be affected by DIF.
It has been repeatedly pointed out that the presence of DIF effects requires identification constraints for the group mean and group standard deviation if the item parameters are not assumed to be invariant [39][40][41][42]. For example, one could assume that the mean of uniform DIF effects equals zero. This case is referred to as DIF cancellation [43] or balanced DIF [44]. In this article, we assumed that the DIF effects have zero means in the population (i.e., E(eᵢ) = 0); that is, they are centered across hypothetical repetitions of the experiment. The random DIF assumption is in stark contrast to the ordinarily employed fixed DIF assumption, in which the DIF effects are treated as fixed parameters. We think that researchers intentionally define a pseudo-true parameter in the latter situation by choosing a particular linking method [45]. Hence, any choice of a linking method and a structural assumption on DIF effects can be defended by a researcher in an empirical study. There is a tendency in the psychometric literature to believe in a partial invariance assumption of DIF effects (e.g., [46][47][48]). We tend to believe that the bias correction methods for Stocking-Lord linking proposed in this article are adequate for the random DIF situation but would likely be less effective for the fixed DIF case.
An anonymous reviewer wondered why we only specified Stocking-Lord linking in a "naïve" implementation in which the presence of DIF effects is essentially ignored. This reviewer suggested removing the identified DIF items from the linking, an approach discussed as iterated linking or scale purification in the literature [49][50][51][52][53]. We do not think that it is generally advisable to mindlessly eliminate items that are potentially prone to DIF from group comparisons in linking procedures because, in our view, researchers should only remove items from a scale (or an analysis) if DIF is shown to be construct-irrelevant [54,55]. Unfortunately, such iterative approaches are also implemented in major large-scale educational assessment studies like the Programme for International Student Assessment (PISA; [56,57]) that serve as methodological blueprints for empirical research. In effect, iterative linking procedures frequently lead to findings similar to those of the regularized estimation approach [58][59][60] to DIF effects (see [61]).
Simulation Study 3 only considered one type of misspecified IRT model. It might be interesting to investigate whether the findings of this simulation study generalize to other complex IRT models, such as the filtered monotonic IRT model [62][63][64][65] or the four-parameter logistic IRT model [66,67].
A reviewer wondered whether the proposed bias correction methods could also be used to remove the potential bias due to sampling variation in the item parameters. However, the findings of Simulation Study 2 in the condition of a small sample size N₁ = 250 and no DIF demonstrated that such bias correction methods would even hurt the accuracy of the estimates. For larger sample sizes, the sampling error in the item parameters could be less relevant than the extent of the variation in the DIF effects that primarily biases Stocking-Lord linking.
An anonymous reviewer wondered whether the assumption of independent DIF effects could be weakened. In fact, the Taylor expansion could be extended to include differential testlet effects [68]. Therefore, the bias correction methods can also accommodate testlet effects (see [14] for a similar approach). Moreover, the SIMEX method can also include variance portions that refer to additional testlet effects. The presence of differential testlet effects could be investigated in future research.
An anonymous reviewer commented that it might be unclear how the modified linking methods would perform in the presence of nonuniform DIF. First, we think, according to our research experience, that uniform DIF is more prevalent than nonuniform DIF [69]. Second, both bias correction approaches can be extended to handle nonuniform DIF effects. The Taylor expansion in Section 2 can be extended to include nonuniform DIF effects, which subsequently also provides a bias-corrected estimator. Moreover, SIMEX can also be applied to multivariate predictor variables prone to measurement error [26]. Only the variance matrix of the DIF effects instead of a scalar DIF standard deviation must be known (or estimated) to apply the SIMEX method to Stocking-Lord linking. Finally, we would also like to note that even recent articles in highly ranked journals using regularization approaches to handle DIF effects only treated the case of uniform DIF effects [70].
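The only change that a variance matrix requires in the simulation step of SIMEX can be sketched as follows. This is an illustrative sketch and not part of the reported studies: the function name `draw_pseudo_dif` is hypothetical, and it assumes that the variance matrix of the DIF effects is positive definite and that the pseudo-DIF effects are drawn from a multivariate normal distribution with variance matrix λ times that matrix.

```python
import numpy as np


def draw_pseudo_dif(n_items, cov, lam, rng):
    """Draw pseudo-DIF effects with variance matrix lam * cov.

    Each row holds the pseudo-DIF effects for one item; the columns follow
    the ordering of `cov` (the ordering of DIF components is an assumption
    made by the caller).
    """
    cov = np.asarray(cov, dtype=float)
    chol = np.linalg.cholesky(cov)               # cov = chol @ chol.T
    z = rng.standard_normal((n_items, cov.shape[0]))
    return np.sqrt(lam) * z @ chol.T
```

With a 1 × 1 matrix containing τ², the draw reduces to the scalar case with pseudo-DIF variance λτ². How the columns are added to the estimated item parameters (for example, to item difficulties and to item discriminations) depends on how nonuniform DIF is parameterized, which is left open here.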
Future research might also investigate the performance of SIMEX-based bias correction methods for nonrobust or robust variants of SL linking (see also [71][72][73][74][75][76][77]). Furthermore, it would be interesting to adapt the methodology to polytomous items. In our study, we only considered asymmetric SL linking. SIMEX-based bias correction could also be applied to symmetric SL linking [17,78]. However, previous studies have shown that symmetric SL linking in its original form already has a smaller bias than asymmetric SL linking in the presence of DIF.

Table 1. Simulation Study 1: Bias and relative root mean square error (RMSE) for the estimated mean µ as a function of the uniform DIF standard deviation τ, number of items I, and sample size per group N.
Note. [...] SLA3 = analytical bias correction for Stocking-Lord linking according to Equation (16) using the bias-corrected variance estimate τ̂²; ‡ the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.

Table 2. Simulation Study 1: Bias and relative root mean square error (RMSE) for the estimated standard deviation σ as a function of the uniform DIF standard deviation τ, number of items I, and sample size per group N.

Table 3. Simulation Study 2: Bias and relative root mean square error (RMSE) for the estimated mean µ for I = 20 items as a function of the uniform DIF standard deviation τ and sample sizes per group N₁ and N₂.

Table 4. Simulation Study 2: Bias and relative root mean square error (RMSE) for the estimated standard deviation σ for I = 20 items as a function of the uniform DIF standard deviation τ and sample sizes per group N₁ and N₂.

Table 5. Simulation Study 3: Bias and relative root mean square error (RMSE) for the estimated mean µ for I = 20 items as a function of the type of the distribution and size of ξᵢ parameters in the logistic positive exponential (LPE) model, the uniform DIF standard deviation τ, and sample sizes per group N.
Note. Bal = balanced ξᵢ parameters in the LPE model around 1; Unbal = all ξᵢ parameters either smaller or larger than 1; ξ₀ = size of ξᵢ parameters; SL = Stocking-Lord linking; SLSQ = SIMEX-based Stocking-Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking-Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking-Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking-Lord linking according to Equation (16) using the empirical variance estimate τ̂²; SLA3 = analytical bias correction for Stocking-Lord linking according to Equation (16) using the bias-corrected variance estimate τ̂²; ‡ the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.

Table 6. Simulation Study 3: Bias and relative root mean square error (RMSE) for the estimated standard deviation σ for I = 20 items as a function of the type of the distribution and size of ξᵢ parameters in the logistic positive exponential (LPE) model, the uniform DIF standard deviation τ, and sample sizes per group N.
Note. Bal = balanced ξᵢ parameters in the LPE model around 1; Unbal = all ξᵢ parameters either smaller or larger than 1; ξ₀ = size of ξᵢ parameters; SL = Stocking-Lord linking; SLSQ = SIMEX-based Stocking-Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking-Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking-Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking-Lord linking according to Equation (16) using the empirical variance estimate τ̂²; SLA3 = analytical bias correction for Stocking-Lord linking according to Equation (16) using the bias-corrected variance estimate τ̂²; ‡ the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.