Article

SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking

by Alexander Robitzsch (1,2)
(1) IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
(2) Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Analytics 2024, 3(3), 368-388; https://doi.org/10.3390/analytics3030020
Submission received: 13 May 2024 / Revised: 24 June 2024 / Accepted: 25 July 2024 / Published: 6 August 2024

Abstract

Stocking–Lord (SL) linking is a popular linking method for group comparisons based on dichotomous item responses. This article proposes a bias correction technique for SL linking in the 2PL model in the presence of uniform differential item functioning (DIF). The technique is based on the simulation extrapolation (SIMEX) method. The SIMEX-based method is compared with analytical bias correction methods for SL linking. A simulation study showed that SIMEX-based SL linking performed best. It is also easy to implement and can be adapted to other linking methods in a straightforward way.

1. Introduction

Item response theory (IRT) models [1,2] are statistical models for multivariate discrete variables. IRT models are frequently applied in educational or psychological research. This article investigates group comparisons in unidimensional IRT models [3]. Let $\mathbf{X} = (X_1, \ldots, X_I)$ be the vector of $I$ dichotomous (i.e., binary) random variables $X_i \in \{0, 1\}$. In the psychometric literature, these variables are typically referred to as items. A unidimensional IRT model [4] is a statistical model for the probability distribution $P(\mathbf{X} = \mathbf{x})$ for $\mathbf{x} = (x_1, \ldots, x_I) \in \{0, 1\}^I$, where
$$P(\mathbf{X} = \mathbf{x}; \boldsymbol{\delta}, \boldsymbol{\gamma}) = \int \prod_{i=1}^{I} P_i(\theta; \gamma_i)^{x_i} \left[1 - P_i(\theta; \gamma_i)\right]^{1 - x_i} \phi(\theta; \mu, \sigma) \, d\theta. \tag{1}$$
In (1), $\phi$ denotes the density of the normal distribution with mean $\mu$ and standard deviation $\sigma$. The latent (factor) variable $\theta$ is also labeled as a trait or ability. The parameters of its distribution are contained in the vector $\boldsymbol{\delta} = (\mu, \sigma)$ of distribution parameters. The vector $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_I)$ contains all estimated item parameters of the item response functions (IRFs) $P_i(\theta; \gamma_i) = P(X_i = 1 \mid \theta)$ ($i = 1, \ldots, I$). The two-parameter logistic (2PL) model [5] possesses the IRF
$$P_i(\theta; \gamma_i) = \Psi\big(a_i(\theta - b_i)\big), \tag{2}$$
where $a_i$ and $b_i$ are the item discrimination and item difficulty, respectively, and $\Psi(x) = (1 + \exp(-x))^{-1}$ denotes the logistic distribution function. For independent and identically distributed realizations of the random variable $\mathbf{X}$, the unknown model parameters in (1) can be estimated by (marginal) maximum likelihood estimation [6,7].
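The following Python sketch makes (1) and (2) concrete. It evaluates the 2PL IRF and approximates the marginal probability of a response pattern by a simple rectangle rule over $\theta$. The quadrature range ($\mu \pm 6\sigma$), the number of nodes, and the three item parameter pairs are illustrative assumptions and are not values from this article.

```python
import itertools
import math
from statistics import NormalDist


def psi(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def irf_2pl(theta, a, b):
    """2PL item response function P(X_i = 1 | theta) of Eq. (2)."""
    return psi(a * (theta - b))


def pattern_prob(x, items, mu=0.0, sigma=1.0, n_nodes=121):
    """Approximate Eq. (1) by a rectangle rule on [mu - 6 sigma, mu + 6 sigma]."""
    density = NormalDist(mu, sigma)
    h = 12.0 * sigma / (n_nodes - 1)
    total = 0.0
    for k in range(n_nodes):
        theta = mu - 6.0 * sigma + k * h
        lik = 1.0
        for x_i, (a, b) in zip(x, items):
            p = irf_2pl(theta, a, b)
            lik *= p if x_i == 1 else 1.0 - p  # local independence given theta
        total += lik * density.pdf(theta) * h
    return total


items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]  # hypothetical (a_i, b_i) pairs
probs = {x: pattern_prob(x, items) for x in itertools.product((0, 1), repeat=3)}
for pattern, prob in sorted(probs.items()):
    print(pattern, round(prob, 4))
```

Because the likelihoods of all $2^I$ patterns sum to one at every $\theta$, the pattern probabilities also sum to (approximately) one. This is a quick sanity check of the quadrature.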
In many educational applications, IRT models are employed to compare the distribution of two groups in a test (i.e., on a set of items) regarding the factor variable θ in the IRT model (1). Linking methods [8,9,10] estimate the 2PL model separately in the two groups in the first step. The linking methods process the estimated item parameters in the second step to calculate a difference regarding the mean μ and the standard deviation σ between the two groups. The separate application of the 2PL model in each of the two groups enables the items to function differently across the groups. This property is termed differential item functioning (DIF; [11,12]) in the literature. It has been pointed out that the occurrence of DIF causes additional variability in the estimated mean μ and standard deviation σ when applying a linking method [13,14,15,16]. Moreover, it has been shown that DIF can bias the group differences when applying a linking method [17].
We now discuss group comparisons within the 2PL model in the presence of DIF. Assume that the 2PL model holds in the first group, with the item discriminations and item difficulties in the first group defined as $a_{i1} = a_i$ and $b_{i1} = b_i$, respectively. For identification reasons, we assume $\theta \sim N(0, 1)$ in the first group. In the second group, the ability distribution $\theta \sim N(\mu, \sigma^2)$ is assumed, where $\mu$ and $\sigma$ are the mean and the standard deviation (SD), respectively. The item discriminations are assumed to be invariant across the two groups; that is, $a_{i2} = a_i$. A random uniform DIF [11] effect $e_i$ is assumed for the item difficulties [18,19]. Then, the item difficulties $b_{i2}$ in the second group are given by
$$b_{i2} = b_i + e_i, \quad \text{where } E(e_i) = 0 \text{ and } \mathrm{Var}(e_i) = \tau^2. \tag{3}$$
The variance $\tau^2$ is also called the DIF variance [20], and $\tau$ is labeled the DIF SD.
A linking method consists of two steps. In the first step, the 2PL model is fitted separately within the two groups, assuming a standard normal distribution $N(0, 1)$ for the ability variable $\theta$. Due to sampling errors (i.e., the sampling of persons), the estimated item parameters $\hat{a}_{i1}$ and $\hat{b}_{i1}$ will differ slightly from the data-generating parameters $a_{i1} = a_i$ and $b_{i1} = b_i$. In the second group, the original item parameters $a_{i2}$ and $b_{i2}$ are not recovered. The reason is that the 2PL model is fitted with $\theta \sim N(0, 1)$, whereas the data-generating model imposes $\theta \sim N(\mu, \sigma^2)$. Writing $\theta = \sigma\theta^* + \mu$, a simple calculation provides
$$a_{i2}(\theta - b_{i2}) = a_{i2}(\sigma\theta^* + \mu - b_{i2}) \quad \text{for } \theta^* \sim N(0, 1). \tag{4}$$
Therefore, the identified item parameters in the second group are given by
$$a_{i2}^* = a_i\sigma \quad \text{and} \quad b_{i2}^* = \sigma^{-1}(b_i + e_i - \mu). \tag{5}$$
In the first group, the identified item parameters are $a_{i1}^* = a_i$ and $b_{i1}^* = b_i$. Due to sampling errors, the estimated item parameters $\hat{a}_{i1}$ and $\hat{b}_{i1}$ (or $\hat{a}_{i2}$ and $\hat{b}_{i2}$) will differ slightly from $a_{i1}^*$ and $b_{i1}^*$ (or $a_{i2}^*$ and $b_{i2}^*$).
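The identification argument in (4) and (5) can be checked numerically. The Python sketch below uses hypothetical parameter values. It maps the data-generating group-2 parameters to the identified parameters of (5) and confirms that both parameterizations give the same IRF once $\theta = \sigma\theta^* + \mu$.

```python
import math


def psi(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))


def identified_group2(a_i, b_i, e_i, mu, sigma):
    """Identified group-2 parameters (a*_i2, b*_i2) of Eq. (5)."""
    return a_i * sigma, (b_i + e_i - mu) / sigma


# Hypothetical values for one item and for the group-2 ability distribution
a_i, b_i, e_i, mu, sigma = 1.1, -0.4, 0.2, 0.3, 1.2
a_star, b_star = identified_group2(a_i, b_i, e_i, mu, sigma)

for theta_star in (-2.0, 0.0, 1.5):
    theta = sigma * theta_star + mu                    # ability on the group-1 metric
    p_generating = psi(a_i * (theta - (b_i + e_i)))    # a_i2 = a_i, b_i2 = b_i + e_i
    p_identified = psi(a_star * (theta_star - b_star))
    print(theta_star, round(p_generating, 6), round(p_identified, 6))
```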
In the presence of sampling errors or DIF, a linking function $H$ is chosen that estimates $\mu$ and $\sigma$ based on the estimated item parameters $\hat{a}_{ig}$ and $\hat{b}_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$. The Stocking–Lord (SL; [21]) linking method determines the parameter of interest $\boldsymbol{\delta} = (\mu, \sigma)$ as the minimizer of the weighted squared distance between the test characteristic functions
$$H(\boldsymbol{\delta}; \hat{\boldsymbol{\gamma}}) = \sum_{t=1}^{T} \omega_t \left[\sum_{i=1}^{I} \Psi\big(\hat{a}_{i1}(\sigma\theta_t + \mu - \hat{b}_{i1})\big) - \sum_{i=1}^{I} \Psi\big(\hat{a}_{i2}(\theta_t - \hat{b}_{i2})\big)\right]^2. \tag{6}$$
In (6), the IRFs are evaluated on a finite grid of $\theta$ values $\theta_1, \ldots, \theta_T$, and $\omega_t$ are known weights. Previous research highlighted that SL linking is superior to other linking methods, such as Haebara [22] linking [23,24,25]. However, these conclusions were obtained in the absence of DIF. It has been shown in [17] that the presence of (uniform) DIF can lead to biased estimates of the mean $\mu$ and the SD $\sigma$, which quantify the differences between the two groups.
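One way to compute the SL estimate in (6) is to treat it as a nonlinear least-squares problem in $(\mu, \sigma)$. The Python sketch below uses a damped Gauss–Newton iteration. It exploits the fact that the derivative of the transformed group-1 test characteristic function (TCF) with respect to $\sigma$ equals $\theta_t$ times its derivative with respect to $\mu$. The following are assumptions for illustration and are not taken from the author's R functions: the $\theta$ grid from −4 to 4, the normal-density-shaped weights $\omega_t$, the optimizer, and the function names.

```python
import math


def psi(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def tcf(items, grid):
    """Test characteristic function sum_i Psi(a_i (theta_t - b_i)) on a theta grid."""
    return [sum(psi(a * (th - b)) for a, b in items) for th in grid]


def sl_link(items1, target, grid, weights, mu=0.0, sigma=1.0):
    """Minimize the SL criterion (6) for a given group-2 TCF by damped Gauss-Newton."""
    def parts(m, s):  # residuals of (6) and d/dmu of the group-1 TCF per grid point
        p = [[psi(a * (s * th + m - b)) for a, b in items1] for th in grid]
        res = [sum(pt) - tg for pt, tg in zip(p, target)]
        der = [sum(q * (1 - q) * a for q, (a, _) in zip(pt, items1)) for pt in p]
        return res, der
    def crit(m, s):
        return sum(w * r * r for w, r in zip(weights, parts(m, s)[0]))
    f = crit(mu, sigma)
    for _ in range(100):
        res, der = parts(mu, sigma)
        z = list(zip(weights, der, grid, res))  # d/dsigma equals theta_t * d/dmu
        j00 = sum(w * g * g for w, g, t, r in z)
        j01 = sum(w * g * g * t for w, g, t, r in z)
        j11 = sum(w * g * g * t * t for w, g, t, r in z)
        r0 = sum(w * g * r for w, g, t, r in z)
        r1 = sum(w * g * t * r for w, g, t, r in z)
        det = j00 * j11 - j01 * j01
        d_mu, d_s = -(j11 * r0 - j01 * r1) / det, -(j00 * r1 - j01 * r0) / det
        step = 1.0
        while step > 1e-8 and crit(mu + step * d_mu, sigma + step * d_s) > f:
            step /= 2.0  # step halving: never increase the criterion
        if step <= 1e-8:
            break
        mu, sigma = mu + step * d_mu, sigma + step * d_s
        f = crit(mu, sigma)
        if abs(step * d_mu) + abs(step * d_s) < 1e-10:
            break
    return mu, sigma


grid = [-4.0 + 0.2 * t for t in range(41)]          # assumed theta grid
weights = [math.exp(-th * th / 2) for th in grid]   # assumed N(0, 1)-shaped weights
base = [(1.0, -2.0 + 0.4 * k) for k in range(11)]   # hypothetical items: a = 1, b = -2, ..., 2
mu_true, sigma_true = 0.3, 1.2
# Identified group-2 parameters of Eq. (5) without DIF and without sampling error
group2 = [(a * sigma_true, (b - mu_true) / sigma_true) for a, b in base]
mu_hat, sigma_hat = sl_link(base, tcf(group2, grid), grid, weights)
print(round(mu_hat, 4), round(sigma_hat, 4))
```

Passing the group-2 TCF as a precomputed target is a design choice: it does not depend on $(\mu, \sigma)$, so it only needs to be evaluated once. Without DIF and sampling error, the sketch recovers the true values, in line with the property that SL linking is consistent when $\tau = 0$.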
In this article, we study bias correction methods for SL linking. Simulation extrapolation (SIMEX; [26]) is a method to correct for bias due to measurement error. It is entirely simulation-based and does not require analytical derivations that must be carried out for each particular model under study. This makes it particularly suited for nonlinear statistical models. SL linking can be regarded as a particular nonlinear model, and to apply SIMEX to it, this article interprets uniform DIF as measurement error. The proposed SIMEX method is compared with an alternative analytical bias correction that has recently been proposed for a wide class of linking methods [27].
The rest of the article is organized as follows. In Section 2, we review the analytical bias correction methods in SL linking. The newly proposed SIMEX-based bias correction method for SL linking is presented in Section 3. Section 4 presents a numerical illustration based on idealized datasets to demonstrate the bias in the original SL linking. The newly proposed SIMEX-based SL linking is compared with the original SL linking and other analytical bias correction SL methods through three simulation studies in Section 5. Finally, the article closes with a discussion in Section 6.

2. Analytical Bias Correction in Stocking–Lord Linking

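In this section, analytical bias correction for a general linking method [27] is reviewed, with SL linking as an example. In a linking method, the vector $\boldsymbol{\delta} = (\mu, \sigma)$ is the statistical parameter of interest. The parameter estimate for $\boldsymbol{\delta}$ is denoted as $\hat{\boldsymbol{\delta}} = (\hat{\mu}, \hat{\sigma})$. The linking function $H$ is defined as a function of $\boldsymbol{\delta}$ and of the estimated item parameters $\hat{\boldsymbol{\gamma}} = (\hat{\gamma}_1, \ldots, \hat{\gamma}_I)$ in the two groups, where $\hat{\gamma}_i = (\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})$, such that
$$\hat{\boldsymbol{\delta}} = \arg\min_{\boldsymbol{\delta}} H(\boldsymbol{\delta}; \hat{\boldsymbol{\gamma}}), \tag{7}$$
where $H$ is a sufficiently smooth (i.e., at least three times differentiable) function. The SL linking function defined in (6) is an example of the general formulation (7). The parameter estimate $\hat{\boldsymbol{\delta}}$ fulfills the estimating equation
$$H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}) = \frac{\partial H}{\partial \boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}) = \mathbf{0}. \tag{8}$$
The linking method associated with the linking function $H$ recovers the true group mean $\mu_0$ and the true group SD $\sigma_0$ if there is no sampling error and no DIF (i.e., $\tau = 0$). This property is formalized as
$$H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}_0) = \mathbf{0} \quad \text{for } \boldsymbol{\delta}_0 = (\mu_0, \sigma_0), \tag{9}$$
where $\boldsymbol{\gamma}_0$ denotes the item parameters implied by the joint item parameters $a_i$ and $b_i$ when all uniform DIF effects $e_i$ are zero.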
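The analytical bias correction for the presence of DIF effects $e_i$ is based on a second-order Taylor expansion of $H_{\boldsymbol{\delta}}$. The estimated item parameters $\hat{\gamma}_i$ for item $i$ are a function of a common item parameter $\gamma_i = (a_i, b_i)$ and of the uniform DIF effect $e_i$. The estimating Equation (8) can be rewritten as
$$H_{\boldsymbol{\delta}}(\boldsymbol{\delta}; \boldsymbol{\gamma}, \mathbf{e}) = \mathbf{0}, \tag{10}$$
where $\boldsymbol{\gamma}$ collects all item parameters and $\mathbf{e} = (e_1, \ldots, e_I)$ denotes the vector of DIF effects. Note that $H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0}) = \mathbf{0}$ for $\mathbf{e} = \mathbf{0}$.
Assume independent DIF effects $e_i$ for $i = 1, \ldots, I$. We apply to $H_{\boldsymbol{\delta}}$ a first-order Taylor expansion with respect to $\boldsymbol{\delta}$ and a second-order Taylor expansion with respect to the DIF effects $e_i$, and obtain
$$\mathbf{0} = H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}; \boldsymbol{\gamma}, \mathbf{e}) \approx H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0}) + H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})(\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}_0) + \sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})\, e_i + \frac{1}{2}\sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i e_i}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})\, e_i^2. \tag{11}$$
Note that the Taylor expansion (11) relies on the assumption of independent DIF effects $e_i$. The mixed terms in $e_i e_j$ ($i \neq j$) are omitted because they have zero expectation for independent, zero-mean DIF effects. Because $H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0}) = \mathbf{0}$, we obtain from (11)
$$\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}_0 \approx -H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})^{-1} \left[\sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})\, e_i + \frac{1}{2}\sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i e_i}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})\, e_i^2\right]. \tag{12}$$
The DIF effects can be assumed to be random variables with zero mean (i.e., $E(e_i) = 0$). Hence, the expected bias of the parameter estimate $\hat{\boldsymbol{\delta}}$ can be computed as
$$\mathrm{Bias}(\hat{\boldsymbol{\delta}}) = -\frac{1}{2} H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})^{-1} \sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i e_i}(\boldsymbol{\delta}_0; \boldsymbol{\gamma}, \mathbf{0})\, E(e_i^2). \tag{13}$$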
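Equation (13) can be used to construct a bias correction term for $\hat{\boldsymbol{\delta}}$. As a proxy for the true DIF effects $e_i$, we compute estimated DIF effects $\hat{e}_i$ as
$$\hat{e}_i = \hat{\mu} + \hat{\sigma}\hat{b}_{i2} - \hat{b}_{i1}. \tag{14}$$
Then, a bias-corrected estimate $\hat{\boldsymbol{\delta}}_{\mathrm{br}}$ of $\boldsymbol{\delta}$ is obtained as an empirical version of $\hat{\boldsymbol{\delta}}_{\mathrm{br}} = \hat{\boldsymbol{\delta}} - \mathrm{Bias}(\hat{\boldsymbol{\delta}})$:
$$\hat{\boldsymbol{\delta}}_{\mathrm{br}} = \hat{\boldsymbol{\delta}} + \frac{1}{2} H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}, \mathbf{0})^{-1} \sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i e_i}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}, \mathbf{0})\, \hat{e}_i^2. \tag{15}$$
The vector $H_{\boldsymbol{\delta} e_i e_i}$ contains the second-order derivatives of $H_{\boldsymbol{\delta}}$ with respect to $e_i$ and is given by $H_{\boldsymbol{\delta} e_i e_i} = (H_{\mu e_i e_i}, H_{\sigma e_i e_i})$.
An alternative bias-corrected estimate can be obtained by using
$$\hat{\boldsymbol{\delta}}_{\mathrm{br}} = \hat{\boldsymbol{\delta}} + \frac{1}{2} \hat{\tau}^2 H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}, \mathbf{0})^{-1} \sum_{i=1}^{I} H_{\boldsymbol{\delta} e_i e_i}(\hat{\boldsymbol{\delta}}; \hat{\boldsymbol{\gamma}}, \mathbf{0}), \tag{16}$$
where $\hat{\tau}^2$ is an estimate of $\tau^2 = \mathrm{Var}(e_i) = E(e_i^2)$.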
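We now present the partial derivatives required for the bias correction in SL linking. The estimating equations for SL linking defined in (6) for the mean $\mu$ and the SD $\sigma$ are given by
$$H_\mu = 2\sum_{t=1}^{T} \omega_t \left[\sum_{i=1}^{I} \Psi\big(\hat{a}_{i1}(\sigma\theta_t + \mu - \hat{b}_{i1})\big) - \sum_{i=1}^{I} \Psi\big(\hat{a}_{i2}(\theta_t - \hat{b}_{i2})\big)\right] \sum_{i=1}^{I} \Psi'\big(\hat{a}_{i1}(\sigma\theta_t + \mu - \hat{b}_{i1})\big)\, \hat{a}_{i1} = 0 \quad \text{and} \tag{17}$$
$$H_\sigma = 2\sum_{t=1}^{T} \omega_t \left[\sum_{i=1}^{I} \Psi\big(\hat{a}_{i1}(\sigma\theta_t + \mu - \hat{b}_{i1})\big) - \sum_{i=1}^{I} \Psi\big(\hat{a}_{i2}(\theta_t - \hat{b}_{i2})\big)\right] \sum_{i=1}^{I} \Psi'\big(\hat{a}_{i1}(\sigma\theta_t + \mu - \hat{b}_{i1})\big)\, \hat{a}_{i1}\theta_t = 0. \tag{18}$$
The second-order derivatives $H_{\boldsymbol{\delta}\boldsymbol{\delta}}$ can be obtained similarly.
The DIF effect $e_i$ enters (6) only through $\hat{b}_{i2}$, with $\partial \hat{b}_{i2} / \partial e_i = \sigma^{-1}$ (see (5)) and $\hat{a}_{i2}\sigma^{-1} \approx \hat{a}_{i1}$. Therefore, the second-order derivatives of $H_\mu$ and $H_\sigma$ with respect to $e_i$ can be computed as
$$H_{\mu e_i e_i} = -2\sum_{t=1}^{T} \omega_t\, \Psi''\big(\hat{a}_{i2}(\theta_t - \hat{b}_{i2})\big) \left[\sum_{j=1}^{I} \Psi'\big(\hat{a}_{j1}(\sigma\theta_t + \mu - \hat{b}_{j1})\big)\, \hat{a}_{j1}\right] \hat{a}_{i1}^2 \quad \text{and} \tag{19}$$
$$H_{\sigma e_i e_i} = -2\sum_{t=1}^{T} \omega_t\, \Psi''\big(\hat{a}_{i2}(\theta_t - \hat{b}_{i2})\big) \left[\sum_{j=1}^{I} \Psi'\big(\hat{a}_{j1}(\sigma\theta_t + \mu - \hat{b}_{j1})\big)\, \hat{a}_{j1}\right] \hat{a}_{i1}^2\, \theta_t, \tag{20}$$
where $\Psi'$ and $\Psi''$ denote the first and second derivatives of the logistic function $\Psi$, respectively. These terms can be inserted into Formula (15) to obtain the bias-corrected estimates.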
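The following Python sketch assembles the SLA1 correction (15) from the estimating equations (17) and (18) and the second-order terms (19) and (20). It rests on several assumptions:

- The Hessian $H_{\boldsymbol{\delta}\boldsymbol{\delta}}$ is approximated by central finite differences of the analytical gradient rather than derived analytically.
- The SL estimate itself is obtained by Newton iterations on (17) and (18).
- The $\theta$ grid, the weights, and all function names are illustrative choices.

The usage example reuses the idealized item parameters of Section 4 with $\tau^2 = 0.55$ only to show the direction of the correction. The exact numbers depend on the assumed grid and weights.

```python
import math
from statistics import NormalDist


def psi(x):
    return 1.0 / (1.0 + math.exp(-x))


def gradient(mu, sigma, items1, items2, grid, weights):
    """Estimating functions (H_mu, H_sigma) of Eqs. (17) and (18)."""
    h_mu = h_sigma = 0.0
    for th, w in zip(grid, weights):
        p1 = [psi(a * (sigma * th + mu - b)) for a, b in items1]
        r = sum(p1) - sum(psi(a * (th - b)) for a, b in items2)
        g = sum(p * (1 - p) * a for p, (a, _) in zip(p1, items1))
        h_mu += 2 * w * r * g
        h_sigma += 2 * w * r * g * th
    return h_mu, h_sigma


def hessian(mu, sigma, data, h=1e-5):
    """H_deltadelta by central differences of the analytical gradient (approximation)."""
    gp, gm = gradient(mu + h, sigma, *data), gradient(mu - h, sigma, *data)
    sp, sm = gradient(mu, sigma + h, *data), gradient(mu, sigma - h, *data)
    off = ((sp[0] - sm[0]) + (gp[1] - gm[1])) / (4 * h)  # symmetrized cross term
    return (gp[0] - gm[0]) / (2 * h), off, (sp[1] - sm[1]) / (2 * h)


def solve2(h00, h01, h11, v0, v1):
    """Solve [[h00, h01], [h01, h11]] x = (v0, v1)."""
    det = h00 * h11 - h01 * h01
    return (h11 * v0 - h01 * v1) / det, (h00 * v1 - h01 * v0) / det


def sl_link(data, mu=0.0, sigma=1.0):
    """Original SL estimate: Newton iterations on the estimating equations."""
    for _ in range(50):
        d_mu, d_sigma = solve2(*hessian(mu, sigma, data), *gradient(mu, sigma, *data))
        mu, sigma = mu - d_mu, sigma - d_sigma
        if abs(d_mu) + abs(d_sigma) < 1e-10:
            break
    return mu, sigma


def sla1(items1, items2, grid, weights):
    """Return the SL estimate and the SLA1 estimate of Eq. (15)."""
    data = (items1, items2, grid, weights)
    mu, sigma = sl_link(data)
    der = []  # bracketed sum over j in Eqs. (19) and (20), per grid point
    for th in grid:
        p1 = [psi(a * (sigma * th + mu - b)) for a, b in items1]
        der.append(sum(p * (1 - p) * a for p, (a, _) in zip(p1, items1)))
    s_mu = s_sigma = 0.0
    for (a1, b1), (a2, b2) in zip(items1, items2):
        e2 = (mu + sigma * b2 - b1) ** 2                   # squared e_hat_i, Eq. (14)
        for th, w, g in zip(grid, weights, der):
            p = psi(a2 * (th - b2))
            d2 = p * (1 - p) * (1 - 2 * p)                 # Psi''
            s_mu += -2 * w * d2 * g * a1 ** 2 * e2         # Eq. (19) times e_hat_i^2
            s_sigma += -2 * w * d2 * g * a1 ** 2 * th * e2  # Eq. (20) times e_hat_i^2
    c_mu, c_sigma = solve2(*hessian(mu, sigma, data), s_mu, s_sigma)
    return (mu, sigma), (mu + 0.5 * c_mu, sigma + 0.5 * c_sigma)


# Idealized items of Section 4 with tau^2 = 0.55 (theta grid and weights assumed)
M, tau, mu0, sigma0 = 11, math.sqrt(0.55), 0.3, 1.2
lo = 1.0 / (2 * M + 2)
eps = [NormalDist().inv_cdf(lo + k * (1 - 2 * lo) / (M - 1)) for k in range(M)]
base_b = [-2.0 + 0.4 * k for k in range(11)]
items1 = [(1.0, b) for b in base_b for _ in eps]
items2 = [(sigma0, (b + tau * e - mu0) / sigma0) for b in base_b for e in eps]  # Eq. (5), a_i = 1
grid = [-4.0 + 0.2 * t for t in range(41)]
weights = [math.exp(-th * th / 2) for th in grid]
(sl_mu, sl_sigma), (br_mu, br_sigma) = sla1(items1, items2, grid, weights)
print("SL:", round(sl_mu, 3), round(sl_sigma, 3), "SLA1:", round(br_mu, 3), round(br_sigma, 3))
```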
In the rest of this paper, the application of the bias-corrected estimate (15) to SL linking is denoted by SLA1. The estimate (16) requires an estimate $\hat{\tau}^2$. If $\hat{\tau}^2$ is the empirical variance of the estimates $\hat{e}_i$, the corresponding bias-corrected estimate for SL linking is denoted by SLA2. However, the empirical variance of the $\hat{e}_i$ also contains sampling errors due to the sampling of persons. The group-wise fitted 2PL models provide variance estimates $\mathrm{Var}(\hat{b}_{i1})$ and $\mathrm{Var}(\hat{b}_{i2})$. From these, the part of the variance $\mathrm{Var}(\hat{e}_i)$ that is due to the sampling of persons can be computed. Subtracting the average sampling-error variance of the $\hat{e}_i$ from their empirical variance yields a bias-corrected variance estimate $\hat{\hat{\tau}}^2$. If this difference is negative, the estimate is set to zero. If $\hat{\hat{\tau}}^2$ is used in the bias correction Formula (16) for SL linking, the corresponding estimate is denoted by SLA3.
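The variance estimates used by SLA2 and SLA3 can be sketched as follows. The article does not spell out how the sampling-error part of $\mathrm{Var}(\hat{e}_i)$ is computed. This Python sketch assumes the approximation $\mathrm{Var}(\hat{b}_{i1}) + \hat{\sigma}^2\,\mathrm{Var}(\hat{b}_{i2})$. That approximation follows from (14) when $\hat{\mu}$ and $\hat{\sigma}$ are treated as fixed and the two groups are independent. The sketch also uses the sample variance (denominator $I - 1$) as the empirical variance. Both choices, and the function name, are assumptions.

```python
from statistics import fmean, variance


def dif_variance_estimates(e_hat, var_b1, var_b2, sigma_hat):
    """Return (tau2_hat, tau2_hat_hat) as used by SLA2 and SLA3.

    e_hat: estimated DIF effects from Eq. (14); var_b1, var_b2: squared standard
    errors of the item difficulties from the group-wise 2PL fits.
    """
    tau2_hat = variance(e_hat)  # empirical variance of the e_hat_i (SLA2)
    # Assumed sampling-error variance of e_hat_i = mu_hat + sigma_hat * b_i2 - b_i1
    sampling = fmean(v1 + sigma_hat ** 2 * v2 for v1, v2 in zip(var_b1, var_b2))
    tau2_hat_hat = max(0.0, tau2_hat - sampling)  # truncated at zero (SLA3)
    return tau2_hat, tau2_hat_hat


# Hypothetical inputs for four items
print(dif_variance_estimates([0.1, -0.2, 0.3, -0.2], [0.01] * 4, [0.01] * 4, 1.0))
```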
Note that the second-order derivatives $H_{\boldsymbol{\delta} e_i e_i}$ vanish in linear linking methods such as mean–geometric-mean linking [9]. Therefore, no bias would occur in these methods [27].
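The remark about linear linking methods can be illustrated with mean–geometric-mean linking. The following Python sketch writes the linking function for the identified parameters of (5); this is an assumption about the exact form used in [9]. In this form, $\hat{\mu}$ is linear in the $\hat{b}_{i2}$, so zero-mean DIF effects cancel, and $\hat{\sigma}$ does not involve the difficulties at all. With the symmetric, quantile-based DIF of the idealized data in Section 4, the true values are recovered up to rounding error.

```python
import math
from statistics import NormalDist, fmean


def mean_geometric_mean(items1, items2):
    """Mean-geometric-mean linking for a_i2 = sigma * a_i1, b_i2 = (b_i1 - mu) / sigma."""
    sigma = math.exp(fmean(math.log(a2 / a1) for (a1, _), (a2, _) in zip(items1, items2)))
    mu = fmean(b1 for _, b1 in items1) - sigma * fmean(b2 for _, b2 in items2)
    return mu, sigma


# Idealized items with symmetric quantile-based uniform DIF (tau^2 = 0.55)
M, tau, mu0, sigma0 = 11, math.sqrt(0.55), 0.3, 1.2
lo = 1.0 / (2 * M + 2)
eps = [NormalDist().inv_cdf(lo + k * (1 - 2 * lo) / (M - 1)) for k in range(M)]
base_b = [-2.0 + 0.4 * k for k in range(11)]
items1 = [(1.0, b) for b in base_b for _ in eps]
items2 = [(sigma0, (b + tau * e - mu0) / sigma0) for b in base_b for e in eps]
mu_mgm, sigma_mgm = mean_geometric_mean(items1, items2)
print(round(mu_mgm, 6), round(sigma_mgm, 6))
```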

3. SIMEX-Based Bias Correction in Stocking–Lord Linking

In this section, we describe how the simulation extrapolation (SIMEX) method [28,29,30] is adapted to conduct a bias correction for SL linking in the presence of uniform DIF. We first briefly describe the main idea of the SIMEX method. Let a dataset contain parts $\mathbf{Z}$ and $\mathbf{W}$, where the variables in the matrix $\mathbf{Z}$ are measured without error and $\mathbf{W}$ contains measurement error. Assume that the rows in these matrices refer to independent observations, such that $\mathbf{z}_i$ and $\mathbf{w}_i$ are the observed data of case $i$. In our application, we only consider a one-dimensional variable $W$ that is prone to measurement error. We assume the decomposition
$$w_i = w_i^* + e_i, \tag{21}$$
where $w_i^*$ corresponds to an error-free measurement and $e_i$ denotes the measurement error. It is assumed that the measurement error variance $\mathrm{Var}(e_i) = \tau^2$ is known or can be consistently estimated from data. Furthermore, we assume that the measurement errors $e_i$ are normally distributed with zero mean.
A statistical procedure provides an estimate $\hat{\delta} = f(\mathbf{Z}, \mathbf{W})$ of an unknown parameter $\delta$ for a specified function $f$. Frequently, the bias of $\hat{\delta}$ depends on the measurement error variance $\tau^2$. The SIMEX method aims at removing (or, at least, reducing) the bias in $\hat{\delta}$ due to measurement error by means of simulation. The general idea is to induce additional measurement error in the data and to compute the parameter estimate $\hat{\delta}$ on these modified datasets. More formally, the values $w_i$ of the variable $W$ are modified as
$$\tilde{w}_i = w_i + \tilde{e}_i = w_i^* + e_i + \tilde{e}_i, \tag{22}$$
where $\tilde{e}_i$ is a random draw from a normal distribution with zero mean and standard deviation $\sqrt{\lambda}\,\tau$ (i.e., variance $\lambda\tau^2$) for $\lambda > 0$. Hence, $\tilde{w}_i$ contains the measurement error $e_i + \tilde{e}_i$ with variance $(1 + \lambda)\tau^2$. SIMEX evaluates the estimate $\hat{\delta}$ as a function of $\lambda$ on a grid of values. Ref. [31] recommended using the $\lambda$ grid 0.5, 1.0, 1.5, and 2.0. To reduce Monte Carlo error, the estimation is repeated for several simulated datasets at each fixed $\lambda$ value, and the resulting estimates are averaged. This yields a parameter curve $\hat{\delta}(\lambda)$. This first part of the procedure is the simulation step of SIMEX.
The second part of SIMEX is the extrapolation step. A regression function is fitted to the parameter curve $\hat{\delta}(\lambda)$. Its predicted value at $\lambda = -1$ provides a parameter estimate for $\delta$ extrapolated to the case of no measurement error (i.e., $(1 + \lambda)\tau^2 = 0$). The extrapolation function can be linear or quadratic:
$$\hat{\delta}(\lambda) \approx u_0 + u_1\lambda \quad \text{or} \quad \hat{\delta}(\lambda) \approx v_0 + v_1\lambda + v_2\lambda^2. \tag{23}$$
In the case of linear extrapolation, the final parameter estimate for $\delta$ is given by $u_0 - u_1$. In the case of quadratic extrapolation, the final parameter estimate is given by $v_0 - v_1 + v_2$. The SIMEX method is a particularly flexible measurement error correction method because it can be applied to any class of statistical models [26,32].
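The following Python sketch illustrates the simulation and extrapolation steps in their classical form. It applies SIMEX to a simple regression slope, where measurement error in $W$ is known to attenuate the naive slope. The following are illustrative assumptions:

- the data-generating values: true slope 1, $\mathrm{Var}(w_i^*) = 1$, and $\tau^2 = 1$;
- the number of replications per $\lambda$;
- the inclusion of the naive estimate at $\lambda = 0$ in the regression;
- all function names.

In this example, the true curve $\hat{\delta}(\lambda) \approx 1/(2 + \lambda)$ is not quadratic. As a result, the quadratic extrapolation moves the estimate clearly toward 1 but removes only part of the bias.

```python
import math
import random
from statistics import fmean


def slope(w, y):
    """Ordinary least-squares slope of y on w."""
    mw, my = fmean(w), fmean(y)
    return sum((a - mw) * (b - my) for a, b in zip(w, y)) / sum((a - mw) ** 2 for a in w)


def extrapolate(lams, ests, degree):
    """Least-squares polynomial in lambda of the given degree, evaluated at lambda = -1."""
    n = degree + 1
    A = [[sum(x ** (j + k) for x in lams) for k in range(n)]
         + [sum(y * x ** j for x, y in zip(lams, ests))] for j in range(n)]
    for c in range(n):  # Gauss-Jordan elimination on the augmented normal equations
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c:
                A[r] = [u - A[r][c] * v for u, v in zip(A[r], A[c])]
    return sum(A[j][n] * (-1.0) ** j for j in range(n))


def simex_slope(w, y, tau2, lambdas=(0.5, 1.0, 1.5, 2.0), B=20, seed=1):
    """Naive, linear-SIMEX, and quadratic-SIMEX estimates of the slope."""
    rng = random.Random(seed)
    lams, ests = [0.0], [slope(w, y)]              # lambda = 0: naive estimate
    for lam in lambdas:
        sd = math.sqrt(lam * tau2)                 # added error has variance lam * tau2
        reps = [slope([wi + rng.gauss(0.0, sd) for wi in w], y) for _ in range(B)]
        lams.append(lam)
        ests.append(fmean(reps))                   # average over B simulated datasets
    return ests[0], extrapolate(lams, ests, 1), extrapolate(lams, ests, 2)


# Hypothetical data: true slope 1, Var(w*) = 1, measurement error variance tau2 = 1
rng = random.Random(2024)
n, tau2 = 5000, 1.0
w_star = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [ws + rng.gauss(0.0, 0.5) for ws in w_star]
w = [ws + rng.gauss(0.0, math.sqrt(tau2)) for ws in w_star]
naive, simex_lin, simex_quad = simex_slope(w, y, tau2)
print(round(naive, 3), round(simex_lin, 3), round(simex_quad, 3))
```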
We now interpret the application of Stocking–Lord (SL) linking in the presence of uniform DIF as a case of measurement error. Note that DIF, as a measurement error phenomenon, also occurs at the population level and is not restricted to finite samples of persons. The input data of SL linking are the item parameters $\{\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}\}_{i = 1, \ldots, I}$. In what follows, we assume that the item parameters contain no sampling errors due to the sampling of persons and that there is only uniform DIF. Then, we have (see (5))
$$\hat{a}_{i2} = \sigma\hat{a}_{i1} = \sigma a_i \quad \text{and} \quad \hat{b}_{i2} = \sigma^{-1}(b_i + e_i - \mu), \tag{24}$$
where $\boldsymbol{\delta} = (\mu, \sigma)$ is the parameter vector of interest. The uniform DIF effect $e_i$ is considered as measurement error with variance $\tau^2$. An application of the original SL linking method provides initial estimates $\hat{\mu}$ and $\hat{\sigma}$, from which the DIF effects $e_i$ can be estimated as
$$\hat{e}_i = \hat{\mu} + \hat{\sigma}\hat{b}_{i2} - \hat{b}_{i1}. \tag{25}$$
We discussed in Section 2 how $\tau^2$ can be estimated from the data. In the application of SIMEX in this paper, we employ the bias-corrected variance estimate $\hat{\hat{\tau}}^2$ (see the end of Section 2). The general idea of applying the SIMEX method is to induce additional DIF in the data and to study the expected parameter estimate in SL linking for an increased DIF variance. That is, modified DIF effects $\tilde{e}_i$ are computed as
$$\tilde{e}_i = \hat{e}_i + \varepsilon_i, \tag{26}$$
where $\varepsilon_i$ is a random draw from a normal distribution with zero mean and variance $\lambda\tau^2$. Overall, the variable $\tilde{e}_i$ has a variance of $(1 + \lambda)\tau^2$. Finally, we recompute the item difficulties in the second group as (see (24))
$$\tilde{b}_{i2} = \hat{\sigma}^{-1}(\hat{b}_{i1} + \tilde{e}_i - \hat{\mu}) = \hat{\sigma}^{-1}(\hat{b}_{i1} + \hat{e}_i + \varepsilon_i - \hat{\mu}) \tag{27}$$
and subsequently apply the SL linking method to the set of modified item parameters $\{\hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \tilde{b}_{i2}\}_{i = 1, \ldots, I}$. As a consequence, modified parameter estimates $\hat{\mu}$ and $\hat{\sigma}$ are obtained for item parameters that contain more uniform DIF (i.e., more measurement error) than the original item parameters. For the parameter of interest $\delta = \mu$ or $\delta = \sigma$, the SIMEX method provides a parameter curve $\hat{\delta}(\lambda)$, to which a linear or quadratic regression function can be fitted (see (23)). The extrapolation of the regression function to $\lambda = -1$ is used as the final parameter estimate $\hat{\delta}$, which aims to reduce potential bias due to DIF.
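The simulation step in (25) to (27) amounts to a small transformation of the group-2 difficulties. The Python sketch below is a hypothetical helper and not the author's code. For one value of $\lambda$, it draws $\varepsilon_i \sim N(0, \lambda\tau^2)$ and returns the modified item parameters $(\hat{a}_{i2}, \tilde{b}_{i2})$. For $\lambda = 0$, it reproduces the original $\hat{b}_{i2}$. The item parameter values are hypothetical.

```python
import math
import random


def perturb_group2(items1, items2, mu_hat, sigma_hat, lam, tau2, rng):
    """Return [(a_hat_i2, b_tilde_i2)] following Eqs. (25)-(27)."""
    out = []
    for (_, b1), (a2, b2) in zip(items1, items2):
        e_hat = mu_hat + sigma_hat * b2 - b1                      # Eq. (25)
        e_tilde = e_hat + rng.gauss(0.0, math.sqrt(lam * tau2))   # Eq. (26)
        out.append((a2, (b1 + e_tilde - mu_hat) / sigma_hat))     # Eq. (27)
    return out


# Hypothetical estimated item parameters and initial SL estimates
items1 = [(1.0, -1.0), (0.9, 0.2), (1.3, 0.8)]
items2 = [(1.2, -1.05), (1.08, -0.2), (1.56, 0.5)]
print(perturb_group2(items1, items2, 0.3, 1.2, lam=1.0, tau2=0.1, rng=random.Random(7)))
```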
The originally proposed SIMEX method simulates new datasets. To reduce Monte Carlo error, the simulation is repeated for a fixed $\lambda$ value, and the parameter estimates are averaged afterward. In the case of SL linking, the critical issue is that there are only a few cases (i.e., items), which leads to relatively large Monte Carlo errors. Furthermore, reducing the Monte Carlo error requires a (very) large number of simulations. However, the parameter estimates in the SL linking method should only be corrected for asymptotic bias. This is the bias that would occur in a test of infinite length (i.e., $I \to \infty$) for a fixed DIF variance $\tau^2$. To this end, we propose a quasi-Monte Carlo variant of SIMEX in which SL linking is applied to a test that contains $M \cdot I$ pseudo-items for an integer $M$. The idea is to replicate the item parameters and to induce normally distributed uniform DIF systematically across all items.
Let $\varepsilon_1, \ldots, \varepsilon_M$ be quantiles of the standard normal distribution, chosen such that their mean is approximately zero and their variance is approximately one. In this paper, we chose $M = 81$. We computed the inverse of the standard normal distribution function $\Phi^{-1}$ (i.e., quantiles) on the grid seq(1/(2*M+2), 1-1/(2*M+2), length=M). That is, the $\varepsilon_m$ values are obtained in R as qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M)). For a fixed $\lambda$, the set of values $\sqrt{\lambda}\,\tau\varepsilon_m$ has approximately zero mean and variance $\lambda\tau^2$. The item difficulty of the pseudo-item $(i, m)$ for $i = 1, \ldots, I$ and $m = 1, \ldots, M$ is defined as
$$\tilde{b}_{(i,m),2} = \hat{\sigma}^{-1}\big(\hat{b}_{i1} + \hat{e}_i + \sqrt{\lambda}\,\tau\varepsilon_m - \hat{\mu}\big). \tag{28}$$
All other item parameters $\hat{a}_{i1}$, $\hat{b}_{i1}$, and $\hat{a}_{i2}$ are left unmodified across $m = 1, \ldots, M$. In this way, a set of pseudo-item parameters is constructed that contains an additional DIF variance of $\lambda\tau^2$, that is, a total DIF variance of approximately $(1 + \lambda)\tau^2$. The SL linking method is applied to the set of pseudo-items for different $\lambda$ values, which yields a SIMEX parameter curve $\hat{\delta}(\lambda)$. Note that our modified SIMEX method is free of simulation error, at the price of only removing (or reducing) asymptotic bias. Finally, a regression function is fitted to the values $\hat{\delta}(\lambda)$, and its extrapolation to $\lambda = -1$ provides the final parameter estimate $\hat{\delta}$.
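The pseudo-item construction can be sketched in Python. The function quantile_grid mirrors the R expression above, using statistics.NormalDist().inv_cdf in place of qnorm. The function pseudo_difficulties evaluates (28) for a single item. Both names are hypothetical. For $M = 81$, the grid has mean zero up to rounding error and a variance slightly below one.

```python
from statistics import NormalDist, fmean, pvariance


def quantile_grid(M):
    """Python counterpart of qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M)) in R."""
    lo = 1.0 / (2 * M + 2)
    step = (1.0 - 2.0 * lo) / (M - 1)
    return [NormalDist().inv_cdf(lo + k * step) for k in range(M)]


def pseudo_difficulties(b1, e_hat, mu_hat, sigma_hat, lam, tau, eps):
    """Difficulties of the pseudo-items (i, m), m = 1, ..., M, for one item i; Eq. (28)."""
    return [(b1 + e_hat + lam ** 0.5 * tau * em - mu_hat) / sigma_hat for em in eps]


eps = quantile_grid(81)
print(len(eps), round(fmean(eps), 12), round(pvariance(eps), 4))
# Hypothetical item: b_hat_i1 = 0.2, e_hat_i = 0.1, mu_hat = 0.3, sigma_hat = 1.2
pd = pseudo_difficulties(0.2, 0.1, 0.3, 1.2, lam=1.0, tau=0.5, eps=eps)
```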
In this article, we used the quadratic function for SIMEX extrapolation in SL linking (denoted as SLSQ). We also considered the linear extrapolation function in SIMEX for SL linking, abbreviated as SLSL.
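The complete quasi-Monte Carlo SIMEX procedure, yielding SLSL and SLSQ, can be sketched in Python as follows. The group-1 TCF of the $M \cdot I$ pseudo-items is $M$ times the original group-1 TCF. The sketch therefore divides the pseudo-item group-2 TCF by $M$, which rescales the criterion (6) by the constant $M^2$ and leaves its minimizer unchanged. The following are assumptions:

- the $\theta$ grid and the weights;
- the Gauss–Newton optimizer;
- the inclusion of the $\lambda = 0$ estimate (the original SL fit) in the extrapolation regression;
- all function names.

The argument tau2 would be $\hat{\hat{\tau}}^2$ in the article. The usage example has no sampling error and instead plugs in the empirical variance of the $\hat{e}_i$.

```python
import math
from statistics import NormalDist, variance


def psi(x):
    return 1.0 / (1.0 + math.exp(-x))


def tcf(items, grid):
    return [sum(psi(a * (th - b)) for a, b in items) for th in grid]


def quantile_grid(M):  # qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M))
    lo = 1.0 / (2 * M + 2)
    return [NormalDist().inv_cdf(lo + k * (1.0 - 2.0 * lo) / (M - 1)) for k in range(M)]


def sl_link(items1, target, grid, weights, mu=0.0, sigma=1.0):
    """Minimize the SL criterion (6) for a given group-2 TCF by damped Gauss-Newton."""
    def parts(m, s):  # residuals of (6) and d/dmu of the group-1 TCF per grid point
        p = [[psi(a * (s * th + m - b)) for a, b in items1] for th in grid]
        res = [sum(pt) - tg for pt, tg in zip(p, target)]
        der = [sum(q * (1 - q) * a for q, (a, _) in zip(pt, items1)) for pt in p]
        return res, der
    def crit(m, s):
        return sum(w * r * r for w, r in zip(weights, parts(m, s)[0]))
    f = crit(mu, sigma)
    for _ in range(100):
        res, der = parts(mu, sigma)
        z = list(zip(weights, der, grid, res))  # d/dsigma equals theta_t * d/dmu
        j00 = sum(w * g * g for w, g, t, r in z)
        j01 = sum(w * g * g * t for w, g, t, r in z)
        j11 = sum(w * g * g * t * t for w, g, t, r in z)
        r0 = sum(w * g * r for w, g, t, r in z)
        r1 = sum(w * g * t * r for w, g, t, r in z)
        det = j00 * j11 - j01 * j01
        d_mu, d_s = -(j11 * r0 - j01 * r1) / det, -(j00 * r1 - j01 * r0) / det
        step = 1.0
        while step > 1e-8 and crit(mu + step * d_mu, sigma + step * d_s) > f:
            step /= 2.0  # step halving: never increase the criterion
        if step <= 1e-8:
            break
        mu, sigma = mu + step * d_mu, sigma + step * d_s
        f = crit(mu, sigma)
        if abs(step * d_mu) + abs(step * d_s) < 1e-10:
            break
    return mu, sigma


def extrapolate(lams, ests, degree):
    """Least-squares polynomial in lambda of the given degree, evaluated at lambda = -1."""
    n = degree + 1
    A = [[sum(x ** (j + k) for x in lams) for k in range(n)]
         + [sum(y * x ** j for x, y in zip(lams, ests))] for j in range(n)]
    for c in range(n):  # Gauss-Jordan elimination on the augmented normal equations
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(n):
            if r != c:
                A[r] = [u - A[r][c] * v for u, v in zip(A[r], A[c])]
    return sum(A[j][n] * (-1.0) ** j for j in range(n))


def simex_sl(items1, items2, grid, weights, tau2=None, lambdas=(0.5, 1.0, 1.5, 2.0), M=81):
    """Original SL, SLSL, and SLSQ estimates via quasi-Monte Carlo SIMEX."""
    mu0, s0 = sl_link(items1, tcf(items2, grid), grid, weights)              # original SL
    e_hat = [mu0 + s0 * b2 - b1 for (_, b1), (_, b2) in zip(items1, items2)]  # Eq. (25)
    tau2 = variance(e_hat) if tau2 is None else tau2
    eps = quantile_grid(M)
    lams, mus, sigmas = [0.0], [mu0], [s0]
    for lam in lambdas:
        pseudo = [(a2, (b1 + e + math.sqrt(lam * tau2) * em - mu0) / s0)  # Eq. (28)
                  for (_, b1), (a2, _), e in zip(items1, items2, e_hat) for em in eps]
        target = [v / M for v in tcf(pseudo, grid)]  # pseudo-item TCF, rescaled by 1/M
        m_l, s_l = sl_link(items1, target, grid, weights, mu0, s0)
        lams.append(lam)
        mus.append(m_l)
        sigmas.append(s_l)
    return {"SL": (mu0, s0),
            "SLSL": (extrapolate(lams, mus, 1), extrapolate(lams, sigmas, 1)),
            "SLSQ": (extrapolate(lams, mus, 2), extrapolate(lams, sigmas, 2))}


# Idealized items of Section 4 (tau^2 = 0.55); M = 41 instead of 81 keeps the run short
tau, mu0, sigma0 = math.sqrt(0.55), 0.3, 1.2
base_b = [-2.0 + 0.4 * k for k in range(11)]
items1 = [(1.0, b) for b in base_b for _ in range(11)]
items2 = [(sigma0, (b + tau * e - mu0) / sigma0) for b in base_b for e in quantile_grid(11)]
grid = [-4.0 + 0.2 * t for t in range(41)]          # assumed theta grid
weights = [math.exp(-th * th / 2) for th in grid]   # assumed weights
result = simex_sl(items1, items2, grid, weights, M=41)
print({name: tuple(round(v, 3) for v in est) for name, est in result.items()})
```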
It should be emphasized that the described quasi-Monte Carlo SIMEX procedure can be applied to any linking method that uses item parameters as input and is potentially biased in the presence of DIF.

4. Numerical Illustration

In this section, we demonstrate the bias in SL linking for an idealized dataset in the presence of uniform DIF. Because the bias occurs for population-level data, there is no need to simulate item responses. Instead, we show the bias with computations that are based solely on item parameters.
We consider the situation of linking two groups using the 2PL model. The item discriminations $a_i$ ($= a_{i1} = a_{i2}$) are set to 1 in both groups. Eleven base items are defined with equidistant item difficulties −2, −1.6, …, 1.6, and 2. These item parameters are duplicated eleven times. As in the definition of pseudo-items in our SIMEX modification of SL linking, we define a grid of values $\varepsilon_m$ computed by qnorm(seq(1/(2*M+2), 1-1/(2*M+2), length=M)), where $M = 11$ (i.e., M <- 11). These values approximately follow a standard normal distribution, with zero mean and an SD of approximately one. Overall, $11 \times 11 = 121$ items $(i, m)$ are used in this illustration. In the first group, the item parameters are duplicated $M = 11$ times. In the second group, the item discrimination $a_{(i,m)2}$ of item $(i, m)$ equals $a_i$, and the item difficulty $b_{(i,m)2}$ is defined as
$$b_{(i,m)2} = b_i + \tau\varepsilon_m. \tag{29}$$
By this construction, the DIF variance in the test is approximately $\tau^2$. Note that we constructed DIF deterministically in a systematic way, such that items of all difficulties are crossed with all levels of uniform DIF. We computed sets of item parameters for values of $\tau^2$ between 0 and 1. True group differences were simulated by setting $\mu = 0.3$ and $\sigma = 1.2$ in this illustration.
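The idealized computations of this section can be reproduced in outline with the Python sketch below. It builds the $11 \times 11$ items of (29), maps the second group to the identified metric via (5), and runs SL linking for several DIF variances. The first group consists of eleven copies of the base items. The sketch therefore uses the base items once and divides the group-2 TCF by $M = 11$, which leaves the SL minimizer unchanged. The $\theta$ grid, the weights, and the optimizer are assumptions, so the resulting numbers need not match Figures 1 and 2 exactly.

```python
import math
from statistics import NormalDist


def psi(x):
    return 1.0 / (1.0 + math.exp(-x))


def tcf(items, grid):
    return [sum(psi(a * (th - b)) for a, b in items) for th in grid]


def sl_link(items1, target, grid, weights, mu=0.0, sigma=1.0):
    """Minimize the SL criterion (6) for a given group-2 TCF by damped Gauss-Newton."""
    def parts(m, s):  # residuals of (6) and d/dmu of the group-1 TCF per grid point
        p = [[psi(a * (s * th + m - b)) for a, b in items1] for th in grid]
        res = [sum(pt) - tg for pt, tg in zip(p, target)]
        der = [sum(q * (1 - q) * a for q, (a, _) in zip(pt, items1)) for pt in p]
        return res, der
    def crit(m, s):
        return sum(w * r * r for w, r in zip(weights, parts(m, s)[0]))
    f = crit(mu, sigma)
    for _ in range(100):
        res, der = parts(mu, sigma)
        z = list(zip(weights, der, grid, res))  # d/dsigma equals theta_t * d/dmu
        j00 = sum(w * g * g for w, g, t, r in z)
        j01 = sum(w * g * g * t for w, g, t, r in z)
        j11 = sum(w * g * g * t * t for w, g, t, r in z)
        r0 = sum(w * g * r for w, g, t, r in z)
        r1 = sum(w * g * t * r for w, g, t, r in z)
        det = j00 * j11 - j01 * j01
        d_mu, d_s = -(j11 * r0 - j01 * r1) / det, -(j00 * r1 - j01 * r0) / det
        step = 1.0
        while step > 1e-8 and crit(mu + step * d_mu, sigma + step * d_s) > f:
            step /= 2.0  # step halving: never increase the criterion
        if step <= 1e-8:
            break
        mu, sigma = mu + step * d_mu, sigma + step * d_s
        f = crit(mu, sigma)
        if abs(step * d_mu) + abs(step * d_s) < 1e-10:
            break
    return mu, sigma


M, mu0, sigma0 = 11, 0.3, 1.2
lo = 1.0 / (2 * M + 2)
eps = [NormalDist().inv_cdf(lo + k * (1 - 2 * lo) / (M - 1)) for k in range(M)]  # qnorm grid
base = [(1.0, -2.0 + 0.4 * k) for k in range(11)]    # a_i = 1, b_i = -2, -1.6, ..., 2
grid = [-4.0 + 0.2 * t for t in range(41)]           # assumed theta grid
weights = [math.exp(-th * th / 2) for th in grid]    # assumed weights
estimates = {}
for tau2 in (0.0, 0.25, 0.55, 1.0):
    tau = math.sqrt(tau2)
    # Eq. (29) on the group-2 metric, then identified parameters via Eq. (5)
    group2 = [(a * sigma0, (b + tau * e - mu0) / sigma0) for a, b in base for e in eps]
    target = [v / M for v in tcf(group2, grid)]  # group 1 = M copies of the base items
    estimates[tau2] = sl_link(base, target, grid, weights)
    print(tau2, [round(v, 3) for v in estimates[tau2]])
```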
The original SL linking method was applied to these datasets of item parameters. This method was compared with the two SIMEX-based SL methods SLSQ (quadratic extrapolation) and SLSL (linear extrapolation), as well as with the analytical bias correction SLA1. We studied the parameter estimates $\hat{\mu}$ and $\hat{\sigma}$ as a function of the DIF variance $\tau^2$. SIMEX was applied with the four $\lambda$ values 0.5, 1.0, 1.5, and 2.0.
Figure 1 displays the SIMEX parameter curves $\hat{\mu}(\lambda)$ and $\hat{\sigma}(\lambda)$ for the set of item parameters with the DIF variance $\tau^2 = 0.55$ (which corresponds to a DIF SD $\tau = 0.74$). It can be seen in Figure 1 that the linear SIMEX regression curve differed slightly from the quadratic SIMEX regression curve. The estimated mean $\hat{\mu}$ for the original SL linking method was 0.285, which led to a bias of $0.285 - 0.3 = -0.015$. In contrast, SIMEX-based linking with quadratic extrapolation (SLSQ) resulted in an almost unbiased estimate of 0.299. The other bias-corrected methods were also close to the true value, with 0.297 (SLA1) and 0.296 (SLSL).
The differences between the linking methods were more pronounced for the estimated SD $\hat{\sigma}$. SL linking resulted in an estimate of 1.140, which led to a bias of $1.140 - 1.2 = -0.060$. The SIMEX-based linking method SLSQ resulted in a $\sigma$ estimate of 1.196, which again was almost unbiased. In contrast, the methods SLA1 (1.188) and SLSL (1.183) resulted in slight biases. However, all three bias correction SL methods clearly outperformed the original SL linking method in terms of bias.
Figure 2 presents the parameter estimates $\hat{\mu}$ and $\hat{\sigma}$ as a function of the DIF variance $\tau^2$. It is evident that SL linking provided (strongly) biased estimates of $\mu$ and $\sigma$. The bias is an approximately linear function of $\tau^2$. Notably, the bias correction SL methods SLSQ, SLA1, and SLSL were superior to SL because they resulted in parameter estimates close to the true values $\mu = 0.3$ and $\sigma = 1.2$. For large DIF variances, the SIMEX-based linking method SLSQ should be preferred over the alternative bias-corrected SL linking methods in terms of bias. It should be noted that the bias of SL linking is much larger for the estimated SD than for the estimated mean.

5. Simulation Studies

5.1. Simulation Study 1: Balanced Group Sizes

5.1.1. Method

In this simulation study, the 2PL model was used to simulate item responses in two groups. The mean and the standard deviation of the normally distributed ability variable $\theta$ in the first group were set to 0 and 1, respectively. The mean $\mu$ and the SD $\sigma$ of the normally distributed ability variable $\theta$ in the second group were set to 0.3 and 1.2, respectively.
The number of items $I$ in the simulation was varied as 10, 20, and 40. The group-specific item parameters $a_{ig}$ and $b_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$ consisted of two parts. The first part was common item parameters that were fixed in the simulation. The second part was a random, normally distributed uniform DIF effect that was simulated in each replication of the simulation study. For $I = 10$ items, the common item discriminations $a_i$ were chosen as 0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, and 1.11 (M = 1.01, SD = 0.16). The common item difficulties $b_i$ were chosen as −1.74, −1.22, −0.22, 0.54, −0.04, −0.39, −0.73, 0.30, 0.83, and −1.39 (M = −0.41, SD = 0.86). For item numbers that are multiples of 10, we duplicated the item parameters of the 10 items accordingly. The item parameters in the second group included a uniform DIF effect $e_i$ that was added to the common item difficulty $b_i$. The item difficulty $b_{i2}$ in the second group was simulated as $b_{i2} = b_i + e_i$ for $i = 1, \ldots, I$. The DIF effects $e_i$ were independently and identically normally distributed with a mean of zero and a DIF SD $\tau$. The DIF SD $\tau$ was chosen as 0, 0.25, and 0.5 (corresponding to DIF variances $\tau^2$ of 0, 0.0625, and 0.25), indicating no DIF, moderate DIF, and large DIF.
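The reported descriptive statistics of the common item parameters can be checked with a few lines of Python, and the DIF-generating step can be sketched in the same way. The helper simulate_item_parameters is hypothetical and only mirrors the description above: the ten base items are duplicated, and normally distributed uniform DIF is added to the group-2 difficulties. It does not generate item responses.

```python
import random
from statistics import mean, stdev

A_BASE = [0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, 1.11]
B_BASE = [-1.74, -1.22, -0.22, 0.54, -0.04, -0.39, -0.73, 0.30, 0.83, -1.39]


def simulate_item_parameters(n_items, tau, rng):
    """Common a_i and b_i (duplicated for I = 20, 40) and group-2 b_i2 = b_i + e_i."""
    reps = n_items // 10
    a = A_BASE * reps
    b = B_BASE * reps
    b2 = [b_i + rng.gauss(0.0, tau) for b_i in b]  # e_i ~ N(0, tau^2), independent
    return a, b, b2


print(round(mean(A_BASE), 2), round(stdev(A_BASE), 2))  # M and SD of the a_i
print(round(mean(B_BASE), 2), round(stdev(B_BASE), 2))  # M and SD of the b_i
a, b, b2 = simulate_item_parameters(40, 0.5, random.Random(123))
```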
Item responses were simulated according to the 2PL model for sample sizes N per group of 500, 1000, 2000, and 4000.
Six different linking methods were studied in the simulation study to estimate the mean $\mu$ and the SD $\sigma$ in the second group:

- the original Stocking–Lord linking (SL) method;
- the SIMEX-based SL linking methods with quadratic (SLSQ) or linear (SLSL) extrapolation;
- the three analytical bias correction SL methods SLA1, SLA2, and SLA3.

The SIMEX-based linking methods relied on the bias-corrected DIF variance estimate $\hat{\hat{\tau}}^2$.
In each of the 4 (sample size $N$) × 3 (DIF SD $\tau$) × 3 (number of items $I$) = 36 cells of the simulation, 2500 replications were conducted. We computed the empirical bias and the root mean square error (RMSE) for the estimated mean $\hat{\mu}$ and the estimated SD $\hat{\sigma}$. A relative percentage RMSE was computed as 100 times the ratio of the RMSE of a particular linking method to the RMSE of the SIMEX-based SLSQ linking method.
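The performance measures can be written out as follows. This is a Python sketch with hypothetical function names and made-up estimates. The bias is the mean deviation of the replicated estimates from the true value. The RMSE is the root of the mean squared deviation. The relative percentage RMSE uses SLSQ as the reference, so SLSQ itself always scores 100.

```python
import math
from statistics import fmean


def bias_rmse(estimates, true_value):
    """Empirical bias and RMSE of replicated estimates of one parameter."""
    bias = fmean(estimates) - true_value
    rmse = math.sqrt(fmean((x - true_value) ** 2 for x in estimates))
    return bias, rmse


def relative_rmse(rmse_method, rmse_slsq):
    """Relative percentage RMSE with SLSQ as the reference method (SLSQ = 100)."""
    return 100.0 * rmse_method / rmse_slsq


# Hypothetical estimates of sigma = 1.2 from three replications for two methods
bias_sl, rmse_sl = bias_rmse([1.10, 1.16, 1.13], 1.2)
bias_sq, rmse_sq = bias_rmse([1.15, 1.24, 1.20], 1.2)
print(round(bias_sl, 3), round(relative_rmse(rmse_sl, rmse_sq), 1))
```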
The R software (Version 4.2.3) [33] was used for the entire analysis in this simulation study. The 2PL model was fitted using the sirt::xxirt() function in the R package sirt [34]. The author of this article wrote dedicated R functions for the original and bias-corrected SL linking methods. These functions and replication material for Simulation Study 1 can be found at https://osf.io/kjusm (accessed on 12 May 2024).

5.1.2. Results

Table 1 displays the bias and the relative RMSE of the estimated group mean $\hat{\mu}$ as a function of the uniform DIF SD $\tau$, the number of items $I$, and the sample size $N$. It turned out that there was only a noticeable bias in $\hat{\mu}$ for SL linking in the large DIF condition of $\tau = 0.5$. Interestingly, SL was biased in these conditions, and in these same conditions its relative RMSE also differed most from that of the bias-corrected SL linking methods.
We also computed descriptive statistics in which we averaged the absolute bias and the relative RMSE across simulation conditions. SL had the worst performance regarding absolute bias (SL: M = 0.009, SD = 0.008). However, the differences between the bias-corrected SL linking methods regarding the absolute bias in $\hat{\mu}$ were negligible (SLSQ: M = 0.002, SD = 0.002; SLSL: M = 0.002, SD = 0.002; SLA1: M = 0.004, SD = 0.003; SLA2: M = 0.004, SD = 0.003; SLA3: M = 0.002, SD = 0.002). Further, the RMSE differences for $\hat{\mu}$ between the SL linking methods can also be considered relatively minor (SL: M = 99.25, SD = 0.93; SLSQ: M = 100, SD = 0; SLSL: M = 99.73, SD = 0.33; SLA1: M = 100.52, SD = 0.47; SLA2: M = 100.49, SD = 0.41; SLA3: M = 100.01, SD = 0.07).
Table 2 presents the results for the bias and the relative RMSE of the SD estimate $\hat{\sigma}$. The original SL linking method was negatively biased in the large DIF condition. Importantly, the bias correction SL linking methods essentially removed the bias in $\hat{\sigma}$. The exception was SLA1 and SLA2 in the condition of $N = 500$. SIMEX-based SL linking with quadratic extrapolation (SLSQ) had less bias than SIMEX-based SL linking with linear extrapolation (SLSL). However, SLSL had a smaller RMSE in the large DIF condition $\tau = 0.5$.
We also computed descriptive statistics averaged across all simulation conditions to summarize the performance of the SL linking estimators. Overall, concerning absolute bias, SLSQ can be considered the frontrunner, followed by SLA3, SLSL, SLA1, and SLA2 (SL: M = 0.016, SD = 0.016; SLSQ: M = 0.003, SD = 0.002; SLSL: M = 0.004, SD = 0.003; SLA1: M = 0.005, SD = 0.004; SLA2: M = 0.006, SD = 0.005; SLA3: M = 0.003, SD = 0.002). As expected, the original SL linking had the worst performance regarding absolute bias. The SLA1 method had a substantially larger RMSE than SLA2 in the large DIF condition. In larger samples, the negative bias of SL linking translated into a clearly inferior relative RMSE (SL: M = 105.45, SD = 11.08, Min = 97.1, Max = 147.0). Overall, regarding the relative RMSE, the analytical bias correction SL methods SLA2 and SLA3 performed quite similarly to SLSQ and SLSL (SLSQ: M = 100, SD = 0; SLSL: M = 99.49, SD = 0.76; SLA1: M = 102.51, SD = 2.47; SLA2: M = 101.90, SD = 1.18; SLA3: M = 100.88, SD = 1.16).

5.2. Simulation Study 2: Unbalanced or Small Group Sizes

5.2.1. Method

In Simulation Study 2, we investigated unbalanced or smaller group sizes. The same item parameters and distribution parameters as in Simulation Study 1 were used (see Section 5.1.1). However, we only simulated the condition with I = 20 items to reduce computation time.
For the first group, the sample size $N_1$ was chosen as 250, 500, 1000, and 2000. The sample size $N_2$ for the second group was chosen from the same values. Only sample size combinations $(N_1, N_2)$ fulfilling $N_1 \le N_2$ were simulated, resulting in 10 conditions. Moreover, the uniform DIF SD $\tau$ was chosen as 0, 0.25, and 0.5.
In total, 2000 replications were conducted in each of the 10 (sample size combinations $(N_1, N_2)$) × 3 (DIF SD $\tau$) = 30 cells of the simulation. The same analysis strategy as in Simulation Study 1 was chosen.
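The design of Simulation Study 2 can be enumerated directly. The short Python sketch below (variable names hypothetical) lists the sample size combinations with $N_1 \le N_2$ and crosses them with the three DIF SDs.

```python
from itertools import combinations_with_replacement, product

SIZES = (250, 500, 1000, 2000)
pairs = list(combinations_with_replacement(SIZES, 2))  # (N1, N2) with N1 <= N2
cells = list(product(pairs, (0.0, 0.25, 0.5)))          # crossed with the DIF SD tau
print(len(pairs), len(cells))
```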

5.2.2. Results

Table 3 presents the bias and relative RMSE of the estimated group mean $\hat{\mu}$. Overall, the results for unbalanced group sizes mirrored those for balanced group sizes. The performance of the different estimators was mainly driven by the group with the smaller sample size. The SIMEX-based bias corrections (SLSQ and SLSL) had particular advantages over the analytical bias corrections (SLA1, SLA2, and SLA3) in the conditions with the small sample size $N_1 = 250$.
Table 4 presents the bias and relative RMSE for the estimated SD $\hat{\sigma}$. Again, the findings for unbalanced sample sizes were similar to those for balanced sample sizes. Importantly, the analytical bias corrections did not perform well in the conditions with the smallest sample size $N_1 = 250$. The SIMEX-based bias correction methods also had issues for small sample sizes.

5.3. Simulation Study 3: Misspecified IRT Model

5.3.1. Method

In Simulation Study 3, we investigated the impact of misspecified IRT models. Item responses in this simulation study were simulated according to the logistic positive exponential (LPE) model [35,36,37,38]. The IRF of the LPE model is given by
$$P_i(\theta; \gamma_i) = \left[\Psi\big(a_i(\theta - b_i)\big)\right]^{\xi_i} \qquad (30)$$
where $\xi_i > 0$ is the exponent applied to the logistic IRF. Note that Equation (30) reduces to the 2PL model for $\xi_i = 1$. The linking was based on the misspecified 2PL model. To reduce computation time, we simulated only one condition for the number of items, $I = 20$.
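As a minimal sketch, the LPE IRF in Equation (30) can be written in Python as follows, assuming that $\Psi$ denotes the logistic distribution function, as in the 2PL model. Because $0 < \Psi < 1$, exponents $\xi_i > 1$ lower the response probability at every $\theta$, whereas $\xi_i < 1$ raise it, which yields the asymmetric IRFs that a 2PL model cannot reproduce.

```python
import numpy as np


def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))


def lpe_irf(theta, a, b, xi):
    """LPE item response function Psi(a * (theta - b)) ** xi.

    For xi = 1 this is the 2PL item response function.
    """
    return logistic(a * (np.asarray(theta, dtype=float) - b)) ** xi


theta = np.linspace(-3.0, 3.0, 7)
p_2pl = lpe_irf(theta, a=1.3, b=0.2, xi=1.0)   # coincides with the 2PL IRF
p_hard = lpe_irf(theta, a=1.3, b=0.2, xi=2.0)  # lower probabilities everywhere
```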
The item discriminations $a_i$ and item difficulties $b_i$ were the same as in Simulation Studies 1 and 2. We now describe the choice of the exponent parameters $\xi_i$. Items 11 to 20 had the same item parameters as items 1 to 10. Two types of $\xi_i$ parameters in the LPE model were simulated: unbalanced and balanced $\xi_i$ parameters around the value $\xi_i = 1$, which corresponds to the 2PL model. In the unbalanced case, all $\xi_i$ parameters were equal to a common parameter $\xi_0$ that was chosen as either 0.5 or 2. In the balanced case, the $\xi_i$ parameters of the first five items (items 1 to 5) were $1/\xi_0$, while they were $\xi_0$ for the next five items (items 6 to 10). The common parameter $\xi_0$ was chosen as 0.5, 1, or 2. Note that the condition $\xi_0 = 1$ corresponds to the 2PL model. Therefore, five conditions of item parameters in the LPE model were simulated.
Balanced group sizes with sample sizes $N$ = 500, 1000, 2000, and 4000 were simulated. Moreover, uniform DIF in the item difficulties $b_i$ was simulated with a DIF SD $\tau$ of 0, 0.25, and 0.5. For reasons of space, the condition $\tau = 0.25$ is not reported in the following subsection.
In each of the 4 (sample size $N$) × 3 (DIF standard deviation $\tau$) × 5 (choice of $\xi_i$ parameters) = 60 cells of the simulation, 3000 replications were conducted. The same analysis strategy as in Simulation Study 1 was chosen.
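The $\xi_i$ conditions and the full design grid of Simulation Study 3 can be enumerated with a short Python sketch. The function name `xi_parameters` is hypothetical, and the sketch assumes that items 11 to 20 repeat the $\xi_i$ pattern of items 1 to 10.

```python
from itertools import product

import numpy as np


def xi_parameters(design, xi0, n_items=20):
    """Hypothetical helper returning the LPE exponents xi_i for one condition.

    'unbalanced': all items share xi0.
    'balanced':   items 1-5 get 1/xi0 and items 6-10 get xi0; items 11-20
                  repeat this pattern (an assumption of this sketch).
    """
    if design == "unbalanced":
        return np.full(n_items, float(xi0))
    block = np.concatenate([np.full(5, 1.0 / xi0), np.full(5, float(xi0))])
    return np.tile(block, n_items // 10)


xi_conditions = [("unbalanced", 0.5), ("unbalanced", 2.0),
                 ("balanced", 0.5), ("balanced", 1.0), ("balanced", 2.0)]
sample_sizes = [500, 1000, 2000, 4000]
dif_sds = [0.0, 0.25, 0.5]
cells = list(product(sample_sizes, dif_sds, xi_conditions))
print(len(cells))  # 60
```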

5.3.2. Results

Table 5 shows the bias and relative RMSE for the estimated group mean $\hat{\mu}$ for item responses simulated from the LPE model. In the case of balanced $\xi_i$ parameters, the group mean was essentially unbiased. However, the estimated group mean was biased in the two conditions with unbalanced $\xi_i$ parameters. This was unsurprising because the 2PL analysis model was misspecified. If the 2PL model is intentionally chosen as the scaling model, the pseudo-true population parameter is the group mean obtained with an infinite sample size and a DIF SD $\tau = 0$. Hence, the adequacy of the bias correction methods for the 2PL model in the DIF condition $\tau = 0.5$ should be judged by whether the estimated group mean is similar to that in the no-DIF condition $\tau = 0$ with the large sample size $N = 4000$. Table 5 shows that the SIMEX-based bias correction method SLSQ was quite successful in this respect.
Table 6 presents the bias and relative RMSE for the estimated group SD $\hat{\sigma}$. As for the estimated group mean $\hat{\mu}$, the average estimates from the 2PL model differed from the data-generating values of the LPE model when item responses were simulated with unbalanced $\xi_i$ parameters. Again, the average estimates based on the SIMEX-based and analytical bias correction methods were similar to those in the no-DIF condition with a large sample size.
To conclude, (SIMEX-based) bias correction in SL linking remains beneficial when the 2PL model is misspecified. In these cases, the distribution parameters from the 2PL model estimate a pseudo-true population parameter that is defined for an infinite sample size and no DIF.

6. Discussion

In this article, different bias correction estimators for SL linking were compared. The previous literature highlighted that SL linking results in a substantial negative bias in the estimated group standard deviation and a moderate bias in the estimated group mean. This article proposed a SIMEX-based bias correction for SL linking that removed most of the bias in (large) DIF conditions and did not lead to practically relevant efficiency losses in no-DIF conditions. Overall, the SIMEX-based SL methods had slight advantages over the analytical bias corrections for SL linking. However, the main advantage of the SIMEX method is that it is entirely computational and does not require analytical derivations. Hence, SIMEX-based bias correction can, in principle, be applied to any linking method that could be affected by DIF.
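The computational character of SIMEX can be seen in a generic Python sketch that follows the classical measurement-error formulation of Cook and Stefanski [28] rather than the article's SL-specific implementation. The estimator is re-run after adding extra error with variance $\lambda$ times the known error variance, the averaged estimates are regressed on $\lambda$ (linearly or quadratically), and the fitted curve is evaluated at $\lambda = -1$. For SL linking, one would presumably pass the SL estimator, perturb the DIF-affected item parameters, and use an estimate of the DIF variance $\tau^2$; that mapping, the grid of $\lambda$ values, and `n_sim` are assumptions here. The toy example is an attenuated regression slope, not SL linking.

```python
import numpy as np


def simex(estimate, w, error_var, lambdas=(0.5, 1.0, 1.5, 2.0),
          n_sim=50, degree=2, seed=0):
    """Generic SIMEX estimate (degree 1 = linear, 2 = quadratic extrapolation).

    estimate:  function mapping error-prone values w to a scalar estimate
    w:         observed values contaminated by error with variance error_var
    """
    rng = np.random.default_rng(seed)
    lam_grid, means = [0.0], [estimate(w)]
    for lam in lambdas:
        # Simulation step: inflate the total error variance to (1 + lam) * error_var.
        reps = []
        for _ in range(n_sim):
            extra = rng.normal(0.0, np.sqrt(lam * error_var), size=w.shape)
            reps.append(estimate(w + extra))
        lam_grid.append(lam)
        means.append(np.mean(reps))
    # Extrapolation step: fit a polynomial in lambda and evaluate it at
    # lambda = -1, which corresponds to zero error variance.
    coefs = np.polyfit(lam_grid, means, deg=degree)
    return float(np.polyval(coefs, -1.0))


# Toy illustration (not SL linking): slope attenuated by predictor error.
rng = np.random.default_rng(1)
n, error_var = 20_000, 0.5
x = rng.normal(size=n)                                # true predictor
w = x + rng.normal(0.0, np.sqrt(error_var), size=n)   # error-prone predictor
y = x + rng.normal(0.0, 0.5, size=n)                  # outcome, true slope 1


def slope(w_obs):
    """Naive OLS slope of y on an error-prone predictor."""
    return np.cov(w_obs, y)[0, 1] / np.var(w_obs, ddof=1)


naive = slope(w)
simex_lin = simex(slope, w, error_var, degree=1)
simex_quad = simex(slope, w, error_var, degree=2)
```

In this toy setting, the naive slope is attenuated to roughly 2/3, and the quadratic extrapolation removes more of the attenuation than the linear one, although neither fully recovers the true slope of 1.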
It has been repeatedly pointed out that the presence of DIF effects requires identification constraints for the group mean and group standard deviation if the item parameters are not assumed to be invariant [39,40,41,42]. For example, one could assume that the mean of the uniform DIF effects equals zero. This case is referred to as DIF cancellation [43] or balanced DIF [44]. In this article, we assumed that the DIF effects have a zero mean in the population (i.e., $E(e_i) = 0$); that is, they are centered across hypothetical replications of the experiment. This random DIF assumption stands in stark contrast to the commonly employed fixed DIF assumption, in which the DIF effects are treated as fixed parameters. We think that, in the latter situation, researchers intentionally define a pseudo-true parameter by choosing a particular linking method [45]. Hence, any choice of a linking method and a structural assumption on DIF effects can be defended by a researcher in an empirical study. There is a tendency in the psychometric literature to rely on a partial invariance assumption for DIF effects (e.g., [46,47,48]). We believe that the bias correction methods for Stocking–Lord linking proposed in this article are adequate for the random DIF situation but would likely be less effective in the fixed DIF case.
An anonymous reviewer wondered why we only considered Stocking–Lord linking in a “naïve” implementation in which the presence of DIF effects is essentially ignored. This reviewer suggested removing items identified as exhibiting DIF from the linking, an approach discussed in the literature as iterative linking or scale purification [49,50,51,52,53]. We do not think that it is generally advisable to mindlessly eliminate items that are potentially prone to DIF from group comparisons in linking procedures because, in our view, researchers should only remove items from a scale (or an analysis) if the DIF is shown to be construct-irrelevant [54,55]. Unfortunately, such iterative approaches are also implemented in major large-scale educational assessment studies like the Programme for International Student Assessment (PISA; [56,57]), which serve as methodological blueprints for empirical research. Iterative linking procedures frequently lead to findings similar to those of the regularized estimation approach [58,59,60] to DIF effects (see [61]).
Simulation Study 3 only considered one type of misspecified IRT model. It might be interesting to investigate whether its findings generalize to other complex IRT models, such as the filtered monotonic polynomial IRT model [62,63,64,65] or the four-parameter logistic IRT model [66,67].
A reviewer wondered whether the proposed bias correction methods could also be used to remove the potential bias due to sampling variation in the item parameters. However, the findings of Simulation Study 2 in the condition with a small sample size $N_1 = 250$ and no DIF indicated that such bias correction methods may even reduce the accuracy of the estimates. For larger sample sizes, the sampling error in the item parameters is likely less relevant than the variance of the DIF effects, which is the primary source of bias in Stocking–Lord linking.
An anonymous reviewer wondered whether the assumption of independent DIF effects could be weakened. In fact, the Taylor expansion could be extended to include differential testlet effects [68]. Therefore, the analytical bias correction methods could also accommodate testlet effects (see [14] for a similar approach). Moreover, the SIMEX method can also include variance components that refer to additional testlet effects. The performance of the bias corrections in the presence of differential testlet effects could be investigated in future research.
An anonymous reviewer commented that it might be unclear how the modified linking methods would perform in the presence of nonuniform DIF. First, in our research experience, uniform DIF is more prevalent than nonuniform DIF [69]. Second, both bias correction approaches can be extended to handle nonuniform DIF effects. The Taylor expansion in Section 2 can be extended to include nonuniform DIF effects, which then also yields a bias-corrected estimator. Moreover, SIMEX can also be applied to multivariate predictor variables prone to measurement error [26]. To apply the SIMEX method to Stocking–Lord linking in this case, only the covariance matrix of the DIF effects, instead of a scalar DIF standard deviation, must be known (or estimated). Finally, we note that even recent articles in highly ranked journals that use regularization approaches to handle DIF effects treated only the case of uniform DIF effects [70].
Future research might also investigate the performance of SIMEX-based bias correction methods for nonrobust or robust variants of SL linking (see also [71,72,73,74,75,76,77]). Furthermore, it would be interesting to adapt the methodology to polytomous items. In our study, we only considered asymmetric SL linking. SIMEX-based bias correction could also be applied to symmetric SL linking [17,78]. However, previous studies have shown that symmetric SL linking in its original form already has a smaller bias than asymmetric SL linking in the presence of DIF.
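For the multivariate extension discussed in the paragraph on nonuniform DIF, the SIMEX simulation step would draw pseudo-DIF vectors whose covariance matrix is $\lambda$ times the (known or estimated) DIF covariance matrix, instead of scalar noise. The Python sketch below illustrates only this step; the helper names and the parametrization of DIF on log-discriminations and difficulties are hypothetical choices, not the article's specification.

```python
import numpy as np


def pseudo_dif(sigma, lam, n_items, rng):
    """Draw extra DIF effects with covariance lam * sigma for each item.

    sigma: 2x2 covariance matrix of (log-discrimination, difficulty) DIF
           effects; this parametrization is a hypothetical choice.
    Returns an array of shape (n_items, 2).
    """
    sigma = np.asarray(sigma, dtype=float)
    return rng.multivariate_normal(np.zeros(2), lam * sigma, size=n_items)


def perturb_items(a, b, sigma, lam, rng):
    """Hypothetical helper: add pseudo-DIF to item discriminations and difficulties."""
    d = pseudo_dif(sigma, lam, len(a), rng)
    # DIF on the log scale keeps the perturbed discriminations positive.
    return np.asarray(a) * np.exp(d[:, 0]), np.asarray(b) + d[:, 1]


rng = np.random.default_rng(0)
sigma = np.array([[0.04, 0.01], [0.01, 0.25]])
a_new, b_new = perturb_items(np.ones(5), np.zeros(5), sigma, lam=1.0, rng=rng)
```

Each perturbed set of item parameters would then be passed to the linking method, exactly as in the univariate SIMEX procedure.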

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
2PL: two-parameter logistic
IRF: item response function
IRT: item response theory
LPE: logistic positive exponential
RMSE: root mean square error
SD: standard deviation
SIMEX: simulation extrapolation
SL: Stocking–Lord

References

1. Bock, R.D.; Moustaki, I. Item response theory in a general framework. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 469–513.
2. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item Response Theory—A Statistical Framework for Educational and Psychological Measurement. Stat. Sci. 2023. Available online: https://imstat.org/journals-and-publications/statistical-science/statistical-science-future-papers/ (accessed on 15 March 2012).
3. van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30.
4. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
5. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
6. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236.
7. Glas, C.A.W. Maximum-likelihood estimation. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 197–216.
8. Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673.
9. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
10. Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica 2017, 77, 329–352.
11. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
12. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
13. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335.
14. Robitzsch, A. Linking error in the 2PL model. J 2023, 6, 58–84.
15. Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. 2016, 53, 152–171.
16. Wu, M. Measurement, sampling, and equating errors in large-scale assessments. Educ. Meas. 2010, 29, 15–27.
17. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144.
18. De Boeck, P. Random item IRT models. Psychometrika 2008, 73, 533–559.
19. Fox, J.P.; Verhagen, A.J. Random item effects modeling for cross-national survey data. In Cross-cultural Analysis: Methods and Applications; Davidov, E., Schmidt, P., Billiet, J., Eds.; Routledge: London, UK, 2010; pp. 461–482.
20. Longford, N.T.; Holland, P.W.; Thayer, D.T. Stability of the MH D-DIF statistics across populations. In Differential Item Functioning; Holland, P.W., Wainer, H., Eds.; Routledge: London, UK, 1993; pp. 171–196.
21. Stocking, M.L.; Lord, F.M. Developing a common metric in item response theory. Appl. Psychol. Meas. 1983, 7, 201–210.
22. Haebara, T. Equating logistic ability scales by a weighted least squares method. Jpn. Psychol. Res. 1980, 22, 144–149.
23. Kang, T.; Petersen, N.S. Linking item parameters to a base scale. Asia Pacific Educ. Rev. 2012, 13, 311–321.
24. Kilmen, S.; Demirtasli, N. Comparison of test equating methods based on item response theory according to the sample size and ability distribution. Procedia Soc. Behav. Sci. 2012, 46, 130–134.
25. Lee, W.C.; Ban, J.C. A comparison of IRT linking procedures. Appl. Meas. Educ. 2009, 23, 23–48.
26. Carroll, R.J.; Ruppert, D.; Stefanski, L.A.; Crainiceanu, C.M. Measurement Error in Nonlinear Models: A Modern Perspective; Chapman and Hall/CRC: Boca Raton, FL, USA, 2006.
27. Robitzsch, A. Bias-reduced Haebara and Stocking-Lord linking in the presence of differential item functioning. PsyArXiv 2024.
28. Cook, J.R.; Stefanski, L.A. Simulation-extrapolation estimation in parametric measurement error models. J. Am. Stat. Assoc. 1994, 89, 1314–1328.
29. Carroll, R.J.; Küchenhoff, H.; Lombard, F.; Stefanski, L.A. Asymptotics for the SIMEX estimator in nonlinear measurement error models. J. Am. Stat. Assoc. 1996, 91, 242–250.
30. Stefanski, L.A.; Cook, J.R. Simulation-extrapolation: The measurement error jackknife. J. Am. Stat. Assoc. 1995, 90, 1247–1256.
31. Lederer, W.; Küchenhoff, H. A short introduction to the SIMEX and MCSIMEX. R News 2006, 6, 26–31.
32. Buonaccorsi, J.P. Measurement Error: Models, Methods, and Applications; CRC Press: Boca Raton, FL, USA, 2010.
33. R Core Team. R: A Language and Environment for Statistical Computing, 2023; R Foundation for Statistical Computing: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 15 March 2023).
34. Robitzsch, A. sirt: Supplementary Item Response Theory Models, R Package Version 4.2-57; CRAN: Vienna, Austria, 2024; Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 20 April 2024).
35. Samejima, F. Logistic positive exponent family of models: Virtue of asymmetric item characteristic curves. Psychometrika 2000, 65, 319–335.
36. Bolt, D.M.; Deng, S.; Lee, S. IRT model misspecification and measurement of growth in vertical scaling. J. Educ. Meas. 2014, 51, 141–162.
37. Bolfarine, H.; Bazán, J.L. Bayesian estimation of the logistic positive exponent IRT model. J. Educ. Behav. Stat. 2010, 35, 693–713.
38. Huang, Q.; Bolt, D.M.; Lyu, W. Investigating item complexity as a source of cross-national DIF in TIMSS math and science. Large-Scale Assess. Educ. 2024, 12, 12.
39. Bechger, T.M.; Maris, G. A statistical test for differential item pair functioning. Psychometrika 2015, 80, 317–340.
40. Doebler, A. Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Appl. Psychol. Meas. 2019, 43, 303–321.
41. Robitzsch, A.; Lüdtke, O. A review of different scaling approaches under full invariance, partial invariance, and noninvariance for cross-sectional country comparisons in large-scale assessments. Psychol. Test Assess. Model. 2020, 62, 233–279.
42. Wang, W.C.; Shih, C.L.; Sun, G.W. The DIF-free-then-DIF strategy for the assessment of differential item functioning. Educ. Psychol. Meas. 2012, 72, 687–708.
43. Sireci, S.G.; Rios, J.A. Decisions that make a difference in detecting differential item functioning. Educ. Res. Eval. 2013, 19, 170–187.
44. Schulze, D.; Reuter, B.; Pohl, S. Measurement invariance: Dealing with the uncertainty in anchor item choice by model averaging. Struct. Equ. Model. 2022, 22, 521–530.
45. Robitzsch, A.; Lüdtke, O. Why full, partial, or approximate measurement invariance are not a prerequisite for meaningful and valid group comparisons. Struct. Equ. Model. 2023, 30, 859–870.
46. Oliveri, M.E.; von Davier, M. Investigation of model fit and score scale comparability in international assessments. Psychol. Test Assess. Model. 2011, 53, 315–333.
47. Pohl, S.; Schulze, D.; Stets, E. Partial measurement invariance: Extending and evaluating the cluster approach for identifying anchor items. Appl. Psychol. Meas. 2021, 45, 477–493.
48. von Davier, M.; Bezirhan, U. A robust method for detecting item misfit in large scale assessments. Educ. Psychol. Meas. 2023, 83, 740–765.
49. Lord, F.M. Applications of Item Response Theory to Practical Testing Problems; Erlbaum: Hillsdale, NJ, USA, 1980.
50. Candell, G.L.; Drasgow, F. An iterative procedure for linking metrics and assessing item bias in item response theory. Appl. Psychol. Meas. 1988, 12, 253–260.
51. Kim, S.H.; Cohen, A.S. Effects of linking methods on detection of DIF. J. Educ. Meas. 1992, 29, 51–66.
52. Park, D.G.; Lautenschlager, G.J. Improving IRT item bias detection with iterative linking and ability scale purification. Appl. Psychol. Meas. 1990, 14, 163–173.
53. Seybert, J.; Stark, S. Iterative linking with the differential functioning of items and tests (DFIT) method: Comparison of testwide and item parameter replication (IPR) critical values. Appl. Psychol. Meas. 2012, 36, 494–515.
54. Camilli, G. The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In Differential Item Functioning: Theory and Practice; Holland, P.W., Wainer, H., Eds.; Erlbaum: Hillsdale, NJ, USA, 1993; pp. 397–417.
55. Robitzsch, A.; Lüdtke, O. Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies. Meas. Instrum. Soc. Sci. 2022, 4, 9.
56. OECD. PISA 2018. Technical Report; OECD: Paris, France, 2020.
57. von Davier, M.; Yamamoto, K.; Shin, H.J.; Chen, H.; Khorramdel, L.; Weeks, J.; Davis, S.; Kong, N.; Kandathil, M. Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assess. Educ. 2019, 26, 466–488.
58. Belzak, W.; Bauer, D.J. Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychol. Methods 2020, 25, 673–690.
59. Magis, D.; Tuerlinckx, F.; De Boeck, P. Detection of differential item functioning using the lasso approach. J. Educ. Behav. Stat. 2015, 40, 111–135.
60. Tutz, G.; Schauberger, G. A penalty approach to differential item functioning in Rasch models. Psychometrika 2015, 80, 21–43.
61. Robitzsch, A. Comparing robust linking and regularized estimation for linking two groups in the 1PL and 2PL models in the presence of sparse uniform differential item functioning. Stats 2023, 6, 192–208.
62. Falk, C.F.; Cai, L. Semiparametric item response functions in the context of guessing. J. Educ. Meas. 2016, 53, 229–247.
63. Feuerstahler, L.M. Metric transformations and the filtered monotonic polynomial item response model. Psychometrika 2019, 84, 105–123.
64. Feuerstahler, L. Flexible item response modeling in R with the flexmet package. Psych 2021, 3, 447–478.
65. Liang, L.; Browne, M.W. A quasi-parametric method for fitting flexible item response functions. J. Educ. Behav. Stat. 2015, 40, 5–34.
66. Culpepper, S.A. The prevalence and implications of slipping on low-stakes, large-scale assessments. J. Educ. Behav. Stat. 2017, 42, 706–725.
67. Liao, X.; Bolt, D.M. Item characteristic curve asymmetry: A better way to accommodate slips and guesses than a four-parameter model? J. Educ. Behav. Stat. 2021, 46, 753–775.
68. Paek, I.; Fukuhara, H. An investigation of DIF mechanisms in the context of differential testlet effects. Brit. J. Math. Stat. Psychol. 2015, 68, 142–157.
69. Rutkowski, L.; Svetina, D. Assessing the hypothesis of measurement invariance in the context of large-scale international surveys. Educ. Psychol. Meas. 2014, 74, 31–57.
70. Chen, Y.; Li, C.; Ouyang, J.; Xu, G. DIF statistical inference without knowing anchoring items. Psychometrika 2023, 88, 1097–1122.
71. Halpin, P.F. Differential item functioning via robust scaling. Psychometrika, 2024; epub ahead of print.
72. He, Y.; Cui, Z.; Fang, Y.; Chen, H. Using a linear regression method to detect outliers in IRT common item equating. Appl. Psychol. Meas. 2013, 37, 522–540.
73. He, Y.; Cui, Z. Evaluating robust scale transformation methods with multiple outlying common items under IRT true score equating. Appl. Psychol. Meas. 2020, 44, 296–310.
74. Magis, D.; De Boeck, P. Identification of differential item functioning in multiple-group settings: A multivariate outlier detection approach. Multivar. Behav. Res. 2011, 46, 733–755.
75. Robitzsch, A. Robust Haebara linking for many groups: Performance in the case of uniform DIF. Psych 2020, 2, 155–173.
76. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
77. Wang, W.; Liu, Y.; Liu, H. Testing differential item functioning without predefined anchor items using robust regression. J. Educ. Behav. Stat. 2022, 47, 666–692.
78. Weeks, J.P. plink: An R package for linking mixed-format tests using IRT-based methods. J. Stat. Softw. 2010, 35, 1–33.
Figure 1. Numerical Illustration: SIMEX curves and parameter estimates for the mean $\mu(\lambda)$ and standard deviation $\sigma(\lambda)$ for a DIF variance $\tau^2 = 0.55$ ($\tau = 0.74$). The red solid line displays the quadratic SIMEX regression function, while the red dashed line displays the linear SIMEX regression function.
Figure 2. Numerical Illustration: Estimated mean $\hat{\mu}$ and estimated standard deviation $\hat{\sigma}$ as a function of the DIF variance $\tau^2$ for Stocking–Lord linking (SL), SIMEX-based Stocking–Lord linking with a linear extrapolation (SLSL), analytical bias correction for Stocking–Lord linking according to Equation (15) (SLA1), and SIMEX-based Stocking–Lord linking with a quadratic extrapolation (SLSQ).
Table 1. Simulation Study 1: Bias and relative root mean square error (RMSE) for the estimated mean $\hat{\mu}$ as a function of the uniform DIF standard deviation $\tau$, number of items $I$, and sample size per group $N$.
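| $I$ | $N$ | Bias SL | Bias SLSQ | Bias SLSL | Bias SLA1 | Bias SLA2 | Bias SLA3 | Rel. RMSE SL | Rel. RMSE SLSQ | Rel. RMSE SLSL | Rel. RMSE SLA1 | Rel. RMSE SLA2 | Rel. RMSE SLA3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| No DIF (DIF SD $\tau = 0$) | | | | | | | | | | | | | |
| 10 | 500 | 0.004 | 0.005 | 0.005 | 0.012 | 0.010 | 0.005 | 99.9 | 100 | 100.0 | 101.6 | 101.1 | 100.0 |
| 10 | 1000 | 0.002 | 0.002 | 0.002 | 0.005 | 0.004 | 0.002 | 99.9 | 100 | 100.0 | 100.6 | 100.4 | 100.0 |
| 10 | 2000 | 0.000 | 0.000 | 0.000 | 0.002 | 0.001 | 0.000 | 100.0 | 100 | 100.0 | 100.2 | 100.2 | 100.0 |
| 10 | 4000 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 100.0 | 100 | 100.0 | 100.1 | 100.1 | 100.0 |
| 20 | 500 | 0.002 | 0.003 | 0.003 | 0.009 | 0.007 | 0.003 | 99.9 | 100 | 100.0 | 101.1 | 100.9 | 100.0 |
| 20 | 1000 | 0.003 | 0.004 | 0.004 | 0.006 | 0.006 | 0.004 | 99.9 | 100 | 100.0 | 100.6 | 100.5 | 100.0 |
| 20 | 2000 | 0.002 | 0.002 | 0.002 | 0.004 | 0.003 | 0.002 | 100.0 | 100 | 100.0 | 100.4 | 100.3 | 100.0 |
| 20 | 4000 | 0.000 | 0.000 | 0.000 | 0.001 | 0.000 | 0.000 | 100.0 | 100 | 100.0 | 100.1 | 100.1 | 100.0 |
| 40 | 500 | 0.001 | 0.001 | 0.001 | 0.007 | 0.005 | 0.001 | 99.9 | 100 | 100.0 | 100.8 | 100.7 | 100.0 |
| 40 | 1000 | 0.002 | 0.003 | 0.003 | 0.005 | 0.005 | 0.003 | 100.0 | 100 | 100.0 | 100.5 | 100.5 | 100.0 |
| 40 | 2000 | 0.002 | 0.002 | 0.002 | 0.003 | 0.003 | 0.002 | 100.0 | 100 | 100.0 | 100.3 | 100.3 | 100.0 |
| 40 | 4000 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 100.0 | 100 | 100.0 | 100.2 | 100.2 | 100.0 |
| Moderate uniform DIF (DIF SD $\tau = 0.25$) | | | | | | | | | | | | | |
| 10 | 500 | −0.005 | −0.001 | −0.001 | 0.007 | 0.006 | −0.001 | 99.4 | 100 | 99.9 | 101.1 | 101.2 | 100.1 |
| 10 | 1000 | −0.001 | 0.004 | 0.003 | 0.007 | 0.007 | 0.004 | 99.4 | 100 | 99.9 | 100.7 | 100.7 | 100.1 |
| 10 | 2000 | −0.004 | 0.001 | 0.000 | 0.002 | 0.002 | 0.001 | 99.3 | 100 | 99.9 | 100.3 | 100.3 | 100.0 |
| 10 | 4000 | −0.001 | 0.004 | 0.003 | 0.004 | 0.004 | 0.004 | 99.1 | 100 | 99.9 | 100.4 | 100.2 | 100.1 |
| 20 | 500 | −0.003 | 0.002 | 0.001 | 0.008 | 0.007 | 0.002 | 99.4 | 100 | 99.9 | 101.2 | 101.1 | 100.0 |
| 20 | 1000 | −0.005 | 0.001 | 0.000 | 0.004 | 0.003 | 0.001 | 99.4 | 100 | 99.9 | 100.6 | 100.5 | 100.0 |
| 20 | 2000 | −0.006 | 0.000 | 0.000 | 0.001 | 0.001 | 0.000 | 99.6 | 100 | 99.9 | 100.3 | 100.2 | 100.0 |
| 20 | 4000 | −0.005 | 0.001 | 0.000 | 0.001 | 0.001 | 0.000 | 99.5 | 100 | 99.9 | 100.1 | 100.2 | 100.0 |
| 40 | 500 | −0.005 | 0.000 | 0.000 | 0.007 | 0.005 | 0.000 | 99.5 | 100 | 99.9 | 100.9 | 100.9 | 100.0 |
| 40 | 1000 | −0.006 | 0.000 | 0.000 | 0.003 | 0.002 | 0.000 | 99.6 | 100 | 99.9 | 100.3 | 100.4 | 100.0 |
| 40 | 2000 | −0.007 | −0.001 | −0.002 | 0.000 | 0.000 | −0.001 | 100.2 | 100 | 100.0 | 100.2 | 100.2 | 100.0 |
| 40 | 4000 | −0.005 | 0.001 | 0.000 | 0.001 | 0.001 | 0.001 | 99.8 | 100 | 99.9 | 100.1 | 100.2 | 100.0 |
| Large uniform DIF (DIF SD $\tau = 0.5$) | | | | | | | | | | | | | |
| 10 | 500 | −0.012 | 0.007 | 0.003 | 0.014 | 0.014 | 0.007 | 97.1 | 100 | 99.1 | 102.1 | 101.8 | 100.3 |
| 10 | 1000 | −0.017 | 0.002 | −0.001 | 0.005 | 0.005 | 0.002 | 97.4 | 100 | 99.3 | 100.8 | 100.7 | 100.1 |
| 10 | 2000 | −0.022 | −0.003 | −0.007 | −0.003 | −0.002 | −0.004 | 97.8 | 100 | 99.4 | 100.4 | 100.3 | 100.1 |
| 10 | 4000 | −0.021 | −0.002 | −0.005 | −0.003 | −0.002 | −0.003 | 97.4 | 100 | 99.2 | 100.1 | 100.1 | 100.0 |
| 20 | 500 | −0.020 | 0.002 | −0.003 | 0.006 | 0.006 | 0.001 | 98.0 | 100 | 99.3 | 101.0 | 101.1 | 100.0 |
| 20 | 1000 | −0.018 | 0.003 | −0.001 | 0.004 | 0.004 | 0.002 | 97.6 | 100 | 99.1 | 100.3 | 100.5 | 100.0 |
| 20 | 2000 | −0.021 | 0.000 | −0.004 | −0.002 | −0.001 | −0.002 | 98.3 | 100 | 99.3 | 100.2 | 100.2 | 100.0 |
| 20 | 4000 | −0.022 | −0.001 | −0.005 | −0.003 | −0.002 | −0.002 | 98.2 | 100 | 99.3 | 100.2 | 100.1 | 100.0 |
| 40 | 500 | −0.017 | 0.005 | 0.001 | 0.008 | 0.009 | 0.004 | 98.1 | 100 | 99.2 | 100.7 | 100.9 | 99.9 |
| 40 | 1000 | −0.023 | −0.001 | −0.005 | −0.002 | 0.000 | −0.003 | 99.8 | 100 | 99.5 | 100.0 | 100.3 | 99.9 |
| 40 | 2000 | −0.023 | −0.002 | −0.006 | −0.004 | −0.002 | −0.003 | 100.3 | 100 | 99.5 | 100.3 | 100.1 | 100.0 |
| 40 | 4000 | −0.020 | 0.002 | −0.003 | −0.001 | 0.000 | 0.000 | 99.3 | 100 | 99.2 | 99.9 | 100.0 | 99.9 |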
Note. SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
Table 2. Simulation Study 1: Bias and relative root mean square error (RMSE) for the estimated standard deviation $\hat{\sigma}$ as a function of the uniform DIF standard deviation $\tau$, number of items $I$, and sample size per group $N$.
                              Bias                                        Relative RMSE
I      N      SL    SLSQ    SLSL    SLA1    SLA2    SLA3        SL   SLSQ   SLSL   SLA1   SLA2   SLA3
No DIF (DIF SD $\tau = 0$)
10   500   0.006   0.008   0.007   0.013   0.016   0.008      99.6    100   99.9  101.2  102.4  100.1
10  1000   0.004   0.005   0.005   0.007   0.009   0.005      99.9    100  100.0  100.6  101.0  100.0
10  2000   0.003   0.003   0.003   0.004   0.005   0.003      99.9    100  100.0  100.3  100.5  100.0
10  4000   0.001   0.002   0.002   0.002   0.003   0.002      99.9    100  100.0  100.2  100.3  100.0
20   500   0.007   0.008   0.008   0.014   0.016   0.008      99.8    100  100.0  101.6  102.6  100.0
20  1000   0.003   0.003   0.003   0.006   0.007   0.003      99.9    100  100.0  100.7  101.1  100.0
20  2000   0.002   0.003   0.003   0.004   0.004   0.003      99.9    100  100.0  100.4  100.6  100.0
20  4000   0.001   0.001   0.001   0.002   0.002   0.001     100.0    100  100.0  100.2  100.3  100.0
40   500   0.004   0.005   0.005   0.010   0.013   0.005      99.9    100  100.0  101.5  102.4  100.0
40  1000   0.003   0.004   0.004   0.006   0.007   0.004      99.9    100  100.0  100.9  101.3  100.0
40  2000   0.002   0.002   0.002   0.003   0.004   0.002     100.0    100  100.0  100.5  100.6  100.0
40  4000   0.001   0.001   0.001   0.002   0.002   0.001     100.0    100  100.0  100.2  100.3  100.0
Moderate uniform DIF (DIF SD $\tau = 0.25$)
10   500  −0.001   0.008   0.007   0.016   0.020   0.008      98.8    100   99.7  102.0  103.3  100.3
10  1000  −0.007   0.002   0.001   0.005   0.007   0.002      99.4    100   99.8  101.1  101.4  100.3
10  2000  −0.009   0.000   0.000   0.002   0.003   0.000      99.9    100   99.8  101.2  101.0  100.5
10  4000  −0.009   0.001   0.000   0.001   0.002   0.000     100.4    100   99.7  101.6  101.0  100.7
20   500  −0.006   0.003   0.003   0.010   0.014   0.003      99.2    100   99.8  101.6  102.7  100.2
20  1000  −0.006   0.004   0.003   0.007   0.008   0.004      99.3    100   99.8  101.5  101.6  100.2
20  2000  −0.010   0.000  −0.001   0.001   0.002   0.000     101.5    100   99.8  101.0  100.8  100.4
20  4000  −0.009   0.002   0.001   0.002   0.003   0.001     101.9    100   99.6  101.5  100.9  100.6
40   500  −0.007   0.003   0.002   0.009   0.012   0.003      99.6    100   99.9  101.6  102.6  100.1
40  1000  −0.007   0.004   0.003   0.007   0.008   0.004      99.6    100   99.8  101.3  101.7  100.2
40  2000  −0.009   0.002   0.001   0.003   0.003   0.001     102.2    100   99.8  101.0  100.9  100.3
40  4000  −0.011   0.000  −0.001   0.000   0.001   0.000     106.5    100   99.9  101.0  100.6  100.4
Large uniform DIF (DIF SD $\tau = 0.5$)
10   500  −0.027   0.009   0.001   0.011   0.020   0.007      97.1    100   97.4  103.7  104.7  101.6
10  1000  −0.034   0.001  −0.006   0.000   0.004  −0.001     103.0    100   98.2  104.8  103.0  102.1
10  2000  −0.036  −0.001  −0.008  −0.004   0.000  −0.003     106.1    100   98.0  106.8  103.5  103.1
10  4000  −0.038  −0.002  −0.009  −0.006  −0.003  −0.004     110.7    100   97.8  108.0  103.9  103.8
20   500  −0.038   0.002  −0.007   0.001   0.009  −0.001     104.7    100   98.4  103.4  103.0  101.3
20  1000  −0.036   0.003  −0.005  −0.001   0.005   0.000     108.6    100   98.1  104.1  102.5  101.6
20  2000  −0.038   0.000  −0.007  −0.004   0.000  −0.002     119.0    100   98.8  106.1  102.9  102.6
20  4000  −0.040  −0.001  −0.009  −0.006  −0.003  −0.004     126.1    100   98.8  108.4  103.6  103.6
40   500  −0.039   0.002  −0.007  −0.001   0.008  −0.001     110.8    100   99.1  102.3  102.3  100.8
40  1000  −0.040   0.000  −0.008  −0.004   0.001  −0.003     121.0    100   99.5  103.7  101.7  101.3
40  2000  −0.041  −0.001  −0.009  −0.006  −0.002  −0.004     135.1    100  100.2  105.7  102.3  102.3
40  4000  −0.041  −0.001  −0.009  −0.007  −0.003  −0.004     147.0    100  100.1  108.5  103.1  103.2
Note. SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
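All methods in these tables build on the Stocking–Lord criterion, which matches test characteristic curves (TCCs) across groups. The following is a minimal sketch of plain SL linking (the SL column) for the 2PL model under stated assumptions: group 1 defines the reference scale with $\theta \sim N(0, 1)$; group-2 item parameters were calibrated with group 2 standardized, so they are moved to the reference scale via $a/\sigma$ and $\sigma b + \mu$; and the criterion is the one-directional squared TCC difference weighted by a standard normal density on a fixed $\theta$ grid. The grid, the weights, the function names, and the use of SciPy's Nelder–Mead optimizer are illustrative choices, not necessarily those of the article.

```python
import numpy as np
from scipy.optimize import minimize


def irf_2pl(theta, a, b):
    """2PL probabilities for all grid points (rows) and items (columns)."""
    return 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))


def stocking_lord(a1, b1, a2, b2, theta=None):
    """Estimate the mean mu and SD sigma of group 2 on the group-1 scale.

    Assumes (a1, b1) are on the reference scale of group 1 and (a2, b2) were
    calibrated with group 2 standardized; they are transformed with a2 / sigma
    and sigma * b2 + mu before the test characteristic curves are compared.
    """
    a1, b1, a2, b2 = (np.asarray(x, dtype=float) for x in (a1, b1, a2, b2))
    if theta is None:
        theta = np.linspace(-4.0, 4.0, 41)
    weights = np.exp(-0.5 * theta ** 2)
    weights /= weights.sum()  # standard normal weights on the grid
    tcc_ref = irf_2pl(theta, a1, b1).sum(axis=1)

    def criterion(par):
        mu, log_sigma = par  # log-parameterization keeps sigma positive
        sigma = np.exp(log_sigma)
        tcc_linked = irf_2pl(theta, a2 / sigma, sigma * b2 + mu).sum(axis=1)
        return np.sum(weights * (tcc_ref - tcc_linked) ** 2)

    res = minimize(criterion, x0=[0.0, 0.0], method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-12, "maxiter": 5000})
    return res.x[0], float(np.exp(res.x[1]))
```

With error-free item parameters and no DIF, the criterion is zero at the data-generating $(\mu, \sigma)$. Uniform DIF or sampling error in the item parameters breaks this exact match, which is where the biases reported for SL come from and what the SIMEX-based and analytical corrections aim to reduce.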
Table 3. Simulation Study 2: Bias and relative root mean square error (RMSE) for the estimated mean $\hat{\mu}$ for $I = 20$ items as a function of the uniform DIF standard deviation $\tau$ and the sample sizes per group, $N_1$ and $N_2$.
                                Bias                                        Relative RMSE
N₁      N₂      SL    SLSQ    SLSL    SLA1    SLA2    SLA3        SL   SLSQ   SLSL   SLA1   SLA2   SLA3
No DIF (DIF SD $\tau = 0$)
250    250   0.008   0.009   0.009   0.023   0.020   0.009      99.8    100  100.0  103.9  102.9  100.0
250    500   0.001   0.002   0.002   0.011   0.010   0.002      99.9    100  100.0  101.4  101.3  100.0
250   1000   0.003   0.004   0.004   0.011   0.011   0.004     100.0    100  100.0  100.8  101.0  100.0
250   2000   0.007   0.008   0.008   0.014   0.014   0.008      99.9    100  100.0  101.1  101.5  100.0
500    500   0.002   0.003   0.003   0.009   0.007   0.003      99.9    100  100.0  101.1  100.9  100.0
500   1000   0.001   0.001   0.001   0.005   0.004   0.001     100.0    100  100.0  100.5  100.5  100.0
500   2000   0.003   0.003   0.003   0.006   0.006   0.003     100.0    100  100.0  100.4  100.4  100.0
1000  1000   0.003   0.003   0.003   0.006   0.005   0.003      99.9    100  100.0  100.5  100.5  100.0
1000  2000   0.001   0.001   0.001   0.003   0.003   0.001     100.0    100  100.0  100.3  100.2  100.0
2000  2000   0.001   0.001   0.001   0.002   0.002   0.001     100.0    100  100.0  100.2  100.2  100.0
Moderate uniform DIF (DIF SD $\tau = 0.25$)
250    250  −0.002   0.003   0.002   0.020   0.016   0.003      99.4    100   99.9  104.3  103.0  100.0
250    500  −0.002   0.003   0.002   0.014   0.013   0.003      99.4    100   99.9  101.8  101.9  100.1
250   1000  −0.004   0.000   0.000   0.009   0.010   0.001      99.6    100  100.0  100.8  101.4  100.1
250   2000  −0.002   0.003   0.002   0.011   0.011   0.003      99.6    100   99.9  101.2  101.9  100.1
500    500  −0.004   0.001   0.001   0.008   0.007   0.001      99.5    100   99.9  101.0  101.1  100.0
500   1000  −0.006  −0.001  −0.001   0.004   0.004   0.000      99.6    100   99.9  100.3  100.6  100.0
500   2000  −0.005   0.001   0.000   0.005   0.004   0.001      99.4    100   99.9  100.6  100.6  100.1
1000  1000  −0.003   0.002   0.002   0.005   0.005   0.002      99.4    100   99.9  100.6  100.6  100.0
1000  2000  −0.006  −0.001  −0.001   0.002   0.001  −0.001      99.6    100   99.9  100.3  100.3  100.1
2000  2000  −0.007  −0.001  −0.002   0.000   0.000  −0.001      99.7    100  100.0  100.2  100.2  100.0
Large uniform DIF (DIF SD $\tau = 0.5$)
250    250  −0.017   0.004  −0.001   0.019   0.018   0.003      97.6    100   99.2  103.5  103.5  100.1
250    500  −0.023  −0.002  −0.007   0.007   0.008  −0.002      98.4    100   99.4  101.2  101.8  100.1
250   1000  −0.018   0.002  −0.002   0.009   0.011   0.003      98.4    100   99.5  100.9  101.7  100.2
250   2000  −0.013   0.007   0.003   0.013   0.015   0.007      97.6    100   99.2  100.8  101.5  100.2
500    500  −0.021   0.000  −0.004   0.005   0.005  −0.001      98.0    100   99.3  100.8  101.1  100.0
500   1000  −0.018   0.004  −0.001   0.006   0.007   0.003      97.6    100   99.2  100.5  100.7  100.0
500   2000  −0.021  −0.001  −0.005   0.001   0.002  −0.001      98.5    100   99.4  100.5  100.6  100.1
1000  1000  −0.024  −0.003  −0.007  −0.002  −0.001  −0.004      98.6    100   99.4  100.5  100.4  100.0
1000  2000  −0.023  −0.002  −0.007  −0.003  −0.002  −0.004      98.5    100   99.4  100.3  100.3  100.0
2000  2000  −0.023  −0.002  −0.006  −0.003  −0.002  −0.003      98.5    100   99.4  100.4  100.2  100.0
Note. SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
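SLSQ and SLSL differ only in the extrapolation step of SIMEX. The sketch below shows that step generically, assuming the textbook SIMEX scheme: add extra normal noise with variance $\lambda \sigma_u^2$ to the error-prone inputs, average the naive estimate over simulations for each $\lambda$, fit a linear or quadratic trend in $\lambda$, and extrapolate to $\lambda = -1$ (no error). To stay self-contained, the naive estimator is a toy one (the sample variance of noisy values, whose error-free target is the variance of the true values), not the Stocking–Lord linking function; the $\lambda$ grid, the number of simulations, the data, and all names are assumptions.

```python
import numpy as np


def simex(naive_estimator, data, error_sd, lambdas=(0.5, 1.0, 1.5, 2.0),
          n_sim=200, degree=2, seed=0):
    """Generic SIMEX: simulate added error, fit a polynomial in lambda, extrapolate to -1.

    degree=2 mirrors a quadratic extrapolation (as in SLSQ), degree=1 a linear one (SLSL).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    grid = [0.0]
    means = [naive_estimator(data)]  # lambda = 0 is the naive estimate itself
    for lam in lambdas:
        sims = [
            naive_estimator(data + rng.normal(0.0, np.sqrt(lam) * error_sd, size=data.shape))
            for _ in range(n_sim)
        ]
        grid.append(lam)
        means.append(np.mean(sims))
    coefs = np.polyfit(grid, means, deg=degree)
    return float(np.polyval(coefs, -1.0))  # lambda = -1 corresponds to no error


# Toy usage: variance of true values is 0.25, measurement error variance is 0.09.
rng = np.random.default_rng(1)
true_vals = rng.normal(0.0, 0.5, size=5000)
observed = true_vals + rng.normal(0.0, 0.3, size=5000)
naive = np.var(observed)  # expected to overshoot 0.25 by roughly 0.09
linear = simex(np.var, observed, error_sd=0.3, degree=1)
quadratic = simex(np.var, observed, error_sd=0.3, degree=2)
```

In this toy case the naive estimate grows linearly in $\lambda$, so both extrapolants recover the target closely; in general the quadratic extrapolant reduces more bias when the trend is curved, at the cost of extra variance.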
Table 4. Simulation Study 2: Bias and relative root mean square error (RMSE) for the estimated standard deviation $\hat{\sigma}$ for $I = 20$ items as a function of the uniform DIF standard deviation $\tau$ and the sample sizes per group, $N_1$ and $N_2$.
                                Bias                                        Relative RMSE
N₁      N₂      SL    SLSQ    SLSL    SLA1    SLA2    SLA3        SL   SLSQ   SLSL   SLA1   SLA2   SLA3
No DIF (DIF SD $\tau = 0$)
250    250   0.011   0.013   0.013   0.025   0.033   0.013      99.6    100  100.0  103.0  107.1  100.0
250    500   0.013   0.014   0.014   0.023   0.029   0.014      99.9    100  100.0  103.2  108.0  100.0
250   1000   0.016   0.016   0.016   0.024   0.030   0.017      99.8    100  100.0  103.1  108.0  100.0
250   2000   0.018   0.019   0.019   0.026   0.031   0.019      99.8    100  100.0  103.3  109.0  100.0
500    500   0.004   0.005   0.005   0.010   0.013   0.005      99.8    100  100.0  101.2  102.3  100.0
500   1000   0.006   0.007   0.007   0.011   0.013   0.007      99.7    100  100.0  101.1  102.3  100.0
500   2000   0.007   0.007   0.007   0.011   0.012   0.007      99.5    100  100.0  101.2  102.8  100.0
1000  1000   0.003   0.004   0.004   0.006   0.007   0.004      99.9    100  100.0  100.7  101.0  100.0
1000  2000   0.003   0.003   0.003   0.005   0.006   0.003      99.9    100  100.0  100.6  101.0  100.0
2000  2000   0.002   0.002   0.002   0.004   0.004   0.002      99.9    100  100.0  100.4  100.6  100.0
Moderate uniform DIF (DIF SD $\tau = 0.25$)
250    250   0.000   0.008   0.007   0.023   0.033   0.008      98.7    100   99.7  103.5  109.1  100.2
250    500   0.002   0.011   0.010   0.022   0.030   0.011      98.6    100   99.8  103.9  108.9  100.3
250   1000   0.004   0.013   0.012   0.023   0.030   0.013      98.6    100   99.8  103.9  109.6  100.3
250   2000   0.005   0.014   0.013   0.023   0.030   0.015      98.4    100   99.8  104.4  112.2  100.5
500    500  −0.004   0.005   0.005   0.012   0.015   0.005      99.0    100   99.8  102.0  103.0  100.2
500   1000  −0.003   0.006   0.006   0.012   0.014   0.007      98.4    100   99.7  101.7  103.2  100.3
500   2000  −0.003   0.007   0.006   0.011   0.013   0.007      98.3    100   99.7  101.8  103.4  100.4
1000  1000  −0.007   0.003   0.002   0.006   0.007   0.003      99.5    100   99.8  101.2  101.5  100.2
1000  2000  −0.007   0.003   0.002   0.005   0.006   0.003      99.6    100   99.8  101.2  101.5  100.3
2000  2000  −0.010   0.000  −0.001   0.001   0.002   0.000     101.8    100   99.8  101.0  100.8  100.4
Large uniform DIF (DIF SD $\tau = 0.5$)
250    250  −0.027   0.011   0.002   0.017   0.036   0.009      98.1    100   98.0  103.7  109.2  101.0
250    500  −0.027   0.011   0.003   0.015   0.028   0.010      99.5    100   98.2  104.0  109.1  101.3
250   1000  −0.026   0.011   0.004   0.016   0.027   0.011      99.8    100   98.1  104.3  110.6  101.7
250   2000  −0.026   0.011   0.003   0.015   0.026   0.011     100.6    100   98.2  104.3  109.7  101.5
500    500  −0.036   0.004  −0.005   0.003   0.011   0.001     104.6    100   98.4  103.2  103.4  101.2
500   1000  −0.032   0.007  −0.001   0.006   0.013   0.006     103.0    100   97.8  103.3  103.9  101.5
500   2000  −0.032   0.006  −0.002   0.005   0.010   0.005     104.6    100   97.7  103.6  103.8  101.6
1000  1000  −0.038   0.001  −0.007  −0.002   0.003  −0.002     111.8    100   98.8  104.5  102.5  101.9
1000  2000  −0.035   0.003  −0.004   0.000   0.004   0.001     110.9    100   97.8  104.6  102.8  101.9
2000  2000  −0.041  −0.002  −0.010  −0.007  −0.003  −0.005     121.8    100   99.2  106.0  102.8  102.7
Note. SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
Table 5. Simulation Study 3: Bias and relative root mean square error (RMSE) for the estimated mean $\hat{\mu}$ for $I = 20$ items as a function of the type of distribution and the size of the $\xi_i$ parameters in the logistic positive exponential (LPE) model, the uniform DIF standard deviation $\tau$, and the sample size per group $N$.
                                      Bias                                        Relative RMSE
Type    ξ₀     N      SL    SLSQ    SLSL    SLA1    SLA2    SLA3        SL   SLSQ   SLSL   SLA1   SLA2   SLA3
No DIF (DIF SD $\tau = 0$)
Bal      1   500   0.004   0.004   0.004   0.010   0.009   0.005      99.8    100  100.0  101.3  101.1  100.0
Bal      1  1000   0.000   0.000   0.000   0.002   0.002   0.000     100.0    100  100.0  100.3  100.3  100.0
Bal      1  2000   0.001   0.001   0.001   0.002   0.002   0.001     100.0    100  100.0  100.2  100.2  100.0
Bal      1  4000   0.001   0.001   0.001   0.001   0.001   0.001     100.0    100  100.0  100.1  100.1  100.0
Bal    0.5   500  −0.003  −0.002  −0.002   0.006   0.003  −0.002      99.9    100  100.0  101.2  100.7  100.0
Bal    0.5  1000  −0.003  −0.003  −0.003   0.001  −0.001  −0.003     100.0    100  100.0  100.1  100.2  100.0
Bal    0.5  2000  −0.004  −0.004  −0.004  −0.002  −0.003  −0.004     100.0    100  100.0   99.8   99.9  100.0
Bal    0.5  4000  −0.004  −0.004  −0.004  −0.003  −0.004  −0.004     100.0    100  100.0   99.7   99.8  100.0
Bal      2   500  −0.004  −0.004  −0.004   0.009   0.001  −0.004      99.8    100  100.0  102.8  101.0  100.0
Bal      2  1000  −0.005  −0.005  −0.005   0.001  −0.003  −0.005     100.0    100  100.0  100.1  100.2  100.0
Bal      2  2000  −0.006  −0.006  −0.006  −0.004  −0.005  −0.006     100.0    100  100.0   99.5   99.9  100.0
Bal      2  4000  −0.007  −0.007  −0.007  −0.006  −0.007  −0.007     100.1    100  100.0   99.1   99.7  100.0
Unbal  0.5   500   0.025   0.029   0.029   0.071   0.074   0.029      98.3    100   99.7  124.9  127.5  100.1
Unbal  0.5  1000   0.022   0.024   0.024   0.042   0.043   0.024      98.7    100   99.9  112.8  113.4  100.1
Unbal  0.5  2000   0.021   0.022   0.022   0.031   0.031   0.022      98.9    100  100.0  108.6  108.9  100.0
Unbal  0.5  4000   0.020   0.021   0.021   0.025   0.025   0.021      94.9    100   97.4  101.7  107.4  101.4
Unbal    2   500  −0.017  −0.017  −0.017  −0.019  −0.019  −0.017      99.9    100  100.0  100.8  100.8  100.0
Unbal    2  1000  −0.015  −0.015  −0.015  −0.016  −0.016  −0.015     100.0    100  100.0  100.6  100.5  100.0
Unbal    2  2000  −0.015  −0.015  −0.015  −0.016  −0.016  −0.015     100.0    100  100.0  100.5  100.4  100.0
Unbal    2  4000  −0.017  −0.017  −0.017  −0.017  −0.017  −0.017     100.0    100  100.0  100.4  100.4  100.0
Large uniform DIF (DIF SD $\tau = 0.5$)
Bal      1   500  −0.019   0.002  −0.002   0.007   0.007   0.001      97.8    100   99.2  100.9  101.1  100.0
Bal      1  1000  −0.019   0.002  −0.002   0.003   0.003   0.001      97.7    100   99.2  100.3  100.5  100.0
Bal      1  2000  −0.022  −0.001  −0.005  −0.003  −0.002  −0.003      98.4    100   99.3  100.2  100.2  100.0
Bal      1  4000  −0.022  −0.001  −0.005  −0.003  −0.002  −0.002      98.4    100   99.3  100.0  100.1   99.9
Bal    0.5   500  −0.018  −0.005  −0.007   0.002   0.000  −0.006      98.3    100   99.6  100.8  101.1   99.9
Bal    0.5  1000  −0.017  −0.004  −0.006  −0.002  −0.003  −0.005      98.5    100   99.6  100.5  100.5  100.0
Bal    0.5  2000  −0.019  −0.005  −0.007  −0.005  −0.005  −0.006      98.4    100   99.6  100.1  100.1   99.9
Bal    0.5  4000  −0.019  −0.005  −0.007  −0.005  −0.006  −0.006      98.3    100   99.5  100.3  100.0   99.9
Bal      2   500  −0.019  −0.012  −0.012   0.003  −0.005  −0.012      98.6    100   99.7  101.9  102.0  100.1
Bal      2  1000  −0.017  −0.009  −0.010  −0.003  −0.007  −0.010      98.3    100   99.6  100.6  100.8  100.0
Bal      2  2000  −0.016  −0.008  −0.008  −0.004  −0.007  −0.008      98.2    100   99.6  100.4  100.3  100.0
Bal      2  4000  −0.018  −0.010  −0.010  −0.008  −0.010  −0.010      98.2    100   99.6  100.4  100.2  100.0
Unbal  0.5   500  −0.017   0.016   0.010   0.062   0.071   0.015      97.1    100   98.5  111.4  115.5  100.1
Unbal  0.5  1000  −0.024   0.016   0.008   0.030   0.037   0.014      98.8    100   98.9  103.8  105.4  100.1
Unbal  0.5  2000  −0.024   0.014   0.009   0.019   0.023   0.012      98.8    100   99.1  102.0  102.2  100.1
Unbal  0.5  4000  −0.025   0.028   0.011   0.017   0.036   0.030      81.8    100   83.4   87.9  120.3  117.4
Unbal    2   500  −0.008  −0.022  −0.018  −0.019  −0.023  −0.021      95.1    100   98.6   99.5  100.4   99.7
Unbal    2  1000  −0.003  −0.017  −0.013  −0.012  −0.017  −0.016      95.2    100   98.6   99.1   99.9   99.6
Unbal    2  2000  −0.001  −0.015  −0.011  −0.009  −0.014  −0.013      95.3    100   98.7   98.9   99.7   99.5
Unbal    2  4000  −0.007  −0.021  −0.017  −0.015  −0.019  −0.019      94.4    100   98.4   98.4   99.5   99.4
Note. Bal = $\xi_i$ parameters in the LPE model balanced around 1; Unbal = all $\xi_i$ parameters either smaller or larger than 1; $\xi_0$ = size of the $\xi_i$ parameters; SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
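Simulation Study 3 generates item responses from the logistic positive exponential (LPE) model, so the 2PL model used for linking is misspecified whenever $\xi_i \neq 1$. The sketch below shows the LPE item response function, assuming Samejima's usual form in which the 2PL probability is raised to the power $\xi_i$ (so $\xi_i = 1$ recovers the 2PL); the article's exact parameterization may differ, and the function name and grid are hypothetical. At $\theta = b$ with $a = 1$, the 2PL gives 0.5, whereas $\xi = 2$ gives $0.5^2 = 0.25$ and $\xi = 0.5$ gives $\sqrt{0.5} \approx 0.707$.

```python
import numpy as np


def lpe_irf(theta, a, b, xi):
    """LPE item response function: the 2PL logistic curve raised to the power xi.

    xi = 1 gives the 2PL; xi > 1 lowers and xi < 1 raises the curve asymmetrically.
    """
    theta = np.asarray(theta, dtype=float)
    p_2pl = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return p_2pl ** xi


# Compare curves for xi below, at, and above 1 on a coarse theta grid.
theta_grid = np.linspace(-3.0, 3.0, 7)
for xi in (0.5, 1.0, 2.0):
    print(xi, np.round(lpe_irf(theta_grid, a=1.0, b=0.0, xi=xi), 3))
```

When $\xi_i$ values sit on both sides of 1 (Bal), these distortions partly offset across items; when all lie on one side (Unbal), they push every item response function in the same direction, which is consistent with the larger biases in the Unbal rows.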
Table 6. Simulation Study 3: Bias and relative root mean square error (RMSE) for the estimated standard deviation $\hat{\sigma}$ for $I = 20$ items as a function of the type of distribution and the size of the $\xi_i$ parameters in the logistic positive exponential (LPE) model, the uniform DIF standard deviation $\tau$, and the sample size per group $N$.
                                      Bias                                        Relative RMSE
Type    ξ₀     N      SL    SLSQ    SLSL    SLA1    SLA2    SLA3        SL   SLSQ   SLSL   SLA1   SLA2   SLA3
No DIF (DIF SD $\tau = 0$)
Bal      1   500   0.005   0.006   0.006   0.012   0.014   0.006      99.8    100  100.0  101.4  102.3  100.0
Bal      1  1000   0.004   0.004   0.004   0.007   0.008   0.004      99.9    100  100.0  100.8  101.2  100.0
Bal      1  2000   0.001   0.001   0.001   0.002   0.003   0.001     100.0    100  100.0  100.3  100.4  100.0
Bal      1  4000   0.000   0.000   0.000   0.001   0.001   0.000     100.0    100  100.0  100.1  100.2  100.0
Bal    0.5   500  −0.004  −0.003  −0.003   0.001   0.008  −0.003      99.8    100  100.0  100.5  102.2  100.1
Bal    0.5  1000  −0.007  −0.006  −0.006  −0.004  −0.002  −0.006     100.0    100  100.0   99.8   99.9  100.0
Bal    0.5  2000  −0.009  −0.008  −0.008  −0.007  −0.006  −0.008     100.1    100  100.0   99.5   99.1  100.0
Bal    0.5  4000  −0.008  −0.008  −0.008  −0.007  −0.007  −0.008     100.2    100  100.0   99.5   99.0  100.0
Bal      2   500  −0.008  −0.006  −0.006  −0.005   0.011  −0.006      99.8    100   99.9  100.7  104.9  100.1
Bal      2  1000  −0.010  −0.009  −0.009  −0.009  −0.002  −0.009     100.0    100  100.0   99.9  100.1  100.0
Bal      2  2000  −0.013  −0.012  −0.012  −0.012  −0.009  −0.012     100.3    100  100.0   99.8   98.1  100.0
Bal      2  4000  −0.013  −0.012  −0.012  −0.012  −0.011  −0.012     100.4    100  100.0   99.8   97.8  100.0
Unbal  0.5   500   0.023   0.026   0.025   0.042   0.055   0.026      98.8    100   99.8  108.1  117.3  100.1
Unbal  0.5  1000   0.019   0.021   0.021   0.029   0.033   0.021      99.1    100   99.9  105.1  108.8  100.0
Unbal  0.5  2000   0.017   0.018   0.018   0.022   0.024   0.018      99.0    100  100.0  103.8  106.1  100.0
Unbal  0.5  4000   0.017   0.017   0.017   0.019   0.020   0.017      97.4    100   99.2  100.8  104.8  100.8
Unbal    2   500  −0.023  −0.023  −0.023  −0.018  −0.017  −0.023     100.2    100  100.0   98.5   98.3  100.0
Unbal    2  1000  −0.027  −0.026  −0.026  −0.024  −0.023  −0.026     100.2    100  100.0   98.1   97.8  100.0
Unbal    2  2000  −0.028  −0.028  −0.028  −0.027  −0.027  −0.028     100.2    100  100.0   98.3   98.1  100.0
Unbal    2  4000  −0.029  −0.029  −0.029  −0.028  −0.028  −0.029     100.1    100  100.0   98.8   98.6  100.0
Large uniform DIF (DIF SD $\tau = 0.5$)
Bal      1   500  −0.037   0.002  −0.006   0.001   0.009   0.000     105.2    100   98.7  103.2  103.2  101.3
Bal      1  1000  −0.038   0.001  −0.007  −0.003   0.003  −0.002     110.9    100   98.5  104.1  102.3  101.7
Bal      1  2000  −0.039   0.000  −0.008  −0.005  −0.001  −0.003     118.5    100   98.7  106.5  102.8  102.7
Bal      1  4000  −0.039  −0.001  −0.008  −0.006  −0.003  −0.004     127.1    100   99.1  108.3  103.5  103.5
Bal    0.5   500  −0.037  −0.009  −0.014  −0.006   0.003  −0.010     105.0    100   99.6  103.0  102.7  101.3
Bal    0.5  1000  −0.040  −0.012  −0.016  −0.011  −0.007  −0.013     111.9    100  100.1  103.5  101.6  102.0
Bal    0.5  2000  −0.041  −0.013  −0.017  −0.013  −0.011  −0.014     119.8    100  100.8  105.2  102.1  102.8
Bal    0.5  4000  −0.042  −0.014  −0.018  −0.015  −0.014  −0.015     124.4    100  101.1  106.9  103.1  103.6
Bal      2   500  −0.039  −0.016  −0.020  −0.012   0.006  −0.016     103.0    100   99.3  103.2  105.0  101.4
Bal      2  1000  −0.043  −0.018  −0.021  −0.016  −0.008  −0.018     109.6    100   99.9  104.1  101.7  102.1
Bal      2  2000  −0.042  −0.018  −0.021  −0.017  −0.014  −0.018     114.6    100  100.2  106.1  101.8  102.7
Bal      2  4000  −0.044  −0.019  −0.022  −0.018  −0.017  −0.019     117.8    100  100.5  107.0  102.7  103.3
Unbal  0.5   500  −0.011   0.012   0.008   0.028   0.048   0.011      96.8    100   98.4  107.3  116.5  100.4
Unbal  0.5  1000  −0.016   0.011   0.006   0.015   0.025   0.010      99.1    100   98.0  105.4  108.2  100.6
Unbal  0.5  2000  −0.015   0.011   0.007   0.012   0.016   0.009      99.6    100   98.0  103.9  104.6  100.7
Unbal  0.5  4000  −0.020   0.015   0.004   0.006   0.021   0.017      85.0    100   79.7   84.4  139.4  134.2
Unbal    2   500  −0.076  −0.028  −0.040  −0.031  −0.024  −0.031     123.1    100  102.6  105.0  101.5  102.9
Unbal    2  1000  −0.078  −0.030  −0.042  −0.036  −0.030  −0.034     134.2    100  104.7  108.6  103.2  104.4
Unbal    2  2000  −0.079  −0.032  −0.043  −0.038  −0.034  −0.035     142.6    100  106.2  111.0  104.7  105.6
Unbal    2  4000  −0.078  −0.031  −0.042  −0.038  −0.033  −0.034     147.1    100  106.7  112.8  106.0  106.5
Note. Bal = $\xi_i$ parameters in the LPE model balanced around 1; Unbal = all $\xi_i$ parameters either smaller or larger than 1; $\xi_0$ = size of the $\xi_i$ parameters; SL = Stocking–Lord linking; SLSQ = SIMEX-based Stocking–Lord linking with a quadratic extrapolation; SLSL = SIMEX-based Stocking–Lord linking with a linear extrapolation; SLA1 = analytical bias correction for Stocking–Lord linking according to Equation (15); SLA2 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the empirical variance estimate $\hat{\tau}^2$; SLA3 = analytical bias correction for Stocking–Lord linking according to Equation (16) using the bias-corrected variance estimate $\hat{\hat{\tau}}^2$; the linking method SLSQ was the reference method for computing the relative RMSE. Biases with absolute values larger than 0.010 are printed in bold font. Relative RMSE values larger than 105.0 are printed in bold font.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Robitzsch, A. SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking. Analytics 2024, 3, 368-388. https://doi.org/10.3390/analytics3030020

AMA Style

Robitzsch A. SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking. Analytics. 2024; 3(3):368-388. https://doi.org/10.3390/analytics3030020

Chicago/Turabian Style

Robitzsch, Alexander. 2024. "SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking" Analytics 3, no. 3: 368-388. https://doi.org/10.3390/analytics3030020

APA Style

Robitzsch, A. (2024). SIMEX-Based and Analytical Bias Corrections in Stocking–Lord Linking. Analytics, 3(3), 368-388. https://doi.org/10.3390/analytics3030020
