
# Linking Error in the 2PL Model

by Alexander Robitzsch 1,2
1
IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), 24118 Kiel, Germany
J 2023, 6(1), 58-84; https://doi.org/10.3390/j6010005
Submission received: 3 December 2022 / Revised: 6 January 2023 / Accepted: 7 January 2023 / Published: 11 January 2023

## Abstract

The two-parameter logistic (2PL) item response model is likely the most frequently applied item response model for analyzing dichotomous data. Linking errors quantify the variability in means or standard deviations due to the choice of items. Previous research presented analytical work for linking errors in the one-parameter logistic model. In this article, we present linking errors for the 2PL model using the general theory of M-estimation. Linking errors are derived in the case of log-mean-mean linking for linking two groups. The performance of the newly proposed formulas is evaluated in a simulation study. Furthermore, the linking error estimation in the 2PL model is also treated in more complex settings, such as chain linking, trend estimation, fixed item parameter calibration, and concurrent calibration.

## 1. Introduction

Item response theory (IRT) models [1,2] are an important class of multivariate statistical models for analyzing dichotomous random variables used to model testing data from educational or psychological applications. Of particular relevance is the application of item response models in educational large-scale assessments [3], such as the Programme for International Student Assessment (PISA; [4]) study.
In this article, we only investigate unidimensional IRT models. Let $X = (X_1, \ldots, X_I)$ be the vector of $I$ dichotomous random variables $X_i \in \{0, 1\}$ (also referred to as items). A unidimensional item response model [5] is a statistical model for the probability distribution $P(X = x)$ for $x = (x_1, \ldots, x_I) \in \{0, 1\}^I$, given by
$P(X = x; \delta, \gamma) = \int_{-\infty}^{\infty} \prod_{i=1}^{I} P_i(\theta; \gamma_i)^{x_i} \left[ 1 - P_i(\theta; \gamma_i) \right]^{1 - x_i} \phi(\theta; \mu, \sigma) \, d\theta , \qquad (1)$
where $\phi$ is the density of the normal distribution with mean $\mu$ and standard deviation $\sigma$. The vector $\delta = (\mu, \sigma)$ contains the distribution parameters. The vector $\gamma = (\gamma_1, \ldots, \gamma_I)$ contains all estimated item parameters of item response functions $P_i(\theta; \gamma_i) = P(X_i = 1 \mid \theta)$.
The one-parameter logistic (1PL) model (also referred to as the Rasch model; [6]) employs the item response function $P_i(\theta) = \Psi(\theta - b_i)$, where $\Psi$ denotes the logistic distribution function, and $b_i$ is the item difficulty of item $i$. In this case, $\gamma_i = (b_i)$. The two-parameter logistic (2PL) model [7] additionally includes the item discrimination $a_i$ (i.e., $\gamma_i = (a_i, b_i)$), and the item response function is given by $P_i(\theta) = \Psi(a_i(\theta - b_i))$.
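The two item response functions can be sketched in a few lines of Python. This is only an illustration; the helper names `logistic`, `irf_1pl`, and `irf_2pl` and the numeric values are assumptions made here and do not come from the article.

```python
import math

def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_1pl(theta, b):
    """1PL (Rasch) item response function Psi(theta - b)."""
    return logistic(theta - b)

def irf_2pl(theta, a, b):
    """2PL item response function Psi(a * (theta - b))."""
    return logistic(a * (theta - b))

# At theta == b the success probability is 0.5 for any discrimination a.
p_at_difficulty = irf_2pl(theta=0.3, a=1.7, b=0.3)

# With a == 1 the 2PL function reduces to the 1PL function.
p_2pl = irf_2pl(theta=1.0, a=1.0, b=-0.5)
p_1pl = irf_1pl(theta=1.0, b=-0.5)
```

The discrimination only changes how steeply the probability rises around the difficulty, which is why the 1PL model appears as the special case of equal discriminations.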
Note that distribution parameters $\delta$ and item parameters $\gamma$ cannot be simultaneously identified. In applications such as PISA, in which a country mean $\mu$ and a country standard deviation $\sigma$ are of interest, item parameters $\gamma_i$ are often fixed at values $\gamma_i^*$ that are used for all countries. In this situation, $\mu$ and $\sigma$ can be identified. If sample data $X_1, \ldots, X_N$ for $N$ persons are available, unknown model parameters in (1) can be estimated by (marginal) maximum likelihood (ML) using an expectation maximization algorithm [8,9].
In practice, data-generating item parameters $\gamma_i$ differ from assumed fixed item parameters $\gamma_i^*$. This property is also referred to as differential item functioning (DIF; [10,11]). DIF effects $e_i$ are defined as deviations $e_i = \gamma_i - \gamma_i^*$. The occurrence of DIF causes additional variability in the estimated (country) mean $\mu$ and standard deviation $\sigma$ [12,13]. The estimated distribution parameters depend on the choice of selected items, even with an infinite sample size of persons. This variability is quantified in the linking error [14,15,16,17,18,19].
There exist simple formulas for linking errors based on variance components for the 1PL model [15,17]. For more complex models, resampling techniques [20,21] such as jackknife [15,17] or the (balanced) half sampling [18] of items can be employed. In this article, we provide closed formulas for the linking error for the 2PL model in various applications based on the M-estimation theory. The proposed formulas have the advantage of avoiding computationally more demanding resampling approaches for computing linking errors.

## 2. Linking Error and M-Estimation

In this section, we discuss the computation of the linking error in the 2PL model for two groups. We do this in a general setting of M-estimation theory [22,23,24] because our treatment will apply to many of the recently discussed linking methods. However, we focus on log-mean-mean linking in this article as an important example in Section 3 and Section 4.
Assume that the 2PL model holds in two groups $g = 1, 2$ or at two time points. The goal is to determine the mean $\mu$ and the standard deviation $\sigma$ of the second group, while the first group is assumed to have a mean of 0 and a standard deviation of 1. The DIF effects $f_i$ and $e_i$ for logarithmized item discriminations and item difficulties follow
$\log a_{i2} = \log a_{i1} + f_i , \qquad b_{i2} = b_{i1} + e_i . \qquad (2)$
It is assumed that $f_i$ and $e_i$ are independently and identically distributed with zero means and variances $\tau_a^2$ and $\tau_b^2$, respectively, and the covariance is defined as $\mathrm{Cov}(e_i, f_i) = \tau_{ab}$.
In the first step of the linking approach, the 2PL model is separately estimated in each of the two groups. Because the ability $\theta$ for the first group has zero mean and a standard deviation of 1, the identified item parameters $\hat{a}_{i1}$ and $\hat{b}_{i1}$ equal the data-generating item parameters $a_{i1}$ and $b_{i1}$, respectively ($i = 1, \ldots, I$). In the second group, we fix the mean to 0 and the standard deviation to 1 and obtain identified parameters
$\hat{a}_{i2} = \sigma a_{i2} \quad \text{and} \quad \hat{b}_{i2} = \sigma^{-1} (b_{i2} - \mu) . \qquad (3)$
In the second step of the linking approach, identified item parameters $\{(\hat{a}_{i1}, \hat{b}_{i1})\}$ and $\{(\hat{a}_{i2}, \hat{b}_{i2})\}$ are used in determining the mean $\mu$ and the standard deviation $\sigma$ in the second group. Note that we assume that identified item parameters are known. Hence, we implicitly have infinite sample sizes of persons. In practice, we estimate item parameters from finite sample sizes. Appropriate adjustments are discussed in Section 5.8.
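The identified parameters of the second group can be checked numerically: rescaling the ability to mean 0 and standard deviation 1 must leave every response probability unchanged. The sketch below illustrates this; the function name `identified_parameters` and the item values are hypothetical.

```python
import math

def logistic(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def identified_parameters(a, b, mu, sigma):
    """Item parameters obtained when the group's ability is rescaled to N(0, 1)."""
    return sigma * a, (b - mu) / sigma

mu, sigma = -0.2, 0.9      # illustrative distribution parameters of the second group
a2, b2 = 1.3, 0.4          # hypothetical data-generating item parameters
a2_hat, b2_hat = identified_parameters(a2, b2, mu, sigma)

# theta = mu + sigma * z, where z is the standardized ability.
z = 0.75
theta = mu + sigma * z
p_original = logistic(a2 * (theta - b2))
p_identified = logistic(a2_hat * (z - b2_hat))
```

Both probabilities coincide, which is why the separate calibration in the second group carries the information about $\mu$ and $\sigma$ only through the rescaled item parameters.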
A general estimating equation of the type
$\sum_{i=1}^{I} g(\delta; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) = \sum_{i=1}^{I} g(\delta; a_{i1}, b_{i1}, f_i, e_i) = 0 \qquad (4)$
is employed for determining the parameter $\delta = (\sigma, \mu)$ or $\delta = (s, \mu)$ with $s = \log \sigma$ as the distribution parameters of interest. M-estimation theory provides the asymptotic variance of an estimate $\hat{\delta}$. Because linking errors refer to the uncertainty regarding item choice, this asymptotic variance can be used to compute the linking error.
In the two-group 2PL case, we have two unknown distribution parameters $\sigma$ (or $s = \log \sigma$) and $\mu$. Hence, $g = (g_1, g_2)$ involves two equations for two unknowns that must be solved. The two M-estimation equations of the linking approaches can be generally written as
$\sum_{i=1}^{I} g_1(\delta; a_{i1}, b_{i1}, f_i, e_i) = 0 , \qquad \sum_{i=1}^{I} g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) = 0 . \qquad (5)$
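M-estimation theory provides the asymptotic variance (for $I \to \infty$) for the estimate $\hat{\delta}$ with the sandwich formula [22], in which the second inverse bread factor enters transposed because $A_I$ need not be symmetric:
$V_I(\hat{\delta}) = A_I(\hat{\delta})^{-1} B_I(\hat{\delta}) \left[ A_I(\hat{\delta})^{-1} \right]^{\top} . \qquad (6)$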
The matrix $A_I$ is denoted as the bread matrix, while the matrix $B_I$ is referred to as the meat matrix. The latter matrix is given by
$B_I(\delta) = \sum_{i=1}^{I} \mathrm{Var}\left[ g(\delta; a_{i1}, b_{i1}, f_i, e_i) \, g(\delta; a_{i1}, b_{i1}, f_i, e_i)^{\top} \right] , \qquad (7)$
where $\mathrm{Var}$ denotes a covariance matrix. The bread matrix $A_I$ is given as
$A_I(\delta) = \sum_{i=1}^{I} E\left[ \frac{\partial g}{\partial \delta}(\delta; a_{i1}, b_{i1}, f_i, e_i) \right] . \qquad (8)$
The two matrices in Equations (7) and (8) require the computation of expected values and variances of random variables. If these were unavailable or the quantities could not be algebraically determined, sample-based versions of the bread and the meat matrix are frequently used [23]. The meat matrix $B_I$ can be estimated based on sample data using (7)
$\hat{B}_I(\hat{\delta}) = \sum_{i=1}^{I} g(\hat{\delta}; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) \, g(\hat{\delta}; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})^{\top} . \qquad (9)$
An empirical version of $A_I$ is given by
$\hat{A}_I(\hat{\delta}) = \sum_{i=1}^{I} \frac{\partial g}{\partial \delta}(\hat{\delta}; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) . \qquad (10)$
If $A_I(\hat{\delta})$ and $B_I(\hat{\delta})$ are used for computing the variance matrix for $\hat{\delta}$ (i.e., $V_{I,\mathrm{ESW}}(\hat{\delta})$), the estimate is denoted as the expected sandwich (ESW) estimate. The observed sandwich (OSW) estimate is obtained by using $\hat{A}_I(\hat{\delta})$ and $\hat{B}_I(\hat{\delta})$ in the formula of the variance matrix (i.e., $V_{I,\mathrm{OSW}}(\hat{\delta})$). Finally, a bias-corrected observed sandwich (BOSW) is obtained by using $V_{I,\mathrm{BOSW}}(\hat{\delta}) = \frac{I}{I-1} V_{I,\mathrm{OSW}}(\hat{\delta})$ (see [25,26,27]).
We want to emphasize that M-estimation theory is not restricted to applications of linking approaches for two groups and two distribution parameters. The parameter $δ$ can be of any finite dimensionality and could, for example, involve $2 ( G − 1 )$ unknown parameters for linking G groups.
M-estimation theory was applied in the investigation of DIF and linking in [28,29,30]. The simultaneous treatment of standard errors and linking errors in IRT models relying on M-estimation was presented in [18,31] (see also [32]).
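The observed sandwich estimate of this section can be sketched generically for two parameters. The code below is a minimal illustration under several assumptions made here: the derivatives in the empirical bread matrix are approximated by central differences, the second inverse bread factor is transposed (the usual form for a non-symmetric bread matrix), and the function names and the toy estimating function `mean_equations` are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(x, y):
    """Product of two 2x2 matrices."""
    return [[sum(x[r][k] * y[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def transpose2(m):
    return [[m[0][0], m[1][0]], [m[0][1], m[1][1]]]

def observed_sandwich(g, delta, items, h=1e-6):
    """V = A^{-1} B (A^{-1})^T with empirical bread A and empirical meat B."""
    bread = [[0.0, 0.0], [0.0, 0.0]]
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for item in items:
        g0 = g(delta, item)
        for r in range(2):
            for c in range(2):
                meat[r][c] += g0[r] * g0[c]          # sum of outer products
        for c in range(2):
            up, lo = list(delta), list(delta)
            up[c] += h
            lo[c] -= h
            g_up, g_lo = g(up, item), g(lo, item)
            for r in range(2):
                bread[r][c] += (g_up[r] - g_lo[r]) / (2.0 * h)  # numerical derivative
    a_inv = inv2(bread)
    return matmul2(matmul2(a_inv, meat), transpose2(a_inv))

# Toy example: the two estimating equations define two sample means.
def mean_equations(delta, item):
    x, y = item
    return [x - delta[0], y - delta[1]]

items = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0)]
delta_hat = [3.0, 4.0]
v_osw = observed_sandwich(mean_equations, delta_hat, items)
n = len(items)
v_bosw = [[n / (n - 1) * v for v in row] for row in v_osw]   # bias-corrected variant
```

In the toy example the result is the empirical covariance matrix of the two variables divided by the number of items, which is the familiar variance of a sample mean; the bias-corrected variant rescales it by $I/(I-1)$.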

## 3. Linking Error in Log-Mean-Mean Linking

We now apply M-estimation theory to compute linking errors in log-mean-mean linking [33,34] of two groups in the 2PL model. The logarithm of the standard deviation $s = \log \sigma$ is estimated by
$\hat{s} = \frac{1}{I} \sum_{i=1}^{I} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} \right) . \qquad (11)$
It can be shown that $\hat{s}$ is an unbiased and consistent estimate for $s$ [34]. Moreover, $\hat{\sigma}$ is obtained by computing $\hat{\sigma} = \exp(\hat{s})$.
An estimate of the group mean $\mu$ is obtained using
$\hat{\mu} = - \frac{1}{I} \sum_{i=1}^{I} \left( \exp(\hat{s}) \hat{b}_{i2} - \hat{b}_{i1} \right) . \qquad (12)$
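A minimal sketch of the two linking steps is given below. The function name `log_mean_mean` and the item parameters are hypothetical. The sign of the mean estimate is chosen such that the estimating equation $\sum_i (\exp(s) \hat{b}_{i2} - \hat{b}_{i1} + \mu) = 0$ holds, which is an assumption spelled out in the code comment.

```python
import math

def log_mean_mean(a1, b1, a2, b2):
    """Return (s_hat, sigma_hat, mu_hat) from identified item parameters."""
    n_items = len(a1)
    s_hat = sum(math.log(x2) - math.log(x1) for x1, x2 in zip(a1, a2)) / n_items
    sigma_hat = math.exp(s_hat)
    # mu_hat solves sum_i (exp(s_hat) * b2_i - b1_i + mu) = 0.
    mu_hat = -sum(sigma_hat * y2 - y1 for y1, y2 in zip(b1, b2)) / n_items
    return s_hat, sigma_hat, mu_hat

# Hypothetical item parameters without DIF (f_i = e_i = 0).
mu, sigma = -0.2, 0.9
a1 = [0.8, 1.0, 1.2, 1.5]
b1 = [-1.0, -0.3, 0.4, 1.1]
a2 = [sigma * a for a in a1]           # identified discriminations in group 2
b2 = [(b - mu) / sigma for b in b1]    # identified difficulties in group 2
s_hat, sigma_hat, mu_hat = log_mean_mean(a1, b1, a2, b2)
```

Without DIF effects, the linking recovers the distribution parameters of the second group exactly; with DIF, the estimates vary with the selected items, which is what the linking error quantifies.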
We can reformulate (11) and (12) as M-estimators
$\sum_{i=1}^{I} g_1(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s \right) = 0 \quad \text{and} \qquad (13)$
$\sum_{i=1}^{I} g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \left( \exp(s) \hat{b}_{i2} - \hat{b}_{i1} + \mu \right) = 0 \qquad (14)$
using $g = (g_1, g_2)$. Now, we determine the linking errors for $\hat{\sigma}$ and $\hat{\mu}$ in log-mean-mean linking using the sandwich formula (6) of M-estimation. First, we compute the variance matrix of $g$. Using $\hat{a}_{i2} = a_{i1} \exp(s) \exp(f_i)$ and $\hat{b}_{i2} = \sigma^{-1} (b_{i1} + e_i - \mu)$, we obtain
$\sum_{i=1}^{I} \mathrm{Var}\left( g_1(\delta; a_{i1}, b_{i1}, f_i, e_i) \right) = \sum_{i=1}^{I} \mathrm{Var}\left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s \right) = \sum_{i=1}^{I} \mathrm{Var}(f_i) = I \tau_a^2 , \qquad (15)$
$\sum_{i=1}^{I} \mathrm{Cov}\left( g_1(\delta; a_{i1}, b_{i1}, f_i, e_i), g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) \right) = \sum_{i=1}^{I} \mathrm{Cov}(f_i, e_i) = I \tau_{ab} , \quad \text{and} \qquad (16)$
$\sum_{i=1}^{I} \mathrm{Var}\left( g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) \right) = \sum_{i=1}^{I} \mathrm{Var}(e_i) = I \tau_b^2 . \qquad (17)$
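Hence, it follows that
$B_I(\hat{\delta}) = I \begin{pmatrix} \tau_a^2 & \tau_{ab} \\ \tau_{ab} & \tau_b^2 \end{pmatrix} . \qquad (18)$
Now, we compute derivatives of $g$ with respect to $s$ and $\mu$ and obtain the bread matrix as
$A_I(\hat{\delta}) = \begin{pmatrix} \frac{\partial g_1}{\partial s} & \frac{\partial g_1}{\partial \mu} \\ \frac{\partial g_2}{\partial s} & \frac{\partial g_2}{\partial \mu} \end{pmatrix} = I \begin{pmatrix} -1 & 0 \\ -(\mu - \bar{b}_{\bullet 1}) & 1 \end{pmatrix} . \qquad (19)$
The inverse of the bread matrix can be determined as
$A_I(\hat{\delta})^{-1} = - I^{-1} \begin{pmatrix} 1 & 0 \\ \mu - \bar{b}_{\bullet 1} & -1 \end{pmatrix} . \qquad (20)$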
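For readers who want to see the intermediate step, the following sketch multiplies out the sandwich product. It assumes that the second inverse bread factor enters transposed, which is the usual form of the sandwich formula when the bread matrix is not symmetric. The shorthand $c$ and $M$ is introduced here only for illustration.

```latex
% Shorthand (not part of the original notation): c = \mu - \bar{b}_{\bullet 1}
\begin{aligned}
A_I^{-1} &= I^{-1} M, \qquad
M = \begin{pmatrix} -1 & 0 \\ -c & 1 \end{pmatrix}, \\
V_I &= A_I^{-1} B_I \left( A_I^{-1} \right)^{\top}
     = I^{-1} \, M
       \begin{pmatrix} \tau_a^2 & \tau_{ab} \\ \tau_{ab} & \tau_b^2 \end{pmatrix}
       M^{\top} \\
    &= \frac{1}{I}
       \begin{pmatrix}
         \tau_a^2 & c \, \tau_a^2 - \tau_{ab} \\
         c \, \tau_a^2 - \tau_{ab} & \tau_b^2 + c^2 \tau_a^2 - 2 c \, \tau_{ab}
       \end{pmatrix} .
\end{aligned}
```

The diagonal entries of this matrix are the squared linking errors of $\hat{s}$ and $\hat{\mu}$ once the unknown variance components are replaced by estimates.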
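Formulas (18) and (19) involve unknown parameters $\tau_a$, $\tau_b$, $\tau_{ab}$, and $\mu$ that must be estimated. The quantities $f_i$ and $e_i$ can be replaced with their sample estimates to estimate unknown variances and covariances in the $A_I$ and $B_I$ matrices. We obtain the expected sandwich estimator (ESW) $V_{I,\mathrm{ESW}}$ using Equation (6) with
$A_{I,\mathrm{ESW}}(\hat{\delta}) = I \begin{pmatrix} -1 & 0 \\ -(\hat{\mu} - \bar{b}_{\bullet 1}) & 1 \end{pmatrix} \quad \text{and} \quad B_{I,\mathrm{ESW}}(\hat{\delta}) = I \begin{pmatrix} \hat{\tau}_a^2 & \hat{\tau}_{ab} \\ \hat{\tau}_{ab} & \hat{\tau}_b^2 \end{pmatrix} . \qquad (21)$
The observed sandwich (OSW) estimate $V_{I,\mathrm{OSW}}$ uses empirical moments in the estimation. First, the bread matrix is estimated by
$A_{I,\mathrm{OSW}}(\hat{\delta}) = \begin{pmatrix} -I & 0 \\ \exp(\hat{s}) \sum_{i=1}^{I} \hat{b}_{i2} & I \end{pmatrix} . \qquad (22)$
The meat matrix is estimated by
$B_{I,\mathrm{OSW}}(\hat{\delta}) = \begin{pmatrix} \sum_{i=1}^{I} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - \hat{s} \right)^2 & \sum_{i=1}^{I} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - \hat{s} \right) \left( \exp(\hat{s}) \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \right) \\ \sum_{i=1}^{I} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - \hat{s} \right) \left( \exp(\hat{s}) \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \right) & \sum_{i=1}^{I} \left( \exp(\hat{s}) \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \right)^2 \end{pmatrix} . \qquad (23)$
Finally, we use a bias-corrected variant of the observed sandwich estimator [27] as
$V_{I,\mathrm{BOSW}} = \frac{I}{I-1} V_{I,\mathrm{OSW}} . \qquad (24)$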
Linking errors can be obtained as square roots of the diagonal elements of the variance matrices $V$. The linking error for $\hat{s}$ based on the ESW estimate is given by
$\mathrm{LE}(\hat{s}) = \frac{\hat{\tau}_a}{\sqrt{I}} . \qquad (25)$
By utilizing the delta method, we can obtain the linking error for $\hat{\sigma} = \exp(\hat{s})$ as
$\mathrm{LE}(\hat{\sigma}) = \hat{\sigma} \frac{\hat{\tau}_a}{\sqrt{I}} . \qquad (26)$
Finally, the linking error for $\hat{\mu}$ can be estimated by
$\mathrm{LE}(\hat{\mu}) = \sqrt{ \frac{ \hat{\tau}_b^2 + (\hat{\mu} - \bar{b}_{\bullet 1})^2 \hat{\tau}_a^2 - 2 (\hat{\mu} - \bar{b}_{\bullet 1}) \hat{\tau}_{ab} }{ I } } . \qquad (27)$
In the absence of nonuniform DIF, we have $\hat{\tau}_a^2 = \hat{\tau}_{ab} = 0$, and the linking error for the 1PL model is obtained from (27)
$\mathrm{LE}(\hat{\mu}) = \frac{\hat{\tau}_b}{\sqrt{I}} . \qquad (28)$
Interestingly, the presence of nonuniform DIF ($\tau_a^2 > 0$) introduces additional uncertainty in computing the group mean. However, nonuniform DIF only has an effect if the average item difficulty does not match the group mean (i.e., $\hat{\mu} - \bar{b}_{\bullet 1} \neq 0$). In typical applications, the third term $2 (\hat{\mu} - \bar{b}_{\bullet 1}) \hat{\tau}_{ab}$ in (27) will be much more important than the second term $(\hat{\mu} - \bar{b}_{\bullet 1})^2 \hat{\tau}_a^2$. Hence, nonuniform DIF plays an important role particularly if uniform and nonuniform DIF effects are strongly correlated.
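The closed formulas for the expected sandwich linking errors translate directly into code. The sketch below is illustrative only: the function name `esw_linking_errors` is hypothetical, and the average item difficulty of 0.3 is an assumed value chosen so that it does not match the group mean.

```python
import math

def esw_linking_errors(tau_a, tau_b, tau_ab, mu_hat, sigma_hat, mean_b1, n_items):
    """Linking errors of s_hat, sigma_hat, and mu_hat from the closed formulas."""
    c = mu_hat - mean_b1                 # mismatch of group mean and mean difficulty
    le_s = tau_a / math.sqrt(n_items)
    le_sigma = sigma_hat * le_s          # delta method for sigma_hat = exp(s_hat)
    var_mu = (tau_b ** 2 + c ** 2 * tau_a ** 2 - 2.0 * c * tau_ab) / n_items
    return le_s, le_sigma, math.sqrt(var_mu)

# Illustrative values: DIF correlation 0.3, hence tau_ab = 0.3 * tau_a * tau_b.
tau_a, tau_b = 0.25, 0.50
tau_ab = 0.3 * tau_a * tau_b
le_s, le_sigma, le_mu = esw_linking_errors(
    tau_a, tau_b, tau_ab, mu_hat=-0.2, sigma_hat=0.9, mean_b1=0.3, n_items=40)

# Without nonuniform DIF the 1PL linking error tau_b / sqrt(I) is recovered.
_, _, le_mu_1pl = esw_linking_errors(
    0.0, tau_b, 0.0, mu_hat=-0.2, sigma_hat=0.9, mean_b1=0.3, n_items=40)
```

With these values the covariance term contributes 0.0375 and the squared-mismatch term about 0.0156 to the numerator, in line with the remark that the covariance term tends to dominate.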

## 4. Simulation Study

#### 4.1. Method

In this simulation study, we investigate the performance of different linking error estimates for the log-mean-mean linking approach in the 2PL model. In particular, we compare the jackknife linking error (JK) with linking errors obtained by the expected sandwich (ESW), observed sandwich (OSW), and the bias-corrected observed sandwich (BOSW) estimates. The formulas for the sandwich estimates are presented in Section 3. The jackknife linking error estimate is computed by repeating the linking procedure when omitting the $i$th item for $i = 1, \ldots, I$. Let $u = \mu$ or $u = \sigma$ be the distribution parameter of interest and $\hat{u}$ be the corresponding estimate. Let $\hat{u}_{(-i)}$ be the estimated parameter if item $i$ was removed from the linking procedure. Then, the jackknife linking error is defined as (see [15])
$\mathrm{LE}(\hat{u}) = \sqrt{ \frac{I-1}{I} \sum_{i=1}^{I} \left( \hat{u}_{(-i)} - \hat{u} \right)^2 } . \qquad (29)$
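The jackknife linking error can be sketched as follows. The helper names and all item parameters and DIF effects are hypothetical; the linking function takes the mean estimate as the solution of $\sum_i (\exp(s) \hat{b}_{i2} - \hat{b}_{i1} + \mu) = 0$.

```python
import math

def log_mean_mean(a1, b1, a2, b2):
    """Return (mu_hat, sigma_hat) from log-mean-mean linking."""
    n = len(a1)
    s_hat = sum(math.log(y) - math.log(x) for x, y in zip(a1, a2)) / n
    sigma_hat = math.exp(s_hat)
    mu_hat = -sum(sigma_hat * y - x for x, y in zip(b1, b2)) / n
    return mu_hat, sigma_hat

def drop_item(values, i):
    """Copy of the list without its i-th entry."""
    return values[:i] + values[i + 1:]

def jackknife_linking_errors(a1, b1, a2, b2):
    """Jackknife linking errors (LE(mu_hat), LE(sigma_hat)) over items."""
    n = len(a1)
    full = log_mean_mean(a1, b1, a2, b2)
    sq_dev = [0.0, 0.0]
    for i in range(n):
        est = log_mean_mean(drop_item(a1, i), drop_item(b1, i),
                            drop_item(a2, i), drop_item(b2, i))
        for k in range(2):
            sq_dev[k] += (est[k] - full[k]) ** 2
    factor = (n - 1) / n
    return math.sqrt(factor * sq_dev[0]), math.sqrt(factor * sq_dev[1])

# Hypothetical item parameters and DIF effects (f for log a, e for b).
mu, sigma = -0.2, 0.9
a1 = [0.8, 1.0, 1.2, 1.5, 0.9]
b1 = [-1.0, -0.3, 0.4, 1.1, 0.2]
f = [0.10, -0.20, 0.05, 0.00, 0.05]
e = [0.30, -0.10, -0.40, 0.20, 0.00]
a2 = [sigma * a * math.exp(fi) for a, fi in zip(a1, f)]
b2 = [(b + ei - mu) / sigma for b, ei in zip(b1, e)]
le_mu_jk, le_sigma_jk = jackknife_linking_errors(a1, b1, a2, b2)

# Without DIF every leave-one-out estimate coincides with the full estimate.
a2_nodif = [sigma * a for a in a1]
b2_nodif = [(b - mu) / sigma for b in b1]
le_mu_0, le_sigma_0 = jackknife_linking_errors(a1, b1, a2_nodif, b2_nodif)
```

The procedure requires one relinking per item, which is the computational cost that the closed sandwich formulas avoid.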
For identification, the first group had a zero mean and a standard deviation of 1 for the ability variable $\theta$. For the second group, we defined $\mu = -0.2$ and $\sigma = 0.9$ in the simulation. Item parameters for 10 items are presented in Table 1. In the simulation, we used $I = 10$, 20, 40, or 80 items. For item numbers that are multiples of 10, we duplicated the item parameters of the 10 items presented in Table 1 accordingly. The standard deviation of uniform DIF effects $e_i$ was chosen as $\tau_b = 0.25$ or 0.50. The standard deviation of nonuniform DIF effects $f_i$ was chosen as $\tau_a = 0.01$ or 0.25. The first condition mimics the practical absence of nonuniform DIF effects. The correlation between the DIF effects $e_i$ and $f_i$ was set to 0.3 in all simulation conditions (i.e., $\tau_{ab} = 0.3 \cdot \tau_a \tau_b$). Finally, we chose three types of distributions for DIF effects $(f_i, e_i)$. We specified them as a bivariate normal copula model and chose different marginal distributions. First, we chose the normal distribution (i.e., denoted as “Normal”) as a marginal distribution appropriately scaled by $\tau_a$ and $\tau_b$. Second, we chose a scaled $t$ distribution with four degrees of freedom (i.e., denoted as “$t_4$”) with an appropriate scaling factor to match the desired standard deviation of DIF effects. Third, we used the distribution function $F$ of a normal mixture model (i.e., denoted as “Normal Mixture”) of the type
$F = (1 - \varepsilon) N(0, \tau^2) + \varepsilon N(0, k \tau^2) , \qquad (30)$
where $k = 3$ and $\varepsilon = 0.05$. This distribution can be interpreted as a contaminated distribution that includes a few outlying DIF effects in $N(0, k \tau^2)$ with proportion $\varepsilon$. Such a distribution is often employed in robust statistics [35]. For a prespecified DIF standard deviation $\tau_a$ (and analogously for $\tau_b$), we obtain from (30) the determining equation $\tau = \left[ (1 - \varepsilon) + \varepsilon k^2 \right]^{-1/2} \tau_a$.
To disentangle standard errors due to the sampling of persons from linking errors due to item choice, we assumed no sampling error for the identified parameters $\{(\hat{a}_{i1}, \hat{b}_{i1})\}$ and $\{(\hat{a}_{i2}, \hat{b}_{i2})\}$. That is, identified item parameters for the second group vary across replications only because different DIF effects $f_i$ and $e_i$ were simulated in each replication. When comparing the different M-estimation approaches with the jackknife, it seems reasonable to exclude sampling errors because they are just another source of uncertainty in distribution parameter estimates.
In each of the $4 \, (I) \times 2 \, (\tau_b) \times 2 \, (\tau_a) \times 3 \, (\text{distribution}) = 48$ cells of the simulation, 40,000 replications were conducted. We assessed coverage rates at the 95% confidence level based on the normal distribution for distribution parameter estimates. The linking error computation for the estimated standard deviation $\hat{\sigma} = \exp(\hat{s})$ utilized the delta method.
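One cell of such a design can be mimicked on a small scale. The following Python sketch is not the article's R code: it covers only normally distributed DIF effects, uses 2000 instead of 40,000 replications, replaces the item parameters of Table 1 with hypothetical values, and evaluates only the coverage of the mean with the expected sandwich linking error based on sample estimates of the DIF variances.

```python
import math
import random

def simulate_coverage(n_items=40, n_reps=2000, tau_a=0.25, tau_b=0.50,
                      rho=0.3, mu=-0.2, sigma=0.9, seed=1):
    """Coverage rate of mu_hat +/- 1.96 * LE(mu_hat) across replications."""
    rng = random.Random(seed)
    # Hypothetical group-1 item parameters (not those of Table 1).
    a1 = [0.8 + 0.1 * (i % 10) for i in range(n_items)]
    b1 = [-1.35 + 0.3 * (i % 10) for i in range(n_items)]
    mean_b1 = sum(b1) / n_items
    hits = 0
    for _ in range(n_reps):
        # Correlated normal DIF effects (f_i, e_i) with correlation rho.
        f, e = [], []
        for _ in range(n_items):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            f.append(tau_a * z1)
            e.append(tau_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
        # Identified item parameters of the second group (no sampling error).
        a2 = [sigma * a * math.exp(fi) for a, fi in zip(a1, f)]
        b2 = [(b + ei - mu) / sigma for b, ei in zip(b1, e)]
        # Log-mean-mean linking.
        d = [math.log(x2) - math.log(x1) for x1, x2 in zip(a1, a2)]
        s_hat = sum(d) / n_items
        r = [math.exp(s_hat) * y2 - y1 for y1, y2 in zip(b1, b2)]
        mu_hat = -sum(r) / n_items
        # Sample estimates of the DIF (co)variances from the residuals.
        f_hat = [di - s_hat for di in d]
        e_hat = [ri + mu_hat for ri in r]
        var_a = sum(x * x for x in f_hat) / (n_items - 1)
        var_b = sum(x * x for x in e_hat) / (n_items - 1)
        cov_ab = sum(x * y for x, y in zip(f_hat, e_hat)) / (n_items - 1)
        c = mu_hat - mean_b1
        var_mu = max(var_b + c * c * var_a - 2.0 * c * cov_ab, 0.0) / n_items
        hits += abs(mu_hat - mu) <= 1.96 * math.sqrt(var_mu)
    return hits / n_reps

coverage = simulate_coverage()
```

The returned proportion can be compared with the nominal 95% level; other cells are obtained by changing `n_items`, `tau_a`, and `tau_b`.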
The R software [36] was used for simulation and analysis. We used the qmixnorm function from the R package KScorrect [37] for determining quantiles in the data simulation of DIF effects. Because analytical solutions are not available to compute a quantile function for the normal mixture model, the qmixnorm function approximates the quantile function using a spline function calculated from cumulative density functions for the specified mixture distribution [37]. Quantiles for probabilities near zero or one are approximated by taking a randomly generated sample.

#### 4.2. Results

In Table 2, the coverage rates for the estimated mean $\hat{\mu}$ are presented as a function of the standard deviation of the DIF effects, the number of items, and the type of distribution for the DIF effects. It turned out that there were no substantial differences in the performance of the different linking error methods with respect to the distribution types of the DIF effects. The jackknife and the ESW estimates were very similar. The OSW estimate did not reach the desired coverage rates in a short test (i.e., $I = 10$) but improved in longer tests. Moreover, the BOSW slightly improved the OSW estimate but still was inferior to the ESW estimate.
In Table 3, the coverage rates for the estimated standard deviation $\hat{\sigma}$ are presented as a function of the standard deviation of the DIF effects, the number of items, and the type of distribution for the DIF effects. The OSW and BOSW had particular issues in the coverage rates with very small nonuniform DIF effects. These issues also remained in the longer tests. However, the estimated standard deviations of the nonuniform DIF effects $\hat{\tau}_a$ turned out to be unbiased, so very small nonuniform DIF can be detected in such situations, indicating that linking errors for $\hat{\sigma}$ would be tiny. Hence, the practical absence of nonuniform DIF effects using the simulation condition $\tau_a = 0.01$ might not be very realistic, and future studies could investigate the performance using $\tau_a = 0.10$.
Overall, the findings of this simulation study indicate that the sandwich estimates (in particular the ESW) are as effective as the jackknife estimates for linking errors.

## 5. Further Applications of the Linking Error in the 2PL Model

In this section, several applications of the linking error computations in the 2PL model are presented. In Section 5.1, the M-estimation theory is applied to linking approaches other than the log-mean-mean linking. Section 5.2 discusses the computation of the linking errors if the items are nested within testlets. The linking errors for chain linking and trend estimation are discussed in Section 5.3 and Section 5.4, respectively. In Section 5.5, the linking error under a fixed item parameter calibration is derived. Section 5.6 presents the linking error in the 2PL model for a concurrent calibration. Section 5.7 investigates the computation of the linking errors of derived parameters. Finally, Section 5.8 focuses on the computation of the total error and sampling error corrections in the linking error estimation.

#### 5.1. Linking Errors for Other Linking Approaches

We now illustrate how the sandwich estimates in the M-estimation theory from Section 2 can be used for other linking approaches.

Log-mean-mean linking involves two steps, each of which computes a mean: one for determining the logarithm of the standard deviation $s = \log \sigma$ and one for the mean $\mu$. Because a few outlying items might introduce bias in the estimated distribution parameters [18], robust estimators for the location measures can be preferred. In this case, the estimating functions for $\delta = (s, \mu)$ are given by
$\sum_{i=1}^{I} g_1(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \rho\left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s \right) = 0 \quad \text{and} \qquad (31)$
$\sum_{i=1}^{I} g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \rho\left( \exp(s) \hat{b}_{i2} - \hat{b}_{i1} + \mu \right) = 0 \qquad (32)$
using a robust function that fulfills the property $\rho(x) / |x| \to 0$ for $|x| \to \infty$ (see [29,30]). For example, if the median was used, $\rho$ would be the sign function $\rho(x) = 1_{\{x > 0\}} - 1_{\{x < 0\}}$. A differentiable approximation of this function is given by $\tilde{\rho}(x) = \frac{d}{dx} (x^2 + \varepsilon)^{p/2}$ for $p = 1$ and $\varepsilon = 0.01$. The observed sandwich formula can be easily applied to obtain linking errors for a wide class of robust linking approaches.
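The differentiable approximation of the sign function and the resulting robust location estimate can be sketched as follows. The function names, the bisection solver, and the data are assumptions for illustration; the article does not prescribe a particular root-finding method.

```python
import math

def rho_tilde(x, p=1.0, eps=0.01):
    """Derivative of (x**2 + eps)**(p / 2) with respect to x."""
    return p * x * (x * x + eps) ** (p / 2.0 - 1.0)

def robust_location(values, p=1.0, eps=0.01, tol=1e-10):
    """Solve sum_i rho_tilde(v_i - s) = 0 for s by bisection.

    The sum is decreasing in s for 1 <= p <= 2, so the root is bracketed
    by the smallest and the largest value.
    """
    lo, hi = min(values), max(values)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        score = sum(rho_tilde(v - mid, p, eps) for v in values)
        if score > 0:      # residuals are predominantly positive: increase s
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical differences log(a2_hat) - log(a1_hat); the last item is an outlier.
diffs = [0.0, 0.1, -0.1, 0.05, 3.0]
s_robust = robust_location(diffs)
s_mean = sum(diffs) / len(diffs)
```

The single outlying item moves the ordinary mean to about 0.61, while the robust estimate stays close to the bulk of the items.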

Haebara (HAE) linking [38] aligns item response functions instead of directly aligning item parameters. The linking function in HAE linking is given as
$H(\mu, \sigma) = \sum_{i=1}^{I} \int \left[ \Psi\left( \hat{a}_{i1} (\theta - \hat{b}_{i1}) \right) - \Psi\left( \sigma^{-1} \hat{a}_{i2} (\theta - \sigma \hat{b}_{i2} - \mu) \right) \right]^2 \omega(\theta) \, d\theta , \qquad (33)$
where $\omega$ is a weighting function. By defining the difference
$h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) = \Psi\left( \hat{a}_{i1} (\theta - \hat{b}_{i1}) \right) - \Psi\left( \sigma^{-1} \hat{a}_{i2} (\theta - \sigma \hat{b}_{i2} - \mu) \right) , \qquad (34)$
we can rewrite (33) as
$H(\mu, \sigma) = \sum_{i=1}^{I} \int h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2})^2 \, \omega(\theta) \, d\theta . \qquad (35)$
The estimating equations for $\sigma$ and $\mu$ can be determined by
$\sum_{i=1}^{I} g_1(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \int h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) \, \frac{\partial}{\partial \sigma} h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) \, \omega(\theta) \, d\theta = 0 \quad \text{and} \qquad (36)$
$\sum_{i=1}^{I} g_2(\delta; a_{i1}, b_{i1}, f_i, e_i) = \sum_{i=1}^{I} \int h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) \, \frac{\partial}{\partial \mu} h(\theta, \mu, \sigma; \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2}) \, \omega(\theta) \, d\theta = 0 . \qquad (37)$
Again, linking errors can be easily obtained using the observed sandwich (OSW) formula.
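The Haebara linking function can be evaluated numerically on a grid. The sketch below makes several assumptions that are not taken from the article: the weighting function is the standard normal density, the integral is approximated on an equally spaced grid on $[-4, 4]$, and the function name and item parameters are hypothetical.

```python
import math

def logistic(x):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-x))

def haebara_criterion(mu, sigma, a1, b1, a2, b2, n_grid=81):
    """Approximate H(mu, sigma) on an equally spaced theta grid on [-4, 4]."""
    step = 8.0 / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        theta = -4.0 + k * step
        # Assumed weighting function: standard normal density.
        weight = math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)
        for x1, y1, x2, y2 in zip(a1, b1, a2, b2):
            p1 = logistic(x1 * (theta - y1))
            p2 = logistic(x2 / sigma * (theta - sigma * y2 - mu))
            total += (p1 - p2) ** 2 * weight * step
    return total

# Hypothetical item parameters without DIF.
mu, sigma = -0.2, 0.9
a1 = [0.8, 1.0, 1.2, 1.5]
b1 = [-1.0, -0.3, 0.4, 1.1]
a2 = [sigma * a for a in a1]
b2 = [(b - mu) / sigma for b in b1]

h_true = haebara_criterion(mu, sigma, a1, b1, a2, b2)     # criterion at the true values
h_wrong = haebara_criterion(0.3, 1.0, a1, b1, a2, b2)     # criterion at other values
```

Without DIF, the criterion vanishes at the true distribution parameters and is positive elsewhere, so minimizing it (or solving the two estimating equations) recovers $\mu$ and $\sigma$.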

#### 5.2. Linking Error with Testlets

In educational large-scale assessment studies such as PISA, several items frequently share a common item stimulus. In this situation, items are nested within testlets [39,40,41]. It was demonstrated that DIF effects are also pronounced at the testlet level and not only at the item level [17,42]. This additional source of uncertainty should be included in the computation of linking errors to avoid negatively biased linking error estimates.
We illustrate how to apply the theory of M-estimation in the case of testlets. The linking error for $\hat{\sigma}$ and $\hat{\mu}$ in log-mean-mean linking is derived. Let there be $H$ testlets, and there are $I_h \geq 1$ items within each testlet $h$. It holds that $\sum_{h=1}^{H} I_h = I$. The data-generating model for DIF effects in the 2PL model in Equation (2) is adapted to include DIF effects at the item and the testlet level. DIF effects for logarithmized item discriminations now include two terms referring to testlets (i.e., $h$) and items within testlets (i.e., item $i$ nested within testlet $h$). We assume
$\log a_{ih2} = \log a_{ih1} + f_h + f_{ih} , \qquad b_{ih2} = b_{ih1} + e_h + e_{ih} . \qquad (38)$
Note that item parameters now possess an item index $i$ and a testlet index $h$ in (38). Again, it is assumed that the DIF effects are independently distributed across items, and we define $\tau_{a,\mathrm{testlet}}^2 = \mathrm{Var}(f_h)$, $\tau_{a,\mathrm{item}}^2 = \mathrm{Var}(f_{ih})$, $\tau_{b,\mathrm{testlet}}^2 = \mathrm{Var}(e_h)$, $\tau_{b,\mathrm{item}}^2 = \mathrm{Var}(e_{ih})$, $\tau_{ab,\mathrm{testlet}} = \mathrm{Cov}(e_h, f_h)$, and $\tau_{ab,\mathrm{item}} = \mathrm{Cov}(e_{ih}, f_{ih})$.
The estimating equations in log-mean-mean linking are the same. However, they must include the testlet structure of items. The estimating Equations (13) and (14) are rearranged as
$\sum_{h=1}^{H} \sum_{i=1}^{I_h} g_1(\delta; a_{ih1}, b_{ih1}, f_h, f_{ih}, e_h, e_{ih}) = \sum_{h=1}^{H} \sum_{i=1}^{I_h} \left( \log \hat{a}_{ih2} - \log \hat{a}_{ih1} - s \right) = 0 \quad \text{and} \qquad (39)$
$\sum_{h=1}^{H} \sum_{i=1}^{I_h} g_2(\delta; a_{ih1}, b_{ih1}, f_h, f_{ih}, e_h, e_{ih}) = \sum_{h=1}^{H} \sum_{i=1}^{I_h} \left( \exp(s) \hat{b}_{ih2} - \hat{b}_{ih1} + \mu \right) = 0 . \qquad (40)$
The essential change in the computation of the sandwich variance in the testlet case is that the variance matrix $B_I$ (i.e., the meat matrix) requires the computation of the variance that is carried out at each testlet $h$ instead of each individual item $i$ (see [27]). To indicate the dependency on the testlet structure, it is more appropriate to label the meat matrix $B_H$ because testlets are independent units, not items. The entry involving the variance in DIF effects in item discriminations in the meat matrix can be computed as
$\sum_{h=1}^{H} \mathrm{Var}\left( \sum_{i=1}^{I_h} g_1(\delta; a_{ih1}, b_{ih1}, f_h, f_{ih}, e_h, e_{ih}) \right) = \sum_{h=1}^{H} \mathrm{Var}\left( \sum_{i=1}^{I_h} \left( \log \hat{a}_{ih2} - \log \hat{a}_{ih1} - s \right) \right) = \sum_{h=1}^{H} I_h^2 \, \tau_{a,\mathrm{testlet}}^2 + I \, \tau_{a,\mathrm{item}}^2 . \qquad (41)$
The other variance and the covariance can be derived similarly. Consequently, the meat matrix is given by
$B_H(\hat{\delta}) = \sum_{h=1}^{H} I_h^2 \begin{pmatrix} \tau_{a,\mathrm{testlet}}^2 & \tau_{ab,\mathrm{testlet}} \\ \tau_{ab,\mathrm{testlet}} & \tau_{b,\mathrm{testlet}}^2 \end{pmatrix} + I \begin{pmatrix} \tau_{a,\mathrm{item}}^2 & \tau_{ab,\mathrm{item}} \\ \tau_{ab,\mathrm{item}} & \tau_{b,\mathrm{item}}^2 \end{pmatrix} . \qquad (42)$
The bread matrix $A_H$ only involves the expected value of the derivatives of the estimating equations. Hence, this matrix remains unchanged in a testlet structure [27].
The unknown variance and covariance components in (42) can be replaced by sample estimates. Then, an expected sandwich variance estimate can be obtained. A sample estimate of the meat matrix can be obtained by replacing the population variance in (41) with an empirical variance. Like in the case of independent items, the observed sandwich variance estimate $\hat{V}_H$ can be modified to obtain a bias-corrected variant. In the testlet case, one should use correction factor $H / (H - 1)$ instead of $I / (I - 1)$.

#### 5.3. Linking Error in Chain Linking

In this subsection, we discuss the computation of the linking error in chain linking [33,43,44] in log-mean-mean linking. Figure 1 illustrates the test design in the chain linking. The items are administered at three time points, T1, T2, and T3 (or in three groups). The goal is to determine the distribution parameters at T3. The distribution parameters of the ability variable $\theta$ at T3 can be compared with those at T1 by carrying out the linking steps T1↔T2 and T2↔T3. The set of all items is denoted as $J = \{1, \ldots, I\}$ and is partitioned into three distinct sets, $J_0$, $J_1$, and $J_2$. The set $J_0$ contains items that were administered at all three time points. Items in $J_1$ are administered at T1 and T2, and items in $J_2$ are administered at T2 and T3. We fix the distribution parameters at the first time point T1 to a mean of 0 and a standard deviation of 1. The mean of $\theta$ at T2 is denoted by $\mu_1$ and the standard deviation by $\sigma_1$. The mean of $\theta$ at T3 is denoted by $\mu_2$ and the standard deviation by $\sigma_2$.
We assume that the 2PL model holds at the three time points. For longitudinal data, DIF is referred to as item parameter drift (IPD; [45,46]). The data-generating model involves IPD effects $f_{it}$ for item discriminations and $e_{it}$ for item difficulties ($t = 1, 2$).
$\log a_{i2} = \log a_{i1} + f_{i1} , \quad b_{i2} = b_{i1} + e_{i1} , \quad \log a_{i3} = \log a_{i2} + f_{i2} , \quad b_{i3} = b_{i2} + e_{i2} . \qquad (43)$
All IPD effects are allowed to be correlated within each item $i$ but are uncorrelated across items.
In chain linking, the 2PL model is separately estimated for the three time points. The identified item parameters are given as $\hat{a}_{i1} = a_{i1}$, $\hat{b}_{i1} = b_{i1}$ and
$\hat{a}_{i2} = a_{i2} \sigma_1 , \quad \hat{b}_{i2} = \sigma_1^{-1} (b_{i2} - \mu_1) , \quad \hat{a}_{i3} = a_{i3} \sigma_2 , \quad \hat{b}_{i3} = \sigma_2^{-1} (b_{i3} - \mu_2) . \qquad (44)$
In the first linking step T1↔T2, the mean $\mu_1$ and the standard deviation $\sigma_1 = \exp(s_1)$ are determined in log-mean-mean linking. In the second linking step T2↔T3, linking constants $s_2$ and $m_2$ are derived that refer to the linear transformation $\theta \mapsto \exp(s_2) \theta + m_2$.
In the chain linking approach, the unknown parameters are collected in the vector $\delta = (s_1, \mu_1, s_2, m_2)$, ordered as in the matrices and derivative vectors below. The parameters of interest $(\mu_2, \sigma_2)$ are computed as derived parameters given by
$\mu_2 = \exp(s_2) \mu_1 + m_2 \quad \text{and} \quad \sigma_2 = \exp(s_2) \sigma_1 . \qquad (45)$
Let $I_k = |J_k|$ be the number of items in the sets $J_k$ for $k = 0, 1, 2$. We define $\kappa_0 = I_0 / I$ and $\kappa_k = I_k / I$ for $k = 1, 2$. The two successive linking steps are formulated as a joint linking problem involving four estimating equations
$\begin{pmatrix} \sum_{i=1}^{I} g_1(\delta; X_i) \\ \sum_{i=1}^{I} g_2(\delta; X_i) \\ \sum_{i=1}^{I} g_3(\delta; X_i) \\ \sum_{i=1}^{I} g_4(\delta; X_i) \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{I} d_{i1} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s_1 \right) \\ \sum_{i=1}^{I} d_{i1} \left( \exp(s_1) \hat{b}_{i2} - \hat{b}_{i1} + \mu_1 \right) \\ \sum_{i=1}^{I} d_{i2} \left( \log \hat{a}_{i3} - \log \hat{a}_{i2} - s_2 \right) \\ \sum_{i=1}^{I} d_{i2} \left( \exp(s_2) \hat{b}_{i3} - \hat{b}_{i2} + m_2 \right) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} , \qquad (46)$
where $X_i$ includes all identified item parameters and design variables for item $i$. The first two estimating equations in (46) refer to log-mean-mean linking of the step T1↔T2, while the last two refer to step T2↔T3. In Equation (46), dummy indicators $d_{ik}$ are used that take the value of one if item $i$ is contained in $J_0$ or $J_k$ for $k = 1, 2$. Note that $\sum_{i=1}^{I} d_{ik} = I_k$ for $k = 1, 2$ and $\sum_{i=1}^{I} d_{i1} d_{i2} = I_0$.
We simplify the terms in (46) to
$g_1(\delta; X_i) = d_{i1} \left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s_1 \right) = d_{i1} f_{i1} , \qquad (47)$
$g_2(\delta; X_i) = d_{i1} \left( \exp(s_1) \hat{b}_{i2} - \hat{b}_{i1} + \mu_1 \right) = d_{i1} e_{i1} , \qquad (48)$
$g_3(\delta; X_i) = d_{i2} \left( \log \hat{a}_{i3} - \log \hat{a}_{i2} - s_2 \right) = d_{i2} f_{i2} , \qquad (49)$
$g_4(\delta; X_i) = d_{i2} \left( \exp(s_2) \hat{b}_{i3} - \hat{b}_{i2} + m_2 \right) = d_{i2} \sigma_1^{-1} e_{i2} . \qquad (50)$
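We can now compute the meat matrix $B_I$
$B_I(\delta) = I \begin{pmatrix} \kappa_1 \tau_{f_1}^2 & \kappa_1 \tau_{e_1 f_1} & \kappa_0 \tau_{f_1 f_2} & \kappa_0 \sigma_1^{-1} \tau_{f_1 e_2} \\ \kappa_1 \tau_{e_1 f_1} & \kappa_1 \tau_{e_1}^2 & \kappa_0 \tau_{e_1 f_2} & \kappa_0 \sigma_1^{-1} \tau_{e_1 e_2} \\ \kappa_0 \tau_{f_1 f_2} & \kappa_0 \tau_{e_1 f_2} & \kappa_2 \tau_{f_2}^2 & \kappa_2 \sigma_1^{-1} \tau_{e_2 f_2} \\ \kappa_0 \sigma_1^{-1} \tau_{f_1 e_2} & \kappa_0 \sigma_1^{-1} \tau_{e_1 e_2} & \kappa_2 \sigma_1^{-1} \tau_{e_2 f_2} & \kappa_2 \sigma_1^{-2} \tau_{e_2}^2 \end{pmatrix} . \qquad (51)$
Moreover, we can determine the expected value of the bread matrix $E(A_I(\delta))$ as
$A_I(\delta) = \begin{pmatrix} -I \kappa_1 & 0 & 0 & 0 \\ I \kappa_1 D_1 & I \kappa_1 & 0 & 0 \\ 0 & 0 & -I \kappa_2 & 0 \\ 0 & 0 & I \kappa_2 D_2 & I \kappa_2 \end{pmatrix} , \quad \text{where} \qquad (52)$
$D_1 = I^{-1} \kappa_1^{-1} \sum_{i=1}^{I} d_{i1} (b_{i1} - \mu_1) , \qquad D_2 = I^{-1} \kappa_2^{-1} \sum_{i=1}^{I} d_{i2} (b_{i1} - \mu_2) . \qquad (53)$
The inverse of $A_I$ can be computed as
$[A_I(\delta)]^{-1} = \begin{pmatrix} -I^{-1} \kappa_1^{-1} & 0 & 0 & 0 \\ I^{-1} \kappa_1^{-1} D_1 & I^{-1} \kappa_1^{-1} & 0 & 0 \\ 0 & 0 & -I^{-1} \kappa_2^{-1} & 0 \\ 0 & 0 & I^{-1} \kappa_2^{-1} D_2 & I^{-1} \kappa_2^{-1} \end{pmatrix} . \qquad (54)$
Using the matrices $B_I(\delta)$ and $[A_I(\delta)]^{-1}$, the variance matrix $V_I(\delta)$ of $\hat{\delta}$ can be computed using the sandwich formula (6).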
We now explicitly derive linking errors for $σ ^ 2$ and $μ ^ 2$. First, the standard deviation $σ 2$ at T3 is given as $σ 2 = h ( δ ) = exp ( s 1 ) exp ( s 2 )$. We derive the variance by applying the delta method to the nonlinear transformation h and using the variance matrix $V I$. The first-order partial derivatives of h are given by
$u ⊤ = ∂ σ 2 ∂ δ = exp ( s 1 ) exp ( s 2 ) 0 exp ( s 1 ) exp ( s 2 ) 0 .$
Hence, the linking error of the estimated standard deviation $σ ^ 2$ can be determined as
$LE ( σ ^ 2 ) = u ⊤ V I ( δ ^ ) u .$
The algebraic derivation of the linking error formula was somewhat intricate, which is why the R package rSymPy [47], a wrapper to the SymPy computer algebra system [48], was used. The square of the linking error for $\hat{\sigma}_2$ (i.e., the quantity $\mathrm{LE}(\hat{\sigma}_2)^2$) is computed using (56)
$\mathrm{LE}(\hat{\sigma}_2)^2 = \frac{\tau_{f1}^2}{I} \frac{\sigma_1^2 \sigma_2^2}{\kappa_1} + \frac{\tau_{f2}^2}{I} \frac{\sigma_1^2 \sigma_2^2}{\kappa_2} - \frac{\tau_{e1 f2}}{I} \frac{D_1 \kappa_0 \sigma_1^2 \sigma_2^2}{\kappa_1 \kappa_2} - \frac{\tau_{e2 f2}}{I} \frac{D_2 \sigma_1 \sigma_2^2}{\kappa_2} - \frac{\tau_{f1 e2}}{I} \frac{D_2 \kappa_0 \sigma_1 \sigma_2^2}{\kappa_1 \kappa_2} + \frac{\tau_{f1 f2}}{I} \frac{2 \kappa_0 \sigma_1^2 \sigma_2^2}{\kappa_1 \kappa_2} .$
The linking error for the estimated mean $\hat{\mu}_2$ can be determined as a derived parameter using the transformation $\mu_2 = h(\delta) = \exp(s_2) \mu_1 + m_2$. The first-order partial derivatives of $h$ are given by
$u^\top = \frac{\partial \mu_2}{\partial \delta} = \left( 0, \; \exp(s_2), \; \exp(s_2) \mu_1, \; 1 \right) .$
The linking error for $\hat{\mu}_2$ can be computed by using (56)
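$\mathrm{LE}(\hat{\mu}_2)^2 = \frac{\tau_{f2}^2}{I} \frac{\mu_1^2 \sigma_2^2 - D_2 \mu_1 \sigma_2}{\kappa_2} + \frac{\tau_{e1}^2}{I} \frac{\sigma_2^2}{\kappa_1} + \frac{\tau_{e2}^2}{I} \frac{1 + D_2 \mu_1 \sigma_2}{\kappa_2 \sigma_1^2} + \frac{\tau_{e1 f2}}{I} \frac{D_2 \kappa_0 \sigma_2 - 2 \kappa_0 \mu_1 \sigma_2^2}{\kappa_1 \kappa_2} + \frac{\tau_{e2 f2}}{I} \frac{D_2 - 2 \mu_1 \sigma_2 + \mu_1 \sigma_2 D_2^2 - D_2 \mu_1^2 \sigma_2^2}{\kappa_2 \sigma_1} + \frac{\tau_{f1 e2}}{I} \frac{D_1 \kappa_0 \sigma_2 + D_1 D_2 \kappa_0 \mu_1 \sigma_2^2}{\kappa_1 \kappa_2 \sigma_1} - \frac{\tau_{f1 f2}}{I} \frac{D_1 \kappa_0 \mu_1 \sigma_2^2}{\kappa_1 \kappa_2} .$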
In chain linking in the 1PL model, the linking error for $\hat{\mu}_2$ substantially simplifies by setting $\sigma_1 = \sigma_2 = 1$ and $D_2 = 0$ (see also [49])
$\mathrm{LE}(\hat{\mu}_2)^2 = \frac{\tau_{e1}^2}{I} \frac{1}{\kappa_1} + \frac{\tau_{e2}^2}{I} \frac{1}{\kappa_2} .$
Note that all DIF effects referring to item discriminations vanish in this case.
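The 1PL simplification can be checked numerically. The following minimal sketch assumes that the sandwich formula (6) has the standard M-estimation form $V = A^{-1} B (A^{-1})^\top$, that the DIF effects $e_{i1}$ and $e_{i2}$ are uncorrelated, and that $\kappa_1$ and $\kappa_2$ act as proportions of link items. All numerical inputs and function names are hypothetical.

```python
import math

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def inverse_bread(I, kappa1, kappa2, D1, D2):
    """Closed-form inverse of the block-triangular bread matrix A_I."""
    a1, a2 = 1.0 / (I * kappa1), 1.0 / (I * kappa2)
    return [
        [-a1, 0.0, 0.0, 0.0],
        [a1 * D1, a1, 0.0, 0.0],
        [0.0, 0.0, -a2, 0.0],
        [0.0, 0.0, a2 * D2, a2],
    ]

def linking_error(u, V):
    """Delta-method linking error sqrt(u' V u)."""
    n = len(u)
    return math.sqrt(sum(u[r] * V[r][c] * u[c] for r in range(n) for c in range(n)))

# Hypothetical inputs (not taken from the article).
I, kappa1, kappa2 = 40, 0.5, 0.5
tau_e1, tau_e2 = 0.3, 0.2
mu1 = 0.4

# 1PL special case: s1 = s2 = 0, D1 = D2 = 0, DIF in item difficulties only.
B = [[0.0] * 4 for _ in range(4)]
B[1][1] = I * kappa1 * tau_e1 ** 2
B[3][3] = I * kappa2 * tau_e2 ** 2  # sigma_1 = 1, so no rescaling is needed

A_inv = inverse_bread(I, kappa1, kappa2, D1=0.0, D2=0.0)
V = matmul(matmul(A_inv, B), transpose(A_inv))  # assumed sandwich form

u = [0.0, 1.0, mu1, 1.0]  # gradient of mu2 = exp(s2) * mu1 + m2 at s2 = 0
le_sandwich = linking_error(u, V)
le_closed = math.sqrt(tau_e1 ** 2 / (I * kappa1) + tau_e2 ** 2 / (I * kappa2))
print(le_sandwich, le_closed)
```

Both routes give the same value in this special case because only the variances of the difficulty DIF effects enter the meat matrix.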

#### 5.4. Linking Error for Trend Estimates in Educational Large-Scale Assessment Studies

We now turn to the important application of linking errors for trend estimates [13,15,50,51] in educational LSA studies such as PISA [16,52]. The main goal is to compute a linking error for a trend estimate in country means or country standard deviations between two successive assessments. Again, we rely on the 2PL model and use log-mean-mean linking for the derivation of the linking errors. Previous research derived closed formulas for linking errors in the 1PL model [17]. It was stated in official PISA publications that there does not exist a simple generalization to the 2PL model [52]. However, this section provides a closed formula for the 2PL model.
Figure 2 illustrates the problem of trend estimation in LSA studies. The label “NAT” refers to a nation (i.e., a country) c. The label “INT” refers to the international metric that is defined as a pooled sample comprising all students of participating countries in the LSA study. A trend estimate for a country between two LSA assessments can involve means and standard deviations at T1 and T2 (i.e., it compares NAT2 and NAT1). The first linking step NAT1↔INT1 maps country-specific results onto an international metric at T1. This step allows a cross-sectional comparison of countries on an international reference metric. The second linking step INT1↔INT2 links results of two LSA studies at T1 and T2 at the international metric. The third linking step NAT2↔INT2 maps country-specific results to the international metric at T2.
The set of administered items typically differs across assessments [53]. There are $I_0$ link items that are administered at both time points. A set of $I_1$ unique items is only administered at T1, while a set of $I_2$ unique items is only administered at T2. For identification reasons, we assume that the ability variable $\theta$ has zero mean and a standard deviation of one at T1. Then, we can identify the mean $\mu_{c1}$ and the standard deviation $\sigma_{c1}$ of country c at T1. Furthermore, we assume that the mean at the international metric at T2 is $\mu_0$ and the standard deviation is $\sigma_0$. We can also identify the mean $\mu_{c2}$ and the standard deviation $\sigma_{c2}$ of country c at T2.
The first linking step NAT1↔INT1 in log-mean-mean linking estimates $\mu_{c1}$ and $s_{c1} = \log \sigma_{c1}$. The second linking step INT1↔INT2 estimates $\mu_0$ and $s_0 = \log \sigma_0$. The third linking step NAT2↔INT2 estimates linking constants $m_{c2}$ and $s_{c2}$ to put the results of country c at the international metric at T2.
The country mean and standard deviation of country c at T2 are derived functions of the estimated linking parameters
$\mu_{c2} = \exp(s_0) \mu_0 + m_{c2} \quad \text{and} \quad \sigma_{c2} = \exp(s_{c2}) \exp(s_0)$
The linking constants for the link INT2↔NAT2 are recomputed using (61) as
$m_{c2} = \mu_{c2} - \exp(s_0) \mu_0 \quad \text{and} \quad s_{c2} = \log \sigma_{c2} - s_0$
The main idea is to apply the M-estimation theory and the sandwich formula for deriving linking errors for trend estimates. The three-step linking procedure can also be written as a simultaneous estimation problem involving six estimating equations for the vector of unknown linking parameters $\delta = ( s_{c1}, \mu_{c1}, s_0, \mu_0, s_{c2}, m_{c2} )$. The trend estimate in means is given as
$\Delta \mu_c = \mu_{c2} - \mu_{c1} = \exp(s_0) \mu_0 + m_{c2} - \mu_{c1} .$
The trend estimate in standard deviations is given as
$\Delta \sigma_c = \sigma_2 - \sigma_1 = \exp( s_0 + s_{c2} ) - \exp( s_{c1} ) .$
The source of linking errors in trend estimates is the presence of DIF effects. We now present a data-generating model for DIF effects in item parameters in the 2PL model. Let $a_{ict}$ and $b_{ict}$ for $t = 1, 2$ be item discriminations and item difficulties for country c. These item parameters are referred to as national item parameters [17]. International item parameters, which result from item response models at the international metric involving students from all countries, are denoted by $\alpha_{it}$ and $\beta_{it}$. We use the same random effects model as in [17] for item difficulties
$b_{ict} = b_i + e_{it} + e_{ic} + e_{ict} , \quad \beta_{it} = b_i + e_{it}$
The variance component $\mathrm{Var}(e_{it}) = \tau_{b,\mathrm{IPD}}^2$ refers to item parameter drift (IPD; [54]). The variance $\mathrm{Var}(e_{ic}) = \tau_{b,\mathrm{DIF}}^2$ is referred to as cross-sectional country DIF, and the variance $\mathrm{Var}(e_{ict}) = \tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2$ refers to time-point-specific country DIF. All DIF effects are uncorrelated with each other.
We now extend the random effects model in [17] to item discriminations
$\log a_{ict} = \log a_i + f_{it} + f_{ic} + f_{ict} , \quad \log \alpha_{it} = \log a_i + f_{it}$
All DIF effects for logarithmized item discriminations are uncorrelated, but DIF effects e and f can be correlated for the same type of heterogeneity (i.e., $\mathrm{Cov}(e_{it}, f_{it}) = \tau_{ab,\mathrm{IPD}}$, $\mathrm{Cov}(e_{ic}, f_{ic}) = \tau_{ab,\mathrm{DIF}}$, and $\mathrm{Cov}(e_{ict}, f_{ict}) = \tau_{ab,\mathrm{DIF} \times \mathrm{IPD}}$).
Identified national item parameters are given as $\hat{a}_{ict} = a_{ict} \sigma_{ct}$ and $\hat{b}_{ict} = \sigma_{ct}^{-1} ( b_{ict} - \mu_{ct} )$ for $t = 1, 2$. Moreover, for international item parameters, it holds that $\hat{\alpha}_{i1} = \alpha_{i1}$ and $\hat{\beta}_{i1} = \beta_{i1}$ at T1, and $\hat{\alpha}_{i2} = \alpha_{i2} \sigma_0$ and $\hat{\beta}_{i2} = \sigma_0^{-1} ( \beta_{i2} - \mu_0 )$ at T2.
We now apply the M-estimation theory to the estimation of linking errors. The three log-mean-mean linking steps can be formalized as the following six estimating equations:
$\begin{pmatrix} \sum_{i=1}^{I} g_1(\delta; X_i) \\ \sum_{i=1}^{I} g_2(\delta; X_i) \\ \sum_{i=1}^{I} g_3(\delta; X_i) \\ \sum_{i=1}^{I} g_4(\delta; X_i) \\ \sum_{i=1}^{I} g_5(\delta; X_i) \\ \sum_{i=1}^{I} g_6(\delta; X_i) \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{I} d_{i1} ( \log \hat{a}_{ic1} - \log \hat{\alpha}_{i1} - s_1 ) \\ \sum_{i=1}^{I} d_{i1} ( \exp(s_1) \hat{b}_{ic1} - \hat{\beta}_{i1} + \mu_1 ) \\ \sum_{i=1}^{I} d_{i0} ( \log \hat{\alpha}_{i2} - \log \hat{\alpha}_{i1} - s_0 ) \\ \sum_{i=1}^{I} d_{i0} ( \exp(s_0) \hat{\beta}_{i2} - \hat{\beta}_{i1} + \mu_0 ) \\ \sum_{i=1}^{I} d_{i2} ( \log \hat{a}_{ic2} - \log \hat{\alpha}_{i2} - s_2 ) \\ \sum_{i=1}^{I} d_{i2} ( \exp(s_2) \hat{b}_{ic2} - \hat{\beta}_{i2} + m_2 ) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$
The estimating equations in (67) define an estimate of the parameter of interest $\delta = ( s_{c1}, \mu_{c1}, s_0, \mu_0, s_{c2}, m_{c2} )$. We now derive the asymptotic variance of $\hat{\delta}$ by means of the sandwich Formula (6).
The entries of the meat matrix $B_I(\delta)$ in the sandwich formula are denoted by
$B_I(\delta) = \begin{pmatrix} B_{11} & B_{21} & 0 & 0 & B_{51} & B_{61} \\ B_{21} & B_{22} & 0 & 0 & B_{52} & B_{62} \\ 0 & 0 & B_{33} & B_{43} & 0 & 0 \\ 0 & 0 & B_{43} & B_{44} & 0 & 0 \\ B_{51} & B_{52} & 0 & 0 & B_{55} & B_{65} \\ B_{61} & B_{62} & 0 & 0 & B_{65} & B_{66} \end{pmatrix} .$
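The non-zero entries in $B_I(\delta)$ are given as
$\begin{aligned} B_{11} &= I \kappa_1 ( \tau_{a,\mathrm{DIF}}^2 + \tau_{a,\mathrm{DIF} \times \mathrm{IPD}}^2 ) \\ B_{21} &= I \kappa_1 ( \tau_{ab,\mathrm{DIF}} + \tau_{ab,\mathrm{DIF} \times \mathrm{IPD}} ) \\ B_{22} &= I \kappa_1 ( \tau_{b,\mathrm{DIF}}^2 + \tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2 ) \\ B_{33} &= 2 I \kappa_0 \tau_{a,\mathrm{IPD}}^2 \\ B_{43} &= 2 I \kappa_0 \tau_{ab,\mathrm{IPD}} \\ B_{44} &= 2 I \kappa_0 \tau_{b,\mathrm{IPD}}^2 \\ B_{51} &= I \kappa_0 \tau_{a,\mathrm{DIF}}^2 \\ B_{52} &= I \kappa_0 \tau_{ab,\mathrm{DIF}} \\ B_{61} &= I \kappa_0 \exp(s_0)^{-1} \tau_{ab,\mathrm{DIF}} \\ B_{62} &= I \kappa_0 \exp(s_0)^{-1} \tau_{b,\mathrm{DIF}}^2 \\ B_{55} &= I \kappa_2 ( \tau_{a,\mathrm{DIF}}^2 + \tau_{a,\mathrm{DIF} \times \mathrm{IPD}}^2 ) \\ B_{65} &= I \kappa_2 \exp(s_0)^{-1} ( \tau_{ab,\mathrm{DIF}} + \tau_{ab,\mathrm{DIF} \times \mathrm{IPD}} ) \\ B_{66} &= I \kappa_2 \exp(s_0)^{-2} ( \tau_{b,\mathrm{DIF}}^2 + \tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2 ) \end{aligned}$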
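Moreover, we can determine the bread matrix $A_I(\delta)$ as
$A_I(\delta) = \begin{pmatrix} -I\kappa_1 & 0 & 0 & 0 & 0 & 0 \\ I\kappa_1 D_1 & I\kappa_1 & 0 & 0 & 0 & 0 \\ 0 & 0 & -I\kappa_0 & 0 & 0 & 0 \\ 0 & 0 & I\kappa_0 D_0 & I\kappa_0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -I\kappa_2 & 0 \\ 0 & 0 & 0 & 0 & I\kappa_2 D_2 & I\kappa_2 \end{pmatrix} , \quad \text{where}$
$D_1 = I^{-1} \kappa_1^{-1} \sum_{i=1}^{I} d_{i1} ( b_i - \mu_1 ) , \quad D_0 = I^{-1} \kappa_0^{-1} \sum_{i=1}^{I} d_{i0} ( b_i - \mu_0 ) , \quad D_2 = I^{-1} \kappa_2^{-1} \sum_{i=1}^{I} d_{i2} ( b_i - m_2 )$
The inverse of $A_I(\delta)$ can be computed as
$( A_I(\delta) )^{-1} = \begin{pmatrix} -I^{-1}\kappa_1^{-1} & 0 & 0 & 0 & 0 & 0 \\ I^{-1}\kappa_1^{-1} D_1 & I^{-1}\kappa_1^{-1} & 0 & 0 & 0 & 0 \\ 0 & 0 & -I^{-1}\kappa_0^{-1} & 0 & 0 & 0 \\ 0 & 0 & I^{-1}\kappa_0^{-1} D_0 & I^{-1}\kappa_0^{-1} & 0 & 0 \\ 0 & 0 & 0 & 0 & -I^{-1}\kappa_2^{-1} & 0 \\ 0 & 0 & 0 & 0 & I^{-1}\kappa_2^{-1} D_2 & I^{-1}\kappa_2^{-1} \end{pmatrix}$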
Using (69) and (72), the variance matrix $V_I$ can be computed.
We now derive the linking error of the trend estimate in standard deviations that is a nonlinear function of $\delta = ( s_{c1}, \mu_{c1}, s_0, \mu_0, s_{c2}, m_{c2} )$
$\Delta \sigma_c = h(\delta) = \sigma_2 - \sigma_1 = \exp( s_0 + s_{c2} ) - \exp( s_{c1} ) .$
The first-order partial derivatives of $h$ are given by
$\frac{\partial h}{\partial \delta} = \left( -\exp(s_{c1}), \; 0, \; \exp(s_0)\exp(s_{c2}), \; 0, \; \exp(s_0)\exp(s_{c2}), \; 0 \right) .$
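Using (56), we can determine the square of the linking error $\mathrm{LE}(\Delta \sigma_c)^2$ by using computer algebra software [47,48] as
$\mathrm{LE}(\Delta \sigma_c)^2 = \frac{\tau_{a,\mathrm{DIF}}^2}{I} \left( \frac{\sigma_1^2}{\kappa_1} + \frac{\sigma_2^2}{\kappa_2} - \frac{2 \sigma_1 \sigma_2 \kappa_0}{\kappa_1 \kappa_2} \right) + \frac{\tau_{a,\mathrm{IPD}}^2}{I} \frac{2 \sigma_2^2}{\kappa_0} + \frac{\tau_{a,\mathrm{DIF} \times \mathrm{IPD}}^2}{I} \left( \frac{\sigma_1^2}{\kappa_1} + \frac{\sigma_2^2}{\kappa_2} \right) + \frac{\tau_{ab,\mathrm{DIF}}}{I} \left( \frac{D_2 \sigma_1 \sigma_2 \kappa_0}{\kappa_1 \kappa_2 \sigma_0} + \frac{D_1 \sigma_1 \sigma_2 \kappa_0}{\kappa_1 \kappa_2} - \frac{D_1 \sigma_1^2}{\kappa_1} - \frac{D_2 \sigma_2^2}{\kappa_2 \sigma_0} \right) - \frac{\tau_{ab,\mathrm{IPD}}}{I} \frac{2 D_0 \sigma_2^2}{\kappa_0} - \frac{\tau_{ab,\mathrm{DIF} \times \mathrm{IPD}}}{I} \left( \frac{D_1 \sigma_1^2}{\kappa_1} + \frac{D_2 \sigma_2^2}{\kappa_2 \sigma_0} \right) .$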
We now compute the linking error for the trend estimate in the mean in the 2PL model. The country mean difference can be computed as
$\Delta \mu_c = h(\delta) = \mu_{c2} - \mu_{c1} = \exp(s_0) \mu_0 + m_{c2} - \mu_{c1} .$
The first-order partial derivatives are given as
$\frac{\partial h}{\partial \delta} = \left( 0, \; -1, \; \exp(s_0) \mu_0, \; \exp(s_0), \; 0, \; 1 \right) .$
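The gradient vectors of the two trend functions can be verified by central finite differences. The sketch below is only a plausibility check; the parameter values are hypothetical, and the ordering of $\delta$ is assumed to be $(s_{c1}, \mu_{c1}, s_0, \mu_0, s_{c2}, m_{c2})$.

```python
import math

def delta_sigma(d):
    """Trend in standard deviations: exp(s0 + s_c2) - exp(s_c1)."""
    s_c1, mu_c1, s_0, mu_0, s_c2, m_c2 = d
    return math.exp(s_0 + s_c2) - math.exp(s_c1)

def delta_mu(d):
    """Trend in means: exp(s0) * mu0 + m_c2 - mu_c1."""
    s_c1, mu_c1, s_0, mu_0, s_c2, m_c2 = d
    return math.exp(s_0) * mu_0 + m_c2 - mu_c1

def grad_delta_sigma(d):
    s_c1, _, s_0, _, s_c2, _ = d
    e = math.exp(s_0) * math.exp(s_c2)
    return [-math.exp(s_c1), 0.0, e, 0.0, e, 0.0]

def grad_delta_mu(d):
    _, _, s_0, mu_0, _, _ = d
    return [0.0, -1.0, math.exp(s_0) * mu_0, math.exp(s_0), 0.0, 1.0]

def numeric_grad(h, d, eps=1e-6):
    """Central finite-difference approximation of the gradient of h at d."""
    grad = []
    for k in range(len(d)):
        up, down = list(d), list(d)
        up[k] += eps
        down[k] -= eps
        grad.append((h(up) - h(down)) / (2 * eps))
    return grad

# Hypothetical linking parameters (s_c1, mu_c1, s_0, mu_0, s_c2, m_c2).
delta = [0.05, 0.20, -0.10, 0.15, 0.08, -0.30]

err_sigma = max(abs(a - b) for a, b in
                zip(grad_delta_sigma(delta), numeric_grad(delta_sigma, delta)))
err_mu = max(abs(a - b) for a, b in
             zip(grad_delta_mu(delta), numeric_grad(delta_mu, delta)))
print(err_sigma, err_mu)
```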
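The linking error for the trend estimate in means is given by
$\mathrm{LE}(\Delta \mu_c)^2 = \frac{\tau_{b,\mathrm{DIF}}^2}{I} \left( \frac{1}{\kappa_1} + \frac{1}{\kappa_2 \sigma_0^2} - \frac{2 \kappa_0}{\sigma_0 \kappa_1 \kappa_2} \right) + \frac{\tau_{b,\mathrm{IPD}}^2}{I} \frac{2 \sigma_0^2 ( 1 + D_0 \mu_0 )}{\kappa_0} + \frac{\tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2}{I} \left( \frac{1}{\kappa_1} + \frac{1}{\sigma_0^2 \kappa_2} \right) + \frac{\tau_{ab,\mathrm{DIF}}}{I} \left( \frac{D_1}{\kappa_1} + \frac{D_2}{\sigma_0 \kappa_2} - \frac{D_2 \kappa_0}{\kappa_1 \kappa_2} - \frac{D_1 \kappa_0}{\sigma_0 \kappa_1 \kappa_2} \right) + \frac{\tau_{a,\mathrm{IPD}}^2}{I} \frac{2 \sigma_0^2 \mu_0 ( \mu_0 - D_0 )}{\kappa_0} + \frac{\tau_{ab,\mathrm{IPD}}}{I} \frac{2 D_0 \sigma_0^2 + 2 \mu_0 D_0^2 \sigma_0^2 - 2 D_0 \sigma_0^2 \mu_0^2 - 4 \mu_0 \sigma_0^2}{\kappa_0} + \frac{\tau_{ab,\mathrm{DIF} \times \mathrm{IPD}}}{I} \left( \frac{D_1}{\kappa_1} + \frac{D_2}{\sigma_0 \kappa_2} \right) .$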
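The linking error formula for the 1PL model derived in [17] can be obtained from the terms in (78) that involve the variance components $\tau_{b,\mathrm{DIF}}^2$, $\tau_{b,\mathrm{IPD}}^2$, and $\tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2$ by setting $\sigma_0^2 = 1$ and $D_0 = 0$:
$\mathrm{LE}(\Delta \mu_c)^2 = \frac{\tau_{b,\mathrm{DIF}}^2}{I} \left( \frac{1}{\kappa_1} + \frac{1}{\kappa_2} - \frac{2 \kappa_0}{\kappa_1 \kappa_2} \right) + \frac{\tau_{b,\mathrm{IPD}}^2}{I} \frac{2}{\kappa_0} + \frac{\tau_{b,\mathrm{DIF} \times \mathrm{IPD}}^2}{I} \left( \frac{1}{\kappa_1} + \frac{1}{\kappa_2} \right)$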
All other components in (78) vanish in the case of the 1PL model.
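The 1PL trend linking error is simple enough to evaluate directly. The following sketch codes the three variance-component terms of the 1PL formula; the function name and the numerical inputs are hypothetical, and $\kappa_0$, $\kappa_1$, and $\kappa_2$ are treated as given constants.

```python
import math

def trend_linking_error_1pl(I, kappa0, kappa1, kappa2,
                            tau2_dif, tau2_ipd, tau2_dif_ipd):
    """Linking error of the trend in means in the 1PL model.

    tau2_* are the variance components of the difficulty DIF effects.
    """
    var = (tau2_dif / I) * (1 / kappa1 + 1 / kappa2 - 2 * kappa0 / (kappa1 * kappa2))
    var += (tau2_ipd / I) * 2 / kappa0
    var += (tau2_dif_ipd / I) * (1 / kappa1 + 1 / kappa2)
    return math.sqrt(max(var, 0.0))

# Hypothetical design: 50 items, half of them link items.
le_mixed = trend_linking_error_1pl(50, 0.5, 0.75, 0.75, 0.09, 0.04, 0.01)

# If all kappa values equal one, the cross-sectional DIF term cancels.
le_dif_only = trend_linking_error_1pl(50, 1.0, 1.0, 1.0, 0.09, 0.0, 0.0)

# Only item parameter drift: LE^2 = 2 * tau2_ipd / (I * kappa0).
le_ipd_only = trend_linking_error_1pl(50, 0.5, 1.0, 1.0, 0.0, 0.04, 0.0)
print(le_mixed, le_dif_only, le_ipd_only)
```

The second call illustrates that stable country DIF does not contribute to the trend linking error in this formula when $\kappa_0 = \kappa_1 = \kappa_2 = 1$.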

#### 5.5. Linking Error in Fixed Item Parameter Calibration

In this subsection, linking errors for the estimated mean and the estimated standard deviations under fixed item parameter calibration (FIPC) are derived. It is assumed that one uses fixed item parameters $a_{i1}$ and $b_{i1}$, but the true item parameters $a_{i2}$ and $b_{i2}$ have DIF effects and follow the data-generating model (2).
The FIPC is typically applied using marginal maximum likelihood estimation [55]. However, we derive the linking error for a diagonally weighted least squares (DWLS) estimation that approximates the former estimation method [56]. The DWLS minimizes the weighted sums of the differences between the estimated and model-implied item thresholds as well as the tetrachoric correlations. For the simplicity of exposition in this subsection, we assume that there are only DIF effects in item difficulties (i.e., uniform DIF). Hence, we can assume that the standard deviation $\sigma$ can be estimated without bias.
The item-specific weights in the DWLS estimation are precision weights $\omega$ that are defined as the inverse of the variance in thresholds $\tau$. When omitting a factor that involves the sample size, precision weights $\omega$ are determined by
$\omega = \omega(\tau) = \frac{\phi(\tau)^2}{\Phi(\tau) \left( 1 - \Phi(\tau) \right)} .$
Equation (80) is displayed in Figure 3. It can be seen that extreme thresholds are downweighted.
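The downweighting of extreme thresholds can be reproduced with a few lines of code. This is a minimal sketch of the precision weight in (80) with the sample-size factor omitted, as in the text; the helper names are hypothetical.

```python
import math

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def precision_weight(tau):
    """Precision weight phi(tau)^2 / (Phi(tau) * (1 - Phi(tau)))."""
    p = norm_cdf(tau)
    return norm_pdf(tau) ** 2 / (p * (1 - p))

weights = {tau: precision_weight(tau) for tau in (-2.0, -1.0, 0.0, 1.0, 2.0)}
for tau, w in weights.items():
    print(f"tau = {tau:+.1f}  omega = {w:.4f}")
```

The weight is largest at $\tau = 0$, where it equals $2/\pi$, and it is symmetric around zero.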
The optimization function for $\mu$ in DWLS is given by
$H(\mu) = \sum_{i=1}^{I} \omega_i \left( \hat{\tau}_{i2} - \tau_{i2}(\mu) \right)^2 ,$
where $\hat{\tau}_{i2}$ is the identified threshold of item i and $\tau_{i2}$ is the model-implied threshold. Now, define $\alpha_i = a_i \sigma \left( a_i^2 \sigma^2 + L \right)^{-1/2}$, where $L$ is the logistic variance that is the byproduct of using the logistic instead of the probit link function in the 2PL model. Then, (81) can be rewritten as
$H(\mu) = \sum_{i=1}^{I} \omega_i \alpha_i \left( \tilde{b}_{i2} - ( \mu - b_{i1} ) \right)^2 = \sum_{i=1}^{I} w_i \left( \hat{b}_{i2} - ( \mu - b_{i1} ) \right)^2 ,$
where $w_i = C \omega_i \alpha_i$ (C is an appropriate scaling constant) and $\hat{b}_{i2} = \alpha_i^{-1} \hat{\tau}_{i2}$. Note that $\tilde{b}_{i2} = \mu_0 - b_{i2} = \mu_0 - b_{i1} - e_i$ for a data-generating true mean $\mu_0$.
The minimization of (82) leads to a weighted mean
$\hat{\mu} = \frac{\sum_{i=1}^{I} w_i ( \tilde{b}_{i2} + b_{i1} )}{W} = \mu_0 - \frac{\sum_{i=1}^{I} w_i e_i}{W} ,$
where $W = \sum_{i=1}^{I} w_i$, and $\mu_0$ in (83) denotes the true mean.
Hence, one can determine the linking error from the variance of the second term on the right-hand side of (83). We then obtain the linking error in fixed item parameter calibration
$\mathrm{LE}(\hat{\mu}) = \frac{\sqrt{\sum_{i=1}^{I} w_i^2}}{W} \, \tau_b .$
By the Cauchy–Schwarz inequality, it holds that
$W = \sum_{i=1}^{I} w_i \le \sqrt{I} \cdot \sqrt{\sum_{i=1}^{I} w_i^2} .$
Hence, we get from (84) by using (85)
$\mathrm{LE}(\hat{\mu}) \ge \frac{1}{\sqrt{I}} \, \tau_b .$
Hence, the linking error has a lower bound that is attained when all item-specific weights are set equal to one. This case corresponds to the linking error in the Rasch model obtained for mean-mean linking [15,17]
$\mathrm{LE}(\hat{\mu}) = \frac{\tau_b}{\sqrt{I}} .$
This linking error is also obtained in FIPC of the 1PL model using unweighted least squares (ULS) estimation.
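The effect of unequal weights on the linking error can be illustrated numerically. The sketch below evaluates $\mathrm{LE}(\hat{\mu}) = \tau_b \sqrt{\sum_i w_i^2} / W$ for equal and unequal weights; the weights and the DIF standard deviation are hypothetical.

```python
import math

def fipc_linking_error(weights, tau_b):
    """Linking error of the weighted mean under uniform DIF with SD tau_b."""
    W = sum(weights)
    return math.sqrt(sum(w * w for w in weights)) / W * tau_b

tau_b = 0.25                                    # hypothetical DIF standard deviation
equal = [1.0] * 20                              # unweighted case
unequal = [0.2 + 0.04 * i for i in range(20)]   # hypothetical precision weights

le_equal = fipc_linking_error(equal, tau_b)
le_unequal = fipc_linking_error(unequal, tau_b)
lower_bound = tau_b / math.sqrt(len(equal))
print(le_equal, le_unequal, lower_bound)
```

With equal weights, the lower bound $\tau_b / \sqrt{I}$ is attained; any heterogeneity in the weights increases the linking error.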
Interestingly, the finding in (86) illustrates that using incorrect item parameters in the FIPC in the presence of DIF effects results in a precision loss. Hence, the maximum likelihood estimation can only achieve the most efficient estimates under correctly specified models. Because it is almost always expected that there are some intentionally unmodeled DIF effects in real datasets, the dominance of the maximum likelihood estimation in LSA practice can be questioned.

#### 5.6. Linking Error in Concurrent Calibration

In this subsection, we derive the linking error for the estimation of the 2PL model in two groups. The mean and the standard deviation of the ability variable $\theta$ in the first group are fixed to 0 and 1, respectively. The mean $\mu$ and the standard deviation $\sigma$ of the second group are estimated.
As in Section 5.5, we assume that there are only DIF effects $e_i$ in item difficulties and no DIF effects in item discriminations. We assume that $\sigma$ can be estimated without bias and derive the linking error under concurrent calibration using DWLS. DIF effects $e_i$ with zero means follow $b_{i2} = b_{i1} + e_i$. The identified parameters are given by $\hat{b}_{i1} = b_{i1}$ and $\hat{b}_{i2} = b_{i2} - \mu$. The estimation of the vector of common item difficulties $b = ( b_1, \ldots, b_I )$ and the mean $\mu$ is conducted using a weighted square loss of differences in estimated and model-implied item thresholds. By simplifying the setting while assuming known item discriminations and standard deviation, the optimization function is given by
$H(\mu, b) = \sum_{i=1}^{I} w_{i1} ( \hat{b}_{i1} - b_i )^2 + \sum_{i=1}^{I} w_{i2} ( \hat{b}_{i2} + \mu - b_i )^2 ,$
where weights $w_{ig}$ are allowed to be group specific ($g = 1, 2$). The estimating equations for $\mu$ and $b$ are given as
$w_{i1} ( \hat{b}_{i1} - \hat{b}_i ) + w_{i2} ( \hat{b}_{i2} + \hat{\mu} - \hat{b}_i ) = 0 \quad \text{for } i = 1, \ldots, I , \quad \text{and}$
$\sum_{i=1}^{I} w_{i2} ( \hat{b}_{i2} + \hat{\mu} - \hat{b}_i ) = 0 .$
By inserting (89) into (90), we obtain
$\sum_{i=1}^{I} w_{i2} \hat{b}_{i2} + \sum_{i=1}^{I} w_{i2} \hat{\mu} - \sum_{i=1}^{I} w_{i2} \frac{w_{i1} \hat{b}_{i1} + w_{i2} \hat{b}_{i2} + w_{i2} \hat{\mu}}{w_{i1} + w_{i2}} = 0 .$
Equation (91) can be further simplified to
$\sum_{i=1}^{I} \frac{w_{i1} w_{i2}}{w_{i1} + w_{i2}} \hat{b}_{i2} + \sum_{i=1}^{I} \frac{w_{i1} w_{i2}}{w_{i1} + w_{i2}} \hat{\mu} - \sum_{i=1}^{I} \frac{w_{i1} w_{i2}}{w_{i1} + w_{i2}} \hat{b}_{i1} = 0 .$
By defining $w_i^* = ( w_{i1} + w_{i2} )^{-1} w_{i1} w_{i2}$, we obtain from (92)
$\hat{\mu} = - \frac{\sum_{i=1}^{I} w_i^* ( \hat{b}_{i2} - \hat{b}_{i1} )}{\sum_{i=1}^{I} w_i^*} = \mu - \frac{\sum_{i=1}^{I} w_i^* e_i}{\sum_{i=1}^{I} w_i^*} .$
Hence, we obtain the linking error for $\hat{\mu}$
$\mathrm{LE}(\hat{\mu}) = \frac{\sqrt{\sum_{i=1}^{I} ( w_i^* )^2}}{W^*} \, \tau_b ,$
where $W^* = \sum_{i=1}^{I} w_i^*$. Again, one can conclude $\mathrm{LE}(\hat{\mu}) \ge \tau_b / \sqrt{I}$. Hence, using the linking error $\tau_b / \sqrt{I}$ that does not take item-specific weights into account provides a lower bound of the true linking error.
If the weights $w_{i1}$ and $w_{i2}$ were equal across both groups, we would obtain the same linking error as under fixed item parameter calibration.
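The relation between the two calibration approaches can be checked with a short computation. The sketch below combines group-specific weights into $w_i^* = w_{i1} w_{i2} / ( w_{i1} + w_{i2} )$ and compares the result with the single-weight formula from the FIPC case; all weights and the value of $\tau_b$ are hypothetical.

```python
import math

def weighted_linking_error(weights, tau_b):
    """tau_b * sqrt(sum w_i^2) / sum w_i for a set of item weights."""
    return math.sqrt(sum(w * w for w in weights)) / sum(weights) * tau_b

def concurrent_linking_error(w1, w2, tau_b):
    """Linking error under concurrent calibration with group-specific weights."""
    w_star = [a * b / (a + b) for a, b in zip(w1, w2)]
    return weighted_linking_error(w_star, tau_b)

tau_b = 0.25
w_group1 = [0.2 + 0.04 * i for i in range(20)]   # hypothetical weights, group 1
w_group2 = [1.0 - 0.03 * i for i in range(20)]   # hypothetical weights, group 2

le_concurrent = concurrent_linking_error(w_group1, w_group2, tau_b)
le_same_weights = concurrent_linking_error(w_group1, w_group1, tau_b)
le_fipc = weighted_linking_error(w_group1, tau_b)
lower_bound = tau_b / math.sqrt(len(w_group1))
print(le_concurrent, le_same_weights, le_fipc, lower_bound)
```

When both groups share the same weights, $w_i^* = w_i / 2$, and the common factor cancels in the ratio.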

#### 5.7. Linking Error for Derived Parameters

In previous sections of this paper, we derived the linking error for $\mu$ and $\sigma$ for different applications of the 2PL model. Sometimes, other distribution parameters might be estimated. In this subsection, we assume that the ability variable $\theta$ is approximately normally distributed and derive the linking error for nonlinear functions $h$ of $\mu$ and $\sigma$. Let $\nu = h(\delta) = h(\mu, \sigma)$ be a nonlinear function of the mean $\mu$ and the standard deviation $\sigma$. Let $V$ denote the variance matrix of $\hat{\delta}$ regarding item choice (i.e., quantifying linking errors). The linking error of the estimate $\hat{\nu} = h(\hat{\mu}, \hat{\sigma})$ is given as
$\mathrm{LE}(\hat{\nu}) = \sqrt{ u^\top V u } , \quad \text{where} \quad u^\top = \left( \frac{\partial h}{\partial \mu}, \; \frac{\partial h}{\partial \sigma} \right) \quad \text{and} \quad V = \begin{pmatrix} v_\mu^2 & v_{\mu\sigma} \\ v_{\mu\sigma} & v_\sigma^2 \end{pmatrix} .$
We now illustrate the computation in two examples.

#### 5.7.1. Proportions

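First, we are interested in computing the probability p
$p = h(\mu, \sigma) = P( \theta \in [ c_1, c_2 ] ) = \Phi\left( \sigma^{-1} ( c_2 - \mu ) \right) - \Phi\left( \sigma^{-1} ( c_1 - \mu ) \right) .$
The partial derivatives of $h$ are given as
$\frac{\partial h}{\partial \mu} = -\sigma^{-1} \left[ \phi\left( \sigma^{-1} ( c_2 - \mu ) \right) - \phi\left( \sigma^{-1} ( c_1 - \mu ) \right) \right] = -\sigma^{-1} d_\mu(\mu, \sigma, c_1, c_2) \quad \text{and}$
$\frac{\partial h}{\partial \sigma} = -\sigma^{-2} \left[ ( c_2 - \mu ) \phi\left( \sigma^{-1} ( c_2 - \mu ) \right) - ( c_1 - \mu ) \phi\left( \sigma^{-1} ( c_1 - \mu ) \right) \right] = -\sigma^{-2} d_\sigma(\mu, \sigma, c_1, c_2) ,$
where $d_\mu(\mu, \sigma, c_1, c_2) = \phi\left( \sigma^{-1} ( c_2 - \mu ) \right) - \phi\left( \sigma^{-1} ( c_1 - \mu ) \right)$ and $d_\sigma(\mu, \sigma, c_1, c_2) = ( c_2 - \mu ) \phi\left( \sigma^{-1} ( c_2 - \mu ) \right) - ( c_1 - \mu ) \phi\left( \sigma^{-1} ( c_1 - \mu ) \right)$. Hence, we can determine the linking error of $\hat{p} = h(\hat{\mu}, \hat{\sigma})$ using (95) as
$\mathrm{LE}(\hat{p}) = \sqrt{ \hat{\sigma}^{-2} \hat{d}_\mu^2 \hat{v}_\mu^2 + \hat{\sigma}^{-4} \hat{d}_\sigma^2 \hat{v}_\sigma^2 + 2 \hat{\sigma}^{-3} \hat{d}_\mu \hat{d}_\sigma \hat{v}_{\mu\sigma} } ,$
where $\hat{d}_\mu$ and $\hat{d}_\sigma$ denote $d_\mu$ and $d_\sigma$ evaluated at $( \hat{\mu}, \hat{\sigma} )$, and $\hat{v}_\mu^2$, $\hat{v}_\sigma^2$, and $\hat{v}_{\mu\sigma}$ are estimated linking variances and covariances.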
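The delta-method computation for a proportion can be sketched as follows. The gradient is obtained by differentiating $p$ directly, which gives the chain-rule factors $( c - \mu )$ in the derivative with respect to $\sigma$, and it is checked against central finite differences. The cut points, distribution parameters, and linking (co)variances are hypothetical values.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def norm_cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def proportion(mu, sigma, c1, c2):
    """P(c1 <= theta <= c2) for theta ~ N(mu, sigma^2)."""
    return norm_cdf((c2 - mu) / sigma) - norm_cdf((c1 - mu) / sigma)

def proportion_gradient(mu, sigma, c1, c2):
    """Analytic partial derivatives of the proportion w.r.t. mu and sigma."""
    z1, z2 = (c1 - mu) / sigma, (c2 - mu) / sigma
    dh_dmu = -(norm_pdf(z2) - norm_pdf(z1)) / sigma
    dh_dsigma = -(z2 * norm_pdf(z2) - z1 * norm_pdf(z1)) / sigma
    return dh_dmu, dh_dsigma

def delta_method_le(grad, v_mu2, v_sigma2, v_mu_sigma):
    """sqrt(u' V u) for a 2x2 linking variance matrix V."""
    g_mu, g_sigma = grad
    var = g_mu ** 2 * v_mu2 + g_sigma ** 2 * v_sigma2 + 2 * g_mu * g_sigma * v_mu_sigma
    return math.sqrt(var)

# Hypothetical estimates and linking (co)variances.
mu, sigma, c1, c2 = 0.1, 1.2, -0.5, 0.8
grad = proportion_gradient(mu, sigma, c1, c2)
le_p = delta_method_le(grad, v_mu2=0.004, v_sigma2=0.002, v_mu_sigma=0.0005)

# Finite-difference check of the analytic gradient.
eps = 1e-6
num_mu = (proportion(mu + eps, sigma, c1, c2) - proportion(mu - eps, sigma, c1, c2)) / (2 * eps)
num_sigma = (proportion(mu, sigma + eps, c1, c2) - proportion(mu, sigma - eps, c1, c2)) / (2 * eps)
print(le_p, grad, (num_mu, num_sigma))
```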

#### 5.7.2. Percentiles

Second, the linking error of the pth percentile $Q_p$ should be computed. The pth percentile is defined by
$P( \theta \le Q_p ) = \Phi\left( \sigma^{-1} ( Q_p - \mu ) \right) = p .$
Hence, we can solve (100) for $Q_p$ and obtain
$Q_p = h(\mu, \sigma) = \mu + \sigma z_p ,$
where $z_p = \Phi^{-1}(p)$ is the pth percentile for the standard normal distribution.
Now, we can compute the linking error of $\hat{Q}_p$. We obtain
$\frac{\partial h}{\partial \mu} = 1 \quad \text{and} \quad \frac{\partial h}{\partial \sigma} = z_p .$
The linking error of the percentile $Q_p$ using (95) is given by
$\mathrm{LE}(\hat{Q}_p) = \sqrt{ \hat{v}_\mu^2 + z_p^2 \hat{v}_\sigma^2 + 2 z_p \hat{v}_{\mu\sigma} } .$
It can be seen that for more extreme percentiles, the absolute value of $z_p$ gets larger and the linking error for $\sigma$ (i.e., $\hat{v}_\sigma$ in (103)) becomes more relevant.
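The growing influence of the linking error for $\sigma$ at extreme percentiles can be seen in a small example. The sketch below evaluates $\mathrm{LE}(\hat{Q}_p) = \sqrt{ \hat{v}_\mu^2 + z_p^2 \hat{v}_\sigma^2 + 2 z_p \hat{v}_{\mu\sigma} }$ for several values of p; the linking variances are hypothetical, and the covariance is set to zero for simplicity.

```python
import math
from statistics import NormalDist

def percentile_linking_error(p, v_mu2, v_sigma2, v_mu_sigma):
    """Linking error of the p-th percentile mu + sigma * z_p."""
    z_p = NormalDist().inv_cdf(p)
    return math.sqrt(v_mu2 + z_p ** 2 * v_sigma2 + 2 * z_p * v_mu_sigma)

# Hypothetical linking variances of the mean and the standard deviation.
v_mu2, v_sigma2, v_mu_sigma = 0.004, 0.002, 0.0

le_by_p = {p: percentile_linking_error(p, v_mu2, v_sigma2, v_mu_sigma)
           for p in (0.05, 0.25, 0.50, 0.75, 0.95)}
for p, le in le_by_p.items():
    print(f"p = {p:.2f}  LE = {le:.4f}")
```

At the median, only the linking error of the mean contributes; toward the tails, the term $z_p^2 \hat{v}_\sigma^2$ increasingly dominates.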

#### 5.8. Computation of Total Error and Sampling Error Correction for Linking Error Estimates

In practical applications, the sampling error $\mathrm{SE}$ due to the sampling of persons must also be taken into account to quantify the uncertainty in the $\sigma$ and $\mu$ estimates. The total error ($\mathrm{TE}$) is given by (see [18,19])
$\mathrm{TE} = \sqrt{ \mathrm{SE}^2 + \mathrm{LE}^2 } .$
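As a brief numerical illustration, the total error combines both error sources on the variance scale. The values below are hypothetical.

```python
import math

def total_error(se, le):
    """Total error sqrt(SE^2 + LE^2) from sampling error and linking error."""
    return math.hypot(se, le)

se, le = 0.03, 0.04                      # hypothetical sampling and linking error
te = total_error(se, le)
linking_share = le ** 2 / te ** 2        # share of total variance due to linking
print(te, linking_share)
```

In this example, the linking error accounts for roughly 64% of the total error variance, so ignoring it would understate the uncertainty considerably.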
A critical issue might be that estimated linking errors also include a portion of variability that can be attributed to the sampling error of persons. We illustrate an analytical bias correction method for the case of log-mean-mean linking. The issue in the sandwich estimate is the variance matrix (i.e., the meat matrix) $B_I$ (see (15) for the computation). The identified item parameters also include a sampling error variance that can be estimated by fitting an item response model. Let $v_{ia}^*$ be the variance of $\log \hat{a}_{i2} - \log \hat{a}_{i1}$ due to person sampling (i.e., $v_{ia}^* = \mathrm{Var}( \log \hat{a}_{i2} - \log \hat{a}_{i1} )$). Then, we can determine a corresponding entry in the meat matrix $B_I$ by modifying (15) into
$\sum_{i=1}^{I} \mathrm{Var} \, g_1( \delta; a_{i1}, b_{i1}, f_i, e_i ) = \sum_{i=1}^{I} \mathrm{Var}\left( \log \hat{a}_{i2} - \log \hat{a}_{i1} - s \right) = \sum_{i=1}^{I} \left( \mathrm{Var}( f_i ) + v_{ia}^* \right) = I \tau_a^2 + \sum_{i=1}^{I} v_{ia}^* .$
In the derivation of (105), we relied on the property that item parameters of different items are approximately uncorrelated in sufficiently long tests [57]. The other entries in $B_I$ can be similarly determined. By defining $v_{ib}^* = \mathrm{Var}( \hat{b}_{i2} - \hat{b}_{i1} )$ and $v_{iab}^* = \mathrm{Cov}( \log \hat{a}_{i2} - \log \hat{a}_{i1}, \hat{b}_{i2} - \hat{b}_{i1} )$, Equations (16) and (17) can be modified to
$\sum_{i=1}^{I} \mathrm{Cov}\left( g_1( \delta; a_{i1}, b_{i1}, f_i, e_i ), g_2( \delta; a_{i1}, b_{i1}, f_i, e_i ) \right) = I \tau_{ab} + \sum_{i=1}^{I} v_{iab}^* , \quad \text{and}$
$\sum_{i=1}^{I} \mathrm{Var} \, g_2( \delta; a_{i1}, b_{i1}, f_i, e_i ) = I \tau_b^2 + \sum_{i=1}^{I} v_{ib}^* .$
Hence, sampling error contributions can be removed from the originally obtained meat matrix $B_I$ by defining a bias-corrected estimate
$B_I^*( \hat{\delta} ) = B_I( \hat{\delta} ) - V^* , \quad \text{where} \quad V^* = \begin{pmatrix} \sum_{i=1}^{I} v_{ia}^* & \sum_{i=1}^{I} v_{iab}^* \\ \sum_{i=1}^{I} v_{iab}^* & \sum_{i=1}^{I} v_{ib}^* \end{pmatrix} .$
This approach was implemented in the simplified setting in the 1PL model [17].
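The bias correction for log-mean-mean linking can be sketched in a few lines. The example assumes that the empirical meat matrix is the sum of outer products of the per-item estimating-function values, which is the usual sandwich construction; the per-item values and sampling (co)variances are hypothetical numbers.

```python
def meat_matrix(g1, g2):
    """Empirical 2x2 meat matrix: sum of outer products of (g1_i, g2_i)."""
    b11 = sum(x * x for x in g1)
    b12 = sum(x * y for x, y in zip(g1, g2))
    b22 = sum(y * y for y in g2)
    return [[b11, b12], [b12, b22]]

def bias_corrected_meat(B, v_a, v_b, v_ab):
    """Subtract summed sampling (co)variances V* from the meat matrix."""
    V_star = [[sum(v_a), sum(v_ab)], [sum(v_ab), sum(v_b)]]
    return [[B[r][c] - V_star[r][c] for c in range(2)] for r in range(2)]

# Hypothetical per-item estimating-function values at the estimate.
g1 = [0.10, -0.20, 0.05, 0.05]      # log-discrimination differences minus s
g2 = [0.30, -0.10, -0.25, 0.05]     # difficulty differences plus mu

# Hypothetical per-item sampling variances and covariances.
v_a = [0.002] * 4
v_b = [0.005] * 4
v_ab = [0.0005] * 4

B_hat = meat_matrix(g1, g2)
B_star = bias_corrected_meat(B_hat, v_a, v_b, v_ab)
print(B_hat, B_star)
```

In small item samples, the corrected diagonal entries can become negative; how to handle that case (for example, by truncation at zero) is a design choice that is not addressed here.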
We now show how to generalize the bias-corrected estimate of the meat matrix $B_I$. We can rewrite the estimate from Equation (9) as
$\hat{B}_I( \hat{\delta} ) = \sum_{i=1}^{I} g( \hat{\delta}; \hat{\gamma}_i ) \, g( \hat{\delta}; \hat{\gamma}_i )^\top ,$
where $\hat{\gamma}_i = ( \hat{a}_{i1}, \hat{b}_{i1}, \hat{a}_{i2}, \hat{b}_{i2} )$. The estimate $\hat{\gamma}_i$ includes the sampling error, and the sampling variance is denoted as
$V_i = \mathrm{Var}( \hat{\gamma}_i ) = E\left[ ( \hat{\gamma}_i - \gamma_i ) ( \hat{\gamma}_i - \gamma_i )^\top \right] .$
The estimating function $g$ in (109) can be viewed as a function $g = g(\delta, \gamma)$. Denote by $g_\gamma = ( \partial g ) / ( \partial \gamma )$ the matrix of partial derivatives with respect to $\gamma$. We can now apply a Taylor expansion and obtain
$g( \hat{\delta}, \hat{\gamma}_i ) = g( \hat{\delta}, \gamma_i ) + g_\gamma( \hat{\delta}, \gamma_i ) ( \hat{\gamma}_i - \gamma_i ) .$
Using $E( \hat{\gamma}_i ) = \gamma_i$, we can obtain a bias-corrected estimate of the meat matrix as
$B_I^*( \hat{\delta} ) = \hat{B}_I( \hat{\delta} ) - \sum_{i=1}^{I} g_\gamma( \hat{\delta}, \gamma_i ) \, V_i \, g_\gamma( \hat{\delta}, \gamma_i )^\top .$
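The general correction can be sketched for log-mean-mean linking of two groups. The example assumes estimating functions of the form $g_1 = \log a_{i2} - \log a_{i1} - s$ and $g_2 = \exp(s) b_{i2} - b_{i1} + \mu$, mirroring the estimating equations used earlier, and it plugs in estimated item parameters in place of the unknown $\gamma_i$. All names and numbers are hypothetical.

```python
import math

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def jacobian_g(gamma, s):
    """Partial derivatives of (g1, g2) w.r.t. gamma = (a1, b1, a2, b2)."""
    a1, b1, a2, b2 = gamma
    return [
        [-1.0 / a1, 0.0, 1.0 / a2, 0.0],   # g1 = log a2 - log a1 - s
        [0.0, -1.0, 0.0, math.exp(s)],     # g2 = exp(s) * b2 - b1 + mu
    ]

def bias_corrected_meat(B_hat, gammas, sampling_covs, s):
    """B* = B_hat - sum_i G_i V_i G_i' with G_i the Jacobian for item i."""
    B_star = [row[:] for row in B_hat]
    for gamma, V in zip(gammas, sampling_covs):
        G = jacobian_g(gamma, s)
        C = matmul(matmul(G, V), transpose(G))
        for r in range(2):
            for c in range(2):
                B_star[r][c] -= C[r][c]
    return B_star

# One hypothetical item with unit discriminations and s = 0.
gammas = [(1.0, -0.3, 1.0, 0.2)]
V_item = [[0.01, 0.0, 0.0, 0.0],
          [0.0, 0.02, 0.0, 0.0],
          [0.0, 0.0, 0.03, 0.0],
          [0.0, 0.0, 0.0, 0.04]]
B_hat = [[0.10, 0.02], [0.02, 0.20]]

B_star = bias_corrected_meat(B_hat, gammas, [V_item], s=0.0)
print(B_star)
```

With independent sampling errors, the correction subtracts the summed sampling variances of the two discriminations (on the log scale) and of the two difficulties, which agrees with the special case $B_I - V^*$.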

## 6. Discussion

In this article, we have shown that the sandwich formula from the M-estimation theory can be successfully employed for computing the linking error in the 2PL model in a variety of situations. It was shown in a simulation study for the log-mean-mean linking of two groups that the expected sandwich estimate of the linking error produced satisfactory coverage rates. Interestingly, it had a comparable performance to the jackknife linking error in the 2PL model.
As with any simulation study, some limitations of our study can be stated. More comprehensive studies could involve a different range of standard deviations of the DIF effects, different test lengths, or other linking approaches. It might be interesting to compare the performance of the M-estimation approach with the jackknife linking error for the more complex linking problems of chain linking and trend estimation.
In the simulation study, we only considered uncertainty in distribution parameters due to DIF effects (i.e., linking errors). In practical applications, there will also be a sampling of persons, and the simultaneous assessment of sampling errors and linking errors would be an exciting extension of this study.
In most of the applications involving instruments with cognitive and noncognitive items, linking errors are not reported even if linking approaches were utilized [43,58,59]. There might be two reasons why this is the case. First, simulation studies typically presuppose that the IRT model perfectly fits the data. That is, the DIF is absent, and the IRT model is correctly specified. Second, items could be treated as fixed, and the model misspecification is taken for granted but is not included in the uncertainty quantification of the linking approach. In our view, linking errors provide additional information about the impact of heterogeneous item functioning on parameters of interest and should, therefore, (almost always) be additionally reported. In general, we do not think that the presence of DIF or IPD threatens the validity of group differences or trend estimates.
It should be emphasized that our proposed linking error for the trend estimates in the 2PL model differs from a newly proposed linking error estimate since PISA 2015 [4]. The latter relies on a recalibration approach of the item response data from the first time point [34,60]. The new PISA linking error rather assesses the extent of the assumed noninvariant item parameters across the two PISA studies instead of quantifying the variability due to heterogeneous item functioning.
The derived linking errors rely on the assumption that item parameters are identified. In LSA studies such as PISA, balanced incomplete block designs are employed in which only a subset of items is administered to each student [61]. If item parameters can be identified in such test designs, linking error formulas would not change because country means and standard deviations are based on all the items administered in the test, irrespective of the proportion of items administered to each student.
As argued by an anonymous reviewer, the computation of linking errors relies on the assumption of a representative item sample from an item population. Notably, the identification of item parameters in the joint maximum likelihood estimation also requires asymptotic regimes regarding the sample size and the number of items [62,63,64] but does not require that items are a random sample from an item population. However, the main difference in the computation of linking errors is that linking errors reflect the variability in the distribution parameter estimates due to the DIF. Infinite item samples are not required for model identification; they are only used as a statistical tool for justifying the statistical inference with respect to modeled or unmodeled DIF.
Finally, we assumed that DIF effects had zero means throughout this paper. However, this assumption is not essential in deriving linking errors because the M-estimation theory does not require that estimated parameters converge to true parameters. The M-estimation theory will nevertheless provide the asymptotic variance in a potentially biased estimator. However, DIF effects could also result in a mean that is different from zero. For example, Ref. [28] assumes that the median of the DIF effects equals zero. In this case, log-mean-mean linking in the 2PL model could be replaced by an alternative robust linking method in which the mean is substituted with the median. Notably, the choice of an adequate estimating function to provide unbiased distribution parameters depending on the distribution of DIF effects is a different topic though [18].

## Funding

This research received no external funding.

## Conflicts of Interest

The author declares no conflict of interest.

## Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Meaning |
| --- | --- |
| 1PL | one-parameter logistic |
| 2PL | two-parameter logistic |
| DIF | differential item functioning |
| DWLS | diagonally weighted least squares |
| FIPC | fixed item parameter calibration |
| IPD | item parameter drift |
| IRT | item response theory |
| JK | jackknife |
| LE | linking error |
| LSA | large-scale assessment studies |
| PIRLS | progress in international reading literacy study |
| PISA | programme for international student assessment |
| ULS | unweighted least squares |

## References

1. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604.
2. van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30.
3. Rutkowski, L.; von Davier, M.; Rutkowski, D. (Eds.) A Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Chapman Hall: London, UK; CRC Press: Boca Raton, FL, USA, 2013.
4. OECD. PISA 2018. Technical Report; OECD: Paris, France, 2020.
5. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
6. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960.
7. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
8. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236.
9. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459.
10. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993.
11. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
12. Joo, S.; Ali, U.; Robin, F.; Shin, H.J. Impact of differential item functioning on group score reporting in the context of large-scale assessments. Large-Scale Assess. Educ. 2022, 10, 18.
13. Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. 2016, 53, 152–171.
14. Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika 2017, 82, 610–636.
15. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335.
16. OECD. PISA 2012. Technical Report; OECD: Paris, France, 2014; Available online: https://bit.ly/2YLG24g (accessed on 3 December 2022).
17. Robitzsch, A.; Lüdtke, O. Linking errors in international large-scale assessments: Calculation of standard errors for trend estimation. Assess. Educ. 2019, 26, 444–465.
18. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
19. Wu, M. Measurement, sampling, and equating errors in large-scale assessments. Educ. Meas. 2010, 29, 15–27.
20. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: Boca Raton, FL, USA, 1994.
21. Kolenikov, S. Resampling variance estimation for complex survey data. Stata J. 2010, 10, 165–199.
22. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013.
23. Stefanski, L.A.; Boos, D.D. The calculus of M-estimation. Am. Stat. 2002, 56, 29–38.
24. Zeileis, A. Object-oriented computation of sandwich estimators. J. Stat. Softw. 2006, 16, 1–16.
25. Fay, M.P.; Graubard, B.I. Small-sample adjustments for Wald-type tests using sandwich estimators. Biometrics 2001, 57, 1198–1206.
26. Li, P.; Redden, D.T. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat. Med. 2015, 34, 281–296.
27. Zeileis, A.; Köll, S.; Graham, N. Various versatile variances: An object-oriented implementation of clustered covariances in R. J. Stat. Softw. 2020, 95, 1–36.
28. Chen, Y.; Li, C.; Xu, G. DIF statistical inference and detection without knowing anchoring items. arXiv 2021, arXiv:2110.11112.
29. Halpin, P.F. Differential item functioning via robust scaling. arXiv 2022, arXiv:2207.04598.
30. Wang, W.; Liu, Y.; Liu, H. Testing differential item functioning without predefined anchor items using robust regression. J. Educ. Behav. Stat. 2022, 47, 666–692.
31. Robitzsch, A. Lp loss functions in invariance alignment and Haberman linking with few or many groups. Stats 2020, 3, 246–283.
32. Hunter, J.E. Probabilistic foundations for coefficients of generalizability. Psychometrika 1968, 33, 1–18.
33. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
34. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144.
35. Maronna, R.A.; Martin, R.D.; Yohai, V.J. Robust Statistics: Theory and Methods; Wiley: New York, NY, USA, 2006.
36. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2022; Available online: https://www.R-project.org/ (accessed on 11 January 2022).
37. Novack-Gottshall, P.; Wang, S.C. KScorrect: Lilliefors-Corrected Kolmogorov-Smirnov Goodness-of-Fit Tests; R Package Version 1.4-0. 2019. Available online: https://CRAN.R-project.org/package=KScorrect (accessed on 3 July 2019).
38. Haebara, T. Equating logistic ability scales by a weighted least squares method. Jpn. Psychol. Res. 1980, 22, 144–149.
39. Bradlow, E.T.; Wainer, H.; Wang, X. A Bayesian random effects model for testlets. Psychometrika 1999, 64, 153–168. [Google Scholar] [CrossRef]
40. Sireci, S.G.; Thissen, D.; Wainer, H. On the reliability of testlet-based tests. J. Educ. Meas. 1991, 28, 237–247. [Google Scholar] [CrossRef]
41. Wainer, H.; Bradlow, E.T.; Wang, X. Testlet Response Theory and Its Applications; Cambridge University Press: Cambridge, UK, 2007. [Google Scholar] [CrossRef]
42. Monseur, C.; Sibberns, H.; Hastedt, D. Linking errors in trend estimation for international surveys in education. IERI Monogr. Ser. 2008, 1, 113–122. [Google Scholar]
43. Battauz, M. IRT test equating in complex linkage plans. Psychometrika 2013, 78, 464–480. [Google Scholar] [CrossRef]
44. Battauz, M. Factors affecting the variability of IRT equating coefficients. Stat. Neerl. 2015, 69, 85–101. [Google Scholar] [CrossRef]
45. Arce-Ferrer, A.J.; Bulut, O. Investigating separate and concurrent approaches for item parameter drift in 3PL item response theory equating. Int. J. Test. 2017, 17, 1–22. [Google Scholar] [CrossRef]
46. Taherbhai, H.; Seo, D. The philosophical aspects of IRT equating: Modeling drift to evaluate cohort growth in large-scale assessments. Educ. Meas. 2013, 32, 2–14. [Google Scholar] [CrossRef]
47. Grothendieck, G. rSymPy: R Interface to SymPy Computer Algebra System. R Package Version 0.2-1.2. 2010. Available online: https://CRAN.R-project.org/package=rSymPy (accessed on 31 July 2010).
48. Meurer, A.; Smith, C.P.; Paprocki, M.; Čertík, O.; Kirpichev, S.B.; Rocklin, M.; Kumar, A.; Ivanov, S.; Moore, J.K.; Singh, S.; et al. SymPy: Symbolic computing in Python. PeerJ Comput. Sci. 2017, 3, e103. [Google Scholar] [CrossRef] [Green Version]
49. Fischer, L.; Gnambs, T.; Rohm, T.; Carstensen, C.H. Longitudinal linking of Rasch-model-scaled competence tests in large-scale assessments: A comparison and evaluation of different linking methods and anchoring designs based on two tests on mathematical competence administered in grades 5 and 7. Psych. Test Assess. Model. 2019, 61, 37–64. [Google Scholar]
50. Sachse, K.A.; Haag, N. Standard errors for national trends in international large-scale assessments in the case of cross-national differential item functioning. Appl. Meas. Educ. 2017, 30, 102–116. [Google Scholar] [CrossRef]
51. Sachse, K.A.; Mahler, N.; Pohl, S. When nonresponse mechanisms change: Effects on trends and group comparisons in international large-scale assessments. Educ. Psychol. Meas. 2019, 79, 699–726. [Google Scholar] [CrossRef]
52. OECD. PISA 2015. Technical Report; OECD: Paris, France, 2017; Available online: https://bit.ly/32buWnZ (accessed on 3 December 2022).
53. Weeks, J.; von Davier, M.; Yamamoto, K. Design considerations for the program for international student assessment. In A Handbook of International Large-Scale Assessment: Background, Technical Issues, and Methods of Data Analysis; Rutkowski, L., von Davier, M., Rutkowski, D., Eds.; Chapman Hall: London, UK; CRC Press: Boca Raton, FL, USA, 2013; pp. 259–276. [Google Scholar] [CrossRef]
54. Kang, H.A.; Lu, Y.; Chang, H.H. IRT item parameter scaling for developing new item pools. Appl. Meas. Educ. 2017, 30, 1–15. [Google Scholar] [CrossRef]
55. König, C.; Khorramdel, L.; Yamamoto, K.; Frey, A. The benefits of fixed item parameter calibration for parameter accuracy in small sample situations in large-scale assessments. Educ. Meas. 2021, 40, 17–27. [Google Scholar] [CrossRef]
56. Cai, L.; Moustaki, I. Estimation methods in latent variable models for categorical outcome variables. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 253–277. [Google Scholar] [CrossRef]
57. Yuan, K.H.; Cheng, Y.; Patton, J. Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika 2014, 79, 232–254. [Google Scholar] [CrossRef]
58. González, J.; Wiberg, M. Applying Test Equating Methods. Using R; Springer: New York, NY, USA, 2017. [Google Scholar] [CrossRef]
59. Jewsbury, P.A. Error Variance in Common Population Linking Bridge Studies; (Research Report No. RR-19-42); Educational Testing Service: Princeton, NJ, USA, 2019. [Google Scholar] [CrossRef] [Green Version]
60. Martin, M.O.; Mullis, I.V.S.; Foy, P.; Brossman, B.; Stanco, G.M. Estimating linking error in PIRLS. IERI Monogr. Ser. 2012, 5, 35–47. Available online: https://bit.ly/2Vx3el8 (accessed on 3 December 2022).
61. Frey, A.; Hartig, J.; Rupp, A.A. An NCME instructional module on booklet designs in large-scale assessments of student achievement: Theory and practice. Educ. Meas. 2009, 28, 39–53. [Google Scholar] [CrossRef]
62. Chen, Y.; Li, X.; Zhang, S. Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika 2019, 84, 124–146. [Google Scholar] [CrossRef] [Green Version]
63. Chen, Y.; Li, X.; Zhang, S. Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. J. Am. Stat. Assoc. 2020, 115, 1756–1770. [Google Scholar] [CrossRef] [Green Version]
64. Haberman, S.J. Maximum likelihood estimates in exponential response models. Ann. Stat. 1977, 5, 815–841. [Google Scholar] [CrossRef]
Figure 1. Illustration of chain linking at three time points: T1, T2, and T3.
Figure 2. Trend estimation for country means and standard deviations at two time points in an international large-scale assessment study.
Figure 3. Precision weights $ω$ as a function of the threshold $τ$ (see Equation (80)).
Table 1. Simulation Study: Used item parameters of the 2PL model.

| Item | $a_i$ | $b_i$ |
|---|---|---|
| 1 | 0.73 | −1.31 |
| 2 | 1.25 | $−$1.44 |
| 3 | 1.20 | −1.20 |
| 4 | 1.47 | $−$0.10 |
| 5 | 0.97 | $−$0.10 |
| 6 | 1.38 | −0.74 |
| 7 | 1.05 | $−$1.48 |
| 8 | 1.14 | −0.61 |
| 9 | 1.15 | $−$0.82 |
| 10 | 0.67 | −0.07 |

Note. $a_i$ = item discrimination; $b_i$ = item difficulty.
Table 2. Simulation Study: Coverage rates for estimated mean $\hat{\mu}$ as a function of the standard deviation of DIF effects for a ($\tau_a$) and b ($\tau_b$), number of items (I), and the type of distribution for DIF effects.

| $\tau_a$ | $\tau_b$ | I | Normal: JK | Normal: ESW | Normal: OSW | Normal: BOSW | $t_4$: JK | $t_4$: ESW | $t_4$: OSW | $t_4$: BOSW | Normal Mixture: JK | Normal Mixture: ESW | Normal Mixture: OSW | Normal Mixture: BOSW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.25 | 10 | **92.2** | **92.2** | **90.8** | **92.1** | 92.9 | 92.9 | **91.5** | 92.9 | 92.7 | 92.7 | **91.3** | 92.7 |
| 0.01 | 0.25 | 20 | 93.8 | 93.8 | 93.2 | 93.8 | 94.5 | 94.5 | 93.8 | 94.5 | 94.4 | 94.4 | 93.8 | 94.4 |
| 0.01 | 0.25 | 40 | 94.6 | 94.6 | 94.3 | 94.6 | 94.9 | 94.9 | 94.5 | 94.8 | 94.9 | 94.9 | 94.6 | 94.9 |
| 0.01 | 0.25 | 80 | 95.2 | 95.2 | 95.0 | 95.1 | 95.2 | 95.2 | 95.0 | 95.1 | 95.2 | 95.2 | 95.1 | 95.2 |
| 0.01 | 0.50 | 10 | 92.5 | 92.5 | **91.2** | 92.5 | 92.9 | 92.9 | **91.5** | 92.9 | 93.1 | 93.1 | **91.7** | 93.0 |
| 0.01 | 0.50 | 20 | 94.1 | 94.1 | 93.5 | 94.1 | 94.6 | 94.6 | 94.0 | 94.6 | 94.4 | 94.4 | 93.8 | 94.4 |
| 0.01 | 0.50 | 40 | 94.7 | 94.7 | 94.4 | 94.7 | 95.0 | 95.0 | 94.7 | 95.0 | 94.8 | 94.8 | 94.5 | 94.8 |
| 0.01 | 0.50 | 80 | 95.1 | 95.1 | 94.9 | 95.0 | 95.3 | 95.3 | 95.1 | 95.3 | 95.4 | 95.4 | 95.2 | 95.3 |
| 0.25 | 0.25 | 10 | 93.9 | 93.8 | **91.1** | 92.6 | 94.6 | 94.4 | **92.0** | 93.4 | 94.4 | 94.3 | **91.8** | 93.1 |
| 0.25 | 0.25 | 20 | 94.8 | 94.8 | 93.1 | 93.8 | 94.9 | 94.9 | 93.2 | 93.9 | 95.1 | 95.1 | 93.4 | 94.0 |
| 0.25 | 0.25 | 40 | 95.1 | 95.1 | 93.7 | 94.0 | 95.5 | 95.5 | 94.0 | 94.4 | 95.2 | 95.2 | 93.6 | 94.0 |
| 0.25 | 0.25 | 80 | 95.2 | 95.2 | 93.9 | 94.0 | 95.4 | 95.4 | 94.2 | 94.4 | 95.5 | 95.5 | 94.3 | 94.4 |
| 0.25 | 0.50 | 10 | 93.0 | 92.9 | **90.9** | **92.3** | 93.7 | 93.6 | **91.6** | 93.0 | 93.5 | 93.5 | **91.3** | 92.7 |
| 0.25 | 0.50 | 20 | 94.3 | 94.3 | 93.1 | 93.8 | 94.4 | 94.4 | 93.2 | 93.8 | 94.5 | 94.5 | 93.3 | 94.0 |
| 0.25 | 0.50 | 40 | 95.0 | 95.0 | 94.2 | 94.6 | 95.1 | 95.1 | 94.3 | 94.7 | 95.1 | 95.1 | 94.3 | 94.6 |
| 0.25 | 0.50 | 80 | 95.2 | 95.2 | 94.6 | 94.8 | 95.2 | 95.1 | 94.6 | 94.7 | 95.1 | 95.1 | 94.4 | 94.6 |

Note. Normal = DIF effects normally distributed; $t_4$ = DIF effects distributed according to scaled $t_4$ distribution; Normal Mixture = DIF effects distributed according to contaminated mixture model; JK = linking error (LE) estimated by jackknife; ESW = LE estimated by expected sandwich estimator; OSW = LE estimated by observed sandwich estimator; BOSW = LE estimated by bias-corrected observed sandwich estimator; coverage rates smaller than 92.5% or larger than 97.5% are printed in bold.
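Table 3. Simulation Study: Coverage rates for estimated standard deviation $\hat{\sigma}$ as a function of the standard deviation of DIF effects for a ($\tau_a$) and b ($\tau_b$), number of items (I), and the type of distribution for DIF effects.

| $\tau_a$ | $\tau_b$ | I | Normal: JK | Normal: ESW | Normal: OSW | Normal: BOSW | $t_4$: JK | $t_4$: ESW | $t_4$: OSW | $t_4$: BOSW | Normal Mixture: JK | Normal Mixture: ESW | Normal Mixture: OSW | Normal Mixture: BOSW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.01 | 0.25 | 10 | 92.6 | 92.6 | **86.1** | **87.3** | 92.9 | 92.9 | **84.9** | **86.2** | 92.9 | 92.9 | **85.3** | **86.7** |
| 0.01 | 0.25 | 20 | 93.8 | 93.8 | **83.0** | **83.8** | 94.5 | 94.5 | **82.1** | **82.8** | 94.4 | 94.4 | **82.5** | **83.3** |
| 0.01 | 0.25 | 40 | 94.6 | 94.6 | **80.2** | **80.6** | 95.0 | 95.0 | **79.1** | **79.5** | 94.9 | 94.9 | **79.2** | **79.5** |
| 0.01 | 0.25 | 80 | 95.2 | 95.2 | **77.9** | **78.1** | 95.1 | 95.1 | **76.2** | **76.4** | 95.0 | 95.0 | **76.3** | **76.6** |
| 0.01 | 0.50 | 10 | **92.1** | **92.2** | **92.3** | 93.0 | 93.2 | 93.2 | **91.8** | 92.6 | 93.0 | 93.0 | **92.2** | 92.9 |
| 0.01 | 0.50 | 20 | 94.1 | 94.1 | **91.3** | **91.7** | 94.4 | 94.4 | **90.7** | **91.2** | 94.2 | 94.2 | **90.8** | **91.3** |
| 0.01 | 0.50 | 40 | 94.5 | 94.5 | **91.3** | **91.6** | 94.8 | 94.8 | **90.6** | **90.8** | 94.9 | 94.9 | **90.7** | **90.9** |
| 0.01 | 0.50 | 80 | 95.1 | 95.1 | 93.7 | 93.7 | 95.0 | 95.0 | 92.5 | 92.6 | 95.3 | 95.3 | 92.9 | 93.0 |
| 0.25 | 0.25 | 10 | **92.3** | **92.3** | **89.8** | **91.3** | 93.1 | 93.1 | **90.5** | **92.0** | 92.6 | 92.6 | **90.0** | **91.4** |
| 0.25 | 0.25 | 20 | 93.8 | 93.8 | **92.4** | 93.0 | 94.5 | 94.5 | 93.0 | 93.6 | 94.2 | 94.2 | 92.7 | 93.3 |
| 0.25 | 0.25 | 40 | 94.7 | 94.7 | 93.6 | 93.9 | 95.1 | 95.1 | 93.9 | 94.3 | 94.7 | 94.7 | 93.7 | 94.0 |
| 0.25 | 0.25 | 80 | 94.9 | 94.9 | 94.0 | 94.2 | 95.2 | 95.2 | 94.3 | 94.5 | 95.2 | 95.2 | 94.3 | 94.5 |
| 0.25 | 0.50 | 10 | **92.3** | **92.3** | **87.7** | **89.3** | 93.0 | 92.9 | **88.2** | **89.7** | 92.7 | 92.6 | **88.0** | **89.5** |
| 0.25 | 0.50 | 20 | 94.1 | 94.1 | **91.2** | **91.8** | 94.3 | 94.3 | **91.2** | **91.9** | 94.1 | 94.1 | **90.9** | **91.6** |
| 0.25 | 0.50 | 40 | 94.8 | 94.8 | 92.7 | 93.0 | 94.9 | 94.9 | 92.7 | 93.0 | 94.9 | 94.9 | 92.7 | 93.1 |
| 0.25 | 0.50 | 80 | 95.1 | 95.1 | 93.3 | 93.4 | 95.4 | 95.4 | 93.5 | 93.7 | 95.1 | 95.1 | 93.3 | 93.4 |

Note. Normal = DIF effects normally distributed; $t_4$ = DIF effects distributed according to scaled $t_4$ distribution; Normal Mixture = DIF effects distributed according to contaminated mixture model; JK = linking error (LE) estimated by jackknife; ESW = LE estimated by expected sandwich estimator; OSW = LE estimated by observed sandwich estimator; BOSW = LE estimated by bias-corrected observed sandwich estimator; coverage rates smaller than 92.5% or larger than 97.5% are printed in bold.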
 Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
