Linking Error in the 2PL Model

: The two-parameter logistic (2PL) item response model is likely the most frequently applied item response model for analyzing dichotomous data. Linking errors quantify the variability in means or standard deviations due to the choice of items. Previous research presented analytical work for linking errors in the one-parameter logistic model. In this article, we present linking errors for the 2PL model using the general theory of M-estimation. Linking errors are derived in the case of log-mean-mean linking for linking two groups. The performance of the newly proposed formulas is evaluated in a simulation study. Furthermore, the linking error estimation in the 2PL model is also treated in more complex settings, such as chain linking, trend estimation, ﬁxed item parameter calibration, and concurrent calibration.


Introduction
Item response theory (IRT) models [1,2] are an important class of multivariate statistical models for analyzing dichotomous random variables used to model testing data from educational or psychological applications. Of particular relevance is the application of item response models in educational large-scale assessments [3], such as the programme for international student assessment (PISA; [4]) study.
In this article, we only investigate unidimensional IRT models. Let X = (X 1 , . . . , X I ) be the vector of I dichotomous random variables X i ∈ {0, 1} (also referred to as items). A unidimensional item response model [5] is a statistical model for the probability distribution P(X = x) for x = (x 1 , . . . , x I ) ∈ {0, 1} I , where where φ is the density of the normal distribution with mean µ and standard deviation σ. The vector δ = (µ, σ) contains the distribution parameters. The vector γ = (γ 1 , . . . , γ I ) contains all estimated item parameters of item response functions P i (θ; γ i ) = P(X i = 1|θ). The one-parameter logistic (1PL) model (also referred to as the Rasch model; [6]) employs the item response function P i (θ) = Ψ(θ − b i ), where Ψ denotes the logistic distribution function, and b i is the item difficulty of item i. In this case, γ i = (b i ). The two-parameter logistic (2PL) model [7] additionally includes the item discrimination a i (i.e., γ i = (a i , b i )), and the item response function is given by P i (θ) = Ψ(a i (θ − b i )).
Note that distribution parameters δ and item parameters γ cannot be simultaneously identified. In applications such as PISA in which a country mean µ and a country standard deviation σ, item parameters γ i are often fixed at values γ * i that are used for all countries. In this situation, µ and σ can be identified. If sample data x 1 , . . . , x N for N persons are available, unknown model parameters in (1) can be estimated by (marginal) maximum likelihood (ML) using an expectation maximization algorithm [8,9].
In practice, data-generating item parameters γ i differ from assumed fixed item parameters γ * i . This property is also referred to as differential item functioning (DIF; [10,11]).
DIF effects e i are defined as deviations e i = γ i − γ * i . The occurrence of DIF causes additional variability in the estimated (country) mean µ and standard deviation σ [12,13]. The estimated distribution parameters depend on the choice of selected items, even in infinite sample sizes of persons. This variability is quantified in the linking error [14][15][16][17][18][19].
There exist simple formulas for linking errors based on variance components for the 1PL model [15,17]. For more complex models, resampling techniques [20,21] such as jackknife [15,17] or the (balanced) half sampling [18] of items can be employed. In this article, we provide closed formulas for the linking error for the 2PL model in various applications based on the M-estimation theory. The proposed formulas have the advantage of avoiding computationally more demanding resampling approaches for computing linking errors.
The rest of this article is structured as follows. The foundation of the M-estimation theory for the application of the computation of the linking error in the 2PL model is presented in Section 2. The specialization of M-estimation to log-mean-mean linking is treated in Section 3. The performance of the newly proposed linking errors is investigated in a simulation study in Section 4. Section 5 provides a further analytical treatment of the linking error in the 2PL model to more complex applications. Finally, this article closes with a discussion in Section 6.

Linking Error and M-Estimation
In this section, we discuss the computation of the linking error in the 2PL model for two groups. We do this in a general setting of M-estimation theory [22][23][24] because our treatment will apply to many of the recently discussed linking methods. However, we focus on log-mean-mean linking in this article as an important example in Sections 3 and 4.
Assume that the 2PL model holds in two groups g = 1, 2 or two time points. The goal is to determine the mean µ and the standard deviation σ of the second group, while the first group is assumed to have a mean of 0 and a standard deviation of 1. The DIF effects f i and e i for logarithmized item discriminations and item difficulties follow It is assumed that f i and e i are independently and identically distributed with zero means and variances τ 2 a , τ 2 b , and the covariance is defined as Cov(e i , f i ) = τ ab . In the first step of the linking approach, the 2PL model is separately estimated in each of the two groups. Because the ability θ for the first group has zero mean and a standard deviation of 1, the identified item parametersâ i1 andb i1 equal the data-generating item parameters a i1 and b i1 , respectively, (i = 1, . . . , I). In the second group, we fix the mean to 0 and the standard deviation to 1 and obtain identified parameterŝ In the second step of the linking approach, identified item parameters {(â i1 ,b i1 )} and {(â i2 ,b i2 )} are used in determining the mean µ and the standard deviation σ in the second group. Note that we assume that identified item parameters are known. Hence, we implicitly have infinite sample sizes of persons. In practice, we estimate item parameters from finite sample sizes. Appropriate adjustments are discussed in Section 5.8.
A general estimating equation of the type is employed for determining the parameter δ = (σ, µ) or δ = (s, µ) with s = log σ as the distribution parameters of interest. M-estimation theory provides the asymptotic variance in an estimateδ. Because linking errors refer to the uncertainty regarding item choice, this asymptotic variance can be used to compute the linking error.
In the two-group 2PL case, we have two unknown distribution parameters σ (or s = log σ) and µ. Hence, g = (g 1 , g 2 ) involves two equations for two unknowns that must be solved. The two M-estimation equations of the linking approaches can be generally written as M-estimation theory provides the asymptotic variance (for I → ∞) for the estimateδ with the sandwich formula [22] The matrix A I is denoted as the bread matrix, while the matrix B I is referred to as the meat matrix. The latter matrix is given by where Var denotes a covariance matrix. The bread matrix A I is given as The two matrices in Equations (7) and (8) require the computation of expected values and variances of random variables. If these were unavailable or the quantities could not be algebraically determined, sample-based versions of the bread and the meat matrix are frequently used [23]. The meat matrix B I can be estimated based on sample data using (7) An empirical version of A I is given bŷ If A I (δ) and B I (δ) are used for computing the variance matrix forδ (i.e., V I,ESW (δ)), the estimate is denoted as the expected sandwich (ESW) estimate. The observed sandwich (OSW) estimate is obtained by usingÂ I (δ) andB I (δ) in the formula of the variance matrix (i.e., V I,OSW (δ)). Finally, a bias-corrected observed sandwich (BOSW) is obtained by using V I,BOSW (δ) = I/(I − 1)V I,OSW (δ) (see [25][26][27]).
We want to emphasize that M-estimation theory is not restricted to applications of linking approaches for two groups and two distribution parameters. The parameter δ can be of any finite dimensionality and could, for example, involve 2(G − 1) unknown parameters for linking G groups.
M-estimation theory was applied in the investigation of DIF and linking in [28][29][30]. The simultaneous treatment of standard errors and linking errors in IRT models relying on M-estimation was presented in [18,31] (see also [32]).

Linking Error of Log-Mean-Mean Linking
We now apply the M-estimation of computing linking errors to log-mean-mean linking [33,34] for linking two groups in the 2PL model. The logarithm of the standard deviation s = log σ is estimated bŷ It can be shown thatŝ is an unbiased and consistent estimate for s [34]. Moreover,σ is obtained by computingσ = exp(ŝ). An estimate of the group mean µ is obtained usinĝ We can reformulate (11) and (12) as M-estimators using g = (g 1 , g 2 ). Now, we determine the linking errors forσ andμ in log-mean-mean linking using the sandwich formula (6) of M-estimation. First, we compute the variance matrix of g.
Cov( f i , e i ) = Iτ ab , and Hence, it follows that Now, we compute derivatives of g with respect to s and µ and obtain the bread matrix as The inverse of the bread matrix can be determined as Formulas (18) and (19) involve unknown parameters τ a , τ b , τ ab , and µ that must be estimated. The quantities f i and e i can be replaced with their sample estimates to estimate unknown variances and covariances in the A I and B I matrices. We obtain the expected sandwich estimator (ESW) V I,ESW using Equation (6) by The observed sandwich (OSW) estimate V I,OSW uses empirical moments in the estimation. First, the bread matrix is estimated by The meat matrix is estimated by Finally, we use a bias-corrected variant of the observed sandwich estimator [27] as Linking errors can be obtained as square roots of the diagonal elements of the variance matrices V . The linking error forŝ based on the ESW estimate is given by By utilizing the delta method, we can obtain the linking error forσ = exp(ŝ) as Finally, the linking error forμ can be estimated by In the absence of a nonuniform DIF, we haveτ 2 a =τ ab = 0, and the linking error for the 1PL model is obtained from (27) Interestingly, the presence of a nonuniform DIF (τ 2 a > 0) introduces additional uncertainty in computing the group mean. However, there is only an effect of a nonuniform DIF if the average item difficulty does not match the group mean (i.e.,μ − b •1 = 0). In typical applications, the third term 2(μ − b •1 )τ ab in (27) will be much more important than the second term (μ − b •1 ) 2τ2 a . Hence, a nonuniform DIF particularly plays an important role if uniform and nonuniform DIF effects are strongly correlated.

Method
In this simulation study, we investigate the performance of different linking error estimates for the log-mean-mean linking approach in the 2PL model. In particular, we compare the jackknife linking error (JK) with linking errors obtained by the empirical sandwich (ESW), observed sandwich (OSW), and the bias-corrected observed sandwich (BOSW) estimates. The formulas for the sandwich estimates are presented in Section 3. The jackknife linking error estimate is computed by repeating the linking procedure when omitting the ith item for i = 1, . . . , I. Let u = µ or u = σ be the distribution parameter of interest andû be the corresponding estimate. Letû (−i) be the estimated parameter if item i was removed from the linking procedure. Then, the jackknife linking error is defined as (see [15]) For identification, the first group had a zero mean and a standard deviation of the ability variable θ. For the second group, we defined µ = −0.2 and σ = 0.9 in the simulation. Item parameters for 10 items are presented in Table 1. In the simulation, we used I = 10, 20, 40, or 80 items. For item numbers as multiples of 10, we duplicated the item parameters of the 10 items presented in Table 1 accordingly. The standard deviation of uniform DIF effects e i was chosen as τ b = 0.25 or 0.50. The standard deviation of nonuniform DIF effects f i was chosen as τ a = 0.01 or 0.25. The first condition mimics the case of the practical absence of nonuniform DIF effects. The correlation of DIF effects between e i and f i was set at 0.3 in all simulation conditions (i.e., τ ab = 0.3 · τ a τ b ). Finally, we chose three types of distributions for DIF effects ( f i , e i ). We specified them as a bivariate normal copula model and chose different marginal distributions. First, we chose the normal distribution (i.e., denoted as "Normal") as a marginal distribution appropriately scaled by τ a and τ b . Second, we chose a scaled t distribution with four degrees of freedom (i.e., denoted as "t 4 ") with an appropriate scaling factor to match the desired standard deviation of DIF effects. Third, we use the distribution function F of a normal mixture model (i.e., denoted as "Normal Mixture") of the type where k = 3 and ε = 0.05. This distribution can be interpreted as a contaminated distribution that includes a few outlying DIF effects in N(0, kτ 2 ) with proportion ε. Such a distribution is often employed in robust statistics [35]. For a prespecified DIF effect τ b , we obtain from (30) the determining equation To disentangle standard errors due to the sampling of persons from linking errors due to item choice, we assumed no sampling error for identified parameters {(â i1 ,b i1 )} and {(â i2 ,b i2 )}. That is, identified item parameters for the second group only vary across replications in the simulation study because different DIF effects f i and e i were simulated in each replication. It seems reasonable in the simulation study in the comparison of the different M-estimation approaches with a jackknife to exclude the effects of sampling errors because these are just another source of uncertainty in distribution parameter estimates.
In each of the 4 (I) × 2 (τ b ) × 2 (τ a ) × 3 (distribution) = 48 cells of the simulation, 40,000 replications were conducted. We assessed coverage rates at the 95% confidence level based on the normal distribution for distribution parameter estimates. The linking error computation for the estimated standard deviationσ = exp(ŝ) utilized the delta method.
The R software [36] was used for simulation and analysis. We used the qmixnorm function from the R package KScorrect [37] for determining quantiles in the data simulation of DIF effects. Because analytical solutions are not available to compute a quantile function for the normal mixture model, the qmixnorm function approximates the quantile function using a spline function calculated from cumulative density functions for the specified mixture distribution [37]. Quantiles for probabilities near zero or one are approximated by taking a randomly generated sample.

Results
In Table 2, the coverage rates for the estimated meanμ are presented as a function of the standard deviation of the DIF effects, the number of items, and the type of distribution for the DIF effects. It turned out that there were no substantial differences in the performance of the different linking error methods with respect to the distribution types of the DIF effects. The jackknife and the ESW estimates were very similar. The OSW estimate did not reach the desired coverage rates in a short test (i.e., I = 10) but improved in longer tests. Moreover, the BOSW slightly improved the OSW estimate but still was inferior to the ESW estimate. Table 2. Simulation Study: Coverage rates for estimated meanμ as a function of the standard deviation of DIF effects for a (τ a ) and b (τ b ), number of items (I), and the type of distribution for DIF effects. In Table 3, the coverage rates for the estimated standard deviationσ are presented as a function of the standard deviation of the DIF effects, the number of items, and the type of distribution for the DIF effects. The OSW and BOSW had particular issues in the coverage rates with very small nonuniform DIF effects. These issues also remained in the longer tests. However, the estimated standard deviations of the nonuniform DIF effectsτ a turned out to be unbiased and can be detectable in such situations to indicate that linking errors forσ would be tiny in this situation. Hence, the practical absence of nonuniform DIF effects using the simulation condition τ a = 0.01 might not be very realistic, and future studies could investigate the performance using τ a = 0.10. Table 3. Simulation Study: Coverage rates for estimated standard deviationσ as a function of the standard deviation of DIF effects for a (τ a ) and b (τ b ), number of items (I), and the type of distribution for DIF effects. Overall, the findings of this simulation study indicate that the sandwich estimates (in particular the ESW) are as effective as the jackknife estimates for linking errors.

Further Applications of the Linking Error in the 2PL Model
In this section, several applications of the linking error computations in the 2PL model are presented. In Section 5.1, the M-estimation theory is applied to linking approaches other than the log-mean-mean linking. Section 5.2 discusses the computation of the linking errors if the items are nested within testlets. The linking errors for chain linking and trend estimation are discussed in Sections 5.3 and 5.4, respectively. In Section 5.5, the linking error under a fixed item parameter calibration is derived. Section 5.6 presents the linking error in the 2PL model for a concurrent calibration. Section 5.7 investigates the computation of the linking errors of derived parameters. Finally, Section 5.8 focuses on the computation of the total error and sampling error corrections in the linking error estimation.

Different Linking Methods
We now illustrate how the sandwich estimates in the M-estimation theory from Section 2 can be used for other linking approaches.

Robust Log-Mean-Mean Linking
Log-mean-mean linking involves two steps that compute the mean for determining the logarithm of the standard deviation s = log σ and the mean µ. A few outlying items might introduce bias in the estimated distribution parameters [18]; robust estimators for the location measures can be preferred. In this case, the estimating functions for δ = (s, µ) are given by using a robust function that fulfills the property ρ(x)/|x| → ∞ for |x| → ∞ (see [29,30]). For example, if the median was used, ρ would be the sign function A differentiable approximation of this function is given byρ(x) = d dx (x 2 + ε) p/2 for p = 1 and ε = 0.01. The observed sandwich formula can be easily applied to obtain linking errors for a wide class of robust linking approaches.

Haebara Linking
Haebara (HAE) linking [38] aligns item response functions instead of directly aligning item parameters. The linking function in HAE linking is given as where ω is a weighting function. By defining the difference we can rewrite (33) as The estimating equations for σ and µ can be determined by Again, linking errors can be easily obtained using the observed sandwich (OSW) formula.

Linking Error with Testlets
In educational large-scale assessment studies such as PISA, several items frequently share a common item stimulus. In this situation, items are nested within testlets [39][40][41]. It was demonstrated that DIF effects are also pronounced at the testlet level and not only at the item level [17,42]. This additional source of uncertainty should be included in the computation of linking errors to avoid negatively biased linking error estimates.
We illustrate how to apply the theory of M-estimation in the case of testlets. The linking error forσ andμ in log-mean-mean linking is derived. Let there be H testlets, and there are I h ≥ 1 items within each testlet h. It holds that ∑ H h=1 I h = I. The data-generating model for DIF effects in the 2PL model in Equation (2) is adapted to include DIF effects at the item and the testlet level. DIF effects for logarithmized item discriminations now include two terms referring to testlets (i.e., h) and items within testlets (i.e., item i nested within testlet h). We assume Note that item parameters now possess an item index i and a testlet index h in (38). Again, it is assumed that the DIF effects are independently distributed across items, and we define τ 2 The estimating equations in log-mean-mean linking are the same. However, they must include the testlet structure of items. The estimating Equations (13) and (14) are rearranged as The essential change in the computation of the sandwich variance in the testlet case is that the variance matrix B I (i.e., the meat matrix) requires the computation of the variance that is carried out at each testlet h instead of each individual item i (see [27]). To indicate the dependency from the testlet structure, it is more appropriate to label the meat matrix B H because testlets are independent units, not items. The entry involving the variance in DIF effects in item discriminations in the meat matrix can be computed as The other variance and the covariance can be derived similarly. Consequently, the meat matrix is given by The bread matrix A H only involves the expected value of the derivatives of the estimating equations. Hence, this matrix remains unchanged in a testlet structure [27]. The unknown variance and covariance components in (42) can be replaced by sample estimates. Then, an expected sandwich variance estimate can be obtained. A sample estimate of the meat matrix can be obtained by replacing the population variance in (41) with an empirical variance. Like in the case of independent items, the observed sandwich variance estimateV H can be modified to obtain a bias-corrected variant. In the testlet case, one should use correction factor H/(H − 1) instead of I/(I − 1).

Linking Error in Chain Linking
In this subsection, we discuss the computation of the linking error in chain linking [33,43,44] in log-mean-mean linking. Figure 1 illustrates the test design in the chain linking. The items are administered at three time points, T1, T2, and T3 (or in three groups). The goal is to determine the distribution parameters at T3. The distribution parameters of the ability variable θ at T3 can be compared with those at T1 by carrying out the linking step T1↔T2 and T2↔T3. The set of all items is denoted as J = {1, . . . , I} and is partitioned into three distinct sets, J 0 , J 1 , and J 2 . The set J 0 contains items that were administered at all three time points. Items in J 1 and J 2 are administered at T1 and T2 and T2 and T3, respectively. We fix the distribution parameters at the first time point T1 to a mean of 0 and a standard deviation of 1. The mean of θ at T2 is denoted by µ 1 and the standard deviation by σ 1 . The mean of θ at T3 is denoted by µ 2 and the standard deviation by σ 2 . We assume that the 2PL model holds at the three time points. For longitudinal data, DIF is referred to as item parameter drift (IPD; [45,46]). The data-generating model involves IPD effects f it for item discriminations and e it for item intercepts (t = 1, 2).
All IPD effects are allowed to be correlated within each item i but are uncorrelated across items. In chain linking, the 2PL model is separately estimated for the three time points. The identified item parameters are given asâ i1 = a i1 ,b i1 = b i1 and In the first linking step T1↔T2, the mean µ 1 and the standard deviation σ 1 = exp(s 1 ) are determined in log-mean-mean linking. In the second linking step T1↔T2, linking constants s 2 and m 2 are derived that refer to the linear transformation θ → exp(s 2 )θ + m 2 .
Let I k = |J k | be the number of items in the sets J k for k = 0, 1, 2. We define κ 0 = I 0 /I and κ k = I k /I for k = 1, 2. The two successive linking steps are formulated as a joint linking problem involving four estimating equations where X i includes all identified item parameters and design variables for item i. The first two estimating equations in (46) refer to log-mean-mean linking of the step T1↔T2, while the last two refer to step T2↔T3. In Equation (46), dummy indicators d ik are used that take the value of one if item i is contained in J 0 or J k for k = 1, 2. Note that ∑ I i=1 d ik = I k for k = 1, 2 and ∑ I i=1 d i1 d i2 = I 0 . We simplify the terms in (46) to We can now compute the meat matrix B I Moreover, we can determine the expected value of the bread matrix E(A I (δ)) as The inverse of A I can be computed as Using the matrices B I (δ) and [A I (δ)] −1 , the variance matrix V I (δ) ofδ can be computed using the sandwich formula (6).
We now explicitly derive linking errors forσ 2 andμ 2 . First, the standard deviation σ 2 at T3 is given as σ 2 = h(δ) = exp(s 1 ) exp(s 2 ). We derive the variance by applying the delta method to the nonlinear transformation h and using the variance matrix V I . The first-order partial derivatives of h are given by Hence, the linking error of the estimated standard deviationσ 2 can be determined as The algebraic derivation for the linking error formula was somewhat more intricate, which is why the R package rSymPy package [47] as a wrapper to the SymPy computer algebra system [48] was used. The square of the linking error forσ 2 (i.e., the quantity LE(σ 2 ) 2 ) is computed using (56) LE(σ 2 ) 2 = τ 2 The linking error for the estimated meanμ 2 can be determined as a derived parameter using the transformation µ 2 = h(δ) = exp(s 2 )µ 1 + m 2 . The first-order partial derivatives of h are given by The linking error forμ 2 can be computed by using (56) In chain linking in the 1PL model, the linking error forμ 2 substantially simplifies by setting σ 1 = σ 2 = 1, D 2 = 0 (see also [49]) Note that all DIF effects referring to item discriminations vanish in this case.

Linking Error for Trend Estimates in Educational Large-Scale Assessment Studies
We now turn to the important application of linking errors for trend estimates [13,15,50,51] in educational LSA studies such as PISA [16,52]. The main goal is to compute a linking error for a trend estimate in country means or country standard deviations between two successive assessments. Again, we rely on the 2PL model and use log-mean-mean linking for the derivation of the linking errors. Previous research derived closed formulas for linking errors in the 1PL model [17]. It was stated in official PISA publications that there does not exist a simple generalization to the 2PL model [52]. However, this section provides a closed formula for the 2PL model. Figure 2 illustrates the problem of trend estimation in LSA studies. The label "NAT" refers to a nation (i.e., a country) c. The label "INT" refers to the international metric that is defined as a pooled sample comprising all students of participating countries in the LSA study. A trend estimate for a country between two LSA assessments can involve means and standard deviations at T1 and T2 (i.e., it compares NAT2 and NAT1). The first linking step NAT1↔INT1 maps country-specific results onto an international metric at T1. This step allows a cross-sectional comparison of countries on an international reference metric. The second linking step INT1↔INT2 links results of two LSA studies at T1 and T2 at the international metric. The third linking step NAT2↔INT2 maps country-specific results to the international metric at T2. The set of administered items typically differs across assessments [53]. There are I 0 link items that are administered at both time points. A set of I 1 unique items is only administered at T1, while a set of I 2 unique items is only administered at T2. For identification reasons, we assume that the θ ability variable has zero mean and a standard deviation of one at T1. Then, we can identify the mean µ c1 and the standard deviation σ c1 of country c at T1. Furthermore, we assume that the mean at the international metric at T2 is µ 0 and the standard deviation is σ 0 . We can also identify the mean µ c2 and the standard deviation σ c2 of country c at T2.
The first linking step NAT1↔INT1 in log-mean-mean linking estimates µ c1 and s c1 = log σ c2 . The second linking step INT1↔INT2 estimates µ 0 and s 0 = log σ 0 . The third linking step NAT2↔INT2 estimates linking constants m c2 and s c2 to put the results of country c at the international metric at T2.
The country mean and standard deviation of country c at T2 are derived functions of the estimated linking parameters The linking constants for the link INT2↔NAT2 are recomputed using (61) as The main idea is to apply the M-estimation theory and the sandwich formula for deriving linking errors for trend estimates. The three-step linking procedure can also be written as a simultaneous estimation problem involving six estimating equations for the vector of unknown linking parameters δ = (s c1 , µ c1 , s 0 , m 0 , s c2 , m c2 ). The trend estimate in means is given as The trend estimate in standard deviations is given as The source of linking errors in trend estimates is the presence of DIF effects. We now present a data-generating model for DIF effects in item parameters in the 2PL model. Let a ict and b ict for t = 1, 2 be item discriminations and item difficulties for country c. These item parameters are referred to as national item parameters [17]. International item parameters that result from item response models at the international metric that involves students from all countries are denoted by α it and β it . We use the same random effects model as in [17] for item difficulties The variance component Var(e it ) = τ 2 b,IPD refers to item parameter drift (IPD; [54]). The variance Var(e ic ) = τ 2 b,DIF is referred to as cross-sectional country DIF, and the variance Var(e ict ) = τ 2 b,DIF×IPD refers to time-point-specific country DIF. All DIF effects are uncorrelated with each other.
We now extend the random effects model in [17] to item discriminations All DIF effects for logarithmized item discriminations are uncorrelated, but DIF effects e and f can be correlated for the same type of heterogeneity (i.e., Cov(e it , f it ) = τ ab,IPD , Cov(e ic , f ic ) = τ ab,DIF , and Cov(e ict , f ict ) = τ ab,DIF×IPD ). Identified national item parameters are given asâ ict = a ict σ ct andb ict = σ −1 ct (b ict − µ ct ) for t = 1, 2. Moreover, for international item parameters, it holds thatα i1 = α i1 and We now apply the M-estimation theory to the estimation of linking errors. The three log-mean-mean linking steps can be formalized as the following six estimating equations: The estimating equations in (67) define an estimate of the parameter of interest δ = (s c1 , µ c1 , s 0 , m 0 , s c2 , m c2 ). We now use the sandwich formula to derive the asymptotic variance ofδ by means of the sandwich Formula (6).
The entries of the meat matrix B I (δ) in the sandwich formula are denoted by The non-zero entries in B I (δ) are given as Moreover, we can determine the bread matrix A I (δ) as The inverse of A I (δ) can be computed as Using (69) and (72), the variance matrix V I can be computed. We now derive the linking error of the trend estimate in standard deviations that is a nonlinear function of δ = (s c1 , µ c1 , s 0 , m 0 , s c2 , m c2 ) The first-order partial derivatives of h are given by Using (56), we can determine the square of the linking error LE(∆σ c ) 2 by using computer algebra software [47,48] as We now compute the linking error for the trend estimate in the mean in the 2PL model. The country mean difference can be computed as The first-order partial derivatives are given as The linking error for the trend estimate in means is given by The linking error formula for the 1PL model derived in [17] can be obtained from the terms in (78) that involve the variance components τ 2 b,DIF , τ 2 b,IPD , and τ 2 b,DIF×IPD by setting σ 2 0 = 1 and D 0 = 0: All other components in (78) vanish in the case of the 1PL model.

Linking Error in Fixed Item Parameter Calibration
In this subsection, linking errors for the estimated mean and the estimated standard deviations under fixed item parameter calibration (FIPC) are derived. It is assumed that one uses fixed item parameters a i1 and b i1 , but the true item parameters a i2 and b i2 have DIF effects and follow the data-generating model (2).
The FIPC is typically applied using marginal maximum likelihood estimation [55]. However, we derive the linking error for a diagonally weighted least squares (DWLS) estimation that approximates the former estimation method [56]. The DWLS minimizes the weighted sums of the differences between the estimated and model-implied item thresholds as well as the tetrachoric correlations. For the simplicity of exposition in this subsection, we assume that there are only DIF effects in item difficulties (i.e., uniform DIF). Hence, we can assume that the standard deviation σ can be estimated without bias.
The item-specific weights in the DWLS estimation are precision weights ω that are defined as the inverse of the variance in thresholds τ. When omitting a factor that involves the sample size, precision weights ω are determined by Equation (80) is displayed in Figure 3. It can be seen that extreme thresholds are downweighted. The optimization function for µ in DWLS is given by whereτ i2 is the identified threshold of item i and τ i2 is the model-implied threshold. Now, define α i = a i σ a 2 i σ 2 + L −1/2 , where L is the logistic variance that is the byproduct of using the logistic instead of the probit link function in the 2PL model. Then, (81) can be rewritten as where The minimization of (82) leads to a weighted mean where W = ∑ I i=1 w i , and µ 0 in (83) denotes the true mean. Hence, one can determine the linking error as the variance in the second term on the right-hand side in (83). We then obtain the linking error in fixed item parameter calibration By the Cauchy-Schwarz inequality, it holds that Hence, we get from (84) by using (85) Hence, the linking error has a lower bound in which all item-specific weights were set equal to one. This case corresponds to the linking error in the Rasch model obtained for mean-mean linking [15,17] This linking error is also obtained in FIPC of the 1PL model using unweighted least squares (ULS) estimation. Interestingly, the finding in (86) illustrates that using incorrect item parameters in the FIPC in the presence of DIF effects results in a precision loss. Hence, the maximum likelihood estimation can only achieve the most efficient estimates under correctly specified models. Because it is almost always expected that there are some intentionally unmodeled DIF effects in real datasets, the dominance of the maximum likelihood estimation in LSA practice can be questioned.

Linking Error in Concurrent Calibration
In this subsection, we derive the linking error for the estimation of the 2PL model in two groups. The mean and the standard deviation of the ability variable θ are fixed to 0 and 1, respectively. The mean µ and the standard deviation σ of the second group are estimated.
Like in Section 5.5, we assume that there are only DIF effects e i in item difficulties and no DIF effects in item discriminations. We assume that σ can be estimated without bias and derive the linking error under concurrent calibration using DWLS. DIF effects e i with zero means follow b i2 = b i1 + e i . The identified parameters are given byb i1 = b i1 andb i2 = b i2 − µ. The estimation of the vector of common item difficulties b = (b 1 , . . . , b I ) and the mean µ is conducted using a weighted square loss of differences in estimated and model-implied item thresholds. By simplifying the setting while assuming known item discriminations and standard deviation, the optimization function is given by where weights w ig are allowed to be group specific (g = 1, 2). The estimating equations for µ and b are given as By inserting (89) into (90), we obtain Equation (91) can be further simplified to By defining w * i = (w i1 + w i2 ) −1 w i1 w i2 , we obtain from (92) Hence, we obtain the linking error forμ where Hence, using the linking error τ b / √ I that does not take item-specific weights into account provides a lower bound of the true linking error.
If the weights w i1 and w i2 would be equal across both groups, we obtain the same linking error like under fixed item parameter calibration.

Linking Error for Derived Parameters
In previous sections of this paper, we derived the linking error for µ and σ for different applications of the 2PL model. Sometimes, other distribution parameters might be estimated. In this subsection, we assume that the ability variable θ is approximately normally distributed and derive the linking error for nonlinear functions h of µ and σ. Let ν = h(δ) = h(µ, σ) be a nonlinear function of the mean µ and the standard deviation σ. Let V denote the variance matrix of δ regarding item choice (i.e., quantifying linking errors). The linking error estimateν = h(μ,σ) is given as We now illustrate the computation in two examples.

Proportions
First, we are interested in computing the probability p The partial derivatives of h are given as where d(µ, σ, c 1 , Hence, we can determine the linking error ofp = g(μ,σ) using (95) as wherev 2 µ ,v 2 σ , andv µσ are estimated linking variances and covariances.

Percentiles
Second, the linking error of the pth percentile Q p should be computed. The pth percentile is defined by Hence, we can solve (100) for Q p and obtain where z p = Φ −1 (p) is the pth percentile for the standard normal distribution. Now, we can compute the linking error ofQ p . We obtain ∂h ∂µ = 1 and ∂h ∂σ = z p .
The linking error of the percentile Q p using (95) is given by It can be seen that for more extreme percentiles, the absolute value of z p gets larger and the linking error for σ (i.e.,v σ in (103)) becomes more relevant.

Computation of Total Error and Sampling Error Correction for Linking Error Estimates
In practical applications, the sampling error SE due to the sampling of persons must also be taken into account to quantify the uncertainty in the σ and µ estimates. The total error (TE) is given by (see [18,19]) A critical issue might be that estimated linking errors also include a portion of variability that can be attributed to the sampling error of persons. We illustrate an analytical bias correction method for the case of log-mean-mean linking. The issue in the sandwich estimate is the variance matrix (i.e., the meat matrix) B I (see (31) for the computation). The identified item parameters also include a sampling error variance that can be estimated by fitting an item response model. Let v * ia be the variance of logâ i2 − logâ i1 due to person sampling (i.e., v * ia = Var(logâ i2 − logâ i1 ). Then, we can determine a corresponding entry in the meat matrix B I by modifying (31) into In the derivation of (105), we relied on the property that item parameters of different items are approximately uncorrelated in sufficiently long tests [57]. The other entries in B I can be similarly determined. By (16) and (17) can be modified to Hence, the originally obtained meat matrix B I can be removed from sampling error contributions by defining a bias-corrected estimate This approach was implemented in the simplified setting in the 1PL model [17]. We now show how to generalize the bias-corrected estimate of the meat matrix B I . We can rewrite the estimate from Equation (9) aŝ The estimateγ i includes the sampling error, and the sampling variance is denoted as The estimating function g in (109) can be viewed as a function g = g(δ, γ). Denote by g γ = (∂g)/(∂γ) the matrix of partial derivatives with respect to γ. We can now apply a Taylor expansion and obtain g(δ,γ i ) = g(δ, γ i ) + g γ (δ, γ i )(γ i − γ i ) .
Using E(γ i ) = γ, we can obtain a bias-corrected estimate of the bread matrix as

Discussion
In this article, we have shown that the sandwich formula from the M-estimation theory can be successfully employed for computing the linking error in the 2PL model in a variety of situations. It was shown in a simulation study for the log-mean-mean linking of two groups that the expected sandwich estimate of the linking error produced satisfactory coverage rates. Interestingly, it had a comparable performance to the jackknife linking error in the 2PL model.
As with any simulation study, some limitations of our study can be stated. More comprehensive studies could involve a different range of standard deviations of the DIF effects, different test lengths, or other linking approaches. It might be interesting to compare the performance of the M-estimation approach with the jackknife linking error for the more complex linking problems of chain linking and trend estimation.
In the simulation study, we only considered uncertainty in distribution parameters due to DIF effects (i.e., linking errors). In practical applications, there will also be a sampling of persons, and the simultaneous assessment of sampling errors and linking errors would be an exciting extension of this study.
In most of the applications involving instruments with cognitive and noncognitive items, linking errors are not reported even if linking approaches were utilized [43,58,59]. There might be two reasons why this is the case. First, simulation studies typically presuppose that the IRT model perfectly fits the data. That is, the DIF is absent, and the IRT model is correctly specified. Second, items could be treated as fixed, and the model misspecification is taken for granted but is not included in the uncertainty quantification of the linking approach. In our view, linking errors provide additional information about the impact of heterogeneous item functioning on parameters of interest and should, therefore, (almost always) be additionally reported. In general, we do not think that the presence of DIF or IPD threatens the validity of group differences or trend estimates.
It should be emphasized that our proposed linking error for the trend estimates in the 2PL model differs from a newly proposed linking error estimate since PISA 2015 [4]. The latter relies on a recalibration approach of the item response data from the first time point [34,60]. The new PISA linking error rather assesses the extent of the assumed noninvariant item parameters across the two PISA studies instead of quantifying the variability due to heterogeneous item functioning.
The derived linking errors rely on the assumption that item parameters are identified. In LSA studies such as PISA, balanced incomplete block designs are employed in which only a subset of items is administered to each student [61]. If item parameters can be identified in such test designs, linking error formulas would not change because country means and standard deviations are based on all the items administered in the test, irrespective of the proportion of items administered to each student.
As argued by an anonymous reviewer, the computation of linking errors relies on the assumption of a representative item sample from an item population. Notably, the identification of item parameters in the joint maximum likelihood estimation also requires asymptotic regimes regarding the sample size and the number of items [62][63][64] but does not require that items are a random sample from an item population. However, the main difference in the computation of linking errors is that linking errors reflect the variability in the distribution parameter estimates due to the DIF. Infinite item samples are not required for model identification; they are only used as a statistical tool for justifying the statistical inference with respect to modeled or unmodeled DIF.
Finally, we assumed that DIF effects had zero means throughout this paper. However, this assumption is not essential in deriving linking errors because the M-estimation theory does not require that estimated parameters converge to true parameters. The M-estimation theory will nevertheless provide the asymptotic variance in a potentially biased estimator. However, DIF effects could also result in a mean that is different from zero. For example, Ref. [28] assumes that the median of the DIF effects equals zero. In this case, log-mean-mean linking in the 2PL model could be replaced by an alternative robust linking method in which the mean is substituted with the median. Notably, the choice of an adequate estimating function to provide unbiased distribution parameters depending on the distribution of DIF effects is a different topic though [18].

Conflicts of Interest:
The author declares no conflict of interest.