Analytical Approximation of the Jackknife Linking Error in Item Response Models Utilizing a Taylor Expansion of the Log-Likelihood Function

Robitzsch, Alexander

doi:10.3390/appliedmath3010004

by Alexander Robitzsch ¹,²

¹ IPN–Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany

² Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany

AppliedMath 2023, 3(1), 49-59; https://doi.org/10.3390/appliedmath3010004

Submission received: 21 November 2022 / Revised: 12 December 2022 / Accepted: 30 December 2022 / Published: 5 January 2023


Abstract

Linking errors in item response models quantify the dependence of means, standard deviations, or other distribution parameters on the chosen items. The jackknife approach is frequently employed to compute the linking error. However, computing the jackknife linking error can be tedious if many items are involved. In this article, we provide an analytical approximation of the jackknife linking error. The newly proposed approach is computationally much less demanding. Moreover, the new linking error approach performed satisfactorily for datasets with at least 20 items.

Keywords:

item response model; linking error; jackknife

MSC:

62H10; 62H25; 65-04; 65D15

1. Introduction

Item response theory (IRT) models [1,2,3] are an important class of multivariate statistics methodologies for analyzing dichotomous random variables used to model testing data from educational or psychological applications. This class aims to summarize a high-dimensional contingency table by a few latent factor variables of interest. Of particular relevance is the application of item response models in educational large-scale assessment [4], such as the studies programme for international student assessment (PISA; [5]) or progress in international reading literacy study (PIRLS; [6]).

In this article, only unidimensional IRT models are considered. Let

X = (X_{1}, \dots, X_{I})

be the vector of I dichotomous random variables

X_{i} \in {0, 1}

(also referred to as items). A unidimensional item response model [1,7] is a statistical model for the probability distribution

P (X = x)

for

x = (x_{1}, \dots, x_{I}) \in {0, 1}^{I}

, where

P (X = x; δ, γ) = \int_{- \infty}^{\infty} \prod_{i = 1}^{I} [{P_{i} (θ; γ_{i})}^{x_{i}} {(1 - P_{i} (θ; γ_{i}))}^{1 - x_{i}}] ϕ (θ; μ, σ) d θ,

(1)

where

ϕ

is the density of the normal distribution with mean

μ

and standard deviation

σ

. The vector

δ = (μ, σ)

contains the distribution parameters. The vector

γ = (γ_{1}, \dots, γ_{I})

contains all estimated item parameters of item response functions

P_{i} (θ; γ_{i}) = P (X_{i} = 1 | θ)

.

The one-parameter logistic (1PL) model (also referred to as the Rasch model; [8]) uses the item response function

P_{i} (θ) = Ψ (θ - b_{i})

, where

Ψ

denotes the logistic distribution function, and

b_{i}

is the item difficulty of item i. In this case, the vector of item parameters

γ_{i}

only consists of one entry; that is,

γ_{i} = (b_{i})

. The two-parameter logistic (2PL) model [9] includes the item discrimination

a_{i}

in addition (i.e.,

γ_{i} = (a_{i}, b_{i})

), and the item response function is given by

P_{i} (θ) = Ψ (a_{i} (θ - b_{i}))

.
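
As a minimal sketch of the two item response functions defined above, the following Python snippet evaluates them with NumPy. The function names `logistic`, `irf_1pl`, and `irf_2pl` are illustrative choices and are not taken from any package.

```python
import numpy as np

def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def irf_1pl(theta, b):
    """1PL (Rasch) item response function: P_i(theta) = Psi(theta - b_i)."""
    return logistic(theta - b)

def irf_2pl(theta, a, b):
    """2PL item response function: P_i(theta) = Psi(a_i * (theta - b_i))."""
    return logistic(a * (theta - b))
```

With a discrimination of 1, `irf_2pl` reduces to `irf_1pl`, which mirrors the relation between the two models.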

Please note that distribution parameters

δ

and item parameters

γ

cannot be simultaneously identified. If the parameters

(μ, σ, {(a_{i}, b_{i}) | i \in {1, \dots, I}})

parametrize the 2PL model, an equivalent parametrization would be

(μ = 0, σ = 1, {(a_{i} σ, σ^{- 1} (b_{i} - μ)) | i \in {1, \dots, I}})

. In applications like PISA, in which a country mean

μ

and a country standard deviation

σ

are to be estimated, item parameters

γ_{i}

are often fixed at values

γ_{i}^{*}

that are used for all countries. In this case,

μ

and

σ

can be identified. If sample data

x_{1}, \dots, x_{N}

for N persons are available, unknown model parameters in (1) can be estimated by (marginal) maximum likelihood (ML) using an expectation-maximization algorithm [10,11].
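
The equivalence of the two parametrizations can be checked numerically. The sketch below compares the marginal probability of a correct response to a single 2PL item under both parametrizations; the parameter values and the function name `marginal_prob_2pl` are hypothetical, and Gauss-Hermite quadrature is used here only as a convenient way to integrate over the normal distribution.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_prob_2pl(a, b, mu, sigma, n_nodes=61):
    """P(X_i = 1) for a 2PL item with theta ~ N(mu, sigma^2)."""
    # Nodes and weights for the weight function exp(-z^2 / 2)
    z, w = hermegauss(n_nodes)
    w = w / w.sum()                 # normalize to a standard normal
    theta = mu + sigma * z          # rescale nodes to N(mu, sigma^2)
    return np.sum(w * logistic(a * (theta - b)))

# Hypothetical parameter values for illustration
mu, sigma, a, b = -0.2, 0.9, 1.3, 0.4

p_original = marginal_prob_2pl(a, b, mu, sigma)
# Equivalent parametrization: mu = 0, sigma = 1, a * sigma, (b - mu) / sigma
p_standardized = marginal_prob_2pl(a * sigma, (b - mu) / sigma, 0.0, 1.0)
```

Both calls return the same probability, which illustrates why the distribution parameters are only identified once the item parameters are fixed.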

In practice, data-generating item parameters

γ_{i}

differ from assumed fixed item parameters

γ_{i}^{*}

. This property is also referred to as differential item functioning (DIF; [12]). DIF effects

e_{i}

are defined as deviations

e_{i} = γ_{i} - γ_{i}^{*}

. The occurrence of DIF causes additional variability in the estimated (country) mean

μ

and standard deviation

σ

[13,14]. Consequently, the estimated distribution parameters depend on the choice of selected items, even with an infinite sample size of persons. This variability is quantified by the linking error [15,16,17,18,19,20]. Simple formulas for linking errors based on variance components exist for the 1PL model [16,18]. For more complex models, resampling techniques [21,22] such as the jackknife [16,18] or (balanced) half sampling [19] of items can be employed. In the computation of the jackknife linking error, the model is repeatedly estimated, excluding a single item i each time, which results in slightly differing estimates

{\hat{μ}}_{(- i)}

and

{\hat{σ}}_{(- i)}

compared to the estimates

\hat{μ}

and

\hat{σ}

in the full sample of items. The jackknife linking error for the estimated mean

\hat{μ}

is defined as

\begin{matrix} LE (\hat{μ}) = \sqrt{\frac{I - 1}{I} \sum_{i = 1}^{I} {({\hat{μ}}_{(- i)} - \hat{μ})}^{2}} . \end{matrix}

(2)

The disadvantage of the linking error formula (2) is that

I + 1

model estimations of the IRT model based on the log-likelihood function l are required. In this article, a computational shortcut for determining increments

{\hat{μ}}_{(- i)} - \hat{μ}

in (2) based on a Taylor expansion of the log-likelihood function is presented. Only second-order derivatives and one additional estimation of the IRT model are required in our proposed approach. Hence, the computational effort is significantly reduced.
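
For illustration, formula (2) can be written as a short Python function. This is a sketch that assumes the leave-one-item-out estimates have already been obtained from I separate model fits; the function name and the numerical values are hypothetical.

```python
import numpy as np

def jackknife_linking_error(est_full, est_leave_one_out):
    """Jackknife linking error as in (2).

    est_full          : estimate based on all I items
    est_leave_one_out : I estimates, each obtained with one item removed
    """
    est_loo = np.asarray(est_leave_one_out, dtype=float)
    n_items = est_loo.size
    return np.sqrt((n_items - 1) / n_items * np.sum((est_loo - est_full) ** 2))

# Hypothetical estimates for a test with I = 4 items
mu_hat = -0.20
mu_loo = [-0.23, -0.18, -0.21, -0.17]
le_mu = jackknife_linking_error(mu_hat, mu_loo)
```

The expensive part is not this formula but the I additional model fits needed to fill `mu_loo`, which is what the approximation developed below avoids.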

The rest of the article is structured as follows. The newly proposed analytical approximation to the jackknife linking error is presented in Section 2. A simulation study compares the performance of our new approach with the jackknife linking error in Section 3. Finally, the article closes with a discussion in Section 4.

2. Analytical Approximation of the Jackknife Linking Error

This section provides details for our analytical approximation to the jackknife linking error. A Taylor expansion of the log-likelihood function l is employed to approximate increments in the jackknife linking error formula.

Let

δ = (μ, σ)

be the vector that includes the mean

μ

and the standard deviation

σ

. Let

γ = (γ_{1}, \dots, γ_{I})

be the vector that includes all item parameters

γ_{i}

(

i = 1, \dots, I

). Furthermore, let

δ_{0}

and

γ_{0}

be the true distribution parameters and item parameters, respectively. In the computation of

\hat{δ}

, the item parameters in the scaling model are fixed to

γ = γ^{*}

. The difference

e = γ_{0} - γ^{*}

indicates misspecification. If the scaling model involves data of a country and

γ^{*}

are international item parameters, the vector

e

includes DIF effects.

The approximation of the jackknife linking error relies on a Taylor expansion of the first derivative of the log-likelihood function l with respect to

δ

(i.e., the score equations) around true data-generating parameters

(δ_{0}, γ_{0})

. In applications of IRT models, the log-likelihood function is typically twice continuously differentiable, which guarantees the applicability of the Taylor approximation. Define

l_{δ} = (\partial l) / (\partial δ)

,

l_{δ δ} = (\partial^{2} l) / (\partial δ^{2})

, and

l_{δ γ_{i}} = (\partial^{2} l) / (\partial δ \partial γ_{i})

. With a sufficiently long test, estimated item parameters

{\hat{γ}}_{i}

are independent across items [23]. Hence,

l_{δ}

can be approximated around

(δ_{0}, γ_{0})

as

\begin{matrix} l_{δ} (δ, γ) \approx l_{δ} (δ_{0}, γ_{0}) + l_{δ δ} (δ_{0}, γ_{0}) (δ - δ_{0}) + \sum_{i = 1}^{I} l_{δ γ_{i}} (δ_{0}, γ_{i 0}) (γ_{i} - γ_{i 0}) \end{matrix} .

(3)

The distribution parameter estimates

\hat{δ} = (\hat{μ}, \hat{σ})

are obtained by setting (3) to zero and using fixed but misspecified item parameters

γ_{i} = γ_{i}^{*}

. Hence, we obtain from (3)

0 = l_{δ} (δ_{0}, γ_{0}) + l_{δ δ} (δ_{0}, γ_{0}) (\hat{δ} - δ_{0}) + \sum_{i = 1}^{I} l_{δ γ_{i}} (δ_{0}, γ_{i 0}) (γ_{i}^{*} - γ_{i 0}) .

(4)

We now determine the distribution parameter estimate

{\hat{δ}}_{(- i)}

in which item i is omitted from the log-likelihood function. Empirical evidence shows that the distribution parameters can be equivalently estimated if the item parameters of item i were freely estimated. This means that one can set

γ_{i} = γ_{i 0}

for a sufficiently large number of items I. Then, (4) can be rewritten as

0 = l_{δ} (δ_{0}, γ_{0}) + l_{δ δ} (δ_{0}, γ_{0}) ({\hat{δ}}_{(- i)} - δ_{0}) + \sum_{\binom{j = 1}{j \neq i}}^{I} l_{δ γ_{j}} (δ_{0}, γ_{j 0}) (γ_{j}^{*} - γ_{j 0}) .

(5)

By subtracting (4) from (5), we obtain

{\hat{δ}}_{(- i)} - \hat{δ} = - {[l_{δ δ} (δ_{0}, γ_{0})]}^{- 1} l_{δ γ_{i}} (δ_{0}, γ_{i 0}) (γ_{i}^{*} - γ_{i 0}) .

(6)

Equation (6) is now specialized for the 2PL model. In this case,

γ_{i} = (a_{i}, b_{i})

consists of two parameters. We assume that fixed item discriminations were correct and fixed item intercepts

b_{i}^{*}

do not equal true data-generating item intercepts

b_{i 0}

. We obtain from (6)

\begin{matrix} {\hat{δ}}_{(- i)} - \hat{δ} \approx - {[l_{δ δ} (δ_{0}, γ_{0})]}^{- 1} l_{δ b_{i}} (δ_{0}, γ_{i 0}) (b_{i}^{*} - b_{i 0}) \end{matrix} .

(7)

In the following subsections, it is discussed how the finding can be used in the practical implementation (Section 2.1) of the jackknife linking error and how to efficiently compute the necessary derivatives of the log-likelihood function (Section 2.2). The estimation of linking errors is also prone to sampling errors. To circumvent a biased estimation of the linking error, we propose a bias-corrected version of the analytical approximation of the jackknife linking error in Section 2.3. Finally, a variant of the jackknife linking error computation in subsets of items is discussed in Section 2.4.

2.1. Use of the Approximation in Scaling

We now discuss how to apply the analytical approximation formula (6) for the deviations

{\hat{δ}}_{(- i)} - \hat{δ}

in the jackknife linking error formula. First, we compute the distribution parameters

\hat{δ}

by fixing item parameters to

γ^{*}

. Second, we estimate item parameters

\hat{γ}

by fixing the distribution parameters to

\hat{δ}

. The motivation is that differences of

\hat{δ} - δ_{0}

and

\hat{γ} - γ_{0}

are close to zero for a sufficiently large number of items I. Hence, we replace unknown parameters in (6) with their empirical counterparts, and we arrive at

{\hat{δ}}_{(- i)} - \hat{δ} = - {[l_{δ δ} (\hat{δ}, \hat{γ})]}^{- 1} l_{δ γ_{i}} (\hat{δ}, {\hat{γ}}_{i}) (γ_{i}^{*} - {\hat{γ}}_{i}) .

(8)

For the special case of the 2PL model with misspecified item intercepts

b_{i}^{*}

, we obtain from (7)

{\hat{δ}}_{(- i)} - \hat{δ} = - H_{i} (b_{i}^{*} - {\hat{b}}_{i 0})

(9)

for estimated item parameters

\hat{b} = ({\hat{b}}_{1}, \dots, {\hat{b}}_{I})

and

H_{i} = {[l_{δ δ} (\hat{δ}, \hat{b})]}^{- 1} l_{δ b_{i}} (\hat{δ}, \hat{b})

. The analytical approximation

{LE}_{AN}

of the jackknife linking error is given as

\begin{matrix} {LE}_{AN} (u) = \sqrt{\frac{I - 1}{I} \sum_{i = 1}^{I} h_{i u}^{2} {(b_{i}^{*} - {\hat{b}}_{i 0})}^{2}}, u = μ or u = σ \end{matrix},

(10)

where

H_{i} = {(h_{i μ}, h_{i σ})}^{⊤}

. For the linking error of

\hat{μ}

, the first entry

h_{i μ}

is chosen. For the linking error of

\hat{σ}

, the second entry

h_{i σ}

is chosen.
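
Formula (10) can be sketched in a few lines of Python. The snippet assumes that the vectors H_i have already been computed from the derivatives of the log-likelihood function and are stacked row-wise into an array; the function name and the numerical values are hypothetical.

```python
import numpy as np

def analytical_linking_error(H, b_fixed, b_hat):
    """Analytical approximation of the jackknife linking error as in (10).

    H       : array of shape (I, 2); row i holds (h_i_mu, h_i_sigma)
    b_fixed : fixed item difficulties b_i^*
    b_hat   : item difficulties estimated with fixed distribution parameters
    """
    H = np.asarray(H, dtype=float)
    d = np.asarray(b_fixed, dtype=float) - np.asarray(b_hat, dtype=float)
    n_items = d.size
    le = np.sqrt((n_items - 1) / n_items * np.sum(H**2 * d[:, None] ** 2, axis=0))
    return {"mu": le[0], "sigma": le[1]}

# Hypothetical inputs for I = 3 items
H = [[0.10, 0.02], [0.10, -0.01], [0.10, 0.03]]
le = analytical_linking_error(H, b_fixed=[0.0, 1.0, -1.0], b_hat=[0.2, 0.9, -1.3])
```

The first column of `H` yields the linking error of the estimated mean, and the second column yields that of the estimated standard deviation.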

2.2. Implementation Details for Computing Derivatives of the Log-Likelihood Function

We now discuss how to efficiently compute the necessary derivatives of the log-likelihood function required in the analytical approximation of the jackknife linking error. We evaluate the integral in (1) by a rectangular integration on a finite grid

θ_{1}, \dots, θ_{T}

of

θ

points. Hence, the continuous normal distribution is approximated by a discretized normal distribution [24]. We set

w_{t} = w_{t} (μ, σ) = C ϕ (θ_{t}; μ, σ)

using an appropriate scaling constant C that ensures

\sum_{t = 1}^{T} w_{t} = 1

.
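
A minimal sketch of the discretized normal distribution follows. The grid of 21 equidistant points between -3.5 and 3.5 is the one used later in the simulation study; the function name is an illustrative choice.

```python
import numpy as np

def discrete_normal_weights(theta_grid, mu, sigma):
    """Weights w_t proportional to the normal density at theta_t, scaled to sum to 1."""
    # The constant factor of the normal density cancels in the normalization,
    # so only the kernel exp(-z^2 / 2) is needed.
    dens = np.exp(-0.5 * ((theta_grid - mu) / sigma) ** 2)
    return dens / dens.sum()

theta_grid = np.linspace(-3.5, 3.5, 21)
w = discrete_normal_weights(theta_grid, mu=-0.2, sigma=0.9)
```

Dividing by the sum of the densities plays the role of the scaling constant C.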

Let

l_{p} = log L_{p}

denote the contribution in the log-likelihood function of person p based on item response data

x_{p} = (x_{p 1}, \dots, x_{p I})

. It holds that

L_{p} = \sum_{t = 1}^{T} w_{t} \prod_{i = 1}^{I} f_{p t i} = \sum_{t = 1}^{T} w_{t} f_{p t},

(11)

where

f_{p t i} = {P_{i} (θ_{t}; γ_{i})}^{x_{p i}} {[1 - P_{i} (θ_{t}; γ_{i})]}^{1 - x_{p i}}

(12)

and

f_{p t} = \prod_{i = 1}^{I} f_{p t i}

. We now compute the partial derivative of

l_{p}

for a scalar parameter u

\frac{\partial l_{p}}{\partial u} = \frac{\frac{\partial L_{p}}{\partial u}}{L_{p}} .

(13)

The second-order derivative with respect to another parameter v is given as

\frac{\partial^{2} l_{p}}{\partial u \partial v} = \frac{\frac{\partial^{2} L_{p}}{\partial u \partial v} L_{p} - \frac{\partial L_{p}}{\partial u} \frac{\partial L_{p}}{\partial v}}{L_{p}^{2}} .

(14)

We can compute for

u = μ

or

u = σ

\frac{\partial L_{p}}{\partial u} = \sum_{t = 1}^{T} \frac{\partial w_{t}}{\partial u} f_{p t} .

(15)

The second-order derivative for

v = μ

or

v = σ

can be obtained as

\frac{\partial^{2} L_{p}}{\partial u \partial v} = \sum_{t = 1}^{T} \frac{\partial^{2} w_{t}}{\partial u \partial v} f_{p t} .

(16)

In the analytical approximation of the jackknife linking error, we need to compute

(\partial^{2} L_{p}) / (\partial u \partial b_{i})

for an item parameter

b_{i}

. We obtain

\frac{\partial^{2} L_{p}}{\partial u \partial b_{i}} = \sum_{t = 1}^{T} \frac{\partial w_{t}}{\partial u} f_{p t} \frac{1}{f_{p t i}} \frac{\partial f_{p t i}}{\partial b_{i}} = \sum_{t = 1}^{T} \frac{\partial w_{t}}{\partial u} f_{p t} \frac{\partial log f_{p t i}}{\partial b_{i}} .

(17)

Equations (15)–(17) show that the required first- and second-order derivatives are computationally inexpensive. Hence, the analytical approximation of the linking error is computationally cheap once estimates of the DIF effects are available.
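
The mixed derivatives in (14), (15), and (17) can be sketched with NumPy as follows. This is an illustrative implementation for the 2PL model under the discretized normal distribution, not the code used in the article; all function names and the toy data are assumptions. It uses the fact that the derivative of log f_{p t i} with respect to b_i equals -a_i (x_{p i} - P_i(θ_t)) in the 2PL model.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def weights_and_gradient(theta, mu, sigma):
    """Normalized weights w_t and their derivatives w.r.t. (mu, sigma)."""
    z = (theta - mu) / sigma
    dens = np.exp(-0.5 * z**2)
    w = dens / dens.sum()
    # Derivatives of the log-kernel -z^2/2 w.r.t. mu and sigma, shape (2, T)
    g = np.stack([z / sigma, z**2 / sigma])
    # Quotient rule for normalized weights: dw_t/du = w_t * (g_t - sum_s w_s g_s)
    dw = w * (g - (g * w).sum(axis=1, keepdims=True))
    return w, dw

def mixed_derivative(X, a, b, theta, mu, sigma):
    """Sum over persons of d^2 l_p / (du db_i) for u in (mu, sigma); shape (2, I)."""
    w, dw = weights_and_gradient(theta, mu, sigma)
    P = logistic(a[None, :] * (theta[:, None] - b[None, :]))       # (T, I)
    logf = X @ np.log(P).T + (1 - X) @ np.log(1 - P).T              # (N, T)
    f = np.exp(logf)                                                # f_pt
    dlogf_b = -a[None, None, :] * (X[:, None, :] - P[None, :, :])   # (N, T, I)
    L = f @ w                                                       # Eq. (11)
    L_u = f @ dw.T                                                  # Eq. (15), (N, 2)
    L_b = np.einsum("t,pt,pti->pi", w, f, dlogf_b)                  # dL_p/db_i, (N, I)
    L_ub = np.einsum("ut,pt,pti->pui", dw, f, dlogf_b)              # Eq. (17), (N, 2, I)
    # Eq. (14) per person, then summed over persons
    num = L_ub * L[:, None, None] - L_u[:, :, None] * L_b[:, None, :]
    return (num / L[:, None, None] ** 2).sum(axis=0)

# Arbitrary toy data: 200 persons, 5 items
rng = np.random.default_rng(0)
theta = np.linspace(-3.5, 3.5, 21)
a = np.ones(5)
b = np.linspace(-1.0, 1.0, 5)
X = (rng.random((200, 5)) < 0.5).astype(float)
l_delta_b = mixed_derivative(X, a, b, theta, mu=-0.2, sigma=0.9)
```

Everything is expressed through sums over the grid points, so no additional model estimation is required once the parameter values are plugged in. The three-dimensional array `dlogf_b` is convenient for small examples; for large datasets it would be preferable to loop over items.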

2.3. Bias Correction due to Sampling Error

The estimation of linking errors is also prone to sampling errors because estimated item parameters are involved in the computation. To avoid a biased estimation of the linking error, we now propose a bias-corrected version of the analytical approximation of the jackknife linking error.

Assume that the variance of

b_{i}^{*} - {\hat{b}}_{i 0}

in (10) is

v_{i}

, and the estimates are approximately independent across items [23]. The bias-corrected analytical linking error can be obtained by subtracting variability that is due to sampling error quantified in

v_{i}

. We obtain

\begin{matrix} {LE}_{AN, bc} (u) = {sqrt}_{+} (\frac{I - 1}{I} \sum_{i = 1}^{I} h_{i u}^{2} [{(b_{i}^{*} - {\hat{b}}_{i 0})}^{2} - v_{i}]), u = μ or u = σ \end{matrix},

(18)

where

{sqrt}_{+} (x) = \sqrt{max (x, 0)}

.

A similar bias-correction method was used for the 1PL model in trend estimation [18]. Alternatively, a bias-correction term can be estimated using resampling techniques with respect to persons (e.g., bootstrap or half sampling; [19,22]).
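
The bias-corrected formula (18) can be sketched as follows. The snippet assumes that the sampling variances v_i are available (for example, from squared standard errors of the estimated item difficulties, which is an assumption made here for illustration); the function name and values are hypothetical.

```python
import numpy as np

def analytical_linking_error_bc(h_u, b_fixed, b_hat, v):
    """Bias-corrected analytical linking error as in (18) for u = mu or u = sigma.

    h_u : entries h_iu for the chosen parameter u
    v   : sampling variances v_i of the differences b_i^* - b_hat_i
    """
    h_u = np.asarray(h_u, dtype=float)
    d = np.asarray(b_fixed, dtype=float) - np.asarray(b_hat, dtype=float)
    v = np.asarray(v, dtype=float)
    n_items = d.size
    inner = (n_items - 1) / n_items * np.sum(h_u**2 * (d**2 - v))
    # sqrt_+ truncates negative values at zero before taking the square root
    return np.sqrt(max(inner, 0.0))

# Hypothetical inputs for I = 3 items
le_bc = analytical_linking_error_bc(
    h_u=[0.10, 0.10, 0.10],
    b_fixed=[0.0, 1.0, -1.0],
    b_hat=[0.2, 0.9, -1.3],
    v=[0.01, 0.01, 0.01],
)
```

The truncation at zero matters when the observed squared differences are smaller than the sampling variances, in which case the corrected linking error is reported as zero.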

2.4. Jackknife Linking Error Based on Testlets

The jackknife linking error is frequently evaluated at groups of items (so-called testlets; refs. [25,26]) instead of single items. The reason for this is that subsets of items in a test are often presented jointly with a common (text) stimulus. Hence, DIF effects pertain to all items in a testlet and often have the same sign. Therefore, the testlet structure must be taken into account when computing linking errors [17,18,27].

Assume that there are H testlets. That is, the set of item indices

i = 1, \dots, I

is partitioned into distinct sets

I_{1}, \dots, I_{H}

. The linking error based on testlets for

\hat{μ}

is defined as

\begin{matrix} LE (\hat{μ}) = \sqrt{\frac{H - 1}{H} \sum_{h = 1}^{H} {({\hat{μ}}_{(- I_{h})} - \hat{μ})}^{2}} \end{matrix},

(19)

where

{\hat{μ}}_{(- I_{h})}

is the estimate in which all items from testlet h were removed. In the analytical approximation, we can approximate the relevant jackknife difference in (19) by

\begin{matrix} {\hat{μ}}_{(- I_{h})} - \hat{μ} = \sum_{i \in I_{h}} h_{i μ} (b_{i}^{*} - b_{i 0}) \end{matrix} .

(20)

A corresponding bias-corrected variant of the linking error (see (18)) can be similarly obtained.

3. Simulation Study

3.1. Method

In this simulation study, we investigate the performance of our analytical approximation of the jackknife linking error. We illustrate the performance based on the 2PL model. Equal discriminations

a_{i} \equiv 1

were assumed.

In the simulation study, we varied two factors: the number of items and the DIF standard deviation

τ_{DIF}

. We chose

I = 10

, 20, 30, and 40 items to cover a range of test lengths that are obtained in empirical practice. The goal is to assess the mean

μ

and the standard deviation

σ

in a group (e.g., a country in a large-scale assessment study such as PISA). We defined

μ = - 0.2

and

σ = 0.9

in the simulation. Assumed item difficulties

b_{i}^{*}

were chosen equispaced in the interval

[- 2, 2]

with increments

4 / (I - 1)

. For example, for

I = 10

items, assumed item difficulties were −2.00, −1.56, −1.11, −0.67, −0.22, 0.22, 0.67, 1.11, 1.56, and 2.00. In each replication, data-generating item difficulties

b_{i 0}

were simulated according to

b_{i 0} = b_{i}^{*} + e_{i}, e_{i} \sim N (0, τ_{DIF})

(21)

Hence, the estimated distribution parameters

\hat{μ}

and

\hat{σ}

will vary across replications even with infinite sample sizes of persons because the true data-generating item parameters vary. The standard deviation of DIF effects was either

τ_{DIF} = 0.25

or 0.50.
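
The generation of item difficulties in (21) can be sketched as follows. The snippet treats τ_DIF as the standard deviation of the normal distribution, in line with its description as the DIF standard deviation; the function name and the seed are arbitrary.

```python
import numpy as np

def simulate_item_difficulties(n_items, tau_dif, rng):
    """Return assumed difficulties b_i^* and data-generating difficulties b_i0."""
    b_fixed = np.linspace(-2.0, 2.0, n_items)       # equispaced in [-2, 2]
    e = rng.normal(0.0, tau_dif, size=n_items)      # DIF effects with SD tau_dif
    return b_fixed, b_fixed + e

rng = np.random.default_rng(2023)                   # arbitrary seed
b_fixed, b_true = simulate_item_difficulties(n_items=10, tau_dif=0.25, rng=rng)
```

For I = 10, `b_fixed` rounded to two decimals reproduces the ten assumed difficulties listed above.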

Item response data were simulated according to a quasi-Monte Carlo simulation method ([28], see also [29] for a similar approach). Our motivation was to assess the performance of the linking error by reducing the uncertainty due to sampling error. Simulated item responses

X

should be as close as possible to the true distribution

P (X; μ, σ, b_{0})

(see [29]). To facilitate this, we chose

θ

values from the same discrete grid

θ_{1} = - 3.5, \dots, θ_{21} = 3.5

of equidistant

θ

points that were also used in fitting the 2PL model. We fixed a pseudo-sample size of 10,000 persons. Then, we computed the number of persons at each

θ_{t}

, which is given by

N w_{t} (μ, σ)

(some rounding is necessary). For each

θ = θ_{t}

and for each item i, we can compute

P_{i} (θ_{t}; a_{i}, b_{i 0})

according to the item response function. Hence, there are

N w_{t} (μ, σ) P_{i} (θ_{t}; a_{i}, b_{i 0})

persons with

θ = θ_{t}

with

X_{i} = 1

and

N w_{t} (μ, σ) (1 - P_{i} (θ_{t}; a_{i}, b_{i 0}))

with

X_{i} = 0

. The zeroes and ones for item

i = 1, \dots, I

are randomly allocated to the corresponding persons with

θ = θ_{t}

. Although the empirical frequencies for multivariate item response patterns

X = x

do not match the population probabilities, conditional marginal probabilities of the item response functions are (almost) correctly simulated. Hence, one can conclude that this quasi-Monte Carlo approach reduces the impact of sampling errors to a minimum and only reflects variability due to linking errors; that is, using incorrect item parameters

b_{i}^{*}

that differ from the data-generating item parameters

b_{i 0}

.
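
The data-generating procedure described above can be sketched in Python. This is an illustrative reading of the description, not the code used in the article (which relied on R); the function name, the rounding rule, and the example item difficulties are assumptions.

```python
import numpy as np

def quasi_mc_responses(theta, w, a, b_true, n_persons, rng):
    """Generate item responses with (almost) exact conditional item proportions."""
    counts = np.rint(n_persons * w).astype(int)       # persons per grid point
    blocks = []
    for theta_t, n_t in zip(theta, counts):
        p = 1.0 / (1.0 + np.exp(-a * (theta_t - b_true)))
        n_ones = np.rint(n_t * p).astype(int)         # number of ones per item
        block = np.zeros((n_t, b_true.size), dtype=int)
        for i, k in enumerate(n_ones):
            # Randomly allocate the k ones of item i among the n_t persons
            block[rng.permutation(n_t)[:k], i] = 1
        blocks.append(block)
    return np.vstack(blocks), np.repeat(theta, counts)

rng = np.random.default_rng(1)                        # arbitrary seed
theta = np.linspace(-3.5, 3.5, 21)
dens = np.exp(-0.5 * ((theta + 0.2) / 0.9) ** 2)      # mu = -0.2, sigma = 0.9
w = dens / dens.sum()
a = np.ones(4)
b_true = np.array([-1.0, -0.3, 0.4, 1.1])             # hypothetical difficulties
X, theta_person = quasi_mc_responses(theta, w, a, b_true, n_persons=10000, rng=rng)
```

Because of rounding, the number of generated persons can differ slightly from the pseudo-sample size. The random allocation is done independently per item, so only the item-wise conditional proportions are controlled, as noted in the text.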

In each of the

4 \times 2 = 8

cells of the simulation, 2000 replications were conducted. We fitted the 2PL model with item discriminations fixed to 1 and fixed item difficulties

b_{i}^{*}

and obtained the estimated mean

\hat{μ}

and the estimated standard deviation

\hat{σ}

. Then, we determined item difficulties

{\hat{b}}_{i}

by fixing the mean and standard deviation to

\hat{μ}

and

\hat{σ}

, respectively. Using these quantities, we calculated the analytical approximation (AN) of the jackknife linking error given in (10). We compared the analytical linking error with the jackknife linking error (JK). To evaluate the quality of the estimated linking errors, we computed the mean, the standard deviation, the standard error ratio (

{SE}_{ratio}

; defined as the quotient of the average linking error and the empirical standard deviation of the estimate

\hat{μ}

or

\hat{σ}

), and coverage rates at the 95% confidence level.
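
The evaluation criteria can be sketched as follows. The article does not spell out how the confidence intervals were constructed; the snippet assumes a symmetric normal-theory interval (estimate ± 1.96 times the linking error), and the function name is hypothetical.

```python
import numpy as np

def evaluate_linking_error(estimates, linking_errors, true_value, z=1.96):
    """Summarize estimates and linking errors across replications."""
    est = np.asarray(estimates, dtype=float)
    le = np.asarray(linking_errors, dtype=float)
    sd_est = est.std(ddof=1)                      # empirical SD of the estimate
    covered = np.abs(est - true_value) <= z * le  # assumed normal-theory interval
    return {
        "mean": est.mean(),
        "sd": sd_est,
        "se_ratio": le.mean() / sd_est,           # average LE / empirical SD
        "coverage": 100.0 * covered.mean(),       # in percent
    }
```

A standard error ratio close to 1 and a coverage rate close to 95 indicate that the linking error reflects the actual variability of the estimate across replications.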

The R software [30] was used for simulation and analysis. The R package TAM [31] was used for estimating the 2PL model.

3.2. Results

Table 1 contains the results of the two linking error methods as a function of the standard deviation of DIF effects

τ_{DIF}

and the number of items I.

It can be seen that the mean can be almost unbiasedly estimated in the presence of DIF effects. As expected, the standard deviation of the estimated mean

\hat{μ}

decreases with a larger number of items. Because all item discriminations were equal to one, the linking error of

\hat{μ}

can be analytically predicted as

τ_{DIF} / \sqrt{I}

(referred to as EXP in Table 1; [16,18]). It can be seen that empirical standard deviations of

\hat{μ}

were close to these expected values. Moreover, the mean of the jackknife linking error JK was very similar to the empirical standard deviation of

\hat{μ}

, while the linking error AN based on the analytical approximation was slightly too small. This was particularly the case for a low number of items

I = 10

. However, with a larger number of items, the analytical approximation performed well. This behavior is also reflected in the standard error ratio

{SE}_{ratio}

, which attained desired values close to 1 for the jackknife linking error and was smaller than 1 for the linking error based on the analytical approximation. However, for at least 20 items, the analytical approximation might be considered to have satisfactory performance. Overall, it can also be seen that the estimated linking error AN was a bit smaller on average than the jackknife linking error. However, the average absolute deviation (AAD in Table 1) demonstrated that the analytical approximation was very close to the jackknife linking error in each replication, particularly for a large number of items such as

I = 40

. It can also be seen that coverage rates were satisfactory for the jackknife linking error, while the analytical approximation showed issues in the condition of a few items (i.e.,

I = 10

). Finally, we observed that the empirical standard deviation of estimated linking errors was slightly smaller for the analytical approximation compared to the jackknife approach.

The estimated standard deviation

\hat{σ}

showed a small bias in the condition of a larger standard deviation of DIF effects (i.e.,

τ_{DIF} = 0.5

). Hence, one cannot expect coverage rates to be satisfactory because the expected value of the estimate deviated from the true value

σ = 0.9

. Interestingly, the mean of both linking errors was smaller than the empirical standard deviation of

\hat{σ}

. This finding was also reflected in the standard error ratios, which were substantially smaller than 1. Hence, one might conclude that both linking errors (jackknife JK and the analytical approximation AN) were unsatisfactory to reflect the variability in estimated standard deviations in our application. We suspect these results might be explained by the fact that the standard deviation was obtained using fixed but incorrect item parameters. Moreover, such a finding would likely not be observed in a linking approach such as log-mean-mean linking [32].

Across all conditions, the analytical approximation provided linking errors that were slightly too small. In the computation of the jackknife linking error in Equation (10), the multiplication factor

(I - 1) / I

is used. In other applications, the multiplication factor 1 is used [22]. We recomputed linking errors based on the analytical approximation with the multiplication factor of 1, and it turned out that the results grew closer to the jackknife linking error. However, the standard error ratio was still smaller than 1. Nevertheless, there could be a benefit to using the modified formula in empirical applications.

On the Bias in the Estimated Standard Deviation $\hat{σ}$

At first sight, the negative bias in the estimated standard deviation

\hat{σ}

is surprising, although similar results have been reported in other studies that use fixed but incorrect item parameters in estimation [32]. We now present a heuristic derivation of a bias that is close to the empirically obtained bias. Please note that estimation of the item response model with a logistic item response function can be approximated by a model with the probit link function [33]. In this case, weighted least squares estimation based on tetrachoric correlations [34] can be used to estimate the standard deviation

σ

. The approach relies on modeling underlying continuous variables

X_{i}^{*}

for dichotomous items

X_{i}

. The variable

X_{i}

takes the value 1 if

X_{i}^{*}

exceeds the item parameter

b_{i}

. The tetrachoric correlation

ρ_{i j}

for items i and j is given by

ρ_{i j} = \frac{Cov (X_{i}^{*}, X_{j}^{*})}{\sqrt{Var (X_{i}^{*}) Var (X_{j}^{*})}} .

(22)

Now assume equal item discriminations

a_{i}

and a correctly specified model. In this case, (22) simplifies to

ρ_{i j} = \frac{σ^{2}}{σ^{2} + L},

(23)

where

L = π^{2} / 3 \approx 3.29

is the variance of the logistic distribution. If incorrect item parameters

b_{i}^{*}

were used, the covariance

Cov (X_{i}^{*}, X_{j}^{*})

in (22) is not affected on average. However, the variance

Var (X_{i}^{*})

of the underlying latent variable

X_{i}^{*}

increases, and the expected value can be determined by the DIF variance

τ_{DIF}^{2}

. The estimated tetrachoric correlation can then be computed as

ρ_{i j}^{*} = \frac{σ^{2}}{σ^{2} + τ_{DIF}^{2} + L} .

(24)

In the computation of

\hat{σ}

, one, therefore, essentially solves

ρ_{i j}^{*} = \frac{σ^{2}}{σ^{2} + τ_{DIF}^{2} + L} = \frac{{\hat{σ}}^{2}}{{\hat{σ}}^{2} + L} .

(25)

The estimated standard deviation

\hat{σ}

can be determined as (see [35] for a similar approach)

\hat{σ} = \sqrt{L \frac{ρ_{i j}^{*}}{1 - ρ_{i j}^{*}}} = σ \sqrt{\frac{L}{L + τ_{DIF}^{2}}} .

(26)

Hence, the estimated standard deviation is negatively biased in the presence of DIF effects. The predicted bias in the estimated standard deviation based on (26) is

- 0.008

for

τ_{DIF} = 0.25

and

- 0.032

for

τ_{DIF} = 0.50

, which is similar to the empirically obtained bias for

\hat{σ}

in Table 1.
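
The predicted bias follows directly from (26) and can be reproduced with a few lines of Python. The function name is an illustrative choice.

```python
import math

def predicted_sd_bias(sigma, tau_dif):
    """Predicted bias of the estimated standard deviation based on (26)."""
    L = math.pi**2 / 3                      # variance of the logistic distribution
    sigma_hat = sigma * math.sqrt(L / (L + tau_dif**2))
    return sigma_hat - sigma

for tau in (0.25, 0.50):
    print(tau, round(predicted_sd_bias(0.9, tau), 3))
# prints:
# 0.25 -0.008
# 0.5 -0.032
```

The bias vanishes for τ_DIF = 0 and grows in absolute value with the DIF standard deviation.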

4. Discussion

In this article, an analytical approximation of the jackknife linking error by means of a Taylor expansion of the log-likelihood function has been proposed. The analytical approximation performed well for estimated means with at least 20 items. Its advantage is that it only requires one additional estimation of an item response model and second-order derivatives of the log-likelihood function. In contrast, the jackknife linking error requires I additional estimations of the item response model for I items, which is computationally much more demanding. One might argue that the analytical approximation provides at least a computationally cheap proxy of the linking error. However, the jackknife linking error would be preferred for more reliable statistical inference because our simulation findings indicated that it provided slightly better coverage rates.

In the simulation study, we did not consider sampling errors because a quasi-Monte Carlo simulation method was utilized that minimized the impact of sampling errors. Future studies could simultaneously assess sampling errors and linking errors based on the analytical approximation. In particular, the performance of our proposed bias-correction method could be evaluated.

In the analytical derivation and the simulation study, we restricted ourselves to the computation of a single group which can be interpreted as country means and standard deviations in a cross-sectional assessment or the computation of trend between two time points. In future research, our proposed simplified computation formula for the linking error might be applied for trend estimates in the country means in educational assessment studies such as PISA [18].

It should be noted that our proposed approach of the analytical approximation of the jackknife linking error bears similarity with the infinitesimal jackknife technique [36,37,38]. The difference is that the jackknife linking error removes columns (i.e., items) from the dataset, while infinitesimal jackknife addresses contributions of rows (i.e., persons) in a dataset.

In the linking literature [39,40], linking errors are also sometimes referred to as sampling errors (of persons) of obtained linking constants that can be means or standard deviations [41,42,43]. It is important to note that sampling errors due to the sampling of persons and linking errors due to item choice must be distinguished [19,20]. The computation of linking errors can be justified for random items and fixed items [19]. For random items, the used items are thought to be a (representative) draw from a larger population of items [44,45]. For fixed items, DIF effects can be stochastically modeled by some distribution [19]. The latter case might also be conceived as quantifying model error [46]. We would like to point out that we see great potential in using linking errors by incorporating the extent of model misspecification in the reported parameter uncertainty.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

1PL	one-parameter logistic
2PL	two-parameter logistic
DIF	differential item functioning
IRT	item response theory
JK	jackknife
LE	linking error
PIRLS	progress in international reading literacy study
PISA	programme for international student assessment


Table 1. Simulation study: Results as a function of the standard deviation of DIF effects ($τ_{DIF}$) and number of items ($I$).

$τ_{DIF}$	$I$	Bias	SD	Mean EXP	Mean JK	Mean AN	$SE_{ratio}$ JK	$SE_{ratio}$ AN	AAD	SD JK	SD AN	COV95 JK	COV95 AN
		Linking error of estimated mean $\hat{μ}$
0.25	10	0.000	0.079	0.079	0.079	0.070	0.990	0.875	0.0091	0.0202	0.0177	91.6	88.6
0.25	20	0.003	0.056	0.056	0.056	0.053	1.000	0.945	0.0031	0.0096	0.0091	93.2	91.7
0.25	30	0.000	0.045	0.046	0.046	0.044	1.007	0.970	0.0017	0.0065	0.0062	94.3	93.3
0.25	40	0.003	0.039	0.040	0.040	0.039	1.013	0.986	0.0011	0.0047	0.0045	96.3	95.7
0.5	10	0.003	0.161	0.158	0.153	0.135	0.949	0.837	0.0181	0.0397	0.0354	91.6	87.9
0.5	20	0.002	0.111	0.112	0.111	0.104	1.000	0.942	0.0065	0.0185	0.0177	93.5	91.8
0.5	30	0.004	0.090	0.091	0.091	0.087	1.011	0.972	0.0036	0.0130	0.0127	94.5	93.5
0.5	40	0.008	0.080	0.079	0.078	0.076	0.978	0.948	0.0025	0.0097	0.0095	93.2	92.3
		Linking error of estimated standard deviation $\hat{σ}$
0.25	10	−0.007	0.039	—	0.041	0.033	1.051	0.866	0.0073	0.0111	0.0091	95.2	92.3
0.25	20	−0.008	0.023	—	0.023	0.021	1.009	0.914	0.0022	0.0042	0.0038	94.7	93.0
0.25	30	−0.008	0.019	—	0.017	0.016	0.901	0.848	0.0011	0.0026	0.0024	92.4	91.7
0.25	40	−0.008	0.016	—	0.014	0.014	0.898	0.856	0.0007	0.0019	0.0017	90.9	90.8
0.5	10	−0.030	0.078	—	0.077	0.062	0.985	0.796	0.0155	0.0206	0.0166	94.9	90.8
0.5	20	−0.029	0.053	—	0.043	0.040	0.827	0.752	0.0047	0.0079	0.0068	89.3	87.5
0.5	30	−0.027	0.042	—	0.033	0.031	0.774	0.727	0.0026	0.0048	0.0044	85.9	84.7
0.5	40	−0.027	0.037	—	0.027	0.026	0.723	0.690	0.0019	0.0034	0.0033	85.0	83.0

Note. EXP = expected value of linking error for $\hat{μ}$; JK = jackknife linking error; AN = linking error estimated by analytical approximation $LE_{AN}$ from Equation (10); $SE_{ratio}$ = standard error ratio; AAD = average absolute difference between JK and AN linking error estimates; SD = standard deviation of estimated linking error; COV95 = coverage rate for confidence level 95%.
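The summary columns of Table 1 can be read as simple functions of replication-level results. The following Python sketch is illustrative only and is not the author's code. It assumes that $SE_{ratio}$ is the mean estimated linking error divided by the empirical SD of the parameter estimate, which agrees with the tabulated values (for example, 0.153/0.161 ≈ 0.95 for $τ_{DIF}=0.5$ and $I=10$), and that coverage is based on a normal-theory interval of the form estimate ± 1.96 × linking error. The function and argument names are hypothetical.

```python
from statistics import mean, stdev

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def summarize_replications(estimates, linking_errors, true_value):
    """Summarize one simulation condition (hypothetical helper).

    estimates      : parameter estimates (e.g., mu-hat), one per replication
    linking_errors : linking error estimates (JK or AN), one per replication
    true_value     : data-generating value of the parameter
    """
    sd_est = stdev(estimates)       # empirical SD of the estimate
    mean_le = mean(linking_errors)  # average estimated linking error
    # Assumed coverage rule: the true value lies within estimate +/- z * LE.
    covered = [
        abs(est - true_value) <= Z_95 * le
        for est, le in zip(estimates, linking_errors)
    ]
    return {
        "bias": mean(estimates) - true_value,
        "sd": sd_est,
        "mean_le": mean_le,
        "se_ratio": mean_le / sd_est,  # values below 1 indicate underestimation
        "sd_le": stdev(linking_errors),
        "cov95": 100.0 * sum(covered) / len(covered),
    }


def average_absolute_difference(le_jk, le_an):
    """AAD between paired JK and AN linking error estimates."""
    return mean(abs(jk - an) for jk, an in zip(le_jk, le_an))
```

Under these assumptions, an $SE_{ratio}$ close to 1 together with a COV95 close to 95 indicates that the linking error estimate tracks the actual variability of the parameter estimate across item samples.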


© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

