Article

Computational Aspects of L0 Linking in the Rasch Model

by
Alexander Robitzsch
1,2
1
IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2
Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Algorithms 2025, 18(4), 213; https://doi.org/10.3390/a18040213
Submission received: 20 February 2025 / Revised: 4 April 2025 / Accepted: 7 April 2025 / Published: 9 April 2025
(This article belongs to the Special Issue Numerical Optimization and Algorithms: 3rd Edition)

Abstract

The $L_0$ linking approach replaces the $L_2$ loss function in mean–mean linking under the Rasch model with the $L_0$ loss function. Using the $L_0$ loss function offers the advantage of potential robustness against fixed differential item functioning (DIF) effects. However, its nondifferentiability necessitates differentiable approximations to ensure feasible and computationally stable estimation. This article examines alternative specifications of two approximations, each controlled by a tuning parameter ε that determines the approximation error. Results demonstrate that the optimal ε value minimizing the root mean square error (RMSE) of the linking parameter estimate depends on the magnitude of DIF effects, the number of items, and the sample size. A data-driven selection of ε outperformed a fixed ε across all conditions in both a numerical illustration and a simulation study.

1. Introduction

Item response theory (IRT) models [1,2,3] are statistical models for multivariate discrete random variables. This article focuses on dichotomous (i.e., binary) random variables and the comparison of two groups through a linking method.
Let $\mathbf{X} = (X_1, \ldots, X_I)$ denote a vector of $I \in \mathbb{N}$ random variables $X_i \in \{0, 1\}$, commonly referred to as items or (scored) item responses. A unidimensional IRT model [4] represents the probability distribution $P(\mathbf{X} = \mathbf{x})$ for $\mathbf{x} = (x_1, \ldots, x_I) \in \{0, 1\}^I$,
$$P(\mathbf{X} = \mathbf{x}; \boldsymbol{\delta}, \boldsymbol{\gamma}) = \int \prod_{i=1}^{I} P_i(\theta; \gamma_i)^{x_i} \bigl[ 1 - P_i(\theta; \gamma_i) \bigr]^{1 - x_i} \, \phi(\theta; \mu, \sigma) \, d\theta, \qquad (1)$$
where φ denotes the density of the normal distribution with mean μ and standard deviation (SD) σ. The distribution parameters of the latent variable θ, often labeled as a trait or ability variable, are collected in the vector $\boldsymbol{\delta} = (\mu, \sigma)$. The vector $\boldsymbol{\gamma} = (\gamma_1, \ldots, \gamma_I)$ contains the item parameters associated with the item response functions (IRFs) $P_i(\theta; \gamma_i) = P(X_i = 1 \mid \theta)$ for $i = 1, \ldots, I$. The IRF of the Rasch model [5,6,7] is defined as
$$P_i(\theta; \gamma_i) = \Psi(\theta - b_i), \qquad (2)$$
where $b_i$ denotes the item difficulty, and $\Psi(x) = (1 + \exp(-x))^{-1}$ is the logistic distribution function. In this formulation, the item parameter vector is given as $\gamma_i = (b_i)$.
For a sample of $N$ individuals with independently and identically distributed observations $\mathbf{x}_1, \ldots, \mathbf{x}_N$ from the distribution of the random variable $\mathbf{X}$, the unknown parameters of the IRT model in (1) can be consistently estimated using maximum likelihood estimation (ML; [8,9,10]).
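To make the data-generating model concrete, the following R sketch simulates dichotomous item responses from the Rasch model in (1) and (2) for a single group; the sample size and item difficulties are illustrative values chosen for this sketch, not those used in the later sections.

```r
# Minimal sketch: simulate item responses under the Rasch model (illustrative values).
set.seed(1)
N <- 500                                     # number of persons
b <- c(-0.31, 0.41, -1.10, -0.54, -1.85)     # hypothetical item difficulties
theta <- rnorm(N, mean = 0, sd = 1)          # latent trait of the identified group
# IRF of the Rasch model: P(X_i = 1 | theta) = plogis(theta - b_i)
X <- sapply(b, function(bi) rbinom(N, size = 1, prob = plogis(theta - bi)))
colMeans(X)                                  # observed proportions of correct responses
```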
IRT models are commonly used to compare the performance of two groups on a test by examining differences in the latent variable θ within the framework of the IRT model in (1). This article focuses on a generalization of the mean–mean linking method [11] based on the Rasch model. The purpose of a linking method is to estimate the difference between the distributions of θ in the two groups. This difference serves as a summary measure of group performance on the multivariate vector of dichotomous items X .
The linking approach consists of two steps. First, the Rasch model is estimated separately for each group, allowing for potential differential item functioning (DIF), where items may function differently across groups [12,13,14]. Second, differences in item parameters are used to estimate group differences in the θ variable through a linking method [11,15,16].
This article investigates the performance of a generalization of mean–mean (MM; [11]) linking in the presence of fixed uniform DIF [14] in item difficulties. The traditional MM method relies on mean differences in item difficulties, which corresponds to using an  L 2 loss function. MM linking is a widely used linking or equating method, as discussed in popular textbooks on Rasch modeling [6,7,17,18]. In this study, we investigate a generalization of the MM linking method in the Rasch model using the L 0 loss function [19]. Using this robust loss function can essentially remove items with DIF effects from the group comparison directly in the linking method (see also [20,21,22,23,24,25,26,27]). DIF effects can also be treated through the application of robust procedures in MM linking [19,28,29,30,31].
The L 0 loss function, being nondifferentiable, requires differentiable approximations to ensure feasible and computationally stable estimation. This article explores alternative specifications of these approximations, which depend on a tuning parameter ε that controls the approximation error. Previous research relied on fixed ε values based on prior knowledge or simulation studies. Here, the choice of a fixed ε parameter is examined and compared with a data-driven approach for determining ε .
The rest of this article is structured as follows. Section 2 introduces L 0 linking in the Rasch model. Section 3 presents a numerical illustration of alternative specifications of L 0 linking in a simplified setting. In Section 4, findings from a simulation study are reported. An empirical example is provided in Section 5. This article closes with a discussion in Section 6 and a conclusion in Section 7.

2. L 0 Linking in the Rasch Model

This section discusses the L 0 linking approach in the Rasch model. Section 2.1 covers item parameter identification and the ordinary MM linking method. Section 2.2 introduces the L 0 loss function and presents two differentiable approximations. Section 2.3 applies these approximations to define the L 0 linking approach in the Rasch model. Finally, Section 2.4 examines its statistical properties.

2.1. Identified Item Parameters and Mean–Mean Linking

To introduce the $L_0$ linking method as a replacement for ordinary mean–mean linking in the Rasch model, we first outline the identification of item parameters in a group-specific estimation of the Rasch model when no DIF effects are present. In this case, the identified item parameters in the first group are $\hat{b}_{i1} = b_i$, where $b_i$ represents the invariant item parameters across both groups. For identification, the mean of θ in the first group is fixed at 0, while the standard deviation σ remains identifiable within this group.
In the second group, the mean μ and the SD σ can be identified when invariant item difficulties are assumed. Thus, μ and σ represent group differences in θ. The identified item parameters in this group, estimated separately under the assumption of a θ mean of 0 and an estimated SD σ, are given by
$$\hat{b}_{i2} = b_i - \mu. \qquad (3)$$
Linking methods aim to recover the group parameter μ using the group-specific item parameters obtained from separate Rasch model estimations.
The MM linking method estimates the group mean μ ^ for the second group as
$$\hat{\mu} = -\frac{1}{I} \sum_{i=1}^{I} \bigl( \hat{b}_{i2} - \hat{b}_{i1} \bigr). \qquad (4)$$
The linking parameter μ ^ in MM linking is obtained as the minimizer of the squared loss function (i.e., the L 2 loss function; [19])
$$\hat{\mu} = \arg\min_{\mu} H(\mu), \quad \text{with} \quad H(\mu) = \frac{1}{I} \sum_{i=1}^{I} \rho\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \mu \bigr), \quad \text{where} \quad \rho(x) = \tfrac{1}{2} x^2. \qquad (5)$$
Thus, the estimate μ ^ satisfies the estimating equation
$$\frac{1}{I} \sum_{i=1}^{I} \rho'\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \bigr) = 0, \quad \text{where} \quad \rho'(x) = x, \qquad (6)$$
and $\rho'$ denotes the derivative of ρ with respect to x. Clearly, (6) is equivalent to (4).
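As a brief illustration of the equivalence of (4) and (6), the following R sketch computes the MM estimate in closed form and by numerically minimizing the $L_2$ criterion (5); the item difficulties are hypothetical values created only for this example.

```r
# Sketch: MM linking under the L2 loss with hypothetical item difficulties.
b1 <- c(-0.31, 0.41, -1.10, -0.54, -1.85)    # group 1 difficulties (assumed)
b2 <- b1 - 0.30 + c(0.40, 0, 0, 0, 0)        # group 2: mu = 0.30, DIF on item 1
mu_mm <- -mean(b2 - b1)                      # closed-form MM estimate, Equation (4)
H_L2 <- function(mu) mean(0.5 * (b2 - b1 + mu)^2)
mu_opt <- optimize(H_L2, interval = c(-3, 3))$minimum
c(mu_mm, mu_opt)                             # both approximately 0.30 - 0.40/5 = 0.22
```

The example also shows why a robust loss is attractive: the single DIF item shifts the non-robust MM estimate away from the generating value of 0.30.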
This paper investigates the computational aspects of MM linking in (5) when the squared L 2 loss function is replaced with the L 0 loss function, which aims to robustify the linking method in the presence of fixed DIF effects.

2.2. L 0 Loss Function and Differentiable Approximations

In this section, we introduce the L 0 loss function [32,33], which is defined as
$$\rho(x) = \mathbb{1}(x \neq 0). \qquad (7)$$
This loss function equals 0 for $x = 0$ and 1 for $x \neq 0$. The $L_0$ loss function has been widely applied in statistical modeling, particularly for obtaining sparse solutions (see, e.g., [34,35,36,37,38,39]).
The exact L 0 loss function in (7) has the disadvantage of being nondifferentiable at x = 0 , making it difficult to use in numerical optimization. To address this, differentiable approximations of the L 0 loss function have been proposed.
The ratio loss function, as an approximation of the L 0 loss function, is defined as (see [40,41,42,43])
$$\rho_{\varepsilon}(x) = \frac{x^2}{x^2 + \varepsilon}, \qquad (8)$$
where ε > 0 is a tuning parameter. We have found that ε = 0.01 performs satisfactorily in applications [19,44].
The linking parameter estimate based on the $L_2$ loss function can be obtained using the $L_0$ approximation (8) with a sufficiently large ε value, such as ε = 1000. In this case, $\rho_\varepsilon(x) \approx x^2/\varepsilon$, and the resulting estimate closely matches the $L_2$ estimate, which is the ordinary MM parameter estimate.
An alternative differentiable approximation of the L 0 loss function is based on the Gaussian density function, denoted as the Gaussian function hereafter, and is given by (see [45,46,47])
$$\rho_{\varepsilon}(x) = 1 - \exp\!\left( -\frac{x^2}{\varepsilon} \right), \qquad (9)$$
where ε > 0 is again a tuning parameter.
Figure 1 displays the ratio and the Gaussian loss functions for ε = 0.001 and ε = 0.01. It can be seen that the two loss functions exhibit similar behavior around x = 0, which can be explained by the fact that their first two derivatives at zero coincide, with $\rho_\varepsilon'(0) = 0$ and $\rho_\varepsilon''(0) = 2/\varepsilon$. For larger |x| values, the Gaussian loss function grows faster toward the upper asymptote of 1 compared to the ratio loss function.
The ordinary MM parameter estimate can be obtained using the Gaussian L 0 loss function approximation, similar to the ratio function, by selecting a sufficiently large ε value.
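A compact R sketch of the two approximations is given below; the plotting choices are arbitrary and only serve to reproduce the qualitative behavior described for Figure 1.

```r
# Sketch: ratio and Gaussian approximations of the L0 loss, Equations (8) and (9).
rho_ratio <- function(x, eps) x^2 / (x^2 + eps)
rho_gauss <- function(x, eps) 1 - exp(-x^2 / eps)
x <- seq(-1, 1, by = 0.005)
# Both approach the indicator 1(x != 0) as eps -> 0; for a large eps, the ratio loss
# behaves like x^2 / eps, which reproduces the ordinary (L2-based) MM estimate.
matplot(x, cbind(rho_ratio(x, 0.01), rho_gauss(x, 0.01)), type = "l",
        lty = c(1, 2), col = 1, xlab = "x", ylab = "approximate L0 loss")
legend("bottomright", legend = c("ratio", "Gaussian"), lty = c(1, 2))
```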

2.3. L 0 Linking as a Robust Mean–Mean Linking in the Rasch Model

MM linking can be modified by using the $L_0$ loss function instead of the $L_2$ loss function. With the $L_0$ loss function, the loss is defined by counting the number of non-vanishing deviations of $\hat{b}_{i2}$ from $\hat{b}_{i1} - \mu$, which corresponds to the number of non-vanishing DIF effects. To obtain a stable linking parameter estimate in the presence of sampling errors, the $L_0$ loss function is replaced by the differentiable approximation $\rho_\varepsilon$, and the estimated mean $\hat{\mu}$ is obtained as
$$\hat{\mu} = \arg\min_{\mu} H(\mu), \quad \text{with} \quad H(\mu) = \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \mu \bigr). \qquad (10)$$
This approach can also be referred to as robust MM linking or L 0 linking in the Rasch model, where “robust” refers to the linking parameter estimate being resilient to the presence of fixed DIF effects.
Equivalently to (10), the linking parameter estimate μ ^ solves the estimating equation
$$\frac{\partial H}{\partial \mu} = \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \bigr) = 0. \qquad (11)$$
Equation (10) represents a one-dimensional optimization problem that can be numerically solved using standard optimizers implemented in statistical software.
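The following R sketch sets up the criterion in (10) for the ratio approximation and minimizes it with stats::nlminb(), the optimizer also used in the later sections; the item difficulty estimates and the starting value are assumptions made for this illustration.

```r
# Sketch: L0 linking with the ratio approximation, Equation (10).
rho_ratio <- function(x, eps) x^2 / (x^2 + eps)
b1 <- c(-0.31, 0.41, -1.10, -0.54, -1.85)    # hypothetical group 1 difficulties
b2 <- b1 - 0.30 + c(0.40, 0, 0, 0, 0)        # group 2: mu = 0.30, DIF on item 1
H_L0 <- function(mu, eps) mean(rho_ratio(b2 - b1 + mu, eps))
# The criterion is nonconvex for small eps, so a reasonable starting value
# (here, the non-robust MM estimate) is helpful.
mu_L0 <- nlminb(start = -mean(b2 - b1), objective = H_L0, eps = 0.01)$par
mu_L0                                        # close to the generating value 0.30
```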

2.4. Statistical Properties of the Estimated Linking Parameter in L 0 Linking

We now investigate the statistical properties of the linking parameter estimate μ ^ obtained from (10). The difference in estimated item difficulties required in the L 0 linking approach can be written as
$$\hat{b}_{i2} - \hat{b}_{i1} = -\mu + \kappa_i + u_i \qquad (12)$$
with fixed DIF effects $\kappa_i$ and sampling errors $u_i$, where $E(u_i) = 0$ and $\mathrm{Var}(u_i) = V_i$. The property $E(u_i) = 0$ (i.e., asymptotic unbiasedness of the estimated item difficulties) follows from the general properties of ML estimation.
The L 0 linking approach is analyzed by starting from the estimating equation
$$\frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \bigr) = \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{\mu} - \mu + \kappa_i + u_i \bigr) = 0. \qquad (13)$$
Now, we apply a Taylor expansion with respect to μ , as in standard M-estimation theory ([48]; see also [25,49]), and obtain
$$0 = \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{\mu} - \mu + \kappa_i + u_i \bigr) \approx \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'( \kappa_i + u_i ) + \left[ \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}''( \kappa_i + u_i ) \right] ( \hat{\mu} - \mu ). \qquad (14)$$
The bias and variance of μ ^ can be derived from (14) and are given by
$$\mathrm{Bias}(\hat{\mu}) = - \frac{ \sum_{i=1}^{I} E\bigl[ \rho_{\varepsilon}'( \kappa_i + u_i ) \bigr] }{ \sum_{i=1}^{I} E\bigl[ \rho_{\varepsilon}''( \kappa_i + u_i ) \bigr] } \quad \text{and} \qquad (15)$$
$$\mathrm{Var}(\hat{\mu}) = \frac{ \sum_{i=1}^{I} \mathrm{Var}\bigl[ \rho_{\varepsilon}'( \kappa_i + u_i ) \bigr] }{ \Bigl( \sum_{i=1}^{I} E\bigl[ \rho_{\varepsilon}''( \kappa_i + u_i ) \bigr] \Bigr)^{2} }, \qquad (16)$$
where approximate independence of item parameters across items is assumed in (16). As a summary precision measure of the linking parameter estimate, the mean squared error (MSE) can be determined as
$$\mathrm{MSE}(\hat{\mu}) = E\bigl[ (\hat{\mu} - \mu)^2 \bigr] = \mathrm{Bias}(\hat{\mu})^2 + \mathrm{Var}(\hat{\mu}) = \frac{ \Bigl( \sum_{i=1}^{I} E\bigl[ \rho_{\varepsilon}'( \kappa_i + u_i ) \bigr] \Bigr)^{2} + \sum_{i=1}^{I} \mathrm{Var}\bigl[ \rho_{\varepsilon}'( \kappa_i + u_i ) \bigr] }{ \Bigl( \sum_{i=1}^{I} E\bigl[ \rho_{\varepsilon}''( \kappa_i + u_i ) \bigr] \Bigr)^{2} }. \qquad (17)$$
Although it might not be immediately evident from Equations (15)–(17), the bias, variance, and MSE depend on the choice of the tuning parameter ε in the L 0 approximation  ρ ε .
In the next two sections, we compare the two differentiable approximations—the ratio and Gaussian loss functions—regarding their statistical properties in a numerical illustration and a simulation study. In particular, we focus on the choice of the tuning parameter ε .

3. Numerical Illustration

3.1. Method

In this Numerical Illustration, the properties of the estimated linking parameter $\hat{\mu}$ were studied in a simplified setting in which no item responses were simulated. It was assumed that the difference $\hat{b}_{i2} - \hat{b}_{i1}$ had a variance of $V/N$, where N denotes the sample size. Hence, this illustration assumed equal sampling variances of item parameter differences, which might be violated in practice. However, this assumption eases the statistical treatment and aims at yielding clearly interpretable results. Furthermore, we assumed that we had I items, and a proportion π of the items had an unbalanced DIF effect κ, while a proportion 1 − π of the items did not show DIF.
From Section 2.4, we know that the linking parameter satisfies the estimating equation
$$\frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{b}_{i2} - \hat{b}_{i1} + \hat{\mu} \bigr) = \frac{1}{I} \sum_{i=1}^{I} \rho_{\varepsilon}'\bigl( \hat{\mu} - \mu + \kappa_i + u_i \bigr) = 0, \qquad (18)$$
where $\mathrm{Var}(u_i) = V/N$. Because all item parameter differences have equal sampling variances, (18) can be simplified to
$$\sum_{i=1}^{I\pi} \rho_{\varepsilon}'\bigl( \hat{\mu} - \mu + \kappa + u_i \bigr) + \sum_{i=I\pi+1}^{I} \rho_{\varepsilon}'\bigl( \hat{\mu} - \mu + u_i \bigr) = 0, \qquad (19)$$
assuming that $I\pi$ is an integer. By using the bias Formula (15), we obtain
$$\mathrm{Bias}(\hat{\mu}) = - \frac{ \pi\, E\bigl[ \rho_{\varepsilon}'( \kappa + u_i ) \bigr] }{ \pi\, E\bigl[ \rho_{\varepsilon}''( \kappa + u_i ) \bigr] + (1 - \pi)\, E\bigl[ \rho_{\varepsilon}''( u_i ) \bigr] }, \qquad (20)$$
where we use $E\bigl[ \rho_{\varepsilon}'( u_i ) \bigr] = 0$ because $\rho_\varepsilon'$ is an odd function and $u_i$ has a symmetric distribution. The variance can be computed as (see (16))
$$\mathrm{Var}(\hat{\mu}) = \frac{1}{I} \, \frac{ \pi\, \mathrm{Var}\bigl[ \rho_{\varepsilon}'( \kappa + u_i ) \bigr] + (1 - \pi)\, \mathrm{Var}\bigl[ \rho_{\varepsilon}'( u_i ) \bigr] }{ \Bigl( \pi\, E\bigl[ \rho_{\varepsilon}''( \kappa + u_i ) \bigr] + (1 - \pi)\, E\bigl[ \rho_{\varepsilon}''( u_i ) \bigr] \Bigr)^{2} }. \qquad (21)$$
The root mean square error (RMSE) as the square root of the MSE can be obtained as
$$\mathrm{RMSE}(\hat{\mu}) = \sqrt{ \mathrm{MSE}(\hat{\mu}) } = \sqrt{ \mathrm{Bias}(\hat{\mu})^2 + \mathrm{Var}(\hat{\mu}) }. \qquad (22)$$
The expected values and variances in (20) and (21) can be numerically evaluated.
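One way to carry out this numerical evaluation is sketched below in R for the bias in (20) under the ratio loss; the derivatives are written out analytically for this sketch (the reported analyses instead used symbolic differentiation via the Deriv package), and the parameter values in the final call are arbitrary illustrations.

```r
# Sketch: numerical evaluation of the approximate bias (20) for the ratio loss.
# Analytic first and second derivatives of rho_eps(x) = x^2 / (x^2 + eps):
d_rho  <- function(x, eps) 2 * eps * x / (x^2 + eps)^2
dd_rho <- function(x, eps) 2 * eps * (eps - 3 * x^2) / (x^2 + eps)^3
# E[g(kappa + u)] for u ~ N(0, sd^2), evaluated with stats::integrate()
E_norm <- function(g, kappa, sd) {
  integrate(function(x) g(x) * dnorm(x, mean = kappa, sd = sd),
            lower = kappa - 8 * sd, upper = kappa + 8 * sd)$value
}
bias_mu <- function(eps, kappa, prop, V, N) {
  sd_u <- sqrt(V / N)
  num <- prop * E_norm(function(x) d_rho(x, eps), kappa, sd_u)
  den <- prop * E_norm(function(x) dd_rho(x, eps), kappa, sd_u) +
    (1 - prop) * E_norm(function(x) dd_rho(x, eps), 0, sd_u)
  -num / den
}
bias_mu(eps = 0.05, kappa = 0.4, prop = 0.3, V = 1, N = 500)   # illustrative values
```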
The bias, SD, and RMSE are evaluated for μ ^ for sample sizes N of 250, 500, and 1000, as well as the number of items I as 10, 20, and 40. We simulated fixed DIF effects κ = 0.4 and κ = 0.8 , representing small and large DIF. The tuning parameter ε in the ratio loss function and Gaussian loss function ρ ε was chosen as 1, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.095, 0.09, 0.085, 0.08, 0.075, 0.07, 0.065, 0.06, 0.055, 0.05, 0.045, 0.04, 0.035, 0.03, 0.025, 0.02, 0.015, 0.01, 0.009, 0.008, 0.007, 0.006, 0.005, 0.004, 0.003, 0.002, and 0.001. In total, 46 ε values were evaluated in this Numerical Illustration. We only report results for the ratio loss function because the findings for the Gaussian loss function were very similar. The statistical software R (Version 4.4.1; [50]) was used for the analysis in this study. Symbolic derivatives of the two loss functions were computed with the R package Deriv (Version: 4.1.6; [51]). Replication material for this illustration can be retrieved from https://osf.io/un6q4 (accessed on 20 February 2025).

3.2. Results

Figure 2 displays the absolute bias, the SD, and the RMSE of the estimated mean μ ^ for I = 20 items as a function of the DIF effect κ and the sample size N. The tuning parameter ε is displayed on a logarithmic scale on the x-axis. The bias increased with increasing values of ε in the loss function ρ ε . In contrast, the SD decreased with increasing values of ε . The RMSE reflects the bias–variance trade-off of the linking parameter estimate. There was an optimal ε parameter that minimizes the RMSE. This optimal ε parameter decreased with increasing sample size N. Moreover, the optimal ε parameter was larger for κ = 0.8 than for κ = 0.4 .
Table 1 presents the optimal ε values that minimize the RMSE of the estimated mean $\hat{\mu}$ under the ratio loss function. These values are reported for different DIF effect sizes κ, numbers of items I, and sample sizes N. As the sample size N increased, the optimal ε generally decreased. This pattern was more pronounced for small item numbers (e.g., I = 10), where ε dropped sharply with increasing N. Larger item numbers I were associated with smaller optimal ε values, suggesting that when more items were available, the best trade-off under the ratio loss function occurred at a lower ε value. This effect was particularly evident for κ = 0.4, where the optimal ε decreased substantially as I increased from 10 to 40. When the DIF effect size κ increased from 0.4 to 0.8, the optimal ε values tended to be higher for the same I and N (with the exception of the smallest sample size). This suggests that stronger DIF effects tolerated a higher ε when minimizing the RMSE, likely because DIF introduced relatively greater bias at the smaller effect size κ = 0.4, which had to be counteracted by a smaller ε.
Overall, the results highlight a trade-off between sample size, the number of items, and DIF effect size in determining the optimal correction for minimizing the RMSE.

4. Simulation Study

4.1. Method

The Rasch model was used as the IRF in the data-generating model for two groups. For identification purposes, the mean of the latent variable θ in the first group was fixed at 0, with an SD σ of 1. In the second group, the mean μ was set to 0.3 to represent the difference in θ between the groups, while the SD σ was set to 1.2.
The simulation study was conducted for I = 10, 20, and 40 items. In the I = 10 condition, base item difficulty values $b_i$ were set to −0.314, 0.411, −1.097, −0.542, −1.854, −0.403, −0.895, 0.715, 0.841, and 0.139, resulting in a mean of M = −0.300 and an SD of 0.850. These item difficulties were used in the data generation for the first group. In the second group, the same values were applied, except for the first three items, where item difficulties were adjusted to $b_{i2} = b_i + \kappa$. This introduced fixed DIF in 30% of the items. The DIF effect size κ was set to 0.4 and 0.8, representing small and large DIF conditions, respectively. For item sets larger than 10, the same 10-item parameter set was repeated accordingly. The item parameters are also available at https://osf.io/un6q4 (accessed on 20 February 2025). Note that item parameters remained fixed across all replications within each condition.
Sample sizes of N = 125 , 250, 500, and 1000 per group were selected to reflect small- to large-scale applications of the Rasch model.
A total of 5000 replications were conducted for each of the 24 conditions, corresponding to the combinations of 4 sample sizes (N) × 3 item numbers (I) × 2 DIF effect sizes ( κ ). For each simulated dataset, L 0 linking was performed using various specifications. Both the ratio and Gaussian loss functions ρ ε (see Section 2.2) were applied. The ρ ε loss functions were evaluated for a sequence of 11 ε values: 1, 0.75, 0.50, 0.25, 0.10, 0.075, 0.05, 0.025, 0.01, 0.005, and 0.001.
Additionally, as demonstrated in Section 3, the optimal ε value corresponding to the minimal RMSE depends on the sample size N. To account for this, a data-driven approach for selecting the ε value was explored. Let $\bar{V}$ denote the average variance of the estimated item difficulty differences $\hat{b}_{i2} - \hat{b}_{i1}$. For a fixed $y \in (0, 1)$, $\varepsilon_y$ is chosen such that
$$\rho_{\varepsilon_y}\bigl( \sqrt{\bar{V}} \bigr) = y. \qquad (23)$$
In the simulation, the following y values were chosen: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9.
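For the two approximations considered here, Equation (23) can be solved for ε in closed form. The following R sketch shows this; the closed-form expressions are simple algebra derived for this sketch rather than formulas stated in the article, and the numerical check uses the average standard error of 0.126 reported in Section 5.

```r
# Sketch: data-driven eps values solving rho_eps(sqrt(Vbar)) = y, Equation (23).
# Closed-form solutions (derived for this sketch):
#   ratio:    eps_y = Vbar * (1 - y) / y
#   Gaussian: eps_y = -Vbar / log(1 - y)
eps_ratio <- function(Vbar, y) Vbar * (1 - y) / y
eps_gauss <- function(Vbar, y) -Vbar / log(1 - y)
Vbar <- 0.126^2                          # average variance implied by an average SE of 0.126
round(eps_ratio(Vbar, c(0.2, 0.4)), 3)   # 0.064 and 0.024, as in Section 5
round(eps_gauss(Vbar, c(0.2, 0.4)), 3)   # approx. 0.071 and 0.031 (Section 5 reports 0.072 and 0.031)
```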
The bias, standard deviation (SD), and root mean square error (RMSE) of the estimated mean, μ ^ , were calculated for all specifications of the L 0 linking approach. The relative RMSE of μ ^ was defined as the ratio of the RMSE for a given specification to the RMSE of a reference method, multiplied by 100. Across all conditions, the reference method was the ratio loss function with a fixed ε value of 0.01.
All analyses for this simulation study were conducted using the statistical software R (Version 4.4.1; [50]). The Rasch model was fitted with the sirt::xxirt() function from the R package sirt (Version 4.2-106; [52]). The optimization of $L_0$ linking was performed with the stats::nlminb() function. Replication materials for this simulation study can be accessed at https://osf.io/un6q4 (accessed on 20 February 2025).

4.2. Results

Table 2 presents the absolute bias as a function of the DIF effect size κ , the number of items I, and the sample size N. Overall, the absolute bias generally decreased as κ increased from 0.4 to 0.8. The bias in the estimated μ ^ was highest for the smallest number of items ( I = 10 ) and slightly decreased as the number of items increased. This reduction in bias was more pronounced for smaller sample sizes. As the sample size N increased, the bias systematically decreased, with the lowest bias observed for the largest sample size ( N = 1000 ). For the data-driven ε choices ε 0.2 and ε 0.4 , the bias also decreased as the sample size increased.
The biases for the ratio and Gaussian loss functions were relatively similar, although the bias was slightly smaller under conditions with a larger DIF effect size (κ = 0.8) compared to a smaller DIF effect size (κ = 0.4). Additionally, the bias decreased as the ε values in the $\rho_\varepsilon$ function were reduced. Specifically, the bias was slightly higher for the data-driven choice $\varepsilon_{0.2}$ compared to $\varepsilon_{0.4}$, particularly at smaller N. These findings suggest that while the choice of the loss function influenced estimation accuracy, its effect diminished as I and N increased.
Table 3 presents the relative RMSE of the estimated mean μ ^ as a function of the DIF effect size κ , the number of items I, and the sample size N. In general, larger ε values performed better in smaller samples and with fewer items. Thus, the optimal ε for minimizing the RMSE depends on κ , I, and N. The data-driven estimates ε 0.2 and ε 0.4 generally outperformed the fixed choice ε = 0.01 based on the loss function ρ ε . Overall, it can be concluded that ε 0.4 was superior to ε 0.2 . Although ε 0.2 produced a lower RMSE than ε 0.4 in many simulation conditions, the differences were typically small. However, in scenarios where ε 0.2 performed worse than ε 0.4 , its RMSE exceeded the reference value of 100 and was substantially higher than that of ε 0.4 . This supports the use of ε 0.4 as a conservative default choice. In most conditions, the Gaussian loss function slightly outperformed the ratio loss function. In large samples ( N = 1000 ), smaller ε values, such as ε = 0.05 or ε = 0.01 , proved effective.
To provide further insight into the absolute bias and relative RMSE, two additional figures are presented below.
Figure 3 presents the absolute bias, SD, and RMSE of the estimated mean μ ^ using the ratio loss function for I = 20 items as a function of the DIF effect size κ and sample size N. The absolute bias increased with larger ε , while the SD decreased, illustrating a bias–variance trade-off in the RMSE. This trade-off results in an optimal ε value that minimizes the RMSE. As shown in Figure 3, the optimal ε was smaller for a larger DIF effect size (i.e.,  κ = 0.8 ) than for κ = 0.4 . Additionally, the optimal ε decreased with increasing sample size N. As expected, the RMSE was lower for larger sample sizes.
Figure 4 presents the relative RMSE of the estimated mean based on data-driven  ε y values for the ratio loss function ρ ε with ε = ε y . The results indicate that an optimal RMSE was achieved for a specific y value, depending on the DIF effect size κ , the number of items I, and the sample size N. Overall, the findings in Figure 4 support the selection of  ε 0.2 and  ε 0.4 , as presented in Table 2 and Table 3.
Following the suggestion of a reviewer, bias and RMSE were regressed onto the simulation factors using a two-way analysis of variance (ANOVA). The factors included sample size N, number of items I, the size of the DIF effect κ , the type of loss function (i.e., ratio vs. Gaussian function), and the chosen ε value in the loss function ρ ε . All simulation factors were treated as categorical factors in the ANOVA. The same item numbers I (e.g., I = 10 , 20 , 40 ) and ε values (i.e., ε = 0.01 , 0.05 , 0.10 , 0.25 ) as reported in Table 2 and Table 3 were used. The main focus of the analysis was the proportion of variance explained by the main effects and two-way interactions of the factors in the ANOVA.
In the two-way ANOVA with bias as the dependent variable, 96.0 % of the variance was explained by the main factors and their two-way interactions. The largest proportion was attributed to sample size N ( 39.8 % ), followed by κ ( 24.8 % ) and ε ( 24.8 % ). The number of items I ( 0.1 % ) and the type of loss function ( 0.2 % ) contributed minimally to the variability in bias. Among the two-way interactions, only the interaction between N and κ ( N × κ , 2.3 % ) and N × ε ( 1.7 % ) accounted for non-negligible proportions of variance.
In the two-way ANOVA with RMSE as the dependent variable, 99.3 % of the variance was explained by the simulation factors and their two-way interactions. As expected, sample size N accounted for the largest proportion of explained variance ( 89.9 % ), followed by number of items I ( 3.2 % ), κ ( 1.0 % ), ε ( 0.5 % ), and the type of loss function ( 0.0 % ). Among the two-way interactions, N × I ( 1.3 % ), N × κ ( 1.0 % ), and N × ε ( 2.1 % ) contributed with non-negligible amounts to the variance.
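The variance decomposition described above can be reproduced with base R's aov(); the sketch below uses a placeholder outcome because the condition-level bias and RMSE values are not reproduced here, so only the model formula and the extraction of explained variance proportions correspond to the reported analysis.

```r
# Sketch: two-way ANOVA of a condition-level outcome with all main effects and
# two-way interactions; 'rmse' is a placeholder and would be the simulation result.
set.seed(2)
res <- expand.grid(N = factor(c(125, 250, 500, 1000)), I = factor(c(10, 20, 40)),
                   kappa = factor(c(0.4, 0.8)), loss = factor(c("ratio", "gauss")),
                   eps = factor(c(0.01, 0.05, 0.10, 0.25)))
res$rmse <- rnorm(nrow(res))                     # placeholder outcome values
fit <- aov(rmse ~ (N + I + kappa + loss + eps)^2, data = res)
tab <- summary(fit)[[1]]
pct <- 100 * tab[, "Sum Sq"] / sum(tab[, "Sum Sq"])
round(setNames(pct, rownames(tab)), 1)           # percentage of variance per term
```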

5. Empirical Example

We now illustrate L 0 linking in the Rasch model using an empirical dataset. The dataDIF dataset from the R package equateIRT (Version 1.0.0; [53,54]) was selected for this purpose. The dataset contains 20 dichotomous items and three groups, each with 1000 subjects. For illustration, only Groups 1 and 2 were linked using the Rasch model.
As in the Simulation Study (see Section 4), the Rasch model was fitted using the sirt::xxirt() function from the R package sirt (Version 4.2-106; [52]). The optimization of L 0 linking was performed using the stats::nlminb() function, with R code specifically written for this paper, available at https://osf.io/un6q4 (accessed on 20 February 2025).
The average standard error of item difficulty differences was 0.126. The mean ability difference between Group 1 and Group 2, obtained from MM linking based on the L 2 loss function, was 0.482. Slight deviations from this estimate were observed with different specifications of the L 0 linking function. For the ratio loss function, the mean estimate based on ε = 0.01 was 0.535. Empirical choices of the ε parameter, specifically ε 0.2 and ε 0.4 , were also explored. For the ratio loss function, the estimates for ε 0.2 = 0.064 and ε 0.4 = 0.024 yielded mean differences of 0.518 and 0.520, respectively. The mean estimate based on the Gaussian loss function was 0.537, with estimates for ε 0.2 = 0.072 and ε 0.4 = 0.031 yielding 0.519 and 0.517, respectively.
DIF effects for the 20 items were also computed based on the estimated mean difference of 0.520, obtained with ε 0.4 and the ratio loss function. Figure 5 displays the DIF effects along with their 95% confidence intervals. Item 1 exhibited a significantly positive DIF effect (0.798, with a standard error of 0.112), while DIF effects for the remaining 19 items did not significantly differ from 0.
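A sketch of how such DIF effects and their confidence intervals can be assembled from the separate calibrations is given below; the function and its Wald-type intervals are assumptions of this illustration and not necessarily the exact computation used for Figure 5.

```r
# Sketch (assumed computation): estimated DIF effects kappa_hat_i = b2_hat_i - b1_hat_i + mu_hat,
# with Wald-type 95% confidence intervals from the group-specific standard errors.
dif_effects <- function(b1_hat, b2_hat, se1, se2, mu_hat) {
  est <- b2_hat - b1_hat + mu_hat
  se  <- sqrt(se1^2 + se2^2)
  data.frame(item = seq_along(est), dif = est, se = se,
             lower = est - qnorm(0.975) * se, upper = est + qnorm(0.975) * se)
}
```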

6. Discussion

This article examined the computational aspects of L 0 linking in the Rasch model. L 0 linking serves as a robust alternative to MM linking when fixed DIF effects are present. Since the L 0 loss function is not differentiable at x = 0 , the ratio loss function and the Gaussian loss function were used as differentiable approximations. Both approximations rely on a tuning parameter ε , which controls the approximation error. A numerical illustration and a simulation study demonstrated that the optimal  ε value minimizing the RMSE of the linking parameter estimate μ ^ depends on the magnitude of DIF effects, the number of items, and the sample size. A data-driven selection of  ε provided better performance than a fixed  ε across all conditions. Additionally, the Gaussian loss function showed a slight advantage over the ratio loss function, though the differences are likely negligible in practice.
This study aimed to determine group differences in a robust manner, ensuring insensitivity to DIF effects. Detecting DIF items was treated as a nuisance, as the goal was not to identify deviating items but rather to obtain a consistent estimate of the mean difference in ability between the two groups. In contrast, much of the psychometric literature focuses on detecting DIF items [22,27,55,56,57], with group difference estimation being, at best, a by-product of the procedure.
Using the L 0 loss function effectively removes items with large DIF effects from the linking process. As a result, the estimated group difference is based solely on items with little to no DIF. This approach may pose a threat to validity, as it alters the construct being measured by treating items with DIF effects as construct-irrelevant [58,59]. Restricting the item set in this way can change the interpretation of the ability variable [60,61,62,63,64].

7. Conclusions

The L 0 loss function proved effective in linking two groups based on the Rasch model in the presence of differential item functioning. Two differentiable approximations (i.e., the ratio and Gaussian loss functions) of the nondifferentiable L 0 loss function have been proposed. The smoothness of the approximation depends on the tuning parameter ε . A simulation study demonstrated that a data-driven choice of ε , based on the average standard error of item difficulty differences, produced estimates with a lower RMSE compared to estimates based on a fixed ε value, such as ε = 0.01 .

Funding

This research received no external funding.

Data Availability Statement

This article only uses simulated datasets. Replication material for Section 3 and Section 4 can be found at https://osf.io/un6q4 (accessed on 20 February 2025). The dataset dataDIF used in the empirical example in Section 5 is available from the equateIRT package (https://doi.org/10.32614/CRAN.package.equateIRT; accessed on 20 February 2025).

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ANOVA   analysis of variance
DIF     differential item functioning
IRF     item response function
IRT     item response theory
ML      maximum likelihood
MM      mean–mean
MSE     mean square error
RMSE    root mean square error
SD      standard deviation

References

  1. Bock, R.D.; Gibbons, R.D. Item Response Theory; Wiley: Hoboken, NJ, USA, 2021. [Google Scholar] [CrossRef]
  2. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604. [Google Scholar] [CrossRef]
  3. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154. [Google Scholar]
  4. Van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30. [Google Scholar] [CrossRef]
  5. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Denmark, 1960. [Google Scholar]
  6. Bond, T.; Yan, Z.; Heene, M. Applying the Rasch Model; Routledge: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  7. Debelak, R.; Strobl, C.; Zeigenfuse, M.D. An Introduction to the Rasch Model with Examples in R; CRC Press: Boca Raton, FL, USA, 2022. [Google Scholar] [CrossRef]
  8. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459. [Google Scholar] [CrossRef]
  9. Glas, C.A.W. Maximum-likelihood estimation. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 197–216. [Google Scholar] [CrossRef]
  10. Robitzsch, A. A comprehensive simulation study of estimation methods for the Rasch model. Stats 2021, 4, 814–836. [Google Scholar] [CrossRef]
  11. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014. [Google Scholar] [CrossRef]
  12. Holland, P.W.; Wainer, H. (Eds.) Differential Item Functioning: Theory and Practice; Lawrence Erlbaum: Hillsdale, NJ, USA, 1993. [Google Scholar] [CrossRef]
  13. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011. [Google Scholar] [CrossRef]
  14. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167. [Google Scholar] [CrossRef]
  15. Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673. [Google Scholar] [CrossRef]
  16. Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica 2017, 77, 329–352. [Google Scholar] [CrossRef]
  17. Andrich, D.; Marais, I. A Course in Rasch Measurement Theory; Springer: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
  18. Lamprianou, I. Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics; Routledge: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
  19. Robitzsch, A. Extensions to mean–geometric mean linking. Mathematics 2025, 13, 35. [Google Scholar] [CrossRef]
  20. Von Davier, M.; Bezirhan, U. A robust method for detecting item misfit in large scale assessments. Educ. Psychol. Meas. 2023, 83, 740–765. [Google Scholar] [CrossRef]
  21. De Boeck, P. Random item IRT models. Psychometrika 2008, 73, 533–559. [Google Scholar] [CrossRef]
  22. Halpin, P.F. Differential item functioning via robust scaling. Psychometrika 2024, 89, 796–821. [Google Scholar] [CrossRef]
  23. He, Y.; Cui, Z.; Fang, Y.; Chen, H. Using a linear regression method to detect outliers in IRT common item equating. Appl. Psychol. Meas. 2013, 37, 522–540. [Google Scholar] [CrossRef]
  24. Magis, D.; De Boeck, P. Identification of differential item functioning in multiple-group settings: A multivariate outlier detection approach. Multivar. Behav. Res. 2011, 46, 733–755. [Google Scholar] [CrossRef]
  25. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198. [Google Scholar] [CrossRef]
  26. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144. [Google Scholar] [CrossRef]
  27. Wang, W.; Liu, Y.; Liu, H. Testing differential item functioning without predefined anchor items using robust regression. J. Educ. Behav. Stat. 2022, 47, 666–692. [Google Scholar] [CrossRef]
  28. Hu, H.; Rogers, W.T.; Vukmirovic, Z. Investigation of IRT-based equating methods in the presence of outlier common items. Appl. Psychol. Meas. 2008, 32, 311–333. [Google Scholar] [CrossRef]
  29. Jurich, D.; Liu, C. Detecting item parameter drift in small sample Rasch equating. Appl. Meas. Educ. 2023, 36, 326–339. [Google Scholar] [CrossRef]
  30. Liu, C.; Jurich, D. Outlier detection using t-test in Rasch IRT equating under NEAT design. Appl. Psychol. Meas. 2023, 47, 34–47. [Google Scholar] [CrossRef] [PubMed]
  31. Manna, V.F.; Gu, L. Different Methods of Adjusting for Form Difficulty Under the Rasch Model: Impact on Consistency of Assessment Results; (Research Report No. RR-19-08); Educational Testing Service: Princeton, NJ, USA, 2019. [Google Scholar] [CrossRef]
  32. Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with L0-type penalties. Stat. Model. 2015, 15, 389–410. [Google Scholar] [CrossRef]
  33. Oelker, M.R.; Tutz, G. A uniform framework for the combination of penalties in generalized structured models. Adv. Data Anal. Classif. 2017, 11, 97–120. [Google Scholar] [CrossRef]
  34. Atamturk, A.; Gómez, A.; Han, S. Sparse and smooth signal estimation: Convexification of l0-formulations. J. Mach. Learn. Res. 2021, 22, 1–43. [Google Scholar]
  35. Dai, S. Variable selection in convex quantile regression: L1-norm or L0-norm regularization? Eur. J. Oper. Res. 2023, 305, 338–355. [Google Scholar] [CrossRef]
  36. Huang, J.; Jiao, Y.; Liu, Y.; Lu, X. A constructive approach to L0 penalized regression. J. Mach. Learn. Res. 2018, 19, 1–37. [Google Scholar]
  37. Panokin, N.V.; Kostin, I.A.; Karlovskiy, A.V.; Nalivaiko, A.Y. Comparison of sparse representation methods for complex data based on the smoothed L0 norm and modified minimum fuel neural network. Appl. Sci. 2025, 15, 1038. [Google Scholar] [CrossRef]
  38. Soubies, E.; Blanc-Féraud, L.; Aubert, G. A continuous exact l0 penalty (CEL0) for least squares regularized problem. SIAM J. Imaging Sci. 2015, 8, 1607–1639. [Google Scholar] [CrossRef]
  39. Yang, Y.; McMahan, C.S.; Wang, Y.B.; Ouyang, Y. Estimation of l0 norm penalized models: A statistical treatment. Comp. Stat. Data Anal. 2024, 192, 107902. [Google Scholar] [CrossRef]
  40. Liu, W.; Li, Z.; Chen, W. Evaluating model robustness using adaptive sparse L0 regularization. arXiv 2024, arXiv:2408.15702. [Google Scholar] [CrossRef]
  41. O’Neill, M.; Burke, K. Variable selection using a smooth information criterion for distributional regression models. Stat. Comput. 2023, 33, 71. [Google Scholar] [CrossRef]
  42. Wang, B.; Wang, L.; Yu, H.; Xin, F. A new regularized reconstruction algorithm based on compressed sensing for the sparse underdetermined problem and applications of one-dimensional and two-dimensional signal recovery. Algorithms 2019, 12, 126. [Google Scholar] [CrossRef]
  43. Xiang, J.; Yue, H.; Yin, X.; Wang, L. A new smoothed l0 regularization approach for sparse signal recovery. Math. Probl. Eng. 2019, 2019, 1978154. [Google Scholar] [CrossRef]
  44. Robitzsch, A. L0 and Lp loss functions in model-robust estimation of structural equation models. Psych 2023, 5, 1122–1139. [Google Scholar] [CrossRef]
  45. Paik, J.W.; Lee, J.H.; Hong, W. An enhanced smoothed L0-norm direction of arrival estimation method using covariance matrix. Sensors 2021, 21, 4403. [Google Scholar] [CrossRef]
  46. Wang, L.; Yin, X.; Yue, H.; Xiang, J. A regularized weighted smoothed L0 norm minimization method for underdetermined blind source separation. Sensors 2018, 18, 4260. [Google Scholar] [CrossRef]
  47. Zhu, J.; Li, X. A smoothed l0-norm and l1-norm regularization algorithm for computed tomography. J. Appl. Math. 2019, 2019, 8398035. [Google Scholar] [CrossRef]
  48. Boos, D.D.; Stefanski, L.A. Essential Statistical Inference; Springer: New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  49. Simakhin, V.A.; Shamanaeva, L.G.; Avdyushina, A.E. Robust parametric estimates of heterogeneous experimental data. Russ. Phys. J. 2021, 63, 1510–1518. [Google Scholar] [CrossRef]
  50. R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2024; Available online: https://www.R-project.org (accessed on 15 June 2024).
  51. Clausen, A.; Sokol, S. Deriv: Symbolic Differentiation, 2024. R Package Version 4.1.6. Available online: https://cran.r-project.org/web/packages/Deriv/ (accessed on 13 September 2024).
  52. Robitzsch, A. sirt: Supplementary Item Response Theory Models, 2024. R Package Version 4.2-106. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 31 December 2024).
  53. Battauz, M. equateIRT: An R package for IRT test equating. J. Stat. Softw. 2015, 68, 1–22. [Google Scholar] [CrossRef]
  54. Battauz, M. equateMultiple: Equating of Multiple Forms, 2024. R Package Version 1.0.0. Available online: https://cran.r-project.org/web/packages/equateMultiple/index.html (accessed on 13 September 2024).
  55. Chen, Y.; Li, C.; Ouyang, J.; Xu, G. DIF statistical inference without knowing anchoring items. Psychometrika 2023, 88, 1097–1122. [Google Scholar] [CrossRef] [PubMed]
  56. Halpin, P.F.; Gilbert, J. Testing whether reported treatment effects are unduly dependent on the specific outcome measure used. arXiv 2024, arXiv:2409.03502. [Google Scholar] [CrossRef]
  57. Strobl, C.; Kopf, J.; Kohler, L.; von Oertzen, T.; Zeileis, A. Anchor point selection: Scale alignment based on an inequality criterion. Appl. Psychol. Meas. 2021, 45, 214–230. [Google Scholar] [CrossRef]
  58. Camilli, G. The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In Differential Item Functioning: Theory and Practice; Holland, P.W., Wainer, H., Eds.; Erlbaum: Hillsdale, NJ, USA, 1993; pp. 397–417. [Google Scholar]
  59. Shealy, R.; Stout, W. A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika 1993, 58, 159–194. [Google Scholar] [CrossRef]
  60. De Los Reyes, A.; Tyrell, F.A.; Watts, A.L.; Asmundson, G.J.G. Conceptual, methodological, and measurement factors that disqualify use of measurement invariance techniques to detect informant discrepancies in youth mental health assessments. Front. Psychol. 2022, 13, 931296. [Google Scholar] [CrossRef]
  61. El Masri, Y.H.; Andrich, D. The trade-off between model fit, invariance, and validity: The case of PISA science assessments. Appl. Meas. Educ. 2020, 33, 174–188. [Google Scholar] [CrossRef]
  62. Funder, D.C.; Gardiner, G. MIsgivings about measurement invariance. Eur. J. Pers. 2024, 38, 889–895. [Google Scholar] [CrossRef]
  63. Welzel, C.; Inglehart, R.F. Misconceptions of measurement equivalence: Time for a paradigm shift. Comp. Political Stud. 2016, 49, 1068–1094. [Google Scholar] [CrossRef]
  64. Zwitser, R.J.; Glaser, S.S.F.; Maris, G. Monitoring countries in a changing world: A new look at DIF in international surveys. Psychometrika 2017, 82, 210–232. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Ratio loss function and Gaussian loss function ρ ε for ε = 0.001 and ε = 0.01 as differentiable approximations of the L 0 loss function.
Figure 2. Numerical Illustration: Absolute bias, standard deviation (SD), and root mean square error (RMSE) of the estimated mean μ ^ using the ratio loss function for I = 20 items as a function of sample size N and DIF effect size κ . The minimum RMSE values are depicted with a triangle. The ε parameter on the x-axis is displayed on a logarithmic scale.
Figure 3. Simulation Study: Absolute bias, standard deviation (SD), and root mean square error (RMSE) of the estimated mean μ ^ using the ratio loss function for I = 20 items as a function of sample size N and DIF effect size κ . The minimum RMSE values are depicted with a triangle.
Figure 4. Simulation Study: Relative root mean square error (RMSE) of the estimated mean $\hat{\mu}$ using the ratio loss function based on $\varepsilon = \varepsilon_y$ determined through $\rho_{\varepsilon_y}(\sqrt{\bar{V}}) = y$ as a function of sample size N, the number of items I, and DIF effect size κ. The minimum relative RMSE values are depicted with a triangle. The ratio loss function with ε = 0.01 was chosen as the reference method when computing the relative RMSE.
Figure 5. Empirical example: DIF parameter estimates (displayed with a black triangle) along with their 95% confidence intervals.
Table 1. Numerical Illustration: optimal ε value yielding the minimum root mean square error (RMSE) of the estimated mean $\hat{\mu}$ using the ratio loss function as a function of the DIF effect size κ, number of items I, and sample size N.

I     | κ = 0.4, N =              | κ = 0.8, N =
      | 250     500     1000      | 250     500     1000
10    | 0.900   0.065   0.030     | 0.150   0.100   0.090
20    | 0.200   0.050   0.030     | 0.100   0.100   0.090
40    | 0.100   0.035   0.025     | 0.100   0.095   0.085
Table 2. Simulation Study: Absolute bias of the estimated mean $\hat{\mu}$ as a function of the DIF effect size κ, number of items I, and sample size N.

κ     I    N     | Ratio function, ε =                         | Gaussian function, ε =
                 | 0.25   0.10   0.05   0.01   ε_0.2  ε_0.4    | 0.25   0.10   0.05   0.01   ε_0.2  ε_0.4
0.4   10   125   | 0.107  0.103  0.102  0.102  0.110  0.105    | 0.107  0.101  0.100  0.101  0.112  0.106
           250   | 0.098  0.087  0.082  0.079  0.096  0.085    | 0.101  0.086  0.080  0.078  0.101  0.087
           500   | 0.088  0.066  0.052  0.041  0.068  0.049    | 0.097  0.067  0.046  0.033  0.075  0.048
           1000  | 0.082  0.050  0.031  0.015  0.034  0.019    | 0.096  0.057  0.026  0.010  0.035  0.013
      20   125   | 0.104  0.100  0.099  0.099  0.107  0.102    | 0.104  0.099  0.096  0.096  0.109  0.103
           250   | 0.098  0.086  0.080  0.075  0.097  0.084    | 0.102  0.084  0.075  0.069  0.102  0.086
           500   | 0.089  0.066  0.051  0.036  0.069  0.048    | 0.097  0.068  0.044  0.028  0.076  0.046
           1000  | 0.081  0.049  0.028  0.009  0.031  0.014    | 0.095  0.056  0.023  0.003  0.032  0.008
      40   125   | 0.105  0.101  0.098  0.096  0.108  0.103    | 0.106  0.099  0.096  0.096  0.110  0.105
           250   | 0.098  0.086  0.077  0.070  0.097  0.083    | 0.101  0.084  0.071  0.063  0.101  0.086
           500   | 0.099  0.076  0.059  0.039  0.079  0.055    | 0.107  0.078  0.051  0.029  0.087  0.054
           1000  | 0.087  0.054  0.033  0.012  0.036  0.018    | 0.100  0.062  0.027  0.006  0.036  0.012
0.8   10   125   | 0.114  0.093  0.087  0.084  0.138  0.102    | 0.105  0.078  0.074  0.074  0.152  0.099
           250   | 0.069  0.037  0.027  0.023  0.064  0.033    | 0.066  0.021  0.015  0.015  0.067  0.023
           500   | 0.048  0.017  0.008  0.003  0.019  0.006    | 0.047  0.004  0.000  0.001  0.009  0.000
           1000  | 0.039  0.011  0.004  0.001  0.005  0.001    | 0.040  0.002  0.000  0.000  0.000  0.000
      20   125   | 0.112  0.082  0.073  0.069  0.140  0.096    | 0.103  0.063  0.055  0.054  0.155  0.095
           250   | 0.069  0.033  0.020  0.014  0.063  0.029    | 0.065  0.015  0.007  0.005  0.065  0.017
           500   | 0.047  0.015  0.006  0.000  0.017  0.004    | 0.046  0.003  0.002  0.003  0.007  0.001
           1000  | 0.038  0.011  0.004  0.000  0.005  0.001    | 0.039  0.002  0.000  0.000  0.000  0.000
      40   125   | 0.109  0.075  0.061  0.054  0.140  0.092    | 0.099  0.052  0.041  0.036  0.156  0.090
           250   | 0.071  0.033  0.020  0.011  0.065  0.029    | 0.067  0.015  0.006  0.005  0.067  0.018
           500   | 0.051  0.020  0.011  0.005  0.022  0.009    | 0.050  0.007  0.003  0.003  0.011  0.003
           1000  | 0.039  0.012  0.005  0.001  0.006  0.002    | 0.040  0.002  0.001  0.001  0.001  0.001

Note. $\varepsilon_{0.2}$, $\varepsilon_{0.4}$ = computed ε values in the loss function $\rho_\varepsilon$ such that $\rho_{\varepsilon_y}(\sqrt{\bar{V}}) = y$ for y = 0.2 and y = 0.4, where $\bar{V}$ is the average variance of the item difficulty differences $\hat{b}_{i2} - \hat{b}_{i1}$; in the published table, absolute biases larger than 0.03 are printed in bold.
Table 3. Simulation Study: Relative root mean square error (RMSE) of the estimated mean $\hat{\mu}$ as a function of the DIF effect size κ, number of items I, and sample size N.

κ     I    N     | Ratio function, ε =                         | Gaussian function, ε =
                 | 0.25   0.10   0.05   0.01   ε_0.2  ε_0.4    | 0.25   0.10   0.05   0.01   ε_0.2  ε_0.4
0.4   10   125   | 83.5   90.5   95.0   100    80.2   86.3     | 83.0   92.9   99.0   105.2  79.1   84.0
           250   | 83.4   88.1   93.3   100    83.7   89.4     | 83.2   89.1   96.6   105.8  83.2   88.1
           500   | 92.6   89.0   91.0   100    89.1   92.0     | 95.6   89.8   92.3   103.8  90.5   91.6
           1000  | 123.1  102.5  94.8   100    95.5   96.2     | 134.4  108.1  94.6   102.4  97.5   94.0
      20   125   | 83.2   88.6   93.8   100    81.8   85.1     | 83.0   90.9   98.1   105.6  81.4   83.6
           250   | 85.9   86.9   91.3   100    85.7   87.8     | 86.4   87.0   93.7   104.5  86.4   86.5
           500   | 101.9  93.4   91.7   100    94.0   92.1     | 106.3  94.1   91.0   103.2  96.8   90.7
           1000  | 141.1  111.1  97.7   100    99.2   95.7     | 155.9  118.4  95.8   100.4  100.7  91.7
      40   125   | 85.6   88.9   93.5   100    85.2   86.5     | 85.4   89.8   97.1   105.9  85.2   85.6
           250   | 91.5   89.7   91.6   100    91.1   89.9     | 92.4   89.3   92.5   104.4  92.4   89.3
           500   | 120.2  105.7  97.9   100    107.2  96.8     | 126.2  107.1  94.9   101.5  112.2  95.7
           1000  | 169.0  127.9  106.9  100    109.1  98.3     | 187.6  137.6  103.0  99.7   110.8  94.2
0.8   10   125   | 87.2   92.6   96.0   100    85.6   89.3     | 87.9   95.9   100.5  106.1  86.8   88.7
           250   | 90.3   89.4   92.8   100    89.6   90.2     | 91.4   89.1   94.6   106.2  91.4   89.0
           500   | 94.8   87.2   89.8   100    87.1   90.7     | 97.7   84.0   87.5   105.3  84.6   86.6
           1000  | 101.5  87.9   88.4   100    88.0   93.7     | 106.2  85.6   86.5   102.4  87.2   91.9
      20   125   | 88.7   90.9   95.0   100    91.6   88.7     | 88.5   92.2   97.6   104.7  94.4   88.2
           250   | 94.6   88.3   90.9   100    92.9   88.5     | 94.8   85.6   90.8   104.9  94.7   85.2
           500   | 97.2   86.3   87.6   100    86.6   88.6     | 99.0   83.7   86.1   104.1  84.2   85.5
           1000  | 106.8  90.5   90.2   100    90.0   94.0     | 109.8  88.1   88.7   102.4  88.3   91.4
      40   125   | 92.1   90.7   94.0   100    98.3   90.2     | 90.2   89.4   95.3   103.8  102.7  88.8
           250   | 101.0  90.8   91.2   100    98.6   90.4     | 100.3  87.5   90.1   104.0  100.1  87.6
           500   | 105.4  90.3   89.7   100    90.8   90.2     | 106.1  87.0   87.9   104.9  87.7   87.6
           1000  | 116.4  94.8   93.3   100    93.3   95.4     | 118.6  92.0   92.2   101.8  91.9   93.7

Note. $\varepsilon_{0.2}$, $\varepsilon_{0.4}$ = computed ε values in the loss function $\rho_\varepsilon$ such that $\rho_{\varepsilon_y}(\sqrt{\bar{V}}) = y$ for y = 0.2 and y = 0.4, where $\bar{V}$ is the average variance of the item difficulty differences $\hat{b}_{i2} - \hat{b}_{i1}$. The ratio loss function with ε = 0.01 was chosen as the reference method when computing the relative RMSE. In the published table, cells with the smallest RMSE values are printed with a gray background color.