# Robust Haebara Linking for Many Groups: Performance in the Case of Uniform DIF


## Abstract


## 1. Introduction

## 2. 2PL Model with Partial Invariance: Presence of Uniform DIF Effects

## 3. Haebara Linking

#### 3.1. Estimation

#### 3.2. Estimated Group Means as a Function of DIF Effects

## 4. Simulation Study

#### 4.1. Simulation Design

#### 4.2. Analysis Methods

Haebara linking was carried out using the `sirt::linking.haebara()` function in the R package sirt [36]. The `TAM::tam.mml.2pl()` function in the R package TAM [43] was used to estimate the 2PL model by marginal maximum likelihood.
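The Haebara criterion can be illustrated on a much smaller problem. The Python sketch below is *not* `sirt::linking.haebara()` and is not the many-group criterion of this paper; it is a simplified two-group, mean-only version under stated assumptions: the reference group has mean 0 and SD 1, the focal group's separately calibrated 2PL parameters share the reference slopes, only the focal mean $\mu$ is estimated (SD fixed at 1), IRF discrepancies are evaluated on a fixed $\theta$ grid with normal weights, and each discrepancy enters through the power loss $\rho(x)=|x|^p$. The item parameters are items 1 to 10 of Appendix B; the true mean of 0.3 and the uniform DIF of +1 on items 2 and 5 are hypothetical values chosen for illustration.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Items 1-10 of Appendix B, interpreted as reference-group 2PL parameters.
A = [0.95, 0.88, 0.75, 1.29, 1.28, 1.29, 1.25, 0.97, 0.73, 1.27]
B = [-0.97, 0.59, 0.75, -0.79, 1.23, -1.10, -0.67, 0.20, 1.26, 0.05]

# Quadrature grid on the reference scale with normalized N(0, 1) weights.
THETA = [-4.0 + 0.2 * k for k in range(41)]
_DENS = [math.exp(-0.5 * t * t) for t in THETA]
W = [d / sum(_DENS) for d in _DENS]

# Reference-group IRFs, evaluated once on the grid.
REF = [[logistic(a * (t - b)) for t in THETA] for a, b in zip(A, B)]

TRUE_MU = 0.3                          # hypothetical focal-group mean
DIF = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]   # hypothetical uniform DIF on items 2 and 5


def focal_difficulties(dif):
    """Focal-group difficulties on its own scale (mean 0, SD 1) plus DIF."""
    return [b - TRUE_MU + e for b, e in zip(B, dif)]


def haebara_loss(mu, b_focal, p):
    """Sum over items and grid points of w(theta) * |IRF_ref - IRF_focal|**p.

    theta_ref = theta_focal + mu, so each focal difficulty b becomes b + mu.
    """
    total = 0.0
    for a, ref_row, b in zip(A, REF, b_focal):
        for t, w, p_ref in zip(THETA, W, ref_row):
            total += w * abs(p_ref - logistic(a * (t - b - mu))) ** p
    return total


def estimate_mean(b_focal, p, lo=-2.0, hi=2.0, step=0.01):
    """Coarse grid search (loss may be nonconvex for p < 1), then golden section."""
    grid = [lo + step * k for k in range(int(round((hi - lo) / step)) + 1)]
    best = min(grid, key=lambda m: haebara_loss(m, b_focal, p))
    left, right = best - step, best + step
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(40):
        m1 = right - inv_phi * (right - left)
        m2 = left + inv_phi * (right - left)
        if haebara_loss(m1, b_focal, p) < haebara_loss(m2, b_focal, p):
            right = m2
        else:
            left = m1
    return 0.5 * (left + right)


if __name__ == "__main__":
    b_dif = focal_difficulties(DIF)
    for power in (2.0, 1.0, 0.5):
        print(f"p = {power}: estimated mean = {estimate_mean(b_dif, power):.3f}")
```

With these hypothetical inputs, the $p=2$ estimate falls below 0.3: the two DIF items are fitted best by $\mu = 0.3 - 1 = -0.7$ and pull the least-squares solution toward that value. With $p=0.5$ the estimate stays at 0.3, because for $p<1$ the loss contributed by the eight DIF-free items has a cusp at the true mean. The grid-plus-golden-section search was chosen for transparency and makes no claim about the optimizer used in sirt.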

#### 4.3. Results

## 5. Empirical Example: PISA 2006 Reading Competence

## 6. Discussion

## Funding

## Conflicts of Interest

## Author Note

## Abbreviations

Abbreviation | Meaning
---|---
2PL | two-parameter logistic model
ABIAS | average absolute bias
ARMSE | average root mean square error
DIF | differential item functioning
FI | full invariance
IRF | item response function
MCSE | Monte Carlo standard error
PISA | Programme for International Student Assessment
RMSE | root mean square error

## Appendix A. Estimated Group Means in Robust Haebara Linking

#### Appendix A.1. Taylor Approximation of Power Loss Function ρ

#### Appendix A.2. Minimization of a Quadratic Function

#### Appendix A.3. Taylor Approximation of Item Response Function with DIF Effects

#### Appendix A.4. Derivation of Expected Estimated Group Means for p ≠ 1

#### Appendix A.5. Derivation of Expected Estimated Group Means for p=1

#### Appendix A.6. Unbiasedness for p = 0

## Appendix B. Data Generating Parameters for Simulation Study

Item $i$ | $a_i$ | $b_i$
---|---|---
1 | 0.95 | −0.97
2 | 0.88 | 0.59
3 | 0.75 | 0.75
4 | 1.29 | −0.79
5 | 1.28 | 1.23
6 | 1.29 | −1.10
7 | 1.25 | −0.67
8 | 0.97 | 0.20
9 | 0.73 | 1.26
10 | 1.27 | 0.05
11 | 1.42 | 1.22
12 | 0.75 | −0.01
13 | 0.50 | 0.20
14 | 0.81 | 1.39
15 | 1.12 | 0.61
16 | 0.78 | −1.00
17 | 1.30 | −1.58
18 | 0.70 | −1.62
19 | 1.29 | 1.06
20 | 0.74 | −0.81
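To show how the two columns act, the following Python sketch evaluates the 2PL item response function for the first five items, assuming the usual difficulty parametrization $P(X_i = 1 \mid \theta) = \Psi(a_i(\theta - b_i))$ with $\Psi$ the logistic function. The optional `dif` argument is a hypothetical illustration of uniform DIF as a group-specific shift of the difficulty (slope unchanged); it is an assumption for this example, not a transcription of the paper's DIF parametrization.

```python
import math

# (a_i, b_i) for items 1-5 of the table above.
ITEMS = {1: (0.95, -0.97), 2: (0.88, 0.59), 3: (0.75, 0.75),
         4: (1.29, -0.79), 5: (1.28, 1.23)}


def irf_2pl(theta, a, b, dif=0.0):
    """P(X = 1 | theta) = logistic(a * (theta - b - dif)).

    `dif` is a hypothetical group-specific difficulty shift (uniform DIF):
    it moves the curve along the theta axis without changing its slope.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b - dif)))


if __name__ == "__main__":
    for i, (a, b) in ITEMS.items():
        print(f"item {i}: P(theta=0) = {irf_2pl(0.0, a, b):.3f}, "
              f"with DIF +1: {irf_2pl(0.0, a, b, dif=1.0):.3f}")
```

For item 1, a person at $\theta = 0$ solves the item with probability of about 0.715; a positive difficulty shift lowers this probability, and the shifted curve is the original curve moved one unit to the right.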

Item $i$ | $g=1$ | $g=2$ | $g=3$ | $g=4$ | $g=5$ | $g=6$ | $g=7$ | $g=8$ | $g=9$
---|---|---|---|---|---|---|---|---|---
1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | −1 | 0 | 0 | 0 | 0
3 | 0 | −1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
5 | 0 | 0 | −1 | 0 | −1 | 0 | 0 | 0 | 0
6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
7 | 0 | 0 | 0 | 0 | 0 | −1 | 0 | 0 | 0
8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
9 | 0 | 0 | −1 | 0 | 0 | 0 | 0 | 0 | 0
10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
13 | 0 | −1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
16 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | 0
17 | 0 | 0 | 0 | 0 | 0 | −1 | −1 | 0 | 0
18 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | −1 | 0
19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | −1
20 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | 0 | −1

Item $i$ | $g=1$ | $g=2$ | $g=3$ | $g=4$ | $g=5$ | $g=6$ | $g=7$ | $g=8$ | $g=9$
---|---|---|---|---|---|---|---|---|---
1 | 0 | 0 | −1 | 0 | 0 | 0 | 0 | 0 | 0
2 | 0 | 0 | 0 | 0 | −1 | −1 | −1 | 0 | 0
3 | 0 | −1 | 0 | 1 | −1 | 0 | 0 | 0 | 0
4 | 0 | 0 | 0 | 1 | 0 | −1 | 0 | 0 | 0
5 | 1 | 0 | 0 | 0 | −1 | 0 | 0 | −1 | 0
6 | 0 | 0 | 0 | 0 | 0 | −1 | 0 | 0 | 0
7 | 0 | 0 | −1 | 0 | 0 | −1 | 0 | 0 | 0
8 | 1 | −1 | 0 | 0 | −1 | 0 | 0 | 0 | −1
9 | 1 | −1 | 0 | 0 | 0 | 0 | −1 | 0 | 0
10 | 0 | −1 | −1 | 0 | 0 | 0 | −1 | −1 | 0
11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | −1 | 0
12 | 1 | 0 | −1 | 1 | 0 | 0 | 0 | −1 | −1
13 | 0 | −1 | 0 | 0 | 0 | 0 | 0 | 0 | −1
14 | 0 | −1 | 0 | 0 | 0 | 0 | −1 | 0 | 0
15 | 0 | 0 | 0 | 1 | 0 | −1 | 0 | 0 | −1
16 | 0 | 0 | 0 | 1 | 0 | −1 | 0 | 0 | 0
17 | 1 | 0 | −1 | 0 | −1 | 0 | −1 | −1 | −1
18 | 1 | 0 | −1 | 0 | −1 | 0 | 0 | 0 | 0
19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | −1
20 | 0 | 0 | 0 | 1 | 0 | 0 | −1 | −1 | 0
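Counting the nonzero entries of the two DIF tables gives 18 and 54 of the 20 × 9 = 180 item-by-group cells, that is 10% and 30%, which matches the "10% Biased Items" and "30% Biased Items" conditions of Table 1; that correspondence is our reading of the numbers, not a label given in this appendix. The Python sketch below stores only the nonzero cells (transcribed from the tables), expands them into item-by-group matrices, and checks these shares.

```python
# Nonzero cells of the two DIF tables above, stored per group:
# group -> (sign of its DIF entries, items with a nonzero entry).
FIRST_TABLE = {
    1: (+1, [6, 16]), 2: (-1, [3, 13]), 3: (-1, [5, 9]),
    4: (+1, [6, 18]), 5: (-1, [2, 5]), 6: (-1, [7, 17]),
    7: (-1, [17, 20]), 8: (-1, [16, 18]), 9: (-1, [19, 20]),
}
SECOND_TABLE = {
    1: (+1, [5, 8, 9, 12, 17, 18]), 2: (-1, [3, 8, 9, 10, 13, 14]),
    3: (-1, [1, 7, 10, 12, 17, 18]), 4: (+1, [3, 4, 12, 15, 16, 20]),
    5: (-1, [2, 3, 5, 8, 17, 18]), 6: (-1, [2, 4, 6, 7, 15, 16]),
    7: (-1, [2, 9, 10, 14, 17, 20]), 8: (-1, [5, 10, 11, 12, 17, 20]),
    9: (-1, [8, 12, 13, 15, 17, 19]),
}
N_ITEMS, N_GROUPS = 20, 9


def to_matrix(spec):
    """Expand the sparse description into a 20 x 9 item-by-group matrix."""
    m = [[0] * N_GROUPS for _ in range(N_ITEMS)]
    for g, (sign, items) in spec.items():
        for i in items:
            m[i - 1][g - 1] = sign
    return m


def share_of_biased_cells(m):
    """Proportion of item-by-group cells with a nonzero DIF entry."""
    nonzero = sum(1 for row in m for v in row if v != 0)
    return nonzero / (len(m) * len(m[0]))


if __name__ == "__main__":
    for name, spec in (("first", FIRST_TABLE), ("second", SECOND_TABLE)):
        m = to_matrix(spec)
        per_group = [sum(1 for row in m if row[g] != 0) for g in range(N_GROUPS)]
        print(name, share_of_biased_cells(m), per_group)
```

Running it also shows that both designs place the same number of DIF items in every group (2 and 6, respectively) and that all DIF entries within a group share one sign (positive for groups 1 and 4, negative otherwise).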

## Appendix C. Monte Carlo Standard Errors in Simulation Study

**Table A4.** Monte Carlo Standard Errors for Average Absolute Bias (MCSE ABIAS) and Average Root Mean Square Error (MCSE ARMSE) of Group Means as a Function of Sample Size.

Model | MCSE ABIAS, $N=250$ | MCSE ABIAS, $N=500$ | MCSE ABIAS, $N=1000$ | MCSE ABIAS, $N=5000$ | MCSE ARMSE, $N=250$ | MCSE ARMSE, $N=500$ | MCSE ARMSE, $N=1000$ | MCSE ARMSE, $N=5000$
---|---|---|---|---|---|---|---|---
*No Biased Items* | | | | | | | |
FI | 0.00112 | 0.00074 | 0.00057 | 0.00028 | 0.00092 | 0.00082 | 0.00049 | 0.00021
$p=2$ | 0.00114 | 0.00076 | 0.00058 | 0.00027 | 0.00097 | 0.00082 | 0.00049 | 0.00021
$p=1$ | 0.00112 | 0.00078 | 0.00057 | 0.00027 | 0.00093 | 0.00083 | 0.00049 | 0.00022
$p=0.5$ | 0.00113 | 0.00082 | 0.00057 | 0.00027 | 0.00094 | 0.00084 | 0.00049 | 0.00022
$p=0.25$ | 0.00113 | 0.00083 | 0.00057 | 0.00027 | 0.00097 | 0.00084 | 0.00049 | 0.00022
$p=0.1$ | 0.00114 | 0.00084 | 0.00057 | 0.00027 | 0.00096 | 0.00084 | 0.00049 | 0.00022
$p=0.02$ | 0.00114 | 0.00085 | 0.00057 | 0.00027 | 0.00097 | 0.00084 | 0.00049 | 0.00022
*10% Biased Items* | | | | | | | |
FI | 0.00128 | 0.00079 | 0.00055 | 0.00027 | 0.00112 | 0.00067 | 0.00057 | 0.00028
$p=2$ | 0.00131 | 0.00085 | 0.00052 | 0.00029 | 0.00114 | 0.00072 | 0.00058 | 0.00030
$p=1$ | 0.00122 | 0.00078 | 0.00050 | 0.00029 | 0.00104 | 0.00073 | 0.00059 | 0.00028
$p=0.5$ | 0.00127 | 0.00081 | 0.00054 | 0.00029 | 0.00103 | 0.00075 | 0.00058 | 0.00026
$p=0.25$ | 0.00133 | 0.00085 | 0.00055 | 0.00028 | 0.00104 | 0.00076 | 0.00057 | 0.00025
$p=0.1$ | 0.00136 | 0.00087 | 0.00055 | 0.00028 | 0.00105 | 0.00076 | 0.00057 | 0.00024
$p=0.02$ | 0.00139 | 0.00087 | 0.00055 | 0.00028 | 0.00105 | 0.00077 | 0.00056 | 0.00024
*30% Biased Items* | | | | | | | |
FI | 0.00115 | 0.00086 | 0.00066 | 0.00027 | 0.00116 | 0.00085 | 0.00065 | 0.00027
$p=2$ | 0.00173 | 0.00084 | 0.00068 | 0.00027 | 0.00117 | 0.00083 | 0.00066 | 0.00026
$p=1$ | 0.00112 | 0.00087 | 0.00069 | 0.00026 | 0.00191 | 0.00084 | 0.00065 | 0.00025
$p=0.5$ | 0.00120 | 0.00094 | 0.00073 | 0.00027 | 0.00220 | 0.00087 | 0.00068 | 0.00025
$p=0.25$ | 0.00167 | 0.00096 | 0.00073 | 0.00026 | 0.01065 | 0.00091 | 0.00070 | 0.00025
$p=0.1$ | 0.00173 | 0.00098 | 0.00074 | 0.00026 | 0.01074 | 0.00093 | 0.00071 | 0.00024
$p=0.02$ | 0.00176 | 0.00098 | 0.00076 | 0.00027 | 0.01077 | 0.00093 | 0.00071 | 0.00024
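The table does not state how the MCSEs were computed. One common approach uses the MCSE formulas for bias and for the mean squared error given by Morris et al. [42], plus a delta-method step to move from the MSE to the RMSE (that step is our addition). The Python sketch below computes these quantities for a single group mean over $R$ replications; averaging them over groups to obtain MCSE ABIAS and MCSE ARMSE is an assumption, not something this appendix documents.

```python
import math
import statistics


def bias_and_mcse(estimates, truth):
    """Bias of a simulation estimator and its Monte Carlo standard error."""
    r = len(estimates)
    bias = statistics.fmean(estimates) - truth
    mcse = statistics.stdev(estimates) / math.sqrt(r)   # SD / sqrt(R)
    return bias, mcse


def rmse_and_mcse(estimates, truth):
    """RMSE and a delta-method MCSE derived from the MCSE of the MSE."""
    r = len(estimates)
    sq_err = [(x - truth) ** 2 for x in estimates]
    mse = statistics.fmean(sq_err)
    mcse_mse = math.sqrt(sum((s - mse) ** 2 for s in sq_err) / (r * (r - 1)))
    rmse = math.sqrt(mse)
    return rmse, mcse_mse / (2.0 * rmse)   # d sqrt(x) / dx = 1 / (2 sqrt(x))


if __name__ == "__main__":
    est = [1.0, 2.0, 3.0, 4.0]
    print(bias_and_mcse(est, 2.5), rmse_and_mcse(est, 2.5))
```

The bias MCSE shrinks like $1/\sqrt{R}$, so halving it requires four times as many replications.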

## References

- OECD. PISA 2015. Technical Report; OECD: Paris, France, 2017.
- Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
- Haebara, T. Equating logistic ability scales by a weighted least squares method. Jpn. Psychol. Res. **1980**, 22, 144–149.
- He, Y.; Cui, Z. Evaluating robust scale transformation methods with multiple outlying common items under IRT true score equating. Appl. Psychol. Meas. **2020**, 44, 296–310.
- Hu, H.; Rogers, W.T.; Vukmirovic, Z. Investigation of IRT-based equating methods in the presence of outlier common items. Appl. Psychol. Meas. **2008**, 32, 311–333.
- Magis, D.; De Boeck, P. Identification of differential item functioning in multiple-group settings: A multivariate outlier detection approach. Multivar. Behav. Res. **2011**, 46, 733–755.
- Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479.
- Bechger, T.M.; Maris, G. A statistical test for differential item pair functioning. Psychometrika **2015**, 80, 317–340.
- Doebler, A. Looking at DIF from a new perspective: A structure-based approach acknowledging inherent indefinability. Appl. Psychol. Meas. **2019**, 43, 303–321.
- Robitzsch, A.; Lüdtke, O. A review of different scaling approaches under full invariance, partial invariance, and noninvariance for cross-sectional country comparisons in large-scale assessments. Psych. Test Assess. Model. **2020**, 62, 233–279.
- Byrne, B.M.; Shavelson, R.J.; Muthén, B. Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychol. Bull. **1989**, 105, 456–466.
- Von Davier, M.; Yamamoto, K.; Shin, H.J.; Chen, H.; Khorramdel, L.; Weeks, J.; Davis, S.; Kong, N.; Kandathil, M. Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assess. Educ. **2019**, 26, 466–488.
- Kopf, J.; Zeileis, A.; Strobl, C. Anchor selection strategies for DIF analysis: Review, assessment, and new approaches. Educ. Psychol. Meas. **2015**, 75, 22–56.
- Camilli, G. The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues? In Differential Item Functioning: Theory and Practice; Holland, P.W., Wainer, H., Eds.; Erlbaum: Hillsdale, NJ, USA, 1993; pp. 397–417.
- Von Davier, A.A.; Carstensen, C.H.; von Davier, M. Linking Competencies in Educational Settings and Measuring Growth; Research Report No. RR-06-12; Educational Testing Service: Princeton, NJ, USA, 2006.
- González, J.; Wiberg, M. Applying Test Equating Methods. Using R; Springer: New York, NY, USA, 2017.
- Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
- Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673.
- Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica **2017**, 77, 329–352.
- DeMars, C.E. Alignment as an alternative to anchor purification in DIF analyses. Struct. Equ. Model. **2020**, 27, 56–72.
- He, Y.; Cui, Z.; Fang, Y.; Chen, H. Using a linear regression method to detect outliers in IRT common item equating. Appl. Psychol. Meas. **2013**, 37, 522–540.
- He, Y.; Cui, Z.; Osterlind, S.J. New robust scale transformation methods in the presence of outlying common items. Appl. Psychol. Meas. **2015**, 39, 613–626.
- Arai, S.; Mayekawa, S.i. A comparison of equating methods and linking designs for developing an item pool under item response theory. Behaviormetrika **2011**, 38, 1–16.
- Battauz, M. Multiple equating of separate IRT calibrations. Psychometrika **2017**, 82, 610–636.
- Kang, H.A.; Lu, Y.; Chang, H.H. IRT item parameter scaling for developing new item pools. Appl. Meas. Educ. **2017**, 30, 1–15.
- Stocking, M.L.; Lord, F.M. Developing a common metric in item response theory. Appl. Psychol. Meas. **1983**, 7, 201–210.
- Haberman, S.J. Linking Parameter Estimates Derived from an Item Response Model through Separate Calibrations; (Research Report No. RR-09-40); Educational Testing Service: Princeton, NJ, USA, 2009.
- Muthén, B.; Asparouhov, T. IRT studies of many groups: The alignment method. Front. Psychol. **2014**, 5, 978.
- Kim, S.H.; Cohen, A.S. A minimum $\chi^2$ method for equating tests under the graded response model. Appl. Psychol. Meas. **1995**, 19, 167–176.
- Kim, S. An extension of least squares estimation of IRT linking coefficients for the graded response model. Appl. Psychol. Meas. **2010**, 34, 505–520.
- Pokropek, A.; Davidov, E.; Schmidt, P. A Monte Carlo simulation study to assess the appropriateness of traditional and newer approaches to test for measurement invariance. Struct. Equ. Model. **2019**, 26, 724–744.
- Pokropek, A.; Lüdtke, O.; Robitzsch, A. An extension of the invariance alignment method for scale linking. Psych. Test Assess. Model. **2020**, 62, 303–334.
- Robitzsch, A. $L_p$ loss functions in invariance alignment and Haberman linking. Preprints **2020**, 2020060034.
- Von Davier, M.; von Davier, A.A. A unified approach to IRT scale linking and scale transformations. Methodology **2007**, 3, 115–124.
- R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2020; Available online: https://www.R-project.org/ (accessed on 1 February 2020).
- Robitzsch, A. sirt: Supplementary Item Response Theory Models; R package Version 3.9-4; R Core Team: Vienna, Austria, 2020; Available online: https://CRAN.R-project.org/package=sirt (accessed on 17 February 2020).
- Battauz, M. Regularized estimation of the nominal response model. Multivar. Behav. Res. **2019**.
- Oelker, M.R.; Pößnecker, W.; Tutz, G. Selection and fusion of categorical predictors with $L_0$-type penalties. Stat. Model. **2015**, 15, 389–410.
- Robitzsch, A.; Lüdtke, O. Mean comparisons of many groups in the presence of DIF: An evaluation of linking and concurrent scaling approaches. OSF Preprints **2020**.
- Chang, Y.W.; Huang, W.K.; Tsai, R.C. DIF detection using multiple-group categorical CFA with minimum free baseline approach. J. Educ. Meas. **2015**, 52, 181–199.
- Huelmann, T.; Debelak, R.; Strobl, C. A comparison of aggregation rules for selecting anchor items in multigroup DIF analysis. J. Educ. Meas. **2020**, 57, 185–215.
- Morris, T.P.; White, I.R.; Crowther, M.J. Using simulation studies to evaluate statistical methods. Stat. Med. **2019**, 38, 2074–2102.
- Robitzsch, A.; Kiefer, T.; Wu, M. TAM: Test Analysis Modules; R Package Version 3.4-26; R Core Team: Vienna, Austria, 2020; Available online: https://CRAN.R-project.org/package=TAM (accessed on 10 March 2020).
- Andersson, B. Asymptotic variance of linking coefficient estimators for polytomous IRT models. Appl. Psychol. Meas. **2018**, 42, 192–205.
- OECD. PISA 2006. Technical Report; OECD: Paris, France, 2009.
- Oliveri, M.E.; von Davier, M. Analyzing invariance of item parameters used to estimate trends in international large-scale assessments. In Test Fairness in the New Generation of Large-Scale Assessment; Jiao, H., Lissitz, R.W., Eds.; Information Age Publishing: New York, NY, USA, 2017; pp. 121–146.
- Robitzsch, A.; Lüdtke, O. Linking errors in international large-scale assessments: Calculation of standard errors for trend estimation. Assess. Educ. **2019**, 26, 444–465.
- Jerrim, J.; Parker, P.; Choi, A.; Chmielewski, A.K.; Sälzer, C.; Shure, N. How robust are cross-country comparisons of PISA scores to the scaling model used? Educ. Meas. **2018**, 37, 28–39.
- Robitzsch, A.; Lüdtke, O.; Goldhammer, F.; Kroehne, U.; Köller, O. Reanalysis of the German PISA data: A comparison of different approaches for trend estimation with a particular emphasis on mode effects. Front. Psychol. **2020**, 11, 884.
- Asparouhov, T.; Muthén, B. Multiple-group factor analysis alignment. Struct. Equ. Model. **2014**, 21, 495–508.
- Finch, W.H. Detection of differential item functioning for more than two groups: A Monte Carlo comparison of methods. Appl. Meas. Educ. **2016**, 29, 30–45.
- Pohl, S.; Schulze, D. Assessing group comparisons or change over time under measurement non-invariance: The cluster approach for nonuniform DIF. Psych. Test Assess. Model. **2020**, 62, 281–303.
- Rutkowski, L.; Svetina, D. Measurement invariance in international surveys: Categorical indicators and fit measure performance. Appl. Meas. Educ. **2017**, 30, 39–51.
- De Boeck, P. Random item IRT models. Psychometrika **2008**, 73, 533–559.
- De Jong, M.G.; Steenkamp, J.B.E.M.; Fox, J.P. Relaxing measurement invariance in cross-national consumer research using a hierarchical IRT model. J. Consum. Res. **2007**, 34, 260–278.
- Fox, J.P.; Verhagen, A.J. Random item effects modeling for cross-national survey data. In Cross-Cultural Analysis: Methods and Applications; Davidov, E., Schmidt, P., Billiet, J., Eds.; Routledge: London, UK, 2010; pp. 461–482.
- Muthén, B.; Asparouhov, T. Recent methods for the study of measurement invariance with many groups: Alignment and random effects. Soc. Methods Res. **2018**, 47, 637–664.
- Pokropek, A.; Schmidt, P.; Davidov, E. Choosing priors in Bayesian measurement invariance modeling: A Monte Carlo simulation study. Struct. Equ. Model. **2020**.
- Belzak, W.; Bauer, D.J. Improving the assessment of measurement invariance: Using regularization to select anchor items and identify differential item functioning. Psychol. Methods **2020**.
- Tutz, G.; Schauberger, G. A penalty approach to differential item functioning in Rasch models. Psychometrika **2015**, 80, 21–43.
- Xu, X.; Douglas, J.; Lee, Y.S. Linking with nonparametric IRT models. In Statistical Models for Test Equating, Scaling, and Linking; von Davier, A.A., Ed.; Springer: New York, NY, USA, 2010; pp. 243–258.
- Fishbein, B.; Martin, M.O.; Mullis, I.V.S.; Foy, P. The TIMSS 2019 item equivalence study: Examining mode effects for computer-based assessment and implications for measuring trends. Large-Scale Assess. Educ. **2018**, 6, 11.
- Barrett, M.D.; van der Linden, W.J. Estimating linking functions for response model parameters. J. Educ. Behav. Stat. **2019**, 44, 180–209.
- Battauz, M. Factors affecting the variability of IRT equating coefficients. Stat. Neerl. **2015**, 69, 85–101.
- Jewsbury, P.A. Error Variance in Common Population Linking Bridge Studies; Research Report No. RR-19-42; Educational Testing Service: Princeton, NJ, USA, 2019.
- Ogasawara, H. Standard errors of item response theory equating/linking by response function methods. Appl. Psychol. Meas. **2001**, 25, 53–67.
- Zhang, Z. Estimating standard errors of IRT true score equating coefficients using imputed item parameters. J. Exp. Educ. **2020**.
- Gebhardt, E.; Adams, R.J. The influence of equating methodology on reported trends in PISA. J. Appl. Meas. **2007**, 8, 305–322.
- Haberman, S.J.; Lee, Y.H.; Qian, J. Jackknifing Techniques for Evaluation of Equating Accuracy; (Research Report No. RR-09-02); Educational Testing Service: Princeton, NJ, USA, 2009.
- Michaelides, M.P. A review of the effects on IRT item parameter estimates with a focus on misbehaving common items in test equating. Front. Psychol. **2010**, 1, 167.
- Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. **2007**, 8, 323–335.
- Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. **2016**, 53, 152–171.
- Xu, X.; von Davier, M. Linking Errors in Trend Estimation in Large-Scale Surveys: A Case Study; Research Report No. RR-10-10; Educational Testing Service: Princeton, NJ, USA, 2010.
- Winter, S.D.; Depaoli, S. An illustration of Bayesian approximate measurement invariance with longitudinal data and a small sample size. Int. J. Behav. Dev. **2019**.
- Arce-Ferrer, A.J.; Bulut, O. Investigating separate and concurrent approaches for item parameter drift in 3PL item response theory equating. Int. J. Test. **2017**, 17, 1–22.
- Fischer, L.; Gnambs, T.; Rohm, T.; Carstensen, C.H. Longitudinal linking of Rasch-model-scaled competence tests in large-scale assessments: A comparison and evaluation of different linking methods and anchoring designs based on two tests on mathematical competence administered in grades 5 and 7. Psych. Test Assess. Model. **2019**, 61, 37–64.
- Han, K.T.; Wells, C.S.; Sireci, S.G. The impact of multidirectional item parameter drift on IRT scaling coefficients and proficiency estimates. Appl. Meas. Educ. **2012**, 25, 97–117.
- Huggins, A.C. The effect of differential item functioning in anchor items on population invariance of equating. Educ. Psychol. Meas. **2014**, 74, 627–658.
- Lei, P.W.; Zhao, Y. Effects of vertical scaling methods on linear growth estimation. Appl. Psychol. Meas. **2012**, 36, 21–39.
- Pohl, S.; Haberkorn, K.; Carstensen, C.H. Measuring competencies across the lifespan-challenges of linking test scores. In Dependent Data in Social Sciences Research; Stemmler, M., von Eye, A., Eds.; Springer: Cham, Switzerland, 2015; pp. 281–308.
- Tong, Y.; Kolen, M.J. Comparisons of methodologies and results in vertical scaling for educational achievement tests. Appl. Meas. Educ. **2007**, 20, 227–253.
- Wetzel, E.; Carstensen, C.H. Linking PISA 2000 and PISA 2009: Implications of instrument design on measurement invariance. Psych. Test Assess. Model. **2013**, 55, 181–206.
- Robitzsch, A. Robust Haebara linking for many groups in the case of partial invariance. Preprints **2020**, 2020060035.

**Figure 1.** Loss function $\rho(x) = |x|^p$ used in robust Haebara linking for different values of $p$.
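The loss in Figure 1 can be tabulated directly. The Python sketch below evaluates $\rho(x)=|x|^p$ for the powers used in the simulation. The smoothed variant $(x^2+\varepsilon)^{p/2}$, which is differentiable at 0, is a common device for gradient-based optimization of such losses; it is included only as a hypothetical illustration and is not claimed to be the paper's or sirt's implementation.

```python
POWERS = (2, 1, 0.5, 0.25, 0.1, 0.02)


def rho(x, p):
    """Power loss |x|**p used in robust Haebara linking."""
    return abs(x) ** p


def rho_smooth(x, p, eps=1e-3):
    """Hypothetical smooth surrogate (x**2 + eps)**(p/2); tends to |x|**p as eps -> 0."""
    return (x * x + eps) ** (p / 2.0)


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.5):
        print(x, [round(rho(x, p), 3) for p in POWERS])
```

As $p \to 0$, $|x|^p \to 1$ for every $x \neq 0$, so the criterion approaches a count of items with a nonzero IRF discrepancy: with $p = 0.02$, a discrepancy of 0.5 costs less than 10% more than a discrepancy of 0.01, which is what limits the influence of strongly biased items.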

**Table 1.** Average Absolute Bias (ABIAS) and Average Root Mean Square Error (ARMSE) of Group Means as a Function of Sample Size.

Model | ABIAS, $N=250$ | ABIAS, $N=500$ | ABIAS, $N=1000$ | ABIAS, $N=5000$ | ARMSE, $N=250$ | ARMSE, $N=500$ | ARMSE, $N=1000$ | ARMSE, $N=5000$
---|---|---|---|---|---|---|---|---
*No Biased Items* | | | | | | | |
FI | 0.006 | 0.001 | 0.003 | 0.001 | 0.059 | 0.042 | 0.029 | 0.013
$p=2$ | 0.008 | 0.002 | 0.003 | 0.001 | 0.059 | 0.042 | 0.029 | 0.013
$p=1$ | 0.008 | 0.002 | 0.003 | 0.001 | 0.059 | 0.043 | 0.029 | 0.013
$p=0.5$ | 0.007 | 0.002 | 0.003 | 0.001 | 0.059 | 0.043 | 0.029 | 0.013
$p=0.25$ | 0.007 | 0.002 | 0.003 | 0.001 | 0.060 | 0.043 | 0.029 | 0.013
$p=0.1$ | 0.007 | 0.003 | 0.003 | 0.001 | 0.060 | 0.044 | 0.029 | 0.013
$p=0.02$ | 0.007 | 0.003 | 0.003 | 0.001 | 0.060 | 0.044 | 0.029 | 0.013
*10% Biased Items* | | | | | | | |
FI | 0.037 | 0.034 | 0.032 | 0.033 | 0.075 | 0.057 | 0.046 | 0.037
$p=2$ | 0.038 | 0.032 | 0.032 | 0.032 | 0.075 | 0.056 | 0.045 | 0.036
$p=1$ | 0.026 | 0.019 | 0.014 | 0.012 | 0.069 | 0.048 | 0.034 | 0.019
$p=0.5$ | 0.020 | 0.014 | 0.008 | 0.007 | 0.068 | 0.046 | 0.031 | 0.016
$p=0.25$ | 0.018 | 0.012 | 0.006 | 0.005 | 0.068 | 0.046 | 0.031 | 0.015
$p=0.1$ | 0.018 | 0.012 | 0.005 | 0.004 | 0.069 | 0.046 | 0.031 | 0.015
$p=0.02$ | 0.017 | 0.011 | 0.005 | 0.004 | 0.069 | 0.046 | 0.030 | 0.014
*30% Biased Items* | | | | | | | |
FI | 0.111 | 0.108 | 0.110 | 0.109 | 0.132 | 0.119 | 0.116 | 0.110
$p=2$ | 0.109 | 0.108 | 0.110 | 0.109 | 0.132 | 0.119 | 0.116 | 0.110
$p=1$ | 0.086 | 0.077 | 0.072 | 0.062 | 0.115 | 0.092 | 0.082 | 0.065
$p=0.5$ | 0.072 | 0.058 | 0.048 | 0.034 | 0.107 | 0.079 | 0.062 | 0.037
$p=0.25$ | 0.068 | 0.049 | 0.038 | 0.024 | 0.124 | 0.072 | 0.054 | 0.029
$p=0.1$ | 0.064 | 0.044 | 0.032 | 0.020 | 0.123 | 0.069 | 0.051 | 0.025
$p=0.02$ | 0.062 | 0.042 | 0.030 | 0.018 | 0.123 | 0.068 | 0.049 | 0.024
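The definitions of ABIAS and ARMSE are not restated in this section. The Python sketch below assumes the reading suggested by the names: for each group $g$ with data-generating mean $\mu_g$ and estimates $\hat{\mu}_{gr}$ in replications $r = 1, \ldots, R$, the bias is $\frac{1}{R}\sum_r \hat{\mu}_{gr} - \mu_g$ and the RMSE is $\sqrt{\frac{1}{R}\sum_r (\hat{\mu}_{gr} - \mu_g)^2}$; ABIAS averages the absolute biases and ARMSE averages the RMSEs over groups. Which groups enter the averages is not stated here, so the function averages over whatever groups it receives.

```python
import math


def abias_armse(estimates, true_means):
    """Average absolute bias and average RMSE over groups.

    estimates[g][r] is the estimate of group g's mean in replication r;
    true_means[g] is the data-generating mean of group g.
    """
    abs_bias, rmse = [], []
    for est_g, mu_g in zip(estimates, true_means):
        r = len(est_g)
        abs_bias.append(abs(sum(est_g) / r - mu_g))
        rmse.append(math.sqrt(sum((x - mu_g) ** 2 for x in est_g) / r))
    n_groups = len(true_means)
    return sum(abs_bias) / n_groups, sum(rmse) / n_groups


if __name__ == "__main__":
    # Two hypothetical groups with two replications each.
    print(abias_armse([[0.1, 0.3], [0.9, 1.1]], [0.0, 1.0]))
```

Under these definitions each group's RMSE is at least its absolute bias, so ARMSE can never fall below ABIAS; this is visible for FI and $p=2$ under 30% biased items, where ARMSE approaches the nearly constant ABIAS of about 0.11 as $N$ grows.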

Country | $N$ | rg | FI | $p=2$ | $p=1$ | $p=0.5$ | $p=0.25$ | $p=0.1$ | $p=0.02$
---|---|---|---|---|---|---|---|---|---
AUS | 7562 | 1.9 | 516.7 | 515.5 | 516.1 | 516.5 | 516.8 | 516.9 | 517.4
AUT | 2646 | 0.4 | 496.2 | 496.0 | 495.7 | 495.6 | 495.7 | 495.7 | 495.7
BEL | 4840 | 1.4 | 506.7 | 506.8 | 507.4 | 507.8 | 508.0 | 508.1 | 508.2
CAN | 12142 | 4.5 | 528.0 | 526.1 | 528.3 | 529.5 | 530.0 | 530.4 | 530.6
CHE | 6578 | 2.0 | 502.1 | 502.3 | 503.4 | 503.9 | 504.1 | 504.2 | 504.3
CZE | 3246 | 0.6 | 483.1 | 482.6 | 483.1 | 483.2 | 483.2 | 483.2 | 483.2
DEU | 2701 | 4.2 | 496.1 | 497.0 | 499.3 | 500.3 | 500.8 | 501.1 | 501.2
DNK | 2431 | 2.4 | 500.0 | 499.5 | 501.0 | 501.5 | 501.7 | 501.8 | 501.9
ESP | 10506 | 4.3 | 465.5 | 465.0 | 467.1 | 468.3 | 468.8 | 469.1 | 469.3
EST | 2630 | 3.8 | 499.2 | 497.5 | 499.3 | 500.4 | 500.9 | 501.2 | 501.3
FIN | 2536 | 2.2 | 551.6 | 548.4 | 549.8 | 550.3 | 550.4 | 550.5 | 550.6
FRA | 2524 | 3.3 | 499.0 | 498.6 | 500.3 | 501.1 | 501.5 | 501.7 | 501.9
GBR | 7061 | 2.5 | 499.1 | 498.2 | 496.6 | 496.1 | 495.9 | 495.7 | 495.7
GRC | 2606 | 7.7 | 456.9 | 458.5 | 454.1 | 452.3 | 451.5 | 451.1 | 450.8
HUN | 2399 | 2.4 | 485.2 | 485.9 | 487.2 | 487.9 | 488.1 | 488.2 | 488.3
IRL | 2468 | 1.9 | 518.4 | 517.2 | 516.3 | 515.8 | 515.6 | 515.4 | 515.3
ISL | 2010 | 2.0 | 493.1 | 492.2 | 493.1 | 493.6 | 493.9 | 494.1 | 494.2
ITA | 11629 | 3.0 | 470.7 | 471.6 | 473.1 | 473.9 | 474.3 | 474.5 | 474.6
JPN | 3203 | 6.1 | 502.9 | 506.8 | 503.8 | 502.4 | 501.6 | 501.1 | 500.7
KOR | 2790 | 16.1 | 556.1 | 560.5 | 552.1 | 548.0 | 546.1 | 545.0 | 544.4
LUX | 2443 | 1.4 | 481.9 | 481.6 | 482.3 | 482.6 | 482.8 | 483.0 | 483.0
NLD | 2666 | 3.6 | 509.3 | 511.3 | 509.9 | 508.9 | 508.3 | 507.9 | 507.7
NOR | 2504 | 3.2 | 489.3 | 488.1 | 486.5 | 485.7 | 485.3 | 485.1 | 484.9
POL | 2968 | 2.0 | 506.7 | 507.2 | 508.3 | 508.8 | 509.0 | 509.2 | 509.2
PRT | 2773 | 0.5 | 475.8 | 476.1 | 476.0 | 475.8 | 475.7 | 475.7 | 475.6
SWE | 2374 | 0.6 | 510.5 | 509.5 | 509.7 | 509.9 | 510.0 | 510.1 | 510.1

$N$ = sample size; rg = range (maximum minus minimum) of the six robust Haebara linking estimates; FI = full invariance; columns $p=2$ to $p=0.02$ = robust Haebara linking with power $p$.
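The rg entries can be reproduced from the neighboring columns. Checking rows suggests that rg is the range of the six robust Haebara linking estimates with the FI column excluded: for AUT, including FI would give 0.6 instead of the reported 0.4. The Python sketch below performs this check for three countries, with values copied from the table.

```python
# Country means from the table: FI first, then robust Haebara linking with
# p = 2, 1, 0.5, 0.25, 0.1, 0.02.
ROWS = {
    "AUS": (516.7, [515.5, 516.1, 516.5, 516.8, 516.9, 517.4]),
    "AUT": (496.2, [496.0, 495.7, 495.6, 495.7, 495.7, 495.7]),
    "KOR": (556.1, [560.5, 552.1, 548.0, 546.1, 545.0, 544.4]),
}


def value_range(values):
    """Maximum minus minimum, rounded to the table's one decimal place."""
    return round(max(values) - min(values), 1)


if __name__ == "__main__":
    for country, (fi, robust) in ROWS.items():
        print(country, value_range(robust), value_range([fi] + robust))
```

The large range for KOR (16.1 points) reflects a steady drop of the estimate from $p=2$ to $p=0.02$, that is, strong sensitivity of this country's mean to how DIF items are downweighted.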

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Robitzsch, A.
Robust Haebara Linking for Many Groups: Performance in the Case of Uniform DIF. *Psych* **2020**, *2*, 155-173.
https://doi.org/10.3390/psych2030014
