Linking Error Estimation in Fixed Item Parameter Calibration: Theory and Application in Large-Scale Assessment Studies
Abstract
1. Introduction
2. Linking Error Estimation for Independent Sampling
2.1. Standard Error
2.2. Bias
2.3. Linking Error
2.4. Jackknife Linking Error
2.5. Bias-Corrected Linking Error
2.6. Total Error
3. Linking Error Estimation Based on Resampling Methods
4. Simulation Study
4.1. Method
4.2. Results
5. Empirical Example: PISA 2006 Reading
5.1. Method
5.2. Results
6. Discussion
7. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
2PL | two-parameter logistic |
DIF | differential item functioning |
ER | error ratio |
FIPC | fixed item parameter calibration |
IRF | item response function |
IRT | item response theory |
LE | linking error |
LSA | large-scale assessment |
LEbc | bias-corrected linking error |
MML | marginal maximum likelihood |
PISA | programme for international student assessment |
SD | standard deviation |
SE | standard error |
TE | total error |
TEbc | bias-corrected total error |
TIMSS | trends in international mathematics and science study |
Appendix A. Country Labels for PISA 2006 Reading Study
| Par | | | | Bias | SD | Median TE | Median TEbc | Median LE | Median LEbc | Coverage TE (%) | Coverage TEbc (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0 | 20 | 500 | 0.000 | 0.058 | 0.062 | 0.059 | 0.022 | 0.000 | 96.1 | 95.0 |
| | 0 | 20 | 1000 | −0.001 | 0.041 | 0.044 | 0.041 | 0.014 | 0.000 | 96.3 | 95.1 |
| | 0 | 20 | 2000 | 0.000 | 0.029 | 0.031 | 0.029 | 0.011 | 0.000 | 96.5 | 95.3 |
| | 0 | 40 | 500 | 0.001 | 0.056 | 0.058 | 0.056 | 0.016 | 0.000 | 96.3 | 95.6 |
| | 0 | 40 | 1000 | 0.000 | 0.040 | 0.041 | 0.040 | 0.009 | 0.000 | 95.6 | 94.7 |
| | 0 | 40 | 2000 | 0.000 | 0.028 | 0.029 | 0.028 | 0.008 | 0.000 | 96.3 | 95.6 |
| | 0.3 | 20 | 500 | −0.004 | 0.091 | 0.092 | 0.088 | 0.071 | 0.067 | 94.8 | 93.9 |
| | 0.3 | 20 | 1000 | −0.004 | 0.083 | 0.081 | 0.079 | 0.070 | 0.068 | 93.4 | 92.8 |
| | 0.3 | 20 | 2000 | −0.007 | 0.076 | 0.076 | 0.075 | 0.070 | 0.069 | 93.9 | 93.4 |
| | 0.3 | 40 | 500 | −0.004 | 0.074 | 0.075 | 0.073 | 0.051 | 0.048 | 95.5 | 95.0 |
| | 0.3 | 40 | 1000 | −0.006 | 0.064 | 0.062 | 0.061 | 0.048 | 0.046 | 94.0 | 93.6 |
| | 0.3 | 40 | 2000 | −0.006 | 0.057 | 0.057 | 0.057 | 0.050 | 0.049 | 95.0 | 94.8 |
| | 0 | 20 | 500 | −0.001 | 0.049 | 0.058 | 0.050 | 0.031 | 0.000 | 98.1 | 95.6 |
| | 0 | 20 | 1000 | 0.000 | 0.034 | 0.041 | 0.035 | 0.022 | 0.000 | 98.1 | 96.0 |
| | 0 | 20 | 2000 | 0.000 | 0.025 | 0.030 | 0.025 | 0.017 | 0.007 | 97.8 | 95.5 |
| | 0 | 40 | 500 | −0.002 | 0.044 | 0.048 | 0.044 | 0.020 | 0.000 | 96.5 | 94.8 |
| | 0 | 40 | 1000 | 0.000 | 0.031 | 0.034 | 0.031 | 0.015 | 0.001 | 97.4 | 95.6 |
| | 0 | 40 | 2000 | 0.000 | 0.022 | 0.024 | 0.022 | 0.009 | 0.000 | 96.5 | 95.4 |
| | 0.3 | 20 | 500 | −0.024 | 0.062 | 0.069 | 0.062 | 0.050 | 0.039 | 95.2 | 92.2 |
| | 0.3 | 20 | 1000 | −0.025 | 0.052 | 0.056 | 0.051 | 0.044 | 0.038 | 93.5 | 90.3 |
| | 0.3 | 20 | 2000 | −0.024 | 0.046 | 0.052 | 0.049 | 0.046 | 0.043 | 94.1 | 92.8 |
| | 0.3 | 40 | 500 | −0.024 | 0.050 | 0.054 | 0.049 | 0.032 | 0.025 | 92.9 | 90.3 |
| | 0.3 | 40 | 1000 | −0.025 | 0.040 | 0.042 | 0.040 | 0.030 | 0.026 | 91.9 | 89.6 |
| | 0.3 | 40 | 2000 | −0.024 | 0.034 | 0.038 | 0.037 | 0.032 | 0.030 | 93.9 | 92.9 |

CNT | | | Mean Est | Mean SE | Mean LE | Mean LEbc | Mean TE | Mean TEbc | SD Est | SD SE | SD LE | SD LEbc | SD TE | SD TEbc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
AUS | 7562 | 28 | 517.0 | 2.25 | 5.36 | 5.32 | 5.81 | 5.77 | 95.8 | 1.48 | 2.45 | 2.26 | 2.86 | 2.70 |
AUT | 2646 | 27 | 496.3 | 3.75 | 4.39 | 4.25 | 5.77 | 5.67 | 103.1 | 2.69 | 3.33 | 2.94 | 4.28 | 3.99 |
BEL | 4840 | 28 | 505.9 | 3.08 | 4.32 | 4.24 | 5.31 | 5.24 | 107.0 | 2.69 | 3.65 | 3.42 | 4.53 | 4.35 |
CAN | 12,142 | 28 | 527.6 | 2.11 | 5.64 | 5.58 | 6.02 | 5.96 | 93.3 | 1.60 | 3.82 | 3.69 | 4.14 | 4.02 |
CHE | 6578 | 28 | 502.3 | 3.13 | 4.55 | 4.47 | 5.52 | 5.46 | 95.7 | 2.33 | 3.03 | 2.81 | 3.83 | 3.65 |
CZE | 3246 | 28 | 483.2 | 4.44 | 5.84 | 5.73 | 7.34 | 7.25 | 112.8 | 3.11 | 4.02 | 3.72 | 5.08 | 4.85 |
DEU | 2701 | 28 | 496.1 | 4.96 | 4.60 | 4.48 | 6.76 | 6.68 | 113.9 | 2.80 | 5.09 | 4.86 | 5.81 | 5.61 |
DNK | 2431 | 27 | 500.1 | 3.14 | 7.19 | 7.11 | 7.84 | 7.77 | 89.0 | 1.97 | 4.72 | 4.49 | 5.11 | 4.90 |
ESP | 10,506 | 28 | 465.0 | 2.12 | 5.70 | 5.60 | 6.08 | 5.99 | 81.4 | 1.24 | 6.40 | 6.26 | 6.52 | 6.38 |
EST | 2630 | 28 | 499.4 | 2.94 | 6.36 | 6.26 | 7.01 | 6.92 | 83.6 | 1.88 | 3.81 | 3.52 | 4.25 | 3.99 |
FIN | 2536 | 28 | 551.6 | 2.37 | 6.26 | 6.15 | 6.69 | 6.59 | 85.3 | 1.92 | 4.52 | 4.26 | 4.91 | 4.67 |
FRA | 2524 | 28 | 499.1 | 3.80 | 5.76 | 5.66 | 6.90 | 6.82 | 98.2 | 2.89 | 5.10 | 4.89 | 5.86 | 5.67 |
GBR | 7061 | 28 | 498.4 | 2.22 | 6.00 | 5.93 | 6.40 | 6.33 | 98.3 | 1.77 | 4.68 | 4.50 | 5.01 | 4.84 |
GRC | 2606 | 28 | 456.9 | 3.59 | 6.57 | 6.49 | 7.49 | 7.42 | 95.0 | 2.54 | 4.18 | 3.92 | 4.89 | 4.67 |
HUN | 2399 | 28 | 485.2 | 3.32 | 4.77 | 4.57 | 5.81 | 5.65 | 91.7 | 2.40 | 4.80 | 4.49 | 5.37 | 5.09 |
IRL | 2468 | 28 | 518.4 | 3.49 | 4.59 | 4.45 | 5.77 | 5.66 | 94.5 | 2.19 | 3.04 | 2.67 | 3.75 | 3.45 |
ISL | 2010 | 28 | 493.2 | 1.96 | 5.36 | 5.25 | 5.71 | 5.60 | 91.3 | 2.09 | 3.38 | 3.03 | 3.97 | 3.68 |
ITA | 11,629 | 28 | 471.6 | 2.15 | 5.35 | 5.30 | 5.76 | 5.72 | 98.1 | 1.91 | 3.29 | 3.13 | 3.81 | 3.67 |
JPN | 3203 | 28 | 502.8 | 3.61 | 9.33 | 9.28 | 10.00 | 9.96 | 103.2 | 2.15 | 3.93 | 3.73 | 4.48 | 4.30 |
KOR | 2790 | 27 | 556.0 | 3.75 | 9.12 | 9.05 | 9.86 | 9.79 | 95.8 | 3.19 | 5.43 | 5.23 | 6.30 | 6.13 |
LUX | 2443 | 27 | 482.1 | 2.12 | 4.26 | 4.14 | 4.76 | 4.65 | 101.0 | 1.94 | 2.68 | 2.23 | 3.31 | 2.96 |
NLD | 2666 | 28 | 509.2 | 3.16 | 7.08 | 7.00 | 7.75 | 7.68 | 101.6 | 3.00 | 4.29 | 4.00 | 5.24 | 5.00 |
NOR | 2504 | 28 | 489.3 | 2.79 | 6.52 | 6.37 | 7.09 | 6.95 | 101.6 | 1.93 | 4.44 | 4.04 | 4.84 | 4.48 |
POL | 2968 | 28 | 506.8 | 2.79 | 6.06 | 5.98 | 6.68 | 6.60 | 99.8 | 2.24 | 3.45 | 3.18 | 4.11 | 3.89 |
PRT | 2773 | 28 | 475.9 | 3.41 | 5.94 | 5.84 | 6.85 | 6.77 | 95.3 | 2.55 | 3.90 | 3.59 | 4.66 | 4.40 |
SWE | 2374 | 28 | 510.7 | 2.99 | 4.51 | 4.36 | 5.41 | 5.29 | 100.2 | 2.55 | 3.26 | 2.91 | 4.14 | 3.87 |
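
The tabulated values for the country mean appear consistent with a total error that combines the standard error and the linking error in quadrature, that is, TE = sqrt(SE² + LE²), with LEbc taking the place of LE for TEbc. The following minimal sketch checks this reading against three rows copied from the table above. The quadrature relation is an assumption inferred from the reported numbers, and the function name `total_error` and the dictionary `rows` are illustrative, not part of the study's software.

```python
import math


def total_error(se: float, le: float) -> float:
    """Combine a standard error and a linking error in quadrature.

    Assumption: TE = sqrt(SE^2 + LE^2), inferred from the tabulated values.
    """
    return math.hypot(se, le)


# (SE, LE, LEbc, reported TE, reported TEbc) for the country mean,
# copied from the PISA 2006 reading table above.
rows = {
    "AUS": (2.25, 5.36, 5.32, 5.81, 5.77),
    "DEU": (4.96, 4.60, 4.48, 6.76, 6.68),
    "JPN": (3.61, 9.33, 9.28, 10.00, 9.96),
}

for cnt, (se, le, le_bc, te, te_bc) in rows.items():
    te_calc = total_error(se, le)        # uses the uncorrected linking error
    te_bc_calc = total_error(se, le_bc)  # uses the bias-corrected linking error
    print(
        f"{cnt}: TE={te_calc:.3f} (reported {te:.2f}), "
        f"TEbc={te_bc_calc:.3f} (reported {te_bc:.2f})"
    )
```

Because the tabulated SE and LE values are rounded to two decimals, the recomputed totals can differ from the reported ones in the last digit.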
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Robitzsch, A. Linking Error Estimation in Fixed Item Parameter Calibration: Theory and Application in Large-Scale Assessment Studies. Foundations 2025, 5, 4. https://doi.org/10.3390/foundations5010004