Bias and Linking Error in Fixed Item Parameter Calibration
Abstract
1. Introduction
2. Analytical Derivation of Bias and Linking Error
3. Simulation Study
3.1. Methods
3.2. Results
4. Discussion
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
The following abbreviations are used in this manuscript:
Abbreviation | Meaning
---|---
2PL | two-parameter logistic
FIPC | fixed item parameter calibration
Inf | infinite sample size
IRF | item response function
IRT | item response theory
RMSE | root mean square error
Par | Bias (N = 250) | Bias (N = 500) | Bias (N = 1000) | Bias (N = 2000) | Bias (N = Inf) | RMSE (N = 250) | RMSE (N = 500) | RMSE (N = 1000) | RMSE (N = 2000) | RMSE (N = Inf)
---|---|---|---|---|---|---|---|---|---|---
0 | −0.001 | −0.001 | 0.000 | 0.000 | 0.000 | 0.086 | 0.061 | 0.043 | 0.030 | 0.000
0.1 | 0.000 | 0.000 | 0.000 | 0.000 | −0.001 | 0.091 | 0.067 | 0.051 | 0.041 | 0.027
0.2 | −0.003 | −0.001 | −0.002 | −0.003 | −0.002 | 0.102 | 0.081 | 0.069 | 0.062 | 0.054
0.3 | −0.005 | −0.004 | −0.006 | −0.005 | −0.006 | 0.118 | 0.102 | 0.091 | 0.086 | 0.081
0.4 | −0.010 | −0.009 | −0.008 | −0.010 | −0.010 | 0.139 | 0.124 | 0.116 | 0.111 | 0.108
0.5 | −0.014 | −0.014 | −0.015 | −0.015 | −0.016 | 0.160 | 0.148 | 0.142 | 0.137 | 0.134
0.6 | −0.020 | −0.020 | −0.021 | −0.021 | −0.022 | 0.182 | 0.171 | 0.166 | 0.163 | 0.161
0 | −0.004 | −0.002 | −0.001 | 0.000 | 0.000 | 0.075 | 0.053 | 0.038 | 0.027 | 0.000
0.1 | −0.005 | −0.003 | −0.002 | −0.002 | −0.001 | 0.077 | 0.055 | 0.040 | 0.031 | 0.016
0.2 | −0.009 | −0.007 | −0.005 | −0.005 | −0.005 | 0.081 | 0.062 | 0.049 | 0.041 | 0.032
0.3 | −0.013 | −0.013 | −0.011 | −0.011 | −0.010 | 0.089 | 0.071 | 0.061 | 0.054 | 0.048
0.4 | −0.022 | −0.021 | −0.019 | −0.019 | −0.018 | 0.098 | 0.083 | 0.074 | 0.069 | 0.064
0.5 | −0.033 | −0.031 | −0.030 | −0.028 | −0.029 | 0.109 | 0.096 | 0.088 | 0.083 | 0.079
0.6 | −0.044 | −0.043 | −0.042 | −0.041 | −0.041 | 0.121 | 0.109 | 0.102 | 0.099 | 0.096
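The Bias and RMSE columns in the table above are summaries of this kind across simulation replications. The following is a minimal sketch of how such summaries are commonly computed, assuming the standard definitions (bias as the mean estimation error, RMSE as the square root of the mean squared error). The function name `bias_and_rmse`, the seed, and the toy replicate values are illustrative assumptions and are not taken from the article.

```python
import math
import random


def bias_and_rmse(estimates, true_value):
    """Return (bias, rmse) of replicate estimates around a known true value.

    bias = mean(estimate - true_value)
    rmse = sqrt(mean((estimate - true_value) ** 2))
    """
    n = len(estimates)
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, rmse


if __name__ == "__main__":
    # Toy data only: hypothetical replicate estimates of a parameter whose
    # true value is 0, generated with a small negative shift and some noise.
    rng = random.Random(1)
    true_value = 0.0
    replicates = [rng.gauss(true_value - 0.01, 0.05) for _ in range(5000)]
    bias, rmse = bias_and_rmse(replicates, true_value)
    print(f"bias = {bias:.3f}, rmse = {rmse:.3f}")
```

Because the mean squared error equals the squared bias plus the variance of the estimates, the RMSE can never be smaller than the absolute bias, which is a quick sanity check when reading a table of this form.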
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Robitzsch, A. Bias and Linking Error in Fixed Item Parameter Calibration. AppliedMath 2024, 4, 1181-1191. https://doi.org/10.3390/appliedmath4030063