Linking Error in the 2PL Model
Abstract
1. Introduction
2. Linking Error and M-Estimation
3. Linking Error of Log-Mean-Mean Linking
4. Simulation Study
4.1. Method
4.2. Results
5. Further Applications of the Linking Error in the 2PL Model
5.1. Different Linking Methods
5.1.1. Robust Log-Mean-Mean Linking
5.1.2. Haebara Linking
5.2. Linking Error with Testlets
5.3. Linking Error in Chain Linking
5.4. Linking Error for Trend Estimates in Educational Large-Scale Assessment Studies
5.5. Linking Error in Fixed Item Parameter Calibration
5.6. Linking Error in Concurrent Calibration
5.7. Linking Error for Derived Parameters
5.7.1. Proportions
5.7.2. Percentiles
5.8. Computation of Total Error and Sampling Error Correction for Linking Error Estimates
6. Discussion
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

Abbreviation | Meaning |
---|---|
1PL | one-parameter logistic |
2PL | two-parameter logistic |
DIF | differential item functioning |
DWLS | diagonally weighted least squares |
FIPC | fixed item parameter calibration |
IPD | item parameter drift |
IRT | item response theory |
JK | jackknife |
LE | linking error |
LSA | large-scale assessment studies |
PIRLS | Progress in International Reading Literacy Study |
PISA | Programme for International Student Assessment |
ULS | unweighted least squares |

Item | Discrimination $a_i$ | Difficulty $b_i$ |
---|---|---|
1 | 0.73 | −1.31 |
2 | 1.25 | 1.44 |
3 | 1.20 | −1.20 |
4 | 1.47 | 0.10 |
5 | 0.97 | 0.10 |
6 | 1.38 | −0.74 |
7 | 1.05 | 1.48 |
8 | 1.14 | −0.61 |
9 | 1.15 | 0.82 |
10 | 0.67 | −0.07 |
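
The ten item parameters above can be plugged directly into the 2PL item response function. The sketch below is a minimal illustration under two assumptions not stated in the extracted table: the second column holds the discriminations $a_i$ and the third the difficulties $b_i$, and the model is parameterized as $P(X_i = 1 \mid \theta) = 1 / (1 + \exp(-a_i(\theta - b_i)))$. The names `ITEMS`, `irf_2pl`, and `expected_score` are hypothetical helpers, not taken from the article.

```python
import math

# Item parameters transcribed from the table above.
# ASSUMPTION: column 2 = discrimination a_i, column 3 = difficulty b_i.
ITEMS = [
    (0.73, -1.31), (1.25, 1.44), (1.20, -1.20), (1.47, 0.10), (0.97, 0.10),
    (1.38, -0.74), (1.05, 1.48), (1.14, -0.61), (1.15, 0.82), (0.67, -0.07),
]


def irf_2pl(theta: float, a: float, b: float) -> float:
    """2PL item response function P(X = 1 | theta) = 1 / (1 + exp(-a (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def expected_score(theta: float) -> float:
    """Expected number-correct score on the ten items (test characteristic curve)."""
    return sum(irf_2pl(theta, a, b) for a, b in ITEMS)


if __name__ == "__main__":
    # Item-wise success probabilities and expected score at three trait levels.
    for theta in (-2.0, 0.0, 2.0):
        probs = [round(irf_2pl(theta, a, b), 3) for a, b in ITEMS]
        print(theta, probs, round(expected_score(theta), 2))
```

Under this parameterization, each item has a success probability of exactly 0.5 at $\theta = b_i$, and the discrimination $a_i$ controls how steeply the probability rises around that point.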

| Normal | Normal Mixture | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | I | JK | ESW | OSW | BOSW | JK | ESW | OSW | BOSW | JK | ESW | OSW | BOSW |
| 0.01 | 0.25 | 10 | 92.2 | 92.2 | 90.8 | 92.1 | 92.9 | 92.9 | 91.5 | 92.9 | 92.7 | 92.7 | 91.3 | 92.7 |
| 0.01 | 0.25 | 20 | 93.8 | 93.8 | 93.2 | 93.8 | 94.5 | 94.5 | 93.8 | 94.5 | 94.4 | 94.4 | 93.8 | 94.4 |
| 0.01 | 0.25 | 40 | 94.6 | 94.6 | 94.3 | 94.6 | 94.9 | 94.9 | 94.5 | 94.8 | 94.9 | 94.9 | 94.6 | 94.9 |
| 0.01 | 0.25 | 80 | 95.2 | 95.2 | 95.0 | 95.1 | 95.2 | 95.2 | 95.0 | 95.1 | 95.2 | 95.2 | 95.1 | 95.2 |
| 0.01 | 0.50 | 10 | 92.5 | 92.5 | 91.2 | 92.5 | 92.9 | 92.9 | 91.5 | 92.9 | 93.1 | 93.1 | 91.7 | 93.0 |
| 0.01 | 0.50 | 20 | 94.1 | 94.1 | 93.5 | 94.1 | 94.6 | 94.6 | 94.0 | 94.6 | 94.4 | 94.4 | 93.8 | 94.4 |
| 0.01 | 0.50 | 40 | 94.7 | 94.7 | 94.4 | 94.7 | 95.0 | 95.0 | 94.7 | 95.0 | 94.8 | 94.8 | 94.5 | 94.8 |
| 0.01 | 0.50 | 80 | 95.1 | 95.1 | 94.9 | 95.0 | 95.3 | 95.3 | 95.1 | 95.3 | 95.4 | 95.4 | 95.2 | 95.3 |
| 0.25 | 0.25 | 10 | 93.9 | 93.8 | 91.1 | 92.6 | 94.6 | 94.4 | 92.0 | 93.4 | 94.4 | 94.3 | 91.8 | 93.1 |
| 0.25 | 0.25 | 20 | 94.8 | 94.8 | 93.1 | 93.8 | 94.9 | 94.9 | 93.2 | 93.9 | 95.1 | 95.1 | 93.4 | 94.0 |
| 0.25 | 0.25 | 40 | 95.1 | 95.1 | 93.7 | 94.0 | 95.5 | 95.5 | 94.0 | 94.4 | 95.2 | 95.2 | 93.6 | 94.0 |
| 0.25 | 0.25 | 80 | 95.2 | 95.2 | 93.9 | 94.0 | 95.4 | 95.4 | 94.2 | 94.4 | 95.5 | 95.5 | 94.3 | 94.4 |
| 0.25 | 0.50 | 10 | 93.0 | 92.9 | 90.9 | 92.3 | 93.7 | 93.6 | 91.6 | 93.0 | 93.5 | 93.5 | 91.3 | 92.7 |
| 0.25 | 0.50 | 20 | 94.3 | 94.3 | 93.1 | 93.8 | 94.4 | 94.4 | 93.2 | 93.8 | 94.5 | 94.5 | 93.3 | 94.0 |
| 0.25 | 0.50 | 40 | 95.0 | 95.0 | 94.2 | 94.6 | 95.1 | 95.1 | 94.3 | 94.7 | 95.1 | 95.1 | 94.3 | 94.6 |
| 0.25 | 0.50 | 80 | 95.2 | 95.2 | 94.6 | 94.8 | 95.2 | 95.1 | 94.6 | 94.7 | 95.1 | 95.1 | 94.4 | 94.6 |
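
The entries in the table above cluster around 95. This suggests, although the extracted table carries no caption and this reading is an assumption, that they are coverage rates, in percent, of nominal 95% confidence intervals built from each standard error method. The sketch below shows how such a coverage rate is typically computed in a simulation study. It uses hypothetical helpers `coverage_rate` and `simulate` and made-up settings; it does not reproduce the article's simulation.

```python
import random
from statistics import NormalDist


def coverage_rate(estimates, standard_errors, true_value, level=0.95):
    """Percentage of replications whose Wald interval est +/- z * se contains true_value."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    hits = sum(
        1
        for est, se in zip(estimates, standard_errors)
        if est - z * se <= true_value <= est + z * se
    )
    return 100.0 * hits / len(estimates)


def simulate(n_rep=20000, true_value=0.3, sd=0.1, se_factor=1.0, seed=1):
    """Hypothetical replications: estimates are drawn from N(true_value, sd^2) and
    the reported standard error is se_factor * sd (se_factor < 1 mimics an
    underestimated error, e.g., one that ignores a variance component)."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_value, sd) for _ in range(n_rep)]
    ses = [se_factor * sd] * n_rep
    return coverage_rate(estimates, ses, true_value)


if __name__ == "__main__":
    print(round(simulate(se_factor=1.0), 1))  # close to the nominal 95
    print(round(simulate(se_factor=0.7), 1))  # clearly below 95
```

With a correctly sized standard error the rate settles near 95%. With a standard error that is 30% too small, the theoretical coverage drops to about 83%. This is why values well below 95 signal that a standard error method misses part of the error variance.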

| Normal | Normal Mixture | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | I | JK | ESW | OSW | BOSW | JK | ESW | OSW | BOSW | JK | ESW | OSW | BOSW |
| 0.01 | 0.25 | 10 | 92.6 | 92.6 | 86.1 | 87.3 | 92.9 | 92.9 | 84.9 | 86.2 | 92.9 | 92.9 | 85.3 | 86.7 |
| 0.01 | 0.25 | 20 | 93.8 | 93.8 | 83.0 | 83.8 | 94.5 | 94.5 | 82.1 | 82.8 | 94.4 | 94.4 | 82.5 | 83.3 |
| 0.01 | 0.25 | 40 | 94.6 | 94.6 | 80.2 | 80.6 | 95.0 | 95.0 | 79.1 | 79.5 | 94.9 | 94.9 | 79.2 | 79.5 |
| 0.01 | 0.25 | 80 | 95.2 | 95.2 | 77.9 | 78.1 | 95.1 | 95.1 | 76.2 | 76.4 | 95.0 | 95.0 | 76.3 | 76.6 |
| 0.01 | 0.50 | 10 | 92.1 | 92.2 | 92.3 | 93.0 | 93.2 | 93.2 | 91.8 | 92.6 | 93.0 | 93.0 | 92.2 | 92.9 |
| 0.01 | 0.50 | 20 | 94.1 | 94.1 | 91.3 | 91.7 | 94.4 | 94.4 | 90.7 | 91.2 | 94.2 | 94.2 | 90.8 | 91.3 |
| 0.01 | 0.50 | 40 | 94.5 | 94.5 | 91.3 | 91.6 | 94.8 | 94.8 | 90.6 | 90.8 | 94.9 | 94.9 | 90.7 | 90.9 |
| 0.01 | 0.50 | 80 | 95.1 | 95.1 | 93.7 | 93.7 | 95.0 | 95.0 | 92.5 | 92.6 | 95.3 | 95.3 | 92.9 | 93.0 |
| 0.25 | 0.25 | 10 | 92.3 | 92.3 | 89.8 | 91.3 | 93.1 | 93.1 | 90.5 | 92.0 | 92.6 | 92.6 | 90.0 | 91.4 |
| 0.25 | 0.25 | 20 | 93.8 | 93.8 | 92.4 | 93.0 | 94.5 | 94.5 | 93.0 | 93.6 | 94.2 | 94.2 | 92.7 | 93.3 |
| 0.25 | 0.25 | 40 | 94.7 | 94.7 | 93.6 | 93.9 | 95.1 | 95.1 | 93.9 | 94.3 | 94.7 | 94.7 | 93.7 | 94.0 |
| 0.25 | 0.25 | 80 | 94.9 | 94.9 | 94.0 | 94.2 | 95.2 | 95.2 | 94.3 | 94.5 | 95.2 | 95.2 | 94.3 | 94.5 |
| 0.25 | 0.50 | 10 | 92.3 | 92.3 | 87.7 | 89.3 | 93.0 | 92.9 | 88.2 | 89.7 | 92.7 | 92.6 | 88.0 | 89.5 |
| 0.25 | 0.50 | 20 | 94.1 | 94.1 | 91.2 | 91.8 | 94.3 | 94.3 | 91.2 | 91.9 | 94.1 | 94.1 | 90.9 | 91.6 |
| 0.25 | 0.50 | 40 | 94.8 | 94.8 | 92.7 | 93.0 | 94.9 | 94.9 | 92.7 | 93.0 | 94.9 | 94.9 | 92.7 | 93.1 |
| 0.25 | 0.50 | 80 | 95.1 | 95.1 | 93.3 | 93.4 | 95.4 | 95.4 | 93.5 | 93.7 | 95.1 | 95.1 | 93.3 | 93.4 |

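
Section 3 of the outline concerns log-mean-mean linking, and the JK columns in the tables above refer to the jackknife. The sketch below is a hedged illustration only. It implements the textbook form of log-mean-mean linking of two groups under the 2PL model, together with an item jackknife estimate of the linking error of the linked mean, $\mathrm{LE} = \sqrt{\tfrac{I-1}{I} \sum_{i=1}^{I} (\hat\mu_{(-i)} - \bar\mu)^2}$. It assumes that group 1 defines the metric (mean 0, SD 1) and that, apart from DIF, $a_{i2} = \sigma a_{i1}$ and $b_{i2} = (b_{i1} - \mu)/\sigma$. The function names and DIF values are hypothetical, and the code is not claimed to be the article's exact estimator.

```python
import math
from statistics import mean


def log_mean_mean(a1, b1, a2, b2):
    """Log-mean-mean linking of group 2 onto the group-1 metric.

    ASSUMPTION: a_i2 = sigma * a_i1 and b_i2 = (b_i1 - mu) / sigma up to DIF.
    Returns (mu, sigma), the group-2 mean and SD on the group-1 metric.
    """
    sigma = math.exp(mean(math.log(a) for a in a2) - mean(math.log(a) for a in a1))
    mu = mean(b1) - sigma * mean(b2)
    return mu, sigma


def jackknife_linking_error(a1, b1, a2, b2):
    """Item jackknife linking error of mu: drop one item at a time and relink."""
    n_items = len(a1)
    mu_drop = []
    for i in range(n_items):
        keep = [j for j in range(n_items) if j != i]
        mu_i, _ = log_mean_mean([a1[j] for j in keep], [b1[j] for j in keep],
                                [a2[j] for j in keep], [b2[j] for j in keep])
        mu_drop.append(mu_i)
    mu_bar = mean(mu_drop)
    return math.sqrt((n_items - 1) / n_items * sum((m - mu_bar) ** 2 for m in mu_drop))


if __name__ == "__main__":
    # Group-1 parameters: the ten items from the item parameter table.
    a1 = [0.73, 1.25, 1.20, 1.47, 0.97, 1.38, 1.05, 1.14, 1.15, 0.67]
    b1 = [-1.31, 1.44, -1.20, 0.10, 0.10, -0.74, 1.48, -0.61, 0.82, -0.07]
    mu_true, sigma_true = 0.3, 1.2
    # Made-up DIF effects on the difficulties (they sum to zero).
    dif = [0.2, -0.1, 0.0, 0.15, -0.2, 0.05, -0.05, 0.1, -0.15, 0.0]
    a2 = [a * sigma_true for a in a1]
    b2 = [(b + d - mu_true) / sigma_true for b, d in zip(b1, dif)]
    print(log_mean_mean(a1, b1, a2, b2))
    print(jackknife_linking_error(a1, b1, a2, b2))
```

Because the made-up DIF effects sum to zero, the linked mean recovers 0.3. The jackknife linking error is still about 0.041, which illustrates that DIF adds uncertainty to the linked mean even when it does not bias the point estimate.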
© 2023 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Robitzsch, A. Linking Error in the 2PL Model. J 2023, 6, 58-84. https://doi.org/10.3390/j6010005