Abstract
Haebara and Stocking–Lord linking methods are frequently used to compare the distributions of two groups. Previous research has demonstrated that Haebara and Stocking–Lord linking can produce bias in estimated standard deviations and, to a smaller extent, in estimated means in the presence of differential item functioning (DIF). This article determines the asymptotic bias of the two linking methods for the 2PL model. Bias-reduced variants of Haebara and Stocking–Lord linking are proposed to reduce the bias due to uniform DIF effects. The performance of the new linking methods is evaluated in a simulation study. In general, Stocking–Lord linking had substantial advantages over Haebara linking in the presence of DIF effects. Moreover, bias-reduced Haebara and Stocking–Lord linking substantially reduced the bias in the estimated standard deviation.
1. Introduction
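Item response theory (IRT) models [1,2] are multivariate statistical models for multivariate binary random variables. These kinds of models are frequently used to model cognitive testing data stemming from educational or psychological applications. This article only considers unidimensional IRT models [3]. Let $\mathbf{X} = (X_1, \ldots, X_I)$ be the vector of I dichotomous (i.e., binary) random variables that are typically referred to as items in the psychometric literature. A unidimensional IRT model [4] is a statistical model for the probability distribution $P(\mathbf{X} = \mathbf{x})$ for $\mathbf{x} = (x_1, \ldots, x_I) \in \{0, 1\}^I$, where
$$P(\mathbf{X} = \mathbf{x}; \boldsymbol{\delta}, \boldsymbol{\gamma}) = \int \prod_{i=1}^{I} P_i(\theta; \boldsymbol{\gamma}_i)^{x_i} \left[ 1 - P_i(\theta; \boldsymbol{\gamma}_i) \right]^{1 - x_i} \phi(\theta; \mu, \sigma) \, \mathrm{d}\theta , \tag{1}$$
where $\phi$ denotes the density of a normal distribution with mean $\mu$ and standard deviation $\sigma$. The parameters of the distribution of the latent variable $\theta$ (also referred to as the factor variable, trait, or ability) are contained in $\boldsymbol{\delta} = (\mu, \sigma)$. The vector $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_I)$ contains all estimated item parameters of item response functions (IRFs) $P_i(\theta; \boldsymbol{\gamma}_i) = P(X_i = 1 \mid \theta)$ ($i = 1, \ldots, I$). The two-parameter logistic (2PL) model [5] is the most popular IRT model for dichotomous items and possesses the IRF
$$P_i(\theta; \boldsymbol{\gamma}_i) = \Psi\left( a_i (\theta - b_i) \right) , \tag{2}$$
where $a_i$ and $b_i$ are the item discrimination and item difficulty, respectively, and $\Psi(x) = [1 + \exp(-x)]^{-1}$ denotes the logistic distribution function. For independent and identically distributed observations of N persons from the distribution of the random variable $\mathbf{X}$, the unknown model parameters in Equation (1) can be estimated by (marginal) maximum likelihood estimation [6,7,8].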
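As a minimal sketch of the 2PL model described above, the following Python code evaluates the IRF and simulates binary item responses for persons whose abilities are drawn from a normal distribution. The function names (`logistic`, `irf_2pl`, `simulate_responses`), the three illustrative item parameters, and the seed are assumptions made for this example and do not come from the article's replication material.

```python
import math
import random


def logistic(x):
    """Logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def irf_2pl(theta, a, b):
    """2PL item response function P(X_i = 1 | theta) = Psi(a * (theta - b))."""
    return logistic(a * (theta - b))


def simulate_responses(n_persons, a, b, mu=0.0, sigma=1.0, seed=1):
    """Draw abilities from N(mu, sigma^2) and simulate binary item responses."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(mu, sigma)
        row = [1 if rng.random() < irf_2pl(theta, ai, bi) else 0
               for ai, bi in zip(a, b)]
        data.append(row)
    return data


# Hypothetical item parameters chosen only for illustration.
a = [0.8, 1.0, 1.3]
b = [-1.0, 0.0, 1.0]
responses = simulate_responses(1000, a, b)

# Observed proportion of correct responses per item.
proportions = [sum(row[i] for row in responses) / len(responses)
               for i in range(len(a))]
print(proportions)
```

Marginal maximum likelihood estimation itself is not sketched here; in the article, it is carried out with dedicated software.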
In many educational applications, IRT models are employed to compare the distributions of two groups in a test (i.e., on a set of items) regarding the factor variable in the IRT model in Equation (1). Linking methods [9,10,11] estimate the 2PL model separately in the two groups in the first step. All group differences are captured in the item parameters. Linking methods transform the estimated item parameters from the first step in a subsequent second step to determine a difference regarding the mean and the standard deviation between the two groups. The separate application of the 2PL model in each of the two groups has the advantage that it allows items to function differently across groups: a property that is referred to as differential item functioning (DIF) [12,13]. It has been pointed out that the occurrence of DIF causes additional variability in the estimated mean and standard deviation when applying a linking method [14,15,16,17,18,19].
However, it has also been shown that the presence of DIF effects produces a bias in the parameters that quantify the group difference [20]. The bias occurs for the popular Haebara [21] and Stocking–Lord [22] linking methods. This article proposes bias-reduced variants of these linking methods that rely on a derivation of the asymptotic bias by using a second-order Taylor expansion of the optimization function involved in the linking method for the 2PL model. A simulation study is carried out to investigate the usefulness of the proposed modified linking estimators.
The rest of the article is organized as follows. The derivation of the bias-reduced variants of Haebara and Stocking–Lord linking is presented in Section 2. Section 3 reports findings from a simulation study that compares the proposed new estimators with the currently used linking methods in the presence of DIF. Finally, the article closes with a discussion in Section 4.
2. Bias Reduction in Haebara and Stocking–Lord Linking
2.1. Group Comparisons in the 2PL Model
In this section, we explain how a group comparison involving two groups is conducted within the 2PL model. Assume that the 2PL model holds for the first group, and the ability distribution is given as a standard normal distribution; that is, $\theta \sim N(0, 1)$. The item discriminations and item difficulties in the first group are defined as $a_{i1}$ and $b_{i1}$, respectively. In the second group, the ability distribution is given as $\theta \sim N(\mu, \sigma^2)$, where $\mu$ and $\sigma$ are the mean and the standard deviation (SD), respectively, of the second group. By using this parametrization, $\mu$ reflects group differences regarding the mean, while $\sigma$ quantifies group differences in standard deviations. The item discriminations are assumed to be invariant across the two groups such that $a_{i1} = a_{i2} = a_i$. A uniform DIF [12] effect $e_i$ is assumed for item difficulties:
$$b_{i1} = b_i \quad \text{and} \quad b_{i2} = b_i + e_i , \quad \text{where } E(e_i) = 0 \text{ and } \mathrm{Var}(e_i) = \tau^2 . \tag{3}$$
The quantity $\tau^2$ is also called the DIF variance [23,24], and $\tau$ is denoted as the DIF SD.
An anonymous reviewer wondered whether the assumption of a mean DIF of zero is tenable in applications. We favor the assumption of a zero mean DIF because we consider DIF effects as construct-relevant [25,26,27]. In this case, the estimated group differences (i.e., estimates of $\mu$ and $\sigma$) should be based on all items, and no items should be removed from group comparisons [28], as is done under a partial invariance assumption [29,30].
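A linking method proceeds in two steps. In the first step, the 2PL model is separately fitted within the two groups while assuming a standard normal distribution for the ability $\theta$. In the first group, the fitted model coincides with the data-generating model, so that the identified item parameters are $\alpha_{i1} = a_{i1}$ and $\beta_{i1} = b_{i1}$. In the second group, the original item parameters $a_{i2}$ and $b_{i2}$ are not recovered because the 2PL model is fitted with $\theta \sim N(0, 1)$, but the data-generating model imposes $\theta \sim N(\mu, \sigma^2)$. However, with the standardized ability $\theta^\ast = (\theta - \mu) / \sigma \sim N(0, 1)$, it holds that
$$a_{i2} (\theta - b_{i2}) = a_{i2} \sigma \left( \theta^\ast - \frac{b_{i2} - \mu}{\sigma} \right) , \tag{4}$$
and we have that the identified item parameters in the second group are
$$\alpha_{i2} = a_{i2} \sigma \quad \text{and} \quad \beta_{i2} = \frac{b_{i2} - \mu}{\sigma} . \tag{5}$$
We can now compute the identified item parameters under uniform DIF. Inserting the invariant item discriminations $a_{i2} = a_i$ and the item difficulties $b_{i2} = b_i + e_i$ gives
$$a_{i2} (\theta - b_{i2}) = a_i \sigma \left( \theta^\ast - \frac{b_i + e_i - \mu}{\sigma} \right) , \tag{6}$$
and we have that the identified item parameters are
$$\alpha_{i1} = a_i , \quad \beta_{i1} = b_i , \quad \alpha_{i2} = a_i \sigma , \quad \beta_{i2} = \frac{b_i + e_i - \mu}{\sigma} . \tag{7}$$
Due to sampling errors (i.e., sampling of persons), the estimated item parameters $\hat{\alpha}_{ig}$ and $\hat{\beta}_{ig}$ ($g = 1, 2$) will slightly differ from the identified item parameters $\alpha_{ig}$ and $\beta_{ig}$.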
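If no uniform DIF effects occur (i.e., $e_i = 0$ for all items) and there is no sampling error, the group mean $\mu$ and the group SD $\sigma$ can be determined from a single item i using the identified item parameters, namely $\sigma = \alpha_{i2} / \alpha_{i1}$ and $\mu = \beta_{i1} - \sigma \beta_{i2}$. In the presence of sampling errors and DIF, a linking function H is chosen that estimates $\mu$ and $\sigma$ based on estimated item parameters $\hat{\alpha}_{ig}$ and $\hat{\beta}_{ig}$ for $i = 1, \ldots, I$ and $g = 1, 2$. A linking method is formally described in the following Section 2.2.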
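The single-item argument can be checked numerically. The sketch below computes the identified item parameters of one item under separate scaling with a standard normal ability distribution in each group and then solves them for the group mean and SD. The function names and the numeric values ($\mu = 0.3$, $\sigma = 1.2$, and the item parameters) are hypothetical choices for this illustration.

```python
def identified_parameters(a, b, e, mu, sigma):
    """Identified 2PL parameters of one item when each group is scaled with N(0, 1).

    Group 1 has ability N(0, 1); group 2 has ability N(mu, sigma^2) and a
    uniform DIF effect e added to the item difficulty.
    """
    alpha1, beta1 = a, b
    alpha2 = a * sigma
    beta2 = (b + e - mu) / sigma
    return alpha1, beta1, alpha2, beta2


def recover_from_single_item(alpha1, beta1, alpha2, beta2):
    """Solve the identified parameters of a single item for (mu, sigma)."""
    sigma = alpha2 / alpha1
    mu = beta1 - sigma * beta2
    return mu, sigma


# Hypothetical values for illustration only.
mu_true, sigma_true = 0.3, 1.2

# Without DIF, a single item recovers the group mean and SD exactly.
mu_no_dif, sigma_no_dif = recover_from_single_item(
    *identified_parameters(a=0.83, b=-1.74, e=0.0, mu=mu_true, sigma=sigma_true))

# With a DIF effect e, the single-item mean is shifted to mu - e.
mu_dif, sigma_dif = recover_from_single_item(
    *identified_parameters(a=0.83, b=-1.74, e=0.4, mu=mu_true, sigma=sigma_true))

print(mu_no_dif, sigma_no_dif, mu_dif, sigma_dif)
```

The second call shows why a single item no longer suffices under DIF: the item-specific solution for the mean absorbs the DIF effect, so information must be pooled across items by a linking function.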
2.2. Bias Reduction in a Linking Method Due to DIF
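In a linking method, the vector $\boldsymbol{\delta} = (\mu, \sigma)$ is the statistical parameter of interest. The corresponding parameter estimate for $\boldsymbol{\delta}$ is denoted as $\hat{\boldsymbol{\delta}} = (\hat{\mu}, \hat{\sigma})$. The linking function H is defined as a function of $\boldsymbol{\delta}$ and the estimated item parameters $\hat{\boldsymbol{\gamma}} = (\hat{\boldsymbol{\gamma}}_1, \ldots, \hat{\boldsymbol{\gamma}}_I)$ (where $\hat{\boldsymbol{\gamma}}_i = (\hat{\alpha}_{i1}, \hat{\beta}_{i1}, \hat{\alpha}_{i2}, \hat{\beta}_{i2})$) in the two groups such that
$$\hat{\boldsymbol{\delta}} = \arg\min_{\boldsymbol{\delta}} H(\boldsymbol{\delta}, \hat{\boldsymbol{\gamma}}) . \tag{8}$$
By assuming differentiability of H, the estimate $\hat{\boldsymbol{\delta}}$ fulfills the estimating equation
$$H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \hat{\boldsymbol{\gamma}}) = \frac{\partial H}{\partial \boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \hat{\boldsymbol{\gamma}}) = \mathbf{0} . \tag{9}$$
A minimal requirement of a linking method is that it should recover the true group mean $\mu_0$ and the true group SD $\sigma_0$ if there is no sampling error and no DIF (i.e., $e_i = 0$ for all items). Hence, it is assumed that
$$H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0, \boldsymbol{\gamma}_0) = \mathbf{0} , \tag{10}$$
where $\boldsymbol{\delta}_0 = (\mu_0, \sigma_0)$, $\boldsymbol{\gamma}_0$ denotes joint item parameters $a_i$ and $b_i$, and all DIF effects $e_i$ are assumed to be zero.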
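We now derive an expression of the bias of the linking method due to the presence of DIF effects $e_i$. The idea is to utilize a second-order Taylor expansion of $H_{\boldsymbol{\delta}}$ and to derive a bias correction term of the parameter estimate $\hat{\boldsymbol{\delta}}$. Note that the estimated item parameters $\hat{\boldsymbol{\gamma}}_i$ for item i are a function of a common item parameter $\boldsymbol{\gamma}_i = (a_i, b_i)$, and the uniform DIF effect $e_i$. The estimating equation in Equation (9) can be formalized as
$$H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \boldsymbol{\gamma}, \mathbf{e}) = \mathbf{0} , \tag{11}$$
where $\boldsymbol{\gamma} = (\boldsymbol{\gamma}_1, \ldots, \boldsymbol{\gamma}_I)$ collects all item parameters, and $\mathbf{e} = (e_1, \ldots, e_I)$ denotes the vector of DIF effects. Due to Equation (10), it holds that $H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0, \boldsymbol{\gamma}, \mathbf{e}) = \mathbf{0}$ for $\mathbf{e} = \mathbf{0}$. This property ensures that the application of the linking method provides the true mean $\mu_0$ and the true SD $\sigma_0$ in the absence of DIF.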
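By assuming independence of DIF effects $e_i$, we have, by using a first-order Taylor expansion with respect to $\boldsymbol{\delta}$ and a second-order Taylor expansion with respect to DIF effects $e_i$,
$$H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \boldsymbol{\gamma}, \mathbf{e}) \approx H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0, \boldsymbol{\gamma}, \mathbf{0}) + H_{\boldsymbol{\delta}\boldsymbol{\delta}} \, (\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}_0) + \sum_{i=1}^{I} \frac{\partial H_{\boldsymbol{\delta}}}{\partial e_i} \, e_i + \frac{1}{2} \sum_{i=1}^{I} \frac{\partial^2 H_{\boldsymbol{\delta}}}{\partial e_i^2} \, e_i^2 , \tag{12}$$
where all derivatives are evaluated at $(\boldsymbol{\delta}_0, \boldsymbol{\gamma}, \mathbf{0})$, and cross-product terms $e_i e_j$ ($i \neq j$) are omitted because they have zero expectation under independence.
Due to $H_{\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \boldsymbol{\gamma}, \mathbf{e}) = \mathbf{0}$ and $H_{\boldsymbol{\delta}}(\boldsymbol{\delta}_0, \boldsymbol{\gamma}, \mathbf{0}) = \mathbf{0}$, we obtain from Equation (12)
$$\hat{\boldsymbol{\delta}} - \boldsymbol{\delta}_0 \approx - H_{\boldsymbol{\delta}\boldsymbol{\delta}}^{-1} \left( \sum_{i=1}^{I} \frac{\partial H_{\boldsymbol{\delta}}}{\partial e_i} \, e_i + \frac{1}{2} \sum_{i=1}^{I} \frac{\partial^2 H_{\boldsymbol{\delta}}}{\partial e_i^2} \, e_i^2 \right) . \tag{13}$$
Because the DIF effects are random variables with zero expectation $E(e_i) = 0$ and variance $E(e_i^2) = \tau^2$, we can determine an approximate (i.e., expected) bias in the estimate $\hat{\boldsymbol{\delta}}$ as
$$E(\hat{\boldsymbol{\delta}}) - \boldsymbol{\delta}_0 \approx - \frac{\tau^2}{2} \, H_{\boldsymbol{\delta}\boldsymbol{\delta}}^{-1} \sum_{i=1}^{I} \frac{\partial^2 H_{\boldsymbol{\delta}}}{\partial e_i^2} . \tag{14}$$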
Note that the bias due to DIF effects does not vanish in large samples of persons or for a large number of items. Let ; the asymptotic bias of can be described as
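Equation (14) gives rise to the idea of using an empirical version of it as a bias correction (or at least a bias-reducing) term for $\hat{\boldsymbol{\delta}}$. We compute the estimated DIF effects by
$$\hat{e}_i = \hat{\sigma} \hat{\beta}_{i2} + \hat{\mu} - \hat{\beta}_{i1} \tag{16}$$
to obtain a bias-reduced estimate of $\boldsymbol{\delta}$ as
$$\hat{\boldsymbol{\delta}}_{\mathrm{br}} = \hat{\boldsymbol{\delta}} + \frac{1}{2} \, H_{\boldsymbol{\delta}\boldsymbol{\delta}}(\hat{\boldsymbol{\delta}}, \hat{\boldsymbol{\gamma}})^{-1} \sum_{i=1}^{I} \hat{\mathbf{h}}_i \, \hat{e}_i^2 . \tag{17}$$
The vector $\hat{\mathbf{h}}_i$ contains second-order derivatives of $H_{\boldsymbol{\delta}}$ with respect to $e_i$ and is given by $\hat{\mathbf{h}}_i = \partial^2 H_{\boldsymbol{\delta}} / \partial e_i^2$, evaluated at the estimated parameters.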
The described construction scheme for a bias-reduced linking estimate is applied for Haebara linking in Section 2.3 and Stocking–Lord linking in Section 2.4.
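The role of the second-order term in this construction can be illustrated on a toy quantity rather than on a full linking function. The sketch below is not the article's estimator: it only shows that a zero-mean random effect $e \sim N(0, \tau^2)$ entering a nonlinear function (here the logistic function $\Psi$) shifts the expected value by approximately $\tfrac{1}{2} \tau^2 \Psi''(x)$, which is the mechanism behind the bias term. The evaluation point $x = 1$, the value $\tau = 0.25$, and all function names are assumptions for illustration.

```python
import math


def psi(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def psi_second_derivative(x):
    """Second derivative of the logistic function: Psi (1 - Psi) (1 - 2 Psi)."""
    p = psi(x)
    return p * (1.0 - p) * (1.0 - 2.0 * p)


def expected_psi(x, tau, n_points=2001, width=8.0):
    """E[Psi(x + e)] for e ~ N(0, tau^2), computed with the trapezoidal rule."""
    lo, hi = -width * tau, width * tau
    step = (hi - lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        e = lo + k * step
        density = math.exp(-0.5 * (e / tau) ** 2) / (tau * math.sqrt(2.0 * math.pi))
        end_weight = 0.5 if k in (0, n_points - 1) else 1.0
        total += end_weight * psi(x + e) * density * step
    return total


x, tau = 1.0, 0.25
exact = expected_psi(x, tau)                                   # numerical expectation
first_order = psi(x)                                           # ignores the random effect
second_order = psi(x) + 0.5 * tau ** 2 * psi_second_derivative(x)

print(exact - first_order)    # bias that a first-order argument misses
print(exact - second_order)   # remainder after the second-order correction
```

The first-order term drops out in expectation because $E(e) = 0$, so the leading contribution to the bias is proportional to the DIF variance, in line with Equation (14).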
2.3. Bias-Reduced Haebara Linking (brHAE)
The optimization function in Haebara (HAE) [21] linking is defined as
with a priori specification of weights on a grid of values . A convenient choice is to use an equidistant grid of values on and weights that are proportional to the density function of a normal distribution with a mean of 0 and an SD of 2. The corresponding estimating equations to the minimization problem in Equation (18) are given by
In Equation (20), $\Psi'$ denotes the derivative of the logistic function $\Psi$. The second-order derivatives (i.e., the partial derivatives of $H_\mu$ and $H_\sigma$ with respect to $\mu$ and $\sigma$) can be computed in closed form, although the formulas are somewhat cumbersome. In a numerical implementation of the bias-reduced linking method, the derivatives can be obtained by numerical differentiation.
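To make the Haebara criterion concrete, the following sketch implements one common formulation of it and minimizes it numerically. Several elements are assumptions of this example and are not taken from the article: the asymmetric form of the loss (group-2 item parameters are mapped to the group-1 metric), the grid on $[-6, 6]$ with step 0.5, the hypothetical true values $\mu = 0.3$ and $\sigma = 1.2$, and the simple coordinate-wise golden-section search, which stands in for whatever optimizer the author's R functions use. Only the weights (proportional to a normal density with mean 0 and SD 2) and the item parameters follow the text.

```python
import math


def psi(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def haebara_loss(mu, sigma, alpha1, beta1, alpha2, beta2, grid, weights):
    """Weighted squared distance between item response functions, summed over items."""
    total = 0.0
    for a1, b1, a2, b2 in zip(alpha1, beta1, alpha2, beta2):
        for theta, w in zip(grid, weights):
            p1 = psi(a1 * (theta - b1))
            # Group-2 item on the group-1 metric: a = a2 / sigma, b = sigma * b2 + mu.
            p2 = psi(a2 / sigma * (theta - sigma * b2 - mu))
            total += w * (p1 - p2) ** 2
    return total


def golden_section(f, lo, hi, tol=1e-6):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)
        d = lo + ratio * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)


def haebara_link(alpha1, beta1, alpha2, beta2, grid, weights, sweeps=15):
    """Alternate one-dimensional minimizations over mu and sigma."""
    mu, sigma = 0.0, 1.0
    args = (alpha1, beta1, alpha2, beta2, grid, weights)
    for _ in range(sweeps):
        mu = golden_section(lambda m: haebara_loss(m, sigma, *args), -3.0, 3.0)
        sigma = golden_section(lambda s: haebara_loss(mu, s, *args), 0.3, 3.0)
    return mu, sigma


# Assumed grid; weights proportional to a N(0, 2^2) density.
grid = [-6.0 + 0.5 * t for t in range(25)]
raw = [math.exp(-0.5 * (theta / 2.0) ** 2) for theta in grid]
weights = [r / sum(raw) for r in raw]

# Common item parameters of the 10-item test used in the simulation study.
a = [0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, 1.11]
b = [-1.74, -1.22, -0.22, 0.54, -0.04, -0.39, -0.73, 0.30, 0.83, -1.39]

# Identified parameters without DIF and without sampling error (hypothetical mu, sigma).
mu_true, sigma_true = 0.3, 1.2
alpha1, beta1 = a, b
alpha2 = [ai * sigma_true for ai in a]
beta2 = [(bi - mu_true) / sigma_true for bi in b]

mu_hat, sigma_hat = haebara_link(alpha1, beta1, alpha2, beta2, grid, weights)
print(round(mu_hat, 3), round(sigma_hat, 3))
```

In this no-DIF, no-sampling-error setting, the loss is zero at the true values, so the minimizer should coincide with them up to the search tolerance. Adding DIF effects to `beta2` moves the minimizer away from the true values, which is the bias studied in this section.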
Using the true parameter vector , we can write
The unknown DIF effect in Equation (21) can be substituted by such that we obtain
We now compute the required derivatives with respect to $e_i$ in the bias-reduced estimate derived in Equation (17). The derivatives of $H_\mu$ with respect to $e_i$ are given by
where $\Psi''$ is the second derivative of $\Psi$. In analogy, we have for the required second derivative of $H_\sigma$ with respect to $e_i$,
The term in Equations (24) and (25) can be estimated by . Hence, the required terms for the bias-reduced Haebara linking (brHAE) can be written as
2.4. Bias-Reduced Stocking–Lord Linking (brSL)
The Stocking–Lord (SL) [22] linking method minimizes the weighted squared distance of test characteristic functions:
SL linking offers more flexibility in allowing differences in the IRFs between groups because the alignment occurs at the level of the test characteristic function. In contrast, HAE linking penalizes the squared distance between the IRFs of each individual item. The corresponding estimating equations of SL linking defined in Equation (28) for the parameters $\mu$ and $\sigma$ are given as
The second-order derivatives can be derived analytically in a similar way, or they can be obtained numerically in a practical implementation of the bias-reduced linking method.
The second-order derivatives of $H_\mu$ and $H_\sigma$ with respect to $e_i$ can be computed similarly as for Haebara linking. We obtain the following quantities that are required in the formula for the bias reduction:
By using Equations (31) and (32) in Formula (17) for the bias-reduced estimate, we obtain the bias-reduced Stocking–Lord (brSL) linking method.
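For comparison with Haebara linking, the sketch below writes down one common formulation of the Stocking–Lord criterion, in which the item response functions are first summed to test characteristic functions and only then compared. As with any such sketch, the asymmetric form of the loss, the grid on $[-6, 6]$, the four illustrative items, the balanced DIF effects, and the hypothetical true values $\mu = 0.3$ and $\sigma = 1.2$ are assumptions and are not taken from the article.

```python
import math


def psi(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def stocking_lord_loss(mu, sigma, alpha1, beta1, alpha2, beta2, grid, weights):
    """Weighted squared distance between the two test characteristic functions."""
    total = 0.0
    for theta, w in zip(grid, weights):
        tcf1 = sum(psi(a1 * (theta - b1)) for a1, b1 in zip(alpha1, beta1))
        # Group-2 items on the group-1 metric: a = a2 / sigma, b = sigma * b2 + mu.
        tcf2 = sum(psi(a2 / sigma * (theta - sigma * b2 - mu))
                   for a2, b2 in zip(alpha2, beta2))
        total += w * (tcf1 - tcf2) ** 2
    return total


# Assumed grid; weights proportional to a N(0, 2^2) density.
grid = [-6.0 + 0.5 * t for t in range(25)]
raw = [math.exp(-0.5 * (theta / 2.0) ** 2) for theta in grid]
weights = [r / sum(raw) for r in raw]

# Hypothetical items, DIF effects, and group parameters.
a = [0.83, 1.02, 0.88, 0.80]
b = [-1.74, -1.22, -0.22, 0.54]
e = [0.3, -0.3, 0.3, -0.3]
mu_true, sigma_true = 0.3, 1.2

alpha2 = [ai * sigma_true for ai in a]
beta2_no_dif = [(bi - mu_true) / sigma_true for bi in b]
beta2_dif = [(bi + ei - mu_true) / sigma_true for bi, ei in zip(b, e)]

loss_no_dif = stocking_lord_loss(mu_true, sigma_true, a, b, alpha2, beta2_no_dif, grid, weights)
loss_shifted = stocking_lord_loss(mu_true + 0.2, sigma_true, a, b, alpha2, beta2_no_dif, grid, weights)
loss_dif = stocking_lord_loss(mu_true, sigma_true, a, b, alpha2, beta2_dif, grid, weights)
print(loss_no_dif, loss_shifted, loss_dif)
```

Without DIF, the loss vanishes at the true values and is positive elsewhere. With DIF, the loss at the true values is no longer zero, so its minimizer generally differs from the true mean and SD. Item-level deviations of opposite sign can partly cancel in the test characteristic function, which is the flexibility referred to above.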
2.5. Mean-Geometric-Mean Linking (MGM)
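The mean-geometric-mean (MGM) linking method [10] is a two-step linking method that computes the SD by a geometric mean of item discriminations in the first step and the mean by an ordinary average of transformed item difficulties in the second step. The estimating equations for $\mu$ and $\sigma$ are given by
$$H_\mu(\mu, \sigma, \hat{\boldsymbol{\gamma}}) = \sum_{i=1}^{I} \left( \hat{\beta}_{i1} - \sigma \hat{\beta}_{i2} - \mu \right) = 0 \quad \text{and} \quad H_\sigma(\sigma, \hat{\boldsymbol{\gamma}}) = \sum_{i=1}^{I} \left( \log \hat{\alpha}_{i2} - \log \hat{\alpha}_{i1} - \log \sigma \right) = 0 .$$
One can easily show that $\partial^2 H_\mu / \partial e_i^2 = 0$ and $\partial^2 H_\sigma / \partial e_i^2 = 0$, because $H_\mu$ is linear in the uniform DIF effects and $H_\sigma$ does not depend on them. Hence, no asymptotic bias of the MGM linking method is expected. Therefore, exploring a bias-reducing variant of the MGM linking method is unnecessary.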
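Because MGM linking has a closed form, it can be sketched in a few lines. The function name `mgm_link`, the four illustrative items, the DIF effects, and the hypothetical true values $\mu = 0.3$ and $\sigma = 1.2$ are assumptions for this example; the article itself relies on an existing R function for MGM linking.

```python
import math


def mgm_link(alpha1, beta1, alpha2, beta2):
    """Mean-geometric-mean linking of group 2 to group 1.

    Step 1: sigma is the geometric mean of the discrimination ratios.
    Step 2: mu is the average of the transformed difficulty differences.
    """
    n_items = len(alpha1)
    log_sigma = sum(math.log(a2 / a1) for a1, a2 in zip(alpha1, alpha2)) / n_items
    sigma = math.exp(log_sigma)
    mu = sum(b1 - sigma * b2 for b1, b2 in zip(beta1, beta2)) / n_items
    return mu, sigma


# Hypothetical items, DIF effects, and group parameters.
a = [0.83, 1.02, 0.88, 0.80]
b = [-1.74, -1.22, -0.22, 0.54]
e = [0.3, -0.2, 0.5, -0.4]
mu_true, sigma_true = 0.3, 1.2

# Identified parameters without sampling error.
alpha1, beta1 = a, b
alpha2 = [ai * sigma_true for ai in a]
beta2 = [(bi + ei - mu_true) / sigma_true for bi, ei in zip(b, e)]

mu_hat, sigma_hat = mgm_link(alpha1, beta1, alpha2, beta2)
mean_dif = sum(e) / len(e)
print(mu_hat, sigma_hat, mu_true - mean_dif)
```

In this setting, the SD estimate is unaffected by uniform DIF, and the mean estimate equals the true mean minus the average DIF effect in the item set, which has expectation zero.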
3. Simulation Study
3.1. Method
Item responses in two groups were simulated according to the 2PL model. For identification reasons, the mean and the standard deviation of the normally distributed ability variable in the first group were set to 0 and 1, respectively. In the second group, we chose the mean $\mu$ and the SD $\sigma$ for the normally distributed ability variable. The parameters $\mu$ and $\sigma$ were held constant in the simulation.
In the simulation, the number of items I was chosen as 10, 20, or 40, indicating short, medium, or long tests, respectively. The group-specific item parameters and for and relied on base item parameters that were fixed in the simulation and a random DIF effect that was simulated in each replication of the simulation study. The common item discriminations in the case of items were chosen as 0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, and 1.11, resulting in a mean and an . The common item difficulties were chosen as −1.74, −1.22, −0.22, 0.54, −0.04, −0.39, −0.73, 0.30, 0.83, and −1.39, resulting in and . For item numbers as multiples of 10, we duplicated the item parameters of the 10 items accordingly. The item parameters in the second group included a uniform DIF effect that was added to the common item difficulty . As described in Section 2.1, the item difficulty in the second group was simulated as
The DIF effects $e_i$ were independently and identically normally distributed with a mean of zero and a DIF SD $\tau$. Note that DIF effects varied across replications within simulation conditions. The DIF SD $\tau$ was chosen as 0, 0.25, or 0.5, indicating no DIF, moderate DIF, or large DIF.
Item responses were simulated according to the 2PL model for sample sizes N per group of 500, 1000, 2000, or 4000. We also investigated an infinite sample size (denoted by $N = \infty$) in which only item parameters with DIF effects were simulated.
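The generation of group-specific item parameters described above can be sketched as follows. The base parameters are the ones listed in the text; the function name `draw_item_parameters`, the seed, and the return layout are assumptions of this example and are not taken from the replication material.

```python
import random
import statistics

# Common item parameters of the 10 base items, as listed in the text.
BASE_A = [0.83, 1.02, 0.88, 0.80, 1.04, 0.95, 1.00, 1.13, 1.32, 1.11]
BASE_B = [-1.74, -1.22, -0.22, 0.54, -0.04, -0.39, -0.73, 0.30, 0.83, -1.39]


def draw_item_parameters(n_items, tau, rng):
    """Return data-generating (a, b) pairs per group plus the drawn DIF effects.

    Base items are duplicated for tests of 20 or 40 items, and a fresh uniform
    DIF effect e_i ~ N(0, tau^2) is added to each group-2 difficulty.
    """
    if n_items % len(BASE_A) != 0:
        raise ValueError("n_items must be a multiple of 10")
    reps = n_items // len(BASE_A)
    a = BASE_A * reps
    b = BASE_B * reps
    e = [rng.gauss(0.0, tau) if tau > 0 else 0.0 for _ in range(n_items)]
    group1 = list(zip(a, b))
    group2 = [(ai, bi + ei) for ai, bi, ei in zip(a, b, e)]
    return group1, group2, e


rng = random.Random(2024)
group1, group2, e = draw_item_parameters(n_items=20, tau=0.5, rng=rng)

# Descriptive statistics of the base parameters.
print(statistics.mean(BASE_A), statistics.mean(BASE_B))
```

A new call to `draw_item_parameters` in every replication mirrors the design choice that DIF effects vary across replications while the base item parameters stay fixed.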
Five different linking methods were applied to estimate the mean and the SD in the second group: Haebara (HAE) linking, bias-reduced Haebara (brHAE) linking, Stocking–Lord (SL) linking, bias-reduced Stocking–Lord (brSL) linking, and mean-geometric-mean (MGM) linking. For finite sample sizes, all linking methods relied on estimated item discriminations and item difficulties ($\hat{\alpha}_{ig}$, $\hat{\beta}_{ig}$) obtained from separately fitting the 2PL model to the item response datasets in the two groups. For an infinite sample size, the identified item parameters as defined in Equation (7) were used, and no item responses were simulated.
In each of the 5 (sample size N) × 3 (DIF standard deviation $\tau$) × 3 (number of items I) = 45 cells of the simulation, 3000 replications were conducted. We computed the empirical bias, the empirical SD, and the root mean square error (RMSE) for the estimated mean $\hat{\mu}$ and the estimated standard deviation $\hat{\sigma}$. A relative percentage RMSE was computed as 100 times the ratio of the RMSE of a particular linking method to the RMSE of Haebara linking, so that HAE serves as the reference method with a value of 100.
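The performance measures can be computed per simulation cell as sketched below. The function names and the toy estimates are hypothetical; the use of the denominator $R - 1$ for the empirical SD across R replications is an assumption of this sketch, as the text does not state which denominator was used.

```python
import math


def summarize(estimates, true_value):
    """Empirical bias, empirical SD, and RMSE of estimates across replications."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    bias = mean_est - true_value
    sd = math.sqrt(sum((x - mean_est) ** 2 for x in estimates) / (n - 1))
    rmse = math.sqrt(sum((x - true_value) ** 2 for x in estimates) / n)
    return bias, sd, rmse


def relative_rmse(rmse_method, rmse_reference):
    """RMSE of a method as a percentage of the RMSE of the reference method."""
    return 100.0 * rmse_method / rmse_reference


# Toy estimates of the SD from two hypothetical methods (true value 1.2).
reference_estimates = [1.1, 1.3, 1.2, 1.4]
method_estimates = [1.15, 1.25, 1.2, 1.3]

bias_ref, sd_ref, rmse_ref = summarize(reference_estimates, 1.2)
bias_new, sd_new, rmse_new = summarize(method_estimates, 1.2)
print(bias_ref, sd_ref, rmse_ref)
print(relative_rmse(rmse_new, rmse_ref))
```

Values below 100 indicate that a method is more accurate than the reference method in terms of the RMSE.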
R software (Version 4.3.1, [31]) was used for the entire analysis in this simulation study. The 2PL model was fitted using the TAM::tam.mml.2pl() function in the R package TAM [32]. MGM linking was implemented using the function sirt::linking.haberman() in the R package sirt [33]. Dedicated R functions were written by the author for HAE and SL linking and their bias-reduced variants. These functions and replication material for this simulation study can be found at https://osf.io/8mk49 (accessed on 10 May 2024).
3.2. Results
Table 1 presents the bias, the SD, and the relative RMSE for the estimated mean $\hat{\mu}$. By construction, all linking methods estimated the true population value in an infinite sample size in the no-DIF condition. Furthermore, the five linking methods performed similarly regarding the RMSE in the no-DIF condition ($\tau = 0$). However, an efficiency loss due to MGM linking and an efficiency advantage due to SL compared to HAE linking were observed for short tests with a small number of items ($I = 10$). A small positive bias was induced by brHAE and brSL in the small sample size ($N = 500$).
Table 1. Simulation study: bias, standard deviation (SD), and relative root mean square error (RMSE) for the estimated mean $\hat{\mu}$ as a function of the uniform DIF standard deviation $\tau$, number of items I, and sample size N.
Overall, SL outperformed HAE regarding the RMSE, which was a direct consequence of the lower standard deviation of SL linking compared to HAE linking. Notably, the negative bias of HAE and SL was reduced by the bias-reduced variants brHAE and brSL in the large-DIF condition ($\tau = 0.5$). MGM linking was advantageous over the other linking methods in the presence of large DIF and larger sample sizes.
We also computed descriptive statistics for the absolute biases and the relative RMSEs of the different linking methods across all simulation conditions. MGM linking had the lowest average absolute bias with a small SD (MGM: , ). The bias-reduced variants of HAE and SL slightly outperformed HAE and SL with respect to the average absolute bias (HAE: , ; brHAE: , ; SL: , ; brSL: , ). SL can be considered the frontrunner among the linking methods regarding the average relative RMSE for the estimated mean (HAE: , ; brHAE: , ; SL: , ; brSL: , ; MGM: , ).
Table 2 reports the bias, the SD, and the relative RMSE for the estimated SD $\hat{\sigma}$. The MGM linking method exactly recovered the true SD in an infinite sample size in all DIF conditions because uniform DIF does not affect estimates of the SD. There was noticeable bias in the large-DIF condition for HAE and SL linking that did not vanish in an infinite sample size. Overall, MGM was the best-performing method regarding bias and relative RMSE for $\hat{\sigma}$ in DIF conditions and had only minor efficiency losses in the no-DIF conditions. In general, SL outperformed HAE linking regarding the RMSE, which was a consequence of the lower SD. Note that the advantages of SL linking over HAE linking were more pronounced in tests with fewer items (i.e., $I = 10$). The bias-reduced variants of HAE and SL (i.e., brHAE and brSL) removed a large portion of the bias in the estimated standard deviation $\hat{\sigma}$. However, positive bias for brHAE and brSL was observed for a sample size of $N = 500$.
Table 2. Simulation study: bias, standard deviation (SD), and relative root mean square error (RMSE) for the estimated standard deviation $\hat{\sigma}$ as a function of the uniform DIF standard deviation $\tau$, number of items I, and sample size N.
In summary, descriptive statistics across all simulation conditions for the absolute bias resulted in the MGM linking method as the frontrunner, followed by brHAE and brSL (HAE: , ; brHAE: , ; SL: , ; brSL: , ; MGM: , ). The aggregated statistics for the relative RMSEs demonstrated the good performance of MGM linking. SL clearly outperformed the HAE and brHAE methods, but brSL additionally had improvements over SL, in particular for large DIF (HAE: , ; brHAE: , ; SL: , ; brSL: , ; MGM: , ).
4. Discussion
Previous research has highlighted that the HAE and SL linking methods can produce substantial bias for SDs and, to a smaller extent, for the mean in the presence of DIF effects. Importantly, the bias does not vanish for large sample sizes of persons or a large number of items. Therefore, bias-reduced variants of HAE and SL linking were proposed in this article to remove large portions of the bias in the estimated SD. However, using the bias-reduced variants of the HAE and SL linking methods comes at the price of an increased SD. Whether it is beneficial regarding the RMSE to apply the bias-reduced linking method depends on the size of the DIF variance. In general, bias-reducing methods might be preferred in the presence of large DIF effects, but they could potentially hurt in situations with small DIF effects, because in those situations, the biases of the original HAE and SL linking methods are relatively small. Overall, SL linking outperformed HAE linking (see also [34,35,36]).
In future research, the methodology could be further improved by considering alternative bias-reducing linking estimators. Moreover, the methodology could be extended to linking polytomous item responses using the generalized partial credit model [37]. Also, this article only treated the impact of uniform DIF effects (i.e., DIF effects in item difficulties) in HAE and SL linking. It might also be interesting to study bias-reducing linking methods that can also handle nonuniform DIF effects (i.e., DIF effects in item discriminations).
Our simulation study relied on marginal maximum likelihood estimation of item response models. However, the item parameters in separate scalings could also be obtained with limited information estimation [38,39]. Stocking–Lord and Haebara linking remain unchanged when using a different estimation procedure of item parameters. Hence, our proposed methodology also applies to alternative IRT estimation methods.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Replication material for the simulation study can be found at https://osf.io/8mk49.
Conflicts of Interest
The author declares no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| 2PL | two-parameter logistic |
| brHAE | bias-reduced Haebara |
| brSL | bias-reduced Stocking–Lord |
| HAE | Haebara |
| IRF | item response function |
| IRT | item response theory |
| MGM | mean-geometric-mean |
| RMSE | root mean square error |
| SD | standard deviation |
| SL | Stocking–Lord |
References
1. Bock, R.D.; Moustaki, I. Item response theory in a general framework. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 469–513.
2. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. Stat. Sci. 2024. Epub ahead of print. Available online: https://rb.gy/1yic0e (accessed on 10 May 2024).
3. van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30.
4. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154.
5. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; Addison-Wesley: Reading, MA, USA, 1968; pp. 397–479.
6. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236.
7. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459.
8. Glas, C.A.W. Maximum-likelihood estimation. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 197–216.
9. Lee, W.C.; Lee, G. IRT linking and equating. In The Wiley Handbook of Psychometric Testing: A Multidisciplinary Reference on Survey, Scale and Test; Irwing, P., Booth, T., Hughes, D.J., Eds.; Wiley: New York, NY, USA, 2018; pp. 639–673.
10. Kolen, M.J.; Brennan, R.L. Test Equating, Scaling, and Linking; Springer: New York, NY, USA, 2014.
11. Sansivieri, V.; Wiberg, M.; Matteucci, M. A review of test equating methods with a special focus on IRT-based approaches. Statistica 2017, 77, 329–352.
12. Millsap, R.E. Statistical Approaches to Measurement Invariance; Routledge: New York, NY, USA, 2011.
13. Penfield, R.D.; Camilli, G. Differential item functioning and item bias. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 125–167.
14. Monseur, C.; Berezner, A. The computation of equating errors in international surveys in education. J. Appl. Meas. 2007, 8, 323–335. Available online: https://bit.ly/2WDPeqD (accessed on 10 May 2024).
15. Robitzsch, A. Robust and nonrobust linking of two groups for the Rasch model with balanced and unbalanced random DIF: A comparative simulation study and the simultaneous assessment of standard errors and linking errors with resampling techniques. Symmetry 2021, 13, 2198.
16. Robitzsch, A. Linking error in the 2PL model. J 2023, 6, 58–84.
17. Sachse, K.A.; Roppelt, A.; Haag, N. A comparison of linking methods for estimating national trends in international comparative large-scale assessments in the presence of cross-national DIF. J. Educ. Meas. 2016, 53, 152–171.
18. Sachse, K.A.; Haag, N. Standard errors for national trends in international large-scale assessments in the case of cross-national differential item functioning. Appl. Meas. Educ. 2017, 30, 102–116.
19. Wu, M. Measurement, sampling, and equating errors in large-scale assessments. Educ. Meas. 2010, 29, 15–27.
20. Robitzsch, A. A comparison of linking methods for two groups for the two-parameter logistic item response model in the presence and absence of random differential item functioning. Foundations 2021, 1, 116–144.
21. Haebara, T. Equating logistic ability scales by a weighted least squares method. Jpn. Psychol. Res. 1980, 22, 144–149.
22. Stocking, M.L.; Lord, F.M. Developing a common metric in item response theory. Appl. Psychol. Meas. 1983, 7, 201–210.
23. Longford, N.T.; Holland, P.W.; Thayer, D.T. Stability of the MH D-DIF statistics across populations. In Differential Item Functioning; Holland, P.W., Wainer, H., Eds.; Routledge: New York, NY, USA, 1993; pp. 171–196.
24. Penfield, R.D.; Algina, J. A generalized DIF effect variance estimator for measuring unsigned differential test functioning in mixed format tests. J. Educ. Meas. 2006, 43, 295–312.
25. Camilli, G. The case against item bias detection techniques based on internal criteria: Do item bias procedures obscure test fairness issues. In Differential Item Functioning: Theory and Practice; Holland, P.W., Wainer, H., Eds.; Erlbaum: Hillsdale, NJ, USA, 1993; pp. 397–417.
26. Robitzsch, A.; Lüdtke, O. Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessment studies. Meas. Instrum. Soc. Sci. 2022, 4, 9.
27. Shealy, R.; Stout, W. A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika 1993, 58, 159–194.
28. Robitzsch, A.; Lüdtke, O. Why full, partial, or approximate measurement invariance are not a prerequisite for meaningful and valid group comparisons. Struct. Equ. Model. 2023, 30, 859–870.
29. Byrne, B.M.; Shavelson, R.J.; Muthén, B. Testing for the equivalence of factor covariance and mean structures: The issue of partial measurement invariance. Psychol. Bull. 1989, 105, 456–466.
30. von Davier, M.; Yamamoto, K.; Shin, H.J.; Chen, H.; Khorramdel, L.; Weeks, J.; Davis, S.; Kong, N.; Kandathil, M. Evaluating item response theory linking and model fit for data from PISA 2000–2012. Assess. Educ. 2019, 26, 466–488.
31. R Core Team. R: A Language and Environment for Statistical Computing. 2023. Available online: https://www.R-project.org (accessed on 15 March 2023).
32. Robitzsch, A.; Kiefer, T.; Wu, M. TAM: Test Analysis Modules, R Package Version 4.2-21. 2024. Available online: https://cran.r-project.org/web/packages/TAM/index.html (accessed on 20 April 2024).
33. Robitzsch, A. sirt: Supplementary Item Response Theory Models, R Package Version 4.2-57. 2024. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 20 April 2024).
34. Kang, T.; Petersen, N.S. Linking item parameters to a base scale. Asia Pacific Educ. Rev. 2012, 13, 311–321.
35. Kilmen, S.; Demirtasli, N. Comparison of test equating methods based on item response theory according to the sample size and ability distribution. Procedia Soc. Behav. Sci. 2012, 46, 130–134.
36. Lee, W.C.; Ban, J.C. A comparison of IRT linking procedures. Appl. Meas. Educ. 2009, 23, 23–48.
37. Muraki, E. A generalized partial credit model: Application of an EM algorithm. Appl. Psychol. Meas. 1992, 16, 159–176.
38. Forero, C.G.; Maydeu-Olivares, A.; Gallardo-Pujol, D. Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Struct. Equ. Model. 2009, 16, 625–641.
39. Robitzsch, A. A comparison of limited information estimation methods for the two-parameter normal-ogive model with locally dependent items. Stats 2024, 7, 576–591.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).