Article

A Comprehensive Simulation Study of Estimation Methods for the Rasch Model

by Alexander Robitzsch 1,2
1 IPN—Leibniz Institute for Science and Mathematics Education, Olshausenstraße 62, 24118 Kiel, Germany
2 Centre for International Student Assessment (ZIB), Olshausenstraße 62, 24118 Kiel, Germany
Stats 2021, 4(4), 814-836; https://doi.org/10.3390/stats4040048
Submission received: 31 August 2021 / Revised: 27 September 2021 / Accepted: 28 September 2021 / Published: 1 October 2021

Abstract
The Rasch model is one of the most prominent item response models. In this article, different item parameter estimation methods for the Rasch model are systematically compared in a comprehensive simulation study: several variants of joint maximum likelihood (JML) estimation, several variants of marginal maximum likelihood (MML) estimation, conditional maximum likelihood (CML) estimation, and several limited information methods (LIM). The type of ability distribution (i.e., nonnormality), the number of items, the sample size, and the distribution of item difficulties were systematically varied. Across different simulation conditions, MML methods with flexible distributional specifications can be at least as efficient as CML. Moreover, in many situations (i.e., for long tests), penalized JML and JML with ε adjustment resulted in very efficient estimates and might be considered alternatives to the JML implementations currently used in statistical software. Furthermore, minimum chi-square (MINCHI) estimation was the best-performing LIM method. These findings demonstrate that JML estimation and LIM can still prove helpful in applied research.

1. Introduction

The Rasch model (RM [1,2,3]) is one of the most popular item response theory (IRT) models [4,5,6,7,8,9]. Because the RM is widespread in diverse applications (e.g., [10,11,12,13,14,15,16]), it is important to select appropriate estimation methods. A variety of estimation methods has been proposed for the RM. In this article, a comprehensive comparison of different estimation methods for the RM is conducted. We manipulate four factors: test length (i.e., the number of items), sample size, the type of ability distribution, and the distribution of item difficulties. From the results, recommendations for the choice of estimation methods can be drawn for empirical applications that utilize the RM.
The article is structured as follows. In Section 2, the RM is introduced. In Section 3, several estimation methods are reviewed. In Section 4, we present the results of a simulation study that compares a wide range of estimation methods. Finally, the paper closes with a discussion in Section 5.

2. Rasch Model

The RM [1,2,17,18,19,20,21,22,23,24,25,26,27] is a statistical model for dichotomous item responses $X_{pi}$ of persons $p = 1, \ldots, N$ on items $i = 1, \ldots, I$. It assumes the existence of a latent variable $\theta$ (so-called ability) that accounts for the dependence among item responses. The item response function of the Rasch model is given as
$$P(X_{pi} = x \mid \theta_p; b_i) = \frac{\exp\left(x(\theta_p - b_i)\right)}{1 + \exp(\theta_p - b_i)}, \quad x = 0, 1, \qquad (1)$$
where $\theta_p$ is the ability of person $p$ and $b_i$ is the item difficulty of item $i$. Abilities $\theta_p$ can be modeled either as fixed effects or as random effects [28,29]. In the fixed effects treatment, every person is associated with an ability parameter that has to be estimated. In the random effects treatment, a distribution is posed for the ability variable; that is, $\theta \sim G$, and the unknown distribution $G$ must be estimated in a parametric, semiparametric, or nonparametric way. Note that the RM parameters are determined only up to a constant. Hence, either the mean of the abilities or the mean of the item difficulties has to be fixed to zero for reasons of identification [30]. The posed functional form of the item response function (1) can be assessed by item fit statistics [31]. We would also like to emphasize that the RM places only low requirements on sample size because only one parameter per item (i.e., the item difficulty $b_i$) is estimated [9].
In addition to Equation (1), item responses $X_{pi}$ are assumed to be locally independent:
$$P(X_{p1} = x_1, \ldots, X_{pI} = x_I \mid \theta_p) = \prod_{i=1}^{I} P(X_{pi} = x_i \mid \theta_p). \qquad (2)$$
This means that no residual associations among items remain after taking the ability $\theta_p$ into account. Assumption (2) can be tested in empirical applications [32,33,34]. It can be argued that the unidimensionality assumption in Equation (2) is only a crude approximation to real data that nevertheless enables the extraction of a summary ability variable. The local independence assumption can then be understood as the assumption that residual associations among items cancel out on average; in practice, there will always exist positive and negative residual associations after controlling for the extracted ability variable $\theta$.
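To make Equations (1) and (2) concrete, the following R sketch simulates item responses from the RM. This is our own minimal illustration; the function name sim_rasch and its arguments are hypothetical choices, not code from the paper.

```r
# Minimal sketch: simulate dichotomous Rasch data using the item response
# function (1) and local independence (2). Names are illustrative only.
sim_rasch <- function(N, b, rtheta = rnorm) {
  theta <- rtheta(N)                          # draw abilities from a distribution G
  p <- plogis(outer(theta, b, "-"))           # P(X_pi = 1 | theta_p, b_i)
  (matrix(runif(N * length(b)), N) < p) * 1L  # independent responses given theta
}
set.seed(1)
X <- sim_rasch(N = 500, b = seq(-3, 3, length.out = 10))
colMeans(X)  # proportions correct decrease with item difficulty
```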
An important property of the Rasch model is that the sum score $S_p = \sum_{i=1}^{I} X_{pi}$ is a sufficient statistic for $\theta_p$ [30]. Hence, the ability estimate is a nonlinear function of the sum score $S_p$, and it does not matter for the computation of the ability which of the items have been solved by a person. Moreover, with at least a moderate number of items, the nonlinear relation of $S_p$ and $\theta_p$ can be closely approximated by a linear function, which explains the resemblance between classical test theory [35] and the RM [36]. Note also that the proportion correct for an item is a sufficient statistic for the item difficulty $b_i$.
In this article, the RM is treated as a mixed effects logistic model with a random person effect $\theta$, while the item difficulties $b_i$ are fixed effects [28,37,38,39,40,41]. The formulation of the RM as a mixed effects model has the advantage that item difficulties can alternatively be considered as random effects [28]. Moreover, more complex hierarchical structures (e.g., students nested within schools) can also be accommodated [39,42].
In the remainder of the paper, we only focus on the estimation of item parameters. We review several estimation methods for the RM in the next section.

3. Estimation Methods for the Rasch Model

A variety of estimation methods has been proposed for the RM [23,43,44]. In this section, we contrast joint maximum likelihood, conditional maximum likelihood, and marginal maximum likelihood estimation with limited information estimation methods.

3.1. Joint Maximum Likelihood (JML) Estimation

Joint maximum likelihood (JML [25,45]) methods treat person abilities $\theta_p$ as fixed effects. In JML, the vector of person parameters $\boldsymbol{\gamma} = (\theta_1, \ldots, \theta_N)$ is estimated simultaneously with the vector of item parameters $\boldsymbol{b} = (b_1, \ldots, b_I)$. Within each iteration, the JML algorithm alternates between estimating $\boldsymbol{\gamma}$ and $\boldsymbol{b}$. Note that the number of estimated parameters grows with the number of observations (i.e., the number of persons times the number of items). This property is known as the incidental parameter problem and has the undesirable consequence that JML estimates are not consistent [46,47,48]. However, several bias correction methods can be utilized to circumvent this issue. The different JML estimation variants are described in more detail in the following.

3.1.1. JML with Bias Correction (JMLM and JMLW)

As mentioned above, the JML estimation algorithm cycles between the steps of estimating person and item parameters. For persons who solved none or all of the items, no finite ability estimate $\theta_p$ exists. The JMLM method eliminates persons with these extreme scores from the JML estimation of item parameters. In contrast, the modified ability estimation method of Warm [49] can be used (JMLW); it yields finite ability estimates and does not require the elimination of persons from the analysis. Interestingly, the JMLW method can be interpreted as a Bayesian estimation method with a Jeffreys prior for abilities [50]. The bias due to incidental parameters can be corrected (or at least reduced) in JMLM and JMLW by a subsequent adjustment of estimated item parameters [51,52]. With item parameters $\hat{b}_i$ obtained from the alternating estimation approach, the bias-corrected item parameter is computed as $(I-1)/I \cdot \hat{b}_i$. Note that the adjustment becomes negligible with an increasing number of items $I$.
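A compact R sketch of this alternating scheme with elimination of extreme scores and the $(I-1)/I$ correction may help fix ideas. It is our own minimal implementation under simplifying assumptions (plain Newton steps, fixed iteration count), not the routine used in the simulation study.

```r
# Sketch of JMLM: alternate Newton steps for persons and items, drop persons
# with extreme scores, then apply the (I - 1)/I bias correction.
jmlm_rasch <- function(X, iters = 100) {
  s <- rowSums(X)
  X <- X[s > 0 & s < ncol(X), ]        # eliminate persons with extreme scores
  I <- ncol(X)
  theta <- qlogis(rowSums(X) / I)      # crude starting values
  b <- qlogis(1 - colMeans(X))         # assumes no item with extreme p-value
  for (it in seq_len(iters)) {
    P <- plogis(outer(theta, b, "-")); W <- P * (1 - P)
    theta <- theta + rowSums(X - P) / rowSums(W)   # person Newton step
    P <- plogis(outer(theta, b, "-")); W <- P * (1 - P)
    b <- b - colSums(X - P) / colSums(W)           # item Newton step
    b <- b - mean(b)                               # identification: mean(b) = 0
  }
  (I - 1) / I * b                                  # bias-corrected difficulties
}
```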

3.1.2. Penalized JML (PJML)

In penalized JML [53,54,55], a ridge penalty term with a regularization parameter $\lambda$ is added to the log-likelihood function; that is, a term $\mathrm{Pen}(\theta_p) = \lambda \theta_p^2$ is added to the person-specific log-likelihood. Including a ridge penalty is equivalent to a Bayesian approach in which a normal prior distribution $\theta \sim N(0, \sigma_{\mathrm{prior}}^2)$ with an appropriate choice of the regularization parameter $\sigma_{\mathrm{prior}} > 0$ is employed. PJML also circumvents the exclusion of persons with extreme scores from the estimation. The optimal choice of $\sigma_{\mathrm{prior}}$ will typically differ between situations in which the precision of person or of item parameter estimates should be optimized.
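In terms of the JML sketch above, the ridge penalty only changes the person update. Under the Bayesian reading stated here, a minimal sketch (our assumption: $\lambda = 1/(2\sigma_{\mathrm{prior}}^2)$, the usual ridge-prior correspondence) replaces the person Newton step as follows; persons with extreme scores no longer need to be removed.

```r
# Sketch: penalized person step inside jmlm_rasch(); X, P, W, theta as above.
# The prior N(0, sig2) adds -theta/sig2 to the score and 1/sig2 to the information.
sig2 <- 1.5^2   # sigma_prior = 1.5, one of the values studied later in Section 4.3
theta <- theta + (rowSums(X - P) - theta / sig2) / (rowSums(W) + 1 / sig2)
```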

3.1.3. JML with ε Adjustment (JMLε)

Another JML estimation approach that does not require eliminating persons with extreme scores is JML with ε adjustment (JMLε [56,57,58]). JMLε estimation employs a modified likelihood by replacing the sufficient statistic $S_p$ with a modified sufficient statistic $S_p^*$ defined by
$$S_p^* = \varepsilon + \frac{I - 2\varepsilon}{I} \cdot S_p, \qquad (3)$$
using an appropriate $\varepsilon > 0$. As a consequence, while $S_p$ takes values in the interval $[0, I]$, $S_p^*$ takes values in $[\varepsilon, I - \varepsilon]$, and the latter statistic results in finite ability estimates.
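As a sketch, the ε adjustment of Equation (3) amounts to changing two lines of the JML person step; variable names follow the jmlm_rasch() sketch above, and the value of eps is one of the settings examined later.

```r
# Sketch: epsilon-adjusted person step; X, P, W, theta as in jmlm_rasch().
# No persons are dropped; only the target of the estimating equation changes.
eps <- 0.24; I <- ncol(X)
s_star <- eps + (I - 2 * eps) / I * rowSums(X)        # Equation (3)
theta <- theta + (s_star - rowSums(P)) / rowSums(W)   # solves sum_i P_pi = s*_p
```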
Interestingly, the estimation methods PJML and JMLε tackle the issue of non-finite ability estimates from different angles. The original JML approach (i.e., JMLM, which excludes persons with extreme scores) seeks ability estimates $\theta_p$ that solve the estimating equation
$$S_p = f(\theta_p). \qquad (4)$$
The PJML method adds a penalty term $\mathrm{Pen}(\theta_p)$ to the right side of Equation (4); that is, $S_p = f(\theta_p) + \mathrm{Pen}(\theta_p)$. The JMLε method changes the left side of Equation (4), resulting in the modified estimating equation $S_p^* = f(\theta_p)$.

3.2. Conditional Maximum Likelihood (CML) Estimation

Conditional maximum likelihood (CML [43,59,60,61,62]) estimation can handle situations in which the ability variable is treated as either fixed or random. In CML estimation, only the vector of item difficulties $\boldsymbol{b}$ is estimated. The ability variable $\theta$ is removed from the estimation by conditioning on the sum score. One can show that the conditional distribution of the item responses $\boldsymbol{X}_p$ given the sum score $S_p = \sum_{i=1}^{I} X_{pi}$ does not depend on $\theta_p$ [30]:
$$P(\boldsymbol{X}_p = \boldsymbol{x}_p \mid S_p = s_p) = h(\boldsymbol{b}). \qquad (5)$$
Hence, no distributional assumption about the ability variable has to be posed. In addition, item parameter estimates are consistent. CML estimation is computationally more demanding than JML, but efficient algorithms have been proposed [63,64].
CML estimation has also been discussed for mixed effects logistic models [65,66,67].
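The conditional likelihood can be written in terms of elementary symmetric functions of the item "easiness" parameters $\epsilon_i = \exp(-b_i)$. The following self-contained R sketch, our own illustration rather than the immer implementation, computes them with the classical summation algorithm and maximizes the conditional log-likelihood numerically.

```r
# Sketch of CML estimation: gamma_s are elementary symmetric functions of the
# easiness parameters exp(-b_i), computed by the summation algorithm.
esf <- function(eps) {               # returns gamma_0, ..., gamma_I
  g <- c(1, rep(0, length(eps)))
  for (e in eps) g <- g + c(0, head(g, -1)) * e
  g
}
cml_negloglik <- function(bfree, X) {
  b <- c(0, bfree)                   # identification: fix b_1 = 0, recenter later
  g <- esf(exp(-b))
  s <- rowSums(X)
  sum(X %*% b) + sum(log(g[s + 1]))  # negative conditional log-likelihood
}
fit <- optim(rep(0, ncol(X) - 1), cml_negloglik, X = X, method = "BFGS")
b_cml <- c(0, fit$par); b_cml <- b_cml - mean(b_cml)  # mean-centered difficulties
```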

3.3. Marginal Maximum Likelihood (MML) Estimation

In marginal maximum likelihood estimation (MML [68,69]), the latent variable $\theta$ is integrated out by posing a distributional assumption $G_{\boldsymbol{\gamma}}$ for $\theta$, where the distribution parameters $\boldsymbol{\gamma}$ are estimated simultaneously with $\boldsymbol{b}$. The log-likelihood function $l(\boldsymbol{b}, \boldsymbol{\gamma})$ is maximized. The log-likelihood contribution of person $p$ is given by
$$l_p(\boldsymbol{b}, \boldsymbol{\gamma}) = \log \int \prod_{i=1}^{I} P(X_{pi} = x_{pi} \mid \theta; b_i) \, dG_{\boldsymbol{\gamma}}(\theta). \qquad (6)$$
If the parametric specification $G_{\boldsymbol{\gamma}}$ differs from the data-generating distribution $H$, biased item parameter estimates can occur.
In the following subsections, different distributional specifications in MML are discussed. These MML variants differ in how deviations from normally distributed abilities are handled (see [70,71]).

3.3.1. MML with Normality Assumption (MMLN)

In most applications, and as the default of most IRT software packages [72,73], a normal distribution for $\theta$ is posed (MMLN). For identification of the parameters in the RM, the mean is set to zero, and the standard deviation $\sigma$ is estimated along with the item parameters $\boldsymbol{b}$. The integral in the log-likelihood function (6) is evaluated by numerical integration techniques. The consequences of applying a misspecified normal distribution have been frequently studied in the literature [74,75,76,77].
Different numerical approximations of the unidimensional integral involved in the likelihood function (see Equation (6)) have been proposed in the literature [78,79,80]. In our experience, the numerical approximations defined as defaults in IRT packages such as mirt [72] occasionally provide more accurate results than the corresponding defaults in the popular mixed effects R package lme4 [37].
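A minimal sketch of MMLN for the RM, with the integral in Equation (6) replaced by a fixed grid with normal weights, is given below. This is our own illustration under assumed settings (grid range and size, log-parameterized SD), not the implementation used in the study.

```r
# Sketch of MMLN: marginal log-likelihood on a discretized N(0, sigma^2) grid.
mmln_negloglik <- function(par, X, nodes = seq(-6, 6, length.out = 61)) {
  I <- ncol(X); b <- par[1:I]; sigma <- exp(par[I + 1])   # SD on the log scale
  w <- dnorm(nodes, 0, sigma); w <- w / sum(w)            # discretized prior
  A <- outer(nodes, b, "-")                               # theta_c - b_i
  lp <- tcrossprod(X, A) - rep(1, nrow(X)) %o% rowSums(log1p(exp(A)))
  -sum(log(exp(lp) %*% w))                                # Equation (6), all persons
}
fit <- optim(c(rep(0, ncol(X)), 0), mmln_negloglik, X = X, method = "BFGS")
b_mml <- fit$par[1:ncol(X)] - mean(fit$par[1:ncol(X)])    # centered for comparability
```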

3.3.2. MML with Multinomial Distribution (MMLMN)

MML with a multinomial distribution (MMLMN) estimates a discrete distribution for the ability variable $\theta$ (see [81]). A fixed grid of points $\theta_1, \ldots, \theta_C$ is chosen (e.g., a grid of equidistant $\theta$ points ranging from −4 to 4; see [82]). In MMLMN, the probabilities $\gamma_c = P(\theta = \theta_c)$ are freely estimated. The number of estimated parameters increases with a larger number $C$ of grid points, so an appropriate $C$ must be chosen to ensure sufficiently stable estimation of the item and distribution parameters.

3.3.3. MML with Log-Linear Smoothing (MMLLS)

MML with log-linear smoothing (MMLLS) avoids estimating the large number of distribution parameters of MMLMN. In this estimation method, log-linear smoothing is applied to the discrete probabilities $\gamma_c = P(\theta = \theta_c)$ [82,83] (see also [84,85]). If only the first two moments are smoothed, MMLLS corresponds to the estimation of a discretized normal distribution. In empirical applications, smoothing is typically performed up to the first three or four moments [86,87,88]; the higher moments capture deviations from normality. The log-linear smoothing approach can also be extended to handle nonlinear relations among several latent variables [86].
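The smoothing step itself is simple. A hedged sketch of how the class probabilities can be parameterized follows; the grid and coefficient values are arbitrary illustrations, and one coefficient is used per smoothed moment.

```r
# Sketch of log-linear smoothing: probabilities on a fixed grid are a log-linear
# function of powers of the grid points (M coefficients <-> M smoothed moments).
theta_grid <- seq(-4, 4, length.out = 15)
loglin_probs <- function(beta) {
  eta <- outer(theta_grid, seq_along(beta), `^`) %*% beta
  as.vector(exp(eta) / sum(exp(eta)))   # normalized probabilities gamma_c
}
round(loglin_probs(c(0, -0.5, 0.1)), 3)  # three moments: allows skewness
```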

3.3.4. MML with Located Latent Classes (MMLLC)

The estimation methods MMLMN and MMLLS presuppose the specification of a discrete grid of $\theta$ points. In MML with located latent classes (MMLLC; [89,90,91,92]), the values of the grid points $\theta_c$ for $C$ latent classes are estimated in addition to the probabilities $\gamma_c$. In the RM with a test of $I$ items, at most $I/2$ located classes can be specified because model parameters in larger models are not identified [89]. Notably, MMLLC poses the weakest assumptions about the data-generating distribution $G$, but it relies on a possibly doubtful discrete representation of the $\theta$ distribution. Classifying persons into different discrete ability levels might nevertheless be conceptually appealing in empirical applications [93,94].

3.4. Limited Information Estimation Methods

So-called limited information methods (LIM) for estimating item parameters in the RM do not rely on the full item response pattern $\boldsymbol{x}_p$. These methods are often simpler to compute because they do not have to iterate through all item response patterns; LIM consider only marginal univariate or bivariate frequency distributions of the item responses.

3.4.1. Pairwise Marginal Maximum Likelihood (PMML)

Pairwise MML (PMML [95,96,97,98]) is a composite likelihood estimation method in which only the pairwise item response probabilities $P(X_{pi} = x_{pi}, X_{pj} = x_{pj})$ are modeled. The contributions of all item pairs $(i, j)$ are taken into account. In principle, any distributional assumption about $\theta$ can be posed, just as in full MML estimation; however, a normal distribution is typically assumed [95,99].

3.4.2. Pairwise Conditional Maximum Likelihood (PCML)

In pairwise CML (PCML [100,101,102,103,104]), the conditional probabilities $P(X_{pi} = x_{pi}, X_{pj} = x_{pj}) / P(X_{pi} + X_{pj} = x_{pi} + x_{pj})$ are used to define an optimization function. Like CML, which conditions on the sum score, PCML removes $\theta$ from the estimating equations and does not pose distributional assumptions. The advantage of PCML over CML is its strongly reduced computational demand.
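Under the RM, the conditioning makes the pairwise probabilities depend on the item parameters only: $P(X_{pi} = 1 \mid X_{pi} + X_{pj} = 1) = 1/(1 + \exp(b_i - b_j))$. A sketch of the resulting objective, written by us for illustration:

```r
# Sketch of the PCML objective: condition on X_i + X_j = 1 for every item pair;
# n[i, j] counts persons with item i solved and item j not solved.
pcml_negloglik <- function(bfree, X) {
  b <- c(0, bfree)                                    # fix b_1 = 0 for identification
  n <- crossprod(X, 1 - X)                            # n[i, j] = #{X_i = 1, X_j = 0}
  p <- plogis(outer(b, b, function(bi, bj) bj - bi))  # P(X_i = 1 | X_i + X_j = 1)
  -sum(n * log(p))                                    # pairwise conditional log-lik.
}
fit <- optim(rep(0, ncol(X) - 1), pcml_negloglik, X = X, method = "BFGS")
```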

3.4.3. Minimum Chi-Square Method (MINCHI)

Minimum chi-square (MINCHI) estimation relies only on the bivariate frequencies $f_{ij}$ defined as
$$f_{ij} = P(X_{pi} = 1, X_{pj} = 0). \qquad (7)$$
In MINCHI, item parameter estimates $\boldsymbol{b}$ are determined by minimizing the squared distance (see [30,105,106])
$$h(\boldsymbol{b}) = \sum_{i,j} \frac{\left(\epsilon_j^{-1} f_{ij} - \epsilon_i^{-1} f_{ji}\right)^2}{\epsilon_i^{-1} \epsilon_j^{-1} \left(f_{ij} + f_{ji}\right)}, \qquad (8)$$
where $\epsilon_i = \exp(b_i)$. Fixed-point equations have been proposed for computing the minimizer of Equation (8) (see [30]). Note also that no distributional assumptions about $\theta$ are required for MINCHI estimation.
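A direct R transcription of the discrepancy in Equation (8) is shown below; this is our own sketch, using a general-purpose optimizer rather than the fixed-point equations, and Fhat denotes the observed relative bivariate frequencies.

```r
# Sketch of the MINCHI criterion in Equation (8); Fhat[i, j] estimates
# f_ij = P(X_i = 1, X_j = 0), and eps_i = exp(b_i).
minchi_h <- function(b, Fhat) {
  e_inv <- exp(-b)                          # epsilon_i^{-1}
  A <- sweep(Fhat, 2, e_inv, "*")           # A[i, j] = eps_j^{-1} * f_ij
  num <- (A - t(A))^2                       # numerator of Equation (8)
  den <- outer(e_inv, e_inv) * (Fhat + t(Fhat))
  off <- row(Fhat) != col(Fhat)             # skip the undefined diagonal
  sum(num[off] / den[off])                  # assumes all item pairs observed
}
Fhat <- crossprod(X, 1 - X) / nrow(X)       # observed bivariate frequencies
fit <- optim(rep(0, ncol(X)), minchi_h, Fhat = Fhat, method = "BFGS")
b_minchi <- fit$par - mean(fit$par)         # h is location invariant; center
```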

3.4.4. Row Averaging Method (RA)

Like MINCHI estimation, the row averaging method [107,108,109] relies on the bivariate frequencies $f_{ij}$ (see Equation (7)). A matrix $\boldsymbol{B}$ with entries $b_{ij} = \log(f_{ij} / f_{ji})$ is formed, and the row-wise average of the entries of $\boldsymbol{B}$ is used as an item parameter estimate [107,110]. If some cells $(i, j)$ are empty, $\boldsymbol{B}$ cannot be computed. For this case, an alternative estimation method involving matrix powers has been proposed. Let $\boldsymbol{F}$ denote the matrix consisting of all elements $f_{ij}$ (the so-called incidence matrix). The computation of $\boldsymbol{B}$ can then rely on the entries of the matrix $\boldsymbol{F}^* = \boldsymbol{F}^k$, where $k$ is an integer larger than one (e.g., 2 or 3).
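A sketch of the row averaging computation follows (our own code, not a published implementation). One caveat of our convention: with $f_{ij}$ as defined in Equation (7), the centered row means of $\boldsymbol{B}$ recover $-b_i$, so a sign flip yields the difficulties.

```r
# Sketch of the RA estimator with optional matrix powers F* = F^k.
ra_estimate <- function(Fhat, k = 1) {
  Fk <- Reduce(`%*%`, rep(list(Fhat), k))  # F^k; k = 2 or 3 fills empty cells
  B <- log(Fk / t(Fk))                     # B[i, j] = log(f*_ij / f*_ji)
  b <- -rowMeans(B)                        # row average (sign: see lead-in)
  b - mean(b)                              # mean-centered difficulties
}
ra_estimate(Fhat, k = 2)
```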

3.4.5. Eigenvector Method (EVM)

The eigenvector method (EVM [111,112,113,114]) relies on the same preprocessing steps as RA. However, instead of row averaging, the first eigenvector of $\boldsymbol{B}$ is computed as the estimate of the vector of item difficulties. In the case of empty cells, power matrices $\boldsymbol{F}^* = \boldsymbol{F}^k$ (see Section 3.4.4) can be used. Note that RA and EVM do not require iterations and have low computational demands, which might be attractive for large-scale applications.

3.4.6. Log-Linear by Linear Association Models (LLLA)

Log-linear by linear association (LLLA [115,116,117,118]) models estimate item parameters through a pseudo-likelihood approach. The approach relies on the fact that a logistic regression of the item response $X_{pi}$ on the rest score $S_p - X_{pi}$, i.e., for $P(X_{pi} = 1 \mid S_p - X_{pi})$, can be specified in which $\theta$ does not appear (assuming a normal distribution of $\theta$; see [119]). The logistic regression stacks the data of all item responses and allows the simultaneous estimation of all item parameters [118,120].
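A hedged sketch of the stacked logistic regression is given below; this is our own construction, not the plRasch implementation, and it treats the recovered item intercepts as $-b_i$ up to centering.

```r
# Sketch of LLLA: stack all item responses and regress on an item factor plus
# the rest score r_pi = S_p - X_pi; theta drops out of this conditional model.
llla_estimate <- function(X) {
  N <- nrow(X); I <- ncol(X)
  dat <- data.frame(y    = as.vector(X),
                    item = factor(rep(seq_len(I), each = N)),
                    rest = as.vector(rowSums(X) - X))   # rest scores, stacked
  fit <- glm(y ~ 0 + item + rest, family = binomial, data = dat)
  b <- -coef(fit)[seq_len(I)]    # item intercepts ~ -b_i (up to centering)
  b - mean(b)
}
```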

4. Simulation Study

4.1. Purpose

Many simulation studies have compared the performance of different item parameter estimation methods for the RM. However, most studies only considered the main estimation methods CML, JML, and MML (see, e.g., [44,77,110,121] for the RM and [122] for the mixed effects logistic model). Moreover, they often used only limited variations of deviations from normality in the ability variable. In this study, we provide a comprehensive simulation study that compares the performance of a large number of estimation methods under a wide range of $\theta$ distributions. Moreover, sample size, the number of items, and the distribution of item difficulties are systematically manipulated. This simulation study systematically extends the simulation design employed in [123].

4.2. Design

In the simulation study, item response data were generated from the RM. We manipulated five factors. First, the sample size ($N$) of persons was varied with three levels, $N = 250, 500, 1000$; these sample sizes reflect small-scale to large-scale applications of the RM. Second, we varied the number of items ($I$) with levels $I = 10$ and $I = 30$, reflecting a short and a long test. A set of item difficulties was specified for the $I = 10$ condition; in the $I = 30$ condition, the item parameters for $I = 10$ were used three times. Third, the range of item difficulties was manipulated: for a symmetric item difficulty distribution, item parameters were chosen from the interval $[-3, 3]$ for the wide range and from $[-1.5, 1.5]$ for the small range. Fourth, the skewness of the item difficulty distribution was varied. In the symmetric case, item parameters were chosen equidistantly from the intervals $[-3, 3]$ and $[-1.5, 1.5]$, respectively; in the skew case, large item difficulties appear more frequently than small item difficulties. The precise item parameters can be found in Appendix A. Fifth, eight data-generating distributions for the latent ability variable $\theta$ in the RM were simulated. All distributions were standardized; that is, $E(\theta) = 0$ and $SD(\theta) = 1$. The eight simulated $\theta$ distributions are:
  • NO: a normal distribution $N(0, 1)$ with zero mean and a standard deviation of one
  • Chi2: a scaled chi-squared distribution with one degree of freedom
  • UN: a uniform distribution on the interval $[-1.73, 1.73]$ (i.e., $U(-1.73, 1.73)$)
  • BE: a scaled U-shaped beta distribution with shape parameters of 0.5; that is, $\theta \sim 2.83 \cdot (\mathrm{Beta}(0.5, 0.5) - 0.5)$
  • SM: a symmetric mixture distribution with $\theta = 0.898 \cdot \theta^*$ and $\theta^* \sim 0.5 \cdot N(-0.8, 0.77^2) + 0.5 \cdot N(0.8, 0.77^2)$
  • AM: an asymmetric mixture distribution with $\theta = 0.994 \cdot (\theta^* - 0.479)$ and $\theta^* \sim 0.2 \cdot N(-0.8, 0.77^2) + 0.8 \cdot N(0.8, 0.77^2)$
  • LC2: a discrete distribution with $\theta$ points −2.0 and 0.5 and corresponding probabilities 0.20 and 0.80
  • LC3: a discrete distribution with $\theta$ points −0.790, 1.033, and 2.248 and corresponding probabilities 0.60, 0.35, and 0.05
In total, 3 × 2 × 2 × 2 × 8 = 192 conditions were employed in the simulation, and 1000 datasets were simulated and analyzed in each condition.

4.3. Analysis Models

Item parameters for the simulated datasets were estimated with the different methods discussed in Section 3. Throughout the simulation, we only considered the estimation of item parameters and did not consider person parameter estimation. To enable the comparability of item parameter estimates, we centered estimated item parameters obtained from each estimation method (i.e., they have zero mean).
For PJML estimation (see Section 3.1.2), we chose normal priors $N(0, \sigma_{\mathrm{prior}}^2)$ with $\sigma_{\mathrm{prior}} = 1$, 1.5, and 2. Notably, an optimal value of $\sigma_{\mathrm{prior}}$ could also be estimated by cross-validation or empirical Bayes methods. For JMLε estimation (see Section 3.1.3), we specified values $\varepsilon = 0.1, 0.2, 0.24, 0.3, 0.4, 0.5$; the value $\varepsilon = 0.24$ turned out to be optimal in preliminary simulation studies. For MMLMN estimation (see Section 3.3.2), we specified models with 5 equidistant $\theta$ grid points in $[-2, 2]$, 7 equidistant grid points in $[-3, 3]$, 11 equidistant grid points in $[-4, 4]$, and 15 equidistant grid points in $[-4, 4]$. For MMLLS estimation (see Section 3.3.3), we used log-linear smoothing up to three and four moments; the inclusion of moments beyond the second allows deviations from normality. An equidistant grid of 15 ability values in $[-4, 4]$ was chosen. For MMLLC (see Section 3.3.4), we specified analyses with 2, 3, 4, and 5 located latent classes. Notably, the data-generating models LC2 and LC3 are expected to be properly handled by one of these models. For RA estimation (see Section 3.4.4), we used powers 1, 2, and 3 of the incidence matrix $\boldsymbol{F}$ (i.e., $\boldsymbol{F}^* = \boldsymbol{F}^k$ with $k = 1, 2, 3$ as the basis for the computation of the matrix $\boldsymbol{B}$). For EVM estimation (see Section 3.4.5), powers 2 and 3 of the incidence matrix $\boldsymbol{F}$ were utilized.
The whole simulation was carried out in the statistical software R [124] utilizing the R packages immer [125] (CML, JML ε ), pairwise [126] (LLLA), plRasch [118,127] (RA) and sirt [128] (EVM, JMLM, JMLW, MINCHI, MMLLC, MMLLS, MMLMN, MMLN, PCML, PJML). For PMML, a dedicated function was implemented in R.

4.4. Outcome Measures

The bias and root mean square error (RMSE) were computed for each estimated item parameter $\hat{b}_i$. We consider two summary measures of item parameter recovery. First, the mean absolute bias (MAB; also labeled as bias in the Results Section 4.5)
$$\mathrm{MAB}(\hat{\boldsymbol{b}}) = \frac{1}{I} \sum_{i=1}^{I} \left|\mathrm{Bias}(\hat{b}_i)\right| \qquad (9)$$
quantifies the average bias of the item parameters. MAB values near zero indicate unbiased item parameter estimates.
Second, bias and variability are summarized in the average RMSE (ARMSE), defined by
$$\mathrm{ARMSE}(\hat{\boldsymbol{b}}) = \frac{1}{I} \sum_{i=1}^{I} \mathrm{RMSE}(\hat{b}_i). \qquad (10)$$
To ease the comparison of estimation methods independently of sample size, ARMSE values are normed with respect to the best-performing estimation method (with a corresponding value $\mathrm{ARMSE}_{\mathrm{best}}(\hat{\boldsymbol{b}})$) in each cell of the simulation. The so-called relative RMSE (RRMSE) is defined as
$$\mathrm{RRMSE}(\hat{\boldsymbol{b}}) = 100 \cdot \frac{\mathrm{ARMSE}(\hat{\boldsymbol{b}})}{\mathrm{ARMSE}_{\mathrm{best}}(\hat{\boldsymbol{b}})}. \qquad (11)$$
As a consequence, RRMSE values have an optimal value of 100, which is attained by the best-performing estimation method.
To summarize the contribution of each manipulated factor in the simulation, we conducted an analysis of variance (ANOVA) based on a linear regression model and used a variance decomposition to assess factor importance (i.e., we computed eta-squared effect sizes).
Moreover, we classified whether an estimation method showed acceptable performance in a particular condition. Performance with respect to bias was defined as acceptable if the MAB was smaller than 0.025. Assuming a symmetric item difficulty distribution and a bias proportional to the true item difficulty, this condition corresponds to a maximum item parameter bias of about 0.05. An estimator had satisfactory performance with respect to the RMSE if the relative RMSE was smaller than 107, which is equivalent to an average loss in precision of estimated item parameters of about 15% (i.e., $1.07^2 \approx 1.145$).
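For concreteness, these outcome measures can be computed per simulation cell with a few lines of R. This is our own sketch with assumed inputs: est is a replications × items matrix of centered item parameter estimates, and b_true holds the data-generating item difficulties.

```r
# Sketch of the outcome measures in Equations (9)-(11); inputs are assumptions.
mab   <- function(est, b_true) mean(abs(colMeans(est) - b_true))               # Eq. (9)
armse <- function(est, b_true) mean(sqrt(colMeans(sweep(est, 2, b_true)^2)))   # Eq. (10)
rrmse <- function(armse_all) 100 * armse_all / min(armse_all)                  # Eq. (11)
```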

4.5. Results

In Table 1, the variance decomposition from the ANOVA of the simulation factors for bias and (relative) RMSE is presented. All terms up to three-way interactions were included. From the size of the residual variance, it can be concluded that effects up to the third order capture the most important sources of variance. The estimation method (Meth) was the most important first-order factor for bias and RMSE, followed by the range of item difficulties (Range) and the number of items (I). The performance of the estimation methods for bias and RMSE depended on an interaction with the number of items. Interestingly, there was an interaction of estimation method and sample size (N) only for the RMSE and not for the bias. Moreover, the performance of the estimation methods was also moderated by the range of the item difficulty distribution. Finally, there were also some important three-way interaction terms involving the estimation method (i.e., N × I × Meth, N × Range × Meth, I × Range × Meth). For a selected number of cells, we present results that demonstrate these interaction effects in more detail.
In Table 2, the performance of the different estimation methods for bias and RMSE is summarized across the 192 conditions of the simulation. CML and the LIM MINCHI (the best estimation method with respect to bias), PCML, EVM, and RA (with powers of the incidence matrix larger than 1) are approximately unbiased across simulation conditions. Among the MML estimation methods, only those that specified the ability distribution flexibly enough were unbiased. For multinomial modeling (MMLMN), a large number of $\theta$ grid points (11 or 15; MMLMN(11) or MMLMN(15)) was needed to produce acceptable performance in most of the simulation conditions. At least three located latent classes (MMLLC) were needed for an acceptable estimation of item parameters with respect to bias. Notably, estimation under the normal distribution (MMLN) or with only two latent classes (MMLLC(2)) was unsuccessful in a variety of conditions. Furthermore, all JML variants showed biased item parameter estimates; PJML with a prior of $\sigma_{\mathrm{prior}} = 1.5$ turned out to be the best-performing JML method with respect to bias throughout all simulation conditions. Interestingly, the method JMLM, which eliminates persons from estimation, resulted in a smaller bias than JMLW, which does not remove persons.
For the RMSE, JMLε with $\varepsilon = 0.24$ performed best. As for bias, only sufficiently flexible distributional specifications in MML resulted in acceptable RMSE performance. For log-linear smoothing (MMLLS), using four instead of only three moments was indicated. Again, located latent class models (MMLLC) produced relatively precise item parameter estimates that were even superior to those obtained from CML. LIM showed higher RMSE values than the MML variants and CML. However, the best-performing LIM, MINCHI, outperformed MML with the normal distribution assumption (MMLN) and the widely implemented JML variants JMLM and JMLW. In particular, MINCHI (and partly PCML) should be preferred over EVM and RA estimation.
Table 3 shows the bias and the RMSE of the different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties, as a function of the data-generating trait distribution. Six out of the eight data-generating models are depicted; they demonstrate the most important differences among the estimation methods. The MML method posing a normal distribution (MMLN) provides the least bias if the latent ability was generated from a normal distribution; the largest bias was obtained if a located latent class model (LC2 or LC3) generated the data. If $\theta$ was normally distributed, log-linear smoothing (MMLLS) and a multinomial distribution (MMLMN) with at least 7 grid points also provided approximately unbiased estimates. Notably, located latent class models (MMLLC) have slightly increased bias, but their efficiency with respect to the RMSE is even higher than that of MMLN.
LIM were unbiased or had only small biases, except when $\theta$ was generated from located latent classes and PMML or LLLA estimation was used. This finding can be explained by the fact that these two estimation methods rely on the then-incorrect normal distribution assumption. Moreover, note that using powers 2 or 3 of the incidence matrix in RA (and EVM) improved the estimates with respect to bias and, to a larger extent, RMSE. The estimation methods PCML and MINCHI outperformed EVM and RA in terms of the RMSE; the additional computational burden of the iterative methods PCML and MINCHI compared to EVM and RA might be acceptable in practical applications.
The results in Table 3 also indicate that flexible MML estimation methods are competitive with CML estimation for nonnormally distributed abilities. Among the JML estimation methods, the bias of PJML with prior $\sigma_{\mathrm{prior}} = 1.5$ (i.e., PJML(1.5)) was smallest. Across data-generating models, the RMSE for this estimation method was smallest in three of the six data constellations, while in the other three constellations, JMLε with $\varepsilon = 0.24$ performed best. However, it should also be emphasized that JMLε(0.24) introduced non-negligible bias in the item parameters. The JMLW estimation method is preferred over JMLM in terms of bias and RMSE, but both methods were strongly inferior to PJML and JMLε.
Table A1 in Appendix B shows the bias and the RMSE of the different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties, as a function of the data-generating trait distribution. In general, biases in estimated item parameters were smaller than with a wide range of item difficulties (presented in Table 3). This finding illustrates that item parameters with large true $|b_i|$ values (i.e., extremely easy or extremely difficult items) are more prone to bias than item difficulties close to zero. It is also evident that the differences of the CML and MML methods from LIM were much smaller. Consequently, practitioners might prefer LIM for tests in which item difficulties do not take extreme values. In this case, the differences of PCML and MINCHI from RA and EVM also turned out to be smaller, although the former two methods might still be preferred. Interestingly, and in contrast to a test with a wide range of item difficulties, JMLM outperformed JMLW. Moreover, note that PJML(1.5) resulted in less biased and more precise estimates than JMLε(0.24).
Table A2 in Appendix B shows the bias and the RMSE of the different estimation methods for a sample size of N = 250 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties, as a function of the data-generating trait distribution. For the smaller sample size of N = 250 (compared to N = 1000 in Table 3), JMLε(0.24) consistently resulted in the smallest RMSE but produced slightly biased estimates. Surprisingly, the more flexible MML estimation methods were not clearly inferior to MMLN when $\theta$ followed a normal distribution. In particular, the located latent class model MMLLC(2) provided the most precise estimates among the MML methods for the sample size N = 250. Also note that CML was not superior to all MML methods. Interestingly, the performance of LIM relative to MML and CML was even worse than in larger samples. Researchers should probably opt for the computationally simpler LIM only for sufficiently large sample sizes and tests with a smaller range of item difficulties.
Table 4 shows the bias and the RMSE of the different estimation methods for a sample size of N = 250 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties, as a function of the data-generating trait distribution. For a longer test containing 30 items, biases for all estimation methods were substantially smaller than for 10 items. Notably, the differences among estimation methods also turned out to be small. For example, assuming a misspecified normal distribution (MMLN) introduced only small biases. For a sufficiently long test, the differences of the MML methods and CML from LIM were only modest, and practitioners might opt for the computationally simpler methods in this case. Again, PCML and MINCHI performed slightly better than the EVM and RA methods. It is also important to note that JMLε required a larger $\varepsilon$ value of 0.4 or 0.5, compared to a short test, to realize maximum precision.
Finally, Table A3 in Appendix B shows the bias and the RMSE of the different estimation methods for a sample size of N = 1000 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties, as a function of the data-generating trait distribution. Again, flexible MML specifications can compete with JML. PJML and JMLε(0.24) produced highly precise estimates, while their bias was almost negligible. It should be emphasized that these JML variants are at least as efficient as the CML or MML variants. LIM resulted in more variable estimates; however, PCML and MINCHI almost achieved the efficiency of JMLε or the MML variants. Surprisingly, PMML produced biased item parameter estimates and might not be recommended. Further research is needed to determine whether this observation is due to the particular implementation used here.

5. Discussion

In this article, we compared several estimation methods for the Rasch model. It was shown that the choice of the ability distribution impacts the precision of estimated item parameters. The differences between estimation methods are larger for shorter (i.e., 10 items) than for longer (i.e., 30 items) tests. It turned out that MML with a flexible distribution handles a nonnormally distributed trait well and can compete with CML. Interestingly, the JML variants PJML and JMLε outperformed conditional and marginal maximum likelihood as well as LIM in many situations in terms of the RMSE. Moreover, these improved JML methods resulted in approximately unbiased estimates for long tests and larger sample sizes. These findings could stimulate research into using the JML methods PJML and JMLε instead of the widely implemented JMLM or JMLW variants. LIM are attractive for practitioners because they are not computationally demanding; among them, PCML and MINCHI outperformed the more widely used EVM and RA estimation methods.
Future research could investigate item parameter estimation in the RM for very short scales (e.g., I = 5 or I = 7 items); we suppose that the differences among methods will be even larger in this situation. Moreover, the optimal tuning parameters $\sigma_{\mathrm{prior}}$ in PJML and $\varepsilon$ in JMLε as functions of sample size, number of items, and item difficulty distribution remain to be determined. We expect that tuning parameters that are optimal for individual ability estimates do not necessarily coincide with those that are optimal with respect to the RMSE of estimated item parameters.
Throughout the simulation study, we assumed that the RM holds in the data. However, there might be situations in which the RM is intentionally used as a misspecified IRT model [129]. First, the two-parameter logistic model [130] might have generated the item responses while the misspecified RM is used as the fitting model. In the case of misspecified IRT models, different estimation functions quantify model deviations differently (see also [110]). Future research might evaluate the robustness of the estimation methods against violations of the RM. Notably, any estimation method defines its own set of item difficulties in the population of students because the estimated difficulties are determined by the particular discrepancy function between the posed (misspecified) RM and a true IRT model that might involve very complex item response functions. Second, local dependence [34] is also often found in empirical data. The LIM MINCHI, PCML, EVM, and RA rely only on bivariate frequencies and not on the full item response patterns. According to the author's preliminary experience, these methods can then result in less biased item parameter estimates than CML, MML, or JML methods. Studying the effects of local dependence on the different estimation methods in the RM might be an exciting topic for future research.
In many applications, missing item responses occur [131,132,133]. The estimation methods studied in this article can be expected to remain applicable in situations in which data are missing completely at random (MCAR); it would be interesting to investigate whether MML has increased efficiency for MCAR data compared to LIM. For data that are missing at random, LIM (and ordinary CML) will likely fail (see [134]), and MML or JML will usually be preferred (see [87,135]).
Finally, we only considered frequentist estimation methods. In Bayesian estimation, prior distributions for the item parameters can be included in the analysis, which can further stabilize the estimation of item difficulties [136,137,138,139,140,141]. We note that normally distributed priors correspond to adding a penalty function to the estimation function. Such penalties can be used not only for MML or JML but also for CML [142] or LIM [143].

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CML = conditional maximum likelihood
EVM = eigenvector method
IRT = item response theory
JML = joint maximum likelihood
JMLε = JML with ε adjustment
JMLM = JML with maximum likelihood ability estimator
JMLW = JML with Warm’s maximum likelihood ability estimator
LIM = limited information methods
LLLA = log-linear by linear association method
MAB = mean absolute bias
MCAR = missing completely at random
MML = marginal maximum likelihood
MMLLC = MML with located latent classes
MMLLS = MML with log-linear smoothing
MMLMN = MML with multinomial distribution
MMLN = MML with normal distribution
MINCHI = minimum chi-square estimation
PJML = penalized JML
PMML = pairwise MML
PCML = pairwise CML
RA = row-averaging method
RM = Rasch model
RMSE = root mean square error

Appendix A. Item Parameters Used in the Simulation Study

The following sets of 10 item difficulties were used in the simulation study:
wide range of difficulties, symmetric difficulty distribution:
−3.000, −2.333, −1.667, −1.000, −0.333, 0.333, 1.000, 1.667, 2.333, 3.000
wide range of difficulties, asymmetric difficulty distribution:
−2.111, −2.037, −1.815, −1.444, −0.926, −0.259, 0.555, 1.518, 2.630, 3.889
small range of difficulties, symmetric difficulty distribution:
−1.500, −1.167, −0.833, −0.500, −0.167, 0.167, 0.500, 0.833, 1.167, 1.500
small range of difficulties, asymmetric difficulty distribution:
−1.055, −1.019, −0.907, −0.722, −0.463, −0.130, 0.278, 0.759, 1.315, 1.945
In the simulation condition with 30 items, each of the item difficulties is used three times. For example:
wide range of difficulties, symmetric difficulty distribution:
−3.000, −2.333, −1.667, −1.000, −0.333, 0.333, 1.000, 1.667, 2.333, 3.000, −3.000, −2.333, −1.667, −1.000, −0.333, 0.333, 1.000, 1.667, 2.333, 3.000, −3.000, −2.333, −1.667, −1.000, −0.333, 0.333, 1.000, 1.667, 2.333, 3.000

Appendix B. Additional Results for Simulation Study

Table A1 shows the bias and the RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Table A2 shows the bias and the RMSE for different estimation methods for a sample size of N = 250 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Table A3 shows the bias and the RMSE for different estimation methods for a sample size of N = 1000 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.
Table A1. Bias and relative RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

| Method | Bias: NO | AM | UN | BE | LC2 | LC3 | RRMSE: NO | AM | UN | BE | LC2 | LC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 0.002 | 0.005 | 0.007 | 0.010 | 0.013 | 0.004 | 100.7 | 101.5 | 102.0 | 102.8 | 105.8 | 101.1 |
| MMLLS(3) | 0.003 | 0.002 | 0.002 | 0.004 | 0.004 | 0.004 | 100.3 | 100.4 | 100.9 | 101.3 | 104.7 | 101.1 |
| MMLLS(4) | 0.003 | 0.003 | 0.004 | 0.004 | 0.004 | 0.005 | 100.3 | 100.4 | 100.3 | 100.4 | 105.7 | 100.0 |
| MMLMN(5) | 0.017 | 0.021 | 0.022 | 0.026 | 0.030 | 0.016 | 102.8 | 105.2 | 105.4 | 108.6 | 130.5 | 102.5 |
| MMLMN(7) | 0.003 | 0.004 | 0.004 | 0.004 | 0.028 | 0.004 | 100.3 | 100.5 | 100.1 | 100.1 | 113.1 | 100.1 |
| MMLMN(11) | 0.004 | 0.004 | 0.005 | 0.003 | 0.003 | 0.005 | 100.4 | 100.4 | 100.2 | 100.2 | 100.0 | 101.2 |
| MMLMN(15) | 0.004 | 0.003 | 0.004 | 0.004 | 0.003 | 0.004 | 100.4 | 100.5 | 100.4 | 100.6 | 106.0 | 100.8 |
| MMLLC(2) | 0.032 | 0.026 | 0.023 | 0.019 | 0.005 | 0.028 | 104.4 | 102.4 | 101.6 | 100.8 | 102.5 | 102.9 |
| MMLLC(3) | 0.015 | 0.013 | 0.013 | 0.012 | 0.004 | 0.016 | 100.3 | 100.1 | 100.1 | 100.1 | 102.6 | 100.3 |
| MMLLC(4) | 0.011 | 0.009 | 0.009 | 0.008 | 0.003 | 0.012 | 100.0 | 100.0 | 100.0 | 100.0 | 102.6 | 100.0 |
| MMLLC(5) | 0.010 | 0.008 | 0.008 | 0.007 | 0.004 | 0.011 | 100.0 | 100.1 | 100.1 | 100.0 | 102.4 | 100.1 |
| CML | 0.002 | 0.003 | 0.002 | 0.002 | 0.002 | 0.003 | 100.8 | 101.0 | 100.9 | 101.0 | 103.5 | 100.8 |
| JMLM | 0.015 | 0.014 | 0.014 | 0.013 | 0.015 | 0.014 | 104.3 | 103.9 | 104.0 | 103.7 | 106.2 | 104.2 |
| JMLW | 0.052 | 0.047 | 0.052 | 0.052 | 0.035 | 0.060 | 115.9 | 113.6 | 116.6 | 117.1 | 110.2 | 120.6 |
| PJML(1.0) | 0.045 | 0.035 | 0.045 | 0.044 | 0.009 | 0.063 | 112.1 | 107.7 | 112.7 | 113.0 | 103.7 | 123.1 |
| PJML(1.5) | 0.002 | 0.009 | 0.005 | 0.008 | 0.033 | 0.013 | 100.3 | 101.9 | 100.9 | 101.4 | 114.9 | 100.4 |
| PJML(2.0) | 0.030 | 0.037 | 0.032 | 0.032 | 0.053 | 0.021 | 112.0 | 114.9 | 112.1 | 112.6 | 129.9 | 107.0 |
| JMLε(0.1) | 0.047 | 0.049 | 0.048 | 0.048 | 0.052 | 0.044 | 122.2 | 122.6 | 122.1 | 122.4 | 128.7 | 120.1 |
| JMLε(0.2) | 0.027 | 0.027 | 0.024 | 0.022 | 0.030 | 0.025 | 104.4 | 102.9 | 102.3 | 102.8 | 107.8 | 100.6 |
| JMLε(0.24) | 0.036 | 0.035 | 0.032 | 0.032 | 0.034 | 0.038 | 107.4 | 105.4 | 104.6 | 105.0 | 108.4 | 104.6 |
| JMLε(0.3) | 0.060 | 0.058 | 0.057 | 0.056 | 0.053 | 0.063 | 120.5 | 117.8 | 117.6 | 117.5 | 117.5 | 120.0 |
| JMLε(0.4) | 0.101 | 0.099 | 0.097 | 0.096 | 0.088 | 0.105 | 153.4 | 150.0 | 149.2 | 149.9 | 146.3 | 153.7 |
| JMLε(0.5) | 0.139 | 0.135 | 0.137 | 0.135 | 0.124 | 0.144 | 189.9 | 184.7 | 186.5 | 186.5 | 179.7 | 191.9 |
| PMML | 0.002 | 0.005 | 0.007 | 0.011 | 0.012 | 0.004 | 100.8 | 101.3 | 101.9 | 102.7 | 105.3 | 101.3 |
| PCML | 0.002 | 0.003 | 0.003 | 0.003 | 0.002 | 0.003 | 103.2 | 102.7 | 103.0 | 102.6 | 105.1 | 102.6 |
| LLLA | 0.004 | 0.007 | 0.010 | 0.014 | 0.014 | 0.008 | 101.1 | 102.0 | 102.9 | 103.9 | 105.8 | 102.2 |
| MINCHI | 0.002 | 0.002 | 0.002 | 0.002 | 0.002 | 0.003 | 103.0 | 102.5 | 102.8 | 102.4 | 104.9 | 102.4 |
| EVM(2) | 0.002 | 0.003 | 0.002 | 0.003 | 0.002 | 0.003 | 104.4 | 103.5 | 104.1 | 103.4 | 106.0 | 103.6 |
| EVM(3) | 0.002 | 0.003 | 0.002 | 0.003 | 0.002 | 0.003 | 104.3 | 103.5 | 104.1 | 103.4 | 105.9 | 103.6 |
| RA(1) | 0.003 | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 104.8 | 104.1 | 104.7 | 104.1 | 106.4 | 104.3 |
| RA(2) | 0.002 | 0.003 | 0.002 | 0.003 | 0.002 | 0.003 | 104.4 | 103.5 | 104.1 | 103.4 | 106.0 | 103.6 |
| RA(3) | 0.002 | 0.003 | 0.002 | 0.003 | 0.002 | 0.003 | 104.3 | 103.5 | 104.1 | 103.4 | 105.9 | 103.6 |
Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JMLε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s maximum likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; NO = normal distribution; AM = asymmetric mixture distribution; UN = uniform distribution; BE = U-shaped beta distribution; LC2 = located 2-class distribution; LC3 = located 3-class distribution. Biases smaller than 0.025 and relative RMSE values smaller than 107 indicate acceptable performance.
Table A2. Bias and relative RMSE for different estimation methods for a sample size of N = 250 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

| Method | Bias: NO | AM | UN | BE | LC2 | LC3 | RRMSE: NO | AM | UN | BE | LC2 | LC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 0.014 | 0.021 | 0.020 | 0.018 | 0.070 | 0.032 | 108.6 | 104.0 | 107.0 | 108.0 | 116.1 | 109.0 |
| MMLLS(3) | 0.012 | 0.013 | 0.020 | 0.017 | 0.045 | 0.023 | 108.6 | 103.6 | 107.3 | 108.1 | 115.9 | 109.1 |
| MMLLS(4) | 0.011 | 0.011 | 0.017 | 0.014 | 0.024 | 0.015 | 108.5 | 103.4 | 106.3 | 106.7 | 110.1 | 106.8 |
| MMLMN(5) | 0.011 | 0.011 | 0.020 | 0.020 | 0.037 | 0.019 | 107.8 | 102.7 | 106.8 | 108.5 | 108.3 | 108.9 |
| MMLMN(7) | 0.015 | 0.014 | 0.022 | 0.021 | 0.038 | 0.022 | 109.0 | 104.0 | 107.6 | 109.0 | 109.1 | 110.7 |
| MMLMN(11) | 0.014 | 0.014 | 0.020 | 0.017 | 0.023 | 0.019 | 109.0 | 104.0 | 106.7 | 107.2 | 107.4 | 108.3 |
| MMLMN(15) | 0.015 | 0.015 | 0.019 | 0.016 | 0.026 | 0.017 | 109.0 | 104.0 | 106.8 | 107.1 | 110.7 | 107.2 |
| MMLLC(2) | 0.029 | 0.028 | 0.014 | 0.011 | 0.011 | 0.010 | 106.2 | 101.1 | 103.1 | 104.5 | 106.9 | 104.4 |
| MMLLC(3) | 0.007 | 0.008 | 0.013 | 0.009 | 0.020 | 0.010 | 107.9 | 103.0 | 105.8 | 106.3 | 108.5 | 106.1 |
| MMLLC(4) | 0.010 | 0.012 | 0.016 | 0.012 | 0.023 | 0.012 | 108.5 | 103.5 | 106.3 | 106.6 | 109.0 | 106.7 |
| MMLLC(5) | 0.011 | 0.013 | 0.016 | 0.012 | 0.022 | 0.013 | 108.7 | 103.7 | 106.4 | 106.8 | 108.9 | 106.8 |
| CML | 0.014 | 0.015 | 0.017 | 0.013 | 0.016 | 0.013 | 109.0 | 104.1 | 106.5 | 106.8 | 108.6 | 106.9 |
| JMLM | 0.093 | 0.095 | 0.097 | 0.094 | 0.096 | 0.091 | 128.8 | 123.7 | 127.3 | 126.7 | 128.5 | 126.8 |
| JMLW | 0.046 | 0.048 | 0.052 | 0.049 | 0.050 | 0.047 | 115.4 | 110.2 | 114.0 | 114.3 | 114.9 | 113.8 |
| PJML(1.0) | 0.099 | 0.098 | 0.103 | 0.107 | 0.110 | 0.107 | 116.9 | 112.5 | 115.4 | 117.9 | 121.1 | 118.4 |
| PJML(1.5) | 0.009 | 0.016 | 0.014 | 0.013 | 0.054 | 0.025 | 106.4 | 101.8 | 104.0 | 105.0 | 110.5 | 105.6 |
| PJML(2.0) | 0.084 | 0.085 | 0.083 | 0.079 | 0.095 | 0.082 | 122.5 | 117.7 | 120.2 | 120.0 | 124.0 | 120.4 |
| JMLε(0.1) | 0.160 | 0.162 | 0.163 | 0.160 | 0.162 | 0.158 | 150.0 | 145.1 | 148.8 | 148.0 | 149.0 | 148.1 |
| JMLε(0.2) | 0.053 | 0.051 | 0.050 | 0.050 | 0.047 | 0.047 | 107.0 | 106.5 | 106.5 | 106.9 | 105.1 | 106.4 |
| JMLε(0.24) | 0.034 | 0.035 | 0.032 | 0.027 | 0.043 | 0.029 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| JMLε(0.3) | 0.049 | 0.053 | 0.049 | 0.050 | 0.066 | 0.056 | 102.7 | 100.8 | 101.3 | 101.8 | 104.0 | 102.5 |
| JMLε(0.4) | 0.132 | 0.133 | 0.135 | 0.136 | 0.146 | 0.140 | 123.4 | 122.5 | 123.8 | 123.9 | 126.0 | 125.2 |
| JMLε(0.5) | 0.213 | 0.211 | 0.213 | 0.216 | 0.218 | 0.217 | 156.9 | 152.5 | 154.1 | 156.6 | 155.8 | 157.6 |
| PMML | 0.014 | 0.022 | 0.020 | 0.019 | 0.081 | 0.035 | 108.7 | 104.2 | 107.1 | 108.1 | 118.9 | 109.7 |
| PCML | 0.017 | 0.018 | 0.021 | 0.016 | 0.019 | 0.016 | 115.2 | 109.8 | 112.8 | 113.2 | 114.4 | 112.3 |
| LLLA | 0.013 | 0.021 | 0.020 | 0.018 | 0.073 | 0.032 | 108.4 | 103.9 | 106.9 | 107.9 | 116.5 | 109.0 |
| MINCHI | 0.008 | 0.008 | 0.006 | 0.010 | 0.008 | 0.010 | 111.4 | 106.2 | 108.8 | 109.6 | 110.6 | 108.8 |
| EVM(2) | 0.020 | 0.021 | 0.024 | 0.019 | 0.022 | 0.018 | 122.9 | 117.2 | 120.9 | 121.4 | 122.6 | 119.9 |
| EVM(3) | 0.020 | 0.021 | 0.024 | 0.019 | 0.022 | 0.018 | 123.0 | 117.3 | 121.1 | 121.5 | 122.7 | 120.0 |
| RA(1) | 0.026 | 0.028 | 0.027 | 0.026 | 0.027 | 0.026 | 115.8 | 111.3 | 114.0 | 114.8 | 114.6 | 114.2 |
| RA(2) | 0.020 | 0.021 | 0.024 | 0.018 | 0.022 | 0.018 | 122.9 | 117.2 | 120.9 | 121.4 | 122.6 | 119.9 |
| RA(3) | 0.020 | 0.021 | 0.024 | 0.019 | 0.022 | 0.018 | 123.0 | 117.3 | 121.1 | 121.5 | 122.7 | 120.0 |
Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JMLε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s maximum likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; NO = normal distribution; AM = asymmetric mixture distribution; UN = uniform distribution; BE = U-shaped beta distribution; LC2 = located 2-class distribution; LC3 = located 3-class distribution. Biases smaller than 0.025 and relative RMSE values smaller than 107 indicate acceptable performance.
Table A3. Bias and relative RMSE for different estimation methods for a sample size of N = 1000 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

| Method | Bias: NO | AM | UN | BE | LC2 | LC3 | RRMSE: NO | AM | UN | BE | LC2 | LC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 0.003 | 0.009 | 0.008 | 0.010 | 0.040 | 0.018 | 101.2 | 101.4 | 102.3 | 105.1 | 112.3 | 103.8 |
| MMLLS(3) | 0.003 | 0.003 | 0.007 | 0.009 | 0.035 | 0.011 | 100.4 | 100.2 | 101.6 | 104.7 | 110.6 | 101.9 |
| MMLLS(4) | 0.003 | 0.005 | 0.004 | 0.004 | 0.008 | 0.007 | 100.3 | 100.2 | 100.5 | 102.5 | 101.5 | 100.9 |
| MMLMN(5) | 0.010 | 0.006 | 0.006 | 0.005 | 0.024 | 0.016 | 100.4 | 100.0 | 100.5 | 103.6 | 107.6 | 102.2 |
| MMLMN(7) | 0.003 | 0.004 | 0.006 | 0.005 | 0.024 | 0.016 | 100.3 | 100.2 | 100.5 | 103.6 | 107.5 | 102.8 |
| MMLMN(11) | 0.003 | 0.004 | 0.005 | 0.005 | 0.009 | 0.013 | 100.3 | 100.2 | 100.4 | 102.4 | 100.9 | 101.3 |
| MMLMN(15) | 0.002 | 0.004 | 0.004 | 0.003 | 0.008 | 0.007 | 100.4 | 100.2 | 100.5 | 102.7 | 101.7 | 100.0 |
| MMLLC(2) | 0.083 | 0.082 | 0.065 | 0.052 | 0.016 | 0.040 | 136.7 | 135.7 | 123.3 | 117.0 | 101.4 | 108.5 |
| MMLLC(3) | 0.031 | 0.030 | 0.024 | 0.021 | 0.008 | 0.019 | 105.5 | 104.9 | 103.1 | 104.3 | 100.6 | 101.8 |
| MMLLC(4) | 0.013 | 0.012 | 0.014 | 0.013 | 0.006 | 0.009 | 100.9 | 100.7 | 101.2 | 103.2 | 100.6 | 100.0 |
| MMLLC(5) | 0.008 | 0.007 | 0.009 | 0.009 | 0.007 | 0.010 | 100.6 | 100.3 | 100.8 | 102.9 | 100.5 | 100.2 |
| CML | 0.003 | 0.003 | 0.005 | 0.005 | 0.004 | 0.002 | 101.3 | 101.0 | 101.4 | 103.7 | 101.3 | 100.6 |
| JMLM | 0.024 | 0.025 | 0.025 | 0.026 | 0.025 | 0.023 | 105.9 | 105.8 | 106.5 | 109.1 | 106.1 | 105.0 |
| JMLW | 0.015 | 0.015 | 0.016 | 0.018 | 0.016 | 0.014 | 103.1 | 102.9 | 103.7 | 106.3 | 103.1 | 102.5 |
| PJML(1.0) | 0.052 | 0.051 | 0.056 | 0.055 | 0.062 | 0.058 | 115.1 | 114.8 | 117.4 | 118.8 | 122.8 | 118.8 |
| PJML(1.5) | 0.007 | 0.008 | 0.008 | 0.008 | 0.024 | 0.011 | 101.4 | 101.3 | 102.0 | 104.6 | 105.2 | 101.9 |
| PJML(2.0) | 0.035 | 0.036 | 0.034 | 0.036 | 0.039 | 0.033 | 110.6 | 110.5 | 111.2 | 114.0 | 111.7 | 109.7 |
| JMLε(0.1) | 0.050 | 0.051 | 0.051 | 0.052 | 0.050 | 0.049 | 117.8 | 117.8 | 118.9 | 121.5 | 117.7 | 116.5 |
| JMLε(0.2) | 0.019 | 0.019 | 0.018 | 0.020 | 0.020 | 0.020 | 102.7 | 104.2 | 102.5 | 103.0 | 102.3 | 103.3 |
| JMLε(0.24) | 0.013 | 0.012 | 0.013 | 0.011 | 0.015 | 0.014 | 100.0 | 101.4 | 100.0 | 100.0 | 100.0 | 100.5 |
| JMLε(0.3) | 0.016 | 0.016 | 0.015 | 0.013 | 0.021 | 0.016 | 100.1 | 100.8 | 100.1 | 100.7 | 100.7 | 100.0 |
| JMLε(0.4) | 0.039 | 0.040 | 0.041 | 0.039 | 0.044 | 0.041 | 109.3 | 109.9 | 109.9 | 108.3 | 110.1 | 108.9 |
| JMLε(0.5) | 0.067 | 0.067 | 0.068 | 0.067 | 0.072 | 0.070 | 125.6 | 125.5 | 125.6 | 126.2 | 126.6 | 126.0 |
| PMML | 0.049 | 0.045 | 0.105 | 0.112 | 0.079 | 0.097 | 161.5 | 158.1 | 210.7 | 217.0 | 182.8 | 202.4 |
| PCML | 0.004 | 0.004 | 0.005 | 0.006 | 0.004 | 0.003 | 104.6 | 104.5 | 104.5 | 106.7 | 104.2 | 103.9 |
| LLLA | 0.003 | 0.010 | 0.008 | 0.010 | 0.042 | 0.019 | 101.0 | 101.3 | 102.2 | 105.0 | 113.2 | 103.9 |
| MINCHI | 0.004 | 0.004 | 0.004 | 0.002 | 0.003 | 0.005 | 103.9 | 103.8 | 103.7 | 105.8 | 103.4 | 103.4 |
| EVM(2) | 0.004 | 0.004 | 0.005 | 0.006 | 0.004 | 0.003 | 109.2 | 109.2 | 109.1 | 111.3 | 108.4 | 108.6 |
| EVM(3) | 0.004 | 0.004 | 0.005 | 0.006 | 0.004 | 0.003 | 109.2 | 109.2 | 109.2 | 111.3 | 108.5 | 108.7 |
| RA(1) | 0.020 | 0.020 | 0.020 | 0.022 | 0.019 | 0.019 | 117.1 | 117.0 | 117.2 | 120.0 | 115.5 | 116.5 |
| RA(2) | 0.004 | 0.004 | 0.005 | 0.006 | 0.004 | 0.003 | 109.2 | 109.2 | 109.1 | 111.3 | 108.4 | 108.6 |
| RA(3) | 0.004 | 0.004 | 0.005 | 0.006 | 0.004 | 0.003 | 109.2 | 109.2 | 109.2 | 111.3 | 108.5 | 108.7 |
Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JMLε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s maximum likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; NO = normal distribution; AM = asymmetric mixture distribution; UN = uniform distribution; BE = U-shaped beta distribution; LC2 = located 2-class distribution; LC3 = located 3-class distribution. Biases smaller than 0.025 and relative RMSE values smaller than 107 indicate acceptable performance.

References

  1. Rasch, G. Probabilistic Models for Some Intelligence and Attainment Tests; Danish Institute for Educational Research: Copenhagen, Danmark, 1960. [Google Scholar]
  2. Fischer, G.H.; Molenaar, I.W. Rasch Models. Foundations, Recent Developments, and Applications; Springer: New York, NY, USA, 1995. [Google Scholar] [CrossRef]
  3. von Davier, M. The Rasch model. In Handbook of Item Response Theory, Volume 1: Models; CRC Press: Boca Raton, FL, USA, 2016; pp. 31–48. [Google Scholar] [CrossRef]
  4. Baker, F.B.; Kim, S.H. Item Response Theory: Parameter Estimation Techniques; CRC Press: Boca Raton, FL, USA, 2004. [Google Scholar] [CrossRef]
  5. Cai, L.; Choi, K.; Hansen, M.; Harrell, L. Item response theory. Annu. Rev. Stat. Appl. 2016, 3, 297–321. [Google Scholar] [CrossRef]
  6. Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604. [Google Scholar]
  7. van der Linden, W.J.; Hambleton, R.K. Handbook of Modern Item Response Theory; Springer: New York, NY, USA, 1997. [Google Scholar] [CrossRef]
  8. Lord, F.M.; Novick, R. Statistical Theories of Mental Test Scores; Addison-Wesley: Reading, MA, USA, 1968. [Google Scholar]
  9. Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger: New York, NY, USA, 2006; pp. 111–154. [Google Scholar]
  10. Arnold, J.C.; Boone, W.J.; Kremer, K.; Mayer, J. Assessment of competencies in scientific inquiry through the application of Rasch measurement techniques. Educ. Sci. 2018, 8, 184. [Google Scholar] [CrossRef] [Green Version]
  11. Cascella, C.; Giberti, C.; Bolondi, G. Changing the order of factors does not change the product but does affect students’ answers, especially girls’ answers. Educ. Sci. 2021, 11, 201. [Google Scholar] [CrossRef]
  12. Finger, M.E.; Escorpizo, R.; Tennant, A. Measuring work-related functioning using the work rehabilitation questionnaire (WORQ). Int. J. Environ. Res. Public Health 2019, 16, 2795. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  13. Kramer, M.; Förtsch, C.; Boone, W.J.; Seidel, T.; Neuhaus, B.J. Investigating pre-service biology teachers’ diagnostic competences: Relationships between professional knowledge, diagnostic activities, and diagnostic accuracy. Educ. Sci. 2021, 11, 89. [Google Scholar] [CrossRef]
  14. Morales-Rodríguez, F.M.; Martí-Vilar, M.; Peláez, M.A.N.; Lozano, J.M.G.; Ramón, J.P.M.; Caracuel, A. Psychometric properties of the affective dimension of the generic macro-competence assessment scale: Analysis using Rasch model. Sustainability 2021, 13, 6904. [Google Scholar] [CrossRef]
  15. Raccanello, D.; Vicentini, G.; Burro, R. Children’s psychological representation of earthquakes: Analysis of written definitions and Rasch scaling. Geosciences 2019, 9, 208. [Google Scholar] [CrossRef] [Green Version]
  16. Shoahosseini, R.; Baghaei, P. Validation of the Persian translation of the children’s test anxiety scale: A multidimensional Rasch model analysis. Eur. J. Investig. Health Psychol. Educ. 2020, 10, 59–69. [Google Scholar] [CrossRef] [Green Version]
  17. Andrich, D.; Marais, I. A Course in Rasch Measurement Theory; Springer: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
  18. Boone, W.J. Rasch analysis for instrument development: Why, when, and how? CBE Life Sci. Educ. 2016, 15, rm4. [Google Scholar] [CrossRef] [Green Version]
  19. Bond, T.; Yan, Z.; Heene, M. Applying the Rasch Model; Routledge: New York, NY, USA, 2020. [Google Scholar] [CrossRef]
  20. Engelhard, G. Invariant Measurement; Routledge: New York, NY, USA, 2012. [Google Scholar] [CrossRef]
  21. Lamprianou, I. Applying the Rasch Model in Social Sciences Using R and BlueSky Statistics; Routledge: New York, NY, USA, 2019. [Google Scholar] [CrossRef]
  22. Linacre, J.M. Understanding Rasch measurement: Estimation methods for Rasch measures. J. Outcome Meas. 1999, 3, 382–405. [Google Scholar]
  23. Linacre, J.M. Rasch model estimation: Further topics. J. Appl. Meas. 2004, 5, 95–110. [Google Scholar]
  24. Wilson, M. Constructing Measures: An Item Response Modeling Approach; Routledge: New York, NY, USA, 2004. [Google Scholar] [CrossRef]
  25. Wright, B.D.; Stone, M.H. Best Test Design; Mesa Press: Chicago, IL, USA, 1979. [Google Scholar]
  26. Wu, M.; Tam, H.P.; Jen, T.H. Educational Measurement for Applied Researchers; Springer: Singapore, 2016. [Google Scholar] [CrossRef]
  27. Aryadoust, V.; Tan, H.A.H.; Ng, L.Y. A Scientometric review of Rasch measurement: The rise and progress of a specialty. Front. Psychol. 2019, 10, 2197. [Google Scholar] [CrossRef] [Green Version]
  28. De Boeck, P. Random item IRT models. Psychometrika 2008, 73, 533–559. [Google Scholar] [CrossRef]
  29. Holland, P.W. On the sampling theory foundations of item response theory models. Psychometrika 1990, 55, 577–601. [Google Scholar] [CrossRef]
  30. Fischer, G.H. Rasch models. In Handbook of Statistics, Vol. 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; pp. 515–585. [Google Scholar] [CrossRef]
  31. Wu, M.; Adams, R.J. Properties of Rasch residual fit statistics. J. Appl. Meas. 2013, 14, 339–355. [Google Scholar] [PubMed]
  32. Christensen, K.B.; Makransky, G.; Horton, M. Critical values for Yen’s Q3: Identification of local dependence in the Rasch model using residual correlations. Appl. Psychol. Meas. 2017, 41, 178–194. [Google Scholar] [CrossRef]
  33. Debelak, R.; Koller, I. Testing the local independence assumption of the Rasch model with Q3-based nonparametric model tests. Appl. Psychol. Meas. 2020, 44, 103–117. [Google Scholar] [CrossRef]
  34. Yen, W.M. Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Appl. Psychol. Meas. 1984, 8, 125–145. [Google Scholar] [CrossRef]
35. Meyer, P. Understanding Measurement: Reliability; Oxford University Press: Oxford, UK, 2010. [Google Scholar] [CrossRef]
  36. Fan, X. Item response theory and classical test theory: An empirical comparison of their item/person statistics. Educ. Psychol. Meas. 1998, 58, 357–381. [Google Scholar] [CrossRef] [Green Version]
  37. Bates, D.; Mächler, M.; Bolker, B.; Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 2015, 67, 1–48. [Google Scholar] [CrossRef]
  38. De Boeck, P.; Wilson, M. Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach; Springer: New York, NY, USA, 2004. [Google Scholar] [CrossRef]
  39. Doran, H.; Bates, D.; Bliese, P.; Dowling, M. Estimating the multilevel Rasch model: With the lme4 package. J. Stat. Softw. 2007, 20, 1–18. [Google Scholar] [CrossRef]
  40. Rijmen, F.; Tuerlinckx, F.; De Boeck, P.; Kuppens, P. A nonlinear mixed model framework for item response theory. Psychol. Methods 2003, 8, 185–205. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  41. Zheng, X.; Rabe-Hesketh, S. Estimating parameters of dichotomous and ordinal item response models with gllamm. Stata J. 2007, 7, 313–333. [Google Scholar] [CrossRef]
  42. Raudenbush, S.W.; Johnson, C.; Sampson, R.J. A multivariate, multilevel Rasch model with application to self-reported criminal behavior. Sociol. Methodol. 2003, 33, 169–211. [Google Scholar] [CrossRef]
  43. Molenaar, I.W. Estimation of item parameters. In Rasch Models. Foundations, Recent Developments, and Applications; Fischer, G.H., Molenaar, I.W., Eds.; Springer: New York, NY, USA, 1995; pp. 39–52. [Google Scholar] [CrossRef]
  44. Wainer, H.; Morgan, A.; Gustafsson, J.E. A review of estimation procedures for the Rasch model with an eye toward longish tests. J. Educ. Stat. 1980, 5, 35–64. [Google Scholar] [CrossRef]
  45. Haberman, S.J. Joint and Conditional Maximum Likelihood Estimation for the Rasch Model for Binary Responses; (Research Report No. RR-04-20); Educational Testing Service: Princeton, NJ, USA, 2004. [Google Scholar] [CrossRef]
  46. Haberman, S.J. Maximum likelihood estimates in exponential response models. Ann. Stat. 1977, 5, 815–841. [Google Scholar] [CrossRef]
  47. Haberman, S.J. Models with nuisance and incidental parameters. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 151–170. [Google Scholar] [CrossRef]
  48. Lancaster, T. The incidental parameter problem since 1948. J. Econom. 2000, 95, 391–413. [Google Scholar] [CrossRef]
  49. Warm, T.A. Weighted likelihood estimation of ability in item response theory. Psychometrika 1989, 54, 427–450. [Google Scholar] [CrossRef]
50. Magis, D.; Raîche, G. On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models. Psychometrika 2012, 77, 163–169. [Google Scholar] [CrossRef]
  51. Jansen, P.G.W.; van den Wollenberg, A.L.; Wierda, F.W. Correcting unconditional parameter estimates in the Rasch model for inconsistency. Appl. Psychol. Meas. 1988, 12, 297–306. [Google Scholar] [CrossRef] [Green Version]
  52. Wright, B.D.; Douglas, G.A. Best procedures for sample-free item analysis. Appl. Psychol. Meas. 1977, 1, 281–295. [Google Scholar] [CrossRef]
  53. Chen, Y.; Li, X.; Zhang, S. Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika 2019, 84, 124–146. [Google Scholar] [CrossRef] [Green Version]
  54. Paolino, J.P. Penalized Joint Maximum Likelihood Estimation Applied to Two Parameter Logistic Item Response Models. Ph.D. Thesis, Columbia University, New York, NY, USA, 2013. [Google Scholar] [CrossRef]
  55. Paolino, J.P. Rasch model parameter estimation via the elastic net. J. Appl. Meas. 2015, 16, 353–364. [Google Scholar]
  56. Bertoli-Barsotti, L.; Lando, T.; Punzo, A. Estimating a Rasch Model via Fuzzy Empirical Probability Functions. In Analysis and Modeling of Complex Data in Behavioral and Social Sciences; Vicari, D., Okada, A., Ragozini, G., Weihs, C., Eds.; Springer: Cham, Switzerland, 2014; pp. 29–36. [Google Scholar] [CrossRef]
  57. Lando, T.; Bertoli-Barsotti, L. A modified minimum divergence estimator: Some preliminary results for the Rasch model. Electr. J. Appl. Stat. Anal. 2014, 7, 37–57. [Google Scholar] [CrossRef]
  58. Robitzsch, A.; Steinfeld, J. Item response models for human ratings: Overview, estimation methods, and implementation in R. Psych. Test Assess. Model. 2018, 60, 101–138. [Google Scholar]
  59. Andersen, E.B. The numerical solution of a set of conditional estimation equations. J. R. Stat. Soc. Ser. B Stat. Methodol. 1972, 34, 42–54. [Google Scholar] [CrossRef]
  60. Draxler, C.; Alexandrowicz, R.W. Sample size determination within the scope of conditional maximum likelihood estimation with special focus on testing the Rasch model. Psychometrika 2015, 80, 897–919. [Google Scholar] [CrossRef]
  61. Mair, P.; Hatzinger, R. CML based estimation of extended Rasch models with the eRm package in R. Psychol. Sci. 2007, 49, 26–43. [Google Scholar]
  62. Hatzinger, R.; Rusch, T. IRT models with relaxed assumptions in eRm: A manual-like instruction. Psychol. Sci. Q. 2009, 51, 87–120. [Google Scholar]
  63. Liou, M. More on the computation of higher-order derivatives of the elementary symmetric functions in the Rasch model. Appl. Psychol. Meas. 1994, 18, 53–62. [Google Scholar] [CrossRef] [Green Version]
  64. Verhelst, N.D.; Glas, C.A.W.; Van der Sluis, A. Estimation problems in the Rasch model: The basic symmetric functions. Comp. Stat. Q. 1984, 1, 245–262. [Google Scholar]
  65. Bartolucci, F.; Pigini, C. cquad: An R and Stata package for conditional maximum likelihood estimation of dynamic binary panel data models. J. Stat. Softw. 2017, 78, 1–26. [Google Scholar] [CrossRef] [Green Version]
  66. Duchesne, T.; Fortin, D.; Courbin, N. Mixed conditional logistic regression for habitat selection studies. J. Anim. Ecol. 2010, 79, 548–555. [Google Scholar] [CrossRef]
  67. Sartori, N.; Severini, T.A. Conditional likelihood inference in generalized linear mixed models. Stat. Sin. 2004, 14, 349–360. [Google Scholar]
  68. Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236. [Google Scholar] [CrossRef]
  69. Bock, R.D.; Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika 1981, 46, 443–459. [Google Scholar] [CrossRef]
  70. Abdel-Fattah, A. Comparing BILOG and LOGIST estimates for normal, truncated normal, and beta ability distributions. In Proceedings of the Annual Meeting of the American Educational Research Association, New Orleans, LA, USA, 4–8 April 1994. [Google Scholar]
  71. Woods, C.M. Estimating the latent density in unidimensional IRT to permit non-normality. In Handbook of Item Response Theory Modeling; Reise, S.P., Revicki, D.A., Eds.; Routledge: New York, NY, USA, 2014; pp. 78–102. [Google Scholar] [CrossRef]
  72. Chalmers, R.P. mirt: A multidimensional item response theory package for the R environment. J. Stat. Softw. 2012, 48, 1–29. [Google Scholar] [CrossRef] [Green Version]
  73. Robitzsch, A.; Kiefer, T.; Wu, M. TAM: Test Analysis Modules, R Package Version 3.7-6; 2021. Available online: https://CRAN.R-project.org/package=TAM (accessed on 25 June 2021).
  74. Kirisci, L.; Hsu, T.C.; Yu, L. Robustness of item parameter estimation programs to assumptions of unidimensionality and normality. Appl. Psychol. Meas. 2001, 25, 146–162. [Google Scholar] [CrossRef]
  75. Seong, T.J. Sensitivity of marginal maximum likelihood estimation of item and ability parameters to the characteristics of the prior ability distributions. Appl. Psychol. Meas. 1990, 14, 299–311. [Google Scholar] [CrossRef]
  76. Stone, C.A. Recovery of marginal maximum likelihood estimates in the two-parameter logistic response model: An evaluation of MULTILOG. Appl. Psychol. Meas. 1992, 16, 1–16. [Google Scholar] [CrossRef] [Green Version]
  77. Zwinderman, A.H.; Van den Wollenberg, A.L. Robustness of marginal maximum likelihood estimation in the Rasch model. Appl. Psychol. Meas. 1990, 14, 73–81. [Google Scholar] [CrossRef] [Green Version]
  78. Grilli, L.; Metelli, S.; Rampichini, C. Bayesian estimation with integrated nested Laplace approximation for binary logit mixed models. J. Stat. Comput. Simul. 2015, 85, 2718–2726. [Google Scholar] [CrossRef]
  79. Hedeker, D. A mixed-effects multinomial logistic regression model. Stat. Med. 2003, 22, 1433–1446. [Google Scholar] [CrossRef] [PubMed]
  80. Raudenbush, S.W.; Yang, M.L.; Yosef, M. Maximum likelihood for generalized linear models with nested random effects via high-order, multivariate Laplace approximation. J. Comput. Graph. Stat. 2000, 9, 141–157. [Google Scholar] [CrossRef]
  81. Woods, C.M. Empirical histograms in item response theory with ordinal data. Educ. Psychol. Meas. 2007, 67, 73–87. [Google Scholar] [CrossRef]
  82. von Davier, M. A general diagnostic model applied to language testing data. Br. J. Math. Stat. Psychol. 2008, 61, 287–307. [Google Scholar] [CrossRef]
  83. Xu, X.; von Davier, M. Fitting the Structured General Diagnostic Model to NAEP Data; (Research Report No. RR-08-28); Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
  84. Casabianca, J.M.; Junker, B.W. Estimating the latent trait distribution with loglinear smoothing models. In New Developments in Quantitative Psychology; Millsap, R.E., van der Ark, L.A., Bolt, D.M., Woods, C.M., Eds.; Springer: New York, NY, USA, 2013; pp. 415–425. [Google Scholar] [CrossRef]
  85. Casabianca, J.M.; Lewis, C. IRT item parameter recovery with marginal maximum likelihood estimation using loglinear smoothing models. J. Educ. Behav. Stat. 2015, 40, 547–578. [Google Scholar] [CrossRef] [Green Version]
  86. Haberman, S.J.; von Davier, M.; Lee, Y.H. Comparison of Multidimensional Item Response Models: Multivariate Normal Ability Distributions Versus Multivariate Polytomous Distributions; (Research Report No. RR-08-45); Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
  87. Steinfeld, J.; Robitzsch, A. Item parameter estimation in multistage designs: A comparison of different estimation approaches for the Rasch model. Psych 2021, 3, 279–307. [Google Scholar] [CrossRef]
  88. Xu, X.; von Davier, M. Comparing Multiple-Group Multinomial Log-Linear Models for Multidimensional Skill Distributions in the General Diagnostic Model; (Research Report No. RR-08-35); Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
  89. De Leeuw, J.; Verhelst, N. Maximum likelihood estimation in generalized Rasch models. J. Educ. Behav. Stat. 1986, 11, 183–196. [Google Scholar] [CrossRef]
  90. Formann, A.K. Constrained latent class models: Theory and applications. Br. J. Math. Stat. Psychol. 1985, 38, 87–111. [Google Scholar] [CrossRef]
  91. Haberman, S.J. Latent-Class Item Response Models; (Research Report No. RR-05-28); Educational Testing Service: Princeton, NJ, USA, 2005. [Google Scholar] [CrossRef]
  92. Lindsay, B.; Clogg, C.C.; Grego, J. Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. J. Am. Stat. Assoc. 1991, 86, 96–107. [Google Scholar] [CrossRef]
  93. Bacci, S.; Bartolucci, F. A multidimensional latent class Rasch model for the assessment of the health-related quality of life. In Rasch Models in Health; Christensen, K.B., Kreiner, S., Mesbah, M., Eds.; Wiley: Hoboken, NJ, USA, 2013; pp. 197–218. [Google Scholar] [CrossRef] [Green Version]
  94. Genge, E. LC and LC-IRT models in the identification of Polish households with similar perception of financial position. Sustainability 2021, 13, 4130. [Google Scholar] [CrossRef]
  95. Katsikatsou, M.; Moustaki, I.; Yang-Wallentin, F.; Jöreskog, K.G. Pairwise likelihood estimation for factor analysis models with ordinal data. Comput. Stat. Data Anal. 2012, 56, 4243–4258. [Google Scholar] [CrossRef] [Green Version]
  96. Feddag, M.L.; Hardouin, J.B.; Sebille, V. Pairwise- and marginal-likelihood estimation for the mixed Rasch model with binary data. J. Stat. Comput. Simul. 2012, 82, 419–430. [Google Scholar] [CrossRef]
  97. Renard, D.; Molenberghs, G.; Geys, H. A pairwise likelihood approach to estimation in multilevel probit models. Comput. Stat. Data Anal. 2004, 44, 649–667. [Google Scholar] [CrossRef]
  98. Varin, C.; Reid, N.; Firth, D. An overview of composite likelihood methods. Stat. Sin. 2011, 21, 5–42. [Google Scholar]
  99. Feddag, M.L.; Bacci, S. Pairwise likelihood for the longitudinal mixed Rasch model. Comput. Stat. Data Anal. 2009, 53, 1027–1037. [Google Scholar] [CrossRef]
  100. Andrich, D. Sufficiency and conditional estimation of person parameters in the polytomous Rasch model. Psychometrika 2010, 75, 292–308. [Google Scholar] [CrossRef]
  101. Christensen, K.B. A Multidimensional Latent Class Rasch Model for the Assessment of the Health-Related Quality of Life. In Rasch Models in Health; Christensen, K.B., Kreiner, S., Mesbah, M., Eds.; Wiley: Hoboken, NJ, USA, 2013; pp. 49–62. [Google Scholar] [CrossRef]
  102. Draxler, C.; Tutz, G.; Zink, K.; Gürer, C. Comparison of maximum likelihood with conditional pairwise likelihood estimation of person parameters in the Rasch model. Commun. Stat. Simul. Comput. 2016, 45, 2007–2017. [Google Scholar] [CrossRef]
  103. van der Linden, W.J.; Eggen, T.J.H.M. An empirical Bayesian approach to item banking. Appl. Psychol. Meas. 1986, 10, 345–354. [Google Scholar] [CrossRef]
  104. Zwinderman, A.H. Pairwise parameter estimation in Rasch models. Appl. Psychol. Meas. 1995, 19, 369–375. [Google Scholar] [CrossRef]
  105. de Gruijter, D.N. On the robustness of the “minimum-chi-square” method for the Rasch model. Tijdschr Onderwijsres 1987, 12, 225–232. [Google Scholar]
  106. Fischer, G.H. Einführung in Die Theorie Psychologischer Tests [Introduction to the Theory of Psychological Testing]; Huber: Bern, Switzerland, 1974. [Google Scholar]
  107. Choppin, B. A fully conditional estimation procedure for Rasch model parameters. Eval. Educ. 1982, 9, 29–42. [Google Scholar]
  108. Heine, J.H.; Tarnai, C. Pairwise Rasch model item parameter recovery under sparse data conditions. Psych. Test Assess. Model. 2015, 57, 3–36. [Google Scholar]
  109. Wang, J.; Engelhard, G. A pairwise algorithm in R for rater-mediated assessments. Rasch Meas. Trans. 2014, 28, 1457–1459. [Google Scholar]
  110. Finch, H.; French, B.F. A comparison of estimation techniques for IRT models with small samples. Appl. Meas. Educ. 2019, 32, 77–96. [Google Scholar] [CrossRef]
  111. Garner, M. An eigenvector method for estimating item parameters of the dichotomous and polytomous Rasch models. J. Appl. Meas. 2002, 3, 107–128. [Google Scholar] [PubMed]
  112. Garner, M.; Engelhard, G. Using paired comparison matrices to estimate parameters of the partial credit Rasch measurement model for rater-mediated assessments. J. Appl. Meas. 2009, 10, 30–41. [Google Scholar] [PubMed]
  113. Saaty, T.L.; Vargas, L.G. Comparison of eigenvalue, logarithmic least squares and least squares methods in estimating ratios. Math. Model. 1984, 5, 309–324. [Google Scholar] [CrossRef] [Green Version]
  114. Saaty, T.L.; Vargas, L.G. Models, Methods, Concepts & Applications of the Analytic Hierarchy Process; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
  115. Anderson, C.J.; Böckenholt, U. Graphical regression models for polytomous variables. Psychometrika 2000, 65, 497–509. [Google Scholar] [CrossRef]
  116. Anderson, C.J.; Vermunt, J.K. Log-multiplicative association models as latent variable models for nominal and/or ordinal data. Sociol. Methodol. 2000, 30, 81–121. [Google Scholar] [CrossRef] [Green Version]
  117. Anderson, C.J.; Yu, H.T. Log-multiplicative association models as item response models. Psychometrika 2007, 72, 5–23. [Google Scholar] [CrossRef]
  118. Anderson, C.J.; Li, Z.; Vermunt, J.K. Estimation of models in a Rasch family for polytomous items and multiple latent variables. J. Stat. Softw. 2007, 20, 1–36. [Google Scholar] [CrossRef] [Green Version]
  119. Holland, P.W. The Dutch identity: A new tool for the study of item response models. Psychometrika 1990, 55, 5–18. [Google Scholar] [CrossRef]
120. Wolfinger, R.; O’Connell, M. Generalized linear mixed models: A pseudo-likelihood approach. J. Stat. Comput. Simul. 1993, 48, 233–243. [Google Scholar] [CrossRef]
121. Le, L.T.; Adams, R.J. Accuracy of Rasch Model Item Parameter Estimation; ACER: Camberwell, Australia, 2013. [Google Scholar]
  122. Kim, Y.; Choi, Y.K.; Emery, S. Logistic regression with multiple random effects: A simulation study of estimation methods and statistical packages. Am. Stat. 2013, 67, 171–182. [Google Scholar] [CrossRef]
  123. Robitzsch, A. A comparison of estimation methods for the Rasch model. In Book of Short Papers—SIS 2021; Perna, C., Salvati, N., Spagnolo, F.S., Eds.; Pearson: Upper Saddle River, NJ, USA, 2021; pp. 157–162. [Google Scholar]
124. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020; Available online: https://www.R-project.org/ (accessed on 20 August 2020).
125. Robitzsch, A.; Steinfeld, J. immer: Item Response Models for Multiple Ratings, R Package Version 1.1-35; 2018. Available online: https://CRAN.R-project.org/package=immer (accessed on 10 December 2020).
  126. Heine, J.H. pairwise: Rasch Model Parameters by Pairwise Algorithm, R Package Version 0.5.0-2; 2021. Available online: https://CRAN.R-project.org/package=pairwise (accessed on 6 January 2021).
  127. Li, Z.; Hong, F. plRasch: Log Linear by Linear Association Models and Rasch Family Models by Pseudolikelihood Estimation, R Package Version 1.0; 2014. Available online: https://CRAN.R-project.org/package=plRasch (accessed on 10 January 2014).
  128. Robitzsch, A. Sirt: Supplementary Item Response Theory Models, R Package Version 3.10-111; 2021. Available online: https://github.com/alexanderrobitzsch/sirt (accessed on 25 June 2021).
129. Robitzsch, A.; Lüdtke, O. Reflections on analytical choices in the scaling model for test scores in international large-scale assessment studies. PsyArXiv 2021. [Google Scholar] [CrossRef]
130. Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; Addison-Wesley: Reading, MA, USA, 1968; pp. 397–479. [Google Scholar]
  131. Finch, H. Estimation of item response theory parameters in the presence of missing data. J. Educ. Meas. 2008, 45, 225–245. [Google Scholar] [CrossRef]
  132. Mislevy, R.J. Missing responses in item response modeling. In Handbook of Item Response Theory, Vol. 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 171–194. [Google Scholar] [CrossRef]
  133. Waterbury, G.T. Missing data and the Rasch model: The effects of missing data mechanisms on item parameter estimation. J. Appl. Meas. 2019, 20, 154–166. [Google Scholar]
  134. Kubinger, K.D.; Steinfeld, J.; Reif, M.; Yanagida, T. Biased (conditional) parameter estimation of a Rasch model calibrated item pool administered according to a branched testing design. Psych. Test Assess. Model. 2012, 54, 450–460. [Google Scholar]
  135. Eggen, T.J.H.M.; Verhelst, N.D. Item calibration in incomplete testing designs. Psicológica 2011, 32, 107–132. [Google Scholar]
  136. Fox, J.P. Bayesian Item Response Modeling; Springer: New York, NY, USA, 2010. [Google Scholar] [CrossRef]
  137. Hadfield, J.D. MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef] [Green Version]
  138. Kim, S.H.; Cohen, A.S.; Kwak, M.; Lee, J. Priors in Bayesian estimation under the Rasch model. J. Appl. Meas. 2019, 20, 384–398. [Google Scholar] [PubMed]
  139. Luo, Y.; Jiao, H. Using the Stan program for Bayesian item response theory. Educ. Psychol. Meas. 2018, 78, 384–408. [Google Scholar] [CrossRef] [PubMed]
  140. Rupp, A.A.; Dey, D.K.; Zumbo, B.D. To Bayes or not to Bayes, from whether to when: Applications of Bayesian methodology to modeling. Struct. Equ. Model. 2004, 11, 424–451. [Google Scholar] [CrossRef]
  141. Swaminathan, H.; Gifford, J.A. Bayesian estimation in the Rasch model. J. Educ. Stat. 1982, 7, 175–191. [Google Scholar] [CrossRef]
  142. Draxler, C. Bayesian conditional inference for Rasch models. AStA Adv. Stat. Anal. 2018, 102, 245–262. [Google Scholar] [CrossRef]
  143. Huang, P.H. Penalized least squares for structural equation modeling with ordinal responses. Multivar. Behav. Res. 2020. [Google Scholar] [CrossRef]
Table 1. Variance proportions of different factors in the simulation study for (mean absolute) bias and relative RMSE.

| Source | Bias | Rel. RMSE |
|---|---|---|
| N | 0.2 | 0.9 |
| I | 5.3 | 5.5 |
| Skew | 0.2 | 0.1 |
| Range | 6.3 | 6.5 |
| Meth | 51.6 | 29.3 |
| Dist | 0.4 | 0.1 |
| N × I | 0.0 | 0.6 |
| N × Skew | 0.0 | 0.0 |
| N × Range | 0.0 | 0.4 |
| N × Meth | 0.7 | 9.2 |
| N × Dist | 0.0 | 0.1 |
| I × Skew | 0.0 | 0.1 |
| I × Range | 0.7 | 1.4 |
| I × Meth | 17.5 | 21.3 |
| I × Dist | 0.0 | 0.0 |
| Skew × Range | 0.4 | 0.4 |
| Skew × Meth | 0.4 | 0.6 |
| Skew × Dist | 0.0 | 0.0 |
| Range × Meth | 7.9 | 7.3 |
| Range × Dist | 0.1 | 0.1 |
| Meth × Dist | 1.6 | 0.5 |
| N × I × Skew | 0.0 | 0.0 |
| N × I × Range | 0.0 | 0.0 |
| N × I × Meth | 0.0 | 4.4 |
| N × I × Dist | 0.0 | 0.0 |
| N × Skew × Range | 0.0 | 0.0 |
| N × Skew × Meth | 0.1 | 0.3 |
| N × Skew × Dist | 0.0 | 0.0 |
| N × Range × Meth | 0.1 | 1.4 |
| N × Range × Dist | 0.0 | 0.0 |
| N × Meth × Dist | 0.1 | 0.2 |
| I × Skew × Range | 0.1 | 0.1 |
| I × Skew × Meth | 0.2 | 0.3 |
| I × Skew × Dist | 0.0 | 0.0 |
| I × Range × Meth | 2.7 | 4.5 |
| I × Range × Dist | 0.0 | 0.0 |
| I × Meth × Dist | 0.2 | 0.1 |
| Skew × Range × Meth | 1.0 | 0.7 |
| Skew × Range × Dist | 0.0 | 0.0 |
| Skew × Meth × Dist | 0.1 | 0.1 |
| Range × Meth × Dist | 1.1 | 0.7 |
| Residual | 1.0 | 3.0 |

Note. N = sample size; I = number of items; Dist = simulated trait distribution; Meth = estimation method; Skew = skewness of item difficulties; Range = range of item difficulties. Percentage values larger than 0.5 are printed in bold.
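The variance proportions in Table 1 decompose the simulation results into main effects and interactions of the design factors. As a rough illustration of this kind of computation (not the study's original code; all factor levels and outcome values below are placeholders), such proportions can be obtained in R with a factorial ANOVA:

```r
# Illustrative sketch: decompose a cell-level outcome (e.g., relative
# RMSE per simulation condition and method) into design-factor effects
# and express sums of squares as percentages, as in Table 1.
set.seed(123)
res <- expand.grid(
  N     = factor(c(250, 1000)),          # sample size (placeholder levels)
  I     = factor(c(10, 30)),             # number of items
  Skew  = factor(c("sym", "skew")),      # skewness of item difficulties
  Range = factor(c("narrow", "wide")),   # range of item difficulties
  Dist  = factor(c("NO", "AM", "UN", "BE", "LC2", "LC3")),
  Meth  = factor(c("CML", "MMLN", "JMLM", "MINCHI"))
)
res$rmse <- 100 + rexp(nrow(res))        # placeholder outcome per cell

# Factorial ANOVA with all interactions up to third order, matching the
# sources listed in Table 1 (the leftover variance is the residual).
fit  <- aov(rmse ~ (N + I + Skew + Range + Dist + Meth)^3, data = res)
tab  <- summary(fit)[[1]]
prop <- 100 * tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(prop) <- rownames(tab)
round(sort(prop, decreasing = TRUE), 1)
```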
Table 2. Summary of results for (mean absolute) bias and relative RMSE for different estimation methods across all simulation conditions.

| Method | Bias Rk | Bias %Acc | Bias Med | Bias Q90 | Bias MAD | RMSE Rk | RMSE %Acc | RMSE Med | RMSE Q90 | RMSE MAD |
|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 17 | 81.3 | 0.013 | 0.040 | 0.009 | 14 | 74.5 | 104.2 | 109.9 | 3.1 |
| MMLLS(3) | 14 | 90.1 | 0.008 | 0.023 | 0.007 | 11 | 81.3 | 103.5 | 109.1 | 3.0 |
| MMLLS(4) | 3 | 100.0 | 0.006 | 0.014 | 0.003 | 5 | 93.8 | 102.4 | 106.2 | 2.5 |
| MMLMN(5) | 21 | 62.0 | 0.016 | 0.080 | 0.015 | 18 | 68.8 | 104.4 | 125.4 | 4.2 |
| MMLMN(7) | 15 | 85.9 | 0.009 | 0.029 | 0.007 | 10 | 81.8 | 103.4 | 108.4 | 3.2 |
| MMLMN(11) | 12 | 96.9 | 0.006 | 0.016 | 0.004 | 6 | 91.7 | 102.5 | 106.3 | 2.5 |
| MMLMN(15) | 9 | 99.5 | 0.006 | 0.016 | 0.004 | 7 | 91.7 | 102.6 | 106.7 | 2.5 |
| MMLLC(2) | 26 | 40.1 | 0.028 | 0.061 | 0.020 | 17 | 72.4 | 104.2 | 116.5 | 4.0 |
| MMLLC(3) | 13 | 94.3 | 0.009 | 0.022 | 0.006 | 3 | 94.8 | 102.6 | 105.7 | 2.5 |
| MMLLC(4) | 6 | 100.0 | 0.007 | 0.015 | 0.004 | 2 | 94.8 | 102.1 | 106.0 | 2.2 |
| MMLLC(5) | 4 | 100.0 | 0.007 | 0.014 | 0.004 | 4 | 94.3 | 102.3 | 106.3 | 2.3 |
| CML | 2 | 100.0 | 0.006 | 0.015 | 0.004 | 8 | 91.1 | 103.0 | 106.8 | 2.3 |
| JMLM | 23 | 54.2 | 0.021 | 0.132 | 0.022 | 21 | 56.8 | 105.9 | 145.6 | 5.2 |
| JMLW | 27 | 35.4 | 0.035 | 0.077 | 0.025 | 22 | 53.1 | 106.4 | 125.2 | 6.8 |
| PJML(1.0) | 30 | 22.4 | 0.048 | 0.111 | 0.031 | 23 | 45.8 | 108.1 | 134.2 | 9.1 |
| PJML(1.5) | 16 | 84.9 | 0.010 | 0.032 | 0.007 | 9 | 85.9 | 103.4 | 107.6 | 3.0 |
| PJML(2.0) | 28 | 27.1 | 0.038 | 0.085 | 0.024 | 30 | 26.0 | 111.1 | 129.9 | 6.7 |
| JML ε(0.1) | 29 | 25.0 | 0.053 | 0.174 | 0.036 | 32 | 23.4 | 116.0 | 179.4 | 13.9 |
| JML ε(0.2) | 24 | 53.6 | 0.024 | 0.052 | 0.017 | 13 | 79.2 | 103.4 | 109.5 | 2.5 |
| JML ε(0.24) | 22 | 56.3 | 0.023 | 0.040 | 0.015 | 1 | 95.3 | 101.1 | 104.8 | 1.6 |
| JML ε(0.3) | 25 | 46.4 | 0.036 | 0.077 | 0.030 | 16 | 72.4 | 101.3 | 119.4 | 1.9 |
| JML ε(0.4) | 31 | 3.1 | 0.065 | 0.166 | 0.050 | 28 | 41.7 | 109.6 | 161.1 | 14.3 |
| JML ε(0.5) | 32 | 0.0 | 0.101 | 0.248 | 0.069 | 31 | 25.0 | 125.5 | 215.3 | 33.3 |
| PMML | 20 | 70.3 | 0.015 | 0.067 | 0.011 | 20 | 59.9 | 105.7 | 120.9 | 4.5 |
| PCML | 5 | 100.0 | 0.007 | 0.017 | 0.004 | 19 | 62.0 | 106.1 | 111.0 | 2.9 |
| LLLA | 19 | 80.7 | 0.013 | 0.042 | 0.007 | 15 | 74.5 | 104.2 | 110.4 | 2.8 |
| MINCHI | 1 | 100.0 | 0.005 | 0.012 | 0.003 | 12 | 80.7 | 104.9 | 108.6 | 2.2 |
| EVM(2) | 11 | 98.4 | 0.007 | 0.019 | 0.004 | 25 | 41.7 | 108.9 | 117.8 | 6.6 |
| EVM(3) | 8 | 100.0 | 0.007 | 0.019 | 0.004 | 27 | 41.7 | 108.8 | 118.1 | 6.6 |
| RA(1) | 18 | 81.3 | 0.019 | 0.028 | 0.010 | 29 | 34.4 | 110.0 | 123.0 | 6.1 |
| RA(2) | 10 | 99.5 | 0.007 | 0.019 | 0.004 | 24 | 41.7 | 108.9 | 117.8 | 6.6 |
| RA(3) | 7 | 100.0 | 0.007 | 0.019 | 0.004 | 26 | 41.7 | 108.8 | 118.1 | 6.6 |

Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JML ε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s weighted likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; Rk = performance rank of the method; %Acc = percentage of conditions with acceptable performance; Med = median; Q90 = 90% quantile. %Acc values larger than 90, biases smaller than 0.025, and relative RMSE values smaller than 107 are printed in bold.
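For readers who wish to try out individual methods from Table 2, the following R sketch shows how some of the compared estimators can be fitted with packages cited in this article (eRm [61], TAM [73], sirt [128]). The function names come from these packages, but argument names and output structures may differ across package versions, so the snippet is an illustration rather than the study's exact setup:

```r
# Illustrative sketch: fit the Rasch model with several estimation
# methods from Table 2 on simulated data.
library(sirt)   # JML, pairwise methods, data simulation
library(eRm)    # CML estimation
library(TAM)    # MML estimation

set.seed(1)
theta <- rnorm(1000)                         # N = 1000 normal abilities
b     <- seq(-2, 2, length.out = 10)         # I = 10 spread-out difficulties
dat   <- sirt::sim.raschtype(theta, b = b)   # simulate dichotomous responses

mod_cml <- eRm::RM(dat)                      # CML
mod_mml <- TAM::tam.mml(resp = dat)          # MML with normal trait (MMLN)
mod_jml <- sirt::rasch.jml(dat)              # JML
mod_pw  <- sirt::rasch.pairwise(dat)         # pairwise (MINCHI-type) method

# Sign and centering conventions differ across packages: eRm reports
# easiness parameters, so item difficulties are obtained as -betapar.
head(-mod_cml$betapar)
head(mod_mml$xsi$xsi)
```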
Table 3. Bias and relative RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

| Method | Bias NO | Bias AM | Bias UN | Bias BE | Bias LC2 | Bias LC3 | RMSE NO | RMSE AM | RMSE UN | RMSE BE | RMSE LC2 | RMSE LC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 0.004 | 0.013 | 0.010 | 0.012 | 0.063 | 0.030 | 102.2 | 102.4 | 102.3 | 103.6 | 128.9 | 108.8 |
| MMLLS(3) | 0.003 | 0.006 | 0.007 | 0.010 | 0.029 | 0.015 | 101.6 | 101.0 | 101.5 | 102.7 | 112.2 | 103.9 |
| MMLLS(4) | 0.003 | 0.005 | 0.004 | 0.004 | 0.009 | 0.004 | 101.6 | 101.0 | 100.4 | 101.0 | 104.6 | 100.0 |
| MMLMN(5) | 0.008 | 0.005 | 0.005 | 0.005 | 0.023 | 0.011 | 101.0 | 100.4 | 100.6 | 102.4 | 106.2 | 105.3 |
| MMLMN(7) | 0.003 | 0.004 | 0.005 | 0.006 | 0.024 | 0.013 | 101.6 | 101.0 | 100.9 | 102.6 | 106.4 | 105.9 |
| MMLMN(11) | 0.003 | 0.004 | 0.005 | 0.005 | 0.006 | 0.005 | 101.7 | 101.2 | 100.2 | 100.5 | 100.0 | 101.1 |
| MMLMN(15) | 0.002 | 0.004 | 0.003 | 0.003 | 0.009 | 0.003 | 101.7 | 101.2 | 100.6 | 101.2 | 104.8 | 100.0 |
| MMLLC(2) | 0.054 | 0.055 | 0.044 | 0.037 | 0.019 | 0.033 | 118.5 | 118.0 | 111.1 | 108.1 | 102.8 | 106.0 |
| MMLLC(3) | 0.016 | 0.017 | 0.020 | 0.020 | 0.012 | 0.020 | 102.3 | 101.9 | 102.0 | 102.8 | 102.0 | 102.2 |
| MMLLC(4) | 0.009 | 0.011 | 0.012 | 0.011 | 0.007 | 0.008 | 101.7 | 101.2 | 101.0 | 101.4 | 101.8 | 100.5 |
| MMLLC(5) | 0.009 | 0.010 | 0.011 | 0.010 | 0.006 | 0.009 | 101.7 | 101.2 | 100.7 | 101.2 | 101.7 | 100.5 |
| CML | 0.004 | 0.003 | 0.003 | 0.004 | 0.003 | 0.004 | 102.6 | 101.8 | 101.2 | 101.9 | 102.6 | 101.4 |
| JMLM | 0.083 | 0.081 | 0.081 | 0.082 | 0.081 | 0.082 | 144.7 | 142.4 | 144.0 | 144.8 | 143.9 | 145.0 |
| JMLW | 0.036 | 0.034 | 0.037 | 0.039 | 0.036 | 0.038 | 113.4 | 111.8 | 113.5 | 114.8 | 113.5 | 114.3 |
| PJML(1.0) | 0.106 | 0.107 | 0.114 | 0.115 | 0.118 | 0.114 | 154.1 | 154.8 | 161.1 | 161.6 | 167.6 | 162.7 |
| PJML(1.5) | 0.002 | 0.010 | 0.009 | 0.011 | 0.047 | 0.024 | 100.0 | 100.0 | 100.0 | 101.0 | 116.6 | 103.9 |
| PJML(2.0) | 0.075 | 0.074 | 0.070 | 0.070 | 0.083 | 0.074 | 136.4 | 134.7 | 134.0 | 134.3 | 142.7 | 136.4 |
| JML ε(0.1) | 0.152 | 0.150 | 0.150 | 0.151 | 0.150 | 0.150 | 203.9 | 201.0 | 203.7 | 204.0 | 203.6 | 204.6 |
| JML ε(0.2) | 0.042 | 0.042 | 0.043 | 0.042 | 0.047 | 0.041 | 111.0 | 109.7 | 113.3 | 110.5 | 116.7 | 112.2 |
| JML ε(0.24) | 0.033 | 0.032 | 0.028 | 0.027 | 0.044 | 0.032 | 102.4 | 101.2 | 102.9 | 100.0 | 109.4 | 103.2 |
| JML ε(0.3) | 0.056 | 0.058 | 0.055 | 0.055 | 0.069 | 0.059 | 121.2 | 120.5 | 121.0 | 119.4 | 128.0 | 121.4 |
| JML ε(0.4) | 0.139 | 0.139 | 0.140 | 0.141 | 0.146 | 0.142 | 187.1 | 185.9 | 189.1 | 187.2 | 193.2 | 190.0 |
| JML ε(0.5) | 0.216 | 0.217 | 0.219 | 0.219 | 0.222 | 0.219 | 260.8 | 260.5 | 266.4 | 265.2 | 268.3 | 266.4 |
| PMML | 0.004 | 0.015 | 0.010 | 0.013 | 0.073 | 0.034 | 102.3 | 102.8 | 102.4 | 103.8 | 136.3 | 110.8 |
| PCML | 0.004 | 0.004 | 0.004 | 0.004 | 0.004 | 0.005 | 107.4 | 107.9 | 106.4 | 106.9 | 107.9 | 106.4 |
| LLLA | 0.004 | 0.014 | 0.010 | 0.013 | 0.066 | 0.031 | 102.0 | 102.2 | 102.2 | 103.6 | 130.4 | 109.2 |
| MINCHI | 0.003 | 0.004 | 0.004 | 0.003 | 0.003 | 0.002 | 106.5 | 107.1 | 105.7 | 106.1 | 107.1 | 105.4 |
| EVM(2) | 0.004 | 0.005 | 0.004 | 0.005 | 0.004 | 0.006 | 113.6 | 114.4 | 113.2 | 113.6 | 114.5 | 113.3 |
| EVM(3) | 0.004 | 0.005 | 0.004 | 0.005 | 0.004 | 0.006 | 114.0 | 114.9 | 113.7 | 113.9 | 114.9 | 113.7 |
| RA(1) | 0.019 | 0.020 | 0.018 | 0.020 | 0.019 | 0.021 | 126.9 | 128.5 | 127.2 | 127.2 | 127.6 | 128.0 |
| RA(2) | 0.004 | 0.005 | 0.004 | 0.005 | 0.004 | 0.006 | 113.6 | 114.4 | 113.2 | 113.6 | 114.5 | 113.3 |
| RA(3) | 0.004 | 0.005 | 0.004 | 0.005 | 0.004 | 0.006 | 114.0 | 114.9 | 113.7 | 113.9 | 114.9 | 113.7 |

Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JML ε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s weighted likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; NO = normal distribution; AM = asymmetric mixture distribution; UN = uniform distribution; BE = U-shaped beta distribution; LC2 = located 2-class distribution; LC3 = located 3-class distribution. Biases smaller than 0.025 and relative RMSE values smaller than 107 are printed in bold.
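The six trait distributions in Table 3 (and in Table 4 below) vary in skewness and discreteness. The following R sketch generates standardized variants of such distributions; the mixture weights, component parameters, and class locations below are illustrative assumptions, not the exact specifications used in the study:

```r
# Illustrative generation of the six data-generating trait distributions
# (all parameter values are assumed for illustration).
set.seed(1)
N <- 1000
z <- function(x) (x - mean(x)) / sd(x)    # standardize to M = 0, SD = 1

theta_NO  <- rnorm(N)                                     # normal
theta_AM  <- z(ifelse(runif(N) < 0.7,                     # asymmetric mixture
                      rnorm(N, -0.5, 0.8), rnorm(N, 1.5, 0.6)))
theta_UN  <- z(runif(N))                                  # uniform
theta_BE  <- z(rbeta(N, 0.5, 0.5))                        # U-shaped beta
theta_LC2 <- z(sample(c(-1, 1),    N, replace = TRUE))    # located 2-class
theta_LC3 <- z(sample(c(-1, 0, 1), N, replace = TRUE))    # located 3-class
```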
Table 4. Bias and relative RMSE for different estimation methods for a sample size of N = 250 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

| Method | Bias NO | Bias AM | Bias UN | Bias BE | Bias LC2 | Bias LC3 | RMSE NO | RMSE AM | RMSE UN | RMSE BE | RMSE LC2 | RMSE LC3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMLN | 0.008 | 0.012 | 0.015 | 0.012 | 0.020 | 0.006 | 106.9 | 105.2 | 105.9 | 105.1 | 107.2 | 105.0 |
| MMLLS(3) | 0.007 | 0.008 | 0.014 | 0.011 | 0.011 | 0.009 | 106.7 | 104.4 | 105.7 | 105.1 | 105.8 | 105.4 |
| MMLLS(4) | 0.007 | 0.008 | 0.011 | 0.008 | 0.015 | 0.010 | 106.7 | 104.4 | 105.1 | 104.3 | 104.7 | 104.6 |
| MMLMN(5) | 0.011 | 0.012 | 0.009 | 0.009 | 0.030 | 0.030 | 106.2 | 104.2 | 104.5 | 103.4 | 105.5 | 107.0 |
| MMLMN(7) | 0.008 | 0.008 | 0.011 | 0.008 | 0.029 | 0.031 | 106.8 | 104.6 | 105.2 | 104.4 | 106.0 | 107.3 |
| MMLMN(11) | 0.008 | 0.008 | 0.011 | 0.007 | 0.027 | 0.014 | 106.7 | 104.5 | 105.1 | 104.3 | 105.5 | 105.6 |
| MMLMN(15) | 0.008 | 0.009 | 0.011 | 0.007 | 0.014 | 0.011 | 106.7 | 104.6 | 105.1 | 104.2 | 104.6 | 105.2 |
| MMLLC(2) | 0.029 | 0.025 | 0.017 | 0.014 | 0.004 | 0.018 | 103.8 | 102.0 | 102.6 | 102.1 | 104.2 | 102.4 |
| MMLLC(3) | 0.006 | 0.004 | 0.008 | 0.005 | 0.006 | 0.004 | 105.2 | 103.4 | 104.4 | 103.7 | 104.7 | 104.1 |
| MMLLC(4) | 0.005 | 0.006 | 0.010 | 0.007 | 0.007 | 0.005 | 106.1 | 104.1 | 104.9 | 104.0 | 104.8 | 104.6 |
| MMLLC(5) | 0.005 | 0.007 | 0.010 | 0.007 | 0.006 | 0.006 | 106.3 | 104.2 | 104.9 | 104.0 | 104.5 | 104.8 |
| CML | 0.008 | 0.009 | 0.012 | 0.008 | 0.009 | 0.008 | 106.9 | 104.8 | 105.3 | 104.4 | 105.6 | 105.1 |
| JMLM | 0.010 | 0.010 | 0.012 | 0.009 | 0.010 | 0.010 | 107.1 | 104.8 | 105.4 | 104.5 | 105.4 | 105.3 |
| JMLW | 0.007 | 0.005 | 0.007 | 0.007 | 0.004 | 0.007 | 105.2 | 103.1 | 103.5 | 102.7 | 104.0 | 103.3 |
| PJML(1.0) | 0.015 | 0.012 | 0.016 | 0.017 | 0.012 | 0.025 | 104.6 | 103.0 | 103.3 | 102.5 | 105.4 | 102.8 |
| PJML(1.5) | 0.010 | 0.013 | 0.014 | 0.010 | 0.021 | 0.005 | 107.1 | 105.4 | 105.6 | 104.7 | 107.6 | 104.7 |
| PJML(2.0) | 0.020 | 0.022 | 0.023 | 0.019 | 0.027 | 0.017 | 109.2 | 107.3 | 107.6 | 106.6 | 109.0 | 106.8 |
| JML ε(0.1) | 0.022 | 0.021 | 0.023 | 0.021 | 0.023 | 0.021 | 108.9 | 106.5 | 107.2 | 106.3 | 107.3 | 107.0 |
| JML ε(0.2) | 0.011 | 0.011 | 0.011 | 0.011 | 0.015 | 0.008 | 103.5 | 103.8 | 102.9 | 103.5 | 104.4 | 103.1 |
| JML ε(0.24) | 0.011 | 0.011 | 0.010 | 0.010 | 0.013 | 0.008 | 102.4 | 102.7 | 101.9 | 102.6 | 103.4 | 102.0 |
| JML ε(0.3) | 0.015 | 0.013 | 0.014 | 0.015 | 0.015 | 0.015 | 102.2 | 101.1 | 101.1 | 101.2 | 101.7 | 101.2 |
| JML ε(0.4) | 0.026 | 0.026 | 0.028 | 0.032 | 0.027 | 0.030 | 100.0 | 100.3 | 100.0 | 100.9 | 100.9 | 100.0 |
| JML ε(0.5) | 0.045 | 0.042 | 0.041 | 0.044 | 0.038 | 0.046 | 102.5 | 100.0 | 100.3 | 100.0 | 100.0 | 101.1 |
| PMML | 0.005 | 0.010 | 0.013 | 0.013 | 0.018 | 0.009 | 107.6 | 105.7 | 107.0 | 106.2 | 106.5 | 105.6 |
| PCML | 0.009 | 0.010 | 0.013 | 0.009 | 0.009 | 0.009 | 108.1 | 105.9 | 106.7 | 105.8 | 106.1 | 107.3 |
| LLLA | 0.013 | 0.016 | 0.019 | 0.016 | 0.019 | 0.014 | 107.6 | 105.8 | 106.6 | 105.7 | 106.9 | 106.1 |
| MINCHI | 0.005 | 0.006 | 0.009 | 0.006 | 0.006 | 0.005 | 107.2 | 104.9 | 105.7 | 104.8 | 105.3 | 106.4 |
| EVM(2) | 0.009 | 0.010 | 0.013 | 0.009 | 0.009 | 0.008 | 108.9 | 106.6 | 107.6 | 106.5 | 106.5 | 108.4 |
| EVM(3) | 0.009 | 0.010 | 0.013 | 0.009 | 0.009 | 0.009 | 108.9 | 106.6 | 107.6 | 106.5 | 106.4 | 108.4 |
| RA(1) | 0.018 | 0.019 | 0.021 | 0.017 | 0.017 | 0.019 | 111.2 | 108.8 | 109.9 | 108.9 | 108.4 | 110.9 |
| RA(2) | 0.009 | 0.010 | 0.013 | 0.009 | 0.009 | 0.008 | 108.9 | 106.6 | 107.6 | 106.5 | 106.5 | 108.4 |
| RA(3) | 0.009 | 0.010 | 0.013 | 0.009 | 0.009 | 0.009 | 108.9 | 106.6 | 107.6 | 106.5 | 106.4 | 108.4 |

Note. CML = conditional maximum likelihood; EVM = eigenvector method; JML = joint maximum likelihood; JML ε = JML with ε adjustment; JMLM = JML with maximum likelihood ability estimator; JMLW = JML with Warm’s weighted likelihood ability estimator; LLLA = log-linear by linear association method; MINCHI = minimum chi-square estimation; MML = marginal maximum likelihood; MMLLC = MML with located latent classes; MMLLS = MML with log-linear smoothing; MMLMN = MML with multinomial distribution; MMLN = MML with normal distribution; PJML = penalized JML; PMML = pairwise MML; PCML = pairwise CML; RA = row-averaging method; NO = normal distribution; AM = asymmetric mixture distribution; UN = uniform distribution; BE = U-shaped beta distribution; LC2 = located 2-class distribution; LC3 = located 3-class distribution. Biases smaller than 0.025 and relative RMSE values smaller than 107 are printed in bold.
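A closing note on the outcome measures: the relative RMSE columns have a minimum of 100.0 within each condition, which is consistent with each method's RMSE being rescaled to the best-performing method. Under that assumption (the paper's exact definition may differ in detail, for instance in how item-level values are aggregated), the two criteria could be computed as in the sketch below; the array est and all inputs are hypothetical:

```r
# Illustrative computation of (mean absolute) bias and relative RMSE,
# assuming relative RMSE = 100 * RMSE / min(RMSE) across methods.
# 'est' is a hypothetical array: replications x items x methods.
outcome_measures <- function(est, b_true) {
  bias     <- apply(est, c(2, 3), mean) - b_true       # items x methods
  rmse     <- sqrt(apply(sweep(est, 2, b_true)^2, c(2, 3), mean))
  mab      <- colMeans(abs(bias))                      # mean absolute bias
  avg_rmse <- colMeans(rmse)                           # averaged over items
  rel_rmse <- 100 * avg_rmse / min(avg_rmse)           # best method = 100
  list(mean_abs_bias = mab, relative_rmse = rel_rmse)
}

# Hypothetical usage with fabricated estimates
# (500 replications, 10 items, 3 methods):
b_true <- seq(-2, 2, length.out = 10)
est <- array(rep(b_true, each = 500), dim = c(500, 10, 3)) +
  rnorm(500 * 10 * 3, sd = 0.2)
outcome_measures(est, b_true)
```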