Depth Induced Regression Medians and Uniqueness

The notion of the median in one dimension is a foundational element in nonparametric statistics. It has been extended to multi-dimensional settings, both in location and in regression, via notions of data depth. Regression depth (RD) and projection regression depth (PRD) represent the two most promising notions in regression; Carrizosa depth D_C is another depth notion in regression. Depth-induced regression medians (maximum depth estimators) serve as robust alternatives to the classical least squares estimator. The uniqueness of regression medians is indispensable in the discussion of their properties and the asymptotics (consistency and limiting distribution) of sample regression medians. Are the regression medians induced from RD, PRD, and D_C unique? Answering this question is the main goal of this article. It is found that only the regression median induced from PRD possesses the desired uniqueness property. The conventional remedy for non-uniqueness, taking the average of all medians, might yield an estimator that no longer possesses the maximum depth in both the RD and D_C cases. These and other findings indicate that PRD and its induced median are highly favorable among their leading competitors.


Introduction
The regular univariate sample median, defined as the innermost (deepest) point of a data set, is unique. (If the sample median is defined to be the point θ that minimizes the sum of its distances to sample points (i.e., θ = arg min θ∈R 1 ∑ n i=1 |θ − x i |, where x i , i = 1, · · · , n are the given n sample points in R 1 ), then it is not unique. However, to overcome this drawback, conventionally it is defined as (x (⌊(n+1)/2⌋) + x (⌊(n+2)/2⌋) )/2, where x (1) ≤ x (2) ≤ · · · ≤ x (n) are the ordered values of the x i 's and ⌊·⌋ is the floor function. Namely, it is the innermost point (from both the left and the right direction) or the average of the two deepest sample points. Hence, it is unique.) The population median, defined as the 1/2-th quantile of the underlying distribution (recall that, for any univariate distribution function F and for 0 < p < 1, the quantity F −1 (p) := inf{x : F(x) ≥ p} is called the pth quantile or fractile of F; see page 3 of Serfling (1980) [1]; there are other versions of the definition), is also unique. The most outstanding feature of the univariate median is its robustness. In fact, among all translation equivariant location estimators, it has the best possible breakdown point (Donoho (1982) [2]) and the minimum maximum bias if the underlying distribution has a unimodal symmetric density (Huber (1964) [3]). Besides serving as a promising robust location estimator, the univariate median also provides a base for a center-outward ordering (in terms of deviations from the median), an alternative to the traditional left-to-right ordering.

Introduction
The Rasch model (RM; [1][2][3]) is one of the most popular item response theory (IRT) models [4][5][6][7][8][9]. Because the RM is widespread in diverse applications (e.g., [10][11][12][13][14][15][16]), it is important to select appropriate estimation methods. For the RM, a variety of estimation methods has been proposed. In this article, a comprehensive comparison of different estimation methods for the RM is conducted. We manipulate the factors of test length (i.e., number of items), sample size, type of ability distribution, and distribution of item difficulties. Recommendations for the choice of estimation methods can be drawn for empirical applications that utilize the RM.
The article is structured as follows. In Section 2, the RM is introduced. In Section 3, several estimation methods are reviewed. In Section 4, we present the results of a simulation study that compares a wide range of estimation methods. Finally, the paper closes with a discussion in Section 5.

Rasch Model
The RM [1,2,[17][18][19][20][21][22][23][24][25][26][27]] is a statistical model for dichotomous item responses X pi for persons p = 1, . . . , N and items i = 1, . . . , I. It assumes the existence of a latent variable θ (so-called ability) that accounts for the dependence among item responses. The item response function for the Rasch model is given as

P(X pi = 1 | θ p ) = exp(θ p − b i ) / (1 + exp(θ p − b i )), (1)

where θ p is the ability of person p and b i is the item difficulty for item i. Abilities θ p can be either modeled as fixed effects or random effects [28,29]. In the treatment of fixed effects, every person is associated with an ability parameter that has to be estimated. In the random effects situation, a distribution for the ability variable is posed; that is, θ ∼ G and the unknown distribution G must be estimated in a parametric, semiparametric, or nonparametric way. Note that the RM parameters are determined up to a constant. Hence, either the mean of the abilities or the mean of the item difficulties has to be fixed to zero for reasons of identification [30]. The posed functional form of the item response function (1) in the RM can be assessed by item fit statistics [31]. We would also like to emphasize that the RM only places low requirements on sample sizes because only one parameter per item (i.e., item difficulty b i ) is estimated [9]. In addition to Equation (1), item responses X pi are assumed to be locally independent:

P(X p1 = x 1 , . . . , X pI = x I |θ p ) = ∏ I i=1 P(X pi = x i |θ p ). (2)

This means that there are no residual associations among items after taking the ability θ p into account. Assumption (2) can be tested in empirical applications [32][33][34]. It can be argued that the unidimensionality assumption in Equation (2) is only a crude approximation to real data and enables the extraction of a summary ability variable. The local independence assumption can then be understood as an assumption that residual associations among items cancel out on average.
This means that there will always exist positive and negative residual associations after controlling for the extracted ability variable θ.
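The item response function (1) and the identification issue (invariance under a common shift of abilities and difficulties) can be illustrated with a short Python sketch; the helper name irf is ours.

```python
import numpy as np

def irf(theta, b):
    """Rasch item response function: P(X_pi = 1 | theta_p) for item difficulty b_i."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

# a person whose ability equals the item difficulty solves the item with probability 1/2
print(irf(0.0, 0.0))  # 0.5

# identification: shifting all abilities and difficulties by the same constant c
# leaves every response probability unchanged, so one mean must be fixed to zero
theta, b, c = 0.7, np.array([-1.0, 0.0, 1.5]), 3.0
print(np.allclose(irf(theta + c, b + c), irf(theta, b)))  # True
```

The sketch also shows that success probabilities increase in ability and decrease in difficulty, as implied by Equation (1).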
An important property of the Rasch model is that the sum score S p = ∑ I i=1 X pi is a sufficient statistic for θ p [30]. Hence, the ability estimate is a nonlinear function of the sum score S p , and it does not matter in the computation of the ability which of the items have been solved by a person. Moreover, with at least a moderate number of items, the nonlinear relation of S p and θ p can be closely approximated by a linear function, which explains the resemblance between classical test theory [35] and the RM [36]. Moreover, note that the proportion correct for an item is a sufficient statistic for the item difficulty b i .
In this article, the RM is a mixed effects logistic model with a random person effect θ, and item difficulties b i are fixed effects [28,[37][38][39][40][41]. The formulation of the RM as a mixed effects model has the advantage that item difficulties can alternatively be considered as random effects [28]. Moreover, more complex hierarchical structures (e.g., students nested within schools) can also be accommodated [39,42].
In the remainder of the paper, we only focus on the estimation of item parameters. We review several estimation methods for the RM in the next section.

Estimation Methods for the Rasch Model
A variety of estimation methods has been proposed for the RM [23,43,44]. In this section, we contrast joint maximum likelihood, conditional maximum likelihood, marginal maximum likelihood, and limited information estimation methods.

Joint Maximum Likelihood (JML) Estimation
Joint maximum likelihood (JML; [25,45]) methods treat person abilities θ p as fixed effects. In JML, the vector of person parameters γ = (θ 1 , . . . , θ N ) is estimated simultaneously with the vector of item parameters b = (b 1 , . . . , b I ). The JML estimation algorithm alternates between estimating γ and b within each iteration. Note that the number of estimated parameters grows with the number of observations (i.e., number of persons times number of items). This property is also denoted as the incidental parameter problem, resulting in the undesirable property that JML estimates are not consistent [46][47][48]. However, several bias correction methods can be utilized to circumvent this issue. The different JML estimation variants are described in more detail in the following.

JML with Bias Correction (JMLM and JMLW)
As mentioned above, the JML estimation algorithm cycles between the steps of estimating person and item parameters. For persons that solved no or all items, no finite ability estimate θ p exists. The JMLM method eliminates the persons with these extreme scores from the JML estimation when estimating item parameters. In contrast, a modified ability estimation method by Warm [49] can be used (JMLW) that results in finite ability estimates and does not require the elimination of persons from the analysis. Interestingly, the JMLW method can be interpreted as a Bayesian estimation method with a Jeffreys prior for abilities [50]. The bias due to incidental parameters can be corrected (or at least reduced) in JMLM and JMLW by a subsequent adjustment of estimated item parameters [51,52]. With obtained item parameters b̂ i from the alternating estimation approach, the finally computed bias-corrected item parameter is given as (I − 1)/I · b̂ i . Note that the adjustment becomes negligible with an increasing number of items I.
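The alternating scheme and the (I − 1)/I correction can be sketched as follows. This is an illustrative simplification on simulated data: robust bisection solves of the monotone estimating equations stand in for the Newton steps typically used in implementations.

```python
import numpy as np

rng = np.random.default_rng(1)
N, I = 500, 10
theta_true = rng.normal(0, 1, N)
b_true = np.linspace(-2, 2, I)
X = (rng.random((N, I)) < 1/(1+np.exp(-(theta_true[:, None] - b_true[None, :])))).astype(float)

S = X.sum(1)
keep = (S > 0) & (S < I)        # JMLM: drop persons with extreme scores
Xk, Sk = X[keep], S[keep]

def bisect(target, offsets, lo=-10.0, hi=10.0):
    """Solve sum_i sigmoid(t - offsets_i) = target for t (monotone increasing in t)."""
    target = np.asarray(target, float)
    lo = np.full(target.shape, lo)
    hi = np.full(target.shape, hi)
    for _ in range(50):
        mid = (lo + hi) / 2
        val = (1/(1+np.exp(-(mid[..., None] - offsets)))).sum(-1)
        lo = np.where(val < target, mid, lo)
        hi = np.where(val < target, hi, mid)
    return (lo + hi) / 2

b = np.zeros(I)
for _ in range(30):                      # alternate person and item steps
    th = bisect(Sk, b)                   # abilities given item parameters
    b = -bisect(Xk.sum(0), -th)          # difficulties given abilities
    b -= b.mean()                        # identification constraint
b_corr = (I - 1) / I * b                 # bias-corrected item parameters
```

With the given seed, the corrected estimates track the true difficulties closely; the correction matters most for short tests, as noted above.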

Penalized JML (PJML)
In penalized JML [53][54][55], a ridge penalty term is added to the log-likelihood function with a regularization parameter λ. That is, a term Pen(θ p ) = −λθ p 2 is added to the person-specific log-likelihood. Including a ridge penalty is equivalent to a Bayesian approach in which a normal prior distribution θ ∼ N(0, σ prior 2 ) with an appropriate choice of the prior standard deviation σ prior > 0 is employed. PJML also circumvents the exclusion of persons with extreme scores from the estimation. The optimal choice of σ prior will typically differ between situations in which the precision of person or of item parameter estimates should be optimized.
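The effect of the penalty can be illustrated for the ability step: the penalized score equation (score equation plus the prior derivative θ/σ prior 2, corresponding to λ = 1/(2 σ prior 2 )) has a finite root even for a perfect score. A minimal sketch with illustrative parameter values; the bisection solve is ours.

```python
import numpy as np

def pjml_ability(S_p, b, sigma_prior=1.5):
    """Root of S_p = sum_i sigmoid(theta - b_i) + theta / sigma_prior**2,
    i.e., the ridge-penalized score equation (monotone in theta)."""
    lo, hi = -30.0, 30.0
    for _ in range(80):
        mid = (lo + hi) / 2
        val = (1/(1+np.exp(-(mid - b)))).sum() + mid / sigma_prior**2
        lo, hi = (mid, hi) if val < S_p else (lo, mid)
    return (lo + hi) / 2

b = np.linspace(-2, 2, 10)
theta_perfect = pjml_ability(10, b)   # finite, unlike in unpenalized JML
theta_zero = pjml_ability(0, b)       # finite and negative
```

Without the penalty term, the score equation S p = ∑ i sigmoid(θ − b i ) has no finite solution for S p = 0 or S p = I, which is exactly the problem PJML avoids.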

JML with ε Adjustment (JMLε)
Another JML estimation approach that does not require eliminating persons with extreme scores is JML with ε adjustment (JMLε; [56][57][58]). JMLε estimation employs a modified likelihood by replacing the sufficient statistic S p with a modified sufficient statistic S p * that is defined by using an appropriate ε > 0. As a consequence, while S p takes values in the interval [0, I], S p * takes values in [ε, I − ε], and the latter statistic results in finite ability estimates. Interestingly, the estimation methods PJML and JMLε tackle the issue of non-finite ability estimates from different angles. The original JML approach (i.e., JMLM, which does not allow persons with extreme scores) seeks ability estimates θ p that solve the estimating equation

S p = f (θ p ), where f (θ p ) = ∑ I i=1 P(X pi = 1|θ p ). (4)

The PJML method adds a penalty Pen(θ p ) to the right side of Equation (4); that is, S p = f (θ p ) + Pen(θ p ). The JMLε method changes the left side of Equation (4), resulting in the modified estimating equation S p * = f (θ p ).
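The exact definition of S p * in the cited literature is not reproduced here; one simple mapping consistent with the stated range [ε, I − ε] is a linear rescaling of [0, I], sketched below. This is an illustrative assumption, not necessarily the definition used in [56][57][58].

```python
import numpy as np

def s_star(S, I, eps=0.24):
    """Illustrative epsilon adjustment (assumption, not the exact published rule):
    linearly map the sum score range [0, I] onto [eps, I - eps]."""
    return eps + S * (I - 2 * eps) / I

I = 10
S = np.arange(I + 1)
print(s_star(0, I), s_star(I, I))  # 0.24 9.76
```

Because S p * never reaches the boundary values 0 or I, the modified estimating equation S p * = f (θ p ) always has a finite root.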

Conditional Maximum Likelihood (CML) Estimation
Conditional maximum likelihood (CML; [43,[59][60][61][62]]) estimation can handle situations in which the ability variable is treated as either fixed or random. In CML estimation, only the vector of item difficulties b is estimated. The ability variable θ is removed from the estimation by conditioning on the sum score. One can show that the conditional distribution of the item responses X p given the sum score S p = ∑ I i=1 X pi does not depend on θ p [30]:

P(X p = x p | S p = r) = exp(−∑ I i=1 x pi b i ) / γ r (b), (5)

where γ r (b) = ∑ y:∑ i y i =r exp(−∑ I i=1 y i b i ) is the elementary symmetric function of order r. Hence, no distributional assumption about the ability variable has to be posed. In addition, item parameter estimates are consistent. CML estimation is computationally more demanding than JML, but efficient algorithms have been proposed [63,64].
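The conditional pattern probabilities given the sum score can be evaluated with the elementary symmetric functions, computed by the standard summation recursion. A minimal sketch with ε i = exp(−b i ); function names are ours.

```python
import numpy as np
from math import comb
from itertools import product

def esf(eps):
    """Elementary symmetric functions gamma_0, ..., gamma_I of eps_1, ..., eps_I."""
    g = np.zeros(len(eps) + 1)
    g[0] = 1.0
    for e in eps:
        g[1:] = g[1:] + e * g[:-1]   # gamma_r <- gamma_r + eps * gamma_{r-1}
    return g

def cond_prob(x, b):
    """P(X_p = x | S_p = sum(x)); theta_p does not appear."""
    eps = np.exp(-np.asarray(b))
    r = int(np.sum(x))
    return float(np.prod(eps ** np.asarray(x)) / esf(eps)[r])

b = np.array([-1.0, 0.0, 0.5, 1.0])
# conditional probabilities of all patterns with a fixed score sum to one
total = sum(cond_prob(x, b) for x in product([0, 1], repeat=4) if sum(x) == 2)
print(round(total, 10))  # 1.0
```

The recursion visits each item once, so the normalizing constants for all scores are obtained in O(I^2) operations rather than by enumerating all 2^I patterns.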

Marginal Maximum Likelihood (MML) Estimation
In marginal maximum likelihood estimation (MML; [68,69]), the latent variable θ is integrated out by posing a distributional assumption G γ for θ, where the distribution parameters γ are estimated simultaneously with b. The log-likelihood function l(b, γ) is maximized. The log-likelihood contribution of person p is given by

l p (b, γ) = log ∫ ∏ I i=1 P(X pi = x pi |θ) dG γ (θ). (6)

If the parametric specification G γ differs from the data-generating distribution H, biased item parameters can occur.
In the following subsections, different distributional specifications in MML are discussed. These MML variants differ in how deviations from normally distributed abilities are handled (see [70,71]).

MML with Normality Assumption (MMLN)
In most applications and the default of most IRT software packages [72,73], a normal distribution for θ is posed (MMLN). For identification of the parameters in the RM, the mean is set to zero, and the standard deviation σ is estimated along with the item parameters b. The integral in the log-likelihood function (6) is evaluated by numerical integration techniques. The consequences of applying a misspecified normal distribution have been frequently studied in the literature [74][75][76][77].
Different numerical approximations of the unidimensional integral involved in the likelihood function (see Equation (6)) have been proposed in the literature [78][79][80]. In our experience, the numerical approximations used as the default in IRT packages such as mirt [72] occasionally provide more accurate results than the corresponding defaults in the popular mixed effects R package lme4 [37].
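One common approximation of the integral in Equation (6) is Gauss-Hermite quadrature. A minimal sketch under a normal ability distribution; the specific grid size and item parameters are illustrative.

```python
import numpy as np
from itertools import product

# physicists' Gauss-Hermite nodes/weights, transformed for N(0, sigma^2)
nodes, weights = np.polynomial.hermite.hermgauss(21)
sigma = 1.0
theta = np.sqrt(2.0) * sigma * nodes
w = weights / np.sqrt(np.pi)   # weights sum to one after this transformation

b = np.array([-0.5, 0.5])

def marginal_prob(x):
    """P(X_p = x): integral of the pattern likelihood over N(0, sigma^2), on the grid."""
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    like = np.prod(np.where(np.asarray(x, bool), p, 1 - p), axis=1)
    return float(np.sum(w * like))

# marginal probabilities over all response patterns sum to one
total = sum(marginal_prob(x) for x in product([0, 1], repeat=2))
print(round(total, 10))  # 1.0
```

The marginal log-likelihood is then the sum over persons of log marginal_prob of their observed patterns, which is what an MMLN implementation maximizes over b and σ.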

MML with Multinomial Distribution (MMLMN)
MML with a multinomial distribution (MMLMN) estimates a discrete distribution for the ability variable θ (see [81]). A fixed grid of points θ 1 , . . . , θ C is chosen (e.g., a grid of equidistant θ points ranging from −4 to 4; see [82]). In MMLMN, the probabilities γ c = P(θ = θ c ) are freely estimated. The number of estimated parameters increases with the number C of grid points. An appropriate number C of discrete grid points must be chosen to ensure sufficiently stable estimation of item and distribution parameters.

MML with Log-Linear Smoothing (MMLLS)
MML with log-linear smoothing (MMLLS) avoids estimating a large number of distribution parameters in MMLMN. In this estimation method, a log-linear smoothing on the discrete probabilities γ c = P(θ = θ c ) is performed [82,83] (see also [84,85]). If the first two moments are smoothed, MMLLS corresponds to the estimation of discretized normal distribution. In empirical applications, smoothing is typically performed up to the first three or four moments [86][87][88]. These higher moments capture deviations from normality. The log-linear smoothing approach can also be extended to handle nonlinear relations among several latent variables [86].

MML with Located Latent Classes (MMLLC)
The estimation methods MMLMN and MMLLS presuppose the specification of a discrete grid of θ points. In MML with located latent classes (MMLLC; [89][90][91][92]), the values of the grid points θ c for C latent classes are estimated in addition to the probabilities γ c . In the RM with a test of I items, at most I/2 located classes can be specified because model parameters in larger models cannot be identified [89]. Notably, MMLLC poses the weakest assumptions about the data-generating distribution G, but it relies on a possibly doubtful discrete representation of the θ distribution. Classifying persons into different discrete ability levels might be conceptually appealing in empirical applications [93,94].

Limited Information Estimation Methods
So-called limited information methods (LIM) for estimating item parameters in the RM do not rely on the full item response pattern x p . These methods are often simpler to compute because they do not iterate through all item response patterns. LIM only consider marginal univariate or bivariate frequency distributions of item responses.

Pairwise Marginal Maximum Likelihood (PMML)
Pairwise MML (PMML; [95][96][97][98]) is a composite likelihood estimation method in which only pairwise item response probabilities P(X pi = x pi , X pj = x pj ) are modelled. The contributions of all item pairs (i, j) are taken into account. In principle, any distributional assumption about θ can be posed, as in the joint modeling of the probabilities in MML. However, a normal distribution is often assumed [95,99].

Pairwise Conditional Maximum Likelihood (PCML)
In pairwise CML (PCML; [100][101][102][103][104]), the conditional probabilities P(X pi = x pi , X pj = x pj )/P(X pi + X pj = x pi + x pj ) are used for defining an optimization function. Like CML, which conditions on the sum score, PCML removes θ from the estimating equations and does not pose distributional assumptions. The advantage of PCML compared to CML is its strongly reduced computational demand.

Minimum Chi-Square Method (MINCHI)
Minimum chi-square (MINCHI) estimation only relies on the bivariate frequencies f ij that are defined as

f ij = P(X pi = 1, X pj = 0). (7)

In MINCHI, the following squared distance is defined and minimized for determining the item parameter estimates b (see [30,105,106]):

χ 2 (b) = ∑ i<j ( f ij ε j − f ji ε i ) 2 / ( f ij + f ji ), (8)

where ε i = exp(−b i ). Under the RM, f ij / f ji = ε i /ε j holds for every ability level, so each numerator in Equation (8) vanishes at the true parameters. Fixed-point estimation equations have been proposed for computing the minimizer of Equation (8) (see [30]). Also, note that no distributional assumptions about θ are required for MINCHI estimation.
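The bivariate frequencies f ij are cheap to obtain from the data matrix. A sketch on simulated data; the sample frequencies of the four joint outcomes of an item pair partition the sample, which gives a quick consistency check.

```python
import numpy as np

rng = np.random.default_rng(7)
N, I = 2000, 4
theta = rng.normal(size=N)
b = np.array([-1.0, 0.0, 0.5, 1.0])
X = (rng.random((N, I)) < 1/(1+np.exp(-(theta[:, None] - b[None, :])))).astype(float)

F = (X.T @ (1 - X)) / N            # F[i, j]: relative frequency of X_i = 1 and X_j = 0
both = (X.T @ X) / N               # both items solved
neither = ((1 - X).T @ (1 - X)) / N
# the four joint outcomes of an item pair partition the sample
print(np.allclose(F + F.T + both + neither, 1.0))  # True
```

For an easier item i and a harder item j, the pattern "i solved, j not solved" is more frequent than the reverse, which is the information MINCHI (and the methods below) exploit.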

Row Averaging Method (RA)
Like MINCHI estimation, the row averaging method [107][108][109] relies on the bivariate frequencies f ij (see Equation (7)). A matrix B with entries b ij = log( f ij / f ji ) is formed. The row-wise average of the entries in the matrix B is used as an item parameter estimate [107,110]. If some cells (i, j) are empty, B cannot be computed. In this case, an alternative estimation method involving matrix powers has been proposed. Let F denote the matrix consisting of all elements f ij (the so-called incidence matrix). The computation of B can then rely on the entries of the matrix F* = F k , where k is an integer larger than one (e.g., 2 or 3).
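A minimal sketch of row averaging on simulated data without empty cells: under the RM, f ij / f ji = exp(b j − b i ) at every ability level, so the row averages of B recover the centered item difficulties up to sign. The sign convention below is ours.

```python
import numpy as np

rng = np.random.default_rng(7)
N, I = 2000, 4
theta = rng.normal(size=N)
b_true = np.array([-1.0, 0.0, 0.5, 1.0])
X = (rng.random((N, I)) < 1/(1+np.exp(-(theta[:, None] - b_true[None, :])))).astype(float)
F = (X.T @ (1 - X)) / N           # pairwise frequencies; no empty cells for this N

off = ~np.eye(I, dtype=bool)
B = np.zeros((I, I))
B[off] = np.log(F[off] / F.T[off])   # b_ij = log(f_ij / f_ji), approx. b_j - b_i
b_hat = -B.mean(axis=1)              # row averaging; sign chosen so b_hat tracks difficulty
```

Because B is antisymmetric, the estimates are automatically centered, matching the usual identification constraint.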

Eigenvector Method (EVM)
The eigenvector method (EVM; [111][112][113][114]) relies on the same preprocessing steps as RA. However, instead of row averaging, the first eigenvector of B is computed as the estimate of the vector of item difficulties. In the case of empty cells, the power matrices F* = F k (see Section 3.4.4) can be used. Note that RA and EVM do not require iterations and have low computational demands, which might make them attractive for large-scale applications.

Log-Linear by Linear Association Models (LLLA)
Log-linear by linear association (LLLA; [115][116][117][118]) models estimate item parameters through a pseudo-likelihood approach. It relies on the fact that a logistic regression for P(X pi = 1|S p − X pi ) of the item response X pi on the rest score S p − X pi can be specified in which θ does not appear (assuming a normal distribution of θ; see [119]). The logistic regression stacks data of all item responses and allows simultaneous estimation of all item parameters [118,120].
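The stacked logistic regression can be sketched with a plain Newton (IRLS) solver on simulated data. This is an illustrative implementation; the design matrix layout, the sign convention, and the centering step are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
N, I = 1500, 5
theta = rng.normal(size=N)
b_true = np.linspace(-1.0, 1.0, I)
X = (rng.random((N, I)) < 1/(1+np.exp(-(theta[:, None] - b_true[None, :])))).astype(float)

# stack one row per (person, item): item dummies plus the rest score S_p - X_pi
y = X.ravel()
item = np.tile(np.arange(I), N)
rest = (X.sum(1)[:, None] - X).ravel()
D = np.zeros((N * I, I + 1))
D[np.arange(N * I), item] = 1.0
D[:, I] = rest

beta = np.zeros(I + 1)
for _ in range(25):                    # Newton / IRLS for the pseudo-likelihood
    p = 1.0 / (1.0 + np.exp(-(D @ beta)))
    H = (D * (p * (1 - p))[:, None]).T @ D
    beta = beta + np.linalg.solve(H, D.T @ (y - p))

b_hat = -(beta[:I] - beta[:I].mean())  # centered estimates; larger intercept = easier item
```

The key point mirrored from the text is that θ does not appear in the regression: the rest score serves as the predictor, and all item parameters are estimated in one stacked fit.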

Purpose
Many simulation studies have compared the performance of different item parameter estimation methods for the RM. However, most studies only considered the main estimation methods CML, JML, and MML (see, e.g., [44,77,110,121] for the RM and [122] for the mixed effects logistic model). Moreover, they often only used limited variations of deviations from normality in the ability variable. In this study, we provide a comprehensive comparative simulation study of the performance of a large number of estimation methods under a wide range of θ distributions. Moreover, sample size, the number of items, and the distribution of item difficulties are systematically manipulated. This simulation study systematically extends the simulation design employed in [123].

Design
In the simulation study, item response data were generated for the RM. We manipulated five factors in the simulation. First, the sample size (N) of persons was manipulated, resulting in three levels N = 250, 500, 1000. These sample sizes reflect small-scale to large-scale applications of the RM. Second, we varied the number of items (I) and chose levels I = 10 and I = 30. In the simulation, a set of item difficulties was specified for the I = 10 condition; in the condition I = 30, the item parameters for I = 10 were used three times. The levels reflect a short and a long test in applications. Third, the range of item difficulties was manipulated. For a test with a symmetric item difficulty distribution, item parameters were chosen from the interval [−3, 3] for a wide range and from the interval [−1.5, 1.5] for a small range. Fourth, the skewness of item difficulties was varied. In the symmetric case, item parameters were chosen equidistantly from the intervals [−3, 3] and [−1.5, 1.5], respectively. In the case of a skew item difficulty distribution, larger item difficulties appear more frequently than smaller item difficulties. The precisely chosen item parameters can be found in Appendix A. Fifth, eight data-generating distributions for the latent ability variable θ in the RM were simulated. All distributions were standardized, that is, E(θ) = 0 and SD(θ) = 1. The eight simulated θ distributions are:

1. NO: A normal distribution (N(0, 1)) with zero mean and a standard deviation of one
2. Chi2: A scaled chi-squared distribution with one degree of freedom
3.

Analysis Models
Item parameters for the simulated datasets were estimated with the different methods discussed in Section 3. Throughout the simulation, we only considered the estimation of item parameters and did not consider person parameter estimation. To enable the comparability of item parameter estimates, we centered estimated item parameters obtained from each estimation method (i.e., they have zero mean).
For the PJML estimation (see Section 3.1.2), we chose normal priors N(0, σ prior 2 ) with σ prior = 1, 1.5, and 2. Notably, an optimal value of σ prior could also be estimated by cross-validation or empirical Bayes methods. For JMLε estimation (see Section 3.1.3), several values of ε were specified. For MMLLC estimation, we specified analyses with 2, 3, 4, and 5 located latent classes. Notably, the data-generating models LC2 and LC3 are expected to be properly handled by one of these models. For RA estimation (see Section 3.4.4), we used powers 1, 2, and 3 of the incidence matrix F for the estimation (i.e., F* = F k with k = 1, 2, 3 as the basis for the computation of the matrix B). For EVM estimation (see Section 3.4.5), powers 2 and 3 of the incidence matrix F were utilized.

Outcome Measures
The bias and root mean square error (RMSE) were computed for each estimated item parameter b̂ i . We consider two summary measures of item parameter recovery. First, the mean absolute bias MAB (also labeled as bias in the Results Section 4.5) quantifies the average bias of the item parameters. MAB values near zero indicate situations with unbiased item parameter estimates. Second, bias and variability are summarized in the average RMSE (ARMSE), defined by

ARMSE(b̂) = (1/I) ∑ I i=1 RMSE(b̂ i ).

To ease the comparison of different estimation methods independent of sample size, ARMSE values are normed with respect to the best-performing estimation method (with a corresponding value ARMSE best (b̂)) in each cell of the simulation. The so-called relative RMSE (RRMSE) is defined as

RRMSE(b̂) = 100 · ARMSE(b̂) / ARMSE best (b̂).

As a consequence, RRMSE values have an optimal value of 100, which is attained by the best-performing estimation method.
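The summary measures MAB, ARMSE, and RRMSE can be computed directly. A minimal sketch; the array layout is ours (rows are replications, columns are items).

```python
import numpy as np

def mab(est, true):
    """Mean absolute bias: average over items of |mean(b_hat_i) - b_i|."""
    return float(np.mean(np.abs(est.mean(axis=0) - true)))

def armse(est, true):
    """Average RMSE: average over items of sqrt(mean((b_hat_i - b_i)^2))."""
    return float(np.mean(np.sqrt(((est - true) ** 2).mean(axis=0))))

def rrmse(armse_method, armse_best):
    """Relative RMSE, normed at 100 for the best-performing method."""
    return 100.0 * armse_method / armse_best

est = np.array([[0.1, -0.1], [0.3, 0.1]])  # two replications, two items
true = np.zeros(2)
print(round(mab(est, true), 10))           # 0.1
print(round(rrmse(1.07, 1.0), 1))          # 107.0
```

The toy call also illustrates the acceptance thresholds used below: an MAB of 0.1 would be unacceptable, and an RRMSE of 107 sits exactly at the cutoff.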
To summarize the contribution of each of the manipulated factors in the simulation, we conducted an analysis of variance (ANOVA) based on a linear regression model and used a variance decomposition to assess factor importance (i.e., we computed eta-squared effect sizes).
Moreover, we classified estimation methods according to whether they showed acceptable performance in a particular condition. We defined acceptable performance for the bias if the bias (i.e., the MAB) was smaller than 0.025. Assuming a symmetric item difficulty distribution and that bias is proportional to the true item difficulty, this condition corresponds to a maximum item parameter bias of 0.05. An estimator had satisfactory performance concerning the RMSE if the relative RMSE was smaller than 107. This is equivalent to an average loss in precision in estimated item parameters of about 15% (i.e., 1.07 2 ≈ 1.145).

Results
In Table 1, the variance decomposition from the ANOVA of the different simulation factors for bias and the (relative) RMSE is presented. All terms up to three-way interactions were included. From the size of the residual variance, it can be concluded that the first three orders capture the most important sources of variance of the simulation factors. It turned out that the estimation method (Meth) was the most important first-order factor for bias and RMSE, followed by the range of item difficulties (Range) and the number of items (I). The performance of estimation methods for bias and RMSE depended on an interaction effect with the number of items. Interestingly, there was an interaction effect of estimation method and sample size (N) only for the RMSE and not for the bias. Moreover, the performance of estimation methods was also moderated by the range of the item difficulty distribution. Finally, there were also some important three-way terms involving estimation methods (i.e., N×I×Meth, N×Range×Meth, I×Range×Meth). For a selected number of cells, we present some results that demonstrate these interaction effects in more detail.
In Table 2, the performance of the different estimation methods for bias and RMSE is summarized across the 192 conditions of the simulation. CML and the LIM MINCHI (the best estimation method with respect to bias), PCML, EVM, and RA (with powers of the incidence matrix larger than 1) are approximately unbiased across simulation conditions. For MML estimation methods, only those methods were unbiased that specified the ability distribution flexibly enough. For the multinomial modeling (MMLMN), a large number of θ grid points (11 or 15; MMLMN(11) or MMLMN(15)) was needed to produce acceptable performance in most of the simulation conditions. At least three located latent classes (MMLLC) were needed for an acceptable estimation of item parameters with respect to bias. Notably, estimation under the normal distribution (MMLN) or with only two latent classes (MMLLC(2)) was unsuccessful in a variety of conditions. Furthermore, all JML variants showed biased item parameter estimates. PJML with a prior of σ prior = 1.5 turned out to be the best-performing JML method with respect to bias throughout all simulation conditions. Interestingly, the method JMLM that eliminates persons from estimation resulted in a smaller bias than JMLW that does not remove persons.
For the RMSE, JMLε with ε = 0.24 performed best. As for bias, only sufficiently flexible distributional specifications in MML resulted in acceptable performance for the RMSE. For log-linear smoothing (MMLLS), using four instead of only three moments was indicated. Again, located latent class models (MMLLC) produced relatively precise item parameter estimates that were even superior to estimates obtained from CML. LIM showed higher RMSE values compared to MML variants and CML. However, the best-performing LIM MINCHI outperformed MML with the normal distribution assumption (MMLN) and the widely implemented JML variants JMLM and JMLW. In particular, MINCHI (and partly PCML) should be preferred over EVM and RA estimation.

Table 3 shows the bias and the RMSE of the different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Six out of the eight data-generating models are depicted, which demonstrate the most important differences among estimation methods. The MML method that poses a normal distribution assumption (i.e., MMLN) provides the least bias if the latent ability was generated with a normal distribution. The largest bias was obtained if a located latent class model (LC(2) or LC(3)) generated the data. If θ was normally distributed, log-linear smoothing (MMLLS) and a multinomial distribution (MMLMN) with at least 7 grid points provided approximately unbiased estimates. Notably, located latent class models (MMLLC) have slightly increased bias, but their efficiency with respect to RMSE is even higher than that of MMLN.

Table 3. Bias and relative RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

LIM were unbiased or had only small biases, except in the case in which θ was generated with located latent classes and PMML or LLLA estimation was used. This finding can be explained by the fact that these two estimation methods rely on the then-incorrect normal distribution assumption. Moreover, note that using powers 2 or 3 of the incidence matrix in RA (and EVM) improved estimates for bias and (to a larger extent) RMSE. The estimation methods PCML and MINCHI outperformed EVM and RA in terms of the RMSE. The additional computational burden of the iterative methods PCML and MINCHI compared to EVM and RA might be acceptable in practical applications.
The results in Table 3 also indicate that flexible MML estimation methods are competitive with CML estimation for nonnormally distributed abilities. Among the JML estimation methods, the bias of PJML with prior σ prior = 1.5 (i.e., PJML(1.5)) was smallest. Across data-generating models, the RMSE for this estimation method was smallest in three of the six data constellations, while in the other three constellations, JMLε with ε = 0.24 performed best. However, it should also be emphasized that JMLε(0.24) introduced non-negligible bias in item parameters. The JMLW estimation method is preferred over JMLM in terms of bias and RMSE, but both methods were strongly inferior to PJML and JMLε.

Table A1 in Appendix B shows the bias and the RMSE of the different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. In general, biases in estimated item parameters were smaller than those for a wide range of item difficulties (presented in Table 3). This finding illustrates that item parameters with large true |b i | values (i.e., extremely small or extremely large difficulties) were more prone to bias than item difficulties with true parameters close to zero. It is also evident that the difference of LIM from the CML and MML methods was much smaller. Consequently, practitioners might prefer LIM for tests in which item difficulties do not have extreme values. In this case, the difference of PCML and MINCHI from RA and EVM also turned out to be smaller, although the former two methods might still be preferred. Interestingly, and in contrast to a test with a wide range of item difficulties, JMLM outperformed JMLW. Moreover, note that PJML(1.5) resulted in less biased and more precise estimates than JMLε(0.24).
Table A2 in Appendix B shows the bias and the RMSE of the different estimation methods for a sample size of N = 250 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. For a smaller sample size of N = 250 (compared to the large sample of N = 1000 in Table 3), JMLε(0.24) consistently resulted in the smallest RMSE but produced slightly biased estimates. Surprisingly, more flexible MML estimation methods were not firmly inferior to MMLN if θ followed a normal distribution. In particular, the located latent class model (MMLLC(2)) provided the most precise estimates among the MML methods for the sample size N = 250. Also, note that CML was not superior to all MML methods. Interestingly, the performance of LIM compared to MML and CML was even worse than for larger samples. Researchers should probably only opt for the computationally simpler LIM for sufficiently large sample sizes and tests with a smaller range of item difficulties.

Table 4 shows the bias and the RMSE of the different estimation methods for a sample size of N = 250 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. For a longer test containing 30 items, biases for all estimation methods were substantially smaller than for 10 items. Notably, differences among estimation methods also turned out to be small. For example, assuming a misspecified normal distribution (i.e., MMLN) only introduced small biases. For a sufficiently long test, the differences of the MML methods and CML from LIM were only modest, and practitioners might opt for the computationally simpler methods in this case. Again, PCML and MINCHI showed slightly better performance than the EVM and RA methods. It is also important to note that JMLε required a larger ε value of 0.4 or 0.5, compared to a short test, for realizing the maximum precision.
[Table 4 rows for MMLLS(4) and MMLMN(15): the numeric entries could not be recovered from the extracted text.]

Finally, Table A3 in Appendix B shows the bias and the RMSE for different estimation methods for a sample size of N = 1000 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Again, flexible MML specifications can compete with JML. PJML and JMLε(0.24) produced highly precise estimates, while the bias was almost negligible. It should be emphasized that these JML variants are at least as efficient as the CML or MML variants. LIM resulted in more variable estimates. However, PCML and MINCHI almost achieved the optimal efficiency of the JMLε or MML variants. Surprisingly, PMML produced biased item parameter estimates and might not be recommended. Further research is needed to determine whether this observation depends on the particular implementation used by the authors.

Discussion
In this article, we compared several estimation methods for the Rasch model. It has been shown that the choice of the ability distribution impacts the precision of estimated item parameters. The differences between estimation methods appeared larger for shorter (i.e., 10 items) than for longer (i.e., 30 items) tests. It turned out that MML with a flexible distribution can handle a nonnormally distributed trait well and can compete with CML. Interestingly, the JML variants PJML and JMLε outperformed conditional and marginal maximum likelihood as well as LIM in many situations in terms of the RMSE. Moreover, these improved JML methods resulted in approximately unbiased estimates for long tests and larger sample sizes. These findings could stimulate research to consider the JML methods PJML and JMLε instead of the widely implemented JMLM or JMLW variants. LIM methods are attractive for practitioners because they are not computationally demanding. It turned out that PCML and the MINCHI method outperformed the more widely used EVM and RA estimation methods.
Future research could investigate item parameter estimation in the RM for very short scales (e.g., I = 5 or I = 7 items). We suppose that differences among methods will appear larger in this situation. Moreover, the optimal tuning parameters σ_prior in PJML and ε in JMLε, depending on sample size, number of items, and the item difficulty distribution, remain to be determined. We expect that tuning parameters that are optimal for individual ability estimates do not necessarily coincide with those that are optimal with respect to the RMSE of estimated item parameters.
Throughout the simulation study, we assumed that the RM holds in the data. However, there might be situations where the RM is an intentionally misspecified IRT model [129]. First, the two-parameter logistic model [130] might have generated the item responses, but the misspecified RM is used as the fitting model. In the case of misspecified IRT models, different estimation functions quantify model deviations differently (see also [110]). Future research might evaluate the appropriateness of estimation methods with respect to robustness against violations of the assumptions of the 1PL model. Notably, each estimation method defines its own set of item difficulties in the population of students because the estimated difficulties are determined by a particular discrepancy function between the posed misspecified RM and a true IRT model that might involve very complex item response functions. Second, local dependence [34] is also often found in empirical data. The LIM methods MINCHI, PCML, EVM, and RA rely only on bivariate frequencies and not on the full response patterns. According to the preliminary experience of the authors of this paper, these methods can result in less biased item parameter estimates than CML, MML, or JML methods. Studying the effects of local dependence on different estimation methods in the RM might be an exciting topic for future research.
In many applications, missing item responses occur [131][132][133]. It can be expected that the estimation methods studied in this article can also be applied in situations in which data are missing completely at random (MCAR). It would be interesting to investigate whether MML has increased efficiency for MCAR data compared to LIM. For data that are missing at random, LIM (and ordinary CML) will likely fail (see [134]), and MML or JML will usually be preferred (see [87,135]).
Finally, we only considered frequentist estimation methods. In Bayesian estimation, prior distributions of item parameters can be included in the analysis, which can further stabilize the estimation of item difficulties [136][137][138][139][140][141]. We would like to note that when priors are normally distributed, a penalty function is added (or subtracted) in the estimation function. These penalties can be used not only for MML or JML but also for CML [142] or LIM [143].
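The penalty idea can be sketched for PJML: a normal prior N(0, σ_prior²) on the abilities subtracts a quadratic term θ²/(2σ_prior²) from the joint log-likelihood, which keeps ability estimates finite even for perfect response patterns. The following is a minimal illustrative sketch using plain gradient ascent; the function name `pjml_rasch`, the learning rate, and the iteration count are assumptions for illustration, not the implementation used in the study.

```python
import numpy as np

def pjml_rasch(X, sigma_prior=1.5, lr=0.05, iters=2000):
    """Penalized JML for the Rasch model (illustrative sketch).

    X : (n, I) binary response matrix (persons x items).
    The penalized joint log-likelihood adds -theta^2 / (2 sigma_prior^2)
    for each person; gradients are taken w.r.t. abilities theta and
    item difficulties b under P(X=1) = sigmoid(theta - b).
    """
    X = np.asarray(X, dtype=float)
    n, I = X.shape
    theta = np.zeros(n)
    b = np.zeros(I)
    for _ in range(iters):
        # model-implied success probabilities
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = X - p
        # gradient ascent steps; the prior shrinks theta toward zero
        theta += lr * (resid.sum(axis=1) - theta / sigma_prior**2)
        b += lr * (-resid.sum(axis=0))
        b -= b.mean()  # identification: center item difficulties
    return theta, b
```

With this parameterization, an item answered correctly more often receives a lower estimated difficulty, and persons with all-correct or all-incorrect patterns still obtain finite ability estimates because of the prior.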
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix B

Table A1 shows the bias and the RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Table A2 shows the bias and the RMSE for different estimation methods for a sample size of N = 250 and I = 10 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution. Table A3 shows the bias and the RMSE for different estimation methods for a sample size of N = 1000 and I = 30 items for a test with a wide range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.

Table A1. Bias and relative RMSE for different estimation methods for a sample size of N = 1000 and I = 10 items for a test with a small range of symmetrically distributed item difficulties as a function of the data-generating trait distribution.
