M-Hazy Vector Spaces over M-Hazy Field
About the Equivalence of the Latent D-Scoring Model and the Two-Parameter Logistic Item Response Model

Abstract: The generalization of binary operations in classical algebra to fuzzy binary operations is an important development in the field of fuzzy algebra. The paper proposes a new generalization of vector spaces over a field, called M-hazy vector spaces over an M-hazy field. Some fundamental properties of M-hazy fields, M-hazy vector spaces, and M-hazy subspaces are studied, and some important results are proved. Furthermore, linear transformations of M-hazy vector spaces are studied, and their important results are also proved. Finally, it is shown that M-fuzzifying convex spaces are induced by an M-hazy subspace of an M-hazy vector space.

Abstract: This article shows that the recently proposed latent D-scoring model of Dimitrov is statistically equivalent to the two-parameter logistic item response model. An analytical derivation and a numerical illustration are employed for demonstrating this finding. Hence, estimation techniques for the two-parameter logistic model can be used for estimating the latent D-scoring model. In an empirical example using PISA data, differences of country ranks are investigated when using different metrics for the latent trait. In the example, the choice of the latent trait metric matters for the ranking of countries. Finally, it is argued that an item response model with bounded latent trait values like the latent D-scoring model might have advantages for reporting results in terms of interpretation.


Introduction
In 1971, Rosenfeld [1] published an innovative paper on fuzzy subgroups. This article introduced the new field of abstract algebra and the new field of fuzzy mathematics. Many scientists and researchers worked in this field and obtained fruitful research. Liu [2,3] gave an important generalization in the field of fuzzy algebra by introducing fuzzy subrings of a ring and fuzzy ideals. Demirci [4] firstly introduced the fuzzification of binary operation to group structure through fuzzy equality [5] and introduced "vague groups." After this work, many researchers used this concept and extended it to several other useful directions such as [6][7][8][9][10]. In Demirci's approach, the characteristic of the degree between the fuzzy binary operation is not used, and the identity and inverse element of an element are also not unique. Liu and Shi [11] proposed a new approach to fuzzify the group structure by characterizing the degree of fuzzy binary operation, which is called M-hazy groups. It is important to mention that M-hazy associative law has been defined in order to obtain M-hazy groups. Mehmood et al. [12] extended this concept to the ring structure and gave a new method to the fuzzification of rings, which is defined by M-hazy rings. It is also worth mentioning that an M-hazy distributive law has been proposed so as to define M-hazy rings. Furthermore, Mehmood et al. [13] also provided the homomorphism theorems of M-hazy rings with its induced fuzzifying convexities. Liu and Shi [14] proposed M-hazy lattices. Fan et al. [15] introduced an M-hazy Γ-semigroup.
Vector space has been the most widely studied and used in linear algebra theory. A vector space is a set of elements with a binary addition operator and a multiplication operator that has closure under these two operations over a field, all while satisfying a set of axioms. Vector spaces are the realm of linear combinations, also known as superpositions, weighted sums, and sums with coefficients. Such sums occur throughout mathematics, both pure and applied, including statistics, science, engineering, and economics. The key word is "linear". Even when studying nonlinear phenomena, it is often useful to approximate with a simpler linear model. You can say that vector spaces are one of the great organizing tools of mathematics, helping reveal a structural similarity in a wide variety of topics found in such different contexts that they may seem completely different. Suppose you stand in front of a house. It is rather old but beautifully constructed of

Introduction
Item response theory (IRT; [1]) is the statistical analysis of test items in education, psychology, and other fields of social sciences. Typically, a number of test items are administered to test takers, and the interest is to infer the ability (performance or trait) of them. IRT models relate observed item responses to unobserved latent traits. Because the latent trait is unobserved, there are many plausible choices for modeling these relationships. The most popular class of IRT models comprises logistic IRT models [2]. Recently, in a series of papers, Dimitrov proposed an alternative IRT model, the so-called latent D-scoring model [3]. The main goal of this paper is to demonstrate that the newly proposed IRT model is statistically equivalent to the well-established two-parameter logistic IRT model.
The paper is structured as follows. In Section 2, IRT models are introduced in their general form. Afterward, the logistic IRT model and the latent D-scoring model are discussed. In Section 3, we show the statistical equivalence of the latent D-scoring model and the logistic IRT model utilizing an analytical derivation and a numerical illustration. Furthermore, we study the properties of the two models. Section 4 presents an empirical example that compares outcomes of the two different modeling strategies and compares them with two alternative parameterizations of the latent trait. Finally, the article closes with a discussion.

Item Response Modeling
In Section 2.1, we discuss the indeterminacy of the latent trait in IRT models. In Section 2.2, we focus on the logistic IRT model and its estimation. As an alternative IRT model, the latent D-scoring model is introduced in Section 2.3.

Indeterminacy of the Latent Trait in IRT Models
A unidimensional IRT model for dichotomous item responses $X_i \in \{0, 1\}$ is a statistical model [2]

$$P(\mathbf{X} = \mathbf{x}) = \int \prod_{i=1}^{I} P_i(x_i, \theta) \, f(\theta) \, d\theta, \tag{1}$$

where $\mathbf{x} = (x_1, \ldots, x_I)$, $f$ denotes the density function of the latent variable $\theta$ (also denoted as the latent trait), and $P_i(x, \theta) = P(X_i = x \mid \theta)$ denotes the item response function (IRF) of item $i$. Note that items $i = 1, \ldots, I$ are conditionally independent given the latent trait $\theta$. The model parameters in Equation (1) are typically not uniquely defined. Assume that one utilizes a monotone function $m : \mathbb{R} \to (0, 1)$ for defining a transformed latent trait $\delta$ by $\delta = m(\theta)$. For example, $m$ could be the logistic function $\Psi(x) = [1 + \exp(-x)]^{-1}$ that maps the real line onto the unit interval $(0, 1)$. Define $P_i^*(x, \delta) = P_i(x, m^{-1}(\delta))$, where $m^{-1}$ denotes the inverse function of $m$. Furthermore, denote by $g$ the density function of the transformed latent trait $\delta$. The IRT model in Equation (1) can be equivalently written as

$$P(\mathbf{X} = \mathbf{x}) = \int_0^1 \prod_{i=1}^{I} P_i^*(x_i, \delta) \, g(\delta) \, d\delta. \tag{2}$$

The density $g$ can be obtained from the density $f$ by applying the density transformation theorem

$$g(\delta) = \frac{f(m^{-1}(\delta))}{\left| m'(m^{-1}(\delta)) \right|}, \tag{3}$$

where $m' = dm/d\theta$ is the derivative of $m$ with respect to $\theta$. It could be argued that only ordinal information can be extracted from the latent trait $\theta$ because the general IRT model (1) is only identified up to monotone transformations [4][5][6][7].

The indeterminacy of the latent trait metric implies that a researcher can seek a transformation $m(\theta)$ for the sake of enhancing interpretations of the results. One possible transformation is the true score metric $\tau = \tau(\theta)$ [2] that maps the $\theta$ metric from the real line to the bounded interval $(0, 1)$ by defining

$$\tau(\theta) = \frac{1}{I} \sum_{i=1}^{I} P_i(1, \theta). \tag{4}$$

For a fixed value of $\theta$, $\tau = \tau(\theta)$ is the expected value of the proportion of correctly solved items. Another alternative is the rank score metric $\rho = \rho(\theta)$ [7] that is defined by

$$\rho(\theta) = F(\theta), \tag{5}$$

where $F$ is the distribution function of $\theta$. One can show that $\rho$ follows a uniform distribution (hence, the label "rank score"): for a continuous $F$ and $u \in (0, 1)$,

$$P(\rho \le u) = P(F(\theta) \le u) = P(\theta \le F^{-1}(u)) = u. \tag{6}$$
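The two alternative metrics can be made concrete with a short numerical sketch. It assumes, purely for illustration, logistic IRFs of the form $\Psi(a(\theta - b))$ with three hypothetical items and a normal distribution $F$ for $\theta$; neither the item values nor the function names come from the paper.

```python
import math

def logistic(x: float) -> float:
    """Logistic function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical (discrimination, difficulty) pairs; illustrative values only.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]

def true_score(theta: float) -> float:
    """tau(theta): expected proportion of correctly solved items."""
    probs = [logistic(a * (theta - b)) for a, b in ITEMS]
    return sum(probs) / len(probs)

def rank_score(theta: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """rho(theta) = F(theta), with F assumed to be a normal distribution."""
    return 0.5 * (1.0 + math.erf((theta - mu) / (sigma * math.sqrt(2.0))))

for theta in (-2.0, 0.0, 2.0):
    print(theta, round(true_score(theta), 3), round(rank_score(theta), 3))
```

Both functions are monotone in $\theta$ and map the real line into $(0, 1)$, so they change the scale of the latent trait without changing the ordering of persons.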

Logistic Item Response Model
An important class of IRT models is the class of logistic IRT models. Logistic IRT models employ the logistic link function for parameterizing IRFs. The IRFs in the two-parameter logistic (2PL) model [8] are given by

$$P(X_i = 1 \mid \theta) = \Psi(a_i(\theta - b_i)), \tag{7}$$

where $a_i$ are item discriminations, and $b_i$ are item difficulties. The one-parameter logistic (1PL) model (Rasch model; [9]) is obtained by setting all item discriminations equal to one (i.e., $a_i = 1$ for $i = 1, \ldots, I$).
In Figure 1, IRFs of seven items of the 2PL model are displayed (see the figure legend for item parameters $a_i$ and $b_i$). It can be seen that items with higher item discriminations $a_i$ have steeper slopes. Additionally, items with larger item difficulties $b_i$ are shifted to the right. A fundamental property of IRFs in the 2PL model is a lower asymptote of zero and an upper asymptote of one. Hence, persons with very low abilities ($\theta \to -\infty$) have almost zero probability of correctly solving any item in the test, while highly able persons ($\theta \to \infty$) correctly solve items with a probability approaching one. Alternative IRT models allow lower and upper asymptotes different from 0 or 1, respectively [10].
In many applications, a normal distribution $N(\mu, \sigma^2)$ for the latent trait $\theta$ is assumed [7]. However, more flexible distributions or semiparametric specifications are possible [11,12]. Identification constraints are required in the 1PL and 2PL models for the estimation of model parameters. In the 1PL model, one can identify the model by setting $\mu = 0$ or fixing the item difficulty of a reference item to 0 (or to a prespecified value). Alternatively, one can constrain the sum of the item difficulties to zero. In the 2PL model, identification can be ensured by posing a standard normal distribution $N(0, 1)$ (i.e., $\mu = 0$ and $\sigma = 1$). Alternatively, a reference item $i_0$ can be chosen for which $a_{i_0} = 1$ and $b_{i_0} = 0$ are used as fixed values in the estimation. Using a reference item has the advantage that the distribution $F$ of $\theta$ can be flexibly estimated without using constraints on some parameters of $F$.
The 1PL model or the 2PL model can be estimated using marginal maximum likelihood (MML) or joint maximum likelihood (JML) estimation [2]. It is noteworthy that $\sum_{i=1}^{I} X_i$ is a sufficient statistic for $\theta$ in the 1PL model, while $\sum_{i=1}^{I} a_i X_i$ is the corresponding sufficient statistic in the 2PL model. Hence, the different models imply different interpretations and implications of the trait because the contribution of items to the variable of interest differs considerably [13].
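As a minimal sketch of the quantities in this subsection, the following code evaluates a 2PL IRF and the weighted sum score that serves as the sufficient statistic for $\theta$. The function names and the parameter values in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math

def logistic(x: float) -> float:
    """Logistic function Psi(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def irf_2pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model."""
    return logistic(a * (theta - b))

def weighted_score(responses, discriminations):
    """Sufficient statistic for theta in the 2PL model: sum_i a_i * x_i."""
    return sum(a * x for a, x in zip(discriminations, responses))

# A person located exactly at the item difficulty solves the item with probability 0.5.
print(irf_2pl(theta=0.0, a=1.5, b=0.0))
# With all discriminations equal to one, the statistic reduces to the 1PL sum score.
print(weighted_score([1, 0, 1], [1.0, 1.0, 1.0]))
```

The second call illustrates why the two models weight items differently: only in the 1PL case does every correct response contribute equally to the score.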

Dimitrov's Latent D-Scoring Model
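Dimitrov proposes an alternative IRT model that has a bounded metric for the latent trait. His latent D-scoring (LDS) model [3,14] includes a latent trait $\delta$ that takes values in the interval $(0, 1)$. The IRF in the LDS model is given as [15]

$$P(X_i = 1 \mid \delta) = \frac{1}{1 + \left[ \dfrac{\beta_i (1 - \delta)}{\delta (1 - \beta_i)} \right]^{\alpha_i}}, \qquad \delta \sim G, \tag{8}$$

where $G$ is some distribution on $(0, 1)$. Item discriminations $\alpha_i$ are non-negative and indicate the extent to which item $i$ measures the trait $\delta$. Item difficulties $\beta_i$ range between 0 and 1 and primarily determine the proportion of persons correctly solving item $i$. The IRF in Equation (8) is also referred to as the rational function model with two item parameters [15]. The IRFs of the LDS model for seven items are shown in Figure 2 (see the figure legend for item parameters $\alpha_i$ and $\beta_i$).

The LDS model with one item parameter is obtained by setting $\alpha_i = 1$ [15]:

$$P(X_i = 1 \mid \delta) = \frac{1}{1 + \dfrac{\beta_i (1 - \delta)}{\delta (1 - \beta_i)}}. \tag{9}$$

The LDS model with three item parameters that accommodates guessing effects is defined as [15]

$$P(X_i = 1 \mid \delta) = \gamma_i + (1 - \gamma_i) \frac{1}{1 + \left[ \dfrac{\beta_i (1 - \delta)}{\delta (1 - \beta_i)} \right]^{\alpha_i}}, \tag{10}$$

where $\gamma_i$ denotes the guessing (lower asymptote) parameter of item $i$. In the following, we mainly consider the case of the LDS model with two item parameters.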
The LDS model can be estimated with MML [3] or JML [16]. In Section 3, we show that identification constraints are needed for the estimation of the model. The latent D-scoring model is applied in psychometric areas of linking and equating [16], differential item functioning [14], and the development of multistage tests [17].

Relation of the Latent D-Scoring Model and the 2PL Model
In this section, we show the close correspondence of the 2PL model and the LDS model. It is demonstrated that the two models are equivalent using analytical (Section 3.1) and numerical (Section 3.2) arguments. However, the two models imply different consequences regarding measurement precision and interpretations (Section 3.3). Finally, we propose an extension of the LDS model to multiple dimensions in Section 3.4.

Equivalence of the Latent D-Scoring Model and the 2PL Model
In this subsection, we analytically show that the LDS model is statistically equivalent to the 2PL model. Consequently, the model parameters of the 2PL model can be transformed to obtain model parameters of the LDS model.
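The IRF of the LDS model (Equation (8)) can be rewritten as

$$P(X_i = 1 \mid \delta) = \Psi\!\left( \alpha_i \left[ \log \frac{\delta}{1 - \delta} - \log \frac{\beta_i}{1 - \beta_i} \right] \right), \qquad \delta \sim G, \tag{11}$$

where $G$ is the distribution function of $\delta$. By defining $\theta = \log \frac{\delta}{1 - \delta}$, $b_i = \log \frac{\beta_i}{1 - \beta_i}$, and $a_i = \alpha_i$, one can rephrase the LDS model in Equation (11) as the 2PL model. Equivalently, we can write $\delta = \Psi(\theta) = [1 + \exp(-\theta)]^{-1}$ as the logistic transform of $\theta$. Note that the logistic transform $\delta = \Psi(\theta)$ was also discussed in [6,7]. Hence, the LDS model is a reparametrization of the 2PL model. Consequently, estimation routines for the 2PL model can be used for estimating the latent D-scoring model, and item parameters are transformed afterward; that is, $\alpha_i = a_i$ and $\beta_i = \Psi(b_i)$.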
Our derivation also implies that the LDS model with one item parameter is equivalent to the 1PL model. Moreover, the LDS model with three parameters is equivalent to the three-parameter logistic IRT model.
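The reparametrization can be checked numerically. The sketch below evaluates the LDS IRF and the 2PL IRF with parameters linked by $a_i = \alpha_i$, $b_i = \log[\beta_i/(1-\beta_i)]$, and $\theta = \log[\delta/(1-\delta)]$; the item values $\alpha = 1.5$ and $\beta = 0.7$ and all function names are hypothetical choices made for this illustration.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def irf_lds(delta: float, alpha: float, beta: float) -> float:
    """Rational function form of the two-parameter LDS IRF."""
    ratio = beta * (1.0 - delta) / (delta * (1.0 - beta))
    return 1.0 / (1.0 + ratio ** alpha)

def irf_2pl(theta: float, a: float, b: float) -> float:
    return logistic(a * (theta - b))

def lds_to_2pl(alpha: float, beta: float):
    """Map LDS item parameters to 2PL item parameters."""
    return alpha, logit(beta)

def twopl_to_lds(a: float, b: float):
    """Map 2PL item parameters to LDS item parameters."""
    return a, logistic(b)

alpha, beta = 1.5, 0.7          # hypothetical LDS item
a, b = lds_to_2pl(alpha, beta)
max_diff = max(
    abs(irf_lds(delta, alpha, beta) - irf_2pl(logit(delta), a, b))
    for delta in (0.05, 0.2, 0.5, 0.7, 0.95)
)
print(max_diff)  # differences are at the level of floating-point rounding
```

Because the two IRFs agree at every trait value once $\theta$ and $\delta$ are linked by the logistic transform, any 2PL routine yields LDS parameters after applying `twopl_to_lds` to its output.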
The distribution of $\delta$ can also be derived from the distribution of $\theta$. The density function $g$ of $\delta$ can be obtained from the density function $f$ of $\theta$ by applying Equation (3) with $m = \Psi$ and $\Psi'(\theta) = \Psi(\theta)[1 - \Psi(\theta)]$:

$$g(\delta) = \frac{f(\Psi^{-1}(\delta))}{\delta (1 - \delta)}, \qquad \Psi^{-1}(\delta) = \log \frac{\delta}{1 - \delta}. \tag{12}$$

Conversely, the density function $f$ of $\theta$ can also be obtained from the density function $g$ of $\delta$ by

$$f(\theta) = g(\Psi(\theta)) \, \Psi(\theta) \, [1 - \Psi(\theta)]. \tag{13}$$

The estimation of the LDS model using software for the 2PL model requires a correct specification of the distribution for $\theta$. Suppose that a particular distributional assumption is posed on $\delta$ with density $g$. In that case, the estimation procedure must ensure that the assumed distribution for $\theta$ aligns with the implied density $f$ for $\theta$ (see Equation (13)) to avoid biased item parameter estimates.
The variance-covariance matrix $V$ for item parameters in the 2PL model can be obtained utilizing the observed information matrix. The transformed item parameters for the LDS model emerge from a nonlinear transformation of the 2PL item parameters. Hence, the delta method can be applied for obtaining standard errors of the item parameters for the LDS model. In more detail, because $\alpha_i = a_i$ and $\beta_i = \Psi(b_i)$ each depend on a single 2PL parameter, the matrix $A$ of derivatives of the transformed item parameters with respect to the 2PL item parameters is a diagonal matrix, and the variance-covariance matrix for transformed item parameters is $A V A^\top$.
In Section 2.2, we showed that identification constraints are needed for estimating the 2PL model. Because the latent D-scoring model is equivalent to the 2PL model, the former also needs identification constraints. In the 2PL model, the location (i.e., the mean $\mu$) and the scale (i.e., the standard deviation $\sigma$) for the latent trait $\theta$ can be fixed in the estimation. This would translate into identification constraints for the LDS model. Alternatively, a reference item $i_0$ could be chosen for the LDS model with fixed parameters $\alpha_{i_0} = 1$ and $\beta_{i_0} = 0.5$, which corresponds to $a_{i_0} = 1$ and $b_{i_0} = 0$ in the 2PL model.
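The change of variables between the two densities can be sketched numerically. The example assumes a normal density for $\theta$ with mean 0.6 and standard deviation 1.2 (the logit-normal case used later in the numerical illustration) and a simple midpoint rule; the function names are illustrative.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

MU, SIGMA = 0.6, 1.2  # assumed normal distribution of theta

def f_theta(theta: float) -> float:
    """Normal density of theta."""
    z = (theta - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def g_delta(delta: float) -> float:
    """Density of delta = Psi(theta): f(logit(delta)) / (delta * (1 - delta))."""
    return f_theta(logit(delta)) / (delta * (1.0 - delta))

def f_from_g(theta: float) -> float:
    """Back-transformation: g(Psi(theta)) * Psi(theta) * (1 - Psi(theta))."""
    d = logistic(theta)
    return g_delta(d) * d * (1.0 - d)

def integrate_midpoint(func, lower: float, upper: float, n: int = 20000) -> float:
    """Midpoint rule; it never evaluates func at the interval end points."""
    h = (upper - lower) / n
    return h * sum(func(lower + (k + 0.5) * h) for k in range(n))

total_mass = integrate_midpoint(g_delta, 0.0, 1.0)
print(total_mass)                      # close to 1: g is a proper density on (0, 1)
print(f_theta(0.3), f_from_g(0.3))     # the round trip recovers f
```

The round trip shows that specifying $g$ for $\delta$ and specifying the implied $f$ for $\theta$ are the same assumption expressed on two scales.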
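A minimal sketch of the delta method for one item follows, assuming the ordering $(a_i, b_i)$ for the 2PL parameters and a hypothetical $2 \times 2$ covariance matrix; the numbers are placeholders, not estimates from the paper.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lds_covariance(b: float, cov_2pl):
    """Delta-method covariance of (alpha, beta) = (a, Psi(b)).

    cov_2pl is the 2x2 covariance matrix of the 2PL estimates (a, b).
    The Jacobian A is diagonal: d alpha / d a = 1, d beta / d b = Psi'(b).
    """
    slope = logistic(b) * (1.0 - logistic(b))
    jac = [[1.0, 0.0], [0.0, slope]]
    # First A V, then (A V) A^T.
    av = [[sum(jac[r][k] * cov_2pl[k][c] for k in range(2)) for c in range(2)]
          for r in range(2)]
    return [[sum(av[r][k] * jac[c][k] for k in range(2)) for c in range(2)]
            for r in range(2)]

# Hypothetical covariance matrix of (a_hat, b_hat) for one item with b_hat = 0.
V = [[0.04, 0.01], [0.01, 0.09]]
cov_lds = lds_covariance(0.0, V)
se_alpha = math.sqrt(cov_lds[0][0])
se_beta = math.sqrt(cov_lds[1][1])
print(se_alpha, se_beta)
```

For $b_i = 0$ the derivative $\Psi'(0) = 0.25$ shrinks the standard error of the difficulty by a factor of four, while the standard error of the discrimination is unchanged.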

Numerical Illustration
This subsection demonstrates that the LDS model can be estimated using software for the 2PL model. We used item parameters of $I = 7$ items of the LDS model that were also used in Figure 2 (see also Table 1). The multivariate distribution of these $I = 7$ items according to the LDS model can be written as

$$P(\mathbf{X} = \mathbf{x}) = \int_0^1 \prod_{i=1}^{I} P_i(\delta; \alpha_i, \beta_i)^{x_i} \, [1 - P_i(\delta; \alpha_i, \beta_i)]^{1 - x_i} \, g(\delta) \, d\delta, \tag{14}$$

where $P_i(\delta; \alpha_i, \beta_i)$ is the IRF for the $i$th item of the LDS model, and $\mathbf{x} = (x_1, \ldots, x_I)$. Note that there are $2^I = 128$ different item response patterns. The corresponding marginal probabilities $P(\mathbf{X} = \mathbf{x})$ are computed using (14) and numerical integration with respect to θ.

This numerical illustration aims to compute the multivariate distribution $P(\mathbf{X} = \mathbf{x})$ of $\mathbf{X}$ in Equation (14) for specified item and distribution parameters and to show that the input parameters can be uniquely and correctly identified from $P(\mathbf{X} = \mathbf{x})$. This probability distribution corresponds to a population, i.e., a sample with an infinite sample size. Maximum likelihood estimation is applied to estimate the item and distribution parameters. Because the estimation relies on the population distribution, sampling variability does not play a role in the estimated (i.e., identified) parameters. Hence, the illustration demonstrates the parameter equivalence of the 2PL and the LDS model at the population level. Note that no standard errors need to be reported for item parameters because the data are defined at the population level.

In Section 3.1, we derived the transformation of item parameters from the 2PL model to the LDS model when showing statistical equivalence. Notably, for establishing statistical equivalence, the distribution $G$ for $\delta$ is a transformation of the distribution $F$ for $\theta$. When item parameters of the LDS model are obtained with transformed item parameters from the 2PL model, it must be ensured that the distribution $F$ of $\theta$ is correctly specified in the 2PL model. This means that $F$ corresponds to the distribution $G$ for $\delta$ that is used for generating data.
Hence, we investigate whether distributional misspecifications of $\theta$ have consequences for transformed item parameters of the LDS model. We considered two distributions for $\delta$ in the data-generating model. First, $\delta$ followed a beta distribution $\mathrm{Beta}(4, 2)$ [18]. Second, $\delta$ followed a logit-normal distribution $\mathrm{LogitN}(0.6, 1.2^2)$; that is, $\theta = \log \frac{\delta}{1 - \delta}$ follows a normal distribution with a mean of 0.6 and a standard deviation of 1.2 [19][20][21].
The 2PL model was estimated in the R [22] package sirt [23] using a sample weights option that inputs the item response pattern probabilities $P(\mathbf{X} = \mathbf{x})$. To avoid a restrictive distributional assumption on $\theta$, we used a fixed grid of 61 equidistant $\theta$ values ranging between −6 and 6 and assumed a normal distribution for $\theta$. The item parameters of the fourth item were fixed (i.e., $a_4 = 1$ and $b_4 = 0$ in the 2PL model, which corresponds to $\alpha_4 = 1$ and $\beta_4 = 0.5$ in the LDS model). The 2PL model was estimated using MML estimation and an EM algorithm [24]. Sample R code for the estimation is provided in Appendix A.

Results for this numerical illustration are presented in Table 1. It can be seen that estimated item parameters $\hat{\alpha}_i$ and $\hat{\beta}_i$ for the LDS model almost perfectly recover true values in the case of the logit-normal distribution. This finding can be expected because the log-linear smoothing approach includes the normal distribution as a particular instance (smoothing up to two moments). Slightly larger deviations were observed if the distribution for $\delta$ was a beta distribution. The logit transform of the beta distribution is not correctly represented by a normal distribution for $\theta$, which explains slight biases in item parameter estimates. For example, $\hat{\beta}_7 = 0.892$ deviated from the true value $\beta_7 = 0.90$, and $\hat{\alpha}_5 = 1.472$ deviated from $\alpha_5 = 1.50$. However, these numerical differences are probably negligible in practical applications and confirm our analytical reasoning for the equivalence of the 2PL and the LDS model.
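The population-level argument can be sketched without any estimation software. The code below enumerates all $2^7 = 128$ response patterns and computes their marginal probabilities twice, once with LDS IRFs on the $\delta$ scale and once with 2PL IRFs on the $\theta$ scale, using the same 61-point grid from −6 to 6 with normalized normal weights (mean 0.6, standard deviation 1.2). The seven item parameter pairs are hypothetical and are not the values of Table 1, and the code is a stand-in for, not a reproduction of, the sirt-based analysis.

```python
import itertools
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    return math.log(p / (1.0 - p))

def irf_lds(delta: float, alpha: float, beta: float) -> float:
    ratio = beta * (1.0 - delta) / (delta * (1.0 - beta))
    return 1.0 / (1.0 + ratio ** alpha)

def irf_2pl(theta: float, a: float, b: float) -> float:
    return logistic(a * (theta - b))

# Hypothetical LDS item parameters (alpha_i, beta_i).
LDS_ITEMS = [(0.8, 0.3), (1.0, 0.5), (1.5, 0.7), (1.2, 0.4),
             (2.0, 0.6), (0.6, 0.8), (1.0, 0.9)]
TWOPL_ITEMS = [(alpha, logit(beta)) for alpha, beta in LDS_ITEMS]

# Fixed grid of 61 equidistant theta nodes with normalized normal weights.
GRID = [-6.0 + 0.2 * t for t in range(61)]
_raw = [math.exp(-0.5 * ((th - 0.6) / 1.2) ** 2) for th in GRID]
WEIGHTS = [w / sum(_raw) for w in _raw]

def pattern_probs(prob_correct):
    """Marginal probability of each response pattern.

    prob_correct[t][i] is P(X_i = 1) at grid node t.
    """
    probs = {}
    for x in itertools.product((0, 1), repeat=len(LDS_ITEMS)):
        total = 0.0
        for w, p_node in zip(WEIGHTS, prob_correct):
            like = 1.0
            for p, xi in zip(p_node, x):
                like *= p if xi == 1 else 1.0 - p
            total += w * like
        probs[x] = total
    return probs

lds_table = [[irf_lds(logistic(th), al, be) for al, be in LDS_ITEMS] for th in GRID]
twopl_table = [[irf_2pl(th, a, b) for a, b in TWOPL_ITEMS] for th in GRID]

probs_lds = pattern_probs(lds_table)
probs_2pl = pattern_probs(twopl_table)
max_abs_diff = max(abs(probs_lds[x] - probs_2pl[x]) for x in probs_lds)
print(len(probs_lds), sum(probs_lds.values()), max_abs_diff)
```

Since both parameterizations imply the same pattern probabilities, a likelihood-based fit of either model to these probabilities cannot distinguish between them.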

Conditional Standard Errors for the Latent Trait
In this subsection, we study the amount of information for the latent trait that can be extracted with the 2PL model and the LDS model by using the concept of item information. Let $\mathbf{x}_p = (x_{p1}, \ldots, x_{pI})$ denote the vector of item responses of person $p$. For IRFs $P_i$ (depending on already estimated item parameters), the maximum likelihood estimate $\hat{\theta}_p$ for the latent trait of person $p$ is given as [1]

$$\hat{\theta}_p = \arg\max_{\theta} \sum_{i=1}^{I} \log P_i(x_{pi}, \theta). \tag{15}$$

Hence, the standard error associated with the estimate $\hat{\theta}_p$ is related to the information function that is obtained as the negative value of the second derivative of the log-likelihood function evaluated at $\hat{\theta}_p$. The information that is provided by item $i$ in (15) is then given as

$$I_{i,\mathrm{obs}}(\theta) = -\frac{\partial^2}{\partial \theta^2} \log P_i(x_{pi}, \theta). \tag{16}$$

This allows defining the (expected) item information $I_i$ for item $i$ [25]

$$I_i(\theta) = -\pi_i \frac{\partial^2 \log P_i(1, \theta)}{\partial \theta^2} - (1 - \pi_i) \frac{\partial^2 \log P_i(0, \theta)}{\partial \theta^2}, \tag{17}$$

where $\pi_i = E(X_i \mid \theta) = P_i(1, \theta)$ is the expected value for item $i$ at the trait value $\theta$. In the literature, the observed item information is often defined as the item information function. However, this function can become negative for some IRT models and the LDS model in particular [25], which is why we prefer (17) to ensure positivity of the item information function. For the 2PL model, the expected and observed item information coincide and are given as

$$I_i(\theta) = a_i^2 \, P_i(1, \theta) \, [1 - P_i(1, \theta)]. \tag{19}$$

Equation (19) implies that the least information is available for extreme $\theta$ values (i.e., extremely negative or positive). The test information $I(\theta)$ is defined as $I(\theta) = \sum_{i=1}^{I} I_i(\theta)$. It quantifies the information that is provided by the test at each latent trait value $\theta$. The conditional standard error for the latent trait $\theta$ is given by $SE(\theta) = 1 / \sqrt{I(\theta)}$.
One can similarly define the item information function for $\delta$ for the LDS model (see also [15]):

$$I_i(\delta) = \frac{\alpha_i^2 \, P_i(\delta) \, [1 - P_i(\delta)]}{\delta^2 (1 - \delta)^2}, \tag{20}$$

where $P_i(\delta) = P(X_i = 1 \mid \delta)$ is the IRF in Equation (8). Analogously, the test information function $I(\delta) = \sum_{i=1}^{I} I_i(\delta)$ can be defined for the latent trait $\delta$.
Because the latent D-scoring model is equivalent to the 2PL model (see Section 3.1), $\delta = \Psi(\theta)$ is a monotone transformation of $\theta$, and the test information function for $\theta$ can be converted into the test information function for $\delta$. More generally, let $\delta = m(\theta)$ be a monotone differentiable transformation. The test information function for $\delta$ can be computed from the test information function for $\theta$ (see [2]):

$$I(\delta) = I(m(\theta)) = \frac{I(\theta)}{[m'(\theta)]^2}, \tag{21}$$

where $m' = dm/d\theta$. Equation (21) can be rewritten for conditional standard errors as $SE(\delta) = SE(m(\theta)) = m'(\theta) \, SE(\theta)$.
Hence, the conditional standard error $SE(\delta)$ for the LDS model is given as

$$SE(\delta) = \Psi(\theta) \, [1 - \Psi(\theta)] \, SE(\theta) = \delta (1 - \delta) \, SE(\theta).$$

In Section 3.2, we demonstrated that the LDS model is equivalent to the 2PL model. For the item parameters of the seven items used in the demonstration (see Table 1), the conditional standard errors for $\theta$ and $\delta$ are shown in Figure 3. It can be seen that the 2PL model measures the latent trait $\theta$ less precisely for extremely large negative and extremely large positive values; that is, for low- and high-achieving persons. In line with the results of [3], the converse holds for the LDS model. Conditional standard errors are smallest for persons with $\delta$ values near 0 or 1. Hence, statements about measurement precision in different ranges of values for the latent trait strongly depend on the chosen metric (see also [26]). Interestingly, the transformed latent trait $\xi = m(\theta) = \int_{-\infty}^{\theta} \sqrt{I(u)} \, du$ (the so-called arc length metric; see [6]) has homogeneous standard errors across the range of the latent trait, because $m'(\theta) = \sqrt{I(\theta)}$ yields $SE(\xi) = SE(m(\theta)) = 1$.
These observations indicate that it is difficult to state for which subgroups of persons adaptive or multistage testing [27] provides measurement precision gains because such statements depend on the chosen metric.
An anonymous reviewer presented an insightful explanation of the behavior of conditional standard errors. For people with a high D-score, a low standard error will result because one can be very confident that these persons will get a new but parallel item correct. In contrast, for persons with a high θ score, a high standard error will be observed because it is uncertain what the hardest item is that they could solve. Overall, the different scoring methods give different interpretations of the latent trait and, therefore, of their respective standard errors.
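The opposite behavior of the two conditional standard errors can be reproduced with a few lines. The sketch assumes the 2PL test information $\sum_i a_i^2 P_i (1 - P_i)$ and the conversion $SE(\delta) = \delta(1 - \delta)\, SE(\theta)$; the three items and the function names are hypothetical.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 2PL items (a_i, b_i); not the items of Table 1.
ITEMS = [(1.0, -1.0), (1.5, 0.0), (0.8, 1.0)]

def total_information(theta: float) -> float:
    """Test information for theta: sum of a_i^2 * P_i * (1 - P_i)."""
    total = 0.0
    for a, b in ITEMS:
        p = logistic(a * (theta - b))
        total += a * a * p * (1.0 - p)
    return total

def se_theta(theta: float) -> float:
    """Conditional standard error on the logit metric."""
    return 1.0 / math.sqrt(total_information(theta))

def se_delta(theta: float) -> float:
    """Conditional standard error on the delta metric at delta = Psi(theta)."""
    delta = logistic(theta)
    return delta * (1.0 - delta) * se_theta(theta)

for theta in (-4.0, 0.0, 4.0):
    print(theta, round(logistic(theta), 3),
          round(se_theta(theta), 3), round(se_delta(theta), 4))
```

On the θ metric the standard error grows toward the extremes, whereas on the δ metric the factor $\delta(1 - \delta)$ pulls it toward zero there, which is the pattern described for Figure 3.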

A Multidimensional Latent D-Scoring Model
To our knowledge, the LDS model has only been investigated for a unidimensional latent variable $\delta$. However, in applications, multidimensional traits are often of interest [28,29]. We now show that an apparent extension of the LDS model to multiple dimensions can be obtained by using the same transformations of the multidimensional variant of the 2PL model. We illustrate the arguments for two dimensions $\theta_1$ and $\theta_2$.
The multidimensional logistic IRT model can be written as [29] $P(X_i = 1 \mid \theta_1, \ldots$ where $F$ is a bivariate distribution of $(\theta_1, \theta_2)$ and $\theta_d$ ($d = \ldots$ Hence, the multidimensional 2PL model can easily be reparametrized for defining a multidimensional LDS model. The generalization to more than two dimensions is straightforward. Given that multidimensional IRT models are more difficult to estimate than unidimensional IRT models, it is advantageous that existing software implementations of multidimensional logistic IRT models can be used for estimating a multidimensional variant of the LDS model.

Method
In order to illustrate the consequences of the choice of different metrics of the latent trait in multiple-group comparisons, we analyzed the data from the Programme for International Student Assessment (PISA) conducted in 2006 (PISA 2006; [30]). In this situation, groups constitute countries. We included 26 countries (see Table 2) that participated in 2006 and focused on the reading test (see [31,32] for other studies using this dataset).

Note to Table 2: ρ = rank score metric; maxrk = maximum rank difference among ability metrics θ, δ, τ, and ρ.
Items for the reading domain were only administered to a subset of the participating students. We included only those students who received a test booklet with at least one reading item. This resulted in a total sample size of 110,236 students (ranging from 2010 to 12,142 students across countries). In total, 28 reading items nested within eight reading texts were used in PISA 2006. Six of the 28 items were polytomous and were dichotomously recoded, with only the highest category being recoded as correct.
In all analyses, student weights were taken into account. Within a country, student weights were normalized to a sum of 5000, so that all countries contributed equally to the analyses.
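In a first step, the 2PL model was estimated based on the data comprising students of all 26 countries. Student weights were taken into account, and a normal distribution was posed for $\theta$ in the estimation. The obtained item parameters $\hat{a}_i$ and $\hat{b}_i$ were fixed in the second step when estimating the trait distribution in each country. More concretely, the 2PL model was fitted using the R [22] package sirt [23] using MML estimation. The 2PL model was estimated by using a discrete grid of $T = 121$ equidistant $\theta$ points ranging between −6 and 6 for numerical integration of the involved integrals in the log-likelihood function of the 2PL model. As in Section 3.2, log-linear smoothing up to four moments of the trait distribution [12] within a country was employed to allow non-normal distributions. Assume that the estimated parametric distribution for $\theta$ in country $g$ is $\pi_{gt} = P(\theta_t; \boldsymbol{\delta}_g)$ for grid values $\theta_t$ ($t = 1, \ldots, T$) and country-specific distribution parameters $\boldsymbol{\delta}_g$. Afterward, individual posterior distributions $h_p(\theta_t \mid \mathbf{x}_p)$ ($t = 1, \ldots, T$) were computed as

$$h_p(\theta_t \mid \mathbf{x}_p) = \frac{\pi_{gt} \prod_{i=1}^{I} P_i(\theta_t; \hat{a}_i, \hat{b}_i)^{x_{pi}} \, [1 - P_i(\theta_t; \hat{a}_i, \hat{b}_i)]^{1 - x_{pi}}}{\sum_{u=1}^{T} \pi_{gu} \prod_{i=1}^{I} P_i(\theta_u; \hat{a}_i, \hat{b}_i)^{x_{pi}} \, [1 - P_i(\theta_u; \hat{a}_i, \hat{b}_i)]^{1 - x_{pi}}},$$

where $P_i(\theta_t; \hat{a}_i, \hat{b}_i)$ is the IRF of item $i$ from the 2PL model using estimated item parameters $\hat{a}_i$ and $\hat{b}_i$ from the total sample. By construction, it holds that $\sum_{t=1}^{T} h_p(\theta_t \mid \mathbf{x}_p) = 1$. For $N_g$ persons per country $g$, the country mean $\hat{\mu}_{\theta,g}$ on the logit metric $\theta$ was estimated by

$$\hat{\mu}_{\theta,g} = \frac{1}{W} \sum_{p=1}^{N_g} w_p \sum_{t=1}^{T} \theta_t \, h_p(\theta_t \mid \mathbf{x}_p),$$

where the person weights $w_p$ sum to $W = 5000$ within a country (i.e., $\sum_{p=1}^{N_g} w_p = W$). Country-specific standard deviations $\hat{\sigma}_{\theta,g}$ can be computed similarly:

$$\hat{\sigma}_{\theta,g} = \sqrt{\frac{1}{W} \sum_{p=1}^{N_g} w_p \sum_{t=1}^{T} (\theta_t - \hat{\mu}_{\theta,g})^2 \, h_p(\theta_t \mid \mathbf{x}_p)}. \tag{29}$$

Besides the logit metric $\theta$, we also investigated the metric $\delta$ based on the LDS model, the true score metric $\tau$ (see Equation (4)), and the rank score metric $\rho$ (see Equation (5)). All three alternative metrics are monotone transformations $m(\theta)$ of $\theta$. The country mean $\hat{\mu}_{m(\theta),g}$ at the transformed metric was calculated as

$$\hat{\mu}_{m(\theta),g} = \frac{1}{W} \sum_{p=1}^{N_g} w_p \sum_{t=1}^{T} m(\theta_t) \, h_p(\theta_t \mid \mathbf{x}_p). \tag{30}$$

Using (30), the standard deviation of $m(\theta)$ can be computed similarly to (29).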
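The posterior-based computation of group means can be sketched as follows. The example uses the $T = 121$ grid from −6 to 6 but replaces the log-linear smoothed country distribution by normalized normal weights, and it uses three hypothetical items and four hypothetical persons with weights; none of these values come from the PISA analysis.

```python
import math

def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

GRID = [-6.0 + 0.1 * t for t in range(121)]          # T = 121 nodes
ITEMS = [(1.0, -0.5), (1.4, 0.2), (0.9, 1.0)]        # hypothetical (a_hat, b_hat)

def normal_weights(mu: float, sigma: float):
    """Stand-in for the estimated country distribution pi_gt (an assumption)."""
    raw = [math.exp(-0.5 * ((th - mu) / sigma) ** 2) for th in GRID]
    total = sum(raw)
    return [r / total for r in raw]

def posterior(responses, prior):
    """Individual posterior h_p(theta_t | x_p) on the grid."""
    values = []
    for th, pi_t in zip(GRID, prior):
        value = pi_t
        for (a, b), x in zip(ITEMS, responses):
            p = logistic(a * (th - b))
            value *= p if x == 1 else 1.0 - p
        values.append(value)
    total = sum(values)
    return [v / total for v in values]

def group_mean(persons, prior, transform=lambda th: th):
    """Weighted mean of m(theta); persons is a list of (weight, responses)."""
    weight_sum = sum(w for w, _ in persons)
    acc = 0.0
    for w, responses in persons:
        h = posterior(responses, prior)
        acc += w * sum(transform(th) * ht for th, ht in zip(GRID, h))
    return acc / weight_sum

def group_sd(persons, prior, transform=lambda th: th):
    """Weighted standard deviation of m(theta) around the group mean."""
    mean = group_mean(persons, prior, transform)
    weight_sum = sum(w for w, _ in persons)
    acc = 0.0
    for w, responses in persons:
        h = posterior(responses, prior)
        acc += w * sum((transform(th) - mean) ** 2 * ht for th, ht in zip(GRID, h))
    return math.sqrt(acc / weight_sum)

prior = normal_weights(0.0, 1.0)
persons = [(1.2, (1, 1, 0)), (0.8, (0, 0, 0)), (1.0, (1, 1, 1)), (1.0, (1, 0, 0))]
mean_theta = group_mean(persons, prior)
mean_delta = group_mean(persons, prior, transform=logistic)
sd_delta = group_sd(persons, prior, transform=logistic)
print(mean_theta, mean_delta, sd_delta)
```

Passing a different `transform` (for example a true score or rank score function) gives the group mean on that metric without refitting the model, which is the design used for comparing the four metrics.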
Furthermore, conditional standard errors for the four latent trait metrics are computed for the whole sample containing all students. The item information is obtained by using the second derivatives of IRFs with respect to the metrics θ, δ, τ, and ρ (see Equation (17)).

Results
In Table A1 of Appendix B, estimated item parameters $\hat{a}_i$ and $\hat{b}_i$ from the 2PL model are shown. These item parameters were transformed into parameters of the equivalent LDS model (see columns $\hat{\alpha}_i$ and $\hat{\beta}_i$ in Table A1). The IRFs of seven selected items are displayed in Figure A1 in Appendix B for the four latent trait metrics θ, δ, τ, and ρ. IRFs for the bounded metrics δ, τ, and ρ look very similar.
In Figure 4, the transformation functions δ = δ(θ), τ = τ(θ), and ρ = ρ(θ) are depicted. The latent D-score δ and the true score τ follow a very close transformation function. The rank score ρ differs from the former two in the tails of the θ distribution. Hence, it can be expected that δ and τ provide similar country rankings, while using ρ might lead to slightly different country rankings.

In Figure 5, conditional standard errors are displayed. It can be seen that the conditional standard error for θ is U-shaped, while those for the three other metrics are inverted U-shaped. Interestingly, the standard errors SE(δ) and SE(τ) approach 0 for δ or τ near 0 or 1. This is not the case for the rank score metric ρ, for which standard errors near ρ = 0 and ρ = 1 are larger than 0. Assume that Country C1 is low-performing (negative θ value) and Country C2 has average performance (θ average of about 0). Then, it can be the case that the latent trait is less precisely assessed for Country C1 than for Country C2 in the θ metric but more precisely assessed for Country C1 than C2 in one of the three alternative metrics δ, τ, or ρ. These statements rely on the somewhat arbitrary choice of the latent trait metric used to quantify differences between countries.

Table 2 contains detailed results of means, standard deviations, and country ranks based on means for the 26 countries. For the first six high-performing countries, country ranks are the same for all four trait metrics. However, there are countries for which ranks differ considerably. Relatively large deviations are observed for Belgium (BEL; maximum rank difference (maxrk) of 4), Estonia (EST; maxrk = 6), and Germany (DEU; maxrk = 7). The most crucial difference occurs for the τ and the ρ metric. For the three mentioned countries, the standard deviation of θ was relatively low or high compared to all other countries in the sample.
This observation explains the differences among ranks because the tails of the θ distributions are differently weighted (i.e., differently transformed) for τ and ρ.
Overall, the Spearman rank correlations of country means ranged between 0.949 (between τ and ρ) and 0.992 (between θ and δ). The average maximum rank difference of country means across different metrics was 2.000 (see column "maxrk" in Table 2; SD = 1.853, Min = 0, Max = 7). The Spearman rank correlations of country standard deviations ranged between 0.973 (between τ and ρ) and 0.999 (between δ and ρ). The average maximum rank difference of country standard deviations across different metrics was 1.000 (SD = 1.301, Min = 0, Max = 5). To sum up, the choice of the ability metric can matter for the reporting of country means for some countries.
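The maximum rank difference can be computed as sketched below: countries are ranked by their mean within each metric, and for each country the spread between its best and worst rank is taken. The country labels and means are made-up placeholders, and ties are ignored in this sketch.

```python
def ranks(means: dict) -> dict:
    """Rank 1 is assigned to the country with the highest mean."""
    ordered = sorted(means, key=means.get, reverse=True)
    return {country: pos + 1 for pos, country in enumerate(ordered)}

def max_rank_difference(means_by_metric: dict) -> dict:
    """Per country: largest minus smallest rank across all metrics."""
    rank_tables = [ranks(means) for means in means_by_metric.values()]
    countries = list(rank_tables[0])
    return {
        c: max(t[c] for t in rank_tables) - min(t[c] for t in rank_tables)
        for c in countries
    }

# Hypothetical country means on two metrics (placeholders, not Table 2).
means_by_metric = {
    "theta": {"A": 0.50, "B": 0.20, "C": 0.10, "D": -0.30},
    "rho":   {"A": 0.62, "B": 0.53, "C": 0.55, "D": 0.41},
}
maxrk = max_rank_difference(means_by_metric)
print(maxrk)
```

In this toy example, countries B and C swap places between the two metrics, so each has a maximum rank difference of 1, while A and D keep their ranks.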

Discussion
This article shows that the newly proposed LDS model of Dimitrov can be interpreted as a reparametrization of the well-studied 2PL model. Hence, all established statistical techniques for the 2PL model can be used for practical applications of the LDS model. It has been shown that the latent trait score δ from the LDS model is a monotone (logistic) transformation of the θ score from the 2PL model. Methods in other psychometric areas such as differential item functioning, equating and linking, or test assembly need not be reinvented for the LDS model because known techniques for the 2PL model can be used.
Although these findings might be interpreted as somewhat destructive for the research surrounding the LDS model, we do not think that the LDS model is without interest. We argue that the choice of the latent trait metric is arbitrary in IRT models, and both the θ and the δ metric can be useful in applications. The authors of this paper tend to prefer bounded trait metrics in applications because it seems more challenging to interpret the possibility of unbounded negative and positive trait values of θ [33]. However, we would prefer the true score metric τ or the rank score ρ over δ. The latent D-score δ can be interpreted as a particular true score in which only a reference item with $a_i = 1$ and $b_i = 0$ is used. We believe that using a well-chosen reference test with its item parameters provides a better interpretable latent trait metric in practical applications. The rank score ρ has the advantage that it does not depend on item parameters. For example, in the PISA study, one fixes the θ metric in the starting study (e.g., in PISA 2000) to a mean of 500 and a standard deviation of 100. Using the rank metric ρ would imply that the metric is identified by assuming a uniform distribution on [0, 1]. Both approaches might be legitimate in practical applications. Notably, linking and equating for bounded metrics are more difficult to conduct than for unbounded metrics. Hence, we would opt for using the unbounded metric from the 2PL model for the operational use for linking but bounded metrics for reporting ability distributions.
In IRT models, items are typically treated as fixed. However, they can alternatively be interpreted as exchangeable. Item sampling models [34][35][36] have fewer assumptions in this respect and could be alternatively employed in assessment studies.
The LDS model has been motivated as an IRT analog of the so-called manifest D-scoring method [37]. The scoring rule $\sum_{i=1}^{I} (1 - \pi_i) X_i$ is used in this approach, where $\pi_i = P(X_i = 1)$ is the probability of getting item $i$ correct. In manifest D-scoring, more difficult items receive larger weights. This property might have appeal in some applications. However, we believe that this scoring rule does not adequately represent all items in a test in typical assessment studies and might lead to country comparisons with reduced validity.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.