Abstract
Diagnostic classification models (DCMs) are latent structure models with discrete multivariate latent variables. Recently, extensions of DCMs to mixed membership have been proposed. In this article, ordinary DCMs, mixed and partial membership DCMs, and multidimensional item response theory (IRT) models are compared through analytical derivations, three example datasets, and a simulation study. It is concluded that partial membership DCMs are similar, if not structurally equivalent, to sufficiently complex multidimensional IRT models.
1. Introduction
In the social or life sciences, humans (i.e., subjects or persons) respond to multiple tasks (i.e., items) in a test. For example, students may be asked to solve items in a mathematics test, or patients may report whether particular symptoms are present. These tests result in multivariate datasets with dichotomously scored item responses taking values of zero or one.
Let $\boldsymbol{X} = (X_1, \ldots, X_I)$ be a random vector containing $I$ items ($i = 1, \ldots, I$). In multivariate analysis, the dependency structure of the high-dimensional contingency table $P(\boldsymbol{X} = \boldsymbol{x})$ (for $\boldsymbol{x} \in \{0,1\}^I$) is represented by sufficiently parsimonious models. A general model is defined in latent structure analysis [1,2,3,4,5]:

$P(\boldsymbol{X} = \boldsymbol{x}) = \int \prod_{i=1}^{I} P(X_i = x_i \mid \boldsymbol{\xi}) \, f(\boldsymbol{\xi}; \boldsymbol{\delta}) \, \mathrm{d}\boldsymbol{\xi}$ (1)
where $\boldsymbol{\delta}$ is a model parameter. Note that the integration with respect to $\boldsymbol{\xi}$ is interpreted as being with respect to a measure that can be continuous, discrete, or a mixture of both. It should be emphasized that the dependency structure in $\boldsymbol{X}$ is represented by a (finite-dimensional) latent variable $\boldsymbol{\xi}$. The dimension of $\boldsymbol{\xi}$ is typically lower than $I$ to achieve a convenient model interpretation of the dependency in $\boldsymbol{X}$. The model (1) imposes a local independence assumption, meaning that the items are conditionally independent given $\boldsymbol{\xi}$. The functions $P(X_i = x_i \mid \boldsymbol{\xi})$ are also referred to as item response functions (IRF).
In item response theory (IRT; Refs. [6,7,8]), the latent variable is a unidimensional or multidimensional continuous variable $\boldsymbol{\theta}$ that can take values between minus and plus infinity. Diagnostic classification models (DCM; Refs. [9,10,11]) are particular latent class models in which the latent variable is denoted as $\boldsymbol{\alpha}$ and is multidimensional and discrete. It is evident that IRT and DCM offer quite distinct interpretations of the underlying latent variable to applied researchers. Recently, the exact allocation (i.e., crisp membership) of subjects to latent classes in DCMs has been weakened, allowing for gradual membership to latent classes [12,13]. Comparisons between unidimensional IRT models and DCMs have been carried out in [14,15,16,17,18,19,20,21]. However, comparisons of DCMs with multidimensional IRT models are scarce in the literature, and to our knowledge, there is no systematic comparison of multidimensional IRT models and mixed membership DCMs.
Purpose
In this article, DCMs, mixed membership DCMs, and multidimensional IRT models are compared using three publicly available datasets and by conducting a simulation study focusing on model selection.
The remainder of the article is structured as follows. Section 2 reviews compensatory and noncompensatory multidimensional IRT models. DCMs are reviewed in Section 3. The extension of ordinary DCMs to DCMs with mixed or partial membership is outlined in Section 4. Section 5 provides a heuristic comparison of multidimensional IRT models, DCMs, and DCM partial membership extensions. In Section 6, three empirical datasets are analyzed using different specifications of the three modeling approaches. Moreover, Section 7 presents findings from a simulation study devoted to model selection. Finally, the article closes with a discussion in Section 8.
2. Multidimensional Item Response Model
Multidimensional IRT (MIRT) models [22,23] assume a vector $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_D)$ of real-valued ability variables. That is, abilities can range from minus to plus infinity. The MIRT model can be written as:

$P(\boldsymbol{X} = \boldsymbol{x}) = \int \prod_{i=1}^{I} P(X_i = x_i \mid \boldsymbol{\theta}) \, f(\boldsymbol{\theta}; \boldsymbol{\delta}) \, \mathrm{d}\boldsymbol{\theta}$
where $f$ denotes the multivariate density function of $\boldsymbol{\theta}$ that depends on the unknown parameter $\boldsymbol{\delta}$. Most frequently, a multivariate normal distribution for $\boldsymbol{\theta}$ is assumed, and its density is given as:

$f(\boldsymbol{\theta}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = (2\pi)^{-D/2} \, |\boldsymbol{\Sigma}|^{-1/2} \exp\left( -\frac{1}{2} (\boldsymbol{\theta} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\boldsymbol{\theta} - \boldsymbol{\mu}) \right)$
where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean vector and the covariance matrix of $\boldsymbol{\theta}$, respectively. Moreover, $|\boldsymbol{\Sigma}|$ denotes the determinant of $\boldsymbol{\Sigma}$. In MIRT models that estimate item intercepts and item slopes for all items separately, all means are usually fixed to 0, and all standard deviations are fixed to 1 for identification reasons.
In this article, we confine ourselves to confirmatory MIRT models. Items are allocated to dimensions of $\boldsymbol{\theta}$ in the specification of the model. The allocation can be encoded in a Q-matrix [24,25] that defines which item loads on which dimension. Two types of loading structures can be distinguished: between-item dimensionality and within-item dimensionality [26,27].
2.1. Between-Item Dimensionality
In between-item dimensionality [26,28], each item loads on one and only one dimension. That is, item $i$ loads only on one dimension $d_i$. In this case, the multidimensional IRF reduces to a unidimensional IRF:

$P(X_i = x_i \mid \boldsymbol{\theta}) = P(X_i = x_i \mid \theta_{d_i})$
2.2. Within-Item Dimensionality
In within-item dimensionality, an item is allowed to load on more than a single dimension. Let us assume that item $i$ loads on the two dimensions $d_1$ and $d_2$. Then, the IRF can be written as:

$P(X_i = x_i \mid \boldsymbol{\theta}) = P(X_i = x_i \mid \theta_{d_1}, \theta_{d_2})$
In the rest of the paper, models for within-item dimensionality are only discussed in the case that items load on at most two dimensions. The case in which items load on more than two dimensions is not conceptually different but introduces much more notation. Therefore, we restrict ourselves to only two-dimensional IRT models to simplify the technical aspects of our arguments.
The most frequent choice of the IRF in the case of between-item dimensionality is the multivariate variant of the 2PL model [29,30] and is given as:

$P(X_i = 1 \mid \theta_{d_i}) = \Psi(b_i + a_i \theta_{d_i})$

where $b_i$ denotes the item intercept, $a_i$ denotes the item slope, and $\Psi(x) = [1 + \exp(-x)]^{-1}$ denotes the logistic distribution function.
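As a minimal illustration, the between-item 2PL IRF can be sketched in Python (the parameter values below are hypothetical and only serve to show the shape of the function):

```python
import math

def logistic(x):
    # Logistic distribution function Psi(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def irf_2pl(theta, b, a):
    # Between-item 2PL IRF: P(X_i = 1 | theta_{d_i}) = Psi(b_i + a_i * theta_{d_i})
    return logistic(b + a * theta)

# At theta = 0 with intercept b = 0, the response probability is exactly 0.5;
# for positive slope a, the IRF increases monotonically in theta.
p_low, p_mid, p_high = irf_2pl(-2.0, 0.0, 1.0), irf_2pl(0.0, 0.0, 1.0), irf_2pl(2.0, 0.0, 1.0)
```

The intercept $b_i$ shifts the curve horizontally, while the slope $a_i$ governs its steepness.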
There is more flexibility in defining IRFs in the case of within-item dimensionality than for between-item dimensionality. Different alternatives are discussed in the following subsections.
2.2.1. Compensatory Multidimensional IRT Model
In the case of within-item dimensionality, the IRF of item $i$ depends on the values of dimensions $d_1$ and $d_2$. The compensatory MIRT model [23,30] is defined as:

$P(X_i = 1 \mid \theta_{d_1}, \theta_{d_2}) = \Psi(b_i + a_{id_1} \theta_{d_1} + a_{id_2} \theta_{d_2})$ (7)

where $a_{id_1}$ and $a_{id_2}$ are dimension-specific item slopes. It can be seen in (7) that the terms $a_{id_1} \theta_{d_1}$ and $a_{id_2} \theta_{d_2}$ can compensate for each other. Hence, for example, low values in $\theta_{d_1}$ can be compensated by large values in $\theta_{d_2}$. Nevertheless, the IRF in (7) allows a flexible relationship of the required abilities for item $i$. By allowing for compensation, the abilities $\theta_{d_1}$ and $\theta_{d_2}$ can operate somewhat independently, which is not necessarily the case for the noncompensatory MIRT models discussed in the next subsection.
2.2.2. Noncompensatory Multidimensional IRT Model
In some applications, it can be questioned whether the compensatory MIRT model makes assumptions that are likely to be met. In contrast, it could be argued that low abilities in one dimension cannot be compensated by high abilities in another dimension. For this reason, noncompensatory MIRT models have been proposed. A noncompensatory functioning of abilities $\theta_{d_1}$ and $\theta_{d_2}$ can be obtained by defining the multidimensional IRF as a product of dimension-wise IRFs [31,32,33,34]:

$P(X_i = 1 \mid \theta_{d_1}, \theta_{d_2}) = \Psi(b_{id_1} + a_{id_1} \theta_{d_1}) \, \Psi(b_{id_2} + a_{id_2} \theta_{d_2})$ (8)

Note that item intercepts $b_{id_1}$ and $b_{id_2}$ are separately defined for each dimension. Hence, the noncompensatory MIRT model (8) has one additional item parameter compared to the compensatory MIRT model (7). An alternative approach to (8) was proposed in [35], in which the minimum instead of the product of the two probabilities was chosen.
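The behavioral difference between the compensatory model (7) and the noncompensatory model (8) can be sketched in a few lines of Python (hypothetical parameter values; not taken from the article):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def irf_compensatory(theta1, theta2, b, a1, a2):
    # Compensatory IRF (7): a single linear predictor, so a low theta1
    # can be offset by a high theta2.
    return logistic(b + a1 * theta1 + a2 * theta2)

def irf_noncompensatory(theta1, theta2, b1, b2, a1, a2):
    # Noncompensatory IRF (8): product of dimension-wise probabilities;
    # a low probability on either dimension caps the overall probability.
    return logistic(b1 + a1 * theta1) * logistic(b2 + a2 * theta2)

# With theta = (-3, +3), the compensatory IRF yields Psi(0) = 0.5,
# while the noncompensatory IRF stays low because of the first factor.
p_co = irf_compensatory(-3.0, 3.0, 0.0, 1.0, 1.0)
p_nc = irf_noncompensatory(-3.0, 3.0, 0.0, 0.0, 1.0, 1.0)
```

The same ability profile thus produces very different success probabilities under the two models, which is exactly the distinction the text describes.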
2.2.3. Partially Compensatory Multidimensional IRT Model
It might be too restrictive to decide between the extreme cases of the compensatory MIRT model (7) and the noncompensatory MIRT model (8). The partially compensatory MIRT model has been proposed, which contains the former two MIRT models as particular cases [36]. It is based on the linear terms:

$l_{id_1} = b_{id_1} + a_{id_1} \theta_{d_1}$ and $l_{id_2} = b_{id_2} + a_{id_2} \theta_{d_2}$

The IRF of the partially compensatory MIRT model is a function of $l_{id_1}$ and $l_{id_2}$ that involves an interaction parameter $\delta_i$ ranging between 0 and 1. A value of 0 for $\delta_i$ corresponds to the compensatory MIRT model (7), while a value of 1 corresponds to the noncompensatory MIRT model (8).
An alternative partially compensatory MIRT model has been proposed whose IRFs also involve the product of abilities as an additional term [37]. The multidimensional IRF of this model is given as:

$P(X_i = 1 \mid \theta_{d_1}, \theta_{d_2}) = \Psi(b_i + a_{id_1} \theta_{d_1} + a_{id_2} \theta_{d_2} + a_{id_1 d_2} \theta_{d_1} \theta_{d_2})$ (11)

The disadvantage of (11) is that the product $\theta_{d_1} \theta_{d_2}$ can positively contribute to the item response probability if both abilities are positive or if both are negative. This property can be regarded as unreasonable.
3. Diagnostic Classification Model
In DCMs [9,38] (also referred to as cognitive diagnostic models [39,40]), the latent variables in the latent structure model (1) are discrete. In this article, we consider the case of binary latent variables, which are also referred to as skills in the literature [41]. Let $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_D)$ be a $D$-dimensional latent variable that contains binary components $\alpha_d \in \{0, 1\}$ ($d = 1, \ldots, D$). A value of 1 of variable $\alpha_d$ indicates mastery of dimension $d$, whereas a value of 0 indicates nonmastery. The DCM is defined as:

$P(\boldsymbol{X} = \boldsymbol{x}) = \sum_{\boldsymbol{\alpha} \in \{0,1\}^D} P(\boldsymbol{\alpha}) \prod_{i=1}^{I} P(X_i = x_i \mid \boldsymbol{\alpha})$ (12)
where the IRFs $P(X_i = x_i \mid \boldsymbol{\alpha})$ are now functions of $\boldsymbol{\alpha}$, and the $P(\boldsymbol{\alpha})$ are skill class probabilities. In a saturated probability model for $\boldsymbol{\alpha}$, there are $2^D$ probabilities that can be estimated. Obviously, the probabilities must sum to 1:

$\sum_{\boldsymbol{\alpha} \in \{0,1\}^D} P(\boldsymbol{\alpha}) = 1$
In the case of two skills (i.e., $D = 2$), the vector $\boldsymbol{\alpha}$ can take the values $(0,0)$, $(1,0)$, $(0,1)$, and $(1,1)$. For high-dimensional vectors $\boldsymbol{\alpha}$, positing or estimating a hierarchy among skills [42,43,44] might be convenient, which results in fewer than $2^D$ possible combinations of different $\boldsymbol{\alpha}$ vectors in the DCM (12).
3.1. Between-Item Dimensionality
In the case of between-item dimensionality, each item loads on one and only one skill $\alpha_{d_i}$. The IRF is given as:

$P(X_i = 1 \mid \boldsymbol{\alpha}) = \Psi(b_i + a_i \alpha_{d_i})$ (14)
where the item parameters $b_i$ and $a_i$ again denote the item intercept and the item slope, respectively. Note that (14) reparametrizes the two probabilities $P(X_i = 1 \mid \alpha_{d_i} = 0)$ and $P(X_i = 1 \mid \alpha_{d_i} = 1)$ such that:

$P(X_i = 1 \mid \alpha_{d_i} = 0) = \Psi(b_i)$ and $P(X_i = 1 \mid \alpha_{d_i} = 1) = \Psi(b_i + a_i)$
The probability $P(X_i = 0 \mid \alpha_{d_i} = 1) = 1 - \Psi(b_i + a_i)$ is also referred to as the slipping probability, whereas $P(X_i = 1 \mid \alpha_{d_i} = 0) = \Psi(b_i)$ is denoted as the guessing probability [45]. The DCM with only one skill (i.e., $D = 1$) appeared as the mastery model in the literature [46,47].
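The reparametrization of guessing and slipping probabilities can be sketched in Python (the intercept and slope values are hypothetical):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dcm_irf(alpha, b, a):
    # Between-item DCM IRF (14) for a binary skill alpha in {0, 1}.
    return logistic(b + a * alpha)

# The two class-specific probabilities encode guessing and slipping:
b, a = -1.2, 3.0
guessing = dcm_irf(0, b, a)        # P(X_i = 1 | alpha = 0) = Psi(b_i)
slipping = 1.0 - dcm_irf(1, b, a)  # P(X_i = 0 | alpha = 1) = 1 - Psi(b_i + a_i)
```

For a positive slope $a_i$, mastery raises the success probability, so the guessing probability lies below and $1 - s_i$ above the 0.5 mark in this example.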
3.2. Within-Item Dimensionality
In the following subsections, different DCMs for items with within-item dimensionality are discussed. In this case, an item is assumed to load on the two dimensions $d_1$ and $d_2$ only. As argued in Section 2, the situation in which an item loads on more than two dimensions (i.e., more than two skills) is conceptually similar to the case of two dimensions but just introduces more notation, and it does not add more insights to this article. In analogy to MIRT models, compensatory, noncompensatory, and partially compensatory DCMs can be distinguished.
3.2.1. Compensatory DCM: ADCM
The additive diagnostic classification model (ADCM; Refs. [48,49]) allows that the two skills $\alpha_{d_1}$ and $\alpha_{d_2}$ can compensate. The IRF for the ADCM is defined as:

$P(X_i = 1 \mid \alpha_{d_1}, \alpha_{d_2}) = \Psi(b_i + a_{id_1} \alpha_{d_1} + a_{id_2} \alpha_{d_2})$ (17)

Note that (17) imposes a constraint on the IRF because the four probabilities for $(\alpha_{d_1}, \alpha_{d_2}) \in \{0,1\}^2$ are represented by the three item parameters $b_i$, $a_{id_1}$, and $a_{id_2}$.
3.2.2. Noncompensatory DCM: DINA Model
A noncompensatory DCM appeared as the deterministic inputs, noisy “and” gate (DINA; Refs. [45,50]) model in the literature. The IRF of the DINA model is given by:

$P(X_i = 1 \mid \alpha_{d_1}, \alpha_{d_2}) = \Psi(b_i + a_i \alpha_{d_1} \alpha_{d_2})$ (18)

A high probability of solving item $i$ can only be achieved if both skills $\alpha_{d_1}$ and $\alpha_{d_2}$ are mastered (i.e., both have values of 1).
3.2.3. Partially Compensatory DCM: GDINA Model
The generalized deterministic inputs, noisy “and” gate (GDINA; Refs. [50,51,52]) model is a partially compensatory DCM that has the ADCM (17) and the DINA model (18) as two particular cases. It contains the two main effects of the ADCM as well as the interaction effect of the DINA model. The IRF of the GDINA model is given as:

$P(X_i = 1 \mid \alpha_{d_1}, \alpha_{d_2}) = \Psi(b_i + a_{id_1} \alpha_{d_1} + a_{id_2} \alpha_{d_2} + a_{id_1 d_2} \alpha_{d_1} \alpha_{d_2})$ (19)

The four item response probabilities for $(\alpha_{d_1}, \alpha_{d_2}) \in \{0,1\}^2$ are reparametrized in (19). Hence, in contrast to the ADCM and the DINA model, no constraints are imposed in the GDINA model in its most flexible form.
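The nesting of the ADCM and the DINA model inside the GDINA model can be sketched in Python (hypothetical parameter values):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def gdina_irf(alpha1, alpha2, b, a1, a2, a12):
    # GDINA IRF (19) on the logit scale: two main effects plus an interaction.
    return logistic(b + a1 * alpha1 + a2 * alpha2 + a12 * alpha1 * alpha2)

# ADCM as a special case: interaction fixed to zero (compensatory).
p_adcm = gdina_irf(1, 0, b=-2.0, a1=2.0, a2=2.0, a12=0.0)
# DINA as a special case: both main effects fixed to zero (noncompensatory);
# only the conjunction alpha1 * alpha2 raises the success probability.
p_dina_partial = gdina_irf(1, 0, b=-2.0, a1=0.0, a2=0.0, a12=4.0)
p_dina_full = gdina_irf(1, 1, b=-2.0, a1=0.0, a2=0.0, a12=4.0)
```

Under the ADCM constraint, mastering one skill already raises the probability; under the DINA constraint, only mastering both skills does.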
4. Mixed and Partial Membership Diagnostic Classification Model
Researchers have occasionally questioned whether the dichotomous classification into masters and nonmasters of the skills in DCMs is empirically tenable [53,54,55]. DCMs assume a crisp membership; that is, students can only belong to the class $\alpha_d = 0$ or to the class $\alpha_d = 1$. Mixed membership models (also referred to as grade of membership models) or partial membership models weaken this assumption [56,57,58,59,60,61,62]. In these models, students are allowed to switch classes (i.e., the mastery and the nonmastery states) across items [63,64,65].
In mixed membership models, the vector $\boldsymbol{\alpha}$ of binary skills is replaced by a vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_D)$ of continuous bounded variables $\eta_d \in [0, 1]$. The value $\eta_d$ can be interpreted as the degree to which a student belongs to the mastery class $\alpha_d = 1$, while $1 - \eta_d$ characterizes the degree of belonging to the class $\alpha_d = 0$. It should be emphasized that the bounded latent membership vector variable $\boldsymbol{\eta}$ can be equivalently represented by an unbounded latent vector variable $\boldsymbol{\theta}$ such that:

$\eta_d = \Psi(\theta_d)$ for $d = 1, \ldots, D$
where $\Psi$ is a monotonically increasing and injective link function. The probit link function has been used in [12]. In this article, the logistic link function is utilized [20,66]. In this case, an injective transformation of the latent membership variable $\boldsymbol{\eta}$ follows a multivariate normal distribution [66]. More formally, assume that $\boldsymbol{\theta}$ follows a multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. The variable $\boldsymbol{\eta}$ is obtained by applying the logistic transformation coordinate-wise:

$\eta_d = \Psi(\theta_d) = \dfrac{1}{1 + \exp(-\theta_d)}$
We write $\boldsymbol{\theta} = \Psi^{-1}(\boldsymbol{\eta})$ as an abbreviation for the coordinate-wise application of the inverse logistic function $\Psi^{-1}(\eta_d) = \log[\eta_d / (1 - \eta_d)]$ for $d = 1, \ldots, D$. By using the density transformation theorem [67], we obtain the following logit-normal density:

$f(\boldsymbol{\eta}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \phi\big(\Psi^{-1}(\boldsymbol{\eta}); \boldsymbol{\mu}, \boldsymbol{\Sigma}\big) \prod_{d=1}^{D} \dfrac{1}{\eta_d (1 - \eta_d)}$

where $\phi$ denotes the multivariate normal density.
Note that choosing a very large standard deviation for $\theta_d$ (e.g., a standard deviation of 1000) corresponds to a membership variable whose values are concentrated near 0 and 1. That is, the crisp membership employed in DCMs can be obtained as a limiting special case of mixed membership models.
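The limiting behavior can be illustrated by simulation in Python (a small sketch with the standard library only; a numerically stable logistic is used because very large draws of $\theta$ would otherwise overflow):

```python
import math
import random

def logistic(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sample_membership(mu, sigma, n):
    # Draw theta ~ N(mu, sigma^2) and map it to eta = Psi(theta) in (0, 1).
    return [logistic(random.gauss(mu, sigma)) for _ in range(n)]

random.seed(1)
# Moderate sigma: membership values spread smoothly over (0, 1).
eta_soft = sample_membership(0.0, 1.0, 10_000)
# Very large sigma: values pile up near 0 and 1, mimicking crisp membership.
eta_crisp = sample_membership(0.0, 1000.0, 10_000)
share_extreme = sum(e < 0.01 or e > 0.99 for e in eta_crisp) / len(eta_crisp)
```

With a standard deviation of 1000, nearly all sampled membership values fall within 0.01 of the boundary points 0 and 1.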
As an alternative, a discrete grid of values in $[0, 1]$ can be defined, and the mixed membership model is defined based on this grid. This approach is also referred to as a nonparametric assumption of the mixed membership distribution. For example, one could assume $\eta_d \in \{0, 0.5, 1\}$. In this specification, the value $\eta_d = 0.5$ can be interpreted as a partial mastery of the $d$th skill.
In the following two sections, we compare mixed membership and partial membership DCMs. These two modeling approaches differ in how their IRFs are defined.
4.1. Mixed Membership DCM
Mixed membership DCMs have been proposed in [12]. In general, IRFs in mixed membership models are obtained as a weighted sum of the item response probabilities from the crisp membership case, with weights derived from the membership scores of the corresponding latent classes [12,68].
4.1.1. Between-Item Dimensionality
We first discuss the case of between-item dimensionality. The IRF for the crisp membership case (i.e., using the binary latent variable $\alpha_{d_i}$ in the DCM) is denoted as:

$P_i(\alpha_{d_i}) = P(X_i = 1 \mid \alpha_{d_i})$
The IRF for the mixed membership DCM is then defined as:

$P(X_i = 1 \mid \eta_{d_i}) = \eta_{d_i} P_i(1) + (1 - \eta_{d_i}) P_i(0)$ (24)
Note that (24) can be more compactly written as:

$P(X_i = 1 \mid \eta_{d_i}) = \sum_{\alpha \in \{0,1\}} \eta_{d_i}^{\alpha} (1 - \eta_{d_i})^{1 - \alpha} P_i(\alpha)$
Using the IRF definition of the between-item dimensionality DCM in (14), we arrive at:

$P(X_i = 1 \mid \eta_{d_i}) = \eta_{d_i} \Psi(b_i + a_i) + (1 - \eta_{d_i}) \Psi(b_i)$
4.1.2. Within-Item Dimensionality
Now, we discuss mixed membership DCMs in the case of within-item dimensionality. In the DCM with a crisp membership, we denote the IRF as:

$P_i(\alpha_{d_1}, \alpha_{d_2}) = P(X_i = 1 \mid \alpha_{d_1}, \alpha_{d_2})$
The IRF can follow the ADCM, the DINA, or the GDINA model. The IRF for the mixed membership variable is defined as (see [12]):

$P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) = \sum_{\alpha_1 \in \{0,1\}} \sum_{\alpha_2 \in \{0,1\}} \eta_{d_1}^{\alpha_1} (1 - \eta_{d_1})^{1 - \alpha_1} \, \eta_{d_2}^{\alpha_2} (1 - \eta_{d_2})^{1 - \alpha_2} \, P_i(\alpha_1, \alpha_2)$ (28)
The IRF in (28) generally holds with a specified DCM for crisp membership. That is, by using the ADCM, DINA, or the GDINA model, different specifications of the mixed membership DCM are obtained.
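The weighted sum in (28) can be sketched in Python (the GDINA parameterization is used for the crisp IRF; all parameter values are hypothetical):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def crisp_irf(alpha1, alpha2, b, a1, a2, a12):
    # A crisp-membership DCM IRF (here in GDINA form).
    return logistic(b + a1 * alpha1 + a2 * alpha2 + a12 * alpha1 * alpha2)

def mm_irf(eta1, eta2, b, a1, a2, a12):
    # Mixed membership IRF (28): a weighted sum of the four crisp IRF values,
    # with product-Bernoulli weights built from the membership scores.
    p = 0.0
    for alpha1 in (0, 1):
        for alpha2 in (0, 1):
            w1 = eta1 if alpha1 == 1 else 1.0 - eta1
            w2 = eta2 if alpha2 == 1 else 1.0 - eta2
            p += w1 * w2 * crisp_irf(alpha1, alpha2, b, a1, a2, a12)
    return p
```

At the corner points $\eta = (0,0)$ and $\eta = (1,1)$, the mixed membership IRF reproduces the corresponding crisp IRF values, as it should.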
Note that (28) can be rewritten as:

$P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) = \eta_{d_1} \eta_{d_2} P_i(1,1) + \eta_{d_1} (1 - \eta_{d_2}) P_i(1,0) + (1 - \eta_{d_1}) \eta_{d_2} P_i(0,1) + (1 - \eta_{d_1}) (1 - \eta_{d_2}) P_i(0,0)$
4.2. Partial Membership DCM
Like mixed membership models, partial membership models [69] also weaken the assumption of a crisp membership but define their IRFs differently. While mixed membership models define the IRF as the weighted sum of IRFs from the crisp membership case, partial membership models define the IRF as the weighted harmonic mean of the IRFs from the crisp membership case [68,70]. We now separately describe the cases of between-item and within-item dimensionality in the next two subsections.
4.2.1. Between-Item Dimensionality
The IRFs in the between-item dimensionality in the crisp membership case (i.e., the DCMs) are denoted as:

$P_i(\alpha_{d_i}) = P(X_i = 1 \mid \alpha_{d_i})$

If the logistic link function is utilized for defining the IRFs in the DCM, they can be written as:

$P_i(\alpha_{d_i}) = \Psi(l_i(\alpha_{d_i}))$

with a linear predictor

$l_i(\alpha_{d_i}) = b_i + a_i \alpha_{d_i}$
The IRF for the partial membership DCM is defined as:

$P(X_i = x_i \mid \eta_{d_i}) \propto P(X_i = x_i \mid \alpha_{d_i} = 1)^{\eta_{d_i}} \, P(X_i = x_i \mid \alpha_{d_i} = 0)^{1 - \eta_{d_i}}$

where the proportionality constant is chosen such that the two probabilities for $x_i = 0$ and $x_i = 1$ sum to 1.
In Appendix A, the IRF is derived as:

$P(X_i = 1 \mid \eta_{d_i}) = \Psi(b_i + a_i \eta_{d_i})$

This IRF looks identical to the IRF in the between-item dimensionality MIRT model in which $\theta_{d_i}$ is replaced with $\eta_{d_i}$. We elaborate on the connection of the two approaches in more detail in Section 5.
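The derivation can be verified numerically: the normalized weighted geometric mean of the two crisp IRFs collapses to a single logistic function of $b_i + a_i \eta_{d_i}$. A minimal Python check (hypothetical parameter values):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def pm_irf_direct(eta, b, a):
    # Partial membership IRF via the weighted geometric mean of the two
    # crisp class probabilities, normalized over x in {0, 1}.
    p1 = logistic(b + a) ** eta * logistic(b) ** (1.0 - eta)
    p0 = (1.0 - logistic(b + a)) ** eta * (1.0 - logistic(b)) ** (1.0 - eta)
    return p1 / (p0 + p1)

def pm_irf_closed_form(eta, b, a):
    # Closed form under the logistic link: Psi(b + a * eta).
    return logistic(b + a * eta)
```

Both functions agree for every membership value $\eta \in [0, 1]$, which is precisely the result derived in Appendix A.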
4.2.2. Within-Item Dimensionality
We now discuss the definition of the IRF for a partial membership DCM in the case of within-item dimensionality. Let again the IRF for the crisp membership DCMs be denoted by:

$P_i(\alpha_{d_1}, \alpha_{d_2}) = P(X_i = 1 \mid \alpha_{d_1}, \alpha_{d_2})$ (34)
The IRFs of the DCMs discussed in this paper utilize the logistic link function. The notation in (34) includes the IRFs of the ADCM, DINA, and the GDINA model.
We can rewrite (34) as:

$P_i(\alpha_{d_1}, \alpha_{d_2}) = \Psi(l_i(\alpha_{d_1}, \alpha_{d_2}))$

with a linear predictor $l_i(\alpha_{d_1}, \alpha_{d_2})$ that, in the most general GDINA case, equals $b_i + a_{id_1} \alpha_{d_1} + a_{id_2} \alpha_{d_2} + a_{id_1 d_2} \alpha_{d_1} \alpha_{d_2}$.
The IRF for the partial membership DCM can generally be defined as:

$P(X_i = x_i \mid \eta_{d_1}, \eta_{d_2}) = c_i \prod_{\alpha_1 \in \{0,1\}} \prod_{\alpha_2 \in \{0,1\}} P(X_i = x_i \mid \alpha_1, \alpha_2)^{\, \eta_{d_1}^{\alpha_1} (1 - \eta_{d_1})^{1 - \alpha_1} \eta_{d_2}^{\alpha_2} (1 - \eta_{d_2})^{1 - \alpha_2}}$

where the constant $c_i$ is chosen such that the probabilities in (36) and (37) add to 1.
In Appendix B, the IRF is simplified to:

$P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) = \Psi(b_i + a_{id_1} \eta_{d_1} + a_{id_2} \eta_{d_2} + a_{id_1 d_2} \eta_{d_1} \eta_{d_2})$ (38)

Interestingly, the IRF in (38) is the same as the IRF of the DCM in which the binary skills $\alpha_d$ are substituted with continuous skills $\eta_d \in [0, 1]$. These kinds of models are also referred to as probabilistic (membership) DCMs [13,71,72,73,74,75]. Hence, this section demonstrated the equivalence of partial membership and probabilistic membership DCMs if the IRFs are based on the logistic link function (see also [20]).
5. Heuristic Comparison of the Different Modeling Approaches
In this section, MIRT models, (crisp membership) DCMs, and partial membership DCMs are heuristically compared (see also [70,76]). Note that the multidimensional real-valued variable $\boldsymbol{\theta}$ from a MIRT model can be obtained from the bounded variable $\boldsymbol{\eta}$ by applying the inverse logistic transformation (see also [77]):

$\theta_d = \Psi^{-1}(\eta_d) = \log\left( \dfrac{\eta_d}{1 - \eta_d} \right)$

for all dimensions $d = 1, \ldots, D$. Hence, the IRFs based on $\boldsymbol{\theta}$ from the MIRT model can alternatively be formulated as IRFs based on the transformed latent variable $\boldsymbol{\eta}$.
For two skills an equivalent, and for at least three skills a more parsimonious, representation of the skill class distribution $P(\boldsymbol{\alpha})$ can be obtained by defining $\alpha_d$ as a discretization of $\eta_d$; that is:

$\alpha_d = \mathbb{1}(\eta_d > 0.5)$

where $\mathbb{1}$ denotes the indicator function. This underlying multivariate normal distribution approach to DCMs has been discussed in [78]. Hence, (crisp membership) DCMs can be interpreted as partial membership models in which the discretization operator is applied in the IRFs.
We now discuss the different implications of the approaches. The GDINA model for the partial membership DCM has the IRF given in (38). The DCM GDINA specification can be formulated as:

$P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) = \Psi\big(b_i + a_{id_1} \mathbb{1}(\eta_{d_1} > 0.5) + a_{id_2} \mathbb{1}(\eta_{d_2} > 0.5) + a_{id_1 d_2} \mathbb{1}(\eta_{d_1} > 0.5) \, \mathbb{1}(\eta_{d_2} > 0.5)\big)$

which involves the particular step function $\eta_d \mapsto \mathbb{1}(\eta_d > 0.5)$, whose cut point of 0.5 is fixed across all items. While (38) assumes linear effects of $\eta_{d_1}$, $\eta_{d_2}$, and $\eta_{d_1} \eta_{d_2}$, the DCM specification follows a particular nonlinear step function. To some extent, the linear continuous effects offer more flexibility in modeling the IRFs.
The multidimensional compensatory IRT model can be written as:

$P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) = \Psi\big(b_i + a_{id_1} \Psi^{-1}(\eta_{d_1}) + a_{id_2} \Psi^{-1}(\eta_{d_2})\big)$ (43)

with $\theta_d = \Psi^{-1}(\eta_d)$. The IRF in (43) looks quite similar to (38) when excluding the interaction effect for $\eta_{d_1} \eta_{d_2}$. However, a different transformation of $\eta_d$ is utilized when defining the IRF.
Alternatively, one could also start with (38) and formulate a partially compensatory IRT model as:

$P(X_i = 1 \mid \theta_{d_1}, \theta_{d_2}) = \Psi\big(b_i + a_{id_1} \Psi(\theta_{d_1}) + a_{id_2} \Psi(\theta_{d_2}) + a_{id_1 d_2} \Psi(\theta_{d_1}) \Psi(\theta_{d_2})\big)$

In contrast to (11), the logistically transformed variables $\Psi(\theta_d)$ instead of the original variables $\theta_d$ are used.
Because the latent variable vector $\boldsymbol{\eta}$ in (38) is bounded, the IRF allows for partially capturing guessing and slipping effects. Hence, the partial membership DCM can be interpreted as more flexible than a MIRT model. The partial membership DCM can likely accommodate the shape of the IRF used in the four-parameter logistic (4PL; Refs. [79,80,81,82]) model and can attain lower and upper asymptotes different from 0 and 1, respectively. Note that, for positive item parameters, it holds that:

$\Psi(b_i) \leq P(X_i = 1 \mid \eta_{d_1}, \eta_{d_2}) \leq \Psi(b_i + a_{id_1} + a_{id_2} + a_{id_1 d_2})$

Hence, the probabilistic membership DCMs can be interpreted as approximations of a multidimensional 4PL model [83,84,85,86,87]. Note that the probabilistic membership DCMs also estimate the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$ in the logit-normal distribution, while standardized variables are used in MIRT models for identification reasons when a multivariate normally distributed variable is assumed.
Previous literature emphasized that there is likely skill continuity in DCMs. That is, the true data-generating latent variable in DCMs is continuous, but the discrete variable $\boldsymbol{\alpha}$ is applied in the analysis model [88,89,90] (see also [91]). The strong resemblance between MIRT models and DCMs has also been studied by von Davier [92,93,94,95,96].
To sum up, the similarity of the approaches can be investigated by comparing the distributional assumptions on $\boldsymbol{\eta}$ and the IRFs after rephrasing the compensatory MIRT model and DCMs in terms of the latent variable $\boldsymbol{\eta}$. In this sense, the compensatory MIRT model and DCMs can be viewed as particular cases of a partial membership DCM.
6. Empirical Examples
In this section, we compare different MIRT models, (crisp membership) DCMs, and mixed and partial membership DCMs through analyzing three publicly available datasets. The first dataset has between-item dimensionality, while the second and third datasets have within-item dimensionality. We selected subdatasets that contained only two dimensions (i.e., $D = 2$) to reduce model complexity and computation time. It can be expected that the main findings would not change if more than two dimensions were used.
The three datasets were frequently applied in DCM applications. Therefore, it is interesting to compare DCMs with alternative model specifications.
6.1. Method
We now describe the three datasets that are used in the following empirical analyses.
The first dataset, acl, contained in the R package mokken [97,98], comprises the responses of 433 subjects to items of the Dutch adjective checklist. For this analysis, we chose 20 items on the scales achievement (original Items 11 to 20) and dominance (original Items 21 to 30). This subdataset had between-item dimensionality. Each of the 20 items measured either the first or the second dimension. The dataset was dichotomized: values of at least two were set to one, while values smaller than two were set to zero in this analysis.
The second dataset, data.ecpe, contains item responses of 2922 subjects on the grammar section of the examination for the certificate of proficiency in English (ECPE) test [99,100,101,102]. The data.ecpe dataset can be accessed from the R package CDM [103]. For this analysis, we selected items that measured the first dimension (Skill 1: morphosyntactic rules) or the third skill (Skill 3: lexical rules). In total, 22 items that had within-item dimensionality were selected. Five and ten items uniquely measured the first or the second dimension, respectively. Seven items had within-item dimensionality.
The third dataset, mcmi, contains item responses of 1208 persons on a Dutch version of the Millon clinical multiaxial inventory (MCMI), designed as a questionnaire to diagnose mental disorders. The mcmi dataset is contained in the R package mokken [97,98]. Different DCMs for the whole dataset were studied in [104,105]. In the analysis, we used items from mcmi that measured the dimensions “H” (somatoform) or “CC” (major depression). The resulting 22 items had within-item dimensionality. Eight items measured both dimensions. Four and ten items solely loaded on the first and the second dimension, respectively.
In the following analyses, six model specifications were applied. In the DCMs, the skill space was chosen as $\{0,1\}^2$. The mixed membership (MM) and partial membership (PM) specifications defined a distribution on $[0,1]^2$ for $\boldsymbol{\eta}$. In the first approach, the logit-normal distribution was specified (MM-NO and PM-NO), in which the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$ were estimated. In the second approach, a discrete nonparametric distribution on a grid of membership values was used for MM and PM (denoted as MM-NP and PM-NP). This approach can be interpreted as a crude approximation of a complex continuous distribution on $[0,1]^2$. In the case of between-item dimensionality, there is only a single option for specifying the IRFs. In the case of within-item dimensionality, the GDINA, ADCM, and the DINA model were combined with the DCM, MM-NO, MM-NP, PM-NO, and PM-NP specifications. Finally, MIRT models for $\boldsymbol{\theta}$ were specified by assuming a bivariate normal distribution. The mean vector was fixed to $\boldsymbol{0}$, and only the correlation was estimated in the covariance matrix $\boldsymbol{\Sigma}$, while the standard deviations of the components of $\boldsymbol{\theta}$ were fixed to 1. In the case of between-item dimensionality, there was only a single option for specifying the multidimensional 2PL model. In the case of within-item dimensionality, the partially compensatory (PC), the compensatory (CO), and the noncompensatory (NC) MIRT models were specified.
The different models are compared through information criteria, which rely on the deviance of a fitted model. The most well-known information criteria are the Akaike information criterion (AIC; Ref. [106]) and the Bayesian information criterion (BIC; Ref. [107]), which are defined as [108]:

$\mathrm{AIC} = \mathrm{Dev} + 2p$ and $\mathrm{BIC} = \mathrm{Dev} + \log(N) \, p$
where p and N denote the number of estimated model parameters and sample size, respectively. A smaller information criterion indicates a better-fitting model. Note that the information criteria balance model fit and model parsimony.
Notably, AIC and BIC depend on sample size. To obtain a standardized measure of model fit, the Gilula–Haberman penalty (GHP; Refs. [109,110,111,112]) is defined as:

$\mathrm{GHP} = \dfrac{\mathrm{Dev} + 2p}{2 N I}$
The GHP can be interpreted as a standardized measure of a bias-corrected likelihood per case and item. In this article, we additionally define the statistic GHP4, which is normed for a hypothetical sample size and is defined as:
where, again, smaller values of GHP or GHP4 indicate a better-fitting model.
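The criteria above can be computed in a few lines of Python (the GHP normalization shown, AIC scaled by $2NI$, is our reading of the Gilula–Haberman penalty; treat it as an assumption):

```python
import math

def aic(deviance, p):
    # AIC = Dev + 2 * p
    return deviance + 2 * p

def bic(deviance, p, n):
    # BIC = Dev + log(N) * p
    return deviance + math.log(n) * p

def ghp(deviance, p, n, i):
    # Gilula-Haberman penalty: a bias-corrected likelihood per case and item.
    # Assumed normalization: AIC divided by 2 * N * I.
    return (deviance + 2 * p) / (2 * n * i)
```

For model comparisons, one would compute these statistics for each fitted model and take differences with respect to the best-fitting model.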
For the evaluation of alternative models, model differences in the AIC, BIC, and GHP4 statistics can be computed, which are denoted as $\Delta\mathrm{AIC}$, $\Delta\mathrm{BIC}$, and $\Delta\mathrm{GHP4}$. For $\Delta\mathrm{AIC}$, model differences larger than 10 (see [108]) can be interpreted as substantial. In line with previous research, $\Delta\mathrm{GHP4}$ differences larger than 10 might be considered a moderate deviation, while differences between 1 and 10 indicate a small deviation [112,113].
All models were estimated using the sirt::xxirt() function from the R (Version 4.3.1; [114]) package sirt [115] with marginal maximum likelihood estimation [116]. In the first 100 iterations, the expectation-maximization algorithm [117] was utilized, and the algorithm switched afterward to a Newton–Raphson approach.
The different model specifications were compared regarding model fit using the AIC and the $\Delta\mathrm{GHP4}$ statistics. We did not perform model selection based on BIC because the simulation study presented in Section 7 showed inferior performance of this criterion compared to model selection based on AIC.
6.2. Results
6.2.1. Dataset acl
Table 1 presents $\Delta\mathrm{AIC}$ and $\Delta\mathrm{GHP4}$ for the dataset acl. It turned out that the discrete mixed membership specification (i.e., MM-NP) fitted best, followed by PM-NO and the MIRT model. Notably, the DCM resulted in a worse model fit. However, there were only small model differences between MM-NP, PM-NO, PM-NP, and the MIRT model in terms of the $\Delta\mathrm{GHP4}$ statistic.
Table 1.
Dataset acl: Model comparisons based on $\Delta\mathrm{AIC}$ and $\Delta\mathrm{GHP4}$.
Table 2 presents the estimated item parameters for four selected model specifications for the acl dataset. Overall, the absolute values of MM and PM item parameters were larger compared to the DCM. The correlations of the item parameters were moderate to large but far from perfect (DCM and MM: 0.94, DCM and PM: 0.83, MM and PM: 0.78). While the correlations for the item intercept parameter $b_i$ were quite large (DCM and MM: 0.92, DCM and PM: 0.98, MM and PM: 0.94), they turned out to be small to moderately valued for the item slope parameter $a_i$ (DCM and MM: 0.14, DCM and PM: 0.42, MM and PM: 0.75).
Table 2.
Dataset acl: Estimated item parameters for selected models.
The correlation between the two dimensions was 0.67 in the DCM (computed as a tetrachoric correlation), 0.86 in the MM-NO model, 0.98 in the PM-NO model, and 0.54 in the MIRT model. Note that the correlations in MIRT and PM were quite different, resulting in different conclusions, although the model fit was very similar.
In the DCM, the skill class proportions were 0.28 for $(0,0)$, 0.09 for $(1,0)$, 0.25 for $(0,1)$, and 0.38 for $(1,1)$, resulting in marginal skill class probabilities of 0.47 for the first skill and 0.63 for the second skill. In the MM-NP specification, the means of $\eta_1$ and $\eta_2$ were 0.48 and 0.57, and for PM-NP, they were 0.51 and 0.52, respectively. In addition, Table 3 displays the probabilities of the membership grid values in the MM-NP and PM-NP specifications. Although the general probability patterns were similar, there were nonnegligible differences between the probabilities, indicating that the different specifications resulted in slightly different quantitative interpretations regarding the two skills.
Table 3.
Dataset acl: Estimated class probabilities in the MM-NP and PM-NP specifications.
6.2.2. Dataset data.ecpe
Table 4 presents model comparisons of the specified DCMs, mixed and partial membership DCMs, and MIRT models for the dataset data.ecpe. It turned out that the three PM-NO specifications resulted in the best model fit in terms of AIC. The ADCM for PM-NO was the best-fitting model, slightly better than the GDINA model for PM-NO. In terms of $\Delta\mathrm{GHP4}$, the model differences between all PM and MIRT model specifications turned out to be small.
Table 4.
Dataset data.ecpe: Model comparisons based on $\Delta\mathrm{AIC}$ and $\Delta\mathrm{GHP4}$.
Table 5 reports estimated item parameters for the partially compensatory models (i.e., GDINA and PC). We reported the NO specifications for MM and PM because they resulted in a better model fit compared to the nonparametric distribution (i.e., MM-NP and PM-NP).
Table 5.
Dataset data.ecpe: Estimated item parameters for selected models.
There were moderate to strong correlations for the item intercept parameter $b_i$ (DCM and MM: 0.98, DCM and PM: 0.78, MM and PM: 0.75). The correlations between different model specifications for the item slope parameters $a_{id_1}$ and $a_{id_2}$ were slightly larger (for $a_{id_1}$: DCM and MM: 0.96, DCM and PM: 0.92, MM and PM: 0.89; for $a_{id_2}$: DCM and MM: 0.83, DCM and PM: 0.95, MM and PM: 0.86). The correlations of the interaction parameter $a_{id_1 d_2}$ were large (DCM and MM: 0.95, DCM and PM: 0.98, MM and PM: 0.98). Similar to the acl dataset, the absolute values of estimated item parameters were larger for the MM and PM DCMs compared to the crisp membership DCM. The interaction parameters $\delta_i$ in the PC MIRT model were estimated substantially different from 0 and 1, indicating that the items function neither fully compensatory nor fully noncompensatory.
The estimated correlations for the GDINA specifications between the two dimensions were 0.90 in the DCM (tetrachoric correlation), 0.96 in MM-NO, and 0.99 in the PM-NO model. Interestingly, the correlation of the ADCM specification for PM-NO was slightly smaller at 0.96. The correlations for the different MIRT models were very similar (MIRT-PC: 0.82, MIRT-CO: 0.82, MIRT-NC: 0.81).
6.2.3. Dataset mcmi
Table 6 reports model comparisons for the mcmi dataset. The GDINA specification of PM-NO fitted best, followed by the ADCM specification of PM-NO. In terms of $\Delta\mathrm{GHP4}$, the partially compensatory and compensatory MIRT models had only small differences from the PM-NO approaches. The MM-NO specification resulted in a worse fit, so we instead report item parameters of the better-fitting MM-NP model. Across all models, the GDINA specifications fitted the data slightly better than the ADCM specifications.
Table 6.
Dataset mcmi: Model comparisons based on $\Delta\mathrm{AIC}$ and $\Delta\mathrm{GHP4}$.
Table 7 displays the estimated item parameters for four selected specifications for the mcmi dataset. Overall, the item parameters correlated moderately to strongly across models.
Table 7.
Dataset mcmi: Estimated item parameters for selected models.
The correlations for the GDINA specification between the two dimensions were large (DCM: 0.93, MM-NO: 0.99, PM-NO: 1.00, MIRT-PC: 0.95, MIRT-CO: 0.96, MIRT-NC: 0.93).
7. Simulation Study
In this simulation study, we evaluate the accuracy of model selection based on AIC, BIC, and GHP. For simplicity, we only considered the case of between-item dimensionality.
7.1. Method
We used estimated model parameters from the analysis of the acl dataset presented in Section 6.2.1. This dataset had between-item dimensionality and consisted of 20 items. We varied the sample size N as 500, 1000, and 2000. Six data-generating models (DGM) were simulated: the DCM, MM-NO, MM-NP, PM-NO, PM-NP, and the MIRT model. The same models were specified as analysis models for all DGMs.
Model estimation was carried out in the same way as described in Section 6. We assessed the percentage of replications in which a particular analysis model was selected based on the smallest AIC and the smallest BIC, respectively. Moreover, we computed the average statistic for the difference between the analysis model and the DGM (see Section 6.1).
In total, 2500 replications were carried out in each of the 3 (sample size) × 6 (DGMs) = 18 cells of the simulation study.
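The tabulation of selection rates within one cell of the design can be sketched as follows. This toy illustration uses made-up AIC values rather than actual model fits; only the tallying logic corresponds to the procedure described above:

```python
import numpy as np

rng = np.random.default_rng(1)

model_names = ["DCM", "MM-NO", "MM-NP", "PM-NO", "PM-NP", "MIRT"]
n_replications = 2500

# Hypothetical AIC values: one row per replication, one column per analysis
# model. In the real study these would come from fitting all six analysis
# models to each simulated dataset.
aic = rng.normal(loc=[25090.0, 25060.0, 25010.0, 25050.0, 25005.0, 25004.0],
                 scale=10.0, size=(n_replications, len(model_names)))

# A model is "selected" in a replication if it attains the smallest AIC.
selected = aic.argmin(axis=1)
rates = {name: 100.0 * np.mean(selected == j)
         for j, name in enumerate(model_names)}
```

The same tally is repeated with the BIC, and the resulting percentages fill one column block of a results table such as Table 8.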
7.2. Results
Table 8 reports the percentage model selection rates and the average statistic as a function of sample size for the six DGMs.
Table 8.
Simulation Study: Model selection rates based on AIC and BIC and the average statistic for six different data-generating models (displayed in columns).
If the DCM was the DGM, model selection based on AIC and BIC was accurate. However, the average model differences between the DCM and the MM-NP and PM-NP specifications were small.
If the MM-NO was the DGM, model selection based on the BIC strongly favored the MIRT model. However, the model differences between MM-NO and the other models were very small, making the analysis models difficult to distinguish. The true DGM MM-NO could only be detected correctly for a large sample size and only when the AIC was used.
If the data were generated by MM-NP, model selection based on the AIC worked well. However, the BIC also chose the MIRT model at a nonnegligible rate of 32% in one sample size condition. Note that the PM-NP analysis model was relatively close to the MM-NP model in terms of the average statistic.
If the DGM was the PM-NO model, the MIRT model was generally favored when the BIC was used as the model selection criterion. The AIC showed acceptable model selection rates in only one sample size condition.
If the PM-NP was the DGM, the AIC performed well for all sample sizes, whereas the BIC failed in one sample size condition.
Finally, if the DGM was the MIRT model, model selection based on AIC and BIC performed well. However, the model differences between MIRT and the partial membership models PM-NO and PM-NP were small, whereas the differences to the mixed membership specifications MM-NO and MM-NP were larger.
8. Discussion
In this article, we compared the extension of DCMs to mixed and partial memberships with crisp membership DCMs and MIRT models through three empirical datasets and a simulation study. We clarified the different nature of the two types of memberships (i.e., mixed and partial membership). In particular, we compared the partial membership DCM to the MIRT model. Essentially, these two specifications can be interpreted as structurally very similar. Hence, it is up to the researcher to interpret dimensions as partial memberships of classes or as measures of a quantitative continuous latent variable.
In the empirical examples, the DCM extensions fitted the data substantially better than the crisp membership DCMs. Moreover, partial membership DCMs outperformed mixed membership DCMs because they offer more flexibility in modeling IRFs. An anonymous reviewer wondered why an empirical study was needed to demonstrate this finding. However, the finding has previously been pointed out in the mixed membership literature on exploratory mixture and latent class models [68]. Nevertheless, recent generalizations of DCMs only considered mixed and not partial membership [12,13]. Therefore, our simulation study and empirical analysis could stimulate more DCM research devoted to partial memberships. Furthermore, the MIRT models frequently showed a fit similar to the partial membership DCMs.
The simulation study showed that model selection based on the AIC was more reliable than selection based on the BIC. This finding held for both small and large sample sizes. For a fixed number of items, model differences in terms of the GHP penalty are almost independent of the sample size. However, an analogous independence of the number of items cannot be expected.
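The near-independence of GHP differences from the sample size can be illustrated numerically. Assuming again that the GHP equals the AIC normalized by twice the number of item responses (our reading of the definition; all numbers are made up), a deviance difference that grows proportionally with N yields an almost constant GHP difference, while the contribution of the parameter-count difference vanishes as N grows:

```python
def ghp_diff(dev_diff, par_diff, n_persons, n_items):
    """GHP difference between two models: (deviance difference plus twice the
    parameter difference) normalized by twice the number of item responses."""
    return (dev_diff + 2.0 * par_diff) / (2.0 * n_persons * n_items)

# Deviance differences typically scale with N; the parameter difference
# (here 17 extra parameters) does not, so its share shrinks with N.
values = [ghp_diff(dev_diff=2.0 * n, par_diff=17, n_persons=n, n_items=20)
          for n in (500, 1000, 2000)]
```

With a deviance difference proportional to N, the three GHP differences agree up to the vanishing penalty term, which matches the observation that GHP-based model differences are nearly stable across sample sizes for a fixed test length.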
In this article, we confined ourselves to low-dimensional models (i.e., models with only two dimensions). In models with more than two dimensions, marginal maximum likelihood might be computationally demanding, and pairwise likelihood [118,119,120], variational approximation [121,122,123], or Markov chain Monte Carlo estimation [124,125,126] might be considered alternatively.
Given the main findings of our paper, it might be questioned whether DCMs with binary skills should be used at all compared to partial membership DCMs or MIRT models. The model-based classification of persons into masters and nonmasters might be attractive to practitioners in terms of model interpretation and using individual classifications as the outcome of the DCM. Interpreted in this way, DCMs with crisp membership are model-based confirmatory cluster analyses that automatically provide classifications into masters and nonmasters of skills. However, the classification could alternatively be determined by content experts, such as in a standard-setting procedure [127,128,129]. The classifications obtained by DCMs will likely depend strongly on the chosen persons and items [130], particularly if a DCM with crisp membership is not the data-generating model (i.e., in the presence of skill continuity). Importantly, the property of absolute invariance of item parameters in DCMs only holds for correctly specified models [53,90,131,132]. In MIRT models, there is always a scale indeterminacy because the means and standard deviations of the latent dimensions cannot be disentangled from the item intercepts and item discriminations if these were freely estimated. The absence of such an indeterminacy in DCMs comes at the price of assuming that the DCM holds exactly. Hence, model-based comparisons of groups in cross-sectional settings and of change in longitudinal designs based on DCMs should only be carried out with caution. If researchers still want to apply DCMs for such comparisons, we think it is advisable to impose invariant item parameters across groups or time points to enable well-defined and interpretable comparisons, even if measurement invariance is certainly violated.
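The scale indeterminacy noted above can be made explicit. In a compensatory MIRT model with linear predictor $\sum_d a_{id}\theta_d + b_i$ (notation ours), rescaling each latent dimension via $\theta_d = \sigma_d \theta_d^{*} + \mu_d$ leaves every IRF unchanged when the item parameters absorb the latent means and standard deviations:

```latex
\sum_d a_{id}\,\theta_d + b_i
  = \sum_d \underbrace{a_{id}\,\sigma_d}_{a_{id}^{*}}\,\theta_d^{*}
    + \underbrace{b_i + \sum_d a_{id}\,\mu_d}_{b_i^{*}}
```

Because the transformed parameters $(a_{id}^{*}, b_i^{*})$ reproduce the original likelihood exactly, the latent means $\mu_d$ and standard deviations $\sigma_d$ cannot be identified jointly with freely estimated intercepts and discriminations.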
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The empirical datasets used in Section 6 can be extracted from the R packages mokken (https://cran.r-project.org/web/packages/mokken; accessed on 9 April 2024) and CDM (https://cran.r-project.org/web/packages/CDM; accessed on 9 April 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| 2PL | two-parameter logistic |
| 4PL | four-parameter logistic |
| AIC | Akaike information criterion |
| ADCM | additive diagnostic classification model |
| BIC | Bayesian information criterion |
| CO | compensatory |
| DCM | diagnostic classification model |
| DGM | data-generating model |
| DINA | deterministic inputs, noisy “and” gate |
| GDINA | generalized deterministic inputs, noisy “and” gate |
| GHP | Gilula–Haberman penalty |
| IRF | item response function |
| IRT | item response theory |
| MIRT | multidimensional item response theory |
| MM | mixed membership |
| NC | noncompensatory |
| PC | partially compensatory |
| PM | partial membership |
Appendix A. Derivation of Equation (33)
The IRFs are defined in Equations (30)–(32). The numerator in (32) can be simplified to:
where is used as an abbreviation and we define:
In the same way, we get:
Hence, we obtain the simplified IRF for the partial membership case as:
Now, observe that and . Then, we obtain from (A4):
Appendix B. Derivation of Equation (38)
Using the same derivation as in Section 4.2.1, we obtain:
where the factor is properly defined. In addition, we get:
Hence, we get the following IRF:
which can be further simplified to:
References
- Lazarsfeld, P.F.; Henry, N.W. Latent Structure Analysis; Houghton Mifflin: Boston, MA, USA, 1968. [Google Scholar]
- Andersen, E.B. Latent structure analysis: A survey. Scand. J. Stat. 1982, 9, 1–12. [Google Scholar]
- Andersen, E.B. The Statistical Analysis of Categorical Data; Springer: Berlin, Germany, 1994. [Google Scholar] [CrossRef]
- Clogg, C.C.; Goodman, L.A. Latent structure analysis of a set of multidimensional contingency tables. J. Am. Stat. Assoc. 1984, 79, 762–771. [Google Scholar] [CrossRef]
- Goodman, L.A. On the estimation of parameters in latent structure analysis. Psychometrika 1979, 44, 123–128. [Google Scholar] [CrossRef]
- Yen, W.M.; Fitzpatrick, A.R. Item response theory. In Educational Measurement; Brennan, R.L., Ed.; Praeger Publishers: Westport, CT, USA, 2006; pp. 111–154. [Google Scholar]
- Cai, L.; Choi, K.; Hansen, M.; Harrell, L. Item response theory. Annu. Rev. Stat. Appl. 2016, 3, 297–321. [Google Scholar] [CrossRef]
- Chen, Y.; Li, X.; Liu, J.; Ying, Z. Item response theory—A statistical framework for educational and psychological measurement. arXiv 2021, arXiv:2108.08604. [Google Scholar] [CrossRef]
- Rupp, A.A.; Templin, J.L. Unique characteristics of diagnostic classification models: A comprehensive review of the current state-of-the-art. Meas. Interdiscip. Res. Persp. 2008, 6, 219–262. [Google Scholar] [CrossRef]
- Chang, H.H.; Wang, C.; Zhang, S. Statistical applications in educational measurement. Annu. Rev. Stat. Appl. 2021, 8, 439–461. [Google Scholar] [CrossRef]
- Zhang, S.; Liu, J.; Ying, Z. Statistical applications to cognitive diagnostic testing. Annu. Rev. Stat. Appl. 2023, 10, 651–675. [Google Scholar] [CrossRef]
- Shang, Z.; Erosheva, E.A.; Xu, G. Partial-mastery cognitive diagnosis models. Ann. Appl. Stat. 2021, 15, 1529–1555. [Google Scholar] [CrossRef]
- Shu, T.; Luo, G.; Luo, Z.; Yu, X.; Guo, X.; Li, Y. An explicit form with continuous attribute profile of the partial mastery DINA model. J. Educ. Behav. Stat. 2023, 48, 573–602. [Google Scholar] [CrossRef]
- de la Torre, J.; Santos, K.C. On the relationship between unidimensional item response theory and higher-order cognitive diagnosis models. In Essays on Contemporary Psychometrics; van der Ark, L.A., Emons, W.H.M., Meijer, R.R., Eds.; Springer: New York, NY, USA, 2022; pp. 389–412. [Google Scholar] [CrossRef]
- Lee, Y.S.; de la Torre, J.; Park, Y.S. Relationships between cognitive diagnosis, CTT, and IRT indices: An empirical investigation. Asia Pac. Educ. Rev. 2012, 13, 333–345. [Google Scholar] [CrossRef]
- Ma, W.; Minchen, N.; de la Torre, J. Choosing between CDM and unidimensional IRT: The proportional reasoning test case. Meas. Interdiscip. Res. Persp. 2020, 18, 87–96. [Google Scholar] [CrossRef]
- Maas, L.; Madison, M.J.; Brinkhuis, M.J.S. Properties and performance of the one-parameter log-linear cognitive diagnosis model. Front. Educ. 2024, 9, 1287279. [Google Scholar] [CrossRef]
- Madison, M.J.; Wind, S.A.; Maas, L.; Yamaguchi, K.; Haberman, S. A one-parameter diagnostic classification model with familiar measurement properties. J. Educ. Meas. 2024; epub ahead of print. [Google Scholar] [CrossRef]
- Liu, R. Using diagnostic classification models to obtain subskill information and explore its relationship with total scores: The case of the Michigan English Test. Psychol. Test Assess. Model. 2020, 62, 487–516. [Google Scholar]
- Robitzsch, A. Relating the one-parameter logistic diagnostic classification model to the Rasch model and one-parameter logistic mixed, partial, and probabilistic membership diagnostic classification models. Foundations 2023, 3, 621–633. [Google Scholar] [CrossRef]
- von Davier, M.; DiBello, L.; Yamamoto, K.Y. Reporting Test Outcomes with Models for Cognitive Diagnosis; (Research Report No. RR-06-28); Educational Testing Service: Princeton, NJ, USA, 2006. [Google Scholar] [CrossRef]
- Bonifay, W. Multidimensional Item Response Theory; Sage: Thousand Oaks, CA, USA, 2020. [Google Scholar] [CrossRef]
- Reckase, M.D. Multidimensional Item Response Theory Models; Springer: New York, NY, USA, 2009. [Google Scholar] [CrossRef]
- Tatsuoka, K.K. Rule space: An approach for dealing with misconceptions based on item response theory. J. Educ. Meas. 1983, 20, 345–354. [Google Scholar] [CrossRef]
- da Silva, M.A.; Liu, R.; Huggins-Manley, A.C.; Bazán, J.L. Incorporating the q-matrix into multidimensional item response theory models. Educ. Psychol. Meas. 2019, 79, 665–687. [Google Scholar] [CrossRef]
- Adams, R.J.; Wilson, M.; Wang, W.C. The multidimensional random coefficients multinomial logit model. Appl. Psychol. Meas. 1997, 21, 1–23. [Google Scholar] [CrossRef]
- Hartig, J.; Höhler, J. Representation of competencies in multidimensional IRT models with within-item and between-item multidimensionality. Z. Psychol. 2008, 216, 89–101. [Google Scholar] [CrossRef]
- Feuerstahler, L.; Wilson, M. Scale alignment in between-item multidimensional Rasch models. J. Educ. Meas. 2019, 56, 280–301. [Google Scholar] [CrossRef]
- Birnbaum, A. Some latent trait models and their use in inferring an examinee’s ability. In Statistical Theories of Mental Test Scores; Lord, F.M., Novick, M.R., Eds.; MIT Press: Reading, MA, USA, 1968; pp. 397–479. [Google Scholar]
- Reckase, M.D. Logistic multidimensional models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 189–210. [Google Scholar] [CrossRef]
- Bolt, D.M.; Lall, V.F. Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Appl. Psychol. Meas. 2003, 27, 395–414. [Google Scholar] [CrossRef]
- Babcock, B. Estimating a noncompensatory IRT model using Metropolis within Gibbs sampling. Appl. Psychol. Meas. 2011, 35, 317–329. [Google Scholar] [CrossRef]
- Buchholz, J.; Hartig, J. The impact of ignoring the partially compensatory relation between ability dimensions on norm-referenced test scores. Psychol. Test Assess. Model. 2018, 60, 369–385. [Google Scholar]
- Wang, C.; Nydick, S.W. Comparing two algorithms for calibrating the restricted non-compensatory multidimensional IRT model. Appl. Psychol. Meas. 2015, 39, 119–134. [Google Scholar] [CrossRef]
- Chalmers, R.P. Partially and fully noncompensatory response models for dichotomous and polytomous items. Appl. Psychol. Meas. 2020, 44, 415–430. [Google Scholar] [CrossRef]
- Spray, J.A.; Davey, T.C.; Reckase, M.D.; Ackerman, T.A.; Carlson, J.E. Comparison of Two Logistic Multidimensional Item Response Theory Models; Research Report ONR90-8; ACT Research Report Series: Iowa City, IA, USA, 1990. Available online: https://apps.dtic.mil/sti/citations/tr/ADA231363 (accessed on 3 June 2024).
- DeMars, C.E. Partially compensatory multidimensional item response theory models: Two alternate model forms. Educ. Psychol. Meas. 2016, 76, 231–257. [Google Scholar] [CrossRef]
- DiBello, L.V.; Roussos, L.A.; Stout, W. A review of cognitively diagnostic assessment and a summary of psychometric models. In Handbook of Statistics, Volume 26: Psychometrics; Rao, C.R., Sinharay, S., Eds.; Elsevier: Amsterdam, The Netherlands, 2006; pp. 979–1030. [Google Scholar] [CrossRef]
- George, A.C.; Robitzsch, A. Cognitive diagnosis models in R: A didactic. Quant. Meth. Psych. 2015, 11, 189–205. [Google Scholar] [CrossRef]
- Chen, Y.; Liang, S. BNMI-DINA: A Bayesian cognitive diagnosis model for enhanced personalized learning. Big Data Cogn. Comput. 2023, 8, 4. [Google Scholar] [CrossRef]
- Rupp, A.A.; Templin, J.; Henson, R.A. Diagnostic Measurement: Theory, Methods, and Applications; Guilford Press: New York, NY, USA, 2010; Available online: https://rb.gy/9ix252 (accessed on 3 June 2024).
- Ma, C.; Ouyang, J.; Xu, G. Learning latent and hierarchical structures in cognitive diagnosis models. Psychometrika 2023, 88, 175–207. [Google Scholar] [CrossRef]
- Martinez, A.J.; Templin, J. Approximate invariance testing in diagnostic classification models in the presence of attribute hierarchies: A Bayesian network approach. Psych 2023, 5, 688–714. [Google Scholar] [CrossRef]
- Wang, C.; Lu, J. Learning attribute hierarchies from data: Two exploratory approaches. J. Educ. Behav. Stat. 2021, 46, 58–84. [Google Scholar] [CrossRef]
- Junker, B.W.; Sijtsma, K. Cognitive assessment models with few assumptions, and connections with nonparametric item response theory. Appl. Psychol. Meas. 2001, 25, 258–272. [Google Scholar] [CrossRef]
- Dayton, C.M.; Macready, G.B. A probabilistic model for validation of behavioral hierarchies. Psychometrika 1976, 41, 189–204. [Google Scholar] [CrossRef]
- Haertel, E.H. Using restricted latent class models to map the skill structure of achievement items. J. Educ. Meas. 1989, 26, 301–321. [Google Scholar] [CrossRef]
- de la Torre, J. The generalized DINA model framework. Psychometrika 2011, 76, 179–199. [Google Scholar] [CrossRef]
- Henson, R.A.; Templin, J.L.; Willse, J.T. Defining a family of cognitive diagnosis models using log-linear models with latent variables. Psychometrika 2009, 74, 191–210. [Google Scholar] [CrossRef]
- de la Torre, J. DINA model and parameter estimation: A didactic. J. Educ. Behav. Stat. 2009, 34, 115–130. [Google Scholar] [CrossRef]
- Ma, W.; de la Torre, J. GDINA: An R package for cognitive diagnosis modeling. J. Stat. Softw. 2020, 93, 1–26. [Google Scholar] [CrossRef]
- Shi, Q.; Ma, W.; Robitzsch, A.; Sorrel, M.A.; Man, K. Cognitively diagnostic analysis using the G-DINA model in R. Psych 2021, 3, 812–835. [Google Scholar] [CrossRef]
- de la Torre, J.; Lee, Y.S. A note on the invariance of the DINA model parameters. J. Educ. Meas. 2010, 47, 115–127. [Google Scholar] [CrossRef]
- Huang, Q.; Bolt, D.M. Relative robustness of CDMs and (M)IRT in measuring growth in latent skills. Educ. Psychol. Meas. 2023, 83, 808–830. [Google Scholar] [CrossRef]
- Ma, W.; Chen, J.; Jiang, Z. Attribute continuity in cognitive diagnosis models: Impact on parameter estimation and its detection. Behaviormetrika 2023, 50, 217–240. [Google Scholar] [CrossRef]
- Chen, L.; Gu, Y. A spectral method for identifiable grade of membership analysis with binary responses. Psychometrika 2024, 89, 626–657. [Google Scholar] [CrossRef]
- Erosheva, E.A. Comparing latent structures of the grade of membership, Rasch, and latent class models. Psychometrika 2005, 70, 619–628. [Google Scholar] [CrossRef]
- Gu, Y.; Erosheva, E.A.; Xu, G.; Dunson, D.B. Dimension-grouped mixed membership models for multivariate categorical data. J. Mach. Learn. Res. 2023, 24, 1–49. [Google Scholar]
- Manton, K.G.; Woodbury, M.A.; Stallard, E.; Corder, L.S. The use of grade-of-membership techniques to estimate regression relationships. Sociol. Methodol. 1992, 22, 321–381. [Google Scholar] [CrossRef]
- Qing, H. Estimating mixed memberships in directed networks by spectral clustering. Entropy 2023, 25, 345. [Google Scholar] [CrossRef]
- Wang, Y.S.; Erosheva, E.A. Fitting Mixed Membership Models Using Mixedmem; Technical Report; 2020. Available online: https://tinyurl.com/9fxt54v6 (accessed on 3 June 2024).
- Woodbury, M.A.; Clive, J.; Garson, A., Jr. Mathematical typology: A grade of membership technique for obtaining disease definition. Comput. Biomed. Res. 1978, 11, 277–298. [Google Scholar] [CrossRef]
- Erosheva, E.A.; Fienberg, S.E.; Junker, B.W. Alternative statistical models and representations for large sparse multi-dimensional contingency tables. Ann. Faculté Sci. Toulouse Math. 2002, 11, 485–505. [Google Scholar] [CrossRef]
- Erosheva, E.A.; Fienberg, S.E.; Joutard, C. Describing disability through individual-level mixture models for multivariate binary data. Ann. Appl. Stat. 2007, 1, 346–384. [Google Scholar] [CrossRef]
- Finch, H.W. Performance of the grade of membership model under a variety of sample sizes, group size ratios, and differential group response probabilities for dichotomous indicators. Educ. Psychol. Meas. 2021, 81, 523–548. [Google Scholar] [CrossRef]
- Paisley, J.; Wang, C.; Blei, D.M. The discrete infinite logistic normal distribution. Bayesian Anal. 2012, 7, 997–1034. [Google Scholar] [CrossRef]
- Held, L.; Sabanés Bové, D. Applied Statistical Inference; Springer: Berlin, Germany, 2014. [Google Scholar] [CrossRef]
- Gruhl, J.; Erosheva, E.A.; Ghahramani, Z.; Mohamed, S.; Heller, K. A tale of two (types of) memberships: Comparing mixed and partial membership with a continuous data example. In Handbook of Mixed Membership Models and Their Applications; Airoldi, E.M., Blei, D., Erosheva, E.A., Fienberg, S.E., Eds.; Chapman & Hall: Boca Raton, FL, USA, 2014; pp. 15–38. [Google Scholar] [CrossRef]
- Heller, K.A.; Williamson, S.; Ghahramani, Z. Statistical models for partial membership. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 392–399. [Google Scholar] [CrossRef]
- Ghahramani, Z.; Mohamed, S.; Heller, K. A simple and general exponential family framework for partial membership and factor analysis. In Handbook of Mixed Membership Models and Their Applications; Airoldi, E.M., Blei, D., Erosheva, E.A., Fienberg, S.E., Eds.; Chapman & Hall: Boca Raton, FL, USA, 2014; pp. 101–122. [Google Scholar] [CrossRef]
- Liu, Q.; Wu, R.; Chen, E.; Xu, G.; Su, Y.; Chen, Z.; Hu, G. Fuzzy cognitive diagnosis for modelling examinee performance. ACM Trans. Intell. Syst. Technol. 2018, 9, 1–26. [Google Scholar] [CrossRef]
- Zhan, P.; Wang, W.C.; Jiao, H.; Bian, Y. Probabilistic-input, noisy conjunctive models for cognitive diagnosis. Front. Psychol. 2018, 9, 997. [Google Scholar] [CrossRef]
- Zhan, P.; Tian, Y.; Yu, Z.; Li, F.; Wang, L. A comparative study of probabilistic logic and fuzzy logic in refined learning diagnosis. J. Psychol. Sci. 2020, 43, 1258–1266. [Google Scholar]
- Zhan, P. Refined learning tracking with a longitudinal probabilistic diagnostic model. Educ. Meas. 2021, 40, 44–58. [Google Scholar] [CrossRef]
- Tian, Y.; Zhan, P.; Wang, L. Joint cognitive diagnostic modeling for probabilistic attributes incorporating item responses and response times. Acta Psychol. Sin. 2023, 55, 1573–1586. [Google Scholar] [CrossRef]
- Marini, M.M.; Li, X.; Fan, P.L. Characterizing latent structure: Factor analytic and grade of membership models. Sociol. Methodol. 1996, 26, 133–164. [Google Scholar] [CrossRef]
- Van der Linden, W.J. Unidimensional logistic response models. In Handbook of Item Response Theory, Volume 1: Models; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 11–30. [Google Scholar] [CrossRef]
- Templin, J.L.; Henson, R.A. Measurement of psychological disorders using cognitive diagnosis models. Psychol. Methods 2006, 11, 287–305. [Google Scholar] [CrossRef]
- Culpepper, S.A. The prevalence and implications of slipping on low-stakes, large-scale assessments. J. Educ. Behav. Stat. 2017, 42, 706–725. [Google Scholar] [CrossRef]
- Loken, E.; Rulison, K.L. Estimation of a four-parameter item response theory model. Brit. J. Math. Stat. Psychol. 2010, 63, 509–525. [Google Scholar] [CrossRef]
- Reise, S.P.; Waller, N.G. How many IRT parameters does it take to model psychopathology items? Psychol. Methods 2003, 8, 164–184. [Google Scholar] [CrossRef]
- Robitzsch, A. Four-parameter guessing model and related item response models. Math. Comput. Appl. 2022, 27, 95. [Google Scholar] [CrossRef]
- Fu, Z.; Zhang, S.; Su, Y.H.; Shi, N.; Tao, J. A Gibbs sampler for the multidimensional four-parameter logistic item response model via a data augmentation scheme. Brit. J. Math. Stat. Psychol. 2021, 74, 427–464. [Google Scholar] [CrossRef]
- Guo, S.; Chen, Y.; Zheng, C.; Li, G. Mixture-modelling-based Bayesian MH-RM algorithm for the multidimensional 4PLM. Brit. J. Math. Stat. Psychol. 2023, 76, 585–604. [Google Scholar] [CrossRef]
- Kalkan, Ö.K.; Çuhadar, I. An evaluation of 4PL IRT and DINA models for estimating pseudo-guessing and slipping parameters. J. Meas. Eval. Educ. Psychol. 2020, 11, 131–146. [Google Scholar] [CrossRef]
- Liu, J.; Meng, X.; Xu, G.; Gao, W.; Shi, N. MSAEM estimation for confirmatory multidimensional four-parameter normal ogive models. J. Educ. Meas. 2024, 61, 99–124. [Google Scholar] [CrossRef]
- Liu, T.; Wang, C.; Xu, G. Estimating three-and four-parameter MIRT models with importance-weighted sampling enhanced variational auto-encoder. Front. Psychol. 2022, 13, 935419. [Google Scholar] [CrossRef]
- Bolt, D.M.; Kim, J.S. Parameter invariance and skill attribute continuity in the DINA model. J. Educ. Meas. 2018, 55, 264–280. [Google Scholar] [CrossRef]
- Bolt, D.M. Bifactor MIRT as an appealing and related alternative to CDMs in the presence of skill attribute continuity. In Handbook of Diagnostic Classification Models; von Davier, M., Lee, Y.S., Eds.; Springer: Cham, Switzerland, 2019; pp. 395–417. [Google Scholar] [CrossRef]
- Huang, Q.; Bolt, D.M. The potential for interpretational confounding in cognitive diagnosis models. Appl. Psychol. Meas. 2022, 46, 303–320. [Google Scholar] [CrossRef]
- Hong, H.; Wang, C.; Lim, Y.S.; Douglas, J. Efficient models for cognitive diagnosis with continuous and mixed-type latent variables. Appl. Psychol. Meas. 2015, 39, 31–43. [Google Scholar] [CrossRef]
- Haberman, S.J.; von Davier, M.; Lee, Y.H. Comparison of Multidimensional Item Response Models: Multivariate Normal Ability Distributions versus Multivariate Polytomous Distributions; Research Report No. RR-08-45; Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
- Von Davier, M. A general diagnostic model applied to language testing data. Brit. J. Math. Stat. Psychol. 2008, 61, 287–307. [Google Scholar] [CrossRef]
- Von Davier, M. Mixture Distribution Diagnostic Models; Research Report No. RR-07-32; Educational Testing Service: Princeton, NJ, USA, 2007. [Google Scholar] [CrossRef]
- Von Davier, M. Hierarchical mixtures of diagnostic models. Psychol. Test Assess. Model. 2010, 52, 8–28. [Google Scholar]
- Xu, X.; von Davier, M. Comparing Multiple-Group Multinomial Log-Linear Models for Multidimensional Skill Distributions in the General Diagnostic Model; Research Report No. RR-08-35; Educational Testing Service: Princeton, NJ, USA, 2008. [Google Scholar] [CrossRef]
- Van der Ark, L.A. Mokken scale analysis in R. J. Stat. Softw. 2007, 20, 1–19. [Google Scholar] [CrossRef]
- Van der Ark, L.A. New developments in Mokken scale analysis in R. J. Stat. Softw. 2012, 48, 1–27. [Google Scholar] [CrossRef]
- Templin, J.; Hoffman, L. Obtaining diagnostic classification model estimates using Mplus. Educ. Meas. 2013, 32, 37–50. [Google Scholar] [CrossRef]
- Templin, J.; Bradshaw, L. Hierarchical diagnostic classification models: A family of models for estimating and testing attribute hierarchies. Psychometrika 2014, 79, 317–339. [Google Scholar] [CrossRef]
- Feng, Y.; Habing, B.T.; Huebner, A. Parameter estimation of the reduced RUM using the EM algorithm. Appl. Psychol. Meas. 2014, 38, 137–150. [Google Scholar] [CrossRef]
- Robitzsch, A.; George, A.C. The R package CDM for diagnostic modeling. In Handbook of Diagnostic Classification Models; von Davier, M., Lee, Y.S., Eds.; Springer: Cham, Switzerland, 2019; pp. 549–572. [Google Scholar] [CrossRef]
- George, A.C.; Robitzsch, A.; Kiefer, T.; Groß, J.; Ünlü, A. The R package CDM for cognitive diagnosis models. J. Stat. Softw. 2016, 74, 1–24. [Google Scholar] [CrossRef]
- de la Torre, J.; van der Ark, L.A.; Rossi, G. Analysis of clinical data from a cognitive diagnosis modeling framework. Meas. Eval. Couns. Dev. 2018, 51, 281–296. [Google Scholar] [CrossRef]
- Sijtsma, K.; van der Ark, L.A. Measurement Models for Psychological Attributes; CRC Press: Boca Raton, FL, USA, 2020. [Google Scholar] [CrossRef]
- Cavanaugh, J.E.; Neath, A.A. The Akaike information criterion: Background, derivation, properties, application, interpretation, and refinements. WIREs Comput. Stat. 2019, 11, e1460. [Google Scholar] [CrossRef]
- Neath, A.A.; Cavanaugh, J.E. The Bayesian information criterion: Background, derivation, and applications. WIREs Comput. Stat. 2012, 4, 199–203. [Google Scholar] [CrossRef]
- Burnham, K.P.; Anderson, D.R. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach; Springer: New York, NY, USA, 2002. [Google Scholar] [CrossRef]
- Gilula, Z.; Haberman, S.J. Conditional log-linear models for analyzing categorical panel data. J. Am. Stat. Assoc. 1994, 89, 645–656. [Google Scholar] [CrossRef]
- Gilula, Z.; Haberman, S.J. Prediction functions for categorical panel data. Ann. Stat. 1995, 23, 1130–1142. [Google Scholar] [CrossRef]
- Haberman, S.J. The Information a Test Provides on an Ability Parameter; Research Report No. RR-07-18; Educational Testing Service: Princeton, NJ, USA, 2007. [Google Scholar] [CrossRef]
- van Rijn, P.W.; Sinharay, S.; Haberman, S.J.; Johnson, M.S. Assessment of fit of item response theory models used in large-scale educational survey assessments. Large-Scale Assess. Educ. 2016, 4, 10. [Google Scholar] [CrossRef]
- Robitzsch, A. On the choice of the item response model for scaling PISA data: Model selection based on information criteria and quantifying model uncertainty. Entropy 2022, 24, 760. [Google Scholar] [CrossRef]
- R Core Team. R: A Language and Environment for Statistical Computing; R Core Team: Vienna, Austria, 2023; Available online: https://www.R-project.org/ (accessed on 15 March 2023).
- Robitzsch, A. sirt: Supplementary Item Response Theory Models; 2024. R Package Version 4.1-15. Available online: https://CRAN.R-project.org/package=sirt (accessed on 6 February 2024).
- Glas, C.A.W. Maximum-likelihood estimation. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 197–216. [Google Scholar] [CrossRef]
- Aitkin, M. Expectation maximization algorithm and extensions. In Handbook of Item Response Theory, Volume 2: Statistical Tools; van der Linden, W.J., Ed.; CRC Press: Boca Raton, FL, USA, 2016; pp. 217–236. [Google Scholar] [CrossRef]
- Katsikatsou, M.; Moustaki, I.; Yang-Wallentin, F.; Jöreskog, K.G. Pairwise likelihood estimation for factor analysis models with ordinal data. Comput. Stat. Data Anal. 2012, 56, 4243–4258. [Google Scholar] [CrossRef]
- Varin, C.; Reid, N.; Firth, D. An overview of composite likelihood methods. Stat. Sin. 2011, 21, 5–42. [Google Scholar]
- Robitzsch, A. Pairwise likelihood estimation of the 2PL model with locally dependent item responses. Appl. Sci. 2024, 14, 2652. [Google Scholar] [CrossRef]
- Yamaguchi, K.; Okada, K. Variational Bayes inference algorithm for the saturated diagnostic classification model. Psychometrika 2020, 85, 973–995. [Google Scholar] [CrossRef]
- Tamano, H.; Mochihashi, D. Dynamical non-compensatory multidimensional IRT model using variational approximation. Psychometrika 2023, 88, 487–526. [Google Scholar] [CrossRef]
- Ulitzsch, E.; Nestler, S. Evaluating Stan’s variational Bayes algorithm for estimating multidimensional IRT models. Psych 2022, 4, 73–88. [Google Scholar] [CrossRef]
- Culpepper, S.A. Bayesian estimation of the DINA model with Gibbs sampling. J. Educ. Behav. Stat. 2015, 40, 454–476. [Google Scholar] [CrossRef]
- Yamaguchi, K.; Templin, J. A Gibbs sampling algorithm with monotonicity constraints for diagnostic classification models. J. Classif. 2022, 39, 24–54. [Google Scholar] [CrossRef]
- Yamaguchi, K. Bayesian analysis methods for two-level diagnosis classification models. J. Educ. Behav. Stat. 2023, 48, 773–809. [Google Scholar] [CrossRef]
- Cizek, G.J.; Bunch, M.B.; Koons, H. Setting performance standards: Contemporary methods. Educ. Meas. 2004, 23, 31–50. [Google Scholar] [CrossRef]
- Pant, H.A.; Rupp, A.A.; Tiffin-Richards, S.P.; Köller, O. Validity issues in standard-setting studies. Stud. Educ. Eval. 2009, 35, 95–101. [Google Scholar] [CrossRef]
- Tiffin-Richards, S.P.; Pant, H.A.; Köller, O. Setting standards for English foreign language assessment: Methodology, validation, and a degree of arbitrariness. Educ. Meas. 2013, 32, 15–25. [Google Scholar] [CrossRef]
- Liao, X.; Bolt, D.M. Guesses and slips as proficiency-related phenomena and impacts on parameter invariance. Educ. Meas. 2024; epub ahead of print. [Google Scholar] [CrossRef]
- Bradshaw, L.P.; Madison, M.J. Invariance properties for general diagnostic classification models. Int. J. Test. 2016, 16, 99–118. [Google Scholar] [CrossRef]
- Ravand, H.; Baghaei, P.; Doebler, P. Examining parameter invariance in a general diagnostic classification model. Front. Psychol. 2020, 10, 2930. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).