A Deterministic Learning Algorithm Estimating the Q-Matrix for Cognitive Diagnosis Models

The goal of an exam in cognitive diagnostic assessment is to uncover whether an examinee has mastered certain attributes. Different cognitive diagnosis models (CDMs) have been developed for this purpose. The core of these CDMs is the Q-matrix, an item-to-attribute mapping traditionally designed by domain experts. An expert-designed Q-matrix is not without issues. For example, domain experts might neglect some attributes or have different opinions about the inclusion of some entries in the Q-matrix. It is therefore of practical importance to develop an automated method to estimate the Q-matrix. This research proposes a deterministic learning algorithm for estimating the Q-matrix. To obtain a sensible binary Q-matrix, a dichotomizing method is also devised. Results from the simulation study show that the proposed method for estimating the Q-matrix is useful. The empirical study analyzes the ECPE data, and the estimated Q-matrix is compared with the expert-designed one. All analyses in this research are carried out in R.


Cognitive Diagnosis Models
Cognitive diagnostic assessment (CDA) is a framework that intends to evaluate an examinee's mastery of a specific cognitive skill called an attribute [1]. A few cognitive diagnosis models (CDMs) have been developed, such as the deterministic input, noisy "and" gate (DINA) model [2], and the reduced reparameterized unified model (RRUM) [3,4].
The core of different CDMs is the Q-matrix [5]. Table 1 shows a simple example of the Q-matrix, which assesses whether an examinee has acquired the addition and subtraction attributes. An examinee must possess the addition attribute in order to answer the first item correctly. Suppose that the attribute status of the examinee is (1, 0). Without the subtraction attribute, the examinee is expected to answer only item 1 correctly, since items 2 and 3 require the subtraction attribute.
Suppose the data comprise item responses from I examinees to J items that measure K attributes. In the DINA model, the latent response

η_ij = ∏_{k=1}^{K} α_ik^{q_jk}

indicates whether examinee i (i = 1, ..., I) can correctly answer item j (j = 1, ..., J), where α_ik is the mastery of attribute k (k = 1, ..., K) for examinee i and q_jk is the state of the jth item and kth attribute in the Q-matrix. The item response function (IRF) for the DINA model is:

P(X_ij = 1 | α_i) = (1 − s_j)^{η_ij} g_j^{1 − η_ij},    (1)

where g_j and s_j are the guess and slip parameters for item j and α_i is the attribute status of examinee i. The guess parameter g_j represents the probability of X_ij = 1 when at least one required attribute is lacking, and the slip parameter s_j denotes the probability of X_ij = 0 when all required attributes are present. Suppose the guess and slip parameters for item 1 in Table 1 are g_1 = s_1 = 0.2. If the attribute status of examinee i is α_i = (1, 0), then η_i1 = 1. The examinee's probability of correctly answering item 1 is therefore P(X_i1 = 1 | α_i) = 1 − s_1 = 0.8. The RRUM is another popular CDM, especially in language assessment. The IRF of the RRUM is:

P(X_ij = 1 | α_i) = π*_j ∏_{k=1}^{K} (r*_jk)^{(1 − α_ik) q_jk},    (2)

where π*_j = ∏_{k=1}^{K} (1 − s_jk)^{q_jk} and r*_jk = g_jk / (1 − s_jk). Note that g_jk and s_jk are the guess and slip parameters of item j for attribute k. As in the DINA model, α_ik is the mastery of attribute k for examinee i and q_jk is the state of the jth item and kth attribute in the Q-matrix.
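To make the two IRFs concrete, here is a small Python sketch that evaluates them for one examinee-item pair; the function names and example values are illustrative only and are not part of the analyses reported in this research (which were carried out in R).

```python
import numpy as np

def dina_prob(alpha, q, g, s):
    """P(X_ij = 1) under the DINA model for one examinee-item pair.

    alpha : binary attribute vector of the examinee (length K)
    q     : binary Q-matrix row of the item (length K)
    g, s  : guess and slip parameters of the item
    """
    # eta = 1 only when the examinee masters every attribute the item requires
    eta = int(np.all(alpha >= q))
    return (1 - s) ** eta * g ** (1 - eta)

def rrum_prob(alpha, q, g, s):
    """P(X_ij = 1) under the RRUM; g and s are length-K vectors of
    per-attribute guess and slip parameters for the item."""
    pi_star = np.prod((1 - s) ** q)   # probability when all required attributes are mastered
    r_star = g / (1 - s)              # penalty factor for each missing required attribute
    return pi_star * np.prod(r_star ** ((1 - alpha) * q))

# Item 1 of Table 1 requires only addition; the examinee has attribute status (1, 0).
print(dina_prob(np.array([1, 0]), np.array([1, 0]), g=0.2, s=0.2))  # 0.8
```

With g_1 = s_1 = 0.2 and α_i = (1, 0), both models reproduce the probabilities described in the text: 0.8 for a master of all required attributes and 0.2 otherwise.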

Model Based Estimation of the Q-Matrix
Most of the Q-matrices are not specified during exam development and are conventionally designed by domain experts after the fact. A more objective means to assign the Q-matrix is needed because domain experts might neglect some attributes or have different opinions. For instance, different modifications of the Q-matrix for the fraction subtraction data [6] have been suggested (e.g., Tatsuoka [6]; de la Torre [7]; de la Torre and Douglas [8]; de la Torre and Douglas [9]; Henson et al. [10]; Tatsuoka [11]) over the past three decades.
It is therefore of practical importance to develop an automated method for searching the Q-matrix. The purpose of this research is to develop a deterministic learning algorithm that estimates the Q-matrix for CDMs. It should be noted that this research attempts to develop a method to estimate the whole Q-matrix rather than to validate an existing one.
Estimating the whole Q-matrix is NP-hard [12]. The sample space of this discrete optimization can be huge. For example, the simulation studies of this research try to recover a 15 by 4 Q-matrix (see Table 2), which gives 2^60/4! possible Q-matrices when column permutation is considered. Since the optimization problem can be simplified using model-based methods, most of the studies attempting to estimate the Q-matrix are CDM based.
For example, Chen et al. [13], Liu et al. [14], Liu et al. [15], and Xu and Shang [16] use the DINA model to estimate the Q-matrix. Chung [17] and Chung [18] show that both the DINA model and the RRUM are feasible models. Treating the Q-matrix as a parameter to estimate, these studies assume that the true model is the DINA model or the RRUM. Although simulations in these studies suggest promising Q-matrix recovery rates, it is desirable to find non-model-based methods.

NMF Estimation of the Q-Matrix
Winters [19] was the first to apply non-negative matrix factorization (NMF) to explore the structure of the Q-matrix. Desmarais and his colleagues have also published a few studies (e.g., Desmarais [20]; Desmarais [21]; Desmarais and Naceur [22]; Desmarais et al. [23]) that apply NMF to estimating the Q-matrix. These studies applying NMF are inspiring, showing that Q-matrix estimation is in essence a matrix factorization.
While their findings are promising, some issues are worth considering. The primary concern is that the inclusion of an entry (1 or 0) is decided solely by visual inspection of heatmaps. While visual inspection may be acceptable in some conditions, a more decisive means is needed to avoid ambiguity. Another issue is that how the initial values were obtained is unclear, which matters because NMF is very sensitive to initial values (Cichocki et al. [24]; Gond and Nadi [25]; Zheng et al. [26]). A more recent work by Casalino et al. [12] has shown that constrained alternating least squares is a practical route to a stable estimation.
Other issues relate to their simulation studies. No information is available about the effects of different sample sizes or of the correlations between attributes. In addition, each item measures at most two attributes in their simulation Q-matrix, whereas in reality it is not uncommon to have an item testing more than two attributes. We are also interested in whether NMF can recognize items measuring all attributes.
Inspired by Desmarais and his colleagues' research, this study offers a factorization algorithm for the Q-matrix and refines the simulation designs. Specifically, we use maximum likelihood estimation and enforce a dichotomizing scheme on the estimates. We define a recovery rate to investigate the performance of the method under different sample sizes and correlations with complete and incomplete Q-matrix designs. In addition to the DINA model, the RRUM is also adopted in the simulation study.

Maximum Likelihood Estimation
Barnes et al. [27] and Desmarais [20] give the following equation, which expresses the item response matrix (X) as the product of the Q-matrix (Q) and the attribute mastery matrix (α):

X = Qα,

where X, Q, and α are binary with each element either 0 or 1. Estimating the Q-matrix thus turns into a factorization problem, which can be simplified if we temporarily treat Q as continuous. This research proposes to freely derive Q by maximum likelihood estimation and then dichotomize the estimate to produce a binary Q-matrix. The procedure is as follows. Suppose that R is a continuous counterpart of Q. The problem becomes finding a valid factorization of X that provides an estimate of R:

X = Rα.    (3)

If the transformation in (3) is linear and nonsingular, then α = R^{-1} X. Using a Jacobian transformation, we can derive the pdf of X:

p_X(X) = |det R|^{-1} p_α(R^{-1} X).    (4)

Let M = R^{-1}, and let p_i denote the pdf of α_i. The pdf of X in (4) becomes:

p_X(X) = |det M| p_α(MX).

Suppose the Q-matrix measures K attributes. That is, M = (m_1, m_2, ..., m_K)^T, and, with independent components, therefore:

p_X(X) = |det M| ∏_{i=1}^{K} p_i(m_i^T X).    (5)

Suppose that we have N observations, denoted X_1, X_2, ..., X_N. From the pdf shown in (5), the likelihood L(M) is therefore:

L(M) = ∏_{n=1}^{N} |det M| ∏_{i=1}^{K} p_i(m_i^T X_n),

and therefore the log-likelihood is:

log L(M) = N log |det M| + ∑_{n=1}^{N} ∑_{i=1}^{K} log p_i(m_i^T X_n).    (6)

Dividing the log-likelihood of (6) by N yields:

(1/N) log L(M) = log |det M| + E[ ∑_{i=1}^{K} log p_i(m_i^T X) ].    (7)

It should be noted that the expectation will eventually be replaced by sample averages. To find the learning algorithm for M, we maximize (7) by taking the partial derivative with respect to M. To simplify notation, we let u_i = m_i^T X. Since φ_i(u_i) = −p_i′(u_i)/p_i(u_i) is the negative score function, Lee et al. [28] show that:

∂/∂M E[ ∑_{i=1}^{K} log p_i(u_i) ] = −E[ φ(u) X^T ],

where φ(u) = (φ_1(u_1), ..., φ_K(u_K))^T. We now evaluate the other term, ∂ log |det M| / ∂M. From Grossman [29], we know that:

M^{-1} = adj(M) / det M,

where adj(M) is the adjoint of M. We can express det M in terms of cofactors:

det M = ∑_{j=1}^{K} m_ij C_ij,    (8)

where C_ij is the cofactor of entry m_ij. Taking the partial derivative of (8) with respect to m_ij gives ∂ det M / ∂m_ij = C_ij, which suggests that:

∂ det M / ∂M = adj(M)^T.

As adj(M)^T = (det M)(M^T)^{-1}, we get:

∂ log |det M| / ∂M = (M^T)^{-1}.

The gradient of the log-likelihood in (7) is therefore:

∂/∂M (1/N) log L(M) = (M^T)^{-1} − E[ φ(u) X^T ].

For only one data point, the expectation is omitted. The learning algorithm is given by:

ΔM = λ [ (M^T)^{-1} − φ(u) X^T ],

where λ is the learning rate.
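The update rule can be sketched in a few lines. The following Python sketch is a minimal version under two assumptions the derivation leaves open: a tanh form for the score function φ and a fixed learning rate. As in the derivation, M is assumed square and nonsingular.

```python
import numpy as np

def ml_update(M, X, lr=0.01):
    """One step of the learning algorithm: M <- M + lr * [(M^T)^{-1} - E(phi(u) X^T)].

    M  : current K x K unmixing matrix (M = R^{-1})
    X  : K x N data matrix, one observation per column
    """
    N = X.shape[1]
    U = M @ X                                      # rows are u_i = m_i^T X
    phi = np.tanh(U)                               # assumed negative score function
    grad = np.linalg.inv(M.T) - (phi @ X.T) / N    # gradient of (1/N) log L(M)
    return M + lr * grad

def estimate_R(X, n_iter=500, lr=0.01, seed=0):
    """Iterate the update from a near-identity start and return R = M^{-1}."""
    K = X.shape[0]
    rng = np.random.default_rng(seed)
    M = np.eye(K) + 0.01 * rng.standard_normal((K, K))
    for _ in range(n_iter):
        M = ml_update(M, X, lr)
    return np.linalg.inv(M)   # the continuous estimate of Q before rotation
```

The batch average here replaces the expectation in the gradient, as noted after Equation (7); the single-observation variant simply drops the averaging.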

Rotation and Dichotomization
Inverting M in each iteration yields R. To enhance interpretability, R then undergoes an orthogonal varimax rotation. The rotated R is denoted as Q. As the Q-matrix is binary, a method to dichotomize Q is imperative. We devise the following dichotomizing scheme for converting Q to the binary Q-matrix.
Suppose the Q-matrix uses J items to measure K attributes. Each row in Q_{J×K} is an item. Item j can be presented as q_j = (q_j1, q_j2, ..., q_jK). Let the entry with the highest value in item j be q_jmax, namely, q_jmax = max(q_j1, q_j2, ..., q_jK). The value of each attribute in the item is then decided by the magnitude of its value relative to the highest one. That is, q_j/q_jmax = (q_j1/q_jmax, q_j2/q_jmax, ..., q_jK/q_jmax). For entry q_jk in item j, if q_jk/q_jmax ≥ a, set q_jk to 1; if q_jk/q_jmax < a, set q_jk to 0.
Different values of a (i.e., 0.4, 0.5, 0.6) are tested in this research. After this procedure is applied to every item, the whole Q-matrix is derived because Q J×K is now binary.
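The dichotomizing scheme can be sketched as follows; this is a minimal Python illustration, with the function name and the row-wise vectorization being ours.

```python
import numpy as np

def dichotomize(Q, a=0.5):
    """Convert a continuous (rotated) Q estimate into a binary Q-matrix.

    An entry becomes 1 when it is at least a proportion `a` of the largest
    entry in its row (item), and 0 otherwise.
    """
    Q = np.asarray(Q, dtype=float)
    row_max = Q.max(axis=1, keepdims=True)    # q_jmax for every item j
    return (Q / row_max >= a).astype(int)

# Entries reaching half of the row maximum 0.9 become 1.
print(dichotomize([[0.2, 0.5, 0.9]], a=0.5))  # [[0 1 1]]
```

Because the cutoff is relative to each item's largest loading, the scheme never produces an all-zero row, unlike a fixed absolute cutoff.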

Simulation Study
We examine whether our proposed algorithm can recover the Q-matrix from simulated data. Settings for the simulation are the same as those in Chung [18]. The following simulation study is carried out using customized R codes.

Q-Matrix for Simulation
This simulation study adopts the artificial Q-matrix (the complete Q-matrix in Table 2) acquired from Rupp and Templin [30]. Fifteen items that measure four attributes comprise the Q-matrix, which is constructed such that each attribute appears alone from items 1 to 4, in a pair from items 5 to 10, or in a triad from items 11 to 14. It should be noted that item 15 measures all 4 attributes.
The complete Q-matrix in Table 2 contains at least one item devoted solely to each attribute [31]. That is, each attribute in the complete Q-matrix in Table 2 is individually measured by at least one item (e.g., attribute 1 is individually tested by item 1).
This research also examines the effectiveness of our proposed algorithm in recovering an incomplete Q-matrix (the incomplete Q-matrix in Table 2), in which an item measures more than one attribute. Each item in the incomplete Q-matrix in Table 2 tests at least two attributes (e.g., item 1 tests attributes 1 and 4).
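Because the complete Q-matrix contains every nonempty attribute subset exactly once (four singles, six pairs, four triads, and one item with all four attributes), it can be reconstructed programmatically. The Python sketch below assumes rows are ordered by subset size and lexicographically within each block, which may differ from the exact row order of Table 2.

```python
from itertools import combinations
import numpy as np

def complete_q_matrix(K=4):
    """Build the 15 x 4 complete Q-matrix: each nonempty attribute subset once."""
    rows = []
    for size in range(1, K + 1):                  # singles, pairs, triads, then all K
        for attrs in combinations(range(K), size):
            row = np.zeros(K, dtype=int)
            row[list(attrs)] = 1                  # mark the attributes this item measures
            rows.append(row)
    return np.vstack(rows)

Q = complete_q_matrix()
print(Q.shape)  # (15, 4)
```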

Generating Correlated Attributes
Chung [18] suggests generating correlated attributes using a copula with the Choleski decomposition. Suppose the Q-matrix (Q) uses J items to measure K attributes for I examinees. That is, Q_{J×K} = (q_jk)_{J×K} and α_{I×K} = (α_ik)_{I×K}. Let ϑ be the I by K underlying probability matrix of α, and let column k of ϑ be the vector ϑ_k, k = 1, ..., K. That is, ϑ = (ϑ_1, ..., ϑ_K). The correlation coefficient for each pair of columns in ϑ takes the constant value ρ, and the correlation matrix is represented as Σ. Each entry in Σ corresponds to the correlation coefficient between two columns in ϑ. Σ can be decomposed as Σ = ν^T ν using the Choleski decomposition, where ν is an upper triangular matrix. After ν is derived, create an I × K matrix τ, in which each entry is generated from N(0, 1). τ is then transformed to γ by γ = τν, so that γ has the correlation structure Σ. Set ϑ = Φ(γ), where Φ(·) is the CDF of the standard normal distribution. α is then generated using inverse transform sampling. Create a matrix Θ_{I×K} = (θ_ik)_{I×K}, where each element is generated from Uniform(0, 1). If ϑ_ik ≥ θ_ik, set α_ik to 1, and if ϑ_ik < θ_ik, set α_ik to 0.
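The steps above can be sketched in Python as follows; the function name is ours, and constructing Φ from the scalar error function is an implementation convenience rather than part of Chung's [18] description.

```python
import numpy as np
from math import erf, sqrt

def simulate_attributes(I, K, rho, seed=0):
    """Generate an I x K binary attribute matrix whose underlying
    probabilities have pairwise correlation rho (Gaussian copula)."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((K, K), rho) + (1 - rho) * np.eye(K)  # equicorrelation matrix
    nu = np.linalg.cholesky(Sigma).T       # Sigma = nu^T nu, nu upper triangular
    tau = rng.standard_normal((I, K))      # independent N(0, 1) entries
    gamma = tau @ nu                       # correlated normal scores
    Phi = np.vectorize(lambda x: 0.5 * (1 + erf(x / sqrt(2))))
    vartheta = Phi(gamma)                  # underlying probability matrix
    Theta = rng.uniform(size=(I, K))       # Uniform(0, 1) thresholds
    return (vartheta >= Theta).astype(int) # inverse transform sampling
```

Note that NumPy's `cholesky` returns the lower-triangular factor, so it is transposed here to obtain the upper-triangular ν used in the text.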

Generating Data from the DINA Model and RRUM
With the Q-matrix and dependent attributes in place, data are simulated from the DINA model and the RRUM. All guess and slip parameters are set to 0.2 for both the DINA model (s_j = g_j = 0.2) and the RRUM (s_jk = g_jk = 0.2), and the responses are then created by inverse transform sampling from a two-point distribution whose success probability is obtained from the IRF of the DINA model or the RRUM. Note that for the RRUM we simulate data using s_jk and g_jk directly rather than the reparameterized parameters π*_j and r*_jk. We follow the settings in Chung [18]: examinees in groups of 500, 1000, and 2000 are simulated, with the correlation between each pair of attributes set to 0.1, 0.3, and 0.5, for both the DINA model and the RRUM. One hundred datasets are simulated for each combination of sample size and correlation.
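For the DINA model, this data-generating step can be sketched as below (the RRUM case would substitute its IRF); the function name and the vectorized computation of η are implementation choices of ours.

```python
import numpy as np

def simulate_dina(alpha, Q, g=0.2, s=0.2, seed=0):
    """Simulate an I x J binary response matrix from the DINA model.

    alpha : I x K binary attribute matrix
    Q     : J x K binary Q-matrix
    g, s  : common guess and slip values, as in the simulation settings
    """
    rng = np.random.default_rng(seed)
    # eta_ij = 1 iff examinee i masters every attribute required by item j
    eta = (alpha @ Q.T == Q.sum(axis=1)).astype(int)
    p = (1 - s) ** eta * g ** (1 - eta)        # success probabilities from the IRF
    return (rng.uniform(size=p.shape) < p).astype(int)
```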

Evaluation
To evaluate how well the proposed method recovers the true Q-matrix q, the recovery rate ∆ suggested by [14] is defined as:

∆ = (1/M) ∑_{m=1}^{M} [ 1 − (1/(J K)) ∑_{j=1}^{J} ∑_{k=1}^{K} |q_jk − q̂_jk^(m)| ],

where M = 100, | · | takes the absolute value, and q̂^(m) stands for the estimated Q-matrix for the mth dataset.
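The definition can be computed as in the Python sketch below; it assumes the columns of each estimate have already been aligned with the true attributes (the column-permutation issue noted earlier is not handled here).

```python
import numpy as np

def recovery_rate(Q_true, Q_hats):
    """Recovery rate Delta: average proportion of correctly recovered
    entries over the M estimated Q-matrices."""
    J, K = Q_true.shape
    M = len(Q_hats)
    return sum(1 - np.abs(Q_true - Qh).sum() / (J * K) for Qh in Q_hats) / M

# A perfect estimate gives 1.0; a fully wrong one gives 0.0.
Qt = np.array([[1, 0], [0, 1]])
print(recovery_rate(Qt, [Qt]))  # 1.0
```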

Results from Simulation
For both the DINA model and the RRUM, the optimal cutoff value a is 0.5 (see Table 3). It should be noted that 0.5 is not an arbitrary cutoff: a = 0.5 means half of the highest value in an item. As stated in Section 2.2, if q_jk/q_jmax ≥ a, q_jk is set to 1; if q_jk/q_jmax < a, q_jk is set to 0. The value of an attribute therefore has to reach at least half of the highest value to be regarded as 1 in the Q-matrix. As a = 0.5 is optimal, Table 4 shows the recovery rate for the complete Q-matrix when a = 0.5. For the complete Q-matrix with a = 0.5, the average ∆ across all combinations in the DINA model is 0.994, suggesting an adequate result when the Q-matrix is complete, although the average ∆ drops to 0.962 for the more complicated RRUM.
In terms of different sample sizes and correlations, ∆ rises as the sample size increases and drops as the correlation between attributes rises. Note that the number of attributes is assumed to be known in advance.
When the Q-matrix is incomplete, ∆ is unsatisfactory (see Table 5), ranging from 0.754 to 0.772. As in the complete case, ∆ declines when the correlation between attributes goes up and increases with the sample size; under this design, ∆ is higher when the data are simulated from the RRUM. Such low recovery rates for an incomplete Q-matrix are similar to the findings in Chung [18].
Table 5. Recovery rates of the incomplete Q-matrix when a = 0.5.

Empirical Study
The Examination for the Certificate of Proficiency in English (ECPE) data, obtained from the CDM package in R [32], contain the responses of 2922 examinees and are analyzed using the proposed method described in Section 2. The ECPE is a test developed and scored by the English Language Institute of the University of Michigan [33].
Buck and Tatsuoka [34] suggest a Q-matrix consisting of 28 items that measure 3 attributes: Morphosyntactic rules, Cohesive rules, and Lexical rules (the expert Q-matrix in Table 6). In addition, a parallel analysis was tentatively used to determine the number of attributes in the Q-matrix. The result also suggests 3 attributes.
We consequently decided to evaluate a 3-attribute Q-matrix solution. As exhibited in Table 6, the expert-designed and estimated Q-matrices are denoted Q_X and Q_E, respectively. An inspection of Q_X and Q_E shows that each is a complete Q-matrix and that no two columns are identical. Comparing Q_X with Q_E, we find that 11 items are identical; overall, 76.2% of the entries are identical. As for model fit, the AICs for Q_X and Q_E are 85,812.92 and 85,709.47, respectively, suggesting that the estimated Q-matrix (Q_E) fits the data better.

Discussion
The last 10 years have seen the development of a few CDM-based methods for extracting the Q-matrix, whereas non-CDM-based approaches have been rare. One of the non-CDM-based methods makes use of NMF. Desmarais and his colleagues revealed that NMF is a useful method for deriving the Q-matrix, and their results show that Q-matrix estimation is in essence a matrix factorization. Inspired by NMF, this study demonstrates the practicality of our proposed deterministic learning algorithm in generating the Q-matrix.
As the Q-matrix is regarded as a factor model with binary factor loadings, a varimax rotation after the estimation is necessary to aid interpretation. After the rotation, an objective way to dichotomize the continuous estimates is needed to derive the final Q-matrix. Arbitrarily using a fixed value such as 0.5 might generate fallacious estimates. For example, if the factor loadings for an item j are (0.111, 0.222, 0.333, 0.444), then a fixed cutoff of 0.5 will result in (0, 0, 0, 0). In this research, we instead enforce a dichotomizing scheme based on a proportion of the highest loading (a) applied to the rotated estimates. Results from the simulation study suggest that setting a to 0.5 is optimal.
The simulation study demonstrates that the proposed deterministic learning algorithm is capable of extracting the Q-matrix from data, with recovery rates above 0.9 across all conditions with a complete Q-matrix. Both sample size and the correlations between attributes influence Q-matrix recovery. When the Q-matrix is incomplete, the results are not as good as those under a complete Q-matrix; the low recovery rate is due to the non-identifiability of model parameters under the incomplete Q-matrix [38].
One limitation of this research is that the number of attributes is assumed to be known in the estimation. Future research could use a parallel analysis to address this issue. Another limitation is that data were simulated only from the DINA model and the RRUM. It is desirable to test the effectiveness of the proposed method on data generated from more general CDMs, such as the GDINA model. Further research using the proposed method is also recommended to examine the recovery rate from data simulated from compensatory models, such as the DINO model.
More extensive research is needed to compare the result from factor models with the result from CDM-based methods. A further comparison could investigate the effect of model misspecification on guess and slip parameters in the DINA model and RRUM.
In the empirical study, our Q-matrix estimate differs considerably from the expert-designed Q-matrix, although 76.2% of the entries are identical. We suggest that further research apply a validation approach, fixing the identical entries and estimating the remaining ones.
In conclusion, this research demonstrated that the Q-matrix is intrinsically a factor model. Different dimensionality reduction techniques, such as principal component analysis and singular value decomposition, incorporated with the dichotomizing scheme advanced in this research should be able to derive the Q-matrix.

Data Availability Statement:
The ECPE data can be obtained from the CDM R package.

Conflicts of Interest:
The authors declare no conflict of interest.