1. Introduction
Cognitive diagnostic models (CDMs), also known as diagnostic classification models (DCMs), have emerged as powerful psychometric tools that provide fine-grained information about examinees’ knowledge states and cognitive processes, delivering substantially richer feedback than traditional single test scores [1,2,3]. Unlike traditional item response theory (IRT) models [4,5,6] that place examinees on a continuous ability scale, CDMs classify examinees into discrete attribute mastery patterns, offering an attribute-based interpretation of test performance. Due to these advantages, CDMs have gained significant interest across various domains, including educational assessment [7,8], language testing [9], and psychological assessment [10].
While most CDM applications have focused on cross-sectional data, there is growing interest in longitudinal applications to measure examinees’ growth or change in attribute mastery over time. Longitudinal CDMs are particularly valuable for monitoring students’ learning progress, evaluating instructional effectiveness, and understanding the development of cognitive skills. Several approaches have been developed to model longitudinal assessment data with CDMs. The most straightforward approach involves fitting separate CDMs at each time point and comparing the resulting classifications [11,12]. However, this approach ignores dependencies between attribute mastery states across time points, which may lead to inconsistent classifications. To address these limitations, more sophisticated longitudinal DCMs have been proposed. Li et al. [13] proposed the LTA-DINA model, which combines the latent transition analysis (LTA) [14,15] framework with the deterministic input, noisy “and” gate (DINA) [16,17] model. Kaya and Leite [18] developed longitudinal cognitive diagnosis models that combine latent transition analysis with the DINA model and the deterministic input, noisy “or” gate (DINO) model [10], demonstrating through simulation studies that these approaches yield satisfactory convergence and classification accuracy when tracking attribute mastery over time. Wang et al. [19] introduced a higher-order hidden Markov model for tracking skill acquisition, which models the development of latent attributes through a continuous higher-order proficiency variable. Madison and Bradshaw [20] proposed the transition diagnostic classification model (TDCM), which integrates LTA with the more general log-linear cognitive diagnosis model, allowing for flexible modeling of attribute relationships and transitions over time.
However, applying these longitudinal CDM approaches is challenging because of their reliance on a pre-specified Q-matrix. The Q-matrix [21] is essential to standard CDM approaches, mapping each test item to the specific attributes required to answer it correctly. Specifically, for a test with $J$ items and $K$ attributes, the Q-matrix is a $J \times K$ binary matrix in which $q_{jk} = 1$ indicates that item $j$ requires attribute $k$ to be answered correctly, and $q_{jk} = 0$ otherwise. Accurate specification of the Q-matrix is crucial for valid diagnostic inferences from CDMs. Rupp and Templin [22] demonstrated that Q-matrix misspecification can lead to biased parameter estimates and reduced classification accuracy. Kunina-Habenicht et al. [23] found that the impact of Q-matrix misspecification depends on the specific DCM and the pattern of misspecification. Madison and Bradshaw [24] showed that even small changes in the Q-matrix design can significantly affect classification accuracy in the LCDM framework. Despite its importance, specifying a Q-matrix is a challenging and time-consuming task [25], as it requires a deep understanding of the cognitive processes involved in answering each item. In practice, researchers often rely on expert judgment to construct the Q-matrix, which is subjective and may lead to inconsistencies across different assessments. This challenge is magnified in longitudinal settings, where misclassifications at one time point can propagate and affect the interpretation of transitions over time.
Given the challenges associated with Q-matrix specification, researchers have explored various approaches to estimate or validate Q-matrices empirically, or to develop models that do not require a pre-specified Q-matrix. Data-driven Q-matrix validation methods have been proposed to refine expert-specified Q-matrices based on empirical data. De La Torre [26] developed a method for validating and refining Q-matrices for the DINA model using a discrimination index. This method has been extended to the generalized DINA (G-DINA) model by De La Torre and Chiu [27]. These methods typically start with an expert-specified Q-matrix and suggest modifications based on statistical criteria. More ambitious approaches aim to estimate the Q-matrix directly from response data with minimal prior specification. Liu et al. [28,29] developed self-learning algorithms for Q-matrix estimation that iteratively update both the Q-matrix and model parameters, and established conditions under which the Q-matrix can be uniquely identified from the data. Chen et al. [30] formulated Q-matrix estimation as a latent variable selection problem and employed regularized maximum likelihood to estimate the Q-matrix. Chen et al. [31] introduced a Bayesian method for estimating the DINA Q-matrix, which incorporates prior knowledge while allowing for data-driven refinement. More recently, Balamuta and Culpepper [32] introduced exploratory restricted latent class models with monotonicity requirements under Pólya–Gamma data augmentation, providing a powerful Bayesian framework for Q-matrix estimation. Similarly, Chen et al. [33] introduced a sparse latent class model (SLCM) for cognitive diagnosis that simultaneously estimates the Q-matrix and classifies examinees without requiring prior specification of the number of attributes. Yamaguchi [34] used a partially known Q-matrix to simultaneously estimate the effects of active and nonactive attributes in a Bayesian framework, which can also be employed to estimate the unknown part of the Q-matrix.
Despite these advances, longitudinal CDMs face significant computational challenges, particularly when the number of attributes or time points increases. Most current estimation methods rely on Markov chain Monte Carlo (MCMC) techniques, which can be computationally intensive and time-consuming. As an alternative to MCMC, variational inference (VI) has emerged as a powerful approach for estimating complex models, including CDMs, due to its computational efficiency and scalability [35,36,37,38]. Over the past several years, a growing body of work has advanced the use of variational inference for psychometric models. Yamaguchi and Okada [39,40] introduced variational inference approaches for the DINA model and the saturated diagnostic classification model, demonstrating significant computational advantages over traditional MCMC methods while maintaining comparable accuracy. Building on this foundation, Yamaguchi and Martinez [41] extended the variational framework to hidden Markov diagnostic classification models, making longitudinal cognitive diagnostic analyses more tractable for large-scale applications. Further advancing the field, Wang et al. [42] developed an efficient VBEM-M algorithm for the log-linear cognitive diagnostic model that aligns variational posteriors with prior distributions, achieving both faster convergence and improved parameter recovery compared to conventional estimation methods.
In this paper, we propose a novel approach that leverages variational inference to efficiently estimate the parameters of our hidden Markov log-linear additive cognitive diagnostic model (HM-LACDM), which extends the sparse latent class model (SLCM) framework. We implement post hoc Q-matrix recovery based on the estimated posterior distributions of the item-effect parameters, enabling us to determine the Q-matrix without prior specification. Our approach provides substantial computational advantages over traditional MCMC methods, making it particularly well-suited for analyzing complex longitudinal diagnostic data with numerous attributes or time points. We demonstrate through simulation studies that the proposed method can effectively recover the Q-matrix and accurately estimate the model parameters.
2. Notation and Model Formulation
Before proceeding with our methodology, we first establish the notation used throughout this paper. For any integer $n$, we denote the set $\{1, 2, \ldots, n\}$ as $[n]$. We denote the number of respondents as $I$, the number of items as $J$, the number of attributes as $K$, and the number of assessment time points as $T$. As discussed in the introduction, the Q-matrix is a $J \times K$ binary matrix in which $q_{jk} = 1$ indicates that item $j$ requires attribute $k$ to answer correctly, and $q_{jk} = 0$ otherwise. Formally, the Q-matrix is denoted as $\mathbf{Q} = (q_{jk}) \in \{0, 1\}^{J \times K}$. Let $\mathbf{q}_j$ represent the $j$-th row vector of $\mathbf{Q}$, such that $\mathbf{q}_j = (q_{j1}, \ldots, q_{jK})$. In our work, we assume that the Q-matrix is unknown, and we recover it post hoc based on inference from the variational distributions of the item-effect parameters.
We introduce the latent discrete random variable $\mathbf{A} = (\alpha_{itk}) \in \{0, 1\}^{I \times T \times K}$, which represents the attribute mastery profiles of all respondents across time points and attributes. For each respondent $i \in [I]$, let $\mathbf{A}_i = (\boldsymbol{\alpha}_{i1}, \ldots, \boldsymbol{\alpha}_{iT})$ denote their complete attribute profile, where each entry $\alpha_{itk} = 1$ indicates that respondent $i$ has mastered attribute $k$ at assessment point $t$, and $\alpha_{itk} = 0$ otherwise. We denote by $\boldsymbol{\alpha}_{it} = (\alpha_{it1}, \ldots, \alpha_{itK})$ the attribute mastery pattern of respondent $i$ at assessment point $t$. At each time point $t$, there are $L = 2^K$ possible attribute mastery patterns. There exists a natural bijective mapping $c: \{0, 1\}^K \to [L]$ that maps each attribute profile $\boldsymbol{\alpha}_{it}$ to a unique latent class $c(\boldsymbol{\alpha}_{it}) \in [L]$. With slight abuse of notation, we omit the mapping $c$ and simply denote the latent class of respondent $i$ at time point $t$ as $\alpha_{it} \in [L]$.
Let $\mathbf{M} \in \{0, 1\}^{L \times (K+1)}$ be the design matrix for the item-effect parameters, where $\mathbf{m}_l$ is the $l$-th row of $\mathbf{M}$. Each row can be constructed as $\mathbf{m}_l = (1, \alpha_{l1}, \ldots, \alpha_{lK})$, where the first element is the intercept term and the remaining elements correspond to the attribute pattern of latent class $l$. For example, when $K = 2$, we have
$$\mathbf{M} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{pmatrix}.$$
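To make the latent class indexing and the additive design matrix concrete, the following sketch enumerates the $2^K$ attribute patterns and assembles the $(K+1)$-column design matrix described above. The lexicographic ordering of patterns is one convenient choice of the bijective mapping, not necessarily the one used in our implementation, and the function names are purely illustrative.

```python
import itertools
import numpy as np

def attribute_patterns(K: int) -> np.ndarray:
    """Enumerate all 2^K binary attribute patterns; row l is the pattern of latent class l."""
    return np.array(list(itertools.product([0, 1], repeat=K)), dtype=int)

def design_matrix(K: int) -> np.ndarray:
    """Additive (main-effects-only) design matrix M of shape (2^K, K + 1):
    an intercept column followed by the K attribute indicators."""
    patterns = attribute_patterns(K)
    intercept = np.ones((patterns.shape[0], 1), dtype=int)
    return np.hstack([intercept, patterns])

if __name__ == "__main__":
    print(design_matrix(2))
    # [[1 0 0]
    #  [1 0 1]
    #  [1 1 0]
    #  [1 1 1]]
```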
We denote the response matrix as $\mathbf{X} = (x_{itj}) \in \{0, 1\}^{I \times T \times J}$, where $x_{itj} = 1$ indicates that respondent $i$ answered item $j$ correctly at time point $t$, and $x_{itj} = 0$ otherwise. In our model, for respondent $i$ at time point $t$, the response to item $j$ is modeled as a Bernoulli random variable within a mixture of generalized linear models (GLMs) framework:
$$x_{itj} \mid \alpha_{it} = l, \boldsymbol{\beta}_j \sim \mathrm{Bernoulli}\left(\sigma(\mathbf{m}_l^{\top}\boldsymbol{\beta}_j)\right), \tag{1}$$
where $\sigma(\cdot)$ is the logistic link function, i.e., $\sigma(x) = 1/(1 + e^{-x})$, and $\boldsymbol{\beta}_j \in \mathbb{R}^{K+1}$ is the item-effect parameter vector for item $j$. For each item $j$, we set the prior distribution of the item-effect parameter $\boldsymbol{\beta}_j$ as a multivariate normal distribution $\boldsymbol{\beta}_j \sim \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$, where $\boldsymbol{\mu}_0$ is the prior mean and $\boldsymbol{\Sigma}_0$ is the prior covariance matrix. We denote the collection of item-effect parameters as $\mathbf{B} = (\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J)$, and the prior distribution of $\mathbf{B}$ can be written as
$$p(\mathbf{B}) = \prod_{j=1}^{J} \mathcal{N}\left(\boldsymbol{\beta}_j \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0\right). \tag{2}$$
Assuming conditional independence of the responses $\mathbf{X}$ given the attribute mastery patterns $\mathbf{A}$ and item parameters $\mathbf{B}$, the joint likelihood of the observed data is
$$p(\mathbf{X} \mid \mathbf{A}, \mathbf{B}) = \prod_{i=1}^{I}\prod_{t=1}^{T}\prod_{j=1}^{J} \sigma\!\left(\mathbf{m}_{\alpha_{it}}^{\top}\boldsymbol{\beta}_j\right)^{x_{itj}} \left(1 - \sigma\!\left(\mathbf{m}_{\alpha_{it}}^{\top}\boldsymbol{\beta}_j\right)\right)^{1 - x_{itj}}. \tag{3}$$
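As a minimal illustration of the response model in Equation (1) and the per-class likelihood terms entering Equation (3), the sketch below evaluates $\sigma(\mathbf{m}_l^{\top}\boldsymbol{\beta}_j)$ for every latent class and item. The item parameters are arbitrary illustrative values, not estimates from any data set.

```python
import numpy as np

def sigmoid(x):
    """Logistic link function sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative setup: K = 2 attributes, J = 3 items, additive design matrix M of shape (2^K, K + 1).
M = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])
# beta[j] = (intercept, main effect of attribute 1, main effect of attribute 2); arbitrary values.
beta = np.array([[-2.0, 3.0, 0.0],    # hypothetical item loading on attribute 1 only
                 [-1.5, 0.0, 2.5],    # hypothetical item loading on attribute 2 only
                 [-2.5, 2.0, 2.0]])   # hypothetical item loading on both attributes

# probs[l, j] = P(x_itj = 1 | alpha_it = l, beta_j) = sigma(m_l^T beta_j), as in Equation (1).
probs = sigmoid(M @ beta.T)
print(np.round(probs, 3))

# Per-class conditional log-likelihood of one observed response vector x,
# i.e. the (i, t)-specific factor of Equation (3) evaluated at each latent class.
x = np.array([1, 0, 1])
loglik_per_class = (x * np.log(probs) + (1 - x) * np.log(1 - probs)).sum(axis=1)
print(np.round(loglik_per_class, 3))
```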
To model the longitudinal transitions of latent attribute mastery, we employ a hidden Markov model (HMM) structure. In our model, we assume that the transition probabilities between latent classes are time-homogeneous, meaning that the transition probabilities remain constant across time points. This is a common assumption in HMMs and allows us to simplify the modeling process while still capturing the essential dynamics of the latent attribute mastery states. We denote the time-homogeneous transition matrix as $\boldsymbol{\Gamma} = (\gamma_{ll'}) \in [0, 1]^{L \times L}$, where each entry $\gamma_{ll'}$ represents the probability of transitioning from latent class $l$ at time point $t$ to latent class $l'$ at time point $t + 1$. The transition matrix satisfies the constraint $\sum_{l'=1}^{L} \gamma_{ll'} = 1$ for all $l \in [L]$. For $t \geq 2$, the transition from the latent class at time point $t - 1$ to the latent class at time point $t$ is modeled as a categorical distribution
$$\alpha_{it} \mid \alpha_{i,t-1} = l \sim \mathrm{Categorical}\left(\gamma_{l1}, \ldots, \gamma_{lL}\right). \tag{4}$$
For $t = 1$, the initial latent class distribution is modeled as a categorical distribution with prior probabilities $\boldsymbol{\pi} = (\pi_1, \ldots, \pi_L)$, where $\sum_{l=1}^{L} \pi_l = 1$. We have
$$\alpha_{i1} \sim \mathrm{Categorical}\left(\pi_1, \ldots, \pi_L\right). \tag{5}$$
We set the prior distribution of each row vector $\boldsymbol{\gamma}_l$ of the transition matrix as a Dirichlet distribution with concentration parameters $\boldsymbol{\delta}_l^{\Gamma} = (\delta_{l1}^{\Gamma}, \ldots, \delta_{lL}^{\Gamma})$, i.e., $\boldsymbol{\gamma}_l \sim \mathrm{Dirichlet}(\boldsymbol{\delta}_l^{\Gamma})$. The prior distribution of the transition matrix $\boldsymbol{\Gamma}$ can be expressed as
$$p(\boldsymbol{\Gamma}) = \prod_{l=1}^{L} \mathrm{Dirichlet}\left(\boldsymbol{\gamma}_l \mid \boldsymbol{\delta}_l^{\Gamma}\right). \tag{6}$$
Similarly, we set the prior distribution of the initial latent class distribution $\boldsymbol{\pi}$ as a Dirichlet distribution with concentration parameters $\boldsymbol{\delta}^{\pi} = (\delta_1^{\pi}, \ldots, \delta_L^{\pi})$, i.e.,
$$p(\boldsymbol{\pi}) = \mathrm{Dirichlet}\left(\boldsymbol{\pi} \mid \boldsymbol{\delta}^{\pi}\right). \tag{7}$$
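The generative side of this structure can be illustrated with a few lines of code: the sketch below draws $\boldsymbol{\pi}$ and $\boldsymbol{\Gamma}$ from symmetric Dirichlet priors (with arbitrarily chosen concentration parameters) and then samples latent class trajectories according to Equations (4) and (5). All sizes and values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
L, T, I = 4, 3, 5  # latent classes, time points, respondents (illustrative sizes)

# Draw an illustrative initial distribution and time-homogeneous transition matrix from
# symmetric Dirichlet priors; the concentration parameters here are arbitrary placeholders.
pi = rng.dirichlet(np.ones(L))
Gamma = rng.dirichlet(np.ones(L), size=L)  # row l is the distribution over classes at time t + 1

def sample_trajectory(pi, Gamma, T, rng):
    """Sample one respondent's latent class sequence alpha_i1, ..., alpha_iT."""
    states = np.empty(T, dtype=int)
    states[0] = rng.choice(L, p=pi)                        # initial class, Equation (5)
    for t in range(1, T):
        states[t] = rng.choice(L, p=Gamma[states[t - 1]])  # transition step, Equation (4)
    return states

trajectories = np.array([sample_trajectory(pi, Gamma, T, rng) for _ in range(I)])
print(trajectories)  # one row of class indices per respondent
```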
We can now summarize our HM-LACDM as follows:
$$p(\mathbf{X}, \mathbf{A}, \mathbf{B}, \boldsymbol{\Gamma}, \boldsymbol{\pi}) = p(\mathbf{X} \mid \mathbf{A}, \mathbf{B})\, p(\mathbf{A} \mid \boldsymbol{\Gamma}, \boldsymbol{\pi})\, p(\mathbf{B})\, p(\boldsymbol{\Gamma})\, p(\boldsymbol{\pi}), \tag{8}$$
where $p(\mathbf{X} \mid \mathbf{A}, \mathbf{B})$ is the likelihood of the observed data given the latent attribute mastery profiles and item parameters (Equation (3)); $p(\mathbf{A} \mid \boldsymbol{\Gamma}, \boldsymbol{\pi})$ represents the likelihood of the latent attribute mastery profiles given the transition matrix and initial distribution (Equations (4) and (5)); $p(\mathbf{B})$ denotes the prior distribution of the item-effect parameters (Equation (2)); $p(\boldsymbol{\Gamma})$ specifies the prior distribution of the transition matrix (Equation (6)); and $p(\boldsymbol{\pi})$ is the prior distribution of the initial latent class distribution (Equation (7)). A more detailed description of the data likelihood can be found in Appendix A.
3. Variational Inference for HM-LACDM
Variational inference (VI) [35,36,37,38] is a powerful computational technique that transforms Bayesian posterior inference into an optimization problem. Given a probabilistic model with latent variables $\mathbf{z}$ and observed data $\mathbf{x}$, Bayesian inference seeks to compute the posterior distribution $p(\mathbf{z} \mid \mathbf{x}) = p(\mathbf{x}, \mathbf{z}) / p(\mathbf{x})$, where $p(\mathbf{x})$ is the marginal likelihood of the data. However, $p(\mathbf{x})$ is often intractable to compute, especially for complex models with high-dimensional latent variables. Instead of computing the intractable posterior distribution $p(\mathbf{z} \mid \mathbf{x})$ directly, VI approximates it with a simpler distribution $q(\mathbf{z})$ by minimizing the Kullback–Leibler (KL) divergence between $q(\mathbf{z})$ and the true posterior $p(\mathbf{z} \mid \mathbf{x})$. VI aims to maximize the evidence lower bound (ELBO), defined as
$$\mathrm{ELBO}(q) = \mathbb{E}_{q(\mathbf{z})}\left[\log p(\mathbf{x}, \mathbf{z}) - \log q(\mathbf{z})\right], \tag{9}$$
where $p(\mathbf{x}, \mathbf{z})$ is the joint distribution of the observed data and latent variables, and $q(\mathbf{z})$ is the variational distribution. Since maximizing the ELBO is equivalent to minimizing the KL divergence to the true posterior, this approach provides a principled framework for approximate Bayesian inference that is both computationally efficient and scalable to large datasets. For our HM-LACDM, we employ coordinate ascent variational inference (CAVI) with a mean-field factorization, which assumes that the variational posterior factorizes over disjoint blocks of latent variables, $q(\mathbf{z}) = \prod_{b} q(\mathbf{z}_b)$. This enables iterative optimization of each component while keeping the others fixed. Additionally, we handle the logistic link function using the lower bound approximation of Jaakkola and Jordan [43], which transforms the non-conjugate logistic likelihood into a tractable quadratic form suitable for Gaussian variational approximations. A more detailed introduction to variational inference can be found in Appendix B.1.
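For reference, the bound of Jaakkola and Jordan takes the standard form: for any $\xi > 0$,
$$\sigma(x) \geq \sigma(\xi)\exp\left\{\frac{x - \xi}{2} - \lambda(\xi)\left(x^{2} - \xi^{2}\right)\right\}, \qquad \lambda(\xi) = \frac{1}{4\xi}\tanh\!\left(\frac{\xi}{2}\right),$$
with equality at $x = \pm\xi$. Applied to each Bernoulli term in Equation (1), the bound makes the expected log-likelihood quadratic in $\boldsymbol{\beta}_j$, which is what renders the Gaussian variational updates below tractable.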
Leveraging the mean-field variational inference framework, we approximate the posterior distribution of the model parameters via the factorized distribution
$$q(\mathbf{A}, \mathbf{B}, \boldsymbol{\Gamma}, \boldsymbol{\pi}) = q(\mathbf{A})\, q(\mathbf{B})\, q(\boldsymbol{\Gamma})\, q(\boldsymbol{\pi}).$$
In this section, we present the variational posterior distributions and their iterative updates based on coordinate ascent variational inference (CAVI). Detailed derivations of the variational updates can be found in Appendix B.
First, the variational posterior distribution of the item-effect parameter $\boldsymbol{\beta}_j$ is assumed to follow a multivariate normal distribution $q(\boldsymbol{\beta}_j) = \mathcal{N}(\boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)$, where $\boldsymbol{\mu}_j$ is the variational mean and $\boldsymbol{\Sigma}_j$ is the variational covariance matrix. The iterative updates are
$$\boldsymbol{\Sigma}_j = \left(\boldsymbol{\Sigma}_0^{-1} + 2\sum_{i=1}^{I}\sum_{t=1}^{T}\sum_{l=1}^{L} r_{itl}\,\lambda(\xi_{jl})\,\mathbf{m}_l\mathbf{m}_l^{\top}\right)^{-1}, \qquad \boldsymbol{\mu}_j = \boldsymbol{\Sigma}_j\left(\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\mu}_0 + \sum_{i=1}^{I}\sum_{t=1}^{T}\sum_{l=1}^{L} r_{itl}\left(x_{itj} - \tfrac{1}{2}\right)\mathbf{m}_l\right), \tag{10}$$
where $\xi_{jl}$ is the variational parameter associated with item $j$ and latent class $l$, $\lambda(\xi) = \tanh(\xi/2)/(4\xi)$, and $r_{itl}$ denotes the expected latent class membership defined in Equation (13). We use $\boldsymbol{\Xi}$ to denote the collection of all variational parameters $\{\xi_{jl}\}$.
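A compact sketch of this update is given below, assuming the responsibilities $r_{itl}$ are stored as an array and the Jaakkola–Jordan form of Equation (10); the function and variable names are ours and are not part of the model specification.

```python
import numpy as np

def lam(xi):
    """lambda(xi) = tanh(xi / 2) / (4 * xi), with the limiting value 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    out = np.full_like(xi, 0.125)
    nz = xi != 0
    out[nz] = np.tanh(xi[nz] / 2.0) / (4.0 * xi[nz])
    return out

def update_beta(X, r, M, mu0, Sigma0, xi):
    """One CAVI sweep of the Gaussian updates in Equation (10).

    X  : (I, T, J) binary responses
    r  : (I, T, L) expected latent class memberships r_itl
    M  : (L, K + 1) additive design matrix
    xi : (J, L) local Jaakkola-Jordan parameters
    Returns the variational means (J, K + 1) and covariances (J, K + 1, K + 1).
    """
    J = X.shape[2]
    n_l = r.sum(axis=(0, 1))                    # expected number of (i, t) slots in each class
    Sx = np.einsum("itl,itj->jl", r, X - 0.5)   # sum_{i,t} r_itl * (x_itj - 1/2)
    prec0 = np.linalg.inv(Sigma0)
    mu, Sigma = [], []
    for j in range(J):
        prec = prec0 + 2.0 * np.einsum("l,lp,lq->pq", n_l * lam(xi[j]), M, M)
        cov = np.linalg.inv(prec)
        mu.append(cov @ (prec0 @ mu0 + M.T @ Sx[j]))
        Sigma.append(cov)
    # The local parameters are then typically refreshed as
    # xi_jl = sqrt(m_l^T (Sigma_j + mu_j mu_j^T) m_l), cf. Equation (15).
    return np.array(mu), np.array(Sigma)
```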
The variational posterior distribution of each row vector $\boldsymbol{\gamma}_l$ of the transition matrix is assumed to follow a Dirichlet distribution, i.e., $q(\boldsymbol{\gamma}_l) = \mathrm{Dirichlet}(\boldsymbol{\omega}_l^{\Gamma})$, where $\boldsymbol{\omega}_l^{\Gamma} = (\omega_{l1}^{\Gamma}, \ldots, \omega_{lL}^{\Gamma})$ is the variational concentration parameter vector for the $l$-th row of the transition matrix. The iterative update is
$$\omega_{ll'}^{\Gamma} = \delta_{ll'}^{\Gamma} + \sum_{i=1}^{I}\sum_{t=2}^{T} s_{itll'}, \tag{11}$$
where $s_{itll'}$ denotes the expected transition from class $l$ to class $l'$ defined in Equation (13).
Similarly, the variational posterior distribution of the initial latent class distribution $\boldsymbol{\pi}$ is assumed to follow a Dirichlet distribution, i.e., $q(\boldsymbol{\pi}) = \mathrm{Dirichlet}(\boldsymbol{\omega}^{\pi})$, where $\boldsymbol{\omega}^{\pi} = (\omega_1^{\pi}, \ldots, \omega_L^{\pi})$ is the variational concentration parameter vector for the initial latent class distribution. The iterative update is
$$\omega_l^{\pi} = \delta_l^{\pi} + \sum_{i=1}^{I} r_{i1l}. \tag{12}$$
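These conjugate, count-based updates amount to adding expected memberships and transitions to the prior concentrations. A sketch, with array layouts and names of our choosing:

```python
import numpy as np

def update_dirichlet_params(r, s, delta_gamma, delta_pi):
    """Conjugate Dirichlet updates of Equations (11) and (12).

    r           : (I, T, L) expected memberships r_itl
    s           : (I, T - 1, L, L) expected transitions s_itll' for t = 2, ..., T
    delta_gamma : (L, L) prior concentrations for the rows of the transition matrix
    delta_pi    : (L,) prior concentrations for the initial distribution
    """
    omega_gamma = delta_gamma + s.sum(axis=(0, 1))  # add expected l -> l' transition counts
    omega_pi = delta_pi + r[:, 0, :].sum(axis=0)    # add expected time-1 memberships
    return omega_gamma, omega_pi
```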
To update these variational posteriors, the expectations of the latent attribute mastery patterns, $r_{itl} = \mathbb{E}_q\left[\mathbb{1}\{\alpha_{it} = l\}\right]$ and $s_{itll'} = \mathbb{E}_q\left[\mathbb{1}\{\alpha_{i,t-1} = l, \alpha_{it} = l'\}\right]$, are required. For all $i \in [I]$, $t \in [T]$, and $l, l' \in [L]$, the iterative updates of these expectations can be computed using the following equations:
$$r_{itl} = \frac{f_{itl}\, b_{itl}}{\sum_{l''=1}^{L} f_{itl''}\, b_{itl''}}, \qquad s_{itll'} = \frac{f_{i,t-1,l}\,\tilde{\gamma}_{ll'}\,\tilde{p}_{itl'}\, b_{itl'}}{\sum_{u=1}^{L}\sum_{v=1}^{L} f_{i,t-1,u}\,\tilde{\gamma}_{uv}\,\tilde{p}_{itv}\, b_{itv}}. \tag{13}$$
In the above equations, $f_{itl}$ and $b_{itl}$ are the forward and backward probabilities, respectively, which can be computed recursively using the forward–backward algorithm, also known as the Baum–Welch algorithm in the context of HMMs [44,45,46]. For all $i \in [I]$, $t \in [T]$, and $l \in [L]$, $f_{itl}$ and $b_{itl}$ are defined as follows:
$$f_{i1l} = \tilde{\pi}_l\,\tilde{p}_{i1l}, \quad f_{itl} = \tilde{p}_{itl}\sum_{l'=1}^{L} f_{i,t-1,l'}\,\tilde{\gamma}_{l'l} \;\; (t \geq 2), \quad b_{iTl} = 1, \quad b_{itl} = \sum_{l'=1}^{L} \tilde{\gamma}_{ll'}\,\tilde{p}_{i,t+1,l'}\, b_{i,t+1,l'} \;\; (t < T), \tag{14}$$
where $\tilde{\pi}_l = \exp\left(\mathbb{E}_q[\log \pi_l]\right)$, $\tilde{\gamma}_{ll'} = \exp\left(\mathbb{E}_q[\log \gamma_{ll'}]\right)$, and $\tilde{p}_{itl} = \exp\left(\sum_{j=1}^{J} \mathbb{E}_q\left[\log p(x_{itj} \mid \alpha_{it} = l, \boldsymbol{\beta}_j)\right]\right)$.
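The following sketch implements these recursions for a single respondent, normalizing the forward and backward messages at each step for numerical stability; the normalization constants cancel in the responsibilities of Equation (13), so this is an implementation detail rather than part of the model. Array names are ours.

```python
import numpy as np

def forward_backward(log_p, log_pi, log_gamma):
    """Forward-backward pass for one respondent.

    log_p     : (T, L) expected per-class log-likelihoods, i.e. the logarithm of p~_itl
    log_pi    : (L,)   E_q[log pi_l]
    log_gamma : (L, L) E_q[log gamma_ll']
    Returns r (T, L) and s (T - 1, L, L), the marginal and pairwise responsibilities.
    """
    T, L = log_p.shape
    pi_t = np.exp(log_pi)
    gamma_t = np.exp(log_gamma)
    p_t = np.exp(log_p - log_p.max(axis=1, keepdims=True))  # rescale rows to avoid underflow

    f = np.zeros((T, L))
    b = np.ones((T, L))
    f[0] = pi_t * p_t[0]
    f[0] /= f[0].sum()
    for t in range(1, T):                       # forward recursion, Equation (14)
        f[t] = p_t[t] * (f[t - 1] @ gamma_t)
        f[t] /= f[t].sum()
    for t in range(T - 2, -1, -1):              # backward recursion, Equation (14)
        b[t] = gamma_t @ (p_t[t + 1] * b[t + 1])
        b[t] /= b[t].sum()

    r = f * b
    r /= r.sum(axis=1, keepdims=True)           # marginal responsibilities, Equation (13)

    s = np.zeros((T - 1, L, L))
    for t in range(1, T):                       # pairwise responsibilities, Equation (13)
        pair = f[t - 1][:, None] * gamma_t * (p_t[t] * b[t])[None, :]
        s[t - 1] = pair / pair.sum()
    return r, s
```

An equivalent log-space implementation can be used when the number of latent classes is large; the per-step normalization above plays the same role.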
During each iteration of the variational inference algorithm, we update the variational parameters $\boldsymbol{\Xi}$ using
$$\xi_{jl} = \sqrt{\mathbf{m}_l^{\top}\left(\boldsymbol{\Sigma}_j + \boldsymbol{\mu}_j\boldsymbol{\mu}_j^{\top}\right)\mathbf{m}_l}. \tag{15}$$
We summarize our variational inference algorithm for the HM-LACDM as follows (Algorithm 1):
| Algorithm 1 Variational Inference for HM-LACDM |
1: Input: Response data $\mathbf{X}$; prior parameters $\boldsymbol{\mu}_0$, $\boldsymbol{\Sigma}_0$, $\boldsymbol{\delta}_l^{\Gamma}$, and $\boldsymbol{\delta}^{\pi}$ for all $l \in [L]$.
2: Initialize: Set $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}_j$, $\boldsymbol{\omega}_l^{\Gamma}$, $\boldsymbol{\omega}^{\pi}$, and $\xi_{jl}$ for all $j \in [J]$, $l \in [L]$.
3: Repeat until convergence:
   1. Update $r_{itl}$ and $s_{itll'}$ (Equation (13)).
   2. Update $\boldsymbol{\mu}_j$, $\boldsymbol{\Sigma}_j$, $\boldsymbol{\omega}_l^{\Gamma}$, and $\boldsymbol{\omega}^{\pi}$ (Equations (10)–(12)).
   3. Update the variational parameters $\boldsymbol{\Xi}$ (Equation (15)).
4: Output: Variational posterior distributions $q(\mathbf{A})$, $q(\mathbf{B})$, $q(\boldsymbol{\Gamma})$, and $q(\boldsymbol{\pi})$.
|
We include the details of the derivation of the variational updates in
Appendix B.2. We also provide the explicit form of the ELBO in
Appendix B.3.
4. Post Hoc Q-Matrix Recovery
Although our HM-LACDM does not require a pre-specified Q-matrix, recovering the Q-matrix is still valuable for interpretability and diagnostic purposes. We propose a post hoc Q-matrix recovery approach that leverages the estimated variational posterior distributions of the item-effect parameters. For item $j$, the item-effect parameter can be expressed as $\boldsymbol{\beta}_j = (\beta_{j0}, \beta_{j1}, \ldots, \beta_{jK})^{\top}$, where $\beta_{j0}$ is the intercept term and $\beta_{jk}$ is the effect of attribute $k$ on item $j$. If attribute $k$ is not required for item $j$, then its corresponding effect parameter $\beta_{jk}$ should be zero. Based on this fact, we can perform a hypothesis test to determine whether each attribute $k$ is required for item $j$ based on the estimated posterior distribution of $\beta_{jk}$. We formulate the hypothesis testing problem as follows:
$$H_0: \beta_{jk} = 0 \quad \text{versus} \quad H_1: \beta_{jk} > 0.$$
We then estimate the Q-matrix entry $q_{jk}$ using a decision rule:
$$\hat{q}_{jk} = \mathbb{1}\left\{ z_{jk} > z_{1-\alpha} \right\}, \quad \text{with} \quad z_{jk} = \frac{\hat{\mu}_{jk}}{\hat{\sigma}_{jk}},$$
where $z_{jk}$ is the test statistic for the hypothesis test of $q_{jk}$, $\hat{\mu}_{jk}$ and $\hat{\sigma}_{jk}$ are the posterior mean and standard deviation of $\beta_{jk}$, respectively, and $z_{1-\alpha}$ is the critical value from the standard normal distribution corresponding to the significance level $\alpha$.
However, because we are testing multiple hypotheses simultaneously (one for each combination of item $j$ and attribute $k$), we need to control the false discovery rate (FDR). We employ the Benjamini–Hochberg (BH) procedure [47] to control the FDR at level $\alpha$. The BH procedure begins by computing the test statistic $z_{jk}$ for all $(j, k)$ pairs with $j \in [J]$ and $k \in [K]$, followed by calculating the corresponding p-values $p_{jk} = 1 - \Phi(z_{jk})$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The p-values are then sorted in ascending order as $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(JK)}$. We find the largest index $\ell$ such that $p_{(\ell)} \leq \frac{\ell}{JK}\alpha$ and reject all hypotheses corresponding to $p_{(1)}, \ldots, p_{(\ell)}$, setting the corresponding Q-matrix entries to 1.
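A sketch of this recovery step is shown below. It operates on the posterior means and standard deviations of the attribute main effects obtained from the variational approximation; the array names are ours, and SciPy is used only for the standard normal distribution.

```python
import numpy as np
from scipy.stats import norm

def recover_q_matrix(post_mean, post_sd, alpha=0.05):
    """Post hoc Q-matrix recovery with Benjamini-Hochberg FDR control.

    post_mean, post_sd : (J, K) posterior means / standard deviations of the
                         attribute main effects beta_jk (intercepts excluded)
    alpha              : target false discovery rate
    """
    z = post_mean / post_sd               # test statistics z_jk
    pvals = norm.sf(z)                    # one-sided p-values for H1: beta_jk > 0
    flat = np.sort(pvals.ravel())
    m = flat.size
    below = flat <= alpha * np.arange(1, m + 1) / m   # BH condition p_(k) <= k * alpha / m
    Q = np.zeros_like(pvals, dtype=int)
    if below.any():
        cutoff = flat[np.nonzero(below)[0].max()]     # largest p-value still rejected
        Q = (pvals <= cutoff).astype(int)
    return Q
```

As discussed in Section 6, the recovered columns are only identified up to attribute relabeling, so in our simulations they are aligned with the ground-truth attributes over label permutations before recovery accuracy is computed.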
Although our approach is post hoc, it is designed to recover the Q-matrix when the ground truth Q-matrix satisfies certain identifiability conditions. A Q-matrix is said to be complete if it allows for the identification of all possible attribute profiles of the $2^K$ different proficiency classes. Köhn and Chiu [48,49] proved that a necessary condition for the $J \times K$ Q-matrix of a main-effects-only model to be complete is that the matrix has full column rank ($\mathrm{rank}(\mathbf{Q}) = K$), i.e., the columns of the Q-matrix must be linearly independent. Therefore, if the ground truth Q-matrix is complete, our post hoc recovery procedure should recover it with high probability as long as the sample size is sufficiently large. Our approach provides a principled method for Q-matrix recovery that accounts for multiple testing while maintaining strong statistical properties. The recovered Q-matrix can then be used for diagnostic interpretation and validation of the cognitive structure underlying the assessment.
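As a quick illustration, this necessary condition can be checked for a candidate Q-matrix by computing its column rank; the matrix below is a hypothetical example, not one used in our studies.

```python
import numpy as np

# A hypothetical 5-item, 3-attribute Q-matrix; completeness requires rank(Q) = K.
Q = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])
print(np.linalg.matrix_rank(Q) == Q.shape[1])  # True: the necessary condition holds
```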
6. Discussion
In this paper, we introduced a novel hidden Markov log-linear additive cognitive diagnostic model (HM-LACDM) that circumvents the necessity of specifying a predefined Q-matrix, thus addressing a critical limitation of traditional cognitive diagnostic models (CDMs). By integrating a variational Bayesian inference framework, our method significantly reduces computational complexity, making it particularly suitable for large-scale longitudinal cognitive assessments involving numerous attributes and multiple time points.
The simulation studies validated the robustness and efficiency of our proposed approach. Our variational inference algorithm consistently converged and achieved accurate parameter recovery with high computational efficiency across varying configurations of sample size, attribute complexity, and temporal length. Notably, the post hoc Q-matrix recovery method exhibited remarkable accuracy, effectively controlling the false discovery rate and reliably identifying the cognitive attributes underlying each test item.
Our approach’s scalability and computational efficiency make it highly suitable for practical applications in educational assessments, where swift and accurate diagnostics are crucial. It provides educators and psychometricians with an effective tool for longitudinally tracking student learning progression without the intensive and subjective Q-matrix specification process.
In summary, the proposed HM-LACDM and the accompanying variational Bayesian inference method represent substantial advances in the field of cognitive diagnostic modeling. They not only facilitate more robust and scalable inference but also significantly enhance interpretability through effective Q-matrix recovery, thereby opening new avenues for targeted instructional intervention and deeper insights into cognitive skill development.
Despite the advantages of our proposed framework, there are important limitations to acknowledge. First, the recovered Q-matrix is identifiable only up to permutation of the attributes, a common issue in latent-class models without additional constraints. In our simulation studies, we address this by aligning the estimated attributes with the ground truth through enumeration of all possible label permutations. This ensures that the reported recovery accuracy and simulation results are not confounded by label switching. For empirical data, when the Q-matrix is not fully available, partial domain knowledge can be used to anchor certain attributes, and ordering constraints can be imposed to break label symmetry. These strategies mitigate the relabeling issue and represent promising directions for extending our framework to empirical applications. Second, unlike a saturated CDM, our specification includes only intercept and main-effect parameters, excluding higher-order interaction terms. We do not impose explicit constraints on these parameters during estimation; instead, we assume that the monotonicity condition holds for the true model, ensuring that the data themselves satisfy this property. In practice, we guide the algorithm toward monotone solutions by initializing intercepts with negative values and main effects with positive values, and by running the algorithm multiple times with different random initializations to select the best-fitting solution. Under this setting, the one-sided test with positive effects used in our post hoc Q-matrix recovery is appropriate, as the parameters of substantive interest are non-negative by construction.
Future research could extend the framework in several directions: incorporating interaction terms to approximate saturated CDMs, relaxing the monotonicity assumption, or integrating informative priors to further improve identifiability. Importantly, while our simulation studies demonstrate robustness and scalability, applying the method to large-scale empirical assessment data remains a critical next step. Such real-data validation would allow us to evaluate the practical utility of the HM-LACDM in capturing nuanced learning dynamics and further establish its value in operational testing contexts.