Article

Variational Bayesian Inference for a Q-Matrix-Free Hidden Markov Log-Linear Additive Cognitive Diagnostic Model

1 Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
2 Department of Education, University of Georgia, Athens, GA 30602, USA
3 School of Computing, University of Georgia, Athens, GA 30602, USA
4 Department of Education, University of California, Los Angeles, Los Angeles, CA 90095, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Present address: The Walt Disney Company, Burbank, CA 91521, USA.
Algorithms 2025, 18(11), 675; https://doi.org/10.3390/a18110675
Submission received: 9 June 2025 / Revised: 13 October 2025 / Accepted: 15 October 2025 / Published: 22 October 2025
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

Cognitive diagnostic models (CDMs) are commonly used in educational assessment to uncover the specific cognitive skills that contribute to student performance, allowing for precise identification of individual strengths and weaknesses and the design of targeted interventions. Traditional CDMs, however, depend heavily on a predefined Q-matrix that specifies the relationship between test items and underlying attributes. In this study, we introduce a hidden Markov log-linear additive cognitive diagnostic model (HM-LACDM) that does not require a Q-matrix, making it suitable for analyzing longitudinal assessment data without prior structural assumptions. To support scalable applications, we develop a variational Bayesian inference (VI) algorithm that enables efficient estimation in large datasets. Additionally, we propose a method to reconstruct the Q-matrix from estimated item-effect parameters. The effectiveness of the proposed approach is demonstrated through simulation studies.

1. Introduction

Cognitive diagnostic models (CDMs), also known as diagnostic classification models (DCMs), have emerged as powerful psychometric tools that provide fine-grained information about examinees’ knowledge states and cognitive processes, delivering substantially richer feedback than traditional single test scores [1,2,3]. Unlike traditional item response theory (IRT) models [4,5,6] that place examinees on a continuous ability scale, CDMs classify examinees into discrete attribute mastery patterns, offering an attribute-based interpretation of test performance. Due to these advantages, CDMs have gained significant interest across various domains, including educational assessment [7,8], language testing [9], and psychological assessment [10].
While most CDM applications have focused on cross-sectional data, there is growing interest in longitudinal applications to measure examinees’ growth or change in attribute mastery over time. Longitudinal CDMs are particularly valuable for monitoring students’ learning progress, evaluating instructional effectiveness, and understanding the development of cognitive skills. Several approaches have been developed to model longitudinal assessment data with CDMs. The most straightforward approach involves fitting separate CDMs at each time point and comparing the resulting classifications [11,12]. However, this approach ignores dependencies between attribute mastery states across time points, which may lead to inconsistent classifications. To address these limitations, more sophisticated longitudinal DCMs have been proposed. Li et al. [13] proposed the LTA-DINA model, which combines the latent transition analysis (LTA) [14,15] framework with the deterministic input, noisy “and” gate (DINA) [16,17] model. Kaya and Leite [18] developed longitudinal cognitive diagnosis models that combine latent transition analysis with the DINA model and deterministic input, noisy “or” gate (DINO) model [10], demonstrating through simulation studies that these approaches yield satisfactory convergence and classification accuracy when tracking attribute mastery over time. Wang et al. [19] introduced a higher-order hidden Markov model for tracking skill acquisition, which models the development of latent attributes through a continuous higher-order proficiency variable. Madison and Bradshaw [20] proposed the transition diagnostic classification model (TDCM), which integrates LTA with the more general log-linear cognitive diagnosis model, allowing for flexible modeling of attribute relationships and transitions over time.
However, there are challenges in applying these longitudinal CDM approaches due to their reliance on a pre-specified Q-matrix. The Q-matrix [21] is essential to standard CDM approaches, mapping each test item to the specific attributes required to answer it correctly. Specifically, for a test with J items and K attributes, the Q-matrix is a J × K binary matrix, where $q_{jk} = 1$ indicates that item j requires attribute k to be answered correctly, and $q_{jk} = 0$ otherwise. Accurate specification of the Q-matrix is crucial for valid diagnostic inferences from CDMs. Rupp and Templin [22] demonstrated that Q-matrix misspecification can lead to biased parameter estimates and reduced classification accuracy. Kunina-Habenicht et al. [23] found that the impact of Q-matrix misspecification depends on the specific DCM and the pattern of misspecification. Madison and Bradshaw [24] showed that even small changes in the Q-matrix design can significantly affect classification accuracy in the LCDM framework. Despite the importance of the Q-matrix, specifying one is a challenging and time-consuming task [25], as it requires a deep understanding of the cognitive processes involved in answering each item. In practice, researchers often rely on expert judgment to construct the Q-matrix, which is subjective and may lead to inconsistencies across different assessments. This challenge can be magnified in longitudinal settings, where misclassifications at one time point can propagate and affect the interpretation of transitions over time.
Given the challenges associated with Q-matrix specification, researchers have explored various approaches to estimate or validate Q-matrices empirically, or to develop models that do not require a pre-specified Q-matrix. Data-driven Q-matrix validation methods have been proposed to refine expert-specified Q-matrices based on empirical data. De La Torre [26] developed a method for validating and refining Q-matrices for the DINA model using a discrimination index. This method has been extended to the generalized DINA (G-DINA) model by De La Torre and Chiu [27]. These methods typically start with an expert-specified Q-matrix and suggest modifications based on statistical criteria. More ambitious approaches aim to estimate the Q-matrix directly from response data with minimal prior specification. Liu et al. [28,29] developed self-learning algorithms for Q-matrix estimation that iteratively update both the Q-matrix and model parameters and established conditions under which the Q-matrix can be uniquely identified from the data. Chen et al. [30] formulated Q-matrix estimation as a latent variable selection problem and employed regularized maximum likelihood to estimate the Q-matrix. Chen et al. [31] introduced a Bayesian method for estimating the DINA Q-matrix, which incorporates prior knowledge while allowing for data-driven refinement. More recently, Balamuta and Culpepper [32] introduced exploratory restricted latent class models with monotonicity requirements under Pólya–Gamma data augmentation, providing a powerful Bayesian framework for Q-matrix estimation. Similarly, Chen et al. [33] introduced a sparse latent class model (SLCM) for cognitive diagnosis that simultaneously estimates the Q-matrix and classifies examinees without requiring prior specification of the number of attributes. Yamaguchi [34] used a partially known Q-matrix to simultaneously estimate the effects of active and nonactive attributes in a Bayesian framework, which can also be employed to estimate the unknown part of the Q-matrix.
Despite these advances, longitudinal CDMs face significant computational challenges, particularly when the number of attributes or time points increases. Most current estimation methods rely on Markov chain Monte Carlo (MCMC) techniques, which can be computationally intensive and time-consuming. As an alternative to MCMC, variational inference (VI) has emerged as a powerful approach for estimating complex models, including CDMs, due to its computational efficiency and scalability [35,36,37,38]. Over the past several years, a growing body of work has advanced the use of variational inference for psychometric models. Yamaguchi and Okada [39,40] introduced variational inference approaches for the DINA model and the saturated diagnostic classification model, demonstrating significant computational advantages over traditional MCMC methods while maintaining comparable accuracy. Building on this foundation, Yamaguchi and Martinez [41] extended the variational framework to hidden Markov diagnostic classification models, making longitudinal cognitive diagnostic analyses more tractable for large-scale applications. Further advancing the field, Wang et al. [42] developed an efficient VBEM-M algorithm for the log-linear cognitive diagnostic model that aligns variational posteriors with prior distributions, achieving both faster convergence and improved parameter recovery compared to conventional estimation methods. In this paper, we propose a novel approach that leverages variational inference to efficiently estimate the parameters of our hidden Markov log-linear additive cognitive diagnostic model (HM-LACDM), which extends the sparse latent class model (SLCM) framework. We implement post hoc Q-matrix recovery based on the estimated posterior distributions of item-effect parameters, enabling us to determine the Q-matrix without prior specification. Our approach provides substantial computational advantages over traditional MCMC methods, making it particularly well-suited for analyzing complex longitudinal diagnostic data with numerous attributes or time points. We demonstrate through simulation studies that our proposed method can effectively recover the Q-matrix and accurately estimate the model parameters.

2. Notation and Model Formulation

Before proceeding with our methodology, we first establish the notation used throughout this paper. For any integer n, we denote the set $\{1, 2, \ldots, n\}$ as $[n]$. We denote the number of respondents by I, the number of items by J, the number of attributes by K, and the number of assessment time points by T. As discussed in the introduction, the Q-matrix is a binary matrix $\mathbf{Q} \in \{0,1\}^{J \times K}$, where $q_{jk} = 1$ indicates that item j requires attribute k to be answered correctly, and $q_{jk} = 0$ otherwise. Let $\mathbf{q}_j$ represent the j-th row vector of $\mathbf{Q}$, such that $\mathbf{Q} = [\mathbf{q}_1, \ldots, \mathbf{q}_J]$. In our work, we assume that the Q-matrix is unknown and recover it post hoc based on inference from the variational distributions of the item-effect parameters.
We introduce the latent discrete random variable $\mathbf{Z} \in \{0,1\}^{I \times T \times K}$, which represents the attribute mastery profiles of all respondents across time points and attributes. For each respondent $i \in [I]$, let $\mathbf{Z}_i \in \{0,1\}^{T \times K}$ denote their complete attribute profile, where each entry $z_{i,t,k} = 1$ indicates that respondent i has mastered attribute k at assessment point t, and $z_{i,t,k} = 0$ otherwise. We denote by $\mathbf{z}_{i,t} \in \{0,1\}^{K}$ the attribute mastery pattern of respondent i at assessment point t. At each time point t, there are $L := 2^K$ possible attribute mastery patterns. There exists a natural bijective mapping $v: \{0,1\}^K \to [L]$ that maps each attribute profile $\mathbf{z}_{i,t}$ to a unique latent class $c_{i,t} \in [L]$, such that $v(\mathbf{z}_{i,t}) = c_{i,t}$ and $v^{-1}(c_{i,t}) = \mathbf{z}_{i,t}$. With a slight abuse of notation, we omit the mapping v and simply denote the latent class as $z_{i,t} = c_{i,t} \in [L]$. Let $\boldsymbol{\Delta} = [\boldsymbol{\delta}_1, \ldots, \boldsymbol{\delta}_L] \in \mathbb{R}^{L \times (K+1)}$ be the design matrix for the item-effect parameters, where $\boldsymbol{\delta}_l \in \mathbb{R}^{K+1}$ is the l-th row of $\boldsymbol{\Delta}$. Each row is constructed as $\boldsymbol{\delta}_l = [1, v^{-1}(l)^\top]^\top \in \mathbb{R}^{K+1}$, where the first element is the intercept term and the remaining elements correspond to the attribute pattern. For example, when K = 2, we have
$$\boldsymbol{\Delta} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix}.$$
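As a minimal illustration of this construction, the sketch below enumerates all attribute patterns and stacks them into the design matrix. It assumes the latent classes are ordered in binary-counting order, consistent with the K = 2 example above; the function name is ours for illustration.

```python
import itertools
import numpy as np

def build_design_matrix(K):
    """Build the L x (K+1) design matrix Delta for a main-effects-only model.

    Assumption: latent classes are ordered by enumerating attribute patterns
    in binary-counting order (matching the K = 2 example in the text).
    """
    patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # L x K attribute patterns
    intercept = np.ones((patterns.shape[0], 1))
    return np.hstack([intercept, patterns])  # each row is delta_l = [1, v^{-1}(l)]

Delta = build_design_matrix(2)
# array([[1., 0., 0.],
#        [1., 0., 1.],
#        [1., 1., 0.],
#        [1., 1., 1.]])
```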
We denote the response matrix as $\mathbf{Y} \in \{0,1\}^{I \times J \times T}$, where $y_{i,j,t} = 1$ indicates that respondent i answered item j correctly at time point t, and $y_{i,j,t} = 0$ otherwise. In our model, for respondent i at time point t, the response to item j is modeled as a Bernoulli random variable within a mixture of generalized linear models (GLM) framework:
$$p(y_{i,j,t} = 1 \mid z_{i,t} = l, \boldsymbol{\beta}_j) = \Psi(\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l),$$
where $\Psi(\cdot)$ is the logistic link function, i.e., $\Psi(x) = \frac{1}{1 + \exp(-x)}$, and $\boldsymbol{\beta}_j \in \mathbb{R}^{K+1}$ is the item-effect parameter vector for item j. For each item j, we set the prior distribution of the item-effect parameter $\boldsymbol{\beta}_j$ as a multivariate normal distribution $\boldsymbol{\beta}_j \sim \mathcal{N}(\boldsymbol{\mu}_{\beta_j}^{(0)}, \boldsymbol{\Sigma}_{\beta_j}^{(0)})$, where $\boldsymbol{\mu}_{\beta_j}^{(0)} \in \mathbb{R}^{K+1}$ is the prior mean and $\boldsymbol{\Sigma}_{\beta_j}^{(0)} \in \mathbb{R}^{(K+1) \times (K+1)}$ is the prior covariance matrix. We denote the collection of item-effect parameters as $\boldsymbol{\beta} = [\boldsymbol{\beta}_1, \ldots, \boldsymbol{\beta}_J] \in \mathbb{R}^{J \times (K+1)}$, and the prior distribution of $\boldsymbol{\beta}$ can be written as
$$p(\boldsymbol{\beta}) = \prod_{j=1}^{J} p(\boldsymbol{\beta}_j) \propto \prod_{j=1}^{J} \exp\left\{ -\frac{1}{2} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) \right\}.$$
Assuming conditional independence of the responses $y_{i,j,t}$ given the attribute mastery patterns $z_{i,t}$ and item parameters $\boldsymbol{\beta}_j$, the joint likelihood of the observed data is
$$p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta}) = \prod_{i=1}^{I} \prod_{t=1}^{T} \prod_{j=1}^{J} p(y_{i,j,t} \mid z_{i,t}, \boldsymbol{\beta}_j) = \prod_{i=1}^{I} \prod_{t=1}^{T} \prod_{j=1}^{J} \prod_{l=1}^{L} \Psi\big( (2 y_{i,j,t} - 1)\, \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l \big)^{\mathbb{1}(z_{i,t} = l)}.$$
To model the longitudinal transitions of latent attribute mastery, we employ a hidden Markov model (HMM) structure. We assume that the transition probabilities between latent classes are time-homogeneous, meaning that they remain constant across time points. This is a common assumption in HMMs and simplifies the modeling process while still capturing the essential dynamics of the latent attribute mastery states. We denote the time-homogeneous transition matrix as $\mathbf{T} = [\tau_{l,l'}]_{l,l' \in [L]} \in [0,1]^{L \times L}$, where each entry $\tau_{l,l'}$ represents the probability of transitioning from latent class l at time point t to latent class l' at time point t + 1. The transition matrix satisfies the constraint $\sum_{l' \in [L]} \tau_{l,l'} = 1$ for all $l \in [L]$. For $t \geq 2$, the transition from the latent class at time point t − 1 to the latent class at time point t is modeled as a categorical distribution
$$p(z_{i,t} \mid z_{i,t-1}, \mathbf{T}) = \prod_{l=1}^{L} \prod_{l'=1}^{L} \tau_{l,l'}^{\mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l')}.$$
For t = 1, the initial latent class distribution is modeled as a categorical distribution with probabilities $\boldsymbol{\pi} = [\pi_1, \ldots, \pi_L]$, where $\sum_{l=1}^{L} \pi_l = 1$. We have
$$p(z_{i,1} \mid \boldsymbol{\pi}) = \prod_{l=1}^{L} \pi_l^{\mathbb{1}(z_{i,1} = l)}.$$
We set the prior distribution of each row vector of the transition matrix, $\boldsymbol{\tau}_l = [\tau_{l,1}, \ldots, \tau_{l,L}]$, as a Dirichlet distribution with concentration parameters $\boldsymbol{\omega}_l^{(0)} = [\omega_{l,1}^{(0)}, \ldots, \omega_{l,L}^{(0)}]$, i.e., $\boldsymbol{\tau}_l \sim \mathrm{Dir}(\boldsymbol{\omega}_l^{(0)})$. The prior distribution of the transition matrix $\mathbf{T}$ can be expressed as
$$p(\mathbf{T}) = \prod_{l=1}^{L} p(\boldsymbol{\tau}_l) \propto \prod_{l=1}^{L} \prod_{l'=1}^{L} \tau_{l,l'}^{\omega_{l,l'}^{(0)} - 1}.$$
Similarly, we set the prior distribution of the initial latent class distribution $\boldsymbol{\pi}$ as a Dirichlet distribution with concentration parameters $\boldsymbol{\alpha}^{(0)} = [\alpha_1^{(0)}, \ldots, \alpha_L^{(0)}]$, i.e., $\boldsymbol{\pi} \sim \mathrm{Dir}(\boldsymbol{\alpha}^{(0)})$. We have
$$p(\boldsymbol{\pi}) \propto \prod_{l=1}^{L} \pi_l^{\alpha_l^{(0)} - 1}.$$
We can now summarize our HM-LACDM as follows:
$$p(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}) = p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta})\, p(\mathbf{Z} \mid \mathbf{T}, \boldsymbol{\pi})\, p(\boldsymbol{\beta})\, p(\mathbf{T})\, p(\boldsymbol{\pi}) = p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta}) \left[ \prod_{i=1}^{I} p(z_{i,1} \mid \boldsymbol{\pi}) \right] \left[ \prod_{i=1}^{I} \prod_{t=2}^{T} p(z_{i,t} \mid z_{i,t-1}, \mathbf{T}) \right] p(\boldsymbol{\beta})\, p(\mathbf{T})\, p(\boldsymbol{\pi}),$$
where $p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta})$ is the likelihood of the observed data given the latent attribute mastery profiles and item parameters (Equation (3)); $p(\mathbf{Z} \mid \mathbf{T}, \boldsymbol{\pi})$ represents the likelihood of the latent attribute mastery profiles given the transition matrix and initial distribution (Equations (5) and (4)); $p(\boldsymbol{\beta})$ denotes the prior distribution of the item-effect parameters (Equation (2)); $p(\mathbf{T})$ specifies the prior distribution of the transition matrix (Equation (6)); and $p(\boldsymbol{\pi})$ is the prior distribution of the initial latent class distribution (Equation (7)). A more detailed description of the data likelihood can be found in Appendix A.

3. Variational Inference for HM-LACDM

Variational inference (VI) [35,36,37,38] is a powerful computational technique that transforms Bayesian posterior inference into an optimization problem. Given a probabilistic model with latent variables z and observed data x, Bayesian inference seeks to compute the posterior distribution $p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)}$, where $p(x)$ is the marginal likelihood of the data. However, $p(x)$ is often intractable to compute, especially for complex models with high-dimensional latent variables. Instead of computing the intractable posterior $p(z \mid x)$ directly, VI approximates it with a simpler distribution $q(z)$ by minimizing the Kullback–Leibler (KL) divergence between $q(z)$ and the true posterior $p(z \mid x)$. Equivalently, VI maximizes the evidence lower bound (ELBO), defined as
$$\mathrm{ELBO}(q(z)) = \mathbb{E}_q[\log p(x, z)] - \mathbb{E}_q[\log q(z)],$$
where $p(x, z)$ is the joint distribution of the observed data and latent variables, and $q(z)$ is the variational distribution. Since maximizing the ELBO is equivalent to minimizing the KL divergence to the true posterior, this approach provides a principled framework for approximate Bayesian inference that is both computationally efficient and scalable to large datasets. For our HM-LACDM, we employ coordinate ascent variational inference (CAVI) with a mean-field factorization, which assumes the variational posterior factorizes as $q(z) = \prod_{i=1}^{n} q(z_i)$. This enables iterative optimization of each component while keeping the others fixed. Additionally, we handle the logistic link function using the Jaakkola–Jordan lower bound approximation [43], which transforms the non-conjugate logistic likelihood into a tractable quadratic form suitable for Gaussian variational approximations. A more detailed introduction to variational inference can be found in Appendix B.1.
Leveraging the mean-field variational inference framework, we approximate the posterior distribution of the model parameters via a factorized distribution:
$$p(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi} \mid \mathbf{Y}) \approx q(\mathbf{Z})\, q(\boldsymbol{\beta})\, q(\mathbf{T})\, q(\boldsymbol{\pi}) = \prod_{i=1}^{I} q(\mathbf{Z}_i) \prod_{j=1}^{J} q(\boldsymbol{\beta}_j) \prod_{l=1}^{L} q(\boldsymbol{\tau}_l)\, q(\boldsymbol{\pi}).$$
In this section, we present the variational posterior distributions and their iterative updates based on coordinate ascent variational inference (CAVI). Detailed derivations of the variational updates can be found in the Appendix B.
First, the variational posterior distribution of the item-effect parameters $\boldsymbol{\beta}_j$ is assumed to follow a multivariate normal distribution $q(\boldsymbol{\beta}_j) = \mathcal{N}(\boldsymbol{\beta}_j \mid \boldsymbol{\mu}_{\beta_j}^{(*)}, \boldsymbol{\Sigma}_{\beta_j}^{(*)})$, where $\boldsymbol{\mu}_{\beta_j}^{(*)} \in \mathbb{R}^{K+1}$ is the variational mean and $\boldsymbol{\Sigma}_{\beta_j}^{(*)} \in \mathbb{R}^{(K+1) \times (K+1)}$ is the variational covariance matrix. The iterative updates are
$$\boldsymbol{\Sigma}_{\beta_j}^{(*)} = \left[ \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} + 2 \sum_{i=1}^{I} \sum_{l=1}^{L} \sum_{t=1}^{T} \mathbb{E}_q\big[\mathbb{1}(z_{i,t} = l)\big]\, h(\xi_{j,l})\, \boldsymbol{\delta}_l \boldsymbol{\delta}_l^\top \right]^{-1},$$
$$\boldsymbol{\mu}_{\beta_j}^{(*)} = \boldsymbol{\Sigma}_{\beta_j}^{(*)} \left[ \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \boldsymbol{\mu}_{\beta_j}^{(0)} + \sum_{i=1}^{I} \sum_{l=1}^{L} \sum_{t=1}^{T} \mathbb{E}_q\big[\mathbb{1}(z_{i,t} = l)\big] \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\delta}_l \right],$$
where $\xi_{j,l}$ is the variational parameter associated with item j and latent class l, and $h(\xi_{j,l}) = \frac{\tanh(\xi_{j,l}/2)}{4 \xi_{j,l}}$. We use $\boldsymbol{\Xi}$ to denote the collection of all variational parameters $\{\xi_{j,l}\}_{j \in [J],\, l \in [L]}$.
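To make this closed-form update concrete, the following sketch computes the variational covariance and mean for a single item, given the current responsibilities and variational parameters. The array names and the helper function are ours for illustration; they are not taken from the paper's implementation.

```python
import numpy as np

def h(xi):
    # Jaakkola-Jordan auxiliary function h(xi) = tanh(xi / 2) / (4 * xi)
    return np.tanh(xi / 2.0) / (4.0 * xi)

def update_beta_j(y_j, resp, Delta, xi_j, mu0, Sigma0):
    """One CAVI update of the Gaussian variational factor for item j.

    y_j   : (I, T) binary responses to item j
    resp  : (I, T, L) responsibilities E_q[1(z_{i,t} = l)]
    Delta : (L, K+1) design matrix
    xi_j  : (L,) variational parameters xi_{j,l}
    mu0, Sigma0 : prior mean (K+1,) and covariance (K+1, K+1)
    """
    n_l = resp.sum(axis=(0, 1))                      # expected class counts over i, t
    s_l = np.einsum('itl,it->l', resp, y_j - 0.5)    # weighted (y - 1/2) sums per class

    prec0 = np.linalg.inv(Sigma0)
    precision = prec0 + 2.0 * np.einsum('l,lk,lm->km', n_l * h(xi_j), Delta, Delta)
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ (prec0 @ mu0 + Delta.T @ s_l)
    return mu, Sigma
```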
The variational posterior distribution of each row vector of the transition matrix, $\boldsymbol{\tau}_l$, is assumed to follow a Dirichlet distribution, i.e., $q(\boldsymbol{\tau}_l) = \mathrm{Dir}(\boldsymbol{\tau}_l \mid \boldsymbol{\omega}_l^{(*)})$, where $\boldsymbol{\omega}_l^{(*)} \in \mathbb{R}^{L}$ is the variational concentration parameter vector for the l-th row of the transition matrix. The iterative updates are
$$\omega_{l l'}^{(*)} = \omega_{l l'}^{(0)} + \sum_{i=1}^{I} \sum_{t=2}^{T} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big].$$
Similarly, the variational posterior distribution of the initial latent class distribution $\boldsymbol{\pi}$ is assumed to follow a Dirichlet distribution, i.e., $q(\boldsymbol{\pi}) = \mathrm{Dir}(\boldsymbol{\pi} \mid \boldsymbol{\alpha}^{(*)})$, where $\boldsymbol{\alpha}^{(*)} \in \mathbb{R}^{L}$ is the variational concentration parameter vector for the initial latent class distribution. The iterative updates are
$$\alpha_l^{(*)} = \alpha_l^{(0)} + \sum_{i=1}^{I} \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big].$$
To update these variational posteriors, the expectations of the latent attribute mastery patterns, $\mathbb{E}_q[\mathbb{1}(z_{i,t} = l)]$ and $\mathbb{E}_q[\mathbb{1}(z_{i,t-1} = l, z_{i,t} = l')]$, are required. For all $i \in [I]$ and $l, l' \in [L]$, these expectations can be computed using the following equations:
$$\mathbb{E}_q\big[ \mathbb{1}(z_{i,t} = l) \big] = \frac{f_{i,t}(l)\, b_{i,t}(l)}{\sum_{l'} f_{i,t}(l')\, b_{i,t}(l')}, \quad t \in [T],$$
$$\mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] = \frac{f_{i,t-1}(l)\, b_{i,t}(l')\, \eta_{l l'}\, \phi_{i,t,l'}}{\sum_{m, m'} f_{i,t-1}(m)\, b_{i,t}(m')\, \eta_{m m'}\, \phi_{i,t,m'}}, \quad t \geq 2.$$
In the above equations, $f_{i,t}(l)$ and $b_{i,t}(l)$ are the forward and backward probabilities, respectively, which can be computed recursively using the forward–backward algorithm, also known as the Baum–Welch algorithm in the context of HMMs [44,45,46]. For all $i \in [I]$, $t \in [T]$, and $l \in [L]$, $f_{i,t}(l)$ and $b_{i,t}(l)$ are defined as follows:
$$f_{i,1}(l) = \kappa_l\, \phi_{i,1,l}, \qquad f_{i,t}(l) = \phi_{i,t,l} \sum_{l'} f_{i,t-1}(l')\, \eta_{l' l}, \quad t \geq 2,$$
$$b_{i,T}(l) = 1, \qquad b_{i,t}(l) = \sum_{l'} b_{i,t+1}(l')\, \eta_{l l'}\, \phi_{i,t+1,l'}, \quad t \leq T - 1,$$
where $\eta_{l l'} = \exp\big( \mathbb{E}_q[\log \tau_{l l'}] \big)$, $\kappa_l = \exp\big( \mathbb{E}_q[\log \pi_l] \big)$, and
$$\phi_{i,t,l} = \exp\left( \sum_{j=1}^{J} \mathbb{E}_q\left[ \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 - \xi_{j,l}^2 \Big) \right] \right).$$
During each iteration of the variational inference algorithm, we update the variational parameters $\boldsymbol{\Xi} = \{\xi_{j,l}\}_{j \in [J],\, l \in [L]}$ using
$$\xi_{j,l} = \Big( \mathbb{E}_q\big[ (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 \big] \Big)^{1/2} = \Big( \boldsymbol{\delta}_l^\top\, \mathbb{E}_q\big[ \boldsymbol{\beta}_j \boldsymbol{\beta}_j^\top \big]\, \boldsymbol{\delta}_l \Big)^{1/2}.$$
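The forward–backward pass that produces these expectations can be sketched for a single respondent as follows. This is an unnormalized version without the per-step scaling usually added to avoid underflow for long sequences, and the function and variable names are ours for illustration.

```python
import numpy as np

def forward_backward(phi, kappa, eta):
    """Forward-backward pass for one respondent.

    phi   : (T, L) emission-like terms phi_{i,t,l}
    kappa : (L,)   exp(E_q[log pi_l])
    eta   : (L, L) exp(E_q[log tau_{l,l'}])
    Returns single-time marginals E_q[1(z_t = l)] and pairwise marginals
    E_q[1(z_{t-1} = l, z_t = l')].
    """
    T, L = phi.shape
    f = np.zeros((T, L))
    b = np.ones((T, L))
    f[0] = kappa * phi[0]
    for t in range(1, T):
        f[t] = phi[t] * (f[t - 1] @ eta)        # f_t(l') = phi_t(l') sum_l f_{t-1}(l) eta_{l,l'}
    for t in range(T - 2, -1, -1):
        b[t] = eta @ (b[t + 1] * phi[t + 1])    # b_t(l) = sum_l' eta_{l,l'} phi_{t+1}(l') b_{t+1}(l')
    gamma = f * b
    gamma /= gamma.sum(axis=1, keepdims=True)   # single-time marginals
    xi = np.zeros((T - 1, L, L))
    for t in range(1, T):
        m = f[t - 1][:, None] * eta * (phi[t] * b[t])[None, :]
        xi[t - 1] = m / m.sum()                 # pairwise marginals for (t-1, t)
    return gamma, xi
```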
We summarize our variational inference algorithm for the HM-LACDM as follows (Algorithm 1):
Algorithm 1 Variational Inference for HM-LACDM
1: Input: Response data $\mathbf{Y} \in \{0,1\}^{I \times J \times T}$; prior parameters $\boldsymbol{\mu}_{\beta_j}^{(0)}, \boldsymbol{\Sigma}_{\beta_j}^{(0)}, \boldsymbol{\omega}_l^{(0)}, \boldsymbol{\alpha}^{(0)}$ for all $j \in [J]$, $l \in [L]$.
2: Initialize: Set $\boldsymbol{\mu}_{\beta_j}^{(*)} = \boldsymbol{\mu}_{\beta_j}^{(0)}$, $\boldsymbol{\Sigma}_{\beta_j}^{(*)} = \boldsymbol{\Sigma}_{\beta_j}^{(0)}$, $\boldsymbol{\omega}_l^{(*)} = \boldsymbol{\omega}_l^{(0)}$, $\boldsymbol{\alpha}^{(*)} = \boldsymbol{\alpha}^{(0)}$ for all $j \in [J]$, $l \in [L]$.
3: Repeat until convergence:
   1. Update $\mathbb{E}_q[\mathbb{1}(z_{i,t} = l)]$ and $\mathbb{E}_q[\mathbb{1}(z_{i,t-1} = l, z_{i,t} = l')]$ (Equation (13)).
   2. Update $\boldsymbol{\mu}_{\beta_j}^{(*)}$, $\boldsymbol{\Sigma}_{\beta_j}^{(*)}$, $\boldsymbol{\omega}_l^{(*)}$, and $\boldsymbol{\alpha}^{(*)}$ (Equations (10)–(12)).
   3. Update the variational parameters $\xi_{j,l}$ (Equation (15)).
4: Output: Variational posterior distributions $q(\mathbf{Z})$, $q(\boldsymbol{\beta})$, $q(\mathbf{T})$, and $q(\boldsymbol{\pi})$.
We include the details of the derivation of the variational updates in Appendix B.2. We also provide the explicit form of the ELBO in Appendix B.3.

4. Post Hoc Q-Matrix Recovery

Although our HM-LACDM does not require a pre-specified Q-matrix, recovering the Q-matrix is still valuable for interpretability and diagnostic purposes. We propose a post hoc Q-matrix recovery approach that leverages the estimated variational posterior distributions of the item-effect parameters. For item j, the item-effect parameter can be expressed as $\boldsymbol{\beta}_j = [\beta_{j0}, \beta_{j1}, \ldots, \beta_{jK}]$, where $\beta_{j0}$ is the intercept term and $\beta_{jk}$ is the effect of attribute k on item j. If attribute k is not required for item j, then its corresponding effect parameter $\beta_{jk}$ should be zero. Based on this fact, we can perform hypothesis testing to determine whether each attribute k is required for item j based on the estimated posterior distribution of $\beta_{jk}$. We formulate the hypothesis testing problem as
$$H_0: \beta_{jk} = 0 \quad \text{vs.} \quad H_a: \beta_{jk} > 0.$$
We then estimate the Q-matrix entry $\hat{q}_{jk}$ using the decision rule
$$\hat{q}_{jk} = \begin{cases} 1 & \text{if } z_{jk} > z_{1-\alpha}, \\ 0 & \text{otherwise}, \end{cases}$$
where $z_{jk} = \mu_{jk} / \sigma_{jk}$ is the test statistic for the hypothesis test of $\beta_{jk}$, $\mu_{jk}$ and $\sigma_{jk}$ are the posterior mean and standard deviation of $\beta_{jk}$, respectively, and $z_{1-\alpha}$ is the critical value from the standard normal distribution corresponding to the significance level $\alpha$.
However, because we are testing multiple hypotheses simultaneously (one for each combination of item j and attribute k), we need to control the false discovery rate (FDR). We employ the Benjamini–Hochberg (BH) procedure [47] to control the FDR at level $\alpha$. The BH procedure begins by computing the test statistic $z_{jk} = \mu_{jk} / \sigma_{jk}$ for all $(j, k)$ pairs with $j \in [J]$ and $k \in [K]$, followed by calculating the corresponding p-values $p_{jk} = 1 - \Phi(z_{jk})$, where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. The p-values are then sorted in ascending order as $p_{(1)} \leq p_{(2)} \leq \cdots \leq p_{(JK)}$. We find the largest index $\ell$ such that $p_{(\ell)} \leq \frac{\ell \alpha}{JK}$ and reject all hypotheses corresponding to $p_{(1)}, \ldots, p_{(\ell)}$, setting the corresponding Q-matrix entries to 1.
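A minimal sketch of this recovery rule, assuming the posterior means and standard deviations of the main-effect parameters have been collected into J × K arrays from the variational fit, is given below; the function name and array layout are illustrative.

```python
import numpy as np
from scipy.stats import norm

def recover_q_matrix(mu, sigma, alpha=0.05):
    """Post hoc Q-matrix recovery via one-sided tests with Benjamini-Hochberg control.

    mu, sigma : (J, K) posterior means and standard deviations of beta_{jk}
    Returns a (J, K) binary matrix with entries set to 1 for rejected nulls.
    """
    J, K = mu.shape
    z = mu / sigma                           # test statistics z_{jk}
    pvals = (1.0 - norm.cdf(z)).ravel()      # one-sided p-values p_{jk}
    order = np.argsort(pvals)
    m = pvals.size                           # m = J * K hypotheses
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    q_hat = np.zeros(m, dtype=int)
    if passed.any():
        cutoff = np.max(np.where(passed)[0])  # largest index with p_(l) <= l * alpha / m
        q_hat[order[:cutoff + 1]] = 1         # reject all hypotheses up to the cutoff
    return q_hat.reshape(J, K)
```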
Although our approach is post hoc, it is designed to recover the Q-matrix if the ground-truth Q-matrix satisfies certain identifiability conditions. A Q-matrix is said to be complete if it allows for the identification of all the possible attribute profiles of the $L = 2^K$ different proficiency classes. Köhn and Chiu [48,49] proved that a necessary condition for a J × K Q-matrix of a main-effects-only model to be complete is that the matrix has full column rank ($\mathrm{rank}(\mathbf{Q}) = K$), i.e., the columns of the Q-matrix must be linearly independent. Therefore, if the ground-truth Q-matrix is complete, our post hoc recovery procedure should recover the Q-matrix with high probability as long as the sample size is sufficiently large. Our approach provides a principled method for Q-matrix recovery that accounts for multiple testing while maintaining strong statistical properties. The recovered Q-matrix can then be used for diagnostic interpretation and validation of the cognitive structure underlying the assessment.

5. Simulation Study

5.1. Single Run

In this section, we present simulation results from a single run of the HM-LACDM with 100 CAVI iterations. For this simulation, we generated synthetic datasets with I = 500 individuals across T = 2 time points, using a ground-truth Q-matrix with K = 3 attributes and J = 21 items (see Appendix C). The data-generation parameters were configured as follows: we set the intercept parameter to $\beta_{j0} = -3$ for all items $j \in [J]$. For items requiring n attributes, we set all main-effect parameters to $\beta_{jk} = 6/n$ for every attribute k required by item j, ensuring that the total effect remains constant across items with different complexity levels. The initial state distribution of the latent attribute mastery profiles was generated as $\boldsymbol{\pi} = \mathrm{softmax}(\mathbf{x})$, where $\mathbf{x}$ was sampled from an L-dimensional isotropic Gaussian distribution with $L = 2^K = 8$ possible latent classes, and $\mathrm{softmax}(\mathbf{x})_i = \frac{\exp(x_i)}{\sum_{j=1}^{L} \exp(x_j)}$ for $i \in [L]$. For the transition dynamics, we modeled attribute-specific transition probabilities, where each attribute could independently transition between mastery states. Specifically, for each attribute $k \in [K]$, we set $p(z_{i,t,k} = 1 \mid z_{i,t-1,k} = 1) = 0.8$ (mastery retention probability), $p(z_{i,t,k} = 0 \mid z_{i,t-1,k} = 1) = 0.2$ (forgetting probability), $p(z_{i,t,k} = 1 \mid z_{i,t-1,k} = 0) = 0.3$ (learning probability), and $p(z_{i,t,k} = 0 \mid z_{i,t-1,k} = 0) = 0.7$ (non-mastery retention probability) for all time points $t \in [T]$ and individuals $i \in [I]$. The transition matrix $\mathbf{T} = [\tau_{l,l'}]_{l,l' \in [L]}$ was constructed by assuming independence across attributes:
$$\tau_{l,l'} = \prod_{k=1}^{K} p(z'_k = l'_k \mid z_k = l_k),$$
where $l_k$ and $l'_k$ are the k-th bits of the binary representations of latent classes l and l'.
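The class-level transition matrix implied by these attribute-level probabilities can be assembled as in the sketch below. It assumes the same binary-counting latent-class ordering used for the design matrix, which is our convention for illustration rather than a detail specified in the paper.

```python
import itertools
import numpy as np

def build_transition_matrix(attr_trans):
    """Construct the 2^K x 2^K class-level transition matrix from
    attribute-level transition probabilities, assuming independence across attributes.

    attr_trans : (K, 2, 2) array where attr_trans[k, a, b] is the probability
                 that attribute k moves from state a to state b between time points.
    """
    K = attr_trans.shape[0]
    patterns = list(itertools.product([0, 1], repeat=K))
    L = len(patterns)
    T = np.ones((L, L))
    for l, from_pat in enumerate(patterns):
        for lp, to_pat in enumerate(patterns):
            for k in range(K):
                T[l, lp] *= attr_trans[k, from_pat[k], to_pat[k]]
    return T

# Attribute-level probabilities from the single-run simulation:
# stay non-mastered 0.7, learn 0.3, forget 0.2, retain mastery 0.8 (same for all attributes).
attr_trans = np.tile(np.array([[0.7, 0.3], [0.2, 0.8]]), (3, 1, 1))
T = build_transition_matrix(attr_trans)
assert np.allclose(T.sum(axis=1), 1.0)  # each row is a valid categorical distribution
```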
We display trace plots of the evidence lower bound (ELBO) and model parameters to demonstrate convergence behavior during the iterative optimization process. Additionally, we provide parameter recovery plots that compare the estimated parameters against the true values used in data generation, illustrating the accuracy of our proposed method.
Figure 1a shows the trace of the ELBO during the CAVI iterations. We only show the first 30 iterations since the ELBO converges quickly and stabilizes after that. We can see that the ELBO increases monotonically, corresponding to the optimization process of CAVI.
Figure 1b displays the trace plots of item-effect parameters β j for all items j [ J ] . Different colors represent different types of parameters, including intercepts (red) and main effects of different magnitudes (orange, purple, and green), allowing us to visualize the convergence behavior of each parameter type. The blue lines represent the parameters that are not required by the items, which should converge to zero. We can see that all parameters converge after around 10 iterations, indicating that the CAVI algorithm has efficiently optimized the model parameters.
Figure 2 displays the parameter recovery results for the item-effect parameters β j , the initial latent class distribution π , and the transition matrix T . The x-axis represents the true values used in data generation, while the y-axis shows the estimated values obtained through CAVI optimization. The dashed diagonal line y = x represents perfect recovery, where estimated values exactly match true values. As evident from the plots, the estimated item-effect parameters β j , π , and transition matrix T closely align with the diagonal line, demonstrating that our HM-LACDM effectively recovers the true parameters. At the first time point ( t = 1 ), the profile prediction accuracy was 99.2%, while at the second time point ( t = 2 ) it was 99.6%. When aggregated across all the time points, the overall profile prediction accuracy was 98.8%.
We also conducted post hoc Q-matrix recovery using the estimated variational posterior distributions of the item-effect parameters. With a significance level of α = 0.05 for hypothesis testing, we achieved 100% accuracy in Q-matrix recovery, successfully identifying all entries in the ground-truth Q-matrix. It is important to note that our model faces an inherent identifiability issue. We can only identify the Q-matrix up to a permutation of the attributes, which is a common limitation in latent variable models with discrete latent classes. We would also like to highlight the computational efficiency of our approach. Leveraging the torch back-end in R, our algorithm completed all 100 CAVI iterations in just 4.51 s on an Apple M1 Max MacBook Pro with 32 GB of RAM. This demonstrates the practical feasibility of our method for large-scale applications. While we set the maximum iterations to 100, the trace plots show that convergence typically occurs within 30 iterations, suggesting potential for even greater computational efficiency in practice.

5.2. Multiple Runs

In this section, we present results from multiple runs of the HM-LACDM to evaluate its robustness and performance across varying sample sizes and attribute settings. For each configuration, we generated n = 100 independent datasets and applied our algorithm to each dataset, reporting the average performance metrics with their standard errors across all runs. We varied the sample size I (200 and 1000), the number of attributes K (3 and 4), and the number of time points T (2, 3, and 5). We additionally varied the item parameters to evaluate the performance of our algorithm under different levels of signal strength. Specifically, we considered three conditions: strong, moderate, and weak, corresponding to p = 0.95 , 0.90 , 0.80 , respectively, where p denotes the probability that a respondent answers an item correctly given mastery of all required attributes. Our definitions of strong–weak signals align with the “high-” and “low-quality items” used by Yamaguchi and Martinez [41], who similarly manipulated item response probabilities to reflect varying item quality while preserving comparable cognitive diagnostic characteristics. The parameters π and T are set to the same values as in the single-run simulation.
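To make the signal-strength conditions concrete, the sketch below shows one plausible way to translate the correct-response probability p into intercept and main-effect values for the additive model. The symmetric mapping used here is our assumption, not a specification from the paper, although it is consistent with the intercept of −3 and total main effect of 6 used in the single-run study when p = 0.95.

```python
import numpy as np

def item_params_from_signal(p, n_required):
    """Hypothetical mapping from signal strength p to item parameters.

    Assumes a symmetric design: a respondent mastering all required attributes
    answers correctly with probability p, and one mastering none answers
    correctly with probability 1 - p.
    """
    logit = np.log(p / (1.0 - p))
    intercept = -logit                       # beta_{j0}
    main_effect = 2.0 * logit / n_required   # beta_{jk} for each required attribute
    return intercept, main_effect

# p = 0.95 gives approximately (-2.94, 5.89 / n), close to the (-3, 6 / n)
# configuration of the single-run simulation.
```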
Table 1 summarizes the root mean square error (RMSE) of the estimated parameters across different configurations. The RMSE values are reported as the mean (standard error) over n = 100 independent runs. We observe that as the sample size increases from 200 to 1000, the RMSE of the estimated parameters decreases significantly, indicating consistency of our CAVI algorithm.
Figure 3 shows the distributions of RMSE for item-effect parameters β j , initial latent class distribution π , and transition matrix T under the configuration I = 200 , K = 3 , and T = 5 . We can see that the distributions of RMSE are unimodal and centered around the mean RMSE values reported in Table 1.
Table 2 summarizes the Q-matrix recovery performance across different configurations. We report the false positive rate (FPR), false negative rate (FNR), and accuracy of the recovered Q-matrix entries. For each run, we set the significance level α to 0.05 and 0.01 , and report the results for both levels. We observe that the FPR remains low across all configurations, indicating that our method effectively controls the false discovery rate. The FNR is 0 for all configurations, demonstrating that our method successfully identifies all required attributes for each item. The overall accuracy of the recovered Q-matrix is high, ranging from 96.12 % to 99.59 % , indicating that our method is reliable for Q-matrix recovery.
The tables above present the results for the strong signal condition. Results for the moderate and weak signal conditions are provided in Appendix D.2; see Tables A3–A6. Notably, the findings under the weak signal condition are qualitatively similar, further demonstrating the robustness and consistency of our method.
Finally, we highlight the computational efficiency of our approach. For a large configuration with I = 1000, K = 4, and T = 5, the average runtime per replication was 5.30 s, while for a small configuration with I = 200, K = 3, and T = 2, it was 3.08 s. These results underscore the scalability of our variational inference approach, demonstrating its ability to handle increasingly complex models and larger sample sizes with only modest additional computational cost. All simulations were conducted on an Apple M1 Max MacBook Pro with 32 GB of RAM.

6. Discussion

In this paper, we introduced a novel hidden Markov log-linear additive cognitive diagnostic model (HM-LACDM) that circumvents the necessity of specifying a predefined Q-matrix, thus addressing a critical limitation of traditional cognitive diagnostic models (CDMs). By integrating a variational Bayesian inference framework, our method significantly reduces computational complexity, making it particularly suitable for large-scale longitudinal cognitive assessments involving numerous attributes and multiple time points.
The simulation studies validate the robustness and efficiency of our proposed approach. Our variational inference algorithm consistently demonstrates convergence, parameter recovery accuracy, and computational efficiency across varying configurations of sample size, attribute complexity, and temporal length. Notably, the post hoc Q-matrix recovery method exhibited remarkable accuracy, effectively controlling the false discovery rate and reliably identifying the cognitive attributes underlying each test item.
Our approach’s scalability and computational efficiency make it highly suitable for practical applications in educational assessments, where swift and accurate diagnostics are crucial. It provides educators and psychometricians with an effective tool for longitudinally tracking student learning progression without the intensive and subjective Q-matrix specification process.
In summary, the proposed HM-LACDM and the accompanying variational Bayesian inference method represent substantial advances in the field of cognitive diagnostic modeling. They not only facilitate more robust and scalable inference but also significantly enhance interpretability through effective Q-matrix recovery, thereby opening new avenues for targeted instructional intervention and deeper insights into cognitive skill development.
Despite the advantages of our proposed framework, there are important limitations to acknowledge. First, the recovered Q-matrix is identifiable only up to permutation of the attributes, a common issue in latent-class models without additional constraints. In our simulation studies, we address this by aligning the estimated attributes with the ground truth through enumeration of all possible label permutations. This ensures that the reported recovery accuracy and simulation results are not confounded by label switching. For empirical data, when the Q-matrix is not fully available, partial domain knowledge can be used to anchor certain attributes, and ordering constraints can be imposed to break label symmetry. These strategies mitigate the relabeling issue and represent promising directions for extending our framework to empirical applications. Second, unlike a saturated CDM, our specification includes only intercept and main-effect parameters, excluding higher-order interaction terms. We do not impose explicit constraints on these parameters during estimation; instead, we assume that the monotonicity condition holds for the true model, ensuring that the data themselves satisfy this property. In practice, we guide the algorithm toward monotone solutions by initializing intercepts with negative values and main effects with positive values, and by running the algorithm multiple times with different random initializations to select the best-fitting solution. Under this setting, the one-sided test with positive effects used in our post hoc Q-matrix recovery is appropriate, as the parameters of substantive interest are non-negative by construction.
Future research could extend the framework in several directions: incorporating interaction terms to approximate saturated CDMs, relaxing the monotonicity assumption, or integrating informative priors to further improve identifiability. Importantly, while our simulation studies demonstrate robustness and scalability, applying the method to large-scale empirical assessment data remains a critical next step. Such real-data validation would allow us to evaluate the practical utility of the HM-LACDM in capturing nuanced learning dynamics and further establish its value in operational testing contexts.

Author Contributions

Conceptualization, H.D., J.T., M.J., M.J.M. and M.C.; methodology, H.D., J.T. and M.J.; software, H.D. and J.T.; validation, J.T., M.J.M. and M.C.; formal analysis, H.D. and J.T.; investigation, M.J.; writing—original draft preparation, H.D.; writing—review and editing, H.D., J.T., M.J., M.J.M. and M.C.; visualization, H.D. and J.T.; supervision, M.J., M.J.M. and M.C.; project administration, M.J., M.J.M. and M.C.; funding acquisition, M.J., M.J.M. and M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Institute of Education Sciences, grant R305D220020.

Data Availability Statement

The code and a simulated dataset used in the study are available at https://github.com/edwardduanhao/HM-LACDM (accessed on 20 October 2025).

Conflicts of Interest

Author James Tang was employed by the Walt Disney Company. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
KL	Kullback–Leibler divergence
VI	Variational inference
CDM	Cognitive diagnosis model
DCM	Diagnostic classification model
FDR	False discovery rate
HMM	Hidden Markov model
IRT	Item response theory
CAVI	Coordinate ascent variational inference
ELBO	Evidence lower bound
HM-LACDM	Hidden Markov log-linear additive cognitive diagnostic model

Appendix A. Complete Data Likelihood of the HM-LACDM

In this section, we provide the complete data likelihood of the HM-LACDM, which is defined as the joint distribution of the response data Y, latent variables Z , item-effect parameters β , transition matrix T , and initial state distribution π . The complete data likelihood can be expressed as follows:
$$\begin{aligned}
p(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}) &= p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta})\, p(\mathbf{Z} \mid \mathbf{T}, \boldsymbol{\pi})\, p(\boldsymbol{\beta})\, p(\mathbf{T})\, p(\boldsymbol{\pi}) \\
&= p(\mathbf{Y} \mid \mathbf{Z}, \boldsymbol{\beta}) \prod_{i=1}^{I} p(z_{i,1} \mid \boldsymbol{\pi}) \prod_{i=1}^{I} \prod_{t=2}^{T} p(z_{i,t} \mid z_{i,t-1}, \mathbf{T})\, p(\boldsymbol{\beta})\, p(\mathbf{T})\, p(\boldsymbol{\pi}) \\
&= \prod_{i=1}^{I} \prod_{j=1}^{J} \prod_{t=1}^{T} p(y_{i,j,t} \mid z_{i,t}, \boldsymbol{\beta}_j) \prod_{i=1}^{I} p(z_{i,1} \mid \boldsymbol{\pi}) \prod_{i=1}^{I} \prod_{t=2}^{T} p(z_{i,t} \mid z_{i,t-1}, \mathbf{T}) \prod_{j=1}^{J} p(\boldsymbol{\beta}_j)\, p(\mathbf{T})\, p(\boldsymbol{\pi}) \\
&\propto \prod_{i=1}^{I} \prod_{t=1}^{T} \prod_{j=1}^{J} \prod_{l=1}^{L} \Psi\big( (2 y_{i,j,t} - 1)\, \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l \big)^{\mathbb{1}(z_{i,t} = l)} \prod_{i=1}^{I} \prod_{l=1}^{L} \pi_l^{\mathbb{1}(z_{i,1} = l)} \\
&\quad \times \prod_{i=1}^{I} \prod_{t=2}^{T} \prod_{l=1}^{L} \prod_{l'=1}^{L} \tau_{l l'}^{\mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l')} \prod_{j=1}^{J} \exp\left\{ -\frac{1}{2} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) \right\} \\
&\quad \times \prod_{l=1}^{L} \prod_{l'=1}^{L} \tau_{l l'}^{\omega_{l l'}^{(0)} - 1} \prod_{l=1}^{L} \pi_l^{\alpha_l^{(0)} - 1}.
\end{aligned}$$

Appendix B. Variational Inference for HM-LACDM

Appendix B.1. Background on Variational Inference

Variational inference addresses the fundamental challenge in Bayesian analysis: computing the posterior distribution $p(z \mid x) = \frac{p(x, z)}{p(x)}$ when the marginal likelihood $p(x) = \int p(x, z)\, dz$ is intractable. VI approximates the true posterior with a distribution $q(z)$ from a parameterized family by minimizing the Kullback–Leibler (KL) divergence:
$$\mathrm{KL}\big( q(z)\, \|\, p(z \mid x) \big) = \mathbb{E}_{q(z)}\left[ \log \frac{q(z)}{p(z \mid x)} \right].$$
Since direct minimization is infeasible due to the unknown normalizing constant, we reformulate the problem by maximizing the evidence lower bound (ELBO):
$$\mathrm{ELBO}(q(z)) = \mathbb{E}_{q(z)}\left[ \log \frac{p(x, z)}{q(z)} \right] = \mathbb{E}_{q(z)}[\log p(x, z)] - \mathbb{E}_{q(z)}[\log q(z)].$$
The relationship between the ELBO and the KL divergence can be established as follows:
$$\mathrm{ELBO}(q(z)) + \mathrm{KL}\big( q(z)\, \|\, p(z \mid x) \big) = \mathbb{E}_{q(z)}\left[ \log \frac{p(x, z)}{q(z)} + \log \frac{q(z)}{p(z \mid x)} \right] = \mathbb{E}_{q(z)}[\log p(x)] = \log p(x).$$
Since p ( x ) is constant with respect to q ( z ) , maximizing the ELBO is equivalent to minimizing the KL divergence, providing a principled and computationally tractable approach to approximate Bayesian inference.
To facilitate optimization, we typically assume a factorized form for the variational posterior distribution, $q(z) = \prod_{i=1}^{n} q(z_i)$, where each factor $q(z_i)$ belongs to a simple parametric family. This assumption, while restrictive, dramatically simplifies the optimization landscape and enables coordinate-wise updates for each $q(z_i)$, leading to efficient algorithms such as coordinate ascent variational inference (CAVI). CAVI iteratively optimizes each variational factor while holding the others fixed. For updating the i-th component $q(z_i)$, we write the ELBO as
$$\mathrm{ELBO}(q(z_i)) = \mathbb{E}_{q(z_i)}\Big[ \mathbb{E}_{q(z_{-i})}[\log p(x, z)] \Big] - \mathbb{E}_{q(z_i)}[\log q(z_i)] + \mathrm{const},$$
where $z_{-i}$ denotes all latent variables except $z_i$. Taking the functional derivative with respect to $q(z_i)$ and setting it to zero yields the optimal update:
$$q^*(z_i) \propto \exp\Big\{ \mathbb{E}_{q(z_{-i})}[\log p(x, z)] \Big\}.$$
This update rule forms the core of the CAVI algorithm, which monotonically increases the ELBO and converges to a local optimum. Plummer et al. [50] studied the dynamics of CAVI in the context of 2D Ising models empirically, demonstrating that CAVI can converge to global optima under certain conditions. Recent theoretical work by Bhattacharya et al. [51] has established conditions for both global and local convergence guarantees.

Appendix B.2. Derivation of Variational Posterior Distributions

Variational inference encounters significant computational challenges when dealing with logistic link functions due to their non-conjugate nature, which renders exact posterior inference intractable. Jaakkola and Jordan [43] addressed this limitation by developing a convex lower bound for the logistic function, commonly referred to as the Jaakkola–Jordan bound, which enables tractable variational updates. For the logistic function $\Psi(x) = (1 + e^{-x})^{-1}$, they established the following tight lower bound using an auxiliary variational parameter $\xi$:
$$\Psi(x) \geq \Psi(\xi) \exp\left\{ \frac{x - \xi}{2} - h(\xi)\, (x^2 - \xi^2) \right\},$$
where $h(\xi) = \frac{\tanh(\xi/2)}{4\xi}$ and $\xi$ is a data-dependent variational parameter. This bound elegantly transforms the intractable logistic likelihood into a quadratic function of x, facilitating the use of Gaussian variational approximations with closed-form updates. Within the CAVI framework, $\xi$ is iteratively optimized to maximize the bound's tightness, resulting in a convex optimization problem that ensures monotonic improvement of the ELBO. This technique has become a cornerstone in Bayesian logistic regression and latent Gaussian models, enabling scalable variational inference for logistic likelihoods. The statistical optimality and algorithmic convergence properties of such methods have been rigorously established by Ghosh et al. [52] and Anceschi et al. [53]. It is worth noting the theoretical connection between the Jaakkola–Jordan bound and the Pólya–gamma augmentation technique [54] for logistic regression, as explored by Durante and Rigon [55]. This relationship reveals a deep equivalence between Gibbs sampling with Pólya–gamma augmentation and variational inference using the Jaakkola–Jordan transform, providing valuable insights into the fundamental connections between these two inference paradigms.
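As a quick numerical illustration (not taken from the paper), the following sketch verifies that the Jaakkola–Jordan expression lower-bounds the logistic function everywhere and is tight at x = ξ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def jj_lower_bound(x, xi):
    # Psi(xi) * exp((x - xi) / 2 - h(xi) * (x^2 - xi^2)) with h(xi) = tanh(xi/2) / (4 xi)
    h = np.tanh(xi / 2.0) / (4.0 * xi)
    return sigmoid(xi) * np.exp((x - xi) / 2.0 - h * (x**2 - xi**2))

x = np.linspace(-6, 6, 201)
for xi in (0.5, 2.0, 4.0):
    assert np.all(jj_lower_bound(x, xi) <= sigmoid(x) + 1e-12)  # bound holds everywhere
    assert np.isclose(jj_lower_bound(xi, xi), sigmoid(xi))      # and is tight at x = xi
```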
In our HM-LACDM, we derive the lower bound for $\log p(y_{i,j,t} \mid z_{i,t} = l, \boldsymbol{\beta}_j)$ using the Jaakkola–Jordan bound as follows:
$$\log p(y_{i,j,t} \mid z_{i,t} = l, \boldsymbol{\beta}_j) = \log \Psi\big( (2 y_{i,j,t} - 1)\, \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l \big) \geq \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 - \xi_{j,l}^2 \Big).$$
According to Equations (A1) and (A4), the lower bound of the log joint likelihood of the HM-LACDM can be expressed as
$$\begin{aligned}
\log p(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}) &\geq \log \tilde{p}(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}, \boldsymbol{\Xi}) \\
&= \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}(z_{i,t} = l) \left[ \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 - \xi_{j,l}^2 \Big) \right] \\
&\quad + \sum_{i=1}^{I} \sum_{l=1}^{L} \mathbb{1}(z_{i,1} = l) \log \pi_l + \sum_{i=1}^{I} \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \log \tau_{l l'} \\
&\quad - \frac{1}{2} \sum_{j=1}^{J} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) + \sum_{l=1}^{L} \sum_{l'=1}^{L} \big( \omega_{l l'}^{(0)} - 1 \big) \log \tau_{l l'} + \sum_{l=1}^{L} \big( \alpha_l^{(0)} - 1 \big) \log \pi_l.
\end{aligned}$$
According to the update rule of CAVI (Equation (A2)), we can derive the variational posterior distributions of the latent variables. First, the variational posterior distribution of $\boldsymbol{\beta}_j$ is given by
$$q(\boldsymbol{\beta}_j) \propto \exp\left\{ \mathbb{E}_{q(-\boldsymbol{\beta}_j)}\left[ \sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}(z_{i,t} = l) \left( \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - h(\xi_{j,l})\, \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l \boldsymbol{\delta}_l^\top \boldsymbol{\beta}_j \right) \right] - \frac{1}{2} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) \right\}.$$
Hence $q(\boldsymbol{\beta}_j)$ is a multivariate normal distribution with mean $\boldsymbol{\mu}_{\beta_j}^{(*)}$ and covariance $\boldsymbol{\Sigma}_{\beta_j}^{(*)}$, given by
$$\boldsymbol{\Sigma}_{\beta_j}^{(*)} = \left[ \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} + 2 \sum_{i=1}^{I} \sum_{l=1}^{L} \sum_{t=1}^{T} \mathbb{E}_q\big[\mathbb{1}(z_{i,t} = l)\big]\, h(\xi_{j,l})\, \boldsymbol{\delta}_l \boldsymbol{\delta}_l^\top \right]^{-1},$$
$$\boldsymbol{\mu}_{\beta_j}^{(*)} = \boldsymbol{\Sigma}_{\beta_j}^{(*)} \left[ \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \boldsymbol{\mu}_{\beta_j}^{(0)} + \sum_{i=1}^{I} \sum_{l=1}^{L} \sum_{t=1}^{T} \mathbb{E}_q\big[\mathbb{1}(z_{i,t} = l)\big] \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\delta}_l \right],$$
where $\mathbb{E}_q[\mathbb{1}(z_{i,t} = l)]$ is the expected value of the indicator function under the variational posterior distribution $q(\mathbf{Z})$.
Then, we derive the variational posterior distribution of each row of the transition matrix, $\boldsymbol{\tau}_l$, which is given by
$$q(\boldsymbol{\tau}_l) \propto \exp\left\{ \mathbb{E}_{q(-\boldsymbol{\tau}_l)}\left[ \sum_{i=1}^{I} \sum_{t=2}^{T} \sum_{l'=1}^{L} \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \log \tau_{l l'} \right] + \sum_{l'=1}^{L} \big( \omega_{l l'}^{(0)} - 1 \big) \log \tau_{l l'} \right\}.$$
Hence $q(\boldsymbol{\tau}_l)$ is a Dirichlet distribution with parameters $\boldsymbol{\omega}_l^{(*)}$, given by
$$\omega_{l l'}^{(*)} = \omega_{l l'}^{(0)} + \sum_{i=1}^{I} \sum_{t=2}^{T} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big].$$
Similarly, the variational posterior distribution of the initial state distribution $\boldsymbol{\pi}$ is given by
$$q(\boldsymbol{\pi}) \propto \exp\left\{ \mathbb{E}_{q(-\boldsymbol{\pi})}\left[ \sum_{i=1}^{I} \sum_{l=1}^{L} \mathbb{1}(z_{i,1} = l) \log \pi_l \right] + \sum_{l=1}^{L} \big( \alpha_l^{(0)} - 1 \big) \log \pi_l \right\},$$
so $q(\boldsymbol{\pi})$ is a Dirichlet distribution with parameters $\boldsymbol{\alpha}^{(*)}$, given by
$$\alpha_l^{(*)} = \alpha_l^{(0)} + \sum_{i=1}^{I} \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big].$$
Finally, the variational posterior distribution of the latent variables $\mathbf{Z}$ is given by
$$\begin{aligned}
q(\mathbf{Z}_i) &\propto \exp\Bigg\{ \mathbb{E}_{q(-\mathbf{Z}_i)}\Bigg[ \sum_{j=1}^{J} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}(z_{i,t} = l) \left( \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 - \xi_{j,l}^2 \Big) \right) \\
&\qquad\qquad + \sum_{l=1}^{L} \mathbb{1}(z_{i,1} = l) \log \pi_l + \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \log \tau_{l l'} \Bigg] \Bigg\} \\
&= \exp\left\{ \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{1}(z_{i,t} = l)\, \tilde{\phi}_{i,t,l} + \sum_{l=1}^{L} \mathbb{1}(z_{i,1} = l)\, \tilde{\kappa}_l + \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l')\, \tilde{\psi}_{l l'} \right\},
\end{aligned}$$
where $\tilde{\phi}_{i,t,l}$, $\tilde{\kappa}_l$, and $\tilde{\psi}_{l l'}$ are defined as
$$\tilde{\phi}_{i,t,l} = \sum_{j=1}^{J} \mathbb{E}_q\left[ \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( (\boldsymbol{\beta}_j^\top \boldsymbol{\delta}_l)^2 - \xi_{j,l}^2 \Big) \right], \qquad \tilde{\kappa}_l = \mathbb{E}_q[\log \pi_l], \qquad \tilde{\psi}_{l l'} = \mathbb{E}_q[\log \tau_{l l'}].$$
We observe that $q(\mathbf{Z}_i)$ still factorizes as an HMM with the following parameters:
$$q(z_{i,1} = l) \propto \exp\big( \tilde{\kappa}_l + \tilde{\phi}_{i,1,l} \big), \qquad q(z_{i,t} = l' \mid z_{i,t-1} = l) \propto \exp\big( \tilde{\psi}_{l l'} + \tilde{\phi}_{i,t,l'} \big) \quad \text{for } t = 2, \ldots, T.$$
To efficiently compute the expected values of the indicator functions, we can use the well-known forward–backward algorithm, also known as the Baum–Welch algorithm [44,45,46], which is a dynamic programming algorithm for computing the marginal probabilities of the hidden states in an HMM. For all $i \in [I]$, $t \in [T]$, and $l \in [L]$, $f_{i,t}(l)$ and $b_{i,t}(l)$ are defined as follows:
$$f_{i,1}(l) = \kappa_l\, \phi_{i,1,l}, \qquad f_{i,t}(l) = \phi_{i,t,l} \sum_{l'} f_{i,t-1}(l')\, \eta_{l' l}, \quad t \geq 2,$$
$$b_{i,T}(l) = 1, \qquad b_{i,t}(l) = \sum_{l'} b_{i,t+1}(l')\, \eta_{l l'}\, \phi_{i,t+1,l'}, \quad t \leq T - 1,$$
where $\phi_{i,t,l} = \exp(\tilde{\phi}_{i,t,l})$, $\kappa_l = \exp(\tilde{\kappa}_l)$, and $\eta_{l l'} = \exp(\tilde{\psi}_{l l'})$. Then the expectations of the indicator functions can be computed as
$$\mathbb{E}_q\big[ \mathbb{1}(z_{i,t} = l) \big] = \frac{f_{i,t}(l)\, b_{i,t}(l)}{\sum_{l'} f_{i,t}(l')\, b_{i,t}(l')}, \quad t \in [T],$$
$$\mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] = \frac{f_{i,t-1}(l)\, b_{i,t}(l')\, \eta_{l l'}\, \phi_{i,t,l'}}{\sum_{m, m'} f_{i,t-1}(m)\, b_{i,t}(m')\, \eta_{m m'}\, \phi_{i,t,m'}}, \quad t \geq 2.$$

Appendix B.3. Derivation of the Evidence Lower Bound (ELBO)

In this section, we present the derivation of the ELBO for the HM-LACDM, which is defined as
$$\mathrm{ELBO}(q) = \mathbb{E}_{q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi})}\big[ \log \tilde{p}(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}, \boldsymbol{\Xi}) \big] - \mathbb{E}_{q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi})}\big[ \log q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}) \big].$$
Based on Equation (A5), the first term in the ELBO can be expanded as follows:
$$\begin{aligned}
&\mathbb{E}_{q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi})}\big[ \log \tilde{p}(\mathbf{Y}, \mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}, \boldsymbol{\Xi}) \big] \\
&= \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{t=1}^{T} \sum_{l=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t} = l) \big] \left[ \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \mathbb{E}_q[\boldsymbol{\beta}_j]^\top \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( \boldsymbol{\delta}_l^\top\, \mathbb{E}_q[\boldsymbol{\beta}_j \boldsymbol{\beta}_j^\top]\, \boldsymbol{\delta}_l - \xi_{j,l}^2 \Big) \right] \\
&\quad + \sum_{l=1}^{L} \mathbb{E}_q[\log \pi_l] \left( \alpha_l^{(0)} - 1 + \sum_{i=1}^{I} \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big] \right) + \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{E}_q[\log \tau_{l l'}] \left( \omega_{l l'}^{(0)} - 1 + \sum_{i=1}^{I} \sum_{t=2}^{T} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] \right) \\
&\quad - \frac{1}{2} \sum_{j=1}^{J} \mathbb{E}_q\Big[ \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) \Big].
\end{aligned}$$
Note that $\mathbb{E}_q[\log \pi_l] = \psi(\alpha_l^{(*)}) - \psi\big( \sum_{l'=1}^{L} \alpha_{l'}^{(*)} \big)$ and $\mathbb{E}_q[\log \tau_{l l'}] = \psi(\omega_{l l'}^{(*)}) - \psi\big( \sum_{l''=1}^{L} \omega_{l l''}^{(*)} \big)$, where $\psi(\cdot)$ is the digamma function.
We also have $\mathbb{E}_q[\boldsymbol{\beta}_j] = \boldsymbol{\mu}_{\beta_j}^{(*)}$, $\mathbb{E}_q[\boldsymbol{\beta}_j \boldsymbol{\beta}_j^\top] = \boldsymbol{\Sigma}_{\beta_j}^{(*)} + \boldsymbol{\mu}_{\beta_j}^{(*)} \boldsymbol{\mu}_{\beta_j}^{(*)\top}$, and
$$\mathbb{E}_q\Big[ \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\beta}_j - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) \Big] = \mathrm{tr}\Big( \boldsymbol{\Sigma}_{\beta_j}^{(*)} \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \Big) + \big(\boldsymbol{\mu}_{\beta_j}^{(*)} - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\mu}_{\beta_j}^{(*)} - \boldsymbol{\mu}_{\beta_j}^{(0)}\big).$$
The second term in the ELBO is the negative entropy of the variational posterior distribution, which can be expressed as
$$\mathbb{E}_{q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi})}\big[ \log q(\mathbf{Z}, \boldsymbol{\beta}, \mathbf{T}, \boldsymbol{\pi}) \big] = \sum_{i=1}^{I} \mathbb{E}_q\big[ \log q(\mathbf{Z}_i) \big] + \sum_{j=1}^{J} \mathbb{E}_q\big[ \log q(\boldsymbol{\beta}_j) \big] + \sum_{l=1}^{L} \mathbb{E}_q\big[ \log q(\boldsymbol{\tau}_l) \big] + \mathbb{E}_q\big[ \log q(\boldsymbol{\pi}) \big].$$
Since $q(\boldsymbol{\pi})$ and $q(\boldsymbol{\tau}_l)$ are Dirichlet distributions, their negative entropies can be computed as follows:
$$\mathbb{E}_q\big[ \log q(\boldsymbol{\pi}) \big] = -\log B\big( \boldsymbol{\alpha}^{(*)} \big) + \sum_{l=1}^{L} \big( \alpha_l^{(*)} - 1 \big) \left[ \psi(\alpha_l^{(*)}) - \psi\Big( \sum_{l'=1}^{L} \alpha_{l'}^{(*)} \Big) \right],$$
$$\mathbb{E}_q\big[ \log q(\boldsymbol{\tau}_l) \big] = -\log B\big( \boldsymbol{\omega}_l^{(*)} \big) + \sum_{l'=1}^{L} \big( \omega_{l l'}^{(*)} - 1 \big) \left[ \psi(\omega_{l l'}^{(*)}) - \psi\Big( \sum_{l''=1}^{L} \omega_{l l''}^{(*)} \Big) \right],$$
where $B(\cdot)$ is the multivariate beta function. Since each $q(\boldsymbol{\beta}_j)$ is a multivariate normal distribution, its negative entropy can be computed as
$$\mathbb{E}_q\big[ \log q(\boldsymbol{\beta}_j) \big] = -\frac{1}{2} \log \det\big( \boldsymbol{\Sigma}_{\beta_j}^{(*)} \big) + \mathrm{const}.$$
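As a quick numerical check of the Dirichlet negative-entropy expression above (not taken from the paper), the following sketch compares it against the entropy routine in SciPy; the function name is ours.

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.stats import dirichlet

def dirichlet_neg_entropy(alpha):
    """E_q[log q(pi)] for a Dirichlet variational factor with concentration alpha."""
    alpha = np.asarray(alpha, dtype=float)
    log_B = gammaln(alpha).sum() - gammaln(alpha.sum())  # log multivariate beta function
    return -log_B + np.sum((alpha - 1.0) * (digamma(alpha) - digamma(alpha.sum())))

alpha = np.array([2.5, 1.0, 4.0, 0.7])
# The negative entropy matches the negated differential entropy reported by SciPy.
assert np.isclose(dirichlet_neg_entropy(alpha), -dirichlet.entropy(alpha))
```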
Finally, the negative entropy of $q(\mathbf{Z}_i)$ can be computed as
$$\begin{aligned}
\mathbb{E}_q\big[ \log q(\mathbf{Z}_i) \big] &= \sum_{l=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big] \log q(z_{i,1} = l) + \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] \log q(z_{i,t} = l' \mid z_{i,t-1} = l) \\
&= \sum_{l=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big] \log \mathbb{E}_q\big[ \mathbb{1}(z_{i,1} = l) \big] + \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] \log \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] \\
&\quad - \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l') \big] \log \mathbb{E}_q\big[ \mathbb{1}(z_{i,t-1} = l) \big].
\end{aligned}$$
Plugging Equations (A20)–(A22) into Equation (A19), we obtain the second term of the ELBO. Finally, combining Equations (A18) and (A19) yields the ELBO for the HM-LACDM (Equation (A17)):
$$\begin{aligned}
\mathrm{ELBO}(q) &= \sum_{i=1}^{I} \sum_{j=1}^{J} \sum_{t=1}^{T} \sum_{l=1}^{L} z_{i,t,l}^{(*)} \left[ \log \Psi(\xi_{j,l}) + \Big( y_{i,j,t} - \tfrac{1}{2} \Big) \boldsymbol{\mu}_{\beta_j}^{(*)\top} \boldsymbol{\delta}_l - \frac{\xi_{j,l}}{2} - h(\xi_{j,l}) \Big( \boldsymbol{\delta}_l^\top \big( \boldsymbol{\Sigma}_{\beta_j}^{(*)} + \boldsymbol{\mu}_{\beta_j}^{(*)} \boldsymbol{\mu}_{\beta_j}^{(*)\top} \big) \boldsymbol{\delta}_l - \xi_{j,l}^2 \Big) \right] \\
&\quad + \sum_{l=1}^{L} \left[ \psi(\alpha_l^{(*)}) - \psi\Big( \sum_{l'=1}^{L} \alpha_{l'}^{(*)} \Big) \right] \left( \alpha_l^{(0)} - \alpha_l^{(*)} + \sum_{i=1}^{I} z_{i,1,l}^{(*)} \right) \\
&\quad + \sum_{l=1}^{L} \sum_{l'=1}^{L} \left[ \psi(\omega_{l l'}^{(*)}) - \psi\Big( \sum_{l''=1}^{L} \omega_{l l''}^{(*)} \Big) \right] \left( \omega_{l l'}^{(0)} - \omega_{l l'}^{(*)} + \sum_{i=1}^{I} \sum_{t=2}^{T} z_{i,t-1,t,l,l'}^{(*)} \right) \\
&\quad - \frac{1}{2} \sum_{j=1}^{J} \left[ \mathrm{tr}\Big( \boldsymbol{\Sigma}_{\beta_j}^{(*)} \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \Big) + \big(\boldsymbol{\mu}_{\beta_j}^{(*)} - \boldsymbol{\mu}_{\beta_j}^{(0)}\big)^\top \big(\boldsymbol{\Sigma}_{\beta_j}^{(0)}\big)^{-1} \big(\boldsymbol{\mu}_{\beta_j}^{(*)} - \boldsymbol{\mu}_{\beta_j}^{(0)}\big) - \log \det \boldsymbol{\Sigma}_{\beta_j}^{(*)} \right] \\
&\quad + \log B\big( \boldsymbol{\alpha}^{(*)} \big) + \sum_{l=1}^{L} \log B\big( \boldsymbol{\omega}_l^{(*)} \big) - \sum_{i=1}^{I} \sum_{l=1}^{L} z_{i,1,l}^{(*)} \log z_{i,1,l}^{(*)} \\
&\quad - \sum_{i=1}^{I} \sum_{t=2}^{T} \sum_{l=1}^{L} \sum_{l'=1}^{L} z_{i,t-1,t,l,l'}^{(*)} \Big( \log z_{i,t-1,t,l,l'}^{(*)} - \log z_{i,t-1,l}^{(*)} \Big),
\end{aligned}$$
where we denote $z_{i,t,l}^{(*)} = \mathbb{E}_q[\mathbb{1}(z_{i,t} = l)]$ and $z_{i,t-1,t,l,l'}^{(*)} = \mathbb{E}_q[\mathbb{1}(z_{i,t-1} = l,\, z_{i,t} = l')]$.

Appendix C. Q-Matrix Employed in the Simulation Study

In this section, we present the Q-matrix employed in the simulation study of the HM-LACDM. The Q-matrix is a binary matrix that indicates which attributes are required to answer each item correctly. The first Q-matrix has three attributes and 21 items, while the second Q-matrix has four attributes and 30 items.
Table A1. Three attribute Q-matrix.

Item | Attribute 1 | Attribute 2 | Attribute 3 | Sum
1 | 0 | 0 | 1 | 1
2 | 0 | 0 | 1 | 1
3 | 0 | 0 | 1 | 1
4 | 0 | 1 | 0 | 1
5 | 0 | 1 | 0 | 1
6 | 0 | 1 | 0 | 1
7 | 0 | 1 | 1 | 2
8 | 0 | 1 | 1 | 2
9 | 0 | 1 | 1 | 2
10 | 1 | 0 | 0 | 1
11 | 1 | 0 | 0 | 1
12 | 1 | 0 | 0 | 1
13 | 1 | 0 | 1 | 2
14 | 1 | 0 | 1 | 2
15 | 1 | 0 | 1 | 2
16 | 1 | 1 | 0 | 2
17 | 1 | 1 | 0 | 2
18 | 1 | 1 | 0 | 2
19 | 1 | 1 | 1 | 3
20 | 1 | 1 | 1 | 3
21 | 1 | 1 | 1 | 3
Sum | 12 | 12 | 12 |
Table A2. Four attribute Q-matrix.

Item | Attribute 1 | Attribute 2 | Attribute 3 | Attribute 4 | Sum
1 | 1 | 0 | 0 | 0 | 1
2 | 0 | 1 | 0 | 0 | 1
3 | 0 | 0 | 1 | 0 | 1
4 | 0 | 0 | 0 | 1 | 1
5 | 1 | 0 | 0 | 0 | 1
6 | 0 | 1 | 0 | 0 | 1
7 | 0 | 0 | 1 | 0 | 1
8 | 0 | 0 | 0 | 1 | 1
9 | 1 | 1 | 0 | 0 | 2
10 | 1 | 0 | 1 | 0 | 2
11 | 1 | 0 | 0 | 1 | 2
12 | 0 | 1 | 1 | 0 | 2
13 | 0 | 1 | 0 | 1 | 2
14 | 0 | 0 | 1 | 1 | 2
15 | 1 | 1 | 0 | 0 | 2
16 | 1 | 0 | 1 | 0 | 2
17 | 1 | 0 | 0 | 1 | 2
18 | 0 | 1 | 1 | 0 | 2
19 | 0 | 1 | 0 | 1 | 2
20 | 0 | 0 | 1 | 1 | 2
21 | 1 | 1 | 1 | 0 | 3
22 | 1 | 1 | 0 | 1 | 3
23 | 1 | 0 | 1 | 1 | 3
24 | 0 | 1 | 1 | 1 | 3
25 | 1 | 1 | 1 | 0 | 3
26 | 1 | 1 | 0 | 1 | 3
27 | 1 | 0 | 1 | 1 | 3
28 | 0 | 1 | 1 | 1 | 3
29 | 1 | 1 | 1 | 1 | 4
30 | 1 | 1 | 1 | 1 | 4
Sum | 16 | 16 | 16 | 16 |

Appendix D. Simulation Study Details

Appendix D.1. Single Run

Here, we present the additional trace plots of the parameters of π and T for a single run of the HM-LACDM under I = 500 , K = 3 , T = 2 .
Figure A1. Trace plots of the parameters of π and T for a single run of the HM-LACDM under I = 500, K = 3, T = 2.
We also increased the number of respondents to I = 10,000 and repeated the same simulation study. The corresponding trace plots and parameter recovery plots for a single run of the HM-LACDM under I = 10,000, K = 3, T = 2 are shown below.
Figure A2. Trace plots of the ELBO (a) and item-effect parameters β_j (b) during the CAVI iterations.
Figure A3. Parameter recovery plots for parameters in HM-LACDM. (a) Recovery of item-effect parameters β_j; (b) Recovery of initial latent class distribution π; (c) Recovery of transition matrix T.
With the larger sample size, the parameters converge to values closer to the true values, and the parameter recovery plots likewise show that the estimates track the true values more closely.

Appendix D.2. Multiple Runs

We also provide results for simulations of the HM-LACDM under the remaining configurations (weak and moderate signal levels).
Table A3. Root mean square error (RMSE) of estimated parameters under a weak signal level across varying sample sizes and attribute settings.

I      K   T   β RMSE               τ RMSE                π RMSE
200    3   2   0.36 (2.26 × 10⁻¹)   0.078 (6.4 × 10⁻³)    0.035 (1.8 × 10⁻²)
200    3   3   0.26 (7.4 × 10⁻²)    0.056 (6.0 × 10⁻³)    0.029 (6.1 × 10⁻³)
200    3   5   0.20 (2.5 × 10⁻²)    0.039 (3.8 × 10⁻³)    0.025 (5.1 × 10⁻³)
200    4   2   0.31 (1.12 × 10⁻¹)   0.064 (1.9 × 10⁻³)    0.023 (8.3 × 10⁻³)
200    4   3   0.24 (6.8 × 10⁻²)    0.054 (2.7 × 10⁻³)    0.020 (5.9 × 10⁻³)
200    4   5   0.20 (1.19 × 10⁻¹)   0.038 (2.1 × 10⁻³)    0.019 (7.5 × 10⁻³)
1000   3   2   0.16 (1.61 × 10⁻¹)   0.037 (4.5 × 10⁻³)    0.023 (1.3 × 10⁻²)
1000   3   3   0.11 (1.5 × 10⁻²)    0.026 (2.3 × 10⁻³)    0.020 (2.6 × 10⁻³)
1000   3   5   0.089 (1.3 × 10⁻²)   0.019 (1.8 × 10⁻³)    0.017 (2.3 × 10⁻³)
1000   4   2   0.13 (1.1 × 10⁻²)    0.042 (2.8 × 10⁻³)    0.010 (1.6 × 10⁻³)
1000   4   3   0.12 (1.18 × 10⁻¹)   0.029 (1.9 × 10⁻³)    0.011 (6.9 × 10⁻³)
1000   4   5   0.087 (7.0 × 10⁻²)   0.021 (1.2 × 10⁻³)    0.0093 (5.0 × 10⁻³)
Table A4. Root mean square error (RMSE) of estimated parameters under a moderate signal level across varying sample sizes and attribute settings.

I      K   T   β RMSE               τ RMSE                π RMSE
200    3   2   0.38 (1.97 × 10⁻¹)   0.073 (4.6 × 10⁻³)    0.021 (1.4 × 10⁻²)
200    3   3   0.30 (1.89 × 10⁻¹)   0.051 (6.3 × 10⁻³)    0.018 (8.6 × 10⁻³)
200    3   5   0.21 (2.3 × 10⁻²)    0.035 (2.2 × 10⁻³)    0.015 (2.9 × 10⁻³)
200    4   2   0.30 (5.4 × 10⁻²)    0.060 (1.6 × 10⁻³)    0.018 (3.2 × 10⁻³)
200    4   3   0.25 (8.7 × 10⁻²)    0.050 (1.6 × 10⁻³)    0.018 (5.1 × 10⁻³)
200    4   5   0.19 (7.9 × 10⁻²)    0.033 (1.9 × 10⁻³)    0.017 (4.0 × 10⁻³)
1000   3   2   0.14 (1.9 × 10⁻²)    0.028 (2.3 × 10⁻³)    0.014 (1.4 × 10⁻³)
1000   3   3   0.13 (1.40 × 10⁻¹)   0.020 (4.8 × 10⁻³)    0.014 (6.4 × 10⁻³)
1000   3   5   0.087 (1.4 × 10⁻²)   0.015 (1.0 × 10⁻³)    0.013 (1.2 × 10⁻³)
1000   4   2   0.13 (1.1 × 10⁻²)    0.035 (1.6 × 10⁻³)    0.0076 (9.6 × 10⁻⁴)
1000   4   3   0.10 (8.9 × 10⁻³)    0.024 (1.1 × 10⁻³)    0.0076 (9.8 × 10⁻⁴)
1000   4   5   0.080 (6.1 × 10⁻³)   0.016 (7.9 × 10⁻⁴)    0.0075 (9.1 × 10⁻⁴)
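For reference, the RMSE reported in Tables A3 and A4 (and in Table 1 below for the strong signal level) measures the root mean squared deviation of an estimated parameter array from its generating value. The following is a minimal sketch with hypothetical values, assuming any label switching between latent classes has already been resolved before comparison.

```python
import numpy as np

def rmse(estimate, truth):
    """Root mean square error between an estimated and a true parameter array."""
    estimate, truth = np.asarray(estimate, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))

# Hypothetical example: RMSE of an estimated 2 x 2 transition matrix.
tau_true = np.array([[0.80, 0.20], [0.30, 0.70]])
tau_hat  = np.array([[0.78, 0.22], [0.33, 0.67]])
print(rmse(tau_hat, tau_true))  # about 0.025
```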
Table A5. Q-matrix recovery performance under a weak signal level across varying sample sizes and attribute settings under different significance levels (α = 0.05/α = 0.01). FPR: false positive rate; FNR: false negative rate.

I      K   T   FPR            FNR            Accuracy
200    3   2   1.00%/0.22%    1.37%/2.21%    97.63%/97.57%
200    3   3   1.41%/0.62%    0.14%/0.19%    98.44%/99.19%
200    3   5   1.76%/0.54%    0.02%/0.02%    98.22%/99.44%
200    4   2   2.30%/0.84%    2.02%/4.10%    95.67%/95.06%
200    4   3   2.77%/1.12%    0.39%/0.92%    96.83%/97.96%
200    4   5   2.78%/0.92%    0.40%/0.42%    96.82%/98.67%
1000   3   2   3.06%/1.32%    0.38%/0.38%    96.56%/98.30%
1000   3   3   2.73%/1.22%    0%/0%          97.27%/98.78%
1000   3   5   1.76%/0.59%    0%/0%          98.24%/99.41%
1000   4   2   4.83%/2.15%    0%/0%          95.17%/97.85%
1000   4   3   4.87%/2.22%    0.40%/0.40%    94.73%/97.38%
1000   4   5   5.20%/2.23%    0.13%/0.13%    94.67%/97.64%
Table A6. Q-matrix recovery performance under a moderate signal level across varying sample sizes and attribute settings under different significance levels (α = 0.05/α = 0.01). FPR: false positive rate; FNR: false negative rate.

I      K   T   FPR            FNR            Accuracy
200    3   2   0.81%/0.37%    0.51%/0.57%    98.68%/99.06%
200    3   3   1.56%/1.05%    0.29%/0.33%    98.16%/98.62%
200    3   5   1.08%/0.32%    0%/0%          98.92%/99.68%
200    4   2   1.63%/0.61%    0.20%/0.61%    98.17%/98.78%
200    4   3   1.62%/0.41%    0.14%/0.15%    98.24%/99.44%
200    4   5   2.19%/0.79%    0.06%/0.07%    97.75%/99.14%
1000   3   2   2.02%/0.83%    0%/0%          97.98%/99.17%
1000   3   3   2.51%/0.98%    0.06%/0.08%    97.43%/98.94%
1000   3   5   2.25%/0.73%    0%/0%          97.75%/99.27%
1000   4   2   3.33%/1.23%    0%/0%          96.67%/98.77%
1000   4   3   3.39%/1.47%    0%/0%          96.61%/98.53%
1000   4   5   3.69%/1.38%    0%/0%          96.31%/98.62%
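The Q-matrix recovery metrics in Tables A5 and A6 (and in Table 2 below) compare the reconstructed binary Q-matrix with the generating one entry by entry. The sketch below is our own illustration with hypothetical inputs; as an assumption, it divides false positives by the number of true zero entries and false negatives by the number of true one entries, and reports the overall proportion of matching entries as accuracy.

```python
import numpy as np

def q_recovery_metrics(q_hat, q_true):
    """Entry-wise false positive rate, false negative rate, and accuracy
    of a recovered binary Q-matrix against the generating Q-matrix."""
    q_hat, q_true = np.asarray(q_hat, int), np.asarray(q_true, int)
    fp = np.sum((q_hat == 1) & (q_true == 0))   # spurious item-attribute links
    fn = np.sum((q_hat == 0) & (q_true == 1))   # missed item-attribute links
    fpr = fp / max(np.sum(q_true == 0), 1)
    fnr = fn / max(np.sum(q_true == 1), 1)
    acc = np.mean(q_hat == q_true)
    return fpr, fnr, acc

# Hypothetical 3-item, 3-attribute example.
q_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0]])
q_hat  = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0]])
print(q_recovery_metrics(q_hat, q_true))  # (0.2, 0.25, 0.777...)
```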
Figure A4. Distributions of RMSE when I = 1000, K = 4, T = 5 under a moderate signal level. (a) RMSE of item-effect parameters β_j; (b) RMSE of initial latent class distribution π; (c) RMSE of transition matrix T.
Figure A5. Distributions of RMSE when I = 1000, K = 4, T = 2 under a weak signal level. (a) RMSE of item-effect parameters β_j; (b) RMSE of initial latent class distribution π; (c) RMSE of transition matrix T.
Figure A6. Distributions of RMSE when I = 200, K = 3, T = 5 under a moderate signal level. (a) RMSE of item-effect parameters β_j; (b) RMSE of initial latent class distribution π; (c) RMSE of transition matrix T.

Figure 1. Trace plots of the ELBO (a) and item-effect parameters β_j (b) during the CAVI iterations.
Figure 2. Parameter recovery plots for parameters in HM-LACDM. (a) Recovery of item-effect parameters β_j; (b) Recovery of initial latent class distribution π; (c) Recovery of transition matrix T.
Figure 3. Distributions of RMSE when I = 200, K = 3, T = 5 under a strong signal level. (a) RMSE of item-effect parameters β_j; (b) RMSE of initial latent class distribution π; (c) RMSE of transition matrix T.
Table 1. Root mean square error (RMSE) of estimated parameters under a strong signal level across varying sample sizes and attribute settings.

I      K   T   β RMSE               τ RMSE                π RMSE
200    3   2   0.66 (4.49 × 10⁻¹)   0.071 (4.5 × 10⁻³)    0.018 (2.3 × 10⁻²)
200    3   3   0.45 (2.49 × 10⁻¹)   0.049 (4.4 × 10⁻³)    0.013 (1.0 × 10⁻²)
200    3   5   0.31 (2.7 × 10⁻²)    0.034 (7.9 × 10⁻⁴)    0.011 (1.1 × 10⁻³)
200    4   2   0.42 (7.9 × 10⁻²)    0.057 (9.0 × 10⁻⁴)    0.016 (1.1 × 10⁻³)
200    4   3   0.34 (8.4 × 10⁻²)    0.047 (9.1 × 10⁻⁴)    0.016 (1.6 × 10⁻³)
200    4   5   0.27 (1.06 × 10⁻¹)   0.030 (1.2 × 10⁻³)    0.016 (1.8 × 10⁻³)
1000   3   2   0.21 (2.2 × 10⁻²)    0.026 (7.6 × 10⁻⁴)    0.012 (3.8 × 10⁻⁴)
1000   3   3   0.20 (3.20 × 10⁻¹)   0.017 (4.8 × 10⁻³)    0.013 (1.1 × 10⁻²)
1000   3   5   0.16 (3.09 × 10⁻¹)   0.014 (6.7 × 10⁻³)    0.013 (7.8 × 10⁻³)
1000   4   2   0.18 (1.25 × 10⁻¹)   0.032 (1.9 × 10⁻³)    0.0074 (1.7 × 10⁻³)
1000   4   3   0.16 (1.86 × 10⁻¹)   0.021 (2.3 × 10⁻³)    0.0074 (1.4 × 10⁻³)
1000   4   5   0.15 (2.38 × 10⁻¹)   0.015 (4.5 × 10⁻³)    0.0077 (3.3 × 10⁻³)
Table 2. Q-matrix recovery performance under a strong signal level across varying sample sizes and attribute settings under different significance levels (α = 0.05/α = 0.01). FPR: false positive rate; FNR: false negative rate.

I      K   T   FPR            FNR            Accuracy
200    3   2   0.98%/0.73%    1.13%/1.21%    97.89%/98.06%
200    3   3   1.27%/0.65%    0.16%/0.19%    98.57%/99.16%
200    3   5   1.35%/0.41%    0%/0%          98.65%/99.59%
200    4   2   1.11%/0.42%    0.07%/0.11%    98.82%/99.48%
200    4   3   1.35%/0.46%    0.07%/0.08%    98.58%/99.46%
200    4   5   1.70%/0.62%    0.03%/0.04%    98.27%/99.34%
1000   3   2   1.97%/0.79%    0%/0%          98.03%/99.21%
1000   3   3   2.41%/1.29%    0.27%/0.29%    97.32%/98.43%
1000   3   5   3.11%/1.68%    0.11%/0.16%    96.78%/98.16%
1000   4   2   3.33%/1.48%    0.03%/0.03%    96.64%/98.49%
1000   4   3   3.25%/1.52%    0.06%/0.07%    96.69%/98.40%
1000   4   5   3.80%/2.05%    0.07%/0.09%    96.12%/97.86%

