Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism

Ma, Fanglan; Zhu, Changsheng; Lei, Peng

doi:10.3390/app15158617

Open AccessArticle

Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism

by

Fanglan Ma

^1,2,

Changsheng Zhu

^1,* and

Peng Lei

³

¹

School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

²

Institute of Sensing Technology, Gansu Academy of Sciences, Lanzhou 730000, China

³

Network & Information Center, Lanzhou University of Technology, Lanzhou 730050, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(15), 8617; https://doi.org/10.3390/app15158617

Submission received: 23 June 2025 / Revised: 24 July 2025 / Accepted: 30 July 2025 / Published: 4 August 2025

Download

Browse Figures

Versions Notes

Abstract

Knowledge tracing (KT), a core educational data mining task, models students’ evolving knowledge states to predict future learning. In online education systems, the exercises are numerous, but they are typically associated with only a few concepts. However, existing models rarely integrate exercise information with high-order exercise–concept correlations, focusing solely on optimizing models’ final predictive performance. To address these limitations, we propose the Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism (HGKT), a novel framework that (1) captures correlations between exercises and concepts through a two-layer hypergraph convolution; (2) integrates hypergraph-driven exercise embedding and temporal features (answer time and interval time) to characterize learning behavioral dynamics; and (3) designs a learning layer and a forgetting layer, with the dual-gating mechanism dynamically balancing their impacts on the knowledge state. Experiments on three public datasets demonstrate that the proposed HGKT model achieves superior predictive performance compared to all baselines. On the longest interaction sequence dataset, ASSISChall, HGKT improves prediction AUC by least 1.8%. On the biggest interaction records dataset, EdNet-KT1, it maintains a state-of-the-art AUC of 0.78372. Visualization analyses confirm its interpretability in tracing knowledge state evolution. These results validate HGKT’s effectiveness in modeling high-order exercise–concept correlations while ensuring practical adaptability in real-world online education platforms.

Keywords:

knowledge tracing; hypergraph neural network; high order; forgetting mechanism; online learning

1. Introduction

With the rapid developments in artificial intelligence technology and online education systems, knowledge tracing (KT) [1], as the core technology of educational artificial intelligence, has attracted increasing attention from researchers. Through the assessment of students’ knowledge states derived from their learning interactions, KT can improve learning efficiency and facilitate a better understanding of the learning process for students. Nevertheless, most existing KT methods [2,3,4] assess the students knowledge state without considering the high-order relations between the exercises and knowledge concepts. In actuality, the relations between the exercises and concepts have a profound impact on learning progression and significantly influence future performance predictions.

Research on knowledge tracing (KT) began in the 1990s, with Corbett and Anderson’s [5] foundational work, which is widely regarded as the first major contribution to the field and the origin of the KT concept. Advances in deep learning subsequently spurred the rapid development of numerous KT models. For instance, deep knowledge tracing [2] pioneered the use of deep learning for KT, while memory-augmented KT models were introduced by Zhang et al. (2017) [3]. More innovations include attention enhanced KT frameworks proposed in several studies [6,7,8] as well as graph neural network (GNN)-based approaches [9,10,11] that model relational structures, such as knowledge component (KC) similarities, dependencies, and question-KC mappings, to enhance performance. Beyond diverse neural architectures, KT research progresses using two main types of models. The first models leverage educational data features, including exercise content [12], difficulty levels [13], relationships between exercises and concepts [14], and cold start [15]. The second models incorporate psychological studies on student learning and forgetting behavior [16,17,18,19], considering additional factors that influence knowledge state tracing, such as the time interval between a student’s interactions and the number of times that the knowledge concepts have been practiced. The rise of large language models (LLMs) has spurred the emergence of knowledge tracing frameworks [20,21] that integrate LLM capabilities with sequential interaction modeling. This integration addresses the intricate exercise–knowledge component relationships that are prevalent in real-world educational scenarios. While GNN-based methods like GKT [10] capture skill dependencies through pairwise graphs, they fail to model high-order exercise–concept correlations. In contrast, HGKT’s hypergraph convolution explicitly resolves multi-hop correlations, overcoming limitations in graph-based KT. Similarly, LLM-enhanced frameworks (e.g., LLM-KT [21]) treat exercise IDs as mere numerical tokens, failing to interpret sequential interactions that reflect knowledge states. This fragmented tokenization obscures behavioral semantics. HGKT achieves higher interpretability in knowledge state evolution.

Existing KT approaches [3,4] typically construct predictive models based on knowledge concepts rather than target questions. In KT, it is common to encounter scenarios with several concepts and numerous exercises, where one concept may be associated with many exercises, and a single exercise may correspond to multiple concepts. However, these knowledge tracing (KT) approaches treat knowledge concepts as independent entities, neglecting their inherent hierarchical and composite relationships, which can result in degraded performance. For example, in Figure 1, there are four exercises that are associated with more than one concept. Despite the fact that exercises e2 and e3 assess identical concepts, different difficulty levels may yield different correct response probabilities. Therefore, it is important to research the high-order correlations between the exercises and concepts.

In this paper, we study how to capture latent high-order exercise–concept correlations. Motivated by the hypergraph neural network (HGNN)’s proven superiority in modeling many-to-many complex relationships within data mining, we employ a two-layer hypergraph convolutional network to capture the latent correlations between exercises and concepts. This work makes the following contributions.

(1): This work proposes a novel Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism (HGKT) model to capture correlations between exercises and concepts through a two-layer hypergraph convolution. The proposed HGKT provides superior adaptability and performance for online education learning assessment.
(2): This work designs a learning layer and a forgetting layer, where knowledge acquisition follows learning gain dynamics while memory decay adheres to Ebbinghaus forgetting principles [22]. This model effectively captures the dynamics of human learning.
(3): The proposed HGKT model is evaluated on three public education datasets, including ASSIST2012, ASSISTChall and EdNet-KT1. HGKT achieves superior predictive performance across different datasets.

This paper is structured into the following sections. Section 2 reviews prior research on knowledge tracing models and recent hypergraph neural network (HGNN) applications in educational data mining. Section 3 formally defines and mathematically formulates the knowledge tracing problem. The architecture of the proposed HGKT model is detailed in Section 4, with specific elaboration on the implementation and functionality of its layers. Section 5 elaborates the experimental framework, encompassing dataset specifications, evaluation metrics, benchmark configurations, and comparative results analysis. Section 6 synthesizes the core contributions of this work, discusses theoretical and practical implications, and suggests actionable directions for future knowledge tracing research.

2. Related Work

2.1. Knowledge Tracing

Knowledge tracing (KT) approaches can be categorized into two paradigms: (1) Traditional methods like Bayesian knowledge tracing (BKT) and factor analysis models [23]. BKT [5] is a classic hidden Markov model which represents knowledge states with binary variables. While item response theory (IRT) [22,24], a foundational factor analysis model, analyzes performance through learner ability and item difficulty. (2) Deep learning-based approaches: deep knowledge tracing (DKT) [2] and DKT+ [25] pioneers the use of RNN/LSTM for temporal dependency modeling. Dynamic key–value memory networks (DKVMN) [3] incorporate memory-augmented neural networks into knowledge tracing, utilizing a static key matrix to store latent concepts and a dynamic value matrix to track and update mastery levels. Self-attentive knowledge tracing (SAKT) [26] introduces attention mechanisms to KT. Context-aware AKT [6] integrates cognitive theory with contextualized representations, GKT [10] builds a skill relation graph and learns their relation explicitly. CKT [27] designs hierarchical convolutional layers to extract individualized learning rates based on continuous learning interactions of students. Learning process-consistent knowledge tracing (LPKT) [19] integrated forgetting curves and learning gains to enhance temporal consistency. LPKT relies on a static Q matrix which is a binary mapping of exercises to knowledge concepts. LPKT’s extension LPKT-S [28] emphasizes time interval effects and individual progress variance.

2.2. Hypergraph Neural Networks

Traditional knowledge tracing methods (e.g., BKT, IRT) and existing deep learning models (e.g., DKT, SAKT) typically rely on graph structures or sequential modeling to capture knowledge concept correlations. However, constrained by the binary relationship representation capability of ordinary graphs, they struggle to explicitly model high-order correlations and hierarchical dependencies between exercises and knowledge concepts. Recently, hypergraph neural networks (HGNN) [29,30,31] have gradually been introduced into educational data mining due to their superiority in modeling many-to-many complex relationships. Unlike conventional graphs, hypergraphs utilize hyperedges to connect multiple nodes simultaneously, making them inherently suitable for capturing multidimensional collaborative effects among knowledge concepts. For instance, in knowledge tracing scenarios, a single exercise may involve multiple knowledge concepts (forming a hyperedge), while different exercises can establish high-order correlations through shared subsets of knowledge concepts. Although existing studies have applied GNN [9] to KT tasks (e.g., the GKT model [10]), they remain limited to low-order neighborhood information aggregation and fail to capture multi-hop collaborative influences across exercises. The proposed HGKT model innovatively employs a dual-layer hypergraph convolutional architecture: the first layer models exercise–knowledge concept membership relationships, while the second layer discovers high-order exercise correlations based on shared knowledge concepts, thereby explicitly deconstructing the collaborative evolution mechanism of knowledge states. This approach transcends the pairwise relationship constraints of traditional graph models, providing finer-grained interpretable representations for knowledge tracing.

3. Problem and Definition

Knowledge tracing formalizes the learning domain through a set of students

S = \{s_{1}, s_{2}, \dots, s_{i}, \dots, s_{I}\}

, a set of exercises

E = \{e_{1}, e_{2}, \dots, e_{j}, \dots, e_{J}\}

, and a set of knowledge concepts

C = \{c_{1}, c_{2}, \dots, c_{m}, \dots, c_{M}\}

, where each exercise is related to specific knowledge concepts. Exercise–concept associations are encoded in a binary mapping matrix

Q \in {[0, 1]}^{J \times M}

, which elements satisfy

Q_{j m} = 1

represents that the knowledge concept

c_{m}

is required for exercise

e_{j}

, otherwise,

Q_{j m} = 0

. The Q-matrix constitutes the foundational relational schema between assessment items and latent competencies.

Given a student’s temporal interaction sequence

H_{t} = \{(e_{1}, a_{1}), (e_{2}, a_{2}), . . ., (e_{t}, a_{t})\}

, the KT problem constitutes two interdependent objectives: (1) knowledge state diagnosis—estimate a latent knowledge state vector

h_{t} \in [0, 1]

that characterizes the student’s mastery—and (2) performance prediction—modeling the conditional response probability

P (a_{t + 1} = 1 ∣ e_{t + 1}, H_{t})

for any subsequent exercise

e_{t + 1}

at

t + 1

.

4. Methodology

4.1. The HGKT Model

As illustrated in Figure 2, the proposed model architecture comprises four core components: (1) a hypergraph-enhanced exercise–knowledge concept embedding representation, (2) a learning module (modeling knowledge gain), (3) a forgetting module (modeling knowledge decay), and (4) a prediction module. The hypergraph neural network explicitly constructs heterogeneous correlations between exercises and knowledge concepts through hyperedge connections, employing a dual-layer hypergraph convolution mechanism to capture high-order exercise interaction patterns. Both the learning and forgetting modules incorporate temporal features to dynamically regulate knowledge gain and decay, ensuring adaptive alignment with learners’ evolving cognitive states.

4.1.1. Hypergraph-Enhanced Embedding Layer

Define a hypergraph

G = (V, ε)

, where

V

represents a set of vertices, and

ε

represents a set of hyperedges. In the hypergraph

G

, an exercise

e_{t}

corresponds to a vertex, and a knowledge concept c corresponds to a hyperedge. The incidence matrix

H \in R^{|V| \times |ε|}

is based on the

Q

matrix, an exercise–knowledge concept association matrix that represents node–hyperedge relationships. In a hypergraph, a hyperedge denotes a group relationship connecting multiple vertices simultaneously, as illustrated in Figure 3. Here, C represents knowledge concepts,

e_{t}

represents exercises, and each exercise forms a hyperedge with its associated knowledge concepts. Exercise–knowledge concepts incidence matrix

H

each entry is defined as follows:

H (V_{i}, ε_{j}) = \{\begin{matrix} 1, i f V_{i} \in ε_{j} \\ 0, i f V_{i} \notin ε_{j} \end{matrix}

(1)

where if a vertex belongs to a hyperedge, then

H (V_{i}, ε_{j}) = 1

; otherwise,

H (V_{i}, ε_{j}) = 0

.

By constructing the hypergraph using the exercise–knowledge concepts incidence matrix

H

, each exercise and its associated knowledge concepts form hyperedges. A hypergraph neural network is employed to explicitly model the complex relationships between exercises and knowledge concepts, capturing high-order exercise correlations which latent dependencies formed when multiple exercises nonlinearly interact through shared knowledge concepts through a two-layer hypergraph convolution.

Hypergraph convolution employs a message propagation mechanism, enabling exercise embeddings to aggregate features from related knowledge concepts and form enhanced representations that incorporate knowledge structure information. The generation of enhanced exercise embeddings involves two convolutional layers:

First convolution layer: Uses a non-linear transformation with ReLU activation to fuse exercise features with their associated knowledge concepts features into embedding using incidence matrix

H

.

X^{(1)} = R e L U (W_{1} \cdot H \cdot e_{t} + b_{1})

(2)

where

W_{1}

is the weight matrix, and

b_{1}

is the bias term. ReLU is a non-linear activation function.

Second Convolution Layer: Propagates features across concept-sharing exercises through incidence matrix operations, where hyperedges dynamically cluster exercises with overlapping concepts—mathematically formalizing high-order relationships as multi-hop paths exercise-concept-exercise in hypergraph topology. This further refines the feature representation, producing the final enhanced embedding

E_{t}

with a dimensionality of

d_{k}

.

E_{t} = W_{2} \cdot H \cdot X^{(1)} + b_{2}

(3)

where

W_{2}

is the weight matrix, and

b_{2}

is the bias term.

The enhanced embeddings not only retain the semantic information of the exercises themselves but also encode the association patterns among knowledge concepts, allowing for a more precise characterization of an exercise’s position within the knowledge space.

During each forward propagation, the hypergraph structure is dynamically reconstructed, enabling the convolution operations to adapt to varying knowledge point association patterns across different exercises.

Therefore, the representation of the basic learning unit

(E_{t}, a t_{t}, a_{t})

is constructed by concatenating

E_{t}

,

a t_{t}

, and

a_{t}

, which are then fused through a multilayer perceptron (MLP) to integrate the enhanced exercise representation

E_{t}

, answer time representation

{at}_{t}

, and answer representation

a_{t}

. The embedding layer can be formulated as

l_{t} = W_{3}^{T} (E_{t} \oplus {at}_{t} \oplus a_{t}) + b_{3}

(4)

where ⊕ is operation of concatenating,

W_{3}

is the weight matrix, and

b_{3}

is the bias term.

4.1.2. Learning Gain

Learning knowledge gain refers to the knowledge advancement a learner achieves through consecutive learning interactions (e.g.,

l_{t - 1}

and

l_{t}

), driven by learning activities. Its core lies in quantifying short-term progress through performance differences and integrating dynamic features of the learning process. Specifically, learning gain is benchmarked by the difference in performance between adjacent learning units, a difference that reflects positive changes in the learner’s knowledge state. Simultaneously, the time interval

i t_{t}

between interactions directly impacts the gain effect—shorter intervals typically imply a more compact learning process, which may lead to higher knowledge gains. Additionally, the learner’s prior knowledge state

h_{t - 1}

moderates the gain: learners with lower initial knowledge levels often exhibit greater potential for improvement, resulting in more significant latent progress. Thus, knowledge gain comprehensively captures the dynamics and diversity of knowledge acquisition during learning by integrating three dimensions: performance differences, learning continuity (time intervals), and individual capability disparities (knowledge mastery). Therefore, the learner’s initial learning gain

\lg_{t}

can be modeled as

\lg_{t} = t a n h (W_{4}^{T} [l_{t} \oplus {it}_{t} \oplus l_{t - 1} \oplus h_{t - 1}] + b_{4})

(5)

where tanh is a non-linear activation function.

Since not all knowledge learned by a learner is fully absorbed, a learning gate

Γ_{t}

is introduced to control the learner’s knowledge absorption capacity. Specifically, the concatenation of the representation vectors from two consecutive learning interactions, the time interval, and the learner’s prior knowledge state is processed through a sigmoid function to compute the absorption capacity:

Γ_{t} = σ (W_{5}^{T} [l_{t} \oplus {it}_{t} \oplus l_{t - 1} \oplus h_{t - 1}] + b_{5})

(6)

Here, ⊕ denotes vector concatenation.

σ (\cdot)

is a non-linear sigmoid activation function. The learner’s actual learning gain

{LG}_{t}

is then obtained by multiplying the learning gate with the normalized initial learning gain:

{LG}_{t} = Γ_{t} \cdot ((\lg_{t} + 1) / 2)

(7)

To derive the exercise-specific learning gain associated with the knowledge concepts of exercise

e_{t}

, the actual gain is further multiplied by the knowledge concept vector

q_{e_{t}}

of the current exercise:

{RLG}_{t} = q_{e_{t}} \cdot {LG}_{t}

(8)

4.1.3. Forgetting Layer

Within the forgetting layer, the forgetting erasure vector

{Fe}_{t}

is computed via a sigmoid gating mechanism that integrates three inputs: (1) the previous knowledge state

h_{t - 1}

, (2) the actual learning gain vector

{LG}_{t}

and (3) the time interval vector

{it}_{t}

. The sigmoid — based erasure gate operationalizes Ebbinghaus’s exponential forgetting law, where memory strength will diminish predictably over time without reinforcement. This is formally expressed as

{Fe}_{t} = σ (W_{6}^{T} [h_{t - 1} \oplus {LG}_{t} \oplus {it}_{t}] + b_{6})

(9)

The forgetting update vector uses the tanh function to model the non-linear strengthening of knowledge through deliberate practice. This aligns with cognitive load theory’s focus on effortful encoding.

{Fa}_{t} = t a n h (W_{7}^{T} [h_{t - 1} \oplus {LG}_{t} \oplus {it}_{t}] + b_{7})

(10)

By combining the forgetting erasure vector and the forgetting update vector, the forgetting vector

F_{t}

can be obtained. The erasure vector continuously attenuates prior knowledge states, while the update vector selectively amplifies retrieval-reinforced concepts. This is a biology-inspired balance where forgetting pressures counteract with practice-driven potentiation.

F_{t} = (1 - {Fe}_{t}) (1 + {Fa}_{t})

(11)

Therefore, based on the forgetting vector

F_{t}

and the related learning gain

{RLG}_{t}

, the updated knowledge state matrix

h_{t}

is

h_{t} = {RLG}_{t} + F_{t} \cdot h_{t - 1}

(12)

4.1.4. Predicting Layer

Our model dynamically tracks learners’ knowledge states by integrating learning gains (knowledge improvement from interactions) and forgetting effects (time-dependent decay). This dual mechanism allows the model to simulate how knowledge is both acquired and eroded, providing a realistic representation of cognitive processes.

When predicting a learner’s performance on an upcoming exercise

e_{t + 1}

, the model combines the exercise’s embedding with the current knowledge state

h_{t}

. This fused representation is processed through a fully connected layer with a sigmoid activation function, generating a probability of correct response. Formally,

p_{t + 1} = σ (W_{8}^{T} [e_{t + 1} \oplus h_{t}] + b_{8})

(13)

The prediction layer outputs

p_{t + 1}

, representing the probability of a student correctly answering exercise

e_{t + 1}

, with

p_{t + 1} \in (0, 1)

. A threshold of 0.5 is applied. If

p_{t + 1} > 0.5

, the model predicts a correct answer; otherwise, the prediction is incorrect.

4.1.5. Object Function

The goal of knowledge tracing is to predict the likelihood of a student answering a question correctly. To achieve this, the objective function minimizes the negative log-likelihood between the predicted probabilities

p_{t}

and the ground-truth responses

a_{t}

, while incorporating

L 2

regularization to prevent overfitting:

L (θ) = - \sum_{t - 1}^{T} [a_{t} log p_{t} + (1 - a_{t}) log (1 - p_{t})] + λ_{θ} {∥θ∥}_{2}^{2}

(14)

where

θ

is all the parameters in HGKT, and

λ_{θ}

is a regularized hyperparameter.

5. Experiments

In this section, we present a comprehensive overview of the datasets used for evaluation, baseline models adopted for comparative analysis, and specifics of the training process.

5.1. Datasets

In our experiments, we evaluated our method on three public datasets: ASSIST2012, ASSISTChall, and EdNet-KT1. All datasets are derived from real-world sequences of student exercise–answer interactions. A detailed description of each dataset is summarized in Table 1. The detailed introduction of the datasets is as follows.

ASSIST2012 (https://drive.google.com/file/d/1cU6Ft4R3hLqA7G1rIGArVfelSZvc6RxY/view?usp=sharing (accessed on 31 July 2025)): This dataset was collected through the ASSISTments online education platform during the period spanning September 2012 to October 2013. It comprises learning interaction logs generated during practice sessions, the dataset captures learner engagement through similar exercise sequences designed for skill consolidation. Each record includes multidimensional educational metadata such as exercises, knowledge concepts, answer time and student’s answer. To optimize knowledge tracing model performance, records lacking associated knowledge concept annotations were systematically filtered during dataset preprocessing.

ASSISTChall (https://sites.google.com/view/assistmentsdatamining/dataset?authuser=0 (accessed on 31 July 2025)): Released in 2017 through a data mining competition, the comprehensive ASSISTment Challenge dataset offers the richest metadata among ASSISTments repositories. It encompasses 3162 unique questions answered by 1709 students across 102 knowledge concepts, generating 942,816 interactions. On average, this dataset exhibits substantially longer learning sequences compared to ASSIST2012.

EdNet-KT1 (https://github.com/riiid/ednet (accessed on accessed on 31 July 2025)): EdNet dataset represents the largest publicly released intelligent tutoring system (ITS) collection to date. EdNet-KT1, a subsample derived from the EdNet, originates from Santa—an AI-powered multi-platform tutoring system. There are 784,309 students which contain 131,441,538 interactions in EdNet-KT1. Due to computational constraints, we extracted interaction records from a randomly selected 10% of students as experimental data.

5.2. Baseline Methods

In order to evaluate the performance of our proposed model, we use the following models as our baselines:

DKT [2]: DKT pioneered the use of deep neural networks for knowledge tracing. Utilizing recurrent neural networks (RNNs), it effectively models the complex temporal dynamics of student interactions and learning progressions. This enables more accurate predictions of student performance and mastery, significantly advancing educational analytics and personalized learning.

DKT+ [25]: DKT+ represents an architectural refinement of the original deep knowledge tracing (DKT) framework, specifically designed to mitigate two critical limitations: input reconstruction deficiency, where the base model exhibits inadequate auto-regressive fidelity in reproducing observed interaction sequences and temporal prediction instability, characterized by inconsistent mastery trajectory projections for knowledge components across successive time steps.

DKVMN [3]: leveraging memory-augmented neural architectures, the dynamic key–value memory network (DKVMN) model advances knowledge tracing capabilities. Its core framework deploys a static key matrix for persistent concept embedding and a dynamic value matrix capturing temporal mastery evolution. This dual-memory mechanism substantially enhances neural inference fidelity for latent knowledge state estimation. The innovative integration of associative memory modules yields superior accuracy and computational efficiency in learning progression modeling, ultimately enabling adaptive instructional optimization and measurable pedagogical outcomes enhancement.

AKT [6]: AKT is a context-sensitive knowledge tracing framework that incorporates dual self-attention mechanisms to separately model exercise characteristics and learner responses. Its core innovation lies in a knowledge retriever module that leverages attention mechanisms to dynamically retrieve historical knowledge states pertinent to the target exercise, enabling contextualized prediction of learner proficiency.

SAKT [26]: SAKT model predicts student mastery by identifying the most relevant past knowledge components (KCs) for a given target KC. It leverages attention mechanisms to focus on a select subset of a student’s previous activities for each prediction. This targeted approach, relying on fewer KCs than RNN-based models, significantly mitigates the data sparsity problem common in knowledge tracing.

LPKT [19]: The learning process-consistent knowledge tracing (LPKT) framework operationalizes knowledge state monitoring by explicitly simulating cognitive learning dynamics. This approach synergistically incorporates temporal features—response latency and inter-practice intervals—to quantify incremental learning gains and memory decay across successive learning events. Mnemonic reinforcement is achieved through empirically validated forgetting curve integration, ensuring longitudinal prediction stability. Foundationally, LPKT employs a static Q-matrix to establish binary exercise–concept mappings, preserving essential assessment-skill relational semantics.

5.3. Experimental Setting

The statistical details of the three datasets are summarized in Table 2. To balance computational efficiency and sequence context, the maximum length of student response history was capped at 200 entries. The data was split into 80% for training and validation (with an 80 – 20 split between training and validation subsets) and 20% for testing.

We ordered student learning records based on answer timestamps and standardized all input sequences to fixed lengths. Sequence lengths are fixed at 100 for the ASSIST2012 and EdNet-KT1 datasets, and 500 for the ASSISTChall dataset. Sequences exceeding the threshold length were segmented into multiple independent fixed-length subsequences. Shorter sequences were padded with zero vectors to reach the target length. A standard 5-fold cross-validation protocol was employed. In each fold, 80% of students data were split as training set (80%), and validation set (20%). The remaining 20% used as the test set. Parameters were randomly initialized using a uniform distribution. All hyperparameters were learned via the training set, with optimal models selected based on validation performance. A dropout layer (rate = 0.2) was integrated into the HGKT module to mitigate overfitting. Dimension parameters

d_{k}

and

d_{e}

are set to 128,

d_{a} = 50

. Q-matrix small positive value

γ = 0.03

and learning rate is

3 \times 10^{- 3}

. Batch size is 128. To ensure repeatability and fairness, all experiments were conducted on a Windows server equipped with NVIDIA Quadro RTX 6000 GPUs.

5.4. Experiment Results

5.4.1. The Prediction Performance

Given the inherent difficulty in quantifying latent knowledge states, we operationalize model evaluation through response prediction accuracy—a well-established proxy metric for knowledge estimation fidelity in KT literature. To validate HGKT’s efficacy, comprehensive benchmarking against state-of-the-art baselines was conducted, with comparative performance on learner outcome forecasting detailed in Table 2. All experiments employ standardized evaluation protocols using the receiver operating characteristic’s area under the curve (AUC) and accuracy (ACC), both bounded in [0, 1]. Elevated metric values correspond to enhanced diagnostic capability, signifying superior model performance.

From Table 2, experimental results across three benchmark datasets (ASSIST2012, ASSISTChall, and EdNet-KT1) demonstrate the consistent superiority of the proposed HGKT model over all baseline methods, achieving state-of-the-art performance in both AUC and ACC. Specifically, HGKT improves the AUC by 1.8% compared to the previous best model, LPKT. Its effectiveness is particularly notable on the ASSISTChall dataset, characterized by the longest learning sequences, indicating enhanced adaptability to extended learning contexts. Furthermore, from Figure 4, HGKT achieves superior AUC and ACC performance on the largest dataset, EdNet-KT1, highlighting its robustness in complex learning scenarios. The model significantly outperforms traditional LSTM-based approaches (e.g., yielding an average AUC gain of 11.879% over DKT). These results collectively affirm HGKT’s effectiveness and adaptability for diverse knowledge tracing tasks.

5.4.2. Ablation Study

To validate the capability of hypergraph neural networks in capturing high-order relationships among knowledge concepts and examine the limitations of traditional binary Q-matrices(defining exercise–knowledge concept relationships), this study compares the performance between knowledge tracing models using hypergraph-enhanced embedding methods and those relying on static Q-matrix. Across three datasets, we evaluate the HGKT model and its two constructed variants against the LPKT baseline.

HGKT: Uses Q-matrix only for related learning gain;
HGKT-LG: Replaces related learning gain with direct learning gain (no Q-matrix usage;)
HGKT-Q: Entirely eliminates Q-matrix dependency;
LPKT: Uses Q-matrix for related learning gain and knowledge states update.

As revealed in Figure 5, the HGKT-Q framework’s implementation of hypergraph neural networks achieves significantly better performance than traditional Q-matrix-driven approaches LPKT, demonstrating the effectiveness of high-order relationship modeling in educational data analytics. However, the performance superiority of HGKT over HGKT-LG and HGKT-Q indicates the Q-matrix serves as a static knowledge concept-exercise mapping, its synergistic combination with learning gain through related computation enables precise exercise-specific knowledge enhancement quantification. The performance superiority of HGKT-LG over HGKT-Q indicates Q-matrix’s implicit structural benefits even without explicit multiplicative operations, whereas HGKT’s optimal performance stems from its Q-matrix-constrained explicit operations that focus learning gains on exercise-relevant concepts. These findings collectively highlight the necessity of integrating static Q-matrix with hypergraph neural networks for effective knowledge tracing.

5.4.3. Visualization of the Student Knowledge State

This study of visually demonstrates the dynamic tracing performance of the HGKT model in monitoring students’ evolving knowledge states. Figure 6 shows using 15 exercises (e1–e15) from the ASSISTChall dataset, covering three concepts: C1 (addition), C2 (ordering numbers), and C3 (subtraction). The horizontal axis represents the exercise sequence, while colored markers above indicate ✓ (correct) or × (incorrect) responses for concept-specific exercises, the following conclusions are drawn:

(1) When students answered incorrectly in e11, their mastery of related knowledge concepts still shows an upward trend, indicating that the HGKT model can effectively identify the implicit dependency relationships among related knowledge concepts and verifying the model’s ability to capture higher-order knowledge structures. While students make mistakes in exercises e8 and e13, the model detects a significant decrease in the mastery, indicating that the model can correct the assessment results of the knowledge state in real time and provide targeted feedback for knowledge gaps.

(2) During the initial learning stage, the learning gain of new knowledge concept is significant. For example, after completing exercise e6, the mastery of knowledge concept C3 increases by 27%. However, with the repeated practice of the same knowledge concept and the continuous practice of the same knowledge concept five times (exercises e7 – e11), the mastery degree of C2 only increased by 3%, indicating that there is a phenomenon of diminishing returns from repeated training. It corresponds to the marginal utility theory in cognitive psychology, suggesting that systems should prioritize recommending understudied concepts to optimize learning efficiency.

(3) The HGKT model captures the knowledge decline caused by the forgetting mechanism. After the practice of exercise e5, the mastery degree of knowledge concept C1 continuously declined from a peak of 90% to 62% because it was not covered in eight consecutive questions. The model successfully captured this typical forgetting curve. It is suggested that when the attenuation rate of knowledge point mastery is greater than a certain threshold, the review reminder mechanism should be triggered to strengthen memory retention through interval repetition.

6. Conclusions

In this work, we proposed the Hypergraph-driven Knowledge Tracing (HGKT) framework. Extensive experiments on three diverse public datasets (ASSIST2012, ASSISChall, and Ednet-KT) demonstrate HGKT’s state-of-the-art predictive performance. HGKT achieved significant improvements over baseline models, with average AUC gains of 6.71%, 8.82%, and 12.15% on ASSIST2012, ASSISChall, and Ednet-KT, respectively, alongside corresponding accuracy increases of 4.0%, 3.44%, and 3.96%. These results validate HGKT’s efficacy in predicting student knowledge state.

The key advancements stem from HGKT’s novel two-layer convolutional architecture employing hypergraph neural networks. This design effectively captures intricate high-order correlations among exercises and knowledge concepts–a critical limitation in prior models. HGKT successfully integrates rich hypergraph-derived exercise representations with essential temporal features (answer time and interval time). Furthermore, our cognitively inspired dual-gating strategy dynamically balances learning and forgetting mechanisms, enabling realistic modeling of evolving knowledge states. Visualization analyses confirm the model’s interpretability in tracing this evolution. The consistent performance gains across datasets with varying characteristics underscore HGKT’s strong practical potential for real-world educational platforms.

Future work will focus on employing knowledge distillation to develop lightweight HGKT variants, significantly reducing model complexity and inference latency to enable real-time deployment on large-scale platforms. Concurrently, we will explore integration with large foundation models (LLMs) to enable multimodal knowledge tracing, incorporating textual content, images, or audio prompts for richer contextual understanding. Building on this, further refinement of knowledge concept representation and interaction modeling will enhance generalizability across diverse educational domains and curricula. These combined enhancements aim to broaden HGKT’s applicability and solidify its role within next-generation intelligent tutoring systems.

Author Contributions

Data curation, F.M.; supervision, C.Z.; methodology, F.M. and P.L.; writing—original draft preparation, F.M.; writing—review and editing, F.M. and P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Planning Project of Chengguan District of Lanzhou (2023JSCX0004), the Outstanding Youth Fund of Gansu Academy of Sciences (2023YQ-03), and the Higher Education Research Project of Lanzhou University of Technology (GJ2024B-45).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets are available from public websites.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Liu, Q.; Shen, S.; Huang, Z.; Chen, E.; Zheng, Y. A survey of knowledge tracing. arXiv 2021, arXiv:2105.15106. [Google Scholar]
Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28, 1–12. [Google Scholar]
Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key–value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 May 2017; pp. 765–774. [Google Scholar]
Ghodai, A.; Wang, Q. Knowledge tracing with sequential key-value memory networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 175–184. [Google Scholar]
Corbett, A.; Anderson, J. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model.-User-Adapt. Interact. 1994, 4, 253–278. [Google Scholar] [CrossRef]
Ghosh, A.; Heffernan, N.; Lan, A. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event, 23–27 August 2020; pp. 2330–2339. [Google Scholar]
Yin, Y.; Dai, L.; Huang, Z.; Shen, S.; Wang, F.; Liu, Q.; Chen, E.; Li, X. Tracing knowledge instead of patterns: Stable knowledge tracing with diagnostic transformer. In Proceedings of the ACM Web Conference 2023, WWW 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 855–864. [Google Scholar]
Hou, M.; Li, X.; Guo, T.; Liu, Z.; Tian, M.; Luo, R.; Luo, W. Cognitive Fluctuations Enhanced Attention Network for Knowledge Tracing. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 14265–14273. [Google Scholar]
Cui, C.; Yao, Y.; Zhang, C.; Ma, H.; Ma, Y.; Ren, Z.; Zhang, C.; Ko, J. DGEKT: A dual graph ensemble learning method for knowledge tracing. ACM Trans. Inf. Syst. 2024, 3, 1–24. [Google Scholar] [CrossRef]
Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural networks. Web Intell. 2021, 19, 156–163. [Google Scholar] [CrossRef]
Yang, Y.; Shen, J.; Qu, Y.; Liu, Y.; Wang, K.; Zhu, Y.; Zhang, W.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In Proceedings of the Machine learning and Knowledge Discovery in Databases-European Conference, ECML PKDD 2020, Ghent, Belgium, 14–18 September 2020; pp. 299–315. [Google Scholar]
Liu, Q.; Huang, Z.; Yin, Y.; Chen, E.; Xiong, H.; Su, Y.; Hu, G. EKT: Exercise aware knowledge tracing for student performance prediction. IEEE Trans. Knowl. Data Eng. 2021, 1, 100–115. [Google Scholar] [CrossRef]
Shen, S.; Huang, Z.; Liu, Q.; Su, Y.; Wang, S.; Chen, E. Assessing student’s dynamic knowledge state by exploring the question difficulty effect. In Proceedings of the SIGIR’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 427–437. [Google Scholar]
Tong, H.; Wang, Z.; Zhou, Y.; Tong, S.; Han, W.; Liu, Q. Introducing problem schema with hierarchical exercise graph for knowledge tracing. In Proceedings of the SIGIR ’22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 405–415. [Google Scholar]
Bai, Y.; Li, X.; Liu, Z.; Huang, Y.; Guo, T.; Hou, M.; Xia, F.; Luo, W. csKT: Addressing cold-start problem in knowledge tracing via kernel bias and cone attention. Expert Syst. Appl. 2025, 266, 1–13. [Google Scholar] [CrossRef]
Chen, M.; Guan, Q.; He, Y.; He, Z.; Fang, L.; Luo, W. Knowledge Tracing Model with Learning and Forgetting Behavior. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2863–3867. [Google Scholar]
Nagatani, K.; Zhang, Q.; Sato, M.; Chen, Y.; Chen, F.; Ohkuma, T. Augmenting knowledge tracing by considering forgetting behavior. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3101–3107. [Google Scholar]
Xu, B.; Huang, Z.; Liu, J.; Shen, S.; Liu, Q.; Chen, E.; Wu, J.; Wang, S. Learning Behavior-oriented Knowledge Tracing. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 2789–2800. [Google Scholar]
Shen, S.; Liu, Q.; Chen, E.; Huang, Z.; Huang, W.; Yin, Y.; Su, Y.; Wang, S. Learning process-consistent knowledge tracing. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Virtual Event, 14–18 August 2021; pp. 1452–1460. [Google Scholar]
Fu, L.; Guan, H.; Du, K.; Lin, J.; Xia, W.; Zhang, W.; Tang, R.; Wang, Y.; Yu, Y. SINKT: A structure-aware inductive knowledge tracing model with large language model. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 632–642. [Google Scholar]
Wang, Z.; Zhou, J.; Chen, Q.; Zhang, M.; Jiang, B.; Zhou, A.; Bai, Q.; He, L. LLM-KT: Aligning large language models with knowledge tracing using a plug-and-play instruction. arXiv 2025, arXiv:2502.02945. [Google Scholar]
Ebbinghaus, H. Memory: A contribution to experimental psychology. Ann. Neurosci. 2013, 20, 155. [Google Scholar] [CrossRef] [PubMed]
Ghodai, A.; Wang, Q.; Bernardo, N. Knowledge Tracing: A Survey. ACM Comput. Surv. 2023, 224, 1–37. [Google Scholar] [CrossRef]
Minn, S.; Yu, Y.; Desmarais, M.; Zhu, F.; Vie, J. Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing. In Proceedings of the IEEE International Conference on Data Mining (ICDM), Singapore, 17–20 November 2018; pp. 1182–1187. [Google Scholar]
Yeung, C.; Yeung, D. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the Ffith Annual ACM Conference on Learing at Scale, London, UK, 26–28 June 2018; pp. 1–10. [Google Scholar]
Pandey, S.; Karypis, G. A Self Attentive model for Knowledge Tracing. In Proceedings of the 12th International Conference on Educational Data Mining, Montreal, QC, Canada, 2–5 July 2019; pp. 384–389. [Google Scholar]
Shen, S.; Liu, Q.; Chen, E.; Wu, H.; Huang, Z.; Zhao, W.; Su, Y.; Ma, H.; Wang, S. Convolutional Knowledge Tracing: Modeling Individualization in Student Learning Process. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’20), Virtual Event, 25–30 July 2020; pp. 1857–1860. [Google Scholar]
Shen, S.; Chen, E.; Liu, Q.; Huang, Z.; Huang, W.; Yin, Y.; Su, Y.; Wang, S. Monitoring Student Progress for Learning Process-Consistent Knowledge Tracing. IEEE Trans. Autom. Control 2023, 8, 8213–8227. [Google Scholar] [CrossRef]
Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph neural networks. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3558–3565. [Google Scholar]
Gao, Y.; Zhang, Z.; Lin, H.; Zhao, X.; Du, S.; Zou, C. Hypergraph learning: Methods and practices. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 2548–2566. [Google Scholar] [CrossRef] [PubMed]
Gao, Y.; Ji, S.; Han, X.; Dai, Q. Hypergraph computation. Engineering 2024, 40, 188–201. [Google Scholar] [CrossRef]

Figure 1. A toy example of exercises—concept relation graph.

Figure 2. The architecture of the HGKT model.

Figure 3. The exercise–knowledge concept relationship diagram.

Figure 4. The performance comparison of HGKT vs. baseline models.

Figure 5. The performance of HGKT.

Figure 6. The evolution of students’ knowledge state traced by HGKT.

Table 1. Statistics of all datasets.

Statistics	Datasets
Statistics	ASSIST2012	ASSISTChall	EdNet-KT1
Students	29,018	1709	784,309
Exercises	53,091	3162	12,372
Concepts	265	102	141
Questions Per Concepts	200.34	31	87.74
Answer Time	26,747	1326	9292
Interval Time	29,748	2839	41,830
Avg. Length	93.45	551.68	125.45

Table 2. Results of comparison of models on student performance prediction.

Models	ASSIST2012		ASSISTChall		EdNet-KT1
Models	AUC	ACC	AUC	ACC	AUC	ACC
DKT	0.72430	0.71265	0.72185	0.71963	0.68632	0.68903
DKT+	0.72510	0.71312	0.72210	0.71352	0.64254	0.67352
DKVMN	0.71488	0.72415	0.70630	0.70198	0.67086	0.68231
SAKT	0.72213	0.72750	0.78743	0.74150	0.68041	0.68652
AKT	0.77814	0.75352	0.75621	0.71063	0.75827	0.71024
LPKT	0.77189	0.75936	0.79837	0.73774	0.77324	0.72026
HGKT	0.78812	0.76087	0.81305	0.74533	0.78372	0.72074

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ma, F.; Zhu, C.; Lei, P. Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism. Appl. Sci. 2025, 15, 8617. https://doi.org/10.3390/app15158617

AMA Style

Ma F, Zhu C, Lei P. Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism. Applied Sciences. 2025; 15(15):8617. https://doi.org/10.3390/app15158617

Chicago/Turabian Style

Ma, Fanglan, Changsheng Zhu, and Peng Lei. 2025. "Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism" Applied Sciences 15, no. 15: 8617. https://doi.org/10.3390/app15158617

APA Style

Ma, F., Zhu, C., & Lei, P. (2025). Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism. Applied Sciences, 15(15), 8617. https://doi.org/10.3390/app15158617

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Hypergraph-Driven High-Order Knowledge Tracing with a Dual-Gated Dynamic Mechanism

Abstract

1. Introduction

2. Related Work

2.1. Knowledge Tracing

2.2. Hypergraph Neural Networks

3. Problem and Definition

4. Methodology

4.1. The HGKT Model

4.1.1. Hypergraph-Enhanced Embedding Layer

4.1.2. Learning Gain

4.1.3. Forgetting Layer

4.1.4. Predicting Layer

4.1.5. Object Function

5. Experiments

5.1. Datasets

5.2. Baseline Methods

5.3. Experimental Setting

5.4. Experiment Results

5.4.1. The Prediction Performance

5.4.2. Ablation Study

5.4.3. Visualization of the Student Knowledge State

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI