DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing

Li, Xiuyun; Yan, Zihao; Gu, Yongchun; Zhou, Siwei; Yang, Shasha

doi:10.3390/app152312767

Open AccessArticle

DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing

by

Xiuyun Li

¹,

Zihao Yan

²,

Yongchun Gu

³,

Siwei Zhou

³ and

Shasha Yang

^3,*

¹

School of Educational Sciences, Yili Normal University, Yining 835000, China

²

School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China

³

Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua 321004, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12767; https://doi.org/10.3390/app152312767

Submission received: 17 October 2025 / Revised: 14 November 2025 / Accepted: 30 November 2025 / Published: 2 December 2025

(This article belongs to the Special Issue Generative AI for Intelligent Knowledge Systems and Adaptive Learning)

Download

Browse Figures

Versions Notes

Abstract

Online learning environments generate vast amounts of student interaction data. While these records capture observable behaviors, they do not directly reveal students’ underlying knowledge states, which are essential for tracking learning progress. Knowledge tracing (KT) addresses this gap by predicting students’ future performance on exercises related to specific concepts, thereby enabling personalized learning and intelligent tutoring. Existing deep learning-based KT methods achieve promising results, but they often overemphasize either the sequential evolution of knowledge or the static structural relationships, which does not reflect the dynamic evolution of student learning. Moreover, they fail to model students’ knowledge state accurately under sparse interactions. To overcome these limitations, we propose DyGAS, a dynamic graph-augmented sequence modeling framework for knowledge tracing. The sequential module captures the dynamics pattern of knowledge acquisition and forgetting, while the structural module employs graph convolutional networks (GCN) to model inter-concept dependencies and knowledge transfer. Additionally, we propose that static knowledge modeling provides semantic priors to stabilize the representation of sparse concepts. Empirical results on three benchmark datasets demonstrate that DyGAS achieves superior performance compared to state-of-the-art methods, offering accurate and robust knowledge tracing across diverse learning scenarios.

Keywords:

knowledge tracing; graph neural networks; dynamic graph-augmented model; sequence modeling; graph learning

1. Introduction

The COVID-19 pandemic has disrupted traditional education worldwide. Online learning has experienced unprecedented growth, with students increasingly relying on platforms such as MOOCs [1], Coursera, and Khan Academy to continue their studies. These platforms generate massive amounts of student interaction data daily, including exercise attempts, quiz results, and other learning activities. Although these data reflect students’ learning behaviors, their underlying knowledge states remain unobservable, making it challenging to provide personalized learning strategies for students with different learning states [2]. Knowledge tracing [3,4] addresses this challenge by predicting students’ future performance on new exercises based on their past responses, thereby indirectly estimating their mastery of different knowledge concepts. By accurately capturing these latent knowledge states, educational systems can adapt learning paths, recommend targeted resources, and provide data-driven support for personalized learning and intelligent tutoring interventions [5,6,7].

Existing approaches can be broadly divided into traditional methods and deep learning–based methods. Traditional methods, such as Bayesian knowledge tracing [7], model student knowledge with simplified states, limiting their ability to capture complex learning behaviors. With the advancement of deep learning, sequence-based models use recurrent architectures to capture sequential dynamics in students’ knowledge [8,9,10]. These studies continue to serve as foundational frameworks in knowledge tracing, providing a baseline for dynamically modeling knowledge evolution. Memory-augmented models enhance knowledge representations with key–value structures for finer-grained mastery tracking [11,12,13]. This design explicitly maintains concept-level knowledge states, which enables dynamic updates of concept representations after each student interaction. Therefore, complex relationships between concepts can be further modeled. Attention-based approaches, inspired by the transformer architecture, capture relationships among questions and their relevance to student knowledge [14,15]. Graph-based methods leverage structural information via graph neural networks to model dependencies among concepts and questions [16,17,18,19]. This provides a structured view of students’ knowledge evolution. Inspired by these ideas, DyGAS integrates graph-based modeling with dynamic and static knowledge representations to achieve more robust and fine-grained knowledge tracing. Compared to traditional methods, these methods can capture more complex interaction patterns and knowledge structures.

Despite these advances, existing methods still face challenges in accurately modeling students’ knowledge states. The evolution of a student’s knowledge state depends both on their historical learning trajectory and on the latent relationships among knowledge concepts. The former reflects the accumulation and forgetting of knowledge, while the latter captures dependencies and transfer across concepts. Existing methods often overemphasize either the sequential evolution of knowledge or the static structural dependencies among concepts. This limitation reduces their ability to dynamically capture the update and transfer of knowledge during learning. In knowledge tracing, dynamic knowledge modeling mainly relies on students’ response interactions to update concept representations. However, due to the sparsity and potential noise in students’ response records, concept representations updated solely through interaction data are often insufficiently learned. As a result, the dynamic modeling process becomes unstable since it fails to capture the intrinsic semantic characteristics of the knowledge concepts. Based on these analyses, knowledge tracing currently faces two main challenges: First, how to jointly model the evolution of student knowledge and the structural dependencies among concepts to dynamically capture knowledge updates and transfer. Second, how to construct stable and reliable knowledge representations under sparse interaction data.

To address these challenges, we propose a novel knowledge tracing model DyGAS, which dynamically integrates sequential knowledge modeling with graph-based structural modeling to simulate students’ knowledge evolution through an interaction–knowledge accumulation–knowledge transfer process (illustrated in Figure 1). Here, interaction refers to students’ problem-solving behaviors, knowledge accumulation refers to the acquisition of knowledge gains from exercises, and transfer refers to the students’ ability to transfer and extend the learned knowledge. The sequential module models the process of acquiring knowledge gains from each interaction and dynamically integrates short-term learning gains with long-term knowledge accumulation, thereby mitigating knowledge forgetting. The structural module employs graph convolutional networks to capture dependencies among concepts, thereby modeling the process of knowledge transfer. To alleviate instability caused by sparse interactions, DyGAS additionally incorporates a static knowledge module, which constructs static embeddings by aggregating features from all exercises associated with each concept. This process captures the inherent semantic information of each concept, independent of students’ interaction patterns. The static embeddings thus act as semantic priors, stabilizing concept representations under conditions of sparse interactions. Finally, DyGAS adaptively fuses multiple knowledge representations to model students’ knowledge states more comprehensively.

Our contributions are as follows:

We propose a novel knowledge tracing model DyGAS. It integrates sequential and graph-based modeling. The sequential module captures knowledge acquisition and forgetting, while the structural module captures dependencies and knowledge transfer across concepts, together effectively simulating students’ knowledge evolution.
To mitigate instability caused by sparse interactions, DyGAS incorporates static knowledge modeling within the structural module, providing semantic priors for knowledge concepts. This complementary perspective ensures stable representations for concepts and strengthens the structural modeling of inter-concept dependencies.
DyGAS significantly improves predictive accuracy. Extensive experiments on three benchmark datasets demonstrate its superiority over state-of-the-art methods.

The remainder of this paper is organized as follows. Section 2 reviews the related work. Section 3 introduces the preliminaries. Section 4 presents the proposed method, DyGAs, in detail. Section 5 describes the algorithm implementation and provides the computational complexity analysis. Section 6 reports the experimental results. Finally, Section 7 concludes the paper.

2. Related Work

Knowledge tracing (KT) aims to capture how students’ knowledge evolves throughout the learning process and to predict their performance on subsequent concept-related tasks. Early studies in this field can be traced back to Bayesian knowledge tracing [7] and factor analysis approaches [20,21,22]. With the emergence of deep learning, KT has enabled more fine-grained and dynamic modeling of students’ knowledge states. Deep learning-based KT approaches can be grouped into four categories based on their modeling strategies: sequential, memory, attention, and graph.

Sequential-based methods leverage recurrent neural networks to effectively capture the temporal dependencies and evolving dynamics in students’ learning processes. DKT [8] is the first to introduce recurrent neural networks into knowledge tracing, modeling students’ evolving knowledge states through sequential dependencies. Building on this foundation, DKT+ [23] incorporates reconstruction errors to stabilize knowledge states, while DKT+forgetting [24] accounts for knowledge decay during the learning process. ATKT [25] leverages adversarial training to improve generalization. LPKT [9] further emphasizes consistency between students’ knowledge states and their learning process. DIMKT [10] incorporates question difficulty into KT, enabling adaptive updates according to problem difficulty. LBKT [26] extends KT by modeling multiple learning behavior features.

Memory-based methods introduce an external memory component, which enables the model to store and update concept-level knowledge states explicitly. DKVMN [11] introduces a dynamic key-value memory network to model the relationships between concepts and track each student’s evolving mastery over them. SKVMN [12] extends DKVMN by integrating sequential modeling to better capture dependencies in students’ exercise histories. DGMN [13] introduces a forget-gated attention memory network to model students’ forgetting behaviors and dynamically capture relationships among latent concepts. MAN [27] combines memory-augmented and attention-based networks with a context-aware attention mechanism to model both long-term and short-term knowledge.

Attention-based methods leverage self-attention mechanisms to alleviate the long-term dependency problem, enabling more flexible modeling of historical interactions and their varying importance. SAKT [28] was the first model to utilize a self-attention mechanism to assess the influence of past interactions on students’ concept mastery when constructing their knowledge states. SAINT [14] employs an encoder-decoder transformer to model exercises and responses separately. AKT [15] introduces a context-aware monotonic attention mechanism to model the influence of past responses. ELAKT [29] extends AKT by incorporating causal convolutions and a prediction correction module to better capture local knowledge dynamics and handle stochastic student behaviors. SparseKT [30] introduces a sparsification mechanism to focus on the most relevant student interactions.

Graph-based methods utilize graph neural networks [31,32,33] to capture complex structural relationships among concepts and exercises, thereby modeling inter-concept dependencies beyond sequential information. GKT [16] represents exercises and concepts in a graph and applies graph neural networks to propagate knowledge states, thereby capturing inter-concept dependencies more effectively. SKT [34] models concept influence propagation, considering both directed and undirected relations. HGKT [35] builds a hierarchical exercise graph to capture complex dependencies and models both knowledge and problem schema for improved tracing and interpretability. GIKT [17] leverages a graph convolutional network to model high-order question-skill interactions and captures long-term dependencies in students’ exercise histories. DGEKT [18] models heterogeneous exercise–knowledge relationships using a dual graph structure and combines them via knowledge distillation. DyGKT [19] employs dynamic graph learning with a dual time encoder to model evolving student–question–concept relationships in continuous time. L-SKSKT [36] integrates long- and short-term knowledge state representations via graph embeddings to enhance knowledge tracing and capture learning preferences.

Our method lies at the intersection of sequential and graph-based approaches, combining sequential modeling of student interactions with structural modeling over knowledge concepts. Unlike prior works that emphasize either sequential dynamics or structural dependencies, our framework explicitly integrates both perspectives. Specifically, it simulates knowledge evolution through an interaction–knowledge accumulation–knowledge transfer process. The sequential module captures short- and long-term learning dynamics, while the graph module models inter-concept dependencies and transfer. This hybrid design provides a comprehensive representation of students’ knowledge states. Through this unified perspective, our approach improves knowledge tracing by effectively bridging sequential and structural paradigms and providing an effective framework for modeling students’ dynamic learning processes.

3. Preliminary and Notation

Let

S = {s_{1}, s_{2}, \dots, s_{I}}

denote the set of students in an online learning system, where I is the total number of students. Each student

s_{i} \in S

has a sequence of interaction records

L_{i} = {(q_{1}, K_{1}, r_{1}), (q_{2}, K_{2}, r_{2}), \dots, (q_{t}, K_{t}, r_{t})}, t \leq T,

where T denotes the maximum length of the interaction sequence,

q_{j} \in Q = {q_{1}, q_{2}, \dots, q_{N}}

represents an exercise,

K_{j} \subseteq K = {k_{1}, k_{2}, \dots, k_{M}}

denotes the associated knowledge concepts, and

r_{j} \in {0, 1}

is the student’s response outcome (one for correct and zero for incorrect). Exercises and knowledge concepts follow a many-to-many relationship, one exercise may involve multiple knowledge concepts, and one concept may correspond to multiple exercises. Let the relations between exercises and knowledge concepts be represented by a binary association matrix

C = [c_{i j}] \in {0, 1}^{N \times M}

, where

c_{i j} = 1

if exercise

q_{i} \in Q

involves knowledge concept

k_{j} \in K

, and

q_{i j} = 0

otherwise. Typical symbols are summarized in Table 1.

Given a student’s learning history

L_{i}

up to time t, the task of knowledge tracing is to predict the probability

P (r_{t + 1} = 1 ∣ L_{i}, q_{t + 1}, K_{t + 1})

that student

s_{i}

will correctly answer the next exercise

q_{t + 1}

.

4. Proposed Method: DyGAS

In this section, we provide a detailed overview of the DyGAS model, for which overall architecture is illustrated in Figure 2. DyGAS combines sequential modeling with graph-based structural reasoning to capture both the evolution of students’ knowledge states and the dependencies among concepts. We first describe sequence modeling of student interactions to capture knowledge gains from exercise responses. Next, we use graph convolutional networks to model students’ current knowledge states from both dynamic and static perspectives. Finally, we explain how DyGAS predicts student responses based on these knowledge representations.

4.1. Embedding

Let

Q \in R^{N \times d}

denote the exercise embedding matrix,

K \in R^{M \times d}

the knowledge embedding matrix, N is the total number of questions, M is the total number of concepts, and d is the embedding dimension. At time step t, a student’s interaction is denoted as

(q_{t}, r_{t})

, where

q_{t}

is the attempted exercise and

r_{t} \in {0, 1}

indicates whether the response is correct. To explicitly distinguish between correct and incorrect responses, we construct the interaction embedding as

[r_{t} \cdot Q_{q_{t}} | | (1 - r_{t}) \cdot Q_{q_{t}}],

where

| |

denotes vector concatenation. A linear layer is then applied to fuse the exercise and correctness information:

x_{t} = W_{1} [r_{t} \cdot Q_{q_{t}} | | (1 - r_{t}) \cdot Q_{q_{t}}] + b_{1},

(1)

where

W_{1} \in R^{d \times 2 d}

and

b_{1} \in R^{d}

. The resulting embedding

x_{t}

constitutes the input representation for the model.

4.2. Sequential Knowledge Modeling

In knowledge tracing tasks, a student’s knowledge state is not static but evolves dynamically with each interaction. To capture this evolution, we model the sequence of student responses, which allows the model to account for both short-term performance fluctuations and long-term knowledge accumulation. At each time step, the model integrates the current exercise response with the previous knowledge state, thereby updating the representation of the student’s knowledge in a context-aware manner. Specifically, to derive the knowledge state

k_{t - 1} \in R^{d}

associated with the current exercise

q_{t}

, we retrieve the corresponding concept embedding from the previous embedding matrix

K_{t - 1} \in R^{M \times d}

:

k_{t - 1} = C_{q_{t}} \cdot K_{t - 1},

(2)

where

C_{q_{t}}

is the row of

C

corresponding to exercise

q_{t}

, representing the set of associated knowledge concepts. We then compute the interaction gain

Δ x_{t}

by combining the previous interaction embedding

x_{t - 1}

, the current interaction embedding

x_{t}

, and the retrieved knowledge state

k_{t - 1}

:

Δ x_{t} = tanh (W_{2} [x_{t - 1} | | x_{t} | | k_{t - 1}] + b_{2}),

(3)

where

W_{2} \in R^{d \times 3 d}, b_{2} \in R^{d}

are learnable parameters, and

tanh (\cdot)

denotes the hyperbolic tangent activation function. To control the extent to which the student absorbs

Δ x_{t}

, we also introduce a absorption gate

G_{t}^{α}

:

G_{t}^{α} = σ (W_{3} [x_{t - 1} | | x_{t} | | k_{t - 1}] + b_{3}),

(4)

where

W_{3} \in R^{d \times 3 d}, b_{3} \in R^{d}

are learnable parameters, and

σ (\cdot)

denotes the sigmoid activation function. The knowledge gain is then obtained by modulating the interaction gain with the gate:

{\tilde{K}}_{t} = C_{q_{t}} \cdot (G_{t}^{α} \cdot \frac{Δ x_{t} + 1}{2}) .

(5)

To update the knowledge state, we further design a forgetting gate

G_{t}^{β}

that determines how much of the previous knowledge should be retained for each knowledge concept:

G_{t}^{β} = σ (W_{4} [K_{t - 1} | | (G_{t}^{α} \cdot \frac{Δ x_{t} + 1}{2})] + b_{4}),

(6)

where

K_{t - 1}

denotes the knowledge embedding matrix before the update, and

W_{4} \in R^{d \times 3 d}, b_{4} \in R^{d}

are learnable parameters. The updated knowledge embedding is then given by:

K_{t} = {\tilde{K}}_{t} + G_{t}^{β} \cdot K_{t - 1} .

(7)

This module ensures that the model not only injects gated learning gains into the embeddings of relevant concepts, but also dynamically regulates the balance between long-term knowledge accumulation and short-term learning gains through the forgetting gate mechanism.

4.3. Structural Knowledge Modeling

Although sequence modeling can capture the sequential evolution of a student’s knowledge, it neglects the complex dependencies among knowledge concepts. In fact, certain concepts are often interrelated. Mastering one concept may facilitate understanding of related concepts, reflecting a natural knowledge transfer process in students’ learning. To capture the structural dependencies among knowledge concepts, we adopt a graph-based approach to model students’ knowledge states. Specifically, knowledge concepts are represented as nodes and their relationships as edges, allowing the model to propagate information across related concepts. This approach not only integrates the learning gains obtained from prior interactions but also generates a richer and more structured representation of the knowledge state.

4.3.1. Knowledge Graph Construction

To capture the structural dependencies among knowledge concepts, we construct a dynamic knowledge graph representing the current structural relationships of the student’s knowledge. Formally, we define the graph as the following:

G_{t} = (V_{t}, E_{t}),

where

V_{t} = {v_{1}, v_{2}, \dots, v_{M}}

denotes the set of M knowledge concepts, and each node

v_{i}

corresponds to the embedding

K_{t, i}

of the concept.

E_{t}

represents the set of edges, capturing pairwise dependencies among concepts. Specifically, the edge weight between concepts

v_{i}

and

v_{j}

is defined using the cosine similarity of their embeddings:

e_{i j}^{t} = \frac{K_{t, i}^{⊤} K_{t, j}}{∥ K_{t, i} ∥_{2} {∥ K_{t, j} ∥}_{2}} .

(8)

The adjacency matrix

A_{t} \in R^{M \times M}

is then obtained by a threshold of similarity scores:

{(A_{t})}_{i j} = \{\begin{matrix} 1, & if e_{i j}^{t} \geq τ, \\ 0, & otherwise, \end{matrix}

(9)

where

τ

is a predefined similarity threshold. This procedure ensures that edges are established only between sufficiently related concepts, allowing information to propagate selectively during graph convolution.

4.3.2. Dynamic Knowledge Modeling

Given the constructed dynamic knowledge graph

G_{t}

, we adopt a graph convolutional network to model the structural dependencies among concepts. In this network, each concept node updates its representation by aggregating information from its neighbors along the graph edges, which implements a standard message-passing operation. Specifically, the knowledge embeddings

K_{t} \in R^{M \times d}

serve as the initial node features

K_{t}^{(0)}

. At the

(j + 1)

-th graph convolution layer, the concept representations are updated by aggregating information from their neighbors:

K_{t}^{(j + 1)} = R e L U (D^{- \frac{1}{2}} {\tilde{A}}_{t} D^{- \frac{1}{2}} K_{t}^{(j)} W_{a}^{(j)}),

(10)

where

{\tilde{A}}_{t} = A_{t} + I

also is the adjacency matrix with self-connections,

D

is its degree matrix,

W_{a}^{(j)}

is a learnable weight matrix, and

R e L U (\cdot)

denotes a nonlinear activation function. The resulting

K_{t}

encodes the dynamically updated knowledge representations, reflecting the structural dependencies among concepts.

4.3.3. Static Knowledge Modeling

Dynamic knowledge modeling is highly dependent on students’ interaction sequences, making the derived concept embeddings susceptible to sparsity and instability, especially for concepts with few interactions. Moreover, such modeling primarily reflects students’ learning dynamics but overlooks the intrinsic semantic information carried by the concepts themselves. To address this limitation, we construct static knowledge embeddings by aggregating the features of all exercises associated with each knowledge concept. Specifically, for concept

k_{i}

, we compute the fused embedding of all corresponding exercises to obtain a stable and semantically enriched feature. Static knowledge embeddings serve as stable semantic priors that preserve each concept’s intrinsic meaning. This design enhances concept representations under sparse interactions and improves the robustness of knowledge transfer. Formally, building upon the defined association matrix

C \in R^{N \times M}

and exercise embedding matrix

Q \in R^{N \times d}

, we derive static knowledge embeddings

K \in R^{M \times d}

by aggregating exercise features for each concept

K = C^{⊤} Q

. Finally, each row of

K

is normalized by the number of exercises associated with the corresponding concept to ensure a consistent scale across concepts.

While the node features are derived from the static knowledge embeddings

K

, we still use the dynamically constructed adjacency matrix

A_{t}

to capture the structural relationships among concepts. This design allows the model to combine stable, semantically enriched node features with the time-varying relational patterns observed in students’ interactions. Each concept node updates its representation by aggregating information from its neighbors according to

A_{t}

:

K^{(j + 1)} = R e L U (D^{- \frac{1}{2}} {\tilde{A}}_{t} D^{- \frac{1}{2}} K^{(j)} W_{b}^{(j)}),

(11)

where

{\tilde{A}}_{t}

denotes the adjacency matrix with self-loops,

D

is the corresponding degree matrix, and

W_{b}^{(j)}

is a learnable weight matrix.

4.4. Prediction and Training

Given the obtained dynamic knowledge embeddings

K_{t}

and static knowledge embeddings

K

, we first identify the set of concepts associated with the next exercise

q_{t + 1}

. For each perspective, the corresponding exercise-specific knowledge state is derived as the following:

\begin{matrix} {\hat{k}}_{t}^{a} & = C_{q_{t + 1}} \cdot K_{t}, \\ {\hat{k}}_{t}^{b} & = C_{q_{t + 1}} \cdot K, \end{matrix}

(12)

where

C_{q_{t + 1}} \in {0, 1}^{M}

is the exercise-concept association vector. To generate the prediction, we concatenate the dynamic state

{\hat{k}}_{t}^{a} \in R^{d}

, the static state

{\hat{k}}_{t}^{b} \in R^{d}

, and the embedding of the target exercise

Q_{q_{t + 1}} \in R^{d}

. The combined representation is then projected through a linear transformation followed by the sigmoid function:

{\hat{r}}_{t + 1} = σ (W_{5} [{\hat{k}}_{t}^{a} ∥ {\hat{k}}_{t}^{b} ∥ Q_{q_{t + 1}}] + b_{5}),

(13)

where

W_{5} \in R^{3 d}

and

b_{5} \in R

are learnable parameters, and

σ (\cdot)

denotes the sigmoid function. Here,

{\hat{r}}_{t + 1} \in (0, 1)

represents the predicted probability that the student answers

q_{t + 1}

correctly.

To encourage each knowledge perspective to learn discriminative representations independently, we also perform auxiliary predictions based on the dynamic and static knowledge states. Specifically, these knowledge states are separately mapped to the probability of correctly answering

q_{t + 1}

:

\begin{matrix} {\hat{r}}_{t + 1}^{a} & = σ (W_{6} [{\hat{k}}_{t}^{a} ∥ Q_{q_{t + 1}}] + b_{6}), \\ {\hat{r}}_{t + 1}^{b} & = σ (W_{7} [{\hat{k}}_{t}^{b} ∥ Q_{q_{t + 1}}] + b_{7}), \end{matrix}

(14)

where

W_{6}, W_{7} \in R^{1 \times 2 d}

and

b_{6}, b_{7} \in R

are learnable parameters. For each prediction, the binary cross-entropy loss is computed with respect to the ground-truth labels

r_{t + 1}

:

L = - \sum_{t = 1}^{T} (r_{t + 1} log {\hat{r}}_{t + 1} + (1 - r_{t + 1}) log (1 - {\hat{r}}_{t + 1})),

(15)

L_{a} = - \sum_{t = 1}^{T} (r_{t + 1} log {\hat{r}}_{t + 1}^{a} + (1 - r_{t + 1}) log (1 - {\hat{r}}_{t + 1}^{a})),

(16)

L_{b} = - \sum_{t = 1}^{T} (r_{t + 1} log {\hat{r}}_{t + 1}^{b} + (1 - r_{t + 1}) log (1 - {\hat{r}}_{t + 1}^{b})),

(17)

where T is the number of valid time steps. The model is trained by minimizing the joint loss

L_{u}

:

L_{u} = L + ψ_{1} \cdot L_{a} + ψ_{2} \cdot L_{b},

(18)

where

ψ_{1}

,

ψ_{2} \in [0, 1]

control the relative contribution of the auxiliary losses to the total objective, enabling a flexible trade-off between the main and auxiliary objectives.

5. Algorithm Implementation and Computational Complexity Analysis

To provide a clear overview of the training procedure of DyGAS, we summarize it in Algorithm 1. We further analyze the computational complexity of the proposed model DyGAS. In the Sequential Knowledge Modeling, the student’s knowledge state is updated at each time step by integrating the previous interaction embedding, the current exercise embedding, and the retrieved knowledge state of associated concepts. This process involves linear transformations applied to concatenated vectors for computing the interaction gain and absorption gate, as well as the forgetting gate

G_{t}^{β}

, which modulates the retention of the previous knowledge state. Each of these transformations contributes

O (d^{2})

complexity per operation, and since the forgetting gate is applied across M knowledge concepts, it incurs an additional

O (M d^{2})

. Multiplication with the exercise–concept association vector of size M to update the embeddings of the associated concepts contributes

O (M d)

. Consequently, the complexity of this sequential module is

O (T (d^{2} + M d^{2} + M d))

.

The Structural Knowledge Modeling captures dependencies among knowledge concepts through graph-based aggregation, considering both dynamic and static perspectives. In the dynamic perspective, at each time step, the adjacency matrix is constructed by computing pairwise cosine similarities among M knowledge concept embeddings, incurring

O (M^{2} d)

complexity. This adjacency matrix is then normalized using degree-based scaling, contributing an additional

O (M^{2})

complexity, and is subsequently used in a graph convolution that first maps the node features via a weight matrix (

O (M d^{2})

) and then aggregates neighbor information (

O (M^{2} d)

), resulting in a total complexity of

O (M^{2} d + M d^{2})

. Consequently, the per-step complexity of the dynamic modeling is

O (M^{2} d + M d^{2} + M^{2})

. In the static perspective, knowledge embeddings are precomputed by aggregating the features of all associated exercises, with a cost of

O (N M d)

, where N is the total number of exercises, while the graph convolution still needs to be applied per time step, resulting in the same per-step complexity

O (M^{2} d + M d^{2})

. The total computational cost of the Structural Knowledge Modeling module is

O (T (M^{2} d + M d^{2} + M^{2}) + N M d)

, where the first term accounts for per-step graph convolution computations and adjacency construction, and the second term corresponds to the static feature aggregation.

In the Prediction and Training module, the model first derives exercise-specific knowledge states by multiplying the exercise–concept association vector

C q_{t + 1}

with the dynamic and static knowledge embeddings, respectively. This projection operation contributes

O (M d)

complexity. The obtained representations, together with the embedding of the target exercise, are concatenated and passed through a linear transformation to produce the main prediction, which incurs

O (d^{2})

complexity. In addition, auxiliary predictions are computed by concatenating the exercise embedding with either the dynamic or the static knowledge state, each followed by a linear transformation, which incurs

O (d^{2})

complexity. Consequently, the overall complexity of this prediction module is

O (T (M d + d^{2}))

.

Considering all components of the DyGAS model, the overall computational complexity is dominated by the structural knowledge modeling. Specifically, the sequential knowledge updates contribute a lower-order term of

O (M d^{2})

per time step, and the prediction module contributes

O (M d)

per step, both of which are negligible compared to the graph-based computations. The structural knowledge modeling requires

O (M^{2} d + M d^{2})

operations per step for adjacency construction and graph convolution, and the static knowledge aggregation contributes

O (N M d)

. Therefore, the total computational cost of the model over a sequence of length T can be expressed as

O (T (M^{2} d + M d^{2}) + N M d)

, where the first term accounts for the per-step graph convolution, and the second term corresponds to the static feature aggregation.

Algorithm 1 Training process of DyGAS.

Require:: Student interaction history $L_{i}$ , exercise–knowledge matrix $C$ , embedding dimension d, training steps T, loss weights $ψ_{1}, ψ_{2}$ .
1:: Initialize model parameters $θ$ .
2:: for not convergence do
3:: for $t = 0$ , … , $T - 1$ do
4:: Encode interaction $(q_{t}, r_{t})$ into embedding $x_{t}$ (Equation (1)).
5:: The sequential model updates knowledge state $K_{t}$ via gated evolution (Equations (2)–(7)).
6:: Construct dynamic knowledge graph $G_{t}$ with threshold $τ$ (Equations (8) and (9)).
7:: The structural model applies GCN to obtain dynamic embeddings $K_{t}^{(j)}$ (Equation (10)).
8:: Aggregate exercise–knowledge relations and apply GCN to learn static embeddings $K^{(j)}$ (Equation (11)).
9:: Fuse dynamic and static embeddings to predict ${\hat{r}}_{t + 1}$ (Equations (12) and (13)).
10:: Compute auxiliary predictions ${\hat{r}}_{t + 1}^{a}, {\hat{r}}_{t + 1}^{b}$ (Equation (14)).
11:: end for
12:: Calculate losses $L, L_{a}, L_{b}$ (Equations (15)–(17)).
13:: Update parameters $θ$ by minimizing $L_{u}$ (Equation (18)).
14:: end for
15:: return trained parameters $θ$

6. Experiments

In this section, we conduct a series of experiments to address the following questions:

RQ1: How does DyGAS perform compared with baseline methods based on different knowledge modeling strategies?
RQ2: What is the impact of each component of DyGAS on the overall performance?
RQ3: Can DyGAS effectively track the evolution of students’ knowledge?
RQ4: How sensitive is DyGAS to the choice of hyperparameters?
RQ5: What is the time complexity of DyGAS?
RQ6: Can DyGAS learn discriminative representations of exercises?

6.1. Datasets

We assess the effectiveness of the proposed model on three real-world public datasets. The overall statistics are reported in Table 2, and their details are summarized as follows:

ASSIST2009 (https://sites.google.com/site/assistmentsdata/home/2009-2010-assistment-data, accessed on 1 November 2025). A widely used benchmark dataset collected from U.S. high school students’ online learning activities. It contains students’ response logs, exercise content, and personal information, enabling the analysis of learning patterns, knowledge hierarchies, and performance prediction.
ASSIST2017 (https://sites.google.com/view/assistmentsdatamining/data-mining-competition-2017, accessed on 1 November 2025). A more challenging benchmark that evaluates students’ problem-solving abilities across diverse knowledge areas. It originates from courses led by Professors Neil Heffernan and Ryan Baker and has been frequently adopted to test the robustness of knowledge tracing models.
Junyi (https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=1198, accessed on 1 November 2025). Collected from the Junyi Academy, a large-scale online learning platform, this dataset provides detailed student interaction logs. It records not only practice behaviors but also rich exercise-related information, including names, availability, prerequisite relations, knowledge map positions, and creation dates.

Table 2. Statistics of the datasets.

Statistics	Datasets
	ASSIST2009	ASSIST2017	Junyi
Records	297,343	942,816	4,316,340
Learners	3006	1709	1000
Exercises	9798	3162	701
Concepts	107	102	39

6.2. Baselines

We conduct comparisons against several representative state-of-the-art baselines, which can be categorized into five groups: traditional, sequential, memory, attention, and graph methods.

BKT (Traditional) models students’ mastery of knowledge concepts via hidden Markov models [7].
DKT (Sequential-based) leverages recurrent neural networks to model students’ knowledge dynamics over time [8].
DKVMN (Memory-based) uses a key-value memory architecture to model and update a student’s mastery level for each knowledge concept [11].
SAKT (Attention-based) employs a self-attention mechanism to selectively model the influence of relevant past exercises [28].
AKT (Attention-based) leverages a monotonic attention mechanism to capture students’ knowledge evolution [15].
HiTSKT (Attention-based) uses a hierarchical transformer with session- and interaction-level encoders to capture students’ knowledge dynamics [37].
GKT (Graph-based) leverages graph neural networks to model the latent relationships among knowledge concepts [16].
GIKT (Graph-based) employs graph convolutional networks to model high-order exercise–knowledge correlations and student–exercise interactions, effectively capturing long-term dependencies [17].
DGEKT (Graph-based) constructs a dual graph structure to capture heterogeneous exercise-concept relations and interaction transitions [18].
L-SKSKT (Graph-based) models both long- and short-term student knowledge states using graph-based embeddings [36].

6.3. Experiments Setup and Evaluation Metrics

The DyGAS model was implemented in PyTorch 1.1 [38] with a learning rate set to 0.001 and a batch size of 64. To mitigate overfitting, we applied L2 weight decay of 1 × 10⁻⁵ and a dropout rate of

0.2

. All sequences were standardized to a fixed length of 100 time steps (

T = 100

) by truncation or padding. In each dataset, 80% of sequences were allocated for training and the remaining 20% for testing [39,40]. The embedding dimension d was selected from {16, 32, 64, 128, 256}, the similarity threshold

τ

, used for constructing the knowledge graph, was varied from zero to one in steps of 0.1, and the auxiliary loss balancing parameters

ψ_{1}

and

ψ_{2}

were varied from zero to one in increments of 0.2. The influence of these hyperparameters on model performance is discussed in hyperparameter sensitivity analysis.

We evaluated the performance of our model using three commonly adopted metrics in knowledge tracing tasks [41,42,43,44]: area under the curve (AUC), accuracy (ACC), and root mean square error (RMSE). Both AUC and ACC range between zero and one, where larger values indicate better predictive performance. RMSE also falls within the 0–1 range, with smaller values reflecting lower prediction error and thus more accurate predictions.

6.4. Overall Comparision (RQ1)

Table 3 reports the performance comparison between DyGAS and all baselines, where the best and second-best results are marked in bold and underline, respectively. Overall, DyGAS achieves the strongest performance on all three datasets, substantially surpassing baselines from traditional, sequential, memory, attention, and graph neural network paradigms. Compared with the strongest competitor L-SKSKT, DyGAS obtains remarkable improvements: on ASSIST2009, it boosts AUC by 4.61% and reduces RMSE by 8.45%; on ASSIST2017, it raises ACC by 1.19% and lowers RMSE by 2.02%; and on the large-scale Junyi dataset, it still secures consistent gains across all metrics, including a 0.86% reduction in RMSE.

Specifically, BKT relies on an overly simplistic Markov assumption, making it difficult to capture the dynamic evolution of students’ knowledge. Although DKT, as a sequential model, can track knowledge changes over time, it depends solely on a single sequential state representation. In contrast, DyGAS’s sequential modeling leverages memory gating to dynamically integrate short-term knowledge gains with long-term accumulation, resulting in superior performance compared to these methods. DKVMN’s memory mechanism maintains the mastery state of students’ knowledge, but for concepts with few associated interactions, the knowledge state is updated infrequently, limiting the model’s ability to track learning progress. DyGAS addresses this by incorporating static knowledge modeling, providing stable semantic information to compensate for infrequent knowledge updates from sparse interactions. SAKT and AKT highlights relevant past responses through self-attention, and HiTSKT employs a hierarchical transformer to capture session- and interaction-level dynamics. However, these attention-based methods often overlook the inherent structural relationships among knowledge concepts. In contrast, DyGAS leverages graph-based modeling to capture such complex dependencies, resulting in more accurate and comprehensive modeling of knowledge structures. GKT excels at capturing static dependencies between knowledge concepts, GIKT models higher order exercise–knowledge relations, DGEKT constructs graphs and hypergraphs to capture exercise–knowledge associations, and L-SKSKT employs graph neural networks to represent both long- and short-term knowledge states. However, these methods generally emphasize global or static structures and lack dynamic modeling of students’ knowledge evolution over time. By integrating sequential and graph-based modeling, DyGAS captures knowledge dependencies and dynamically tracks students’ knowledge evolution, achieving state-of-the-art performance on all three datasets.

6.5. Ablation Study (RQ2)

Table 4 presents the ablation study of DyGAS across three datasets. Five model variants are evaluated to examine the contribution of different components. (1) Auxiliary loss (AU): Omitting the auxiliary loss leads to a slight decline in AUC and ACC across all datasets, accompanied by a minor increase in RMSE. This suggests that the auxiliary objectives contribute to more discriminative representation learning for each knowledge aspect independently, which in turn helps capture informative features that enhance predictive performance. (2) Dynamic knowledge modeling (DM): Removing this component results in a noticeable decline in performance. This indicates that integrating a dynamic knowledge graph over knowledge concepts facilitates the transfer of knowledge between related concepts, thereby enhancing the model’s ability to capture evolving knowledge states. (3) Static knowledge modeling (SM): Removing the static knowledge component consistently degrades performance. Static embeddings serve as stable semantic priors that complement sparse interaction data, particularly for knowledge concepts with limited associated exercises. In the absence of static modeling, the concept representations learned solely from dynamic sequences become more susceptible to sparsity and instability, reducing the model’s overall predictive reliability. (4) Entire structural knowledge modeling (AL): Removing all structure-related components, including dynamic and static knowledge modeling as well as auxiliary supervision, leads to the largest performance drop. This highlights the critical role of graph-based structural modeling in capturing inter-concept dependencies, propagating knowledge gains. Without this component, the model relies solely on sequential knowledge modeling, losing the ability to represent the relational structure of knowledge and thereby weakening its capacity to model students’ knowledge states. (5) Forgetting gate (FG): Removing the forgetting gate results in a notable decrease in performance. This mechanism enables the model to selectively retain or decay historical knowledge when integrating it with the current interactions, thereby facilitating the adaptive fusion of long-term and short-term knowledge states. Overall, the findings confirm that all components are effective and contribute to the model’s ability to capture evolving knowledge states and inter-concept dependencies.

6.6. Visualization of Knowledge Evolution (RQ3)

We further analyze an instance to understand how our model captures a student’s evolving knowledge state over a sequence of 15 exercises, as shown in Figure 3. For example, consider the evolution of knowledge concept

K_{1}

across exercises T1–T5. At T1, the student answers incorrectly, and the corresponding prediction probability is 0.31, indicating a limited initial understanding. Following a correct response on T2, the probability sharply rises to 0.70, reflecting knowledge acquisition. At T3, despite an incorrect response, the probability remains relatively high (0.62) compared to T1, suggesting that the student preserves previously accumulated knowledge. Subsequent correct responses at T4 and T5 lead to further increases in prediction probabilities (0.77 and 0.84), demonstrating continued knowledge consolidation. For knowledge concept

K_{3}

, the student initially shows weak mastery, with low prediction probabilities at

q_{11}

and

q_{12}

. After correctly answering

q_{13}

, the probability sharply rises to 0.94, reflecting a strong positive update from this success. Although the student answers incorrectly at

q_{14}

, the probability remains relatively high (0.80), suggesting that the model retains the influence of the accumulated knowledge from the prior correct response. However, the subsequent error at

q_{15}

reduces the probability back to 0.36, indicating a decline in the student’s estimated mastery level after consecutive mistakes. Comparing DyGAS with the DyGAS-AL variant predictions, we observe that DyGAS consistently produces slightly higher prediction probabilities. For instance, at T2 and T10, where the student answers correctly, DyGAS predicts probabilities of 0.70 and 0.94, respectively, whereas DyGAS-AL predicts 0.61 and 0.90. This difference arises because DyGAS leverages the structural relationships among knowledge concepts: knowledge gains in one concept can positively influence related concepts. In contrast, DyGAS-AL relies solely on sequential interactions, which captures sequential patterns but may underestimate the accumulated knowledge due to the lack of structural context. Therefore, DyGAS can reflect the probability of correct responses, especially for knowledge concepts that benefit from related prior learning.

This analysis demonstrates that our model effectively tracks the dynamic evolution of a student’s knowledge state. By capturing structural relationships among knowledge concepts, DyGAS not only reflects accumulated knowledge states but also enhances the modeling of concepts influenced by related knowledge, highlighting the important role of structural knowledge modeling in knowledge tracing. Figure 4 further illustrates how mastery of certain concepts positively influences related concepts as the student progresses through the exercises.

6.7. Hyperparameter Sensitivity Analysis (RQ 4)

In this section, we investigate the impact of key hyperparameters on the performance of DyGAS, including the feature embedding dimension d, the association threshold

τ

for constructing the dynamic knowledge graph, and the balance coefficients of the auxiliary losses,

ψ_{1}

and

ψ_{2}

.

Impact of d. The effect of feature embedding dimension on DyGAS is shown in Figure 5. For ASSIST2009 and ASSIST2017, increasing d generally leads to improvements in AUC and ACC, while RMSE decreases, indicating more accurate predictions. However, the performance gains gradually diminish as d becomes larger, suggesting that extremely high-dimensional embeddings provide limited additional benefit and may increase computational cost during training. On the Junyi dataset, the impact of d is less pronounced. Both ACC and RMSE remain relatively stable across different embedding dimensions, while AUC exhibits a slight increase at moderate dimensions and a minor decrease when d becomes very large. This decline may be attributed to overfitting and the redundancy introduced by excessively high-dimensional embeddings. Overall, these observations suggest that moderate embedding dimensions (e.g.,

d = 64

or 128) offer a good trade-off between representational capacity and computational efficiency. They are sufficient to capture interactions among exercises and concepts without introducing excessive redundancy or training overhead.

Impact of

τ

. The impact of the similarity threshold

τ

on dynamic knowledge graph construction is reported in Figure 6. The overall performance is relatively stable when

τ

varies between zero and one. Specifically, on ASSIST2009, higher thresholds (e.g.,

τ = 0.7

) slightly improve both AUC and ACC while reducing RMSE, suggesting that filtering weaker edges helps the model focus on stronger concept dependencies and yields better structural representations. The results on ASSIST2017 exhibit modest variations, with AUC and ACC fluctuating within a narrow range across thresholds, indicating that the dataset’s relational structure is not highly sensitive to changes in graph sparsity. For Junyi, the metrics remain stable across different thresholds, indicating that its concept relations are relatively independent, so removing weaker edges has little impact on the overall structure. Overall, these results suggest that moderate thresholds (e.g.,

τ = 0.7

–

0.8

) tend to provide a good balance between filtering out noisy edges and preserving meaningful structural relations. While the overall performance remains relatively stable across datasets, ASSIST2009 shows a slight preference for higher thresholds, whereas ASSIST2017 and Junyi exhibit only marginal changes. This indicates that the effectiveness of thresholding is dataset-dependent rather than universally consistent.

Impact of

ψ_{1}, ψ_{2}

. To investigate the influence of auxiliary loss weights on DyGAS, we vary

ψ_{1}

and

ψ_{2}

over

{0.0, 0.2, 0.4, 0.6, 0.8, 1.0}

and report the results in Table 5. The hyperparameter

ψ_{1}

controls the contribution of the auxiliary loss associated with the dynamic knowledge state, whereas

ψ_{2}

regulates the auxiliary loss from the static knowledge state. The results demonstrate a discernible contrast in the impact of the coefficients

ψ_{1}

and

ψ_{2}

. The influence of

ψ_{1}

on predictive performance is relatively subtle. For instance, on ASSIST2009, the optimal AUC and RMSE were achieved at

ψ_{1} = 0.0

, with only marginal gains in ACC observed at higher values. Performance on ASSIST2017 and Junyi remained largely stable, with peak metrics occurring at

ψ_{1} = 1.0

. This suggests that enhancing the dynamic knowledge contribution through

ψ_{1}

provides, a limited performance benefit. In contrast,

ψ_{2}

exhibited a more pronounced and generally positive association with model performance across datasets, highlighting the critical role of the static knowledge perspective. Performance on ASSIST2009 improved monotonically with

ψ_{2}

, peaking at

ψ_{2} = 0.6

. Similarly, on ASSIST2017, optimal results were attained at

ψ_{2} = 0.8

, while on Junyi, the highest performance was achieved at

ψ_{2} = 1.0

. These observations indicate that the auxiliary loss

ψ_{2}

effectively strengthens the static concept embeddings, leading to more stable representations and consequently, enhanced predictive accuracy and robustness. Overall, the auxiliary loss parameters provide independent supervision for the dynamic and static knowledge perspectives, helping the model better capture dynamic knowledge evolution and stabilize concept representations.

6.8. Comparison on Time Cost (RQ5)

To evaluate the computational efficiency of DyGAS, we report the training time of different knowledge tracing models on the ASSIST2009 dataset (Figure 7). Runtime is a critical factor for educational applications, where models need to process long interaction sequences quickly. All experiments were conducted on a single NVIDIA RTX 3060 (12GB) GPU with an Intel(R) Xeon(R) CPU E5-2673 v4 and 16GB RAM, and the reported runtime reflects the actual wall-clock training time per epoch.

As shown in the results, our proposed model DyGAS is more costly than baselines such as DKT (0.05), DKVMN (0.10), and attention-based models SAKT (0.05), AKT (0.16). However, they are substantially more efficient than graph-based baselines including GKT (1.03), GIKT (1.38), and particularly DGEKT (5.43). This demonstrates that our method achieves a favorable trade-off between incorporating graph relational information and maintaining acceptable runtime. The key reason for this efficiency is that, unlike prior models that explicitly construct and update large-scale exercise graphs or exercise-knowledge graphs, our method restricts graph structural modeling to the knowledge concept level, where the graph is significantly smaller. Modeling dependencies between knowledge concepts is not only computationally lighter but also pedagogically meaningful, as concepts capture the underlying cognitive structures shared across multiple exercises and naturally reflect the knowledge transfer process in student learning. Consequently, the model avoids expensive message passing over thousands of exercise nodes while still preserving the essential relational information for effective prediction. Finally, comparing DyGAS with its variant DyGAS-AL demonstrates the cost of graph-based structural modeling. While DyGAS requires slightly more time, the overhead remains modest and acceptable considering the performance improvements. This confirms that our concept-level graph component introduces only limited additional complexity while significantly enhancing knowledge tracking capability.

6.9. Exercises Clustering (RQ6)

To investigate whether training can organize question features meaningfully, we conducted the visual experiment on the ASSIST2009 dataset. Specifically, five knowledge concepts are randomly selected and used the corresponding question embeddings as samples. We visualized both the initial and trained question embeddings using the t-SNE method. As shown in the Figure 8, the initial embeddings are scattered with no obvious clustering, whereas the trained embeddings form more distinct clusters that largely correspond to the labeled knowledge concepts. The questions belonging to the same concept tend to be grouped together, although some clusters are slightly close to each other, which may reflect semantic similarities between concepts or similar question types. These results indicate that, after training, the model has learned more discriminative question embeddings, with questions from the same knowledge concept forming tighter clusters. The proximity between some clusters may reflect potential relationships among knowledge concepts, which may help identify patterns specific to knowledge concepts.

7. Conclusions

In this paper, we proposed DyGAS, a novel knowledge tracing model that dynamically integrates sequential modeling with graph-based methods. DyGAS frames the learning process through interaction–knowledge accumulation–knowledge transfer. The sequential module captures knowledge gains from interactions, supporting accumulation, while the structural module leverages graph convolutional networks to model concept dependencies, facilitating knowledge transfer. To mitigate instability caused by data sparsity, we further introduce a static knowledge module into the structural modeling, providing stable semantic priors for knowledge concepts. Experiments on three benchmark datasets demonstrate that DyGAS consistently outperforms state-of-the-art baselines in prediction accuracy. Additional analyses validate the contribution of each module, the robustness of the model under different hyperparameter settings, and its ability to provide interpretable insights into knowledge evolution. Overall, these results suggest that combining sequential dynamics with structural dependencies enables a more comprehensive simulation of students’ knowledge evolution and offers a new perspective for designing knowledge tracing models.

In future work, we plan to explore the adaptability of DyGAS in cold-start or sparse-learning scenarios through meta- or transfer-learning strategies, and further explore its applicability beyond knowledge tracing to broader educational tasks such as personalized content recommendation and adaptive question generation, which could enhance its potential in intelligent learning environments.

Author Contributions

Conceptualization, X.L., Z.Y., Y.G. and S.Y.; methodology, X.L., Z.Y., Y.G. and S.Y.; software, S.Z.; validation, X.L., Z.Y., Y.G., S.Z. and S.Y.; formal analysis, X.L., S.Z. and S.Y.; investigation, S.Y.; resources, X.L.; data curation, X.L.; writing—original draft, X.L., Z.Y., Y.G., S.Z. and S.Y.; writing—review & editing, X.L., Z.Y., Y.G., S.Z. and S.Y.; visualization, X.L., Z.Y., Y.G., S.Z. and S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

Siwei Zhou acknowledged the Zhejiang Provincial College Student Science and Technology Innovation Program (Xinmiao Talent Program, No. 2025R404AB061).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Vardi, M.Y. Will MOOCs destroy academia? Commun. ACM 2012, 55, 5. [Google Scholar] [CrossRef]
Sun, J.; Wei, M.; Feng, J.; Yu, F.; Li, Q.; Zou, R. Progressive knowledge tracing: Modeling learning process from abstract to concrete. Expert Syst. Appl. 2024, 238, 122280. [Google Scholar] [CrossRef]
Song, X.; Li, J.; Cai, T.; Yang, S.; Yang, T.; Liu, C. A survey on deep learning based knowledge tracing. Knowl.-Based Syst. 2022, 258, 110036. [Google Scholar] [CrossRef]
Abdelrahman, G.; Wang, Q.; Nunes, B. Knowledge tracing: A survey. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar] [CrossRef]
Anderson, J.R.; Boyle, C.F.; Corbett, A.T.; Lewis, M.W. Cognitive modeling and intelligent tutoring. Artif. Intell. 1990, 42, 7–49. [Google Scholar] [CrossRef]
Villano, M. Probabilistic student models: Bayesian belief networks and knowledge space theory. In International Conference on Intelligent Tutoring Systems; Springer: Berlin/Heidelberg, Germany, 1992; pp. 491–498. [Google Scholar]
Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1994, 4, 253–278. [Google Scholar] [CrossRef]
Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28, 505–513. [Google Scholar]
Shen, S.; Liu, Q.; Chen, E.; Huang, Z.; Huang, W.; Yin, Y.; Su, Y.; Wang, S. Learning process-consistent knowledge tracing. In Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Virtual Event, Singapore, 14–18 August 2021; pp. 1452–1460. [Google Scholar]
Shen, S.; Huang, Z.; Liu, Q.; Su, Y.; Wang, S.; Chen, E. Assessing student’s dynamic knowledge state by exploring the question difficulty effect. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 427–437. [Google Scholar]
Zhang, J.; Shi, X.; King, I.; Yeung, D.Y. Dynamic key-value memory networks for knowledge tracing. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 765–774. [Google Scholar]
Abdelrahman, G.; Wang, Q. Knowledge tracing with sequential key-value memory networks. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 175–184. [Google Scholar]
Abdelrahman, G.; Wang, Q. Deep graph memory networks for forgetting-robust knowledge tracing. IEEE Trans. Knowl. Data Eng. 2022, 35, 7844–7855. [Google Scholar] [CrossRef]
Choi, Y.; Lee, Y.; Cho, J.; Baek, J.; Kim, B.; Cha, Y.; Shin, D.; Bae, C.; Heo, J. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the 7th ACM Conference on Learning@ Scale, Virtual Event, USA, 12–14 August 2020; pp. 341–344. [Google Scholar]
Ghosh, A.; Heffernan, N.; Lan, A.S. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Virtual Event, CA, USA, 6–10 July 2020; pp. 2330–2339. [Google Scholar]
Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece, 14–17 October 2019; pp. 156–163. [Google Scholar]
Yang, Y.; Shen, J.; Qu, Y.; Liu, Y.; Wang, K.; Zhu, Y.; Zhang, W.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Virtual, 14–18 September 2020; Springer: Cham, Switzerland, 2020; pp. 299–315. [Google Scholar]
Cui, C.; Yao, Y.; Zhang, C.; Ma, H.; Ma, Y.; Ren, Z.; Zhang, C.; Ko, J. DGEKT: A dual graph ensemble learning method for knowledge tracing. ACM Trans. Inf. Syst. 2024, 42, 1–24. [Google Scholar] [CrossRef]
Cheng, K.; Peng, L.; Wang, P.; Ye, J.; Sun, L.; Du, B. DyGKT: Dynamic graph learning for knowledge tracing. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 409–420. [Google Scholar]
Cen, H.; Koedinger, K.; Junker, B. Learning factors analysis—A general method for cognitive model evaluation and improvement. In International Conference on Intelligent Tutoring Systems; Springer: Berlin/Heidelberg, Germany, 2006; pp. 164–175. [Google Scholar]
Pavlik, P.I., Jr.; Cen, H.; Koedinger, K.R. Performance Factors Analysis—A New Alternative to Knowledge Tracing. In Proceedings of the 2009 Conference on Artificial Intelligence in Education: Building Learning Systems That Care: From Knowledge Representation to Affective Modelling, Brighton, UK, 6–10 July 2009; pp. 531–538. [Google Scholar]
Vie, J.J.; Kashima, H. Knowledge tracing machines: Factorization machines for knowledge tracing. Proc. AAAI Conf. Artif. Intell. 2019, 33, 750–757. [Google Scholar] [CrossRef]
Yeung, C.K.; Yeung, D.Y. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the 5th Annual ACM Conference on Learning at Scale, London, UK, 26–28 June 2018; pp. 1–10. [Google Scholar]
Nagatani, K.; Zhang, Q.; Sato, M.; Chen, Y.Y.; Chen, F.; Ohkuma, T. Augmenting knowledge tracing by considering forgetting behavior. In Proceedings of the 28th International Conference on World Wide Web, San Francisco, CA, USA, 13–17 May 2019; pp. 3101–3107. [Google Scholar]
Guo, X.; Huang, Z.; Gao, J.; Shang, M.; Shu, M.; Sun, J. Enhancing knowledge tracing via adversarial training. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 367–375. [Google Scholar]
Xu, B.; Huang, Z.; Liu, J.; Shen, S.; Liu, Q.; Chen, E.; Wu, J.; Wang, S. Learning behavior-oriented knowledge tracing. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 2789–2800. [Google Scholar]
He, L.; Li, X.; Wang, P.; Tang, J.; Wang, T. MAN: Memory-augmented Attentive Networks for Deep Learning-based Knowledge Tracing. ACM Trans. Inf. Systems. 2023, 42, 1–22. [Google Scholar] [CrossRef]
Pandey, S.; Karypis, G. A self-attentive model for knowledge tracing. In Proceedings of the 12th International Conference on Educational Data Mining, Montréal, QC, Canada, 2–5 July 2019; pp. 384–389. [Google Scholar]
Pu, Y.; Liu, F.; Shi, R.; Yuan, H.; Chen, R.; Peng, T.; Wu, W. ELAKT: Enhancing Locality for Attentive Knowledge Tracing. ACM Trans. Inf. Syst. 2024, 42, 1–27. [Google Scholar] [CrossRef]
Huang, S.; Liu, Z.; Zhao, X.; Luo, W.; Weng, J. Towards robust knowledge tracing models via k-sparse attention. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 2441–2445. [Google Scholar]
Xie, Y.; Jia, J.; Wen, C.; Li, D.; Li, M. Multi-topology contrastive graph representation learning. Sci. China Inf. Sci. 2026, 69, 122102. [Google Scholar] [CrossRef]
Xie, Y.; Chang, Y.; Li, M.; Qin, A.; Zhang, X. AutoSGRL: Automated framework construction for self-supervised graph representation learning. Neural Netw. 2025, 194, 108119. [Google Scholar] [CrossRef] [PubMed]
Yang, G.; Li, M.; Feng, H.; Zhuang, X. Deeper insights into deep graph convolutional networks: Stability and generalization. IEEE Trans. Pattern Anal. Mach. Intell. 2025. [Google Scholar] [CrossRef] [PubMed]
Tong, S.; Liu, Q.; Huang, W.; Hunag, Z.; Chen, E.; Liu, C.; Ma, H.; Wang, S. Structure-based knowledge tracing: An influence propagation view. In Proceedings of the 20th IEEE International Conference on Data Mining, Sorrento, Italy, 17–20 November 2020; pp. 541–550. [Google Scholar]
Tong, H.; Wang, Z.; Zhou, Y.; Tong, S.; Han, W.; Liu, Q. Introducing Problem Schema with Hierarchical Exercise Graph for Knowledge Tracing. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 405–415. [Google Scholar]
Yu, G.; Xie, Z.; Zhou, G.; Zhao, Z.; Huang, J.X. Exploring long-and short-term knowledge state graph representations with adaptive fusion for knowledge tracing. Inf. Process. Manag. 2025, 62, 104074. [Google Scholar] [CrossRef]
Ke, F.; Wang, W.; Tan, W.; Du, L.; Jin, Y.; Huang, Y.; Yin, H. HiTSKT: A hierarchical transformer model for session-aware knowledge tracing. Knowl.-Based Syst. 2024, 284, 111300. [Google Scholar] [CrossRef]
Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
Shen, X.; Yu, F.; Liu, Y.; Liang, R.; Wan, Q.; Yang, T.; Shi, M.; Sun, J. Enhancing knowledge tracing with question-based contrastive learning. Knowl.-Based Syst. 2025, 325, 113899. [Google Scholar] [CrossRef]
Yin, Y.; Dai, L.; Huang, Z.; Shen, S.; Wang, F.; Liu, Q.; Chen, E.; Li, X. Tracing knowledge instead of patterns: Stable knowledge tracing with diagnostic transformer. In Proceedings of the 32nd ACM Web Conference, Austin, TX, USA, 30 April–4 May 2023; pp. 855–864. [Google Scholar]
Chen, M.; Guan, Q.; He, Y.; He, Z.; Fang, L.; Luo, W. Knowledge tracing model with learning and forgetting behavior. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 3863–3867. [Google Scholar]
Long, T.; Qin, J.; Shen, J.; Zhang, W.; Xia, W.; Tang, R.; He, X.; Yu, Y. Improving knowledge tracing with collaborative information. In Proceedings of the 15th ACM International Conference on Web Search and Data Mining, Virtual Event, AZ, USA, 21–25 February 2022; pp. 599–607. [Google Scholar]
Song, X.; Li, J.; Tang, Y.; Zhao, T.; Chen, Y.; Guan, Z. Jkt: A joint graph convolutional network based deep knowledge tracing. Inf. Sci. 2021, 580, 510–523. [Google Scholar] [CrossRef]
Zhang, M.; Zhu, X.; Zhang, C.; Pan, F.; Qian, W.; Zhao, H. No length left behind: Enhancing knowledge tracing for modeling sequences of excessive or insufficient lengths. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 3226–3235. [Google Scholar]

Figure 1. Subplot (I) shows a sequence of six interaction records (

q_{1}

–

q_{6}

) associated with three knowledge concepts (

K_{1}

–

K_{3}

). Knowledge tracing predicts the student’s responses and models the evolution of concept-level knowledge states, visualized as a series of radar charts. Subplot (II) illustrates the learning process, where interaction denotes the answering and feedback, accumulation represents knowledge gain after each interaction, and transfer propagates the gained knowledge to related concepts according to inter-concept relationships.

Figure 1. Subplot (I) shows a sequence of six interaction records (

q_{1}

–

q_{6}

) associated with three knowledge concepts (

K_{1}

–

K_{3}

). Knowledge tracing predicts the student’s responses and models the evolution of concept-level knowledge states, visualized as a series of radar charts. Subplot (II) illustrates the learning process, where interaction denotes the answering and feedback, accumulation represents knowledge gain after each interaction, and transfer propagates the gained knowledge to related concepts according to inter-concept relationships.

Figure 2. The overall architecture of the proposed DyGAS model, consisting of (I) a sequential knowledge modeling module, (II) a structural knowledge modeling module (including dynamic and static knowledge modeling), and (III) a prediction and training module.

Figure 3. Visualization of a student’s knowledge state evolution over a sequence of 15 exercises

q_{1}

–

q_{15}

across three knowledge concepts

K_{1}

–

K_{3}

on the ASSIST2009. Each cell represents the predicted probability of an exercise. The DyGAS jointly leverages sequential and structural knowledge modeling, while the DyGAS-AL variant uses only sequential modeling.

Figure 3. Visualization of a student’s knowledge state evolution over a sequence of 15 exercises

q_{1}

–

q_{15}

across three knowledge concepts

K_{1}

–

K_{3}

on the ASSIST2009. Each cell represents the predicted probability of an exercise. The DyGAS jointly leverages sequential and structural knowledge modeling, while the DyGAS-AL variant uses only sequential modeling.

Figure 4. The evolution of knowledge adjacency (T = 0 and T = 15). As learning progresses, the similarity among

K_{1}, K_{2}

, and

K_{3}

increases.

Figure 4. The evolution of knowledge adjacency (T = 0 and T = 15). As learning progresses, the similarity among

K_{1}, K_{2}

, and

K_{3}

increases.

Figure 5. Impact of d on performance.

Figure 6. Impact of

τ

on performance.

Figure 6. Impact of

τ

on performance.

Figure 7. Training time per epoch in minutes for various knowledge tracing models on ASSIST2009. DKT represents a sequential model, and DKVMN is memory-based. SAKT and AKT are attention-based. GKT, GIKT, and DGEKT are graph-based. DyGAS corresponds to the proposed model, and DyGAS-AL is the variant without the graph-based structural modeling module.

Figure 8. t-SNE visualization of exercise embeddings before and after training on ASSIST2009. Nodes of different colors represent exercises associated with different knowledge concepts.

Table 1. Summary of key notations and description.

Symbol	Description
N	Total number of exercises
M	Total number of knowledge concepts
$K_{t}$	The knowledge set related to the t-th exercise
$q_{t}$	The t-th exercise attempted by a student
$r_{t}$	Correctness of the response to $q_{t}$ (1=correct, 0=incorrect)
$x_{t}$	Interaction embedding of the t-th exercise
d	Embedding dimension
$Q$	Exercise embedding matrix of size $N \times d$
$K_{t}$	Dynamic knowledge embedding matrix at time t of size $M \times d$
$K$	Static knowledge embedding matrix of size $M \times d$
$C$	Exercise–concept association matrix of size $N \times M$
$Δ x_{t}$	Interaction gain vector at time t
$G_{t}^{α}$	Absorption gate controlling knowledge gain incorporation
$G_{t}^{β}$	Gate controlling retention of previous knowledge
$G_{t} = (V_{t}, E_{t})$	Dynamic knowledge graph at time t
$A_{t}$	Adjacency matrix of the knowledge graph, shape $M \times M$
$e_{i j}^{t}$	Edge weight between concepts i and j at time t
${\hat{k}}_{t}^{a}$	Dynamic exercise-specific knowledge state vector
${\hat{k}}_{t}^{b}$	Static exercise-specific knowledge state vector
${\hat{r}}_{t + 1}$	Predicted probability that the student answers $q_{t + 1}$ correctly
$L, L_{a}, L_{b}$	Binary cross-entropy losses for main and auxiliary predictions
$ψ_{1}, ψ_{2}$	Weighting coefficients for auxiliary losses

Table 3. Performance comparison between DyGAS and baselines. “-’’ denotes unavailable results. Each model is run five times and the mean metrics are reported (variance < 0.001). ↑ means higher is better; ↓ means lower is better. The best result is in bold and the second-best is underlined.

Method	ASSIST2009			ASSIST2017			Junyi
	AUC ↑	ACC ↑	RSME ↓	AUC ↑	ACC ↑	RSME ↓	AUC ↑	ACC ↑	RSME ↓
BKT [7]	0.6571	0.6209	0.4897	0.6365	0.5983	0.4993	0.7100	0.6819	0.4683
DKT [8]	0.7431	0.7127	0.4372	0.7295	0.6940	0.4454	0.7586	0.8326	0.3536
DKVMN [11]	0.7401	0.7038	0.4416	0.7511	0.7043	0.4379	0.7565	0.8324	0.3544
SAKT [28]	0.7111	0.6885	0.4545	0.6605	0.6694	0.4626	0.7590	0.8323	0.3544
AKT [15]	0.7766	0.7289	0.4273	0.7548	0.7108	0.4308	0.7593	0.8325	0.3538
HiTSKT [37]	0.7766	0.7539	0.4112	0.7566	0.7011	0.4453	-	-	-
GKT [16]	0.7231	0.7091	0.4575	0.7566	0.7132	0.4324	0.7589	0.8331	0.3535
GIKT [17]	0.7896	0.7315	0.4221	0.7623	0.6985	0.4325	-	-	-
DGEKT [18]	0.7656	0.7599	0.4198	0.7755	0.7208	0.5283	-	-	-
L-SKSKT [36]	0.8448	0.8159	0.3846	-	-	-	0.7769	0.8291	0.3499
DyGAS	0.8837	0.8186	0.3521	0.7828	0.7294	0.4221	0.7771	0.8374	0.3469
% Improve	4.61%	0.33%	−8.45%	0.94%	1.19%	−2.02%	0.03%	0.52%	−0.86%

Table 4. Ablation Study of different variants. Variants include: AU, removing the auxiliary loss used to supervise dynamic and static knowledge modeling; DM, removing dynamic knowledge modeling; SM, removing static knowledge modeling that provides stable embeddings for each concept; AL, removing all structure-related components, including dynamic and static knowledge modeling and auxiliary supervision; and FG, removing the forgetting gate that selectively retains or discards historical knowledge. ↑ means higher is better; ↓ means lower is better. The best result is in bold.

Variants	ASSIST2009			ASSIST2017			Junyi
	AUC ↑	ACC ↑	RSME ↓	AUC ↑	ACC ↑	RSME ↓	AUC ↑	ACC ↑	RSME ↓
AU	0.8802	0.8166	0.3547	0.7825	0.7291	0.4423	0.7745	0.8369	0.3473
DM	0.8820	0.8165	0.3541	0.7809	0.7281	0.4233	0.7740	0.8365	0.3476
SM	0.8797	0.8146	0.3553	0.7816	0.7286	0.4226	0.7748	0.8367	0.3474
AL	0.8767	0.8135	0.3556	0.7806	0.7281	0.4234	0.7727	0.8361	0.3481
FG	0.8659	0.8049	0.3667	0.7648	0.7164	0.4302	0.7603	0.8341	0.3513
DyGAS	0.8837	0.8186	0.3521	0.7828	0.7294	0.4221	0.7771	0.8374	0.3469

Table 5. Impact of

ψ_{1}

and

ψ_{2}

on Performance. ↑ means higher is better; ↓ means lower is better. The best result is in bold.

Table 5. Impact of

ψ_{1}

and

ψ_{2}

on Performance. ↑ means higher is better; ↓ means lower is better. The best result is in bold.

Param	Value	ASSIST2009			ASSIST2017			Junyi
		AUC ↑	ACC ↑	RMSE ↓	AUC ↑	ACC ↑	RMSE ↓	AUC ↑	ACC ↑	RMSE ↓
$ψ_{1}$	0.0	0.8764	0.8132	0.3590	0.7826	0.7288	0.4223	0.7748	0.8370	0.3473
	0.2	0.8744	0.8114	0.3606	0.7824	0.7290	0.4222	0.7750	0.8371	0.3471
	0.4	0.8742	0.8120	0.3604	0.7822	0.7282	0.4227	0.7752	0.8371	0.3473
	0.6	0.8748	0.8121	0.3602	0.7823	0.7290	0.4225	0.7753	0.8370	0.3472
	0.8	0.8740	0.8126	0.3601	0.7823	0.7286	0.4223	0.7757	0.8374	0.3469
	1.0	0.8753	0.8138	0.3596	0.7828	0.7294	0.4221	0.7771	0.8374	0.3469
$ψ_{2}$	0.0	0.8703	0.8095	0.3632	0.7823	0.7288	0.4226	0.7762	0.8373	0.3469
	0.2	0.8711	0.8104	0.3627	0.7826	0.7296	0.4223	0.7763	0.8374	0.3468
	0.4	0.8728	0.8116	0.3612	0.7828	0.7292	0.4222	0.7761	0.8374	0.3468
	0.6	0.8754	0.8134	0.3591	0.7831	0.7296	0.4220	0.7759	0.8373	0.3469
	0.8	0.8746	0.8121	0.3605	0.7835	0.7301	0.4218	0.7756	0.8372	0.347
	1.0	0.8753	0.8138	0.3596	0.7828	0.7294	0.4221	0.7771	0.8374	0.3469

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, X.; Yan, Z.; Gu, Y.; Zhou, S.; Yang, S. DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing. Appl. Sci. 2025, 15, 12767. https://doi.org/10.3390/app152312767

AMA Style

Li X, Yan Z, Gu Y, Zhou S, Yang S. DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing. Applied Sciences. 2025; 15(23):12767. https://doi.org/10.3390/app152312767

Chicago/Turabian Style

Li, Xiuyun, Zihao Yan, Yongchun Gu, Siwei Zhou, and Shasha Yang. 2025. "DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing" Applied Sciences 15, no. 23: 12767. https://doi.org/10.3390/app152312767

APA Style

Li, X., Yan, Z., Gu, Y., Zhou, S., & Yang, S. (2025). DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing. Applied Sciences, 15(23), 12767. https://doi.org/10.3390/app152312767

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

DyGAS: Dynamic Graph-Augmented Sequence Modeling for Knowledge Tracing

Abstract

1. Introduction

2. Related Work

3. Preliminary and Notation

4. Proposed Method: DyGAS

4.1. Embedding

4.2. Sequential Knowledge Modeling

4.3. Structural Knowledge Modeling

4.3.1. Knowledge Graph Construction

4.3.2. Dynamic Knowledge Modeling

4.3.3. Static Knowledge Modeling

4.4. Prediction and Training

5. Algorithm Implementation and Computational Complexity Analysis

6. Experiments

6.1. Datasets

6.2. Baselines

6.3. Experiments Setup and Evaluation Metrics

6.4. Overall Comparision (RQ1)

6.5. Ablation Study (RQ2)

6.6. Visualization of Knowledge Evolution (RQ3)

6.7. Hyperparameter Sensitivity Analysis (RQ 4)

6.8. Comparison on Time Cost (RQ5)

6.9. Exercises Clustering (RQ6)

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI