Article

A Knowledge Tracing Model Based on Hierarchical Heterogeneous Graphs

1 School of Mathematics and Computer Application, Shangluo University, Shangluo 726000, China
2 Department of Computer Science, University of California Davis, Davis, CA 94555, USA
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(3), 500; https://doi.org/10.3390/math14030500
Submission received: 7 October 2025 / Revised: 20 December 2025 / Accepted: 22 December 2025 / Published: 30 January 2026
(This article belongs to the Special Issue Applied Mathematics for Information Security and Applications)

Abstract

Whether learners can correctly complete exercises is influenced by multiple factors, including their mastery of relevant knowledge concepts and the interdependencies among these concepts. To investigate how the structure of the knowledge space—particularly the complex relationships among learners, exercises, and knowledge points—affects learning outcomes, this study proposes the Hierarchical Heterogeneous Graph Knowledge Tracing model (HHGKT). A hierarchical heterogeneous graph was constructed to capture two types of interactions—“learner–knowledge concept” and “exercise–knowledge concept”—and incorporate the interdependencies among knowledge concepts into the graph structure. By leveraging this hierarchical representation, the model’s ability to characterize learners and exercises was enhanced. A hierarchical heterogeneous graph encompassing users, exercises, and knowledge concepts was built based on the ASSISTments dataset, and simulation experiments were conducted. The results indicate that the proposed structure effectively represents the complexity of the knowledge space. Incorporating knowledge concept interdependencies improves prediction accuracy by 1.79%, while the hierarchical heterogeneous graph outperforms traditional bipartite graphs by approximately 1.5 percentage points in accuracy. These findings demonstrate that the model better integrates node and relational information, offering valuable insights for knowledge space modeling and its application in educational contexts.

1. Introduction

Knowledge tracing (KT) is a core technology in the field of educational intelligence, aiming to dynamically model learners’ evolving knowledge states by analyzing their historical learning behaviors—such as answer records and interaction sequences—and to accurately predict future performance [1]. This approach enables both teachers and learners to gain more precise insights into learning progress. Based on these predictions, learning paths can be planned for learners and learning resources can be recommended. It represents an important support for achieving “teaching according to learners’ aptitude” and is a key technical component of adaptive learning.
With the rapid growth of online education and the increasing demand for personalized learning, how to provide adaptive support based on learners’ real-time knowledge acquisition has become a hot topic in educational technology research, attracting the attention of many experts and scholars [2,3,4,5,6]. Knowledge tracing methods can be broadly divided into traditional methods [2] and deep learning methods [3,4,5,6]. Traditional methods suffer from high labor costs and a lack of real-time performance, making them unsuitable for adaptive education systems, so current research on knowledge tracing mainly focuses on deep learning methods. Because deep learning can capture complex patterns in data and the intrinsic relationships between data, many experts and scholars have constructed knowledge tracing models based on it [1,3,4,5,6].
Graph-structured data, with their flexible combination of nodes and edges, can effectively represent multi-modal and highly complex relational information. In recent years, graph neural networks (GNNs) have been widely applied in knowledge tracing due to their powerful modeling capabilities for graph-structured data [4,5,6]. For instance, GKT [4] transformed the knowledge tracing problem into a temporal node classification task by constructing a knowledge concept relationship graph, thereby enhancing prediction performance. However, nodes such as learners, exercises, and knowledge concepts often possess multi-modal characteristics [5,6,7,8,9,10], and traditional bipartite graphs are insufficient to fully represent the complex relationships among them. To address this issue, in JKT [5], the multi-dimensional relationship graph including “exercise–exercise” and “knowledge concept–knowledge concept” was constructed, and then a knowledge tracing model was established. Multiple types of relationship graphs such as “problem–problem”, “skill–problem”, “learner–problem”, and “learner–skill” were introduced into SPKT [6], leading to the development of a knowledge tracing model based on multiple graphs.
Using multiple graphs can effectively represent various association relationships among nodes. However, the association relationships between graphs are difficult to mine, which limits the representational ability of such models. To enhance a system’s ability to represent multiple association relationships and multi-modal data, a data structure that integrates these complex relationships needs to be constructed. Such a structure must handle issues such as multi-modal data representation and the mining of multiple association relationships, which poses significant challenges for the modeling of knowledge tracing systems.
Based on this, researchers have conducted extensive studies on the impact of the representation of data multi-modality and multiple association relationships between nodes in knowledge tracing models on their performance [7,8,9,10]. However, there are still deficiencies in research on the mechanism of the impact of knowledge space structure representation on the performance of knowledge tracing models and its influence on educational computation. Therefore, how to construct a knowledge space structure that can integrate multiple modalities and multiple association relationships, as well as clarify the mechanism of the impact of knowledge space structure on educational computation, is an urgent problem that needs to be solved in the field of educational computation.
For this purpose, a hierarchical heterogeneous graph of the user–exercise–knowledge concept was built based on the multi-modality of nodes and the multiple association relationships between nodes. A knowledge tracing model was constructed based on the hierarchical heterogeneous graph to analyze the impact of knowledge space representation on model performance. The aim was to explore the influence of various association relationships on educational computation and provide references for the construction and optimization of the model for educational computation. The main contributions of this study are threefold:
(1)
A hierarchical heterogeneous graph among students, exercises, and knowledge points is constructed to capture the complex relationships among various nodes and enhance the model’s representation ability.
(2)
The hierarchical attention mechanism is employed to achieve information transmission and aggregation between nodes. By integrating the information of students, exercises, and knowledge points, the student nodes and exercise nodes are represented. Compressed attention is adopted for information fusion.
(3)
A hierarchical heterogeneous graph was constructed for the ASSISTments dataset, and comprehensive simulation experiments were subsequently conducted to evaluate model performance and validate the effectiveness of the proposed approach.

2. Related Work

2.1. Graph-Based Knowledge Tracing

Graph data structures not only contain information about various nodes but also provide information on the connection between them. Therefore, graph data structures have been applied in various fields, such as knowledge graphs in different domains. In graph-structured data, complex relationships between nodes are represented through a combination of various nodes and relationships. Knowledge tracing can be regarded as exploring the correlation between learners and exercises, as well as between learners and knowledge concepts. Predicting whether a learner can correctly complete an exercise essentially involves judging the learner’s mastery of the knowledge concepts related to the exercise. Thus, it can be seen that the essence of knowledge tracing is to discover the correlation between learners, exercises, and knowledge concepts. Compared with sequence data, graph data can better represent such correlation relationships. In recent years, graph-based knowledge tracing has received increasing attention [4,5,6,7,8,9,10].
The GKT (graph-based knowledge tracing) model was developed by Nakagawa et al. [4]. It merges learner–exercise interaction sequences with a knowledge concept graph, thereby transforming knowledge tracing into a node classification problem oriented towards time series. This approach can fully explore the influence of the dependency relationships among knowledge concepts on an individual’s ability to complete exercises. However, the knowledge concept graph only considers the dependency relationships among knowledge concepts and neglects other association relationships. A contrastive learning framework was constructed by Song et al. [5] to mine the “exercise-to-exercise” (E2E) and “concept-to-concept” (C2C) relationships from both global and local perspectives. However, the relationships between the nodes in the graph are too simplistic. Yan et al. [6] assumed that one exercise is associated with one knowledge concept, and a KT model was built based on the “exercise–skill” and “learner–exercise” relationships. However, in reality, one exercise is often related to one or more knowledge concepts. Based on the one-to-many relationship between exercises and knowledge concepts, the KT model was modeled by Ju et al. [7], but the influence of multiple relationships such as “knowledge concept–knowledge concept” on the model was ignored. Multiple relationships such as “exercise-to-exercise”, “exercise-to-knowledge concept”, and “knowledge concept-to-knowledge concept” were introduced into the model by Song et al. [8] to improve its performance. The knowledge tracing model was constructed by Wang et al. [9] using the “learner–exercise” association graph, knowledge concept association hypergraph, and directed transformation graph, and multiple relationships in multiple graphs were utilized to enhance its representation ability. However, it is difficult to fully represent the inter-graph association information. A knowledge tracing model was constructed by Wu et al. [10] based on heterogeneous hypergraphs, and intra-graph and inter-graph attention mechanisms were adopted for information transmission and aggregation of neighbor information.
In the above methods, the knowledge tracing model was built based on the strong ability of graph-structured data to represent information, and one or more types of association relationships between nodes were integrated into the model. However, it is difficult for bipartite graphs to represent the complex association relationships between users and exercises, and incorporating the association relationships among multiple graphs into the model also proves challenging. Therefore, it is necessary to construct a data structure and information extraction model that includes complex relationships.

2.2. Hierarchical Heterogeneous Graph

The GNN has a significant advantage in representing nodes of heterogeneous graphs. In GNNs, node information is represented by information transmission and aggregation from neighboring nodes. However, the differences in node types are ignored in node representation. Although different types of neighbor nodes can be sampled through meta-paths and the importance of different types of nodes can be identified by using the attention mechanism, in knowledge tracing, multiple same-type nodes are always connected to another type of node. For example, learner nodes are connected to exercise nodes, exercise nodes are connected to knowledge concept nodes, and knowledge concept nodes are connected to learning resource nodes. This many-to-many structure belongs to a hierarchical graph structure, as shown in Figure 1a. Each user’s subgraph has a tree-like structure. For example, the subgraph of learner u1 is shown in Figure 1b, and its essence is also a hierarchical graph structure.
A heterogeneous graph was constructed by Xie et al. [11] based on “user–knowledge concept”, which is essentially a bipartite graph with only one type of relationship, namely “user–knowledge concept”. A heterogeneous graph for users, exercises, and knowledge concepts was built by Zhang et al. [12], but the graph only contained two types of relationships: “user–exercise” and “exercise–knowledge concept”. Multiple relationships were captured, as proposed by Xie et al. [13], through a combination of heterogeneous and homogeneous graphs. A hierarchical tree-like structure was constructed by Qiao et al. [14] based on the characteristics of heterogeneous graphs, and a graph representation learning neural network model was built to achieve the transmission and aggregation of node information, but the impact of different relationships on the model was not considered. A graph structure containing three different levels of information, namely neighbors, meta-paths, and cycles, was constructed by Song et al. [15] to exploit the complementarity among different levels of structural information. In the work of Tian et al. [16], a hierarchical heterogeneous graph was constructed for three node types—users, recipes, and ingredients. By enabling the transmission and aggregation of information across both nodes and their interconnecting relationships, this approach effectively enhances the system’s representational capacity.
The above research achievements are mainly based on the research of knowledge space structure representation methods and model construction. It can be seen that knowledge space structure representation has shifted from bipartite graphs to more complex heterogeneous graphs, but the transmission and aggregation of various relationship information in the graph still need to be further improved. In addition, in the knowledge space, there are more hierarchical relationships, membership relationships, association relationships, etc. How to represent, transmit, and aggregate different relationships still needs further exploration.
In this study, the hierarchical relationships (including association, dependency, membership relationships, etc.) in the knowledge space were introduced into the knowledge tracing model, and a knowledge tracing model based on hierarchical heterogeneous graphs was proposed. First, a “user–knowledge concept–exercise” hierarchical heterogeneous graph was constructed. Subsequently, to achieve superior information flow, node-level and relationship-level attention mechanisms were adopted. This facilitates the transmission and aggregation of both node and relationship information. As a result, the model integrates this diverse information into the vector representations of user and exercise nodes, which strengthens its representational power and boosts its performance. Constructivist learning theory was integrated into the model in this study, through which changes in users’ mastery of knowledge concepts can be dynamically captured, and references for the optimization of knowledge space structure representation in different application scenarios can be provided.
The remainder of this study is organized as follows. Section 3 describes the construction method of hierarchical heterogeneous graphs and builds a knowledge tracing model based on hierarchical heterogeneous graphs. Section 4 conducts simulation experiments on the above model to verify the effectiveness of the hierarchical heterogeneous graph structure in educational subject computation and analyze the influence of various relationships. The last section summarizes this paper and presents relevant conclusions.

3. HHGKT Model

The HHGKT model constructed in this study is shown in Figure 2. The complex structural relationships among learners, knowledge concepts, and exercises are represented by building a hierarchical heterogeneous graph. The hierarchical heterogeneous graph neural network is used to complete the node representation layer by layer. Then, a two-layer attention mechanism is adopted to achieve information transmission and aggregation in the hierarchical heterogeneous graph. Each layer of attention includes node-level adaptive attention and relationship-level adaptive attention to obtain the node feature representations of learners and exercises. Finally, the prediction of exercises is achieved through the fully convolutional network (FCN) and the cross-entropy loss function.

3.1. Construction of Hierarchical Heterogeneous Graph and Node Embedding

In this study, a hierarchical heterogeneous graph was used to model the relationships among learners, exercises, and knowledge concepts. The constructed hierarchical heterogeneous graph can be represented as $G = (S, K, E, R)$, where $S$, $K$, and $E$, respectively, denote the learner nodes, knowledge concept nodes, and exercise nodes, and $R$ represents the relationships between nodes. Four types of relationships are introduced in this study. The “learner–exercise” relationship, denoted $R_{SE}$, is obtained from the interaction records between learners and exercises. The “exercise–knowledge concept” relationship, denoted $R_{EK}$, is obtained from the Q matrix. The “learner–knowledge concept” relationship, denoted $R_{SK}$, is obtained by combining $R_{SE}$ and $R_{EK}$. The “knowledge concept–knowledge concept” relationship, denoted $R_{KK}$, is obtained from the knowledge concept graph. The knowledge concept graph is a directed acyclic graph, but in this study it is regarded as an undirected graph. Based on the above nodes and relationships, a hierarchical heterogeneous graph can be obtained.
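As a minimal illustration of how these relation sets fit together (using tiny hypothetical adjacency matrices, not the paper’s data), the “learner–knowledge concept” relation can be derived by composing the interaction relation with the Q matrix, and the directed concept graph can be symmetrized:

```python
import numpy as np

# Toy setup (hypothetical sizes): 3 learners, 4 exercises, 2 knowledge concepts.
# R_SE: learner-exercise interactions; R_EK: the Q matrix (exercise-concept).
R_SE = np.array([[1, 1, 0, 0],
                 [0, 1, 1, 0],
                 [0, 0, 0, 1]])
R_EK = np.array([[1, 0],
                 [1, 0],
                 [0, 1],
                 [0, 1]])

# R_SK follows by composing the two relations: a learner touches a concept
# if any attempted exercise covers it (boolean matrix product).
R_SK = (R_SE @ R_EK > 0).astype(int)

# R_KK: treat the directed concept graph as undirected, as stated in the text.
R_KK_directed = np.array([[0, 1],
                          [0, 0]])
R_KK = ((R_KK_directed + R_KK_directed.T) > 0).astype(int)

print(R_SK)
print(R_KK)
```

The composition step mirrors the text’s statement that $R_{SK}$ is obtained by combining $R_{SE}$ and $R_{EK}$; the concrete matrices are only illustrative.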
The hierarchical heterogeneous graph SKE constructed in this study includes learner nodes S, knowledge concept nodes K, and exercise nodes E. The characteristics of each type of node are different. The learner nodes correspond to learner IDs. The exercise and knowledge concept nodes can be represented by text. To simplify the model, all three types of nodes in this study are numbered and one-hot-encoded.
In the heterogeneous graph $G$, the nodes $S$, $E$, and $K$ adopt different modalities: learners are represented by IDs, exercises by text, and knowledge concepts by the knowledge concept graph, as stated above. To simplify the model, all three node types are represented by IDs and one-hot-encoded. Since the feature spaces of the three node types differ, feature mapping is employed to transform them into the same feature space, denoted $H_S$, $H_K$, and $H_E$, respectively.
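The one-hot encoding plus feature mapping step can be sketched as follows (toy node counts, random projection matrices, and the embedding dimension are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot(num_nodes):
    # Each node is identified only by its ID, so the one-hot feature
    # matrix of a node type is simply the identity matrix.
    return np.eye(num_nodes)

num_S, num_E, num_K, d = 3, 4, 2, 8  # toy sizes; d is the shared embedding dim

# Type-specific projection matrices map each one-hot space into R^d,
# putting all three node types into the same feature space.
W_S = rng.standard_normal((num_S, d))
W_E = rng.standard_normal((num_E, d))
W_K = rng.standard_normal((num_K, d))

H_S = one_hot(num_S) @ W_S
H_E = one_hot(num_E) @ W_E
H_K = one_hot(num_K) @ W_K

print(H_S.shape, H_E.shape, H_K.shape)
```

With one-hot inputs, the mapped features are just rows of the projection matrices; in practice these matrices would be learned jointly with the rest of the model.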

3.2. Node-Level Attention

In this study, a two-layer attention mechanism is adopted to achieve information transmission and aggregation. Each layer of attention includes node-level attention and relation-level attention, as shown in Figure 2. In the hierarchical heterogeneous graph SKE designed in this study, nodes at different layers influence each other. In the knowledge concept layer, knowledge concepts also influence each other, as shown in the knowledge concept graph in Figure 3. There are dependency relationships among knowledge concepts, but to simplify the model, the directed knowledge concept graph as shown in Figure 3 is regarded as an undirected graph in this study.
For information transmission in the constructed hierarchical heterogeneous graph, the exercise nodes and the knowledge concept nodes play different roles in the embedding of learner nodes, which can be regarded as calculating the importance of one type of node to another. Through node-level attention, the importance of each type of node to its neighboring nodes in the hierarchical heterogeneous graph can be identified, and the corresponding node embeddings can be obtained, as shown in Equation (1).
$\mathrm{att}_{node}\left(H_S^l, H_E^l, H_K^l\right) \rightarrow H_{*,R}^{l+1},$
Here, $H_{*,R}^{l+1}$ denotes the node embedding of a certain type of node after node-level attention, $*$ denotes any one of the three types of nodes (learner, exercise, and knowledge concept nodes), and $R$ denotes the relationship between nodes, including the four types of relationships outlined in Section 3.1.
Due to the varying importance of nodes to their neighboring nodes under different relationships (for instance, the importance of knowledge concept nodes and learner nodes to exercise nodes differs between the “exercise–knowledge concept” relationship and the “learner–exercise” relationship), node importance in this study is computed with respect to a relationship $r$. Under relationship $r$, the importance of node $n_j$ to node $n_i$ can be calculated using Equation (2).
$e_{ij}^{r} = \mathrm{att}_{node}\left(\left(n_i, n_j\right); r\right),$
In the above equation, $\mathrm{att}_{node}$ is the node-level attention module and $(n_i, n_j)$ is a node pair. Under relation $r$, the attention weight of $(n_i, n_j)$ depends on the features of both node $n_i$ and node $n_j$. $e_{ij}^{r}$ represents the importance of node $n_j$ to node $n_i$ under relation $r$. Since $(n_i, n_j)$ and $(n_j, n_i)$ correspond to different directions, $e_{ij}^{r}$ and $e_{ji}^{r}$ are not the same, i.e., $e_{ij}^{r}$ is asymmetric.
Then, to incorporate structural information into the model, the relationship-based node importance is normalized to obtain the weight coefficient. Normalization prevents the instability caused by overly large or small importance values and makes the parameter update process more stable. The normalization is given by Equation (3).
$\alpha_{ij}^{r} = \mathrm{softmax}\left(e_{ij}^{r}\right) = \dfrac{\exp\left(\sigma\left(a_r^{\mathrm{T}}\left[h_i \,\Vert\, h_j\right]\right)\right)}{\sum_{k \in N_i^r} \exp\left(\sigma\left(a_r^{\mathrm{T}}\left[h_i \,\Vert\, h_k\right]\right)\right)},$
Here, $\sigma$ is the activation function, $a_r$ is the node-level attention vector under relationship $r$, $\Vert$ represents the concatenation operation, and $[h_i \Vert h_j]$ is the concatenation of $h_j$ after $h_i$. $N_i^r$ denotes the set of neighboring nodes of node $n_i$ under relation $r$. The attention coefficient $\alpha_{ij}^{r}$ between nodes $n_i$ and $n_j$ is obtained through normalization.
As shown in Equation (3), the attention coefficient $\alpha_{ij}^{r}$ depends on both the features of the node pair $[h_i \Vert h_j]$ and the relationship $r$. Since the importance of node $n_j$ to node $n_i$ differs from that of node $n_i$ to node $n_j$, and the neighbor set $N_i^r$ of node $n_i$ is distinct from the neighbor set $N_j^r$ of node $n_j$, the attention coefficient $\alpha_{ij}^{r}$ is asymmetric.
Then, the embedding of node $n_i$ under relation $r$ can be obtained by aggregating the weighted information of its neighboring nodes. The resulting embedding of node $n_i$ under relation $r$ is formally expressed in Equation (4).
$z_i^{r} = \sigma\left(\sum_{j \in N_i^r} \alpha_{ij}^{r} h_j\right),$
$\alpha_{ij}^{r}$ weights the features of neighboring nodes under relation $r$, and the information from these neighboring nodes is aggregated through Equation (4). Due to the inherent characteristics of heterogeneous graphs, graph data exhibit relatively high variance. To mitigate the impact of excessive variance on model performance, multi-head attention is employed to enhance model stability. By repeating the node-level attention mechanism $K$ times and concatenating the resulting embeddings, the final node-level attention embedding is generated, as formalized in Equation (5).
$h_i^{l+1} = \Big\Vert_{k=1}^{K}\, z_i^{r},$
Here, $\Vert$ denotes the concatenation operation, indicating that the outputs of the multi-head attention are concatenated; each $z_i^{r}$ is produced by Equation (4).
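Equations (2)–(5) can be sketched compactly in NumPy. This is a toy, illustrative implementation: the sizes, neighbor lists, and random parameters are assumptions, and $\tanh$ stands in for the unspecified activation $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def node_level_attention(h, neighbors, a_r):
    """One head of relation-specific node-level attention (Eqs. 2-4).

    h:         (n, d) node features under one relation r
    neighbors: dict mapping node i -> list of neighbor indices N_i^r
    a_r:       (2d,) relation-specific attention vector
    """
    z = np.zeros_like(h)
    for i, nbrs in neighbors.items():
        # Eqs. (2)-(3): score each neighbor j for node i, then normalize.
        e = np.array([np.tanh(a_r @ np.concatenate([h[i], h[j]])) for j in nbrs])
        alpha = softmax(e)
        # Eq. (4): aggregate neighbor features with the attention weights.
        z[i] = np.tanh(sum(a * h[j] for a, j in zip(alpha, nbrs)))
    return z

n, d, heads = 4, 8, 2
h = rng.standard_normal((n, d))
nbrs = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}

# Eq. (5): run K independent heads and concatenate their outputs.
h_next = np.concatenate(
    [node_level_attention(h, nbrs, rng.standard_normal(2 * d)) for _ in range(heads)],
    axis=1)
print(h_next.shape)
```

Note the asymmetry discussed above: the score for the pair $(i, j)$ uses $[h_i \Vert h_j]$ and $i$’s neighbor set, so it generally differs from the score for $(j, i)$.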

3.3. Relation-Level Attention

Each node type in a hierarchical heterogeneous graph contains multiple types of information. Node embeddings based on node-level attention are capable of transmitting and aggregating a single type of information. To incorporate diverse information types into node representations, these information types can be modeled through distinct relationships. To address the challenge of relationship-based information fusion in hierarchical heterogeneous graphs and to capture the structural information of subgraphs associated with nodes, the node-level attention-based embeddings are used as input to compute the weights of each relationship, thereby integrating relationship-specific information into the node embeddings. This process of enriching node embeddings with relationship-specific information is formalized in Equation (6).
$\mathrm{att}_{rel}\left(H_{1,r}^{l+1}, H_{2,r}^{l+1}\right) \rightarrow H_{*}^{l+2},$
To determine the importance of each relationship to a node, the significance of a relationship to the node embedding is captured by the relationship-level attention coefficient, which is derived through a nonlinear transformation. The contributions of different relationships are then aggregated using operations such as Mean, Sum, and Max, yielding the relationship-based embedding of the target node. Based on the experimental results, the Mean operation is adopted for aggregation in this study. For a given relationship $r$, let $S_r$ denote the node set associated with $r$; the importance of $r$ to node $n_i$ can then be expressed by Equation (7).
$w_{i,r} = \dfrac{1}{|S_r|} \sum_{n_i \in S_r} q^{\mathrm{T}} \tanh\left(W_R h_{i,r}^{l+1} + b\right),$
Here, $q$ denotes the relationship-level attention vector, $W_R$ denotes the weight matrix, $b$ denotes the bias vector, and $w_{i,r}$ denotes the importance of relationship $r$ to node $n_i$. Aggregation over the node embeddings is performed by computing their average value. The hyperbolic tangent activation function $\tanh$ is employed to enhance the layer’s sensitivity to nonlinear features and thereby improve its feature extraction capability.
The objective of HHGKT is to explore the correlation between learners and exercises. After obtaining the importance of relationship $r$ to the node, the softmax function is used to normalize it, and the relationship-level attention weight coefficient $\beta_{i,r}$ is then obtained according to Equation (8).
$\beta_{i,r} = \dfrac{\exp\left(w_{i,r}\right)}{\sum_{j \in S_r} \exp\left(w_{i,j}\right)},$
$\beta_{i,r}$ denotes the significance of relationship $r$ to node $n_i$ and can be interpreted as the contribution of $r$ to node $n_i$: the larger the value of $\beta_{i,r}$, the greater the importance of $r$ to $n_i$. Subsequently, attention-based aggregation is performed using the weight coefficients. Let $R_i$ denote the set of relationships associated with node $n_i$; the final node embedding is then given by Equation (9).
$h_i^{l+1} = \sum_{r \in R_i} \beta_{i,r} h_{i,r}^{l+1},$
As shown in Equation (9), $h_i^{l+1}$ denotes the final representation of node $n_i$ after relation-level attention, incorporating both the node-level attention embedding $h_{i,r}^{l+1}$ and the relation-level attention weight $\beta_{i,r}$. Thereby, both node-specific and relation-specific information are integrated into the final node embedding.
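A NumPy sketch of Equations (7)–(9), under the HAN-style reading that Equation (7) averages the transformed node embeddings over $S_r$ to give one score per relationship (sizes and parameters are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

def relation_level_attention(h_by_rel, q, W_R, b):
    """Fuse per-relation node embeddings (Eqs. 7-9).

    h_by_rel: (R, n, d) node embeddings from node-level attention, one per relation
    q: (d,), W_R: (d, d), b: (d,) -- relation-level attention parameters
    """
    # Eq. (7): mean over nodes of q^T tanh(W_R h + b) gives one score per relation.
    w = np.array([np.mean(np.tanh(h_r @ W_R.T + b) @ q) for h_r in h_by_rel])
    # Eq. (8): softmax over relations yields the weights beta_r.
    beta = np.exp(w) / np.exp(w).sum()
    # Eq. (9): weighted sum of the per-relation embeddings.
    return np.tensordot(beta, h_by_rel, axes=1), beta

R, n, d = 3, 4, 8
h_by_rel = rng.standard_normal((R, n, d))
h_final, beta = relation_level_attention(
    h_by_rel,
    rng.standard_normal(d),
    rng.standard_normal((d, d)),
    rng.standard_normal(d))
print(h_final.shape, beta)
```

The softmax guarantees the relation weights sum to one, so the fused embedding is a convex combination of the per-relation embeddings.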

3.4. Two-Layer Attention Mechanism

A two-layer attention network is employed by HHGKT for information transmission and aggregation, as illustrated in Figure 4b,c. Each layer incorporates both node-level attention and relation-level attention mechanisms. Figure 4a depicts the process of information transmission and aggregation for learner nodes in the first layer. As shown in Figure 4c, after applying the two-layer adaptive attention network, the embeddings $H_S$ and $H_E$ of the learner and exercise nodes are obtained.
In addition, to facilitate global information transmission, dropout layers were added to introduce perturbation and prevent overfitting. For node $n_i$, after applying the two-layer attention network, the feature matrices $H_S$ and $H_E$ of the node are obtained.

3.5. Contrastive Enhanced Learning

In the interaction records between learners and exercises, the influence of correctly completed exercises and wrongly completed exercises on the prediction of exercises is different. Therefore, in the HHGKT model, the subgraph of exercises correctly completed by learners (referred to as the positive association subgraph in this study) and the subgraph of exercises wrongly completed by learners (referred to as the negative association subgraph in this study) are introduced. Then, contrastive learning is adopted to enhance the representation ability of the model. Since the purpose of HHGKT is to classify exercises, i.e., whether they can be correctly completed or not, the loss function as shown in Equation (10) is adopted.
$L_{gcl} = -\dfrac{1}{|N|} \sum_{i=1}^{N} \log \dfrac{\exp\left(h_i \cdot h_i^{+}\right)}{\exp\left(h_i \cdot h_i^{+}\right) + \tau \sum_{j=1}^{K} \exp\left(h_i \cdot h_j\right)},$
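The text does not spell out the symbols of Equation (10), so the sketch below makes the common InfoNCE-style assumptions: $h_i^{+}$ is the embedding of node $i$ from the positive (correctly answered) subgraph, the $h_j$ are embeddings from the negative (wrongly answered) subgraph, and dot products measure similarity.

```python
import numpy as np

rng = np.random.default_rng(3)

def contrastive_loss(h, h_pos, h_negs, tau=1.0):
    """InfoNCE-style graph contrastive loss in the spirit of Eq. (10).

    h:      (N, d) anchor embeddings
    h_pos:  (N, d) embeddings from the positive (correct-answer) subgraph
    h_negs: (N, K, d) embeddings from the negative (wrong-answer) subgraph
    """
    pos = np.exp(np.sum(h * h_pos, axis=1))                  # exp(h_i . h_i+)
    neg = np.exp(np.einsum('nd,nkd->nk', h, h_negs)).sum(1)  # sum_j exp(h_i . h_j)
    return -np.mean(np.log(pos / (pos + tau * neg)))

N, K, d = 5, 3, 8
h = rng.standard_normal((N, d))
h_pos = h + 0.01 * rng.standard_normal((N, d))  # near-duplicates as positives
h_negs = rng.standard_normal((N, K, d))
loss = contrastive_loss(h, h_pos, h_negs)
print(loss)
```

Pulling `h_pos` toward `h` while pushing the negatives away is what the loss minimizes; the near-duplicate positives here are only a stand-in for the positive-subgraph embeddings.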

3.6. Prediction Module

The objective of HHGKT is to predict whether a learner can correctly complete a future exercise based on their current mastery of knowledge concepts. The prediction layer estimates this probability from the embeddings of the learner and the exercise at the current moment. Specifically, the learner embedding and the exercise embedding at the current time step are concatenated and then fed into a fully connected layer with a Sigmoid activation function to obtain the probability of correctly completing the exercise at time $t$, as given in Equation (11).
$\hat{y}_{ij}^{t} = \mathrm{Sigmoid}\left(f\left(h_{s_i} \Vert h_{e_j}\right)\right),$
Here, $\hat{y}_{ij}^{t}$ denotes the probability that learner $s_i$ correctly completes exercise $e_j$ at time $t$. $h_{s_i}$ and $h_{e_j}$ represent the embeddings of learner $s_i$ and exercise $e_j$, respectively. $\Vert$ denotes the concatenation operation, which appends $h_{e_j}$ to the end of $h_{s_i}$ to form a single combined vector.
To learn the parameters of the heterogeneous graph knowledge tracing model, the cross-entropy loss function is adopted to minimize the difference between the predicted value $\hat{y}_{ij}^{t}$ and the true value $y_{ij}^{t}$, as shown in Equation (12).
$L_{pre} = -\left( y_{ij}^{t} \log\left(\hat{y}_{ij}^{t}\right) + \left(1 - y_{ij}^{t}\right) \log\left(1 - \hat{y}_{ij}^{t}\right) \right),$
The final loss function of the HHGKT model is a fusion of the prediction loss and the graph contrastive learning loss. It can be expressed as Equation (13).
$L = L_{pre} + \lambda L_{gcl},$
Here, the weighting coefficient λ is used to balance the two loss functions.
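The prediction step and the combined objective can be sketched as follows. The embedding dimension, the weight $\lambda$, and the placeholder contrastive-loss value are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict(h_s, h_e, W, b):
    # Eq. (11): concatenate learner and exercise embeddings, then a fully
    # connected layer with a Sigmoid gives the probability of a correct answer.
    return sigmoid(np.concatenate([h_s, h_e]) @ W + b)

def bce(y, y_hat):
    # Eq. (12): cross-entropy between the prediction and the observed outcome.
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

d = 8
h_s, h_e = rng.standard_normal(d), rng.standard_normal(d)
W, b = rng.standard_normal(2 * d), 0.0

y_hat = predict(h_s, h_e, W, b)
L_pre = bce(1.0, y_hat)        # suppose the exercise was answered correctly
L_gcl = 0.3                    # placeholder for the contrastive loss of Eq. (10)
lam = 0.1                      # hypothetical weighting coefficient lambda
L = L_pre + lam * L_gcl        # combined objective in the spirit of Eq. (13)
print(y_hat, L)
```

In training, both loss terms would be computed per batch and $\lambda$ tuned as a hyperparameter balancing prediction accuracy against representation quality.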

4. Result Analysis and Discussion

4.1. Experimental Data

The ASSIST2009, ASSIST2012, and ASSIST2017 datasets [17] comprise collections of learner activity data collected from the ASSISTments online education platform. Duplicate entries and records from learners with insufficient interaction data were removed. The detailed statistics of the datasets are presented in Table 1. SNum denotes the number of learners. ENum denotes the number of exercises. KNum denotes the number of knowledge concepts. ANum-KE denotes the average number of relevant exercises per knowledge concept. RNum denotes the total number of learner–exercise interaction records. PCAE is the proportion of correctly completed exercises. PIAE is the proportion of incorrectly completed exercises.
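The preprocessing described above (dropping duplicates and filtering low-activity learners) can be sketched with pandas. The column names and the interaction threshold here are assumptions for illustration, not the paper’s exact choices.

```python
import pandas as pd

# Hypothetical interaction log in ASSISTments-like form (column names assumed).
df = pd.DataFrame({
    'user_id':    [1, 1, 1, 2, 2, 3],
    'problem_id': [10, 10, 11, 10, 12, 11],
    'correct':    [1, 1, 0, 1, 0, 1],
})

# Remove exact duplicate records.
df = df.drop_duplicates()

# Remove learners with fewer than a minimum number of interactions
# (the threshold here is illustrative).
MIN_INTERACTIONS = 2
counts = df.groupby('user_id')['problem_id'].transform('size')
df = df[counts >= MIN_INTERACTIONS]

print(len(df), sorted(df['user_id'].unique()))
```

On this toy log, one duplicate row is dropped and learner 3 (a single interaction) is filtered out, leaving four records from two learners.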

4.2. Experimental Benchmarks

Based on the above evaluation metrics, comparative experiments are conducted with seven knowledge tracing methods: DKT [3], DKVMN [18], GKT [4], SGKT [19], GIKT [20], HGKT [21], and SPKT [6].
DKT [3]: DKT was the first model to apply recurrent neural networks (RNNs) to knowledge tracing. It predicts whether a learner will correctly answer the next exercise on a knowledge concept based on their mastery of that concept.
DKVMN [18]: A key-value memory network is used to trace a learner's mastery of knowledge concepts over a question sequence.
GKT [4]: In GKT, a knowledge concept graph is constructed with knowledge concepts as nodes and dependency relationships between concepts as edges. It assumes that learners possess a knowledge state for each knowledge concept at every time step. When a learner answers a question associated with a specific knowledge concept, not only is the knowledge state of that knowledge concept updated, but so are those of its connected knowledge concepts. Knowledge tracing is thereby modeled as a time-series node classification task within a graph neural network.
SGKT [19]: SGKT is a session-sequence-based heterogeneous knowledge tracing model. Using the relationships between knowledge concepts, the learner's knowledge state is inferred with gated graph neural networks.
GIKT [20]: GIKT is a graph-based interaction model for knowledge tracing. A graph convolutional network (GCN) aggregates the learner's current knowledge state, their history of completing related exercises, the target exercise, and its associated knowledge concepts to estimate the learner's mastery of a problem.
HGKT [21]: HGKT is a knowledge tracing model based on hierarchical graphs. By constructing a hierarchical graph of exercises and knowledge concepts to explore the potential hierarchical relationship between learners and exercises, two attention mechanisms are adopted that highlight the important historical states of learners.
SPKT [6]: SPKT is a knowledge tracing model based on the heterogeneous graph constructed from the association between learners and exercises. Learners’ abilities and question importance are integrated. A graph attention network is then employed to learn the relationships between learners and questions.

4.3. Experimental Results

4.3.1. Comparative Experiments

The aim of the HHGKT model is to predict learners' mastery of knowledge concepts by forecasting whether exercises will be answered correctly, so the knowledge tracing task can be formulated as a binary classification problem. Accordingly, the evaluation metrics include accuracy, the area under the ROC curve (AUC), precision, recall, and F1-score.
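The threshold-based metrics can be computed from predicted probabilities as sketched below (AUC, which is threshold-free, is omitted here); the function name `binary_metrics` and the 0.5 cutoff are illustrative choices rather than details from the paper.

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, and F1 for binary labels,
    thresholding predicted probabilities at `threshold`."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and q == 1 for t, q in zip(y_true, y_pred))
    tn = sum(t == 0 and q == 0 for t, q in zip(y_true, y_pred))
    fp = sum(t == 0 and q == 1 for t, q in zip(y_true, y_pred))
    fn = sum(t == 1 and q == 0 for t, q in zip(y_true, y_pred))
    acc = (tp + tn) / len(y_true)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1

acc, prec, rec, f1 = binary_metrics([1, 0, 1, 0], [0.9, 0.8, 0.6, 0.1])
# acc = 0.75 and rec = 1.0 on this toy example
```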
Seven state-of-the-art knowledge tracing models—DKT, DKVMN, GKT, SGKT, GIKT, HGKT, and SPKT—are compared based on these metrics in this subsection. The experimental results are reported for the ASSISTment 2009 dataset in Table 2, for the ASSISTment 2012 dataset in Table 3, and for the ASSISTment 2017 dataset in Table 4.
As shown in Table 2, Table 3 and Table 4, HHGKT achieves strong performance across all three datasets. On the ASSISTment 2009 dataset, all algorithms achieve relatively high results. This is attributed to the fact that the ASSISTment 2009 dataset contains a larger average number of exercises per knowledge concept and a higher proportion of correctly completed exercises. The more exercises a learner correctly completes, the more accurately their mastery of knowledge concepts can be assessed. Conversely, when the number of correct responses is insufficient, it becomes difficult to reliably estimate conceptual mastery. Therefore, the abundance of correctly completed exercises in the ASSISTment 2009 dataset contributes to the improved performance observed across models.
On the ASSISTment 2012 dataset, HHGKT outperforms the baseline approaches, primarily because of the high proportion of incorrectly completed exercises (57.19%). Unlike existing methods, HHGKT integrates both positively and negatively correlated exercises into its architecture, allowing it to capture the associative patterns between learners and erroneous responses more effectively. Specifically, the contrastive learning module incorporates negative-correlation signals into the node representations, enhancing the model's representational capacity. On the ASSISTment 2017 dataset, HHGKT achieves only modest improvements. This limitation arises from the small number of exercises associated with each knowledge concept, which leads to sparse connections between exercises and knowledge concepts; the amount of information propagated during message passing is therefore constrained, restricting how much additional feature information can be integrated into the learner and exercise node representations.

4.3.2. Ablation Experiments

To evaluate the contribution of each component of HHGKT to overall performance, ablation studies are conducted by systematically removing or modifying individual modules, focusing on the two-layer adaptive attention module and the contrastive learning module. HHGKT-f1 denotes the variant with only the first layer of the adaptive attention network; HHGKT-cl, the model without the contrastive learning module; HHGKT-kk, the variant with the "knowledge concept–knowledge concept" relationship removed from the graph structure; HHGKT-ek, the variant with the "exercise–knowledge concept" relationship removed; and HHGKT-sk, the variant with the "learner–knowledge concept" relationship removed.
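The graph-level ablations amount to dropping one relation type from the edge set. A toy sketch of this idea, in which the node names and relation labels are hypothetical:

```python
# Each edge is (source, relation, target) in the hierarchical heterogeneous graph.
edges = [
    ("s1", "learner-knowledge", "k1"),
    ("e1", "exercise-knowledge", "k1"),
    ("e2", "exercise-knowledge", "k2"),
    ("k1", "knowledge-knowledge", "k2"),
]

def ablate(edge_list, dropped_relation):
    """Remove one relation type, mirroring the HHGKT-kk/-ek/-sk variants."""
    return [e for e in edge_list if e[1] != dropped_relation]

g_kk = ablate(edges, "knowledge-knowledge")  # graph for an HHGKT-kk-style variant
```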
Table 5 presents the AUC and accuracy of the HHGKT variants across the three datasets. As shown there, the two-layer attention mechanism contributes most significantly to model performance. When the second layer of adaptive attention is removed, the resulting variant exhibits notably degraded performance: on the ASSISTment 2009 dataset, accuracy and AUC decrease by 0.0335 and 0.0294, respectively; on the ASSISTment 2012 dataset, by 0.0471 and 0.0254; and on the ASSISTment 2017 dataset, by 0.0340 and 0.0237. These results demonstrate that employing a dual-layer adaptive attention network for information transmission and aggregation enhances the model's representational capacity. Table 5 further indicates that the contrastive learning module also contributes to the improved performance of HHGKT.
Furthermore, with respect to the multiple relationships within the constructed hierarchical heterogeneous graph, removing any of these relationships—whether “knowledge concept–knowledge concept”, “exercise–knowledge concept”, or “learner–knowledge concept”—adversely affects the model’s final performance. In comparison to a bipartite graph that contains only the “learner–exercise” relationship, the hierarchical heterogeneous graph more effectively integrates node and relational information, thereby enhancing the representations of both learner and exercise nodes.
To evaluate the contribution of the contrastive learning module, the weighting parameter λ in the loss function was varied to determine its influence on the results. In this ablation study, λ was set to values from the set {0, 0.1, 0.2, 0.3, 0.4, 0.5}, and the model's accuracy and AUC were compared across the ASSISTment 2009, ASSISTment 2012, and ASSISTment 2017 datasets. Detailed results are presented in Table 6. As shown there, the model performs best overall at λ = 0.2; all HHGKT results reported in Table 2, Table 3, Table 4 and Table 5 correspond to this setting.
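Selecting λ reduces to a small grid search over validation scores. A sketch using the ASSISTment 2012 AUC values from Table 6; the helper name `best_lambda` is illustrative:

```python
def best_lambda(auc_by_lambda):
    """Return the lambda value with the highest validation AUC."""
    return max(auc_by_lambda, key=auc_by_lambda.get)

# ASSISTment 2012 AUC per lambda, taken from Table 6.
val_auc = {0.0: 0.7587, 0.1: 0.7753, 0.2: 0.7782,
           0.3: 0.7735, 0.4: 0.7719, 0.5: 0.7676}
chosen = best_lambda(val_auc)  # 0.2 for these values
```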

4.3.3. Visualization of Experimental Results

To provide a more intuitive understanding of changes in learners’ knowledge states, in this section, heatmaps are presented to visualize the evolution of learners’ mastery levels of knowledge concepts across time. The temporal changes in knowledge state for a randomly selected learner are depicted in Figure 5, Figure 6 and Figure 7 for the ASSISTment 2009, ASSISTment 2012, and ASSISTment 2017 datasets, respectively. In these figures, the x-axis denotes the knowledge concepts, while the y-axis denotes the sequential order of interactions (time steps).
Due to the large number of distinct knowledge concepts in the datasets, a subset of 50 randomly selected concepts is displayed to improve clarity and interpretability. At each time step, predictions were made on 10 exercises, and a learner’s knowledge mastery was subsequently updated based on the outcomes.
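The data behind such heatmaps is simply a time-by-concept matrix of predicted mastery values. A minimal sketch (the helper name and toy values are illustrative), whose output could then be rendered with, for example, matplotlib's `imshow`:

```python
import numpy as np

def mastery_trace(step_predictions):
    """Stack per-step mastery vectors (one per time step) into a
    (time x concept) matrix, i.e., the data behind a knowledge-state heatmap."""
    return np.vstack(step_predictions)

# Toy trace: 3 time steps x 4 knowledge concepts; each row holds the
# predicted mastery after one batch of exercises.
trace = mastery_trace([
    np.array([0.2, 0.5, 0.1, 0.4]),
    np.array([0.4, 0.6, 0.2, 0.5]),
    np.array([0.7, 0.8, 0.3, 0.6]),
])
```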

5. Conclusions

To investigate the impact of the knowledge space structure on educational computation and elucidate its influence on learning outcomes, this study began with a representation model of the knowledge space and employed hierarchical heterogeneous graphs to capture its complex high-order relationships. Various relationships (such as associations and dependencies) were integrated into the model to analyze their effects on educational computation, using knowledge tracing as a representative example. The following conclusions are drawn:
(1)
The representation of the knowledge space structure affects educational computation. Compared with flat heterogeneous graphs and tree graphs, hierarchical heterogeneous graphs better represent the complex relationships in the knowledge space.
(2)
The correlation between exercises and knowledge concepts, as well as the dependency between knowledge concepts, has a substantial impact on knowledge tracing models. In educational computation (such as knowledge tracing, learning path recommendation, and learning resource recommendation), the influence of dependencies between knowledge concepts should be fully considered.
(3)
The diagnostic results of learning outcomes vary across different learning objectives. Therefore, the construction of knowledge tracing models should be tailored to specific learning goals.
However, learning resources carry attributes of their own (such as resource type and difficulty), as do learners (such as age, major, gender, and learning preferences). The three-layer heterogeneous graph constructed in this study cannot fully capture the complex relationships introduced by such additional information. Moreover, factors such as forgetting during the learning process also influence the diagnosis of knowledge states; these issues will be the focus of future research.

Author Contributions

Methodology, H.D.; software, Y.Z.; validation, Y.Z. and B.L.; formal analysis, Y.Z.; investigation, Y.Z.; resources, B.L.; data curation, B.L. and Y.-C.C.; writing—original draft preparation, B.L., Y.-C.C., H.D. and Y.Z.; writing—review and editing, B.L., Y.Z., H.D. and Y.-C.C.; visualization, Y.Z.; supervision, B.L. and Y.-C.C.; project administration, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shaanxi Provincial Natural Science Foundation Research Program of China, grant number 2025JC-YBMS-789; the Scientific Research Program of Shaanxi Provincial Department of Education of China, grant number 24JR058; the Doctoral Research Project of Shangluo University of China, grant number 25SKY0010; and the College Students' Innovation and Entrepreneurship Project of China, grant number 202411396023.

Data Availability Statement

All experimental data used in this study were obtained from the ASSISTments dataset, which provides publicly available datasets. The data can be accessed at https://sites.google.com/site/assistmentsdata/ (accessed on 15 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Suo, J.X.; Zhang, L.P.; Yan, S.; Wang, D.Q.; Zhang, Y.W. Review of interpretable deep knowledge tracing methods. J. Comput. Appl. 2025, 45, 2043–2055.
  2. Huang, S.W.; Liu, Z.H.; Luo, L.Y.; Zhao, Z.Y.; Wang, C. Research on Bayesian knowledge tracking model integrating behavior and forgetting factors. Appl. Res. Comput. 2021, 38, 1993–1997.
  3. Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.; Sohl-Dickstein, J. Deep knowledge tracing. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
  4. Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling learner proficiency using graph neural network. In Proceedings of the 2019 IEEE/WIC/ACM International Conference on Web Intelligence, New York, NY, USA, 13–17 October 2019.
  5. Song, X.; Li, J.; Lei, Q.; Zhao, W.; Chen, Y.; Mian, A. Bi-CLKT: Bi-graph contrastive learning-based knowledge tracing. Knowl.-Based Syst. 2022, 241, 108274.
  6. Yan, Q.Y.; Si, Y.Q.; Yuan, G.; Wang, Z.X. Student-problem association based heterogeneous graph knowledge tracing model. Acta Electron. Sin. 2023, 51, 3549–3556.
  7. Ju, S.G.; Kang, R.; Zhao, R.M.; Sun, J.P. Deep knowledge tracing model based on embedding of fused multiple concepts. J. Softw. 2023, 34, 5126–5142.
  8. Song, X.Y.; Li, J.X.; Tang, Y.F.; Zhao, T.; Guan, Z. JKT: A joint graph convolutional network based deep knowledge tracing. Inf. Sci. 2021, 580, 510–523.
  9. Wang, C.; Ma, D.; Xu, H.R.; Chen, P.F.; Chen, M.; Li, H. SA-MGKT: Multi-graph knowledge tracing method based on self-attention. J. East China Norm. Univ. Nat. Sci. 2024, 5, 20–31.
  10. Wu, T.; Ling, Q. Fusing hybrid attentive network with self-supervised dual-channel heterogeneous graph for knowledge tracing. Expert Syst. Appl. 2023, 225, 120212.
  11. Xie, P.Z.; Li, G.J.; Li, D. Knowledge tracing model based on exercise–knowledge point heterogeneous graph and multi-feature fusion. Comput. Sci. 2025, 52, 197–205.
  12. Zhang, W.Q.; Wang, H.R.; Zhu, G.F. Temporal convolutional knowledge tracing method with heterogeneous graph neural network. J. Chin. Comput. Syst. 2024, 45, 2823–2829.
  13. Xie, W.Z.; Liu, W.; Hu, D.W.; Cui, Z.H.; Zhao, Y.B. Dynamic knowledge tracking driven by multi-view contrastive learning. Appl. Res. Comput. 2025, 42, 3325–3332.
  14. Qiao, Z.; Wang, P.; Fu, Y.; Du, Y.; Zhou, Y. Tree structure-aware graph representation learning via integrated hierarchical aggregation and relational metric learning. arXiv 2020, arXiv:2008.10003.
  15. Song, L.Y.; Liu, Z.Z.; Zhang, Y.; Li, Z.H.; Shang, X.Q. Cascade graph convolution network based on multi-level graph structures in heterogeneous graph. J. Softw. 2024, 35, 5179–5195.
  16. Tian, Y.; Zhang, C.; Guo, Z.; Huang, C.; Metoyer, R.; Chawla, N. RecipeRec: A heterogeneous graph learning model for recipe recommendation. arXiv 2022.
  17. Abdelrahman, G.; Wang, Q. Deep graph memory networks for forgetting-robust knowledge tracing. IEEE Trans. Knowl. Data Eng. 2023, 35, 12.
  18. Heffernan, N.T. ASSISTments. Available online: https://sites.google.com/site/assistmentsdata/ (accessed on 15 May 2025).
  19. Wu, Z.; Huang, L.; Tang, H.Y. SGKT: Session graph-based knowledge tracing for learner performance prediction. Expert Syst. Appl. 2022, 206, 117681.
  20. Yang, Y.; Shen, J.; Qu, Y.R.; Liu, Y.F.; Wang, K.R.; Zhu, Y.M.; Zhang, W.N.; Yu, Y. GIKT: A graph-based interaction model for knowledge tracing. In European Conference on Machine Learning and Knowledge Discovery in Databases; Springer: Cham, Switzerland, 2021; pp. 299–315.
  21. Tong, H.; Zhou, Y.; Wang, Z. HGKT: Introducing problem schema with hierarchical exercise graph for knowledge tracing. arXiv 2020, arXiv:2006.16915.
Figure 1. Example of a hierarchical heterogeneous graph. Colors denote the different node types: user, exercise, knowledge concept, and learning resource nodes. (a) A hierarchical heterogeneous graph constructed from the relationships among all users, exercises, knowledge concepts, and learning resources; if every exercise is related to exactly one knowledge concept, the graph reduces to a hierarchical tree. (b) The interaction graph of user u1.
Figure 2. HHGKT model.
Figure 3. Knowledge concept graph.
Figure 4. The two-layer attention mechanism. Dotted arrows signify steps that require intermediate calculations. The processes of Equations (1) and (6) are shown in (a). The information transmitted at the first layer is illustrated in (b), and the information transmitted at the second layer is depicted in (c); finally, the refined vector representations of learners and exercises are obtained.
Figure 5. Heatmap of ASSISTment 2009 dataset. In the heat map, the color indicates the mastery level of knowledge concepts: blue for unmastered and red for mastered.
Figure 6. Heatmap of ASSISTment 2012 dataset. In the heat map, the color indicates the mastery level of knowledge concepts: blue for unmastered and red for mastered.
Figure 7. Heatmap of ASSISTment 2017 dataset. In the heat map, the color indicates the mastery level of knowledge concepts: blue for unmastered and red for mastered.
Table 1. Experimental dataset.

| Dataset  | SNum   | ENum   | KNum | ANum-KE | PCAE   | PIAE   | RNum      |
|----------|--------|--------|------|---------|--------|--------|-----------|
| ASSIST09 | 3816   | 14,375 | 101  | 142.3   | 0.6433 | 0.3567 | 276,413   |
| ASSIST12 | 17,611 | 41,073 | 233  | 176.3   | 0.4281 | 0.5719 | 1,819,504 |
| ASSIST17 | 15,273 | 2833   | 98   | 28.9    | 0.5619 | 0.4381 | 662,869   |
Table 2. Experimental results for ASSISTment 2009 dataset.

| Model | Accuracy | AUC    | Precision | Recall | F1     |
|-------|----------|--------|-----------|--------|--------|
| DKT   | 0.7662   | 0.7433 | 0.5944    | 0.8092 | 0.6413 |
| DKVMN | 0.7967   | 0.7459 | 0.6428    | 0.8109 | 0.6638 |
| GKT   | 0.7244   | 0.7056 | 0.6211    | 0.7437 | 0.6593 |
| GIKT  | 0.7890   | 0.7389 | 0.6196    | 0.8075 | 0.6579 |
| HGKT  | 0.8047   | 0.7805 | 0.6388    | 0.8209 | 0.6641 |
| SGKT  | 0.7975   | 0.7312 | 0.6357    | 0.8193 | 0.6617 |
| SPKT  | 0.8251   | 0.7844 | 0.6934    | 0.8504 | 0.6833 |
| HHGKT | 0.8286   | 0.8028 | 0.7115    | 0.8677 | 0.7024 |
Table 3. Experimental results for ASSISTment 2012 dataset.

| Model | Accuracy | AUC    | Precision | Recall | F1     |
|-------|----------|--------|-----------|--------|--------|
| DKT   | 0.7306   | 0.7269 | 0.6248    | 0.8406 | 0.6817 |
| DKVMN | 0.7873   | 0.7423 | 0.6177    | 0.8083 | 0.6564 |
| GKT   | 0.7352   | 0.7125 | 0.6315    | 0.6499 | 0.6631 |
| GIKT  | 0.7757   | 0.7391 | 0.6027    | 0.8183 | 0.6488 |
| HGKT  | 0.8079   | 0.7790 | 0.6403    | 0.8277 | 0.6695 |
| SGKT  | 0.8135   | 0.7512 | 0.6474    | 0.8289 | 0.6708 |
| SPKT  | 0.8158   | 0.7616 | 0.6480    | 0.8296 | 0.6728 |
| HHGKT | 0.8290   | 0.7782 | 0.6617    | 0.8427 | 0.6991 |
Table 4. Experimental results for ASSISTment 2017 dataset.

| Model | Accuracy | AUC    | Precision | Recall | F1     |
|-------|----------|--------|-----------|--------|--------|
| DKT   | 0.7277   | 0.7035 | 0.6203    | 0.8372 | 0.6718 |
| DKVMN | 0.7268   | 0.6988 | 0.6198    | 0.8343 | 0.6701 |
| GKT   | 0.7152   | 0.7112 | 0.6284    | 0.7403 | 0.6562 |
| GIKT  | 0.7613   | 0.7357 | 0.6146    | 0.8218 | 0.6577 |
| HGKT  | 0.7962   | 0.7748 | 0.6372    | 0.8239 | 0.6649 |
| SGKT  | 0.7924   | 0.7356 | 0.6397    | 0.8218 | 0.6632 |
| SPKT  | 0.8027   | 0.7501 | 0.6427    | 0.8286 | 0.6684 |
| HHGKT | 0.8104   | 0.7583 | 0.6499    | 0.8317 | 0.6728 |
Table 5. Experimental results for different variants of the HHGKT model.

| Variant   | Accuracy (2009) | AUC (2009) | Accuracy (2012) | AUC (2012) | Accuracy (2017) | AUC (2017) |
|-----------|-----------------|------------|-----------------|------------|-----------------|------------|
| HHGKT-f1  | 0.7951          | 0.7734     | 0.7819          | 0.7528     | 0.7764          | 0.7346     |
| HHGKT-kk  | 0.8107          | 0.7944     | 0.8053          | 0.7691     | 0.7933          | 0.7482     |
| HHGKT-ek  | 0.8146          | 0.7988     | 0.8093          | 0.7711     | 0.7984          | 0.7503     |
| HHGKT-sk  | 0.8127          | 0.7975     | 0.8046          | 0.7681     | 0.7962          | 0.7486     |
| HHGKT-cl  | 0.8072          | 0.7847     | 0.7933          | 0.7587     | 0.7849          | 0.7369     |
| HHGKT     | 0.8286          | 0.8028     | 0.8290          | 0.7782     | 0.8104          | 0.7583     |
Table 6. Experimental results of HHGKT model under different parameter λ values.

| λ   | Accuracy (2009) | AUC (2009) | Accuracy (2012) | AUC (2012) | Accuracy (2017) | AUC (2017) |
|-----|-----------------|------------|-----------------|------------|-----------------|------------|
| 0   | 0.8072          | 0.7847     | 0.7933          | 0.7587     | 0.7849          | 0.7369     |
| 0.1 | 0.8237          | 0.8069     | 0.8271          | 0.7753     | 0.8089          | 0.7527     |
| 0.2 | 0.8286          | 0.8028     | 0.8290          | 0.7782     | 0.8104          | 0.7583     |
| 0.3 | 0.8255          | 0.8036     | 0.8220          | 0.7735     | 0.8077          | 0.7519     |
| 0.4 | 0.8219          | 0.8007     | 0.8213          | 0.7719     | 0.8024          | 0.7495     |
| 0.5 | 0.8174          | 0.7982     | 0.8188          | 0.7676     | 0.7998          | 0.7436     |

Li, B.; Zhang, Y.; Du, H.; Cheng, Y.-C. A Knowledge Tracing Model Based on Hierarchical Heterogeneous Graphs. Mathematics 2026, 14, 500. https://doi.org/10.3390/math14030500

