1. Introduction
The rapid development of digital learning platforms and intelligent tutoring systems has significantly expanded the availability of fine-grained educational interaction data [1,2]. These data provide unprecedented opportunities to model learners’ cognitive processes and to design adaptive instructional strategies. Accurate cognitive modeling is central to intelligent education, as it enables systems to estimate knowledge mastery, predict future performance, and deliver personalized interventions [3]. However, despite considerable progress in educational data mining, the dynamic and structural complexity of learning processes remains insufficiently captured by many existing approaches.
Early research in knowledge tracing relied on probabilistic frameworks such as Bayesian Knowledge Tracing, which modeled skill mastery as latent binary states evolving over time [4]. While effective in small-scale settings, such methods often assumed independence among knowledge components and limited temporal expressiveness. The introduction of deep learning led to sequential models such as Deep Knowledge Tracing (DKT), which employed recurrent neural networks to capture temporal dependencies [5]. Subsequent extensions incorporated attention mechanisms and self-attentive architectures to improve long-range dependency modeling [6,7]. Nevertheless, many sequential approaches treat learning interactions as linear time series, without explicitly modeling structural relationships among learners, knowledge concepts, and exercises.
Graph-based educational modeling has emerged as a promising direction to address structural dependencies. By representing entities as nodes and interactions as edges, graph neural networks (GNNs) can capture relational information and multi-hop dependencies [8]. Static knowledge graphs and heterogeneous graph models have been applied to educational scenarios to improve prediction accuracy and interpretability [9,10]. However, a key limitation remains: many graph-based models operate on fixed or discretized snapshots and do not fully account for continuous structural evolution. Learning processes involve both gradual knowledge accumulation and abrupt cognitive restructuring, consistent with theoretical perspectives in cognitive psychology [11,12]. Capturing this hybrid dynamic—where temporal continuity coexists with discrete event transitions—requires modeling mechanisms beyond static graph structures.
In parallel, large language models (LLMs) have demonstrated strong capabilities in reasoning, dialogue generation, and explanation synthesis [13,14]. Their application in education has focused primarily on automated tutoring, feedback generation, and content creation [15]. Although LLMs can generate detailed reasoning traces through structured prompting, concerns remain regarding hallucination, reliability, and the lack of integration with formal analytical models [16]. Current systems often treat LLMs as standalone conversational tools rather than as components within a rigorous cognitive modeling framework. This separation limits their potential contribution to dynamic learner modeling.
Reinforcement learning (RL) has also been explored for adaptive educational decision-making, where instructional policies are optimized to maximize long-term learning gains [17,18]. However, many RL-based tutoring systems rely on simplified state representations derived from handcrafted features or shallow performance indicators. Without robust dynamic cognitive embeddings, policy optimization may fail to reflect deeper structural changes in learner understanding.
These observations reveal several research gaps. First, there is a lack of unified frameworks that integrate dynamic structural modeling with temporal learning trajectories. Second, passive behavioral logging does not adequately capture learners’ reasoning processes, limiting interpretability. Third, existing adaptive intervention strategies often emphasize short-term performance rather than long-term cognitive development. Finally, the integration of LLM-based reasoning evidence into formal graph-based cognitive models remains underexplored. More importantly, existing studies typically address these aspects in isolation, lacking a unified formulation that connects reasoning evidence, cognitive representation, and decision-making.
To address these challenges, this study proposes a Learner Cognitive Graph (LCG) framework that formulates learner modeling as a unified process connecting behavioral evidence, relational structure, and instructional decision-making. Instead of treating dynamic graph modeling, reasoning acquisition, and policy optimization as independent components, the proposed framework establishes a coherent mechanism in which reasoning signals are transformed into structured supervision, cognitive states are modeled through temporally evolving graphs, and intervention strategies are optimized based on these representations. A Dynamic Cognition Graph (DCG) is formally defined to represent temporally evolving interactions among learners, knowledge concepts, and exercises. A reverse Turing test-based agent employing structured prompting strategies is designed to elicit reasoning-oriented behavioral evidence, enhancing data fidelity. Dynamic graph representation learning with temporal message passing and self-supervised objectives constructs interpretable cognitive embeddings. Personalized intervention is formulated as a Markov decision process, enabling long-term reward optimization.
The main aim of this work is to develop a unified and scalable framework for dynamic cognitive modeling and adaptive educational support. The key contribution lies in the formulation of a reasoning-aware cognitive modeling paradigm in which behavioral acquisition, structural representation, and policy optimization are tightly coupled within a single learning process. By explicitly linking these components in one formulation, the proposed approach contributes to intelligent educational systems that are both interpretable and adaptive.
2. Related Work
2.1. Knowledge Tracing and Cognitive Modeling
Knowledge tracing aims to model learners’ evolving mastery of skills or concepts based on interaction histories. Early approaches were predominantly probabilistic. Bayesian Knowledge Tracing (BKT) represented knowledge as a latent binary variable governed by transition and emission probabilities [4]. While interpretable and computationally efficient, BKT assumed independence among knowledge components and relied on manually defined parameters, limiting its expressiveness in complex learning environments.
The emergence of deep learning significantly reshaped knowledge tracing research. Deep Knowledge Tracing (DKT) introduced recurrent neural networks to model temporal dependencies in student interaction sequences [5]. Subsequent models, including memory-augmented networks and self-attentive architectures such as SAKT and AKT, improved long-range dependency modeling and concept interaction representation [6,7], while more recent studies further enhanced sequential knowledge tracing through contrastive and multi-expert learning strategies [19]. These methods achieved higher predictive accuracy compared to probabilistic baselines. More recently, large language models have been introduced into knowledge tracing to incorporate semantic understanding of problem content and learner responses, further improving the representation of semantic aspects of learner behavior [20,21]. However, most sequential models treat educational data as linear time series, without explicitly modeling structural relationships among learners, exercises, and knowledge concepts.
Graph-based extensions attempted to overcome this limitation by incorporating relational structures. Heterogeneous graph neural networks (HGNNs) and knowledge graph-enhanced tracing models represent students, concepts, and problems as nodes connected through typed edges [9,10]. Such approaches capture inter-concept dependencies and relational reasoning, offering improved interpretability. Nevertheless, many graph-based knowledge tracing models operate on static or periodically updated graphs. Recent studies have explored dynamic graph learning mechanisms to model temporal structural evolution, such as DyGKT and temporal graph memory networks [22,23]. However, these approaches still focus primarily on interaction-level signals and do not incorporate reasoning-derived cognitive evidence. They do not fully capture the continuous structural evolution of cognitive states driven by ongoing interactions, forgetting, and strategy shifts.
A fundamental divergence in the literature concerns whether learning dynamics are best modeled as sequential processes or relational structural processes. Sequential models emphasize temporal order, whereas graph-based models emphasize relational context. A unified framework that jointly integrates temporal dynamics, structural adaptation, and reasoning-aware cognitive signals remains largely underexplored.
2.2. Dynamic Graph Neural Networks
Dynamic graph neural networks (DGNNs) extend traditional GNNs to evolving graphs where nodes and edges change over time. Models such as Temporal Graph Networks (TGN) and attention-based temporal GNNs incorporate memory modules and event-driven updates to represent temporal evolution [24,25]. These approaches have demonstrated strong performance in recommendation systems, social network analysis, and traffic forecasting.
Temporal message passing mechanisms enable node embeddings to evolve in response to event streams, capturing both short-term interactions and long-term dependencies. Memory units such as gated recurrent structures allow the model to retain historical information while adapting to new evidence [26]. Some approaches further introduce time-decay functions or positional encoding to model recency effects.
Despite their effectiveness in general dynamic graph settings, DGNNs have been relatively underexplored in educational contexts. Educational data exhibit unique characteristics, including multi-typed relations (e.g., learner–concept, learner–exercise, concept–concept) and hybrid dynamics combining continuous mastery progression with discrete behavioral events. Existing educational graph models often simplify temporal evolution or rely on coarse-grained snapshots, limiting their ability to represent fine-grained cognitive restructuring.
Applying DGNN techniques to cognitive modeling requires careful adaptation, including event encoding strategies tailored to educational semantics and multi-scale temporal memory mechanisms. Bridging dynamic graph representation learning with educational cognitive theory therefore remains an important research direction.
2.3. Large Language Models in Education
Large language models (LLMs) have demonstrated impressive capabilities in natural language reasoning, explanation generation, and interactive dialogue [13,14]. Their application in education has rapidly expanded, including automated tutoring, feedback generation, content summarization, and question answering systems [15,20,28]. Structured prompting strategies, such as chain-of-thought reasoning, have been shown to improve reasoning transparency and output reliability [27].
However, several challenges accompany the integration of LLMs into educational systems. First, LLM-generated responses may contain hallucinations or factual inaccuracies, raising concerns about reliability [16]. Second, most current educational applications treat LLMs as standalone conversational agents rather than as components within formal analytic frameworks. Third, LLM outputs are typically not integrated into structured cognitive state representations suitable for downstream modeling and optimization.
Recent studies have further explored integrating LLMs with knowledge tracing and structured learning tasks, aiming to leverage semantic reasoning to enhance learner modeling [20,21,29]. However, these approaches primarily focus on improving prediction performance and do not explicitly incorporate reasoning evidence into dynamic structural representations; systematic protocols for incorporating LLM-derived reasoning evidence into dynamic graph-based learner models remain limited. This gap suggests the need for architectures that combine interactive data elicitation with formal representation learning. In a broader sense, prior studies have shown that multi-modal feature fusion can enhance semantic representation quality in complex understanding tasks [30], which also supports the potential value of integrating richer behavioral signals into learner modeling.
2.4. Reinforcement Learning for Adaptive Education
Reinforcement learning (RL) has been proposed as a framework for optimizing adaptive instructional policies. By modeling learning environments as Markov decision processes, RL approaches aim to maximize cumulative learning gains through sequential decision-making [31,32]. Applications include exercise recommendation, curriculum sequencing, and intelligent tutoring systems.
Early RL-based educational systems often relied on simplified state representations derived from performance scores or handcrafted features [17]. More recent studies have incorporated neural representations and structured knowledge modeling to capture richer contextual information [32]. However, policy optimization quality strongly depends on the expressiveness of the state representation. Without dynamic and interpretable cognitive embeddings, RL policies may optimize short-term performance without promoting long-term knowledge retention.
Another divergence in the literature concerns reward design [32]. Some systems emphasize immediate accuracy improvements, whereas others attempt to incorporate delayed retention effects or engagement metrics. Balancing short-term performance with long-term cognitive stability remains an unresolved challenge.
2.5. Summary and Positioning of This Work
In summary, prior research has made substantial advances in sequential knowledge tracing, graph-based relational modeling, dynamic graph representation learning, LLM-based tutoring, and reinforcement learning for adaptive instruction. However, these research directions have largely evolved independently. Sequential models often lack structural awareness, static graph models insufficiently capture temporal evolution, LLM-based systems lack formal integration with cognitive modeling, and RL approaches frequently rely on limited state representations.
The present study positions itself at the intersection of these research streams. By integrating dynamic heterogeneous graph modeling, structured LLM-driven behavioral data acquisition, self-supervised temporal representation learning, and reinforcement learning-based intervention optimization, the proposed framework aims to provide a unified architecture for dynamic cognitive modeling and adaptive educational support.
Despite these advances, existing approaches typically focus on isolated aspects of the problem, such as temporal prediction, relational modeling, or policy optimization, without establishing a unified mechanism that connects reasoning evidence, cognitive representation, and decision-making.
In contrast, the proposed framework formulates these components within a single learning paradigm, where reasoning signals are explicitly incorporated into representation learning, and the resulting cognitive states are directly used to guide policy optimization.
3. Materials and Methods
3.1. Dynamic Cognitive Representation as State Encoding for Adaptive Intervention
The central objective of intelligent educational systems is to estimate learners’ evolving cognitive states and utilize these estimates to guide adaptive instructional decisions [1,33]. In reinforcement learning (RL)-based tutoring systems, the quality of policy optimization critically depends on the expressiveness and stability of the state representation [34]. Traditional state representations often rely on simplified statistics such as recent correctness rates or handcrafted performance indicators. While computationally convenient, such representations fail to capture deeper structural relationships among learners, knowledge concepts, and exercises.
To address this limitation, this study models learner cognition as a temporally evolving relational structure termed the Dynamic Cognition Graph (DCG). In this formulation, learner states are represented through graph embeddings derived from heterogeneous interactions among learners, knowledge concepts, and exercises [35]. The learned graph embeddings serve as state vectors for downstream intervention policies.
By integrating relational structure and temporal dynamics into a unified representation, the proposed approach provides a richer description of learning trajectories. This design enables adaptive intervention strategies to consider both immediate behavioral evidence and long-term cognitive development patterns. The overall architecture of the proposed system is illustrated in Figure 1.
3.2. Reverse Turing Agent for Reasoning-Oriented Behavioral Acquisition
Traditional educational data collection relies primarily on passive logging of learner interactions, such as correctness labels and response times [2]. Although these signals provide coarse behavioral indicators, they often fail to capture the reasoning processes underlying learner responses. To obtain richer cognitive evidence, a Reverse Turing Agent is introduced to elicit structured reasoning explanations during learner interactions.
In contrast to the classical Turing test [36], the reverse Turing paradigm encourages learners to demonstrate reasoning ability to convince the system that their answers are derived from genuine understanding rather than guessing. The detailed system architecture is illustrated in Figure 2.
For each interaction event, the learner provides a short reasoning explanation defined as
$$x_{\mathrm{reason}} = \Phi(q, a, p),$$
where $q$ denotes the exercise content, $a$ denotes the learner answer, $p$ denotes the structured prompt generated by the Reverse Turing Agent, and $\Phi$ denotes the reasoning elicitation function that maps interaction inputs to textual explanations. In practice, the prompt $p$ follows a structured elicitation template designed to guide concise reasoning.
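As a purely illustrative sketch (the paper's actual template text is not reproduced here, and `build_prompt` is a hypothetical helper), such an elicitation template might be assembled as follows:

```python
def build_prompt(exercise: str, answer: str) -> str:
    """Assemble a reasoning-elicitation prompt. The wording below is an
    illustrative assumption, not the template used in the paper."""
    return (
        f"You answered '{answer}' to the exercise below.\n"
        f"Exercise: {exercise}\n"
        "In two or three sentences, explain the reasoning that led you to "
        "this answer, naming the key concept or rule you applied."
    )

prompt = build_prompt("Solve x^2 - 5x + 6 = 0", "x = 2 or x = 3")
print(prompt)
```

The template keeps responses short and concept-focused, which eases downstream encoding.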
The textual explanation is encoded using a pretrained language encoder based on modern transformer architectures [37,38]:
$$\mathbf{z}_{\mathrm{reason}} = \mathrm{Enc}(x_{\mathrm{reason}}).$$
The resulting semantic embedding is integrated into the event attribute vector:
$$\mathbf{a}_t = \big[\, c_t,\; \tau_t,\; \mathbf{z}_{\mathrm{reason}} \,\big],$$
where $c_t$ denotes correctness and $\tau_t$ represents response time. The reasoning embedding $\mathbf{z}_{\mathrm{reason}}$ provides additional semantic signals that help identify misconceptions and partial understanding. To reduce the impact of noisy, incomplete, or linguistically variable reasoning responses, textual explanations are not used as raw supervisory targets. Instead, they are encoded into dense semantic representations and incorporated as auxiliary behavioral attributes. This design allows the framework to capture high-level reasoning cues while reducing sensitivity to surface-form variation and minor inconsistencies in learner-generated explanations. Specifically, excessively short, off-topic, and duplicated responses are filtered prior to encoding to improve robustness.
3.3. Dynamic Cognition Graph Formalization
Learner cognition is represented as a dynamic heterogeneous graph evolving over time. At time $t$, the Dynamic Cognition Graph is defined as:
$$\mathcal{G}_t = (V, E_t, R, X_t),$$
where $V$ represents learners, knowledge concepts, and exercises, $E_t$ denotes interaction edges, $R$ indicates relation types, and $X_t$ denotes node features. The structure of the dynamic cognition graph is illustrated in Figure 3.
In the proposed formulation, the relation set R includes multiple typed relations to capture heterogeneous educational interactions, including learner–exercise response relations, learner–concept mastery relations, and concept–concept dependency relations. These relation types enable the graph to jointly represent behavioral evidence and knowledge structure within a unified framework.
Each node $v$ is associated with a type-specific representation. The initial embedding is defined as:
$$\mathbf{h}_v^{(0)} = \phi_{\tau(v)}\big(x_v\big),$$
where $x_v$ denotes raw node attributes and $\phi_{\tau(v)}$ is a type-specific encoder for node type $\tau(v)$. Learner nodes are initialized from historical interaction statistics, while concept and exercise nodes are initialized from learnable embeddings.
Graph evolution is driven by interaction events:
$$e = \big(u, v, r, t, \mathbf{a}_t\big),$$
where $u$ and $v$ denote interacting nodes, $r$ is the relation type, $t$ is the timestamp, and $\mathbf{a}_t$ contains behavioral attributes defined in Equation (4).
The graph update process is defined as:
$$\mathcal{G}_{t^{+}} = F\big(\mathcal{G}_t, e\big),$$
where $F$ performs local structural updates and embedding refinement.
In practice, the update function consists of event encoding and temporal message passing. Each event is first encoded as:
$$\mathbf{m}_e = \psi\big(\mathbf{h}_u, \mathbf{h}_v, r, \mathbf{a}_t\big),$$
where $\psi$ denotes an event encoder. The encoded messages are then aggregated over temporal neighborhoods to update node representations.
To improve robustness, a masking strategy is applied during training, where partial node features and event attributes are randomly masked and reconstructed from context.
In addition, negative sampling is adopted by generating corrupted interaction pairs, enabling the model to distinguish observed interactions from unobserved ones.
The model is trained using mini-batch optimization over temporally ordered event sequences, where graph representation learning and downstream objectives are optimized jointly.
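The negative sampling step described above can be sketched as follows; the function name and tuple layout are illustrative assumptions:

```python
import random

def corrupt_events(events, exercise_ids, k=1, seed=0):
    """For each observed (learner, exercise, time) interaction, generate k
    corrupted pairs by swapping in an exercise that was not answered in
    that event, yielding negatives for the discrimination objective."""
    rng = random.Random(seed)
    negatives = []
    for learner, exercise, t in events:
        for _ in range(k):
            fake = rng.choice([e for e in exercise_ids if e != exercise])
            negatives.append((learner, fake, t))
    return negatives

obs = [("u1", "q3", 0), ("u1", "q7", 1)]
neg = corrupt_events(obs, ["q1", "q2", "q3", "q7"])
print(len(neg))  # 2
```

Observed and corrupted pairs are then scored jointly so the model learns to rank true interactions above sampled ones.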
3.4. Event-Driven Message Passing Mechanism
To model temporal dependencies in learner interactions, an event-driven message passing mechanism is adopted. Each interaction generates a message:
$$\mathbf{m}_t = \mathrm{MSG}\big(\mathbf{h}_u(t^-), \mathbf{h}_v(t^-), \mathbf{a}_t\big),$$
where $\mathbf{h}_u(t^-)$ and $\mathbf{h}_v(t^-)$ represent node embeddings before the interaction event.
To capture recency effects, temporal decay is introduced:
$$\tilde{\mathbf{m}}_t = \mathbf{m}_t \cdot \exp(-\lambda \, \Delta t),$$
where $\Delta t$ is the time elapsed since the node's last update and $\lambda$ is a decay coefficient.
Node states are updated through a gated recurrent mechanism:
$$\mathbf{h}_v(t) = \mathrm{GRU}\big(\mathbf{h}_v(t^-), \tilde{\mathbf{m}}_t\big).$$
This mechanism allows continuous refinement of cognitive representations while preserving long-term dependencies.
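A minimal numerical sketch of the decayed message and gated update; the explicit gate equations below stand in for a standard GRU cell, and the dimensions and decay rate are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension (illustrative)

def decayed_message(m, dt, lam=0.1):
    # Exponential temporal decay: older messages contribute less.
    return m * np.exp(-lam * dt)

def gated_update(h, m, Wz, Wh):
    """GRU-style update of node state h given message m: the gate z
    controls how much of the candidate state replaces the old state."""
    z = 1.0 / (1.0 + np.exp(-(Wz @ np.concatenate([h, m]))))  # update gate
    h_cand = np.tanh(Wh @ np.concatenate([h, m]))             # candidate state
    return (1 - z) * h + z * h_cand

h_v = rng.standard_normal(d)
m_t = rng.standard_normal(d)
Wz = rng.standard_normal((d, 2 * d)) * 0.1
Wh = rng.standard_normal((d, 2 * d)) * 0.1
h_new = gated_update(h_v, decayed_message(m_t, dt=3.0), Wz, Wh)
print(h_new.shape)  # (8,)
```

Because the gate interpolates between the old state and the candidate, node embeddings change smoothly across events while remaining responsive to strong new evidence.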
3.5. Multi-Scale Spatiotemporal Memory Modeling
Educational interactions often contain heterogeneous temporal patterns. To capture these dynamics, node representations are decomposed into multi-scale temporal components:
$$\mathbf{h}_v(t) = \mathbf{h}_v^{\mathrm{short}}(t) \oplus \mathbf{h}_v^{\mathrm{mid}}(t) \oplus \mathbf{h}_v^{\mathrm{long}}(t),$$
where ⊕ denotes concatenation.
Different temporal decay parameters are used to capture short-term practice, medium-term learning trends, and long-term knowledge retention.
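One simple realization of such multi-scale memory (the decay rates and dimensions here are illustrative assumptions): each scale keeps an exponentially decayed sum of past messages, and the scales are concatenated.

```python
import numpy as np

# Hypothetical decay rates for short-, medium-, and long-term traces.
LAMBDAS = (1.0, 0.1, 0.01)

def multiscale_state(event_times, messages, now):
    """Concatenate one exponentially decayed message sum per time scale."""
    times = np.asarray(event_times, dtype=float)
    msgs = np.asarray(messages, dtype=float)
    parts = []
    for lam in LAMBDAS:
        w = np.exp(-lam * (now - times))           # recency weights
        parts.append((w[:, None] * msgs).sum(axis=0))
    return np.concatenate(parts)

msgs = [np.ones(4), 0.5 * np.ones(4)]
state = multiscale_state([0.0, 5.0], msgs, now=10.0)
print(state.shape)  # (12,)
```

Small decay rates preserve old evidence (long-term retention), while large ones emphasize the most recent practice events.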
3.6. Self-Supervised Stabilization of Cognitive Embeddings
Educational datasets often contain sparse supervision signals. To improve representation robustness, two self-supervised objectives are introduced.
Temporal contrastive learning is defined as:
$$\mathcal{L}_{\mathrm{cl}} = -\log \frac{\exp\big(\mathrm{sim}(\mathbf{h}_v(t), \mathbf{h}_v(t'))/\kappa\big)}{\sum_{v'} \exp\big(\mathrm{sim}(\mathbf{h}_v(t), \mathbf{h}_{v'}(t'))/\kappa\big)},$$
where temporally adjacent views of the same node form positive pairs and $\kappa$ is a temperature parameter. Masked reconstruction loss is defined as:
$$\mathcal{L}_{\mathrm{rec}} = \big\| \hat{\mathbf{x}}_v - \mathbf{x}_v \big\|_2^2,$$
where $\hat{\mathbf{x}}_v$ is the reconstruction of a masked attribute $\mathbf{x}_v$ from its temporal context. The final training objective becomes:
$$\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \alpha \, \mathcal{L}_{\mathrm{cl}} + \beta \, \mathcal{L}_{\mathrm{rec}},$$
with weighting coefficients $\alpha$ and $\beta$.
These auxiliary objectives regularize embedding learning and improve stability.
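A numerical sketch of a temporal contrastive term in InfoNCE form; the temperature, dimensions, and the `info_nce` helper are illustrative assumptions:

```python
import numpy as np

def info_nce(anchor, positive, negatives, temp=0.1):
    """Pull two temporal views of the same node together and push views
    of other nodes away, using cosine similarity as sim(., .)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / temp
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

rng = np.random.default_rng(1)
h_t = rng.standard_normal(8)
negs = [rng.standard_normal(8) for _ in range(5)]
loss = info_nce(h_t, h_t + 0.01, negs)  # near-identical views -> small loss
print(loss)
```

The loss is always positive and shrinks as the two temporal views of the same node become more similar than the negatives.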
3.7. Reinforcement Learning Formulation for Adaptive Intervention
Adaptive instructional intervention is modeled as a Markov Decision Process:
$$\mathcal{M} = (S, A, P, R, \gamma),$$
where $S$ denotes states, $A$ denotes actions, $P$ denotes transition probabilities, $R$ denotes reward, and $\gamma$ is the discount factor. In the proposed formulation, the state $s_t \in S$ is derived from the Dynamic Cognition Graph, representing the learner’s current cognitive state embedding. The action $a_t \in A$ corresponds to instructional interventions, including exercise recommendation, difficulty adjustment, and feedback strategies. The reinforcement learning framework for adaptive intervention is illustrated in Figure 4. The transition dynamics $P(s_{t+1} \mid s_t, a_t)$ are defined based on observed learner interactions, where the next state is obtained by updating the Dynamic Cognition Graph with new interaction events. This formulation models environment transitions through data-driven cognitive state updates.
The cumulative return is defined as:
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}.$$
The reward function balances performance improvement and long-term retention:
$$R_t = \eta_1 \, \Delta \mathrm{perf}_t + \eta_2 \, \Delta \mathrm{ret}_t,$$
where $\Delta \mathrm{perf}_t$ captures immediate performance improvement, $\Delta \mathrm{ret}_t$ captures delayed retention gain, and $\eta_1, \eta_2$ are balancing weights.
The policy is parameterized to map cognitive states to intervention decisions. A value function is jointly learned to estimate expected returns, forming an actor–critic optimization framework.
The policy is trained using logged interaction data in an offline setting, where trajectories are constructed from historical learning sequences. To ensure reliable evaluation without online deployment, we adopt offline policy evaluation based on observed outcomes, aligning predicted rewards with actual learning performance improvements.
This design enables stable policy learning while avoiding risks associated with real-time exploration in educational environments.
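The return and reward computations above can be sketched directly; the 0.7/0.3 weighting between performance and retention is an illustrative assumption, not the paper's setting:

```python
def reward(perf_gain, retention_gain, w_perf=0.7, w_ret=0.3):
    # Weighted balance of immediate performance and delayed retention.
    return w_perf * perf_gain + w_ret * retention_gain

def discounted_returns(rewards, gamma=0.95):
    """Compute the discounted return G_t at every step of a logged
    trajectory, as used for offline policy evaluation."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

traj = [reward(0.1, 0.0), reward(0.0, 0.2), reward(0.3, 0.1)]
print(discounted_returns(traj))
```

Sweeping backwards over the logged trajectory yields all returns in a single pass, which keeps offline evaluation linear in trajectory length.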
3.8. Computational Complexity and Scalability
Let $N$ denote the number of interaction events and $d$ the embedding dimension. Message computation scales approximately as $O(N d^2)$ due to neural message functions. Memory updates scale linearly with event count, enabling incremental updates in streaming educational environments.
The reinforcement learning module is trained using mini-batch actor–critic optimization, ensuring computational feasibility for large-scale online learning systems.
4. Experimental Setup
This section describes the datasets, baseline models, evaluation tasks, implementation settings, and experimental protocols used to evaluate the proposed Learner Cognitive Graph (LCG) framework. The experimental design aims to assess both predictive modeling performance and the effectiveness of adaptive learning interventions under realistic educational scenarios.
4.1. Datasets
Experiments were conducted using real-world educational datasets, together with a complementary simulated setting for analyzing rare cognitive transitions.
In addition to the ASSISTments dataset and the coding dataset, we introduce a simulated setting to analyze model behavior under rare cognitive transitions. In real educational data, such transitions are typically sparse and difficult to observe directly.
To address this, we construct a simulation by injecting controlled transition patterns into learner interaction sequences, allowing us to evaluate whether the proposed framework can effectively capture and adapt to infrequent but meaningful changes in learner knowledge states.
We emphasize that this simulation is used as a complementary analysis under controlled conditions, rather than as a primary benchmark for performance comparison.
The combination of real-world datasets and the controlled simulated setting allows evaluation of the proposed framework under realistic learning conditions while also examining its robustness to rare cognitive dynamics.
4.1.1. Real-World Educational Datasets
ASSISTments Dataset
The ASSISTments dataset is a widely used benchmark for knowledge tracing research. It contains more than one million student–problem interaction records annotated with knowledge components. Each interaction includes a learner identifier, problem identifier, timestamp, correctness label, and associated skill tags. These records provide rich temporal interaction data that enable modeling of evolving learner knowledge states.
4.1.2. Dataset Statistics
Table 1 summarizes the statistics of the datasets used in our experiments.
4.2. Data Preprocessing
To prevent information leakage, the datasets were split by learner rather than by interaction, partitioning learners into disjoint training, validation, and test sets. Within each learner sequence, interaction events were ordered chronologically to preserve temporal dependencies.
All numerical features were normalized before model training. Textual reasoning explanations were encoded using a pretrained language encoder and subsequently integrated into node attributes within the dynamic cognition graph.
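A learner-level split of this kind can be sketched as follows; the 80/10/10 fractions and field names are illustrative, not the paper's reported ratios:

```python
import random

def split_by_learner(interactions, seed=0, frac=(0.8, 0.1, 0.1)):
    """Assign whole learners (never individual events) to train/val/test,
    so no learner's future events leak into training."""
    learners = sorted({ev["learner_id"] for ev in interactions})
    random.Random(seed).shuffle(learners)
    n = len(learners)
    cut1, cut2 = int(frac[0] * n), int((frac[0] + frac[1]) * n)
    group = {lid: ("train" if i < cut1 else "val" if i < cut2 else "test")
             for i, lid in enumerate(learners)}
    out = {"train": [], "val": [], "test": []}
    for ev in interactions:
        out[group[ev["learner_id"]]].append(ev)
    for split in out.values():  # keep each learner sequence chronological
        split.sort(key=lambda ev: (ev["learner_id"], ev["timestamp"]))
    return out

logs = [{"learner_id": i % 10, "timestamp": i} for i in range(50)]
splits = split_by_learner(logs)
print({k: len(v) for k, v in splits.items()})  # {'train': 40, 'val': 5, 'test': 5}
```

Splitting on learner identifiers guarantees the train and test populations are disjoint, which an interaction-level split would not.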
4.3. Baseline Models
To ensure a comprehensive comparison, the proposed LCG framework was evaluated against representative baselines from three categories: sequential knowledge tracing models, graph-based models, and adaptive intervention strategies.
Sequential knowledge tracing baselines include Bayesian Knowledge Tracing (BKT), Deep Knowledge Tracing (DKT), Self-Attentive Knowledge Tracing (SAKT), and Attention-based Knowledge Tracing (AKT). These models represent classical probabilistic approaches as well as modern deep learning architectures designed to capture temporal learning dynamics.
Graph-based baselines include a static heterogeneous knowledge graph combined with a graph neural network (Static KG + GNN) and the Temporal Graph Network (TGN), which models evolving graph structures using event-driven updates.
To evaluate adaptive instructional policies, several intervention strategies were implemented. A rule-based static strategy represents traditional tutoring policies. A contextual bandit model performs myopic optimization without considering long-term learning effects. Additionally, a reinforcement learning baseline without dynamic cognition graph representations was implemented, where handcrafted features were used as state representations instead of graph embeddings.
All baseline methods follow their standard input settings and do not utilize reasoning text. Reasoning-enhanced behavioral representations are only incorporated in the proposed framework as a core component of the method.
4.4. Evaluation Tasks
Three experimental tasks were designed to evaluate different aspects of the proposed framework.
The first task focuses on knowledge mastery prediction, where the objective is to predict whether a learner will correctly answer the next exercise given their interaction history.
The second task evaluates cognitive state transition modeling, which measures the ability of the model to detect transitions between different knowledge mastery levels.
The third task evaluates adaptive intervention effectiveness. In this setting, the reinforcement learning agent recommends instructional actions such as exercise selection or curriculum adjustments, and the impact on learning outcomes is evaluated.
4.5. Evaluation Metrics
For knowledge mastery prediction, model performance is evaluated using Area Under the ROC Curve (AUC), prediction accuracy, Brier score for probability calibration, and Mean Absolute Error (MAE).
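Among these metrics, the Brier score is simply the mean squared gap between predicted probabilities and binary outcomes; a minimal sketch:

```python
import numpy as np

def brier(probs, labels):
    """Brier score: mean squared difference between the predicted
    probability of a correct response and the observed 0/1 outcome
    (lower indicates better calibration)."""
    p, y = np.asarray(probs, dtype=float), np.asarray(labels, dtype=float)
    return float(np.mean((p - y) ** 2))

score = brier([0.9, 0.2, 0.7], [1, 0, 1])
print(round(score, 4))  # 0.0467
```

Unlike AUC, which is rank-based, the Brier score penalizes overconfident miscalibrated probabilities directly.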
For cognitive state transition modeling, Macro-F1 score, Transition-F1, and Spearman temporal correlation (ρ) are used to measure classification performance and temporal alignment between predicted and observed learning trajectories.
For intervention effectiveness, policy performance is assessed using time required to reach mastery, retention gain measured through delayed assessments, and percentage improvement in learner performance scores.
4.6. Implementation Details
All models were implemented using the PyTorch (version 2.1.0) deep learning framework and trained on NVIDIA GPUs. The embedding dimension $d$ was set to 128 unless otherwise specified. The temporal decay coefficient used in the event-driven message passing mechanism was selected through validation experiments.
Model optimization was performed using the Adam optimizer. For reinforcement learning components, the discount factor was chosen to balance short-term performance improvements with long-term learning gains. Actor–critic networks were trained using mini-batch updates, and gradient clipping was applied to improve training stability.
Table 2 summarizes the main hyperparameters used in the experiments.
Experiments were conducted on a workstation equipped with an NVIDIA RTX 4090 GPU and 24GB RAM.
Each experiment was repeated five times with different random seeds, and the reported results correspond to the mean and standard deviation across runs.
4.7. Statistical Significance Testing
To verify that observed improvements are statistically meaningful, paired t-tests were conducted between the proposed LCG framework and the strongest baseline model for each experimental task, using a conventional significance threshold. The strongest representative baseline for each task was selected for significance testing to provide a focused comparison against the most competitive reference method.
4.8. Ablation Study
To analyze the contribution of individual components, several ablation variants of the proposed framework were evaluated. These variants include removing the reverse Turing agent, removing the temporal decay mechanism, removing the multi-scale temporal memory module, removing self-supervised objectives, and replacing reinforcement learning with a rule-based intervention strategy.
By comparing these variants with the full LCG model, the ablation study quantifies the contributions of reasoning-aware behavioral acquisition, temporal modeling mechanisms, representation learning strategies, and policy optimization.
4.9. Cross-Dataset Generalization Evaluation
To evaluate the generalization capability of the proposed framework, cross-dataset experiments were conducted. In this setting, models were trained on one dataset and evaluated on another dataset without retraining. This evaluation assesses whether the learned cognitive representations capture transferable learning patterns rather than dataset-specific artifacts.
Specifically, two cross-dataset settings were considered: training on the ASSISTments dataset and testing on the coding platform dataset, and vice versa. In both cases, the model parameters learned from the source dataset were directly applied to the target dataset.
Performance was evaluated using AUC and Macro-F1 scores for mastery prediction and state transition detection. The observations suggest that the proposed LCG framework maintains relatively stable performance across datasets, indicating the potential generalization capability of the learned dynamic cognitive representations.
These findings should be interpreted as preliminary evidence of cross-dataset generalization rather than a comprehensive benchmark evaluation.
4.10. Cold-Start Learner Evaluation
Educational platforms frequently encounter cold-start scenarios where new learners have very limited interaction history. To evaluate model robustness under such conditions, a cold-start evaluation was conducted.
In this experiment, learners with fewer than k historical interactions were treated as cold-start users, and three levels of interaction sparsity (i.e., three values of k) were considered. Models were trained using the full training dataset but evaluated only on learners whose interaction histories were truncated to the specified lengths.
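The truncation protocol can be sketched as follows; the learner histories and the value of k are illustrative:

```python
# Sketch of the cold-start evaluation setup: keep only each learner's
# first k interaction events before evaluation.
def truncate_histories(histories, k):
    """Return a copy of the histories limited to the first k events each."""
    return {learner: events[:k] for learner, events in histories.items()}

histories = {
    "u1": [("q1", 1), ("q2", 0), ("q3", 1), ("q4", 1)],
    "u2": [("q5", 0), ("q6", 1)],
}
cold = truncate_histories(histories, k=3)
```

Learners with fewer than k events simply keep their full (short) history, which matches the intent of simulating sparse interaction data.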
Performance was evaluated using prediction accuracy and AUC. The results show that the proposed LCG framework consistently outperforms baseline methods under all sparsity levels. This improvement can be attributed to the relational modeling capability of the dynamic cognition graph, which allows knowledge information to propagate across concept and learner nodes even when individual interaction histories are limited.
These results indicate that the proposed framework is robust to sparse interaction data and can effectively support new learners in real-world educational systems, although they are intended as a supplementary robustness analysis rather than a standalone benchmark comparison.
4.11. Efficiency and Scalability Analysis
To evaluate the computational efficiency of the proposed framework, we measured both training time and inference latency across different model configurations.
Training efficiency was evaluated by recording the average epoch training time for each model. Inference efficiency was measured as the average time required to process a single learner interaction event.
The experiments show that although the proposed LCG framework introduces additional components such as dynamic graph updates and multi-scale memory modules, the event-driven message passing mechanism allows efficient incremental updates. As a result, the computational overhead remains manageable and scales approximately linearly with the number of interaction events.
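Per-event inference latency can be measured as described above; in the sketch below, a placeholder workload stands in for the actual model update:

```python
import time

def process_event(event):
    # Placeholder for the per-event incremental graph update;
    # any real model call would be substituted here.
    return sum(i * i for i in range(1000))

n_events = 200
start = time.perf_counter()
for e in range(n_events):
    process_event(e)
elapsed = time.perf_counter() - start
per_event_ms = 1000 * elapsed / n_events  # average latency per event, in ms
```

Averaging over many events amortizes timer overhead, which matters when individual updates take microseconds.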
Table 3 summarizes the efficiency comparison among the evaluated models.
The results indicate that the proposed method achieves improved modeling capability while maintaining practical computational efficiency for large-scale educational platforms.
5. Results
This section presents the experimental results evaluating the proposed Learner Cognitive Graph (LCG) framework across three tasks: knowledge mastery prediction, cognitive state transition modeling, and adaptive intervention optimization. All results are reported as mean ± standard deviation over five independent runs.
5.1. Knowledge Mastery Prediction Performance
Table 4 summarizes the prediction performance on the ASSISTments and coding datasets. The proposed LCG framework consistently achieves the best performance across all evaluation metrics.
Compared with classical probabilistic approaches such as BKT, deep sequential models (DKT, SAKT, AKT) provide clear improvements, confirming the importance of modeling temporal dependencies in learner interaction sequences. However, purely sequential models treat learning events as linear time series and fail to capture structural relationships among learners, concepts, and exercises.
Graph-based approaches further improve prediction accuracy by incorporating relational information. In particular, the temporal graph model TGN achieves stronger performance than static graph models due to its ability to model dynamic structural updates.
The proposed LCG framework achieves the highest AUC of 0.889, outperforming the strongest baseline (TGN) by 2.8 percentage points. Improvements are also observed in Brier score and MAE, indicating that the model produces better-calibrated probability estimates rather than merely improving classification accuracy.
These improvements can be attributed to two factors. First, the Dynamic Cognition Graph explicitly models interactions among learners, knowledge concepts, and exercises, allowing relational knowledge propagation. Second, the multi-scale temporal memory mechanism enables the model to capture both short-term behavioral signals and long-term learning patterns.
Paired t-tests confirm that the improvements over TGN are statistically significant in both AUC and MAE.
5.2. Cognitive State Transition Modeling
Table 5 reports the results for cognitive state transition detection.
Sequential models show moderate performance in cognitive state transition detection, as they are optimized for next-step prediction rather than structural change modeling. Static graph models improve results by incorporating relational dependencies among concepts, while temporal graph models further benefit from evolving interaction structures.
The proposed LCG model achieves the best performance across all metrics, reaching a Macro-F1 of 0.814, a Transition-F1 of 0.789, and a Spearman correlation of 0.931. Compared with TGN, LCG improves Macro-F1 and Transition-F1 by 3.3 percentage points and yields stronger alignment between predicted cognitive trajectories and actual learning trends. These results indicate that multi-scale temporal modeling is important for detecting abrupt cognitive shifts in real learning processes.
5.3. Intervention Policy Evaluation
Table 6 compares the effectiveness of different instructional intervention strategies.
Rule-based strategies represent traditional tutoring policies and achieve limited improvement due to their inability to adapt to individual learner states. Contextual bandit approaches improve performance by incorporating interaction context but optimize only short-term rewards.
Reinforcement learning methods further improve learning outcomes by considering long-term reward signals. However, their performance depends heavily on the quality of the state representation.
When reinforcement learning is combined with dynamic cognitive graph embeddings, the proposed DCG+RL framework achieves the best performance. Specifically, time-to-80% mastery decreases from 8.3 days to 7.4 days, representing a 10.8% improvement over RL without graph-based representations. Retention gain increases from 10.4% to 13.6%, indicating improved long-term knowledge consolidation.
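The reported 10.8% figure follows directly from the two time-to-mastery values:

```python
# Relative reduction in time-to-80%-mastery (values from the text above).
baseline, proposed = 8.3, 7.4  # days
improvement = 100 * (baseline - proposed) / baseline
# (8.3 - 7.4) / 8.3 ≈ 0.1084, i.e., the reported 10.8% improvement
```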
These findings demonstrate that high-quality cognitive state representations significantly enhance policy learning in adaptive educational systems.
5.4. Detailed Ablation Analysis
Table 7 presents the ablation study analyzing the contribution of individual components.
Removing the reverse Turing agent leads to a noticeable performance drop, indicating that structured reasoning evidence improves behavioral data quality for cognitive modeling.
Eliminating the temporal decay mechanism reduces prediction accuracy and transition detection performance, suggesting that modeling heterogeneous temporal scales is critical for representing learning dynamics.
Similarly, removing the multi-scale memory module leads to a consistent decline in performance across all metrics, confirming the importance of capturing both short-term and long-term learning signals.
Finally, replacing reinforcement learning with a rule-based intervention strategy significantly reduces retention gain, demonstrating that optimizing long-term learning outcomes requires adaptive policy learning.
5.5. Overall Trend Analysis
Across all experimental tasks, three consistent patterns emerge.
First, sequential models outperform probabilistic baselines but remain limited by their inability to model structural relationships among learning entities.
Second, static graph models improve relational awareness but struggle to capture dynamic cognitive transitions.
Third, temporal graph models close part of this gap, yet their performance remains constrained without multi-scale temporal modeling and structured behavioral evidence.
By integrating dynamic graph modeling, multi-scale memory mechanisms, and reinforcement learning-based policy optimization, the proposed LCG framework consistently achieves the best performance across prediction accuracy, transition detection, and intervention effectiveness.
6. Discussion
6.1. Implications for Dynamic Cognitive Modeling
The experimental results demonstrate that modeling learner cognition as a dynamic heterogeneous graph provides measurable advantages over purely sequential or static relational approaches. Improvements observed in both knowledge mastery prediction and cognitive state transition detection suggest that the hybrid representation—combining relational structure with temporal memory—better aligns with the underlying dynamics of learning.
The multi-scale memory mechanism plays a particularly important role. Educational interactions exhibit both short-term fluctuations (e.g., practice bursts) and long-term consolidation patterns. Sequential models often overemphasize recent interactions, whereas static graph models neglect temporal recency altogether. By explicitly decomposing temporal representations into short-, medium-, and long-term components, the proposed framework stabilizes embedding updates and reduces variance in state estimation. This structural decomposition may explain the consistent improvement in transition-F1 and temporal correlation.
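One minimal way to realize such a short-/medium-/long-term decomposition is a bank of exponential moving averages with different decay rates. The sketch below is illustrative only; the decay values are hypothetical and the update rule does not reproduce the paper's exact memory mechanism:

```python
import numpy as np

# Three timescales, each an EMA of interaction embeddings with its own
# decay rate (illustrative values, not the tuned ones from the paper).
decays = {"short": 0.5, "medium": 0.9, "long": 0.99}

def update_memory(memory, event_embedding):
    """Blend a new interaction embedding into each timescale's memory."""
    return {scale: d * memory[scale] + (1 - d) * event_embedding
            for scale, d in decays.items()}

dim = 4
memory = {scale: np.zeros(dim) for scale in decays}
for _ in range(10):  # a burst of identical practice events
    memory = update_memory(memory, np.ones(dim))
```

After a short practice burst the short-term component has nearly saturated while the long-term component has barely moved, mirroring the separation of practice bursts from slow consolidation described above.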
Furthermore, incorporating structured reasoning evidence through the reverse Turing agent appears to enhance representation quality. The ablation study indicates that removing reasoning-based features leads to degradation across tasks, suggesting that cognitive modeling benefits from richer semantic signals beyond binary correctness labels.
6.2. Impact on Adaptive Intervention Policies
A central contribution of this work is demonstrating how dynamic cognitive representations improve reinforcement learning-based intervention strategies. The results show that using DCG embeddings as RL states reduces time-to-mastery and increases retention gain compared with policies based on handcrafted features.
This improvement can be interpreted as enhanced state observability. In educational environments, learner states are partially observable and non-stationary. Simplified performance indicators may fail to capture deeper structural dependencies among concepts. By encoding relational and temporal information jointly, the DCG representation provides a more informative state space for policy optimization. Consequently, the RL agent can better balance short-term performance improvement with long-term knowledge consolidation.
Importantly, the observed gains are moderate rather than extreme. The reduction in time-to-mastery (approximately 10%) and retention improvement (approximately 3 percentage points) indicate practical but realistic enhancements. This moderate improvement suggests that dynamic representation learning contributes meaningfully without overstating its impact.
6.3. Comparison with Existing Paradigms
The comparative results reveal three distinct trends across modeling paradigms.
First, probabilistic models such as BKT provide interpretable but limited representations, particularly in complex relational settings. Second, sequential deep learning models capture temporal patterns but lack structural awareness. Third, static graph models improve relational modeling but struggle to adapt to evolving interaction dynamics.
Temporal graph models partially address this limitation; however, without multi-scale memory and structured semantic enrichment, their improvements remain incremental. The proposed framework integrates these components into a unified architecture, enabling consistent gains across prediction and intervention tasks.
These findings support the hypothesis that neither purely sequential nor purely structural modeling is sufficient for dynamic educational environments. Instead, hybrid approaches combining relational graphs with temporal event-driven updates offer a more comprehensive representation.
6.4. Stability and Scalability Considerations
From a system perspective, the incremental event-driven update mechanism enables efficient deployment in large-scale online learning platforms. Unlike static retraining approaches, the DCG framework updates node embeddings locally upon new interactions. The computational complexity scales linearly with event count, making it suitable for high-frequency educational data streams.
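The locality of event-driven updates can be sketched as follows; the update rule itself is a toy stand-in for the actual message function, and node names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 8
embeddings = {}  # node id -> embedding vector, created lazily

def on_event(learner, exercise, correct, lr=0.1):
    """Update only the two node embeddings touched by this interaction.

    Toy update: nudge each node toward (or away from) its counterpart
    depending on correctness. All other nodes remain untouched, so the
    cost per event is constant regardless of graph size.
    """
    for node in (learner, exercise):
        if node not in embeddings:
            embeddings[node] = rng.normal(size=dim)
    signal = 1.0 if correct else -1.0
    embeddings[learner] = embeddings[learner] + lr * signal * embeddings[exercise]
    embeddings[exercise] = embeddings[exercise] + lr * signal * embeddings[learner]

on_event("u1", "q7", correct=True)
```

Because each event writes to a fixed number of node embeddings, total work grows linearly in the number of events, which is the scaling property claimed above.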
The reinforcement learning component also remains computationally tractable due to mini-batch actor–critic optimization. While the integration of multi-scale memory slightly increases model parameters, empirical training remained stable across multiple random seeds, as reflected by low standard deviations in reported metrics.
6.5. Limitations
Several limitations should be acknowledged.
First, although synthetic datasets were used to simulate rare cognitive transitions, the long-term educational impact of the proposed framework has not yet been validated through extended real-world longitudinal studies. Future work should evaluate retention and behavioral adaptation over longer time horizons in authentic learning environments.
Second, the reverse Turing agent relies on large language models, which introduce additional computational overhead and potential variability in the collected reasoning evidence. Although the proposed framework mitigates part of this issue by encoding textual explanations into dense semantic representations rather than directly using raw text, the quality of reasoning signals may still be affected by ambiguity, incompleteness, or inconsistency in learner responses. Further alignment, filtering, and quality-control strategies are needed to improve robustness.
Third, the reinforcement learning evaluation in this study was conducted in an offline simulation setting constructed from historical interaction data. While this setup enables controlled comparison of intervention policies, it cannot fully capture the complexity of live educational deployment. Future work should validate the proposed policy framework in real-world adaptive learning systems.
Fourth, reward design in reinforcement learning requires balancing immediate performance improvement and delayed retention effects. Alternative reward formulations may lead to different intervention behaviors, and future work should investigate adaptive or meta-learned reward mechanisms.
Finally, although the proposed framework improves predictive accuracy and intervention efficiency, the observed gains remain incremental rather than transformative. Educational outcomes are influenced by many external pedagogical, behavioral, and contextual factors beyond algorithmic optimization alone.
6.6. Future Directions
Future research may extend this framework in several directions. First, incorporating multimodal signals such as eye-tracking or affective cues could further enrich cognitive state representations. Second, federated or privacy-preserving training strategies may enable cross-institution deployment while protecting learner data. Third, meta-reinforcement learning could allow rapid adaptation to new curricula or learner populations. Finally, theoretical analysis of dynamic graph stability under non-stationary environments remains an open area of investigation.
Overall, this study provides empirical evidence that integrating dynamic graph modeling with reinforcement learning offers a practical pathway toward more adaptive and interpretable intelligent educational systems. Beyond educational applications, the temporal message-based modeling paradigm may generalize to other domains involving dynamic relational processes. For example, in urban analysis tasks such as building façade understanding [39], temporal or structured observations of building elements can be modeled using similar dynamic graph representations. Similarly, in air handling unit fault detection and diagnosis (AFDD) [40], time-evolving sensor relationships and operational states can be effectively captured through temporal message passing, indicating the broader applicability of the proposed framework.
7. Conclusions
This study proposed a Learner Cognitive Graph (LCG) framework that integrates dynamic graph-based cognitive modeling with reinforcement learning for adaptive educational intervention. By formalizing learner cognition as an event-driven heterogeneous graph and incorporating multi-scale temporal memory, the framework captures both structural dependencies and evolving interaction dynamics. The learned graph embeddings serve as state representations for reinforcement learning, enabling policy optimization that balances short-term performance improvement with long-term knowledge retention.
Experimental results on real-world and simulated datasets demonstrate consistent improvements over representative sequential, static graph, and temporal graph baselines. The proposed approach achieves higher knowledge mastery prediction accuracy, more precise cognitive state transition modeling, and improved intervention efficiency. Ablation studies further confirm the contribution of temporal decay modeling, multi-scale memory decomposition, structured behavioral evidence acquisition, and reinforcement learning-based optimization.
Importantly, performance gains remain moderate and statistically significant, suggesting practical rather than exaggerated improvements. The event-driven update mechanism supports incremental computation, making the framework suitable for deployment in large-scale online educational systems.
While several limitations remain, including reliance on language model-based reasoning evidence and reward design sensitivity, the present work provides empirical evidence that dynamic cognitive representations enhance adaptive intervention policies. The proposed integration of heterogeneous graph modeling and reinforcement learning offers a systematic and scalable approach for intelligent educational systems, and may serve as a foundation for future research in dynamic learner modeling and personalized instruction.