Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning

Li, Ying; Gai, Yiming; Wang, Xingyu; Sun, Leilei; Huang, Xuefei

doi:10.3390/app16073580

Open AccessArticle

Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning

by

Ying Li

^1,2,

Yiming Gai

¹

,

Xingyu Wang

¹,

Leilei Sun

^1,2

and

Xuefei Huang

^1,*

¹

Hangzhou International Innovation Institute, Beihang University, Hangzhou 311115, China

²

School of Computer Science and Engineering, Beihang University, Beijing 100191, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(7), 3580; https://doi.org/10.3390/app16073580

Submission received: 10 March 2026 / Revised: 25 March 2026 / Accepted: 1 April 2026 / Published: 6 April 2026

(This article belongs to the Special Issue Artificial Intelligence in Education: Latest Advances and Prospects)

Download

Browse Figures

Versions Notes

Abstract

Accurate modeling of learners’ evolving cognitive states is essential for intelligent educational systems, yet many existing knowledge tracing and graph-based approaches rely on static structures or purely sequential representations that inadequately capture dynamic structural changes in learning processes. This study proposes a Learner Cognitive Graph (LCG) framework that integrates dynamic heterogeneous graph modeling, structured behavioral data acquisition, and reinforcement learning-based intervention optimization. A Dynamic Cognition Graph (DCG) is formally defined as a sequence of temporally evolving graph snapshots representing interactions among learners, knowledge concepts, and exercises. A reverse Turing test-based agent with structured prompting is introduced to collect reasoning-oriented behavioral evidence, improving data reliability for cognitive modeling. Temporal message passing, multi-scale memory updating, and self-supervised learning objectives are employed to construct dynamic cognitive representations. Personalized intervention is formulated as a Markov decision process to optimize long-term learning outcomes. Experiments conducted on real-world and simulated educational datasets demonstrate improved knowledge mastery prediction accuracy, cognitive state transition modeling, and intervention efficiency compared with representative baselines. The proposed framework provides a systematic and scalable approach for dynamic cognitive modeling and adaptive educational support.

Keywords:

dynamic cognition graph; knowledge tracing; graph neural networks; temporal representation learning; large language models; reinforcement learning; personalized learning; educational data mining

1. Introduction

The rapid development of digital learning platforms and intelligent tutoring systems has significantly expanded the availability of fine-grained educational interaction data [1,2]. These data provide unprecedented opportunities to model learners’ cognitive processes and to design adaptive instructional strategies. Accurate cognitive modeling is central to intelligent education, as it enables systems to estimate knowledge mastery, predict future performance, and deliver personalized interventions [3]. However, despite considerable progress in educational data mining, the dynamic and structural complexity of learning processes remains insufficiently captured by many existing approaches.

Early research in knowledge tracing relied on probabilistic frameworks such as Bayesian Knowledge Tracing, which modeled skill mastery as latent binary states evolving over time [4]. While effective in small-scale settings, such methods often assumed independence among knowledge components and limited temporal expressiveness. The introduction of deep learning led to sequential models such as Deep Knowledge Tracing (DKT), which employed recurrent neural networks to capture temporal dependencies [5]. Subsequent extensions incorporated attention mechanisms and self-attentive architectures to improve long-range dependency modeling [6,7]. Nevertheless, many sequential approaches treat learning interactions as linear time series, without explicitly modeling structural relationships among learners, knowledge concepts, and exercises.

Graph-based educational modeling has emerged as a promising direction to address structural dependencies. By representing entities as nodes and interactions as edges, graph neural networks (GNNs) can capture relational information and multi-hop dependencies [8]. Static knowledge graphs and heterogeneous graph models have been applied to educational scenarios to improve prediction accuracy and interpretability [9,10]. However, a key limitation remains: many graph-based models operate on fixed or discretized snapshots and do not fully account for continuous structural evolution. Learning processes involve both gradual knowledge accumulation and abrupt cognitive restructuring, consistent with theoretical perspectives in cognitive psychology [11,12]. Capturing this hybrid dynamic—where temporal continuity coexists with discrete event transitions—requires modeling mechanisms beyond static graph structures.

In parallel, large language models (LLMs) have demonstrated strong capabilities in reasoning, dialogue generation, and explanation synthesis [13,14]. Their application in education has focused primarily on automated tutoring, feedback generation, and content creation [15]. Although LLMs can generate detailed reasoning traces through structured prompting, concerns remain regarding hallucination, reliability, and the lack of integration with formal analytical models [16]. Current systems often treat LLMs as standalone conversational tools rather than as components within a rigorous cognitive modeling framework. This separation limits their potential contribution to dynamic learner modeling.

Reinforcement learning (RL) has also been explored for adaptive educational decision-making, where instructional policies are optimized to maximize long-term learning gains [17,18]. However, many RL-based tutoring systems rely on simplified state representations derived from handcrafted features or shallow performance indicators. Without robust dynamic cognitive embeddings, policy optimization may fail to reflect deeper structural changes in learner understanding.

These observations reveal several research gaps. First, there is a lack of unified frameworks that integrate dynamic structural modeling with temporal learning trajectories. Second, passive behavioral logging does not adequately capture learners’ reasoning processes, limiting interpretability. Third, existing adaptive intervention strategies often emphasize short-term performance rather than long-term cognitive development. Finally, the integration of LLM-based reasoning evidence into formal graph-based cognitive models remains underexplored. More importantly, existing studies typically address these aspects in isolation, lacking a unified formulation that connects reasoning evidence, cognitive representation, and decision-making.

To address these challenges, this study proposes a Learner Cognitive Graph (LCG) framework that formulates learner modeling as a unified process connecting behavioral evidence, relational structure, and instructional decision-making. Instead of treating dynamic graph modeling, reasoning acquisition, and policy optimization as independent components, the proposed framework establishes a coherent mechanism in which reasoning signals are transformed into structured supervision, cognitive states are modeled through temporally evolving graphs, and intervention strategies are optimized based on these representations. A Dynamic Cognition Graph (DCG) is formally defined to represent temporally evolving interactions among learners, knowledge concepts, and exercises. A reverse Turing test-based agent employing structured prompting strategies is designed to elicit reasoning-oriented behavioral evidence, enhancing data fidelity. Dynamic graph representation learning with temporal message passing and self-supervised objectives constructs interpretable cognitive embeddings. Personalized intervention is formulated as a Markov decision process, enabling long-term reward optimization.

The main aim of this work is to develop a unified and scalable framework for dynamic cognitive modeling and adaptive educational support. The key contribution lies in the formulation of a reasoning-aware cognitive modeling paradigm, where behavioral acquisition, structural representation, and policy optimization are tightly coupled within a single learning process. By explicitly linking behavioral perception, structural modeling, and policy optimization within a unified formulation, the proposed approach contributes to the advancement of intelligent educational systems that are both interpretable and adaptive.

2. Related Work

2.1. Knowledge Tracing and Cognitive Modeling

Knowledge tracing aims to model learners’ evolving mastery of skills or concepts based on interaction histories. Early approaches were predominantly probabilistic. Bayesian Knowledge Tracing (BKT) represented knowledge as a latent binary variable governed by transition and emission probabilities [4]. While interpretable and computationally efficient, BKT assumed independence among knowledge components and relied on manually defined parameters, limiting its expressiveness in complex learning environments.

The emergence of deep learning significantly reshaped knowledge tracing research. Deep Knowledge Tracing (DKT) introduced recurrent neural networks to model temporal dependencies in student interaction sequences [5]. Subsequent models, including memory-augmented networks and self-attentive architectures such as SAKT and AKT, improved long-range dependency modeling and concept interaction representation [6,7], while more recent studies further enhanced sequential knowledge tracing through contrastive and multi-expert learning strategies [19]. These methods achieved higher predictive accuracy compared to probabilistic baselines. More recently, large language models have been introduced into knowledge tracing to incorporate semantic understanding of problem content and learner responses, further improving representation capability in modeling semantic aspects of learner behavior [20,21]. However, most sequential models treat educational data as linear time series, without explicitly modeling structural relationships among learners, exercises, and knowledge concepts.

Graph-based extensions attempted to overcome this limitation by incorporating relational structures. Heterogeneous graph neural networks (HGNNs) and knowledge graph-enhanced tracing models represent students, concepts, and problems as nodes connected through typed edges [9,10]. Such approaches capture inter-concept dependencies and relational reasoning, offering improved interpretability. Nevertheless, many graph-based knowledge tracing models operate on static or periodically updated graphs. Recent studies have explored dynamic graph learning mechanisms to model temporal structural evolution, such as DyGKT and temporal graph memory networks [22,23]. However, these approaches still focus primarily on interaction-level signals and do not incorporate reasoning-derived cognitive evidence. They do not fully capture the continuous structural evolution of cognitive states driven by ongoing interactions, forgetting, and strategy shifts.

A fundamental divergence in the literature concerns whether learning dynamics are best modeled as sequential processes or relational structural processes. Sequential models emphasize temporal order, whereas graph-based models emphasize relational context. A unified framework that jointly integrates temporal dynamics, structural adaptation, and reasoning-aware cognitive signals remains largely underexplored.

2.2. Dynamic Graph Neural Networks

Dynamic graph neural networks (DGNNs) extend traditional GNNs to evolving graphs where nodes and edges change over time. Models such as Temporal Graph Networks (TGN) and attention-based temporal GNNs incorporate memory modules and event-driven updates to represent temporal evolution [24,25]. These approaches have demonstrated strong performance in recommendation systems, social network analysis, and traffic forecasting.

Temporal message passing mechanisms enable node embeddings to evolve in response to event streams, capturing both short-term interactions and long-term dependencies. Memory units such as gated recurrent structures allow the model to retain historical information while adapting to new evidence [26]. Some approaches further introduce time-decay functions or positional encoding to model recency effects.

Despite their effectiveness in general dynamic graph settings, DGNNs have been relatively underexplored in educational contexts. Educational data exhibit unique characteristics, including multi-typed relations (e.g., learner–concept, learner–exercise, concept–concept) and hybrid dynamics combining continuous mastery progression with discrete behavioral events. Existing educational graph models often simplify temporal evolution or rely on coarse-grained snapshots, limiting their ability to represent fine-grained cognitive restructuring.

Applying DGNN techniques to cognitive modeling requires careful adaptation, including event encoding strategies tailored to educational semantics and multi-scale temporal memory mechanisms. Bridging dynamic graph representation learning with educational cognitive theory therefore remains an important research direction.

2.3. Large Language Models in Education

Large language models (LLMs) have demonstrated impressive capabilities in natural language reasoning, explanation generation, and interactive dialogue [13,14]. Their application in education has rapidly expanded, including automated tutoring, feedback generation, content summarization, and question answering systems [15]. Structured prompting strategies, such as chain-of-thought reasoning, have been shown to improve reasoning transparency and output reliability [27]. Their application in education has rapidly expanded, including automated tutoring, feedback generation, content summarization, and question answering systems [20,28].

However, several challenges accompany the integration of LLMs into educational systems. First, LLM-generated responses may contain hallucinations or factual inaccuracies, raising concerns about reliability [16]. Second, most current educational applications treat LLMs as standalone conversational agents rather than as components within formal analytic frameworks. Third, LLM outputs are typically not integrated into structured cognitive state representations suitable for downstream modeling and optimization.

Recent studies have further explored integrating LLMs with knowledge tracing and structured learning tasks, aiming to leverage semantic reasoning to enhance learner modeling [20,21,29]. However, these approaches primarily focus on improving prediction performance and do not explicitly incorporate reasoning evidence into dynamic structural representations. Nevertheless, systematic protocols for incorporating LLM-derived reasoning evidence into dynamic graph-based learner models remain limited. This gap suggests the need for architectures that combine interactive data elicitation with formal representation learning. In a broader sense, prior studies have shown that multi-modal feature fusion can enhance semantic representation quality in complex understanding tasks [30], which also supports the potential value of integrating richer behavioral signals into learner modeling.

2.4. Reinforcement Learning for Adaptive Education

Reinforcement learning (RL) has been proposed as a framework for optimizing adaptive instructional policies. By modeling learning environments as Markov decision processes, RL approaches aim to maximize cumulative learning gains through sequential decision-making [31,32]. Applications include exercise recommendation, curriculum sequencing, and intelligent tutoring systems.

Early RL-based educational systems often relied on simplified state representations derived from performance scores or handcrafted features [17]. More recent studies have incorporated neural representations and structured knowledge modeling to capture richer contextual information [32]. However, policy optimization quality strongly depends on the expressiveness of the state representation. Without dynamic and interpretable cognitive embeddings, RL policies may optimize short-term performance without promoting long-term knowledge retention.

Another divergence in the literature concerns reward design [32]. Some systems emphasize immediate accuracy improvements, whereas others attempt to incorporate delayed retention effects or engagement metrics. Balancing short-term performance with long-term cognitive stability remains an unresolved challenge.

2.5. Summary and Positioning of This Work

In summary, prior research has made substantial advances in sequential knowledge tracing, graph-based relational modeling, dynamic graph representation learning, LLM-based tutoring, and reinforcement learning for adaptive instruction. However, these research directions have largely evolved independently. Sequential models often lack structural awareness, static graph models insufficiently capture temporal evolution, LLM-based systems lack formal integration with cognitive modeling, and RL approaches frequently rely on limited state representations.

The present study positions itself at the intersection of these research streams. By integrating dynamic heterogeneous graph modeling, structured LLM-driven behavioral data acquisition, self-supervised temporal representation learning, and reinforcement learning-based intervention optimization, the proposed framework aims to provide a unified architecture for dynamic cognitive modeling and adaptive educational support.

Despite these advances, existing approaches typically focus on isolated aspects of the problem, such as temporal prediction, relational modeling, or policy optimization, without establishing a unified mechanism that connects reasoning evidence, cognitive representation, and decision-making.

In contrast, the proposed framework formulates these components within a single learning paradigm, where reasoning signals are explicitly incorporated into representation learning, and the resulting cognitive states are directly used to guide policy optimization.

3. Materials and Methods

3.1. Dynamic Cognitive Representation as State Encoding for Adaptive Intervention

The central objective of intelligent educational systems is to estimate learners’ evolving cognitive states and utilize these estimates to guide adaptive instructional decisions [1,33]. In reinforcement learning (RL)-based tutoring systems, the quality of policy optimization critically depends on the expressiveness and stability of the state representation [34]. Traditional state representations often rely on simplified statistics such as recent correctness rates or handcrafted performance indicators. While computationally convenient, such representations fail to capture deeper structural relationships among learners, knowledge concepts, and exercises.

To address this limitation, this study models learner cognition as a temporally evolving relational structure termed the Dynamic Cognition Graph (DCG). In this formulation, learner states are represented through graph embeddings derived from heterogeneous interactions among learners, knowledge concepts, and exercises [35]. The learned graph embeddings serve as state vectors for downstream intervention policies.

By integrating relational structure and temporal dynamics into a unified representation, the proposed approach provides a richer description of learning trajectories. This design enables adaptive intervention strategies to consider both immediate behavioral evidence and long-term cognitive development patterns.The overall architecture of the proposed system is illustrated in Figure 1.

3.2. Reverse Turing Agent for Reasoning-Oriented Behavioral Acquisition

Traditional educational data collection relies primarily on passive logging of learner interactions, such as correctness labels and response times [2]. Although these signals provide coarse behavioral indicators, they often fail to capture the reasoning processes underlying learner responses. To obtain richer cognitive evidence, a Reverse Turing Agent is introduced to elicit structured reasoning explanations during learner interactions.

In contrast to the classical Turing test [36], the reverse Turing paradigm encourages learners to demonstrate reasoning ability to convince the system that their answers are derived from genuine understanding rather than guessing. The detailed system architecture is illustrated in Figure 2.

For each interaction event, the learner provides a short reasoning explanation defined as

r_{i}^{text} = G (x_{i}, y_{i}, p_{i})

(1)

where

x_{i}

denotes the exercise content,

y_{i}

denotes the learner answer, and

p_{i}

denotes the structured prompt generated by the Reverse Turing Agent.

G (\cdot)

denotes the reasoning elicitation function that maps interaction inputs to textual explanations.

In practice, the prompt

p_{i}

follows a structured elicitation template designed to guide concise reasoning:

p_{i} \in \{\begin{matrix} “ Please explain how you arrived at your answer . ”, \\ “ Which concept did you use in this step ? ”, \\ “ If your answer is incorrect, identify the most likely reason . ” \end{matrix}

(2)

The textual explanation is encoded using a pretrained language encoder based on modern transformer architectures [37,38]:

z_{i} = Enc (r_{i}^{text})

(3)

The resulting semantic embedding is integrated into the event attribute vector:

a_{i} = [c_{i}, τ_{i}, z_{i}]

(4)

where

c_{i}

denotes correctness and

τ_{i}

represents response time. The reasoning embedding

z_{i}

provides additional semantic signals that help identify misconceptions and partial understanding. To reduce the impact of noisy, incomplete, or linguistically variable reasoning responses, textual explanations are not used as raw supervisory targets. Instead, they are encoded into dense semantic representations and incorporated as auxiliary behavioral attributes. This design allows the framework to capture high-level reasoning cues while reducing sensitivity to surface-form variation and minor inconsistencies in learner-generated explanations. Specifically, excessively short, off-topic, and duplicated responses are filtered prior to encoding to improve robustness.

3.3. Dynamic Cognition Graph Formalization

Learner cognition is represented as a dynamic heterogeneous graph evolving over time. At time t, the Dynamic Cognition Graph is defined as:

G_{t} = (V_{t}, E_{t}, R, X_{t})

(5)

where

V_{t} = U \cup K \cup Q

represents learners, knowledge concepts, and exercises,

E_{t}

denotes interaction edges, R indicates relation types, and

X_{t}

denotes node features. The structure of the dynamic cognition graph is illustrated in Figure 3.

In the proposed formulation, the relation set R includes multiple typed relations to capture heterogeneous educational interactions, including learner–exercise response relations, learner–concept mastery relations, and concept–concept dependency relations. These relation types enable the graph to jointly represent behavioral evidence and knowledge structure within a unified framework.

Each node

v \in V_{t}

is associated with a type-specific representation. The initial embedding is defined as:

h_{v}^{(0)} = {Enc}_{type (v)} (x_{v})

(6)

where

x_{v}

denotes raw node attributes and

{Enc}_{type (v)} (\cdot)

is a type-specific encoder. Learner nodes are initialized from historical interaction statistics, while concept and exercise nodes are initialized from learnable embeddings.

Graph evolution is driven by interaction events:

e_{i} = (u_{i}, v_{i}, r_{i}, t_{i}, a_{i})

(7)

where

u_{i}

and

v_{i}

denote interacting nodes,

r_{i}

is the relation type,

t_{i}

is the timestamp, and

a_{i}

contains behavioral attributes defined in Equation (4).

The graph update process is defined as:

G_{t + 1} = F (G_{t}, e_{t + 1})

(8)

where F performs local structural updates and embedding refinement.

In practice, the update function consists of event encoding and temporal message passing. Each event is first encoded as:

m_{i} = ϕ (h_{u_{i}}, h_{v_{i}}, r_{i}, a_{i})

(9)

where

ϕ (\cdot)

denotes an event encoder. The encoded messages are then aggregated over temporal neighborhoods to update node representations.

To improve robustness, a masking strategy is applied during training, where partial node features and event attributes are randomly masked and reconstructed from context.

In addition, negative sampling is adopted by generating corrupted interaction pairs, enabling the model to distinguish observed interactions from unobserved ones.

The model is trained using mini-batch optimization over temporally ordered event sequences, where graph representation learning and downstream objectives are optimized jointly.

3.4. Event-Driven Message Passing Mechanism

To model temporal dependencies in learner interactions, an event-driven message passing mechanism is adopted. Each interaction generates a message:

m_{i} = ϕ (h_{u_{i}} (t_{i}^{-}), h_{v_{i}} (t_{i}^{-}), r_{i}, a_{i})

(10)

where

h_{u_{i}} (t_{i}^{-})

and

h_{v_{i}} (t_{i}^{-})

represent node embeddings before the interaction event.

To capture recency effects, temporal decay is introduced:

{\bar{m}}_{v} (t) = \sum_{i \in I_{v} (t)} exp (- β (t - t_{i})) m_{i}

(11)

Node states are updated through a gated recurrent mechanism:

h_{v} (t) = GRU (h_{v} (t^{-}), {\bar{m}}_{v} (t))

(12)

This mechanism allows continuous refinement of cognitive representations while preserving long-term dependencies.

3.5. Multi-Scale Spatiotemporal Memory Modeling

Educational interactions often contain heterogeneous temporal patterns. To capture these dynamics, node representations are decomposed into multi-scale temporal components:

h_{v} (t) = h_{v}^{(s h o r t)} (t) \oplus h_{v}^{(m i d)} (t) \oplus h_{v}^{(l o n g)} (t)

(13)

where ⊕ denotes concatenation.

Different temporal decay parameters are used to capture short-term practice, medium-term learning trends, and long-term knowledge retention.

3.6. Self-Supervised Stabilization of Cognitive Embeddings

Educational datasets often contain sparse supervision signals. To improve representation robustness, two self-supervised objectives are introduced.

Temporal contrastive learning is defined as:

L_{c o n} = - log \frac{exp (s i m (z, z^{+}) / τ)}{exp (s i m (z, z^{+}) / τ) + \sum_{k} exp (s i m (z, z_{k}^{-}) / τ)}

(14)

Masked reconstruction loss is defined as:

L_{r e c} = {∥ X - \hat{X} ∥}_{2}^{2}

(15)

The final training objective becomes:

L = L_{s u p} + λ_{1} L_{c o n} + λ_{2} L_{r e c}

(16)

These auxiliary objectives regularize embedding learning and improve stability.

3.7. Reinforcement Learning Formulation for Adaptive Intervention

Adaptive instructional intervention is modeled as a Markov Decision Process:

M = 〈 S, A, P, R, γ 〉

(17)

where S denotes states, A denotes actions, P denotes transition probabilities, R denotes reward, and

γ

is the discount factor. In the proposed formulation, the state

s_{t} \in S

is derived from the Dynamic Cognition Graph, representing the learner’s current cognitive state embedding. The action

a_{t} \in A

corresponds to instructional interventions, including exercise recommendation, difficulty adjustment, and feedback strategies. The reinforcement learning framework for adaptive intervention is illustrated in Figure 4. The transition dynamics

P (s_{t + 1} ∣ s_{t}, a_{t})

are defined based on observed learner interactions, where the next state is obtained by updating the Dynamic Cognition Graph with new interaction events. This formulation models environment transitions through data-driven cognitive state updates.

The cumulative return is defined as:

G_{t} = \sum_{k = 0}^{\infty} γ^{k} r_{t + k}

(18)

The reward function balances performance improvement and long-term retention:

r_{t} = α Δ A c c_{t} + (1 - α) Δ R e t_{t} - η C o s t (a_{t})

(19)

The policy

π (a_{t} ∣ s_{t})

is parameterized to map cognitive states to intervention decisions. A value function is jointly learned to estimate expected returns, forming an actor–critic optimization framework.

The policy is trained using logged interaction data in an offline setting, where trajectories are constructed from historical learning sequences. To ensure reliable evaluation without online deployment, we adopt offline policy evaluation based on observed outcomes, aligning predicted rewards with actual learning performance improvements.

This design enables stable policy learning while avoiding risks associated with real-time exploration in educational environments.

3.8. Computational Complexity and Scalability

Let N denote the number of interaction events and d the embedding dimension. Message computation complexity is approximately

O (N d^{2})

due to neural message functions. Memory updates scale linearly with event count, enabling incremental updates in streaming educational environments.

The reinforcement learning module is trained using mini-batch actor–critic optimization, ensuring computational feasibility for large-scale online learning systems.

4. Experimental Setup

This section describes the datasets, baseline models, evaluation tasks, implementation settings, and experimental protocols used to evaluate the proposed Learner Cognitive Graph (LCG) framework. The experimental design aims to assess both predictive modeling performance and the effectiveness of adaptive learning interventions under realistic educational scenarios.

4.1. Datasets

Experiments were conducted using real-world educational datasets, together with a complementary simulated setting for analyzing rare cognitive transitions.

In addition to the ASSISTments dataset and the coding dataset, we introduce a simulated setting to analyze model behavior under rare cognitive transitions. In real educational data, such transitions are typically sparse and difficult to observe directly.

To address this, we construct a simulation by injecting controlled transition patterns into learner interaction sequences, allowing us to evaluate whether the proposed framework can effectively capture and adapt to infrequent but meaningful changes in learner knowledge states.

We emphasize that this simulation is used as a complementary analysis under controlled conditions, rather than as a primary benchmark for performance comparison.

The combination of real-world datasets and the controlled simulated setting allows evaluation of the proposed framework under realistic learning conditions while also examining its robustness to rare cognitive dynamics.

4.1.1. Real-World Educational Datasets

ASSISTments Dataset

The ASSISTments dataset is a widely used benchmark for knowledge tracing research. It contains more than one million student–problem interaction records annotated with knowledge components. Each interaction includes a learner identifier, problem identifier, timestamp, correctness label, and associated skill tags. These records provide rich temporal interaction data that enable modeling of evolving learner knowledge states.

Online Coding Platform Dataset

To evaluate the proposed framework in a programming education scenario, anonymized interaction data were collected from an online coding learning platform.

The dataset consists of learner–exercise interaction records derived from programming tasks of varying difficulty levels, including basic syntax exercises, algorithmic problems, and code debugging tasks. Each interaction sample contains the problem content, learner response, correctness label, response time, and evaluation score.

From a structural perspective, the dataset includes multiple learners interacting with a diverse set of coding problems, forming sequential interaction records that reflect the evolution of learner knowledge over time.

In addition, reasoning explanations were elicited using the proposed Reverse Turing Agent during learner interaction. Specifically, learners were prompted to provide short explanations of their answers, such as describing the applied concept or identifying potential errors. These explanations were automatically collected as part of the interaction process without manual annotation.

To ensure data quality, simple filtering strategies were applied to remove excessively short, off-topic, and duplicated responses before encoding. The textual explanations were then transformed into dense semantic representations for subsequent modeling.

All data were anonymized prior to analysis, and no personally identifiable information was collected or stored.

4.1.2. Dataset Statistics

Table 1 summarizes the statistics of the datasets used in our experiments.

4.2. Data Preprocessing

To prevent information leakage, the datasets were split by learner rather than by interaction. Specifically,

70 %

of learners were used for training,

10 %

for validation, and

20 %

for testing. Within each learner sequence, interaction events were ordered chronologically to preserve temporal dependencies.

All numerical features were normalized before model training. Textual reasoning explanations were encoded using a pretrained language encoder and subsequently integrated into node attributes within the dynamic cognition graph.

4.3. Baseline Models

To ensure a comprehensive comparison, the proposed LCG framework was evaluated against representative baselines from three categories: sequential knowledge tracing models, graph-based models, and adaptive intervention strategies.

Sequential knowledge tracing baselines include Bayesian Knowledge Tracing (BKT), Deep Knowledge Tracing (DKT), Self-Attentive Knowledge Tracing (SAKT), and Attention-based Knowledge Tracing (AKT). These models represent classical probabilistic approaches as well as modern deep learning architectures designed to capture temporal learning dynamics.

Graph-based baselines include a static heterogeneous knowledge graph combined with a graph neural network (Static KG + GNN) and the Temporal Graph Network (TGN), which models evolving graph structures using event-driven updates.

To evaluate adaptive instructional policies, several intervention strategies were implemented. A rule-based static strategy represents traditional tutoring policies. A contextual bandit model performs myopic optimization without considering long-term learning effects. Additionally, a reinforcement learning baseline without dynamic cognition graph representations was implemented, where handcrafted features were used as state representations instead of graph embeddings.

All baseline methods follow their standard input settings and do not utilize reasoning text. Reasoning-enhanced behavioral representations are only incorporated in the proposed framework as a core component of the method.

4.4. Evaluation Tasks

Three experimental tasks were designed to evaluate different aspects of the proposed framework.

The first task focuses on knowledge mastery prediction, where the objective is to predict whether a learner will correctly answer the next exercise given their interaction history.

The second task evaluates cognitive state transition modeling, which measures the ability of the model to detect transitions between different knowledge mastery levels.

The third task evaluates adaptive intervention effectiveness. In this setting, the reinforcement learning agent recommends instructional actions such as exercise selection or curriculum adjustments, and the impact on learning outcomes is evaluated.

4.5. Evaluation Metrics

For knowledge mastery prediction, model performance is evaluated using Area Under the ROC Curve (AUC), prediction accuracy, Brier score for probability calibration, and Mean Absolute Error (MAE).

For cognitive state transition modeling, Macro-F1 score, Transition-F1, and Spearman temporal correlation (

ρ

) are used to measure classification performance and temporal alignment between predicted and observed learning trajectories.

For intervention effectiveness, policy performance is assessed using time required to reach

80 %

mastery, retention gain measured through delayed assessments, and percentage improvement in learner performance scores.

4.6. Implementation Details

All models were implemented using the PyTorch(version 2.1.0) deep learning framework and trained on NVIDIA GPUs. The embedding dimension d was set to 128 unless otherwise specified. The temporal decay coefficient

β

used in the event-driven message passing mechanism was selected from

{0.01, 0.05, 0.1}

through validation experiments.

Model optimization was performed using the Adam optimizer with an initial learning rate of

1 \times 10^{- 3}

. For reinforcement learning components, the discount factor

γ

was set to

0.9

to balance short-term performance improvements with long-term learning gains. Actor–critic networks were trained using mini-batch updates, and gradient clipping was applied to improve training stability.

Table 2 summarizes the main hyperparameters used in the experiments.

Experiments were conducted on a workstation equipped with an NVIDIA RTX 4090 GPU and 24GB RAM.

Each experiment was repeated five times with different random seeds, and the reported results correspond to the mean and standard deviation across runs.

4.7. Statistical Significance Testing

To verify that observed improvements are statistically meaningful, paired t-tests were conducted between the proposed LCG framework and the strongest baseline model for each experimental task. A significance threshold of

p < 0.05

was adopted. The strongest representative baseline for each task was selected for significance testing to provide a focused comparison against the most competitive reference method.

4.8. Ablation Study

To analyze the contribution of individual components, several ablation variants of the proposed framework were evaluated. These variants include removing the reverse Turing agent, removing the temporal decay mechanism, removing the multi-scale temporal memory module, removing self-supervised objectives, and replacing reinforcement learning with a rule-based intervention strategy.

By comparing these variants with the full LCG model, the ablation study quantifies the contributions of reasoning-aware behavioral acquisition, temporal modeling mechanisms, representation learning strategies, and policy optimization.

4.9. Cross-Dataset Generalization Evaluation

To evaluate the generalization capability of the proposed framework, cross-dataset experiments were conducted. In this setting, models were trained on one dataset and evaluated on another dataset without retraining. This evaluation assesses whether the learned cognitive representations capture transferable learning patterns rather than dataset-specific artifacts.

Specifically, two cross-dataset settings were considered: training on the ASSISTments dataset and testing on the coding platform dataset, and vice versa. In both cases, the model parameters learned from the source dataset were directly applied to the target dataset.

Performance was evaluated using AUC and Macro-F1 scores for mastery prediction and state transition detection. The observations suggest that the proposed LCG framework maintains relatively stable performance across datasets, indicating the potential generalization capability of the learned dynamic cognitive representations.

These findings should be interpreted as preliminary evidence of cross-dataset generalization rather than a comprehensive benchmark evaluation.

4.10. Cold-Start Learner Evaluation

Educational platforms frequently encounter cold-start scenarios where new learners have very limited interaction history. To evaluate model robustness under such conditions, a cold-start evaluation was conducted.

In this experiment, learners with fewer than k historical interactions were treated as cold-start users. Three levels of interaction sparsity were considered:

k = 5

,

k = 10

, and

k = 20

. Models were trained using the full training dataset but evaluated only on learners whose interaction histories were truncated to the specified lengths.

Performance was evaluated using prediction accuracy and AUC. The results show that the proposed LCG framework consistently outperforms baseline methods under all sparsity levels. This improvement can be attributed to the relational modeling capability of the dynamic cognition graph, which allows knowledge information to propagate across concept and learner nodes even when individual interaction histories are limited.

These results indicate that the proposed framework is robust to sparse interaction data and can effectively support new learners in real-world educational systems. These results are intended as supplementary analysis of robustness under sparse interaction conditions rather than as a standalone benchmark comparison.

4.11. Efficiency and Scalability Analysis

To evaluate the computational efficiency of the proposed framework, we measured both training time and inference latency across different model configurations.

Training efficiency was evaluated by recording the average epoch training time for each model. Inference efficiency was measured as the average time required to process a single learner interaction event.

The experiments show that although the proposed LCG framework introduces additional components such as dynamic graph updates and multi-scale memory modules, the event-driven message passing mechanism allows efficient incremental updates. As a result, the computational overhead remains manageable and scales approximately linearly with the number of interaction events.

Table 3 summarizes the efficiency comparison among the evaluated models.

The results indicate that the proposed method achieves improved modeling capability while maintaining practical computational efficiency for large-scale educational platforms.

5. Results

This section presents the experimental results evaluating the proposed Learner Cognitive Graph (LCG) framework across three tasks: knowledge mastery prediction, cognitive state transition modeling, and adaptive intervention optimization. All results are reported as mean ± standard deviation over five independent runs.

5.1. Knowledge Mastery Prediction Performance

Table 4 summarizes the prediction performance on the ASSISTments and coding datasets. The proposed LCG framework consistently achieves the best performance across all evaluation metrics.

Compared with classical probabilistic approaches such as BKT, deep sequential models (DKT, SAKT, AKT) provide clear improvements, confirming the importance of modeling temporal dependencies in learner interaction sequences. However, purely sequential models treat learning events as linear time series and fail to capture structural relationships among learners, concepts, and exercises.

Graph-based approaches further improve prediction accuracy by incorporating relational information. In particular, the temporal graph model TGN achieves stronger performance than static graph models due to its ability to model dynamic structural updates.

The proposed LCG framework achieves the highest AUC of 0.889, outperforming the strongest baseline (TGN) by 2.8 percentage points. Improvements are also observed in Brier score and MAE, indicating that the model produces better-calibrated probability estimates rather than merely improving classification accuracy.

These improvements can be attributed to two factors. First, the Dynamic Cognition Graph explicitly models interactions among learners, knowledge concepts, and exercises, allowing relational knowledge propagation. Second, the multi-scale temporal memory mechanism enables the model to capture both short-term behavioral signals and long-term learning patterns.

Paired t-tests confirm that the improvements over TGN are statistically significant in both AUC (

p = 0.012

) and MAE (

p = 0.018

).

5.2. Cognitive State Transition Modeling

Table 5 reports the results for cognitive state transition detection.

Sequential models show moderate performance in cognitive state transition detection, as they are optimized for next-step prediction rather than structural change modeling. Static graph models improve results by incorporating relational dependencies among concepts, while temporal graph models further benefit from evolving interaction structures.

The proposed LCG model achieves the best performance across all metrics, reaching a Macro-F1 of 0.814, a Transition-F1 of 0.789, and a Spearman correlation of 0.931. Compared with TGN, LCG improves Macro-F1 and Transition-F1 by 3.3 percentage points and yields stronger alignment between predicted cognitive trajectories and actual learning trends. These results indicate that multi-scale temporal modeling is important for detecting abrupt cognitive shifts in real learning processes.

5.3. Intervention Policy Evaluation

Table 6 compares the effectiveness of different instructional intervention strategies.

Rule-based strategies represent traditional tutoring policies and achieve limited improvement due to their inability to adapt to individual learner states. Contextual bandit approaches improve performance by incorporating interaction context but optimize only short-term rewards.

Reinforcement learning methods further improve learning outcomes by considering long-term reward signals. However, their performance depends heavily on the quality of the state representation.

When reinforcement learning is combined with dynamic cognitive graph embeddings, the proposed DCG+RL framework achieves the best performance. Specifically, time-to-80% mastery decreases from 8.3 days to 7.4 days, representing a 10.8% improvement over RL without graph-based representations. Retention gain increases from 10.4% to 13.6%, indicating improved long-term knowledge consolidation.

These findings demonstrate that high-quality cognitive state representations significantly enhance policy learning in adaptive educational systems.

5.4. Detailed Ablation Analysis

Table 7 presents the ablation study analyzing the contribution of individual components.

Removing the reverse Turing agent leads to a noticeable performance drop, indicating that structured reasoning evidence improves behavioral data quality for cognitive modeling.

Eliminating the temporal decay mechanism reduces prediction accuracy and transition detection performance, suggesting that modeling heterogeneous temporal scales is critical for representing learning dynamics.

Similarly, removing the multi-scale memory module leads to a consistent decline in performance across all metrics, confirming the importance of capturing both short-term and long-term learning signals.

Finally, replacing reinforcement learning with a rule-based intervention strategy significantly reduces retention gain, demonstrating that optimizing long-term learning outcomes requires adaptive policy learning.

5.5. Overall Trend Analysis

Across all experimental tasks, three consistent patterns emerge.

First, sequential models outperform probabilistic baselines but remain limited by their inability to model structural relationships among learning entities.

Second, static graph models improve relational awareness but struggle to capture dynamic cognitive transitions.

Third, temporal graph models close part of this gap, yet their performance remains constrained without multi-scale temporal modeling and structured behavioral evidence.

By integrating dynamic graph modeling, multi-scale memory mechanisms, and reinforcement learning-based policy optimization, the proposed LCG framework consistently achieves the best performance across prediction accuracy, transition detection, and intervention effectiveness.

6. Discussion

6.1. Implications for Dynamic Cognitive Modeling

The experimental results demonstrate that modeling learner cognition as a dynamic heterogeneous graph provides measurable advantages over purely sequential or static relational approaches. Improvements observed in both knowledge mastery prediction and cognitive state transition detection suggest that the hybrid representation—combining relational structure with temporal memory—better aligns with the underlying dynamics of learning.

The multi-scale memory mechanism plays a particularly important role. Educational interactions exhibit both short-term fluctuations (e.g., practice bursts) and long-term consolidation patterns. Sequential models often overemphasize recent interactions, whereas static graph models neglect temporal recency altogether. By explicitly decomposing temporal representations into short-, medium-, and long-term components, the proposed framework stabilizes embedding updates and reduces variance in state estimation. This structural decomposition may explain the consistent improvement in transition-F1 and temporal correlation.

Furthermore, incorporating structured reasoning evidence through the reverse Turing agent appears to enhance representation quality. The ablation study indicates that removing reasoning-based features leads to degradation across tasks, suggesting that cognitive modeling benefits from richer semantic signals beyond binary correctness labels.

6.2. Impact on Adaptive Intervention Policies

A central contribution of this work is demonstrating how dynamic cognitive representations improve reinforcement learning-based intervention strategies. The results show that using DCG embeddings as RL states reduces time-to-mastery and increases retention gain compared with policies based on handcrafted features.

This improvement can be interpreted as enhanced state observability. In educational environments, learner states are partially observable and non-stationary. Simplified performance indicators may fail to capture deeper structural dependencies among concepts. By encoding relational and temporal information jointly, the DCG representation provides a more informative state space for policy optimization. Consequently, the RL agent can better balance short-term performance improvement with long-term knowledge consolidation.

Importantly, the observed gains are moderate rather than extreme. The reduction in time-to-mastery (approximately 10%) and retention improvement (approximately 3 percentage points) indicate practical but realistic enhancements. This moderate improvement suggests that dynamic representation learning contributes meaningfully without overstating its impact.

6.3. Comparison with Existing Paradigms

The comparative results reveal three distinct trends across modeling paradigms.

First, probabilistic models such as BKT provide interpretable but limited representations, particularly in complex relational settings. Second, sequential deep learning models capture temporal patterns but lack structural awareness. Third, static graph models improve relational modeling but struggle to adapt to evolving interaction dynamics.

Temporal graph models partially address this limitation; however, without multi-scale memory and structured semantic enrichment, their improvements remain incremental. The proposed framework integrates these components into a unified architecture, enabling consistent gains across prediction and intervention tasks.

These findings support the hypothesis that neither purely sequential nor purely structural modeling is sufficient for dynamic educational environments. Instead, hybrid approaches combining relational graphs with temporal event-driven updates offer a more comprehensive representation.

6.4. Stability and Scalability Considerations

From a system perspective, the incremental event-driven update mechanism enables efficient deployment in large-scale online learning platforms. Unlike static retraining approaches, the DCG framework updates node embeddings locally upon new interactions. The computational complexity scales linearly with event count, making it suitable for high-frequency educational data streams.

The reinforcement learning component also remains computationally tractable due to mini-batch actor–critic optimization. While the integration of multi-scale memory slightly increases model parameters, empirical training remained stable across multiple random seeds, as reflected by low standard deviations in reported metrics.

6.5. Limitations

Several limitations should be acknowledged.

First, although synthetic datasets were used to simulate rare cognitive transitions, the long-term educational impact of the proposed framework has not yet been validated through extended real-world longitudinal studies. Future work should evaluate retention and behavioral adaptation over longer time horizons in authentic learning environments.

Second, the reverse Turing agent relies on large language models, which introduce additional computational overhead and potential variability in the collected reasoning evidence. Although the proposed framework mitigates part of this issue by encoding textual explanations into dense semantic representations rather than directly using raw text, the quality of reasoning signals may still be affected by ambiguity, incompleteness, or inconsistency in learner responses. Further alignment, filtering, and quality-control strategies are needed to improve robustness.

Third, the reinforcement learning evaluation in this study was conducted in an offline simulation setting constructed from historical interaction data. While this setup enables controlled comparison of intervention policies, it cannot fully capture the complexity of live educational deployment. Future work should validate the proposed policy framework in real-world adaptive learning systems.

Fourth, reward design in reinforcement learning requires balancing immediate performance improvement and delayed retention effects. Alternative reward formulations may lead to different intervention behaviors, and future work should investigate adaptive or meta-learned reward mechanisms.

Finally, although the proposed framework improves predictive accuracy and intervention efficiency, the observed gains remain incremental rather than transformative. Educational outcomes are influenced by many external pedagogical, behavioral, and contextual factors beyond algorithmic optimization alone.

6.6. Future Directions

Future research may extend this framework in several directions. First, incorporating multimodal signals such as eye-tracking or affective cues could further enrich cognitive state representations. Second, federated or privacy-preserving training strategies may enable cross-institution deployment while protecting learner data. Third, meta-reinforcement learning could allow rapid adaptation to new curricula or learner populations. Finally, theoretical analysis of dynamic graph stability under non-stationary environments remains an open area of investigation.

Overall, this study provides empirical evidence that integrating dynamic graph modeling with reinforcement learning offers a practical pathway toward more adaptive and interpretable intelligent educational systems. Beyond educational applications, the temporal message-based modeling paradigm may generalize to other domains involving dynamic relational processes. For example, in urban analysis tasks such as building façade understanding [39], temporal or structured observations of building elements can be modeled using similar dynamic graph representations. Similarly, in air handling unit fault detection and diagnosis (AFDD) [40], time-evolving sensor relationships and operational states can be effectively captured through temporal message passing, indicating the broader applicability of the proposed framework.

7. Conclusions

This study proposed a Learner Cognitive Graph (LCG) framework that integrates dynamic graph-based cognitive modeling with reinforcement learning for adaptive educational intervention. By formalizing learner cognition as an event-driven heterogeneous graph and incorporating multi-scale temporal memory, the framework captures both structural dependencies and evolving interaction dynamics. The learned graph embeddings serve as state representations for reinforcement learning, enabling policy optimization that balances short-term performance improvement with long-term knowledge retention.

Experimental results on real-world and simulated datasets demonstrate consistent improvements over representative sequential, static graph, and temporal graph baselines. The proposed approach achieves higher knowledge mastery prediction accuracy, more precise cognitive state transition modeling, and improved intervention efficiency. Ablation studies further confirm the contribution of temporal decay modeling, multi-scale memory decomposition, structured behavioral evidence acquisition, and reinforcement learning-based optimization.

Importantly, performance gains remain moderate and statistically significant, suggesting practical rather than exaggerated improvements. The event-driven update mechanism supports incremental computation, making the framework suitable for deployment in large-scale online educational systems.

While several limitations remain, including reliance on language model-based reasoning evidence and reward design sensitivity, the present work provides empirical evidence that dynamic cognitive representations enhance adaptive intervention policies. The proposed integration of heterogeneous graph modeling and reinforcement learning offers a systematic and scalable approach for intelligent educational systems, and may serve as a foundation for future research in dynamic learner modeling and personalized instruction.

Author Contributions

Conceptualization, Y.L. and Y.G.; methodology, Y.G.; software, Y.G. and X.W.; validation, Y.G., X.W. and X.H.; formal analysis, Y.G.; investigation, Y.G.; resources, Y.L. and L.S.; data curation, Y.G. and X.H.; writing—original draft preparation, Y.G.; writing—review and editing, Y.L. and L.S.; visualization, Y.G.; supervision, Y.L. and L.S.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Higher Education Computer Education Research Association Teaching and Education Research Project, grant number CERACU2025R06; the Strategic Research Project on the Training Reform of Excellent Engineers, grant number ZT-20240108; the Teaching Reform Project of the International Innovation Institute of Beihang University in Hangzhou, grant numbers JG202520 and JGXY2025020B; and the Research Startup Fund of the International Innovation Institute of Beihang University in Hangzhou, grant number 2024KQ086. The APC was funded by the International Innovation Institute of Beihang University in Hangzhou.

Institutional Review Board Statement

Ethical review and approval were waived for the publicly available ASSISTments dataset. For the coding dataset, all collected interaction data were anonymized and used solely for research purposes, and no personally identifiable information was collected or stored.

Informed Consent Statement

Reasoning explanations in the coding dataset were voluntarily provided by learners during interaction with the platform.

Data Availability Statement

The datasets used in this study include the publicly available ASSISTments dataset and an anonymized coding interaction dataset. The ASSISTments dataset is available at: https://sites.google.com/site/assistmentsdata/(accessed on 30 March 2026). The coding interaction dataset contains anonymized learner interaction records and cannot be publicly released due to privacy considerations. The associated reasoning explanations are anonymized textual interaction records and are not publicly released due to privacy considerations.

Acknowledgments

We appreciate the valuable feedback from the anonymous reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Baker, R.S.; Inventado, P. Educational Data Mining and Learning Analytics. In Learning Analytics; Springer: New York, NY, USA, 2014; pp. 61–75. [Google Scholar]
Romero, C.; Ventura, S. Educational Data Mining: A Review of the State of the Art. IEEE Trans. Syst. Man, Cybern. Part C 2010, 40, 601–618. [Google Scholar] [CrossRef]
Koedinger, K.; D’Mello, S.; McLaughlin, E. Data Mining and Education. Wiley Interdiscip. Rev. Cogn. Sci. 2015, 6, 333–353. [Google Scholar] [CrossRef]
Corbett, A.; Anderson, J. Knowledge Tracing: Modeling the Acquisition of Procedural Knowledge. User Model. -User-Adapt. Interact. 1995, 4, 253–278. [Google Scholar] [CrossRef]
Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.; Sohl-Dickstein, J. Deep Knowledge Tracing. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; MIT Press: Cambridge, MA, USA, 2015. [Google Scholar]
Pandey, S.; Karypis, G. A Self-Attentive Model for Knowledge Tracing. In Proceedings of the 12th International Conference on Educational Data Mining, Montreal, QC, Canada, 2–5 July 2019. [Google Scholar]
Ghosh, A.; Heffernan, N.; Lan, A. Context-Aware Attentive Knowledge Tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mini; Association for Computing Machinery: New York, NY, USA, 2020. [Google Scholar]
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef]
Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-Based Knowledge Tracing: Modeling Student Proficiency Using Graph Neural Network. In IEEE/WIC/ACM International Conference on Web Intelligence; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar]
Yang, Y.; Shen, Y.; Li, X. Learning Knowledge Graph Embedding for Knowledge Tracing. IEEE Access 2020, 8, 212428–212437. [Google Scholar]
Anderson, J.R. Cognitive Psychology and Its Implications; Worth Publishers: New York, NY, USA, 2005. [Google Scholar]
Piaget, J. The Psychology of Intelligence; Routledge: London, UK, 2001. [Google Scholar]
Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2020. [Google Scholar]
OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
Kasneci, E.; Sessler, K.; Küchemann, S.; Bannert, M.; Dementieva, D.; Fischer, F.; Gasser, U.; Groh, G.; Günnemann, S.; Hüllermeier, E.; et al. ChatGPT for Good? Opportunities and Challenges of Large Language Models in Education. Learn. Individ. Differ. 2023, 103, 102274. [Google Scholar] [CrossRef]
Ji, Z.; Lee, N.; Frieske, R.; Yu, T.; Su, D.; Xu, Y.; Ishii, E.; Bang, Y.; Madotto, A.; Fung, P. Survey of Hallucination in Natural Language Generation. ACM Comput. Surv. 2023, 35, 248. [Google Scholar] [CrossRef]
Mandel, T.; Liu, Y.; Brunskill, E.; Popovic, Z. Offline Policy Evaluation across Representations with Applications to Educational Games. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014), Paris, France, 5–9 May 2014. [Google Scholar]
Rafferty, A.; Brunskill, E.; Griffiths, T.; Shafto, P. Faster Teaching via POMDP Planning. Cogn. Sci. 2016, 40, 1290–1332. [Google Scholar] [CrossRef]
Zhang, H.; Liu, Z.; Shang, C.; Li, D.; Jiang, Y. A Question-centric Multi-experts Contrastive Learning Framework for Improving the Accuracy and Interpretability of Deep Sequential Knowledge Tracing Models. arXiv 2024, arXiv:2403.07322. [Google Scholar] [CrossRef]
Lee, U.; Bae, J.; Kim, D.; Lee, S.; Park, J.; Ahn, T.; Lee, G.; Stratton, D.; Kim, H. Language Model Can Do Knowledge Tracing: Simple but Effective Method to Integrate Language Model and Knowledge Tracing Task. arXiv 2024, arXiv:2406.02893. [Google Scholar] [CrossRef]
Yang, L.; Sun, X.; Li, H.; Xu, R.; Wei, X. Difficulty Aware Programming Knowledge Tracing via Large Language Models. Sci. Rep. 2025, 15, 11475. [Google Scholar] [CrossRef] [PubMed]
Cheng, K.; Peng, L.; Wang, P.; Ye, J.; Sun, L.; Du, B. DyGKT: Dynamic Graph Learning for Knowledge Tracing. arXiv 2024, arXiv:2407.20824. [Google Scholar] [CrossRef]
Gad, S.; Abdelfattah, S.; Abdelrahman, G. Temporal Graph Memory Networks for Knowledge Tracing. arXiv 2024, arXiv:2410.01836. [Google Scholar] [CrossRef]
Rossi, E.; Chamberlain, B.; Frasca, F.; Eynard, D.; Monti, F.; Bronstein, M. Temporal Graph Networks for Deep Learning on Dynamic Graphs. arXiv 2020, arXiv:2006.10637. [Google Scholar] [CrossRef]
Kazemi, S.M.; Goel, R.; Jain, K.; Kobyzev, I.; Sethi, A.; Forsyth, P.; Poupart, P. Representation Learning for Dynamic Graphs: A Survey. J. Mach. Learn. Res. 2020, 21, 1–73. [Google Scholar]
Trivedi, R.; Dai, H.; Wang, Y.; Song, L. Know-Evolve: Deep Temporal Reasoning for Dynamic Knowledge Graphs. In Proceedings of the 34th International Conference on Machine Learning; PMLR: Cambridge, MA, USA, 2017. [Google Scholar]
Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Proceedings of the 35th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2022. [Google Scholar]
Lee, U.; Bae, J.; Jung, Y.; Kang, M.; Byun, G.; Lee, Y.; Kim, D.; Lee, S.; Park, J.; Ahn, T.; et al. From Prediction to Application: Language Model-based Code Knowledge Tracing with Domain Adaptive Pre-Training and Automatic Feedback System with Pedagogical Prompting for Comprehensive Programming Education. arXiv 2024, arXiv:2409.00323. [Google Scholar]
Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training Language Models to Follow Instructions with Human Feedback. In Proceedings of the 36th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2022. [Google Scholar]
Huang, X.; Chan, K.-H.; Wu, W.; Sheng, H.; Ke, W. Fusion of Multi-Modal Features to Enhance Dense Video Caption. Sensors 2023, 23, 5565. [Google Scholar] [CrossRef]
Sutton, R.; Barto, A. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
Hare, R.; Tang, Y. Ontology-driven Reinforcement Learning for Personalized Student Support. arXiv 2024, arXiv:2407.10332. [Google Scholar] [CrossRef]
Koedinger, K.; Aleven, V. Exploring the Assistance Dilemma in Experiments with Cognitive Tutors. Educ. Psychol. Rev. 2007, 19, 239–264. [Google Scholar] [CrossRef]
Chi, M.; VanLehn, K.; Litman, D.; Jordan, P. An Evaluation of Pedagogical Tutorial Tactics for a Natural Language Tutoring System. Int. J. Artif. Intell. Educ. 2011, 21, 83–113. [Google Scholar] [CrossRef]
Hamilton, W.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
Turing, A. Computing Machinery and Intelligence. Mind 1950, 59, 433–460. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017. [Google Scholar]
Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers); Association for Computational Linguistics: Minneapolis, MN, USA, 2019. [Google Scholar]
Wang, S.; Park, S.; Park, S.; Kim, J. Building Façade Datasets for Analyzing Building Characteristics Using Deep Learning. Data Brief 2024, 57, 110885. [Google Scholar] [CrossRef]
Wang, S. Real Operational Labeled Data of Air Handling Units from Office, Auditorium, and Hospital Buildings. Sci. Data 2025, 12, 1481. [Google Scholar] [CrossRef]

Figure 1. Overall framework of the proposed intelligent educational system. The framework integrates learner interaction data, a reasoning-aware reverse Turing agent, dynamic cognition graph modeling, and reinforcement learning optimization to support personalized learning intervention. Arrows indicate the flow of information between modules, including learner behavior, cognitive state representations, and policy-driven interventions.

Figure 2. Architecture of the proposed intelligent educational system. Learner interactions are processed by a reverse Turing agent to elicit reasoning explanations, which are encoded and integrated into the dynamic cognition graph for temporal knowledge propagation and adaptive intervention. Arrows indicate the direction of information flow across the processing pipeline, while the reward and feedback arrows denote the reinforcement learning update loop for policy optimization.

Figure 3. Dynamic cognition graph for modeling learning states. The graph evolves over time, capturing interactions between learners, knowledge concepts, and exercises through temporal message passing and knowledge propagation. Blue, green, and teal nodes denote learners, knowledge concepts, and exercises, respectively. Horizontal arrows indicate temporal state transitions and message propagation across consecutive time steps, while labeled edges denote mastery, performance interaction, and knowledge dependency relations.

Figure 4. Reinforcement learning optimization for personalized learning intervention. The learner cognitive state derived from the dynamic cognition graph serves as the environment state, and the policy network selects instructional actions based on reward signals derived from learning outcomes.

Table 1. Dataset statistics used in the experiments.

Dataset	Learners	Interactions	Concepts	Avg. Seq. Length
ASSISTments	4217	1,215,748	123	288
Coding Dataset	2864	735,420	96	257

Table 2. Main hyperparameters used in the experiments.

Parameter	Value
Embedding dimension d	128
Learning rate	$1 \times 10^{- 3}$
Discount factor $γ$	0.9
Temporal decay $β$	${0.01, 0.05, 0.1}$
Optimizer	Adam
Batch size	64

Table 3. Efficiency comparison of different models.

Model	Training Time/Epoch	Inference Time/Event
DKT	32.5 s	2.8 ms
AKT	48.1 s	4.6 ms
TGN	61.7 s	5.9 ms
LCG (ours)	68.4 s	6.4 ms

Table 4. Knowledge mastery prediction performance (mean ± std over five runs). Higher values indicate better performance for AUC and Accuracy, while lower values indicate better performance for Brier score and MAE. Bold values indicate the best performance in each column.

Model	AUC ↑	Accuracy ↑	Brier ↓	MAE ↓
BKT	0.721 ± 0.006	0.684 ± 0.007	0.214 ± 0.005	0.298 ± 0.006
DKT	0.801 ± 0.004	0.743 ± 0.005	0.187 ± 0.004	0.261 ± 0.004
SAKT	0.824 ± 0.003	0.756 ± 0.004	0.178 ± 0.003	0.247 ± 0.003
AKT	0.843 ± 0.004	0.768 ± 0.004	0.171 ± 0.003	0.238 ± 0.004
Static KG + GNN	0.857 ± 0.003	0.779 ± 0.003	0.165 ± 0.002	0.229 ± 0.003
TGN	0.861 ± 0.003	0.783 ± 0.003	0.163 ± 0.002	0.226 ± 0.003
LCG (ours)	0.889 ± 0.002	0.812 ± 0.003	0.148 ± 0.002	0.204 ± 0.002

Table 5. Cognitive state transition modeling results. Higher values indicate better performance for Macro-F1, Transition-F1, and Spearman

ρ

. Bold values indicate the best performance in each column.

Table 5. Cognitive state transition modeling results. Higher values indicate better performance for Macro-F1, Transition-F1, and Spearman

ρ

. Bold values indicate the best performance in each column.

Model	Macro-F1 ↑	Transition-F1 ↑	Spearman $ρ$ ↑
DKT	0.712 ± 0.006	0.681 ± 0.007	0.874 ± 0.005
SAKT	0.738 ± 0.005	0.702 ± 0.006	0.889 ± 0.004
AKT	0.751 ± 0.004	0.716 ± 0.005	0.901 ± 0.004
Static KG + GNN	0.768 ± 0.004	0.739 ± 0.004	0.913 ± 0.003
TGN	0.781 ± 0.003	0.756 ± 0.004	0.921 ± 0.003
LCG (ours)	0.814 ± 0.003	0.789 ± 0.003	0.931 ± 0.002

Table 6. Adaptive intervention policy evaluation. Lower values indicate better performance for Time to 80% Mastery, while higher values indicate better performance for Retention Gain and Score Improvement. Bold values indicate the best performance in each column.

Method	Time to 80% Mastery (days) ↓	Retention Gain ↑	Score Improvement ↑
Rule-based Strategy	9.1 ± 0.4	8.2% ± 0.6	12.4% ± 0.8
Contextual Bandit	8.7 ± 0.3	9.1% ± 0.5	14.3% ± 0.7
RL (Handcrafted State)	8.3 ± 0.3	10.4% ± 0.5	16.1% ± 0.6
LCG + RL (ours)	7.4 ± 0.2	13.6% ± 0.4	19.2% ± 0.6

Table 7. Detailed ablation analysis of the proposed LCG framework. Higher values indicate better performance for all metrics. Bold values indicate the best performance in each column.

Variant	AUC ↑	Macro-F1 ↑	Retention Gain ↑
Full LCG	0.889	0.814	13.6%
w/o Reverse Turing Agent	0.871	0.792	11.4%
w/o Temporal Decay	0.865	0.781	10.8%
w/o Multi-scale Memory	0.859	0.773	10.3%
w/o Self-supervised Loss	0.868	0.785	11.1%
Rule-based Intervention	0.889	0.814	8.2%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Y.; Gai, Y.; Wang, X.; Sun, L.; Huang, X. Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning. Appl. Sci. 2026, 16, 3580. https://doi.org/10.3390/app16073580

AMA Style

Li Y, Gai Y, Wang X, Sun L, Huang X. Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning. Applied Sciences. 2026; 16(7):3580. https://doi.org/10.3390/app16073580

Chicago/Turabian Style

Li, Ying, Yiming Gai, Xingyu Wang, Leilei Sun, and Xuefei Huang. 2026. "Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning" Applied Sciences 16, no. 7: 3580. https://doi.org/10.3390/app16073580

APA Style

Li, Y., Gai, Y., Wang, X., Sun, L., & Huang, X. (2026). Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning. Applied Sciences, 16(7), 3580. https://doi.org/10.3390/app16073580

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Dynamic Cognition Graph for Adaptive Learning: Integrating Reasoning Evidence and Reinforcement Learning

Abstract

1. Introduction

2. Related Work

2.1. Knowledge Tracing and Cognitive Modeling

2.2. Dynamic Graph Neural Networks

2.3. Large Language Models in Education

2.4. Reinforcement Learning for Adaptive Education

2.5. Summary and Positioning of This Work

3. Materials and Methods

3.1. Dynamic Cognitive Representation as State Encoding for Adaptive Intervention

3.2. Reverse Turing Agent for Reasoning-Oriented Behavioral Acquisition

3.3. Dynamic Cognition Graph Formalization

3.4. Event-Driven Message Passing Mechanism

3.5. Multi-Scale Spatiotemporal Memory Modeling

3.6. Self-Supervised Stabilization of Cognitive Embeddings

3.7. Reinforcement Learning Formulation for Adaptive Intervention

3.8. Computational Complexity and Scalability

4. Experimental Setup

4.1. Datasets

4.1.1. Real-World Educational Datasets

ASSISTments Dataset

Online Coding Platform Dataset

4.1.2. Dataset Statistics

4.2. Data Preprocessing

4.3. Baseline Models

4.4. Evaluation Tasks

4.5. Evaluation Metrics

4.6. Implementation Details

4.7. Statistical Significance Testing

4.8. Ablation Study

4.9. Cross-Dataset Generalization Evaluation

4.10. Cold-Start Learner Evaluation

4.11. Efficiency and Scalability Analysis

5. Results

5.1. Knowledge Mastery Prediction Performance

5.2. Cognitive State Transition Modeling

5.3. Intervention Policy Evaluation

5.4. Detailed Ablation Analysis

5.5. Overall Trend Analysis

6. Discussion

6.1. Implications for Dynamic Cognitive Modeling

6.2. Impact on Adaptive Intervention Policies

6.3. Comparison with Existing Paradigms

6.4. Stability and Scalability Considerations

6.5. Limitations

6.6. Future Directions

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI