Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model

Lin, Yunxuan; Wu, Zhengyang

doi:10.3390/electronics14224385

Open AccessArticle

Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model

by

Yunxuan Lin

and

Zhengyang Wu

^*

School of Computer Science, South China Normal University, Guangzhou 510631, China

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(22), 4385; https://doi.org/10.3390/electronics14224385

Submission received: 11 October 2025 / Revised: 1 November 2025 / Accepted: 5 November 2025 / Published: 10 November 2025

(This article belongs to the Special Issue Data Mining and Recommender Systems)

Download

Browse Figures

Versions Notes

Abstract

With the development of large language model (LLM) technology, AI-assisted education systems are gradually being widely used. Learning Path Recommendation (LPR) is an important task in personalized instructional scenarios. AI-assisted LPR is gaining traction for its ability to generate learning content based on a student’s personalized needs. However, the native-LLM has the problem of hallucination, which may lead to the inability to generate learning content; in addition, the evaluation results of the LLM on students’ knowledge status are usually conservative and have a large margin of error. To address these issues, this work proposes a novel approach for LPR enhanced by knowledge tracing (KT) and LLM. Our method operates in a “generate-and-retrieve” manner: the LLM acts as a pedagogical planner that generates contextual reference exercises based on the student’s needs. Subsequently, a retrieval mechanism constructs the concrete learning path by retrieving the top-N most semantically similar exercises from an established exercise bank, ensuring the recommendations are both pedagogically sound and practically available. The KT plays the role of an evaluator in the iterative process. Rather than generating semantic instructions directly, it provides a quantitative and structured performance metric. Specifically, given a candidate learning path generated by the LLM, the KT model simulates the student’s knowledge state after completing the path and computes a knowledge promotion score. This score quantitatively measures the effectiveness of the proposed path for the current student, thereby guiding the refinement of subsequent recommendations. This iterative interaction between the KT and the LLM continuously refines the candidate learning items until an optimal learning path is generated. Experimental validations on public datasets demonstrate that our model surpasses baseline methods.

Keywords:

learning path recommendation; knowledge tracing; large language model

1. Introduction

Learning Path Recommendation (LPR) systems leverage real-time analysis of students’ knowledge acquisition to construct personalized learning trajectories tailored to their specific needs [1]. By dynamically adapting instructional content and methodologies, these systems not only enhance learning efficiency but also foster deeper engagement and intrinsic motivation, thereby significantly augmenting overall educational outcomes. As AI applications become increasingly pervasive in education, the interpretability and fairness of recommendation systems have also received growing attention [2,3,4]. An ideal LPR system should not only be effective but also transparent, trustworthy, and fair. In recent years, research in the field of LPR has exhibited a noticeable upward trend. Some researchers have applied knowledge tracing (KT) to LPR. The KT model has the ability to predict students’ knowledge status, so as to enhance the ability of the LPR system to dynamically evaluate students’ knowledge status and adjust the path based on this [5,6]. However, the existing KT model only has the ability of single-step prediction, that is, it can only predict the knowledge state of students at time

t + 1

, which is difficult to meet the requirements of LPR for multi-step prediction of students’ knowledge status.

Recent advances in large language models (LLMs) have opened up new possibilities for LPR. LLMs can understand educational materials in textual format, generate textual content that can be used for teaching purposes, or provide answers to questions in learning [7,8]. These methods make use of the text comprehension and reasoning capabilities of LLMs. Based on the semantic understanding of the description text of the interaction of students’ learning behavior by LLM, the current knowledge state of students is inferred. However, it is also difficult to correctly reason about the knowledge state of the student’s future multi-step in this way. In addition, the LLM assessment results of students’ knowledge status are always concentrated on a certain average, for example, if it is asked to rank the student’s knowledge state from 1 to 5 according to the student’s learning record, the result given is always around 3 points. This is because LLMs have a tendency to be greedy too early, that is, they tend to choose safe answers that occur frequently in the training data to avoid extreme judgments [9]. LLM prematurely locks in the local optimal solution in decision-making, resulting in a decrease in action coverage instead of comprehensively exploring the optimal strategy. Therefore, in the ranking task, the area around the comfort zone of the model is because of the lowest risk.

To address these challenges, we propose a novel Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model, referred to as LPReKL. Primarily, to enhance KT’s capability for multi-step prediction of students’ knowledge status, we probabilistically replace a portion of students’ responses with masked tokens during training. In addition, we have introduced information on the difficulty of the exercises in the KT model. Through this process, KT can accurately infer the student’s level of mastery of a particular knowledge concept at

t + n

steps. Subsequently, leveraging feedback from KT and predefined prompt templates, the LLM is designed to generate the new reference exercise text. Finally, exercises with high similarity to these reference exercise text are retrieved from an exercise bank as the candidate item of the learning path. This content undergoes validation by the KT—if deemed appropriate, it is presented to students; otherwise, the LLM refines its generated content based on KT-driven feedback through the continuous interaction and optimization between KT and LLM. The contributions of this work can be summarized as follows:

This work proposes a novel learning path recommendation method that fuses KT models and LLMs. It introduces a feedback mechanism that guides LLMs to automatically adjust what they generate based on KT’s predictions.
We reconstruct the students’ historical interaction data by incorporating mask markers, enabling KT’s training to predict the students’ state of knowledge at the next $t + n$ time step.
A series of experimental results on multiple public datasets show that the proposed model is effective and has advantages over existing methods.

Traditional sequential or graph-based recommendation methods [10,11] rely on static, historical associations to recommend items [12]. In contrast, our approach performs dynamic simulation: before recommending a top-N exercise path, it forecasts the student’s knowledge state after completing the entire sequence, enabling evaluation of long-term effectiveness and selection of paths with optimal cumulative gains rather than immediate benefits. Moreover, the masking mechanism—by randomly hiding historical responses during training—forces the model to reason under incomplete information, improving its robustness and generalization for multi-step prediction. This allows reliable simulation of entirely new, unseen learning paths.

The remainder of this paper is organized as follows. Section 2 reviews related work on learning path recommendation. Section 3 formalizes the problem definition. Section 4 elaborates on the proposed methodology in detail. Experimental results and analyses are presented in Section 5. Finally, Section 6 concludes this paper and outlines future research directions.

2. Related Work

2.1. Learning Path Recommendation

Learning path recommendation systems can provide customized learning content and teaching strategies based on individual student characteristics, effectively meeting the demands of personalized education [13]. Existing LPR methods mainly fall into two categories. The first category relies on manually defined rules, including constructing learning resource dependency models based on prerequisite relationships [14,15] and modeling knowledge concept correlations using knowledge graphs [16]. Although these methods offer certain interpretability, they exhibit notable limitations in practical applications: first, the systems lack flexibility to adapt to dynamically changing learning needs; second, they incur high maintenance costs. For instance, knowledge graphs require continuous updates to maintain timeliness, otherwise their performance may degrade when handling new knowledge-related exercises. The second category formulates LPR as a sequential recommendation problem [17,18]. These methods generate complete learning paths by analyzing students’ historical behavior data. For example, Zhang et al. [1] employ a recursive propagation approach to process learner–course interaction data, utilizing graph convolutional networks to generate predicted course ratings, which are subsequently integrated with learning style similarity scores to achieve personalized course recommendations. Liu et al. construct learning networks by analyzing learning records and propose a combinatorial recommendation algorithm [19].

While achieving certain results, these methods fail to effectively capture real-time changes in students’ knowledge states, resulting in recommendations that lack personalization and adaptability. The introduction of KT effectively addresses these limitations. Unlike previous recommendation methods that rely on static rules, KT models can dynamically adjust and optimize recommendation paths by continuously analyzing students’ real-time learning performance, thereby significantly enhancing the adaptability and personalization of recommendation systems.

2.2. Knowledge Tracing for LPR

By analyzing student interactions with learning materials, KT predicts students’ mastery of specific knowledge concepts [20,21]. Bayesian Knowledge Tracing is one of the classic approaches, using Bayesian inference to estimate students’ knowledge states [22]. In 2015, Piech et al. use recurrent neural networks for KT [23], leveraging LSTMs to capture students’ knowledge states and temporal dependencies, leading to improved prediction accuracy. The advent of deep KT has significantly advanced the development of learning path recommendation systems [5,24]. By employing data-driven dynamic modeling approaches, this technology transforms traditional one-size-fits-all learning paths into personalized adaptive recommendations, thereby effectively enhancing learning efficiency and outcomes. Specifically, Zhang et al. [5] utilized knowledge tracing to annotate students’ knowledge states in historical learning logs, incorporating these state features as model inputs to generate personalized learning paths that capture both sequential relationships and selection logic among learning resources. Chen et al. [24] proposed an auxiliary prediction module based on knowledge tracing, which continuously evaluates students’ mastery levels at each node of the learning path and optimizes model parameters through cross-entropy loss functions, significantly improving recommendation stability. Notably, the application of multimodal data provides online learning systems with more comprehensive and multi-dimensional representations of student behaviors, creating opportunities for building more accurate personalized learning experiences [25,26].

However, current KT models exhibit two critical limitations: they cannot effectively predict exercises for the next

t + n

step and typically require large amounts of high-quality training data. These constraints significantly challenge their applicability in LPR, where sparse data and continuously emerging new exercises are common. Additionally, Wang et al. [27] argue that existing methods often fail to adequately capture complex behavioral patterns unless they effectively leverage rich world knowledge for deeper reasoning about learning materials. This limitation has prompted researchers to explore novel applications of LLMs in this field.

2.3. LLM-Assisted Recommendation

Recent advancements in deep learning have ushered in transformative breakthroughs, particularly within the domain of natural language processing. Cutting-edge LLMs, including DeepSeek and the Qwen series, exhibit unparalleled proficiency in natural language comprehension [28,29], providing robust technical support for LPR. The integration of LLMs into the educational field presents a salient advantage. LLMs possess contextual understanding and reasoning capabilities. When students struggle with overwhelming learning materials, LLMs generate personalized content based on their learning progress and knowledge states, ensuring access to the most relevant educational resources [8,30,31]. Wang et al. [31] propose a ChatGPT-based personalized English reading comprehension support system, which enhances learning by predicting students’ reading skills, generating tailored questions, and automatically evaluating responses. Cui et al. [32] employ KT techniques to estimate students’ mastery of knowledge concepts based on their learning history and subsequently generated exercise recommendations using a pre-trained language model. Li et al. [15] integrate LLM with knowledge graphs to recommend appropriate learning materials tailored to students’ knowledge states and the human knowledge system. However, current LLM-assisted recommendations suffer from two notable shortcomings: first, they directly adopt model-generated content while overlooking potential logical fallacies or factual inaccuracies; second, they lack effective interactive feedback mechanisms—once the model provides recommendations to students, it cannot collect meaningful learning feedback, thereby hindering dynamic optimization of the recommended content.

The distinctions between LPReKL and prior approaches lie in two aspects:

Conceptually: Previous methods used LLM and KT in a one-way pipeline, lacking bidirectional interaction [33,34]. LPReKL introduces an iterative feedback loop where KT not only diagnoses the initial knowledge state but also evaluates the quality of the LLM-generated learning path and provides feedback for dynamic refinement, forming a closed-loop optimization system.
Practically: We propose a masked tokens strategy that enables KT to predict performance on unknown exercises, whereas traditional KT only supports prediction for known exercises. We employ a retrieval mechanism to match LLM-generated reference exercises with real items from the exercise bank, ensuring that recommended content is pedagogically sound and practically available, rather than relying directly on potentially hallucinated LLM outputs.

3. Problem Formulation

In this section, we review the definition of LPR and explain some key terms within this work. We have listed some frequently used symbols in Table 1 and provide a detailed explanation of their roles in this context.

Learning Path Recommendation. The learning path is illustrated in Figure 1. In the Learning Path Recommendation task, the student is first assessed and receives an initial score of

E_{s}

at the beginning. The recommendation system then proposes a candidate learning path

l p_{c} = \{q_{1}, q_{2}, \dots, q_{K}\}

, where the student engages in the recommended exercises and generates new learning records

u_{t} = {(q_{1}, a_{1}), (q_{2}, a_{2}), \dots, (q_{K}, a_{K})}

, along with a final score

E_{e}

. The student’s learning records and knowledge states are updated accordingly, such that

u_{t} = u_{t} \cup u_{t - 1}

. The effectiveness of the learning path, denoted as

E_{φ}

[14,24,35], can be calculated using the following formula:

E_{φ} = \frac{E_{e} - E_{s}}{E_{s u p} - E_{s}},

(1)

where

E_{s u p}

is the maximum score for the path,

E_{s}

is the student’s initial score, and

E_{e}

is the score after completing the target path. A higher

E_{φ}

indicates a more effective learning path that better matches the needs of student.

In this work, we define a learning path as a carefully curated sequence of learning items designed to achieve a specific learning goal. The internal logic of the path lies in its targeted addressing of the student’s weak knowledge concepts and its overall effectiveness, concretely represented as an ordered list of exercises

l p_{c} = \{q_{1}, q_{2}, \dots, q_{K}\}

.

Knowledge State. Given a set of

| U |

students,

| C |

knowledge concepts, and

| Q |

exercises, each student’s historical learning process is represented as a sequence:

L P (u^{i}) = {(q_{1}, a_{1}), (q_{2}, a_{2}), \dots, (q_{t}, a_{t}), \dots, (q_{L}, a_{L})},

(2)

where

u^{i} \in U

,

q_{t} \in Q

denotes the exercise answered by the student at time step t, and

a_{t} \in \{0, 1\}

indicates the correctness of the response (1 for correct, 0 for incorrect). Each exercise

q_{t}

is associated with one or more knowledge concepts

c_{t}

. By analyzing students’ historical data, we can pinpoint their weaknesses and determine the knowledge concepts that require further reinforcement, as described below:

P (u_{t + 1}^{i}) = Ω ({(q_{1}, a_{1}, c_{1}), (q_{2}, a_{2}, c_{2}), \dots, (q_{t}, a_{t}, c_{t})}),

(3)

where

Ω

corresponds to a trainable algorithm or model, and

u^{i}

represents the ith student. This formulation aims to predict the next learning step

u_{t + 1}^{i}

for each student, enabling the system to dynamically adapt to their evolving knowledge states and optimize their learning trajectories.

Exercise Difficulty. The difficulty of an exercise significantly affects student learning. Exercises that are too easy or too difficult can demotivate students [36], while appropriately challenging ones enhance understanding and stimulate interest [37]. We define the difficulty of an exercise

q_{i}

as the average error rate of its associated knowledge concepts from students’ learning histories:

Diff (q_{i}) = \frac{1}{N} \sum_{K = 1}^{N} e r r o r (q_{i}, {c^{k} | c^{k} \in KC [q_{i}]}),

(4)

where N is the number of knowledge concepts in the set

K C [q_{i}]

,

e r r o r (q_{i})

is the error rate of concept

c^{k}

. Zhang et al. suggest that an error rate around 30% optimally engages students [37].

4. Methodology

This section first introduces the framework and workflow of LPReKL, followed by a detailed discussion of each component.

4.1. Framework Overview

Figure 2 presents the overall architecture of LPReKL. First, a KT with the ability to predict students’ multi-step knowledge states is used to detect students’ initial knowledge states. These students’ initial knowledge states are then fused into prompt templates and transferred to the LLM, which generates a learning project for reference. These items are used as references to retrieve the most similar exercises from the practice library to construct an initial learning path. This learning path is then fed into the KT model again to evaluate the improvement value of the student’s knowledge status after using the learning path. Finally, based on the feedback of the improved value, the prompt template is adjusted and fed back to the LLM to continue or stop generating new learning items. If the LLM continues to generate new learning items, the subsequent operations will be repeated until the optimization goal is achieved.

The details of the procedures for LPR based on LPReKL are as shown in Algorithm 1.

Algorithm 1: LPReKL framework

Input:: learning records $u^{i}$ , masked rate $τ$ , Exercise bank Q.
Output:: recommendation learning path $l p_{c}$
1:: $K T \leftarrow P r e t r a i n i n g (u^{i}, τ)$
2:: for each u in U do
3:: $s_{t - 1}^{i} \leftarrow K T (u_{t - 1}^{i})$
4:: $p r o m p t_t e m p l a t e \leftarrow P r e p a r e P r o m p t (u_{t - 1}^{i}, [K C], [q_{i}], d)$
5:: $L i s t [q_{j}^{'}] \leftarrow L L M (p r o m p t_t e m p l a t e)$ ▹ Equation (10)
6:: $l p_{c} \leftarrow M a t c h E x e r c i s e s (L i s t [q_{j}^{'}], Q)$ ▹ Equation (12)
7:: $s_{t}^{i} \leftarrow K T (l p_{c})$
8:: while not $I s S a t i s f i e d (s_{t - 1}^{i}, s_{t}^{i})$ do
9:: $p r o m p t_t e m p l a t e \leftarrow P r e p a r e P r o m p t (u_{t}^{i}, [K C], [q_{i}], d)$
10:: $L i s t [q_{j}^{'}] \leftarrow L L M (p r o m p t_t e m p l a t e)$ ▹ Equation (10)
11:: $l p_{c} \leftarrow M a t c h E x e r c i s e s (L i s t [q_{j}^{'}], E x e r c i s e B a n k)$ ▹ Equation (12)
12:: end while
13:: Return $l p_{c}$
14:: end for

4.2. Knowledge Tracing for Multi-Step Predictions

We introduce a novel masked mechanism during the training of the KT model to enable multi-step knowledge state prediction. The intuition is to simulate the uncertainty in future student responses when planning a multi-step learning path. For each training sequence, we randomly mask a proportion of the historical records by setting their mask indicator

(m_{i} = 0)

. This forces the model to rely on a broader context rather than just the most recent interactions, thereby improving its ability to forecast knowledge states several steps ahead.

The robust capabilities of DKT have been well established [24], prompting us to adopt the original DKT architecture [23] as the foundational simulator [5]. However, we introduce a novel mechanism: the use of masked tokens to reconstruct students’ historical interaction data, thereby enabling the model for multi-step knowledge states prediction. Given the historical response sequence of the student j as

u_{< t}^{j} = {(q_{1}, a_{1}), (q_{2}, a_{2}), \dots, (q_{t - 1}, a_{t - 1})}

, where

q_{i}

and

a_{i}

denote the exercise index and the corresponding response. Furthermore, we maintain a mask value

m_{i}

for each record, indicating whether the answer should remain visible

(m_{i} = 1)

or be masked

(m_{i} = 0)

. The reconstructed data is expressed as:

{\tilde{u}}_{< t}^{j} = {(q_{1}, m_{1}, a_{1}), (q_{2}, m_{2}, a_{2}), \dots, (q_{t - 1}, m_{t - 1}, a_{t - 1})} .

(5)

During training, for masked records

(m_{i} = 0)

, the response is replaced with a special token to indicate missing data. The input to the KT model is formulated as:

\begin{matrix} q_{t - 1} = W_{q} e_{t - 1}^{q}, \\ m_{t - 1} = W_{m} e_{t - 1}^{m}, \\ a_{t - 1} = W_{a} e_{t - 1}^{a}, \\ x_{< t}^{j} = q_{t - 1} \oplus m_{t - 1} \oplus a_{t - 1}, \end{matrix}

(6)

where

e_{t - 1}^{q}

,

e_{t - 1}^{m}

, and

e_{t - 1}^{a}

are one-hot vector representations of the respective symbols, and

W_{*}

indicates weight matrices. The model then predicts the student’s knowledge states as follows:

\begin{matrix} i_{t} = sigmoid (W_{x i} x_{< t}^{j} + W_{h i} h_{t - 1} + b_{i}), \\ f_{t} = sigmoid (W_{x f} x_{< t}^{j} + W_{h f} h_{t - 1} + b_{f}), \\ o_{t} = sigmoid (W_{x o} x_{< t}^{j} + W_{h o} h_{t - 1} + b_{o}), \\ c_{t} = f_{t} c_{t - 1} + i_{t} tanh (W_{x c} x_{< t}^{j} + W_{h c} h_{t - 1} + b_{c}), \\ h_{t} = o_{t} tanh (c_{t}), \\ s_{t} = sigmoid (W_{o s} o_{t} + b_{0}) . \end{matrix}

(7)

Here,

i_{*}

,

f_{*}

,

o_{*}

,

c_{*}

correspond to the input gate, forget gate, output gate, and memory cell, respectively. Both

t a n h (*)

and

s i g m o i d (*)

denote activation functions, while

W_{*}

and

b_{*}

are trainable parameters. The output vector

s_{t}

has a dimension equal to the number of knowledge concepts, with each entry indicating the student’s mastery of a specific knowledge concept. Our model not only predicts students’ knowledge states but also balances the impact of exercise difficulty on performance. The binary cross-entropy loss for n iterations and exercise difficulty regulation are defined as:

\begin{matrix} ℓ_{b c e} = \sum_{k = 1}^{n} B C E (a_{k}, r_{k}), \\ ℓ_{d i f f} = \frac{1}{n} \sum_{i = 1}^{n} | d - D i f f (q_{i}) |, \end{matrix}

(8)

where

d

is a predefined difficulty level tailored to student needs. The final optimization objective is:

ℓ = α ℓ_{b c e} + β ℓ_{d i f f},

(9)

where

α

and

β

are weighting parameters.

4.3. Prompt Template of LLM

LLMs refer to ultra-large-scale neural networks trained via deep learning techniques, capable of comprehending and generating human language with remarkable fluency. These models are typically pre-trained on extensive textual corpora, allowing them to capture nuanced linguistic patterns and contextual dependencies. Owing to their transformative impact across a wide range of natural language processing tasks—such as text generation and machine translation—LLMs have garnered increasing attention in the domain of recommendation systems [38,39]. Prompt templates serve as an effective mechanism to enhance the interpretability and reasoning capabilities of LLMs; when carefully crafted, they can guide the model to produce responses that are not only coherent but also closely aligned with task-specific objectives [40]. Qwen (version 2.0; Alibaba Cloud, Hangzhou, China) is a language model introduced by Alibaba Cloud, and we have deployed this model on our servers with a parameter size of 32 billion. As illustrated in Figure 3, the prompt template is carefully designed to guide the LLM in its role as an intelligent exercise generation assistant. The blue section highlights that the LLM should focus on students’ evolving understanding of specific knowledge concepts and the desired exercise difficulty. Newly generated reference exercises should adhere to the format provided in the examples. For each student with multiple weak knowledge concepts, the LLM aims to generate exercises covering as many of these knowledge concepts as possible to optimize knowledge improvement. To ensure content quality, the reference exercises are matched against the exercise bank, with the top five most similar exercises selected for recommendation. The yellow section corresponds to each student’s learning situation, which is updated before each recommendation. The green section represents the response of LLM. The process is outlined below:

LP = LLM (s_{t}^{i}; [KC]; [q_{i}, q_{j}]; d),

(10)

where

s^{i}

denotes the student’s knowledge states,

[K C]

is the set of relevant knowledge concepts,

[q_{i}]

provides example exercises, and

d

represents the expected difficulty level.

The LLM used in this work is the Qwen2-32B pre-trained model. We did not fine-tune this model; instead, we fully leveraged its in-context learning capability through carefully designed prompt templates. The temperature in the text generation process was set to 0.5.

4.4. Exercise Retrieval

We employ a pre-trained BERT model to obtain textual embeddings for the exercises. For each exercise, we input its text into the BERT model and use the corresponding output vector as its semantic embedding, which is then used for subsequent similarity computation.

To ensure the effectiveness and accuracy of the recommended content, we match the candidate exercises generated by the LLM with existing items in the exercise bank and select the top-N most similar ones for recommendation.

Given a reference exercise

q = {w_{1}, w_{2}, \dots, w_{n}}

and a candidate exercise

q^{'} = {w_{1}^{'}, w_{2}^{'}, \dots, w_{m}^{'}}

, where

w_{i}

and

w_{i}^{'}

represent words in the exercise and n and m are the number of words in each exercise text, we utilize a pre-trained BERT model to obtain contextualized vector embeddings for each token:

E (q_{i}), E (q_{j}^{'}) \in R^{d},

(11)

where

E (q_{i})

and

E (q_{j}^{'})

are the d-dimensional embeddings produced by BERT [41]. The semantic similarity between a reference and a candidate exercise is computed using cosine similarity:

sim (q_{i}, q_{j}^{'}) = \frac{E (q_{i}) ⊙ E (q_{j}^{'})}{| | E (q_{i}) | | | | E (q_{j}^{'}) | |} .

(12)

Here, ⊙ denotes the dot product and

| | * | |

is the Euclidean norm. Ultimately, the top

N = 5

candidate exercises with the highest similarity scores are selected:

{lp}_{c} = {Rank}_{5} = {q_{1}, q_{2}, \dots, q_{5}} .

(13)

Before generating a learning path, the student’s knowledge state—provided by the KT model—is transformed into explicit, structured pedagogical objectives, namely the “set of knowledge concepts requiring reinforcement” and the “target difficulty level.” These objectives are clearly communicated to the LLM through carefully designed prompt templates, thereby guiding the generation process in a controlled and purposeful manner. Based on the entire exercise bank, we compute the semantic similarity between the LLM-generated reference item and all candidate exercises using a pre-trained BERT model and select the top-N most similar exercises. Through this design, the LLM functions more like a “pedagogical assistant” that operates within a well-defined instructional framework, generating contextually appropriate and semantically rich reference content. Meanwhile, the retrieval mechanism ensures that the final recommended exercises are drawn from the authentic, curated exercise bank. This separation of responsibilities effectively aligns the creative capacity of the LLM with educational fidelity, thereby preserving consistency with the intended pedagogical goals.

The detailed process of exercise retrieval is documented in Algorithm 2.

Algorithm 2: MatchExercises

Input: $L i s t [q_{j}^{'}]$ generated by LLM, Exercise bank Q,
Number of exercises to retrieve n
Output: $t o p_n_e x e r c i s e s$
$a l l_c a n d i d a t e s \leftarrow []$
for each $r e f$ in $L i s t [q_{j}^{'}]$ do
$r e f_{r e c} \leftarrow B E R T_{E} m b e d (r e f)$
for each $c a n d i d a t e$ in Q do
$c a n d i d a t e_{r e c} \leftarrow B E R T_{E} m b e d (c a n d i d a t e)$
$s i m \leftarrow C o s i n e S i m i l a r i t y (r e c_{r e f}, c a n d i d a t e_{r e c})$
$A p p e n d (c a n d i d a t e, s i m) t o a l l_c a n d i d a t e s$
end for
end for
$s o r t e d_c a n d i d a t e s \leftarrow S o r t (a l l_{c} a n d i d a t e s, b y = s i m, d e s c e n d i n g = T r u e)$
$t o p_n_e x e r c i s e s \leftarrow G e t T o p N (s o r t e d_c a n d i d a t e s, n)$

5. Experiment

In order to understand the model more clearly, we conducted experiments that addressed the following research questions:

RQ1: In the task of assessing students’ knowledge status, do the results generated by direct use of LLM tend to be conservative and lack discrimination?

RQ2: Is LPReKL more effective than the existing LPR?

RQ3: What impact do the individual core components of LPReKL have on the model’s overall performance?

RQ4: How do hyperparameters in the model contribute to overall performance?

5.1. Dataset and Simulator

5.1.1. Dataset

To evaluate the performance of the proposed LPReKL, we conduct experiments on three publicly available datasets. MOOCCubeX (https://github.com/THU-KEG/MOOCCubeX?tab=readme-ov-file, accessed on 1 March 2025) is one of the largest and most comprehensive MOOC datasets, containing a wealth of exercises, knowledge concepts, and student interaction records. We extract students’ activity logs related to physics subjects for our experiments. MOOPer (http://data.openkg.cn/dataset/mooper, accessed on 1 March 2025) is derived from interaction data collected between 2018 and 2019 on the EduCoder platform, where students participated in practical programming exercises. XES3G5M (https://github.com/ai4ed/xes3g5m, accessed on 1 March 2025) is collected from a real-world online mathematics learning platform and contains third-grade students’ historical interaction records on math exercises. The processed dataset details are summarized in Table 2.

5.1.2. Simulator

To evaluate the effectiveness of different methods in LPR, we follow prior work [14,24,35] and utilize our proposed KT to assess the quality of the generated learning paths. The KT model is trained on large-scale real-world data with the objective of accurately predicting students’ responses to exercises. Therefore, it can serve as a reliable simulator for students’ knowledge state evolution, enabling a fair comparison of the expected effectiveness of learning paths generated by different recommendation methods.

5.2. Implementation Details

We discard student records with fewer than ten interactions and retain only the associated knowledge concepts. The remaining data is split into training, validation, and test sets with a ratio of 8:1:1. The hidden dimension of the KT is set to 200, and the output layer size matches the number of knowledge concepts in the dataset. Dropout with a rate of 0.6 is applied to mitigate overfitting. We use the Adam optimizer with a momentum of 0.9, a gradient clipping threshold of 3.0, an initial learning rate of 0.01, and a decay rate of 0.75. The input sequence length is fixed to 200, and shorter sequences are padded with null values. All experiments are conducted on a Tesla V100 GPU with Python 3.10, PyTorch 2.1.2, and CUDA 11.8.

5.3. Baselines and Evaluation Metric

5.3.1. Baselines

We compare LPReKL with several state-of-the-art methods:

FISM [42]: Generates recommendations based on a similarity matrix.
CluLSTM [43]: Clusters students into groups and uses LSTM to predict learning paths.
DQN [44]: Applies Q-learning and deep neural networks for decision-making.
GRU4Rec [45]: Utilizes gated recurrent units to process students’ historical interactions and generate learning paths.
LightGCN [46]: Employs multi-layer graph convolution to extract deep features among entities for recommendation.
GEHRL [35]: Tracks students’ knowledge states with KT and adopts hierarchical reinforcement learning for goal planning and recommendation.
SKarRec [15]: Leverages LLMs to construct textual descriptions of learning items and combines KT with graph neural networks for recommendation.
KGNN-KT [33]: Utilize LLMs to construct a knowledge graph from unordered knowledge concepts, and then model student behavior using GNNs.

5.3.2. Evaluation Metric

We adopt

E_{φ}

(Equation (1)) to measure the effectiveness of learning paths [14,24,35] and compare all methods based on their performance using this metric. Traditional recommendation metrics (e.g., NDCG, Precision) aim to measure item “relevance” or “ranking quality.” However, in education, the ultimate goal is not merely the alignment between students and resources but the actual improvement in students’ knowledge state. A highly “relevant” exercise may fail to promote learning if it is too easy or too difficult. Optimizing for relevance alone risks recommending items that appear suitable but yield little educational benefit. In contrast, our evaluation metric,

E_{φ}

, directly quantifies the gain in a student’s knowledge state after completing the recommended path—i.e., the learning gain. This ensures that our model optimization is aligned with the fundamental objective of education: meaningful and measurable knowledge advancement.

5.4. Exploring the Predictive Capabilities of LLM (RQ1)

In this experiment, we use KT instead of LLM to directly predict the knowledge state of the students, primarily because LLMs exhibit a pronounced central tendency in their assessments of knowledge states. To validate this phenomenon, we randomly sample students with varying record lengths from three datasets, input their response histories into the LLM, and obtain predictions of their knowledge states. Specifically, we categorize students’ historical learning records into three groups based on length (fewer than 100 entries, fewer than 200 entries, and more than 200 entries), randomly select 100 students from each group, and visualize the LLM’s predictions using box plots (Figure 4).

For all three datasets, when the record length is fewer than 200, the median predicted state is relatively high and the data distribution is more dispersed. In contrast, when the record length exceeds 200, the median predicted state is comparatively lower and the distribution is more concentrated. With longer input sequences, the LLM can better capture the input information, leading to more stable predictions. In summary, the LLM tends to produce moderate predictions in scoring tasks, remaining in a comfort zone to avoid extreme judgments. In comparison, KT does not suffer from this limitation and can more accurately predict students’ knowledge states. Therefore, we opt for the KT to evaluate students’ learning capabilities.

5.5. Overall Performance Comparison (RQ2)

To thoroughly evaluate the capabilities of various models, we configure three settings for selecting candidate exercises: 1

ρ = 1

: randomly sample n exercises. 2

ρ = 2

: divide all exercises into

(\frac{N}{n + 1})

groups, each of size n, and randomly select one group as the candidate set. 3

ρ = 3

: use all available exercises. N denotes the total number of exercises. For the MOOCCubeX, MOOPer, and XES3G5M datasets, n is set to 100, 500, and 500, respectively. The three settings are referred to as

ρ =

1, 2, and 3 in the following experiments.

5.5.1. Promotion Comparison

As shown in Table 3, LPReKL achieves superior performance in most experimental settings, being outperformed by other models only in a few cases. When the number of candidate exercises is limited (

ρ = 1

and

ρ = 2

), the performance of all models declines significantly. This is mainly because the randomly selected exercises in these scenarios often fail to cover the specific knowledge deficiencies of students, resulting in limited improvement. In contrast, when

ρ = 3

, i.e., when the entire exercise bank is available, all models demonstrate notable performance gains, with LPReKL achieving particularly remarkable improvement. This highlights the model’s ability to accurately identify and address students’ weaknesses when provided with sufficient and diverse learning resources. However, LPReKL performs slightly less effectively on the XES3G5M dataset compared to the other two datasets. A possible explanation is the large number of knowledge concepts in XES3G5M, which makes it difficult for a fixed set of five recommended exercises to meet students’ diverse learning needs. Additionally, the strong semantic understanding capabilities of the LLM embedded in LPReKL are better leveraged in datasets with rich textual information, such as MOOCCubeX and MOOPer. Notably, SKarRec—which also employs an LLM—demonstrates strong competitiveness on these two datasets, further confirming the potential of LLMs in the context of LPR. For other models, FISM and CluLSTM heavily rely on co-occurrence similarity between items, and their recommendation quality deteriorates significantly when candidate exercises are randomly generated. DQN, as a classic reinforcement learning algorithm, can dynamically adjust its strategy based on the environment state, thereby demonstrating relatively stable performance. GRU4Rec and LightGCN critically depend on sufficient historical interaction data to capture student behavior patterns; as such, their representational capacity is limited in cases of data sparsity or short interaction sequences. GEHRL achieves competitive performance by decoupling the recommendation process into two stages—goal planning and exercise recommendation—and by incorporating awareness of students’ evolving knowledge states. However, its architecture lacks a closed-loop mechanism that enables real-time optimization of the recommendation strategy based on student feedback. SkarRec and KGNN-KT attempt to leverage LLMs to uncover semantic relationships among learning items. Nevertheless, in real-world educational scenarios, knowledge systems are large-scale and dynamically evolving, making it impractical to exhaustively encode and input all conceptual relationships into the LLM. This limitation hinders their generalization ability and practical applicability.

Specifically, we performed paired t-tests to compare LPReKL against the two strongest baseline models, GEHRL and SkarRec. The performance improvement of LPReKL over the best baseline is statistically significant (p < 0.05) on MOOCCubeX (

ρ = 1

,

ρ = 3

), MOOPer (

ρ = 1

,

ρ = 2

), and XES3G5M (

ρ = 2

,

ρ = 3

). In the remaining cases, the performance differences did not reach conventional significance levels. Nevertheless, our method consistently demonstrates a favorable numerical advantage across all settings, indicating its robustness and effectiveness.

5.5.2. Difficulty Comparison

An appropriate level of exercise difficulty can significantly enhance students’ learning motivation, whereas exercises that are too simple or too difficult may hinder their long-term engagement. As shown in Table 4, LPReKL consistently achieves favorable results under all three settings. This is primarily because our framework integrates both the LLM and KT components to explicitly account for difficulty when generating learning paths, ensuring that the recommended exercises align well with the predefined difficulty expectations. In contrast, other methods do not incorporate difficulty as a consideration during the recommendation process, resulting in learning paths of inconsistent quality, which may negatively impact students’ learning experience. These findings suggest that incorporating external factors—such as exercise difficulty—is crucial for delivering personalized learning services, as it helps efficiently target students’ weaknesses and accelerates progress toward their learning goals.

5.6. Ablation Study (RQ3)

We design the following variants to investigate the contribution of each component in LPReKL:

LPReKL-F: Removes the feedback mechanism. Candidate exercises retrieved from the exercise bank are directly recommended without iterative adjustment.
LPReKL-K: Reverts to the original DKT by omitting the masking mechanism when reconstructing historical interaction data.
LPReKL-L: Excludes the LLM. Instead, exercises relevant to students’ weak knowledge concepts, as identified by KT, are retrieved from the exercise bank for recommendation.

As shown in Table 5, the performance of LPReKL-K is on par with LPReKL, indicating that traditional KT still performs excellently in predicting student knowledge states. However, it typically requires more structured training data and cannot be generalized to multi-step prediction tasks. LPReKL-F and LPReKL-L yield the weakest performance, highlighting the critical roles of both the feedback mechanism and the LLM. The LLM’s semantic understanding and reasoning capabilities allow it to infer students’ weaknesses from contextual information and generate personalized learning items accordingly. The feedback mechanism guides the LLM to iteratively adjust its instructional strategies. Unlike prior studies, which largely overlook student feedback and rely solely on historical data for one-way recommendations, our framework introduces a dynamic, student-aware loop. This enables the model to better align with students’ evolving needs and individual learning profiles.

5.7. Parameter Analysis (RQ4)

5.7.1. Weight Coefficients of Knowledge Tracing+

The KT in our framework involves two key hyperparameters,

α

and

β

(as defined in Equation (9)). We vary these parameters to examine their influence on model performance. The results are presented in Figure 5. When

α

and

β

are set to 0.5:0.5, the model achieves the best performance on the MOOCCubeX and XES3G5M datasets. However, increasing either

α

or

β

disproportionately leads to a significant drop in performance. The MOOPer dataset presents a more complex pattern. The model performs best when

α

and

β

are set to 0.6:0.4, while further adjustments cause noticeable fluctuations in performance. We attribute this to the larger number of knowledge components in MOOPer, which requires a more balanced consideration of prediction accuracy and item difficulty—an aspect that aligns more closely with real-world educational settings. These observations highlight the importance of dataset-specific parameter tuning, as appropriate values of

α

and

β

are crucial for optimizing model performance.

5.7.2. Mask Ratio of Knowledge Tracing+

In real-world educational scenarios, the problem of a large exercise bank arises. To enable the KT to predict students’ knowledge states for the next multi-step, we reconstruct students’ historical records during training by randomly masking their responses with a certain probability. Figure 6 shows the impact of different masking rates on model performance. The results indicate that excessively high masking rates significantly degrade model performance. This is because with too many missing student records, the model struggles to accurately assess students’ knowledge states, hindering the generation of personalized learning paths. For the MOOPer dataset, which contains the most knowledge concepts, reducing the masking rate allows for more comprehensive training, leading to more precise decisions. For the other two datasets, optimal performance is achieved at a masking rate of 0.15, with further reductions yielding minimal improvements. We attribute this to the moderate number of knowledge concepts in these datasets, where a masking rate of 0.15 is sufficient for the model to make reasonably accurate predictions based on the available data.

6. Discussion

Traditional rule-based methods lack flexibility, while early sequential recommendation models struggle to capture the dynamics of knowledge acquisition. Although KT models and LLMs have individually shown promise in addressing these challenges, the former are often limited to single-step prediction, while the latter suffer from hallucination and a conservative bias in evaluation. The proposed LPReKL framework is designed to synergize the strengths of both KT and LLM to overcome these limitations. Experimental results demonstrate that LPReKL performs well across multiple datasets and settings, primarily due to two key design principles. First, the iterative feedback loop between KT and LLM enables dynamic adaptation. Unlike conventional “one-shot” recommendation approaches, our system continuously refines the learning path: the KT model evaluates the expected effectiveness of candidate paths and feeds this information back to the LLM, which then adjusts its generation strategy accordingly. This closed-loop interaction allows for personalized and context-aware path refinement. Second, the paradigm of “LLM-generated reference items + semantic retrieval from a question bank” effectively balances creativity with reliability. The LLM interprets complex pedagogical intentions and concretizes them into reference exercises, while the retrieval mechanism ensures that the final recommended items are drawn from a curated, high-quality item bank. This division of labor not only mitigates the hallucination issues commonly associated with LLMs but also enhances the system’s scalability—new, high-quality exercises can be seamlessly integrated into the bank and become immediately available for recommendation. Nevertheless, a limitation of this work lies in using the same KT model both as a component within the framework and as the primary evaluator. While this practice is common in the field and the KT model itself is robust and well validated on large-scale data, it may introduce potential evaluation bias. To further strengthen the validity of our findings, we acknowledge the importance of future online A/B testing in real educational platforms. Such studies would allow us to assess the long-term impact of LPReKL in authentic teaching and learning environments, beyond simulated offline evaluations.

7. Conclusions

In this paper, we proposed LPReKL, a novel learning path recommendation framework that synergizes Knowledge Tracing (KT) with a Large Language Model (LLM) to address the limitations of traditional educational approaches. By dynamically tracking students’ knowledge states through an enhanced KT model and leveraging the generative capabilities of LLMs, our system provides highly personalized and adaptive learning recommendations. The integration of a feedback mechanism ensures continuous optimization of the recommended content, aligning it with students’ evolving needs and balancing exercise difficulty to enhance engagement. Experimental results on multiple public datasets demonstrated that LPReKL outperforms existing baselines in terms of recommendation effectiveness and adaptability, particularly in scenarios with abundant learning resources. Ablation studies further validated the critical roles of the feedback mechanism, LLM-generated content, and the masked training strategy for KT in improving performance. Future work will explore incorporating additional personalized factors (e.g., learning styles, engagement metrics) and refining the model architecture to better handle complex real-world educational environments. This research contributes to the advancement of AI-driven educational technologies, offering a scalable solution to deliver tailored learning experiences and improve educational outcomes. Additionally, the LLM adjusts its strategy by processing feedback through natural language prompts, rather than via gradient updates. While this design offers advantages in computational efficiency and safety, and has proven effective in practice, the adjustment granularity is indeed relatively coarse. Future work may explore finer-grained optimization techniques, such as reinforcement learning from human or synthetic feedback, to refine the LLM’s decision-making process and potentially achieve further performance gains.

Author Contributions

The study was conceived and designed jointly by Y.L. and Z.W. Y.L. performed the experiments, collected and analyzed the data, and drafted the manuscript. Z.W. provided critical feedback, revised the manuscript for important intellectual content, and supervised the research. Both authors discussed the results and contributed to the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62377015), National Key Research and Development Program of China (Grant No. 2023YFC3341200), and the Innovation Project for General Colleges and Universities in Guangdong Province (Grant No. 2024KTSCX094).

Data Availability Statement

Suggested Data Availability Statements are available at https://github.com/THU-KEG/MOOCCubeX?tab=readme-ov-file, accessed on 1 March 2025, http://data.openkg.cn/dataset/mooper and https://github.com/ai4ed/xes3g5m, accessed on accessed on 1 March 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Zhang, G.; Gao, X.; Ye, H.; Zhu, J.; Lin, W.; Wu, Z.; Zhou, H.; Ye, Z.; Ge, Y.; Baghban, A. Optimizing learning paths: Course recommendations based on graph convolutional networks and learning styles. Appl. Soft Comput. 2025, 175, 113083. [Google Scholar] [CrossRef]
Chen, X.; Zhu, Z.; Xie, Y. Design and Implementation of an Explainable Course Recommendation Algorithm Based on Causal Inference. In Proceedings of the International Conference on Computer Science and Educational Informatization, Haikou, China, 1–3 November 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 183–199. [Google Scholar]
Wang, L.; Li, Q.; Cui, D.; Wang, M.; Zhao, Y.; Xu, Y.; Zhuang, H.; Zhou, Y.; Wang, L. Building Bridges, Not Walls: Fairness-Aware and Accurate Recommendation of Code Reviewers via LLm-Based Agents Collaboration. In Proceedings of the 2025 IEEE/ACM 33rd International Conference on Program Comprehension (ICPC), Ottawa, ON, Canada, 27–28 April 2025; IEEE Computer Society: Washington, DC, USA, 2025; pp. 577–588. [Google Scholar]
Ma, W.; Chen, W.; Lu, L.; Fan, X. Integrating learners’ knowledge background to improve course recommendation fairness: A multi-graph recommendation method based on contrastive learning. Inf. Process. Manag. 2024, 61, 103750. [Google Scholar] [CrossRef]
Zhang, F.; Feng, X.; Wang, Y. Personalized process–type learning path recommendation based on process mining and deep knowledge tracing. Knowl.-Based Syst. 2024, 303, 112431. [Google Scholar] [CrossRef]
Li, S.; Liu, X.; Tang, X.; Chen, X.; Pu, J. MLKT4Rec: Enhancing Exercise Recommendation Through Multitask Learning with Knowledge Tracing. IEEE Trans. Comput. Soc. Syst. 2024, 12, 1458–1472. [Google Scholar] [CrossRef]
Kuo, B.C.; Chang, F.T.; Bai, Z.E. Leveraging LLMs for Adaptive Testing and Learning in Taiwan Adaptive Learning Platform (TALP). In Proceedings of the LLM@ AIED, Tokyo, Japan, 7 July 2023; pp. 101–110. [Google Scholar]
He, R.; Zhang, L.; Lyu, L.; Xue, C. Enhancing the ability of LLMs for spaceborne equipment code generation via retrieval-augmented generation and contrastive learning. Autom. Softw. Eng. 2026, 33, 1–25. [Google Scholar] [CrossRef]
Schmied, T.; Bornschein, J.; Grau-Moya, J.; Wulfmeier, M.; Pascanu, R. LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities. arXiv 2025, arXiv:2504.16078. [Google Scholar]
Fan, Y.; Tong, M.; Li, D. Learning path recommendation based on forgetting factors and knowledge graph awareness. Inf. Process. Manag. 2026, 63, 104393. [Google Scholar] [CrossRef]
Mrhar, K.; Abik, M. A deep learning framework for optimizing personalized online course recommendation and selection. Decis. Anal. J. 2025, 16, 100616. [Google Scholar] [CrossRef]
Wu, Q.; Ji, W.; Zhou, G. CLKT: Optimizing cognitive load management in knowledge tracing. Cogn. Comput. 2025, 17, 74. [Google Scholar] [CrossRef]
Li, H.; Xu, T.; Zhang, C.; Chen, E.; Liang, J.; Fan, X.; Li, H.; Tang, J.; Wen, Q. Bringing generative AI to adaptive learning in education. arXiv 2024, arXiv:2402.14601. [Google Scholar] [CrossRef]
Liu, Q.; Tong, S.; Liu, C.; Zhao, H.; Chen, E.; Ma, H.; Wang, S. Exploiting cognitive structure for adaptive learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 627–635. [Google Scholar]
Li, Q.; Xia, W.; Du, K.; Zhang, Q.; Zhang, W.; Tang, R.; Yu, Y. Learning Structure and Knowledge Aware Representation with Large Language Models for Concept Recommendation. arXiv 2024, arXiv:2405.12442. [Google Scholar] [CrossRef]
Liang, Z.; Mu, L.; Chen, J.; Xie, Q. Graph path fusion and reinforcement reasoning for recommendation in MOOCs. Educ. Inf. Technol. 2023, 28, 525–545. [Google Scholar] [CrossRef]
Shang, Y.; Luo, X.; Wang, L.; Peng, H.; Zhang, X.; Ren, Y.; Liang, K. Reinforcement learning guided multi-objective exam paper generation. In Proceedings of the 2023 SIAM International Conference on Data Mining (SDM), Minneapolis, MN, USA, 27–29 April 2023; pp. 829–837. [Google Scholar]
Wang, H.; Long, T.; Yin, L.; Zhang, W.; Xia, W.; Hong, Q.; Xia, D.; Tang, R.; Yu, Y. Gmocat: A graph-enhanced multi-objective method for computerized adaptive testing. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 2279–2289. [Google Scholar]
Liu, H.; Li, X. Learning path combination recommendation based on the learning networks. Soft Comput. 2020, 24, 4427–4439. [Google Scholar] [CrossRef]
Abdelrahman, G.; Wang, Q.; Nunes, B. Knowledge tracing: A survey. ACM Comput. Surv. 2023, 55, 1–37. [Google Scholar] [CrossRef]
Li, L.; Wang, Z.; Jose, J.M.; Ge, X. LLM supporting knowledge tracing leveraging global subject and student specific knowledge graphs. Inf. Fusion 2025, 126, 103577. [Google Scholar] [CrossRef]
Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. -User-Adapt. Interact. 1994, 4, 253–278. [Google Scholar] [CrossRef]
Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. Adv. Neural Inf. Process. Syst. 2015, 28, 505–513. [Google Scholar]
Chen, X.; Shen, J.; Xia, W.; Jin, J.; Song, Y.; Zhang, W.; Liu, W.; Zhu, M.; Tang, R.; Dong, K.; et al. Set-to-sequence ranking-based concept-aware learning path recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 5027–5035. [Google Scholar]
Li, Q.; Yuan, X.; Yue, J.; Shen, X.; Liang, R.; Liu, S.; Yan, Z. Dual-view multi-scale cognitive representation for deep knowledge tracing. Knowl.-Based Syst. 2025, 310, 113010. [Google Scholar] [CrossRef]
Huang, C.; Jiang, W.; Li, K.; Wu, J.; Zhang, J. Enhancing learning process modeling for session-aware knowledge tracing. Knowl.-Based Syst. 2025, 309, 112740. [Google Scholar] [CrossRef]
Wang, Z.; Zhou, J.; Chen, Q.; Zhang, M.; Jiang, B.; Zhou, A.; Bai, Q.; He, L. LLM-KT: Aligning Large Language Models with Knowledge Tracing using a Plug-and-Play Instruction. arXiv 2025, arXiv:2502.02945. [Google Scholar]
Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. Fake News Detection and Classification: A Comparative Study of Convolutional Neural Networks, Large Language Models, and Natural Language Processing Models. Future Internet 2025, 17, 28. [Google Scholar] [CrossRef]
Kuang, J.; Shen, Y.; Xie, J.; Luo, H.; Xu, Z.; Li, R.; Li, Y.; Cheng, X.; Lin, X.; Han, Y. Natural language understanding and inference with mllm in visual question answering: A survey. ACM Comput. Surv. 2025, 57, 1–36. [Google Scholar] [CrossRef]
Agrawal, G.; Pal, K.; Deng, Y.; Liu, H.; Chen, Y.C. Cyberq: Generating questions and answers for cybersecurity education using knowledge graph-augmented llms. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; Volume 38, pp. 23164–23172. [Google Scholar]
Wang, X.; Zhong, Y.; Huang, C.; Huang, X. ChatPRCS: A personalized support system for English reading comprehension based on ChatGPT. IEEE Trans. Learn. Technol. 2024, 17, 1722–1736. [Google Scholar] [CrossRef]
Cui, P.; Sachan, M. Adaptive and personalized exercise generation for online language learning. arXiv 2023, arXiv:2306.02457. [Google Scholar] [CrossRef]
Zhang, D.; Niu, Q.; Wang, T.; Hou, Y.; Wu, J.; Zhang, C.; Stefanidis, A. KGNN-KT: Enhancing Knowledge Tracing in Programming Education Through LLM-Extracted Knowledge Graphs. In Proceedings of the International Conference on Intelligent Computing, Shanghai, China, 23–25 May 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 137–147. [Google Scholar]
Sun, X.; Liu, Q.; Zhang, K.; Shen, S.; Yang, L.; Li, H. Harnessing code domain insights: Enhancing programming knowledge tracing with large language models. Knowl.-Based Syst. 2025, 317, 113396. [Google Scholar] [CrossRef]
Li, Q.; Xia, W.; Yin, L.; Shen, J.; Rui, R.; Zhang, W.; Chen, X.; Tang, R.; Yu, Y. Graph enhanced hierarchical reinforcement learning for goal-oriented learning path recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 1318–1327. [Google Scholar]
Papoušek, J.; Stanislav, V.; Pelánek, R. Impact of question difficulty on engagement and learning. In Proceedings of the Intelligent Tutoring Systems: 13th International Conference, ITS 2016, Zagreb, Croatia, 7–10 June 2016; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2016; pp. 267–272. [Google Scholar]
Zhang, X.; Shang, Y.; Ren, Y.; Liang, K. Dynamic multi-objective sequence-wise recommendation framework via deep reinforcement learning. Complex Intell. Syst. 2023, 9, 1891–1911. [Google Scholar] [CrossRef]
Wei, C.; Duan, K.; Zhuo, S.; Wang, H.; Huang, S.; Liu, J. Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model. J. Artif. Intell. Res. 2025, 82, 1147–1173. [Google Scholar] [CrossRef]
Sakurai, K.; Togo, R.; Ogawa, T.; Haseyama, M. Llm is knowledge graph reasoner: Llm’s intuition-aware knowledge graph reasoning for cold-start sequential recommendation. In Proceedings of the European Conference on Information Retrieval, Lucca, Italy, 6–10 April 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 263–278. [Google Scholar]
Li, X.; Peng, S.; Yada, S.; Wakamiya, S.; Aramaki, E. GenKP: Generative knowledge prompts for enhancing large language models. Appl. Intell. 2025, 55, 464. [Google Scholar] [CrossRef]
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
Kabbur, S.; Ning, X.; Karypis, G. Fism: Factored item similarity models for top-n recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 659–667. [Google Scholar]
Zhou, Y.; Huang, C.; Hu, Q.; Zhu, J.; Tang, Y. Personalized learning full-path recommendation model based on LSTM neural networks. Inf. Sci. 2018, 444, 135–152. [Google Scholar] [CrossRef]
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar] [CrossRef]
Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 25–30 July 2020; pp. 639–648. [Google Scholar]

Figure 1. Diagram of the learning process.

Figure 2. Framework overview.

Figure 3. Prompt template of LLM.

Figure 4. Exploring the capabilities of LLM in predicting students’ knowledge states. The box represents the interquartile range, spanning from the 25th to the 75th percentile. The horizontal line inside the box indicates the median.

Figure 5. Weight coefficients of KT.

Figure 6. Mask ratio of KT.

Table 1. Key notations.

Notation	Description
Q = { $q_{1}, q_{2}, \dots$ }	All exercises in the exercise bank
U = { $u_{1}, u_{2}, \dots$ }	All involved students
C = { $c_{1}, c_{2}, \dots$ }	All relevant knowledge concepts
$u^{i}$ = { $(q_{1}, a_{1}), (q_{2}, a_{2}), \dots$ }	A specific student’s learning records
$a_{i}$	Exercise answer, $a_{i} \in$ [0, 1]
$E_{s u p}$	Full score of the examination
$E_{s}$	Beginning score of the examination
$E_{e}$	Ending score of the examination
$E_{φ}$	A student’s promotion in specific knowledge concepts
$K C (q)$	A set of knowledge concepts contained in q
$D i f f (q_{i})$	Exercise difficulty of $q_{i}$
d	Expected difficulty
$l p_{c}$	Candidate exercises
$s^{i}$	A specific knowledge state of the student i

Table 2. Dataset statistics.

Dataset	Students	Exercise	KCs	Interations
MOOCCubeX	13,091	156	71	1,206,646
MOOPer	26,603	1756	1360	2,007,572
XES3G5M	16,378	6006	828	4,803,902

Table 3. Comparison of different methods in terms of promotion. Bold indicates the best result and underlining indicates the second best. * indicates significant improvement over baselines (p < 0.01).

		FISM	CluLSTM	DQN	GRU4Rec	LightGCN	GEHRL	SKarRec	KGNN-KT	LPReKL
	$ρ = 1$	0.1009	0.1590	0.2067	0.1802	0.1643	0.2179	0.2011	0.2032	*${0.2433}^{}$**
MOOCCubeX	$ρ = 2$	0.1519	0.2188	0.1923	0.2105	0.2294	0.2577	0.2371	0.2291	0.2574
	$ρ = 3$	0.2428	0.3422	0.3114	0.2956	0.2861	0.3446	0.3593	0.3015	*${0.3885}^{}$**
	$ρ = 1$	0.1417	0.2499	0.2044	0.2208	0.1629	0.2391	0.2371	0.2205	*${0.2607}^{}$**
MOOPer	$ρ = 2$	0.1858	0.2355	0.2168	0.1977	0.2126	0.3065	0.3109	0.2817	*${0.3356}^{}$**
	$ρ = 3$	0.2407	0.3696	0.3235	0.3316	0.3576	0.4973	0.4641	0.3762	0.4852
	$ρ = 1$	0.0196	0.0774	0.1642	0.1355	0.1178	0.1469	0.2028	0.1847	0.1952
XES3G5M	$ρ = 2$	0.0888	0.0417	0.1916	0.1755	0.1040	0.2021	0.2415	0.2163	*${0.2587}^{}$**
	$ρ = 3$	0.0165	0.2474	0.2168	0.2178	0.2062	0.2318	0.2671	0.2492	*${0.3002}^{}$**

Table 4. Comparison of different methods in terms of difficulty. Values closer to 0.3 indicate better performance. Bold indicates the best result and underlining indicates the second best.

		FISM	CluLSTM	DQN	GRU4Rec	LightGCN	GEHRL	SKarRec	LPReKL
	$ρ = 1$	0.4909	0.3559	0.3795	0.3822	0.3648	0.2194	0.4011	0.3321
MOOCCubeX	$ρ = 2$	0.2147	0.2311	0.4346	0.2394	0.2417	0.3734	0.3505	0.2849
	$ρ = 3$	0.4416	0.3655	0.3420	0.2324	0.3614	0.3981	0.2573	0.2790
	$ρ = 1$	0.4636	0.3448	0.3826	0.4227	0.4116	0.2264	0.2391	0.3784
MOOPer	$ρ = 2$	0.4026	0.4058	0.2005	0.2152	0.2157	0.4463	0.3744	0.2374
	$ρ = 3$	0.3617	0.3540	0.3675	0.3874	0.2391	0.3746	0.2256	0.2851
	$ρ = 1$	0.4314	0.2258	0.2499	0.4375	0.2088	0.3914	0.4021	0.3480
XES3G5M	$ρ = 2$	0.4121	0.3608	0.3750	0.2356	0.2165	0.2210	0.2572	0.3505
	$ρ = 3$	0.2444	0.3734	0.2338	0.3571	0.3753	0.4107	0.3847	0.2871

Table 5. Ablation study. Bold indicates the best result.

		LPReKL-F	LPReKL-K	LPReKL-L	LPReKL
	$ρ = 1$	0.1899	0.2647	0.1037	0.2643
MOOCCubeX	$ρ = 2$	0.2681	0.2501	0.1676	0.2507
	$ρ = 3$	0.3589	0.3829	0.2036	0.3885
	$ρ = 1$	0.1046	0.2591	0.1398	0.2607
MOOPer	$ρ = 2$	0.0948	0.3359	0.1666	0.3356
	$ρ = 3$	0.2127	0.4812	0.3198	0.4852
	$ρ = 1$	0.0414	0.1894	0.0857	0.1952
XES3G5M	$ρ = 2$	0.1745	0.2609	0.1659	0.2587
	$ρ = 3$	0.2141	0.2973	0.2184	0.3002

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Lin, Y.; Wu, Z. Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model. Electronics 2025, 14, 4385. https://doi.org/10.3390/electronics14224385

AMA Style

Lin Y, Wu Z. Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model. Electronics. 2025; 14(22):4385. https://doi.org/10.3390/electronics14224385

Chicago/Turabian Style

Lin, Yunxuan, and Zhengyang Wu. 2025. "Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model" Electronics 14, no. 22: 4385. https://doi.org/10.3390/electronics14224385

APA Style

Lin, Y., & Wu, Z. (2025). Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model. Electronics, 14(22), 4385. https://doi.org/10.3390/electronics14224385

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Learning Path Recommendation Enhanced by Knowledge Tracing and Large Language Model

Abstract

1. Introduction

2. Related Work

2.1. Learning Path Recommendation

2.2. Knowledge Tracing for LPR

2.3. LLM-Assisted Recommendation

3. Problem Formulation

4. Methodology

4.1. Framework Overview

4.2. Knowledge Tracing for Multi-Step Predictions

4.3. Prompt Template of LLM

4.4. Exercise Retrieval

5. Experiment

5.1. Dataset and Simulator

5.1.1. Dataset

5.1.2. Simulator

5.2. Implementation Details

5.3. Baselines and Evaluation Metric

5.3.1. Baselines

5.3.2. Evaluation Metric

5.4. Exploring the Predictive Capabilities of LLM (RQ1)

5.5. Overall Performance Comparison (RQ2)

5.5.1. Promotion Comparison

5.5.2. Difficulty Comparison

5.6. Ablation Study (RQ3)

5.7. Parameter Analysis (RQ4)

5.7.1. Weight Coefficients of Knowledge Tracing+

5.7.2. Mask Ratio of Knowledge Tracing+

6. Discussion

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI