SFBKT: A Synthetically Forgetting Behavior Method for Knowledge Tracing

Song, Qi; Luo, Wenjie

doi:10.3390/app13137704


by Qi Song ¹ and Wenjie Luo ¹,²,*

¹ School of Cybersecurity and Computer, Hebei University, Baoding 071002, China
² Hebei Machine Vision Engineering Research Center, Hebei University, Baoding 071002, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(13), 7704; https://doi.org/10.3390/app13137704

Submission received: 29 April 2023 / Revised: 25 June 2023 / Accepted: 27 June 2023 / Published: 29 June 2023

(This article belongs to the Special Issue Advances in Artificial Intelligence (AI)-Driven Data Analytics)


Abstract

Knowledge tracing (KT) aims to model students’ knowledge levels based on their historical learning records and predict their future learning performance, which constitutes an essential component of intelligent education. Learning and forgetting are closely related, and forgetting can often interfere with the learning process. Prior research has employed diverse techniques to address the issue of interference caused by forgetting factors in predictions, yet many of these methods fail to fully leverage the forgetting information contained within learning records. This paper proposes a synthetically forgetting behavior knowledge tracing (SFBKT) model that comprehensively models a student’s knowledge level by considering both individual forgetting factors and group status. Specifically, the model initially extracts forgetting information from exercise records in the input module, then updates the student’s knowledge state through an improved continuous-time long short-term memory network (CTLSTM), and finally combines the individual state with the group state using collaborative filtering to predict the student’s ability to correctly answer the next exercise. Our predictive model has been evaluated using four public education datasets. The experimental results indicate that our model’s predictions are effective and outperform other existing methods.

Keywords:

knowledge tracing; forgetting behavior; continuous-time LSTM; deep learning; intelligent education; collaborative filtering

1. Introduction

Smart learning and education are new concepts of technology-enhanced learning [1]. Over the past three decades, as intelligent education has continued to advance, numerous online education platforms have emerged to compensate for the limitations of traditional teaching methods [2]. Online education emphasizes customized instruction that accommodates the abilities of individual students and recommends learning resources appropriate to their knowledge levels. A student’s knowledge level is shaped by their learning stage and cognitive capacity, and it evolves constantly throughout the learning process. Consequently, real-time monitoring of students’ knowledge levels is indispensable for personalized online education [3]. To gain insight into students’ knowledge proficiency and provide individualized guidance, knowledge tracing (KT) has attracted considerable attention from researchers. Knowledge tracing can model students’ knowledge states based on their past practice, analyze their interactions with online learning materials, identify patterns in their knowledge progression, and predict their future learning outcomes [4]. Specifically, knowledge tracing has two primary objectives: modeling students’ knowledge states and predicting their future performance. The accuracy of knowledge tracing is directly related to the relevance and efficacy of the content the system provides [5]. Moreover, a user study has shown that presenting content and questions that are excessively difficult or easy for students diminishes their engagement [6]. Therefore, accurate models of students’ knowledge are essential for improving their learning efficiency and their satisfaction with online education systems.

The notion of knowledge tracing was first introduced by Anderson et al. in a 1986 technical report to facilitate cognitive modeling and intelligent tutoring [7]. Since then, a variety of approaches have been devised to tackle the knowledge tracing problem. Early work employed Bayesian inference methods [8,9], which often relied on simplifying model assumptions (e.g., assuming only one skill) to make posterior computation tractable. Later, with the advent of classical machine learning methods, parametric factor analysis methods became the dominant approach to knowledge tracing. These methods model various factors, including those related to students, study materials, and the environment, to track students’ knowledge states and predict their answers [10,11,12]. Moreover, psychological research on students’ learning and forgetting behavior has provided new insights into modeling students’ knowledge states [13,14]. These studies suggest that forgetting is a critical factor that affects a student’s knowledge state over time.

Deep learning techniques have made significant breakthroughs in recent years and have been applied extensively in various fields [15], leading to the rapid emergence of deep learning-based knowledge tracing models. Piech et al. pioneered this research direction and demonstrated the superior performance of deep learning techniques for knowledge tracing [16]. Since then, numerous knowledge tracing methods have incorporated diverse neural network architectures to exploit the particular features of learning sequences. For instance, graph-based knowledge tracing approaches use graph neural networks to model the structure among knowledge concepts (KCs) [17]. Attention-based knowledge tracing techniques integrate the attention mechanism to capture dependencies among learning interactions [18]. In addition, CKT incorporates convolutional neural networks (CNNs) to analyze students’ personalized phased learning patterns and predict their learning outcomes [19]. In real-world scenarios, forgetting is an inevitable consequence of learning [20]. According to the Ebbinghaus forgetting curve theory, students’ knowledge proficiency declines over time due to forgetting [21]. Thus, it is unreasonable to assume that a student’s knowledge state remains constant over time. The incorporation of forgetting into knowledge tracing has also advanced with the development of deep learning techniques. To account for intricate forgetting behavior, the DKT + forgetting model [22] integrates forgetting into deep knowledge tracing by considering three types of side information associated with forgetting. Notably, the HawkesKT model, inspired by point processes, explicitly models temporal cross-effects and thus accounts for the varying impacts of previous interactions [23].
HawkesKT is the first model to use the Hawkes process to simulate temporal cross-effects in knowledge tracing, which allows students’ forgetting behavior to be integrated more effectively into the learning process. Additionally, AKT is an attention-based neural network model that uses distance-aware exponential decay with a global decay rate, thereby also integrating temporal information into the computation of attention weights [24].

Over the past few years, there have been attempts to incorporate forgetting factors into knowledge tracing. However, the methods proposed so far rely on either manually designed forgetting features or oversimplified process assumptions, such as fixed time intervals and global decay functions. According to temporal cross-effect theory [23] and temporal point process theory [25], each of a student’s interactions with an exercise significantly influences their knowledge state. Figure 1 illustrates how the impacts of different exercises on knowledge states can vary over time. Previous methods have not fully utilized forgetting information and therefore fail to capture dynamic changes in knowledge states. To address this problem, we propose a forgetting behavior knowledge tracing model based on a neural temporal point process and collaborative filtering. Building on the ideas of DKT + forgetting [22], the neural Hawkes process [25], and collaborative filtering (CF) [26], our model considers several factors, including practice records before learning, the lag time after learning, and the hidden forgetting factor in the group state. These factors are essential for capturing students’ forgetting behaviors and improving the accuracy of the model’s predictions. In particular, we modify the memory cells within the network according to the lag time after learning. Because the cells change gradually over time, the abrupt changes caused by individual interactions are smoothed, so the modeled knowledge state evolves in a way that more closely resembles students’ natural forgetting process rather than changing abruptly and drastically. Specifically, we first extract three types of forgetting-related information from practice records, namely repetition intervals, sequence intervals, and past practice counts, and combine them with the records as joint input data.
The joint data are fed into the continuous-time LSTM network to generate a knowledge state that changes dynamically over time. Finally, cosine similarity is used to measure the similarity between students’ knowledge states, and the result is combined with neighbor information for prediction.

In summary, our main contributions are as follows:

Our study proposes the Synthetically Forgetting Behavior Knowledge Tracing model, which comprehensively models students’ knowledge levels by incorporating pre-training forgetting information, post-training forgetting behavior, and a group forgetting state. This approach allows better prediction of students’ performance in the next stage.
The neural Hawkes process [25] is a seminal work in the field of temporal point processes. In this study, we improve it and transfer the improved model to knowledge tracing as part of SFBKT. This enables us to fully consider the forgetting information implicit in students’ training records, even over irregular, continuous time intervals. By using the self-exciting multivariate point process, the knowledge levels that the neural network estimates for students change dynamically over time, better aligning with the law of human forgetting.
Our approach uses the expressive power of neural networks to model individual training records while using traditional collaborative filtering to account for similarities in knowledge levels among groups of students. This enables us to combine individual student status with group status and avoid ignoring students’ social attributes.
We tested our model on four widely used public education datasets, and the results indicate that it outperformed the compared baseline methods on all four datasets.

The remainder of the article is organized as follows. Section 2 briefly summarizes research on knowledge tracing and the motivation for our work. Section 3 formally defines the problem that the knowledge tracing task aims to solve, describes the components of the SFBKT model, and provides implementation details. Section 4 compares the experimental results of SFBKT with various baseline methods and SFBKT variants on four public datasets. Finally, Section 5 concludes the paper and discusses potential directions for future research.

2. Related Work and Motivation

Research on knowledge tracing can be broadly divided into two phases: KT models based on traditional methods and KT models based on deep learning. Traditionally, Bayesian knowledge tracing and factor analysis models have been the two primary research directions in KT. In recent years, deep learning-based KT has incorporated various advanced techniques to improve its performance. Table 1 presents an overview of the principal knowledge tracing models developed in the KT literature.

Bayesian knowledge tracing (BKT) is based on the principles of mastery learning, which assume that students can master a skill through practice if two conditions are met: (a) knowledge is represented as a hierarchy of skills, and (b) learning experiences are structured so that students master lower-level skills before progressing to higher-level ones [2]. BKT usually employs probabilistic graphical models, such as hidden Markov models [8] and Bayesian belief networks [9], to monitor how students’ knowledge states evolve as they practice various skills. The BKT model assumes the same prior knowledge and learning rate for all students for a given skill and therefore does not personalize model parameters for individual students. Consequently, it may provide inaccurate estimates of learning performance that tend to skew toward the mean. Subsequent studies have primarily enhanced BKT from two perspectives: incorporating individualized parameters for each student and modeling the interplay between skills. Bayes’ theorem, a fundamental tool in these models, describes the relationship between two events A and B as follows:

p (A| B) = \frac{p (B| A) p (A)}{p (B)}

(1)
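To make the role of Formula (1) concrete, the following minimal Python sketch shows the standard textbook BKT update for a single skill. Bayes’ theorem gives the posterior probability that the skill is known after one observed response, and a learning transition is then applied. The parameter names (`p_learn`, `p_guess`, `p_slip`) and the example values are our own illustrative choices. This is the classical BKT formulation, not part of SFBKT.

```python
def bkt_update(p_known: float, correct: bool,
               p_learn: float, p_guess: float, p_slip: float) -> float:
    """One standard BKT step for a single skill.

    Applies Bayes' theorem (Formula (1)) to the observed response, then the
    learning transition. Returns P(skill known) before the next opportunity.
    """
    if correct:
        # P(correct | known) = 1 - slip; P(correct | not known) = guess
        evidence = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        posterior = p_known * (1 - p_slip) / evidence
    else:
        # P(wrong | known) = slip; P(wrong | not known) = 1 - guess
        evidence = p_known * p_slip + (1 - p_known) * (1 - p_guess)
        posterior = p_known * p_slip / evidence
    # An unlearned skill may become learned at this practice opportunity.
    return posterior + (1 - posterior) * p_learn


def bkt_predict(p_known: float, p_guess: float, p_slip: float) -> float:
    """Probability that the next response to this skill is correct."""
    return p_known * (1 - p_slip) + (1 - p_known) * p_guess


if __name__ == "__main__":
    p = 0.3  # hypothetical prior P(L0), shared by all students as in plain BKT
    for answer in [True, True, False, True]:
        print(f"P(correct next) = {bkt_predict(p, 0.2, 0.1):.3f}")
        p = bkt_update(p, answer, p_learn=0.15, p_guess=0.2, p_slip=0.1)
```

Because the same `p_learn`, `p_guess`, and `p_slip` are used for every student, the sketch also illustrates the lack of personalization discussed above.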

Factor analysis models in education are based on item response theory (IRT) [27], which has made significant contributions to educational measurement. IRT typically constructs a function that estimates student performance from the various factors involved in the learning process. Although the original IRT could only model single-skill problems, subsequent research has extended it to problems involving multiple skills. In such cases, the relationship between skills and items is commonly expressed using a $Q$-matrix, where each entry $q_{jk}$ is set to 1 if item $j$ involves skill $k$ and 0 otherwise. The $Q$-matrix is often used as an additional source of information in these models. Psychometric models are easy to interpret from a psychological perspective, but their simplistic parameter settings limit their ability to encode complex features.
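As a minimal illustration of the two ideas in the preceding paragraph, the sketch below implements the one-parameter (Rasch) IRT model and builds a $Q$-matrix from a list of the skills each item involves. The multi-skill function is a hypothetical compensatory variant: it sums the abilities of the skills selected by the item’s $Q$-matrix row. It is only one of many possible extensions and is not taken from the cited works.

```python
import math


def rasch_probability(theta: float, difficulty: float) -> float:
    """1PL (Rasch) IRT: P(correct) = sigmoid(theta - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def build_q_matrix(item_skills, num_skills):
    """q[j][k] = 1 if item j involves skill k, otherwise 0."""
    return [[1 if k in skills else 0 for k in range(num_skills)]
            for skills in item_skills]


def multi_skill_probability(abilities, q_row, difficulty):
    """Hypothetical compensatory multi-skill variant: the abilities of the
    skills the item involves (selected by its Q-matrix row) are summed."""
    combined = sum(theta * q for theta, q in zip(abilities, q_row))
    return rasch_probability(combined, difficulty)


if __name__ == "__main__":
    # Item 0 involves skill 0; item 1 involves skills 0 and 2.
    Q = build_q_matrix([{0}, {0, 2}], num_skills=3)
    print(Q)  # [[1, 0, 0], [1, 0, 1]]
    print(multi_skill_probability([0.5, 1.0, -0.2], Q[1], difficulty=0.1))
```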

Deep knowledge tracing (DKT) is an innovative knowledge tracing model that uses deep learning to predict learning performance. To overcome its limitations, a range of knowledge tracing models have been built on DKT. For instance, the Extended-DKT model improves on the original DKT by incorporating both student and exercise features: student features such as prior knowledge, answer rate, and practice time, and exercise features such as textual information, question difficulty, skill level, and skill dependency [28]. To address shortcomings of the original loss function and mitigate inconsistent predictions for questions that share similar knowledge components (KCs), the DKT+ model introduces two regularization terms [29]. Furthermore, to capture the intricate KCs that students learn, key-value memory has been introduced as a representation of the knowledge state, inspired by memory-augmented neural networks [30]. An external memory structure provides greater representational power than the hidden variables of the original DKT model, thereby enhancing the model’s ability to indicate proficiency. Several studies have integrated attention mechanisms into KT models by adopting the Transformer architecture [31]. These studies share the goal of improving prediction performance by redistributing attention weights across past questions. Although each work incorporates the attention mechanism differently, the underlying idea is the same [18,32,33]. Moreover, graph-based KT models use graph neural networks to model the interrelatedness and interdependence among KCs.
Although previous research has addressed some limitations of existing KT models and advanced deep learning-based KT, the feature extraction methods used in these models have tended to be relatively simplistic. As a result, the information in student practice records is used suboptimally. Table 2 summarizes the key descriptive features of the main KT methods in the field.

Based on current knowledge, students’ learning sequences appear to contain complex forgetting information. The probability that a student answers the next exercise correctly is influenced not only by their learning history but also by the time elapsed since previous exercises. Traditional and deep knowledge tracing models have rarely considered forgetting behavior comprehensively. Most models, such as DKT-forgetting, account only for forgetting factors in past exercise records and ignore the forgetting process after practice. The neural Hawkes process is a significant step toward addressing this issue: it extends the traditional multivariate Hawkes process by combining the time-decaying effects of past events with LSTM units [25]. Building on this concept, we combine traditional forgetting factors with the Hawkes process. Students’ knowledge states then change dynamically over time while past practice records are still taken into account, which aligns more closely with the forgetting curve. Collaborative filtering further improves the use of the available information by integrating individual and group data.

3. Proposed Method

3.1. Problem Definition

Given a student’s interaction sequence $S_{t} = \{k_{0}, k_{1}, \dots, k_{n}\}$, knowledge tracing aims to assess the student’s knowledge state and predict whether they can answer the next interaction $k_{n+1}$ correctly. In the learning sequence, $k_{i}$ is defined as a tuple $(q_{i}, t_{i}, a_{i})$, where $q_{i}$ is the question that the student attempts to answer at timestamp $t_{i}$, and the response $a_{i} \in \{0, 1\}$ indicates whether $q_{i}$ was answered correctly (1 for correct, 0 for wrong).

3.2. Model Architecture

The Synthetically Forgetting Behavior Knowledge Tracing model comprises three primary components. The first component extracts forgetting-related information from a student’s learning sequence and integrates it with the interaction sequence itself. The second component obtains the student’s knowledge state through a continuous-time long short-term memory network (CTLSTM), which captures how the knowledge state changes throughout the learning process. The third component predicts the student’s performance in the next time slice. It uses cosine similarity to measure how similar different students’ knowledge states are and draws on the information of similar neighbors. In this way, the model combines individual and group information to determine the student’s knowledge state and predict their performance.

The SFBKT model is an improvement over existing models such as DKT-forgetting [22] and HawkesKT [23] as it considers forgetting as a critical factor in student learning and uses continuous-time LSTM networks to capture changes in student knowledge states over time. Figure 2 demonstrates how the SFBKT model enhances the model structures of DKT-forgetting and HawkesKT.

DKT-forgetting integrates practice sequences and additional forgetting information in the data processing stage and uses LSTM networks for feature extraction to obtain the student’s knowledge status. However, DKT-forgetting assumes that the knowledge state remains unchanged until the next exercise, which contradicts the law of forgetting. SFBKT improves upon DKT-forgetting in two primary ways. First, the SFBKT model enhances the LSTM network to capture the change in the knowledge state over time. This improvement addresses the inadequacy of the DKT-forgetting model by incorporating the forgetting curve into the learning process, where the students’ mastery of exercises gradually declines over time [34]. Second, the SFBKT model leverages group information to improve prediction accuracy. Since students often learn together in groups rather than independently, combining the knowledge state of the group helps to reduce interference from incorrect information recorded in the dataset. This approach combines information from students with similar knowledge states with the student being predicted, resulting in a more precise assessment of the student’s knowledge state.

HawkesKT represents the first model to incorporate the Hawkes process for simulating temporal cross-effects in knowledge tracing, serving as an inspiration for our work. Nevertheless, HawkesKT employs an exponential function as a kernel function to approximate the forgetting curve, which only crudely simulates the average knowledge state of students and cannot be customized for individual students. In contrast, we integrate the CTLSTM network with the Hawkes process to model student knowledge, thereby facilitating the personalized tracing of forgetting patterns for each student. This approach yields more precise and dependable predictions of students’ knowledge states over time.

3.2.1. Extracting Forgotten Information in Student Learning Sequences

Given the demonstrated effectiveness of the forgetting-information extraction method proposed in DKT-forgetting, we adopt this method and improve upon it. Unlike DKT-forgetting, however, SFBKT feeds the extracted features into a CTLSTM network rather than a conventional RNN. We proceed as follows:

Extract information related to memory retention from learning sequences. Previous studies have confirmed the existence of temporal cross-effects, meaning that each previous interaction has a time-sensitive effect on mastery of the target skill. We extract three pieces of information from a student’s learning sequence: the time interval since the last attempt at the same skill (repetition interval), the time interval since the previous interaction (sequence interval), and the number of past attempts at the same skill.
Incorporate this information into the input space of the model. To model the knowledge acquisition process of students, we use a trainable embedding matrix $A$ to compute the embedding vector $v_{i}$ of the interaction vector $k_{i}$ at step $i$, instead of assigning arbitrary values. This process can be summarized as follows:

k_{i} = (q_{i}, a_{i})

(2)

v_{i} = k_{i} \times A

(3)
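Assuming $k_{i}$ is a one-hot encoding of the pair $(q_{i}, a_{i})$, multiplying it by $A$ in Formula (3) simply selects one row of $A$. The minimal sketch below illustrates Formulas (2) and (3) in plain Python. The index layout `q + a * num_questions` and the helper names are hypothetical, because the section does not specify how $(q_{i}, a_{i})$ is encoded.

```python
import random


def interaction_index(q: int, a: int, num_questions: int) -> int:
    """Hypothetical layout: index q for a wrong answer, q + num_questions for a correct one."""
    return q + a * num_questions


def one_hot(index: int, size: int) -> list:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def embed(k_onehot: list, A: list) -> list:
    """Formula (3): v = k x A, a row vector times a (2Q x d) matrix."""
    dim = len(A[0])
    return [sum(k_onehot[r] * A[r][c] for r in range(len(A))) for c in range(dim)]


if __name__ == "__main__":
    num_questions, dim = 4, 3
    rng = random.Random(0)
    A = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(2 * num_questions)]
    idx = interaction_index(q=2, a=1, num_questions=num_questions)
    v = embed(one_hot(idx, 2 * num_questions), A)
    # The product just selects row idx of A, which is why frameworks use a lookup table.
    print(all(abs(x - y) < 1e-12 for x, y in zip(v, A[idx])))  # True
```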

Each of the three features is expressed as a one-hot vector. Before entering the CTLSTM module, the embedding vector $v_{i}$ and the additional information vector $p_{i}$ are combined into $x_{i}$ and fed into the network. Specifically, $p_{i}$ is created by concatenating the three one-hot vectors and is transformed by a trainable matrix $C$ to match the dimension of $v_{i}$. A Hadamard product is then computed between $v_{i}$ and the transformed vector $C p_{i}$, and the result is concatenated with $p_{i}$ to form the combined feature vector $x_{i}$. This process can be summarized as follows:

x_{i} = \theta^{in}(v_{i}, p_{i}) = [v_{i} \odot C p_{i};\, p_{i}]

(4)
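The input module can be sketched end to end as follows. The sketch computes the three forgetting features per interaction, turns each into a one-hot vector, concatenates them into $p_{i}$, and applies Formula (4). The section does not say how continuous time gaps are discretized before one-hot encoding, so the log2 bucketing with a reserved "undefined" bucket is our assumption, as are all function names and sizes.

```python
import math
import random


def forgetting_features(seq):
    """seq: list of (skill, timestamp, answer) sorted by time.
    Returns (repetition_gap, sequence_gap, past_count) per interaction;
    gaps are None when undefined (first attempt / first interaction)."""
    last_seen, counts, prev_time, feats = {}, {}, None, []
    for skill, t, _ in seq:
        repeat_gap = t - last_seen[skill] if skill in last_seen else None
        seq_gap = t - prev_time if prev_time is not None else None
        feats.append((repeat_gap, seq_gap, counts.get(skill, 0)))
        last_seen[skill] = t
        counts[skill] = counts.get(skill, 0) + 1
        prev_time = t
    return feats


def bucket(value, num_buckets):
    """Assumed log2 binning; bucket 0 is reserved for 'undefined'."""
    if value is None:
        return 0
    return min(num_buckets - 1, 1 + int(math.log2(1 + value)))


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def combine(v, p, C):
    """Formula (4): x = [v ⊙ (C p); p]. C has len(v) rows and len(p) columns."""
    cp = [sum(C[r][c] * p[c] for c in range(len(p))) for r in range(len(C))]
    return [vi * ci for vi, ci in zip(v, cp)] + p


if __name__ == "__main__":
    seq = [(1, 0, 1), (2, 10, 0), (1, 30, 1)]  # (skill, timestamp in s, answer)
    B, d = 8, 4                                 # hypothetical bucket count and embedding size
    rng = random.Random(0)
    C = [[rng.uniform(-0.1, 0.1) for _ in range(3 * B)] for _ in range(d)]
    for rep, gap, count in forgetting_features(seq):
        p = one_hot(bucket(rep, B), B) + one_hot(bucket(gap, B), B) + one_hot(bucket(count, B), B)
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stands in for the embedding v_i
        print(len(combine(v, p, C)))  # d + 3B = 28
```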

3.2.2. Using CTLSTM to Obtain Student Knowledge Status

In the continuous-time LSTM, as in a standard LSTM, the knowledge state vector depends on the memory cell vector $c(t)$. The key difference is that, after each event, every memory cell $c$ decays exponentially at a rate $\delta$ toward a steady-state value $\bar{c}$ until the next event occurs. Formula (5) shows how the hidden state $h(t)$ is continually obtained from the memory cells $c(t)$ as they decay, where $o_{i+1}$ is the output gate computed after the $i$-th event. The hidden state $h(t)$ is analogous to $h_{i}$ in an LSTM language model [35]: it summarizes the knowledge-state changes caused by the past interactions $\{k_{0}, k_{1}, \dots, k_{i}\}$. The model then predicts, from the hidden state $h(t_{i+1})$, the probability of answering each question correctly. The resulting output $\hat{Y}(t_{i+1})$ is a vector whose length equals the number of questions, and each entry represents the predicted probability that the student will answer the corresponding question correctly. The prediction $\hat{y}_{i+1}$ for question $q_{i+1}$ is therefore the entry of $\hat{Y}(t_{i+1})$ corresponding to $q_{i+1}$. This process can be summarized as follows:

h(t) = o_{i+1} \odot \left(2\sigma(2c(t)) - 1\right) \quad \text{for } t \in (t_{i}, t_{i+1}]

(5)

\hat{Y}(t_{i+1}) = \sigma\left(W^{out} h(t_{i+1}) + b^{out}\right)

(6)
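Formulas (5) and (6) map directly to code. The elementwise factor $2\sigma(2c) - 1$ is identical to $\tanh(c)$, so Formula (5) is the usual LSTM output nonlinearity applied to a cell that keeps changing between events. The minimal plain-Python sketch below makes this concrete. Vector sizes, weights, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def hidden_state(o: list, c: list) -> list:
    """Formula (5): h(t) = o ⊙ (2σ(2c(t)) − 1); note 2σ(2x) − 1 = tanh(x)."""
    return [oj * (2.0 * sigmoid(2.0 * cj) - 1.0) for oj, cj in zip(o, c)]


def predict_all(h: list, W_out: list, b_out: list) -> list:
    """Formula (6): Ŷ = σ(W_out h + b_out), one probability per question."""
    return [sigmoid(sum(w * hj for w, hj in zip(row, h)) + b)
            for row, b in zip(W_out, b_out)]


if __name__ == "__main__":
    o = [0.9, 0.5]          # hypothetical output gate o_{i+1}
    c_t = [0.7, -1.2]       # hypothetical decayed cell c(t_{i+1})
    h = hidden_state(o, c_t)
    W_out = [[1.0, 0.0], [0.5, -0.5], [0.0, 2.0]]  # 3 questions, 2 hidden units
    Y = predict_all(h, W_out, [0.0, 0.1, -0.1])
    q_next = 1
    print(Y[q_next])  # predicted probability for question q_{i+1}
```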

In our model, $h(t)$ also reflects the memory decay caused by the time interval $t_{i} - t_{i-1}$ between the current event and the previous one. This interval $(t_{i-1}, t_{i}]$ ends when the next event $k_{i}$ occurs, stochastically, at some time $t_{i}$. At this point, the continuous-time LSTM reads $(k_{i}, t_{i})$ and, based on the current (decayed) hidden state $h(t_{i})$, updates the current (decayed) memory cells $c(t_{i})$ to new initial values $c_{i+1}$. When an interaction is input at time $t_{i}$, the update formulas are similar to those of a traditional LSTM. The overall process is given by Formulas (7)–(13).

i_{i + 1} \leftarrow σ (W_{i} x_{i} + U_{i} h (t_{i}) + d_{i})

(7)

f_{i + 1} \leftarrow σ (W_{f} x_{i} + U_{f} h (t_{i}) + d_{f})

(8)

z_{i + 1} \leftarrow 2 σ (W_{z} x_{i} + U_{z} h (t_{i}) + d_{z}) - 1

(9)

o_{i + 1} \leftarrow σ (W_{o} x_{i} + U_{o} h (t_{i}) + d_{o})

(10)

c_{i + 1} \leftarrow f_{i + 1} ⊙ c (t_{i}) + i_{i + 1} ⊙ z_{i + 1}

(11)

{\bar{c}}_{i + 1} \leftarrow {\bar{f}}_{i + 1} ⊙ {\bar{c}}_{i} + {\bar{i}}_{i + 1} ⊙ z_{i + 1}

(12)

δ_{i + 1} \leftarrow f (W_{d} x_{i} + U_{d} h (t_{i}) + d_{d})

(13)
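A single event update, Formulas (7)–(13), can be sketched in plain Python as below. Several details are our assumptions rather than the authors’ specification. The section does not define the function $f$ in Formula (13), so the sketch uses softplus (the choice made in the neural Hawkes process [25]) to keep decay rates positive. The gate labels, the parameter layout, and the small random initialization are illustrative only. The pre-activations for $\bar{i}$ and $\bar{f}$ use their own weights, as the text states.

```python
import math
import random

GATES = ("i", "f", "z", "o", "i_bar", "f_bar", "d")


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x):
    """log(1 + e^x), computed stably; assumed choice for f in Formula (13)."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)


def affine(W, U, d, x, h):
    """W x + U h + d for one gate (W: H x D, U: H x H, d: length H)."""
    return [sum(w * xj for w, xj in zip(w_row, x)) +
            sum(u * hj for u, hj in zip(u_row, h)) + dk
            for w_row, u_row, dk in zip(W, U, d)]


def init_params(hidden, input_dim, seed=0):
    """Hypothetical small random initialization, for illustration only."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(hidden, input_dim), mat(hidden, hidden), [0.0] * hidden)
            for g in GATES}


def ctlstm_update(params, x, h_t, c_t, c_bar):
    """One event update at t_i, Formulas (7)-(13).

    h_t, c_t: decayed hidden state h(t_i) and cell c(t_i); c_bar: previous target c̄_i.
    Returns c_{i+1}, c̄_{i+1}, δ_{i+1}, o_{i+1}.
    """
    pre = {g: affine(*params[g], x, h_t) for g in GATES}
    in_gate = [sigmoid(v) for v in pre["i"]]
    forget = [sigmoid(v) for v in pre["f"]]
    z = [2.0 * sigmoid(v) - 1.0 for v in pre["z"]]
    out_gate = [sigmoid(v) for v in pre["o"]]
    in_bar = [sigmoid(v) for v in pre["i_bar"]]
    forget_bar = [sigmoid(v) for v in pre["f_bar"]]
    delta = [softplus(v) for v in pre["d"]]
    c_next = [fg * c + ig * zz for fg, c, ig, zz in zip(forget, c_t, in_gate, z)]
    c_bar_next = [fg * cb + ig * zz for fg, cb, ig, zz in zip(forget_bar, c_bar, in_bar, z)]
    return c_next, c_bar_next, delta, out_gate


if __name__ == "__main__":
    params = init_params(hidden=4, input_dim=6)
    zeros = [0.0] * 4
    print(ctlstm_update(params, [1.0] * 6, zeros, zeros, zeros))
```

Between two events, the returned $c_{i+1}$, $\bar{c}_{i+1}$, and $\delta_{i+1}$ fully determine the cell trajectory, so no new input is needed to evaluate the knowledge state at an arbitrary later time.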

The vector $x_{i}$ is obtained from Formulas (2)–(4) and represents the new exercise information combined with the forgetting factors from the input module. The symbol $\sigma$ denotes the logistic sigmoid function. The vectors $i$, $f$, $o$, and $c$ are, respectively, the input gate, forget gate, output gate, and cell activation vectors, and $z$ is the candidate cell input. The upright-font subscripts i, f, z, o, and d are not variables but constant labels that distinguish the different $W$, $U$, and $d$ tensors. The gates $\bar{f}$ and $\bar{i}$ in Formula (12) are defined analogously to $f$ and $i$ but with different weights. The function $f$ in Formula (13) maps its input to positive decay rates; the neural Hawkes process [25] uses the softplus function for this purpose. The key difference from the traditional LSTM is that the update depends not only on the knowledge state established at time $t_{i-1}$ but also on the time elapsed since that last practice. Between events, the memory cell vector evolves according to Formula (14). After a new practice occurs at $t_{i}$, each element of the memory cell vector decays deterministically from $c_{i+1}$ toward the target $\bar{c}_{i+1}$ at its own rate, given by the corresponding element of $\delta_{i+1}$. In other words, on the interval $(t_{i}, t_{i+1}]$, $c(t)$ follows an exponential curve that begins at $c_{i+1}$ (in the sense that $\lim_{t \to t_{i}^{+}} c(t) = c_{i+1}$) and decays toward $\bar{c}_{i+1}$ (which it would approach as $t \to \infty$, if extrapolated).

c(t) \overset{\text{def}}{=} \bar{c}_{i+1} + \left(c_{i+1} - \bar{c}_{i+1}\right) \exp\left(-\delta_{i+1}(t - t_{i})\right) \quad \text{for } t \in (t_{i}, t_{i+1}]

(14)
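Formula (14) has a closed form that can be evaluated at any query time. This is what allows the knowledge state to change between exercises without new input. The minimal sketch below implements it directly; the function name and example values are hypothetical.

```python
import math


def decayed_cell(c_start, c_target, delta, elapsed):
    """Formula (14): c(t) = c̄_{i+1} + (c_{i+1} - c̄_{i+1}) * exp(-δ_{i+1} (t - t_i)).

    elapsed = t - t_i; each cell element decays at its own rate."""
    return [cb + (c0 - cb) * math.exp(-d * elapsed)
            for c0, cb, d in zip(c_start, c_target, delta)]


if __name__ == "__main__":
    c_next, c_bar, delta = [0.8, -0.4], [0.2, 0.0], [0.5, 2.0]  # hypothetical values
    for gap in [0.0, 1.0, 5.0, 50.0]:
        print(gap, [round(v, 4) for v in decayed_cell(c_next, c_bar, delta, gap)])
```

An element with a larger $\delta$ (here the second one) reaches its steady-state value much sooner. This is how the model can forget some aspects of a skill quickly and others slowly.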

The scenario above corresponds to a new exercise being answered correctly, which causes a sudden increase in the memory cell vector $c(t)$ followed by decay. Similarly, when the response is incorrect, $c(t)$ undergoes a sudden decrease followed by an increase; the formula is the same as for a correct response.

3.2.3. Predicting Results Based on Neighbor Information

The SFBKT network is used to compute the knowledge state vector of each student, yielding the knowledge state matrix $U$ (of size $m \times n$) for all students. Cosine similarity is used to measure the similarity between two students’ vectors $U_{i}$ and $U_{j}$. Assuming that $U_{i} = (a_{1}, a_{2}, \dots, a_{n})$ and $U_{j} = (b_{1}, b_{2}, \dots, b_{n})$, the cosine similarity between $U_{i}$ and $U_{j}$ is defined by Formula (15).

\cos(\theta) = \frac{U_{i} \cdot U_{j}}{\lVert U_{i} \rVert \, \lVert U_{j} \rVert} = \frac{\sum_{k=1}^{n} a_{k} b_{k}}{\sqrt{\sum_{k=1}^{n} a_{k}^{2}} \times \sqrt{\sum_{k=1}^{n} b_{k}^{2}}}

(15)
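As a small worked example with hypothetical two-dimensional knowledge state vectors, Formula (15) gives:

```latex
U_i = (0.8,\ 0.6), \qquad U_j = (0.6,\ 0.8)

\cos(\theta) = \frac{0.8 \times 0.6 + 0.6 \times 0.8}{\sqrt{0.8^{2} + 0.6^{2}} \times \sqrt{0.6^{2} + 0.8^{2}}} = \frac{0.96}{1 \times 1} = 0.96
```

Since 0.96 exceeds the 0.9 threshold used in the experiments, student $j$ would be selected as a neighbor of student $i$.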

The students are ranked by cosine similarity, and those whose similarity to the target student exceeds a threshold (0.9 in the experiments) are selected as similar users. For the target student $i$, the final knowledge state vector $f_{i}$ is predicted using Formula (16), where $U_{i}$ is the knowledge level vector of the target student and $\mathrm{average}(N_{i})$ is the average knowledge level vector of the similar students $N_{i}$. The parameter $\rho \in [0, 1]$ adjusts the relative weight in $f_{i}$ of the student’s individual knowledge level and the common knowledge level shared with similar students. As $\rho$ increases, the influence of the individual knowledge level also increases. Specifically, when $\rho = 0$, $f_{i}$ is the average knowledge level of the neighboring students, and the individual state of the target student is ignored. When $\rho = 1$, $f_{i}$ is solely the individual knowledge level vector output by the SFBKT model.

f_{i} = \rho \cdot U_{i} + (1 - \rho) \cdot \mathrm{average}(N_{i})

(16)
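The neighbor step of Formulas (15) and (16) can be sketched as follows. The section does not say whether the target student is excluded from its own neighbor set or what happens when no student passes the threshold. This sketch assumes the target is excluded and falls back to the individual state $U_{i}$ when there are no neighbors. Function names and defaults are hypothetical.

```python
import math


def cosine(u, v):
    """Formula (15); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def blend_with_neighbors(U, target, rho=0.5, threshold=0.9):
    """Formula (16): f_i = rho * U_i + (1 - rho) * average(N_i).

    U: list of per-student knowledge state vectors (rows of the m x n matrix)."""
    u_i = U[target]
    neighbors = [U[j] for j in range(len(U))
                 if j != target and cosine(u_i, U[j]) > threshold]
    if not neighbors:
        # Assumption: with no similar students, keep the individual state.
        return list(u_i)
    avg = [sum(col) / len(neighbors) for col in zip(*neighbors)]
    return [rho * a + (1 - rho) * b for a, b in zip(u_i, avg)]


if __name__ == "__main__":
    U = [[0.8, 0.6], [0.6, 0.8], [0.0, 1.0]]  # hypothetical knowledge states
    print(blend_with_neighbors(U, target=0, rho=0.5))  # student 1 is the only neighbor
```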

3.3. Optimization

All parameters of SFBKT are learned from the student practice sequences. The model is optimized with a cross-entropy loss function and the Adam optimizer. Specifically, $\hat{y}_{t}^{s}$ represents the model’s prediction of the performance of student $s$ at time $t$, while $y_{t}^{s}$ represents the student’s actual performance. The loss function is formulated as follows:

L = -\sum_{s} \sum_{t} \left( y_{t}^{s} \log \hat{y}_{t}^{s} + (1 - y_{t}^{s}) \log\left(1 - \hat{y}_{t}^{s}\right) \right)

(17)
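For reference, Formula (17) is the binary cross-entropy summed over students and time steps. The plain-Python sketch below computes it. The clipping constant `eps`, which guards against $\log 0$, is our addition. The gradient computation and the Adam update would in practice come from a deep learning framework and are not shown.

```python
import math


def kt_loss(predictions, targets, eps=1e-12):
    """Formula (17): summed binary cross-entropy over students s and steps t.

    predictions / targets: dict mapping a student id to a list of per-step values."""
    total = 0.0
    for s, preds in predictions.items():
        for y_hat, y in zip(preds, targets[s]):
            y_hat = min(max(y_hat, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total


if __name__ == "__main__":
    preds = {"s1": [0.9, 0.2, 0.6], "s2": [0.4]}  # hypothetical model outputs
    truth = {"s1": [1, 0, 1], "s2": [0]}
    print(kt_loss(preds, truth))
```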

4. Experiments

4.1. Experimental Datasets

To ensure a fair comparison with other knowledge tracing models, we evaluated our model on the same public datasets. Specifically, we used four datasets: ASSISTment09-10, ASSISTment12-13, ASSISTment17, and Slepemapy.cz. ASSISTments is an online tutoring system designed to provide mathematics instruction to students, and we selected three of its publicly available datasets from different time periods. For the Slepemapy.cz dataset, which comes from an online geography practice system, we used the ‘place_asked’ field as the skill ID for consistency.

Using these datasets requires handling students with insufficient practice data, because such records may not accurately reflect students’ actual learning progression and may be unsuitable for time series models [36]. Therefore, we removed students with fewer than five practice records. Additionally, we considered only the first 50 interactions of each user, as predictive performance matters most when there is little user history. Furthermore, the ASSISTment09-10 dataset lacks timestamps for individual interactions, so we assumed that users answered questions at a constant interval of 1 s. Table 3 presents the descriptive statistics of all four datasets after preprocessing.
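The preprocessing rules above can be sketched as a single function. The record layout `(user, skill, timestamp, answer)` is hypothetical. The sketch also assumes that records without timestamps are already in the order they were logged, since a constant 1 s gap only makes sense if that order is kept.

```python
from collections import defaultdict


def preprocess(records, min_len=5, max_len=50, default_gap=1.0):
    """records: iterable of (user, skill, timestamp or None, answer).

    Returns {user: [(skill, timestamp, answer), ...]} after filtering and truncation."""
    by_user = defaultdict(list)
    for user, skill, ts, answer in records:
        by_user[user].append((skill, ts, answer))
    sequences = {}
    for user, rows in by_user.items():
        if len(rows) < min_len:
            continue  # drop students with fewer than five practice records
        if any(ts is None for _, ts, _ in rows):
            # ASSISTment09-10 case: assign a constant 1 s gap, keeping logged order
            rows = [(skill, i * default_gap, a) for i, (skill, _, a) in enumerate(rows)]
        else:
            rows = sorted(rows, key=lambda r: r[1])  # chronological order
        sequences[user] = rows[:max_len]  # keep only the first 50 interactions
    return sequences


if __name__ == "__main__":
    demo = [("u1", 3, None, 1)] * 7 + [("u2", 1, 10.0, 0)] * 2
    print({u: len(s) for u, s in preprocess(demo).items()})  # {'u1': 7}
```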

4.2. Experimental Environment

Table 4 summarizes the key hardware and software configurations used in this study.

4.3. Results and Discussion

We conducted 5-fold cross-validation. In each fold, the dataset was divided into training, validation, and test sets accounting for 70%, 10%, and 20% of the data, respectively. The area under the ROC curve (AUC) and accuracy (ACC) were used as evaluation metrics. To balance prediction accuracy against computational cost, we set the hyperparameters listed in Table 5. With more computational resources, further hyperparameter tuning may allow SFBKT to achieve even more accurate predictions.
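A minimal sketch of the evaluation protocol described above: a 70/10/20 split and the two metrics. Splitting by student and varying the random seed per fold are assumptions, since this section does not say how the five folds are rotated. AUC is computed here as the probability that a randomly chosen correct response receives a higher predicted score than a randomly chosen incorrect one; with scikit-learn (listed in Table 4), `sklearn.metrics.roc_auc_score` and `sklearn.metrics.accuracy_score` would serve the same purpose.

```python
import random
from typing import List, Sequence, Tuple


def split_students(students: Sequence[str], seed: int = 0
                   ) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle students and split them 70% / 10% / 20% into train / validation / test."""
    ids = list(students)
    random.Random(seed).shuffle(ids)
    n_train = (7 * len(ids)) // 10
    n_val = len(ids) // 10
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def accuracy(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of responses whose thresholded prediction matches the label."""
    hits = sum(int((p >= threshold) == bool(y)) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """ROC AUC: chance a random positive outranks a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUC loop is quadratic and only meant to make the definition explicit; a rank-based implementation is preferable on full datasets.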

Furthermore, in the collaborative filtering component of our model, the parameter ρ ∈ [0, 1] controls the relative influence of the student's individual knowledge level and the common knowledge level of the student group on the prediction. The smaller ρ is, the closer the predicted exercise score vector is to the knowledge state of the group of students similar to the target student; the closer ρ is to 1, the closer the final vector is to the student's personal knowledge state vector. Figure 3 shows how the AUC of SFBKT varies with ρ on the four datasets. The results indicate that ρ has a significant impact on prediction performance: on all four datasets, values of ρ between 0.4 and 0.5 yielded the highest AUC, suggesting that both the individual state and the group state contribute substantially to accurate prediction.
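The role of ρ can be illustrated with a minimal sketch. The exact formulation from Section 3.2.3 is not reproduced in this part of the article, so the following is an assumption: neighbors are chosen by cosine similarity of knowledge-state vectors, the group state is the plain average of the top-k neighbors, and the final vector is the linear interpolation ρ · individual + (1 − ρ) · group. The names `blend_with_neighbors` and `k` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two knowledge-state vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def blend_with_neighbors(target: Sequence[float],
                         others: List[Sequence[float]],
                         rho: float,
                         k: int = 2) -> List[float]:
    """rho = 1 keeps only the student's own state; rho = 0 uses only the group state."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    # Rank the other students by similarity to the target and keep the top-k.
    neighbors = sorted(others, key=lambda v: cosine(target, v), reverse=True)[:k]
    if not neighbors:
        return list(target)  # no group information available
    # Group state: plain average of the neighbors' knowledge-state vectors.
    group = [sum(col) / len(neighbors) for col in zip(*neighbors)]
    return [rho * t + (1.0 - rho) * g for t, g in zip(target, group)]
```

Under this linear form, a value of ρ around 0.4 to 0.5 gives the individual and group states roughly equal weight, which matches the range reported in Figure 3.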

Baseline comparison. To evaluate the performance of SFBKT, we compared it with five classic models:

DKT. It was the first model to apply deep learning to knowledge tracing. Using a long short-term memory (LSTM) network, it can track changes in students' proficiency over long periods. It is a conventional technique that many later models have refined.
DKT + Forgetting. Although several models have explored forgetting behavior in knowledge tracing, DKT + Forgetting was the first to integrate forgetting-related information into a deep learning model.
KTM. It uses factorization machines to model interactions between features. The features it considers include the question ID, the skill ID, previous responses on different skills, and the temporal features used in DKT + Forgetting.
HawkesKT. It introduced a class of knowledge tracing methods that combine temporal point processes with knowledge tracing. It builds on the temporal cross-effects observed in the data, in which each historical interaction has a continuously changing effect on the target skill, offering a new perspective on modeling forgetting behavior.
AKT (reported as AKT-R in Table 6). It uses a novel attention mechanism that relates a student's upcoming responses to their previous responses. The attention weights are computed using exponential decay over a context-aware relative distance measure, together with the similarity between questions.
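To make the idea of temporal cross-effects behind HawkesKT concrete, here is a simplified sketch of a Hawkes-style intensity. It assumes a base rate per target skill plus, for each past interaction, an effect α that decays as exp(−β · Δt), with α and β indexed by (past skill, past correctness, target skill); the intensity is mapped to a probability with a sigmoid. The published HawkesKT learns these terms through factorized embeddings and may scale time differently, so the names `hawkes_probability`, `base`, `alpha`, and `beta` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# A past interaction: (skill_id, correct, time).
Event = Tuple[int, int, float]


def hawkes_probability(target_skill: int, t: float, history: List[Event],
                       base: Dict[int, float],
                       alpha: Dict[Tuple[int, int, int], float],
                       beta: Dict[Tuple[int, int, int], float]) -> float:
    """Probability of answering target_skill correctly at time t (simplified Hawkes form)."""
    intensity = base.get(target_skill, 0.0)
    for skill, correct, t_i in history:
        if t_i >= t:
            continue  # only events strictly in the past can influence the target
        key = (skill, correct, target_skill)
        # Each past interaction adds an effect (positive or negative, via alpha)
        # that decays exponentially with the elapsed time t - t_i.
        intensity += alpha.get(key, 0.0) * math.exp(-beta.get(key, 0.0) * (t - t_i))
    return 1.0 / (1.0 + math.exp(-intensity))  # sigmoid maps intensity to P(correct)
```

As the elapsed time grows, every past effect fades and the prediction falls back to the base rate, which is how this family of models represents forgetting.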

Table 6 compares the performance of SFBKT with the five baseline models. Because DKT significantly outperforms traditional KT methods, we only selected models proposed after DKT as baselines. Figure 4 plots the AUC of all models on the four datasets as bar charts for an intuitive comparison of their predictive performance.

Among the baselines, models that account for more forgetting factors tend to predict better. DKT and KTM do not consider forgetting, and they generally perform worse than the three methods that model forgetting behavior. By modeling the temporal cross-effects in the learning sequence, HawkesKT achieves the best baseline results on three of the four datasets, with AKT-R best on ASSISTment17. However, SFBKT outperforms all baselines in both ACC and AUC on every dataset. On ASSISTment09-10, SFBKT achieves an ACC of 0.7593 and an AUC of 0.7909, which are 1.21 and 2.86 percentage points higher, respectively, than the best results among the five baselines. On ASSISTment12-13, SFBKT achieves an ACC of 0.7544 and an AUC of 0.7842, which are 0.31 and 1.79 percentage points higher, respectively. On ASSISTment17, SFBKT achieves an ACC of 0.7404 and an AUC of 0.8095, which are 2.30 and 5.12 percentage points higher, respectively. On Slepemapy.cz, SFBKT achieves an ACC of 0.8087 and an AUC of 0.7736, which are 0.59 and 2.13 percentage points higher, respectively. These results indicate that SFBKT predicts whether students will answer correctly more accurately than the baseline methods.
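The improvements quoted above are absolute differences between SFBKT and the best baseline in each column of Table 6. The short script below recomputes them from the table values; the variable and function names are illustrative.

```python
DATASETS = ["ASSISTment09-10", "ASSISTment12-13", "ASSISTment17", "Slepemapy.cz"]

# (ACC, AUC) per dataset, copied from Table 6.
BASELINES = {
    "DKT":              [(0.7396, 0.7508), (0.7275, 0.7314), (0.6981, 0.7273), (0.7971, 0.7421)],
    "DKT + Forgetting": [(0.7402, 0.7537), (0.7388, 0.7457), (0.6993, 0.7302), (0.8009, 0.7498)],
    "KTM":              [(0.7327, 0.7415), (0.7439, 0.7532), (0.6935, 0.7237), (0.7967, 0.7415)],
    "HawkesKT":         [(0.7472, 0.7623), (0.7513, 0.7663), (0.7063, 0.7487), (0.8028, 0.7523)],
    "AKT-R":            [(0.7375, 0.7462), (0.7452, 0.7552), (0.7174, 0.7583), (0.7986, 0.7462)],
}
SFBKT = [(0.7593, 0.7909), (0.7544, 0.7842), (0.7404, 0.8095), (0.8087, 0.7736)]


def gains_over_best_baseline():
    """Return {dataset: (ACC gain, AUC gain)} in percentage points."""
    gains = {}
    for i, ds in enumerate(DATASETS):
        best_acc = max(rows[i][0] for rows in BASELINES.values())
        best_auc = max(rows[i][1] for rows in BASELINES.values())
        gains[ds] = (round(100 * (SFBKT[i][0] - best_acc), 2),
                     round(100 * (SFBKT[i][1] - best_auc), 2))
    return gains


if __name__ == "__main__":
    for ds, (acc_gain, auc_gain) in gains_over_best_baseline().items():
        print(f"{ds}: ACC +{acc_gain:.2f} pp, AUC +{auc_gain:.2f} pp")
```

Expressed relative to the baseline instead, the gains are larger; for example, the AUC gain on ASSISTment09-10 is about 3.75% of the best baseline's 0.7623.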

Ablation study. We conducted a series of ablation experiments to evaluate the contribution of each component of SFBKT to overall performance. Since the model consists of three modules, we removed one module at a time and observed the effect on prediction.

SFBKT-GF. The SFBKT-Group Forgetting variant examines how the processing of personal forgetting information affects prediction. Its input module no longer extracts students' past forgetting records, so the input to the CTLSTM network is reduced to the sequence of student exercises, while the network module and output module of SFBKT are retained. SFBKT-GF therefore reflects how much students' personal forgetting information contributes to prediction.
SFBKT-TF. To study how much the continuous-time long short-term memory network contributes to prediction performance, the SFBKT-Traditional Forgetting variant replaces it with a traditional LSTM network in the second module, and the result is compared with the original model.
SFBKT-PF. To investigate how group forgetting affects predictive accuracy, the SFBKT-Personal Forgetting variant removes the collaborative filtering component, which incorporates the group knowledge state. Specifically, SFBKT-PF first combines students' forgetting information with their exercise records as input, then extracts temporal features through the CTLSTM network to obtain the student's knowledge state, and predicts the result directly from this output.

Table 7 presents the prediction performance of SFBKT and its variants (SFBKT-GF, SFBKT-TF, and SFBKT-PF) on the four datasets: ASSISTment09-10, ASSISTment12-13, ASSISTment17, and Slepemapy.cz. On each dataset, all models used the same hyperparameters and all other modules kept identical configurations, so that each experiment differs only in the structure of the ablated module. For a more intuitive comparison, Figure 5 plots the AUC of SFBKT and its variants on the four datasets.

Among the variants, SFBKT-TF performed worst, which suggests that the CTLSTM network plays a crucial role in model performance. The other two variants achieved AUC values close to those of the original model, but the small differences between them are consistent across datasets. Specifically, SFBKT-GF outperformed SFBKT-PF on all four datasets, which may be attributed to two factors. First, although SFBKT-GF removes the explicit extraction of personal forgetting information, its CTLSTM network still captures and processes personal forgetting information from the exercise sequence, which reduces the impact of the removed module. Second, the retained collaborative filtering module fuses the knowledge states of similar students with that of the target student, which mitigates the influence of anomalous data in individual practice sequences and increases the weight of informative data when the knowledge state is formed. Together, these factors explain the relatively strong performance of SFBKT-GF.
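The ordering of the variants can be checked directly from Table 7. The script below computes, for each variant, how many percentage points of AUC it loses relative to the full SFBKT model on each dataset; the names are illustrative.

```python
DATASETS = ["ASSISTment09-10", "ASSISTment12-13", "ASSISTment17", "Slepemapy.cz"]

# AUC values copied from Table 7.
AUC = {
    "SFBKT":    [0.7909, 0.7842, 0.8095, 0.7736],
    "SFBKT-GF": [0.7883, 0.7822, 0.8057, 0.7721],
    "SFBKT-TF": [0.7592, 0.7544, 0.7431, 0.7512],
    "SFBKT-PF": [0.7870, 0.7810, 0.8046, 0.7688],
}


def auc_drops():
    """AUC drop of each variant versus the full model, in percentage points."""
    full = AUC["SFBKT"]
    return {variant: [round(100 * (f - a), 2) for f, a in zip(full, values)]
            for variant, values in AUC.items() if variant != "SFBKT"}


if __name__ == "__main__":
    for variant, drops in auc_drops().items():
        print(variant, dict(zip(DATASETS, drops)))
```

SFBKT-TF loses between about 2.2 and 6.6 points, while SFBKT-GF and SFBKT-PF each lose less than 0.5 points, with SFBKT-GF always losing less than SFBKT-PF.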

The ablation experiments allowed us to assess how much each module contributes to model performance. The most important module is the improved network module: when it was removed and a traditional LSTM network was used instead, the prediction accuracy of the model dropped markedly. For example, the AUC of SFBKT-TF on ASSISTment09-10 was 3.17 percentage points lower than that of the original model, and similar drops were observed on the other datasets, which further underscores the importance of this module. These findings suggest that integrating the theory of neurally self-modulating multivariate point processes (the neural Hawkes process) into knowledge tracing is feasible and can yield promising results. The other two modules also contributed to the model's performance.
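The difference between a traditional LSTM and a continuous-time LSTM can be illustrated with the decay rule of the neural Hawkes process: between events, each cell value drifts exponentially from its starting value toward a target value, c(t) = c̄ + (c − c̄) · exp(−δ · (t − t_i)), and the hidden state is h(t) = o ⊙ tanh(c(t)). A traditional LSTM keeps its state frozen between steps, so elapsed time alone cannot change it. This sketch follows the original neural Hawkes formulation; the modifications SFBKT makes to it are not reproduced here, and the function name and list-based vectors are illustrative.

```python
import math
from typing import List, Tuple


def ctlstm_state(c_start: List[float], c_target: List[float], decay: List[float],
                 output_gate: List[float], elapsed: float
                 ) -> Tuple[List[float], List[float]]:
    """Cell and hidden state `elapsed` time units after the last event.

    c(t) = c_target + (c_start - c_target) * exp(-decay * elapsed)
    h(t) = output_gate * tanh(c(t))
    Decay rates are assumed positive (the neural Hawkes process uses a softplus).
    """
    c = [cb + (c0 - cb) * math.exp(-d * elapsed)
         for c0, cb, d in zip(c_start, c_target, decay)]
    h = [o * math.tanh(ci) for o, ci in zip(output_gate, c)]
    return c, h
```

Right after an event the cell equals its starting value, and after a long gap it approaches the target value, which mirrors the steady-state asymptote shown in Figure 1.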

The experimental results above show that SFBKT outperforms previous knowledge tracing methods in predictive performance and that each of its modules contributes to this result. This suggests that SFBKT captures students' forgetting behavior in the learning sequence more fully than previous models. In practice, the model could benefit students by reducing unnecessary exercises and supporting more efficient learning, and could help teachers work more efficiently and personalize instruction for each student. Overall, SFBKT has the potential to help optimize learning outcomes, improve the quality of education, and contribute to the advancement of intelligent education.

5. Conclusions

To address the limitations of existing knowledge tracing models and make fuller use of the forgetting information in student learning sequences, we propose a synthetically forgetting behavior knowledge tracing model (SFBKT). SFBKT comprises three modules that process forgetting information to better analyze students' forgetting behavior in the learning sequence. First, SFBKT extracts learning characteristics related to forgetting from the student's past learning records. Second, these combined features are fed into the CTLSTM network. Finally, collaborative filtering combines the student's learning status with that of similar students to predict whether the student will answer the next question correctly. We verified the prediction performance of SFBKT on four public datasets, and the experimental results show that it outperforms five classic knowledge tracing models. SFBKT therefore makes effective use of the forgetting information in student learning sequences and improves prediction accuracy. In addition, we conducted multiple experiments comparing the predictive performance of SFBKT with its variants to identify the role of each module.

In future work, we plan to use the findings of this study to investigate the forgetting patterns of individual students, with the ultimate goal of enabling more personalized and targeted instruction. We also intend to extend SFBKT with additional side information, such as skill dependencies, and to study how group size, student age, and gender affect the model's performance, with the aim of further improving prediction accuracy.

Author Contributions

Funding acquisition, W.L.; resources, W.L.; supervision, W.L.; writing—original draft, Q.S.; writing—review and editing, Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Hebei Province (F2019201451).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The four public datasets ASSISTment09-10, ASSISTment12-13, ASSISTment17, and Slepemapy.cz are available at the following four links, respectively: https://sites.google.com/site/assistmentsdata/home/2009-2010-assistment-data (accessed on 3 October 2022), https://sites.google.com/site/assistmentsdata/datasets/2012-13-school-data-with-affect (accessed on 3 October 2022), https://sites.google.com/view/assistmentsdatamining/dataset (accessed on 3 October 2022), https://www.fi.muni.cz/adaptivelearning/?a=data (accessed on 3 October 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yassine, S.; Kadry, S.; Sicilia, M.A. Measuring learning outcomes effectively in smart learning environments. In Proceedings of the 2016 Smart Solutions for Future Cities, Kuwait City, Kuwait, 7–9 February 2016; pp. 1–5.
2. Abdelrahman, G.; Wang, Q.; Nunes, B. Knowledge tracing: A survey. ACM Comput. Surv. 2023, 55, 1–37.
3. Zhu, T.Y.; Huang, Z.; Chen, E.; Liu, Q.; Wu, R.; Wu, L.; Hu, G. Cognitive diagnosis based personalized question recommendation. Chin. J. Comput. 2017, 40, 176–191.
4. Corbett, A.T.; Anderson, J.R. Knowledge tracing: Modeling the acquisition of procedural knowledge. User Model. User-Adapt. Interact. 1994, 4, 253–278.
5. David, Y.B.; Segal, A.; Gal, Y. Sequencing educational content in classrooms using Bayesian knowledge tracing. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, Edinburgh, UK, 25–29 April 2016; pp. 354–363.
6. Adamopoulos, P. What makes a great MOOC? An interdisciplinary analysis of student retention in online courses. In Proceedings of the Thirty Fourth International Conference on Information Systems, Milan, Italy, 15–18 December 2013.
7. Anderson, J.R.; Boyle, C.F.; Corbett, A.T.L.; Matthew, W. Cognitive Modelling and Intelligent Tutoring; ERIC: Washington, DC, USA, 1986.
8. Baker, R.S.J.D.; Corbett, A.T.; Aleven, V. More Accurate Student Modeling through Contextual Estimation of Slip and Guess Probabilities in Bayesian Knowledge Tracing. In Intelligent Tutoring Systems: 9th International Conference, ITS 2008, Montreal, Canada, 23–27 June 2008 Proceedings 9; Springer: Berlin/Heidelberg, Germany, 2008; pp. 406–415. Available online: https://link.springer.com/chapter/10.1007/978-3-540-69132-7_44 (accessed on 3 May 2022).
9. Villano, M. Probabilistic student models: Bayesian belief networks and knowledge space theory. In Intelligent Tutoring Systems: Second International Conference, ITS’92 Montréal, Canada, 10–12 June 1992 Proceedings 2; Springer: Berlin/Heidelberg, Germany, 1992; pp. 491–498.
10. Cen, H.; Koedinger, K.; Junker, B. Learning factors analysis—A general method for cognitive model evaluation and improvement. In Intelligent Tutoring Systems: 8th International Conference, ITS 2006, Jhongli, Taiwan, 26–30 June 2006. Proceedings 8; Springer: Berlin/Heidelberg, Germany, 2006; pp. 164–175.
11. Cen, H.; Koedinger, K.; Junker, B. Comparing two IRT models for conjunctive skills. In Intelligent Tutoring Systems: 9th International Conference, ITS 2008, Montreal, Canada, 23–27 June 2008 Proceedings 9; Springer: Berlin/Heidelberg, Germany, 2008; pp. 796–798.
12. Pavlik, P.I., Jr.; Cen, H.; Koedinger, K.R. Performance Factors Analysis—A New Alternative to Knowledge Tracing. In Proceedings of the 14th International Conference on Artificial Intelligence in Education, Brighton, UK, 6–10 July 2009.
13. Murray, R.C.; Ritter, S.; Nixon, T.; Schwiebert, R.; Hausmann, R.G.; Towle, B.; Fancsali, S.E.; Vuong, A. Revealing the learning in learning curves. In Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, 9–13 July 2013. Proceedings 16; Springer: Berlin/Heidelberg, Germany, 2013; pp. 473–482.
14. Pavlik, P.I., Jr.; Anderson, J.R. Practice and forgetting effects on vocabulary memory: An activation-based model of the spacing effect. Cogn. Sci. 2005, 29, 559–586.
15. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
16. Piech, C.; Bassen, J.; Huang, J.; Ganguli, S.; Sahami, M.; Guibas, L.J.; Sohl-Dickstein, J. Deep knowledge tracing. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015.
17. Nakagawa, H.; Iwasawa, Y.; Matsuo, Y. Graph-based knowledge tracing: Modeling student proficiency using graph neural network. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, New York, NY, USA, 14–17 October 2019; pp. 156–163.
18. Pandey, S.; Karypis, G. A self-attentive model for knowledge tracing. arXiv 2019, arXiv:1907.06837.
19. Shen, S.; Liu, Q.; Chen, E.; Wu, H.; Huang, Z.; Zhao, W.; Su, Y.; Ma, H.; Wang, S. Convolutional knowledge tracing: Modeling individualization in student learning process. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 1857–1860.
20. Markovitch, S.; Scott, P.D. The role of forgetting in learning. In Machine Learning Proceedings 1988; Morgan Kaufmann: Burlington, MA, USA, 1988; pp. 459–465.
21. Chen, Y.; Liu, Q.; Huang, Z.; Wu, L.; Chen, E.; Wu, R.; Su, Y.; Hu, G. Tracking knowledge proficiency of students with educational priors. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 989–998.
22. Nagatani, K.; Zhang, Q.; Sato, M.; Chen, Y.Y.; Chen, F.; Ohkuma, T. Augmenting knowledge tracing by considering forgetting behavior. In Proceedings of the WWW ’19: The Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3101–3107.
23. Wang, C.; Ma, W.; Zhang, M.; Lv, C.; Wan, F.; Lin, H.; Tang, T.; Liu, Y.; Ma, S. Temporal cross-effects in knowledge tracing. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual Event, 8–12 March 2021; pp. 517–525.
24. Ghosh, A.; Heffernan, N.; Lan, A.S. Context-aware attentive knowledge tracing. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 2330–2339.
25. Mei, H.; Eisner, J.M. The neural Hawkes process: A neurally self-modulating multivariate point process. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
26. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001; pp. 285–295.
27. Embretson, S.E.; Reise, S.P. Item Response Theory; Psychology Press: London, UK, 2013.
28. Xiong, X.; Zhao, S.; Van Inwegen, E.G.; Beck, J.E. Going deeper with deep knowledge tracing. In Proceedings of the International Educational Data Mining Society, Raleigh, NC, USA, 29 June–2 July 2016.
29. Yeung, C.K.; Yeung, D.Y. Addressing two problems in deep knowledge tracing via prediction-consistent regularization. In Proceedings of the Fifth Annual ACM Conference on Learning at Scale, London, UK, 26–28 June 2018; pp. 1–10.
30. Miller, A.; Fisch, A.; Dodge, J.; Karimi, A.H.; Bordes, A.; Weston, J. Key-value memory networks for directly reading documents. arXiv 2016, arXiv:1606.03126.
31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30, Long Beach, CA, USA, 4–9 December 2017.
32. Choi, Y.; Lee, Y.; Cho, J.; Baek, J.; Kim, B.; Cha, Y.; Shin, D.; Bae, C.; Heo, J. Towards an appropriate query, key, and value computation for knowledge tracing. In Proceedings of the Seventh ACM Conference on Learning@Scale, Virtual Event, 12–14 August 2020; pp. 341–344.
33. Shin, D.; Shim, Y.; Yu, H.; Lee, S.; Kim, B.; Choi, Y. SAINT+: Integrating temporal features for EdNet correctness prediction. In Proceedings of the LAK21: 11th International Learning Analytics and Knowledge Conference, Irvine, CA, USA, 12–16 April 2021; pp. 490–496.
34. Murre, J.M.J.; Dros, J. Replication and analysis of Ebbinghaus’ forgetting curve. PLoS ONE 2015, 10, e0120644.
35. Mikolov, T.; Karafiát, M.; Burget, L.; Cernocký, J.; Khudanpur, S. Recurrent neural network based language model. In Proceedings of the Eleventh Annual Conference of the International Speech Communication Association, Chiba, Japan, 26–30 September 2010; Volume 2, pp. 1045–1048.
36. Zhang, L.; Xiong, X.; Zhao, S.; Botelho, A.; Heffernan, N.T. Incorporating rich features into deep knowledge tracing. In Proceedings of the Fourth (2017) ACM Conference on Learning@Scale, Cambridge, MA, USA, 20–21 April 2017; pp. 169–172.

Figure 1. Illustration of how the different interactions affect changes in the target knowledge state. There is an instantaneous change (orange line) at the moment the interaction occurs, and the dashed line indicates the steady-state asymptote to which they will eventually approach.

Figure 2. Architecture diagram of SFBKT.

Figure 3. The relationship between AUC and ρ in the four datasets.

Figure 4. AUC results of all models for the four datasets.

Figure 5. AUC results of SFBKT and its variant models on four datasets.

Table 1. An overview of knowledge tracing models.

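Model	Method	Deep Learning
BKT	Bayesian Knowledge Tracing	×
IRT	Factor Analysis Models	×
AFM	Factor Analysis Models	×
PFA	Factor Analysis Models	×
KTM	Factor Analysis Models	×
DKT	Sequence Modeling KT Models	√
EERNN	Text-Aware KT Models	√
EKT	Text-Aware KT Models	√
SAKT	Attentive KT Models	√
AKT	Attentive KT Models	√
SSAKT	Attentive KT Models	√
RKT	Attentive KT Models	√
GKT	Graph-Based KT Models	√
GIKT	Graph-Based KT Models	√
SKT	Graph-Based KT Models	√
DKT + Forgetting	Forgetting-Aware KT Models	√
KPT	Forgetting-Aware KT Models	√
HawkesKT	Forgetting-Aware KT Models	√
DGMN	Forgetting-Aware KT Models	√
DKVMN	Memory-Augmented KT Models	√
SKVMN	Memory-Augmented KT Models	√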

Table 2. Comparison of previous main research in KT. The “Learning Model” column gives the main machine learning or deep learning technique used, the “Knowledge State” column gives how the student's knowledge level is stored, and the “Forgetting” column indicates whether the model takes forgetting information into account.

KT Model	Year	Learning Model	Knowledge State	Forgetting
IRT	1993	LR	Real-valued vector	×
BKT	1994	HMM	Binary scalar	×
AFM	2008	LR	Real-valued vector	×
DKT	2015	RNN/LSTM	Vector	×
DKVMN	2017	MVNN	matrix	×
KTM	2019	FM	Real-valued vector	×
DKT + Forgetting	2019	RNN/LSTM	Vector	√
GKT	2019	GNN	Vector	×
AKT	2020	FFN + MSA	Vector	√
HawkesKT	2021	FM	Real-valued vector	√

Table 3. Descriptive statistics of the datasets.

Dataset	Students	Skills	Records
ASSISTment09-10	3700	111	110,200
ASSISTment12-13	25,300	245	879,500
ASSISTment17	1700	102	9,786,500
Slepemapy.cz	81,700	1473	2,877,500

Table 4. Description of the experimental environment.

Component	Configuration
Operating system	Windows 10 64-bit
GPU	RTX 3070
CPU	AMD Ryzen 7 5800H
Memory	16 GB
Programming language	Python 3.6
Deep learning framework	PyTorch 1.7.1
Python libraries	scikit-learn, NumPy, pandas

Table 5. Hyperparameters of the SFBKT.

Hyperparameters	Value
Learning rate	0.001
Batch size	1
Hidden size	64
Embed size	64
Epochs	100
Early stop	5
K-Fold	5

Table 6. AUC and ACC of all methods on the four datasets (higher is better), the best results are in bold face, and the best baseline is underlined.

Method	ASSISTment09-10 ACC	ASSISTment09-10 AUC	ASSISTment12-13 ACC	ASSISTment12-13 AUC	ASSISTment17 ACC	ASSISTment17 AUC	Slepemapy.cz ACC	Slepemapy.cz AUC
DKT	0.7396	0.7508	0.7275	0.7314	0.6981	0.7273	0.7971	0.7421
DKT + Forgetting	0.7402	0.7537	0.7388	0.7457	0.6993	0.7302	0.8009	0.7498
KTM	0.7327	0.7415	0.7439	0.7532	0.6935	0.7237	0.7967	0.7415
HawkesKT	0.7472	0.7623	0.7513	0.7663	0.7063	0.7487	0.8028	0.7523
AKT-R	0.7375	0.7462	0.7452	0.7552	0.7174	0.7583	0.7986	0.7462
SFBKT	0.7593	0.7909	0.7544	0.7842	0.7404	0.8095	0.8087	0.7736

Table 7. Comparison of the experimental results among the SFBKT and variants of SFBKT.

Method	ASSISTment09-10 ACC	ASSISTment09-10 AUC	ASSISTment12-13 ACC	ASSISTment12-13 AUC	ASSISTment17 ACC	ASSISTment17 AUC	Slepemapy.cz ACC	Slepemapy.cz AUC
SFBKT-GF	0.7581	0.7883	0.7535	0.7822	0.7361	0.8057	0.8064	0.7721
SFBKT-TF	0.7423	0.7592	0.7448	0.7544	0.7036	0.7431	0.8001	0.7512
SFBKT-PF	0.7566	0.7870	0.7518	0.7810	0.7345	0.8046	0.8026	0.7688
SFBKT	0.7593	0.7909	0.7544	0.7842	0.7404	0.8095	0.8087	0.7736

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

