CourseKG: An Educational Knowledge Graph Based on Course Information for Precision Teaching

Abstract: With the rapid development of advanced technologies, such as artificial intelligence and deep learning


Introduction
With the development of artificial intelligence and the growing interest in the education field, smart education has attracted significant attention in recent years [1]. However, increasing digitization in the field of education has brought educators and institutions a multitude of new challenges [2,3]. Precision teaching driven by artificial intelligence, big data analysis, and other information technologies has become a development trend and a hot research topic in the educational information field. Precision teaching strives to optimize the efficiency of the learning process by customizing the curriculum for each learner [4,5].
The utilization of knowledge graphs (KGs) involves the reconstruction and reorganization of knowledge points derived from extensive course materials. A KG extracts and represents the semantics of knowledge points through its structure of nodes and edges [6]. In addition, a KG emphasizes knowledge organization, determining the optimal sequence for delivering knowledge throughout the curriculum. Using a KG, vast and fragmented knowledge can be seamlessly integrated to help learners intuitively form connections within the knowledge system. This facilitates precision teaching, including the establishment of personalized learning paths.
Although KGs have demonstrated their efficiency in various domains of artificial intelligence, their application to the field of education has been relatively unexplored. Compared to general KGs, the construction of educational KGs faces many challenges. First, the expected educational concept entity is more abstract than a real-world entity. Second, the expected relationships are more cognitive and implicit and, unlike those in general KGs, are difficult to deduce from the literal meaning of the text. In addition, domain adaptation in the education field has been hindered by the lack of relevant data, making it difficult to assess students' individual cognitive abilities. Further, the automation level in the KG construction process is relatively low and often relies on expert knowledge, which can lead to cognitive discrepancies among experts on the same topic, thus affecting scientific rigor and consistency.
Considering the above-mentioned challenges, this study proposes an educational KG based on course information, named CourseKG, for precision teaching. CourseKG employs advanced techniques, including deep learning and big data, for enhanced intelligence. It visually represents the points that should be taught in a lesson and their sequential relationships. Moreover, CourseKG can precisely assess a student's current learning level, determine their zone of proximal development, and identify their sweet spot.
With the KG acquired, CourseKG can attain precise teaching objectives by adhering to the principle of formulating teaching based on specific learning requirements. It enables the implementation of teaching tailored to students' aptitudes. The contributions of this study can be outlined as follows:

1. A precise teaching and education KG based on course information, named CourseKG, which uses heterogeneous course data, including structured, semi-structured, and unstructured teaching and learning assessment data, is proposed to extract teaching concepts and identify important educational relationships, effectively serving smart education.

2. An education entity recognition framework called BERT-BiGRU-MHSA-CRF, based on a pre-trained BERT model, is proposed. This framework uses the BERT model for word embedding in educational text and combines the BiGRU and multihead self-attention (MHSA) mechanisms to extract global contextual relevance from multiple perspectives and levels. In addition, a CRF layer that considers the dependency relationships and constraints between character-level labels is used for decoding.

3. A relationship extraction method based on the BERT model is developed. This method combines sentence features and educational entities using the BERT model and estimates the similarity between knowledge pairs using cosine similarity.

4. Experiments conducted with real-world teaching data from C programming courses on two leading online learning platforms in China validate the scalability and feasibility of the proposed BERT-BiGRU-MHSA-CRF method. The experimental results demonstrate that the proposed method surpasses state-of-the-art approaches in both relation extraction and educational entity recognition.
The remainder of this paper is structured as follows: Section 2 provides an overview of the related work concerning the topic addressed in this study. Section 3 details the proposed methods and their implementation. Section 4 presents the experimental results and conducts an analysis. Finally, Section 5 summarizes the main conclusions drawn from the study and proposes directions for future research.

General KG
In recent decades, KGs have become a hot research area since they can provide enterprises with structured data and factual knowledge, making their products more intelligent [7]. In the development of KGs, the Semantic Web proposed by Berners-Lee [8] and Linked Data from Bizer [9] have made great contributions.
Since the introduction of KGs, much research has focused on information retrieval systems, recommendation systems, and question-answering (QA) systems [17]. QA systems based on KGs can be broadly categorized into four types: semantic-parsing-based systems [18], deep-learning-based systems [19], embedding-based systems [20], and information-retrieval-based systems [21]. It is worth noting that combining deep learning methods with traditional techniques can enhance the performance of KG-based QA systems [17]. KG-based recommendation systems typically fall into two categories: path-based methods [22] and embedding-based methods [23]. In recent years, information retrieval has garnered significant attention from various companies, such as Google and Facebook.
In addition to all the above-mentioned areas, KGs also play a key role in many other fields, including medicine, network security, finance, news, social networks, and education [17,24-29]. This work focuses on educational KGs and their implementation in practice.

Educational KGs
In addition to the conventional fields of question answering (QA), recommendation systems (RS), and information retrieval (IR), KGs have also achieved excellent results in the education field. Therefore, applying KGs and knowledge bases to the field of education has been very popular in recent research. Penghe proposed a system for the automatic construction of educational KGs [30]. In [31], the strength of semantic linking between knowledge points was assessed, and a Semantic Web-based learner preference model was proposed. In [32], the authors utilized machine learning to design a KG of courses, facilitating the study of MOOC courses. In [33], KG technology was used to present the relationships between content items in a visual manner. Khan Academy [30] used an educational KG for concept visualization and learning resource recommendation. In [34], a system for educational concept extraction and identification of implicit relations, focused on K-12 educational subjects, was developed. A number of studies have suggested learning paths for learners based on a KG [35,36]. The main goal of [29] was to construct a knowledge map for the purpose of personalized university education. KGs can be applied to micro-learning [37]. Some researchers have constructed a knowledge base to assist in the decision-making process for the adaptation of micro open educational resources. KGs have been used in the teaching of courses, such as databases, as well as in other disciplines. However, the accuracy of educational KGs requires further improvement. In view of that, this work studies course information for achieving precision teaching.
Entity recognition is an essential step in constructing a KG, and extracting entities from unstructured or structured data based on predefined labels is the main task in this step [38]. In the beginning, entity recognition was mainly performed using methods based on rules and dictionaries and methods based on neural networks [39]. However, each of these two method types faces different problems. For instance, the former has low accuracy and poor portability, and the latter requires numerous manually labeled samples.
Relation extraction is another key part of the KG construction process; it is mainly performed to discern semantic connections between entities. Currently, relation extraction methods mainly consist of unsupervised or semi-supervised, supervised, and template-based approaches [40]. Notably, supervised learning methods demonstrate superior performance. In addition, neural networks have been extensively employed to improve the performance of relation extraction tasks.

Current Problems
At present, the practical application of KGs to the field of education is still in its early stages, and there are several problems related to domain adaptation, knowledge granularity, and construction methods, which can be summarized as follows:

1. The knowledge category is coarse-grained: In typical KGs, nodes primarily represent real entities, leading to an uncertain granularity structure. Consequently, it becomes challenging to directly represent knowledge elements within a course.

2. Domain adaptation requires further improvement: The lack of an appropriate corpus in education poses a challenge to effectively simulate and test individual students' cognitive abilities at a granular level.

3. The degree of automation in constructing KGs is relatively limited: Numerous KGs heavily depend on expert knowledge. When various experts address the same knowledge point, cognitive variations are introduced, creating challenges in maintaining scientific rigor and consistency.

Proposed Method
In this study, a KG, which represents a type of semantic network, is used to extract and depict relationships among knowledge points within the course. This section describes in detail the construction flow of the proposed CourseKG, as shown in Figure 1.
We initially collected digital teaching materials from online learning platforms, which encompassed electronic textbooks, syllabuses, courseware, and tests. Subsequently, the pre-processing stage involved cleaning and standardization to obtain valid sentences and the Chinese characters they contain. Entity recognition was then carried out through techniques such as BERT and BiGRU, followed by feature extraction and measurement of correlation to extract relationships. Finally, the data were input into CourseKG for the purpose of precision teaching. The process of converting raw data from an online learning platform to the high-quality CourseKG mainly includes three procedures: pre-processing, entity recognition, and relationship extraction. The three procedures are described in the following subsections.

Definition and Description
The definition of CourseKG is given by Definition 1.
CourseKG aims to structure and integrate data based on the educational ontology type. An ontology typically includes a subclass-based taxonomic hierarchy, incorporating diverse classifications of concepts. Model comprises three elements: V represents a collection of extracted knowledge points, P signifies their properties or features, and R denotes relationships between entities. The symbolic or numeric attributes of each point are depicted by nested feature vectors for establishing a standardized definition. The three elements are defined as follows: the entity name is defined based on the educational ontology;

[Table 1. Relationship categories and their definitions, e.g., "A is parent and B is its child"; cause_and_effect: "A triggers an event or condition, while B is the outcome arising from A".]

Further, Data denotes the foundation of CourseKG, sourced from three text types, as shown in Table 2. Based on the above-presented data, Model of CourseKG has a hierarchical structure. One example of CourseKG is given in the following:

[[type.skills, type.abilities], [objectives.apply, [Brother : {local_variable, global_variable}] >
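To make the roles of V, P, and R concrete, the following minimal Python sketch stores knowledge points, their nested features, and typed relationships. The class name `CourseKGModel`, its methods, and the feature keys are hypothetical and chosen only for illustration; the relationship names are those listed in this paper, and the sample entry mirrors the `Brother : {local_variable, global_variable}` example above.

```python
from dataclasses import dataclass, field

# Relationship categories named in the paper.
RELATION_TYPES = {
    "inclusion", "precursor", "identity", "brother",
    "correlation", "inheritance", "cause_and_effect",
}


@dataclass
class CourseKGModel:
    """Hypothetical container for the Model part of CourseKG (V, P, R)."""
    V: set = field(default_factory=set)    # knowledge points
    P: dict = field(default_factory=dict)  # knowledge point -> feature dict
    R: set = field(default_factory=set)    # (head, relation, tail) triples

    def add_point(self, name, **features):
        # Register a knowledge point and merge its features.
        self.V.add(name)
        self.P.setdefault(name, {}).update(features)

    def add_relation(self, head, relation, tail):
        # Only known relation types between registered points are accepted.
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        if head not in self.V or tail not in self.V:
            raise KeyError("both endpoints must be registered knowledge points")
        self.R.add((head, relation, tail))


kg = CourseKGModel()
kg.add_point("local_variable", type="skills", objective="apply")
kg.add_point("global_variable", type="skills", objective="apply")
kg.add_relation("local_variable", "brother", "global_variable")
```

Storing relationships as triples keeps the structure close to the `< E head , R, E tail >` form used later for relationship extraction.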

Data Pre-Processing and Knowledge Acquisition
Domain information embedded in the vocabulary can contribute to entity recognition and improve its performance. Given the scarcity of labeled data, directly applying pre-trained models like BERT [41] to raw data for entity recognition in vertical domains may not be effective. To address this issue, this study first pre-processes the raw data and performs knowledge acquisition. This can enhance the accuracy of subsequent entity recognition and contribute to more comprehensive knowledge mining and KG construction. In addition, the entity corpus is constructed based on N-LTP [42], a language technology platform, as shown in Figure 2.

Entity Recognition
This study proposes a framework called BERT-BiGRU-MHSA-CRF designed to enhance the accuracy of educational entity recognition within CourseKG. The structure of BERT-BiGRU-MHSA-CRF, as depicted in Figure 3, comprises four components: the BERT module, the BiGRU module, the MHSA module, and the CRF module. The BERT module is employed for word embedding of educational text. In this module, the BERT model converts Chinese characters into word vectors with textual information, achieving efficient embedding. The resulting embedded word vectors are input to the BiGRU module, where feature extraction is conducted on the word vectors. The MHSA module utilizes the MHSA mechanism to extract global feature correlation information from various perspectives and levels. Finally, the CRF module fully considers inter-character tag dependencies and constraints, decoding them with CRF to ensure the rationality of the final predicted tags.

BERT Module
The BERT model is an excellent pre-training model for representing text word vectors. This model includes a multilayer bidirectional Transformer encoder that considers words before and after a particular word to determine its meaning in context. The BERT model shares a similar structure with the GPT and ELMo models. The Chinese BERT model is commonly acquired through unsupervised task training on extensive general-purpose corpora. Compared to the original BERT model, it acquires an enhanced feature representation of words and is applicable directly to downstream tasks.
In the field of education, there exists a widespread scarcity of corpora, and the available datasets fall short in supplying adequate data for BERT pre-training. Therefore, this study refines the BERT model through fine-tuning to enhance recognition accuracy. In addition, word embedding operations are performed on the training data to more effectively capture the content information embedded in educational texts. In the pre-training method used in this study, only the words related to education are masked. Finally, the word embedding vector is fed into the BERT model for feature extraction, yielding a sequence vector endowed with rich semantic features.

BiGRU Module
The gated recurrent unit (GRU) [43] is a variant of the long short-term memory neural network (LSTM). Traditional recurrent neural network (RNN) training frequently faces challenges such as gradient vanishing and explosion. In contrast, LSTM addresses these issues by incorporating input gates, output gates, and forget gates. Although the LSTM partially addresses the gradient vanishing problem, its computational process is time-intensive. The GRU structure streamlines the LSTM by merging the input gate and forget gate into an update gate. Consequently, the GRU preserves the strengths of the LSTM while simplifying its architecture. In the context of entity recognition in educational records, the GRU proves effective in extracting features.
Following the operating principle of the GRU unit, the GRU module excels at discarding redundant information, while its straightforward model structure reduces computational complexity. However, the naive GRU alone fails to fully leverage the contextual information present in educational records. Therefore, a backward GRU is introduced in this paper to capture backward semantics. By incorporating both forward and backward GRU neural networks, the model, named the BiGRU model, aims to extract key features of named entities in educational records. The BiGRU model employs two sets of GRUs to extract features from sentences, ensuring that each token in a sequence is influenced by both its past and future contexts. This dual-directional approach provides access to both backward and forward information at each step.
The word vectors generated by BERT serve as the input to the BiGRU at each time step. At step $t$, a forward hidden layer processes the sequence from step 1 to step $t$, producing a forward hidden sequence $\overrightarrow{h_t}$. Concurrently, a backward hidden layer handles the same sequence in reverse order, resulting in a backward hidden sequence $\overleftarrow{h_t}$. Finally, the preceding and subsequent semantic features are combined to obtain $h_t = \langle \overrightarrow{h_t}, \overleftarrow{h_t} \rangle$.
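The bidirectional combination described above can be illustrated with a small pure-Python sketch. It is not the implementation used in this study: the weights are random stand-ins for trained parameters, bias terms are omitted, the gate convention is one of several in common use, and the names `gru_step`, `run_gru`, and `bigru` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, rng):
    """Random GRU weights (stand-ins for trained parameters, no biases)."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {name: mat(hidden_size, input_size + hidden_size) for name in ("z", "r", "h")}


def gru_step(params, x, h_prev):
    """One GRU step with update gate z, reset gate r, and a candidate state."""
    xh = x + h_prev  # list concatenation [x; h_prev]
    z = [sigmoid(a) for a in matvec(params["z"], xh)]
    r = [sigmoid(a) for a in matvec(params["r"], xh)]
    x_rh = x + [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in matvec(params["h"], x_rh)]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def run_gru(params, inputs, hidden_size):
    """Run a GRU over a sequence and return the hidden state at every step."""
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = gru_step(params, x, h)
        states.append(h)
    return states


def bigru(fwd, bwd, inputs, hidden_size):
    """Concatenate forward and backward hidden states for each position."""
    forward = run_gru(fwd, inputs, hidden_size)
    backward = run_gru(bwd, inputs[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
# Five token embeddings of dimension 4, standing in for BERT output vectors.
embeddings = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
fwd, bwd = init_params(4, 3, rng), init_params(4, 3, rng)
outputs = bigru(fwd, bwd, embeddings, hidden_size=3)
```

Each entry of `outputs` has twice the hidden size, since it holds the forward state followed by the backward state for the same token.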

MHSA Module
The MHSA technique has been extensively applied in deep-learning-based natural language processing (NLP), especially in named-entity recognition tasks. When extracting text feature information, the integration of the MHSA mechanism proves effective in addressing the issue of time-series association in textual data. To enhance the extraction of interactive representation from a text sequence, the MHSA mechanism combines semantic feature information from various levels and perspectives, resulting in a comprehensive interactive representation of the text sequence.
In this study, the MHSA mechanism is focused on extracting feature information pivotal for entity recognition from the output of the BiGRU module. The matrix produced by the BiGRU comprises the query, key, and value slices $Q_c$, $K_c$, and $V_c$. To establish correspondences between key slices and their respective query slices, each query-key pair computes its intermediate representational similarity using the scaled dot-product, followed by the SoftMax operation. Subsequently, the aggregation of self-attentional interactions is derived by multiplying the resulting matrix by $V_c$, yielding the representational context. For a single self-attention head $h$ and a set of $C$ slices, the representational context is determined as follows:

$$\mathrm{head}_h = \mathrm{SoftMax}\left(\frac{Q_c K_c^{\top}}{\sqrt{d_k}}\right) V_c,$$

where $d_k$ is the dimension of the key vectors. To compute the respective self-attention heads, multiple projections of $Q_c$, $K_c$, and $V_c$ are employed, and their outputs are concatenated to achieve an aggregation of multiple heads, as depicted below:

$$\mathrm{MHSA}(Q_c, K_c, V_c) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_H),$$

where $H$ is the number of heads. The MHSA module consists of N MHSA layers. In the MHSA module, the long-distance characteristics and global information can be fully captured.
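As a hedged illustration of the scaled dot-product and head concatenation described above, the sketch below uses plain Python lists. The projection matrices are identity matrices chosen for readability rather than learned weights, the output projection used in the standard Transformer is omitted, and the function names are hypothetical.

```python
import math


def softmax(row):
    m = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def attention(Q, K, V):
    """Scaled dot-product attention: SoftMax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    scores = matmul(Q, transpose(K))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V), weights


def multi_head(X, heads):
    """heads: list of (W_q, W_k, W_v) projections; head outputs are concatenated."""
    per_head = []
    for W_q, W_k, W_v in heads:
        out, _ = attention(matmul(X, W_q), matmul(X, W_k), matmul(X, W_v))
        per_head.append(out)
    return [[v for out in per_head for v in out[i]] for i in range(len(X))]


I2 = [[1.0, 0.0], [0.0, 1.0]]             # identity projection (illustrative only)
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three tokens with two features each
context, weights = attention(X, X, X)
mh = multi_head(X, [(I2, I2, I2), (I2, I2, I2)])
```

Each row of `weights` is a probability distribution over the tokens, and each row of `mh` has one segment per head.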

CRF Module
The BiGRU module performs well in tackling long-distance textual information. The CRF decoder [44] derives an optimal prediction sequence by leveraging the links between adjacent entity labels. A significant strength of CRF is its capability to learn restrictive rules autonomously, thereby probabilistically reducing the occurrence of illogical sequences in the prediction sequence. In this study, the BIOES annotation method is adopted, where "B" signifies "beginning", "I" denotes "inside", "O" represents "outside", "E" stands for "end", and "S" indicates "single". Each entity type is associated with its respective BIOES tags. The scoring function of the CRF decoder can be determined as follows:

$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}},$$

where $X$ is the sequence $(x_1, x_2, \ldots, x_n)$; $P$ and $A_{i,j}$ represent the observation and transition matrices, respectively; the scoring function sums the contributions of the two matrices; and $y$ corresponds to the label sequence of the predicted output. The conditional probability $P(y \mid X)$ of $y$ under a given $X$ can be computed with the scoring function as follows:

$$P(y \mid X) = \frac{\exp\left(s(X, y)\right)}{\sum_{\tilde{y} \in Y_X} \exp\left(s(X, \tilde{y})\right)},$$

where $Y_X$ denotes all feasible label sequences for a given sentence.
During the decoding stage, the optimal sequence labeling $y^{*}$ is determined by

$$y^{*} = \arg\max_{\tilde{y} \in Y_X} s(X, \tilde{y}).$$
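The scoring and decoding steps can be sketched as follows, assuming a score that sums per-position observation scores and transitions between adjacent labels, without separate start or end transitions. The tag set and all numbers are made up for illustration, and the brute-force partition function is only viable for toy inputs.

```python
import math
from itertools import product


def sequence_score(emissions, transitions, tags):
    """Sum of observation scores plus transition scores between adjacent tags."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def viterbi(emissions, transitions):
    """Return the highest-scoring tag sequence and its score."""
    n_tags = len(transitions)
    best = list(emissions[0])
    back = []
    for emit in emissions[1:]:
        prev_best = best
        best, pointers = [], []
        for j in range(n_tags):
            i_star = max(range(n_tags), key=lambda i: prev_best[i] + transitions[i][j])
            pointers.append(i_star)
            best.append(prev_best[i_star] + transitions[i_star][j] + emit[j])
        back.append(pointers)
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], best[last]


def log_partition(emissions, transitions):
    """Brute-force log-sum-exp over every label sequence (toy sizes only)."""
    n_tags = len(transitions)
    scores = [sequence_score(emissions, transitions, y)
              for y in product(range(n_tags), repeat=len(emissions))]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


TAGS = ["B", "E", "O"]  # reduced tag set, for illustration only
# transitions[i][j]: score of moving from tag i to tag j; -100 marks illegal moves.
transitions = [[-100.0, 1.0, -100.0],
               [0.0, -100.0, 0.0],
               [0.0, -100.0, 0.0]]
# emissions[t][j]: observation score of tag j at position t (made-up numbers).
emissions = [[2.0, 0.1, 0.5],
             [0.3, 1.0, 1.2],
             [0.1, 0.2, 1.5]]
path, score = viterbi(emissions, transitions)
prob = math.exp(score - log_partition(emissions, transitions))
```

At the second position the observation scores alone favor "O", but the learned-style penalty on the transition from "B" to "O" steers decoding to "E", which is the kind of illogical sequence the CRF layer is meant to suppress.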

Relationship Extraction

BERT-Based Method
The primary objective of this module is to discern the logical links among knowledge points, thereby aiding learners in more effective learning. Seven relationship types are crucial for learners: inclusion, precursor, identity, brother, correlation, inheritance, and cause_and_effect. The relationship categories and their definitions are presented in Table 1. The locations of entities are very important in determining their relationships. Using the BERT model, sentence features and the educational entities identified in the preceding module are fused.
For a given sentence $S = \{c_1, c_2, \ldots, c_n\}$, the head entity $E_{head}$, tail entity $E_{tail}$, and relationship $R$ between them form a triple $\langle E_{head}, R, E_{tail} \rangle$. As shown in Figure 4, the input includes a sentence and the positions of the head and tail entities in the sentence, and the output is the relationship category. The word embedding vector sequence and the sentence embedding vector $M_{sentence}$ are obtained from the last hidden state of the BERT. Then, the position information on the head and tail entities is used to encode their representation vectors using the corresponding word embedding vectors. Finally, the SoftMax layer performs classification prediction. The probability of the classification result for each category is decoded using the concatenation of the head and tail entity vectors and the sentence vector as follows:

$$p = \mathrm{SoftMax}\left(W \left[M_{sentence}; M_{head}; M_{tail}\right] + b\right),$$

where $W \in \mathbb{R}^{L \times 3d}$ is the weight matrix and $b$ is the bias vector; $L$ denotes the total number of relationship types; $d$ represents the hidden state size of the BERT model; and $M_{head}$ and $M_{tail}$ denote the hidden state vectors of the head entity $E_{head}$ and tail entity $E_{tail}$ in the last layer.
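The classification step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: random vectors stand in for BERT hidden states, entity vectors are obtained by averaging the token vectors in each span (one plausible choice that the text does not specify), and the weights are untrained.

```python
import math
import random

RELATIONS = ["inclusion", "precursor", "identity", "brother",
             "correlation", "inheritance", "cause_and_effect"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(hidden_states, start, end):
    """Average the token vectors of an entity span [start, end) (an assumption)."""
    span = hidden_states[start:end]
    return [sum(col) / len(span) for col in zip(*span)]


def classify_relation(m_sentence, hidden_states, head_span, tail_span, W, b):
    """SoftMax over a linear layer applied to [sentence; head; tail]."""
    m_head = mean_pool(hidden_states, *head_span)
    m_tail = mean_pool(hidden_states, *tail_span)
    features = m_sentence + m_head + m_tail  # length 3d
    logits = [sum(w * f for w, f in zip(row, features)) + bias
              for row, bias in zip(W, b)]
    return softmax(logits)


rng = random.Random(0)
d = 4  # toy hidden size; BERT hidden states are much larger
hidden_states = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(6)]
m_sentence = [rng.uniform(-1, 1) for _ in range(d)]
W = [[rng.uniform(-0.1, 0.1) for _ in range(3 * d)] for _ in RELATIONS]
b = [0.0] * len(RELATIONS)
probs = classify_relation(m_sentence, hidden_states, (0, 2), (3, 5), W, b)
predicted = RELATIONS[max(range(len(probs)), key=probs.__getitem__)]
```

The weight matrix has one row per relationship type and 3d columns, matching the concatenation of the three d-dimensional vectors.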

Similarity-Based Method
Relation extraction addresses the challenge of semantic connection between entities. The predecessor-successor relationship stands out as a crucial link in knowledge, contributing to the generation of more coherent semantic sequences. Mastery Learning [45], an educational philosophy and instructional strategy, asserts that students must achieve a certain level of mastery in prerequisite knowledge before progressing to learn subsequent information. For example, teaching the parameters and arguments of a function (knowledge B) presupposes prior comprehension of the function itself (knowledge A). The theory suggests that one should master the predecessor knowledge A before understanding knowledge B. In this study, we use knowledge pairs to define knowledge points with sequential relationships. Numerous instances in KGs illustrate that the semantic relationship between a knowledge node and its neighboring nodes is often similar and strong. Consequently, this study employs pairwise similarity measurement [46] to extract their relationship. A cosine similarity approach is applied, as depicted in Equation (7):

$$\mathrm{sim}(V_i, V_j) = \cos\theta = \frac{V_i \cdot V_j}{\lVert V_i \rVert \, \lVert V_j \rVert}. \quad (7)$$

The larger the cosine value, and hence the smaller the angle, the more related the two vectors are. The proposed CourseKG is interpreted as an undirected and weighted graph. Given the nodes $V_i$ and $V_j$, $\mathrm{sim}(V_i, V_j)$ denotes a weighted edge in CourseKG, where nodes $V_i$ and $V_j$ are connected if $\mathrm{sim}(V_i, V_j) \neq 0$. The matrix $S_n = \mathrm{sim}(V_i, V_j)$, $(i, j = 1, \ldots, n-1, n)$, is referred to as the similarity matrix or affinity matrix, indicating pairwise similarities between knowledge points.
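A minimal sketch of the pairwise similarity computation follows. The three-dimensional vectors are invented placeholders for learned knowledge point representations, and the helper names are hypothetical; edges are kept whenever the similarity is nonzero, as stated above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Nested dict S[a][b] holding the pairwise similarities."""
    names = list(vectors)
    return {a: {b: cosine_similarity(vectors[a], vectors[b]) for b in names}
            for a in names}


def weighted_edges(vectors):
    """Undirected weighted edges for every pair with nonzero similarity."""
    names = list(vectors)
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            s = cosine_similarity(vectors[a], vectors[b])
            if s != 0:
                edges.append((a, b, s))
    return edges


# Invented toy vectors; real ones would come from the BERT-based encoder.
vectors = {
    "function": [1.0, 0.0, 1.0],
    "parameter": [1.0, 0.0, 0.5],
    "pointer": [0.0, 1.0, 0.0],
}
S = similarity_matrix(vectors)
edges = weighted_edges(vectors)
```

In practice, dense learned vectors rarely have a similarity of exactly zero, so a threshold on the similarity value may be needed to keep the graph sparse; that threshold is not specified in the text.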

Experiments

Data Collection and Pre-Processing
The experiment utilized data collected from two prominent online learning platforms in China, namely, CourseGrading [47] and Educoder [48]. Specifically, the experiments focused on a C programming course, involving the download of over 6000 teaching documents from the cloud platform. For experimental validation, we randomly chose 100 students as the subjects. The digital teaching material used to construct CourseKG comprised various text resources, including tests, syllabuses, electronic textbooks, and courseware.
Before training, we cleaned and formatted the textual data with a Chinese word segmentation tool, Jieba [49], ensuring compatibility with the Chinese BERT model. Finally, we extracted 6028 Chinese characters from 27,042 valid sentences in the teaching materials. The raw and pre-processed data are shown in Table 3. For data pre-processing, the proposed BERT-BiGRU-MHSA-CRF framework was utilized to conduct BIOES annotation on the corpus of 6028 Chinese characters, thereby achieving knowledge point entity extraction. During model training, the process began with the BERT module, which produced the word vectors. Subsequently, these word vectors were fed into the BiGRU module for additional training. Lastly, the CRF decoding procedure was applied to predict the optimal BIOES label sequence.
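For readers unfamiliar with BIOES annotation, the sketch below converts entity spans to character-level tags and back. The label `KP` (knowledge point) and the helper names are hypothetical; the sample string means "local variable and pointer".

```python
def spans_to_bioes(n_tokens, spans):
    """spans: list of (start, end, label) with end exclusive."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character entity
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


def bioes_to_spans(tags):
    """Recover (start, end, label) spans from a BIOES tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.append((i, i + 1, label))
            start = None
        elif prefix == "B":
            start = i
        elif prefix == "E" and start is not None:
            spans.append((start, i + 1, label))
            start = None
        elif prefix == "O":
            start = None
    return spans


chars = list("局部变量和指针")  # "local variable and pointer", one tag per character
spans = [(0, 4, "KP"), (5, 7, "KP")]
tags = spans_to_bioes(len(chars), spans)
```

Because every entity has an explicit end tag, a decoder can reject sequences such as a "B" tag followed directly by "O", which is the type of constraint the CRF layer learns.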

Entity Recognition Results
In the proposed entity recognition model BERT-BiGRU-MHSA-CRF, the model parameters were initially configured during the training phase. The batch size, which determines the size of the batch for each iteration, was set to 64, taking into account the dataset size and its influence on the gradient descent direction. To balance convergence speed and model fitting effectiveness, the learning rate was set to 0.0001, and the Adam optimizer was utilized. A dropout rate of 0.5 was employed. The model underwent training for 100 epochs, incorporating an early stopping strategy. The labeled datasets were partitioned into test and training sets at a ratio of 1:4. The parameters for the baseline models followed specifications from their original papers or initial implementations.
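The training setup can be summarized in a short sketch. The hyperparameter values are those reported above; the early-stopping patience, the random seed, and the simulated validation losses are assumptions made only for illustration, since the text does not state them.

```python
import random

# Values reported in the text; test_fraction 0.2 encodes the 1:4 test/train ratio.
CONFIG = {"batch_size": 64, "learning_rate": 1e-4, "optimizer": "Adam",
          "dropout": 0.5, "max_epochs": 100, "test_fraction": 0.2}


def split_dataset(samples, test_fraction, seed=0):
    """Shuffle and split into (train, test); the seed is an arbitrary choice."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience):
        self.patience = patience  # hypothetical value, not given in the paper
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


train, test = split_dataset(range(100), CONFIG["test_fraction"])

stopper = EarlyStopping(patience=3)
simulated_losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.6]  # made-up values
stopped_at = None
for epoch, loss in enumerate(simulated_losses[:CONFIG["max_epochs"]], start=1):
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

With a patience of three, training halts after three consecutive epochs without improvement, even if a later epoch would have been better; this trade-off is inherent to early stopping.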
To evaluate the performance of the proposed BERT-BiGRU-MHSA-CRF, it was compared with two baseline methods: BERT-BiLSTM-CRF and BERT-GRU-CRF. The results of the proposed model and baselines are presented in Table 4.
Following the most recent studies, four evaluation metrics were selected to evaluate the models, namely, accuracy (Acc), recall, precision, and F1-score, and they were calculated by

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$

where TN represents the number of true negatives, TP denotes the number of true positives, FN corresponds to the number of false negatives, and FP signifies the number of false positives. The experimental findings suggested that all models demonstrated effective performance on the dataset. The BiGRU method, which combines forward and backward GRU units to capture information both preceding and following a given position in a sentence, exhibited enhanced utilization of contextual information. Specifically, the experimental results indicated superior performance of the BiGRU method compared to the BiLSTM method. Furthermore, the integration of a CRF layer facilitated automatic learning of sentence-level constraints and incorporation of constraint labels into the BiGRU output, thereby enhancing entity recognition performance. The accuracy and F1-score values achieved by the BERT-BiGRU-CRF model were 85.33% and 81.19%, respectively. Moreover, the addition of the MHSA module to the BERT-BiGRU-CRF model allowed for improved utilization of global information. Analysis revealed that the model augmented with the MHSA module exhibited increases in accuracy and F1-score values by 3.3% and 5.3%, respectively, underscoring the positive contribution of MHSA to named-entity recognition performance.
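The four metrics follow directly from the confusion counts, as the following sketch shows. The counts used here are invented for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented counts, for illustration only.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
```

The guards against zero denominators matter in entity recognition, where a rare entity type may have no predicted or no gold instances in a small test set.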
Owing to the absence of education-specific data, the conventional BERT model faced challenges in effectively extracting features from sequences. Education data had unique characteristics that could not be overlooked. This study used a fine-tuned BERT model, which enriched the contextual semantics in the vertical domain and provided domain awareness, resulting in the best performance in educational entity recognition among all methods, with accuracy and F1-score values of 88.36% and 85.49%, respectively.
Several measures were taken to prevent overfitting. First, we introduced regularization techniques to limit the complexity of model parameters and prevent overfitting to the training data. Second, we used an appropriate dropout value in the network to reduce the dependence between neurons and improve the generalization ability of the model. These measures helped ensure that our models were less prone to overfitting and had better generalization performance when faced with new data.
The results suggest that the proposed BERT-BiGRU-MHSA-CRF method produced good outcomes. Moreover, this model was employed for entity prediction on unlabeled textual data, ensuring the quality of entity data and the KG.

Parametric Analysis
While training the model, there are two important parameters that need to be considered: the learning rate and the dropout value. If the learning rate is too large, the model will converge too fast and may overshoot the optimal value. If the learning rate is too small, the model will converge too slowly and may even fail to converge. The dropout method can be used to avoid overfitting during model training. Based on the above considerations, this paper adds comparative experiments on the learning rate and dropout value to explore the model while also obtaining the best results.
First, by adjusting the learning rate continuously, the model effects were compared when the learning rate was 0.01, 0.001, and 0.0001. Table 4 shows the experimental results for different learning rates. For the model with a learning rate of 0.0001, the value of F1 was higher than that of the models with other learning rates, so the learning rate was chosen as 0.0001 from the perspective of model performance. The dropout parameters for the experiment also need to be taken into consideration. In the forward propagation process, the dropout method causes each neuron to stop working temporarily with a certain probability, P, which makes the generalization ability of the model stronger. Models can be regularized to some extent by not relying on local characteristics too much. The results of our model with different dropout values are shown in Table 5. The experimental results show that the model with dropout equal to 0.5 had better performance, so the dropout value of 0.5 was selected.
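The dropout behavior described above can be sketched as follows. This uses the common "inverted dropout" variant, in which surviving activations are rescaled so that the expected value is unchanged; the rescaling is an assumption on our part, as the text only states that neurons are disabled with probability P.

```python
import random


def dropout(activations, p, rng, training=True):
    """Zero each unit with probability p and rescale the rest by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(activations)  # dropout is disabled at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
hidden = [1.0] * 10000
dropped = dropout(hidden, p=0.5, rng=rng)
unchanged = dropout(hidden, p=0.5, rng=rng, training=False)
mean_after = sum(dropped) / len(dropped)
```

With P = 0.5, about half of the units are zeroed in each forward pass, and the survivors are doubled so that the mean activation stays close to its original value.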

Efficiency Analysis
The accuracy and loss were also considered in the model training process. Figure 5 depicts the accuracy and loss curves, offering a visual representation of the model's performance throughout the training process. The accuracy curve shows the model's capability to accurately recognize data, whereas the loss curve reflects the incurred errors during training. Based on the results, both the loss and accuracy exhibit improvement as the number of epochs increases. This suggests that our model was progressively learning and refining its accuracy. After around 80 epochs, the model reached stable performance, indicating that additional training iterations may not yield substantial improvements. Consequently, the accuracy and loss curves illustrate that beyond approximately 80 training epochs, our model demonstrated strong convergence and excelled in performance.

Ablation Study
To examine the contributions of the three main modules in the proposed model, namely, the BERT, MHSA, and CRF modules, this study conducted an ablation study. The results obtained on the experimental dataset are presented in Table 6.
The results for the CRF module show that the model performance could be improved by adding the CRF module to the model compared to employing the BiGRU method directly. The results indicate that the CRF module was beneficial for improving the entity recognition effect. Furthermore, a remarkable improvement in recognition performance is observed after introducing the BERT model fine-tuned with educational text information. This indicates that the educational semantic information derived from the BERT model could provide significant assistance in educational entity recognition. Moreover, the MHSA module comprehensively extracted global context information, leading to a certain improvement in the results.

Extracted Relationship Results
The connections among the knowledge points in CourseKG were established using the BERT model, and the similarity between all pairs of nodes was calculated. A subset of the knowledge relationships of Function is depicted in Figure 6, where nodes represent the extracted knowledge point entities.
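The pairwise similarity computation can be illustrated as follows. This sketch assumes that each knowledge point is already represented by an embedding vector (for example, one produced by a BERT encoder) and that cosine similarity is the measure; neither the metric nor the function names below are specified in this section, so they are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_u * norm_v)


def pairwise_similarity(embeddings):
    """Map each unordered pair of node names to its similarity.

    `embeddings` is a dict from node name to embedding vector.
    """
    names = sorted(embeddings)
    return {
        (a, b): cosine_similarity(embeddings[a], embeddings[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
```

Pairs whose score exceeds a chosen threshold could then be kept as candidate edges in the graph.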

Visualization of CourseKG
To present CourseKG in a more intuitive and vivid manner, we employ Neo4j [50] as the backend graph database and, building on the Vue framework [51], integrate D3.js [52] to construct a visual representation of the knowledge graph, as illustrated in Figures 7, 8, and 9. This interactive visualization system allows users considerable freedom; they can drag, click, and access any relevant knowledge node at will. The knowledge graph is constructed automatically by our parsing pipeline, which can parse and analyze the desired textbooks.
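As a rough illustration of how extracted relations might be written to a Neo4j backend, the sketch below builds a parameterized Cypher `MERGE` statement for one relation type. The node label `KnowledgePoint`, the `name` and `score` properties, and the function name are hypothetical; the actual schema of CourseKG is not described here. Relationship types cannot be passed as Cypher parameters, so the type is validated before being placed in the query text.

```python
import re


def relation_to_cypher(rel_type):
    """Build a Cypher statement that upserts one typed relation.

    The values for $source, $target, and $score are meant to be supplied
    separately as query parameters when the statement is executed.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", rel_type):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    return (
        "MERGE (a:KnowledgePoint {name: $source}) "
        "MERGE (b:KnowledgePoint {name: $target}) "
        f"MERGE (a)-[r:{rel_type.upper()}]->(b) "
        "SET r.score = $score"
    )
```

Using `MERGE` rather than `CREATE` keeps the import idempotent, so re-running the parsing pipeline on the same textbook does not duplicate nodes or edges.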
To enable users to concentrate more effectively on the knowledge points of interest, we have implemented a focus differentiation feature with dynamic effects. When a user hovers the cursor over a particular knowledge point, that node and its directly associated knowledge points, along with the edges connecting them, are highlighted, while the opacity of the remaining knowledge nodes and edges is reduced. This design makes the currently focused knowledge node easy to recognize. Moreover, a concise tooltip summary is generated for the current node, allowing users to swiftly grasp the essence of the highlighted knowledge point. We also provide more detailed information for users who wish to delve further into the knowledge graph. CourseKG can deliver a "relevance score" between any two nodes within a range of 0 to 1, where higher values denote a stronger correlation between the two knowledge points. This granular data serves as an expandable option that supports a more nuanced exploration of the relationships among knowledge points and a deeper understanding of the interconnectedness within the course's domain.
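The focus differentiation logic can be sketched independently of the rendering layer. The example below is written in Python for illustration rather than in the D3.js code of the actual front end; the function name `focus_opacity` and the dimmed opacity value of 0.2 are assumptions.

```python
def focus_opacity(edges, hovered, dimmed=0.2):
    """Compute node and edge opacities for a hovered knowledge point.

    `edges` is an iterable of (source, target) pairs. The hovered node,
    its direct neighbours, and the edges between them stay fully opaque;
    everything else is dimmed.
    """
    edges = list(edges)
    highlighted = {hovered}
    for source, target in edges:
        if source == hovered:
            highlighted.add(target)
        elif target == hovered:
            highlighted.add(source)
    nodes = {node for edge in edges for node in edge} | {hovered}
    node_opacity = {n: 1.0 if n in highlighted else dimmed for n in nodes}
    edge_opacity = {
        (s, t): 1.0 if hovered in (s, t) else dimmed for s, t in edges
    }
    return node_opacity, edge_opacity
```

In a browser front end, the two returned mappings would drive the opacity attribute of each rendered node and edge on hover, and would be reset to full opacity when the cursor leaves the node.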

Conclusions and Future Work
This paper proposes CourseKG, which utilizes diverse course data to enhance smart education by extracting teaching concepts and identifying vital educational relationships. We propose the BERT-BiGRU-MHSA-CRF framework for precise educational entity recognition, integrating advanced mechanisms for global contextual relevance. Additionally, we present a new relationship extraction method based on the BERT model. Experimental validation with real-world C programming course data demonstrates the scalability and superiority of our approach, which surpasses state-of-the-art methods in both relation extraction and educational entity recognition. Our research not only introduces innovative frameworks but also validates their effectiveness, emphasizing their potential impact on advancing knowledge representation in smart education.
Currently, our experiments rely primarily on a dataset from C programming courses. In the future, we can explore collecting more teaching data across diverse academic disciplines and integrating various data sources to construct a more comprehensive knowledge graph. Beyond textual data, multi-modal data such as videos and audio can also furnish substantial informational support. Hence, future work could investigate how to leverage multi-modal data to enrich the content of the knowledge graph and further enhance the efficacy of personalized learning.
We are fully aware that student feedback plays a pivotal role in personalized learning. Consequently, in the future, we may attempt to incorporate student feedback data and merge it with the knowledge graph to gain a deeper understanding of students' learning conditions and adaptively fine-tune instructional strategies accordingly. Moreover, natural language processing techniques can be employed to analyze student feedback, thereby extracting the most valuable insights for promoting individualized learning.
To improve the performance of CourseKG, future research can implement domain adaptation and transfer learning, combine domain knowledge, introduce dynamic context modeling technology, and explore continuous supervised learning methods. These directions are expected to enhance the accuracy, robustness, and adaptability of the model so that it can better serve the diverse and evolving needs of the education field. The proposed CourseKG could offer educators a professional and efficient solution, enabling them to provide personalized and adaptive teaching experiences. Owing to its precise instructional capabilities and personalized learning paths, the proposed CourseKG could substantially change the delivery and reception of education.

Figure 2. Flowchart of knowledge extraction from unstructured textual data.

Figure 4. Block diagram of the relationship extraction method.

Figure 5. Accuracy and loss curves during the model training process.

Figure 7. Visualization of knowledge graph. Orange denotes primary nodes, green represents secondary nodes.

Figure 8. The focus differentiation in CourseKG. The cursor is currently hovering over the "Sorting" knowledge point, causing it and the directly interconnected knowledge points to be highlighted, with the opacity of other knowledge points diminished, thus facilitating clear distinction among them.

Figure 9. A visually augmented CourseKG, incorporating the representation of relationship associations. The scores between nodes are explicitly annotated on the edges, thereby enabling intuitive visualization and examination.
denotes the nature of knowledge and can be categorized into three;
- objectives = [remember, understand, apply, analyze, evaluate, create]; this is defined as a set of six hierarchical models following Bloom's taxonomy;
- difficulties = [easy, relatively easy, normal, hard, very hard]; this is employed to predict the level of difficulty for students in comprehending a specific knowledge unit V_i;
• R_i = [inclusion, precursor, identity, brother, cause_and_effect, ...].
It is important to highlight that educational entities exhibit a sequential relationship and can possess numerous specific relations, as illustrated in Table 1.

Table 1. Special relations in Model of CourseKG.
Type | Description
inclusion | A contains B
precursor | A is a prerequisite for learning B
identity | A and B denote distinct descriptions of identical knowledge
brother | A and B share the same parent C, yet there is no sequential relationship between them
correlation | A and B are relevant but do not conform to the preceding relationships
inheritance |
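The attribute sets and relation types listed above map naturally onto a small typed data model. The sketch below is one possible encoding, not the schema used by the authors; the class names `Objective`, `Difficulty`, `Relation`, `KnowledgeUnit`, and `RelationEdge` are hypothetical, and only the value names are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Objective(Enum):
    """Bloom's taxonomy levels, ordered from lowest to highest."""
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6


class Difficulty(Enum):
    EASY = 1
    RELATIVELY_EASY = 2
    NORMAL = 3
    HARD = 4
    VERY_HARD = 5


class Relation(Enum):
    INCLUSION = "inclusion"
    PRECURSOR = "precursor"
    IDENTITY = "identity"
    BROTHER = "brother"
    CORRELATION = "correlation"
    INHERITANCE = "inheritance"
    CAUSE_AND_EFFECT = "cause_and_effect"


@dataclass(frozen=True)
class KnowledgeUnit:
    name: str
    objective: Objective
    difficulty: Difficulty


@dataclass(frozen=True)
class RelationEdge:
    source: KnowledgeUnit
    relation: Relation
    target: KnowledgeUnit
```

For example, `RelationEdge(array, Relation.PRECURSOR, sorting)` would record that a hypothetical `array` unit is a prerequisite for a `sorting` unit.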

Table 2. The three types of text in Data of CourseKG.

Table 3. The raw and pre-processed data. # stands for "number of".

Table 4. Comparison results of the three methods. The best results are emphasized in bold.

Table 5. Comparison results of different parameters. The best results are emphasized in bold.

Table 6. The ablation study's results. The best results are emphasized in bold.