BiLSTM-ResNet-CRF: An Improved Model for Subject Knowledge Graph Construction

Ma, Yinghong; Chen, Lu; Liu, Zhiyuan; Zhou, Shengyao; Song, Le

doi:10.3390/systems14060623


by Yinghong Ma ¹, Lu Chen ¹, Zhiyuan Liu ¹, Shengyao Zhou ¹ and Le Song ¹,²,*

¹ Business School, Shandong Normal University, Jinan 250014, China

² School of Business Administration, South China University of Technology, Guangzhou 510641, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(6), 623; https://doi.org/10.3390/systems14060623

Submission received: 23 March 2026 / Revised: 26 May 2026 / Accepted: 28 May 2026 / Published: 1 June 2026

(This article belongs to the Topic EdTech and Industry 5.0: Digital Transformation, Sustainability and Innovation)


Abstract

The emergence of massive knowledge in online learning systems has increased the difficulty for learners to acquire the necessary information. Due to unclear information expression and excessive knowledge redundancy, learners face challenges in identifying relevant knowledge. Furthermore, the presence of substantial unstructured knowledge in subject domains also hinders the effective transmission and application of knowledge. To address these issues, a framework for constructing a subject domain knowledge graph is proposed in this work. The framework primarily aims to visualize isolated information and connect knowledge into graph structures. The knowledge graph can help learners quickly and efficiently acquire the knowledge they need. The novel framework is constructed with three steps. The first step is to design the ontology rules based on the domain-specific subject knowledge from the perspective of classification, and also to construct the schema layer of the knowledge graph. The second step is to propose a domain-optimized BiLSTM-ResNet-CRF model for subject domain entity recognition, which introduces residual blocks to enhance fine-grained local contextual feature extraction for multi-word technical terms, addressing the limitations of traditional BiLSTM-CRF models in educational text processing. The BERT relation extraction model is used to extract relations between knowledge entities. Then the data layer is constructed. Finally, the third step is to achieve knowledge fusion through entity linking and two-layer entity alignment against results stored in a database. The result comparisons on the dataset show that the novel BiLSTM-ResNet-CRF model has higher scores than several other classical models, achieving an F1-score of 80.26%. The proposed framework’s effectiveness is rigorously validated using high school mathematics as a representative case study with a well-structured knowledge system.

Keywords: knowledge graph; educational technology; ontology construction; named entity recognition; entity alignment; BiLSTM-ResNet-CRF model

1. Introduction

With the deepening of intelligent transformation in the field of education, constructing an in-depth knowledge system in the subject field has become an important task. This system plays a key role in improving the quality and efficiency of teaching and learning. The rapid growth and fragmentation of knowledge make the systematic integration and correlation of internal knowledge in the subject particularly important, and how to extract subject knowledge so as to strengthen core content has become an open problem. Traditionally, the organization of subject knowledge has relied heavily on manual methods, which are time-consuming and struggle to keep up with rapid knowledge growth. In the information era, the pursuit of efficiency and precision has permeated all industries, including education. The rise of knowledge graphs has provided an innovative solution to this problem. As a powerful tool for managing data [1], knowledge graphs can organically connect scattered knowledge through aggregation relationships [2]. This feature facilitates the systematic management and application of knowledge and provides new solutions for data processing in subject fields. By constructing an expertise-based knowledge graph for a subject, it is possible to effectively identify the core knowledge. Meanwhile, utilizing relation extraction technology to organically connect these knowledge entities can form a comprehensive knowledge network. Constructing a subject knowledge graph not only enhances the efficiency and accuracy of information retrieval but also facilitates the sharing of knowledge and promotes the in-depth development of teaching [3]. This makes the application prospects of knowledge graphs in subject fields extremely broad: downstream applications such as intelligent question answering and personalized learning recommendations [4] provide solid support for the intelligent transformation of education and teaching. Therefore, the establishment of a subject field knowledge graph is particularly important in the present day of knowledge inundation.

The framework of a knowledge graph includes the schema layer and the data layer. The core of a knowledge graph is the schema layer, which provides rules and guidance for the extraction of entities and relationships. The construction methods of the schema layer include manual construction, semi-automatic construction, and intelligent construction [5]. Recently, most studies still use manual construction methods to build the schema layer. For example, one of the widely used methods is the seven-step method for domain ontology construction developed by Stanford University. It involves dividing ontology modules and combining other methods such as IDEF5 or the skeleton method for ontology verification and optimization [6,7,8]. However, these manual ontology construction methods rely too heavily on the subjective consciousness of the builders and lack objective standards. In addition, shared and reusable ontology resources have not been fully utilized [9]. Therefore, achieving intelligent ontology construction remains a vital matter that demands a solution. Intelligent methods can not only improve efficiency but also reduce the effect of human-related aspects, thereby enhancing the objectivity and adaptability of the ontology.

In the construction of the data layer for the knowledge graph, high-precision knowledge extraction is also a key challenge that researchers aim to address [10]. In practical application scenarios, a knowledge graph built on more accurate named entity recognition is more valuable for assisting decision-making [11]. Within the domain of cultural tourism, the constructed knowledge graph of tourism resources has an F1-score of 0.93 for identifying entities related to cultural categories. This high level of accuracy can provide a more professional and comprehensive introduction to cultural tourism resources and help tourists formulate more reasonable travel plans [12]. In the field of medicine, the constructed medical knowledge graph effectively manages medical knowledge, and the F1-score of the identified medical entities is as high as 0.95. This improves the efficiency and accuracy of diagnostic decision-making and supports higher-quality medical services [13]. Named entity recognition is a key technology in the field of natural language processing, and it is crucial for downstream tasks such as information extraction, question-answering systems, and text summarization. The traditional BiLSTM-CRF model has become the mainstream method in named entity recognition tasks due to its ability to capture contextual information and its good performance in sequence annotation tasks [14,15]. However, while handling long-distance dependencies, the BiLSTM-CRF model neglects the extraction of local contextual information, so there is still room for improvement in this model.

The construction and expansion of knowledge graphs rely on extracting and fusing knowledge from multiple sources of data, so knowledge fusion is also the core task of constructing knowledge graphs. In the process of knowledge fusion, entity alignment can act as a key component. Currently, most entity alignment methods use single-layer alignment strategies that are primarily based on the similarity of entity names [7]. However, the brevity of entity names limits the amount of information they contain and makes it difficult to fully reflect the characteristics of the entities. This limitation significantly increases the difficulty of the entity alignment process. To address the limitations in the aforementioned research, this work proposes an improved BiLSTM-CRF model by introducing residual blocks to enhance performance. The residual blocks added in the BiLSTM-CRF can capture local contextual information between words and extract features between adjacent words. This operation can more effectively identify possible continuous word structures, which is crucial for entity recognition. Meanwhile, the batch normalization operation contained in the residual block helps stabilize the training process. The advantages of the improved BiLSTM-CRF model have improved the accuracy of entity recognition. In addition, the improved BiLSTM-CRF model can better identify entities with more knowledge relationships, enrich the knowledge graph, and improve the responsiveness of the question answering system. In this work, we choose high school mathematics as the experimental domain to verify the proposed framework. This choice is motivated by its standardized knowledge architecture, complete publicly available textbook resources, and clearly defined hierarchical relationships between knowledge points, which provide a stable and reproducible benchmark for evaluating knowledge graph construction performance.

It is important to note that our work does not propose a fundamentally new generic NER architecture. Instead, we focus on the domain-specific adaptation of existing architectures to address the unique challenges of subject domain knowledge graph construction. While residual-enhanced BiLSTM-CRF models have been explored in other domains, their application to educational texts requires targeted optimizations that have not been systematically investigated in previous research. Our model design is explicitly driven by the linguistic characteristics of subject domain texts, and our experimental results demonstrate that these domain-specific adaptations lead to significant performance improvements in educational knowledge extraction. The main contributions of this work are as follows: (1) An improved method for designing a systematic subject domain knowledge ontology is proposed from the perspective of refining knowledge classification; (2) Residual blocks are introduced into the BiLSTM-CRF model to enhance the extraction of fine-grained local context critical for multi-word technical term recognition, the complementarity between BiLSTM and ResNet is analyzed theoretically, and the effectiveness of the model is validated via ablation studies and domain-specific evaluations; (3) Entity alignment utilizing entity descriptions achieves fine-grained alignment of redundant entities.

The rest of this work is arranged as follows: The related works of the techniques for constructing knowledge graphs and applications of knowledge graphs in the education field are presented in Section 2. We propose a novel framework of specific knowledge graphs construction which includes three steps or parts: the schema layer, the data layer, and the knowledge fusion, in Section 3. In Section 4, the high school mathematics data are taken as an example to evaluate the feasibility of the proposed framework for constructing subject knowledge graphs. In Section 5, discussion and conclusion are given.

2. Related Works

2.1. Related Work of Named Entity Recognition

Named entity recognition is an essential technology in the construction of knowledge graphs. It enables the extraction of valuable information from extensive natural language data, which is vital for the creation of knowledge graphs. The earliest methods of named entity recognition are based on dictionary and rule matching, which not only require a significant amount of manpower but also struggle to meet various practical needs. The advent of machine learning has led to the development of new methods for named entity recognition, such as Hidden Markov Models [16], Maximum Entropy Markov Models [17], and Conditional Random Field Models [18]. These models are based on statistical learning methods. Following that, deep learning methods have been applied to named entity recognition, such as Recurrent Neural Networks (RNNs) [19], Convolutional Neural Networks (CNNs) [20], and Long Short-Term Memory Networks [21].

In recent years, models that perform well for named entity recognition typically rely on deep learning methods to obtain features from text. These models then employ statistical models like Conditional Random Fields to obtain entity label information at the sentence level. For instance, the fusion model of deep learning and machine learning, BiLSTM-CRF, is used to experiment on the CoNLL-2003 corpus, achieving an F1-score of 0.9 [22]. However, these methods have the issue of not being able to resolve the problem of one word corresponding to multiple meanings. Consequently, J. Devlin et al. proposed BERT to capture deep semantic features from large-scale text corpora, which has shown good performance in various natural language processing tasks [23,24].

In addition to general resource-sufficient scenarios, low-resource named entity recognition has become an important research hotspot in recent years, as many professional domains lack large-scale manually annotated datasets. Scholars have proposed effective solutions combining data augmentation and pre-trained language models to address the challenge of insufficient training data [25,26]. Zhu et al. combined pre-trained language models with a curriculum learning strategy to generate diverse training examples while reducing noise interference, achieving significant performance improvements in multiple low-resource domains [25]. Yaseen et al. employed back-translation technology to produce linguistically rich synthetic data, which effectively enhanced the generalization ability of NER models in low-resource scenarios [26]. These studies not only demonstrate the effectiveness of pre-trained language models in low-resource environments but also provide valuable ideas for improving NER performance through data enhancement and lightweight model design.

Almost all existing residual-enhanced BiLSTM-CRF models are designed for general sequence labeling tasks without considering the three core characteristics of subject domain texts identified in our work: (1) high density of consecutive multi-word domain terms, e.g., “quadratic equation with one unknown”, “eccentricity of an ellipse”, which require precise local boundary detection; (2) highly standardized semantic expressions where local n-gram patterns are more discriminative than global context for entity classification; (3) limited labeled training data due to the specificity of subject domains, which demands architectures with strong training stability. The high school mathematics domain studied in this paper belongs to a typical resource-sufficient scenario: we have obtained a large number of standardized textbook texts, teaching materials, and publicly available domain resources, and have constructed a high-quality annotated dataset containing 3113 training entities and 376 test entities. However, the research ideas from low-resource NER have important guiding significance for our model selection: although pre-trained language models such as BERT have strong performance, they have high computational resource requirements and poor model interpretability, which limit their direct application in lightweight educational knowledge graph construction scenarios that require rapid deployment and iterative updates. Therefore, this paper chooses to improve the lightweight BiLSTM-CRF model, which has a clear structure, low computational cost, and good interpretability, to meet the practical needs of educational domain applications.

Liu et al. improved the BERT-BiLSTM-CRF model for agricultural citrus pest identification. They combined the data from the BERT and BiLSTM layers and reduced the dimensions by using a fully connected layer. Additionally, they set hierarchical learning rates. They encapsulated the BiLSTM and fully connected layers and assigned learning rate multipliers to ensure a small learning rate for the BERT layer [27]. By model optimization and domain adaptation in specific domains, the performance of named entity recognition tasks can be significantly improved. This optimization approach shows how generic models can be adapted to a specific domain. However, its effectiveness in other domains remains unproven. Thus, although existing deep learning models and pre-trained language models have achieved good performance in different NER scenarios, they still have limitations in subject domain NER tasks. Specifically, traditional BiLSTM-CRF models neglect the extraction of local contextual information when dealing with domain-specific terminologies composed of consecutive adjacent words, which is exactly the problem we aim to solve in this work by introducing residual blocks.

2.2. Related Work of Relation Extraction

Relation extraction is an essential component of semantic analysis. It aims to automatically identify entity relationships. Resembling the development of named entity recognition, methods for relation extraction are generally categorized into rule-based, statistical machine learning, and deep learning approaches. Rule-based methods largely depend on syntactic and semantic analysis of sentences. For instance, Fundel et al. identified and extracted associations between entities by parsing sentences into nouns and verbs, using a tree-based structure [28]. Nouns represent different entities, while verbs denote the relationships between entities. Deep learning-based relation extraction methods have effectively addressed the issues of traditional natural language processing techniques. These traditional methods often overly rely on annotations and are prone to propagating annotation errors. Li et al. integrated a specific attention mechanism into the context layer of CNNs [29]. It can enhance the impact of the relationship matrix weights between two entities in a sentence and disregard the computation of unrelated terms.

With the rise of pre-trained models, relation extraction methods have also made significant progress. Scholars have used BERT models to solve various types of problems in relation extraction. Liu et al. employed the BERT model to solve the problem of feature conflict and insufficient utilization of contextual semantic information [30]. In the context of industrial safety management in the digital era, Fang et al. utilized the BERT model to classify information in near-miss accident reports and validated it in a construction case study [31]. Wei et al. took a novel approach by using a BERT pre-trained model and a cascading decoder. This method models each relation as a function that maps the subject in a sentence to the object, which effectively addresses the issue of overlapping relation extraction [32]. Some scholars have also addressed the issue of poor relation extraction performance caused by insufficient existing knowledge through methods such as data augmentation and knowledge injection [33]. In general, the field of relation extraction is rapidly evolving. The combination of deep learning and pre-trained models provides new perspectives to address the limitations of traditional methods. Where ordinary deep learning models cannot solve existing problems, pre-trained models offer an alternative.

2.3. Educational Knowledge Graph

In the field of education, the emergence of knowledge graphs has transformed the organization, acquisition, and application of knowledge. By linking various concepts, knowledge, and other elements, learners can visualize knowledge in a mesh form to help them understand the key points of knowledge.

At the theoretical research level, scholars have conducted in-depth discussions on the application of knowledge graphs in education. They analyzed the cognitive value of the educational knowledge graph from multiple perspectives and proposed conceptual models and technical frameworks. They also predicted its application prospects in fields such as educational big data mining, adaptive learning, and personalized recommendation [34]. However, these studies remain at a theoretical level and lack experimental validation.

At the practical application level, educators in different fields have also constructed knowledge graphs that are suitable for specific educational scenarios. For example, Yang et al. developed a knowledge graph for the field of hydraulic engineering [35]. This knowledge graph is aimed at helping students understand course content, knowledge units, and their interrelationships, providing a structured knowledge representation method for hydraulic education. Similarly, Li et al. utilized ontology and knowledge graph technology to construct the knowledge structure of management courses [36]. Nair et al. combined a knowledge graph with the BERT model to provide rapid feedback on topics for different learners, integrating an intelligent question-answering system into students’ digital learning to enhance their comprehension [37]. These theoretical and practical explorations show that knowledge graphs are gradually becoming an indispensable tool in the field of education. However, these practical applications either overlook the display of fine-grained relationships between knowledge points or lack descriptions of technical details. Therefore, in order to fully realize the potential of educational knowledge graphs, these weaknesses need to be addressed.

3. System Framework

This study proposes an end-to-end domain-specific knowledge graph (KG) construction system. The system architecture comprises three tightly coupled parts: (1) the schema layer, in which ontology rules are derived from domain keywords; (2) the data layer, built by a BiLSTM-ResNet-CRF entity recognition model and a BERT-based relation extraction model; and (3) a knowledge fusion component based on two-layer entity alignment. Figure 1 presents the annotated end-to-end processing pipeline, which explicitly illustrates data flows, input/output specifications, intermediate processing steps, and module interfaces.

The schema layer, as the backbone of the knowledge graph, is constructed through the following steps. First, relevant literature, books, and other materials in the field are collected. Then, domain keywords are obtained using natural language processing methods such as word segmentation, stop word removal, and the TF-IDF algorithm. Next, the keywords are vectorized and their text similarity is calculated in order to cluster them. Finally, based on practical application requirements, the ontology rules are determined and used to constrain the data layer.

The data layer holds the actual content of the knowledge graph and relies mainly on information extraction. When building the data layer, domain entities are first extracted from unstructured text, and relation extraction is then performed on the recognized entities. In order to capture fine-grained local information and improve network training, the ResNet structure is used to improve the BiLSTM-CRF model. Meanwhile, the BERT model, which captures rich contextual information, is used to extract the relationships between entity pairs.

The constraints of the schema layer on the data layer are mainly achieved by defining rules to ensure that entities and relations in the data layer comply with predefined schema layer specifications. Specifically, the schema layer constrains entities and relations from two aspects. Firstly, the schema layer defines clear entity types, and each entity in the data layer must be strictly classified into one of these predefined types. Meanwhile, an entity cannot belong to multiple incompatible types simultaneously to avoid confusion and conflicts in classification, ensuring the accuracy of knowledge. Secondly, the schema layer defines clear relation types, and the relations in the data layer must conform to these predefined relation types. In addition, relation types have clear directionality, and relations in the data layer must have a clear direction and cannot be reversed or blurred. Through these constraints, the schema layer ensures that entities and relations in the data layer conform to predefined specifications, thereby ensuring the accuracy and logicality of the knowledge graph.
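The two kinds of constraints described above (one predefined type per entity, and directed, predefined relation types) can be sketched as a small validation routine. This is only an illustrative sketch: the entity type names, relation names, and the `validate_triple` helper are hypothetical and are not taken from the paper's actual ontology.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical schema layer: all type and relation names are illustrative.
ENTITY_TYPES = {"Concept", "Theorem", "Formula"}
# Each relation type fixes its direction as (head type, tail type).
RELATION_TYPES = {
    "contains": ("Concept", "Concept"),
    "derived_from": ("Formula", "Theorem"),
}


@dataclass(frozen=True)
class Entity:
    name: str
    type: str  # exactly one type per entity, as the schema layer requires


def validate_triple(head: Entity, tail: Entity, relation: str) -> List[str]:
    """Return the list of schema violations (empty if the triple is valid)."""
    errors = []
    for ent in (head, tail):
        if ent.type not in ENTITY_TYPES:
            errors.append(f"unknown entity type: {ent.type}")
    if relation not in RELATION_TYPES:
        errors.append(f"unknown relation type: {relation}")
    elif not errors:
        expected = RELATION_TYPES[relation]
        # Direction check: a reversed triple does not match the schema.
        if (head.type, tail.type) != expected:
            errors.append(f"{relation} expects {expected[0]} -> {expected[1]}")
    return errors
```

A triple extracted by the data layer would be kept only when the returned list is empty; a reversed triple fails the direction check even though both entities and the relation name are individually valid.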

After extracting the relations, the unprocessed triples can be obtained. Then, a two-layer entity alignment method is adopted for the knowledge fusion of entities in triples. This can avoid redundancy in the extracted knowledge, thereby obtaining the final triples. The format of each triple is $(e_{i}, e_{j}, r_{u})$, and this form represents that the relation type between entity $e_{i}$ and entity $e_{j}$ is $r_{u}$, where $r_{u} \in r$, $r = \{r_{1}, r_{2}, \dots, r_{u}, \dots, r_{z}\}$, and $z$ is the number of relation types. Finally, the Neo4j graph database is used to store and visualize the triples. The obtained data are stored in the database using CQL, thereby constructing a complete subject domain knowledge graph.
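As a rough illustration of the storage step, the sketch below turns one triple into a parameterized Cypher `MERGE` statement. The node label `Entity`, the property `name`, and the helper `triple_to_cypher` are assumptions made for this example; the paper does not specify its storage schema. The code only builds the query text and parameters and does not connect to a database.

```python
import re


def triple_to_cypher(head: str, tail: str, relation: str):
    """Build a Cypher MERGE statement and its parameters for one triple.

    A relationship type cannot be supplied as a plain `$parameter`,
    so the relation name is validated and embedded in the query text.
    Entity names are passed as parameters to avoid quoting problems.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", relation):
        raise ValueError(f"unsafe relation type: {relation!r}")
    query = (
        "MERGE (a:Entity {name: $head}) "
        "MERGE (b:Entity {name: $tail}) "
        f"MERGE (a)-[:{relation}]->(b)"
    )
    return query, {"head": head, "tail": tail}
```

Using `MERGE` rather than `CREATE` means that re-running the import does not duplicate nodes or edges.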

4. Methodology on Knowledge Graph Construction

The knowledge graph construction includes the construction of both the schema layer and the data layer. The schema layer makes the relationships between knowledge concepts more logical and organizes them into a relatively complete system. Therefore, the intelligent construction of the schema layer based on domain knowledge is the first step in knowledge graph construction. The data layer is the core component of the knowledge graph: it stores the specific information of all entities, relations, and attributes. The quality of the data layer construction directly affects the completeness and accuracy of the knowledge graph. When constructing the data layer, it is necessary to extract structured knowledge from a large amount of textual data. In this paper, this process includes named entity recognition and relation extraction. Next, knowledge fusion is applied to remove redundant knowledge. Finally, the knowledge is stored in a graph database.

4.1. The Schema Layer of Subject Domain

Constructing the ontology of a subject domain knowledge graph makes it possible to manage dispersed domain knowledge effectively and to organize it into a coherent knowledge system. Thus, constructing an intelligent ontology based on the characteristics of the domain is the first step in knowledge graph construction, which can be divided into three sub-steps as follows:

(1) Extract subject domain terminology. A corpus is first established from the collected data to determine the scope of the ontology’s domain. After word segmentation and stop word removal, a subject-specific domain dictionary is obtained. The TF-IDF algorithm is then used to extract keywords from this dictionary.

(2) Term clustering in subject domains. The keywords extracted in step (1) are transformed into dense vectors, from which semantic similarities are calculated. The domain terms are then grouped by a hierarchical clustering algorithm based on these similarities. Four word vector models are compared in Table 1. Among them, the BERT model is among the most suitable for generating contextually relevant and semantically rich word vectors. Thus, the BERT model is chosen for the vector representation of keywords in this work.

(3) Define entity types and relation types. Using the results of domain term clustering in (2) as a reference and combining actual application requirements, the final entity types are defined. It is well known that knowledge within different domains is interconnected by their relationships. These relationships integrate all entities into a unified whole. The types of relationships within different domains can be chosen to cover the target entities.
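Sub-steps (1) and (2) can be sketched in a few lines of plain Python. This is a minimal illustration under several assumptions: the documents are already segmented with stop words removed, the TF-IDF variant (term frequency times the log of inverse document frequency, taking the maximum over documents) is one common choice rather than the paper's exact formula, and the clustering uses average-linkage merging on toy vectors instead of BERT embeddings. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """docs: list of token lists (already segmented, stop words removed)."""
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = Counter()
    for doc in docs:
        for tok, cnt in Counter(doc).items():
            idf = math.log(n_docs / df[tok])
            # Keep the best TF-IDF score a term reaches in any document.
            scores[tok] = max(scores[tok], (cnt / len(doc)) * idf)
    return [tok for tok, _ in scores.most_common(top_k)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def agglomerative(vectors, threshold=0.8):
    """Average-linkage hierarchical clustering over {term: vector}.

    Repeatedly merges the most similar pair of clusters until no pair
    has an average cosine similarity of at least `threshold`.
    """
    clusters = [[term] for term in vectors]

    def sim(c1, c2):
        total = sum(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, i, j = max(
            (sim(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The resulting clusters serve only as a reference for sub-step (3); the final entity types are still decided together with the application requirements.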

4.2. Named Entity Recognition

In this work, the BiLSTM-CRF model is improved to identify entities within the subject domain. Residual blocks from the ResNet network [38] are introduced into the BiLSTM-CRF entity recognition model to strengthen the representational capacity of sequence features. A lightweight residual block is adopted with two 3 × 1 1D convolutional layers (kernel size $I = 3$, padding $P = 1$, stride $L = 1$), where the number of filters is consistent with the BiLSTM hidden layer dimension. Unlike simply stacking more BiLSTM layers, which often leads to gradient vanishing and overfitting in deep networks, the residual block introduces shortcut connections to preserve gradient flow, enabling more efficient capture of local contextual patterns between adjacent words. The text data of high school mathematics are mapped into fixed-dimensional word vectors. BiLSTM and ResNet serve as encoders to extract features. Finally, the sentences of the text data are decoded with sequence labeling by the CRF layer. The improved BiLSTM-CRF model with an added residual block is displayed in Figure 2a, and an example illustrating the model is shown in Figure 2b.
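To make the residual block concrete, the following dependency-free sketch applies two 1D convolutions with kernel size 3, padding 1, and stride 1 and adds the shortcut connection. It is an illustration only: batch normalization is omitted, the placement of the ReLU activations follows the common ResNet layout and is an assumption here, and the function names are hypothetical. With these settings the output has the same length as the input, which is what allows the shortcut addition.

```python
def conv1d(x, weight, bias, padding=1):
    """1D convolution over the time axis with stride 1.

    x:      T x C_in   (list of time steps, each a list of channels)
    weight: C_out x C_in x K
    bias:   C_out
    """
    k = len(weight[0][0])
    c_in = len(x[0])
    zeros = [[0.0] * c_in for _ in range(padding)]
    padded = zeros + x + zeros
    out = []
    for t in range(len(padded) - k + 1):
        step = []
        for w_out, b in zip(weight, bias):
            acc = b
            for c in range(c_in):
                for j in range(k):
                    acc += w_out[c][j] * padded[t + j][c]
            step.append(acc)
        out.append(step)
    return out


def relu(x):
    return [[max(0.0, v) for v in step] for step in x]


def residual_block(x, w1, b1, w2, b2):
    """y = ReLU(conv2(ReLU(conv1(x))) + x); batch normalization omitted."""
    y = conv1d(relu(conv1d(x, w1, b1)), w2, b2)
    # Shortcut connection: add the block input to the convolution output.
    return relu([[a + b for a, b in zip(ys, xs)] for ys, xs in zip(y, x)])
```

Because each output position mixes a position with its two neighbours, stacking the two convolutions gives every position a local window of five adjacent tokens, which is the kind of local context that multi-word terms depend on.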

To explain the structure of the improved BiLSTM-CRF model for named entity recognition, the five main components in Figure 2a are elaborated as follows:

(1) Word embedding

Word embedding is used to convert text into vectors. This step is the foundational part of named entity recognition. As shown in Figure 2a, the word embedding layer converts the input text into vector form. Denote the sentence text by $(e_{1}, e_{2}, e_{3}, \dots, e_{T})$, and the vector by $(x_{1}, x_{2}, x_{3}, \dots, x_{T})$, where $e_{T}$ and $x_{T}$ represent the $T$-th character in the sentence and its vector, respectively. Then, the word embedding is utilized for the purpose of capturing features from the domain knowledge text.

(2) BiLSTM layer

Usually, BiLSTM is used to capture contextual information when extracting features of domain knowledge text. Each LSTM unit regulates the flow of information through three types of gates: the forget gate, the input gate, and the output gate. At time step $t$, let the input word vector be $x_{t}$, and denote the cell state, hidden layer state, forget gate, input gate, and output gate by $C_{t}$, $h_{t}$, $f_{t}$, $i_{t}$, and $o_{t}$, respectively. The internal diagram of LSTM is shown in Figure 3a.

Then the value of the forget gate is

f_{t} = σ (W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f}) .

The degree of forgetting information is controlled by

C_{t - 1} \cdot f_{t}

in the forgetting gate. The value of input gate

i_{t} = σ (W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i}) .

And the value of temporary cell state

Δ C_{t} = tanh (W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C}) .

The value of the current cell state can be obtained,

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot Δ C_{t},

where the input values of the hidden state and cell state in the previous moment

t - 1

are

h_{t - 1}

and

C_{t - 1}

respectively, and

W_{f}

,

W_{i}, W_{C}

are the weight matrices of the forget gate, input gate, the current cell state, respectively. By the above formulas and the previous method in reference [27], the value of output gate is

o_{t} = σ (W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o}) .

Taking the values

o_{t}

and

C_{t}

as inputs, the hidden layer state

h_{t}

at time step t is obtained by

h_{t} = o_{t} \cdot tanh (C_{t})

. Applying this step at every position yields the sequence of hidden states for the given sentence.
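The gate equations above can be traced in a few lines of code. The following is a minimal sketch in plain Python, not the authors' implementation: the helper names (sigmoid, affine, lstm_step, random_params), the sizes, and the randomly drawn weights are assumptions made for illustration, and trained weights would replace them in practice.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W·v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following the gate equations in the text."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(p["W_f"], z, p["b_f"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(p["W_i"], z, p["b_i"])]      # input gate
    o_t = [sigmoid(a) for a in affine(p["W_o"], z, p["b_o"])]      # output gate
    c_tmp = [math.tanh(a) for a in affine(p["W_C"], z, p["b_C"])]  # temporary cell state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tmp)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def random_params(d, n_in, rng):
    """Hypothetical random weights; a trained model would supply real ones."""
    p = {}
    for gate in ("f", "i", "o", "C"):
        p["W_" + gate] = [[rng.uniform(-0.5, 0.5) for _ in range(d + n_in)]
                          for _ in range(d)]
        p["b_" + gate] = [0.0] * d
    return p


rng = random.Random(0)
d, n_in = 2, 4                      # hidden size and embedding size (illustrative)
params = random_params(d, n_in, rng)
h, c = [0.0] * d, [0.0] * d         # h_0 and C_0
states = []
for x in ([0.1, 0.2, 0.3, 0.4], [0.4, 0.3, 0.2, 0.1], [0.0, 0.5, 0.5, 0.0]):
    h, c = lstm_step(x, h, c, params)
    states.append(h)
```

Because each hidden state is an output gate value in (0, 1) multiplied by a tanh value, every component of every state stays strictly between -1 and 1, whatever the weights are.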

The word vectors produced by the embedding layer

(x_{1}, x_{2}, \dots, x_{T})

are fed into the BiLSTM layer, which extracts the contextual information of each character. Since the BiLSTM model consists of two LSTMs running in opposite directions, denote the forward and the backward state vectors of the t-th time step by

{\vec{h}}_{t}

and

{\overset{\leftarrow}{h}}_{t}

respectively, where

{\vec{h}}_{t} = (h^{1}, h^{2}, \dots, h^{d})

and

{\overset{\leftarrow}{h}}_{t} = (h^{d + 1}, h^{d + 2}, \dots, h^{2 d})

. Then, the final output hidden states of BiLSTM layer are

h_{t} = ({\vec{h}}_{t}, {\overset{\leftarrow}{h}}_{t}) = (h^{1}, h^{2}, \dots, h^{d}, h^{d + 1}, \dots, h^{2 d})

. Although the BiLSTM layer captures contextual information when processing sequence data, it has limitations in capturing fine-grained local information and in training deep networks. Hence, a residual block is introduced from the ResNet network, which improves the training of deep networks by learning residuals. More specifically, a traditional BiLSTM effectively captures long-range contextual dependencies, but in subject-domain texts many professional terms consist of consecutive adjacent words, and recognizing such entities accurately relies heavily on semantic features within local windows, which BiLSTM models insufficiently. To alleviate this limitation, this paper introduces a lightweight ResNet structure to enhance local feature extraction. In this model, its core role is not to solve the vanishing gradient problem of deep networks through residual connections, but to capture local semantic correlations between adjacent words via convolution operations and to extract structural features of continuous word combinations, matching the entity expression patterns of subject terminology. Meanwhile, batch normalization and the shortcut connections in the residual module stabilize the training of local convolutional features, reduce overfitting on small-sample domain data, and improve the generalization performance of entity recognition.

Before formally analyzing the complementary relationship between BiLSTM and ResNet, we first clarify the connotation of “local contextual information” in subject domain named entity recognition and the inherent limitations of alternative local feature extraction methods. In domain-specific NER tasks, local contextual information specifically refers to the semantic association and boundary discriminative features between consecutive words within multi-word technical terms. For example, in high school mathematics texts, terms such as “quadratic equation with one unknown”, “eccentricity of an ellipse”, and “probability mass function of binomial distribution” are all composed of 3–5 consecutive words. The accurate recognition of these entities does not mainly depend on the global context of the entire sentence, but on the tight semantic combination relationship between adjacent words in the local window.

TextCNN, as a traditional local feature extraction method, uses fixed-size convolution kernels and lacks a built-in training stabilization mechanism, making it extremely prone to overfitting on small-scale domain datasets with strong specificity. The results of the ablation study in Section 5.2 directly validate this limitation: adding only convolutional layers to the baseline BiLSTM-CRF model resulted in little improvement in F1 score, indicating that pure convolutional operations cannot effectively extract discriminative local features under small sample conditions. Dilated convolutions, although they can expand the receptive field without increasing computational complexity, achieve this by skipping intermediate words, which leads to the sparsification of local features. This characteristic is particularly harmful to the recognition of densely packed short domain terms, as it may lose the key semantic connection between adjacent words that defines the entity boundary. In contrast, the lightweight ResNet block integrates convolutional layers, batch normalization, and residual connections. The convolutional layers extract hierarchical local n-gram features, batch normalization reduces internal covariate shift to prevent overfitting, and the residual connection preserves original feature information to stabilize the training process, making it the most suitable local feature extractor for subject domain NER tasks.

The BiLSTM and ResNet are connected serially with residual shortcut connections. The BiLSTM first captures global long-distance contextual features, and its output hidden states are then fed into the ResNet residual block to extract local fine-grained features. The residual shortcut preserves global information while enhancing local feature learning, forming an integrated feature extractor before the final CRF layer. This post-positioned design of ResNet after BiLSTM is tailored to subject domain text traits. Our design lets BiLSTM encode global contextual semantics to resolve the polysemy of domain terms, e.g., “axis” has distinct meanings in geometry and function scenarios. The ResNet block then mines fine-grained local features on context-aware hidden states, which sharply boosts the recognition of multi-word technical terms like “quadratic equation with one unknown” and “eccentricity of an ellipse”.

(3) Residual block

The ResNet is composed of the residual block, which consists of two convolutional layers and batch normalization layers, preserving the input information through shortcut connections. In the residual block, the core structure is the equation

G (h) = F (h) + h

, where

F (h)

is defined as the residual mapping, and

h = (h_{1}, h_{2}, \dots, h_{T})

is the output of the previous layer. In this structure, the shortcut connection adds the input h directly to the output of the convolutional path. If the desired mapping is

G (h)

, the convolutional path only needs to learn the residual relative to the input,

F (h) = G (h) - h

. The convolution operations within the residual block apply a sliding window to local regions of the input feature matrix. They thereby capture the local contextual relationships between adjacent words and help identify common structures in consecutive words. By stacking convolutional layers, ResNet captures local information at different scales, which improves its ability to represent fine details. Meanwhile, batch normalization makes network training more stable and accelerates convergence, and the shortcut connection lets gradients flow from the output to the input without hindrance, ensuring effective gradient propagation in deep networks. The BiLSTM-ResNet part outputs a feature matrix M, which is fed into a linear layer to produce the score matrix P required by the CRF. Assuming there are k output labels, the output dimension of the linear layer is k, so the dimension of P is

T \times k

.
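The residual mapping G(h) = F(h) + h and the linear projection to the T × k score matrix can be sketched as follows. This is an illustrative plain-Python version under stated assumptions rather than the authors' code: the kernel width of 3, the ReLU activation between the two convolutions, the random weights, and the omission of batch normalization are all choices made here to keep the example short.

```python
import random


def conv1d_same(seq, kernel, bias):
    """1D convolution over time with zero padding so the length stays T.

    seq:    T x c_in list of feature vectors
    kernel: width x c_in x c_out weights (odd width assumed)
    bias:   c_out values
    """
    width, c_in, c_out = len(kernel), len(kernel[0]), len(bias)
    pad = width // 2
    zero = [0.0] * c_in
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        row = list(bias)
        for w in range(width):
            x = padded[t + w]
            for a in range(c_in):
                for b in range(c_out):
                    row[b] += x[a] * kernel[w][a][b]
        out.append(row)
    return out


def relu(seq):
    return [[max(0.0, v) for v in row] for row in seq]


def residual_block(h, k1, b1, k2, b2):
    """G(h) = F(h) + h with F = conv -> ReLU -> conv (batch norm omitted here)."""
    f = conv1d_same(relu(conv1d_same(h, k1, b1)), k2, b2)
    return [[fv + hv for fv, hv in zip(f_row, h_row)] for f_row, h_row in zip(f, h)]


def linear(seq, W, b):
    """Map each position's features to k label scores, giving the T x k matrix P."""
    return [[sum(x * w for x, w in zip(row, w_j)) + b_j for w_j, b_j in zip(W, b)]
            for row in seq]


rng = random.Random(0)
T, c, k = 3, 4, 5      # 3 tokens, 2d = 4 features per token, k = 5 labels


def rand_kernel():
    return [[[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(c)]
            for _ in range(3)]


h = [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(T)]
M = residual_block(h, rand_kernel(), [0.0] * c, rand_kernel(), [0.0] * c)
P = linear(M, [[rng.uniform(-1, 1) for _ in range(c)] for _ in range(k)], [0.0] * k)
```

When both kernels are zero, F(h) vanishes and the block returns its input unchanged, which is the sense in which the shortcut preserves the BiLSTM features.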

(4) CRF layer

A CRF layer is added after the BiLSTM-ResNet part to correct the label sequence predicted from the extracted text features. In general, a CRF models the dependencies between labels and performs sequence labeling over the whole sentence. It defines a transition matrix that holds the transition scores between labels and uses the Viterbi algorithm to find the globally optimal sequence. Denote the label sequence by

y = (y_{1}, y_{2}, \dots, y_{T})

. The score function s of the CRF layer is determined by the input sequence e and the label sequence y:

s (e, y) = \sum_{t = 1}^{T} P_{t, y_{t}} + \sum_{t = 0}^{T} A_{y_{t}, y_{t + 1}},

where

P_{t, y_{t}} \in P

is the score for the t-th token mapped to the label

y_{t}

, and

A_{y_{t}, y_{t + 1}} \in A

represents the transition score from

y_{t}

to

y_{t + 1}

. The CRF layer obtains the final scoring function s by combining the output matrix P from the previous layer with the transition matrix A, and outputs the optimal label sequence. Thus, in the CRF layer, the score function s is used to determine the conditional probability of a specific label sequence corresponding to a given input sequence. Then, based on these probabilities, a logarithmic likelihood function can be obtained. Finally, the Viterbi algorithm is applied to decode the output sequence.
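The score function and the Viterbi decoding step can be illustrated with a small sketch. It assumes, as is common for BiLSTM-CRF models but not stated explicitly here, that the transition sum from t = 0 to T uses a start label for y_0 and an end label for y_{T+1}. The label set (O, B, I), the matrices P and A, and the function names are hypothetical values chosen for illustration.

```python
from itertools import product


def sequence_score(P, A, y, start, end):
    """s(e, y): emission scores plus transition scores, including start/end labels."""
    path = [start] + list(y) + [end]
    emission = sum(P[t][label] for t, label in enumerate(y))
    transition = sum(A[a][b] for a, b in zip(path, path[1:]))
    return emission + transition


def viterbi(P, A, start, end):
    """Return the highest-scoring label sequence and its score."""
    k = len(P[0])
    best = [A[start][j] + P[0][j] for j in range(k)]
    back = []
    for t in range(1, len(P)):
        prev, best, pointers = best, [], []
        for j in range(k):
            i = max(range(k), key=lambda a: prev[a] + A[a][j])
            best.append(prev[i] + A[i][j] + P[t][j])
            pointers.append(i)
        back.append(pointers)
    last = max(range(k), key=lambda j: best[j] + A[j][end])
    score = best[last] + A[last][end]
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1], score


# Labels: 0 = O, 1 = B, 2 = I; index 3 = start, index 4 = end (all hypothetical).
P = [[0.2, 1.0, 0.1],
     [0.3, 0.1, 0.9],
     [1.2, 0.0, 0.8]]
A = [[0.0, 0.0, -10.0, 0.0, 0.0],   # from O: O -> I is heavily penalised
     [0.0, 0.0, 0.5, 0.0, 0.0],     # from B
     [0.0, 0.0, 0.5, 0.0, 0.0],     # from I
     [0.0, 0.0, -10.0, 0.0, 0.0],   # from start: a sequence cannot begin with I
     [0.0, 0.0, 0.0, 0.0, 0.0]]     # from end (unused)
path, score = viterbi(P, A, start=3, end=4)
# Exhaustive search over all 27 label sequences as a cross-check.
brute = max(product(range(3), repeat=3), key=lambda y: sequence_score(P, A, y, 3, 4))
```

In this toy setting the decoder returns B, I, I: although the third token has its highest emission score for O, the transition scores favour continuing the entity.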

(5) Example for obtaining named entity recognition

A concrete example illustrates the processes in Figure 2a and Figure 3a. First, the input sequence is set, and its characters are represented in vector form. Then, the weight matrices

W_{f}

,

W_{i}

,

W_{o}

,

W_{C}

and bias vectors

b_{f}

,

b_{i}

,

b_{o}

,

b_{C}

are defined. Taking hidden state dimension

d = 2

and label size

k = 5

as an example, these parameters and the formulas mentioned in the previous text are used to calculate the hidden state vectors

{\vec{h}}_{1}

,

{\vec{h}}_{2}

,

{\vec{h}}_{3}

. The calculation example of

{\vec{h}}_{1}

is shown in Figure 3b. The weight matrices used for forward and backward calculations are different, but the same calculation method can also be used to calculate

{\overset{\leftarrow}{h}}_{1}

,

{\overset{\leftarrow}{h}}_{2}

,

{\overset{\leftarrow}{h}}_{3}

, ultimately obtaining

h_{1}

,

h_{2}

,

h_{3}

. (The specific calculation example is not reiterated here.) Finally, the output h of the BiLSTM is processed through the residual block and the linear layer to obtain the score matrix P. Taking the input word vector

x_{1} = [0.1, 0.2, 0.3, 0.4]

and initial hidden state

h_{0} = [0, 0]

, cell state

C_{0} = [0, 0]

as inputs, we calculate the forget gate

f_{1} = [0.73, 0.77]

, input gate

i_{1} = [0.77, 0.77]

, output gate

o_{1} = [0.80, 0.71]

, temporary cell state

Δ C_{1} = [0.92, 0.80]

, final cell state

C_{1} = [0.71, 0.62]

and hidden state

h_{1} = [0.49, 0.39]

using the LSTM gate mechanism formulas. This calculated hidden state

h_{1}

(along with the corresponding backward LSTM hidden state) forms the output of the BiLSTM layer, which is then fed into the subsequent residual block. The residual block performs convolution operations on this sequence of hidden states to capture the local contextual relationships between adjacent tokens. This numerical example illustrates how the BiLSTM layer first extracts global contextual features, which the residual block then refines into fine-grained local patterns. This is the core design principle that enables our BiLSTM-ResNet-CRF model to outperform the traditional BiLSTM-CRF model in recognizing multi-word mathematical entities. As Figure 2b shows, the output may violate naming conventions. To address this issue, the CRF layer is inserted before the output so that such violations are corrected.
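The cell state and hidden state quoted in the numerical example can be checked directly from the reported gate values. The short script below takes the gate values as given (the underlying weight matrices are not listed in the text, so they are not reproduced) and applies the two formulas for the cell state and the hidden state.

```python
import math

# Gate values as reported in the worked example.
f1 = [0.73, 0.77]       # forget gate
i1 = [0.77, 0.77]       # input gate
o1 = [0.80, 0.71]       # output gate
c_tmp = [0.92, 0.80]    # temporary cell state
c0 = [0.0, 0.0]         # initial cell state

# C_1 = f_1 * C_0 + i_1 * (temporary cell state), element-wise.
c1 = [f * c + i * g for f, c, i, g in zip(f1, c0, i1, c_tmp)]
# h_1 = o_1 * tanh(C_1), element-wise.
h1 = [o * math.tanh(c) for o, c in zip(o1, c1)]

c1_rounded = [round(v, 2) for v in c1]
h1_rounded = [round(v, 2) for v in h1]
print(c1_rounded, h1_rounded)
```

Rounded to two decimals, the results agree with the values stated in the example, C_1 = [0.71, 0.62] and h_1 = [0.49, 0.39].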

The errors in the high school mathematics test sets mainly include three types: entity boundary detection errors, entity type classification errors, and mixed boundary-type errors. First, boundary errors arise from the prevalence of compound terms formed by combining multiple basic mathematical concepts with ambiguous semantic boundaries between them. For example, one may identify “quadratic equation with one unknown” as a complete entity while omitting “root-finding formula” in the phrase “root-finding formula for quadratic equations with one unknown”, or incorrectly split “eccentricity of an ellipse” into two separate entities when it should form a single geometric attribute entity. Second, type errors are caused by the semantic overlap of certain terms across different knowledge modules and insufficient annotated samples for low-frequency entities. For example, “permutation and combination” may be misclassified as a “numbers and algebra” entity instead of the correct “probability and statistics” category, and “mapping” may be misclassified as “preparatory knowledge” instead of “numbers and algebra”. Third, mixed errors result from the combined effects of these two factors. For example, for the phrase “probability mass function of a binomial distribution”, one may not only incorrectly split it into two entities but also misclassify “probability mass function” as “Preparatory Knowledge”.

4.3. Theoretical Rationale for BiLSTM-ResNet-CRF Integration

The superior performance of our proposed BiLSTM-ResNet-CRF model in subject domain NER stems from the complementary strengths of BiLSTM and ResNet components, which together form a multi-scale feature extraction framework that addresses the limitations of standalone BiLSTM-CRF models. A formal theoretical analysis of this complementary relationship is provided below.

(1) Gradient flow preservation via residual connections

A fundamental limitation of deep neural networks is the vanishing gradient problem, which becomes particularly severe when training on small-scale domain datasets. The residual block introduces an identity shortcut connection that allows gradients to flow directly from the output layer to earlier layers during backpropagation. Formally, the gradient of the loss function L with respect to the input h of a residual block is

\frac{\partial L}{\partial h} = \frac{\partial L}{\partial G (h)} \cdot (1 + \frac{\partial F (h)}{\partial h}),

where

G (h) = F (h) + h

is the output of the residual block and F (h) is the residual mapping. Even when

\partial F (h) / \partial h

approaches 0 (vanishing gradient), the gradient remains non-zero due to the identity term 1. This property ensures that our model can learn discriminative features from limited subject domain data without performance degradation, which is critical for domains where high-quality labeled data are scarce.
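The effect of the identity term can be seen in a scalar toy case. The sketch below is an illustration under assumptions introduced here, not part of the original analysis: F(h) = w·tanh(h) with a very small weight w, and the loss L = G²/2. It compares the gradient with and without the shortcut and cross-checks the analytic value by finite differences.

```python
import math


def residual_grad(h, w):
    """dL/dh for G(h) = F(h) + h with F(h) = w * tanh(h) and L = G**2 / 2."""
    g = w * math.tanh(h) + h
    dF_dh = w * (1.0 - math.tanh(h) ** 2)
    return g * (1.0 + dF_dh), dF_dh      # dL/dG = G, times (1 + dF/dh)


def plain_grad(h, w):
    """Same layer without the shortcut: G(h) = F(h)."""
    g = w * math.tanh(h)
    return g * w * (1.0 - math.tanh(h) ** 2)


def numeric_grad(h, w, eps=1e-6):
    """Central finite difference of the residual loss, used as a cross-check."""
    loss = lambda x: 0.5 * (w * math.tanh(x) + x) ** 2
    return (loss(h + eps) - loss(h - eps)) / (2 * eps)


h, w = 0.5, 1e-8                 # a tiny w makes dF/dh almost vanish
grad_res, dF_dh = residual_grad(h, w)
grad_plain = plain_grad(h, w)
print(dF_dh, grad_res, grad_plain)
```

With these values the derivative of F is below 1e-7, the gradient through the plain layer is negligible, and the gradient through the residual block stays close to 0.5 because of the identity term.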

(2) Multi-scale local feature extraction via convolutional layers

BiLSTM models excel at capturing long-range sequential dependencies but are inefficient at extracting fine-grained local contextual features. The convolutional layers in ResNet blocks apply sliding windows of size I to the input feature matrix, which lets the model capture adjacent-word dependencies explicitly at different granularities. For a sequence of length T, the output feature map of a convolutional layer with kernel size I, padding P, and stride L (in this formula, P and L denote padding and stride rather than the score matrix and the loss function used elsewhere) has length:

R = ⌊\frac{T - I + 2 P}{L}⌋ + 1 .

By stacking multiple convolutional layers with different kernel sizes (

I = 2

and

I = 3

are used in the implementation), this model captures hierarchical local patterns ranging from bigram and trigram features to phrase-level structures. This is particularly valuable for recognizing multi-word technical terms in subject domains, where accurate boundary detection depends on identifying local semantic patterns between consecutive words.
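The output-length formula is easy to evaluate for the two kernel sizes mentioned. In the sketch below, the function name and the example values of T, padding, and stride are assumptions for illustration; the padding and stride actually used in the implementation are not stated in the text.

```python
def conv_output_length(T, I, P=0, L=1):
    """R = floor((T - I + 2P) / L) + 1 for kernel size I, padding P, stride L."""
    return (T - I + 2 * P) // L + 1


T = 10                                          # illustrative sentence length
bigram_len = conv_output_length(T, I=2)         # kernel size 2, no padding
trigram_len = conv_output_length(T, I=3)        # kernel size 3, no padding
same_len = conv_output_length(T, I=3, P=1)      # padding 1 keeps the length at T
stacked_len = conv_output_length(bigram_len, I=3)  # kernel 2 followed by kernel 3
print(bigram_len, trigram_len, same_len, stacked_len)
```

A residual block needs F(h) and h to have the same length before they are added, which is one reason padded convolutions are commonly used with odd kernel sizes.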

(3) Training stabilization via batch normalization

The batch normalization (BN) operation included in each residual block normalizes the activations of the previous layer to have zero mean and unit variance:

{\hat{x}}^{(k)} = \frac{x^{(k)} - μ_{B}}{\sqrt{σ_{B}^{2} + ϵ}},

where

μ_{B}

and

σ_{B}^{2}

are the mean and variance of the mini-batch B, and ϵ is a small constant that prevents division by zero. BN reduces internal covariate shift, accelerates convergence, and helps reduce overfitting because the mini-batch statistics add a small amount of noise to the activations. In our experiments, we observed that BN reduced the training time by 23% and improved the model’s generalization ability on the small-scale high school mathematics dataset.
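The normalization formula can be applied to a toy mini-batch as follows. This is a minimal sketch of the normalization step only: the learnable scale and shift parameters that batch normalization layers usually include are left out, and the batch values and the choice of ϵ are illustrative assumptions.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalise each feature over the mini-batch to zero mean and unit variance."""
    n, dims = len(batch), len(batch[0])
    mean = [sum(x[j] for x in batch) / n for j in range(dims)]
    var = [sum((x[j] - mean[j]) ** 2 for x in batch) / n for j in range(dims)]
    return [[(x[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(dims)]
            for x in batch]


# Two features on very different scales (hypothetical values).
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
normed = batch_norm(batch)
means = [sum(x[j] for x in normed) / len(normed) for j in range(2)]
variances = [sum(x[j] ** 2 for x in normed) / len(normed) for j in range(2)]
print(means, variances)
```

After normalization both features have a mean of approximately zero and a variance slightly below one (because of ϵ), regardless of their original scale.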

BiLSTM provides global contextual understanding of the entire sentence, while ResNet blocks extract fine-grained local features critical for multi-word term recognition. Their combination creates a unified framework that simultaneously models both global semantic coherence and local term structure, which is essential for accurate subject domain NER.

4.4. Relation Extraction

Relation extraction builds on the results of entity recognition and likewise requires labeled training data. The BERT model is utilized to address the relation classification issue [39]. In order to capture the positional information of the two entities, special tokens $ and # are inserted at the beginning and end of the two target entities

e_{1}

and

e_{2}

, respectively. Additionally, a [CLS] token is added at the beginning of the sentence. For example, after the special tokens are added, a sentence containing the target entities is represented as “[CLS] The $ set $ is composed of # elements #.” The final output of the [CLS] token in the model is formulated as

H_{0}^{'} = W_{0} [tanh (H_{0})] + b_{0} .

The two target entities

e_{1}

and

e_{2}

can be represented by

H_{1}^{'}

and

H_{2}^{'}

respectively. The calculation formulas for the representations of the two entities are as follows:

H_{1}^{'} = W_{1} [tanh (\frac{1}{j - i + 1} \sum_{t = i}^{j} H_{t})] + b_{1},

H_{2}^{'} = W_{2} [tanh (\frac{1}{m - k + 1} \sum_{t = k}^{m} H_{t})] + b_{2},

where

H_{t}

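represents the hidden vector of the t-th character, and the index pairs (i, j) and (k, m) give the first and last character positions of the first and second entity, respectively. In addition,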

W_{0}

,

W_{1}

and

W_{2}

share the same parameters, i.e.,

W_{0} = W_{1} = W_{2}

. The biases also share the same parameters, i.e.,

b_{0} = b_{1} = b_{2}

.

To perform relation extraction, the output of the [CLS] token

H_{0}^{'}

and the outputs of the two target entities

H_{1}^{'}

and

H_{2}^{'}

are concatenated to obtain H. The specific calculation process is as follows:

H = W_{3} [concat (H_{0}^{'}, H_{1}^{'}, H_{2}^{'})] + b_{3} .

After that, the obtained H is fed into the fully connected layer and classified by using the softmax function. The category with the highest probability is ultimately chosen as the predicted relation type r. After named entity recognition and relation extraction, the triple

(e_{i}, e_{j}, r_{u})

can be obtained, where

r_{u} \in r

,

r = \{r_{1}, r_{2}, \dots, r_{u}, \dots, r_{z}\}

, and z is the number of relation types. The specific flow of the relation extraction model is shown in Figure 4.
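The construction of the classifier input can be sketched without a BERT model by using toy hidden states. In the example below, the function names, the word-level tokenization, the 2-dimensional hidden vectors, and the identity weight matrix are all assumptions for illustration; in the model described above the hidden states come from the fine-tuned BERT encoder.

```python
import math


def mark_entities(tokens, span1, span2):
    """Wrap entity 1 in '$' and entity 2 in '#', and prepend [CLS].

    Spans are inclusive (start, end) token indices; span1 must precede span2.
    """
    (i, j), (k, m) = span1, span2
    return (["[CLS]"] + tokens[:i] + ["$"] + tokens[i:j + 1] + ["$"]
            + tokens[j + 1:k] + ["#"] + tokens[k:m + 1] + ["#"] + tokens[m + 1:])


def span_average(H, start, end):
    """Average the hidden vectors H_start..H_end (inclusive)."""
    n = end - start + 1
    return [sum(H[t][d] for t in range(start, end + 1)) / n
            for d in range(len(H[0]))]


def project(v, W, b):
    """W[tanh(v)] + b, the shared transformation for [CLS] and both entities."""
    a = [math.tanh(x) for x in v]
    return [sum(w * x for w, x in zip(row, a)) + b_k for row, b_k in zip(W, b)]


tokens = ["The", "set", "is", "composed", "of", "elements", "."]
marked = mark_entities(tokens, (1, 1), (5, 5))
# Toy hidden states, one 2-dimensional vector per marked token (hypothetical).
H = [[0.1 * t, -0.1 * t] for t in range(len(marked))]
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]    # shared W_0 = W_1 = W_2, b_0 = b_1 = b_2
h_cls = project(H[0], W, b)
h_e1 = project(span_average(H, 3, 3), W, b)    # "set" is at index 3 after marking
h_e2 = project(span_average(H, 9, 9), W, b)    # "elements" is at index 9
features = h_cls + h_e1 + h_e2                 # concat(H'_0, H'_1, H'_2)
```

The concatenated vector is what the final fully connected layer and softmax would classify into one of the z relation types.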

In this work, we adopted full fine-tuning for the BERT model. All parameters of the pre-trained BERT model were updated during the training process. The training hyperparameters were set as follows: batch size = 16, initial learning rate = 2 × 10⁻⁵, the number of training epochs = 5, weight decay = 1 × 10⁻⁴, and warm-up ratio = 0.1. This fine-tuning strategy enables the BERT model to effectively capture the domain-specific semantic features of high school mathematics knowledge.

4.5. Knowledge Fusion

Because the information comes from different sources, the knowledge extracted from unstructured data inevitably contains some redundancy, so entity alignment is essential for optimizing the knowledge graph. Cosine similarity is used to eliminate duplicate or conflicting entities, and a two-layer entity alignment method is adopted for knowledge fusion; the pseudocode of the knowledge fusion algorithm is shown in Algorithm 1. The two layers follow a sequential and complementary logic without functional overlap. The first layer, based on Word2Vec, performs coarse-grained duplicate removal by calculating the similarity of entity surface names, which quickly filters out obviously repeated entities and reduces the computational cost of subsequent processing. The second layer, based on Doc2Vec, performs fine-grained semantic disambiguation using entity description information, resolving entity ambiguity and homonymy that literal features alone cannot distinguish. The two layers work progressively to ensure both the accuracy and the efficiency of knowledge fusion. Algorithm 1 consists of three main parts, described below.

Algorithm 1: Knowledge Fusion

In part one, the set E of all entities is obtained from the triple set S. A word vector model is then used to obtain the vector representations of the entities, forming the set of entity vectors V. Next, the cosine similarity between entity pairs is calculated, and entity pairs whose similarity exceeds the threshold are merged, which completes the first entity alignment. The threshold follows Ijebu et al., who used a similarity threshold of 0.6 to decide whether question pairs are duplicates [40]. Based on this empirical threshold,

θ = 0.6

is set, which yields the new set of processed entities

E^{'}

. However, directly applying a threshold from general duplicate question pair detection without targeted validation on our educational dataset is a limitation. High school mathematics entities differ from general natural language entities in three ways: (1) terminology is highly standardized, with most concepts having unified formal definitions across textbooks; (2) formal aliases are limited and well defined, with rare semantic ambiguity; and (3) hierarchical relationships between concepts are clear, which reduces the complexity of entity disambiguation. To rigorously validate the appropriateness of the 0.6 threshold for our specific dataset, we conducted a systematic threshold sensitivity analysis for both layers of the entity alignment method. We varied the threshold from 0.4 to 0.8 in steps of 0.1 (0.4, 0.5, 0.6, 0.7, 0.8) and performed independent threshold tests for the first-layer Word2Vec-based coarse-grained alignment and the second-layer Doc2Vec-based fine-grained alignment. For each threshold combination, we evaluated four core metrics that reflect the quality of the knowledge graph: (1) entity redundancy rate, the proportion of remaining duplicate or synonymous entities after alignment; (2) knowledge graph completeness, the proportion of valid unique entities retained without over-merging; (3) triple accuracy, the semantic correctness rate of knowledge triples after entity merging; and (4) downstream QA F1-score, the F1-score of a rule-based knowledge question answering system built on the constructed knowledge graph. See Section 5.3 for details.
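The threshold-based merging in part one can be sketched as follows. This is not a reproduction of Algorithm 1: the greedy strategy (each entity is mapped to the first already-kept entity whose similarity exceeds θ), the function names, and the toy 3-dimensional vectors standing in for Word2Vec embeddings are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align(entities, vectors, theta=0.6):
    """Greedy merge: map each entity to the first kept entity with similarity > theta."""
    kept, canonical = [], {}
    for name in entities:
        match = next((other for other in kept
                      if cosine(vectors[name], vectors[other]) > theta), None)
        if match is None:
            kept.append(name)
            canonical[name] = name
        else:
            canonical[name] = match
    return kept, canonical


# Hypothetical entity vectors; real ones would come from the word vector model.
vectors = {
    "quadratic function": [0.9, 0.1, 0.2],
    "function of degree two": [0.8, 0.2, 0.2],
    "ellipse": [0.1, 0.9, 0.3],
}
kept, canonical = align(list(vectors), vectors, theta=0.6)
print(kept, canonical)
```

The same routine could serve the second layer by passing description vectors instead of surface-name vectors, and a threshold sweep amounts to calling it with each value of θ in turn.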

In part two, we need to search for processed entities from the knowledge base. The knowledge base needs to contain entities as well as entity descriptions. In this paper, Baidu Encyclopedia is selected as the knowledge base. Once an entity can be searched in the knowledge base, the detailed information about this entity can be extracted from the knowledge base to supplement the original data as entity descriptions. The entity description set obtained from the Baidu Encyclopedia is

D^{'}

. If an entity is not found in the knowledge base, the ERNIE Bot large language model (https://yiyan.baidu.com/) is used to generate an entity description for reference. ERNIE Bot is only used for the small proportion of entities that cannot be linked to the knowledge base. All generated descriptions are strictly checked manually by our research team members to ensure 100% accuracy of factual content and conceptual definitions and to avoid hallucinated or unreliable information. Since entity description generation is a standardized factual task rather than innovative creation, the LLM has a very low probability of hallucination in this scenario. The entity description set generated by ERNIE Bot is

D^{''}

. Combining the two sets gives the complete entity description set D.

In part three, the Doc2Vec model is used to represent entity descriptions as vectors, and the generated vectors are used as entity vectors for the second entity alignment. Cosine similarity is again used to remove redundant entities, giving the final entity set

E^{''}

.

In the merging process, if the similar entities

e_{i}

and

e_{j}

are originally related, the relation between these two entities can be ignored. When considering that

e_{j}

is replaced with

e_{i}

and

e_{j}

is related to other entities, if

e_{j}

is related to

e_{m}

, then the relationship between

e_{j}

and

e_{m}

is preserved; if both

e_{i}

and

e_{j}

are related to

e_{m}

, the original relation between

e_{i}

and

e_{m}

will be kept, and the relation between the original

e_{j}

and

e_{m}

will also be kept. After that, the final triples

S^{'}

can be obtained.
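One possible reading of these merging rules is sketched below. It assumes that triples are stored as (head, tail, relation), that a mapping from each merged entity to its retained counterpart is available, that relations of a merged entity are redirected to the retained entity, and that exact duplicate triples collapse into one. The function name and the sample triples are hypothetical.

```python
def fuse_triples(triples, canonical):
    """Rewrite triples after merging entities according to the canonical mapping."""
    fused, seen = [], set()
    for head, tail, rel in triples:
        h = canonical.get(head, head)
        t = canonical.get(tail, tail)
        if h == t:
            # A relation between two merged entities is ignored.
            continue
        triple = (h, t, rel)
        if triple not in seen:      # assumption: exact duplicates are kept once
            seen.add(triple)
            fused.append(triple)
    return fused


# "function of degree two" (e_j) has been merged into "quadratic function" (e_i).
canonical = {"function of degree two": "quadratic function"}
S = [
    ("quadratic function", "function of degree two", "parallel"),
    ("function of degree two", "vertex", "attribute"),
    ("quadratic function", "vertex", "attribute"),
    ("function", "function of degree two", "inclusion"),
]
S_fused = fuse_triples(S, canonical)
print(S_fused)
```

Here the relation between the two merged entities disappears, the relation that only the merged entity had is carried over, and the relation both entities shared with the same third entity survives once.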

4.6. Integration of Core Knowledge Graph Construction Modules

The seamless integration among the entity recognition, relationship extraction, and knowledge fusion modules is essential for the system’s overall performance and reliability. This subsection elaborates on the sequential dependencies, data format specifications, and error handling mechanisms that govern inter-module communication.

The entity recognition module outputs a structured JSON object for each input sentence, containing the following fields: (1) entity text, (2) entity type, e.g., “preparatory knowledge”, “numbers and algebra”, “geometry”, (3) start and end character offsets in the original text, (4) model-generated confidence score (ranging from 0 to 1), and (5) a 768-dimensional semantic embedding vector generated by the final hidden layer of the BERT model. This structured output is passed directly to the relationship extraction module without further transformation, ensuring that all contextual and semantic information is preserved.

The relationship extraction module takes as input the complete sentence embedding and all pairs of recognized entities within the same sentence. For each entity pair, it generates a set of candidate relations with associated confidence scores. The module outputs a list of candidate triples, each consisting of the head entity embedding, tail entity embedding, relation type, and combined confidence score, calculated as the product of the entity recognition confidence scores and the relation classification confidence score. Only triples with a combined confidence score above 0.5 are retained for further processing, a threshold chosen to eliminate obviously spurious results while minimizing the loss of potentially valuable information.
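The confidence filter described above reduces to a product and a comparison. In the following sketch the dictionary field names and the sample scores are hypothetical; only the product rule and the 0.5 threshold come from the text.

```python
def combined_confidence(head_conf, tail_conf, relation_conf):
    """Product of both entity confidences and the relation confidence."""
    return head_conf * tail_conf * relation_conf


def filter_triples(candidates, threshold=0.5):
    """Keep candidates whose combined confidence is above the threshold."""
    return [c for c in candidates
            if combined_confidence(c["head_conf"], c["tail_conf"],
                                   c["relation_conf"]) > threshold]


candidates = [
    {"head": "ellipse", "tail": "eccentricity", "relation": "attribute",
     "head_conf": 0.9, "tail_conf": 0.9, "relation_conf": 0.8},   # 0.648
    {"head": "set", "tail": "element", "relation": "inclusion",
     "head_conf": 0.9, "tail_conf": 0.7, "relation_conf": 0.7},   # 0.441
]
retained = filter_triples(candidates)
print([c["head"] for c in retained])
```

Because the score is a product of three values in [0, 1], a triple passes only when all three components are fairly confident, which makes the filter stricter than the 0.5 threshold may suggest at first glance.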

The knowledge fusion module receives the filtered candidate triples and processes them in two stages. First, entity alignment is performed to map equivalent entities from different sources to a single canonical representation. This step uses the cosine similarity between entity embeddings and semantic descriptions retrieved from external knowledge bases. Second, aligned entities are merged, and their associated relations are consolidated to eliminate duplicates and resolve contradictions. The final fused triples are then stored in the graph database, with all original confidence scores retained for traceability.

Error handling is implemented at each module boundary. If a module fails to process an input, e.g., due to malformed data or out-of-vocabulary entities, the input is logged and skipped rather than causing the entire pipeline to fail. Additionally, a post-processing step identifies and removes triples that violate domain-specific logical constraints, e.g., a geometric figure cannot be a subtype of an algebraic concept, further improving the quality of the final KG.
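The skip-and-log behaviour and the constraint check can be illustrated with a small sketch. Everything named here is hypothetical: the logger name, the run_stage and violates_constraint helpers, the record fields, and the particular encoding of the rule as "an inclusion triple may not have a numbers-and-algebra parent and a geometry child".

```python
import logging

logger = logging.getLogger("kg_pipeline")   # hypothetical logger name


def run_stage(stage, inputs):
    """Apply one module to each input; log and skip failures instead of aborting."""
    outputs, skipped = [], []
    for item in inputs:
        try:
            outputs.append(stage(item))
        except Exception as exc:   # broad on purpose: a failure skips this item only
            logger.warning("skipping %r: %s", item, exc)
            skipped.append(item)
    return outputs, skipped


def violates_constraint(triple, entity_type):
    """Hypothetical rule: a geometry entity cannot be a child of an algebra entity."""
    parent, child, relation = triple
    return (relation == "inclusion"
            and entity_type.get(parent) == "numbers and algebra"
            and entity_type.get(child) == "geometry")


def parse(record):
    """Toy stage that raises KeyError on malformed records."""
    return (record["head"], record["tail"], record["relation"])


records = [
    {"head": "conic section", "tail": "ellipse", "relation": "inclusion"},
    {"head": "ellipse"},                                    # malformed record
    {"head": "polynomial", "tail": "triangle", "relation": "inclusion"},
]
types = {"conic section": "geometry", "ellipse": "geometry",
         "polynomial": "numbers and algebra", "triangle": "geometry"}
triples, skipped = run_stage(parse, records)
clean = [t for t in triples if not violates_constraint(t, types)]
```

The malformed record is logged and skipped while the other two are processed, and the post-processing step then removes the triple that breaks the type constraint.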

5. Case Study

5.1. The Definition of Entity Types and Their Relations

Without loss of generality, high school mathematics is taken as an example to illustrate the subject domain division from the perspective of knowledge categorization. The experimental dataset of high school mathematics is constructed from three high-quality knowledge sources: standard textbooks (Phoenix Education Publishing Edition), authoritative web pages, and domain supplementary materials. The authoritative web pages are collected from three main types of educational platforms: (1) encyclopedia platforms represented by Baidu Encyclopedia; (2) professional high school mathematics education websites; and (3) educational resource sharing platforms. These web sources cover all six core knowledge modules of high school mathematics, with a relatively balanced distribution: functions (28%), analytic geometry (24%), stereogeometry (19%), numbers and algebra (15%), sequences (8%), and probability and statistics (6%). All source data have undergone standardized text preprocessing and manual triple annotation. The total number of valid annotated triples is 639, of which 394 triples (61.66%) are derived from textbooks, 191 triples (29.89%) from authoritative web pages, and 54 triples (8.45%) from domain supplementary materials. Non-textbook data thus account for 38.34% of the triples, which ensures sufficient breadth and diversity of data sources. Please refer to Table 2 for specific information. This dataset covers the core knowledge modules of high school mathematics, including functions, vectors, analytic geometry, stereogeometry, sequences, probability, and statistics, forming a standardized and complete core knowledge system.

First, a corpus is established from the collected dataset; the keywords are then extracted using a domain term extraction method and converted into vectors using BERT. Finally, the vectors are grouped by hierarchical clustering. The Silhouette Coefficient reaches its maximum value of 0.6 when the number of clusters is 4, so 4 is taken as the optimal number of clusters. By analyzing the clustering results for the high school mathematics text, the knowledge domain is divided into 4 entity types: preparatory knowledge, numbers and algebra, geometry, and probability and statistics. The size of each type is shown in Table 3.

The BIO labeling method is adopted at the step of data annotation of named entity recognition, where B, I, and O represent the beginning, the inside, and the outside of the entity label, respectively. An example of BIO labels is shown in Table 4.
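Reading entities back from a BIO-tagged sequence can be sketched as follows. The tag name GEO, the English word-level tokens, and the function name are hypothetical stand-ins; the annotated corpus itself may use different type labels and character-level tokens.

```python
def bio_to_entities(tokens, tags):
    """Collect (text, type) pairs from BIO tags such as B-GEO, I-GEO, and O."""
    entities, current, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == current_type:
            current.append(token)
        else:
            # "O", or an I- tag that does not continue an open entity.
            if current:
                entities.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        entities.append((" ".join(current), current_type))
    return entities


tokens = ["the", "eccentricity", "of", "an", "ellipse", "is", "constant"]
tags = ["O", "B-GEO", "I-GEO", "I-GEO", "I-GEO", "O", "O"]
entities = bio_to_entities(tokens, tags)
print(entities)
```

In this sketch an I- tag that does not follow a matching B- tag is treated as outside any entity, which is one simple way to handle sequences that break the labeling scheme.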

Then, 4 types of knowledge relations are defined: predecessor–successor, inclusion, parallel, and attribute. Two knowledge items

a_{1}

and

a_{2}

are in a predecessor–successor relation if learning

a_{2}

requires learning

a_{1}

first, where

a_{1}

is called the predecessor knowledge of

a_{2}

, while

a_{2}

is called the successor knowledge of

a_{1}

. A knowledge item a and a set of knowledge items

b_{1}, b_{2}, \dots, b_{n}

are in an inclusion relation if the parent knowledge a integrates the n child knowledge items

b_{1}, b_{2}, \dots, b_{n}

, where each child

b_{i}

explains the parent knowledge a from the i-th angle,

i = 1, 2, \dots, n

. Two knowledge items

a_{1}

and

a_{2}

are in a parallel relation if

a_{1}

and

a_{2}

have the same predecessor knowledge. In such a relation,

a_{1}

and

a_{2}

share similar thematic keywords. If the knowledge

a_{1}

is an attribute of another knowledge

a_{2}

, then

a_{1}

and

a_{2}

have an attribute relation. After the relation types are defined, the relations in the text must be annotated. When multiple entities appear in a text, multiple relations may emerge. An example of the data annotation for relation extraction is shown in Table 5. In the annotated dataset, there are 247 inclusion relations (38.7%), 215 attribute relations (33.6%), 91 predecessor–successor relations (14.2%), and 86 parallel relations (13.5%). This relatively balanced distribution supports a comprehensive evaluation of the relation extraction model.

5.2. The Evaluation of Models Within the Framework

To evaluate the BiLSTM-ResNet-CRF model proposed in this work, it is compared with the following baseline models: BiLSTM, BiLSTM-CRF, BiLSTM-ResNet, BiLSTM-TextCNN-CRF, and BiLSTM-DilatedCNN-CRF. To isolate the contribution of the ResNet component and rule out the possibility that the performance gain comes simply from increased model depth, we also constructed two control models with 2-layer and 3-layer stacked BiLSTM-CRF architectures, adjusting the hidden state dimensions to keep the total number of parameters approximately equal to that of the proposed BiLSTM-ResNet-CRF model. The evaluation indices are Precision, Recall, and F1. The experiments are conducted on two datasets: the High School Mathematics dataset constructed in this work and the open Resume dataset [41], and the datasets can be downloaded from https://github.com/jiesutd/LatticeLSTM (accessed on 26 September 2025). The Resume dataset was collected by Zhang et al. from Sina Finance and consists of resumes of senior management personnel of publicly listed companies in the Chinese stock market.
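For reference, the three evaluation indices can be sketched as follows. The sketch assumes exact-match, entity-level scoring (an entity counts as correct only if its span and type both match), which is a common convention but is not stated explicitly in the text; the spans and labels are hypothetical.

```python
def prf1(gold, predicted):
    """Entity-level precision, recall, and F1 under exact matching."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)                      # correctly predicted entities
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (start, end, type) spans; labels are illustrative only.
gold = {(1, 5, "GEO"), (7, 8, "ALG"), (10, 12, "PRO")}
pred = {(1, 5, "GEO"), (7, 8, "GEO"), (10, 12, "PRO"), (14, 15, "ALG")}
p, r, f = prf1(gold, pred)
```

In this toy case two of four predictions are correct and two of three gold entities are found, so precision is 1/2, recall is 2/3, and F1 is their harmonic mean, 4/7.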

The results shown in Table 6 indicate that the entity recognition model BiLSTM-ResNet-CRF exhibits the best performance among the six models on both datasets. Among the baseline models, the two convolution-based models both outperform the traditional BiLSTM-CRF model, which confirms that introducing convolutional structures can enhance the model’s ability to capture local contextual information in domain-specific NER tasks. Specifically, the BiLSTM-TextCNN-CRF model achieves an F1-score of 78.75% on the High School Mathematics dataset and 94.76% on the Resume dataset. Its improvement over BiLSTM-CRF comes from the multi-scale convolution kernels, which extract n-gram features of different lengths; this is particularly beneficial for identifying domain-specific terms with fixed phrase structures such as “quadratic equation with one unknown”. The BiLSTM-DilatedCNN-CRF model further improves the F1-score to 79.48% on the High School Mathematics dataset and 94.94% on the Resume dataset, outperforming the BiLSTM-TextCNN-CRF model. This is because dilated convolution expands the receptive field without increasing computational complexity, enabling the model to capture longer-range local dependencies between words. For example, in the High School Mathematics dataset, many entities consist of 3–5 consecutive terms, and dilated convolution with a rate of 4 can capture the contextual relationships between the first and last words of these entities without losing intermediate information. However, both convolution-based models still underperform the proposed BiLSTM-ResNet-CRF model. The main reason is that the residual block not only captures local features through convolution operations but also introduces shortcut connections and batch normalization. These designs mitigate the vanishing gradient problem in deep networks, stabilize the training process, and promote feature reuse. As a result, our model can extract more fine-grained local features and achieve higher recognition accuracy, especially for complex domain entities with rich contextual relationships. The 0.78% F1-score improvement of BiLSTM-ResNet-CRF over BiLSTM-DilatedCNN-CRF on the High School Mathematics dataset supports the value of residual connections in domain-specific NER tasks.

Our experiments also demonstrate the inefficiency of simply stacking BiLSTM layers for domain-specific NER tasks. On the high school mathematics dataset, increasing BiLSTM layers from 1 to 2 and from 2 to 3 only improves the F1-score by 0.48% and 0.31% respectively, while the training accuracy increases by 1.8% and 2.3% respectively, indicating obvious overfitting. In contrast, the BiLSTM-ResNet-CRF model achieves a 2.93% F1-score improvement under similar parameter scales, and the validation set performance shows significantly smaller fluctuations during training. To verify the statistical significance of this improvement, we performed a paired t-test on the independent F1-score results of the 1-layer BiLSTM-CRF and BiLSTM-ResNet-CRF models. The test yields p-values of 0.032 and 0.028 on the high school mathematics dataset and the Resume dataset, respectively, both below the 0.05 significance level, confirming that the performance gain brought by the ResNet component is not due to random chance.
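The paired t-test compares per-run score differences between two models. The sketch below computes only the test statistic and degrees of freedom with the standard library. The F1 values are made-up placeholders (the per-run scores and the number of runs are not given in the text), so the result does not reproduce the reported p-values.

```python
import math
import statistics


def paired_t_statistic(xs, ys):
    """Return (t, degrees_of_freedom) for paired samples xs and ys."""
    diffs = [y - x for x, y in zip(xs, ys)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical F1-scores (%) from five paired runs; NOT the paper's data.
baseline = [77.1, 77.5, 77.2, 77.6, 77.3]   # 1-layer BiLSTM-CRF
proposed = [80.0, 80.5, 80.1, 80.4, 80.3]   # BiLSTM-ResNet-CRF
t_stat, dof = paired_t_statistic(baseline, proposed)

# Two-sided critical value of Student's t at alpha = 0.05 with 4 degrees of freedom.
T_CRIT_DF4 = 2.776
significant = abs(t_stat) > T_CRIT_DF4
```

With a statistics library available, the same test would normally be run through a dedicated routine that also returns the p-value directly.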

A further ablation study examines ResNet positioning by comparing “ResNet before BiLSTM” and “ResNet after BiLSTM”. On the High School Mathematics dataset, the post-positioned design yields an F1-score of 80.26%, 2.17% higher than the pre-positioned counterpart (78.09%). On the Resume dataset, the F1-score reaches 95.14% vs. 94.25%, a 0.89% improvement. In particular, Geometry entities gain a 3.24% F1 increase, and Probability and statistics entities gain 2.78%, indicating that the post-positioned ResNet better handles complex multi-word domain entities with strong context dependence. We further introduce the fully fine-tuned BERT-CRF model, as shown in Table 5. Even compared with the fully fine-tuned BERT-CRF model, our proposed model still achieves a 1.67% F1-score improvement on the High School Mathematics dataset (80.26% vs. 78.59%) and a 0.44% improvement on the Resume dataset (95.14% vs. 94.70%). The fully fine-tuned BERT-CRF model outperforms traditional convolutional-enhanced models such as BiLSTM-DilatedCNN-CRF, which verifies the effectiveness of pre-trained language models in capturing deep semantic features. However, its performance is still inferior to that of our proposed model. The main reasons are as follows: (1) BERT excels at global semantic modeling but lacks precise perception of the local boundary features of consecutive multi-word technical terms, which are the core entities in subject domain texts; (2) BERT has a large number of parameters and is more likely to overfit on our High School Mathematics dataset with only 3113 training entities, while our lightweight BiLSTM-ResNet-CRF architecture better balances model capacity and generalization ability.

The evaluation index scores corresponding to different entity types are shown in Table 7. The performance change of our model varies across entity types, which reflects the effectiveness of residual blocks in recognizing complex multi-word technical terms. As shown in Table 7, the largest gains are achieved for geometry entities (F1-score increased by 7.68% compared with BiLSTM-CRF) and probability and statistics entities (F1-score increased by 2.32%). This is because these two domains contain a high proportion of long multi-word terms with complex local structures, such as “eccentricity of an ellipse”, “focal distance of a hyperbola”, “binomial distribution random variable”, and “Poisson probability mass function”. The convolutional layers in residual blocks excel at capturing the local semantic patterns that define these terms, enabling more accurate boundary detection. In contrast, numbers and algebra entities do not benefit from the residual blocks (F1-score decreased by 1.55%), as these entities are often shorter, e.g., “integer” and “function”, and their recognition relies more on the global contextual information captured by BiLSTM. This entity-type-specific pattern provides empirical evidence that our model design mainly benefits the most challenging entities in subject domain knowledge extraction.

To quantitatively evaluate the contribution of each component in the residual block to the overall model performance, we conducted ablation experiments on the high school mathematics dataset. We compared the full BiLSTM-ResNet-CRF model with three variants: (1) BiLSTM-CRF (baseline, no residual block); (2) BiLSTM-CRF+Conv (convolutional layers only, no residual connection or BN); (3) BiLSTM-CRF+Conv+BN (convolutional layers and BN, no residual connection). The results are presented in Table 8.

The ablation results show that each component of the residual block contributes to the model’s performance. The addition of convolutional layers alone improves the F1-score by 0.52%, confirming the value of local feature extraction for domain term recognition. The inclusion of BN further increases the F1-score by 0.60%, highlighting its role in stabilizing training and improving generalization. Most significantly, the residual connection provides an additional 1.81 percentage points of F1-score, which supports our theoretical analysis that residual connections preserve gradient flow and enable deeper feature learning on small-scale datasets.
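The three incremental gains can be summed to check them against the overall improvement reported earlier. This is plain arithmetic on the figures in the text; the baseline F1 below is only the value implied by those figures, not a number quoted from Table 8.

```python
# Incremental F1 gains (percentage points) reported for each added component.
gains = [
    ("convolutional layers", 0.52),
    ("batch normalization", 0.60),
    ("residual connection", 1.81),
]

total_gain = round(sum(gain for _, gain in gains), 2)

# F1 of the full BiLSTM-ResNet-CRF model on the High School Mathematics dataset.
proposed_f1 = 80.26

# Baseline F1 implied by the reported gains (an inference, not a quoted value).
implied_baseline_f1 = round(proposed_f1 - total_gain, 2)
```

The summed gain matches the 2.93% improvement over the 1-layer BiLSTM-CRF quoted in the depth-control experiment, so the two passages agree.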

To evaluate the BERT model in relation extraction, Precision, Recall, and F1 are computed under different numbers of epochs. Figure 5 shows that the BERT model reaches its highest F1-score of 91.26% at 5 epochs, with Precision and Recall following similar trends, which indicates that the BERT model is effective for relation extraction. Table 9 further reports the performance of the BERT model for each relation type.

Furthermore, to realize knowledge fusion and avoid the duplication of information, the word2vec algorithm is used to transform the identified entities into vectors, and the cosine similarity matrix between entities is then computed for the first entity alignment. The following example illustrates how information duplication is avoided. Web crawler technology is employed to scrape 395 High School Mathematics entities from the Baidu Encyclopedia website, of which 338 are successfully linked, so that their descriptions can be obtained. The ERNIE Bot Large Language Model is used to generate descriptions for the remaining 57 entities without links. The Doc2Vec algorithm then generates vectors from the entity descriptions for the second entity alignment. The 57 LLM-generated descriptions account for only 14.43% of all 395 entities, and all of them have been manually verified by our research team to ensure the reliability of the entity information. The final numbers of standard entities and triples are 391 and 639, respectively.
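The first alignment layer can be sketched as a pairwise cosine-similarity screen. The three-dimensional vectors below are toy stand-ins for word2vec embeddings, and the function names are hypothetical; the 0.6 cut-off is the empirical threshold discussed in Section 5.3.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(vectors, threshold=0.6):
    """Return entity pairs whose similarity reaches the threshold."""
    names = sorted(vectors)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                pairs.append((a, b))
    return pairs


# Toy vectors (assumed for illustration, not real embeddings).
vectors = {
    "probability": [0.9, 0.1, 0.0],
    "chance": [0.8, 0.3, 0.1],
    "ellipse": [0.0, 0.2, 0.9],
}
pairs = candidate_pairs(vectors)
```

Only the synonymous pair passes the screen here; in the described pipeline, a second pass over description vectors (Doc2Vec) would then confirm or reject such candidates.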

Figure 6 shows the knowledge graph of the 391 entities and 639 triples. This knowledge graph visualizes the knowledge points of High School Mathematics and their interrelationships, and might benefit mathematics education.

5.3. Threshold Sensitivity Analysis

The results of the threshold sensitivity analysis are presented in Table 10, which shows the performance of different threshold combinations on the high school mathematics dataset.

As shown in Table 10, the threshold value has a significant impact on all four evaluation metrics. The first three metrics, entity redundancy rate, knowledge graph completeness, and triple accuracy, all show a monotonic increasing trend with the increase in the threshold, which is determined by the inherent mechanism of similarity-based entity alignment. For entity redundancy rate, a higher threshold means stricter merging conditions, fewer entity pairs meet the similarity requirement, and more duplicate or synonymous entities remain unmerged after both alignment layers, thus leading to a higher redundancy rate. For knowledge graph completeness, a higher threshold reduces the risk of over-merging semantically similar but essentially distinct entities in both the coarse-grained and fine-grained alignment stages, so more valid unique entities are retained, resulting in higher completeness. For triple accuracy, a higher threshold avoids incorrect entity merging, which is the main source of semantic errors in knowledge triples. Fewer incorrect merges in both alignment layers directly lead to higher triple accuracy.

However, the downstream QA F1-score does not follow this monotonic trend and reaches its peak at the threshold of 0.6. This phenomenon reflects the core trade-off in knowledge graph construction: the quality of a knowledge graph is not determined by high completeness or high accuracy alone, but by the balance between accuracy, completeness, and redundancy. In the low threshold interval (0.4–0.5), although the redundancy rate is low (2.33–4.13%), the over-merging problem is serious. For example, “ellipse” and “hyperbola”, both conic sections, were incorrectly merged in 12 cases when using the 0.5 threshold, and “permutation” and “combination” were incorrectly merged in 8 cases. These incorrect merges lead to a large number of semantic errors in the knowledge graph, making the QA system unable to correctly distinguish different concepts and thus significantly reducing the F1-score (86.22–89.14%). The threshold of 0.6 achieves the best balance among the three dimensions. It correctly merges 94.09% of synonymous entities, such as “quadratic equation with one unknown” and “one-variable quadratic equation”, or “probability” and “chance”, through the two-layer alignment process while avoiding incorrect merging of distinct concepts. The knowledge graph at this threshold has neither serious semantic errors nor excessive redundant information, which provides the most effective support for downstream QA tasks and yields the highest F1-score of 91.55%. In the high threshold interval (0.7–0.8), although the completeness (97.91–98.63%) and accuracy (96.01–96.84%) are further improved, the redundancy rate increases sharply to 11.56–19.23%. A large number of synonymous entities that should be merged are retained as separate nodes in the knowledge graph. When the QA system processes queries, it may match only one of the synonymous entities and miss the others, leading to a decrease in recall. In addition, redundant entities cause the QA system to return duplicate or scattered answers, which reduces precision. The combined effect of these two factors leads to a significant decline in the downstream QA F1-score (from 88.91% to 85.38%).

This trend is particularly prominent in the high school mathematics domain due to its unique entity characteristics: mathematical terminology is highly standardized, and 87.2% of synonymous entities have a similarity score between 0.6 and 0.7. Therefore, the threshold of 0.6 can just capture most of these synonymous entities through the two-layer alignment process without introducing incorrect merges, which is exactly why it becomes the optimal threshold. The experimental results confirm that the threshold value of 0.6 achieves the optimal balance among all four metrics. This validates that the initially selected empirical threshold of 0.6 is indeed the optimal threshold for our high school mathematics knowledge graph dataset.
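The effect of the threshold on merging can be illustrated with a small union-find sketch. The similarity scores are hypothetical values chosen to mirror the behaviour described above (synonyms scoring between 0.6 and 0.7, distinct but related concepts scoring lower); they are not measurements from Table 10.

```python
def merged_entity_count(entities, similarities, threshold):
    """Number of nodes left after merging every pair scoring >= threshold."""
    parent = {e: e for e in entities}

    def find(e):
        # Follow parent links to the representative, compressing the path.
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for (a, b), score in similarities.items():
        if score >= threshold:
            parent[find(a)] = find(b)       # merge the two groups
    return len({find(e) for e in entities})


entities = ["probability", "chance", "ellipse", "hyperbola",
            "permutation", "combination"]
# Hypothetical similarity scores.
similarities = {
    ("probability", "chance"): 0.66,        # true synonyms
    ("ellipse", "hyperbola"): 0.52,         # distinct concepts
    ("permutation", "combination"): 0.48,   # distinct concepts
}
counts = {t: merged_entity_count(entities, similarities, t)
          for t in (0.4, 0.6, 0.8)}
```

At 0.4 all three pairs collapse (over-merging), at 0.6 only the synonyms merge, and at 0.8 nothing merges (redundancy), which is the trade-off the sensitivity analysis describes.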

5.4. Component Ablation Studies on Knowledge Graph Quality

To quantitatively analyze the impact of each component of our system on the final knowledge graph quality, we conducted a series of ablation studies. We compared the performance of the full system with four ablated variants: (1) a system without the entity alignment step, (2) a system using only High School Mathematics as the external knowledge base, (3) a system using only Resume as the external knowledge base, and (4) a system using BERT-base instead of RoBERTa-large for generating entity semantic descriptions. All experiments were conducted on the same test dataset, and performance was evaluated using three standard metrics: entity linking accuracy, relational consistency, and KG completeness.

Table 11 presents the results of the ablation studies. The full system achieves the highest performance across all three metrics, demonstrating the effectiveness of our proposed knowledge fusion approach. Removing the entity alignment step leads to a 12.19% decrease in entity linking accuracy and a 9.51% decrease in relational consistency, as duplicate entities remain unmerged and contradictory relations are not resolved. Using only a single external knowledge base, either the high school mathematics knowledge base or the resume knowledge base, results in a 4.95–6.01% decrease in KG completeness, highlighting the benefit of integrating multiple knowledge sources. Replacing RoBERTa-large with BERT-base leads to a 4.27% decrease in entity linking accuracy, a 2.49% decrease in relational consistency, and a 0.82% decrease in KG completeness, indicating that larger pre-trained language models generate more discriminative entity embeddings for domain-specific applications.

These results confirm that each component of our knowledge fusion pipeline contributes significantly to the overall quality of the constructed KG. The entity alignment step is particularly critical, as it addresses the fundamental problem of entity heterogeneity that plagues multi-source KG construction.

5.5. Understanding Knowledge Graph with Complex Network

A knowledge graph displays the relationships between knowledge points in the same way as a complex network, so the entities can be analyzed through their node properties. The degree of a node is one of the most intuitive measures of the importance of an entity in facilitating the flow of information and the transmission of knowledge. Figure 7 shows the degree distribution of the entities; the relationship between degree values and the number of knowledge points exhibits a power-law distribution. Knowledge points with degree values of 1 and 2 are the most numerous, and approximately 70% of knowledge points have degree values within the interval [1, 3]. As the degree value increases, the number of knowledge points gradually decreases, with a significant reduction at a degree value of 5.
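A degree distribution of this kind can be computed directly from the triples. The sketch below treats the graph as undirected (each triple adds one to the degree of both its head and its tail), which is an assumption about how Figure 7 was produced; the triples are a few examples taken from the text.

```python
from collections import Counter


def degree_distribution(triples):
    """Return (degree per entity, number of entities per degree value)."""
    degree = Counter()
    for head, _relation, tail in triples:
        degree[head] += 1
        degree[tail] += 1
    return degree, Counter(degree.values())


triples = [
    ("element", "predecessor-successor", "set"),
    ("set", "predecessor-successor", "function"),
    ("set", "inclusion", "finite set"),
    ("set", "inclusion", "empty set"),
    ("function", "inclusion", "trigonometric function"),
]
degree, distribution = degree_distribution(triples)
```

Even in this tiny sample most nodes have degree 1 while “set” acts as a hub, the same qualitative shape as a heavy-tailed degree distribution.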

In the experiments, the BiLSTM-CRF model and the BiLSTM-ResNet-CRF model are first run separately on the test dataset. Denote the sets of named entities identified by the two models as $A = \{a_{1}, a_{2}, \dots, a_{m}\}$ and $B = \{b_{1}, b_{2}, \dots, b_{n}\}$, respectively, and denote the difference sets as $A - B = \{a_{i} \mid a_{i} \in A \text{ and } a_{i} \notin B\}$ and $B - A = \{b_{j} \mid b_{j} \in B \text{ and } b_{j} \notin A\}$. Let $d(a_{i})$ and $d(b_{j})$ be the degree values of the nodes $a_{i} \in A$ and $b_{j} \in B$ in the knowledge graph, and let $d(A - B)$ and $d(B - A)$ be the total degree values of the difference sets $A - B$ and $B - A$, respectively. The difference sets $A - B$ and $B - A$ contain 16 and 26 entities, with maximum node degrees of 6 and 7 and total degree values $d(A - B) = 39$ and $d(B - A) = 93$, respectively. These differences indicate that the BiLSTM-CRF model identified fewer entities than our model, and that the entities found only by our model carry a higher total degree, which shows the advantage of our model in identifying important entities. These results not only validate the effectiveness of the model but also support its practical application in optimizing named entity recognition models.
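The comparison reduces to two set differences and a degree sum. In the sketch below the entity sets and degree values are hypothetical; only the procedure follows the definitions of $A - B$, $B - A$, and $d(\cdot)$ given above.

```python
def difference_degree(own, other, degree):
    """Entities found only in `own`, and their total degree in the graph."""
    only_own = set(own) - set(other)
    return only_own, sum(degree.get(e, 0) for e in only_own)


# Hypothetical model outputs (A: BiLSTM-CRF, B: BiLSTM-ResNet-CRF).
A = {"set", "function", "ellipse"}
B = {"set", "function", "eccentricity", "focal distance"}

# Hypothetical node degrees in the knowledge graph.
degree = {"set": 4, "function": 2, "ellipse": 3,
          "eccentricity": 2, "focal distance": 2}

a_only, d_a_minus_b = difference_degree(A, B, degree)
b_only, d_b_minus_a = difference_degree(B, A, degree)
```

A larger total degree for $B - A$ than for $A - B$ means the entities recovered only by the second model are, in aggregate, more connected in the graph.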

5.6. Constructing Knowledge Graph from Text

A passage of High School Mathematics text is taken as an example to show how the BiLSTM-ResNet-CRF model is used to construct a knowledge graph in the subject domain. The example text is “The general research objects are collectively referred to as elements, and the whole composed of some elements is called a set. Increasing functions and decreasing functions are common concepts in mathematics, which are used to describe the change of a function at different points. The definition of the eccentricity of an ellipse is the ratio of the focal distance to the major axis of the ellipse. The values of discrete random variables are finite or countable, and common discrete random variables include Bernoulli random variables, Binomial distribution random variables, Poisson random variables.”

The named entity recognition model extracts 4 types of entities from the above text: Preparatory knowledge = {elements, set}, Numbers and algebra = {increasing functions, decreasing functions, function}, Geometry = {ellipse, eccentricity, focal distance, major axis}, and Probability and statistics = {discrete random variables, Bernoulli random variables, Binomial distribution random variables, Poisson random variables}. The extracted entities are naturally regarded as nodes in the knowledge graph. The relations between entities are extracted according to the 4 types of relations defined in Section 5.1. Taking the “set” node as an example, “element” and “set” have a predecessor–successor relation: “element” is a predecessor of “set”, and “set” is the successor of “element”. “set” and “function” also have a predecessor–successor relation. “function” and “trigonometric function” have an inclusion relation, and “complementary set” and “intersection” have a parallel relation. The other relations of entities with “set” are shown in Figure 8 in 4 different colors.

5.7. Knowledge Graph Practical Utility Evaluation

5.7.1. Knowledge Graph QA Evaluation

To evaluate the practical utility of the constructed high school mathematics knowledge graph in downstream tasks, a rule-based intelligent question answering system is built, which uses the knowledge graph as its sole knowledge source. The system processes natural language questions through three core steps: entity recognition, relation type matching, and triple retrieval and answer generation. Specifically, it first extracts the core query entity from the input question using the same BiLSTM-ResNet-CRF model proposed in this work, then identifies the potential relation type based on question keywords and syntactic patterns, queries the knowledge graph for matching triples, and finally organizes the retrieved triples into natural language answers.
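The three steps can be sketched as a toy pipeline. Everything here is illustrative: the keyword rules and function name are hypothetical, the query entity is passed in directly instead of being extracted by the BiLSTM-ResNet-CRF model, and answers are returned as entity lists rather than natural language sentences.

```python
# A few triples taken from the examples in the text.
TRIPLES = [
    ("element", "predecessor-successor", "set"),
    ("set", "inclusion", "finite set"),
    ("set", "inclusion", "empty set"),
]

# Hypothetical keyword-to-relation rules (step 2: relation type matching).
RULES = [
    ("before", "predecessor-successor"),
    ("types of", "inclusion"),
]


def answer(question, entity):
    """Match a relation type by keyword, then retrieve matching triples."""
    q = question.lower()
    relation = next((rel for kw, rel in RULES if kw in q), None)
    if relation is None:
        return []                            # no rule fired
    if relation == "predecessor-successor":
        # Prerequisite question: return predecessors of the query entity.
        return [h for h, r, t in TRIPLES if r == relation and t == entity]
    # Inclusion question: return children of the query entity.
    return [t for h, r, t in TRIPLES if r == relation and h == entity]


prerequisites = answer("What should be learned before set?", "set")
subtypes = answer("What are the types of set?", "set")
```

Because the knowledge graph is the sole knowledge source, any missing triple translates directly into a missing answer, which is why QA performance is a reasonable proxy for graph quality.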

A test set containing 40 real-world high school mathematics questions is designed. The questions are carefully selected to cover all four entity types and all four relation types defined in our knowledge graph, and they are listed in Appendix A. The questions were categorized into four types based on the knowledge they query: definition, prerequisite, inclusion, and attribute questions. Each question type contains 10 questions. We independently labeled the standard answers for all questions. Any disagreements between the two teachers were resolved through in-depth discussion to ensure 100% accuracy of the ground truth.

The overall and category-specific evaluation results are presented in Table 12. The results show that the QA system achieves an overall accuracy of 91.35% and an overall recall of 88.52%, demonstrating that our constructed knowledge graph can effectively support basic intelligent question answering tasks. Among the four question types, attribute questions achieve the highest accuracy (95.45%) and recall (90.84%), which is consistent with the high performance of our relation extraction model on attribute relations (F1-score of 95.73%, as shown in Table 8). The slightly lower performance on prerequisite questions is mainly due to the fact that some concepts have multiple indirect predecessor relations that are not fully captured in the current knowledge graph, which is limited by the scope of our dataset.

5.7.2. Comparison with Expert-Constructed Knowledge Graph

To further validate the quality of our automatically constructed knowledge graph, we conducted a comparative evaluation against a manually constructed expert knowledge graph, which is recognized as the gold standard for domain knowledge graph quality assessment. Drawing on our experience in curriculum design, we manually constructed a small-scale expert knowledge graph focusing on the “Functions” module, which is one of the most important and foundational modules in high school mathematics. We used the same Phoenix Education Publishing Edition high school mathematics textbooks and domain supplementary materials that were used to construct our automatic KG, and defined entities and relations strictly following the same ontology rules specified in Section 5.1. The resulting expert KG contains 87 triples, covering 42 core entities and 45 core relations within the Functions module. We evaluated the automatically constructed KG in terms of knowledge coverage and semantic accuracy.

On the one hand, we compared the entities and relations in our automatic KG with those in the expert KG. The results show that our automatic KG covers 92.86% (39/42) of the core entities and 88.89% (40/45) of the core relations present in the expert KG. The missing entities and relations are mainly rare extended concepts that are not included in the main body of standard textbooks but are only mentioned in optional supplementary reading materials. On the other hand, we randomly sampled 100 triples from our automatically constructed KG, and independently judged whether each triple was semantically correct and educationally meaningful. We judged that 95 out of the 100 triples were correct, resulting in an overall triple accuracy of 95.0%. The five incorrect triples were mainly due to misclassification of relation types, e.g., misclassifying a parallel relation between “increasing functions” and “decreasing functions” as a predecessor–successor relation. The error details are shown in Table 13.
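The coverage and accuracy figures follow from simple ratios, sketched below. The entity identifiers are placeholders that reproduce only the reported counts (39 of 42 entities, 40 of 45 relations, 95 of 100 sampled triples); they are not the actual entities of the expert KG.

```python
def coverage(expert_items, auto_items):
    """Fraction of expert-KG items that also appear in the automatic KG."""
    expert_items = set(expert_items)
    return len(expert_items & set(auto_items)) / len(expert_items)


# Placeholder identifiers sized to match the reported counts.
expert_entities = {f"e{i}" for i in range(42)}
auto_entities = {f"e{i}" for i in range(39)} | {"x1", "x2"}   # extras are ignored

expert_relations = {f"r{i}" for i in range(45)}
auto_relations = {f"r{i}" for i in range(40)}

entity_coverage = round(100 * coverage(expert_entities, auto_entities), 2)
relation_coverage = round(100 * coverage(expert_relations, auto_relations), 2)
triple_accuracy = 100 * 95 / 100
```

Coverage is measured relative to the expert KG only, so additional entities in the automatic KG neither raise nor lower it; their quality is instead captured by the sampled triple accuracy.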

These results demonstrate that our automatically constructed knowledge graph has high coverage of core knowledge points and high semantic accuracy, which is comparable to the quality of a manually constructed expert KG for the same scope of knowledge. This validates the effectiveness of our proposed knowledge graph construction framework in generating high-quality domain knowledge graphs with minimal manual intervention.

6. Discussion and Conclusions

Recently, the explosive growth of study information on the internet has forced learners to invest significant time and energy, mainly because of knowledge redundancy and unclear expression of the information. Moreover, the unstructured knowledge within numerous subject fields has not been effectively integrated, which hinders the efficient and broad application of knowledge. The subject knowledge graph is an effective learning tool for extracting knowledge within a subject. The structured knowledge helps to connect isolated knowledge into a system, and the knowledge graph helps students understand the connections between knowledge points and their logical relationships. The degree of an entity indicates how central the corresponding knowledge is. The framework proposed in this work for constructing knowledge graphs in disciplinary fields can efficiently visualize knowledge interconnections, and the graph can help learners quickly locate the knowledge they require.

Taking the local map of Figure 3 as an example, an important node “set” is displayed clearly. In the knowledge graph, there are various relationships such as predecessor–successor, inclusion, and parallel relationships, which are presented to learners in a systematic way. For instance, before studying “set”, learners need to first understand the concept of “element”, because “element” is the fundamental unit that constitutes “set”. Furthermore, the concept of “set” can be divided into subcategories such as “finite set”, “infinite set”, and “empty set”. These classifications not only demonstrate the different types of “set”, but also reflect the hierarchical structure of the concept. By employing this structured approach, learners can gradually build a comprehensive understanding of the concept of “set”, and the structure of the knowledge network helps them visualize their own learning paths.

While the combination of residual structures and BiLSTM-CRF is not novel in general NER literature, our work makes a unique contribution by systematically adapting this architecture to the specific requirements of subject domain knowledge graph construction. We have demonstrated through theoretical analysis and empirical experiments that the integration of ResNet blocks addresses two critical limitations of BiLSTM-CRF models in educational text processing: the inability to accurately capture local contextual features for multi-word technical terms and the instability of training on small-scale domain datasets. The domain-specific nature of our model design distinguishes it from previous generic architectural modifications and provides a practical solution for constructing high-quality subject knowledge graphs.

In this work, an improved model that introduces residual blocks into BiLSTM-CRF is presented. Adding residual blocks to BiLSTM-CRF captures the local contextual information between words and extracts features between adjacent words. The model can therefore more effectively identify possible continuous word structures, which is crucial for entity recognition. In addition, the batch normalization operation included in the residual blocks helps stabilize the training process of the network. These advantages of the BiLSTM-ResNet-CRF model enhance the accuracy of entity recognition. Moreover, the BiLSTM-ResNet-CRF model can also better identify entities with more knowledge relationships. The model is well suited to improving question-answering systems: with more accurate entity recognition, knowledge graphs can be enriched, and the response capabilities of question-answering systems can also be improved. Based on the above analysis, the model can achieve better performance in the field of natural language processing.

Although textbook data accounts for the largest proportion (61.66%) of our dataset, non-textbook data accounts for a substantial 38.34%. This multi-source fusion structure ensures that the model not only learns the standardized knowledge system from textbooks but also adapts to the diverse expression forms of mathematical knowledge in real educational scenarios, improving its generalization ability. However, we also acknowledge that the proportion of web data (29.89%) is still lower than that of textbook data. This may lead to the model being more sensitive to the standardized expression forms in textbooks and performing slightly worse on highly colloquial or non-standard mathematical expressions. In future work, we will further expand the scale of web data and introduce more diverse educational text resources (such as student question-and-answer forums and video transcriptions) to enhance the model’s robustness and generalization ability in complex application scenarios.

7. Limitation and Future Work

Introducing residual blocks effectively enhances the model’s ability to capture contextual information, but it also increases the time complexity. The time complexity of the BiLSTM-CRF model is $O(T \times (n + d^{2} + nd + dk)) + O(T \times k^{2})$, where $T$ is the sequence length, $n$ is the feature dimension of the word vector, $d$ is the dimension of the hidden state, and $k$ is the number of output labels. Compared with the BiLSTM-CRF model, the additional time complexity of the improved model is $O(R \times I \times d^{2} + T \times d)$, where $R = \lfloor \frac{T - I + 2P}{L} \rfloor + 1$, $I$ is the kernel size, $P$ is the padding size, and $L$ is the stride. When dealing with large-scale datasets, the method may therefore require substantial computational resources. All experiments were conducted on an NVIDIA RTX 4090 GPU workstation. On the high school mathematics dataset, the training time of BiLSTM-CRF was 1.27 h, while BiLSTM-ResNet-CRF took 1.72 h (a 35.4% increase); on the Resume dataset, the times were 2.15 h and 2.89 h (a 34.4% increase), respectively. Under our settings, $R$ equals the sequence length $T$, which simplifies the incremental complexity to $O(T \times d^{2} + T \times d)$ when the kernel size $I$ is treated as a constant. For offline knowledge graph construction tasks, this moderate time overhead is acceptable, as the 2.93-percentage-point F1 improvement reduces the entity recognition error rate by 15%, significantly lowering subsequent manual verification costs. The convolution operation can effectively capture local contextual information, but its ability to model long-range dependencies is relatively limited. In the named entity recognition task, the recognition of some entities may rely on more distant context, which the convolution operation may not adequately capture. Although BiLSTM alleviates this problem to some extent, the model’s ability to understand the overall semantics may still be weakened. Additionally, our current evaluation is mainly confined to the intermediate technical indicators of the knowledge extraction modules (precision, recall, and F1-score for named entity recognition and relation extraction), and we have not conducted a systematic end-to-end evaluation of the practicality and overall quality of the final high school mathematics knowledge graph.
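The output-length formula and the reported training-time overheads can be checked with a short Python sketch. The hyperparameters used here (sequence length 50, kernel size 3, padding 1, stride 1, hidden dimension 128) are illustrative assumptions rather than values stated in this section; they are simply one configuration under which $R$ equals $T$. The function names are hypothetical as well.

```python
def conv_output_length(T: int, I: int, P: int, L: int) -> int:
    """R = floor((T - I + 2P) / L) + 1 for a 1-D convolution."""
    return (T - I + 2 * P) // L + 1


def incremental_ops(T: int, I: int, P: int, L: int, d: int) -> int:
    """Operation count behind O(R * I * d^2 + T * d), constants ignored."""
    R = conv_output_length(T, I, P, L)
    return R * I * d * d + T * d


def relative_increase(base: float, new: float) -> float:
    """Percentage increase of `new` over `base`."""
    return (new - base) / base * 100.0


# Assumed (hypothetical) settings: "same" padding with stride 1 keeps R == T.
T, I, P, L, d = 50, 3, 1, 1, 128
R = conv_output_length(T, I, P, L)
print(R == T)  # True

# Rough count of extra operations per sequence under these assumptions.
print(incremental_ops(T, I, P, L, d))

# Training times (hours) reported in the text above.
print(round(relative_increase(1.27, 1.72), 1))  # 35.4
print(round(relative_increase(2.15, 2.89), 1))  # 34.4
```

With a stride larger than 1 or without padding, $R$ would be smaller than $T$ and the convolution term would shrink accordingly.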

Therefore, addressing these shortcomings while further improving model performance is a direction for future research. For entity alignment and entity linking, unified semantic representation methods based on BERT have shown strong capabilities in integrating entity surface information, contextual semantics, and external description knowledge. Such methods can provide better end-to-end disambiguation and context modeling, which helps to further improve the quality of entity alignment. In future work, we will explore a unified BERT-based entity linking method to replace or optimize the existing two-layer strategy, so as to enhance the disambiguation performance and context awareness of the knowledge fusion module.

Most importantly, we are committed to conducting systematic teaching empirical research to comprehensively evaluate the practical educational value of the constructed high school mathematics knowledge graph. The detailed research design is as follows. For the experimental setup, we will adopt a randomized controlled trial design, recruiting 120 first-year high school students and randomly dividing them into an experimental group and a control group, with 60 students in each group. The two groups will be matched in terms of pre-test mathematics scores, learning interest, and learning habits to ensure baseline equivalence. For intervention, the experimental group will use the knowledge graph-assisted learning system for 8 weeks of mathematics learning, which provides functions such as knowledge navigation, intelligent question answering, and personalized learning path recommendation based on our constructed knowledge graph. The control group will receive traditional classroom teaching without knowledge graph support. For evaluation metrics, we will evaluate the impact of the knowledge graph from three dimensions: (1) knowledge mastery: measured by pre-test and post-test scores of standardized mathematics tests; (2) learning efficiency: measured by the time taken to complete the same unit learning tasks and exercise sets; and (3) learning interest: measured by a validated mathematics learning interest scale before and after the intervention. For data analysis, we will use independent samples t-tests to compare the post-test scores, learning efficiency, and learning interest between the two groups, and use paired samples t-tests to analyze the within-group changes from pre-test to post-test. We will also conduct subgroup analysis to explore whether the effectiveness of the knowledge graph varies for students with different initial learning levels.
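As a rough illustration of the planned analysis, the following Python sketch computes the two test statistics named above using only the standard library. The scores are small placeholder lists invented for illustration, not study data, and the function names are hypothetical. The independent-samples version assumes equal variances (pooled Student's t); whether that assumption will hold for the real groups is not known here.

```python
from math import sqrt
from statistics import mean, stdev


def independent_t(a, b):
    """Student's t statistic for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def paired_t(pre, post):
    """Paired-samples t statistic on the post-minus-pre differences."""
    diffs = [y - x for x, y in zip(pre, post)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Placeholder scores (NOT real data), only to show how the functions are called.
experimental_post = [78, 85, 81, 90, 74]
control_post = [72, 80, 77, 83, 70]
experimental_pre = [70, 79, 76, 82, 69]

print(independent_t(experimental_post, control_post))  # between-group comparison
print(paired_t(experimental_pre, experimental_post))   # within-group change
```

The statistics would then be compared against the t distribution with na + nb - 2 and n - 1 degrees of freedom, respectively. In practice, `scipy.stats.ttest_ind` and `scipy.stats.ttest_rel` return both the statistic and the p-value.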

Through this rigorous empirical study, we aim to quantitatively verify whether the use of the knowledge graph can effectively improve students’ knowledge mastery, learning efficiency, and learning interest, providing solid evidence for the practical application value of educational knowledge graphs in real teaching scenarios.

Author Contributions

Conceptualization, Y.M.; methodology, Y.M. and L.C.; software, L.C.; validation, L.C. and L.S.; formal analysis, Z.L. and S.Z.; resources, L.S. and Y.M.; data curation, L.C.; writing—original draft preparation, Y.M., L.C., L.S. and Z.L.; writing—review and editing, Y.M., L.S. and S.Z.; visualization, L.C.; supervision, L.S.; project administration, Y.M.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [72304101, 72171136], the Guangdong Provincial Philosophy and Social Science Planning Project [GD24YGL22], the Guangdong Basic and Applied Basic Research Foundation [2024A1515011559], and the Natural Science Foundation of Shandong Province [ZR2024QG014, ZR2025QC727, ZR2025MS1152].

Data Availability Statement

The High School Mathematics datasets used in this work can be downloaded from https://github.com/jiesutd/LatticeLSTM (accessed on 26 September 2025).

Acknowledgments

We sincerely appreciate the efficient and professional handling of our manuscript by the journal editorial team throughout the submission and review process. We would also like to express our sincere gratitude to the anonymous reviewers for their rigorous review work and valuable comments and suggestions. Their insightful criticisms and professional revisions helped us identify deficiencies in the research, optimize the research framework, polish the academic expression, and significantly improve the overall quality, rigor, and completeness of this paper. Lu Chen used the ERNIE Bot large language model (website: https://yiyan.baidu.com/) only to generate supplementary descriptions for a small number of entities that cannot be linked to Baidu Baike. In the high school mathematics case, only 57 out of 395 entities (14.43%) use LLM-generated descriptions. All generated descriptions were fully manually verified by our research team members to ensure 100% accuracy of conceptual and factual content, eliminating hallucinations and unreliable information. Since generating entity descriptions is a standardized factual task, the risk of LLM hallucination is extremely low. The verified descriptions are only used for entity alignment in knowledge fusion to ensure that the knowledge graph remains structured and verifiable.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

This test set covers 4 entity types (Preparatory knowledge, Numbers and algebra, Geometry, and Probability and statistics) and 4 question types (Definition, Prerequisite, Inclusion, and Attribute), with 10 questions per question type, fully matching the 4 relation types defined in the paper.

Table A1. Definition questions.

No.	Question Content	Corresponding Entity Type
1	What is a set?	Preparatory knowledge
2	What is an element?	Preparatory knowledge
3	What is a function?	Numbers and algebra
4	What is an arithmetic sequence?	Numbers and algebra
5	What is an ellipse?	Geometry
6	What is a hyperbola?	Geometry
7	What is a discrete random variable?	Probability and statistics
8	What is a binomial distribution?	Probability and statistics
9	What is a mapping?	Preparatory knowledge
10	What is a parabola?	Geometry

Table A2. Prerequisite questions.

No.	Question Content	Corresponding Entity Type
11	What prior knowledge is required to learn about sets?	Preparatory knowledge
12	What concepts need to be mastered before learning functions?	Numbers and algebra
13	What prerequisite knowledge is needed to learn exponential functions?	Numbers and algebra
14	What should be understood first before learning the eccentricity of an ellipse?	Geometry
15	What knowledge is required to master the asymptotes of a hyperbola?	Geometry
16	What preparatory knowledge is needed to learn probability?	Probability and statistics
17	What concept needs to be understood first before learning the binomial distribution?	Probability and statistics
18	What foundation is required to learn trigonometric functions?	Numbers and algebra
19	What knowledge of plane geometry is needed to learn solid geometry?	Geometry
20	What should be understood first before learning permutations and combinations?	Probability and statistics

Table A3. Inclusion questions.

No.	Question Content	Corresponding Entity Type
21	What are the common types of sets?	Preparatory knowledge
22	What do basic elementary functions include?	Numbers and algebra
23	What are the main types of sequences?	Numbers and algebra
24	What figures do conic sections include?	Geometry
25	What are the basic types of spatial geometric solids?	Geometry
26	What are the common distributions of discrete random variables?	Probability and statistics
27	What do statistics mainly include?	Probability and statistics
28	What are the basic operations of sets?	Preparatory knowledge
29	What basic functions do trigonometric functions include?	Numbers and algebra
30	What are the positional relationships between a line and a plane?	Geometry

Table A4. Attribute questions.

No.	Question Content	Corresponding Entity Type
31	How is the eccentricity of an ellipse defined?	Geometry
32	What is the relationship between the focal distance and the major axis length of an ellipse?	Geometry
33	What is the general term formula of an arithmetic sequence?	Numbers and algebra
34	What is the sum formula of the first n terms of a geometric sequence?	Numbers and algebra
35	What are the three basic properties of sets?	Preparatory knowledge
36	What are the three elements of a function?	Numbers and algebra
37	What is the expected value formula of the binomial distribution?	Probability and statistics
38	What are the standard forms of the equation of a parabola?	Geometry
39	What is the difference between mutually exclusive events and complementary events?	Probability and statistics
40	What are the domain and range of an exponential function, respectively?	Numbers and algebra

References

Pfitzner, F.; Braun, A.; Borrmann, A. From data to knowledge: Construction process analysis through continuous image capturing, object detection, and knowledge graph creation. Autom. Constr. 2024, 164, 105451.
Chen, Z.; Wan, Y.; Liu, Y.; Valera-Medina, A. A knowledge graph-supported information fusion approach for multi-faceted conceptual modelling. Inf. Fusion 2024, 101, 101985.
Fettach, Y.; Ghogho, M.; Benatallah, B. Knowledge graphs in education and employability: A survey on applications and techniques. IEEE Access 2022, 10, 80174–80183.
Wang, H.; Yang, J.; Yang, L.T.; Gao, Y.; Ding, J.; Zhou, X.; Liu, H. Mvtucker: Multi-view knowledge graphs representation learning based on tensor tucker model. Inf. Fusion 2024, 106, 102249.
Tang, X.; Feng, Z.; Xiao, Y.; Wang, M.; Ye, T.; Zhou, Y.; Meng, J.; Zhang, B.; Zhang, D. Construction and application of an ontology-based domain-specific knowledge graph for petroleum exploration and development. Geosci. Front. 2023, 14, 101426.
Gan, L.; Ye, B.; Huang, Z.; Xu, Y.; Chen, Q.; Shu, Y. Knowledge graph construction based on ship collision accident reports to improve maritime traffic safety. Ocean Coast. Manag. 2023, 240, 106660.
Guo, L.; Li, X.; Yan, F.; Lu, Y.; Shen, W. A method for constructing a machining knowledge graph using an improved transformer. Expert Syst. Appl. 2024, 237, 121448.
Wen, M.; Qiu, Q.; Zheng, S.; Ma, K.; Zheng, S.; Xie, Z.; Tao, L. Construction and application of a multilevel geohazard domain ontology: A case study of landslide geohazards. Appl. Comput. Geosci. 2023, 20, 100134.
Dang, F.R.; Tang, J.T.; Pang, K.Y.; Wang, T.; Li, S.S.; Li, X. Constructing an educational knowledge graph with concepts linked to Wikipedia. J. Comput. Sci. Technol. 2021, 36, 1200–1211.
Saravanan, K.S.; Bhagavathiappan, V. Innovative agricultural ontology construction using NLP methodologies and graph neural network. Eng. Sci. Technol. Int. J. 2024, 52, 101675.
Kong, S.; Huang, X.; Zhong, X.; Yang, M. Entity recognition method for airborne products metrological traceability knowledge graph construction. Measurement 2024, 225, 114032.
Fan, Z.; Chen, C. CuPe-KG: Cultural perspective–based knowledge graph construction of tourism resources via pretrained language models. Inf. Process. Manag. 2024, 61, 103646.
Yang, P.; Wang, H.; Huang, Y.; Yang, S.; Zhang, Y.; Huang, L.; Zhang, Y.; Wang, G.; Yang, S.; He, L.; et al. LMKG: A large-scale and multi-source medical knowledge graph for intelligent medicine applications. Knowl.-Based Syst. 2024, 284, 111323.
Ji, B.; Liu, R.; Li, S.; Tang, J.; Yu, J.; Li, Q.; Xu, W. A BiLSTM-CRF Method to Chinese Electronic Medical Record Named Entity Recognition. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 21–23 December 2018; pp. 1–6.
Qiu, Q.; Xie, Z.; Wu, L.; Tao, L.; Li, W. BiLSTM-CRF for geological named entity recognition from the geoscience literature. Earth Sci. Inform. 2019, 12, 565–579.
MAK, R.S.; Bijaksana, M.A.; Huda, A.F. Person entity recognition for the Indonesian Qur’an translation with the approach hidden Markov model-viterbi. Procedia Comput. Sci. 2019, 157, 214–220.
Lv, C.; Pan, D.; Li, Y.; Li, J.; Wang, Z. A novel Chinese entity relationship extraction method based on the bidirectional maximum entropy Markov model. Complexity 2021, 2021, 6610965.
Lee, W.; Kim, K.; Lee, E.Y.; Choi, J. Conditional random fields for clinical named entity recognition: A comparative study using Korean clinical texts. Comput. Biol. Med. 2018, 101, 7–14.
Yan, W.; Cao, H.; Cui, Z. Tibetan text classification based on RNN. In Proceedings of the 2021 4th International Conference on Advanced Algorithms and Control Engineering (ICAACE 2021), Sanya, China, 29–31 January 2021; IOP Publishing: Bristol, UK, 2021; Volume 1848, p. 012139.
Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019.
Krichen, M.; Mihoub, A. Long short-term memory networks: A comprehensive survey. AI 2025, 6, 215.
Lample, G.; Ballesteros, M.; Subramanian, S.; Kawakami, K.; Dyer, C. Neural architectures for named entity recognition. arXiv 2016, arXiv:1603.01360.
Wang, T.; Yang, W.; Wu, T.; Yang, C.; Liang, J.; Wang, H.; Li, J.; Xiang, D.; Zhou, Z. Joint entity and relation extraction with fusion of multi-feature semantics. J. Intell. Inf. Syst. 2025, 63, 21–42.
Li, R.; La, K.; Lei, J.; Huang, L.; Ouyang, J.; Shu, Y.; Yang, S. Joint extraction model of entity relations based on decomposition strategy. Sci. Rep. 2024, 14, 1786.
Zhu, W.; Liu, J.; Xu, J.; Chen, Y.; Zhang, Y. Improving low-resource named entity recognition via label-aware data augmentation and curriculum denoising. In Proceedings of the China National Conference on Chinese Computational Linguistics, Huhhot, China, 13–15 August 2021; Springer Nature: London, UK, 2021; pp. 355–370.
Yaseen, U.; Langer, S. Data Augmentation for Low-Resource Named Entity Recognition Using Backtranslation. In Proceedings of the 18th International Conference on Natural Language Processing, Virtual, 16–19 December 2021; NLP Association of India: Chennai, India, 2021; pp. 352–358.
Liu, Y.; Wei, S.; Huang, H.; Lai, Q.; Li, M.; Guan, L. Naming entity recognition of citrus pests and diseases based on the BERT-BiLSTM-CRF model. Expert Syst. Appl. 2023, 234, 121103.
Cherifi, F.; Omar, M.; Amroun, K. An efficient biometric-based continuous authentication scheme with HMM prehensile movements modeling. J. Inf. Secur. Appl. 2021, 57, 102739.
Geng, Z.; Li, J.; Han, Y.; Zhang, Y. Novel target attention convolutional neural network for relation classification. Inf. Sci. 2022, 597, 24–37.
Liu, Z.; Li, H.; Wang, H.; Liao, Y.; Liu, X.; Wu, G. A novel pipelined end-to-end relation extraction framework with entity mentions and contextual semantic representation. Expert Syst. Appl. 2023, 228, 120435.
Fang, W.; Luo, H.; Xu, S.; Love, P.E.; Lu, Z.; Ye, C. Automated text classification of near-misses from safety reports: An improved deep learning approach. Adv. Eng. Inform. 2020, 44, 101060.
Wei, Z.; Su, J.; Wang, Y.; Tian, Y.; Chang, Y. A novel cascade binary tagging framework for relational triple extraction. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 1476–1488.
Zhuang, L.; Fei, H.; Hu, P. Knowledge-enhanced event relation extraction via event ontology prompt. Inf. Fusion 2023, 100, 101919.
Alshemaimri, B.; Badshah, A.; Daud, A.; Bukhari, A.; Alsini, R.; Alghushairy, O. Regional computing approach for educational big data. Sci. Rep. 2025, 15, 7619.
Yang, Y.; Zhu, Y.; Jian, P. Application of knowledge graph in water conservancy education resource organization under the background of big data. Electronics 2022, 11, 3913.
Li, X.; Chen, M. Management course knowledge graph construction based on ontology. In Proceedings of the 2022 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Niagara Falls, ON, Canada, 17–20 November 2022; IEEE: New York, NY, USA, 2022; pp. 644–646.
Nair, L.S.; Shivani, M.; Cheriyan, S.J. Enabling remote school education using knowledge graphs and deep learning techniques. Procedia Comput. Sci. 2022, 215, 618–625.
Zhang, B.; Zou, G.; Qin, D.; Ni, Q.; Mao, H.; Li, M. RCL-Learning: ResNet and convolutional long short-term memory-based spatiotemporal air pollutant concentration prediction model. Expert Syst. Appl. 2022, 207, 118017.
Wu, S.; He, Y. Enriching pre-trained language model with entity information for relation classification. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2361–2364.
Ijebu, F.F.; Liu, Y.; Sun, C.; Usip, P.U. Soft cosine and extended cosine adaptation for pre-trained language model semantic vector analysis. Appl. Soft Comput. 2025, 169, 112551.
Zhang, Y.; Yang, J. Chinese NER using lattice LSTM. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Melbourne, Australia, 15–20 July 2018; pp. 1554–1564.

Figure 1. The framework. The dotted red boxes on the left and at the bottom right of the figure mark the main innovative parts of this work, and the blue dotted box marks the previous methods.

Figure 2. (a) The named entity recognition model with six components, where Conv, BN, and ReLU in the residual block denote the convolution operation, batch normalization, and the rectified linear unit function, respectively. (b) An example of named entity recognition based on the model in panel (a). “奇”, “函”, and “数” are the Chinese characters of “odd functions”.

Figure 3. (a) The internal diagram of LSTM presented by reference [27]. The symbols × and + represent multiplication and addition, respectively, and $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. (b) An example obtained with LSTM.

Figure 4. Relation extraction model. “集”, “合”, “是”, “由”, “元”, “素”, “组”, “成” and “的” are Chinese decompositions of “The Set is Composed of Elements”.

Figure 5. The scores of BERT model on relation extraction with different epochs.

Figure 6. The constructed knowledge graph of High School Mathematics. The four different colors represent the four entity types. Because the dataset used in this paper is a Chinese dataset, the entity names are all Chinese entities extracted from the dataset.

Figure 7. The degree distribution of the knowledge graph shown in Figure 6.

Figure 8. A local view of the knowledge graph, taking “Set” as an example. Because the dataset used in this paper is a Chinese dataset, the entity names are all Chinese entities extracted from the dataset. We provide English translations next to each Chinese entity.

Table 1. Comparison of advantages and disadvantages of various word vector models.

Features	One-Hot	Word2vec	GloVe	BERT
Context information	None	Local context	Global context	Dynamic context
Vector space	Sparse	Dense	Dense	Dense
Semantic relationship	None	Capable of capturing	Capable of capturing	Captures richer relationships
Training data dependency	None	Extensive text training	Extensive text training	Can be used directly

Table 2. Detailed scale statistics of multi-source data sources.

Data Source	Number of Source Files	Total Words (Thousands)	Number of Annotated Triples	Percentage
Textbook	5 books	1209	394	61.66%
Authoritative Web Pages	227 pages	436	191	29.89%
Domain Supplementary Materials	11 documents	178	54	8.45%
Total	-	1734	639	100%

Note: The high school mathematics dataset used in this study is derived from the publicly available dataset released by Zhang and Yang (2018) [41]. The original dataset did not provide detailed raw data scale statistics. The above statistics are estimated based on the original data collection description and in-depth character count analysis of the dataset content.

Table 3. Entity counts for named entity recognition on the High School Mathematics dataset.

Entity Types	Training Set	Test Set
Preparatory knowledge	695	84
Numbers and algebra	879	81
Geometry	1243	168
Probability and statistics	296	43

Table 4. Text annotation for named entity recognition.

English text	The Set is Composed of Elements
Chinese text (by character)	集	合	由	元	素	组	成
Entity label annotation	B-Pre	I-Pre	O	B-Pre	I-Pre	O	O

“集”, “合”, “由”, “元”, “素”, “组”, and “成” are the Chinese characters of the sentence “The Set is Composed of Elements”.
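The BIO annotation in Table 4 can be turned back into entity spans with a short Python sketch. The function name `decode_bio` and the decoding rules for malformed sequences (a stray I- tag without a preceding B- tag is ignored) are illustrative assumptions, not part of the annotation scheme described in the paper.

```python
def decode_bio(tokens, tags):
    """Group character-level BIO tags into (entity_text, entity_type) pairs."""
    entities = []
    current_chars, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_chars and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_chars.append(token)
        else:
            # "O" tag or an inconsistent I- tag: close the open entity, if any.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


# The annotated sentence from Table 4.
tokens = ["集", "合", "由", "元", "素", "组", "成"]
tags = ["B-Pre", "I-Pre", "O", "B-Pre", "I-Pre", "O", "O"]
print(decode_bio(tokens, tags))  # [('集合', 'Pre'), ('元素', 'Pre')]
```

Here “集合” (set) and “元素” (element) are both recovered as entities of type Pre, matching the labels in the table.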

Table 5. Text annotation for relation extraction.

Text	Head Entity	Tail Entity	Relation
The set is composed of elements	Elements	Set	Predecessor–successor

Table 6. The results of comparison experiments.

Models	Datasets	Precision	Recall	F1
BiLSTM	High School Mathematics	76.50%	79.43%	77.93%
	Resume	94.04%	93.29%	93.66%
BiLSTM-ResNet	High School Mathematics	82.27%	74.16%	78.01%
	Resume	93.76%	95.83%*	94.79%
BiLSTM-CRF	High School Mathematics	78.83%	75.89%	77.33%
	Resume	94.36%	93.74%	94.05%
2-layer BiLSTM-CRF	High School Mathematics	79.12%	76.54%	77.81%
	Resume	94.51%	93.92%	94.21%
3-layer BiLSTM-CRF	High School Mathematics	79.45%	76.87%	78.12%
	Resume	94.62%	94.03%	94.32%
BiLSTM-BiLSTM	High School Mathematics	76.94%	79.55%	78.46%
	Resume	94.42%	95.10%	94.76%
BiLSTM-TextCNN-CRF	High School Mathematics	81.52%	76.18%	78.75%
	Resume	94.42%	95.10%	94.76%
BiLSTM-DilatedCNN-CRF	High School Mathematics	82.79%	76.42%	79.48%
	Resume	94.48%	95.41%	94.94%
ResNet-BiLSTM-CRF	High School Mathematics	82.15%	74.42%	78.09%
	Resume	93.78%	94.72%	94.25%
BiLSTM-ResNet-CRF	High School Mathematics	84.23%*	76.65%*	80.26%*
	Resume	94.55%*	95.73%*	95.14%*

Paired t-tests between each of the other models and BiLSTM-ResNet-CRF confirmed a statistically significant performance difference (p < 0.05) on both datasets. Values marked with an asterisk (*) correspond to the bold font in the table, which represents the highest values of Precision, Recall, and F1 in each dataset.

Table 7. The results of the four types of entity evaluation experiments.

Entity Types	Precision	Recall	F1
Preparatory knowledge	80.09%	67.72%	73.35%
Numbers and algebra	76.13%	75.45%	75.78%
Geometry	88.54%	81.76%	85.01%
Probability and statistics	87.38%	73.17%	79.65%

Table 8. Ablation study of residual block components on the high school mathematics dataset.

Model Variant	Precision	Recall	F1
BiLSTM-CRF	78.83%	75.89%	77.33%
BiLSTM-CRF+Conv	79.56%	76.21%	77.85%
BiLSTM-CRF+Conv+BN	81.72%	75.43%	78.45%
BiLSTM-ResNet-CRF	84.23%	76.65%	80.26%
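The F1 column of Table 8 can be reproduced from the precision and recall columns, since F1 is their harmonic mean. The following Python sketch does this check; the function name `f1_score` and the dictionary layout are illustrative choices, and the numbers are copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for each row of Table 8.
table8 = {
    "BiLSTM-CRF": (78.83, 75.89, 77.33),
    "BiLSTM-CRF+Conv": (79.56, 76.21, 77.85),
    "BiLSTM-CRF+Conv+BN": (81.72, 75.43, 78.45),
    "BiLSTM-ResNet-CRF": (84.23, 76.65, 80.26),
}

for name, (p, r, reported) in table8.items():
    computed = f1_score(p, r)
    # Each computed value agrees with the reported one after rounding.
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")

# F1 gain of the full model over the BiLSTM-CRF baseline, in percentage points.
gain = table8["BiLSTM-ResNet-CRF"][2] - table8["BiLSTM-CRF"][2]
print(round(gain, 2))  # 2.93
```

The last line recovers the 2.93-point F1 gap between BiLSTM-CRF and BiLSTM-ResNet-CRF.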

Table 9. The results of the four types of relation evaluation experiments.

Relation Types	Precision	Recall	F1
Predecessor–successor relation	95.35%	87.23%	91.11%
Inclusion relation	83.33%	88.24%	85.71%
Parallel relation	80.00%	88.89%	84.21%
Attribute relation	94.92%	96.55%	95.73%

Table 10. Threshold sensitivity analysis results for entity alignment.

Threshold	Entity Redundancy Rate	Knowledge Graph Completeness	Triple Accuracy	Downstream QA F1-Score
0.4	2.33%	86.77%	88.83%	86.22%
0.5	4.13%	92.26%	92.55%	89.14%
0.6	5.91%	96.48%	95.28%	91.55%
0.7	11.56%	97.91%	96.01%	88.91%
0.8	19.23%	98.63%	96.84%	85.38%

Table 11. Ablation study results of each component in the knowledge fusion system.

System Variant	Entity Linking Accuracy	Relational Consistency	KG Completeness
Full System	91.52%	88.92%	85.56%
No Entity Alignment	79.33%	79.41%	81.94%
High School Mathematics	88.75%	86.64%	79.55%
Resume	87.34%	85.36%	80.61%
BERT-base	87.25%	86.43%	84.74%

Table 12. Performance of the knowledge graph-based QA system.

Question Type	Number of Questions	Accuracy	Recall
Definition	10	90.11%	87.50%
Prerequisite	10	90.02%	85.71%
Inclusion	10	89.82%	90.03%
Attribute	10	95.45%	90.84%
Overall	40	91.35%	88.52%

Table 13. Detailed analysis of incorrect triples.

No.	Incorrect Triple	Correct Relation	Error Cause
1	(Increasing functions, Decreasing functions, Predecessor–successor)	Parallel	The model incorrectly assumed that learning increasing functions is a prerequisite for learning decreasing functions. In fact, both are basic types of function monotonicity with the same predecessor knowledge “basic properties of functions”.
2	(Ellipse, Focal point, Attribute)	Inclusion	The model misclassified the component of an ellipse as an attribute. In fact, the focal point is an inherent component of an ellipse.
3	(Arithmetic sequence, Common difference, Inclusion)	Attribute	The model misclassified the characteristic parameter of an arithmetic sequence as a sub-concept. In fact, the common difference is a core attribute describing the change rule of the sequence.
4	(Permutation, Combination, Predecessor–successor)	Parallel	The model incorrectly assumed that learning permutations is a prerequisite for learning combinations. In fact, both are basic methods of counting principles with the same predecessor knowledge.
5	(Odd function, Even function, Predecessor–successor)	Parallel	The model incorrectly assumed that learning odd functions is a prerequisite for learning even functions. In fact, both are basic types of function parity with the same predecessor knowledge.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, Y.; Chen, L.; Liu, Z.; Zhou, S.; Song, L. BiLSTM-ResNet-CRF: An Improved Model for Subject Knowledge Graph Construction. Systems 2026, 14, 623. https://doi.org/10.3390/systems14060623

