1. Introduction
As educational informatics enters a new stage of deep integration, knowledge graphs (KGs), a pivotal technology for knowledge organization and representation in artificial intelligence, have emerged as a prominent research frontier, providing new pathways for the high-quality development of smart education [1,2]. Their primary advantage lies in using natural language processing and knowledge fusion techniques to transform unstructured data into structured knowledge frameworks that support inference. In smart education practice, the value of this technology is evident in several key areas. First, curriculum KGs integrate macro-level disciplinary frameworks with micro-level knowledge points to construct hierarchical and logically rigorous knowledge systems [3]. Second, by analyzing multi-dimensional learner profile data, KGs enable the precise diagnosis of students’ cognitive states and knowledge gaps, thereby facilitating personalized learning path planning and resource recommendation. Third, KGs provide data support for instructors to optimize instructional design and implement formative assessments.
Notably, within the end-to-end pipeline of automated KG construction, named entity recognition (NER) serves as the foundational link for knowledge extraction and fusion; its accuracy directly dictates the ultimate quality of the resulting graph.
Given the multiple advantages of curriculum knowledge graphs, many researchers are actively exploring their application potential in teaching. To address the challenges of educational informatics and achieve precision teaching, Yang et al. [4] proposed a model for effective knowledge tracing through subject knowledge maps (SKMs). This model first constructs a hierarchical SKM to represent parent–child, sibling, and parallel relationships between knowledge concepts. This approach effectively overcomes the problem of ambiguous relationships in traditional models, thereby significantly improving the accuracy and interpretability of knowledge tracing.
Additionally, Su et al. [5] proposed a construction framework based on iterative bootstrapping. This framework progressively expands the set of labeled samples during the construction of subject knowledge graphs, reducing manual workload compared with traditional methods. Similarly, Li et al. [6] introduced an educational knowledge graph named CourseKG. They utilized various types of teaching data from online learning platforms, such as electronic textbooks, course outlines, and tests. By accurately extracting pedagogical concepts and relationships from this heterogeneous data, they significantly improved the quality of precision teaching.
However, constructing a high-quality curriculum knowledge graph requires the accurate extraction of knowledge entities from course texts [7]. Unlike general entities such as Person, Location, and Organization, entities in the educational domain possess distinct domain-specific attributes and professional terminology [8]. These entities range from concise lexical units such as “course names” to long, phrase-based conceptual entities such as “subject theories.” Their forms are highly flexible, and they are complex in type, structure, and semantics, which poses significant challenges for named entity recognition.
Specifically, the occurrence of nested and overlapping entities makes it difficult for models to identify precise entity boundaries. Furthermore, contextual information, especially long-range dependencies, is crucial for accurately determining entity types. For example, in a nursing report, various signs of onset described earlier in the text (labeled as “Symptom” entities) can provide critical contextual clues for the classification of corresponding entities, such as “Disease” entities, that appear later. However, models often fail to effectively process and associate information that is spatially distant in a sequence but closely related in semantic logic. As a result, they cannot sufficiently leverage the surrounding context for entity classification.
In summary, the aforementioned issues are key factors contributing to the suboptimal performance of named entity recognition in educational domains. To address these challenges, enhance the model’s ability to identify the boundaries of course knowledge entities, and capture the associations between entities within contextual instructional texts, this study proposes a novel approach: a course named entity recognition method based on multi-dimensional position features (MP-CNER). The primary contributions of this work are as follows:
A multi-dimensional position feature (MDPF) module is proposed to overcome the bottleneck of ambiguous entity boundaries. To address the boundary ambiguity caused by the highly specialized nature of curriculum texts, the MDPF module uses a self-attention mechanism to dynamically fuse multi-dimensional boundary features that represent each character’s roles as the beginning (B), middle (M), and end (E) of various candidate words. This mechanism accounts for the positional features of characters across different contexts and introduces essential boundary prior knowledge into the model, significantly enhancing its ability to precisely delimit complex and ambiguous entities.
A boundary-guided selective contextual linker (SCL) is designed to resolve long-distance dependency challenges. Building on the deep semantic representations extracted by the pre-trained language model, the SCL utilizes the boundary prior signals provided by the MDPF as guidance to establish “jump-style” associations between characters that are physically distant but logically connected. This mechanism breaks the local window limitations of traditional sequential models, allowing the model to bypass redundant descriptions in curriculum texts and directly capture the profound internal dependencies within long-span entities.
Specialized curriculum datasets are constructed, and the model’s superior performance and generality are confirmed. To address the scarcity of NER corpora in vocational education, this study constructs and annotates two specialized datasets: Nursing and New_Energy_Vehicle. Extensive comparative and ablation experiments on these datasets, alongside the general-domain People’s_Daily_NER dataset, demonstrate that the proposed MP-CNER method is not only effective on specialized curriculum texts but also generalizes robustly across domains.
The remainder of this paper is organized as follows:
Section 2 reviews related work on named entity recognition and curriculum knowledge graph construction.
Section 3 provides a detailed description of the proposed MP-CNER model architecture.
Section 4 describes the experimental datasets, baselines, and setup.
Section 5 presents a comprehensive analysis of the results. Finally, the concluding section summarizes the paper and discusses potential directions for future research.
2. Related Work
Named entity recognition, a foundational and critical task in natural language processing (NLP), has undergone a clear technical evolution. Its mainstream methods can be divided into three categories: rule- and dictionary-based methods, statistical machine learning methods, and deep learning-based methods [9,10]. Deep learning-based approaches have since become the dominant paradigm in NER research. The key breakthrough of this paradigm is automatic feature learning: models capture deep, context-dependent semantic information directly from raw text, eliminating the reliance on manually designed features [11]. In particular, the emergence of pre-trained language models such as BERT [12] has significantly advanced the field, as state-of-the-art performance can be achieved on various NER tasks through simple fine-tuning.
In the context of document-level NER, Wei and Li [13] proposed a two-stage model named ScdNER, which employs a span-based multi-stage filtering and fusion strategy to ensure accurate global information sharing. This design effectively suppresses the noise caused by non-entity spans, leading to more robust and consistent entity recognition. Zha et al. [14] introduced the CeptNER model, which integrates contrastive learning into a two-stage framework based on prototypical networks. This integration pulls representations of the same entity category closer together in the embedding space while pushing those of different categories further apart, significantly enhancing the classification capability of prototypical networks in few-shot scenarios.
Addressing the limitations of traditional span-based methods, which classify each candidate entity independently and ignore semantic dependencies between spans, Geng et al. [15] proposed a flattened sentence representation. This method models nested entity structures more naturally and explicitly learns the relationships between them. For biomedical NER, Naseem et al. [16] developed BioALBERT, a pre-trained model that uses the self-supervised loss of ALBERT to better learn context-dependent information. Li et al. [17] proposed the MacBERT-SDI-ML model, which incorporates a multi-perspective lexical information fusion component to strengthen the boundary features of character representations. Their model also introduces a syntactic dependency information parser (SDIP) to extract and fuse richer entity dependency information, thereby improving classification accuracy. Additionally, Wu et al. [18] developed a model based on RoBERTa and radical features. This approach uses RoBERTa to learn medical features and a Bi-LSTM to extract radical-based features, which are then concatenated with the RoBERTa vectors for label decoding via a conditional random field (CRF) layer. Collectively, deep learning-based NER methods have brought substantial performance gains to the field.
Course named entity recognition is a critical step in constructing curriculum knowledge graphs, and an increasing number of researchers and educators are developing more effective construction methods, including techniques to improve the accuracy of entity extraction. For example, Hu et al. [19] applied the classic BERT-BiLSTM-CRF architecture to NER in educational psychology courses. Their work addressed the complex and easily confused terminology of the field, thereby facilitating the construction of psychology curriculum KGs and improving learning efficiency.
Zhai et al. [20] used RoBERTa to construct word embeddings and employed the efficient GlobalPointer to resolve entity nesting, achieving robust NER performance on Chinese biomedical courses and clinical data. Similarly, Yu et al. [21] proposed a unified model named RBTG, designed to handle challenges in Chinese cyber threat intelligence (CTI) data such as difficult entity boundary recognition and dependencies on direction and distance. By leveraging relative positional information and a multiplicative attention mechanism, RBTG predicts entity boundaries from a global perspective, thus enhancing boundary sensitivity.
To address insufficient semantic representation and low efficiency in NER for Chinese electronic medical records, Tang et al. [22] designed a model named ALBIC. This model introduces the lightweight pre-trained model ALBERT to effectively handle word polysemy across different contexts while reducing the number of training parameters. Furthermore, to handle the complexity of educational texts, Qin et al. [23] proposed the DP-FWCA model. Its core innovation is a domain-adaptive prompting strategy that guides the pre-trained model to focus on domain-specific semantic information by prepending explicit instructions containing all entity category definitions to the input text. This approach achieves lightweight domain alignment and effectively improves NER performance on professional domain texts.
Despite extensive exploration of course-related named entity recognition in recent years and progress in model structure optimization and semantic feature representation, numerous challenges remain in practical applications. First, course texts are typically highly specialized and domain-specific, containing many technical terms and compound words. This often results in ambiguous entity boundaries, which increases the difficulty of precise boundary segmentation. Second, in course descriptions or instructional resource texts, semantic associations between entities often span multiple sentences or even paragraphs, and traditional sequence labeling models remain limited in capturing such long-range dependencies and global contextual information. Furthermore, the aforementioned methods do not fully account for the positional features of characters across different candidate words. In a given sentence, a single character may belong to multiple potential words and occupy a different relative position within each; these variations carry significant implications for precise entity boundary identification.
In light of these challenges, this study proposes MP-CNER, a curriculum-oriented named entity recognition model based on multi-dimensional position features. Inspired by the adaptive text-guided feature fusion architecture introduced by Liu et al. [24], the multi-dimensional position feature module within the MP-CNER framework is designed to effectively capture boundary features. The model not only accurately delimits entity boundaries but also explores the latent associations between entities in the text. By constructing a more comprehensive contextual semantic representation, the proposed approach significantly improves the accuracy of course named entity recognition.
3. Model
The overall architecture of the MP-CNER model is illustrated in Figure 1.
The entity extraction workflow of the model is described as follows: First, the model performs parallel extraction and semantic encoding of multi-source features. The input layer receives a course text character sequence X = {c1, c2,…,cn}, which is concurrently fed into two parallel branches. On the one hand, the Chinese-RoBERTa-wwm-ext model is utilized for deep semantic modeling to capture character-level vector representations within a general context. On the other hand, the MDPF module integrates a domain dictionary to perform multi-perspective position feature mapping for each character. This module identifies the roles that characters play within professional terminology, such as the beginning (B), middle (M), and end (E) positions, and transforms external prior knowledge into boundary-enhanced vectors.
Then, the model uses the semantic vectors output by Chinese-RoBERTa-wwm-ext as guiding signals and injects them into the MDPF module. At this stage, the features are not simply stacked; instead, a self-attention mechanism dynamically adjusts and filters the boundary features according to the current semantic context. Subsequently, a fusion operator ⨁ concatenates the semantic features with the enhanced boundary features, forming a composite feature representation that combines deep semantic understanding with precise boundary perception.
The fused feature sequence is then fed into the model’s selective contextual linker. The SCL utilizes the boundary signals provided by the MDPF component as guidance to establish “skip links” between characters that are semantically and logically close but physically distant in the text. This mechanism overcomes the limitations of traditional models, which typically transmit information character by character, and it allows the model to bypass redundant descriptions within course texts and directly capture deep logical connections within long-span entities and between different entities. Consequently, this approach effectively resolves the entity classification bias that often results from long-range dependencies.
Finally, the feature representations enhanced by the SCL are fed into a conditional random field, which serves as the model’s classifier. By integrating the precise boundary information and long-range semantic associations captured in the preceding layers, the CRF decodes the final entity label sequence from the global optimal path. This process achieves the accurate identification of course knowledge entities.
3.1. Encoder
Chinese-RoBERTa-wwm-ext is a high-performance pre-trained language model specifically designed for Chinese natural language processing tasks [25,26]. It integrates advanced RoBERTa training strategies with a whole-word masking (WWM) mechanism tailored to the linguistic characteristics of Chinese, enabling the model to learn more accurate and richer semantic representations of Chinese text. Compared with the original BERT model, this model takes the linguistic features of Chinese fully into account during pre-training; specifically, it adopts WWM as its masking strategy, thereby enhancing the model’s capacity for semantic understanding.
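The WWM idea can be illustrated with a minimal sketch. It assumes the sentence has already been split into words by an external Chinese word segmenter (the segmentation below is hand-written), and it simplifies the real pre-training recipe: each word is selected independently at a fixed rate, and the BERT-style 80/10/10 split between [MASK], random, and unchanged tokens is omitted. The function name and rate are illustrative, not taken from the model’s released code.

```python
import random
from typing import List

MASK = "[MASK]"


def whole_word_mask(words: List[str], mask_rate: float = 0.15, seed: int = 0) -> List[str]:
    """Simplified whole-word masking over a pre-segmented Chinese sentence.

    Each word is selected independently with probability `mask_rate`; when a
    word is selected, every one of its characters is replaced by [MASK].
    Returns the character-level token sequence a character-based encoder sees.
    """
    rng = random.Random(seed)
    tokens: List[str] = []
    for word in words:
        if rng.random() < mask_rate:
            tokens.extend([MASK] * len(word))  # mask the whole word, never a fragment
        else:
            tokens.extend(word)  # keep each character as its own token
    return tokens


# Hypothetical segmentation of "护理人员应观察患者的心率".
segmented = ["护理", "人员", "应", "观察", "患者", "的", "心率"]
print(whole_word_mask(segmented, mask_rate=0.3, seed=42))
```

In contrast, character-level masking could hide only "观" while leaving "察" visible, letting the model recover the word from its remaining half instead of from the wider context.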
Furthermore, during pre-training, this model expands the scale and diversity of the training corpus by incorporating a wider variety of text types and a broader range of topics. This expansion further enhances the model’s generalization capability and its performance across diverse Chinese NLP tasks. The overall architecture of this model is illustrated in Figure 2.
Chinese-RoBERTa-wwm-ext produces a hidden state matrix $H_1 = \{h_1, h_2, \ldots, h_n\}$ whose length equals that of the input sequence. Each vector $h_i$ encodes the deep semantic information of the $i$-th character within the current context of the course text. This output serves as one of the inputs to the MDPF module. At the same time, it is fused with the boundary features $H_2$ extracted by the MDPF module, ultimately forming the input $H_3$ of the model’s SCL module.
3.2. Multi-Dimensional Position Features
To enhance the model’s perception of entity boundaries, a multi-dimensional position feature fusion strategy is proposed. Based on the positions a character occupies within different words, this strategy represents the boundary features of each character using three components: B (beginning of a word), M (middle of a word), and E (end of a word). A self-attention mechanism then dynamically calculates the relative weights of these three position features, generating a more informative, boundary-aware character representation. For example, assuming that $c_1c_2$, $c_3c_4$, and $c_1c_2c_3c_4$ are three distinct words, the B, M, and E lexical information for each character within the entity “$c_1c_2c_3c_4$” is illustrated in Figure 3.
The MDPF component uses an external dictionary to match all character combinations in the text, resulting in a set of characters and words $D = \{c_1, (c_1c_2), (c_1c_2c_3), \ldots, c_2, (c_2c_3), \ldots, c_{n-1}, (c_{n-1}c_n), c_n\}$. The words associated with any character $c_i$ therefore form a subset of $D$. Notably, each Chinese character carries its own basic meaning, which cannot be ignored; consequently, the character $c_i$ itself is also included as an element of $D$.
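The matching step, together with the per-character B, M, and E word lists it feeds, can be sketched by exhaustive dictionary lookup. The maximum word length, the choice to keep single characters in D but give them no B/M/E role, and the placeholder lexicon are all assumptions for illustration; the letters "ABCD" stand in for $c_1c_2c_3c_4$ from the Figure 3 example.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (start, end) with end exclusive


def match_lexicon(text: str, lexicon: Set[str], max_word_len: int = 8) -> List[Span]:
    """Collect every single character plus every dictionary word found in `text`."""
    spans: List[Span] = []
    n = len(text)
    for i in range(n):
        spans.append((i, i + 1))  # each character is itself an element of D
        for j in range(i + 2, min(n, i + max_word_len) + 1):
            if text[i:j] in lexicon:
                spans.append((i, j))
    return spans


def position_sets(text: str, spans: List[Span]) -> List[Dict[str, List[str]]]:
    """For each character, list the multi-character words in which it is B, M, or E."""
    sets: List[Dict[str, List[str]]] = [{"B": [], "M": [], "E": []} for _ in text]
    for start, end in spans:
        if end - start < 2:
            continue  # assumption: single characters receive no B/M/E role
        word = text[start:end]
        sets[start]["B"].append(word)
        sets[end - 1]["E"].append(word)
        for k in range(start + 1, end - 1):
            sets[k]["M"].append(word)
    return sets


# Placeholder characters "ABCD" stand for c1c2c3c4 with words c1c2, c3c4, c1c2c3c4.
text = "ABCD"
lexicon = {"AB", "CD", "ABCD"}
for ch, s in zip(text, position_sets(text, match_lexicon(text, lexicon))):
    print(ch, s)
```

For this example, the third character ("C", i.e. $c_3$) is the beginning of "CD" and the middle of "ABCD", which is exactly the kind of multi-role information the MDPF module is meant to exploit.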
This study uses $H_2$ as the output of the MDPF component, where the $i$-th element of $H_2$ is $\mathrm{concatenate}(C_i^B, C_i^M, C_i^E)$. Here, $C_i^B$, $C_i^M$, and $C_i^E$ represent the character’s representations at the three lexical positions beginning (B), middle (M), and end (E), respectively:
where $D_{c_i}$ represents the set of words in $D$ associated with character $c_i$; $B_{c_i}$, $M_{c_i}$, and $E_{c_i}$ represent the subsets of $D_{c_i}$ in which character $c_i$ is at the beginning, middle, and end of a word, respectively; $B_{c_i}(j)$, $M_{c_i}(j)$, and $E_{c_i}(j)$ denote the $j$-th word in the sets $B_{c_i}$, $M_{c_i}$, and $E_{c_i}$; and $m$, $n$, and $p$ represent the sizes of these sets. The coefficients $\alpha_j$, $\beta_j$, and $\theta_j$ are calculated as follows:
In this study, $f_B(\cdot)$, $f_M(\cdot)$, and $f_E(\cdot)$ are employed to model the correlation between character $c_i$ and all words in the word sets $B_{c_i}$, $M_{c_i}$, and $E_{c_i}$, respectively. Their specific forms are as follows:
Here, $W_\theta$ denotes the parameters related to the character position, and $\delta(\cdot)$ represents the ReLU activation function.
To further reinforce boundary position features, a position bias mechanism is introduced to explicitly distinguish the significance of different boundary positions. The position bias $b_{pos}$ is defined as follows: when computing $C_i^B$ or $C_i^E$, $b_{pos} = 0.1$; when computing $C_i^M$, $b_{pos} = 0.0$. Incorporating this bias increases the attention weights of the boundary positions (B/E), thereby strengthening boundary features through lexical position information.
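The following NumPy sketch shows one plausible way to realize this weighting for a single character. The scoring form $v^\top \delta(W_\theta [h_c; e_w]) + b_{pos}$, the hypothetical scoring vector `v`, the word-embedding table `word_emb`, and the choice of one softmax over all of the character’s matched words are assumptions for illustration, not the paper’s exact formulation; only the $b_{pos}$ values come from the text.

```python
import numpy as np

# b_pos values stated in the text: boundary positions get +0.1, middle gets 0.0.
POS_BIAS = {"B": 0.1, "M": 0.0, "E": 0.1}


def mdpf_feature(h_c, words_by_pos, word_emb, W, v):
    """Hypothetical MDPF aggregation for a single character.

    h_c          : (d,) character vector from the encoder (the guiding signal).
    words_by_pos : {"B": [...], "M": [...], "E": [...]} matched words.
    word_emb     : mapping word -> (d,) word embedding (assumed).
    W, v         : (k, 2d) projection (playing the role of W_theta) and (k,) scoring vector.
    Returns concatenate(C^B, C^M, C^E) with shape (3d,).
    """
    d = h_c.shape[0]
    positions, vectors, scores = [], [], []
    for pos in ("B", "M", "E"):
        for word in words_by_pos[pos]:
            e_w = word_emb[word]
            hidden = np.maximum(0.0, W @ np.concatenate([h_c, e_w]))  # ReLU
            scores.append(float(v @ hidden) + POS_BIAS[pos])
            positions.append(pos)
            vectors.append(e_w)
    out = {pos: np.zeros(d) for pos in ("B", "M", "E")}
    if scores:
        s = np.array(scores)
        weights = np.exp(s - s.max())
        weights /= weights.sum()  # one softmax over all words matched by this character
        for pos, e_w, a in zip(positions, vectors, weights):
            out[pos] += a * e_w
    return np.concatenate([out["B"], out["M"], out["E"]])


rng = np.random.default_rng(0)
d, k = 4, 3
emb = {w: rng.normal(size=d) for w in ["AB", "ABCD", "CD"]}
feat = mdpf_feature(rng.normal(size=d), {"B": ["CD"], "M": ["ABCD"], "E": []}, emb,
                    rng.normal(size=(k, 2 * d)), rng.normal(size=k))
print(feat.shape)  # (12,)
```

A note on where the bias enters: if the B, M, and E sets were each normalized by their own softmax, a constant $b_{pos}$ added to every score in a set would cancel inside that softmax and have no effect. Normalizing jointly, as in this sketch, is one way to let the bias actually shift weight toward B/E words.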
3.3. Selective Contextual Linker
To address the challenge of long-range dependencies caused by complex semantic structures in course texts, a selective contextual linker is designed. The core logic of this module lies in leveraging the boundary prior knowledge H2, obtained by the MDPF module, to guide the model in establishing jump-style associations between key characters.
First, the SCL receives as input the character semantics $H_1$ and the multi-dimensional position feature sequence $H_2$. Let the fused feature sequence be $H_3 = \{h_1, h_2, \ldots, h_n\}$, where $h_i$ now denotes the integrated vector of the $i$-th character, which fuses the semantic information obtained from the encoder with the boundary features extracted by the MDPF module. The association strength between characters is then calculated via a selective attention mechanism guided by boundary information. Unlike the traditional global self-attention mechanism, which is prone to introducing redundant noise, the SCL uses boundary features as constraints to calculate the link score $e_{ij}$ between any two characters $i$ and $j$ in the sequence:

$$e_{ij} = \frac{(W_Q h_i)^{\top}(W_K h_j)}{\sqrt{d_k}} + B_{ij}$$

where $W_Q$ and $W_K$ are learnable weight matrices, $d_k$ is the feature dimension, and $B_{ij}$ is the boundary bias term provided by the MDPF module. When characters $i$ and $j$ are located at the boundaries of the same potential entity or possess strong semantic associations, $B_{ij}$ is assigned a higher weight, thereby guiding the model to form skip links, as illustrated by the arcs in the diagram.
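How $B_{ij}$ is derived from the MDPF output is left abstract here. One simple possibility, assumed purely for illustration, builds it from the same lexicon matches: the first and last characters of each matched multi-character word are linked in both directions with a fixed strength. In a trained model, the strength would more plausibly be learned or computed from $H_2$; the function name and `strength` parameter are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def boundary_bias(n: int, spans: List[Tuple[int, int]], strength: float = 1.0) -> np.ndarray:
    """Hypothetical boundary bias matrix B (n x n) built from lexicon matches.

    For every matched multi-character word spanning [start, end), the first and
    last characters are linked in both directions with `strength`; all other
    pairs get 0, i.e. no extra preference.
    """
    B = np.zeros((n, n))
    for start, end in spans:
        if end - start >= 2:
            first, last = start, end - 1
            B[first, last] = strength
            B[last, first] = strength
    return B


# Spans for c1c2, c1c2c3c4, c3c4 (plus one single character, which is ignored).
print(boundary_bias(4, [(0, 1), (0, 2), (0, 4), (2, 4)], strength=0.5))
```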
Next, the normalized link weights are used to dynamically aggregate contextual information. The link scores are normalized via the softmax function to obtain the attention weight $\alpha_{ij}$, which represents the degree to which character $i$ absorbs information from position $j$:

$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}$$

Based on these weights, the SCL performs a weighted summation over the sequence to generate the context vector $l_i$, which incorporates long-range dependency features:

$$l_i = \sum_{j=1}^{n} \alpha_{ij} W_V h_j$$

where $W_V$ denotes the trainable weight matrix for the value transformation, and $h_j$ represents the integrated hidden vector of the $j$-th character in the input sequence.
This step lets information flow directly between distant positions, enabling the model to capture logically related knowledge points that are far apart in the text and thereby mitigating the loss of long-distance semantics in course texts. Finally, the model applies a residual connection and normalization to the fused features $H_3$ (which integrate character semantics and boundary information) and the long-range features $l_i$ extracted by the SCL, yielding the final output features $Z = \{z_1, z_2, \ldots, z_n\}$:

$$z_i = \mathrm{Norm}(h_i + l_i)$$

where $\mathrm{Norm}(\cdot)$ denotes the normalization operation.
This vector sequence Z integrates local boundary precision with global semantic correlation and is fed into the subsequent decoding layer for the final label path decoding. In this manner, the SCL successfully captures the logical associations within long-span entities and across different entities, providing critical support for improving the accuracy of named entity recognition.
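The SCL computation described above can be sketched in NumPy as a single attention head. The assumptions are: row-vector convention (so $W_Q h_i$ becomes `H3 @ W_Q`), $W_V$ mapping back to the input dimension so that the residual sum is defined, and layer normalization without learnable gain or bias standing in for the unspecified normalization step. The bias matrix in the example is hand-set for illustration.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-row layer normalization without learnable gain or bias (a simplification)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def selective_contextual_linker(H3, B, W_Q, W_K, W_V):
    """Sketch of the SCL forward pass (row-vector convention).

    H3: (n, d) fused features; B: (n, n) boundary bias;
    W_Q, W_K: (d, d_k); W_V: (d, d) so the residual sum is shape-compatible.
    Returns Z (n, d) and the link weights alpha (n, n).
    """
    Q, K, V = H3 @ W_Q, H3 @ W_K, H3 @ W_V
    e = Q @ K.T / np.sqrt(Q.shape[-1]) + B       # link scores e_ij
    e = e - e.max(axis=-1, keepdims=True)         # stabilize the exponentials
    alpha = np.exp(e)
    alpha /= alpha.sum(axis=-1, keepdims=True)    # softmax over j for each i
    L = alpha @ V                                 # l_i = sum_j alpha_ij W_V h_j
    Z = layer_norm(H3 + L)                        # residual connection + normalization
    return Z, alpha


rng = np.random.default_rng(0)
n, d, d_k = 6, 8, 4
H3 = rng.normal(size=(n, d))
B = np.zeros((n, n))
B[0, 5] = B[5, 0] = 2.0  # e.g. first and last character of a matched long word
W_Q = 0.3 * rng.normal(size=(d, d_k))
W_K = 0.3 * rng.normal(size=(d, d_k))
W_V = 0.3 * rng.normal(size=(d, d))
Z, alpha = selective_contextual_linker(H3, B, W_Q, W_K, W_V)
print(Z.shape, alpha[0].round(3))
```

Because $B_{ij}$ is added before the softmax, a positive bias between positions 0 and 5 raises $\alpha_{05}$ regardless of how far apart the two characters are, which is the "skip link" behavior the text describes.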
3.4. Classifier
At the top level of the MP-CNER model, a conditional random field [27] is adopted as the decoding layer to address the strong dependencies between labels in sequence labeling tasks and to obtain globally optimal predictions. Although the preceding SCL module captures rich contextual information, independent character-wise classification is prone to generating illogical label combinations (for instance, in BIO tagging, an “I-” label cannot directly follow an “O” label).
First, the model feeds the fused feature sequence Z from the SCL layer into a linear mapping layer to calculate the unnormalized score of each character being assigned a specific entity label (e.g., B-Name). This yields the emission score matrix P, which encapsulates the multi-dimensional features extracted by the preceding model components.
Then, to ensure the validity of the output sequence, the CRF incorporates a learnable transition matrix A that captures the transition regularities between labels (for example, an “I-X” label may only follow a “B-X” or “I-X” label of the same entity type X). Through this constraint, the model can effectively correct illegal label sequences caused by ambiguous boundaries.
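The BIO constraint can be made explicit as a table of allowed transitions. The plain-Python sketch below is an illustration, not part of the described model: the CRF learns soft preferences in A from data, although a common practice is to additionally mask or initialize disallowed entries with a large negative score. The entity types are borrowed from the Nursing dataset, and the function names are hypothetical.

```python
from typing import List


def bio_allowed(prev: str, cur: str) -> bool:
    """True if label `cur` may directly follow label `prev` under the BIO scheme."""
    if cur.startswith("I-"):
        # I-X may only continue an entity of the same type X.
        return prev in ("B-" + cur[2:], cur)
    return True  # O and any B-X may follow anything


def allowed_matrix(labels: List[str]) -> List[List[bool]]:
    """allowed[a][b] is True if labels[b] may follow labels[a]."""
    return [[bio_allowed(p, c) for c in labels] for p in labels]


def is_valid_bio(seq: List[str]) -> bool:
    """Check a full label sequence, treating the sequence start like an O label."""
    return all(bio_allowed(p, c) for p, c in zip(["O"] + seq[:-1], seq))


labels = ["O", "B-Drug", "I-Drug", "B-Disease", "I-Disease"]
for label, row in zip(labels, allowed_matrix(labels)):
    print(f"{label:>10}", ["ok" if a else "--" for a in row])
```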
The model accumulates the emission scores for all positions in the sequence and the transition scores between adjacent labels to obtain the global joint score $S(X, y)$ for the label sequence $y$:

$$S(X, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$

where $A_{y_i, y_{i+1}}$ represents the transition score from label $y_i$ to label $y_{i+1}$, $P_{i, y_i}$ is the emission score of label $y_i$ at position $i$, and $y_0$ and $y_{n+1}$ denote the start and end labels of the sequence, respectively.
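The score definition maps directly onto a few lines of NumPy. In this sketch, the start and end labels $y_0$ and $y_{n+1}$ are stored as two extra rows and columns of A, which is one common layout and an assumption here; emission rows are 0-indexed, so row `i - 1` holds $P_{i,\cdot}$.

```python
import numpy as np


def crf_path_score(P: np.ndarray, A: np.ndarray, y, start: int, end: int) -> float:
    """Global score S(X, y) of one label path.

    P     : (n, K) emission scores; row i-1 holds P_{i, .} for position i = 1..n.
    A     : (K + 2, K + 2) transition scores, including the special start/end tags.
    y     : n label indices in [0, K).
    start : index of the start tag y_0; end: index of the end tag y_{n+1}.
    """
    path = [start] + list(y) + [end]
    transitions = sum(A[path[i], path[i + 1]] for i in range(len(path) - 1))
    emissions = sum(P[i, tag] for i, tag in enumerate(y))
    return float(transitions + emissions)


# Two real labels (0, 1) plus start (2) and end (3) tags.
P = np.array([[1.0, 2.0], [3.0, 0.0]])
A = np.arange(16, dtype=float).reshape(4, 4)
print(crf_path_score(P, A, [1, 0], start=2, end=3))  # 9 + 4 + 3 + 2 + 3 = 21.0
```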
Finally, in the prediction stage, to avoid the exponential cost of enumerating all possible paths, the model adopts the Viterbi algorithm, which is based on dynamic programming. This algorithm efficiently finds the optimal label path $y^*$ that maximizes the global score within the vast search space:

$$y^* = \arg\max_{\tilde{y} \in Y_X} S(X, \tilde{y})$$

where $y^*$ represents the optimal predicted label sequence, and $\arg\max$ denotes the operation of finding the candidate sequence $\tilde{y}$ that maximizes the scoring function $S(X, \tilde{y})$ within the space of all possible label sequences $Y_X$ for the given input $X$.
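The Viterbi recursion can be sketched in NumPy, together with a brute-force decoder that enumerates every path and serves only to check the result on tiny inputs. The tag layout (start and end labels stored as extra indices of A) is an assumption carried over for illustration. Exhaustive search grows as $K^n$ for $K$ labels, whereas the dynamic program costs $O(nK^2)$.

```python
import itertools

import numpy as np


def viterbi_decode(P: np.ndarray, A: np.ndarray, start: int, end: int):
    """Return (best_path, best_score) maximizing S(X, y) in O(n * K^2) time.

    P: (n, K) emission scores; A: (K + 2, K + 2) transitions with start/end tags.
    """
    n, K = P.shape
    score = A[start, :K] + P[0]                  # best score of paths ending in each tag at position 1
    backpointers = []
    for i in range(1, n):
        # cand[a, b]: best path ending in tag a, extended by the transition a -> b
        cand = score[:, None] + A[:K, :K] + P[i][None, :]
        backpointers.append(cand.argmax(axis=0))
        score = cand.max(axis=0)
    score = score + A[:K, end]                   # add the transition into the end tag
    best_last = int(score.argmax())
    path = [best_last]
    for bp in reversed(backpointers):            # follow pointers from the last position back
        path.append(int(bp[path[-1]]))
    path.reverse()
    return path, float(score[best_last])


def brute_force_decode(P: np.ndarray, A: np.ndarray, start: int, end: int):
    """Enumerate all K^n paths; only feasible for tiny inputs, used here as a check."""
    n, K = P.shape

    def S(y):
        path = [start, *y, end]
        return sum(A[a, b] for a, b in zip(path, path[1:])) + sum(P[i, t] for i, t in enumerate(y))

    best = max(itertools.product(range(K), repeat=n), key=S)
    return list(best), float(S(best))


rng = np.random.default_rng(0)
P = rng.normal(size=(5, 3))
A = rng.normal(size=(5, 5))      # 3 labels + start tag (3) + end tag (4)
print(viterbi_decode(P, A, start=3, end=4))
print(brute_force_decode(P, A, start=3, end=4))
```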
Through global optimal decoding, the MP-CNER model ultimately outputs precise entity labeling results, achieving the identification of knowledge entities.
4. Experimental Settings
To evaluate our NER model, we conducted experiments on three datasets. This section describes the datasets, the baseline models, the experimental environment and parameters, and the evaluation metrics.
4.1. Datasets
The datasets used in the experiments are shown in Table 1.
Nursing dataset: This dataset is composed of textbooks, supplementary teaching materials, and nursing cases related to Internal Medicine Nursing courses, collected from various educational websites such as China MOOC. Under the guidance of professional instructors, we defined and annotated five types of entities: Disease, Etiology, Treatment, Clinical Manifestation, and Drug.
New_Energy_Vehicle dataset: This dataset originates from document materials provided by instructors of the “New Energy Vehicle Detection and Maintenance” course at a vocational college, along with open-source materials from various educational websites. It includes data such as course standards, textbooks, and supplementary teaching materials. Under the guidance of professional instructors, we defined and annotated nine types of entities: Vehicle Name, System, Component, Fault, Maintenance Tool and Equipment, Maintenance Operation, Specification, Task, and Principle.
People’s_Daily_NER dataset: This is a classic public dataset for named entity recognition tasks, originating from People’s Daily articles. It covers multiple time periods, such as 1998 and 2014, and contains three types of entities: Person, Location, and Organization.
During the data cleaning process, special symbols, HTML tags, and redundant white spaces were removed. To accommodate the input constraints of the pre-trained Chinese-RoBERTa-wwm-ext model, the maximum sequence length was set to 256. Sentences exceeding this limit were processed using a sliding window approach with a 20-character overlap to ensure that entities located at the truncation boundaries remained intact. For sentences shorter than the maximum length, dynamic padding was applied to maintain consistent tensor dimensions during batch training.
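The windowing step can be sketched as follows, assuming the 256 limit is counted in characters. In practice, two positions are usually reserved for the encoder’s [CLS] and [SEP] tokens, so the usable length may be 254; that adjustment, the function name, and the error check are assumptions not stated in the text.

```python
from typing import List, Tuple


def sliding_windows(n_chars: int, max_len: int = 256, overlap: int = 20) -> List[Tuple[int, int]]:
    """Split a text of n_chars characters into [start, end) windows of at most max_len
    characters, where consecutive windows share `overlap` characters."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    if n_chars <= max_len:
        return [(0, n_chars)]
    stride = max_len - overlap
    windows = []
    start = 0
    while True:
        end = min(start + max_len, n_chars)
        windows.append((start, end))
        if end == n_chars:
            break
        start += stride
    return windows


print(sliding_windows(600))  # [(0, 256), (236, 492), (472, 600)]
```

With a 20-character overlap, any entity of at most 20 characters that is cut at the end of one window lies wholly inside the next one. How predictions from the overlapping region are merged is a separate design decision that this sketch does not cover.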
4.2. Baseline
Lattice-LSTM [28]: This is a specialized architecture designed for Chinese NER. By introducing a character–word lattice structure into traditional LSTM units, it dynamically integrates external lexicon information into the character sequence representation, effectively leveraging lexical boundary information to enhance feature expression.
BERT-BiLSTM-CRF: This is an evolution that combines the strengths of the BERT and BiLSTM-CRF models. BiLSTM-CRF is a classic architecture that performed exceptionally well in sequence labeling tasks prior to BERT, offering robust sequence modeling and constraint capabilities. BERT provides powerful contextual feature representation capabilities.
MRC [29]: This method formulates NER as a machine reading comprehension (question answering) task and uses the strong encoding power of pre-trained models such as BERT, which effectively resolves ambiguous entity boundaries. The prior knowledge it introduces enables it to perform well across a broad range of practical applications.
SDI-NER [30]: This model employs graph neural networks (GNNs) to learn syntactic dependency graph information between segmented words and integrates this structural information into the word representations of the NER model. It also extracts task-specific hidden information from multiple Chinese word segmentation (CWS) and part-of-speech (POS) tagging tasks, using multi-head self-attention components to fuse the retrieved information.
Efficient_Global_Pointer [31]: This is an improved version of the GlobalPointer model. It retains the original model’s ability to handle ambiguous entity boundaries while reducing the number of parameters, thereby improving efficiency.
Qwen-7B [32]: This is a 7-billion-parameter model from the Tongyi Qwen large language model series developed by Alibaba Cloud. It is a Transformer-based large language model trained on massive, diverse, high-quality pre-training data, including web text, professional books, and code, and it features a large-scale training corpus and a comprehensive vocabulary. In our experiments, we employed a few-shot chain-of-thought (CoT) prompting strategy that explicitly defines the entity categories and output format, and we set the temperature parameter to 0 to ensure reproducibility.
4.3. Environment and Parameter Settings
The experimental environment settings are shown in Table 2.
The experimental parameters are shown in Table 3.
4.4. Evaluation Metrics
The performance of the model was evaluated using the standard metrics for multi-class tasks, namely precision, recall, and F1, which are calculated as follows:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

where $TP$ denotes the number of correctly identified entities, $FP$ the number of predictions incorrectly identified as a specific entity type by the model, and $FN$ the number of actual entities that the model failed to recognize.
Specifically, precision measures the proportion of entities predicted as a specific type that truly belong to that category. Recall quantifies the proportion of all ground-truth entities that the model successfully identifies. F1 serves as a comprehensive evaluation metric, calculated as the harmonic mean of precision and recall. It provides a balanced reflection of the model’s overall efficacy in the entity recognition task. Consequently, F1 is widely regarded as a primary benchmark for assessing the performance of named entity recognition models.
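These definitions translate into a short entity-level scorer. The sketch assumes micro-averaging and exact matching, so that a prediction counts as a true positive only if both its span boundaries and its type match a gold entity; the matching criterion is an assumption here, and the example spans are hypothetical.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, type)


def entity_prf1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall, and F1 with exact matching."""
    tp = len(gold & pred)   # same boundaries and same type
    fp = len(pred - gold)   # predicted entities with no exact gold match
    fn = len(gold - pred)   # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = {(0, 2, "Disease"), (5, 7, "Drug"), (9, 12, "Treatment")}
pred = {(0, 2, "Disease"), (5, 7, "Disease"), (9, 12, "Treatment"), (13, 14, "Drug")}
print(entity_prf1(gold, pred))  # precision 0.5, recall 2/3, F1 4/7
```

Note that the mistyped prediction (5, 7, "Disease") counts both as a false positive and, through the missed (5, 7, "Drug"), as a false negative, so a single type error lowers both precision and recall.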
5. Results
5.1. Comparison Study
Table 4 presents the experimental results (bold values indicate the best results). MP-CNER achieves the highest F1 score among all compared models on all three datasets, demonstrating the effectiveness of the proposed architecture for curriculum-based NER tasks.
On the Nursing dataset, MP-CNER reached an F1 score of 76.79%, which is 3.15 percentage points higher than the second-best model, MRC. On the New_Energy_Vehicle dataset, MP-CNER achieved the highest F1 score, 89.85%. Although Efficient_Global_Pointer had a marginally higher recall (by 0.21 percentage points), MP-CNER led in precision, reaching 90.11%. On the general-domain People's_Daily_NER dataset, MP-CNER again achieved the highest F1 score, 96.10%.
Compared with Lattice-LSTM and SDI-NER, MP-CNER shows clear performance gains on all datasets, especially on the New_Energy_Vehicle dataset. This suggests that, unlike traditional lattice structures or complex graph neural networks that fuse syntactic information, the MDPF dynamic feature fusion scheme guided by Chinese-RoBERTa-wwm-ext extracts entity boundary information more directly and efficiently. As a result, it effectively mitigates the recognition errors caused by the word segmentation biases common in specialized domains.
MP-CNER's lead over MRC and Efficient_Global_Pointer highlights the advantages of the SCL. By using boundary signals to establish skip links, the SCL captures the internal structure of long-span entities more flexibly than either the fixed query templates of MRC or the global span-scoring matrices of GlobalPointer. This flexibility is reflected in higher overall precision.
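To make the "global matrices" of the GlobalPointer baselines concrete, the sketch below shows how such a decoder reads entities off a per-type score matrix. In this scheme, every (start, end) pair in the upper triangle is scored, and every pair whose score exceeds a threshold becomes an entity. This is a simplified illustration of the baseline, not of MP-CNER or the SCL. The real model computes the start and end vectors with learned, type-specific projections of the encoder outputs and adds rotary position encoding, and Efficient GlobalPointer further shares parameters across types. All of these components are omitted here. The function names are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token inclusive, entity type)


def span_scores(q: List[List[float]], k: List[List[float]]) -> List[List[float]]:
    """Score every (start, end) pair as the dot product of start vector q_i and end vector k_j."""
    return [[sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]


def decode_global_pointer(
    scores: List[List[List[float]]], labels: List[str], threshold: float = 0.0
) -> List[Span]:
    """Read entities from one n x n score matrix per type; only start <= end is valid."""
    spans: List[Span] = []
    for matrix, label in zip(scores, labels):
        n = len(matrix)
        for i in range(n):
            for j in range(i, n):  # upper triangle only; lower-triangle scores are ignored
                if matrix[i][j] > threshold:
                    spans.append((i, j, label))
    return spans


# Toy start/end vectors for a 3-token sentence and a single hypothetical type.
q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
k = [[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0]]
print(decode_global_pointer([span_scores(q, k)], ["Term"]))
```

Because all O(n²) candidate spans are scored independently, this design handles nested and overlapping entities naturally. However, whether a long span is recognized depends entirely on a single pairwise start/end score. This is the rigidity that the text contrasts with the SCL's boundary-guided skip links.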
Qwen-7B performed considerably worse than the specialized models, with F1 scores ranging from about 50% to 80%. This suggests a severe domain mismatch when a general-purpose LLM is applied, without domain-specific fine-tuning, to technical curriculum texts such as nursing materials. In contrast, by integrating a domain lexicon as prior knowledge, MP-CNER shows a decisive advantage in extracting specialized curriculum knowledge.
To verify the effectiveness of Chinese-RoBERTa-wwm-ext as the encoding layer of MP-CNER, we conducted comparative experiments on the Nursing dataset in which it was replaced with other representative pre-trained models. The results are illustrated in Figure 4.
As Figure 4 shows, using Chinese-RoBERTa-wwm-ext as the embedder yields clearly better results than any of the alternatives, including traditional word2vec embeddings and the standard BERT and RoBERTa models, confirming its suitability for this role. We attribute this mainly to its whole-word masking strategy. During pre-training, this strategy masks all characters of a Chinese word together rather than masking isolated characters, which substantially strengthens the Chinese feature representations it produces.
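The difference between whole-word and character-level masking can be illustrated with the simplified sketch below. It assumes the sentence has already been segmented into words, and it omits BERT's 80/10/10 replacement rule (mask / random token / unchanged). It is therefore an illustration of the masking principle, not the actual pre-training code of Chinese-RoBERTa-wwm-ext. The function name `whole_word_mask` is hypothetical.

```python
import random
from typing import List

MASK = "[MASK]"


def whole_word_mask(words: List[str], mask_ratio: float = 0.15, seed: int = 0) -> List[str]:
    """Mask whole words until roughly mask_ratio of the characters are covered.

    `words` is an already-segmented Chinese sentence; the output has one token
    per character, as in character-level Chinese BERT tokenization.
    """
    rng = random.Random(seed)
    total_chars = sum(len(w) for w in words)
    budget = max(1, round(total_chars * mask_ratio))
    order = list(range(len(words)))
    rng.shuffle(order)  # visit words in random order
    selected = set()
    covered = 0
    for idx in order:
        if covered >= budget:
            break
        selected.add(idx)           # a whole word is chosen, never a lone character
        covered += len(words[idx])  # may overshoot the budget by part of a word
    tokens: List[str] = []
    for idx, word in enumerate(words):
        tokens.extend([MASK] * len(word) if idx in selected else list(word))
    return tokens


print(whole_word_mask(["压疮", "护理", "需要", "定期", "评估"], mask_ratio=0.3, seed=1))
```

With character-level masking, the model could often recover a masked character such as "疮" from its unmasked neighbor "压". Masking both characters of "压疮" together forces the model to predict the whole word from the wider context, which is the intuition behind the gains attributed to whole-word masking.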
5.2. Ablation Study
To further investigate the contribution of each module in MP-CNER, we conducted ablation studies on all three datasets and evaluated the following variants: R_Embedding, in which Chinese-RoBERTa-wwm-ext is removed and replaced with word2vec as the embedding layer; R_MDPF, in which the MDPF module is removed; and R_SCL, in which the SCL is discarded. The detailed results of these ablation experiments are presented in Table 5, Table 6, and Table 7 (bold values indicate the best results).
To show the impact of the individual modules on entity recognition performance more clearly, the results are visualized in Figure 5.
The ablation results validate the effectiveness of each module in the proposed architecture: removing any single component consistently degrades NER performance. This indicates that the Chinese-RoBERTa-wwm-ext embedding layer, the multi-dimensional position feature fusion (MDPF) module, and the selective contextual linker (SCL) each play an indispensable role in MP-CNER.