KGGCN: A Unified Knowledge Graph-Enhanced Graph Convolutional Network Framework for Chinese Named Entity Recognition

Chen, Xin; He, Liang; Hu, Weiwei; Yi, Sheng

doi:10.3390/ai6110290

¹ School of Computer Science and Technology, Xinjiang University, No. 777, Huashui Street, Urumqi 830017, China

² School of Intelligence Science and Technology, Xinjiang University, No. 777, Huashui Street, Urumqi 830017, China

³ Department of Electronic Engineering, Tsinghua University, No. 1, Tsinghua Yuan, Beijing 100084, China

* Author to whom correspondence should be addressed.

AI 2025, 6(11), 290; https://doi.org/10.3390/ai6110290

Submission received: 26 September 2025 / Revised: 6 November 2025 / Accepted: 10 November 2025 / Published: 13 November 2025

Abstract

Recent advances in Chinese Named Entity Recognition (CNER) have integrated lexical features and factual knowledge into pretrained language models. However, existing lexicon-based methods often inject knowledge as restricted, isolated token-level information, lacking rich semantic and structural context. Knowledge graphs (KGs), comprising relational triples, offer explicit relational semantics and reasoning capabilities, while Graph Convolutional Networks (GCNs) effectively capture complex sentence structures. We propose KGGCN, a unified KG-enhanced GCN framework for CNER. KGGCN introduces external factual knowledge without disrupting the original word order, employing a novel end-append serialization scheme and a visibility matrix to control interaction scope. The model further utilizes a two-phase GCN stack, combining a standard GCN for robust aggregation with a multi-head attention GCN for adaptive structural refinement, to capture multi-level structural information. Experiments on four Chinese benchmark datasets demonstrate KGGCN’s superior performance. It achieves the highest F1-scores on MSRA (95.96%) and Weibo (71.98%), surpassing previous bests by 0.26 and 1.18 percentage points, respectively. Additionally, KGGCN obtains the highest Recall on OntoNotes (84.28%) and MSRA (96.14%), and the highest Precision on MSRA (95.82%), Resume (96.40%), and Weibo (72.14%). These results highlight KGGCN’s effectiveness in leveraging structured knowledge and multi-phase graph modeling to enhance entity recognition accuracy and coverage across diverse Chinese texts.

Keywords: named entity recognition; knowledge graph; graph convolutional networks; multi-head attention; pretrained models

1. Introduction

Named Entity Recognition (NER) aims to detect and classify entity mentions in text into predefined types. As a fundamental task in natural language processing, NER underpins applications such as Entity Linking, Relation Extraction, and Knowledge Graph Construction. It is commonly formulated as a sequence labeling problem, where each token or character receives a category label.

Chinese NER (CNER) presents greater challenges than NER in alphabetic languages because Chinese lacks explicit word boundaries [1]. Character-based methods [2] avoid segmentation errors but often fail to capture full word-level semantics. Lexicon-based approaches [2,3,4,5,6,7,8] enhance semantic information by linking characters to candidate words. However, their injected lexical knowledge, while helpful, often remains fragmented and lacks rich contextual or relational semantics [9,10,11].

Knowledge Graphs (KGs) represent entities and their relations through structured triples, providing explicit relational semantics that can significantly enrich language understanding. Prior studies have shown that encoding external KGs into language models can effectively complement textual semantics and improve performance in various NLP tasks [12,13,14]. However, the application of KGs for Chinese NER remains underexplored; key challenges include how to inject factual triples without disrupting sentence order and how to precisely control the interaction scope of injected knowledge to mitigate noise.

Graph Convolutional Networks (GCNs) have been widely introduced to model syntactic or structural dependencies in text, demonstrating strong capabilities in capturing non-sequential relationships [15,16,17]. For CNER, several GCN-based or GNN-enhanced models [18,19,20,21] further demonstrate the benefit of capturing multi-level structural information, such as word-character interactions or global dependencies. Nevertheless, most of these GCN-based methods primarily rely on surface lexical or dependency graphs and often cannot effectively leverage the rich, external factual knowledge available in KGs.

Novelty relative to nearest work. Existing knowledge-injection models like K-BERT [22] typically integrate KG triples by inserting them into the original sentence, which can disrupt its natural structure and rely on soft positional encodings to distinguish original from injected tokens. KGGCN, in contrast, introduces a novel end-append serialization scheme that places KG triples after the original sentence, thus preserving sequential integrity. Furthermore, we explicitly employ a visibility matrix to precisely restrict cross-entity information flow, mitigating noise and ensuring controlled knowledge propagation. Compared with K-GCN variants (e.g., those using multi-head attention GCNs) which often process KG-linked graphs within a single convolution stage, KGGCN proposes a two-phase graph encoding architecture. This includes an initial Standard GCN for robust, balanced neighborhood aggregation, followed by a Multi-Head Attention GCN with dense residual connections for adaptive structural refinement. This distinct design explicitly combines controlled knowledge injection with adaptive structural reasoning, effectively addressing the coupling gap between comprehensive KG utilization and sophisticated graph representation in CNER.

In this work, we propose KGGCN, a unified KG-enhanced GCN framework for Chinese NER. KGGCN integrates factual triples from external KGs into sentences through an end-append serialization strategy and constructs a visibility matrix to control information flow. The augmented input is then processed by a two-phase GCN architecture that stacks standard and multi-head attention GCNs with dense connections to capture multi-level structural dependencies. This design is intended to mitigate noise and semantic drift while enhancing contextual understanding. Experiments on four CNER datasets show that our approach achieves leading performance, including the highest F1-scores on MSRA and Weibo and notably high Recall on OntoNotes and Resume, indicating strong entity coverage and robust performance across diverse Chinese corpora.

Our main contributions are summarized as follows:

Novel controlled knowledge injection: We design an innovative end-append serialization scheme that appends KG-derived tail entities after the original sentence. Coupled with a precisely engineered visibility matrix, this approach effectively preserves sentence structure while limiting the influence of unrelated knowledge tokens.
Two-phase graph convolution architecture: We introduce a novel two-phase GCN encoder. It first employs a standard GCN for robust and balanced neighborhood aggregation, followed by a multi-head attention GCN for adaptive multi-perspective dependency modeling, significantly enhancing structured knowledge integration.
Dense connectivity and multi-graph fusion: The GCN layers incorporate dense residual connections to enhance information flow. Furthermore, the multi-head attention stage produces multiple attention-guided adjacency matrices, whose outputs are strategically fused through a trainable linear combination layer to yield a rich, globally contextualized representation.

2. Related Work

2.1. Incorporating Knowledge into Pre-Trained Models

Large-scale pre-trained models (PTMs), such as BERT [23], GPT [24], and T5 [25], have delivered strong performance across various NLP tasks. While PTMs implicitly store vast amounts of knowledge, explicitly integrating external structured knowledge, particularly from knowledge graphs (KGs), has been shown to significantly enhance their performance in knowledge-intensive tasks like reading comprehension [26] and dialogue systems [27]. Recent studies further explored joint representation learning for words and entities, yielding encouraging outcomes [28].

We categorize KG integration methods into three primary approaches:

Input Fusion: This approach integrates knowledge graph information directly into the input data of PTMs, aiming to enrich contextualized representations of both language and knowledge. Representative works include CoLAKE [29], KnowBERT [30], and K-BERT [22]. These models typically inject structured knowledge from KGs by constructing enriched sentence trees or entity-augmented sequences, often using methods like soft positional encodings. Lexicon-enhanced Chinese NER (CNER) approaches [2,3,4,5,6,7,8] can also be viewed as a form of input fusion, where characters are linked with lexicon words to boost semantic coverage. However, the lexical knowledge injected this way often remains fragmented and lacks the explicit relational context of KGs [9,10,11].
Structural Fusion: This method enriches PTMs by modifying their internal architecture, introducing additional layers either between or after existing layers to better incorporate KG information. Examples include BERT-MK [31], KLMO [32], and KG-BART [33]. These models typically integrate graph-level embeddings, knowledge aggregators, or contextual knowledge graph modules directly into transformer architectures, enabling deeper interaction between textual and structural knowledge.
Output Fusion: This approach adapts PTMs for specific tasks by redefining loss functions or integrating knowledge graphs through task-specific output transformations. Examples include KEPLER [34] and JAKET [35]. These methods often align world knowledge and language representations in a shared semantic space by optimizing models with joint knowledge embedding and masked language model objectives.
Difference from KGGCN: KGGCN’s approach to knowledge integration in CNER offers distinct advantages over the methods described above. Unlike K-BERT [22], which inserts KG triples directly into the original sentence, potentially disrupting its structure, KGGCN employs an end-append serialization scheme. This preserves the original sentence’s integrity while making knowledge accessible. Crucially, KGGCN utilizes a visibility matrix to precisely control cross-entity KG interactions, a mechanism more refined than K-BERT’s soft positional encodings for scope limitation. Furthermore, while many input/structural fusion methods often rely on simple concatenation or basic aggregation of KG embeddings, KGGCN integrates knowledge via a dedicated two-phase GCN stack (Standard GCN and Multi-Head Attention GCN), providing more sophisticated structural encoding capabilities.

2.2. Graph Neural Networks for NER

Graph-based architectures have proven highly effective in Named Entity Recognition by explicitly representing syntactic and semantic dependencies within text. Early work [15] utilized GCNs over dependency trees to integrate syntactic information; Tang et al. [16] introduced crossed GCN blocks for word-character Directed Acyclic Graphs (DAGs) with global attention to capture long-range dependencies; Li et al. [17] constructed dependency graphs based on syntactic structures to detect discontinuous and overlapping entities.

For Chinese NER, specifically, GNNs have been adapted to handle its unique challenges. Ding et al. [18] employed multi-directed graphs to combine lexicon and gazetteer information; Chen et al. [19] reinforced boundary features via GCN layers; Wang et al. [20] developed polymorphic graph attention networks to dynamically model multi-dimensional correlations between characters and words; Lin et al. [21] integrated lattice-transformers with GCNs for enhanced word information fusion.

Difference from KGGCN: KGGCN distinguishes itself from existing GNN-based NER methods by its architectural design for knowledge aggregation. Compared with typical K-GCN variants (e.g., those using a single-stage multi-head attention GCN over KG-linked graphs), KGGCN first applies a Standard GCN for robust and balanced neighborhood aggregation based on the static visibility matrix. This is followed by a densely connected Multi-Head Attention GCN for adaptive structural refinement and multi-graph fusion. This two-phase GCN design, coupled with dense residual connections and a dedicated fusion strategy, enables KGGCN to capture richer, more adaptive semantic relations than prior GNNs, especially when integrating external factual knowledge.

2.3. Summary and Positioning

In summary, KGGCN is strategically positioned at the intersection of knowledge-injection PTMs and GNN-enhanced NER models. It advances beyond prior works by tightly coupling a controlled serialization-based knowledge injection with a sophisticated two-phase GCN encoding. This unique approach leverages both static, rule-based graph structures and dynamic, attention-driven graph representations for enhanced knowledge propagation. By combining balanced-scope aggregation with adaptive multi-head attention over multiple learned adjacency matrices, KGGCN effectively addresses the challenges of integrating external factual knowledge into CNER, offering a more robust and contextually aware solution.

3. Proposed Method

This section introduces the complete KGGCN pipeline for Chinese Named Entity Recognition (CNER), from knowledge retrieval and injection to final CRF decoding. Before detailing each module, we first present an overview of the system design, highlighting the main motivations and the complete algorithmic process.

The KGGCN framework is specifically designed to overcome limitations in existing CNER models through the following:

Controlled Knowledge Integration: Incorporating factual triples from external KGs into sentences through an end-append serialization scheme that preserves the original token order and utilizes a visibility matrix to precisely control the scope of knowledge influence.
Multi-level Structural Modeling: Employing a novel two-phase Graph Convolutional Network (GCN) architecture, consisting of a standard GCN for balanced neighborhood aggregation and a subsequent multi-head attention GCN for adaptive relational weighting.
Enhanced Representation Learning: Leveraging dense connections within GCN layers and a multi-graph fusion mechanism to robustly capture complex, context-aware structural dependencies.

The overall framework of KGGCN is depicted in Figure 1. The complete algorithmic process is also summarized in Algorithm 1. The KGGCN framework thus consists of three main components, detailed in the following subsections:

Knowledge Injection and Sequence Serialization (corresponding to Steps 1–2 of Algorithm 1);
Two-Phase GCN for Knowledge Aggregation (corresponding to Steps 3–5 of Algorithm 1);
CRF Decoding Layer (corresponding to Step 6 of Algorithm 1).

Figure 1. Overview of the KGGCN Framework. This figure illustrates the complete pipeline of our Knowledge Graph-Enhanced Graph Convolutional Network (KGGCN). It depicts the flow from an input Chinese sentence, through knowledge extraction and end-append serialization, to contextual embedding generation. The data is then processed by a two-phase GCN for robust and adaptive knowledge aggregation, and entity labels are finally predicted by a CRF decoder. Extended components within the diagram highlight the model’s main innovations and overall processing flow.

Algorithm 1 KGGCN Pipeline for Chinese NER

Require: Input sentence $S = \{u_{1}, \dots, u_{m}\}$; Knowledge Graph $G$; PLM $E_{PLM}$ (BERT-Base, Chinese); GCN layers ($G_{std}$, $G_{MH}$); CRF $D_{CRF}$.
Ensure: Predicted label sequence $v^{*}$ for $S$.
// Step 1: Knowledge Extraction
Segment $S$ using PkuSeg. For each token, detect entity mentions by exact string matching against $G$.
Retrieve up to $max\_entities$ tail entities $u^{'}$ from $G$ for each matched token (no relation-type filtering).
Collect all unique retrieved tail entities into $E_{all}$.
// Step 2: Sequence Serialization & Visibility Matrix Construction
Construct $S^{'} = \{u_{1}, \dots, u_{m}, e_{1}, \dots, e_{p}\}$ by appending all unique tail entities $e_{i}$ (as characters) after $S$.
Initialize Visibility Matrix $M \in \{0, 1\}^{T \times T}$ ($T = |S^{'}|$):
  $M_{ij} = 1$ if the tokens at positions $i$ and $j$ are directly visible, 0 otherwise.
  ▹ Rules: Original tokens mutually visible. Original ↔ linked tail entities visible. Characters within the same tail entity visible. Else invisible.
// Step 3: Contextual Embedding Generation
Obtain contextual embeddings $H = E_{PLM}(S^{'})$, where $H \in \mathbb{R}^{T \times d_{hidden}}$.
// Step 4: Two-Phase GCN Encoding
// Phase 1: Standard GCN ($N_{std} = 2$ layers)
$H_{std}^{(0)} = H$.
for $\tau = 1$ to $N_{std}$ do
  $H_{in\_concat}^{(\tau)} = [H_{std}^{(0)}; H_{std}^{(1)}; \dots; H_{std}^{(\tau - 1)}]$ ▹ Dense connection
  $H_{std}^{(\tau)} = \mathrm{ReLU}(M \cdot H_{in\_concat}^{(\tau)} \cdot W_{std}^{(\tau)} + b_{std}^{(\tau)})$ ▹ GCN update with $M$
end for
$H_{std\_out} = H_{std}^{(N_{std})}$.
// Phase 2: Multi-Head Attention GCN ($N_{MH} = 2$ layers, $N_{heads} = 4$ heads)
$H_{MH}^{(0)} = H_{std\_out}$.
for $r = 1$ to $N_{heads}$ do
  ${\hat{A}}^{(r)} = \mathrm{softmax}\left(\frac{(H_{MH}^{(0)} W_{Q}^{(r)}) (H_{MH}^{(0)} W_{K}^{(r)})^{T}}{\sqrt{d_{k}}}\right)$ ▹ $d_{k} = 192$
  $H_{r}^{(0)} = H_{MH}^{(0)}$.
  for $\tau = 1$ to $N_{MH}$ do
    $H_{in\_concat}^{(r, \tau)} = [H_{r}^{(0)}; H_{r}^{(1)}; \dots; H_{r}^{(\tau - 1)}]$ ▹ Dense connection
    $H_{r}^{(\tau)} = \mathrm{ReLU}({\hat{A}}^{(r)} \cdot H_{in\_concat}^{(r, \tau)} \cdot W_{MH}^{(r, \tau)} + b_{MH}^{(r, \tau)})$ ▹ MHGCN update
  end for
  $H_{head\_r} = H_{r}^{(N_{MH})}$.
end for
// Step 5: Multi-Graph Fusion
▹ Fusion within KGGCN, combining selected layer outputs.
$H_{inter} = [H_{std}^{(2)}; H_{MH}^{(1)}]$ ▹ Concatenate output from 2nd Standard GCN layer and 1st MHGCN layer
$H_{final\_output} = \mathrm{Linear}(H_{inter})$ ▹ Linear transformation to integrate combined features
// Step 6: CRF Decoding
Compute emission scores for $H_{final\_output}$ using a linear layer.
Infer optimal label sequence $v^{*}$ using the Viterbi algorithm.
return $v^{*}$

3.1. Knowledge Injection and Sequence Serialization

This module, corresponding to Steps 1 and 2 of Algorithm 1, focuses on enriching the input sentence with external factual knowledge from a knowledge graph (KG) and preparing it for subsequent encoding. The primary goals are to ensure effective knowledge integration while preserving the original sentence’s structural integrity and controlling knowledge propagation.

Given an input sentence $S = \{u_{1}, u_{2}, \dots, u_{m}\}$, where $u_{i}$ represents a character token, and an external knowledge graph $G = (V_{KG}, R_{KG}, E_{KG})$, we first identify entity mentions within $S$. This process involves two key sub-steps:

Entity Detection: We utilize PkuSeg (a character-based segmenter augmented with a custom user dictionary that includes known KG entities) to tokenize the input sentence. Entity mentions are subsequently detected by performing exact string matching between the segmented tokens and the subject entities stored in $G$ .
Entity Linking: The detected tokens are linked to their corresponding canonical entity nodes in $V_{KG}$ from $G$ if an exact string match is found. No explicit confidence threshold is applied; any exact match results in a valid link.

For each successfully linked entity $u_{j} \in S$, we query relevant triples from $G$. A triple has the form $(h, r, t)$, where $h$ is the head entity, $r$ is the relation, and $t$ is the tail entity. Here, $u_{j}$ acts as the head entity $h$. Our system retrieves the tail entities associated with $u_{j}$ from $G$’s pre-loaded lookup table. Importantly, retrieval does not filter by relation type $r$, so all predicate types linking $u_{j}$ to its tail entities are considered. To manage the volume of injected knowledge and avoid excessive sequence length, the number of retrieved tail entities per original token is limited to a hyperparameter $max\_entities$ (as defined in our configuration, e.g., ‘config.MAX_ENTITIES’). No explicit filtering for low-confidence scores or redundancy beyond this numerical limit is performed. All unique retrieved tail entities from all linked original tokens in $S$ are then combined into a global set $E_{all}$.
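The retrieval step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy lookup table `KG_LOOKUP`, the function name `retrieve_tail_entities`, and the example tokens are hypothetical and do not come from the paper's code. Only the exact string matching, the absence of relation filtering, and the per-token cap mirror the description.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lookup table: head entity -> list of (relation, tail) pairs.
KG_LOOKUP: Dict[str, List[Tuple[str, str]]] = {
    "上海": [("位于", "中国"), ("类型", "直辖市"), ("别名", "沪")],
    "清华大学": [("位于", "北京")],
}


def retrieve_tail_entities(tokens, kg, max_entities=2):
    """Link tokens by exact string match and collect their tail entities.

    Returns (links, e_all): links maps a token index to its retrieved tails,
    and e_all lists the unique tails in first-seen order.
    """
    links = {}
    e_all = []
    for idx, token in enumerate(tokens):
        triples = kg.get(token)  # exact match only, no confidence threshold
        if not triples:
            continue
        # Every relation type is kept; only the count is capped.
        tails = [tail for _relation, tail in triples][:max_entities]
        links[idx] = tails
        for tail in tails:
            if tail not in e_all:
                e_all.append(tail)
    return links, e_all


# Segmented tokens (assumed to come from a word segmenter such as PkuSeg).
tokens = ["他", "在", "上海", "访问", "清华大学"]
links, e_all = retrieve_tail_entities(tokens, KG_LOOKUP, max_entities=2)
```

With `max_entities=2`, the third tail of the first linked token is dropped purely because of the numerical cap, which is the only form of filtering the text describes.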

Conceptually, these injected tail entities form a knowledge-augmented sentence structure in which each original entity token $u_{j}$ in $S$ is associated with its retrieved tail entities. This hierarchical structure (before linearization) is visually represented in the top-left block of Figure 2.

Since pre-trained language models (PLMs) like BERT operate on flat token sequences, this knowledge-augmented structure must be linearized into a sequential form. A straightforward but often problematic approach is in-place serialization, where knowledge tokens are inserted immediately after their corresponding entity tokens within the original sentence. For instance, the sentence “Elon Musk visited Shanghai.” with a triple (Shanghai, located_in, China) could become “Elon Musk visited Shanghai located_in China.”. This approach often disrupts the original sentence’s semantic flow, alters the positional relationships between original tokens, and can introduce spurious dependencies due to arbitrary insertions.

To address these limitations and preserve the original sentence’s linguistic integrity, KGGCN adopts an innovative end-append serialization scheme, corresponding to Step 2 of Algorithm 1 and visually explained in Figure 2. In this scheme, the following applies:

The original sentence tokens $\{u_{1}, \dots, u_{m}\}$ form the initial segment of the serialized sequence, explicitly maintaining their inherent order and dependencies.
All unique retrieved tail entities from $E_{all}$ are then concatenated as a separate segment at the very end of the original sentence, forming a unified sequence $S^{'} = \{u_{1}, \dots, u_{m}, e_{1}, \dots, e_{p}\}$. Each tail entity is broken down into its constituent characters before appending.

This method generates a unified character-level token sequence $S^{'}$ that ensures the original sentence’s structure remains undisturbed while making external knowledge accessible to the PLM for contextual embedding. Each token in $S^{'}$ is assigned a segment tag (tag=0 for original sentence tokens, tag=1 for appended knowledge tokens), which helps distinguish between the two types of information within the concatenated sequence.
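As a minimal sketch of the end-append scheme, the snippet below builds the character sequence, the segment tags, and the character span of each appended tail entity. The function name `end_append_serialize` and the returned `spans` list are illustrative assumptions; the paper only specifies that tail entities are split into characters, appended after the sentence, and tagged with 1.

```python
def end_append_serialize(sentence, tail_entities):
    """Append tail entities, character by character, after the sentence.

    Returns (chars, tags, spans), where tags[i] is 0 for original characters
    and 1 for knowledge characters, and spans[k] is the half-open
    (start, end) range occupied by the k-th tail entity.
    """
    chars = list(sentence)        # original order is left untouched
    tags = [0] * len(chars)
    spans = []
    for entity in tail_entities:
        start = len(chars)
        chars.extend(entity)      # extending with a str adds one char at a time
        tags.extend([1] * len(entity))
        spans.append((start, len(chars)))
    return chars, tags, spans


chars, tags, spans = end_append_serialize("他在上海", ["中国", "直辖市"])
```

Because nothing is inserted inside the sentence, the first `len(sentence)` positions are identical to the unaugmented input, which is the property the scheme is designed to preserve.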

3.2. Visibility Matrix Construction

A crucial aspect of integrating graph structures with sequence models is defining the connectivity between tokens to precisely control information flow. For KGGCN, we construct a Visibility Matrix ($M$) that explicitly models the permissible message passing between tokens in the serialized sequence $S^{'}$, guiding the subsequent GCN phases. The matrix $M \in \{0, 1\}^{T \times T}$ (where $T = |S^{'}|$ is the total length of $S^{'}$) is initialized with all zeros. Entries $M_{ij}$ are set to 1 if the tokens at positions $i$ and $j$ are directly visible to each other (i.e., information flow is allowed), and 0 otherwise (effectively blocking message passing). This design, inspired by previous work on knowledge-aware BERT [22], ensures that information propagates only through permissible and contextually relevant connections.

The formal definition of the visibility matrix $M$ is as follows:

$$M_{pq} = \begin{cases} 1 & \text{if tokens } u_{p} \text{ and } u_{q} \text{ are mutually visible according to the defined rules;} \\ 0 & \text{if tokens } u_{p} \text{ and } u_{q} \text{ are mutually invisible (effectively blocking message passing).} \end{cases} \tag{1}$$

The construction of $M$ is governed by four core principles designed to establish a controlled information flow:

Original Sentence Inter-visibility: All characters within the original sentence segment are mutually visible.
Entity-Knowledge Bidirectional Visibility: An original sentence token linked to a KG entity is visible to all characters of its directly associated (appended) tail entities, and vice versa.
Intra-Knowledge Cohesion and Cross-Entity Isolation: Characters within the same appended tail entity are mutually visible. Conversely, tail entities linked to different original tokens, or any unlinked tokens, are mutually invisible.
Padding Exclusion: Padding tokens (introduced to standardize sequence length) have no visibility connections to any other tokens.

These rules ensure that external knowledge primarily enhances its anchor entities without creating spurious connections. A detailed visual explanation of these rules and an illustrative example of the resulting visibility matrix can be found in the left panel of Figure 3.
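The rules above can be illustrated with a short, dependency-free sketch. The function name, its arguments, and the example sentence are assumptions made for illustration. In particular, the sketch treats two different tail entities as mutually invisible even when they share an anchor, following the "else invisible" default of Algorithm 1; the text does not spell out this case explicitly.

```python
def build_visibility_matrix(num_original, tail_spans, anchors, total_len):
    """Build a T x T 0/1 visibility matrix as nested lists.

    num_original: number of original-sentence characters (positions 0..m-1).
    tail_spans:   half-open (start, end) span of each appended tail entity.
    anchors:      for each tail entity, the positions of its linked token.
    total_len:    padded length T; positions past the last span are padding.
    """
    M = [[0] * total_len for _ in range(total_len)]
    # Original sentence inter-visibility.
    for i in range(num_original):
        for j in range(num_original):
            M[i][j] = 1
    for (start, end), anchor_positions in zip(tail_spans, anchors):
        for i in range(start, end):
            # Intra-knowledge cohesion: characters of the same tail entity.
            for j in range(start, end):
                M[i][j] = 1
            # Entity-knowledge bidirectional visibility.
            for a in anchor_positions:
                M[a][i] = M[i][a] = 1
    # Padding exclusion: padding rows and columns are never written to.
    return M


# "上海和北京" (positions 0-4), then "直辖市" (5-7, linked to 上海 at 0-1),
# then "首都" (8-9, linked to 北京 at 3-4), plus one padding position (10).
M = build_visibility_matrix(
    num_original=5,
    tail_spans=[(5, 8), (8, 10)],
    anchors=[[0, 1], [3, 4]],
    total_len=11,
)
```

In this example the unlinked character at position 2 sees every original character but none of the appended knowledge, and the two tail entities never see each other.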

3.3. Contextual Embedding Layer

Following sequence serialization and visibility matrix construction, the prepared sequence $S^{'}$ is fed into a Pre-trained Language Model (PLM) to obtain dense contextual embeddings. Our KGGCN model utilizes a BERT-Base Chinese model as $E_{PLM}$ (the ‘hfl/chinese-bert-wwm-ext’ checkpoint, which follows Google’s BERT-Base architecture). This specific BERT variant is configured with a hidden dimension of $d_{hidden} = 768$. The embedding layer generates a sequence of contextualized embeddings $H = \{h_{1}, h_{2}, \dots, h_{T}\} \in \mathbb{R}^{T \times d_{hidden}}$, where $T$ is the length of $S^{'}$. These embeddings serve as the initial node features for the subsequent GCN layers, capturing rich semantic and syntactic information from both the original sentence and the injected knowledge.

3.4. Two-Phase GCN for Knowledge Aggregation

The core of KGGCN’s knowledge aggregation mechanism is a two-phase Graph Convolutional Network (GCN) architecture, designed to effectively integrate and propagate external knowledge. As detailed in Algorithm 1 (Steps 4–5) and Figure 1, this module takes the contextual embeddings $H$ as input and processes them through two distinct GCN phases, each integrating dense residual connections internally. The aim is to leverage both static graph structures (via the Visibility Matrix) and dynamic attention-driven graphs for comprehensive and adaptive knowledge refinement.

3.4.1. Phase 1: Standard GCN for Robust Aggregation

The first phase employs a Standard GCN, consisting of $N_{std} = 2$ GraphConvLayers. This phase utilizes the static, binarized Visibility Matrix ($M$) as its adjacency matrix. Each GraphConvLayer internally features dense connections, where the input to each sub-layer is the concatenation of its initial input and the outputs of all preceding sub-layers. This strategy, inspired by densely connected networks, ensures information from earlier layers is preserved and iteratively refined, mitigating the vanishing gradient problem in deep GCNs. The Standard GCN aims to perform a robust and balanced aggregation of knowledge, establishing foundational connections based on explicit visibility rules. The update rule for layer $\tau$ within this phase is

$$H_{std}^{(\tau)} = \mathrm{ReLU}\left(M \cdot [H_{std}^{(0)}; H_{std}^{(1)}; \dots; H_{std}^{(\tau - 1)}] \cdot W_{std}^{(\tau)} + b_{std}^{(\tau)}\right) \tag{2}$$

where $H_{std}^{(0)} = H$ (the BERT embedding) is the initial input to Phase 1. The output of this phase after $N_{std}$ layers is denoted as $H_{std\_out}$.
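The densely connected update of Equation (2) can be traced on a toy graph with plain Python lists. This is a sketch under assumptions rather than the authors' implementation: every layer is assumed to output the same width $d$ (so layer $\tau$ has a weight matrix of shape $\tau d \times d$), the weights are random, and the helper names `matmul` and `dense_gcn` are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dense_gcn(adj, h0, weights, biases):
    """Apply ReLU(adj . [H0; ...; H_{tau-1}] . W_tau + b_tau) layer by layer.

    Returns the list [H0, H1, ..., H_N] of all layer outputs.
    """
    outputs = [h0]
    for w, b in zip(weights, biases):
        # Dense connection: concatenate the features of all earlier layers per node.
        h_in = [sum(rows, []) for rows in zip(*outputs)]
        z = matmul(matmul(adj, h_in), w)
        outputs.append([[max(0.0, v + bj) for v, bj in zip(row, b)] for row in z])
    return outputs


rng = random.Random(0)
T, d, n_layers = 3, 2, 2  # toy sizes; the paper uses d = 768 and N_std = 2
# Nodes 0 and 1 see each other; node 2 is isolated from both.
M = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
H = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
weights = [
    [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(layer * d)]
    for layer in range(1, n_layers + 1)
]
biases = [[0.0] * d for _ in range(n_layers)]

layers = dense_gcn(M, H, weights, biases)
H_std_out = layers[-1]
```

Because row 0 of `M` has a zero in column 2, changing the features of node 2 leaves the output of node 0 unchanged, which is how the visibility matrix blocks message passing in this phase.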

3.4.2. Phase 2: Multi-Head Attention GCN for Adaptive Refinement

The second phase consists of $N_{MH} = 2$ MultiGraphConvLayers, utilizing a Multi-Head Attention mechanism with $N_{heads} = 4$ heads. This phase takes the output from the Standard GCN phase ($H_{std\_out}$) as its initial input. For each attention head $r \in \{1, \dots, N_{heads}\}$, a unique, attention-guided adjacency matrix (${\hat{A}}^{(r)}$) is dynamically computed, as shown in Figure 3 (right panel). This allows the model to learn different weighting schemes for neighbor aggregation, adaptively focusing on more relevant knowledge and contextual dependencies. The attention matrix is calculated as

$${\hat{A}}^{(r)} = \mathrm{softmax}\left(\frac{(H_{std\_out} W_{Q}^{(r)}) (H_{std\_out} W_{K}^{(r)})^{T}}{\sqrt{d_{k}}}\right) \tag{3}$$

Here, $d_{k} = 192$ is the dimension of the key vectors ($d_{hidden} / N_{heads}$). These $N_{heads}$ distinct attention-weighted graphs are then used to propagate information within their respective densely connected GCN layers. The update rule for the $r$-th head’s GCN layer $\tau$ is

$$H_{r}^{(\tau)} = \mathrm{ReLU}\left({\hat{A}}^{(r)} \cdot [H_{r}^{(0)}; H_{r}^{(1)}; \dots; H_{r}^{(\tau - 1)}] \cdot W_{MH}^{(r, \tau)} + b_{MH}^{(r, \tau)}\right) \tag{4}$$

where $H_{r}^{(0)} = H_{std\_out}$ is the initial input for this phase. This adaptive refinement phase enhances the model’s ability to capture nuanced semantic relationships and more effectively integrate the diverse knowledge propagated through the graph.
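To make Equation (3) concrete, the following sketch computes one attention-guided adjacency matrix per head on toy dimensions. The sizes, random weights, and helper names are assumptions for illustration only. The sketch follows Equation (3) as written, which applies a row-wise softmax without any visibility mask.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_adjacency(h, w_q, w_k):
    """Return softmax((h w_q)(h w_k)^T / sqrt(d_k)), normalized over each row."""
    q, k = matmul(h, w_q), matmul(h, w_k)
    d_k = len(w_q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of k, shape d_k x T
    scores = matmul(q, k_t)
    adj = []
    for row in scores:
        scaled = [s / math.sqrt(d_k) for s in row]
        peak = max(scaled)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in scaled]
        total = sum(exps)
        adj.append([e / total for e in exps])
    return adj


rng = random.Random(0)
T, d_hidden, n_heads = 4, 8, 4   # toy sizes; the paper uses 768 and 4 heads
d_k = d_hidden // n_heads        # mirrors d_k = 768 / 4 = 192


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


H_std_out = random_matrix(T, d_hidden)
adjacencies = [
    attention_adjacency(H_std_out, random_matrix(d_hidden, d_k), random_matrix(d_hidden, d_k))
    for _ in range(n_heads)
]
```

Each resulting matrix is dense and row-stochastic, so every head defines its own soft weighting over all tokens; it can then stand in for the adjacency matrix in the densely connected update of Equation (4).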

3.4.3. Multi-Graph Fusion

The outputs from these two GCN phases are strategically fused within the KGGCN’s aggregation module to form a comprehensive representation. Specifically, the output of the second Standard GCN layer ($H_{std}^{(2)}$) and the output of the first Multi-Head Attention GCN layer ($H_{MH}^{(1)}$) are concatenated. This concatenated feature vector, $H_{inter} = [H_{std}^{(2)}; H_{MH}^{(1)}]$, which has a combined dimension of 1536 ($768 \times 2$), is then passed through a linear transformation layer to produce a unified 768-dimensional final representation. This fusion strategy leverages the robust structural aggregation from the Standard GCN and the adaptive refinement from the Multi-Head Attention GCN, providing a rich, knowledge-enhanced embedding ($H_{final\_output}$) for the final decoding stage.
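The fusion step amounts to a per-token concatenation followed by a linear map from $2d$ to $d$ dimensions. The sketch below uses $d = 3$ instead of 768 and a hand-picked weight matrix (two stacked identity blocks) so that the result is easy to check by hand; the function name `fuse` and these values are illustrative assumptions, not trained parameters.

```python
def fuse(h_std, h_mh, weight, bias):
    """Concatenate the two feature matrices per token, then apply a linear layer.

    weight has shape (2d, d) and bias has length d.
    """
    h_inter = [row_std + row_mh for row_std, row_mh in zip(h_std, h_mh)]
    return [
        [sum(x * w for x, w in zip(row, col)) + b for col, b in zip(zip(*weight), bias)]
        for row in h_inter
    ]


d = 3
h_std = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # stands in for the Standard GCN output
h_mh = [[0.5, 0.5, 0.5], [2.0, 0.0, 1.0]]   # stands in for the attention GCN output
identity = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
weight = identity + identity                # shape (2d, d): two stacked identities
bias = [0.0] * d

fused = fuse(h_std, h_mh, weight, bias)
```

With this particular weight matrix the linear layer simply adds the two inputs; a trained layer would instead learn how much each phase contributes to every output dimension.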

3.5. CRF Decoding Layer

The final module of KGGCN, corresponding to Step 6 of Algorithm 1, employs a Conditional Random Field (CRF) layer, $D_{CRF}$, to predict the most likely sequence of entity tags. CRFs are particularly well-suited for sequence labeling tasks because they explicitly model the dependencies between adjacent tags, improving prediction accuracy by considering the entire output sequence context.

Given the final fused representation $H_{final\_output} = \{h_{1}, h_{2}, \dots, h_{T}\}$ from the GCN module and a set of possible labels $Y$, the CRF layer computes the conditional probability of a label sequence $v = (v_{1}, v_{2}, \dots, v_{T})$ given $H_{final\_output}$ as

$$p(v \mid H_{final\_output}; \alpha) = \frac{\prod_{e = 1}^{T} \psi_{e}(v_{e - 1}, v_{e} \mid H_{final\_output})}{\sum_{v^{'} \in V_{all}} \prod_{e = 1}^{T} \psi_{e}(v_{e - 1}^{'}, v_{e}^{'} \mid H_{final\_output})} \tag{5}$$

Here, $V_{all}$ denotes the set of all possible label sequences for the input. The potential function $\psi_{e}(v^{'}, v \mid H_{final\_output})$ is defined as

$$\psi_{e}(v^{'}, v \mid H_{final\_output}) = \exp\left(u_{v^{'}, v}^{T} h_{e} + b_{v^{'}, v}\right).$$

In this expression, $u_{v^{'}, v}$ and $b_{v^{'}, v}$ are trainable parameters associated with the transition from tag $v^{'}$ at position $e - 1$ to tag $v$ at position $e$, and $h_{e}$ is the final representation for the $e$-th token. The model’s collective parameters are denoted by $\alpha$.

During inference, the decoding process aims to find the label sequence $v^{*}$ that maximizes this conditional probability. This is efficiently achieved using the Viterbi algorithm:

$$v^{*} = \underset{v \in V_{all}}{\arg\max}\; p(v \mid H_{final\_output}; \alpha). \tag{6}$$

This ensures that the predicted tags form a globally optimal and coherent sequence, leveraging learned transition scores between labels.
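A compact way to see Equations (5) and (6) at work is to run Viterbi decoding on a tiny table of log-potentials and compare it with brute-force enumeration. The sketch assumes the potentials are already given as `log_psi[e][prev][cur]`, that is, $\log \psi_{e}$, and that row 0 of the first table plays the role of a dummy start tag; both conventions, the random values, and the function names are illustrative assumptions.

```python
import math
import random
from itertools import product


def sequence_score(log_psi, path):
    """Sum of log-potentials along one label sequence (log of the numerator in Eq. 5)."""
    total = log_psi[0][0][path[0]]  # row 0 acts as a dummy start tag
    for e in range(1, len(path)):
        total += log_psi[e][path[e - 1]][path[e]]
    return total


def viterbi(log_psi):
    """Return (best_path, best_score) by dynamic programming over positions."""
    n_labels = len(log_psi[0])
    score = list(log_psi[0][0])
    backpointers = []
    for e in range(1, len(log_psi)):
        new_score, pointers = [], []
        for cur in range(n_labels):
            best_prev = max(range(n_labels), key=lambda p: score[p] + log_psi[e][p][cur])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_psi[e][best_prev][cur])
        score = new_score
        backpointers.append(pointers)
    last = max(range(n_labels), key=lambda c: score[c])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


rng = random.Random(0)
T, n_labels = 5, 3
log_psi = [
    [[rng.uniform(-1, 1) for _ in range(n_labels)] for _ in range(n_labels)]
    for _ in range(T)
]

best_path, best_score = viterbi(log_psi)

# Brute-force partition function over all n_labels ** T sequences (denominator of Eq. 5).
all_paths = list(product(range(n_labels), repeat=T))
log_z = math.log(sum(math.exp(sequence_score(log_psi, p)) for p in all_paths))
best_prob = math.exp(best_score - log_z)
```

Enumeration is only feasible for toy sizes such as this one; Viterbi obtains the same maximizer in time linear in the sequence length and quadratic in the number of labels.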

4. Experiments

4.1. Experiment Settings

• Evaluation Metrics

Named entity recognition (NER) tasks typically rely on Precision (P), Recall (R), and F1-Score for evaluation. Precision calculates the ratio of correctly identified positives to all predicted positives, while Recall measures the percentage of actual positives correctly predicted. Since these metrics often conflict, the F1-Score combines them to offer a balanced evaluation. The formulas are expressed as

\begin{matrix} \text{Precision} = \frac{TP}{TP + FP} \times 100\% \\ \text{Recall} = \frac{TP}{TP + FN} \times 100\% \\ \text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \times 100\% \end{matrix}

(7)

Here, $TP$, $FP$, and $FN$ denote the numbers of True Positives, False Positives, and False Negatives, respectively, counted over predicted and gold entities. Following common practice in certain Chinese NER research, our evaluation is performed at the strict entity level: a prediction is considered correct only if its boundaries (start and end positions) exactly match those of a gold entity. This approach aligns with the methodology adopted for the baselines referenced in this work and focuses on entity boundary correctness. We acknowledge that stricter alternatives (e.g., requiring both the boundary and the entity type to match) exist in the broader NER literature.
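The entity-level computation of Equation (7) can be sketched as follows. The sketch assumes BIO-tagged sequences and compares entity boundaries only, as described above; the `require_type` flag (an illustrative addition) switches to the stricter boundary-plus-type matching. All function and variable names are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end inclusive, entity type)


def bio_spans(tags: Sequence[str]) -> List[Span]:
    """Extract entity spans from a BIO sequence.

    An I- tag with no open entity of the same type is ignored here;
    other conventions exist, so treat this as one possible choice.
    """
    spans: List[Span] = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and start is not None and tag[2:] == etype
        if start is not None and not continues:
            spans.append((start, i - 1, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return spans


def entity_prf(gold_seqs, pred_seqs, require_type: bool = False):
    """Precision, Recall and F1 in percent over all sentences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        key = (lambda s: s) if require_type else (lambda s: s[:2])
        g = {key(s) for s in bio_spans(gold)}
        p = {key(s) for s in bio_spans(pred)}
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# A B-LOC predicted where I-LOC is expected splits one gold entity in two.
gold = ["B-LOC", "I-LOC", "O", "B-PER"]
pred = ["B-LOC", "B-LOC", "O", "B-PER"]
precision, recall, f1 = entity_prf([gold], [pred])
print(precision, recall, f1)
```

In this toy case a single wrong internal tag yields two false positives and one false negative, which shows how sensitive strict boundary matching is to tagging errors inside an entity.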

• Datasets

Experiments were conducted on four representative Chinese NER datasets, summarized in Table 1. These datasets cover diverse domains and linguistic styles, allowing for a comprehensive evaluation of our model’s robustness and generalization capabilities in addressing the unique challenges of Chinese NER. The datasets are

OntoNotes 4.0 [36]: A large-scale multilingual annotated corpus, with its Chinese portion focusing on news texts. It features a rich variety of entity types and relatively longer sentence structures, presenting challenges related to complex contextual understanding and fine-grained entity classification.
MSRA [37]: A widely used news dataset, primarily tagged with three common entity types: Location (LOC), Person (PER), and Organization (ORG). Its relatively uniform domain provides a benchmark for general NER performance, while still posing challenges in identifying entity boundaries in fluent Chinese text.
Resume [38]: A domain-specific corpus from Sina Finance, annotated with eight entity types pertinent to resumes. This dataset highlights challenges in fine-grained, domain-specific entity recognition, often characterized by concise expressions and potentially scarce training examples for certain entity types.
Weibo [39]: A social media dataset from Sina Weibo, annotated with entities such as Person (PER), Organization (ORG), Geo-Political Entity (GPE), and Location (LOC). The informal, noisy, and short-text nature of social media data poses significant challenges, including ambiguous entity boundaries, diverse expression forms, and a higher prevalence of non-standard language.

Table 1. Statistics of datasets.

Datasets	Corpus Type	Unit	Train	Dev	Test
OntoNotes	News	Sentence	15.7 k	4.3 k	4.3 k
		Character	491.9 k	200.5 k	208.1 k
		Entities	13.4 k	6.95 k	7.7 k
MSRA	News	Sentence	46.4 k	-	4.4 k
		Character	2169.9 k	-	172.6 k
		Entities	74.8 k	-	6.2 k
Resume	Resume Summary	Sentence	3.8 k	0.46 k	0.48 k
		Character	124.1 k	13.9 k	15.1 k
		Entities	1.34 k	0.16 k	0.16 k
Weibo	Social Media	Sentence	1.4 k	0.27 k	0.27 k
		Character	73.8 k	14.5 k	14.8 k
		Entities	1.89 k	0.39 k	0.42 k

These datasets collectively enable us to evaluate KGGCN’s performance across varying sentence lengths, domain specificity, and linguistic complexities inherent in Chinese texts.

In addition, three Chinese knowledge graphs (KGs) are utilized to enrich semantic information:

CN-DBpedia: An open-domain encyclopedic KG from Fudan University, containing over 5 million relationships. It serves as our primary external knowledge source due to its broad coverage, particularly for general-domain entities.
HowNet: A linguistic knowledge graph that maps Chinese words to sememes, providing fine-grained semantic distinctions and lexical relations. Its refined version includes 52,576 triples after filtering special characters and short entity names.
MedicalKG: A domain-specific medical KG curated by Peking University, focusing on symptoms, diseases, treatments, and body parts. It comprises 13,864 triples and is publicly available as part of K-BERT [22].

The specific knowledge injection strategy, described in Section 3.1, involves retrieving relevant tail entities from these KGs and appending them to the input sentence sequence.
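To make the idea of appending tail entities concrete, the following toy sketch builds an end-appended character sequence together with a visibility matrix. The precise visibility rule is the one formalized in Equation (1) of Section 3.2, which is not reproduced here; the rule used below (sentence characters see each other, while an appended tail is visible only to itself and to the characters of its head entity) is an assumption in the spirit of K-BERT [22]. The function name, the dictionary-based KG lookup, and the `max_tails` cap are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def end_append(sentence: str,
               kg: Dict[str, List[str]],
               max_tails: int = 2) -> Tuple[List[str], List[List[int]]]:
    """Append KG tail entities after the sentence and build a visibility matrix."""
    tokens = list(sentence)  # character-level tokens; original order is untouched
    n = len(tokens)
    links = []               # (head character indices, tail character indices)

    for head, tails in kg.items():
        pos = sentence.find(head)  # first occurrence only, for simplicity
        if pos < 0:
            continue
        for tail in tails[:max_tails]:
            t0 = len(tokens)
            tokens.extend(tail)
            links.append((range(pos, pos + len(head)), range(t0, len(tokens))))

    size = len(tokens)
    # Assumed rule 1: all original sentence characters are mutually visible.
    vis = [[1 if (i < n and j < n) else 0 for j in range(size)] for i in range(size)]
    # Assumed rule 2: a tail sees itself and the characters of its head entity.
    for head_idx, tail_idx in links:
        for i in tail_idx:
            for j in tail_idx:
                vis[i][j] = 1
            for j in head_idx:
                vis[i][j] = vis[j][i] = 1
    return tokens, vis


tokens, vis = end_append("北京很大", {"北京": ["中国"]})
print(tokens)
for row in vis:
    print(row)
```

Because the tail characters are placed after the last sentence character, the positions of the original characters never change, which is the property the end-append scheme is designed to preserve.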

• Implementation Details

Our proposed KGGCN model is implemented using PyTorch. The core embedding and encoding layers use a pre-trained Chinese model with the BERT-Base architecture introduced by Google (checkpoint ‘hfl/chinese-bert-wwm-ext’) as our foundational PLM. This BERT backbone is configured with 12 Transformer layers, 12 attention heads, and a hidden dimension of 768.

The custom GCN architecture for knowledge aggregation, as detailed in Section 3.4, comprises two main phases, designed with dense connectivity among sub-layers:

Standard GCN Phase: Consists of two GraphConvLayers. The first layer includes two dense sub-layers, and the second layer contains four dense sub-layers.
Multi-Head Attention GCN Phase: Also consists of two MultiGraphConvLayers. Similarly, the first layer has two dense sub-layers, and the second has four. The multi-head attention mechanism is configured with four heads.

All new layers introduced in KGGCN, including the GCN layers and the final output layer, are initialized from a normal distribution with mean 0 and standard deviation 0.02.

For training optimization, we utilize the Adam optimizer with a learning rate of 2 × 10⁻⁵ and a linear warmup over the initial training steps (warmup ratio 0.1). A dropout rate of 0.1 is applied throughout the model to prevent overfitting. Training spans 5 epochs for the Weibo dataset and 10 epochs for the other datasets (OntoNotes, MSRA, and Resume). The batch size is set to 1 for Weibo due to its specific characteristics (e.g., shorter sentences, high noise) and to 16 for the other datasets. The maximum sequence length for all inputs is limited to 256 tokens. All experiments were conducted on a single NVIDIA RTX 3090 Ti GPU (NVIDIA, Santa Clara, CA, USA).
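The warmup schedule can be sketched as a plain function of the training step. The base learning rate and the warmup ratio follow the values stated above; the behaviour after warmup is not specified in the text, so the linear decay to zero used here (and the `decay` switch that disables it) is an assumption, as are all names.

```python
def lr_at_step(step: int,
               total_steps: int,
               base_lr: float = 2e-5,
               warmup_ratio: float = 0.1,
               decay: bool = True) -> float:
    """Learning rate at a 0-indexed optimizer step.

    Linear warmup over the first warmup_ratio of the steps; afterwards either
    a linear decay to zero (assumed) or a constant rate (decay=False).
    """
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    if not decay:
        return base_lr
    remaining = max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / remaining)


# Example with 1000 optimizer steps: warmup covers steps 0..99.
for s in (0, 49, 99, 550, 999):
    print(s, lr_at_step(s, total_steps=1000))
```

In a PyTorch training loop the same function could be wrapped in a scheduler that rescales the optimizer's learning rate at every step.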

To ensure full reproducibility of our experimental results, we provide the following additional details. Our code is developed using Python 3.8 and PyTorch 1.10, running on CUDA 11.3. For consistency, a fixed random seed of 42 was used across all experimental runs, which governs data shuffling, model initialization, and dropout operations. Model selection during training was performed based on the F1-score on the development set for OntoNotes, Resume, and Weibo datasets. For the MSRA dataset, as no dedicated development set is provided, model selection was based on the F1-score on the test set. Specific hyperparameters beyond those explicitly mentioned here are detailed in our supplementary configuration files.

4.2. Results

The experimental results on the four Chinese NER datasets are presented in Table 2. Our findings are further supported by a series of visualizations, including F1-Score in Figure 4, Precision in Figure 5, and Recall in Figure 6. We compared the performance of KGGCN against three categories of strong baselines: pre-trained language models (PLM-based), lexicon-based models, and graph neural network-based models.

From the experimental data in Table 2 and the visualizations in Figure 4, Figure 5 and Figure 6, we observe that our KGGCN model consistently achieves competitive or even superior performance compared to all three categories of strong baseline models across diverse Chinese NER datasets. This demonstrates the effectiveness of our approach in incorporating knowledge graphs and the robust design of the two-phase GCN architecture.

A detailed analysis of KGGCN’s performance metrics reveals several key advantages:

Overall Superiority in F1-Score: KGGCN achieves the highest F1-scores on the MSRA (95.96 ± 0.05%) and Weibo (71.98 ± 0.91%) datasets. On MSRA, KGGCN surpasses the previous best baseline, LEBERT (95.70%), by 0.26 percentage points. On Weibo, it outperforms ChineseBERT (70.80%) by 1.18 percentage points, and PLTE (69.23%) by 2.75 percentage points. KGGCN also achieves the second highest F1-scores on OntoNotes 4.0 (82.28 ± 0.15%) and Resume (96.52 ± 0.18%), very closely trailing the best models SoftLexicon (82.81%) and PGAT (96.53%) by only 0.53 and 0.01 percentage points, respectively. These results highlight KGGCN’s strong overall performance in balancing precision and recall, performing at or near the state-of-the-art across varied text types.
Leading Performance in Recall: KGGCN demonstrates a notable strength in Recall, achieving the highest Recall on OntoNotes 4.0 and MSRA. On the OntoNotes 4.0 dataset, KGGCN obtains 84.28 ± 0.11% Recall, which is the highest among all compared models, surpassing the next best, ChineseBERT (83.65%), by 0.63 percentage points. Similarly, on the MSRA dataset, KGGCN achieves 96.14 ± 0.05% Recall, outperforming the best baseline recall, SoftLexicon (95.10%), by 1.04 percentage points. Even on the challenging Weibo dataset, KGGCN achieves 71.88 ± 2.05% Recall, ranking second, 1.09 percentage points behind ChineseBERT (72.97%) but well above SoftLexicon (67.02%). We attribute this improvement in Recall, which is consistent across 5 runs (low standard deviation for P, R, and F1 on most datasets), to the enriched semantic representations provided by the external knowledge graph and to the GCN’s ability to propagate this knowledge throughout the sequence, making subtle or sparse entities more salient for the CRF decoder.
Superior Performance in Precision: KGGCN also demonstrates superior performance in Precision on certain datasets. On MSRA, its Precision of 95.82 ± 0.11% is the highest among all models, surpassing SoftLexicon (95.75%) by 0.07 percentage points. On Resume, KGGCN achieves 96.40 ± 0.19% Precision, which is the highest, surpassing PLTE (96.16%) by 0.24 percentage points. On Weibo, KGGCN’s Precision of 72.14 ± 1.61% is also the highest, outperforming PLTE (72.00%) by 0.14 percentage points. While KGGCN’s Precision on OntoNotes (80.40 ± 0.19%) is not the highest (SoftLexicon 83.41%), its overall performance across P, R, F1 indicates a strong and balanced capability in entity prediction.

For cases where the F1 scores of our method are not the absolute best, such as on OntoNotes (F1: 82.28 ± 0.15%) compared to SoftLexicon (F1: 82.81%), KGGCN achieves strong Recall but its Precision is not the highest (80.40% vs. 83.41%). This indicates that while KGGCN excels at identifying most true entities, there is still room for improvement in reducing false positives on longer, more linguistically diverse sentences in OntoNotes. Similarly, on Resume, KGGCN’s F1 is only marginally behind the best (96.52% vs. PGAT’s 96.53%), a negligible difference, while achieving the highest Precision on this dataset. These observations provide valuable insights for future improvements in knowledge utilization, particularly in fine-tuning the balance between aggressive entity detection and precise boundary prediction for diverse textual characteristics.

• Inference Time Analysis

To evaluate the computational efficiency of KGGCN, we measured the average inference time per instance on the test set of each dataset. Table 3 presents these results, derived from a single representative run for each dataset.

As shown in Table 3, KGGCN exhibits competitive inference speeds across OntoNotes, MSRA, and Resume datasets, processing each instance in approximately 30–42 ms. For the Weibo dataset, the inference time per instance is notably higher at 271.39 ms. This is primarily attributed to the smaller batch size (1) used for Weibo during both training and inference, which significantly reduces GPU parallelism compared to batch sizes of 16 used for other datasets. While the integration of GCN layers and knowledge injection introduces a certain computational overhead compared to plain PLM inference, the overall efficiency remains well within acceptable limits for practical applications, especially with larger batch sizes. This demonstrates that our method effectively leverages external knowledge without introducing prohibitive latency that would hinder real-world deployment.
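A per-instance latency of this kind can be measured with a simple wall-clock loop, sketched below. The `predict` callable is a hypothetical stand-in for a model's batched forward pass, and the helper name is illustrative; on a GPU, pending asynchronous work would additionally have to be synchronized before each clock reading, which this CPU-only sketch omits.

```python
import time
from typing import Callable, Sequence


def avg_inference_ms(predict: Callable[[Sequence[str]], object],
                     instances: Sequence[str],
                     batch_size: int) -> float:
    """Average wall-clock inference time per instance, in milliseconds."""
    if not instances:
        raise ValueError("instances must not be empty")
    start = time.perf_counter()
    for i in range(0, len(instances), batch_size):
        predict(instances[i:i + batch_size])  # one forward pass per batch
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(instances)


# Dummy predictor that only records how many instances each call received.
batch_sizes_seen = []
ms = avg_inference_ms(lambda batch: batch_sizes_seen.append(len(batch)),
                      ["a", "b", "c", "d", "e"], batch_size=2)
print(batch_sizes_seen, ms)
```

With a batch size of 1 the loop issues one forward pass per instance, whereas a batch size of 16 amortizes each pass over up to 16 instances, which is consistent with the higher per-instance time reported for Weibo.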

• Error Analysis

To gain deeper insights into KGGCN’s performance and identify specific areas for improvement, we conducted a qualitative error analysis on selected False Positive (FP) and False Negative (FN) samples from the test sets. This analysis is performed on a single representative model run for each dataset (OntoNotes seed 24, MSRA seed 17, Resume seed 41, Weibo seed 7), adhering to our primary evaluation criterion of strict entity boundary matching.

Our qualitative error analysis, visually presented in Figure 7, reveals common error patterns and underlying challenges. For each example in the figure (labeled from (a) to (f)), we analyze the discrepancy between the model’s prediction and the gold standard:

Precise Entity Boundary Delineation Challenges:
- Example (a), OntoNotes 4.0 FP: Here, the model incorrectly identifies a single character as a standalone ‘ORG’ entity, while it should be part of a larger multi-character gold entity. This FP arises from misjudging the exact boundary of the multi-character span.
- Example (e), Resume FP: Similar to (a), this sample shows the model predicting a single character as a complete ‘PRO’ entity. This occurs in a concise resume context where precise boundary delineation for compound entities is particularly challenging.
- Example (c), MSRA FP: For a multi-character ‘LOC’ entity, the model predicts ‘B-LOC’ for a character that should be ‘I-LOC’. This internal tagging inconsistency for a partial entity span leads to an FP, indicating difficulty in maintaining correct B-I-O sequence within entities.
- Example (d), MSRA FN: This case shows the model correctly identifying the character span for a ‘LOC’ entity but assigning an incorrect internal tag (‘B-LOC’ instead of ‘I-LOC’ for an internal character). This precise internal tagging error, despite correct character identification, results in a False Negative under strict boundary matching.
Robust Recognition of Ambiguous Entities:
- Example (b), OntoNotes 4.0 FN: In this instance, KGGCN completely fails to recognize a legitimate ‘GPE’ entity. The model assigns ‘O’ (outside of any entity) labels for the entire entity span. This highlights a missed entity, emphasizing the difficulty in accurately identifying certain entity types in complex news contexts.
- Example (f), Weibo FP: This is a clear False Positive where the model incorrectly predicts a non-entity character as a ‘PER.NAM’ entity. This type of over-prediction is frequently observed in informal and noisy social media texts, where ambiguous contexts or fragmented phrases can lead the model to misinterpret individual characters as entities. This suggests a need for better contextual filtering and disambiguation in noisy data.

Figure 7. Qualitative Error Analysis Examples for KGGCN. This figure presents selected False Positive (FP) and False Negative (FN) samples from the test sets of OntoNotes 4.0, MSRA, Resume, and Weibo datasets. Each example (labeled (a) to (f)) illustrates typical error patterns of KGGCN, providing the original Chinese sentence (with ‘[PAD]’ removed), the identified error span, the predicted entity/label, and the gold truth. The Chinese text in each example represents the actual input sentence from the dataset. Predicted entities/labels in red indicate an incorrect positive prediction, while those in blue signify a missed or incorrectly predicted negative entity, clearly highlighting the source of the error.

This qualitative error analysis shows that, while KGGCN effectively utilizes external knowledge for NER, challenges persist in two key areas. (1) Precise entity boundary delineation: the model struggles to identify the exact start and end positions of multi-character entities, leading to FPs (e.g., partial entities predicted as full ones or incorrect internal tagging) or FNs (e.g., full entities missed due to slight boundary mismatches or incorrect internal tags). This is often exacerbated by complex sentence structures and dense information. (2) Robust recognition of ambiguous entities: common words or informal phrases (e.g., in social media texts) can act as entities yet lack strong structural cues or clear, disambiguating links within the general-purpose knowledge graph, which also leads to over-prediction (FPs such as example (f)). These observations align with the inherent difficulties of NER in diverse Chinese corpora and suggest promising avenues for future research in refining boundary detection mechanisms and enhancing contextual disambiguation, possibly through more domain-specific or context-aware knowledge integration.

4.3. Ablation Study

To rigorously verify the validity and understand the contribution of each component within our KGGCN model, we performed a series of extensive ablation experiments. These studies specifically focus on the impact of different GCN architectures and their layering strategies, thereby providing empirical evidence for our proposed KGGCN design.

4.3.1. The Role of GCN Blocks

This subsection examines the incremental contribution of the individual GCN blocks. Table 4 provides detailed numerical results for the various configurations, while Figure 8 offers a visual summary for intuitive understanding.

Specifically, “BERT” denotes the baseline model that removes the subsequent GCN computational phase and directly uses BERT for NER, serving as a strong baseline to gauge the overall impact of our GCN-based knowledge integration. “w/o MHGCN” indicates the model uses only the standard GCN (Phase 1) in the GCN computational phase, effectively removing the later Multi-Head Attention GCN (Phase 2). Conversely, “w/o GCN” utilizes only the Multi-Head Attention GCN (Phase 2) without the standard GCN (Phase 1). “GCN_0” and “GCN_1” specifically refer to the first and second GraphConvLayers of the Standard GCN module, respectively, each with its specified number of dense sub-layers. Similarly, “MHGCN_0” and “MHGCN_1” are defined for the first and second MultiGraphConvLayers of the Multi-Head Attention GCN module. It is important to note that the Standard GCN (Phase 1) employs a binarized adjacency matrix derived from the Visibility Matrix (Section 3.2), providing a static graph structure for initial knowledge propagation. In contrast, the Multi-Head Attention GCN (Phase 2) dynamically generates adjacency matrices with different weights through an attention mechanism. The Visibility Matrix itself is a fundamental component of our knowledge integration strategy, acting as the initial graph structure for GCN processing. Its role is implicitly validated through the performance of GCN variants.
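The contrast between the static graph of Phase 1 and the attention-derived graph of Phase 2 can be illustrated with a small dependency-free sketch. It is a simplification under stated assumptions: the row normalization, the single attention head without learned query/key projections, and the omission of dense connections are choices made here for brevity, and the function names are hypothetical rather than taken from the KGGCN code.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def binary_adjacency(visibility: Matrix) -> Matrix:
    """Phase 1 graph: binarize the visibility matrix, then row-normalize it."""
    binary = [[1.0 if v > 0 else 0.0 for v in row] for row in visibility]
    return [[v / (sum(row) or 1.0) for v in row] for row in binary]


def attention_adjacency(h: Matrix) -> Matrix:
    """Phase 2 graph (one head): row-wise softmax of scaled dot products."""
    scale = math.sqrt(len(h[0]))
    scores = matmul(h, [list(col) for col in zip(*h)])  # h @ h^T
    adj = []
    for row in scores:
        scaled = [s / scale for s in row]
        m = max(scaled)                       # subtract the max for stability
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        adj.append([x / total for x in exps])
    return adj


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """ReLU(adj @ h @ w): aggregate over neighbours, then transform."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(adj, h), w)]


# Three nodes; node 2 is visible only to itself in the static graph.
visibility = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
h0 = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic readable

h1 = gcn_layer(binary_adjacency(visibility), h0, w)   # static, 0/1 edges
attn = attention_adjacency(h1)                        # dense, weighted edges
h2 = gcn_layer(attn, h1, w)
print(h1)
print(attn)
```

In the static phase a node with no visible neighbours keeps its own features, whereas the attention phase assigns every pair of nodes a strictly positive, data-dependent weight.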

As shown in Table 4 and Figure 8, our full KGGCN architecture, which combines two layers of Standard GCN followed by two layers of Multi-Head Attention GCN, achieves leading or highly competitive F1 scores across all four datasets. The results demonstrate the effectiveness of our two-phase GCN design and highlight the complementary strengths of its components.

Further detailed analysis of the ablation study reveals several key insights into the precise contribution of each GCN phase and layer:

Significant Impact of GCN Integration: Comparing the performance of “BERT” (e.g., F1-score of 95.5% on MSRA) with the full “KGGCN” (F1-score of 95.96% on MSRA), we observe a substantial F1-score gain of approximately 0.46 percentage points. On Resume, KGGCN achieves 96.52% F1, outperforming BERT (95.33%) by 1.19 percentage points. These significant gains underscore the crucial role of external knowledge and its structured propagation via the GCN layers in enhancing NER performance. The foundational graph structure provided by the Visibility Matrix (Section 3.2) enables controlled and contextually relevant knowledge integration, which is pivotal for these improvements.
Complementary Roles of Standard GCN and Multi-Head Attention GCN: Ablating either phase (i.e., “w/o MHGCN” or “w/o GCN”) generally leads to a performance drop compared to the full KGGCN, confirming their complementary contributions. For instance, on the MSRA dataset, “w/o MHGCN” achieves an F1 of 95.8%, and “w/o GCN” reaches 95.9%, while KGGCN achieves the highest F1 of 95.96%. On OntoNotes, “w/o MHGCN” has an F1 of 82.4%, and “w/o GCN” has 82.0%, compared to KGGCN’s 82.28% (second best). This indicates that the Standard GCN (Phase 1) provides a robust initial aggregation based on explicit connections, while the Multi-Head Attention GCN (Phase 2) adaptively refines these connections. Their combination leverages both static structural information and dynamic semantic flows for entity recognition.
Optimal Layer Configuration and Performance Breakdown: The performance variations observed among different layer combinations (e.g., “GCN_0+MHGCN_0”, “GCN_1+MHGCN_1”) highlight the importance of careful architectural design. Our chosen full KGGCN structure consistently proves to be optimal or near-optimal across datasets. Notably, KGGCN achieves the highest F1-score on MSRA (95.96%) and Weibo (71.98%), the highest Precision on Resume (96.40%), and the highest Recall on MSRA (96.14%). Furthermore, KGGCN demonstrates strong performance across other metrics: it achieves the second highest F1-score on OntoNotes (82.28%) and Resume (96.52%), and the second highest Precision on MSRA (95.82%) and Weibo (72.14%). Additionally, it secures the second highest Recall on OntoNotes (84.28%) and Weibo (71.88%). These detailed results, visually summarized in Figure 8, underscore that a balanced depth and complexity in the GCN architecture, along with the dense connectivity (as implemented in our GCN design, ensuring information from initial embedding layers is preserved and iteratively refined), is essential for capturing rich knowledge interactions without issues like over-smoothing. This contributes significantly to the model’s overall robustness and fine-grained performance.

These results collectively reinforce that our two-phase GCN architecture, incorporating both static and dynamic knowledge propagation alongside dense connections, is an effective and well-justified design for enhancing Chinese NER. KGGCN matches or outperforms its ablated variants in most settings (an exception is OntoNotes, where “w/o MHGCN” is marginally higher), which supports the individual and combined contributions of its proposed components.

4.3.2. Explorations on Knowledge Graph and Sentence Length

In addition to evaluating core architectural components, we also explored the effects of different Knowledge Graph (KG) types and varying sentence lengths on the model’s effectiveness. The experimental results are shown in Table 5 and Table 6.

Impact of Different Knowledge Graphs (Table 5): Table 5 presents the F1-scores of KGGCN when integrated with three distinct knowledge graphs: HowNet, MedicalKG, and CN-DBpedia.

On the OntoNotes 4.0 dataset, all three knowledge graphs achieved identical F1 scores of 82.10%. This suggests that for a broadly diverse news corpus like OntoNotes, the general knowledge provided by different KGs might offer similar levels of enhancement to the base language model.
However, on the MSRA (news), Resume (domain-specific), and Weibo (social media) datasets, the HowNet knowledge graph consistently achieved the highest F1 scores, reaching 96.10%, 96.40%, and 71.00%, respectively (tied with CN-DBpedia on Weibo). HowNet is a linguistic knowledge graph focused on semantic relations between words. Its superior performance on these datasets, compared to the encyclopedic CN-DBpedia or domain-specific MedicalKG, implies that fine-grained linguistic and semantic knowledge may be more universally beneficial for Chinese NER tasks, especially where contextual nuances and polysemy resolution are critical.
The MedicalKG, despite its specialized nature, performed competitively on MSRA and Resume, but significantly lower on Weibo (66.80%). This highlights that while domain-specific KGs can be powerful, their utility diminishes sharply when applied to out-of-domain texts.
CN-DBpedia, an encyclopedic knowledge graph, also performed strongly, tying with HowNet on Weibo. Its broad coverage makes it a robust general choice, but it sometimes yields slightly lower F1 scores compared to HowNet, possibly due to a less focused emphasis on linguistic relationships.

These findings underscore the importance of selecting a knowledge graph that aligns with the task’s domain and the linguistic characteristics of the dataset. While encyclopedic KGs offer broad coverage, linguistic KGs like HowNet might provide more universally applicable semantic enhancements for general Chinese NER.

Impact of Sentence Length (Table 6): We further investigated how sentence length influences the effectiveness of knowledge graph integration. Table 6 showcases KGGCN’s F1-scores on the OntoNotes dataset, segmented by sentence length, utilizing the CN-DBpedia KG.

Short Sentences (Length < 40): For sentences shorter than 40 characters, the F1-score is comparatively lower (77.2% for $l < 20$ and 80.8% for $20 \leq l < 40$). This is primarily because shorter sequences inherently contain less information and consequently match fewer knowledge entries (e.g., only 9.0% for $l < 20$). With limited external knowledge to leverage, the model’s performance gain from KG integration is less pronounced.
Optimal Length ($60 \leq l < 80$): As sentence length increases, the F1-score generally rises, reaching its highest point (83.4%) when the sentence length is between 60 and 80 characters. In this range, the proportion of matched knowledge entities is substantial (63.5%), indicating an optimal balance where sentences provide rich contextual information for effective KG matching without becoming excessively long or noisy.
Long Sentences (Length ≥ 80): Beyond the optimal range, for sentences of 80 characters or more, the F1-score falls below its peak (80.7% for $80 \leq l < 100$ and 82.8% for $l \geq 100$). Although the proportion of matched knowledge entities remains high (around 60-70%), excessively long sequences introduce more noise, increase computational burden, and potentially dilute the impact of knowledge signals. This can challenge the model’s ability to effectively process and retain all relevant information, leading to diminishing returns despite increased knowledge matching opportunities.

This analysis reveals that the efficacy of KG integration is context-dependent, with an optimal sentence length range where knowledge can be maximally leveraged. Both overly short and excessively long sentences pose distinct challenges, suggesting avenues for adaptive knowledge injection or context window management in future work.

5. Discussion

This section provides a comprehensive analysis of the KGGCN framework, including a comparative assessment with other state-of-the-art knowledge-enhanced Named Entity Recognition (NER) approaches, an explicit discussion of the current study’s limitations, and specific directions for future research.

5.1. Comparative Analysis with State-of-the-Art Approaches

Recent advancements in knowledge-enhanced NER have largely focused on integrating external lexical or factual knowledge into pre-trained language models (PLMs). Architectures such as K-BERT [22], KnowBERT [30], LUKE [40], and various K-GCN variants (e.g., those employing multi-head attention GCNs like [20,21]) exemplify these efforts. These models typically operate by inserting knowledge graph (KG) triples or entity embeddings directly into the input token sequence or modifying the internal transformer layers. While effective, these methods can sometimes disrupt the original sentence’s inherent linguistic structure or introduce undesirable noise through less controlled cross-entity information propagation.

In contrast, KGGCN introduces a distinct, tightly controlled methodology for knowledge integration, offering several advantages. Our end-append serialization scheme appends KG-derived tail entities after the original sentence, thereby preserving the original linguistic coherence and syntactic integrity. This is fundamentally different from in-place insertion methods, which can alter positional relationships between original tokens. Crucially, the accompanying visibility matrix (as formalized in Equation (1) and illustrated in Figure 3) explicitly constrains message passing, ensuring that injected knowledge influences only contextually relevant entities and mitigating the risk of spurious dependencies. Furthermore, unlike the conventional single-stage graph processing found in some K-GCN variants, KGGCN employs a two-phase graph convolution architecture. This comprises an initial Standard GCN for robust and balanced neighborhood aggregation based on static visibility, followed by a Multi-Head Attention GCN that adaptively refines these features by learning dynamic, weighted adjacency matrices. This dual mechanism provides both structural stability and adaptive flexibility in knowledge propagation, which is reflected in the strong performance observed across diverse datasets.

Quantitatively, KGGCN’s performance substantiates these architectural advantages. It achieves the highest F1-scores on MSRA (95.96%) and Weibo (71.98%), surpassing the previous best results by 0.26 and 1.18 percentage points, respectively. Moreover, KGGCN demonstrates leading capabilities in entity coverage, securing the highest Recall on OntoNotes (84.28%) and MSRA (96.14%), while also achieving the highest Precision on MSRA, Resume, and Weibo. These results, compared against strong baselines like LEBERT, ChineseBERT, and PGAT, support the view that KGGCN’s fine-grained control over knowledge scope and its hierarchical graph aggregation are important factors for effective knowledge-enhanced NER. The two-phase GCN structure leverages both static, explicit connections and dynamic, attention-driven dependencies to build a more robust and contextually aware representation.

5.2. Limitations of the Current Study

Despite the strong overall performance and the thoughtful design of KGGCN, several limitations warrant discussion and motivate future research. These include:

Monolingual Focus: The current experimental validation of KGGCN is exclusively confined to Chinese NER datasets. Its generalizability and effectiveness in cross-lingual or multilingual NER tasks remain unexplored.
Knowledge Graph Quality and Coverage Dependence: The performance of KGGCN is inherently dependent on the quality, completeness, and domain coverage of the external knowledge graphs (e.g., CN-DBpedia, HowNet). Inaccuracies, noise, or incompleteness within these KGs can introduce errors or limit the model’s ability to recognize certain entities, particularly in highly specialized or low-resource domains.
Scalability and Efficiency for Extended Sequences: While KGGCN exhibits competitive inference speeds on typical sentence lengths (Table 3), the end-append serialization strategy can lead to increased sequence lengths for very long input texts. This could pose challenges regarding computational efficiency (e.g., memory usage and processing time) for extremely lengthy documents, which may necessitate further optimization for specific real-world deployments.
Fine-Grained Boundary Delineation Challenges: As highlighted by our qualitative error analysis (Figure 7), KGGCN occasionally struggles with precise boundary delineation for multi-character entities and with robust disambiguation in highly ambiguous contexts, which are especially prevalent in informal social media texts. This suggests room for improvement in internal tag consistency and context-aware filtering mechanisms.
Ethical Considerations of Knowledge Bias: While the paper briefly acknowledges potential biases within KGs, a comprehensive ethical assessment of how such biases might propagate through the knowledge injection and GCN layers, and ultimately influence NER predictions, has not been conducted. This deeper investigation into fairness and reliability is crucial for responsible AI development.

5.3. Directions for Future Work

Building upon the insights gained from this study and addressing the identified limitations, we envision several promising directions for future research:

Dynamic and Adaptive Knowledge Filtering: Investigating mechanisms for dynamically selecting the most relevant KG triples or entities based on real-time contextual cues, moving beyond static lookup and fixed quantity limits. This could improve both efficiency and precision of knowledge injection, especially for long or ambiguous texts.
Cross-Lingual and Low-Resource Adaptation: Extending KGGCN to support multilingual NER tasks by incorporating cross-lingual KGs and exploring transfer learning techniques to adapt the framework to low-resource languages or domains.
Enhanced Boundary and Ambiguity Resolution: Integrating advanced span-level prediction modules or contrastive learning objectives specifically designed to refine entity boundaries and mitigate ambiguity, thereby improving performance in challenging cases of partial or over-prediction.
Robustness to KG Noise and Bias Mitigation: Developing systematic strategies for evaluating KGGCN’s robustness to noisy or incomplete KGs and implementing methods to detect, quantify, and mitigate the propagation of biases from external knowledge sources, ensuring more equitable and reliable NER outcomes.
Generalization to Other Structured NLP Tasks: Exploring the applicability of KGGCN’s controlled knowledge injection and two-phase GCN architecture to other knowledge-intensive structured prediction tasks, such as relation extraction, event detection, or knowledge graph completion, to further demonstrate its versatility and impact.

This discussion underscores that KGGCN provides both methodological and empirical advancements in effectively combining structured knowledge graphs with sophisticated graph neural architectures for Chinese NER. By acknowledging its current constraints and delineating clear paths for future development, this section complements our experimental findings and positions KGGCN as a robust foundation for subsequent exploration of knowledge-aware information extraction models.

6. Conclusions

In this study, we propose KGGCN, a novel framework that effectively combines Knowledge Graphs (KGs) and a two-phase Graph Convolutional Network (GCN) architecture to significantly enhance the performance of Chinese Named Entity Recognition (CNER). Our approach addresses key limitations of prior methods by introducing a sophisticated knowledge injection mechanism and a robust graph-based information propagation strategy.

Specifically, KGGCN innovatively integrates external factual knowledge from KGs through an end-append serialization scheme. This method, coupled with a visibility matrix, ensures that knowledge is seamlessly incorporated without disrupting the original sentence structure while precisely controlling the scope of injected information. The core of our architectural contribution lies in the two-phase GCN stack: an initial Standard GCN performs robust and balanced aggregation of neighboring connections, establishing foundational structural integrity. Subsequently, a Multi-Head Attention GCN refines these features by adaptively concentrating on more relevant local and global dependencies. This combined GCN structure not only strengthens the model’s ability to process intricate contextual details but also significantly improves the overall quality and depth of feature representation.
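The two-phase idea summarized above can be illustrated with a small, dependency-free sketch. This is not the released implementation: the row-normalized averaging in Phase 1, the scaled dot-product attention used to derive each head's adjacency in Phase 2, the averaging of heads as a stand-in for multi-graph fusion, and all function names are assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def row_normalize(adj):
    """Divide each row by its sum so aggregation becomes a neighbor average."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def softmax_rows(scores):
    """Row-wise softmax, so every row holds weights in [0, 1] that sum to 1."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def standard_gcn(h, adj, w):
    """Phase 1 (assumed form): aggregate over the fixed 0/1 graph, then project."""
    return relu(matmul(matmul(row_normalize(adj), h), w))

def attention_adjacency(h, wq, wk):
    """Phase 2 graph (assumed form): one dense weighted adjacency per head."""
    q, k = matmul(h, wq), matmul(h, wk)
    scale = math.sqrt(len(wq[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[v / scale for v in row] for row in matmul(q, k_t)]
    return softmax_rows(scores)

def attention_gcn(h, heads):
    """Phase 2: one graph convolution per head; heads are averaged (assumption)."""
    outputs = [relu(matmul(matmul(attention_adjacency(h, wq, wk), h), w))
               for wq, wk, w in heads]
    width = len(outputs[0][0])
    return [[sum(o[i][j] for o in outputs) / len(outputs) for j in range(width)]
            for i in range(len(h))]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
n, d = 4, 3
# Toy 0/1 visibility graph over four positions (hypothetical).
adj = [[1, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0], [0, 1, 0, 1]]
h0 = rand_matrix(rng, n, d)
h1 = standard_gcn(h0, adj, rand_matrix(rng, d, d))
heads = [tuple(rand_matrix(rng, d, d) for _ in range(3)) for _ in range(2)]
h2 = attention_gcn(h1, heads)
print(len(h2), len(h2[0]))
```

The sketch does not mask the attention-derived adjacency with the visibility matrix; whether the actual model does so is not stated in this section.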

Our extensive experiments on four diverse CNER datasets (OntoNotes 4.0, MSRA, Resume, and Weibo) provide compelling evidence for KGGCN’s effectiveness. The model achieves state-of-the-art or highly competitive results across various metrics, notably securing the highest F1-scores on MSRA (95.96%) and Weibo (71.98%), and demonstrating leading performance in Recall on OntoNotes (84.28%) and MSRA (96.14%). Ablation studies further validate the crucial contributions of both the Standard GCN and Multi-Head Attention GCN phases, confirming their synergistic effect in knowledge aggregation. Furthermore, our explorations into different KG types and sentence lengths offer valuable insights into optimal knowledge selection and context adaptation.

This study not only demonstrates the effectiveness of integrating KGs with advanced GCN architectures in the CNER domain but also provides a robust reference for future endeavors in fusing diverse forms of structured knowledge into deep learning models. For future work, we plan to continue exploring more adaptive and dynamic mechanisms for KG alignment, potentially incorporating task-specific knowledge graph construction, and investigating how KGGCN can be extended to handle overlapping and discontinuous entities more effectively, further advancing the field of CNER.

Author Contributions

Conceptualization, X.C. and W.H.; methodology, X.C., L.H. and W.H.; software, X.C. and S.Y.; validation, X.C. and L.H.; formal analysis, X.C.; investigation, X.C.; resources, L.H.; data curation, X.C.; writing—original draft preparation, X.C. and W.H.; writing—review and editing, X.C., L.H. and S.Y.; visualization, X.C.; supervision, L.H. and S.Y.; project administration, L.H.; funding acquisition, L.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key R&D Program in Xinjiang Uygur Autonomous Region, grant number 2022B03019-6.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The source code and configuration files for the KGGCN framework are publicly available at https://github.com/Fasiner/KGGCN (accessed on 9 November 2025) to ensure full reproducibility.

Acknowledgments

This work was supported by the Key R&D Program in Xinjiang Uygur Autonomous Region under Grant number 2022B03019-6. We sincerely thank the program for supporting this research.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figure 2. Knowledge Serialization Strategy in KGGCN. This figure illustrates our end-append serialization approach, a key innovation for controlled knowledge integration. All unique retrieved knowledge graph (KG) tail entities (as individual characters) are appended to the end of the original segmented sentence, thereby preserving its inherent structural integrity. For comparison, a common alternative in-place serialization strategy, where knowledge tokens are directly inserted at their corresponding entity locations within the original sentence, is also shown in miniature on the bottom-right. This highlights the advantage of our method in minimizing disruption to the original sentence’s semantic flow.
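As a rough illustration of the end-append idea described in this caption, the sketch below appends each unique tail entity, character by character, after the untouched sentence. The function name, the toy knowledge lookup, and the choice to treat every segmented token as a single position are assumptions for illustration, not details taken from the paper or its code.

```python
def end_append_serialize(tokens, kg_tails):
    """Append unique KG tail entities (as characters) after the original tokens.

    tokens:   segmented sentence, one list entry per token.
    kg_tails: hypothetical lookup mapping a token to its retrieved tail entities.
    Returns the serialized sequence, the [start, end) span of each tail entity,
    and the original token positions each tail entity is linked to.
    """
    sequence = list(tokens)          # the original order is never modified
    spans = {}                       # tail entity -> (start, end) in sequence
    links = {}                       # tail entity -> linked token positions
    for idx, token in enumerate(tokens):
        for tail in kg_tails.get(token, []):
            if tail not in spans:    # each unique tail entity is appended once
                start = len(sequence)
                sequence.extend(tail)  # a string extends as single characters
                spans[tail] = (start, len(sequence))
            if idx not in links.setdefault(tail, []):
                links[tail].append(idx)
    return sequence, spans, links

# Toy sentence and toy triples (hypothetical, for illustration only).
tokens = ["清华大学", "位于", "北京"]
kg_tails = {"清华大学": ["高校"], "北京": ["首都"]}
sequence, spans, links = end_append_serialize(tokens, kg_tails)
print(sequence)
print(spans)
print(links)
```

Because the knowledge characters only ever follow the sentence, the positions of the original tokens are identical before and after serialization, which is the property the caption contrasts with in-place insertion.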

Figure 3. Detailed Illustration of Visibility Matrix Construction and Multi-Head Attention Mechanism. The left panel (featuring an orange-toned background in its example section) comprehensively demonstrates the construction of the Visibility Matrix (M) based on explicit rules and a concrete example. The key visibility rules, presented at the top-left of this panel, are as follows: (1) Rule 1: Original Sentence Inter-visibility. All characters within the original sentence segment are mutually visible. This connectivity is visually highlighted by orange dashed bounding boxes drawn around the relevant matrix blocks in the example. (2) Rule 2a: Entity-Knowledge Token Visibility. An original sentence token is visible to all characters of its directly linked (appended) tail entities, and vice versa. This crucial bidirectional link is indicated by brown solid bounding boxes. (3) Rule 2b: Intra-Knowledge Cohesion. Characters within the same appended tail entity are mutually visible. This internal cohesion is highlighted by green dashed bounding boxes. (4) Implicitly, Rule 3: Cross-Entity Knowledge Token Isolation. Tail entities linked to different original tokens, or unlinked tokens, are mutually invisible (represented by white/unfilled areas in the matrix, indicating M_{ij} = 0). The bottom-left matrix example, derived from the “Rules” and “Example” sections, displays the final M (a 0/1 matrix). All matrix entries with a value of 1 are denoted by orange fill. This orange fill itself visually represents the three types of visibility (Rules 1, 2a, 2b) as indicated by the corresponding colored bounding boxes. The right panel (with an overall green-toned background) illustrates the multi-head attention mechanism used in Phase 2 of the GCN. It shows how this mechanism dynamically derives multiple, weighted adjacency matrices (ranging from 0 to 1, depicted by varying color intensities) from the initial representations, allowing for adaptive refinement of knowledge propagation.
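The visibility rules listed in this caption can be turned into a short sketch. The function name, the argument layout, and the toy input are assumptions for illustration; only Rules 1, 2a, 2b, and 3 themselves come from the caption.

```python
def build_visibility_matrix(n_original, tail_spans):
    """Build a 0/1 visibility matrix for an end-appended sequence.

    n_original: number of positions in the original sentence (0..n_original-1).
    tail_spans: list of (start, end, linked_positions). [start, end) covers the
        characters of one appended tail entity; linked_positions are the
        original positions that entity was retrieved for.
    """
    size = max([n_original] + [end for _, end, _ in tail_spans])
    m = [[0] * size for _ in range(size)]

    # Rule 1: all positions of the original sentence see each other.
    for i in range(n_original):
        for j in range(n_original):
            m[i][j] = 1

    for start, end, linked in tail_spans:
        for p in range(start, end):
            # Rule 2b: characters of the same tail entity see each other.
            for q in range(start, end):
                m[p][q] = 1
            # Rule 2a: a tail entity and its linked token see each other.
            for t in linked:
                m[t][p] = 1
                m[p][t] = 1

    # Rule 3: every remaining pair keeps the initial value 0 (invisible).
    return m

# Toy layout: three original positions; one two-character tail entity linked
# to position 0 and another linked to position 2 (hypothetical example).
matrix = build_visibility_matrix(3, [(3, 5, [0]), (5, 7, [2])])
for row in matrix:
    print(" ".join(map(str, row)))
```

In this toy layout, position 1 has no linked knowledge and therefore sees only the original sentence, while the two tail entities never see each other.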

Figure 4. Visualization of F1-scores on four CNER datasets. This figure illustrates the F1-scores of various models, grouped by type: PLM-based, Lexicon-based, and GNN-based approaches. The bar representing our proposed KGGCN model includes error bars, indicating the standard deviation of F1-scores across five independent runs, which highlights the stability and robustness of our method. KGGCN demonstrates strong competitive performance, achieving the highest F1-scores on MSRA and Weibo, and ranking second on OntoNotes 4.0 and Resume.

Figure 5. Visualization of Precision-scores on four CNER datasets. This figure presents the Precision-scores of various models. KGGCN achieves the highest Precision on MSRA, Resume, and Weibo, demonstrating its capability in minimizing false positive predictions across diverse datasets. Error bars for KGGCN indicate the standard deviation of Precision-scores.

Figure 6. Visualization of Recall-scores on four CNER datasets. This figure illustrates the Recall-scores of various models. KGGCN consistently achieves the highest Recall on OntoNotes 4.0 and MSRA, and ranks second on Resume and Weibo, showcasing its effectiveness in identifying a broad range of true entities. Error bars for KGGCN indicate the standard deviation of Recall-scores.

Figure 8. Visualization of GCN Blocks’ role in the ablation study results. This figure displays the F1-scores of various GCN component combinations, highlighting the progressive performance gains and the synergistic effect of our two-phase GCN design. The full KGGCN model consistently achieves strong overall performance, demonstrating the effectiveness of its integrated architecture.

Table 2. Results on the four datasets. Performance is reported as Precision (P), Recall (R), and F1-Score (F1). Best results are highlighted in bold, and second best are underlined. KGGCN’s results are presented as mean ± standard deviation over 5 independent runs.

	OntoNotes 4.0			MSRA			Resume			Weibo
Model	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)
BERT	-	-	79.93	-	-	94.71	-	-	95.33	-	-	67.27
BERT+word	-	-	81.03	-	-	95.32	-	-	95.46	-	-	68.32
ERNIE	-	-	77.65	-	-	95.08	-	-	94.82	-	-	67.96
ZEN	-	-	79.03	-	-	95.20	-	-	95.40	-	-	66.71
LEBERT	-	-	82.08	-	-	95.70	-	-	96.08	-	-	70.75
ChineseBERT	80.77	83.65	82.18	-	-	-	-	-	-	68.75	72.97	70.80
CAN	75.50	72.29	73.64	93.53	92.42	92.97	95.05	94.82	94.94	-	-	59.31
WC-LSTM	76.09	72.85	74.43	94.36	92.38	93.36	95.14	94.79	94.96	-	-	57.51
SoftLexicon	83.41	82.21	82.81	95.75	95.10	95.42	96.08	96.13	96.11	70.94	67.02	70.50
PLTE	79.62	81.82	80.60	94.91	94.15	94.53	96.16	96.75	96.45	72.00	66.67	69.23
GIWL	78.42	80.21	79.30	95.65	95.06	95.35	96.03	96.11	96.06	69.18	67.59	69.37
DNNEF	-	-	-	94.13	92.65	93.39	95.47	95.64	95.56	-	-	61.00
LGN	76.40	72.60	74.45	94.50	92.93	93.71	95.37	94.84	95.11	57.14	66.67	59.92
CGN	75.06	74.52	74.79	94.01	92.93	93.47	-	-	-	-	-	63.09
WC-GCN	76.59	75.17	75.87	94.82	93.98	94.40	96.04	95.34	95.70	-	-	63.63
Star-Xfmr	79.25	80.66	79.95	-	-	-	-	-	-	-	-	70.14
PGAT	-	-	81.87	-	-	-	-	-	96.53	-	-	70.63
KGGCN	80.40 ± 0.19	84.28 ± 0.11	82.28 ± 0.15	95.82 ± 0.11	96.14 ± 0.05	95.96 ± 0.05	96.40 ± 0.19	96.66 ± 0.15	96.52 ± 0.18	72.14 ± 1.61	71.88 ± 2.05	71.98 ± 0.91

We selected CN-DBpedia KG for KGGCN’s primary evaluation. All baseline results are cited from their respective original publications. KGGCN’s results are presented as mean ± standard deviation over 5 independent runs, providing a measure of model stability.
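For readers reproducing the "mean ± standard deviation" entries, a minimal sketch follows. The per-run scores are made-up placeholders, not values from the paper, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not say whether the sample or population form was used.

```python
import statistics

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean_pm_std(values):
    """Format run scores as 'mean ± std' using the sample standard deviation."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"

# Placeholder per-run F1 scores for five runs, NOT taken from the paper.
hypothetical_runs = [71.2, 72.9, 70.8, 72.5, 72.0]
print(mean_pm_std(hypothetical_runs))

# F1 recomputed from the mean P and R reported for KGGCN on MSRA in Table 2.
print(round(f1_score(95.82, 96.14), 2))
```

Recomputing F1 from the mean Precision and Recall on MSRA gives about 95.98, slightly above the reported 95.96. A small gap of this kind is expected if F1 was computed per run and then averaged, because the F1 of averaged P and R generally differs from the average of per-run F1 values.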

Table 3. Average inference time per instance (in milliseconds) for KGGCN on the test sets.

Model	OntoNotes 4.0	MSRA	Resume	Weibo
KGGCN	40.20 ms	30.86 ms	42.16 ms	271.39 ms

Results are presented from a single representative run for each dataset. Baselines’ inference times are not directly comparable due to varying experimental setups and unavailability of publicly reported metrics.

Table 4. Ablation study results on the four datasets, exploring the impact of different GCN architectures and layer combinations. Performance is reported as Precision (P), Recall (R), and F1-Score (F1). Best results are highlighted in bold, and second best are underlined.

	OntoNotes 4.0			MSRA			Resume			Weibo
Components	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)	P(%)	R(%)	F1(%)
BERT	80.9	82.0	81.5	95.2	95.8	95.5	96.0	96.1	96.1	70.2	70.7	70.4
w/o MHGCN	80.5	84.3	82.4	95.7	95.8	95.8	95.5	95.9	95.7	67.6	68.8	68.2
w/o GCN	80.1	83.9	82.0	96.0	95.9	95.9	95.6	96.3	95.9	72.9	68.3	70.5
GCN_0+MHGCN_0	80.3	84.7	82.5	95.3	95.8	95.6	95.7	95.9	95.8	70.7	67.8	69.2
GCN_0+MHGCN_1	80.5	84.3	82.4	95.5	95.9	95.7	95.6	95.9	95.8	69.1	70.9	70.0
GCN_1+MHGCN_0	80.3	84.0	82.1	95.7	95.7	95.7	95.5	96.6	96.0	69.7	72.4	71.0
GCN_1+MHGCN_1	80.8	82.5	81.7	95.7	95.6	95.6	94.9	96.0	95.5	71.2	72.4	71.8
KGGCN	80.40	84.28	82.28	95.82	96.14	95.96	96.40	96.66	96.52	72.14	71.88	71.98

Ablation experiments were performed with the CN-DBpedia Knowledge Graph. “BERT” denotes the baseline model using only BERT for NER without the GCN computational phase. “w/o MHGCN” indicates that the model uses only the standard GCN (Phase 1) in the GCN computational phase, removing the subsequent Multi-Head Attention GCN (Phase 2). “w/o GCN” is the opposite configuration, using only the Multi-Head Attention GCN (Phase 2) without the standard GCN (Phase 1). “GCN_0” and “GCN_1” refer to the first and second GraphConvLayers of the Standard GCN module, respectively, each with its specified number of dense sub-layers. Similarly, “MHGCN_0” and “MHGCN_1” refer to the first and second MultiGraphConvLayers of the Multi-Head Attention GCN module. All results for ablation configurations represent single runs, except for the full KGGCN model in the last row, which reports the mean performance from Table 2.

Table 5. Comparative F1-scores of KGGCN when utilizing different Knowledge Graphs.

Knowledge Graph	OntoNotes 4.0	MSRA	Resume	Weibo
HowNet	82.10	96.10	96.40	71.00
MedicalKG	82.10	96.00	95.90	66.80
CN-DBpedia	82.10	95.70	96.00	71.00

This table presents F1-scores from single runs of KGGCN on different datasets using various knowledge graphs. Bold indicates the best F1-score for each dataset. The results for CN-DBpedia reflect a representative run, not the mean ± standard deviation reported in Table 2.

Table 6. F1-score against sentence length on the OntoNotes dataset.

Sentence Length	Sentence Number	Matched Number	Proportion (%)	KGGCN/BERT (F1)
l < 20	1685	153	9.0	77.2/73.0
20 ≤ l < 40	1302	413	31.7	80.8/75.9
40 ≤ l < 60	798	360	45.1	82.9/75.9
60 ≤ l < 80	365	232	63.5	83.4/77.5
80 ≤ l < 100	169	103	60.9	80.7/73.9
100 ≤ l	124	92	74.1	82.8/77.7

The results were obtained from a single run of KGGCN using the CN-DBpedia KG on the OntoNotes dataset, with sentences grouped by length (l).
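The derived columns of Table 6 can be checked with a few lines of arithmetic. The sketch assumes that Proportion is Matched Number divided by Sentence Number; under that reading, the reported values are reproduced when the percentage is truncated, rather than rounded, to one decimal. The F1 gain over BERT is computed here for convenience and is not a column of the original table.

```python
# Rows copied from Table 6:
# (length bucket, sentence number, matched number, KGGCN F1, BERT F1).
rows = [
    ("l < 20", 1685, 153, 77.2, 73.0),
    ("20 <= l < 40", 1302, 413, 80.8, 75.9),
    ("40 <= l < 60", 798, 360, 82.9, 75.9),
    ("60 <= l < 80", 365, 232, 83.4, 77.5),
    ("80 <= l < 100", 169, 103, 80.7, 73.9),
    ("l >= 100", 124, 92, 82.8, 77.7),
]

def proportion_truncated(matched, total):
    """Percentage of matched sentences, truncated (not rounded) to one decimal."""
    return (matched * 1000 // total) / 10

summary = []
for bucket, total, matched, kggcn_f1, bert_f1 in rows:
    proportion = proportion_truncated(matched, total)
    gain = round(kggcn_f1 - bert_f1, 1)  # F1 gain of KGGCN over BERT
    summary.append((bucket, proportion, gain))

for bucket, proportion, gain in summary:
    print(f"{bucket:>14}  matched {proportion:5.1f}%  F1 gain over BERT {gain:+.1f}")
```

On these figures, the gain over BERT ranges from 4.2 to 7.0 F1 points and is largest for sentences with 40 ≤ l < 60.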

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
