Article

Prototype-Enhanced Few-Shot Relation Extraction Method Based on Cluster Loss Optimization

1 College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450001, China
2 College of Information and Management Science, Henan Agricultural University, Zhengzhou 450046, China
3 International Business School, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
4 School of Electronic and Electrical Engineering, Nanyang Technological University, 50 Nanyang Road, Singapore 639798, Singapore
* Authors to whom correspondence should be addressed.
Symmetry 2025, 17(10), 1673; https://doi.org/10.3390/sym17101673
Submission received: 7 July 2025 / Revised: 27 August 2025 / Accepted: 3 September 2025 / Published: 7 October 2025
(This article belongs to the Section Computer)

Abstract

The purpose of few-shot relation extraction (RE) is to recognize the relationship between specific entity pairs in text when only a limited number of labeled samples are available. Few-shot RE methods based on prototype networks, which construct relation prototypes from the support set to assign labels to query samples, inherently leverage the symmetry between support and query processing. Although these methods have achieved remarkable results, they still face challenges such as the misjudgment of noisy samples or outliers and difficulty in distinguishing semantically similar relations. To address these challenges, we propose a novel semantically enhanced prototype network that integrates the semantic information of relations more effectively to promote more expressive representations of instances and relation prototypes, thereby improving the performance of few-shot RE. Firstly, we design a prompt encoder to uniformly process different prompt templates for instance and relation information, and then utilize the powerful semantic understanding and generation capabilities of large language models (LLMs) to obtain precise semantic representations of instances, instance prototypes, and conceptual prototypes. Secondly, graph attention learning is introduced to effectively extract relation-specific features between conceptual prototypes and isomorphic instances while maintaining structural symmetry. Meanwhile, a prototype-level contrastive learning strategy with bidirectional feature symmetry is proposed to predict query instances by integrating the interpretable features of conceptual prototypes and the intra-class shared features captured by instance prototypes. In addition, a clustering loss function is designed to guide the model to learn a discriminative metric space with improved relational symmetry, effectively improving the accuracy of relation recognition. Finally, experimental results on the FewRel1.0 and FewRel2.0 datasets show that the proposed approach delivers improved performance compared to existing advanced models on the few-shot RE task.

1. Introduction

Relation extraction (RE) plays a crucial role in various domains such as information retrieval and knowledge graph (KG) construction. However, traditional relation extraction methods heavily rely on large-scale manually annotated datasets, which impose significant costs in terms of data acquisition and labeling in real-world applications [1,2,3,4,5]. To overcome this challenge, few-shot learning (FSL) has been progressively introduced into the field of relation extraction, thereby promoting the rapid development of few-shot RE. Few-shot learning allows models to generalize to novel relation types utilizing just a limited set of labeled samples, making it particularly suitable for low-resource scenarios [6]. It has become one of the most promising solutions to alleviate the data scarcity problem.
Among existing few-shot RE approaches, prototype-based methods [7,8] have gained widespread attention owing to their simple structure and efficient computation. These methods leverage metric learning to map support instances of the same class into a feature space, and categorize query instances based on their distance to each class prototype. However, owing to the inherent diversity and ambiguity of natural language expressions, prototype networks are often challenged by intra-class variability and inter-class similarity when dealing with complex semantic relations, which in turn limits their classification performance [9]. To enhance the model’s proficiency, researchers have improved the classification accuracy of prototype networks by introducing relation information and optimizing class prototypes in two different ways [10].
  • By introducing relation information to assist the model’s learning. For instance, TD-Proto [11] combines entity and relation description information to help the model extract key information from sentences more accurately, while CTEG [12] utilizes an entity-focused attention module and a confusion-aware learning approach to strengthen the model’s ability to identify true relations while avoiding confusion.
  • By focusing on optimizing the representation of category prototypes, the aim is to reduce intra-class distances and expand inter-class differences, thereby improving the classification accuracy of the model. For instance, Han et al. [13] proposed a Hybrid Contrastive Relation Prototype (HCRP) method, which adopts relation description information as anchors to bring similar prototypes closer and push different prototypes farther apart in the representation space; Dong et al. [14] designed a semantic mapping framework, MapRE, based on label-aware and label-irrelevant semantic information, which enhances the generalization performance of the model by combining similarities between samples and between samples and labels; Sun et al. [15] proposed a hierarchical attention prototype network that combines multi-channel convolution feature extraction with adversarial sample generation to further optimize the feature representation of the model.
Although current methods [16,17,18,19] have alleviated the performance bottleneck of few-shot RE to some extent, they still have significant limitations.
  • High computational cost [16]: Traditional approaches typically rely on complex network architectures, resulting in the introduction of a substantial number of additional parameters. This not only increases computational overhead but may also compromise the overall effectiveness of the model.
  • Limitations of the random sampling strategy [17,18]: During training, the use of a random sampling strategy may cause the model to become trapped in local optima, particularly when dealing with relation categories that exhibit high semantic similarity, thereby leading to degraded performance.
  • Deficiencies in hard sample mining methods [19]: Existing hard sample mining methods primarily focus on the relative differences between positive and negative samples, lacking the capability to effectively address anomalous instances or semantically complex samples.
To overcome the limitations of current few-shot RE approaches, this work presents a prototype-enhanced few-shot relation extraction method based on cluster loss optimization, intended to boost the model’s effectiveness in scenarios with scarce data by optimizing prototype representations. Unlike traditional methods that rely on complex network architectures [20], the method enhances the global optimization ability and generalization ability of the model by effectively reducing the distance between samples within the same class and increasing the differences between classes [21]. Moreover, while maintaining the simplicity of the model, this method significantly improves the fusion degree between prototypes and relation information, as shown in Figure 1.
Specifically, this paper proposes a novel prototype enhancement strategy. Firstly, by unifying the encoder, the relationship information and the sentence are mapped to the same semantic space to ensure their consistency and representational symmetry in the high-dimensional space. Further, we generate a unified relationship representation by joining the two relationship views (i.e., the [CLS] token embedding and the average value of all token embeddings) through a symmetry-preserving fusion, making the relationship representation perfectly aligned with the prototype in terms of dimensions. Then, we explicitly add the initial prototype to the final generated unified relationship representation in a feature-symmetry-enhanced manner, thereby optimizing the feature representation of the prototype and the relationship information. This strategy effectively avoids the reliance of traditional methods on complex network structures, significantly reduces computational costs, simultaneously enhances the model’s representational capacity, and effectively reduces the parameter introduction, further improving the efficiency and scalability of the model.
Overall, the core contributions of this study are:
(1)
We design a prompt encoder that can structurally encode different prompt templates while maintaining processing symmetry for instance and relationship information. Meanwhile, these encoded prompts are input into a Large Language Model (LLM), which utilizes its powerful semantic understanding capabilities to obtain high-quality representations of instances, instance prototypes, and conceptual prototypes;
(2)
We adopt a graph attention mechanism to model the association between conceptual prototypes and isomorphic instances, and use the proposed prototype-level contrastive learning strategy with bidirectional symmetry to fuse the interpretable features of conceptual prototypes with the intra-class common features among instance prototypes, forming an enhanced relation representation.
(3)
We design a clustering loss function to enable the model to learn a distinguishable metric space with improved class symmetry, ensuring that samples of the same class are highly aggregated and samples of different classes are significantly separated.
(4)
Experimental results on the FewRel1.0 and FewRel2.0 datasets show that the proposed model performs better than existing advanced models on the few-shot RE task.

2. Materials and Methods

2.1. Relation Extraction

Early studies on RE mainly relied on supervised learning frameworks, where classifiers were trained using large amounts of human-annotated corpora for achieving effective extraction. Traditional approaches include feature-based models [22] such as SVM and logistic regression, as well as end-to-end neural models [23] based on CNNs, RNNs, and their variants, which are effective in capturing the contextual information surrounding entity pairs.
To alleviate the dependence on costly manual annotation, researchers have proposed distant supervision (DS) methods [5] for RE, which automatically generate labeled data by aligning raw texts with knowledge graphs. Guo et al. [24] first introduced this automatic labeling paradigm, which significantly reduced annotation efforts but introduced substantial noise in the training data. Follow-up works have addressed this issue by developing multi-instance learning [25], instance selection [26], and noise reduction techniques [27] to improve data quality and model performance.
Recently, the emergence of pre-trained language models (PLMs) has led to substantial progress in RE tasks. For instance, the BERT-based MTB (Matching the Blanks) model [28] constructs masked entity pairs to learn relational representations, while the BERT-Pair model enhances relation classification by feeding paired entity contexts into the model. These approaches demonstrate that modeling the interaction between context and entity semantics is crucial for improving extraction accuracy.
In addition, structured modeling [29] has also proven beneficial for enhancing the semantic understanding of RE models. Graph neural networks (GNNs) have been introduced to model both explicit syntactic dependencies and implicit semantic relations among entities. For example, the HGEED model [30], initially proposed for event detection, integrates sentence-level syntactic graphs and document-level semantic graphs to enhance both local and global semantic information. Such hierarchical graph architectures are increasingly being adapted for RE tasks to learn long-range contextual relationships, inter-sentential relations, and coreference structures.

2.2. Few-Shot Relation Extraction

Few-shot RE has recently emerged as a prominent research topic in information extraction. It aims to identify novel relation types under extremely limited labeled data conditions. The typical process consists of two stages: first, a model is trained on a small number of labeled instances (the support set) to learn the representative features of target relations; second, it classifies unseen instances (the query set) by generalizing from the learned relation prototypes.
To simulate realistic low-resource scenarios, few-shot RE typically adopts an episodic training mechanism based on meta-learning frameworks [19]. Specifically, the source-domain dataset is separated into training and testing portions. During training, episodes are constructed as N-way-K-shot tasks, where N relation types are sampled and K instances per type are selected to form the support set, along with corresponding query examples. This task formulation enhances the model’s generalization ability and enables rapid adaptation to new relations from only a few examples.
In terms of methodological taxonomy, existing few-shot RE approaches can be broadly categorized into distance-based methods and pre-trained language model (PLM)-based methods. Prototypical networks are among the most widely used metric-based models [31], which classify query samples by computing distances to the prototypes of each relation type. Introduced alongside the FewRel dataset, this framework has sparked a wealth of follow-up work. Bai et al. [32] enhanced prototype representation by incorporating semantic relation descriptions, label prompts, and attention mechanisms to address class imbalance and feature bias issues.
However, prototypical networks often struggle with semantically similar relations, where inter-class confusion and prototype drift degrade performance. To address this, Guo et al. [24] integrated large language models (LLMs) with traditional relation extraction methods, bridging the gap between generative and discriminative learning strategies and effectively enhancing expressive capability in limited-sample scenarios. AdapAug [9], on the other hand, designs both instance-level and representation-level augmentation strategies, such as entity order reversal and context perturbation, and incorporates adaptive debiased contrastive training to improve performance on hard-to-distinguish relation pairs. In parallel, prompt-based learning has gained increasing attention in few-shot RE. By converting relation classification into cloze-style language modeling tasks through the use of natural language templates, this approach enables PLMs to activate latent relational knowledge more effectively. Compared to traditional fine-tuning, prompt learning offers stronger generalization under low-resource conditions. The Template Regularization Network (TRN) [33], for example, generates diverse templates and employs attention weighting, ranking constraints, and template calibration to suppress the influence of poor-quality prompts, thereby improving few-shot RE stability and performance.
Moreover, graph-based modeling—originally applied to event extraction—has shown promising capabilities in relational semantic representation. The HGEED [30] framework introduces a hierarchical graph neural network that integrates sentence-level syntactic graphs with document-level semantic co-occurrence graphs, capturing both local and global dependencies across text. While initially proposed for event detection, this architecture provides valuable insights for few-shot RE tasks that require fine-grained relational modeling [6].
In summary, the research focus in few-shot RE has gradually shifted from simple prototype-based classification to more sophisticated methods involving semantic enhancement, contrastive mechanisms with intrinsic symmetry, prompt-based learning, and graph-based modeling.

2.3. Cluster Loss Function

Cluster loss functions are a novel approach to optimizing embedding spaces, and have attracted increasing attention in recent years. The core idea of such losses is to simultaneously enhance intra-class compactness and inter-class separability. By explicitly constraining the distance between samples and their corresponding class centers (or prototypes), cluster loss encourages samples of the identical category to gather closely in the embedding space, while maximizing the distance between different class centers to ensure clear decision boundaries.
Compared to the traditional Triplet Loss [34], cluster loss functions significantly reduce dependence on complex sample mining strategies and avoid instability or local minima caused by improper sample selection. Triplet loss updates the model by leveraging triplets composed of anchor, positive, and negative samples, seeking to reduce the distance between the anchor and positive while increasing the distance from the negative. However, this method heavily relies on effective triplet construction, which becomes inefficient and prone to sparse gradients in large-scale or few-shot settings. To address this, Xiao et al. [35] proposed a hard sample mining strategy, selecting the farthest positive sample and the nearest negative sample relative to the anchor to better learn discriminative boundaries.
By introducing class centers or prototypes, cluster losses eliminate the need for extensive sample sampling, accelerate convergence, and enhance robustness to outliers. Furthermore, they offer global modeling capabilities for embedding space, leading to more consistent and stable training. In the context of prototypical networks, Ren et al. [36] employed cluster loss to accurately represent inter-class similarity via distance-based metrics. Their method explicitly enforces intra-class compactness and inter-class separability, thereby significantly improving the performance of few-shot learning tasks.

3. Methodology

3.1. Problem Formulation

Few-shot RE includes the following steps: Firstly, the model is trained with limited labeled data, so that it can quickly learn the features of relation categories; then, the model learns new knowledge and performs generalized reasoning to accurately determine the relationship between entity pairs within newly acquired knowledge. By leveraging meta-learning, the model can quickly adapt to new relation types from few samples, enabling efficient and flexible relation extraction.
In this study, we adopt the standard N-way K-shot experimental setup established by Han et al. [37], dividing the samples into a support set and a query set with inherent categorical symmetry. Meanwhile, inspired by recent successful practices of introducing prior knowledge to enhance relation extraction performance [38], we adopt relation descriptions as the specific carrier of prior knowledge to better identify the essential semantic traits of target relations. The support set is composed of N relation categories, each with K labeled instances; the query set contains the same N categories but different instances. The core goal of the few-shot RE task is to use the knowledge of the N × K labeled instances in the support set to identify the relation class to which each instance in the query set belongs. Therefore, our proposed few-shot RE process is as follows: (1) obtaining prior knowledge from relation descriptions; (2) combining class enhancement techniques with representation learning to reinforce relation representations while preserving semantic symmetry; and (3) ultimately predicting the relation class of query instances.
In meta-learning for few-shot RE, the model is trained using meta-tasks drawn from the training set $D_{Train}$. Throughout or following training, the model is assessed using meta-tasks selected from the validation set $D_{dev}$; the best-performing model or optimal hyperparameters are identified according to validation-set performance. Finally, the selected model is evaluated on meta-tasks sampled from the test set $D_{Test}$. Each meta-task consists of a support set $S = \{ s_i^k \mid i = 1, 2, \dots, N;\ k = 1, 2, \dots, K \}$ with N classes and K examples per class, and a query set $Q = \{ q_i^m \mid i = 1, 2, \dots, N;\ m = 1, 2, \dots, M \}$ that includes the same N categories with M samples drawn from the remaining instances of each category, where each instance is represented as $(x, h, t, r)$: x denotes a sentence, h and t denote the head and tail entities of an entity pair, and r denotes the relation label. According to this definition, our proposed few-shot RE model utilizes relation description information for class enhancement to optimize model parameters. Specifically, the semantic representation of each relation in $R = \{ r_i \mid i = 1, 2, \dots, N \}$ is constructed from its relation name and relation description text. Few-shot RE requires learning features from the N × K instances of the support set S and predicting the relation class r for any given query instance q.
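To make this episode construction concrete, the following minimal Python sketch samples one N-way K-shot meta-task; the sample_episode function and the corpus structure (a mapping from relation labels to lists of (x, h, t, r) instances) are illustrative assumptions rather than part of any released implementation.

import random
from collections import defaultdict

def sample_episode(corpus, n_way=5, k_shot=1, m_query=1):
    """Sample one N-way K-shot meta-task from `corpus`, assumed to map each
    relation label r to a list of (sentence, head, tail, label) instances."""
    relations = random.sample(list(corpus.keys()), n_way)
    support, query = defaultdict(list), defaultdict(list)
    for r in relations:
        sampled = random.sample(corpus[r], k_shot + m_query)
        support[r] = sampled[:k_shot]   # K labeled support examples per class
        query[r] = sampled[k_shot:]     # M held-out query examples per class
    return relations, support, query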

3.2. Model Overview

This study introduces a semantically enhanced prototype network. This model integrates the semantic information of relations more effectively while maintaining representational symmetry, thereby significantly improving the performance of few-shot RE. As shown in Figure 2, the model consists of four parts: (1) the prompt encoder, which encodes text information into embedded representations; (2) graph attention learning, which uses conceptual prototypes as guidance to cluster support-set instances of the same relation class in the representation space, reinforcing discriminative relation-specific features through symmetry-aware aggregation; (3) prototype-level contrastive learning with bidirectional symmetry, which integrates the interpretable features contained in conceptual prototypes with the intra-class shared features induced by instance prototypes to form enhanced relation representations; and (4) the clustering loss function, which explicitly constrains the optimization objective by actively sampling difficult positive and negative sample pairs in a symmetry-preserving manner, thereby learning a highly discriminative metric space.

3.3. Prompt Encoder

Prompt learning formats downstream task inputs into prompt templates to stimulate pre-trained knowledge in large language models (LLMs). Early work focused on designing more effective prompt templates to guide model output [39,40,41]. For example, Chen et al. [42] proposed the knowledge-aware prompt tuning method. Subsequent research aims to develop more efficient prompt learning methods and reduce dependence on training samples [43,44,45]. Unlike the general methods mentioned above, we use a specially designed prompt template:
Entity relation awareness template: Contains head entity and tail entity information, used for encoding instance representations.
Relation semantic perception template: Incorporates relational information for building conceptual prototypes.
We take these specifically designed prompt templates as input and use BERT [46,47,48] as the prompt encoder, encoding both instance information and the relation description information corresponding to each relation class in the support set. After tokenization, the input sentence is constructed into a BERT-compliant input format: a [CLS] tag is added as the sequence start symbol (usually used to aggregate sequence information), and a [SEP] tag is added as the sequence end symbol (or separator). This standardized format ensures that the BERT encoder can correctly process the input sequence. For each instance $s = (x, h, t)$ in the support set S and query set Q, we convert it into an input sequence based on the prompt template $I(s) = \{[\mathrm{CLS}]\ x\ [\mathrm{SEP}]\ h\ [\mathrm{MASK}]\ t\ [\mathrm{SEP}]\}$, and the contextual embedding of the instance is then calculated as follows:
$\bar{h} = \mathrm{BERT}(I(s))$ (1)
where $\bar{h} \in \mathbb{R}^{d}$ represents the contextual representation, d represents its dimension, and $I(s)$ represents the input sequence constructed based on the prompt template.
After computing the instance embedding $\bar{h}$, this study introduces a prototype network architecture, aiming to accurately calculate the corresponding prototype for every relation class r using the instance embeddings in the support set S. Specifically, for each relation class $r_i$ contained in the support set S, the instance prototype $P_i^I$ is calculated as shown in Formula (2). This step combines the feature representations of multiple instances under the same relation class to generate the initial prototype for each relation. This average-based method can reduce the noise interference of the data to a certain extent and highlight the common characteristics of the same relation class.
$P_i^I = \frac{1}{K}\sum_{k=1}^{K} \bar{h}_i^k$ (2)
where $P_i^I$ represents the instance prototype of the relation class $r_i$, and $\bar{h}_i^k$ represents the contextual representation of instance $s_i^k$; $i \in \{1, 2, \dots, N\}$ indexes the i-th relation class $r_i$, and k indexes the k-th instance in relation class $r_i$.
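The two steps above (Formulas (1) and (2)) can be sketched with the Hugging Face Transformers library as follows; the exact pooling used for the contextual representation $\bar{h}$ is not specified here, so taking the [CLS] vector is an assumption, and the helper names are illustrative.

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

def encode_instance(sentence, head, tail):
    # Entity-relation-aware template I(s) = "[CLS] x [SEP] h [MASK] t [SEP]";
    # the tokenizer adds the leading [CLS] and trailing [SEP] automatically.
    prompt = f"{sentence} [SEP] {head} [MASK] {tail}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=256)
    hidden = encoder(**inputs).last_hidden_state      # (1, L, d)
    return hidden[:, 0]                               # assumed pooling: [CLS] vector as h_bar

def instance_prototype(support_instances):
    # Formula (2): average the K contextual embeddings of one relation class.
    embs = torch.cat([encode_instance(x, h, t) for (x, h, t, _) in support_instances], dim=0)
    return embs.mean(dim=0)                           # P_i^I, shape (d,)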
The mean view of all token embeddings takes a more nuanced approach. As the model runs, BERT generates a hidden-state vector for each token in the input sequence, including punctuation, special symbols, and words. These hidden-state vectors reflect the semantic characteristics of each token and its semantic association with the surrounding tokens from different dimensions. We average all hidden-state vectors to obtain a global representation, complementing the local contextual features and capturing the overall semantic distribution.
When dealing with various types of relations, we use the standardized template “name: description” to concatenate the relation name with the corresponding relation description, thereby constructing a structured input sequence $C(c) = \{[\mathrm{CLS}]\ d\ [\mathrm{SEP}]\ m\ [\mathrm{MASK}]\ [\mathrm{SEP}]\}$ for a relation concept c. The constructed sequence is then input into the BERT encoder to generate the corresponding relation embedding vector $\bar{c} = \mathrm{BERT}(C(c))$, where m represents the name of the relation concept c and d represents its description. To emphasize the importance of relation information, the relation prototype $p_i^C$ is calculated as follows:
$p_i^C = \bar{c}_i^{\,local} + \bar{c}_i^{\,global}$ (3)
where $p_i^C$ represents the conceptual prototype of the relation class $r_i$, and the embedding of the relation is denoted as $\bar{c}_i$; $\bar{c}_i^{\,local}$ refers to the context-dependent representation of each specific position (token) in the sequence, capturing the semantic information of the token within a limited window around it and focusing on fine-grained vocabulary, phrases, and local syntactic relationships; $\bar{c}_i^{\,global}$ refers to the aggregated representation of the entire input sequence, capturing its overall semantics, theme, and intent as a summary of the whole contextual information. Here, $i \in \{1, 2, \dots, N\}$ indexes the i-th relation class $r_i$.
Taking the relation class (name) of “main creator” as an example, its relation description is “the person who undertakes the main creative task in the creative process of the work and plays a key role in shaping the style, content, etc., of the work”. We combine the relation name of “main creator” with its detailed description according to the “name: description” template to form a specific input sequence of “main creator: the person who undertakes the main creative task in the creation process of the work and plays a key role in shaping the style, content, etc.”, and input it into the BERT encoder for processing.
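Continuing the encoder sketch above (reusing the same tokenizer and encoder), the conceptual prototype of Formula (3) can be assembled as follows; which of the two views corresponds to the [CLS] vector and which to the mean of all token embeddings is an assumption based on the description in this subsection.

def concept_prototype(name, description):
    # Relation-semantic template C(c) = "[CLS] d [SEP] m [MASK] [SEP]" built from
    # the "name: description" pair, e.g. "main creator: the person who ...".
    prompt = f"{description} [SEP] {name} [MASK]"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=256)
    hidden = encoder(**inputs).last_hidden_state   # (1, L, d)
    c_global = hidden[:, 0]                        # sequence-level summary ([CLS])
    c_local = hidden.mean(dim=1)                   # mean over all token embeddings
    return (c_local + c_global).squeeze(0)         # Formula (3): p_i^C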

3.4. Graph Attention Learning Based on Relation Graph

To enhance the relation-specific features in isomorphic instances while maintaining structural symmetry, we propose a relation graph attention method. This method utilizes conceptual prototypes as semantic anchors, defines associations between instances and conceptual prototypes, and optimizes features based on these associations.
Consider a support set $S = \{s_i^k \mid i = 1, 2, \dots, N;\ k = 1, 2, \dots, K\}$, relation concepts $c_i = (m_i, d_i),\ i = 1, 2, \dots, N$, and N relation classes $R = \{r_i \mid i = 1, 2, \dots, N\}$. Our goal is to model the semantic association between support instances and their conceptual prototypes using graph attention. Let $\Omega = \{V, E\}$ denote a relation graph over the $N \times K$ support instances and N relation concepts, where $V$ is the set of nodes $v_i$, each corresponding to a relation concept or an instance, and $E = \{e_{ij}^k \mid i, j = 1, 2, \dots, N;\ k = 1, 2, \dots, K\}$ is the set of edges, each representing an association between a relation concept and an instance.
In this study, we mainly consider highlighting the importance of relation-specific inference in instances through an instance classification task. We assume that when an instance is associated with relation class $r_i$, a connection is established between the relation concept $c_i$ and that instance; the corresponding edge indicator $e_{ij}^k$ is computed as follows:
$e_{ij}^k = \begin{cases} 1, & \text{if } s_j^k \text{ belongs to } r_i \\ 0, & \text{otherwise} \end{cases}$ (4)
where $s_j^k$ represents the k-th instance of the j-th relation class, $r_i$ represents the i-th relation class, and $e_{ij}^k$ is a binary flag marking the existence of the corresponding edge in the graph.
To address the instance classification task, we reformulate it as a link prediction problem within the relation graph $\Omega = \{V, E\}$, which consists of instances and relation concepts from the support set. Specifically, our objective is to link nodes of the same relation class and separate nodes of different relation classes. We use a graph attention network (GAT) [49] and a dot-product operation to calculate the correlation $\sigma_{ij}^k$ between the relation concept $c_i$ ($i = 1, 2, \dots, N$) and the instance $s_i^k$ ($i = 1, 2, \dots, N;\ k = 1, 2, \dots, K$). The attention weight $\alpha_{ij}$ is calculated as follows:
$\alpha_{ij} = \dfrac{\exp\!\left((p_i^C)^{T} W s_j^k\right)}{\sum_{m=1}^{N} \exp\!\left((p_i^C)^{T} W s_m^k\right)}$ (5)
where $p_i^C$ denotes the representation of the relation concept $c_i$, $s_j^k$ denotes the representation of the corresponding instance, $W$ is a trainable weight matrix, and N represents the total number of neighboring nodes. The node-level correlation is then computed through the dot-product operation as follows:
$\sigma_{ij}^k = \sum_{j=1}^{N} \alpha_{ij} (p_i^C)^{T} s_j^k$ (6)
where $\sigma_{ij}^k$ represents the predicted connection probability, $\alpha_{ij}$ represents the attention weight, $s_j^k$ denotes the instance representation, and $p_i^C$ represents the conceptual prototype of the relation class $r_i$.
To quantify the internal tightness of each relation type (i.e., how tightly all of its relation instances cluster in the embedding space), we propose a link prediction loss function, which is expressed as follows:
$L_{lp} = -\dfrac{1}{N^{2}K} \sum_{i=1,\,j=1}^{N} \sum_{k=1}^{K} \left[ e_{ij}^k \log\!\left(\sigma_{ij}^k\right) + \left(1 - e_{ij}^k\right) \log\!\left(1 - \sigma_{ij}^k\right) \right]$ (7)
where $L_{lp}$ represents the link prediction loss, N represents the number of classes, K represents the number of instances in each class, $e_{ij}^k$ is the binary edge flag, and $\sigma_{ij}^k$ represents the predicted connection probability.
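A minimal sketch of Formulas (4)-(7) is given below; it does not reproduce the full GAT layer of [49], the normalization direction in Formula (5) and the per-pair simplification of Formula (6) are assumptions, and the tensor shapes and names are illustrative.

import torch
import torch.nn.functional as F

def link_prediction_loss(concept_protos, instance_embs, labels, W):
    """concept_protos: (N, d) conceptual prototypes p^C;
    instance_embs:  (N*K, d) support-instance representations;
    labels:         (N*K,)   long tensor with the relation index of each instance;
    W:              (d, d)   trainable weight matrix of Formula (5)."""
    scores = concept_protos @ W @ instance_embs.T                        # (N, N*K) bilinear scores
    alpha = F.softmax(scores, dim=1)                                     # Formula (5), normalized over instances
    sigma = torch.sigmoid(alpha * (concept_protos @ instance_embs.T))    # Formula (6), simplified link score
    e = F.one_hot(labels, num_classes=concept_protos.size(0)).T.float()  # Formula (4): edge indicators
    return F.binary_cross_entropy(sigma, e)                              # Formula (7): binary cross-entropy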

3.5. Prototype-Level Contrastive Learning

The prototype representations of certain relation classes (such as ‘mother’, ‘father’ and ‘child’) at the instance-level may be very similar, which poses a challenge when aiming to accurately distinguish them. However, the core relation connotations of these relation classes themselves (such as the core semantics of reproduction, upbringing, etc., in the “parent–child” relationship) are usually clear and easy to distinguish. The key point is that the connotation definition of a concept (its essential attributes or relationships) should be highly consistent with the core features shared by all its extended instances (specific examples belonging to the concept).
To enhance this consistency and improve the model’s capability to differentiate between similar relation classes, we propose a novel prototype-level contrastive learning strategy with bidirectional feature symmetry. The core idea of this strategy is to use relation prototypes extracted from conceptual connotations as anchors to guide and calibrate instance prototypes induced from specific instances. Through contrastive learning, instance prototypes belonging to the same relation class are aligned with the connotation prototype of that relation, while prototypes from different relation classes are pushed away, thus more effectively capturing and distinguishing the essential characteristics of different relation classes.
Specifically, in the designed meta-learning framework, each meta-task contains a set of core data units $A = \{(p_i^I, p_i^C)\}_{i=1}^{N}$. Each unit explicitly includes a pair $(p_i^I, p_i^C)$, where the conceptual prototype $p_i^C$ represents the essential definition or core semantic features of a specific relation class $r_i$ (such as “mother”), and the extended instance prototype $p_i^I$ represents the aggregated typical features of the set of instances associated with the same relation class $r_i$ (such as multiple different “mother” individuals). Following the contrastive learning paradigm proposed by Chen et al. [50], we construct sample pairs at the prototype level: (1) we use the concept prototype $p_i^C$ as the anchor; (2) we pair the extended instance prototype $p_i^I$ belonging to relation class $r_i$ with the anchor $p_i^C$ to form a positive sample pair $(p_i^I, p_i^C)$, forcing the model to learn the consistency between the connotation definition and the typical features of its extended instances within the same relation class; and (3) we take the extended instance prototypes $p_j^I$ ($j \neq i$) belonging to different relation classes (such as “father”, “child”, etc.) as negative samples for the anchor $p_i^C$. Based on this contrastive structure, we define a prototype-level contrastive loss function $L_{pro}$, whose core objective is to increase the similarity between positive sample pairs $(p_i^I, p_i^C)$ in the representation space and to minimize the similarity between the anchor $p_i^C$ and all negative-sample extended prototypes, thereby enhancing the distinguishability between different relation-class prototypes. The loss function $L_{pro}$ is calculated as follows:
$L_{pro} = -\dfrac{1}{N} \sum_{i=1}^{N} \log \dfrac{\exp\!\left(f(p_i^I, p_i^C)/\lambda\right)}{\sum_{j \neq i} \exp\!\left(f(p_j^I, p_i^C)/\lambda\right) + \sum_{j \neq i} \exp\!\left(f(p_i^I, p_j^C)/\lambda\right)}$ (8)
where $p_j^I$ ($j \neq i$) represents the instance prototypes of other relation classes, $p_j^C$ ($j \neq i$) represents the concept prototypes of other relation classes, $p_i^I$ represents the instance prototype, $p_i^C$ represents the concept prototype, $f(\cdot,\cdot)$ represents the similarity function, and $\lambda$ is a temperature parameter designed to influence the curvature of the loss function.
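A compact sketch of Formula (8) follows; cosine similarity is assumed for $f(\cdot,\cdot)$, and the temperature value is a placeholder rather than a reported setting.

import torch
import torch.nn.functional as F

def prototype_contrastive_loss(instance_protos, concept_protos, temperature=0.1):
    """instance_protos, concept_protos: (N, d) prototypes p^I and p^C of the
    N relation classes in one meta-task."""
    sim = F.normalize(instance_protos, dim=-1) @ F.normalize(concept_protos, dim=-1).T
    sim = sim / temperature                            # f(p_j^I, p_i^C) / lambda, shape (N, N)
    exp_sim = torch.exp(sim)
    off_diag = exp_sim * (1.0 - torch.eye(sim.size(0), device=sim.device))
    # Negatives for anchor p_i^C: mismatched instance prototypes (column i) and
    # mismatched concept prototypes paired with p_i^I (row i).
    neg = off_diag.sum(dim=0) + off_diag.sum(dim=1)
    pos = torch.diag(sim)                              # matched (p_i^I, p_i^C) pairs
    return -(pos - torch.log(neg)).mean()              # Formula (8) as a minimization objective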

3.6. Cluster Loss Based on Support Set

It is challenging to define loss function scientifically in few-shot RE task. This task requires the accurate identification and extraction of entity relations derived from few samples, and also requires the model to generate separable representations of new class relations. Due to the scarcity of samples, it is difficult to accurately characterize the bias as in large-scale data tasks, resulting in a lack of data support for loss function calculation. At the same time, there are still many technical problems in designing loss functions that can guide the model to learn the difference features of new relations.
To help the model learn a highly differentiated metric space in a few-shot RE task and significantly improve its performance in a complex relationship extraction scenario, this paper designs an innovative cluster loss function after in-depth research and exploration. The core goal of class cluster loss function is to effectively limit the intra-class compactness and inter-class separability. By strengthening the compactness within the class, the samples of the same relation class are closely clustered in the metric space, which helps the model to better capture the common characteristics of the same relation. Enhancing the separability between classes can make the samples of different relation categories clearly delimited in the metric space, which is convenient for the model to distinguish different relation categories clearly. This design enables the model to classify and extract inter-entity relationships more accurately under the condition of limited samples, so as to effectively improve the overall performance of small-sample relationship extraction.
The cluster loss function is an improvement on the triplet loss function. The triplet loss is widely used in model training; its core mechanism lies in the strict definition of triplet samples, namely anchor, positive, and negative samples, and the measurement of distances between samples, which drives the model to learn highly discriminative feature representations, as shown in Formula (9):
$L_{triplet} = \mathrm{relu}\!\left(D_{a,p} - D_{a,n} + \beta\right)$ (9)
where a represents the anchor sample, p represents the positive sample, n represents the negative sample, $D_{a,p}$ represents the distance between the anchor and the positive sample, $D_{a,n}$ represents the distance between the anchor and the negative sample, and the hyperparameter $\beta$ is a margin strictly greater than zero, which plays a key role in model training. Specifically, this parameter constrains the gap between positive and negative samples by regulating the distances between them. This constraint mechanism effectively guides model learning so that positive and negative samples maintain a reasonable distance in the feature space, improving the model’s ability to differentiate sample types as well as its overall performance and generalization ability. D denotes the Euclidean distance between two samples.
In few-shot tasks with high semantic similarity, random triplet sampling can lead to local optima. Hard sample mining alleviates this by selecting the farthest positive and nearest negative samples, but it focuses only on relative distances, and struggles with outliers. To efficiently address the aforementioned issues, inspired by hard sample mining technology, we extended the triplet loss function, as shown in Formula (10). In this way, it aims to comprehensively strengthen the model’s deep learning capacity for complex and changeable features, and markedly boost the model’s generalization capability across diverse scenarios.
$L_{triplet}^{hard} = \mathrm{relu}\!\left(D_{max}^{pro,p} - D_{min}^{pro,n} + \beta\right)$ (10)
where $D_{max}^{pro,p}$ represents the maximum distance between the prototype and the positive samples of the same class, $D_{min}^{pro,n}$ represents the minimum distance between the prototype and the samples of different classes, and the hyperparameter $\beta$ is a margin strictly greater than zero.
Hard sample mining provides significant advantages for the optimization of triplet loss. By selecting the positive sample with the farthest distance from the anchor sample and the negative sample with the nearest distance, this symmetry-aware sampling strategy provides the model with high-value learning examples, thus improving the model’s discrimination ability and training efficiency. This strategy helps the model focus on samples that are difficult to distinguish, enhances the learning of complex data distributions, reduces the impact of random sampling introducing redundant samples, and accelerates the model convergence. However, hard sample mining also has some limitations. It only focuses on the relative distance between samples and ignores the global feature distribution, resulting in limited performance on abnormal samples with low semantic similarity. In addition, hard sample mining introduces additional computational overhead, which may lead to unstable gradient updates and neglect the learning of simple samples.
Given that the triplet loss function only considers the margin of inter-class separability, lacks global modeling of the feature distribution when dealing with abnormal samples, and has significant shortcomings on complex and diverse sample data, we propose the class cluster loss. The cluster loss achieves enhanced structural symmetry through deep optimization based on the triplet loss.
First, the center distance $D_{center}$ is determined by calculating the average distance between the same-class samples and the prototype, as shown in Formula (11):
$D_{center} = \dfrac{1}{M} \sum_{m=1}^{M} D_{a,p}$ (11)
where M represents the number of same-class sample pairs, and $D_{a,p}$ represents the distance between the anchor and a positive sample. Then, to significantly improve the discriminative ability of the metric space, we impose explicit and strict constraints on intra-class compactness and inter-class separability. The cluster loss $L_{cc}$ is shown in Formula (12):
$L_{cc} = \mathrm{relu}\!\left(D_{max}^{pro,p} - D_{min}^{pro,n} + \beta\right) + \mathrm{relu}\!\left(D_{max}^{a,p} - D_{center}\right)$ (12)
where the term $D_{max}^{a,p} - D_{center}$ constrains the distance between the anchor and the center of its own class, ensuring that this distance does not become too large.
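The cluster loss of Formulas (10)-(12) can be sketched as below; how the positive and negative distances are batched, and the distance metric itself, are assumptions.

import torch
import torch.nn.functional as F

def cluster_loss(pos_dists, neg_dists, margin=1.0):
    """pos_dists: (M,) distances between a class prototype (anchor) and the positive
    samples of its class; neg_dists: distances to samples of other classes."""
    d_max_pos = pos_dists.max()                          # hardest (farthest) positive
    d_min_neg = neg_dists.min()                          # hardest (nearest) negative
    d_center = pos_dists.mean()                          # Formula (11): average anchor-positive distance
    hard_term = F.relu(d_max_pos - d_min_neg + margin)   # Formula (10): hard-mined triplet term
    compact_term = F.relu(d_max_pos - d_center)          # intra-class compactness constraint
    return hard_term + compact_term                      # Formula (12)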

3.7. Training

In order to find a more ideal classification plane and further optimize the classification performance of the model, for each sub-task we jointly train with the link prediction loss $L_{lp}$ (Formula (7)), the prototype-level contrastive loss $L_{pro}$ (Formula (8)), and the clustering loss $L_{cc}$ (Formula (12)). The joint loss is shown in Formula (13):
$L_{all} = \lambda_1 L_{lp} + \lambda_2 L_{pro} + \lambda_3 L_{cc}$ (13)
where $\lambda_1$, $\lambda_2$, and $\lambda_3$ are hyperparameters used to balance the loss terms. The joint loss optimizes the model for complex few-shot tasks.
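A sketch of one joint training step under Formula (13) is shown below; the model interface (returning the three losses for a meta-task) and the default lambda values are assumptions for illustration.

def training_step(meta_task, model, optimizer, lambdas=(1.0, 1.0, 1.0)):
    """One update on a single meta-task; `model` is assumed to return the losses
    of Formulas (7), (8), and (12) for that task."""
    l_lp, l_pro, l_cc = model(meta_task)
    loss = lambdas[0] * l_lp + lambdas[1] * l_pro + lambdas[2] * l_cc   # Formula (13)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()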

4. Experiments

We conducted ablation studies and baseline comparisons using two publicly available datasets, FewRel 1.0 [37] and FewRel 2.0 [51], to verify that the proposed approach significantly enhances the effectiveness of few-shot relation extraction.

4.1. Datasets

This study conducts systematic empirical research on FewRel1.0 and its extended version FewRel2.0, which are widely recognized as authoritative benchmark datasets for few-shot RE. FewRel1.0, the first benchmark dataset for few-shot relation classification, was developed and open-sourced by the Natural Language Processing (NLP) Laboratory of Tsinghua University. The dataset comprises 100 relations and 70,000 labeled instances; the 100 relations are divided into three non-overlapping subsets: 64 relations for training, 16 for validation, and 20 for testing. Each relation includes 700 example sentences from Wikipedia. The test set contains 10,000 query sentences, and evaluation results for this set can only be obtained through an official online platform, as the test-set labels remain undisclosed. FewRel1.0 has limited domain coverage, which makes it difficult to reflect the heterogeneity of multi-domain data in real application scenarios. In view of this, FewRel2.0 is introduced in this experiment as a supplementary evaluation benchmark; its test subset covers six vertical fields such as biomedical science and educational technology and was constructed through a stratified sampling strategy. This experimental design allows the model’s stability under domain adaptation to be thoroughly validated. Unlike FewRel1.0, both the validation and test sets of FewRel2.0 come from the medical domain, whereas the training set belongs to the general domain. The training set of FewRel2.0 is the same as that of FewRel1.0; the validation set contains 10 relation classes and the test set contains 25 relation classes, each with 100 instances from PubMed (https://www.ncbi.nlm.nih.gov/pubmed/, accessed on 6 July 2025). The detailed statistics of the datasets are summarized in Table 1.

4.2. Parameter Settings

The specific experimental settings are shown in Table 2. Based on the provided configuration, the model utilizes a BERT encoder to process input representations; the backbone is either standard BERT or CP (BERT/CP). Training is performed with a learning rate of 1 × 10−5 or 5 × 10−6 using the AdamW optimizer. Input sequences have a maximum length of 256 tokens, and the model has a hidden size of 768. Training uses a batch size of 32, runs for a maximum of 30,000 iterations, and performs validation every 1000 steps. All experiments are conducted on an NVIDIA GeForce RTX 4080Ti with 64 GB of memory.
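For reference, the settings of Table 2 can be restated as a configuration sketch; which learning rate is paired with which backbone is not specified in the text, so the comment below only lists the two reported values.

from torch.optim import AdamW

config = {
    "encoder": "bert-base-uncased",   # standard BERT or a CP-pretrained checkpoint
    "learning_rate": 1e-5,            # the alternative reported value is 5e-6
    "max_seq_length": 256,
    "hidden_size": 768,
    "batch_size": 32,
    "max_iterations": 30_000,
    "validate_every": 1_000,
}

# optimizer = AdamW(model.parameters(), lr=config["learning_rate"])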

4.3. Baseline Methods

This section compares the proposed model with 14 state-of-the-art baseline models from the field of few-shot relation extraction, including three CNN encoder-based models and 11 BERT-based (https://huggingface.co/bert-base-uncased, accessed on 6 July 2025) models.
CNN encoder-based models: (1) Proto-CNN [52]: a prototype network model based on convolutional neural networks; (2) Proto-HATT [53]: a hybrid attention prototype network; and (3) MLMAN [54]: an interactive prototype network using multi-layer matching and knowledge-enhancement strategies.
BERT-based models: (1) BERT-pair [51]: a model that pairs a support instance with a corresponding query instance and feeds the pair into BERT for relation prediction; (2) Proto-BERT [52]: a network that classifies relations according to the distance between the query instance and the class prototype; (3) TD-Proto [11]: a method that integrates entity and relation information through a weighted gating mechanism to build a knowledge-aware class prototype; (4) ConceptFERE [55]: a method that incorporates the intrinsic conceptual information of entities as an external source of knowledge; (5) DAPL [56]: a debiased contrastive learning method for unsupervised sentence representation; (6) HCRP [13]: a contrastive-learning-based model that uses relation label information to improve learning and focuses the model on difficult tasks by increasing the weight of hard samples; (7) DRK [57]: an entity-convolution-based prototype network (ECRC) that makes full use of entity information; (8) MultiRep [58]: a method that aligns multiple representations in few-shot relation classification via contrastive learning; (9) CP [59]: a contrastive pre-training framework based on entity masking, which aims to enhance model learning by masking specific entities; (10) MapRE [14]: a relation mapping network that makes full use of relational semantic knowledge by combining label-agnostic and label-aware information; (11) CBPM [16]: a method that builds a more refined prototype by mining category knowledge in the query set; and (12) LoToG [60]: a method that optimizes information within prototype networks to improve the quality of the embedding space.

5. Result and Discussion

5.1. Compare Other Baseline Models

The performance of the proposed model is systematically evaluated on the FewRel 1.0 and FewRel 2.0 benchmark datasets. The evaluation follows the standard few-shot learning task setting, where accuracy is employed as the primary evaluation metric. The detailed experimental results are presented in Table 3 and Table 4. Table 3 reports the performance of the proposed model on the FewRel 1.0 dataset.
As presented in Table 3, the main conclusions can be summarized as follows:
(1)
When employing the BERT encoder architecture, the proposed model outperforms the CNN-based encoder models, demonstrating stronger competitiveness. The BERT encoder more effectively captures and expresses complex semantic information in few-shot learning tasks, which significantly enhances the model’s generalization capability and overall performance.
(2)
With BERT as the pre-trained model, the proposed (BERT) approach surpasses existing mainstream methods in performance. As shown in Table 3, BERT-based models perform well in the evaluation; compared to the second-best model (MultiRep), our model achieves improvements of 0.45%, 1.06%, 0.39%, and 2.60% under the four few-shot settings. While existing approaches often rely on more complex network architectures and implementations, the proposed model (BERT) achieves significant performance gains while maintaining a relatively concise model structure, which further validates the effectiveness and efficiency of our approach.
(3)
When using CP as the pre-trained model, the proposed model outperforms the state-of-the-art HCRP(CP) model in two few-shot settings (5-way-1-shot and 10-way-1-shot), with improvements of 0.83% and 1.38%, respectively.
Figure 3 shows the comparative analysis between the proposed model and several representative prototype-based models evaluated using the FewRel 1.0 test set. The analysis findings show that in all test configurations, the introduced approach significantly achieves superior performance compared to baseline models.

5.2. Result Analysis of Domain Adaptation

The evaluation conducted on the FewRel 2.0 test set demonstrates that the proposed model exhibits strong performance in this domain and demonstrates remarkable domain adaptability. Through the analysis of Table 4, we can draw the following conclusions:
(1)
Our model significantly outperforms standard methods like Proto-BERT and BERT-PAIR. Under various N-way-K-shot settings, it achieves an accuracy improvement of at least 11%, with an average improvement of 13% across all settings. This demonstrates that introducing external information effectively enhances few-shot learning performance in domain adaptation scenarios.
The FewRel 2.0 dataset tests domain adaptation: its Wikipedia training data differs from its medical-domain validation/test sets. This domain shift challenges few-shot learning, better evaluating a model’s meta-learning (“learning to learn”) capability.
(2)
Our model surpasses typical external information methods (HCRP, LoToG). Specifically, it outperforms HCRP by at least 4% in accuracy across N-way-K-shot settings, demonstrating that integrating relation information in the feature space is more effective than direct incorporation. It also exceeds LoToG by at least 3%, highlighting the critical importance of enhancing instance/prototype representations through the complementarity of relation-specific features and intra-class common features.
To more intuitively illustrate the domain adaptability of the proposed model across different fields, the results are further presented using visual representations. As shown in Table 4, it is clear that the proposed model consistently outperforms all competing methods evaluated on the FewRel 2.0 test set, further confirming its superior performance in cross-domain scenarios.
Figure 4 compares the effectiveness of the introduced method and the existing typical prototype network method evaluated on the FewRel2.0 test set. The analysis results show that, under all test configurations, the proposed model significantly improves model performance.

5.3. Ablation Experiment

To systematically evaluate the performance improvements brought by the direct integration of relational information and the introduction of a combined loss function, we conducted a series of ablation studies on the FewRel 1.0 test set. An in-depth investigation is conducted on the individual contributions of relational information, as well as the impact of loss function design on the overall effectiveness of the prototype network. Based on this, several experimental configurations were constructed.
(1)
“w/o relation information” denotes a variant of the model that excludes direct relation information and relies only on the basic prototype network structure;
(2)
“w/o $L_{lp}$” denotes a variant that removes the link prediction loss used to quantify the internal tightness of each relation type (i.e., the graph attention learning component is removed);
(3)
“w/o $L_{pro}$” denotes a variant that removes the prototype-level contrastive loss used to enhance the distinguishability between different relation-class prototypes (i.e., the prototype-level contrastive learning component is removed);
(4)
“w/o $L_{cc}$” denotes a variant that removes the clustering loss that imposes explicit constraints on intra-class compactness and inter-class separability (i.e., the cluster-learning component is removed).
Based on the experimental findings presented in Table 5, the following conclusions are derived.
(1)
The integration of relational information is critical for enhancing prototype representations. When relational information is excluded (that is, “w/o relation information”), the model shows a noticeable performance drop. Specifically, the exclusion of integrated relational information resulted in a notable decrease in model accuracy by 1.78% and 1.55% under the two few-shot learning task configurations. This phenomenon clearly shows that relational information plays a key role in enhancing the representation ability of the model, and the lack of relational information makes the model unable to effectively capture the dependencies between global and local features, thus limiting the performance of the model.
(2)
The proposed constraints, particularly those emphasizing intra-class compactness and inter-class separability, allow the model to obtain better results compared to the conventional cross-entropy loss approach. Specifically, when the link prediction loss is excluded (that is, “w/o $L_{lp}$”), the accuracy of the model decreases by 1.43% and 2.20% under the two few-shot task configurations, which clearly indicates the importance of the $L_{lp}$ loss; when the prototype-level contrastive loss is excluded (that is, “w/o $L_{pro}$”), the accuracy decreases by 1.81% and 2.71%, which indicates the importance of the $L_{pro}$ loss; and when the clustering loss is excluded (that is, “w/o $L_{cc}$”), the accuracy decreases by 1.15% and 2.38%, which indicates the importance of the $L_{cc}$ loss.
By introducing these constraints into the prototype network, the model can not only better organize inter-class relationships but also effectively improve the separation and cohesion of samples, thereby obtaining improved outcomes in few-shot relation extraction tasks.
To provide a more intuitive understanding of the importance of the clustering loss, a set of results is presented graphically. As shown in Figure 5, the variant without the cluster loss has a limited ability to represent instance vectors. Although introducing the clustering loss allows the model to distinguish instances of different relation classes, some abnormal samples remain difficult to classify and the intra-class distances are not compact enough. The model trained with all losses (our model) learns a metric space with greater discriminability than the other variants, enabling it to predict relations between entities more accurately. We believe that the explicit constraints, introduced with only a few additional parameters, are advantageous for the few-shot model.
These results clearly confirm that the introduced model is capable of more efficiently representing both the prototype and the metric space by integrating relational information with the optimized loss function design, thereby significantly enhancing its performance in complex few-shot relation extraction tasks. The experimental results further verify the efficiency of the introduced framework in improving the expressive capacity and discriminative ability of the model.

6. Conclusions and Future Work

This study introduces a novel few-shot RE framework. Firstly, by unifying the encoder, the relation information and the sentence are mapped into the same semantic space, ensuring their symmetry and consistency in the high-dimensional space. Further, we generate a unified relation representation through the symmetry-preserving concatenation of the two relation views, so that the relation representation is dimensionally aligned with the prototype. Then, we directly add the initial prototype to the unified relation representation in a feature-symmetry-enhanced manner, thereby optimizing the feature representations of both the prototype and the relation information; a minimal sketch of this additive enhancement is given below. This strategy avoids the reliance of traditional methods on complex network structures, significantly reduces computational costs, enhances the model's expressive power, and limits the number of newly introduced parameters, further improving computational efficiency and scalability. The experimental results on the FewRel1.0 and FewRel2.0 datasets show that the proposed model outperforms existing advanced models in the task of few-shot RE.
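For illustration, the following minimal sketch shows the additive, parameter-free prototype enhancement described above, under the assumption that each of the two relation views is encoded into a vector of half the prototype dimension so that their concatenation is dimensionally aligned with the prototype; the tensor shapes and function name are illustrative, not the authors' exact implementation.

```python
# Sketch only: concatenate two relation views and add the result to the initial prototype.
import torch

def enhance_prototypes(init_proto, rel_view_a, rel_view_b):
    """init_proto: [N, d]; rel_view_a, rel_view_b: [N, d/2] each (hypothetical split so that
    the concatenated relation representation matches the prototype dimension)."""
    rel = torch.cat([rel_view_a, rel_view_b], dim=-1)   # unified relation representation, [N, d]
    return init_proto + rel                              # additive enhancement, no extra parameters

# Toy 5-way episode with a 768-dimensional hidden size.
N, d = 5, 768
enhanced = enhance_prototypes(torch.randn(N, d), torch.randn(N, d // 2), torch.randn(N, d // 2))
```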
In future work, we will extend the model to more diverse few-shot RE datasets (such as CrossFewRel) to comprehensively verify the universality and robustness of the method, and we will explore its practical deployment in low-resource scenarios (such as cross-lingual settings, more informal text types, or industrial datasets) to promote the adoption of few-shot relation extraction technology. Moreover, part of the model's strong semantic representation capability stems from its reliance on LLMs; this dependence on large pretrained LLMs as encoders ties the method to the availability and update cycles of external models, which may hinder its long-term maintainability. We also plan to address this issue in future work.

Author Contributions

Conceptualization, S.Q., Z.C. and C.L.; methodology, B.F., C.L., Y.S. and D.L.; validation, S.Q., S.J. and T.S.; formal analysis, D.L.; investigation, B.F.; resources, T.S., Y.C., Y.L. and S.J.; writing—original draft preparation, Y.C., B.F., Y.L. and Y.S.; writing—review and editing, S.Q. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by National Natural Science Foundation of China (61672470), in part by the Science and Technology Research Project of Henan Province, China (252102211034, 252102210101), in part by the Henan Province Key Research and Promotion Special Project, China (231111110100), in part by Major Science and Technology Research Projects in Henan Province, China (231100110200, 221100210400), in part by Key Research and Development Projects in Henan Province, China (251111313500), in part by Major Public Welfare Projects in Henan Province, China (201300210200), in part by the Key Scientific Research Projects of Higher Education in Henan Province, China (25B520001), and in part by the Grant 2024SJGLX0126, 2024BSJJ014.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, X.; Deng, Y.; Yang, M.; Wang, L.; Zhang, R.; Cheng, H.; Lam, W.; Shen, Y.; Xu, R. A comprehensive survey on relation extraction: Recent advances and new frontiers. ACM Comput. Surv. 2024, 56, 1–39. [Google Scholar] [CrossRef]
  2. Wang, W.; Wei, X.; Wang, B.; Li, Y.; Xin, G.; Wei, Y. Hyperplane projection network for few-shot relation classification. Expert Syst. Appl. 2024, 238, 121971. [Google Scholar] [CrossRef]
  3. Chen, X.; Wu, H.; Shi, X. Consistent prototype learning for few-shot continual relation extraction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 7409–7422. [Google Scholar]
  4. Detroja, K.; Bhensdadia, C.K.; Bhatt, B.S. A survey on relation extraction. Intell. Syst. Appl. 2023, 19, 200244. [Google Scholar] [CrossRef]
  5. Zhang, Y.; Zhang, Y.; Wang, Z.; Peng, H.; Yang, Y.; Li, Y. Multi-information interaction graph neural network for joint entity and relation extraction. Expert Syst. Appl. 2024, 235, 121211. [Google Scholar] [CrossRef]
  6. Zhao, X.; Yang, M.; Qu, Q.; Xu, R. Few-Shot Relation Extraction with Automatically Generated Prompts. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 4971–4983. [Google Scholar] [CrossRef] [PubMed]
  7. Liu, Y.; Hu, J.; Wan, X.; Chang, T.H. Learn from relation information: Towards prototype representation rectification for few-shot relation extraction. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, USA, 10–15 July 2022; pp. 1822–1831. [Google Scholar]
  8. Wen, W.; Liu, Y.; Ouyang, C.; Lin, Q.; Chung, T. Enhanced prototypical network for few-shot relation extraction. Inf. Process. Manag. 2021, 58, 102596. [Google Scholar] [CrossRef]
  9. Li, R.; Zhong, J.; Hu, W.; Dai, Q.; Wang, C.; Wang, W.; Li, X. Adaptive class augmented prototype network for few-shot relation extraction. Neural Netw. 2024, 169, 134–142. [Google Scholar] [CrossRef]
  10. Ding, N.; Wang, X.; Fu, Y.; Xu, G.; Wang, R.; Xie, P.; Shen, Y.; Huang, F.; Zheng, H.T.; Zhang, R. Prototypical representation learning for relation extraction. arXiv 2021, arXiv:2103.11647. [Google Scholar] [CrossRef]
  11. Yang, K.; Zheng, N.; Dai, X.; He, L.; Huang, S.; Chen, J. Enhance prototypical network with text descriptions for few-shot relation classification. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 2273–2276. [Google Scholar]
  12. Yang, K.; Zheng, N.; Dai, X.; He, L.; Huang, S.; Chen, J. Learning to decouple relations: Few-shot relation classification with entity-guided attention and confusion-aware training. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 5799–5809. [Google Scholar]
  13. Han, J.; Cheng, B.; Lu, W. Exploring task difficulty for few-shot relation extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2605–2616. [Google Scholar]
  14. Dong, M.; Pan, C.; Luo, Z. MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2694–2704. [Google Scholar]
  15. Sun, S.; Sun, Q.; Zhou, K.; Lv, T. Hierarchical attention prototypical networks for few-shot text classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 476–485. [Google Scholar]
  16. Li, J.; Feng, S.; Chiu, B. Few-shot relation extraction with dual graph neural network interaction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 14396–14408. [Google Scholar] [CrossRef]
  17. Wen, M.; Xia, T.; Liao, B.; Tian, Y. Few-shot relation classification using clustering-based prototype modification. Knowl.-Based Syst. 2023, 268, 110477. [Google Scholar] [CrossRef]
  18. Wu, H.; He, Y.; Chen, Y.; Bai, Y.; Shi, X. Improving few-shot relation extraction through semantics-guided learning. Neural Netw. 2024, 169, 453–461. [Google Scholar] [CrossRef] [PubMed]
  19. He, K.; Huang, Y.; Mao, R.; Gong, T.; Li, C.; Cambria, E. Virtual prompt pre-training for prototype-based few-shot relation extraction. Expert Syst. Appl. 2023, 213, 118927. [Google Scholar] [CrossRef]
  20. Gong, J.; Eldardiry, H. Few-Shot Relation Extraction with Hybrid Visual Evidence. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy, 20–25 May 2024; pp. 7232–7247. [Google Scholar]
  21. Luo, D.; Gan, Y.; Hou, R.; Lin, R.; Liu, Q.; Cai, Y.; Gao, W. Synergistic anchored contrastive pre-training for few-shot relation extraction. Proc. AAAI Conf. Artif. Intell. 2024, 38, 18742–18750. [Google Scholar] [CrossRef]
  22. Chen, Y.; Yang, W.; Wang, K.; Qin, Y.; Huang, R.; Zheng, Q. A neuralized feature engineering method for entity relation extraction. Neural Netw. 2021, 141, 249–260. [Google Scholar] [CrossRef] [PubMed]
  23. Cabot, P.L.H.; Navigli, R. REBEL: Relation extraction by end-to-end language generation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Virtual, 16–20 November 2021; pp. 2370–2381. [Google Scholar]
  24. Guo, Q.; Zhang, J.; Wang, S.; Tian, L.; Kang, Z.; Yan, B.; Xiao, W. Bridging Generative and Discriminative Learning: Few-Shot Relation Extraction via Two-Stage Knowledge-Guided Pre-training. arXiv 2025, arXiv:2505.12236. [Google Scholar]
  25. Ying, X.; Xie, X.; Xu, T.; Zhao, Y.; Meng, Z.; Zhao, M. WMRE: Enhancing Distant Supervised Relation Extraction with Word-level Multi-instance Learning and Multi-hierarchical Feature. In Proceedings of the ICASSP 2025–2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1–5. [Google Scholar]
  26. Liu, W.; Yin, M.; Zhang, J.; Cui, L. A Joint Entity Relation Extraction Model Based on Relation Semantic Template Automatically Constructed. Comput. Mater. Contin. 2024, 78, 975. [Google Scholar] [CrossRef]
  27. Sun, X.; Guo, Q.; Ge, S.Q. GFN: A novel joint entity and relation extraction model with redundancy and denoising strategies. Knowl.-Based Syst. 2024, 300, 112137. [Google Scholar] [CrossRef]
  28. Hu, B.; Lin, A.; Brinson, L.C. Tackling Structured Knowledge Extraction from Polymer Nanocomposite Literature as an NER/RE Task with seq2seq. Integr. Mater. Manuf. Innov. 2024, 13, 656–668. [Google Scholar] [CrossRef]
  29. Dagdelen, J.; Dunn, A.; Lee, S.; Walker, N.; Rosen, A.S.; Ceder, G.; Persson, K.A.; Jain, A. Structured information extraction from scientific text with large language models. Nat. Commun. 2024, 15, 1418. [Google Scholar] [CrossRef]
  30. Lv, J.; Zhang, Z.; Jin, L.; Li, S.; Li, X.; Xu, G.; Sun, X. HGEED: Hierarchical graph enhanced event detection. Neurocomputing 2021, 453, 141–150. [Google Scholar] [CrossRef]
  31. Sun, B.; Gong, K.; Li, W.; Song, X. MetaR: Few-Shot Named Entity Recognition with Meta-Learning and Relation Network. IEEE Trans. Audio Speech Lang. Process. 2025, 33, 974–986. [Google Scholar] [CrossRef]
  32. Bai, G.; Lu, C.; Guo, D.; Li, S.; Liu, Y.; Zhang, Z.; Dong, G.; Liu, R.; Yong, S. Clear Up Confusion: Advancing Cross-Domain Few-Shot Relation Extraction through Relation-Aware Prompt Learning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers), Mexico City, Mexico, 29 April–4 May 2024; pp. 70–78. [Google Scholar]
  33. Wang, Z.; Yang, L.; Yang, J.; Li, T.; He, L.; Li, Z. A Triple Relation Network for Joint Entity and Relation Extraction. Electronics 2022, 11, 1535. [Google Scholar] [CrossRef]
  34. Fan, M.; Bai, Y.; Sun, M.; Li, P. Large margin prototypical network for few-shot relation classification with fine-grained features. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 2353–2356. [Google Scholar]
  35. Xiao, Y.; Jin, Y.; Hao, K. Adaptive prototypical networks with label words and joint representation learning for few-shot relation classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1406–1417. [Google Scholar] [CrossRef]
  36. Ren, L.; Chen, C.; Wang, L.; Li, P. Towards improved proxy-based deep metric learning via data-augmented domain adaptation. Proc. AAAI Conf. Artif. Intell. 2024, 38, 14811–14819. [Google Scholar] [CrossRef]
  37. Han, X.; Zhu, H.; Yu, P.; Wang, Z.; Yao, Y.; Liu, Z.; Sun, M. FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 30 November 2018; pp. 4803–4809. [Google Scholar]
  38. Han, Y.; Qiao, L.; Zheng, J.; Kan, Z.; Feng, L.; Gao, Y.; Tang, Y.; Zhai, Q.; Li, D.; Liao, X. Multi-view interaction learning for few-shot relation classification. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, Australia, 1–5 November 2021; pp. 649–658. [Google Scholar]
  39. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  40. Gao, T.; Fisch, A.; Chen, D. Making Pre-trained Language Models Better Few-shot Learners. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Virtual, 1–6 August 2021; pp. 3816–3830. [Google Scholar]
  41. Schick, T.; Schütze, H. Exploiting Cloze-Questions for Few-Shot Text Classification and Natural Language Inference. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 255–269. [Google Scholar]
  42. Chen, X.; Zhang, N.; Xie, X.; Deng, S.; Yao, Y.; Tan, C.; Huang, F.; Si, L.; Chen, H. Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2778–2788. [Google Scholar]
  43. Yang, Y.; Li, Y.; Quan, X. Ubar: Towards fully end-to-end task-oriented dialog system with gpt-2. Proc. AAAI Conf. Artif. Intell. 2021, 35, 14230–14238. [Google Scholar] [CrossRef]
  44. Jing, L.; Fan, X.; Feng, D.; Lu, C.; Jiang, S. A patent text-based product conceptual design decision-making approach considering the fusion of incomplete evaluation semantic and scheme beliefs. Appl. Soft Comput. 2024, 157, 111492. [Google Scholar] [CrossRef]
  45. Duan, X.; Liu, Y.; You, Z.; Li, Z. Agricultural Text Classification Method Based on ERNIE 2.0 and Multi-Feature Dynamic Fusion. IEEE Access 2025, 13, 52959–52971. [Google Scholar] [CrossRef]
  46. Zheng, W.; Lu, S.; Cai, Z.; Wang, R.; Wang, L.; Yin, L. PAL-BERT: An Improved Question Answering Model. CMES-Comput. Model. Eng. Sci. 2024, 139, 2729–2745. [Google Scholar] [CrossRef]
  47. Nie, W.; Chen, R.; Wang, W.; Lepri, B.; Sebe, N. T2TD: Text-3D Generation Model Based on Prior Knowledge Guidance. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 172–189. [Google Scholar] [CrossRef]
  48. Gardazi, N.M.; Daud, A.; Malik, M.K.; Bukhari, A.; Alsahfi, T.; Alshemaimri, B. BERT applications in natural language processing: A review. Artif. Intell. Rev. 2025, 58, 166. [Google Scholar] [CrossRef]
  49. Vrahatis, A.G.; Lazaros, K.; Kotsiantis, S. Graph attention networks: A comprehensive review of methods and applications. Future Internet 2024, 16, 318. [Google Scholar] [CrossRef]
  50. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PmLR, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  51. Gao, T.; Han, X.; Zhu, H.; Liu, Z.; Li, P.; Sun, M.; Zhou, J. FewRel 2.0: Towards More Challenging Few-Shot Relation Classification. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6249–6254. [Google Scholar]
  52. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30, pp. 4080–4090. [Google Scholar]
  53. Gao, T.; Han, X.; Liu, Z.; Sun, M. Hybrid attention-based prototypical networks for noisy few-shot relation classification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 6407–6414. [Google Scholar] [CrossRef]
  54. Ye, Z.; Ling, Z. Multi-Level Matching and Aggregation Network for Few-Shot Relation Classification. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 2872–2881. [Google Scholar]
  55. Yang, S.; Zhang, Y.; Niu, G.; Zhao, Q.; Pu, S. Entity Concept-enhanced Few-shot Relation Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 987–991. [Google Scholar]
  56. Zhou, K.; Zhang, B.; Zhao, X.; Wen, J. Debiased Contrastive Learning of Unsupervised Sentence Representations. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 6120–6130. [Google Scholar]
  57. Wang, M.; Zheng, J.; Cai, F.; Shao, T.; Chen, H. DRK: Discriminative rule-based knowledge for relieving prediction confusions in few-shot relation extraction. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; pp. 2129–2140. [Google Scholar]
  58. Borchert, P.; De Weerdt, J.; Moens, M.-F. Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 16–21 June 2024; pp. 638–646. [Google Scholar]
  59. Peng, H.; Gao, T.; Han, X.; Lin, Y.; Li, P.; Liu, Z.; Sun, M.; Zhou, J. Learning from Context or Names? An Empirical Study on Neural Relation Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 3661–3672. [Google Scholar]
  60. Sun, H.; Chen, R. Enhancing the Prototype Network with Local-to-Global Optimization for Few-Shot Relation Extraction. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, NM, USA, 4 May 2025; pp. 2668–2677. [Google Scholar]
Figure 1. Comparison between the new method and traditional methods. The triangle on the left represents relation type information, which is usually the information to be classified; the quadrilateral on the right represents the classification performance of the prototype network.
Figure 2. Structure of the proposed few-shot RE model. In the figure, circles represent node-level information, triangles represent relation information, and five-pointed stars represent the initial prototypes corresponding to the relation information; the query set is represented by rectangular boxes of different colors, indicating how query instances are encoded and compared with the support set and the prototype representations throughout the learning process.
Figure 3. Accuracy comparison of different models on the FewRel1.0 test set.
Figure 4. Accuracy comparison of different models on the FewRel2.0 test set.
Figure 5. Visualization of instance embeddings for a 5-way-1-shot task on the FewRel 1.0 test set. Five different relation types are shown in red, yellow, green, blue, and purple, respectively. The visualization shows that our model distinguishes the relation classes well, reducing intra-class distances and enlarging inter-class differences.
Table 1. Dataset details.
Split | FewRel1.0 Relations | FewRel1.0 Instances | FewRel2.0 Relations | FewRel2.0 Instances
Train | 64 | 44,800 | 64 | 44,800
Validation | 16 | 11,200 | 10 | 1000
Test | 20 | 14,000 | 15 | 1500
Table 2. Model parameter settings.
Argument | Value
Encoder | BERT encoder
Back-end model | BERT/CP
Learning rate | 1 × 10⁻⁵ / 5 × 10⁻⁶
Maximum length | 256
Hidden layer size | 768
Batch size | 32
Optimizer | AdamW
Verification procedure | 1000
Maximum number of training iterations | 30,000
Table 3. The performance of few-shot RE models on the FewRel1.0 dataset.
Model | 5-Way-1-Shot (Val/Test) | 5-Way-5-Shot (Val/Test) | 10-Way-1-Shot (Val/Test) | 10-Way-5-Shot (Val/Test)
Proto-CNN [52] | — / 74.29 | — / 85.18 | — / 61.15 | — / 74.41
Proto-HATT [53] | 72.65 / 74.52 | 86.15 / 88.40 | 60.13 / 62.38 | 76.20 / 80.45
MLMAN [54] | 75.01 / — | 87.09 / 90.12 | 62.48 / — | 77.50 / 83.05
BERT-Pair [51] | 85.66 / 88.32 | 89.48 / 93.22 | 76.84 / 80.63 | 81.76 / 87.02
Proto-BERT [52] | 84.77 / 89.33 | 89.57 / 94.13 | 76.85 / 83.41 | 83.42 / 90.25
TD-proto [11] | — / 84.76 | — / 92.38 | — / 74.32 | — / 85.92
ConceptFERE [55] | — / 89.21 | — / 90.34 | — / 75.72 | — / 81.82
DAPL [56] | — / 85.94 | — / 94.28 | — / 77.59 | — / 89.26
HCRP [13] | 90.90 / 93.76 | 93.22 / 95.66 | 84.11 / 89.95 | 87.79 / 92.10
DRK [57] | — / 89.94 | — / 92.42 | — / 81.94 | — / 85.23
MultiRep [58] | 92.73 / 94.18 | 93.79 / 96.29 | 86.12 / 91.07 | 88.80 / 91.98
LoToG [60] | 92.38 / 95.28 | 94.26 / 96.71 | 86.23 / 91.48 | 91.11 / 93.14
Ours (BERT) | 92.95 / 94.63 | 94.36 / 97.35 | 87.25 / 91.46 | 89.85 / 94.58
CP [59] | — / 95.10 | — / 97.10 | — / 91.20 | — / 94.70
MapRE [14] | — / 95.73 | — / 97.84 | — / 93.18 | — / 95.64
HCRP (CP) [13] | 94.10 / 96.42 | 96.05 / 97.96 | 89.13 / 93.97 | 93.10 / 96.46
CBPM [16] | — / 90.89 | — / 94.68 | — / 82.54 | — / 89.67
Ours (CP) | 96.57 / 97.25 | 97.99 / 98.26 | 94.56 / 95.35 | 95.81 / 96.27
Table 4. The performance of few-shot RE models on the FewRel2.0 test set.
Model | 5-Way-1-Shot | 5-Way-5-Shot | 10-Way-1-Shot | 10-Way-5-Shot
Proto-CNN [52] | 35.09 | 49.37 | 22.98 | 35.22
Proto-BERT [52] | 40.12 | 51.50 | 26.45 | 36.93
BERT-PAIR [51] | 67.41 | 78.57 | 54.89 | 66.85
HCRP [13] | 76.34 | 83.03 | 63.77 | 72.94
LoToG [60] | — | 84.38 | — | 75.69
Ours (CP) | 80.45 | 87.63 | 68.15 | 79.89
Table 5. Ablation results of the proposed few-shot RE model on the FewRel 1.0 test set.
Model | 5-Way-1-Shot | Training Time (Hours) | 10-Way-1-Shot | Training Time (Hours)
Ours (CP) | 94.63 | 4 | 91.46 | 9
w/o relation information | 92.85 | 3.5 | 89.91 | 8
w/o L_lp | 93.20 | 2.5 | 89.26 | 6
w/o L_pro | 92.82 | 2 | 88.75 | 5
w/o L_cc | 93.48 | 2 | 89.08 | 5
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
