Article

Learnable Convolutional Attention Network for Unsupervised Knowledge Graph Entity Alignment

1 School of Computer Science, Guangdong University of Education, Guangzhou 510303, China
2 Aberdeen Institute of Data Science and Artificial Intelligence, South China Normal University, Foshan 528225, China
* Author to whom correspondence should be addressed.
Entropy 2025, 27(9), 924; https://doi.org/10.3390/e27090924
Submission received: 11 June 2025 / Revised: 17 August 2025 / Accepted: 22 August 2025 / Published: 3 September 2025
(This article belongs to the Special Issue Entropy in Machine Learning Applications, 2nd Edition)

Abstract

The success of current entity alignment (EA) tasks largely depends on the supervision provided by labeled data. Given the cost of labeling, most supervised methods are difficult to apply in practical scenarios. Therefore, an increasing number of works based on contrastive learning, active learning, and other deep learning techniques have been developed to address the performance bottleneck caused by the lack of labeled data. However, existing unsupervised EA methods still face certain limitations: either their modeling complexity is high or they fail to balance the effectiveness and practicality of alignment. To overcome these issues, we propose a learnable convolutional attention network for unsupervised entity alignment, named LCA-UEA. Specifically, LCA-UEA performs convolution operations before the attention mechanism, ensuring the acquisition of structural information while avoiding the superposition of redundant information. Then, to efficiently filter out invalid neighborhood information of aligned entities, LCA-UEA employs a relation structure reconstruction method based on potential matching relations, thereby enhancing the usability and scalability of the EA method. Notably, a consistency-based similarity function is proposed to better measure the similarity of candidate entity pairs. Finally, we conducted extensive experiments on three datasets of different sizes and types (cross-lingual and monolingual) to verify the superiority of LCA-UEA. Experimental results demonstrate that LCA-UEA significantly improves alignment accuracy, outperforming 25 supervised or unsupervised methods and improving Hits@1 by up to 6.4% over the best-performing baseline.

1. Introduction

Knowledge graphs (KGs) have progressively emerged as a new method to manage massive information, with various domains such as question answering [1], recommendation systems [2], reasoning [3], and safety and security [4] investigating their potential applications. Despite the rapid expansion in data volume and application scope, KGs still fall short of providing sufficient knowledge to support downstream tasks due to their inherent limitations in coverage. Consequently, it is imperative to explore integration techniques for heterogeneous KGs. A core task in this context is entity alignment (EA), which focuses on identifying equivalent entities across different KGs, thereby enabling the seamless merging of multiple KGs.
Recent advances in representation learning techniques have significantly propelled the development of embedding-based EA methods. Traditional EA methods initially depend on known aligned entities (also called alignment seeds or pre-aligned entity pairs) as supervisory signals. These methods project entities from different KGs into a unified embedding space, learn entity embeddings, and subsequently employ vector similarity functions to measure entity similarities and predict alignment results. Existing EA methods can be classified into two primary categories: relation-based methods and auxiliary-based methods. Relation-based EA methods are grounded in the assumption that aligned entities possess similar relation neighborhood structures. These methods leverage translation-based models (e.g., TransE [5]) or Graph Neural Networks (e.g., GCN [6], GAT [7]) to extract relation structural features of entities. Auxiliary-based methods incorporate additional entity information, such as attributes, attribute values, and images, to enhance the embedding learning process for entities. Nevertheless, these methods still encounter certain limitations in contemporary applications, which undermine their effectiveness and robustness when applied to real-world KGs.
Limitation 1: Most models employ high-order networks to aggregate neighborhood information, but their training costs increase substantially. Owing to the powerful structure learning capabilities, Graph Neural Networks (GNNs) have been widely adopted as encoders in numerous studies [8,9,10] to enhance the ability to capture structural information within KGs. For instance, RPR-RHGT [8] and RANM [9] introduce variant attention mechanisms that incorporate the relation heterogeneity of KGs into the computation of attention coefficients. PEEA [10] proposes a novel position encoding method that integrates both anchor links and relational information from a global perspective. However, these methods rely on GNN variants or multi-layer stacking, resulting in a sharp increase in the number of parameters in neural networks, a significant increase in model complexity, and a substantial decrease in training efficiency. In actual large-scale KG scenarios, this problem is particularly prominent. Therefore, balancing the effectiveness and complexity of models remains a key challenge in EA tasks.
Limitation 2: Auxiliary-based methods exhibit limitations in terms of model effectiveness, practicality, and modeling efficiency. Firstly, while the incorporation of auxiliary information aids in enhancing entity feature extraction, it also introduces additional noise. The auxiliary information associated with different KGs is often inconsistent, rendering the selection or identification of valid and consistent auxiliary information a challenging task. Secondly, real-world KGs do not always encompass auxiliary information such as attributes, attribute values, images, etc. The more auxiliary information a method relies on, the greater the data requirements for its application, thereby making the method more scenario-dependent. Thirdly, it is evident that the introduction of more auxiliary information inevitably leads to an increase in method complexity. Auxiliary-based methods [11,12,13,14] require more complex models to handle auxiliary information across KGs. This issue has garnered increasing attention recently; however, no effective solution has yet been proposed.
To resolve the above limitations, we propose a learnable convolutional attention network for unsupervised entity alignment, known as LCA-UEA. For limitation 1, we abandon the use of complex networks and introduce a novel GNN, LCAT [15], as the backbone network to model the graph structures of KGs. LCAT has the advantage of simplicity and outperforms existing benchmark GNNs in terms of robustness to input noise and network initialization. For limitation 2, we restrict model inputs to only partial relation information. As shown in Figure 1, the aligned entities a and b exhibit differing neighborhood structures; only a subset of their neighbors (connected by solid edges) are identical, while the other neighbors (connected by dashed edges) introduce noise that interferes with alignment. To address this issue, we propose a novel reconstruction method for relation structures based on potential matching relations, which efficiently filters out the effective neighbors of aligned entities. Specifically, inspired by RPR-RHGT [8], we incorporate only those triples that contribute positively to alignment into the LCAT model learning process. Additionally, we integrate entity-context embeddings as model inputs to enhance alignment performance without substantially increasing model complexity. These designs not only reduce the data flow into the encoder but also lower the model's resource requirements.
In addition, we incorporate contrastive learning [16] to reduce the reliance on alignment seeds, thereby enhancing the practicality and robustness of our proposed method. Finally, we propose a novel similarity function for evaluating the similarity of aligned entities. This is motivated by the limitation of conventional similarity functions, which only account for the diversity of entity pairs based on their intrinsic features. Our experiments on three benchmark datasets demonstrated that LCA-UEA outperforms state-of-the-art EA methods. We further conducted comprehensive additional analyses to validate the effectiveness of our simplified, learnable convolutional attention network. More specifically, we summarize our contributions as follows:
  • A reconstruction method of relation structure based on potential matching relations is designed, which improves alignment accuracy while reducing the computational cost of model training.
  • A learnable GNN model is introduced to learn entity features, which performs convolution operations before the attention mechanism, ensuring the acquisition of structural information while avoiding the superposition of redundant information.
  • A novel similarity function based on consistency is proposed, enabling more accurate measurement of the similarity between candidate entity pairs.
  • Extensive experiments conducted on three well-known benchmark datasets demonstrated that LCA-UEA not only significantly outperforms 25 state-of-the-art models but also exhibits strong scalability and robustness.
The remainder of this paper is organized as follows: Section 2 provides a concise overview of related work. Section 3 presents a formal definition of concepts relevant to our methods. Section 4 details our proposed method, LCA-UEA. Section 5 reports the experimental results and compares them with state-of-the-art alignment methods. Finally, Section 6 concludes the paper and outlines potential future research directions.

2. Related Work

Existing embedding-based EA methods can be classified into two categories based on whether they use a training set: (semi-)supervised methods and self-supervised or unsupervised methods. According to the graph structure modeling method, the former category can be further divided into two groups: translation-based methods and GNN-based methods. Also, GNN-based methods can be further subdivided into two subcategories: relation-based methods and auxiliary-based methods. In this section, we briefly review each of these related works.

2.1. EA Based on Translation Model

TransE [5] is an energy-based model that projects entities and relations of KGs into different spaces and finds suitable translations between them. Translation-based methods served as the primary methods in early EA research, leveraging TransE to encode the structural information of KGs. MTransE [17] was the earliest work in this domain, relying solely on the relational structures of KGs for alignment. Subsequent studies proposed various enhancements. For instance, JAPE [18] and AttrE [19] incorporate attribute modeling, MultiKE [20] integrates multi-view knowledge representation, and BootEA [21] optimizes performance through iterative strategies. Additionally, NAEA [22] combines TransE and GCN to enhance the extraction of entity features. However, translation-based methods often fall short of achieving the desired results, as the high heterogeneity of KGs renders it challenging to transform one KG into another using a linear mapping function akin to multilingual lexical space transformation.

2.2. EA Based on Relation Structures

Given that entities with similar neighborhood structures in different KGs are likely to be aligned, GNNs have emerged as the most popular solution for EA tasks. GNN-based methods use well-known GNN architectures (e.g., GCN, GAT) and their variants to extract neighborhood features of entities, which are utilized to make final alignment decisions. Relation-based methods are the most common methods, where the input consists solely of the relation structures of KGs, such as RDGCN [23], AliNet [24], and KAGNN [25]. These methods are highly practical, as they rely exclusively on the fundamental data (i.e., the relation structure) of KGs.
Some other works cleverly model both intra-graph and cross-graph information, such as Dual-AMN [26] and PEEA [10]. Here, the cross-graph information is constructed through Graph Matching Networks (GMNs). Moreover, some works also add modeling of heterogeneous information (e.g., relation edges), such as MRAEA [27], RPR-RHGT [8], and RANM [9]. These works propose or apply novel heterogeneous graph embedding methods to learn more effective entity representations.
Some researchers focus on the development of semi-supervised methods, such as MRAEA [27], RANM [9], and PEEA [10]. These approaches expand the training set through iterative strategies by generating new alignment seeds during the training phase. In addition to semi-supervised learning, other advanced techniques (e.g., active learning, deep reinforcement learning, adversarial learning) have been applied to enhance the efficiency and effectiveness of EA methods [28,29,30,31].

2.3. EA Based on Auxiliary Information

In addition to the relation structure, many studies add auxiliary information (e.g., attributes, entity descriptions, and images) into the entity encoding process. The relation structure represents the external relationships between entities, while the attribute structure (i.e., the attributes and attribute values of associated entities) represents the internal characteristics of entities. Attribute-based methods require simultaneous input of both the relation and attribute structures of KGs, such as HMAN [11], AttrGNN [12], MRAEA [27], MHNA [13], RoadEA [32], and EAMI [14].
There are also methods that leverage powerful pre-trained or language models to model descriptive information about entities, such as HMAN [11], SDEA [33], SKEA [34], and MMEA-cat [35]. Since some entities possess distinctive visual features (e.g., humans, objects, or animals) or have clear logo markings (e.g., companies, organizations, or associations), this characteristic is highly beneficial for alignment judgments. Therefore, some methods utilize image information as additional input, such as MMEA-cat [35], SKEA [34], and GEEA [31].

2.4. Self-Supervised or Unsupervised EA Methods

Most early EA methods are (semi-)supervised, utilizing alignment seeds during the training process. Despite the success of these methods, the critical requirement for labeled data remains a significant barrier to their practical application [36]. Tagging alignment seeds is inherently time-consuming and labor-intensive, prompting growing academic interest in self-supervised or unsupervised EA methods, which typically leverage external information to reduce reliance on labeled data.
Self-supervised learning involves automatically generating supervised signals from large-scale unlabeled data through auxiliary tasks (pretexts), and subsequently training models using supervised learning methods. For example, MultiKE [20] develops cross-KG inference techniques to enhance labeled data generation. EVA [37] leverages images as pivots to produce pseudo-labeled data. UPLR [38] computes the graph interaction divergence between entity pairs and adaptively selects confident samples from unlabeled data.
Unsupervised EA methods bypass the need for labeled data by constructing loss functions based on the intrinsic characteristics of data distributions. For example, ICLEA [36] and SelfKG [39] both employ contrastive learning to learn entity representations in an unsupervised manner. SEU [40] and UDCEA [41] reformulate EA tasks as assignment problems and propose novel unsupervised approaches that do not rely on neural networks. Specifically, they leverage machine translation and pre-trained language models to compute cross-lingual or cross-KG similarities across multi-view information (e.g., entity names, structures, and attributes), which are then integrated using global alignment strategies.
The supervised learning methods described above, particularly those leveraging auxiliary information, tend to achieve better results because alignment seeds and auxiliary information are generally more informative. However, these methods have the following limitations: (1) Auxiliary information often contains noise, necessitating customized pre-processing for most methods. (2) Incorporating auxiliary structures significantly increases model complexity, resulting in inefficient training. (3) Acquiring alignment seeds and auxiliary information is inherently time-consuming and labor-intensive for most KG applications, limiting the scalability of these methods. Existing unsupervised methods face similar challenges. Therefore, we propose LCA-UEA, which relies solely on basic semantic information and relation structures of entities. In this work, we avoid using complex neural network architectures, such as stacked or concatenated networks, heterogeneous GNNs, or GMNs. This design choice enhances the practicality and robustness of our method in real-world applications.

3. Preliminaries

Before describing the method, we introduce the preliminary definitions.
Definition 1.
A knowledge graph (KG) can be denoted as $G = (E, R, A, V, T_R, T_A)$, where $E$, $R$, $A$, and $V$, respectively, represent the entity set, relation set, attribute set, and value set. $T_R \subseteq E \times R \times E$ denotes the relation structure, and $T_A \subseteq E \times A \times V$ denotes the attribute structure. However, we only focus on the relation structure in this paper, so a KG can be simplified to $G = (E, R, T)$, where $T$ denotes the set of relation triples.
Definition 2.
The entity alignment (EA) task aims to find matching entities with the same meaning from two KGs, $G_1 = (E_1, R_1, T_1)$ and $G_2 = (E_2, R_2, T_2)$. In practice, some pre-aligned entity pairs (seed alignments) are usually involved in model training. However, this paper focuses on unsupervised entity alignment models, so no seed alignments are involved in our model training.
For convenience, we put $G_1$ and $G_2$ together as a primal graph, $G = (E, R, T)$, in the experiment, where $E = E_1 \cup E_2$, $R = R_1 \cup R_2$, and $T = T_1 \cup T_2$. Formally, we denote embedding vectors using bold lowercase letters and embedding matrices using bold uppercase letters. In particular, $\mathbf{e}_i^t$ denotes the embedding of the i-th object of the t-th type, and $\mathbf{E}^t$ denotes the embedding matrix of all objects of the t-th type, where t refers to the type index and i refers to the object index within that type. Specific notations and their descriptions are summarized in Table 1.
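As a concrete illustration of Definitions 1 and 2, the following minimal sketch (our own, with illustrative names, not taken from the released implementation) shows how the simplified KGs and their union into a primal graph could be represented.

```python
from dataclasses import dataclass, field

@dataclass
class KG:
    """Simplified knowledge graph G = (E, R, T), keeping only relation triples."""
    entities: set = field(default_factory=set)    # E
    relations: set = field(default_factory=set)   # R
    triples: set = field(default_factory=set)     # T, elements are (head, relation, tail)

    @classmethod
    def from_triples(cls, triples):
        kg = cls()
        for h, r, t in triples:
            kg.entities.update((h, t))
            kg.relations.add(r)
            kg.triples.add((h, r, t))
        return kg

def primal_graph(g1: KG, g2: KG) -> KG:
    """Primal graph G: E = E1 ∪ E2, R = R1 ∪ R2, T = T1 ∪ T2."""
    g = KG()
    g.entities = g1.entities | g2.entities
    g.relations = g1.relations | g2.relations
    g.triples = g1.triples | g2.triples
    return g
```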

4. Methodology

This section introduces the core work of this paper. Figure 2 depicts the architecture of LCA-UEA, which consists of five major components, as detailed below.
  • The textual feature module extends traditional entity name embedding by introducing entity-context embedding, thereby enhancing the extraction of entity name information.
  • The reconstruction of relation structure module aims to improve model efficiency and alignment performance by filtering out irrelevant neighborhood information for aligned entities during the data pre-processing stage.
  • The LCAT-based neighborhood aggregator module employs a simple yet effective method to extract graph relation structures for entities.
  • The contrastive learning module enables LCA-UEA to operate in an unsupervised manner, eliminating the reliance on alignment seeds.
  • The alignment with consistency similarity module proposes a novel consistency-based similarity function, which can measure the similarity of candidate entity pairs more effectively.

4.1. Textual Feature

Firstly, we construct the textual features of entities, which serve as the input for LCA-UEA training. In this work, we generate input embeddings by leveraging both entity-text information and entity-context semantics from KGs.
Entity-Text Embedding. The entity name serves as a fundamental form for entity recognition, encapsulating rich semantic information that is highly beneficial for alignment tasks. Therefore, many studies utilize pre-trained word embeddings to obtain entity name representations. Some works [23,40] employ Google Translate to convert non-English entity names into English counterparts. However, the entity embeddings generated by these methods are suboptimal due to translation errors and the out-of-vocabulary (OOV) problem encountered by existing pre-trained embeddings [42]. Multi-language embedding models are powerful tools for mapping text from different languages into a shared vector space. Among these models, LaBSE [43] is a language-agnostic BERT-based sentence embedding model developed by Google researchers. In this paper, we adopt LaBSE to encode entity name information. Without loss of generality, let $W(e_i) = (w_1, w_2, \ldots, w_n)$ denote the entity name of $e_i \in E$, consisting of n words or characters. We then construct its name embedding using LaBSE as follows:
$\mathbf{e}_i^n = f_{LaBSE}(W(e_i)),$
where $f_{LaBSE}$ is a LaBSE encoder used directly to initialize the embeddings without additional fine-tuning.
Entity-Context Embedding. In the relation structure of KGs, most entities have associated entities within their context, which significantly aids in determining the similarity between aligned entities. Inspired by [42], we employ a random walk strategy to generate walk paths for each entity, thereby constructing sentences that encode dependency information from the entity-context. For any entity $e_i$, a k-step random walk generates a walking path $P_{e_i} = (e_i, r_1, e_1, r_2, \ldots, r_k, e_k)$. It is important to note that these walking paths include both entity nodes and relation edges. Different KGs contain not only aligned entities but also aligned relations, and context sentences containing relation names help the model distinguish different connection types. Incorporating relation edges into the long paths of the entity-context is therefore highly beneficial for uncovering the similarity between aligned entities. Subsequently, we also use the LaBSE model to learn features from these sentences and extract entity-context semantics, defined as follows:
$\mathbf{e}_i^c = \bigoplus_{t \in [1, T]} f_{LaBSE}(P_{e_i}),$
where $T$ indicates the number of random walks, $\oplus$ denotes the superposition operation, and $\mathbf{e}_i^c$ denotes the entity-context embedding of entity $e_i$. Given the randomness inherent in random walks, the result of a single walk often deviates significantly and fails to reflect the overall pattern. Therefore, multiple random walks are performed, and their results are superimposed to reduce the impact of random fluctuations; that is, the superposition operation sums the output vectors of the multiple walks. In this paper, we perform 10 random walks of length 5 for each entity, i.e., $k = 5$ and $T = 10$.
Finally, the multi-view embedding of entity $e_i \in E$ is calculated by concatenating the two kinds of embeddings:
$\mathbf{e}_i^m = \mathbf{e}_i^n \,\|\, \mathbf{e}_i^c,$
where $\|$ denotes the vector concatenation operation. The output of the textual feature layer is the feature matrix of all entities, $\mathbf{E}^m = \{\mathbf{e}_1^m, \mathbf{e}_2^m, \ldots, \mathbf{e}_n^m\}$, where $n$ represents the number of entities in $G$.
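As an illustration of how the two textual views could be produced in practice, the sketch below uses the public sentence-transformers release of LaBSE; the helper names, the uniform random-walk sampler, and the use of summation for the superposition ⊕ are our assumptions rather than the authors' released code.

```python
# Illustrative sketch only: function names and the LaBSE checkpoint choice are assumptions.
import random
import numpy as np
from sentence_transformers import SentenceTransformer

labse = SentenceTransformer("sentence-transformers/LaBSE")

def entity_text_embeddings(names):
    """e_i^n = f_LaBSE(W(e_i)) for a list of entity names."""
    return labse.encode(names, convert_to_numpy=True, normalize_embeddings=True)

def random_walk_sentence(adj, start, k=5):
    """One k-step walk (e_i, r_1, e_1, ..., r_k, e_k) rendered as a sentence.
    adj maps an entity name to a list of (relation name, neighbour name) pairs."""
    path, node = [start], start
    for _ in range(k):
        if not adj.get(node):
            break
        rel, nxt = random.choice(adj[node])
        path.extend([rel, nxt])
        node = nxt
    return " ".join(path)

def entity_context_embedding(adj, entity, T=10, k=5):
    """e_i^c: superpose (here, sum) the LaBSE embeddings of T independent walk sentences."""
    sentences = [random_walk_sentence(adj, entity, k) for _ in range(T)]
    return labse.encode(sentences, convert_to_numpy=True).sum(axis=0)

def multi_view_embedding(name_vec, context_vec):
    """e_i^m = e_i^n || e_i^c (vector concatenation)."""
    return np.concatenate([name_vec, context_vec])
```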

4.2. Reconstruction of Relation Structure

Most related works input the entire relation structure of KGs into convolutional networks. However, due to the diverse data sources of real-world KGs, the structures of different KGs are inherently heterogeneous. In other words, the neighborhoods of aligned entities are not fully matched in most cases. Some works address this issue by using random sampling to reconstruct the relation structure, which involves randomly selecting neighbor nodes of an entity. However, this method is prone to losing important neighbors. Following [8], we first generate matching relations using pseudo-labels, which are constructed through name embeddings. Then we propose a more effective method for restructuring the relation structure, which filters entity neighbors based on generated matching relations.
Pseudo-Labels. The relation matching method proposed in [8] relies on alignment seeds. However, in this paper, we focus on unsupervised learning, where alignment seeds are not available. To address this challenge, we generate pseudo-labels for reliable alignment pairs using the entity-text embeddings constructed in the previous section. A simple method is to calculate the embedding distance for each entity pair and identify alignment pairs whose distances are smaller than a predefined threshold. However, this simplistic method may introduce errors.
To enhance the accuracy of the labeling results, we adopt bidirectional one-to-one alignment. Specifically, we first define a function for screening similar entity pairs:
$D(e_j^2, E_1) = \{\, e_i^1 \mid e_i^1 \in E_1,\; d(e_i^1, e_j^2) > \gamma_{sim} \,\},$
where $d(e_i^1, e_j^2) = \mathrm{cosine}(\mathbf{e}_i^1, \mathbf{e}_j^2)$ is the cosine similarity function, with a larger value indicating a higher similarity between the entity pair; $\gamma_{sim}$ is a pre-defined similarity threshold used to retain more reliable alignment pairs. Clearly, $D(e_j^2, E_1)$ is a filtered candidate set, each element of which has a similarity to $e_j^2$ exceeding $\gamma_{sim}$. Subsequently, we identify the set of pseudo-labels through one-to-one alignment:
$PL = \{\, (e_i^1, e_j^2) \mid e_i^1 = \arg\max_{e_p^1 \in D(e_j^2, E_1)} d(e_p^1, e_j^2),\; e_j^2 = \arg\max_{e_q^2 \in D(e_i^1, E_2)} d(e_i^1, e_q^2) \,\}.$
The key point of this operation is to select the most similar, mutually best-matching pseudo-aligned entity pairs from the filtered candidate sets.
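To make the bidirectional one-to-one filtering concrete, the following is a minimal sketch of the pseudo-label step, assuming L2-normalized textual embeddings so that a matrix product yields cosine similarities; the function and variable names are ours, not those of the released implementation.

```python
import torch

def generate_pseudo_labels(E1_emb, E2_emb, gamma_sim=0.8):
    """Pseudo-labels PL: mutual nearest neighbours whose cosine similarity exceeds gamma_sim.
    E1_emb, E2_emb: L2-normalized textual embeddings of the entities in KG1 and KG2."""
    sim = E1_emb @ E2_emb.T                 # d(e_i^1, e_j^2) for every candidate pair
    best_for_1 = sim.argmax(dim=1)          # most similar e_j^2 for each e_i^1
    best_for_2 = sim.argmax(dim=0)          # most similar e_i^1 for each e_j^2
    pl = []
    for i, j in enumerate(best_for_1.tolist()):
        if best_for_2[j].item() == i and sim[i, j].item() > gamma_sim:
            pl.append((i, j))               # mutual best match above the threshold
    return pl
```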
Relation Matching. In this step, we establish similarity relations between the neighbors of each pseudo-label. For each $(e_i^1, e_j^2) \in PL$, we first compute the similarities between each pair of their respective neighbors. Next, we sort the pairs of neighbors with a similarity exceeding a predefined threshold, $\tau_{sim}$, in descending order and conduct one-to-one matching of the neighbors:
$\mathrm{match}(e_i^1, e_j^2) = \{\, (e_{i'}^1, e_{j'}^2) \mid e_{i'}^1 = \arg\max_{e_p^1 \in D(e_{j'}^2, N(e_i^1))} d(e_p^1, e_{j'}^2),\; e_{j'}^2 = \arg\max_{e_q^2 \in D(e_{i'}^1, N(e_j^2))} d(e_{i'}^1, e_q^2) \,\},$
where $N(\cdot)$ denotes the neighbors of an entity. Next, we derive the matching relations based on the following corresponding relationships:
$e_{i'}^1 \Rightarrow e_{j'}^2 \,\wedge\, (e_i^1, r_k^1, e_{i'}^1) \,\wedge\, (e_j^2, r_q^2, e_{j'}^2) \;\Longrightarrow\; r_k^1 \Rightarrow r_q^2,$
where $\Rightarrow$ indicates the matching relationship between two entities or two relations, and $(e_i^1, r_k^1, e_{i'}^1)$ is a relation triple connecting $e_i^1$ and its neighbor $e_{i'}^1$.
After that, we obtain all matching relations:
$R_{align} = \{\, (r_k^1, r_q^2) \mid \mathrm{counter}(r_k^1 \Rightarrow r_q^2) > \gamma_r \,\},$
where $\mathrm{counter}(\cdot)$ returns the number of matches for a pair of relations, and $\gamma_r$ is a counting threshold. To minimize the impact of matching errors, we select only those matching relations whose counts exceed the threshold $\gamma_r$.
Triples Reconstruction. Finally, we use the matching relations to filter the triples in KGs, thereby achieving the reconstruction of the relation structure:
$T_{new} = \{\, (e_i, r_k, e_j) \mid (e_i, r_k, e_j) \in T,\; r_k \in R_{align} \,\}.$
Algorithm 1 presents the procedure of this algorithm. It is important to emphasize that this algorithm serves only as a pre-processing step, meaning that it needs to be executed only once prior to model training. Thus, incorporating this module does not increase training complexity; on the contrary, it reduces training complexity by decreasing the number of relation triples fed into the EA model.

4.3. LCAT-Based Neighborhood Aggregator

The LCAT-based neighborhood aggregator is designed to update entity embeddings by performing message passing, leveraging the relation information from the KGs. This aggregator aggregates neighborhood information to the central node entity, which plays a crucial role in obtaining useful information for EA. As shown in Figure 2, LCA-UEA first applies different graph data augmentation techniques to construct two graph structures of KGs (this implementation will be elaborated on in the next subsection) and then utilizes LCAT as the backbone network to model these graph structures.
Algorithm 1 Procedure of reconstruction of relation structure.
Input: $G_1 = (E_1, R_1, T_1)$, $G_2 = (E_2, R_2, T_2)$, textual features $\mathbf{E}^m$.
Output: new relation triples $T_{new}$.
1: Set $S, PL, R_{align}, T_{new} \leftarrow \emptyset$;
▹ Generate pseudo-labels
2: for each $e_i^1 \in E_1$ and $e_j^2 \in E_2$ do
3:   if $d(e_i^1, e_j^2) > \gamma_{sim}$ then
4:     Expand $S \leftarrow S \cup \{(e_i^1, e_j^2)\}$;
5: for each $(e_i^1, e_j^2) \in S$ do
6:   if $e_i^1 = \arg\max_{e_p^1 \in D(e_j^2, E_1)} d(e_p^1, e_j^2)$ and $e_j^2 = \arg\max_{e_q^2 \in D(e_i^1, E_2)} d(e_i^1, e_q^2)$ then
7:     Expand $PL \leftarrow PL \cup \{(e_i^1, e_j^2)\}$;
▹ Generate matching relations
8: for each $(e_i^1, e_j^2) \in PL$ and $(e_{i'}^1, e_{j'}^2) \in \mathrm{match}(e_i^1, e_j^2)$ do
9:   for each $(e_i^1, r_k^1, e_{i'}^1) \in T_1$ and $(e_j^2, r_q^2, e_{j'}^2) \in T_2$ do
10:    $\mathrm{counter}(r_k^1 \Rightarrow r_q^2)$++;
11: for each $(r_k^1, r_q^2) \in \mathrm{counter}(\cdot)$ do
12:   if $\mathrm{counter}(r_k^1 \Rightarrow r_q^2) > \gamma_r$ then
13:     Expand $R_{align} \leftarrow R_{align} \cup \{(r_k^1, r_q^2)\}$;
▹ Generate new relation structure
14: for each $(e_i, r_k, e_{i'}) \in T_1 \cup T_2$ do
15:   if $r_k \in R_{align}$ then
16:     Expand $T_{new} \leftarrow T_{new} \cup \{(e_i, r_k, e_{i'})\}$;
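For readers who prefer plain code over pseudocode, the following is a compact Python rendering of the matching-relation and triple-filtering steps of Algorithm 1, reusing the pseudo-labels from the earlier sketch; the greedy mutual-exclusion matching, the head-to-tail direction of the triples, and all function names are simplifying assumptions of ours.

```python
from collections import Counter

def reconstruct_triples(T1, T2, pseudo_labels, name_sim, neighbors1, neighbors2,
                        tau_sim=0.8, gamma_r=5):
    """Sketch of steps 8-16 of Algorithm 1.
    T1, T2: sets of (head, relation, tail) triples; neighbors1/neighbors2: entity -> neighbour list;
    name_sim(e1, e2): similarity of two entities' textual embeddings."""
    counter = Counter()
    for e1, e2 in pseudo_labels:
        # greedy one-to-one matching of neighbours whose similarity exceeds tau_sim
        pairs = sorted(((name_sim(n1, n2), n1, n2)
                        for n1 in neighbors1.get(e1, []) for n2 in neighbors2.get(e2, [])
                        if name_sim(n1, n2) > tau_sim), reverse=True)
        used1, used2 = set(), set()
        for _, n1, n2 in pairs:
            if n1 in used1 or n2 in used2:
                continue
            used1.add(n1); used2.add(n2)
            # relations linking (e1, n1) in T1 and (e2, n2) in T2 become matching candidates
            for r1 in {r for (h, r, t) in T1 if h == e1 and t == n1}:
                for r2 in {r for (h, r, t) in T2 if h == e2 and t == n2}:
                    counter[(r1, r2)] += 1
    # keep relation pairs counted more than gamma_r times, then filter the triples
    r_align = {r for pair, c in counter.items() if c > gamma_r for r in pair}
    return [(h, r, t) for (h, r, t) in T1 | T2 if r in r_align]
```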
It is well known that a message-passing GNN layer generates node embeddings by collecting and aggregating information from its neighbors. This operation can be formally defined as follows:
$\tilde{\mathbf{e}}_i = \sum_{j \in N_i^*} \alpha_{ij} \mathbf{W}_1 \mathbf{e}_j,$
where $\mathbf{W}_1$ is a learnable matrix, $N_i^*$ denotes the set of neighbors of node $i$ (including node $i$ itself), and $\alpha_{ij} \in [0, 1]$ is a coefficient such that $\sum_j \alpha_{ij} = 1$. Different GNN styles are determined by the computation of $\alpha_{ij}$.
For example, GCN computes the average of messages by assigning the same coefficient, $\alpha_{ij} = 1 / |N_i^*|$, to each neighbor. In contrast to assigning a fixed coefficient, GAT calculates the attention coefficient of each neighbor as follows:
$\alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(\psi(\mathbf{e}_i, \mathbf{e}_j)))}{\sum_{k \in N_i^*} \exp(\mathrm{LeakyReLU}(\psi(\mathbf{e}_i, \mathbf{e}_k)))},$
where $\psi(\mathbf{e}_i, \mathbf{e}_j) = \mathbf{a}^{T} [\mathbf{W}_1 \mathbf{e}_i \,\|\, \mathbf{W}_1 \mathbf{e}_j]$ and $\mathbf{a} \in \mathbb{R}^{2d}$ is the weight vector. As discussed in the Introduction, GCN and GAT have been widely used to obtain entity embeddings in many prior works. However, as pointed out in [44], these architectures exhibit certain limitations because their performance is highly data-dependent.
Therefore, we introduce the learnable convolutional attention network (LCAT) [15] to enhance the learning of embeddings for aligned entities. LCAT learns proper operations to apply in each layer, thereby integrating different layer types within the same GNN architecture. Relevant experiments also demonstrate that LCAT surpasses existing benchmark GNNs in terms of performance, network initialization, and robustness to input noise.
To exploit the advantages of both convolution and attention in the design of GNN architecture, LCAT extends the existing attention layer by introducing two learnable parameters to interpolate between GCN and GAT. This can be formulated as the following attention layer scores:
$\tilde{\mathbf{e}}_i = \frac{\mathbf{e}_i + \lambda_1 \sum_{k \in N_i^*} \mathbf{e}_k}{1 + \lambda_1 |N_i^*|}, \qquad \psi(\mathbf{e}_i, \mathbf{e}_j) = \lambda_2 \cdot \left( \mathbf{a}^{T} [\mathbf{W}_2 \tilde{\mathbf{e}}_i \,\|\, \mathbf{W}_2 \tilde{\mathbf{e}}_j] \right),$
where $\lambda_1, \lambda_2 \in [0, 1]$ are the introduced learnable parameters. Here, $\lambda_1$ interpolates the addition of a mean-neighbor vector, while $\lambda_2$ interpolates between attention and no attention. This formulation therefore enables LCAT to interpolate between GCN (when $\lambda_2 = 0$) and GAT (when $\lambda_1 = 0$, $\lambda_2 = 1$). Consequently, LCAT not only switches between existing layer types but also learns the degree of attention required for each neighbor.
In this paper, we adapt LCAT appropriately to make it more suitable for EA tasks. First, we employ a fully connected layer to model the textual features of entities, thereby enhancing the training of the model. Specifically, let $\mathbf{E} = \mathrm{MLP}(\mathbf{E}^m)$ serve as the input to the LCAT layer. By combining Equations (10)–(12), the output of the LCAT layer can be obtained, denoted as $\tilde{\mathbf{E}}$. Secondly, to capture the alignment similarity in both textual features and neighborhood features, we introduce another learnable parameter to interpolate between the two, as follows:
$\hat{\mathbf{E}} = \lambda_3 \cdot \tilde{\mathbf{E}} + (1 - \lambda_3) \cdot \mathrm{MLP}(\mathbf{E}^m),$
where $\lambda_3 \in [0, 1]$ is the introduced learnable parameter, and $\hat{\mathbf{E}}$ is the final output of our LCAT layer. Figure 3 provides an intuitive illustration of the above three learnable parameters.
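To make the layer concrete, the following is a minimal single-head PyTorch sketch of the interpolating layer described above; it follows the published LCAT formulation [15] plus the $\lambda_3$ mixing step, but the dense adjacency, the sigmoid re-parameterization of the learnable scalars, and the absence of multi-head attention are simplifications of ours.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LCATLayer(nn.Module):
    """Single-head sketch: lambda1 mixes in mean-neighbour features before attention,
    lambda2 scales the attention scores (0 -> GCN-like uniform weights, 1 -> GAT-like
    attention), and lambda3 mixes the aggregated output with the textual input features."""
    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=False)
        self.W2 = nn.Linear(dim, dim, bias=False)
        self.a = nn.Parameter(torch.empty(2 * dim))
        nn.init.normal_(self.a, std=0.01)
        # unconstrained scalars squashed into [0, 1] with a sigmoid
        self.l1 = nn.Parameter(torch.zeros(()))
        self.l2 = nn.Parameter(torch.zeros(()))
        self.l3 = nn.Parameter(torch.zeros(()))

    def forward(self, x, adj):
        # x: [n, d] node features (e.g., MLP(E^m)); adj: [n, n] dense 0/1 adjacency with self-loops
        lam1, lam2, lam3 = torch.sigmoid(self.l1), torch.sigmoid(self.l2), torch.sigmoid(self.l3)
        deg = adj.sum(dim=1, keepdim=True)                     # |N_i*|
        conv = (x + lam1 * (adj @ x)) / (1.0 + lam1 * deg)     # convolve before attending
        h = self.W2(conv)
        d = h.size(1)
        src = h @ self.a[:d]                                   # contribution of e_i to psi
        dst = h @ self.a[d:]                                   # contribution of e_j to psi
        scores = F.leaky_relu(lam2 * (src.unsqueeze(1) + dst.unsqueeze(0)))
        scores = scores.masked_fill(adj == 0, float("-inf"))
        alpha = torch.softmax(scores, dim=1)                   # attention coefficients alpha_ij
        out = alpha @ self.W1(x)                               # neighbourhood aggregation
        return lam3 * out + (1.0 - lam3) * x                   # lambda3 mixing with the input
```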

4.4. Contrastive Learning

Contrastive learning enables the model to perceive structural differences by generating two distinct graph views without relying on labeled training data, thereby maximizing the consistency between the original KG and the augmented KG. Learning to distinguish positive and negative samples from two distributions is the key idea in contrastive learning. Inspired by the recent successful applications of unsupervised learning in EA tasks [16], we adhere to the common paradigm of graph contrastive learning, which seeks to maximize the consistency of representations across different views.
Momentum update. Specifically, we establish two encoders, the query encoder and the key encoder, each of which contains an LCAT network layer. The query encoder updates its parameters $\theta_u$ through gradient backpropagation in each training iteration. Meanwhile, the key encoder, which has the same structure as the query encoder, adopts a momentum update mechanism in each training iteration to maintain the consistency of negative samples. The parameters of the key encoder, namely $\theta_v$, are updated as follows:
$\theta_v = m \times \theta_v + (1 - m) \times \theta_u,$
where $m \in [0, 1)$ is the momentum hyper-parameter.
Contrastive learning. Data augmentation in graph contrastive learning offers a complementary perspective by applying transformation operations (such as masking, denoising, and deletion) to the input graph. In this work, we implement graph data augmentation by masking some neighbors, which is one of the simplest methods. Specifically, we employ random graph augmentation on the relation structure using two distinct perturbation ratios, $\gamma_1$ and $\gamma_2$, which results in two complementary views of the KGs. Let $\tilde{\mathbf{E}}^u$ and $\tilde{\mathbf{E}}^v$ denote the output embedding matrices of the query encoder and the key encoder, respectively. We adopt the InfoNCE loss function [16] to train the model, which aims to ensure that the node embeddings of each entity in the two views are consistent with each other while being distinguishable from the embeddings of other entities:
$\mathcal{L}_{InfoNCE} = -\sum_{e_i \in E} \log \frac{s(\tilde{\mathbf{e}}_i^u, \tilde{\mathbf{e}}_i^v)}{s(\tilde{\mathbf{e}}_i^u, \tilde{\mathbf{e}}_i^v) + \sum_{k \neq i} s(\tilde{\mathbf{e}}_i^u, \tilde{\mathbf{e}}_k^v) + \sum_{k \neq i} s(\tilde{\mathbf{e}}_i^u, \tilde{\mathbf{e}}_k^u)},$
where $s(\tilde{\mathbf{e}}_i^u, \tilde{\mathbf{e}}_i^v) = \exp(\tilde{\mathbf{e}}_i^u \cdot \tilde{\mathbf{e}}_i^v / \tau)$ and $\tau$ is a temperature hyper-parameter. Under the guidance of this loss function, the model is optimized via backpropagation to learn entity embeddings.
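The following is a brief sketch of the momentum update and the InfoNCE objective as described above, assuming L2-normalized view embeddings and including both cross-view and same-view negatives; the names and the reduction to a sum are our choices.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(key_encoder, query_encoder, m=0.999):
    """theta_v <- m * theta_v + (1 - m) * theta_u."""
    for p_v, p_u in zip(key_encoder.parameters(), query_encoder.parameters()):
        p_v.data.mul_(m).add_(p_u.data, alpha=1.0 - m)

def info_nce_loss(z_u, z_v, tau=0.08):
    """Each entity's query-view embedding should match its own key-view embedding
    while being pushed away from all other entities in both views."""
    z_u, z_v = F.normalize(z_u, dim=1), F.normalize(z_v, dim=1)
    cross = torch.exp(z_u @ z_v.T / tau)           # s(e_i^u, e_k^v) for all i, k
    intra = torch.exp(z_u @ z_u.T / tau)           # s(e_i^u, e_k^u) for all i, k
    pos = cross.diag()                             # s(e_i^u, e_i^v)
    neg_cross = cross.sum(dim=1) - pos             # cross-view negatives, k != i
    neg_intra = intra.sum(dim=1) - intra.diag()    # same-view negatives, k != i
    return (-torch.log(pos / (pos + neg_cross + neg_intra))).sum()
```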

4.5. Alignment with Consistency Similarity

After obtaining the final entity embeddings, we measure the similarities of candidate entity pairs. In real KGs, most entities have rather sparse neighborhood structures, while only a few entities are densely connected to others. As a result, the entity degrees in real KGs follow a long-tailed distribution.
Most works use conventional similarity functions (e.g., cosine, Manhattan, Euclidean) to compute entity pair similarity, considering only the differences based on their own features. This results in two issues: first, the correlations between an entity and others are ignored; second, many one-to-many alignments appear in the results. To eliminate the impact of these discrepancies on the alignment, we reconstruct a similarity function based on consistency by introducing two local maxima (i.e., row maxima and column maxima of the similarity matrix) after the dot product operation, as implemented below:
$s(e_s, e_t) = \left( \max_{e_i \in E_2} \mathbf{e}_s \cdot \mathbf{e}_i + \max_{e_j \in E_1} \mathbf{e}_t \cdot \mathbf{e}_j \right) / 2 - \mathbf{e}_s \cdot \mathbf{e}_t.$
Here, $s(\cdot, \cdot)$ is essentially a distance (or negative similarity) measure; thus, a smaller $s(e_s, e_t)$ means that $e_s$ and $e_t$ are each other's top choices to a greater extent. This formulation effectively downranks scenarios in which, for example, $e_s$ is more similar to some other node than to $e_t$, or vice versa, thereby mitigating the occurrence of one-to-many mappings. The final experiment also demonstrates that this function eliminates the effects of the above differences and measures the similarity of entity pairs more effectively.
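A vectorized sketch of this consistency-based score over the full candidate matrix is shown below; the function name and the use of dense tensors are ours.

```python
import torch

def consistency_score(E1_emb, E2_emb):
    """s(e_s, e_t) = (max_{e_i in E2} e_s·e_i + max_{e_j in E1} e_t·e_j) / 2 - e_s·e_t.
    Returns an [|E1|, |E2|] matrix; SMALLER values indicate better-aligned pairs."""
    dot = E1_emb @ E2_emb.T                          # e_s · e_t for every candidate pair
    row_max = dot.max(dim=1, keepdim=True).values    # max over e_i in E2 for each e_s
    col_max = dot.max(dim=0, keepdim=True).values    # max over e_j in E1 for each e_t
    return (row_max + col_max) / 2.0 - dot

# Usage: the predicted counterpart of e_s is the argmin over the s-th row.
# scores = consistency_score(E1_emb, E2_emb); preds = scores.argmin(dim=1)
```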

5. Experiments

5.1. Experiment Settings

Datasets. To fairly and comprehensively evaluate the performance of LCA-UEA, we conducted experiments on three extensive benchmark datasets, including two 15K standard datasets and a 100K large-scale dataset. The detailed statistics of the datasets are listed in Table 2.
  • DBP-15K [18] is one of the most widely used datasets in the literature. It consists of three cross-lingual subsets derived from multi-lingual DBpedia: Chinese–English (ZH-EN$_{DBP}$), Japanese–English (JA-EN$_{DBP}$), and French–English (FR-EN$_{DBP}$). Each subset contains 15,000 aligned entity pairs but varies in the number of relation triples.
  • WK31-15K [45] is designed to evaluate model performance on sparse and dense datasets. It comprises four subsets: EN-DE$_{V1}$, EN-DE$_{V2}$, EN-FR$_{V1}$, and EN-FR$_{V2}$. The V1 subsets represent sparse graphs obtained using the IDS algorithm, while the density of the V2 subsets is approximately twice that of the corresponding V1 subsets.
  • DWY-100K [21] is a large-scale dataset suitable for evaluating the scalability of experimental models. It includes two monolingual KGs: DBpedia–Wikidata (DBP-WD) and DBpedia–YAGO3 (DBP-YG). Each KG contains 100,000 aligned entity pairs and nearly one million triples.
Evaluation Metrics. During the experimental evaluation, we used the similarity function defined in Equation (16) to rank candidate alignment pairs and adopted the following two standard evaluation metrics: Hits@k represents the proportion of correctly aligned pairs ranked among the top k candidates; the MRR (Mean Reciprocal Rank) is the average of the reciprocal ranks of the correct alignments. It is worth noting that higher Hits@k and MRR scores indicate better EA performance.
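For clarity, the sketch below shows how Hits@k and the MRR can be computed from the score matrix of Equation (16), where smaller scores indicate better candidates; the function name and tensor layout are illustrative assumptions.

```python
import torch

def evaluate(scores, gold):
    """scores: [n1, n2] matrix where smaller means more similar (Equation (16));
    gold: length-n1 tensor, gold[i] = index of the KG2 entity aligned with KG1 entity i."""
    order = scores.argsort(dim=1)                              # ascending: best candidate first
    ranks = (order == gold.unsqueeze(1)).nonzero()[:, 1] + 1   # 1-based rank of the gold entity
    hits = {k: (ranks <= k).float().mean().item() for k in (1, 10)}
    mrr = (1.0 / ranks.float()).mean().item()
    return hits, mrr
```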
Implementation Settings. We followed the original data splits provided for DBP-15K [18] and WK31-15K [45]. For the unsupervised model, we allocated 10% of the link pairs as the validation set and 70% as the test set. The dimension of input embeddings, batch size, number of epochs, momentum ($m$), and temperature ($\tau$) were set to 768, 1024, 800, 0.999, and 0.08, respectively. For the other hyper-parameters, we used the following configuration: $\gamma_{sim} = 0.8$, $\gamma_r = 5$, $\gamma_1 = 0.2$, and $\gamma_2 = 0.3$. Our proposed method was implemented with the Adam optimizer in the PyTorch 1.12.1 framework, and experiments were conducted on a workstation equipped with an NVIDIA A5000 GPU located at South China Normal University in Guangzhou. The source code was made available in January 2025 at https://github.com/cwswork/SLU.
Baselines. To evaluate LCA-UEA, we compared it with the following three types of state-of-the-art EA methods, including both supervised and unsupervised methods.
  • Supervised methods with pure relation structures: These methods are based on the original relation structures (i.e., triples): MTransE [17], BootEA [21], RDGCN [23], AliNet [24], RPR-RHGT [8], STEA [46], EMEA [47], PEEA [10], RANM [9], and KAGNN [25].
  • Supervised methods with auxiliary information: These methods are based on both relation structure and some auxiliary information (e.g., attribute information, images), where JAPE [18], GCN-Align [48], MRAEA [27], AttrGNN [12], MHNA [13], and SDEA [33] use attribute or descriptive information and MMEA-cat [49] and GEEA [31] use image information.
  • Unsupervised methods: These methods do not use training data, but some of them use some auxiliary information, including attribute information (MultiKE [20], AttrE [19], ICLEA [36], UDCEA [41]), descriptive information (ICLEA [36]), and images (EVA [37]). SEU [40] and SelfKG [39] only use the original relation structures.
As a reminder, our method relies solely on the structural information of KGs. For a relatively fair comparison, we replicated the UDCEA model by removing its attribute information. A similar case is the ICLEA method [36], which also incorporates description information.

5.2. Overall Results on DBP-15K and WK31-15K

In Table 3 and Table 4, we report the performance of LCA-UEA and the baselines on DBP-15K and WK31-15K. We categorize the baseline models into three groups, using horizontal lines for segmentation: supervised methods with pure relation structures, supervised methods with auxiliary information, and unsupervised methods.
Comparison with supervised methods using pure relation structures. Our proposed method was first compared with 10 supervised, relation-based methods, and it consistently achieved the best performance on all datasets except JA-EN$_{DBP}$. Specifically, compared with the second-best method, RANM [9], our method improved Hits@1 by 6.4% on FR-EN$_{DBP}$ and by 6.1% on EN-FR$_{V1}$. Since RANM considers the heterogeneous information of KGs and PEEA focuses on searching for one-to-one alignments, their training or alignment efficiency is lower than that of LCA-UEA. This shows that LCA-UEA remains highly competitive among this type of method, despite not performing as well on the JA-EN$_{DBP}$ dataset. In addition, LCA-UEA and some other unsupervised methods do not rely on labeled input data but still outperform these supervised methods. One of the main reasons is that the contrastive learning mechanism provides more positive samples (each entity serves as its own positive sample), and the InfoNCE loss function effectively extracts cross-KG entity information. In summary, LCA-UEA breaks the upper performance limit of relation-based EA methods and validates the effectiveness of its design.
Comparison with supervised methods using auxiliary information. Among the nine supervised methods with auxiliary information, the best performer was SDEA [33], which significantly outperformed our method. We attribute this to its effective integration of various types of information, particularly the BERT-based [50] attribute and description information of entities. Neither MMEA-cat [49] nor GEEA [31] considers entity name information, but both incorporate image information, and GEEA also considers attribute information. As shown in the experimental results, both methods exhibited lower performance compared to the other baselines. This indicates that modeling entity names using pre-trained language models is more effective than relying on image-based methods. In conclusion, LCA-UEA still demonstrates better effectiveness and robustness due to its simpler model architecture and reduced reliance on input information.
Comparison with unsupervised methods. From the experimental results alone, LCA-UEA achieved optimal performance on most datasets compared to the six unsupervised methods, but the performance improvement was not significant. However, we further observe the following: (1) The two translation-based methods, MultiKE [20] and AttrE [19], both consider attribute information but achieved only moderate performance. This indicates that translation-based models are less effective than GNNs for EA tasks. (2) The relation-based methods SEU [40] and SelfKG [39] both achieved good results. However, SEU's Sinkhorn operation requires $O(n^2)$ complexity, while SelfKG's self-negative sampling strategy requires maintaining two negative-sample queues. Compared to these two methods, LCA-UEA demonstrates better performance and modeling efficiency. (3) EVA [37] is a relatively early unsupervised model incorporating image information. Its performance was mediocre, and the difficulty of obtaining image information limits its applicability in real-world scenarios. (4) ICLEA [36] and UDCEA [41] consider both relation and attribute structures. ICLEA's encoder is a multi-head GAT model, and UDCEA's encoder is a multilingual Sentence-BERT. Since LCA-UEA's encoder consists of a single LCAT layer, its number of trainable parameters is lower than that of the previous two methods. In summary, our LCA-UEA exhibits certain advantages over other unsupervised methods.

5.3. Overall Results on DWY100K

To verify the effectiveness of our method on a large-scale dataset, we report an end-to-end comparison of LCA-UEA with 16 baselines on the DWY100K dataset. As shown in Table 5, first, LCA-UEA outperformed all other unsupervised methods and achieved the best performance across all metrics. Second, the Hits@1 scores of supervised methods on DBP-WD reached up to 99.3%, which is only 1% higher than LCA-UEA's 98.3%. However, our LCA-UEA method still significantly outperformed all supervised methods on DBP-YG. Finally, while most baselines exhibited commendable performance, LCA-UEA's Hits@1 on DBP-YG notably reached 100.0%. This indicates that the monolingual setting effectively alleviates name bias and enhances the recognition of aligned entities. Overall, given that DWY-100K is several times larger than WK31-15K and DBP-15K, this experiment demonstrated the excellent scalability and superiority of our method on larger real-world and monolingual KGs.

5.4. Ablation Experiments

In the previous section, we demonstrated the overall success of LCA-UEA. To validate the effectiveness of each component design in LCA-UEA, we conducted ablation studies using five variants of LCA-UEA on DBP-15K, and the results are shown in Table 6.
  • w/o ECE+RRS: The modules for entity-context embedding and reconstruction of relation structure were removed;
  • w/o ECE: The module for entity-context embedding was removed;
  • w Cosine: The similarity function based on consistency was replaced with the cosine function;
  • w GAT: The LCAT model was replaced with a simple GAT model;
  • w GCN+GAT: The LCAT model was replaced with a stacked network of GCN+GAT.
As illustrated in Table 6, LCA-UEA achieved the best performance across most metrics and datasets. First, it can be observed that the Hits@1 of w/o ECE and w/o ECE+RRS degraded by 0.2–1.9% and 0.3–2.1%, respectively. These results confirm the effectiveness of the entity-context embedding, as it extracts additional information beyond the individual entity names. Although the effect of relation structure reconstruction was not significant, its key role is to reduce the training workload by removing triples that have no effect on alignment before model training. Second, we tested the performance of LCA-UEA without the new similarity function, i.e., relying on the standard cosine function to calculate entity similarity, as is the case for most baselines. The results demonstrate that the consistency-based similarity function was significantly effective, bringing an absolute improvement of 2.2–4.5% in Hits@1. Third, to analyze the effect of the LCAT model, we compared the performance of w GAT, w GCN+GAT, and LCA-UEA. The LCAT model effectively captured rich and subtle alignment information for the EA task.

5.5. Additional Analysis

In this section, we first investigate the stability of EA methods on datasets with varying densities. We then conduct a sensitivity analysis on the perturbation ratio $\gamma_1$ used for graph data augmentation, as well as the temperature $\tau$ and momentum coefficient $m$, to evaluate their impact on the robustness of LCA-UEA.
Effect of dataset sparsity. The WK31-15K dataset contains four subsets with varying densities, where V1 represents a sparse dataset and V2 represents a dense one. Intuitively, EA methods tend to perform better on dense KGs because their entities possess richer neighborhood information. From Table 4, most supervised methods aligned with this intuition; their performance on V2 was significantly better than that on V1, particularly for AliNet, PEEA, and MRAEA. However, this trend did not hold for most unsupervised methods, as the performance difference between the V1 and V2 datasets was relatively small. This indicates that supervised methods, guided by alignment seeds, can train models to more effectively capture the similarity of aligned entities within neighborhoods. Unsupervised methods, in contrast, rely more heavily on entity-level information (e.g., entity names) during model training to infer alignments and thus do not demonstrate significant improvements in capturing neighborhood features. Therefore, enhancing the ability of GNN-based models to acquire neighborhood features under the unsupervised learning framework remains one of our key research directions.
Impact of the perturbation ratio $\gamma_1$. In graph data augmentation experiments, it is common to vary the perturbation ratio in one view while keeping the other fixed. We evaluated six perturbation ratios from 0.0 to 0.5, where a ratio of zero means that no perturbation is applied to that view. Clearly, higher ratios introduce more noise into the graph data. As shown in Figure 4a, LCA-UEA's performance remained stable as the perturbation increased. This shows that LCA-UEA is robust to such noise and that moderate noise can even improve performance.
Impact of the temperature $\tau$ and momentum coefficient $m$. Temperature and momentum are standard hyper-parameters in contrastive learning. The temperature $\tau$ controls the focus on difficult samples, while the momentum coefficient $m$ stabilizes model updates [36]. We selected two sets of candidate values based on prior work and show the results in Figure 4b–d. The results indicate that larger values of $m$ lead to more stable performance, and $\tau = 0.08$ achieves a better balance, consistent with most studies. Overall, these experiments show that LCA-UEA is relatively insensitive to these hyper-parameters, maintaining robustness during tuning.

6. Conclusions

Recent EA methods include both supervised and unsupervised approaches. However, these methods often face two main challenges: balancing effectiveness and efficiency, and managing increasing model complexity. To address these issues, we propose LCA-UEA, a novel unsupervised EA method that integrates five modules to improve alignment accuracy. We conducted extensive experiments on three diverse datasets to evaluate LCA-UEA. The results show that LCA-UEA outperforms several state-of-the-art supervised and unsupervised methods. We further analyze each module and find that our consistency-based similarity function significantly improves alignment performance. Moreover, experiments show that our redesigned relation structure module reduces complexity while improving performance. Additionally, we observe that GNN-based models in LCA-UEA do not significantly outperform those in some supervised methods. Therefore, our future work will focus on enhancing the ability of GNNs to capture neighborhood features in unsupervised settings.

Author Contributions

Conceptualization, W.C.; methodology, W.C.; validation, W.C. and W.M.; formal analysis, W.C.; investigation, W.C. and W.M.; data curation, W.C.; writing—original draft preparation, W.C.; writing—review and editing, W.C. and W.M.; supervision, W.M.; project administration, W.C.; funding acquisition, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 62306079), the Education Science Planning Project of Guangdong Province (CN) (Specialized higher education) (Grant No. 2023GXJK422), and the Scientific Research Innovation Team Project of Guangdong University of Education (Grant No. 2024KYCXTD015).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhao, J.; Zhang, P.; Wang, Y.; Xin, R.; Lu, X.; Li, R.; Lyu, S.; Ou, Z.; Song, M. TQAgent: Enhancing Table-Based Question Answering with Knowledge Graphs and Tree-Structured Reasoning. Appl. Sci. 2025, 15, 3788. [Google Scholar] [CrossRef]
  2. Chen, F.; Yin, G.; Dong, Y.; Li, G.; Zhang, W. KHGCN: Knowledge-Enhanced Recommendation with Hierarchical Graph Capsule Network. Entropy 2023, 25, 697. [Google Scholar] [CrossRef] [PubMed]
  3. Xia, Y.; Luo, J.; Zhou, G.; Lan, M.; Chen, X.; Chen, J. DT4KGR: Decision Transformer for Fast and Effective Multi-Hop Reasoning over Knowledge Graphs. Inf. Process. Manag. 2024, 61, 103648. [Google Scholar] [CrossRef]
  4. Li, M.; Qiao, Y.; Lee, B. Multi-View Intrusion Detection Framework Using Deep Learning and Knowledge Graphs. Information 2025, 16, 377. [Google Scholar] [CrossRef]
  5. Bordes, A.; Usunier, N.; García-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NeurIPS), Lake Tahoe, NV, USA, 5–8 December 2013; pp. 2787–2795. [Google Scholar]
  6. Kipf, T.N.; Welling, M. Semi-supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017; pp. 1–14. [Google Scholar]
  7. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018; pp. 1–12. [Google Scholar]
  8. Cai, W.; Ma, W.; Zhan, J.; Jiang, Y. Entity Alignment with Reliable Path Reasoning and Relation-aware Heterogeneous Graph Transformer. In Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI), Vienna, Austria, 23–29 July 2022; Volume 3, pp. 1930–1937. [Google Scholar]
  9. Cai, W.; Ma, W.; Wei, L.; Jiang, Y. Semi-supervised Entity Alignment via Relation-based Adaptive Neighborhood Matching. IEEE Trans. Knowl. Data Eng. (TKDE) 2023, 35, 8545–8558. [Google Scholar] [CrossRef]
  10. Tang, W.; Su, F.; Sun, H.; Qi, Q.; Wang, J.; Tao, S.; Yang, H. Weakly Supervised Entity Alignment with Positional Inspiration. In Proceedings of the the 16th ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 814–822. [Google Scholar]
  11. Yang, H.W.; Zou, Y.; Shi, P.; Lu, W.; Lin, J.; Sun, X. Aligning Cross-Lingual Entities with Multi-Aspect Information. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4430–4440. [Google Scholar]
  12. Liu, Z.; Cao, Y.; Pan, L.; Li, J.; Liu, Z.; Chua, T.S. Exploring and Evaluating Attributes, Values, and Structures for Entity Alignment. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 6355–6364. [Google Scholar]
  13. Cai, W.; Wang, Y.; Mao, S.; Zhan, J.; Jiang, Y. Multi-heterogeneous Neighborhood-aware for Knowledge Graphs Alignment. Inf. Process. Manag. 2022, 59, 102790. [Google Scholar] [CrossRef]
  14. Zhu, B.; Bao, T.; Han, R.; Cui, H.; Han, J.; Liu, L.; Peng, T. An Effective Knowledge Graph Entity Alignment Model based on Multiple Information. Neural Netw. 2023, 162, 83–98. [Google Scholar] [CrossRef] [PubMed]
  15. Javaloy, A.; Martin, P.S.; Levi, A.; Valera, I. Learnable Graph Convolutional Attention Networks. In Proceedings of the 11th International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023; pp. 1–35. [Google Scholar]
  16. Oord, A.v.d.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
  17. Chen, M.; Tian, Y.; Yang, M.; Zaniolo, C. Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment. In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI), Melbourne, Australia, 19–25 August 2017; pp. 1511–1517. [Google Scholar]
  18. Sun, Z.; Hu, W.; Li, C. Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding. In Proceedings of the 16th International Semantic Web Conference (ISWC), Vienna, Austria, 21–25 October 2017; pp. 628–644. [Google Scholar]
  19. Trisedya, B.D.; Qi, J.; Zhang, R. Entity Alignment between Knowledge Graphs Using Attribute Embeddings. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI), Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 297–304. [Google Scholar]
  20. Zhang, Q.; Sun, Z.; Hu, W.; Chen, M.; Guo, L.; Qu, Y. Multi-view Knowledge Graph Embedding for Entity Alignment. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 5429–5435. [Google Scholar]
  21. Sun, Z.; Hu, W.; Zhang, Q.; Qu, Y. Bootstrapping Entity Alignment with Knowledge Graph Embedding. In Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018; pp. 4396–4402. [Google Scholar]
  22. Zhu, Q.; Zhou, X.; Wu, J.; Tan, J.; Guo, L. Neighborhood-Aware Attentional Representation for Multilingual Knowledge Graphs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 1943–1949. [Google Scholar]
  23. Wu, Y.; Liu, X.; Feng, Y.; Wang, Z.; Yan, R.; Zhao, D. Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macao, China, 10–16 August 2019; pp. 5278–5284. [Google Scholar]
  24. Sun, Z.; Wang, C.; Hu, W.; Chen, M.; Dai, J.; Zhang, W.; Qu, Y. Knowledge Graph Alignment Network with Gated Multi-Hop Neighborhood Aggregation. In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, 7–12 February 2020; Volume 34, pp. 222–229. [Google Scholar]
  25. Huang, Z.; Li, X.; Ye, Y.; Zhang, B.; Xu, G.; Gan, W. Multi-view Knowledge Graph Fusion via Knowledge-aware Attentional Graph Neural Network. Appl. Intell. 2023, 53, 3652–3671. [Google Scholar] [CrossRef]
  26. Mao, X.; Wang, W.; Wu, Y.; Lan, M. Boosting the Speed of Entity Alignment 10×: Dual Attention Matching Network with Normalized Hard Sample Mining. In Proceedings of the World Wide Web Conference (WWW), Ljubljana, Slovenia, 19–23 April 2021; pp. 821–832. [Google Scholar]
  27. Mao, X.; Wang, W.; Xu, H.; Lan, M.; Wu, Y. MRAEA: An Efficient and Robust Entity Alignment Approach for Cross-lingual Knowledge Graph. In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), Houston, TX, USA, 3–7 February 2020; pp. 420–428. [Google Scholar]
  28. Zeng, W.; Zhao, X.; Tang, J.; Fan, C. Reinforced Active Entity Alignment. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM), New York, NY, USA, 1–5 November 2021; pp. 2477–2486. [Google Scholar]
  29. Guo, L.; Han, Y.; Zhang, Q.; Chen, H. Deep Reinforcement Learning for Entity Alignment. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; pp. 2754–2765. [Google Scholar]
  30. Qian, Y.; Pan, L. Variety-Aware GAN and Online Learning Augmented Self-Training Model for Knowledge Graph Entity Alignment. Inf. Process. Manag. 2023, 60, 103472. [Google Scholar] [CrossRef]
  31. Guo, L.; Chen, Z.; Chen, J.; Fang, Y.; Zhang, W.; Chen, H. Revisit and Outstrip Entity Alignment: A Perspective of Generative Models. In Proceedings of the 12th International Conference on Learning Representations (ICLR), Vienna, Austria, 7–11 May 2024; pp. 1–18. [Google Scholar]
  32. Sun, Z.; Hu, W.; Wang, C.; Wang, Y.; Qu, Y. Revisiting Embedding-based Entity Alignment: A Robust and Adaptive Method. IEEE Trans. Knowl. Data Eng. 2022, 35, 8461–8475. [Google Scholar] [CrossRef]
  33. Zhong, Z.; Zhang, M.; Fan, J.; Dou, C. Semantics Driven Embedding Learning for Effective Entity Alignment. In Proceedings of the IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 2127–2140. [Google Scholar]
  34. Su, F.; Xu, C.; Yang, H.; Chen, Z.; Jing, N. Neural Entity Alignment with Cross-Modal Supervision. Inf. Process. Manag. 2023, 60, 103174. [Google Scholar] [CrossRef]
  35. Li, Q.; Ji, C.; Guo, S.; Liang, Z.; Wang, L.; Li, J. Multi-modal Knowledge Graph Transformer Framework for Multi-modal Entity Alignment. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 987–999. [Google Scholar]
  36. Zeng, K.; Dong, Z.; Hou, L.; Cao, Y.; Hu, M.; Yu, J.; Lv, X.; Cao, L.; Wang, X.; Liu, H.; et al. Interactive Contrastive Learning for Self-supervised Entity Alignment. In Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM), Atlanta, GA, USA, 17–21 October 2022; pp. 2465–2475. [Google Scholar]
  37. Liu, F.; Chen, M.; Roth, D.; Collier, N. Visual Pivoting for (Unsupervised) Entity Alignment. In Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI), Online, 2–9 February 2021; Volume 35, pp. 4257–4266. [Google Scholar]
  38. Li, J.; Song, D. Uncertainty-aware Pseudo Label Refinery for Entity Alignment. In Proceedings of the ACM Web Conference 2022, Virtual Event, Lyon, France, 25–29 April 2022; pp. 829–837. [Google Scholar]
  39. Liu, X.; Hong, H.; Wang, X.; Chen, Z.; Kharlamov, E.; Dong, Y.; Tang, J. SelfKG: Self-supervised Entity Alignment in Knowledge Graphs. In Proceedings of the 2022 World Wide Web Conference (WWW), Virtual Event, Lyon, France, 25–29 April 2022; pp. 860–870. [Google Scholar]
  40. Mao, X.; Wang, W.; Wu, Y.; Lan, M. From Alignment to Assignment: Frustratingly Simple Unsupervised Entity Alignment. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 2843–2853. [Google Scholar]
  41. Jiang, C.; Qian, Y.; Chen, L.; Gu, Y.; Xie, X. Unsupervised Deep Cross-Language Entity Alignment. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: Research Track: European Conference (ECML PKDD), Turin, Italy, 18–22 September 2023; pp. 3–19. [Google Scholar]
  42. Wang, C.; Huang, Z.; Wan, Y.; Wei, J.; Zhao, J.; Wang, P. FuAlign: Cross-Lingual Entity Alignment via Multi-view Representation Learning of Fused Knowledge Graphs. Inf. Fusion 2023, 89, 41–52. [Google Scholar] [CrossRef]
  43. Feng, F.; Yang, Y.; Cer, D.; Arivazhagan, N.; Wang, W. Language-agnostic BERT Sentence Embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), Dublin, Ireland, 22–27 May 2022; pp. 878–891. [Google Scholar]
  44. Fountoulakis, K.; Levi, A.; Yang, S.; Baranwal, A.; Jagannath, A. Graph Attention Retrospective. J. Mach. Learn. Res. 2023, 24, 1–52. [Google Scholar]
  45. Sun, Z.; Zhang, Q.; Hu, W.; Wang, C.; Chen, M.; Akrami, F.; Li, C. A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs. In Proceedings of the VLDB Endowment (PVLDB), Tokyo, Japan, 31 August–4 September 2020; pp. 2326–2340. [Google Scholar]
  46. Liu, B.; Lan, T.; Hua, W.; Zuccon, G. Dependency-aware Self-training for Entity Alignment. In Proceedings of the 16th ACM International Conference on Web Search and Data Mining, Singapore, 27 February–3 March 2023; pp. 796–804. [Google Scholar]
  47. Liu, B.; Scells, H.; Hua, W.; Zuccon, G.; Zhao, G.; Zhang, X. Guiding Neural Entity Alignment with Compatibility. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 491–504. [Google Scholar]
  48. Wang, Z.; Lv, Q.; Lan, X.; Zhang, Y. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, Belgium, 31 October–4 November 2018; pp. 349–357. [Google Scholar]
  49. Wang, M.; Shi, Y.; Yang, H.; Zhang, Z.; Lin, Z.; Zheng, Y. Probing the Impacts of Visual Context in Multimodal Entity Alignment. Data Sci. Eng. 2023, 8, 124–134. [Google Scholar] [CrossRef]
  50. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
Figure 1. Illustration of the EA task.
Figure 2. The overall architecture of LCA-UEA.
Figure 3. Intuitive illustration of the learnable parameters (λ1, λ2, λ3).
Figure 4. Performance comparison with different perturbation ratios, temperatures, and momentum coefficients on the FR-EN_DBP dataset.
Table 1. Notations and descriptions.
Notation | Description
E_n | Text embedding matrix of entities.
E_c | Context embedding matrix of entities.
E_m | Output embedding matrix of the textual feature layer.
Ẽ | Output embedding matrix of the LCAT layer.
Ẽ_u | Output embedding matrix of the query encoder.
Ẽ_v | Output embedding matrix of the key encoder.
| Superposition operation.
| Vector concatenation operation.
· | Dot product operation.
Table 2. Statistics of datasets.
Datasets | KGs | Entities | Rel. | Rel. Triples
DBP-15K JA-EN_DBP | Japanese | 65,744 | 2043 | 164,373
| English | 95,680 | 2096 | 233,319
FR-EN_DBP | French | 66,858 | 1379 | 192,191
| English | 105,889 | 2209 | 278,590
ZH-EN_DBP | Chinese | 66,469 | 2830 | 153,929
| English | 98,125 | 2317 | 237,674
WK31-15K EN-DE_V1 | English | 15,000 | 215 | 47,676
| German | 15,000 | 131 | 50,419
EN-DE_V2 | English | 15,000 | 169 | 84,867
| German | 15,000 | 96 | 92,632
EN-FR_V1 | English | 15,000 | 267 | 47,334
| French | 15,000 | 210 | 40,864
EN-FR_V2 | English | 15,000 | 193 | 96,318
| French | 15,000 | 166 | 80,112
DWY-100K DBP-WD | DBpedia | 100,000 | 330 | 463,294
| Wikidata | 100,000 | 220 | 448,774
DBP-YG | DBpedia | 100,000 | 302 | 428,952
| YAGO3 | 100,000 | 21 | 502,563
Table 3. Comparative results of LCA-UEA against 23 baselines on DBP-15K. Underline indicates the best results for the first two categories, while bold marks the best results for the unsupervised methods.
Datasets | ZH-EN_DBP | JA-EN_DBP | FR-EN_DBP
Models | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR
MTransE [17] | 30.8 | 61.4 | 36.4 | 27.9 | 57.5 | 34.9 | 24.4 | 55.6 | 33.5
BootEA [21] | 62.9 | 84.8 | 70.3 | 62.2 | 85.4 | 70.1 | 65.3 | 87.4 | 73.1
RDGCN ‡ [23] | 70.8 | 84.6 | 74.9 | 76.7 | 89.5 | 81.2 | 88.6 | 95.7 | 90.8
AliNet [24] | 53.9 | 82.6 | 62.8 | 54.9 | 83.1 | 64.5 | 55.2 | 85.2 | 65.7
EMEA * [47] | 78.2 | 93.3 | 84.2 | 77.1 | 95.0 | 83.7 | 80.1 | 96.6 | 86.3
RPR-RHGT ‡ [8] | 69.3 | 86.9 | 75.4 | 88.6 | 95.5 | 91.2 | 88.9 | 97.0 | 91.0
PEEA *† [10] | 76.1 | 91.5 | 81.6 | 77.2 | 92.5 | 82.1 | 80.6 | 94.5 | 85.8
RANM ‡ [9] | 77.6 | 88.1 | 81.3 | 90.5 | 95.2 | 92.3 | 90.9 | 95.8 | 92.7
KAGNN ‡ [25] | 73.6 | 87.3 | 78.6 | 79.4 | 91.1 | 83.7 | 92.0 | 97.6 | 94.1
JAPE [18] | 41.2 | 74.5 | 49.0 | 36.3 | 68.5 | 47.6 | 32.4 | 66.7 | 43.0
GCN-Align [48] | 41.3 | 74.4 | 54.9 | 39.9 | 74.5 | 54.6 | 37.3 | 74.5 | 53.2
MRAEA [27] | 75.7 | 92.9 | 82.7 | 75.7 | 93.3 | 82.6 | 78.0 | 94.8 | 84.9
AttrGNN † [12] | 79.6 | 92.9 | 84.5 | 78.3 | 92.0 | 83.4 | 91.8 | 97.7 | 91.0
MHNA * [13] | 60.3 | 80.5 | 65.7 | 87.6 | 94.4 | 90.3 | 87.8 | 95.0 | 90.5
MMEA-cat ‡ [49] | 62.4 | 84.5 | 70.2 | 64.1 | 86.9 | 72.3 | 72.5 | 91.4 | 79.3
GEEA ‡ [31] | 76.1 | 94.6 | 82.7 | 75.5 | 95.3 | 82.7 | 77.6 | 96.2 | 84.4
MultiKE [20] | 43.7 | 51.6 | 46.6 | 57.0 | 64.2 | 59.6 | 71.4 | 76.0 | 73.3
AttrE [19] | 26.3 | 43.6 | 32.2 | 38.1 | 61.5 | 47.5 | 62.3 | 79.3 | 68.6
ICLEA † [36] | 80.4 | 91.4 | - | 87.3 | 93.1 | - | 97.3 | 99.5 | -
EVA ‡ [37] | 75.2 | 89.5 | 80.4 | 73.7 | 89.0 | 79.1 | 73.1 | 90.9 | 79.2
SEU *† [40] | 80.8 | 92.1 | 85.2 | 87.1 | 94.6 | 89.8 | 97.0 | 99.6 | 98.3
SelfKG *† [39] | 73.8 | 86.0 | 77.1 | 81.5 | 91.3 | 84.9 | 94.2 | 98.8 | 97.2
UDCEA *† [41] | 81.1 | 92.2 | 85.5 | 84.7 | 93.5 | 87.8 | 98.1 | 99.5 | 98.7
LCA-UEA (ours) † | 81.5 | 91.5 | 85.1 | 87.5 | 94.6 | 90.1 | 98.4 | 99.8 | 99.0
The baselines marked with “*” are reproduced using their source code, while the results of the others are taken directly from OpenEA [45] or their original papers. The baselines marked with “†” utilize pre-trained language models (e.g., LaBSE, FastText) to generate the initial embeddings of entity names, while those marked with “‡” incorporate image information. “-” indicates that the value is not reported in the original paper and is therefore left blank. Additionally, the best results in the first two categories are underlined, and the best results among the unsupervised methods are shown in bold. The same notations apply to Table 4 and Table 5.
Table 4. Comparative results of LCA-UEA against 19 baselines on WK31-15K.
Datasets | EN-FR_V1 | EN-FR_V2 | EN-DE_V1 | EN-DE_V2
Models | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR
MTransE [17] | 24.6 | 56.2 | 35.0 | 24.4 | 52.8 | 34.0 | 30.9 | 61.2 | 40.9 | 19.6 | 43.3 | 27.7
BootEA [21] | 50.3 | 78.6 | 59.7 | 66.1 | 90.8 | 74.7 | 67.1 | 86.6 | 73.7 | 84.9 | 94.5 | 88.3
RDGCN [23] | 75.4 | 87.9 | 79.9 | 84.8 | 93.4 | 88.1 | 82.4 | 91.3 | 85.5 | 84.0 | 90.9 | 86.6
AliNet [24] | 35.8 | 67.1 | 46.4 | 54.2 | 86.0 | 65.6 | 59.3 | 81.3 | 66.4 | 79.8 | 92.3 | 84.4
EMEA * [47] | 63.8 | 91.1 | 73.3 | 85.5 | 98.3 | 90.5 | 75.1 | 94.1 | 81.9 | 87.6 | 98.0 | 94.4
RPR-RHGT [8] | 90.9 | 96.6 | 93.0 | 94.9 | 98.5 | 96.3 | 92.1 | 97.2 | 94.0 | 93.8 | 97.8 | 95.3
STEA [46] | 72.8 | 92.9 | 79.8 | 92.6 | 99.0 | 95.0 | 81.1 | 95.3 | 86.0 | 96.0 | 99.2 | 97.2
PEEA † [10] | 76.6 | 92.0 | 80.4 | 88.9 | 98.2 | 92.5 | 78.7 | 95.4 | 84.5 | 95.7 | 99.0 | 97.0
RANM [9] | 92.5 | 97.0 | 94.1 | 97.0 | 98.4 | 97.7 | 94.9 | 97.8 | 96.2 | 96.6 | 98.0 | 97.5
JAPE [18] | 26.6 | 59.4 | 37.4 | 29.4 | 62.3 | 40.4 | 27.4 | 59.6 | 38.1 | 15.9 | 39.4 | 24.0
GCN-Align [48] | 33.4 | 66.9 | 44.6 | 41.8 | 80.1 | 54.5 | 48.0 | 75.3 | 57.1 | 54.1 | 78.6 | 62.6
MRAEA * [27] | 40.6 | 72.2 | 51.1 | 78.9 | 96.9 | 85.8 | 53.3 | 78.7 | 62.1 | 75.7 | 92.2 | 81.6
MHNA * [13] | 92.9 | 96.4 | 94.5 | 96.1 | 98.4 | 97.2 | 94.1 | 97.4 | 95.5 | 95.7 | 98.2 | 96.9
SDEA *† [33] | 97.1 | 98.9 | 97.8 | 97.6 | 99.2 | 98.1 | 97.2 | 99.0 | 97.9 | 97.7 | 99.4 | 98.3
MultiKE [20] | 74.2 | 83.6 | 77.6 | 86.1 | 92.3 | 88.4 | 75.3 | 82.9 | 78.1 | 75.7 | 83.7 | 78.6
AttrE [19] | 48.9 | 73.7 | 57.6 | 53.2 | 80.0 | 62.7 | 53.6 | 75.8 | 61.4 | 64.3 | 85.6 | 71.9
SEU *† [40] | 97.5 | 99.3 | 98.6 | 95.1 | 99.3 | 96.5 | 97.2 | 99.0 | 97.9 | 95.4 | 97.9 | 96.3
SelfKG *† [39] | 97.0 | 99.4 | 97.9 | 97.1 | 99.5 | 98.0 | 96.7 | 99.0 | 97.5 | 96.2 | 98.8 | 97.1
UDCEA *† [41] | 97.6 | 99.4 | 98.2 | 97.8 | 99.1 | 98.2 | 96.6 | 98.6 | 97.4 | 94.8 | 98.0 | 96.0
LCA-UEA (ours) † | 98.6 | 99.8 | 99.1 | 98.8 | 99.7 | 99.2 | 97.7 | 99.4 | 98.3 | 96.8 | 98.8 | 97.5
Due to the difficulty in acquiring image information for the WK31-15K dataset, we are unable to provide the experimental results for certain baselines that require image input, such as MMEA-cat [49], GEEA [31], and EVA [37].
Table 5. Comparative results of LCA-UEA against 15 baselines on DWY100K.
Datasets | DBP-WD | DBP-YG
Models | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR
MTransE [17] | 28.1 | 52.0 | 36.3 | 25.2 | 49.3 | 33.4
BootEA [21] | 74.8 | 89.8 | 80.1 | 76.1 | 89.4 | 80.8
AliNet [24] | 69.0 | 90.8 | 76.6 | 78.6 | 94.3 | 84.1
EMEA * [47] | 83.6 | 95.2 | 88.9 | 86.2 | 97.3 | 90.4
RPR-RHGT [8] | 99.2 | 99.8 | 99.5 | 96.5 | 98.8 | 97.4
STEA * [46] | 90.6 | 97.8 | 93.2 | 89.3 | 96.5 | 91.9
RANM [9] | 99.3 | 99.8 | 99.5 | 97.2 | 99.4 | 98.0
JAPE [18] | 31.8 | 58.9 | 41.1 | 23.6 | 48.4 | 32.0
GCN-Align [48] | 50.6 | 77.2 | 57.7 | 59.7 | 83.8 | 68.6
MRAEA [27] | 65.5 | 88.6 | 73.4 | 77.5 | 94.2 | 83.4
AttrGNN † [12] | 96.0 | 98.8 | 97.2 | 99.8 | 99.9 | 99.9
MHNA * [13] | 99.3 | 99.9 | 99.4 | 99.9 | 100.0 | 100.0
MultiKE [20] | 91.8 | 96.2 | 93.5 | 88.0 | 95.3 | 90.6
SEU *† [40] | 95.7 | 99.4 | 97.2 | 99.9 | 100.0 | 99.9
SelfKG *† [39] | 98.0 | 99.8 | 98.9 | 99.8 | 100.0 | 99.9
LCA-UEA (ours) † | 98.3 | 99.8 | 98.9 | 100.0 | 100.0 | 100.0
Table 6. Ablation study of LCA-UEA on DBP-15K. Results in bold are the best results.
Datasets | ZH-EN_DBP | JA-EN_DBP | FR-EN_DBP
Models | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR
w/o ECE+RRS | 79.4 | 89.8 | 83.1 | 85.6 | 93.3 | 88.4 | 98.1 | 99.6 | 98.7
w/o ECE | 79.6 | 89.6 | 83.2 | 86.0 | 93.5 | 88.7 | 98.2 | 99.7 | 98.8
w Cosine | 77.0 | 89.1 | 81.4 | 83.8 | 93.3 | 87.3 | 96.2 | 99.6 | 97.6
w GAT | 74.3 | 90.2 | 80.1 | 81.6 | 93.7 | 86.1 | 97.7 | 99.6 | 98.5
w GCN+GAT | 76.5 | 91.8 | 82.2 | 87.0 | 95.6 | 90.2 | 97.8 | 99.9 | 98.8
LCA-UEA (ours) | 81.5 | 91.5 | 85.1 | 87.5 | 94.6 | 90.1 | 98.4 | 99.8 | 99.0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
