HIEA: Hierarchical Inference for Entity Alignment with Collaboration of Instruction-Tuned Large Language Models and Small Models
Abstract
1. Introduction
- We propose HIEA, a novel and effective LLM-enhanced framework for EA. By fine-tuning a generative LLM with a unified and concise prompt, HIEA produces alignment results with a single query. It incorporates a knowledge adapter that injects KG embeddings into the LLM, thereby enhancing the LLM’s understanding of KG structure. We also perform data augmentation during instruction construction to obtain additional high-quality tuning data.
- We introduce a collaborative inference strategy for EA. By analyzing similarity features of entity embeddings, we train a lightweight classifier to distinguish certain from uncertain source entities. The small model’s (SM’s) predictions are retained for certain entities, while uncertain entities are delegated to the LLM for further inference, which substantially reduces usage cost without sacrificing performance.
- We conduct extensive experiments on both standard and highly heterogeneous temporal EA datasets, demonstrating that HIEA outperforms both embedding-based and LLM-enhanced EA methods. Comprehensive ablation studies further verify the effectiveness of each component.
2. Related Work
2.1. Small Model-Based EA Methods
2.2. LLM-Enhanced EA Methods
2.3. Adapting LLMs to Structured KG Tasks
3. Problem Definition
4. Our Method
4.1. Framework Overview
4.2. Prompt Construction
4.3. Instruction Data Classification and Augmentation
4.3.1. Certainty-Aware Data Classification
4.3.2. Data Augmentation
4.4. Knowledge Adaptation for Instruction Tuning
4.5. Discussion on Structural Embedding Injection
5. Experiments
5.1. Experimental Settings
- Datasets. We evaluate our approach on two widely used real-world entity alignment datasets. DBP15K [38] contains three cross-lingual subsets extracted from the multilingual version of DBpedia: ZH-EN (Chinese–English), JA-EN (Japanese–English), and FR-EN (French–English). HHKG [20] consists of two subsets, ICEWS-WIKI and ICEWS-YAGO, sampled from the Integrated Crisis Early Warning System (ICEWS), Wikidata, and YAGO; these subsets are highly heterogeneous in scale, structure, and entity-overlap ratio. Each of the five subsets provides pre-aligned entity pairs that serve as the gold standard. Following prior work [10,14,20], we use 30% of these pairs for training and the remaining 70% for testing. Table 2 presents detailed dataset statistics, where Ent., Rel., and Tri. denote the numbers of entities, relations, and triples, respectively. We also report the structural similarity defined in [20], which measures the average similarity between the aligned neighbors of aligned pairs and thus reflects how similar the neighborhoods of the two KGs are. As shown in Table 2, HHKG exhibits markedly lower structural similarity than DBP15K, which makes it more challenging and closer to realistic alignment scenarios.
- Baselines. To comprehensively evaluate HIEA, we compare it against a broad range of existing EA methods, covering both well-established techniques and recent advances in this rapidly developing area. The baselines fall into two groups: (1) small model-based methods, including translation-based approaches (MTransE [5], BootEA [23], and TransEdge [24]), GNN-based approaches (GCN-Align [7], RDGCN [26], Dual-AMN [27], HOLI-GNN [28], PMF [10], and Simple-HHEA [20]), and time-aware models designed for temporal KG alignment (TEA-GNN [39], TREA [40], and STEA [41]); and (2) LLM-enhanced methods, including LLMEA [12], Seg-Align [13], ChatEA [14], and HLMEA [16].
- Evaluation metrics. We evaluate EA performance using two mainstream metrics: Hits@k, the proportion of source entities whose correct counterpart is ranked within the top k, and mean reciprocal rank (MRR), the average reciprocal rank of the correct counterpart. Higher values of both metrics indicate better performance. In our framework, the fine-tuned LLM selects the most similar entity from the candidate list as its final answer. To ensure comparability with prior work, we move the selected entity to the top of the ranking while keeping the order of the remaining candidates unchanged, and then compute Hits@k and MRR on the re-ranked list. For each aligned pair, we treat each of the two entities as the source entity in turn and report the average performance over both alignment directions.
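The re-ranking and scoring protocol described under Evaluation metrics can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `rerank_with_llm_choice`, `hits_and_mrr`, and `average_directions` are hypothetical, candidates are assumed to be identified by unique names, and a gold entity absent from the (truncated) candidate list is assumed to contribute zero to both Hits@k and MRR.

```python
from typing import Dict, List, Sequence


def rerank_with_llm_choice(candidates: Sequence[str], llm_choice: str) -> List[str]:
    """Move the LLM-selected entity to rank 1 and keep the other candidates in order."""
    if llm_choice not in candidates:
        # Assumption: an unmatched choice leaves the small model's ranking unchanged.
        return list(candidates)
    return [llm_choice] + [c for c in candidates if c != llm_choice]


def hits_and_mrr(ranked_lists: Sequence[Sequence[str]], gold: Sequence[str],
                 ks: Sequence[int] = (1, 10)) -> Dict[str, float]:
    """Hits@k and MRR over one alignment direction.

    ranked_lists[i] is the re-ranked candidate list for source entity i and
    gold[i] its true counterpart; a gold entity outside the list scores 0.
    """
    hits = {k: 0 for k in ks}
    reciprocal_sum = 0.0
    for ranked, target in zip(ranked_lists, gold):
        if target not in ranked:
            continue
        rank = list(ranked).index(target) + 1  # 1-based rank of the true counterpart
        reciprocal_sum += 1.0 / rank
        for k in ks:
            if rank <= k:
                hits[k] += 1
    n = len(gold)
    metrics = {f"Hits@{k}": hits[k] / n for k in ks}
    metrics["MRR"] = reciprocal_sum / n
    return metrics


def average_directions(forward: Dict[str, float], backward: Dict[str, float]) -> Dict[str, float]:
    """Average each metric over the two alignment directions (KG1->KG2 and KG2->KG1)."""
    return {name: (forward[name] + backward[name]) / 2 for name in forward}
```

For example, `rerank_with_llm_choice(["a", "b", "c"], "b")` returns `["b", "a", "c"]`. Metrics for the two alignment directions are computed separately with `hits_and_mrr` and then combined with `average_directions`.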
5.2. Implementation Details and Deployment Guidance
- Choice of base small models. We adopt two state-of-the-art embedding-based models, PMF [10] and Simple-HHEA [20], as the small models, pre-trained on DBP15K and HHKG, respectively. We use a variant of PMF without visual features, because the text-only LLM used in our framework cannot process images. In addition, following the settings of PMF and Simple-HHEA, we use machine translation to convert non-English entity and relation names into English.
- Configuration for large language models. We employ LLaMA3-8B-Instruct for instruction tuning, as it is open-source and widely used. Additional comparisons with other representative LLMs (e.g., LLaMA2-7B-Chat) are provided in Section 5.6. We adopt low-rank adaptation (LoRA) [42] for parameter-efficient fine-tuning, where the LoRA modules are configured with rank , scaling factor , and a dropout rate of 0.1. These modules are inserted into the query and value projection layers of the Transformer’s self-attention blocks. To further reduce GPU memory consumption during training, we follow the QLoRA strategy [43], which quantizes the frozen base-model weights to 4-bit NormalFloat (NF4) precision with double quantization. During inference, we adopt greedy decoding with a maximum of 64 generated tokens. The LLM is instructed to output only the selected target entity name from the candidate list. If the generated string does not exactly match any candidate name, we select the candidate with the minimum Levenshtein edit distance after basic normalization.
- Instruction-tuning data construction. We build instruction instances from (i) the gold seed alignments in the training split and (ii) additional weakly supervised alignments mined by the small model through embedding similarity. For each aligned entity pair, we create two instructions by swapping the alignment direction (i.e., source-to-target and target-to-source), consistent with the bidirectional evaluation protocol. Each instruction follows the unified prompt template in Section 4.2: the candidate list C is formed by retrieving the top-k target entities ranked by the small model, and neighbor facts are sampled with the seed-neighbor priority strategy until the preset number of facts is reached.
- Data augmentation and filtering. Starting from the pre-trained small model, we perform one round of bidirectional mutual-nearest mining in the learned embedding space: a candidate pair is retained only if each entity is the other’s nearest neighbor across the two KGs. To control noise, we rank the mined pairs by cosine similarity and keep only the top- pairs as high-confidence weak supervision. The number of retained pairs is dataset-dependent: we use for each DBP15K subset, for ICEWS-WIKI, and for ICEWS-YAGO. These new pairs are then converted into additional instructions using the same prompt template. As a result, the final instruction set contains 10,000 instances for each DBP15K subset, 4048 instances for ICEWS-WIKI, and 15,060 instances for ICEWS-YAGO.
- Hyperparameter settings. We determine several important hyperparameters via grid search. The search space includes the number of candidate entities , the positive-to-negative sample ratio for training the classifier, and the number of neighbor facts . Parameter sensitivity analyses are presented in Section 5.5. To balance efficiency and performance, the final settings are , for HHKG and 9 for DBP15K, and .
- Experimental environment. All experiments are conducted on a server running Ubuntu 22.04, equipped with an Intel Xeon Platinum 8470Q CPU and an NVIDIA A40 GPU (48 GB). To reduce the effect of randomness, each experiment is run five times independently and the average result is reported.
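The output-matching rule from the configuration of the large language model (exact match first, then minimum Levenshtein distance after basic normalization) might look like the sketch below. The paper does not specify the normalization, so the steps in `normalize` (NFKC, lowercasing, whitespace collapsing, stripping quotes and trailing periods) are assumptions, as are all function names.

```python
import unicodedata
from typing import Sequence


def normalize(name: str) -> str:
    """Assumed 'basic normalization': NFKC, lowercase, collapse whitespace,
    and strip surrounding spaces, quotes, and periods."""
    name = unicodedata.normalize("NFKC", name).lower()
    name = " ".join(name.split())
    return name.strip(" \"'.")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def match_to_candidate(generated: str, candidates: Sequence[str]) -> str:
    """Map the LLM output to a candidate name.

    An exact string match wins; otherwise pick the candidate whose normalized
    name has the smallest edit distance to the normalized output. min() keeps
    the first minimum, so ties go to the higher-ranked candidate.
    """
    if generated in candidates:
        return generated
    target = normalize(generated)
    return min(candidates, key=lambda c: levenshtein(target, normalize(c)))
```

For instance, `match_to_candidate("The answer is Paris", ["Lyon", "Paris"])` returns `"Paris"`: deleting the 14-character prefix is cheaper than rewriting the output into `"lyon"`. Breaking ties toward the higher-ranked candidate is a design choice of this sketch that falls back on the small model's ordering.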
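The single round of bidirectional mutual-nearest mining described under data augmentation and filtering can be sketched in plain Python. This is a simplified, quadratic-time illustration under stated assumptions: embeddings are given as dictionaries from entity name to vector, entities already covered by seed pairs are excluded from mining, and `top_n` is a hypothetical name for the dataset-dependent number of retained pairs. A practical implementation would compute the similarity matrix with batched matrix operations.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mine_mutual_nearest(src: Dict[str, Vector], tgt: Dict[str, Vector],
                        seeds: Set[Tuple[str, str]],
                        top_n: int) -> List[Tuple[str, str, float]]:
    """One round of mutual-nearest mining between two KGs' entity embeddings.

    (s, t) survives only if t is the most similar unaligned target of s AND
    s is the most similar unaligned source of t. Survivors are sorted by
    cosine similarity and truncated to the top_n most confident pairs.
    """
    aligned_src = {s for s, _ in seeds}  # assumption: skip already-aligned entities
    aligned_tgt = {t for _, t in seeds}
    src_ids = [s for s in src if s not in aligned_src]
    tgt_ids = [t for t in tgt if t not in aligned_tgt]
    if not src_ids or not tgt_ids:
        return []
    sim = {(s, t): cosine(src[s], tgt[t]) for s in src_ids for t in tgt_ids}
    nearest_tgt = {s: max(tgt_ids, key=lambda t: sim[(s, t)]) for s in src_ids}
    nearest_src = {t: max(src_ids, key=lambda s: sim[(s, t)]) for t in tgt_ids}
    mutual = [(s, t, sim[(s, t)]) for s, t in nearest_tgt.items() if nearest_src[t] == s]
    mutual.sort(key=lambda pair: pair[2], reverse=True)  # most confident first
    return mutual[:top_n]
```

Requiring agreement in both directions is what filters out hub-like targets: a target that is nearest to several sources is kept for at most one of them.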
5.3. Main Results
5.4. Ablation Study
5.5. Parameter Sensitivity Study
- Effect of the candidate set size. In our framework, the LLM selects the most suitable target entity from a candidate set offered by the small model. To evaluate the reliability of the coarse candidate generation stage, we analyze the recall of the correct target entity within the top-k candidate sets produced by the small model. Table 6 reports the recall results on and ICEWS-WIKI with k ranging from 10 to 50. The results show that the candidate recall remains consistently high across both datasets. On , the recall exceeds 99% even when , while on the more heterogeneous ICEWS-WIKI dataset, the recall steadily increases from 88.5% to 93.7% as k grows. These results indicate that the small model provides high-quality candidate sets in practice, ensuring that the correct target entity is included with high probability and that the LLM-based refinement is not overly constrained by the initial coarse ranking.
- Effect of the positive–negative sample ratio. The training data for the source entity classifier is class-imbalanced, with positive (certain) samples outnumbering negative ones. We therefore rank positive samples by their certainty confidence and truncate them to enforce a positive-to-negative ratio of . In this experiment, we analyze the impact of on classification performance. Figure 4 reports the precision and recall on the test set for classifiers trained with different values of . As increases, precision gradually decreases while recall increases on both datasets. ICEWS-WIKI is also more sensitive to changes in than , exhibiting a wider variation in precision and recall. Precision is critical for alignment accuracy, because high precision ensures that uncertain source entities are not misclassified as certain. In contrast, recall mainly affects inference efficiency: higher recall reduces the number of source entities that require further refinement by the LLM. Based on the results in Figure 4, we set for DBP15K and for HHKG, which keeps precision high while maintaining high recall.
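The truncation step described for the positive–negative sample ratio, together with the positive-class precision and recall used to evaluate the classifier, can be sketched as below. The sample layout `(features, label, confidence)` and the function names are assumptions; the paper does not specify how the certainty confidence is computed, so it is treated here as an arbitrary precomputed score, with label 1 marking a certain (positive) entity.

```python
from typing import List, Sequence, Tuple

# (feature vector, label, certainty confidence); label 1 = certain, 0 = uncertain
Sample = Tuple[Sequence[float], int, float]


def truncate_positives(samples: Sequence[Sample], ratio: float) -> List[Sample]:
    """Keep every negative and only the most confident positives, so that
    (#positives kept) <= ratio * (#negatives)."""
    positives = sorted((s for s in samples if s[1] == 1),
                       key=lambda s: s[2], reverse=True)  # most confident first
    negatives = [s for s in samples if s[1] == 0]
    keep = min(len(positives), int(ratio * len(negatives)))
    return positives[:keep] + negatives


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of the positive (certain) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Raising `ratio` admits more, and less confident, positives into training, which is consistent with the precision/recall trade-off described above.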
5.6. Further Analysis
- Comparison across different LLM backbones. In the main experiments, we instruction-tune LLaMA3-8B-Instruct for entity alignment. To further investigate the impact of different LLMs and learning paradigms, we adopt LLaMA2-7B-Chat and LLaMA3-8B-Instruct as backbones and evaluate them under two settings: in-context learning (ICL) and instruction fine-tuning (FT). The results are reported in Table 7. For both backbones, the instruction-tuned models consistently outperform their non-fine-tuned counterparts. We attribute this improvement to instruction tuning with the proposed knowledge adapter, which enables the LLM to better incorporate KG embeddings and to capture the requirements of the EA task. In most cases, LLaMA3-8B-Instruct outperforms LLaMA2-7B-Chat, likely owing to its stronger reasoning capability. Interestingly, HIEA with LLaMA3 under the ICL setting performs slightly worse than its LLaMA2 counterpart on ICEWS-WIKI. A possible explanation is that LLaMA3-8B-Instruct, as a dialogue-oriented model, tends to generate more verbose outputs (e.g., explanations or labels), which may interfere with precise entity selection in the absence of fine-tuning. Overall, these results demonstrate the robustness of HIEA, which maintains strong alignment performance across different LLM backbones and learning paradigms.
- Analysis of noise in data augmentation. The proposed data augmentation strategy automatically generates additional entity pairs for instruction tuning, which may introduce noise. To quantify the quality of the augmented data, we evaluate the generated entity pairs against the gold-standard alignments, where accuracy is defined as the proportion of correctly matched pairs. Table 8 reports this accuracy on all five subsets. Accuracy is consistently high, averaging 91% on DBP15K and reaching 100% on both ICEWS-WIKI and ICEWS-YAGO, which indicates that the augmentation strategy introduces only limited noise. In particular, the perfect accuracy on the two highly heterogeneous temporal datasets highlights the effectiveness of the mutual nearest constraint in challenging alignment scenarios. Despite minor noise on some DBP15K subsets, we still observe consistent improvements in downstream EA performance, suggesting that the instruction-tuned LLM tolerates this level of label noise.
- Effect of data classification. To verify the effectiveness of the data classification strategy, we compare the full HIEA method with a variant without this component in terms of alignment performance, the proportion of uncertain entities, and inference time. The results are reported in Table 9. Annotating test instances with the classifier trained in Section 4.3 substantially reduces the number of uncertain source entities that require further refinement by the LLM, which yields significant gains in inference efficiency. For example, on , the proportion of uncertain source entities drops to 45.92%, more than halving the inference time. Importantly, the data classification strategy does not degrade alignment performance, owing to the high precision of the classifier (see Section 5.5). Notably, on , HIEA even slightly outperforms the variant without classification. Further analysis reveals that some entities are correctly aligned by the small model but incorrectly predicted by the LLM. By labeling such cases as certain, HIEA bypasses unnecessary LLM inference and directly adopts the small model’s predictions, thereby avoiding errors that the LLM would otherwise introduce.
- Effect of classifier choice. The certainty classifier plays an auxiliary role in distinguishing relatively certain and uncertain entities for collaborative inference. To evaluate the impact of classifier choice, we compare Random Forest with two alternative models commonly used for continuous features, namely Logistic Regression and a two-layer Multi-Layer Perceptron (MLP). All classifiers are trained on the same data, and precision, recall, and F1-score are reported for the positive (certain) class. The results on and ICEWS-WIKI are summarized in Table 10. Logistic Regression consistently exhibits very high precision but substantially lower recall on both datasets, indicating an overly conservative decision boundary. The MLP achieves the best F1-score on , while Random Forest performs best on ICEWS-WIKI, demonstrating stronger recall and overall balance in a more heterogeneous setting. These results suggest that the performance differences among classifiers are moderate, and our framework is not highly sensitive to the specific classifier choice. We adopt Random Forest in our framework due to its robust and stable performance across datasets.
- Efficiency analysis. Beyond strong alignment performance, HIEA also has low inference overhead. To demonstrate this, we compare the efficiency of different EA methods and LLM backbones by measuring the average number of tokens and the average inference time required to align a single source entity. The results are summarized in Table 11. HIEA incurs substantially lower inference costs than ChatEA. For instance, on ICEWS-WIKI, the average token consumption is reduced from 9803 to 541, and the inference time decreases from 63.4 s to 0.25 s per entity. Notably, even when the fine-tuning overhead is included, HIEA remains significantly more efficient than ChatEA, reducing the average inference time by over two orders of magnitude. This efficiency stems mainly from two design choices in HIEA. First, a unified prompt instructs the LLM to generate the alignment result with a single query, eliminating multi-round interactions. Second, the data classification strategy filters out certain entities that do not require LLM refinement. Moreover, HIEA remains highly efficient with both LLaMA2-7B and LLaMA3-8B as backbones, indicating that its efficient inference is not tied to a specific LLM.
- Cross-dataset generalization analysis. To further evaluate the cross-dataset generalization capability of HIEA, we perform a transfer experiment in which the LoRA parameters fine-tuned on one dataset are directly reused for inference on another related dataset without additional fine-tuning. Specifically, the LoRA parameters learned on are applied to and , while those trained on ICEWS-WIKI are directly transferred to ICEWS-YAGO. This setting allows us to assess whether HIEA learns dataset-agnostic alignment patterns or relies heavily on dataset-specific supervision. The results in Table 12 show that on and , the transferred model achieves performance nearly identical to the Original setting. On ICEWS-YAGO, the transferred variant shows a modest 1.4% drop in Hits@1 but still outperforms all baselines. These results highlight the robustness of the proposed framework under cross-dataset transfer and suggest that developing foundation models for entity alignment may be feasible.
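The routing logic behind the data classification strategy (small-model top-1 for certain entities, LLM refinement for uncertain ones) can be sketched as follows. The `Query` container, the stand-in classifier, and the stand-in LLM are hypothetical placeholders; in HIEA the classifier is a Random Forest over similarity features and the LLM is the instruction-tuned model, neither of which is reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Query:
    source: str              # source entity name
    candidates: List[str]    # small-model ranking, best first
    features: List[float]    # similarity features for the certainty classifier


def collaborative_inference(
    queries: Sequence[Query],
    is_certain: Callable[[List[float]], bool],
    llm_select: Callable[[str, List[str]], str],
) -> Tuple[Dict[str, str], int]:
    """Keep the small model's top-1 for certain sources; query the LLM otherwise.

    Returns the predictions and the number of LLM calls that were needed.
    """
    predictions: Dict[str, str] = {}
    llm_calls = 0
    for q in queries:
        if is_certain(q.features):
            predictions[q.source] = q.candidates[0]  # trust the small model
        else:
            predictions[q.source] = llm_select(q.source, q.candidates)
            llm_calls += 1
    return predictions, llm_calls
```

With a toy rule that treats a source as certain when the gap between its first two similarity features exceeds 0.2, only sources with a small margin would reach the LLM, which is where the savings in LLM calls come from.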
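As a quick check of the per-entity ICEWS-WIKI figures reported in the efficiency analysis, the reduction factors relative to ChatEA follow directly from the reported numbers (the amortized fine-tuning cost is not included here):

```latex
\frac{9803~\text{tokens}}{541~\text{tokens}} \approx 18.1,
\qquad
\frac{63.4~\text{s}}{0.25~\text{s}} = 253.6 \approx 10^{2.4}
```

The per-entity inference time alone therefore drops by more than two orders of magnitude.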
6. Conclusions
7. Limitations and Future Work
- (1) Limitations of latent knowledge in LLMs. HIEA leverages the latent knowledge and reasoning capability of large language models to refine entity alignments. However, such latent knowledge may be incomplete or outdated in certain scenarios, particularly in temporal domains with rapidly evolving facts, newly emerged entities that are absent from the pre-training corpus, or knowledge graphs containing noisy or conflicting information. In these cases, the LLM may produce less reliable reasoning outcomes. A potential direction for future work is to incorporate external or up-to-date knowledge sources, such as temporal knowledge graph snapshots or retrieval-augmented generation, to mitigate the limitations of static pre-trained knowledge.
- (2) Dependency on coarse candidate generation. The proposed framework relies on a small model to generate an initial coarse ranking of candidate entities, and the LLM is constrained to select the final alignment from this candidate set. While our empirical analysis shows that the candidate recall remains high in practice, the framework may fail to recover correct alignments if the true target entity is absent from the candidate list. Future work could explore more robust candidate generation strategies, such as hybrid retrieval methods, adaptive candidate expansion, or uncertainty-aware re-ranking, to further improve recall without sacrificing efficiency.
- (3) Constraints in multi-modal alignment scenarios. The current implementation of HIEA operates under a text-only LLM setting and does not explicitly exploit visual information, which can be crucial for disambiguation in some multi-modal entity alignment scenarios. As a result, HIEA may lag behind fully multi-modal approaches when visual cues provide decisive signals. Nevertheless, the framework is inherently modular and can be naturally extended by integrating multi-modal small models for candidate generation or adopting multi-modal LLMs (e.g., GPT-4V or LLaVA [44]) for joint reasoning. A systematic exploration of such multi-modal extensions is left for future work.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| EA | Entity Alignment |
| KG | Knowledge Graph |
| HIEA | Hierarchical Inference for Entity Alignment |
| LLM | Large Language Model |
| SM | Small Model |
| GNN | Graph Neural Network |
| ICL | In-context Learning |
| FT | Fine-tuning |
| LoRA | Low-Rank Adaptation |
| ICEWS | Integrated Crisis Early Warning System |
References
- Xiong, G.; Bao, J.; Zhao, W. Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 10561–10582.
- Hu, Z.; Li, Z.; Jiao, Z.; Nakagawa, S.; Deng, J.; Cai, S.; Zhou, T.; Ren, F. Bridging the user-side knowledge gap in knowledge-aware recommendations with large language models. Proc. AAAI Conf. Artif. Intell. 2025, 39, 11799–11807.
- Xu, T.; Li, B.; Chen, L.; Yang, C.; Gu, Y.; Gu, X. EHR coding with hybrid attention and features propagation on disease knowledge graph. Artif. Intell. Med. 2024, 154, 102916.
- Cao, J.; Fang, J.; Meng, Z.; Liang, S. Knowledge Graph Embedding: A Survey from the Perspective of Representation Spaces. ACM Comput. Surv. 2024, 56, 1–42.
- Chen, M.; Tian, Y.; Yang, M.; Zaniolo, C. Multilingual Knowledge Graph Embeddings for Cross-lingual Knowledge Alignment. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, 19–25 August 2017; Sierra, C., Ed.; IJCAI: Palo Alto, CA, USA, 2017; pp. 1511–1517.
- Jiang, T.; Bu, C.; Zhu, Y.; Wu, X. Combining embedding-based and symbol-based methods for entity alignment. Pattern Recognit. 2022, 124, 108433.
- Wang, Z.; Lv, Q.; Lan, X.; Zhang, Y. Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 349–357.
- Zhang, Z.; Yang, Y.; Chen, B. Relation-aware heterogeneous graph neural network for entity alignment. Neurocomputing 2024, 592, 127797.
- Shi, X.; Li, B.; Chen, L.; Yang, C. Bi-Neighborhood Graph Neural Network for cross-lingual entity alignment. Knowl.-Based Syst. 2023, 277, 110841.
- Huang, Y.; Zhang, X.; Zhang, R.; Chen, J.; Kim, J. Progressively Modality Freezing for Multi-Modal Entity Alignment. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 3477–3489.
- Zhu, L.; Li, N.; Bai, L. Embedding-based entity alignment between multi-source temporal knowledge graphs. Eng. Appl. Artif. Intell. 2024, 133, 108451.
- Yang, L.; Chen, H.; Wang, X.; Yang, J.; Wang, F.Y.; Liu, H. Two Heads Are Better Than One: Integrating Knowledge from Knowledge Graphs and Large Language Models for Entity Alignment. arXiv 2024, arXiv:2401.16960.
- Yang, L.; Cheng, J.; Zhang, F. Advancing Cross-Lingual Entity Alignment with Large Language Models: Tailored Sample Segmentation and Zero-Shot Prompts. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 8122–8138.
- Jiang, X.; Shen, Y.; Shi, Z.; Xu, C.; Li, W.; Li, Z.; Guo, J.; Shen, H.; Wang, Y. Unlocking the Power of Large Language Models for Entity Alignment. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 7566–7583.
- Chen, S.; Zhang, Q.; Dong, J.; Hua, W.; Li, Q.; Huang, X. Entity Alignment with Noisy Annotations from Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37, pp. 15097–15120.
- Jin, X.; Wang, Z.; Chen, J.; Yang, L.; Oh, B.; Hwang, S.w.; Li, J. HLMEA: Unsupervised Entity Alignment Based on Hybrid Language Models. Proc. AAAI Conf. Artif. Intell. 2025, 39, 11888–11896.
- Cheng, J.; Lu, C.; Yang, L.; Chen, G.; Zhang, F. EasyEA: Large Language Model is All You Need in Entity Alignment Between Knowledge Graphs. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2025, Vienna, Austria, 27 July–1 August 2025; Che, W., Nabende, J., Shutova, E., Pilehvar, M.T., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 20981–20995.
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv 2023, arXiv:2307.09288.
- OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; et al. GPT-4 Technical Report. arXiv 2024, arXiv:2303.08774.
- Jiang, X.; Xu, C.; Shen, Y.; Wang, Y.; Su, F.; Shi, Z.; Sun, F.; Li, Z.; Guo, J.; Shen, H. Toward Practical Entity Alignment Method Design: Insights from New Highly Heterogeneous Knowledge Graph Datasets. In Proceedings of the WWW ’24: ACM Web Conference 2024, Singapore, 13–17 May 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 2325–2336.
- Zhang, Y.; Chen, Z.; Guo, L.; Xu, Y.; Zhang, W.; Chen, H. Making Large Language Models Perform Better in Knowledge Graph Completion. In Proceedings of the MM ’24: 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 233–242.
- Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-relational Data. In Proceedings of the Advances in Neural Information Processing Systems 26, Lake Tahoe, NV, USA, 5–8 December 2013; Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2013; pp. 2787–2795.
- Sun, Z.; Hu, W.; Zhang, Q.; Qu, Y. Bootstrapping Entity Alignment with Knowledge Graph Embedding. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, Stockholm, Sweden, 13–19 July 2018; Lang, J., Ed.; IJCAI: Palo Alto, CA, USA, 2018; pp. 4396–4402.
- Sun, Z.; Huang, J.; Hu, W.; Chen, M.; Guo, L.; Qu, Y. TransEdge: Translating Relation-Contextualized Embeddings for Knowledge Graphs. In Proceedings of the Semantic Web—ISWC 2019, Auckland, New Zealand, 26–30 October 2019; Ghidini, C., Hartig, O., Maleshkova, M., Svátek, V., Cruz, I., Hogan, A., Song, J., Lefrançois, M., Gandon, F., Eds.; Springer: Cham, Switzerland, 2019; pp. 612–629.
- Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017.
- Wu, Y.; Liu, X.; Feng, Y.; Wang, Z.; Yan, R.; Zhao, D. Relation-Aware Entity Alignment for Heterogeneous Knowledge Graphs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, 10–16 August 2019; Kraus, S., Ed.; IJCAI: Palo Alto, CA, USA, 2019; pp. 5278–5284.
- Mao, X.; Wang, W.; Wu, Y.; Lan, M. Boosting the Speed of Entity Alignment 10 ×: Dual Attention Matching Network with Normalized Hard Sample Mining. In Proceedings of the WWW ’21: Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 821–832.
- Chen, J.; Yang, L.; Wang, Z.; Gong, M. Higher-order GNN with Local Inflation for entity alignment. Knowl.-Based Syst. 2024, 293, 111634. [Google Scholar] [CrossRef]
- Tian, S.; Luo, Y.; Xu, T.; Yuan, C.; Jiang, H.; Wei, C.; Wang, X. KG-Adapter: Enabling Knowledge Graph Integration in Large Language Models through Parameter-Efficient Fine-Tuning. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Ku, L.W., Martins, A., Srikumar, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 3813–3828. [Google Scholar] [CrossRef]
- Liu, Y.; Cao, Y.; Lin, X.; Shang, Y.; Wang, S.; Pan, S. Enhancing Large Language Model for Knowledge Graph Completion via Structure-Aware Alignment-Tuning. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Suzhou, China, 4–9 November 2025; Christodoulopoulos, C., Chakraborty, T., Rose, C., Peng, V., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 20970–20984. [Google Scholar] [CrossRef]
- Zhang, Q.; Dong, J.; Chen, H.; Zha, D.; Yu, Z.; Huang, X. KnowGPT: Knowledge Graph based Prompting for Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024; Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2024; Volume 37, pp. 6052–6080. [Google Scholar] [CrossRef]
- Chen, Z.; Bai, L.; Li, Z.; Huang, Z.; Jin, X.; Dou, Y. A New Pipeline for Knowledge Graph Reasoning Enhanced by Large Language Models Without Fine-Tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y.N., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 1366–1381.
- Sun, Z.; Zhang, Q.; Hu, W.; Wang, C.; Chen, M.; Akrami, F.; Li, C. A Benchmarking Study of Embedding-Based Entity Alignment for Knowledge Graphs. Proc. VLDB Endow. 2020, 13, 2326–2340.
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
- Yan, Z.; Peng, R.; Wu, H. Similarity propagation based semi-supervised entity alignment. Eng. Appl. Artif. Intell. 2024, 130, 107787.
- Mao, X.; Wang, W.; Xu, H.; Lan, M.; Wu, Y. MRAEA: An Efficient and Robust Entity Alignment Approach for Cross-Lingual Knowledge Graph. In Proceedings of the WSDM ’20: Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 420–428.
- Shazeer, N. GLU Variants Improve Transformer. arXiv 2020, arXiv:2002.05202.
- Sun, Z.; Hu, W.; Li, C. Cross-Lingual Entity Alignment via Joint Attribute-Preserving Embedding. In Proceedings of the Semantic Web – ISWC 2017, 16th International Semantic Web Conference, Vienna, Austria, 21–25 October 2017; Proceedings, Part I. d’Amato, C., Fernández, M., Tamma, V.A.M., Lécué, F., Cudré-Mauroux, P., Sequeda, J.F., Lange, C., Heflin, J., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10587, Lecture Notes in Computer Science. pp. 628–644.
- Xu, C.; Su, F.; Lehmann, J. Time-aware Graph Neural Network for Entity Alignment between Temporal Knowledge Graphs. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2021; pp. 8999–9010.
- Xu, C.; Su, F.; Xiong, B.; Lehmann, J. Time-aware Entity Alignment using Temporal Relational Attention. In Proceedings of the WWW ’22: Proceedings of the ACM Web Conference 2022, Virtual Event, Lyon, France, 25–29 April 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 788–797.
- Cai, L.; Mao, X.; Ma, M.; Yuan, H.; Zhu, J.; Lan, M. A Simple Temporal Information Matching Mechanism for Entity Alignment between Temporal Knowledge Graphs. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022; Calzolari, N., Huang, C.R., Kim, H., Pustejovsky, J., Wanner, L., Choi, K.S., Ryu, P.M., Chen, H.H., Donatelli, L., Ji, H., et al., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2075–2086.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685.
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient Finetuning of Quantized LLMs. In Proceedings of the NIPS ’23: Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023.
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual Instruction Tuning. In Proceedings of the Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 34892–34916.




| Method | Learning Paradigm | LLM Involvement | Structural Signal Injection | Inference Strategy |
|---|---|---|---|---|
| Small Model-Based Methods | Embedding-based representation learning | None | Explicit graph embeddings | Embedding similarity ranking |
| LLMEA | In-context learning | Virtual entity generation + iterative multiple-choice reasoning | None | Multi-round multiple-choice selection |
| ChatEA | In-context learning | Description generation + dialogue reasoning | KG-Code translation | Iterative dialogue reasoning |
| HLMEA | In-context learning | Repeated annotation + majority voting | Textual representation of entities | Iterative filtering and voting |
| EasyEA | In-context learning | Information summarization + candidate selection | Implicit via summaries | LLM-driven candidate selection based on semantic features |
| HIEA (Ours) | Instruction tuning with knowledge adaptation | Single-step answer generation | Explicit structural embedding injection + neighbor facts | Hierarchical inference via collaboration between instruction-tuned LLMs and small models |

| Dataset | KG | Ent. | Rel. | Tri. | Pairs | Struc. Sim. |
|---|---|---|---|---|---|---|
| DBPzh-en | ZH | 19,388 | 1701 | 70,414 | 15,000 | 0.644 |
| | EN | 19,572 | 1323 | 95,142 | | |
| DBPja-en | JA | 19,814 | 1299 | 77,214 | 15,000 | 0.660 |
| | EN | 19,780 | 1153 | 93,484 | | |
| DBPfr-en | FR | 19,661 | 903 | 105,998 | 15,000 | 0.652 |
| | EN | 19,993 | 1208 | 115,722 | | |
| ICEWS-WIKI | ICEWS | 11,047 | 272 | 3,527,881 | 5058 | 0.154 |
| | WIKI | 15,896 | 226 | 198,257 | | |
| ICEWS-YAGO | ICEWS | 26,863 | 272 | 4,192,555 | 18,824 | 0.140 |
| | YAGO | 22,734 | 41 | 107,118 | | |

| Methods | DBPzh-en | | | DBPja-en | | | DBPfr-en | | |
|---|---|---|---|---|---|---|---|---|---|
| | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR |
| MTransE | 0.308 | 0.614 | 0.364 | 0.279 | 0.575 | 0.349 | 0.244 | 0.556 | 0.335 |
| BootEA | 0.629 | 0.847 | 0.703 | 0.622 | 0.854 | 0.701 | 0.653 | 0.874 | 0.731 |
| TransEdge | 0.735 | 0.919 | 0.801 | 0.719 | 0.932 | 0.795 | 0.710 | 0.941 | 0.796 |
| GCN-Align | 0.413 | 0.744 | 0.549 | 0.399 | 0.745 | 0.546 | 0.373 | 0.745 | 0.532 |
| RDGCN | 0.708 | 0.846 | 0.746 | 0.767 | 0.895 | 0.812 | 0.873 | 0.950 | 0.901 |
| Dual-AMN | 0.861 | 0.964 | 0.901 | 0.892 | 0.978 | 0.925 | 0.954 | 0.994 | 0.970 |
| HOLI-GNN | 0.901 | 0.966 | 0.926 | 0.924 | 0.977 | 0.943 | 0.971 | 0.993 | 0.980 |
| PMF | 0.940 | 0.991 | 0.960 | 0.971 | 0.997 | 0.981 | 0.988 | 0.999 | 0.992 |
| LLMEA | 0.898 | 0.923 | - | 0.911 | 0.946 | - | 0.957 | 0.977 | - |
| Seg-Align | 0.953 | - | - | 0.907 | - | - | 0.987 | - | - |
| ChatEA | - | - | - | - | - | - | 0.990 | 1.000 | 0.995 |
| HLMEA | 0.930 | - | 0.934 | 0.938 | - | 0.950 | 0.986 | - | 0.989 |
| HIEA | |||||||||
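
Hits@1, Hits@10 and MRR in these tables follow the usual ranking definitions in EA evaluation: for each test source entity, the 1-based rank of its gold counterpart among the target candidates is recorded, Hits@k is the share of entities ranked within the top k, and MRR is the mean of the reciprocal ranks. The sketch below is illustrative only; the function names are hypothetical rather than taken from HIEA's code, and it assumes one rank per test entity.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Share of test entities whose gold counterpart is ranked within the top k (1-based ranks)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test entities."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy ranks for five source entities (made-up numbers, not from the paper).
ranks = [1, 1, 3, 12, 2]
print(hits_at_k(ranks, 1))   # 0.4
print(hits_at_k(ranks, 10))  # 0.8
print(round(mean_reciprocal_rank(ranks), 3))  # 0.583
```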

| Methods | ICEWS-WIKI | | | ICEWS-YAGO | | |
|---|---|---|---|---|---|---|
| | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR |
| MTransE | 0.021 | 0.158 | 0.068 | 0.012 | 0.084 | 0.040 |
| BootEA | 0.072 | 0.275 | 0.139 | 0.020 | 0.120 | 0.056 |
| GCN-Align | 0.046 | 0.184 | 0.093 | 0.017 | 0.085 | 0.038 |
| RDGCN | 0.064 | 0.202 | 0.096 | 0.029 | 0.097 | 0.042 |
| Dual-AMN | 0.083 | 0.281 | 0.145 | 0.031 | 0.144 | 0.068 |
| TEA-GNN | 0.063 | 0.253 | 0.126 | 0.025 | 0.135 | 0.064 |
| TREA | 0.081 | 0.302 | 0.155 | 0.033 | 0.150 | 0.072 |
| STEA | 0.079 | 0.292 | 0.152 | 0.033 | 0.147 | 0.073 |
| Simple-HHEA | 0.720 | 0.872 | 0.754 | 0.847 | 0.915 | 0.870 |
| ChatEA | 0.880 | 0.945 | 0.912 | 0.935 | 0.955 | 0.944 |
| HIEA | ||||||

| Variants | | | | ICEWS-WIKI | | |
|---|---|---|---|---|---|---|
| | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR |
| HIEA | 0.969 | 0.993 | 0.978 | 0.936 | 0.940 | 0.938 |
| w/o neighbors | 0.956 | 0.989 | 0.968 | 0.932 | 0.937 | 0.934 |
| w/o augmentation | 0.961 | 0.990 | 0.972 | 0.921 | 0.929 | 0.924 |
| w/o adaptation | 0.959 | 0.899 | 0.970 | 0.929 | 0.935 | 0.932 |

| Dataset | k = 10 | k = 20 | k = 30 | k = 40 | k = 50 |
|---|---|---|---|---|---|
| | 0.992 | 0.995 | 0.997 | 0.998 | 0.998 |
| ICEWS-WIKI | 0.885 | 0.909 | 0.921 | 0.931 | 0.937 |

| Models | | | | ICEWS-WIKI | | |
|---|---|---|---|---|---|---|
| | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR |
| HIEA | ||||||
| w/LLaMA2-7B (ICL) | 0.943 | 0.991 | 0.962 | 0.891 | 0.912 | 0.900 |
| w/LLaMA2-7B (FT) | 0.958 | 0.992 | 0.970 | 0.924 | 0.929 | 0.927 |
| w/LLaMA3-8B (ICL) | 0.949 | 0.992 | 0.966 | 0.881 | 0.909 | 0.891 |
| w/LLaMA3-8B (FT) | 0.969 | 0.993 | 0.978 | 0.936 | 0.940 | 0.938 |

| Dataset | DBPzh-en | DBPja-en | DBPfr-en | ICEWS-WIKI | ICEWS-YAGO |
|---|---|---|---|---|---|
| Number | 500 | 500 | 500 | 506 | 1882 |
| Accuracy | 0.947 | 0.910 | 0.874 | 1.000 | 1.000 |

| Models | | | | | ICEWS-WIKI | | | |
|---|---|---|---|---|---|---|---|---|
| | Hits@1 | MRR | Uncer. Per. | Time (s) | Hits@1 | MRR | Uncer. Per. | Time (s) |
| HIEA | 0.969 | 0.978 | 45.92% | 4045 | 0.936 | 0.938 | 50.08% | 1784 |
| w/o classification | 0.967 | 0.976 | 100% | 8586 | 0.941 | 0.942 | 100% | 3474 |
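
The Uncer. Per. column is the share of test source entities that the classifier flags as uncertain and forwards to the LLM; all other entities keep the small model's top-1 prediction. Because only flagged entities reach the LLM, total time drops roughly in line with that share (4045 s versus 8586 s, and 1784 s versus 3474 s). The routing step can be sketched as below. This is a minimal illustration under stated assumptions: the two similarity features (`top1_sim`, `margin`), the helper names, and the fixed-threshold rule standing in for the trained classifier are all hypothetical, and `llm_align` is a placeholder for a query to the fine-tuned LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SimFeatures:
    """Hypothetical similarity features for one source entity."""
    top1_sim: float  # similarity to the small model's best target candidate
    margin: float    # top-1 similarity minus top-2 similarity


def hierarchical_align(
    sources: List[str],
    features: Dict[str, SimFeatures],
    sm_top1: Dict[str, str],
    is_certain: Callable[[SimFeatures], bool],
    llm_align: Callable[[str], str],
) -> Tuple[Dict[str, str], float]:
    """Keep the small model's top-1 for certain entities; send the rest to the LLM.

    Returns the final alignment and the fraction of entities routed to the LLM.
    """
    alignment: Dict[str, str] = {}
    uncertain: List[str] = []
    for s in sources:
        if is_certain(features[s]):
            alignment[s] = sm_top1[s]  # certain: reuse the small model's answer
        else:
            uncertain.append(s)
    for s in uncertain:
        alignment[s] = llm_align(s)  # uncertain: one LLM query per entity
    return alignment, len(uncertain) / len(sources)


def threshold_rule(f: SimFeatures) -> bool:
    """Illustrative stand-in for the trained classifier (not HIEA's actual model)."""
    return f.top1_sim >= 0.9 and f.margin >= 0.1


feats = {"e1": SimFeatures(0.95, 0.30), "e2": SimFeatures(0.70, 0.02)}
sm_preds = {"e1": "t1", "e2": "t9"}
result, frac = hierarchical_align(
    ["e1", "e2"], feats, sm_preds, threshold_rule, llm_align=lambda s: "t2"
)
print(result, frac)  # {'e1': 't1', 'e2': 't2'} 0.5
```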

| Methods | | | | ICEWS-WIKI | | |
|---|---|---|---|---|---|---|
| | Precision | Recall | F1 | Precision | Recall | F1 |
| Logistic Regression | 0.999 | 0.559 | 0.717 | 0.995 | 0.582 | 0.734 |
| MLP (2-layer) | 0.995 | 0.623 | 0.766 | 0.996 | 0.533 | 0.694 |
| Random Forest | 0.999 | 0.575 | 0.730 | 0.994 | 0.622 | 0.765 |
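
Each F1 value in this table is the harmonic mean of the precision and recall beside it, F1 = 2PR/(P + R), so the rows can be checked directly. The helper below uses a hypothetical name, `f1_from_pr`, written for illustration rather than taken from any library, and reproduces two of the reported entries.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs copied from the table above.
print(round(f1_from_pr(0.999, 0.559), 3))  # 0.717 (Logistic Regression, first column group)
print(round(f1_from_pr(0.994, 0.622), 3))  # 0.765 (Random Forest, second column group)
```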

| Models | ICEWS-WIKI | | ICEWS-YAGO | |
|---|---|---|---|---|
| | Avg. Tokens | Avg. Time (s) | Avg. Tokens | Avg. Time (s) |
| ChatEA | ||||
| w/LLaMA2-70B | 11,380 | 63.4 | 8950 | 46.5 |
| w/LLaMA2-13B | 47,007 | 150.1 | 44,907 | 135.8 |
| w/GPT-4 | 9803 | 131.8 | 6593 | 90.8 |
| HIEA | ||||
| w/LLaMA2-7B (inference only) | 604 | 0.252 | 554 | 0.228 |
| w/LLaMA2-7B (fine-tuning + inference) | 947 | 1.307 | 872 | 0.932 |
| w/LLaMA3-8B (inference only) | 541 | 0.261 | 473 | 0.221 |
| w/LLaMA3-8B (fine-tuning + inference) | 848 | 1.123 | 745 | 0.833 |

| Methods | | | | | | | ICEWS-YAGO | | |
|---|---|---|---|---|---|---|---|---|---|
| | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR | Hits@1 | Hits@10 | MRR |
| Transfer | 0.985 | 0.996 | 0.991 | 0.994 | 0.998 | 0.997 | 0.949 | 0.952 | 0.950 |
| Original | 0.990 | 0.998 | 0.994 | 0.998 | 1.000 | 0.999 | 0.963 | 0.965 | 0.964 |

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Shi, X.; Han, Z.; Li, B. HIEA: Hierarchical Inference for Entity Alignment with Collaboration of Instruction-Tuned Large Language Models and Small Models. Electronics 2026, 15, 421. https://doi.org/10.3390/electronics15020421

