OPICE: Ontology-Guided Pseudo-Label Generation and Inter-Modal Consistency Enhancement for Self-Supervised Multi-Modal Entity Alignment
Abstract
1. Introduction
- We systematically identify the key challenges in self-supervised multi-modal entity alignment (MMEA), including the difficulty of generating high-quality pseudo-labels and the weak semantic associations across different modalities.
- We propose OPICE, a novel self-supervised MMEA framework that introduces a pseudo-label generation strategy and an inter-modal consistency enhancement mechanism to jointly improve alignment quantity, robustness, and cross-modal coherence.
- We conduct extensive experiments on two widely used multi-modal datasets, FB-DB15K and FB-YAGO15K, demonstrating that OPICE consistently outperforms existing self-supervised methods and even surpasses most supervised baselines.
2. Related Work
2.1. Multi-Modal Entity Alignment
2.2. Semi-Supervised and Self-Supervised Entity Alignment
3. Methodology
3.1. Problem Definition
3.2. Framework Overview
3.3. Multi-Modal Knowledge Embedding
3.3.1. Graph Structure Embeddings
3.3.2. Relation Embeddings
3.3.3. Attribute Embeddings
3.3.4. Visual Information Embeddings
3.4. Pseudo-Label Generation
3.4.1. Attribute Alignment
3.4.2. Ontology Alignment
3.4.3. Entity Alignment
3.5. Inter-Modal Consistency Enhancement
3.6. Optimization Objective
3.7. Computational Complexity Analysis
4. Experiments
4.1. Data Sets
4.2. Evaluation Metrics
4.3. Model Configurations
4.4. Baseline Models
4.5. Main Results
4.6. Ablation Study
4.7. Case Study
4.8. Further Analysis
4.9. Runtime and Cost Analysis
4.10. Robustness Discussion in Realistic Scenarios
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| Prompt Description |
|---|
| Task Definition: |
| You are an expert in schema alignment between knowledge graphs. Your task is to perform one-to-one attribute alignment between two sets of KG attributes based on their semantic equivalence. |
| Input: |
| Two sets of KG attributes: Source KG Attributes and Target KG Attributes. |
| Alignment Requirements: |
| 1. Align attributes only when they share the same or highly similar meaning. |
| 2. Each attribute can be aligned with at most one attribute from the other set (strict one-to-one mapping). |
| 3. It is allowed for attributes to remain unmatched if no semantically appropriate counterpart is found. |
| 4. If no perfect synonym exists, select the closest possible semantic match, while ensuring semantic consistency. The output should contain only the final aligned attribute pairs and nothing else. |
| Output: |
| A list of aligned attribute pairs in the format (KG1_Attribute, KG2_Attribute). Each pair should be printed on a new line. Only include matched attribute pairs; exclude: unmatched attributes, explanations, reasoning process and duplicated results. |
| Example: |
| Source KG Attributes: {date_of_birth, area, longitude} and Target KG Attributes: {birthDate, totalArea, birthYear}. The expected output is: |
| (date_of_birth, birthDate) |
| (area, totalArea) |
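
The output format requested by the prompt (one `(KG1_Attribute, KG2_Attribute)` pair per line) lends itself to a simple post-processing step. The following is a minimal Python sketch, assuming the model's reply arrives as plain text. The helper name `parse_aligned_pairs` and the policy of keeping the first occurrence when the one-to-one constraint is violated are illustrative assumptions, not part of the paper.

```python
import re

# Matches one line of the form "(source_attr, target_attr)".
PAIR_RE = re.compile(r"^\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)$")


def parse_aligned_pairs(reply: str) -> list[tuple[str, str]]:
    """Parse '(a, b)' lines and enforce a strict one-to-one mapping.

    Hypothetical helper: lines that do not match the format are skipped,
    and a pair is dropped if either attribute was already used.
    """
    pairs = []
    used_src, used_tgt = set(), set()
    for line in reply.splitlines():
        match = PAIR_RE.match(line.strip())
        if match is None:
            continue  # explanations or malformed lines are ignored
        src, tgt = match.group(1), match.group(2)
        if src in used_src or tgt in used_tgt:
            continue  # would violate the one-to-one requirement
        used_src.add(src)
        used_tgt.add(tgt)
        pairs.append((src, tgt))
    return pairs


# Toy reply: the third line reuses "area", the fourth is free text.
reply = "(date_of_birth, birthDate)\n(area, totalArea)\n(area, birthYear)\nNote: done"
print(parse_aligned_pairs(reply))
```

Filtering on the consumer side is a defensive choice: even with the constraints stated in the prompt, a reply may contain duplicates or stray commentary.
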

| Dataset | Entity | Relation Triple | Attribute Triple | Image | Seed |
|---|---|---|---|---|---|
| FB15K | 14,951 | 592,213 | 29,395 | 13,444 | - |
| DB15K | 12,842 | 89,197 | 48,080 | 12,837 | 12,846 |
| YAGO15K | 15,404 | 122,886 | 23,532 | 11,194 | 11,119 |
| Modality | Raw Data Example |
|---|---|
| Entity Description | /m/027rn |
| Attribute | (/m/027rn, location.geocode.latitude, 19.0)… |
| ImageID | FBIMG07459 |
| Relation | (/m/027rn, country.form.of.government, /m/06cx9)… |

| Method | FB-DB15K Hit@1 | FB-DB15K Hit@5 | FB-DB15K Hit@10 | FB-DB15K MR | FB-DB15K MRR | FB-YAGO15K Hit@1 | FB-YAGO15K Hit@5 | FB-YAGO15K Hit@10 | FB-YAGO15K MR | FB-YAGO15K MRR |
|---|---|---|---|---|---|---|---|---|---|---|
| IPTransE | 0.0402 | 0.1110 | 0.1703 | 398.0461 | 0.0963 | 0.0315 | 0.0857 | 0.1410 | 553.2436 | 0.0712 |
| GCN-Align | 0.0421 | 0.1090 | 0.1590 | 843.6562 | 0.0810 | 0.0250 | 0.0760 | 0.1120 | 1201.6574 | 0.0593 |
| BootEA | 0.3230 | 0.4990 | 0.5790 | 205.5305 | 0.4100 | 0.2340 | 0.3740 | 0.4450 | 272.1206 | 0.3070 |
| MMEA | 0.2650 | 0.4510 | 0.5410 | 124.8072 | 0.3570 | 0.2340 | 0.3980 | 0.4800 | 147.4410 | 0.3170 |
| OTMEA | 0.4430 | - | 0.7120 | - | 0.5230 | 0.3950 | - | 0.6280 | - | 0.4770 |
| MCLEA | 0.4450 | 0.6400 | 0.7050 | 84.6284 | 0.5340 | 0.3880 | 0.5790 | 0.6410 | 123.3940 | 0.4740 |
| MEAformer | 0.5720 | 0.7437 | 0.8120 | 62.3348 | 0.6610 | 0.4440 | 0.6178 | 0.6920 | 86.2394 | 0.5290 |
| GSIEA | 0.5800 | 0.7600 | 0.7910 | - | 0.6490 | 0.5150 | 0.6730 | 0.7280 | - | 0.5880 |
| MSNEA | 0.6527 | 0.7685 | 0.8121 | 54.0251 | 0.7080 | 0.4429 | 0.6255 | 0.6983 | 85.0744 | 0.5290 |
| CDMEA | 0.6740 | - | 0.8610 | - | 0.7410 | 0.6230 | - | 0.8110 | - | 0.6900 |
| PCMEA | 0.6763 | 0.8214 | 0.8872 | 50.6283 | 0.7280 | 0.5896 | 0.7518 | 0.8347 | 89.5815 | 0.6460 |
| EVA | 0.5559 | 0.6664 | 0.7159 | 139.9956 | 0.6090 | 0.1026 | 0.2166 | 0.2779 | 616.7891 | 0.1640 |
| OPICE | 0.8093 | 0.8644 | 0.9090 | 28.8862 | 0.8440 | 0.6264 | 0.7562 | 0.7842 | 84.5213 | 0.6910 |
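
Hit@k, MR, and MRR in the table above are the standard ranking metrics for entity alignment. The following is a minimal Python sketch of how they are computed, assuming the correct counterpart of each test entity has already been assigned a 1-based rank among the candidates. The function name `ranking_metrics` and the toy ranks are illustrative and do not come from the paper.

```python
def ranking_metrics(ranks, ks=(1, 5, 10)):
    """Compute Hit@k, mean rank (MR), and mean reciprocal rank (MRR).

    ranks: 1-based rank of the correct target entity for each test pair.
    """
    if not ranks:
        raise ValueError("ranks must be non-empty")
    n = len(ranks)
    # Hit@k: fraction of test pairs whose correct match is ranked within the top k.
    metrics = {f"Hit@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # MR: average rank of the correct match.
    metrics["MR"] = sum(ranks) / n
    # MRR: average of the reciprocal ranks.
    metrics["MRR"] = sum(1.0 / r for r in ranks) / n
    return metrics


# Toy example with four test pairs (made-up ranks).
toy_ranks = [1, 2, 1, 20]
print(ranking_metrics(toy_ranks))
```

Higher Hit@k and MRR indicate better alignment, whereas a lower MR is better.
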

| Method | FB-DB15K Hit@1 | FB-DB15K Hit@10 | FB-DB15K MRR | FB-YAGO15K Hit@1 | FB-YAGO15K Hit@10 | FB-YAGO15K MRR |
|---|---|---|---|---|---|---|
| OPICE | 0.8081 | 0.9065 | 0.8440 | 0.6264 | 0.7842 | 0.6810 |
| w/o ICL | 0.5208 | 0.6979 | 0.5820 | 0.3543 | 0.5511 | 0.4210 |
| w/o MOCO | 0.6384 | 0.7484 | 0.6770 | 0.3398 | 0.5039 | 0.3960 |
| w/o clean | 0.7848 | 0.8792 | 0.8180 | 0.6031 | 0.7376 | 0.6500 |
| w/o orth | 0.7697 | 0.8742 | 0.8060 | 0.5331 | 0.6824 | 0.5850 |
| w/o IT | 0.4982 | 0.7018 | 0.5690 | 0.2610 | 0.4752 | 0.3340 |
| Method | w/ Ontology | w/o Ontology |
|---|---|---|
| Bi-NN(Top-1 ↔ Top-1) | ✓ | ✗ |
| Pseudo-label decision | Include | Exclude |
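
The Bi-NN (Top-1 ↔ Top-1) check in the table can be read as a mutual nearest-neighbor test: a pair is kept as a pseudo-label only when each entity is the other's top-1 match. The following is a minimal pure-Python sketch under that reading, assuming a precomputed similarity matrix `sim[i][j]` between source entity `i` and target entity `j`. The function name and toy scores are hypothetical, and how OPICE folds ontology information into the similarity is not reproduced here.

```python
def mutual_top1_pairs(sim):
    """Return (i, j) pairs where i and j are each other's top-1 match.

    sim: list of rows; sim[i][j] is the similarity between source
    entity i and target entity j (assumed precomputed).
    """
    if not sim or not sim[0]:
        return []
    n_src, n_tgt = len(sim), len(sim[0])
    # Best target for every source entity (row-wise argmax).
    best_tgt = [max(range(n_tgt), key=lambda j: sim[i][j]) for i in range(n_src)]
    # Best source for every target entity (column-wise argmax).
    best_src = [max(range(n_src), key=lambda i: sim[i][j]) for j in range(n_tgt)]
    # Keep a pair only if the top-1 choice holds in both directions.
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


# Toy scores: source 1 prefers target 0, but target 0 prefers source 0.
toy_sim = [
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.4],
    [0.1, 0.2, 0.7],
]
print(mutual_top1_pairs(toy_sim))
```

In the toy matrix, source entity 1 is excluded because its best target already prefers source entity 0, which mirrors the "Exclude" decision for a failed bidirectional check.
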
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Wang, Y.; Guo, Z.; Mu, Y.; Li, X.; Shao, L.; Mei, G.; Li, F. OPICE: Ontology-Guided Pseudo-Label Generation and Inter-Modal Consistency Enhancement for Self-Supervised Multi-Modal Entity Alignment. Electronics 2026, 15, 254. https://doi.org/10.3390/electronics15020254

