Analogical Reasoning with Multimodal Knowledge Graphs: Fine-Tuning Model Performance Based on LoRA
Abstract
1. Introduction
- (1) A low-rank fine-tuning adaptation module based on a rank-stable scaling factor is designed to apply LoRA fine-tuning to the MKGformer model, adapting it to the multimodal analogical reasoning task (see the sketch after this list).
- (2) A dynamic fine-tuning strategy adds a cue-embedding layer module that dynamically adjusts the model inputs, helping the model reach stronger results during training.
- (3) Experiments show that the method effectively fine-tunes the model for multimodal knowledge graph analogical reasoning, significantly improving performance over comparable methods in this field and providing a new solution for multimodal knowledge graph analogical reasoning applications.
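The two modules in contributions (1) and (2) can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class and parameter names (RSLoRALinear, CueEmbedding, num_cue_tokens) are invented for illustration, and the rank-stable scaling is assumed to follow the alpha / sqrt(r) convention of rank-stabilized LoRA.

```python
import math
import torch
import torch.nn as nn

class RSLoRALinear(nn.Module):
    """LoRA adapter around a frozen linear layer, using a rank-stable
    scaling factor alpha / sqrt(r) instead of the standard alpha / r.
    (Illustrative sketch; names and details are assumptions.)"""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pretrained weight stays frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / math.sqrt(r)  # rank-stable scaling factor

    def forward(self, x):
        # Frozen path plus scaled low-rank update B(Ax)
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

class CueEmbedding(nn.Module):
    """Learnable cue (prompt) embeddings prepended to the input sequence,
    so the model inputs are adjusted dynamically during fine-tuning."""
    def __init__(self, num_cue_tokens: int, hidden_size: int):
        super().__init__()
        self.cues = nn.Parameter(torch.randn(num_cue_tokens, hidden_size) * 0.02)

    def forward(self, token_embeds):          # token_embeds: (batch, seq, hidden)
        batch = token_embeds.size(0)
        cues = self.cues.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([cues, token_embeds], dim=1)
```

In a setup like this, RSLoRALinear would typically wrap the attention projection layers of the backbone, while CueEmbedding sits in front of the embedding output; both add only a small number of trainable parameters on top of the frozen model.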
2. Related Work
2.1. Analogical Reasoning with Multimodal Knowledge Graphs
2.2. Efficient Parameter Fine-Tuning Technology
3. R-MKG Modeling Approach
3.1. Task Definition
3.2. R-MKG Model
3.3. Cue-Embedded Layer Module
3.4. Low-Rank Fine-Tuning Adaptation Module Based on Rank-Stable Scaling Factors
4. Materials and Methods
4.1. Experimental Environment Configuration
4.2. Experimental Parameter Setting
4.3. Datasets
4.4. Assessment of Indicators
5. Experimental Evaluation
5.1. Comparative Experimental Design and Analysis
5.2. Ablation Experiment
5.3. Comparative Experiments with Different Datasets
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
LoRA | Low-Rank Adaptation |
LoRA-FA | Memory-Efficient Low-Rank Adaptation |
QLoRA | Quantized Low-Rank Adaptation |
KG | Knowledge Graph |
MKG | Multimodal Knowledge Graph |
NLP | Natural Language Processing |
DoRA | Weight-Decomposed Low-Rank Adaptation |
ELU | Exponential Linear Unit |
MRR | Mean Reciprocal Rank |
COCO | Common Objects in Context |
CC | Conceptual Captions |
Appendix A
References
- Wu, B.; Qin, H.; Zareian, A.; Vondrick, C.; Chang, S.-F. Analogical reasoning for visually grounded language acquisition. arXiv 2020, arXiv:2007.11668.
- Prade, H.; Richard, G. Analogical proportions: Why they are useful in AI. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI-21), Montreal, QC, Canada, 19–27 August 2021; pp. 4568–4576.
- Thagard, P. Analogy, explanation, and education. J. Res. Sci. Teach. 1992, 29, 537–544.
- Daugherty, J.L.; Mentzer, N. Analogical reasoning in the engineering design process and technology education applications. J. Technol. Educ. 2008, 19, 7–21.
- Turner, M. Categories and Analogies. In Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy; Helman, D.H., Ed.; Springer Science & Business Media: Dordrecht, The Netherlands, 2013; pp. 3–24.
- Jin, Z. Analyzing the role of semantic representations in the era of large language models. arXiv 2024, arXiv:2405.01502.
- Liang, L.; Li, Y.; Wen, M.; Liu, Y. KG4Py: A toolkit for generating Python knowledge graph and code semantic search. Conn. Sci. 2022, 34, 1384–1400.
- Yang, Z. Design and research of intelligent question-answering (Q&A) system based on high school course knowledge graph. Mobile Netw. Appl. 2021, 26, 1884–1890.
- Simianer, P.; Riezler, S.; Dyer, C. Joint feature selection in distributed stochastic learning for large-scale discriminative training in SMT. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Jeju Island, Republic of Korea, 8–14 July 2012; pp. 11–21.
- Luo, H.; Ji, L.; Shi, B.; Huang, H.; Duan, N.; Li, T.; Li, J.; Bharti, T.; Zhou, M. UniVL: A unified video and language pre-training model for multimodal understanding and generation. arXiv 2020, arXiv:2002.06353.
- Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
- Zhou, K.; Hassan, F.H.; Hoon, G.K. The state of the art for cross-modal retrieval: A survey. IEEE Access 2023, 11, 138568–138589.
- Zhang, N.; Li, L.; Chen, X.; Liang, X.; Deng, S.; Chen, H. Multimodal analogical reasoning over knowledge graphs. arXiv 2022, arXiv:2210.00312.
- Chen, X.; Zhang, N.; Li, L.; Deng, S.; Tan, C.; Xu, C.; Huang, F.; Si, L.; Chen, H. Hybrid transformer with multi-level fusion for multimodal knowledge graph completion. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 904–915.
- Jiang, Y.; Wang, S.; Valls, V.; Ko, B.; Lee, W.; Leung, K.K.; Tassiulas, L. Model pruning enables efficient federated learning on edge devices. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10374–10386.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the Tenth International Conference on Learning Representations, Virtual, 25–29 April 2022; pp. 1–3.
- Liu, S.; Wang, C.; Yin, H.; Molchanov, P.; Wang, Y.; Cheng, K.; Chen, M. DoRA: Weight-decomposed low-rank adaptation. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024.
- Dettmers, T.; Pagnoni, A.; Holtzman, A.; Zettlemoyer, L. QLoRA: Efficient finetuning of quantized LLMs. Adv. Neural Inf. Process. Syst. 2023, 36, 10088–10115.
- Zhang, L.; Zhang, L.; Shi, S.; Chu, X.; Li, B. LoRA-FA: Memory-efficient low-rank adaptation for large language models fine-tuning. arXiv 2023, arXiv:2308.03303.
- Lu, J.; Batra, D.; Parikh, D.; Lee, S. ViLBERT: Pretraining task-agnostic visiolinguistic representations for vision-and-language tasks. Adv. Neural Inf. Process. Syst. 2019, 32, 13–23.
- Singh, A.; Hu, R.; Goswami, V.; Couairon, G.; Galuba, W.; Rohrbach, M.; Kiela, D. FLAVA: A foundational language and vision alignment model. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15638–15650.
Parameter Name | Parameter Value |
---|---|
batch_size | 64 |
epoch | 15 |
sequence length | 128 |
LoRA rank | 8 |
LoRA scaling factor | 16 |
optimizer | AdamW |
learning rate | |
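A rough sketch of how the settings in this table could be wired together is shown below. It assumes the Hugging Face peft library and a BERT-style stand-in backbone, neither of which the table specifies; the target_modules list, lora_dropout, and the learning-rate value (blank in the table) are placeholders, not values from the paper.

```python
import torch
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

# Hypothetical placeholder: the paper's learning rate did not survive extraction.
learning_rate = 1e-4

# LoRA settings taken from the parameter table (rank 8, scaling factor 16);
# target_modules and lora_dropout are assumptions, not values from the paper.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "key", "value"],
    lora_dropout=0.1,
)

base_model = AutoModel.from_pretrained("bert-base-uncased")  # stand-in backbone
model = get_peft_model(base_model, lora_config)

optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)
# Remaining settings from the table: batch_size = 64, epochs = 15, sequence length = 128.
```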
LoRA Rank (r) | R-MKG Accuracy (%) | Performance Relative to Full Fine-Tuning (%) | Training Time (min) | FLOPs | Loss Volatility |
---|---|---|---|---|---|
4 | 87.3 | 97.5 | 42 | 1.2 | ±0.18 |
8 | 89.6 | 99.0 | 58 | 1.5 | ±0.08 |
16 | 89.8 | 99.1 | 82 | 2.1 | ±0.12 |
32 | 89.9 | 99.2 | 110 | 3.0 | ±0.15 |
Full fine-tuning | 90.5 | 100.0 | 121 | 10.0 | ±0.38 |
LoRA Scaling Factor (α) | R-MKG Accuracy (%) | Performance Relative to Full Fine-Tuning (%) | Training Time (min) | FLOPs | Loss Volatility |
---|---|---|---|---|---|
4 | 88.2 | 97.5 | 45 | 1.2 | ±0.18 |
8 | 89.1 | 98.4 | 47 | 1.5 | ±0.12 |
16 | 89.7 | 99.1 | 49 | 2.1 | ±0.09 |
32 | 89.8 | 99.2 | 52 | 3.0 | ±0.15 |
Full fine-tuning | 90.5 | 100.0 | 121 | 10.0 | ±0.37 |
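To make the scaling factor concrete with the settings used in these tables (r = 8, α = 16), the standard LoRA scaling and a rank-stabilized variant differ as shown below; equating the paper's "rank-stable scaling factor" with the α/√r form is an assumption on our part.

```latex
\Delta W_{\text{LoRA}} = \frac{\alpha}{r}\,BA, \qquad
\Delta W_{\text{rank-stable}} = \frac{\alpha}{\sqrt{r}}\,BA
% With r = 8 and \alpha = 16:
%   standard scaling:     \alpha / r        = 16 / 8         = 2
%   rank-stable scaling:  \alpha / \sqrt{r} = 16 / \sqrt{8} \approx 5.66
```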
Dataset | Size | KB | Modality | Entities | Relations | Images |
---|---|---|---|---|---|---|
 | 550 | NO | Text | 919 | 14 | NO |
E-KAR | 1,251 | NO | Text | 2032 | 28 | NO |
RAVEN | 70,000 | NO | Vision | NO | 8 | 1,120,000 |
MARS | 13,328 | MarKG | Vision + Text | 2063 | 27 | 13,398 |
Model | Hits@1 | Hits@3 | Hits@5 | Hits@10 | MRR |
---|---|---|---|---|---|
ViLBERT + LoRA | 0.279 | 0.320 | 0.341 | 0.355 | 0.325 |
ViLBERT + DoRA | 0.277 | 0.299 | 0.337 | 0.350 | 0.321 |
ViLBERT + QLoRA | 0.268 | 0.286 | 0.323 | 0.344 | 0.312 |
ViLBERT + LoRA-FA | 0.271 | 0.295 | 0.334 | 0.348 | 0.317 |
FLAVA + LoRA | 0.285 | 0.325 | 0.346 | 0.357 | 0.322 |
FLAVA + DoRA | 0.281 | 0.322 | 0.341 | 0.355 | 0.319 |
FLAVA + QLoRA | 0.291 | 0.333 | 0.355 | 0.361 | 0.331 |
FLAVA + LoRA-FA | 0.275 | 0.317 | 0.340 | 0.351 | 0.312 |
MKGformer + LoRA | 0.289 | 0.341 | 0.359 | 0.376 | 0.335 |
MKGformer + DoRA | 0.271 | 0.324 | 0.345 | 0.359 | 0.321 |
MKGformer + QLoRA | 0.282 | 0.333 | 0.354 | 0.370 | 0.329 |
MKGformer + LoRA-FA | 0.278 | 0.330 | 0.352 | 0.366 | 0.325 |
R-MKG (ours) | 0.332 | 0.385 | 0.408 | 0.437 | 0.387 |
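The tables report Hits@k and MRR for ranking the correct answer entity. For reference, a minimal sketch of how these metrics are typically computed from per-query ranks is given below; it follows the standard definitions and is not code from the paper.

```python
def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Mean of 1/rank over all queries (MRR)."""
    return sum(1.0 / r for r in ranks) / len(ranks)

# Example with hypothetical ranks of the correct entity for five queries:
ranks = [1, 3, 2, 10, 4]
print(hits_at_k(ranks, 1), hits_at_k(ranks, 10), mean_reciprocal_rank(ranks))
```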
Model | Hits@1 | Hits@10 | MRR |
---|---|---|---|
R-MKG | 0.332 | 0.437 | 0.387 |
w/o LoRA | 0.324 | 0.431 | 0.372 |
w/o Cue-Embedding Layer | 0.320 | 0.427 | 0.369 |
w/o LoRA + w/o Cue-Embedding Layer | 0.292 | 0.383 | 0.352 |
Dataset | Model | MRR | Hits@1 | Hits@10 |
---|---|---|---|---|
COCO | FLAVA + LoRA | 0.317 | 0.273 | 0.351 |
COCO | MKGformer + LoRA | 0.328 | 0.277 | 0.368 |
COCO | R-MKG | 0.370 | 0.316 | 0.423 |
CC | FLAVA + LoRA | 0.320 | 0.277 | 0.355 |
CC | MKGformer + LoRA | 0.331 | 0.282 | 0.373 |
CC | R-MKG | 0.375 | 0.322 | 0.427 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).