Multimodal Temporal Knowledge Graph Embedding Method Based on Mixture of Experts for Recommendation
Abstract
1. Introduction
- We propose a framework based on the mixture-of-experts (MoE) model that assigns an expert to each type of attribute information and learns the weights with which the experts' outputs are combined into the final representation, improving the accuracy of shopping recommendations.
- We exploit time-series information so that a user's shopping history better informs future recommendations.
- Experiments on two classic real-world datasets validate the effectiveness of the proposed method.
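The weighted combination of per-attribute experts named in the first contribution can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `softmax` and `moe_fuse`, the use of a softmax gate, and the toy embeddings are all assumptions, and in a trained model the gate scores would be produced by a learned gating network instead of being passed in by hand.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_fuse(expert_outputs: Sequence[Sequence[float]],
             gate_scores: Sequence[float]) -> List[float]:
    """Fuse expert embeddings as a gate-weighted sum.

    expert_outputs: one embedding per expert, all of the same length.
    gate_scores: one unnormalized score per expert (hypothetical gate output).
    """
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    for w, emb in zip(weights, expert_outputs):
        for i in range(dim):
            fused[i] += w * emb[i]
    return fused


# Toy example (made-up values): three experts with 2-d outputs.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Equal gate scores give every expert the same weight of 1/3.
fused = moe_fuse(experts, gate_scores=[0.0, 0.0, 0.0])
```

Raising one gate score relative to the others shifts the fused vector toward that expert's output, which is the sense in which the combination weights are "learned" per attribute.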
2. Related Work
2.1. Knowledge Graph Embedding Representation
2.1.1. Methods Based on Translation Distance
2.1.2. Methods Based on Bilinear Model
2.1.3. Methods Based on Neural Network
2.2. Knowledge-Graph-Based Recommendation
3. Methodology
3.1. MoE Embedding Module
Algorithm 1 Multimodal next POI recommendation with MoE-based knowledge graph.
3.2. Attribute Capture Enhancement Module
Subgraph Unit
Retrospection Unit
Speculation Unit
3.3. Judgment Module
4. Experiments
4.1. Experimental Setting
4.1.1. Dataset
4.1.2. Baselines
- FPMC [40]: It proposes the factorized personalized Markov chains (FPMC) method, which combines matrix factorization and Markov chains to address recommendation under limited data and improve recommendation performance.
- ST-RNN [41]: It proposes Spatial Temporal Recurrent Neural Networks (ST-RNN), which address the shortcomings of earlier methods in modeling continuous time intervals and geographical distances and improve location prediction performance.
- DeepMove [34]: It proposes the DeepMove attentional recurrent network, which addresses problems such as complex sequential transitions and data sparsity in human mobility prediction and improves both prediction performance and interpretability.
- LSTPM [33]: It proposes the Long-Term and Short-Term Preference Modeling (LSTPM) method, which addresses the tendency of existing recurrent neural network (RNN) methods for POI recommendation to neglect long-term and short-term preferences as well as geographical relations, and improves recommendation reliability.
- STAN [32]: It proposes the Spatio-Temporal Attention Network (STAN) for location recommendation, which accounts for associations between non-adjacent locations and non-consecutive visits that earlier methods ignore, and outperforms existing methods.
- MKGAT [30]: It proposes the Multimodal Knowledge Graph Attention Network (MKGAT), which addresses the fact that existing recommender systems ignore the diversity of data types in multimodal knowledge graphs, and improves recommendation quality.
- RE-NET [31]: It proposes the Relation Embedded deep model (RE-Net) for facial action unit (AU) detection, which addresses the suboptimal use of AU correlations and improves the performance of AU detection and AU intensity estimation.
- Mandari [9]: It proposes the Multimodal Temporal Knowledge Graph-aware Sub-graph Embedding approach (Mandari), which constructs a multimodal temporal knowledge graph, addresses multimodal data association and the dynamic modeling of user preferences, and improves the performance of next-POI recommendation.
- DOGE [26]: It utilizes large language models to interpret image information under the guidance of textual data, generating cross-modal features that effectively enhance the relationship between text and image modalities.
- STKG-PLM [27]: It integrates contrastive learning and prompt-tuned pre-trained language models to enhance next POI recommendation.
- Multi-KG4Rec [28]: It employs a modality fusion module to extract user modality preferences in a fine-grained manner.
- HM4SR [29]: It employs a hierarchical time-aware mixture-of-experts (MoE) framework with two-level gating and multitask learning strategies for multimodal sequential recommendation.
4.1.3. Evaluation Metrics
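The result tables report MRR and Hit@k. The sketch below shows how these two metrics are commonly computed from the 1-based rank of the ground-truth item in each test case. The helper names and the toy scores are hypothetical, and the pessimistic tie handling is an assumption; the exact evaluation protocol used in the experiments (candidate set, tie breaking) is not given in this section.

```python
from typing import List, Sequence


def rank_of_target(scores: Sequence[float], target: int) -> int:
    """1-based rank of the target item when items are sorted by score, descending.

    Ties are broken pessimistically: items tied with the target count as ranked ahead.
    """
    target_score = scores[target]
    ahead = sum(1 for i, s in enumerate(scores) if i != target and s >= target_score)
    return 1 + ahead


def mrr(ranks: Sequence[int]) -> float:
    """Mean reciprocal rank over all test cases."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test cases whose target is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy example (made-up scores): three test cases over three candidate items.
score_lists: List[List[float]] = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.5], [0.4, 0.6, 0.7]]
targets = [0, 2, 0]
ranks = [rank_of_target(s, t) for s, t in zip(score_lists, targets)]
print(ranks, mrr(ranks), hit_at_k(ranks, 1), hit_at_k(ranks, 2))
```

In this toy case the targets land at ranks 1, 2, and 3, so MRR is the mean of 1, 1/2, and 1/3, while Hit@1 and Hit@2 count one and two of the three cases respectively.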
4.2. Main Experiment Results and Analysis
4.3. Ablation Study
4.4. Hyperparameter Analysis
5. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Jin, W.; Mao, H.; Li, Z.; Jiang, H.; Luo, C.; Wen, H.; Han, H.; Lu, H.; Wang, Z.; Li, R.; et al. Amazon-M2: A Multilingual Multi-locale Shopping Session Dataset for Recommendation and Text Generation. In Advances in Neural Information Processing Systems 36; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: New York, NY, USA, 2023; pp. 8006–8026.
- Park, M.; Oh, J. Enhancing E-Commerce Recommendation Systems with Multiple Item Purchase Data: A Bidirectional Encoder Representations from Transformers-Based Approach. Appl. Sci. 2024, 14, 7255.
- Jiang, M.; Li, M.; Cao, W.; Yang, M.; Zhou, L. Multi-task convolutional deep neural network for recommendation based on knowledge graphs. Neurocomputing 2025, 619, 129136.
- Huang, C.; Yu, F.; Wan, Z.; Li, F.; Ji, H.; Li, Y. Knowledge graph confidence-aware embedding for recommendation. Neural Netw. 2024, 180, 106601.
- Zeng, J.; Wang, N.; Li, J. Knowledge-driven hierarchical intents modeling for recommendation. Expert Syst. Appl. 2025, 259, 125361.
- Wang, F.; Zhu, X.; Cheng, X.; Zhang, Y.; Li, Y. MMKDGAT: Multi-modal Knowledge graph-aware Deep Graph Attention Network for remote sensing image recommendation. Expert Syst. Appl. 2024, 235, 121278.
- Balloccu, G.; Boratto, L.; Fenu, G.; Marras, M.; Soccol, A. KGGLM: A Generative Language Model for Generalizable Knowledge Graph Representation Learning in Recommendation. In Proceedings of the 18th ACM Conference on Recommender Systems (RecSys ’24), Bari, Italy, 14–18 October 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1079–1084.
- Liu, X.; Liang, B.; Niu, J.; Sha, C.; Wu, D. Dual-graph co-representation learning for knowledge-Graph Enhanced Recommendation. In Proceedings of the ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Liu, X.; Li, X.; Cao, Y.; Zhang, F.; Jin, X.; Chen, J. Mandari: Multi-Modal Temporal Knowledge Graph-aware Sub-graph Embedding for Next-POI Recommendation. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 1529–1534.
- Zeng, W.; Li, M.; Xiong, W.; Tong, T.; Lu, W.J.; Tan, J.; Wang, R.; Huang, R. MPCViT: Searching for Accurate and Efficient MPC-Friendly Vision Transformer with Heterogeneous Attention. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 5029–5040.
- Teng, X.; Zhang, L.; Gao, P.; Yu, C.; Sun, S. BERT-Driven stock price trend prediction utilizing tokenized stock data and multi-step optimization approach. Appl. Soft Comput. 2025, 170, 112627.
- Yang, C.; Zheng, R.; Chen, X.; Wang, H. Content recommendation with two-level TransE predictors and interaction-aware embedding enhancement: An information seeking behavior perspective. Inf. Process. Manag. 2023, 60, 103402.
- Song, D.; Zhang, F.; Lu, M.; Yang, S.; Huang, H. DTransE: Distributed Translating Embedding for Knowledge Graph. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2509–2523.
- Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI’14), Québec, QC, Canada, 27–31 July 2014; AAAI Press: Washington, DC, USA, 2014; pp. 1112–1119.
- Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning Entity and Relation Embeddings for Knowledge Graph Completion. Proc. AAAI Conf. Artif. Intell. 2015, 29, 1.
- Nickel, M.; Tresp, V.; Kriegel, H. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML’11), 28 June–2 July 2011; Omnipress: Madison, WI, USA, 2011; pp. 809–816.
- Yang, B.; Yih, W.-t.; He, X.; Gao, J.; Deng, L. Embedding Entities and Relations for Learning and Inference in Knowledge Bases. arXiv 2015.
- Liang, S. Knowledge Graph Embedding Based on Graph Neural Network. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; pp. 3908–3912.
- Jiang, W.; Fu, Y.; Zhao, H.; Wan, J.; Pu, S. Graph Intention Neural Network for Knowledge Graph Reasoning. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8.
- Li, X.; Ma, J.; Yu, J.; Zhao, M.; Yu, M.; Liu, H.; Ding, W.; Yu, R. A structure-enhanced generative adversarial network for knowledge graph zero-shot relational learning. Inf. Sci. 2023, 629, 169–183.
- Wang, J.; Zhang, Y.; Sun, L.; Liu, Y.; Zhang, W.; Zhang, Y. Learning path design on knowledge graph by using reinforcement learning. In Proceedings of the 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Istanbul, Türkiye, 5–8 December 2023.
- Liu, X.; Yang, L.; Liu, Z.; Yang, M.; Wang, C.; Peng, H.; Yu, P.S. Knowledge Graph Context-Enhanced Diversified Recommendation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM ’24), Merida, Mexico, 4–8 March 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 462–471.
- Yang, Y.; Huang, C.; Xia, L.; Li, C. Knowledge Graph Contrastive Learning for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22), Madrid, Spain, 11–15 July 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1434–1443.
- Breitfuss, A.; Errou, K.; Kurteva, A.; Fensel, A. Representing emotions with knowledge graphs for movie recommendations. Future Gener. Comput. Syst. 2021, 125, 715–725.
- Liu, S.; Lu, L.; Wang, B. Mining User–Item Interactions via Knowledge Graph for Recommendation. ACM Trans. Recomm. Syst. 2025, 3, 32.
- Meng, F.; Meng, Z.; Jin, R.; Lin, R.; Wu, B. DOGE: LLMs-Enhanced Hyper-Knowledge Graph Recommender for Multimodal Recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; pp. 12399–12407.
- Chen, W.; Huang, H.; Zhang, Z.; Wang, T.; Lin, Y.; Chang, L.; Wan, H. Next-POI Recommendation via Spatial-Temporal Knowledge Graph Contrastive Learning and Trajectory Prompt. IEEE Trans. Knowl. Data Eng. 2025, 37, 3570–3582.
- Wang, J.; Xie, H.; Zhang, S.; Qin, S.J.; Tao, X.; Wang, F.L.; Xu, X. Multimodal fusion framework based on knowledge graph for personalized recommendation. Expert Syst. Appl. 2025, 268, 126308.
- Zhang, S.; Chen, L.; Shen, D.; Wang, C.; Xiong, H. Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation. In Proceedings of the ACM on Web Conference, Sydney, Australia, 28 April–2 May 2025; pp. 3672–3682.
- Sun, R.; Cao, X.; Zhao, Y.; Wan, J.; Zhou, K. Multi-modal knowledge graphs for recommender systems. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 1405–1414.
- Yang, H.; Yin, L. Re-net: A relation embedded deep model for AU occurrence and intensity estimation. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020; pp. 137–153.
- Luo, Y.; Liu, Q.; Liu, Z. Stan: Spatio-temporal attention network for next location recommendation. In Proceedings of the Web Conference, Ljubljana, Slovenia, 19–23 April 2021; pp. 2177–2185.
- Sun, K.; Qian, T.; Chen, T.; Liang, Y.; Nguyen, Q.V.H.; Yin, H. Where to go next: Modeling long- and short-term user preferences for point-of-interest recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 214–221.
- Feng, J.; Li, Y.; Zhang, C.; Sun, F.; Meng, F.; Guo, A.; Jin, D. Deepmove: Predicting human mobility with attentional recurrent networks. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 1459–1468.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 3982–3992.
- Boitel, E.; Mohasseb, A.; Haig, E. MIST: Multimodal emotion recognition using DeBERTa for text, Semi-CNN for speech, ResNet-50 for facial, and 3D-CNN for motion analysis. Expert Syst. Appl. 2025, 270, 126236.
- Yang, D.; Zhang, D.; Zheng, V.W.; Yu, Z. Modeling User Activity Preference by Leveraging User Spatial Temporal Characteristics in LBSNs. IEEE Trans. Syst. Man Cybern. Syst. 2015, 45, 129–142.
- Zhang, Y.; Ai, Q.; Chen, X.; Croft, W.B. Joint Representation Learning for Top-N Recommendation with Heterogeneous Information Sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (CIKM ’17), 6–10 November 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1449–1458.
- McAuley, J.; Leskovec, J. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM conference on Recommender systems (RecSys ’13), New York, NY, USA, 12–16 October 2013; pp. 165–172.
- Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
- Liu, Q.; Wu, S.; Wang, L.; Tan, T. Predicting the next location: A recurrent model with spatial and temporal contexts. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 194–200.
Method/Model | Modal Fusion | Temporal Modeling | Knowledge Graph Integration | MoE Mechanism |
---|---|---|---|---|
MTR (Ours) | ✓ Multimodal (text, image, structure) | ✓ Explicit temporal sequence modeling (retrospection and speculation) | ✓ KG-based structural and semantic modeling | ✓ Mixture of Experts (MoE) for dynamic modal weighting |
Mandari [9] | ✓ Multimodal (image, text, structure) | ✓ Timestamp-based user preference evolution | ✓ Subgraph embedding for KG reasoning | ✗ No MoE mechanism |
MKGAT [30] | ✓ Multimodal (image, text, structure) | ✗ No explicit temporal modeling | ✓ KG attention mechanism for graph enhancement | ✗ No MoE mechanism |
RE-NET [31] | ✗ Structural modality only | ✓ Temporal evolution modeling | ✓ Temporal relational graph modeling | ✗ No MoE mechanism |
STAN [32] | ✗ Spatial–temporal only | ✓ Spatial–temporal attention mechanism | ✗ No KG integration | ✗ No MoE mechanism |
LSTPM [33] | ✗ Spatial–temporal only | ✓ Long- and short-term preference modeling | ✗ No KG integration | ✗ No MoE mechanism |
DeepMove [34] | ✗ Spatial–temporal only | ✓ Attention-based RNN for sequence modeling | ✗ No KG integration | ✗ No MoE mechanism |
DOGE [26] | ✓ Cross-modal text–image understanding via LLM | ✗ No explicit temporal modeling | ✓ LLM-enhanced KG semantic understanding | ✗ No MoE mechanism |
STKG-PLM [27] | ✗ Spatial–temporal only | ✓ Novel spatiotemporal transition relations for POI transitions | ✓ Temporal KG construction | ✗ No MoE mechanism |
Multi-KG4Rec [28] | ✓ Multimodal fusion | ✗ No explicit temporal modeling | ✓ Personalized KG construction | ✗ No MoE mechanism |
HM4SR [29] | ✓ Multimodal fusion | ✓ Timestamp-aware dynamic interest modeling | ✗ No KG integration | ✓ Hierarchical time-aware MoE for multimodal fusion |
Model | NYC MRR | NYC Hit@1 | NYC Hit@5 | NYC Hit@10 | Yelp MRR | Yelp Hit@1 | Yelp Hit@5 | Yelp Hit@10 |
---|---|---|---|---|---|---|---|---|
FPMC | 0.1801 | 0.1003 | 0.2026 | 0.2854 | 0.0164 | 0.0122 | 0.0215 | 0.0295 |
ST-RNN | 0.2293 | 0.1681 | 0.2813 | 0.3352 | 0.0277 | 0.0173 | 0.0297 | 0.0411 |
DeepMove | 0.2651 | 0.2044 | 0.2898 | 0.3487 | 0.0325 | 0.0251 | 0.0358 | 0.0638 |
LSTPM | 0.3312 | 0.2552 | 0.3164 | 0.3509 | 0.0506 | 0.0363 | 0.0502 | 0.0774 |
STAN | 0.3253 | 0.2531 | 0.3672 | 0.4712 | 0.0657 | 0.0420 | 0.0686 | 0.0985 |
MKGAT | 0.4043 | 0.2474 | 0.3516 | 0.3958 | 0.0698 | 0.0344 | 0.0663 | 0.0966 |
RE-NET | 0.3812 | 0.3551 | 0.4246 | 0.4802 | 0.0702 | 0.0418 | 0.0678 | 0.1124 |
Mandari | 0.4668 | 0.4165 | 0.5234 | 0.5724 | 0.0803 | 0.0461 | 0.0731 | 0.1235 |
DOGE | 0.4663 | 0.4046 | 0.4877 | 0.5501 | 0.0845 | 0.0625 | 0.0798 | 0.1344 |
STKG-PLM | 0.4029 | 0.3998 | 0.4559 | 0.5207 | 0.0805 | 0.0463 | 0.0763 | 0.1313 |
Multi-KG4Rec | 0.5156 | 0.4681 | 0.5484 | 0.5864 | 0.0792 | 0.0444 | 0.0722 | 0.1280 |
HM4SR | 0.5172 | 0.4894 | 0.5780 | 0.5943 | 0.0872 | 0.0634 | 0.0837 | 0.1383 |
MTR | 0.5488 | 0.5171 | 0.5973 | 0.6194 | 0.0932 | 0.0645 | 0.0886 | 0.1452 |
Model | Recall@10 | Recall@20 |
---|---|---|
BPR | 0.0235 | 0.0367 |
LightGCN | 0.0363 | 0.0540 |
VBPR | 0.0293 | 0.0456 |
MMGCN | 0.0272 | 0.0331 |
GRCN | 0.0349 | 0.0529 |
DualGNN | 0.0363 | 0.0543 |
BM3 | 0.0437 | 0.0648 |
MGCN | 0.0439 | 0.0628 |
FREEDOM | 0.0382 | 0.0636 |
MTR | 0.0453 | 0.0681 |
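The table above reports Recall@10 and Recall@20. Unlike Hit@k with a single target, Recall@K is usually defined per user as the share of that user's held-out relevant items that appear in the top K. The sketch below uses this common definition; the function name and the toy item identifiers are hypothetical, and the evaluation protocol behind the table is not described in this section.

```python
from typing import Sequence, Set


def recall_at_k(ranked_items: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the relevant items that appear among the first k ranked items."""
    if not relevant:
        return 0.0
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / len(relevant)


# Toy example (made-up items): two relevant items, one of them ranked in the top 2.
ranked = ["a", "b", "c", "d"]
relevant_items = {"b", "d"}
print(recall_at_k(ranked, relevant_items, 2), recall_at_k(ranked, relevant_items, 4))
```

A dataset-level score is then typically the mean of this value over all test users.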
Dataset | Metric | Avg. Impr. | p-Value (t/W) | 95% CI | Cohen’s d | Conclusion |
---|---|---|---|---|---|---|
NYC | MRR | +5.53% | 0.007 | | 1.12 | Significant |
NYC | Hit@1 | +8.07% | 0.004 | | 1.25 | Significant |
Yelp | MRR | +2.11% | 0.021 | | 0.87 | Significant |
Yelp | Hit@1 | +1.70% | 0.038 | | 0.76 | Significant |
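The significance table lists a p-value and Cohen's d for each metric. As a rough sketch of what lies behind such numbers, the code below computes a paired-samples Cohen's d and the matching paired t statistic from per-run scores of two models. It assumes a paired design and the "mean difference over standard deviation of differences" form of Cohen's d, neither of which is confirmed by this section; the function name and the input numbers are arbitrary placeholders, not experimental results.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_effect(ours: Sequence[float], baseline: Sequence[float]) -> Tuple[float, float]:
    """Return (Cohen's d, t statistic) for paired samples.

    d = mean(diff) / sd(diff), using the sample standard deviation (n - 1).
    t = mean(diff) / (sd(diff) / sqrt(n)), with n - 1 degrees of freedom.
    """
    diffs = [o - b for o, b in zip(ours, baseline)]
    n = len(diffs)
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    cohens_d = mean_diff / sd_diff
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return cohens_d, t_stat


# Arbitrary placeholder inputs chosen so the differences are 1, 2, 3.
d, t = paired_effect([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
print(d, t)
```

Turning the t statistic into a p-value requires the Student t distribution; in practice a library routine such as `scipy.stats.ttest_rel` (paired t-test) or `scipy.stats.wilcoxon` (Wilcoxon signed-rank test) is the usual choice, which is presumably what the "t/W" in the column header refers to.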
Model | MRR | Hit@1 | Hit@5 | Hit@10 |
---|---|---|---|---|
Full model (original experiment) | 0.0924 | 0.0631 | 0.0873 | 0.1428 |
MoE replaced by simple concatenation | 0.0742 | 0.0335 | 0.0542 | 0.0928 |
Time-series modeling removed | 0.0776 | 0.0548 | 0.0724 | 0.1205 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, B.; Dong, G.; Li, Z.; Fang, Y.; Li, J.; Sun, W.; Zhang, B.; Li, C.; Li, X. Multimodal Temporal Knowledge Graph Embedding Method Based on Mixture of Experts for Recommendation. Mathematics 2025, 13, 2496. https://doi.org/10.3390/math13152496