Multi-Agent Transfer Learning Based on Contrastive Role Relationship Representation
Abstract
1. Introduction
- This study constructs a role relationship representation and proposes the MCRR framework: roles are clustered dynamically, their behavioral representations are extracted with contrastive learning, and cross-role interaction is strengthened through an attention mechanism, jointly optimizing role strategies and enabling transfer learning.
- We leverage role representations to realize more expressive credit assignment via an attention mechanism, promoting strategic coordination in a rich role space. Embedding cross-task role generalization into the mixing network enhances transfer learning and improves the coordination of agent roles across tasks (illustrative sketches of the contrastive objective and the attention weighting follow this list).
- In SMAC benchmark experiments, MCRR significantly outperforms baselines such as MATTAR and UPDeT in win rate on both source and unseen tasks, across task series with mixed unit formations and varying unit counts, verifying that the role mechanism serves as a bridge for knowledge transfer and enhances cross-task generalization.
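The contrastive role objective and the attention-based weighting summarized above are detailed in Sections 3.1 and 3.2. The two sketches below are minimal, hypothetical illustrations of the general recipe, not the authors' implementation; all module names, shapes, and hyperparameters are assumptions chosen for clarity.

First, an InfoNCE-style loss that pulls together embeddings of agents assigned to the same dynamically clustered role and pushes apart embeddings of different roles. Treating same-cluster agents as positives is an assumption about the typical shape of such objectives.

```python
import torch
import torch.nn.functional as F

def info_nce_role_loss(embeddings, role_ids, temperature=0.1):
    """InfoNCE-style contrastive loss over per-agent role embeddings.

    embeddings: (n_agents, d) role embeddings from a trajectory encoder.
    role_ids:   (n_agents,) cluster assignments; same-id agents are positives.
    """
    z = F.normalize(embeddings, dim=-1)                       # cosine-similarity space
    sim = z @ z.t() / temperature                             # (n, n) pairwise similarities
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (role_ids.unsqueeze(0) == role_ids.unsqueeze(1)) & ~eye

    sim = sim.masked_fill(eye, float("-inf"))                 # never contrast with oneself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    per_agent = -(log_prob * pos_mask).sum(dim=1) / pos_count
    return per_agent[pos_mask.any(dim=1)].mean()              # skip agents with no positive

# Toy usage: 8 agents, 3 roles.
z = torch.randn(8, 64)
roles = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2])
print(info_nce_role_loss(z, roles))
```

Second, a toy module that attends over per-agent role embeddings, conditioned on the global state, to produce non-negative per-agent weights that a value-mixing step could consume. The module name, layer sizes, and the way the attention context is scored are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RoleAttentionWeights(nn.Module):
    """Toy sketch: attend over role embeddings, conditioned on the global state,
    to produce per-agent weights for a value-mixing step (names are illustrative)."""

    def __init__(self, role_dim=64, state_dim=32, n_heads=4):
        super().__init__()
        self.state_to_query = nn.Linear(state_dim, role_dim)   # global state -> query
        self.attn = nn.MultiheadAttention(role_dim, n_heads, batch_first=True)
        self.score = nn.Linear(role_dim, 1)

    def forward(self, role_emb, state):
        # role_emb: (batch, n_agents, role_dim); state: (batch, state_dim)
        q = self.state_to_query(state).unsqueeze(1)            # (batch, 1, role_dim)
        ctx, _ = self.attn(q, role_emb, role_emb)              # context over all roles
        scores = self.score(role_emb * ctx)                    # (batch, n_agents, 1)
        return torch.softmax(scores, dim=1)                    # non-negative agent weights

# Toy usage: batch of 2 global states, 5 agents.
weights = RoleAttentionWeights()(torch.randn(2, 5, 64), torch.randn(2, 32))
print(weights.shape)  # torch.Size([2, 5, 1])
```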
2. Preliminaries
2.1. Problem Formulation
2.2. Task Relationship
3. Method
3.1. Contrastive Role Representation
3.2. Attention Role Collaboration
4. Experiments
4.1. Role Generalization in Unknown Tasks
4.2. Role-Aided Good Initialization for Policy Fine-Tuning
4.3. Dynamic Evolution of Role Representations
5. Related Works
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. Experimental Detail
| Map Name | Ally Units | Enemy Units | Difficulty |
|---|---|---|---|
| 1s8z | 1 Stalker, 8 Zealots | 1 Stalker, 8 Zealots | Easy |
| 1s9z | 1 Stalker, 9 Zealots | 1 Stalker, 9 Zealots | Easy |
| 2s3z | 2 Stalkers, 3 Zealots | 2 Stalkers, 3 Zealots | Easy |
| 2s8z | 2 Stalkers, 8 Zealots | 2 Stalkers, 8 Zealots | Easy |
| 2s9z | 2 Stalkers, 9 Zealots | 2 Stalkers, 9 Zealots | Easy |
| 3s5z | 3 Stalkers, 5 Zealots | 3 Stalkers, 5 Zealots | Easy |
| 3s5z_vs_3s6z | 3 Stalkers, 5 Zealots | 3 Stalkers, 6 Zealots | Super Hard |
| 7s3z | 7 Stalkers, 3 Zealots | 7 Stalkers, 3 Zealots | Easy |
| 3s5z_vs_3s7z | 3 Stalkers, 5 Zealots | 3 Stalkers, 7 Zealots | Extremely Hard |
| Map Name | Ally Units | Enemy Units | Difficulty |
|---|---|---|---|
| 3m | 3 Marines | 3 Marines | Easy |
| 4m | 4 Marines | 4 Marines | Easy |
| 4m_vs_5m | 4 Marines | 5 Marines | Hard |
| 5m | 5 Marines | 5 Marines | Easy |
| 5m_vs_6m | 5 Marines | 6 Marines | Hard |
| 6m | 6 Marines | 6 Marines | Easy |
| 6m_vs_7m | 6 Marines | 7 Marines | Hard |
| 7m | 7 Marines | 7 Marines | Easy |
| 7m_vs_8m | 7 Marines | 8 Marines | Hard |
| 8m | 8 Marines | 8 Marines | Easy |
| 8m_vs_9m | 8 Marines | 9 Marines | Hard |
| 9m | 9 Marines | 9 Marines | Easy |
| 9m_vs_10m | 9 Marines | 10 Marines | Easy |
| 10m | 10 Marines | 10 Marines | Easy |
| 10m_vs_11m | 10 Marines | 11 Marines | Hard |
| 10m_vs_12m | 10 Marines | 12 Marines | Super Hard |
| Map Name | Ally Units | Enemy Units | Difficulty |
|---|---|---|---|
| MMM0 | 1 Medivac, 2 Marauders, 5 Marines | 1 Medivac, 2 Marauders, 5 Marines | Easy |
| MMM | 1 Medivac, 2 Marauders, 7 Marines | 1 Medivac, 2 Marauders, 7 Marines | Easy |
| MMM1 | 1 Medivac, 1 Marauder, 7 Marines | 1 Medivac, 2 Marauders, 7 Marines | Hard |
| MMM2 | 1 Medivac, 2 Marauders, 7 Marines | 1 Medivac, 3 Marauders, 8 Marines | Super Hard |
| MMM3 | 1 Medivac, 2 Marauders, 8 Marines | 1 Medivac, 3 Marauders, 9 Marines | Super Hard |
| MMM4 | 1 Medivac, 3 Marauders, 8 Marines | 1 Medivac, 4 Marauders, 9 Marines | Super Hard |
| MMM5 | 1 Medivac, 3 Marauders, 8 Marines | 1 Medivac, 4 Marauders, 10 Marines | Super Hard |
| MMM6 | 1 Medivac, 3 Marauders, 8 Marines | 1 Medivac, 4 Marauders, 11 Marines | Super Hard |
Appendix A.2. Pseudo-Code
Algorithm A1: MCRR.
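As a purely illustrative companion to Algorithm A1 (not the authors' listing), the snippet below sketches one plausible form of the dynamic role clustering step described in the highlights: plain k-means over role embeddings, whose assignments could then define the positive pairs for a contrastive loss such as the InfoNCE sketch given earlier. The function name, the choice of k-means, and the number of roles are assumptions.

```python
import torch

def cluster_roles(z, k, iters=10):
    """Plain k-means over role embeddings; returns one role id per agent.
    Illustrative only: any clustering or role-assignment rule could stand in here."""
    centroids = z[torch.randperm(z.size(0))[:k]].clone()      # random initial centroids
    for _ in range(iters):
        assign = torch.cdist(z, centroids).argmin(dim=1)      # nearest-centroid assignment
        for j in range(k):
            members = z[assign == j]
            if len(members) > 0:                              # keep old centroid if cluster is empty
                centroids[j] = members.mean(dim=0)
    return assign

# Toy usage: 8 agents, 16-d embeddings, 3 roles.
torch.manual_seed(0)
print(cluster_roles(torch.randn(8, 16), k=3))
```

In a full training loop of the kind described in the highlights, these assignments would supply the positives for the contrastive objective, while the role embeddings condition the attention-weighted mixing network before a standard TD update.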
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar] [CrossRef]
- Todorov, E.; Erez, T.; Tassa, Y. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012. [Google Scholar]
- Terven, J. Deep reinforcement learning: A chronological overview and methods. AI 2025, 6, 46. [Google Scholar] [CrossRef]
- Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 2019, 575, 350–354. [Google Scholar] [CrossRef]
- Chen, D.; Chen, K.; Li, Z.; Chu, T.; Yao, R.; Qiu, F.; Lin, K. Powernet: Multi-agent deep reinforcement learning for scalable powergrid control. IEEE Trans. Power Syst. 2021, 37, 1007–1017. [Google Scholar] [CrossRef]
- Altun, H.O.; Ceran, H.F.; Metin, K.K.; Erol, T.; Fişne, E. Strategic Implementation of Super-Agents in Heterogeneous Multi-Agent Training for Advanced Military Simulation Adaptability. IEEE Access 2025, 13, 96544–96563. [Google Scholar] [CrossRef]
- Fereidooni, Z.; Palesi, L.I.; Nesi, P. Multi-Agent Optimizing Traffic Light Signals Using Deep Reinforcement Learning. IEEE Access 2025, 13, 106974–106988. [Google Scholar] [CrossRef]
- Foerster, J.; Assael, I.A.; De Freitas, N.; Whiteson, S. Learning to communicate with deep multi-agent reinforcement learning. Adv. Neural Inf. Process. Syst. 2016, 29, 2145–2153. [Google Scholar]
- Gupta, J.K.; Egorov, M.; Kochenderfer, M. Cooperative multi-agent control using deep reinforcement learning. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems, São Paulo, Brazil, 8–12 May 2017; Springer International Publishing: Cham, Switzerland, 2017. [Google Scholar]
- Wang, T.; Dong, H.; Lesser, V.; Zhang, C. ROMA: Multi-agent reinforcement learning with emergent roles. arXiv 2020, arXiv:2003.08039. [Google Scholar] [CrossRef]
- Wang, T.; Gupta, T.; Mahajan, A.; Peng, B.; Whiteson, S.; Zhang, C. RODE: Learning roles to decompose multi-agent tasks. arXiv 2020, arXiv:2010.01523. [Google Scholar] [CrossRef]
- Hu, Z.; Zhang, Z.; Li, H.; Chen, C.; Ding, H.; Wang, Z. Attention-guided contrastive role representations for multi-agent reinforcement learning. arXiv 2023, arXiv:2312.04819. [Google Scholar]
- Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; Springer International Publishing: Cham, Switzerland, 2018. [Google Scholar]
- Hu, S.; Zhu, F.; Chang, X.; Liang, X. UPDeT: Universal multi-agent reinforcement learning via policy decoupling with transformers. arXiv 2021, arXiv:2101.08001. [Google Scholar]
- Iqbal, S.; De Witt, C.A.; Peng, B.; Böhmer, W.; Whiteson, S.; Sha, F. Randomized entity-wise factorization for multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning PMLR 2021, Virtual, 18–24 July 2021. [Google Scholar]
- Samvelyan, M.; Rashid, T.; De Witt, C.S.; Farquhar, G.; Nardelli, N.; Rudner, T.G.; Hung, C.M.; Torr, P.H.; Foerster, J.; Whiteson, S. The StarCraft multi-agent challenge. arXiv 2019, arXiv:1902.04043. [Google Scholar]
- Qin, R.; Chen, F.; Wang, T.; Yuan, L.; Wu, X.; Kang, Y.; Zhang, Z.; Zhang, C.; Yu, Y. Multi-agent policy transfer via task relationship modeling. Sci. China Inf. Sci. 2024, 67, 182101. [Google Scholar] [CrossRef]
- Oliehoek, F.A.; Amato, C. A Concise Introduction to Decentralized POMDPs; Springer International Publishing: Cham, Switzerland, 2016; Volume 1. [Google Scholar]
- Oord, A.V.; Li, Y.; Vinyals, O. Representation learning with contrastive predictive coding. arXiv 2018, arXiv:1807.03748. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar] [CrossRef]
- Laskin, M.; Srinivas, A.; Abbeel, P. CURL: Contrastive unsupervised representations for reinforcement learning. In Proceedings of the International Conference on Machine Learning PMLR 2020, Virtual, 13–18 July 2020. [Google Scholar]
- Yuan, H.; Lu, Z. Robust task representations for offline meta-reinforcement learning via contrastive learning. In Proceedings of the International Conference on Machine Learning PMLR 2022, Baltimore, MD, USA, 17–23 July 2022. [Google Scholar]
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
- Rashid, T.; Samvelyan, M.; De Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic value function factorisation for deep multi-agent reinforcement learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
- Lhaksmana, K.M.; Murakami, Y.; Ishida, T. Role-based modeling for designing agent behavior in self-organizing multi-agent systems. Int. J. Softw. Eng. Knowl. Eng. 2018, 28, 79–96. [Google Scholar]
- Xia, Y.; Zhu, J.; Zhu, L. Dynamic role discovery and assignment in multi-agent task decomposition. Complex Intell. Syst. 2023, 9, 6211–6222. [Google Scholar] [CrossRef]
- Cao, J.; Yuan, L.; Wang, J.; Zhang, S.; Zhang, C.; Yu, Y.; Zhan, D.C. LINDA: Multi-agent local information decomposition for awareness of teammates. Sci. China Inf. Sci. 2023, 66, 182101. [Google Scholar] [CrossRef]
- Yang, J.; Borovikov, I.; Zha, H. Hierarchical cooperative multi-agent reinforcement learning with skill discovery. arXiv 2019, arXiv:1912.03558. [Google Scholar]
- Yuan, L.; Wang, C.; Wang, J.; Zhang, F.; Chen, F.; Guan, C.; Zhang, Z.; Zhang, C.; Yu, Y. Multi-Agent Concentrative Coordination with Decentralized Task Representation. In Proceedings of the IJCAI 2022, Vienna, Austria, 23–29 July 2022. [Google Scholar]
- Zeng, X.; Peng, H.; Li, A. Effective and stable role-based multi-agent collaboration by structural information principles. Proc. AAAI Conf. Artif. Intell. 2023, 37, 11772–11780. [Google Scholar] [CrossRef]
- Liu, B.; Liu, Q.; Stone, P.; Garg, A.; Zhu, Y.; Anandkumar, A. Coach-player multi-agent reinforcement learning for dynamic team composition. In Proceedings of the International Conference on Machine Learning PMLR 2021, Virtual, 18–24 July 2021. [Google Scholar]
- Liu, Y.; Hu, Y.; Gao, Y.; Chen, Y.; Fan, C. Value Function Transfer for Deep Multi-Agent Reinforcement Learning Based on N-Step Returns. In Proceedings of the IJCAI 2019, Macao, China, 10–16 August 2019. [Google Scholar]
- Niu, L.; Liang, W.; Tao, J.; Zhou, W.; Yan, H. Multi-agent reinforcement learning policy transfer by buffer. In Proceedings of the 2021 7th International Conference on Big Data and Information Analytics (BigDIA), Chongqing, China, 29–31 October 2021. [Google Scholar]
- Bo, C.; Liu, S.; Liu, Y.; Guo, Z.; Wang, J.; Xu, J. Research on Isomorphic Task Transfer Algorithm Based on Knowledge Distillation in Multi-Agent Collaborative Systems. Sensors 2024, 24, 4741. [Google Scholar] [CrossRef] [PubMed]
- Shi, H.; Li, J.; Mao, J.; Hwang, K.S. Lateral transfer learning for multiagent reinforcement learning. IEEE Trans. Cybern. 2021, 53, 1699–1711. [Google Scholar] [CrossRef] [PubMed]



Win rates (mean ± std) on the Stalker–Zealot (sz) task series.

| Task Type | Task | Ours | w/o Role | MATTAR | UPDeT-b | UPDeT-m | REFIL |
|---|---|---|---|---|---|---|---|
| Source tasks | 2s3z | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.94 ± 0.04 | 0.60 ± 0.11 | 0.75 ± 0.09 |
| Source tasks | 3s5z | 1.00 ± 0.00 | 0.96 ± 0.04 | 0.99 ± 0.01 | 0.86 ± 0.13 | 0.47 ± 0.15 | 0.43 ± 0.13 |
| Source tasks | 3s5z_vs_3s6z | 0.83 ± 0.07 | 0.73 ± 0.13 | 0.48 ± 0.13 | 0.09 ± 0.08 | 0.03 ± 0.03 | 0.01 ± 0.01 |
| Unseen tasks | 1s8z | 0.90 ± 0.06 | 0.86 ± 0.07 | 0.79 ± 0.09 | 0.16 ± 0.11 | 0.08 ± 0.06 | 0.08 ± 0.04 |
| Unseen tasks | 1s9z | 0.94 ± 0.01 | 0.83 ± 0.11 | 0.60 ± 0.12 | 0.11 ± 0.10 | 0.04 ± 0.04 | 0.03 ± 0.01 |
| Unseen tasks | 2s8z | 0.96 ± 0.03 | 0.98 ± 0.04 | 0.93 ± 0.09 | 0.29 ± 0.22 | 0.14 ± 0.12 | 0.08 ± 0.05 |
| Unseen tasks | 2s9z | 0.92 ± 0.03 | 0.85 ± 0.13 | 0.84 ± 0.04 | 0.15 ± 0.13 | 0.06 ± 0.05 | 0.05 ± 0.04 |
| Unseen tasks | 7s3z | 0.42 ± 0.12 | 0.35 ± 0.08 | 0.16 ± 0.12 | 0.02 ± 0.04 | 0.01 ± 0.01 | 0.06 ± 0.04 |
Win rates (mean ± std) on the MMM task series.

| Task Type | Task | Ours | w/o Role | MATTAR | UPDeT-b | UPDeT-m | REFIL |
|---|---|---|---|---|---|---|---|
| Source tasks | MMM | 1.00 ± 0.00 | 0.96 ± 0.03 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.48 ± 0.03 | 0.97 ± 0.01 |
| Source tasks | MMM2 | 0.96 ± 0.03 | 0.83 ± 0.10 | 0.92 ± 0.20 | 0.78 ± 0.04 | 0.15 ± 0.19 | 0.04 ± 0.02 |
| Source tasks | MMM6 | 0.27 ± 0.03 | 0.15 ± 0.04 | 0.09 ± 0.02 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
| Unseen tasks | MMM0 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.98 ± 0.02 | 0.73 ± 0.21 | 0.30 ± 0.16 | 0.93 ± 0.02 |
| Unseen tasks | MMM1 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.97 ± 0.04 | 0.84 ± 0.07 | 0.27 ± 0.13 | 0.38 ± 0.06 |
| Unseen tasks | MMM3 | 0.95 ± 0.14 | 0.75 ± 0.13 | 0.86 ± 0.10 | 0.57 ± 0.15 | 0.28 ± 0.08 | 0.12 ± 0.04 |
| Unseen tasks | MMM4 | 1.00 ± 0.00 | 0.80 ± 0.13 | 0.93 ± 0.12 | 0.41 ± 0.14 | 0.20 ± 0.07 | 0.06 ± 0.03 |
| Unseen tasks | MMM5 | 0.62 ± 0.08 | 0.54 ± 0.03 | 0.47 ± 0.15 | 0.00 ± 0.00 | 0.00 ± 0.00 | 0.00 ± 0.00 |
Win rates (mean ± std) on the Marine (m) task series.

| Task Type | Task | Ours | w/o Role | MATTAR | UPDeT-b | UPDeT-m | REFIL |
|---|---|---|---|---|---|---|---|
| Source tasks | 5m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.77 ± 0.09 | 0.73 ± 0.03 |
| Source tasks | 5m_vs_6m | 0.76 ± 0.07 | 0.77 ± 0.11 | 0.72 ± 0.05 | 0.93 ± 0.05 | 0.32 ± 0.03 | 0.00 ± 0.00 |
| Source tasks | 8m_vs_9m | 0.86 ± 0.04 | 0.96 ± 0.04 | 0.83 ± 0.05 | 0.81 ± 0.19 | 0.35 ± 0.05 | 0.01 ± 0.01 |
| Source tasks | 10m_vs_11m | 0.93 ± 0.03 | 0.91 ± 0.09 | 0.81 ± 0.09 | 0.94 ± 0.04 | 0.43 ± 0.02 | 0.03 ± 0.02 |
| Unseen tasks | 3m | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.94 ± 0.27 | 0.81 ± 0.08 | 0.36 ± 0.04 | 0.68 ± 0.06 |
| Unseen tasks | 4m | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.97 ± 0.02 | 0.95 ± 0.06 | 0.57 ± 0.03 | 0.74 ± 0.02 |
| Unseen tasks | 4m_vs_5m | 0.18 ± 0.05 | 0.14 ± 0.03 | 0.04 ± 0.05 | 0.29 ± 0.17 | 0.10 ± 0.06 | 0.00 ± 0.00 |
| Unseen tasks | 6m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.91 ± 0.09 | 0.71 ± 0.02 |
| Unseen tasks | 6m_vs_7m | 0.90 ± 0.03 | 0.86 ± 0.05 | 0.74 ± 0.15 | 0.78 ± 0.05 | 0.35 ± 0.10 | 0.01 ± 0.00 |
| Unseen tasks | 7m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.99 ± 0.01 | 0.92 ± 0.03 | 0.66 ± 0.03 |
| Unseen tasks | 7m_vs_8m | 0.89 ± 0.09 | 0.82 ± 0.07 | 0.83 ± 0.04 | 0.73 ± 0.11 | 0.38 ± 0.05 | 0.01 ± 0.01 |
| Unseen tasks | 8m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.99 ± 0.02 | 0.83 ± 0.05 | 0.63 ± 0.05 |
| Unseen tasks | 9m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.99 ± 0.01 | 0.66 ± 0.11 | 0.55 ± 0.05 |
| Unseen tasks | 9m_vs_10m | 0.89 ± 0.05 | 1.00 ± 0.00 | 0.84 ± 0.09 | 0.80 ± 0.16 | 0.33 ± 0.09 | 0.01 ± 0.00 |
| Unseen tasks | 10m | 1.00 ± 0.00 | 1.00 ± 0.00 | 1.00 ± 0.00 | 0.99 ± 0.01 | 0.17 ± 0.08 | 0.46 ± 0.02 |
| Unseen tasks | 10m_vs_12m | 0.18 ± 0.07 | 0.12 ± 0.02 | 0.07 ± 0.01 | 0.07 ± 0.04 | 0.03 ± 0.02 | 0.00 ± 0.00 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.