Abstract
In intelligent multi-agent systems, particularly drone combat scenarios, rapidly changing environments and incomplete information significantly hinder effective strategy optimization. Traditional multi-agent reinforcement learning (MARL) approaches often struggle to adapt to dynamic adversarial environments, especially when enemy strategies evolve continuously, which complicates agents’ ability to respond effectively. To address these challenges, this paper introduces MADDPG-SASP, an enhanced MARL framework that integrates an improved self-attention mechanism and self-play into the MADDPG algorithm to facilitate superior strategy optimization. The self-attention mechanism enables agents to adaptively extract critical environmental features, improving both the speed and accuracy of perception and decision-making. Concurrently, the self-play mechanism iteratively refines agent strategies through continuous adversarial interactions, strengthening the stability and flexibility of their responses. Empirical results show that after 600 rounds, the win rate of agents trained with this framework rises from 26.17% under the original MADDPG to 100%. Comparative experiments further validate the method, demonstrating considerable advantages in strategy optimization and agent performance in complex, dynamic environments. Moreover, in the Predator–Prey combat scenario, when the enemy side employs a multi-agent strategy, the drone agents achieve win rates of 98.5% and 100%.