Article

Optimization of Multi-Agent Strategies for UAV Adversarial Tasks Based on MADDPG-SASP

1 School of Electronic and Electrical Engineering, Wuhan Textile University, Wuhan 430200, China
2 School of Information Science and Engineering, Xinjiang College of Science and Technology, Korla 841000, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Information 2025, 16(12), 1050; https://doi.org/10.3390/info16121050
Submission received: 21 October 2025 / Revised: 26 November 2025 / Accepted: 27 November 2025 / Published: 1 December 2025

Abstract

In intelligent multi-agent systems, particularly in drone combat scenarios, the challenges posed by rapidly changing environments and incomplete information significantly hinder effective strategy optimization. Traditional multi-agent reinforcement learning (MARL) approaches often struggle to adapt to the dynamic nature of adversarial environments, especially when enemy strategies evolve continuously, which complicates agents' ability to respond effectively. To address these challenges, this paper introduces an enhanced MARL framework, MADDPG-SASP, which integrates an improved self-attention mechanism and self-play into the MADDPG algorithm, thereby facilitating superior strategy optimization. The self-attention mechanism enables agents to adaptively extract critical environmental features, enhancing both the speed and accuracy of perception and decision-making. Concurrently, the adaptive self-play mechanism iteratively refines agent strategies through continuous adversarial interactions, bolstering the stability and flexibility of their responses. Empirical results indicate that after 600 rounds, the win rate of agents employing this framework rose substantially, from 26.17% with the original MADDPG to 100%. Further validation through comparative experiments underscores the method's efficacy, demonstrating considerable advantages in strategy optimization and agent performance in complex, dynamic environments. Moreover, in the predator–prey combat scenario, when the enemy side employs a multi-agent strategy, the win rate of the drone agents reaches 98.5% and 100%.

1. Introduction

Reinforcement learning (RL) is a fundamental and pivotal subfield of machine learning, demonstrating substantial promise in addressing complex, dynamic decision-making problems in recent years. In single-agent settings, RL has been extensively applied to areas such as robotic control, game-playing agents, and drone path planning. However, real-world applications frequently involve multi-agent interactions, which introduce complexities that far exceed those of single-agent scenarios. Although traditional single-agent methods, including Q-learning [1], Policy Gradient [2], and Actor–Critic [3], serve as the theoretical underpinnings of multi-agent reinforcement learning (MARL) [4], their direct application to multi-agent systems often leads to issues such as non-stationarity and limited observability, which result in unstable and suboptimal performance. These challenges have spurred the evolution of MARL, with its central goal being to foster cooperation and strategic decision-making among agents through advanced policy optimization techniques, ultimately facilitating the efficient emergence of collective intelligence.
In multi-agent systems, agents must operate under partial observability, substantially compounding the complexity of sequential decision-making. The core challenges inherent to these domains can be summarized as follows: (1) Environmental non-stationarity and partial observability: The confluence of incomplete information and a dynamically shifting environment—primarily due to the simultaneous learning and adaptation of other agents—fundamentally undermines the predictability of state transitions and reward signals. (2) Curse of dimensionality: The joint state-action space grows exponentially with the number of agents, rendering traditional exploration and optimization techniques computationally intractable and inefficient. (3) Mixed incentive structures: The complex interplay between cooperative and competitive objectives among agents introduces profound strategic uncertainty, necessitating sophisticated mechanisms for credit assignment and equilibrium selection to facilitate effective decentralized policies.
To address these challenges, significant advancements have been made in deep MARL. A cornerstone of this progress is the centralized training with decentralized execution (CTDE) paradigm. The multi-agent deep deterministic policy gradient (MADDPG) algorithm [5], an extension of the single-agent deep deterministic policy gradient (DDPG) [6], is a seminal CTDE approach that stabilizes learning by leveraging a centralized critic to mitigate non-stationarity. For value function estimation, techniques from single-agent RL such as the twin delayed deep deterministic policy gradient (TD3) [7] are often incorporated to reduce overestimation bias through dual critics and delayed policy updates. Beyond actor–critic methods, architectures equipped with self-attention mechanisms have emerged, enabling agents to dynamically weigh the importance of information from other agents and the environment, thus enhancing coordination under partial observability.
The advancement of drone technology has created new applications for MARL. In dynamic 3D environments, drones must perform complex tasks such as target tracking, obstacle avoidance, and combat. Traditional methods such as rule-based systems or model predictive control (MPC) [8] lack adaptability in these high-dimensional scenarios. In contrast, reinforcement learning (RL) enables adaptive decision-making through interactive learning, demonstrating stronger generalization and robustness. For example, MADDPG has been used for drone formation control, while TD3 excels in single-drone trajectory optimization. However, a key challenge remains in multi-drone adversarial tasks: efficiently extracting relevant features from high-dimensional observations to enhance combat effectiveness, which requires further investigation.
The rapid advancement of the low-altitude economy has created extensive opportunities for the application of unmanned aerial vehicle (UAV)-related technologies. In this context, aerial surveillance represents a critical application scenario, which demands precise positioning and safe navigation of UAVs under dynamic disturbances. Such requirements have accelerated the development of semantic perception and active navigation technologies. The deep reinforcement learning-based semantic perception path planning framework introduced in [9] improves UAV navigation robustness by evaluating the perceptual value of scene semantic information, thereby offering reliable technical support for aerial surveillance tasks.
Meanwhile, as a key focus within the low-altitude economy, Integrated Sensing and Communication (ISAC) overcomes the limitations of conventional systems where communication and sensing functions are segregated. In this framework, UAVs are capable of not only delivering communication services to ground users but also performing sensing tasks in target areas [10]. This integrated approach demonstrates significant potential in emergency communications and intelligent surveillance applications. For example, in remote regions, it facilitates simultaneous data transmission, environmental monitoring, and target tracking. The joint maneuvering and beamforming design proposed in that work capitalizes on the Line-of-Sight (LoS) link advantages and agile mobility of UAVs. This approach maintains stable sensing performance while optimizing communication throughput, thereby establishing a foundation for the engineering implementation of ISAC technology. Nevertheless, several challenges persist, including difficulties in dynamically adjusting semantic weights and insufficient system robustness.
To address the challenges of complex multi-agent collaborative decision-making, this paper proposes an innovative MARL framework. The core contribution lies in the organic integration of an enhanced self-attention mechanism with a self-play strategy via a learnable weighting function. In this framework, the adversarial agent adopts the TD3 algorithm as its core strategy, thereby imposing stricter robustness requirements on our approach. Consequently, the proposed architecture builds upon the MADDPG baseline. The incorporated self-attention module accurately captures dynamic inter-agent interactions, while the self-play mechanism enhances policy complexity by having agents compete against their historical versions trained with TD3. A dynamic weighted fusion scheme adaptively balances the contributions of both techniques in challenging settings such as those with TD3-based adversaries. This design leads to significant improvements in collaborative learning efficiency and final policy stability.
In order to deal with multiple factors in the three-dimensional adversarial task, this study designs a comprehensive reward function that incorporates distance, victory, and height difference rewards. This approach ensures that the agent not only considers horizontal distance but also accounts for the impact of height differences, thereby broadening the decision-making scope within a multidimensional space. By optimizing the reward mechanism, the agent demonstrates enhanced learning efficiency and improved decision-making capabilities in the air combat task.
The main contributions of this study are as follows:
(1)
Enhancement of perception ability via self-attention mechanism: In this work, the self-attention mechanism is integrated into the Actor component of the MADDPG framework, improving the agent’s ability to perceive key environmental factors by dynamically weighting the important features in the state. This mechanism enables the agent to adapt its strategy in response to environmental changes, thereby enhancing performance in complex adversarial tasks.
(2)
Optimization of multi-agent antagonistic strategies: This study combines an enhanced MADDPG algorithm with the self-attention mechanism to optimize the agent’s antagonistic strategies through multi-agent self-play training. Over 600 training rounds, the agent’s win rate improved from 26.17% to 100%, demonstrating that the optimization process enables the agent to gain a significant advantage in complex confrontational tasks. Notably, when the enemy’s strategy changes more drastically, the agent’s adaptability and stability are markedly improved.
(3)
Efficient adversarial training framework: This paper proposes an optimized MADDPG approach that incorporates self-attention and self-play mechanisms to construct decision-making models for unmanned aerial combat agents. MADDPG serves as the foundational decision-making framework for the agents, augmented by a self-attention module integrated into the Actor network. By computing self-attention weights across state features, the proposed method enhances the agent's capacity to extract critical information from the environment. In parallel, an automated strategy pool is established to facilitate adversarial training. This pool engages in self-play by competing against historical policy samples with a preset probability of 30%, and against policies generated by the TD3 algorithm with a preset probability of 70%. Furthermore, linear weighting coefficients α and β are introduced to balance the contributions of the self-attention mechanism in feature extraction and the self-play mechanism in adversarial training, respectively. Through this dual-weighting strategy, the proposed framework collectively improves the adaptability and stability of agent decision-making in dynamic and adversarial scenarios.

2. Related Work

In the domain of unmanned aerial vehicle (UAV) control leveraging deep reinforcement learning (DRL), extensive research efforts have been undertaken, which can be systematically classified into four principal algorithmic paradigms: value-based methods, policy gradient methods, actor–critic frameworks, and model-based approaches. Each paradigm offers distinct advantages while encountering its own set of technical limitations and practical challenges.
Value-based methods optimize an agent’s policy by approximating a cumulative reward function. Representative techniques include Q-learning and its numerous deep learning extensions. For instance, Duan employed deep Q-networks (DQNs) to model multi-UAV cooperative air combat, iteratively refining joint tactics through Q-value updates [11]. Wang applied the DQN framework to path planning in solar-powered UAVs, emphasizing energy allocation strategies to ensure operational continuity under dynamic environmental conditions [12].
Policy gradient methods directly parameterize and optimize the policy, bypassing explicit Q-value estimation. Shen et al. introduced a multi-objective optimization framework based on the golden turtle algorithm for multi-UAV cooperative path planning, where integration with policy gradient techniques enhanced planning efficiency [13].
Actor–critic methods have gained prominence in MARL but often exhibit instability in multi-UAV, IoT, and other distributed systems, particularly when coordination across agents is required in dynamic, partially observable environments. Chen et al. developed an autonomous multi-UAV path planning scheme under incomplete information, achieving improved efficiency through actor–critic optimization [14]. Sun et al. enhanced deep deterministic policy gradient (DDPG) methods for IoT resource allocation, improving system stability [15]. Li et al. introduced FS-DDPG, a safety-constrained DRL method for optimal fan cooling system control [16], while Lu et al. integrated KNN-DDPG for energy-efficient joint computation and trajectory planning [17].
Model-based methods—primarily applied in planning—are extensively used in UAV path optimization to enhance computational efficiency and control precision. Polyakov applied nonlinear feedback control to develop a time-stable UAV control mechanism, improving both response speed and accuracy [18]. Labbadi et al. proposed a fractional-order global sliding mode control method to counteract disturbances and uncertainties [19]. Zeng and Zhang combined trajectory optimization with DRL to balance communication quality and energy efficiency in UAV networks [20].
Despite these advancements, significant challenges remain for DRL-based UAV control in complex, multi-agent environments. Even state-of-the-art frameworks, such as MADDPG, face unresolved issues in strategic interaction modeling, particularly in adversarial and dynamically evolving contexts. Common limitations include suboptimal inter-agent coordination, limited generalization of learned strategies, and low efficiency in transferring decision-making policies.
To address these challenges, this study proposes a novel optimization framework that integrates attention mechanisms with self-play training, aimed at enhancing MADDPG performance in high-complexity environments. The attention mechanism enables agents to prioritize critical decision variables, thereby refining the policy generation process. Simultaneously, self-play facilitates continuous adaptation by exposing agents to diverse adversarial strategies, improving robustness and adaptability in dynamic operational scenarios. This integrated approach is particularly relevant to real-world applications such as UAV collaborative task offloading and path optimization, where high decision accuracy, coordination efficiency, and adaptability are paramount. This method is called MADDPG-SASP.

3. Our Method

The framework of the proposed approach is illustrated in Figure 1, which incorporates a self-attention mechanism and a self-play module into the intelligent agent, based on an improved MADDPG algorithm. From the perspective of strategy optimization, the enhanced MADDPG algorithm, self-play mechanism, and attention mechanism collaborate synergistically to improve the performance of the agent’s strategy. The improved MADDPG algorithm utilizes a centralized training and decentralized execution mechanism, enabling the agent to continuously adapt within the environment. The introduction of the self-play mechanism allows the agent to engage in interactions with itself, generating adversarial data for further strategy optimization. Additionally, the self-attention mechanism dynamically weights input features, guiding the agent to focus on the most relevant decision-making information. This combination results in more accurate and efficient strategic decisions. Overall, this strategy optimization framework enables the intelligent agent to achieve more effective strategy refinement during gameplay, thereby enhancing its success rate.
This method is called MADDPG-SASP, and the modules in this strategy will be described in detail in Section 3.1, Section 3.2, Section 3.3 and Section 3.4.

3.1. Improved MADDPG Algorithm

Figure 2 depicts the improved MADDPG algorithm presented in this paper. Compared with the original MADDPG algorithm, it employs an adaptive target network update method to avoid policy oscillations.

3.1.1. Actor Critic Framework

In the Improved MADDPG algorithm, the actor and the critic are responsible for the strategy and evaluation components, respectively. Specifically, the actor maps a given state $s$ to an action $a$ via the policy function $\pi_{\theta}(s)$, while the critic evaluates the quality of taking action $a$ in state $s$ through the Q-value function $Q_{w}(s, a)$. This Q-value function is updated according to the Bellman equation [21]:

$$Q_{w}(s, a) = \mathbb{E}\left[ R_t + \gamma Q_{w'}(s', a') \right]$$

where $R_t$ denotes the immediate reward obtained at time step $t$ and $\gamma$ is the discount factor, which quantifies the importance the agent places on future rewards: a smaller $\gamma$ indicates that the agent prioritizes immediate rewards, while a larger $\gamma$ places more weight on long-term rewards.

$Q_{w'}(s', a')$ represents the Q-value function of the target critic, which is used to compute the target Q-value, and $w'$ denotes the parameters of the target critic network.

3.1.2. Centralized Training and Decentralized Execution

The core of the MADDPG algorithm lies in centralized training and decentralized execution. During the training phase, the policy and Q-value functions of each agent are iteratively optimized by utilizing the states and actions of all other agents. However, during execution, each agent makes decisions solely based on its own state. This design significantly enhances both the learning stability and efficiency within a multi-agent environment.
For each agent $i$, whose goal is to maximize the expected return under the current policy $\pi_{\theta_i}(s_i)$, the policy parameters are updated by:

$$\theta_i \leftarrow \theta_i + \alpha \nabla_{\theta_i} \mathbb{E}\left[ Q_{w_i}(s_i, a_i) \right]$$

where $\theta_i$ is the policy network parameter of agent $i$, and $\alpha$ is the learning rate, which determines the magnitude of each parameter update.

3.1.3. Adaptive Target Network Update Mechanism

In deep reinforcement learning, the introduction of target networks serves to mitigate the non-stationarity problem in value function estimation—a challenge particularly pronounced in game-theoretic settings. However, the conventional target network architecture employed in the MADDPG algorithm remains prone to policy oscillations in complex drone combat scenarios.
To address this limitation, this paper introduces an enhanced target network mechanism for MADDPG. In the baseline algorithm, the target network is updated via fixed soft-update rules, which lack adaptability to the evolving training dynamics. We therefore propose an adaptive update mechanism guided by policy performance, allowing the target network to dynamically modulate its update frequency in response to the convergence behavior of the current policy.
Let the parameters of the current actor network be $\theta_i$ and those of the critic network be $w_i$; the corresponding target network parameters are $\theta_i'$ and $w_i'$. The improved soft update is performed with an adaptive update coefficient:

$$\theta_i' \leftarrow \beta_{adapt} \cdot \theta_i + (1 - \beta_{adapt}) \cdot \theta_i'$$

$$w_i' \leftarrow \beta_{adapt} \cdot w_i + (1 - \beta_{adapt}) \cdot w_i'$$

where the adaptive update coefficient $\beta_{adapt}$ is defined as:

$$\beta_{adapt} = \beta_0 \cdot \frac{1}{1 + t / t_{scale}} \cdot \min\!\left(1, \frac{\Delta Q}{Q_{th}}\right)$$

where $\beta_0$ is the initial update rate, $t$ is the training step count, $t_{scale}$ is the temporal scale parameter, $\Delta Q$ denotes the recent Q-value change magnitude, and $Q_{th}$ is the threshold for Q-value changes.
This improved mechanism offers the following advantages: First, in the initial training phase where policy adjustments are substantial, larger β a d a p t values facilitate rapid knowledge transfer. Second, as training advances, the update magnitude progressively decreases, thereby improving training stability. Finally, by monitoring Q-value variations and dynamically regulating the update frequency, the update rate is reduced during policy convergence to mitigate excessive parameter fluctuations.
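To make the adaptive update concrete, the following minimal Python sketch computes $\beta_{adapt}$ and applies the soft update to NumPy parameter arrays. The numeric values of $\beta_0$, $t_{scale}$, and $Q_{th}$ are illustrative placeholders; the paper specifies only the symbols, not their settings.

```python
import numpy as np

def beta_adapt(beta0, t, t_scale, delta_q, q_th):
    """Adaptive coefficient: shrinks as training step t grows and is capped
    by the recent Q-value change relative to the threshold Q_th."""
    return beta0 * (1.0 / (1.0 + t / t_scale)) * min(1.0, delta_q / q_th)

def adaptive_soft_update(target_params, online_params, beta):
    """theta' <- beta * theta + (1 - beta) * theta', applied per parameter array."""
    return [beta * p + (1.0 - beta) * tp
            for p, tp in zip(online_params, target_params)]

# Example with hypothetical hyperparameters and toy parameter arrays.
rng = np.random.default_rng(0)
online = [rng.normal(size=(4, 4)), rng.normal(size=(4,))]
target = [np.zeros((4, 4)), np.zeros((4,))]
beta = beta_adapt(beta0=0.01, t=5000, t_scale=10000, delta_q=0.3, q_th=0.5)
target = adaptive_soft_update(target, online, beta)
```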

3.1.4. Critic Updates and Loss Function

The objective of the critic is to minimize the error between the current Q-value and the target Q-value, thereby improving the approximation of the true Q-value. The target Q-value is computed from the immediate reward and the Q-value estimate provided by the target critic; a smaller loss indicates a more accurate value estimate. The critic's loss function is expressed as follows:

$$L(w_i) = \mathbb{E}\left[ \left( Q_{w_i}(s_i, a_i) - y_i \right)^2 \right]$$

where $y_i$ is the target Q-value, computed as:

$$y_i = r_i + \gamma Q_{w_i'}(s', a')$$

where $r_i$ denotes the immediate reward at the current step, and $Q_{w_i'}(s', a')$ is the target critic's estimate of the Q-value for the next state $s'$ and next action $a'$.

The critic parameters are updated by minimizing the loss function:

$$w_i \leftarrow w_i - \mu \nabla_{w_i} L(w_i)$$

where $\mu$ denotes the learning rate of the critic, which controls the update rate of the critic network.

3.1.5. Actor Updates and Experience Repertoire

The objective of the actor network is to maximize the Q-value estimated by the critic network, thereby guiding the agents to select actions that yield higher long-term rewards. This improves the quality of the policy and enables more efficient exploration of the environment. The actor's objective is therefore defined as follows:

$$L(\theta_i) = \mathbb{E}\left[ Q_{w_i}\big(s_i, \pi_{\theta_i}(s_i)\big) \right]$$

In the above formula, $\pi_{\theta_i}(s_i)$ denotes the action output by the actor network for the current state $s_i$, and $Q_{w_i}(s_i, a_i)$ denotes the Q-value evaluated by the critic network for state $s_i$ and action $a_i$.

By maximizing this objective through gradient ascent, the actor performs the policy update:

$$\theta_i \leftarrow \theta_i + \alpha \nabla_{\theta_i} L(\theta_i)$$

In a multi-agent environment [22], Experience Replay (a replay buffer) is utilized to store the transitions generated during an agent's interaction with the environment, enabling the agent to learn from past experiences. Each transition is stored as a tuple $(s, a, r, s', d)$, where $s$ represents the current state, $a$ denotes the action taken by the agent, $r$ is the immediate reward, $s'$ is the next state, and $d$ indicates whether the episode has terminated.
In the Improved MADDPG, the agent is trained through alternating exploration and exploitation strategies. During the exploration phase, the agent introduces noise to its actions to explore a variety of actions; during the exploitation phase, the agent selects actions based on its learned policy.
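The following Python sketch illustrates the replay buffer and the exploration-phase action noise described above. The buffer capacity, batch layout, and Gaussian noise scale are illustrative assumptions rather than the authors' exact implementation.

```python
import random
from collections import deque

import numpy as np

class ReplayBuffer:
    """Stores (s, a, r, s', d) transitions and samples uniform mini-batches."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.asarray, zip(*batch))
        return states, actions, rewards, next_states, dones

def explore(policy_action, noise_std=0.1, low=-1.0, high=1.0):
    """Exploration phase: perturb the deterministic action with Gaussian noise."""
    noisy = policy_action + np.random.normal(0.0, noise_std, size=policy_action.shape)
    return np.clip(noisy, low, high)
```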

3.2. Self-Play Training

Self-play optimizes the strategy of an intelligent agent through multiple rounds of competitive learning. In this paradigm, the agent and its adversary engage in continuous interaction, dynamically adjusting their combat decisions in response to each other's strategies. When applied to strategy optimization within the enhanced MADDPG algorithm, this approach leverages the agent's self-adjustment and competitive learning characteristics. Over several rounds of confrontation, it not only enables the agent to progressively refine and optimize its strategic behaviors but also allows for dynamic adaptation based on the evolving strategies of the adversary. This process enhances the agent's ability to navigate and perform effectively in complex combat environments.

3.2.1. Mathematical Description of the Self-Play Mechanism

Consider a scenario involving two intelligent agents: the primary agent (agent) and the adversary (enemy). At each time step t, both agents select actions a t and interact with the environment’s state s t through these actions. The environment generates rewards based on the actions taken by both the agent and the enemy. The objective of the agent is to maximize its long-term cumulative rewards while simultaneously adapting to the strategy employed by the enemy.
The goal of each agent is to maximize its expected reward through its own strategy $\pi_a$; the expected return is expressed as:

$$J(\pi_a) = \mathbb{E}\left[ \sum_{t=0}^{T} \gamma^t r_t \right]$$

where $J(\pi_a)$ is the expected return of agent $a$, $r_t$ is the reward received at time step $t$, and $\gamma$ is the discount factor.

3.2.2. Strategy Updates in Self-Plays

In the self-play framework, the intelligent agent engages in interaction with its adversary, where the agent’s strategy is influenced not only by the state of the environment but also by the strategy employed by the adversary. To achieve success in this competitive setting, the agent must continuously adapt its strategy in response to the evolving tactics of the adversary.
In a multi-agent system, an agent's strategy update can be represented by the following optimization problem:

$$\pi_a^{*} = \arg\max_{\pi_a} \; \mathbb{E}_{\pi_a, \pi_e}\left[ J(\pi_a) \right]$$

where $\pi_e$ represents the enemy's strategy and $\pi_a^{*}$ denotes the agent's optimal strategy; the agent maximizes its payoff in the confrontation by adjusting its strategy at the appropriate time.

3.2.3. Agent Strategy Evaluation and Optimization

To update its policy, an intelligent agent typically employs a value function (e.g., the Q-function) to assess the quality of a given state-action pair. Let $Q(s_t, a_t)$ represent the Q-value associated with taking action $a_t$ in state $s_t$. Using the policy gradient approach, the objective of the agent is to maximize the Q-value, i.e., to minimize:

$$L(\pi) = -\mathbb{E}_{\pi_a, \pi_e}\left[ Q\big(s_t, \pi_a(s_t)\big) \right]$$

where $Q(s_t, \pi_a(s_t))$ is estimated by the critic network and represents the expected payoff the agent obtains by taking action $\pi_a(s_t)$ in state $s_t$. By minimizing this loss function (equivalently, maximizing the expected Q-value), the agent gradually optimizes its behavioral strategy and thus improves its performance in self-play.

3.2.4. Strategy Stability in Self-Plays

Throughout the self-play training process, both the intelligent agent and the adversarial agent continuously adjust their respective strategies. This adversarial training process aims to approximate a Nash Equilibrium, which is defined as an equilibrium where no participant can improve their payoff by unilaterally altering their strategy. The mathematical formulation of a Nash Equilibrium [23] is as follows:
$$\pi_a^{*}(s_t) = \arg\max_{\pi_a} \; \mathbb{E}\left[ r_t \mid \pi_a, \pi_e^{*} \right]$$

$$\pi_e^{*}(s_t) = \arg\max_{\pi_e} \; \mathbb{E}\left[ r_t \mid \pi_e, \pi_a^{*} \right]$$

where $\pi_a^{*}$ and $\pi_e^{*}$ are the strategies of the agent and the adversarial agent, respectively, at the Nash equilibrium. At this point, the two strategies are mutually optimal, and neither side can further improve its payoff by unilaterally adjusting its strategy.

By incorporating the self-play mechanism, the intelligent agents are enabled to engage in competitive learning with adversarial agents over multiple rounds within a dynamic environment. Each agent not only adapts its strategy based on its own experience but must also adjust to the strategy changes of the adversarial agent, thereby optimizing its long-term performance in the competition. The essence of this mechanism lies in the fact that, through repeated interactions and continuous strategy updates, each agent progressively converges towards a Nash equilibrium, thereby achieving self-optimization and self-enhancement of its strategies.
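The opponent-selection step of the self-play scheme can be sketched as follows, using the 30%/70% split between historical self-policies and the TD3 adversary stated in Section 1. The snapshot interval and class interface are illustrative assumptions.

```python
import copy
import random

class OpponentPool:
    """Self-play opponent selection: with probability p_history the agent faces a
    snapshot of its own past policies; otherwise it faces the TD3-based adversary."""

    def __init__(self, td3_policy, p_history=0.3, snapshot_every=50):
        self.td3_policy = td3_policy
        self.p_history = p_history        # 30% historical self-play per Section 1
        self.snapshot_every = snapshot_every
        self.history = []

    def maybe_snapshot(self, episode, current_policy):
        """Periodically store a frozen copy of the agent's current policy."""
        if episode % self.snapshot_every == 0:
            self.history.append(copy.deepcopy(current_policy))

    def sample_opponent(self):
        if self.history and random.random() < self.p_history:
            return random.choice(self.history)   # play against a historical self
        return self.td3_policy                   # play against the TD3 adversary (70%)
```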

3.3. Attention Mechanism

Incorporating the self-attention mechanism into the MADDPG framework effectively mitigates challenges related to decision efficiency in aerial combat strategies, particularly when agents operate in high-dimensional and dynamic state spaces [24]. Conventional self-attention mechanisms are often limited in such contexts due to their static structural nature and inherent information bottlenecks. To address these limitations, this section presents key enhancements to the standard self-attention mechanism, leading to the development of a more adaptive decision optimizer tailored for air combat scenarios.
Figure 3 depicts the enhanced self-attention mechanism, which incorporates dynamic scaling and information reconstruction into the conventional framework. It performs dual calibration of the input states using statistical features and historical weighting, enabling adaptive fusion of multi-level attention through a gating mechanism. The model leverages residual connections and an MLP reconstruction network to refine feature representation, culminating in an optimized final output.

3.3.1. Dynamic Adaptive Scaling Factor

To address the limitation of conventional approaches that rely on a fixed scaling factor $\sqrt{d_k}$ incapable of adapting to dynamic aerial combat conditions, a dynamically adaptive scaling factor is introduced. This factor is generated in real time via a lightweight feedforward network, as defined by the following expression:

$$\alpha = \mathrm{Linear}_2\!\left( \mathrm{ReLU}\!\left( \mathrm{Linear}_1\left( \bar{X} \right) \right) \right)$$

where $\bar{X}$ corresponds to statistical measures (e.g., mean and variance) of the input state $X$, and "Linear" refers to a fully connected layer. This design enables real-time adjustment of the scaling factor based on the current state, which enhances the model's perceptual acuity across diverse air combat environments without compromising numerical stability.

3.3.2. Information Reconstruction Module Based on Residual Connections

To mitigate the loss of essential original state information during weighted aggregation, an information reconstruction module was introduced into the existing self-attention architecture via a residual connection, thereby preserving vital input features.
The output of this module is formulated as follows:

$$O = A \cdot V + X + \mathrm{Reconstruct}(A, V)$$

$$\mathrm{Reconstruct}(A, V) = \mathrm{MLP}\big( \mathrm{Concat}(A \cdot V,\; V) \big)$$

where $\mathrm{Reconstruct}(A, V)$ denotes a compact multi-layer perceptron. It performs a nonlinear fusion of the attention-weighted and original value vectors, which enables the reconstruction of details and the recovery of previously neglected critical information, thus ensuring a more comprehensive state representation.
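A PyTorch sketch of the dynamic scaling factor (Section 3.3.1) combined with the residual reconstruction branch (Section 3.3.2) is given below. The hidden sizes, the softplus used to keep the learned scale positive, and the toy input shapes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledReconstructAttention(nn.Module):
    """Self-attention with a learned scaling factor alpha and a residual
    reconstruction branch: O = A*V + X + MLP(concat(A*V, V))."""

    def __init__(self, dim, hidden=32):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Dynamic scaling factor generated from input statistics (mean, variance).
        self.scale_net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Reconstruction MLP fusing attention-weighted and raw value vectors.
        self.reconstruct = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, x):                                   # x: (batch, seq_len, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        flat = x.flatten(1)
        stats = torch.stack([flat.mean(dim=1), flat.var(dim=1)], dim=-1)
        alpha = F.softplus(self.scale_net(stats)) + 1e-6    # positive per-sample scale
        scores = torch.matmul(q, k.transpose(-2, -1)) / alpha.view(-1, 1, 1)
        attn = torch.softmax(scores, dim=-1)
        weighted = torch.matmul(attn, v)                    # A * V
        recon = self.reconstruct(torch.cat([weighted, v], dim=-1))
        return weighted + x + recon                         # residual output O

out = ScaledReconstructAttention(dim=16)(torch.randn(8, 5, 16))
```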

3.3.3. Multi-Level Attention Selection Mechanism for Task Perception

In aerial combat missions, where decision-making priorities vary across different operational phases, we propose a multi-level attention mechanism for mission perception. To this end, a gating network G(X) is incorporated to adaptively activate or combine multiple independent attention modules:
$$A_{final} = \sum_{i=1}^{N} g_i \cdot \mathrm{Attention}_i(Q, K, V)$$

$$g = \mathrm{softmax}\big(G(X)\big), \qquad G(X) = W_g \cdot \mathrm{Pool}(X)$$

$$\mathrm{Pool}(X) = W_p \cdot \left[ \frac{1}{L} \sum_{i=1}^{L} x_i \;\Big\Vert\; \max_{j=1,\dots,L}(x_j) \right]$$

where $g_i$ and $\mathrm{Pool}$ represent the gating weight and the global pooling operation (a concatenation of the mean and maximum over the sequence), respectively; $\Vert$ denotes vector concatenation; $W_p$ is a learnable weight matrix for feature fusion and dimensionality reduction; $L$ is the sequence length; and $x_i$ is the $i$-th feature vector. This architecture enables the agent to prioritize global features in the search phase and local features in the combat phase, consequently realizing dynamic optimization and efficient computational resource allocation.
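A minimal PyTorch sketch of this gated multi-level attention is shown below, using torch.nn.MultiheadAttention as a stand-in for each Attention_i module; the number of modules and the pooling projection size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedMultiAttention(nn.Module):
    """Task-aware attention selection: N attention modules whose outputs are mixed
    by gating weights g = softmax(W_g * Pool(X)), where Pool concatenates the
    sequence mean and maximum."""

    def __init__(self, dim, n_modules=3):
        super().__init__()
        self.attn_modules = nn.ModuleList(
            [nn.MultiheadAttention(dim, num_heads=1, batch_first=True)
             for _ in range(n_modules)]
        )
        self.pool_proj = nn.Linear(2 * dim, dim)   # W_p: fuse mean || max
        self.gate = nn.Linear(dim, n_modules)      # W_g: gating logits

    def forward(self, x):                          # x: (batch, seq_len, dim)
        pooled = torch.cat([x.mean(dim=1), x.max(dim=1).values], dim=-1)
        g = torch.softmax(self.gate(self.pool_proj(pooled)), dim=-1)          # (batch, N)
        outs = torch.stack([m(x, x, x)[0] for m in self.attn_modules], dim=1)  # (batch, N, seq, dim)
        return (g.unsqueeze(-1).unsqueeze(-1) * outs).sum(dim=1)              # weighted fusion

out = GatedMultiAttention(dim=16)(torch.randn(4, 6, 16))
```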

3.3.4. Closed-Loop Feedback Attention Weight Optimization

To achieve online self-optimization of the strategy, we further introduce a closed-loop feedback mechanism that incorporates historical decision performance into the current attention computation. This mechanism refines the current query-key interactions through a feedback function ffb:
$$F = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d_k}} + \eta \cdot f_{fb}(W_{prev}, R_t) \right)$$

$$f_{fb}(W_{prev}, R_t) = \lambda \cdot \sigma\!\left( \frac{R_t - \mu}{\sigma + \epsilon} \right) \cdot W_{prev}$$

where $W_{prev}$ denotes the attention weights from the previous time step; $\sigma(\cdot)$ represents the sigmoid function; $\mu$ and $\sigma$ indicate the moving average and standard deviation of recent rewards, respectively; $\lambda$ is the feedback strength coefficient; $\epsilon$ is a small constant included for numerical stability; $R_t$ is the immediate reward from the previous decision; and $\eta$ is the feedback coefficient. Based on the success of historical decisions, the function $f_{fb}$ amplifies effective attention patterns while suppressing ineffective ones, thus enabling the agent to dynamically adjust its perceptual focus during adversarial interactions and thereby continuously improve decision quality.
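The feedback-weighted attention computation can be sketched in a few lines of Python; the values of eta and lam and the toy tensors are illustrative assumptions.

```python
import math
import torch

def feedback_attention(q, k, w_prev, reward, reward_hist, eta=0.1, lam=0.5, eps=1e-6):
    """Closed-loop feedback attention: the previous weights W_prev are amplified or
    suppressed according to how the latest reward R_t compares with recent rewards."""
    d_k = q.shape[-1]
    mu, sigma = reward_hist.mean(), reward_hist.std()
    f_fb = lam * torch.sigmoid((reward - mu) / (sigma + eps)) * w_prev
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k) + eta * f_fb
    return torch.softmax(scores, dim=-1)

q, k = torch.randn(1, 5, 16), torch.randn(1, 5, 16)
w_prev = torch.full((1, 5, 5), 0.2)                 # previous step's attention weights
weights = feedback_attention(q, k, w_prev, reward=torch.tensor(3.0),
                             reward_hist=torch.tensor([1.0, 2.5, 2.0, 3.5]))
```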
The self-attention mechanism enhances the win rate of the intelligent agent by:
Dynamic adaptive perception for enhanced decision-making: By incorporating a dynamically adaptive scaling factor, the agent gains the capability to adjust its perceptual sensitivity in response to real-time combat conditions. This mechanism facilitates more precise extraction of critical information from high-dimensional state spaces, thereby significantly improving the accuracy and relevance of tactical decisions.
Information retention for robust state representation: Through an information reconstruction module based on residual connections, the agent maintains essential features from the original state space while integrating newly processed attention-weighted information. This architectural design effectively mitigates information loss during feature transformation, preventing tactical blind spots and ensuring comprehensive situational awareness.
Resource-aware strategy evolution: The integration of a multi-level attention selection mechanism with closed-loop feedback optimization enables efficient allocation of computational resources across different mission phases. Simultaneously, the feedback mechanism incorporates historical decision performance to refine attention patterns, resulting in continuous online adaptation and evolution of combat strategies that progressively enhance adversarial performance.
The self-attention mechanism provides a flexible and efficient means for agents in reinforcement learning to weigh different state features, enabling them to dynamically focus on the most critical features in complex environments, thereby improving the quality of their decisions. By leveraging this mechanism, agents can make more accurate decisions in multi-round adversarial tasks, significantly increasing their win rates.

3.4. Strategy Optimization Model Building

By integrating MADDPG, the self-attention mechanism, and the self-play mechanism within the strategy optimization process, the distinct advantages of each component can be fully leveraged, thereby significantly enhancing the overall decision-making capability of intelligent agents. In the initial phases, MADDPG provides a robust foundational strategy that enables effective learning in simpler environments. However, as the complexity of the environment escalates, relying solely on MADDPG often proves insufficient to meet the demands of more intricate tasks. The incorporation of the self-attention mechanism enables the agent to focus on critical state features during the decision-making process, thereby improving its capacity to process complex information, particularly in dynamically evolving environments. Simultaneously, the self-play mechanism bolsters the adaptability and robustness of the strategies by allowing the agent to engage in training against its own historical strategies, thus mitigating the risk of converging to suboptimal solutions. Through the weighted combination of these mechanisms, the relative importance of each can be dynamically adjusted according to the specific phase of training and the demands of the task. This approach facilitates continuous strategy optimization within a changing environment, ultimately enhancing the long-term performance and adaptability of the intelligent agent.
Due to the simplicity of the linear weighting method's computational process and its high real-time performance, it enables more timely decision-making in dynamic air combat environments. Furthermore, the physical significance of the weighting parameters is relatively clear, making debugging more intuitive. Therefore, to provide a more comprehensive consideration of both the original strategy and its optimization framework, and to enable the dynamic adaptation of various variables within the strategy, a novel strategy model is developed. In this model, the optimized strategy is denoted as $y$, the original MADDPG strategy as $x_1$, the strategy derived from the self-attention mechanism as $x_2$, and the strategy generated through the self-play mechanism as $x_3$, with $k_1$, $k_2$, and $k_3$ denoting the weights assigned to each mechanism. Consequently, the optimized strategy $y$ can be expressed as:
$$y = k_1 x_1 + k_2 x_2 + k_3 x_3$$

$$x_1 = \pi_{MADDPG}(s)$$

$$x_2 = \mathrm{softmax}\!\left( \frac{Q K^{T}}{\sqrt{d}} \right) V$$

$$x_3 = \arg\max_{\pi_{selfplay}} \; \mathbb{E}\left[ R_{selfplay}(\pi_{selfplay}) \right]$$

Let $\pi_{MADDPG}$ represent the strategy of the MADDPG agent in state $s$; $Q$, $K$, and $V$ denote the query, key, and value vectors, respectively, with $d$ representing the dimension of these vectors. Additionally, $\pi_{selfplay}$ refers to the strategy employed by the agent within the self-play mechanism, and $R_{selfplay}(\pi_{selfplay})$ indicates the reward received by the agent from its interaction with its own historical strategies.
In the strategy optimization process, the coefficients $k_1$, $k_2$, and $k_3$ quantify the relative importance of the MADDPG strategy, the self-attention mechanism, and the self-play mechanism, respectively, in the final optimized strategy. The dynamic adjustment of these coefficients aims to calibrate the contribution of each mechanism based on environmental feedback and the agent's performance. This approach enables the agent to adaptively modulate the influence of each mechanism, ensuring that the strategy remains optimal in the face of changing environmental conditions.
When the reward associated with a given mechanism is high, it indicates that the mechanism has made a substantial contribution to the agent’s strategy, necessitating an increase in the corresponding coefficient. Conversely, if the reward is low, the coefficient for that mechanism should be decreased. The adjustment process is formalized by the following formula:
$$k_1(t+1) = k_1(t) \cdot \left( 1 + \alpha_1 \frac{R_1(t)}{|R_1(t)|} \right)$$

$$k_2(t+1) = k_2(t) \cdot \left( 1 + \alpha_2 \frac{R_2(t)}{|R_2(t)|} \right)$$

$$k_3(t+1) = k_3(t) \cdot \left( 1 + \alpha_3 \frac{R_3(t)}{|R_3(t)|} \right)$$

Let $R_1(t)$, $R_2(t)$, and $R_3(t)$ denote the rewards associated with the MADDPG, self-attention mechanism, and self-play mechanism at time step $t$, respectively. The terms $|R_1(t)|$, $|R_2(t)|$, and $|R_3(t)|$ serve as normalization terms for these rewards, while $\alpha_1$, $\alpha_2$, and $\alpha_3$ control the adjustment rates for each mechanism.
When the loss associated with a particular mechanism is substantial, it indicates that the mechanism has not effectively contributed to the enhancement of the strategy, warranting a reduction in its corresponding coefficient. The adjustment formula for loss feedback is as follows:
$$k_1(t+1) = k_1(t) \cdot \big( 1 - \beta_1 L_1(t) \big)$$

$$k_2(t+1) = k_2(t) \cdot \big( 1 - \beta_2 L_2(t) \big)$$

$$k_3(t+1) = k_3(t) \cdot \big( 1 - \beta_3 L_3(t) \big)$$

Let $L_1(t)$, $L_2(t)$, and $L_3(t)$ represent the loss values associated with the MADDPG, self-attention mechanism, and self-play mechanism at time step $t$, respectively. The hyperparameters $\beta_1$, $\beta_2$, and $\beta_3$ regulate the loss feedback for each mechanism.
The success rate of a strategy serves as an indicator of an agent’s ability to complete a given task. When a mechanism contributes to an increased success rate, it implies that the mechanism has a greater impact on the strategy, thereby justifying an increase in its corresponding coefficient. The adjustment formula for success rate feedback is as follows:
$$k_1(t+1) = k_1(t) \cdot \left( 1 + \gamma_1 \frac{S_1(t)}{|S_1(t)|} \right)$$

$$k_2(t+1) = k_2(t) \cdot \left( 1 + \gamma_2 \frac{S_2(t)}{|S_2(t)|} \right)$$

$$k_3(t+1) = k_3(t) \cdot \left( 1 + \gamma_3 \frac{S_3(t)}{|S_3(t)|} \right)$$

In the above equations, $S_1(t)$, $S_2(t)$, and $S_3(t)$ represent the task success rates associated with the MADDPG, self-attention mechanism, and self-play mechanism at time step $t$, respectively. The terms $|S_1(t)|$, $|S_2(t)|$, and $|S_3(t)|$ serve as normalization terms for the success rates, while $\gamma_1$, $\gamma_2$, and $\gamma_3$ are hyperparameters that govern the adjustment of these success rates.
To effectively integrate the feedback from rewards, losses, and success rates, these factors can be combined for the purpose of coefficient adjustment as follows:
$$k_1(t+1) = k_1(t) \cdot \left( 1 + \alpha_1 \frac{R_1(t)}{|R_1(t)|} + \gamma_1 \frac{S_1(t)}{|S_1(t)|} - \beta_1 L_1(t) \right)$$

$$k_2(t+1) = k_2(t) \cdot \left( 1 + \alpha_2 \frac{R_2(t)}{|R_2(t)|} + \gamma_2 \frac{S_2(t)}{|S_2(t)|} - \beta_2 L_2(t) \right)$$

$$k_3(t+1) = k_3(t) \cdot \left( 1 + \alpha_3 \frac{R_3(t)}{|R_3(t)|} + \gamma_3 \frac{S_3(t)}{|S_3(t)|} - \beta_3 L_3(t) \right)$$
In this manner, the adjustment of the coefficients incorporates the integrated feedback from rewards, losses, and success rates, enabling a more holistic optimization of all components of the strategy. This approach ensures that the intelligent agent continuously adapts the contribution of each mechanism, thereby enhancing its ability to effectively respond to environmental changes throughout the training process.
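A compact Python sketch of the combined coefficient update is given below. The step sizes, the clipping of negative coefficients, and the final renormalization are illustrative assumptions; the paper defines only the update rule itself.

```python
def update_coefficients(k, rewards, losses, successes,
                        alpha=(0.1, 0.1, 0.1), beta=(0.05, 0.05, 0.05),
                        gamma=(0.1, 0.1, 0.1), eps=1e-8):
    """Combined reward/loss/success-rate feedback for the weights k1, k2, k3 of
    the MADDPG, self-attention, and self-play components."""
    new_k = []
    for i in range(3):
        factor = (1.0
                  + alpha[i] * rewards[i] / (abs(rewards[i]) + eps)
                  + gamma[i] * successes[i] / (abs(successes[i]) + eps)
                  - beta[i] * losses[i])
        new_k.append(max(k[i] * factor, 0.0))
    total = sum(new_k) + eps
    return [v / total for v in new_k]      # renormalize so the weights stay comparable

# Example: each component's recent reward, loss, and success rate feed the update.
k = update_coefficients(k=[0.4, 0.3, 0.3],
                        rewards=[120.0, 80.0, 95.0],
                        losses=[0.8, 1.2, 0.5],
                        successes=[0.6, 0.5, 0.7])
```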

3.5. Introduction to Reward Function

The reward function in this paper consists of three parts: a distance reward, a winning reward, and a height-difference reward.

3.5.1. Distance Rewards

The distance between the agent and the adversary plays a critical role in the reward calculation [25]. The greater the distance to the adversary, the lower the reward becomes. This design incentivizes the agent to reduce the distance to the adversary, thereby encouraging direct confrontation.
If the current positions of the agent and the enemy are $s_{agent} = (x_{agent}, y_{agent}, z_{agent})$ and $s_{enemy} = (x_{enemy}, y_{enemy}, z_{enemy})$, then the distance $d$ between the agent and the enemy is given by the Euclidean distance formula:

$$d = \sqrt{ (x_{agent} - x_{enemy})^2 + (y_{agent} - y_{enemy})^2 + (z_{agent} - z_{enemy})^2 }$$
The design of $r_{distance}$ takes into account the proximity of the agent to the enemy and assigns a lower reward as the distance grows, encouraging the agent to move towards the enemy. To increase the sensitivity to distance, we introduce a quadratic decay function to calculate the reward, specifically defined as:

$$r_{distance} = \lambda \cdot e^{-\gamma d^{2}}$$

where $\lambda$ is the scaling factor of the distance reward and $\gamma$ is the parameter controlling the distance decay rate. This function makes the reward smaller when the distance is greater, while the reward gradually increases as the distance is shortened.

3.5.2. Winner Rewards

When the distance between the agent and the adversary falls below a predefined threshold m, the agent receives an additional reward, signaling the completion of the confrontation task. This also indicates that the agent has successfully approached the adversary and achieved victory. The specific formulation is as follows:
$$I_{win} = \begin{cases} 1 & \text{if } \lVert s_{agent} - s_{enemy} \rVert < m \\ 0 & \text{otherwise} \end{cases}$$

Based on the above indicator function, the winning reward is set to:

$$r_{win} = 100 \cdot I_{win}$$

When $I_{win} = 1$, the agent receives a reward of 100; otherwise, the reward is 0. This reward component directly encourages the agent to approach the enemy and win quickly.

3.5.3. Height Difference Rewards

To further enhance the agent’s strategy optimization in the vertical direction, we also design a reward function based on the altitude difference. The altitude difference reward encourages the agent to not only focus on horizontal attacks during the confrontation but also to consider the spatial changes in height. In three-dimensional space, the calculation of the altitude difference reward r h e i g h t is based on the difference between the agent and the adversary along the z-axis. The specific formula is as follows:
$$r_{height} = \alpha \cdot (z_{agent} - z_{enemy})^{\beta}$$

where $\alpha$ and $\beta$ represent the scaling factor and exponential factor of the height-difference reward, respectively, and control the effect of the height difference on the reward function. The formula makes the effect of the height difference on the reward nonlinear and improves the agent's decision-making ability in the vertical dimension.

3.5.4. Total Rewards

Combining all the above reward mechanisms, the final total reward function can be expressed as:
$$r_{total} = r_{distance} + r_{win} + r_{height}$$

The above formula can be expanded to:

$$r_{total} = \lambda e^{-\gamma d^{2}} + 100 \cdot I_{win} + \alpha (z_{agent} - z_{enemy})^{\beta}$$

In this function, the first term is the reward based on the distance between the agent and the enemy, the second term is the reward for a winning approach, and the third term is the reward based on the height difference. By taking these factors into consideration, the reward function not only focuses on the relative position of the agent and the enemy, but also guides the agent to optimize its strategy in three-dimensional space through the height-difference reward.

When the agent's reward is greater than the enemy's reward, the agent wins that round; otherwise, the enemy wins.

4. Enemy Strategy Introduction

The enemy’s strategy is the TD3 strategy. In deep reinforcement learning, TD3 is a common policy gradient algorithm. TD3 introduces a series of improvements aimed at solving the overestimation problem in traditional algorithms, thereby improving the stability and training efficiency of the algorithm.
The core idea of the algorithm is to reduce the bias in Q-value estimation by using a dual Q-value network, and to make the training process more stable by using a delayed update strategy and target action smoothing. Its target Q-value is expressed as:
$$y = r_t + \gamma \min_{i=1,2} Q_{\theta_i'}\big( s_{t+1}, \pi_{\varphi'}(s_{t+1}) + \varepsilon \big)$$

where $r_t$ represents the immediate reward, $\gamma$ is the discount factor, $Q_{\theta_1'}$ and $Q_{\theta_2'}$ are the two target Q-networks, $\pi_{\varphi'}(s_{t+1})$ is the target action generated by the target policy network, and $\varepsilon$ is the noise introduced by target action smoothing.
The update rules for the TD3 policy network depend on the estimation of the Q-values, with the goal of maximizing the output of the Q-value network. Therefore, gradient ascent is used for policy updates:
$$\nabla_{\varphi} J(\pi_{\varphi}) = \mathbb{E}_{s_t \sim D}\left[ \nabla_{a} Q_{\theta_1}(s_t, a_t) \big|_{a_t = \pi_{\varphi}(s_t)} \, \nabla_{\varphi} \pi_{\varphi}(s_t) \right]$$

where $D$ is the experience replay pool, $\pi_{\varphi}(s_t)$ is the policy network, and $Q_{\theta_1}(s_t, a_t)$ is the estimate of the Q-value network.
In the target network, soft updates are used to ensure training stability:
$$\theta' = \tau \theta + (1 - \tau)\theta'$$

$$\varphi' = \tau \varphi + (1 - \tau)\varphi'$$

where $\tau$ is the soft update step size, $\theta$ and $\varphi$ are the parameters of the current Q-value network and policy network, respectively, and $\theta'$ and $\varphi'$ are the parameters of the target networks.
In addition, for the adversarial multi-agent setting, MATD3 was chosen as the enemy's multi-agent method and applied in the corresponding experimental environment.

Let $y$ denote the target Q-value and $r$ denote the reward obtained by the agents at the current time step. Then the expression for $y$ is:

$$y = r + \gamma \min_{k=1,2} Q_k'\big( s', a' + \varepsilon \big)$$

where $Q_k'$ denotes the $k$-th target network, $s'$ represents the joint state of the agents at the next time step, $a'$ indicates the target actions of all agents at the next time step, and $\varepsilon$ represents the clipped noise used for target policy smoothing.
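The clipped double-Q target with target policy smoothing used by the TD3 adversary (and, on joint states and actions, by MATD3) can be sketched as follows. The noise scale, clipping range, and toy networks are common defaults chosen for illustration, not values from the paper.

```python
import torch

def td3_target(reward, next_state, next_action_fn, target_q1, target_q2,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """y = r + gamma * min_k Q_k'(s', pi'(s') + clipped noise)."""
    with torch.no_grad():
        a_next = next_action_fn(next_state)
        noise = (torch.randn_like(a_next) * noise_std).clamp(-noise_clip, noise_clip)
        a_next = (a_next + noise).clamp(-act_limit, act_limit)
        q_next = torch.min(target_q1(next_state, a_next), target_q2(next_state, a_next))
        return reward + gamma * q_next

# Example with toy linear critics and a toy target policy.
q1 = lambda s, a: s.sum(dim=-1, keepdim=True) + a.sum(dim=-1, keepdim=True)
q2 = lambda s, a: s.sum(dim=-1, keepdim=True) - 0.1 * a.sum(dim=-1, keepdim=True)
pi = lambda s: torch.tanh(s[:, :2])
y = td3_target(reward=torch.ones(4, 1), next_state=torch.randn(4, 6),
               next_action_fn=pi, target_q1=q1, target_q2=q2)
```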

5. Experiment

5.1. Setting of Experimental Parameters

The experimental platform employed in this study is OpenAI Gym, a widely adopted open-source environment in reinforcement learning research. It provides a standardized interactive framework for drone game tasks by configuring state spaces—including the drone’s real-time position, velocity, and environmental state—and explicitly defining the agent’s action space with parameters such as flight direction and speed. A structured reward mechanism is implemented, where positive rewards are assigned for advantageous states like higher altitude relative to the adversary, and penalties are imposed for unfavorable conditions such as lower energy levels. The platform supports real-time interaction and automatically records state transitions and rewards at each step, supplying reliable data for algorithm training and iteration. By leveraging OpenAI Gym, researchers avoid developing complex simulation environments from scratch and can concentrate on optimizing algorithms and strategies. The unified interface also facilitates objective performance comparisons across different methods, promoting progress in drone game research.
To ensure the complexity of the operational environment, this paper employs the 1976 U.S. Standard Atmosphere model for the tropospheric layer (0–11 km) over the primary UAV operational airspace, combined with a Weibull-distributed random wind field, as the UAV combat environment. The former encapsulates analytical formulas through parameterized functions, using the sea-level reference parameters $T_0 = 288.15$ K, $P_0 = 101{,}325$ Pa, and a tropospheric temperature lapse rate $L = 0.0065$ K/m. The temperature $T(h) = T_0 - Lh$, the pressure $P(h) = P_0 \left( T(h)/T_0 \right)^{gM/(RL)}$, and the air density $\rho(h)$ at height $h$ are computed sequentially, the last via the ideal gas equation of state; real conditions are approximated through temperature perturbations of ±0.5 °C per 100 m and pressure fluctuations of ±20 Pa per 500 m. The latter involves calibrating the Weibull distribution parameters (shape $k = 2.0$, scale $\lambda = 6.0$ m/s) according to MH/T 1065–2017 to ensure that 95% of wind speeds fall within the drone's safety threshold of 0–15 m/s. After sampling via scipy.stats.weibull_min, a 3-step moving average smoothing is applied, and wind direction is corrected using a Markov chain (adjacent-step deviation ≤ 30°) to ensure continuity. During dynamic coupling, atmospheric parameters update every 50 ms to synchronize with the UAV's dynamic step size, providing inputs for lift and drag calculations. Wind fields refresh every 5 s and are adapted for urban (50% wind-speed attenuation in built-up areas) and offshore ($k$ adjusted to 1.8) scenarios, balancing realism and efficiency to support game strategy validation.
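The atmosphere and wind-field sampling described above can be reproduced with a short Python sketch; the constants below are the standard-atmosphere values cited in the text, while the random seed and sample size are illustrative.

```python
import numpy as np
from scipy.stats import weibull_min

# Sea-level constants of the 1976 U.S. Standard Atmosphere (troposphere, 0-11 km).
T0, P0, L = 288.15, 101325.0, 0.0065      # K, Pa, K/m
G, M, R = 9.80665, 0.0289644, 8.31446     # m/s^2, kg/mol, J/(mol*K)

def atmosphere(h):
    """Temperature, pressure, and density at altitude h via the lapse-rate model
    and the ideal gas equation of state."""
    T = T0 - L * h
    P = P0 * (T / T0) ** (G * M / (R * L))
    rho = P * M / (R * T)
    return T, P, rho

def wind_speeds(n, k=2.0, lam=6.0, seed=0):
    """Weibull wind speeds (shape k, scale lam in m/s) with 3-step moving-average
    smoothing, as described in the environment setup."""
    raw = weibull_min.rvs(k, scale=lam, size=n, random_state=seed)
    return np.convolve(raw, np.ones(3) / 3.0, mode="same")

T, P, rho = atmosphere(1500.0)             # conditions at 1.5 km altitude
wind = wind_speeds(100)
```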
The parameter settings for this experiment are shown in Table 1.

5.2. Original Air Combat Analysis

In the original air combat strategy, TD3 was employed for the enemy’s strategy and MADDPG was employed for the agent’s strategy, followed by the training of the air combat model. The final results indicated that, across 600 rounds of the game, the agent won 157 times, resulting in a win rate of 26.17% for the agent. The relationship between the agent’s and the enemy’s angle, speed, altitude, energy, and sampling points is illustrated in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8.
Figure 4, Figure 5, Figure 6 and Figure 7 illustrate the variations in angle, altitude, velocity, and energy for both agents across game rounds when the MADDPG strategy is employed. They show that the adversary exhibits multifaceted advantages throughout the game: in terms of angle, the adversary displays a significantly larger range and higher frequency of fluctuations than the agent, reflecting superior dynamic adaptability, whereas the agent's angle tends to stabilize, indicating limited flexibility. In the altitude dimension, the adversary consistently maintains a higher altitude than the agent, with a maximum of nearly 10 m compared to the agent's peak of about 8 m, demonstrating a sustained altitude advantage. In the velocity dimension, the adversary reaches a maximum velocity of 22.5 m/s, while the agent attains only 15 m/s. The adversary's velocity fluctuations and peak values consistently surpass those of the agent, allowing it to more effectively compress the agent's reaction space through its velocity advantage. In the energy dimension, although the agent's total energy slightly exceeds that of the adversary, the adversary exhibits more compact energy fluctuations and higher management efficiency. This enables the adversary to maintain offensive initiative through stable energy output.

5.3. Analysis of the Improved Air Combat Environment

In this setting, TD3 was employed for the enemy's strategy and MADDPG-SASP was employed for the agent's strategy, followed by the training of the air combat model. The final results indicated that, across 600 rounds of the game, the agent won all 600 times. The relationship between the agent's and the enemy's angle, speed, altitude, energy, and sampling points is illustrated in Figure 8, Figure 9, Figure 10 and Figure 11.
The optimization of the agent’s strategy to the MADDPG-SASP policy, as illustrated in Figure 8, Figure 9, Figure 10 and Figure 11, endows it with comprehensive multidimensional advantages during gameplay. In the angular dimension, the agent exhibits greater stability, in contrast to the adversary’s pronounced and unstable variations, thereby ensuring precise control for tactical execution. Pertaining to altitude, the agent operates stably between 7.5 and 10.5 m, whereas the adversary frequently descends below 0 m, reaching as low as −10.5 m—a clear altitude superiority that reinforces the agent’s strategic dominance. Regarding velocity, the agent maintains a consistent 1.25–1.75 m/s, significantly surpassing the adversary’s fluctuating 0.25–1 m/s, which secures a persistent lead in movement and interaction. In energy, the agent’s stable fluctuations (0.012–0.014 kJ) starkly exceed the adversary’s limited range (0.002–0.006 kJ). Consequently, the agent achieves comprehensive suppression over the adversary across all considered metrics.
Figure 12 shows the spatial trajectories of the agent and the opponent in the 100th, 200th, 300th, 400th, 500th, and 600th rounds of the 600-round game, in each of which the agent was victorious. It is clear that after combining the MADDPG algorithm with the agent's self-play strategy and the attention-mechanism optimization, the agent always maintains an advantageous position in terms of height, angle, and other relevant metrics in its encounters with the opponent, enabling it to effectively counter and attack the enemy. As a result, the win rate increases dramatically, from the initial 26.17% to 100%, reflecting a significant improvement in performance.
As shown in Figure 13, the intelligent agent consistently outperforms the adversary across multiple rounds. The majority of the agent’s reward values are concentrated between 10,000 and 15,000, with peaks exceeding 15,000. In contrast, the adversary’s reward remains consistently negative, fluctuating between −15,000 and −8000, indicating poor performance throughout the battle. This suggests that, in most rounds, the intelligent agent effectively optimized its strategy, achieving positive rewards, whereas the adversary’s reward values were significantly lower, reflecting suboptimal performance. Consequently, the agent demonstrated a marked superiority in combat, with the strategy optimization process becoming increasingly evident.

5.4. Ablation Experiment and Comparative Experiment

First, in the ablation experiment section, based on the MADDPG-SASP framework used in this paper, different parts of the framework were removed individually for experimentation, and the results are shown in Figure 14 and Table 2.
Figure 14 illustrates the relationship between the total reward of the agent and the number of training rounds across four algorithms: SA-MADDPG, SP-MADDPG, MADDPG-SASP, and MADDPG.
Specifically, MADDPG-SASP is the full strategy method proposed in this paper, SP-MADDPG is this method with the self-attention mechanism module removed, and SA-MADDPG is this method with only the self-play module removed.
Subsequently, in Table 2, we present the win rates and average rewards of the four algorithms—MADDPG, SP-MADDPG, SA-MADDPG, and MADDPG-SASP—over 600 rounds.
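As a rough illustration of how the ablation variants relate to each other, the sketch below derives all four configurations from two feature flags; the flag names and the build_agent/evaluate helpers are hypothetical and do not reflect the exact configuration interface of our code.

```python
# Sketch: the four ablation variants expressed as two feature flags.
# Flag names and the build_agent()/evaluate() helpers are illustrative assumptions.
ABLATION_VARIANTS = {
    "MADDPG":      dict(use_self_attention=False, use_self_play=False),
    "SP-MADDPG":   dict(use_self_attention=False, use_self_play=True),
    "SA-MADDPG":   dict(use_self_attention=True,  use_self_play=False),
    "MADDPG-SASP": dict(use_self_attention=True,  use_self_play=True),
}

def run_ablation(env, build_agent, evaluate, rounds=600):
    """Train each variant and collect its win rate and average reward."""
    results = {}
    for name, flags in ABLATION_VARIANTS.items():
        agent = build_agent(env, **flags)              # construct the variant
        results[name] = evaluate(env, agent, rounds)   # e.g., (win_rate, average_reward)
    return results
```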
Second, in the comparative experiment, the PPO, DQN, and DDPG algorithms were compared with the MADDPG-SASP method proposed in this paper. The results are shown in Figure 15 and Table 3.
As illustrated in Figure 15 and Table 3, within the comparative experiments, the intelligent agent employing our MADDPG-SASP framework consistently achieves higher reward values than those obtained by the PPO, DQN, and DDPG algorithms, even after 600 rounds. Moreover, the winning rate of MADDPG-SASP surpasses that of the other three algorithms, further confirming its superior performance. These results collectively underscore the robustness and reliability of the proposed framework.

5.5. Sensitivity Analysis

The robustness of the proposed method and the validity of its optimal configuration were assessed through a sensitivity analysis of key hyperparameters: the discount factor (gamma), critic learning rate (critic_lr), actor learning rate (actor_lr), and soft update parameter (tau). This analysis is crucial because performance is highly sensitive to these parameters, and quantifying their impact provides essential evidence for the algorithm’s stable deployment in drone game tasks.
The results in Figure 16 indicate that the algorithm exhibits varying sensitivity to different hyperparameters. Among these, the actor learning rate (actor_lr) demonstrates the highest sensitivity, with its optimal value residing within a narrow window around 0.001. Deviations from this value lead to a noticeable decline in performance. The critic learning rate (critic_lr) maintains stable performance between 0.0008 and 0.0011, exhibiting greater tolerance. Higher values of the discount factor (gamma) improve the agent’s win rate and cumulative reward, with 0.99 being optimal in this study. In contrast, tau exhibits negligible impact on performance between 0.008 and 0.01, indicating robust behavior in this parameter. Ultimately, within the optimal parameter range, the agent achieves a 100% win rate and high average reward, demonstrating the algorithm’s superiority.
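A one-at-a-time sweep around the base configuration of Table 1 is sufficient to reproduce this kind of analysis. The sketch below shows one possible layout; the candidate value grids are illustrative choices around the ranges discussed above, and evaluate is a hypothetical helper returning the win rate and cumulative reward for a given configuration.

```python
# Sketch of a one-at-a-time hyperparameter sweep; the candidate grids are
# illustrative values around the ranges discussed in the text.
BASE_CONFIG = dict(gamma=0.99, tau=0.01, actor_lr=1e-3, critic_lr=1e-3)

SWEEP = {
    "actor_lr":  [5e-4, 8e-4, 1e-3, 1.2e-3, 1.5e-3],
    "critic_lr": [8e-4, 9e-4, 1e-3, 1.1e-3],
    "gamma":     [0.90, 0.95, 0.98, 0.99],
    "tau":       [0.008, 0.009, 0.01],
}

def sensitivity_analysis(evaluate):
    """Vary one hyperparameter at a time and record (win_rate, cumulative_reward)."""
    results = {}
    for param, values in SWEEP.items():
        for value in values:
            config = {**BASE_CONFIG, param: value}   # change only one parameter
            results[(param, value)] = evaluate(config)
    return results
```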

5.6. Multiplayer Combat Environment Analysis

To further validate the proposed strategy, we construct a predator–prey game scenario [26] involving drones. The intelligent agent drone is designated as the predator, characterized by relatively slower flight speed and acceleration; the adversarial drone serves as the prey, possessing faster flight speed and acceleration. Within this game scenario, multiple slower-moving cooperative predator drones must coordinate to pursue the faster prey drone.
The drone predator–prey game is further divided into two scenarios. In Scenario 1, there are 2 agent drones, 1 enemy drone, and 3 obstacles; the specific parameters are shown in Table 4. In Scenario 2, there are 4 agent drones, 2 enemy drones, and 3 obstacles; the specific parameters are shown in Table 5.
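The two scenarios differ only in the team sizes and the prey’s maximum speed. A minimal sketch of how these settings can be encoded is given below; the ScenarioConfig dataclass and its field names are illustrative rather than the actual environment interface, with values taken from Tables 4 and 5.

```python
from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    """Illustrative container for the predator-prey scenario parameters (Tables 4 and 5)."""
    n_predators: int            # agent (predator) drones
    n_prey: int                 # adversarial (prey) drones
    n_obstacles: int
    predator_max_speed: float   # m/s
    prey_max_speed: float       # m/s
    predator_accel: float       # m/s^2
    prey_accel: float           # m/s^2
    predator_turn_rate: float   # deg/s
    prey_turn_rate: float       # deg/s
    predator_radius: float      # collision radius, m
    prey_radius: float          # collision radius, m

SCENARIO_1 = ScenarioConfig(2, 1, 3, 8.0, 12.0, 2.0, 3.5, 120.0, 150.0, 1.5, 1.2)
SCENARIO_2 = ScenarioConfig(4, 2, 3, 8.0, 20.0, 2.0, 3.5, 120.0, 150.0, 1.5, 1.2)
```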
The 3D flight map for Scenario 1 is shown in Figure 17, the reward map is shown in Figure 18, and the final results are presented in Table 6.
As quantified in Table 6 and visualized in Figure 17 and Figure 18, the proposed MADDPG-SASP algorithm demonstrates dominant performance in the predator–prey scenario (Scenario 1) against an adversary employing the MATD3 strategy. The results show that our agent secured victories in 591 out of 600 episodes, achieving a win rate of 98.50% and a substantial average reward of 8391.43. This decisive outcome confirms that the MADDPG-SASP strategy maintains a significant competitive advantage, enabling the consistent and efficient accomplishment of the capture objective with remarkable strategic profitability and operational stability.
Further analysis of the trajectory evolution in Figure 17 reveals that the agent’s policy matured from initial exploratory behaviors into a sophisticated strategy featuring coordinated encirclement and effective obstacle avoidance. The corresponding reward curves underscore this strategic superiority, with the agent sustaining high rewards while the adversary’s rewards remain persistently negative, thereby highlighting the excellent convergence and robustness of our method. Critically, this superior performance is achieved despite the adversary’s advantages in key maneuvering parameters such as maximum speed, acceleration, and turning rate. This affirms the critical role of the agent’s multi-agent coordination and its integrated capability for environmental and adversarial decision-making in securing favorable outcomes under constrained conditions.
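For reference, assuming an episode is scored as a capture once any predator comes within the combined collision radii of itself and the prey, a minimal check could look like the sketch below; the radii follow Tables 4 and 5, and the environment’s actual termination rule may differ.

```python
import numpy as np

def prey_captured(predator_positions, prey_position,
                  predator_radius=1.5, prey_radius=1.2):
    """Illustrative capture test: True if any predator is within the combined collision radii.

    Radii are taken from Tables 4 and 5; the environment's actual win condition may differ.
    """
    capture_distance = predator_radius + prey_radius
    distances = np.linalg.norm(
        np.asarray(predator_positions, dtype=float) - np.asarray(prey_position, dtype=float),
        axis=1,
    )
    return bool(np.any(distances <= capture_distance))
```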
The 3D flight map for Scenario 2 is shown in Figure 19, the reward map is shown in Figure 20, and the final results are presented in Table 7.
As summarized in Table 7 and illustrated in Figure 19 and Figure 20, in the competitive scenario involving 4 predator agents against 2 adversarial prey drones, the proposed method achieved a decisive victory with a 100% win rate over 600 episodes, accompanied by an average reward of 4743.01. It is noteworthy that this performance was attained despite the adversaries possessing significant advantages in key maneuverability parameters, specifically a maximum speed of 20.0 m/s compared to 8.0 m/s and an acceleration of 3.5 m/s² versus 2.0 m/s². These results robustly demonstrate that the superior collaborative decision-making capability of the agents effectively compensates for hardware disadvantages, and they validate the strategic superiority of the proposed approach in asymmetric multi-agent settings.
As observed from the trajectory plots in Figure 19, spanning episodes 100 to 600, the agents’ strategy evolves from an initial phase of dispersed exploration to a highly coordinated encirclement pattern, demonstrating continuous improvement in both obstacle avoidance and target acquisition capabilities. The corresponding reward curve further reveals that the agents maintain consistently high and stable rewards, whereas the adversaries’ rewards remain persistently negative throughout the training process. These results collectively highlight that the MADDPG-SASP framework not only achieves excellent training convergence and robustness in complex 3D multi-agent environments with obstacles but also effectively establishes comprehensive tactical dominance over adversaries through efficient environmental perception and collaborative strategic planning.

6. Conclusions

In this paper, we propose an enhanced multi-agent reinforcement learning framework, MADDPG-SASP, which integrates an improved self-attention mechanism and a self-play training scheme into the MADDPG algorithm to optimize strategies in complex UAV confrontation tasks. By leveraging the self-play training mechanism, the agent can dynamically adjust its strategies within a three-dimensional adversarial environment, enabling self-enhancement of its strategic capabilities. The results demonstrate that, over the course of 600 training rounds, adopting the optimized strategy framework increases the agent’s win rate from 26.17% to 100%, significantly improving its decision-making ability and learning efficiency in adversarial tasks. Moreover, in the adversarial Predator–Prey Scenario, when the agent employs the MADDPG-SASP strategy and the adversary adopts a multi-agent strategy, the agent achieves a win rate of 98.5% in the scenario with 2 agents versus 1 adversary and 100% in the scenario with 4 agents versus 2 adversaries.
This study not only offers a reliable solution for UAV countermeasure strategies but also contributes to the exploration of reinforcement learning applications in high-dimensional, dynamic environments. The incorporation of the self-attention mechanism enhances the agent’s ability to perceive and adapt to environmental features, providing a robust foundation for future advancements in multi-agent cooperation and strategy optimization.

Author Contributions

Methodology, Z.X. and F.L.; Software, Z.X. and F.L.; Visualization, Q.W.; Writing—original draft, Z.X. and F.L.; Writing—review & editing, Z.X. and F.L.; Formal analysis, Z.X. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Central Guidance for Local Science and Technology Development Fund ZYYD2025QY19.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xu, J.; Liang, G. Enhanced Q learning and deep reinforcement learning for unmanned combat intelligence planning in adversarial environments. Sci. Rep. 2025, 15, 28364. [Google Scholar] [CrossRef]
  2. Tedeschi, G.; Papini, M.; Metelli, A.M.; Restelli, M. Search or split: Policy gradient with adaptive policy space. Mach. Learn. 2025, 114, 186. [Google Scholar] [CrossRef]
  3. Zhao, Z.; Wang, Y. Soft actor-critic algorithm and improved GNN model in secure access control of disaggregated optical networks. Sci. Rep. 2025, 15, 29358. [Google Scholar] [CrossRef] [PubMed]
  4. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  5. Amerimehr, M.H.; Efazati, S.; Amani, N. A New DDPG-Based Algorithm for Sum Throughput Maximization in UAV-Assisted IRS-Empowered Wireless Sensor Network. Wirel. Pers. Commun. 2025, 141, 543–570. [Google Scholar] [CrossRef]
  6. Waseem, M.; Chang, Q. From Nash Q-learning to nash-MADDPG: Advancements in multiagent control for multiproduct flexible manufacturing systems. J. Manuf. Syst. 2024, 74, 129–140. [Google Scholar] [CrossRef]
  7. Zhou, T.; Liu, Z.; Jin, W.; Han, Z. Intelligent maneuver decision-making for UAVs using the TD3-LSTM reinforcement learning algorithm under uncertain information. Front. Robot. AI 2025, 12, 1645927. [Google Scholar] [CrossRef]
  8. Kan, Y.; Yang, M.; Qian, R.; Jiang, W.; He, Y.; Zhang, L. Research on DP-MPC control strategy based on active equalization system of bidirectional flyback transformer. Ionics 2025, 31, 11265–11280. [Google Scholar] [CrossRef]
  9. Bartolomei, L.; Teixeira, L.; Chli, M. Semantic-Aware Active Perception for UAVs Using Deep Reinforcement Learning. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 3101–3108. [Google Scholar]
  10. Lyu, Z.; Zhu, G.; Xu, J. Joint Maneuver and Beamforming Design for UAV-Enabled Integrated Sensing and Communication. IEEE Trans. Wirel. Commun. 2023, 22, 2424–2440. [Google Scholar] [CrossRef]
  11. Duan, H.B.; Huo, M.Z.; Fan, Y.M. Flight verification of multiple UAVs collaborative air combat imitating the intelligent behavior in hawks. Control Theory Appl. 2018, 35, 1812–1819. [Google Scholar]
  12. Wang, X.; Yang, Y.; Wu, D.; Zhang, Z.; Ma, X. Mission-Oriented 3D Path Planning for High-Altitude Long-Endurance Solar-Powered UAVs With Optimal Energy Management. IEEE Access 2020, 8, 227629–227641. [Google Scholar] [CrossRef]
  13. Shen, Q.; Zhang, D.; He, Q.; Ban, Y.; Zuo, F. A novel multi-objective dung beetle optimizer for Multi-UAV cooperative path planning. Heliyon 2024, 10, e37286. [Google Scholar] [CrossRef] [PubMed]
  14. Chen, Y.; Dong, Q.; Shang, X.; Wu, Z.; Wang, J. Multi-UAV Autonomous Path Planning in Reconnaissance Missions Considering Incomplete Information: A Reinforcement Learning Method. Drones 2023, 7, 10. [Google Scholar] [CrossRef]
  15. Sun, Y.; Xia, H.; Su, C.; Zhang, R.; Wang, J.; Jia, K. A multi-agent enhanced DDPG method for federated learning resource allocation in IoT. Comput. Commun. 2025, 233, 108066. [Google Scholar] [CrossRef]
  16. Li, C.; Fu, Q.; Chen, J.; Lu, Y.; Wang, Y.; Wu, H. FS-DDPG: Optimal Control of a Fan Coil Unit System Based on Safe Reinforcement Learning. Buildings 2025, 15, 226. [Google Scholar] [CrossRef]
  17. Lu, Y.; Xu, C.; Wang, Y. Joint Computation Offloading and Trajectory Optimization for Edge Computing UAV: A KNN-DDPG Algorithm. Drones 2024, 8, 564. [Google Scholar] [CrossRef]
  18. Polyakov, A. Nonlinear Feedback Design for Fixed-Time Stabilization of Linear Control Systems. IEEE Trans. Autom. Control 2012, 57, 2106–2110. [Google Scholar] [CrossRef]
  19. Labbadi, M.; Boubaker, S.; Djemai, M.; Mekni, S.; Bekrar, A. Fixed-Time Fractional-Order Global Sliding Mode Control for Nonholonomic Mobile Robot Systems under External Disturbances. Fractal Fract. 2022, 6, 177. [Google Scholar] [CrossRef]
  20. Zeng, Y.; Zhang, R. Energy-Efficient UAV Communication With Trajectory Optimization. IEEE Trans. Wirel. Commun. 2017, 16, 3747–3760. [Google Scholar] [CrossRef]
  21. Feng, Z.; Wu, Z.; Zou, J.; Cheng, L.; Zhao, X.; Zhang, X.; Lu, J.; Wang, C.; Wang, Y.; Wang, H.; et al. Memristive Bellman solver for decision-making. Nat. Commun. 2025, 16, 4925. [Google Scholar] [CrossRef]
  22. Gadiraju, D.S.; Karmakar, P.; Shah, V.K.; Aggarwal, V. GLIDE: Multi-Agent Deep Reinforcement Learning for Coordinated UAV Control in Dynamic Military Environments. Information 2024, 15, 477. [Google Scholar] [CrossRef]
  23. Liu, Q.; Yan, H.; Chen, K.; Wang, M.; Li, Z. Distributed Nash equilibrium solution for multi-agent game in adversarial environment: A reinforcement learning method. Automatica 2025, 178, 112342. [Google Scholar] [CrossRef]
  24. Wu, D.; Gao, Q. Intelligent detection method of small targets in UAV based on attention mechanism and edge enhancement filtering. Alex. Eng. J. 2025, 115, 201–209. [Google Scholar] [CrossRef]
  25. Chansuparp, M.; Jitkajornwanich, K. A Novel Augmentative Backward Reward Function with Deep Reinforcement Learning for Autonomous UAV Navigation. Appl. Artif. Intell. 2022, 36, 2084473. [Google Scholar] [CrossRef]
  26. Liu, K.; Zhao, Y.; Wang, G.; Peng, B. Self-Attention-Based Multi-Agent Continuous Control Method in Cooperative Environments. Inf. Sci. 2022, 585, 454–470. [Google Scholar] [CrossRef]
Figure 1. Strategy optimization framework of MADDPG-SASP.
Figure 2. The framework of the improved MADDPG.
Figure 3. The framework of the improved self-attention mechanism.
Figure 4. Changes in the angle of both parties over the game rounds under the MADDPG strategy.
Figure 5. Changes in the height of both parties over the game rounds under the MADDPG strategy.
Figure 6. Changes in the speed of both parties over the game rounds under the MADDPG strategy.
Figure 7. Changes in the energy of both parties over the game rounds under the MADDPG strategy.
Figure 8. Changes in the angle of both parties over the game rounds under the MADDPG-SASP strategy.
Figure 9. Changes in the height of both parties over the game rounds under the MADDPG-SASP strategy.
Figure 10. Changes in the speed of both parties over the game rounds under the MADDPG-SASP strategy.
Figure 11. Changes in the energy of both parties over the game rounds under the MADDPG-SASP strategy.
Figure 12. Trajectory diagram of agent and enemy. (a) Trajectory of both sides after 100 rounds; (b) Trajectory of both sides after 200 rounds; (c) Trajectory of both sides after 300 rounds; (d) Trajectory of both sides after 400 rounds; (e) Trajectory of both sides after 500 rounds; (f) Trajectory of both sides after 600 rounds.
Figure 13. Agent and enemy’s rewards (MADDPG-SASP).
Figure 14. Results of the ablation experiment.
Figure 15. Results of the comparative experiment.
Figure 16. Sensitivity analysis results. (a) Sensitivity analysis of parameter actor_lr; (b) Sensitivity analysis of parameter critic_lr; (c) Sensitivity analysis of parameter gamma; (d) Sensitivity analysis of parameter tau.
Figure 17. Trajectory diagram of agent and enemy (2 agents vs. 1 enemy in the Predator–Prey Scenario). (a) Trajectory of both sides after 100 rounds; (b) Trajectory of both sides after 200 rounds; (c) Trajectory of both sides after 300 rounds; (d) Trajectory of both sides after 400 rounds; (e) Trajectory of both sides after 500 rounds; (f) Trajectory of both sides after 600 rounds. (In the trajectories of the agent and the enemy, ‘.’ denotes the starting point, while ‘×’ denotes the endpoint).
Figure 18. Agent and enemy’s rewards (2 agents vs. 1 enemy in the Predator–Prey Scenario).
Figure 19. Trajectory diagram of agent and enemy (4 agents vs. 2 enemies in the Predator–Prey Scenario). (a) Trajectory of both sides after 100 rounds; (b) Trajectory of both sides after 200 rounds; (c) Trajectory of both sides after 300 rounds; (d) Trajectory of both sides after 400 rounds; (e) Trajectory of both sides after 500 rounds; (f) Trajectory of both sides after 600 rounds. (In the trajectories of the agent and the enemy, ‘.’ denotes the starting point, while ‘×’ denotes the endpoint).
Figure 20. Agent and enemy’s rewards (4 agents vs. 2 enemies in the Predator–Prey Scenario).
Table 1. Configuration of experimental parameters.
Parameters | Values
space_size | [−1000, 1000]
gamma | 0.99
tau | 0.01
actor_lr | 1 × 10⁻³
critic_lr | 1 × 10⁻³
batch_size | 10
replay_buffer_max_size | 100,000
rounds | 600
state_dim | 3
action_dim | 3
max_action | 1
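For convenience, the settings in Table 1 can be collected into a single configuration object; the dictionary below merely restates those values, and the key names are illustrative rather than the exact code interface.

```python
# Table 1 restated as a plain configuration dictionary; key names are illustrative.
EXPERIMENT_CONFIG = {
    "space_size": (-1000, 1000),
    "gamma": 0.99,
    "tau": 0.01,
    "actor_lr": 1e-3,
    "critic_lr": 1e-3,
    "batch_size": 10,
    "replay_buffer_max_size": 100_000,
    "rounds": 600,
    "state_dim": 3,
    "action_dim": 3,
    "max_action": 1,
}
```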
Table 2. Ablation experiments for different methods (600 rounds).
Methods | MADDPG | SP-MADDPG | SA-MADDPG | MADDPG-SASP
Average reward | 1298.228 | −582.772 | 5319.161 | 15,176.07
Agent’s wins | 124 | 229 | 598 | 600
Enemy’s wins | 476 | 371 | 2 | 0
Win rate | 26.17% | 38.17% | 99.67% | 100.00%
Table 3. Comparative experiments for different methods (600 rounds).
Methods | PPO | DQN | DDPG | MADDPG-SASP
Average reward | −4665.701 | −9396.234 | −3779.358 | 15,176.07
Agent’s wins | 380 | 294 | 526 | 600
Enemy’s wins | 220 | 306 | 74 | 0
Win rate | 63.33% | 49.00% | 87.67% | 100.00%
Table 4. Predator–prey game scenario 1 parameters for drones.
Parameter | Predator (Agents) | Prey (Enemies) | Unit
UAV count | 2 | 1 | units
Obstacles | 3 | 3 | units
Max speed | 8.0 | 12.0 | m/s
Acceleration | 2.0 | 3.5 | m/s²
Turn rate | 120 | 150 | °/s
Collision radius | 1.5 | 1.2 | m
Table 5. Predator–prey game scenario 2 parameters for drones.
Parameter | Predator (Agents) | Prey (Enemies) | Unit
UAV count | 4 | 2 | units
Obstacles | 3 | 3 | units
Max speed | 8.0 | 20.0 | m/s
Acceleration | 2.0 | 3.5 | m/s²
Turn rate | 120 | 150 | °/s
Collision radius | 1.5 | 1.2 | m
Table 6. Intelligent agent combat results of Scenario 1.
Metric | Total Rounds | Agent Wins | Agent Win Rate | Average Reward
Value | 600 | 591 | 98.50% | 8391.43
Table 7. Intelligent agent combat results of Scenario 2.
Metric | Total Rounds | Agent Wins | Agent Win Rate | Average Reward
Value | 600 | 600 | 100.00% | 4743.01
