Abstract
Unmanned aerial vehicle (UAV) penetration missions in hostile environments face significant challenges due to dense threat coverage, dynamic defense systems, and the need for real-time decision-making under uncertainty. Traditional path planning methods suffer from computational intractability in high-dimensional spaces, while existing deep reinforcement learning approaches lack efficient feature-extraction and sample-utilization mechanisms for threat-dense scenarios. To address these limitations, this paper presents an enhanced Deep Q-Network (DQN) framework that integrates multi-head attention mechanisms with dynamic priority experience replay for autonomous UAV path planning. The proposed architecture employs four specialized attention heads operating in parallel to extract proximity, danger, alignment, and threat-density features, enabling selective focus on critical aspects of the environment. A dynamic priority mechanism adaptively adjusts the sampling strategy during training, prioritizing informative experiences during early exploration while maintaining balanced learning in later stages. Experimental results demonstrate that the proposed method achieves a 94.3% mission success rate in complex penetration scenarios, an improvement of 7.1–17.5% over state-of-the-art baselines, with 2.2× faster convergence. The approach shows superior robustness in high-threat environments and meets real-time operational requirements with an inference latency of 18.3 ms, demonstrating its practical viability for autonomous UAV penetration missions.
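To make the four-head attention design concrete, the following is a minimal PyTorch sketch of parallel attention heads dedicated to the proximity, danger, alignment, and threat-density feature families named above. The class name `MultiHeadThreatAttention`, the per-head dimension, and the entity-token input layout are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiHeadThreatAttention(nn.Module):
    """Four parallel attention heads, one per feature family
    (proximity, danger, alignment, threat density).
    Sketch only: names and dimensions are assumptions."""

    HEAD_NAMES = ("proximity", "danger", "alignment", "density")

    def __init__(self, state_dim: int, head_dim: int = 32):
        super().__init__()
        # Independent query/key/value projections per head, so each
        # head can specialize on a different aspect of the threat map.
        self.heads = nn.ModuleDict({
            name: nn.ModuleDict({
                "q": nn.Linear(state_dim, head_dim),
                "k": nn.Linear(state_dim, head_dim),
                "v": nn.Linear(state_dim, head_dim),
            })
            for name in self.HEAD_NAMES
        })
        self.scale = head_dim ** -0.5
        # Fuse the concatenated head outputs back to the state width.
        self.fuse = nn.Linear(head_dim * len(self.HEAD_NAMES), state_dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_entities, state_dim) -- one token per
        # threat or waypoint entity observed by the UAV.
        outputs = []
        for name in self.HEAD_NAMES:
            h = self.heads[name]
            q, k, v = h["q"](tokens), h["k"](tokens), h["v"](tokens)
            attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
            outputs.append(attn @ v)
        return self.fuse(torch.cat(outputs, dim=-1))
```

Each head here shares the same scaled dot-product form; in practice the specialization would come from head-specific inputs or auxiliary losses, which the abstract does not detail.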
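The dynamic priority mechanism can likewise be sketched as a prioritized replay buffer whose priority exponent is annealed over training, so that sampling is strongly TD-error-driven early and closer to uniform later. The linear schedule, hyperparameter values, and class name below are assumptions for illustration.

```python
import numpy as np

class DynamicPriorityReplay:
    """Prioritized replay whose exponent alpha anneals from alpha_start
    (strong prioritization, early exploration) toward alpha_end
    (near-uniform, balanced late-stage learning). Sketch only."""

    def __init__(self, capacity, alpha_start=0.9, alpha_end=0.4,
                 anneal_steps=100_000, eps=1e-5):
        self.capacity = capacity
        self.alpha_start, self.alpha_end = alpha_start, alpha_end
        self.anneal_steps = anneal_steps
        self.eps = eps
        self.buffer = []
        self.priorities = np.zeros(capacity, dtype=np.float64)
        self.pos, self.step = 0, 0

    def _alpha(self):
        # Linear schedule: heavy prioritization early, flatter later.
        frac = min(self.step / self.anneal_steps, 1.0)
        return self.alpha_start + frac * (self.alpha_end - self.alpha_start)

    def push(self, transition):
        # New transitions get max priority so each is sampled at least once.
        max_p = self.priorities[:len(self.buffer)].max() if self.buffer else 1.0
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        self.step += 1
        p = self.priorities[:len(self.buffer)] ** self._alpha()
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias.
        weights = (len(self.buffer) * probs[idx]) ** -1
        weights /= weights.max()
        return [self.buffer[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = np.abs(td_errors) + self.eps
```

Annealing alpha toward a lower value flattens the sampling distribution over time, which is one plausible reading of the "balanced learning in later stages" behavior the abstract describes.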