Abstract
Decision-making for maneuvering in the presence of long-range threats is crucial for enhancing the safety and reliability of autonomous aerial platforms operating in beyond-visual-range (BVR) environments. This study employs the Deep Q-Network (DQN) method to investigate maneuvering strategies for simultaneously avoiding incoming high-speed threats and re-establishing tracking of a maneuvering target platform. First, kinematic models of the aerial platforms and the approaching interceptor are developed, and a DQN training environment is constructed from these models. A DQN framework is then designed, integrating a scenario-specific state representation, a discrete action space, and a hybrid reward structure to enable autonomous strategy learning without prior expert knowledge. The agent is trained within this environment to achieve near-optimal maneuvering decisions and is evaluated against Q-learning and deep deterministic policy gradient (DDPG) baselines. Simulation results demonstrate that the trained model outperforms both baselines on the key metrics: it effectively avoids approaching threats, re-establishes robust target tracking, reduces maneuver time, and generalizes well across challenging scenarios. This work advances BVR maneuver planning and provides a foundational methodological framework for future research on complex multi-stage aerial pursuit–evasion problems.
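The DQN pipeline summarized above (a state representation fed to a Q-function, a discrete action space, reward-driven temporal-difference learning with a replay buffer and a periodically synced target network) can be sketched in miniature. This is an illustrative stand-in, not the authors' implementation: the class names are hypothetical, and a linear Q-approximator replaces the paper's deep network to keep the sketch self-contained.

```python
import numpy as np

class ReplayBuffer:
    """Fixed-capacity experience store, sampled uniformly for training."""
    def __init__(self, capacity=10_000):
        self.capacity, self.data, self.pos = capacity, [], 0

    def push(self, transition):
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition  # overwrite oldest
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng):
        idx = rng.integers(len(self.data), size=batch_size)
        return [self.data[i] for i in idx]

class LinearDQN:
    """DQN-style agent: epsilon-greedy action selection, replay buffer,
    semi-gradient TD updates toward a frozen target network.
    A linear Q(s, a) = W[a] . s stands in for the deep network."""
    def __init__(self, state_dim, n_actions, lr=0.01, gamma=0.95, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(scale=0.1, size=(n_actions, state_dim))
        self.W_target = self.W.copy()
        self.lr, self.gamma, self.n_actions = lr, gamma, n_actions
        self.buffer = ReplayBuffer()

    def q_values(self, state, target=False):
        W = self.W_target if target else self.W
        return W @ state

    def act(self, state, epsilon):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if self.rng.random() < epsilon:
            return int(self.rng.integers(self.n_actions))
        return int(np.argmax(self.q_values(state)))

    def train_step(self, batch_size=32):
        if len(self.buffer.data) < batch_size:
            return
        for s, a, r, s_next, done in self.buffer.sample(batch_size, self.rng):
            # Bootstrapped TD target uses the frozen target weights.
            target = r if done else r + self.gamma * np.max(
                self.q_values(s_next, target=True))
            td_error = target - self.q_values(s)[a]
            self.W[a] += self.lr * td_error * s  # semi-gradient update

    def sync_target(self):
        self.W_target = self.W.copy()
```

In the paper's setting, the transition tuples would come from the kinematic threat/target environment, with the hybrid reward combining evasion and tracking terms; here any stream of `(state, action, reward, next_state, done)` tuples suffices.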