In existing phased-array radar systems, anti-jamming strategies are mainly generated through manual judgment. However, manually designing or selecting anti-jamming decisions is difficult and unreliable in complex jamming environments. Reinforcement learning has therefore been applied to anti-jamming decision-making to address these problems. However, existing reinforcement-learning-based anti-jamming decision-making models often suffer from slow convergence and low decision-making accuracy. In this paper, a multi-aspect improved deep Q-network (MAI-DQN) is proposed that improves the exploration policy, the network structure, and the training method of the deep Q-network. To address the strong dependence of the ε-greedy strategy on hyperparameter settings, and the excessive influence of the action on the Q-value in other deep Q-networks, this paper proposes a structure that combines a noisy network, a dueling network, and a double deep Q-network, which incorporates an adaptive exploration policy into the neural network and increases the influence of the state itself on the Q-value.
These enhancements enable a highly adaptive exploration strategy and a high-performance network architecture, thereby improving the decision-making accuracy of the model. To calculate the target value more accurately during training and to improve the stability of the parameter updates, this paper proposes a training method that combines n-step learning, target soft updates, a variable learning rate, and gradient clipping. Moreover, a novel variable double-depth priority experience replay (VDDPER) method, which more accurately simulates the storage and update mechanism of human memory, is used in the MAI-DQN. VDDPER improves decision-making accuracy by dynamically adjusting the sample size drawn from experiences of different values during training, enhancing exploration in the early stages of training and placing greater emphasis on high-value experiences in the later stages. These enhancements to the training method improve the model's convergence speed. Furthermore, a reward function combining signal-level and data-level benefits is proposed to adapt to complex jamming environments, ensuring a high reward convergence speed with few computational resources. Simulation results show that the proposed phased-array radar anti-jamming decision-making method based on MAI-DQN achieves a high convergence speed and high decision-making accuracy in environments where deceptive jamming and suppressive jamming coexist.
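To make the training ingredients concrete, the sketch below combines an n-step double-DQN target with target soft updates and gradient clipping. It is only an illustration under assumptions: the discount factor, n-step length, soft-update rate tau, clipping norm, and Huber loss are illustrative choices, and the paper's variable-learning-rate schedule is only indicated by a comment.

```python
import torch
import torch.nn.functional as F

def n_step_double_dqn_target(online_net, target_net, n_step_return,
                             next_state, done, gamma=0.99, n=3):
    """Double-DQN target with an n-step return: the online network selects the
    greedy action, the target network evaluates it, reducing over-estimation."""
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        return n_step_return + (gamma ** n) * next_q * (1.0 - done)

def soft_update(target_net, online_net, tau=0.005):
    """Target soft update (Polyak averaging): target <- tau*online + (1-tau)*target."""
    with torch.no_grad():
        for tp, op in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * op)

def train_step(online_net, target_net, optimizer, batch,
               gamma=0.99, n=3, max_norm=10.0):
    """One update on a batch of (state, action, n-step return, next_state, done),
    where done is a 0/1 float mask and action is a LongTensor of indices."""
    state, action, n_step_return, next_state, done = batch
    if hasattr(online_net, "reset_noise"):   # resample exploration noise if noisy layers are used
        online_net.reset_noise()
    q = online_net(state).gather(1, action.unsqueeze(1)).squeeze(1)
    target = n_step_double_dqn_target(online_net, target_net, n_step_return,
                                      next_state, done, gamma, n)
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_norm)  # gradient clipping
    optimizer.step()                         # a variable learning rate would be applied
    soft_update(target_net, online_net)      # via an LR scheduler stepped here
    return loss.item()
```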
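The abstract does not describe VDDPER's internal mechanism, so the following is only an illustrative sketch of the behaviour it states: a replay memory whose sampling mix shifts from exploration-oriented uniform draws early in training towards high-value experiences later. The two-pool layout, the TD-error threshold, and the linear schedule are assumptions, not the paper's algorithm.

```python
import random
from collections import deque

class TwoDepthReplayBuffer:
    """Illustrative two-pool replay memory (NOT the paper's VDDPER algorithm):
    a 'shallow' pool of recent experiences sampled uniformly (exploration) and
    a 'deep' pool of high-value (large TD-error) experiences kept longer.
    The fraction drawn from the deep pool grows with training progress, so early
    batches favour exploration and later batches favour high-value experiences."""
    def __init__(self, shallow_size=20_000, deep_size=20_000, value_threshold=1.0):
        self.shallow = deque(maxlen=shallow_size)
        self.deep = deque(maxlen=deep_size)
        self.value_threshold = value_threshold  # assumed TD-error cut-off

    def add(self, transition, td_error):
        self.shallow.append(transition)
        if abs(td_error) >= self.value_threshold:
            self.deep.append(transition)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: fraction of training completed."""
        n_deep = int(batch_size * progress) if self.deep else 0
        n_deep = min(n_deep, len(self.deep))
        n_shallow = min(batch_size - n_deep, len(self.shallow))
        batch = random.sample(self.deep, n_deep) + random.sample(self.shallow, n_shallow)
        random.shuffle(batch)
        return batch
```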