Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network
Abstract
1. Introduction
- A hybrid policy network is designed to simultaneously generate coordinated beam selection and power allocation actions across the decomposed action space, enabling the jammer to efficiently handle mixed actions in complex confrontation scenarios.
- A dynamic weighted fusion metric is introduced to comprehensively assess the jamming effectiveness, with weights dynamically adjusted based on the radar’s various operational stages.
- The PPO2 algorithm is employed to train the reinforcement learning network. Clipping the importance sampling ratio prevents policy collapse caused by improper power allocation, and the training balances task success against energy efficiency.
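The mixed (discrete plus continuous) action named in the first contribution can be illustrated with a minimal sketch. It assumes, hypothetically, that each of the L = 4 jammer beams picks one of the four radars through a categorical head, and that a Gaussian head produces a raw vector that a softmax maps onto the 100 W power budget (both figures come from the parameter tables). The function names and this encoding are illustrative assumptions, not the paper's HPN architecture.

```python
import math
import random

TOTAL_POWER_W = 100.0  # total jammer power, from the jammer parameter table
NUM_BEAMS = 4          # L, number of jammer beams
NUM_RADARS = 4         # radars in the simulated network


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def sample_beam_targets(beam_logits, rng):
    """Discrete head (hypothetical encoding): each beam chooses which radar it jams."""
    targets, log_prob = [], 0.0
    for logits in beam_logits:
        probs = softmax(logits)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        targets.append(k)
        log_prob += math.log(probs[k])  # joint log-probability of all beam choices
    return targets, log_prob


def sample_power_split(mean, std, rng, total_power=TOTAL_POWER_W):
    """Continuous head (assumption): Gaussian sample z mapped onto the power budget.

    The returned log-probability is that of the pre-softmax sample z, which is the
    quantity a PPO importance ratio would be computed from if z is stored.
    """
    z = [rng.gauss(m, std) for m in mean]
    log_prob = sum(
        -0.5 * ((zi - m) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))
        for zi, m in zip(z, mean)
    )
    powers = [total_power * w for w in softmax(z)]  # beam powers sum to the budget
    return powers, z, log_prob
```

With uniform logits every beam choice has probability 0.25, and the softmax guarantees that the allocated beam powers always sum to the total budget, so the power constraint holds by construction rather than through a penalty.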
2. Jamming Model of Netted Radars
2.1. Active Jamming Type
2.2. Radar Echo Signal Model Under Jamming Condition
2.3. Jamming Effect Evaluation Index
2.3.1. Detection Probability
2.3.2. Positioning Accuracy
2.3.3. Weighted Model for Evaluation Indicators
3. HPN Based DRL for Jamming
3.1. Policy Gradient Methods
3.2. Advantage Actor-Critic
3.3. Proximal Policy Optimization
3.4. Jammer Agent Model
3.4.1. States of Jammer Agent
- State Promotion Condition: The radar transitions to the guidance state when at least two valid target locks are confirmed within three consecutive detection intervals.
- State Regression Condition: The radar returns to the search state if no valid target detection is recorded in three consecutive detection windows.
- State Retention Policy: The radar remains in the track state if exactly one successful confirmation occurs in three consecutive detections.
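The three transition rules can be sketched as a small state machine over a sliding window of the last three detection outcomes. The sketch assumes the working states are ordered search, track, guidance and generalizes the rules to one-stage moves: two or more hits move up one stage, zero hits move down one stage (from the track state this is the return to search described above; from guidance it is an assumption), and exactly one hit keeps the current state. Clearing the window after a transition is also an assumption.

```python
from collections import deque
from enum import IntEnum


class RadarState(IntEnum):
    # Assumed ordering of the radar working states, lowest to highest threat.
    SEARCH = 0
    TRACK = 1
    GUIDANCE = 2


def next_state(state, window):
    """Apply promotion / regression / retention to a full window of 0/1 detections."""
    hits = sum(window)
    if hits >= 2:  # promotion: at least two confirmations in three intervals
        return RadarState(min(state + 1, RadarState.GUIDANCE))
    if hits == 0:  # regression: no valid detection in three windows
        return RadarState(max(state - 1, RadarState.SEARCH))
    return RadarState(state)  # retention: exactly one confirmation


def simulate(detections, state=RadarState.SEARCH, window_len=3):
    """Run the rules over a detection sequence and record the state after each step."""
    window = deque(maxlen=window_len)  # keeps only the most recent outcomes
    history = []
    for hit in detections:
        window.append(1 if hit else 0)
        if len(window) == window_len:
            new_state = next_state(state, window)
            if new_state != state:
                window.clear()  # assumption: start a fresh window after a transition
            state = new_state
        history.append(state)
    return history
```

For example, the detection sequence `[1, 1, 0, 1, 1, 0]` starting in search promotes the radar to track after the third detection and to guidance after the sixth.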
3.4.2. Action of the Jammer Agent
3.4.3. Rewards for Jammer Agent
- Mitigation of the Sparse Reward Problem: Traditional reward mechanisms often suffer from sparsity in complex confrontation scenarios, leading to unstable policy updates. We designed the following three-tier reward scheme: a phased reward upon radar state escalation, a penalty for state degradation due to electronic countermeasures, and terminal rewards for task success or failure. This event-driven reward injection increases effective reward density during training while reducing policy gradient estimation variance.
- The Balance Between Energy Efficiency and Task Performance: The energy term enforces hard energy constraints through the reciprocal of the accumulated power, while the exponential term introduces temporal decay. This drives the agent to prioritize jamming intensity during early mission phases and to shift toward refined power allocation near mission deadlines, as the marginal benefit of power consumption decays exponentially over time.
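A per-step reward combining the three event-driven tiers with the energy term might look like the sketch below. All coefficients (the stage-change rewards, terminal rewards, energy weight, and decay rate) are hypothetical placeholders, since their values are not given in this section; the signs follow the wording of the reward description, and the energy term is read as an exponentially decaying weight times the reciprocal of the accumulated power. Radar stages are integers ordered search < track < guidance.

```python
import math

# Hypothetical placeholder coefficients; not the paper's values.
ESCALATION_REWARD = 1.0     # phased reward on radar state escalation
DEGRADATION_PENALTY = -1.0  # penalty on state degradation
SUCCESS_REWARD = 10.0       # terminal reward, task success
FAILURE_REWARD = -10.0      # terminal reward, task failure
ENERGY_WEIGHT = 1.0         # scale of the energy term
DECAY_RATE = 0.05           # exponential temporal decay per step


def step_reward(prev_stage, new_stage, t, cumulative_power, terminal=None):
    """Event-driven shaping reward plus a decaying reciprocal-power energy term."""
    r = 0.0
    if new_stage > prev_stage:
        r += ESCALATION_REWARD
    elif new_stage < prev_stage:
        r += DEGRADATION_PENALTY
    if terminal == "success":
        r += SUCCESS_REWARD
    elif terminal == "failure":
        r += FAILURE_REWARD
    # Reciprocal of accumulated power, weighted by exp(-decay * t); guard against zero.
    r += ENERGY_WEIGHT * math.exp(-DECAY_RATE * t) / max(cumulative_power, 1e-6)
    return r
```

Because the event terms fire on every stage change rather than only at episode end, most steps with a transition carry a non-zero learning signal, which is the density argument made in the first bullet.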
3.4.4. HPN Optimization
Algorithm 1. Optimized jamming strategy allocation based on HPN-enhanced PPO2
Require: Maximum episodes M, discount factor, clip threshold, learning rates
Ensure: Optimized policy network and value network
- GAE advantage calculation: The GAE method balances bias and variance in advantage estimation by aggregating multi-step temporal difference (TD) errors. The advantage function is computed as follows:
- Actor Update: The policy network is updated with a clipped surrogate objective to ensure stable policy improvement. For hybrid action spaces, the importance ratios of the two action types are computed separately as follows: According to Equation (23), the overall policy loss combines both ratios with the advantage function and applies a clipping mechanism:
- Critic Update: The value network minimizes the mean squared error (MSE) between predicted state-values and target values derived from discounted cumulative rewards. The Critic loss function is calculated as follows:
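The three update steps can be sketched in plain Python. `gae` implements the standard generalized advantage estimator of Schulman et al. (TD error δ_t = r_t + γV(s_{t+1}) − V(s_t), accumulated with weight (γλ)^l); γ = 0.9 and the clip threshold 0.2 are taken from the training parameter table, while λ = 0.95 is an assumption. How the two ratios are combined in Equation (23) is not recoverable here, so `hybrid_policy_loss` assumes one common choice: summing a separately clipped surrogate for the discrete (beam) head and the continuous (power) head.

```python
import math


def gae(rewards, values, last_value, dones, gamma=0.9, lam=0.95):
    """Generalized advantage estimation; dones[t] = 1 if the episode ends after step t."""
    advantages = [0.0] * len(rewards)
    next_adv, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]  # TD error
        next_adv = delta + gamma * lam * nonterminal * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]  # critic targets
    return advantages, returns


def clipped_surrogate(ratios, advantages, clip=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, 1.0 - clip), 1.0 + clip)
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def hybrid_policy_loss(logp_d_new, logp_d_old, logp_c_new, logp_c_old, advantages, clip=0.2):
    """Assumed combination: sum of per-head clipped surrogates, negated for minimization."""
    ratio_d = [math.exp(n - o) for n, o in zip(logp_d_new, logp_d_old)]  # beam head
    ratio_c = [math.exp(n - o) for n, o in zip(logp_c_new, logp_c_old)]  # power head
    return -(clipped_surrogate(ratio_d, advantages, clip)
             + clipped_surrogate(ratio_c, advantages, clip))


def critic_loss(values, returns):
    """Mean squared error between predicted state values and GAE-based returns."""
    return sum((v - g) ** 2 for v, g in zip(values, returns)) / len(values)
```

With λ = 1 and zero value estimates, `gae` reduces to the discounted return, e.g. rewards `[1, 1]` ending the episode give advantages of about `[1.9, 1.0]`. Clipping each head separately keeps an overly large change in the power distribution from being masked by a small change in beam selection, which is one way to read the policy-collapse argument in the contributions.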
4. Experiment and Simulation
4.1. Scene Description and Parameter Settings
4.2. Comparison Strategies
4.3. Training Process
4.4. Experimental Results
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Parameter | Value |
---|---|
Number of beams generated by the jammer L | 4 |
Total power of the jammer | 100 W |
Operating wavelength | 0.1 m |
Antenna gain of the jammer | 10 dB |
Polarization mismatch loss | 0.5 |
RCS of target | 1 |
Parameter | Value |
---|---|
Transmit power (W) | |
Transmit-antenna gain | 40 dB |
Operating wavelength | 0.1 m |
Main-lobe beamwidth of the antenna | |
Detection threshold | 1 |
Thermal noise power of receiver (W) | |
Pulse width (s) | |
Pulse repetition frequency (Hz) | |
Antenna servo bandwidth (Hz) | |
Receiver bandwidth B (Hz) | |
Type | Initial Location (km) | Initial Speed (m/s) |
---|---|---|
Jammer | (25,55,30) | (0,−0.4,−0.15) |
Target | (24,55,30) | (0,−0.4,−0.15) |
Radar 1 | (10,20,0) | / |
Radar 2 | (20,10,0) | / |
Radar 3 | (30,10,0) | / |
Radar 4 | (40,20,0) | / |
Parameter | Value |
---|---|
Maximum training episodes M | 1000 |
Maximum simulation steps | 50 |
Replay buffer size | 1024 |
Clipping threshold | 0.2 |
Discount factor | 0.9 |
Actor learning rate | 0.0001 |
Critic learning rate | 0.0003 |
Actor update steps | 10 |
Critic update steps | 10 |
Gradient clipping norm | 0.5 |
Optimizer | Adam |
Rate of exponential decay | 0.9999 |
Method | Training Time (s) | Avg. Inference Time (ms) | Complexity Class |
---|---|---|---|
HPN | 3560 | 15.2 | |
AKE | 3200 | 14.8 | |
DQN | 8900 | 32.5 | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hao, W.; Ke, W.; Feng, X.; Xia, Z. Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network. Appl. Sci. 2025, 15, 8898. https://doi.org/10.3390/app15168898
Hao W, Ke W, Feng X, Xia Z. Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network. Applied Sciences. 2025; 15(16):8898. https://doi.org/10.3390/app15168898
Chicago/Turabian Style: Hao, Wanbing, Wentao Ke, Xiaoyi Feng, and Zhaoqiang Xia. 2025. "Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network" Applied Sciences 15, no. 16: 8898. https://doi.org/10.3390/app15168898
APA Style: Hao, W., Ke, W., Feng, X., & Xia, Z. (2025). Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network. Applied Sciences, 15(16), 8898. https://doi.org/10.3390/app15168898