# Jamming Strategy Optimization through Dual Q-Learning Model against Adaptive Radar


## Abstract


## 1. Introduction

- A reinforcement learning (RL) model named dual Q-learning (DQL) is constructed to guide jamming decision-making against adaptive radars, in which the jamming mode and the jamming parameters are selected hierarchically and optimized jointly. Because the hierarchy reduces the dimensionality of the action space, the globally optimal solution can be found with a shorter convergence time.
- A new jamming effectiveness evaluation method based on an indicator vector space is proposed to provide feedback to the DQL model, which removes the dependence on subjective experience when the model is updated. In addition, because the electromagnetic environment varies, the indicator weights are calculated dynamically from real-time radar data to make the evaluation result more credible.

## 2. System Model and Problem Formulation

#### 2.1. System Model

#### 2.2. Problem Formulation

## 3. Proposed Jamming Scheme Based on DQL Model

#### 3.1. Jamming Decision-Making through DQL Model

#### 3.1.1. Outer Q-Learning

#### 3.1.2. Inner Q-Learning

Algorithm 1: Jamming algorithm based on DQL model

($K$ and ${N}_{k}$ denote the number of jamming rounds in the simulation and the total number of pulses in the $k$th jamming round, respectively. $m$ represents the number of pulses required to discern the radar mode.)

for $k=1,2,\dots ,K$ do

end
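
The loop of Algorithm 1, read together with the flow described in the caption of Figure 2, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the standard tabular Q-learning update stands in for Equations (11) and (13), which are not reproduced here; `ToyRadar`, its reward rules, the hyperparameters, and all function names are hypothetical placeholders chosen only to make the control flow runnable.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration rate


class QTable:
    """Tabular epsilon-greedy Q-learning (standard update, used as a stand-in)."""

    def __init__(self, actions, rng):
        self.actions = list(actions)
        self.q = defaultdict(float)  # (state, action) -> value, zero-initialized
        self.rng = rng

    def choose(self, state):
        if self.rng.random() < EPSILON:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + GAMMA * best_next - self.q[(state, action)]
        self.q[(state, action)] += ALPHA * td_error


class ToyRadar:
    """Hypothetical stand-in environment; NOT the radar model of the paper."""

    def __init__(self, rng, pulses_per_round=20):
        self.rng = rng
        self.pulses_per_round = pulses_per_round

    def discern_mode(self):
        # Pretend the first m pulses of the round have been classified.
        return self.rng.choice(["search", "tracking"])

    def estimate_pulse(self):
        # Pulse parameter vector s(n), reduced here to a carrier frequency in MHz.
        return self.rng.choice([8500, 9000, 9500])

    def outer_reward(self, prev_mode, new_mode):
        # Toy feedback for the previous round: favor a radar that is searching.
        return 1.0 if new_mode == "search" else -1.0

    def inner_rewards(self, pulse, delay, freq):
        # Toy time-domain and frequency-domain effectiveness coefficients.
        return (1.0 if delay == 800 else 0.0), (1.0 if freq == pulse else 0.0)


def run_dql(env, jam_modes, time_actions, freq_actions, rounds, rng):
    outer = QTable(jam_modes, rng)  # radar mode -> jamming mode
    inner = {jm: (QTable(time_actions[jm], rng), QTable(freq_actions[jm], rng))
             for jm in jam_modes}   # one time table and one frequency table per mode
    prev = None                     # (radar mode, jamming mode) of the last round
    for _ in range(rounds):
        radar_mode = env.discern_mode()
        if prev is not None:
            # Evaluate the last round once the new radar mode is known.
            reward = env.outer_reward(prev[0], radar_mode)
            outer.update(prev[0], prev[1], reward, radar_mode)
        jam_mode = outer.choose(radar_mode)
        q_time, q_freq = inner[jam_mode]
        state = env.estimate_pulse()
        for _ in range(env.pulses_per_round):
            delay, freq = q_time.choose(state), q_freq.choose(state)
            next_state = env.estimate_pulse()
            r_time, r_freq = env.inner_rewards(next_state, delay, freq)
            q_time.update(state, delay, r_time, next_state)
            q_freq.update(state, freq, r_freq, next_state)
            state = next_state
        prev = (radar_mode, jam_mode)
    return outer, inner


if __name__ == "__main__":
    rng = random.Random(0)
    modes = ["M_j1", "M_j2"]
    delays = {jm: [800, 900, 1000] for jm in modes}
    freqs = {jm: [8500, 9000, 9500] for jm in modes}
    outer, inner = run_dql(ToyRadar(rng), modes, delays, freqs, rounds=30, rng=rng)
    print(len(outer.q), "outer Q entries learned")
```

Keeping one outer table and separate inner tables per jamming mode mirrors the hierarchical selection of mode and parameters; how the real rewards are computed is left to the evaluation method of Section 3.2.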

#### 3.2. Jamming Effectiveness Evaluation through Dynamic Measuring of Vector Distance

#### 3.2.1. The Jamming Effectiveness Evaluation Method Based on Vector Distance Measuring

#### 3.2.2. The Method of Dynamically Weighting for Evaluation Indicators

Algorithm 2: Jamming effectiveness evaluation algorithm

Input: Evaluation indicator vector ${\mathit{x}}^{\left(k\right)}$

Output: Jamming effect evaluation result ${R}_{o}^{\left(k\right)}$

Calculate the Euclidean distance $d({\mathit{x}}^{(k-1)},{\mathit{x}}^{\left(k\right)},{\mathbf{\omega}}^{\left(k\right)})$ between the normalized ${\mathit{x}}^{\left(k\right)}$ and the last indicator vector ${\mathit{x}}^{(k-1)}$ with the weight vector ${\mathbf{\omega}}^{\left(k\right)}$ according to Equation (14).

Calculate the feedback ${R}_{o}^{\left(k\right)}$ of the DQL model through Equation (15).
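
The distance step of Algorithm 2 can be illustrated with a small sketch. It assumes that Equation (14), which is not reproduced here, takes the common weighted Euclidean form $d(\mathit{u},\mathit{v},\mathit{\omega})=\sqrt{\sum_{i}{\omega}_{i}{({u}_{i}-{v}_{i})}^{2}}$ and that indicators are min-max normalized. Both are assumptions, as are the function names and the sample values. The mapping from distance to feedback in Equation (15) is deliberately left out.

```python
import math


def min_max_normalize(x, lower, upper):
    """Scale each indicator to [0, 1] using assumed per-indicator bounds."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lower, upper)]


def weighted_distance(u, v, w):
    """Weighted Euclidean distance (assumed form of d(u, v, w))."""
    if not (len(u) == len(v) == len(w)):
        raise ValueError("u, v and w must have the same length")
    return math.sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))


if __name__ == "__main__":
    # Hypothetical two-indicator example with equal weights.
    previous = [0.0, 0.0]
    current = [1.0, 1.0]
    weights = [0.5, 0.5]
    print(weighted_distance(previous, current, weights))
```

With weights that sum to one, the distance between two normalized vectors stays in the range 0 to 1, which makes results from different rounds comparable.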

## 4. Numerical Results

## 5. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest


**Figure 1.** The self-defense electronic jamming scenario: through effective jamming, the jammer can shorten the detection range of the radar to protect the target.

**Figure 2.** Jamming decision-making based on the DQL model: During a jamming round, the radar work mode is discerned from the first m received radar pulses. Once a new radar mode ${M}_{r}^{\left(k\right)}$ is recognized, it is sent to the outer Q-learning module. Then the jamming mode ${M}_{j}^{\left(k\right)}$ is chosen and mapped to the time-domain Q table and the frequency-domain Q table in the inner Q-learning module. According to ${M}_{r}^{\left(k\right)}$ and ${M}_{r}^{(k-1)}$, the jamming effectiveness of the last jamming round is evaluated, and the outer Q table is updated with the result according to Equation (11). When the nth ($n>m$) radar pulse is received, the radar pulse parameter vector ${s}^{\left(n\right)}$ is obtained through parameter estimation. Then the jamming parameters are selected to constitute the parameter vector ${a}^{(n+1)}$, and the jamming signal is generated. According to ${s}^{\left(n\right)}$, the two effective jamming coefficients are evaluated, and the inner Q table is updated with them according to Equation (13).

**Figure 4.** Vector space ${\mathcal{V}}^{3}$ composed of three evaluation indicators. $\mathit{u}$ and $\mathit{v}$ are two evaluation indicator vectors, and $d(\mathit{u},\mathit{v},\mathit{\omega})$ is the Euclidean distance between $\mathit{u}$ and $\mathit{v}$ with weight vector $\mathit{\omega}$.

**Figure 5.** Time-frequency information at the initial and convergent stages: in each image, the brighter strips are radar pulses and the darker strips are jamming pulses. (**a1**, **b1**, **c1**, **d1**) show the time-frequency information of radar and jamming pulses at the initial stage when the radar is in mode ${M}_{{r}_{1}}$, ${M}_{{r}_{4}}$, ${M}_{{r}_{5}}$, and ${M}_{{r}_{7}}$, respectively. (**a2**, **b2**, **c2**, **d2**) show the corresponding time-frequency information at the convergent stage for the same four radar modes.

**Figure 7.** Radar mode switching comparison among different methods: radar modes 1 to 8 represent ${M}_{{r}_{1}}$ to ${M}_{{r}_{8}}$, respectively.

**Figure 8.** Convergence time comparison among different methods. The size of the jamming action space is defined as the product of the total number of optional parameters in the time and frequency domains.

**Table 1.** Radar parameter template. When the radar is in a certain mode, it selects the pulse parameters according to the rules in the template. Reside and switch, slippery, staggered, and jittered are four common radar parameter agility patterns. As shown, reside and switch A:k B:m C:n means that the parameter value stays at A for k pulses, at B for m pulses, and at C for n pulses; slippery A:B:C means that the parameter value changes from A to C in steps of B; staggered, such as [A B C], means that the parameter value is cycled in the order of the list; jittered, such as (A, B), means that the parameter value is randomly selected from the range A to B.

Work Mode | Sub-Mode | ${\mathit{f}}_{\mathit{r}}$/MHz | ${\mathit{B}}_{\mathit{r}}$/MHz | ${\mathit{pri}}_{\mathit{r}}$/μs | ${\mathit{pw}}_{\mathit{r}}$/μs | ${\mathit{P}}_{\mathit{r}}$/kW |
---|---|---|---|---|---|---|
search | ${M}_{{r}_{1}}$ | reside and switch: 8500:5 9500:5 9000:5 | 100 | staggered: [1100 1320 1470] | 80 | 120 |
search | ${M}_{{r}_{2}}$ | reside and switch: 8600:3 9600:3 9100:3 | 100 | staggered: [1100 1320 1470] | 120 | 120 |
acquisition | ${M}_{{r}_{3}}$ | slippery: 8800:600:10000 | 150 | staggered: [1070 1430 857] | 120 | 170 |
acquisition | ${M}_{{r}_{4}}$ | slippery: 9800:600:12200 | 150 | staggered: [1070 1430 857] | 120 | 170 |
tracking | ${M}_{{r}_{5}}$ | jittered: (8500,11500) | 800 | reside and switch: 830:2 890:4 960:3 | 120 | 170 |
tracking | ${M}_{{r}_{6}}$ | jittered: (7500,12500) | 1000 | reside and switch: 830:2 890:4 960:3 | 120 | 170 |
guidance | ${M}_{{r}_{7}}$ | jittered: (9500,12500) | 800 | slippery: 740:40:900 | 120 | 200 |
guidance | ${M}_{{r}_{8}}$ | jittered: (8500,13500) | 1000 | slippery: 740:40:900 | 120 | 200 |
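
The four agility patterns defined in the caption of Table 1 can be sketched as Python generators. This is an illustrative sketch, not the simulation code of the paper: the function names are hypothetical, and two behaviors are assumptions that the caption does not state, namely that reside and switch repeats its sequence after the last value and that slippery wraps back to A after reaching C.

```python
import itertools
import random


def reside_and_switch(pairs):
    """A:k B:m C:n -> hold A for k pulses, B for m, C for n (repeating is assumed)."""
    return itertools.cycle([value for value, count in pairs for _ in range(count)])


def slippery(start, step, stop):
    """A:B:C -> move from A to C in steps of B (wrapping back to A is assumed)."""
    return itertools.cycle(range(start, stop + 1, step))


def staggered(values):
    """[A B C] -> cycle through the values in list order."""
    return itertools.cycle(values)


def jittered(low, high, rng):
    """(A, B) -> draw each value uniformly at random from the range A to B."""
    while True:
        yield rng.uniform(low, high)


def take(pattern, n):
    """Return the first n values of a pattern as a list."""
    return list(itertools.islice(pattern, n))


if __name__ == "__main__":
    # Carrier frequency and PRI of sub-mode M_r1, PRI of M_r7, frequency of M_r5.
    print(take(reside_and_switch([(8500, 5), (9500, 5), (9000, 5)]), 6))
    print(take(staggered([1100, 1320, 1470]), 4))
    print(take(slippery(740, 40, 900), 6))
    print(take(jittered(8500, 11500, random.Random(0)), 3))
```

Modelling each parameter as an endless generator keeps the per-pulse draw independent of the number of pulses in a jamming round.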

**Table 2.** Jamming parameter template. When a jamming mode is determined, the corresponding pulse parameters are selected according to the rules in the template. {A:B:C} denotes a set of optional values, consisting of an arithmetic sequence from A to C with B as the difference.

Mode | ${\mathit{f}}_{\mathit{j}}$/MHz | ${\mathit{B}}_{\mathit{j}}$($\times {\mathit{B}}_{\mathit{r}}$) | ${\mathit{dt}}_{\mathit{j}}$/μs | ${\mathit{pw}}_{\mathit{j}}$($\times {\mathit{pw}}_{\mathit{r}}$) | ${\mathit{P}}_{\mathit{j}}$($\times {\mathit{P}}_{\mathit{r}}$) |
---|---|---|---|---|---|
${M}_{{j}_{1}}$ | {8000:500:12000} | 3 | {800:100:1500} | 2 | 0.003 |
${M}_{{j}_{2}}$ | {8500:1000:11500} | 6 | {800:100:1500} | 2 | 0.006 |
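
As a worked example of the {A:B:C} notation, the sketch below expands the optional values of Table 2 and computes the size of the jamming action space as the number of frequency options times the number of time-delay options, following the definition given in the caption of Figure 8. The dictionary layout and function names are hypothetical, and treating ${\mathit{f}}_{\mathit{j}}$ and ${\mathit{dt}}_{\mathit{j}}$ as the only optional parameters is an assumption based on their being the only set-valued columns in the table.

```python
def expand(start, step, stop):
    """Expand {A:B:C} into the arithmetic sequence from A to C with difference B."""
    return list(range(start, stop + 1, step))


# Optional values per jamming mode, copied from Table 2.
JAMMING_TEMPLATE = {
    "M_j1": {"f_j_MHz": expand(8000, 500, 12000), "dt_j_us": expand(800, 100, 1500)},
    "M_j2": {"f_j_MHz": expand(8500, 1000, 11500), "dt_j_us": expand(800, 100, 1500)},
}


def action_space_size(row):
    """Product of the number of frequency options and time-delay options."""
    return len(row["f_j_MHz"]) * len(row["dt_j_us"])


if __name__ == "__main__":
    for mode, row in JAMMING_TEMPLATE.items():
        print(mode, len(row["f_j_MHz"]), len(row["dt_j_us"]), action_space_size(row))
    # M_j1: 9 frequencies x 8 delays = 72 joint actions
    # M_j2: 4 frequencies x 8 delays = 32 joint actions
```

If the time-domain and frequency-domain parameters are learned in separate tables, each state of mode ${M}_{{j}_{1}}$ needs 9 + 8 = 17 entries instead of 72 joint ones, which illustrates why a smaller action space is cheaper to explore.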

**Table 3.** Jamming effectiveness evaluation indicator set. A positive correlation to the evaluation result means that the greater the increase in the indicator, the better the jamming effectiveness; a negative correlation means that the greater the decrease in the indicator, the better the jamming effectiveness.

Evaluation Indicator | Explanation | Correlation to Evaluation Result |
---|---|---|
${I}_{1}$ | PRI | positive |
${I}_{2}$ | power | positive |
${I}_{3}$ | beam dwell time | negative |
${I}_{4}$ | BW | positive |
${I}_{5}$ | PW | positive |
${I}_{6}$ | range of frequency agility | positive |
${I}_{7}$ | speed of frequency agility | positive |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Liu, H.; Zhang, H.; He, Y.; Sun, Y.
Jamming Strategy Optimization through Dual Q-Learning Model against Adaptive Radar. *Sensors* **2022**, *22*, 145.
https://doi.org/10.3390/s22010145
