Few-Shot Intelligent Anti-Jamming Access with Fast Convergence: A GAN-Enhanced Deep Reinforcement Learning Approach
Abstract
1. Introduction
- (1) Small-sample training bottleneck: In complex electromagnetic environments, jamming patterns are typically unknown and scarce, so conventional DQN suffers from insufficient training samples and slow convergence [1].
- (2) Limited convergence efficiency: In high-dimensional state–action spaces, the exploration–exploitation trade-off is easily destabilized, and traditional policies require lengthy exploration before discovering an optimal strategy. This latency risks rendering anti-jamming decisions obsolete against rapidly evolving jamming.
- (3) Poor transferability: Methods leveraging transfer learning or knowledge graphs perform well only against identical or similar jamming styles, while the traditional DQN demands extensive retraining whenever the jamming pattern changes.
2. Related Work
2.1. The Principle of GANs
2.2. GAN Applied to Data Augmentation
2.3. Methods of Combining GAN with DRL
3. System Model
3.1. Communication Model
3.2. Problem Formulation
- State space $\mathcal{S}$ represents the set of all channel jamming states; $s_t \in \mathcal{S}$ is an element of the environmental state space, denoting the set of jamming states of all channels at time slot $t$. The actual state of the entire target frequency band can be expressed as $s_t = \{s_t^1, s_t^2, \ldots, s_t^N\}$.
- Action space $\mathcal{A}$ is the set of all executable anti-jamming actions of the wireless communication system. The transmission power of the communication transmitter is constant. At time slot $t$, the action of the system is to select one channel from the $N$ channels for communication; $a_t \in \mathcal{A}$ represents the transmission channel selected by the transmitter at time slot $t$.
- Reward function $r$ represents the return obtained from the environment after the wireless communication system selects and executes the anti-jamming action $a_t$. This reward function is used to calculate the immediate reward $r_t$, which is measured by the normalized throughput (a plausible reconstruction of its definition is sketched after this list).
- The state transition probability $p(s_{t+1} \mid s_t, a_t)$ is the probability that the intelligent anti-jamming communication system transitions to state $s_{t+1}$ after selecting and executing the action $a_t$ in the jamming environment state $s_t$.
- Policy $\pi(a_t \mid s_t)$ represents the conditional probability distribution of selecting different anti-jamming actions in the current jamming state $s_t$.
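The explicit reward formula is not reproduced in this version. Below is a minimal LaTeX reconstruction, assuming the common binary success/failure form of a normalized-throughput reward for channel-selection anti-jamming; the authors' exact expression may differ (e.g., it may use an SJNR threshold test):

```latex
% Hypothetical reward: the transmission in slot t succeeds iff the
% selected channel a_t is not jammed, so the time-averaged reward
% equals the normalized throughput.
r_t =
\begin{cases}
  1, & \text{if channel } a_t \text{ is not jammed at slot } t, \\
  0, & \text{otherwise.}
\end{cases}
```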
4. Experimental Principles
4.1. LACGAN-Based Small-Sample Data-Augmentation Method
4.1.1. Auxiliary-Classifier Generative Adversarial Networks
4.1.2. Network Architecture Design
- A label-embedding layer is added to the input of the generator. It maps discrete category labels into continuous vectors, which are then concatenated with jamming vectors to form combined inputs. This enables the generator to clearly distinguish the feature distributions of different jamming types in the latent space, achieving precise control over jamming types.
- Instance normalization (InstanceNorm) is adopted in the discriminator. The discriminator of a traditional GAN only outputs the authenticity probability and uses batch normalization (BatchNorm), which leads to instability in small-batch training. Instance normalization performs normalization on the feature dimension of each sample and does not depend on the batch size, improving the fine-grained classification of samples.
- The LACGAN introduces learnable scaling factors at the output layer of the generator by setting learnable intensity parameters. Through backpropagation it automatically learns the optimal intensity range for each jamming type, dynamically adjusting the jamming intensity and enhancing the generator's adaptability under different Signal-to-Jamming-plus-Noise Ratio (SJNR) conditions. (A minimal sketch of these three design choices follows below.)
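To make the three design choices concrete, here is a minimal PyTorch sketch of an LACGAN-style generator and discriminator. All dimensions (`NOISE_DIM`, `LABEL_DIM`, `NUM_CLASSES`, `SAMPLE_DIM`) and layer sizes are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

# Hypothetical dimensions -- not specified in the paper.
NOISE_DIM, LABEL_DIM, NUM_CLASSES, SAMPLE_DIM = 64, 16, 4, 128

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        # Label-embedding layer: maps discrete jamming-type labels to
        # continuous vectors concatenated with the noise input.
        self.embed = nn.Embedding(NUM_CLASSES, LABEL_DIM)
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM + LABEL_DIM, 256), nn.ReLU(),
            nn.Linear(256, SAMPLE_DIM), nn.Tanh(),
        )
        # Learnable per-class intensity scale at the output layer, tuned
        # by backpropagation to match each jamming type's SJNR range.
        self.intensity = nn.Parameter(torch.ones(NUM_CLASSES, 1))

    def forward(self, z, labels):
        x = torch.cat([z, self.embed(labels)], dim=1)
        return self.net(x) * self.intensity[labels]

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # InstanceNorm1d normalizes each sample's features independently,
        # so small batches do not destabilize training as BatchNorm can.
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, padding=2),
            nn.InstanceNorm1d(32), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        self.adv = nn.Linear(32 * SAMPLE_DIM, 1)            # real/fake head
        self.cls = nn.Linear(32 * SAMPLE_DIM, NUM_CLASSES)  # auxiliary class head

    def forward(self, x):
        h = self.features(x.unsqueeze(1))
        return self.adv(h), self.cls(h)
```

The per-class `intensity` parameter realizes the learnable scaling factor: it is indexed by the label and trained jointly with the rest of the generator.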
4.1.3. Sample Evaluation Metrics
4.2. GA-DQN Anti-Jamming Decision-Making Method
4.2.1. DQN Algorithm
4.2.2. GA-DQN Algorithm
Algorithm 1: GAN Algorithm for Synthetic Sample Generation
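The pseudocode box is not reproduced in this extraction. As a stand-in, the following sketch shows a standard auxiliary-classifier GAN training loop consistent with Section 4.1; it reuses the hypothetical `Generator`, `Discriminator`, and `NOISE_DIM` from the sketch in Section 4.1.2, and the equal loss weighting is an assumption:

```python
import torch
import torch.nn.functional as F

def train_acgan(G, D, real_loader, epochs=200, lr=2e-4):
    """Hypothetical ACGAN training loop for synthetic jamming samples."""
    opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(D.parameters(), lr=lr, betas=(0.5, 0.999))
    for _ in range(epochs):
        for real, labels in real_loader:  # real: (B, SAMPLE_DIM), labels: (B,)
            b = real.size(0)
            z = torch.randn(b, NOISE_DIM)
            fake = G(z, labels)
            # Discriminator: adversarial loss + auxiliary classification loss.
            adv_r, cls_r = D(real)
            adv_f, cls_f = D(fake.detach())
            d_loss = (F.binary_cross_entropy_with_logits(adv_r, torch.ones(b, 1))
                      + F.binary_cross_entropy_with_logits(adv_f, torch.zeros(b, 1))
                      + F.cross_entropy(cls_r, labels)
                      + F.cross_entropy(cls_f, labels))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # Generator: fool the adversarial head while satisfying the class head.
            adv_f, cls_f = D(fake)
            g_loss = (F.binary_cross_entropy_with_logits(adv_f, torch.ones(b, 1))
                      + F.cross_entropy(cls_f, labels))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G
```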
Algorithm 2: DQN Anti-Jamming Decision-Making Algorithm
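This box is also missing from the extraction; below is a minimal sketch of the per-slot action selection and the DQN update such a decision loop would run. The network width, replay-batch size, and the state encoding (a length-12 jamming-occupancy vector) are assumptions; the discount rate of 0.3 is taken from the simulation-parameter table:

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, n_channels=12, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels),  # one Q-value per candidate channel
        )
    def forward(self, s):
        return self.net(s)

def select_action(q, state, epsilon=0.1, n_channels=12):
    if random.random() < epsilon:          # explore a random channel
        return random.randrange(n_channels)
    with torch.no_grad():                  # exploit: highest-Q channel
        return int(q(state).argmax())

def dqn_step(q, q_target, buffer, opt, gamma=0.3, batch=32):
    """One hypothetical training step over (s, a, r, s') replay tuples."""
    if len(buffer) < batch:
        return
    s, a, r, s2 = zip(*random.sample(buffer, batch))
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                  # bootstrapped target from frozen net
        target = r + gamma * q_target(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
```

A usage pattern would append `(s, a, r, s2)` tuples to a `collections.deque` replay buffer each slot and periodically copy `q`'s weights into `q_target`, as in standard DQN [30].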
5. Experimental Results and Analysis
5.1. Simulation Setup
- Multi-Tone Jamming: The jamming energy is concentrated evenly on multiple fixed channels. The jammed channels are spaced 4 channels apart, starting from channel 2, and the jamming acts continuously on these fixed channels throughout the 100 time slots.
- Frequency-Hopping Jamming: This jamming adopts a pseudo-random hopping mode, switching frequency points over time within the 12 channels. Only 1 channel is jammed in each time slot, and its position is determined cyclically by the frequency-hopping pattern map.
- Comb-Shaped Sweeping Jamming: This jamming presents a dynamically changing comb-tooth shape in the time–frequency domain. The jammed channels are spaced 4 channels apart, starting from channel 2, and the comb sweeps at a rate of 1 channel per time slot.
- Linear Sweeping Jamming: This jamming exhibits continuous frequency-sweeping characteristics. It starts from a random channel and scans at a constant rate of 1 channel per time slot, reflecting at the spectrum boundary to form a saw-toothed jamming path. (A sketch reconstructing these four patterns follows below.)
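To make the four patterns reproducible, here is a hedged NumPy sketch that builds each pattern's time–frequency occupancy mask under the stated parameters (12 channels, spacing 4, start channel 2, 1 channel per slot). The hopping pattern and the exact boundary handling are assumptions:

```python
import numpy as np

def jamming_mask(kind, n_channels=12, n_slots=100, seed=0):
    """Hypothetical reconstruction of the four simulated jamming patterns;
    returns an (n_slots, n_channels) 0/1 time-frequency occupancy mask."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_slots, n_channels), dtype=int)
    if kind == "multi_tone":            # fixed tones every 4 channels from channel 2
        mask[:, 1::4] = 1               # channel 2 -> index 1 (0-based)
    elif kind == "freq_hopping":        # pseudo-random pattern, one channel per slot
        pattern = rng.permutation(n_channels)
        for t in range(n_slots):
            mask[t, pattern[t % n_channels]] = 1
    elif kind == "comb_sweep":          # comb of spacing 4, shifted 1 channel/slot
        for t in range(n_slots):
            mask[t, (np.arange(1, n_channels, 4) + t) % n_channels] = 1
    elif kind == "linear_sweep":        # single tone bouncing off the band edges
        c, step = int(rng.integers(n_channels)), 1
        for t in range(n_slots):
            mask[t, c] = 1
            if c + step < 0 or c + step >= n_channels:
                step = -step            # reflect at the spectrum boundary
            c += step
    return mask
```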
5.2. Experiment on Network Generalization
5.3. Simulation Analysis of GA-DQN
- Single-jamming environment: The jammer employs only comb-shaped sweeping jamming to disrupt the receiver;
- Composite-jamming environment: The jammer simultaneously deploys comb-shaped sweeping jamming and multi-tone jamming to challenge the receiver;
- Dynamic-jamming environment: The jammer switches jamming types every 100 time slots in a fixed sequence. This dynamic pattern prevents the DQN from fully exploring and adapting to any single jamming type;
- Intelligent-jamming environment: The jammer updates its target channels every 10 time slots, selecting the channels occupied by the communication signals in the previous 10 slots plus 4 adjacent channels. This adaptive strategy mimics intelligent adversarial behavior (a sketch of this update rule follows below).
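A minimal sketch of the intelligent jammer's update rule, under the assumption that "plus 4 adjacent channels" means two neighbors on each side of every recently used channel and that channel indices wrap around the band; the paper may intend a different adjacency split:

```python
def intelligent_jammer_update(recent_channels, n_channels=12, n_adjacent=4):
    """Hypothetical rule: every 10 slots, target the channels the transmitter
    used in the last 10 slots plus adjacent channels on either side."""
    targets = set(recent_channels)
    for c in recent_channels:
        for d in range(1, n_adjacent // 2 + 1):   # assumed: adjacency split evenly
            targets.add((c - d) % n_channels)     # assumed: wrap at band edges
            targets.add((c + d) % n_channels)
    return sorted(targets)
```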
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhou, Q.; Niu, Y. From Adaptive Communication Anti-Jamming to Intelligent Communication Anti-Jamming: 50 Years of Evolution. Adv. Intell. Syst. 2024, 6, 2300853.
- Torrieri, D. Principles of Spread-Spectrum Communication Systems; Springer: Berlin, Germany, 2018.
- Fuqiang, Y. Communication Anti-Jamming Engineering and Practice, 3rd ed.; Publishing House of Electronics Industry: Beijing, China, 2025.
- Song, B.; Xu, H.; Jiang, L.; Rao, N. An intelligent decision-making method for anti-jamming communication based on deep reinforcement learning. J. Northwestern Polytech. Univ. 2021, 39, 641–649.
- Xiao, L.; Jiang, D.; Xu, D.; Zhu, H.; Zhang, Y.; Poor, H.V. Two-dimensional Anti-jamming Mobile Communication Based on Reinforcement Learning. IEEE Trans. Veh. Technol. 2017, 67, 9499–9512.
- Wan, B.; Niu, Y.; Chen, C.; Zhou, Z.; Xiang, P. A novel algorithm of joint frequency–power domain anti-jamming based on PER-DQN. Neural Comput. Appl. 2025, 37, 7823–7840.
- Zhang, F.; Niu, Y.; Zhou, Q.; Chen, Q. Intelligent anti-jamming decision algorithm for wireless communication under limited channel state information conditions. Sci. Rep. 2025, 15, 6271.
- Li, Y.; Xu, Y.; Li, G.; Gong, Y.; Liu, X.; Wang, H.; Li, W. Dynamic Spectrum Anti-Jamming Access with Fast Convergence: A Labeled Deep Reinforcement Learning Approach. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5447–5458.
- Li, Y.; Xu, Y.; Li, W.; Li, G.; Feng, Z.; Liu, S.; Du, J.; Li, X. Achieving Hiding and Smart Anti-Jamming Communication: A Parallel DRL Approach against Moving Reactive Jammer. IEEE Trans. Commun. 2025.
- Li, G.; Wu, Q.; Wang, X.; Luo, H.; Li, L.; Jing, X.; Chen, Q. Deep reinforcement learning-empowered anti-jamming strategy aided by sample information entropy. J. Commun. 2024, 45, 115–128.
- Hou, Y.; Zhang, W.; Zhu, Z.; Yu, H. CLIP-GAN: Stacking CLIPs and GAN for Efficient and Controllable Text-to-Image Synthesis. IEEE Trans. Multimed. 2025, 27, 3702–3715.
- Gu, B.; Wang, X.; Liu, W.; Wang, Y. MDA-GAN: Multi-dimensional Attention Guided Concurrent-Single-Image-GAN. Circuits Syst. Signal Process. 2025, 44, 1075–1102.
- Fang, F.; Zhang, P.; Zhou, B.; Qian, K.; Gan, Y. Atten-GAN: Pedestrian Trajectory Prediction with GAN Based on Attention Mechanism. Cogn. Comput. 2022, 14, 2296–2305.
- Kapoor, P.; Arora, S. Optic-GAN: A generalized data augmentation model to enhance the diabetic retinopathy detection. Int. J. Inf. Technol. 2025, 17, 2251–2269.
- Airale, L.; Alameda-Pineda, X.; Lathuilière, S.; Vaufreydaz, D. Autoregressive GAN for Semantic Unconditional Head Motion Generation. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 14.
- Singh, R.; Sethi, A.; Saini, K.; Saurav, S.; Tiwari, A.; Singh, S. VALD-GAN: Video anomaly detection using latent discriminator augmented GAN. Signal Image Video Process. 2024, 18, 821–831.
- Ma, D.; Zhang, F.; Bull, D.R. CVEGAN: A perceptually-inspired GAN for Compressed Video Enhancement. Signal Process. Image Commun. 2024, 127, 117–127.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Zhang, K.; Yang, X.; Xu, L.; Thé, J.; Tan, Z.; Yu, H. Enhancing coal-gangue object detection using GAN-based data augmentation strategy with dual attention mechanism. Energy 2024, 287, 129654.
- Ye, R.; Boukerche, A.; Yu, X.-S.; Zhang, C.; Yan, B.; Zhou, X.-J. Data augmentation method for insulators based on Cycle-GAN. J. Electron. Sci. Technol. 2024, 22, 36–47.
- Seon, J.; Lee, S.; Sun, Y.G.; Kim, S.H.; Kim, D.I.; Kim, J.Y. Least Information Spectral GAN with Time-Series Data Augmentation for Industrial IoT. IEEE Trans. Emerg. Top. Comput. Intell. 2025, 9, 757–769.
- Han, H.; Wang, X.; Gu, F.; Li, W.; Cai, Y.; Xu, Y.; Xu, Y. Better Late Than Never: GAN-Enhanced Dynamic Anti-Jamming Spectrum Access with Incomplete Sensing Information. IEEE Wirel. Commun. Lett. 2021, 10, 1800–1804.
- Strickland, C.; Zakar, M.; Saha, C.; Soltani Nejad, S.; Tasnim, N.; Lizotte, D.J.; Haque, A. DRL-GAN: A Hybrid Approach for Binary and Multiclass Network Intrusion Detection. Sensors 2024, 24, 2746.
- Wang, Z.; Zhu, H.; He, M.; Zhou, Y.; Luo, X.; Zhang, N. GAN and Multi-Agent DRL Based Decentralized Traffic Light Signal Control. IEEE Trans. Veh. Technol. 2022, 71, 1333–1348.
- Shi, Z.; Huang, C.; Wang, J.; Yu, Z.; Fu, J.; Yao, J. Enhancing performance and generalization in dormitory optimization using deep reinforcement learning with embedded surrogate model. Build. Environ. 2025, 276, 112864.
- Guo, Q.; Zhao, W.; Lyu, Z.; Zhao, T. A GAN enhanced meta-deep reinforcement learning approach for DCN routing optimization. Inf. Fusion 2025, 121, 103160.
- Huang, W.; Dai, Z.; Hou, J.; Liang, L.; Chen, Y.; Chen, Z.; Pan, Z. Risk-averse stochastic dynamic power dispatch based on deep reinforcement learning with risk-oriented Graph-GAN sampling. Front. Energy Res. 2023, 11, 1272216.
- Cai, J.; Xu, K.; Zhu, Y.; Hu, F.; Li, L. Prediction and analysis of net ecosystem carbon exchange based on gradient boosting regression and random forest. Appl. Energy 2020, 262, 114566.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Chen, Y.; Zhao, Z. Intelligent anti-jamming decision algorithm of bivariate frequency hopping pattern based on ET-PPO. Telecommun. Sci. 2022, 38, 86–95.
- Yao, F.; Jia, L. A Collaborative Multi-agent Reinforcement Learning Anti-jamming Algorithm in Wireless Networks. IEEE Wirel. Commun. Lett. 2019, 8, 1024–1027.
Ref | Application | Method | Key Feature |
---|---|---|---|
[22] | Dynamic spectrum access | SWGAN | Simulates the spectrum environment for offline training. |
[23] | Network intrusion detection | CTGAN | Generates datasets via a GAN to pre-train DRL. |
[24] | Traffic signal control | GAN | Reconstructs neighboring traffic data using traffic statistics. |
[25] | Dormitory environment design | GAN | Rapidly predicts indoor performance with a GAN. |
[26] | Network routing optimization | GAN | Produces rare traffic features; optimizes the discriminator boundary. |
[27] | Power system scheduling | KEG-GAN | Synthesizes additional critical scenario samples. |
Parameter | Value |
---|---|
Length of communication time slot | 350 |
Number of channels N | 12 |
Threshold of JSR | 8 |
Discount rate | 0.3 |
Communication power | 10 |
Jamming power | 10 |
Model | Single Jamming CS | Single Jamming TP | Double Jamming CS | Double Jamming TP | Dynamic Jamming CS | Dynamic Jamming TP | Intelligent Jamming CS | Intelligent Jamming TP
---|---|---|---|---|---|---|---|---
GA-DQN | 126 | 1 | 131 | 1 | 185 | 1 | 245 | 0.95
DQN | 147 | 1 | 156 | 1 | 256 | 0.94 | 280 | 0.82
PPO | 263 | 1 | 252 | 1 | 280 | 0.75 | 300 | 0.75
QL | 149 | 0.67 | 211 | 0.67 | 300 | 0.68 | 300 | 0.55