A Multi-Agent Deep Reinforcement Learning Anti-Jamming Spectrum-Access Method in LEO Satellites
Abstract
1. Introduction
1.1. Traditional Anti-Jamming Techniques
1.2. Intelligent Anti-Jamming Techniques
- A multi-agent deep reinforcement learning (DRL) method based on value-decomposition networks (VDN) is proposed to solve the anti-jamming spectrum-access problem. The method adopts an “offline centralized training–online distributed execution” architecture: after offline training on the ground, the model is deployed onto the satellites, where it makes real-time anti-jamming spectrum-access decisions based on local observations.
- During training, a parameter-sharing mechanism is employed to significantly reduce communication overhead. During execution, an incremental update mechanism is employed to enhance model adaptability.
- The simulation results demonstrate the effectiveness of the proposed method, which balances performance and cost better than fully centralized training and independent distributed training.
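The combination of value decomposition and parameter sharing described in the contributions above can be illustrated with a minimal NumPy sketch. The linear Q-function, the observation size, and the names `SharedQ`, `greedy_actions`, and `joint_q` are assumptions made for illustration; they are not the authors' network or code.

```python
import numpy as np


class SharedQ:
    """One Q-function whose weights are shared by every agent (parameter sharing)."""

    def __init__(self, obs_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        # Hypothetical linear model; the paper's architecture is not reproduced here.
        self.w = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def q_values(self, obs):
        # obs: (n_agents, obs_dim) local observations
        # returns per-agent Q-values of shape (n_agents, n_actions)
        return obs @ self.w


def greedy_actions(q):
    # Distributed execution: each agent only needs its own row of Q-values.
    return q.argmax(axis=1)


def joint_q(q, actions):
    # VDN assumption: the joint value is the sum of the per-agent values.
    return q[np.arange(len(actions)), actions].sum()


n_agents, obs_dim, n_channels = 10, 6, 4  # 10 users and 4 channels, as in the simulation tables
net = SharedQ(obs_dim, n_channels)
obs = np.random.default_rng(1).normal(size=(n_agents, obs_dim))
q = net.q_values(obs)
actions = greedy_actions(q)
q_tot = joint_q(q, actions)
```

Because the joint value is a plain sum, the joint action that maximizes `q_tot` is obtained when every agent maximizes its own Q-values, which is what makes execution from local observations possible.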
2. System Model and Problem Construction
2.1. System Model
2.2. Propagation Model
2.3. Problem Construction
3. The Proposed Multi-Agent Deep Reinforcement Learning Method
3.1. Multi-Agent MDP Problem Formulation
- State: The state of the environment is the global information of all users.
- Action: The action is the joint spectrum-access strategy of all users at time slot t.
- Reward: The agents obtain immediate rewards after taking their actions.
- State–action value function: The state–action value function is the expected discounted cumulative reward obtained by taking a given action in a given state.
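For orientation, the textbook form of the discounted state–action value function, together with the additive decomposition used by VDN (Sunehag et al., listed in the References), can be written as below. The symbols $s$, $a$, $r$, $\gamma$, $o_n$, $a_n$, and $N$ are generic notation assumed here and may differ from the notation used by the authors.

```latex
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \,\middle|\, s_t = s,\ a_t = a \right],
\qquad
Q_{\mathrm{tot}}(s, a) \approx \sum_{n=1}^{N} Q_n(o_n, a_n)
```

Here $\gamma \in [0, 1)$ is the discount factor, $o_n$ is the local observation of agent $n$, and $a_n$ is its individual action.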
3.2. Multi-Agent DRL Algorithm Design
3.2.1. The Offline Centralized Training Phase
Algorithm 1. Offline Centralized Training Algorithm
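As a rough illustration of what a centralized VDN training step computes, the sketch below evaluates the temporal-difference target and squared error for a single transition. It is not the authors' Algorithm 1: the function name `vdn_td_loss`, the shared scalar team reward, and the use of a separate target network are assumptions, and only the discount factor of 0.99 is taken from the hyperparameter table.

```python
import numpy as np


def vdn_td_loss(q, q_next_target, actions, reward, gamma=0.99, done=False):
    """Squared TD error for one transition under the VDN sum decomposition.

    q:             (n_agents, n_actions) online-network Q-values at the current observations
    q_next_target: (n_agents, n_actions) target-network Q-values at the next observations
    actions:       (n_agents,) actions actually taken
    reward:        shared team reward (assumed scalar)
    """
    agents = np.arange(len(actions))
    q_tot = q[agents, actions].sum()              # joint value of the taken actions
    q_tot_next = q_next_target.max(axis=1).sum()  # each agent maximizes independently
    target = reward + (0.0 if done else gamma * q_tot_next)
    return (target - q_tot) ** 2, target


# Tiny worked example with two agents and two actions.
q = np.array([[1.0, 2.0], [0.5, 0.0]])
q_next = np.array([[0.0, 1.0], [2.0, 1.0]])
loss, target = vdn_td_loss(q, q_next, actions=np.array([1, 0]), reward=1.0)
# q_tot = 2.0 + 0.5 = 2.5, target = 1.0 + 0.99 * 3.0 = 3.97, loss = 1.47 ** 2
```

In a full training loop this loss would be averaged over a mini-batch drawn from the replay buffer and minimized with respect to the shared network weights.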
3.2.2. The Online Distributed Execution Phase
Algorithm 2. Online Distributed Execution Algorithm
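The execution phase can be pictured as each agent acting on its own observation with a local copy of the trained weights and occasionally nudging those weights with new experience. The sketch below is one possible reading of such an incremental update, not the authors' Algorithm 2: the class `LocalAgent`, the linear Q-function, and the one-step Q-learning update rule are all assumptions.

```python
import numpy as np


class LocalAgent:
    """Hypothetical on-board agent holding a copy of the offline-trained weights."""

    def __init__(self, weights, lr=0.001, gamma=0.99, epsilon=0.01, seed=0):
        self.w = weights.copy()  # shape (obs_dim, n_actions)
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.rng = np.random.default_rng(seed)

    def act(self, obs):
        # Epsilon-greedy decision from the local observation only.
        if self.rng.random() < self.epsilon:
            return int(self.rng.integers(self.w.shape[1]))
        return int(np.argmax(obs @ self.w))

    def incremental_update(self, obs, action, reward, next_obs):
        # One small gradient step on the local TD error (assumed update rule).
        target = reward + self.gamma * np.max(next_obs @ self.w)
        td_error = target - (obs @ self.w)[action]
        self.w[:, action] += self.lr * td_error * obs
        return td_error


agent = LocalAgent(np.zeros((3, 4)), epsilon=0.0)
obs, next_obs = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
action = agent.act(obs)  # all Q-values are zero, so argmax picks index 0
td = agent.incremental_update(obs, action, reward=1.0, next_obs=next_obs)
```

No global information is exchanged in this sketch: both the decision and the update use only quantities available to the individual agent.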
3.2.3. Complexity Analysis
4. Simulation Results and Performance Analysis
4.1. Simulation Parameters
- Centralized Training Execution (CTE) [37]: The CTE method employs a fully centralized architecture during both training and execution phases. It requires continuous global information interaction and incurs high communication overhead.
- Non-Cooperative Independent Learning (NIL) [38]: In the NIL method, each agent independently optimizes strategies without coordination mechanisms or information sharing.
- Random Action Selection (RAS): The RAS method is a benchmark strategy without learning ability, and each agent randomly selects actions.
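Of the three baselines above, RAS is simple enough to sketch directly. The snippet assumes 10 users and 4 channels numbered 1 to 4, as in the simulation tables, and a jammer that occupies a single channel per time slot; the function name `random_action_selection` is hypothetical.

```python
import numpy as np


def random_action_selection(n_users, n_channels, rng):
    # Each user independently picks a channel uniformly at random (1-based indices).
    return rng.integers(1, n_channels + 1, size=n_users)


rng = np.random.default_rng(0)
ras_actions = random_action_selection(n_users=10, n_channels=4, rng=rng)

# With one jammed channel out of four, a uniformly random user avoids the
# jammer with probability 1 - 1/4, regardless of which channel is jammed.
expected_avoidance = 1 - 1 / 4
```

Under these assumptions the expected avoidance probability of 0.75 gives a reference level that any learning-based method should exceed.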
4.2. Convergence Analysis
4.3. Performance Analysis
4.3.1. Jamming Avoidance
4.3.2. User Satisfaction
4.3.3. Network Fairness
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Al-Hraishawi, H.; Chougrani, H.; Kisseleff, S.; Lagunas, E.; Chatzinotas, S. A survey on nongeostationary satellite systems: The communication perspective. IEEE Commun. Surv. Tutor. 2022, 25, 101–132.
- Xiao, Z.; Yang, J.; Mao, T.; Xu, C.; Zhang, R.; Han, Z.; Xia, X.G. LEO satellite access network (LEO-SAN) toward 6G: Challenges and approaches. IEEE Wirel. Commun. 2022, 31, 89–96.
- Luo, X.; Chen, H.H.; Guo, Q. LEO/VLEO satellite communications in 6G and beyond networks–technologies, applications, and challenges. IEEE Netw. 2024, 38, 273–285.
- Wang, Q.; Chen, X.; Qi, Q. Energy-efficient design of satellite-terrestrial computing in 6G wireless networks. IEEE Trans. Commun. 2023, 72, 1759–1772.
- Kodheli, O.; Lagunas, E.; Maturo, N.; Sharma, S.K.; Shankar, B.; Montoya, J.F.M.; Duncan, J.C.M.; Spano, D.; Chatzinotas, S.; Kisseleff, S.; et al. Satellite communications in the new space era: A survey and future challenges. IEEE Commun. Surv. Tutor. 2020, 23, 70–109.
- Al-Hraishawi, H.; Chatzinotas, S.; Ottersten, B. Broadband non-geostationary satellite communication systems: Research challenges and key opportunities. In Proceedings of the 2021 IEEE International Conference on Communications Workshops (ICC Workshops), Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
- Li, W.; Jia, L.; Chen, Y.; Chen, Q.; Yan, J.; Qi, N. A game-theoretic approach for satellites beam scheduling and power control in a mega hybrid constellation spectrum sharing scenario. IEEE Internet Things J. 2025, 12, 20626–20639.
- Kim, D.; Jung, H.; Lee, I.H.; Niyato, D. Multi-Beam Management and Resource Allocation for LEO Satellite-Assisted IoT Networks. IEEE Internet Things J. 2025, 12, 19443–19458.
- Yang, Q.; Laurenson, D.I.; Barria, J.A. On the use of LEO satellite constellation for active network management in power distribution networks. IEEE Trans. Smart Grid 2012, 3, 1371–1381.
- Hasan, M.; Thakur, J.M.; Podder, P. Design and implementation of FHSS and DSSS for secure data transmission. Int. J. Signal Process. Syst. 2016, 4, 144–149.
- Wang, J.; Jiang, C.; Kuang, L. Turbo iterative DSSS acquisition in satellite high-mobility communications. IEEE Trans. Veh. Technol. 2021, 70, 12998–13009.
- Alagil, A.; Liu, Y. Randomized positioning DSSS with message shuffling for anti-jamming wireless communications. In Proceedings of the 2019 IEEE Conference on Dependable and Secure Computing (DSC), Hangzhou, China, 18–20 November 2019; pp. 1–8.
- Lu, R.; Ye, G.; Ma, J.; Li, Y.; Huang, W. A numerical comparison between FHSS and DSSS in satellite communication systems with on-board processing. In Proceedings of the 2009 2nd International Congress on Image and Signal Processing, Tianjin, China, 17–19 October 2009; pp. 1–4.
- Mast, J.; Hänel, T.; Aschenbruck, N. Enhancing adaptive frequency hopping for Bluetooth Low Energy. In Proceedings of the 2021 IEEE 46th Conference on Local Computer Networks (LCN), Edmonton, AB, Canada, 4–7 October 2021; pp. 447–454.
- Kokkinen, H.; Piemontese, A.; Kulacz, L.; Arnal, F.; Amatetti, C. Coverage and interference in co-channel spectrum sharing between terrestrial and satellite networks. In Proceedings of the 2023 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2023; pp. 1–9.
- Yan, D.; Ni, S. Overview of anti-jamming technologies for satellite navigation systems. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; Volume 6, pp. 118–124.
- Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5064–5078.
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839.
- Jia, L.; Qi, N.; Su, Z.; Chu, F.; Fang, S.; Wong, K.K.; Chae, C.B. Game theory and reinforcement learning for anti-jamming defense in wireless communications: Current research, challenges, and solutions. IEEE Commun. Surv. Tutor. 2024, 27, 1798–1838.
- Yin, Z.; Li, J.; Wang, Z.; Qian, Y.; Lin, Y.; Shu, F.; Chen, W. UAV Communication Against Intelligent Jamming: A Stackelberg Game Approach With Federated Reinforcement Learning. IEEE Trans. Green Commun. Netw. 2024, 8, 1796–1808.
- Wu, Z.; Lin, Y.; Zhang, Y.; Shu, F.; Li, J. Multi-agent collaboration based UAV clusters multi-domain energy-saving anti-jamming communication. Sci. Sin. Inf. 2023, 53, 2511.
- Li, Y.; Xu, Y.; Li, G.; Gong, Y.; Liu, X.; Wang, H.; Li, W. Dynamic spectrum anti-jamming access with fast convergence: A labeled deep reinforcement learning approach. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5447–5458.
- Li, W.; Chen, J.; Liu, X.; Wang, X.; Li, Y.; Liu, D.; Xu, Y. Intelligent dynamic spectrum anti-jamming communications: A deep reinforcement learning perspective. IEEE Wirel. Commun. 2022, 29, 60–67.
- Yin, Z.; Lin, Y.; Zhang, Y.; Qian, Y.; Shu, F.; Li, J. Collaborative Multiagent Reinforcement Learning Aided Resource Allocation for UAV Anti-Jamming Communication. IEEE Internet Things J. 2022, 9, 23995–24008.
- Bai, H.; Wang, H.; Du, J.; He, R.; Li, G.; Xu, Y. Multi-Hop UAV Relay Covert Communication: A Multi-Agent Reinforcement Learning Approach. In Proceedings of the 2024 International Conference on Ubiquitous Communication (Ucom), Xi’an, China, 5–7 July 2024; pp. 356–360.
- Zhang, F.; Niu, Y.; Zhou, Q.; Chen, Q. Intelligent anti-jamming decision algorithm for wireless communication under limited channel state information conditions. Sci. Rep. 2025, 15, 6271.
- ITU-R S.672; Satellite Antenna Radiation Patterns for Geostationary Orbit Satellite Antennas Operating in the Fixed-Satellite Service. International Telecommunication Union (ITU): Geneva, Switzerland, 1997.
- ITU-R S.1528; Satellite Antenna Radiation Patterns for Non-Geostationary Orbit Satellite Antennas Operating in the Fixed-Satellite Service Below 30 GHz. International Telecommunication Union (ITU): Geneva, Switzerland, 2001.
- ITU-R S.465; Reference Radiation Pattern for Earth Station Antennas in the Fixed-Satellite Service for Use in Coordination and Interference Assessment in the Frequency Range from 2 to 31 GHz. International Telecommunication Union (ITU): Geneva, Switzerland, 2010.
- Li, W.; Jia, L.; Chen, Q.; Chen, Y. A game theory-based distributed downlink spectrum sharing method in large-scale hybrid satellite constellations. IEEE Trans. Commun. 2024, 72, 4620–4632.
- Islam, M.; Sharmin, S.; Nur, F.N.; Razzaque, M.A.; Hassan, M.M.; Alelaiwi, A. High-throughput link-channel selection and power allocation in wireless mesh networks. IEEE Access 2019, 7, 161040–161051.
- Lin, Z.; Ni, Z.; Kuang, L.; Jiang, C.; Huang, Z. Dynamic beam pattern and bandwidth allocation based on multi-agent deep reinforcement learning for beam hopping satellite systems. IEEE Trans. Veh. Technol. 2022, 71, 3917–3930.
- Luo, Z.Q.; Zhang, S. Dynamic spectrum management: Complexity and duality. IEEE J. Sel. Top. Signal Process. 2008, 2, 57–73.
- Lin, Z.; Ni, Z.; Kuang, L.; Jiang, C.; Huang, Z. Satellite-terrestrial coordinated multi-satellite beam hopping scheduling based on multi-agent deep reinforcement learning. IEEE Trans. Wirel. Commun. 2024, 23, 10091–10103.
- Busoniu, L.; Babuska, R.; De Schutter, B. A comprehensive survey of multiagent reinforcement learning. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 2008, 38, 156–172.
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296.
- Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017, 12, e0172395.
- Aref, M.A.; Jayaweera, S.K.; Machuzak, S. Multi-agent reinforcement learning based cognitive anti-jamming. In Proceedings of the 2017 IEEE Wireless Communications and Networking Conference (WCNC), San Francisco, CA, USA, 19–22 March 2017; pp. 1–6.
- Zhang, Y.; Jia, L.; Qi, N.; Xu, Y.; Wang, M. Anti-jamming channel access in 5G ultra-dense networks: A game-theoretic learning approach. Digit. Commun. Netw. 2023, 9, 523–533.
Notation | Description |
---|---|
The kth LEO satellite | |
The nth user of LEO satellites | |
The downlink accessed by at time slot t | |
The downlink accessed by at time slot t | |
The interference link from to at time slot t | |
The jamming link from the jamming satellite to at time slot t | |
The co-channel interference from to at time slot t | |
Jamming from the jamming satellite to at time slot t | |
The off-axis angle from the to at time slot t | |
The off-axis angle from the center of the jamming beam to at time slot t | |
Channel bandwidth | |
The beam power of jamming satellite | |
The beam power of LEO satellites | |
The channel gain of link | |
The channel gain of link | |
The channel gain of link | |
The transmission rate of user at time slot t | |
The satisfaction of user at time slot t |
Parameter | Jamming Satellite | LEO Satellite |
---|---|---|
Orbital altitude | 35,786 km | 550 km |
Beam power | 45 dBW | 15 dBW |
Beam bandwidth | 200 MHz | 200 MHz |
Beam radius | 200 km | 50 km |
Frequency range | 10.7–12.7 GHz | 10.7–12.7 GHz |
Frequency channels | 4 | 4 |
Channel noise | −100 dBm | −100 dBm |
Spectrum switching | Markov probability matrix | intelligent |
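The table states that the jamming satellite switches among its 4 channels according to a Markov probability matrix, but the matrix itself is not given in this section. The sketch below shows how such a jammer could be simulated; the transition matrix `P` and the function name `simulate_jammer` are purely hypothetical placeholders.

```python
import numpy as np

# Hypothetical 4x4 transition matrix: row i holds the probabilities of moving
# from channel i+1 to channels 1..4 in the next time slot (each row sums to 1).
P = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.2, 0.1, 0.1, 0.6],
    [0.6, 0.2, 0.1, 0.1],
])


def simulate_jammer(P, n_slots, start=0, seed=0):
    """Return the jammed channel (1-based) for each of n_slots time slots."""
    rng = np.random.default_rng(seed)
    channel = start
    trace = [channel]
    for _ in range(n_slots - 1):
        # Sample the next channel from the row of the current channel.
        channel = int(rng.choice(len(P), p=P[channel]))
        trace.append(channel)
    return np.array(trace) + 1


jammer_trace = simulate_jammer(P, n_slots=50)
```

A learning agent facing this jammer can exploit the structure of `P`, which is one reason a learned policy can outperform random channel selection.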
Parameter | Value |
---|---|
Learning rate | 0.001 |
Replay buffer size | 10,000 |
Mini-batch size | 64 |
Initial exploration rate | 0.1 |
Final exploration rate | 0.01 |
Discount factor | 0.99 |
Optimizer | Adam |
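The hyperparameters in the table map naturally onto a small configuration object. The sketch below copies the tabulated values; the linear shape of the exploration decay and the `decay_steps` argument are assumptions, since the table only gives the initial and final exploration rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float = 0.001
    replay_buffer_size: int = 10_000
    batch_size: int = 64
    epsilon_start: float = 0.1   # initial exploration rate
    epsilon_end: float = 0.01    # final exploration rate
    gamma: float = 0.99          # discount factor
    optimizer: str = "Adam"


def epsilon_at(step, cfg, decay_steps):
    # Assumed linear decay from epsilon_start to epsilon_end over decay_steps.
    frac = min(max(step / decay_steps, 0.0), 1.0)
    return cfg.epsilon_start + frac * (cfg.epsilon_end - cfg.epsilon_start)


cfg = TrainConfig()
eps_first = epsilon_at(0, cfg, decay_steps=1000)
eps_mid = epsilon_at(500, cfg, decay_steps=1000)
eps_last = epsilon_at(5000, cfg, decay_steps=1000)
```

Steps beyond `decay_steps` are clamped, so the exploration rate stays at its final value for the remainder of training.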
Time slot | User 1 | User 2 | User 3 | User 4 | User 5 | User 6 | User 7 | User 8 | User 9 | User 10 | Jammer |
---|---|---|---|---|---|---|---|---|---|---|---|
t = 10 | 3 | 3 | 3 | 4 | 3 | 4 | 4 | 3 | 4 | 3 | 4 |
t = 20 | 3 | 3 | 4 | 3 | 4 | 3 | 4 | 3 | 4 | 3 | 4 |
t = 30 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 4 | 3 | 3 | 4 |
t = 40 | 3 | 2 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
t = 50 | 3 | 4 | 3 | 4 | 3 | 4 | 3 | 3 | 3 | 4 | 2 |
Time slot | User 1 | User 2 | User 3 | User 4 | User 5 | User 6 | User 7 | User 8 | User 9 | User 10 | Jammer |
---|---|---|---|---|---|---|---|---|---|---|---|
t = 10 | 1 | 1 | 4 | 2 | 4 | 1 | 4 | 2 | 4 | 2 | 3 |
t = 20 | 2 | 2 | 1 | 2 | 3 | 1 | 2 | 3 | 2 | 2 | 4 |
t = 30 | 3 | 4 | 4 | 1 | 4 | 3 | 3 | 4 | 3 | 4 | 2 |
t = 40 | 4 | 3 | 3 | 1 | 3 | 3 | 4 | 3 | 4 | 1 | 2 |
t = 50 | 3 | 2 | 3 | 3 | 1 | 2 | 3 | 1 | 1 | 3 | 4 |
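The two channel-selection tables can be compared numerically. The sketch below assumes that the table entries are channel indices and that a user is jammed when it occupies the same channel as the jammer in that time slot; it computes the fraction of users that avoid the jammer per slot. Co-channel interference among users is not considered, and the names `first_table`, `second_table`, and `avoidance_rate` are hypothetical.

```python
import numpy as np

# Rows: t = 10, 20, 30, 40, 50. Columns: User 1..User 10, then the jammer.
first_table = np.array([
    [3, 3, 3, 4, 3, 4, 4, 3, 4, 3, 4],
    [3, 3, 4, 3, 4, 3, 4, 3, 4, 3, 4],
    [4, 3, 3, 3, 3, 3, 3, 4, 3, 3, 4],
    [3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
    [3, 4, 3, 4, 3, 4, 3, 3, 3, 4, 2],
])
second_table = np.array([
    [1, 1, 4, 2, 4, 1, 4, 2, 4, 2, 3],
    [2, 2, 1, 2, 3, 1, 2, 3, 2, 2, 4],
    [3, 4, 4, 1, 4, 3, 3, 4, 3, 4, 2],
    [4, 3, 3, 1, 3, 3, 4, 3, 4, 1, 2],
    [3, 2, 3, 3, 1, 2, 3, 1, 1, 3, 4],
])


def avoidance_rate(table):
    # Fraction of users per time slot whose channel differs from the jammer's.
    users, jammer = table[:, :-1], table[:, -1:]
    return (users != jammer).mean(axis=1)


rate_first = avoidance_rate(first_table)    # [0.6, 0.6, 0.8, 0.1, 1.0]
rate_second = avoidance_rate(second_table)  # [1.0, 1.0, 1.0, 1.0, 1.0]
```

Under these assumptions, the users in the first table share a channel with the jammer in four of the five sampled slots, while in the second table no user is on the jammed channel in any sampled slot and the users are spread across more channels.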
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cao, W.; Chu, F.; Jia, L.; Zhou, H.; Zhang, Y. A Multi-Agent Deep Reinforcement Learning Anti-Jamming Spectrum-Access Method in LEO Satellites. Electronics 2025, 14, 3307. https://doi.org/10.3390/electronics14163307