Service-Driven Dynamic Beam Hopping with Resource Allocation for LEO Satellites
Abstract
1. Introduction
- To solve the problem of uneven ground traffic distribution in LEO beam hopping satellite communication systems, we establish a downlink transmission link model. To fully utilize beam resources, a cooperative MARL-VDN algorithm is applied for joint beam hopping pattern and power allocation decisions, achieving multi-objective optimization of the system and enabling limited on-board resources to dynamically match nonuniform and time-varying traffic demands.
- Joint optimization of the beam hopping pattern and power allocation requires sequential decisions over multiple time slots. Traditional optimization methods have high computational complexity and long convergence times, so they cannot meet real-time requirements. This paper therefore employs a cooperative MARL-VDN framework under the centralized training and decentralized execution (CTDE) paradigm to address the sequential decision-making problem. After offline training, the model can be deployed on LEO satellites for real-time beam hopping scheduling.
- In the applied cooperative MARL-VDN algorithm framework, each agent only manages the hopping pattern and power allocation decision for a single beam, thereby reducing both the individual agent’s action space and the overall joint action space across all agents. In addition, agents can continuously optimize their policies through ongoing interaction with the environment to better adapt to changing and complex communication environments.
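The action-space reduction described in the last contribution can be illustrated with a short count. The sketch below is illustrative only. It assumes that a single centralized agent would choose K of the M beam positions plus one of N power levels per beam, while each decentralized agent chooses one position and one power level for its own beam. K = 4 and M = 12 are taken from the simulation setup; N = 4 is a hypothetical value, because the size of the power set is not stated here, and both function names are hypothetical.

```python
from math import comb


def centralized_action_count(num_positions: int, num_beams: int, num_power_levels: int) -> int:
    """Joint actions for one agent that picks K of M positions and a power level per beam."""
    return comb(num_positions, num_beams) * num_power_levels ** num_beams


def per_agent_action_count(num_positions: int, num_power_levels: int) -> int:
    """Actions for one agent that controls a single beam (one position, one power level)."""
    return num_positions * num_power_levels


M, K = 12, 4  # beam positions and beams from the simulation setup
N = 4         # ASSUMPTION: the size of the power set is not given in the text

central = centralized_action_count(M, K, N)
single = per_agent_action_count(M, N)
print(central, single)
```

Under these assumptions the centralized agent faces 126,720 joint actions, whereas each per-beam agent needs a Q-network with only 48 outputs.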
2. System Model and Problem Formulation
2.1. System Model
2.2. Channel Model
- Free-space loss: a function of the carrier frequency and the straight-line distance between the satellite and the center of the beam position.
- Shadow fading: modeled as a log-normal random variable.
- Rain attenuation: dependent on the rain probability.
- Multipath fading: characterized by a Rayleigh distribution.
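The four loss components listed above can be sketched numerically as follows. This is a minimal sketch, not the paper's channel equation: it assumes the components add in dB, uses the standard free-space formula 20 log10(4πdf/c), and treats the rain probability, rain loss, and Rayleigh scale as hypothetical parameters. The function names are also hypothetical.

```python
import math
import random

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_loss_db(freq_hz: float, distance_m: float) -> float:
    """Free-space path loss in dB: 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def sample_channel_loss_db(freq_hz, distance_m, rng, shadow_sigma_db=5.0,
                           rain_prob=0.1, rain_loss_db=3.0,
                           rayleigh_sigma=math.sqrt(0.5)):
    """Draw one total-loss sample in dB.

    ASSUMPTIONS: components add in dB; rain_prob, rain_loss_db and
    rayleigh_sigma are illustrative values, not taken from the paper.
    """
    # Log-normal shadowing is Gaussian when expressed in dB.
    shadow_db = rng.gauss(0.0, shadow_sigma_db)
    # Rain is present with probability rain_prob and then adds a fixed loss.
    rain_db = rain_loss_db if rng.random() < rain_prob else 0.0
    # Rayleigh envelope by inverse-CDF sampling; the floor avoids log10(0).
    envelope = rayleigh_sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    multipath_db = -20.0 * math.log10(max(envelope, 1e-12))
    return free_space_loss_db(freq_hz, distance_m) + shadow_db + rain_db + multipath_db


fspl = free_space_loss_db(20e9, 550e3)
sample = sample_channel_loss_db(20e9, 550e3, random.Random(0))
```

For a 20 GHz carrier and a 550 km slant range (the satellite altitude, that is, a beam position directly below the satellite), the free-space term alone is roughly 173 dB.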
2.3. Problem Formulation
3. A Cooperative Multi-Agent RL for Single-Satellite Beam Hopping
3.1. MDP Model
3.2. Cooperative MARL-VDN Algorithm Framework
Algorithm 1 Cooperative MARL-VDN algorithm for dynamic beam hopping
Input:
  Number of agents K.
  Exploration rate ε.
  Size of the experience replay buffer.
  Update frequency of the target Q-network G.
Initialize:
  Initialize an independent Q-network for each agent with randomly generated network parameters.
  Initialize the target Q-network of each agent, setting its parameters equal to those of the agent's Q-network.
  Initialize the environment state.
  Initialize the experience replay buffer D.
Training:
for each episode do
  Reset the environment.
  for time slot t = 1 to T do
    for agent k = 1 to K do
      Obtain the global state of the agent.
      Select an action using the ε-greedy strategy.
    end for
    Execute the joint action.
    Obtain the global reward and transition to the next state.
    Store the experience in the experience replay buffer D.
    Each agent randomly samples a mini-batch from the experience replay buffer D.
    for agent k = 1 to K do
      Calculate the target value of the mini-batch data via Equation (24).
      Calculate the mean squared error loss according to Equation (26).
      Update the individual Q-network parameters of the agent using the Adam optimizer.
    end for
    Update the target Q-network parameters of each agent every G time steps.
  end for
end for
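The core of the training loop in Algorithm 1 can be sketched with plain Python lists standing in for the Q-networks. This is a minimal sketch that follows the standard value-decomposition formulation of Sunehag et al., in which the joint value is the sum of the per-agent values; it is not a reproduction of Equations (24) and (26), which may differ in detail. A plain gradient step replaces Adam, and all function names and toy numbers are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def vdn_target(reward, next_target_q, gamma):
    """TD target on the summed value: r + gamma * sum_k max_a Q_k_target(s', a)."""
    return reward + gamma * sum(max(q_k) for q_k in next_target_q)


def vdn_td_error(q_per_agent, actions, target):
    """Target minus Q_tot, where Q_tot sums each agent's chosen Q-value."""
    q_tot = sum(q_k[a_k] for q_k, a_k in zip(q_per_agent, actions))
    return target - q_tot


def vdn_update(q_per_agent, actions, target, lr):
    """One gradient step on 0.5 * td_error**2; every agent receives the same error."""
    delta = vdn_td_error(q_per_agent, actions, target)
    for q_k, a_k in zip(q_per_agent, actions):
        q_k[a_k] += lr * delta
    return delta


# Toy example: K = 2 agents with 2 actions each (all values are made up).
q_now = [[1.0, 2.0], [0.5, 0.0]]
q_next_target = [[0.0, 1.0], [2.0, 3.0]]
actions = [1, 0]
y = vdn_target(reward=1.0, next_target_q=q_next_target, gamma=0.9)
delta = vdn_update(q_now, actions, y, lr=0.1)
greedy = epsilon_greedy([1.0, 3.0, 2.0], 0.0, random.Random(0))
```

Because the shared global reward enters only through the summed value, each agent can act greedily on its own Q-values at execution time, which is what makes decentralized execution possible after centralized training.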
4. Simulation Results and Analysis
4.1. Setup and Parameters
4.2. Computational Complexity Analysis
- R-BH: The R-BH algorithm randomly selects K beam positions from M beam positions for irradiation, and the beam power is randomly selected from N elements of the transmitting power set. Therefore, the complexity of the R-BH algorithm is expressed as .
- P-BH: The P-BH algorithm carries out a time division service for different beam positions through a periodic beam hopping pattern. So, the complexity of the P-BH algorithm is expressed as .
- GA-BH [26]: The GA-BH algorithm needs to calculate user satisfaction and delay separately. The computational complexity of the former is , and the computational complexity of the latter is . Therefore, the complexity of the GA-BH algorithm is expressed as , where G represents the maximum number of iterations and P represents the population size.
- DF-BH [27]: The DF-BH algorithm serves the top K beam positions with the largest delay in each time slot. So, the complexity of the DF-BH algorithm is expressed as .
- DQN: The DQN algorithm learns optimal strategies through a neural network, and its computational complexity depends on the complexity of the deep neural network. For a neural network structure with L layers, the dimension of the weight matrix is . In a single forward-propagation calculation, the computational amount of a fully connected layer is , and the computational complexity of the forward propagation of the entire neural network is .
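The forward-propagation cost described for the neural-network approach can be made concrete by counting multiply-accumulate operations (MACs). The sketch below assumes that layer l has an n_{l-1} × n_l weight matrix, so a fully connected network costs the sum of n_{l-1}·n_l over its layers; the symbol n_l and both function names are assumptions. The 76 → 64 input layer and the 64-unit GRU cell come from the network structure table; the output layer is left out because its size is not stated.

```python
def dense_forward_macs(layer_sizes):
    """MACs for one forward pass through fully connected layers.

    layer_sizes = [n_0, n_1, ..., n_L]; layer l has an n_{l-1} x n_l weight matrix.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def gru_cell_forward_macs(input_size, hidden_size):
    """Matrix-multiply MACs for one GRU step.

    A GRU has three gate/candidate blocks, each with an input weight matrix
    and a recurrent weight matrix; element-wise operations are ignored.
    """
    return 3 * hidden_size * (input_size + hidden_size)


fc1_macs = dense_forward_macs([76, 64])   # input layer fc1
gru_macs = gru_cell_forward_macs(64, 64)  # hidden GRU cell
print(fc1_macs, gru_macs)
```

The count depends only on the layer widths, not on the number of beam positions searched, which is why inference cost stays fixed once the network is trained.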
4.3. Performance of the Applied Algorithm
4.4. Performance Comparison and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, J.; Shi, Y.; Fadlullah, Z.M.; Kato, N. Space-air-ground integrated network: A survey. IEEE Commun. Surv. Tutor. 2018, 20, 2714–2741.
- Lin, Z.; Ni, Z.; Kuang, L.; Jiang, C.; Huang, Z. Dynamic beam pattern and bandwidth allocation based on multi-agent deep reinforcement learning for beam hopping satellite systems. IEEE Trans. Veh. Technol. 2022, 71, 3917–3930.
- Chen, S.; Sun, S.; Kang, S. System integration of terrestrial mobile communication and satellite communication—the trends, challenges and key technologies in B5G and 6G. China Commun. 2020, 17, 156–171.
- Chen, S.; Liang, Y.C.; Sun, S.; Kang, S.; Cheng, W.; Peng, M. Vision, requirements, and technology trend of 6G: How to tackle the challenges of system coverage, capacity, user data-rate and movement speed. IEEE Wirel. Commun. 2020, 27, 218–228.
- Di, B.; Song, L.; Li, Y.; Poor, H.V. Ultra-dense LEO: Integration of satellite access networks into 5G and beyond. IEEE Wirel. Commun. 2019, 26, 62–69.
- Du, J.; Jiang, C.; Zhang, H.; Wang, X.; Ren, Y.; Debbah, M. Secure satellite-terrestrial transmission over incumbent terrestrial networks via cooperative beamforming. IEEE J. Sel. Areas Commun. 2018, 36, 1367–1382.
- Lei, J.; Vázquez-Castro, M.Á. Multibeam satellite frequency/time duality study and capacity optimization. J. Commun. Netw. 2011, 13, 472–480.
- Wang, C.; Liu, N. Research on software defined spectrum sharing cognitive multi-beam satellite communication system. In Proceedings of the 2019 IEEE 2nd International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), Shenyang, China, 22–24 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 598–601.
- Lin, Z.; Ni, Z.; Kuang, L.; Jiang, C.; Huang, Z. NGSO satellites beam hopping strategy based on load balancing and interference avoidance for coexistence with GSO systems. IEEE Commun. Lett. 2022, 27, 278–282.
- Fonseca, N.J.; Sombrin, J. Multi-beam reflector antenna system combining beam hopping and size reduction of effectively used spots. IEEE Antennas Propag. Mag. 2012, 54, 88–99.
- Meng, M.; Hu, B.; Chen, S.; Kang, S. Dynamic Beam Pattern Based on Cooperation Multi-Agent VDN-D3QN for LEO Satellite Communication System. IEEE Trans. Green Commun. Netw. 2024, 9, 725–738.
- Shaohui, S.; Liming, H.; Deshan, M. Beam switching solutions for beam-hopping based LEO system. In Proceedings of the 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), Virtual, 27–28 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
- Kibria, M.G.; Lagunas, E.; Maturo, N.; Spano, D.; Chatzinotas, S. Precoded cluster hopping in multi-beam high throughput satellite systems. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Big Island, HI, USA, 9–13 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
- Diao, R.; Zhang, X.; Zhang, L.; Zheng, S.; Quan, Q. A Muti-beam Placement Optimization Scheme in LEO Beam Hopping Satellite Systems. In Proceedings of the 2023 International Conference on Wireless Communications and Signal Processing (WCSP), Hangzhou, China, 2–4 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 658–663.
- Li, W.; Zeng, M.; Wang, X.; Fei, Z. Dynamic beam hopping of double LEO multi-beam satellite based on determinant point process. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 1–3 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 713–718.
- Li, T.; Yao, R.; Fan, Y.; Zuo, X.; Jiang, L. Multiobjective optimization for beam hopping and power allocation in dual satellite cooperative transmission networks. IEEE Syst. J. 2023, 17, 3870–3881.
- Shi, S.; Li, G.; Li, Z.; Zhu, H.; Gao, B. Joint power and bandwidth allocation for beam-hopping user downlinks in smart gateway multibeam satellite systems. Int. J. Distrib. Sens. Netw. 2017, 13, 1550147717709461.
- Lei, L.; Lagunas, E.; Yuan, Y.; Kibria, M.G.; Chatzinotas, S.; Ottersten, B. Beam illumination pattern design in satellite networks: Learning and optimization for efficient beam hopping. IEEE Access 2020, 8, 136655–136667.
- Zhang, T.; Zhang, L.; Shi, D. Resource allocation in beam hopping communication system. In Proceedings of the 2018 IEEE/AIAA 37th Digital Avionics Systems Conference (DASC), London, UK, 23–27 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–5.
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296.
- Zhang, Y.; Wang, Y.; Zhang, Q.; Jia, H.; Feng, L. Optimizing Beam Hopping in Multibeam NGSO Constellations with Multi-Agent Reinforcement Learning. In Proceedings of the 2024 IEEE International Workshop on Radio Frequency and Antenna Technologies (iWRF&AT), Shenzhen, China, 31 May–3 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 476–481.
- Hu, X.; Zhang, Y.; Liao, X.; Liu, Z.; Wang, W.; Ghannouchi, F.M. Dynamic beam hopping method based on multi-objective deep reinforcement learning for next generation satellite broadband systems. IEEE Trans. Broadcast. 2020, 66, 630–646.
- Liu, H.; Wang, Y.; Wang, T.; Li, P. User-Level Dynamic Beam Hopping Design for LEO Satellite Networks Based on Deep Reinforcement Learning Assisted Enhanced Genetic Algorithm. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring), Singapore, 24–27 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–7.
- Ran, Y.; Tan, F.; Chen, S.; Lei, J.; Luo, J. Towards beam hopping and power allocation in multi-beam satellite systems with parameterized reinforcement learning. IEEE Trans. Veh. Technol. 2024, 73, 14050–14055.
- Wen, J.; Wang, C.; Zhao, X.; Chen, R.; Wang, W. Beam Hopping and Power Allocation of LEO Multi-Satellite Communication Network Based on Multi-Agent DQN Algorithm. In Proceedings of the 2024 10th International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1832–1839.
- Angeletti, P.; Fernandez Prim, D.; Rinaldo, R. Beam hopping in multi-beam broadband satellite systems: System performance and payload architecture analysis. In Proceedings of the 24th AIAA International Communications Satellite Systems Conference, San Diego, CA, USA, 11–14 June 2006; p. 5376.
- Han, H.; Zheng, X.; Huang, Q.; Lin, Y. QoS-equilibrium slot allocation for beam hopping in broadband satellite communication systems. Wirel. Netw. 2015, 21, 2617–2630.
Notations | Definitions |
---|---|
M | Number of beam positions |
K | Number of beams |
N | Number of beam power levels |
T | Beam hopping period |
 | The beam hopping pattern in time slot t |
 | Indication of whether the beam position m is illuminated in time slot t |
 | Newly arrived data packets in beam position m in time slot t |
 | The survival time of data packets in the queue |
 | The data packet buffer queues for all beam positions |
 | The data packet buffer queue of beam position m in time slot t |
 | The packet arrival rate of beam position m in time slot t |
 | The generation probability of data packets in each time slot |
 | The traffic demands of all beam positions in time slot t |
 | The service demand of beam position m in time slot t |
 | The data in beam position m in time slot t |
 | The channel coefficient between the beam k and the beam position m |
H | Channel coefficient matrix between beams and ground beam positions |
 | The antenna transmission gain of beams |
 | The antenna reception gain of beam positions |
 | Transmission path loss |
 | Total satellite bandwidth |
 | Total satellite power |
 | Maximum beam power |
 | The channel capacity of the beam position m in time slot t |
 | The weight between user satisfaction and delay fairness |
Layer | Input Size | Output Size | Activation |
---|---|---|---|
Input Layer (fc1) | 76 | 64 | ReLU |
Hidden Layer 1 (GRU Cell) | 64 | 64 | GRU-internal |
Output Layer (fc2) | 64 | | Linear |
Simulation Parameters | Values |
---|---|
Satellite altitude H | 550 km |
Ka-band frequency | 20 GHz |
Number of LEO satellites | 1 |
Number of beams K | 4 |
Number of positions M | 12 |
Total bandwidth | 200 MHz |
Shadow fading variance | 5 dB |
Multipath fading variance | 2.5 dB |
Total satellite power | 23 W |
Beam power | 8 dBW |
3 dB beamwidth | |
Antenna transmission gain | 25 dBi |
Antenna reception gain | 45 dBi |
Noise power spectral density | −173.8 dBm/Hz |
Noise temperature | 300 K |
Boltzmann constant | 1.38 × 10⁻²³ J/K |
Time slot duration | 2 ms |
Time to survive of data packets | |
Traffic demand D | [1500, 2200] Mbps |
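Two of the listed values can be cross-checked with a few lines of arithmetic. The sketch below assumes thermal noise, so the noise power spectral density is the Boltzmann constant times the noise temperature, and converts the beam power from dBW to watts. The function names are hypothetical.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K


def noise_psd_dbm_per_hz(noise_temp_k: float) -> float:
    """Thermal noise power spectral density k*T, expressed in dBm/Hz."""
    return 10.0 * math.log10(BOLTZMANN * noise_temp_k) + 30.0


def dbw_to_watts(power_dbw: float) -> float:
    """Convert a power level in dBW to watts."""
    return 10.0 ** (power_dbw / 10.0)


n0 = noise_psd_dbm_per_hz(300.0)  # noise temperature from the table
beam_w = dbw_to_watts(8.0)        # beam power from the table
print(round(n0, 1), round(beam_w, 2))
```

A noise temperature of 300 K gives about −173.8 dBm/Hz, which agrees with the listed noise power spectral density, and 8 dBW corresponds to about 6.3 W. If 8 dBW is read as the per-beam maximum, four beams at that level would need about 25.2 W, slightly more than the 23 W total, so the total-power constraint would be active in that case.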
Algorithm Parameters | Values |
---|---|
Training episodes | 2000 |
Time slots per episode T | 100 |
Learning rate | |
Experience pool size | 2000 |
Target network update frequency | 100 |
Minibatch size | 64 |
Discount factor | 0.9 |
Initial exploration rate | 1 |
Final exploration rate | 0.05 |
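The initial and final exploration rates imply an annealing schedule for the ε-greedy policy. The decay rule is not given here, so the sketch below assumes a simple linear decay that then holds at the final value; the function name and the `decay_steps` parameter are hypothetical.

```python
def linear_epsilon(step: int, decay_steps: int,
                   eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Linearly anneal epsilon from eps_start to eps_end over decay_steps, then hold.

    ASSUMPTION: linear decay; the schedule actually used is not stated.
    """
    if decay_steps <= 0:
        return eps_end
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


# Example: anneal over the first 1000 of the 2000 training episodes (hypothetical split).
schedule = [linear_epsilon(episode, 1000) for episode in (0, 500, 1000, 1999)]
```

Early episodes then explore almost uniformly, while later episodes mostly exploit the learned Q-values with a residual 5% of random actions.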
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xu, H.; Liu, L.; Zhang, Z. Service-Driven Dynamic Beam Hopping with Resource Allocation for LEO Satellites. Electronics 2025, 14, 2367. https://doi.org/10.3390/electronics14122367