# Research on Resource Allocation Method of Space Information Networks Based on Deep Reinforcement Learning


## Abstract


## 1. Introduction

- Based on the core idea of software-defined networking (SDN), a hierarchical, domain-controlled space information network (SIN) architecture is established, and both the overall networking architecture and the network control architecture are designed.
- On the basis of the SDN-based SIN architecture, the transmission, caching, and computing resources in the SIN are managed in a unified way. The transmission resource depends on the coverage time of the low Earth orbit (LEO) satellite over users, the transmission state of the geostationary orbit (GEO) data relay satellite, and the communication link state.
- The dynamic allocation of multi-dimensional resources in the SIN is modeled mathematically, and a SIN resource allocation method based on the asynchronous advantage actor-critic (A3C) algorithm is proposed.
- The expected benefit of unit resources under different conditions is simulated and analyzed. The simulation results show that the proposed scheme, which manages transmission, caching, and computing resources in a unified way, yields a higher expected benefit and can effectively improve the utilization efficiency of SIN resources.

## 2. Related Work

#### 2.1. Space Information Networks

#### 2.2. SDN-Based Space Information Networks

## 3. System Model

#### 3.1. SDN-Based Space Information Network Architecture

#### 3.1.1. Overall Networking Architecture

#### 3.1.2. Network Control Architecture

#### 3.2. Network Model

#### 3.3. Satellite Coverage and Transmission Model

#### 3.3.1. LEO Satellite Coverage Model

#### 3.3.2. GEO Data Relay Satellite Transmission Model

#### 3.4. Communication Link Model

#### 3.5. Caching Model

#### 3.6. Computing Model

## 4. Problem Equation

#### 4.1. State Set

#### 4.2. Action Set

#### 4.3. Reward Function

#### 4.4. A3C Algorithm

Algorithm: Asynchronous Advantage Actor-Critic (A3C), pseudocode for each actor-learner thread |
---|
Assume global shared parameters $\Theta$ and $\Theta_{c}$ and a global shared counter $T=0$ |
Initialize thread step counter $t\leftarrow 1$ |
repeat |
Reset gradients: $d\Theta\leftarrow 0$ and $d\Theta_{c}\leftarrow 0$ |
Synchronize thread-specific parameters: $\Theta'=\Theta$ and $\Theta_{c}'=\Theta_{c}$ |
$t_{start}=t$ |
Get state $S_{t}$ |
repeat |
Perform $a_{u}(t)$ according to policy $\iota(a_{u}(t)\mid S_{t};\Theta')$ |
Receive reward $R_{u}(t)$ and new state $S_{t+1}$ |
$t\leftarrow t+1$ |
$T\leftarrow T+1$ |
until terminal $S_{t}$ or $t-t_{start}=t_{\max}$ |
$R=\begin{cases}0 & \text{for terminal } S_{t}\\ V(S_{t};\Theta_{c}') & \text{for non-terminal } S_{t}\text{ (bootstrap from last state)}\end{cases}$ |
for $k\in\{t-1,\dots,t_{start}\}$ do |
$R\leftarrow R_{u}(k)+\Upsilon R$ |
Accumulate gradients wrt $\Theta'$: $d\Theta\leftarrow d\Theta+\nabla_{\Theta'}\log\iota(a_{u}(k)\mid S_{k};\Theta')\,(R-V(S_{k};\Theta_{c}'))$ |
Accumulate gradients wrt $\Theta_{c}'$: $d\Theta_{c}\leftarrow d\Theta_{c}+\partial(R-V(S_{k};\Theta_{c}'))^{2}/\partial\Theta_{c}'$ |
end for |
Perform asynchronous update of $\Theta$ using $d\Theta$ and of $\Theta_{c}$ using $d\Theta_{c}$ |
until $T>T_{\max}$ |
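The backward recursion in the inner for-loop of the algorithm can be illustrated with a minimal Python sketch. It covers only the return and advantage computation for one rollout, not the network updates or the asynchronous threading. The function names `n_step_returns` and `advantages` are hypothetical, `gamma` stands in for the factor $\Upsilon$ (read here as a discount factor), and `rewards[k]` is assumed to be the reward received after the action taken at step $k$.

```python
from typing import List, Sequence


def n_step_returns(rewards: Sequence[float],
                   bootstrap_value: float,
                   gamma: float) -> List[float]:
    """Discounted returns for one rollout, computed from the last step backwards.

    bootstrap_value is 0.0 if the rollout ended in a terminal state,
    otherwise the critic's estimate V(S_t) of the last state reached.
    """
    running = bootstrap_value
    returns = [0.0] * len(rewards)
    for k in reversed(range(len(rewards))):
        # R <- reward at step k + gamma * R
        running = rewards[k] + gamma * running
        returns[k] = running
    return returns


def advantages(returns: Sequence[float],
               values: Sequence[float]) -> List[float]:
    """Advantage estimates R - V(S_k) that weight the policy gradient."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [r - v for r, v in zip(returns, values)]


# Illustrative numbers only; they are not taken from the simulation.
example_returns = n_step_returns([1.0, 2.0, 3.0], bootstrap_value=4.0, gamma=0.5)
example_advantages = advantages(example_returns, [3.0, 4.0, 5.0])
```

In a full implementation each advantage would scale the gradient of the log-policy for the actor, and the squared advantage would serve as the critic loss, as in the two accumulation steps of the listing.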

## 5. Simulation Analysis

#### 5.1. Simulation Parameter Setting

#### 5.2. Simulation Result

## 6. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest


**Figure 1.** Overall networking architecture of the hierarchical, domain-controlled space information network (SIN).

**Figure 9.** The relationship between the unit charging price for using transmission resources and the expected benefit of unit resources.

**Figure 10.** The relationship between the unit charging price for using caching resources and the expected benefit of unit resources.

**Figure 11.** The relationship between the unit charging price for using computing resources and the expected benefit of unit resources.

Architecture | Satellite–Earth Network | Space-based Network | Space–net–Earth Network |
---|---|---|---|
Typical system | Civil: Inmarsat, O3b, OneWeb, Intersat; Military: WGS, MUOS | Civil: Iridium; Military: AEHF | Civil: SCaN, ISICOM; Military: TSAT |
Ground | Globally distributed ground station network | The system can operate independently of ground stations | The space and ground segments cooperate with each other; the ground network does not need globally distributed stations |
Inter-satellite networking | No | Yes | Yes |
Equipment on satellite | Simple | Complex | Moderate |
Difficulty of system maintenance | Simple | Complex | Moderate |
Technical complexity | Simple | Complex | Moderate |
Construction cost | Low | High | Moderate |

**Table 2.** Simulation parameter setting. LEO: low Earth orbit; GEO: geostationary orbit; CPU: central processing unit.

Parameters | Values | Descriptions |
---|---|---|
${B}_{u}^{l}$ | 6 MHz | Bandwidth allocated by LEO satellite l to user u |
${B}_{l}^{lg}$ | 6 MHz | Bandwidth allocated by GEO satellite lg to LEO satellite l |
${\delta}_{l}$ | 2 units/MHz | Payment price for using LEO spectrum resources |
${\delta}_{lg}$ | 2 units/MHz | Payment price for using GEO spectrum resources |
${\varsigma}_{c}$ | 4 units/Mbit | Payment price for using caching resources |
${\eta}_{m}$ | 1 unit/J | Payment price for using computing resources |
${\tau}_{u}$ | 15 units/Mbps | Unit transmission fee charged to the user |
${\kappa}_{u}$ | 10 units/Mbps | Unit caching fee charged to the user |
${\varphi}_{u}$ | 5 units/Mbps | Unit computing fee charged to the user |
${\theta}_{u,\mathrm{max}}^{l}$ | $\pi /2$ | Maximum elevation angle between user u and satellite l |
${n}_{u}$ | 6 Mcycles | Number of CPU cycles needed to complete each space task |
${e}_{m}$ | 1 J | Energy consumed by the CPU in one cycle |
${o}_{u}$ | 3 Mbits | Size of the task content |
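
For readers who want to reproduce the setting, the values in Table 2 could be collected in a single configuration object. The following is a minimal sketch, not the authors' code: the class name, field names, and the helper `leo_spectrum_payment` are hypothetical, and the helper only multiplies a price by a quantity as a unit-consistency check. It is not the reward function of Section 4.3.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    """Values copied from Table 2; all field names are hypothetical."""
    leo_bandwidth_mhz: float = 6.0           # B_u^l
    geo_bandwidth_mhz: float = 6.0           # B_l^lg
    leo_spectrum_price: float = 2.0          # delta_l, units/MHz
    geo_spectrum_price: float = 2.0          # delta_lg, units/MHz
    caching_price: float = 4.0               # varsigma_c, units/Mbit
    computing_price: float = 1.0             # eta_m, units/J
    transmission_fee: float = 15.0           # tau_u, units/Mbps
    caching_fee: float = 10.0                # kappa_u, units/Mbps
    computing_fee: float = 5.0               # varphi_u, units/Mbps
    max_elevation_rad: float = math.pi / 2   # theta_{u,max}^l
    cycles_per_task_mcycles: float = 6.0     # n_u
    energy_per_cycle_j: float = 1.0          # e_m
    task_size_mbits: float = 3.0             # o_u

    def leo_spectrum_payment(self) -> float:
        """ASSUMPTION: payment = price per MHz * allocated MHz (unit check only)."""
        return self.leo_spectrum_price * self.leo_bandwidth_mhz


params = SimulationParameters()
leo_payment = params.leo_spectrum_payment()
```

Freezing the dataclass keeps the parameter set immutable during a run, so a sweep over one price (as in Figures 9 to 11) would create a new instance per setting, for example with `dataclasses.replace`.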

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Meng, X.; Wu, L.; Yu, S.
Research on Resource Allocation Method of Space Information Networks Based on Deep Reinforcement Learning. *Remote Sens.* **2019**, *11*, 448.
https://doi.org/10.3390/rs11040448
