# Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning

## Abstract

## 1. Introduction

- To further improve task scheduling efficiency, an improved graph-based minimum clique partition algorithm is introduced as a task clustering preprocessing step, which reduces the task scale and improves the performance of the scheduling algorithm.
- Previous studies that solved the Earth observation satellite scheduling problem (EOSSP) with reinforcement learning (RL) algorithms treated it as a time-discrete model. In this paper, a time-continuous model is established for the EOSSP, which allows the deep deterministic policy gradient (DDPG) algorithm to decide an accurate observation time for each task.
- Taking practical engineering constraints into account, comparison experiments were carried out between the RL method and several metaheuristic methods, namely the genetic algorithm (GA), simulated annealing (SA), and the GA–SA hybrid algorithm, to validate the feasibility of the DDPG algorithm.

## 2. Problem Description

#### 2.1. Graph Clustering Model

#### 2.2. Task Scheduling Problem

#### 2.2.1. Scheduling Model

#### 2.2.2. Constraint Conditions

#### 2.2.3. Optimization Objectives

## 3. Solving Method

#### 3.1. Task Preprocess: Graph Clustering

#### 3.1.1. Graph Model Establishment

#### 3.1.2. Clique Partition Algorithm

Algorithm 1: Improved minimum clique partition algorithm
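
The body of Algorithm 1 is not reproduced in this section. As a rough illustration of the general idea only, the sketch below applies a plain greedy clique partition to a task-compatibility graph. It is not the authors' improved algorithm: the function name `greedy_clique_partition`, the adjacency-dictionary input, and the lowest-degree-first ordering are all assumptions made for the example.

```python
def greedy_clique_partition(adj):
    """Greedily partition vertices into cliques (hypothetical helper).

    adj maps each task id to the set of task ids it could be merged with.
    Every returned group is a clique: all of its members are pairwise adjacent.
    """
    # Assumed heuristic: visit low-degree vertices first, ties broken by id.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    cliques = []
    for v in order:
        for clique in cliques:
            # v may join a clique only if it is adjacent to every member.
            if all(u in adj[v] for u in clique):
                clique.append(v)
                break
        else:
            # No existing clique accepts v, so it starts a new one.
            cliques.append([v])
    return cliques


# Toy compatibility graph: tasks 1-3 are mutually mergeable, 4-5 form a pair,
# and task 6 cannot be merged with anything.
graph = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2},
    4: {5},
    5: {4},
    6: set(),
}
cliques = greedy_clique_partition(graph)
```

Each resulting clique would correspond to one merged task $t_u^c$, so fewer cliques means a smaller scheduling problem. A greedy pass like this gives no guarantee of reaching the minimum number of cliques.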

#### 3.2. DRL-Based Method for Optimization

#### 3.2.1. Markov Decision Process Model

#### 3.2.2. Optimization with DDPG

#### 3.2.3. Task Scheduling Method

Algorithm 2: Task scheduling method based on DDPG

## 4. Experimental Simulation

#### 4.1. Simulation Scenario

#### 4.2. Results and Discussion

## 5. Conclusions

Symbol | Meaning |
---|---|
${t}_{u}$ | Observation task for target $u$ |
${t}_{u}^{c}$ | Observation task for merged target $u$ |
$[TW{S}_{u}^{i},TW{E}_{u}^{i}]$ | Visible time window (VTW) of ${t}_{u}$ in the $i$-th orbit |
$[TW{S}_{u}^{ci},TW{E}_{u}^{ci}]$ | VTW of ${t}_{u}^{c}$ in the $i$-th orbit |
${\theta}_{u}$ | Slewing angle for observation task ${t}_{u}$ |
${d}_{u}$ | Observation duration of ${t}_{u}$ |
${x}_{u}$ | Whether ${t}_{u}$ is executed |
${x}_{uv}$ | Whether execution switches from ${t}_{u}$ to ${t}_{v}$ |
${v}_{s}$ | Slewing maneuver velocity |
${s}_{uv}$ | Preparation time for a task switch |
$tran{T}_{uv}$ | Slewing maneuver time between two tasks |
$maxT$ | Maximum operating time in one observation |
$max\theta $ | Maximum slewing angle |
${c}_{i}$ | Storage consumption per unit observation time |
$M$ | Total data storage capacity |
${e}_{i}$ | Energy consumption per unit observation time |
${\epsilon}_{uv}$ | Energy consumption per unit slewing maneuver time |
$E$ | Total available energy |
$pri{o}_{u}$ | Priority of observation task ${t}_{u}$ |
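
The constraint equations themselves are not reproduced in this section, so the following is only a minimal sketch of how the symbols above could combine into a feasibility check. The form $tranT_{uv} = |\theta_v - \theta_u| / v_s$, the default `v_s` of 1 degree per second, and the names `Task` and `check_resources` are assumptions made for the example. The remaining defaults are taken from the satellite parameter table.

```python
from dataclasses import dataclass


@dataclass
class Task:
    theta: float  # slewing angle theta_u in degrees
    d: float      # observation duration d_u in seconds


def check_resources(tasks, c=1.0, e=1.0, eps=0.5, v_s=1.0,
                    M=600.0, E=1200.0, max_T=150.0, max_theta=40.0):
    """Return True if an ordered list of executed tasks fits the assumed limits."""
    # Storage and observation energy both scale with observation time.
    storage = sum(c * t.d for t in tasks)
    energy = sum(e * t.d for t in tasks)
    # Slewing energy between consecutive tasks (assumed form of tranT_uv).
    for prev, nxt in zip(tasks, tasks[1:]):
        tran_T = abs(nxt.theta - prev.theta) / v_s
        energy += eps * tran_T
    return (storage <= M
            and energy <= E
            and all(t.d <= max_T for t in tasks)
            and all(abs(t.theta) <= max_theta for t in tasks))


plan = [Task(10.0, 100.0), Task(-20.0, 120.0), Task(30.0, 140.0)]
plan_ok = check_resources(plan)
```

Time-window and transition-time feasibility between consecutive tasks is deliberately left out, since it depends on the VTW definitions and the preparation time $s_{uv}$, whose exact roles are not given here.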

Parameter | Value |
---|---|
Semi-major axis $a$ | 7000 km |
Orbital eccentricity $e$ | 0 |
Orbital inclination $i$ | ${60}^{\circ}$ |
Longitude of ascending node $\mathrm{\Omega}$ | ${285}^{\circ}$ |
Argument of perigee $\omega $ | ${0}^{\circ}$ |
Mean anomaly ${M}_{0}$ | ${0}^{\circ}$ |
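
For orientation, the orbital period implied by these elements follows from Kepler's third law, $T = 2\pi\sqrt{a^3/\mu}$. The short sketch below evaluates it for $a = 7000$ km. The value of Earth's gravitational parameter and the helper name `orbital_period` are not from the paper and are introduced here only for illustration.

```python
import math

# Standard gravitational parameter of Earth in km^3/s^2 (assumed constant).
MU_EARTH = 398600.4418


def orbital_period(a_km):
    """Period in seconds of a Keplerian orbit with semi-major axis a_km."""
    return 2.0 * math.pi * math.sqrt(a_km ** 3 / MU_EARTH)


period_s = orbital_period(7000.0)
orbits_per_day = 86400.0 / period_s
```

The period comes out at roughly 97 minutes, or a little under 15 orbits per day.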

Parameter | Value | Parameter | Value |
---|---|---|---|
$M$ | 600 | ${c}_{i}$ | 1 |
$FOV$ | ${10}^{\circ}$ | ${e}_{i}$ | 1 |
$maxT$ | 150 s | ${\epsilon}_{uv}$ | 0.5 |
$max\theta $ | $\pm {40}^{\circ}$ | $E$ | 1200 |

Parameter | Value |
---|---|
Learning rate for the critic | 0.002 |
Learning rate for the actor | 0.001 |
Discount factor $\gamma $ | 0.95 |
Memory capacity | 3000 |
Batch size | 32 |
Noise attenuation coefficient $\alpha $ | 0.9995 |
Standard deviation $\sigma $ | 0.1 |
Soft synchronization coefficient $\tau $ | 0.001 |
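
To show where three of these hyperparameters typically enter a DDPG training loop, here is a minimal sketch in plain Python. It assumes Gaussian exploration noise whose standard deviation is multiplied by $\alpha$ once per step, and actions clipped to $[-1, 1]$. The paper's actual noise process, decay schedule, and action range are not stated in this section, and the helper names are hypothetical.

```python
import random

TAU = 0.001     # soft synchronization coefficient
SIGMA0 = 0.1    # initial standard deviation of the exploration noise
ALPHA = 0.9995  # noise attenuation coefficient


def soft_update(target, online, tau=TAU):
    """Move target weights toward online weights: w' <- tau*w + (1 - tau)*w'."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def sigma_at(step):
    """Noise standard deviation after `step` decays (assumed per-step schedule)."""
    return SIGMA0 * ALPHA ** step


def noisy_action(action, step, low=-1.0, high=1.0, rng=random):
    """Perturb a deterministic action with decaying Gaussian noise, then clip."""
    return min(high, max(low, action + rng.gauss(0.0, sigma_at(step))))


new_target = soft_update(target=[0.0, 1.0], online=[1.0, 1.0])
explored = noisy_action(0.3, step=1000, rng=random.Random(0))
```

With $\tau = 0.001$ the target networks track the online networks very slowly, which is the usual way DDPG stabilizes the critic's bootstrap targets. Under the assumed schedule, the noise scale falls to about 60% of its initial value after 1000 steps.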

Algorithm | Profit (50 tasks) | Profit (100 tasks) | Profit (150 tasks) | Profit (200 tasks) | Running time, s (50 tasks) | Running time, s (100 tasks) | Running time, s (150 tasks) | Running time, s (200 tasks) |
---|---|---|---|---|---|---|---|---|
TC-DDPG | 3.25 | 2.31 | 1.83 | 1.56 | 163.7 | 228.3 | 295.3 | 362.4 |
NTC-DDPG | 3.11 | 2.17 | 1.77 | 1.36 | 181.1 | 255.2 | 352.1 | 454.7 |
GA-SA | 3.14 | 1.95 | 1.54 | 1.13 | 155.3 | 163.7 | 362.9 | 530.2 |
GA | 2.96 | 1.56 | 1.29 | 0.79 | 3.7 | 6.4 | 10.3 | 15.6 |
SA | 2.79 | 1.59 | 1.07 | 0.63 | 47.1 | 83.4 | 379.2 | 702.3 |
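
The figures in this table can be turned into relative comparisons with a few lines of arithmetic. The sketch below computes the profit gain of TC-DDPG over each other method at 200 tasks, and the running-time reduction that task clustering gives DDPG at each task count. The variable and function names are illustrative only, and the numbers are copied from the table.

```python
TASK_COUNTS = [50, 100, 150, 200]

PROFIT = {
    "TC-DDPG": [3.25, 2.31, 1.83, 1.56],
    "NTC-DDPG": [3.11, 2.17, 1.77, 1.36],
    "GA-SA": [3.14, 1.95, 1.54, 1.13],
    "GA": [2.96, 1.56, 1.29, 0.79],
    "SA": [2.79, 1.59, 1.07, 0.63],
}

RUNTIME_S = {
    "TC-DDPG": [163.7, 228.3, 295.3, 362.4],
    "NTC-DDPG": [181.1, 255.2, 352.1, 454.7],
}


def relative_gain(value, baseline):
    """Fractional improvement of value over baseline."""
    return (value - baseline) / baseline


# Profit gain of TC-DDPG over every other method at the largest task count.
gains_200 = {
    name: relative_gain(PROFIT["TC-DDPG"][-1], values[-1])
    for name, values in PROFIT.items()
    if name != "TC-DDPG"
}

# Fraction of DDPG running time saved by task clustering, per task count.
clustering_savings = [
    1.0 - tc / ntc
    for tc, ntc in zip(RUNTIME_S["TC-DDPG"], RUNTIME_S["NTC-DDPG"])
]
```

At 200 tasks, TC-DDPG's profit is about 38% higher than that of GA-SA and about 15% higher than that of NTC-DDPG, while clustering cuts DDPG's running time by about 20%.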

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Huang, Y.; Mu, Z.; Wu, S.; Cui, B.; Duan, Y.
Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning. *Remote Sens.* **2021**, *13*, 2377.
https://doi.org/10.3390/rs13122377
