A Satellite Task Planning Algorithm Based on a Symmetric Recurrent Neural Network

Abstract: The intelligent satellite (iSAT) is a concept based on software-defined satellites, and Earth observation is one of its important applications. With the increasing demand for rapid satellite response and for observation tasks, in-orbit task planning on intelligent satellites has become an inevitable trend. In this paper, a mixed integer programming model for observation tasks is established, and a heuristic search algorithm based on a symmetric recurrent neural network is proposed. The schedulable probability of each observation task is obtained by constructing a structurally symmetric recurrent neural network, and, finally, the optimal task planning scheme is obtained. The proposed algorithm is compared experimentally with several typical heuristic search algorithms and shows certain advantages, which verifies the validity of the approach. Finally, future application prospects of the method are discussed.


Proposal of the Intelligent Satellite
The various existing types of application satellites already perform strongly. On the one hand, their communication, remote sensing, navigation, and other payloads have been greatly improved and now basically meet the needs of military and civil users. On the other hand, the in-orbit lifetime of satellites has been prolonged. However, once these satellites are launched, the performance parameters of their payloads cannot be changed for the rest of their service life, which restricts the effective application of satellites to some extent. At the same time, compared with the open ecosystem of the Internet, the aerospace industry is relatively closed, and its research and development is mainly hardware-based, with software in a supporting role. A common practice is to customize a satellite for a specific function or task, which leads to a long development cycle and high research and development costs. Different satellite models are not compatible with each other in terms of hardware, so components cannot be interchanged. They are also incompatible in terms of software: software developed for one model cannot be run directly on another, and developers cannot develop and deploy software across different satellite models. This hardware-based, software-assisted R&D process has become one of the important bottlenecks restricting the development of space technology [1,2]. With the rapid advancement of satellite hardware and software technology and the rapid spread of software-defined design ideas, satellites are gradually developing toward miniaturization, low cost, and rapid design and deployment. Many countries have made software-defined satellites and payloads an important development direction and have invested considerable manpower and resources in them.

Task Planning for Satellite Earth Observation
Intelligent satellites are oriented toward future space-based information network applications and have more comprehensive space observation requirements than traditional satellites. Therefore, Earth observation capability has become one of the core capabilities that must be considered in the development of intelligent satellites. The observation process of a traditional satellite is roughly as follows. First, an imaging request is submitted. The ground task planning system of the imaging satellite then produces a plan according to the attributes of the imaging task, the satellite information, and the corresponding constraints, and the remote sensor control commands are generated from the result. After the commands are confirmed to be correct, they are sent to the satellite and executed. Finally, the image is downloaded, processed, and handed over to the user, which completes the process. This "ground planning, on-board execution" approach to managing in-orbit satellites relies too heavily on the manual development of detailed task execution sequences. The amount of command data that must be sent to satellites has increased greatly, and the pressure on ground monitoring and control stations continues to grow, so this approach cannot adapt to the increasingly precise and complex observation requirements and the high dynamics of intelligent satellite applications. The autonomous satellite observation process is shown in Figure 1.
Because formulating detailed instructions manually is labor-intensive and error-prone, and because artificial intelligence and space science and technology have advanced, autonomous task planning has become one of the intelligent satellite control technologies to which many countries have attached importance in recent years.
Autonomous task planning refers to the process in which a satellite, after receiving high-level instructions from the ground that describe the task requirements, generates the necessary time-stamped action sequence in orbit by means of a planning algorithm, according to the current state of the satellite, the action set involved in the task, and the flight rules. Satellites with autonomous task planning capabilities can perform various routine operations without ground intervention and can adapt to fault conditions or unknown environments through fault tolerance and self-recovery. With the introduction and development of iSAT technology, we will face a rapid growth in users and tasks, continuous and irregular tasks, the dynamic access of ground stations, and a complex on-board computing environment. How to make maximum use of satellites and of measurement and control resources to meet the needs of dynamically changing task sets is an important motivation for studying satellite task planning.
At present, research on autonomous on-board observation task planning has achieved certain results, mainly using heuristic search algorithms. However, the application of machine learning methods to autonomous satellite task planning is still in the exploratory stage. By combining a neural network with a heuristic search algorithm, this paper proposes a task planning algorithm that is superior to existing heuristic search algorithms in terms of overall profit. The rest of this paper is organized as follows. Section 2 reviews the existing satellite task planning algorithms and the application of neural networks in satellite task planning. Section 3 describes the problem of satellite in-orbit observation task planning and sets out the research assumptions and constraints. In Section 4, the task planning problem is expressed as a mixed integer programming model, and a heuristic search algorithm based on a symmetric recurrent neural network is proposed in Section 5. Section 6 introduces the experimental setting and compares the proposed algorithm with several advanced heuristic algorithms, concluding that the proposed algorithm outperforms them. Finally, Section 7 presents a discussion and analysis and explains the application of this method to iSAT satellites.

Overview of Satellite Task Planning Algorithms
The task planning problem of intelligent satellites is the same as that of traditional satellites: it is essentially an NP-hard constraint satisfaction problem [12]. In other words, subject to on-board resource constraints and task constraints, the execution sequence and the specific execution time of a group of tasks must be arranged so that one or more objective functions, such as the observed area, the number of images, the energy distribution efficiency, or the task profit, reach their optimum. The on-board resource constraints include electrical energy, maneuverability, communication bandwidth, storage space, side-viewing capability, and so on. The task constraints include the size and position of the target area (point or area target), the solar elevation angle, the target line-of-sight angle, and so on. Many researchers have studied autonomous satellite task planning and tend to use intelligent optimization algorithms and heuristic algorithms to solve it.
The intelligent optimization algorithm does not need to construct an accurate mathematical search direction and does not need to perform complicated one-dimensional search. Instead, it obtains the optimal solution of the problem through a large number of simple information dissemination and evolution methods, including the genetic algorithm, simulated annealing algorithm, tabu search algorithms, etc. [13]. Mouw et al. [14] studied the task planning problem of optical imaging satellites using the tabu search algorithm and divided the process of the solution into stages. Xhafa et al. [15] also designed an adaptive mutation probability genetic algorithm, when studying the task planning problem of Earth observation satellites. Sarkheyli et al. [16], in a study of low-orbit satellite task planning, took task priority, resource constraints and user satisfaction into consideration and designed a new tabu search algorithm to solve them. Ruan et al. [17] adopted the greedy algorithm for task planning and solving. Globus et al. [18] used the simulated annealing algorithm to solve the satellite point target observation task planning problem. He et al. [19] designed a combination algorithm to solve the problem of task planning for an earth observation satellite in order to avoid the instability problem associated with providing a solution using a single traditional algorithm. In order to improve the efficiency of remote sensing satellite scheduling, Gao et al. [20] designed the ant colony algorithm for remote sensing satellite ground integrated task planning. Jiang and Pang [21] designed an adaptive ant colony algorithm by designing a task merging mechanism to solve the problem of huge user requests. Zheng et al. [22] combined the re-planning technique with the multi-objective hybrid dynamic mutation genetic algorithm for dynamic re-planning scenarios of different urgency levels and designed the cyclic re-planning algorithm and the near-real-time re-planning algorithm. 
Niu [23] used genetic algorithms to solve dynamic task planning problems.
The heuristic algorithm provides a feasible solution to a combinatorial optimization problem at an acceptable cost (calculation time, memory usage, etc.), although the deviation of this feasible solution from the optimal solution generally cannot be predicted. As research has deepened, heuristic algorithms have come to range from simple to complex and from single to hybrid, and they have gradually become an effective method for solving combinatorial optimization problems.
Marinelli et al. [24] regard satellite task planning as an integer programming problem. Using a non-standard blackmail heuristic algorithm, they propose a heuristic algorithm based on Lagrangian relaxation to solve the problem and verify its efficiency. Bina et al. [25] proposed a local search algorithm based on heuristic rules. Tangpattanakul et al. [26] designed a heuristic algorithm to solve the multi-satellite multi-orbit task planning problem, with the goal of satisfying user requests and various observation constraints. Chen et al. [27] designed a heuristic neighborhood search algorithm to solve the scheduling problem of satellites and ground receiving stations. In a study of task planning for Earth observation satellites, Xue [28] used a two-stage planning algorithm based on relaxation map planning, which combines a heuristic search with plan evaluation technology, and obtained a time-stamped sequential planning solution. Si [29] designed an improved heuristic search algorithm that combines the conflict-oriented algorithm and the arc-compatible algorithm to improve the efficiency of the solution. Liu et al. [30] proposed a rolling plan heuristic algorithm, which uses multiple successive local plans instead of one global plan. Wang et al. [31] designed an algorithm framework for joint task planning and data transmission scheduling in multi-satellite ground observation systems, together with a task scheduling algorithm based on task priority.
Intelligent optimization algorithms produce better solutions, but their computational efficiency is lower. If the satellite tasks change substantially and cannot be processed in time, intelligent optimization algorithms cannot meet the requirements of in-orbit dynamic tasks. Heuristic algorithms are tailored to specific problems. Although they are not guaranteed to find the best planning result, they can find a solution that meets the planning requirements in a short time. They are simple to implement, fast and efficient, and intuitive, and it is easy to take part in formulating their rules. They can therefore be used as the basis for the algorithm design of intelligent satellite task planning.

Overview of the Application of Neural Networks in Task Planning Algorithms
Existing research mainly establishes mathematical programming models and uses various algorithms to solve the satellite task planning problem directly, and it makes little use of historical task planning results. Some researchers have tried to learn relevant empirical knowledge from historical task planning data with machine learning methods in order to solve satellite task planning problems more efficiently.
For example, Wang et al. [32] mapped the satellite task planning problem onto a dynamic random knapsack problem. The asynchronous advantage actor-critic (A3C) reinforcement learning algorithm was used to determine, in real time, whether each observation task should be inserted into the observation scheme. Experiments show that this method is superior to the first-come-first-served greedy algorithm in terms of optimization. Li et al. [33] combined supervised learning with a heuristic search algorithm and used a neural network to calculate the scheduling priority of each task. The heuristic search algorithm then inserts the observation tasks into the observation scheme one by one according to their scheduling priority, which enhances the search ability of the heuristic search algorithm. Chen et al. [34] proposed an improved genetic algorithm based on memory learning, which resolves resource conflicts in the evolutionary search process by learning knowledge rules from historical planning results and effectively improves the search efficiency and stability of the algorithm. Wang et al. [35] designed a feature expression model based on the satellite observation constraint model and the objective function, which effectively accelerated convergence. Wang et al. [36] introduced reinforcement learning into the multi-satellite collaborative task assignment problem and improved the efficiency of task negotiation and distribution among multiple satellites by iteratively searching, through trial and error, for the best negotiation strategy.
It can be seen, from the above research, that the use of machine learning methods has played a positive role in improving the efficiency and optimization of algorithms. This is also the theoretical basis of this paper.

Problem Description
During the autonomous operation of intelligent satellites, observation tasks will arrive at any time, and their sources can mainly be divided into the following three types:
1. When the iSAT satellite detects natural disasters, such as floods, mudslides, forest fires or ground targets that are covered by clouds, new observation requests are autonomously generated;
2. The iSAT satellite receives cooperative observation requests, sent by other iSAT satellites, such as the joint observation of multiple types of sensors;
3. The iSAT satellite receives a user observation request, uploaded by the ground control center or the ground user terminal.
The autonomous task planning problem for satellites is essentially a problem of maximizing the satisfaction of customer observation needs.
Depending on the needs of different customers, the optimization objective can take several forms, such as maximizing the observation gain, maximizing the number of observation tasks, or maximizing resource utilization. Several optimization objectives may also be considered at the same time. In this paper, the commonly used objective of maximizing the observation gain is adopted, and the profit is maximized by reasonably selecting the observation tasks within the specified time range.
For a single iSAT satellite, we define the interval between two consecutive satellite-ground communication links as the entire task scheduling range. At the beginning of the scheduling range, the satellite already has several tasks to execute, and new tasks may join the sequence that is to be planned. This paper does not differentiate between the two groups and treats all of them as tasks to be planned. The total number of tasks in the two groups is N, and together they await task planning and scheduling. The existing tasks and the newly added tasks are shown in Figure 2.


Assumptions
Based on actual needs, this paper makes the following reasonable assumptions about the problem:
(1) The scheduling range is defined as the interval between two consecutive satellite-ground communication links. In both time and geographic extent, we only plan observation tasks that fall within this given range.
(2) Intelligent satellites operate well throughout the entire scheduling range and are not affected by space radiation effects.
(3) Intelligent satellites have certain autonomous capabilities, allowing them to analyze the collected image information. If an event of interest is detected, a new task can be generated in orbit.
(4) Intelligent satellites are assumed to charge only when idle (that is, when not performing a task) and in sunlight. An iSAT satellite can also be charged while performing a task in sunlight, but the energy obtained is much less than the energy consumed by the observation or transmission payload, so it is neglected.
(5) While an intelligent satellite is processing a task, that task cannot be replaced by another task. That is, an observation task or a transmission task cannot be interrupted or deleted once its execution has started.
(6) There is no priority constraint between tasks, but each task is constrained by its observation time window. Every task has one and only one observation time window.

Constraints

Task Switching Time Constraint
Due to the strict requirements relating to the attitude of the satellite when the observation task is executed, a certain time interval must be met between two adjacent observation tasks.
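The switching time constraint can be illustrated with a minimal Python sketch. The `ScheduledTask` type, the `violates_switching` helper, and the numeric values are hypothetical and only serve as an illustration; `switch_time[i][j]` plays the role of the required interval between task i and a following task j.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduledTask:
    task_id: int
    start: float  # observation start time, in seconds
    end: float    # observation end time, in seconds


def violates_switching(plan, switch_time):
    """Return the first adjacent pair (i, j) whose idle gap is shorter than
    switch_time[i][j], or None if every transition is feasible."""
    ordered = sorted(plan, key=lambda t: t.start)
    for prev, nxt in zip(ordered, ordered[1:]):
        gap = nxt.start - prev.end
        if gap < switch_time[prev.task_id][nxt.task_id]:
            return (prev.task_id, nxt.task_id)
    return None


# Illustrative values only (assumed, not from the paper).
switch_time = [[0, 20, 35],
               [20, 0, 15],
               [35, 15, 0]]
plan = [ScheduledTask(0, 0, 30), ScheduledTask(1, 55, 80), ScheduledTask(2, 90, 120)]
print(violates_switching(plan, switch_time))  # gap of 10 s between tasks 1 and 2 is too short
```

Returning the offending pair rather than a plain Boolean makes it easier for a planner to decide which task to drop or shift.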

Energy Constraint
Energy is a renewable resource, but the total battery capacity is limited, so the energy level cannot be higher than the upper limit of the capacity at any time, nor can it be lower than the lower limit of the capacity. The energy level drops when an observation task or a transmission task is performed, and the energy level rises while charging.

Data Storage Constraint
Storage constraints are similar to energy constraints. Due to storage capacity limitations, the total amount of data cannot be higher than the upper limit of the capacity at any time, nor can it be lower than the lower limit of the capacity. The difference is that the total amount of data increases when the observation task is performed, and the total amount of data decreases when the transmission task is executed.
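The energy and data storage constraints can be checked together by replaying a sequence of events, as in the following sketch. The event encoding, the function name `replay_levels`, the numeric limits, and the choice to discard surplus charge at full battery are all assumptions made for illustration.

```python
def replay_levels(events, energy, data, e_min, e_max, d_min, d_max):
    """Replay (kind, energy_amount, data_amount) events in order.

    Returns (feasible, energy, data); stops at the first event after which
    either level leaves its allowed interval.
    """
    for kind, e_amt, d_amt in events:
        if kind == "observe":      # consumes energy, produces data
            energy -= e_amt
            data += d_amt
        elif kind == "transmit":   # consumes energy, frees storage
            energy -= e_amt
            data -= d_amt
        elif kind == "charge":     # assumption: surplus charge is discarded
            energy = min(e_max, energy + e_amt)
        else:
            raise ValueError(f"unknown event kind: {kind}")
        if not (e_min <= energy <= e_max and d_min <= data <= d_max):
            return False, energy, data
    return True, energy, data


limits = dict(e_min=10, e_max=100, d_min=0, d_max=64)
too_greedy = [("observe", 30, 40), ("observe", 15, 30)]
balanced = [("observe", 30, 40), ("charge", 60, 0),
            ("transmit", 10, 40), ("observe", 15, 30)]
print(replay_levels(too_greedy, 50, 0, **limits))
print(replay_levels(balanced, 50, 0, **limits))
```

In the first sequence the second observation drives the energy below its lower limit and the data above the storage capacity, whereas inserting a charging period and a transmission keeps both levels within bounds.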

Mixed Integer Programming Model
The essence of the satellite task planning problem is to select appropriate observation tasks from the candidate observation tasks and to determine their execution order, which makes it a typical combinatorial optimization problem. We therefore first express this problem as a mixed integer linear programming model.
The notation used in this article is as follows. The decision variables are:

$$I_i = \begin{cases} 1, & \text{if task } i \text{ is selected}, \\ 0, & \text{otherwise}, \end{cases} \qquad i = 0, 1, \ldots, N-1 \tag{1}$$

$$y_{ij} = \begin{cases} 1, & \text{if task } i \text{ precedes task } j,\ i \neq j, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$

$$(Ts_j + i(j))\, I_j \le Te_j, \quad \forall j = 0, 1, \ldots, N-1 \tag{5}$$

$$Te_i \le lf_i\, I_i, \quad \forall i = 0, 1, \ldots, N-1 \tag{7}$$

Equation (3) defines the rule that, if a task command arrives at the satellite and is accepted, the task is immediately followed by exactly one task and immediately preceded by exactly one task in the sequence. If a task is not accepted, it does not enter the task planning sequence. Accepting task j means that task j becomes a task before task i.
Equations (4) and (5) show that, if task i is already in the planned sequence and task j follows task i, the completion time of task j must account for the sequence-dependent setup time between task i and task j and for the imaging time of task j.
Equation (6) shows that, if observation task j is executed after observation task i, there is a switching time $s_{ij}$ between the two tasks. Equation (7) shows that every task in the sequence is completed no later than its latest completion time. Equation (8) shows that the completion time of a task equals its start time plus its imaging time.
Equations (9)-(11) indicate that the total memory, total energy, and total working time of all tasks in the planning sequence do not exceed the available memory space, the total energy cap, and the available working time.
Equation (12) presents the total profit of the satellite during the planning process. The core purpose of task planning is to maximize P.
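To make the objective concrete, the sketch below finds the maximum-profit plan of a very small instance by exhaustive enumeration. It is a brute-force stand-in, not the mixed integer programming model itself and not the solver used in this paper. The task fields (`release`, `deadline`, `duration`, `energy`), the earliest-start scheduling rule, and all numbers are illustrative assumptions.

```python
from itertools import permutations


def best_plan(tasks, switch, energy_cap):
    """Enumerate every ordered subset of tasks and return (profit, order)
    for the most profitable feasible sequence. Only usable for tiny N."""
    best_profit, best_order = 0, ()
    n = len(tasks)
    for r in range(1, n + 1):
        for order in permutations(range(n), r):
            clock, energy, prev, feasible = 0.0, 0.0, None, True
            for j in order:
                task = tasks[j]
                setup = switch[prev][j] if prev is not None else 0.0
                start = max(task["release"], clock + setup)  # start as early as possible
                end = start + task["duration"]
                energy += task["energy"]
                if end > task["deadline"] or energy > energy_cap:
                    feasible = False
                    break
                clock, prev = end, j
            if feasible:
                profit = sum(tasks[j]["profit"] for j in order)
                if profit > best_profit:
                    best_profit, best_order = profit, order
    return best_profit, best_order


tasks = [
    {"profit": 5, "release": 0, "deadline": 20, "duration": 10, "energy": 4},
    {"profit": 8, "release": 5, "deadline": 25, "duration": 10, "energy": 5},
    {"profit": 4, "release": 30, "deadline": 50, "duration": 10, "energy": 3},
]
switch = [[0, 5, 5], [5, 0, 5], [5, 5, 0]]
print(best_plan(tasks, switch, energy_cap=10))
```

The running time grows factorially with the number of tasks, which illustrates why exact search is impractical on board and why fast heuristics are of interest.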

Algorithm Design
Due to the limitations of on-board computing resources, researchers mainly use heuristic search algorithms, which compute quickly, to solve the planning problem. The main idea is to sort the observation tasks according to predefined rules and then determine, one by one, whether each task can be inserted into the observation plan. The optimization performance of these heuristic search algorithms therefore depends largely on the ordering rules designed by the researchers. Because empirical knowledge is difficult to express, or the available information is not comprehensive enough, rule-based heuristic search usually yields worse solutions than exact search algorithms and intelligent optimization algorithms.
In order to improve the optimization performance of the heuristic search algorithm, the paper proposed a heuristic search algorithm. The main difference between the algorithm and the existing heuristic algorithm is that the ranking rules of the observation tasks are no longer designed based on the experience of researchers, but are rather learned from the planning results of the precise search algorithm using the deep neural network algorithm, so that the ranking rules can be expressed more accurately. Specifically, the workflow of the heuristic search algorithm is shown in Figure 3: In the above process, the schedulable probability of the observation task determines the ordering of the task, which is the key to affecting the performance of the heuristic algorithm. However, the historical planning result data only tell us which tasks were executed or not and does not directly give the schedulable probability of the observed task. It is not difficult to understand that the higher the schedulable probability value of the observation task, the more likely it is that the task will be executed. Then, we can use the historical planning data to train a task classification model, whose output contains two states, execution and non-execution. We can use the probability that the task will be executed as the schedulable probability of the observed task. Next, we can elaborate on the calculation method of the schedulable probability.
In order to calculate the schedulable probability of the observation task, we designed a symmetric neural network model to calculate the schedulable probability of the observation task. The network structure is shown in the following figure. In the graph, the first recurrent neural network reads the task sequence on the left side of the observation task, and the second recurrent neural network reads the task sequence on the right side of the observation task. Finally, the executed probability,  Step 1: Select all the observation tasks in the planning period and record them as the set, TSet; Step 2: Align the observation tasks in the set, TSet, in chronological order and use the deep neural network to sequentially calculate the schedulable probability of each observation task in the TSet (that is, the probability that the task will be executed); Step 3: Sort the observation tasks in the TSet in descending order of schedulable probability values; Step 4: Select the first observation task of the TSet (denoted as FTask), insert it into the observation plan, and delete the FTask from the TSet; check whether the observation plan violates resource constraints, such as energy and storage, and task switching constraints. If a constraint is violated, the task FTask is removed from the scenario.
Step 5: Repeat step 4 until the TSet is empty; then, a complete observation plan is obtained.
The Algorithm 1 shown as follow: In the above process, the schedulable probability of the observation task determines the ordering of the task, which is the key to affecting the performance of the heuristic algorithm. However, the historical planning result data only tell us which tasks were executed or not and does not directly give the schedulable probability of the observed task. It is not difficult to understand that the higher the schedulable probability value of the observation task, the more likely it is that the task will be executed. Then, we can use the historical planning data to train a task classification model, whose output contains two states, execution and non-execution. We can use the probability that the task will be executed as the schedulable probability of the observed task. Next, we can elaborate on the calculation method of the schedulable probability.
To calculate the schedulable probability of the observation task, we designed a symmetric recurrent neural network model, whose structure is shown in Figure 4. The first recurrent neural network reads the task sequence on the left side of the observation task, and the second recurrent neural network reads the task sequence on the right side of the observation task. Finally, a fully connected neural network gives the execution probability, $P_s$, and the non-execution probability, $P_n$, of the observation task, with $P_s + P_n = 1$; $P_s$ is then taken as the schedulable probability of the observation task.
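The forward pass of such a two-branch structure can be sketched with NumPy as follows. This is a minimal sketch under assumptions the text does not confirm: it uses plain tanh recurrent cells, takes the final hidden state of each branch, gives the two branches separate weights of identical shape, and uses randomly initialized parameters. The function and parameter names are hypothetical.

```python
import numpy as np


def rnn_last_state(x_seq, W_x, W_h, b):
    """Run a plain tanh RNN over x_seq of shape (T, n_features)
    and return the final hidden state."""
    h = np.zeros(W_h.shape[0])
    for x in x_seq:
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def init_params(n_features, n_hidden, seed=0):
    """Random, untrained parameters (assumption: both branches have
    the same shape but separate weights)."""
    rng = np.random.default_rng(seed)

    def rnn():
        return (rng.normal(0.0, 0.1, (n_hidden, n_features)),
                rng.normal(0.0, 0.1, (n_hidden, n_hidden)),
                np.zeros(n_hidden))

    return {"left": rnn(), "right": rnn(),
            "W_out": rng.normal(0.0, 0.1, (2, 2 * n_hidden)),
            "b_out": np.zeros(2)}


def schedulable_probability(left_seq, right_seq, params):
    """Return (P_s, P_n) for one observation task."""
    h_left = rnn_last_state(left_seq, *params["left"])
    h_right = rnn_last_state(right_seq, *params["right"])
    h = np.concatenate([h_left, h_right])
    p_s, p_n = softmax(params["W_out"] @ h + params["b_out"])
    return float(p_s), float(p_n)
```

The softmax output layer guarantees $P_s + P_n = 1$ by construction, which matches the constraint stated above.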

Model Input
To calculate the schedulable probability of an observation task with a recurrent neural network, the first question is how to convert the original observation task data into input that the network can process, that is, how to represent an observation task so that the representation captures both the attributes of the task and the distribution relationship between tasks. For this purpose, we define a feature vector (eigenvector) for each observation task. We take the input of the first recurrent neural network as an example. Let $M_{input} = \{m_0, m_1, \ldots, m_{N-1}\}$ be an observation task sequence arranged in ascending chronological order, where $m_0$ is the observation task to be determined. For every $m_i \in M_{input}$, its eigenvector is denoted $x_i$, where $p_i$ is the benefit value of the observation task $m_i$, $e_i$ is the energy resource that $m_i$ needs to consume, and $d_i$ is the storage resource that $m_i$ needs to consume. Based on this eigenvector representation, we construct the original input data of the recurrent neural network as a matrix in which each column corresponds to an observation task and each row corresponds to one dimension of the eigenvector.
The observation task sequence Λ can be represented by eigenvector x.
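The construction of the input matrix can be illustrated with a short NumPy sketch. It uses only the three components named in the text (benefit, energy, storage); the full eigenvector may contain further components that are not recoverable here. The dictionary keys and the function name are hypothetical.

```python
import numpy as np


def feature_matrix(tasks):
    """Build the input matrix for one task sequence.

    tasks: list of dicts in chronological order, tasks[0] being m_0.
    Returns an array of shape (n_features, N): one column per task,
    one row per eigenvector dimension.
    """
    rows = ("profit", "energy", "storage")  # assumed subset of the features
    return np.array([[t[key] for t in tasks] for key in rows], dtype=float)


sequence = [
    {"profit": 8.0, "energy": 2.0, "storage": 1.5},  # m_0
    {"profit": 5.0, "energy": 1.0, "storage": 0.5},  # m_1
]
X = feature_matrix(sequence)  # X[:, i] is the eigenvector x_i
```

A recurrent network that consumes one task per time step would read this matrix column by column, so `X.T` gives the usual `(time, features)` layout.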

Optimization Algorithm and Loss Function
We use the cross-entropy cost function (see Formula (14)) and the Adam optimization algorithm [37] to train the schedulable probability calculation model based on the recurrent neural network.
where $p(y, X)$ is the gold one-hot distribution of the training sample $X$, and $P(y \mid X)$ is the probability distribution over the classes predicted by the model.
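Formula (14) itself is not reproduced in this text. Assuming the standard cross-entropy form for a two-class classifier, a loss consistent with the symbols defined above would read as follows, where the training set $D$ and the class labels $s$ (execution) and $n$ (non-execution) are notation introduced here for illustration:

```latex
\mathcal{L} = -\sum_{X \in D} \; \sum_{y \in \{s,\, n\}} p(y, X) \, \log P(y \mid X)
```

Because $p(y, X)$ is one-hot, only the log-probability of the true class contributes for each sample, so the loss reduces to the negative log-likelihood of the recorded execution outcomes.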

Orbit Parameter Setting
The intelligent satellite used in the simulation experiment is a sun-synchronous orbit satellite, and its orbital parameters are selected from the STK satellite database (see Table 1 for the specific settings). The planning period runs from 1 June 2019 00:00:00 to 2 June 2019 00:00:00, a total of 24 h. The observation time windows of the satellite over the ground targets are calculated by STK. Because of the orbital height of the satellite and the field of view of the observation payload, a ground point target usually has only 1 or 2 observation time windows within 24 h. In this paper, we assume that each observation target has only one valid observation time window, that is, each observation target corresponds to one observation task. To evaluate the performance of the proposed heuristic algorithm based on the symmetric recurrent neural network, we set up 5 planning scenarios. The number of observation tasks per planning scenario ranges from 100 to 500 in steps of 100. The targets are randomly selected from the urban dataset of STK. There are 10 test cases for each planning scenario.

Indicator Setting
To evaluate the performance of an algorithm, we use three evaluation indicators: planning profit, profit gap, and calculation time.
The planning profit is the average cumulative observation profit that a task planning algorithm achieves over all test cases in a single planning scenario. It is computed from $p_k(A)$, the cumulative observation profit of algorithm $A$ in the $k$-th test case.
The profit gap is the average, over all test cases, of the ratio of the difference in cumulative observation profit between the task planning algorithm and the benchmark reference algorithm to the cumulative observation profit of the benchmark reference algorithm. It is likewise computed from $p_k(A)$, the cumulative observation profit of algorithm $A$ in the $k$-th test case.
The calculation time is the average time that the task planning algorithm takes over all test cases in a single planning scenario, measured from the start of planning to the generation of the observation plan. It is computed from the calculation time of algorithm $A$ in the $k$-th test case.
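The three indicators can be sketched in Python as plain averages over the test cases of one scenario. The original formulas are not reproduced in this text, so the exact form is an assumption; in particular, the sign convention of the profit gap (benchmark minus algorithm, so that smaller is better) is inferred from the description, and the function names are hypothetical.

```python
from statistics import mean


def planning_profit(profits):
    """Average cumulative observation profit p_k(A) over the test cases."""
    return mean(profits)


def profit_gap(profits, benchmark_profits):
    """Average relative shortfall against the benchmark algorithm.

    Assumed convention: (benchmark - algorithm) / benchmark per test case.
    """
    return mean((b - p) / b for p, b in zip(profits, benchmark_profits))


def calculation_time(times):
    """Average calculation time over the test cases."""
    return mean(times)
```

For example, an algorithm that reaches profits of 90 and 80 on two test cases where the benchmark reaches 100 on both has a planning profit of 85 and a profit gap of about 0.15.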

Analysis of Results
The experiment compares the planning profit and planning time of five algorithms on the global urban datasets [38–40]: the CPLEX benchmark algorithm, PF, d-PSB, GBDT, and SRNN-HS (Symmetric Recurrent Neural Network Heuristic Search algorithm). Table 2 shows the planning profit and profit gap for each algorithm, Figure 5 shows the revenue error for each algorithm, and Table 3 and Figure 6 show the calculation time for each algorithm.
The result calculated by the CPLEX benchmark algorithm is the optimal scheme for satellite task planning, but its calculation time is unacceptable for the on-board computer, so it can only be used as a benchmark for evaluating the other algorithms. The profit gap measures the gap to CPLEX: the smaller the value, the better the planning result.
As can be seen from Table 2, the SRNN-HS proposed in this paper achieves the highest planning profit among the heuristic algorithms in most scenarios, indicating the effectiveness of SRNN-HS.
As can be seen from Table 3 and Figure 6, SRNN-HS requires more calculation time than the other heuristic algorithms, because the neural network is required to calculate the schedulable probability of each task. However, the increase in calculation time is not very large. Even if the performance of the on-board computer is only 1/100 of that of an office computer, our algorithm can still give the planning result in a short time.
From Figure 5, it can be seen that in the planning scenario s100, PF has a smaller profit gap than the other algorithms. This is because the number of observation tasks involved in planning is lower, the degree of conflict between tasks is lower, and simply selecting the observation tasks with a high profit value achieves good results. Except for the planning scenario s100, the SRNN-HS algorithm has a smaller revenue error than the three heuristic algorithms PF, d-PSB, and GBDT. This is because the scheduling rules of the rule-based heuristic algorithms are derived by researchers from experience. Owing to blind spots in knowledge or difficulties in knowledge representation, there is still room for improvement in rule-based heuristic search algorithms. SRNN-HS also outperforms the GBDT algorithm. Because GBDT is a machine learning algorithm based on feature engineering, its expression of features is not comprehensive enough. The deep learning algorithm that we use extracts features automatically, without much intervention from researchers, and captures more effective information. Its calculation cost is acceptable.
Through the above experimental analysis, the feasibility and effectiveness of the proposed algorithm are verified.

Conclusions
Earth observation is one of the important applications of intelligent satellites, and satellite on-orbit task planning is an important part of the automatic operation of an intelligent satellite. This paper describes the problem of autonomous task planning for earth observation by an intelligent satellite and establishes a mixed integer programming model for satellite mission planning. Based on the characteristics of the model, we designed a heuristic search algorithm based on a symmetric recurrent neural network. The schedulable probability of each task is obtained by constructing a symmetrically structured recurrent neural network model. The task with the highest schedulable probability is selected and inserted into the observation scheme, and this step is repeated to obtain the final observation scheme. Because there is no specific prior research on intelligent satellite task planning, this paper compares the proposed algorithm with several traditional observation satellite task planning algorithms. The experimental results show that the profit of the SRNN-HS algorithm is higher than that of the other heuristic algorithms in every scenario except s100, and that its profit gap is significantly lower. They also show that SRNN-HS performs better than both the rule-based heuristic algorithms and the feature-engineering-based machine learning heuristic algorithm. However, the algorithm increases time consumption, which is its main limitation. This paper presents a task planning algorithm that can be used in the context of the intelligent satellite iSAT. In the future, we will consider applying the algorithm to multi-satellite cooperative autonomous task planning, so as to improve the observation range and the efficiency of earth observation.