1. Introduction
Agile optical satellites (AOSs), a new generation of Earth-observation platforms, combine flexible attitude maneuverability with advanced imaging capabilities and can adjust their attitudes by rolling, pitching, and yawing. In recent years, AOSs have played an increasingly significant role in agricultural production [1], resource exploration [2], military reconnaissance [3], and other fields. In the current Earth-observation satellite-management system, the ground operation center collects observation requests from users daily and then generates task schedules in a centralized manner [4]. A task schedule consists of a task allocation plan and observation instructions for the satellites; the observation instructions are uploaded to the corresponding satellites through ground stations. In the operation center, an effective and efficient scheduling algorithm is crucial for improving the scheduling performance of the whole system.
The multiple agile optical satellite scheduling problem (MAOSSP) aims to allocate observation tasks to AOSs and arrange their observation actions while satisfying complex constraints. As illustrated in Figure 1, an AOS executes observation tasks in the scheduled order within the arranged time windows when flying over ground targets. The time window during which a target is visible to the AOS is the visible time window (VTW), and the duration of the actual observation action is the observation time window (OTW). The agile attitude maneuverability of AOSs expands the range of their observable areas and extends the length of the VTWs. However, the MAOSSP is a complex scheduling problem whose complexity stems from three aspects. First, apart from conventional constraints such as memory and energy limits, more complicated attitude maneuver constraints must be considered due to the agility of AOSs. For an AOS, the observation start time depends on its observation attitude, and the time interval between two adjacent observation actions must be sufficient for attitude adjustment, so the current observation action influences the subsequent one [5]. Second, this problem has been proven to be an NP-hard (non-deterministic polynomial-time hard) combinatorial optimization problem [6], whose solution space grows dramatically as the problem scale increases. On the one hand, increasing the number of satellites provides more observation opportunities for each task but leads to a huge solution space with higher complexity. On the other hand, increasing the number of tasks turns this problem into an oversubscribed one [7,8], in which massive numbers of tasks are allocated to limited satellite resources and can only be partially completed. Finally, the task allocation needs to consider not only the maximization of the overall observation profit but also the load balance, which helps rationalize the daily management of satellites and prolong their lifetime [9].
Existing methods for satellite scheduling can be generally classified into three categories: exact algorithms, heuristic algorithms, and deep reinforcement learning (DRL) algorithms. Exact algorithms can obtain the optimal solution by searching the entire solution space but require a significant amount of computation time [10]; thus, they are only applicable to single-orbit or single-satellite scheduling scenarios. Heuristic algorithms can search for approximate solutions through population iteration. However, they can hardly tackle large-scale problems because the population iteration entails extensive constraint-checking steps and an enormous computational burden. In addition, premature convergence and entrapment in local optima are common problems for heuristic algorithms [11]. Differing from heuristic algorithms, DRL algorithms provide a non-iterative way to solve this problem by constructing policy networks that generate solutions directly. DRL-based satellite scheduling methods can be divided into reinforcement learning (RL) approaches [12,13,14] and sequence-to-sequence (seq2seq) models [15,16]. The RL methods formulate the satellite scheduling process as a Markov decision process (MDP), an interaction process between agents and the environment in which a reward is given every time an agent chooses an action. In this process, the policy of the agent is continuously optimized through trial and error. However, the actions of the agents depend only on the current environment state, making it hard to ensure that the final action sequence yields a high-quality solution. Furthermore, the expansion of the problem scale increases the decision dimensions, and it is challenging for MDP models to address such high-dimensional problems. The seq2seq models use deep neural networks (DNNs) based on the encoder-decoder structure [17] to construct a sequence in an auto-regressive way, which gives them superior abilities to handle long task sequences with high-dimensional features. However, current research only achieves "single-sequence-to-single-sequence" applications and applies seq2seq models to single-satellite scheduling problems. Therefore, seq2seq models need to be further explored before they can be applied to multi-satellite scheduling instances.
In this paper, we expand the current seq2seq models and propose a "single-sequence-to-multiple-sequence" model to solve the MAOSSP. The proposed model takes an unordered task sequence as input and outputs multiple observation action sequences, each belonging to one satellite. The major contributions are summarized as follows.
A mathematical model of the MAOSSP is established, whose optimization objective is to maximize the observation profit rate and achieve load balance. Meanwhile, a series of complex constraints are considered, such as the lighting condition, the flexible attitude maneuverability, and the limitations of memory and energy.
A multi-pointer network is designed to construct high-quality scheduling solutions in an auto-regressive manner, providing a competitive DRL-based algorithm for this practical problem. Specifically, the network adopts multiple attention layers as pointers to construct observation action sequences for multiple satellites. Furthermore, a local feature-enhancement strategy, a remaining time-based decoding sorting strategy, and a feasibility-based task selection strategy are proposed to improve the solving ability of the proposed network.
Extensive experiments are conducted to verify the effectiveness of the proposed method. The experimental results validate that the proposed method has superior performance in solution quality, computational efficiency, and generalization ability in comparison with the state-of-the-art algorithms.
The remainder of this paper is organized as follows. Section 2 presents the recent research on the satellite scheduling problem. Section 3 gives a detailed description of the MAOSSP and builds the corresponding mathematical model. In Section 4, a multi-pointer network is proposed, and its architecture, components, and training algorithm are elaborated. Section 5 presents the experimental results, and Section 6 presents the conclusions of the study with a summary and directions for future work.
2. Related Work
In recent years, the Earth-observation satellite scheduling problem has received wide attention in the research literature. Existing methods can be roughly divided into exact methods, heuristic methods, and DRL methods.
When the problem scale is not too large, such as in single-orbit or single-satellite scheduling scenarios, exact algorithms can obtain the optimal solutions. Lemaître et al. [6] gave the first general description of the agile satellite scheduling problem and proposed a dynamic programming algorithm. Chu et al. [18] presented a branch-and-bound algorithm with a look-ahead method and three pruning strategies to tackle a simplified agile satellite scheduling problem on small-size instances. Peng et al. [5] considered time-dependent profits and presented an adaptive-directional dynamic programming algorithm with decremental state space relaxation for single-orbit scheduling of a single agile satellite. Qu et al. [19] developed a mixed integer linear programming model of the satellite scheduling problem and proposed an imitation learning framework for branch and bound to solve this model. These studies show that exact algorithms can explore the whole search space and obtain the optimal solution, and that they are applicable to single-orbit or small-scale satellite scheduling. However, as the problem scale expands, the computational cost of exact algorithms becomes unacceptable because of the NP-hard nature of the problem and its complex constraints.
Heuristic algorithms have been widely used to address the agile satellite scheduling problem owing to their outstanding search ability, including the genetic algorithm (GA) [20,21], the particle swarm optimization (PSO) algorithm [22,23], the ant colony optimization (ACO) algorithm [24,25], and the artificial bee colony (ABC) algorithm [26,27]. Chatterjee and Chatterjee [28] established a mixed integer non-linear programming model to formulate the agile satellite scheduling problem with energy and memory constraints and proposed an elitist mixed coded genetic algorithm with a hill-climbing mechanism to solve it. Zhou et al. [29] investigated the agile satellite scheduling problem with multiple observations of various durations and proposed an improved adaptive ant colony optimization (IAACO) algorithm, which consists of an adaptive task assignment layer with four operators and an observation time determination algorithm based on an improved ACO algorithm. Yang et al. [27] developed a hybrid discrete artificial bee colony (HDABC) algorithm for the imaging satellite mission planning problem, which improves the three search phases of the basic ABC algorithm to balance its exploitation and exploration abilities and achieve population co-evolution. These heuristic algorithms exhibit satisfactory solution quality through the iterative search of the population. However, they suffer from long computation times and premature convergence. For one thing, each individual in the population requires complex constraint-checking steps, and the population iterations significantly increase the computational burden. For another, heuristic algorithms are prone to converging to local optima because of the limitations of their search mechanisms.
In addition to these improvements to heuristic algorithms, some scholars have tried to utilize DRL approaches to improve population initialization and search operator selection. Wu et al. [30] presented a data-driven improved genetic algorithm to address the agile satellite scheduling problem, which adopted an artificial neural network to build the initial population, a frequent pattern mining approach to discover specific patterns from elite solutions, and three competition-based adaptive local adjustment strategies to maintain the diversity of the population. Song et al. [31] studied the electromagnetic detection satellite scheduling problem and proposed a genetic algorithm based on reinforcement learning (RLGA), which uses Q-learning to guide the GA in choosing appropriate evolution operators. Based on this work, Song et al. [32] further investigated the multi-type satellite observation scheduling problem and developed a deep reinforcement learning-based genetic algorithm (DRL-GA), which uses a dueling deep Q-network to initialize the population, an individual update mechanism with an elite retention strategy to update the population, and a fast local search method with a DRL-assisted approach to construct neighborhood solutions. The applications of DRL have improved the exploration ability and computational efficiency of heuristic algorithms, but the inherent issues of massive time consumption and premature convergence have not been thoroughly resolved.
Apart from combining DRL methods with heuristic algorithms, some attempts have been made to apply DRL algorithms to the agile satellite scheduling problem directly. Some research has formulated the agile satellite scheduling process as a Markov decision process and adopted RL algorithms to solve it. Specifically, Herrmann and Schaub [13] established an MDP model of the single agile satellite scheduling problem and adopted Monte Carlo tree search (MCTS) and supervised learning to train the policy networks of the agents. The testing results showed that the performance of the trained policy networks approximated that of the MCTS policy while requiring significantly less computation time. Zhang et al. [33] presented a multi-agent MDP model of the multi-satellite collaborative scheduling problem and proposed a multi-agent reinforcement learning (MARL) method with the multi-agent proximal policy optimization (MAPPO) algorithm to solve it. Notably, the existing RL algorithms for satellite scheduling only adopt multi-layer fully connected neural networks as the policy networks, which can only handle fixed-scale satellite scheduling problems. Once the number of tasks changes, the policy networks must be reconstructed and retrained, so these RL algorithms are not applicable to satellite scheduling problems with various scales. Inspired by the applications of seq2seq models in classical combinatorial optimization, some scholars have developed seq2seq models for the agile satellite scheduling problem. Zhao et al. [34] adopted the pointer network [35] to generate a permutation of executable tasks for the single-satellite scheduling problem. Wei et al. [17] studied the multi-objective agile satellite scheduling problem and constructed an encoder-decoder neural network to generate high-quality schedules, consisting of a task encoder with a gated recurrent unit (GRU), a feature encoder with a convolutional neural network (CNN), and a decoder with two attention layers. Long et al. [16] proposed a Transformer-based encoder-decoder network with temporal encoding for the Earth-observation scheduling problem with various scales. Owing to the use of more advanced neural networks as policy networks, these seq2seq models can handle satellite scheduling instances with different task scales. However, they can only generate one sequence for one satellite owing to the limitation of their network structures, so they cannot address multi-satellite scheduling problems.
4. Method
In this section, we propose a multi-pointer network for the MAOSSP. The architecture of the multi-pointer network, its components, and its training algorithm are elaborated in turn.
4.1. Architecture of the Multi-Pointer Network
The existing seq2seq models [16,17,34] for satellite scheduling can only generate one observation action sequence for a single satellite. The core of these models is applying an attention model as a pointer to select an appropriate task for the satellite at every decoding step. To construct multiple observation action sequences for multi-satellite scheduling, we develop a multi-pointer network (MPN), as shown in Figure 4.
The most critical feature of the MPN is that it adopts multiple attention layers as multiple pointers, each of which generates an observation action sequence for a different satellite. These attention layers belong to different decoding layers, and the number of decoding layers equals the number of satellites. In addition to the decoding layers, the important components of the MPN include a static embedding layer, multiple encoding layers, and multiple dynamic embedding layers. As shown in Figure 5, the workflow of this network model can be divided into two stages, a static feature extraction stage and a dynamic decoding stage, which are elaborated as follows:
Static feature extraction stage. In this stage, the encoding layers, together with a static embedding layer, extract distinctive features from the input task information and provide them to the corresponding decoding layers. The input task information contains the requirements and VTW sets of the tasks. Considering that the VTWs of a task originate from different satellites, we classify the task information into global information and local information. For task i, the global information contains its requirement information and the overall set of all its VTWs, whereas the VTW set originating from satellite j constitutes the local information. A local feature-enhancement strategy is proposed to handle the input task information, which is described as follows:
Local feature-enhancement strategy—First, the static embedding layer fuses the global information with the local information from the different satellites and embeds the fusion results into a high-dimensional vector space. Then, the encoding layers further extract distinctive features from the embedding results with different local information. The local information embedding enhances the local features and provides richer task features for decision-making.
Dynamic decoding stage. In this stage, the decoding layers, each equipped with a dynamic embedding layer, construct the observation action sequences for the satellites. At every decoding step, the decoding process of these decoding layers is the same, and each decoding layer selects a task in turn. At decoding step l, dynamic embedding layer j embeds the features of the previously selected task and the current state of satellite j into a high-dimensional query vector. Next, decoding layer j generates a set of candidate tasks and calculates their probabilities by associating the query vector with the extracted task features provided to this decoding layer. The generation rule of the candidate task set is that an unscheduled task is put into the candidate task set if it has at least one VTW in the remaining scheduling period of this satellite. Then, a candidate task is selected based on the probability distribution, and its observation action is determined through the earliest start time (EST) strategy, which sets the earliest feasible time as its observation start time. Finally, this observation action is appended to the observation action sequence of satellite j, and the state of this satellite is updated. Based on the above decoding process, a remaining time-based decoding sorting strategy and a feasibility-based task selection strategy are proposed to optimize the solution construction process, which are described as follows:
Remaining time-based decoding sorting strategy—used to determine the decoding order of the decoding layers based on the remaining time of the corresponding satellites at the beginning of every decoding step. The remaining time of a satellite is the difference between the total scheduling time and its current idle start time. A shorter remaining time means that fewer feasible tasks can be allocated to this satellite, so satellites with shorter remaining time should be ranked at the front to favor the selection of more appropriate tasks. For this reason, this strategy sorts the satellites in ascending order of their remaining time, and the resulting order is the decoding order of the corresponding decoding layers. If none of the remaining unscheduled tasks has a VTW in the remaining period of a satellite, this satellite is removed from the satellite set.
Feasibility-based task selection strategy—used to select an appropriate task from the set of candidate tasks according to the probability distribution in every decoding layer. The proposed task selection strategy incorporates a feasibility-checking step into the conventional greedy strategy. The task selection process with the proposed strategy is as follows. First, the candidate tasks are sorted in descending order of their probabilities. Second, these tasks are checked one by one to determine whether their observation actions can satisfy the constraints in Section 3.3. Once an observation action satisfies these constraints, the corresponding task is selected, and its observation action is added to the observation action sequence. If no candidate observation action meets these constraints, the candidate task set is updated. This procedure ensures that each selected task is feasible.
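To make the decoding procedure concrete, the following Python sketch outlines one decoding step combining the candidate-set rule, the remaining time-based decoding sorting strategy, and the feasibility-based task selection strategy. All identifiers (e.g., decoder_probs, is_feasible, earliest_start_time) are illustrative placeholders rather than the authors' implementation, and the satellite state update is deliberately simplified.

```python
# Illustrative sketch of one decoding step of the multi-pointer network
# (hypothetical helper names; not the authors' implementation).

def decoding_step(satellites, unscheduled, horizon_end,
                  decoder_probs, is_feasible, earliest_start_time):
    """Run one decoding step over all satellites.

    satellites        : list of dicts with 'id', 'idle_start', 'sequence'
    unscheduled       : set of task ids not yet scheduled
    decoder_probs(j, candidates) -> {task_id: prob}  (pointer attention output)
    is_feasible(j, task, start)  -> bool              (constraint check, Section 3.3)
    earliest_start_time(j, task) -> float or None     (EST strategy)
    """
    # Remaining time-based decoding sorting strategy:
    # decode satellites in ascending order of remaining scheduling time.
    satellites.sort(key=lambda s: horizon_end - s['idle_start'])

    for sat in satellites:
        # Candidate set: unscheduled tasks with at least one VTW left for this satellite.
        candidates = [t for t in unscheduled
                      if earliest_start_time(sat['id'], t) is not None]
        if not candidates:
            continue  # no feasible task for this satellite at this step

        probs = decoder_probs(sat['id'], candidates)

        # Feasibility-based task selection strategy:
        # check candidates in descending order of probability, keep the first feasible one.
        for task in sorted(candidates, key=lambda t: probs[t], reverse=True):
            start = earliest_start_time(sat['id'], task)
            if is_feasible(sat['id'], task, start):
                sat['sequence'].append((task, start))
                sat['idle_start'] = start  # simplified; a real update adds observation and maneuver time
                unscheduled.discard(task)
                break
    return satellites, unscheduled
```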
4.2. Specific Structures of the Network Components
The multi-pointer network is composed of a static embedding layer, multiple encoding layers, multiple dynamic embedding layers, and multiple decoding layers, as shown in Figure 4. The number of these layers is related to the number of satellites. The specific structures of these components are elaborated as follows:
Static embedding layer—used for the fusion and dimension conversion of the global and local task information. As shown in Figure 6, the static embedding layer consists of a one-dimensional convolutional network (Conv1D) and multiple LSTM networks. The Conv1D network processes the task requirements, one LSTM processes the overall set of all VTWs of each task, and the remaining LSTMs process the VTW sets originating from the different satellites, respectively. For each task, the VTWs in its overall set are sorted in chronological order before being embedded. The final embedding results form a set of high-dimensional embedding matrices, one for each encoding layer. The matrix provided to encoding layer j contains one embedding vector per task, and the embedding vector of task i for encoding layer j is obtained by concatenating the Conv1D embedding of its requirements, the LSTM embedding of its overall VTW set, and the LSTM embedding of its VTW set from satellite j.
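As a concrete illustration, the following PyTorch sketch shows one plausible realization of the static embedding layer under the description above; the module name StaticEmbedding, the dimension arguments, and the exact concatenation of the three embeddings are assumptions for illustration, not the authors' design.

```python
import torch
import torch.nn as nn

class StaticEmbedding(nn.Module):
    """Sketch of the static embedding layer (hypothetical dimensions and shapes)."""
    def __init__(self, req_dim, vtw_dim, emb_dim, num_sats):
        super().__init__()
        self.req_conv = nn.Conv1d(req_dim, emb_dim, kernel_size=1)          # task requirements
        self.global_lstm = nn.LSTM(vtw_dim, emb_dim, batch_first=True)      # overall VTW set
        self.local_lstms = nn.ModuleList(                                   # per-satellite VTW sets
            [nn.LSTM(vtw_dim, emb_dim, batch_first=True) for _ in range(num_sats)])

    def forward(self, req, global_vtw, local_vtws):
        # req:        (batch, num_tasks, req_dim)
        # global_vtw: (batch * num_tasks, max_vtw, vtw_dim), VTWs sorted chronologically
        # local_vtws: list of num_sats tensors shaped like global_vtw
        b, t = req.size(0), req.size(1)
        g_req = self.req_conv(req.transpose(1, 2)).transpose(1, 2)          # (B, T, emb)
        _, (h_g, _) = self.global_lstm(global_vtw)
        g_vtw = h_g[-1].view(b, t, -1)                                      # (B, T, emb)
        outputs = []
        for j, lstm in enumerate(self.local_lstms):
            _, (h_l, _) = lstm(local_vtws[j])
            l_vtw = h_l[-1].view(b, t, -1)
            # Local feature enhancement: fuse global and local information.
            outputs.append(torch.cat([g_req, g_vtw, l_vtw], dim=-1))        # (B, T, 3*emb)
        return outputs  # one embedding matrix per encoding layer / satellite
```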
Encoding layers—extracting distinctive features from the embedding results. These encoding layers have the same structure, each consisting of a multi-head self-attention (MHA) layer, a feed-forward (FF) layer, and two residual connections; each residual connection is composed of a skip connection and a layer-normalization operator. The encoding results form a set of feature matrices, one for each decoding layer. The feature matrix provided to decoding layer j contains one feature vector per task, where the feature vector of task i is obtained by passing the corresponding embedding vector through the MHA layer and the FF layer, each followed by its residual connection and layer normalization.
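The encoding layer described above matches a standard Transformer encoder block; a minimal PyTorch sketch is given below. The head count and feed-forward width are illustrative assumptions, and nn.MultiheadAttention with batch_first requires PyTorch 1.9 or later.

```python
import torch.nn as nn

class EncodingLayer(nn.Module):
    """Sketch of one encoding layer: MHA + feed-forward, each with a
    residual connection and layer normalization (hypothetical sizes)."""
    def __init__(self, dim, heads=8, ff_dim=512):
        super().__init__()
        self.mha = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, num_tasks, dim)
        h, _ = self.mha(x, x, x)               # multi-head self-attention
        x = self.norm1(x + h)                  # residual connection + LayerNorm
        x = self.norm2(x + self.ff(x))         # feed-forward + residual + LayerNorm
        return x                               # task feature matrix for one decoding layer
```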
Dynamic embedding layers—embedding the dynamic elements into a high-dimensional vector space and providing query vectors for the corresponding decoders at every decoding step. These dynamic embedding layers have the same structure, each consisting of two linear networks. At decoding step l, dynamic embedding layer j embeds the feature vector of the previously selected task and the current state vector of satellite j into a query vector, which is passed to decoding layer j.
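A minimal sketch of one dynamic embedding layer is shown below; the two linear networks follow the description above, while the additive fusion of their outputs and the dimension names are illustrative assumptions.

```python
import torch.nn as nn

class DynamicEmbedding(nn.Module):
    """Sketch of one dynamic embedding layer (fusion by addition, for illustration)."""
    def __init__(self, feat_dim, state_dim, query_dim):
        super().__init__()
        self.task_proj = nn.Linear(feat_dim, query_dim)    # embeds the previously selected task
        self.state_proj = nn.Linear(state_dim, query_dim)  # embeds the current satellite state

    def forward(self, prev_task_feat, sat_state):
        # prev_task_feat: (batch, feat_dim); sat_state: (batch, state_dim)
        return self.task_proj(prev_task_feat) + self.state_proj(sat_state)  # query vector
```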
Decoding layers—selecting appropriate tasks for all satellites by associating the query vectors from the dynamic embedding layers with the extracted task features. These decoding layers have the same structure, and the main component of each one is an attention layer with learnable projection parameters. Decoding layer j obtains the probability distribution over the candidate tasks from the correlation between its query vector and the feature vectors of the candidate tasks, and a mask operator masks out the feature vectors of the tasks that are not in the candidate task set. The resulting probability distribution is formulated in Equation (20), and the probability of generating the final scheduling solution can be factorized over all decoding steps according to the chain rule, as formulated in Equation (21).
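The pointer behavior of a decoding layer can be sketched as a single attention head that scores each task against the query vector and masks out non-candidates before the softmax. The scaled dot-product form below is an assumption made for illustration; the exact attention formulation is the one given in Equation (20).

```python
import torch
import torch.nn as nn

class PointerDecoder(nn.Module):
    """Sketch of one decoding layer: an attention layer that turns the query
    vector and task features into a masked probability distribution."""
    def __init__(self, dim):
        super().__init__()
        self.Wq = nn.Linear(dim, dim, bias=False)   # query projection
        self.Wk = nn.Linear(dim, dim, bias=False)   # key projection

    def forward(self, query, task_features, candidate_mask):
        # query:          (batch, dim)        from the dynamic embedding layer
        # task_features:  (batch, tasks, dim) from the matching encoding layer
        # candidate_mask: (batch, tasks) bool, True for tasks in the candidate set
        q = self.Wq(query).unsqueeze(1)                               # (B, 1, dim)
        k = self.Wk(task_features)                                    # (B, T, dim)
        scores = (q * k).sum(-1) / k.size(-1) ** 0.5                  # scaled dot-product scores
        scores = scores.masked_fill(~candidate_mask, float('-inf'))   # mask non-candidate tasks
        return torch.softmax(scores, dim=-1)                          # probability distribution
```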
4.3. Training Algorithm
The parameters of the MPN must be optimized by learning from abundant training samples, and the REINFORCE algorithm with a baseline is adopted to train this network model, as presented in Algorithm 1. The details of the training process are elaborated below.
Algorithm 1: Training algorithm of the multi-pointer network
1:  Initialize the network parameters
2:  for each training epoch do
3:      for each training batch do
4:          Sample N training instances from the training dataset
5:          for each sampled instance do
6:              Obtain the scheduling solution
7:              Calculate the reward
8:          end for
9:          Calculate the policy gradient over the batch
10:         Update the network parameters with the Adam optimizer
11:     end for
12: end for
First, some variables are defined to describe a batch of training scenarios: N denotes the batch size; each of the N scenarios has a task set, an output scheduling solution, and an optimization objective value of that solution, which also serves as its reward. In addition, the number of training epochs and the parameters of the MPN are specified, and b is the baseline, whose value is a constant.
Then, after a scheduling solution is generated, its reward and its generation probability can be obtained. After the scheduling of a batch of training instances is finished, the policy gradient of the MPN is calculated from the rewards, taken relative to the baseline b, and the log-probabilities of the generated solutions, and the parameters of the MPN are updated through the Adam optimizer.
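The batch update described above corresponds to a standard REINFORCE step with a constant baseline; the following PyTorch sketch shows its typical form. The tensor names and the constant-baseline handling are illustrative assumptions, not the authors' exact code.

```python
import torch

def reinforce_step(optimizer, log_probs, rewards, baseline):
    """One REINFORCE update with a constant baseline (illustrative sketch).

    log_probs : (N,) tensor, log-probability of each generated solution
    rewards   : (N,) tensor, optimization objective value of each solution
    baseline  : scalar constant b
    """
    # Policy gradient loss: maximize the expectation of (reward - b) * log pi(solution).
    advantage = rewards - baseline
    loss = -(advantage.detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()           # Adam update of the MPN parameters
    return loss.item()
```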
5. Computational Experiments
This section presents the computational experiments conducted to demonstrate the effectiveness and superiority of the MPN. First, the experimental settings are introduced, including the design of the datasets and the evaluation metrics. Second, a comparison experiment is carried out to verify the superiority of the MPN over several state-of-the-art algorithms for the MAOSSP. Third, an ablation experiment is conducted to verify the effectiveness of the proposed strategies in the MPN. These experiments are carried out on a laptop computer with an Intel® Core™ i7-7700HQ CPU @ 2.80 GHz and 40 GB RAM. The DRL framework is implemented with PyTorch 1.5.1 in Python 3.8.
5.1. Experimental Settings
Due to the lack of a benchmark dataset, a large number of scenarios are designed based on the satellite scheduling scenario design method proposed by He et al. [4]. In these scenarios, the tasks are randomly distributed around the world. For each task, the requested imaging duration is a random integer between 5 and 10, and the priority is a random integer between 1 and 10. The orbital parameters of the satellites are available at https://celestrak.org/. The detailed scenario parameters are listed in Table 1. The scheduling time horizon is from 1 January 2023, 00:00:00, to 1 January 2023, 24:00:00.
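A simple way to generate the random task attributes described above is sketched below; the field names and the uniform latitude-longitude sampling are illustrative assumptions (the cited scenario design method [4] may distribute targets differently).

```python
import random

def generate_tasks(num_tasks, seed=None):
    """Generate one scenario's synthetic task list (illustrative sketch)."""
    rng = random.Random(seed)
    tasks = []
    for i in range(num_tasks):
        tasks.append({
            'id': i,
            'lat': rng.uniform(-90.0, 90.0),    # targets spread around the world
            'lon': rng.uniform(-180.0, 180.0),
            'duration': rng.randint(5, 10),     # requested imaging duration
            'priority': rng.randint(1, 10),     # task priority
        })
    return tasks
```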
Based on the above method, three training datasets and a testing dataset are created. Each training dataset contains 1920 training scenarios with 1000 tasks each, but the number of satellites differs: 10 satellites in the first training dataset, 15 in the second, and 20 in the third. The testing dataset contains several testing scenarios with various task and satellite scales. To distinguish the different testing scenarios, we use an "A-B-C" format to name them, where A is the number of satellites, B is the number of tasks, and C is the scenario number.
Three metrics are used to evaluate the testing results: the final optimization objective F, the gap of the optimization objective, and the computation time. F represents the solution quality, and a higher F value indicates better solution quality. The formula of F is given in Section 3.3.
The gap is used to sharpen the assessment of the difference between the proposed MPN and the comparison algorithms and is computed as Gap = (F_MPN − F_comp) / F_MPN × 100%, where F_comp is the optimization objective value obtained by a comparison algorithm, and F_MPN is the optimization objective value obtained by the MPN.
5.2. Training Performance
The MPNs for the scheduling scenarios with different satellite scales are trained on the three training datasets, respectively, and the trained networks are denoted by MPN-10, MPN-15, and MPN-20. The training parameters are listed in Table 2. Figure 7 depicts the loss curves of these three networks during the training process.
As seen in Figure 7, the fluctuation of the three loss curves becomes smaller after 200 training steps, indicating that MPN-10, MPN-15, and MPN-20 can converge quickly. Furthermore, the three loss curves all converge to low levels, indicating that the training algorithm for the MPNs is effective.
5.3. Comparison with State-of-the-Art Algorithms
To verify the superiority of the trained MPNs, they are compared with four state-of-the-art multi-satellite scheduling algorithms: IAACO [29], HDABC [27], RLGA [31], and MARL [33]. IAACO and HDABC are improved heuristic algorithms, and RLGA is a combination of a reinforcement learning algorithm and the genetic algorithm. In these three algorithms, the population size is set to 10, and the number of iterations is set to 100. MARL is a multi-agent reinforcement learning algorithm with MAPPO. The comparison experiment consists of two parts: a comparison in scenarios with 10 satellites and various task scales, and a comparison in scenarios with more satellites and various task scales.
Table 3 and Table 4 present the testing results of MPN-10 and the comparison algorithms in the scenarios with 10 satellites and various task scales, including the solution quality and the computation time. As shown in Table 3, the F values of MPN-10 are the highest among these algorithms, indicating that MPN-10 is significantly superior to the comparison algorithms in terms of solution quality. Furthermore, the gap values of IAACO and HDABC are higher than those of RLGA and MARL, indicating that the heuristic algorithms are inferior to the deep reinforcement learning algorithms in terms of solving ability. Moreover, the gap values of the comparison algorithms tend to increase with the number of tasks, indicating the more significant superiority of MPN-10 in scenarios with larger task scales. As listed in Table 4, MPN-10 takes the least computation time among these algorithms, indicating that MPN-10 is superior to the comparison algorithms in terms of computation efficiency. In particular, the computation time of MPN-10 is much lower than that of IAACO and HDABC because the MPN constructs solutions in an end-to-end manner, which does not require population iterations. From a comprehensive perspective of solution quality and computation time, the proposed MPN exhibits excellent performance in solving ability and computation efficiency.
Table 5 and Table 6 present the testing results of MPN-15, MPN-20, and the comparison algorithms in the scenarios with more satellites and various task scales. As illustrated in Table 5, MPN-15 achieves the best F values in the 15-satellite scenarios, and MPN-20 obtains the best F values in the 20-satellite scenarios, demonstrating that the proposed MPN has a superior generalization ability over the comparison algorithms across various-scale scenarios. In particular, the MPN performs much better in the scenarios with more satellites, meaning that the MPN is applicable to large-scale satellite scheduling scenarios. As illustrated in Table 6, MPN-15 takes the least computation time in the 15-satellite scenarios, and MPN-20 takes the least computation time in the 20-satellite scenarios, demonstrating that the proposed MPN is superior to the comparison algorithms in terms of computation efficiency.
Overall, the solving ability, computation efficiency, and generalization of the proposed MPN have been verified through the above comparison experiments on extensive testing scenarios. First, the performance of the MPN is better than that of the comparison algorithms in terms of solution quality and computation efficiency. Second, the gaps between the comparison algorithms and the MPN rise gradually with the increase in the task scale, demonstrating the superior solving ability of the MPN to tackle scheduling scenarios with large-scale tasks. Third, the overall performance of the MPN is still better than that of the comparison algorithms in the scenarios with more satellites, demonstrating the superior generalization ability of the MPN to apply to various-scale scheduling scenarios.
5.4. Ablation Study
In the proposed MPN, a local feature-enhancement strategy is used to extract richer task features in the static feature extraction stage, and a remaining time-based decoding sorting strategy and a feasibility-based task selection strategy are used to improve the solution construction process in the dynamic decoding stage. To verify the effectiveness of these three strategies, we compare the performance of the full MPN with that of MPNs in which one of the strategies is removed. MPN-10 is the MPN with all three strategies, and each comparison MPN lacks one strategy but retains the other two.
Table 7 lists the testing results of these MPNs with different strategies in the 10-satellite scenarios with different task scales. In terms of solution quality, MPN-10 performs best among these MPNs, demonstrating that the proposed strategies are beneficial for improving the solution quality. The gap value of the MPN without the remaining time-based decoding sorting strategy is higher than that of the other two MPNs in all testing scenarios, indicating that this strategy plays a more important role in improving the solution quality than the local feature-enhancement strategy and the feasibility-based task selection strategy. In most testing scenarios, MPN-10 takes the most computation time, and the MPN without the local feature-enhancement strategy takes the least, indicating that these strategies require slightly more computation time and that the local feature-enhancement strategy requires the most among them. From a comprehensive perspective of solution quality and computation time, these three strategies result in a slight increase in the computational burden but significantly improve the solution quality.
Table 8 lists the observation profit rate and the load balance obtained by the MPNs with different strategies in the 10-satellite scenarios with various task scales; both metrics are defined in Section 3.3. In terms of the load balance, MPN-10 performs best in most testing scenarios whose task scales do not exceed 1000, but it no longer performs best when the number of tasks exceeds 1000. However, MPN-10 is superior to the other MPNs in all testing scenarios in terms of the observation profit rate, indicating that the three strategies significantly improve the solution quality, especially through the improvement of the observation profit rate. Furthermore, the MPN without the feasibility-based task selection strategy obtains the lowest observation profit rate values in all testing scenarios, indicating that the improvement in solution quality contributed by this strategy lies mainly in the enhancement of the observation profit rate.
Overall, the effectiveness of the proposed three strategies has been verified through the above ablation experiment. First, the MPN with all these strategies shows better performance in the solution quality than the other MPNs, demonstrating that these strategies effectively enhance the solving ability of the MPN. Second, these strategies significantly improve the observation profit rate obtained by the MPN, resulting in the improvement of the solution quality. Third, the feasibility-based task selection strategy plays a more significant role in improving the solution quality than the other two strategies.
6. Conclusions
In this paper, a multi-pointer network is proposed to solve the multiple agile optical satellite scheduling problem considering the observation profit rate and the load balance. This network is a "single-sequence-to-multiple-sequence" model, which adopts multiple attention layers as pointers to construct observation action sequences for multiple satellites. Three strategies are proposed to improve the solving ability of the proposed network: first, a local feature-enhancement strategy fuses the global task information with the local task information, providing more distinctive and richer task features for decision-making; second, a remaining time-based decoding sorting strategy arranges the order of the decoding layers based on the remaining time of the corresponding satellites, facilitating the selection of appropriate tasks for the satellites; third, a feasibility-based task selection strategy selects executable tasks for the satellites, avoiding the generation of infeasible solutions. Based on these strategies, the proposed network shows excellent performance in solution quality, computation efficiency, and generalization ability compared with four state-of-the-art algorithms. The ablation experiment fully demonstrates the effectiveness of these three strategies in improving the solution quality.
For future work, we will investigate the uncertain scheduling problem of multiple agile optical satellites and consider uncertain factors such as the effect of cloud cover and the dynamic arrival of emergency tasks. Solving this problem effectively usually requires a real-time or near real-time scheduling method that can adapt to the dynamic changes in these uncertain factors. Therefore, we will further extend the proposed network model to this uncertain multi-satellite scheduling problem.