1. Introduction
For time-sensitive Earth observation applications in natural disaster responses, such as earthquakes, tsunamis, and floods, rapid and efficient target capture is crucial [
1]. Conventional Earth Observation Satellites (CEOSs), primarily maneuverable only along the roll axis [
2], are designed for systematic large-area monitoring. While effective for long-term missions, their limited agility, constrained by slower attitude control systems and fixed observation windows [
3], makes them unsuitable for urgent, unpredictable scenarios. CEOSs cannot dynamically adjust their orientation during an overpass, limiting their flexibility in responding to changing mission requirements. Consequently, CEOSs struggle to provide timely and efficient coverage for critical applications, such as disaster monitoring or other time-sensitive data collection tasks, where rapid adjustment of observation windows is essential. Conversely, agile Earth Observation Satellites (AEOSs) with flexible attitude adjustment capabilities are increasingly deployed in constellations. AEOSs can capture targets before, during, and after a single overpass through rapid rotation along the roll, pitch, and yaw axes [
4], effectively extending restricted observation intervals into longer observation time windows (OTWs). Concretely, as an AEOS approaches a target area, it can increase its pitch angle to begin observing the region earlier, accelerating data acquisition for disaster assessment and emergency response. If the satellite has already passed over the target area, it can decrease its pitch angle to reorient the observation, ensuring data acquisition at the earliest possible time, particularly when no other satellite is scheduled to pass over the affected area in the near future. Such flexibility is essential for mitigating damage, coordinating relief, and saving lives. The acquired data is subsequently downlinked to the ground, a process constrained by transmission time windows (TTWs) due to sparse ground station deployment. Hence, the agile attitude adjustment of AEOSs decouples potentially overlapping OTWs and TTWs. As illustrated in
Figure 1, in the widely adopted direct ground station communication mode, CEOSs need to choose between observation and transmission during temporally overlapping windows. Due to power limitations, we assume that observation and transmission cannot occur simultaneously within a time slot. Meanwhile, AEOSs can flexibly schedule observation and transmission start times by adjusting their pitch angle, thereby enabling more target captures per orbital cycle.
Since AEOSs have more observation opportunities within a single orbital cycle, their data acquisition volume increases accordingly, as long as operational constraints are satisfied. For instance, even if two observation objects lie along the same ground track and have overlapping observation windows, they may require different sensor configurations. One area might need higher spatial resolution, whereas the other requires a broader swath or different spectral bands. Due to limited maneuverability and fixed observation parameters, CEOSs are unable to satisfy both sets of requirements within a single pass. In contrast, AEOSs can flexibly adjust their pitch angle and extend the observation window. As a result, they can start one observation before the target reaches nadir and delay the other, thereby using the appropriate imaging parameters to complete both tasks. This extended observation capability naturally generates larger volumes of data, which in turn imposes heavier burdens on onboard storage and downlink transmission, thus highlighting the necessity of adopting the on-orbit computing paradigm. Recently, more satellites have been equipped with computational units dedicated to data processing (e.g., GPUs and FPGAs), which enables them to process observation data locally [
5,
6], and transmit only essential information to ground stations, markedly reducing downlink overhead. However, all spacecraft, including AEOSs, face inherent limits on mass, volume, and power, which constrain their onboard computational resources. These constraints make it infeasible for AEOSs to accommodate extensive data processing capabilities, highlighting the need for dedicated processing satellites that provide computing support [
7,
8,
9,
10,
11,
12,
13]. To mitigate this limitation, some constellation systems introduce specialized processing satellites, which are configured with enhanced onboard computational capacity for handling tasks offloaded from observation satellites. Such a design enables other AEOSs to strategically offload their computation tasks to processing satellites through inter-satellite links. For example, Jiang et al. [
14] established satellite edge computing with high-performance computing hardware, and utilized resource allocation to achieve efficient on-orbit data processing. However, due to orbital dynamics, inter-satellite link TTWs inevitably exist, requiring that task offloading be scheduled within these windows. This constraint necessitates rational satellite mission planning and scheduling decisions. Similarly, satellite-to-ground link TTWs must also be considered when transmitting processed results to ground stations.
To achieve the efficient operation of AEOS constellations under multiple time window constraints (i.e., OTWs and TTWs), effective scheduling of observation and on-orbit computation tasks is required. Initially, substantial research efforts focused on observation scheduling for satellites [
15,
16], particularly for AEOSs [
17,
18]. Another stream of studies concentrated on data transmission scheduling [
19,
20], with the objective of optimizing the throughput and efficiency of the data downlink within limited TTWs. Recognizing that data acquisition is only valuable if the data can be successfully transmitted, more advanced studies have addressed the joint observation and transmission scheduling problem [
21,
22]. The primary goal of such joint scheduling is to resolve conflicts between OTWs and TTWs to improve end-to-end data delivery efficiency. However, existing research has not considered the on-orbit computation enabled by processing satellites, whose real-time processing capability can reduce data downlink latency and thus allow AEOSs to complete more observation missions. Given the TTWs imposed by inter-satellite links, the rational choice between executing computational tasks locally or offloading them to processing satellites is critical for achieving efficient data processing. Hence, it is imperative to jointly consider observation scheduling and on-orbit computation scheduling.
Coordinating observation, on-orbit computation, and downlink operations in agile satellite constellations introduces a series of inherent challenges that go beyond conventional scheduling. First, the coexistence of OTWs and TTWs often leads to temporal conflicts due to power limitations, as observation and downlink operations cannot be performed simultaneously. Second, strict precedence constraints enforce that data acquisition must be completed before computation, and computation must be completed before downlink transmission, which substantially increases scheduling complexity. Third, limited on-board computational resources and constrained inter-satellite link capacity create bottlenecks when tasks are offloaded to processing satellites. Moreover, communication resource contention at both processing satellites and ground stations may cause overload if multiple transmissions occur simultaneously. Finally, these interdependent requirements substantially enlarge the decision space and increase scheduling complexity, posing significant challenges for algorithm design to balance solution quality and computational efficiency.
To overcome the above deficiencies and unsolved challenges in previous studies, we propose a joint observation and on-orbit computation scheduling (JOOCS) scheme for agile satellite constellations. The main contributions of our work are summarized as follows:
We consider an integrated satellite-edge infrastructure comprising AEOSs, a computing-specialized processing satellite, ground stations, and cross-layer communication links. We then rigorously formulate the JOOCS problem using mathematical constraints and develop a partially observable Markov decision process (POMDP) model that optimizes task completion profit.
We propose a novel joint scheduling algorithm based on multiagent proximal policy optimization (JS-MAPPO), a DRL algorithm, to maximize AEOS mission throughput under OTW and TTW constraints. In addition, JS-MAPPO incorporates a tailored encoder–decoder policy network that enhances learning efficiency through spatiotemporal state embedding and action masking.
We conduct extensive simulations to validate our approach. The results demonstrate that JS-MAPPO achieves competitive performance, closely approaching the near-optimal solutions provided by the commercial solver, Gurobi, while maintaining computational efficiency. Moreover, our method outperforms other metaheuristics and DRL algorithms in terms of total task profit, especially in large-scale scenarios.
The remainder of this paper is outlined as follows. In
Section 2, we provide an overview of the related work.
Section 3 presents the problem formulation and the relevant POMDP model is constructed in
Section 4.
Section 5 elaborates on the proposed algorithm. In
Section 6, we present simulation results and discussions. Finally, we give concluding remarks in
Section 7.
3. Problem Description
The JOOCS involves developing a collaborative scheduling strategy for the constellation of AEOSs. The primary goal is to coordinate the observation of ground targets, the on-orbit computation of collected data, and the subsequent data transmission to ground stations, all within a finite planning horizon, in order to maximize the total profit obtained from completed missions. In this problem, a set of agile satellites, denoted as
, is tasked with observing a set of ground targets
. With on-orbit computing, the observation data can be processed locally and only the key information is transmitted to the ground, reducing downlink latency. Several available ground stations
are provided to receive the data from satellites. Additionally, we consider the deployment of a dedicated processing satellite with more computational resources, enabling faster on-orbit computation than AEOSs. Thus, AEOSs may either perform data processing locally using onboard resources (local computation) or offload computation tasks to the processing satellite (edge computation). All notations commonly used in the problem formulation are listed in
Table 1.
The objective is to accomplish more target acquisitions by scheduling observation, computation, and downlink under the constraints of OTWs and TTWs. All tasks associated with a given target execute at most once under strict precedence constraints: computation must follow observation, and downlink must follow computation. Moreover, for AEOSs, the operations of observation, offloading computation to the processing satellite, and downlink are mutually exclusive, while both inter-satellite offloading and satellite-to-ground downlink transmissions are subject to communication resource constraints. We subsequently construct mathematical formulations to model this process.
Uniqueness and Precedence Constraints: Each target
m can be observed at most once during the planning horizon, i.e., it can be assigned to at most one satellite and one observation time. The constraint is formulated as follows:
For any given target
m, observation must be completed before subsequent actions. Computation must precede the final downlink. Then, the following constraints are established:
Equation (
2) ensures that offloading for target
m can only occur after its observation is complete. Equation (
3) enforces the necessary processing delays for either the local or edge computation path before a downlink can be initiated, using the actual offloading decision time
t.
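As a concrete illustration, the uniqueness and precedence logic above can be checked with a small validator. The record fields and slot-based timing below are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass

@dataclass
class TargetSchedule:
    """Hypothetical per-target decision record; all times are slot indices."""
    observe_start: int   # slot at which observation begins
    observe_dur: int     # observation duration in slots
    compute_start: int   # slot at which (local or edge) computation begins
    compute_dur: int     # processing time in slots
    downlink_start: int  # slot at which downlink begins

def satisfies_precedence(s: TargetSchedule) -> bool:
    """Mirror of the precedence constraints: computation may start only
    after observation finishes, and downlink only after computation
    finishes."""
    observe_end = s.observe_start + s.observe_dur
    compute_end = s.compute_start + s.compute_dur
    return s.compute_start >= observe_end and s.downlink_start >= compute_end
```

A schedule that starts computation before its observation ends, or downlink before computation ends, is rejected.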
Time Window Constraints: Each action must be fully executed once within a valid time window. Let
denote an observation window for satellite
i on target
m. The observation action is constrained by the following:
where
denotes the indicator function, which takes the value 1 if the condition holds, and 0 otherwise. Similarly, the constraints of offloading computation to processing satellite and downlink can be formulated as follows:
where
indicates an available window for offloading or downlink.
Satellite Operation Constraints: Each satellite
can initiate at most one operation (observation, offloading, or downlink) at each time step
t, which can be described as follows:
For simplicity, we assume that on-orbit computation follows a First-In-First-Out (FIFO) queuing discipline, meaning that the computation of a task begins only after all previously arrived tasks have been executed. This assumption reduces scheduling complexity and provides a tractable framework for our study.
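Under this FIFO assumption, each queued task's completion time can be computed directly, since a task starts at the later of its arrival and the previous task's finish. A minimal sketch with slot-based timing (names are illustrative):

```python
def fifo_finish_times(arrivals, durations):
    """Finish slot of each task under FIFO queuing: a task starts at the
    later of its arrival slot and the previous task's finish slot."""
    finish, prev_end = [], 0
    for arrival, duration in zip(arrivals, durations):
        start = max(arrival, prev_end)   # wait for all earlier tasks
        prev_end = start + duration
        finish.append(prev_end)
    return finish
```

For example, tasks arriving at slots 0, 1, and 10 with durations 3, 2, and 1 finish at slots 3, 5, and 11: the second task waits for the first, while the third starts immediately upon arrival.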
Communication Resource Constraints: A constraint is imposed on the communication resources of both the processing satellite and ground stations. At any given time, each is limited to receiving a single data transmission from AEOSs, thereby preventing their communication modules from being overloaded by simultaneous transmissions. This can be formulated as follows:
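Beyond the formal constraint, the single-receiver rule can be verified programmatically on a candidate schedule by counting overlapping incoming transmissions per receiver and slot; a sketch with illustrative identifiers:

```python
from collections import Counter

def overloaded_receivers(transmissions):
    """transmissions: iterable of (receiver_id, start_slot, duration).
    Returns the set of receivers (processing satellite or ground stations)
    that would hold more than one incoming link in some slot."""
    occupancy = Counter()
    overloaded = set()
    for receiver, start, duration in transmissions:
        for slot in range(start, start + duration):
            occupancy[(receiver, slot)] += 1
            if occupancy[(receiver, slot)] > 1:
                overloaded.add(receiver)
    return overloaded
```

A feasible schedule is one for which this function returns an empty set.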
4. JOOCS POMDP Model
As shown in
Figure 2, the JOOCS framework consists of two components: a POMDP model and a MADRL approach. The POMDP provides a formal representation of the satellite scheduling environment. This model is defined as a seven-tuple
, where
represents the state space,
the action space,
the state transition probability function,
the reward function, and
the observation space.
denotes the reward discount factor. The value of
n represents the number of agents. The second component is a MADRL approach, following the centralized training with decentralized execution (CTDE) paradigm. During the execution phase, each satellite agent (denoted as
i) acts autonomously, determining its actions via a dedicated policy network (
) based exclusively on its local observation (
). Conversely, the training phase employs a centralized critic (
) that leverages the global state (
), an aggregation of all agent information, and generalized advantage estimation (GAE) to enable an accurate evaluation of the joint actions. The learning process is driven by an interaction loop wherein each agent selects an action upon its received observation. The environment then transitions to a new state (
) based on the joint action and yields a reward signal (
). This reward is subsequently utilized by the MADRL to update both the individual policy networks and the centralized critic, thereby continuously optimizing the scheduling strategy.
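For reference, the GAE computation used by the centralized critic reduces, per episode, to a backward recursion over temporal-difference errors. The following is the standard formulation as a sketch, not code from the paper:

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one finished episode.
    `values` holds one more entry than `rewards` (bootstrap value last)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```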
4.1. State Space
The state
at any given decision step
t is represented with static and dynamic parts, which are detailed in
Table 2. For the purpose of notational simplicity, the time index
t is omitted from the table. The static part
remains constant throughout the scheduling horizon, while the dynamic part
evolves during the whole process. The total state space is formally defined as follows:
Specifically, the static state
contains all pre-calculated time windows for potential actions, formulated as follows:
where
,
, and
are the sets of all feasible time windows for observation, offloading, and transmission actions, respectively. The dynamic state
is constructed as follows:
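In code, assembling the full state then amounts to concatenating the static and dynamic parts; the field names below are illustrative assumptions, not the paper's notation:

```python
def build_state(static_windows, dynamic):
    """Flatten the static part (pre-computed time windows) and the dynamic
    part (evolving statuses) into one state vector."""
    static_part = (
        static_windows["observe"]     # feasible OTWs per (satellite, target)
        + static_windows["offload"]   # inter-satellite TTWs
        + static_windows["downlink"]  # satellite-to-ground TTWs
    )
    dynamic_part = (
        dynamic["task_status"]        # per-target execution stage codes
        + dynamic["sat_busy"]         # per-satellite busy flags
        + dynamic["queue_lengths"]    # computation-queue lengths
    )
    return static_part + dynamic_part
```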
4.2. Observation Space
In the proposed POMDP framework, each AEOS (agent i) receives a local observation at any given decision step t, rather than the full global state. This local observation vector is carefully designed to provide the agent with all pertinent information required for effective decision-making, while withholding the internal states of other agents to reduce input dimensionality.
Specifically, the local observation
for agent
i is composed of system-wide information, its own state, and state information pertaining to all tasks. It can be formally defined as the following set:
Here, we assume that the whole system information can be obtained through multi-satellite routing mechanisms within the constellation. Since satellite system state information typically involves relatively small data volumes (e.g., task status, queue length, binary operational state), this information can be efficiently propagated through inter-satellite links with minimal bandwidth requirements. However, this approach introduces an inherent trade-off between scheduling optimality and service timeliness. While system information accessibility improves scheduling optimality, the routing process inevitably introduces communication delays that may compromise the timeliness of satellite services. This represents a limitation of our current approach, particularly in scenarios requiring low latency responses.
4.3. Action Space
The action space,
, describes the set of all possible operations that can be executed by the agents at each decision step
t. Within the proposed multiagent formulation, the joint action
from all agents is represented as follows:
where
is the action for agent
i.
For an individual agent i, its action space is discrete and encompasses four distinct types of AEOS operations.
Observe: An agent selects a ground target for observation. The validity of this action is determined by whether target m lies within a valid observation time window.
Offload to Edge: An agent selects a previously observed target m and offloads its observation data to the processing satellite. This action is constrained by the TTWs of the inter-satellite link between AEOSs and the processing satellite.
Downlink: An agent selects a previously observed and computed task related to target m and transmits the final data to an available ground station. This action is constrained by the TTWs of the satellite-to-ground link.
Idle: This serves as the default action when no other valid actions are available or selected.
At each step t, the set of available actions for each agent is dynamically determined by the environment based on the current state , considering all OTWs, TTWs, and precedence constraints. To enforce these constraints during policy execution, an action masking mechanism is employed.
The action mask is a critical mechanism that functions as a binary vector, denoted as , which has the same dimension as the action space . An element in is set to 1 if the corresponding action is valid and 0 otherwise. This mask is then applied within the actor network to filter the output logits before the final action selection. The process is as follows:
- 1.
The actor network’s final layer outputs a vector of raw scores (logits) for every possible action.
- 2.
The logits corresponding to all invalid actions (where the mask value is 0) are set to a large negative number (effectively negative infinity).
- 3.
These modified logits are then passed through a softmax function to generate the final probability distribution over the actions, where the probabilities of valid actions are normalized to sum to 1.
This procedure ensures that the probabilities for all invalid actions become zero, thereby compelling the agent to sample only from the set of currently feasible actions. This dramatically improves training efficiency and guarantees the validity of the generated schedule.
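The three masking steps can be condensed into a few lines. The sketch below uses plain Python for clarity; in practice the same operation is applied to the actor network's logit tensor:

```python
import math

def masked_softmax(logits, mask):
    """Steps 1-3 above: replace invalid logits with a large negative
    value, then apply softmax so invalid actions get probability zero."""
    NEG = -1e9  # effectively negative infinity
    masked = [l if m == 1 else NEG for l, m in zip(logits, mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

Sampling from the resulting distribution can only ever return a feasible action, which is exactly what guarantees schedule validity.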
4.4. Transition Function
The state transition function determines the next state based on the current state and the joint action . In our environment, the transition is deterministic and can be expressed as . The evolution of the dynamic state is governed by two types of transitions:
Action-driven Transitions: The execution of the joint action directly alters the state. The state updates for the primary actions are defined as follows.
If agent
i executes a successful Observe action on target
m at time
t, then
This action updates the execution status of target m () to “observed”, the status of satellite i over the subsequent duration () to “busy”, and the flag indicating that target m is observed by satellite i. It also adds a new task into the local computation queue of satellite i ().
If agent
i executes a successful Offload action for target
m at time
t, then
This action updates the execution status of target m () to “offloaded”, the status of satellite i over the subsequent duration () to “busy”, and the communication status of the processing satellite over the subsequent duration () to “busy”. It also adds a new task into the computation queue of the processing satellite.
If agent
i executes a successful Downlink action for target
m to ground station
g at time
t, then
This action updates the execution status of target m () to “transmitted”, the status of satellite i over the subsequent duration () to “busy”, the status of ground station g over the subsequent duration () to “busy”, and the flag indicating that processed result of target m (observed by satellite i) is transmitted to the ground. It also removes the corresponding task from the downlink queue of satellite i.
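Taken together, an action-driven transition is a handful of in-place updates to the dynamic state. The Observe case can be sketched as follows, with dictionary keys as illustrative stand-ins for the paper's state variables:

```python
def apply_observe(state, sat_id, target_id, t, obs_duration):
    """Action-driven transition for a successful Observe action:
    mark the target observed, occupy the satellite for the observation
    duration, record which satellite observed it, and enqueue local
    computation (FIFO)."""
    state["target_status"][target_id] = "observed"
    for slot in range(t, t + obs_duration):
        state["sat_busy"][sat_id].add(slot)            # satellite occupied
    state["observed_by"][(target_id, sat_id)] = True   # observation flag
    state["local_queue"][sat_id].append(target_id)     # FIFO computation queue
    return state
```

The Offload and Downlink transitions follow the same pattern, additionally occupying the processing satellite's or ground station's communication resource.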
Time-driven Transitions: The state also evolves implicitly with the increment of time step (
). First, the value of the time step is normalized by
. Second, upon completion of local computation for target
m on satellite
i, the following transitions are executed.
where a task is moved from the computation queue to the downlink queue of satellite
i, and the corresponding flag
is updated. Finally, upon completion of edge computation for target
m, the following transitions are executed.
where the computation status is updated according to the length of the computation queue, and a task is moved from the computation queue to the downlink queue on the processing satellite.
4.5. Reward Function
The reward function
is defined to guide the agents toward maximizing the total profit from completed missions. A shaped reward function is employed to mitigate the issue of sparse rewards. Specifically, the total reward
at each time step
t is defined as a sum of rewards obtained for accomplishing a specific mission for target
m, minus a constant step penalty:
where PL is a small constant penalty set to 0.01 to promote efficiency, and
is the event-driven reward for task
m at time
t, defined as follows:
where
is a fixed profit of task
m. Note that, to avoid processing satellite overloading, the reward for an offload action is dynamically impacted by the length of computation queue (
).
The design of the reward function in Equation (
38) follows two main considerations. First, we regard a task as truly completed only when it is successfully downlinked to a ground station; therefore, in principle, the full task reward is granted at this stage. To alleviate reward sparsity during training, partial rewards are also provided at intermediate stages, namely when a task is observed and when its computation is completed. In addition, for computation offloading to the processing satellite, we introduce a dynamic reward term to discourage excessive congestion at the processing satellite and to balance the utilization of system resources. Second, the coefficients associated with these reward components were determined empirically: we conducted preliminary training runs with different candidate settings and compared their performance in terms of convergence stability and task completion profit. The final set of coefficients was chosen as the one that offered the best trade-off between training efficiency and solution quality.
6. Experimental Results and Discussions
This section validates the JOOCS framework for AEOS constellations and demonstrates JS-MAPPO’s effectiveness through comparative experiments.
6.1. Simulation Scenario Setting
Experiments employed Satellite Tool Kit (STK) to generate realistic mission scenarios. Simulations were initialized at 04:00:00 UTC on 6 June 2025 and span 24 h. The simulation period is discretized into 288 five-minute slots, balancing computational efficiency with scheduling flexibility.
Figure 4 shows the simulation interface.
Ten AEOSs operate in the simulation, with orbital parameters derived from two-line element (TLE) data for realistic orbital dynamics. The satellites vary in inclination, altitude, and orbital plane orientation, enabling diverse coverage for task allocation.
Table 3 lists the orbital parameters. Three ground stations support data reception and downlink operations: Shenzhen (22.54° N, 114.06° E), Harbin (45.80° N, 126.53° E), and Jiuquan (39.74° N, 98.52° E). Their geographic distribution across China ensures robust satellite visibility throughout orbital passes. Communication links between satellites and ground stations remain stable without disconnection throughout the simulation. Additionally, a processing satellite in geostationary orbit maintains continuous visibility with all three ground stations. The simulation includes 200 observation targets distributed across Earth’s surface. Each target has a unique identifier and geographic coordinates, with latitudes uniformly sampled from [−60°,60°] and longitudes from [−168°,168°].
To assess scalability and robustness across varying mission complexities, 12 simulation scenarios combine different numbers of targets and satellites. The scenarios use 3, 5, or 10 AEOSs with 50, 100, 150, or 200 observation targets. All scenarios maintain three ground stations and one geostationary processing satellite.
Table 4 details each configuration.
6.2. Algorithm Settings
Table 5 lists the JS-MAPPO hyperparameters. Training utilized an Intel Xeon Gold 6133 CPU with NVIDIA RTX 4090 GPU, while testing employed an Intel Core i7-11800H CPU with NVIDIA RTX 3050 Ti GPU. Training was conducted with Python 3.12.11, PyTorch 2.4.1, and NumPy 2.0.1. A total of 300 tasks were generated in advance. During training, tasks corresponding to the scale of each scenario were randomly sampled from this set, while in testing, a separate batch of tasks was sampled from the same set to ensure non-overlapping evaluation.
JS-MAPPO is compared against five baseline algorithms:
- (1)
Random policy (Random) [
45]: Selects feasible actions uniformly at random without using any optimization or learning mechanism, serving as a naive baseline for comparison.
- (2)
Genetic algorithm (GA) [
46]: Evolves joint action sequences using tournament selection, one-point crossover with repair, mutation, and elitist retention.
- (3)
Counterfactual multiagent actor–critic (COMA) [
47]: A multiagent RL algorithm that reduces policy gradient variance through counterfactual baselines.
- (4)
Standard MAPPO [
48]: A multiagent extension of PPO for cooperative and competitive environments.
- (5)
Gurobi [
49]: A state-of-the-art commercial optimization solver widely used for mixed-integer programming. It leverages advanced heuristics, preprocessing, and parallel computation to efficiently handle large-scale scheduling problems, and is commonly adopted as a benchmark to provide near-optimal reference solutions.
6.3. Results and Analysis
Figure 5 shows JS-MAPPO’s training curves across all 12 scenarios, with training steps on the x-axis and episodic reward on the y-axis. JS-MAPPO exhibits stable convergence across all scenarios, independent of satellite and target numbers. Scenarios with fewer targets (SCEN_1~SCEN_4) converge rapidly within
steps due to lower scheduling complexity. As targets and satellites increase (SCEN_5~SCEN_12), convergence slows due to expanded action spaces and complex temporal–spatial constraints, yet performance remains high, demonstrating effective scalability. The curves show minimal post-convergence oscillations, indicating robust policies without overfitting. Notably, JS-MAPPO achieves steady improvement and high rewards even in the largest scenario (SCEN_12), demonstrating its capability for high-dimensional multiagent problems. This scalability and stability are crucial for real-time satellite constellation scheduling.
Table 6,
Table 7 and
Table 8 present performance comparisons across all scenarios using five metrics: completed tasks, completion rate, total profit, profit rate, and computational cost. JS-MAPPO consistently achieves high performance comparable to or exceeding baselines across all scales. In small-scale scenarios (SCEN_1~SCEN_4), JS-MAPPO matches Gurobi and GA performance while requiring dramatically less computation time. For example, in SCEN_3, JS-MAPPO achieves Gurobi’s completion rate (21.33%) in
versus Gurobi’s
. In medium-scale scenarios (SCEN_5~SCEN_8), JS-MAPPO maintains strong performance. In SCEN_7, it surpasses MAPPO in profit (417 vs. 409) and profit rate (49.58% vs. 48.63%) while computing in under
. GA occasionally matches JS-MAPPO’s completion rate but requires over 2000 s, which is impractical for real-time applications. In large-scale scenarios (SCEN_9~SCEN_12), JS-MAPPO demonstrates excellent scalability with computation times below 1 s. In SCEN_12, it achieves the highest profit (671) and profit rate (60.23%), outperforming all baselines. Notably, Gurobi fails to produce solutions within two hours for SCEN_8, SCEN_11, and SCEN_12, highlighting its impracticality for real-time large-scale scheduling. JS-MAPPO’s stable computation times across all scales make it ideal for time-critical satellite scheduling.
In our design, the primary optimization objective of reinforcement learning training is the total profit of completed tasks, rather than the sheer number of tasks completed. As a result, there may be cases where MAPPO completes more tasks, but these tasks yield relatively low profits, leading to a lower overall return compared to JS-MAPPO. In other words, the number of completed tasks and the total profit are not strictly correlated. We included the task completion count as an additional metric mainly to provide a more intuitive illustration of scheduling behaviors. Nevertheless, when considering the actual optimization objective, JS-MAPPO consistently achieves superior overall profit.
Despite the strong performance of JS-MAPPO, several limitations remain. First, in small-scale scenarios, JS-MAPPO does not always achieve the absolute best solution quality compared with exact solvers such as Gurobi or metaheuristics such as GA. However, given its dramatically shorter computation time, this trade-off is acceptable for real-time applications. Second, the training of DRL requires substantial computational resources and a long training time, which limits its feasibility for rapid deployment. Finally, as with most DRL-based methods, the learned policies operate as black boxes and lack theoretical guarantees of optimality.
It is worth noting that in medium- and large-scale scenarios, JS-MAPPO shows significant advantages over exact and heuristic methods in terms of computation time, while achieving superior solution quality compared with other DRL-based approaches that operate on a similar time scale. Taken together, these results demonstrate that JS-MAPPO offers the most practical balance between effectiveness and efficiency, making it a preferable scheduling solution across different scales.
The experimental results confirm that JS-MAPPO achieves optimization-quality solutions with the computational efficiency and scalability of DRL, enabling real-time decision-making for large-scale JOOCS problems.
Figure 6,
Figure 7 and
Figure 8 visualize total profit and completed tasks from
Table 6,
Table 7 and
Table 8 across different AEOSs configurations. JS-MAPPO consistently demonstrates competitive or superior performance at all scales. To assess the processing satellite’s contribution, we conducted comparative experiments on SCEN_2, SCEN_6, and SCEN_10 by removing the processing satellite.
Figure 9 compares performance metrics including total profit, completed tasks, profit rate, and completion rate between configurations with and without the processing satellite. The processing satellite consistently enhances performance across all scenarios. In SCEN_2, it increases total profit and completion rate by providing accelerated task processing and additional downlink opportunities. This performance gap widens in larger scenarios (SCEN_6 and SCEN_10), where resource contention intensifies. Here, the processing satellite’s computational capacity and stable downlink links yield substantially higher profit and completion rates.
Through analytical and empirical evaluations, we demonstrate the critical role of the processing satellite in enhancing scalability and efficiency for large-scale JOOCS problems.