1. Introduction
Unmanned Aerial Vehicles (UAVs) have become an important enabler for time-critical logistics, including the rapid delivery of medical supplies, food, and emergency equipment in disaster-stricken areas. Prior UAV logistics systems have typically relied on centralized fleet management, fixed resource provisioning, and preconfigured routing or scheduling policies [1, 2]. While effective under controlled conditions, these designs often exhibit limited scalability and adaptability when the environment becomes highly dynamic, e.g., sudden demand surges, partial infrastructure failures, flight restrictions (e.g., “no-drone zones”), and heterogeneous UAV availability [3, 4]. In practice, centralized approaches also create single points of failure and bottlenecks that can delay task dispatch and re-planning during emergencies.
A promising direction is crowdsourcing-enabled multi-UAV emergency logistics, where UAVs owned by individuals, enterprises, and public agencies can be temporarily integrated into a common operational pool. Such a model can expand capacity on demand, improve geographic coverage, and reduce dependence on a single operator. However, crowdsourcing introduces new challenges: (i) decentralized coordination among heterogeneous UAVs with different battery/energy constraints and reliability, (ii) robust task assignment under time pressure and safety constraints (e.g., collision avoidance), (iii) routing and re-planning under obstacles and flight restrictions, and (iv) participation management with incentives and fairness when privately owned UAVs participate.
To address these challenges, we propose a crowdsourced multi-UAV emergency logistics framework that combines learning-based decision policies (PPO and DQN) with planning-level route refinement (GA and ACO), together with an incentive-aware dispatch mechanism. The design goal is not only navigation-centric success, but also crowdsourcing feasibility under stress, which requires measuring incentive outcomes (total payment and inequality) and robustness against unreliable participants and participation dropout. Consequently, our evaluation explicitly reports baseline-under-stress results under the same stress settings, in addition to nominal-condition comparisons.
Figure 1 illustrates a conventional setting in which a single policy (or a centrally trained controller) handles routing and task decisions. Such a design can be brittle in emergency scenarios, where a single controller must continuously re-optimize under uncertain demand and constraints. In contrast, Figure 2 illustrates a crowdsourcing-enabled setting in which multiple UAVs can be dynamically recruited and coordinated through decentralized mechanisms.
Reinforcement learning (RL) is attractive for emergency logistics because it can optimize sequential decisions under uncertainty, enabling agents to adapt to stochastic demand and evolving constraints [5, 6, 7, 8]. Nevertheless, RL may lack optimality guarantees, can be sensitive to reward design/hyperparameters, and may degrade under environment shifts. Therefore, our framework is designed as a hybrid decision architecture: learning-based policies are used for real-time decision-making and navigation, while planning-level refinement (GA/ACO) and explicit safety constraints are incorporated to improve robustness and interpretability. We emphasize that the evaluation is simulation-based; bridging the gap to real-world deployment requires additional modeling of 3D dynamics, communication disruptions, and operational constraints (discussed in the concluding section).
The main contributions of this paper are summarized as follows:
Crowdsourced multi-UAV framework with incentive-aware dispatch: We introduce a crowdsourcing-enabled emergency logistics framework that coordinates heterogeneous UAV participants based on real-time status and reliability-related signals, reducing reliance on fixed fleets.
Hybrid decision architecture (policy-level RL + planning-level refinement): We integrate PPO/DQN-based decision policies with GA/ACO route refinement to improve route quality and robustness in constrained environments [7, 8, 9, 10, 11].
Incentive and fairness characterization: Unlike centralized baselines that do not model payments, we quantify total payment and payment inequality (Gini) to evaluate crowdsourcing feasibility.
Stress-matched baseline re-evaluation (where applicable): We re-evaluate centralized PPO/DQN under crowdsourcing-relevant stressors that remain well-defined for baselines (e.g., unreliability as stochastic failure/noise), using identical map generation, obstacle placement, and collision rules. We explicitly note that participation dropout is not defined for centralized baselines and is evaluated only for the crowdsourcing regime.
This paper is organized as follows. Section 2 reviews related work on UAV logistics, decentralized coordination, crowdsourcing, and learning-based decision-making. Section 3 presents the system overview and problem formulation. Section 4 describes the proposed algorithms and operational workflow. Section 5 reports the simulation setup and experimental results (including stress tests and baseline-under-stress). Finally, Section 6 concludes the paper and outlines limitations and future research directions.
2. Related Work
2.1. UAV Logistics and Coordination Paradigms
Prior research on UAV-based logistics has largely focused on centrally managed fleets operated by companies or public agencies. Centralized designs can simplify control and compliance management, but they often face scalability limitations, high operational overhead, and reduced robustness under large-scale disruptions or rapidly changing demand [1, 2]. Moreover, their reliance on fixed fleet capacity makes it difficult to elastically scale during emergencies, where additional UAV resources may be urgently required.
A major line of work investigates swarm and metaheuristic optimization for multi-UAV coordination and routing. Methods inspired by collective behaviors, including Ant Colony Optimization (ACO), Genetic Algorithms (GAs), Particle Swarm Optimization (PSO), and related heuristics, have been applied to multi-UAV path planning and task coordination [9, 10, 11]. These approaches can produce efficient routes and cooperative behaviors, but they are commonly evaluated under assumptions of known objectives and relatively stable environments, and they may still depend on centralized orchestration or predefined behavioral rules when deployed.
Another line of work uses learning-based navigation and decision-making. Supervised learning and reinforcement learning have been adopted for UAV navigation, obstacle avoidance, and dynamic scheduling [6, 12]. While learning-based approaches offer adaptivity, they introduce practical concerns such as training stability, sensitivity to reward shaping, and generalization under environment shifts. These limitations motivate hybrid designs that integrate learning with explicit planning and safety constraints.
A further research direction explores decentralized coordination infrastructures, including blockchain-based mechanisms for secure logging, accountability, and trust management. Such systems can improve integrity and traceability, but they do not by themselves guarantee fast, adaptive decision-making under time-critical constraints. In emergency logistics, latency and re-planning responsiveness remain essential requirements, motivating lightweight edge-enabled orchestration.
2.2. Crowdsourcing-Enabled UAV Resource Expansion
Crowdsourcing UAV logistics aims to expand the available UAV pool by allowing privately owned UAVs to participate in emergency missions under a unified coordination mechanism [13, 14]. Compared to fixed-fleet approaches, crowdsourcing can improve scalability and coverage, particularly when public infrastructure is disrupted. However, this paradigm requires solutions for participant management, incentive alignment, and trust assessment (e.g., reputation), as well as mechanisms to integrate heterogeneous UAV constraints (battery, payload, flight capability) into task assignment and routing.
In addition, crowdsourcing introduces challenges in coordination and communication. Inter-UAV information exchange may be intermittent, and decentralized execution must remain robust to partial failures and missing updates. Consequently, practical crowdsourcing frameworks often incorporate edge-level coordination to reduce long-haul dependency and to enable rapid local decision-making in disaster regions.
2.3. Multi-Agent Reinforcement Learning for Task Allocation and Navigation
Multi-Agent Reinforcement Learning (MARL) provides a principled approach to learning cooperative policies for multi-UAV decision-making under uncertainty. Representative MARL formulations include centralized training with decentralized execution (CTDE), which can stabilize learning while enabling distributed deployment [5]. MARL has been investigated for task allocation, coverage control, and coordination in dynamic environments, where multiple agents must share limited resources and avoid conflicts.
In emergency logistics, MARL is attractive because it can learn policies that jointly optimize response time and safety while adapting to stochastic demand patterns. However, MARL also faces known limitations: training may be unstable or computationally expensive as the number of agents grows, learned policies can be sensitive to reward definition, and performance can degrade under distribution shifts. These challenges motivate hybrid architectures in which MARL supports real-time decisions (e.g., dispatching and local navigation) while global route refinement and explicit safety constraints are integrated to improve robustness and interpretability. Our work follows this direction by proposing a decentralized crowdsourcing edge network with cooperative MARL-based decision policies, complemented by planning-level refinement.
3. Crowdsourced Multi-UAV Framework and Utilization of Crowdsourcing
This section presents the proposed crowdsourced multi-UAV emergency logistics framework for coordinating heterogeneous UAV resources contributed by individuals, enterprises, and public agencies. We clarify the system roles, the decision/planning components, and, critically, which modules are in scope and which are out of scope for the evaluation in Section 5.
3.1. System Overview and Roles of Key Modules
The proposed framework scales emergency logistics capacity by integrating dynamically available UAVs into a unified operational pool. It consists of (i) a crowdsourcing layer to enroll heterogeneous UAVs, (ii) an edge orchestrator to maintain a task pool and aggregate UAV status, and (iii) a decision/planning layer to support real-time dispatch and safe navigation.
Figure 3 shows the high-level architecture of the crowdsourced multi-UAV emergency logistics framework.
UAV owners register UAV profiles including location, residual battery/energy, payload capability, and reliability-related attributes (initialized priors). Importantly, our study models participation dynamics that are central to crowdsourcing: UAVs can be unreliable and can drop out. These two stress dimensions are explicitly evaluated in Section 5.
The edge orchestrator maintains (a) a task pool containing emergency requests and constraints, (b) status reports (location/energy/availability), and (c) a dispatch channel for assignments. It enforces operational constraints (e.g., collision safety and restricted zones) and provides bounded state/context information to support decentralized execution.
We adopt a hybrid decision architecture: PPO/DQN-based policies are used for real-time navigation/control (policy level), while GA/ACO are used for planning-level route candidate generation and refinement. We emphasize that GA/ACO are metaheuristic planners, not reinforcement learning algorithms; they complement learning-based control rather than replacing it.
3.2. Learning-Based Decisions and Hybrid Planning Rationale
RL is suitable for emergency logistics because it supports sequential decisions under uncertainty. In this paper, PPO and DQN are used as representative learning-based policies for navigation/control. However, RL may suffer from reward sensitivity, training instability, and degraded generalization under environment shifts. Therefore, we combine policy-level RL with planning-level refinement (GA/ACO) and explicit safety constraints.
We note that centralized training with decentralized execution (CTDE) and fully cooperative MARL can be incorporated as an extension for large-scale deployment. However, the evaluation focus in this paper is on (i) incentive-aware dispatch feasibility, (ii) robustness to unreliable participation and dropout, and (iii) baseline-under-stress comparisons under matched stress settings.
3.3. Crowdsourcing Participation: Incentives, Reliability, and Scope Boundary
Crowdsourcing requires mechanisms that encourage participation and preserve reliability when privately owned UAVs join emergency operations. In this study, the following components are implemented and evaluated (Section 5):
Incentive payment and fairness tracking: The platform computes total payment and payment inequality (Gini) among participating UAVs.
Reliability-aware selection: Dispatch incorporates reliability-related attributes; under stress sweeps, a fraction of participants may be unreliable or may drop out.
The following modules are explicitly treated as out-of-scope for the current evaluation (to keep comparisons controlled) and are discussed as future work (Section 6):
Auction/bidding mechanisms: Strategic bidding, budget-feasible auctions, and welfare analysis are not activated in the experiments.
Ledger/blockchain auditing: Auditable reputation ledgers and fraud-resistant logging are not activated in the experiments.
This explicit scope boundary prevents over-claiming and aligns the algorithmic description with what is actually measured in Section 5.
3.4. Summary of the Framework Scope
The proposed framework provides a scalable coordination architecture for time-critical emergency logistics using crowdsourcing and a hybrid decision design (policy-level RL + planning-level GA/ACO). Our evaluation (Section 5) reports nominal performance and explicitly provides baseline-under-stress results under unreliable participation and dropout, together with incentive feasibility outcomes (payment and Gini). Extending the framework to richer real-world settings (e.g., 3D dynamics, communication disruptions, regulatory constraints, and strategic auctions/ledgers) is left to future work.
4. Proposed Schemes
This section describes the proposed schemes for (i) incentive/reliability-aware crowdsourcing participation, (ii) hybrid planning and learning for UAV path planning, and (iii) decentralized task allocation with learning-based and swarm-intelligence signals. In contrast to centralized or single-policy frameworks, our design targets time-critical emergency logistics under heterogeneous UAV resources, safety constraints (collision avoidance and restricted zones), and dynamic task arrivals. We emphasize that ACO/GA are planning-level metaheuristics (global candidate search/refinement), not reinforcement learning algorithms; they complement learning-based decision-making (policy/execution) [9, 10].
4.1. Notation and Platform Utility Definition
We first define the platform objective used throughout this paper, which is also aligned with the evaluation metrics in Section 5. Let an episode outcome be determined by task success, safety (collision events), and incentive payments. The platform utility is defined as
$$U = V - \lambda_c C - \lambda_p P, \tag{1}$$
where $V$ is the mission value (e.g., success reward aggregated over tasks), $C$ is the collision-related cost (e.g., number of collision events or collision penalty), and $P$ is the total incentive payment. Scalars $\lambda_c$ and $\lambda_p$ weight safety and payment costs, respectively. In centralized baselines, $P = 0$ by design, while the proposed crowdsourcing mechanism explicitly models $P$ and reports both $P$ and the payment inequality (Gini) as a feasibility indicator.
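As a minimal sketch of the utility described above (mission value minus a weighted collision cost and a weighted payment), the following Python function evaluates it for one episode. The function name, the argument names, and all numeric values are illustrative assumptions; they are not taken from the logged runs.

```python
def platform_utility(mission_value, collision_cost, total_payment,
                     collision_weight=1.0, payment_weight=1.0):
    """Utility = mission value - weighted collision cost - weighted payment.

    The default weights are placeholders (assumptions), not values from the paper.
    """
    return (mission_value
            - collision_weight * collision_cost
            - payment_weight * total_payment)


# Centralized baseline: no payment is modeled, so total_payment is 0 and the
# result does not depend on payment_weight.
baseline = platform_utility(10.0, 2.0, 0.0,
                            collision_weight=0.5, payment_weight=3.0)

# Crowdsourced run: the payment term lowers the utility.
crowd = platform_utility(10.0, 2.0, 4.0,
                         collision_weight=0.5, payment_weight=0.25)
```

With these example inputs the baseline utility is unchanged by any choice of payment weight, which mirrors the statement that centralized baselines carry no payment term.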
4.2. Incentive and Reliability Update for UAV Crowdsourcing
To ensure fair and reliable participation in a crowdsourced emergency logistics network, we propose an incentive and reliability update mechanism that adjusts compensation based on contribution signals and reliability records. The goals are to (i) encourage time-critical participation, (ii) discourage task abandonment, and (iii) prioritize reliable UAVs for high-urgency missions.
For each UAV, we compute a normalized contribution score from status and outcome signals (Equation (2)). Its inputs are the estimated distance (or travel time) to the assigned task/disaster region, the energy expenditure, the task urgency weight, and an indicator function. The raw signals pass through monotone normalizer functions, and the terms are combined with non-negative coefficients.
We maintain a reliability score per UAV. A simple exponentially smoothed update with a fixed update rate is used (Equation (3)). This makes reliability decrease under collisions or dropouts and increase under consistent success.
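A minimal sketch of such an exponentially smoothed reliability update is given below. The binary outcome encoding (1 for success, 0 for a collision or dropout), the function name, and the example rate are assumptions made for illustration; the update used in the experiments may use a different outcome signal.

```python
def update_reliability(reliability, outcome, rate=0.25):
    """Exponentially smoothed reliability update.

    outcome: 1.0 for a successful task, 0.0 for a collision or dropout
             (this encoding is an assumption).
    rate:    update rate in (0, 1]; larger values react faster to new outcomes.
    """
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")
    return (1.0 - rate) * reliability + rate * outcome


r = 0.5
after_failure = update_reliability(r, 0.0)  # moves toward 0
after_success = update_reliability(r, 1.0)  # moves toward 1
```

Because the new score is a convex combination of the old score and the outcome, it stays in [0, 1] whenever the initial score does.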
The incentive payment is defined as a base payment plus a contribution-dependent term (Equation (4)). The contribution-dependent term uses a bounded function (e.g., a clipped linear function) and a coefficient that controls the sensitivity to contribution and reliability. The total payment P is the sum of the individual payments over the set of participating/selected UAVs in the episode (Equation (5)).
To quantify inequality among participating UAVs, we compute the Gini coefficient G of the payments (Equation (6)), where a small constant is included to avoid division by zero. This is reported only for the proposed crowdsourcing mechanism (centralized baselines do not model payments).
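For concreteness, the sketch below computes total payment and a Gini coefficient from per-UAV payments. It assumes the standard mean-absolute-difference form of the Gini coefficient with a small constant added to the denominator; the exact expression used in the experiments is not reproduced here, and the function name and example payments are hypothetical.

```python
def gini(payments, eps=1e-12):
    """Gini coefficient in mean-absolute-difference form (assumed variant).

    G = sum_i sum_j |p_i - p_j| / (2 * n * sum_i p_i + eps)
    eps guards against division by zero when all payments are zero.
    """
    n = len(payments)
    if n == 0:
        return 0.0
    total = sum(payments)
    abs_diff = sum(abs(a - b) for a in payments for b in payments)
    return abs_diff / (2.0 * n * total + eps)


payments = [0.0, 0.0, 0.0, 4.0]   # hypothetical payments for four UAVs
total_payment = sum(payments)     # total payment over selected UAVs
inequality = gini(payments)       # close to 0.75 for this example
```

Equal payments give a value of 0, and concentrating all payment on one of n UAVs gives (n - 1) / n, so values near 1 indicate that few participants receive most of the payment.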
4.3. Hybrid GA–ACO–(PPO/DQN) Algorithm for UAV Path Planning
To optimize UAV routing under obstacles, restricted zones, and energy constraints, we adopt a hybrid planning-and-learning approach. GA and ACO generate and refine global route candidates (planning level), while PPO/DQN policies perform local control and safety-aware adjustments (policy/execution level).
A route for each UAV is represented as a discrete waypoint sequence (or a sequence of grid actions). We minimize a weighted cost (Equation (7)) that combines four terms: the path length L, a count of near-obstacle risky steps, a count of restricted-zone violations (hard constrained or heavily penalized), and an energy proxy.
GA maximizes a fitness over a population of candidate paths.
In ACO, the probability of moving from node u to node v (Equation (8)) is determined by the pheromone intensity and the heuristic desirability (e.g., inverse distance-to-goal or safety score) of the move, evaluated over the neighbor set of u. Pheromone is updated (Equation (9)) through evaporation and a deposit, scaled by a constant Q, along elite paths (e.g., the top-K paths from GA/ACO).
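The ACO step can be illustrated with the textbook Ant System rules, which are assumed here because the exact expressions are not reproduced in this text: transition weights proportional to pheromone raised to a power times heuristic raised to a power, and an update that evaporates all edges before depositing Q divided by path cost on elite paths. The exponents `alpha` and `beta`, the dictionary layout, and the toy graph are hypothetical.

```python
import random


def transition_probs(u, neighbors, pheromone, heuristic, alpha=1.0, beta=2.0):
    """Return {v: probability} of moving from u to each neighbor v."""
    weights = {v: (pheromone[(u, v)] ** alpha) * (heuristic[(u, v)] ** beta)
               for v in neighbors}
    total = sum(weights.values())
    if total == 0.0:
        # No information on any edge: fall back to a uniform choice.
        return {v: 1.0 / len(neighbors) for v in neighbors}
    return {v: w / total for v, w in weights.items()}


def update_pheromone(pheromone, elite_paths, path_cost, evaporation=0.1, q=1.0):
    """Evaporate every edge, then deposit q / cost along each elite path."""
    for edge in pheromone:
        pheromone[edge] *= (1.0 - evaporation)
    for path in elite_paths:
        deposit = q / path_cost(path)
        for edge in zip(path, path[1:]):
            pheromone[edge] = pheromone.get(edge, 0.0) + deposit
    return pheromone


# Toy graph: a -> {b, c} -> d, with the move a -> b judged more desirable.
pheromone = {("a", "b"): 1.0, ("a", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 1.0}
heuristic = {("a", "b"): 1.0, ("a", "c"): 0.5, ("b", "d"): 1.0, ("c", "d"): 1.0}

probs = transition_probs("a", ["b", "c"], pheromone, heuristic)
next_node = random.choices(list(probs), weights=list(probs.values()))[0]

# Reinforce one elite path, using hop count as a stand-in for the route cost.
update_pheromone(pheromone, [["a", "b", "d"]],
                 path_cost=lambda p: len(p) - 1, evaporation=0.5, q=2.0)
```

In this toy case the elite path's edges end up with more pheromone than the unused edge, so later ants are more likely to follow it; in the framework, the hop-count cost would be replaced by the weighted route cost.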
Given a refined global route suggestion (e.g., waypoint guidance), PPO/DQN executes local actions based on the current observation under collision and restricted-zone constraints (Equation (10)). The action is either sampled from a PPO policy (stochastic) or chosen greedily from DQN.
4.4. Swarm-Intelligence Signals and Learning-Based UAV Task Allocation
Efficient task assignment is crucial for multi-UAV emergency logistics under heterogeneous constraints and dynamic task arrivals [15, 16]. We propose a decentralized task allocation scheme that integrates (i) pheromone-like swarm signals and (ii) learning-based policies for selection and navigation. The objective is to reduce conflicts, balance workload, and prioritize urgent tasks without requiring a single centralized controller at execution time.
Figure 4 shows a traditional centralized or single-policy baseline in a grid environment.
For each UAV and task pair, we define an attractiveness score (Equation (11)) that combines a task-level pheromone intensity, a feasibility heuristic (e.g., inverse distance and energy feasibility), the reliability, and the urgency. Each UAV then samples a task according to Equation (12).
We filter infeasible assignments by a hard constraint (Equation (13)) and enforce a selection probability of zero if an assignment is infeasible.
After task completion or failure, the task pheromone is updated via Equation (14), which includes a task pheromone decay term.
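The selection step can be sketched as follows. The product form of the attractiveness score is an assumption chosen for illustration (the exact combination of pheromone, heuristic, reliability, and urgency is not reproduced here), and the function name and inputs are hypothetical; the hard feasibility mask follows the description above by assigning zero probability to infeasible tasks.

```python
import random


def task_probabilities(pheromone, heuristic, reliability, urgency, feasible):
    """Selection probabilities over tasks for one UAV.

    Score per task = pheromone * heuristic * reliability * urgency
    (assumed product form); infeasible tasks are forced to probability zero.
    """
    scores = [p * h * reliability * u if ok else 0.0
              for p, h, u, ok in zip(pheromone, heuristic, urgency, feasible)]
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(scores)  # no feasible task for this UAV
    return [s / total for s in scores]


probs = task_probabilities(
    pheromone=[1.0, 1.0, 2.0],
    heuristic=[0.5, 1.0, 1.0],
    reliability=0.8,
    urgency=[1.0, 2.0, 1.0],
    feasible=[True, True, False],   # third task violates a hard constraint
)

# Sample a task index only when at least one task is feasible.
chosen = random.choices(range(len(probs)), weights=probs)[0] if any(probs) else None
```

In this example the third task is never selected despite its higher pheromone, and the more urgent second task is chosen most often.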
4.5. Computational Complexity Discussion
We briefly discuss the computational complexity as the UAV network size N and the number of tasks m increase.
Incentive and reliability update (Algorithm 1). Each update cycle processes each UAV once, giving $O(N)$. If an optional reliability record adds verification overhead per UAV, the worst-case cost increases by that per-UAV overhead.
Hybrid path planning (Algorithm 2). Let P be the GA population size, G the number of GA generations, and ℓ the expected path length (here P and G denote GA parameters, not the payment and Gini coefficient of Section 4.2). GA evolution costs $O(P G \ell)$ due to repeated cost evaluations in (7). ACO updates scale with visited edges: the pheromone update is linear in the induced graph size per iteration (or linear in the total length of the elite paths for elite-path-only updates). Policy execution over horizon H yields $O(H)$ per episode for small discrete action spaces.
Task allocation (Algorithm 3). In the worst case, each allocation round evaluates feasibility/attractiveness over m tasks per UAV, yielding $O(Nm)$. Pheromone and reliability updates remain linear in the number of participating UAVs and tasks updated.
Overall, the proposed design scales linearly with N for incentive updates and approximately linearly in both N and m for task allocation, while planning complexity depends on the GA/ACO hyperparameters and the environment size.
Algorithm 1 Incentive and Reliability Update for Crowdsourced UAVs
Require: UAV network, task pool T, (optional) reliability record L
1: Initialize reliability scores and payment parameters
2: Set contribution weights and reliability update rate
3: while simulation is running do
4:     for each allocated task and participating UAV do
5:         Observe status/outcome signals
6:         Compute contribution using (2)
7:         Update reliability using (3)
8:         Compute payment using (4)
9:         if L is enabled then
10:             Store the update in record L for auditing
11:     Compute total payment P via (5) and fairness G via (6)
Algorithm 2 GA–ACO–(PPO/DQN) Hybrid UAV Path Planning
Require: UAV network, map M, obstacle set, restricted zones, destination set D
1: for each UAV do
2:     Initialize GA population of paths and evaluate cost via (7)
3:     while GA termination not met do
4:         Selection/crossover/mutation to evolve paths using fitness
5:         Keep elite paths
6:     Initialize ACO pheromone and heuristic; compute transitions via (8)
7:     Update pheromone using elite paths via (9)
8:     Produce refined route guidance
9:     Execute local control using PPO/DQN policy under safety constraints via (10)
10: Return refined global route candidates and executed trajectories
Algorithm 3 Swarm-Intelligence Signals and Learning-Based Task Allocation
Require: UAV network, task set, urgency weights
1: Initialize task pheromone and reliability scores
2: while tasks remain do
3:     for all UAVs do
4:         Compute feasibility and heuristic; set the probability of infeasible tasks to zero via (13)
5:         Compute attractiveness via (11) and sample a task via (12)
6:         Execute navigation/control toward the sampled task using PPO/DQN policy under safety constraints
7:         Update task pheromone via (14) and update reliability via (3)
8: Return UAV–task assignments and executed trajectories
5. Simulation Results and Performance Analysis
In this section, we evaluate the proposed crowdsourced multi-UAV framework against centralized single-policy baselines under both nominal and stressed conditions. The goal is twofold: (i) provide an apples-to-apples comparison under matched simulator settings, and (ii) explicitly report baseline-under-stress outcomes under crowdsourcing-relevant stressors (unreliability and dropout), together with incentive feasibility metrics (payment and fairness) that centralized baselines do not define.
5.1. Simulation Environment and Parameterization
We implement a grid-based simulator in Python 3.14 with Matplotlib-based visualization. Each episode is executed on a grid of a fixed default size with randomized obstacle placement and a dynamically positioned disaster region. We consider two deployment regimes:
Centralized single-policy baselines (PPO, DQN): A centralized controller operates with a fixed policy for dispatch/navigation. These baselines do not include incentive payment; hence payment-related metrics (total payment, Gini) are not applicable.
Proposed crowdsourced multi-UAV: A pool of UAVs (crowd) is instantiated. UAVs are heterogeneous in reliability and availability. The system selects/dispatches UAVs based on distance/energy/reliability constraints and computes incentive payments with fairness tracking.
We use a grid-world to enable controlled ablations and stress testing with consistent collision semantics and reproducible randomness. The default grid size balances (i) non-trivial obstacle interaction, (ii) sufficient horizon to observe failure/timeout modes, and (iii) tractable multi-seed evaluation. The scope of sensitivity analysis with respect to grid size and obstacle density/scale is discussed in Section 5.9.
Table 1 summarizes the parameters required to reproduce the experiments. Parameters marked as “(from CSV/config)” must be consistent with the released configuration used to generate the CSV logs.
Figure 5 illustrates the crowdsourced multi-UAV environment.
5.2. Agent Interface: State, Action, Reward, and Interaction Flow
For reproducibility, we explicitly document the agent interface used by PPO/DQN and the platform-level logging.
We consider two discrete action sets: (i) 4-directional moves and (ii) 8-directional moves. Actions that would leave the grid are clipped (or rejected) according to the simulator rule used in the logged runs.
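The two action sets and the two boundary rules can be sketched as below. The coordinate convention, the constant names, and the function names are assumptions for illustration; which boundary rule applies in the logged runs is determined by the simulator configuration, not by this sketch.

```python
# Moves as (dx, dy) offsets; the axis convention is an assumption.
ACTIONS_4 = [(0, 1), (0, -1), (-1, 0), (1, 0)]
ACTIONS_8 = ACTIONS_4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]


def step_clipped(pos, action, size):
    """Apply a move and clip each coordinate to the range [0, size - 1]."""
    x, y = pos
    dx, dy = action
    return (min(max(x + dx, 0), size - 1), min(max(y + dy, 0), size - 1))


def step_rejected(pos, action, size):
    """Apply a move; if it would leave the grid, keep the current position."""
    nx, ny = pos[0] + action[0], pos[1] + action[1]
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return pos


# A diagonal move at the left edge shows how the two rules differ.
clipped = step_clipped((0, 3), (-1, 1), size=5)    # slides along the edge
rejected = step_rejected((0, 3), (-1, 1), size=5)  # stays in place
```

The two rules coincide for axis-aligned moves but differ for diagonal moves at a boundary, which is why the rule in force should be stated alongside the 8-directional results.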
The observation includes at minimum the agent position and goal/disaster encoding. A reproducible implementation should specify the exact vectorization; its dimensionality and included fields must match the PPO/DQN training configuration used to generate the CSV logs.
We use a shaped reward with (i) success reward, (ii) step penalty, and (iii) collision penalty. The weights of these three terms are fixed in a run and should be reported (Table 2). An episode terminates upon reaching the disaster region (success), upon a collision when the terminal-collision rule applies, or upon timeout.
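A minimal sketch of a reward with these three terms is shown below. The weight values are placeholders, not the values used in the logged runs, and applying the step penalty on every step (including terminal ones) is an assumption.

```python
def shaped_reward(reached_goal, collided,
                  success_reward=10.0, step_penalty=0.25, collision_penalty=5.0):
    """Per-step reward: step penalty, plus success bonus, minus collision penalty.

    All three weights are hypothetical placeholders.
    """
    reward = -step_penalty
    if reached_goal:
        reward += success_reward
    if collided:
        reward -= collision_penalty
    return reward


ordinary_step = shaped_reward(reached_goal=False, collided=False)
goal_step = shaped_reward(reached_goal=True, collided=False)
collision_step = shaped_reward(reached_goal=False, collided=True)
```

The step penalty encourages shorter routes, while the relative size of the collision penalty and the success reward controls how conservative the learned policy becomes.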
We compute platform utility using Equation (1) defined in Section 4. Centralized baselines have P = 0 by construction, while the proposed crowdsourcing method logs payment P and fairness G (Gini). Table 3 in Section 4 provides the CSV-to-symbol mapping used in this section.
Each episode proceeds as follows: (1) sample map/obstacles/disaster region, (2) initialize UAV pool with reliability and energy states, (3) dispatch selection (crowdsourcing only), (4) execute navigation with PPO/DQN (policy level) optionally guided by GA/ACO route candidates (planning level), (5) update reliability/payment (crowdsourcing only), (6) log metrics (success, reach steps, collisions, energy left, payment, Gini, utility).
5.3. Training Setup and Hyperparameters (Reproducibility)
We train PPO and DQN policies in the base UAV environments (4-directional and 8-directional variants) and report learning curves for transparency and reproducibility.
Note on hyperparameters. We emphasize that the contribution of this paper is evaluated through (a) matched simulator settings, (b) baseline-under-stress reporting, and (c) incentive feasibility metrics (Payment/Gini) that centralized baselines do not define. To avoid introducing unverified values, we omit non-loggable hyperparameters from the main table and rely on the released run artifacts (CSV logs and, when available, configuration scripts) for exact replication.
Figure 6 shows the curves for UAVs trained with DQN and PPO in the UAVEnv (4-directional).
Due to page limits, we do not enumerate full optimizer/network hyperparameters in the main text. Instead, we provide (i) the complete per-episode CSV logs used to compute all mean ± std values reported in this section, and (ii) a compact description of the evaluation protocol and stress settings that are directly verifiable from the logs. All reported results are computed from the logged metrics under the same simulator rules (grid size, horizon, collision semantics, and success criterion). Centralized baselines do not model incentives. Table 4 lists the simulation parameters that are directly verifiable from the simulator protocol and the released CSV logs. These explicit items directly address reproducibility gaps: state/action/reward definition, hyperparameters, and episode protocol. Figure 7 shows the reward trends of DQN, PPO, and cooperative agents in the 8-directional environment.
5.4. Compared Methods
We compare three methods:
Centralized PPO (Multi-UAV): PPO policy used as the centralized baseline controller.
Centralized DQN (Multi-UAV): DQN policy used as the centralized baseline controller.
Proposed (Crowdsourced Multi-UAV): A crowdsourcing framework that combines (i) RL-based navigation/control (PPO/DQN) and (ii) GA/ACO planning-level refinement, together with incentive and reliability updates. GA/ACO operates at the planning level to refine global candidates, while PPO/DQN executes local control under safety constraints.
5.5. Evaluation Metrics
For each evaluation episode, we record the following: success ratio S, reach steps, collision event rate per step, energy left, total payment P (crowdsourcing only), payment inequality G (crowdsourcing only), and platform utility U computed using Equation (1) (Section 4). All results are summarized as mean ± std over evaluation episodes.
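The mean ± std aggregation over per-episode records can be sketched as follows. The record layout and field names are hypothetical (they do not reproduce the released CSV schema), and the use of the sample standard deviation is an assumption, since the text does not state which estimator is used.

```python
import statistics


def summarize(episodes, metric):
    """Return (mean, std) of a metric over episodes, or None if not applicable.

    Episodes where the metric is missing or None are skipped, which is how
    payment-related metrics are treated for centralized baselines here.
    """
    values = [ep[metric] for ep in episodes if ep.get(metric) is not None]
    if not values:
        return None
    mean = statistics.mean(values)
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, std


# Hypothetical per-episode records for a centralized baseline.
episodes = [
    {"success": 1, "utility": 2.0, "payment": None},
    {"success": 0, "utility": -1.0, "payment": None},
    {"success": 1, "utility": 3.0, "payment": None},
    {"success": 0, "utility": 0.0, "payment": None},
]

success_stats = summarize(episodes, "success")
utility_stats = summarize(episodes, "utility")
payment_stats = summarize(episodes, "payment")  # None: not applicable
```

Returning `None` rather than zero for undefined metrics keeps "not applicable" distinct from a measured payment of zero in the summary tables.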
5.6. Overall Performance Comparison
Table 5 reports the primary comparison under the same multi-UAV evaluation setup. Centralized baselines do not include incentives; payment and Gini are not applicable.
The proposed approach is evaluated under crowdsourcing-specific constraints (unreliability/dropout, incentive cost, and dispatch feasibility gates). As a result, episodes can terminate by timeout (at the 200-step horizon) when (i) selected UAVs drop out mid-episode, (ii) reliability gating rejects feasible continuation, or (iii) collision-avoidance and constraint penalties dominate exploration, producing conservative behavior. This explains the observed ReachSteps saturation at 200 in Table 5. For transparency, we provide (i) utility decomposition (Figure 8), (ii) stress sweep results under matched settings (Table 6 and Table 7), and (iii) a module ablation study and a protocol-consistent stress sweep summary (Section 5.8 and Section 5.9) that isolate which modules contribute to success/timeout behavior.
Figure 8 visualizes U together with its components (mission value, collision cost, and payment) for the proposed method, and juxtaposes baseline utilities. Negative-utility regimes are primarily driven by collision-related costs and secondarily by incentive expenditure.
To present the overall contrast visually, Figure 9 plots success and utility as mean ± std.
5.7. Stress Tests: Unreliable Participants and Participation Dropout
Robustness must be evaluated together with baselines under the same stress settings. We conduct two stress sweeps: (i) the unreliable ratio r (fraction of unreliable UAVs) and (ii) the participation dropout probability.
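The two stressors can be injected into a simulated UAV pool roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, unreliable UAVs are chosen uniformly at random, and dropout is modeled as an independent per-UAV draw; the simulator used for the logged runs may apply these stressors differently.

```python
import random


def mark_unreliable(n_uavs, unreliable_ratio, rng):
    """Flag round(ratio * n) UAVs, chosen uniformly at random, as unreliable."""
    k = round(unreliable_ratio * n_uavs)
    unreliable = set(rng.sample(range(n_uavs), k))
    return [i in unreliable for i in range(n_uavs)]


def apply_dropout(active, dropout_prob, rng):
    """Each active UAV independently drops out with probability dropout_prob."""
    return [a and rng.random() >= dropout_prob for a in active]


rng = random.Random(0)                      # fixed seed for reproducibility
flags = mark_unreliable(10, 0.3, rng)       # 3 of 10 UAVs are unreliable
still_active = apply_dropout([True] * 10, 0.2, rng)
```

Passing an explicit seeded generator keeps the stress pattern identical across compared methods, which is what a matched-stress comparison requires.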
5.7.1. Unreliable Ratio Sweep
Table 6 reports mean ± std across stress levels. Figure 10 visualizes success and utility trends.
As unreliability increases, the proposed method preserves measurable and bounded incentive outcomes (payment and Gini), which are absent from centralized baselines. Success degrades with r, consistent with the presence of unreliable participants.
5.7.2. Participation Dropout Sweep
Table 7 reports mean ± std across dropout levels. Figure 11 shows success/utility trends.
In this implementation, centralized baselines do not model volunteer participation and incentives; thus, dropout does not affect their outcomes. The proposed method explicitly models participation dynamics; therefore, its sensitivity to dropout is observable, which reflects a realistic property of crowdsourced operation.
5.8. Module Ablation Study
We conduct a module-level ablation to isolate which components drive success versus timeout behavior (ReachSteps saturation at 200). Specifically, we compare (i) RL-only execution (PPO/DQN), (ii) RL+GA, (iii) RL+ACO, (iv) incentive/reliability gating on/off (crowdsourcing-only), and (v) full system. For each variant, we report success, utility, Coll. (event/step), and ReachSteps as mean ± std under the same evaluation protocol.
5.9. Stress Sweep Summary (Protocol-Consistent)
To avoid over-claiming beyond what is directly supported by the logged evaluations, we summarize here a protocol-consistent stress sweep under the unreliable-participant ratio r (Section 5.7). This stressor is crowdsourcing-relevant and remains well-defined for all compared methods, allowing the centralized baselines and the proposed crowdsourced method to be re-evaluated under the same environment/map generation, obstacle placement process, and collision/success rules.
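A protocol-consistent sweep of this kind amounts to evaluating every method on an identical grid of stress levels and seeds. The following sketch shows one possible harness; the `evaluate(method, ratio, seed)` callback is hypothetical and stands in for whatever simulator entry point produces a scalar metric such as success rate.

```python
from statistics import mean, stdev


def run_sweep(methods, ratios, seeds, evaluate):
    """Evaluate every method on the same (ratio, seed) grid.

    `evaluate(method, ratio, seed)` is a hypothetical callback returning one
    scalar metric. Sharing the seeds across methods is what keeps map
    generation and obstacle placement matched between them.
    """
    table = {}
    for method in methods:
        for ratio in ratios:
            scores = [evaluate(method, ratio, seed) for seed in seeds]
            spread = stdev(scores) if len(scores) > 1 else 0.0
            table[(method, ratio)] = (mean(scores), spread)
    return table


if __name__ == "__main__":
    # Toy stand-in for the simulator: the metric falls linearly with the ratio.
    def toy_evaluate(method, ratio, seed):
        return 1.0 - ratio

    result = run_sweep(["baseline", "proposed"], [0.0, 0.5], [0, 1, 2], toy_evaluate)
    for key, (m, s) in sorted(result.items()):
        print(key, f"{m:.2f} ± {s:.2f}")
```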
Broader environment-parameter sensitivity analyses (e.g., varying grid size or obstacle density/scale) are valuable for external validity; however, in this revision we do not include such multi-setting sweeps in the main text, in order to keep the comparison tightly controlled under a single simulator protocol and to stay within page limits. Accordingly, we refrain from making grid-size or obstacle-density sensitivity claims in the main paper and focus on the log-verifiable stress sweep that is reproducible under the same protocol.
Platform utility depends on the payment-weight term in Equation (1); for centralized baselines, payments are not modeled by design, and thus utility is invariant to the payment weight. Throughout this paper, we report utilities under the fixed payment weight used in the logged runs.
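The invariance argument can be checked with a toy calculation. The sketch below assumes, purely for illustration, an additive utility in which payment enters only through a weighted term; Equation (1) remains the authoritative definition, and the function and parameter names here are hypothetical.

```python
def platform_utility(mission_value, collision_cost, total_payment, payment_weight):
    """Illustrative additive utility (an assumed form, not Equation (1) itself).

    Payment is penalized through `payment_weight`; if total_payment is zero,
    the weight has no effect on the result.
    """
    return mission_value - collision_cost - payment_weight * total_payment


if __name__ == "__main__":
    # Centralized baseline: no payments, so the weight is irrelevant.
    for w in (0.1, 1.0, 10.0):
        print("baseline", w, platform_utility(10.0, 3.0, 0.0, w))
    # Crowdsourced method: utility decreases as the weight grows.
    for w in (0.1, 1.0, 10.0):
        print("crowdsourced", w, platform_utility(10.0, 3.0, 4.0, w))
```

Under this assumed form, a method with zero payment yields the same utility at every weight, whereas any method that pays participants shows a utility that falls as the weight increases.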
Table 8 should be interpreted as a protocol-consistent stress sweep summary (under a crowdsourcing-relevant stressor that is definable for all methods), rather than as a general environment-parameter sensitivity claim. Across all values of r, the centralized baselines remain unaffected by payment terms (not modeled), whereas the proposed method remains evaluable under the same simulator rules while additionally incorporating crowdsourcing-relevant constraints (reliability-aware dispatch and incentive accounting). The persistent ReachSteps saturation at the step limit for the proposed method is consistent with the timeout behavior discussed in Section 5.6 and is further examined via the stress tables (Section 5.7) and the module ablation study (Section 5.8).
Instead of asserting broad scenario sensitivity without a complete set of multi-setting sweeps in the main text, this subsection provides a conservative, reproducible summary under a stressor (r) that can be applied consistently across all compared methods. This improves reviewer-facing transparency by (i) avoiding placeholder figures, (ii) aligning claims strictly with log-verifiable evaluations, and (iii) making the demonstrated scope explicit.
5.10. Centralized vs. Crowdsourced Deployment: Qualitative Trajectories
We visualize representative trajectories to help interpret the quantitative results.
Figure 12 shows these trajectories in the evaluation environment.
5.11. Evaluation Summary and Reviewer-Facing Takeaways
This section provides (i) a main comparison table reporting mean ± std (Table 5), (ii) baseline-under-stress results under matched stress settings (Table 6 and Table 7), (iii) a utility decomposition (Figure 8) to explain negative-utility regimes, (iv) a module ablation (Section 5.8) to isolate GA/ACO/incentive contributions, and (v) a protocol-consistent stress sweep summary (Section 5.9) that makes the demonstrated scope explicit.
Accordingly, the comparative narrative is as follows:
Centralized PPO/DQN quantify navigation-centric performance under fixed-control assumptions.
The proposed method quantifies crowdsourcing feasibility (payments, inequality) and robustness to unreliable participation and dropout.
Ablations and the stress sweeps explain which modules and which stressors drive timeout (ReachSteps saturation), collisions, and utility.
6. Conclusions
In this paper, we presented a crowdsourced multi-UAV emergency-response framework that enables heterogeneous UAV participants to be coordinated through an incentive-aware dispatch mechanism in dynamic and uncertain environments. Unlike centralized single-policy deployment—where a fixed controller executes navigation without modeling participant economics—our framework explicitly treats each UAV as a potentially unreliable and intermittently available contributor. To operationalize this setting, we integrate (i) reinforcement-learning-based decision policies (PPO and DQN) for local navigation/control and (ii) metaheuristic optimization (GA and ACO) for route refinement and candidate selection. This hybrid design is intended to balance policy generalization (RL) with global/path-level optimization (GA/ACO), while allowing the platform to dispatch the most suitable UAV(s) under real-time constraints such as distance, energy, and reliability.
A major emphasis of this work is reviewer-facing robustness: we evaluated the proposed scheme against centralized PPO/DQN baselines under matched environment settings, and we additionally reported baseline-under-stress results under two stress dimensions that are critical in crowdsourcing: (1) unreliable participant ratio and (2) participation dropout probability. These stress tests quantify how performance degrades as the participant pool becomes adversarial or intermittently unavailable, and they provide an apples-to-apples comparison that is often missing in multi-UAV incentive studies. Beyond navigation-centric metrics (success, collisions, reach steps), our evaluation reports platform-level outcomes that centralized baselines do not define by construction, including total payment and payment inequality (Gini). This enables a realistic assessment of crowdsourcing feasibility, where the platform must simultaneously achieve mission success and maintain economically interpretable and reasonably fair incentive outcomes.
Overall, the results support the following conclusion: centralized baselines can appear strong when incentives and participation dynamics are ignored, but such results do not directly translate to a deployable crowdsourcing system. By contrast, the proposed framework explicitly exposes the trade-off between mission objectives and incentive expenditure, and it remains evaluable and interpretable under unreliability and dropout. Therefore, this work provides a practical foundation for multi-UAV crowdsourcing in time-critical disaster-response scenarios, where robustness and incentive/fairness characteristics are first-class requirements rather than afterthoughts.