Highlights
What are the main findings?
- A novel infrastructure-aware UAV path planning framework is developed, integrating surveillance quality assessment and Deep Reinforcement Learning (DRL) for enhanced urban airspace operations.
- The proposed DDQN-CNN model effectively balances goal reachability, obstacle avoidance, and surveillance compliance, outperforming conventional baselines across multiple metrics.
What is the implication of the main finding?
- Embedding real-world infrastructure constraints into navigation policies substantially improves operational safety and regulatory conformance in complex urban environments.
- The framework provides a scalable foundation for intelligent and decentralized airspace management systems, supporting future Urban Air Mobility (UAM) integration.
Abstract
Urban Air Mobility (UAM) requires reliable communication and surveillance infrastructures to ensure safe Unmanned Aerial Vehicle (UAV) operations in dense metropolitan environments. However, urban infrastructure is inherently heterogeneous, leading to significant spatial variations in monitoring performance. This study proposes a unified framework that integrates infrastructure readiness assessment with Deep Reinforcement Learning (DRL)-based UAV path planning. Using Singapore as a representative case, we employ a data-driven methodology combining clustering analysis and in situ measurements to estimate the citywide distribution of surveillance quality. We then introduce an infrastructure-aware path planning algorithm based on a Double Deep Q-Network (DDQN) with a convolutional architecture, which enables UAVs to learn efficient trajectories while avoiding surveillance blind zones. Extensive simulations demonstrate that the proposed approach significantly improves path success rates, reduces traversal through poorly monitored regions, and maintains high navigation efficiency. These results highlight the potential of combining infrastructure modeling with DRL to support performance-aware airspace operations and inform future UAM governance systems.
1. Introduction
Urban Air Mobility (UAM) has emerged as a promising paradigm for enhancing metropolitan transportation systems by integrating autonomous Unmanned Aerial Vehicles (UAVs) into low-altitude airspace [1]. These aerial operations are expected to transform not only logistics and emergency response but also passenger mobility in dense urban settings. To enable the safe and scalable deployment of UAM, robust Communication, Navigation, and Surveillance (CNS) infrastructures are essential for ensuring continuous tracking, conformance monitoring, and airspace coordination [2].
However, the assumption of uniformly available CNS services across a city does not hold in practice. Urban environments are characterized by tall buildings, signal occlusion, network load variations, and heterogeneous infrastructure deployment, all of which can lead to significant spatial differences in surveillance performance. These disparities create “surveillance blind zones” that undermine critical safety functions such as conflict detection and conformance monitoring [3].
As UAV operations scale up, regulatory frameworks are beginning to require minimum surveillance and communication standards for specific airspace classes. In this context, evaluating infrastructure readiness becomes essential not only for long-term planning but also for flight plan approval. Nonetheless, directly measuring latency or surveillance performance citywide is infeasible due to cost and scalability [4,5]. This motivates a data-driven assessment approach that can model and quantify the spatial distribution of CNS performance using open datasets, clustering methods, and limited field experiments.
Once these spatial performance patterns are known, the remaining challenge is to establish an airspace management framework that incorporates infrastructure readiness and the spatial distribution of navigation service performance, based on which infrastructure-aware flight management becomes possible. Traditional path planning algorithms focus mainly on obstacle avoidance and distance minimization and do not account for infrastructure quality [6]. Without infrastructure awareness, planners may inadvertently route UAVs through low-surveillance areas, violating regulatory constraints and compromising operational safety.
Reinforcement Learning (RL) has shown strong adaptability to complex navigation environments [7]. Yet, existing RL-based planners typically assume homogeneous or abstracted environmental feedback and rarely integrate real-world infrastructure performance into their policy design. This limits their applicability in realistic UAM contexts.
Beyond the technical challenges of path safety and infrastructure awareness, the future UAM ecosystem will also involve diverse stakeholders, requiring navigation strategies that align with emerging models of decentralized governance and trust [8]. Recent research has proposed blockchain-based airspace management systems to support secure airspace reservation, dynamic allocation, and auditable governance under high traffic volumes [9]. These developments highlight a broader shift towards infrastructure- and trust-aware urban airspace operations, where navigation strategies must be dynamically adaptable to service quality, safety constraints, and evolving coordination protocols.
To address these challenges, this paper presents a unified framework that combines urban surveillance performance assessment with deep reinforcement learning-based UAV path planning. We first construct a data-driven model to estimate the spatial distribution of surveillance quality in the urban environment, using Singapore as a representative case. Then, we develop a learning-based planning system that incorporates this spatial information to intelligently avoid regions with the poorest monitoring performance, while still ensuring route efficiency and reachability. By integrating infrastructure-awareness into navigation decision-making, our approach enhances operational safety and regulatory compliance and provides a scalable foundation for future UAM integration in dense urban contexts.
1.1. Related Works
Urban environments pose substantial challenges for the safe integration of Unmanned Aircraft Systems (UASs) due to spatial heterogeneity in Communication, Navigation, and Surveillance (CNS) infrastructure performance. The Performance-Based Navigation (PBN) framework established by ICAO emphasizes that navigation requirements and operational safety are inherently dependent on the local availability and quality of CNS services, including ground-based infrastructure and airborne equipment [10]. In this context, the FAA’s UTM Concept of Operations v2.0 further highlights that flight authorizations and performance assessments must consider the dynamic variability of surveillance and communication availability, especially in complex urban airspaces [11]. These institutional frameworks emphasize the importance of performance-aware decision-making for flight planning and airspace access.
Building on these conceptual frameworks, a number of studies have sought to model CNS performance in urban contexts. For example, researchers have proposed probabilistic models to characterize surveillance quality using signal propagation, obstruction, or environmental variables such as the Sky Openness Ratio (SOR), with applications to navigation accuracy estimation and alert zone construction [12,13,14]. Advanced clustering methods have also been used to classify urban airspace according to CNS indicators, supporting performance-based airspace design and real-time monitoring of tracking capabilities [15,16]. Moreover, dependability analyses of smart city surveillance systems reveal how network layout and sensor reliability critically impact the availability and coverage of monitoring infrastructure [17].
Despite these developments, current approaches often focus on isolated technical domains—such as navigation or surveillance—but lack an integrated framework to spatially quantify and utilize CNS performance as a constraint for flight planning. Furthermore, most existing models rely on simulation or worst-case assumptions, rather than on empirical data-driven characterizations grounded in real urban infrastructure. These gaps underscore the need for operational methodologies that can assess infrastructure readiness for UAV operations at the city scale and dynamically inform downstream services such as trajectory planning or airspace reservation. Our work addresses this gap by constructing a comprehensive, data-driven model of surveillance quality using open infrastructure data, field measurements, and spatial clustering.
In the domain of urban flight path planning, Deep Reinforcement Learning (DRL) has emerged as a powerful tool for autonomous navigation in complex and dynamic environments. Traditional algorithms such as A* search [18,19], Rapidly-Exploring Random Trees (RRTs) [20], and Artificial Potential Fields (APFs) [21] have been widely used for UAV route optimization. While these classical methods are effective in static and fully known environments, they often struggle to adapt in real time to dynamic obstacles, variable surveillance constraints, or unforeseen hazards.
To overcome such limitations, learning-based approaches have gained increasing attention due to their ability to optimize navigation policies through trial-and-error interactions with uncertain environments. Deep Q-Networks (DQNs) and their variants have demonstrated strong performance in tasks such as obstacle avoidance and goal-directed flight in cluttered settings [22]. Recent surveys emphasize that conventional single-objective formulations—typically focused on distance or time—are insufficient to address modern UAV mission requirements involving collision risk, navigation uncertainty, and energy consumption [23,24]. In response, evolutionary computation and swarm intelligence methods have been explored to solve multi-objective path planning problems, offering greater robustness and flexibility in large-scale or three-dimensional environments [25,26,27,28].
In urban settings, navigation quality is not determined solely by physical obstacles but is also heavily influenced by local infrastructure characteristics such as sky openness, signal blockage, and GNSS multipath interference [12]. Several studies have incorporated environmental constraints such as limited flight time, maneuverability, and signal degradation into the optimization process, leveraging techniques like adaptive RRT*, Dubins curves, and Lyapunov-based guidance fields to generate feasible and efficient paths under strict operational constraints [29,30].
Reinforcement learning—particularly deep and multi-agent variants—has shown promise in addressing these multi-constrained, high-dimensional challenges. Such methods have been successfully applied to scenarios involving cooperative UAV operations, safe separation in dense urban airspace, and trade-offs between conflicting objectives such as energy efficiency and risk exposure [28,31,32]. Multi-agent DRL frameworks have further demonstrated their potential in structured airspace management and real-time conflict resolution in UAM, effectively scaling to heterogeneous vehicle types and evolving regulatory environments [33,34,35].
Despite these advances, few existing methods explicitly incorporate surveillance performance heterogeneity into the navigation policy itself. Most DRL-based planners assume homogeneous or implicit environmental feedback, leaving a gap in developing infrastructure-aware learning strategies that can proactively avoid regions with poor monitoring quality. This motivates the need for infrastructure-aware planning strategies that directly incorporate CNS heterogeneity into decision-making.
1.2. Contributions
This paper aims to bridge the gap between urban infrastructure heterogeneity and UAV path planning. The main contributions are summarized as follows:
- We propose a data-driven framework to quantify surveillance heterogeneity in urban environments, using Singapore as a representative case study.
- We design a deep reinforcement learning-based path planning algorithm that explicitly incorporates surveillance quality constraints, enabling UAVs to avoid regions with poor monitoring capabilities.
- We conduct comprehensive simulations to evaluate the proposed system, demonstrating improvements in safety-related metrics.
1.3. Organization of the Paper
The rest of this paper is organized as follows. Section 2 presents the methodology for assessing urban surveillance performance and discusses the case study results based on Singapore’s infrastructure data. Section 3 introduces the deep reinforcement learning framework for infrastructure-aware UAV path planning. Section 4 concludes the paper and discusses potential directions for future research.
3. DRL-Based Infrastructure-Aware Flight Planning
To ensure robust and efficient flight operations in urban environments, a path planning algorithm should not only avoid physical obstacles but also account for the communication and surveillance quality across different regions. Building on the infrastructure assessment in Section 2, we develop a Deep Reinforcement Learning (DRL)-based navigation system that incorporates infrastructure constraints into flight planning. The objective is to enable Unmanned Aerial Vehicles (UAVs) to learn optimal trajectories that avoid both obstacles and areas with poor surveillance performance.
3.1. Problem Formulation
We formulate the infrastructure-aware path planning task as a finite-horizon Markov Decision Process (MDP) [38], where a UAV navigates from a given starting position to a predefined destination on a grid-based urban map. Each environment instance encodes two spatial layers:
- An obstacle map $O \in \{0, 1\}^{H \times W}$, where each cell indicates whether the location is traversable (0) or blocked (1);
- A surveillance performance map $S \in \mathbb{R}^{H \times W}$, which reflects the monitoring quality available at each location, based on factors such as communication delay and signal coverage.
In Section 2, we clustered the urban environment into five surveillance performance categories using data-driven analysis. Among them, Cluster 1 was identified as having the poorest monitoring conditions, including the highest tracking delay and weakest coverage. In our path planning framework, we refer to these regions as surveillance blind zones and treat them as areas to avoid.
The overall planning objective is to find a path that not only avoids obstacles and reaches the goal efficiently but also maximizes monitoring safety by avoiding blind zones. However, overly strict optimization toward high surveillance performance could result in unnecessarily long or infeasible paths. To balance safety and navigability, we simplify the surveillance constraint into a binary formulation: only the poorest-performing cluster (Cluster 1) is designated as a forbidden zone, while all other areas are considered acceptable. This transforms the original multi-class optimization problem into a binary constraint-aware navigation task, improving both training efficiency and practical feasibility.
This can be formulated as a multi-objective optimization problem:

$$\min_{\tau} \; J(\tau) = \omega_1 \, L(\tau) + \omega_2 \, B(\tau),$$

where $L(\tau)$ denotes the length of path $\tau$, $B(\tau)$ denotes its exposure to surveillance blind zones, and $\omega_1$ and $\omega_2$ are weighting coefficients balancing efficiency and surveillance quality.
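In practice, the binary simplification reduces the surveillance layer to a boolean mask over the cluster labels. The following is a minimal Python sketch, assuming a NumPy integer array `cluster_map` holding the five cluster labels from Section 2; the array contents here are synthetic placeholders.

```python
import numpy as np

# Synthetic stand-in for the Section 2 clustering result (labels 0-4 on a 50x50 grid).
cluster_map = np.random.randint(0, 5, size=(50, 50))

BLIND_CLUSTER = 1  # label of the poorest-performing cluster (per the assessment above)

# Binary surveillance constraint: True where the cell is a surveillance blind zone.
blind_mask = cluster_map == BLIND_CLUSTER

# All non-blind clusters are treated as equally acceptable for planning.
print(f"Blind-zone fraction: {blind_mask.mean():.2%}")
```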
Formally, the MDP is defined as:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle,$$

where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $P$ is the transition probability function, $R$ is the reward function, and $\gamma \in (0, 1)$ is the discount factor.
At each time step $t$, the agent’s state consists of two components:
- A local observation window $o_t$, a $k \times k$ patch centered at the agent’s current position $p_t = (x_t, y_t)$, extracted from both the obstacle map $O$ and the surveillance map $S$;
- A relative goal vector $g_t$, computed as $g_t = (x_{\text{goal}} - x_t, \; y_{\text{goal}} - y_t)$.

Thus, the complete state is defined as $s_t = (o_t, g_t)$.
The action space consists of four discrete actions, up, down, left, and right, corresponding to cardinal movements on the grid. This discrete action space aligns with practical UAV control requirements in urban flight corridors and simplifies the learning process while maintaining sufficient maneuverability for the navigation task.
The transition function reflects the deterministic nature of the grid environment. Specifically,

$$P(s' \mid s, a) = \begin{cases} 1, & \text{if } s' \text{ is the resulting state after taking action } a \text{ in state } s, \\ 0, & \text{otherwise.} \end{cases}$$
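To make the formulation concrete, the following is a minimal Python sketch of such a grid environment. The window size, padding convention, and class interface are illustrative assumptions rather than the paper's exact implementation; reward computation is deferred to the design in Section 3.2.1.

```python
import numpy as np

class GridUAVEnv:
    """Minimal sketch of the grid MDP described above (sizes are assumptions)."""

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, obstacle_map, blind_mask, goal, window=11):
        self.O = obstacle_map                    # 1 = blocked, 0 = traversable
        self.S = blind_mask.astype(np.float32)   # 1 = surveillance blind zone
        self.goal = goal
        self.k = window                          # local window size k (odd, assumed)

    def reset(self, start):
        self.pos = start
        return self._state()

    def _state(self):
        """Build s_t = (o_t, g_t): a 2-channel k x k patch plus the goal vector."""
        r = self.k // 2
        O_pad = np.pad(self.O, r, constant_values=1)   # out-of-map cells act as blocked
        S_pad = np.pad(self.S, r, constant_values=0)
        x, y = self.pos
        obs = np.stack([O_pad[x:x + self.k, y:y + self.k],
                        S_pad[x:x + self.k, y:y + self.k]]).astype(np.float32)
        g = np.array([self.goal[0] - x, self.goal[1] - y], dtype=np.float32)
        return obs, g

    def step(self, action):
        """Deterministic transition: the chosen cardinal move either succeeds
        or terminates the episode on a collision/boundary violation."""
        dx, dy = self.ACTIONS[action]
        nx, ny = self.pos[0] + dx, self.pos[1] + dy
        H, W = self.O.shape
        collided = not (0 <= nx < H and 0 <= ny < W) or self.O[nx, ny] == 1
        if not collided:
            self.pos = (nx, ny)
        reached = self.pos == self.goal
        done = collided or reached
        return self._state(), done, {"collided": collided, "reached": reached}
```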
3.2. Deep Reinforcement Learning Approach
To solve the infrastructure-aware path planning problem defined in Section 3.1, we employ a Double Deep Q-Network (DDQN) algorithm [39] with Convolutional Neural Networks (CNNs) [40] to effectively learn policies that consider both physical obstacles and surveillance constraints.
3.2.1. Reward Function Design and Learning Strategy
The reward function is designed to balance three competing objectives: reaching the goal efficiently, avoiding obstacles, and maintaining high-quality surveillance coverage. We construct a hierarchical reward structure that properly prioritizes these objectives while providing effective learning signals.
Specifically, at each time step, the reward is determined as:

$$r_t = \begin{cases} r_{\text{goal}}, & \text{if the goal is reached;} \\ r_{\text{collision}}, & \text{if a collision or boundary violation occurs;} \\ r_{\text{blind}} \, \mathbb{1}[\text{blind zone}] + r_{\text{step}} + r_{\text{progress}}, & \text{otherwise;} \end{cases}$$

where:
- $r_{\text{goal}}$ is a large positive reward for successfully reaching the goal;
- $r_{\text{collision}}$ is a substantial negative penalty for collisions or boundary violations, leading to episode termination;
- $r_{\text{blind}}$ is a penalty for traversing surveillance blind zones;
- $r_{\text{step}}$ is a small step-wise penalty to encourage shorter paths;
- $r_{\text{progress}}$ provides incremental feedback based on distance reduction toward the goal.
The progress reward is calculated as:

$$r_{\text{progress}} = \eta \left( d_{t-1} - d_t \right),$$

where $d_t$ denotes the Manhattan distance from the current position to the goal and $\eta$ is a positive scaling factor. This shaping reward guides the agent toward the goal, even when the final reward is distant and sparse.
Consequently, the agent is incentivized to avoid terminal collisions first, maintain surveillance quality second, and optimize path efficiency third. The hierarchical reward design ensures proper prioritization and facilitates efficient learning convergence.
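A minimal Python sketch of this hierarchical reward follows. The magnitudes ($r_{\text{goal}} = 500$, $r_{\text{collision}} = -100$, and so on) are illustrative assumptions chosen only to respect the stated priority ordering, not the paper's tuned values.

```python
def compute_reward(prev_pos, pos, goal, collided, reached, in_blind_zone,
                   r_goal=500.0, r_collision=-100.0, r_blind=-5.0,
                   r_step=-0.1, eta=1.0):
    """Hierarchical reward sketch; all magnitudes are illustrative assumptions."""
    if reached:
        return r_goal                     # large terminal bonus for success
    if collided:
        return r_collision                # terminal penalty: collision/boundary
    manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
    # Progress shaping: positive when the Manhattan distance to the goal shrinks.
    r = r_step + eta * (manhattan(prev_pos, goal) - manhattan(pos, goal))
    if in_blind_zone:
        r += r_blind                      # penalty for traversing a blind zone
    return r
```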
3.2.2. Action Selection and Training Procedure
During training, the agent follows an $\epsilon$-greedy exploration policy. At each time step, with probability $\epsilon$, a random action is selected to encourage exploration; otherwise, the action with the highest Q-value is chosen:

$$a_t = \arg\max_{a \in \mathcal{A}} Q(s_t, a; \theta).$$
The network parameters are updated by minimizing the mean squared Temporal Difference (TD) error between the predicted Q-values and the Double DQN target. The target value is computed as:

$$y_t = r_t + \gamma \, (1 - d) \, Q\!\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta); \theta^- \right),$$

where $\theta^-$ parameterizes the target network and $\theta$ the online network. The loss function is given by:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}, d) \sim \mathcal{D}} \left[ \left( y_t - Q(s_t, a_t; \theta) \right)^2 \right],$$

where $\mathcal{D}$ denotes the experience replay buffer and $d$ indicates episode termination.
The training follows the standard DDQN pipeline, employing the following: (1) experience replay to decorrelate samples and improve sample efficiency; (2) a periodically updated target network to stabilize the learning process.
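The Double DQN target and loss can be sketched in PyTorch as follows. The batch layout and the `forward(obs, goal)` network interface are assumptions matching the state definition in Section 3.1.

```python
import torch
import torch.nn.functional as F

def ddqn_loss(online, target, batch, gamma=0.99):
    """Double DQN TD loss sketch: the online network selects the next action
    and the target network evaluates it, following Van Hasselt et al. [39]."""
    obs, goal, act, rew, next_obs, next_goal, done = batch
    act, rew, done = act.long(), rew.float(), done.float()
    # Predicted Q-value of the action actually taken.
    q = online(obs, goal).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Action selection with the online network...
        next_a = online(next_obs, next_goal).argmax(dim=1, keepdim=True)
        # ...evaluated with the target network (decoupling curbs overestimation).
        next_q = target(next_obs, next_goal).gather(1, next_a).squeeze(1)
        y = rew + gamma * (1.0 - done) * next_q   # zero bootstrap at termination
    return F.mse_loss(q, y)
```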
3.2.3. Neural Network Architecture
The effectiveness of our infrastructure-aware path planning approach relies significantly on the neural network architecture’s ability to process spatial information and learn meaningful feature representations. We employ a hybrid architecture that combines convolutional layers for spatial feature extraction with fully connected layers for decision making.
Figure 13 illustrates the architecture of our DDQN-CNN model. The network processes two input streams: the local observation window and the goal vector. The observation window contains both obstacle and surveillance information, requiring specialized processing to extract relevant spatial features.
Figure 13.
The architecture of the convolutional neural network combined with double deep Q-learning (DDQN-CNN) for infrastructure-aware UAV path planning.
The network architecture consists of three main components:
- Convolutional Feature Extractor: Processes the local observation window through three convolutional layers. These layers progressively extract spatial features related to obstacle configurations and surveillance quality patterns.
- Feature Fusion Module: The convolutional features are flattened into a 1D vector and concatenated with the 2D goal vector to create a comprehensive state representation that combines local environmental features with global goal information.
- Value Approximation Layers: The fused feature vector is processed through fully connected layers.
The convolutional layers help identify complex patterns in the local environment, such as obstacle configurations and surveillance blind zones, while the fully connected layers learn to associate these patterns with appropriate Q-values for each action. This architecture significantly outperforms standard MLP-based approaches by effectively leveraging the spatial structure of the grid world.
For the DDQN algorithm, we maintain two instances of this network: an online network for action selection and a target network for stable Q-value targets. The target network parameters are periodically updated from the online network parameters to stabilize training.
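The following PyTorch sketch mirrors the three-component architecture of Figure 13. Channel counts, kernel sizes, and hidden widths are illustrative assumptions, as the excerpt does not specify them.

```python
import torch
import torch.nn as nn

class DDQNCNN(nn.Module):
    """Sketch of the DDQN-CNN architecture (Figure 13); layer sizes assumed."""

    def __init__(self, window=11, n_actions=4):
        super().__init__()
        # (1) Convolutional feature extractor: three conv layers over the
        # 2-channel (obstacle + surveillance) local observation window.
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        conv_dim = 32 * window * window
        # (3) Value approximation layers over the fused representation.
        self.head = nn.Sequential(
            nn.Linear(conv_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs, goal):
        # obs: (B, 2, k, k) local window; goal: (B, 2) relative goal vector.
        feat = self.conv(obs).flatten(start_dim=1)
        # (2) Feature fusion: concatenate spatial features with the goal vector.
        fused = torch.cat([feat, goal], dim=1)
        return self.head(fused)            # Q-values, one per action
```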
3.2.4. Overview
Algorithm 1 summarizes the complete training procedure of the proposed DDQN-CNN-based infrastructure-aware path planning agent. The process includes environment interaction, -greedy exploration, experience replay optimization, target network updates, and reward clipping for stability.
Algorithm 1: DDQN-CNN-Based Infrastructure-Aware Flight Planning Algorithm.
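A condensed Python sketch of this training loop is given below, reusing the `ddqn_loss` sketch above. It assumes a gym-style environment whose `reset()` returns `(obs, goal)` and whose `step(a)` returns `((obs, goal), reward, done)`; all hyperparameters, including the reward-clipping range, are illustrative assumptions.

```python
import copy
import random
from collections import deque

import numpy as np
import torch

def train(make_env, online, episodes=3000, batch_size=64, gamma=0.99,
          eps=1.0, eps_min=0.05, eps_decay=0.999, sync_every=1000, max_steps=400):
    """Condensed sketch of Algorithm 1; hyperparameters are assumptions."""
    target = copy.deepcopy(online)
    optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)
    buffer = deque(maxlen=100_000)                 # experience replay buffer
    total_steps = 0
    for _ in range(episodes):
        env = make_env()                           # map-pool sampling (Section 3.3.1)
        obs, goal = env.reset()
        done, t = False, 0
        while not done and t < max_steps:
            if random.random() < eps:              # epsilon-greedy exploration
                a = random.randrange(4)
            else:
                with torch.no_grad():
                    q = online(torch.as_tensor(obs)[None], torch.as_tensor(goal)[None])
                a = int(q.argmax())
            (nobs, ngoal), r, done = env.step(a)
            r = float(np.clip(r, -10.0, 10.0))     # reward clipping (range assumed)
            buffer.append((obs, goal, a, r, nobs, ngoal, float(done)))
            obs, goal = nobs, ngoal
            t, total_steps = t + 1, total_steps + 1
            if len(buffer) >= batch_size:          # replay-based optimization step
                batch = random.sample(buffer, batch_size)
                tensors = [torch.as_tensor(np.stack(col)) for col in zip(*batch)]
                loss = ddqn_loss(online, target, tensors, gamma)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            if total_steps % sync_every == 0:      # periodic target network update
                target.load_state_dict(online.state_dict())
        eps = max(eps_min, eps * eps_decay)        # decay exploration rate
```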
3.3. Numerical Study and Results
3.3.1. Experimental Setup
To evaluate the effectiveness of the proposed infrastructure-aware UAV path planning algorithm, we conducted training and testing in a simulated urban environment represented as a 50 × 50 grid map with randomly generated obstacles and surveillance performance variations. The agent was trained using a Double Deep Q-Network with Convolutional Neural Network (DDQN-CNN) architecture, as described in Section 3.2.3.
For a comprehensive comparison, we additionally trained three alternative models: (1) Deep Q-Network with Multi-Layer Perceptron (DQN-MLP), (2) Deep Q-Network with Convolutional Neural Network (DQN-CNN), and (3) Double Deep Q-Network with Multi-Layer Perceptron (DDQN-MLP).
These comparison models were used to assess the impact of network architecture and Q-learning variants, although the DDQN-CNN remains the primary method proposed.
To ensure model generalization, we constructed a map pool consisting of 100 pre-generated maps. During training, the environment for each episode was either sampled from this map pool (with probability 80%) or dynamically generated as a new map (with probability 20%). Each map contains randomly placed obstacles and surveillance blind spots. Additionally, the start and goal positions were randomly selected at the beginning of each episode to increase the diversity of navigation scenarios, as sketched below.
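The episode-level sampling can be sketched as follows; the helper mechanics (how maps are generated and how free cells are drawn) are assumptions for illustration.

```python
import random
import numpy as np

def sample_environment(map_pool, generate_map, p_pool=0.8):
    """Per-episode map selection sketch: reuse a pre-generated map with
    probability 0.8, otherwise generate a fresh one, then draw random
    traversable start and goal cells (mechanics assumed)."""
    obstacle_map = random.choice(map_pool) if random.random() < p_pool else generate_map()
    free = np.argwhere(obstacle_map == 0)          # traversable cells
    start, goal = free[np.random.choice(len(free), size=2, replace=False)]
    return obstacle_map, tuple(start), tuple(goal)
```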
Training was conducted over 3000 episodes per model, with multiple random seeds (0, 100, 499, 999, 5000) to ensure statistical robustness. Performance was evaluated based on several metrics, including total reward, success rate, shortest path ratio, blind step ratio, and path length.
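Two of these metrics can be computed as in the sketch below. Using the obstacle-free Manhattan distance as the "theoretical shortest distance" is our assumption, consistent with the 4-connected grid; the paper does not specify the baseline distance computation.

```python
def shortest_path_ratio(path, start, goal):
    """Actual trajectory length over the obstacle-free Manhattan distance
    (used here as a proxy for the theoretical shortest distance)."""
    d = abs(start[0] - goal[0]) + abs(start[1] - goal[1])
    return len(path) / max(d, 1)

def blind_step_ratio(path, blind_mask):
    """Fraction of trajectory steps that fall inside surveillance blind zones."""
    blind_steps = sum(1 for (x, y) in path if blind_mask[x, y])
    return blind_steps / max(len(path), 1)
```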
3.3.2. Training Performance Analysis
The training progression over 3000 episodes for all four models is illustrated in Figure 14, showing key performance metrics with 50-episode smoothing to better reveal underlying trends despite the inherent variability from randomized environments.
Figure 14.
Training performance comparison of different models (smoothed over 50 episodes).
Figure 14b presents the success rate evolution during training, that is, the percentage of episodes in which the agent successfully reaches the destination within the maximum allowed number of steps. The DDQN-MLP model demonstrates the fastest initial learning, reaching approximately 70% success rate by episode 600. The DDQN-CNN model shows slightly slower but steady improvement, achieving comparable success levels after around 1000 episodes and eventually exceeding an 80% success rate by the end of training. In contrast, the DQN-CNN model initially struggles, maintaining a much lower success rate in early episodes, but gradually recovers to around 70% after 3000 episodes. The DQN-MLP model shows steady progress throughout training, achieving about 75–80% success rate in the later stages.
Figure 14a shows the total reward comparison across models. The DDQN-MLP model achieves strong early performance, while the DDQN-CNN model, despite initial lower rewards, gradually improves and ultimately attains the highest and most stable rewards (approximately 500) by the end of training. The DQN-MLP model closely follows, whereas the DQN-CNN model, although recovering from negative rewards in early episodes, remains slightly behind other models throughout training.
The shortest path ratio comparison in Figure 14c highlights the differences in navigation efficiency among the models. The shortest path ratio is defined as the agent’s actual trajectory length divided by the theoretical shortest distance, so lower values (approaching 1) indicate more efficient path planning. Initially, all models exhibit high ratios between 8 and 12, indicating inefficient navigation. Over time, all models improve significantly, converging to ratios around 2.5–3.5. Among them, the DDQN-MLP model achieves the best final performance with the lowest shortest path ratios, while the CNN-based models, particularly DQN-CNN, show larger fluctuations.
Figure 14d depicts the blind step ratio over training, representing the proportion of the agent’s trajectory that passes through surveillance blind zones. Throughout training, all models keep the blind step ratio well below the blind-zone proportion of the randomly generated environments (5–8%). Notably, the DDQN-MLP and DDQN-CNN models stabilize at blind step ratios at or below about 1% in the later stages of training. This indicates that the proposed surveillance-aware path planning algorithm effectively enhances Conformance Monitoring (CM) performance by steering the agent away from blind zones. Beyond improving surveillance coverage consistency, lower blind step ratios also reduce the probability of tracking loss and increase flight safety, which is particularly critical for urban UAV operations.
Figure 14e shows the Temporal Difference (TD) loss curves, reflecting the learning stability of the models. As expected, all models initially experience an increase in loss during the exploration-heavy early episodes, followed by a steady decline as training progresses. The DDQN-MLP model demonstrates the fastest convergence in loss, stabilizing before episode 1000. Meanwhile, the DQN-CNN model exhibits the slowest loss reduction, requiring more episodes to achieve stable training.
These results collectively demonstrate the performance trade-offs among different architectures. While MLP-based models generally achieve faster early learning and more efficient paths, the CNN-based models, particularly DDQN-CNN, demonstrate better capabilities in balancing multiple objectives, including success rate, reward maximization, navigation efficiency, and surveillance coverage enhancement over extended training horizons.
To further summarize the overall performance of each model, a radar chart comparison is provided in Figure 15. The chart aggregates the normalized results across five evaluation metrics: total reward, success rate, shortest path ratio, blind step ratio, and TD loss. For total reward and success rate, higher values indicate better performance, whereas for shortest path ratio, blind step ratio, and loss, lower values are preferable (after appropriate normalization).
Figure 15.
Radar chart comparing the normalized performance of the four models across five evaluation metrics: total reward, success rate, shortest path ratio, blind step ratio, and TD loss.
The radar chart highlights the strong balance achieved by the DDQN-CNN model across all dimensions. It attains the highest values for total reward and success rate, reflecting superior learning and navigation capabilities. Although the DDQN-CNN model does not achieve the absolute best shortest path ratio and blind step ratio, the performance gaps compared to the best models are minor and practically insignificant. Combined with its low TD loss and consistent training dynamics, the DDQN-CNN demonstrates the most robust and balanced overall behavior among all candidates.
These findings validate the effectiveness of the proposed DDQN-CNN approach for infrastructure-aware UAV path planning tasks. Its ability to consistently achieve high rewards, maintain navigation efficiency, and enhance surveillance coverage, while preserving training stability, makes it particularly promising for practical deployment in real-world urban airspace management applications.
To provide a qualitative illustration of the navigation behaviors learned by the agent, several representative flight trajectories generated by the DDQN-CNN model are presented in Figure 16. These examples demonstrate the agent’s ability to effectively reach its goal while avoiding obstacles and minimizing traversal through surveillance blind zones. As training progresses, the trajectories become progressively more direct and efficient, reflecting the model’s improved planning capabilities and enhanced situational awareness.
Figure 16.
Representative flight trajectories generated by the DDQN-CNN model at different training stages. Obstacles are indicated with red dots, surveillance blind zones are marked in black, and the agent’s path from start (green) to goal (blue) is shown in orange.
4. Discussions and Concluding Remarks
This study presents a unified framework for infrastructure-aware UAV path planning that explicitly incorporates urban surveillance performance into the decision-making process. By modeling the spatial heterogeneity of communication and monitoring infrastructure and integrating it into a Deep Reinforcement Learning (DRL) framework, we enable UAVs to avoid areas with degraded tracking conditions while maintaining navigational efficiency.
Our empirical analysis based on Singapore’s urban data reveals substantial spatial disparities in surveillance quality, with some regions exhibiting significantly higher tracking delays and weaker signal strength. By conducting in situ latency measurements and simulating conformance monitoring behavior, we demonstrate that monitoring blind zones can adversely impact flight safety. The proposed DDQN-CNN planning model effectively learns to avoid such regions, yielding improved success rates, lower blind zone ratios, and more stable training dynamics compared to baseline models.
While promising, this work has several limitations. The current surveillance model is static and may not reflect real-time variations caused by network congestion or weather disturbances. Additionally, only surveillance-related constraints are considered, whereas real-world planning must also account for energy usage, no-fly zones, and regulatory restrictions. The training process, although robust, remains computationally intensive.
Moreover, since some communication-related data in our infrastructure model are derived from unofficial, crowd-sourced sources, they may introduce spatial or temporal uncertainty into the surveillance performance estimation. While partially validated through field measurements, such uncertainty could affect the precision of cluster assignments and simulation results. Future studies could mitigate this by incorporating more authoritative datasets or modeling data uncertainty explicitly to enhance robustness.
Looking ahead, future work could explore online adaptation through real-time infrastructure sensing, multi-objective policy learning, and collaborative navigation among multiple UAVs. In particular, infrastructure-aware navigation could serve as a foundational capability for decentralized and automated airspace management. This is especially relevant in the context of blockchain-based governance systems, which are being actively explored to support secure, privacy-preserving, and auditable urban airspace operations. By embedding real-time infrastructure performance into decision-making processes, such systems could enable trusted coordination among heterogeneous stakeholders without relying on centralized authorities, while enabling safer, more efficient, and regulation-compliant UAV operations in complex urban environments.
Author Contributions
Conceptualization, Q.L. and W.D.; methodology, Q.L. and W.D.; software, Q.L.; validation, Q.L., W.D. and Z.Y.; formal analysis, Q.L. and W.D.; investigation, Q.L. and W.D.; resources, W.D., Z.Y. and C.J.T.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L. and W.D.; visualization, Q.L.; supervision, Z.Y. and C.J.T.; project administration, W.D. and Z.Y.; funding acquisition, W.D. and C.J.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by the Fundamental Research Funds for the Central Universities under the Civil Aviation University of China (3122025QD12), and by the National Natural Science Foundation of China No. 72374032. Qianyu Liu is supported by the China Scholarship Council (CSC).
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Pang, B.; Hu, X.; Dai, W.; Low, K.H. Stochastic route optimization under dynamic ground risk uncertainties for safe drone delivery operations. Transp. Res. Part E Logist. Transp. Rev. 2024, 192, 103717.
- Dai, W.; Deng, C. Urban Performance-Based Navigation (uPBN): Addressing the CNS Variation Problem in the Urban Airspace in the Context of UAS Traffic Management. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5524–5529.
- Falanga, D.; Kim, S.; Scaramuzza, D. How fast is too fast? The role of perception latency in high-speed sense and avoid. IEEE Robot. Autom. Lett. 2019, 4, 1884–1891.
- Dai, W.; Quek, Z.H.; Pang, B.; Feroskhan, M. Analysis of UTM tracking performance for conformance monitoring via hybrid SITL Monte Carlo methods. Drones 2023, 7, 597.
- Dai, W.; Quek, Z.H.; Low, K.H. Probabilistic modeling and reasoning of conflict detection effectiveness by tracking systems towards safe urban air mobility operations. Reliab. Eng. Syst. Saf. 2024, 244, 109908.
- Pang, B.; Hu, X.; Dai, W.; Low, K.H. UAV path optimization with an integrated cost assessment model considering third-party risks in metropolitan environments. Reliab. Eng. Syst. Saf. 2022, 222, 108399.
- Jiang, Y.; Xu, X.X.; Zheng, M.Y.; Zhan, Z.H. Evolutionary computation for unmanned aerial vehicle path planning: A survey. Artif. Intell. Rev. 2024, 57, 267.
- Liu, Q.; Dai, W.; Ma, L.; Tessone, C.J. Towards Transparent and Privacy-Preserving Urban Airspace Management: A Blockchain-Based Scheme Under the Airspace-Resource-Centric Concept. In Proceedings of the 2025 Integrated Communications, Navigation and Surveillance Conference (ICNS), Brussels, Belgium, 8–10 April 2025; pp. 1–8.
- Keith, A.; Sangarapillai, T.; Almehmadi, A.; El-Khatib, K. A Blockchain-Powered Traffic Management System for Unmanned Aerial Vehicles. Appl. Sci. 2023, 13, 10950.
- ICAO RNPSORSG. Performance Based Navigation Manual. Working Draft 5.1-Final. 2007. Available online: https://www.icao.int/Meetings/AMC/MA/2007/perf2007/_PBN%20Manual_W-Draft%205.1_FINAL%2007MAR2007.pdf (accessed on 15 March 2025).
- Whitley, P. FAA UTM Concept of Operations-v2.0. FAA. 2020. Available online: https://www.faa.gov/sites/faa.gov/files/2022-08/UTM_ConOps_v2.pdf (accessed on 15 March 2025).
- Deng, C.; Wang, C.H.J.; Low, K.H. Investigation of using sky openness ratio as predictor for navigation performance in urban-like environment to support PBN in UTM. Sensors 2022, 22, 840.
- Wang, C.J.; Tan, S.K.; Low, K.H. Collision risk management for non-cooperative UAS traffic in airport-restricted airspace with alert zones based on probabilistic conflict map. Transp. Res. Part C Emerg. Technol. 2019, 109, 19–39.
- Wang, Y.; Pang, Y.; Chen, O.; Iyer, H.N.; Dutta, P.; Menon, P.K.; Liu, Y. Uncertainty quantification and reduction in aircraft trajectory prediction using Bayesian-Entropy information fusion. Reliab. Eng. Syst. Saf. 2021, 212, 107650.
- Pongsakornsathien, N.; Gardi, A.; Bijjahalli, S.; Sabatini, R.; Kistan, T. A multi-criteria clustering method for UAS traffic management and urban air mobility. In Proceedings of the 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 3–7 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–9.
- Pongsakornsathien, N.; Bijjahalli, S.; Gardi, A.; Symons, A.; Xi, Y.; Sabatini, R.; Kistan, T. A Performance-Based Airspace Model for Unmanned Aircraft Systems Traffic Management. Aerospace 2020, 7, 154.
- Gonçalves, I.; Rodrigues, L.; Silva, F.A.; Nguyen, T.A.; Min, D.; Lee, J.W. Surveillance System in Smart Cities: A Dependability Evaluation Based on Stochastic Models. Electronics 2021, 10, 876.
- Liang, H.; Bai, H.; Sun, R.; Sun, R.; Li, C. Three-dimensional path planning based on DEM. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 5980–5987.
- Dai, W.; Pang, B.; Low, K.H. Conflict-free four-dimensional path planning for urban air mobility considering airspace occupancy. Aerosp. Sci. Technol. 2021, 119, 107154.
- Kothari, M.; Postlethwaite, I. A probabilistically robust path planning algorithm for UAVs using rapidly-exploring random trees. J. Intell. Robot. Syst. 2013, 71, 231–253.
- Chen, Y.-b.; Luo, G.-c.; Mei, Y.-s.; Yu, J.-q.; Su, X.-l. UAV path planning using artificial potential field method updated by optimal control theory. Int. J. Syst. Sci. 2016, 47, 1407–1420.
- Liu, J.; Luo, W.; Zhang, G.; Li, R. Unmanned Aerial Vehicle Path Planning in Complex Dynamic Environments Based on Deep Reinforcement Learning. Machines 2025, 13, 162.
- Aggarwal, S.; Kumar, N. Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Comput. Commun. 2020, 149, 270–299.
- Zhao, Y.; Zheng, Z.; Liu, Y. Survey on computational-intelligence-based UAV path planning. Knowl.-Based Syst. 2018, 158, 54–64.
- Besada-Portas, E.; de la Torre, L.; Moreno, A.; Risco-Martín, J.L. On the performance comparison of multi-objective evolutionary UAV path planners. Inf. Sci. 2013, 238, 111–125.
- He, W.; Qi, X.; Liu, L. A novel hybrid particle swarm optimization for multi-UAV cooperate path planning. Appl. Intell. 2021, 51, 7350–7364.
- Yuhang, R.; Liang, Z. An adaptive evolutionary multi-objective estimation of distribution algorithm and its application to multi-UAV path planning. IEEE Access 2023, 11, 50038–50051.
- Peng, C.; Huang, X.; Wu, Y.; Kang, J. Constrained multi-objective optimization for UAV-enabled mobile edge computing: Offloading optimization and path planning. IEEE Wirel. Commun. Lett. 2022, 11, 861–865.
- Babel, L. Online flight path planning with flight time constraints for fixed-wing UAVs in dynamic environments. Int. J. Intell. Unmanned Syst. 2022, 10, 416–443.
- Yao, P.; Wang, H.; Su, Z. Real-time path planning of unmanned aerial vehicle for target tracking and obstacle avoidance in complex dynamic environment. Aerosp. Sci. Technol. 2015, 47, 269–279.
- Kim, H.; Aung, P.S.; Munir, M.S.; Saad, W.; Hong, C.S. Cooperative Urban Air Mobility Trajectory Design for Power and AoI Optimization: A Multi-agent Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2025.
- Zammit, C.; van Kampen, E.J. Real-time 3D UAV path planning in dynamic environments with uncertainty. Unmanned Syst. 2023, 11, 203–219.
- Deniz, S.; Wu, Y.; Shi, Y.; Wang, Z. A reinforcement learning approach to vehicle coordination for structured advanced air mobility. Green Energy Intell. Transp. 2024, 3, 100157.
- Yun, W.J.; Jung, S.; Kim, J.; Kim, J.H. Distributed deep reinforcement learning for autonomous aerial eVTOL mobility in drone taxi applications. ICT Express 2021, 7, 1–4.
- Deniz, S.; Wang, Z. Autonomous Conflict Resolution in Urban Air Mobility: A Deep Multi-Agent Reinforcement Learning Approach. In Proceedings of the AIAA Aviation Forum and ASCEND 2024, Las Vegas, NV, USA, 29 July–2 August 2024; p. 4005.
- OpenStreetMap. Available online: https://www.openstreetmap.org (accessed on 15 March 2025).
- Reynolds, D.A. Gaussian mixture models. In Encyclopedia of Biometrics; Springer: Boston, MA, USA, 2009; pp. 659–663.
- Mundhenk, M.; Goldsmith, J.; Lusena, C.; Allender, E. Complexity of finite-horizon Markov decision process problems. J. ACM 2000, 47, 681–720.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.