Highlights
What are the main findings?
- A novel infrastructure-aware UAV path planning framework is developed, integrating surveillance quality assessment and Deep Reinforcement Learning (DRL) for enhanced urban airspace operations.
- The proposed DDQN-CNN model effectively balances goal reachability, obstacle avoidance, and surveillance compliance, outperforming conventional baselines across multiple metrics.
What is the implication of the main finding?
- Embedding real-world infrastructure constraints into navigation policies substantially improves operational safety and regulatory conformance in complex urban environments.
- The framework provides a scalable foundation for intelligent and decentralized airspace management systems, supporting future Urban Air Mobility (UAM) integration.
Abstract
Urban Air Mobility (UAM) requires reliable communication and surveillance infrastructures to ensure safe Unmanned Aerial Vehicle (UAV) operations in dense metropolitan environments. However, urban infrastructure is inherently heterogeneous, leading to significant spatial variations in monitoring performance. This study proposes a unified framework that integrates infrastructure readiness assessment with Deep Reinforcement Learning (DRL)-based UAV path planning. Using Singapore as a representative case, we employ a data-driven methodology combining clustering analysis and in situ measurements to estimate the citywide distribution of surveillance quality. We then introduce an infrastructure-aware path planning algorithm based on a Double Deep Q-Network (DDQN) with a convolutional architecture, which enables UAVs to learn efficient trajectories while avoiding surveillance blind zones. Extensive simulations demonstrate that the proposed approach significantly improves path success rates, reduces traversal through poorly monitored regions, and maintains high navigation efficiency. These results highlight the potential of combining infrastructure modeling with DRL to support performance-aware airspace operations and inform future UAM governance systems.
1. Introduction
Urban Air Mobility (UAM) has emerged as a promising paradigm for enhancing metropolitan transportation systems by integrating autonomous Unmanned Aerial Vehicles (UAVs) into low-altitude airspace [1]. These aerial operations are expected to transform not only logistics and emergency response but also passenger mobility in dense urban settings. To enable the safe and scalable deployment of UAM, robust Communication, Navigation, and Surveillance (CNS) infrastructures are essential for ensuring continuous tracking, conformance monitoring, and airspace coordination [2].
However, the assumption of uniformly available CNS services across a city does not hold in practice. Urban environments are characterized by tall buildings, signal occlusion, network load variations, and heterogeneous infrastructure deployment, all of which can lead to significant spatial differences in surveillance performance. These disparities create “surveillance blind zones” that undermine critical safety functions such as conflict detection and conformance monitoring [3].
As UAV operations scale up, regulatory frameworks are beginning to require minimum surveillance and communication standards for specific airspace classes. In this context, evaluating infrastructure readiness becomes essential not only for long-term planning but also for flight plan approval. Nonetheless, directly measuring latency or surveillance performance citywide is infeasible due to cost and scalability [4,5]. This motivates a data-driven assessment approach that can model and quantify the spatial distribution of CNS performance using open datasets, clustering methods, and limited field experiments.
Once these spatial performance patterns are known, the remaining challenge is to establish an airspace management framework that incorporates infrastructure readiness and the spatial distribution of navigation service performance, based on which infrastructure-aware flight management becomes possible. Traditional path planning algorithms focus mainly on obstacle avoidance and distance minimization and do not account for infrastructure quality [6]. Without infrastructure awareness, planners may inadvertently route UAVs through low-surveillance areas, violating regulatory constraints and compromising operational safety.
Reinforcement Learning (RL) has shown strong adaptability to complex navigation environments [7]. Yet, existing RL-based planners typically assume homogeneous or abstracted environmental feedback and rarely integrate real-world infrastructure performance into their policy design. This limits their applicability in realistic UAM contexts.
Beyond the technical challenges of path safety and infrastructure awareness, the future UAM ecosystem will also involve diverse stakeholders, requiring navigation strategies that align with emerging models of decentralized governance and trust [8]. Recent research has proposed blockchain-based airspace management systems to support secure airspace reservation, dynamic allocation, and auditable governance under high traffic volumes [9]. These developments highlight a broader shift towards infrastructure- and trust-aware urban airspace operations, where navigation strategies must be dynamically adaptable to service quality, safety constraints, and evolving coordination protocols.
To address these challenges, this paper presents a unified framework that combines urban surveillance performance assessment with deep reinforcement learning-based UAV path planning. We first construct a data-driven model to estimate the spatial distribution of surveillance quality in the urban environment, using Singapore as a representative case. Then, we develop a learning-based planning system that incorporates this spatial information to intelligently avoid regions with the poorest monitoring performance, while still ensuring route efficiency and reachability. By integrating infrastructure-awareness into navigation decision-making, our approach enhances operational safety and regulatory compliance and provides a scalable foundation for future UAM integration in dense urban contexts.
1.1. Related Works
Urban environments pose substantial challenges for the safe integration of Unmanned Aircraft Systems (UASs) due to spatial heterogeneity in Communication, Navigation, and Surveillance (CNS) infrastructure performance. The Performance-Based Navigation (PBN) framework established by ICAO emphasizes that navigation requirements and operational safety are inherently dependent on the local availability and quality of CNS services, including ground-based infrastructure and airborne equipment [10]. In this context, the FAA’s UTM Concept of Operations v2.0 further highlights that flight authorizations and performance assessments must consider the dynamic variability of surveillance and communication availability, especially in complex urban airspaces [11]. These institutional frameworks emphasize the importance of performance-aware decision-making for flight planning and airspace access.
Building on these conceptual frameworks, a number of studies have sought to model CNS performance in urban contexts. For example, researchers have proposed probabilistic models to characterize surveillance quality using signal propagation, obstruction, or environmental variables such as the Sky Openness Ratio (SOR), with applications to navigation accuracy estimation and alert zone construction [12,13,14]. Advanced clustering methods have also been used to classify urban airspace according to CNS indicators, supporting performance-based airspace design and real-time monitoring of tracking capabilities [15,16]. Moreover, dependability analyses of smart city surveillance systems reveal how network layout and sensor reliability critically impact the availability and coverage of monitoring infrastructure [17].
Despite these developments, current approaches often focus on isolated technical domains—such as navigation or surveillance—but lack an integrated framework to spatially quantify and utilize CNS performance as a constraint for flight planning. Furthermore, most existing models rely on simulation or worst-case assumptions, rather than on empirical data-driven characterizations grounded in real urban infrastructure. These gaps underscore the need for operational methodologies that can assess infrastructure readiness for UAV operations at the city scale and dynamically inform downstream services such as trajectory planning or airspace reservation. Our work addresses this gap by constructing a comprehensive, data-driven model of surveillance quality using open infrastructure data, field measurements, and spatial clustering.
In the domain of urban flight path planning, Deep Reinforcement Learning (DRL) has emerged as a powerful tool for autonomous navigation in complex and dynamic environments. Traditional algorithms such as A* search [18,19], Rapidly-Exploring Random Trees (RRTs) [20], and Artificial Potential Fields (APFs) [21] have been widely used for UAV route optimization. While these classical methods are effective in static and fully known environments, they often struggle to adapt in real time to dynamic obstacles, variable surveillance constraints, or unforeseen hazards.
To overcome such limitations, learning-based approaches have gained increasing attention due to their ability to optimize navigation policies through trial-and-error interactions with uncertain environments. Deep Q-Networks (DQNs) and their variants have demonstrated strong performance in tasks such as obstacle avoidance and goal-directed flight in cluttered settings [22]. Recent surveys emphasize that conventional single-objective formulations—typically focused on distance or time—are insufficient to address modern UAV mission requirements involving collision risk, navigation uncertainty, and energy consumption [23,24]. In response, evolutionary computation and swarm intelligence methods have been explored to solve multi-objective path planning problems, offering greater robustness and flexibility in large-scale or three-dimensional environments [25,26,27,28].
In urban settings, navigation quality is not determined solely by physical obstacles but is also heavily influenced by local infrastructure characteristics such as sky openness, signal blockage, and GNSS multipath interference [12]. Several studies have incorporated environmental constraints such as limited flight time, maneuverability, and signal degradation into the optimization process, leveraging techniques like adaptive RRT*, Dubins curves, and Lyapunov-based guidance fields to generate feasible and efficient paths under strict operational constraints [29,30].
Reinforcement learning—particularly deep and multi-agent variants—has shown promise in addressing these multi-constrained, high-dimensional challenges. Such methods have been successfully applied to scenarios involving cooperative UAV operations, safe separation in dense urban airspace, and trade-offs between conflicting objectives such as energy efficiency and risk exposure [28,31,32]. Multi-agent DRL frameworks have further demonstrated their potential in structured airspace management and real-time conflict resolution in UAM, effectively scaling to heterogeneous vehicle types and evolving regulatory environments [33,34,35].
Despite these advances, few existing methods explicitly incorporate surveillance performance heterogeneity into the navigation policy itself. Most DRL-based planners assume homogeneous or implicit environmental feedback, leaving a gap in developing infrastructure-aware learning strategies that can proactively avoid regions with poor monitoring quality. This motivates the need for infrastructure-aware planning strategies that directly incorporate CNS heterogeneity into decision-making.
1.2. Contributions
This paper aims to bridge the gap between urban infrastructure heterogeneity and UAV path planning. The main contributions are summarized as follows:
- We propose a data-driven framework to quantify surveillance heterogeneity in urban environments, using Singapore as a representative case study.
- We design a deep reinforcement learning-based path planning algorithm that explicitly incorporates surveillance quality constraints, enabling UAVs to avoid regions with poor monitoring capabilities.
- We conduct comprehensive simulations to evaluate the proposed system, demonstrating improvements in safety-related metrics.
1.3. Organization of the Paper
The rest of this paper is organized as follows. Section 2 presents the methodology for assessing urban surveillance performance and discusses the case study results based on Singapore’s infrastructure data. Section 3 introduces the deep reinforcement learning framework for infrastructure-aware UAV path planning. Section 4 concludes the paper and discusses potential directions for future research.
3. DRL-Based Infrastructure-Aware Flight Planning
To ensure robust and efficient flight operations in urban environments, a path planning algorithm should not only avoid physical obstacles but also account for the communication and surveillance quality across different regions. Building on the infrastructure assessment in Section 2, we develop a Deep Reinforcement Learning (DRL)-based navigation system that incorporates infrastructure constraints into flight planning. The objective is to enable Unmanned Aerial Vehicles (UAVs) to learn optimal trajectories that avoid both obstacles and areas with poor surveillance performance.
3.1. Problem Formulation
We formulate the infrastructure-aware path planning task as a finite-horizon Markov Decision Process (MDP) [38], where a UAV navigates from a given starting position to a predefined destination on a grid-based urban map. Each environment instance encodes two spatial layers:
- An obstacle map $O \in \{0, 1\}^{H \times W}$, where each cell indicates whether the location is traversable (0) or blocked (1);
- A surveillance performance map $S \in \mathbb{R}^{H \times W}$, which reflects the monitoring quality available at each location, based on factors such as communication delay and signal coverage.
In Section 2, we clustered the urban environment into five surveillance performance categories using data-driven analysis. Among them, Cluster 1 was identified as having the poorest monitoring conditions, including the highest tracking delay and weakest coverage. In our path planning framework, we refer to these regions as surveillance blind zones and treat them as areas to avoid.
The overall planning objective is to find a path that not only avoids obstacles and reaches the goal efficiently but also maximizes monitoring safety by avoiding blind zones. However, overly strict optimization toward high surveillance performance could result in unnecessarily long or infeasible paths. To balance safety and navigability, we simplify the surveillance constraint into a binary formulation: only the poorest-performing cluster (Cluster 1) is designated as a forbidden zone, while all other areas are considered acceptable. This transforms the original multi-class optimization problem into a binary constraint-aware navigation task, improving both training efficiency and practical feasibility.
This can be formulated as a multi-objective optimization problem:

$$\min_{\tau} \; J(\tau) = \omega_1 \, L(\tau) + \omega_2 \, B(\tau),$$

where $L(\tau)$ denotes the length of path $\tau$, $B(\tau)$ denotes its exposure to surveillance blind zones, and $\omega_1$ and $\omega_2$ are weighting coefficients balancing efficiency and surveillance quality.
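In practice, the binary simplification reduces the surveillance layer to a boolean mask over the cluster labels. The following is a minimal Python sketch, assuming a NumPy integer array `cluster_map` holding the five cluster labels from Section 2; the array contents here are synthetic placeholders.

```python
import numpy as np

# Synthetic stand-in for the Section 2 clustering result (labels 0-4 on a 50x50 grid).
cluster_map = np.random.randint(0, 5, size=(50, 50))

BLIND_CLUSTER = 1  # label of the poorest-performing cluster (per the assessment above)

# Binary surveillance constraint: True where the cell is a surveillance blind zone.
blind_mask = cluster_map == BLIND_CLUSTER

# All non-blind clusters are treated as equally acceptable for planning.
print(f"Blind-zone fraction: {blind_mask.mean():.2%}")
```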
Formally, the MDP is defined as:

$$\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, P, R, \gamma \rangle,$$

where $\mathcal{S}$ is the state space, $\mathcal{A}$ is the action space, $P$ is the transition probability function, $R$ is the reward function, and $\gamma \in (0, 1)$ is the discount factor.
At each time step $t$, the agent’s state consists of two components:
- A local observation window $o_t$, a $k \times k$ patch centered at the agent’s current position $p_t = (x_t, y_t)$, extracted from both the obstacle map $O$ and the surveillance map $S$;
- A relative goal vector $g_t$, computed as $g_t = (x_{\text{goal}} - x_t, \; y_{\text{goal}} - y_t)$.

Thus, the complete state is defined as $s_t = (o_t, g_t)$.
The action space consists of four discrete actions, up, down, left, and right, corresponding to cardinal movements on the grid. This discrete action space aligns with practical UAV control requirements in urban flight corridors and simplifies the learning process while maintaining sufficient maneuverability for the navigation task.
The transition function reflects the deterministic nature of the grid environment. Specifically,

$$P(s' \mid s, a) = \begin{cases} 1, & \text{if } s' \text{ is the resulting state after taking action } a \text{ in state } s, \\ 0, & \text{otherwise.} \end{cases}$$
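To make the formulation concrete, the following is a minimal Python sketch of such a grid environment. The window size, padding convention, and class interface are illustrative assumptions rather than the paper's exact implementation; reward computation is deferred to the design in Section 3.2.1.

```python
import numpy as np

class GridUAVEnv:
    """Minimal sketch of the grid MDP described above (sizes are assumptions)."""

    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right

    def __init__(self, obstacle_map, blind_mask, goal, window=11):
        self.O = obstacle_map                    # 1 = blocked, 0 = traversable
        self.S = blind_mask.astype(np.float32)   # 1 = surveillance blind zone
        self.goal = goal
        self.k = window                          # local window size k (odd, assumed)

    def reset(self, start):
        self.pos = start
        return self._state()

    def _state(self):
        """Build s_t = (o_t, g_t): a 2-channel k x k patch plus the goal vector."""
        r = self.k // 2
        O_pad = np.pad(self.O, r, constant_values=1)   # out-of-map cells act as blocked
        S_pad = np.pad(self.S, r, constant_values=0)
        x, y = self.pos
        obs = np.stack([O_pad[x:x + self.k, y:y + self.k],
                        S_pad[x:x + self.k, y:y + self.k]]).astype(np.float32)
        g = np.array([self.goal[0] - x, self.goal[1] - y], dtype=np.float32)
        return obs, g

    def step(self, action):
        """Deterministic transition: the chosen cardinal move either succeeds
        or terminates the episode on a collision/boundary violation."""
        dx, dy = self.ACTIONS[action]
        nx, ny = self.pos[0] + dx, self.pos[1] + dy
        H, W = self.O.shape
        collided = not (0 <= nx < H and 0 <= ny < W) or self.O[nx, ny] == 1
        if not collided:
            self.pos = (nx, ny)
        reached = self.pos == self.goal
        done = collided or reached
        return self._state(), done, {"collided": collided, "reached": reached}
```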
3.2. Deep Reinforcement Learning Approach
To solve the infrastructure-aware path planning problem defined in Section 3.1, we employ a Double Deep Q-Network (DDQN) algorithm [39] with Convolutional Neural Networks (CNNs) [40] to effectively learn policies that consider both physical obstacles and surveillance constraints.
3.2.1. Reward Function Design and Learning Strategy
The reward function is designed to balance three competing objectives: reaching the goal efficiently, avoiding obstacles, and maintaining high-quality surveillance coverage. We construct a hierarchical reward structure that properly prioritizes these objectives while providing effective learning signals.
Specifically, at each time step, the reward is determined as:

$$r_t = \begin{cases} r_{\text{goal}}, & \text{if the goal is reached;} \\ r_{\text{collision}}, & \text{if a collision or boundary violation occurs;} \\ r_{\text{blind}} \, \mathbb{1}[\text{blind zone}] + r_{\text{step}} + r_{\text{progress}}, & \text{otherwise;} \end{cases}$$

where:
- $r_{\text{goal}}$ is a large positive reward for successfully reaching the goal;
- $r_{\text{collision}}$ is a substantial negative penalty for collisions or boundary violations, leading to episode termination;
- $r_{\text{blind}}$ is a penalty for traversing surveillance blind zones;
- $r_{\text{step}}$ is a small step-wise penalty to encourage shorter paths;
- $r_{\text{progress}}$ provides incremental feedback based on distance reduction toward the goal.
The progress reward is calculated as:

$$r_{\text{progress}} = \eta \left( d_{t-1} - d_t \right),$$

where $d_t$ denotes the Manhattan distance from the current position to the goal and $\eta$ is a positive scaling factor. This shaping reward guides the agent toward the goal, even when the final reward is distant and sparse.
Consequently, the agent is incentivized to avoid terminal collisions first, maintain surveillance quality second, and optimize path efficiency third. The hierarchical reward design ensures proper prioritization and facilitates efficient learning convergence.
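A minimal Python sketch of this hierarchical reward follows. The magnitudes ($r_{\text{goal}} = 500$, $r_{\text{collision}} = -100$, and so on) are illustrative assumptions chosen only to respect the stated priority ordering, not the paper's tuned values.

```python
def compute_reward(prev_pos, pos, goal, collided, reached, in_blind_zone,
                   r_goal=500.0, r_collision=-100.0, r_blind=-5.0,
                   r_step=-0.1, eta=1.0):
    """Hierarchical reward sketch; all magnitudes are illustrative assumptions."""
    if reached:
        return r_goal                     # large terminal bonus for success
    if collided:
        return r_collision                # terminal penalty: collision/boundary
    manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])
    # Progress shaping: positive when the Manhattan distance to the goal shrinks.
    r = r_step + eta * (manhattan(prev_pos, goal) - manhattan(pos, goal))
    if in_blind_zone:
        r += r_blind                      # penalty for traversing a blind zone
    return r
```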
3.2.2. Action Selection and Training Procedure
During training, the agent follows an $\epsilon$-greedy exploration policy. At each time step, with probability $\epsilon$, a random action is selected to encourage exploration; otherwise, the action with the highest Q-value is chosen:

$$a_t = \arg\max_{a \in \mathcal{A}} Q(s_t, a; \theta).$$
The network parameters are updated by minimizing the mean squared Temporal Difference (TD) error between the predicted Q-values and the Double DQN target. The target value is computed as:

$$y_t = r_t + \gamma \, (1 - d) \, Q\!\left(s_{t+1}, \arg\max_{a'} Q(s_{t+1}, a'; \theta); \theta^- \right),$$

where $\theta^-$ parameterizes the target network and $\theta$ the online network. The loss function is given by:

$$\mathcal{L}(\theta) = \mathbb{E}_{(s_t, a_t, r_t, s_{t+1}, d) \sim \mathcal{D}} \left[ \left( y_t - Q(s_t, a_t; \theta) \right)^2 \right],$$

where $\mathcal{D}$ denotes the experience replay buffer and $d$ indicates episode termination.
The training follows the standard DDQN pipeline, employing the following: (1) experience replay to decorrelate samples and improve sample efficiency; (2) a periodically updated target network to stabilize the learning process.
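The Double DQN target and loss can be sketched in PyTorch as follows. The batch layout and the `forward(obs, goal)` network interface are assumptions matching the state definition in Section 3.1.

```python
import torch
import torch.nn.functional as F

def ddqn_loss(online, target, batch, gamma=0.99):
    """Double DQN TD loss sketch: the online network selects the next action
    and the target network evaluates it, following Van Hasselt et al. [39]."""
    obs, goal, act, rew, next_obs, next_goal, done = batch
    act, rew, done = act.long(), rew.float(), done.float()
    # Predicted Q-value of the action actually taken.
    q = online(obs, goal).gather(1, act.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Action selection with the online network...
        next_a = online(next_obs, next_goal).argmax(dim=1, keepdim=True)
        # ...evaluated with the target network (decoupling curbs overestimation).
        next_q = target(next_obs, next_goal).gather(1, next_a).squeeze(1)
        y = rew + gamma * (1.0 - done) * next_q   # zero bootstrap at termination
    return F.mse_loss(q, y)
```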
3.2.3. Neural Network Architecture
The effectiveness of our infrastructure-aware path planning approach relies significantly on the neural network architecture’s ability to process spatial information and learn meaningful feature representations. We employ a hybrid architecture that combines convolutional layers for spatial feature extraction with fully connected layers for decision making.
Figure 13 illustrates the architecture of our DDQN-CNN model. The network processes two input streams: the local observation window and the goal vector. The observation window contains both obstacle and surveillance information, requiring specialized processing to extract relevant spatial features.
Figure 13.
The architecture of the convolutional neural network combined with double deep Q-learning (DDQN-CNN) for infrastructure-aware UAV path planning.
The network architecture consists of three main components:
- Convolutional Feature Extractor: Processes the local observation window through three convolutional layers. These layers progressively extract spatial features related to obstacle configurations and surveillance quality patterns.
- Feature Fusion Module: The convolutional features are flattened into a 1D vector and concatenated with the 2D goal vector to create a comprehensive state representation that combines local environmental features with global goal information.
- Value Approximation Layers: The fused feature vector is processed through fully connected layers.
The convolutional layers help identify complex patterns in the local environment, such as obstacle configurations and surveillance blind zones, while the fully connected layers learn to associate these patterns with appropriate Q-values for each action. This architecture significantly outperforms standard MLP-based approaches by effectively leveraging the spatial structure of the grid world.
For the DDQN algorithm, we maintain two instances of this network: an online network for action selection and a target network for stable Q-value targets. The target network parameters are periodically updated from the online network parameters to stabilize training.
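The following PyTorch sketch mirrors the three-component architecture of Figure 13. Channel counts, kernel sizes, and hidden widths are illustrative assumptions, as the excerpt does not specify them.

```python
import torch
import torch.nn as nn

class DDQNCNN(nn.Module):
    """Sketch of the DDQN-CNN architecture (Figure 13); layer sizes assumed."""

    def __init__(self, window=11, n_actions=4):
        super().__init__()
        # (1) Convolutional feature extractor: three conv layers over the
        # 2-channel (obstacle + surveillance) local observation window.
        self.conv = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        conv_dim = 32 * window * window
        # (3) Value approximation layers over the fused representation.
        self.head = nn.Sequential(
            nn.Linear(conv_dim + 2, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs, goal):
        # obs: (B, 2, k, k) local window; goal: (B, 2) relative goal vector.
        feat = self.conv(obs).flatten(start_dim=1)
        # (2) Feature fusion: concatenate spatial features with the goal vector.
        fused = torch.cat([feat, goal], dim=1)
        return self.head(fused)            # Q-values, one per action
```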
3.2.4. Overview
Algorithm 1 summarizes the complete training procedure of the proposed DDQN-CNN-based infrastructure-aware path planning agent. The process includes environment interaction, -greedy exploration, experience replay optimization, target network updates, and reward clipping for stability.
Algorithm 1: DDQN-CNN-Based Infrastructure-Aware Flight Planning Algorithm.
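A condensed Python sketch of this training loop is given below, reusing the `ddqn_loss` sketch above. It assumes a gym-style environment whose `reset()` returns `(obs, goal)` and whose `step(a)` returns `((obs, goal), reward, done)`; all hyperparameters, including the reward-clipping range, are illustrative assumptions.

```python
import copy
import random
from collections import deque

import numpy as np
import torch

def train(make_env, online, episodes=3000, batch_size=64, gamma=0.99,
          eps=1.0, eps_min=0.05, eps_decay=0.999, sync_every=1000, max_steps=400):
    """Condensed sketch of Algorithm 1; hyperparameters are assumptions."""
    target = copy.deepcopy(online)
    optimizer = torch.optim.Adam(online.parameters(), lr=1e-4)
    buffer = deque(maxlen=100_000)                 # experience replay buffer
    total_steps = 0
    for _ in range(episodes):
        env = make_env()                           # map-pool sampling (Section 3.3.1)
        obs, goal = env.reset()
        done, t = False, 0
        while not done and t < max_steps:
            if random.random() < eps:              # epsilon-greedy exploration
                a = random.randrange(4)
            else:
                with torch.no_grad():
                    q = online(torch.as_tensor(obs)[None], torch.as_tensor(goal)[None])
                a = int(q.argmax())
            (nobs, ngoal), r, done = env.step(a)
            r = float(np.clip(r, -10.0, 10.0))     # reward clipping (range assumed)
            buffer.append((obs, goal, a, r, nobs, ngoal, float(done)))
            obs, goal = nobs, ngoal
            t, total_steps = t + 1, total_steps + 1
            if len(buffer) >= batch_size:          # replay-based optimization step
                batch = random.sample(buffer, batch_size)
                tensors = [torch.as_tensor(np.stack(col)) for col in zip(*batch)]
                loss = ddqn_loss(online, target, tensors, gamma)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            if total_steps % sync_every == 0:      # periodic target network update
                target.load_state_dict(online.state_dict())
        eps = max(eps_min, eps * eps_decay)        # decay exploration rate
```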
3.3. Numerical Study and Results
3.3.1. Experimental Setup
To evaluate the effectiveness of the proposed infrastructure-aware UAV path planning algorithm, we conducted training and testing in a simulated urban environment represented as a 50 × 50 grid map with randomly generated obstacles and surveillance performance variations. The agent was trained using a Double Deep Q-Network with Convolutional Neural Network (DDQN-CNN) architecture, as described in Section 3.2.3.
For a comprehensive comparison, we additionally trained three alternative models: (1) Deep Q-Network with Multi-Layer Perceptron (DQN-MLP), (2) Deep Q-Network with Convolutional Neural Network (DQN-CNN), and (3) Double Deep Q-Network with Multi-Layer Perceptron (DDQN-MLP).
These comparison models were used to assess the impact of network architecture and Q-learning variants, although the DDQN-CNN remains the primary method proposed.
To ensure model generalization, we constructed a map pool consisting of 100 pre-generated maps. During training, the environment for each episode was either sampled from this map pool (with probability 80%) or dynamically generated as a new map (with probability 20%). Each map contains randomly placed obstacles and surveillance blind spots. Additionally, the start and goal positions were randomly selected at the beginning of each episode to increase the diversity of navigation scenarios, as sketched below.
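The episode-level sampling can be sketched as follows; the helper mechanics (how maps are generated and how free cells are drawn) are assumptions for illustration.

```python
import random
import numpy as np

def sample_environment(map_pool, generate_map, p_pool=0.8):
    """Per-episode map selection sketch: reuse a pre-generated map with
    probability 0.8, otherwise generate a fresh one, then draw random
    traversable start and goal cells (mechanics assumed)."""
    obstacle_map = random.choice(map_pool) if random.random() < p_pool else generate_map()
    free = np.argwhere(obstacle_map == 0)          # traversable cells
    start, goal = free[np.random.choice(len(free), size=2, replace=False)]
    return obstacle_map, tuple(start), tuple(goal)
```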
Training was conducted over 3000 episodes per model, with multiple random seeds (0, 100, 499, 999, 5000) to ensure statistical robustness. Performance was evaluated based on several metrics, including total reward, success rate, shortest path ratio, blind step ratio, and path length.
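Two of these metrics can be computed as in the sketch below. Using the obstacle-free Manhattan distance as the "theoretical shortest distance" is our assumption, consistent with the 4-connected grid; the paper does not specify the baseline distance computation.

```python
def shortest_path_ratio(path, start, goal):
    """Actual trajectory length over the obstacle-free Manhattan distance
    (used here as a proxy for the theoretical shortest distance)."""
    d = abs(start[0] - goal[0]) + abs(start[1] - goal[1])
    return len(path) / max(d, 1)

def blind_step_ratio(path, blind_mask):
    """Fraction of trajectory steps that fall inside surveillance blind zones."""
    blind_steps = sum(1 for (x, y) in path if blind_mask[x, y])
    return blind_steps / max(len(path), 1)
```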
3.3.2. Training Performance Analysis
The training progression over 3000 episodes for all four models is illustrated in Figure 14, showing key performance metrics with 50-episode smoothing to better reveal underlying trends despite the inherent variability from randomized environments.
Figure 14.
Training performance comparison of different models (smoothed over 50 episodes).
Figure 14b presents the success rate evolution during training, that is, the percentage of episodes in which the agent successfully reaches the destination within the maximum allowed number of steps. The DDQN-MLP model demonstrates the fastest initial learning, reaching approximately 70% success rate by episode 600. The DDQN-CNN model shows slightly slower but steady improvement, achieving comparable success levels after around 1000 episodes and eventually exceeding an 80% success rate by the end of training. In contrast, the DQN-CNN model initially struggles, maintaining a much lower success rate in early episodes, but gradually recovers to around 70% after 3000 episodes. The DQN-MLP model shows steady progress throughout training, achieving about 75–80% success rate in the later stages.
Figure 14a shows the total reward comparison across models. The DDQN-MLP model achieves strong early performance, while the DDQN-CNN model, despite initial lower rewards, gradually improves and ultimately attains the highest and most stable rewards (approximately 500) by the end of training. The DQN-MLP model closely follows, whereas the DQN-CNN model, although recovering from negative rewards in early episodes, remains slightly behind other models throughout training.
The shortest path ratio comparison in Figure 14c highlights the differences in navigation efficiency among the models. The shortest path ratio is defined as the agent’s actual trajectory length divided by the theoretical shortest distance, so lower values (approaching 1) indicate more efficient path planning. Initially, all models exhibit high ratios between 8 and 12, indicating inefficient navigation. Over time, all models improve significantly, converging to ratios around 2.5–3.5. Among them, the DDQN-MLP model achieves the best final performance with the lowest shortest path ratios, while the CNN-based models, particularly DQN-CNN, show larger fluctuations.
Figure 14d depicts the blind step ratio over training, representing the proportion of the agent’s trajectory that passes through surveillance blind zones. Throughout training, all models keep the blind step ratio well below the blind-zone proportion of the randomly generated environments (5–8%). Notably, the DDQN-MLP and DDQN-CNN models stabilize at blind step ratios at or below about 1% in the later stages of training. This indicates that the proposed surveillance-aware path planning algorithm effectively enhances Conformance Monitoring (CM) performance by steering the agent away from blind zones. Beyond improving surveillance coverage consistency, lower blind step ratios also reduce the probability of tracking loss and increase flight safety, which is particularly critical for urban UAV operations.
Figure 14e shows the Temporal Difference (TD) loss curves, reflecting the learning stability of the models. As expected, all models initially experience an increase in loss during the exploration-heavy early episodes, followed by a steady decline as training progresses. The DDQN-MLP model demonstrates the fastest convergence in loss, stabilizing before episode 1000. Meanwhile, the DQN-CNN model exhibits the slowest loss reduction, requiring more episodes to achieve stable training.
These results collectively demonstrate the performance trade-offs among different architectures. While MLP-based models generally achieve faster early learning and more efficient paths, the CNN-based models, particularly DDQN-CNN, demonstrate better capabilities in balancing multiple objectives, including success rate, reward maximization, navigation efficiency, and surveillance coverage enhancement over extended training horizons.
To further summarize the overall performance of each model, a radar chart comparison is provided in Figure 15. The chart aggregates the normalized results across five evaluation metrics: total reward, success rate, shortest path ratio, blind step ratio, and TD loss. For total reward and success rate, higher values indicate better performance, whereas for shortest path ratio, blind step ratio, and loss, lower values are preferable (after appropriate normalization).
Figure 15.
Radar chart comparing the normalized performance of the four models across five evaluation metrics: total reward, success rate, shortest path ratio, blind step ratio, and TD loss.
The radar chart highlights the strong balance achieved by the DDQN-CNN model across all dimensions. It attains the highest values for total reward and success rate, reflecting superior learning and navigation capabilities. Although the DDQN-CNN model does not achieve the absolute best shortest path ratio and blind step ratio, the performance gaps compared to the best models are minor and practically insignificant. Combined with its low TD loss and consistent training dynamics, the DDQN-CNN demonstrates the most robust and balanced overall behavior among all candidates.
These findings validate the effectiveness of the proposed DDQN-CNN approach for infrastructure-aware UAV path planning tasks. Its ability to consistently achieve high rewards, maintain navigation efficiency, and enhance surveillance coverage, while preserving training stability, makes it particularly promising for practical deployment in real-world urban airspace management applications.
To provide a qualitative illustration of the navigation behaviors learned by the agent, several representative flight trajectories generated by the DDQN-CNN model are presented in Figure 16. These examples demonstrate the agent’s ability to effectively reach its goal while avoiding obstacles and minimizing traversal through surveillance blind zones. As training progresses, the trajectories become progressively more direct and efficient, reflecting the model’s improved planning capabilities and enhanced situational awareness.
Figure 16.
Representative flight trajectories generated by the DDQN-CNN model at different training stages. Obstacles are indicated with red dots, surveillance blind zones are marked in black, and the agent’s path from start (green) to goal (blue) is shown in orange.
4. Discussions and Concluding Remarks
This study presents a unified framework for infrastructure-aware UAV path planning that explicitly incorporates urban surveillance performance into the decision-making process. By modeling the spatial heterogeneity of communication and monitoring infrastructure and integrating it into a Deep Reinforcement Learning (DRL) framework, we enable UAVs to avoid areas with degraded tracking conditions while maintaining navigational efficiency.
Our empirical analysis based on Singapore’s urban data reveals substantial spatial disparities in surveillance quality, with some regions exhibiting significantly higher tracking delays and weaker signal strength. By conducting in situ latency measurements and simulating conformance monitoring behavior, we demonstrate that monitoring blind zones can adversely impact flight safety. The proposed DDQN-CNN planning model effectively learns to avoid such regions, yielding improved success rates, lower blind zone ratios, and more stable training dynamics compared to baseline models.
While promising, this work has several limitations. The current surveillance model is static and may not reflect real-time variations caused by network congestion or weather disturbances. Additionally, only surveillance-related constraints are considered, whereas real-world planning must also account for energy usage, no-fly zones, and regulatory restrictions. The training process, although robust, remains computationally intensive.
Moreover, since some communication-related data in our infrastructure model are derived from unofficial, crowd-sourced sources, they may introduce spatial or temporal uncertainty into the surveillance performance estimation. While partially validated through field measurements, such uncertainty could affect the precision of cluster assignments and simulation results. Future studies could mitigate this by incorporating more authoritative datasets or modeling data uncertainty explicitly to enhance robustness.
Looking ahead, future work could explore online adaptation through real-time infrastructure sensing, multi-objective policy learning, and collaborative navigation among multiple UAVs. In particular, infrastructure-aware navigation could serve as a foundational capability for decentralized and automated airspace management. This is especially relevant in the context of blockchain-based governance systems, which are being actively explored to support secure, privacy-preserving, and auditable urban airspace operations. By embedding real-time infrastructure performance into decision-making processes, such systems could enable trusted coordination among heterogeneous stakeholders without relying on centralized authorities, while enabling safer, more efficient, and regulation-compliant UAV operations in complex urban environments.
Author Contributions
Conceptualization, Q.L. and W.D.; methodology, Q.L. and W.D.; software, Q.L.; validation, Q.L., W.D. and Z.Y.; formal analysis, Q.L. and W.D.; investigation, Q.L. and W.D.; resources, W.D., Z.Y. and C.J.T.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L. and W.D.; visualization, Q.L.; supervision, Z.Y. and C.J.T.; project administration, W.D. and Z.Y.; funding acquisition, W.D. and C.J.T. All authors have read and agreed to the published version of the manuscript.
Funding
This research is supported by the Fundamental Research Funds for the Central Universities under the Civil Aviation University of China (3122025QD12), and by the National Natural Science Foundation of China No. 72374032. Qianyu Liu is supported by the China Scholarship Council (CSC).
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Pang, B.; Hu, X.; Dai, W.; Low, K.H. Stochastic route optimization under dynamic ground risk uncertainties for safe drone delivery operations. Transp. Res. Part E Logist. Transp. Rev. 2024, 192, 103717.
- Dai, W.; Deng, C. Urban Performance-Based Navigation (uPBN): Addressing the CNS Variation Problem in the Urban Airspace in the Context of UAS Traffic Management. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 5524–5529.
- Falanga, D.; Kim, S.; Scaramuzza, D. How fast is too fast? The role of perception latency in high-speed sense and avoid. IEEE Robot. Autom. Lett. 2019, 4, 1884–1891.
- Dai, W.; Quek, Z.H.; Pang, B.; Feroskhan, M. Analysis of UTM tracking performance for conformance monitoring via hybrid SITL Monte Carlo methods. Drones 2023, 7, 597.
- Dai, W.; Quek, Z.H.; Low, K.H. Probabilistic modeling and reasoning of conflict detection effectiveness by tracking systems towards safe urban air mobility operations. Reliab. Eng. Syst. Saf. 2024, 244, 109908.
- Pang, B.; Hu, X.; Dai, W.; Low, K.H. UAV path optimization with an integrated cost assessment model considering third-party risks in metropolitan environments. Reliab. Eng. Syst. Saf. 2022, 222, 108399.
- Jiang, Y.; Xu, X.X.; Zheng, M.Y.; Zhan, Z.H. Evolutionary computation for unmanned aerial vehicle path planning: A survey. Artif. Intell. Rev. 2024, 57, 267.
- Liu, Q.; Dai, W.; Ma, L.; Tessone, C.J. Towards Transparent and Privacy-Preserving Urban Airspace Management: A Blockchain-Based Scheme Under the Airspace-Resource-Centric Concept. In Proceedings of the 2025 Integrated Communications, Navigation and Surveillance Conference (ICNS), Brussels, Belgium, 8–10 April 2025; pp. 1–8.
- Keith, A.; Sangarapillai, T.; Almehmadi, A.; El-Khatib, K. A Blockchain-Powered Traffic Management System for Unmanned Aerial Vehicles. Appl. Sci. 2023, 13, 10950.
- ICAO RNPSORSG. Performance Based Navigation Manual. Working Draft 5.1-Final. 2007. Available online: https://www.icao.int/Meetings/AMC/MA/2007/perf2007/_PBN%20Manual_W-Draft%205.1_FINAL%2007MAR2007.pdf (accessed on 15 March 2025).
- Whitley, P. FAA UTM Concept of Operations-v2.0. FAA. 2020. Available online: https://www.faa.gov/sites/faa.gov/files/2022-08/UTM_ConOps_v2.pdf (accessed on 15 March 2025).
- Deng, C.; Wang, C.H.J.; Low, K.H. Investigation of using sky openness ratio as predictor for navigation performance in urban-like environment to support PBN in UTM. Sensors 2022, 22, 840.
- Wang, C.J.; Tan, S.K.; Low, K.H. Collision risk management for non-cooperative UAS traffic in airport-restricted airspace with alert zones based on probabilistic conflict map. Transp. Res. Part C Emerg. Technol. 2019, 109, 19–39.
- Wang, Y.; Pang, Y.; Chen, O.; Iyer, H.N.; Dutta, P.; Menon, P.K.; Liu, Y. Uncertainty quantification and reduction in aircraft trajectory prediction using Bayesian-Entropy information fusion. Reliab. Eng. Syst. Saf. 2021, 212, 107650.
- Pongsakornsathien, N.; Gardi, A.; Bijjahalli, S.; Sabatini, R.; Kistan, T. A multi-criteria clustering method for UAS traffic management and urban air mobility. In Proceedings of the 2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 3–7 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–9.
- Pongsakornsathien, N.; Bijjahalli, S.; Gardi, A.; Symons, A.; Xi, Y.; Sabatini, R.; Kistan, T. A Performance-Based Airspace Model for Unmanned Aircraft Systems Traffic Management. Aerospace 2020, 7, 154.
- Gonçalves, I.; Rodrigues, L.; Silva, F.A.; Nguyen, T.A.; Min, D.; Lee, J.W. Surveillance System in Smart Cities: A Dependability Evaluation Based on Stochastic Models. Electronics 2021, 10, 876.
- Liang, H.; Bai, H.; Sun, R.; Sun, R.; Li, C. Three-dimensional path planning based on DEM. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 5980–5987.
- Dai, W.; Pang, B.; Low, K.H. Conflict-free four-dimensional path planning for urban air mobility considering airspace occupancy. Aerosp. Sci. Technol. 2021, 119, 107154.
- Kothari, M.; Postlethwaite, I. A probabilistically robust path planning algorithm for UAVs using rapidly-exploring random trees. J. Intell. Robot. Syst. 2013, 71, 231–253.
- Chen, Y.-b.; Luo, G.-c.; Mei, Y.-s.; Yu, J.-q.; Su, X.-l. UAV path planning using artificial potential field method updated by optimal control theory. Int. J. Syst. Sci. 2016, 47, 1407–1420.
- Liu, J.; Luo, W.; Zhang, G.; Li, R. Unmanned Aerial Vehicle Path Planning in Complex Dynamic Environments Based on Deep Reinforcement Learning. Machines 2025, 13, 162.
- Aggarwal, S.; Kumar, N. Path planning techniques for unmanned aerial vehicles: A review, solutions, and challenges. Comput. Commun. 2020, 149, 270–299.
- Zhao, Y.; Zheng, Z.; Liu, Y. Survey on computational-intelligence-based UAV path planning. Knowl.-Based Syst. 2018, 158, 54–64.
- Besada-Portas, E.; de la Torre, L.; Moreno, A.; Risco-Martín, J.L. On the performance comparison of multi-objective evolutionary UAV path planners. Inf. Sci. 2013, 238, 111–125.
- He, W.; Qi, X.; Liu, L. A novel hybrid particle swarm optimization for multi-UAV cooperate path planning. Appl. Intell. 2021, 51, 7350–7364.
- Yuhang, R.; Liang, Z. An adaptive evolutionary multi-objective estimation of distribution algorithm and its application to multi-UAV path planning. IEEE Access 2023, 11, 50038–50051.
- Peng, C.; Huang, X.; Wu, Y.; Kang, J. Constrained multi-objective optimization for UAV-enabled mobile edge computing: Offloading optimization and path planning. IEEE Wirel. Commun. Lett. 2022, 11, 861–865.
- Babel, L. Online flight path planning with flight time constraints for fixed-wing UAVs in dynamic environments. Int. J. Intell. Unmanned Syst. 2022, 10, 416–443.
- Yao, P.; Wang, H.; Su, Z. Real-time path planning of unmanned aerial vehicle for target tracking and obstacle avoidance in complex dynamic environment. Aerosp. Sci. Technol. 2015, 47, 269–279.
- Kim, H.; Aung, P.S.; Munir, M.S.; Saad, W.; Hong, C.S. Cooperative Urban Air Mobility Trajectory Design for Power and AoI Optimization: A Multi-agent Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2025.
- Zammit, C.; van Kampen, E.J. Real-time 3D UAV path planning in dynamic environments with uncertainty. Unmanned Syst. 2023, 11, 203–219.
- Deniz, S.; Wu, Y.; Shi, Y.; Wang, Z. A reinforcement learning approach to vehicle coordination for structured advanced air mobility. Green Energy Intell. Transp. 2024, 3, 100157.
- Yun, W.J.; Jung, S.; Kim, J.; Kim, J.H. Distributed deep reinforcement learning for autonomous aerial eVTOL mobility in drone taxi applications. ICT Express 2021, 7, 1–4.
- Deniz, S.; Wang, Z. Autonomous Conflict Resolution in Urban Air Mobility: A Deep Multi-Agent Reinforcement Learning Approach. In Proceedings of the AIAA Aviation Forum and ASCEND 2024, Las Vegas, NV, USA, 29 July–2 August 2024; p. 4005.
- OpenStreetMap. Available online: https://www.openstreetmap.org (accessed on 15 March 2025).
- Reynolds, D.A. Gaussian mixture models. In Encyclopedia of Biometrics; Springer: Boston, MA, USA, 2009; pp. 659–663.
- Mundhenk, M.; Goldsmith, J.; Lusena, C.; Allender, E. Complexity of finite-horizon Markov decision process problems. J. ACM 2000, 47, 681–720.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- O’Shea, K.; Nash, R. An introduction to convolutional neural networks. arXiv 2015, arXiv:1511.08458.