Article

Traj-Q-GPSR: A Trajectory-Informed and Q-Learning Enhanced GPSR Protocol for Mission-Oriented FANETs

1 East-China Research Institute of Computer Technology, Shanghai 201800, China
2 School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
3 School of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, China
* Authors to whom correspondence should be addressed.
Drones 2025, 9(7), 489; https://doi.org/10.3390/drones9070489
Submission received: 20 May 2025 / Revised: 8 July 2025 / Accepted: 8 July 2025 / Published: 10 July 2025
(This article belongs to the Section Drone Communications)

Abstract

Routing in flying ad hoc networks (FANETs) is hindered by high mobility, trajectory-induced topology dynamics, and energy constraints. Conventional topology-based or position-based protocols often fail due to stale link information and limited neighbor awareness. This paper proposes a trajectory-informed routing protocol enhanced by Q-learning: Traj-Q-GPSR, tailored for mission-oriented UAV swarm networks. By leveraging mission-planned flight trajectories, the protocol builds time-aware two-hop neighbor tables, enabling routing decisions based on both current connectivity and predicted link availability. This spatiotemporal information is integrated into a reinforcement learning framework that dynamically optimizes next-hop selection based on link stability, queue length, and node mobility patterns. To further enhance adaptability, the learning parameters are adjusted in real time according to network dynamics. Additionally, a delay-aware queuing model is introduced to forecast optimal transmission timing, thereby reducing buffering overhead and mitigating redundant retransmissions. Extensive ns-3 simulations across diverse mobility patterns, node densities, and CBR traffic loads demonstrate that the proposed protocol consistently outperforms GPSR, achieving up to 23% lower packet loss, over 80% reduction in average end-to-end delay, and improvements of up to 37% and 52% in throughput and routing efficiency, respectively.

1. Introduction

Flying Ad Hoc Networks (FANETs) are gaining traction in disaster relief, environmental monitoring, and communication relay due to their rapid deployment, flexible mobility, and broad coverage [1,2,3,4]. With the rapid development of the low-altitude economy, unmanned aerial vehicles (UAVs) have emerged as critical carriers of edge intelligence [5]. However, the limited capabilities of individual UAVs highlight the need for UAV swarms, which can perform complex tasks more efficiently and cost-effectively. As a representative application of FANETs, UAV swarms present significant challenges to routing protocol design due to their high-speed movement in three-dimensional space, energy constraints, frequent link disruptions, and dynamic changes in network density [6,7,8].
Conventional routing protocols can be categorized into two main groups: topology-based and location-based [9]. Classic topology-based protocols, such as AODV [10] and OLSR [11], maintain routing tables either reactively or proactively. However, in FANETs, where network topology changes frequently, these protocols tend to generate outdated routing information and incur substantial control overhead. In contrast, Greedy Perimeter Stateless Routing (GPSR) [12], a prominent position-based protocol, relies solely on node location information, offering strong scalability. GPSR employs two forwarding strategies to guarantee data delivery. In greedy mode, each node selects the neighbor nearest to the destination. When no such neighbor is available, indicating a routing void, the protocol transitions to perimeter mode, forwarding packets along the network boundary using the right-hand rule until greedy forwarding can resume.
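As a concrete illustration of the greedy mode just described, the following minimal sketch selects the next hop by geographic distance and signals a routing void by returning `None` (the function and variable names are our own illustration, not part of the GPSR specification):

```python
import math

def greedy_next_hop(current, neighbors, destination, positions):
    """Pick the neighbor geographically closest to the destination.

    Returns None when no neighbor improves on the current node's own
    distance to the destination -- the 'routing void' case in which
    GPSR would fall back to perimeter-mode forwarding.
    """
    def dist(a, b):
        return math.dist(positions[a], positions[b])

    best, best_d = None, dist(current, destination)
    for n in neighbors:
        d = dist(n, destination)
        if d < best_d:
            best, best_d = n, d
    return best  # None => routing void, switch to perimeter mode
```

In perimeter mode the packet would then be forwarded along the planar graph boundary by the right-hand rule until a node closer to the destination is reached.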
GPSR’s greedy forwarding, while flexible and scalable, struggles in highly dynamic UAV swarms due to its reliance on single-hop neighbor information, resulting in degraded routing performance from frequent link interruptions and topology changes [13,14]. In recent years, researchers have enhanced GPSR by incorporating multi-hop information [15,16], energy level [17], and reinforcement learning techniques such as Q-learning to optimize routing decisions [18,19,20]. However, these approaches often rely on historical data for prediction, resulting in high computational overhead and underutilizing the future trajectory information of UAVs. In mission-oriented UAV swarms, nodes are typically equipped with planners to coordinate the swarm’s global trajectory, and trajectory data is shared within the swarm to facilitate collaborative task execution. Inspired by [21], our study innovatively integrates trajectory knowledge into GPSR routing by constructing a two-hop neighbor table at any given moment. This provides comprehensive spatiotemporal environmental awareness and, combined with multidimensional features and adaptive learning mechanisms, significantly improves routing stability and transmission efficiency.
The main contributions of this paper are as follows:
  • Trajectory-Aware Neighbor Expansion: We propose a novel routing strategy for UAV swarms by leveraging mission-planner-derived trajectory knowledge. We develop a time-aware two-hop neighbor table that enables nodes to anticipate network dynamics, enhancing both local and global topological awareness. This proactive approach significantly reduces routing voids and fosters stable communication in dynamic environments;
  • Q-Learning-Driven Routing Optimization: We introduce an adaptive routing framework for UAV networks, utilizing Q-learning to navigate their high dynamism. The framework incorporates a comprehensive state space, including two-hop distances, residual energy, queue lengths, and trajectory dynamics, and employs a multi-objective reward function with dynamically adaptive parameters. This design ensures balanced and efficient next-hop selection, promoting robust routing decisions under rapidly evolving network conditions;
  • Latency-Focused Queue Scheduling: To improve real-time performance and load balance, we devise a composite delay model that leverages trajectory insights to refine packet transmission timing. This mechanism mitigates congestion, reduces end-to-end latency, and provides an effective queue management solution for high-load, multi-hop scenarios in UAV swarms;
  • Comprehensive Simulation Validation: Through extensive ns-3 simulations, we assess Traj-Q-GPSR across diverse scenarios, varying node densities, UAV speeds, and CBR loads. Comparative evaluations reveal substantial enhancements in packet delivery ratio, end-to-end delay, routing efficiency, and throughput, demonstrating the robustness and effectiveness of our approach in dynamic UAV swarm settings.

2. Related Work

2.1. GPSR and Its Adaptations in Dynamic Networks

GPSR (Greedy Perimeter Stateless Routing) is a well-known position-based routing protocol, widely adopted in vehicular ad hoc networks (VANETs) owing to its scalability and low communication overhead [13,22,23,24]. However, when applied to flying ad hoc networks (FANETs), GPSR faces significant challenges caused by three-dimensional high-speed mobility, frequent link breaks, and highly dynamic topologies. Its greedy forwarding strategy tends to fall into routing voids, compromising delivery reliability and network robustness. To address these issues, recent studies have focused on enhancing link stability and improving the resilience of GPSR-based routing.
In [25], Alsalami et al. proposed the OLSR+GPSR protocol, combining the topology-driven OLSR with the location-driven GPSR. A fuzzy control system was used to dynamically adjust the HELLO message interval based on node velocity and positional prediction error, thereby improving link awareness. Moreover, the protocol integrates multiple metrics, including neighborhood density, velocity vector, buffer occupancy, and residual energy, into the MPR (Multi-Point Relay) selection process. Although the scheme enhances OLSR’s applicability to FANETs, it still requires partial topology maintenance and lacks predictive capability regarding future topology changes.
To address multipath diversity and load balancing, the AM-GPSR protocol [26] introduces an adaptive HELLO mechanism and a multipath greedy forwarding scheme. By adjusting the HELLO interval according to neighbor stability and maintaining multiple candidate paths with priority ranking, the protocol provides better resilience to link failures and improves load balancing. However, the additional control overhead and reliance on single-hop information limit its scalability in large-scale UAV swarms.
SZLS-GPSR [27] introduces the concept of a “communication safe zone,” restricting candidate next-hop nodes to a stable sub-region within the communication radius. By jointly considering direction projection length, cumulative link duration, and movement direction within the zone, it significantly reduces the probability of falling into routing voids and improves packet delivery success. Nevertheless, this approach focuses mainly on local stability and does not exploit trajectory information to improve global topological awareness.
In UF-GPSR [28], Sudesh Kumar et al. introduce a utility function in the greedy forwarding process, which integrates residual energy ratio, distance reduction, movement direction, link risk, and average velocity to evaluate next-hop suitability. This enables more accurate and robust forwarding decisions.
In high-density networks, CF-GPSR [29] utilizes a cylindrical filtering method to constrain candidate nodes within a 3D cylinder between the source and the destination. Combined with metrics such as velocity alignment, ideal distance, residual energy, and movement angle, the protocol builds a composite utility function that enables low-overhead and reliable next-hop selection.
Overall, current GPSR adaptations improve performance in dynamic UAV networks by enhancing greedy forwarding with energy awareness, link stability, and multi-factor evaluation. However, most methods still rely on real-time single-hop neighbor information, lacking the ability to anticipate future topological changes. Moreover, few approaches have integrated machine learning-based decision-making to construct truly adaptive routing protocols.

2.2. Trajectory-Aware Routing in FANETs

In FANETs, UAV mission planners can often generate highly accurate future trajectory sequences. Nevertheless, most existing studies rely on historical data for trajectory prediction rather than leveraging planner-generated trajectories directly [30].
In [17], the authors proposed EORB-TP, which predicts near-future movement based on time-series modeling of historical positions to estimate link duration and optimize forwarder set selection in opportunistic routing. Similarly, Vinti Gupta et al. in [31] apply an optimized 3D interpolation model for energy-efficient trajectory prediction. In [30], a prediction module is incorporated into a reactive greedy protocol to estimate link stability and improve path durability. While these approaches improve stability, they generally suffer from limited prediction accuracy and high computational complexity, which undermines real-time applicability.
Only a few works have directly utilized trajectory data from task planners for routing. In [21], the authors propose a mission-driven protocol where UAVs fly in a predefined formation and determine packet deliverability based on their current phase. Though effective in reducing packet loss in idealized scenarios, this method lacks flexibility in next-hop selection and adaptability to dynamic environments. Muhammad Morshed Alam et al. in [32] introduce a multi-agent deep reinforcement learning framework to jointly optimize trajectory, spectrum allocation, and routing. Although performance improves holistically, the solution’s high computational overhead and reliance on deep models hinder its deployment in resource-constrained UAV nodes.
In summary, trajectory-aware routing in FANETs is still underexplored. Most existing work depends on prediction rather than directly using planner-generated trajectories. Furthermore, such knowledge has not been fully integrated into position-based protocols like GPSR. To fill this gap, this paper proposes a novel integration of future trajectory information and Q-learning into GPSR, constructing time-aware two-hop neighbor tables to support adaptive and predictive routing decisions.

2.3. Q-Learning-Based Routing Protocols

The emergence of deep reinforcement learning (DRL) has led to increasing interest in intelligent routing protocols for FANETs [33,34,35,36,37,38]. Q-learning, as a model-free RL method, can iteratively refine routing decisions through interactions with the environment, making it particularly suitable for dynamic and rapidly evolving networks.
In [39], the authors of Q-FANET extended traditional Q-learning by incorporating node velocity and direction into the state space, allowing decisions to align with mobility patterns. However, Q-FANET lacks mechanisms for predicting link failures or achieving effective load balancing. QTAR [38] (Q-learning-based topology-aware routing) improves upon this by introducing traffic-aware metrics into the reward function, enabling better congestion handling under high load, though it also does not consider trajectory dynamics. QMR [40] (Q-learning based Multi-objective optimization Routing) integrates multi-objective optimization into Q-learning via Pareto-based reward functions to balance latency, energy, and reliability. While theoretically powerful, its high-dimensional state-action space results in longer training times, limiting real-time applicability.
More recently, PER-D3QN [41] (Prioritized Experience Replay Dueling Double DQN) applies a prioritized experience replay mechanism and a dueling architecture to mitigate Q-value overestimation and improve convergence. The approach builds congestion- and failure-aware state and reward models, showing improved robustness in dynamic FANET conditions. QFRP [37] (Q-Learning and Fuzzy Logic based Routing Protocol) enhances Q-table update granularity by combining HELLO and ACK signals, and integrates fuzzy logic to assess neighbors based on Q-value, link quality, and access delay. IQMR [15] (Improved Q-learning-based Multihop Routing) utilizes a $Q(\lambda)$ learning framework with an elaborate state space encompassing UAV status, connectivity, coverage, and collision risk, along with adaptive learning rates and dynamic HELLO intervals to improve convergence under varying network conditions.
Overall, Q-learning-based routing protocols are evolving toward multi-dimensional sensing, adaptive parameter tuning, and intelligent decision-making. Nevertheless, existing work primarily responds to current states without integrating future topology evolution or planner-based trajectory information. Our proposed method combines trajectory knowledge with Q-learning to construct a predictive, robust, and high-efficiency routing protocol tailored for dynamic UAV swarm networks.

3. System Model

3.1. Network Model

Figure 1 presents a high-level overview of the proposed system, which includes both the operational scenario and the internal routing logic within each UAV. In the illustrated mission environment, UAVs perform collaborative tasks over urban and forest areas with diverse mobility patterns. Each UAV is equipped with a routing module enhanced by trajectory knowledge and a Q-learning agent. The example highlights a typical routing void encountered when the GPSR protocol fails due to insufficient neighbor coverage. By contrast, the proposed method anticipates future link availability and dynamically selects more reliable multi-hop paths.
This paper considers a UAV swarm that operates autonomously without ground station control, using only intra-swarm communication to execute tasks. The swarm consists of $m$ low-altitude drones, $U = \{U_1, U_2, \ldots, U_m\}$, initially distributed randomly in a 3D space. Trajectory information is shared within the swarm to coordinate task planning and execution. To support predictive coordination and efficient routing, each UAV periodically disseminates its future trajectory through extended beacon messages embedded in HELLO packets. Neighboring nodes maintain a time-indexed local trajectory table, which is updated upon reception to support anticipatory coordination and predictive routing decisions.
Each UAV node $U_i$ determines its three-dimensional coordinates at time $t$ via GPS, represented as $P_i(t) = (x_i(t), y_i(t), z_i(t))$. We suppose that the 3D mission area is bounded by $x_{\min} \le x \le x_{\max}$, $y_{\min} \le y \le y_{\max}$, $z_{\min} \le z \le z_{\max}$. The swarm follows a predefined task planning algorithm to determine its collective trajectory, where the future position of each UAV is governed by its current state and planned motion:
$$p_i(t + \Delta t) = f\big(p_i(t),\, v_i(t),\, a_i(t),\, \Delta t\big)$$
Here, $f(\cdot)$ represents the motion model function, $v_i(t)$ denotes the velocity vector, and $a_i(t)$ the acceleration vector. The trajectory planning algorithm ensures collision avoidance as a fundamental constraint, maintaining a minimum safety distance $r$ between any two UAVs at all times:
$$\lVert p_i(t) - p_j(t) \rVert \ge r, \quad \forall\, i \ne j,\ \forall\, t \ge 0$$
where $\lVert \cdot \rVert$ represents the Euclidean norm.
We assume that UAVs have a fixed communication range, allowing wireless communication between UAVs that are within this radius. Consequently, the network topology can be modeled as a time-dependent graph $G(t) = (V(t), E(t))$, where the edge set $E(t)$ represents the dynamic communication links between UAVs.
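The edge set $E(t)$ can be computed directly from the positions shared within the swarm at a given instant. The sketch below is illustrative (node identifiers and the dictionary layout are our own assumptions), pairing every two UAVs whose Euclidean distance is within the fixed radius $R$:

```python
import itertools
import math

def connectivity_graph(positions_at_t, comm_range):
    """Build the edge set E(t) of the time-dependent graph G(t) = (V(t), E(t)).

    positions_at_t maps a UAV id to its (x, y, z) coordinates at time t;
    an undirected edge exists whenever the Euclidean distance between two
    UAVs does not exceed the fixed communication radius.
    """
    edges = set()
    for i, j in itertools.combinations(sorted(positions_at_t), 2):
        if math.dist(positions_at_t[i], positions_at_t[j]) <= comm_range:
            edges.add((i, j))
    return edges
```

Recomputing this graph per trajectory time step yields the sequence of topologies over which routing decisions are evaluated.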

3.2. Energy Model

Considering the heterogeneity in UAV energy levels, we assume that each UAV $U_i$ starts with an initial energy $E_{\text{initial}}$, drawn at random from a given range. As the mission progresses, energy is continuously consumed, primarily by flight and communication.
Flight energy consumption is the primary contributor to overall UAV energy usage and is influenced by aerodynamic properties, thrust requirements, and velocity. It consists of two main components: hovering power consumption and motion power consumption. The power required to maintain a hovering state in the absence of wind resistance is given by
$$P_{\text{hover},i} = \frac{mg}{\eta}$$
where m is the UAV’s mass and η represents the efficiency of the propulsion system, which quantifies the conversion between electrical power and mechanical thrust. During motion, energy consumption is largely determined by the UAV’s velocity and aerodynamic drag, expressed as
$$P_{\text{motion},i} = \frac{1}{2} C_d \rho A v_i^3$$
where $C_d$ is the drag coefficient, $\rho$ is the air density, $A$ is the UAV's frontal area, and $v_i$ is its velocity. The total flight energy consumption over a time interval $\Delta t$ is given by
$$E_{\text{flight},i} = \int_{t}^{t+\Delta t} P_{\text{flight},i}\, dt = \int_{t}^{t+\Delta t} \left( P_{\text{hover},i} + P_{\text{motion},i} \right) dt$$
The energy consumed for communication depends on the power required for information exchange with neighboring UAVs. It consists of three main components: transmission power, reception power, and idle power when the wireless module remains active but is not transmitting or receiving data. Assuming the transmission and reception power remain constant over time, the total communication energy consumption over Δ t is given by
$$E_{\text{comm},i} = P_{\text{tx},i} T_{\text{tx}} + P_{\text{rx},i} T_{\text{rx}} + P_{\text{idle},i} T_{\text{idle}}$$
where $P_{\text{tx},i}$ and $P_{\text{rx},i}$ are the power required for transmission and reception, respectively, and $T_{\text{tx}}$, $T_{\text{rx}}$, $T_{\text{idle}}$ represent the time spent in each corresponding state.
At any given time t, the remaining energy of UAV U i is
$$E_{\text{res},i}(t) = E_{\text{initial},i} - \left( E_{\text{flight},i} + E_{\text{comm},i} \right)$$
If this energy level drops below the predefined threshold $E_{\text{threshold}}$, the UAV is considered to have insufficient power to continue its mission and must disengage from the network to return to base.
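The energy bookkeeping above can be sketched in a few lines. For simplicity the flight integral is collapsed by assuming a constant speed over $\Delta t$, so it reduces to $(P_{\text{hover}} + P_{\text{motion}}) \cdot \Delta t$; all parameter names and values are illustrative, not the paper's configuration:

```python
def residual_energy(e_initial, mass, eta, c_d, rho, area, speed,
                    dt, p_tx, t_tx, p_rx, t_rx, p_idle, t_idle, g=9.81):
    """Residual energy E_res = E_initial - (E_flight + E_comm).

    Assumes constant speed over dt, so the flight-energy integral
    reduces to (P_hover + P_motion) * dt.
    """
    p_hover = mass * g / eta                       # hovering power m*g/eta
    p_motion = 0.5 * c_d * rho * area * speed**3   # aerodynamic drag power
    e_flight = (p_hover + p_motion) * dt           # flight energy over dt
    e_comm = p_tx * t_tx + p_rx * t_rx + p_idle * t_idle  # radio energy
    return e_initial - (e_flight + e_comm)
```

A node would compare the returned value against $E_{\text{threshold}}$ to decide whether it must leave the network and return to base.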

3.3. Transmission Delay

Packet transmission delay in UAV networks is influenced by the queuing delay $T_q$, the channel contention delay $T_c$, and the transmission delay $T_t$. Each UAV node maintains a FIFO-based transmission queue. To accurately estimate packet transmission delay, we adopt the M/M/1 queue model, which characterizes a single-server system with Poisson arrivals and exponentially distributed service times, making it well-suited for wireless communication modeling.
According to M/M/1 queue theory, the average queuing delay for a packet is
$$T_q = \frac{\rho}{\mu (1 - \rho)}$$
where the queue utilization factor $\rho = \lambda/\mu$ must satisfy $0 \le \rho < 1$ for stability. The service rate $\mu$ is determined by the available channel bandwidth $B$ and the packet size $L$, such that $\mu = B\,\text{(bps)} / L\,\text{(bits)}$.
UAV nodes typically use the CSMA/CA mechanism for channel access, leading to additional contention delays. We adopt the Bianchi backoff model [42] to characterize this contention. In this model, a node undergoes multiple backoff stages before successfully transmitting a packet, where the backoff time per stage is
$$T_{\text{backoff}} = \frac{CW_{\min} + 1}{2} \cdot T_{\text{slot}}$$
where $CW_{\min}$ is the minimum contention window size, and $T_{\text{slot}}$ is the time slot duration. The expected number of backoff stages, $E[b] = (1 - p_c)/p_c$ (where $p_c$ is the collision probability, dependent on network load and the number of neighboring nodes), determines the channel contention delay:
$$T_c = E[b] \cdot T_{\text{backoff}}$$
The transmission delay, which represents the time required to physically send the packet over the channel, is given by
$$T_t = \frac{L\,\text{(bits)}}{B\,\text{(bps)}}$$
Combining the queuing, contention, and transmission delays, the total estimated delay for a packet is
$$T_{\text{delay}} = \frac{\rho L}{B(1 - \rho)} + E[b] \cdot T_{\text{backoff}} + \frac{L}{B}$$
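The three delay terms combine straightforwardly. The sketch below is illustrative (the arrival rate $\lambda$, $CW_{\min}$, slot time, and collision probability $p_c$ are example inputs, not values from the paper):

```python
def total_delay(arrival_rate, bandwidth_bps, packet_bits,
                cw_min, t_slot, p_collision):
    """Estimate T_delay = T_q + T_c + T_t from the M/M/1 and backoff terms."""
    mu = bandwidth_bps / packet_bits           # service rate mu = B / L
    rho = arrival_rate / mu                    # utilization, must satisfy rho < 1
    if rho >= 1:
        raise ValueError("queue unstable: rho >= 1")
    t_q = rho / (mu * (1 - rho))               # M/M/1 queuing delay
    t_backoff = (cw_min + 1) / 2 * t_slot      # mean backoff time per stage
    e_b = (1 - p_collision) / p_collision      # expected number of backoff stages
    t_c = e_b * t_backoff                      # channel contention delay
    t_t = packet_bits / bandwidth_bps          # transmission delay L / B
    return t_q + t_c + t_t
```

The resulting estimate is what the protocol adds to the packet's arrival time to predict its dispatch instant.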
Due to dynamic topology changes in UAV networks, packet transmission delay depends on both the current queue state and the future network topology. To enhance routing efficiency, we predict the transmission time and assess whether a suitable next-hop neighbor will be available. For a packet p arriving at time t, the expected transmission time is
$$t_{\text{send},p} = t + T_{\text{delay}}$$
Using trajectory information shared among the UAV swarm, the set of next-hop neighbors at $t_{\text{send},p}$ can be determined as
$$N_i(t_{\text{send},p}) = \{\, U_j \mid d_{ij}(t_{\text{send},p}) \le R \,\}$$
where $R$ is the fixed communication radius. If no next-hop neighbors are available at $t_{\text{send},p}$, the packet is dropped to prevent unnecessary queuing:
$$\text{If } N_i(t_{\text{send},p}) = \emptyset \text{, then drop the packet.}$$
By incorporating trajectory-based predictions, this approach reduces unnecessary retransmissions, optimizes queue management, and improves overall network efficiency.
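The enqueue-or-drop decision can be sketched as below. The shape of the trajectory table (time step mapped to per-UAV positions) and all names are our own illustrative assumptions; a real implementation would index the table shared via HELLO beacons:

```python
import math

def enqueue_or_drop(packet_arrival, t_delay, trajectories, node, comm_range):
    """Keep a packet only if a neighbor is predicted at t_send = t + T_delay.

    trajectories[t][u] gives the planned position of UAV u at discretized
    time t. If no neighbor of `node` is predicted within comm_range at the
    estimated dispatch instant, the packet is dropped proactively.
    """
    t_send = round(packet_arrival + t_delay)   # snap to the table's time step
    snapshot = trajectories.get(t_send, {})
    me = snapshot.get(node)
    if me is None:
        return "drop"                          # no trajectory data at t_send
    neighbors = [u for u, p in snapshot.items()
                 if u != node and math.dist(me, p) <= comm_range]
    return "enqueue" if neighbors else "drop"
```

Dropping at enqueue time, rather than after the queuing and contention delays have been paid, is what saves the buffering and retransmission overhead described above.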
Figure 2 visualizes how this paper combines the trajectory information with the sending delay model to optimize the queue management of the nodes.
In the diagram, dashed blue lines represent the UAVs' planned trajectories, while solid blue arrows denote available communication links. Under a task-driven deployment, UAVs $U_A$, $U_B$, $U_C$, and $U_G$ converge on the left to execute Task 1, whereas $U_D$, $U_E$, and $U_F$ move to the right for Task 2.
At time $t_k$, shown in panel (a), both $U_A$ and $U_E$ hold packets in their transmission queues addressed to $U_G$, and their queue lengths are equal. Using the predicted transmission delay, both packets are slated for dispatch at time $t_{k+5}$, as depicted in panel (b). Although no viable route from $U_A$ to $U_G$ exists at $t_k$, by $t_{k+5}$ UAV $U_A$ can reach $U_G$ via the stable multi-hop path
$$P: A \to B \to C \to D \to G.$$
Conversely, $U_E$ becomes isolated, its neighbor set empties, and no path to $U_G$ is available at $t_{k+5}$. Consequently, its packet is proactively discarded. This anticipatory drop prevents wasteful transmissions and thus materially improves queue efficiency.

4. The TRAJ-Q-GPSR Algorithm

4.1. Trajectory-Informed UAV Routing

To address the inherent limitation of traditional GPSR protocols, where routing decisions rely solely on the current positions of nodes for greedy forwarding, this study introduces the integration of trajectory knowledge and a two-hop neighbor table. These enhancements enable UAV nodes to obtain a more comprehensive understanding of both spatial and temporal network dynamics. As a result, the likelihood of encountering routing voids is reduced, and both routing stability and path selection are significantly improved. The proposed approach first describes the acquisition of trajectory knowledge, followed by its mathematical modeling. Based on this foundation, a method is developed to construct a two-hop neighbor table that leverages the acquired trajectory information.
We assume that the mission planner computes the trajectory for each UAV within a future time window $[t_0, t_f]$. The trajectory $T_i$ of UAV $U_i$ at time $t$ is defined as a time-parameterized geometric state representation:
$$T_i(t) = \big( p_i(t),\, v_i(t),\, a_i(t) \big), \quad t \in [t_0, t_f]$$
where $p_i(t)$, $v_i(t)$, and $a_i(t)$ denote the position, velocity, and acceleration of UAV $U_i$ at time $t$, respectively.
To facilitate computation and storage, the trajectory is discretized into a finite time series:
$$T_i = \{\, (p_i(t_k),\, v_i(t_k),\, a_i(t_k)) \mid t_k = t_0 + k \Delta t,\; k = 0, 1, \ldots, K \,\}$$
where $\Delta t$ represents the sampling interval, and $K$ is the number of planned steps, determined by the planning algorithm's prediction range.
For the entire UAV swarm, the set of all predicted trajectories within the given time window is defined as
$$\mathcal{T} = \bigcup_{i=1}^{N} T_i$$
This dataset encapsulates the motion information of all UAV nodes within the prediction horizon, providing valuable spatiotemporal knowledge for routing decisions.
In the GPSR protocol, routing nodes can only sense their immediate neighbors at the current moment. Given the rapid changes in network topology, this often results in routing failures or frequent transitions into the perimeter forwarding mode. However, by sharing trajectory information, UAVs can proactively anticipate their future neighbors. The neighbor set $N_i(t_k)$ of UAV $U_i$ at a future time $t_k$ is defined as the set of all nodes within its communication range $R$:
$$N_i(t_k) = \{\, U_j \mid \lVert p_i(t_k) - p_j(t_k) \rVert \le R \,\}$$
By leveraging trajectory sharing, each UAV can obtain information about its neighbors’ neighbors, thereby constructing a two-hop neighbor table:
$$N_i^{(2)}(t_k) = \bigcup_{U_j \in N_i(t_k)} N_j(t_k)$$
Here, $N_i^{(2)}(t_k)$ represents the two-hop neighbor set of UAV $U_i$ at a future time $t_k$.
Constructing a two-hop neighbor table at future time instances enables UAVs to make more informed routing decisions. This approach not only helps avoid local optima and reduces the likelihood of routing voids, but also supports the optimization of queue management strategies. By incorporating estimated packet transmission times, the method ultimately contributes to more robust and efficient routing performance.
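The construction above can be sketched in a few lines, with positions taken from the shared trajectory table at a future instant $t_k$. The dictionary layout and the choice to exclude the node itself from its own two-hop set are our own illustrative assumptions:

```python
import math

def one_hop(positions, u, comm_range):
    """Neighbor set N_u(t_k) from the positions planned for time t_k."""
    return {v for v, p in positions.items()
            if v != u and math.dist(positions[u], p) <= comm_range}

def two_hop_table(positions, u, comm_range):
    """Two-hop set N_u^(2)(t_k): union of N_j(t_k) over one-hop neighbors j.

    positions maps a UAV id to its planned coordinates at t_k, as recovered
    from the trajectories shared within the swarm.
    """
    table = set()
    for j in one_hop(positions, u, comm_range):
        table |= one_hop(positions, j, comm_range)
    table.discard(u)  # illustrative choice: exclude the node itself
    return table
```

Evaluating this table at the predicted dispatch time, rather than at the current instant, is what lets the protocol anticipate routing voids before they occur.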
The Traj-Q-GPSR theoretical framework of this paper is shown in Figure 3, which integrates trajectory knowledge with reinforcement learning to optimize routing decisions. The left part constructs a two-hop neighbor table using trajectory information. It evaluates node availability at the estimated transmission time based on a delay model, thereby determining whether to enqueue or drop the packet. The right part incorporates reinforcement learning to dynamically adjust routing weights. These adjustments are based on state features such as distance, queue length, and residual energy. This mechanism supports more efficient forwarding decisions and enables continuous policy updates.

4.2. Q-Learning Framework and Theory

Q-learning, a model-free reinforcement learning method, enables agents to learn optimal policies through interaction with their environment, adapting to unknown conditions. Its low complexity and adaptive nature make it ideal for optimizing routing decisions on edge devices like UAVs, particularly given the dynamic and uncertain topology of UAV networks caused by node mobility.
Q-learning is based on value iteration, and its theoretical foundation lies in the Markov Decision Process (MDP). An MDP provides a rigorous mathematical framework for describing how an agent learns the optimal policy in a stochastic environment, and is defined by the five-tuple $(S, A, P, R, \gamma)$. Here, the state space $S$ represents all possible states of the system; in our study, the state includes the UAV node's current position, velocity, neighbor information, and queue status. The action space $A$ encompasses all possible actions that the agent can take, corresponding to the selection of a next-hop forwarding node. The state transition probability $P$ defines the probability that the system transitions to a new state $s'$ after the agent takes an action $a$ in state $s$; this transition is influenced by the UAV's motion model and changes in the network topology. The reward function $R$ measures the immediate gain obtained by executing action $a$ in state $s$, thereby reflecting the quality of that action. The discount factor $\gamma \in (0, 1]$ balances the importance of immediate and long-term rewards, with larger $\gamma$ placing more emphasis on long-term returns and smaller $\gamma$ on immediate gains.
The fundamental principle of Q-learning relies on an iterative update mechanism between states, actions, and rewards. Its objective is to find the optimal action-selection strategy for any state $s$ by continuously exploring, experimenting, and updating Q-values, thereby enhancing routing efficiency and network adaptability. Specifically, when an agent in state $s$ chooses an action $a$, it receives an immediate reward $R(s, a)$ and transitions to a new state $s'$. The agent maintains a Q-value table $Q(s, a)$ and updates it using the Bellman equation:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
Here, $\alpha$ is the learning rate, determining the influence of newly acquired information on the existing Q-values, and $\max_{a'} Q(s', a')$ represents the maximum Q-value achievable from the new state $s'$, i.e., the maximum expected future reward. Through this update mechanism, the Q-values gradually converge to their optimal values, guiding the agent to make efficient decisions in a dynamic environment.

4.3. Traj-Q-GPSR Algorithm

Traj-Q-GPSR integrates trajectory information and two-hop neighbor awareness into routing, enabling forward-looking Q-learning for efficient UAV data delivery. This section details the implementation of the Traj-Q-GPSR algorithm from four aspects: the design of the state and action spaces, the construction of the reward function, the dynamic adjustment strategy for Q-learning parameters, and the pseudocode representation.

4.3.1. Definition of State and Action Spaces

In this algorithm, each UAV node functions as an agent whose state representation comprises both physical attributes and network context. Specifically, the state includes the node’s current position, velocity, and acceleration, along with two-hop neighbor information obtained from shared trajectory data. Additionally, the queue status is incorporated to reflect the current load of the node. The state space is formally defined as follows:
$$s = \big( p_i,\, v_i,\, a_i,\, N_i^{(2)},\, Q_i \big)$$
where $Q_i$ indicates the current queue length reflecting the load of the UAV. The action space $A$ defines the decision options available to the UAV in each state. Given the high dynamism of UAV networks, conventional GPSR, which relies solely on the current positions of one-hop neighbors for greedy forwarding, often suffers from local optima and routing voids. Therefore, this study extends the action space to include two-hop neighbor information, allowing UAVs to make next-hop selections with broader topological awareness and achieve improved global routing performance. The action space is formally defined as
$$A = \{\, a \mid a \in N_i^{(2)} \,\}$$
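The state and action definitions above can be mirrored in a small data structure. All field and function names here are hypothetical, chosen to echo the symbols $p_i$, $v_i$, $a_i$, $N_i^{(2)}$, and $Q_i$; candidate actions are simply drawn from the two-hop neighbor table, as in the formula.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UavState:
    """Agent state s = (p_i, v_i, a_i, N_i^(2), Q_i); names are illustrative."""
    position: tuple            # p_i
    velocity: tuple            # v_i
    acceleration: tuple        # a_i
    two_hop_neighbors: frozenset  # N_i^(2), built from shared trajectories
    queue_length: int          # Q_i

def action_space(state: UavState) -> frozenset:
    """A = {a | a in N_i^(2)}: candidate next hops are the two-hop neighbors."""
    return state.two_hop_neighbors
```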

4.3.2. Design of the Reward Function

The reward function forms the cornerstone of Q-learning, directly shaping the optimization direction of routing decisions. It is designed to balance multiple objectives—transmission efficiency, energy distribution, network load, and mobility prediction—while improving stability by mitigating routing voids. The composite reward function is expressed as
$$R(s,a) = R_{\text{two-hop}}(s,a) + w_v R_v(s,a) + w_h P(s,a)$$
where $R_{\text{two-hop}}(s,a)$ denotes the two-hop reward, $R_v(s,a)$ is the velocity projection reward, and $P(s,a)$ is the routing void penalty term. The coefficients $w_v$ and $w_h$ represent the weights of the velocity reward and the void penalty, respectively.
  • Two-hop reward
    The two-hop reward is composed of single-hop rewards and their extension. The single-hop reward $R(s_n)$ evaluates the immediate benefit of selecting a neighbor $U_n$ as the next hop when UAV node $U_i$ is in state $s$. It considers three key factors: distance, residual energy, and queue load, defined as
    $$R(s_n) = w_d R_d(n) + w_e R_e(n) + w_q R_q(n)$$
    • Distance Reward $R_d(n)$
      Promotes selection of nodes closer to the destination, calculated as the proportion of distance reduced:
      $$R_d(n) = \frac{d_i - d_n}{d_i}$$
      Here, $d_i = \lVert p_i(t) - p_D \rVert$ is the Euclidean distance from the current node to the destination $U_D$, and $d_n = \lVert p_n(t) - p_D \rVert$ is the distance from the neighbor to the destination. If $d_n < d_i$, then $R_d(n) > 0$, yielding a positive reward; otherwise, it is negative.
    • Energy Reward $R_e(n)$
      This prioritizes nodes with higher remaining energy to extend network lifetime:
      $$R_e(n) = \frac{E_n}{E_{\max}}$$
      where $E_n$ is the remaining energy of $U_n$, and $E_{\max}$ is the maximum energy capacity.
    • Queue Length Reward $R_q(n)$
      This prefers nodes with lower loads to reduce transmission delay:
      $$R_q(n) = 1 - \frac{Q_n}{Q_{\max}}$$
    Expanding the single-hop reward to consider the influence of the next-hop neighbors, the two-hop reward is defined as
    $$R_{\text{two-hop}}(s,a) = R(s_n) + \sum_{m \in N(n)} \frac{1}{D_m} R(n \to m)$$
    where $R(n \to m)$ is the single-hop reward from $U_n$ to its neighbor $U_m$, and $D_m$ is the degree of $U_m$ (number of neighbors). The normalization factor $\frac{1}{D_m}$ prevents reward inflation in highly connected regions.
  • Velocity Projection Reward
    In UAV networks, node mobility affects link stability. The velocity projection reward $R_v(s,a)$ leverages trajectory information to evaluate whether the selected next-hop UAV is moving toward the destination, improving adaptation to network dynamics.
    Given the velocity vector $v_n(t)$ of UAV $U_n$ and the directional vector $d = p_D - p_n(t)$ from $U_n$ to the destination $U_D$, the reward is defined as
    $$R_v(s,a) = \cos\theta = \frac{v_n(t) \cdot d}{\lvert v_n(t) \rvert \, \lvert d \rvert}$$
    A positive $\cos\theta$ indicates movement toward the destination, yielding a positive reward; a negative value indicates movement away, resulting in a penalty. This encourages selecting UAVs whose motion trends enhance link availability.
  • Routing Void Penalty
    To prevent routing voids—where no neighbor is closer to the destination than the current node—the penalty term refines void severity assessment. Void severity $H(s)$ is defined as
    $$H(s) = 1 - \frac{N_{\text{closer}}(s)}{N_{\text{total}}(s)}$$
    where $N_{\text{closer}}(s)$ is the number of neighbors closer to the destination than $U_i$, and $N_{\text{total}}(s)$ is the total number of neighbors. The penalty is
    $$P(s,a) = -H(s)$$
    If $H(s) = 1$, indicating a full routing void, the penalty is maximized.
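Putting the pieces together, the composite reward could be computed roughly as follows. This is a sketch under stated assumptions: the weight values are placeholders (the paper's tuned coefficients are not restated here), and the helper signatures are illustrative.

```python
import math

def single_hop_reward(d_cur, d_next, energy, e_max, queue, q_max,
                      w_d=0.5, w_e=0.3, w_q=0.2):
    """R(s_n) = w_d*R_d + w_e*R_e + w_q*R_q; weights are placeholder values."""
    r_d = (d_cur - d_next) / d_cur   # distance progress toward the destination
    r_e = energy / e_max             # residual-energy reward
    r_q = 1.0 - queue / q_max        # queue-load reward
    return w_d * r_d + w_e * r_e + w_q * r_q

def two_hop_reward(r_sn, second_hop_rewards):
    """R_twohop = R(s_n) + sum_m (1/D_m) * R(n->m).
    `second_hop_rewards` is a list of (R(n->m), degree D_m) pairs."""
    return r_sn + sum(r / deg for r, deg in second_hop_rewards)

def velocity_reward(v_n, d_vec):
    """cos(theta) between the neighbor's velocity and the direction to dest."""
    dot = sum(a * b for a, b in zip(v_n, d_vec))
    norm = math.hypot(*v_n) * math.hypot(*d_vec)
    return dot / norm if norm else 0.0

def void_penalty(n_closer, n_total):
    """P(s,a) = -H(s), with H(s) = 1 - N_closer/N_total."""
    return -(1.0 - n_closer / n_total)

def composite_reward(r_twohop, r_v, p, w_v=0.3, w_h=0.5):
    """R(s,a) = R_twohop + w_v * R_v + w_h * P; weights are placeholders."""
    return r_twohop + w_v * r_v + w_h * p
```

For example, a neighbor that halves the remaining distance, holds half its energy, and has a near-empty queue earns a single-hop reward of 0.56 under these placeholder weights.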

4.3.3. Adaptive Q-Learning Parameters

The learning rate α and discount factor γ are critical parameters in Q-learning, directly influencing the convergence speed and decision-making performance of the algorithm. Given the high dynamics and task diversity in UAV networks, these parameters require adaptive adjustment. In rapidly changing environments, they should facilitate quick learning, while in more stable conditions, they enable fine-tuned decision-making. This approach ensures a balanced trade-off between exploration and exploitation, thereby enhancing adaptability to topology variations.
The learning rate α determines the extent to which new information influences the existing Q-values. A fixed α approach can struggle in dynamic environments: it may converge too slowly when the environment changes drastically or over-adjust and deviate from the optimal solution when the environment is stable. To address this, an adaptive adjustment mechanism based on Q-value fluctuations is proposed. When Q-values fluctuate significantly, indicating unstable learning, α should increase to speed up updates. When Q-values stabilize, α should decrease to refine the strategy. The formula for dynamically adjusting α is
$$\alpha = \alpha_{\min} + (\alpha_{\max} - \alpha_{\min}) \cdot \left( 1 - \frac{1}{1 + \lvert Q(s,a) - Q_{\text{prev}}(s,a) \rvert} \right)$$
where $\alpha_{\max}$ and $\alpha_{\min}$ are the maximum and minimum allowable values for the learning rate, and $Q(s,a)$ and $Q_{\text{prev}}(s,a)$ represent the current and previous Q-values, respectively. This strategy enables UAVs to rapidly adjust their policies in dynamic environments while maintaining stability during convergence.
The discount factor γ determines the balance between immediate rewards and long-term returns. In UAV networks, changes in the neighbor set reflect the network’s topological instability. When the neighbor set changes significantly, future uncertainty rises, and  γ should decrease to reduce reliance on future rewards. When changes are minimal, γ can increase to prioritize long-term planning. The neighbor change rate is defined as
$$\Delta N = \lvert N(t_k) \,\triangle\, N(t_{k-1}) \rvert$$
where $\triangle$ denotes the symmetric difference, i.e., the count of neighbors that joined or left between consecutive updates.
Based on this, the adaptive adjustment formula for γ is
$$\gamma = \gamma_{\min} + (\gamma_{\max} - \gamma_{\min}) \cdot e^{-\beta \Delta N}$$
where $\gamma_{\max}$ and $\gamma_{\min}$ denote the upper and lower bounds of the discount factor, and $\beta$ is a tuning parameter. When $\Delta N$ is large (indicating a highly dynamic environment), $\gamma$ decreases, focusing on immediate decisions over uncertain future rewards. When $\Delta N$ is small (stable environment), $\gamma$ approaches $\gamma_{\max}$, emphasizing long-term gains. This approach enhances adaptability in dynamic conditions while maintaining optimization in stable ones.
By dynamically adjusting the learning rate α based on Q-value fluctuations and the discount factor γ based on neighbor change rates, this strategy enables Q-learning to quickly adapt to environmental shifts in UAV networks while ensuring long-term convergence stability. This mechanism significantly boosts the efficiency and robustness of routing decisions, offering an effective solution for learning and optimization in UAV network applications.
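The two adaptive rules above might be implemented as small helper functions. The bounds and the $\beta$ value below are illustrative, not the paper's tuned settings; the $\alpha$ rule is written so that larger Q-value fluctuations yield a larger learning rate, as described in the text.

```python
import math

def adaptive_alpha(q_new, q_prev, a_min=0.1, a_max=0.9):
    """alpha grows with |Q - Q_prev|: large fluctuation -> faster learning.
    Bounds a_min/a_max are placeholder values."""
    fluctuation = abs(q_new - q_prev)
    return a_min + (a_max - a_min) * (1.0 - 1.0 / (1.0 + fluctuation))

def adaptive_gamma(delta_n, g_min=0.5, g_max=0.95, beta=0.2):
    """gamma = g_min + (g_max - g_min) * exp(-beta * delta_N):
    many neighbor changes -> discount future rewards more heavily."""
    return g_min + (g_max - g_min) * math.exp(-beta * delta_n)
```

With a stable Q-table (`q_new == q_prev`), `adaptive_alpha` returns its lower bound; with no neighbor churn (`delta_n == 0`), `adaptive_gamma` returns its upper bound.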

4.4. Pseudo-Code

The pseudo-code representation of the Traj-Q-GPSR algorithm proposed in this paper is summarized in Algorithm 1.
Algorithm 1 Traj-Q-GPSR Algorithm
  • Require: Global trajectory information $\mathcal{T}$, current time $t_k$, node energy, communication radius $R$
  • Ensure: Updated Q-states and next-hop routing decisions
  1: /* Initialization */
  2: for each UAV node $i$ do
  3:     Initialize the current state $s$ based on $\mathcal{T}$
  4:     Construct the two-hop neighbor table $N_i^{(2)}(t_k)$
  5:     Initialize the Q-table $Q(s,a) \leftarrow 0$
  6:     Initialize parameters $\alpha = \alpha_{\max}$ and $\gamma = \gamma_{\max}$
  7: end for
  8: /* Packet Processing Logic */
  9: if a packet is pending for transmission then
 10:     if the queue is at maximum capacity then
 11:         drop the packet
 12:     else
 13:         compute the estimated sending time $t_{\text{send},p}$
 14:         if $N_i(t_{\text{send},p}) = \emptyset$ then
 15:             drop the packet
 16:         else
 17:             enqueue the packet
 18:         end if
 19:     end if
 20: end if
 21: /* Q-Learning Routing Decision */
 22: while the network is active do
 23:     for each UAV node $i$ do
 24:         if $E_{\text{res},i}(t_k) \ge E_{\text{threshold}}$ then
 25:             if the queue is non-empty then
 26:                 if a neighbor closer to the destination exists then
 27:                     select the next hop from the Q-learning action space
 28:                     forward the packet
 29:                     compute the immediate reward via (24)
 30:                     update the Q-table
 31:                     update $\alpha$ and $\gamma$ via (33) and (35)
 32:                 else
 33:                     a routing void is encountered
 34:                     apply perimeter forwarding by the right-hand rule
 35:                 end if
 36:             end if
 37:         else
 38:             node $i$ departs the network and returns to base
 39:         end if
 40:     end for
 41: end while

5. Simulation Results and Discussions

5.1. Simulation Settings

In our simulations, we employ ns-3 as the network simulator and MATLAB R2022b for plotting the results. Notably, the mobility scenarios are generated using BonnMotion [43], a tool developed by the University of Bonn, which produces node movement trace files for the UAV network based on various mobility models. The detailed simulation parameters [15,44] are summarized in Table 1.
We adopt four performance metrics [22,38,44] to quantify the improvements in the proposed routing protocol, and—by holding all other factors constant—evaluate its behavior under varying node densities, mobility speeds, and CBR connections.
  • Packet Loss Ratio (PLR): The PLR is defined as the ratio of lost packets to the total packets sent by the source, reflecting packet losses due to routing failures, congestion, or channel errors:
    $$\mathrm{PLR} = \frac{N_{\text{lost}}}{N_{\text{sent}}}$$
    A lower PLR indicates higher reliability and robustness in dynamic topologies and interference-prone environments.
  • End-to-End Delay (E2ED): E2ED measures the total time a packet takes from transmission at the source to reception at the destination, capturing the timeliness of the protocol:
    $$\mathrm{E2ED} = \frac{1}{N_{\text{recv}}} \sum_{p=1}^{N_{\text{recv}}} \left( t_{\text{recv},p} - t_{\text{send},p} \right)$$
    where $N_{\text{recv}}$ is the number of successfully received packets, and $t_{\text{send},p}$, $t_{\text{recv},p}$ are the send and receive timestamps of packet $p$.
  • Routing Efficiency: To assess resource utilization in multi-hop scenarios, we define routing efficiency as the average ratio of successfully delivered packets per forwarding hop:
    $$\mathrm{RoutingEfficiency} = \frac{1}{N} \cdot \frac{\mathrm{totalRx}}{\mathrm{totalTx} \times \mathrm{HopCount}}$$
    This metric rewards protocols that keep forwarding overhead low while ensuring successful delivery. A higher value indicates better path optimization and resource efficiency.
  • Throughput: Throughput quantifies the network’s capacity to successfully deliver payload data per unit time, in bits per second (bps):
    $$\mathrm{Throughput} = \frac{\mathrm{totalRx} \times \mathrm{PacketSize} \times 8}{\mathrm{simtime}}$$
    It directly reflects the protocol’s data delivery capability and bandwidth utilization under given network conditions.
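For reference, the four metrics above reduce to a few one-line computations over simulation counters; the counter names used here are assumptions for illustration, not ns-3 identifiers.

```python
def packet_loss_ratio(n_lost, n_sent):
    """PLR = N_lost / N_sent."""
    return n_lost / n_sent

def end_to_end_delay(recv_times, send_times):
    """Mean of (t_recv - t_send) over successfully received packets."""
    return sum(r - s for r, s in zip(recv_times, send_times)) / len(recv_times)

def routing_efficiency(total_rx, total_tx, hop_count, n=1):
    """Delivered packets per forwarding hop, averaged over n samples."""
    return total_rx / (n * total_tx * hop_count)

def throughput_bps(total_rx, packet_size_bytes, sim_time_s):
    """(totalRx * PacketSize * 8) / simtime, in bits per second."""
    return total_rx * packet_size_bytes * 8 / sim_time_s
```

For instance, 1000 received 512-byte packets over a 10 s run yield a throughput of 409,600 bps.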
In addition, this study examines node energy consumption and routing overhead to complement the performance evaluation. Energy consumption reflects the protocol’s impact on forwarding-related power usage, while routing overhead quantifies control message load required for maintaining connectivity and route discovery under dynamic conditions.

5.2. Impact of Node Density

First, we analyze the impact of UAV node density on network performance. As shown in Table 1, a fixed simulation area is configured, and the node mobility speed ranges from 15 m/s to 25 m/s in this section. For each parameter setting, 30 independent simulations are conducted, and a 95% confidence interval is adopted. Figure 4 shows the impact of node density on network performance.
At any node density, the proposed Traj-Q-GPSR algorithm consistently achieves a significantly lower packet loss rate compared to the traditional GPSR. As node density increases, the packet loss curve reflects a trade-off between improved connectivity and intensified channel contention. Specifically, when the number of nodes increases from 30 to around 70, the connectivity rapidly improves and path breaks are significantly reduced. Leveraging trajectory prediction and two-hop queue optimization, Traj-Q-GPSR reduces the packet loss rate from approximately 33.13% to 25.5%. In contrast, traditional GPSR, relying solely on instant location and single-hop decision-making, struggles with frequent neighbor changes and inefficient recovery, leading to an increased packet loss rate, peaking at 40.66%. As the number of nodes continues to increase from 70 to 100, the network enters a high-density phase. The redundancy among nodes increases, and alternative routes become more available, allowing GPSR to partially compensate for local routing failures and reducing the loss rate to 35.46%. Meanwhile, Traj-Q-GPSR, by proactively filtering invalid packets and applying queue-aware intelligent routing, effectively mitigates congestion and queuing losses under dense scenarios, further reducing the packet loss rate to 22.76%.
As can be seen from the end-to-end delay curves in Figure 4b, across all regimes, Traj-Q-GPSR achieves substantially lower delays than the conventional GPSR. Although GPSR’s latency decreases modestly as density grows, it remains around 1000 ms. In contrast, Traj-Q-GPSR maintains delays below 500 ms, rapidly dropping to 191.94 ms at moderate densities and stabilizing near 100 ms at high densities. The observed performance gap stems from the fundamental differences in their forwarding strategies. GPSR employs a one-hop greedy forwarding approach, which often triggers perimeter recovery when routing voids occur, resulting in frequent detours and retransmissions. In contrast, Traj-Q-GPSR utilizes trajectory information and two-hop neighbor awareness to enable more forward-looking and informed relay selection. It also integrates queue-aware packet management to improve forwarding efficiency and reduce unnecessary overhead. Together, these innovations dramatically curtail the detour and retransmission overheads that dominate GPSR’s delay profile.
As shown in Figure 4c, routing efficiency for both Traj-Q-GPSR and GPSR decreases as node count grows from 30 to 100, but Traj-Q-GPSR consistently outperforms GPSR. At low densities, adding nodes rapidly improves connectivity but increases average hop count and forwarding overhead even faster, causing the ratio $\mathrm{totalRx}/(\mathrm{totalTx} \times \mathrm{HopCount})$ to drop. Once the topology nears saturation, gains in delivered packets and in the transmitted-hop product balance each other, stabilizing efficiency. In dense regimes, Traj-Q-GPSR’s trajectory prediction and two-hop neighbor awareness fully exploit available paths, yielding low delay and high reliability, so its efficiency remains at a high plateau.
Figure 4d plots throughput as node count increases from 30 to 100. Enhanced connectivity and spatial reuse drive overall growth: GPSR rises from approximately $1.03 \times 10^5$ bps to $1.40 \times 10^5$ bps, whereas Traj-Q-GPSR climbs from $1.32 \times 10^5$ bps to $1.92 \times 10^5$ bps. This trend mirrors reductions in packet loss and latency, reflecting more multi-hop paths, fewer link failures and retransmissions, and lower MAC-layer contention at high density. Thanks to its fusion of two-hop neighbor information, trajectory prediction, and reinforcement-learning-based selection, Traj-Q-GPSR delivers about 28% higher throughput than GPSR at 30 nodes, increasing to 37% at 100 nodes, thereby markedly improving data-delivery efficiency in dense UAV networks.

5.3. Impact of Node Mobility Speed

Node mobility speed is a key factor affecting the stability and performance of FANETs. Higher mobility speeds result in more frequent topology changes and increased link-break probability, imposing greater demands on the robustness and adaptability of routing protocols. In this subsection, we investigate the performance of each protocol across four core metrics under various mobility speeds to assess their adaptability in highly dynamic environments. The network is fixed at 70 UAV nodes with 20 CBR pairs. Each speed scenario is simulated 30 times, and all results are reported with a 95% confidence interval.
As illustrated in Figure 5a, the packet loss rate of Traj-Q-GPSR remains nearly constant as the UAV average speed increases, while GPSR’s loss rate rises from 21.8% up to 26.8%, with a pronounced jump around 20 m/s and a further increase nearing 27%. This divergence stems from their differing robustness to high-mobility links: GPSR relies solely on one-hop greedy forwarding, so when nodes move rapidly links break more frequently and retransmissions surge. In contrast, Traj-Q-GPSR uses two-hop neighbor tables and trajectory information to select more stable relays. It also applies queue-based rules to avoid sending packets over links that are likely to break, reducing both retransmission overhead and packet loss. At 30 m/s, Traj-Q-GPSR’s loss rate is 17% lower than GPSR’s, clearly demonstrating its superior link reliability in highly dynamic environments.
As shown in Figure 5b, when the UAV speed increases from 10 m/s to 30 m/s, the end-to-end delay of GPSR first rises from 878.7 ms to 991.4 ms, and then drops to 827.5 ms. This trend is mainly caused by prolonged fallback to recovery mode under low-to-moderate mobility, where broken links result in delayed forwarding. At higher speeds, however, faster topology updates enable quicker discovery of alternative relay paths and reduce average waiting time. In contrast, Traj-Q-GPSR maintains a stable delay range of 134–170 ms across all speeds, benefiting from trajectory prediction for preemptive two-hop neighbor construction and Q-learning-based robust relay selection. This allows Traj-Q-GPSR to sustain higher routing connectivity and forwarding efficiency under dynamic conditions, achieving an approximate 80% reduction in overall delay.
Figure 5c shows that Traj-Q-GPSR consistently outperforms GPSR in routing efficiency, and both exhibit a synchronized “rise-then-fall” trend as speed increases. This behavior closely mirrors the combined characteristics of packet loss rate and end-to-end delay discussed earlier. At moderate speeds, topology updates occur at an optimal rate, yielding good inter-node connectivity and enhanced path validity, which in turn boosts successful packet delivery and reduces average hop count, driving up routing efficiency. As speed increases, frequent link breakages reduce route stability. Although delay may slightly decrease, rising packet loss lowers throughput, leading to reduced efficiency. By contrast, Traj-Q-GPSR leverages trajectory awareness and Q-learning to mitigate high-mobility link fluctuations, maintaining superior efficiency across all speed regimes—achieving a peak improvement of over 52% compared to GPSR.
As shown in Figure 5d, Traj-Q-GPSR consistently maintains a high and stable throughput performance, with values ranging from $1.8 \times 10^5$ bps to $2.0 \times 10^5$ bps. In contrast, GPSR achieves significantly lower throughput, fluctuating around $1.0 \times 10^5$ to $1.2 \times 10^5$ bps. This trend closely aligns with the packet loss rate and routing efficiency observed earlier, particularly at the speed of 15 m/s, where the lowest packet loss and highest efficiency result in elevated throughput, indicating a synergistic optimization of overall transmission quality. Even under high-speed mobility, Traj-Q-GPSR leverages trajectory awareness and reinforcement learning strategies to adapt swiftly, thereby preserving throughput stability. Overall, Traj-Q-GPSR consistently demonstrates superior link maintenance and transmission reliability across the entire speed range, significantly enhancing system throughput in dynamic network conditions.

5.4. Impact of CBR Connections

In this subsection, we investigate the impact of different CBR traffic intensities on network performance. By varying the number of CBR connections, we analyze the differences between Traj-Q-GPSR and traditional GPSR in the four key performance metrics mentioned above. This evaluation provides a direct insight into each protocol’s network carrying capacity and congestion resilience, assessing their adaptability and robustness under varying traffic loads. The simulation scenario in this section involves 70 UAV nodes with node mobility ranging from 15 m/s to 25 m/s. Each simulation setting was run 30 times, and the results are reported with a 95% confidence interval.
As shown in Figure 6a, as the number of CBR connections increases from 10 to 50, the network load intensifies, resulting in more frequent link contention and collisions, which collectively lead to a general rise in packet loss rate. Under such conditions, the traditional GPSR protocol sees its loss rate increase from approximately 33.4% to 39.3%, highlighting the limitations of its single-hop greedy forwarding strategy. This approach often leads to routing voids and excessive retransmissions under high load. In contrast, Traj-Q-GPSR achieves a packet loss rate of 28.5% even at 10 CBR connections, which rises to 33.3% at 20 connections due to increasing contention. However, aided by two-hop neighbor information, trajectory prediction, and queue-aware optimization, the loss rate decreases to 29.8% and 30.3% at 30 and 50 pairs, respectively—consistently maintained around 30%. Notably, under the heavy-load condition of 50 CBR pairs, Traj-Q-GPSR achieves a 30.3% loss rate, which represents a reduction of approximately 23% compared to GPSR. This result clearly demonstrates Traj-Q-GPSR’s superiority in mitigating congestion and avoiding routing-void recovery failures under high-traffic conditions.
As illustrated in Figure 6b, end-to-end delay increases significantly with the growth of CBR traffic intensity. This is primarily due to aggravated link contention, increased MAC-layer retransmissions, and the accumulation of queueing delays. In the traditional GPSR protocol, the delay surges from approximately 305.9 ms to 3209.5 ms, indicating the considerable overhead caused by frequent route recoveries and retransmissions triggered by its single-hop greedy forwarding under high traffic loads. In contrast, the delay in Traj-Q-GPSR only rises from 82.2 ms to 859.0 ms, consistently staying below one second. This trend is also consistent with the packet loss patterns observed in Figure 6a. As corroborated by earlier experimental results, Traj-Q-GPSR demonstrates substantial improvement in end-to-end latency, further validating its capability to maintain low delay in high-load communication scenarios.
As shown in Figure 6c, the routing efficiency of GPSR declines significantly with the increase in the number of CBR pairs. When the CBR count reaches 50, its routing efficiency drops by 40.4% compared to the case with only 10 CBR pairs. Moreover, during moderate traffic conditions, GPSR exhibits marked fluctuations, reflecting its unstable forwarding performance due to frequent routing voids and retransmissions under heavy loads. In contrast, Traj-Q-GPSR maintains a consistently high level of routing efficiency, with variations contained within 5% across all tested loads. Under the maximum traffic load of 50 pairs, Traj-Q-GPSR improves routing efficiency by 77.8% compared to GPSR, demonstrating a significantly enhanced data delivery success rate per unit of forwarding cost.
As illustrated in Figure 6d, the overall throughput increases steadily with the number of CBR communication pairs. This growth is attributed to the elevated traffic activity and improved resource utilization across the network. Throughout the process, Traj-Q-GPSR consistently achieves higher link utilization efficiency by leveraging future trajectory awareness and Q-learning-based routing decisions, particularly under high-traffic conditions. In contrast, GPSR exhibits significantly lower throughput. When the number of communication pairs reaches 50, the throughput of Traj-Q-GPSR reaches approximately $4.3 \times 10^5$ bps, which is double that of GPSR at only $2.14 \times 10^5$ bps. The performance gap widens as the traffic load increases, confirming the superior adaptability and scheduling efficiency of Traj-Q-GPSR in complex communication scenarios.

5.5. Energy Consumption

To comprehensively evaluate the proposed Traj-Q-GPSR protocol, we further compare its energy consumption with that of GPSR under varying node densities, as illustrated in Figure 7 and summarized in Table 2. By incorporating neighbor energy information into the routing decision process, Traj-Q-GPSR aims to enhance energy efficiency during data forwarding.
Compared to GPSR, Traj-Q-GPSR consistently exhibits marginally lower energy consumption across all tested node densities. As the number of nodes increases, the energy consumption of both protocols rises accordingly. In high-density scenarios with 100 nodes, Traj-Q-GPSR achieves a 17.4% reduction in energy consumption relative to GPSR. The improvement in energy efficiency is relatively moderate, primarily because of the additional overhead caused by the exchange of trajectory information. Nevertheless, the results highlight that Traj-Q-GPSR effectively reduces redundant transmissions and lowers routing maintenance costs in densely deployed FANET environments.

5.6. Routing Overhead

In the evaluation of routing overhead, this study considers two key dimensions: communication overhead and storage overhead. For communication overhead, as outlined in the system model section, trajectory information is periodically disseminated through HELLO packets that encapsulate positional beacons. The structure of the HELLO packet, as detailed in Tables 3 and 4, extends the standard GPSR format by embedding trajectory-related fields, without introducing additional control message types. To rigorously quantify the cost of control traffic, we adopt the Byte-Level Routing Overhead Ratio (BROR) as a measurement metric [45]. This indicator captures the proportion of routing-related bytes relative to the total transmission, offering a fine-grained view of protocol efficiency at the byte level. The BROR is defined as
$$\mathrm{BROR} = \frac{\text{Control Bytes}}{\text{Control Bytes} + \text{Average Data Payload Bytes}}$$
This formulation enables an objective assessment of protocol-induced overhead throughout the simulation period, independent of packet counts or sizes.
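As a quick sanity check, the BROR definition above is a one-line computation over the byte counters; the argument names are illustrative.

```python
def bror(control_bytes, avg_data_payload_bytes):
    """BROR: fraction of transmitted bytes spent on routing control traffic."""
    return control_bytes / (control_bytes + avg_data_payload_bytes)
```

For example, 100 control bytes against an average 900 data payload bytes gives a BROR of 0.1, i.e., 10% of bytes are routing overhead.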
Figure 8 and Table 5 present a comparative analysis of the BROR for Traj-Q-GPSR and the conventional GPSR protocol under varying node densities. As anticipated, Traj-Q-GPSR incurs a slightly higher overhead due to the incorporation of trajectory-related information in control packets. Nevertheless, as the network density increases, improved connectivity and higher throughput contribute to a steady decline in BROR for both protocols. Importantly, the overhead gap between Traj-Q-GPSR and GPSR diminishes from 0.57% at low density to 0.32% in high-density scenarios. These findings suggest that, while the proposed protocol introduces marginally greater control traffic, this cost is effectively amortized in denser topologies. More significantly, the gains in routing stability and overall network efficiency justify the additional overhead introduced by the trajectory-aware mechanism.
This work extends the conventional one-hop neighbor table to a time-aware two-hop structure and employs Q-tables within a Q-learning framework, thereby incurring greater memory overhead compared to GPSR. Since storage requirements grow with neighbor count, node density is the dominant factor affecting memory cost.
Table 6 presents the average number of neighbor entries maintained by Traj-Q-GPSR and GPSR under different node densities. The results indicate that Traj-Q-GPSR experiences accelerated growth in storage load as density increases, yet remains within acceptable limits, reflecting a balance between enhanced topological awareness and resource efficiency.

6. Conclusions

This work addresses the core challenges of routing in UAV swarm networks, including link instability, routing voids, and lack of adaptability under dynamic and mission-driven conditions. To overcome these issues, we propose a trajectory-informed and reinforcement learning-based routing framework that integrates future mobility knowledge into a time-aware two-hop neighbor structure. Coupled with an adaptive Q-learning mechanism and delay-aware queue scheduling, the proposed approach enhances routing stability and decision intelligence. In future work, we aim to extend the framework to handle uncertain or asynchronous trajectory updates and improve learning convergence under constrained communication conditions.

Author Contributions

Conceptualization, M.W. and S.C.; Methodology, M.W. and F.X.; Software, M.W. and T.P.; Validation, B.J., S.C., H.X. and T.P.; Formal analysis, H.X. and T.P.; Investigation, S.C.; Data curation, M.G.; Writing—original draft, M.W.; Writing— review & editing, B.J., S.C., H.X., M.G. and F.X.; Visualization, M.G.; Supervision, B.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2024YFB4504500.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. All co-authors have seen and agree with the contents of the manuscript and there is no financial interest to report. We certify that the submission is original work and is not under review at any other publication.

Abbreviations

The following abbreviations are used in this manuscript:
UAV: Unmanned Aerial Vehicle
FANET: Flying Ad Hoc Network
VANET: Vehicular Ad Hoc Network
GPSR: Greedy Perimeter Stateless Routing
Traj-Q-GPSR: Trajectory-Informed Q-Learning-Based GPSR
CBR: Constant Bit Rate
PLR: Packet Loss Ratio
MDP: Markov Decision Process
E2ED: End-to-End Delay
BROR: Byte-Level Routing Overhead Ratio

References

  1. Sadaf, J.; Hassan, A.; Ahmad, R.; Ahmed, W.; Ahmed, R.; Saadat, A.; Guizani, M. State-of-the-Art and Future Research Challenges in UAV Swarms. IEEE Internet Things J. 2024, 11, 19023–19045. [Google Scholar] [CrossRef]
  2. Kaddour, M.; Oubbati, O.S.; Rachedi, A.; Lakas, A.; Bendouma, T.; Chaib, N. A Survey of UAV-Based Data Collection: Challenges, Solutions and Future Perspectives. J. Netw. Comput. Appl. 2023, 216, 103670. [Google Scholar] [CrossRef]
  3. Mehdi, H.; Ali, S.; Rahmani, A.M.; Lansky, J.; Nulicek, V.; Yousefpoor, M.S.; Yousefpoor, E.; Darwesh, A.; Lee, S.W. A Smart Filtering-Based Adaptive Optimized Link State Routing Protocol in Flying Ad Hoc Networks for Traffic Monitoring. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102034. [Google Scholar] [CrossRef]
  4. Ridha, G.; Mami, S.; Chokmani, K. Drones in Precision Agriculture: A Comprehensive Review of Applications, Technologies, and Challenges. Drones 2024, 8, 686. [Google Scholar] [CrossRef]
  5. Abbas, S.; Talib, M.A.; Ahmed, I.; Belal, O. Integration of UAVs and FANETs in Disaster Management: A Review on Applications, Challenges and Future Directions. Trans. Emerg. Telecommun. Technol. 2024, 35, E70023. [Google Scholar] [CrossRef]
  6. Mansoor, N.; Hossain, M.I.; Rozario, A.; Zareei, M.; Arreola, A.R. A Fresh Look at Routing Protocols in Unmanned Aerial Vehicular Networks: A Survey. IEEE Access 2023, 11, 66289–66308. [Google Scholar] [CrossRef]
  7. Chen, S.; Jiang, B.; Pang, T.; Xu, H.; Gao, M.; Ding, Y.; Wang, X. Firefly Swarm Intelligence Based Cooperative Localization and Automatic Clustering for Indoor FANETs. PLoS ONE 2023, 18, E0282333. [Google Scholar] [CrossRef]
  8. Chen, S.; Jiang, B.; Xu, H.; Pang, T.; Gao, M.; Liu, Z. A Task-Driven Scheme for Forming Clustering-Structure-Based Heterogeneous FANETs. Veh. Commun. 2025, 52, 100884. [Google Scholar] [CrossRef]
  9. Lakew, D.S.; Sa’ad, U.; Dao, N.-N.; Na, W.; Cho, S. Routing in Flying Ad Hoc Networks: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2020, 22, 1071–1120. [Google Scholar] [CrossRef]
  10. Das, S.R.; Perkins, C.E.; Belding-Royer, E.M. Ad Hoc On-Demand Distance Vector (AODV) Routing; RFC 3561; Internet Engineering Task Force: Fremont, CA, USA, 2003. [Google Scholar] [CrossRef]
  11. Clausen, T.H.; Jacquet, P. Optimized Link State Routing Protocol (OLSR); RFC 3626; Internet Engineering Task Force: Fremont, CA, USA, 2003. [Google Scholar] [CrossRef]
  12. Karp, B.; Kung, H.T. GPSR: Greedy Perimeter Stateless Routing. In Proceedings of the 6th Annual International Conference on Mobile Computing and Networking (MobiCom ’00), Boston, MA, USA, 6–11 August 2000; ACM: New York, NY, USA, 2000; pp. 243–254. [Google Scholar] [CrossRef]
  13. Bengag, A.; Bengag, A.; Elboukhari, M. The GPSR Routing Protocol in VANETs: Improvements and Analysis. In Proceedings of the 3rd International Conference on Electronic Engineering and Renewable Energy Systems, Saidia, Morocco, 20–22 May 2022; Bekkay, H., Mellit, A., Gagliano, A., Rabhi, A., Koulali, M.A., Eds.; Lecture Notes in Electrical Engineering. Springer: Singapore, 2023; Volume 954, pp. 21–28. [Google Scholar] [CrossRef]
  14. Wang, C.-M.; Yang, S.; Dong, W.Y.; Zhao, W.; Lin, W. A Distributed Hybrid Proactive-Reactive Ant Colony Routing Protocol for Highly Dynamic FANETs with Link Quality Prediction. IEEE Trans. Veh. Technol. 2024, 74, 1817–1822. [Google Scholar] [CrossRef]
  15. Sharvari, N.P.; Das, D.; Bapat, J.; Das, D. Improved Q-Learning Based Multi-Hop Routing for UAV-Assisted Communication. IEEE Trans. Netw. Serv. Manag. 2024, 22, 1330–1344. [Google Scholar] [CrossRef]
  16. Sun, S.; Guo, X.; Liu, K. A Multi-Protocol Integrated Ad Hoc Networking Architecture. In Proceedings of the 2023 9th International Conference on Communication and Information Processing, Lingshui, China, 14–16 December 2023; ACM: New York, NY, USA, 2023; pp. 279–283. [Google Scholar] [CrossRef]
  17. Sang, Q.; Wu, H.; Xing, L.; Ma, H.; Xie, P. An Energy-Efficient Opportunistic Routing Protocol Based on Trajectory Prediction for FANETs. IEEE Access 2020, 8, 192009–192020. [Google Scholar] [CrossRef]
  18. Cui, J.; Ma, L.; Wang, R.; Liu, M. Research and Optimization of GPSR Routing Protocol for Vehicular Ad-Hoc Network. China Commun. 2022, 19, 194–206. [Google Scholar] [CrossRef]
  19. Alam, M.M.; Moh, S. Survey on Q-Learning-Based Position-Aware Routing Protocols in Flying Ad Hoc Networks. Electronics 2022, 11, 1099. [Google Scholar] [CrossRef]
  20. Wu, Q.; Zhang, M.; Dong, C.; Feng, Y.; Yuan, Y.; Feng, S.; Quek, T.Q.S. Routing Protocol for Heterogeneous FANETs with Mobility Prediction. China Commun. 2022, 19, 186–201. [Google Scholar] [CrossRef]
  21. Hu, D.; Yang, S.; Gong, M.; Feng, Z.; Zhu, X. A Cyber–Physical Routing Protocol Exploiting Trajectory Dynamics for Mission-Oriented Flying Ad Hoc Networks. Engineering 2022, 19, 217–227. [Google Scholar] [CrossRef]
  22. Zhang, W.; Jiang, L.; Song, X.; Shao, Z. Weight-Based PA-GPSR Protocol Improvement Method in VANET. Sensors 2023, 23, 5991. [Google Scholar] [CrossRef]
  23. Silva, A.; Reza, N.; Oliveira, A. Improvement and Performance Evaluation of GPSR-Based Routing Techniques for Vehicular Ad Hoc Networks. IEEE Access 2019, 7, 21722–21733. [Google Scholar] [CrossRef]
  24. Babu, S.; Rajkumar, P.A. Group Communication in Vehicular Ad-Hoc Networks: A Comprehensive Survey on Routing Perspectives. Wirel. Pers. Commun. 2024, 139, 2325–2377. [Google Scholar] [CrossRef]
  25. Alsalami, O.M.; Yousefpoor, E.; Hosseinzadeh, M.; Lansky, J. A Novel Optimized Link-State Routing Scheme with Greedy and Perimeter Forwarding Capability in Flying Ad Hoc Networks. Mathematics 2024, 12, 1016. [Google Scholar] [CrossRef]
  26. Rahmani, A.M.; Hussain, D.; Ismail, R.J.; Alanazi, F.; Belhaj, S.; Yousefpoor, M.S.; Yousefpoor, E.; Darwesh, A.; Hosseinzadeh, M. An Adaptive and Multi-Path Greedy Perimeter Stateless Routing Protocol in Flying Ad Hoc Networks. Veh. Commun. 2024, 50, 100838. [Google Scholar] [CrossRef]
  27. Zhou, Y.; Mi, Z.; Wang, H.; Lu, Y.; Tian, Y. SZLS-GPSR: UAV Geographic Location Routing Protocol Based on Link Stability of Communication Safe Zone. In Proceedings of the 2023 15th International Conference on Computer Research and Development (ICCRD), Hangzhou, China, 10–12 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 258–267. [Google Scholar] [CrossRef]
  28. Kumar, S.; Raw, R.S.; Bansal, A.; Singh, P. UF-GPSR: Modified Geographical Routing Protocol for Flying Ad-Hoc Networks. Trans. Emerg. Telecommun. Technol. 2023, 34, E4813. [Google Scholar] [CrossRef]
  29. Rahmani, A.M.; Haider, A.; Aurangzeb, K.; Altulyan, M.; Gemeay, E.; Yousefpoor, M.S.; Yousefpoor, E.; Khoshvaght, P.; Hosseinzadeh, M. A Novel Cylindrical Filtering-Based Greedy Perimeter Stateless Routing Scheme in Flying Ad Hoc Networks. Veh. Commun. 2025, 52, 100879. [Google Scholar] [CrossRef]
  30. Li, X.; Sun, H. Prediction-Based Reactive-Greedy Routing Protocol for Flying Ad Hoc Networks. Wirel. Netw. 2025, 31, 2893–2907. [Google Scholar] [CrossRef]
  31. Gupta, V.; Seth, D.; Yadav, D.K. An Energy-Efficient Trajectory Prediction for UAVs Using an Optimised 3D Improvised Protocol. Wirel. Pers. Commun. 2023, 132, 2963–2989. [Google Scholar] [CrossRef]
  32. Alam, M.M.; Moh, S. Joint Trajectory Control, Frequency Allocation, and Routing for UAV Swarm Networks: A Multi-Agent Deep Reinforcement Learning Approach. IEEE Trans. Mob. Comput. 2024, 23, 11989–11995. [Google Scholar] [CrossRef]
  33. Lin, N.; Huang, J.; Hawbani, A.; Zhao, L.; Tang, H.; Guan, Y.; Sun, Y. Joint Routing and Computation Offloading Based Deep Reinforcement Learning for Flying Ad Hoc Networks. Comput. Netw. 2024, 249, 110514. [Google Scholar] [CrossRef]
  34. Khoshvaght, P.; Tanveer, J.; Rahmani, A.M.; Altulyan, M.; Alkhrijah, Y.; Yousefpoor, M.S.; Yousefpoor, E.; Mohammadi, M.; Hosseinzadeh, M. Computational Intelligence-Based Routing Schemes in Flying Ad-Hoc Networks (FANETs): A Review. Veh. Commun. 2025, 53, 100913. [Google Scholar] [CrossRef]
  35. Xie, X.; Zhang, J.; Yan, Z.; Wang, H.; Li, T. Can Routing Be Effectively Learned in Integrated Heterogeneous Networks? IEEE Netw. 2024, 38, 210–218. [Google Scholar] [CrossRef]
  36. Rovira-Sugranes, A.; Razi, A.; Afghah, F.; Chakareski, J. A Review of AI-Enabled Routing Protocols for UAV Networks: Trends, Challenges, and Future Outlook. Ad Hoc Netw. 2022, 130, 102790. [Google Scholar] [CrossRef]
  37. Huang, S.; Tang, J.; Zhou, Z.; Yang, G.; Davydov, M.V.; Wong, K.K. A Q-Learning and Fuzzy Logic Based Routing Protocol for UAV Networks. In Proceedings of the 2024 16th International Conference on Wireless Communications and Signal Processing (WCSP), Hefei, China, 24–26 October 2024; IEEE: Piscataway, NJ, USA, 2025; pp. 1090–1095. [Google Scholar] [CrossRef]
  38. Arafat, M.Y.; Moh, S. A Q-Learning-Based Topology-Aware Routing Protocol for Flying Ad Hoc Networks. IEEE Internet Things J. 2022, 9, 1985–2000. [Google Scholar] [CrossRef]
  39. Da Costa, L.A.L.F.; Kunst, R.; de Freitas, E.P. Q-FANET: Improved Q-Learning Based Routing Protocol for FANETs. Comput. Netw. 2021, 198, 108379. [Google Scholar] [CrossRef]
  40. Liu, J.; Wang, Q.; He, C.; Jaffrès-Runser, K.; Xu, Y.; Li, Z.; Xu, Y. QMR: Q-Learning Based Multi-Objective Optimization Routing Protocol for Flying Ad Hoc Networks. Comput. Commun. 2020, 150, 304–316. [Google Scholar] [CrossRef]
  41. Pang, Y.; Dong, F.; Huang, R.; He, Q.; Shi, Z.; Chen, Z. A Resilient Packet Routing Approach Based on Deep Reinforcement Learning. In Proceedings of the 2024 IEEE 24th International Conference on Communication Technology (ICCT), Chengdu, China, 18–20 October 2024; IEEE: Piscataway, NJ, USA, 2025; pp. 741–747. [Google Scholar] [CrossRef]
  42. Bianchi, G. Performance Analysis of the IEEE 802.11 Distributed Coordination Function. IEEE J. Sel. Areas Commun. 2000, 18, 535–547. [Google Scholar] [CrossRef]
  43. BonnMotion—A Mobility Scenario Generation and Analysis Tool. Available online: https://sys.cs.uos.de/bonnmotion/ (accessed on 17 April 2025).
  44. Hosseinzadeh, M.; Ali, S.; Ionescu-Feleaga, L.; Ionescu, B.S.; Yousefpoor, M.S.; Yousefpoor, E.; Ahmed, O.H.; Rahmani, A.M.; Mehmood, A. A Novel Q-Learning-Based Routing Scheme Using an Intelligent Filtering Algorithm for Flying Ad Hoc Networks (FANETs). J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101817. [Google Scholar] [CrossRef]
  45. Hosseinzadeh, M.; Yousefpoor, M.S.; Yousefpoor, E.; Lansky, J.; Min, H. A New Version of the Greedy Perimeter Stateless Routing Scheme in Flying Ad Hoc Networks. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102066. [Google Scholar] [CrossRef]
Figure 1. Trajectory-aware routing system model for mission-oriented UAV swarms.
Figure 2. Queue optimization using trajectory knowledge and the transmission delay model. (a) At time t_k, U_A cannot reach U_G, while U_E can. (b) At time t_{k+5}, U_A can reach U_G, while U_E cannot.
Figure 3. Theoretical framework of Traj-Q-GPSR.
Figure 4. Impact of node density on network performance: (a) Impact of node density on packet loss rate. (b) Impact of node density on average end-to-end delay. (c) Impact of node density on routing efficiency. (d) Impact of node density on throughput.
Figure 5. Impact of node velocity on network performance: (a) Impact of node velocity on packet loss rate. (b) Impact of node velocity on average end-to-end delay. (c) Impact of node velocity on routing efficiency. (d) Impact of node velocity on throughput.
Figure 6. Impact of CBR connections on network performance: (a) Impact of CBR connections on packet loss rate. (b) Impact of CBR connections on average end-to-end delay. (c) Impact of CBR connections on routing efficiency. (d) Impact of CBR connections on throughput.
Figure 7. Curve of energy consumption (J) vs. number of nodes.
Figure 8. Curve of routing overhead vs. number of nodes.
Table 1. Simulation parameters.

Parameter | Value
Simulator | ns-3.31, MATLAB
Packet Size | 512 bytes
Simulation Time | 150 s
Simulation Area | 2000 m × 2000 m × 300 m
Speed Range | 10–30 m/s
Communication Range | 250 m
Bandwidth | 10 MHz
Number of Nodes | 30, 40, 50, 60, 70, 80, 90, 100
Traffic Type | CBR
CBR Rate | 2 Mbps
Number of CBR Connections | 10, 20, 30, 40, 50
HELLO Interval | 1 s
Frequency of Trajectory Updates | 5 Hz
MAC Protocol | IEEE 802.11p
Transport Protocol | UDP
Initial Energy | 900–1000 J
Energy Threshold | 100 J
Propagation Model | Nakagami model
Mobility Model | Gauss–Markov mobility model
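The Gauss–Markov mobility model listed in Table 1 updates each velocity component as a blend of its previous value, a long-term mean, and a Gaussian perturbation. The following is a minimal one-dimensional sketch of that update; the tuning parameter `alpha`, mean speed `v_mean`, and noise scale `sigma` are illustrative values, not the ones used in the simulations (ns-3's `GaussMarkovMobilityModel` handles the full 3-D case internally):

```python
import math
import random

def gauss_markov_step(v, alpha=0.75, v_mean=20.0, sigma=2.0,
                      rng=random.Random(42)):
    """One Gauss-Markov velocity update: the new speed blends the current
    speed (weight alpha), the long-term mean speed (weight 1 - alpha),
    and a Gaussian perturbation scaled so the variance stays bounded."""
    return (alpha * v
            + (1.0 - alpha) * v_mean
            + sigma * math.sqrt(1.0 - alpha ** 2) * rng.gauss(0.0, 1.0))

# Degenerate checks: alpha = 1 keeps the current speed unchanged;
# alpha = 0 with sigma = 0 snaps directly to the mean speed.
assert gauss_markov_step(10.0, alpha=1.0) == 10.0
assert gauss_markov_step(10.0, alpha=0.0, sigma=0.0) == 20.0
```

Intermediate values of `alpha` trade memory of the current heading against drift toward the mean, which is what makes the model suitable for smooth UAV trajectories.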
Table 2. Energy consumption (J) based on the number of nodes.

Number of Nodes | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100
Traj-Q-GPSR | 27.1 | 53.3 | 112.5 | 172.3 | 232.8 | 290.1 | 345.5 | 374.7
GPSR | 48.9 | 83.7 | 153.2 | 199.8 | 266.6 | 332.1 | 420.3 | 453.7
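The relative saving implied by Table 2 can be computed directly from the two rows; the percentage-saving metric below is our own summary of the table, not a figure reported in the paper:

```python
nodes  = [30, 40, 50, 60, 70, 80, 90, 100]
traj_q = [27.1, 53.3, 112.5, 172.3, 232.8, 290.1, 345.5, 374.7]
gpsr   = [48.9, 83.7, 153.2, 199.8, 266.6, 332.1, 420.3, 453.7]

# Relative energy saving of Traj-Q-GPSR over GPSR, in percent, per node count.
saving = {n: round(100.0 * (g - t) / g, 1)
          for n, t, g in zip(nodes, traj_q, gpsr)}
```

The computation shows the saving is largest at low density (about 45% at 30 nodes) and narrows as the network grows (about 17% at 100 nodes), where contention dominates both protocols.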
Table 3. HELLO packet header format.

Field Name | Size (Bytes)
Packet Type | 1
Number of Neighbors | 2
Node ID | 2
Current Coordinates (x, y, z) | 24
Current Queue Length | 1
Current Energy Level | 2
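The field sizes in Table 3 sum to 32 bytes. A minimal sketch of serializing such a HELLO header with Python's `struct` module follows; the field order and the encoding of the coordinates as three 8-byte doubles are assumptions for illustration, not taken from the protocol implementation:

```python
import struct

# !: network byte order; B = 1 byte, H = 2 bytes, d = 8-byte double.
# Field order follows Table 3: type, neighbor count, node id,
# (x, y, z) coordinates, queue length, energy level.
HELLO_FMT = "!B H H 3d B H"

def pack_hello(ptype, n_neighbors, node_id, xyz, queue_len, energy):
    return struct.pack(HELLO_FMT, ptype, n_neighbors, node_id, *xyz,
                       queue_len, energy)

hdr = pack_hello(0, 3, 17, (100.0, 250.0, 120.0), 5, 950)
assert len(hdr) == 32  # matches the Table 3 total: 1 + 2 + 2 + 24 + 1 + 2
```

Broadcasting this header once per HELLO interval (1 s, Table 1) is what drives the per-node share of the control traffic measured in Table 5.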
Table 4. Data packet header format.

Field Name | Size (Bytes)
Packet Type | 1
Perimeter Mode Flag | 1
Node ID | 2
Destination Coordinates (x, y, z) | 24
Update Timestamp | 4
Perimeter Entry Coordinates (x, y, z) | 24
Previous Hop Coordinates (x, y, z) | 24
Future Trajectory (5 × (x, y, z)) | 120
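The Table 4 fields sum to 200 bytes, which a `struct` format string can verify; as with the HELLO sketch, the exact field ordering and double-precision coordinate encoding are illustrative assumptions:

```python
import struct

# !: network byte order; B = 1 byte, H = 2, I = 4, d = 8-byte double.
# Field order follows Table 4: type, perimeter-mode flag, node id,
# destination, timestamp, perimeter entry, previous hop,
# and 5 future (x, y, z) trajectory waypoints.
DATA_HDR_FMT = "!B B H 3d I 3d 3d 15d"

assert struct.calcsize(DATA_HDR_FMT) == 200  # 1+1+2+24+4+24+24+120 bytes
```

Note that the 120-byte future-trajectory field accounts for most of the header, which is consistent with Traj-Q-GPSR's higher, yet still roughly 1%, byte-level overhead in Table 5.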
Table 5. Byte-level routing overhead ratio (BROR, %) based on the number of nodes.

Number of Nodes | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100
Traj-Q-GPSR | 1.43 | 1.33 | 1.17 | 1.06 | 1.01 | 0.98 | 0.96 | 0.93
GPSR | 0.86 | 0.82 | 0.78 | 0.71 | 0.67 | 0.65 | 0.63 | 0.61
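BROR can be read as control-plane bytes divided by total transmitted bytes. The sketch below captures that ratio with hypothetical byte counts; the exact accounting used in the paper's measurements is not reproduced here:

```python
def bror(control_bytes, data_bytes):
    """Byte-level routing overhead ratio, in percent."""
    total = control_bytes + data_bytes
    return 100.0 * control_bytes / total if total else 0.0

# Hypothetical accounting: 1 kB of HELLO/trajectory control traffic
# against 99 kB of delivered CBR payload yields a 1% overhead ratio,
# the same order as the Table 5 values.
assert bror(1_000, 99_000) == 1.0
```

Under this reading, Traj-Q-GPSR's larger HELLO and data headers raise BROR relative to GPSR, but the ratio stays below 1.5% at every density in Table 5.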
Table 6. Storage overhead (average number of neighbor entries) based on the number of nodes.

Number of Nodes | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100
Traj-Q-GPSR | 4.08 | 6.65 | 9.81 | 13.57 | 17.93 | 22.87 | 28.42 | 34.56
GPSR | 1.58 | 2.13 | 2.67 | 3.22 | 3.76 | 4.31 | 4.85 | 5.40
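One way to sanity-check the GPSR row of Table 6 is an expected one-hop degree estimate under uniform node placement in the simulation volume of Table 1: each of the other n − 1 nodes lies inside a node's 250 m communication sphere with probability V_sphere / V_area, ignoring boundary effects. This back-of-the-envelope model is our own, not taken from the paper:

```python
import math

R = 250.0                            # communication range (m), Table 1
V_AREA = 2000.0 * 2000.0 * 300.0     # simulation volume (m^3), Table 1
P_LINK = (4.0 / 3.0) * math.pi * R ** 3 / V_AREA  # uniform link probability

def expected_one_hop_degree(n):
    """Expected number of one-hop neighbors among n uniformly placed nodes."""
    return (n - 1) * P_LINK

# The estimate tracks the measured GPSR row of Table 6 closely.
for n, measured in zip([30, 40, 100], [1.58, 2.13, 5.40]):
    assert abs(expected_one_hop_degree(n) - measured) < 0.01
```

Traj-Q-GPSR's row grows faster because its time-aware tables also hold two-hop entries, trading roughly 6× the storage at 100 nodes for the predicted-link information that drives its routing gains.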
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, M.; Jiang, B.; Chen, S.; Xu, H.; Pang, T.; Gao, M.; Xia, F. Traj-Q-GPSR: A Trajectory-Informed and Q-Learning Enhanced GPSR Protocol for Mission-Oriented FANETs. Drones 2025, 9, 489. https://doi.org/10.3390/drones9070489
