UAV Trajectory Control and Power Optimization for Low-Latency C-V2X Communications in a Federated Learning Environment

Fernando, Xavier; Gupta, Abhishek

doi:10.3390/s24248186


by Xavier Fernando † and Abhishek Gupta *,†

Department of Electrical, Computer and Biomedical Engineering, Toronto Metropolitan University, Toronto, ON M5B2K3, Canada

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Sensors 2024, 24(24), 8186; https://doi.org/10.3390/s24248186

Submission received: 26 November 2024 / Revised: 16 December 2024 / Accepted: 20 December 2024 / Published: 22 December 2024

(This article belongs to the Section Communications)


Abstract

Unmanned aerial vehicle (UAV)-enabled vehicular communications in the sixth generation (6G) are characterized by line-of-sight (LoS) and dynamically varying channel conditions. However, the presence of obstacles in the LoS path leads to shadowed fading environments. In UAV-assisted cellular vehicle-to-everything (C-V2X) communication, vehicle and UAV mobility and shadowing adversely impact latency and throughput. Moreover, 6G vehicular communications comprise data-intensive applications such as augmented reality, mixed reality, virtual reality, intelligent transportation, and autonomous vehicles. Since vehicles’ sensors generate an immense amount of data, the latency in processing these applications also increases, particularly when the data are not independent and identically distributed (non-i.i.d.). Furthermore, when the sensors’ data are heterogeneous in size and distribution, the incoming packets demand substantial computing resources, energy efficiency at the UAV servers, and intelligent mechanisms to queue the incoming packets. Due to the limited battery power and coverage range of the UAV, the quality of service (QoS) requirements such as coverage rate, UAV flying time, and fairness of vehicle selection are adversely impacted. Controlling the UAV trajectory so that it serves a maximum number of vehicles while maximizing battery power usage is a potential solution to enhance QoS. This paper investigates the system performance and communication disruption between vehicles and the UAV due to the Doppler effect in the orthogonal time–frequency space (OTFS) modulated channel. Moreover, a low-complexity UAV trajectory prediction and vehicle selection method is proposed using federated learning, which exploits related information from past trajectories. The weighted total energy consumption of a UAV is minimized by jointly optimizing the transmission window (L_{w}), transmit power, and UAV trajectory considering Doppler spread. The simulation results reveal that the weighted total energy consumption of the OTFS-based system decreases by up to 10% when combined with federated learning to locally process the sensor data at the vehicles and communicate the processed local models to the UAV. The weighted total energy consumption of the proposed federated learning algorithm decreases by 10–15% compared with convex optimization, heuristic, and meta-heuristic algorithms.

Keywords:

queuing delay; processing delay; C-V2X; unmanned aerial vehicles; Doppler spread; OTFS; 6G; federated learning; fed-DDPG

1. Introduction

Sixth-generation (6G) communication networks are increasingly being characterized by non-terrestrial networks (NTNs) that operate using a terrestrial gateway to enable communication between ground nodes and aerial base stations [1]. In unmanned aerial vehicle (UAV)-assisted cellular vehicle-to-everything (C-V2X) networks, UAVs can be used as aerial data centers to store vehicle routes and traffic alerts and to calculate collision probabilities [2]. However, the UAV power consumption varies due to random fluctuations in the vehicular data traffic [3]. Hence, in a transmission window, maximizing the UAV communication range is critical to boost network capacity and improve performance. To improve the data rate and energy efficiency, the UAV trajectory and vehicle data transmission strategies should be optimized jointly [4].

The third-generation partnership project (3GPP) aims to develop 6G technologies and standards that integrate UAVs into terrestrial vehicular communications [5]. 3GPP Release 16 and beyond include specifications for C-V2X communication that consider the effects of mobility, dynamic channel conditions, and latency challenges [6]. Maintaining robust communication in scenarios with fading channels and frequent link adaptations is a prevailing challenge in UAV-assisted C-V2X communication. Intermittent unavailability of UAV computing resources causes interruptions, and the vehicle’s sensor data can become outdated relatively quickly [7]. Furthermore, machine learning techniques are being explored to address C-V2X channel dynamics and the time-sensitive nature of vehicular data. The aim is to achieve wider UAV coverage and to provide uninterrupted processing capability for vehicular data [8]. In this paper, the UAV-assisted C-V2X communication channel is modulated using orthogonal time–frequency space (OTFS) modulation to improve performance in scenarios with high mobility, non-line-of-sight (NLoS) conditions, and multipath environments [9]. OTFS transforms the wireless channel into the delay-Doppler domain, where the instantaneous channel conditions appear to be stationary in high-mobility scenarios [10]. In the delay-Doppler domain, delay represents the time delay of the signal, and Doppler represents the shift in the signal frequency caused by relative motion.

UAVs can be utilized to improve the coverage range and communications performance of vehicles. Owing to their high mobility, UAVs can provide on-demand services to vehicles by adequately overcoming geographical constraints [11]. However, maximizing the UAV available battery power is challenging and comprises a multi-agent, multi-objective, co-operative non-convex optimization problem [12]. Moreover, the UAV-assisted C-V2X environment and the UAV trajectory are typically unknown and time-varying [13]. Therefore, a co-operative UAV-assisted C-V2X communication scenario that maximizes the available battery power of the UAV and optimizes its trajectory while serving the maximum number of vehicles is investigated in this paper. A brief timeline depicting the amalgamation of wireless communication technologies with transportation systems is illustrated in Figure 1. Figure 1 also illustrates the gradual integration of UAVs in vehicular networks in 5G and 6G wireless communication paradigms. Note that each vehicle captures a different kind of data packet, leading to non-independent and identically distributed (non-i.i.d.) and heterogeneous data, which require significant computational and storage resources. Moreover, a detailed overview and the evolution timeline of the recent applications of machine learning techniques for performance enhancement of UAV communication frameworks can be found in [14,15].

Existing works have investigated federated learning (FL) techniques to effectively improve the quality of service (QoS) in UAV-assisted vehicular communications [16]. Due to the unpredictable trajectory of UAVs and the limited battery power, the fairness of vehicle selection in each transmission window needs to be considered to enhance the real-time coverage rate [17]. Owing to the limited battery power of UAVs, the maximum achievable QoS is limited. Therefore, FL-based mechanisms to ensure an optimal UAV trajectory to maximize the battery power utilization of UAVs must be investigated for low-latency communications between vehicles and UAV [18]. Some recent works have proposed FL to design energy-saving algorithms by jointly optimizing the UAV flight distance and flight height [19]. Maximizing the minimum remaining energy of the UAV is also investigated as a viable solution to extend the UAV battery usage. Federated reinforcement learning (FRL) is also shown to provide superior performance in co-operative UAV–vehicle communications [20]. In certain Markovian approaches, the UAV and vehicles observe their individual states and also learn the state information of other agents to improve the level of co-operation among participating agents. However, obtaining the actions of UAV and other vehicles increases the dimension of the state space [21]. Multi-agent co-operation is also shown to have slow convergence in high-dimensional spaces. For high-dimensional spaces and multi-agent cooperation, the learning efficiency can be improved using FL-based optimization algorithms that assist the participating agents in faster decision-making [22].

A UAV can achieve wider ground coverage owing to line-of-sight (LoS) channel links from an optimum UAV altitude [23]. However, in a multi-vehicle network with spatio-temporal correlation in transmitted data, the UAV energy constraints, flight trajectory restrictions, and QoS requirements need to be jointly considered [24]. Because of multipath propagation effects, vehicles encounter non-line-of-sight (NLoS) links, and the channel quality and transmission rate deteriorate. The signal frequency varies due to the relative motion between the UAV and the vehicles [9]. The UAV trajectory must be optimized considering the constraints of the data rates and bit error rate (BER), taking into account the Doppler spread [10]. Accumulating and processing data from different transmission time intervals (TTIs) leads to delay [25].
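As a rough illustration of the frequency variation caused by relative motion, the following minimal sketch evaluates the classical Doppler shift f_d = (v / c) f_c cos(θ). The 5.9 GHz carrier and the 30 m/s closing speed are assumed example values and are not taken from this paper; the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def doppler_shift_hz(relative_speed_mps, carrier_hz, angle_rad=0.0):
    """Classical Doppler shift f_d = (v / c) * f_c * cos(theta).

    angle_rad is the angle between the velocity vector and the
    transmitter-receiver line (0 means moving straight toward each other).
    """
    return relative_speed_mps / SPEED_OF_LIGHT * carrier_hz * math.cos(angle_rad)


# Assumed example values (not from the paper): 5.9 GHz carrier, 30 m/s closing speed.
fd_head_on = doppler_shift_hz(30.0, 5.9e9)
fd_broadside = doppler_shift_hz(30.0, 5.9e9, angle_rad=math.pi / 2)
print(f"head-on: {fd_head_on:.1f} Hz, broadside: {fd_broadside:.6f} Hz")
```

The shift is largest when the UAV and vehicle move directly toward each other and vanishes when the motion is perpendicular to the link, which is one reason the trajectory influences the Doppler spread seen by the receiver.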

1.1. Contributions

This paper investigates energy-efficient UAV trajectory optimization and computing resource allocation in UAV-assisted C-V2X communication and is a significant extension of our previous work in [26]. This paper also extends our previous work in [27], where we investigated some robust and strategic game-theoretic approaches used by the UAV to select different vehicles in each TTI. The main contributions of this paper are as follows:

We propose to examine end-to-end packet latency as well as UAV energy consumption based on FL with varying numbers of vehicles. From the FL iterations, we study the probability of the optimal trajectory prediction of the UAV using different neural network models. This is a significant extension of our previous work in [28] where we analyzed the variation in the UAV transmit power for a varying number of vehicles in a gross data offloading scenario.
As a function of computation offloading to UAV and the local vehicle model computation time, we plot the average task completion latency for varying numbers of vehicles. Using long short-term memory (LSTM), gated recurrent unit (GRU), recurrent neural network (RNN) and convolutional neural network (CNN)-LSTM models, we compare the average task completion latency for gross data offloading and FL, and the probability of optimal UAV trajectory prediction.
We validate the proposed solution by calculating the number of training iterations required to satisfy the service-time constraint for LSTM, GRU, RNN, and CNN-LSTM models. Here, we use the V2X-Sim and LTE I/Q datasets and, based on the number of vehicles that exceed a specified time frame to process a task, determine the maximum number of vehicles a UAV can support without violating the identified constraints. Furthermore, we utilize the V2X-Sim dataset to verify the FL model convergence characteristics and performance trade-offs [29] for the proposed UAV-assisted C-V2X communications.
Unlike existing works where the device-to-device communication largely depends on neighbor discovery [30], in this work, the TTI is selected using the distributed scheduling protocol known as sensing-based semi-persistent scheduling (SPS) [31]. Since the vehicles and the UAV operate at different speeds, SPS is utilized to enable vehicles to independently select and manage the available bandwidth and the UAV communication and computational resources.
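The resource selection idea behind sensing-based SPS can be sketched as follows. This is a heavily simplified illustration and not the scheduler used in this paper: the 20% candidate fraction, the reselection counter range of 5 to 15, and the keep probability follow common descriptions of LTE-V2X sidelink scheduling and are assumptions here, as are all function and variable names.

```python
import random


def select_resource(avg_rssi_dbm, keep_fraction=0.2, rng=random):
    """Pick a resource at random from the quietest candidates.

    avg_rssi_dbm maps a resource id to the average energy sensed on it.
    """
    ranked = sorted(avg_rssi_dbm, key=avg_rssi_dbm.get)  # quietest first
    n_keep = max(1, round(keep_fraction * len(ranked)))
    candidates = ranked[:n_keep]
    return rng.choice(candidates), candidates


def sps_step(resource, counter, avg_rssi_dbm, keep_prob=0.5, rng=random):
    """Advance one transmission opportunity of a semi-persistent reservation."""
    if resource is not None and counter > 1:
        return resource, counter - 1              # reservation still valid
    if resource is not None and rng.random() < keep_prob:
        return resource, rng.randint(5, 15)       # keep the same resource
    new_resource, _ = select_resource(avg_rssi_dbm, rng=rng)
    return new_resource, rng.randint(5, 15)       # reselect from sensing results


rng = random.Random(0)
sensed = {f"r{i}": -100.0 + i for i in range(10)}  # hypothetical sensing results
resource, counter = sps_step(None, 0, sensed, rng=rng)
print(resource, counter)
```

Because each vehicle decides from its own sensing history, no central scheduler is needed, which matches the distributed character of SPS described above.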

1.2. Organization

The remainder of this paper is organized as follows. Section 2 discusses some of the recent literature that applied FL and FRL techniques to achieve performance improvement in UAV-assisted vehicular communications. The section also identifies the challenges and opportunities for the performance enhancement of UAV-assisted C-V2X communications using FRL [8]. Section 3 illustrates our system model and discusses the UAV–vehicle communication architecture used in this work. Section 4 presents our problem formulation, where we formulate the problem of latency minimization, power optimization, and UAV trajectory control in UAV-assisted C-V2X as a Markov decision process (MDP). Here, a mixed-integer non-convex UAV trajectory optimization problem is proposed and is divided into four optimization sub-problems using Lagrangian dual decomposition. Section 5 outlines our proposed solution approach, where Q-learning and policy gradient learning are applied at each vehicle to generate the local model parameters. Here, federated-DDPG using the LSTM model with temporal aggregation is proposed to enhance the UAV trajectory prediction. Additionally, federated reinforcement learning (FRL) is investigated to minimize the UAV energy consumption for time-varying data size and channel conditions. Section 6 discusses the findings of this work and investigates the impact of a control parameter on optimal UAV trajectory prediction. Here, we also compare the latency observed for packets of varying byte sizes for a varying number of vehicles and available UAV power. Section 7 concludes the paper and discusses some avenues for future research. The main abbreviations used in this paper are described in Table 1.

2. Related Work

Some novel applications of UAVs in various domains are listed in [32]. Recently, UAVs have found applications in monitoring the stability of landfills, monitoring the frequency of settlements, and, consequently, safeguarding the people living near landfills from serious hazards [33]. Some studies have investigated the applicability of UAV-based photogrammetry to monitor geometric changes in landfills to provide precise information to make informed reclamation decisions [33]. Other works have proposed UAV-based antennas and propagation measurements for electromagnetic field assessments, as they are flexible, cost effective, and easy to deploy [34]. The authors in [34] also evaluated some channel models for UAV positioning accuracy and antenna alignment in the presence of large-scale antenna arrays. The authors reported that some inaccuracies in positioning were alleviated by virtue of the recent advances in portable measurement systems and antennas designed specifically for UAVs [34]. Moreover, some authors have hypothesized that as the UAV-assisted wireless channels are overt, the UAV communications are vulnerable to eavesdropping attacks [35]. The authors in [35] proposed novel solutions to enhance UAV communications’ minimum secrecy rates. The authors formulated an optimization problem with parameters such as user association variables, UAV trajectory, and output power and proposed a solution based on sequential decision-making. Specifically, the work proposed and utilized a single agent soft actor critic and twin delayed deep deterministic policy gradient algorithm to jointly optimize the aforementioned parameters [35]. However, jointly optimizing the UAV trajectory and power consumption was not addressed by these works. In some recent works, it has been hypothesized that the UAVs do not operate accurately in indoor environments where the localization performance of global navigation satellite system (GNSS) is suboptimal [36]. 
This makes it even more difficult to accurately localize UAVs in dynamic environments with weak signal strengths in mission-critical applications and where channel interference is not negligible; hence, some alternatives to GNSS-based positioning were explored in [36]. These solutions estimate the distance and position using the received signal strength indicator (RSSI) based on Bluetooth low-energy beacons.

The authors in [37] proposed a single-agent deep reinforcement learning approach to solve scheduling requests, and a short-term memory was constructed to forecast the traffic. Using echo state networks, a machine learning framework was proposed to predict packet distribution and traffic patterns to optimize UAV flight paths and cached content. The energy consumption of the hybrid online offload framework was minimized by using a deep learning-based hybrid memory to store offloading decisions [38]. In order to implement optimization methods, it was necessary to have complete information about how vehicles distribute their data and how traffic is distributed.

The authors in [39] analyzed UAV placement strategies to maximize the number of vehicles under coverage to maximize the sum of channel gains or minimize non-convex path loss functions. The Dinkelbach algorithm and successive convex approximation (SCA) were used to maximize the UAV’s energy efficiency by optimizing its trajectory, transmit power, and computational load distribution. In order to minimize overall energy consumption, the authors optimized flight trajectory, transmission power, time slot scheduling, and task data assignment using Lagrangian duality. To optimize both the UAV position and its computing resources simultaneously, a three-stage iterative method was proposed. Based on ideal LoS channels, independent of multipath channels and Doppler spread, a mobile edge computing (MEC) network was designed with access to base stations (BSs) and UAVs [40]. Moreover, in the existing works, the UAV-assisted C-V2X communications can be modulated using non-orthogonal multiple access (NOMA), orthogonal frequency division multiplexing (OFDM), or OTFS modulation schemes.

In NOMA, multiple vehicles share the same time and frequency resources, and their signals are distinguished by different power levels. Vehicles with stronger channel conditions are assigned lower power, while vehicles with weaker channel conditions are assigned higher power [41]. This leads to the efficient utilization of the available resources and enhances the throughput [42].
OFDM divides the spectrum into multiple orthogonal subcarriers in the frequency domain and uses them to transmit data simultaneously. However, OFDM faces challenges in high-mobility environments, where Doppler spread is significant and the performance degrades in NLoS conditions and multipath environments [43].
OTFS modulation improves communication performance in scenarios with high mobility, NLoS conditions, and multipath environments [9]. OTFS transforms the wireless channel into a new domain called the delay-Doppler domain, where the instantaneous channel conditions appear to be stationary in high-mobility scenarios [10].
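To make the delay-Doppler mapping more concrete, the sketch below implements the inverse symplectic finite Fourier transform (ISFFT), which is commonly used to map delay-Doppler symbols onto the time-frequency grid in OTFS, together with its inverse (SFFT). It is a direct, unoptimized implementation for small grids under the usual textbook normalization; it is not the transceiver used in this paper, and the grid size and symbols are arbitrary examples.

```python
import cmath
import math


def isfft(x_dd):
    """Map a delay-Doppler grid x_dd[k][l] to a time-frequency grid X_tf[n][m].

    k indexes Doppler (N bins) and l indexes delay (M bins).
    """
    N, M = len(x_dd), len(x_dd[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(x_dd[k][l]
                         * cmath.exp(2j * math.pi * (n * k / N - m * l / M))
                         for k in range(N) for l in range(M))
             for m in range(M)]
            for n in range(N)]


def sfft(x_tf):
    """Inverse of isfft: map a time-frequency grid back to delay-Doppler."""
    N, M = len(x_tf), len(x_tf[0])
    scale = 1.0 / math.sqrt(N * M)
    return [[scale * sum(x_tf[n][m]
                         * cmath.exp(-2j * math.pi * (n * k / N - m * l / M))
                         for n in range(N) for m in range(M))
             for l in range(M)]
            for k in range(N)]


# Arbitrary example: a 3 x 4 grid of QPSK-like symbols.
symbols = [[complex(1 if (k + l) % 2 else -1, 1 if k % 2 else -1)
            for l in range(4)] for k in range(3)]
recovered = sfft(isfft(symbols))
max_error = max(abs(recovered[k][l] - symbols[k][l])
                for k in range(3) for l in range(4))
print(f"round-trip error: {max_error:.2e}")
```

Each delay-Doppler symbol is spread over the whole time-frequency grid by this transform, which is the mechanism usually credited for the diversity gain of OTFS in fast-varying channels.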

A multi-hop UAV-assisted relay network was employed to facilitate communication between transceivers on the ground by optimizing the channel assignment and flight control of the UAVs [44]. To maximize computation capability, the UAV deployment coordinates and altitude, transmit power, and bandwidth allocation were optimized together with the resource allocation and deployment strategy of the UAVs. By optimizing the communication scheduling, the UAV trajectory, and the computing resources jointly, subject to mobility, connection, and computation constraints, the authors proved that it could lower the signal-to-noise ratio (SNR) based on the first-order Gauss–Markov process and maximize computation capability [45]. An orthogonal frequency division multiple access (OFDMA) multi-user MEC with resource allocation was studied in [46]. A low-complexity suboptimal algorithm was proposed to extract the probable correlations between trajectory alterations and operations that are difficult due to their complexity. To minimize the distance between vehicles, a DRL algorithm using actor–critic was proposed for the stochastic scheduling of MECs [47].

The authors in [10] demonstrated that due to the high-speed movement of UAVs, the estimation and equalization of wireless channels for over-the-air communication are complex tasks. However, it is possible to directly modulate data in the time-delay-Doppler (TDD) domain over a wide range of time frequencies in a multipath propagation channel using OTFS. In high-speed mobile communication systems, OTFS achieves greater diversity gain by adapting to time-varying channels and converting the multipath channel into the TDD domain [48]. Moreover, NLoS signal reflections and multipath propagation cause challenging channel conditions to achieve BER performance for high-spectral-efficiency signals. It was determined that the BER varied depending on the severity of the Doppler spread, SNR, and noise and interference in the channel. Moreover, the choice of modulation scheme within the OTFS framework was also a critical factor [49].

To maximize the network throughput, conventional DRL implementations rely on centralized data collection at the MEC server, which adapts UAV trajectories jointly during each time step. In order to adapt the UAV trajectory, the vehicle scheduling, and energy harvesting policies, a DQN approach was proposed to overcome the excessive communication and training overhead [50]. It was proposed to use the multi-agent DQN method to optimize the real-time downlink capacity for all vehicles, as well as the MADDPG method to assign targets and plan the trajectory of multiple UAVs to gather sensor data. The UAV speed control was adjusted according to its energy status and position using the Q-learning algorithm. Other approaches employed an actor–critic DDPG for jointly optimizing the UAV trajectories and transmission scheduling strategy using the k-means clustering to aggregate different vehicles. In order to minimize the age of information (AoI), actor–critic DRL was used to adapt the sensing and flying decisions of each UAV, and to optimize the UAV–vehicle association strategy and minimize the total flight time [51].

Many approaches have proposed dynamic scheduling strategies that use DRL to optimize resource distribution in applications with random task arrivals. Using the FL algorithm, collaborative model training can be accomplished without sharing data [52]. In this model, the spatial correlation of the traffic of adjacent vehicles is extracted from the wireless traffic data as images. In order to improve the prediction accuracy, this model employs temporal aggregation to capture the relationship between wireless traffic during current time slots and traffic during upcoming time slots. To reduce the power consumption of the UAV, multiple starting positions are selected at random to mitigate the impact of the initial positions [53].
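The idea that FL trains a shared model without exchanging raw data can be illustrated with a minimal federated averaging step, in which the aggregator only sees local model weights and the number of samples behind each of them. This is a generic sketch of weighted averaging, not the fed-DDPG procedure proposed in this paper; the function name and the toy numbers are assumptions.

```python
def federated_average(local_weights, sample_counts):
    """Average local model weights, weighting each client by its sample count.

    local_weights: list of equal-length weight vectors, one per vehicle.
    sample_counts: number of local training samples behind each vector.
    """
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [sum(w[p] * n for w, n in zip(local_weights, sample_counts)) / total
            for p in range(n_params)]


# Toy example: two vehicles, two parameters each.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)
```

Only the weight vectors cross the link, so the payload per round is set by the model size and not by the volume of sensor data held at each vehicle.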

In recent machine learning approaches, spatio-temporal neural networks have been used for accurate cellular traffic forecasting, and variations combine the output with past statistical data in order to improve long-term prediction accuracy [54]. An LSTM-based architecture for cellular traffic prediction combines Gaussian process regression with LSTM to capture spatial and temporal dependencies. For extracting dominant periodic components, LSTM is used for learning the long-term relationships among small random values, and FL is applied to estimate the residual random values [55]. The FL methods were shown to have lower computational complexity than probability models to predict the baseline component and maximum likelihood estimation methods to estimate the residual component. To reduce the communication overhead for energy saving, graph partitioning and rejoining are utilized in some of the proposed schemes, whereas LSTM has been used to predict throughput and minimize power consumption [56].

Recent works have shown that UAV trajectory optimization requires a free-space path loss channel model when the UAV is flying at a sufficiently high altitude where the LoS is dominant [57]. However, in urban scenarios, buildings and small-scale fading cannot be ignored, and the channel is characterized by the Rician fading model. To address random channel capacity in Rician fading, some works have proposed to approximate the maximum allowed transmission rate, based on which a trajectory design method is presented to minimize the task completion time in Rician shadowed fading channel models [58]. In some approaches, the ratio of LoS-to-NLoS components and the degree of LoS shadowing are determined. Some approaches involved time slots where the UAV position is assumed to be invariant in each slot, optimizing them to approximate the optimal trajectory [59]. The path loss between a UAV and a vehicle depends on their respective positions as well as the propagation environment, which varies when a UAV is in a rural, suburban, urban, or high-rise environment. The movement of UAVs affects the amplitude, phase, and delay of the received signal [60]. Some advantages of FRL in latency minimization of UAV-assisted C-V2X networks are summarized in Table 2.

3. System Model

The system model considered in this paper is illustrated in Figure 2. In a time period (T), the UAV has available battery power (P_{b}) and flies at a height (H) in meters (m). The time period (T) is divided into multiple transmission windows (L_{w}), where V vehicles sense and process sensor data and transmit the processed local models to the UAV.

As per Equation (1), the data rate in bits/s transmitted by each vehicle over the UAV flight time is computed based on the summation of the instantaneous transmission rate (s_{i, t}):

s_{i, t} (ψ_{t}, w_{t}, D_{t}) = B_{j} \log_{2} \left( 1 + \frac{P_{i}^{t} | h_{i, j}^{t} |^{2}}{\sum_{j = 1}^{T - 1} P_{\bar{d}_{j}} | h_{i, j}^{t} |^{2} + B_{j} (σ_{i, j}^{t})^{2}} \right)

(1)

where ψ_{t} is a binary variable indicating whether or not a vehicle transmits local models to the UAV in a transmission window (L_{w}), w_{t} denotes the FL model weights at time t, and D_{t} is the delay, which is the sum of the queuing delay and processing delay.
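A minimal numerical sketch of Equation (1) is given below. It treats the denominator as interference power plus noise power (bandwidth times noise power density) and returns a rate in bits/s under the assumption that the bandwidth is given in Hz. The function name, argument names, and all numbers are illustrative assumptions and are not simulation parameters from this paper.

```python
import math


def instantaneous_rate(bandwidth_hz, tx_power_w, channel_gain,
                       interferer_powers_w, noise_psd_w_per_hz):
    """Rate B * log2(1 + SINR), following the structure of Equation (1).

    channel_gain is the complex (or real) channel coefficient h; the same
    |h|^2 scales the interfering powers, as written in Equation (1).
    """
    gain_sq = abs(channel_gain) ** 2
    signal = tx_power_w * gain_sq
    interference = sum(p * gain_sq for p in interferer_powers_w)
    noise = bandwidth_hz * noise_psd_w_per_hz
    return bandwidth_hz * math.log2(1.0 + signal / (interference + noise))


# Illustrative numbers only: 1 MHz bandwidth and an SINR of 3 (about 4.8 dB).
rate_clean = instantaneous_rate(1e6, 3e-9, 1.0, [], 1e-15)
rate_interfered = instantaneous_rate(1e6, 3e-9, 1.0, [1e-9], 1e-15)
print(f"{rate_clean:.3e} bit/s without interference, "
      f"{rate_interfered:.3e} bit/s with one interferer")
```

With an SINR of 3 the logarithm equals 2, so the first call yields roughly twice the bandwidth in bits/s; adding an interferer of the same order as the noise lowers the rate accordingly.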

3.1. Packet Arrival at the UAV

The bandwidth is denoted by B_{j}, and P_{i}^{t} indicates the power consumed by the UAV while communicating with the ith vehicle. The term σ^{2} is the noise power density at the receiver of the vehicle. The channel gains of the UAV at time slot t are denoted as h_{i, j} [t]. The communication links can be LoS or NLoS with probabilistic path loss depending on the UAV position, obstacle height, number of obstacles, and the UAV height (H). The instantaneous transmission rate for the ith vehicle in a time slot is given by Equation (2) as

s_{j}^{i} (b_{j}^{i}, x_{i}, y_{i}) = b_{j}^{i} \log_{2} (1 + Γ_{j, i})

(2)

where b_{j}^{i} is the instantaneous bandwidth utilized by the ith vehicle in the jth transmission window, and Γ is the SNR. The packets comprising processed local models arrive at the M/M/k queue, where the inter-arrival time (λ_{t}) between successive packets is exponentially distributed, and in each TTI, an arrival is independent of the previous arrivals. The local models (Ψ_{j}) considered for federated averaging, similar to the approach proposed in [26], are the summation of local models from all vehicles (\forall i \in V) transmitted in the previous TTIs as given by Equation (3):

Ψ_{j} (b_{j}^{i}, x_{i}, y_{i}) = λ_{t} \sum_{t = 1}^{T} ψ_{j}^{i}, \quad \forall i \in V

(3)
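For the M/M/k queue mentioned above, the mean queuing delay can be estimated with the standard Erlang C formula. The sketch below is a textbook calculation and is not taken from this paper: it assumes Poisson arrivals with a given rate, exponentially distributed service times with a common rate at each of the k servers, and a stable queue; the function names and the numbers are hypothetical.

```python
import math


def erlang_c(k, offered_load):
    """Probability that an arriving packet has to wait in an M/M/k queue.

    offered_load = arrival_rate / service_rate (in Erlangs).
    """
    utilization = offered_load / k
    if utilization >= 1.0:
        raise ValueError("queue is unstable: offered load must be below k")
    waiting_term = offered_load ** k / math.factorial(k) / (1.0 - utilization)
    other_terms = sum(offered_load ** n / math.factorial(n) for n in range(k))
    return waiting_term / (other_terms + waiting_term)


def mean_queueing_delay(arrival_rate, service_rate, k):
    """Mean time a packet waits before service starts (Erlang C result)."""
    offered_load = arrival_rate / service_rate
    return erlang_c(k, offered_load) / (k * service_rate - arrival_rate)


# Hypothetical rates: 0.5 packets per unit time, 1 packet per unit time per server.
delay_one_server = mean_queueing_delay(0.5, 1.0, 1)
delay_two_servers = mean_queueing_delay(0.5, 1.0, 2)
print(delay_one_server, delay_two_servers)
```

With a single server the expression reduces to the familiar M/M/1 result, and adding a second server at the same load cuts the waiting time sharply, which indicates how the number of processing units at the UAV affects the queuing part of the delay.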

The local models arrive at the queue, which accumulates at a rate specified by Equation (4). For uplink transmission, the data transmitted from a vehicle at time t are restricted by the available bandwidth (B_{i}^{t}) and uplink transmission window (τ_{u}):

ψ_{j}^{i} = \begin{cases} s_{j}^{i} (b_{j}^{i}, x_{i}, y_{i}), & \text{if } τ_{i} \leq i \leq λ_{i} \\ \min (B_{i}^{t}, s_{i}^{t} \times τ_{u}), & \text{otherwise} \end{cases}

(4)

3.2. UAV Power Consumption

The total power consumption during time period t is given by Equation (5):

P_{total} (t) = \sum_{i = 1}^{V} P_{i} (t) + P_{UAV}^{(V)} (t)

(5)

where P_{UAV}^{(V)} is the power required for operating the UAV during time period t, and P_{i} (t) denotes the power consumption in the ith time slot when V vehicles transmit data or local models to the UAV. The variation in UAV power P_{i} (t) is modeled in Equation (6), which depends on the uplink power, downlink power, processing power, and flying power:

P_{i} (t) = u_{i} (t) \left( \sum_{k = 1}^{K} γ_{k} ψ_{i, k} (t) + P_{UAV}^{(1)} (t) \right)

(6)

where u_{i} (t) is a binary variable; u_{i} (t) = 1 indicates that in the ith time slot, the UAV is processing the data from a vehicle, and u_{i} (t) = 0 indicates that the UAV is idle, which leads to the wastage of energy. Moreover, ψ_{i, k} (t) denotes the size of the kth type of vehicular data served by the UAV in the ith time slot. In addition, the term γ_{k} ψ_{i, k} (t) determines the power consumption during the uplink and downlink of the kth type of vehicular data, which can be a local model for different sensor data, a basic safety message (BSM), or a co-operative perception message (CPM). This paper aims to minimize the weighted total energy consumption by jointly optimizing the transmission window (L_{w}), transmit power (P_{i} (t)), and UAV trajectory (q_{(x, y)}) while considering the Doppler spread. The UAV energy consumption depends on its flying speed, flying duration, and communication with vehicles [68]. The trajectory coordinates traversed by the UAV are denoted by q_{(x, y)}, with the flying time upper-bounded by T. The trajectory coordinates, including all flying and turning points, are represented as q_{(x, y)} \in [q_{(0, 0)}, q_{(0, 1)}, \dots, q_{(1, 0)}, q_{(1, 1)}, \dots, q_{(m, m)}].
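The power model of Equations (5) and (6) can be evaluated with a few lines of code. The sketch below follows the two equations literally: a slot contributes power only when the binary indicator is 1, and the total adds the UAV operating power. The function names, argument names, and the coefficients in the example are illustrative assumptions; they are not values used in this paper.

```python
def slot_power(active, gammas, data_sizes, p_uav_single):
    """Equation (6): P_i(t) = u_i(t) * (sum_k gamma_k * psi_{i,k}(t) + P_UAV^(1)(t)).

    active      -- the binary indicator u_i(t)
    gammas      -- per-data-type power coefficients gamma_k
    data_sizes  -- per-data-type served data sizes psi_{i,k}(t)
    """
    if not active:
        return 0.0
    return sum(g * s for g, s in zip(gammas, data_sizes)) + p_uav_single


def total_power(slot_powers, p_uav_operating):
    """Equation (5): sum of the per-slot terms plus the UAV operating power."""
    return sum(slot_powers) + p_uav_operating


# Illustrative numbers: two data types, one busy slot and one idle slot.
busy = slot_power(True, [0.5, 2.0], [4.0, 1.0], 3.0)
idle = slot_power(False, [0.5, 2.0], [4.0, 1.0], 3.0)
print(total_power([busy, idle], 10.0))
```

In this literal reading the idle slot adds nothing to the communication term, while the operating power in Equation (5) is still spent, which is consistent with the remark that idle slots waste energy.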

3.3. Distance Between Vehicles and UAV

To calculate the real-time distance between the vehicles and the UAV, we use the Euclidean coordinate system, where the vehicles remain on the same two-dimensional horizontal plane, implying H = 0 for the vehicles. The maximum speed of the UAV is denoted as v_{max} in meters per second (m/s). When the UAV flies and collects sensing data from vehicles spatially distributed along the ground, it must meet the minimum SNR for reliable data collection. This is challenging in NLoS channel conditions, as the vehicles frequently move in and out of the direct communication range of the UAV. The collected data are buffered at the queue or forwarded to the UAV. Hence, we utilize the LTE I/Q dataset for five different UAV altitudes to construct a predetermined 3D map of spatial coordinates collected as the trajectory changes rapidly and abruptly [69]. The distance between the UAV and the ith vehicle is given by Equation (7):

d_{veh}^{uav} = \sqrt{(x_{u} - x_{i})^{2} + (y_{u} - y_{i})^{2} + H^{2}}, \quad i = 1, 2, \dots, V

(7)

At a specific height (H), the distance traveled by the UAV in one time slot is constrained by Equation (8) as follows:

(x_{i + 1} - x_{i})^{2} + (y_{i + 1} - y_{i})^{2} \leq (v_{max} δ_{t})^{2}, \quad i = 1, \dots, V

(8)

where δ_{t} is a limiting parameter.
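Equations (7) and (8) translate directly into a distance function and a feasibility check for one trajectory step. In the sketch below, δ_{t} is interpreted as the duration of one time slot, which is an assumption made for illustration; the function names and coordinates are likewise hypothetical.

```python
import math


def uav_vehicle_distance(uav_xy, vehicle_xy, height_m):
    """Equation (7): 3D distance between the UAV at height H and a ground vehicle."""
    dx = uav_xy[0] - vehicle_xy[0]
    dy = uav_xy[1] - vehicle_xy[1]
    return math.sqrt(dx * dx + dy * dy + height_m ** 2)


def step_is_feasible(pos_now, pos_next, v_max, delta_t):
    """Equation (8): the horizontal move in one slot must not exceed v_max * delta_t."""
    dx = pos_next[0] - pos_now[0]
    dy = pos_next[1] - pos_now[1]
    return dx * dx + dy * dy <= (v_max * delta_t) ** 2


# Illustrative coordinates in meters.
print(uav_vehicle_distance((0.0, 0.0), (3.0, 4.0), 12.0))
print(step_is_feasible((0.0, 0.0), (3.0, 4.0), v_max=5.0, delta_t=1.0))
print(step_is_feasible((0.0, 0.0), (3.0, 4.0), v_max=4.0, delta_t=1.0))
```

A trajectory planner can apply the feasibility check to every pair of consecutive waypoints and discard candidate paths that would require the UAV to exceed its maximum speed.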

3.4. Channel State and UAV Energy Consumption

Orthogonal transmission is employed in the uplink to allow multiple vehicles to simultaneously upload their data to the UAV. The building heights are modeled using a Rayleigh distribution. To model the channel state and UAV-to-vehicle links, we consider Rician fading. The LoS and NLoS links have different probabilities of occurrence, which are a function of the environment, the density and height of buildings, and the elevation angle between the UAV and vehicles. The geometrical statistics of various environments are provided by the International Telecommunication Union Radiocommunication Sector (ITU-R), which determine the density, number, and height of the buildings and other obstacles [48]. The NLoS effect decreases as the elevation angle between the receiver and transmitter increases, and the communication link approaches near LoS. The Rician K-factor, which represents the strength of the LoS component, is a function of the elevation angle and the UAV altitude [49]. The Rician K-factor impacts the maximum transmit power and path loss exponents, and the resulting channel gain is given by Equation (9):

\hat{h}_{j}^{i} = \sqrt{\frac{K}{1 + K}} \bar{h} + \sqrt{\frac{1}{1 + K}} \tilde{h}

(9)

where $\hat{h}_{j}^{i}$ is the channel gain experienced by the ith vehicle in the jth transmission window considering the Rician K-factor. The term $\bar{h}$ denotes the LoS component, and the term $\tilde{h}$ denotes the fading component.
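A minimal sketch of Equation (9) is given below. It assumes a unit-magnitude LoS term and a zero-mean complex Gaussian fading term with unit average power; both are modeling assumptions for this example, as are all function and parameter names.

```python
import cmath
import math
import random


def rician_channel_gain(k_factor, los_phase=0.0, rng=random):
    """Equation (9): sqrt(K/(1+K)) * h_los + sqrt(1/(1+K)) * h_scatter."""
    # LoS component: deterministic, unit magnitude (assumption).
    h_los = cmath.exp(1j * los_phase)
    # Fading component: complex Gaussian with unit average power (assumption).
    sigma = math.sqrt(0.5)
    h_scatter = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
    los_weight = math.sqrt(k_factor / (1.0 + k_factor))
    scatter_weight = math.sqrt(1.0 / (1.0 + k_factor))
    return los_weight * h_los + scatter_weight * h_scatter


def mean_channel_power(k_factor, samples=20000, seed=0):
    """Monte Carlo estimate of E[|h|^2] for a given K-factor."""
    rng = random.Random(seed)
    total = sum(abs(rician_channel_gain(k_factor, rng=rng)) ** 2 for _ in range(samples))
    return total / samples
```

Under these assumptions the average channel power is normalized to one for any K, and a very large K makes the gain approach the pure LoS term.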

4. Problem Formulation

We formulate an optimization problem to determine the optimal trajectory of the UAV in C-V2X communication, where the UAV maximizes the number of served vehicles while adhering to QoS and power constraints. In a transmission window ($L_{w}$), we maximize the UAV utility considering the priority of data transmission and link connection time, and we minimize the UAV energy consumption, which is formulated as a bin packing problem. We consider the uplink and downlink capacities in vehicle edge servers for transmission rate maximization, energy minimization, and delay minimization. Due to the spatial and temporal correlations between the UAV energy and trajectory, the proposed optimization is a non-convex problem. The objective function and constraints are non-convex with respect to the backhaul link outages and transmission latency, leading to a mixed-integer non-convex optimization problem. Hence, the problem ($P1$) of energy-efficient computing resource allocation in the UAV is formulated as follows in Equation (10):

\begin{aligned} P1: \quad & \min_{ψ^{(t)}, w^{(t)}, D^{(t)}, T} \Big\{ w_{1} \underbrace{\Big[ T τ_{u} + \sum_{t = 1}^{T} \min \big( \max_{v \in V} (d_{v}^{(t)}), τ_{d} \big) \Big]}_{SP_{1}} + w_{2} \underbrace{\Big[ \sum_{t = 1}^{T} \sum_{v \in V} D_{v}^{(t)} \min (d_{v}^{(t)}, τ_{d}) \Big]}_{SP_{2}} \\ & + w_{3} \underbrace{\Big[ \sum_{t = 1}^{T} \sum_{v \in V} I_{v}^{(t)} \Big]}_{SP_{3}} \Big\} + \min_{Ψ, Φ, D_{i}, T} \underbrace{\sum_{i = 1}^{V} \sum_{t = 1}^{T} \Big( e_{i}(t) + \sum_{j \in V} \sum_{k \in K} t_{i} p_{i, j}^{(k)} ς_{i, j}^{(k)}(t) \Big)}_{SP_{4}} \\ & \text{subject to} \end{aligned}

(10)

\begin{matrix} C1: & \sum_{t = 1}^{T} (1 - I_{v}^{(t)}) D_{v}^{(t)} = B_{v}^{(t)}, \quad \forall v \in V \end{matrix}

(11)

\begin{matrix} C2: & D_{v}^{(t)} \in [D_{min}, D_{max}], \quad \forall v \in V, \forall t \in T \end{matrix}

(12)

\begin{matrix} C3: & D_{i}(t) \leq D_{max} \text{ and } D_{i}(T) = 0 \end{matrix}

(13)

\begin{matrix} C4: & ψ_{v}^{(t)} \in \{0, 1\}, \quad \forall v \in V, \forall t \in T \end{matrix}

(14)

\begin{matrix} C5: & w_{v}^{(t)} \in \{1, \dots, V\}, \quad \forall v \in V, \forall t \in T \end{matrix}

(15)

\begin{matrix} C6: & \| q_{i}(t + 1) - q_{i}(t) \| \leq v_{max}(t) \, t_{i, i + 1} \end{matrix}

(16)

\begin{matrix} C7: & \| q_{i}(t + 1) - q_{i}(t) \| \geq d_{min} \end{matrix}

(17)

\begin{matrix} C8: & q_{(x, y)} = q_{(x_{0}, y_{0})} \cdot e^{- α (\frac{x^{2}}{a^{2}} + \frac{y^{2}}{b^{2}})} + H \end{matrix}

(18)

\begin{matrix} C9: & w_{max}(0) = D_{m} \text{ and } w_{max}(T) = 0 \end{matrix}

(19)

\begin{matrix} C10: & ς_{i, j}^{(k)}(t) \in \{0, 1\} \text{ and } s_{i, m}(t) \in \{0, 1\} \end{matrix}

(20)

\begin{matrix} C11: & s_{i}(b_{i}^{(n)}, x^{(n)}, y^{(n)}) \geq ϱ_{i} s_{i}^{min}, \quad \forall n, i \in I \end{matrix}

(21)

\begin{matrix} C12: & 0 \leq b_{i}^{(n)} \leq ϱ_{i}, \quad \forall n, i \in V \end{matrix}

(22)

\begin{matrix} C13: & D_{n}(\{q(t)\}, \{a(t)\}) = \int_{0}^{T} a_{k}(t) \, s_{n, max}(q(t)) \, dt \end{matrix}

(23)

\begin{matrix} C14: & a_{k}(t) \in \{0, 1\}, \quad \forall v \in V, t \in T \end{matrix}

(24)

where $P1$ is a multi-objective optimization problem. The constraint $C1$ is an indicator of the status of a TTI. It is identified as a binary value, 0 or 1, indicating whether, in the ith TTI, the UAV is communicating with a vehicle or is idle. The constraints $C2$ and $C3$ imply that a vehicle is served by the UAV in the current $L_{w}$ subject to an upper bound on delay given by $D_{max}$. The constraints $C4$ and $C5$ impose an upper bound on the size of the packets ($ψ$) and model weights transmitted from a vehicle to the UAV. This ensures that the queue at the UAV is not overloaded and implies that the UAV computing resources consumed do not exceed the maximum baseband processing capacity.

The constraints $C6$ and $C7$ restrict the UAV trajectory to ensure that the UAV is not out of the vehicle’s coverage range for a long time and serves at least one vehicle in each TTI. The UAV trajectory is constrained by its initial position and its final position to avoid obstacles, maintain speed and altitude, minimize energy consumption, and maximize travel time. In constraint $C8$, we bound the UAV trajectory by an elliptical path. The UAV can theoretically move from point $A_{(x_{A}, y_{A})}$ to $B_{(x_{B}, y_{B})}$ through infinitely many possible paths. The constraint $C8$ represents a Poisson point process within an elliptical area and characterizes the spatial distribution of paths traversed by the UAV. The constraint $C9$ ensures that the vehicles’ data are successfully offloaded to the UAV during a TTI and UAV flight time ($T$). The constraint $C10$ denotes the assignment coefficient for vehicles, where $ς$ = 1 denotes that a vehicle is scheduled for transmission, while $ς$ = 0 denotes that it is waiting. The constraint $C11$ introduces a control parameter $ϱ_{i}$ for the data rates to reduce the computational complexity of the solution. The constraint $C12$ limits the available bandwidth so that all vehicles have a chance to send the maximum amount of data to the UAV. The constraints $C13$ and $C14$ indicate the throughput of the vehicle local models. These constraints guarantee that each vehicle uploads the minimum amount of data to the UAV and prevent the UAV from wasting radio resources on a vehicle that cannot be served in a given TTI. The main symbols used in this paper are described in Table 3.
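The trajectory constraints $C6$–$C8$ lend themselves to a simple feasibility check. The following sketch is one possible way to test a candidate trajectory against them; the helper names, the list-of-tuples trajectory format, and the use of a single slot duration for all steps are assumptions of this example rather than details of the proposed solver.

```python
import math


def step_within_bounds(q_now, q_next, v_max, slot_time, d_min):
    """C6 and C7: the step length must lie in [d_min, v_max * slot_time]."""
    step = math.dist(q_now, q_next)
    return d_min <= step <= v_max * slot_time


def elliptical_profile(x, y, q0, alpha, a, b, height):
    """C8: q(x, y) = q0 * exp(-alpha * (x^2 / a^2 + y^2 / b^2)) + H."""
    return q0 * math.exp(-alpha * (x * x / (a * a) + y * y / (b * b))) + height


def infeasible_steps(trajectory, v_max, slot_time, d_min):
    """Indices t for which the move trajectory[t] -> trajectory[t + 1] violates C6 or C7."""
    return [
        t
        for t in range(len(trajectory) - 1)
        if not step_within_bounds(trajectory[t], trajectory[t + 1], v_max, slot_time, d_min)
    ]
```

With $v_{max}$ = 10 m/s, a 1 s slot, and $d_{min}$ = 1 m, a 0.5 m step is flagged as too short and a 27 m step as too long.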

5. Proposed Solution

The problem $P1$ is divided into four sub-problems pertaining to communication scheduling as well as trajectory and computing resource optimization using Lagrangian dual decomposition. The communication delay is optimized in the first sub-problem $SP_{1}$. Branch and bound is used to solve the optimization sub-problems $SP_{1}$ and $SP_{2}$, finding an optimal solution from a set of candidate solutions and allocating the limited UAV resources to vehicles so as to minimize the objective function. For a given delay, the trajectory and computing resource are jointly optimized in $SP_{2}$. Next, we apply binary relaxation to transform $P1$ into a linear programming problem. Binary relaxation transforms the sub-problems into convex problems, where the binary variables are relaxed into continuous real variables. In sub-problems $SP_{3}$ and $SP_{4}$, the objective function to maximize the data rate and energy efficiency and the constraints are non-convex and are solved using successive convex approximation (SCA). To integrate the solutions from the four sub-problems with equality and inequality constraints, we use successive quadratic programming (SQP), which iteratively approximates the nonlinear objective function and constraints using quadratic models that are updated in each iteration. Figure 3 illustrates the proposed FRL-based solution approach for UAV trajectory control and power optimization for low-latency C-V2X communications.
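To illustrate how branch and bound and binary relaxation interact, the sketch below solves a toy 0/1 selection problem: each vehicle has a hypothetical utility and resource cost, and the UAV has a fixed capacity. Relaxing each binary variable to the interval [0, 1] gives a fractional bound that is used to prune branches. This is a generic textbook instance and a maximization, not the sub-problems $SP_{1}$ and $SP_{2}$ themselves; all names and numbers are assumptions.

```python
def relaxed_bound(items, capacity):
    """Upper bound from relaxing x in {0, 1} to 0 <= x <= 1.

    `items` holds (utility, cost) pairs with positive cost, sorted by
    utility/cost ratio in descending order.
    """
    bound = 0.0
    for utility, cost in items:
        if cost <= capacity:
            bound += utility
            capacity -= cost
        else:
            # Take only the fraction of the item that still fits.
            bound += utility * capacity / cost
            break
    return bound


def branch_and_bound(items, capacity):
    """Best total utility of a 0/1 selection whose total cost fits in `capacity`."""
    items = sorted(items, key=lambda it: it[0] / it[1], reverse=True)
    best = 0.0

    def explore(index, value, remaining):
        nonlocal best
        best = max(best, value)
        if index == len(items):
            return
        # Prune when even the relaxed completion cannot beat the incumbent.
        if value + relaxed_bound(items[index:], remaining) <= best:
            return
        utility, cost = items[index]
        if cost <= remaining:
            explore(index + 1, value + utility, remaining - cost)  # branch x = 1
        explore(index + 1, value, remaining)  # branch x = 0

    explore(0, 0.0, capacity)
    return best
```

For three vehicles with (utility, cost) pairs (60, 10), (100, 20), and (120, 30) and a capacity of 50, the relaxed bound is 240, while the best binary selection reaches 220.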

5.1. Long Short-Term Memory to Approximate the UAV Trajectory Parameters

We propose an LSTM to approximate the complex nonlinear functions in the above convex optimization sub-problems. As the LSTM approximates the parameters of the objective function and the constraints, the resulting model is integrated with the optimization framework. In $P1$, parameters such as the UAV trajectory ($q_{(x, y)}$) in the next TTI or the vehicle data rate $s_{i, t}$ cannot be precisely known and vary significantly with time. The LSTM is used to learn these parameters from training data, and the learned parameters are incorporated in the solutions of the optimization sub-problems ($SP_{1}$–$SP_{4}$). As the UAV trajectory changes, these parameters change, and as there are temporal dependencies, the LSTM models these dynamics. The optimization problem is solved in an iterative manner, where the LSTM adapts the optimization strategy based on the updated parameters. In addition, the LSTM models the time-varying constraints or complex relationships between variables in $P1$, predicts constraint violations, and adapts to changing constraints. The problem $P1$ is addressed using the actor–critic framework with two sets of LSTM networks to approximate the policy function ($Q^{π}(s_{t}, π(s_{t} | ψ))$) and value function ($V(ψ)$). The parameterized actor network generates a deterministic action to maximize the value function based on the steady-state distribution of the actor’s policy.

The critic network approximates the value function by taking the derivative of $V(ψ)$, given by $\nabla V(ψ)$, with respect to the policy parameter and updates the actor network by gradient ascent to improve the value function $Q^{π}(s_{t}, a_{t} | w) \nabla_{ψ} π(s_{t} | ψ)$. We use the LTE I/Q dataset to train the LSTM to learn a mapping from the sequences of states to the optimal trajectories and from the input features to the objective function and the constraints. The LTE I/Q dataset includes sequences of state–action pairs for the UAV [69]. As illustrated in Figure 4, the state includes information about the position, velocity, and environmental conditions of the UAV, while the action represents the trajectory. We use the mean squared error (MSE) loss function to train the LSTM to predict future trajectories based on the current state and environmental conditions of the UAV and to approximate the behavior of the constraints. We periodically retrain the actor–critic LSTM using updated data and evaluate its accuracy in predicting the UAV trajectory. The actor LSTM predicts the trajectory, and the critic updates the parameters based on the predictions. We monitor the convergence of the optimization process and update the LSTM training strategy. The process is repeated throughout the trajectory of the UAV to model the sequential dependencies over time.
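The data preparation and loss described above can be sketched without a deep learning library. The example below builds (past positions, next position) pairs from a trajectory and scores predictions with the MSE. The constant-velocity predictor is a hypothetical stand-in for the trained LSTM, and the window length and function names are assumptions.

```python
def make_windows(trajectory, window):
    """Split a trajectory into (past `window` points, next point) training pairs."""
    return [
        (trajectory[i:i + window], trajectory[i + window])
        for i in range(len(trajectory) - window)
    ]


def mse(predicted, actual):
    """Mean over points of the squared Euclidean error between (x, y) pairs."""
    total = sum(
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    )
    return total / len(actual)


def constant_velocity_predictor(history):
    """Hypothetical baseline standing in for the LSTM: repeat the last displacement."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)
```

On a straight-line trajectory the baseline predicts every next point exactly, so the MSE is zero; a sequence model is only needed once the path bends or accelerates.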

5.2. Federated Deep Deterministic Policy Gradient (Fed-DDPG)

The fed-DDPG algorithm comprising LSTM in the actor–critic framework aims to enhance the accuracy of individual DDPG algorithms at each agent. This involves exchanging the model parameters among the DDPG agents, allowing the simultaneous optimization of energy consumption for each vehicle and the UAV. The DDPG algorithm is executed on each vehicle, and federated averaging is applied to the parameters obtained from the DDPG algorithm of multiple vehicles to implement the fed-DDPG algorithm. The proposed fed-DDPG algorithm optimizes the agents’ actions together to reduce the entire cost function of the system. An episode is the sequence of events from when the UAV starts flying until it either returns to the initial position or the TTI ends. The objective of the UAV is to maximize the average vehicle data traffic and the number of vehicles served. The state at time t comprises the current location (coordinates) of the UAV and the remaining UAV battery power ($P_{t}$). In Figure 4, the UAV at coordinates ($x_{t}, y_{t}$) serves vehicles and moves either to the east ($x_{t + 1}, y_{t}$), south ($x_{t}, y_{t - 1}$), west ($x_{t - 1}, y_{t}$), north ($x_{t}, y_{t + 1}$) or back to the initial point.
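The federated averaging step mentioned above can be sketched as follows. The example treats each vehicle's DDPG parameters as a flat list of floats and optionally weights them by the local sample count; both choices, and the function name, are assumptions for illustration.

```python
def federated_average(local_params, sample_counts=None):
    """Average per-vehicle parameter vectors, optionally weighted by local sample counts."""
    if sample_counts is None:
        sample_counts = [1] * len(local_params)
    total = sum(sample_counts)
    dim = len(local_params[0])
    return [
        sum(n * params[k] for params, n in zip(local_params, sample_counts)) / total
        for k in range(dim)
    ]
```

With equal weights, the parameter vectors [1, 2] and [3, 6] average to [2, 4]; weighting by sample count pulls the result toward the vehicles that hold more data.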

5.3. Experience Replay and Fed-DDPG

Using experience replay and a target critic network, fed-DDPG enhances the stability and robustness of our proposed solution by minimizing data correlation in various TTIs. Policy exploration and action selection are further improved by adding Gaussian noise to the output of the policy network. In order to model the prior distribution of the UAV height as the UAV trajectory adapts to vehicles’ spatial distribution and traffic demands, we use a multivariate Gaussian distribution as per Equations (25) and (26):

f_{i} (H_{t}) \sim G (μ_{i} (H_{t}), v_{i} (H_{t}))

(25)

where $μ_{i}(H_{t})$ is the mean vector, and $v_{i}(H_{t})$ is the covariance matrix for each sampling value on the trajectory point, without any prior information:

f_{i} (H_{t}) | D_{i} (t) \sim P (D_{i} (t) | f_{i} (H_{t})) P (f_{i} (H_{t}))

(26)

At height $H_{t}$, given a maximum tolerable delay, the elements of the covariance matrix in Equation (27) are

v_{τ, τ'}(H_{t}) = \exp (- \frac{1}{2} \| q_{i}(τ) - q_{i}(τ') \|^{2})

(27)

which implies a larger correlation when two trajectory points are closer to each other. To collect more data from the vehicles, the UAV selects the next trajectory point to maximize the expected value function ($V(ψ)$). The expected improvement in the value function as the UAV moves to the next set of coordinates in a time slot is based on the constraints $C12$ and $C13$ that limit the throughput of the vehicles’ local models. As per Equation (28), the UAV selects the action, i.e., the next set of coordinates, that maximizes the expected improvement in delay minimization:

a_{i, t}(q_{i}) = E [\max \{0, f_{i}(q_{i}) - f_{i}^{*}(q_{i - 1}(t))\}]

(28)

where $E$ denotes the expectation and $f_{i}^{*}$ is the maximum value function obtained when the UAV visited the past coordinates. The next optimal trajectory coordinate in the next TTI is found by maximizing the expected improvement. The UAV periodically updates its location, buffer size, energy status, and channel conditions. The vehicles adapt their transmission strategies to the UAV trajectory based on this UAV status information, and a vehicle successfully transmits its data to the UAV when the received SNR meets a minimum threshold.
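Equations (27) and (28) correspond to a squared-exponential covariance and an expected-improvement criterion. The sketch below uses the standard closed form of the expected improvement for a Gaussian-distributed value, which is an assumption about how the expectation in Equation (28) would be evaluated; the candidate format (point, mean, standard deviation) is also hypothetical.

```python
import math


def squared_exponential(q_a, q_b):
    """Equation (27): covariance exp(-0.5 * ||q_a - q_b||^2) between two trajectory points."""
    return math.exp(-0.5 * math.dist(q_a, q_b) ** 2)


def expected_improvement(mean, std, best_so_far):
    """E[max(0, f - best_so_far)] for f ~ Normal(mean, std^2), cf. Equation (28)."""
    if std == 0.0:
        return max(0.0, mean - best_so_far)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


def pick_next_point(candidates, best_so_far):
    """Return the point of the (point, mean, std) candidate with the largest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_so_far))[0]
```

Two candidates with the same predicted mean are separated by their uncertainty: the one with the larger standard deviation has the larger expected improvement and is selected.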

5.4. Reward Function

We consider the occupancy of the UAV processor queue as a cost function to define the UAV overall reward function ($r_{i}(s_{t}, a_{t})$) in each TTI as follows in Equation (29):

r_{i} (s_{t}, a_{t}) = γ_{1} r_{i, ψ} (t) + γ_{2} r_{i, D} (t) + γ_{3} r_{i, s} (t) + γ_{4} r_{i, P} (t)

(29)

where $γ_{1}$ to $γ_{4}$ are the discount factors that weight the four reward components, $r_{i, ψ}(t)$ is the reward that achieves optimal model transmission from the vehicles to the UAV, $r_{i, D}(t)$ is the reward that achieves minimal delay, $r_{i, s}(t)$ is the reward that achieves the maximum data rates, and $r_{i, P}(t)$ is the reward that achieves the maximum power utilization for the UAV. The expectation ($E$) is taken over all samples in the experience replay buffer and constitutes the global reward $r_{i}(s_{t}, a_{t})$. Based on the past UAV trajectories, the actor–critic LSTM provides model-free prediction based on the existing samples for action exploration. Consequently, it guides the UAV trajectory towards a more rewarding policy. In fed-DDPG, each vehicle learns a local model based on its data samples and uploads the model to the UAV to maximize the long-term discounted reward in Equation (30):

r_{π_{i}} = E_{s, a \sim E} [\sum_{t = 0}^{\infty} γ_{t} r_{i, t} (s, a_{1, t}, \dots, a_{i, t}, \dots, a_{I, t})]

(30)

where $a_{I, t}$ is the UAV action in each TTI. When trained on the LTE I/Q dataset, the LSTM estimates an optimal coordinate in the next TTI based on the recent trajectory. The action estimation is input to the critic LSTM together with the action learned by the actor LSTM. The critic LSTM evaluates the two actions and decides which action is to be executed. The UAV selects a rewarding action with a higher probability, which reduces the action space and improves the learning efficiency, and maps each coordinate to the data size collected from the vehicles to minimize the MSE. The UAV acts as an independent DRL agent, where the policy parameter outputs its action using the deterministic policy based on its observation of the channel states, where s is the channel state, a is the action of the UAV, and $E$ refers to the expectation in the C-V2X environment. The gradient is calculated as per Equation (31) as

\nabla_{ϕ_{i}} r(π_{i}) = E_{s, a \sim E} \nabla_{ϕ_{i}} \log (π_{ϕ_{i}}(a_{i} \mid s_{i})) + e(s_{i}, a_{i})

(31)

where $e(s_{i}, a_{i})$ is the entropy. The trade-off between maximizing entropy and reward is determined by Equation (32):

Q_{i}^{ψ}(s, a) \leftarrow Q_{i}^{ψ}(s, a) + β [r + γ \max_{a'} Q_{i}^{ψ}(s', a') - Q_{i}^{ψ}(s, a)]

(32)

where $Q_{i}^{ψ}$ is the actor–critic Q-function, $β$ ∈ (0,1] is the learning rate, and r denotes the reward. As illustrated in Figure 5, we assume a random trajectory as well as an elliptical trajectory to model the UAV path. The elliptical trajectory leads to a lower MSE compared to the random trajectory, as the future coordinates can be predicted more accurately. As per Equations (33) and (34), the reward is recursively updated, and the updated reward is used to calculate the loss function using MSE.

r_{i} = r_{i - 1} + γ E_{a^{'} \sim π_{ϕ^{'}} (s^{'})} [Q_{i}^{ψ^{'}} (s, a) + e (s_{i}, a_{i})]

(33)

min f_{L_{Q}} (ψ) = \sum_{i = 1}^{I} E_{(s, a, r, s^{'}) \sim E} [{(Q_{i}^{ψ} (s, a) - r_{i})}^{2}]

(34)
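The reward and update rules of Equations (29), (30), (32), and (34) can be sketched in tabular form. The code below is a simplified illustration: it uses a dictionary as the Q-function instead of an LSTM critic, a finite horizon with the discount applied as a power of γ, and the standard temporal-difference form in which the maximum is taken over the actions of the next state. These choices, and all names, are assumptions of this example.

```python
def tti_reward(components, weights):
    """Equation (29): weighted sum of the model, delay, rate, and power reward terms."""
    return sum(w * r for w, r in zip(weights, components))


def discounted_return(rewards, gamma):
    """Finite-horizon form of Equation (30): sum over t of gamma**t * r_t."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def td_update(q_table, state, action, reward, next_state, actions, beta, gamma):
    """One update in the form of Equation (32); returns the new Q(state, action)."""
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + beta * (reward + gamma * best_next - old)
    return q_table[(state, action)]


def critic_mse_loss(q_values, targets):
    """Equation (34) for one agent: mean of (Q - target)^2 over a replay batch."""
    return sum((q - r) ** 2 for q, r in zip(q_values, targets)) / len(q_values)
```

A usage example: starting from an empty table with β = 0.5 and γ = 0.9, a reward of 1 for moving east from (0, 0) sets the corresponding Q-value to 0.5, and later updates also propagate value back from the best action available in the next cell.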

Based on the available battery power ($P_{b}$), the UAV adjusts its height ($H$). For each height, the number of vehicles covered is depicted by a triangle in Figure 5.

6. Simulation Results and Discussion

The vehicles and UAV are trained using a random 10% slice of the V2X-Sim dataset, split into 60% training data and 40% testing data, to generate sensor data and learn local models [70]. The road segment is 300 m long and 20 m wide, and the vehicle speed varies between 10 and 100 km/h. The interference experienced by each vehicle is proportional to the relative velocity between the vehicle and the UAV. The threshold for the maximum UAV altitude is ($H$)+1 km, the noise power is set to −10 dBm, and the BER tolerance is $10^{-4}$. The sizes of the vehicle data and UAV data for each task are distributed randomly between 1–10 MB and 1–3 MB, respectively. Note that the batch size of the local models equals that of the gross data offloading. The transmission and reception powers of the UAV are set to 120 mW and 60 mW, respectively, and the channel bandwidth is set to 20 MHz. We implement the proposed model using Python and utilize an Amazon EC2 instance to process the datasets. The simulation experiments are executed 50 times for 100–2000 iterations, and the average parameter values are calculated. Table 4 lists the main parameters used in the simulations. Next, we discuss the variation in UAV power ($P_{i}(t)$) with respect to channel uncertainty caused by the number of vehicles (V), UAV trajectory ($q_{(x, y)}$), and UAV height ($H$).
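For reproducibility, the simulation settings quoted above could be collected in a single configuration object. The sketch below is one possible layout; the class and field names are assumptions, and only values stated in the text are included. The helpers convert between mW and dBm so that the transmit powers can be compared with the dBm values used later in the discussion.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationConfig:
    """Settings taken from the simulation description; field names are illustrative."""
    road_length_m: float = 300.0
    road_width_m: float = 20.0
    speed_range_kmh: tuple = (10.0, 100.0)
    noise_power_dbm: float = -10.0
    ber_tolerance: float = 1e-4
    uav_tx_power_mw: float = 120.0
    uav_rx_power_mw: float = 60.0
    bandwidth_hz: float = 20e6
    runs: int = 50


def mw_to_dbm(power_mw):
    """Convert a power in milliwatts to dBm."""
    return 10.0 * math.log10(power_mw)


def dbm_to_mw(power_dbm):
    """Convert a power in dBm to milliwatts."""
    return 10.0 ** (power_dbm / 10.0)
```

With these conversions, the 120 mW transmission power corresponds to roughly 20.8 dBm and the 60 mW reception power to roughly 17.8 dBm.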

6.1. Variation in Average Cost Function (UAV Energy and Latency) with Number of Vehicles (V)

The variation in average cost function (UAV energy and latency) vs. number of vehicles (V) for FL is illustrated in Figure 6 for $V = 1$–100. The average throughput and the average transmission power also increase with the vehicle speed and road segment length. Also, the FL algorithm on vehicles converges after 20 iterations. When vehicles follow FL at a low vehicle speed of 40 km/h and for a smaller road segment length of 1 km, the UAV transmits at a lower power but experiences high interference and reduced throughput. For a higher vehicle speed of 80 km/h and a larger road segment length of 4 km, the UAV power consumption is notably higher.

The variation in queuing delay ($D_{que}$) in the FL scenario with time slots is illustrated in Figure 7, where the lowest queuing delay ($D_{que}$) of approximately 11.5 ms is obtained for the fed-DDPG approach. The actor–critic mechanism with LSTM also performs comparably well, with a maximum $D_{que}$ of approximately 13.5 ms. The GRU $D_{que}$ reaches a maximum of approximately 12 ms. As CNN is more suited to image processing applications, the CNN-LSTM and CNN-GRU lead to a higher $D_{que}$ of 15–20 ms.

The variation in total delay ($D = D_{que} + D_{proc}$) is illustrated in Figure 8. The variation in $D$ is dependent on the inter-arrival time of the packets. With fed-DDPG, we notice a maximum $D$ of 13 ms, which is approximately 40% less than the maximum $D$ of the RNN and CNN-GRU models.

6.2. Variation in Average Packet Drop Rate with Control Parameter ( $ϱ$ ) Using Fed-DDPG

The variation in the average packet drop rate with control parameter ($ϱ$) using fed-DDPG is illustrated in Figure 9 for $V = 1$–100. As $ϱ$ is increased from 2000 to 10,000, the model convergence improves and the average packet drop rate gradually reduces. Increasing $ϱ$ beyond 10,000 increases the required number of iterations and hence increases the latency. Moreover, the transmit power of the UAV increases with the number of vehicles and with the UAV trajectory fluctuations. In the Rician channel model, the channel gain has a non-monotonic relationship with the flying altitude of the UAV. While increasing the flying altitude improves the channel conditions for certain vehicles and leads to improved throughput, it simultaneously requires higher transmission power to meet the QoS requirement. The results reveal that as $ϱ$ increases, the proposed solution optimizes the task execution time to reduce the queue backlog and ensure queue stability. As the value of $ϱ$ decreases, the priority of the queue backlog increases with the reduced computation rate.

The variation in the average UAV energy with the number of vehicles at a specific UAV altitude ($H$) for different packet transmission sizes is illustrated in Figure 10 for the FL model transmission, BSM, and CPM packets. Here, we vary the packet size from 1 to 5 MB. NLoS propagation conditions and path losses are increased in dense urban environments due to obstacles such as buildings, trees, and other structures whose heights follow a Rayleigh distribution. For vehicles to achieve their target SINR thresholds in dense urban areas, the UAV needs to fly at higher altitudes and transmit data at a higher power.

6.3. Variation in FL Computation Rate and Average UAV Energy with V for Different Machine Learning Models

Here, we consider the effect of wind in the direction of UAV hovering, as well as in the direction against UAV hovering. We add a random SNR to the $q_{(x, y)}$ directions of the UAV’s position and to $H$, respectively. We execute this iteration for a set of changes in the number of layers and the number of hidden nodes in the LSTM. It is observed that when the number of hidden nodes exceeds 32, the average MSE value increases significantly, so it is not necessary to have more than 32 hidden nodes. For scenarios with 1–100 vehicles, due to the limited available resources, not all vehicles achieve their required SINR. Hence, the UAV increases its flying altitude, and thus its power consumption, to serve these vehicles and reduce packet drops and retransmissions.

As a result, the UAV consumes more energy and has a shorter flight time since it starts from the coordinate origin and does not return to its starting point when it finishes the coverage task. Moreover, when the vehicle data size is smaller, we achieve a higher computation rate for the fed-DDPG algorithm. However, when the vehicle data size increases, the transmission task is completed in a shorter amount of time, and the performance gain increases. Increasing the number of vehicles increases the average UAV power consumption, as the bandwidth available to each vehicle decreases. The variation in FL computation rate (Mbits/s) with control parameter ($ϱ$) for different machine learning models is illustrated in Figure 11.

6.4. Probability of Optimal Trajectory Prediction

The variation in the probability of optimal trajectory prediction using LSTM vs. UAV altitude ($H$) for a varying number of vehicles (V) is illustrated in Figure 12 for V = 1–100. In order to predict a sequence of trajectory coordinates, the LSTM uses past positions or features that represent the movement of a UAV and predicts the future path. As seen from Figure 12, the probability of optimal trajectory prediction decreases with a greater number of vehicles on the road segment. Also, when the UAV altitude is between 1 and 600 m, the probability of optimal trajectory prediction reduces gradually with the altitude. However, when the UAV altitude is between 600 and 1000 m, the probability of optimal trajectory prediction reduces steeply as compared to an altitude between 1 and 600 m. This is attributed to the available battery power, which restricts UAV movement as well as the number of vehicles the UAV can serve with the available SNR. By combining multiple vehicles, the LSTM improves the sum throughput and reduces the trajectory length, thereby improving the trajectory prediction probability.

The variation in the probability of optimal trajectory prediction using actor–critic vs. UAV altitude ($H$) for a varying number of vehicles (V) is illustrated in Figure 13 for V = 1–100. The actor–critic comprises an LSTM in both the actor and critic layers. Compared to Figure 12 for V = 1–100, the probability of optimal trajectory prediction using actor–critic is lower than that of fed-DDPG using LSTM for the same number of vehicles, UAV transmit power, road segment length, and UAV altitude. This is because when trained on the LTE I/Q dataset for five different UAV altitudes, the fed-DDPG using the LSTM architecture is able to capture complex dependencies in the UAV trajectory samples, and the data processing time is significantly reduced. In the case of actor–critic, when the UAV transmit power is between 10 and 28 dBm, the trajectory length and sum throughput performance are affected by the number of vehicles and the vehicle speed. This results in a slower convergence speed and hence a lower probability of optimal trajectory prediction. The probability of optimal trajectory prediction using CNN-LSTM, RNN, and GRU is illustrated in Figure 14, Figure 15, and Figure 16, respectively. As seen, the probability of optimal trajectory prediction declines more steeply than with fed-DDPG. Also, the CNN-LSTM, RNN, and GRU models are trained for 1000 episodes, whereas actor–critic and fed-DDPG yield a higher probability in 500 and 250 training episodes, respectively.

6.5. UAV Transmit Power ( $P_{i} (t)$ ) vs. SNR for OTFS Modulation

The variation in UAV transmit power ($P_{i}(t)$) vs. SNR for OTFS modulation is illustrated in Figure 17 for V = 1–100. Low SNR and a larger transmission delay lead to a longer flying time and higher energy consumption. The high mobility of vehicles results in lower SNR as velocity increases, increasing the average power consumption. Low SNR results in decreased system throughput and higher power consumption by the UAV to maintain the SNR requirement for vehicles. Higher altitude requires high transmission power in order to meet the SNR requirement for vehicles, which includes both the flying and transmission power of the UAV. The BER is compared at different UAV and vehicle velocities. As the flight speed of the UAV increases, the BER increases, indicating that the Doppler shift from high-speed movement affects communication. It is noted that the difference in BER becomes larger as the SNR increases, and OTFS allows the UAV to communicate over longer distances. With increasing packet size, the weighted sum energy consumption increases, as the UAV is limited in resources to process the incoming packets.
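To give a sense of the magnitude of the Doppler shift discussed above, the maximum shift can be estimated as $f_{d} = (v / c) f_{c}$. The sketch below applies this standard relation; the 5.9 GHz carrier used in the usage note is an assumption typical of C-V2X deployments and is not a parameter stated in this section.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def max_doppler_shift_hz(relative_speed_kmh, carrier_hz):
    """Maximum Doppler shift f_d = (v / c) * f_c for a given relative speed in km/h."""
    speed_ms = relative_speed_kmh / 3.6
    return speed_ms / SPEED_OF_LIGHT * carrier_hz
```

At a relative speed of 100 km/h and an assumed 5.9 GHz carrier, the maximum shift is on the order of 550 Hz, and it scales linearly with the relative speed between the UAV and the vehicle.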

6.6. Discussion and Comparison with Existing Works

As shown in Table 5, compared with previous machine learning approaches, whose result is reported as a 50% reduction in model convergence time, our method achieves around a 30% reduction in latency for a similar model convergence time. Furthermore, when the average UAV transmit power is between 1 and 10 dBm, the average task processing time of the gross data offloading and the FL scenarios varies from 5 to 22 ms. With the increase in packet size for a higher vehicle speed of 80 km/h and a larger road segment length of 4 km, the average UAV transmit power is between 10 and 28 dBm.

In this case, the average task processing time of the gross data offloading and the FL scenarios varies from 20 to 35 ms. When the UAV altitude increases from 600 to 1000 m, the average UAV transmit power varies between 15 and 35 dBm. The flying time is constrained by the average task processing time and the number of vehicles in a road segment. For the FL scenario, when the UAV transmit power is above 20 dBm, the processing and queuing delay becomes negligible compared to the gross data offloading, as part of the computation is performed at the edge nodes in the FL scenario. When the average UAV transmit power is higher than 10 dBm and the SNR is low, the data offloaded by the vehicles are queued. The average task processing time is determined by its local computation time in the case of FL vs. gross data offloading.

7. Conclusions

In this paper, a trajectory optimization mechanism is proposed for UAV-assisted C-V2X communication that considers the Doppler effect caused by high-speed UAV and vehicle motion. Our findings demonstrate that jointly optimizing a vehicle’s transmission time interval, UAV transmit power, and UAV trajectory reduces the weighted total energy consumption of the UAV. Based on past UAV trajectories, a fed-DDPG algorithm is proposed that allows the UAV to estimate a rewarding action by optimizing the trade-off between the maximum flying altitude, power consumption, and number of vehicles served. We formulate the above problem as a mixed-integer non-convex optimization problem and divide the problem into four optimization sub-problems. We utilize an actor–critic framework in the fed-DDPG approach, where the actor–critic network comprises an LSTM to estimate and explore the optimal trajectory and altitude for the UAV. Depending on the vehicle’s data distribution and traffic demand, the UAV trajectory is adapted based on the UAV energy consumption and buffer size. Compared to exhaustive search methods that have a high computational complexity, our algorithm reaches a globally optimal solution in polynomial time. Simulation results reveal that the OTFS-based fed-DDPG algorithm is superior to heuristic search algorithms in terms of throughput maximization and latency minimization.

Author Contributions

Conceptualization, A.G. and X.F.; methodology, A.G.; writing—original draft preparation, A.G.; writing—review and editing, X.F.; supervision, X.F.; funding acquisition, X.F. All authors have read and agreed to the published version of the manuscript.

Funding

This project is funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada, Project Number RGPIN-2024-04924.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chafii, M.; Bariah, L.; Muhaidat, S.; Debbah, M. Twelve Scientific Challenges for 6G: Rethinking the Foundations of Communications Theory. IEEE Commun. Surv. Tutor. 2023, 25, 868–904.
2. Labib, N.S.; Brust, M.R.; Danoy, G.; Bouvry, P. The Rise of Drones in Internet of Things: A Survey on the Evolution, Prospects and Challenges of Unmanned Aerial Vehicles. IEEE Access 2021, 9, 115466–115487.
3. Liu, R.; Liu, A.; Qu, Z.; Xiong, N.N. An UAV-Enabled Intelligent Connected Transportation System with 6G Communications for Internet of Vehicles. IEEE Trans. Intell. Transp. Syst. 2023, 24, 2045–2059.
4. Bai, L.; Liu, J.; Wang, J.; Han, R.; Choi, J. Data Aggregation in UAV-Aided Random Access for Internet of Vehicles. IEEE Internet Things J. 2022, 9, 5755–5764.
5. Amadeo, M.; Campolo, C.; Molinaro, A.; Harri, J.; Rothenberg, C.E.; Vinel, A. Enhancing the 3GPP V2X Architecture with Information-Centric Networking. Future Internet 2019, 11, 199.
6. Garcia-Roger, D.; Gonzalez, E.E.; Martin-Sacristan, D.; Monserrat, J.F. V2X Support in 3GPP Specifications: From 4G to 5G and Beyond. IEEE Access 2020, 8, 190946–190963.
7. Moreira, I.; Pimentel, C.; Barros, F.P.; Chaves, D.P.B. Modeling Fading Channels with Binary Erasure Finite-State Markov Channels. IEEE Trans. Veh. Technol. 2017, 66, 4429–4434.
8. Qiao, D.; Liu, G.; Guo, S.; He, J. Adaptive Federated Learning for Non-Convex Optimization Problems in Edge Computing Environment. IEEE Trans. Netw. Sci. Eng. 2022, 9, 3478–3491.
9. Ma, Y.; Ma, G.; Ai, B.; Liu, J.; Wang, N.; Zhong, Z. OTFCS-Modulated Waveform Design for Joint Grant-Free Random Access and Positioning in C-V2X. IEEE J. Sel. Areas Commun. 2024, 42, 103–119.
10. Wang, B.; Yuan, Z.; Zheng, S.; Liu, Y. Data-Driven Intelligent Receiver for OTFS Communication in Internet of Vehicles. IEEE Trans. Veh. Technol. 2023, 73, 6968–6979.
11. Muñoz, J.; López, B.; Quevedo, F.; Monje, C.A.; Garrido, S.; Moreno, L.E. Multi UAV Coverage Path Planning in Urban Environments. Sensors 2021, 21, 7365.
12. He, X.; Li, T.; Jin, R.; Dai, H. Delay-Optimal Coded Offloading for Distributed Edge Computing in Fading Environments. IEEE Trans. Wirel. Commun. 2022, 21, 10796–10808.
13. Chen, Z.; Yi, W.; Shin, H.; Nallanathan, A.; Li, G.Y. Efficient Wireless Federated Learning with Partial Model Aggregation. IEEE Trans. Commun. 2024, 72, 6271–6286.
14. Sun, C.; Fontanesi, G.; Canberk, B.; Mohajerzadeh, A.; Chatzinotas, S.; Grace, D.; Ahmadi, H. Advancing UAV Communications: A Comprehensive Survey of Cutting-Edge Machine Learning Techniques. IEEE Open J. Veh. Technol. 2024, 5, 825–854.
15. Gu, X.; Zhang, G. A survey on UAV-assisted wireless communications: Recent advances and future trends. Comput. Commun. 2023, 208, 44–78.
16. Ng, J.S.; Lim, W.Y.B.; Dai, H.N.; Xiong, Z.; Huang, J.; Niyato, D.; Hua, X.S.; Leung, C.; Miao, C. Joint Auction-Coalition Formation Framework for Communication-Efficient Federated Learning in UAV-Enabled Internet of Vehicles. IEEE Trans. Intell. Transp. Syst. 2021, 22, 2326–2344.
17. Li, Z.; Lu, J.; Luo, S.; Zhu, D.; Shao, Y.; Li, Y.; Zhang, Z.; Wang, Y.; Wu, C. Towards Effective Clustered Federated Learning: A Peer-to-peer Framework with Adaptive Neighbor Matching. IEEE Trans. Big Data 2024, 10, 812–826.
18. Tang, Q.; Yang, Y.; Yang, H.; Cao, D.; Yang, K. Energy Consumption Minimization for Hybrid Federated Learning and Offloadable Tasks in UAV-Enabled WPCN. IEEE Trans. Netw. Sci. Eng. 2024, 11, 4639–4650.
19. Duan, Q.; Huang, J.; Hu, S.; Deng, R.; Lu, Z.; Yu, S. Combining Federated Learning and Edge Computing Toward Ubiquitous Intelligence in 6G Network: Challenges, Recent Advances, and Future Directions. IEEE Commun. Surv. Tutor. 2023, 25, 2892–2950.
20. Nasr-Azadani, M.; Abouei, J.; Plataniotis, K.N. Distillation and Ordinary Federated Learning Actor-Critic Algorithms in Heterogeneous UAV-Aided Networks. IEEE Access 2023, 11, 44205–44220.
21. Li, X.C.; Song, S.; Li, Y.; Li, B.; Shao, Y.; Yang, Y.; Zhan, D.C. MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes. IEEE Trans. Knowl. Data Eng. 2024, 36, 6560–6573.
22. Xu, X.; Feng, G.; Qin, S.; Liu, Y.; Sun, Y. Joint UAV Deployment and Resource Allocation: A Personalized Federated Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2024, 73, 1–14.
23. Pakrooh, R.; Bohlooli, A. A Survey on Unmanned Aerial Vehicles-Assisted Internet of Things: A Service-Oriented Classification. Wirel. Pers. Commun. 2021, 119, 1541–1575.
24. Le, N.P.; Huang, X.; Dutkiewicz, E.; Ritz, C.; Phung, S.L.; Bouzerdoum, A.; Franklin, D.; Hanzo, L. Energy-Harvesting Aided Unmanned Aerial Vehicles for Reliable Ground User Localization and Communications Under Lognormal-Nakagami-m Fading Channels. IEEE Trans. Veh. Technol. 2021, 70, 1632–1647.
25. Kang, B.; Yang, J.; Paek, J.; Bahk, S. ATOMIC: Adaptive Transmission Power and Message Interval Control for C-V2X Mode 4. IEEE Access 2021, 9, 12309–12321.
26. Gupta, A.; Fernando, X. Federated Reinforcement Learning for Collaborative Intelligence in UAV-assisted C-V2X Communications. Drones 2024, 8, 321.
27. Gupta, A.; Fernando, X. Analysis of Unmanned Aerial Vehicle-Assisted Cellular Vehicle-to-Everything Communication Using Markovian Game in a Federated Learning Environment. Drones 2024, 8, 238.
28. Gupta, A.; Fernando, X. Latency Analysis of Drone-Assisted C-V2X Communications for Basic Safety and Co-Operative Perception Messages. Drones 2024, 8, 600.
29. Wei, W.; Gu, H.; Li, B. Congestion Control: A Renaissance with Machine Learning. IEEE Netw. 2021, 35, 262–269.
30. Albasry, H.; Ahmed, Q.Z. Network-Assisted D2D Discovery Method by Using Efficient Power Control Strategy. In Proceedings of the 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring), Nanjing, China, 15–18 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–5.
31. Shimizu, T.; Cheng, B.; Lu, H.; Kenney, J. Comparative Analysis of DSRC and LTE-V2X PC5 Mode 4 with SAE Congestion Control. In Proceedings of the 2020 IEEE Vehicular Networking Conference (VNC), New York, NY, USA, 16–18 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–8.
32. Gupta, A.; Fernando, X. Simultaneous Localization and Mapping (SLAM) and Data Fusion in Unmanned Aerial Vehicles: Recent Advances and Challenges. Drones 2022, 6, 85.
33. Pasternak, G.; Pasternak, K.; Koda, E.; Ogrodnik, P. Unmanned Aerial Vehicle Photogrammetry for Monitoring the Geometric Changes of Reclaimed Landfills. Sensors 2024, 24, 7247.
34. Kandregula, V.R.; Zaharis, Z.D.; Ahmed, Q.Z.; Khan, F.A.; Loh, T.H.; Schreiber, J.; Serres, A.J.R.; Lazaridis, P.I. A Review of Unmanned Aerial Vehicle Based Antenna and Propagation Measurements. Sensors 2024, 24, 7395.
35. Xing, Z.; Qin, Y.; Du, C.; Wang, W.; Zhang, Z. Deep Reinforcement Learning-Driven Jamming-Enhanced Secure Unmanned Aerial Vehicle Communications. Sensors 2024, 24, 7328.
36. Ponte, S.; Ariante, G.; Greco, A.; Del Core, G. Differential Positioning with Bluetooth Low Energy (BLE) Beacons for UAS Indoor Operations: Analysis and Results. Sensors 2024, 24, 7170.
37. Luong, P.; Gagnon, F.; Tran, L.N.; Labeau, F. Deep Reinforcement Learning-Based Resource Allocation in Cooperative UAV-Assisted Wireless Networks. IEEE Trans. Wirel. Commun. 2021, 20, 7610–7625.
38. Xu, Y.; Zhu, K.; Xu, H.; Ji, J. Deep Reinforcement Learning for Multi-Objective Resource Allocation in Multi-Platoon Cooperative Vehicular Networks. IEEE Trans. Wirel. Commun. 2023, 22, 6185–6198.
39. Xie, J.; Chang, Z.; Guo, X.; Hamalainen, T. Energy Efficient Resource Allocation for Wireless Powered UAV Wireless Communication System with Short Packet. IEEE Trans. Green Commun. Netw. 2023, 7, 101–113.
40. Zheng, H.; Atia, M.; Yanikomeroglu, H. A Positioning System in an Urban Vertical Heterogeneous Network (VHetNet). IEEE J. Radio Freq. Identif. 2023, 7, 352–363.
41. Qin, P.; Wu, X.; Cai, Z.; Zhao, X.; Fu, Y.; Wang, M.; Geng, S. Joint Trajectory Plan and Resource Allocation for UAV-Enabled C-NOMA in Air-Ground Integrated 6G Heterogeneous Network. IEEE Trans. Netw. Sci. Eng. 2023, 10, 3421–3434.
42. Liu, Z.; Qi, J.; Shen, Y.; Ma, K.; Guan, X. Maximizing Energy Efficiency in UAV-Assisted NOMA-MEC Networks. IEEE Internet Things J. 2023, 10, 22208–22222.
43. Zhang, M.; Xiong, Y.; Ng, S.X.; El-Hajjar, M. Content-Aware Transmission in UAV-Assisted Multicast Communication. IEEE Trans. Wirel. Commun. 2023, 22, 7144–7157.
44. Yang, D.; Wu, Q.; Zeng, Y.; Zhang, R. Energy Tradeoff in Ground-to-UAV Communication via Trajectory Design. IEEE Trans. Veh. Technol. 2018, 67, 6721–6726.
45. Bithas, P.S.; Nikolaidis, V.; Kanatas, A.G.; Karagiannidis, G.K. UAV-to-Ground Communications: Channel Modeling and UAV Selection. IEEE Trans. Commun. 2020, 68, 5135–5144.
46. Hua, B.; Ni, H.; Zhu, Q.; Wang, C.X.; Zhou, T.; Mao, K.; Bao, J.; Zhang, X. Channel Modeling for UAV-to-Ground Communications with Posture Variation and Fuselage Scattering Effect. IEEE Trans. Commun. 2023, 71, 3103–3116.
47. Park, H.; Lim, Y. Deep Reinforcement Learning Based Resource Allocation with Radio Remote Head Grouping and Vehicle Clustering in 5G Vehicular Networks. Electronics 2021, 10, 3015.
48. Liu, X.; Yang, Y.; Gong, J.; Xia, N.; Guo, J.; Peng, M. Amplitude Barycenter Calibration of Delay-Doppler Spectrum for OTFS Signal—An Endeavor to Integrated Sensing and Communication Waveform Design. IEEE Trans. Wirel. Commun. 2023, 23, 2622–2637.
49. Xia, X.; Xu, K.; Wang, Y.; Xu, Y.; Xie, W. Achieving Better Accuracy with Less Computations: A Delay-Doppler Spectrum Matching Assisted Active Sensing Framework for OTFS Based ISAC Systems. IEEE Trans. Wirel. Commun. 2023, 23, 6204–6220.
50. Stefanovic, C.; Panic, S.; Bhatia, V.; Kumar, N. On Second-Order Statistics of the Composite Channel Models for UAV-to-Ground Communications with UAV Selection. IEEE Open J. Commun. Soc. 2021, 2, 534–544.
51. Qu, G.; Xie, A.; Liu, S.; Zhou, J.; Sheng, Z. Reliable Data Transmission Scheduling for UAV-Assisted Air-to-Ground Communications. IEEE Trans. Veh. Technol. 2023, 72, 13787–13792.
52. Li, Z.; Giorgetti, A.; Kandeepan, S. Multiple Radio Transmitter Localization via UAV-Based Mapping. IEEE Trans. Veh. Technol. 2021, 70, 8811–8822.
53. Al-Quraan, M.; Mohjazi, L.; Bariah, L.; Centeno, A.; Zoha, A.; Arshad, K.; Assaleh, K.; Muhaidat, S.; Debbah, M.; Imran, M.A. Edge-Native Intelligence for 6G Communications Driven by Federated Learning: A Survey of Trends and Challenges. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 957–979.
54. Feng, J.; Liu, L.; Pei, Q.; Li, K. Min-Max Cost Optimization for Efficient Hierarchical Federated Learning in Wireless Edge Networks. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 2687–2700.
55. Hu, Z.; Shaloudegi, K.; Zhang, G.; Yu, Y. Federated Learning Meets Multi-Objective Optimization. IEEE Trans. Netw. Sci. Eng. 2022, 9, 2039–2051.
56. Taik, A.; Mlika, Z.; Cherkaoui, S. Clustered Vehicular Federated Learning: Process and Optimization. IEEE Trans. Intell. Transp. Syst. 2022, 23, 25371–25383.
57. Deng, D.; Wang, C.; Wang, W. Joint Air-to-Ground Scheduling in UAV-Aided Vehicular Communication: A DRL Approach with Partial Observations. IEEE Commun. Lett. 2022, 26, 1628–1632.
58. Shen, S.; Yang, K.; Wang, K.; Zhang, G. UAV-Aided Vehicular Short-Packet Communication and Edge Computing System Under Time-Varying Channel. IEEE Trans. Veh. Technol. 2023, 72, 6625–6638.
59. Almutairi, J.; Aldossary, M.; Alharbi, H.A.; Yosuf, B.A.; Elmirghani, J.M.H. Delay-Optimal Task Offloading for UAV-Enabled Edge-Cloud Computing Systems. IEEE Access 2022, 10, 51575–51586.
60. Hosseini, M.; Ghazizadeh, R. Stackelberg Game-Based Deployment Design and Radio Resource Allocation in Coordinated UAVs-Assisted Vehicular Communication Networks. IEEE Trans. Veh. Technol. 2023, 72, 1196–1210.
61. Khazali, A.; Bozorgchenani, A.; Tarchi, D.; Shayesteh, M.G.; Kalbkhani, H. Joint Task Assignment, Power Allocation and Node Grouping for Cooperative Computing in NOMA-mmWave Mobile Edge Computing. IEEE Access 2023, 11, 93664–93678.
62. Shinde, S.S.; Tarchi, D. Joint Air-Ground Distributed Federated Learning for Intelligent Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2023, 24, 9996–10011.
63. Hu, C.; Qu, G.; Shin, H.S.; Tsourdos, A. Distributed synchronous cooperative tracking algorithm for ground moving target in urban by UAVs. Int. J. Syst. Sci. 2021, 52, 832–847.
64. Shinde, S.S.; Tarchi, D. A Markov Decision Process Solution for Energy-Saving Network Selection and Computation Offloading in Vehicular Networks. IEEE Trans. Veh. Technol. 2023, 72, 12031–12046.
65. Kumar, A.S.; Zhao, L.; Fernando, X. Multi-Agent Deep Reinforcement Learning-Empowered Channel Allocation in Vehicular Networks. IEEE Trans. Veh. Technol. 2022, 71, 1726–1736.
66. Kumar, A.S.; Zhao, L.; Fernando, X. Mobility Aware Channel Allocation for 5G Vehicular Networks using Multi-Agent Reinforcement Learning. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Virtual, 14–23 June 2021; pp. 1–6.
67. Kumar, A.S.; Zhao, L.; Fernando, X. Task Offloading and Resource Allocation in Vehicular Networks: A Lyapunov-based Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2023, 72, 13360–13373.
68. Liu, Z.; Huang, G.; Zhong, Q.; Zheng, H.; Zhao, S. UAV-Aided Vehicular Communication Design with Vehicle Trajectory’s Prediction. IEEE Wirel. Commun. Lett. 2021, 10, 1212–1216.
69. Maeng, S.J.; Ozdemir, O.; Guvenc, I.; Sichitiu, M.L.; Mushi, M.; Dutta, R. LTE I/Q Data Set for UAV Propagation Modeling, Communication, and Navigation Research. IEEE Commun. Mag. 2023, 61, 90–96.
70. Li, Y.; Ma, D.; An, Z.; Wang, Z.; Zhong, Y.; Chen, S.; Feng, C. V2X-Sim: Multi-Agent Collaborative Perception Dataset and Benchmark for Autonomous Driving. IEEE Robot. Autom. Lett. 2022, 7, 10914–10921.
71. Roshdi, M.; Bhadauria, S.; Hassan, K.; Fischer, G. Deep Reinforcement Learning based Congestion Control for V2X Communication. In Proceedings of the 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Virtual, 13–16 September 2021; pp. 1–6.
72. Chen, M.; Poor, H.V.; Saad, W.; Cui, S. Convergence Time Optimization for Federated Learning Over Wireless Networks. IEEE Trans. Wirel. Commun. 2021, 20, 2457–2471.
73. Samarakoon, S.; Bennis, M.; Saad, W.; Debbah, M. Distributed Federated Learning for Ultra-Reliable Low-Latency Vehicular Communications. IEEE Trans. Commun. 2020, 68, 1146–1159.
74. Jayanetti, A.; Halgamuge, S.; Buyya, R. Deep reinforcement learning for energy and time optimized scheduling of precedence-constrained tasks in edge–cloud computing environments. Future Gener. Comput. Syst. 2022, 137, 14–30.
75. Gyawali, S.; Qian, Y.; Hu, R. Deep Reinforcement Learning Based Dynamic Reputation Policy in 5G Based Vehicular Communication Networks. IEEE Trans. Veh. Technol. 2021, 70, 6136–6146.
76. Sial, M.N.; Deng, Y.; Ahmed, J.; Nallanathan, A.; Dohler, M. Stochastic Geometry Modeling of Cellular V2X Communication Over Shared Channels. IEEE Trans. Veh. Technol. 2019, 68, 11873–11887.

Figure 1. A brief timeline depicting the amalgamation of wireless communication technologies with transportation systems. Also illustrated is the gradual integration of UAVs in vehicular networks in 5G and 6G wireless communication paradigms. A detailed timeline and comprehensive overview of the recent and evolving applications of machine learning techniques in UAV communication frameworks can be found in [14,15].

Figure 2. System Model: Delay is accumulated as vehicles in different clusters generate and transmit local models to the UAV. The UAV transmits the global model to the vehicles. Note, each vehicle captures a different kind of data packet, leading to non-i.i.d. and heterogeneous data.

Figure 3. An illustration of the proposed federated reinforcement learning-based solution approach for UAV trajectory control and power optimization for low-latency C-V2X communications.

Figure 4. UAV trajectory varies in a random manner, and the vehicles capture varying sensor data at different TTIs. By processing the sensor data, local models are generated at the vehicles and a global model is generated at the UAV.

Figure 5. UAV trajectory and vehicle coverage depending on UAV transmit power ($P_{i}(t)$) and altitude ($H$). The shaded triangular region ($P_{i}(t)$) indicates the coverage range of the UAV when the UAV is at a specific altitude ($H$).

Figure 6. Variation in average cost function (UAV energy and latency) with number of vehicles (V).

Figure 7. Variation in queuing delay ($D_{que}$) in FL scenario with time slots.

Figure 8. Total delay ($D$) vs. number of vehicles (V) for different machine learning models.

Figure 9. Variation in average packet drop rate with control parameter ($\varrho$) using fed-DDPG.

Figure 10. Variation in average UAV energy with number of vehicles (V) for different machine learning models.

Figure 11. Variation in FL computation rate (Mbits/s) with control parameter ($\varrho$) for different machine learning models.

Figure 12. Probability of optimal trajectory prediction for fed-DDPG (using LSTM) vs. UAV altitude ($H$) for varying number of vehicles (V) over trials of 250 episodes.

Figure 13. Probability of optimal trajectory prediction for actor–critic (using LSTM) vs. UAV altitude ($H$) for varying number of vehicles (V) over trials of 500 episodes.
