Article

HAP-Assisted RSMA-Enabled Vehicular Edge Computing: A DRL-Based Optimization Framework

Department of Computer Science and Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(10), 2376; https://doi.org/10.3390/math11102376
Submission received: 20 April 2023 / Revised: 17 May 2023 / Accepted: 17 May 2023 / Published: 19 May 2023

Abstract:
In recent years, the demand for vehicular edge computing (VEC) has grown rapidly due to the increasing need for low-latency, high-throughput applications such as autonomous driving and smart transportation systems. Nevertheless, providing VEC services in rural areas remains difficult owing to the lack of network infrastructure. We tackle this issue by taking advantage of high-altitude platforms (HAPs) and rate-splitting multiple access (RSMA) techniques to propose an HAP-assisted RSMA-enabled VEC system, which can enhance connectivity and provide computational capacity in rural areas. We also introduce a deep deterministic policy gradient (DDPG)-based framework that optimizes the allocation of resources and task offloading by jointly considering the offloading rate, splitting rate, transmission power, and decoding order parameters. Results from extensive simulations show that the proposed framework achieves superior performance compared with conventional schemes in terms of task success rate and energy consumption.

1. Introduction

The increasing demand for applications that require fast response times and high data transfer rates has led to the emergence of multi-access edge computing (MEC). In the context of vehicular networks, MEC is referred to as VEC, which leverages the computational capabilities available at the network’s edge to process data generated by connected vehicles [1,2,3]. Despite its advantages, offering VEC services in rural locations remains a challenge due to limited network coverage and infrastructure. Moreover, terrestrial communication infrastructure can be disrupted by natural disasters such as earthquakes and floods. To address these issues, upcoming sixth-generation (6G) communications will emphasize the utilization of non-terrestrial networks, also known as aerial access networks, which consist of unmanned aerial vehicles (UAVs), HAPs, and satellites [4,5]. Non-terrestrial networks offer flexibility and wide coverage, enabling them to provide fast connectivity and meet the high computational demands of the Internet of Things (IoT). In Release 18 of the 3rd Generation Partnership Project [6], HAPs have been introduced as new uncrewed aircraft systems that operate at a height between 17 and 22 km in the stratosphere. This altitude is above the commercial aeroplane level and below that of satellites [7]. HAPs can provide a wide network connection and improve the quality of service (QoS) owing to strong line-of-sight (LoS) communications. This feature is particularly useful in scenarios where terrestrial networks are unavailable, such as in rural areas or during emergencies. Recent studies have investigated the potential of HAPs in MEC networks, focusing on resource allocation and computation offloading in regions without terrestrial access network coverage [8,9,10]. However, the extent to which HAPs can benefit VEC remains uncertain.
Moreover, the efficacy of communication networks heavily depends on the multiple access techniques employed. Non-orthogonal multiple access (NOMA) is a technique that divides users in the power domain, making it possible to simultaneously serve multiple users in a single frequency or time resource [11]. However, NOMA may not be the most efficient approach for multi-user systems with a single antenna. To address this limitation, RSMA enables flexible rate splitting of each user's message into different proportions while controlling the interference and treating it as noise through optimal decoding [12,13,14,15]. This configurable split of messages according to priority allows for better power control and spectral efficiency in both uplink and downlink transmission, making it more effective than existing techniques. Adopting RSMA can improve spectral efficiency and reliability in VEC systems. In addressing task offloading problems, RSMA can be utilized for uplink transmission of tasks from devices to computing nodes by dividing the transmitted messages and decoding them sequentially at the receiving node with successive interference cancellation (SIC). This makes RSMA an appealing prospect for integration with HAPs in VEC networks.
Meanwhile, deep reinforcement learning (DRL) [16,17] has proven to be advantageous in solving decision-making problems in communication networks [18,19], by utilizing exploratory learning through trial and feedback mechanisms to learn optimal policies. One popular DRL algorithm is Deep Q-Network (DQN) [16], which uses a deep neural network (DNN) to approximate the optimal action-value function. The action-value function determines the expected cumulative reward for taking a particular action in a given state. However, DQN has its limitations in handling continuous action spaces, which can lead to inaccurate approximations of the action-value function. Additionally, DQN requires a large amount of training data to achieve good performance. The DDPG [17] algorithm can address these limitations as it uses an actor–critic framework where the actor learns the optimal policy, while the critic learns the optimal action-value function. DDPG can handle high-dimensional continuous action spaces and performs well with smaller amounts of training data compared to DQN.

1.1. Related Work

Task offloading and resource allocation are critical factors that impact the overall performance of MEC-enabled aerial networks and have been extensively studied recently. For instance, the work in [20] proposed a UAV-aided MEC system that delivers computing services to IoT devices. A joint problem of task offloading and resource allocation was investigated to reduce service latency and UAV energy usage. Recently, DRL has been recognized as a powerful technique for solving optimization problems in aerial networks. In [21], the authors examined a VEC network where traffic congestion occurred or roadside units (RSUs) were outside the range of communication. To address this issue, a UAV was temporarily employed as an edge computing and relay node to help vehicles obtain traffic data from other vehicles. A channel allocation and task-processing strategy based on DQN was proposed to optimize the data transmission policy. In [22], the authors employed a UAV-mounted intelligent reflecting surface to reflect signals from terrestrial base stations to mobile devices situated in underprivileged regions where direct communication was not possible. To enhance the overall system sum rate, they developed a DDPG-based optimization framework that took into account the trajectory of the UAV and the phase shift of the intelligent reflecting surface. In [23], a DDPG-based optimization strategy was developed to maximize the system sum rate in a downlink RSMA-enabled UAV-assisted network.
While UAVs offer cost-effective and rapid deployment, their limited energy results in short endurance. Additionally, their coverage area is limited to only a few hundred metres, and they are prone to unpredictable interference, making it difficult to provide continuous wireless coverage in the long run. In comparison, HAPs have the advantage of being able to stay aloft for extended periods, cover larger areas, carry heavier payloads, and operate at higher altitudes with greater stability. This makes them better suited for certain applications when compared to UAVs. HAPs can provide computing services for vehicle users or IoT devices in rural areas or when base stations are not functioning properly [2,24,25]. They can also support UAVs in data collection [9] and wireless energy charging [26]. In [1], the authors investigated a vehicular network enabled by MEC and assisted by aerial-terrestrial connectivity using HAPs and satellites. To address the issue of managing resources and task offloading, they proposed a decentralized value-iteration-based reinforcement learning scheme. In [27], the authors studied a smart transportation system that consists of three computation layers: onboard vehicle devices, RSUs, and HAPs. The main goal was to minimize delay by optimizing decisions related to computing, caching, bandwidth, and computing resource allocation. They developed an optimized framework based on an improved version of DQN. In [8], the authors designed a heterogeneous aerial network that included one HAP and some UAVs. They aimed to address the challenge of providing computing services in hard-to-reach areas without terrestrial infrastructure. To maximize IoT device satisfaction and minimize total energy consumption, the authors tackled a joint problem involving device association, task offloading, and allocation of communication resources, which was addressed by a multi-agent DDPG scheme. In [10], UAVs gathered tasks from IoT devices and offloaded some of them to the HAP. To optimize the offloading policies, a multi-agent proximal policy optimization strategy was developed. However, as the HAP did not receive any tasks directly from IoT devices, it was not feasible to serve IoT devices situated beyond the reach of the UAVs. More recent studies on DRL-based computation offloading techniques in aerial access networks can be found in [19]. Regarding the RSMA technique, interested readers are referred to recent studies and surveys [12,13,14,15].

1.2. Motivations and Contributions

Although significant progress has been made in computation offloading, there are still some challenges that need to be addressed:
(1) Optimization objective: Task offloading problems typically aim to minimize task completion delay [1,9,20,25,27]. However, this objective does not guarantee that tasks will be fully processed within the desired time frame. To address this, a more reasonable objective would be to maximize the number of tasks completed within QoS constraints, also known as the task success rate [8,10].
(2) Offloading scheme: Related studies on task offloading [1,27] use a binary offloading model where tasks are either locally processed or offloaded to edge nodes. However, this approach fails to effectively coordinate offloading between users and edge nodes, leading to low task success rates and high energy consumption. To ensure greater flexibility, partial offloading schemes should be explored.
(3) Multiple access scheme: Many related works have not considered advanced multiple access techniques. RSMA can achieve better performance than NOMA and other techniques [12,13]. Therefore, incorporating RSMA into HAP-assisted VEC systems while optimizing the decoding order for SIC at the HAP could improve system performance.
(4) Problem-solving method: DRL has demonstrated its effectiveness in solving various problems [1,8,9,10,22,23,27]. However, applying DRL (i.e., DDPG) to a specific problem is not always straightforward, particularly in dynamic VEC systems with time-varying communication channel states. In general, there is currently a lack of studies that consider the unique features of HAPs and RSMA in dynamic VEC systems with DRL-based optimization approaches. To address this gap, we investigate an HAP-aided RSMA-enabled VEC system with a DDPG-based offloading and resource allocation framework. Our work makes the following key contributions:
  • We introduce an HAP-assisted RSMA-enabled vehicular edge network in remote areas where a conventional ground-based communication infrastructure is unavailable. This network allows multiple vehicle users with restricted computational capacity to transfer their tasks to the HAP for computation while the RSMA ensures improved transmission rates. With the objective of maximizing task success rate while minimizing energy usage, we formulate a joint optimization problem involving partial task offloading and resource allocation variables such as task offloading rate, RSMA splitting rate, the transmission power of vehicle users, and the decoding order of signals at the HAP.
  • We transform the non-convex mixed-integer problem into a Markov decision process (MDP) model due to its complexity. To achieve this, we replace discrete variables with normalized continuous ones. However, the high-dimensional continuous state-action space and time-varying channels in the system pose a challenge. To navigate this complexity, we propose a DDPG-based optimization framework that maximizes the long-term expected cumulative reward to enhance task success rate and energy efficiency. Since conventional DDPG algorithms may generate actions that violate constraints on the allowed ranges of, and correlations between, the optimization variables, we present an action correlation function to satisfy the correlation constraints and use a sigmoid function to keep the action variables within their desired ranges while exploring different actions. These enhancements enable the effective application of the DDPG algorithm to solve the problem.
  • We analyse the efficacy of the proposed DDPG-based optimization framework using extensive simulations. The simulation results demonstrate that the DDPG-based scheme for dynamic task offloading and resource allocation outperforms benchmark schemes, including DQN, local search, full offloading, and random-based methods, across different scenarios.
The rest of the paper is structured in the following manner. Section 2 presents the system model and problem formulation. The DDPG-driven task offloading optimization framework is outlined in Section 3. Section 4 provides the convergence analysis and performance comparisons. Lastly, Section 5 concludes the work and outlines future work.

2. System Model and Problem Formulation

In this section, we present the system model, communication model, computation model, and computing overhead, followed by the presentation of the problem formulation. To enhance the readability, Table 1 displays the notations used in this section.

2.1. System Model

We consider an HAP-aided RSMA-enabled edge network in rural areas (Figure 1), in which an on-ground base station or RSU is unavailable. The network includes an HAP serving as an aerial computing server and a group of vehicle users (VUs) $k \in \mathcal{K} = \{1, \ldots, K\}$ within the range of the HAP network coverage. The time interval is partitioned into $T$ discrete time slots, with $t \in \mathcal{T} = \{1, \ldots, T\}$ representing the set of these time slots. The duration of each time slot is considered small enough to ensure that the network state and channel conditions do not change within the time slot [1]. At each time slot $t$, the position of VU $k$ is defined as $(x_k(t), y_k(t), 0), \forall k$. The HAP is positioned in the stratosphere at specified coordinates $(x_H, y_H, z_H)$ above the area of interest. Thus, at time step $t$, the distance $d_k(t)$ from VU $k$ to the HAP can be estimated by
$$ d_k(t) = \sqrt{(x_H - x_k(t))^2 + (y_H - y_k(t))^2 + z_H^2}. $$
In this scenario, for each time slot $t$, each VU $k$ randomly generates a computational task $\tau_k(t) = (w_k(t), c_k(t))$. Here, $w_k(t)$ and $c_k(t)$ refer to the size of data (bits) and the required number of CPU cycles to process one data bit (CPU cycles/bit), respectively. Due to the limitations in computing capacity, each VU may not have the capability to perform all tasks independently within the given time limit of $t_{\max}$. Hence, the VUs need to transfer their computational tasks to the HAP, which has more computing resources and processing power. To ensure generality, we employ the partial offloading model. It allows tasks to be divided into non-overlapping segments that can be processed simultaneously through local computing at the VU and offloading execution at the HAP, thus providing more flexibility in dealing with computation offloading and resource management problems than the binary offloading model [8,19]. Hence, an offloading rate $o_k(t) \in [0, 1]$ for the task $\tau_k(t)$ is defined to indicate the proportion of the task that will be offloaded to the HAP for processing. The remaining portion of the task, $(1 - o_k(t))$, can be executed on the VU.
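To make the per-slot model concrete, the following Python sketch generates one task and evaluates the VU–HAP distance defined above; the dataclass, field names, and sampled ranges are illustrative assumptions rather than values prescribed by the system model.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Task:
    """Computational task tau_k(t) = (w_k(t), c_k(t))."""
    w: float  # data size in bits
    c: float  # CPU cycles required per bit

def distance_to_hap(x_k: float, y_k: float, hap_pos: tuple) -> float:
    """Distance d_k(t) between VU k at (x_k, y_k, 0) and the HAP at (x_H, y_H, z_H)."""
    x_h, y_h, z_h = hap_pos
    return math.sqrt((x_h - x_k) ** 2 + (y_h - y_k) ** 2 + z_h ** 2)

# Illustrative usage (assumed ranges): one VU, one time slot.
hap_pos = (500.0, 500.0, 20_000.0)            # HAP above the area centre, 20 km altitude
task = Task(w=random.uniform(1.0e6, 1.4e6),   # assumed task-size range in bits
            c=random.uniform(500.0, 1500.0))  # assumed cycles-per-bit range
d_k = distance_to_hap(120.0, 340.0, hap_pos)
```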

2.2. Communication Model

The VUs offload partial tasks to the HAP via uplink RSMA. The channel between the HAP and VUs follows a strong LoS model due to the high altitude of the HAP. Using the model of free space path loss [20], the channel gain between VU k and the HAP can be estimated as
$$ g_k(t) = g_0 \, d_k(t)^{-\alpha}, $$
where $g_0$ corresponds to the channel gain at a reference distance of 1 m and $\alpha$ is the path loss exponent. The RSMA technique is used to improve spectrum efficiency in the communications between the VUs and the HAP. Each VU splits its transmitted signal into multiple sub-signals, which are sent simultaneously in the same time and frequency slot to the HAP. Without loss of generality, each transmitted signal is supposed to be split into two sub-signals [9,12,13]. We denote $s_k(t)$ as the transmitted signal from VU $k$ and $\{s_k^m(t) \mid m \in \{1, 2\}\}$ as the set of sub-signals for VU $k$ such that $\mathbb{E}\big[|s_k^m|^2\big] = 1$. At time step $t$, the transmitted signal $s_k(t)$ is represented as
$$ s_k(t) = \sum_{m=1}^{2} \sqrt{p_k^m(t)}\, s_k^m(t), \quad \forall k \in \mathcal{K}, $$
where $p_k^m(t) \geq 0$ represents the transmission power of sub-signal $s_k^m(t)$ during time step $t$. At the HAP, the overall received signal $s_H(t)$ at time slot $t$ is given by
$$ s_H(t) = \sum_{k=1}^{K} \sqrt{g_k(t)}\, s_k(t) + n_H = \sum_{k=1}^{K} \sum_{m=1}^{2} \sqrt{g_k(t)\, p_k^m(t)}\, s_k^m(t) + n_H, $$
where $n_H$ denotes a white Gaussian noise with power spectral density $\sigma^2$. The total transmission signal power of VU $k$ at each time slot cannot exceed the maximum transmission power $p_{\max}$, expressed by
$$ \sum_{m=1}^{2} p_k^m(t) \leq p_{\max}, \quad \forall k \in \mathcal{K}. $$
In terms of task offloading, the offloaded part $o_k(t)$ of task $\tau_k(t)$ is split into two sub-offloaded tasks, which are independently encoded into the two sub-signals of VU $k$. Hence, we define a set of splitting rate variables $\{\delta_k^m(t) \in [0, 1] \mid m \in \{1, 2\}\}$ for VU $k$, in which $\sum_{m=1}^{2} \delta_k^m(t) = 1$. At time slot $t$, the sub-offloaded task $\tau_k^m(t)$ corresponding to sub-signal $s_k^m(t)$ of VU $k$ is defined as
$$ \tau_k^m(t) = (w_k^m(t), c_k(t)), \quad \forall m \in \{1, 2\}, $$
where $w_k^m(t) = \delta_k^m(t)\, o_k(t)\, w_k(t)$ (bits) indicates the size of sub-offloaded task $\tau_k^m(t)$.
At the HAP, the SIC method is utilized for decoding each sub-signal $s_k^m(t)$ from the overall received signal $s_H(t)$. A permutation $\boldsymbol{\pi}$ is used to determine the order in which the decoding occurs. During time slot $t$, the collection of all feasible decoding orders for the $2K$ sub-signals $\{s_k^m(t) \mid k \in \mathcal{K},\, m \in \{1, 2\}\}$ across the $K$ VUs is denoted by $\Pi(t)$, and the decoding order is represented by $\boldsymbol{\pi}(t) \in \Pi(t)$. The decoding order for a specific sub-signal $s_k^m(t)$ is denoted by $\pi_k^m(t) \in \{1, \ldots, 2K\}$. If $\pi_{k'}^{m'} > \pi_k^m$, where $k' \in \mathcal{K}$ and $m' \in \{1, 2\}$, then sub-signal $s_{k'}^{m'}$ is decoded after $s_k^m$. Then, the uplink transmission rate for $s_k^m(t)$ is calculated by
$$ r_k^m(t) = W \log_2\!\left(1 + \frac{g_k(t)\, p_k^m(t)}{\sum_{\pi_{k'}^{m'}(t) > \pi_k^m(t)} g_{k'}(t)\, p_{k'}^{m'}(t) + \sigma^2}\right), $$
where W is the HAP’s available communication bandwidth.
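As a sanity check on the uplink rate expression above, the short Python sketch below computes the per-sub-signal rates for a given SIC decoding order; the data layout (dictionaries keyed by (k, m) pairs) is an assumption made only for illustration.

```python
import math

def rsma_uplink_rates(g, p, order, W, noise_power):
    """Per-sub-signal uplink rates under SIC.

    g[k]      : channel gain of VU k (power gain)
    p[(k, m)] : transmission power of sub-signal m of VU k
    order     : list of (k, m) pairs; the first entry is decoded first
    Returns {(k, m): rate in bit/s}. A sub-signal decoded earlier sees
    interference only from the sub-signals decoded after it, since the
    earlier ones have already been cancelled.
    """
    rates = {}
    for i, (k, m) in enumerate(order):
        interference = sum(g[k2] * p[(k2, m2)] for (k2, m2) in order[i + 1:])
        sinr = g[k] * p[(k, m)] / (interference + noise_power)
        rates[(k, m)] = W * math.log2(1.0 + sinr)
    return rates
```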

2.3. Computation Model

The focus is on the computing cost with respect to the time it takes to process tasks and the energy consumed by the VUs. The transmission of computation results is assumed to be negligible, given that the size of the computational results is considerably smaller than that of the offloaded input [8,9].

2.3.1. Local Computation

We compute the local processing time $T_k^{\mathrm{loc}}(t)$ for the task $\tau_k(t)$ of VU $k$ with a ratio of $(1 - o_k(t))$ and a size of $(1 - o_k(t))\, w_k(t)$ as
$$ T_k^{\mathrm{loc}}(t) = \frac{(1 - o_k(t))\, w_k(t)\, c_k(t)}{F_k}, $$
in which $F_k$ is the computing capacity of VU $k$ (CPU cycles/s). Accordingly, with $\kappa$ representing the energy consumption coefficient of the VUs, the consumed energy of VU $k$ due to local computing is computed by
$$ E_k^{\mathrm{loc}}(t) = (1 - o_k(t))\, w_k(t)\, c_k(t)\, \kappa\, F_k^2. $$

2.3.2. HAP-Assisted Computation

The sub-offloaded task has to be uploaded to the HAP before it can be executed at the HAP. The time required for transmitting the sub-offloaded task $\tau_k^m(t)$ is estimated by
$$ T_{k,m}^{\mathrm{tran}}(t) = \frac{w_k^m(t)}{r_k^m(t)} = \frac{\delta_k^m(t)\, o_k(t)\, w_k(t)}{r_k^m(t)}. $$
Hence, the transmission duration for the offloaded part $o_k(t)$ of task $\tau_k(t)$ can be obtained by summing up the transmission times for all the sub-offloaded tasks as
$$ T_k^{\mathrm{tran}}(t) = \sum_{m=1}^{2} T_{k,m}^{\mathrm{tran}}(t) = \sum_{m=1}^{2} \frac{\delta_k^m(t)\, o_k(t)\, w_k(t)}{r_k^m(t)}. $$
In addition, the execution time of the offloaded part $o_k(t)$ of task $\tau_k(t)$ at the HAP can be estimated by
$$ T_k^{\mathrm{exec}}(t) = \frac{o_k(t)\, w_k(t)\, c_k(t)}{f_k}, $$
where $f_k$ represents the HAP computational resources allocated to VU $k$, which is assumed to be predefined at each time slot (i.e., all offloaded tasks receive an equal share of the HAP computing resources $F_H$) [8]. The overall delay for handling the offloaded portion $o_k(t)$ of task $\tau_k(t)$, including the transmission time $T_k^{\mathrm{tran}}(t)$ and execution time $T_k^{\mathrm{exec}}(t)$, can be expressed as
$$ T_k^{\mathrm{off}}(t) = T_k^{\mathrm{tran}}(t) + T_k^{\mathrm{exec}}(t) = \sum_{m=1}^{2} \frac{\delta_k^m(t)\, o_k(t)\, w_k(t)}{r_k^m(t)} + \frac{o_k(t)\, w_k(t)\, c_k(t)}{f_k} = o_k(t)\, w_k(t) \left( \sum_{m=1}^{2} \frac{\delta_k^m(t)}{r_k^m(t)} + \frac{c_k(t)}{f_k} \right). $$
Regarding the energy consumption of the VUs, the required energy for transmitting the sub-offloaded task $\tau_k^m(t)$ is computed as
$$ E_{k,m}^{\mathrm{tran}}(t) = p_k^m(t)\, T_{k,m}^{\mathrm{tran}}(t) = \frac{p_k^m(t)\, w_k^m(t)}{r_k^m(t)} = \frac{p_k^m(t)\, \delta_k^m(t)\, o_k(t)\, w_k(t)}{r_k^m(t)}. $$
As only the energy consumption of the VUs is taken into account, the total energy consumed by VU $k$ to complete the offloaded portion $o_k(t)$ of task $\tau_k(t)$ is equivalent to the total energy consumed for transmitting all sub-offloaded tasks to the HAP, expressed by
$$ E_k^{\mathrm{off}}(t) = \sum_{m=1}^{2} E_{k,m}^{\mathrm{tran}}(t) = \sum_{m=1}^{2} \frac{p_k^m(t)\, \delta_k^m(t)\, o_k(t)\, w_k(t)}{r_k^m(t)}. $$

2.4. Computing Cost

In the partial offloading model, the delay cost $T_k(t)$ for task $\tau_k(t)$ is decided by taking the maximum of the processing delays of on-device computing, $T_k^{\mathrm{loc}}(t)$, and computation offloading, $T_k^{\mathrm{off}}(t)$. Mathematically, this is expressed as
$$ T_k(t) = \max\{T_k^{\mathrm{loc}}(t),\, T_k^{\mathrm{off}}(t)\}. $$
The total energy consumption $E_k(t)$ for task $\tau_k(t)$ is determined by summing the energy consumed by local computing, $E_k^{\mathrm{loc}}(t)$, and the energy used for task offloading, $E_k^{\mathrm{off}}(t)$. Hence, the energy cost of VU $k$ for processing task $\tau_k(t)$ is given by
$$ E_k(t) = E_k^{\mathrm{loc}}(t) + E_k^{\mathrm{off}}(t). $$
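A minimal Python sketch of the delay and energy costs derived in this section; the arguments follow the notation above, the rates are assumed to come from a routine like the one sketched in Section 2.2, and the guard on zero splitting rates is an implementation convenience.

```python
def task_delay_and_energy(o, delta, p, r, w, c, F_k, f_k, kappa):
    """Delay cost T_k(t) and energy cost E_k(t) of one task under partial offloading.

    o      : offloading rate o_k(t) in [0, 1]
    delta  : splitting rates (delta_k^1, delta_k^2)
    p      : transmission powers (p_k^1, p_k^2) in W
    r      : uplink rates (r_k^1, r_k^2) in bit/s
    w, c   : task size (bits) and CPU cycles per bit
    F_k    : VU computing capacity (cycles/s); f_k: HAP share allocated to VU k
    kappa  : energy consumption coefficient of the VU
    """
    # Local computing delay and energy for the (1 - o) portion.
    T_loc = (1 - o) * w * c / F_k
    E_loc = (1 - o) * w * c * kappa * F_k ** 2

    # Offloading: transmission of both sub-tasks plus execution at the HAP.
    T_tran = sum(d * o * w / r_m for d, r_m in zip(delta, r) if d > 0)
    T_exec = o * w * c / f_k if o > 0 else 0.0
    E_off = sum(p_m * d * o * w / r_m for p_m, d, r_m in zip(p, delta, r) if d > 0)

    T_k = max(T_loc, T_tran + T_exec)
    E_k = E_loc + E_off
    return T_k, E_k
```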

2.5. Problem Formulation

This work aims to investigate the energy consumed by the VUs and the number of completed tasks, with a specific emphasis on meeting the QoS requirements. Hence, to ensure the QoS, the delay of VU $k$ for executing the task $\tau_k(t)$ cannot exceed the delay limit $t_{\max}$, expressed by
$$ T_k(t) \leq t_{\max}, \quad \forall k \in \mathcal{K}. $$
To indicate whether a task's QoS constraint is fulfilled, we utilize a unit step function $H[t_{\max} - T_k(t)]$, which returns 1 if $t_{\max} - T_k(t) \geq 0$ and 0 otherwise. Accordingly, the partial computation offloading problem is formulated to simultaneously maximize the task success rate and minimize the energy consumed by the VUs through jointly optimizing the variables of offloading rate $\{o_k(t) \mid k \in \mathcal{K}\}$, splitting rate $\{\delta_k^m(t) \mid k \in \mathcal{K},\, m \in \{1, 2\}\}$, transmission power $\{p_k^m(t) \mid k \in \mathcal{K},\, m \in \{1, 2\}\}$, and decoding order $\{\pi_k^m(t) \mid k \in \mathcal{K},\, m \in \{1, 2\}\}$ in all time slots. Hence, the optimization problem for partial computation offloading is defined by
$$ (\mathrm{P1}):\; \max_{\substack{o_k(t),\, \delta_k^m(t),\, p_k^m(t),\, \pi_k^m(t) \\ k \in \mathcal{K},\, m \in \{1, 2\}}} \; \sum_{k \in \mathcal{K}} C_k(t), $$
$$ \mathrm{s.t.} \quad o_k(t) \in [0, 1], \quad \forall k \in \mathcal{K}, $$ (19b)
$$ \sum_{m=1}^{2} \delta_k^m(t) = 1, \quad \delta_k^m(t) \in [0, 1], \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, $$ (19c)
$$ \sum_{m=1}^{2} p_k^m(t) \leq p_{\max}, \quad p_k^m(t) \geq 0, \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, $$ (19d)
$$ \boldsymbol{\pi}(t) \in \Pi(t), \quad \pi_k^m(t) \in \{1, \ldots, 2K\}, \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, $$ (19e)
where $C_k(t)$ represents the computing cost function of VU $k$, defined as
$$ C_k(t) = \omega\, H[t_{\max} - T_k(t)] - (1 - \omega)\, E_k(t), $$ (20)
$\omega \in [0, 1]$ indicates the trade-off between the importance of task satisfaction and energy cost, constraint (19b) specifies the value range of the offloading rate, constraint (19c) sets the acceptable range for the splitting rates, constraint (19d) ensures that the transmission powers of the sub-signals are non-negative and that their sum does not exceed the maximum transmission power, and constraint (19e) specifies the range of feasible values for the decoding order.
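For concreteness, a small sketch of evaluating the per-VU cost in (20); `task_delay_and_energy` refers to the sketch in Section 2.4, and the summation over VUs mirrors the objective of (P1).

```python
def per_vu_cost(T_k, E_k, t_max, omega):
    """C_k(t) = omega * H[t_max - T_k(t)] - (1 - omega) * E_k(t)."""
    satisfied = 1.0 if T_k <= t_max else 0.0   # unit step H[t_max - T_k(t)]
    return omega * satisfied - (1 - omega) * E_k

# The slot objective of (P1) is the sum of the per-VU costs:
# objective = sum(per_vu_cost(T_k, E_k, t_max, omega) for (T_k, E_k) in per_vu_results)
```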
The problem (P1) is a non-convex mixed-integer problem and the solution space grows exponentially with the problem scale, making it impractical to solve using traditional optimization techniques. To overcome this challenge, we convert the problem into an MDP model and resort to a DRL-based solution, which is elaborated upon in the following.

3. DDPG-Based Framework for HAP-Assisted RSMA-Enabled VEC

In this section, we first normalize the problem with continuous decision variables. Following this, we transform the normalized optimization problem into an MDP and develop a DDPG-based algorithm to solve it. The proposed algorithm’s computational complexity is also analysed.

3.1. Problem Transformation

Before transforming the problem into an MDP model and solving it using a DDPG-based method, we take some preparatory steps. Initially, we normalize the problem and refine the decision variables so that they are continuous and compatible with the DDPG algorithm. Additionally, we introduce new constraints for the decision variables to help simplify the exploration process of the DDPG algorithm. In particular, there are correlations between the offloading rate, the splitting rates of the sub-offloaded tasks, and the transmission powers of the sub-signals of VU $k$. If the offloading rate is zero, both splitting rate variables for the sub-signals must be zero because no data is being offloaded. Conversely, when a part of the task is offloaded, the splitting rate variables have to be specified accordingly. Hence, the splitting rates are refined so that they obey the following constraints related to the offloading rate variable:
$$ \delta_k^m(t) \in [0, \lceil o_k(t) \rceil], \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, \qquad \sum_{m=1}^{2} \delta_k^m(t) = \lceil o_k(t) \rceil, \quad \forall k \in \mathcal{K}, $$ (21)
where $\lceil \cdot \rceil$ is the least integer function or ceiling function, which produces the minimum integer value that is equal to or greater than the input number. In this context, $\lceil o_k(t) \rceil$ returns 1 when a part of the task is offloaded, and 0 otherwise.
Then, the transmission power of a sub-signal should be allocated to transmit the corresponding sub-offloaded task only if the associated splitting rate variable is specified; otherwise, the transmission power should be zero. The transmission powers $p_k^m(t)$ of the sub-signals follow the constraints below, expressed with the continuous normalized transmission power variables $\tilde{p}_k^m(t) \in [0, 1]$ and the splitting rate variables:
$$ \tilde{p}_k^m(t) \in [0, \lceil \delta_k^m(t) \rceil], \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, \qquad p_k^m(t) = \tilde{p}_k^m(t)\, p_{\max}, \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}, \qquad \sum_{m=1}^{2} \tilde{p}_k^m(t) \leq 1, \quad \forall k \in \mathcal{K}. $$ (22)
We refer to constraints (21) and (22) as correlation constraints, as they represent the relationships between the offloading rate, splitting rate, and continuous transmission power variables. We then replace $\Pi(t)$ with $\Phi(t) = \{\phi_k^m(t) \in [0, 1] \mid k \in \mathcal{K},\, m \in \{1, 2\}\}$, which represents the decoding priorities of all sub-signals at time slot $t$. Here, $\phi_k^m(t) \in [0, 1]$ denotes the decoding priority of sub-signal $s_k^m(t)$. As a result, the uplink transmission rate of sub-signal $s_k^m(t)$ is estimated based on the decoding priorities, sorted in ascending order, as
$$ r_k^m(t) = W \log_2\!\left(1 + \frac{g_k(t)\, p_k^m(t)}{\sum_{\phi_{k'}^{m'}(t) > \phi_k^m(t)} g_{k'}(t)\, p_{k'}^{m'}(t) + \sigma^2}\right). $$
Finally, the original problem (P1) is equivalently transformed with the continuous decision variables as
$$ (\mathrm{P2}):\; \max_{\substack{o_k(t),\, \delta_k^m(t),\, \tilde{p}_k^m(t),\, \phi_k^m(t) \\ k \in \mathcal{K},\, m \in \{1, 2\}}} \; \sum_{k \in \mathcal{K}} C_k(t), \quad \mathrm{s.t.}\; o_k(t) \in [0, 1],\ \forall k \in \mathcal{K};\ (21);\ (22);\ \boldsymbol{\phi}(t) \in \Phi(t),\ \phi_k^m(t) \in [0, 1],\ \forall k \in \mathcal{K},\, m \in \{1, 2\}. $$
We can observe that all optimization variables in problem (P2) are continuous and within [ 0 , 1 ] , making them appropriate for input in the DDPG algorithm. To this end, the problem (P2) is transformed into an MDP model, in which the VEC system is regarded as the environment and the HAP is considered the agent. For every time slot, the agent engages in an interaction with the environment by evaluating the present state s ( t ) , making a decision on the appropriate action a ( t ) , receiving a reward signal r ( t ) as feedback, and transitioning to the next state s ( t + 1 ) . The tuple ( s ( t ) , a ( t ) , r ( t ) , s ( t + 1 ) ) is commonly used to represent the transition, which helps to track the history of changes in the environment. In the following, we specify the state, action, and reward functions.
  • State: A state represents the current state of the environment, which includes all relevant information that can affect the reward at a particular time step. In this work, task profiles (i.e., task size w k ( t ) and required computing resources c k ( t ) ) and communication channels (i.e., channel gains g k ( t ) ) of all VUs are involved in a state s ( t ) at time step t. Formally, the state s ( t ) is defined by
    $$ s(t) = \{g_k(t),\, w_k(t),\, c_k(t)\}, \quad \forall k \in \mathcal{K}. $$
    The dimension of the state space is 3 K .
  • Action: An action refers to the decision made by the agent at a given time step, which aims to maximize the expected cumulative reward, based on the current state. In this paper, the agent can select an action $a(t)$ made up of the offloading rate $o_k(t)$, splitting rate $\delta_k^m(t)$, normalized transmission power $\tilde{p}_k^m(t)$, and decoding priority $\phi_k^m(t)$ parameters. Accordingly, at time step $t$, the action $a(t)$ can be specified as
    $$ a(t) = \{o_k(t),\, \delta_k^m(t),\, \tilde{p}_k^m(t),\, \phi_k^m(t)\}, \quad \forall k \in \mathcal{K},\, m \in \{1, 2\}. $$
    The entire action space has a size of $7K$. To ensure that the action variables, which are perturbed with additive random noise for exploration during training, do not exceed their allowable ranges, we apply a sigmoid function that maps any input value to an output value between 0 and 1, expressed as
    $$ f(x) = \frac{1}{1 + e^{-x}}, $$
    where x represents the input to the function and f ( x ) represents the output. We further process the variables using Algorithm 1 to satisfy the correlation constraints (21) and (22). Specifically, when the offloading rate is set to 0, both the splitting rate and transmission power variables are also set to 0. However, when the offloading rate is non-zero, we implement the softmax function to convert the real values of the splitting rate variables into probabilities that sum up to 1. The same approach is applied to continuous transmission power variables. Additionally, if either the continuous transmission power variables or the splitting rate variables take on a value of 0, then the offloading rate variable must also be 0. By implementing these measures, we can ensure that the resulting action variables remain within the allowable range and satisfy the correlation constraints of the problem.
Algorithm 1 Action correlation function
1: input: action $a(t)$
2: refine $\delta_k^m(t)$ according to $o_k(t)$, $\forall k \in \mathcal{K},\, m \in \{1, 2\}$:
$$ \delta_k^m(t) = \begin{cases} 0, & \text{if } o_k(t) = 0, \\ \dfrac{e^{\delta_k^m(t)}}{\sum_{n=1}^{2} e^{\delta_k^n(t)}}, & \text{otherwise}. \end{cases} $$
3: refine $\tilde{p}_k^m(t)$ according to $\delta_k^m(t)$, $\forall k \in \mathcal{K},\, m \in \{1, 2\}$:
$$ \tilde{p}_k^m(t) = \begin{cases} 0, & \text{if } \delta_k^m(t) = 0, \\ \dfrac{e^{\tilde{p}_k^m(t)}}{\sum_{n=1}^{2} e^{\tilde{p}_k^n(t)}}, & \text{otherwise}. \end{cases} $$
4: refine $o_k(t)$ according to $\tilde{p}_k^m(t)$ and $\delta_k^m(t)$, $\forall k \in \mathcal{K},\, m \in \{1, 2\}$:
$$ o_k(t) = \begin{cases} 0, & \text{if } \lceil \tilde{p}_k^m(t) \rceil < \lceil \delta_k^m(t) \rceil, \\ o_k(t), & \text{otherwise}. \end{cases} $$
5: return action $a(t)$ with refined variables.
  • Reward: An immediate reward r ( t ) signifies the effect of an action a ( t ) on the environment at the current state s ( t ) . We aim to optimize the task success rate and consumed energy of VUs. This can be achieved by defining the reward as
    $$ r(t) = \sum_{k \in \mathcal{K}} C_k(t) = \sum_{k \in \mathcal{K}} \Big( \omega\, H[t_{\max} - T_k(t)] - (1 - \omega)\, E_k(t) \Big). $$
    The agent aims to maximize the long-term expected cumulative reward, which is defined by
    $$ R = \max_{a(t)} \; \mathbb{E}\!\left[ \sum_{t=1}^{T} \gamma^{\,t-1}\, r(t) \right], $$
    where $\mathbb{E}[\cdot]$ denotes the expectation operator and $\gamma \in [0, 1]$ represents the discount factor. A value of 0 for $\gamma$ implies that the agent values immediate reward exclusively, whereas increasing $\gamma$ leads to a greater focus on future rewards.
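A NumPy sketch of the action correlation function in Algorithm 1, applied to a raw action produced by the actor; the array layout (one row per VU, one column per sub-signal) is an assumption made for illustration, while the gating and softmax steps follow the correlation constraints (21) and (22).

```python
import math
import numpy as np

def action_correlation(o, delta, p_tilde):
    """Refine a raw action so that it satisfies the correlation constraints (21)-(22).

    o       : (K,)   offloading rates in [0, 1]
    delta   : (K, 2) raw splitting-rate values in [0, 1]
    p_tilde : (K, 2) raw normalized transmission powers in [0, 1]
    """
    o, delta, p_tilde = o.copy(), delta.copy(), p_tilde.copy()
    for k in range(o.shape[0]):
        if math.ceil(o[k]) == 0:
            # Nothing is offloaded: splitting rates and powers must be zero.
            delta[k, :] = 0.0
            p_tilde[k, :] = 0.0
            continue
        # Softmax so that the splitting rates (and normalized powers) sum to one.
        delta[k] = np.exp(delta[k]) / np.exp(delta[k]).sum()
        p_tilde[k] = np.exp(p_tilde[k]) / np.exp(p_tilde[k]).sum()
        # If a sub-task has a nonzero splitting rate but no power, cancel offloading.
        if any(math.ceil(p_tilde[k, m]) < math.ceil(delta[k, m]) for m in range(2)):
            o[k] = 0.0
    return o, delta, p_tilde
```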

3.2. Proposed DDPG-Based Optimization Framework

Given that the HAP-assisted RSMA-enabled VEC system has continuous state and action spaces and is time-varying, the DDPG algorithm is used to maximize the long-term expected reward in the defined MDP model, aiming at maximizing task success rate and energy efficiency. DDPG [17] is a model-free, off-policy DRL algorithm. It employs two neural networks: a critic network $Q(s, a|\theta^Q)$ with parameters $\theta^Q$ and an actor network $\mu(s|\theta^\mu)$ with parameters $\theta^\mu$. The actor network is responsible for selecting actions based on the current state, and it outputs a deterministic action for each input state. The critic network, on the other hand, evaluates the actions selected by the actor network and provides feedback to the actor network in the form of a gradient signal that guides its learning process. Additionally, DDPG utilizes an experience replay buffer and target networks, including a target actor network $\mu'(s|\theta^{\mu'})$ with parameters $\theta^{\mu'}$ and a target critic network $Q'(s, a|\theta^{Q'})$ with parameters $\theta^{Q'}$. The experience replay buffer stores the transitions for use in the training phase, while the target networks help reduce training data correlation and improve training stability. All four networks are DNNs.
In the training process, the parameters of the critic network are adjusted by minimizing the loss between the action-value function $Q(s_i, a_i|\theta^Q)$ and the target value $y_i$, which is expressed as
$$ L(\theta^Q) = \frac{1}{N} \sum_{i=1}^{N} \big( y_i - Q(s_i, a_i|\theta^Q) \big)^2, $$
where $y_i = r_i + \gamma\, Q'\big(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})\,\big|\,\theta^{Q'}\big)$, $N$ represents the number of samples in a mini-batch that is randomly chosen from the experience replay buffer, and $(s_i, a_i, r_i, s_{i+1})$ represents a sample $i$ that includes the state, action, reward, and next state, respectively. Then, the actor network is updated via the deterministic policy gradient method, which relies on the gradient information from the critic network, as
$$ \nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q(s, a|\theta^Q)\big|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^\mu} \mu(s|\theta^\mu)\big|_{s_i}. $$
Moreover, to boost training stability, a soft update is employed to update the parameters of the target networks using a small constant $\epsilon \ll 1$. This can be formulated as
$$ \theta^{\mu'} \leftarrow \epsilon\, \theta^\mu + (1 - \epsilon)\, \theta^{\mu'}, \qquad \theta^{Q'} \leftarrow \epsilon\, \theta^Q + (1 - \epsilon)\, \theta^{Q'}. $$
During the execution phase, the critic network is inactive, and only the actor network is utilized. The trained actor network uses the present state observed by the HAP to make timely decisions. This forward propagation process is much faster and consumes fewer computational resources compared to the training phase.
The proposed optimization framework based on DDPG for the HAP-assisted RSMA-enabled VEC system is illustrated in Figure 2. In addition, Algorithm 2 outlines the training process for the framework, which can be broken down into three steps: initialization, data gathering, and model training. In the first step, the actor and critic networks are initialized with random weights, the target networks are established, and an experience replay buffer is created (lines 1–3). During the second step, the agent interacts with the environment to collect training data, which is stored in the experience replay buffer (lines 7–11). To regulate the exploration process, the actor policy is augmented with noise to generate actions as
$$ a(t) = \mu(s(t)\,|\,\theta^{\mu}) + \mathrm{OU}(t), $$
where the action noise $\mathrm{OU}(t)$ follows the Ornstein–Uhlenbeck process [17]. Nevertheless, the noise can cause the generated actions to violate the constraints on the value ranges of the action variables. Instead of clipping the action as in the original DDPG algorithm, we utilize the sigmoid function to scale the action outputs to the range of 0 to 1. The action is then processed by the action correlation function (Algorithm 1), which ensures that the action satisfies the constraints of the problem. The action correlation function is a crucial addition that allows us to effectively apply the DDPG algorithm to solve the problem. Each experience tuple $(s(t), a(t), r(t), s(t+1))$ is saved in the experience replay buffer $D$, which has a fixed capacity; the oldest sample is replaced with new data when the buffer is full. In the third step, a mini-batch of samples is randomly selected from the experience replay buffer to train the networks (line 12). The algorithm then uses the loss function and the policy gradient ascent method to update the parameters of the critic and actor networks, respectively (lines 13–14). To ensure training stability and avoid divergence, the target networks' parameters are softly updated with the constant $\epsilon$ (line 15). The training concludes once the desired number of episodes has been reached, resulting in a well-trained actor network that can be utilized during the online execution phase (line 18).
Algorithm 2 Training procedure for the DDPG-based optimization framework
1: Initialize the actor network $\mu(s|\theta^\mu)$ and critic network $Q(s, a|\theta^Q)$ with parameters $\theta^\mu$ and $\theta^Q$
2: Initialize the target networks $\mu'$ and $Q'$ with parameters $\theta^{\mu'} \leftarrow \theta^\mu$, $\theta^{Q'} \leftarrow \theta^Q$
3: Initialize the experience replay buffer $D$
4: for $episode \in \{1, \ldots, M\}$ do
5:  Get the initial observed state $s(1)$
6:  for $t \in \{1, \ldots, T\}$ do
7:   Select action $a(t) = \mu(s(t)|\theta^\mu) + \mathrm{OU}(t)$
8:   Process action $a(t)$ by Algorithm 1
9:   Perform action $a(t)$ and get reward $r(t)$ and next state $s(t+1)$
10:   Keep the transition $(s(t), a(t), r(t), s(t+1))$ in $D$
11:   Set $s(t) = s(t+1)$
12:   Retrieve a random mini-batch of $N$ samples $(s_i, a_i, r_i, s_{i+1})$ from $D$
13:   Minimize the loss function to update the critic network: $L(\theta^Q) = \frac{1}{N} \sum_{i=1}^{N} (y_i - Q(s_i, a_i|\theta^Q))^2$, where $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$
14:   Employ the policy gradient to update the actor network: $\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q(s, a|\theta^Q)|_{s=s_i,\, a=\mu(s_i)}\; \nabla_{\theta^\mu} \mu(s|\theta^\mu)|_{s_i}$
15:   Perform a soft update of the target networks' parameters: $\theta^{\mu'} \leftarrow \epsilon\, \theta^\mu + (1 - \epsilon)\, \theta^{\mu'}$, $\theta^{Q'} \leftarrow \epsilon\, \theta^Q + (1 - \epsilon)\, \theta^{Q'}$
16:  end for
17: end for
18: return the trained actor network $\mu(s|\theta^\mu)$.
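A compact PyTorch sketch of one update step (lines 12–15 of Algorithm 2), assuming the actor, critic, target networks, optimizers, and mini-batch tensors already exist; it is a sketch of the standard DDPG update rather than the authors' released code.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, actor_targ, critic_targ,
                actor_opt, critic_opt, batch, gamma=0.99, eps=1e-3):
    """One critic/actor update with soft target updates (Algorithm 2, lines 12-15)."""
    s, a, r, s_next = batch  # tensors sampled from the replay buffer D

    # Critic: minimize (y_i - Q(s_i, a_i))^2 with y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})).
    with torch.no_grad():
        y = r + gamma * critic_targ(s_next, actor_targ(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e., ascend Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft update of the target networks with a small constant eps.
    for targ, net in ((actor_targ, actor), (critic_targ, critic)):
        for p_t, p in zip(targ.parameters(), net.parameters()):
            p_t.data.mul_(1.0 - eps).add_(eps * p.data)
```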

3.3. Computational Complexity Analysis

The framework employs fully connected DNNs, which comprise an input layer, two hidden layers, and an output layer. The computational complexity of each training step depends on several factors, including the size of the input state and action, the number of layers, and the number of neurons within each layer of the DNNs [22,28]. Let $M$ denote the total number of training episodes, $T$ the number of steps taken in each episode, $N$ the size of the mini-batch used for each update, $L_\mu$ and $L_Q$ the numbers of layers in the actor and critic networks, respectively, and $n^\mu_{l_\mu}$ and $n^Q_{l_Q}$ the numbers of neurons in the $l_\mu$-th layer of the actor network and the $l_Q$-th layer of the critic network, respectively. The overall computational complexity during the training phase can be expressed as
$$ \mathcal{O}\!\left( M T N \left( \sum_{l_\mu=0}^{L_\mu - 1} n^\mu_{l_\mu}\, n^\mu_{l_\mu + 1} + \sum_{l_Q=0}^{L_Q - 1} n^Q_{l_Q}\, n^Q_{l_Q + 1} \right) \right). $$
In the execution phase, the agent only requires the policy function based on the trained model, resulting in a computational complexity per time slot of $\mathcal{O}\big( \sum_{l_\mu=0}^{L_\mu - 1} n^\mu_{l_\mu}\, n^\mu_{l_\mu + 1} \big)$.
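The neuron-product sums inside the complexity expression can be evaluated with a short helper; the layer widths below are those reported in Section 4.1, and the value of K is an example.

```python
def dnn_mult_count(layer_sizes):
    """Sum of n_l * n_{l+1} over consecutive layers of a fully connected DNN."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

K = 20                                    # number of VUs (example)
actor_layers = [3 * K, 128, 256, 7 * K]   # state in, action out
critic_layers = [10 * K, 512, 1024, 1]    # concatenated state-action in, Q-value out
per_sample = dnn_mult_count(actor_layers) + dnn_mult_count(critic_layers)
# Training scales as O(M * T * N * per_sample); execution per slot uses only the actor term.
```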
The proposed framework relies on the DDPG algorithm, which can be computationally expensive and resource-intensive in training. Nonetheless, once the model has been trained, it allows for real-time inference. The DDPG algorithm is an off-policy actor–critic approach that acquires a deterministic policy, implying that upon receiving a state, it provides the corresponding action directly. This makes it ideal for time-sensitive applications where actions have to be promptly computed.

4. Performance Evaluation

In this section, we delve into the specifics of the simulation and provide an analysis of the resulting outcomes.

4.1. Simulation Settings

We utilized Python programming and PyTorch [29] to create simulations of the HAP-assisted RSMA-enabled VEC environment, which were used for data collection and training. The environment covers a 1000 m × 1000 m area, with an HAP maintaining a stationary position at a height of 20 km in the centre and several VUs moving at a speed of 15 m/s (or 54 km/h). Each VU randomly generates a computational task, which includes both random data size and required computing resources. For the algorithm, we utilized fully connected DNNs for both the actor and critic networks, each with two hidden layers. The actor network consists of two hidden layers with 128 and 256 neurons, while the critic network has two hidden layers with 512 and 1024 neurons. The input layer of the actor network has a size equal to the state space, which is 3 K , and its output layer has a size equal to the number of entries in the action space, which is 7 K . The critic network’s input layer size is determined by both the state and action spaces and is 10 K , while its output layer has a single neuron. The actor network’s output layer uses the sigmoid function as its activation function, while the activation function of all other layers in both networks is the rectified linear unit function. Unless specified otherwise, we utilized the default settings presented in Table 2 for the environment and algorithm parameters [8,9]. We first examine the convergence of the proposed DDPG-based optimization framework under different values of hyperparameters. Then, we evaluate the performance of the proposed method against other benchmarks. Finally, we measure the execution time of the methods.
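A minimal PyTorch sketch of the actor and critic architectures described above (hidden sizes 128/256 and 512/1024, ReLU activations, and a sigmoid actor output); it is reconstructed from the stated settings rather than taken from the authors' code.

```python
import torch
import torch.nn as nn

def build_actor(K: int) -> nn.Module:
    """Actor mu(s): state of size 3K -> action of size 7K in [0, 1]."""
    return nn.Sequential(
        nn.Linear(3 * K, 128), nn.ReLU(),
        nn.Linear(128, 256), nn.ReLU(),
        nn.Linear(256, 7 * K), nn.Sigmoid(),
    )

class Critic(nn.Module):
    """Critic Q(s, a): concatenated state and action (size 10K) -> scalar value."""
    def __init__(self, K: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(10 * K, 512), nn.ReLU(),
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 1),
        )

    def forward(self, s: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([s, a], dim=-1))
```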
The methods used in the comparisons are as follows.
  • DDPG-based offloading and resource allocation method (DDPG-ORA): This is the proposed framework for optimizing offloading policies. The HAP utilizes the trained model to promptly make optimal task offloading decisions for the VUs.
  • DQN-based offloading and resource allocation method (DQN-ORA): DQN is a popular DRL algorithm [16]. Since the action space in the defined MDP model is continuous, applying the DQN algorithm directly to solve the problem can be challenging. To address this, each action variable is discretized into 11 levels with a step size of 0.1 within the range [0, 1], resulting in the discrete action set {0, 0.1, 0.2, …, 0.9, 1}. This creates a discrete action space that can be used with DQN to solve the defined task offloading and resource allocation problem.
  • Local search-based offloading and resource allocation method (LS-ORA): Using the same discrete action space as DQN-ORA and starting from a randomly initialized action, a local search algorithm is applied to find the action with the highest reward among neighbouring actions [22].
  • Random offloading and resource allocation method (RORA): Offloading and resource allocation decisions are randomly made while adhering to system constraints [1,8,10].
  • Full offloading method (FO): All tasks are offloaded to the HAP for processing, prioritizing offloading over resource allocation by utilizing the maximum transmission power [1,8].
As the problem is a non-convex mixed-integer problem, there are no standard solutions available for solving it. To demonstrate the effectiveness of the proposed method, LS-ORA, RORA, and FO are utilized as baseline methods for comparison purposes. These have been discussed in previous works [1,8,10,22]. In addition, since FO uses the binary offloading model which differs from the partial offloading model used in other methods, we use it for comparison to illustrate the significance of the partial offloading model, which allows VUs to flexibly offload tasks to the HAP for processing. Furthermore, since DQN requires a discrete action space for learning, we compare the proposed method with DQN-ORA to emphasize the importance of the continuous action space in our approach. To ensure fair comparisons, we apply the action correlation function to all the compared methods.
To evaluate the performance of the methods, we employ two metrics: the task success rate, which is the proportion of completed tasks that satisfy the delay limit out of the total tasks generated by VUs during the testing period, and the energy cost (Joule or J), which is the total energy used by all VUs to process their tasks during the testing period.

4.2. Convergence Analysis

To assess the convergence of our DDPG-based optimization framework, we vary the learning rate, which is a crucial hyperparameter that affects learning performance the most. We set the other hyperparameters to their default values. The sets of learning rates and the training convergence results of the proposed DDPG-based optimization framework are shown in Figure 3, where $lr_\mu$ and $lr_Q$ represent the learning rates of the actor and critic networks, respectively. We observe that the reward increases with each episode and ultimately converges for all learning rates. This is because the actor and critic networks adjust their parameters during training to achieve optimal policies. Among the sets of learning rates evaluated, the set $(lr_\mu, lr_Q) = (1 \times 10^{-4}, 1 \times 10^{-4})$ shows the best training performance in terms of higher reward and faster convergence speed. Therefore, we choose $(lr_\mu, lr_Q) = (1 \times 10^{-4}, 1 \times 10^{-4})$ for the remaining simulations.
The convergence of the DDPG-based optimization framework is also influenced by the mini-batch size hyperparameter. To evaluate this, we change the mini-batch size N across { 8 , 16 , 32 } while keeping the other hyperparameters at their default values. The mini-batch size determines the number of experience samples utilized in a single gradient update step, which has a direct impact on the stability and learning speed of the model. Our results, depicted in Figure 4, demonstrate that when the mini-batch size is set to 16, the model exhibits the fastest and highest convergence rate compared to other cases. A smaller mini-batch size of 8 results in less accurate estimates of the error gradient due to fewer training samples used, leading to poor convergence performance. Conversely, a larger mini-batch size of 32 can introduce noise into the dynamic environment, causing poor convergence and requiring more computational resources for computation.
To further evaluate the influence of hyperparameters on the DDPG-based optimization framework, we also vary the discount factor hyperparameter γ from { 0.8 , 0.9 , 0.99 } while keeping the other hyperparameters at their default values. The discount factor determines the weight given by the agent to future rewards when learning the current policy. Figure 5 indicates that the discount factor of 0.99 yields the highest reward, as it prioritizes long-term expected cumulative reward. Conversely, using a smaller discount factor leads to poor performance since it results in a short-sighted approach, where the agent focuses on the reward of the current policy and disregards the impact of future rewards on the learned policy.
Based on the above observations, we have defined a set of hyperparameters for the performance comparisons. Specifically, we set the learning rates for both the actor and critic networks to $1 \times 10^{-4}$, the mini-batch size $N$ to 16, and the discount factor $\gamma$ to 0.99. These hyperparameters are used in conjunction with the default values for all other hyperparameters.

4.3. Performance Comparisons

In this part, we evaluate the performance of our proposed approach and compare it to both state-of-the-art and baseline approaches.
First, we investigate how the performance of offloading is impacted by the number of VUs in the network, ranging from 10 to 30 VUs. Table 3 and Figure 6 display the results of task success rate and energy consumption for different numbers of VUs. As illustrated in Figure 6a, our results indicate that DDPG-ORA outperforms other techniques in terms of task success rate, achieving a maximum of 100% for 10 VUs and gradually declining to 69.11% for 30 VUs. DQN-ORA also achieves good performance; however, it falls behind DDPG-ORA due to its discrete action space. In contrast, FO experiences a sharp decline in task success rate as more VUs are added to the network, dropping to 0% for 25 and 30 VUs due to overloading the HAP's computing capacity, highlighting the importance of partial offloading in the VEC system. Both RORA and LS-ORA can still be executed with more than 20 VUs, indicating that partial offloading is preferable in such scenarios. Task success rates for RORA and LS-ORA decrease steadily as the number of VUs increases. LS-ORA performs better than RORA, indicating that the local search algorithm can help improve offloading and resource allocation policies. With respect to energy consumption, Figure 6b shows that DDPG-ORA maintains reasonable energy costs, ranging from 39.2 kJ for 10 VUs to 85.3 kJ for 30 VUs, while delivering the highest task success rates. FO incurs the highest energy consumption across all scenarios, reaching a peak of 223 kJ for 30 VUs, mainly due to VUs using maximum transmission power to send their tasks to the HAP.
Next, we analyse the effect of task size on the efficiency of the offloading methods, as illustrated in Table 4 and Figure 7. The number of VUs is fixed at 20 while the task size is varied between 1.0 and 1.4 Mb. Our proposed method, DDPG-ORA, consistently maintains high task success rates across all task sizes, ranging from 100% to 92.83%, as evident in Figure 7a. DQN-ORA performs well with small task sizes; however, the gap in task success rate between DDPG-ORA and DQN-ORA becomes larger as the task size increases. FO attains its maximum success rate of 74.5% for the smallest task size, but its task success rate drastically drops to 0% as the task size increases. This issue arises because FO offloads all tasks to the HAP for processing, resulting in a bottleneck when task sizes become too large. Similarly, the task success rates of LS-ORA and RORA exhibit a downward trend, but these methods can still operate with large task sizes. The energy consumption results of the four approaches are shown in Figure 7b. As FO prioritizes offloading over resource allocation, it is not a cost-effective option. When considering energy efficiency relative to task success rate, DDPG-ORA emerges as the most efficient approach.
Furthermore, we assess the influence of the HAP's computing power on the performance of the methods, as visualized in Table 5 and Figure 8. The HAP's computing capacity is varied from 40 to 120 GHz while the number of VUs remains fixed at 20. As shown in Figure 8a, DDPG-ORA achieves the highest task success rate amongst all methods, with a rate of 96.67% when the HAP's computing capacity is at its maximum of 120 GHz. DQN-ORA achieves its highest task success rate of 81.5% among all scenarios. In contrast, FO offloads all tasks to the HAP for processing, producing task success rates of up to 58.67% only when the HAP has sufficient computing resources. This method performs poorly when the HAP's computing capacity is limited, leading to zero task success rates. RORA makes decisions randomly, resulting in low task success rates ranging from 6.75% to 16.42%. LS-ORA shows moderate success rates ranging from 10.67% to 58.33%, which are generally better than those of FO and RORA but still lower than those of DDPG-ORA and DQN-ORA. In terms of energy cost, it is observed that this metric is not significantly affected by the HAP's computing capacity; instead, it depends more on the transmission of data from the VUs to the HAP. As per Figure 8b, all methods except LS-ORA maintain constant energy consumption regardless of the HAP's computing capacity. LS-ORA allows VUs to offload their tasks to the HAP only when the HAP's computing capacity is sufficient to handle the offloaded tasks, such as in scenarios where the HAP's computing capacity is higher than 80 GHz, resulting in increased energy consumption.
At last, we assess the execution time of the proposed DDPG-based method and other benchmarks. It is worth noting that we do not consider the training time since DRL models often require extensive training times, which are tolerable. Our main focus is on the time taken to generate and execute an action. Table 6 shows the execution time (in seconds) of the methods at different numbers of VUs. As the number of VUs increases, the execution time for all algorithms also increases accordingly, as the system has to process more task-offloading requests concurrently. RORA and FO have lower execution times compared to other methods since they do not involve any optimization process. DDPG-ORA and DQN-ORA produce actions using the current state and policy function, but they consume only slightly more time compared to RORA and FO. In contrast, LS-ORA requires substantial time to determine a suitable action through an iterative process that begins with a random action. Hence, LS-ORA is only appropriate for small-scale networks, while the other methods are more suitable for real-time decision-making.
Overall, DDPG-ORA, which has real-time inference capability, outperforms other methods in terms of task success rate while maintaining a reasonable energy cost. Although DQN-ORA also maintains a reasonable energy cost, it falls short of DDPG-ORA in terms of task success rate due to its discrete action space. LS-ORA shows promising results in certain scenarios; however, it is not suitable for real-time applications. RORA is not a viable offloading approach due to its random selection strategy. FO, which utilizes the binary offloading model, is inflexible, consumes substantial amounts of energy for data offloading, and fails in extreme cases. These results highlight the importance of using a robust DRL algorithm along with partial offloading and resource allocation for achieving optimal performance in the HAP-assisted RSMA-enabled VEC system.

5. Conclusions

This study focused on addressing the challenge of partial task offloading and resource allocation in VEC systems that utilize HAP and RSMA technologies, particularly in areas without access to a terrestrial base station. Our objective was to jointly optimize the task offloading rate, RSMA splitting rate, VU transmission power, and sub-signal decoding order at the HAP to maximize the task success rate while minimizing VU energy consumption. Given the problem's high complexity and dynamism, we transformed it into an MDP model and designed a DDPG-based optimization framework to obtain effective policies for task offloading and resource allocation. Our simulation results demonstrated that the proposed optimization framework was efficient and performed well in terms of both training convergence and the evaluated performance metrics.
While the DDPG-based framework has shown promising results, it has some limitations. One of the main challenges is that the algorithm can be computationally expensive and resource-intensive during training. Additionally, the overestimation problem in the Q-value function estimation can lead to sub-optimal performance. To enhance the system’s performance, future research could explore new variations of DDPG or alternative algorithms that can mitigate these limitations. Furthermore, the potential of heterogeneous aerial networks comprising multiple HAPs and UAVs could be further explored to support VEC systems in real-world scenarios.

Author Contributions

Conceptualization, T.-H.N.; methodology, T.-H.N. and L.P.; software, T.-H.N.; validation, T.-H.N. and L.P.; formal analysis, T.-H.N.; investigation, T.-H.N. and L.P.; resources, L.P.; data curation, T.-H.N.; writing—original draft preparation, T.-H.N.; writing—review and editing, L.P.; visualization, T.-H.N.; supervision, L.P.; project administration, L.P.; funding acquisition, T.-H.N. and L.P. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by Seoul National University of Science and Technology.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DDPG: Deep deterministic policy gradient
DNN: Deep neural network
DQN: Deep Q-network
DRL: Deep reinforcement learning
HAP: High-altitude platform
IoT: Internet of Things
LoS: Line-of-sight
MDP: Markov decision process
MEC: Multi-access edge computing
NOMA: Non-orthogonal multiple access
QoS: Quality of service
RSMA: Rate-splitting multiple access
RSU: Roadside unit
SIC: Successive interference cancellation
UAV: Unmanned aerial vehicle
VEC: Vehicular edge computing
VU: Vehicle user

References

1. Waqar, N.; Hassan, S.A.; Mahmood, A.; Dev, K.; Do, D.T.; Gidlund, M. Computation Offloading and Resource Allocation in MEC-Enabled Integrated Aerial-Terrestrial Vehicular Networks: A Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21478–21491.
2. Traspadini, A.; Giordani, M.; Giambene, G.; Zorzi, M. Real-Time HAP-Assisted Vehicular Edge Computing for Rural Areas. IEEE Wirel. Commun. Lett. 2023, 12, 674–678.
3. Li, G.; Nguyen, T.H.; Jung, J.J. Traffic Incident Detection Based on Dynamic Graph Embedding in Vehicular Edge Computing. Appl. Sci. 2021, 11, 5861.
4. Dao, N.N.; Pham, Q.V.; Tu, N.H.; Thanh, T.T.; Bao, V.N.Q.; Lakew, D.S.; Cho, S. Survey on Aerial Radio Access Networks: Toward a Comprehensive 6G Access Infrastructure. IEEE Commun. Surv. Tutor. 2021, 23, 1193–1225.
5. Azari, M.M.; Solanki, S.; Chatzinotas, S.; Kodheli, O.; Sallouha, H.; Colpaert, A.; Montoya, J.F.M.; Pollin, S.; Haqiqatnejad, A.; Mostaani, A.; et al. Evolution of Non-Terrestrial Networks From 5G to 6G: A Survey. IEEE Commun. Surv. Tutor. 2022, 24, 2633–2672.
6. 3rd Generation Partnership Project. Study on Narrow-Band Internet of Things (NB-IoT)/Enhanced Machine Type Communication (eMTC) Support for Non-Terrestrial Networks (NTN); Technical Report 3GPP TR 36.763; 3rd Generation Partnership Project: Sophia Antipolis, France, 2021.
7. Kurt, G.K.; Khoshkholgh, M.G.; Alfattani, S.; Ibrahim, A.; Darwish, T.S.J.; Alam, M.S.; Yanikomeroglu, H.; Yongacoglu, A. A Vision and Framework for the High Altitude Platform Station (HAPS) Networks of the Future. IEEE Commun. Surv. Tutor. 2021, 23, 729–779.
8. Lakew, D.S.; Tran, A.T.; Dao, N.N.; Cho, S. Intelligent Offloading and Resource Allocation in Heterogeneous Aerial Access IoT Networks. IEEE Internet Things J. 2023, 10, 5704–5718.
9. Truong, T.P.; Dao, N.N.; Cho, S. HAMEC-RSMA: Enhanced Aerial Computing Systems With Rate Splitting Multiple Access. IEEE Access 2022, 10, 52398–52409.
10. Kang, H.; Chang, X.; Misic, J.; Misic, V.B.; Fan, J.; Liu, Y. Cooperative UAV Resource Allocation and Task Offloading in Hierarchical Aerial Computing Systems: A MAPPO Based Approach. IEEE Internet Things J. 2023.
11. Liu, Y.; Qin, Z.; Elkashlan, M.; Ding, Z.; Nallanathan, A.; Hanzo, L. Nonorthogonal Multiple Access for 5G and Beyond. Proc. IEEE 2017, 105, 2347–2381.
12. Yang, Z.; Chen, M.; Saad, W.; Xu, W.; Shikh-Bahaei, M. Sum-Rate Maximization of Uplink Rate Splitting Multiple Access (RSMA) Communication. IEEE Trans. Mob. Comput. 2022, 21, 2596–2609.
13. Katwe, M.; Singh, K.; Clerckx, B.; Li, C.P. Rate Splitting Multiple Access for Sum-Rate Maximization in IRS Aided Uplink Communications. IEEE Trans. Wirel. Commun. 2023, 22, 2246–2261.
14. Mao, Y.; Dizdar, O.; Clerckx, B.; Schober, R.; Popovski, P.; Poor, H.V. Rate-Splitting Multiple Access: Fundamentals, Survey, and Future Research Trends. IEEE Commun. Surv. Tutor. 2022, 24, 2073–2126.
15. Clerckx, B.; Mao, Y.; Jorswieck, E.A.; Yuan, J.; Love, D.J.; Erkip, E.; Niyato, D. A Primer on Rate-Splitting Multiple Access: Tutorial, Myths, and Frequently Asked Questions. IEEE J. Sel. Areas Commun. 2023, 41, 1265–1308.
16. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
17. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. In Proceedings of the 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016.
18. Luong, N.C.; Hoang, D.T.; Gong, S.; Niyato, D.; Wang, P.; Liang, Y.C.; Kim, D.I. Applications of Deep Reinforcement Learning in Communications and Networking: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 3133–3174.
19. Nguyen, T.H.; Park, L. A Survey on Deep Reinforcement Learning-driven Task Offloading in Aerial Access Networks. In Proceedings of the IEEE 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; pp. 822–827.
20. Yu, Z.; Gong, Y.; Gong, S.; Guo, Y. Joint Task Offloading and Resource Allocation in UAV-Enabled Mobile Edge Computing. IEEE Internet Things J. 2020, 7, 3147–3159.
21. Yang, C.; Liu, B.; Li, H.; Li, B.; Xie, K.; Xie, S. Learning Based Channel Allocation and Task Offloading in Temporary UAV-Assisted Vehicular Edge Computing Networks. IEEE Trans. Veh. Technol. 2022, 71, 9884–9895.
22. Truong, T.P.; Tuong, V.D.; Dao, N.N.; Cho, S. FlyReflect: Joint Flying IRS Trajectory and Phase Shift Design Using Deep Reinforcement Learning. IEEE Internet Things J. 2023, 10, 4605–4620.
23. Hua, D.T.; Do, Q.T.; Dao, N.N.; Cho, S. On sum-rate maximization in downlink UAV-aided RSMA systems. ICT Express 2023; in press.
24. Zhang, Y.; Na, Z.; Wang, Y.; Ji, C. Joint power allocation and deployment optimization for HAP-assisted NOMA-MEC system. Wirel. Netw. 2022.
25. Nguyen, T.H.; Truong, T.P.; Dao, N.N.; Na, W.; Park, H.; Park, L. Deep Reinforcement Learning-based Partial Task Offloading in High Altitude Platform-aided Vehicular Networks. In Proceedings of the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; pp. 1341–1346.
26. Cheng, Z.; Liwang, M.; Chen, N.; Huang, L.; Du, X.; Guizani, M. Deep reinforcement learning-based joint task and energy offloading in UAV-aided 6G intelligent edge networks. Comput. Commun. 2022, 192, 234–244.
27. Ren, Q.; Abbasi, O.; Kurt, G.K.; Yanikomeroglu, H.; Chen, J. Caching and Computation Offloading in High Altitude Platform Station (HAPS) Assisted Intelligent Transportation Systems. IEEE Trans. Wirel. Commun. 2022, 21, 9010–9024.
28. Qiu, C.; Hu, Y.; Chen, Y.; Zeng, B. Deep Deterministic Policy Gradient (DDPG)-Based Energy Harvesting Wireless Communications. IEEE Internet Things J. 2019, 6, 8577–8588.
29. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Brooklyn, NY, USA, 2019; pp. 8026–8037.
Figure 1. An HAP-assisted RSMA-enabled vehicular edge network in remote areas.
Figure 2. The proposed DDPG-based optimization framework.
Figure 3. Convergence performance of the proposed framework under different sets of learning rates.
Figure 4. Effect of different mini-batch sizes on convergence performance of the proposed framework.
Figure 5. Effect of varying discount factors on convergence performance of the proposed framework.
Figure 6. Comparison of performance across varying numbers of VUs. (a) Task success rate versus the number of VUs; (b) Energy cost versus the number of VUs.
Figure 7. Performance evaluation under different task sizes. (a) Task success rate versus the task data size; (b) Energy cost versus the task data size.
Figure 8. HAP computation capacity’s impact on the performance of the methods. (a) Task success rate versus the HAP’s computing capacity; (b) Energy cost versus the HAP’s computing capacity.
Table 1. Notations used in the system model and problem formulation.
Notation | Description
k, K, 𝓚 | Index, the number, and the set of VUs
t, T, 𝓣 | Index, the number, and the set of time slots
τ_k = (w_k, c_k) | Computational task of VU k
τ_k^m = (w_k^m, c_k) | Sub-offloaded task of VU k
t_max | Task delay limit
(x_k, y_k, 0) | Coordinates of VU k
(x_H, y_H, z_H) | Coordinates of the HAP
d_k | Distance from VU k to the HAP
g_k | Channel gain from VU k to the HAP
g_0 | Channel gain at a reference distance of 1 m
κ | Energy consumption coefficient of VUs
p_max | Maximum transmission power of VUs
n_H | Additive white Gaussian noise
σ² | Power spectral density
W | Communication bandwidth of the HAP
s_k, s_k^m | Transmitted signal and sub-signal from VU k to the HAP
r_k^m | Transmission rate of sub-signal s_k^m
F_k, F_H | Computing capacity of VU k and the HAP
T_k, T_k^loc, T_k^off | Delay cost, VU k’s delay, and HAP’s delay for handling τ_k
E_k, E_k^loc, E_k^off | Energy cost, VU k’s energy, and HAP’s energy for handling τ_k
ω | Weight parameter
C_k | Overall computing cost for handling τ_k
Optimization Variables
o_k | Offloading rate for task τ_k of VU k
δ_k^m | Splitting rate of sub-offloaded task τ_k^m of VU k
p_k^m | Transmission power of sub-signal s_k^m of VU k
p̄_k^m | Normalized transmission power of sub-signal s_k^m of VU k
π_k^m | Decoding order for sub-signal s_k^m of VU k
ϕ_k^m | Decoding priority for sub-signal s_k^m of VU k
Table 2. Parameter settings.
Parameter | Value
Environment
Task size, w_k | [1, 1.5] Mb
Required computational resources, c_k | [1000, 1300] CPU cycles/bit
Delay limit, t_max | 0.5 s
Number of VUs, K | 20
VU computational resources, F_k | 2 GHz
VU energy coefficient, κ | 1 × 10^−28
VU maximum transmit power, p_max | 20 dBm
Channel gain at 1 m reference distance, g_0 | 1.42 × 10^−4
Path loss exponent, α | 2
HAP computational resources, F_H | 80 GHz
HAP communication bandwidth, W | 200 MHz
Noise power, σ² | −170 dBm/Hz
Weight parameter, ω | 0.99
DDPG-based Optimization Framework
Optimizer | Adam
Replay buffer capacity | 1 × 10^6
Discount factor, γ | 0.99
Size of mini-batch, N | 16
Soft update constant, ϵ | 1 × 10^−2
Number of training episodes | 2000
Number of training/testing steps | 300
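For readers who wish to reproduce the setup, the settings in Table 2 could be gathered into a single configuration object, as in the hypothetical sketch below; the field names are illustrative and do not come from the original implementation.

```python
from dataclasses import dataclass

# Convenience container mirroring the parameter settings in Table 2.
# Field names are illustrative; they are not taken from the authors' code.
@dataclass
class SimulationConfig:
    # Environment
    task_size_mb: tuple = (1.0, 1.5)          # w_k, drawn uniformly in Mb
    cpu_cycles_per_bit: tuple = (1000, 1300)  # c_k
    delay_limit_s: float = 0.5                # t_max
    num_vus: int = 20                         # K
    vu_cpu_hz: float = 2e9                    # F_k = 2 GHz
    energy_coeff: float = 1e-28               # kappa
    max_tx_power_dbm: float = 20.0            # p_max
    ref_channel_gain: float = 1.42e-4         # g_0 at 1 m
    path_loss_exponent: float = 2.0           # alpha
    hap_cpu_hz: float = 80e9                  # F_H = 80 GHz
    bandwidth_hz: float = 200e6               # W = 200 MHz
    noise_psd_dbm_hz: float = -170.0          # sigma^2
    weight: float = 0.99                      # omega
    # DDPG-based optimization framework
    optimizer: str = "Adam"
    replay_buffer_size: int = 1_000_000
    gamma: float = 0.99
    batch_size: int = 16
    soft_update_tau: float = 1e-2
    train_episodes: int = 2000
    steps_per_episode: int = 300

cfg = SimulationConfig()
print(cfg.num_vus, cfg.hap_cpu_hz)
```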
Table 3. Performance comparison of methods across varying numbers of VUs.
Method | 10 VUs | 15 VUs | 20 VUs | 25 VUs | 30 VUs
Task success rate (%)
DDPG-ORA | 100.00 | 98.00 | 96.17 | 81.47 | 69.11
DQN-ORA | 97.00 | 90.67 | 80.83 | 68.40 | 52.11
LS-ORA | 83.33 | 41.56 | 17.50 | 13.60 | 10.97
RORA | 18.97 | 14.98 | 12.75 | 10.69 | 9.54
FO | 83.33 | 41.11 | 16.65 | 0.00 | 0.00
Energy consumption (kJ)
DDPG-ORA | 39.2 | 56.0 | 70.4 | 79.1 | 85.3
DQN-ORA | 38.7 | 56.5 | 66.4 | 73.1 | 76.7
LS-ORA | 72.0 | 89.4 | 63.7 | 71.7 | 86.2
RORA | 37.0 | 55.2 | 73.0 | 92.5 | 111.0
FO | 74.0 | 111.0 | 149.0 | 186.0 | 223.0
Table 4. Performance evaluation under different task sizes (Mb).
Method | 1.0 Mb | 1.1 Mb | 1.2 Mb | 1.3 Mb | 1.4 Mb
Task success rate (%)
DDPG-ORA | 100.00 | 100.00 | 100.00 | 98.67 | 92.83
DQN-ORA | 95.00 | 92.33 | 90.00 | 84.67 | 70.83
LS-ORA | 74.67 | 26.67 | 16.58 | 11.00 | 8.55
RORA | 30.17 | 20.00 | 12.45 | 7.30 | 3.92
FO | 74.50 | 24.67 | 0.00 | 0.00 | 0.00
Energy consumption (kJ)
DDPG-ORA | 56.7 | 62.4 | 68.0 | 73.7 | 79.4
DQN-ORA | 55.5 | 61.0 | 66.6 | 72.1 | 77.7
LS-ORA | 116.0 | 61.2 | 54.7 | 61.2 | 67.4
RORA | 58.2 | 64.0 | 69.9 | 75.7 | 81.5
FO | 119.0 | 131.0 | 143.0 | 154.0 | 166.0
Table 5. Performance evaluation under different HAP computing capacities (GHz).
Method | 40 GHz | 60 GHz | 80 GHz | 100 GHz | 120 GHz
Task success rate (%)
DDPG-ORA | 53.17 | 90.33 | 96.50 | 96.67 | 96.67
DQN-ORA | 53.00 | 72.33 | 81.00 | 81.50 | 81.50
LS-ORA | 10.67 | 14.02 | 18.00 | 31.00 | 58.33
RORA | 6.75 | 10.00 | 12.12 | 14.42 | 16.42
FO | 0.00 | 0.00 | 17.17 | 31.67 | 58.67
Energy consumption (kJ)
DDPG-ORA | 69.4 | 70.1 | 70.0 | 70.3 | 70.3
DQN-ORA | 66.3 | 66.3 | 66.0 | 66.4 | 66.3
LS-ORA | 58.8 | 58.4 | 65.1 | 105.0 | 140.0
RORA | 72.6 | 72.9 | 72.9 | 72.7 | 72.8
FO | 149.0 | 148.0 | 149.0 | 149.0 | 149.0
Table 6. Comparison of execution time (in seconds) across varying numbers of VUs in a single testing step.
Method | 10 VUs | 15 VUs | 20 VUs | 25 VUs | 30 VUs
DDPG-ORA | 0.00109 | 0.00166 | 0.00208 | 0.00280 | 0.00335
DQN-ORA | 0.00119 | 0.00172 | 0.00227 | 0.00283 | 0.00392
LS-ORA | 0.27369 | 0.71950 | 1.46102 | 2.54224 | 4.05519
RORA | 0.00067 | 0.00103 | 0.00155 | 0.00218 | 0.00260
FO | 0.00070 | 0.00123 | 0.00204 | 0.00242 | 0.00308