1. Introduction
Wireless sensor networks (WSNs), comprising a large number of sensor nodes, can transmit very large volumes of data with high efficiency [
1,
2]. Their compactness, cost-effectiveness, and ease of deployment make WSNs highly effective for a wide range of real-time applications. With the rapid development of WSNs, the explosive growth of sensor devices has intensified the demand for high data rates and ultra-low-latency services [
3]. Traditional cloud computing paradigms face challenges in meeting the diverse service requirements of delay-sensitive and computing-intensive tasks [
4]. Recently, mobile edge computing (MEC), which provides computing and caching services for nearby sensors, has enabled local task offloading, avoiding the need to send data to distant cloud centers and thus reducing costs [
5]. This has facilitated the transition of WSN devices to large-scale deployment, allowing for the real-time monitoring of environmental conditions to offer essential insights for urban planning and management. Moreover, by deploying MEC servers at access points (APs) or base stations (BSs), it is possible not only to cache popular content on the edge cloud to reduce the delay and energy consumption of sensor content requests but also to offload tasks requiring computation [
6]. However, traditional MEC servers, typically deployed at fixed base stations, often suffer from limited coverage and challenges like non-line-of-sight (NLoS) transmission, which degrades signal quality and results in reduced overall service efficiency and performance [
7].
In recent years, unmanned aerial vehicles (UAVs) have gained widespread application across various industries due to their mobility, low operational cost, and ability to establish easy line-of-sight (LoS) communication links [
8]. These unique characteristics enable UAVs to effectively address challenges inherent to traditional communication systems, such as fixed deployment locations, high infrastructure costs, and limited adaptability to specialized scenarios [
9]. Furthermore, equipped with MEC servers, UAVs can not only provide computing and caching services to sensors but also function as aerial relays to offload tasks to other nodes, thereby significantly enhancing the flexibility and efficiency of network services [
10,
11,
12]. However, UAVs have limited computing, caching, and endurance capabilities, so low-power solutions are crucial to improving UAV network performance [
13].
A recently proposed and promising approach to reducing UAV energy consumption is to deploy simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) as an alternative to UAVs for signal relaying [
14]. Each element of a STAR-RIS, capable of supporting both electric and magnetic currents, can simultaneously reconfigure transmitted and reflected signals, thereby achieving full-space coverage [
15,
16]. However, most existing studies in this field assume that STAR-RISs are deployed in a fixed position [
17], with UAVs primarily offering computing [
18,
19] or caching capabilities [
20,
21,
22]. The fixed deployment of STAR-RISs limits their ability to flexibly adjust the distance between themselves and the sensors, thereby degrading task offloading transmission performance. Other studies have proposed mounting STAR-RISs on UAVs, but the UAVs in those designs are not equipped with computing and caching resources [
23,
24]. Consequently, UAVs serve only as flight carriers, leaving their potential computing and caching capabilities underutilized. This significantly reduces resource utilization and overall task processing efficiency in UAV-based WSNs.
While significant progress has been made in this field, several critical research gaps remain unaddressed. First, the fixed deployment of STAR-RISs restricts their adaptability to dynamic WSN environments, leading to suboptimal task offloading transmission performance. Second, while some studies equip UAVs with STAR-RISs, they overlook UAVs’ inherent computing and caching capabilities, resulting in the underutilization of WSN resources and reduced system efficiency. Third, existing research predominantly lacks a comprehensive joint optimization framework that simultaneously considers caching decisions, offloading strategies, UAV hovering positions, and STAR-RIS passive beamforming. These limitations hinder the ability to fully capitalize on the dynamic and heterogeneous resources of UAV-based WSNs. To address these limitations, we propose a novel STAR-RIS-assisted computing offloading and content caching framework to minimize system energy consumption for UAV-based WSNs. In this context, UAVs have computing and caching capabilities, which can greatly enhance task processing efficiency and response speed. Subsequently, given the limited energy capacity of UAVs, we optimize their energy utilization by separating the relay functionality. To this end, a STAR-RIS, as a passive relay, is introduced to assist the sensor nodes in forwarding tasks that can be reasonably allocated to UAV-based WSN resources. Furthermore, by installing a STAR-RIS on the UAV, the system can dynamically adjust the relative positioning between the STAR-RIS and the sensors, further enhancing efficiency in task transmission and processing. However, several key challenges must still be addressed to fully achieve this. First, traditional static caching strategies are not suitable for the dynamic characteristics of UAV-based WSNs. Therefore, it is essential to design effective caching strategies that ensure fast responses to sensor requests. 
Second, to ensure the efficient use of resources and avoid single-point overload, it is crucial to achieve the dynamic offloading of tasks among the edge cloud, UAVs, and sensors. Third, due to the limited endurance of UAVs, using STAR-RISs as passive relay nodes can effectively reduce the energy that UAVs would otherwise spend forwarding tasks, by optimizing the signal transmission and reflection paths. Therefore, to further improve the endurance of the UAV system, it is crucial to design an effective transmission and reflection coefficient matrix for STAR-RISs. Finally, UAVs equipped with STAR-RISs not only cache and process sensor tasks but also serve as relay nodes, providing additional communication links between sensors and the edge cloud. Therefore, it is essential to jointly optimize caching decisions, offloading decisions, UAV hovering positions, and passive beamforming to minimize overall system energy consumption.
Tackling these challenges, the main contributions of this paper can be summarized as follows:
- (1) We propose a novel aerial STAR-RIS-aided computing offloading and content caching framework to minimize system energy consumption for WSNs. This framework leverages flexible deployment and caching, computing, and communication (3C) resources to offer adaptive computation and caching services. Additionally, a STAR-RIS is introduced as a passive relay to assist sensors in forwarding tasks, so that UAV-based WSN resources can be allocated reasonably. Lastly, by installing the STAR-RIS on a UAV, the system can flexibly adjust the relative position of the STAR-RIS and the sensors to improve task transmission performance.
- (2) Since the energy consumption minimization problem is non-convex, we decompose it into four subproblems: content caching decision, computation offloading decision, UAV hovering position, and STAR-RIS resource allocation. For the content caching decision subproblem, the network caching decisions are optimized by using a deep reinforcement learning (DRL) algorithm. For the other subproblems, we use the Karush–Kuhn–Tucker (KKT) conditions and the successive convex approximation (SCA) algorithm to iteratively solve them and reduce system energy consumption.
- (3) The numerical results demonstrate that the proposed STAR-RIS-aided computing offloading and content caching framework significantly reduces system energy consumption in UAV-based WSNs compared with the benchmarks, especially in scenarios with limited network resources or adverse channel conditions.
The rest of this paper is organized as follows: Section 2 briefly reviews the related work, and Section 3 gives an overview and mathematical description of the system model. Section 4 and Section 5 introduce the optimization algorithm and iterative solution process for the proposed model. Section 6 discusses the convergence and complexity of the proposed DRL-SCA algorithm. Section 7 presents the simulation environment and discusses the results, followed by conclusions in Section 8.
  3. System Model and Problem Formulation
In this section, we describe the network overview, communication model, computation model, caching model, and energy consumption model for the proposed aerial STAR-RIS-aided WSN. We also formulate a system energy consumption minimization problem. The notation is summarized in 
Table 1.
  3.1. Network Overview
As shown in 
Figure 1, the considered aerial STAR-RIS-aided WSN consists of an edge cloud 
c, a UAV 
u equipped with computing and caching servers and a STAR-RIS, and many single-antenna sensors. UAV 
u can provide services for sensors and those sensors are denoted by 
. And the set of tasks is denoted by 
. Moreover, we assume that content popularity follows the Zipf distribution. We consider that edge cloud 
c and UAV 
u have limited caching and computing capacities and that the sensors only have limited computing capacities. The UAV has mobility and can adjust its position relative to the connected sensors to improve the quality of signal transmission. In this network, UAV 
u is connected to edge cloud 
c and to the sensors by wireless links. In particular, the STAR-RIS consists of 
 × 
 passive units, which are spanned as a uniform planar array (UPA). Each column and row of the UPA has 
 and 
 passive units, respectively [
31]. The 
-th STAR-RIS element is used to represent the element of the 
-th row and 
-th column of the STAR-RIS. We assume that the direct communication links between the sensors and edge cloud 
c are blocked and that the STAR-RIS elements work in energy-splitting (ES) mode [
32]. In the proposed STAR-RIS-aided UAV system, sensor 
m can process part of its tasks locally; then, UAV 
u can partially process the remaining tasks uploaded by the sensors and return the results to the sensors. At the same time, the other remaining tasks are forwarded to edge cloud 
c for processing through the STAR-RIS elements, and the results are returned to the sensors. In this paper, the main focus is on minimizing system energy consumption in the STAR-RIS-aided UAV system by jointly optimizing the content caching decisions, computing offloading decisions, UAV hovering positions, and STAR-RIS passive beamforming.
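The Zipf assumption on content popularity can be illustrated with a minimal sketch. The skew parameter `skew`, the task count, and the function name `zipf_popularity` are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def zipf_popularity(num_tasks: int, skew: float = 0.8) -> np.ndarray:
    """Request probability of each task, ranked from most to least popular.

    Assumption: popularity of the task with rank r is proportional to r**(-skew).
    """
    ranks = np.arange(1, num_tasks + 1, dtype=float)
    weights = ranks ** (-skew)
    return weights / weights.sum()

# Hypothetical example: 10 tasks, 1000 sensor requests drawn from the distribution.
popularity = zipf_popularity(num_tasks=10, skew=0.8)
rng = np.random.default_rng(0)
requests = rng.choice(10, size=1000, p=popularity)
```

A larger `skew` concentrates requests on a few tasks, which is the regime in which caching a small number of tasks at the UAV or edge cloud pays off most.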
As illustrated in 
Figure 2, we use virtual reality (VR) as a typical application scenario. The sensors need to process a variety of VR application tasks, such as object tracking, object identification, and scene rendering. Each task has a different data size and requires a different amount of computing capacity. For instance, object-tracking tasks require a large amount of data to be transmitted, while object identification and scene-rendering tasks require greater computing resources. Therefore, we adopt two parameters to model heterogeneous computation tasks. For computation task 
k, we define 
, where 
 represents the computing resources required to complete the task and 
 represents the data size of the task, i.e., the size of the data that need to be transmitted to UAV 
u or edge cloud 
c.
When the computing task is not cached in the system, we consider the partial offloading scheme for delay-sensitive computation tasks in STAR-RIS-aided UAV systems. This kind of computation offloading model allows tasks to be calculated in parallel at the sensors, UAV 
u, and edge cloud 
c. The tasks processed at the sensors are referred to as local tasks, the tasks processed at UAV 
u are referred to as UAV-offloading tasks, and the tasks that are offloaded to edge cloud 
c are called edge cloud-offloading tasks. 
Figure 3 presents the time allocation for task processing in the STAR-RIS-aided UAV system, where the sensors utilize the same resource block with duration 
 to transmit and compute tasks.
In the local execution phase, the sensors process their tasks using their local computing servers. In the UAV task offloading phase, some of the remaining tasks are uploaded to UAV u and processed by its computing server. In the edge cloud task offloading phase, the other remaining tasks are forwarded to edge cloud c through the STAR-RIS elements for processing. When the tasks are completed, the computing results obtained at both UAV u and edge cloud c are returned to the sensors. In downlink communication, since UAV u and edge cloud c tend to have high transmit power and the computing results are usually small, the downloading time is negligible in both the UAV and edge cloud task offloading phases.
  3.2. Communication Model
This subsection introduces the communication model and gives the uplink data rate when the sensor offloads tasks to UAV 
u and edge cloud 
c. We assume that when the task is offloaded, the sensor does not move. A 3D Cartesian coordinate system is established to describe the locations of the sensors, UAV 
u, STAR-RIS, and edge cloud 
c. The locations of sensor 
m and edge cloud 
c are described by vector 
 and vector 
. We assume that UAV 
u is hovering at a fixed position in every slot to provide computing services for the sensors [
33]. The position of UAV 
u is 
, and the position of the 
-th STAR-RIS element is 
.
Due to the high probability of LoS links in UAV communication, the communication channels between sensor 
m and UAV 
u, between sensor 
m and the 
-th STAR-RIS element, and between the 
-th STAR-RIS element and edge cloud 
c are assumed to be LoS links, all following the free-space path loss model [
34]. So, the channel gain between node 
n and another node 
 can be formulated as follows:
        where 
, 
, and 
. 
 is the received power at a distance of 1 m for a transmission power of 1 W and
. The signal-to-interference-plus-noise ratio (SINR) of the wireless link from node 
n to another node 
, denoted by 
, can be expressed as
        where 
 is the transmit power of node 
n. We assume that the noise power has constant variance 
. Therefore, the transmission rate from node 
n to node 
 can be given by
        where 
 is the transmission bandwidth of the wireless link from node 
n to node 
.
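The chain from channel gain to SNR to transmission rate can be sketched numerically. This is a minimal illustration under stated assumptions: the free-space gain is taken as a reference gain divided by the squared distance, the SNR as transmit power times gain over noise power, and the rate as the Shannon capacity. The names `beta0`, `channel_gain`, `snr`, and `rate_bps`, and all numerical values, are hypothetical.

```python
import numpy as np

def channel_gain(pos_a, pos_b, beta0=1e-3):
    """Free-space LoS power gain: beta0 / d^2 (assumed path loss exponent of 2).

    beta0 is the received power at 1 m for 1 W transmit power (assumed value).
    """
    diff = np.asarray(pos_a, dtype=float) - np.asarray(pos_b, dtype=float)
    return beta0 / float(np.sum(diff ** 2))

def snr(tx_power_w, gain, noise_power_w):
    """Linear SNR of a single interference-free link."""
    return tx_power_w * gain / noise_power_w

def rate_bps(bandwidth_hz, snr_linear):
    """Shannon rate in bit/s."""
    return bandwidth_hz * np.log2(1.0 + snr_linear)

# Hypothetical geometry: sensor on the ground, UAV hovering at 50 m altitude.
g = channel_gain((0.0, 0.0, 0.0), (30.0, 40.0, 50.0))   # squared distance = 5000 m^2
gamma = snr(tx_power_w=0.1, gain=g, noise_power_w=1e-13)
r = rate_bps(bandwidth_hz=1e6, snr_linear=gamma)
```

Because the gain depends on the UAV position only through the squared distance, moving the UAV closer to a sensor raises the rate and shortens the uplink transmission time for a fixed task size.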
  3.2.1. Channel in UAV Task Offloading
UAV 
u equipped with the MEC server has computing resources, which allows the sensors to transfer some tasks to UAV 
u for processing. The signal received by UAV 
u can be written as
          where 
 is the corresponding signal with 
 [
35]. 
 is the additive white Gaussian noise (AWGN) received by UAV 
u. The noise power is 
, i.e., 
.
According to (
2) and (
3), the SNR and transmission rate of the link from sensor 
m to UAV 
u are denoted by 
 and 
.
  3.2.2. Channel in Edge Cloud Task Offloading
Subject to its energy limitations and task time constraints, UAV 
u can only perform part of its received remaining tasks. The remaining tasks are forwarded to edge cloud 
c for processing through the STAR-RIS. In order to better distinguish the channel gain from sensor 
m to the 
-th STAR-RIS element, the channel gain from the 
-th STAR-RIS element to edge cloud 
c is expressed as 
. Therefore, the channel gains from sensor 
m to the STAR-RIS and from the STAR-RIS to edge cloud 
c are denoted by 
 and 
, respectively. 
 is the transmission or reflection coefficient matrix of the STAR-RIS for the incident signal from sensor 
m, where 
 and 
 denote the amplitude and phase shift of the 
-th STAR-RIS element for the sensor’s signal, respectively, and 
. Let 
 be the reflection 
 or transmission 
 beamforming vectors. In consequence, the following constraint is required when the STAR-RIS is in ES mode:
Hence, the signal received at edge cloud 
c can be obtained as
          where 
 is the AWGN received at edge cloud 
c with variance 
, i.e., 
. If sensor 
m is in the transmission region, then 
; otherwise, if sensor 
m is in the reflection region, 
.
Similarly, the SNR and transmission rate of the link from sensor m to edge cloud c via the STAR-RIS are denoted by  and , respectively, where the channel gain from sensor m via the STAR-RIS to edge cloud c is .
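The effect of the STAR-RIS coefficients on the cascaded sensor-to-STAR-RIS-to-edge-cloud link can be sketched as follows. This is an illustration under assumptions, not the paper's formulation: the ES-mode constraint is taken in its common energy-conservation form (squared transmission and reflection amplitudes of each element sum to one), and the cascaded gain is taken as the squared magnitude of the coefficient-weighted sum over elements. All names and channel values are hypothetical.

```python
import numpy as np

def es_coefficients(power_split, phase_t, phase_r):
    """Per-element transmission and reflection coefficients in ES mode.

    power_split: fraction of incident energy sent to the transmission side,
    so amp_t**2 + amp_r**2 == 1 for every element (assumed constraint form).
    """
    power_split = np.asarray(power_split, dtype=float)
    amp_t = np.sqrt(power_split)
    amp_r = np.sqrt(1.0 - power_split)
    theta_t = amp_t * np.exp(1j * np.asarray(phase_t, dtype=float))
    theta_r = amp_r * np.exp(1j * np.asarray(phase_r, dtype=float))
    return theta_t, theta_r

def cascaded_gain(h_sensor_ris, h_ris_cloud, theta):
    """Power gain |h_ris_cloud^H diag(theta) h_sensor_ris|^2."""
    return float(np.abs(np.vdot(h_ris_cloud, theta * h_sensor_ris)) ** 2)

# Hypothetical channels: 16 elements, equal magnitudes, random phases.
rng = np.random.default_rng(1)
n_elements = 16
h_sr = 1e-3 * np.exp(1j * rng.uniform(0, 2 * np.pi, n_elements))
h_rc = 1e-3 * np.exp(1j * rng.uniform(0, 2 * np.pi, n_elements))

# Sensor assumed to be in the transmission region: all energy is transmitted,
# and each phase shift cancels the phase of its cascaded path.
aligned_phase = -np.angle(np.conj(h_rc) * h_sr)
theta_t, theta_r = es_coefficients(np.ones(n_elements), aligned_phase, np.zeros(n_elements))
gain_aligned = cascaded_gain(h_sr, h_rc, theta_t)

# Baseline with zero phase shifts for comparison.
theta_flat, _ = es_coefficients(np.ones(n_elements), np.zeros(n_elements), np.zeros(n_elements))
gain_unaligned = cascaded_gain(h_sr, h_rc, theta_flat)
```

With aligned phases the element contributions add coherently, so the gain grows with the square of the number of elements; this is why the passive beamforming subproblem matters for transmission energy.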
We do not take packet loss and downlink transmission delay into account in this paper, because the downlink transmission rate is higher than the uplink transmission rate and the data size after task processing is much smaller than that before processing.
  3.3. Computation Model
In this study, we consider a divisible computation task, allowing it to be segmented into multiple parts. Taking video analysis as an example, a large video file containing numerous frames can be divided into several video clips through segmentation. This enables a portion of the clips to be initially processed locally at the sensor, while others are handled by UAV u, and the remaining clips are offloaded to edge cloud c. Detailed explanations will be provided in subsequent sections. Additionally, we disregard the delay associated with transmitting the processed task results from the UAV and edge cloud back to the sensors, as the output data size is significantly smaller compared with that of the input data.
We define the integer caching decision variable, , which indicates whether task k is cached at node  ( = 1) or not ( = 0). Therefore, the task caching strategy can be represented as follows: . For the task k offloading problem of sensor m, we define decision variable , .  is the task offloading ratio of sensor m at node j. (In this paper, we assume that sensor m initiates a task request to node j. Node j senses the information of the task (task size and time constraints) and then makes a task offloading decision based on its own computing capability. We call the above process the sensing phase, and since this time is very short, we ignore this time.) Consequently, the following is a representation of the task offloading policy: , .
Service delay is the overall service time for task k when task k is not cached, consisting of two parts: (i) uplink communication delay; (ii) computation delay. In downlink communication, since the UAV and edge cloud tend to have high transmit power and the computing results are usually of small size, the downloading time is also negligible. Next, we discuss the delay and corresponding energy consumption.
  3.3.1. Energy Consumption for Uplink Communication
We denote 
, as task k is not cached, and offloading decision variable 
. Therefore, the uplink transmission delay of offloading task k from sensor 
m to node 
 can be expressed as
Therefore, when task k is not cached, the total uplink communication energy consumption for sensor 
m offloading task k can be calculated as
  3.3.2. Energy Consumption for Computation
We denote 
, as task k is not cached, and offloading decision variable 
. Therefore, the computation delay for computing offloading task 
k at node 
j can be expressed as
          where 
 is the number of required computation resources for task 
k, i.e., the number of CPU cycles required for computing 1-bit task data. 
 is the computing capability (CPU cycles per second) of node 
j.
Therefore, the total computing energy consumption of sensor 
m for computing task request 
k can be expressed as
          where 
 is the effective capacitance coefficient of node 
j that depends on the processor’s chip architecture.
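The uplink and computation costs described in this subsection can be sketched with standard models. The forms below are assumptions made for illustration: transmission delay is offloaded bits over rate, transmission energy is transmit power times delay, computation delay is required CPU cycles over CPU frequency, and computation energy follows the common dynamic-power model (capacitance coefficient times frequency squared times cycles). Function and parameter names are hypothetical.

```python
def uplink_cost(ratio, data_bits, rate_bps, tx_power_w):
    """Delay (s) and energy (J) for uploading a fraction `ratio` of the task."""
    delay = ratio * data_bits / rate_bps
    return delay, tx_power_w * delay

def compute_cost(ratio, data_bits, cycles_per_bit, cpu_hz, kappa):
    """Delay (s) and energy (J) for computing a fraction `ratio` of the task.

    Assumed energy model: kappa * cpu_hz**2 joules per CPU cycle.
    """
    cycles = ratio * data_bits * cycles_per_bit
    delay = cycles / cpu_hz
    return delay, kappa * cpu_hz ** 2 * cycles

# Hypothetical task: 1 Mbit of input data, 100 cycles per bit, 40% offloaded to one node.
t_up, e_up = uplink_cost(ratio=0.4, data_bits=1e6, rate_bps=1e7, tx_power_w=0.1)
t_cmp, e_cmp = compute_cost(ratio=0.4, data_bits=1e6, cycles_per_bit=100,
                            cpu_hz=1e9, kappa=1e-27)
```

Under these assumed models both costs scale linearly with the offloading ratio, which is what makes the offloading subproblem tractable once the other variables are fixed.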
  3.4. Caching Model
In this subsection, we describe the caching model. Task caching involves storing completed tasks and associated data within UAV u or edge cloud c. Specifically, an independent resource container is maintained on UAV u or edge cloud c. The caching process operates as follows: First, a sensor sends a computing task request. If UAV u or edge cloud c has already cached this task, the respective node notifies the sensor that the task is available on its caching server. Consequently, the sensor can avoid offloading the same task to UAV u or edge cloud c. Finally, after the task is processed by UAV u or edge cloud c, the results are sent back to the sensor. This caching mechanism reduces redundant task offloading, thereby lowering sensor energy consumption and minimizing offloading delays.
Despite its benefits, task caching still faces many challenges: (i) although UAV 
u and edge cloud 
c have greater caching and computational capacities compared with sensors, they still cannot cache or handle all kinds of computation tasks; (ii) unlike traditional caching strategies, task caching requires the consideration of not only the data size and computational resources necessary for each task but also task popularity. Consequently, designing an effective caching strategy presents significant challenges. We introduce an integer caching decision variable, 
, to indicate whether task 
k is cached at node 
 (
 = 1) or not (
 = 0). Therefore, the computation caching strategy can be defined as 
. In this study, we evaluate the task duration and energy consumption on UAV 
u or edge cloud 
c in scenarios with and without task caching. With task caching (
 = 1), the task duration, simplified to the processing delay, is denoted by 
 and can be expressed as
The primary energy consumption occurs within UAV 
u or edge cloud 
c, with sensors incurring no energy cost. Accordingly, the energy consumption associated with task caching can be formulated as
        where 
 is the effective capacitance coefficient of node 
 that depends on the processor’s chip architecture.
  3.5. Problem Formulation
Based on the task communication, computation, and caching process mentioned above, the overall delay and energy consumption of sensor 
m from sending task request 
k to obtaining computing results are expressed as
In this paper, with task caching (
 = 1), the primary energy consumption occurs within UAV 
u or edge cloud 
c, with sensors incurring no energy cost. The primary energy consumption, simplified to the processing energy consumption, is denoted by 
. Without task caching (
 = 0), the primary energy consumption includes the total uplink communication energy consumption for sensor 
m offloading task 
k and the total computing energy consumption of computing task request 
k. Therefore, we formulate a system energy consumption minimization problem by jointly optimizing caching decision 
 , offloading decision 
 (
), hovering position of UAV 
u , and passive beamforming 
 in the STAR-RIS-aided UAV system. According to (
14), the optimization problem for minimizing the total energy consumption of the STAR-RIS-aided UAV system can be formulated as
The constraints of problem (15) are described here in the order in which they appear. The first constraint requires the task offloading ratio of each sensor to lie between 0 (no offloading) and 1 (full offloading). The second constraint ensures that the offloading ratios for task k across all offloading targets (sensor m, UAV u, and edge cloud c) sum to 1. The third constraint bounds the amplitude response of each STAR-RIS element to the range [0, 1]. The fourth constraint bounds the phase shift of each STAR-RIS element to the range [0, 2π), because the phase shift is a periodic quantity. The fifth constraint restricts the UAV's coordinates, which must not exceed their maximum allowed values, so that the UAV operates within a defined area. The sixth constraint ensures that the total cached content at each node does not exceed its maximum caching capacity. The seventh constraint ensures that the total computational resources allocated to handle tasks at each node do not exceed its maximum computation capacity. The eighth constraint restricts the caching decision variables to the binary values 0 and 1, which indicate whether a task is cached at a specific node. The ninth constraint ensures that the completion time of task k does not exceed its maximum tolerable time, so that the delay requirements of all tasks are met.
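Several of these constraints can be checked mechanically for a candidate solution. The sketch below covers only the offloading-ratio, caching-capacity, binary-caching, and deadline constraints; the array layout, the function name `check_feasible`, and the example values are hypothetical, and task delays are passed in precomputed because their exact expression is not reproduced here.

```python
import numpy as np

def check_feasible(offload, cache, task_bits, cache_capacity_bits, delays, deadline):
    """Return a dict of constraint name -> satisfied (bool).

    offload: (K, 3) ratios per task for [sensor, UAV, edge cloud] (assumed layout).
    cache:   (K, 2) binary indicators per task for [UAV, edge cloud] (assumed layout).
    """
    offload = np.asarray(offload, dtype=float)
    cache = np.asarray(cache)
    task_bits = np.asarray(task_bits, dtype=float)
    used = cache.T @ task_bits  # cached bits per caching node
    return {
        "ratio_in_unit_interval": bool(np.all((offload >= 0) & (offload <= 1))),
        "ratios_sum_to_one": bool(np.allclose(offload.sum(axis=1), 1.0)),
        "cache_is_binary": bool(np.all((cache == 0) | (cache == 1))),
        "cache_within_capacity": bool(np.all(used <= np.asarray(cache_capacity_bits))),
        "deadline_met": bool(np.all(np.asarray(delays) <= deadline)),
    }

# Hypothetical candidate with three tasks.
result = check_feasible(
    offload=[[0.5, 0.3, 0.2], [1.0, 0.0, 0.0], [0.2, 0.2, 0.6]],
    cache=[[1, 0], [0, 1], [0, 0]],
    task_bits=[2e6, 1e6, 4e6],
    cache_capacity_bits=[3e6, 5e6],
    delays=[0.2, 0.1, 0.4],
    deadline=0.5,
)
bad = check_feasible(
    offload=[[0.6, 0.3, 0.2]], cache=[[1, 1]], task_bits=[2e6],
    cache_capacity_bits=[1e6, 5e6], delays=[0.7], deadline=0.5,
)
```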
Due to the different hovering positions of UAV 
u, the channel gain of the sensor-to-UAV channel link and the sensor-to-STAR-RIS-to-edge cloud channel link may differ, which in turn affects the transmission energy consumption. For ease of calculation, we assume that the 
 of UAV 
u is fixed [
36]. We optimize the 
 and 
 of UAV 
u to minimize the total energy consumption of the system. Meanwhile, we assume that the coordinates of STAR-RIS are the same as those of UAV 
u.
As shown in 
Figure 4, we adopt an alternating optimization technique and decompose problem (15) into four subproblems:
- (1) Content caching decision subproblem: Given the offloading decision, the UAV hovering position, and the transmission and reflection coefficient matrix, problem (15) optimizes the caching decision vector to minimize the total energy consumption of the system. We adopt the DRL algorithm to optimize the content caching decision.
- (2) Computing offloading decision subproblem: Given the caching decision, the UAV hovering position, and the transmission and reflection coefficient matrix, problem (15) optimizes the offloading decision vector to minimize the total energy consumption of the system. We adopt the KKT conditions to obtain an optimal solution.
- (3) UAV hovering position subproblem: Given the caching decision, the offloading decision, and the transmission and reflection coefficient matrix, problem (15) optimizes the hovering position vector of UAV u to minimize the total energy consumption of the system. We adopt the SCA method to optimize the hovering position of UAV u.
- (4) STAR-RIS resource allocation subproblem: Given the caching decision, the offloading decision, and the UAV hovering position, the transmission and reflection coefficient matrix is optimized to minimize the total energy consumption of the system. We adopt the SCA method to obtain a solution.
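The alternating scheme in the four subproblems above can be sketched as a generic loop that updates one block of variables at a time while holding the others fixed. This is only a structural illustration: the function `alternating_minimization`, the block names, and the toy quadratic objective are hypothetical stand-ins for the DRL, KKT, and SCA solvers, which are not reproduced here.

```python
def alternating_minimization(objective, block_solvers, init, max_iters=50, tol=1e-12):
    """Cycle through block solvers until the objective stops decreasing.

    block_solvers maps a block name to a function that takes the full variable
    dict and returns the new value of that block with the others held fixed.
    """
    x = dict(init)
    prev = objective(x)
    history = [prev]
    for _ in range(max_iters):
        for name, solve in block_solvers.items():
            x[name] = solve(x)
        cur = objective(x)
        history.append(cur)
        if abs(prev - cur) <= tol:
            break
        prev = cur
    return x, history

# Toy stand-in objective with two coupled blocks "a" and "b".
def toy_objective(x):
    return (x["a"] - 1.0) ** 2 + (x["b"] + 2.0) ** 2 + x["a"] * x["b"]

toy_solvers = {
    "a": lambda x: 1.0 - x["b"] / 2.0,    # exact minimizer over a with b fixed
    "b": lambda x: -2.0 - x["a"] / 2.0,   # exact minimizer over b with a fixed
}
solution, history = alternating_minimization(toy_objective, toy_solvers, {"a": 0.0, "b": 0.0})
```

Because every block update here is an exact minimization, the objective never increases from one sweep to the next. In the framework above the same property would hold only insofar as each subproblem solver does not worsen the energy objective.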
  
    
  
  
Figure 4. The proposed optimization framework of the energy consumption minimization problem.
  
 
  
 
  4. Content Caching Decision Optimization
Since content caching decision optimization is only related to 
 but independent of other variables in 
, caching decision optimization can be solved in advance with given offloading decisions and the hovering position of the UAV, as well as the transmission and reflection coefficient matrix. The subproblem can be written as
Because the caching decision vector is binary, the subproblem is still a mixed-integer nonlinear programming (MINLP) problem. Traditional optimization methods such as SCA may not handle such problems effectively, especially in environments where the network state changes rapidly and unpredictably. The traditional binary-relaxation SCA approach first relaxes the binary variables from the discrete space {0, 1} into the continuous interval [0, 1] and then rounds them back to binary values after the SCA-based iterations. Such relaxation and rounding may cause the solution to converge to a local minimum in real-time dynamic systems.
To address these challenges, we utilize the proximal policy optimization (PPO) algorithm, a DRL method. As shown in 
Figure 5, the PPO algorithm uses neural networks to model complex relationships between system states and actions, learning directly from interactions with the environment. By leveraging the PPO algorithm, caching decisions are dynamically adjusted based on real-time network conditions, such as caching state, content popularity, historical request access frequency, and network topology. The intelligent agent iteratively updates its caching strategy to maximize a reward function reflecting caching efficiency, like the cache hit rate, enabling near-optimal caching strategies that adapt to changing conditions and enhance overall system performance.
PPO is implemented within the actor–critic framework, comprising a policy network (actor) and a value network (critic). In this setup, the actor generates actions, while the critic evaluates them. A significant limitation of the basic actor–critic framework is its low sample efficiency, which requires extensive interactions with the environment to converge. To address this, PPO introduces two major contributions: mini-batch updates to improve data efficiency and a clipped surrogate loss to constrain policy updates. The PPO algorithm allows for a small difference between the target policy  and the behavior policy , where  and  denote the action taken and the state observed at time t. This is achieved by using a clipping function that limits the extent of policy change. If the policy update exceeds a predefined threshold, the clipping function prevents further increase.
  4.1. Intelligent Caching MDP Model
In the content caching decision subproblem for aerial STAR-RIS-aided WSNs, we leverage caching state, content popularity, historical request frequency, and network topology data to realize the optimal content caching decisions. The caching update model is formulated as a Markov decision process (MDP). In each time slot , the agent observes the current state, , and selects an action . Upon executing action , the agent receives an immediate reward  and the environment transitions to the next state, . The transition tuple  is stored in an experience replay buffer for agent training. To derive the optimal solution for problem  under heterogeneous scenarios, state space , action space , and reward function  in the proposed intelligent caching MDP model are designed as follows:
		
- (1) State: The state in slot t includes the caching state information, content popularity, historical request access frequency, and network topology information. Thus, the state vector in slot t collects these four components. The caching state information represents the content caching status across all caching nodes in time slot t, and the historical request access frequency covers the access frequencies of all requests.
- (2) Action: During the caching decision process, the action is the content caching decision, which specifies the cached content across all nodes for the upcoming time slot t + 1. For example, each edge cloud entry of the action indicates whether the content of task request k is cached in edge cloud c in time slot t + 1.
- (3) Reward: The formulation of the reward function plays a crucial role in guiding the exploration of the caching update problem and ensuring algorithm convergence. Consequently, the reward function in time slot t is defined in terms of the maximum number of training steps, a weight parameter, the number of cache hits for each node in time slot t, and the sensor satisfaction at step k.
  4.2. PPO-Based Content Caching Process
As shown in Algorithm 1, the PPO algorithm optimizes content caching decisions by iteratively adjusting the policy parameters by using a clipped objective function and advantage estimation to maximize cumulative rewards. The process starts with initializing the actor network’s policy parameters  and  to ensure consistent learning. The critic network is initialized with parameters  to evaluate state values. Key hyperparameters, including learning rate , discount factor , and clipping parameter , are set to guide the training process. This setup forms the foundation for the algorithm’s ability to adapt and optimize caching strategies dynamically. Then, the environment is reset to its initial state in each episode. At each time step, the agent observes the current state (), selects an action  based on the current policy , executes the action, and receives a reward .
The critic network, parameterized by 
, is trained by using gradient descent to minimize the loss function
        where 
. The generalized advantage estimator (GAE) is as shown in
The actor network updates its policy parameters 
 by maximizing a clipped objective function, ensuring stable updates. The clipped objective function is defined as follows:
        where 
 is a clip fraction. The policy ratio of the target policy to the behavior policy can be expressed as
Periodically, the behavior policy parameters are synchronized with the current policy to maintain consistency. By iteratively repeating these steps, the PPO algorithm effectively learns optimal caching strategies, adapting to dynamic changes in the network environment and improving overall performance.
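The update rules described above can be sketched in code. The following is a minimal illustration of the standard PPO quantities (GAE advantages, a mean-squared critic loss, and the clipped surrogate objective), not necessarily the exact formulation used in this paper. The function names, the GAE parameter `lam`, and its default value are assumptions introduced only for this example.

```python
import math
from typing import List, Sequence


def gae_advantages(rewards: Sequence[float], values: Sequence[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one finished trajectory.

    `values` holds the critic estimate for every visited state plus one
    bootstrap value for the state after the last step, so
    len(values) == len(rewards) + 1. `lam` is an assumed GAE parameter.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of the current and future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def value_targets(values: Sequence[float],
                  advantages: Sequence[float]) -> List[float]:
    """Regression targets for the critic: advantage plus old value estimate."""
    return [a + v for a, v in zip(advantages, values)]


def critic_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean-squared error that the critic minimizes by gradient descent."""
    return sum((p - y) ** 2 for p, y in zip(predicted, targets)) / len(targets)


def clipped_objective(new_log_probs: Sequence[float],
                      old_log_probs: Sequence[float],
                      advantages: Sequence[float],
                      clip_eps: float = 0.2) -> float:
    """Clipped surrogate objective that the actor maximizes."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio of the current policy to the behavior policy.
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the minimum removes the incentive to move the ratio
        # outside the interval [1 - clip_eps, 1 + clip_eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In this sketch, a sample with a positive advantage stops contributing extra objective value once its ratio exceeds 1 + `clip_eps`, which is what keeps individual policy updates small.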
        
| Algorithm 1: PPO-based content caching process | 
| ![Sensors 25 00393 i001]() | 
  6. Convergence and Complexity Analysis
Algorithm 2 provides the details of the complete DRL-SCA algorithm. Within each loop, the action generated by the PPO algorithm is substituted into (29), (38), and (55) as a given parameter for optimizing the computing offloading decision, the UAV hovering position, and the STAR-RIS passive beamforming. Block coordinate descent (BCD) is then used to iteratively optimize these three blocks of variables, with the solution of each iteration serving as the feasible starting point for the next one. The resulting offloading decision, hovering position, and passive beamforming therefore become a new state for the PPO agent in the next time slot.
      
| Algorithm 2: DRL-SCA-based caching decision, offloading decision, hovering position of the UAV, and passive beamforming of STAR-RIS for UAV-based WSNs | 
| ![Sensors 25 00393 i002]() | 
Therefore, a change of action eventually leads to a state transition, which drives the agent to learn actions that improve the reward tied to the optimization objective. Even when an analytical prediction is hard to obtain because of the complexity of the environment and the embedded SCA, deep reinforcement learning can still find a feasible solution in such a complicated dynamic environment.
Lemma 1.  Algorithm 2 converges to at least a locally suboptimal solution within a finite number of iterations.
 Proof.  The initial problem (15) is decomposed into four subproblems and addressed iteratively by using the BCD method. Specifically, subproblems (16), (29), (38), and (55) are optimized in an alternating sequence to acquire a suboptimal solution. The solution obtained in each iteration is subsequently used as the feasible input for the following iteration.
Let  represent the value of the original objective function (15) obtained during the l-th iteration.
The DRL algorithm generates an improved solution 
 in Step 3 that meets the condition
        where (a) follows from the inherent tendency of learning approaches to seek a better reward, as defined in (19).
In Step 4, the suboptimal offloading decision is obtained by solving (29), where (b) follows from its convexity. In Step 5, the suboptimal UAV hovering position is obtained by solving (38), where (c) follows from its convexity at the given feasible point under constraints (32b), (33b), and (38b), so that it can be solved optimally. In Step 6, the suboptimal STAR-RIS passive beamforming is obtained by solving (55), where (d) follows from its convexity at the given feasible point under constraints (47b), (52), and (54), so that it can likewise be solved optimally.
The chain of inequalities in Equation (64) implies that the energy consumption obtained from subproblems (29), (38), and (55) is nonincreasing after each iteration, so the objective function never increases from one iteration to the next. Because of the constraints, the achievable energy consumption is bounded below by a finite value. A nonincreasing sequence that is bounded below converges, so Algorithm 2 is assured to converge to at least a locally suboptimal solution within a finite number of iterations.    □
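The monotonicity argument in the proof can be illustrated on a toy problem. The sketch below is an assumption-laden stand-in rather than the DRL-SCA algorithm itself: the two-variable function `toy_energy`, its closed-form block minimizers, and the stopping tolerance are hypothetical and were chosen only so that each block update is exact and the objective is bounded below.

```python
from typing import List, Tuple


def toy_energy(x: float, y: float) -> float:
    """Hypothetical convex objective, bounded below by zero."""
    return (x - 1.0) ** 2 + (y + 2.0) ** 2 + 0.5 * (x - y) ** 2


def block_coordinate_descent(x0: float, y0: float, tol: float = 1e-9,
                             max_iters: int = 100
                             ) -> Tuple[float, float, List[float]]:
    """Alternately minimize toy_energy over x and y, recording the objective."""
    x, y = x0, y0
    history = [toy_energy(x, y)]
    for _ in range(max_iters):
        # Block 1: exact minimizer over x with y fixed
        # (from 2(x - 1) + (x - y) = 0).
        x = (2.0 + y) / 3.0
        # Block 2: exact minimizer over y with the new x fixed
        # (from 2(y + 2) - (x - y) = 0).
        y = (x - 4.0) / 3.0
        history.append(toy_energy(x, y))
        # Each block update cannot increase the objective, so the decrease
        # is nonnegative; stop once it falls below the tolerance.
        if history[-2] - history[-1] < tol:
            break
    return x, y, history
```

Because every block update solves its subproblem exactly at the current feasible point, the recorded objective values form a nonincreasing sequence that is bounded below, which mirrors the reasoning used for Algorithm 2. The stopping tolerance is what turns convergence of the sequence into termination after finitely many iterations.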
   7. Performance Evaluation and Discussion
In this section, we present numerical results to evaluate the effectiveness of the proposed energy-efficient aerial STAR-RIS-aided computing offloading and content caching framework for WSNs. The simulation scenario and parameter settings are initially outlined, followed by an in-depth discussion of the simulation results.
  7.1. Simulation Scenario and Parameter Settings
The simulation setup and parameter settings used in the numerical experiments are detailed below.
We assume that the reference locations of the ground edge cloud (
c), the UAV (
u), and the STAR-RIS are positioned at 
 meters, 
 meters, and 
 meters, respectively. Additionally, the four ground sensors are fixed at 
 meters, 
 meters, 
 meters, and 
 meters, representing a typical distributed sensor network configuration [
37]. The detailed simulation parameters are summarized in 
Table 2 [
38]. For example, the data size to be transmitted is set to 
 bits, representing a typical high-volume data transmission scenario in UAV-based WSNs. Moreover, we modeled data transmission as periodic, assuming a consistent flow of information typical of high-demand applications [
39]. The maximum hovering range of the UAV is constrained within 40 m (
 m), while the communication bandwidth 
 is set to 3.2 MHz, ensuring sufficient transmission capacity for the tasks. 
 represents the 600 r/bit computing resources required to complete task 
k. 
 indicates that the STAR-RIS consists of nine passive units.
For the caching DRL algorithm, the parameter settings are chosen to ensure stability and convergence of the learning process. Several learning rates (e.g., 0.0001, 0.0003, and 0.0005) are evaluated to assess their impact on convergence speed and stability, and a learning rate of 0.0003 is found to provide the best balance between performance and robustness. The clipping parameter is set to 0.2, which constrains policy updates and ensures stable, gradual improvements during training. The discount factor is fixed at 0.99, which balances immediate rewards and long-term benefits and is crucial to achieving consistent performance across episodes. These parameter settings were chosen to optimize the performance of the PPO algorithm within the DRL-SCA framework and to ensure the reliable operation of the proposed model under dynamic conditions [40,41].
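As a compact summary of these settings, the following sketch collects the reported values in a small configuration object. The class name, field names, and the two helper methods are hypothetical; only the numeric values (learning rates, clipping parameter, and discount factor) come from the text above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PPOCachingConfig:
    """Hypothetical container for the PPO settings reported in this section."""
    learning_rate: float = 3e-4  # best-performing rate among those tested
    candidate_learning_rates: Tuple[float, ...] = (1e-4, 3e-4, 5e-4)
    clip_eps: float = 0.2        # clipping parameter
    gamma: float = 0.99          # discount factor

    def ratio_bounds(self) -> Tuple[float, float]:
        """Interval to which the policy ratio is clipped."""
        return (1.0 - self.clip_eps, 1.0 + self.clip_eps)

    def effective_horizon(self) -> float:
        """Rough number of future steps that carry weight, 1 / (1 - gamma)."""
        return 1.0 / (1.0 - self.gamma)
```

With these values, the policy ratio is clipped to roughly the interval from 0.8 to 1.2, and the discount factor corresponds to an effective horizon of about 100 time steps, which is one way to read the stated balance between immediate rewards and long-term benefits.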
Moreover, to demonstrate the advantages of the proposed framework, we compare its performance with four existing benchmark schemes: (1) full offloading, (2) fixed positions, (3) no STAR-RIS, and (4) no caching [30,42]. These benchmarks provide insight into how effectively the proposed joint optimization framework improves energy efficiency and overall system performance. All simulation scenarios are detailed below:
- (1)
- Proposed policy: We minimize the total energy consumption by jointly optimizing the content caching decision, offloading decision, the hovering position of the UAV, and the STAR-RIS transmission and reflection coefficient matrix in the STAR-RIS-aided UAV system. 
- (2)
- Fixed-position policy: The UAV is kept at a fixed position without dynamic adjustment. This benchmark is used to analyze the effectiveness and advantages of optimizing the UAV hovering position. 
- (3)
- Full offloading policy: All task requests sent by users are offloaded to the UAV or the cloud for processing. This benchmark is used to analyze the effectiveness and advantages of the partial offloading strategy. 
- (4)
- No-STAR-RIS policy: The task requests sent by the user are transmitted directly to the edge cloud without using the STAR-RIS transmission and reflection coefficient matrix. This benchmark is used to analyze the effectiveness and advantages of the STAR-RIS transmission and reflection coefficient matrix. 
- (5)
- No-caching policy: The cache service capability of the UAV and the edge cloud is not used, and the content requested by the user is not cached in advance. Under this strategy, the user therefore relies directly on the partial offloading strategy to process the task locally, on the UAV, or in the edge cloud. This benchmark is used to analyze the effectiveness and advantages of the content caching strategy. 
  7.2. The Discussion of the Simulation Results
This section evaluates and discusses the performance of the proposed DRL-SCA algorithm in the energy-efficient aerial STAR-RIS-aided computing offloading and content caching framework by comparing it with various benchmark methods across diverse scenarios.
Figure 6 plots the total energy consumption of the system versus the number of iterations. The proposed method drives the total energy consumption to its converged value after only several iterations, which indicates that a caching strategy, partial offloading strategy, UAV hovering position, and STAR-RIS transmission and reflection matrix can always be obtained. Moreover, the four benchmark schemes converge slightly more slowly than the proposed scheme. Finally, the figure verifies that the energy consumption of the proposed STAR-RIS-aided computing offloading and content caching framework for UAV systems is lower than that of the benchmark schemes.
 Figure 7 shows the total energy consumption of the system versus the network bandwidth. The total energy consumption of all five schemes decreases as the network bandwidth increases. The reason is that a larger offloading bandwidth improves the transmission rate between the sensors and the UAV, as well as the transmission rate among the sensors, the STAR-RIS, and the edge cloud, which reduces the transmission delay and energy consumption. At lower network bandwidths, the proposed caching strategy optimization, UAV hovering position optimization, partial offloading optimization, and STAR-RIS transmission and reflection coefficient matrix optimization significantly reduce energy consumption compared with the benchmark strategies. As the network bandwidth increases, however, the task transmission rate rises and the bandwidth becomes the dominant factor in system energy consumption. The energy consumption of the benchmark schemes therefore gradually approaches that of the proposed scheme, although the proposed joint optimization policy still achieves the lowest energy consumption.
 Figure 8 shows that as the CPU cycles required to compute 1 bit of task data increase, the system consumes more computing resources for tasks of the same size, which increases the computing energy consumption. When the computing power of the UAV and local sensors is limited, more tasks are offloaded to the edge cloud for processing, which increases the transmission energy consumption. At the same time, as the number of offloaded tasks grows, network resources become increasingly limited. Under limited network resources, the proposed caching strategy optimization, UAV hovering position optimization, and STAR-RIS transmission and reflection coefficient matrix optimization significantly reduce the transmission energy consumption compared with the benchmark strategies, so the energy consumption gap between the proposed scheme and these benchmarks gradually widens. Under full offloading, however, the energy consumption is affected only by the computing resources as network resources become increasingly limited: the computing energy consumption increases, while the transmission energy consumption changes little. The energy consumption of the full offloading scheme therefore gradually approaches that of the proposed partial offloading scheme.
 Figure 9 evaluates the total energy consumption of the proposed scheme and the benchmark schemes for various computation task sizes. The total energy consumption increases with the task size, because both the transmission energy consumption and the computing energy consumption for processing the tasks grow as the tasks become larger. When local sensors and the UAV cannot handle the tasks because of their limited computation capability, more computation tasks have to be offloaded to the distant edge cloud for processing, which increases the computing energy consumption and greatly increases the transmission energy consumption. Furthermore, the advantage of the proposed scheme over the benchmark schemes in total energy consumption is marginal for small task sizes and becomes increasingly substantial as the task size grows.
 Figure 10 shows that the system energy consumption of all STAR-RIS-related schemes decreases significantly as the number of STAR-RIS elements increases, because additional STAR-RIS elements provide more channel gain and effectively reduce the transmission energy consumption. The no-STAR-RIS policy, in contrast, transmits task requests from the sensors directly to the edge cloud without using the STAR-RIS transmission and reflection coefficient matrix, so its energy consumption remains unchanged as the number of elements increases. With a small number of STAR-RIS elements, the proposed scheme (caching strategy optimization, UAV hovering position optimization, partial offloading optimization, and STAR-RIS transmission and reflection coefficient matrix optimization) consumes significantly less energy than the benchmark strategies. As the number of elements increases, the additional channel gain greatly reduces the transmission energy consumed when offloading tasks to a more distant cloud for processing. The gap between the energy consumption of the benchmark schemes and that of the proposed scheme therefore gradually narrows, although the proposed joint optimization strategy still achieves the lowest energy consumption.
 Figure 11 illustrates that the system energy consumption of all schemes increases significantly as the sensors’ transmit power increases. Although a higher transmit power gradually reduces the transmission delay, the transmission energy consumption is proportional to the sensor’s power. Therefore, when the sensor’s offloading ratio remains unchanged, the overall energy consumption of the system gradually increases. At low transmit power, the proposed scheme (caching strategy optimization, UAV hovering position optimization, partial offloading optimization, and STAR-RIS transmission and reflection coefficient matrix optimization) consumes significantly less energy than the benchmark strategies. As the transmit power increases, however, the transmission energy consumed when offloading tasks to the edge cloud or UAV for processing rises sharply. The gap between the energy consumption of the benchmark schemes and that of the proposed scheme therefore gradually narrows, although the proposed joint optimization strategy still achieves the lowest energy consumption.
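The trade-off between a shorter delay and a higher energy consumption can be checked with a small numerical sketch. It assumes a simple Shannon-rate link model, in which energy equals transmit power multiplied by transmission delay. This model, the function name, and the example channel gain and noise power are assumptions for illustration and are not taken from the system model of this paper. Only the 3.2 MHz bandwidth matches the simulation settings.

```python
import math
from typing import Tuple


def transmission_energy(data_bits: float, bandwidth_hz: float,
                        tx_power_w: float, channel_gain: float,
                        noise_power_w: float) -> Tuple[float, float]:
    """Return (energy in joules, delay in seconds) for one offloaded task.

    Assumes an interference-free Shannon-rate link, which is a simplification.
    """
    snr = tx_power_w * channel_gain / noise_power_w
    rate_bps = bandwidth_hz * math.log2(1.0 + snr)
    delay_s = data_bits / rate_bps
    # Energy is power multiplied by the time spent transmitting.
    return tx_power_w * delay_s, delay_s


if __name__ == "__main__":
    # Hypothetical task size, channel gain, and noise power.
    for power in (0.1, 0.2, 0.5):
        energy, delay = transmission_energy(1e6, 3.2e6, power, 1e-6, 1e-9)
        print(f"p = {power:.1f} W  delay = {delay:.4f} s  energy = {energy:.5f} J")
```

Under this model the rate grows only logarithmically with the transmit power, so the delay shrinks more slowly than the power grows and the product of the two increases. The same function also reproduces the bandwidth trend discussed for Figure 7: increasing `bandwidth_hz` raises the rate and lowers both the delay and the energy.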
 We define the SINR as 
. It can be seen from Figure 12 that at a greater SINR, the signal is stronger relative to the interference and noise. The signal transmission quality is therefore better, which improves transmission efficiency and helps to reduce energy consumption during transmission. At a small SINR, the proposed scheme (caching strategy optimization, UAV hovering position optimization, partial offloading optimization, and STAR-RIS transmission and reflection coefficient matrix optimization) consumes significantly less energy than the benchmark strategies. As the SINR increases, however, the better channel state greatly reduces the transmission energy consumed when offloading tasks to a more distant cloud for processing. The gap between the energy consumption of the benchmark schemes and that of the proposed scheme therefore gradually narrows, although the proposed joint optimization strategy still achieves the lowest energy consumption.
Figure 13 illustrates the convergence of the average weighted reward per episode for the DRL caching agent at varying learning rates. The DRL caching agent converges rapidly at all tested learning rates and achieves its best performance at a learning rate of 0.0003. A higher learning rate gives the most recent update more influence than previously learned estimates, leading to faster updates. However, excessively high learning rates may destabilize the learning process and hinder convergence by over-adjusting to recent rewards. Therefore, selecting an appropriate learning rate is crucial to balancing learning speed and stability.
 The above simulation results demonstrate the effectiveness of the proposed DRL-SCA algorithm in optimizing energy consumption in the aerial STAR-RIS-aided computing offloading and content caching framework. Across various scenarios, our joint optimization strategy consistently outperforms benchmark methods. Specifically, the results reveal that the proposed framework achieves faster convergence in energy optimization, as shown by the rapid decrease in energy consumption over iterations. The analysis highlights that key factors such as network bandwidth, computation task size, CPU cycles per bit, STAR-RIS element count, and sensors’ transmission power significantly impact the system’s energy performance. For instance, the proposed strategy exhibits substantial energy savings, particularly under conditions of limited network resources or high computational demand, owing to the efficient coordination of offloading and caching decisions, UAV hovering positions, and STAR-RIS passive beamforming. Additionally, the results demonstrate that higher STAR-RIS element counts and better SINR conditions further enhance energy efficiency by improving channel gain and transmission quality. The sensitivity analysis of DRL caching learning rates confirms the stability and rapid convergence of the DRL component, with optimal performance achieved at a learning rate of 0.0003. Overall, the results validate that the proposed framework achieves superior energy efficiency and scalability while maintaining robustness under dynamic conditions, showcasing its applicability in real-world UAV-based wireless sensor networks.
  8. Conclusions
In this paper, the energy-efficient STAR-RIS-aided computing offloading and content caching framework is proposed in order to meet the service requirements of delay-sensitive tasks for UAV-based WSNs. Firstly, we formulated the system energy consumption minimization problem, aiming to jointly optimize content caching decisions, computing offloading decisions, UAV hovering positions, and STAR-RIS passive beamforming. Subsequently, to tackle the non-convex problem of system energy consumption minimization, we decomposed it into four subproblems and proposed a DRL-SCA algorithm for iterative optimization, achieving near-optimal solutions with low complexity. According to numerical results, the suggested framework significantly reduces the network energy consumption of the overall system in aerial STAR-RIS-aided WSNs, exhibiting a fast convergence rate.
In our future research, we plan to investigate the impact of various design parameters, including the power-to-weight ratio, to further optimize the overall system performance. We will explore the application of multi-agent proximal policy optimization (MAPPO) to enhance the content caching decisions, computing offloading decisions, UAV hovering positions, and STAR-RIS passive beamforming decision process by enabling collaborative optimization among multiple agents, further improving adaptability and the overall performance of the framework. To enhance real-world applicability, future work will focus on developing lightweight optimization models for key parameters and testing the framework under dynamic conditions, such as varying data loads, network congestion, and hardware constraints. Additionally, we will incorporate more complex real-world factors, including UAV mobility, multi-task coordination, and heterogeneous sensor networks, to further expand the framework’s applicability. Finally, we plan to explore advanced feature extraction and training techniques to improve the scalability of MAPPO algorithms for large-scale and complex network environments.