Deep Reinforcement Learning for Secure and Low-Latency Communications in UAV-Mounted STAR-RIS Assisted Urban Vehicular Networks
Abstract
1. Introduction
- A UAV-mounted STAR-RIS-assisted urban vehicular communication framework is established by jointly considering urban blockage, dynamic vehicle mobility, passive eavesdropping threats, queueing delay, and UAV mobility constraints.
- A long-term, secure, and low-latency utility maximization problem is formulated to jointly optimize the UAV trajectory, the STAR-RIS transmission–reflection partition ratio, the phase-shift matrices, and the transmit power allocation, resulting in a high-dimensional and strongly coupled continuous-control problem.
- A hierarchical constrained soft actor–critic-based joint optimization algorithm is proposed to address the above problem. The developed method improves the adaptability of UAV-mounted STAR-RIS control in dynamic vehicular scenarios and enhances the trade-off between secrecy performance and delay efficiency.
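- Simulation results demonstrate that the proposed method attains the highest normalized composite utility among all schemes (0.9254, versus 0.6630 for PPO and 0.5060 for DDPG) and outperforms all structural benchmark schemes on every reported metric. Compared with PPO and DDPG, it achieves lower average delay and lower secrecy outage probability, while its average secrecy rate and successful service ratio remain below those of the two DRL baselines (for example, 0.3928 versus 0.4927 Mbps for PPO).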
2. System Model
2.1. Geometry and Mobility Model
2.2. Channel Gain Model
2.3. Imperfect CSI and Practical STAR-RIS Constraints
2.4. Vehicle State Acquisition and Control Signaling
2.5. Secure Communication Model
2.6. Queue Evolution and Low-Latency Service Model
2.7. UAV Flight Energy Consumption Model
3. MDP Modeling and Problem Transformation
3.1. MDP Modeling
3.2. State Space Design
- the current UAV position;
- the set of legitimate user positions;
- the set of eavesdropper positions;
- the traffic queue states;
- the channel-state features composed of the BS–STAR-RIS, STAR-RIS–user, and STAR-RIS–eavesdropper links;
- the STAR-RIS partition ratio and the power allocation of the previous slot.
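The state components listed above can be flattened into a single observation vector before being fed to the policy networks. The following is a minimal sketch, not the authors' implementation: the function name `build_state`, the argument names, the 2-D user coordinates, and the choice of one scalar channel feature per link are all assumptions made for illustration.

```python
import numpy as np

def build_state(uav_pos, user_pos, eve_pos, queues, csi_feat, prev_ratio, prev_power):
    """Concatenate the listed state components into one flat float32 vector.

    All argument names and shapes are hypothetical; the paper's own symbols
    were lost in extraction.
    """
    parts = [uav_pos, user_pos, eve_pos, queues, csi_feat, [prev_ratio], prev_power]
    return np.concatenate([np.asarray(p, dtype=np.float32).ravel() for p in parts])

# Toy example with K = 6 users (as in the simulation table) and E = 2 eavesdroppers.
K, E = 6, 2
rng = np.random.default_rng(0)
state = build_state(
    uav_pos=rng.uniform(0, 500, size=3),        # (x, y, altitude)
    user_pos=rng.uniform(0, 500, size=(K, 2)),  # ground-plane coordinates
    eve_pos=rng.uniform(0, 500, size=(E, 2)),
    queues=rng.integers(0, 10, size=K),         # backlog per user
    csi_feat=rng.normal(size=1 + K + E),        # assumed: one scalar per link
    prev_ratio=0.5,                             # previous partition ratio
    prev_power=np.full(K, 1.0 / K),             # previous power allocation
)
print(state.shape)
```

Under these assumed shapes the vector length grows linearly with the number of users and eavesdroppers, which is one reason the joint problem becomes high-dimensional.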
3.3. Action Space Design
- the UAV displacement control at slot n;
- the STAR-RIS reflection ratio, from which the transmission ratio is obtained;
- the reflection phase-shift vector;
- the transmission phase-shift vector;
- the power allocation vector.
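One way to keep such a continuous action feasible is to squash the raw policy output and rescale each block to its admissible range. The sketch below is an assumption-laden illustration, not the mapping used in the paper: the action layout, the function name `map_action`, the energy-conserving rule "transmission ratio = 1 − reflection ratio", the softmax power split, and the 1 W power budget are all hypothetical. The defaults for `v_max` and `dt` follow the simulation table (20 m/s, 1 s slots).

```python
import numpy as np

def map_action(raw, M=64, K=6, v_max=20.0, dt=1.0, p_max=1.0):
    """Map an unbounded raw action to feasible controls (illustrative layout).

    Assumed layout: [dx, dy, ratio, M reflection phases, M transmission phases,
    K power logits].
    """
    raw = np.asarray(raw, dtype=np.float64)
    if raw.shape != (3 + 2 * M + K,):
        raise ValueError("unexpected action dimension")
    a = np.tanh(raw)                         # squash every entry into [-1, 1]

    max_step = v_max * dt                    # farthest the UAV can move per slot
    disp = a[:2] * max_step
    norm = np.linalg.norm(disp)
    if norm > max_step:                      # project back onto the speed limit
        disp = disp * (max_step / norm)

    beta_r = 0.5 * (a[2] + 1.0)              # reflection ratio in [0, 1]
    beta_t = 1.0 - beta_r                    # assumed energy-conservation rule

    theta_r = np.pi * (a[3:3 + M] + 1.0)             # phases in [0, 2*pi]
    theta_t = np.pi * (a[3 + M:3 + 2 * M] + 1.0)

    logits = raw[3 + 2 * M:]
    weights = np.exp(logits - logits.max())  # numerically stable softmax
    power = p_max * weights / weights.sum()  # allocation sums to the budget

    return {"disp": disp, "beta_r": beta_r, "beta_t": beta_t,
            "theta_r": theta_r, "theta_t": theta_t, "power": power}

out = map_action(np.random.default_rng(1).normal(size=3 + 2 * 64 + 6))
print(np.linalg.norm(out["disp"]), out["beta_r"] + out["beta_t"], out["power"].sum())
```

The softmax split always spends the full budget; a design that allows idle power would need an extra scaling variable.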
3.4. Reward Function Design
4. HC-SAC-Based Joint Optimization Algorithm
4.1. HC-SAC Framework Overview
4.2. Hierarchical Constrained SAC Update
4.3. Feasibility Mapping of Continuous Actions
4.4. Algorithm Procedure
Algorithm 1. Proposed HC-SAC-based joint optimization algorithm.
4.5. Complexity and Convergence Discussion
5. Simulation Results and Discussion
5.1. Simulation Settings
5.2. Simulation Setup and Evaluation Metrics
5.3. Benchmark Schemes
5.4. Training Behavior Comparison
5.5. Delay Performance Analysis
5.6. Secure Communication Performance Analysis
5.7. Service Reliability Analysis
5.8. Composite Utility Analysis
5.9. Ablation Study
5.10. Robustness Analysis Under Practical Non-Idealities
5.11. Reference Performance and Composite Utility Comparison
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| STAR-RIS | Simultaneously transmitting and reflecting reconfigurable intelligent surface |
| UAV | Unmanned aerial vehicle |
| SAC | Soft actor–critic |
| DRL | Deep reinforcement learning |
References
1. Meng, K.; Masouros, C.; Petropulu, A.P.; Hanzo, L. Cooperative ISAC networks: Opportunities and challenges. IEEE Wirel. Commun. 2024, 32, 212–219.
2. Liu, Q.; Luo, R.; Liang, H.; Liu, Q. Energy-efficient joint computation offloading and resource allocation strategy for ISAC-aided 6G V2X networks. IEEE Trans. Green Commun. Netw. 2023, 7, 413–423.
3. Cheng, X.; Duan, D.; Gao, S.; Yang, L. Integrated sensing and communications (ISAC) for vehicular communication networks (VCN). IEEE Internet Things J. 2022, 9, 23441–23451.
4. Yu, K.; Li, K.; Zhao, Y.; Feng, Z.; Li, D.; Zhang, Q.; Yu, J. Movable Antenna-Aided Secure V2X Communication: An Integrated Sensing and Communication Perspective. IEEE Wirel. Commun. 2025, 32, 118–124.
5. Hasan, M.; Mohan, S.; Shimizu, T.; Lu, H. Securing vehicle-to-everything (V2X) communication platforms. IEEE Trans. Intell. Veh. 2020, 5, 693–713.
6. Gyawali, S.; Xu, S.; Qian, Y.; Hu, R.Q. Challenges and solutions for cellular based V2X communications. IEEE Commun. Surv. Tutor. 2020, 23, 222–255.
7. Hakeem, S.A.A.; Kim, H. Advancing intrusion detection in V2X networks: A comprehensive survey on machine learning, federated learning, and edge AI for V2X security. IEEE Trans. Intell. Transp. Syst. 2025, 26, 11137–11205.
8. ElMossallamy, M.A.; Zhang, H.; Song, L.; Seddik, K.G.; Han, Z.; Li, G.Y. Reconfigurable intelligent surfaces for wireless communications: Principles, challenges, and opportunities. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 990–1002.
9. Mu, X.; Liu, Y.; Guo, L.; Lin, J.; Schober, R. Simultaneously transmitting and reflecting (STAR) RIS aided wireless communications. IEEE Trans. Wirel. Commun. 2021, 21, 3083–3098.
10. Aung, P.S.; Nguyen, L.X.; Tun, Y.K.; Han, Z.; Hong, C.S. Deep reinforcement learning-based joint spectrum allocation and configuration design for STAR-RIS-assisted V2X communications. IEEE Internet Things J. 2023, 11, 11298–11311.
11. Andreou, A.; Mavromoustakis, C.X.; Batalla, J.M.; Markakis, E.K.; Mastorakis, G. UAV-assisted RSUs for V2X connectivity using voronoi diagrams in 6G+ infrastructures. IEEE Trans. Intell. Transp. Syst. 2023, 24, 15855–15865.
12. Peng, Y.; Tang, J.; Yang, Q.; Han, Z.; Ma, J. Joint power allocation algorithm for UAV-borne simultaneous transmitting and reflecting reconfigurable intelligent surface-assisted non-orthogonal multiple access system. IEEE Access 2023, 11, 140506–140518.
13. Nakazato, J.; So, H.; Tran, G.K.; Suto, K. Multi-Agent Reinforcement Learning for Resilient UAV Ad Hoc Backhaul Networks. IEEE J. Miniaturization Air Space Syst. 2026, 7, 232–245.
14. Li, S.; Duo, B.; Yuan, X.; Liang, Y.C.; Di Renzo, M. Reconfigurable Intelligent Surface Assisted UAV Communication: Joint Trajectory Design and Passive Beamforming. IEEE Wirel. Commun. Lett. 2020, 9, 716–720.
15. Yang, L.; Meng, F.; Zhang, J.; Hasna, M.O.; Di Renzo, M. On the Performance of RIS-Assisted Dual-Hop UAV Communication Systems. IEEE Trans. Veh. Technol. 2020, 69, 10385–10390.
16. Liu, X.; Liu, Y.; Chen, Y. Machine Learning Empowered Trajectory and Passive Beamforming Design in UAV-RIS Wireless Networks. IEEE J. Sel. Areas Commun. 2021, 39, 2042–2055.
17. Li, S.; Duo, B.; Di Renzo, M.; Tao, M.; Yuan, X. Robust Secure UAV Communications with the Aid of Reconfigurable Intelligent Surfaces. IEEE Trans. Wirel. Commun. 2021, 20, 6402–6417.
18. Shi, E.; Zhang, J.; Du, H.; Ai, B.; Yuen, C.; Niyato, D. RIS-Aided Cell-Free Massive MIMO Systems for 6G: Fundamentals, System Design, and Applications. Proc. IEEE 2024, 112, 331–364.
19. Saikia, P.; Pala, S.; Singh, K.; Singh, S.K.; Huang, W.J. Proximal policy optimization for RIS-assisted full duplex 6G-V2X communications. IEEE Trans. Intell. Veh. 2023, 9, 5134–5149.
20. Long, X.; Zhao, Y.; Wu, H.; Xu, C.Z. Deep reinforcement learning for integrated sensing and communication in RIS-assisted 6G V2X system. IEEE Internet Things J. 2024, 11, 39834–39849.
21. Wang, C.; Li, Z.; Xia, X.G.; Shi, J.; Si, J.; Zou, Y. Physical layer security enhancement using artificial noise in cellular vehicle-to-everything (C-V2X) networks. IEEE Trans. Veh. Technol. 2020, 69, 15253–15268.
22. de Lima, D.V.; da Costa, J.P.J.; da Silva, A.A.S.; Santos, G.A.; Vargas, J.A.R.; de Alexandria, A.R. Broadband Beamforming via Frequency Invariance Transformation and PARAFAC Decomposition for Jamming Mitigation in V2X Scenarios. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring); IEEE: New York, NY, USA, 2024; pp. 1–6.
23. Shang, C.; Yu, J.; Hoang, D.T. Energy-efficient and intelligent ISAC in V2X networks with spiking neural networks-driven DRL. IEEE Trans. Wirel. Commun. 2026, 25, 1182–1195.
24. Li, Z.; Liao, L.; Gu, S.; Zhao, J. Physical Layer Eavesdropping Defense Scheme for V2X Based on Improved SAC Algorithm. Phys. Commun. 2025, 74, 102980.
25. Amudha, S.; Sivaradje, G.; Nagarajan, G. Hyperparameter-Tuned PPO-Based Federated Deep Reinforcement Learning (FDRL) with Explainability for Efficient V2X Resource Allocation in 5G Networks. In Proceedings of the International Conference on Artificial Intelligence and Secure Data Analytics (ICAISDA 2025); Atlantis Press: Dordrecht, The Netherlands, 2026; pp. 559–573.
26. Mlika, Z.; Cherkaoui, S. Deep deterministic policy gradient to minimize the age of information in cellular V2X communications. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23597–23612.









| Work | Main Scenario | UAV Mobility | RIS/STAR-RIS | V2X | Security | Queue Latency | DRL/SAC | Hierarchical Constraint Control |
|---|---|---|---|---|---|---|---|---|
| [14] | RIS-assisted UAV communication with joint trajectory and passive beamforming design | ✓ | RIS | – | – | – | – | – |
| [15] | Performance analysis of RIS-assisted dual-hop UAV communication | ✓ | RIS | – | – | – | – | – |
| [16] | Machine-learning-empowered UAV-RIS wireless network | ✓ | RIS | – | – | – | ✓ | – |
| [17] | Robust secure UAV communication with RIS | ✓ | RIS | – | ✓ | – | – | – |
| [10] | DRL-based spectrum allocation and STAR-RIS configuration for V2X | – | STAR-RIS | ✓ | – | – | ✓ | – |
| [12] | UAV-borne STAR-RIS-assisted NOMA communication | ✓ | STAR-RIS | – | – | – | – | – |
| [19] | PPO-based RIS-assisted full-duplex 6G-V2X communication | – | RIS | ✓ | – | – | ✓ | – |
| [20] | DRL-based ISAC optimization in RIS-assisted 6G V2X systems | – | RIS | ✓ | – | – | ✓ | – |
| [24] | Improved SAC-based physical-layer eavesdropping defense for V2X | – | – | ✓ | ✓ | – | SAC | – |
| This work | UAV-mounted STAR-RIS-assisted secure and low-latency urban vehicular communication | ✓ | STAR-RIS | ✓ | ✓ | ✓ | HC-SAC | ✓ |

| Parameter | Description | Value |
|---|---|---|
| K | Number of vehicular users | 6 |
| E | Number of eavesdroppers | 1–4 |
| M | Number of STAR-RIS elements | 64 |
| N | Number of time slots | 30 |
| | Slot duration | 1 s |
| H | UAV flight altitude | 80 m |
| | Maximum UAV speed | 20 m/s |
| | Maximum transmit power | 20–40 dBm |
| W | System bandwidth | 10 MHz |
| | Noise power | dBm |
| | Path loss at reference distance | dB |
| | Path-loss exponent range | – |
| | Rician factor range | –12 |
| | Vehicle speed range | 6–22 m/s |
| | Packet arrival rate | 2 packets/slot |
| | Packet size | bits |
| | Maximum tolerable delay | 12 slots |
| | Secrecy outage threshold | Mbps |
| | Maximum tolerable SOP | |

| Parameter | Description | Value |
|---|---|---|
| | Blade profile power | W |
| | Induced power in hovering | W |
| | Rotor blade tip speed | 120 m/s |
| | Mean rotor induced velocity | m/s |
| | Fuselage drag ratio | |
| | Air density | kg/m³ |
| s | Rotor solidity | |
| A | Rotor disc area | m² |
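
The parameters in the table above (blade profile power, induced hovering power, tip speed, mean rotor induced velocity, fuselage drag ratio, air density, rotor solidity, disc area) are the ones that appear in the widely used rotary-wing propulsion power model. The paper's own energy equation is not present in this excerpt, so the sketch below assumes that standard model. Apart from the 120 m/s tip speed, every default value is a commonly used placeholder chosen here for illustration, not a value recovered from the table.

```python
import math

def propulsion_power(v, p0=79.86, p_i=88.63, u_tip=120.0, v0=4.03,
                     d0=0.6, rho=1.225, s=0.05, area=0.503):
    """Rotary-wing propulsion power (W) at horizontal speed v (m/s).

    Assumed standard model: blade profile + induced + parasite power.
    Defaults other than u_tip are illustrative placeholders.
    """
    blade = p0 * (1.0 + 3.0 * v**2 / u_tip**2)
    induced = p_i * math.sqrt(
        math.sqrt(1.0 + v**4 / (4.0 * v0**4)) - v**2 / (2.0 * v0**2)
    )
    parasite = 0.5 * d0 * rho * s * area * v**3
    return blade + induced + parasite

def slot_energy(v, dt=1.0):
    """Flight energy (J) spent in one slot of length dt seconds at speed v."""
    return propulsion_power(v) * dt

for speed in (0.0, 10.0, 20.0):
    print(speed, round(propulsion_power(speed), 2), round(slot_energy(speed), 2))
```

With these placeholder values, hovering is not the cheapest operating point: power first drops as speed increases and then rises again as parasite drag dominates, which is why trajectory and energy are coupled.
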

| Parameter | Value |
|---|---|
| High-level actor hidden layers | 256-256-128 |
| Low-level actor hidden layers | 256-256-128 |
| Critic hidden layers | 256-256-128 |
| Activation function | ReLU |
| Actor learning rate | |
| Critic learning rate | |
| Entropy temperature learning rate | |
| Constraint multiplier learning rate | |
| Replay buffer size | |
| Mini-batch size | 256 |
| Discount factor | |
| Soft update coefficient | |
| Initial entropy temperature | |
| Target entropy | − |
| Training episodes | 420 |
| Time slots per episode | N |
| Warm-up steps | 2200 |
| Evaluation interval | Every 10 episodes |
| Evaluation episodes | 24 |
| Optimizer | Adam |
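
The hyperparameter table lists a constraint multiplier learning rate and a soft update coefficient, but the corresponding update rules are not shown in this excerpt. A common pattern in constrained actor–critic methods is projected dual ascent for the multiplier and Polyak averaging for the target networks. The sketch below illustrates that pattern only; the function names, the numeric values, and the choice of average delay as the constrained cost are assumptions.

```python
import numpy as np

def update_multiplier(lam, avg_cost, limit, lr):
    """Projected dual ascent: raise lam when the constraint is violated,
    lower it otherwise, and never let it become negative."""
    return max(0.0, lam + lr * (avg_cost - limit))

def soft_update(target, online, tau):
    """Polyak averaging: move each target parameter a fraction tau toward
    the corresponding online parameter."""
    return {name: (1.0 - tau) * target[name] + tau * online[name] for name in target}

# Hypothetical numbers: a 12-slot delay limit (from the simulation table)
# and an illustrative step size of 0.1.
lam = 0.5
lam = update_multiplier(lam, avg_cost=13.0, limit=12.0, lr=0.1)  # violated: lam grows
lam = update_multiplier(lam, avg_cost=10.0, limit=12.0, lr=0.1)  # satisfied: lam shrinks

target = {"w": np.zeros(3)}
online = {"w": np.ones(3)}
target = soft_update(target, online, tau=0.005)  # tau value is illustrative
print(lam, target["w"])
```
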

| Scheme | Description |
|---|---|
| Proposed HC-SAC | The proposed hierarchical constrained SAC scheme jointly optimizes the UAV trajectory, STAR-RIS transmission–reflection partition ratio, phase-shift matrices, and transmit power allocation. The hierarchical policy structure decouples large-scale UAV and partition control from fine-grained phase/power control, while the constraint-aware reward shaping mechanism improves the security–latency trade-off. |
| PPO-based scheme | The proximal policy optimization algorithm is adopted to learn the joint control policy. It uses the same state space, action space, feasibility mapping, reward components, and simulation environment as the proposed HC-SAC. The main difference lies in the on-policy clipped policy update mechanism. |
| DDPG-based scheme | The deep deterministic policy gradient algorithm is used as another DRL-based baseline. It adopts the same state representation, action mapping rules, reward function, and environmental settings as HC-SAC. Different from HC-SAC, DDPG learns a deterministic policy and does not use entropy-regularized exploration. |
| Fixed UAV trajectory | The UAV follows a predefined straight-line trajectory, while the STAR-RIS transmission–reflection partition ratio, phase-shift matrices, and transmit power allocation are optimized. This benchmark is used to evaluate the contribution of UAV trajectory optimization. |
| Fixed STAR-RIS partition | The STAR-RIS transmission–reflection partition ratio is fixed during the whole service period, while the UAV trajectory, phase-shift matrices, and transmit power allocation are optimized. This benchmark is used to evaluate the benefit of adaptive STAR-RIS transmission–reflection partitioning. |
| Random phase-shift scheme | The STAR-RIS phase shifts are randomly generated, while the UAV trajectory, transmission–reflection partition ratio, and transmit power allocation are optimized. This benchmark is used to evaluate the importance of the STAR-RIS phase-shift optimization. |
| No STAR-RIS scheme | The UAV provides aerial assistance without STAR-RIS-enabled propagation reconfiguration. Only the UAV trajectory and transmit power allocation are optimized. This benchmark is used to quantify the performance gain brought by the UAV-mounted STAR-RIS. |

| Variant | Secrecy Rate (Mbps) | Delay (slots) | SOP | SSR | Utility |
|---|---|---|---|---|---|
| Standard SAC | 0.476 ± 0.003 | 11.753 ± 0.133 | 0.912 ± 0.004 | 0.105 ± 0.001 | 0.146 ± 0.013 |
| HC-SAC w/o hierarchy | 0.516 ± 0.008 | 11.390 ± 0.129 | 0.880 ± 0.002 | 0.114 ± 0.002 | 0.452 ± 0.018 |
| HC-SAC w/o constraints | 0.388 ± 0.003 | 10.831 ± 0.116 | 0.738 ± 0.002 | 0.102 ± 0.001 | 0.552 ± 0.001 |
| Proposed HC-SAC | 0.433 ± 0.004 | 10.440 ± 0.109 | 0.714 ± 0.002 | 0.109 ± 0.001 | 0.851 ± 0.020 |

| Setting | Secrecy Rate (Mbps) | Delay (slots) | SOP | SSR |
|---|---|---|---|---|
| Ideal CSI, continuous phase | 0.488 ± 0.009 | 10.112 ± 0.030 | 0.698 ± 0.002 | 0.121 ± 0.001 |
| CSI error std. = 0.02 | 0.492 ± 0.010 | 10.282 ± 0.049 | 0.695 ± 0.001 | 0.123 ± 0.001 |
| CSI error std. = 0.05 | 0.492 ± 0.010 | 10.282 ± 0.049 | 0.696 ± 0.001 | 0.123 ± 0.001 |
| CSI error std. = 0.10 | 0.492 ± 0.010 | 10.282 ± 0.049 | 0.695 ± 0.001 | 0.123 ± 0.001 |
| 3-bit phase quantization | 0.488 ± 0.009 | 10.113 ± 0.030 | 0.699 ± 0.002 | 0.121 ± 0.001 |
| 2-bit phase quantization | 0.488 ± 0.009 | 10.112 ± 0.030 | 0.698 ± 0.002 | 0.121 ± 0.001 |
| 1-bit phase quantization | 0.488 ± 0.009 | 10.111 ± 0.030 | 0.698 ± 0.002 | 0.121 ± 0.001 |
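
The quantized rows in the table above restrict each STAR-RIS phase shift to a finite set of levels. A minimal sketch of uniform b-bit quantization is given below, assuming evenly spaced levels on [0, 2π) and nearest-level rounding with wrap-around; the paper's exact quantizer is not shown in this excerpt, and the function name is hypothetical.

```python
import numpy as np

def quantize_phase(theta, bits):
    """Round each phase to the nearest of 2**bits uniformly spaced levels
    on [0, 2*pi), wrapping the top level back to 0."""
    levels = 2 ** bits
    step = 2.0 * np.pi / levels
    wrapped = np.mod(theta, 2.0 * np.pi)
    idx = np.round(wrapped / step).astype(int) % levels
    return idx * step

theta = np.random.default_rng(0).uniform(0.0, 2.0 * np.pi, size=64)  # M = 64 elements
for bits in (1, 2, 3):
    q = quantize_phase(theta, bits)
    # Wrapped angular error between the continuous and quantized phases.
    err = np.abs(np.angle(np.exp(1j * (theta - q))))
    print(bits, np.unique(q).size, err.max())
```

The worst-case wrapped error of this quantizer is half a step, that is π / 2^b radians.
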

| Scheme | Avg. Secrecy Rate (Mbps) | Avg. Delay (Slots) | SOP | SSR | Composite Utility |
|---|---|---|---|---|---|
| Proposed HC-SAC | 0.3928 | 10.8471 | 0.7160 | 0.1004 | 0.9254 |
| PPO-based scheme | 0.4927 | 11.7230 | 0.8501 | 0.1163 | 0.6630 |
| DDPG-based scheme | 0.4171 | 11.9358 | 0.8599 | 0.1012 | 0.5060 |
| Fixed UAV trajectory | 0.1811 | 12.4940 | 0.9058 | 0.0628 | 0.2403 |
| Fixed STAR-RIS partition | 0.2233 | 12.2493 | 0.8932 | 0.0681 | 0.3241 |
| Random phase-shift scheme | 0.1792 | 12.4942 | 0.9036 | 0.0624 | 0.2480 |
| No STAR-RIS scheme | 0.0903 | 13.3957 | 0.9473 | 0.0497 | 0.0000 |
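
To make the comparison in the table above easier to read, the relative gains of HC-SAC over the two DRL baselines can be computed directly from the reported values. The snippet below is plain arithmetic on the table entries; the helper name `relative_reduction` is introduced here for illustration only.

```python
# Values copied from the comparison table above.
results = {
    "HC-SAC": {"rate": 0.3928, "delay": 10.8471, "sop": 0.7160, "utility": 0.9254},
    "PPO":    {"rate": 0.4927, "delay": 11.7230, "sop": 0.8501, "utility": 0.6630},
    "DDPG":   {"rate": 0.4171, "delay": 11.9358, "sop": 0.8599, "utility": 0.5060},
}

def relative_reduction(baseline, proposed):
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline

ours = results["HC-SAC"]
for name in ("PPO", "DDPG"):
    base = results[name]
    delay_gain = relative_reduction(base["delay"], ours["delay"])
    sop_gain = relative_reduction(base["sop"], ours["sop"])
    rate_gap = relative_reduction(base["rate"], ours["rate"])
    print(f"vs {name}: delay -{delay_gain:.2%}, SOP -{sop_gain:.2%}, "
          f"secrecy rate -{rate_gap:.2%}, utility +{ours['utility'] - base['utility']:.4f}")
```

The output makes the trade-off explicit: lower delay and SOP are obtained together with a lower average secrecy rate than either DRL baseline.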
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Tang, J.; Yuan, J.; Zhao, H.; Chen, M.; Peng, Y. Deep Reinforcement Learning for Secure and Low-Latency Communications in UAV-Mounted STAR-RIS Assisted Urban Vehicular Networks. Sensors 2026, 26, 3469. https://doi.org/10.3390/s26113469

