Reinforcement Learning-Based Resource Allocation for Multiple Vehicles with Communication-Assisted Sensing Mechanism
Abstract
1. Introduction
1.1. Background
1.2. Related Work
1.3. Contributions
- We propose a communication-assisted sensing mechanism based on TD-ISAC. By transmitting sensing information through communication, we can effectively reduce the number of active radars in the system. Simultaneously, by employing time division between sensing and communication, we can enhance spectrum utilization and lower the probability of multiple vehicles’ transmitted signals colliding within the same sub-band, thereby further improving system performance.
- We construct a multi-vehicle sensing and communication interference model. Building this model contributes to a comprehensive understanding of the characteristics and sources of interference, enabling us to take appropriate measures to manage system interference.
- We formulate a multi-vehicle optimization problem based on the partially observable Markov decision process (POMDP) framework. With the POMDP framework, vehicles can choose sensing or communication operations based on dynamic environments adaptively and select different sub-bands to reduce multi-vehicle interference.
- To solve the optimization problem, we design a DRL algorithm using a target network and a prioritized experience replay (PER) scheme to enable multiple vehicles to better obtain the optimal strategy for time–frequency resource allocation under uncertain environmental factors.
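As a concrete illustration of the joint decision described in the contributions above, the action a CAVP takes in each frame can be modeled as a (mode, sub-band) pair and flattened into a discrete index for a DQN-style network. This is a hypothetical sketch under assumed names and layout; it is not the paper's actual action-space definition.

```python
import itertools

# Assumed operating modes: radar sensing or communication-assisted sensing
# (receiving sensory data over the communication link).
MODES = ("sense", "communicate")

def build_action_space(num_subbands):
    """Enumerate every (mode, sub-band) pair as a flat discrete action list,
    so a Q-network with len(MODES) * num_subbands outputs covers them all."""
    return list(itertools.product(MODES, range(num_subbands)))

def decode(action_index, num_subbands):
    """Map a flat Q-network output index back to a (mode, sub-band) pair."""
    return MODES[action_index // num_subbands], action_index % num_subbands
```

With M = 4 sub-bands this yields 8 discrete actions, and `decode` inverts the flattening used by `build_action_space`.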
2. Problem Formulation
2.1. Environment Model
2.2. Signal Model
2.3. Interference Model
3. POMDP Framework
3.1. Observation Space
3.2. Action Space
3.3. Reward Function
3.4. Optimal Planning
4. DDQN-PER Algorithm
Algorithm 1: DDQN-PER
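The two ingredients that distinguish DDQN-PER from vanilla DQN, namely the double-Q target and proportional prioritized experience replay, can be sketched in a few lines. All class and function names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay sketch."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.data = []
        self.priorities = []

    def add(self, transition):
        # New transitions get the current max priority so they are
        # replayed at least once before being down-weighted.
        p = max(self.priorities, default=1.0)
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size, beta=0.4):
        p = np.array(self.priorities) ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = abs(e) + eps

def double_q_target(q_online, q_target, reward, next_state, gamma, done):
    """Double DQN target: the online network selects the next action,
    the (periodically synced) target network evaluates it."""
    a_star = int(np.argmax(q_online[next_state]))
    return reward + (0.0 if done else gamma * q_target[next_state, a_star])
```

In a full training loop, the TD errors computed from `double_q_target` would be fed back through `update_priorities`, and the target network would be synchronized with the online network every fixed number of steps.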
5. Simulation Analysis
5.1. Simulation Setup
5.2. Performance Metrics
- Miss detection probability: defined as the ratio of the number of unexpected events that are neither successfully sensed nor successfully received via communication to the total number of unexpected events in each episode.
- Average system interference power: defined as the ratio of the sum of the interference power experienced by all CAVPs in an episode to the number of training steps in that episode.
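As computed at the end of an episode, the two metrics above reduce to simple ratios; a minimal sketch with illustrative function names:

```python
def miss_detection_probability(missed_events, total_events):
    """Fraction of unexpected events in an episode that were neither
    successfully sensed nor received via communication."""
    return missed_events / total_events if total_events else 0.0

def average_interference_power(interference_per_step, steps_per_episode):
    """Summed interference power over all CAVPs in an episode,
    divided by the number of training steps in that episode."""
    return sum(interference_per_step) / steps_per_episode
```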
5.3. Contrasting Approaches
- DRQN: In [24], each vehicle performs sensing independently and does not communicate with other AVs. AVs only need to allocate sensing frequency resources. The authors in [24] introduce a long short-term memory (LSTM) network into the DQN algorithm, enabling the vehicle to learn how to select sensing sub-bands by incorporating both its current and past observations. In this case, miss detection occurs only when the vehicle does not sense successfully.
- Random (CAVPs): We use a random policy to allocate time–frequency resources for the sensing and communication frames of CAVPs; i.e., each CAVP selects an action from the action space in Equation (8) with equal probability.
- DDQN (CAVPs): We use the DDQN algorithm proposed in [31] to allocate time–frequency resources for the sensing and communication frames of CAVPs.
- DQN (single CAVP): In [16], the DQN algorithm was introduced to allocate time resources for both sensing and communication functions of a single vehicle in a dynamic environment. Nevertheless, Ref. [16] did not account for interference between vehicles, implying that both sensing and communication could succeed regardless of the chosen strategy. We apply this algorithm to CAVPs and treat it as an upper performance bound against which the other algorithms are evaluated.
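The random baseline amounts to sampling uniformly from the discrete action space; a minimal sketch with illustrative names, not the paper's code:

```python
import random

def random_policy(action_space, rng=random):
    """Uniform baseline: draw a (mode, sub-band) action with equal
    probability, ignoring the observation entirely."""
    return rng.choice(action_space)
```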
5.4. Results Analysis
- The number of AVs changes, other conditions are fixed: In Figure 5, as N increases, the miss detection probability and the average system interference power increase for all algorithms except DQN. The reason is that as N increases, the mutual interference between the AVs increases, while DQN is used in a single-CAVP scenario and the effect of interference is not considered. In addition, compared with DRQN, random has a lower average interference power. This is because the proposed communication-assisted sensing mechanism allows some vehicles to acquire sensory data through communication, eliminating the requirement for all vehicles to perform sensing tasks simultaneously and effectively reducing inter-vehicle interference. Nevertheless, random has a higher miss detection probability than DRQN. This is because the miss detection probability of random is affected not only by system interference but also by the dynamic environment: random cannot choose the sensing or communication mode based on the environment, leading to a higher miss detection rate. As DDQN-PER combines the communication-assisted sensing mechanism with an adaptive time–frequency resource allocation algorithm, it can effectively control the mutual interference between vehicles and reduce the miss detection probability. Thus, compared with the other algorithms, the miss detection probability and average interference power of DDQN-PER are closer to those of DQN, and its performance does not deteriorate significantly as N increases. In conclusion, DDQN-PER proves more advantageous than the other algorithms, particularly in scenarios with a larger number of vehicles.
- The number of sub-bands changes, other conditions are fixed: In Figure 6, as M increases, the miss detection probability and the average system interference power decrease for all algorithms except DQN. The reason is that as M increases, the probability of two signals switching to the same sub-band becomes smaller. Compared with the other algorithms, the miss detection probability and average interference power of DDQN-PER are closer to those of DQN. It can be deduced that DDQN-PER holds a comparative advantage over the other algorithms, especially when the number of sub-bands is limited.
- The interval changes, other conditions are fixed: In Figure 7, when the interval lies in (50 m, 60 m), the miss detection probability of DDQN-PER is higher than that of DRQN. However, when the interval is small, DDQN-PER performs better. In addition, compared with the other algorithms, the average interference power of DDQN-PER is closer to 0. This is because when the interval is small, DRQN is greatly affected by interference and cannot effectively allocate frequency resources, while DDQN-PER can effectively control mutual interference and reduce the miss detection probability by using the communication-assisted sensing mechanism and the adaptive time–frequency resource allocation algorithm. It can be concluded that DDQN-PER is more advantageous than the other algorithms when the interval is smaller.
- The probability of the occurrence of an unexpected event (under a given condition) changes, other conditions are fixed: In Figure 8, as this probability changes, the miss detection probability of DDQN-PER and DQN changes, while those of the other algorithms remain almost unchanged. This is because DDQN-PER and DQN can adaptively allocate time–frequency resources according to the environment. As DDQN-PER is affected by multi-vehicle interference, its miss detection probability does not continue to decrease as the event probability increases, unlike DQN. Furthermore, compared with the other algorithms, the miss detection probability of DDQN-PER is closer to 0.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ma, D.; Shlezinger, N.; Huang, T.; Liu, Y.; Eldar, Y.C. Joint Radar-Communication Strategies for Autonomous Vehicles: Combining Two Key Automotive Technologies. IEEE Signal Process. Mag. 2020, 37, 85–97. [Google Scholar] [CrossRef]
- Sciuto, G.L.; Kowol, P.; Nowak, P.; Banaś, W.; Coco, S.; Capizzi, G. Neural network developed for obstacle avoidance of the four wheeled electric vehicle. In Proceedings of the 30th IEEE International Conference on Electronics, Circuits and Systems (ICECS), Istanbul, Turkey, 4–7 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–4. [Google Scholar]
- Kowol, P.; Nowak, P.; Banaś, W.; Bagier, P.; Lo Sciuto, G. Haptic feedback remote control system for electric mechanical assembly vehicle developed to avoid obstacles. J. Intell. Robot. Syst. 2023, 107, 41. [Google Scholar] [CrossRef]
- Liu, F.; Cui, Y.; Masouros, C.; Xu, J.; Han, T.X.; Eldar, Y.C.; Buzzi, S. Integrated sensing and communications: Toward dual-functional wireless networks for 6G and beyond. IEEE J. Sel. Areas Commun. 2022, 40, 1728–1767. [Google Scholar] [CrossRef]
- Feng, Z.; Fang, Z.; Wei, Z.; Chen, X.; Quan, Z.; Ji, D. Joint radar and communication: A survey. China Commun. 2020, 17, 1–27. [Google Scholar] [CrossRef]
- Hassanien, A.; Amin, M.G.; Zhang, Y.D.; Ahmad, F. Signaling strategies for dual-function radar communications: An overview. IEEE Aerosp. Electron. Syst. Mag. 2016, 31, 36–45. [Google Scholar] [CrossRef]
- Liu, Y.; Liao, G.; Xu, J.; Yang, Z.; Zhang, Y. Adaptive OFDM integrated radar and communications waveform design based on information theory. IEEE Commun. Lett. 2017, 21, 2174–2177. [Google Scholar] [CrossRef]
- Zhang, Q.; Sun, H.; Gao, X.; Wang, X.; Feng, Z. Time-Division ISAC Enabled Connected Automated Vehicles Cooperation Algorithm Design and Performance Evaluation. IEEE J. Sel. Areas Commun. 2022, 40, 2206–2218. [Google Scholar] [CrossRef]
- Luong, N.C.; Lu, X.; Hoang, D.T.; Niyato, D.; Kim, D.I. Radio Resource Management in Joint Radar and Communication: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2021, 23, 780–814. [Google Scholar] [CrossRef]
- Chiriyath, A.R.; Paul, B.; Bliss, D.W. Radar-communications convergence: Coexistence, cooperation, and co-design. IEEE Trans. Cogn. Commun. Netw. 2017, 3, 1–12. [Google Scholar] [CrossRef]
- Lee, J.; Cheng, Y.; Niyato, D.; Guan, Y.L.; González G., D. Intelligent Resource Allocation in Joint Radar-Communication with Graph Neural Networks. IEEE Trans. Veh. Technol. 2022, 71, 11120–11135. [Google Scholar] [CrossRef]
- Kumari, P.; Gonzalez-Prelcic, N.; Heath, R.W. Investigating the IEEE 802.11ad Standard for Millimeter Wave Automotive Radar. In Proceedings of the 82nd IEEE Vehicular Technology Conference (VTC2015-Fall), Boston, MA, USA, 6–9 September 2015; pp. 1–5. [Google Scholar]
- Kumari, P.; Nguyen, D.H.N.; Heath, R.W. Performance trade-off in an adaptive IEEE 802.11AD waveform design for a joint automotive radar and communication system. In Proceedings of the 42nd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 4281–4285. [Google Scholar]
- Cao, N.; Chen, Y.; Gu, X.; Feng, W. Joint Bi-Static Radar and Communications Designs for Intelligent Transportation. IEEE Trans. Veh. Technol. 2020, 69, 13060–13071. [Google Scholar] [CrossRef]
- Ren, P.; Munari, A.; Petrova, M. Performance Analysis of a Time-sharing Joint Radar-Communications Network. In Proceedings of the 2020 International Conference on Computing, Networking and Communications (ICNC), Big Island, HI, USA, 17–20 February 2020; pp. 908–913. [Google Scholar]
- Hieu, N.Q.; Hoang, D.T.; Luong, N.C.; Niyato, D. iRDRC: An Intelligent Real-Time Dual-Functional Radar-Communication System for Automotive Vehicles. IEEE Wirel. Commun. Lett. 2020, 9, 2140–2143. [Google Scholar] [CrossRef]
- Hieu, N.Q.; Hoang, D.T.; Niyato, D.; Wang, P.; Kim, D.I.; Yuen, C. Transferable Deep Reinforcement Learning Framework for Autonomous Vehicles With Joint Radar-Data Communications. IEEE Trans. Commun. 2022, 70, 5164–5180. [Google Scholar] [CrossRef]
- Fan, Y.; Huang, J.; Wang, X.; Fei, Z. Resource allocation for v2x assisted automotive radar system based on reinforcement learning. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 1–3 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 672–676. [Google Scholar]
- Lee, J.; Niyato, D.; Guan, Y.L.; Kim, D.I. Learning to Schedule Joint Radar-Communication Requests for Optimal Information Freshness. In Proceedings of the 2021 IEEE Intelligent Vehicles Symposium (IV), Nagoya, Japan, 11–17 July 2021; pp. 8–15. [Google Scholar]
- Alland, S.; Stark, W.; Ali, M.; Hegde, M. Interference in Automotive Radar Systems: Characteristics, Mitigation Techniques, and Current and Future Research. IEEE Signal Process. Mag. 2019, 36, 45–59. [Google Scholar] [CrossRef]
- Zhang, M.; He, S.; Yang, C.; Chen, J.; Zhang, J. VANET-Assisted Interference Mitigation for Millimeter-Wave Automotive Radar Sensors. IEEE Netw. 2020, 34, 238–245. [Google Scholar] [CrossRef]
- Huang, J.; Fei, Z.; Wang, T.; Wang, X.; Liu, F.; Zhou, H.; Zhang, J.A.; Wei, G. V2X-communication assisted interference minimization for automotive radars. China Commun. 2019, 16, 100–111. [Google Scholar] [CrossRef]
- Khoury, J.; Ramanathan, R.; McCloskey, D.; Smith, R.; Campbell, T. RadarMAC: Mitigating Radar Interference in Self-Driving Cars. In Proceedings of the 13th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), London, UK, 27–30 June 2016; pp. 1–9. [Google Scholar]
- Liu, P.; Liu, Y.; Huang, T.; Lu, Y.; Wang, X. Decentralized Automotive Radar Spectrum Allocation to Avoid Mutual Interference Using Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 190–205. [Google Scholar] [CrossRef]
- Chang, H.H.; Song, H.; Yi, Y.; Zhang, J.; He, H.; Liu, L. Distributive Dynamic Spectrum Access Through Deep Reinforcement Learning: A Reservoir Computing-Based Approach. IEEE Internet Things J. 2019, 6, 1938–1948. [Google Scholar] [CrossRef]
- Naparstek, O.; Cohen, K. Deep Multi-User Reinforcement Learning for Distributed Dynamic Spectrum Access. IEEE Trans. Wirel. Commun. 2019, 18, 310–323. [Google Scholar] [CrossRef]
- Lee, J.; Niyato, D.; Guan, Y.L.; Kim, D.I. Learning to Schedule Joint Radar-Communication with Deep Multi-Agent Reinforcement Learning. IEEE Trans. Veh. Technol. 2022, 71, 406–422. [Google Scholar] [CrossRef]
- Boban, M.; Kousaridas, A.; Manolakis, K.; Eichinger, J.; Xu, W. Use cases, requirements, and design considerations for 5G V2X. arXiv 2017, arXiv:1712.01754. [Google Scholar]
- Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017, 12, e0172395. [Google Scholar] [CrossRef] [PubMed]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
- Van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double q-learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
- Xing, Y.; Sun, Y.; Qiao, L.; Wang, Z.; Si, P.; Zhang, Y. Deep reinforcement learning for cooperative edge caching in vehicular networks. In Proceedings of the 13th International Conference on Communication Software and Networks (ICCSN), Chongqing, China, 4–7 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 144–149. [Google Scholar]
| Notation | Value |
|---|---|
| v | 10 m/s |
|  | 4 |
| T | s |
|  | 25 dBmW |
|  | 15 dBmW |
| G | 48 dB |
|  | 5 mm² |
| g |  |
| L | 3 m |
|  | 10 |
| Notation | Value |
|---|---|
|  | 200 |
|  | 64 |
|  | 20 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fan, Y.; Fei, Z.; Huang, J.; Wang, X. Reinforcement Learning-Based Resource Allocation for Multiple Vehicles with Communication-Assisted Sensing Mechanism. Electronics 2024, 13, 2442. https://doi.org/10.3390/electronics13132442