# Deep Reinforcement Learning Based Resource Allocation with Radio Remote Head Grouping and Vehicle Clustering in 5G Vehicular Networks


## Abstract


## 1. Introduction

## 2. Related Works

## 3. System Model and Problem Formulation

## 4. Proposed Algorithm

#### 4.1. RRH Grouping Method

#### 4.2. Vehicle Cluster Formation

- The first step is initialization, which occurs when no cluster is configured or at the start of the algorithm. At the initial step, every VE associates with the RRH providing the highest SINR and is set to V2I mode. The V2I-mode VE with the highest value ${F}_{CH}^{u}$ is then selected as the first CH, where ${F}_{CH}^{u}$ is calculated as follows:$${F}_{CH}^{u}={\gamma}_{k}^{c}\cdot {P}^{u}$$
- The next step is cluster formation and deformation, which occurs in the following situations: (i) when a VE moves to another cell; (ii) when the payload of the jumbo frame exceeds the acceptable payload capacity; (iii) when the communication distance between a CM and its CH exceeds the V2V communication distance constraint. In this step, each neighboring vehicle decides whether to associate with a cluster by comparing the SINR received from its serving RRH with the SINRs received from adjacent CHs. For this comparison, each vehicle broadcasts a message consisting of its ID, location, and mobility speed at every time unit. If the SINR from a neighboring CH is higher than that from the serving RRH, the vehicle joins the cluster of the CH providing the highest SINR.
- The last step occurs when a VE cannot join any cluster or receives a higher SINR from its serving RRH than from any neighboring CH. Such VEs, which do not belong to any cluster, are set to V2I mode.
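The three-step procedure above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the `Vehicle` container, the `f_ch` score table, and the 150 m V2V distance constraint are all assumptions made for the example.

```python
from dataclasses import dataclass, field

V2V_DIST_MAX = 150.0  # assumed V2V communication distance constraint (m)

@dataclass
class Vehicle:
    vid: int
    x: float                 # position along the road
    sinr_rrh: float          # SINR received from the serving RRH
    sinr_ch: dict = field(default_factory=dict)  # CH id -> received SINR
    mode: str = "V2I"

def select_cluster_head(vehicles, f_ch):
    """Step 1: pick the V2I-mode VE with the highest F_CH value as CH."""
    v2i = [v for v in vehicles if v.mode == "V2I"]
    return max(v2i, key=lambda v: f_ch[v.vid])

def join_decision(v, heads):
    """Steps 2-3: join the CH with the highest SINR if it beats the
    serving RRH and satisfies the distance constraint; otherwise
    remain in (or fall back to) V2I mode."""
    candidates = [h for h in heads
                  if abs(h.x - v.x) <= V2V_DIST_MAX
                  and v.sinr_ch.get(h.vid, float("-inf")) > v.sinr_rrh]
    if not candidates:
        v.mode = "V2I"
        return None
    best = max(candidates, key=lambda h: v.sinr_ch[h.vid])
    v.mode = "V2V"
    return best.vid
```

A vehicle whose serving-RRH SINR already exceeds every neighboring CH's SINR falls through to the `V2I` branch, matching the last step above.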

#### 4.3. Deep Q-Learning Based Resource Allocation Method

- The state ${s}_{t}$ observed by V2I VE c at time unit t consists of eight pieces of information: the interference received from the CH, ${I}_{t-1}^{v,c}$; the interference received from the CMs in vehicle cluster i, ${I}_{t-1}^{v,i}$; the size of cluster i, ${l}_{t}^{i}$; the load of the serving RRH, ${\lambda}_{t}^{s}$; the usage rate of RB k, ${R}_{t}^{k}$; the number of VEs sharing RB k, ${L}_{t}^{k}$; the queue length of VE c, ${L}_{t}^{c}$; and the number of clusters in the range of RRH s, ${L}_{t}^{s}$. Thus, the state can be described as follows:$${s}_{t}=\{{I}_{t-1}^{v,c},{I}_{t-1}^{v,i},{l}_{t}^{i},{\lambda}_{t}^{s},{R}_{t}^{k},{L}_{t}^{k},{L}_{t}^{c},{L}_{t}^{s}\}$$
- The action of each V2I VE c is a pair consisting of a transmission power p and a resource block k:$${a}_{t}=\{p,k\}$$
- To maximize the system capacity while guaranteeing the QoS requirements, the reward is defined as$${r}_{t}=\left\{\begin{array}{cc}\omega \cdot {\gamma}_{{k}_{i}}+(1-\omega )\cdot ({T}_{0}-{T}_{t-1}),\hfill & {\sigma}_{t-1}\le {\sigma}_{0}\hfill \\ -1,\hfill & {\sigma}_{t-1}>{\sigma}_{0}\hfill \end{array}\right.$$
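The MDP elements above can be sketched in a few lines. This is a hedged illustration under stated assumptions: the state is packed into a plain feature vector, $\sigma$ is read as the outage probability with constraint $\sigma_0$, ${T}_{0}-{T}_{t-1}$ as a delay budget minus the previously measured delay, and the weight `omega` value is arbitrary.

```python
def make_state(i_ch, i_cm, cluster_size, rrh_load,
               rb_usage, n_sharing, queue_len, n_clusters):
    """s_t = {I^{v,c}, I^{v,i}, l^i, lambda^s, R^k, L^k, L^c, L^s}
    packed as an 8-element feature vector (ordering as in the text)."""
    return [float(x) for x in (i_ch, i_cm, cluster_size, rrh_load,
                               rb_usage, n_sharing, queue_len, n_clusters)]

def reward(sinr, t0, t_prev, outage_prev, outage_max, omega=0.5):
    """r_t = omega*gamma + (1-omega)*(T0 - T_{t-1}) while the outage
    probability satisfies its constraint, and -1 otherwise."""
    if outage_prev <= outage_max:
        return omega * sinr + (1.0 - omega) * (t0 - t_prev)
    return -1.0
```

The piecewise reward mirrors the equation above: a weighted sum of the achieved SINR and the delay margin when the outage constraint holds, and a fixed penalty of −1 when it is violated.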

## 5. Simulation Results and Discussions

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References


**Figure 4.** Comparison of average SINRs with various V2V distances: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 5.** Comparison of achievable data rates with various V2V distances: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 6.** Comparison of outage probabilities with various V2V distances: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 7.** Comparison of system energy efficiencies with various V2V distances: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 8.** Comparison of average SINRs with various maximum cluster sizes: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 9.** Comparison of achievable data rates with various maximum cluster sizes: (**a**) with the low scenario; (**b**) with the high scenario.

**Figure 10.** Comparison of system energy efficiencies with various maximum cluster sizes: (**a**) with the low scenario; (**b**) with the high scenario.

| Symbol | Description |
|---|---|
| $\mathbb{U},\mathbb{C},\mathbb{V}$ | Sets of all VEs, V2I VEs, and V2V VEs |
| $\mathbb{S},\mathbb{K}$ | Sets of RRHs and resource blocks |
| ${W}^{total}$ | Total available bandwidth |
| ${p}_{tr}^{0},{p}_{k}^{v}$ | Transmission power of the RRH; of V2V VE v on the kth RB |
| $N$ | Noise power spectral density |
| ${\gamma}_{k}^{c},{\gamma}_{k}^{v}$ | SINR of the cth V2I VE and the vth V2V VE on the kth RB |
| $R$ | Total system capacity, i.e., the sum of the achieved data rates of all VEs |
| $P$ | Total system energy consumption |
| ${P}_{\mathbb{S}}$ | Power consumption of all RRHs |
| ${P}_{fronthaul}$ | Power consumption of all fronthaul links |
| $EE$ | System energy efficiency |
| $\gamma ,{\gamma}_{0}$ | SINR of a VE; SINR constraint |
| $\tau ,{\tau}_{max}$ | System outage probability; outage probability constraint |
| ${p}_{max}$ | Maximum transmission power of a VE |

| Parameter | Notation | Value |
|---|---|---|
| Noise power spectral density | $N$ | −174 dBm/Hz |
| Total bandwidth | ${W}^{total}$ | 100 MHz |
| SINR threshold | ${\gamma}_{0}$ | 0.5 dBm |
| Maximum outage probability constraint | ${\tau}_{max}$ | 0.05 |
| Circuit power of RRH | ${\pi}_{\mathbb{S}}$ | 4.3 W |
| Slope of RRH | ${\Delta}_{slope}$ | 4.0 |
| Circuit power of fronthaul transceiver and switch | ${\pi}_{fronthaul}$ | 13 W |
| Power consumption per bit/s | $\phi$ | 0.83 W |
| Transmission power of V2I mode | ${p}_{k}^{c}$ | 23 dBm |
| Transmission power of RRH | ${p}_{tr}^{0}$ | 24 dBm |
| Cluster size of RRH grouping | ${N}_{s}$ | 5 |
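The energy-efficiency metric $EE = R/P$ can be illustrated with the table's values. This is a back-of-the-envelope sketch under assumptions: the common linear RRH power model $\pi_{\mathbb{S}} + \Delta_{slope}\cdot p_{tr}^{0}$ is assumed for ${P}_{\mathbb{S}}$, the fronthaul term is taken as circuit power plus $\phi$ times the carried rate (in whatever unit $\phi$ is defined per), and the traffic figures in the example are made up.

```python
# Parameter values from the simulation table above.
PI_S = 4.3           # circuit power of an RRH (W)
DELTA_SLOPE = 4.0    # slope of the assumed linear RRH power model
PI_FRONTHAUL = 13.0  # circuit power of fronthaul transceiver and switch (W)
PHI = 0.83           # power consumption per unit rate (W)
P_TR_DBM = 24.0      # RRH transmission power (dBm)

def dbm_to_watt(dbm):
    return 10 ** ((dbm - 30) / 10)

def energy_efficiency(n_rrh, rate):
    """EE = R / (P_S + P_fronthaul), rate in the unit PHI is defined per."""
    p_tx = dbm_to_watt(P_TR_DBM)
    p_s = n_rrh * (PI_S + DELTA_SLOPE * p_tx)          # P_S, linear model
    p_fronthaul = n_rrh * PI_FRONTHAUL + PHI * rate    # P_fronthaul
    return rate / (p_s + p_fronthaul)
```

Because the fronthaul power grows with the carried rate, EE saturates below $1/\phi$ as traffic increases, which is one reason offloading traffic onto V2V clusters can raise system energy efficiency.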

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Park, H.; Lim, Y. Deep Reinforcement Learning Based Resource Allocation with Radio Remote Head Grouping and Vehicle Clustering in 5G Vehicular Networks. *Electronics* **2021**, *10*, 3015.
https://doi.org/10.3390/electronics10233015
