# Reinforcement Learning for Energy Optimization with 5G Communications in Vehicular Social Networks


## Abstract


## 1. Introduction

## 2. Materials and Methods

## 3. System Model and Problem Definition

## 4. Proposed Algorithm

#### 4.1. Centralized Q-Learning in the BBU Pool

**Algorithm 1:** Pseudocode for centralized Q-learning

```
Initialization:
  for each s_BBU^t ∈ S_BBU^t, a_BBU^t ∈ A_BBU^t do
      initialize the Q-table and the policy π*_BBU(s_BBU^t)
  end for

Learning:
  loop
      estimate the state s_BBU^t
      generate a random real number x ∈ [0, 1]
      if x < ε then                       // exploration
          select the action a_BBU^t randomly
      else                                // exploitation
          select the action a_BBU^t according to π*_BBU(s_BBU^t)
      end if
      recommend the action a_BBU^t to the devices in the cell
      calculate the reward r_BBU^t
      update the Q-value Q(s_BBU^t, a_BBU^t) and the policy π*_BBU(s_BBU^t)
  end loop
```
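The centralized learner above can be sketched in Python as follows. The state/action encodings, the reward function, and the hyperparameter values (learning rate, discount factor, exploration probability) are illustrative assumptions, not the settings used in the paper; the sketch only shows the ε-greedy selection and the standard Q-learning update that the pseudocode relies on.

```python
import random
from collections import defaultdict

# Illustrative sketch of Algorithm 1 (centralized Q-learning in the BBU pool).
# Hyperparameter values below are assumptions for demonstration only.
ALPHA = 0.1    # learning rate (assumed)
GAMMA = 0.9    # discount factor (assumed)
EPSILON = 0.1  # exploration probability ε (assumed)

class CentralizedQLearner:
    def __init__(self, actions):
        self.actions = actions
        # Q-table Q(s, a), entries initialized to 0 on first access
        self.q = defaultdict(lambda: defaultdict(float))

    def select_action(self, state):
        # ε-greedy: explore with probability EPSILON ...
        if random.random() < EPSILON:
            return random.choice(self.actions)
        # ... otherwise follow the greedy policy π*(s) = argmax_a Q(s, a)
        return max(self.actions, key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update:
        # Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a))
        best_next = max(self.q[next_state][a] for a in self.actions)
        td_target = reward + GAMMA * best_next
        self.q[state][action] += ALPHA * (td_target - self.q[state][action])
```

In a simulation loop, the BBU would call `select_action` on its estimated state, broadcast the chosen action as a recommendation to the devices in the cell, observe the resulting reward, and then call `update`.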

#### 4.2. Distributed Q-Learning in the Devices

**Algorithm 2:** Pseudocode for distributed Q-learning

```
Initialization:
  for each s_u^t ∈ S_u^t, a_u^t ∈ A_u^t do
      initialize the Q-table and the policy π*_u(s_u^t)
  end for

Learning:
  loop
      estimate the state s_u^t
      generate a random real number x ∈ [0, 1]
      if x < ε then                       // exploration
          select the action a_u^t randomly
      else                                // exploitation
          select the action a_u^t according to π*_u(s_u^t)
      end if
      receive the recommended action a_BBU^t from Algorithm 1
      determine the action a_u^* by comparing a_u^t and a_BBU^t
      execute the action a_u^*
      calculate the reward r_u^t
      update the Q-value Q(s_u^t, a_u^t) and the policy π*_u(s_u^t)
  end loop
```

#### 4.3. Target SINR Updating Algorithm

## 5. Results and Discussion

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References


**Figure 3.** Comparison of system energy efficiencies with various device-to-device (D2D) distances and system loads: (**a**) 250 m with light traffic; (**b**) 350 m with light traffic; (**c**) 250 m with heavy traffic; and (**d**) 350 m with heavy traffic.

**Figure 4.** Comparison of average achievable data rate with various D2D distances and system loads: (**a**) 250 m with light traffic; (**b**) 350 m with light traffic; (**c**) 250 m with heavy traffic; and (**d**) 350 m with heavy traffic.

**Figure 5.** Comparison of signal-to-interference-plus-noise ratio (SINR) with various D2D distances and system loads: (**a**) 250 m with light traffic; (**b**) 350 m with light traffic; (**c**) 250 m with heavy traffic; and (**d**) 350 m with heavy traffic.

**Figure 6.** Convergence of system energy efficiency as a reward function with D2D distance = 350 m: (**a**) light traffic scenario; (**b**) heavy traffic scenario.

| Parameter | Notation | Value |
|---|---|---|
| Noise power spectral density | $N$ | −174 dBm/Hz |
| Total bandwidth | ${W}^{total}$ | 100 MHz |
| SINR constraint | ${\gamma}_{0}$ | 0.5 dB |
| Maximum outage probability constraint | ${\tau}_{max}$ | 0.05 |
| Minimum outage probability constraint | ${\tau}_{min}$ | 0.01 |
| Circuit power of RRH | ${\varphi}_{rrh}$ | 4.3 W |
| Slope of RRH | ${\Delta}_{slope}$ | 4.0 |
| Circuit power of fronthaul transceiver and switch | ${\varphi}_{fronthaul}$ | 13 W |
| Power consumption per bit/s | $\ell$ | 0.83 W |
| Transmission power of cellular device | ${p}_{k}^{c}$ | 23 dBm |
| Transmission power of RRH | ${p}_{tr}^{0}$ | 24 dBm |

**Table 2.** Average system energy efficiencies with varying system loads, D2D distance = 350 m.

| System Load (%) | 40 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|
| Optimal | 4.8414 | 5.2831 | 4.9594 | 5.4548 | 5.5733 |
| BA | 1.9857 | 1.0919 | 0.8479 | 0.7086 | 0.6553 |
| Proposed | 3.8026 | 2.5863 | 1.9223 | 1.5863 | 1.4909 |
| Compare1 | 3.2371 | 1.9767 | 1.5212 | 1.2576 | 1.1491 |
| Compare2 | 1.9429 | 1.0723 | 0.8184 | 0.6992 | 0.6376 |

**Table 3.** Average achievable data rates with varying system loads, D2D distance = 350 m.

| System Load (%) | 40 | 50 | 60 | 70 | 80 |
|---|---|---|---|---|---|
| Optimal | 183.1308 | 184.869 | 181.0795 | 180.4453 | 179.5396 |
| BA | 130.6982 | 124.3168 | 123.1323 | 122.5992 | 119.5717 |
| Proposed | 168.763 | 158.1163 | 138.6988 | 131.8081 | 123.8312 |
| Compare1 | 115.4847 | 98.18192 | 88.934 | 81.2835 | 72.47508 |
| Compare2 | 126.4065 | 122.9883 | 120.3971 | 119.4175 | 117.3408 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Park, H.; Lim, Y. Reinforcement Learning for Energy Optimization with 5G Communications in Vehicular Social Networks. *Sensors* **2020**, *20*, 2361.
https://doi.org/10.3390/s20082361
