Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments
Abstract
1. Introduction
- We optimize the precoding matrices at the BS and the reflecting phase shifts at the RIS based on statistical CSI. The objective is to maximize the minimum user data rate, which ensures fairness among the users, while taking hardware impairments into account.
- Because of the expectation operator and the hardware impairments, it is challenging to derive a closed-form data rate expression. Furthermore, the max-min objective function is non-smooth and non-differentiable. As a result, existing algorithms based on mathematical derivations are not applicable. Instead, we resort to the deep deterministic policy gradient (DDPG) algorithm to solve this challenging optimization problem.
- The algorithm converges quickly, within 600–900 iterations. The overall computational complexity comes mainly from calculating the rewards, which involves only simple mathematical operations. In addition, the calculated parameters can be reused in subsequent steps. Once the neural networks are trained, they can be applied directly in real-time applications, and they need to be retrained only when the statistical CSI changes. Hence, the computational complexity is not high.
2. System Model
3. Proposed Algorithm
3.1. Transmission Scheme
3.2. DDPG Algorithm
- (1) Actor current network: iteratively updates the policy network parameters and selects the current action according to the state at time step t. The state is composed of three parts: the beamforming matrix, the phase shift matrix, and the channel matrices. The actor current network also interacts with the environment to generate the next state and the reward, whose definition builds on the quantity given in (15). A loss function is defined for training this network.
- (2) Actor target network: selects the optimal action based on the state sampled from the experience replay buffer. The action is composed of two parts: the first N elements correspond to the phase shifts of the RIS reflecting elements, and the remaining elements correspond to the real part and the imaginary part of the beamforming matrix, respectively. The action is used to obtain the optimized beamforming matrix and phase shift matrix. The target network parameters are periodically copied from the current network parameters using the soft update method with a soft update factor.
- (3) Critic current network: iteratively updates the value network parameters and calculates the current Q value. Both the target Q value and the loss function are defined for this network.
- (4) Critic target network: calculates the portion of the target Q value that is obtained from the target networks. Its parameters are periodically copied from those of the critic current network using the soft update method with a soft update factor.

At the same time, to add randomness and broaden the coverage of exploration during learning, the DDPG algorithm adds noise to the selected action A. That is, the final action that interacts with the environment is the selected action A plus this noise.
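The four networks described above rely on a few update rules that are standard in DDPG. The following is a minimal sketch of those rules in plain Python, not the authors' implementation. The discount factor `gamma`, the soft update factor `tau`, the Gaussian form of the exploration noise, and all function names are assumptions introduced here for illustration.

```python
import random


def soft_update(target, current, tau):
    """Soft update: each target parameter becomes tau*current + (1 - tau)*target."""
    return [tau * c + (1.0 - tau) * t for t, c in zip(target, current)]


def td_target(reward, gamma, next_q):
    """Target Q value: reward plus the discounted Q value from the target networks."""
    return reward + gamma * next_q


def critic_loss(targets, q_values):
    """Mean squared error between target Q values and current Q values."""
    return sum((y - q) ** 2 for y, q in zip(targets, q_values)) / len(targets)


def noisy_action(action, sigma, rng):
    """Add zero-mean Gaussian noise (an assumed noise model) to every action element."""
    return [a + rng.gauss(0.0, sigma) for a in action]


# Example with made-up numbers: one soft update step and one critic loss value.
updated = soft_update(target=[0.0, 1.0], current=[1.0, 0.0], tau=0.1)
loss = critic_loss(targets=[td_target(1.0, 0.5, 2.0)], q_values=[1.5])
explored = noisy_action([0.3, -0.2], sigma=0.1, rng=random.Random(0))
```

A small `tau` makes the target networks track the current networks slowly, which is the usual reason for preferring a soft update over a hard copy.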
Algorithm 1. The Proposed DDPG Algorithm.
4. Simulation Results
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Network | Number of Neurons | Activation Function |
---|---|---|
Actor | 128 | ReLU |
Actor | 64 | ReLU |
Critic | 64 | ReLU |
Critic | 32 | ReLU |
Critic | 1 | None |
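The layer sizes in the table above can be read as a plain feedforward pass. The sketch below builds the critic layers (64, 32, 1) and the two listed actor layers (128, 64) in standard-library Python. The input dimensions, the random initialization, the concatenation of state and action at the critic input, and all names are assumptions for illustration; the actor output layer is not listed in the table and is therefore omitted.

```python
import random


def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def build_mlp(n_in, layers, rng):
    """Build a list of (weights, biases, activation); layers is [(n_neurons, activation)]."""
    net = []
    for n_out, activation in layers:
        scale = 1.0 / n_in ** 0.5  # assumed uniform initialization range
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        net.append((weights, [0.0] * n_out, activation))
        n_in = n_out
    return net


def forward(net, x):
    """Propagate x through every layer, applying the activation when one is set."""
    for weights, biases, activation in net:
        x = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if activation is not None:
            x = activation(x)
    return x


CRITIC_LAYERS = [(64, relu), (32, relu), (1, None)]  # sizes from the table
ACTOR_LISTED_LAYERS = [(128, relu), (64, relu)]      # sizes from the table

rng = random.Random(0)
state_dim, action_dim = 16, 84  # hypothetical dimensions, not from the paper

critic = build_mlp(state_dim + action_dim, CRITIC_LAYERS, rng)
q_value = forward(critic, [0.1] * (state_dim + action_dim))

actor_listed = build_mlp(state_dim, ACTOR_LISTED_LAYERS, rng)
features = forward(actor_listed, [0.1] * state_dim)
```

Here `q_value` is a one-element list because the last critic layer has a single neuron and no activation, matching the final row of the table.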
Parameter Name | Symbol | Parameter Value |
---|---|---|
Noise power density | | −174 dBm/Hz |
Channel bandwidth | B | 1 MHz |
Reference path loss | | 0–30 dB |
Reference distance | | 1 m |
Path loss coefficients | | |
Rician factors | | 3 |
Rician factors | | 3 |
Rician factors | | 3 |
Correlation coefficients | | |
Normalized variance | | |
Normalized variance | | |
Number of antennas | M | 8 |
Number of users | K | 4 |
Number of reflecting elements | N | 20–50 |
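Two quantities follow directly from the table values and from the action layout described in Section 3.2. The sketch below computes the total noise power over the bandwidth and the length of the action vector, and splits an action into RIS reflection coefficients and a beamforming matrix. It assumes that the beamforming matrix has size M × K, that the first N action elements are phases in radians, and that the real parts precede the imaginary parts in row-major order; these layout details and all function names are illustrative assumptions.

```python
import cmath
import math


def noise_power_dbm(density_dbm_per_hz, bandwidth_hz):
    """Total noise power in dBm: density plus 10*log10(bandwidth in Hz)."""
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def action_length(n_elements, n_antennas, n_users):
    """N phase shifts plus real and imaginary parts of an assumed M x K beamformer."""
    return n_elements + 2 * n_antennas * n_users


def decode_action(action, n_elements, n_antennas, n_users):
    """Split a flat action into unit-modulus reflection coefficients and a beamformer."""
    phases = action[:n_elements]
    rest = action[n_elements:]
    half = n_antennas * n_users
    real, imag = rest[:half], rest[half:]
    reflection = [cmath.exp(1j * p) for p in phases]
    beamformer = [[complex(real[m * n_users + k], imag[m * n_users + k])
                   for k in range(n_users)]
                  for m in range(n_antennas)]
    return reflection, beamformer


noise_dbm = noise_power_dbm(-174.0, 1e6)   # -174 dBm/Hz over 1 MHz
length_small = action_length(20, 8, 4)     # N = 20, M = 8, K = 4
length_large = action_length(50, 8, 4)     # N = 50, M = 8, K = 4
reflection, beamformer = decode_action([0.0] * length_small, 20, 8, 4)
```

With the table values, the noise power is −114 dBm, and under the assumed M × K layout the action vector grows from 84 to 114 elements as N goes from 20 to 50.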
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ma, W.; Zhuo, L.; Li, L.; Liu, Y.; Ren, H. Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments. Appl. Sci. 2022, 12, 7236. https://doi.org/10.3390/app12147236