Deep Reinforcement Learning for RIS-Aided Multiuser MISO System with Hardware Impairments

Ma, Wenjie; Zhuo, Liuchang; Li, Luchu; Liu, Yuhao; Ren, Hong

doi:10.3390/app12147236


by Wenjie Ma, Liuchang Zhuo, Luchu Li, Yuhao Liu and Hong Ren *

National Mobile Communications Research Laboratory, Southeast University, Nanjing 210096, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(14), 7236; https://doi.org/10.3390/app12147236

Submission received: 31 May 2022 / Revised: 11 July 2022 / Accepted: 13 July 2022 / Published: 18 July 2022

(This article belongs to the Special Issue Reconfigurable Intelligent Surface for 6G Wireless Communications)


Abstract

In this paper, we study a reconfigurable intelligent surface (RIS)-aided multiuser MISO system with imperfect hardware, where the transceiver design is based on the statistical channel state information (CSI). Considering the transceiver hardware impairments (HWI), we aim to maximize the minimum average user data rate, where the precoding matrices at the base station (BS) and the reflecting phase shifts at the RIS are jointly optimized. Since the problem is nonconvex and the objective function cannot be derived in closed form, we adopt the deep deterministic policy gradient (DDPG) algorithm to deal with this challenging optimization problem, where we generate a set of CSI vectors in an offline way, and then these data sets are used to train the neural networks. The simulation results demonstrate the rapid convergence speed of the adopted DDPG algorithm and also emphasize that it is crucial to consider the HWI when optimizing the transceiver.

Keywords:

intelligent reflecting surface (IRS); reconfigurable intelligent surface (RIS); hardware impairment (HWI); deep deterministic policy gradient (DDPG)

1. Introduction

Thanks to its low power consumption and low hardware cost, the reconfigurable intelligent surface (RIS) is recognized as one of the most promising techniques for future sixth-generation (6G) wireless systems [1,2,3,4,5,6]. The RIS consists of an array of passive, low-cost reflecting elements whose phase shifts can be tuned. The authors of [7,8] studied RIS-aided multicell systems and RIS-aided simultaneous wireless information and power transfer, respectively, and developed low-complexity algorithms to jointly optimize the precoding matrices at the base station (BS) and the reflecting phase shifts at the RIS. However, the contributions in [7,8] assumed perfect hardware, which rarely holds in practice. Practical communication systems suffer from inevitable transceiver hardware impairments (HWI), which cause signal distortion and cannot be ignored in the transceiver design.

The authors of [9] derived a closed-form data rate expression for RIS-aided communication systems and used it to analyze the impact of HWI. A RIS-aided single-user communication system with HWI was studied in [10], where the phase shifts of the RIS were optimized by the majorization-minimization (MM) algorithm. The joint beamforming and phase shift design for a RIS-aided physical layer security system was studied in [11]. Besides the transceiver hardware impairments, the authors of [12] further considered the impact of phase noise at the RIS and derived a closed-form data rate expression, based on which a genetic algorithm was adopted to solve the phase shift optimization problem. In [13], a RIS-aided communication system serving a mobile user was studied, and the authors proposed an algorithm to predict the positions of the user under HWI. In [14], the authors analyzed the outage performance of RIS-aided non-orthogonal multiple access systems with HWI, where both near-field and far-field users were considered. Most recently, a robust transceiver design for RIS-aided communication systems was studied in [15], where both imperfect CSI and HWI were taken into account and semidefinite programming was used to solve the robust design problem.

However, all the above papers assumed that the BS can acquire the instantaneous CSI, which is challenging in practice due to the limited channel coherence time. Recently, researchers have focused on phase shift design based on statistical CSI, such as location/angle information or channel distribution information (e.g., channel covariance matrices), which varies on a much slower time scale than the instantaneous CSI. There are several advantages to using statistical CSI for transceiver design [16]. Firstly, the channel estimation overhead is reduced, as only the slowly varying statistical CSI is needed. Secondly, the computational complexity is significantly reduced, as the phase shifts at the RIS only need to be recomputed when the statistical CSI changes. Thirdly, the feedback overhead is decreased, since the phase shift values are fed back to the RIS controller only when they are updated, that is, only when the statistical CSI changes. Due to these advantages, transceiver design based on statistical CSI for RIS-aided systems has attracted extensive research attention [17,18]. Specifically, the authors of [17] derived a closed-form data rate expression for a RIS-aided multiuser system and then proposed a genetic algorithm to optimize the phase shifts, which depend only on the statistical CSI. As a step further, the work in [17] was extended in [18] to the practical case of imperfect hardware, where a robust transmission design was proposed to optimize the phase shifts by considering HWI.

However, the contributions in [17,18] considered a two-time-scale design, where the BS designed its precoding matrices based on the instantaneous effective CSI, while only the phase shifts were designed based on statistical CSI. This means that the BS still needs to estimate the instantaneous effective CSI, which incurs sizable channel estimation overhead in high-mobility scenarios. Against this background, the authors of [19] studied the transceiver design for RIS-aided communication systems based on fully statistical CSI, where both the precoding matrices at the BS and the reflecting phase shifts at the RIS were designed based on statistical CSI. However, that work assumed perfect hardware, which rarely holds in practice. To fill this gap, the contributions of this work are summarized as follows:

We consider optimizing the precoding matrices at the BS and the reflecting phase shifts at the RIS based on statistical CSI to maximize the minimum user data rate to ensure fairness among the users, where the imperfect hardware is taken into account.
Due to the expectation operator along with the hardware impairment, it is challenging to derive the closed-form data rate expression. Furthermore, the objective function in terms of the max-min format is discontinuous and non-differentiable. As a result, the existing algorithms based on mathematical derivations are not applicable. Instead, we resort to the powerful deep deterministic policy gradient (DDPG) algorithm to solve this challenging optimization problem.
The adopted DDPG algorithm converges quickly, within 600–900 iterations, and its overall computational complexity is dominated by the calculation of the rewards, which involves only simple mathematical operations. Once the neural networks are trained, they can be applied directly in real-time applications, and they only need to be retrained when the statistical CSI changes. Hence, the computational complexity is not high.

2. System Model

We consider a RIS-aided downlink multiuser system in which the base station (BS) is equipped with $M$ antennas and serves $K$ single-antenna users. The system architecture is shown in Figure 1. We assume that the RIS has $N$ reflecting elements. Taking the hardware impairments into account, the transmit signal at the BS can be expressed as

$x = \sum_{k = 1}^{K} w_{k} s_{k} + η_{s},$

(1)

where $w_{k} \in \mathbb{C}^{M \times 1}$ represents the beamforming vector from the BS to the k-th user and $s_{k} \sim \mathcal{CN}(0, 1)$ represents the data symbol transmitted to the k-th user, which satisfies $E\{|s_{k}|^{2}\} = 1$. Furthermore, $η_{s}$ denotes the independent transmit distortion noise, which is zero-mean Gaussian with a power at each antenna proportional to the transmit power of that antenna. Hence, $η_{s} \sim \mathcal{CN}(0, k_{s} \mathrm{diag}\{\sum_{k = 1}^{K} w_{k} w_{k}^{H}\})$, where $k_{s} \in (0, 1)$ denotes the normalized variance of the transmit distortion noise.
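The transmit distortion model above can be illustrated numerically. The following is a minimal NumPy sketch, not the authors' simulation code: the function name `sample_tx_distortion`, the toy beamforming matrix, and the sample count are assumptions made for illustration. It draws distortion samples whose per-antenna variance is $k_{s}$ times the per-antenna transmit power, i.e., the diagonal of $\sum_{k} w_{k} w_{k}^{H}$.

```python
import numpy as np


def sample_tx_distortion(W, k_s, rng, num_samples=1):
    """Draw samples of the transmit distortion noise (hypothetical helper).

    W is the M x K beamforming matrix and k_s the normalized distortion
    variance. Returns an array of shape (num_samples, M).
    """
    # Diagonal of sum_k w_k w_k^H, i.e., the transmit power at each antenna.
    per_antenna_power = np.sum(np.abs(W) ** 2, axis=1)
    # A circularly symmetric complex Gaussian with variance v has real and
    # imaginary parts of variance v / 2 each.
    std = np.sqrt(k_s * per_antenna_power / 2.0)
    shape = (num_samples, W.shape[0])
    return std * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


rng = np.random.default_rng(0)
W = np.array([[1.0, 0.0], [0.0, 2.0]], dtype=complex)  # toy 2 x 2 example
eta = sample_tx_distortion(W, k_s=0.01, rng=rng, num_samples=20000)
# Empirical per-antenna distortion power; the model predicts [0.01, 0.04].
empirical_power = np.mean(np.abs(eta) ** 2, axis=0)
```

In this toy example the second antenna carries four times the power of the first, so its distortion variance is four times larger as well.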

For $K$ users, the beamforming matrix at the BS can be expressed as

$W = [w_{1}, \dots, w_{K}].$

(2)

The beamforming matrix $W$ has to satisfy the power constraint, which can be formulated as

$\mathrm{tr}\{W W^{H}\} \leq P_{\max}.$

(3)

Furthermore, the channel between the BS and the RIS is denoted as $H_{SI} \in \mathbb{C}^{N \times M}$, the channel between the BS and the k-th user is denoted as $h_{SD,k} \in \mathbb{C}^{M \times 1}$, and the channel from the RIS to the k-th user is denoted as $h_{ID,k} \in \mathbb{C}^{N \times 1}$. We adopt the Rician fading model, under which the channels $H_{SI}$, $h_{SD,k}$, and $h_{ID,k}$ can be formulated as

$H_{SI} = \sqrt{β} (\sqrt{\frac{δ}{δ + 1}} \bar{H}_{SI} + \sqrt{\frac{1}{δ + 1}} \tilde{H}_{SI}),$

(4)

$h_{SD,k} = \sqrt{γ_{k}} (\sqrt{\frac{ρ_{k}}{ρ_{k} + 1}} \bar{h}_{SD,k} + \sqrt{\frac{1}{ρ_{k} + 1}} \tilde{h}_{SD,k}),$

(5)

$h_{ID,k} = \sqrt{α_{k}} (\sqrt{\frac{ε_{k}}{ε_{k} + 1}} \bar{h}_{ID,k} + \sqrt{\frac{1}{ε_{k} + 1}} \tilde{h}_{ID,k}),$

(6)

where $β$, $γ_{k}$ and $α_{k}$ are the large-scale path loss coefficients; $δ$, $ρ_{k}$ and $ε_{k}$ are the Rician factors; and $\bar{H}_{SI}$, $\bar{h}_{SD,k}$ and $\bar{h}_{ID,k}$ represent the line-of-sight (LoS) components, which constitute the statistical CSI and remain unchanged over a long time. Under the uniform planar array model, the LoS components $\bar{H}_{SI}$, $\bar{h}_{ID,k}$ and $\bar{h}_{SD,k}$ can be formulated as

$\bar{H}_{SI} = a_{N}(θ_{A}^{a}, θ_{A}^{e}) a_{M}^{H}(φ_{D}^{a}, φ_{D}^{e}),$

(7)

$\bar{h}_{ID,k} = a_{N}(θ_{D,k}^{a}, θ_{D,k}^{e}),$

(8)

$\bar{h}_{SD,k} = a_{M}(φ_{D,k}^{a}, φ_{D,k}^{e}),$

(9)

where $θ_{A}^{a}$ and $θ_{A}^{e}$ denote the azimuth and elevation angles of arrival at the RIS from the BS, respectively; $φ_{D}^{a}$ and $φ_{D}^{e}$ denote the azimuth and elevation angles of departure from the BS to the RIS, respectively; $θ_{D,k}^{a}$ and $θ_{D,k}^{e}$ are the azimuth and elevation angles of departure from the RIS to the k-th user; and $φ_{D,k}^{a}$ and $φ_{D,k}^{e}$ are the azimuth and elevation angles of departure from the BS to the k-th user. These angles are randomly generated. In addition, the array response vector is defined as

$a_{X}(ϑ^{a}, ϑ^{e}) = [1, \dots, e^{j 2 π \frac{d}{λ} (x \sin ϑ^{a} \sin ϑ^{e} + y \cos ϑ^{e})}, \dots, e^{j 2 π \frac{d}{λ} ((\sqrt{X} - 1) \sin ϑ^{a} \sin ϑ^{e} + (\sqrt{X} - 1) \cos ϑ^{e})}]^{T},$

(10)

where $X$ stands for $M$ or $N$; $ϑ^{a}$ and $ϑ^{e}$ denote the azimuth and elevation angles, respectively; $x$ and $y$ are the element indices along the two array dimensions, each ranging from 0 to $\sqrt{X} - 1$; and $d$ and $λ$ represent the element spacing and the wavelength, respectively.

Moreover, $\tilde{H}_{SI} \sim \mathcal{CN}(0, R_{HR} \otimes R_{HB})$, $\tilde{h}_{SD,k} \sim \mathcal{CN}(0, R_{hB,k})$ and $\tilde{h}_{ID,k} \sim \mathcal{CN}(0, R_{hR,k})$ represent the non-line-of-sight (NLoS) components, with $R_{HR}$, $R_{HB}$, $R_{hB,k}$ and $R_{hR,k}$ being the corresponding spatial covariance matrices. These matrices are given by $[R_{HB}]_{i,j} = ρ^{|i - j|}$, $[R_{HR}]_{i,j} = ρ^{|i - j|}$, $[R_{hB,k}]_{i,j} = ρ^{|i - j|}$ and $[R_{hR,k}]_{i,j} = ρ^{|i - j|}$, where $ρ$ represents the correlation coefficient.
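As an illustration of the Rician model in (5) and (6) with the exponential correlation structure $ρ^{|i - j|}$, the following NumPy sketch draws one realization of a vector channel. It is a sketch under stated assumptions rather than the authors' code: the helper names, the all-ones placeholder LoS vector, and the numerical values are hypothetical, and the correlated NLoS component is generated with a Cholesky factor of the covariance matrix, which is one standard way to color white Gaussian noise.

```python
import numpy as np


def exponential_correlation(n, rho):
    """Covariance matrix with entries rho^|i - j| (hypothetical helper)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def sample_rician_vector(h_los, path_gain, rician_factor, R, rng):
    """One realization of a Rician vector channel with NLoS covariance R."""
    n = h_los.shape[0]
    # Color unit-variance white complex Gaussian noise so that its
    # covariance equals R = L L^H.
    L = np.linalg.cholesky(R)
    g = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    h_nlos = L @ g
    los_weight = np.sqrt(rician_factor / (rician_factor + 1.0))
    nlos_weight = np.sqrt(1.0 / (rician_factor + 1.0))
    return np.sqrt(path_gain) * (los_weight * h_los + nlos_weight * h_nlos)


rng = np.random.default_rng(1)
R = exponential_correlation(4, rho=0.1)
h_los = np.ones(4, dtype=complex)  # placeholder LoS vector for illustration
h = sample_rician_vector(h_los, path_gain=1e-3, rician_factor=3.0, R=R, rng=rng)
# With a very large Rician factor the channel collapses to its LoS part.
h_almost_los = sample_rician_vector(h_los, 1.0, 1e12, R, rng)
```

The matrix channel $H_{SI}$ in (4) follows the same pattern, with the Kronecker-structured covariance replacing `R`.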

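Equivalently, the x-th element of the array response vector in (10), for $x = 1, \dots, X$, can be written as

$[a_{X}(ϑ^{a}, ϑ^{e})]_{x} = \exp\{j 2 π \frac{d}{λ} [\lfloor \frac{x - 1}{\sqrt{X}} \rfloor \sin ϑ^{a} \sin ϑ^{e} + ((x - 1) \bmod \sqrt{X}) \cos ϑ^{e}]\}.$

(11)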
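The element-wise array response can be sketched in a few lines of NumPy. This is a minimal illustration, assuming that $X$ is a perfect square, that the row index of element $x$ is the integer part of $(x - 1)/\sqrt{X}$, and that the element spacing is half a wavelength; the function name `upa_response` and the angle values are hypothetical.

```python
import numpy as np


def upa_response(X, azimuth, elevation, d_over_lambda=0.5):
    """Array response of a sqrt(X) x sqrt(X) planar array (sketch)."""
    side = int(round(np.sqrt(X)))
    if side * side != X:
        raise ValueError("this sketch assumes X is a perfect square")
    idx = np.arange(X)   # plays the role of x - 1 in the element-wise form
    row = idx // side    # integer part of (x - 1) / sqrt(X)
    col = idx % side     # (x - 1) mod sqrt(X)
    phase = 2.0 * np.pi * d_over_lambda * (
        row * np.sin(azimuth) * np.sin(elevation) + col * np.cos(elevation)
    )
    return np.exp(1j * phase)


a_N = upa_response(16, azimuth=0.3, elevation=1.1)   # RIS side
a_M = upa_response(4, azimuth=-0.4, elevation=0.9)   # BS side
# Rank-one LoS matrix a_N a_M^H, as in (7).
H_los = np.outer(a_N, a_M.conj())
```

The first entry equals 1 and the last entry uses the indices $\sqrt{X} - 1$ in both dimensions, matching the first and last elements of (10).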

For reference, the instantaneous SINR of the k-th user and the associated hardware distortion term, both of which follow from the received signal model in (14), are

$\mathrm{SINR}_{k} = \frac{|(h_{ID,k}^{H} Φ H_{SI} + h_{SD,k}^{H}) w_{k}|^{2}}{\sum_{i = 1, i \neq k}^{K} (1 + k_{B}) |(h_{ID,k}^{H} Φ H_{SI} + h_{SD,k}^{H}) w_{i}|^{2} + (1 + k_{B}) σ_{k}^{2} + Γ_{k}(w, Φ)},$

(12)

$Γ_{k}(w, Φ) = (h_{ID,k}^{H} Φ H_{SI} + h_{SD,k}^{H}) [k_{B} w_{k} w_{k}^{H} + (1 + k_{B}) k_{s} \mathrm{diag}\{\sum_{i = 1}^{K} w_{i} w_{i}^{H}\}] (H_{SI}^{H} Φ^{H} h_{ID,k} + h_{SD,k}).$

(13)

With the above channel model, the signal received at the k-th user can be written as

$\begin{matrix} y_{k} = \underbrace{h_{ID,k}^{H} Φ H_{SI} (w_{k} s_{k} + η_{s})}_{\text{reflected link}} + \underbrace{h_{SD,k}^{H} (w_{k} s_{k} + η_{s})}_{\text{direct link}} \\ + \underbrace{\sum_{i = 1, i \neq k}^{K} (h_{ID,k}^{H} Φ H_{SI} + h_{SD,k}^{H}) w_{i} s_{i}}_{\text{multiuser interference}} + \underbrace{η_{B}}_{\text{receiver HWI}} + \underbrace{η_{k}}_{\text{noise}} \\ = \tilde{y}_{k} + η_{B}, \end{matrix}$

(14)

where $Φ = \mathrm{diag}(e^{j θ_{1}}, e^{j θ_{2}}, \dots, e^{j θ_{N}})$ denotes the phase shift matrix at the RIS and $θ_{i}$ is the phase shift of the i-th reflecting element; $η_{B} \sim \mathcal{CN}(0, k_{B} E\{|\tilde{y}_{k}|^{2}\})$ denotes the additional distortion noise at the user, which is zero-mean Gaussian, and $k_{B} \in (0, 1)$ denotes the normalized variance of the receive distortion noise; and $η_{k} \sim \mathcal{CN}(0, σ_{k}^{2})$ denotes the additive white Gaussian noise at the k-th user.

Therefore, the k-th user’s instantaneous signal-to-interference-plus-noise ratio (SINR) is given by (12) and (13). Based on (12) and (13), the instantaneous data rate of the k-th user can be expressed as

$R_{k} = \log_{2}(1 + \mathrm{SINR}_{k}).$

(15)
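To make the roles of $k_{s}$ and $k_{B}$ in (12), (13) and (15) concrete, the sketch below evaluates the SINR for one channel realization. It is an illustrative NumPy implementation under stated assumptions: the function names and the toy numbers are hypothetical, and the effective channel is written as the column vector $H_{SI}^{H} Φ^{H} h_{ID,k} + h_{SD,k}$, whose conjugate transpose is the row vector that multiplies $w_{k}$ in (12).

```python
import numpy as np


def effective_channel(h_ID, h_SD, H_SI, theta):
    """Column vector whose conjugate transpose is h_ID^H Phi H_SI + h_SD^H."""
    Phi = np.diag(np.exp(1j * theta))
    return H_SI.conj().T @ Phi.conj().T @ h_ID + h_SD


def sinr_with_hwi(h_eff, W, k, k_s, k_B, sigma2):
    """SINR of user k with transmit (k_s) and receive (k_B) distortion."""
    gains = np.abs(h_eff.conj() @ W) ** 2   # |h^H w_i|^2 for every user i
    signal = gains[k]
    interference = gains.sum() - signal
    # h^H diag{sum_i w_i w_i^H} h: per-antenna power weighted by |h_m|^2.
    per_antenna_power = np.sum(np.abs(W) ** 2, axis=1)
    tx_term = np.sum(per_antenna_power * np.abs(h_eff) ** 2)
    gamma = k_B * signal + (1.0 + k_B) * k_s * tx_term   # distortion term
    return signal / ((1.0 + k_B) * (interference + sigma2) + gamma)


h_eff = np.array([1.0, 0.0], dtype=complex)
W = np.eye(2, dtype=complex)
ideal = sinr_with_hwi(h_eff, W, k=0, k_s=0.0, k_B=0.0, sigma2=0.5)
impaired = sinr_with_hwi(h_eff, W, k=0, k_s=0.1, k_B=0.1, sigma2=0.5)
rate_ideal = np.log2(1.0 + ideal)
rate_impaired = np.log2(1.0 + impaired)
```

In this toy case the ideal SINR is 2, while the impaired SINR drops to 1/0.76, because both the receive distortion and the transmit distortion add terms to the denominator.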

Therefore, the optimization problem in this paper can be written as

$\begin{matrix} \max_{W, Φ} \min_{k} E[R_{k}] \\ \text{s.t.} \quad C1: \mathrm{tr}\{W W^{H}\} \leq P_{\max}, \\ C2: |[Φ]_{i,i}| = 1, \forall i = 1, 2, \dots, N, \end{matrix}$

(16)

where the expectation in the objective function is taken over the non-line-of-sight components of the CSI. In the above optimization problem, C1 represents the power constraint at the BS, while C2 represents the unit-modulus constraints on the phase shifts of the RIS. Unfortunately, it is challenging to derive a closed-form expression for the objective function, since the average data rate involves an expectation over numerous random small-scale channel gains. In addition, this work considers hardware impairments at both the BS and the users, which makes the average data rate expressions even more complicated. Hence, to the best of our knowledge, existing model-based optimization algorithms cannot be directly applied to this kind of problem.
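Although the objective has no closed form, it can be approximated by averaging over channel samples generated offline, which is in the spirit of the data sets mentioned in the abstract. The short NumPy sketch below is an assumption-laden illustration, not the authors' implementation: the function name and the toy rate table are hypothetical. It highlights that the objective takes the minimum over users of the average rates, which differs from averaging the per-realization minimum.

```python
import numpy as np


def min_average_rate(rates_per_realization):
    """Sample-average estimate of min_k E[R_k].

    rates_per_realization has shape (num_realizations, K).
    """
    rates = np.asarray(rates_per_realization, dtype=float)
    return float(np.min(np.mean(rates, axis=0)))


# Two channel realizations, two users (toy numbers).
rates = np.array([[1.0, 3.0],
                  [3.0, 1.0]])
objective = min_average_rate(rates)                      # min of averages
per_sample_min = float(np.mean(np.min(rates, axis=1)))   # average of minima
```

Here each user averages a rate of 2, so the objective is 2, whereas the average of the per-realization minima is only 1.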

3. Proposed Algorithm

In this section, we propose a statistical CSI-based transmission scheme in which the DDPG algorithm is adopted to solve the optimization problem.

3.1. Transmission Scheme

In existing transmission schemes for RIS-assisted communication systems, the instantaneous CSI is used to adjust the beamforming and the RIS phase shifts, which requires channel estimation in each channel coherence time interval, as shown in Figure 2. This approach has two drawbacks. First, the beamforming matrix and the phase shift matrix need to be recalculated in every channel coherence interval, which increases the computational complexity. Second, the phase shifts of the RIS need to be updated frequently and sent to the RIS controller, which incurs significant feedback overhead.

To address these issues, we design the transmission scheme based on statistical CSI. As shown in Figure 2, in the statistical CSI-based scheme the BS only needs to estimate the statistical CSI at the start of the transmission, and the subsequent channel coherence time intervals are fully used to transmit information, which significantly reduces the computational complexity and the feedback overhead. Once the network is trained, it can be applied directly in real time with only simple mathematical calculations. The neural networks only need to be retrained when the statistical CSI changes.

3.2. DDPG Algorithm

As the objective function in (16) does not have a closed-form expression, it is difficult to solve this problem using conventional optimization algorithms. Therefore, we adopt a deep reinforcement learning algorithm, which can efficiently process complex environmental parameters and a large amount of state information by using techniques such as stochastic gradient optimization and backpropagation in deep neural networks. Specifically, we employ the DDPG algorithm, a deep reinforcement learning algorithm that is suited to optimization problems with continuous variables such as the one considered here.

The DDPG algorithm adopts the Actor-Critic architecture, which uses the policy network to output deterministic actions directly, and the functions of its four networks are introduced as follows:

(1) Actor current network
The actor current network iteratively updates the policy network parameters $θ$ and selects the current action according to the state $S^{(t)}$ at time step t. The state is composed of three parts: the beamforming matrix $W^{(t)} \in \mathbb{C}^{M \times K}$, the phase shift matrix $Φ^{(t)} \in \mathbb{C}^{N \times N}$ and the channel matrices, i.e., $H_{SI}^{(t)} \in \mathbb{C}^{N \times M}$, $h_{SD,k}^{(t)} \in \mathbb{C}^{M \times 1}$ and $h_{ID,k}^{(t)} \in \mathbb{C}^{N \times 1}$. In addition, the actor current network interacts with the environment to generate the next state $S^{(t + 1)}$ and the reward $R^{(t)}$. The reward is the objective value of (16) achieved by the current $W^{(t)}$ and $Φ^{(t)}$, i.e., $R^{(t)} = \min_{k} E[R_{k}^{(t)}]$, where $R_{k}^{(t)}$ is defined in (15). The loss function $J^{(t)}(θ)$ can be expressed as

$J^{(t)}(θ) = - \frac{1}{m} \sum_{j = 1}^{m} Q^{(t)}(s_{j}, a_{j}, w),$

(17)

where $m$ is the number of sampled transitions and $w$ denotes the parameters of the critic current network.
(2) Actor target network
The actor target network selects the action $a^{(t + 1)}$ based on the state $S^{(t + 1)}$ at time $t + 1$ sampled from the experience replay buffer. The action $a^{(t)} \in \mathbb{R}^{2 M K + N}$ is composed of two parts: the first $N$ elements correspond to the phase shifts of the RIS reflecting elements, and the remaining $2 M K$ elements correspond to the real part and the imaginary part of the beamforming matrix, respectively. We take action $a^{(t)}$ to update the beamforming matrix $W^{(t)}$ and the phase shift matrix $Φ^{(t)}$. The beamforming matrix formed from the action is scaled to meet the power constraint, and the phase shifts are incremented, i.e.,

$W^{(t + 1)} = \frac{\sqrt{P_{\max}} W^{(t)}}{∥W^{(t)}∥_{F}},$

$ϕ_{n}^{a (t + 1)} = ϕ_{n}^{a (t)} + a_{j}^{(t)} π,$

where $ϕ_{n}^{(t)} = \cos(ϕ_{n}^{a (t)}) + j \sin(ϕ_{n}^{a (t)})$ is the n-th phase shift in $Φ^{(t)}$ and $a_{j}^{(t)}$ is the j-th action value in $a^{(t)}$, $\forall n = j = 1, 2, \dots, N$.
The target network parameter $θ'^{(t)}$ tracks the current network parameter $θ^{(t)}$ through a soft update with soft update factor $τ$:

$θ'^{(t + 1)} = τ θ^{(t)} + (1 - τ) θ'^{(t)}.$
(3) Critic current network
The critic current network iteratively updates the value network parameter $w^{(t)}$ and calculates the current value $Q(S^{(t)}, a^{(t)}, w^{(t)})$. The target value for the j-th sampled transition is given by

$y_{j} = R_{j}^{(t)} + γ Q'(S_{j}^{(t + 1)}, a_{j}^{(t + 1)}, w'^{(t)}),$

(18)

where $γ$ here denotes the discount factor and $a_{j}^{(t + 1)}$ is the action selected by the actor target network. The loss function is given by

$J(w^{(t)}) = \frac{1}{m} \sum_{j = 1}^{m} (y_{j} - Q(ϕ(S_{j}^{(t)}), a_{j}^{(t)}, w^{(t)}))^{2},$

(19)

where $ϕ(\cdot)$ denotes the feature vector of a state.
(4) Critic target network
The critic target network calculates the $Q'(S^{(t + 1)}, a^{(t + 1)}, w'^{(t)})$ portion of the target value in (18). The network parameter $w'^{(t)}$ tracks $w^{(t)}$ through a soft update with soft update factor $τ$:

$w'^{(t + 1)} = τ w^{(t)} + (1 - τ) w'^{(t)}.$
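The action mapping and the soft update described for the four networks can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function names are hypothetical, the action is assumed to be laid out as $N$ phase increments followed by the $M K$ real parts and then the $M K$ imaginary parts of the beamforming matrix, and network parameters are represented as a plain list of arrays.

```python
import numpy as np


def apply_action(action, phase, M, K, P_max):
    """Map an action vector of length N + 2MK to (W, Phi, new phases)."""
    N = phase.shape[0]
    # First N entries: phase increments, scaled by pi.
    new_phase = phase + np.pi * action[:N]
    # Remaining 2MK entries: real parts, then imaginary parts (assumed layout).
    real = action[N:N + M * K].reshape(M, K)
    imag = action[N + M * K:].reshape(M, K)
    W = real + 1j * imag
    # Scale so that tr{W W^H} = P_max (power constraint met with equality).
    W = np.sqrt(P_max) * W / np.linalg.norm(W, "fro")
    Phi = np.diag(np.exp(1j * new_phase))
    return W, Phi, new_phase


def soft_update(target_params, current_params, tau):
    """Return tau * current + (1 - tau) * target for each parameter array."""
    return [tau * c + (1.0 - tau) * t
            for t, c in zip(target_params, current_params)]


rng = np.random.default_rng(2)
M, K, N, P_max = 2, 2, 3, 4.0
action = rng.uniform(-1.0, 1.0, size=N + 2 * M * K)
W, Phi, new_phase = apply_action(action, np.zeros(N), M, K, P_max)
new_target = soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)
```

Because the phases enter only through $e^{j ϕ}$, the unit-modulus constraint C2 holds for any action, and the Frobenius normalization enforces C1.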

In addition, to introduce randomness and broaden the exploration during learning, the DDPG algorithm adds noise $\mathcal{N}$ to the selected action. That is, the action that finally interacts with the environment is

$a^{(t)} = π_{θ}(S^{(t)}) + \mathcal{N}.$

(20)
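A possible realization of the exploration step in (20) is sketched below. The noise distribution is not specified in the text, so zero-mean Gaussian noise is used here purely as an assumption; the clipping to $[-1, 1]$ is likewise an assumption, motivated by the $\tanh$ output layer of the actor listed in Table 1. The function name `noisy_action` is hypothetical.

```python
import numpy as np


def noisy_action(policy_output, noise_std, rng):
    """Add Gaussian exploration noise and keep the action in [-1, 1]."""
    noise = rng.normal(0.0, noise_std, size=policy_output.shape)
    return np.clip(policy_output + noise, -1.0, 1.0)


rng = np.random.default_rng(4)
policy_output = np.tanh(rng.standard_normal(10))  # stand-in for the actor output
explored = noisy_action(policy_output, noise_std=0.1, rng=rng)
greedy = noisy_action(policy_output, noise_std=0.0, rng=rng)
```

Setting `noise_std` to zero recovers the deterministic policy output, which is what would be used once training has finished.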

The structures of actor and critic networks are shown in Table 1 and both of them have three layers of neural networks.

The overall algorithm is summarized in Algorithm 1.

Algorithm 1 The Proposed DDPG Algorithm.

1: Randomly initialize $θ$ and $w$, and set the target network parameters to $w' = w$ and $θ' = θ$. Empty the experience replay buffer D.
2: for i = 1, 2, …, T do
3: Initialize $S^{(t)}$ as the first state of the current state sequence, and get its feature vector $ϕ(S^{(t)})$.
4: Get the action $A^{(t)} = π_{θ}(ϕ(S^{(t)})) + \mathcal{N}$ from the actor current network based on the state $S^{(t)}$.
5: Perform the action $A^{(t)}$, get the new state $S^{(t + 1)}$ and the reward $R^{(t)}$, and determine whether the termination status `end' has been reached.
6: Store the tuple $\{ϕ(S^{(t)}), A^{(t)}, R^{(t)}, ϕ(S^{(t + 1)}), end\}$ in the experience replay buffer D.
7: $S^{(t)} \leftarrow S^{(t + 1)}$
8: Draw m samples $\{ϕ(S_{j}^{(t)}), A_{j}^{(t)}, R_{j}^{(t)}, ϕ(S_{j}^{(t + 1)}), end_{j}\}$, $\forall j = 1, 2, \dots, m,$ from the experience replay buffer, and calculate the current target Q value $y_{j}$:

$y_{j} = \begin{cases} R_{j}^{(t)}, & \text{if } end_{j} \text{ is true}, \\ R_{j}^{(t)} + γ Q'(ϕ(S_{j}^{(t + 1)}), π_{θ'}(ϕ(S_{j}^{(t + 1)})), w'), & \text{otherwise}. \end{cases}$

(21)
9: Use the mean squared error loss function

$\frac{1}{m} \sum_{j = 1}^{m} (y_{j} - Q(ϕ(S_{j}^{(t)}), A_{j}^{(t)}, w))^{2}$

to update the critic current network parameter $w$ through gradient backpropagation.
10: Use

$J(θ) = - \frac{1}{m} \sum_{j = 1}^{m} Q(s_{j}, a_{j}, w)$

to update the actor current network parameter $θ$ through gradient backpropagation.
11: If $t \% C = 1$, update the parameters of the critic target network and the actor target network:

$w' \leftarrow τ w + (1 - τ) w',$

$θ' \leftarrow τ θ + (1 - τ) θ'.$
12: If $S^{(t + 1)}$ is the termination status, end the current iteration; otherwise go to Step 4.
13: end for
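Two building blocks of Algorithm 1, the experience replay buffer and the target values in (21), can be sketched without any deep learning framework. The code below is an illustrative sketch under stated assumptions, not the authors' implementation: the class and function names are hypothetical, the buffer capacity is arbitrary, and the target critic outputs are passed in as a plain array rather than computed by a network.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, end)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries are dropped

    def store(self, state, action, reward, next_state, end):
        self.buffer.append((state, action, reward, next_state, end))

    def sample(self, m):
        batch = random.sample(list(self.buffer), m)
        states, actions, rewards, next_states, ends = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, ends

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, ends, next_q, gamma):
    """y_j = R_j if the transition ended, else R_j + gamma * Q'(next)."""
    return rewards + gamma * (1.0 - ends.astype(float)) * next_q


buffer = ReplayBuffer(capacity=100)
for step in range(5):
    buffer.store(np.full(3, step, dtype=float), np.zeros(2), float(step),
                 np.full(3, step + 1, dtype=float), step == 4)
states, actions, rewards, next_states, ends = buffer.sample(3)

targets = td_targets(np.array([1.0, 2.0]), np.array([True, False]),
                     np.array([10.0, 10.0]), gamma=0.9)
```

For the two toy transitions, the terminal one keeps its reward of 1 as the target, while the non-terminal one gets 2 + 0.9 × 10 = 11.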

4. Simulation Results

In this section, the performance of the DDPG algorithm-based scheme is evaluated. The locations of A and B are set to (0, 0, 30 m) and (100 m, 20 m, 10 m), respectively. The users are located within a circle centered at (150 m, 0, 1.5 m) with a radius of 20 m. The other parameters are listed in Table 2.

With the parameters in Table 2, we adopt the DDPG algorithm based on statistical CSI. The number of reflecting elements is set to $N = 20$, 30, 40, and 50, respectively. For each time step, we use the beamforming matrix $W$ and the phase shift matrix $Φ$ at time step t as the input of the DDPG neural network, and the output is $W$ and $Φ$ at time $t + 1$. In Figure 3, Figure 4, Figure 5 and Figure 6, we illustrate the minimum average user rate (MAUR) versus the time steps for different N.

As shown in Figure 3, Figure 4, Figure 5 and Figure 6, when the number of RIS reflecting elements is set to $N = 20$, the MAUR converges to about 0.75, and when it is set to $N = 30$, the MAUR converges to about 0.8. When the number of RIS reflecting elements increases to 40, the MAUR increases to about 0.9, and when N is 50, the MAUR is about 1.1. Hence, the MAUR increases with the number of reflecting elements. Meanwhile, the simulation results show that increasing the number of reflecting elements does not noticeably affect the convergence speed of the proposed DDPG algorithm.

In Figure 7, Figure 8, Figure 9 and Figure 10, we set $k_{B} = k_{s}$ to 0.01, 0.05, 0.1 and 0.15, respectively, to explore the convergence of the MAUR under different levels of hardware impairment. The results indicate that the MAUR decreases as $k_{B}$ and $k_{s}$ increase. This conclusion is consistent with the SINR expression in (12) and (13), where both factors enlarge the denominator. Furthermore, when $k_{B}$ and $k_{s}$ are set to 0, the result is identical to the case in which hardware impairments are not considered.

In addition, to account for environmental disturbances such as wind and rain, we add a random variable to the channel angles (angle of departure and angle of arrival), where the random variable is assumed to follow the uniform distribution. The uniform distribution can be regarded as a pessimistic case, since the perturbation is spread evenly over its range rather than concentrated around one point as in the Gaussian distribution. We then apply the trained solution obtained from our DDPG networks to the channels with perturbed angles to demonstrate the effectiveness of our algorithm. As observed from Figure 11, the performance degradation due to the channel variations is small, which confirms the robustness of the proposed algorithm.

Finally, in Figure 12, we compare the performance of the proposed algorithm with a non-optimized benchmark to evaluate the effectiveness of the optimization. Specifically, for the non-optimized benchmark, the beamforming vectors at the BS are randomly generated and the phase shift matrix is set to the identity matrix. It is observed from Figure 12 that the proposed algorithm significantly outperforms the non-optimized benchmark.
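The non-optimized benchmark can be generated in a few lines. The sketch below is only an illustration under stated assumptions: the text does not say how the random beamforming vectors are drawn or scaled, so i.i.d. complex Gaussian entries scaled to the full power budget are assumed here, and the function name `baseline_design` is hypothetical.

```python
import numpy as np


def baseline_design(M, K, N, P_max, rng):
    """Random beamforming matrix and identity phase shift matrix."""
    W = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
    # Assumption: the benchmark also transmits at full power, tr{W W^H} = P_max.
    W = np.sqrt(P_max) * W / np.linalg.norm(W, "fro")
    Phi = np.eye(N, dtype=complex)  # all phase shifts equal to zero
    return W, Phi


rng = np.random.default_rng(5)
W_base, Phi_base = baseline_design(M=8, K=4, N=20, P_max=1.0, rng=rng)
```

The values M = 8, K = 4 and N = 20 follow Table 2, while `P_max=1.0` is a placeholder.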

5. Conclusions

In this paper, we studied a downlink RIS-aided multiuser MISO system with imperfect hardware, where the transceiver design is based on statistical CSI. The DDPG algorithm was applied to jointly optimize the beamforming matrix at the BS and the phase shift matrix at the RIS. Furthermore, the transceiver hardware impairments were taken into account to reflect the inevitable hardware imperfections in practical systems. The simulation results demonstrated that it is necessary to take HWI into account and that the DDPG algorithm can achieve excellent performance.

Author Contributions

Conceptualization, W.M., L.L., L.Z., Y.L. and H.R.; methodology, W.M., L. Li, L.Z., Y.L. and H.R.; software, W.M.; validation, L.Z.; formal analysis, Y.L.; investigation, W.M.; resources, W.M.; data curation, W.M.; writing—original draft preparation, W.M., L. Li, L.Z. and Y.L.; writing—review and editing, H.R.; visualization, Y.L.; supervision, H.R.; project administration, H.R.; funding acquisition, H.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (62101128) and Basic Research Project of Jiangsu Provincial Department of Science and Technology (BK20210205).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Pan, C.; Ren, H.; Wang, K.; Kolb, J.F.; Elkashlan, M.; Chen, M.; Di Renzo, M.; Hao, Y.; Wang, J.; Swindlehurst, A.L.; et al. Reconfigurable Intelligent Surfaces for 6G Systems: Principles, Applications, and Research Directions. IEEE Commun. Mag. 2021, 59, 14–20.
2. Renzo, M.D.; Debbah, M.; Phan-Huy, D.T.; Zappone, A.; Alouini, M.S.; Yuen, C.; Sciancalepore, V.; Alexandropoulos, G.C.; Hoydis, J.; Gacanin, H.; et al. Smart radio environments empowered by reconfigurable AI meta-surfaces: An idea whose time has come. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 1–20.
3. Oliveri, G.; Rocca, P.; Salucci, M.; Massa, A. Holographic smart EM skins for advanced beam power shaping in next generation wireless environments. IEEE J. Multiscale Multiphys. Comput. Tech. 2021, 6, 171–182.
4. Di Renzo, M.; Zappone, A.; Debbah, M.; Alouini, M.S.; Yuen, C.; de Rosny, J.; Tretyakov, S. Smart Radio Environments Empowered by Reconfigurable Intelligent Surfaces: How It Works, State of Research, and The Road Ahead. IEEE J. Sel. Areas Commun. 2020, 38, 2450–2525.
5. Huang, C.; Hu, S.; Alexandropoulos, G.C.; Zappone, A.; Yuen, C.; Zhang, R.; Renzo, M.D.; Debbah, M. Holographic MIMO Surfaces for 6G Wireless Networks: Opportunities, Challenges, and Trends. IEEE Wirel. Commun. 2020, 27, 118–125.
6. Benoni, A.; Salucci, M.; Oliveri, G.; Rocca, P.; Li, B.; Massa, A. Planning of EM Skins for Improved Quality-of-Service in Urban Areas. IEEE Trans. Antennas Propag. 2022.
7. Pan, C.; Ren, H.; Wang, K.; Xu, W.; Elkashlan, M.; Nallanathan, A.; Hanzo, L. Multicell MIMO Communications Relying on Intelligent Reflecting Surfaces. IEEE Trans. Wirel. Commun. 2020, 19, 5218–5233.
8. Pan, C.; Ren, H.; Wang, K.; Elkashlan, M.; Nallanathan, A.; Wang, J.; Hanzo, L. Intelligent Reflecting Surface Aided MIMO Broadcasting for Simultaneous Wireless Information and Power Transfer. IEEE J. Sel. Areas Commun. 2020, 38, 1719–1734.
9. Boulogeorgos, A.A.A.; Alexiou, A. How Much do Hardware Imperfections Affect the Performance of Reconfigurable Intelligent Surface-Assisted Systems? IEEE Open J. Commun. Soc. 2020, 1, 1185–1195.
10. Shen, H.; Xu, W.; Gong, S.; Zhao, C.; Ng, D.W.K. Beamforming Optimization for IRS-Aided Communications with Transceiver Hardware Impairments. IEEE Trans. Commun. 2021, 69, 1214–1227.
11. Zhou, G.; Pan, C.; Ren, H.; Wang, K.; Peng, Z. Secure Wireless Communication in RIS-Aided MISO System with Hardware Impairments. IEEE Wirel. Commun. Lett. 2021, 10, 1309–1313.
12. Peng, Z.; Li, T.; Pan, C.; Ren, H.; Wang, J. RIS-Aided D2D Communications Relying on Statistical CSI With Imperfect Hardware. IEEE Commun. Lett. 2022, 26, 473–477.
13. Wang, K.; Lam, C.T.; Ng, B.K. Doppler Effect Mitigation using Reconfigurable Intelligent Surfaces with Hardware Impairments. In Proceedings of the 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, 7–11 December 2021; pp. 1–6.
14. Hemanth, A.; Umamaheswari, K.; Pogaku, A.C.; Do, D.T.; Lee, B.M. Outage Performance Analysis of Reconfigurable Intelligent Surfaces-Aided NOMA Under Presence of Hardware Impairment. IEEE Access 2020, 8, 212156–212165.
15. Peng, Z.; Chen, Z.; Pan, C.; Zhou, G.; Ren, H. Robust Transmission Design for RIS-Aided Communications With Both Transceiver Hardware Impairments and Imperfect CSI. IEEE Wirel. Commun. Lett. 2022, 11, 528–532.
16. Hassan, A.K.; Moinuddin, M.; Al-Saggaf, U.M.; Aldayel, O.; Davidson, T.N.; Al-Naffouri, T.Y. Performance Analysis and Joint Statistical Beamformer Design for Multi-User MIMO Systems. IEEE Commun. Lett. 2020, 24, 2152–2156.
17. Zhi, K.; Pan, C.; Ren, H.; Wang, K. Power Scaling Law Analysis and Phase Shift Optimization of RIS-Aided Massive MIMO Systems With Statistical CSI. IEEE Trans. Commun. 2022, 70, 3558–3574.
18. Dai, J.; Zhu, F.; Pan, C.; Ren, H.; Wang, K. Statistical CSI-Based Transmission Design for Reconfigurable Intelligent Surface-Aided Massive MIMO Systems With Hardware Impairments. IEEE Wirel. Commun. Lett. 2022, 11, 38–42.
19. Ren, H.; Pan, C.; Wang, L.; Liu, W.; Kou, Z.; Wang, K. Long-Term CSI-Based Design for RIS-Aided Multiuser MISO Systems Exploiting Deep Reinforcement Learning. IEEE Commun. Lett. 2022, 26, 567–571.

Figure 1. System Model.

Figure 2. Transmission Scheme.

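Figure 3. MAUR versus time steps for N = 20 RIS reflecting elements.

Figure 4. MAUR versus time steps for N = 30 RIS reflecting elements.

Figure 5. MAUR versus time steps for N = 40 RIS reflecting elements.

Figure 6. MAUR versus time steps for N = 50 RIS reflecting elements.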

Figure 7. Convergence speed when $k_{B} = k_{s} = 0.01$.

Figure 8. Convergence speed when $k_{B} = k_{s} = 0.05$.

Figure 9. Convergence speed when $k_{B} = k_{s} = 0.1$.

Figure 10. Convergence speed when $k_{B} = k_{s} = 0.15$.

Figure 11. Performance verification under disturbance.

Figure 12. Comparison with non-optimized algorithm.

Table 1. Network Structures.

Network	Number of Neurons	Activation Function
Actor	128	ReLU
Actor	64	ReLU
Actor	$N + 2 M K$	$\tanh(\cdot)$
Critic	64	ReLU
Critic	32	ReLU
Critic	1	None
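The layer sizes in Table 1 can be turned into a forward pass with plain NumPy. The sketch below is an illustration under stated assumptions rather than the authors' networks: the state dimension, the weight initialization, and the choice of feeding the critic the concatenated state and action are all hypothetical, since the table specifies only the layer widths and activations.

```python
import numpy as np


def init_mlp(layer_sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    params = []
    for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weight = rng.standard_normal((fan_in, fan_out)) / np.sqrt(fan_in)
        params.append((weight, np.zeros(fan_out)))
    return params


def mlp_forward(params, x, output_activation=None):
    """ReLU on hidden layers; optional tanh on the output layer."""
    for weight, bias in params[:-1]:
        x = np.maximum(x @ weight + bias, 0.0)
    weight, bias = params[-1]
    x = x @ weight + bias
    return np.tanh(x) if output_activation == "tanh" else x


M, K, N = 8, 4, 20             # values from Table 2
action_dim = N + 2 * M * K     # output width of the actor in Table 1
state_dim = 32                 # hypothetical, not given in the tables
rng = np.random.default_rng(6)
actor = init_mlp([state_dim, 128, 64, action_dim], rng)
critic = init_mlp([state_dim + action_dim, 64, 32, 1], rng)

state = rng.standard_normal(state_dim)
action = mlp_forward(actor, state, output_activation="tanh")
q_value = mlp_forward(critic, np.concatenate([state, action]))
```

With M = 8, K = 4 and N = 20, the actor outputs 84 values in $[-1, 1]$, and the critic returns a single unbounded Q value.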

Table 2. Simulation Parameters.

Parameter Name	Symbol	Parameter Value
Noise power density	$ρ_{n}$	−174 dBm/Hz
Channel bandwidth	B	1 MHz
Reference path loss	$P L_{0}$	0–30 dB
Reference distance	$d_{0}$	1 m
Path loss coefficient	$β$	$2.2$
Path loss coefficient	$α_{k}$	$2.2$
Path loss coefficient	$γ_{k}$	$3.75$
Rician factor	$δ$	3
Rician factor	$ε_{k}$	3
Rician factor	$ρ_{k}$	3
Correlation coefficient	$ρ$	$0.1$
Normalized variance (receiver)	$k_{B}$	$0.01$
Normalized variance (transmitter)	$k_{s}$	$0.01$
Number of antennas	M	8
Number of users	K	4
Number of reflecting elements	N	20–50
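As a small worked example of how the first two rows of Table 2 combine, the snippet below converts the noise power density and the bandwidth into a total noise power. Treating this product as the noise variance $σ_{k}^{2}$ in (12) is an assumption, although it is the usual convention; the function name is hypothetical.

```python
import math


def noise_power_dbm(density_dbm_per_hz, bandwidth_hz):
    """Total noise power in dBm: density plus 10 log10(bandwidth)."""
    return density_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


sigma2_dbm = noise_power_dbm(-174.0, 1e6)            # -174 + 60 = -114 dBm
sigma2_watt = 10.0 ** ((sigma2_dbm - 30.0) / 10.0)   # dBm to watts
```

A bandwidth of 1 MHz adds 60 dB to the density, which gives −114 dBm, or roughly 4 × 10⁻¹⁵ W.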

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
