Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods

Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.


I. INTRODUCTION
Zongze Li is with Peng Cheng Laboratory, Shenzhen 518038, China (e-mail: lizz@pcl.ac.cn). Qingfeng Lin and Yik-Chung Wu are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong (e-mail: qflin@eee.hku.hk; ycwu@eee.hku.hk). Shuai Wang is with the Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China (e-mail: s.wang@siat.ac.cn). Yang Li is with Shenzhen Research Institute of Big Data, Shenzhen 518172, China (e-mail: liyang@sribd.cn). Miaowen Wen is with the School of Electronics and Information Engineering, South China University of Technology, Guangzhou 510640, China (e-mail: eemwwen@scut.edu.cn). H. Vincent Poor is with the Department of Electrical and Computer Engineering, Princeton University, Princeton, NJ 08544 USA (e-mail: poor@princeton.edu).

It is well known that line-of-sight (LoS) propagation is a desirable but rarely occurring scenario for wireless communications. The traditional technique for addressing this issue is to deploy more

arXiv:2204.13372v1 [cs.LG] 28 Apr 2022
active nodes such as base stations (BSs), access points, or relays to improve coverage and compensate for the high propagation loss in a non-LoS environment. However, this approach induces high energy consumption and deployment/backhaul/maintenance costs. Worse still, it also causes more severe and complicated network interference issues.
Recently, reconfigurable intelligent surfaces (RISs), which are passive devices equipped with large numbers of low-cost reflective elements, have emerged as a promising technology to overcome the above challenges. Compared with conventional active nodes, which actively transmit signals, an RIS shapes the incoming signal by adjusting the phase shifts of its reflecting elements. Thus, deploying RISs is more energy-efficient, environmentally friendly, and, most importantly, free of noise amplification and self-interference [1]. Intuitively, deploying an RIS can provide virtual LoS links between a BS and mobile users even when the direct LoS path is blocked by high-rise buildings. Therefore, RISs have significant potential for enhancing both spectral and energy efficiency in urban environments [2]. Furthermore, due to their passive nature, RISs can be flexibly deployed on building facades, indoor walls, aerial platforms, roadside billboards, vehicle windows, etc.
While RISs could be game-changing, their deployment also brings challenges. One of them is resource allocation, which requires the nonconvex constrained phase shifts to be optimized together with other communication resources. To illustrate the importance of optimizing the phase shifts, we consider a use case of a vehicle-to-everything (V2X) system in Fig. 1, which consists of a BS located on the left side of the map, an RIS located at the intersection, and three intelligent vehicles marked in different colors. Each car is equipped with a front camera and LiDAR that capture data from the environment. These sensed data need to be transmitted to the BS for cooperative perception, remote driving, or vehicle platooning. Due to significant shadowing effects, the received signal power falls off quickly with distance from the intersection, and high-data-rate transmission cannot be achieved. One can either take a longer duration for transmission, which is undesirable since outdated data are not useful in an intelligent traffic system, or use lossy compression to reduce the amount of data to be sent, which unfortunately compromises the integrity of the information if the compression loss is too large. We illustrate the consequences of the latter option and show how an RIS might help to mitigate them.
In particular, we use the Car Learning to Act (CARLA) simulation platform and PyTorch in Ubuntu 18.04 with a GeForce GTX 1080 GPU for graphics rendering and the generation of vivid sensing data [3]. The ground-truth images of a particular frame from the front cameras are shown. This demonstrates the necessity of deploying an RIS and of optimizing the phase shifts in this V2X communication scenario.
To optimize the nonconvex constrained phase shifts at an RIS, a number of optimization methods have been proposed in the literature, including semidefinite relaxation (SDR), the penalty method, the majorization-minimization (MM) algorithm [4], the manifold method [5], gradient descent (GD) [6], and convex relaxation (CR) [7]. Artificial intelligence (AI) methods, such as unsupervised learning [8], supervised learning [9], and reinforcement learning [10], have also recently emerged as viable solutions. However, the properties of these diverse algorithms are scattered throughout the literature, and there is a lack of comparisons among them in the context of RISs. To fill this gap, this paper summarizes these techniques, reveals their relationships, and compares their properties.

II. RIS RESOURCE ALLOCATION EXAMPLES AND GENERAL FORMULATION
In wireless resource allocation involving an RIS, there are two types of resources. One is the conventional communication resources, such as beamforming vectors, artificial noise, transmit power, and computation time. The other is the RIS coefficients. Each type of resource has its own constraint, and there are possibly additional constraints coupling the two types of resources. Below are three application examples and their problem formulations. In each example, it is assumed that there are M reflecting elements, and the RIS coefficients are collected in a vector e := [e_1, . . ., e_M]^H ∈ F, with F being the feasible set of the RIS coefficients; the specific form of F will be discussed after the three examples.
• Secure beamforming for multiple-input single-output (MISO) systems [11]: As shown in Fig. 2(a), the BS communicates with a single-antenna user with the help of an RIS in the presence of a single-antenna eavesdropper. The goal is to maximize the achievable secrecy rate by jointly optimizing the beamformer at the BS and the phase shift coefficients of the RIS under the transmit power constraint at the BS. To be specific, let the channels from the BS to the RIS, from the RIS to the user, and from the RIS to the eavesdropper, and the beamforming vector at the BS, be respectively denoted by H ∈ C^{M×N}, h ∈ C^{M×1}, g ∈ C^{M×1}, and w ∈ C^{N×1}. Then, the secrecy rate maximization problem is given by

  max_{w, e}  log_2(1 + |h^H diag(e) H w|^2 / σ^2) − log_2(1 + |g^H diag(e) H w|^2 / σ^2)   (1a)
  s.t.  ‖w‖^2 ≤ P,   (1b)
        e ∈ F,   (1c)

where σ^2 is the variance of the white Gaussian noise at the user and P is the transmit power budget at the BS.
• MISO uplink communication networks [12]: A number of single-antenna mobile users transmit signals to a multi-antenna BS with the assistance of an RIS, as shown in Fig. 2(b). The objective is to minimize the total uplink transmit power by jointly optimizing the phase shift coefficients of the RIS e and the transmit power x_k of each user k, under the power limits P_k and signal-to-interference-plus-noise ratio (SINR) constraints. Let the channels from the BS to the RIS, from the RIS to user k, and from the BS to user k be respectively denoted by H ∈ C^{M×N}, h_{r,k} ∈ C^{M×1}, and h_{d,k} ∈ C^{N×1}, for k ∈ {1, . . ., K}. Accordingly, the weighted power minimization problem is given by

  min_{{x_k}, e}  Σ_{k=1}^K λ_k x_k   (2a)
  s.t.  0 ≤ x_k ≤ P_k, ∀k,   (2b)
        e ∈ F,   (2c)
        SINR_k({x_k}, e) ≥ r_k, ∀k,   (2d)

where ĥ_k = h_{r,k}^H diag(e) H + h_{d,k}^H ∈ C^{1×N} is the equivalent channel from user k to the BS, λ = [λ_1, . . ., λ_K]^T represents the weights for the mobile users, r_k is the minimum SINR requested by user k, and SINR_k depends on the transmit powers and e through the equivalent channels ĥ_k.
• Computation offloading in Internet of Things (IoT) networks [13]: In the downlink transmission of an RIS-aided cache-enabled radio access network, a multi-antenna BS transmits signals to a number of single-antenna users, as shown in Fig. 2(c). The goal is to minimize the total network cost, which consists of both the backhaul capacity and the transmission power, by adjusting the caching proportion c_k of the file requested by user k, the precoding vector p_k ∈ C^{N×1} at the BS for user k, and the RIS coefficients. Besides the constraint on the RIS coefficients, there is also a constraint that the total cached content be smaller than the local storage size S_max at the BS. Further letting the target rate of user k be denoted by R_k, the total network cost minimization problem takes the form

  min_{{c_k}, {p_k}, e}  C_BH({c_k}) + η Σ_{k=1}^K ‖p_k‖^2   (3a)
  s.t.  0 ≤ c_k ≤ 1, ∀k,   (3b)
        Σ_{k=1}^K c_k S_k ≤ S_max,   (3c)
        e ∈ F,   (3d)
        SINR_k({p_k}, e) ≥ 2^{R_k / B} − 1, ∀k,   (3e)

where C_BH(·) denotes the backhaul cost, S_k is the size of the file requested by user k, η is a regularization parameter, 2^{R_k/B} − 1 is the SINR requirement corresponding to the content-delivery target rate of user k, B is the bandwidth of the system, and ĥ_k is defined as in the previous example.
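As a concrete illustration of the first example, the secrecy-rate objective can be evaluated numerically for a given beamformer and set of RIS coefficients. The sketch below assumes the cascaded-channel model above with no direct BS-user link; the helper name and the unit noise variance are illustrative, not from the cited work.

```python
import numpy as np

def secrecy_rate(H, h, g, w, e, sigma2=1.0):
    """Secrecy-rate objective for a given beamformer w and RIS vector e
    (hypothetical helper). H: M x N BS-RIS channel; h, g: length-M
    RIS-user and RIS-eavesdropper channels; w: length-N beamformer."""
    h_eq = h.conj() @ np.diag(e) @ H      # equivalent user channel (1 x N)
    g_eq = g.conj() @ np.diag(e) @ H      # equivalent eavesdropper channel
    snr_user = np.abs(h_eq @ w) ** 2 / sigma2
    snr_eve = np.abs(g_eq @ w) ** 2 / sigma2
    # secrecy rate is non-negative by definition
    return max(0.0, np.log2(1 + snr_user) - np.log2(1 + snr_eve))
```

Note that if the user and eavesdropper channels coincide, the secrecy rate is zero regardless of w and e, which matches the definition.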
In the above three applications, we can see that most of the constraints in the resource allocation problems are decoupled, in the sense that the constraints on the RIS coefficients e do not involve other resources, and vice versa. Coupled constraints, e.g., (2d) and (3e), can be converted into penalty terms in the objective function [14], [15] or decoupled by introducing auxiliary variables [16]-[19]. After these operations, without loss of generality, we consider a general resource allocation problem of the form

  min_{x ∈ X, e ∈ F}  f(x, e),   (4)

where f(x, e) is a continuous objective function, and x represents the conventional communication resources, with the set X representing the constraints on x, such as maximum transmit power, limited cache size, operation time limitations, etc.
With the decoupled constraints for x and e, the optimization problem is tractable under the commonly used block coordinate descent (BCD) framework, which alternately solves for x with e fixed and for e with x fixed. In particular, when the phase shift coefficients e of the RIS are given, the resource allocation problem reduces to a traditional communication problem without the RIS, which has been investigated for decades and should be familiar to many communication researchers. On the other hand, when x is fixed at a certain value, say x^{(n)}, the subproblem for optimizing e is

  min_{e ∈ F}  f(x^{(n)}, e).   (5)

Before discussing various methods for solving (5), let us review the modeling of the constraint set F on the RIS coefficients. Depending on whether the phase is modeled as a continuous or a discrete variable, the feasible set F is defined differently: • Continuous phase shift: Each RIS coefficient has infinite phase resolution, i.e., e_m is expressed as β_m e^{iθ_m} with i being the imaginary unit and θ_m a real number. For β_m, there are three variations in the literature.
- C1. β_m is a known constant, which is the ideal phase shift model [20]-[22]. This is the most popular model at the time of writing, and F is represented by the modulus constraints |e_m| = β_m, ∀m;
- C2. β_m is an unknown variable and is independent of θ_m [23], [24]. This model leads to a convex set F, described by |e_m|^2 ≤ c for some constant c;
- C3. β_m is a function of θ_m. This is a relatively new model and takes the hardware properties into consideration. For example, one of the recent models [25] states that

  β_m(θ_m) = (1 − β_min) ((sin(θ_m − φ) + 1) / 2)^α + β_min,

where β_min, φ, and α are known constants related to the specific circuit implementation.
• Discrete phase shift: Each RIS coefficient e m can only take one of the L possible phase shift values.
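For model C3, the phase-dependent amplitude can be computed directly. The sketch below assumes the commonly cited practical model β(θ) = (1 − β_min)((sin(θ − φ) + 1)/2)^α + β_min; the particular constants are illustrative, not taken from a specific circuit implementation.

```python
import numpy as np

def c3_amplitude(theta, beta_min=0.2, phi=0.43 * np.pi, alpha=1.6):
    """Phase-dependent amplitude of model C3 (assumed form; constants
    are illustrative). Output always lies in [beta_min, 1]."""
    return (1 - beta_min) * ((np.sin(theta - phi) + 1) / 2) ** alpha + beta_min
```

The amplitude attains its minimum β_min at θ = φ − π/2 and its maximum 1 at θ = φ + π/2, so an aggressive phase choice can cost reflected power under this model.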
Among the three continuous phase shift models, C2 gives a convex set, so its treatment is similar to the conventional resource allocation problem. Another way to view C2 is to treat the optimization of β_m and θ_m separately, so C2 is equivalent to e_m = β_m e^{iθ_m} with 0 ≤ β_m ≤ √c. If we regard the optimization of β_m as part of the conventional resources, the remaining constraint on θ_m is of the same form as C1. For C3, although it is nonconvex, it can be handled by the gradient descent method on θ_m (to be detailed in the next section). For C1, even though β_m is known and fixed, due to the modulus requirement its handling is non-trivial, and there exist a number of methods with different solution qualities for tackling this constraint.
On the other hand, for the discrete phase shift case, the corresponding problem (5) is an integer nonlinear program and is NP-hard (i.e., no polynomial-time algorithm is known for finding the optimal solution).
However, the most prevalent way for handling this model is to relax the discrete variables to their continuous counterparts.Then each of the obtained continuous phase shifts (by any methods for solving continuous phase shift model) is quantized to its nearest discrete value.Since the resolution of discrete phase shifts increases with the number of allowable phases, the quantization loss will be insignificant when the number of allowable phases is large [26].
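The relax-then-quantize step described above amounts to projecting each continuous coefficient onto the nearest of L discrete phases; a minimal sketch, assuming uniformly spaced levels:

```python
import numpy as np

def quantize_phases(e, L):
    """Project continuous unit-modulus RIS coefficients onto the nearest
    of L uniformly spaced discrete phase levels (relax-then-quantize)."""
    levels = np.exp(2j * np.pi * np.arange(L) / L)
    # nearest level in chordal distance, equivalent to nearest phase
    idx = np.argmin(np.abs(e[:, None] - levels[None, :]), axis=1)
    return levels[idx]
```

The worst-case phase error is π/L per element, which is why the loss vanishes as the number of allowable phases grows.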
Since C1 is the most fundamental model, in this paper we focus on reviewing the optimization methods for model C1, with some of the reviewed methods also applicable to C2 and C3. The AI-based methods will be covered in Section IV, with reinforcement learning also being suitable for the discrete phase shift model. Further emerging approaches for handling the discrete phase shift case will be discussed in the section on future challenges.
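The BCD framework that underlies the methods reviewed next can be sketched generically. The two subproblem solvers below are placeholders: one for a conventional resource allocation routine and one for a phase shift routine such as those reviewed in the next section.

```python
import numpy as np

def bcd(f, solve_x, solve_e, x0, e0, iters=20):
    """Block coordinate descent for min f(x, e): alternately update the
    conventional resources x (with e fixed) and the RIS variable e (with
    x fixed). solve_x / solve_e are problem-specific solvers (assumed)."""
    x, e = x0, e0
    history = [f(x, e)]                 # objective trace, non-increasing
    for _ in range(iters):
        x = solve_x(e)                  # classic problem without the RIS
        e = solve_e(x)                  # phase shift subproblem (5)
        history.append(f(x, e))
    return x, e, history
```

Because each block update cannot increase the objective, the recorded trace is monotonically non-increasing, which is the standard convergence argument for BCD.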

III. REVIEW ON OPTIMIZATION METHODS UNDER CONTINUOUS PHASE SHIFT
Currently, the major techniques for optimizing continuous phase shifts are the semidefinite relaxation (SDR) method, the penalty method, the majorization-minimization (MM) method, the gradient descent (GD) method, the manifold method, and the convex relaxation (CR) method. All the reviewed methods are primarily developed for C1 and can be applied to C2 if β_m and θ_m are optimized separately. C3 is handled by the GD method due to the complicated dependence of β_m on θ_m. Table I provides a quick summary of the methods reviewed in this section.

A. SDR Method
To handle the nonconvex modulus constraints, we can introduce a rank-one auxiliary variable Q = ee^H. This changes the optimization variable from e to Q, and the objective function is rewritten in terms of Q. To account for the rank-one property of Q and the fact that the diagonal elements of Q are all 1, we need to add the constraints rank(Q) = 1 and Q_{m,m} = 1, ∀m.
Then, problem (5) under C1 is equivalent to

  min_Q  f(x^{(n)}, Q)   (6a)
  s.t.  Q ⪰ 0, Q_{m,m} = 1, ∀m,   (6b)
        rank(Q) = 1,   (6c)

where f(x^{(n)}, Q) denotes the objective function rewritten in terms of Q. Notice that the transformed problem is still intractable due to the rank constraint rank(Q) = 1.
But the celebrated SDR method (i.e., removing the rank constraint) can be employed to solve the relaxed problem. More specifically, with the remaining constraints Q_{m,m} = 1 for m = 1, . . ., M expressed as tr(E_m Q) = 1, where E_m is a matrix with a single 1 in the (m, m)-th position and zeros in all other positions, the variable Q can be directly updated via the interior-point method, which is available in the popular software package CVX. If f is not convex, we may add another layer of successive convex approximation (SCA) to convexify the objective function in each SCA iteration, with the complexity increased by a factor equal to the number of SCA iterations. However, since the rank-one constraint is relaxed, the obtained solution may not be feasible for the original problem (6).
In general, a feasibility check is used to verify whether the obtained Q satisfies the rank constraint. Since the relaxed problem is convex, a closed-form solution of Q, or an explicit expression for Q, can be derived in the dual domain. The feasibility check can then be done by leveraging rank inequalities for matrix products [36]. If the rank constraint is not satisfied, a Gaussian randomization procedure can be employed to extract a feasible solution [37].
Since the computational complexity order of SDR is O(√M (2M^4 + M^3)), it could be too time-consuming for large-scale RISs.
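The Gaussian randomization step mentioned above can be sketched as follows, assuming the relaxed solution Q is available and the objective is to be minimized; the sampling scheme (draw z ~ CN(0, Q), project onto the unit circle, keep the best candidate) follows the standard recipe of [37].

```python
import numpy as np

def gaussian_randomization(Q, objective, n_trials=100, seed=0):
    """Extract a feasible unit-modulus vector from a (possibly
    higher-rank) SDR solution Q. `objective` maps e to a real value to
    be MINIMIZED (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    M = Q.shape[0]
    # matrix square root of the Hermitian PSD matrix Q
    vals, vecs = np.linalg.eigh(Q)
    sqrtQ = vecs @ np.diag(np.sqrt(np.maximum(vals, 0))) @ vecs.conj().T
    best_e, best_val = None, np.inf
    for _ in range(n_trials):
        z = sqrtQ @ (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
        e = np.exp(1j * np.angle(z))     # project sample onto unit circle
        val = objective(e)
        if val < best_val:
            best_e, best_val = e, val
    return best_e, best_val
```

When Q happens to be rank one, every sample shares the phases of the principal eigenvector up to a common rotation, so the randomization recovers the optimal phase profile.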

B. Penalty Method
To guarantee a feasible solution while avoiding the feasibility check of the SDR method, a penalty method can be employed. To be specific, the rank constraint rank(Q) = 1 in (6c) can be equivalently expressed as ‖Q‖_* − ‖Q‖_2 = 0, where ‖Q‖_* and ‖Q‖_2 denote the nuclear norm and the spectral norm of Q, respectively.
Then, with the constraint added as a penalty term, problem (6) is further transformed into

  min_Q  f(x^{(n)}, Q) + (1/(2µ)) (‖Q‖_* − ‖Q‖_2)
  s.t.  Q ⪰ 0, Q_{m,m} = 1, ∀m,

where µ ∈ (0, 1) is a penalty factor penalizing the violation of the constraint ‖Q‖_* − ‖Q‖_2 = 0. The transformed objective function now contains a difference-of-convex (DC) term, since ‖Q‖_* is convex and −‖Q‖_2 is concave. To convert the DC term into a convex form, SCA can be applied to −‖Q‖_2 (if f is nonconvex, SCA can also be applied to f at the same time).
The resultant problem is convex in Q in each SCA iteration. Accordingly, the optimal Q in each SCA iteration can be obtained by employing the interior-point method. Since the transformed problem is solved under the SCA framework, a stationary solution for Q is guaranteed. Furthermore, since problem (5) is equivalent to the transformed problem as µ tends to zero, the obtained solution is also a stationary point of (5). The penalty factor µ is important in controlling how strictly the rank constraint is imposed. In practice, it can be a decreasing sequence over the SCA iterations to guarantee a feasible solution of (5) at the end of the iterations.
As the interior-point method is adopted in each SCA iteration, the complexity order is at least O(M^3).

C. MM Method
Both the SDR method and the penalty method require a complexity of at least O(M^3). To reduce the computational complexity, the MM method can be employed to tackle the unit-modulus constraint. The key idea lies in constructing a sequence of surrogate functions that serve as upper bounds of the cost function with respect to the unknown variable e. Figure 3(a) visualizes how a linear surrogate function g(x^{(n)}, e|e^{(r)}) upper-bounds a convex quadratic function f(x^{(n)}, e) on the unit circle at the r-th iteration. Specifically, given the solution of e at the r-th iteration as e^{(r)} (the red point in Fig. 3(a)), the constructed linear surrogate function needs to satisfy: a) g(x^{(n)}, e^{(r)}|e^{(r)}) = f(x^{(n)}, e^{(r)}); b) g(x^{(n)}, e|e^{(r)}) ≥ f(x^{(n)}, e) for all feasible e; and c) ∇_e f(x^{(n)}, e) = ∇_e g(x^{(n)}, e|e^{(r)}) at the point e^{(r)}. In practice, the second-order Taylor expansion and Jensen's inequality are commonly used to find g(x^{(n)}, e|e^{(r)}) [4].
Since g(x^{(n)}, e|e^{(r)}) is a linear surrogate function, it has a closed-form minimizer q(e^{(r)}). We can then project q(e^{(r)}) onto the unit-circle manifold to obtain e^{(r+1)}. The next iteration finds q(e^{(r+1)}) based on e^{(r+1)}, and the process repeats. Therefore, problem (5) can be iteratively solved, and the final converged point is a locally optimal point of problem (5) [4]. The computational complexity of the MM method is dominated by the determination of the surrogate functions, which gives a complexity order of O(M^2).
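For a quadratic objective, the MM iteration admits the closed-form update described above. The sketch below assumes the instance f(e) = e^H A e − 2Re(b^H e) with Hermitian positive semidefinite A, and majorizes the quadratic term by λ_max(A)‖e‖², which is constant on the unit-modulus set.

```python
import numpy as np

def mm_unit_modulus(A, b, e0, iters=50):
    """MM for min_e  e^H A e - 2 Re(b^H e)  s.t. |e_m| = 1 (assumed
    quadratic instance). Each iteration minimizes a linear surrogate in
    closed form and lands back on the unit circle."""
    lam = np.max(np.linalg.eigvalsh(A))       # majorization constant
    e = e0.copy()
    for _ in range(iters):
        q = (lam * np.eye(len(e)) - A) @ e + b  # surrogate minimizer direction
        e = np.exp(1j * np.angle(q))            # closed-form unit-modulus update
    return e
```

By the surrogate conditions a)-c), the objective value is non-increasing across iterations, which the test below checks.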

D. GD Method
Even with the MM method, the complexity order is quadratic. To further reduce the computational complexity to linear order, GD can be employed to find a stationary point of (5). The key observation is that the ultimate unknown variable in the feasible set F is in fact {θ_m}_{m=1}^M rather than e. Therefore, problem (5) can be recast as the unconstrained optimization problem

  min_Θ  f(x^{(n)}, e^{iΘ}),   (9)

where Θ = [θ_1, . . ., θ_M]^T and e^{iΘ} := [e^{iθ_1}, . . ., e^{iθ_M}]^T. By recasting the quadratic function f(x^{(n)}, e) shown in Fig. 3(a) as f(x^{(n)}, e^{iΘ}), a graphical demonstration of the GD method is given in Fig. 3(b). Starting from a feasible initialization point Θ^{(0)}, Θ^{(r+1)} is obtained at the (r+1)-th iteration as Θ^{(r+1)} = Θ^{(r)} − b^{(r)} ∇_Θ f(x^{(n)}, e^{iΘ^{(r)}}), where b^{(r)} is the step size. Since only gradient information is involved in each update, GD has a linear complexity order with respect to M, and the final converged point is a stationary solution of (5). Another point to note is that expressing the objective function in terms of Θ introduces many local minima compared to the objective function in terms of e. Therefore, the quality of the converged solution of the GD method depends highly on the initialization. Notice that since this method directly optimizes with respect to θ_m, it is also applicable to model C3, where β_m is a function of θ_m; the only change in (9) is replacing e^{iΘ} with [β_1(θ_1)e^{iθ_1}, . . ., β_M(θ_M)e^{iθ_M}]^T.
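For the same assumed quadratic instance f(e) = e^H A e − 2Re(b^H e), the GD update on the phases can be sketched with the chain-rule gradient; the gradient expression below follows from differentiating through e_m = exp(iθ_m).

```python
import numpy as np

def gd_phases(A, b, theta0, step=0.01, iters=500):
    """Gradient descent on the phases for
    min_theta f(theta) = e^H A e - 2 Re(b^H e),  e = exp(i * theta),
    with A Hermitian (assumed quadratic instance)."""
    theta = theta0.copy()
    for _ in range(iters):
        e = np.exp(1j * theta)
        # d f / d theta_m = 2 Im(conj(e_m) (A e)_m) + 2 Im(conj(b_m) e_m)
        grad = 2 * np.imag(np.conj(e) * (A @ e)) + 2 * np.imag(np.conj(b) * e)
        theta = theta - step * grad
    return theta
```

Each iteration costs only a matrix-vector product and element-wise operations, which is the source of the linear per-element complexity claimed above.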

E. Manifold Method
Recognizing that the constraint set F forms a complex circle manifold in model C1, another low-complexity method is based on manifold optimization. A representative algorithm in this category is the Riemannian conjugate gradient (CG) method [5], which solves problem (5) on an oblique manifold by alternately computing the Riemannian gradient, finding the conjugate direction, and performing a retraction mapping. A graphical representation of the various steps of the Riemannian CG method is given in Fig. 4. More specifically, the Riemannian gradient of f(x^{(n)}, e) at the l-th iteration solution e^{(l)} is obtained by projecting the Euclidean gradient of f at e^{(l)} onto the tangent space (blue step in Fig. 4). After obtaining the Riemannian gradient grad_{e^{(l)}} f, the CG descent direction c^{(l)} at the point e^{(l)} can be computed, and e^{(l)} is updated as e^{(l)} + a^{(l)} c^{(l)} on the tangent space, where a^{(l)} is an Armijo backtracking step size (red step in Fig. 4). Since the updated point e^{(l)} + a^{(l)} c^{(l)} may not lie on the oblique manifold, it is projected back onto the manifold by the retraction mapping (black step in Fig. 4).
This method extends the GD method in the Euclidean space to the Riemannian manifold.
Compared to the GD method in the previous subsection, the manifold method does not reformulate the objective function in terms of Θ and thus avoids the many local minima shown in Fig. 3(b). By guaranteeing that the complex circle constraint is satisfied in every iteration, the Riemannian CG method converges to a stationary solution [5]. The computational complexity of the Riemannian CG update is dominated by the gradient step, which involves only element-wise operations. This gives a linear complexity order with respect to M.

Fig. 4. Graphic illustration of the Riemannian CG method at the l-th iteration.
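A minimal sketch of manifold optimization on the complex circle is given below. For brevity it uses plain Riemannian gradient descent with a fixed step, omitting the conjugate direction and Armijo search of the full Riemannian CG method, and again assumes the quadratic instance f(e) = e^H A e − 2Re(b^H e).

```python
import numpy as np

def riemannian_gd(A, b, e0, step=0.01, iters=500):
    """Riemannian gradient descent on the complex circle manifold
    |e_m| = 1 for f(e) = e^H A e - 2 Re(b^H e) (simplified sketch of the
    manifold method: fixed step instead of CG direction + Armijo)."""
    e = e0.copy()
    for _ in range(iters):
        egrad = 2 * (A @ e - b)                          # Euclidean gradient
        rgrad = egrad - np.real(egrad * np.conj(e)) * e  # tangent-space projection
        e = e - step * rgrad                             # step on tangent space
        e = e / np.abs(e)                                # retraction to |e_m| = 1
    return e
```

Unlike the phase parameterization of the GD method, the iterate always satisfies the modulus constraint exactly, and each update again involves only element-wise operations plus one matrix-vector product.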

F. CR Method
The idea of the CR method is that while the constraint set F in C1 is nonconvex, it can be relaxed to the Euclidean unit ball, which is a convex set. Therefore, problem (5) under C1 can be relaxed to

  min_e  f(x^{(n)}, e)  s.t.  |e_m|^2 ≤ 1, ∀m.   (10)

Since (10) has a convex feasible set, it can be solved via convex tools such as CVX. Afterward, the solution of the relaxed problem is projected to the nearest point satisfying |e_m|^2 = 1 to obtain a feasible solution.
A variant of the above method replaces the interior-point method with the projected gradient (PG) method, which alternates between a gradient step and a projection step. Although this variant has not been employed in the existing literature involving RISs, it has a linear computational complexity, compared with the cubic complexity of the interior-point method, and is thus promising for large-scale systems.
Notice that this method is also applicable to model C2. For model C2, where F is already of the form |e_m|^2 ≤ c, no relaxation is involved, and the solution is directly obtained by solving (10). Furthermore, unlike the other methods applied to C2, there is no need to optimize β_m and θ_m separately, since the optimization of β_m is incorporated in (10).
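The CR-PG variant can be sketched as follows, again for the assumed quadratic instance f(e) = e^H A e − 2Re(b^H e): a gradient step alternates with a projection onto the unit ball, followed by one final projection onto the unit circle for feasibility under C1.

```python
import numpy as np

def cr_projected_gradient(A, b, e0, step=0.02, iters=600):
    """Projected gradient for the convex relaxation (10) of the assumed
    quadratic objective; returns a feasible unit-modulus vector."""
    e = e0.copy()
    for _ in range(iters):
        e = e - step * 2 * (A @ e - b)            # gradient step
        mag = np.abs(e)
        e = np.where(mag > 1, e / np.maximum(mag, 1e-12), e)  # unit-ball projection
    return np.exp(1j * np.angle(e))               # final projection to |e_m| = 1
```

Both steps are element-wise apart from one matrix-vector product, giving the linear per-iteration complexity noted above.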

G. Summary and Performance Comparison
To sum up, the optimization methods for handling the continuous phase shift design in this section can be categorized into relaxation methods (SDR and CR), iterative approximation methods (the penalty method and MM), and gradient methods (GD and the manifold method). Their relationships are summarized in Fig. 5 and their properties are compared in Table I. From the simulation results, it can be observed that, out of the six algorithms, GD and the manifold method perform consistently well in all three applications, followed by the MM method and the penalty method. On the other hand, the SDR method and CR perform the worst in these three applications, owing to their relatively weak guarantees on solution quality.
On the other hand, the computation times of the various methods in the first application are shown in Fig. 7(a). From this figure, it can be seen that the manifold method, the GD method, and the CR-PG method require the least computation time among the six algorithms.


IV. REVIEW ON AI-BASED METHODS

A. Supervised Learning
If we have many training samples corresponding to different channel realizations, a DNN can be trained to approximate the behavior of a traditional optimization method. The advantage of this approach is that the learning results inherit the solution quality of the optimization methods [39], [40].
However, it has the additional burden of generating training samples, although low-complexity methods such as GD, the manifold method, and CR-PG help to reduce this burden compared with the SDR and MM methods. Furthermore, supervised learning can be extended to solve problem (4) directly by treating the channel realization as the input and all resources (both x and e) as the desired output of the DNN.

B. Unsupervised Learning
The connection between unsupervised learning and problem (5) comes from the observation that (5) can be regarded as an unconstrained optimization problem if the variable is viewed in terms of θ_m instead of e. This view has already been adopted in the GD method in Section III.
But in contrast to the GD method, which solves (9) with respect to Θ, unsupervised learning uses a DNN that accepts a channel realization as input and generates the corresponding Θ as output, and the optimization is with respect to the coefficients of the DNN. In unsupervised learning, the objective is to minimize E[f(x, e^{iΘ})], where the expectation is with respect to the distribution of the input channel state information. The training procedure involves first generating a large number of channel realizations and then optimizing Θ and x under the BCD framework. When optimizing Θ, back-propagation is used. On the other hand, when optimizing x, conventional optimization techniques are used, with the expectation tackled via sampling approximation. Unlike supervised learning, this approach does not need labelled data, which saves a significant amount of time in training data preparation. However, a disadvantage is that the obtained solution does not have any quality guarantee.

C. Reinforcement Learning
Another major framework in AI is deep reinforcement learning (DRL). In this framework, the agent (i.e., the decision maker) gradually derives its best action through trial-and-error interactions with the environment over time. A few basic elements characterize the DRL learning process: the state, the action, the reward, and the state-action value function.
1) State: a set, denoted by S, characterizing the environment. The state s^{(t)} ∈ S denotes the environment at time step t.
2) Action: a set of allowable actions, denoted by A. Once the agent takes an action a^{(t)} ∈ A at time instant t (determined by the state-action value function), the state of the environment transits from the current state s^{(t)} to the next state s^{(t+1)}.
3) Reward: the performance metric of a particular action, denoted by r^{(t)} at time instant t.

Depending on the type of action space, two DRL methods are available: the deep Q-network (DQN) algorithm, which is designed for discrete action spaces, and the deep deterministic policy gradient (DDPG) algorithm, which is designed for continuous action spaces. Hence, DQN fits the discrete phase shift model, while DDPG is employed for continuous phase shift variables.
In this subsection, we present a mapping of DQN in the context of resource allocation problems in RIS empowered wireless networks. In this model, the central controller, which controls the RIS, acts as the agent. At each time slot t, the agent observes a state, s^{(t)} ∈ S, which consists of all channel state information from the wireless system. According to the current state and the Q-function, the agent takes an action, a^{(t)} = argmax_a Q(s^{(t)}, a) over a ∈ A, where A consists of the discrete phase shifts that each reflecting element is allowed to take. After carrying out an action a^{(t)}, the agent obtains a reward r^{(t)}, determined from the negative objective function of (5), and observes the next state s^{(t+1)} generated by the wireless system. At each time slot, Q(s^{(t)}, a^{(t)}) is updated by

  Q(s^{(t)}, a^{(t)}) ← Q(s^{(t)}, a^{(t)}) + α [ r^{(t)} + γ max_{a'} Q(s^{(t+1)}, a') − Q(s^{(t)}, a^{(t)}) ],

where α is the learning rate and γ is the discount factor of the DQN. The aim of the DQN model is to enable the agent to carry out actions that maximize the long-term sum reward.
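The Q-function recursion described above can be sketched in tabular form; a lookup table replaces the DNN of DQN purely for clarity, with states and actions as integer indices.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    Q is a (num_states x num_actions) table, modified in place."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```

In the RIS setting, s would index a discretized CSI observation and a would index a discrete phase configuration, with the reward taken as the negative objective of (5).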

D. Summary and Performance Comparison
Different learning-based methods for solving problem (5) are summarized in Fig. 8. For supervised learning, since the training samples are generated by conventional optimization methods, the quality of the output is determined by the properties of the solution of the employed optimization method. For the other two methods (unsupervised learning and reinforcement learning), the outputs have no such quality guarantee.
To compare the performance of the different learning-based methods, the first example mentioned in Section II is simulated, with GD selected for generating the training samples of supervised learning; GD also serves as a performance benchmark. Fig. 9(a) shows the case of continuous phase shifts. It is clear that supervised learning performs close to the GD algorithm. This is not surprising, as supervised learning mimics the behavior of the optimization method chosen for generating the training data. However, although unsupervised learning does not need training data preparation, it performs unmistakably worse than supervised learning. Table II further compares the training and inference times. On the other hand, the performance of the deep learning methods under eight allowable discrete phases is shown in Fig. 9(b). For supervised learning and unsupervised learning, we simply quantize the learning results. For DRL, we employ the DQN algorithm, which is trained with a DNN for 2000 epochs and 128 minibatches per epoch. The GD method with unquantized output is also included in Fig. 9(b) to show the performance limit. It can be seen from Fig. 9(b) that the performance of quantization under supervised and unsupervised learning does not degrade much compared with the unquantized outputs in Fig. 9(a). For DQN, its performance lies between supervised learning and unsupervised learning. The training and inference times of DQN are also shown in Table II.

V. FUTURE CHALLENGES
While an explosive growth in the number of studies on resource allocation involving RISs has been witnessed in the past few years, challenging problems remain to be investigated. Below, four challenges are described and potential solutions are discussed.

A. Handling Channel Uncertainty
In general, due to the large number of passive reflective elements in an RIS, imperfect channel state information (CSI) is inevitable. Under channel uncertainty, the resource optimization becomes a stochastic counterpart of the problems discussed earlier. In particular, random CSI errors make the constraints appear in probabilistic form, and the objective function takes an extra expectation.
If the distribution of the channel uncertainty is known, this statistical information can be used to transform the probabilistic constraints into deterministic ones and to compute the expectation of the objective function explicitly [41]-[43]. However, due to the cascaded channel created by the RIS, the statistics of the CSI might be complicated, making the transformation from stochastic problems into deterministic ones suffer a performance loss, and/or the expectation computation intractable. In that case, Monte Carlo simulation-based methods could be used to handle the channel uncertainty [44].
On the other hand, learning-based methods can be modified to tackle uncertain CSI, even when the distribution of the channel uncertainty cannot be described in closed form. In particular, when preparing the training data, we generate both the true CSI and the CSI with added uncertainty.
During training, we input the observed CSI (which contains errors) to the DNN, but compute the loss function or reward function using the error-free CSI. In this way, the learning system automatically learns to "denoise" the CSI while learning the mapping to the RIS phase shifts.
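The data-generation and loss-evaluation structure of this scheme can be sketched as follows (a hypothetical setup: the channel model, noise level, and the simple co-phasing rule standing in for a DNN forward pass are all our assumptions):

```python
import cmath, random

rng = random.Random(1)

def make_training_pair(m, sigma):
    """Generate one training sample: the error-free CSI and the noisy
    CSI actually observed by the network."""
    h_true = [rng.gauss(0, 1) + 1j * rng.gauss(0, 1) for _ in range(m)]
    h_obs = [h + sigma * (rng.gauss(0, 1) + 1j * rng.gauss(0, 1)) for h in h_true]
    return h_true, h_obs

def unsupervised_loss(theta, h_true):
    """Negative received power, evaluated with the *error-free* CSI even
    though the network only ever sees the noisy CSI at its input."""
    s = sum(h * cmath.exp(1j * t) for h, t in zip(h_true, theta))
    return -abs(s) ** 2

# One step of the scheme: phases are predicted from the noisy CSI,
# but the loss is scored against the true channel.
h_true, h_obs = make_training_pair(m=4, sigma=0.1)
theta_pred = [-cmath.phase(h) for h in h_obs]   # stand-in for a DNN forward pass
loss = unsupervised_loss(theta_pred, h_true)
```

Minimizing this loss over many such pairs pushes the model toward phases that are good for the true channel, which is exactly the "denoising" behavior described above.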

B. Handling Discrete Phase Shift
Recently, the discrete phase shift model has begun to emerge, under the argument that the reflecting elements only have a finite number of reflection levels due to hardware limitations. The resulting resource allocation problem is even more challenging than its continuous phase shift counterpart, since it involves both continuous and discrete variables. At the time of writing, there are two major techniques for solving discrete phase shift problems: quantization and brute-force search, with the majority of works adopting quantization.
For the quantization-based method, we have demonstrated in Fig. 9(b) that the performance loss is insignificant if the number of discrete phase shifts is not very small. This explains why the quantization-based method is popular among existing works. However, when the number of allowable phases is small (e.g., 2 or 3), quantization leads to inevitable performance degradation. To overcome this issue, the original integer nonlinear program can be iteratively transformed into integer linear programs via linear cuts. Then, the branch-and-bound algorithm or exhaustive search can be employed to handle the resultant problem with discrete variables [45]. However, these search methods have exponential time complexity, which could be unacceptable even for modest values of M. Recently, the idea of alternating optimization (AO) has been applied to discrete phase shift search [46], in which the phase shifts are optimized one at a time so that the search space in each iteration is small. While this reduces the complexity significantly, only stationary points can be guaranteed.
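The AO idea can be sketched as follows for a toy single-user objective (maximizing received power through the RIS); the specific objective and the fixed number of sweeps are our illustrative choices, not the exact algorithm of [46]:

```python
import cmath, itertools, math

def power(theta, h):
    """Received power |sum_m h[m] * exp(j*theta[m])|^2 of a toy link."""
    return abs(sum(hm * cmath.exp(1j * t) for hm, t in zip(h, theta))) ** 2

def ao_discrete(h, levels, n_sweeps=5):
    """Alternating optimization: update one reflecting element at a time
    by a linear search over the allowed discrete phases, so each
    iteration searches only `levels` candidates instead of levels**M."""
    phases = [2 * math.pi * k / levels for k in range(levels)]
    theta = [0.0] * len(h)
    for _ in range(n_sweeps):
        for m in range(len(h)):
            theta[m] = max(phases,
                           key=lambda p: power(theta[:m] + [p] + theta[m + 1:], h))
    return theta

def exhaustive(h, levels):
    """Brute-force search over all levels**M combinations (exponential)."""
    phases = [2 * math.pi * k / levels for k in range(levels)]
    return max(itertools.product(phases, repeat=len(h)),
               key=lambda t: power(list(t), h))

h = [1 + 0j, 0.8 - 0.6j, -0.5 + 0.5j]
theta_ao = ao_discrete(h, levels=4)
theta_bf = exhaustive(h, levels=4)
```

The AO result can never exceed the brute-force optimum, but each sweep costs only M times a linear search, versus the exponential cost of `exhaustive`.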
As can be seen, the design of discrete phase shifts is still at an early stage. It remains a challenge to derive a low-complexity approach that achieves performance close to that of brute-force search. Among conventional optimization methods, the greedy algorithm, despite its heuristic nature, might be suitable here, as it has quadratic complexity by using a linear search at each step. Besides, by viewing the desired phase angle as a non-zero element in a sparse vector [47], sparse signal processing techniques such as the Lasso approximation [48] and the penalty method [49] can also be applied to handle the discrete variables. On the other hand, although the DQN algorithm of DRL matches the discrete phase problem, it can only provide a feasible solution and suffers from a slow learning rate and an unstable learning process. Making DRL more efficient in wireless applications would be an important direction.
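One way to see the penalty idea: a term that vanishes exactly on a grid of K uniformly spaced phases can be subtracted (with a weight) from the relaxed continuous objective, steering the solution toward discrete values. The particular penalty below is our illustrative choice and not necessarily the one used in [49]:

```python
import cmath, math

def discreteness_penalty(theta, levels):
    """Penalty that is zero iff every phase lies on the grid
    {2*pi*k/levels}: since exp(j*levels*theta) = 1 exactly at those
    points, |exp(j*levels*theta) - 1|^2 vanishes there and is
    strictly positive everywhere else."""
    return sum(abs(cmath.exp(1j * levels * t) - 1) ** 2 for t in theta)

# Penalized (maximization) objective: f(theta) - lam * discreteness_penalty(theta, K)
```

Because the penalty is smooth in θ, the penalized problem can be tackled with the same continuous-phase machinery (e.g., gradient-based methods) while gradually increasing the penalty weight.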

C. Handling Mobility of RISs and Users
For a large-scale data-centric network, since communication service requirements are highly dynamic and imbalanced among users, it is usually inefficient to deploy RISs at fixed locations.
To improve network coverage and serve remote nodes, RISs can be deployed on autonomous systems, such as unmanned aerial vehicles or unmanned ground vehicles, to provide flexible channel reconfiguration. Furthermore, the locations of users may also change dramatically over time in emerging V2X networks. Due to the passive nature of RISs, they cannot send pilot signals to track the movement of users, especially when the direct links from the BS to the users are blocked [50]. With a mobile RIS or mobile users, the system performance depends not only on the RIS's or users' locations but also on the trajectory itself. Consequently, the dimension of the design variables is significantly increased.
Mathematically, the time-varying phase shift design of a mobile wireless system can be modeled as a high-dimensional dynamic programming problem, for which Q-learning, temporal difference learning, and policy iteration algorithms from approximate dynamic programming could provide effective solutions [51]. On the other hand, since the CSI of unvisited places and future time slots is unknown, the prior distribution of the channels has to be predetermined via a geometry-based tracing approach. However, as time evolves, the knowledge of the channel distribution should be updated for a better phase shift design. This can be modeled as a partially observable Markov decision process, where DRL methods can be used to learn the underlying wireless environment while deciding the moving trajectory on the fly. Here, the state of the DRL includes not only the current CSI but also the action from the previous time step. Furthermore, by exploiting extra partial information (e.g., previous locations and velocities of the users or the RIS), the post-decision state algorithm can be used to find an optimized solution in dynamic environments during the training of the DRL model [52].
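The core Q-learning update Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') − Q(s,a)) can be illustrated on a toy chain MDP (a deliberately minimal stand-in for the trajectory problem, not the system of [51]; all environment details here are assumptions):

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain MDP: states 0..n-1, actions
    0 (move left) / 1 (move right), reward 1 only upon reaching the
    last state. The behavior policy is purely exploratory."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            a = rng.randrange(2)                       # random exploration
            s_next = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s_next == n_states - 1 else 0.0
            # temporal-difference update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
            s = s_next
    return q

q = q_learning()
# Greedy policy extracted from the learned Q-table (excluding the goal state).
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
```

In the mobile-RIS setting, the state would instead encode CSI and location information and the action would be a combined movement/phase decision, but the same update rule applies.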

D. Scalability of AI-based Methods
In AI-based methods, while generic multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) have been widely used for wireless resource allocation, there are two well-known technical challenges. First, MLPs and CNNs are more difficult to train in large-scale settings than their small-scale counterparts. For example, as demonstrated in the beamforming problem [53], although the performance of CNNs is near-optimal when trained and tested under a two-user setting, there is an 18% performance gap to the classic algorithm when trained and tested under a 10-user setting. Second, MLPs are designed for a pre-defined problem size with fixed input and output dimensions. In the context of an RIS problem, this means that an MLP well-trained for a particular RIS dimension is not applicable to another setting in which the number of reflecting elements differs.
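The fixed-dimension limitation can be contrasted with a layer whose weights are shared across elements. The one-line mean-aggregation layer below (a deliberately minimal toy, not an architecture from the cited works) has weights that are independent of the input length, and permuting its input permutes its output in the same way:

```python
def equivariant_layer(x, a=2.0, b=-1.0):
    """Toy permutation-equivariant layer: y_i = a * x_i + b * mean(x).
    The weights (a, b) do not depend on len(x), so the same trained
    layer applies to any number of users/reflecting elements."""
    m = sum(x) / len(x)
    return [a * xi + b * m for xi in x]

x = [1.0, 2.0, 3.0]
perm = [2, 0, 1]
y = equivariant_layer(x)
y_perm = equivariant_layer([x[i] for i in perm])  # equals y permuted by perm
y_big = equivariant_layer([1.0] * 10)             # a larger input, same weights
```

An MLP, by contrast, would need a separate weight matrix for each input size and would treat each input position differently.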
Recent studies have shown that incorporating the permutation equivariance property into the neural network architecture can reduce the parameter space, avoid a large number of unnecessary permuted training samples, and, most importantly, make the neural network generalizable to different problem scales [54]-[57]. In particular, graph neural networks (GNNs) [54], [55] and attention-based transformers [56], [57] have been shown to possess the permutation equivariance property and have demonstrated superior performance, scalability, and generalization ability in a number of wireless resource allocation problems. For instance, in the beamforming problem, a GNN trained with data generated in a setting of 50 users was shown to achieve near-optimal testing performance under a much larger setting of 1000 users [54]. This result in fact simultaneously addresses the two challenges mentioned above (the difficulty of training in large-scale settings and the lack of generalizability to different problem sizes). Interestingly, permutation equivariance also exists in RIS phase shift design problems, since exchanging the channels of two reflecting elements should result in a corresponding permutation of the optimized phase shifts. Therefore, it is expected that GNNs and attention-based transformers would be effective neural network architectures for RIS design problems as well.

Fig. 3. (a) Linear upper bound g(x^(n), e | e^(r)) for a quadratic function f(x^(n), e) at point e^(r) on the unit circle. (b) Graphical representation of the GD method for updating Θ, where b^(r) is the update step size and ∇_Θ f(x^(n), e^{iΘ^(r)}) is the gradient of f at the last-iteration solution Θ^(r).

Fig. 6. Performance comparisons of six optimization methods with M = 10: (a) Secrecy rate versus the maximum transmit power with N = 20 BS antennas [11]. (b) Uplink transmit power versus the number of users with N = 20 BS antennas and transmit power limit P_k = 10.8 dBm [12]. (c) Total network cost versus the number of users with N = 10 BS antennas, target rate R_k = 10 MHz, bandwidth B = 10 MHz, regularization parameter η = 100, and local storage size S_max = 100 [13].

4) State-action value function (Q-function): While the reward represents the immediate return from taking action a at state s, the state-action value function, denoted by Q(s, a), indicates the cumulative reward the agent may obtain from taking action a in state s.

Fig. 8. Illustration of different learning methods. The loss functions of supervised learning and unsupervised learning are the mean squared error (MSE) between labels and predicted phases, and the expectation of the objective function of (9) over the CSI, respectively.

Fig. 9. Performance comparison of the learning-based methods with M = 10: (a) continuous phase shifts; (b) discrete phase shifts.

VI. CONCLUSION
This paper has reviewed and compared current optimization methods for solving resource allocation problems associated with RISs. It has been noted that most of the available methods are tailored to continuous phase shift constraints, and that AI-based methods are emerging as serious contenders. With the principles and properties of different algorithms explained and illustrated, and future challenges analyzed, it is hoped that this paper will facilitate the suitable choice of algorithms for future research problems involving RISs.