Robust Transceiver Design for IRS-Assisted Cascaded MIMO Communication Systems

Intelligent reconfigurable surfaces (IRSs) have gained much attention due to their passive behavior, which can make them a successor to relays in many applications. However, traditional relay systems might still be the better choice when reliability and throughput are the main concerns in a communication system. In this work, we use an IRS along with a decode-and-forward (DF) relay to address one of the main challenges of future wireless networks, namely reliability. We investigate a robust transceiver design against the residual self-interference (RSI) that maximizes the throughput rate under self-interference channel uncertainty-bound constraints. The resulting problem is a non-convex optimization problem, where the non-convex objective is optimized over the cone of positive semidefinite matrices. We propose a novel mathematical method to find a lower bound on the performance of the IRS that can be used as a benchmark. Moreover, we show an important result: in the worst-case scenario, the IRS can be helpful only if the number of IRS elements is at least as large as the size of the interference channel. We also propose a novel method based on majorization theory and singular value decomposition (SVD) to find the best response of the transmitters and the relay against the worst-case RSI. Furthermore, we propose a multi-level water-filling algorithm to obtain a locally optimal solution iteratively. We show that our algorithm outperforms the state of the art in terms of time complexity as well as robustness. For instance, our numerical results show that the achievable rate can be increased twofold and almost sixfold for small and large antenna arrays at the transceivers, respectively.


Introduction
Reliability and throughput are two of the most crucial requirements for the next generation of wireless networks. Optimally relaying the signal from a source to a destination can help enhance reliability and capacity of networks and is currently an active research area [1].
Another emerging candidate for relaying signals is the intelligent reconfigurable surface (IRS) [2]. An IRS is a device equipped with multiple passive reconfigurable reflectors that reflect impinging waves with an adjustable phase. One of the biggest advantages of IRSs is that they operate in real time without consuming a noticeable amount of power [3]. However, the characteristics of the IRS (e.g., the lack of signal amplification and decode-and-forward processing) can limit its functionality. As a result, in cases where reliability and throughput matter more than power consumption, conventional relays might still be a better option than IRSs. For instance, the authors of [4] show that a simple full-duplex relay can outperform an IRS in terms of throughput under certain conditions.
In this paper, we investigate an IRS-assisted MIMO full-duplex (FD) relay system that suffers from channel uncertainties. The relay is also assumed to face practical issues such as self-interference (SI) as well as antenna and power limits. The combination of an IRS and a DF relay can be advantageous, since each has its own limitations that can be compensated for by the other. The objective of this paper is to maximize the achievable rate of the system by jointly optimizing the IRS configuration and the covariance matrices of the source and the relay.

Related works
IRSs can be utilized in various ways to help the direct links enhance system performance. In [5], the authors utilized an IRS to maximize the weighted sum rate of a MISO system. The authors of [6] proposed a method to minimize the power consumption in a MISO system equipped with an IRS, and the authors of [7] investigated energy efficiency in an IRS-assisted MISO downlink system. Rate maximization in a MIMO system was studied in [8], where the authors proposed an iterative algorithm to find the best IRS pattern assuming that perfect channel state information (CSI) is available. Recently, the authors of [9] utilized an IRS in a relay-aided network to minimize power consumption and showed that the combination of IRS and relay outperforms other scenarios. While the performance of IRS communication systems has been extensively studied, little research considers robust design when perfect CSI is not available [10].
Recently, with the emergence of artificial intelligence, IRSs have shown great potential to improve existing protocols and technologies [11]. The authors of [12] investigate the benefit of employing an IRS equipped with a multi-task learning system in terms of the transmit power and achievable throughput of aerial-terrestrial communications. In [13], the authors use a reinforcement-learning-based approach to optimize the IRS reflection coefficients for buffer-aided relay selection.
One of the earliest studies of robust transmission design for IRS-assisted systems was undertaken in [14], where a bounded CSI error model is applied to a power minimization problem in a MISO transmission system. There, by virtue of semidefinite programming (SDP), the authors turn the original problem into a sequence of convex sub-problems. Robust power minimization subject to outage probability constraints under a statistical cascaded channel error model is considered in [15], where the aim is to optimize the system under a worst-case rate constraint. The authors of [16] proposed a robust algorithm for mean squared error (MSE) minimization in a single-user MISO system equipped with an IRS. Their method provides a closed-form solution in each iteration; however, it applies only to the single-user case and cannot be extended to more general settings with multiple users. Recently, a robust algorithm based on the penalty dual decomposition (PDD) technique was proposed in [10] for sum-rate maximization, assuming that the channel estimation error follows a complex normal distribution.
Exploiting a relay to improve the throughput rate is a classic alternative to the IRS in communication systems. However, utilizing a relay raises some important questions. For instance, how should the relay process the received signal before forwarding it to the destination? The relay can receive a signal from the source, process it, and then transmit it towards the destination in successive phases. This type of relaying is known as half-duplex relaying. Alternatively, while receiving a signal at a certain time instant, the relay can simultaneously transmit previously received signals. This technique is known as full-duplex relaying [17].
As a consequence of transmitting and receiving on a common resource unit, the relay is confronted with SI. Note that full-duplex relaying increases the total throughput rate compared to its half-duplex counterpart only if the SI is handled properly at the relay input. By physically isolating the transmitter and receiver front ends of the relay, a significant portion of the SI can be suppressed [18]. Moreover, analog and/or digital signal processing at the relay input can be utilized to cancel a further portion of the SI [19][20][21][22], provided that an estimate of the SI is available at the relay. These SI cancellation procedures can mitigate the destructive impact of SI only up to a certain level; the remaining portion, the so-called residual self-interference (RSI), is still present at the relay input. The distribution of the RSI is investigated in [23,24]. The RSI is mainly due to channel estimation uncertainties and transmitter noise. Therefore, the quality of channel estimation plays an important role in limiting the RSI when conventional modulation techniques are utilized.
The authors of [25] employ a superimposed signaling procedure (an asymmetric modulation constellation) in basic point-to-point FD communication to cancel the SI and retrieve the desired information without requiring channel estimates. They show that, for the same average energy per transmission block, their method achieves a lower bit error rate than conventional ones. The RSI evidently degrades communication quality. In this context, the authors of [26] study the degrees of freedom (DoF), i.e., the slope of the rate curve at asymptotically high SNR, and their relation to the performance of an FD cellular network in the presence of RSI. Moreover, the authors of [27] and [28] investigate joint rate-energy optimization and delivery time optimization of FD communication, respectively, when RSI is present. Furthermore, the authors of [29] study the sum-rate capacity of the FD channel with and without such degradation. In the presence of RSI, the authors of [30] study the capacity of a Gaussian two-hop FD relay.
Robust transceiver design against the worst-case RSI channel helps determine the threshold for switching between the half-duplex (HD) and FD operating modes. This setup is commonly known as a hybrid relay system [31]. The authors of [32] investigate a robust design for multi-user full-duplex relaying with a multi-antenna DF relay, where the sources and destinations are equipped with single antennas. Moreover, the authors of [33] investigate a robust transceiver design for FD multi-user MIMO systems that maximizes the weighted sum rate of the network. A robust design against the worst-case RSI is investigated in [34].

Contribution
Motivated by the above, in this work, we consider a DF multi-hop system with multiple antennas at the source, relay, and destination, along with an IRS that provides additional links. We then maximize the throughput rate for the worst-case RSI scenario. To the best of our knowledge, this is the first time that throughput rate maximization against the worst-case RSI is evaluated for an IRS-assisted DF full-duplex relay in MIMO systems. First, we simplify the problem by finding an analytical lower bound on the performance of the IRS. Then, maximizing the achievable rate of DF full-duplex relaying is cast as a non-convex optimization problem. Thereafter, we propose a low-complexity method based on majorization theory, together with an efficient algorithm that solves this problem in polynomial time. Finally, the transmit signal covariances at the source and the relay are designed to improve robustness against the worst-case RSI channel within a given uncertainty bound. Notice that once the covariances are known, the precoders can easily be found using conventional methods such as the singular value decomposition (SVD). To the best of our knowledge, this is also the first work that uses the IRS for RSI cancellation in MIMO full-duplex DF relay systems.

Organization
The rest of the paper is organized as follows. Section 2 outlines the system model and introduces its characteristics. The three different tasks for employing the IRS are also given in this section. In Section 3, the optimization problem belonging to the FD scenario is formulated, and its proper solution is presented. In addition, analytical bounds for the performance of the IRS are given and their corresponding proofs are provided. Section 4 provides the optimization problem for the HD scenario along with the solution. In Section 5, the effectiveness of the proposed algorithm is evaluated and verified by performing numerical simulations over various aspects. Finally, the paper is concluded in Section 6, and technical proofs of the theorems are given in the Appendices A-E.

System Model
We consider communication from a source equipped with $N_t$ antennas to a destination with $N_r$ antennas. Reliable communication from the source to the destination is assumed to be feasible only by means of a relay with $K_t$ transmit and $K_r$ receive antennas at its output and input front ends, respectively. This means that the direct source-destination link and the source-IRS-destination link have a negligible impact on the throughput. This assumption is realistic in scenarios where the path loss is high, due to high frequency ranges such as mmWave and terahertz or due to long distances [35], as well as in cases where objects block the direct link between the source and the destination. An IRS consisting of $M$ elements is deployed either to cancel the RSI or to help enhance one of the source-relay/relay-destination links. The overall system model is shown in Figure 1.
Figure 1. System model of an IRS-assisted full-duplex relay. The source and destination are equipped with $N_t$ and $N_r$ antennas, respectively, the relay is equipped with $K_t$ transmit and $K_r$ receive antennas, and the IRS has $M$ passive elements.
In this paper, it is assumed that signal delivery over the source-IRS-destination link is not available. This is mainly due to power attenuation and power radiation pattern effects [36]. Since the IRS is a passive device, it introduces some power attenuation in practice, which makes the reflected waves weaker than the incident ones. In addition, due to the power radiation pattern, both the incident and the reflected waves are attenuated depending on the angles of arrival and departure. In our system, since the IRS is deployed in the vicinity of the relay and faces it, both effects make the source-IRS-destination link much weaker than the source-relay-destination link.
Next, we present the achievable throughput rates for HD and FD relaying. We start with the HD relay, in which κ = 0. In the second case, the IRS can be exploited to enhance the quality of the channel between the source and the relay. In such a case, the received signals at the relay and the destination also depend on $\mathbf{H}_{SI} \in \mathbb{C}^{N_t \times M}$, the channel from the source to the IRS. Finally, the IRS can be used to help the channel from the relay to the destination. In this case, the received signals also depend on $\mathbf{H}_{ID} \in \mathbb{C}^{M \times N_r}$, the channel from the IRS to the destination.
In what follows, we find the achievable rates for the three aforementioned cases and compare them to see under which conditions each should be applied. The notation and definitions are summarized in Table 1.
Table 1. Notation and definitions.
$T_r$: The RSI channel uncertainty bound.
$T$: The remaining RSI channel uncertainty after considering the impact of the IRS.

Overview
Suppose that the relay employs a DF strategy. In the full-duplex scenario, both the source-relay and relay-destination links are active at the same time. As a result, the signals from the relay transmitter interfere with the received signal at the relay receiver. We assume that an estimate of the SI channel $\mathbf{H}_r$, denoted by $\hat{\mathbf{H}}_r$, is available at the relay. Hence, the RSI channel, represented by $\tilde{\mathbf{H}}_r$, is given as $\tilde{\mathbf{H}}_r = \mathbf{H}_r - \hat{\mathbf{H}}_r$. In the rest of the paper, we develop an approach to deal with this RSI.

Mathematical Preliminaries
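Considering an FD DF relay, achievable rates are given in [37]. Depending on how the IRS is applied to the system, three sets of rates are possible. In the first case, the IRS is used to cancel the self-interference, and the effective SI channel is $\mathbf{H}_{\mathrm{tot},1} = \mathbf{H}_r + \mathbf{H}_{RI}\boldsymbol{\Theta}\mathbf{H}_{IR}$. In the second case, the IRS is deployed to help the source-relay channel, with $\hat{\mathbf{H}}_{\mathrm{tot},2} = \hat{\mathbf{H}}_1 + \hat{\mathbf{H}}_{RI}\boldsymbol{\Theta}\hat{\mathbf{H}}_{IR}$ and $\tilde{\mathbf{H}}_{\mathrm{tot},2} = \tilde{\mathbf{H}}_1 + \tilde{\mathbf{H}}_{RI}\boldsymbol{\Theta}\tilde{\mathbf{H}}_{IR}$. In the third case, the IRS is utilized to enhance the rate of the relay-destination channel, with $\hat{\mathbf{H}}_{\mathrm{tot},3} = \hat{\mathbf{H}}_2 + \hat{\mathbf{H}}_{RI}\boldsymbol{\Theta}\hat{\mathbf{H}}_{IR}$ and $\tilde{\mathbf{H}}_{\mathrm{tot},3} = \tilde{\mathbf{H}}_2 + \tilde{\mathbf{H}}_{RI}\boldsymbol{\Theta}\tilde{\mathbf{H}}_{IR}$. Since a portion of the SI remains uncanceled, a transceiver that is robust against the worst-case RSI channel is required, which is formulated as the following optimization problem:

$$\max_{\mathbf{Q}_s, \mathbf{Q}_r, \boldsymbol{\Theta}} \; \min_{\tilde{\mathbf{H}}_r} \; \min\left\{R^{FD}_{sr}, R^{FD}_{rd}\right\} \tag{13}$$
$$\text{subject to} \quad \mathrm{Tr}(\mathbf{Q}_s) \le P_s, \tag{13a}$$
$$\mathrm{Tr}(\mathbf{Q}_r) \le P_r, \tag{13b}$$
$$\mathrm{Tr}(\tilde{\mathbf{H}}_x \tilde{\mathbf{H}}_x^H) \le T_x, \tag{13c}$$
$$|\theta_m| = 1, \; \forall m, \tag{13d}$$

in which the throughput rate with respect to the worst-case RSI channel is maximized. Constraints (13a) and (13b) represent the transmit power budgets $P_s$ and $P_r$ at the source and the relay, respectively. In constraint (13c), $T_x$ represents the RSI or channel estimation error bound corresponding to $\tilde{\mathbf{H}}_x$. Notice that $\mathrm{Tr}(\tilde{\mathbf{H}}_x \tilde{\mathbf{H}}_x^H)$ is the sum of the squared singular values of $\tilde{\mathbf{H}}_x$. Using a bounded matrix norm is the most common way of modeling the uncertainty of a matrix [38,39]. In practice, $T_x$ can be found using stochastic methods when the distribution of the channel error is known; otherwise, it can be found using a sample average approximation method. Finally, constraint (13d) is due to the unit-modulus limitation of the IRS elements, where $\theta_m$ denotes the $m$th diagonal element of $\boldsymbol{\Theta}$.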
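As an illustration of the constraint set of (13), the following Python sketch checks a candidate point against the power budgets, the uncertainty bound, and the unit-modulus condition, and projects arbitrary IRS coefficients onto the unit circle. It is a minimal sketch under our own assumptions: the function names are hypothetical, a single error matrix with bound $T_x$ is checked, and NumPy is used for the linear algebra.

```python
import numpy as np


def project_unit_modulus(theta):
    """Map each IRS coefficient onto the unit circle, keeping its phase (|theta_m| = 1).

    Exactly-zero entries have no phase; we map them to 1 by assumption."""
    theta = np.asarray(theta, dtype=complex)
    mag = np.abs(theta)
    out = np.ones_like(theta)
    nonzero = mag > 0
    out[nonzero] = theta[nonzero] / mag[nonzero]
    return out


def is_psd(Q, tol=1e-9):
    """Hermitian with no eigenvalue below -tol."""
    Q = np.asarray(Q)
    return np.allclose(Q, Q.conj().T, atol=tol) and np.linalg.eigvalsh(Q).min() >= -tol


def is_feasible(Q_s, Q_r, H_err, theta, P_s, P_r, T_x, tol=1e-9):
    """Check the power budgets, the uncertainty bound, and the unit-modulus constraint."""
    power_ok = np.trace(Q_s).real <= P_s + tol and np.trace(Q_r).real <= P_r + tol
    # Tr(H H^H) is the squared Frobenius norm, i.e., the sum of squared singular values
    uncertainty_ok = np.linalg.norm(H_err, "fro") ** 2 <= T_x + tol
    unit_modulus_ok = np.allclose(np.abs(theta), 1.0, atol=tol)
    return bool(is_psd(Q_s) and is_psd(Q_r) and power_ok and uncertainty_ok and unit_modulus_ok)


rng = np.random.default_rng(0)
theta = project_unit_modulus(rng.standard_normal(4) + 1j * rng.standard_normal(4))
H_err = 0.1 * rng.standard_normal((2, 2))
print(is_feasible(0.5 * np.eye(2), 0.25 * np.eye(2), H_err, theta, P_s=1.0, P_r=1.0, T_x=1.0))  # True
print(is_feasible(2.0 * np.eye(2), 0.25 * np.eye(2), H_err, theta, P_s=1.0, P_r=1.0, T_x=1.0))  # False
```

The second call fails only because $\mathrm{Tr}(\mathbf{Q}_s) = 4$ exceeds the assumed budget $P_s = 1$.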
Problem (13) is non-convex and hard to solve. As a result, for each of the above-mentioned scenarios, we propose a simplified version of the optimization problem and solve it instead. Since we are interested in the throughput corresponding to the worst-case RSI, any simplification of the optimization problem should favor the RSI and the interference. First, we analyze the performance of the system when the IRS helps the relay cancel the RSI. We show that problem (13) can be simplified to optimization problem (14), subject to $\mathrm{Tr}(\mathbf{Q}_s) \le P_s$, $\mathrm{Tr}(\mathbf{Q}_r) \le P_r$, and an RSI uncertainty constraint involving $T$, where $\mathrm{Vec}(\cdot)$ denotes the vector of all non-zero elements of its input matrix. $T$ can be written equivalently in terms of the column-wise Khatri-Rao product $*$, defined as

$$\mathbf{A} * \mathbf{B} = \left[\mathbf{A}_1 \otimes \mathbf{B}_1, \; \mathbf{A}_2 \otimes \mathbf{B}_2, \; \ldots, \; \mathbf{A}_n \otimes \mathbf{B}_n\right],$$

where $\mathbf{A}_i$ is the $i$th column of $\mathbf{A}$ and $\otimes$ denotes the Kronecker product. See Appendix A for the proof. One can show that $T \le \left(\sqrt{T_r} - \sigma_{\min}(\mathbf{H}_{IR} * \mathbf{H}_{RI}^T)\right)^2$. As mentioned before, problem (14) is a simplification of problem (13). This means that every achievable rate inside the feasible set of (14) is also inside the feasible set of (13). The reverse is not necessarily true, i.e., a rate that is feasible for (13) is not necessarily feasible for (14); however, since we are looking for achievable rates, we can still use this method. The reason is that in problem (13), the minimization over the RSI happens only once, and the worst-case RSI simultaneously tries to counteract the best IRS configuration and the best covariance matrices. In (14), the RSI first does its worst damage to the performance of the best IRS configuration and then performs another optimization to produce the worst power allocation against the best covariance matrices. This will become clearer when the geometrical representation of the problem is given.
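The bound on $T$ can be evaluated numerically. The sketch below is a hypothetical illustration, not the authors' code. It assumes $\mathbf{H}_{IR} \in \mathbb{C}^{K_r \times M}$ and $\mathbf{H}_{RI} \in \mathbb{C}^{M \times K_t}$ (the shapes implied by the $K_t = 1$ example in the proof of Theorem 1), takes $\sigma_{\min}$ as the $(K_rK_t)$th singular value so that it is zero whenever $M < K_rK_t$, and clips the difference at zero because a distance cannot be negative; these conventions are our assumptions.

```python
import numpy as np


def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: column m equals kron(A[:, m], B[:, m])."""
    if A.shape[1] != B.shape[1]:
        raise ValueError("A and B must have the same number of columns")
    I, M = A.shape
    J = B.shape[0]
    # out[i, j, m] = A[i, m] * B[j, m]; the row-major reshape puts it at row i*J + j, as kron does
    return np.einsum("im,jm->ijm", A, B).reshape(I * J, M)


def remaining_rsi_bound(H_IR, H_RI, T_r):
    """Evaluate max(0, sqrt(T_r) - sigma_min(H_IR * H_RI^T))**2.

    Assumed shapes: H_IR is K_r x M and H_RI is M x K_t."""
    K = khatri_rao(H_IR, H_RI.T)               # (K_r*K_t) x M
    s = np.linalg.svd(K, compute_uv=False)     # singular values in decreasing order
    n_out = K.shape[0]
    # Smallest singular value over the K_r*K_t-dimensional RSI space; it is zero
    # when M < K_r*K_t because the IRS cannot reach every RSI direction.
    sigma_min = s[n_out - 1] if s.size >= n_out else 0.0
    return max(0.0, np.sqrt(T_r) - sigma_min) ** 2


rng = np.random.default_rng(1)
K_r, K_t, T_r = 2, 1, 4.0
for M in (1, 3, 8):
    H_IR = rng.standard_normal((K_r, M)) + 1j * rng.standard_normal((K_r, M))
    H_RI = rng.standard_normal((M, K_t)) + 1j * rng.standard_normal((M, K_t))
    print(M, remaining_rsi_bound(H_IR, H_RI, T_r))
```

With this convention, the bound equals $T_r$ whenever $M < K_rK_t$ (here, $M = 1$), which is in line with the statement in the abstract that, in the worst case, the IRS helps only if it has enough elements.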
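In what follows, we present our proposed approaches to optimization problems (14) and (15), respectively.

Theorem 1. For the optimization problem (14), one can show that $T(\mathcal{T}, \boldsymbol{\Theta}) \le \left(\sqrt{T_r} - \sigma_{\min}(\mathbf{H}_{IR} * \mathbf{H}_{RI}^T)\right)^2$.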
Proof. We begin the proof with an intuitive example and then extend it to the general case. Assume that $K_t = 1$, $K_r = 2$, and $M = 3$, and consider the corresponding simplified optimization problem. Here, notice that $\mathbf{H}_{IR}\,\mathrm{diag}(\mathbf{H}_{RI}^T)$ is a linear map from a three-dimensional space into a two-dimensional space. A simple example of such a mapping from a three-dimensional to a two-dimensional space is shown in Figure 2. The left shape shows the feasible set for an IRS with three elements in a real-valued space: the cube corresponds to the original constraints $-1 \le \theta_m \le 1, \forall m$, while the sphere shows the constraint $\theta_1^2 + \theta_2^2 + \theta_3^2 \le 1$. On the right, the images of these two regions under the mapping $f$ are presented. It can be seen that the image of the first set of constraints (the hexagon) covers the whole area of the image of the second one (the ellipse). An important observation is that, since $f$ is a (linear) function, $\mathcal{A} \subseteq \mathcal{B}$ implies $f(\mathcal{A}) \subseteq f(\mathcal{B})$ for any two sets $\mathcal{A}$ and $\mathcal{B}$.
In general, as the number of IRS elements or the dimension of $\tilde{\mathbf{h}}_r$ increases, the image of the hypercube becomes more and more complicated, and finding the optimal distance becomes more difficult. However, there is an upper bound on this distance. As shown in Figure 3, if instead of the cube we limit the feasible set of the IRS elements to the sphere inside the cube, i.e., replace (18b) with (19b), the resulting distance is GE, which satisfies GE ≥ GF. Finding GE turns out to be very simple since, by definition, $OE = \sigma_{\min}(\mathbf{H}_{IR}\,\mathrm{diag}(\mathbf{H}_{RI}^T))$ and $GO = \sqrt{T_r}$. Finally, we use one last upper bound to make the original problem even easier to solve: if, instead of the ellipse, we consider the circle inscribed in it, we obtain $\max_{\tilde{\mathbf{h}}_r} \min_{\boldsymbol{\Theta}} \|\tilde{\mathbf{h}}_r + \mathbf{H}_{IR}\,\mathrm{diag}(\mathbf{H}_{RI}^T)\,\boldsymbol{\theta}\| \le GO - OE = \sqrt{T_r} - \sigma_{\min}(\mathbf{H}_{IR}\,\mathrm{diag}(\mathbf{H}_{RI}^T))$. It is worth mentioning that the geometrical representation of optimization problem (13) is different because there, since the RSI seeks the worst configuration against the IRS configuration and the covariance matrices simultaneously, the RSI cannot freely span the whole circle. This is because some regions of the circle might not be a good choice when it comes to designing the RSI against the covariance matrices. However, if the worst-case RSI against the covariance matrices is also the worst-case RSI against the IRS configuration, the solutions to (14) and (13) will be the same. Eventually, instead of optimization problem (13), one can solve optimization problem (14); the solution to the new problem is guaranteed to be achievable by the original problem as well.
Notice that this interpretation readily extends to the complex domain, since constraint (19b) is still a subset of constraints (18b). It remains to show that the geometrical proof generalizes to arbitrarily large dimensions, i.e., that the channel dimensions and the number of IRS elements can be any natural numbers.
Interestingly, it is enough to show that the geometrical proof based on $\ell_2$ norms and Euclidean distance holds in higher dimensions. This proof is given in the appendices.
Next, we solve problem (14). Solving this problem is hard in general since it is non-convex. Hence, we first establish the following lemma and theorem, which show that for every possible choice of $\mathbf{H}_1$ and $\mathbf{H}_2$, there exists at least one set of simultaneously diagonalizable matrices $\mathbf{H}_{\mathrm{tot}}$, $\mathbf{Q}_s$, and $\mathbf{Q}_r$ that solve problem (14).

Lemma 1.
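For a positive semidefinite matrix $\mathbf{A}$ and a positive definite matrix $\mathbf{B}$ of size $n \times n$ with eigenvalues $\lambda_1(\mathbf{A}) \ge \cdots \ge \lambda_n(\mathbf{A})$ and $\lambda_1(\mathbf{B}) \ge \cdots \ge \lambda_n(\mathbf{B})$, we have

$$\prod_{i=1}^{n}\left(1 + \frac{\lambda_i(\mathbf{A})}{\lambda_i(\mathbf{B})}\right) \le \det\left(\mathbf{I} + \mathbf{B}^{-1}\mathbf{A}\right) \le \prod_{i=1}^{n}\left(1 + \frac{\lambda_i(\mathbf{A})}{\lambda_{n-i+1}(\mathbf{B})}\right). \tag{21}$$

Proof. Consider Fiedler's inequality [40],

$$\prod_{i=1}^{n}\left(\lambda_i(\mathbf{A}) + \lambda_i(\mathbf{B})\right) \le \det(\mathbf{A} + \mathbf{B}) \le \prod_{i=1}^{n}\left(\lambda_i(\mathbf{A}) + \lambda_{n-i+1}(\mathbf{B})\right). \tag{22}$$

Furthermore, since $\mathbf{B}$ is positive definite, $\det(\mathbf{B}) = \prod_{i=1}^{n}\lambda_i(\mathbf{B}) > 0$ and $\det(\mathbf{A} + \mathbf{B})/\det(\mathbf{B}) = \det(\mathbf{I} + \mathbf{B}^{-1}\mathbf{A})$. Now, dividing all sides of (22) by $\det(\mathbf{B})$, one readily obtains (21).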
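Lemma 1 is easy to sanity-check numerically. The following sketch (NumPy assumed; the helper name `lemma1_terms` is hypothetical) draws a random positive semidefinite $\mathbf{A}$ and a positive definite $\mathbf{B}$ and evaluates $\det(\mathbf{I} + \mathbf{B}^{-1}\mathbf{A})$ together with the two eigenvalue products, pairing the eigenvalues in the same order for the lower bound and in opposite order for the upper bound.

```python
import numpy as np


def lemma1_terms(A, B):
    """Return (lower, value, upper) with value = det(I + B^{-1} A)."""
    a = np.sort(np.linalg.eigvalsh(A))[::-1]   # eigenvalues of A, decreasing
    b = np.sort(np.linalg.eigvalsh(B))[::-1]   # eigenvalues of B, decreasing
    lower = np.prod(1.0 + a / b)               # same-order pairing
    upper = np.prod(1.0 + a / b[::-1])         # opposite-order pairing
    value = np.linalg.det(A + B) / np.linalg.det(B)
    return lower, value, upper


rng = np.random.default_rng(3)
n = 4
X = rng.standard_normal((n, 2))
A = X @ X.T                                    # positive semidefinite, rank 2
Y = rng.standard_normal((n, n))
B = Y @ Y.T + 0.1 * np.eye(n)                  # positive definite
lower, value, upper = lemma1_terms(A, B)
print(lower, value, upper)

# Commuting matrices with aligned eigenvalues attain the lower bound (5 = 5 <= 6)
print(lemma1_terms(np.diag([3.0, 1.0]), np.diag([2.0, 1.0])))
```

The diagonal example illustrates the equality condition: the lower bound is met when both matrices are diagonal in a common basis with their eigenvalues ordered alike.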
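Note that in (21), the inequalities hold with equality if and only if $\mathbf{A}$ and $\mathbf{B}$ are diagonalizable over a common basis (with their eigenvalues paired as in the respective bound). Using the result of Lemma 1, $R^{FD}_{sr}$ can be lower-bounded through the eigenvalue products in (21). In addition, a corresponding eigenvalue inequality holds, which is met with equality whenever the matrices involved share a common basis. Next, we use an inequality that holds because $\lambda_i(\tilde{\mathbf{H}}_1\mathbf{Q}_s\tilde{\mathbf{H}}_1^H) \le T_1 P_s$; indeed, $\lambda_i(\tilde{\mathbf{H}}_1\mathbf{Q}_s\tilde{\mathbf{H}}_1^H) \le \sigma_{\max}^2(\tilde{\mathbf{H}}_1)\,\lambda_{\max}(\mathbf{Q}_s) \le \mathrm{Tr}(\tilde{\mathbf{H}}_1\tilde{\mathbf{H}}_1^H)\,\mathrm{Tr}(\mathbf{Q}_s) \le T_1 P_s$. Now, instead of carrying out the minimization over the left-hand side (LHS) of (26), we can first minimize the right-hand side (RHS) of (27) to find an achievable rate. Similarly, a corresponding expression is obtained for $R^{FD}_{rd}$.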
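The eigenvalue bound $\lambda_i(\tilde{\mathbf{H}}_1\mathbf{Q}_s\tilde{\mathbf{H}}_1^H) \le T_1 P_s$ can also be checked numerically. The sketch below assumes that the error matrix satisfies $\mathrm{Tr}(\tilde{\mathbf{H}}_1\tilde{\mathbf{H}}_1^H) = T_1$ (the bound is active) and that $\mathrm{Tr}(\mathbf{Q}_s) = P_s$; the dimensions and numerical values are arbitrary choices for illustration, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(4)
K_r, N_t, T_1, P_s = 3, 4, 0.2, 5.0

# Error matrix scaled so that Tr(H H^H) = T_1 (the uncertainty bound is active)
H_err = rng.standard_normal((K_r, N_t)) + 1j * rng.standard_normal((K_r, N_t))
H_err *= np.sqrt(T_1) / np.linalg.norm(H_err, "fro")

# Random positive semidefinite covariance with Tr(Q_s) = P_s
X = rng.standard_normal((N_t, N_t)) + 1j * rng.standard_normal((N_t, N_t))
Q_s = X @ X.conj().T
Q_s *= P_s / np.trace(Q_s).real

lam = np.linalg.eigvalsh(H_err @ Q_s @ H_err.conj().T)
# lambda_max(H Q H^H) <= sigma_max(H)^2 * lambda_max(Q) <= Tr(H H^H) * Tr(Q) <= T_1 * P_s
print(lam.max(), T_1 * P_s)
```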

Remark 1.
Given $\mathbf{C} = \tilde{\mathbf{H}}_r\mathbf{Q}_r\tilde{\mathbf{H}}_r^H$ with a square $\tilde{\mathbf{H}}_r$, the multiplicative property of the determinant gives $\det(\mathbf{C}) = \det(\tilde{\mathbf{H}}_r^H\tilde{\mathbf{H}}_r)\det(\mathbf{Q}_r)$. Similarly, suppose that there exists a matrix $\bar{\mathbf{Q}}_s$ such that (1) $\bar{\mathbf{Q}}_s$ yields the same rate as $\mathbf{Q}_s$, (2) $\bar{\mathbf{Q}}_s$ shares a common eigenbasis with $\mathbf{H}_1^H\mathbf{H}_1$, and (3) $\mathrm{Tr}(\bar{\mathbf{Q}}_s) \le \mathrm{Tr}(\mathbf{Q}_s)$; then we can use $\bar{\mathbf{Q}}_s$ instead and rewrite (26) in terms of $\lambda_i(\mathbf{H}_1^H\mathbf{H}_1)$ and $\lambda_i(\bar{\mathbf{Q}}_s)$ to simplify the problem. The first property implies that $\mathbf{Q}_s$ and $\bar{\mathbf{Q}}_s$ have exactly the same impact on the capacity. Hence, if a $\mathbf{Q}_s$ solves problem (14), its corresponding $\bar{\mathbf{Q}}_s$, which is feasible by the third property, will also be a solution. The second property means that, unlike $\mathbf{Q}_s$, $\bar{\mathbf{Q}}_s$ shares a common basis with $\mathbf{H}_1^H\mathbf{H}_1$. The last property implies that $\bar{\mathbf{Q}}_s$ is at least as good as $\mathbf{Q}_s$ in terms of power consumption. Observe that if we show that for every feasible $\mathbf{Q}_s$ there exists at least one such $\bar{\mathbf{Q}}_s$, then problem (14) can be solved much more easily: instead of searching for the optimal $\mathbf{Q}_s$ over the whole feasible set, we can search for the optimal $\bar{\mathbf{Q}}_s$. Since $\bar{\mathbf{Q}}_s$ shares a common basis with $\mathbf{H}_1^H\mathbf{H}_1$, we can limit our search to the portion of the feasible set in which the matrices have the same eigendirections as $\mathbf{H}_1^H\mathbf{H}_1$. Similarly, if we show that for every choice of $\tilde{\mathbf{H}}_r$ there exists at least one $\bar{\mathbf{H}}_r$ satisfying three analogous conditions, we can restrict our search to $\bar{\mathbf{H}}_r$ instead of $\tilde{\mathbf{H}}_r$. In the next theorem, we show that such $\bar{\mathbf{Q}}_s$ and $\bar{\mathbf{H}}_r$ exist.
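The determinant identity at the start of Remark 1 relies on $\tilde{\mathbf{H}}_r$ being square. The following sketch (NumPy assumed; dimensions and the choice $\mathbf{Q}_r = \mathrm{diag}(2, 1)$ are arbitrary) checks the identity for a square matrix and shows that it fails for a tall one, where $\tilde{\mathbf{H}}_r\mathbf{Q}_r\tilde{\mathbf{H}}_r^H$ is rank-deficient.

```python
import numpy as np

rng = np.random.default_rng(5)
Q = np.diag([2.0, 1.0])                        # positive definite Q_r (det = 2)

# Square RSI channel: det(H Q H^H) = det(H^H H) det(Q) holds
H_sq = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
lhs = np.linalg.det(H_sq @ Q @ H_sq.conj().T)
rhs = np.linalg.det(H_sq.conj().T @ H_sq) * np.linalg.det(Q)
print(np.isclose(lhs, rhs))

# Tall channel (3 x 2): H Q H^H is 3 x 3 with rank 2, so its determinant vanishes
H_tall = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
lhs_tall = np.linalg.det(H_tall @ Q @ H_tall.conj().T)
rhs_tall = np.linalg.det(H_tall.conj().T @ H_tall) * np.linalg.det(Q)
print(abs(lhs_tall), abs(rhs_tall))
```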

Theorem 2.
For all matrices $\mathbf{Q}_s$ and $\mathbf{H}_1$, there exists at least one matrix $\bar{\mathbf{Q}}_s$ that satisfies the following conditions, where $\rho(i)$ is a permutation of the indices $i$, indicating that the eigenvalues $\lambda_{\rho(i)}(\bar{\mathbf{Q}}_s)$ need not be in decreasing order.
Proof. The proof is given in Appendix B.
For the sake of simplicity, we use the following notation for the rest of the paper. Now, using Theorem 2 alongside Lemma 1, we infer that, without loss of generality, instead of optimizing over matrices, one can optimize over eigenvalues to find the optimal value of the RHS of (27). We then obtain optimization problem (37), whose first constraint is the source power budget $\|\boldsymbol{\gamma}_s\|_1 \le P_s$ (37a). Note that the two additional constraints (37d) and (37e) must be satisfied due to the conditions of Lemma 1 (i.e., the eigenvalues have to be in decreasing order). Interestingly, these two additional constraints are affine. The above optimization problem can be further simplified using the following lemma.

Lemma 2.
The objective function of optimization problem (37) is optimized when constraints (37a) and (37c) are satisfied with equality.
Proof. Intuitively, since the objective function is increasing in each element of $\boldsymbol{\gamma}_s$ and decreasing in each element of $\boldsymbol{\sigma}_r^2$, the constraints are met with equality at the optimum. See Appendix C for the proof.

Algorithm Description
In this subsection, our proposed algorithm is given. In short, it works as follows. First, based on the task of the IRS in the system, we compute the effect of the IRS on the RSI, source-relay, and/or relay-destination channel links. After that, we design the transmit signals of the source and the relay with the objective of maximizing the throughput. In the rest of this subsection, the algorithm is explained in detail. First, we need to solve optimization problem (37). It can readily be shown that $R^{FD}_{rd}$ is a monotonically increasing function of $P_r$. Furthermore, one can show that $R^{FD}_{sr}$ is an increasing function of $P_s$ and a decreasing function of $T$ and $P_r$ (see Appendix D). Consequently, the worst-case RSI chooses a strategy that reduces the spectral efficiency, while the relay and the source cope with this strategy to improve the system robustness. That is, on the one hand, the RSI hurts the stronger eigendirections of the received signal space more than the weaker ones; on the other hand, the source adapts to this strategy through smart eigen-selection. This interplay makes the optimization problem at the source-relay hop complicated. Unlike the source-relay hop, the resource allocation problem at the relay-destination hop is rather easy: since there is only one maximization at this hop, the sum capacity can be found using the well-known water-filling algorithm.
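For the relay-destination hop, the standard water-filling solution can be sketched as follows. This is a generic Python illustration rather than the authors' implementation; we assume that `gains` holds the eigenmode gains of the relay-destination channel normalized by the noise power (the values below are hypothetical), and we find the water level $\mu$ by bisection so that the powers $p_i = \max(0, \mu - 1/g_i)$ use the whole budget.

```python
import math


def water_filling(gains, total_power, iters=200):
    """Maximize sum(log2(1 + g_i * p_i)) s.t. sum(p_i) = total_power, p_i >= 0.

    The solution is p_i = max(0, mu - 1/g_i); the water level mu is found by bisection.
    At least one gain must be positive."""
    floors = [1.0 / g if g > 0 else math.inf for g in gains]   # "ground level" of each subchannel
    lo, hi = 0.0, min(floors) + total_power                    # the optimal mu lies in [lo, hi]
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        used = sum(max(0.0, mu - f) for f in floors)
        if used > total_power:
            hi = mu
        else:
            lo = mu
    return [max(0.0, lo - f) for f in floors]


def sum_rate(gains, powers):
    return sum(math.log2(1.0 + g * p) for g, p in zip(gains, powers))


gains = [2.0, 1.0, 0.25]          # hypothetical eigenmode SNR gains
p = water_filling(gains, 1.0)
print([round(x, 6) for x in p])   # [0.75, 0.25, 0.0]
print(round(sum_rate(gains, p), 4))
```

Here the water level settles at $\mu = 1.25$: the two strongest subchannels share the budget, and the weakest one (floor $1/0.25 = 4$) stays dry.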
Observe that although finding each of $R^{FD}_{sr}$ and $R^{FD}_{rd}$ separately is a convex problem, problem (37) as a whole is not convex. Therefore, in this paper, we find the optimal $R^{FD}_{sr}$ while keeping $R^{FD}_{rd}$ fixed. We then use the resulting $R^{FD}_{sr}$ to find the optimal $R^{FD}_{rd}$ and, in turn, use the new $R^{FD}_{rd}$ to find the optimal $R^{FD}_{sr}$. This iterative process is repeated until convergence. Our simulations show that the algorithm converges very fast; only in rare cases does it take more than 20 iterations to converge. This is mainly because inequalities (37d) and (37e) restrict how much the eigenvalues can vary, which in turn makes the iterates more stable. Figure 4 depicts the typical distribution of the number of iterations; as can be seen, fewer than 3% of the cases did not converge within 50 iterations.
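The alternating procedure described above follows a generic block-coordinate pattern. The sketch below is hypothetical: `update_x` and `update_y` stand in for the two subproblem solvers, the iterates are scalars for brevity (vector iterates would be compared through a norm), and runs that hit the 50-iteration cap are flagged as divergent, mirroring the setting of Figure 4.

```python
def alternate(update_x, update_y, x0, y0, tol=1e-9, max_iter=50):
    """Alternate between two subproblem solvers until both iterates settle.

    Returns (x, y, iterations, converged). Runs that reach max_iter are
    reported as not converged (divergent)."""
    x, y = x0, y0
    for k in range(1, max_iter + 1):
        x_new = update_x(y)          # re-solve the first block with the second block fixed
        y_new = update_y(x_new)      # re-solve the second block with the new first block
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new, k, True
        x, y = x_new, y_new
    return x, y, max_iter, False


# Hypothetical contractive updates with fixed point x = y = 2
print(alternate(lambda y: 0.5 * y + 1.0, lambda x: 0.5 * x + 1.0, 0.0, 0.0))
# Hypothetical expanding updates: flagged as divergent after 50 iterations
print(alternate(lambda y: 2.0 * y + 1.0, lambda x: x, 0.0, 0.0)[2:])   # (50, False)
```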
Notice that the optimal transmit powers at the relay may not sum to $P_r$. The reason is that $R^{FD}_{sr}$ is a monotonically decreasing function of $P_r$, and since we are interested in $\min(R^{FD}_{sr}, R^{FD}_{rd})$, when $R^{FD}_{sr} < R^{FD}_{rd}$ we have $\min(R^{FD}_{sr}, R^{FD}_{rd}) = R^{FD}_{sr}$. Therefore, it is in our interest to keep $P_r$ as low as possible to increase $R^{FD}_{sr}$ as much as possible. Analogously, when $R^{FD}_{sr} > R^{FD}_{rd}$, we have $\min(R^{FD}_{sr}, R^{FD}_{rd}) = R^{FD}_{rd}$, which can be increased by increasing the total power used by the relay transmitter. As a result, the well-known bisection method can be used to find the optimal rate at which $R^{FD}_{sr} = R^{FD}_{rd}$, unless $R^{FD}_{sr} \ge R^{FD}_{rd}$ holds even when the maximum allowed power is used at the relay transmitter. In that case, the relay-destination link is the bottleneck.
Figure 4. Cumulative distribution function (CDF) of the number of iterations when $P_s = 5$, $P_r = 1$, and $M = K_t = K_r = N = 10$. The maximum number of iterations is set to 50; cases that took more than 50 iterations to converge are considered divergent.
Now we focus on how to find $R^{FD}_{sr}$. To find the sum rate of the source-relay hop, we assume that $\boldsymbol{\gamma}_r$, the vector of relay transmit powers that maximizes the sum rate at the relay-destination hop, is already given. The next step is to carry out the minimization over $\boldsymbol{\sigma}_r$ and the maximization over $\boldsymbol{\gamma}_s$. One approach is to solve this problem iteratively: first, the optimal $\boldsymbol{\gamma}_s$ is found by solving the maximization part of (37) under the assumption that the optimal $\boldsymbol{\sigma}_r$ is given; then, given the optimal $\boldsymbol{\gamma}_s$, the minimization part of (37) can be solved efficiently. This process continues until $\boldsymbol{\gamma}_s$ and/or $\boldsymbol{\sigma}_r$ converge. The maximization part is performed using the water-filling method; however, the additional constraints must be taken into account.
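The bisection step for the relay power can be sketched as follows, under the monotonicity properties stated above ($R^{FD}_{sr}$ non-increasing and $R^{FD}_{rd}$ non-decreasing in $P_r$). The function `balance_relay_power` is a hypothetical helper, and the rate functions in the usage example are toy models chosen only to have these monotonicity properties; they are not the rate expressions of this paper.

```python
import math


def balance_relay_power(r_sr, r_rd, p_max, iters=100):
    """Relay power in [0, p_max] maximizing min(r_sr(p), r_rd(p)).

    Assumes r_sr is non-increasing and r_rd is non-decreasing in p."""
    if r_sr(p_max) >= r_rd(p_max):
        # Even at full relay power the source-relay rate is not the limiting one,
        # so the relay-destination hop is the bottleneck: use all the power.
        return p_max
    lo, hi = 0.0, p_max
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if r_sr(mid) > r_rd(mid):
            lo = mid     # relay-destination hop is limiting: more relay power helps
        else:
            hi = mid     # source-relay hop is limiting: less relay power (less RSI) helps
    return 0.5 * (lo + hi)


def r_sr(p):
    """Toy source-relay rate: the RSI grows with the relay power p."""
    return math.log2(1.0 + 10.0 / (1.0 + 2.0 * p))


def r_rd(p):
    """Toy relay-destination rate: improves with the relay power p."""
    return math.log2(1.0 + 4.0 * p)


p_star = balance_relay_power(r_sr, r_rd, p_max=5.0)
print(round(p_star, 4), round(r_sr(p_star), 4), round(r_rd(p_star), 4))
print(balance_relay_power(r_sr, r_rd, p_max=0.5))   # 0.5: relay-destination hop is the bottleneck
```

For the toy model, the two rates are equal when $8p^2 + 4p - 10 = 0$, i.e., at $p = (\sqrt{336} - 4)/16 \approx 0.896$; with $p_{\max} = 0.5$ the source-relay rate is still the larger one, so all the relay power is used.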
As an example of the additional constraints in the water-filling step, if the optimal value of $\gamma^s_i$ turns out to be zero, then we must have $\gamma^s_j = 0$ for all $j > i$, irrespective of their SNR. Figure 5 depicts two different examples of the multi-level water-filling algorithm. As can be seen, a regular water-filling solution is considered first. The additional restrictions act like caps on top of the water and create a multi-level water-filling, which can be interpreted as a cave. Figure 5a shows a case where these caps do not force any subchannel to have zero power. In contrast, Figure 5b shows a case where subchannel $i = 13$ has to be zero as a result of the cap imposed by the additional constraint (37d). In this case, $\gamma^s_{\rho(13)} = 0$, and as a result, the cap for all subsequent subchannels, which is determined by a minimum over $1 \le i \le 13$, becomes zero. Thus, this condition forces all remaining subchannels (i.e., $i > 13$) to receive no power. Algorithm 1 provides the details of the multi-level water-filling. For the minimization part, a Lagrangian multiplier is used; setting the derivative of the Lagrangian with respect to $\sigma^2_{r,i}$ to zero yields the solution, where $\lambda$ is the water level.
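A simplified version of capped ("cave") water-filling is sketched below. It is an illustrative assumption, not Algorithm 1 itself: the caps are treated as fixed inputs, whereas in the paper they follow from the ordering constraints, and the rule that a zero-power subchannel forces all later subchannels to zero is not modeled. The helper name `capped_water_filling` and the example numbers are hypothetical.

```python
import math


def capped_water_filling(gains, caps, total_power, iters=200):
    """Water-filling with per-subchannel caps ("cave" filling).

    p_i = min(cap_i, max(0, mu - 1/g_i)), with the water level mu chosen by bisection
    so that sum(p_i) = total_power, unless the caps cannot absorb the whole budget."""
    floors = [1.0 / g if g > 0 else math.inf for g in gains]

    def alloc(mu):
        return [min(c, max(0.0, mu - f)) for f, c in zip(floors, caps)]

    # A water level high enough that every usable subchannel sits at its cap
    hi = max(f for f in floors if f < math.inf) + max(caps) + 1.0
    if sum(alloc(hi)) <= total_power:
        return alloc(hi)          # caps bind everywhere; part of the budget stays unused
    lo = 0.0
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(alloc(mu)) > total_power:
            hi = mu
        else:
            lo = mu
    return alloc(lo)


gains = [2.0, 1.0, 0.25]
print([round(p, 6) for p in capped_water_filling(gains, [0.5, 1.0, 1.0], 1.0)])  # [0.5, 0.5, 0.0]
print(capped_water_filling(gains, [0.1, 0.1, 0.1], 1.0))                         # [0.1, 0.1, 0.1]
```

In the first call, the cap of 0.5 on the strongest subchannel pushes the water level up to $\mu = 1.5$, so the second subchannel receives more power than in the uncapped case; in the second call the caps absorb only part of the budget.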
Similarly to the maximization case, there are additional constraints on the products $\gamma_{r_i}\sigma^2_{r_{\rho(i)}}$ that must be considered during the minimization process. However, it can be shown that if the constraints $\gamma_{r_i} \ge \gamma_{r_{i+1}}$ and $\sigma^2_{1_i}\gamma_{s_{\rho(i)}} \ge \sigma^2_{1_{i+1}}\gamma_{s_{\rho(i+1)}}$ are met, then the constraint $\gamma_{r_i}\sigma^2_{r_{\rho(i)}} \ge \gamma_{r_{i+1}}\sigma^2_{r_{\rho(i+1)}}$ becomes redundant; please refer to Appendix E for the proof. The algorithm for finding the achievable rate is summarized in Algorithm 2. Next, we deal with the optimization for the cases where the IRS is utilized to help either the source-relay or the relay-destination channel. In such cases, the optimization over the covariance matrices remains the same as in the above-mentioned case. In addition, the IRS elements can be optimized using eigenvalue decomposition and the algorithm introduced in [8]. Notice that for the case in which the IRS assists the source-relay link, the term $T_1P_s$ in (27) should be replaced with $(T_1 + T_{SI}T_{IR})P_s$, and for the case where the IRS helps the relay-destination link, the term $T_2P_r$ in (28) should be replaced with $(T_2 + T_{RI}T_{ID})P_r$. The pseudocode for these scenarios is given in Algorithm 3.
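The bisection over the relay power described at the beginning of this part can be sketched as follows. It is a minimal illustration rather than Algorithm 2: it assumes only that the source-relay rate decreases and the relay-destination rate increases with $P_r$, so that $g(P_r) = R^{FD}_{sr}(P_r) - R^{FD}_{rd}(P_r)$ is monotonically decreasing (see Appendix D). The two rate functions in the usage example are hypothetical stand-ins.

```python
import math
from typing import Callable, Tuple


def balance_relay_power(
    rate_sr: Callable[[float], float],
    rate_rd: Callable[[float], float],
    p_r_max: float,
    tol: float = 1e-10,
) -> Tuple[float, float]:
    """Return (P_r, rate) maximizing min(rate_sr(P_r), rate_rd(P_r)) over [0, p_r_max].

    Assumes rate_sr decreases and rate_rd increases with P_r, so that
    g(P) = rate_sr(P) - rate_rd(P) is monotonically decreasing.
    """
    def g(p: float) -> float:
        return rate_sr(p) - rate_rd(p)

    if g(p_r_max) >= 0:
        # Even at full relay power the relay-destination link is the bottleneck,
        # so the relay should use all of its power.
        return p_r_max, rate_rd(p_r_max)
    if g(0.0) <= 0:
        # The source-relay link limits the rate for every P_r; spend no relay power.
        return 0.0, rate_sr(0.0)
    lo, hi = 0.0, p_r_max                 # invariant: g(lo) > 0 > g(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    p = 0.5 * (lo + hi)
    return p, min(rate_sr(p), rate_rd(p))


# Hypothetical rate models: RSI grows with relay power and hurts the first hop.
r_sr = lambda p: math.log2(1 + 10 / (1 + 2 * p))
r_rd = lambda p: math.log2(1 + 5 * p)
print(balance_relay_power(r_sr, r_rd, p_r_max=1.0))
```

With `p_r_max=0.1` instead, the relay-destination rate stays below the source-relay rate at full power, so the function returns the full budget, mirroring the bottleneck case described above.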

Discussion
In this part, we evaluate various aspects of our method. First, we examine the complexity of our algorithm and compare it with the state of the art. Algorithms 1 and 2 are the main solutions provided in this paper. Algorithm 1 is a multi-level water-filling algorithm; as a result, its complexity is $O(I_w\min(N_t, K_r))$, where $I_w$ is a constant that is independent of the system parameters and depends only on the required accuracy of the multi-level water-filling. Algorithm 2 requires the SVDs of $\mathbf{H}_1$, $\mathbf{H}_2$, and $(\mathbf{H}_{IR} * \mathbf{H}^T_{RI})$, with complexities $O(N_tK_r\min(N_t, K_r))$, $O(N_rK_t\min(N_r, K_t))$, and $O(MK_rK_t\min(M, K_rK_t))$, respectively. Furthermore, the Khatri-Rao product $\mathbf{H}_{IR} * \mathbf{H}^T_{RI}$ is needed, which has complexity $O(N_rK_tM)$. As a result, the overall complexity of our method is
$$O\big(MK_rK_t\min(M, K_rK_t) + N_rK_tM + N_tK_r\min(N_t, K_r) + N_rK_t\min(N_r, K_t) + I_tI_w\min(N_t, K_r)\big),$$
where $I_t$ is a constant independent of the system parameters. Interestingly, our method has a super-linear complexity with respect to the number of IRS elements, which is lower than that of state-of-the-art works, e.g., [5,8]. This suggests that our algorithm is more energy efficient and better suited to latency-sensitive applications. It should also be noted that our algorithm does not provide the optimal IRS pattern; instead, it provides analytical bounds on the performance of the IRS that can be used as a benchmark. In other words, our work provides a tool with which one can evaluate the efficiency of a robust design. A comparison between our method and previous works is summarized in Table 2.
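To illustrate why Algorithm 2 needs a Khatri-Rao product and its SVD, the sketch below checks the standard identity $\mathrm{vec}(\mathbf{A}\,\mathrm{diag}(\boldsymbol{\theta})\,\mathbf{B}) = (\mathbf{B}^T \odot \mathbf{A})\boldsymbol{\theta}$ (column-wise Khatri-Rao product, column-stacking $\mathrm{vec}$), which makes the IRS-reflected RSI channel linear in the IRS coefficients. The cascaded form $\mathbf{H}_{IR}\boldsymbol{\Theta}\mathbf{H}_{RI}$, the dimensions $\mathbf{H}_{IR} \in \mathbb{C}^{K_r\times M}$ and $\mathbf{H}_{RI} \in \mathbb{C}^{M\times K_t}$, and the factor ordering are assumptions; the paper's operator $*$ may order the factors differently.

```python
import numpy as np


def khatri_rao(b: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Column-wise Khatri-Rao product: column k equals kron(b[:, k], a[:, k])."""
    if a.shape[1] != b.shape[1]:
        raise ValueError("both factors need the same number of columns")
    return np.einsum("ik,jk->ijk", b, a).reshape(b.shape[0] * a.shape[0], a.shape[1])


rng = np.random.default_rng(0)
K_r, K_t, M = 3, 2, 5                     # hypothetical sizes


def crandn(*shape):
    """i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


H_IR = crandn(K_r, M)                     # IRS -> relay receiver (assumed K_r x M)
H_RI = crandn(M, K_t)                     # relay transmitter -> IRS (assumed M x K_t)
theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, M))  # unit-modulus IRS coefficients

# Column-stacked reflected RSI channel H_IR diag(theta) H_RI ...
lhs = (H_IR @ np.diag(theta) @ H_RI).flatten(order="F")
# ... equals a fixed (K_r K_t) x M matrix times theta, i.e., it is linear in theta.
G = khatri_rao(H_RI.T, H_IR)
print(G.shape, np.allclose(lhs, G @ theta))  # (6, 5) True

# The SVD of G is the dominant cost, O(M K_r K_t min(M, K_r K_t)).
singular_values = np.linalg.svd(G, compute_uv=False)
print(singular_values.size)                  # min(M, K_r K_t) = 5
```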

Achievable Rate (Half-Duplex Relay)
We consider a simple HD relay where the source and the relay transmit in two subsequent time slots. Notice that in the HD case, the IRS can be used to assist both the source-relay and the relay-destination channels, since the signal is sent over each of these channels in a different time slot. Therefore, the received signals at the relay and at the destination can be expressed, respectively, as … Consequently, the achievable rates for the source-relay and relay-destination links can be expressed as …, where $\mathbf{H}_1 = \hat{\mathbf{H}}_1 + \hat{\mathbf{H}}_{IR}\boldsymbol{\Theta}\hat{\mathbf{H}}_{SI}$ and $\mathbf{H}_2 = \hat{\mathbf{H}}_2 + \hat{\mathbf{H}}_{ID}\boldsymbol{\Theta}\hat{\mathbf{H}}_{RI}$. In addition, $R^{HD}_{sr}$ and $R^{HD}_{rd}$ are the achievable rates on the source-relay and relay-destination links, respectively. Using time sharing, the achievable rate between the source and destination nodes is given by $R^{HD} = \max_{0 \le \alpha \le 1}\min\{\alpha R^{HD}_{sr}, (1-\alpha)R^{HD}_{rd}\}$, where $\alpha$ is the time-sharing parameter. Note that in half-duplex relaying, the source and relay transmissions are conducted in separate channel uses. Hence, the transmit covariance matrices $\mathbf{Q}_s \in \mathbb{H}^{N_t\times N_t}$ and $\mathbf{Q}_r \in \mathbb{H}^{K_t\times K_t}$ are optimized by maximizing the achievable rate from the source to the destination. Here, $\mathbb{H}^{N_t\times N_t}$ and $\mathbb{H}^{K_t\times K_t}$ denote the convex cones of Hermitian positive semidefinite matrices of dimensions $N_t \times N_t$ and $K_t \times K_t$, respectively. Importantly, to maximize this achievable rate, the time-sharing parameter $\alpha$ needs to be optimized jointly with the system parameters, e.g., the power allocation. The optimal $\alpha$ readily occurs at $\alpha R^{HD}_{sr} = (1-\alpha)R^{HD}_{rd}$, i.e., $\alpha^\star = R^{HD}_{rd}/(R^{HD}_{sr} + R^{HD}_{rd})$. Therefore, the achievable rate becomes
$$R^{HD} = \frac{R^{HD}_{sr}R^{HD}_{rd}}{R^{HD}_{sr} + R^{HD}_{rd}},$$
which is to be maximized over the covariance matrices and the IRS configuration. Notice that since the objective function of this optimization problem is monotonically increasing in both $R^{HD}_{sr}$ and $R^{HD}_{rd}$, the problem can be simplified to maximizing $R^{HD}_{sr}$ and $R^{HD}_{rd}$ separately. Next, we provide the solution to the rate optimization problem when the IRS is assisting the source-relay link.
We have … The above optimization problem has the same structure as the optimization of the relay-destination link in the FD scenario, so the same method can be applied. In other words, the well-known water-filling algorithm can be used to find the optimal covariance matrix, along with the algorithm introduced in [8] to find the best IRS pattern, and these two steps are repeated iteratively until convergence. The solution for $R^{HD}_{rd}$ is analogous, and the same procedure can be applied to find $\mathbf{Q}_r$. The overall procedure for finding the solution in the HD mode is summarized in Algorithm 4.
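For a fixed IRS configuration, each HD hop reduces to classical MIMO water-filling, and the two hop rates are then combined by the balancing condition $\alpha R^{HD}_{sr} = (1-\alpha)R^{HD}_{rd}$, which gives $\alpha^\star = R^{HD}_{rd}/(R^{HD}_{sr} + R^{HD}_{rd})$ and the rate $R^{HD}_{sr}R^{HD}_{rd}/(R^{HD}_{sr} + R^{HD}_{rd})$. The sketch below assumes unit-variance white noise and treats $\mathbf{H}_1$ and $\mathbf{H}_2$ as given matrices; the IRS update from [8] and the alternating loop are not implemented, and the channel sizes are hypothetical.

```python
import numpy as np


def waterfill_covariance(H: np.ndarray, power: float, iters: int = 200):
    """Water-filling covariance for y = H x + n with unit-variance white noise.

    Returns (Q, rate) with Q = V diag(p) V^H built from the right singular
    vectors V of H, and rate = log2 det(I + H Q H^H) in bits/channel use.
    """
    _, s, Vh = np.linalg.svd(H)
    g = s**2
    keep = g > 1e-12                      # drop zero-gain eigenmodes
    g, V = g[keep], Vh.conj().T[:, : len(s)][:, keep]
    lo, hi = 0.0, 1.0 / g.min() + power   # the water level lies in [lo, hi]
    for _ in range(iters):                # bisection on the water level mu
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - 1.0 / g, 0.0).sum() > power:
            hi = mu
        else:
            lo = mu
    p = np.maximum(0.5 * (lo + hi) - 1.0 / g, 0.0)
    Q = (V * p) @ V.conj().T              # V diag(p) V^H
    return Q, float(np.sum(np.log2(1.0 + g * p)))


def hd_rate(H1, H2, P_s, P_r):
    """HD rate for fixed (IRS-inclusive) channels: optimize each hop separately,
    then balance the two time slots so that alpha * R_sr = (1 - alpha) * R_rd."""
    Q_s, R_sr = waterfill_covariance(H1, P_s)
    Q_r, R_rd = waterfill_covariance(H2, P_r)
    alpha = R_rd / (R_sr + R_rd)          # fraction of time given to the source
    return Q_s, Q_r, alpha, R_sr * R_rd / (R_sr + R_rd)


rng = np.random.default_rng(1)
crandn = lambda *sh: (rng.standard_normal(sh) + 1j * rng.standard_normal(sh)) / np.sqrt(2)
H1, H2 = crandn(5, 4), crandn(4, 5)       # hypothetical K_r x N_t and N_r x K_t channels
Q_s, Q_r, alpha, R_hd = hd_rate(H1, H2, P_s=5.0, P_r=1.0)
print(round(alpha, 3), round(R_hd, 3))
```

Because the HD rate is increasing in both hop rates, maximizing each hop independently and only then choosing $\alpha$ loses nothing, which is why the two water-filling calls are decoupled.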

Numerical Results
We assume that the transmit power budgets at the source and the relay are $P_s = 5$ and $P_r = 1$, respectively. Moreover, the AWGN spectral density is assumed to be $-175$ dBm/Hz, and the bandwidth is BW = 180 MHz. In this section, we investigate the performance of IRS-assisted full-duplex relaying with the RSI channel uncertainty bound $T_r$, i.e., $\mathrm{Tr}(\mathbf{H}_r^H\mathbf{H}_r) \le T_r$. We consider all channels to follow the Rician distribution with a Rician factor of 0.1 and the specifications given in Table 3. We also assume $T_x = 0.001$, $x \in \{1, 2, SI, IR, RI, ID\}$. We perform Monte Carlo simulations with $L = 10^3$ realizations of random channels and noise vectors. The average worst-case throughput rate is defined as the average of the worst-case rates over the $L$ realizations, i.e., $R_{av} = \frac{1}{L}\sum_{l=1}^{L} R_l$. Notice that for each set of realizations, we solve the robust transceiver design as elaborated in Algorithm 2. We run different sets of simulations, as described in the following subsections.
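The simulation ingredients above can be sketched as follows: the total noise power over the bandwidth (reading the spectral density as dBm/Hz), a Rician channel generator with factor 0.1, and the Monte Carlo average $R_{av} = \frac{1}{L}\sum_{l=1}^{L} R_l$. The all-ones line-of-sight component and `example_rate` are hypothetical placeholders; in the paper, each realization would instead be evaluated with the robust design of Algorithm 2.

```python
import numpy as np


def noise_power_dbm(psd_dbm_per_hz: float, bandwidth_hz: float) -> float:
    """Total noise power (dBm) = spectral density (dBm/Hz) + 10 log10(bandwidth)."""
    return float(psd_dbm_per_hz + 10.0 * np.log10(bandwidth_hz))


def rician_channel(rows, cols, k_factor, rng, los=None):
    """sqrt(K/(K+1)) * H_los + sqrt(1/(K+1)) * H_nlos with i.i.d. CN(0, 1) scattering.

    The all-ones LoS matrix is a placeholder assumption, not the geometry of Table 3.
    """
    if los is None:
        los = np.ones((rows, cols), dtype=complex)
    nlos = (rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)
    return np.sqrt(k_factor / (k_factor + 1)) * los + np.sqrt(1 / (k_factor + 1)) * nlos


def monte_carlo_average(rate_fn, num_realizations, rng):
    """R_av = (1/L) * sum_l R_l over L independent channel realizations."""
    return float(np.mean([rate_fn(rng) for _ in range(num_realizations)]))


def example_rate(rng):
    """Hypothetical per-realization rate: log2 det(I + H H^H) of a 2x2 Rician channel."""
    H = rician_channel(2, 2, 0.1, rng)
    return np.linalg.slogdet(np.eye(2) + H @ H.conj().T)[1] / np.log(2)


rng = np.random.default_rng(7)
print(noise_power_dbm(-175.0, 180e6))             # about -92.45 dBm
print(monte_carlo_average(example_rate, 1000, rng))
```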

Antenna Array Increment with No IRS
In this part, we first assume that there is no IRS installed and evaluate the performance of the system using different strategies. Thereafter, we examine how installing an IRS can help increase the throughput. We consider two cases in which the source, relay, and destination are equipped with (a) small antenna arrays and (b) large antenna arrays. For these cases, we consider the configurations $\{N_t, K_t, K_r, N_r\}$ shown in Figure 6. These cases are considered to highlight the performance of full-duplex DF relaying as a function of the number of antennas under the worst-case RSI. Interestingly, as the numbers of antennas at the source, relay, and destination increase, full-duplex relaying achieves a higher throughput rate even with strong RSI. This can be seen by comparing the rates in Figure 6a with those in Figure 6b. Furthermore, notice that the worst-case RSI casts strong interference on the strong streams from the source to the destination. With very low RSI power, $T_r \to 0$, full duplex almost doubles the throughput rate compared to its half-duplex counterpart. This can be seen in Figure 6 at the points where the curves intercept the vertical axis.
However, as $T_r$ increases, the efficiency of full-duplex operation drops. It is worth noting that at low RSI power, the DoF plays the most important role in achieving a higher sum rate. For instance, consider Figure 6a, in which the cases FD = {4,5,5,4}, FD = {4,4,6,4}, and FD = {4,6,4,4} have $\mathrm{DoF}_{total} = 4$ (where $\mathrm{DoF}_{total}$ is the minimum of the DoFs of the source-relay and relay-destination channels, i.e., $\mathrm{DoF}_{total} = \min(\mathrm{DoF}_{sr}, \mathrm{DoF}_{rd})$), while the cases FD = {4,7,3,4} and FD = {4,3,7,4} have $\mathrm{DoF}_{total} = 3$. At $T_r = 0$, there is a noticeable gap between the first three cases and the last two, while the differences among the first three cases are small. The large gap is due to the difference in $\mathrm{DoF}_{total}$, and the small one is due to the difference in SNR. Similarly, in Figure 6b, the three cases FD = {10,12,12,10}, FD = {10,10,14,10}, and FD = {10,14,10,10} with $\mathrm{DoF}_{total} = 10$ have higher rates than the two cases FD = {10,6,18,10} and FD = {10,18,6,10} with $\mathrm{DoF}_{total} = 6$.
Finally, it can be seen in both Figure 6a and 6b that at $T_r = 0$ there is no difference between cases that have the same $\mathrm{DoF}_{total}$ but different $\mathrm{DoF}_{sr}$ and $\mathrm{DoF}_{rd}$. As can also be seen in both figures, for cases with $K_t > K_r$, the sum rate drops quickly as the RSI increases. In fact, the more antennas the relay transmitter has relative to its receiver, the faster the sum rate drops as the RSI rises. To understand this behavior better, consider again the case {10,18,6,10} and suppose that $T_r \to \infty$. As discussed before, we have $\mathrm{DoF}_{sr} = 10$ and $\mathrm{DoF}_{rd} = 6$. Moreover, we have $\mathrm{DoF}_I = 6$ for the interference channel $\mathbf{H}_r$. Unlike the case with no interference, the bottleneck is then no longer the relay-destination link. This is because the interference can act to the detriment of six of the source-relay subchannels: since $\mathrm{DoF}_I = 6$, the interference can choose at most six independent subchannels, and since $T_r \to \infty$, the SINR of those subchannels tends to zero. Therefore, no information can be conveyed over those subchannels, and the bottleneck becomes the source-relay link with four usable subchannels. It can be seen in Figure 6a,b that as $T_r$ increases, the cases with the same sum rate at $T_r = 0$ start to diverge because of the different characteristics of the interference they experience. We explain the effect of interference in more detail in the following subsection.
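The degrees-of-freedom bookkeeping used in this discussion can be made explicit with a small helper. It is a rough counting heuristic based on the argument above, not a result from the paper: it takes $\mathrm{DoF}_{sr} = \min(N_t, K_r)$, $\mathrm{DoF}_{rd} = \min(K_t, N_r)$, and $\mathrm{DoF}_I = \min(K_r, K_t)$, and assumes that, as $T_r \to \infty$, each interference dimension drives one source-relay subchannel to zero SINR. Keyword-only arguments keep the antenna roles explicit. For a relay with $K_r = 18$ receive and $K_t = 6$ transmit antennas, it reproduces the counts used above ($\mathrm{DoF}_{sr} = 10$, $\mathrm{DoF}_{rd} = 6$, $\mathrm{DoF}_I = 6$, and four usable source-relay subchannels).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DoFCount:
    sr: int          # source-relay DoF, min(N_t, K_r)
    rd: int          # relay-destination DoF, min(K_t, N_r)
    rsi: int         # DoF of the K_r x K_t RSI channel, min(K_r, K_t)
    total: int       # min(sr, rd), the bottleneck without RSI
    worst_case: int  # streams left if unbounded RSI blanks `rsi` source-relay streams


def dof_count(*, N_t: int, K_t: int, K_r: int, N_r: int) -> DoFCount:
    sr, rd, rsi = min(N_t, K_r), min(K_t, N_r), min(K_r, K_t)
    usable_sr = max(sr - rsi, 0)  # heuristic: each RSI dimension nulls one stream
    return DoFCount(sr, rd, rsi, min(sr, rd), min(usable_sr, rd))


print(dof_count(N_t=10, K_t=6, K_r=18, N_r=10))
# DoFCount(sr=10, rd=6, rsi=6, total=6, worst_case=4)
```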

The Impact of IRS
In this part, we evaluate the impact of the IRS on the throughput rate when it performs different tasks. Figure 7 shows the throughput for three different scenarios, namely, when the IRS is used to help the transmitter-relay link, when it is applied to cancel the RSI, and when its job is to help the relay-destination link. The results are then compared with two cases: the system working in HD with IRS, and the system working in FD with no IRS. It is also assumed that $\mathrm{Tr}(\mathbf{H}_r^H\mathbf{H}_r) = 75\%$, i.e., the system works in the high-RSI range. As can be seen, the highest performance is achieved when the IRS is utilized to deal with the RSI. As a result, for the rest of the paper, we use the IRS for this purpose.

Figure 8 shows the impact of the IRS on the throughput. For Figure 8a, we considered the case $\{N_t, K_t, K_r, N_r\} = \{4, 5, 5, 4\}$, and for Figure 8b, we considered $\{N_t, K_t, K_r, N_r\} = \{10, 12, 12, 10\}$. As shown in the figure, the number of IRS elements has a great impact on RSI cancellation, to the extent that IRSs with $M = 100$ and $M = 300$ elements can cancel an interference level of $T_r = \mathrm{Tr}(\mathbf{H}_r^H\mathbf{H}_r) = 0.75$ for $\{4, 5, 5, 4\}$ and $\{10, 12, 12, 10\}$, respectively. Further, the figure shows that an IRS with 20 and 100 elements for the small and large antenna array cases, respectively, is not helpful at all. This is mainly because, unlike the average case, in the worst-case scenario the number of IRS elements must be at least as large as the dimension of $\mathbf{H}_r$. Otherwise, the IRS feasible set cannot span all dimensions of $\mathbf{H}_r$, and there is always at least one representation of $\mathbf{H}_r$ for which the IRS cannot perform any RSI cancellation. In addition, comparing Figure 8a and 8b, one can conclude that when the dimension of $\mathbf{H}_r$ increases, the effort that the IRS has to make in order to cancel the RSI increases remarkably, which is consistent with the previous statement.
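The dimension-counting argument can be checked numerically. The sketch below relaxes the unit-modulus constraint (an assumption that only makes the IRS stronger) and uses the linear model $\mathrm{vec}(\mathbf{H}_{IR}\boldsymbol{\Theta}\mathbf{H}_{RI}) = (\mathbf{H}_{RI}^T \odot \mathbf{H}_{IR})\boldsymbol{\theta}$ with assumed dimensions $K_r \times M$ and $M \times K_t$. When $M < K_rK_t$, it constructs an RSI channel orthogonal to every reflected channel, for which no cancellation is possible; when $M \ge K_rK_t$, any RSI channel can be cancelled in this relaxed model. Under this reading, the dimension of $\mathbf{H}_r$ is $K_rK_t$, i.e., 25 for {4, 5, 5, 4} and 144 for {10, 12, 12, 10}. All sizes in the example are hypothetical.

```python
import numpy as np


def khatri_rao(b, a):
    """Column-wise Khatri-Rao product: column k equals kron(b[:, k], a[:, k])."""
    return np.einsum("ik,jk->ijk", b, a).reshape(b.shape[0] * a.shape[0], a.shape[1])


def residual_after_cancellation(H_r, H_IR, H_RI):
    """Fraction of ||H_r||_F^2 left after the best *unconstrained* reflection vector.

    Relaxation (assumption): theta may be any complex vector instead of unit-modulus,
    so the result is optimistic about what the IRS can cancel.
    """
    G = khatri_rao(H_RI.T, H_IR)                      # vec(H_IR diag(theta) H_RI) = G theta
    h = H_r.flatten(order="F")
    theta, *_ = np.linalg.lstsq(G, -h, rcond=None)    # minimize ||h + G theta||_2
    return float(np.linalg.norm(h + G @ theta) ** 2 / np.linalg.norm(h) ** 2)


def unreachable_rsi(H_IR, H_RI):
    """An RSI channel orthogonal to every reflected channel, or None if none exists."""
    K_r, K_t = H_IR.shape[0], H_RI.shape[1]
    G = khatri_rao(H_RI.T, H_IR)
    U, s, _ = np.linalg.svd(G)                        # U is (K_r K_t) x (K_r K_t)
    rank = int(np.sum(s > 1e-10 * s.max()))
    if rank == K_r * K_t:
        return None                                   # reflections span all directions
    return U[:, rank].reshape(K_r, K_t, order="F")


def demo_residual(M, K_r, K_t, rng):
    """Residual RSI fraction for a worst-case (or, if none exists, random) H_r."""
    crandn = lambda *sh: (rng.standard_normal(sh) + 1j * rng.standard_normal(sh)) / np.sqrt(2)
    H_IR, H_RI = crandn(K_r, M), crandn(M, K_t)
    H_r = unreachable_rsi(H_IR, H_RI)
    if H_r is None:
        H_r = crandn(K_r, K_t)
    return residual_after_cancellation(H_r, H_IR, H_RI)


rng = np.random.default_rng(3)
for M in (4, 9, 20):                                  # K_r = K_t = 3 gives 9 RSI dimensions
    print(M, round(demo_residual(M, 3, 3, rng), 6))
```

With $M = 4$ the worst-case RSI survives untouched (residual fraction 1), whereas with $M = 9$ or $M = 20$ the relaxed IRS removes it entirely.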

Relay Tx/Rx Antenna Allocation
Suppose that the relay has $K_t + K_r = 8$ antennas in total. Furthermore, we consider the case in which the numbers of antennas at the source and destination are $\{N_t, N_r\} = \{4, 4\}$. The question is: of the eight antennas at the relay, how many should be used for reception in the robust design? Figure 9 shows the sum rate as a function of $K_r$ for different values of $T$, for the cases without an IRS and with an IRS of $M = 60$ elements. As can be seen, the throughput rate is maximized by using more antennas for reception than for transmission at the relay, i.e., $K_r > K_t$. This is because increasing the signal-to-noise ratio (SNR) of the source-relay streams enhances the overall throughput rate more than increasing the number of transmit antennas to enhance the DoF of the relay-destination link. Furthermore, notice that in this setup, the overall DoF from the source to the destination is limited by the DoF of the source-relay link, i.e., the bottleneck is in the first hop.
By comparing the two scenarios, we see that having an IRS not only improves the rates in all cases but may also change the best antenna allocation. For instance, for $T = 15\%$ without an IRS, it is best to have six antennas at the relay receiver and four antennas at the relay transmitter. However, after the IRS is deployed, the best allocation changes to five antennas at each end. Moreover, the results show that although $\mathrm{DoF}_{total}$ is three for both $\{N_t, K_t, K_r, N_r\} = \{4, 3, 5, 4\}$ and $\{N_t, K_t, K_r, N_r\} = \{4, 5, 3, 4\}$, the sum rate of the latter is much higher than that of the former at high interference. This is because when $\mathrm{DoF}_{sr} > \mathrm{DoF}_{rd}$, the source-relay link enjoys $\mathrm{DoF}_{sr} - \mathrm{DoF}_{rd}$ subchannels with no interference. Therefore, the source can obtain a higher sum rate by choosing its power allocation wisely. However, when $\mathrm{DoF}_{sr} \le \mathrm{DoF}_{rd}$, no matter how well the power allocation is performed, all subchannels at the source-relay end suffer from interference.

Figure 9. Sum rate throughput as a function of the number of relay receive antennas $K_r$ with and without RSI (x-axis: $K_r$; y-axis: $R_{av}$ (bits/channel use)). The transmit power budgets at the source and the relay are $P_s = 5$ and $P_r = 1$, respectively.


Full-Duplex vs. Half-Duplex
In this subsection, we determine the thresholds beyond which HD relaying outperforms FD relaying. These thresholds can serve as mode-switching thresholds in hybrid HD/FD relay systems. As can be seen in Figure 10, for each value of $K_r$, there is a maximum value of T P above which the HD mode outperforms the FD mode in terms of sum rate. Furthermore, Figure 10 shows the threshold for different IRS configurations. For this part, we continued with the case of $N_t = 4$, $K_r = 5$, $K_t = 5$, $N_r = 4$. As can be seen, by increasing the number of antennas, the threshold occurs at higher RSI. This is a direct result of the better performance obtained by having more antennas at the relay's receiver. It is worth noting that the IRS has a great impact on the performance of FD relaying. For instance, with an IRS consisting of only 60 elements, the FD mode outperforms the HD mode in almost all cases.

Figure 10. Thresholds for different $K_r$ and $M$. The region above each curve indicates the values of T P for which HD outperforms FD; points below the curve correspond to cases where FD performs better than HD.


Conclusions
In this paper, we investigated a multi-antenna source communicating with a multi-antenna destination through a multi-antenna relay. The relay is assumed to exploit a decode-and-forward (DF) strategy. An IRS is installed to help the relay cope with the RSI.
The transceivers are designed to be robust against the worst-case residual self-interference (RSI). To this end, the worst-case achievable throughput rate is maximized. This optimization problem turns out to be non-convex. Assuming that the degrees of freedom (DoF) of the source-relay link are fewer than the DoF of the relay-destination link, we determined the left and right singular vector matrices of the worst-case RSI channel. The problem is then simplified to the optimal power allocation at the transmitters, which guarantees robustness against the worst-case RSI singular values. This simplified problem is still non-convex. Based on intuitions about the optimal power allocation at the source and relay, we proposed an efficient algorithm to find a stationary point. Our proposed method showed a significant improvement in robustness. More precisely, we showed that in the case of high uncertainty, our method can lead to at least 100% worst-case throughput improvement for small antenna arrays and up to 500% for large antenna arrays at the transceivers. Furthermore, we confirmed that there is a direct relation between the performance of the system and the number of IRS elements. The simulations show that an IRS with as few as 90 and 300 elements can completely remove the RSI for our system configurations. Finally, we showed that when there is no RSI, the benefit of the relay can be fully harnessed when the numbers of antennas at the relay transmitter and receiver are equal. Therefore, employing the IRS to deal with the RSI can lead to the best performance of the relay.

Appendix B. Proof of Theorem 1
Before stating the proof, first we introduce the following definitions.
Definition A1. For a vector $\mathbf{a}$, we denote by $\mathbf{a}^{\downarrow}$ the vector that has the same components as $\mathbf{a}$, sorted in decreasing order.
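Definition A2. Vector $\mathbf{a}$ is said to be majorized by vector $\mathbf{b}$, denoted by $\mathbf{a} \prec \mathbf{b}$, if
$$\sum_{i=1}^{K} a^{\downarrow}_i \le \sum_{i=1}^{K} b^{\downarrow}_i, \quad K = 1, \ldots, N-1, \qquad \sum_{i=1}^{N} a^{\downarrow}_i = \sum_{i=1}^{N} b^{\downarrow}_i,$$
where $a^{\downarrow}_i$ is the $i$th component of $\mathbf{a}^{\downarrow}$, $N$ is the number of vector components, and $K \le N$. If the last equality is relaxed to $\le$, $\mathbf{a}$ is said to be weakly majorized by $\mathbf{b}$, denoted by $\mathbf{a} \prec_w \mathbf{b}$.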
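Definition A3. Vector $\mathbf{a}$ is said to be multiplicatively majorized by vector $\mathbf{b}$, denoted by $\mathbf{a} \prec_{\times} \mathbf{b}$, if
$$\prod_{i=1}^{K} a^{\downarrow}_i \le \prod_{i=1}^{K} b^{\downarrow}_i, \quad K = 1, \ldots, N-1, \qquad \prod_{i=1}^{N} a^{\downarrow}_i = \prod_{i=1}^{N} b^{\downarrow}_i.$$
In addition, it is easy to check … To begin with, we know that for an $n \times m$ matrix $\mathbf{A}$ and an $m \times n$ matrix $\mathbf{B}$, $\lambda_i(\mathbf{AB}) = \lambda_i(\mathbf{BA})$ for all $i \in \{1, \ldots, \min(m, n)\}$. In addition, the eigenvalues of $\mathbf{BA}$ and $\mathbf{AB}$ differ only in the number of zero eigenvalues.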
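A quick numerical illustration of the eigenvalue fact above: for random complex $\mathbf{A}$ ($n \times m$) and $\mathbf{B}$ ($m \times n$) with $n < m$, every eigenvalue of $\mathbf{AB}$ reappears in the spectrum of $\mathbf{BA}$, and the $m - n$ remaining eigenvalues of $\mathbf{BA}$ are zero. The sizes and random draws are arbitrary choices for illustration.

```python
import numpy as np


def match_spectra(small, large, tol=1e-8):
    """Pair every eigenvalue in `small` with its nearest unused one in `large`;
    return the unmatched leftovers of `large`."""
    remaining = list(large)
    for z in small:
        j = int(np.argmin([abs(z - w) for w in remaining]))
        if abs(z - remaining[j]) > tol:
            raise ValueError("the nonzero spectra differ")
        remaining.pop(j)
    return np.array(remaining)


rng = np.random.default_rng(0)
n, m = 3, 5                                       # arbitrary sizes with n < m
A = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

leftovers = match_spectra(np.linalg.eigvals(A @ B), np.linalg.eigvals(B @ A))
print(len(leftovers), np.abs(leftovers).max())    # m - n leftovers, all numerically zero
```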
Now, we define the vector $\lambda(\mathbf{Q}_s)$ and set its components to be $\lambda_{\rho(i)}(\mathbf{Q}_s) = \ldots$

Lemma A1. Let $\mathbf{A}$ and $\mathbf{B}$ be positive semidefinite Hermitian matrices with $\lambda_{\min(m,n)}(\mathbf{AB}) > 0$. Then …

Proof. The proof is given in [41] (H.1,e).
Using the above lemma, we can conclude log λ(Q s ) ≺ log(λ(Q s )).
Then, we can immediately conclude …

Remark A1. It is worth mentioning that, depending on the channel realizations, the optimal $\mathbf{Q}_s$ might contain some zero eigenvalues. In such cases, we can simply ignore the zeros and construct the matrix $\mathbf{Q}_s$ with dimension $(n-k) \times (n-k)$, where $k$ is the number of zero eigenvalues. Similarly, in cases where $\mathbf{H}_r^H\mathbf{H}_r$ has some zero eigenvalues, we can do the same: construct $\mathbf{H}_s$ using only the nonzero eigenvalues of $\mathbf{H}_r^H\mathbf{H}_r$ and add the zeros back to the result at the end.
Finally, we use the following lemma to show that $\mathbf{H}_r$ and $\mathbf{Q}_s$ are in the feasible set.
Lemma A2. For two nonnegative vectors $\mathbf{a}$ and $\mathbf{b}$, if $\mathbf{a} \prec_{\times} \mathbf{b}$, then $\mathbf{a} \prec_w \mathbf{b}$.
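Definitions A1 to A3 and Lemma A2 can be illustrated numerically with a few lines of Python. This is only an illustration under the standard definitions (partial sums of the decreasingly sorted components for $\prec$ and $\prec_w$, and log-majorization for $\prec_\times$ on positive vectors). The random test pairs are generated through the Hardy-Littlewood-Pólya characterization $\mathbf{x} \prec \mathbf{y} \iff \mathbf{x} = \mathbf{D}\mathbf{y}$ for some doubly stochastic $\mathbf{D}$, applied to the logarithms.

```python
import numpy as np


def _partial_sums(v):
    """Partial sums of v sorted in decreasing order, i.e., of v↓ (Definition A1)."""
    return np.cumsum(np.sort(np.asarray(v, dtype=float))[::-1])


def weakly_majorized(a, b, tol=1e-12):
    """a ≺_w b: each partial sum of a↓ is at most the matching partial sum of b↓."""
    return bool(np.all(_partial_sums(a) <= _partial_sums(b) + tol))


def majorized(a, b, tol=1e-12):
    """a ≺ b: weak majorization plus equal total sums."""
    return weakly_majorized(a, b, tol) and abs(float(np.sum(a)) - float(np.sum(b))) <= tol


def mult_majorized(a, b, tol=1e-12):
    """a ≺× b for positive vectors, i.e., log a ≺ log b (partial products of a↓
    bounded by those of b↓, with equal full products)."""
    return majorized(np.log(a), np.log(b), tol)


# Spot-check Lemma A2 on random positive vectors with log a = D log b, where D is
# doubly stochastic (an average of permutation matrices), so log a ≺ log b holds.
rng = np.random.default_rng(0)
n, ok = 5, True
for _ in range(1000):
    b = rng.uniform(0.1, 10.0, n)
    D = np.mean([np.eye(n)[rng.permutation(n)] for _ in range(4)], axis=0)
    a = np.exp(D @ np.log(b))
    ok = ok and mult_majorized(a, b, tol=1e-9) and weakly_majorized(a, b, tol=1e-9)
print(ok)
```

A random spot-check of this kind can only fail to find counterexamples; it does not replace the proof in [41].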
Exploiting the above lemma, one concludes …, which consequently results in … Therefore, there exist $\mathbf{Q}_s$ and $\mathbf{H}_r$ fulfilling (29)-(31) which satisfy …

… where $\lambda$ is the water level, which can be found from the power constraint. Substituting the new interference allocation, we obtain the new input power allocation as follows: … where (a) follows from the fact that $\varepsilon / \sum_{j=1}^{N}\sigma^2_{1_j}\gamma_{r_j}$ is a constant independent of $i$. Hence, we can define $\lambda' = \lambda + \varepsilon / \sum_{j=1}^{N}\sigma^2_{1_j}\gamma_{r_j}$. This shows that, for $\boldsymbol{\sigma}'^2_r$, all the optimal variables and parameters remain the same as those for $\boldsymbol{\sigma}^2_r$.

Now we compare $R^{FD}_{sr}$ for both cases. First, notice that $\varepsilon_i \ge 0$ for all $i$, and there is at least one index $i'$ for which $\varepsilon_{i'} > 0$. This means that $\sigma'^2_{r_{\rho(i)}} \ge \sigma^2_{r_{\rho(i)}}$ for all $i$ and $\sigma'^2_{r_{\rho(i')}} > \sigma^2_{r_{\rho(i')}}$. Now, notice that
$$f_i(x) = \log_2\left(1 + \frac{\sigma^2_{1_i}\gamma_{s_{\rho(i)}}}{1 + \gamma_{r_i}x}\right)$$
is a monotonically decreasing function of $x$. Thus, $f_i(\sigma'^2_{r_{\rho(i)}}) \le f_i(\sigma^2_{r_{\rho(i)}})$ for all $i$, and $f_{i'}(\sigma'^2_{r_{\rho(i')}}) < f_{i'}(\sigma^2_{r_{\rho(i')}})$. Adding all of the above inequalities, we obtain
$$\sum_i f_i(\sigma'^2_{r_{\rho(i)}}) < \sum_i f_i(\sigma^2_{r_{\rho(i)}}). \tag{A32}$$
The above inequality indicates $R^{FD}_{sr}(\boldsymbol{\sigma}^2_r) > R^{FD}_{sr}(\boldsymbol{\sigma}'^2_r)$, which contradicts the initial assumption $R^{FD}_{sr}(\boldsymbol{\sigma}^2_r) \le R^{FD}_{sr}(\boldsymbol{\sigma}'^2_r)$. This completes the proof of the minimization part.

For the maximization part, the general idea is the same, and again the proof is by contradiction. Assume that the optimal vector $\boldsymbol{\gamma}_s$, for which $R^{FD}_{sr}(\boldsymbol{\gamma}_s) \ge R^{FD}_{sr}(\boldsymbol{\gamma}'_s)$ for every feasible $\boldsymbol{\gamma}'_s$, does not sum to $P_s$, i.e., $\|\boldsymbol{\gamma}_s\|_1 < P_s$. Then there exists $\varepsilon > 0$ such that $\|\boldsymbol{\gamma}_s\|_1 + \varepsilon = P_s$. Now we define …, where $\eta = \sum_i \ldots$. One can check that $\sum_i \gamma'_{s_{\rho(i)}} = P_s$ and $\sigma^2_{1_i}\gamma'_{s_{\rho(i)}} \ge \sigma^2_{1_{i+1}}\gamma'_{s_{\rho(i+1)}}$. Thus, the new source power allocation is in the feasible set. It remains to make sure that the new allocation does not change the corresponding $\boldsymbol{\sigma}^2_r$.
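Using a Lagrangian multiplier, we have
$$\sum_i \log_2\left(1 + \frac{\sigma^2_{1_i}(\gamma_{s_{\rho(i)}} + \varepsilon_i)}{1 + \gamma_{r_i}\sigma^2_{r_{\rho(i)}}}\right) = \sum_i \log_2\left(\left(1 + \frac{\varepsilon}{\eta}\right)\left(1 + \frac{\sigma^2_{1_i}\gamma_{s_{\rho(i)}}}{1 + \gamma_{r_i}\sigma^2_{r_{\rho(i)}}}\right)\right).$$
Now notice that since $\sum_i \log_2(1 + \varepsilon/\eta)$ is a constant, its partial derivatives with respect to $\sigma^2_{r_i}$ and $\lambda$ are zero. As a result, the optimal interference allocation for $\boldsymbol{\gamma}'_s$ is the same as that for $\boldsymbol{\gamma}_s$. Similarly to the minimization case, here we have $\sum_i \varepsilon_i = \varepsilon$ with $\varepsilon_i \ge 0$ for all $i$, and there exists at least one $i$ for which $\varepsilon_i > 0$. Finally, since
$$f_i(x) = \log_2\left(1 + \frac{\sigma^2_{1_i}x}{1 + \gamma_{r_i}\sigma^2_{r_{\rho(i)}}}\right)$$
is a monotonically increasing function of $x$, we conclude that $R^{FD}_{sr}(\boldsymbol{\gamma}_s) < R^{FD}_{sr}(\boldsymbol{\gamma}'_s)$, which contradicts the assumption that $\boldsymbol{\gamma}_s$ is the optimal source power allocation, and the proof is complete.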

Appendix D
First, we show that $R^{FD}_{sr}$ is a decreasing function of $T$ and an increasing function of $P_s$. It is sufficient to show … and …, respectively. Next, we show that $g(P_r) = R^{FD}_{sr}(P_r) - R^{FD}_{rd}(P_r)$ is a monotonically decreasing function of $P_r$. It is sufficient to show … Finally, one can conclude …