MIMO Gaussian State-Dependent Channels with a State-Cognitive Helper

We consider the problem of channel coding over multiterminal state-dependent channels in which neither transmitters nor receivers but only a helper node has a non-causal knowledge of the state. Such channel models arise in many emerging communication schemes. We start by investigating the parallel state-dependent channel with the same but differently scaled state corrupting the receivers. A cognitive helper knows the state in a non-causal manner and wishes to mitigate the interference that impacts the transmission between two transmit–receive pairs. Outer and inner bounds are derived. In our analysis, the channel parameters are partitioned into various cases, and segments on the capacity region boundary are characterized for each case. Furthermore, we show that for a particular set of channel parameters, the capacity region is entirely characterized. In the second part of this work, we address a similar scenario, but now each channel is corrupted by an independent state. We derive an inner bound using a coding scheme that integrates single-bin Gel’fand–Pinsker coding and Marton’s coding for the broadcast channel. We also derive an outer bound and further partition the channel parameters into several cases for which parts of the capacity region boundary are characterized.

In this work, we study a particular communication model that can be used in future NOMA techniques. Specifically, we investigate a type of state-dependent channel with a helper, illustrated in Figure 2, in which two transmitters wish to send messages to their corresponding receivers over a parallel state-dependent channel. The state is not known to either transmitter or receiver but is known non-causally (i.e., the entire state sequence is given to the encoder before the block transmission begins) to a state-cognitive helper, who tries to assist each receiver in mitigating the interference caused by the state. This model captures interference cancelation in various practical scenarios. For example, users in multi-cell systems may experience interference from a base station located in another cell. Such a base station, being the source of the interference, clearly knows the interfering signal (modeled by the state) and can serve as a helper to mitigate the interference. Alternatively, that base station can convey the interference information to other base stations via the backhaul network so that those base stations can serve as helpers to reduce the interference. As another example, consider a situation where there are two Device-to-Device (D2D) links located in two distinct cells, and there is a downlink signal sent from the base station to some conventional mobile user in each cell. Some central unit, the helper in our model, knows in a non-causal manner the signal to be sent by each base station and tries to assist the D2D communication links by mitigating the interference (see Figure 3). As a comparison, this type of state-dependent model differs from the original state-dependent channels studied in, e.g., [2,3], in that the state-cognitive helper is not informed of the transmitters' messages, and hence its state cancelation strategies are necessarily independent of message encoding at the transmitters.
The study of channel coding in the presence of channel side information (CSI) was initiated by Shannon [4], who considered a discrete memoryless channel (DMC) with random parameters and side information provided causally to the transmitter. The single-letter expression for the capacity of the point-to-point DMC with non-causal CSI at the encoder (the G-P channel) was derived in the seminal work of Gel'fand and Pinsker [2]. One of the most interesting special cases of the G-P channel is the Gaussian additive noise and interference setting, in which the additive interference plays the role of the state sequence, known non-causally to the transmitter. Costa showed in [3] that the capacity of this channel is equal to the capacity of the same channel without additive interference. The capacity-achieving scheme of [3] (which is that of [2] applied to the Gaussian case) is termed "writing on dirty paper" (WDP), and consequently, the property of a channel in which the known interference can be completely removed is dubbed "the WDP property". Cohen and Lapidoth [5] showed that any interference sequence can be removed entirely when the channel noise is ergodic and Gaussian.
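As a point of reference for the WDP property, the following sketch (our own illustration, not the scheme of [3]) contrasts Costa's interference-free capacity 0.5 log(1 + P) with the rate obtained by naively treating the known interference as extra noise; the function names are ours and rates are in bits per channel use (log base 2):

```python
import math

def awgn_capacity(P, N=1.0):
    """Capacity (bits/use) of an AWGN channel with signal power P, noise power N."""
    return 0.5 * math.log2(1 + P / N)

def rate_interference_as_noise(P, Q, N=1.0):
    """Rate when additive interference of power Q is treated as extra noise."""
    return 0.5 * math.log2(1 + P / (N + Q))

P, Q = 5.0, 12.0
c_wdp = awgn_capacity(P)                    # Costa: interference fully removed
r_naive = rate_interference_as_noise(P, Q)  # no coding against the interference

# WDP: non-causal knowledge at the encoder removes the entire penalty,
# regardless of the interference power Q.
assert c_wdp > r_naive
```

The gap between the two rates grows with Q, while the WDP capacity is independent of Q.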

The models we study in this work all have a broadcasting node. The discrete memoryless broadcast channel (DM-BC) was introduced by Cover [6]. The capacity region of the DM-BC is still an open problem. The largest known inner bound on the capacity region of the DM-BC with private messages was derived by Marton [7]. Liang [8] derived an inner bound on the capacity region of the DM-BC with an additional common message. The best outer bound for DM-BC with a common message is due to Nair and El Gamal [9]. There are, however, some special cases where the capacity region is fully characterized. For example, the capacity region of the degraded DM-BC was established by Gallager [10]. The capacity region of the Gaussian BC was derived by Bergmans [11]. An interesting result is the capacity region of the Gaussian MIMO BC which was established by Weingarten et al. [12]. The authors introduced a new notion of an enhanced channel and used it jointly with the Entropy Power Inequality (EPI) to show their result. The capacity achieving scheme relies on the dirty paper coding technique. Liu and Viswanath [13] developed an extremal inequality proof technique and showed that it can be used to establish a converse result in various Gaussian MIMO multiterminal networks, including the Gaussian MIMO BC with private messages. Recently, Geng and Nair [14] developed a different technique to characterize the capacity region of Gaussian MIMO BC with common and private messages.
Degraded DM-BC with causal and non-causal side information was introduced by Steinberg [15]. Inner and outer bounds on the capacity region were derived. For the particular case in which the nondegraded user is informed about the channel parameters, it was shown that the bounds are tight, thus obtaining the capacity region for that case. The general DM-BC with non-causal CSI at the encoder was studied by Steinberg and Shamai [16]. An inner bound was derived, and it was shown to be tight for the Gaussian BC with private messages and independent additive interference at both channels. The latter setting was recently extended to the case of common and private messages in the Gaussian framework with K users in [17]. The special case where the transmitter sends only a common message to all receivers over an additive BC has been initially studied in [18] and has been recently extended to the compound setting in [19]. Outer bounds for DM-BC with CSI at the encoder were derived in [20].
The models addressed in this paper have a mismatched property; that is, the state sequence is known only to some of the nodes, which differs from the classical study of state-dependent channels. Channels with this mismatched property have been addressed in the past for various models. For example, in [21-25], the state-dependent multiple access channel (MAC) is studied with the state known at only one transmitter. The best outer bound for the Gaussian MAC setting was recently reported in [26]. The point-to-point helper channel studied in [27,28] can be considered as a special case of [25], where the cognitive transmitter does not send any message. Further, in [28], the state-dependent MAC with an additional helper was studied, and the partial/full capacity region was characterized under various channel parameters. Moreover, some state-dependent relay channel models can also be viewed as an extension of the state-dependent channel with a helper, where the relay serves the role of the helper by knowing the state information. In [29], the state-dependent relay channel with the state non-causally available at the relay is considered. An achievable rate was derived using a combination of decode-and-forward, Gel'fand-Pinsker (GP) binning and codeword splitting. Also, in [30], additional noiseless cooperation links with finite capacity were assumed between the transmitter and the relay, and various coding techniques were explored. The authors of [31] have recently considered a different scenario with a state-cognitive relay. The state-dependent Z-IC with a common state known in a non-causal manner only to the primary user was studied in [32]. A good tutorial on channel coding in the presence of CSI can be found in [33].
The basic state-dependent Gaussian channel with a helper is illustrated in Figure 4. It was first introduced in [27], where the capacity in the infinite state power regime was characterized and was shown to be achievable by lattice coding. The capacity under arbitrary state power was established for some special cases in [28]. Based on a single-bin GP binning scheme, a lower bound was derived for the discrete memoryless case; this lower bound was further evaluated for the Gaussian channel by an appropriate choice of the maximizing input distribution. The surprising result of that study was that when the helper power is above some threshold, the interference caused by the state is entirely canceled and the capacity of the channel without the state can be achieved. This threshold does not depend on the state power, and hence it was shown that this channel also has the WDP property; that is, the capacity of the channel is the same as that of the corresponding channel without the interference (which is modeled as the state).
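To build intuition for why such a threshold is remarkable, the following hypothetical scalar sketch (our own illustration, not the GP-binning scheme of [27,28]) evaluates the weaker strategy of pure direct state subtraction, X_0 = -bS, over a channel of the assumed form Y = X + X_0 + aS + Z. Direct subtraction recovers the no-state capacity only when the helper power satisfies P_0 >= a^2 Q, a requirement that grows with the state power Q, whereas the threshold reported in [28] does not:

```python
import math

def direct_subtraction_rate(P, P0, a, Q, N=1.0):
    """Rate over the hypothetical scalar channel Y = X + X0 + a*S + Z when the
    helper uses only direct state subtraction X0 = -b*S (no GP binning)."""
    b = min(a, math.sqrt(P0 / Q))   # scale limited by helper power E[X0^2] <= P0
    residual = (a - b) ** 2 * Q     # leftover state acts as additional noise
    return 0.5 * math.log2(1 + P / (N + residual))

P, P0, Q, a = 5.0, 6.0, 12.0, 1.0
r = direct_subtraction_rate(P, P0, a, Q)
c = 0.5 * math.log2(1 + P)          # no-state capacity

# Subtraction alone meets the no-state capacity only when P0 >= a^2 * Q.
assert P0 < a ** 2 * Q or abs(r - c) < 1e-12
```

With the chosen parameters, P_0 = 6 < a^2 Q = 12, so pure subtraction leaves residual interference and falls short of the no-state capacity.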
The work most relevant to this study is [34], in which the state-dependent parallel channel with a helper was studied in the regime with infinite state power and with the two receivers corrupted by two independent states. A time-sharing scheme was proved to be capacity achieving under certain channel parameters. In contrast, in this study, we extend those results to the arbitrary state power regime. We also consider two extreme cases. First, we address the problem where the two receivers of the parallel channel are corrupted by the same but differently scaled state; in the second part, those states are independent. For both cases, we show that the time-sharing scheme is no longer optimal. Our main contribution in this work is the derivation of an inner bound, which extends Marton's coding scheme for the discrete broadcast channel to the current model. We apply this bound to the MIMO Gaussian setting and characterize segments of the capacity region boundary for various channel parameters. The material in this paper was presented in part at [35,36].

Notation Conventions
Throughout the paper, random variables are denoted using a sans-serif font, e.g., X, their realizations are denoted by the respective lower-case letters, e.g., x, and their alphabets are denoted by the respective calligraphic letters, e.g., 𝒳. Let 𝒳^n stand for the set of all n-tuples of elements from 𝒳. An element from 𝒳^n is denoted by x^n = (x_1, x_2, ..., x_n), and substrings are denoted by x_i^j = (x_i, x_{i+1}, ..., x_j). The cardinality of a finite set 𝒳 is denoted by |𝒳|. The probability distribution function of X, the joint distribution function of X and Y, and the conditional distribution of X given Y are denoted by P_X, P_{X,Y} and P_{X|Y}, respectively. The expectation of X is denoted by E[X]. The probability of an event E is denoted by P{E}. The set of jointly ε-typical n-tuples (x^n, y^n) is defined as T_ε^(n)(P_{XY}) [37]. A set of consecutive integers starting at 1 and ending at 2^{nR} is denoted as {1, 2, ..., 2^{nR}}. We assume throughout this paper that 2^{nR} is an integer, for any R as n → ∞. We denote the covariance matrix of a zero-mean vector X by Σ_X ≜ E[XX^T], the cross-correlation matrix by Σ_{XY} ≜ E[XY^T], and the conditional correlation matrix of X given Y by M_{X|Y} ≜ Σ_X − Σ_{XY} Σ_Y^{-1} Σ_{YX}.
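The conditional correlation matrix defined above can be checked numerically. The sketch below (block sizes and variable names are ours) builds a random positive definite joint covariance, computes M_{X|Y} = Σ_X − Σ_{XY} Σ_Y^{-1} Σ_{YX}, and verifies two basic properties: M_{X|Y} is positive semidefinite, and it never exceeds Σ_X in the Loewner order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random positive definite joint covariance of a 4-dimensional vector (X, Y),
# with X the first two coordinates and Y the last two.
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4 * np.eye(4)
Sxx, Sxy = Sigma[:2, :2], Sigma[:2, 2:]
Syy, Syx = Sigma[2:, 2:], Sigma[2:, :2]

# M_{X|Y} = Sigma_X - Sigma_XY Sigma_Y^{-1} Sigma_YX
M = Sxx - Sxy @ np.linalg.solve(Syy, Syx)

# Conditioning cannot create "negative variance" and can only reduce uncertainty:
assert np.all(np.linalg.eigvalsh(M) >= -1e-10)          # M is PSD
assert np.all(np.linalg.eigvalsh(Sxx - M) >= -1e-10)    # M <= Sigma_X (Loewner)
```

For jointly Gaussian vectors, M_{X|Y} is exactly the covariance of X given Y, which is why it appears repeatedly in the Gaussian evaluations below.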

Definitions
Definition 1. Random variables X, Y, Z are said to form a Markov chain in that order (denoted by X → Y → Z) if the conditional distribution of Z depends only on Y and is conditionally independent of X. Specifically, X, Y and Z form a Markov chain X → Y → Z if the joint probability mass function can be written as P_{X,Y,Z}(x, y, z) = P_X(x) P_{Y|X}(y|x) P_{Z|Y}(z|y).
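The Markov factorization p(x, y, z) = p(x) p(y|x) p(z|y) can be verified on a small random example; in the sketch below (a hypothetical chain over ternary alphabets, all distributions drawn at random) we check that the resulting joint pmf indeed satisfies p(z|x, y) = p(z|y):

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical chain X -> Y -> Z over alphabets of size 3:
# draw p(x), the rows p(y|x), and the rows p(z|y) from Dirichlet distributions.
px = rng.dirichlet(np.ones(3))
py_x = rng.dirichlet(np.ones(3), size=3)   # row x holds p(.|x)
pz_y = rng.dirichlet(np.ones(3), size=3)   # row y holds p(.|y)

# Joint pmf p(x, y, z) = p(x) p(y|x) p(z|y), via broadcasting.
p = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]
assert abs(p.sum() - 1) < 1e-12

# Markovity: p(z|x, y) equals p(z|y) for every (x, y).
pz_xy = p / p.sum(axis=2, keepdims=True)
assert np.allclose(pz_xy, np.broadcast_to(pz_y[None, :, :], p.shape))
```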

Auxiliary Results
This section introduces some auxiliary results that are relevant to the analysis in this work [37].
The following inequality will be frequently used in the proofs of outer bounds on the capacity regions.
The covering lemma and the packing lemma will be used in the achievability proofs throughout this paper.
Lemma 3 (Covering Lemma). Let (U, X, X̂) ∼ P_{UXX̂} and ε' < ε. Let (U^n, X^n) ∼ P_{U^n X^n} be a pair of random sequences with lim_{n→∞} P{(U^n, X^n) ∈ T_{ε'}^(n)(P_{UX})} = 1, and let X̂^n(m), m ∈ A, where |A| ≥ 2^{nR}, be random sequences, conditionally independent of each other and of X^n given U^n, each distributed according to ∏_{i=1}^n P_{X̂|U}(x̂_i|u_i). Then, there exists δ(ε) that approaches zero as ε → 0 such that lim_{n→∞} P{(U^n, X^n, X̂^n(m)) ∉ T_ε^(n) for all m ∈ A} = 0, if R > I(X; X̂|U) + δ(ε).
Lemma 4 (Packing Lemma). Let (U, X, Y) ∼ P_{UXY}. Let (Ũ^n, Ỹ^n) ∼ P_{Ũ^n Ỹ^n} be a pair of arbitrarily distributed random sequences, not necessarily distributed according to ∏_{i=1}^n P_{UY}(ũ_i, ỹ_i). Let X^n(m), m ∈ A, where |A| ≤ 2^{nR}, be random sequences, each distributed according to ∏_{i=1}^n P_{X|U}(x_i|ũ_i). Further assume that X^n(m), m ∈ A, is pairwise conditionally independent of Ỹ^n given Ũ^n, but is arbitrarily dependent on the other X^n(m) sequences. Then, there exists δ(ε) that approaches zero as ε → 0 such that lim_{n→∞} P{(Ũ^n, X^n(m), Ỹ^n) ∈ T_ε^(n) for some m ∈ A} = 0, if R < I(X; Y|U) − δ(ε).

Channel Model
In this section, we study the state-dependent parallel network with a state-cognitive helper, in which two transmitters communicate with two corresponding receivers over a state-dependent parallel channel. The two receivers are corrupted by the same but differently scaled state. The state information is not known to either the transmitters or the receivers, but is known non-causally to a helper; the helper thus assists these receivers in canceling the state interference (see Figure 5). More specifically, the encoder at transmitter l, f_l: I_{R_l}^(n) → 𝒳_l^n, maps a message m_l ∈ I_{R_l}^(n) to a codeword x_l^n, for l = 1, 2. The inputs x_1^n and x_2^n are sent respectively over the two subchannels of the parallel channel. The two receivers are corrupted by the same but differently scaled independent and identically distributed (i.i.d.) state sequence s^n ∈ 𝒮^n, which is known to a common helper non-causally. Hence, the encoder at the helper, f_0: 𝒮^n → 𝒳_0^n, maps the state sequence s^n ∈ 𝒮^n into a codeword x_0^n ∈ 𝒳_0^n. The channel transition probability is given by P_{Y_1|X_0 X_1 S} · P_{Y_2|X_0 X_2 S}. The decoder at receiver l, g_l: 𝒴_l^n → I_{R_l}^(n), maps a received sequence y_l^n into a message m̂_l ∈ I_{R_l}^(n), for l = 1, 2. We assume that the messages are uniformly distributed over the sets I_{R_1}^(n) and I_{R_2}^(n). We define the average probability of error for a length-n code as P_e^(n) = P{(M̂_1, M̂_2) ≠ (M_1, M_2)}.

Definition 2. A rate pair (R_1, R_2) is said to be achievable if there exists a sequence of codes such that the average probability of error P_e^(n) → 0 as n → ∞.

Definition 3.
We define the capacity region of the channel as the closure of the set of all achievable rate pairs (R 1 , R 2 ).
In this section, we focus on the MIMO Gaussian channel, with the outputs at the two receivers for one channel use given by

Y_1 = X_1 + G_1 X_0 + G_s1 S + Z_1,
Y_2 = X_2 + G_2 X_0 + G_s2 S + Z_2,

where X_0, X_1, X_2, S, Z_1 and Z_2 are all real vectors of size t × 1, and
• X_0, X_1, X_2 are the input vectors, subject to the covariance matrix constraints E[X_l X_l^T] ⪯ K_l, for l = 0, 1, 2;
• S is a real Gaussian random vector with zero mean and covariance matrix K_S = E[SS^T] ≻ 0;
• Z_l is a real Gaussian random vector with zero mean and an identity covariance matrix.
Both the noise variables and the state variable are i.i.d. over channel uses. G_s1 (G_s2) is a t × t real matrix that represents the channel connecting the state source to the first (second) user. Similarly, G_1 (G_2) is a t × t real channel matrix connecting the helper to the first (second) user. Thus, our model captures a general scenario, where the helper's power and the state power can be arbitrary.
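Assuming the additive form Y_l = X_l + G_l X_0 + G_sl S + Z_l described above, the following sketch (all matrices and the seed are hypothetical choices of ours) illustrates the core difficulty of this model: a single helper signal that exactly subtracts the state at one receiver generally cannot subtract the differently scaled state at the other.

```python
import numpy as np

rng = np.random.default_rng(2)
t = 3

# Hypothetical channel matrices for one channel use of
#   Y_l = X_l + G_l X_0 + G_sl S + Z_l,  l = 1, 2.
G1, G2 = np.eye(t), 2 * np.eye(t)
Gs1, Gs2 = np.eye(t), np.eye(t)
S = rng.multivariate_normal(np.zeros(t), 4 * np.eye(t))
X1, X2 = rng.standard_normal(t), rng.standard_normal(t)
Z1, Z2 = rng.standard_normal(t), rng.standard_normal(t)

# Direct subtraction aimed at receiver 1: X0 = -G1^{-1} Gs1 S.
X0 = -np.linalg.solve(G1, Gs1 @ S)
Y1 = X1 + G1 @ X0 + Gs1 @ S + Z1
Y2 = X2 + G2 @ X0 + Gs2 @ S + Z2

# The state is gone from Y1, but since G2^{-1} Gs2 != G1^{-1} Gs1 it is not
# gone from Y2 -- the helper must trade off between the two receivers.
assert np.allclose(Y1, X1 + Z1)
assert not np.allclose(Y2, X2 + Z2)
```

This trade-off is exactly what the combination of dirty paper coding and partial direct subtraction in the sequel is designed to manage.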
Our goal is to characterize the capacity region of the Gaussian channel under various channel parameters (G 1 , G 2 , G s 1 , G s 2 , K 0 , K 1 , K 2 , K S ).

Inner and Outer Bounds
In this section, we first derive inner and outer bounds on the capacity region for the state-dependent parallel channel with a helper. Then by comparing the inner and outer bounds, we characterize the segments on the capacity region boundary under various channel parameters.
We start by deriving an inner bound on the capacity region for the DMC based on the single-bin GP scheme.

Proposition 1.
For the discrete memoryless state-dependent parallel channel with a helper under the same but differently scaled states at the two receivers, an inner bound on the capacity region consists of rate pairs (R 1 , R 2 ) satisfying: for some distribution P W|S P X 0 |WS P X 1 P X 2 .
Proof. The proof is relegated to Appendix A.
We evaluate the inner bound for the Gaussian channel by choosing a jointly Gaussian distribution for the random variables in which X_0', X_1, X_2, S are independent and K_0' ⪯ K_0.
Based on those definitions, we obtain an achievable region for the Gaussian channel.

Proposition 2.
An inner bound on the capacity region of the parallel state-dependent MIMO Gaussian channel with the same but differently scaled states and a state-cognitive helper consists of rate pairs (R_1, R_2) satisfying the bounds in (9), for some real matrices A, B and K_0' satisfying K_0' ⪰ 0 and K_0' + B K_S B^T ⪯ K_0.
We note that the above choice of the helper's signal incorporates two parts, with X_0' designed using single-bin dirty paper coding and B S acting as direct state subtraction.
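The power constraint on this two-part helper signal can be sanity-checked numerically. The sketch below (function name and sample matrices are ours) tests the Loewner-order condition K_0' + B K_S B^T ⪯ K_0 by checking the eigenvalues of the gap matrix:

```python
import numpy as np

def helper_split_feasible(K0p, B, KS, K0, tol=1e-10):
    """Check the constraint K0' + B K_S B^T <= K0 (Loewner order) for the
    helper signal X0 = X0' + B*S: dirty-paper part plus direct subtraction."""
    gap = K0 - (K0p + B @ KS @ B.T)
    return bool(np.all(np.linalg.eigvalsh(gap) >= -tol))

KS = 4.0 * np.eye(2)     # state covariance
K0 = 6.0 * np.eye(2)     # helper power budget
B = 0.8 * np.eye(2)      # direct-subtraction scale

assert helper_split_feasible(3.0 * np.eye(2), B, KS, K0)      # 3 + 2.56 <= 6
assert not helper_split_feasible(4.0 * np.eye(2), B, KS, K0)  # 4 + 2.56 > 6
```

The check makes the trade-off explicit: power spent on direct subtraction (through B) is no longer available for the dirty-paper part K_0'.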
We next present an outer bound, which applies the point-to-point channel capacity and the upper bound derived for the point-to-point channel with a helper in [27].

Proposition 3. An outer bound on the capacity region of the state-dependent parallel MIMO Gaussian channel with a helper consists of rate pairs (R_1, R_2) satisfying (11), for every l ∈ {1, 2} and every Σ_X0S that satisfies the corresponding constraint.

Proof. The second term in (11) is simply the capacity of a point-to-point channel without state. The first term is derived in Appendix B.

Capacity Region Characterization
In this section, we optimize A and B in Proposition 2, and compare the rate bounds with the outer bounds in Proposition 3 to characterize the points or segments on the capacity region boundary.
Since the inner bound in Proposition 2 is not convex, it is difficult to provide a closed form for the jointly optimized bounds. Therefore, we first optimize the bounds for R 1 and R 2 respectively, and then provide conditions on channel parameters such that these bounds match the outer bound. Based on the conditions, we partition the channel parameters into the sets, in which different segments of the capacity region boundary can be obtained.
We first consider the rate bound for R_1 in (9a). By an appropriate choice of the parameters, the first term in (9a) is achievable and matches the outer bound in (11); thus, one segment of the capacity region boundary is specified. We further observe that the second term g_1(A, B, K_0') in (9a) is optimized by setting A_b = B + G_1^{-1} G_s1, in which case the inner bound for R_1 becomes R_1 = (1/2) log(|K_1 + I|), the capacity of the point-to-point channel without state, which matches the outer bound in (11). Thus, another segment of the capacity region boundary is specified. We then consider the rate bound for R_2; under analogous conditions on the channel parameters, the corresponding segments of the capacity region boundary are obtained. Summarizing the above analysis, we obtain the following characterization of segments of the capacity region boundary.
Theorem 1. The channel parameters (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) can be partitioned into the sets A_1, B_1, C_1. If the parameters belong to A_1, then the corresponding inner bound captures one segment of the capacity region boundary, where the state cannot be fully canceled. If (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) ∈ C_1, then (14a)-(14b) captures one segment of the capacity region boundary where the state is fully canceled. If (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) ∈ B_1, then the R_1 segment of the capacity region boundary is not characterized.
The channel parameters (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) can also be partitioned into the sets A_2, B_2, C_2. If the parameters belong to A_2, then the corresponding inner bound captures one segment of the capacity region boundary, where the state cannot be fully canceled. If (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) ∈ C_2, then (16a)-(16b) captures one segment of the capacity region boundary where the state is fully canceled. If (G_1, G_2, G_s1, G_s2, K_0, K_1, K_2, K_S) ∈ B_2, then the R_2 segment of the capacity region boundary is not characterized.
The above theorem describes two partitions of the channel parameters, respectively under which the segments of the capacity region boundary corresponding to R_1 and R_2 can be characterized. The intersection of two sets, one from each partition, collectively characterizes all the segments of the capacity region boundary. Figure 6 lists all possible intersections of sets to which the channel parameters can belong. For each case in Figure 6, we use a red solid line to represent the segments of the capacity region that are characterized in Theorem 1, and we also mark the value of the capacity that each segment corresponds to, as characterized in Theorem 1. Please note that the case B_1 ∩ B_2 is not illustrated in Figure 6, since no segments are characterized in this case. Figure 6. Segments of the capacity region for all cases of channel parameters.
One interesting example in Theorem 1 is the case in which the conditions of C_1 and C_2 hold simultaneously; then the point-to-point channel capacity is simultaneously obtained for both R_1 and R_2, with the state being fully canceled. We state this result in the following theorem.
Theorem 2. If the above conditions hold for some A ∈ Ω_A, then the capacity region of the state-dependent parallel Gaussian channel with a helper and under the same but differently scaled states contains all (R_1, R_2) satisfying R_1 ≤ 0.5 log(|K_1 + I|) and R_2 ≤ 0.5 log(|K_2 + I|). The channel conditions of Theorem 2 are not only of mathematical interest but also of practical utility. Consider, for example, a scenario where the helper is also the interferer (see Figure 3); in such a case it is reasonable to assume that G_s1 = G_1 and G_s2 = G_2, and thus the aforementioned conditions are satisfied.

Numerical Example
We now examine our results via simulations. In particular, we focus on the scalar channel case (i.e., t = 1). We set P_0 = 6, P_1 = P_2 = 5, Q = 12, and b = 0.8, and plot the inner and outer bounds on the capacity region (R_1, R_2) for two values of a. It can be observed from Figure 7 that the outer bound is defined by the rectangular region of the channel without state. The inner bound, on the contrary, is sensitive to the value of a: in the case a = b, our inner and outer bounds coincide everywhere, while in the case a ≠ b they coincide only on some segments. Both observations corroborate the capacity characterizations in Theorems 1 and 2. It is also interesting to illustrate how the channel parameters (a, b) affect our ability to characterize the capacity region boundary. For this we propose the following setup:
• we choose α and β such that R_1 lies on the capacity region boundary;
• we further choose ρ_0S that maximizes the achievable R_2, denoted by R_2^I;
• we compare it to the outer bound on R_2, R_2^O, and plot the gap Δ ≜ R_2^O − R_2^I.
Figure 8 shows the results of such a simulation for two values of P_0: P_0 = 1, for which the state is not fully canceled for user 1, and P_0 = 6, for which the state is canceled. We fix the other parameters as before, that is, P_1 = P_2 = 5 and Q = 12. The right figure shows that the capacity gap is small around the line a = b; this result is not surprising, as it follows from Theorem 2. The left figure is also interesting: it shows that there is a curve with a ≠ b for which the capacity gap is also near zero. The reason for this phenomenon is explained as follows.
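For reference, the corner of the no-state rectangle that bounds the region in Figure 7 can be computed directly; the snippet below (our own computation, with rates in bits per channel use, i.e., log base 2) evaluates the point-to-point capacities 0.5 log(1 + P_l) for the parameters above:

```python
import math

# Scalar parameters from the numerical example above.
P1, P2 = 5.0, 5.0

# Corner of the outer-bound rectangle: the no-state point-to-point capacities.
R1_max = 0.5 * math.log2(1 + P1)
R2_max = 0.5 * math.log2(1 + P2)

# The example is symmetric, so the rectangle is a square.
assert abs(R1_max - R2_max) < 1e-12
```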

• The chosen channel parameters satisfy (a, b, P_0, P_1, P_2, Q) ∈ A_1; hence, if b^2 P_0^2 ≥ α_1^2 Q(P_2 + 1 − b^2 P_0), then (a, b, P_0, P_1, P_2, Q) ∈ C_2, i.e., R_2 = (1/2) log(1 + P_2) is achievable. We illustrate this result in Figure 9, where we fix the channel parameters b = 1, P_1 = P_2 = 5, Q = 12, and calculate the capacity gap for various values of a and P_0. The shaded area is the region of P_0 where the capacity of the point-to-point helper channel is not characterized. In practical situations the channel parameters a and b are fixed, but the helper can control P_0. The results here imply that for a fixed (a, b) we can choose P_0 such that the capacity gap is close to zero. We emphasize this in Figure 10, where we plot the inner and outer bounds on achievable (R_1, R_2) with the channel parameters (a, b, P_0, P_1, P_2, Q) = (3.5, 5, 2.17, 5, 5, 12).

MIMO Gaussian Channel with Independent States
In this section, we consider the problem of channel coding over the MIMO Gaussian parallel state-dependent channel with a cognitive helper, where the states are independent. We start by deriving an achievable region for the general discrete memoryless case. We then evaluate this region for the Gaussian setting by choosing an appropriate jointly Gaussian input distribution.

Problem Formulation
Consider the 3-transmitter, 2-receiver state-dependent parallel DMC depicted in Figure 11, where Transmitter 1 wishes to communicate a message M_1 to Receiver 1, and similarly Transmitter 2 wishes to transmit a message M_2 to the corresponding Receiver 2. The messages M_1 and M_2 are independent. The communication takes place over a parallel state-dependent channel characterized by a probability transition matrix p(y_1, y_2|x_0, x_1, x_2, s). The transmitter at the helper has non-causal knowledge of the state and tries to mitigate the interference caused in both channels. The state variable S is random, taking values in 𝒮, and is drawn from a discrete memoryless source (DMS). A (2^{nR_1}, 2^{nR_2}, n) code for the parallel state-dependent channel with state known non-causally at the helper consists of
• two message sets I_{R_1}^(n) and I_{R_2}^(n),
• three encoders, where the encoder at the helper assigns a codeword x_0^n(s^n) to each state sequence s^n ∈ 𝒮^n, encoder 1 assigns a codeword x_1^n(m_1) to each message m_1 ∈ I_{R_1}^(n), and encoder 2 assigns a codeword x_2^n(m_2) to each message m_2 ∈ I_{R_2}^(n), and
• two decoders, where decoder 1 assigns an estimate m̂_1 ∈ I_{R_1}^(n) or an error message e to each received sequence y_1^n, and decoder 2 assigns an estimate m̂_2 ∈ I_{R_2}^(n) or an error message e to each received sequence y_2^n.
We assume that the message pair (M_1, M_2) is uniformly distributed over I_{R_1}^(n) × I_{R_2}^(n). The average probability of error for a length-n code is defined as P_e^(n) = P{(M̂_1, M̂_2) ≠ (M_1, M_2)}. A rate pair (R_1, R_2) is said to be achievable if there exists a sequence of (2^{nR_1}, 2^{nR_2}, n) codes such that lim_{n→∞} P_e^(n) = 0. The capacity region C is the closure of the set of all achievable rate pairs (R_1, R_2).
We observe that, due to the lack of cooperation between the receivers, the capacity region of this channel depends on p(y_1, y_2|x_0, x_1, x_2, s) only through the conditional marginal PMFs p(y_1|x_0, x_1, s) and p(y_2|x_0, x_2, s). This observation is similar to the DM-BC ([37], Lemma 5.1).
Our goal is to characterize the capacity region C for the state-dependent Gaussian parallel channel with additive states known at the helper. Here, the state is S = (S_1, S_2)^T. The channel is modeled as a Gaussian vector parallel state-dependent channel

Y_1 = X_1 + G_1 X_0 + S_1 + Z_1,
Y_2 = X_2 + G_2 X_0 + S_2 + Z_2,

where G_1, G_2 are t × t channel gain matrices, and X_0, X_1, X_2 are the helper's and the non-cognitive transmitters' channel input signals, each subject to an average matrix power constraint E[X_l X_l^T] ⪯ K_l, l = 0, 1, 2. The additive state variables S_l and noise variables Z_l are independent and identically distributed (i.i.d.) Gaussian with zero mean and strictly positive definite covariance matrices K_S_l and I, respectively.

Outer and Inner Bounds
To characterize the capacity region of this channel, we first consider the following outer bound for the Gaussian setting.

Proposition 4. Every achievable rate pair (R_1, R_2) of the state-dependent parallel Gaussian channel with a helper must satisfy R_l ≤ min{R_l^ub1, R_l^ub2(Σ_X0S_l)}, for l ∈ {1, 2} and some covariance matrices (Σ_X0S_1, Σ_X0S_2). The proof of this outer bound is quite similar to that of Proposition 3 and is given in Appendix D.
The upper bound for each rate consists of two terms, the first one reflects the scenario when the interference cannot be completely canceled, and the second is simply the point-to-point capacity of the channel without the state. Furthermore, the individual rate bounds are connected through the choice of Σ X 0 S 1 and Σ X 0 S 2 .
We next derive an achievable region for the channel based on a scheme that integrates Marton's coding, single-bin dirty paper coding, and state cancelation. More specifically, we generate two auxiliary random variables, U and V, to incorporate the state information, so that Receiver 1 (respectively, Receiver 2) decodes U (respectively, V) and then decodes the respective transmitter's information. Based on this scheme, we derive the following inner bound on the capacity region for the DM case.

Proposition 5.
An inner bound on the capacity region of the discrete memoryless parallel state-dependent channel with a helper consists of rate pairs (R 1 , R 2 ) satisfying: for some PMF P UVX 0 |S P X 1 P X 2 .

Remark 1. The achievable region in Proposition 5 is equivalent to the following region
for some PMF P UVX 0 |S P X 1 P X 2 .
Proof. The proof of the inner bound is relegated to Appendix E.
We evaluate the latter inner bound for the Gaussian channel by choosing a jointly Gaussian distribution for the random variables in which X_01, X_02, X_1, X_2, S_1, S_2 are independent. For simplicity of presentation, denote Ā_1 = (A_11, A_12), Ā_2 = (A_20, A_21, A_22) and B̄ = (B_1, B_2). Let f_1(·), g_1(·), f_2(·) and g_2(·) be defined through the corresponding mutual information terms, evaluated using the joint Gaussian distribution set in (29). Based on those definitions, we obtain an achievable region for the Gaussian channel.

Proposition 6. An inner bound on the capacity region of the parallel state-dependent Gaussian channel with a helper and with independent states consists of rate pairs (R_1, R_2) satisfying (30a)-(30b), for some real matrices A_20, A_21, A_22, B_1, B_2, K_01 and K_02 satisfying K_01, K_02 ⪰ 0 and the helper's power constraint.

We now provide the intuition behind this construction of the RVs in the proof of Proposition 6. X_0 contains two parts: the part with B_l, l = 1, 2, controls the direct cancelation of each state, while the part X_0l, l = 1, 2, is used for dirty paper coding via the generation of the state-correlated auxiliary RVs U and V.
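The helper's power budget under this four-way split can be sanity-checked numerically. The sketch below assumes (as a labeled assumption of ours, consistent with the constraint form of Proposition 2) that the overall constraint reads K_01 + K_02 + B_1 K_S_1 B_1^T + B_2 K_S_2 B_2^T ⪯ K_0, and tests this Loewner-order condition via eigenvalues:

```python
import numpy as np

def split_feasible(K01, K02, B1, B2, KS1, KS2, K0, tol=1e-10):
    """Check the assumed helper power constraint
    K01 + K02 + B1 KS1 B1^T + B2 KS2 B2^T <= K0 (Loewner order)
    for a helper signal of the form X0 = X01 + X02 + B1*S1 + B2*S2."""
    used = K01 + K02 + B1 @ KS1 @ B1.T + B2 @ KS2 @ B2.T
    return bool(np.all(np.linalg.eigvalsh(K0 - used) >= -tol))

I = np.eye(2)
# Budget K0 = 6I, states of covariance 4I: 1 + 1.5 + 1 + 1 = 4.5 <= 6 is fine,
# but 3 + 3 + 1 + 1 = 8 > 6 is not.
assert split_feasible(1.0 * I, 1.5 * I, 0.5 * I, 0.5 * I, 4 * I, 4 * I, 6 * I)
assert not split_feasible(3.0 * I, 3.0 * I, 0.5 * I, 0.5 * I, 4 * I, 4 * I, 6 * I)
```

The split makes the tension explicit: power devoted to directly subtracting one state (B_1, B_2) or to one dirty-paper part (K_01, K_02) is unavailable to the others.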

Capacity Region Characterization
In this section, we characterize segments of the capacity region boundary for various channel parameters, using the inner and outer bounds derived in Section 4.2. Consider the inner bounds in (30a)-(30b). Each bound has two terms in the argument of the min. We optimize each term independently, compare it to the outer bounds in (25), and, as a last step, state the conditions under which those terms are valid. Our technique is to choose (A_11, A_12, A_20, A_21, A_22) so as to cancel the respective interfering terms in the mutual information quantities. We explain how those matrices are chosen in Appendix F.
We begin by considering which choice of (A_11, A_12) maximizes f_1(Ā_1, B̄, K_01, K_02). With the maximizing choice (A_11^a, A_12^a, B_1^a, B_2^a), the resulting f_1(A_11^a, A_12^a, B_1^a, B_2^a, K_0, 0) meets the outer bound (the first term in the min in (25)) with B_1 K_S_1 = Σ_X0S_1 and B_2 K_S_2 = Σ_X0S_2. Furthermore, by choosing K_02 = 0 and an appropriate parameter setting, R_1 = (1/2) log |K_1 + I| is achievable, which meets the outer bound (the second term in the min in (25)). Next, we consider the bound on R_2. With the analogous choice Ā_2^b, f_2(Ā_2^b, B̄, 0, K_0) meets the outer bound (the first term in the min in (25)). Furthermore, if g_2(Ā_2^b, B̄, K_01, K_02) ≤ f_2(Ā_2^b, B̄, K_01, K_02), then R_2 = (1/2) log(|K_2 + I|) is achievable, and this meets the outer bound. This also equals the maximum rate for R_2 when the channel is not corrupted by the state.
Summarizing the above analysis, we obtain the following characterization of segments of the capacity region boundary. Theorem 3. The channel parameters (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) can be partitioned into the sets A 1 , B 1 , C 1 . If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ A 1 , then R 1 = f 1 (Ā a 1 , B̄, K 0 , 0) captures one segment of the capacity region boundary, on which the state cannot be fully canceled. If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ C 1 , then R 1 = (1/2) log |K 1 + I| captures one segment of the capacity region boundary, on which the state is fully canceled. If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ B 1 , then the R 1 segment of the capacity region boundary is not characterized.
The channel parameters (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) can also be partitioned into the sets A 2 , B 2 , C 2 . If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ A 2 , then R 2 = f 2 (Ā b 2 , B̄, 0, K 0 ) captures one segment of the capacity region boundary, on which the state cannot be fully canceled. If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ C 2 , then R 2 = (1/2) log |K 2 + I| captures one segment of the capacity region boundary, on which the state is fully canceled. If (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ B 2 , then the R 2 segment of the capacity region boundary is not characterized.
The above theorem describes two partitions of the channel parameters, under which segments of the capacity region boundary corresponding to R 1 and R 2 , respectively, can be characterized. The intersection of two sets, one from each partition, collectively characterizes all the corresponding segments of the capacity region boundary.
We note that our inner bound can be tight for some sets of channel parameters. As an example, assume that (G 1 , G 2 , K 0 , K 1 , K 2 , K S 1 , K S 2 ) ∈ C 1 ∩ C 2 . In this case, R 1 = (1/2) log |K 1 + I| and R 2 = (1/2) log |K 2 + I| are achievable. For the point-to-point helper channel [28], it was shown that if the helper power is above some threshold, the state is completely canceled, whereas in our model we have two parallel channels. If the helper power is high enough, the helper can split its signal, similarly to the Gaussian BC, such that one part is intended for Receiver 2, for which dirty paper coding completely eliminates the interference caused by the state and by the part of the signal intended for Receiver 1. At the same time, the part of the helper signal intended for Receiver 1 can only cancel the interference caused by the state, while the part intended for Receiver 2 is treated as noise.
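The power-splitting argument above can be made concrete with a deliberately simplified toy computation. The snippet below is our own illustration, not the paper's bounds: a fraction gamma of the helper power P0 is spent on direct state cancelation toward Receiver 1 (which treats the remaining helper signal as noise), while Receiver 2 is assumed to obtain the full interference-free rate via dirty paper coding.

```python
# Toy numeric illustration (our own, not the paper's exact bounds): the helper
# splits its power P0, gamma*P0 for direct state cancelation toward
# Receiver 1 and (1-gamma)*P0 for dirty-paper coding toward Receiver 2.
import math

def r1_toy(P1, P0, Q, gamma):
    residual = (math.sqrt(Q) - math.sqrt(gamma * P0)) ** 2  # leftover state power
    noise = 1.0 + residual + (1 - gamma) * P0               # unit noise + DPC part
    return 0.5 * math.log2(1 + P1 / noise)

def r2_toy(P2):
    return 0.5 * math.log2(1 + P2)  # DPC removes the interference known to the helper

P1, P2, P0, Q = 2.0, 2.0, 12.0, 4.0
best_r1 = max(r1_toy(P1, P0, Q, g / 100) for g in range(101))
```

Even in this crude model, R 2 reaches its interference-free value while R 1 remains strictly below it, mirroring the asymmetry described above.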

Numerical Results
In this section, we provide specific numerical examples to illustrate the bounds obtained in the previous sections. In particular, we focus on the scalar Gaussian channel setting, such that G 1 ← η 1 ; G 2 ← η 2 ; K 0 ← P 0 ; K 01 ← P 01 ; K 02 ← P 02 ; K 1 ← P 1 ; K 2 ← P 2 ; K S 1 ← Q 1 ; K S 2 ← Q 2 . We also denote (A 11 , A 12 , A 20 , A 21 , A 22 , B 1 , B 2 ) ← (α 11 , α 12 , α 20 , α 21 , α 22 , β 1 , β 2 ). We plot the inner and outer bounds for various values of the helper power P 0 , the channel gains η 1 and η 2 , and different state powers. The results are shown in Figure 12. The outer bound is based on Proposition 4. The inner bound is the convex hull of all the achievable regions, with the roles of the decoders interchanged. The time-sharing inner bound follows the achievable region of the point-to-point helper channel [28]. The scenario where the helper power is less than the users' power is depicted in Figure 12a,b; the channel gains are equal in Figure 12a and mismatched in Figure 12b. Note that in both cases our inner bound outperforms the time-sharing bound, especially in the mismatched case, and some segments of the capacity region are characterized.
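As a minimal sketch of the scalar substitution (assuming unit noise variance; the helperless state-as-noise baseline below is our own illustrative comparison, not a curve from Figure 12):

```python
# Scalar reading of the bounds: matrices reduce to scalars (K_1 <- P_1, etc.),
# so the interference-free corner rates become (1/2)log(1 + P_l).
import math

def corner_rate(P):
    # (1/2) log|K_l + I| in the scalar case: the state fully canceled
    return 0.5 * math.log2(1 + P)

def state_as_noise(P, Q):
    # illustrative helperless baseline: a state of power Q acts as extra noise
    return 0.5 * math.log2(1 + P / (1 + Q))

P1, Q1 = 2.0, 5.0
print(corner_rate(P1), state_as_noise(P1, Q1))
```

The gap between these two scalars is the room in which the helper-based schemes of this section operate.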
The scenario with the helper power being higher than the user power, with matched and mismatched channel gains, is depicted in Figure 12c,d, respectively. Similar to the low helper power regime, our proposed achievability scheme performs better than time-sharing.

Conclusions
In the first part of this paper, we have studied the parallel state-dependent Gaussian channel with a state-cognitive helper and with the same but differently scaled states. An inner bound was derived and compared to an outer bound, and segments of the capacity region boundary were characterized for various channel parameters. We have shown that if the channel gain matrices satisfy a certain symmetry property, the full rectangular capacity region of the two point-to-point channels without the state can be achieved. Furthermore, for the scalar channel case, we have shown that for a given ratio a/b of the state gain to the helper signal gain, one can find a value of the helper power P 0 such that the capacity region is fully characterized.
A different model of the parallel state-dependent Gaussian channel, with a state-cognitive helper and independent states, was considered in the second part of this study. Inner and outer bounds were derived, and segments of the capacity region boundary were characterized for various channel parameters. We have also demonstrated our results via numerical simulation and have shown that our achievability scheme outperforms time-sharing, which was shown to be optimal in the infinite state power regime in [34].
These two models represent special cases of a more general scenario with correlated states, and our results in both studies imply that the more correlated the states are, the easier it is to mitigate the interference. Furthermore, the gap between the inner bound and the outer bound in this work suggests that a new technique for outer bound derivation is needed, as we believe that the inner bound consisting of the pairs (R 1 , R 2 ) = (f 1 (Ā a 1 , B̄, K 01 , K 02 ), f 2 (Ā b 2 , B̄, K 01 , K 02 )) is indeed tight for some sets of channel parameters.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Proof of Proposition 1
Fix the following joint PMF

Appendix A.1. Codebook Generation
Randomly and independently generate 2 nR sequences w n (m), m ∈ I (n) R , each according to ∏ n i=1 P W (w i ). Similarly, for l ∈ {1, 2}, generate 2 nR l sequences x n l (m l ), m l ∈ I (n) R l , each according to ∏ n i=1 P X l (x li ). These sequences constitute the codebook, which is revealed to the encoders and the decoders. Let ε > 0 and l ∈ {1, 2}. Upon receiving y n l , the decoder at Receiver l declares that m̂ l ∈ I (n) R l was sent if it is the unique message such that (w n (m̂), x n l (m̂ l ), y n l ) ∈ T (n) ε (P W X l Y l ) for some m̂ ∈ I (n) R ; otherwise it declares an error.
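The codebook generation and joint typicality decoding described above can be mimicked in a few lines. The following toy simulation is our own, over a binary symmetric channel rather than the paper's Gaussian model, and uses an empirical-frequency typicality test.

```python
# Hedged toy version of the random-coding scheme of Appendix A on a BSC
# (our own illustrative channel): generate a random codebook and decode by
# checking empirical joint typicality of (codeword, output) pairs.
import random

random.seed(0)
n, M, p, eps = 100, 32, 0.05, 0.06   # blocklength, |codebook| = 2^{nR}, BSC(p)
codebook = [[random.randint(0, 1) for _ in range(n)] for _ in range(M)]

def channel(x):
    # BSC: flip each bit independently with probability p
    return [b ^ (random.random() < p) for b in x]

def typical(x, y):
    # empirical crossover fraction should be within eps of p
    frac = sum(a != b for a, b in zip(x, y)) / len(x)
    return abs(frac - p) <= eps

y = channel(codebook[3])
# with high probability the transmitted codeword is the unique typical one
candidates = [m for m in range(M) if typical(codebook[m], y)]
```

At rates below capacity, the packing-lemma argument of Appendix A.4 is exactly the statement that `candidates` is a singleton with high probability.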

Appendix A.4. Analysis of the Probability of Error
The encoder at the helper declares an error if the following event occurs. By the covering lemma (Section 2.3), setting the original random variables (U, X, X̂) as (∅, S, W), respectively, and A = I (n) R , the probability of this event tends to zero as n → ∞ if condition (A3) holds. Assume without loss of generality that (M 1 , M 2 ) = (1, 1) and that condition (A3) holds, and let M̂ denote the index of the chosen w n sequence for s n . The decoder at Receiver l makes an error only if one or more of the following events occur. Thus, by the union of events bound, the probability of error is upper bounded by the sum of their probabilities. By the LLN, the first term P{E l1 } tends to zero as n → ∞. For the second term, note that for m l ≠ 1, the sequence x n l (m l ) is independent of the pair (w n (M̂), y n l ). Hence, by the packing lemma, choosing the original random variables (U, X, Y) as (∅, X l , (W, Y l )), respectively, and A = I (n) R l , P{E l2 } tends to zero as n → ∞ if R l < I(X l ; Y l , W). Since X l and W are independent, this condition is equivalent to R l < I(X l ; Y l |W). Finally, for the third term, note that for m l ≠ 1 and m̂ ≠ M̂, the triple (w n (m̂), x n l (m l ), y n l ) is distributed according to ∏ n i=1 p(x li (m l ))p(w i (m̂))p(y li ).
Again, by the packing lemma, with the original random variables (U, X, Y) chosen as (∅, (W, X l ), Y l ), respectively, and A = I (n) R l , the probability of the third event tends to zero as n → ∞.
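Both the covering and packing lemmas above compare code rates against mutual information quantities such as I(U; S) and I(X l ; Y l |W). As a small self-contained sketch (the joint pmf below is made up purely for illustration), mutual information can be computed directly from a joint distribution:

```python
# Sketch accompanying the covering/packing lemma steps: the lemmas compare
# code rates against mutual information, computed here for a toy joint pmf.
import math

def mutual_information(pxy):
    """I(X;Y) in bits for a joint pmf given as a matrix of probabilities."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, row in enumerate(pxy)
               for j, p in enumerate(row) if p > 0)

p_us = [[0.4, 0.1],
        [0.1, 0.4]]                 # illustrative joint pmf of (U, S)
i_us = mutual_information(p_us)     # covering requires a bin rate above I(U;S)
```

For an independent pair the function returns zero, consistent with the fact that no binning overhead is needed when U carries no information about S.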

Appendix B. Proof of Proposition 3
We prove the bound for a general l ∈ {1, 2}. By Fano's inequality (Lemma 2), where ε n tends to zero as n → ∞ by the assumption that lim n→∞ P (n) e = 0.

Now consider
It remains to show that Σ X 0 S K −1 S Σ T X 0 S ⪯ K 0 . We use the positive semidefiniteness of the covariance matrix of the vector (X 0 , S) T : since this covariance matrix is positive semidefinite, so is its Schur complement K 0 − Σ X 0 S K −1 S Σ T X 0 S . Rearranging terms yields Σ X 0 S K −1 S Σ T X 0 S ⪯ K 0 , which completes the proof of Proposition 3.
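The Schur complement step can be sanity-checked numerically. In the scalar case the claim reduces to cov(X 0 , S)²/var(S) ≤ var(X 0 ), which the following sketch (our own, with randomly drawn illustrative values) verifies:

```python
# Scalar numeric check of Sigma_{X0 S} K_S^{-1} Sigma_{X0 S}^T <= K_0:
# for any valid covariance of (X_0, S), the Schur complement
# k0 - sigma^2 / ks is nonnegative because |rho| <= 1.
import random

random.seed(1)
for _ in range(1000):
    k0 = random.uniform(0.1, 10.0)       # var(X_0)
    ks = random.uniform(0.1, 10.0)       # var(S)
    rho = random.uniform(-1.0, 1.0)      # correlation coefficient
    sigma = rho * (k0 * ks) ** 0.5       # cov(X_0, S)
    assert sigma ** 2 / ks <= k0 + 1e-12
```

The same argument in matrix form is exactly the positive semidefiniteness of the Schur complement used above.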

Appendix C. Optimal Coefficients for the MIMO Gaussian with Differently Scaled States Channel
We first consider the bound on R 1 . Consider the first argument in the min of (7a). It is straightforward to evaluate the first terms and to show that h(W|S, X 1 ) = h(X 0 ).
As for the third term, denote Ỹ 1 = Y 1 − X 1 . We require that the term S in the argument of the differential entropy be completely canceled; therefore we choose A accordingly. With the above choice of A, we finally require that M W|Ỹ 1 be the MMSE estimation matrix of X 0 given G 1 X 0 + Z 1 .
In this case, we would like to obtain a condition under which the resulting rate meets the bound; after rearranging terms, this yields the stated condition. The choices of A c and A d for the achievability proof of R 2 follow by similar steps, interchanging the indices 1 ↔ 2.
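The role of M W|Ỹ 1 as an MMSE estimation matrix can be illustrated in the scalar case. The snippet below is our own sketch with illustrative values K0 and G, assuming unit-variance noise: it computes the linear MMSE coefficient of X 0 given G·X 0 + Z and checks the orthogonality principle.

```python
# Scalar sketch of the MMSE coefficient behind M_{W|Ytilde_1}: for
# Ytilde = G*X0 + Z with unit-variance Z, the linear MMSE estimate of X0 is
# M*Ytilde with M = K0*G / (G^2*K0 + 1), and the error is orthogonal to Ytilde.
K0, G = 3.0, 0.7                       # illustrative var(X0) and gain
M = K0 * G / (G ** 2 * K0 + 1.0)

# orthogonality: E[(X0 - M*Ytilde) * Ytilde] = G*K0 - M*(G^2*K0 + 1) = 0
cross = G * K0 - M * (G ** 2 * K0 + 1.0)
# resulting error variance: E[(X0 - M*Ytilde)^2] = K0 / (G^2*K0 + 1)
mmse = K0 - M * G * K0
```

In the matrix setting of this appendix the same two identities hold with matrix inverses in place of scalar division.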
where in the last equality we used the definition of Σ X 0 S from (26). Consequently, we have established an upper bound on I(S n ; Y n 1 |M 1 ). Given s n , find r̃ such that (s n , u n (r̃)) ∈ T (n) ε (P SU ); if there is more than one such r̃, choose the smallest one. If no such r̃ can be found, declare an error. Next, given s n and u n (r̃), find t̃ such that (s n , u n (r̃), v n (t̃)) ∈ T (n) ε (P SUV ); if there is more than one such t̃, choose the smallest one. If no such t̃ can be found, declare an error. Then, given s n , u n (r̃) and v n (t̃), generate x n 0 with i.i.d. components according to ∏ n i=1 P X 0 |SUV (x 0i |s i , u i , v i ). Let (m 1 , m 2 ) be the messages to be sent. The encoder at Transmitter l transmits x n l (m l ).
Again, by the union of events bound, the probability that the decoder at Receiver 1 makes an error can be upper bounded by the sum of the probabilities of the individual events. We have already shown that Pr(E 01 ) tends to zero as n → ∞ if R̃ U > I(U; S) + δ(ε). Next, note that E c 01 = {(S n , U n (r 0 )) ∈ T (n) ε (P SU )} = {(S n , U n (r 0 ), X n 0 ) ∈ T (n) ε (P SUX 0 )}, and that, given these sequences, y n 1 is distributed according to the memoryless channel law P Y n 1 |S n U n (r 0 )X n 0 X n 1 (1) . Hence, by the conditional typicality lemma, Pr(E c 01 ∩ E 11 ) tends to zero as n → ∞. As for the probability of the event E c 01 ∩ E 12 , X n 1 (m 1 ) is independent of (U n (r 0 ), Y n 1 ) ∼ ∏ n i=1 P UY 1 (u i , y 1i ). Hence, by the packing lemma, with U ← ∅, X ← X 1 , Y ← (U, Y 1 ) and A = [2 : 2 nR 1 ], Pr(E c 01 ∩ E 12 ) tends to zero as n → ∞ if R 1 < I(X 1 ; U, Y 1 ) − δ(ε). Since X 1 and U are mutually independent, the latter condition is equivalent to R 1 < I(X 1 ; Y 1 |U) − δ(ε).
This completes the proof of achievability.

Appendix F. Optimal Coefficients for the MIMO Gaussian with Independent States Channel
We first consider the bound on R 1 . Consider the first argument in the min of (28a): I(U, X 1 ; Y 1 ) − I(U; S) = I(X 1 ; Y 1 ) + I(U; Y 1 |X 1 ) − I(U; S|X 1 ). It is straightforward to evaluate the first terms and to show that h(U|S, X 1 ) = h(X 01 ) = (1/2) log((2πe) t |K 01 |).
As for the third term in (A11), denote Ỹ 1 = Y 1 − X 1 . We require that the terms S 1 and S 2 in the argument of the differential entropy be completely canceled; hence we choose (A a 11 , A a 12 ) accordingly. With the above choice of (A 11 , A 12 ), we have h(U|X 1 , Y 1 ) = h(X 01 − M U|Ỹ 1 (G 1 X 01 + X 02 + Z 1 )).
It is straightforward to evaluate the first terms and to show that h(V|U, S, X 2 ) = h(X 02 ). As for the third term in (A12), denote Ỹ 2 = Y 2 − X 2 . We require that the terms X 01 , S 1 and S 2 in the argument of the differential entropy be completely canceled; hence we choose A a 20 = M V|Ỹ 2 G 2 , A a 21 = M V|Ỹ 2 G 2 B 1 , A a 22 = M V|Ỹ 2 (G 2 B 2 + I).
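The cancelation enforced by this choice can be verified symbolically in the scalar case. The following sketch is our own check, assuming the scalar reading Y 2 = X 2 + G 2 X 0 + S 2 + Z 2 with X 0 = X 01 + X 02 + B 1 S 1 + B 2 S 2 : with the stated choice of coefficients, the terms X 01 , S 1 and S 2 vanish from V − M(Y 2 − X 2 ).

```python
# Scalar sanity check (our own, under the assumed scalar channel
# Y2 = X2 + G2*X0 + S2 + Z2, X0 = X01 + X02 + B1*S1 + B2*S2): with
# A20 = M*G2, A21 = M*G2*B1, A22 = M*(G2*B2 + 1), the coefficients of
# X01, S1 and S2 in V - M*(Y2 - X2) are identically zero.
M, G2, B1, B2 = 0.4, 1.3, 0.6, -0.5      # illustrative scalars
A20, A21, A22 = M * G2, M * G2 * B1, M * (G2 * B2 + 1.0)

coef_x01 = A20 - M * G2                  # coefficient of X01 after subtraction
coef_s1  = A21 - M * G2 * B1             # coefficient of S1
coef_s2  = A22 - M * (G2 * B2 + 1.0)     # coefficient of S2
```

Only X 02 and the noise survive, which is exactly what makes h(V|U, X 2 , Y 2 ) state-free in the derivation above.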