Robust Signaling for Bursty Interference

This paper studies a bursty interference channel, where the presence/absence of interference is modeled by a block-i.i.d. Bernoulli process that stays constant for a duration of T symbols (referred to as coherence block) and then changes independently to a new state. We consider both a quasi-static setup, where the interference state remains constant during the whole transmission of the codeword, and an ergodic setup, where a codeword spans several coherence blocks. For the quasi-static setup, we study the largest rate of a coding strategy that provides reliable communication at a basic rate and allows an increased (opportunistic) rate when there is no interference. For the ergodic setup, we study the largest achievable rate. We study how non-causal knowledge of the interference state, referred to as channel-state information (CSI), affects the achievable rates. We derive converse and achievability bounds for (i) local CSI at the receiver side only; (ii) local CSI at the transmitter and receiver side; and (iii) global CSI at all nodes. Our bounds allow us to identify when interference burstiness is beneficial and in which scenarios global CSI outperforms local CSI. The joint treatment of the quasi-static and ergodic setup further allows for a thorough comparison of these two setups.


Introduction
Interference is a key limiting factor for the efficient use of the spectrum in modern wireless networks. It is, therefore, not surprising that the interference channel (IC) has been studied extensively in the past; see, e.g., [1] (Chapter 6) and references therein. Most of the information-theoretic work developed for the IC assumes that interference is always present. However, certain physical phenomena, such as shadowing, can make the presence of interference intermittent or bursty. Interference can also be bursty due to the bursty nature of data traffic, distributed medium access control mechanisms, and decentralized networking protocols. For this reason, there has been an increasing interest in understanding and exploring the effects of burstiness of interference.
Seminal works in this area were performed by Khude et al. in [2] for the Gaussian channel and in [3] by using a model which corresponds to an approximation to the two-user Gaussian IC. They tried to harness the burstiness of the interference by taking advantage of the time instants when the interference is not present to send opportunistic data. Specifically, [2,3] considered a channel model where the interference state stays constant during the transmission of the entire codeword, which corresponds to a quasi-static channel. Motivated by the idea of degraded message sets by Körner and Marton [4], Khude et al. studied the largest rate of a coding strategy that provides reliable communication at a basic rate R and allows an increased (opportunistic) rate R + ∆R when there is no interference. The idea of opportunism was also used by Diggavi and Tse [5] for the quasi-static flat fading channel and, recently, by Yi and Sun [6] for the K-user IC with states.
Wang et al. [7] modeled the presence of interference using an independent and identically distributed (i.i.d.) Bernoulli process that indicates whether interference is present or not, which corresponds to an ergodic channel. They further assume that the interference links are fully correlated. Wang et al. mainly studied the effect of causal feedback under this model, but also presented converse bounds for the non-feedback case. Mishra et al. considered the generalization of this model to multicarrier systems, modeled as parallel two-user bursty ICs, for the feedback [8] and non-feedback case [9].
The bursty IC is related to the binary fading IC, for which the four channel coefficients are in the binary field {0, 1} according to some Bernoulli distribution. Note, however, that neither of the two models is a special case of the other. While a zero channel coefficient of the cross link corresponds to intermittence of interference, the bursty IC allows for non-binary signals. Conversely, in contrast to the binary fading IC, the direct links in the bursty IC cannot be zero, since only the interference can be intermittent. Vahid et al. [10][11][12][13][14] studied the capacity region of the binary fading IC. Specifically, [11,14] study the capacity region of the binary fading IC when the transmitters do not have access to the channel coefficients, and [12] studies the capacity region when the transmitters have access to the past channel coefficients. Vahid and Calderbank additionally study the effect on the capacity region when certain correlation is available to all nodes as side information [13].
The focus of the works by Khude et al. [3] and Wang et al. [7] was on the linear deterministic model (LDM), which was first introduced by Avestimehr [15], but falls within the class of more general deterministic channels whose capacity was obtained by El Gamal and Costa in [16]. The LDM maps the Gaussian IC to a channel whose outputs are deterministic functions of their inputs. Bresler and Tse demonstrated in [17] that the generalized degrees of freedom (first-order capacity approximation) of the two-user Gaussian IC coincides with the normalized capacity of the corresponding deterministic channel. The LDM thus offers insights on the Gaussian IC.

Contributions
In this work, we consider the LDM of a bursty IC. We study how interference burstiness and the knowledge of the interference states (referred to throughout as channel-state information (CSI)) affect the capacity of this channel. We point out that this CSI is different from the one sometimes considered in the analysis of ICs (see, e.g., [18]), where CSI refers to knowledge of the channel coefficients. (In this regard, we assume that all transmitters and receivers have access to the channel coefficients.) For the sake of compactness, we focus on non-causal CSI and leave other CSI scenarios, such as causal or delayed CSI, for future work.
We consider the following cases: (i) only the receivers know the corresponding interference state (local CSIR); (ii) transmitters and receivers know their corresponding interference states (local CSIRT); and (iii) both transmitters and receivers know all interference states (global CSIRT). For each CSI level we consider both (i) the quasi-static channel and (ii) the ergodic channel. Specifically, in the quasi-static channel the interference is present or absent during the whole message transmission and we harness the realizations when the channel experiences better conditions (no presence of interference) to send extra messages. In the ergodic channel the presence/absence of interference is modeled as a Bernoulli random variable which determines the interference state. The interference state stays constant for a certain coherence time T and then changes independently to a new state. This model includes the i.i.d. model by Wang et al. as a special case, but also allows for scenarios where the interference state changes more slowly. Note, however, that when the receivers know the interference state (as we shall assume in this work), then the capacity of this model becomes independent of T and coincides with that of the i.i.d. model. The proposed analysis is performed for the two extreme cases where the states of each of the interfering links are independent, and where states of the interfering links are fully correlated. Hence we unify the scenarios already treated in the literature [2,3,7]. Nevertheless, some of our presented results can be extended to consider an arbitrary correlation between the interfering states. The works by Vahid and Calderbank [13] and Yeh and Wang [19] characterize the capacity region of the two-user binary IC and the MIMO X-channel, respectively. While [13,19] consider a general spatial correlation between communication and interfering links, they do not consider the correlation between interfering links.
Our analysis shows that, for both the quasi-static and ergodic channels, for all interference regions except the very strong interference region, global CSIRT outperforms local CSIR/CSIRT. This result does not depend on the correlation between the states of the interfering links. For local CSIR/CSIRT and the quasi-static scenario, the burstiness of the channel is of benefit only in the very weak and weak interference regions. For the ergodic case and local CSIR, interference burstiness is only of clear benefit if the interference is either weak or very weak, or if it is present at most half of the time. This is in contrast to local CSIRT, where interference burstiness is beneficial in all interference regions.
Specific contributions of our paper include:

• A joint treatment of the quasi-static and the ergodic model: Previous literature on the bursty IC considers either the quasi-static model or the ergodic model. Furthermore, due to space constraints, the proofs of some of the existing results were either omitted or contain few details. In contrast, our paper discusses both models, allowing for a thorough comparison between the two.

• Novel achievability and converse bounds: For the ergodic model, the achievability bounds for local CSIRT, and the achievability and converse bounds for global CSIRT, are novel. In particular, novel achievability strategies are proposed that exploit certain synchronization between the users. To keep the paper self-contained, we further present the proof of the achievability bound for local CSIR that has appeared in the literature without proof.

• Novel converse proofs for the quasi-static model: In contrast to existing converse bounds, which are based on Fano's inequality, our proofs of the converse bounds for the rates of the worst-case and opportunistic messages are based on an information density approach (more precisely, they are based on the Verdú-Han lemma). This approach not only allows for rigorous yet clear proofs, but it would also enable a more refined analysis of the probabilities that worst-case and opportunistic messages can be decoded correctly.

• A thorough comparison of the sum capacity of various scenarios: Inter alia, the obtained results are used to study the advantage of featuring different levels of CSI, the impact of the burstiness of the interference, and the effect of the correlation between the channel states of both users.
The rest of this paper is organized as follows. Section 2 introduces the system model, where we define the bursty IC quasi-static setup, the ergodic setup, and briefly summarize previous results on the non-bursty IC. In Sections 3-5 we present our results for local CSIR, local CSIRT and global CSIRT, respectively. Section 6 studies the impact of featuring different CSI levels. Section 7 analyzes in which scenarios exploiting burstiness of interference is beneficial. Section 8 concludes the paper with a summary of the results. Most proofs of the presented results are deferred to the appendix.

Notation
To differentiate between scalars, vectors, and matrices we use different fonts: scalar random variables and their realizations are denoted by upper and lower case letters, respectively, e.g., B, b; vectors are denoted using bold face, e.g., X, x; random matrices are denoted via a special font, e.g., X; and for deterministic matrices we shall use yet another font, e.g., S. For sets we use the calligraphic font, e.g., S. We denote sequences such as A_{i,1}, ..., A_{i,M} by A_i^M. We define (x)^+ ≜ max{0, x}. We use F_2 to denote the binary Galois field and ⊕ to denote modulo-2 addition. Let the down-shift matrix S ∈ F_2^{q×q}, a matrix of dimension q × q, be defined as

S ≜ [ 0_{q−1}ᵀ  0 ; I_{q−1}  0_{q−1} ],

where 0_u denotes the all-zero vector of dimension u and I_u ∈ F_2^{u×u} the identity matrix.
Similarly, we define the matrix L_d ∈ F_2^{q×q} of dimension q × q that selects the d lowest components of a vector of dimension q:

L_d ≜ [ 0_{(q−d)×(q−d)}  0 ; 0  I_d ].

We shall denote by H_b(p) the entropy of a binary random variable X with probability mass function (p, 1 − p), i.e.,

H_b(p) ≜ −p log₂ p − (1 − p) log₂(1 − p).

Similarly, we denote by H_sum(p, q) the entropy H(X ⊕ X̃), where X and X̃ are two independent binary random variables with probability mass functions (p, 1 − p) and (q, 1 − q), respectively:

H_sum(p, q) ≜ H_b(p(1 − q) + (1 − p)q).

For this function it holds that H_sum(p, q) = H_sum(1 − p, q) = H_sum(p, 1 − q) = H_sum(1 − p, 1 − q). Finally, 1(·) denotes the indicator function, i.e., 1(statement) is 1 if the statement is true and 0 if it is false.
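In code, the two entropy functions defined above can be sketched as follows (a Python sketch; the function names are ours):

```python
import math

def H_b(p: float) -> float:
    """Binary entropy H_b(p) = -p log2(p) - (1-p) log2(1-p), with H_b(0) = H_b(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def H_sum(p: float, q: float) -> float:
    """Entropy of X xor X~ for independent X ~ Bern(p), X~ ~ Bern(q).
    Since X xor X~ ~ Bern(p(1-q) + (1-p)q), this is H_b of that parameter."""
    return H_b(p * (1 - q) + (1 - p) * q)
```

The symmetry properties of H_sum stated above follow because swapping p with 1 − p (or q with 1 − q) maps the Bernoulli parameter a to 1 − a, and H_b(a) = H_b(1 − a).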

System Model
Our analysis is based on the LDM, introduced by Avestimehr et al. [15] for relay networks. This model is, on the one hand, simple to analyze and, on the other hand, captures the essential structure of the Gaussian channel in the high signal-to-noise ratio regime.
We consider a bursty IC where (i) the interference state remains constant during the whole transmission of the codeword of length N (quasi-static setup) or (ii) the interference state remains constant for a duration of T consecutive symbols and then changes independently to a new state (ergodic setup). For one coherence block, the two-user bursty IC is depicted in Figure 1, where n_d and n_c are the channel gains of the direct and cross links, respectively. We assume that n_d and n_c are known to both the transmitters and receivers and remain constant during the whole transmission of the codeword. For simplicity, we shall assume that n_d and n_c are equal for both users. Nevertheless, most of our results generalize to the asymmetric case. More precisely, all converse and achievability bounds generalize to the asymmetric case, while the direct generalization of the proposed achievability schemes may be loose in some asymmetric regions. For the k-th block, the input-output relation of the channel is given by

Y_{1,k} = S^{q−n_d} X_{1,k} ⊕ B_{1,k} S^{q−n_c} X_{2,k},   (3)
Y_{2,k} = S^{q−n_d} X_{2,k} ⊕ B_{2,k} S^{q−n_c} X_{1,k},   (4)

where q ≜ max{n_d, n_c}. In (3) and (4), X_{i,k} and Y_{i,k} denote the channel input and output of user i in block k. The interference states B_{i,k}, i = 1, 2, k = 1, ..., K, are sequences of i.i.d. Bernoulli random variables with activation probability p.
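One channel use of this model can be sketched numerically. This is a minimal illustration assuming the standard LDM form Y_i = S^{q−n_d} X_i ⊕ B_i S^{q−n_c} X_j, with S the down-shift matrix and inputs treated as length-q binary column vectors (function names are ours):

```python
import numpy as np

def down_shift(q: int) -> np.ndarray:
    """q x q down-shift matrix S over F_2: (S x)_i = x_{i-1}, (S x)_0 = 0."""
    S = np.zeros((q, q), dtype=np.uint8)
    for i in range(1, q):
        S[i, i - 1] = 1
    return S

def ldm_output(x_i, x_j, b_i, n_d, n_c):
    """One channel use of the bursty linear deterministic IC:
    Y_i = S^(q - n_d) X_i  xor  B_i * S^(q - n_c) X_j, arithmetic over F_2."""
    q = max(n_d, n_c)
    S = down_shift(q)
    Sd = np.linalg.matrix_power(S, q - n_d) % 2  # shift applied to the direct link
    Sc = np.linalg.matrix_power(S, q - n_c) % 2  # shift applied to the cross link
    return (Sd @ x_i + b_i * (Sc @ x_j)) % 2
```

For n_d = 3 and n_c = 2, the cross signal is shifted down by one level, so the interferer only corrupts the receiver's lower levels when B_i = 1.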
Regarding the sequences B K 1 and B K 2 , we consider two cases: (i) B K 1 and B K 2 are independent of each other and (ii) B K 1 and B K 2 are fully correlated sequences, i.e., B K 1 = B K 2 . For both cases we assume that the sequences are independent of the messages W 1 and W 2 .
We shall define the normalized interference level as α ≜ n_c/n_d, based on which we can divide the interference into the following regions (a similar division was used by Jafar and Vishwanath [20]): very weak interference (VWI) for α ≤ 1/2, weak interference (WI) for 1/2 < α ≤ 2/3, moderate interference (MI) for 2/3 < α ≤ 1, strong interference (SI) for 1 < α ≤ 2, and very strong interference (VSI) for 2 < α.
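For quick reference, this division can be encoded as a small helper. The thresholds below (1/2, 2/3, 1, 2) follow the standard division of [20]; the placement of the boundary points themselves is a convention:

```python
def interference_region(n_d: int, n_c: int) -> str:
    """Classify the normalized interference level alpha = n_c / n_d
    into the interference regions used throughout the paper."""
    alpha = n_c / n_d
    if alpha <= 1 / 2:
        return "VWI"   # very weak interference
    if alpha <= 2 / 3:
        return "WI"    # weak interference
    if alpha <= 1:
        return "MI"    # moderate interference
    if alpha <= 2:
        return "SI"    # strong interference
    return "VSI"       # very strong interference
```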

Quasi-Static Channel
The channel defined in (3) and (4) may experience a slowly-varying change of the interference state. In this case, the duration of each of the transmitted codewords of length N = KT is smaller than the coherence time T of the channel and the interference state stays constant over the duration of each codeword, i.e., K = 1, T = N. In the wireless communications literature such a channel is usually referred to as a quasi-static channel [21] (Section 5.4.1). In this scenario, the pair of achievable rates (R_1, R_2) is dominated by the worst case, which corresponds to the presence of interference at both receivers. However, in the absence of interference, it is possible to communicate at a higher data rate, so planning a system for the worst case may be too pessimistic. Assuming that the receivers have access to the interference states, the transmitters could send opportunistic messages that are decoded only if the interference is absent, in addition to the regular messages that are decoded irrespective of the interference state. We make the notion of opportunistic messages and rates precise in the subsequent paragraphs.
Let U_{i,k} indicate the level of CSI available at the transmitter side in coherence block k, and let V_{i,k} indicate the level of CSI at the receiver side in coherence block k:

1. local CSIR: U_{i,k} = ∅ and V_{i,k} = B_i;
2. local CSIRT: U_{i,k} = V_{i,k} = B_i;
3. global CSIRT: U_{i,k} = V_{i,k} = (B_1, B_2).

We define the set of opportunistic messages according to the level of CSI at the receiver. Then, we define an opportunistic code as follows.
To better distinguish the rates (R 1 , R 2 ) from the opportunistic rates {∆R i (·)}, i = 1, 2, we shall refer to (R 1 , R 2 ) as worst-case rates, because the corresponding messages can be decoded even if the channel is in its worst state (see also Definition 2).
A rate tuple is achievable if the error probability Pr{(Ŵ_1, Ŵ_2) ≠ (W_1, W_2)} vanishes as the blocklength N tends to infinity. The capacity region is the closure of the set of achievable rate tuples [1] (Section 6.1). We define the worst-case sum rate as R ≜ R_1 + R_2 and the opportunistic sum rate as ∆R(V_1, V_2) ≜ ∆R_1(V_1) + ∆R_2(V_2). The worst-case sum capacity C is the supremum of all achievable worst-case sum rates, the opportunistic sum capacity ∆C(V_1, V_2) is the supremum of all opportunistic sum rates, and the total sum capacity is defined as C + ∆C(V_1, V_2). Note that the opportunistic sum capacity depends on the worst-case sum rate.

Remark 1.
The worst-case sum rate and opportunistic sum rates in the quasi-static setting depend only on the collection of possible interference states: for independent interference states we have B ∈ {00, 01, 10, 11}, and for fully correlated interference states we have B ∈ {00, 11}. In principle, our proof techniques could also be applied to analyze other collections of interference states.

Remark 2.
In the CSIRT setting the transmitters have access to the interference state. Therefore, in this setting the messages are strictly speaking not opportunistic. Instead, transmitters can adapt their rate based on the state of the interference links, which is sometimes referred to as rate adaptation in the literature.

Ergodic Channel
In this setup, we shall restrict ourselves to codes whose blocklength N is an integer multiple of the coherence time T. A codeword of length N = KT thus spans K independent channel realizations.
Definition 3 (Code for the bursty IC). A (K, T, R_1, R_2) code for the bursty IC is defined as:

1. two independent messages W_1 and W_2 uniformly distributed over the message sets W_i ≜ {1, 2, ..., 2^{KTR_i}}, i = 1, 2;

2. two encoders, where encoder i maps the message W_i and the available CSI U_i^K to a codeword X_i^K;
3. two decoders, where decoder i maps the channel outputs Y_i^K and the available CSI V_i^K to a decoded message Ŵ_i.

Here Ŵ_i denotes the decoded message, and U_i^K and V_i^K indicate the level of CSI at the transmitter and receiver side, respectively, which are defined as for the quasi-static channel in Section 2.1.

Definition 4 (Ergodic achievable rates).
A rate pair (R_1, R_2) is achievable for a fixed T if there exists a sequence of (K, T, R_1, R_2) codes (parametrized by K) such that Pr{(Ŵ_1, Ŵ_2) ≠ (W_1, W_2)} → 0 as K → ∞. The capacity region is the closure of the set of achievable rate pairs. We define the sum rate as R ≜ R_1 + R_2; the sum capacity C is the supremum of all achievable sum rates.

The Sum Capacities of the Non-Bursty and the Quasi-Static Bursty IC
When the activation probability p is 1, we recover in both the ergodic and quasi-static scenarios the deterministic IC. For a general deterministic IC the capacity region was obtained in [16] (Theorem 1) and then by Bresler and Tse in [17] for a specific deterministic IC. For completeness, we present the sum capacity region for the deterministic non-bursty IC in the following theorem.

Theorem 1. The sum capacity region of the two-user deterministic IC is equal to the union of the set of all sum rates R satisfying

R ≤ 2n_d,   (9)
R ≤ max{n_d, n_c} + (n_d − n_c)^+,   (10)
R ≤ 2 max{n_d − n_c, n_c}.   (11)

Proof. The proof is given in [16] (Section II). For the achievability bounds, El Gamal and Costa [16] (Theorem 1) use the Han-Kobayashi scheme [22] for a general IC. Bresler and Tse [17] (Section 4) use a specific Han-Kobayashi strategy for the special case of the LDM. Jafar and Vishwanath [20] present an alternative achievability scheme for the K-user IC, which, particularized for the two-user IC, will be referenced in this work.
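As a numerical sanity check, the sum capacity of the symmetric deterministic IC can be evaluated as a minimum of three bounds. The compact min-form below is our rewriting of the well-known "W-curve" for the symmetric LDM IC and should be read as a sketch:

```python
def sum_capacity_det_ic(n_d: int, n_c: int) -> int:
    """Sum capacity of the symmetric two-user deterministic (non-bursty) IC,
    written as a min of three sum-rate bounds; traces the 'W-curve' in
    alpha = n_c / n_d."""
    return min(
        2 * n_d,                               # interference-free upper bound
        max(n_d, n_c) + max(n_d - n_c, 0),     # equals 2 n_d - n_c for alpha <= 1
        2 * max(n_d - n_c, n_c),               # binding in the VWI/WI regions
    )
```

Evaluating this for n_d = 4 reproduces the familiar behavior: the sum capacity decreases from 2n_d toward 2n_d − n_c as α grows from 0 to 1, then increases again as n_c for 1 < α ≤ 2, and saturates at 2n_d for α > 2.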
We can achieve the sum rates (9) and (11) over the quasi-static channel by treating the bursty IC as a non-bursty IC. The following theorem demonstrates that this is the largest achievable worst-case sum rate irrespective of the availability of CSI and the correlation between B 1 and B 2 .
Theorem 2 (Sum capacity for the quasi-static bursty IC). For 0 ≤ p ≤ 1, the worst-case sum capacity of the bursty IC is equal to the supremum of the set of sum rates R satisfying

R ≤ 2n_d,   (12)
R ≤ max{n_d, n_c} + (n_d − n_c)^+,   (13)
R ≤ 2 max{n_d − n_c, n_c}.   (14)

Proof. The converse bounds are proved in Appendix A.1. Achievability follows directly from Theorem 1 by treating the bursty IC as a non-bursty IC.
Theorem 2 shows that the worst-case sum capacity does not depend on the level of CSI available at the transmitter and receiver side. However, this is not the case for the opportunistic rates as we will see in the next sections.

Remark 3.
In principle, one could reduce the worst-case rates in order to increase the opportunistic rates. However, it turns out that such a strategy is not beneficial in terms of total rates R i + ∆R i (V i ), i = 1, 2. In other words, setting ∆R i (1) = 0, i = 1, 2 (for local CSIR/CSIRT) and ∆R i (11) = 0 (for global CSIRT), as we have done in Definition 2, incurs no loss in total rate. Furthermore, in most cases it is preferable to maximize the worst-case rate, since it can be guaranteed irrespective of the interference state.

Local CSIR
For the quasi-static and ergodic setups, described in Sections 2.1 and 2.2, respectively, we derive converse and achievability bounds for the independent and fully correlated scenarios when the interference state is only available at the receiver side.

Independent Case
We present converse and achievability bounds for local CSIR when B 1 and B 2 are independent. The converse bounds are derived for local CSIRT, hence they also apply to this case. Since converse and achievability bounds coincide, this implies that local CSI at the transmitter is not beneficial in the quasi-static setup.
Theorem 3 (Opportunistic sum capacity for local CSIR/CSIRT). Assume that B_1 and B_2 are independent of each other. For 0 < p < 1, the opportunistic sum capacity region is the union of the set of rate tuples (R, ∆R_1(0), ∆R_2(0)), where ∆R_1(1) = ∆R_2(1) = 0, and where R, ∆R_1(0), and ∆R_2(0) satisfy (12)-(14) and (15)-(17). Proof. The converse bounds are proved in Appendix A.2 and the achievability bounds are proved in Appendix A.3.

Remark 4.
The converse bounds in Theorem 3 coincide with those in [3] (Theorem 2.1), particularized for the symmetric setting. Theorem 3, however, is proven for local CSIRT, which is not considered in the model from [3]. The proof included in Appendix A.2 is based on an information density approach and provides a unified framework for treating local CSIR, local CSIRT and global CSIRT, as will be shown in Section 5.
As discussed in Remark 3, one could reduce the worst-case sum rate R and increase the opportunistic rates ∆R(V_1, V_2). However, in the case of one-shot transmission this is not desirable, since the worst-case sum rate is the only rate that can be guaranteed irrespective of the interference state. (With one-shot transmission we refer to the case where we transmit one codeword of length N over the quasi-static channel. This is in contrast to the case discussed, e.g., in Section 3.3, where we are interested in transmitting many codewords, each over N channel uses of independent quasi-static channels.) Thus, one is typically interested in the opportunistic sum capacity when the worst-case rate R is maximized. For this case, the results of Theorem 3 are summarized in Table 1 for the VWI, WI, MI, and SI regions.

Table 1. Opportunistic sum capacity for local CSIR when the worst-case sum rate is maximized.

[Table 1 columns: Rates; VWI; WI; MI; SI.]

Observe that converse and achievability bounds coincide. Further observe that opportunistic messages can only be transmitted reliably for VWI or WI. In the other interference regions, the opportunistic sum capacity is zero.

Fully Correlated Case
Assume now that the sequences B 1 and B 2 are fully correlated (B 1 = B 2 ). For local CSIR, the correlation between B 1 and B 2 has no influence on the opportunistic sum capacity region. Indeed, in this case the channel inputs are independent of (B 1 , B 2 ) and the opportunistic sum capacity region of the quasi-static bursty IC depends on (B 1 , B 2 ) only via the marginal distributions of B i , i = 1, 2. Hence, it follows that Theorem 3 as well as Table 1 apply also to the fully correlated case and local CSIR scenario. For completeness, a proof of the converse part is given in Appendix A.4. The achievability part is included in Appendix A.3.

Independent Case
For the case where the sequences B K 1 and B K 2 are independent of each other, we have the following theorems.
Theorem 4 (Converse bounds for local CSIR). Assume that B_1^K and B_2^K are independent of each other. The sum rate R of the bursty IC is upper-bounded by the bounds (18) and (19). Proof. Bound (18) coincides with [7] (Equation (3)). Specifically, [7] (Equation (3)) derives (18) for the considered channel model with T = 1 and feedback. The proof of this bound under local CSIRT (without feedback) is given in Appendix B.1. Bound (19) coincides with [23] (Lemma A.1). Specifically, [23] (Lemma A.1) derives (19) for the considered model with T = 1. The proof of [23] (Lemma A.1) directly generalizes to arbitrary T.
Theorem 5 (Achievability bounds for local CSIR). Assume that B_1^K and B_2^K are independent of each other. The sum rate R given in (20) is achievable over the bursty IC. Proof. The achievability scheme for VWI for all values of p, and for WI and MI when 0 ≤ p ≤ 1/2, is described in Appendix B.2.1. The achievability scheme for WI and 1/2 < p ≤ 1 is described in Appendix B.2.2. The scheme for SI and 0 ≤ p ≤ 1/2 is summarized in Appendix B.2.3. For MI and SI when 1/2 < p ≤ 1, the achievability bound in the theorem corresponds to that of the non-bursty IC [20]. This also implies that in this sub-region we do not exploit the burstiness of the IC.

Remark 5.
The achievability schemes presented in Theorem 5 are similar to those described in [11,14]. They achieve the capacity region by applying point-to-point erasure codes with appropriate rates at each transmitter and using either treating-interference-as-erasure or interference decoding at each receiver. Specifically, we apply treating-interference-as-erasure in the VWI region for all values of p, and in all interference regions except VSI for p ≤ 1/2. Interference decoding at each receiver is applied in the MI and SI regions for p > 1/2.
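The role of the law of large numbers in these erasure-coding schemes can be illustrated with a small simulation. This is an illustrative sketch, not the scheme itself (function name ours): since the fraction of coherence blocks hit by interference concentrates around p, an erasure code of rate slightly below 1 − p on the interfered levels succeeds with high probability for large K.

```python
import random

def empirical_interference_fraction(p: float, blocks: int, seed: int = 0) -> float:
    """Fraction of coherence blocks hit by interference. By the law of large
    numbers this converges to p as the number of blocks grows, which is what
    allows the transmitters to protect the interfered signal levels with a
    point-to-point erasure code of rate close to 1 - p."""
    rng = random.Random(seed)
    hits = sum(rng.random() < p for _ in range(blocks))
    return hits / blocks
```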

Remark 6.
Wang et al. claim in [23] (Lemma A.1) that the converse bound (18) is tight for 0 ≤ p ≤ 1/2 without providing an achievability bound. Instead, they refer to Khude et al. [3] for the inner bound which, alas, does not apply to the ergodic setup. While it is possible to adapt the achievability schemes considered in [3] to prove (20), a number of steps are required. For completeness, we include the achievability schemes for the ergodic setup and 0 ≤ p ≤ 1/2 in Appendix B.2.1.

Table 2 summarizes the results of Theorems 4 and 5. We write the sum capacities in bold face when the converse and achievability bounds match. In Table 2, we define C_SI^L ≜ min{2pn_c, …}, where "L" stands for "local CSIR".

Fully Correlated Case
For local CSIR, the dependence between B_1^K and B_2^K has no influence on the capacity region. Indeed, in this case the channel inputs are independent of (B_1^K, B_2^K) and decoder i only has access to B_{i,k} and (S^{q−n_d} X_{i,k} ⊕ B_{i,k} S^{q−n_c} X_{j,k}), k = 1, ..., K, j = 3 − i, i = 1, 2. Furthermore, Pr{Ŵ_1 ≠ W_1 ∪ Ŵ_2 ≠ W_2} vanishes as K → ∞ if, and only if, Pr{Ŵ_i ≠ W_i}, i = 1, 2, vanishes as K → ∞. Since Pr{Ŵ_i ≠ W_i} depends only on B_i^K, the capacity region of the bursty IC depends on (B_1^K, B_2^K) only via the marginal distributions of B_1^K and B_2^K. Hence, Theorems 4 and 5 as well as Table 2 apply also to the case where B_1^K = B_2^K. This is consistent with the observation by Sato [24] that "the capacity region is the same for all two-user channels that have the same marginal probabilities".

Quasi-Static vs. Ergodic Setup
In general, the sum capacities of the quasi-static and ergodic channels cannot be compared, because in the former case we have a set of sum capacities (worst case and opportunistic), whereas in the latter case only one is defined. To allow for a comparison, we introduce for the quasi-static channel the average sum capacity

C̄ ≜ sup { R + (1 − p)(∆R_1(0) + ∆R_2(0)) },

where the supremum is over all tuples (R, ∆R_1(0), ∆R_2(0)) that satisfy (12)-(17). Intuitively, the average rate corresponds to the case where we send many messages over independent quasi-static fading channels. By the law of large numbers, a fraction p of the transmissions will be affected by interference, while the remaining transmissions will be interference-free. Table 3 summarizes the average sum capacity for the different interference regions. By comparing Tables 2 and 3, we can observe that for p ≤ 1/2 and all interference regions, and for p > 1/2 and VWI/WI, the average sum capacity in the quasi-static setup coincides with the sum capacity in the ergodic setup. For p > 1/2 and MI/SI (where converse and achievability bounds do not coincide), the average sum capacities in the quasi-static setup coincide with the achievability bounds of the ergodic setup.
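The averaging argument above can be made concrete with a one-line helper; this is a minimal sketch assuming each opportunistic rate ∆R_i(0) is only collected on the (1 − p) fraction of interference-free transmissions (function name ours):

```python
def average_sum_rate(R: float, dR1: float, dR2: float, p: float) -> float:
    """Average sum rate of a quasi-static opportunistic scheme: the worst-case
    sum rate R is always delivered, while the opportunistic rate dRi (decoded
    only when link i is interference-free) is delivered with probability 1 - p."""
    return R + (1 - p) * (dR1 + dR2)
```

For example, with p = 1/2, only half of the opportunistic rate counts toward the average.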

Local CSIRT
For the quasi-static and ergodic setups, we present converse and achievability bounds when transmitters and receivers have access to their corresponding interference states. We shall only consider the independent case here, because when B K 1 = B K 2 local CSIRT coincides with global CSIRT, which will be discussed in Section 5.

Quasi-Static Channel
For the quasi-static channel, the converse and achievability bounds were already presented in Theorem 3 in Section 3.1.1. Indeed, the converse bounds were derived for local CSIRT, whereas the achievability bounds in that theorem were derived for local CSIR. Since these bounds coincide for all interference regions and all probabilities 0 < p < 1, it follows that, for the quasi-static channel, availability of local CSI at the transmitter in addition to local CSI at the receiver is not beneficial. The converse and achievability bounds are then given in Theorem 3.

Ergodic Channel
The converse bound (18) presented in Theorem 4 was derived for local CSIRT, so it applies to the case at hand. We next present achievability bounds for this setup that improve upon those for CSIR. The aim of these bounds is to provide computable expressions showing that local CSIRT outperforms local CSIR in the whole range of the parameter α. While the particular achievability schemes are sometimes involved, the intuition behind these schemes can be explained with the following toy example. Example 1. Let us assume that n_d = n_c = T = 1, and suppose that at time k the transmitters send the bits (X_{1,k}, X_{2,k}) ∈ {0, 1}^2. If there is no interference, then receiver i receives X_{i,k}. If there is interference, then receiver i receives X_{1,k} ⊕ X_{2,k}. Consequently, the channel flips X_{1,k} if B_{1,k} = X_{2,k} = 1, and it flips X_{2,k} if B_{2,k} = X_{1,k} = 1. It follows that each transmitter-receiver pair experiences a binary symmetric channel (BSC) with a crossover probability that depends on p and on the probabilities that (X_1, X_2) are one. Specifically, let p_1 ≜ Pr{X_2 = 1 | B_2 = 0}, p_2 ≜ Pr{X_2 = 1 | B_2 = 1}, q_1 ≜ Pr{X_1 = 1 | B_1 = 0}, q_2 ≜ Pr{X_1 = 1 | B_1 = 1}, and define p_3 ≜ (1 − p)p_1 + pp_2 and q_3 ≜ (1 − p)q_1 + pq_2, which are the crossover probabilities of the BSCs experienced by receivers 1 and 2, respectively, when they are affected by interference. By drawing for each user two codebooks (one for B_{i,k} = 0 and one for B_{i,k} = 1) i.i.d. at random according to the probabilities p_1, p_2, q_1, and q_2, and by following a random-coding argument, it can be shown that this scheme achieves the sum rate

R = (1 − p)[H_b(q_1) + H_b(p_1)] + p[H_sum(q_2, p_3) − H_b(p_3) + H_sum(p_2, q_3) − H_b(q_3)].

This expression holds for any set of parameters (p_1, p_2, q_1, q_2), and the largest sum rate achieved by this scheme is obtained by maximizing over (p_1, p_2, q_1, q_2) ∈ [0, 1/2]^4.
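The effective crossover probabilities in this toy example can be computed as below. Note that associating (p_1, p_2) with user 2's codebook densities and (q_1, q_2) with user 1's is our reading of the definitions and should be treated as an assumption:

```python
def crossover_probs(p, p1, p2, q1, q2):
    """Effective BSC crossover probabilities in the toy example.
    p1, p2 are Pr{X2 = 1} under B2 = 0 and B2 = 1; q1, q2 are the analogous
    probabilities for X1. Each receiver does not know the *other* user's
    interference state, so the interfering input is averaged over it."""
    p3 = (1 - p) * p1 + p * p2   # crossover seen by receiver 1 under interference
    q3 = (1 - p) * q1 + p * q2   # crossover seen by receiver 2 under interference
    return p3, q3
```

A numerical maximization of the resulting sum-rate expression over (p_1, p_2, q_1, q_2) then yields the largest rate achievable by this two-codebook scheme.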
In the following, we present the achievable sum rates that can be obtained by generalizing the above achievability scheme to general n d and n c . The achievability schemes that achieve these rates are presented in Appendix D. The largest achievable sum rates can then be obtained by numerically maximizing over the parameters (p 1 , p 2 , q 1 , q 2 , . . .) (which depend on the interference region).

1. For the VWI region, we achieve the sum rate …

2. For the WI region, we can achieve, for any (p_1, p_2, q_1, q_2) ∈ [0, 1/2]^4, …

3. To present the achievable rates for MI, we need to divide the region into the following four subregions:

(a) For 2/3 ≤ α ≤ 3/4, we can achieve, for any (p_1, p_2, p̃_1, p̃_2, p̂_1, q_1, q_2, q̃_1, q̃_2, q̂_1) ∈ [0, 1/2]^10, …
In each region, we optimize numerically over the set of parameters, exploiting in some cases the symmetry (except for α = 1) between the corresponding parameters of both users.

Local CSIRT vs. Local CSIR
To evaluate the effect of exploiting local CSI at the transmitter side, we plot in Figures 2-4 the converse and achievability bounds for local CSIR and local CSIRT. For each interference region, we choose one value of α. We omit the VWI region because there local CSIR and local CSIRT coincide. We observe that, for all other interference regions, local CSIRT outperforms local CSIR. We further observe that the largest improvement is obtained for $p = \tfrac{1}{2}$. This is not surprising, since in this case the uncertainty about the interference states is largest.

Quasi-Static vs. Ergodic Setup
As observed in the previous subsection, for the ergodic setup local CSIRT outperforms local CSIR in all interference regions (except VWI). In contrast, the opportunistic rates achievable in the quasi-static setup for local CSIRT coincide with those achievable for local CSIR. In other words, the availability of local CSI at the transmitter is only beneficial in the ergodic setup, not in the quasi-static one. This remains true even if we consider the average sum capacity rather than the sum rate region. Intuitively, in the ergodic setup, the achievable rates depend on the input distributions of $X_1^K$ and $X_2^K$, and adapting these distributions to the interference state yields a rate gain. In contrast, in the quasi-static setup, we treat the two interference states separately: the worst-case rates are designed for the worst case (where both receivers experience interference), and the opportunistic rates are designed for the best case (where the corresponding receiver is interference-free).
Given that the opportunistic rate region $(R, \Delta R(V_1, V_2))$ is not enhanced by the availability of local CSI at the transmitter, it follows directly that the same is true for the average sum capacity, defined in (23). Note, however, that it is unclear whether (23) corresponds to the best strategy for transmitting several messages over independent uses of a quasi-static channel when the transmitters have access to local CSI. Indeed, in this case transmitter $i$ may choose the values of $R_i$ and $\Delta R_i(0)$ as a function of the interference state $B_i$, potentially giving rise to a larger average sum capacity. Yet, the set of achievable rate pairs $(R_i, \Delta R_i(0))$ depends on the choice of $(R_j, \Delta R_j(0))$ of transmitter $j \neq i$, which transmitter $i$ cannot deduce since it has no access to the other transmitter's CSI. How the transmitters should adapt their rates to the interference state therefore remains an open question.

Global CSIRT
We next present converse and achievability bounds for global CSIRT. In this scenario, the transmitters may agree on a specific coding scheme that depends on the realization of $(B_1^K, B_2^K)$. This allows for a more elaborate cooperation between the transmitters and strictly increases the sum capacity compared to the local CSIR/CSIRT scenarios.

Quasi-Static Channel
In the quasi-static scenario with global CSIRT, the messages are, strictly speaking, not opportunistic. Instead, transmitters can choose the message depending on the true state of the interference links, so the strategy is perhaps better described as rate adaptation. Nevertheless, the definitions of worst-case sum rate and opportunistic sum rate in Section 2.1 still apply in this case. To keep notation consistent, we use the definition of "opportunism" also for global CSIRT.

Independent Case
Assume first that the sequences B 1 and B 2 are independent of each other.
Theorem 6 (Opportunistic sum capacity for global CSIRT). Assume that $B_1$ and $B_2$ are independent of each other. For $0 < p < 1$, the opportunistic sum capacity region is the union of the set of rate tuples $(R, \Delta R(00), \Delta R(01), \Delta R(10))$ satisfying (12)-(14) and

Proof. The achievability part adapts the optimal scheme for the non-bursty IC [20]. The details can be found in Appendix A.6.
Remark 8. The proofs of Theorems 3 and 6 merely require a condition on the joint distribution of $(B_1, B_2)$. Thus, these theorems also apply to the case where $B_1$ and $B_2$ are dependent, as long as they are not fully correlated.

Table 4 summarizes the results of Theorem 6. Observe that for VWI and WI opportunistic messages can be transmitted reliably at a positive rate, while for MI and SI this is only the case if both links are interference-free.

Table 4. Opportunistic sum capacity for global CSIRT when the worst-case sum rate is maximized and $B_1$ and $B_2$ are independent.

Next, we consider the case in which the interference states are fully correlated. In this scenario, local CSIRT coincides with global CSIRT.
Theorem 7 (Opportunistic sum capacity for global CSIRT). Assume that $B_1$ and $B_2$ are fully correlated. For $0 \leq p < 1$, the opportunistic sum capacity region is the union of the set of rate pairs $(R, \Delta R(00))$ satisfying (12)-(14) and

Proof. For the converse bound, we note that the analysis in Appendix A carries over to this case. For the achievability bound, we use a scheme where the opportunistic messages are only decoded in the absence of interference at the intended receiver. In this case, we have two parallel interference-free channels, for which the optimal strategy consists of transmitting uncoded bits in the $n_d$ sub-channels.

Table 5 summarizes the results of Theorem 7. Observe that the worst-case sum capacity $C$ and the opportunistic sum capacity $\Delta C(00)$ in the interference-free state do not depend on the correlation between $B_1$ and $B_2$. The only difference between the independent and the fully correlated case is that the interference states $[0,1]$ and $[1,0]$ are impossible in the latter.

Table 5. Opportunistic sum capacity for global CSIRT when the worst-case sum rate is maximized and $B_1$ and $B_2$ are fully correlated.

Theorem 8 (Converse bounds for global CSIRT). Assume that $B_1^K$ and $B_2^K$ are independent of each other. The sum rate $R$ for the bursty IC is upper-bounded by

Proof. The proof of (52) follows along similar lines as that of (18), but noting that, for global CSIRT, $X_i^K$ depends on both $B_1^K$ and $B_2^K$. The proof of (53) is based on pairing the interference states according to the four possible combinations of $(B_{1,k}, B_{2,k})$. See Appendix B.3 for details.

Remark 9. The proof of Theorem 8 can be extended to consider an arbitrary joint distribution of $(B_1^K, B_2^K)$.

Theorem 9 (Achievability bounds for global CSIRT). Assume that $B_1^K$ and $B_2^K$ are independent of each other. The following sum rates $R$ are achievable over the bursty IC:

where $p_{\min} \triangleq \min(p^2, p(1-p))$.
Proof. The sum rate (54) is achieved by using the optimal scheme for the non-bursty IC when either of the two receivers is affected by interference [20], and by using uncoded transmission when there is no interference. The sum rates (55) and (56) are novel. See Appendix B.4 for details.

Remark 10. In contrast to the local CSIR scenario, the achievability schemes presented in Theorem 9 differ noticeably from those in [12] for the binary IC. Indeed, while both works exploit global CSIRT to enable cooperation between the users, [12] assumes that only delayed CSI is present. The achievability schemes presented in Theorem 9 thus cannot be applied directly to the model considered in [12].

Table 6 summarizes the results of Theorems 8 and 9. We write the sum capacity in bold face when the converse and achievability bounds coincide. In Table 6, we define

where "G" stands for "global CSIRT".

Table 6. Bounds on the sum capacity $C$ for global CSIRT when $B_1^K$ and $B_2^K$ are independent.

Fully Correlated Case
We next discuss the case where the sequences $B_1^K$ and $B_2^K$ are fully correlated, i.e., $B_1^K = B_2^K$.
Theorem 10 (Converse bounds for global CSIRT). Assume that $B_1^K$ and $B_2^K$ are fully correlated. The sum rate $R$ for the bursty IC is upper-bounded by

Proof. The proof of (59) follows similar steps as in Appendix B.3.1 but considering $B_1^K = B_2^K$. The proof of (60) is given in Appendix B.5. See also Remark 9.
Theorem 11 (Achievability bounds for global CSIRT). Assume that $B_1^K$ and $B_2^K$ are fully correlated. The following sum rates $R$ are achievable over the bursty IC:

Proof. The sum rates (61) and (62) are achieved by using the optimal scheme for the non-bursty IC when the two receivers are affected by interference [20], and by using uncoded transmission in the absence of interference.

Quasi-Static vs. Ergodic Setup
Similar to the average sum capacity for local CSIR defined in Section 3.3, we define the average sum capacity for global CSIRT when $B_1$ and $B_2$ are independent as

where the suprema are over all rate tuples $(R, \Delta R(00), \Delta R(01), \Delta R(10))$ that satisfy Theorems 2 and 6. The intuition behind (63) is the same as that behind (23) for local CSIR, but with global CSIRT the transmitters can adapt their rates $(R_i, \Delta R_i(V_i))$ to the interference state. For example, the first term on the right-hand side (RHS) of (63) corresponds to the interference state $[1,1]$, in which case we transmit at total sum rate $R$; the second term corresponds to the interference state $[0,1]$, in which case we transmit at total sum rate $R + \Delta R(01)$; and so on.

Table 8 summarizes the average sum capacity for the different interference regions. The average sum capacities for VWI and WI coincide with the sum capacities in the ergodic setup (see Table 6). In contrast, for MI and SI, the average sum capacities are smaller than the sum capacities in the ergodic setup.

Table 8. Average sum capacity when $B_1$ and $B_2$ are independent.
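The state-by-state weighting just described can be sketched in a few lines of Python. The exact expression (63) is not reproduced here, so the function below is a sketch under the reading given above: each of the four interference states contributes its probability times the corresponding total sum rate.

```python
def average_sum_rate(p, R, dR01, dR10, dR00):
    """Average sum rate for global CSIRT with independent interference
    states: state [1,1] occurs w.p. p^2 (rate R), states [0,1] and
    [1,0] w.p. p(1-p) each (rates R + dR01 and R + dR10), and state
    [0,0] w.p. (1-p)^2 (rate R + dR00)."""
    return (p**2 * R
            + p * (1 - p) * (R + dR01)
            + (1 - p) * p * (R + dR10)
            + (1 - p)**2 * (R + dR00))

# Sanity checks: with no opportunistic rates the average is just R,
# and with p = 0 the channel is always interference-free.
assert abs(average_sum_rate(0.3, 1.0, 0, 0, 0) - 1.0) < 1e-9
print(average_sum_rate(0.0, 1.0, 0.2, 0.2, 0.5))  # → 1.5
```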

Similarly, in the fully correlated case, we define the average sum capacity as

where the suprema are over all rate pairs $(R, \Delta R(00))$ that satisfy Theorems 2 and 7. The corresponding results are summarized in Table 9.

Table 9. Average sum capacity when $B_1$ and $B_2$ are fully correlated.

We observe that the average sum capacities coincide with the sum capacities of the ergodic setup.

Exploiting CSI
In this section, we study how the level of CSI affects the sum rate in the quasi-static and ergodic setups.
For the quasi-static channel, Figures 5 and 6 show the total sum capacity presented in Theorems 3, 6 and 7. Specifically, we plot the normalized total sum capacity $(C + \Delta C)/n_d$. We further observe that the opportunistic-capacity region for local CSIRT is equal to that for local CSIR. Thus, local CSI at the transmitter is not beneficial. As we shall see later, this is in stark contrast to the ergodic setup, where local CSI at the transmitter side is beneficial. Intuitively, in the ergodic case the input distributions of $X_1^K$ and $X_2^K$ depend on the realizations of $B_1^K$ and $B_2^K$, respectively. Hence, adapting the input distributions to these realizations increases the sum capacity. In contrast, in the quasi-static case, the worst-case scenario (presence of interference) and the best-case scenario (absence of interference) are treated separately. Hence, there is no difference to the case of local CSIR.
For the ergodic setup, Figures 7-10 show the converse and achievability bounds presented in Theorems 4, 5, 8 and 9. We further include the results on local CSIRT presented in Section 4. Specifically, we plot the normalized sum capacity $C/n_d$. Figure 7 reveals that in the VWI region the sum capacity is equal to $2(n_d - p n_c)$, irrespective of the availability of CSI. Thus, in this region access to global CSIRT is not beneficial compared to the local CSIR scenario. In the VSI region, the sum capacity of the non-bursty IC is equal to $2n_d$, which is that of two parallel channels without interference [15] (Section II-A). Therefore, burstiness of the interference (and hence CSI) does not affect the sum capacity.
In the WI region, shown in Figure 8, the converse and achievability bounds coincide for both local CSIR and global CSIRT, and it is apparent that global CSIRT outperforms local CSIR. In the MI and SI regions, the converse and achievability bounds only coincide for certain ranges of $p$. Nevertheless, Figures 9 and 10 show that, in almost all cases, global CSIRT outperforms local CSIR. (For the case $\alpha = \tfrac{7}{10}$ presented in Figure 9, we also show the local CSIRT converse bound (18), although for some values of $p$ it is looser than the bound depicted for global CSIRT.) Local CSIRT outperforms local CSIR in all interference regions (except VWI). We stress again that this was not the case in the quasi-static scenario, where the two coincide.
We next consider the case where $B_1^K$ and $B_2^K$ are fully correlated. For this scenario, [7,23] studied the effect of perfect feedback on the bursty IC. For comparison, the non-bursty IC with feedback was studied by Suh et al. in [25], where it was demonstrated that the gain of feedback becomes arbitrarily large for certain interference regions (VWI and WI) as the signal-to-noise ratio increases. This gain corresponds to a better resource utilization and thereby a better resource sharing between the users. Specifically, [7,23] (bursty IC) and [25] (non-bursty IC) assume that noiseless, delayed feedback is available from receiver $i$ to transmitter $i$ ($i = 1, 2$). For the symmetric setup treated in this paper, [7] (Theorem 3.2) and [23] (Theorem 3.2) showed the following:

Theorem 12 (Channel capacity for the bursty IC with feedback [7,23]). The sum capacity of the bursty IC with noiseless, delayed feedback is given by

Proof. See [7] (Sections IV and V) and [23] (Sections IV and V, Appendices A, C, D).
Observe that, for $\alpha \leq 2$, (65) coincides with (18). This implies that local CSIRT can never outperform delayed feedback. Intuitively, feedback contains not only information about the channel state, but also about the previous symbols transmitted by the other transmitter, which can be exploited to establish a certain cooperation between the transmitters.

Figures 11-14 show the bounds on the normalized sum capacity $C/n_d$, comparing local CSIR and global CSIRT when the interference states are fully correlated, i.e., $B_1^K = B_2^K$. They further show the sum capacity for the case where the transmitters have noiseless delayed feedback [7]. The shaded areas correspond to the regions where the achievability and converse bounds do not coincide. Figure 11 reveals that feedback in the VWI region outperforms the non-feedback case, irrespective of the availability of CSI. Wang et al. [7] have further shown that feedback also outperforms the non-feedback case in the VSI region. The order between global CSIRT and the feedback scheme is not obvious: there are regions where global CSIRT outperforms the feedback scheme and vice versa. Indeed, on the one hand, feedback contains information about the previous interference states and the previous symbols transmitted by the other transmitter, permitting the resolution of collisions in previous transmissions. On the other hand, global CSIRT provides non-causal information about the interference states, allowing a better adaptation of the transmission strategy to the interference burstiness.

Exploiting Interference Burstiness
To better illustrate the benefits of interference burstiness, we show the normalized sum capacity as a function of α, so that all interference regions can be appreciated. For the non-bursty IC ($p = 1$), this curve corresponds to the well-known W-curve obtained by Etkin et al. in [26]. We next study how burstiness affects this curve in the different considered scenarios.
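For reference, the W-curve is easy to state in code. The sketch below encodes the normalized symmetric capacity of the non-bursty linear deterministic IC as a function of $\alpha = n_c/n_d$ (the sum capacity is twice this value):

```python
def w_curve(alpha):
    """Normalized symmetric capacity C_sym / n_d of the non-bursty
    linear deterministic IC as a function of alpha = n_c / n_d
    (the W-curve of Etkin et al., i.e., the p = 1 case)."""
    if alpha <= 0.5:           # very weak interference
        return 1 - alpha
    if alpha <= 2 / 3:         # weak interference
        return alpha
    if alpha <= 1:             # moderate interference
        return 1 - alpha / 2
    if alpha <= 2:             # strong interference
        return alpha / 2
    return 1.0                 # very strong interference: two parallel channels

# The "W" shape: local minima at alpha = 1/2 and alpha = 1,
# a local maximum at alpha = 2/3, and saturation at alpha >= 2.
assert w_curve(0.5) == 0.5 and w_curve(1.0) == 0.5
assert w_curve(3.0) == 1.0
```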
In the quasi-static setup, burstiness can be exploited by sending opportunistic messages. We consider the total sum capacity for the case where the worst-case rate $R$ is maximized. For local CSIR/CSIRT, Theorem 3 suggests that an opportunistic code is only beneficial if the interference region is VWI or WI; for the other interference regions there is no benefit. In contrast, for global CSIRT an opportunistic code is beneficial in all interference regions (except VSI, where the sum capacity corresponds to that of two parallel channels without interference). Figures 15 and 16 illustrate these observations by showing the normalized total sum capacity achieved under local CSIR/CSIRT and global CSIRT when the interference states are independent. We observe that, for local CSIR, the opportunistic rates $\Delta R_1(0)$ and $\Delta R_2(0)$ are only positive in the VWI and WI regions. In these regions, if only one of the receivers is affected by interference, the sum capacity is given by the worst-case rate $R$ plus the opportunistic rate of the user that is not affected by interference. In the absence of interference at both receivers, both receivers can decode opportunistic messages; hence, the total sum capacity is equal to $C + \Delta C_1(0) + \Delta C_2(0)$. For global CSIRT we observe that, when only one of the receivers is affected by interference, we achieve the same total sum capacity as for local CSIR/CSIRT. However, in the absence of interference at both receivers, we achieve the trivial upper bound corresponding to two parallel channels. The fully correlated scenario can be considered as a subset of the independent scenario: for the states $B = [0,0]$ and $B = [1,1]$ we obtain the same total sum capacity as in the independent scenario; the main difference is that in the fully correlated scenario the interference states $B = [0,1]$ and $B = [1,0]$ are impossible.
For the ergodic case, Figures 17 and 18 show the bounds on the normalized sum capacity $C/n_d$ as a function of α when $B_1^K$ and $B_2^K$ are independent. The shaded areas correspond to the regions where the achievability and converse bounds do not coincide. We further show the W-curve. Observe that for $p \leq \tfrac{1}{2}$ the sum capacity as a function of α forms a V-curve instead of the W-curve. Further observe how the sum capacity approaches the W-curve as $p$ tends to one. In Figure 19 we show the bounds on the normalized sum capacity $C/n_d$ as a function of α for global CSIRT when $B_1^K$ and $B_2^K$ are fully correlated. (For local CSIR the sum capacity is not affected by the correlation between $B_1^K$ and $B_2^K$, so the curve as a function of α coincides with the one in Figure 17.) We observe that, for all values of $p > 0$, the sum capacity forms a W-curve similar to that for $p = 1$. This is because, when the interference states are fully correlated, the bursty IC is a combination of an IC and two parallel channels.
We observe that for global CSIRT the burstiness of the interference is beneficial in all interference regions and for all values of $p$. For local CSIR, burstiness is beneficial for all values of $p$ for VWI and WI. However, for MI and SI, burstiness is only of clear benefit for $p \leq \tfrac{1}{2}$. It is yet unclear whether burstiness is also beneficial in these interference regions when $p > \tfrac{1}{2}$. To shed some light on this question, note that evaluating the converse bound in [23] (Lemma A.1), which yields (21), for inputs $X_1^K$ and $X_2^K$ that are temporally independent, we recover the achievability bound (20). Since for MI/SI and $p \geq \tfrac{1}{2}$ this bound coincides with the rates achievable over the non-bursty IC, an achievability scheme can only exploit the burstiness of the interference in this regime if it introduces some temporal correlation (this observation is also supported by the average sum capacity in the quasi-static case). In fact, for global CSIRT the achievability schemes proposed in Theorem 9 for MI and SI copy the same bits over several coherence blocks, i.e., they exhibit a temporal correlation which cannot be achieved with temporally independent distributions. However, the temporal pattern of these bits requires knowledge of both interference states, so this approach cannot be adapted to the cases of local CSIR/CSIRT. In contrast, for global CSIRT in the fully correlated case, where the converse and achievability bounds coincide, it is not necessary to introduce temporal memory. This scenario is simpler, since the channel exhibits only two states: a non-bursty IC and two parallel channels.

Summary and Conclusions
In this work, we considered a two-user bursty IC in which the presence/absence of interference is modeled by a block-i.i.d. Bernoulli process, while the power of the direct and cross links remains constant during the whole transmission. This scenario corresponds, e.g., to a slow-fading scenario in which all the nodes can track the channel gains of the different links, but where the interfering links are affected by intermittent occlusions due to some physical process. While this model may appear over-simplified, it yields a unified treatment of several aspects previously studied in the literature and gives rise to several new results on the effect of CSI on the achievable rates over the bursty IC. Our channel model encompasses both the quasi-static scenario studied in [3,5] and the ergodic scenario (see, e.g., [7,12]). While the model recovers several cases studied in the literature, it also presents scenarios which have not been previously analyzed. This is the case, for example, for the ergodic setup with local and global CSIRT. Our analysis in these scenarios does not yield matching upper and lower bounds for all interference and burstiness levels. Yet, examining the obtained results, we observe that the best strategies in these scenarios often require elaborate coding strategies for both users that feature memory across different interference states. This fact probably explains why no previous results exist for these scenarios. Furthermore, several of our proposed achievability schemes require complex correlation among signal levels. Thus, while the LDM in general provides insights on the Gaussian IC, the proposed schemes may actually be difficult to convert to the Gaussian case.
In the quasi-static scenario, the highest sum rate R that can be achieved is limited by the worst realization of the channel and thus coincides with that of the (non-bursty) IC. We can however transmit at an increased (opportunistic) sum rate R + ∆R when there is no interference at any of the interfering links. For the ergodic setup, we showed that an increased rate can be obtained when local CSI is present at both transmitter and receiver, compared to that obtained when CSI is only available at the receiver side. This is in contrast to the quasi-static scenario, where the achievable rates for local CSIR and local CSIRT coincide. Featuring global CSIRT at all nodes yields an increased sum rate for both the quasi-static and the ergodic scenarios. In the quasi-static channel, global CSI yields increased opportunistic rates in all the regions except in the very strong interference region, which is equivalent to having two parallel channels with no interference.
Both in the quasi-static and ergodic scenarios, global CSI exploits interference burstiness for all interference regions (except for very strong interference), irrespective of the level of burstiness. When local CSI is available only at the receiver side, interference burstiness is of clear benefit if the interference is either weak or very weak, or if the channel is ergodic and interference is present at most half of the time. When local CSI is available at each transmitter and receiver and the channel is ergodic, interference burstiness is beneficial in all interference regions except in the very weak and very strong interference regions.
In order to compare the achievable rates of the quasi-static and ergodic setups, one can define the average sum rate of the quasi-static setup for local CSIR/CSIRT as $R + (1-p)(\Delta R_1(0) + \Delta R_2(0))$, with a similar definition for the average sum rate for global CSIRT. The average sum rate corresponds to a scenario where several codewords are transmitted over independent quasi-static bursty ICs. This, in turn, could be the case if a codeword spans several coherence blocks, but no coding is performed over these blocks. This is in contrast to the ergodic setup, where coding is typically performed over different coherence blocks. By the law of large numbers, roughly a fraction $p$ of the codewords experiences interference; the remaining codewords are transmitted free of interference. Consequently, an opportunistic transmission strategy achieves the rate $pR + (1-p)(R + \Delta R_1(0) + \Delta R_2(0))$, which corresponds to the average sum rate. Our results demonstrate that, for local CSIR, the average sum capacity, obtained by maximizing the average sum rate over all achievable rate pairs $(R, \Delta R_1(0) + \Delta R_2(0))$, coincides with the achievable rates in the ergodic setup for all interference regions. In contrast, for local CSIRT, the average sum capacity is strictly smaller than the sum capacity in the ergodic setup. For global CSIRT, the average sum capacity and the sum capacity coincide for all interference regions when the interference states are fully correlated, and they coincide for VWI and WI when the interference states are independent. For global CSIRT, MI/SI, and independent interference states, the average sum capacity is smaller than the sum capacity in the ergodic setup. In general, the average sum capacity defined for the quasi-static setup never exceeds the sum capacity in the ergodic setup. This is perhaps not surprising if we recall that the average sum capacity corresponds to the case where no coding is performed over coherence blocks.
Interestingly, the average sum capacity is not always achieved by maximizing the worst-case rate. For small values of p, it is beneficial to reduce the worst-case rate in order to achieve a larger opportunistic rate.
In our work we considered both the case where the interference states of the two users are independent and the case where they are fully correlated. In both the ergodic and the quasi-static setup, the results for local CSIR are independent of the correlation between the interference states. For the other CSI levels, dependence between the interference states helps in all interference regions except the very weak and very strong interference regions.

Acknowledgments: Fruitful discussions with S. Gherekhloo are gratefully acknowledged. We further thank the anonymous reviewers for their insightful comments and suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.
The converse bounds in the quasi-static case are based on an information density approach [27]. In particular, we define the information densities for the bursty IC as

Here and throughout the appendices, we use the notations $x_i^N = x_i$ and $y_i^N = y_i$ to highlight the fact that, in the quasi-static setting, we transmit $N$ symbols in one coherence block.
We further consider the individual error events

and the joint error event

The proofs of the converse results are based on the following lemmas.
Lemma A1 (Verdú-Han lemma). Every $(N, R, P_e)$ code over a channel $P_{Y^N|X^N}$ satisfies
$$P_e \geq \Pr\left[\frac{1}{N}\, i(X^N; Y^N) \leq R - \gamma\right] - 2^{-\gamma N}$$
for every $\gamma > 0$, where $X^N$ places probability mass $2^{-NR}$ on each codeword.
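To illustrate how Lemma A1 is used, the following sketch estimates the tail probability $\Pr[(1/N)\, i(X^N;Y^N) \leq R - \gamma]$ for a memoryless BSC with i.i.d. uniform inputs: for rates above capacity the tail is close to one, so the lemma forces a non-vanishing error probability, while well below capacity the tail vanishes. The channel and parameter choices are illustrative only.

```python
import math
import random

def info_density_tail(N, R, gamma, delta, trials=5_000, seed=2):
    """Estimate Pr[(1/N) i(X^N; Y^N) <= R - gamma] for a memoryless BSC
    with crossover delta and i.i.d. uniform inputs.  Under uniform inputs
    P(y) = 1/2, so per symbol i = 1 + log2(1 - delta) for an unflipped
    bit and i = 1 + log2(delta) for a flipped one."""
    rng = random.Random(seed)
    i_good = 1 + math.log2(1 - delta)
    i_bad = 1 + math.log2(delta)
    hits = 0
    for _ in range(trials):
        flips = sum(rng.random() < delta for _ in range(N))
        if (flips * i_bad + (N - flips) * i_good) / N <= R - gamma:
            hits += 1
    return hits / trials

delta = 0.11          # BSC with capacity 1 - h2(0.11), roughly 0.5 bit/use
tail_above = info_density_tail(200, R=0.8, gamma=0.05, delta=delta)
tail_below = info_density_tail(200, R=0.2, gamma=0.05, delta=delta)
# Above capacity the tail probability is near one, so by the lemma the
# error probability cannot vanish; well below capacity the tail is near zero.
assert tail_above > 0.9 and tail_below < 0.1
```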
When $p = 0$, the only interference state with positive probability is $[0,0]$. A necessary condition for $\lim_{N\to\infty} P_e^{(N)} = 0$ is then that the error probability conditioned on this state tends to 0 as $N \to \infty$. By following the same approach as for the case $p > 0$, we obtain the converse bound (12) in Theorem 2.

Appendix A.3. Achievability Proof of Theorem 3
In this section, we present the achievability bounds in Theorem 3 for the regions in which it is possible to transmit opportunistic messages, namely the VWI and WI regions. The presented bounds are valid for local CSIR and local CSIRT.

Appendix A.3.1. Very Weak Interference

Transmitter 1 (Tx 1) and transmitter 2 (Tx 2) transmit in the most significant levels a block of $n_d(1-\alpha)$ bits, and in the least significant levels a block of $n_d \alpha$ bits. The same construction is used for both transmitters. Figure A1 depicts the signal levels of the transmitted signals (normalized by $n_d$) as observed at receiver 1 (Rx 1) when it is affected by interference. At the receiver side, we have the following procedure:

• In the presence of interference: decode block A of the desired signal, which is interference-free, and treat block B as noise. We thus obtain the individual rate

• In the absence of interference: decode blocks A and B. We thus obtain the individual rate

where $\Delta R_1(0) = n_c$ bits/sub-channel use corresponds to the opportunistic rate.
By symmetry, the bounds (A42) and (A43) also apply for user 2. In order to obtain the possible sum rates according to the interference states, we combine (A42) (which corresponds to $B_1 = 1$) and (A43) (which corresponds to $B_1 = 0$) to obtain the bounds (15)-(16).
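The level alignment underlying this scheme can be illustrated with a small bit-level sketch of the linear deterministic model (the indexing convention, most significant level first, is an assumption made for illustration):

```python
import random

def receive(x1, x2, n_d, n_c, interfered):
    """Time-k output at Rx 1 in the linear deterministic model: the
    interfering signal occupies the n_c least significant levels.
    Signals are lists of bits, most significant level first."""
    y = list(x1)
    if interfered:
        for j in range(n_c):                      # bottom n_c levels collide
            y[n_d - n_c + j] ^= x2[j]
    return y

random.seed(3)
n_d, n_c = 5, 2                                   # alpha = 2/5 <= 1/2: VWI
x1 = [random.randint(0, 1) for _ in range(n_d)]
x2 = [random.randint(0, 1) for _ in range(n_c)]   # levels of Tx 2 reaching Rx 1
y = receive(x1, x2, n_d, n_c, interfered=True)

# Block A (top n_d - n_c levels) is always decodable...
assert y[: n_d - n_c] == x1[: n_d - n_c]
# ...while block B (bottom n_c levels) is clean only without interference.
assert receive(x1, x2, n_d, n_c, interfered=False) == x1
```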

Appendix A.3.2. Weak Interference
The symbol transmitted by Tx 1 (normalized by $n_d$) is depicted in Figure A2a. Specifically, we transmit in the most significant levels a block of $n_d(1-\alpha)$ bits. In the subsequent levels we transmit a block of $n_d(2\alpha - 1)$ zeros, followed by $n_d(2 - 3\alpha)$ opportunistic bits. Finally, in the least significant levels, we transmit a block of $n_d(2\alpha - 1)$ bits. The same construction is used for both transmitters. Figure A2b depicts the normalized signal levels of the transmitted signals as observed by Rx 1. At the receiver side, we have the following procedure:

• In the presence of interference: The channel shifts the interference signal down by $n_d - n_c$ levels. Thus, the least significant $2n_c - n_d$ bits of the desired signal (block A) align with the zeros of the interference signal and can be decoded free from interference. Since $n_d - n_c \leq n_c$, the most significant $n_d - n_c$ bits (block B) are also free from interference. Thus, we achieve the rate

• In the absence of interference: The bits in blocks A, B, and D can be decoded free from interference. Thus, we achieve the rate

where $\Delta R_1(0) = 2n_d - 3n_c$ bits/sub-channel use corresponds to the opportunistic rate.
By symmetry, the bounds (A44) and (A45) also apply for the achievable rates of user 2. In order to obtain the possible sum rates according to the interference states, we combine (A44) (which corresponds to B 1 = 1) and (A45) (which corresponds to B 1 = 0) to obtain the achievability bounds in Theorem 3.
Note that, compared to the derivation in Appendix A.2, the two error events $E_1(\Gamma_1)$ and $E_2(\Gamma_2)$ are conditioned on different interference states. In order to derive a joint error event for $E_1(\Gamma_1)$ and $E_2(\Gamma_2)$, we use the following lemma.

Lemma A3. For local CSIR, the information density $i_i$, $i = 1, 2$, depends only on $(x_i^N, y_i^N)$ and the corresponding state $b_i$, i.e.,

Proof. We prove (A48) for user 1. By the definition of the information density (A1), it follows that

Evaluating $i_1$ for $B = [0, b_2]$, $b_2 = 0, 1$, and for $B = [1, b_2]$, $b_2 = 0, 1$, we obtain that both cases are independent of $b_2$. The corresponding identity for user 2 can be proven in the same way.
This proves the converse bounds in Theorem 6.
Appendix A.6. Achievability Proof of Theorem 6

In this section, we present the achievability schemes for global CSIRT when $B_1$ and $B_2$ are independent. In contrast to the local CSIR/CSIRT case, we can adapt our transmission strategy to the interference states.
When $B = [0,0]$, the capacity-achieving scheme consists of sending uncoded bits on all $n_d$ levels. We thus achieve the sum rate $R + \Delta R(00) = 2n_d$ bits/sub-channel use. When $B = [0,1]$ or $B = [1,0]$, the achievability schemes coincide with the schemes described in Appendix A.3. In this case, we can only send opportunistic messages when we have VWI or WI.

Appendix A.6.1. Very Weak Interference

Consider the achievability scheme depicted in Figure A1. By (A42) and (A43),

This proves the achievability bounds in Theorem 6 for VWI.

Appendix B. Proofs for the Ergodic Case
Appendix B.1. Proof of (18) in Theorem 4

The bound (18) coincides with [7] (Theorem 3.1). However, [7] (Theorem 3.1) derives (18) for the considered channel model with $T = 1$ and feedback. In this section we show that (18) also holds for general $T$ in the no-feedback case. We follow along the lines of the proof of [7] (Theorem 3.1). We begin by applying Fano's inequality to obtain

where $\epsilon_{1K} \to 0$ as $K \to \infty$. Here, (a) follows because $(W_1, B_1^K)$ determines $X_1^K$, so we can subtract the contribution of $X_1^K$ in the second entropy, and by evaluating the entropy for the different interference states.
Step (b) follows because (B_1^{k−1}, X_2^k) are independent of (B_{1,k}^K, W_1) (which in turn follows because X_2^K only depends on (B_2^K, W_2), which is independent of (B_1^K, W_1)) and because conditioning reduces entropy. Likewise, we have (A70), where ε_{2K} → 0 as K → ∞. Here, (a) follows because W_2, W_1 and B_1^K are independent.
Step (b) follows because (W_1, B_1^K) determines X_1^K, so we can subtract its contribution from (Y_{1,k}, Y_{2,k}), because Y_{1,k} ⊕ S^{n_d} X_{1,k} = B_{1,k} S^{n_c} X_{2,k} has a lower entropy than S^{n_c} X_{2,k}, and because conditioning reduces entropy.
Step (c) follows by the chain rule, and because conditioning reduces entropy.
Combining (A69) and (A70) yields (A71). By maximizing the individual entropies in (A71) over all input distributions, dividing both sides of (A71) by N = KT, and then letting K tend to infinity, we obtain (A72). By symmetry, the same bound also holds for R_2 + pR_1. Thus, by averaging over the two cases, it follows that (A72) is also an upper bound on (R_1 + R_2)(1 + p)/2. The final result (18) follows by dividing (A72) by (1 + p)/2.
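The final averaging step is elementary and can be sanity-checked numerically. The following minimal sketch uses illustrative rate values (not from the paper); the function name is ours:

```python
# Averaging the two symmetric bounds on R1 + p*R2 and R2 + p*R1 yields a
# bound on (R1 + R2)(1 + p)/2; we verify the algebraic identity numerically.
def average_of_bounds(r1, r2, p):
    return ((r1 + p * r2) + (r2 + p * r1)) / 2

r1, r2, p = 1.25, 0.75, 0.3  # illustrative values, not from the paper
assert abs(average_of_bounds(r1, r2, p) - (r1 + r2) * (1 + p) / 2) < 1e-12
```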

Appendix B.2. Achievability Proof of Theorem 5
In this section, we describe the achievability schemes that yield the rates presented in Theorem 5 for local CSIR. The bursty IC described in Section 2 is treated here as a set of n_d parallel sub-channels.
Appendix B.2.1. Scheme 1 (VWI; WI, MI for 0 ≤ p ≤ 1/2)

The achievability scheme is illustrated in Figure A3a. In the figure, we present the normalized received signal at Rx 1, i.e., we represent graphically the time-k channel output Y_{1,k} given by (3), where the signal level from Tx 1 corresponds to S^{n_d} X_{1,k} and the signal level from Tx 2 corresponds to S^{n_c} X_{2,k}, both normalized by n_d. In our scheme, the upper n_d − n_c sub-channels (block A in the figure) carry uncoded data (rate 1 bit/sub-channel use), while in the lower n_c sub-channels (block B in the figure) a capacity-achieving code of blocklength N = KT for a binary erasure channel (BEC) with erasure probability p is used (with asymptotic rate 1 − p bits/sub-channel use) [28] (Section 7.1.5). Block A is received free of interference and can be directly decoded at the receiver. Block B is affected by interference with probability (w.p.) p. Since the fading state B_{i,k} is known to the i-th receiver, interfered slots are treated as erasures. Consequently, when K tends to infinity, user i achieves the rate R_i = (n_d − n_c) + (1 − p)n_c. The sum rate R = R_1 + R_2 is thus given by R = 2(n_d − n_c) + 2(1 − p)n_c. This scheme is tight for VWI, and for WI and MI when p ≤ 1/2.
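The per-user rate of Scheme 1 is a one-line function of (n_d, n_c, p); a minimal sketch (the function name and the sample values are illustrative, not from the paper):

```python
def scheme1_user_rate(n_d, n_c, p):
    """Rate of user i under Scheme 1: uncoded bits on the top n_d - n_c
    sub-channels plus a BEC(p) capacity-achieving code on the bottom n_c."""
    return (n_d - n_c) + (1 - p) * n_c

# Illustrative example: n_d = 4, n_c = 1, p = 0.5
assert scheme1_user_rate(4, 1, 0.5) == 3.5
```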
Appendix B.2.2. Scheme 2 (WI, 1/2 < p ≤ 1)

We next consider the achievability scheme illustrated in Figure A3b. In blocks A and B uncoded data is transmitted (rate 1 bit/sub-channel use), block C carries the deterministic all-zeros sequence (rate 0 bits/sub-channel use), and in block D a capacity-achieving code for the BEC (with asymptotic rate 1 − p bits/sub-channel use) is used. As in Scheme 1, blocks A and B can be decoded without interference, and block D is decoded by treating interfered symbols as erasures. The rate achieved by this scheme at user i is R_i = (n_d − n_c) + (2n_c − n_d) + (1 − p)(2n_d − 3n_c), so the sum rate is R = 2n_c + 2(1 − p)(2n_d − 3n_c).

Appendix B.2.3. Scheme 3 (SI, 0 ≤ p ≤ 1/2)

We use an achievability scheme similar to Scheme 1. Now, the upper 2n_d − n_c sub-channels carry a capacity-achieving code for a BEC with erasure probability p, and the lower n_c − n_d sub-channels carry uncoded data. Consequently, when K tends to infinity, user i achieves the rate R_i = (n_c − n_d) + (1 − p)(2n_d − n_c). The sum rate R = R_1 + R_2 is thus given by R = 2(n_c − n_d) + 2(1 − p)(2n_d − n_c). This proves Theorem 5.
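The per-user rates of Schemes 2 and 3 can be sketched the same way (function names and the example parameter values are illustrative, not from the paper):

```python
def scheme2_user_rate(n_d, n_c, p):
    # Blocks A and B uncoded, block C all-zeros, block D a BEC(p) code.
    return (n_d - n_c) + (2 * n_c - n_d) + (1 - p) * (2 * n_d - 3 * n_c)

def scheme3_user_rate(n_d, n_c, p):
    # Top 2*n_d - n_c sub-channels coded for the BEC(p),
    # bottom n_c - n_d sub-channels uncoded.
    return (n_c - n_d) + (1 - p) * (2 * n_d - n_c)

# WI-like example (n_c/n_d = 0.6): at p = 1 only the uncoded blocks survive.
assert scheme2_user_rate(5, 3, 1.0) == 3
# SI-like example (n_c/n_d = 1.5): at p = 0 every sub-channel gets through.
assert scheme3_user_rate(2, 3, 0.0) == 2
```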

Appendix B.3. Proof of Theorem 8
In this section, we prove the converse bounds for global CSIRT and independent B_1^K and B_2^K.
Step (b) follows because (W_1, B^K) determines X_{1,k}, so we can subtract its contribution from Y_{1,k} and Y_{2,k}, and because conditioning reduces entropy.
Step (c) follows by evaluating the entropies for different interference states and because conditioning reduces entropy. Combining (A76) and (A77) yields (A78). By maximizing the entropies in (A78) over all input distributions, dividing by N = KT, and letting K tend to infinity, we obtain (52).

For global CSIRT, (X_1^K, X_2^K) may depend on B^K = b^K. At time k, the interference state B_k = b_k can be in one of the 4 possible cases A, B, C and D, as depicted in Figure A4, where shaded areas correspond to b_i = 1. We denote the length of each of these states by j_A, j_B, j_C and j_D, respectively. We shall denote by X_i^A, X_i^B, X_i^C and X_i^D the X_{i,k}'s with indices in A, B, C and D; for example, X_i^A = {X_{i,k} : k ∈ A}.

Figure A4. Possible interference states.

The converse bound (53) is proved as follows. We begin by applying Fano's inequality to obtain (A80), where ε_{1K} → 0 and ε_{2K} → 0 as N → ∞. For every b^K, we have (A81), where step (a) follows by the chain rule for entropy and because (W_1, B^K) determines X_1^K, so we can subtract its contribution from the second and fourth entropy.
Step (b) follows because conditioning reduces entropy. We next upper-bound (A81) by combining the positive and negative entropies in areas B and C for user 1 and user 2, and in areas A and B for user 2 and user 1, respectively, to obtain (A82), where step (a) follows because H(F) − H(G) ≤ H(F|G) for any random variables F and G. By maximizing the entropies in (A82) over all input distributions, we obtain (A83). By dividing (A83) by N = KT, and taking the limit as K → ∞, we obtain (A84), where (a) follows because (ε_{1K} + ε_{2K}) → 0 as K → ∞. Next, we apply the dominated convergence theorem (DCT) [29] (Section 1.34) to interchange limit and expectation. By the law of large numbers, the fractions j_A/K, j_B/K, j_C/K and j_D/K converge almost surely to the probabilities of the corresponding interference states; in particular, j_B/K → (1 − p)^2 almost surely as K → ∞. By replacing these probabilities in (A84), we obtain (53).
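The almost-sure convergence of the state fractions can be illustrated with a small Monte Carlo sketch. Consistently with the limit j_B/K → (1 − p)^2 quoted above, we assume state B is the one in which both independent Bernoulli(p) interference states are off; the parameter values are illustrative:

```python
import random

random.seed(0)
p, K = 0.3, 200_000
# Count coherence blocks in which B1 = B2 = 0 (no interference on either link).
count_b = 0
for _ in range(K):
    b1 = random.random() < p
    b2 = random.random() < p
    if not b1 and not b2:
        count_b += 1
# The empirical fraction approaches (1 - p)**2 = 0.49 as K grows.
assert abs(count_b / K - (1 - p) ** 2) < 0.01
```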
Appendix B.4. Proof of Theorem 9

In this section, we present the achievability schemes for global CSIRT and independent B_1^K and B_2^K. Let b^K denote the realizations of the interference states B^K, and define j_min ≜ min(j_A, j_B, j_C). Consider the following achievable schemes.
Both transmitters employ uncoded transmission in the first j_min indices of regions A and C, respectively, and in the whole region D. Tx 1 copies the first j_min indices of region A into region B, while Tx 2 copies the first j_min indices of region C into B, aligned with those of user 1. The remaining indices are treated as a non-bursty IC attaining rate r_ic = n_d − n_c/2 [20]. To illustrate the decoding process, Figure A5 shows the different normalized signals at Rx 1 when j_A = j_B = j_C = j_D = 1. Tx 1 transmits signal 1 in channel states A and B, signal 3 in state C, and signal 4 in state D. Similarly, Tx 2 transmits signal 2 in states B and C. Rx 1 has access to a clean copy of signal 1 in region A, which can then be subtracted in state B to recover the interfering signal 2. Since Tx 2 transmits the same signal in state C, the interference can then be canceled. Hence, signals 3 and 4 are recovered. For a given interference state and general j_A, j_B, j_C and j_D, the rate attained by user i with this scheme is given by (A86). Figure A5. Signal levels at Rx 1, normalized by n_d, for MI and j_A = j_B = j_C = j_D.
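The cancellation step at Rx 1 is ordinary GF(2) arithmetic. The following toy sketch (which ignores the level-shift matrices S^{n_d}, S^{n_c} and uses made-up signal vectors) mimics how the clean copy of signal 1 received in state A removes it from the superposition seen in state B:

```python
import random

random.seed(1)
sig1 = [random.randint(0, 1) for _ in range(8)]  # Tx1's signal, states A and B
sig2 = [random.randint(0, 1) for _ in range(8)]  # Tx2's signal, states B and C

rx_a = list(sig1)                            # state A: clean observation of sig1
rx_b = [a ^ b for a, b in zip(sig1, sig2)]   # state B: GF(2) superposition

recovered = [y ^ x for y, x in zip(rx_b, rx_a)]  # cancel the known signal
assert recovered == sig2
```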
Averaging (A86) over B^K, and letting K → ∞, we obtain the sum rate (A87), where we changed the order of limit and expectation by appealing to the DCT, and used the almost-sure limits of j_A/K, j_B/K, j_C/K and j_D/K given by the law of large numbers.

Both transmitters employ uncoded transmission in the first j_min indices of states A and C. Tx 1 copies the lowest 2n_d − n_c sub-channels of the first j_min indices of region A into the highest 2n_d − n_c sub-channels, and uses uncoded transmission in the lowest n_c − n_d sub-channels of the corresponding sub-region in B. Tx 2 proceeds analogously, but from region C to B. Both transmitters employ uncoded transmission in region D and treat the remaining indices as a non-bursty IC [20] with rate n_c/2. To illustrate the decoding process, Figure A6 shows the different normalized signals at Rx 1 when j_A = j_B = j_C = j_D = 1. Tx 1 transmits the signals (1, 2), (1, 3), 5 and 6 in channel states A, B, C and D, respectively. Similarly, Tx 2 transmits the signals (4, 7) and (4, 8) in states B and C, respectively. Rx 1 has access to clean copies of signals 1 and 2 in region A; signal 1 can then be subtracted in state B to recover the interfering signals 4 and 7. In state B, Rx 1 has access to signal 3. Since Tx 2 transmits signal 4 in state C, the interference can then be canceled. Hence, signal 5 can be recovered. Finally, signal 6 is recovered without interference. For a given interference state, and general j_A, j_B, j_C, j_D, the rate attained by user i with this scheme is given by (A88). Averaging (A88) over B^K, and letting K → ∞, we obtain the sum rate (A89), where we changed the order of limit and expectation by appealing to the DCT, and used the almost-sure limits given by the law of large numbers.

The converse bound (59) for global CSIRT follows similar steps as in Appendix B.3.1 but considering B_1^K = B_2^K = B^K. We next present the converse bound (60) for global CSIRT when B_1^K = B_2^K. This bound follows by giving the extra information (B^K S^{n_c} X_1^K) to Rx 1.
By Fano's inequality, we have (A90), where ε_{1K} → 0 as K → ∞. Analogously, by giving the extra information (B^K S^{n_c} X_2^K) to Rx 2, we obtain (A91), where ε_{2K} → 0 as K → ∞. Thus, (A90) and (A91) yield (A92), where we have used that conditioning reduces entropy. By maximizing the entropies in (A92) over all input distributions, dividing by N = KT, and letting K tend to infinity, we obtain (60).

Appendix C. Proof of Lemma A2
In this section, we prove Lemma A2. To this end, we first introduce definitions and properties that will be used in the proof of the lemma.
Definition A1 (Sup-entropy rate). The sup-entropy rate H̄(Y) is defined as the limsup in probability of (1/N) log (1/P_{Y^N}(Y^N)). Analogously, the conditional sup-entropy rate H̄(Y|X) is defined as the limsup in probability (according to {P_{X^N Y^N}}) of (1/N) log (1/P_{Y^N|X^N}(Y^N|X^N)).

Lemma A4 (Sup-entropy rate properties). Suppose (X, Y) takes values in (𝒳, 𝒴). The sup-entropy rate has the following properties, where |𝒴| denotes the cardinality of 𝒴.
We recall the information densities i_1(x_1^N, y_1^N, b) and i_2(x_2^N, y_2^N, b) defined in (A1) and (A2), respectively. By decomposing the logarithms and applying the Bayes rule to both probability terms, we obtain (A96). To shorten notation, we shall omit the arguments and write i_i ≜ i_i(x_i^N, y_i^N, b), i = 1, 2, wherever the arguments are clear from the context.
Recall the error events E_i(Γ_i) ≜ {(1/N) i_i ≤ Γ_i}, i = 1, 2, and E_12(Γ) ≜ {(1/N) i_1 + (1/N) i_2 ≤ Γ}, with Γ = Γ_1 + Γ_2, as defined in (A3) and (A4), respectively. We first note the inclusion (A97), which follows because the conditions (1/N) i_1 ≤ Γ_1 and (1/N) i_2 ≤ Γ_2 imply that (1/N) i_1 + (1/N) i_2 ≤ Γ. Then, (A98) follows by applying basic set operations. Using (A97) and (A98), and computing the probability of the corresponding events, we obtain (A99). For clarity of exposition, we define ε_b ≜ Pr{(1/N) i_1 + (1/N) i_2 ≤ Γ | B = b} and analyze the necessary conditions on Γ such that ε_b → 0 as N → ∞. We next consider separately the four possible realizations of B = b.
When B = [0, 0], the channel corresponds to two parallel channels with no interference links. In this case, the underlying distribution of the probability (A100) is given by (A101), as the outputs y_1^N and y_2^N must coincide with the corresponding inputs according to the deterministic model. To prove the constraint (A6), we use (A96) in (A100) to obtain (A102), where we used (A101). We consider now the conditional sup-entropy rates H̄(X_i^N|B), i = 1, 2. According to (A95) in Lemma A4, we have that H̄(X_i^N|B) ≤ n_d, i = 1, 2. With these considerations, if we set Γ = 2n_d + 2δ for some arbitrary δ > 0 in (A102), we obtain the lower bound (A103), where the last step follows from (A99).
Recalling the definition of the conditional sup-entropy rate H̄(X_i^N|B), we have that, for any δ > 0, the first probability on the RHS of (A103) tends to 1 as N → ∞, and the second probability on the RHS of (A103) tends to 0 as N → ∞. We conclude that, for any Γ > 2n_d, the lower bound in (A103) tends to 1 as N → ∞. Thus, ε_00 → 0 as N → ∞ only if Γ ≤ 2n_d.
When B = [0, 1], the channel corresponds to a two-user IC where only one of the transmitters interferes with its non-intended receiver. In this case, the underlying distribution in (A100) is given by (A105). We next prove the constraints (A6) and (A7) in Lemma A2.
Appendix C.2.1. Proof of Constraint (A6)

We lower-bound the probability ε_01 by that of two parallel channels and follow the steps in Appendix C.1. Indeed, by using (A96) in (A100) and lower-bounding the resulting log-probability terms, we obtain (A106). The RHS of (A106) then coincides with that of (A102), and the proof follows as in Appendix C.1.

According to (A105), the identities (i1)-(i3) hold w.p. 1. Using (A96) in (A100) and the identities (i1)-(i3), we obtain the expression (A107) for ε_01. We next define L̃_d ≜ L_d S^{n_d} and apply the chain rule of probability to obtain (A108). Using (A108) in (A107) and canceling the term log P_{S^{n_c} X_1^N|B}(S^{n_c} X_1^N|B), we obtain (A109). Consider the sup-entropy rates H̄(L_{(n_d−n_c)+} X_1^N | S^{n_c} X_1^N, B) and H̄(Y_2^N|B). By (A94) and (A95) in Lemma A4, we have the bounds (A110) and (A111). Let Γ = (n_d − n_c)+ + max(n_d, n_c) + 2δ for some arbitrary δ > 0. It follows that Γ ≥ H̄(L_{(n_d−n_c)+} X_1^N | S^{n_c} X_1^N, B) + H̄(Y_2^N|B) + 2δ, so (A109) can be lower-bounded as in (A112), where the second step follows from (A99). By the definition of the conditional sup-entropy rate, the first probability on the RHS of (A112) tends to 1 as N → ∞, and the second probability on the RHS of (A112) tends to 0 as N → ∞. This implies that ε_01 → 0 as N → ∞ only if Γ ≤ (n_d − n_c)+ + max(n_d, n_c), which proves conditions (A6) and (A7) in Lemma A2.
Remark A1. Given the symmetry of the problem, the constraints (A6) and (A7) for B = [1,0] are proven by swapping the roles of users 1 and 2, and following the same steps as for B = [0, 1].
Using (A115) and the identities (i1)-(i6), we obtain the expression (A116) for (A100). Using (A108) in (A116), canceling the term log P_{S^{n_c} X_1^N|B}(S^{n_c} X_1^N|B), and simplifying the remaining log-probability term, we obtain (A117). The RHS of (A117) coincides with (A109) conditioned on B = [1, 1]. The proof then follows the one in Appendix C.

We begin this proof by using (A96) to write the information density in a more convenient form, where (a) follows by adding and subtracting the same quantity and by rearranging terms.

Appendix D.3. Moderate Interference
We follow along similar lines to obtain the achievable rates for MI. However, in contrast to WI, for MI we need to consider different input distributions, depending on the value of α. In the proofs, we shall make use of the following auxiliary results, which can be proven by direct evaluation of the entropies considered.
To derive the achievable rates for MI, we again follow a random-coding argument where the codebooks are drawn i.i.d. at random. We next describe the input distributions for different values of α.

Appendix D.3.1. MI, 2/3 < α ≤ 3/4

Consider the regions shown in Figure A8 for the received signal at Rx 1. For the transmitted signal X_1, we denote the bits in region j by X_1^j, j ∈ {A, . . . , F}. In each of these regions we consider the following input distributions:

• Regions A and Ã: We group the bits X_1^A and X_1^Ã in pairs, and we let each of these pairs (X_1, X̃_1) be i.i.d. with the distribution from Lemma A6 with η_2 = 1.

• Regions B and F: The bits X_1^B and X_1^F are i.i.d. with marginal pmf P_{X_1|B_1}(1|0) = P_{X_1|B_1}(1|1) = 1/2. (A140)

• Region C: The bits X_1^C are i.i.d. with marginal pmf P_{X_1|B_1}(1|0) = p_1 (A141), P_{X_1|B_1}(1|1) = p_2 (A142), P_{X_1}(1) = p_3 = (1 − p)p_1 + pp_2. (A143)

• Region D: The bits X_1^D are i.i.d. with marginal pmf P_{X_1|B_1}(1|0) = p̃_1 (A144), P_{X_1|B_1}(1|1) = p̃_2. (A145)

• Region E: The bits X_1^E are i.i.d. with marginal pmf P_{X_1|B_1}(1|0) = p̂_1. (A147)

Furthermore, we assume that X_1^j, j ∈ {A, . . . , F}, are independent. For user 2, the input distributions coincide with those of user 1 in the corresponding regions, but with parameters q_i instead of p_i, q̃_i instead of p̃_i, q̂_1 instead of p̂_1, and γ_i instead of η_i. The terms in the other regions follow analogously. Therefore, using (A150), we obtain the rate of user 1. Similarly, user 2 achieves the rate (33).
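The marginal in (A143) is simply the Bernoulli mixture of the two conditional pmfs over the interference state B_1 ~ Bernoulli(p); a quick numeric check with illustrative parameter values (not from the paper):

```python
# P(X1 = 1) = (1 - p) * P(X1 = 1 | B1 = 0) + p * P(X1 = 1 | B1 = 1),
# i.e., p3 = (1 - p) * p1 + p * p2 as in (A143).
p, p1, p2 = 0.4, 0.2, 0.7   # illustrative values, not from the paper
p3 = (1 - p) * p1 + p * p2
assert abs(p3 - 0.40) < 1e-12
```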
• Region C: The bits X_1^C are i.i.d. with marginal pmf P_{X_1|B_1}(1|0) = p_1 (A185), P_{X_1|B_1}(1|1) = p_2. (A186)

Furthermore, we assume that X_1^j, j ∈ {A, B, C, D}, are independent. For Tx 2, the input distributions coincide with those of Tx 1 in the corresponding regions, but with parameters q_i instead of p_i, q̃_1 instead of p̃_1, and γ_1 instead of η_1. Following similar steps as in previous sections, we obtain the achievable rate pair (42) and (43).