Review

The Broadcast Approach in Communication Networks

1 Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
2 Faculty of Electrical Engineering, Technion—Israel Institute of Technology, Haifa 3200003, Israel
* Author to whom correspondence should be addressed.
The authors contributed equally to this work.
Entropy 2021, 23(1), 120; https://doi.org/10.3390/e23010120
Submission received: 24 November 2020 / Revised: 12 January 2021 / Accepted: 13 January 2021 / Published: 18 January 2021
(This article belongs to the Special Issue Multiuser Information Theory III)

Abstract: In this paper we review the theoretical and practical principles of the broadcast approach to communication over state-dependent channels and networks in which the transmitters have access to only the probabilistic description of the time-varying states while remaining oblivious to their instantaneous realizations. When the temporal variations are frequent enough, an effective long-term strategy is adapting the transmission strategies to the system's ergodic behavior. However, when the variations are infrequent, their temporal average can deviate significantly from the channel's ergodic mode, leaving the system without instantaneous performance guarantees. To circumvent this lack of short-term guarantees, the broadcast approach provides principles for designing transmission schemes that benefit from both short- and long-term performance guarantees. This paper provides an overview of how to apply the broadcast approach to various channels and network models under various operational constraints.

Contents
 
1 Motivation and Overview
1.1 What is the Broadcast Approach?
1.2 Degradedness and Superposition Coding
1.3 Application to Multimedia Communication

2 Variable-to-Fixed Channel Coding
2.1 Broadcast Approach in Wireless Channels
2.2 Relevance to the Broadcast Channel
2.3 The SISO Broadcast Approach—Preliminaries
2.4 The MIMO Broadcast Approach
    2.4.1 Weak Supermajorization
    2.4.2 Relation to Capacity
    2.4.3 The MIMO Broadcast Approach Derivation
    2.4.4 Degraded Message Sets
2.5 On Queuing and Multilayer Coding
    2.5.1 Queue Model—Zero-Padding Queue
    2.5.2 Delay Bounds for a Finite Level Code Layering
    2.5.3 Delay Bounds for Continuum Broadcasting
2.6 Delay Constraints
    2.6.1 Mixed Delay Constraints
    2.6.2 Broadcasting with Mixed Delay Constraints
    2.6.3 Parallel MIMO Two-State Fading Channel
    2.6.4 Capacity of Degraded Gaussian Broadcast Product Channels
    2.6.5 Extended Degraded Gaussian Broadcast Product Channels
    2.6.6 Broadcast Encoding Scheme
2.7 Broadcast Approach via Dirty Paper Coding

3 The Multiple Access Channel
3.1 Overview
3.2 Network Model
    3.2.1 Discrete Channel Model
    3.2.2 Continuous Channel Model
3.3 Degradedness and Optimal Rate Splitting
3.4 MAC without CSIT—Continuous Channels
3.5 MAC without CSIT—Two-State Channels: Adapting Streams to the Single-User Channels
3.6 MAC without CSIT—Two-State Channels: State-Dependent Layering
3.7 MAC without CSIT—Multi-State Channels: State-Dependent Layering
3.8 MAC with Local CSIT—Two-State Channels: Fixed Layering
3.9 MAC with Local CSIT—Two-State Channels: State-Dependent Layering
3.10 MAC with Local CSIT—Multi-State Channels: State-Dependent Layering

4 The Interference Channel
4.1 Overview
4.2 Broadcast Approach in the Interference Channel—Preliminaries
4.3 Two-User Interference Channel without CSIT
    4.3.1 Successive Decoding: Two-State Channel
    4.3.2 Successive Decoding: ℓ-State Channel
    4.3.3 Average Achievable Rate Region
    4.3.4 Sum-Rate Gap Analysis
4.4 N-User Interference Channel without CSIT
4.5 Two-User Interference Channel with Partial CSIT
    4.5.1 Two-User Interference Channel with Partial CSIT—Scenario 1
    4.5.2 Two-User Interference Channel with Partial CSIT—Scenario 2

5 Relay Channels
5.1 Overview
5.2 A Two-Hop Network
    5.2.1 Upper Bounds
    5.2.2 DF Strategies
    5.2.3 Continuous Broadcasting DF Strategies
    5.2.4 AF Relaying
    5.2.5 AQF Relay and Continuum Broadcasting
5.3 Cooperation Techniques of Two Co-Located Users
    5.3.1 Lower and Upper Bounds
    5.3.2 Naive AF Cooperation
    5.3.3 AF with Separate Preprocessing
    5.3.4 Multi-Session AF with Separate Preprocessing
    5.3.5 Multi-Session Wyner–Ziv CF
5.4 Transmit Cooperation Techniques
    5.4.1 Single-Layer Sequential Decode-and-Forward (SDF)
    5.4.2 Continuous Broadcasting
    5.4.3 Two Layer SDF—Successive Decoding
5.5 Diamond Channel
    5.5.1 Decode-and-Forward
    5.5.2 Amplify-and-Forward
5.6 Multi-Relay Networks
    5.6.1 Oblivious Relays
    5.6.2 Oblivious Agents
5.7 Occasionally Available Relays

6 Communications Networks
6.1 Overview
6.2 Multi-User MAC Broadcasting with Linear Detection
    6.2.1 Channel Model
    6.2.2 Strongest Users Detection—Overview and Bounds
    6.2.3 Broadcast Approach with Strongest Users Detection—(NO SIC)
    6.2.4 SIC Broadcast Approach Upper Bound
    6.2.5 Broadcast Approach with Iterative SIC
6.3 The Broadcast Approach for Source-Channel Coding
    6.3.1 SR with Finite Layer Coding
    6.3.2 The Continuous SR-Broadcasting
6.4 The Information Bottleneck Channel
    6.4.1 Uncertainty of Bottleneck Capacity
6.5 Transmitters with Energy Harvesting
    6.5.1 Optimal Power Allocation Densities
    6.5.2 Optimal Power Allocation over Time
    6.5.3 Grouping the Constraints
    6.5.4 Dominant Constraints
    6.5.5 Optimality of Algorithm 1

7 Outlook

A Constants of Theorem 7

B Corner Points in Figure 16

References

1. Motivation and Overview

1.1. What is the Broadcast Approach?

The information- and communication-theoretic models of a communication channel are generally specified by the probabilistic description of the channel’s input and output relationship. The output, in turn, depends on the channel input and the state process of the channel. The channel’s probabilistic description changes over time in various domains, rendering a time-varying channel state process. Such domains include, for instance, mobile wireless communications, storage systems, and digital fingerprinting, all of which have time-varying communication media. Reliable communication generally necessitates transmitting an encoded message over multiple channel uses. Therefore, temporal fluctuations in channel states can cause a significant impediment to sustaining reliable communications. When channel states are known to the transmitters, the encoders can be guided to adjust the transmission rates in response to the changes in the channel’s actual states. When a transmitter is informed of the channel state (e.g., via side information or feedback), it can adopt variable-length channel coding, the fundamental performance limits of which are well-investigated [1,2,3,4,5].
While desirable, informing the transmitters of the time-varying state process can be practically prohibitive in a wide range of existing or emerging communications technologies. In such circumstances, while the encoders cannot adapt their transmissions to channel states, there is still the possibility of adapting the decoders to the channel states. The information-theoretic limits of communication over such state-dependent channels when the transmitters have only access to the statistical description of the channel state process is studied broadly under the notion of variable-rate channel coding [6]. When the temporal variations are frequent enough, an effective long-term strategy is adapting the transmission strategies to the system’s ergodic behavior. However, when these variations are infrequent, their temporal average can deviate significantly from the channel’s ergodic mode, rendering the ergodic metrics (e.g., ergodic capacity) unreliable performance targets.
State-dependent channels appear in various forms in communication systems. A prevalent example is mobile wireless channels, which undergo fading processes. Fading induces time-varying states for the channel, resulting in uncertainty about the network’s state at all transmitter and receiver sites [7]. Other examples include opportunistic scheduling, in which the transmitter adjusts encoding and transmission based on a quality-of-service metric that depends on the state of the channel [8,9,10], e.g., signal-to-noise ratio, latency, and throughput; opportunistic spectrum access (across time, space, and frequency); and cognitive radio communication, in which the quality of communication relies on the access to the spectrum resources [11,12]. This survey paper focuses primarily on the fading process in different network models and the mechanisms for circumventing transmitters’ lack of information about random fading processes. Nevertheless, most techniques that we will review can be adjusted to cater to other forms of state-dependent channels as well.
When wireless channels undergo fading, a useful convention to circumvent uncertainties about the fading process is establishing training sessions to estimate channel states. Such sessions should repeat periodically, commensurate with how frequently the states vary. Depending on the multiplexing mode in a communication channel, the training sessions are either bidirectional (e.g., in frequency-division multiplexing systems), or they are unidirectional and followed by feedback sessions (e.g., in time-division multiplexing systems). While effective in delivering the channel state to the receiver sites, both mechanisms face various challenges for delivering the same information to the transmitters. For instance, establishing channels in both directions is not always feasible, and even when it is, feedback communication incurs additional costs and imposes additional latency. Such impediments are further exacerbated as the size of a network grows.
When the probabilistic model of the process is known, an alternative approach to channel training and estimation is hedging against the random fluctuations. When the fluctuations are rapid enough, an effective long-term strategy is adapting the transmission strategies to the system’s ergodic behavior. A widely-used instance of this is the ergodic capacity as a reliable transmission rate for a channel that undergoes a fast-fading process. On the other hand, when the fluctuations occur in time blocks, which is often the case, an effective strategy is the outage strategy, aiming to meet target reliability with a pre-specified probabilistic guarantee. An example of an outage strategy is adopting the notion of outage capacity, which evaluates the likelihood of reliable communication at a fixed transmission rate [13]. When the actual channel realization can sustain the rate, the transmission is carried out successfully; otherwise, it fails, and no message is decoded [7,13]. The notions of outage and delay-limited capacities are studied extensively for various networks, including the multiple access channel (c.f. [14,15,16,17,18,19] and references therein).
While the ergodic and outage approaches provide long-term probabilistic performance guarantees, they lack instantaneous guarantees. That is, each communication session faces a chance of complete failure. For instance, when the channel’s instantaneous realization does not sustain a rate equal to the ergodic or outage capacity, the entire communication session over that channel will be lost. To circumvent a lack of short-term guarantees, the broadcast approach provides principles for designing transmission schemes that benefit from both short- and long-term performance guarantees. In information-theoretic terms, the broadcast approach is called variable-to-fixed channel coding [6].

1.2. Degradedness and Superposition Coding

The broadcast approach ensures a minimum level of successful communication, even when the channels are in their weakest states. In this approach, any channel realization is viewed as a broadcast receiver, rendering an equivalent network consisting of several receivers. Each receiver is designated to a specific channel realization, and it is degraded with respect to a subset of other channels. Designing a broadcast approach for a channel model has the following two pivotal elements.
1. 
Degradedness in channel realizations: The first step in specifying a broadcast approach for a given channel pertains to designating a notion of degradedness that facilitates rank-ordering different realizations of a channel based on their relative strengths. The premise for assigning such degradedness is that if communication is successful in a specific realization, it will also be successful in all realizations considered stronger. For instance, in a single-user single-antenna wireless channel that undergoes a flat-fading process, the fading gain can be a natural degradedness metric. In this channel, as the channel gain increases, the channel becomes stronger. Adopting a proper degradedness metric hinges on the channel model. While it can emerge naturally for some channels (e.g., single-user flat-fading), in general, selecting a degradedness metric is rather heuristic, if possible at all. For instance, in the multiple access channel, the sum-rate capacity can be used as a metric to designate degradedness, while in the interference channel, comparing different network realizations, in general, is not well-defined.
2. 
Degradedness in message sets: Parallel to degradedness in channel realization, in some systems, we might have a natural notion of degradedness in the message sets as well. Specifically, in some communication scenarios (e.g., video communication), the messages can be naturally divided into multiple ordered layers that incrementally specify the entire message. In such systems, the first layer conveys the baseline information (e.g., the lowest quality version of a video); the second layer provides additional information that incrementally refines the baseline information (e.g., refining video quality), and so on. Such a message structure specifies a natural way of ordering the information layers, which should also be used by the receiver to retrieve the messages successfully. Specifically, the receiver starts by decoding the baseline (lowest-ranked) layer, followed by the second layer, and so on. While some messages have inherent degradedness structures (e.g., audio/video signals), that is not the case in general. When facing messages without an inherent degradedness structure, a transmitter can still split a message into multiple, independently generated information layers. The decoders, which are not constrained by decoding the layers in any specific order, will decode as many layers as they afford based on the actual channel realization.
In a communication system, in general, the states of degradedness in channel realizations and degradedness in message sets can vary independently. Subsequently, designing a broadcast approach for a communication system hinges on its channel and message degradedness status. By leveraging the intuitions from the known theories on the broadcast channel, we briefly comment on different combinations of the degradedness states.
  • Degraded message sets. A message set with an inherent degradedness structure enforces a prescribed decoding order for the receiver.
    -
    Degraded channels. When there is a natural notion of degradedness among channel realizations (e.g., in the single-user single-antenna flat-fading channel), we can designate one message to each channel realization such that the messages are rank-ordered in the same way that their associated channels are ordered. At the receiver side, based on the actual realization of the channel, the receiver decodes the messages designated to the weaker channels, e.g., in the weakest channel realization, the receiver decodes only the lowest-ranked message, and in the second weakest realization, it decodes the two lowest-ranked messages, and so on. Communication over a parallel Gaussian channel is an example in which one might face degradedness both in the channel and the message [20].
    -
    General channels. When lacking a natural notion of channel degradedness (e.g., in the single-user multi-antenna channel or the interference channel), we generally adopt an effective (even though imperfect) approach to rank order channel realizations. These orders will be used to prescribe an order according to which the messages will be decoded. The broadcast approach in such settings mimics the Körner–Marton coding approach for broadcast transmission with degraded message sets [21]. This approach is known to be optimal for a two-user broadcast channel with a degraded set of messages, while the optimal strategy for the general broadcast approach is an open problem despite the significant recent advances, e.g., [22].
  • General message sets. Without an inherent degradedness structure in the message, we have more freedom to generate the message set and associate the messages to different channel realizations. In general, each receiver has the freedom to decode any desired set of messages in any desired order. The single-user multi-antenna channel is an important example in which such an approach works effectively [23]. In this setting, while the channel is not degraded in general, different channel realizations are ordered based on the singular values of the channel matrix’s norm, which implies an order in channel capacities. In this setting, it is noteworthy that the specific choice of ordering the channels and assigning the set of messages decoded in each realization induces degradedness in the message set.
Built based on these two principles, and following the broadcast approach to compound channels [24], the notion of broadcast strategy for slowly fading single-user channel was initially introduced for effective single-user communication [25].

1.3. Application to Multimedia Communication

The broadcast approach has a wide range of applications that involve successive and incremental retrieval of information sources. Representative examples include image compression and video coding systems, which can be naturally integrated with the successive refinement techniques [26,27]. Specifically, the broadcast approach’s underlying premise is to allow the receivers to decode the messages only partially, as much as the channels’ actual instantaneous realizations allow. This is especially relevant in audio/video broadcast systems, in which even partially decoding the messages still renders signals that are aurally or visually interpretable or recognizable. In these systems, a transmitter is often oblivious to the instantaneous realization of the channels, and the quality of its channel shapes the quality of the audio or video signal recovered. This is also the principle widely used in communication with successive refinement, in which a message is split into multiple layers. A baseline layer carries the minimal content that allows decoding an acceptable message. The subsequent layers successively and progressively add more details to the message, refining its content and quality. This approach enables digitally achieving a key feature of analog audio and video transmission: the quality of communication is a direct function of the channel quality, while there is no channel state information at the transmitter.
In this review paper, we start by reviewing the core ideas in designing a broadcast approach in the single-user wireless channel in Section 2. In this section, we address both single-antenna and multi-antenna systems under various transmission constraints. Next, we provide an overview of the applications to the multiple access channel in Section 3. This section discusses settings in which transmitters are either entirely or partially oblivious to the channel states. Section 4 and Section 5 will be focused on the interference channel and the relay channel, respectively. A wide range of network settings will be discussed in Section 6, and finally, Section 7 provides a perspective on the possible directions for extending the theory and applications of the broadcast approach.

2. Variable-to-Fixed Channel Coding

As pointed out earlier, the broadcast approach is, in essence, a variable-to-fixed channel coding [6] for a state-dependent channel, where the state realization is known only at the receiver. While being oblivious to the channel realizations, the transmitter has access to the probabilistic description of the channel. The key idea underpinning the broadcast approach is splitting the transmitted message into multiple independent layers and providing the receiver with the flexibility to decode as many layers as it affords, depending on the channel’s actual state. While the concept is general and can be applied to a wide range of state-dependent channels, in this paper we focus on wireless channels.

2.1. Broadcast Approach in Wireless Channels

In wireless communications, the channels undergo random fading processes. In these systems, the channel state corresponds to a fading gain, and the channel state statistical description is characterized by the probability model of the fading process [7,23,25,28]. The relative duration of the channel’s coherence time to the system’s latency requirement specifies the channel’s fading condition. Specifically, slow (fast) fading arises when the channel’s coherence time is large (small) relative to the system’s latency requirement. In particular, slowly fading channels commonly arise when a mobile front-end moves slowly relative to the data transmission rate. Such a model is especially apt in modern communication systems with high spectral efficiency and data rates.
In systems with slowly-fading channels, a receiver can estimate the channel fading coefficients with high accuracy. This motivates considering the instantaneous and perfect availability of the channel state information (CSI) at the receiver sites. On the other hand, acquiring such CSI at the transmitter sites (CSIT) can be either impossible, due to the lack of a backward channel from a receiver to its respective transmitter; prohibitive, due to the extensive costs associated with backward communication; or unhelpful, due to a mismatch between the stringent latency constraints and the frequency of backward communication. Hence, in these circumstances, properly circumventing the lack of perfect CSIT plays a pivotal role in designing effective communication schemes.
Capitalizing on the system’s ergodic behavior (e.g., setting the transmission rate to the ergodic capacity of a channel) effectively addresses the lack of CSIT [7]. However, this is viable only when the transmission is not facing any delay constraints, and the system is allowed to have sufficiently long transmission blocks (relative to the fading dynamics). In particular, in a highly dynamic channel environment, stringent delay constraints imply that a transmission block (while still being large enough for reliable communication [13]) is considerably shorter than the dynamics of the slow fading process. To quantify the quality of communication in such circumstances, the notion of capacity versus outage was introduced and discussed in [7,13] (and references therein). A fundamental assumption in these systems is that the fading process variations throughout the transmission block are negligible. In an outage strategy, the transmission rate is fixed, and the information is reliably retrieved by the receiver when the instantaneous channel realizations allow. Otherwise, communication fails (an outage event). In such systems, the term outage capacity refers to the maximal achievable average rate. It can also be cast as the capacity of an appropriately defined compound channel [7]. The main shortcoming of the outage approach to designing transmission is the possibility of outage events, which translates to a possibly significant loss in spectral efficiency.
The broadcast approach aims to avoid outage events while the transmitters remain oblivious to the state of their channels. In this approach, reliable transmission rates are adapted to the actual channel conditions without providing feedback from the receiver to the transmitter. This approach’s origins are discussed in Cover’s original paper [24], which suggests using a broadcast approach for the compound channel. Since the slowly-fading channel can be viewed as a compound channel with the channel realization as the parameter of the compound channel, transmission over these channels can be naturally viewed and analyzed from the perspective of the broadcast approach. This strategy is useful in various applications, and in particular, it is in line with the successive refinement source coding approach of [29] and the subsequent studies in [30,31,32,33,34]. Specifically, the underlying premise is that the higher the provided information rate, the lower the average distortion in the reconstructed source.
An example of successive refinement in source coding is image compression, in which a gross description exists at first and, gradually, with successive improvements of the description, the image quality is further refined. An application example is progressive JPEG encoding, where additional coded layers serve to refine the image quality. In the broadcast approach, the transmitter sends layered coded information and, viewing the receiver as a continuum of ordered users, the maximum number of layers successively decoded is dictated by the fading channel realization. Thus, the channel realization influences the received quality of the data. The broadcast approach has a practical appeal in cellular voice communication systems, where layered voice coding is possible. Service quality, consequently, depends on the channel realization. This facilitates using coding to achieve the basic feature of analog communications, that is, the better the channel, the better the performance, e.g., the measured signal-to-noise ratio (SNR) or the received minimum mean-squared error (MMSE). All of this is viable while the transmitters are unaware of the channel realizations. Other applications can be found in [35]. The problem of layered coding calls for unequal error protection of the transmitted data, which was studied in [36] (and references therein). A related subject is priority encoding transmission (PET). The study in [37] shows that sending hierarchically organized messages over lossy packet-based networks can be analyzed using the broadcast erasure channel with a degraded message set, using the information spectrum approach [38]. Finally, we remark that [39] extends the notion to settings in which the probabilistic model is unknown to the transmitter.

2.2. Relevance to the Broadcast Channel

Since the broadcast approach’s foundations hinge on those of the broadcast channel, we provide a brief overview of the pertinent literature on the broadcast channel, which was first explored by Cover [24,40]. In a broadcast channel, a single transmission is directed to a number of receivers, each enjoying possibly different channel conditions, reflected in their received SNRs. The Gaussian broadcast channel with a single transmit antenna coincides with the classical physically degraded Gaussian broadcast channel, whose capacity region is well known (see [40] for the deterministic case and [41,42,43] for the composite or ergodic cases). For multiple transmit antennas, the Gaussian broadcast channel is, in general, a non-degraded broadcast channel, for which the capacity region with a general message set is not fully known [44,45,46,47,48], and it cannot be reduced to an equivalent set of parallel degraded broadcast channels, as studied in [41,42,43,49]. In the special case of individual messages without common broadcasting, the capacity region in the multi-antenna setting was characterized in [50].
Broadcasting to a single user essentially means broadcasting common information. Information-theoretic results and challenges for broadcasting a common source are discussed in [51], and in light of endless information, data transmission is termed streaming in [52]. One interpretation of single-user broadcasting is hierarchical broadcasting using multi-level coding (MLC) [53,54,55]. The study in [54] demonstrates the spectral efficiency of MLC with hierarchical demodulation in an additive white Gaussian noise (AWGN) channel and a fading channel. The authors in [56] examine the interleaved fading channel with one bit of side information about the fading process. The broadcast approach is adapted to decode different rates for channels taking these two distinct states (determined by whether the SNR is above or below a threshold value). Since the channel is memoryless, the average rate, given by the mutual information $I(y,\hat{s};x)$ (where $x$ is the channel input, $y$ is the channel output, and $\hat{s}$ is the partial state information), is achievable. This is not the case with the broadcast approach, which seems ill-suited to this setting, where the channel states are assumed to be independent and identically distributed (i.i.d.).
Finally, the study in [57] considers a superposition coding scheme to achieve higher transmission rates in the slowly-fading channel. This study adopts the broadcast approach for the single-input single-output (SISO) channel with a finite number of receivers. The number of receivers is the number of coded layers. It is evident from [57] that for the SISO channel, a few levels of coded layering closely approximates the optimal strategy employing transmission of infinite code layers.

2.3. The SISO Broadcast Approach—Preliminaries

In this section, we elaborate on the original broadcast approach, first presented in [25], and we provide the derivation of the expressions related to the broadcast approach concept, an optimal power distribution, and the associated average achievable rates under different system constraints. We start by providing a canonical channel model for the single-user single-antenna system. The fading parameter realization can be interpreted as an index (possibly continuous), which designates the SNR at the receiver of interest. This model also serves as the basis for other channel models discussed in the rest of the paper. Specifically, consider the channel model:
$$ y = hx + n, $$
where x is the transmitted complex symbol, y is the received symbol, and n accounts for the AWGN with zero mean and unit variance, denoted by $\mathcal{CN}(0,1)$. The constant h represents the fading coefficient. For each realization of h, there is an achievable rate. We are interested in the average achievable rate for various independent transmission blocks. Thus, we present the results in terms of average performance, averaged over the distribution of h.
Information-theoretic considerations for this simple model were discussed in [13] (and references therein), as a special case of the multi-path setting. With the value of h known to the transmitter, and with a short-term power constraint (excluding power optimization across different blocks), the reliable rate averaged over many block realizations is given by
$$ C_{\rm erg} = \mathbb{E}_s\!\left[\log(1+sP)\right], $$
where $s = |h|^2$ is the random fading power. The normalized SNR, following the channel model definition (1), is denoted by $P = \mathbb{E}[|x|^2]$, where $\mathbb{E}$ stands for the expectation operator (when a subscript is added, it specifies the random variable with respect to which the expectation is taken).
The SISO channel defined in (1) is illustrated in Figure 1a, and its associated broadcast channel is depicted in Figure 1b. This figure also illustrates the broadcast approach, according to which the transmitter sends an infinite number of coded information layers. The receiver is equivalent to a continuum of ordered users, each decoding a coded layer if the channel realization allows. In general, the number of coded layers (and, respectively, receivers) depends on the cardinality of the fading power random variable (RV). Specifically, in a Gaussian fading channel, a continuum of coded layers is required. Predetermined ordering is achieved due to the degraded nature of the Gaussian SISO channel [40]. Each of the users has to decode a fractional rate, denoted by dR in Figure 1b. The fractional rates dR of the different users are not equal but depend on the receiver index. For some fading realization $h^{(j)}$, only the continuum of receivers up to receiver j can decode their fractional rates dR. The first receiver decodes only its own dR; the second first decodes the interfering dR (the information intended for the first receiver) and then decodes its own dR. Finally, receiver j decodes all fractional interference layers up to layer j-1, and then decodes its own information layer dR. Hence, the total achievable rate for a realization $h^{(j)}$ is the integral of dR over all receivers up to j. This model is the general case of coded layering. The broadcast approach of [25] with a finite number of code layers, also termed superposition coding, is presented in [57]. In finite-level code layering, only a finite set of ordered receivers is required. This approach has a lower decoding complexity; however, it is sub-optimal relative to the continuum broadcast approach.
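To make finite-level layering concrete, the following minimal sketch (ours, not taken from [57]) computes the two layer rates under two-level superposition coding with successive decoding; the total SNR, the power split, and the two fading-power realizations are assumed values chosen purely for illustration.

```python
import numpy as np

# Two-level superposition coding over a two-state fading channel.
# All numerical values below are illustrative assumptions.
P = 10.0                      # total transmit SNR
alpha = 0.7                   # fraction of power assigned to layer 1 (the baseline layer)
s_weak, s_strong = 0.3, 2.0   # fading-power realizations of the two equivalent receivers

P1, P2 = alpha * P, (1.0 - alpha) * P

# Layer 1 must be decodable in the weak state, with layer 2 treated as noise.
R1 = np.log(1.0 + s_weak * P1 / (1.0 + s_weak * P2))

# The strong state first decodes layer 1 (its SINR for layer 1 is higher),
# cancels it, and then decodes layer 2 free of interference.
R2 = np.log(1.0 + s_strong * P2)

print(f"weak state decodes   R1      = {R1:.3f} nats/channel use")
print(f"strong state decodes R1 + R2 = {R1 + R2:.3f} nats/channel use")
```

The same computation with more layers, and in the limit a continuum of layers, leads to the power-allocation problem treated next.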
Next, assume that the fading power RV S is continuous. Then for some channel realization $h^{(j)}$ of Figure 1b, with a fading power $s^{(j)}$, the designated reliably conveyed information rate is denoted by $R(s^{(j)})$. We now drop the superscript j, and refer to s as the realization of the fading power RV S. As illustrated, the transmitter views the fading channel as a degraded Gaussian broadcast channel [40] with a continuum of receivers, each experiencing a different effective receive SNR specified by $s \cdot P$. The total transmitted power P is also the SNR, as the fading and additive noise are normalized according to (1). The term s is, therefore, interpreted as a continuous index. By noting that $\log(1+x) \approx x$ for small enough $x > 0$, the incremental differential rate is given by
$$ dR(s) = \log\!\left(1+\frac{s\,\rho(s)\,ds}{1+s I(s)}\right) = \frac{s\,\rho(s)\,ds}{1+s I(s)}, $$
where $\rho(s)\,ds$ is the transmit power associated with the layer parameterized by s and intended for receiver s; the function $\rho(s)$ thus designates the transmit power density over the layers. The right-hand-side equality is justified in [58]. Information streams intended for receivers indexed by $u > s$ are undetectable and are treated as additional interfering noise, denoted by $I(s)$. The interference for a fading power s is
$$ I(s) = \int_s^{\infty} \rho(u)\,du, $$
which is also a monotonically decreasing function of s. The total transmitted power is the overall collected power assigned to all layers, i.e.,
$$ P = \int_0^{\infty} \rho(u)\,du = I(0). $$
As mentioned earlier, the total achievable rate for a fading realization s is an integration of the fractional rates over all receivers with successful layer decoding capability, rendering
$$ R(s) = \int_0^{s} \frac{u\,\rho(u)\,du}{1+u I(u)}. $$
The average rate is achieved with sufficiently many transmission blocks, each viewing an independent fading realization. Therefore, the total rate averaged over all fading realizations is
$$ R_{\rm bs} = \int_0^{\infty} du\, f(u)\, R(u) = \int_0^{\infty} du\, \big(1-F(u)\big)\, \frac{u\,\rho(u)}{1+u I(u)}, $$
where $f(u)$ is the probability density function (PDF) of the fading power, and
$$ F(u) = \int_0^{u} da\, f(a), $$
is the corresponding cumulative distribution function (CDF).
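As a numerical illustration of (3)-(8), the following sketch (ours) evaluates the residual interference I(s), the cumulative rate R(s), and the average rate R_bs by direct quadrature; the uniform layering density, its support, and the exponential fading-power distribution (the Rayleigh example revisited below) are assumptions made only for illustration, not the optimal choices.

```python
import numpy as np
from scipy.integrate import quad

P = 10.0               # total transmit SNR (assumed)
s_lo, s_hi = 0.2, 3.0  # assumed support of the layering density

def rho(s):
    # An arbitrary uniform power density over [s_lo, s_hi]; not the optimized one.
    return P / (s_hi - s_lo) if s_lo <= s <= s_hi else 0.0

def I(s):
    # Residual interference, Eq. (4): power of the layers indexed above s.
    return quad(rho, max(s, s_lo), s_hi)[0] if s < s_hi else 0.0

def dR_density(u):
    # Integrand u*rho(u)/(1 + u*I(u)) of Eqs. (6)-(7), cf. Eq. (3).
    return u * rho(u) / (1.0 + u * I(u))

def R(s):
    # Cumulative decodable rate for fading power s, Eq. (6).
    return quad(dR_density, 0.0, s, limit=200)[0]

# Average rate, Eq. (7), with 1 - F(u) = exp(-u) for an exponential fading power.
R_bs = quad(lambda u: np.exp(-u) * dR_density(u), s_lo, s_hi, limit=200)[0]

print(f"I(0)    = {I(0.0):.3f}  (equals the power constraint P = {P})")
print(f"R(s_hi) = {R(s_hi):.3f} nats/use,  R_bs = {R_bs:.3f} nats/use")
```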
Optimizing $R_{\rm bs}$ with respect to the power distribution $\rho(s)$ (or equivalently with respect to $I(u)$, for $u \ge 0$) under the power constraint P (5) is of interest and can in certain cases be carried out by solving the associated constrained Euler equation [59]. We turn back to the expression in (7), corresponding to $s_{\rm th} = 0$, and explicitly pose the optimization problem
$$ R_{\rm bs,max} = \max_{I(u)} \int_0^{\infty} du\, \big(1-F(u)\big)\, \frac{u\,\rho(u)}{1+u I(u)}, $$
where we maximize $R_{\rm bs}$ (7) over the residual interference function $I(u)$. For an extremum function $I(x)$, the variation of the functional (9) is zero [59], corresponding to a proper Euler equation, which yields the extremal solution for $I(x)$. Let us first present the functional of (9) subject to maximization,
$$ S\big(x, I(x), I'(x)\big) = -\big(1-F(x)\big)\,\frac{x I'(x)}{1+x I(x)}. $$
The necessary condition for a maximum of the integral of $S\big(x, I(x), I'(x)\big)$ over x is a zero variation of the functional ([59], Theorem 2, Section 3.2). Correspondingly, the Euler equation is given by
$$ S_I - \frac{d}{dx} S_{I'} = 0, $$
where
$$ S_I = \big(1-F(x)\big)\,\frac{x^2 I'(x)}{\big(1+x I(x)\big)^2}, $$
$$ S_{I'} = -\big(1-F(x)\big)\,\frac{x}{1+x I(x)}, $$
$$ \frac{d}{dx} S_{I'} = \frac{x f(x)}{1+x I(x)} + \big(1-F(x)\big)\,\frac{x^2 I'(x) - 1}{\big(1+x I(x)\big)^2}. $$
These relationships reduce the differential Equation (11) to an equation that is linear in $I(x)$, providing the following closed-form solution:
$$ I(x) = \begin{cases} \dfrac{1-F(x)-x f(x)}{x^2 f(x)}, & x_0 \le x \le x_1 \\ 0, & \text{else}, \end{cases} $$
where $x_0$ is determined by $I(x_0) = P$, and $x_1$ by $I(x_1) = 0$. All the analyses are also valid for the single-input multiple-output (SIMO) and multiple-input single-output (MISO) channels as long as the channels are degraded, regardless of the number of receive antennas in SIMO or transmit antennas in MISO. The number of transmit or receive antennas only affects the fading power distribution CDF. As an example, consider a SISO Rayleigh flat fading channel for which the fading power S has an exponential distribution with pdf
$$ f(u) = e^{-u}, \quad \text{and} \quad F(u) = 1-e^{-u}, \quad u \ge 0. $$
The optimal transmitter power distribution that maximizes $R_{\rm bs}$ in (9) is specified by substituting $f(u)$ and $F(u)$ from (16) into (15), resulting in
$$ \rho(s) = -\frac{d}{ds} I(s) = \begin{cases} \dfrac{2}{s^3} - \dfrac{1}{s^2}, & s_0 \le s \le s_1 \\ 0, & \text{else}. \end{cases} $$
The constant $s_0$ is determined by solving $I(s_0) = P$, and it is given by
$$ s_0 = \frac{2}{1+\sqrt{1+4P}}. $$
Similarly, $s_1$ can be found by solving $I(s_1) = 0$, which indicates $s_1 = 1$. The corresponding rate $R(s)$ using (6) is
$$ R(s) = \begin{cases} 0, & 0 \le s \le s_0 \\ 2\ln\!\left(\dfrac{s}{s_0}\right) - (s - s_0), & s_0 \le s \le 1 \\ -2\ln(s_0) - (1 - s_0), & s \ge 1, \end{cases} $$
and following (7), the associated total average rate is
$$ R_{\rm bs} = 2E_i(s_0) - 2E_i(1) - \big(e^{-s_0} - e^{-1}\big), $$
where
$$ E_i(x) = \int_x^{\infty} \frac{e^{-t}}{t}\,dt, \quad x \ge 0, $$
is the exponential integral function. The limiting behavior of $R_{\rm bs}$ is found to be
$$ R_{\rm bs} \approx \begin{cases} \ln\dfrac{P}{9.256}, & P \gg 1 \\ e^{-1} P, & P \to 0. \end{cases} $$
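The closed-form Rayleigh expressions are easy to check numerically. The sketch below (ours) evaluates $s_0$ from (18) and $R_{\rm bs}$ from (20), using SciPy's exp1, which matches the definition of $E_i(x)$ in (21), and compares the result with the high-SNR behavior in (22); the SNR values in the loop are arbitrary.

```python
import numpy as np
from scipy.special import exp1  # exp1(x) = int_x^inf e^{-t}/t dt, i.e., E_i(x) in (21)

def s0(P):
    # Threshold solving I(s0) = P for Rayleigh fading, Eq. (18).
    return 2.0 / (1.0 + np.sqrt(1.0 + 4.0 * P))

def R_bs(P):
    # Average broadcast rate for Rayleigh fading, Eq. (20), in nats per channel use.
    s = s0(P)
    return 2.0 * exp1(s) - 2.0 * exp1(1.0) - (np.exp(-s) - np.exp(-1.0))

for P in [1.0, 10.0, 100.0, 1000.0]:
    print(f"P = {P:7.1f}:  R_bs = {R_bs(P):6.3f} nats,  "
          f"ln(P/9.256) = {np.log(P / 9.256):6.3f}")
```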
The ergodic capacity in this case is given by [13],
$$ C_{\rm erg} = e^{1/P}\, E_i\!\left(\frac{1}{P}\right) \approx \begin{cases} \ln\dfrac{P}{1.78}, & P \gg 1 \\ P, & P \to 0. \end{cases} $$
The average achievable rate of the standard outage approach depends on the outage probability $P_{\rm out} = \Pr\{s \le s_{\rm th}\} = 1 - e^{-s_{\rm th}}$. Thus, the achievable outage rate is given by
$$ R_o(s_{\rm th}) = e^{-s_{\rm th}} \log\!\big(1 + s_{\rm th} P\big), $$
where $R_o(s_{\rm th})$ is the average achievable rate of a single-layered code for a parameter $s_{\rm th}$. That is, a rate of $\log(1 + s_{\rm th} P)$ is achieved when the fading power realization is greater than $s_{\rm th}$, which happens with probability $e^{-s_{\rm th}}$. The outage capacity is obtained by maximizing the achievable outage average rate (24) with respect to the outage probability (or the fading power threshold $s_{\rm th}$). This yields the outage capacity
$$ R_{o,\max} = e^{-s_{\rm th,opt}} \log\!\big(1 + s_{\rm th,opt} P\big), $$
where $s_{\rm th,opt}$ solves the equation
$$ \log\!\big(1 + s_{\rm th,opt} P\big) = \frac{P}{1 + s_{\rm th,opt} P}, $$
and it can be expressed in closed-form as
$$ s_{\rm th,opt} = \frac{P - W_L(P)}{W_L(P)\cdot P}, $$
where $W_L(P)$ is the Lambert-W function, also known as the Omega function, which is the inverse of the function $f(W) = W e^{W}$. Subsequently, the outage capacity is given by [60]
$$ R_{o,\max} = e^{-\frac{P - W_L(P)}{W_L(P)\cdot P}} \cdot \log\!\left(\frac{P}{W_L(P)}\right) \approx \begin{cases} \ln\dfrac{P}{W_L(P)}, & P \gg 1 \\ e^{-1} P, & P \to 0. \end{cases} $$
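The outage expressions (24)-(28) can be cross-checked numerically. The sketch below (ours) evaluates the closed form through SciPy's Lambert-W implementation and compares it with a direct maximization of (24) over the threshold; the SNR values and the search bounds are assumptions.

```python
import numpy as np
from scipy.special import lambertw
from scipy.optimize import minimize_scalar

def outage_capacity_closed_form(P):
    # Eqs. (27)-(28): optimal threshold via the Lambert-W function; rate in nats/use.
    w = np.real(lambertw(P))          # principal branch, real for P > 0
    s_opt = (P - w) / (w * P)
    return np.exp(-s_opt) * np.log(1.0 + s_opt * P)

def outage_capacity_numeric(P):
    # Direct maximization of Eq. (24) over the threshold s_th.
    objective = lambda s: -np.exp(-s) * np.log(1.0 + s * P)
    res = minimize_scalar(objective, bounds=(1e-6, 10.0), method="bounded")
    return -res.fun

for P in [1.0, 10.0, 100.0]:
    print(f"P = {P:6.1f}:  closed form = {outage_capacity_closed_form(P):.4f},"
          f"  numeric = {outage_capacity_numeric(P):.4f}")
```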
The study in [61] provides an interesting interpretation for the basics of the broadcast approach [25] from the I-MMSE perspective.
When a transmitter has full CSI and transmits at a fixed power P, the transmission rate can be adapted to channel state, and single-layer transmission can achieve the ergodic capacity. When variability in transmission power is allowed, and we face an average power constraint, a water-filling approach can be used. This facilitates adapting the transmission power and rate to the fading state, which is advantageous in terms of the expected rate. However, when lacking the perfect CSIT, the SISO broadcast approach can be optimized as studied in [62]. In this approach, the CSI is quantized by the receiver and fed back to the transmitter. This allows for short latency, and the optimized achievable expected rate can be characterized as a function of the CSI accuracy.
The studies in [63,64] investigate various multi-layer encoding hybrid automatic repeat request (HARQ) schemes [65]. The motivation for extending the conventional HARQ schemes to multi-layer coding is to achieve high throughput efficiency with low latency. The study in [63] focuses on finite-level coding with incremental redundancy HARQ, where every coded layer supports incremental redundancy coding. The multi-layer bounds were investigated through continuous broadcasting by defining different broadcasting protocols that coherently combine HARQ and broadcasting incremental redundancy HARQ. Optimal power distribution cannot be obtained for continuous broadcasting. However, it was observed that even with a sub-optimal broadcasting power distribution, significantly high gains of ∼3 dB over an outage approach could be achieved for low and moderate SNRs in the long-term static channel model, with latency as short as two blocks. In the long-term static channel model, the channel is assumed to remain in the same fading state within the HARQ session. This is especially interesting as the conventional broadcast approach (without HARQ), has only marginal gains over the outage approach for low SNRs. The retransmission protocol of [63] is also an interesting approach, which uses retransmissions for sending new information at a rate matched to the broadcasting feedback from the first transmission. The optimal broadcasting power distribution for outage approach retransmission was fully characterized in [63], and numerical results showed that it is the most efficient scheme for high SNRs, and at the same time, it closely approximates the broadcasting incremental redundancy-HARQ for low SNRs. However, in broadcasting incremental redundancy HARQ, only sub-optimal power distributions were used and finding the broadcasting optimal power distribution is still an open problem. It may also turn out that the broadcasting incremental redundancy HARQ with an optimal power distribution has more gains over the outage approach retransmission scheme.
Next, we present the results on the achievable rates for the single-user SISO Rayleigh flat fading channel under the broadcast approach. Figure 2 demonstrates the SISO broadcast achievable average rate $R_{\rm bs}$ (20), the outage capacity $R_o$ (25), the ergodic capacity $C_{\rm erg}$ (23) upper bound, and the Gaussian capacity $C_G = \log(1+P)$ as a reference. Clearly, $R_{\rm bs} > R_o$, as the latter is achieved by substituting $P\,\delta(s - s_{\rm th,opt})$ in lieu of the optimized $\rho(s)$ in (6). Outage capacity is equivalent to optimized single-layer coding rather than the optimized continuum of code layers in the broadcast approach. This difference is more pronounced at high SNRs. Such a comparison of the single-level and two-level achievable rates is presented in [57]. This comparison shows that two-level code layering is already very close to the optimum $R_{\rm bs}$. The ergodic capacity in the general SIMO case, with N receive antennas, is given by ([66], Equation (9)):
$$ C_{\rm erg} = \frac{1}{\Gamma(N)} \int_0^{\infty} dx\, \log(1 + P\cdot x)\, x^{N-1} e^{-x}, $$
where $\Gamma$ denotes the Gamma function. The probability density of the total fading power for N receive antennas is given by [66]
$$ f(\lambda) = {\rm const}(N) \cdot \lambda^{N-1} e^{-\lambda}, $$
where ${\rm const}(N)$ is a normalization constant.

2.4. The MIMO Broadcast Approach

Next, we review the multiple-input multiple-output (MIMO) channel. MIMO channels, in general, are non-degraded broadcast channels. The MIMO capacity region is known for multiple users with private messages  [50], and for two users with a common message [67]. A complete characterization of the broadcast approach requires the full solution of the most general MIMO broadcast channel with a general degraded message set, which is not yet available. Hence, suboptimal ranking procedures are studied. Broadcasting with degraded message sets is not only unknown in general channels, but also, it is unknown for MIMO channels [68,69]. Various approaches to transmitting degraded message set with sub-optimal ranking at the receiver are studied in [23,70,71]. The ranking of channel matrices (as opposed to a vector in a SIMO case) can be achieved via supermajorization ranking of the singular values of H H H . The variational problem for deriving the optimal power distribution for the MIMO broadcast strategy is characterized in [23], but seems not to lend itself to closed-form expressions. Thus, a sub-optimal solution using majorization is considered and demonstrated for the Rayleigh fading channel.
We adopt the broadcast approach described earlier for the SISO and SIMO channels, in which the receivers opt to detect the highest possible rate based on the actual realization of the propagation matrix H not available to the transmitter. In short, as H improves, it sustains higher reliable rates. This is because the MIMO setting is equivalent to the general broadcast channel (from the perspective of infinite layer coding), rather than a degraded broadcast channel as in the single-input case. In the sequel, we demonstrate a broadcast approach suited for this MIMO scenario. The approach suggests an ordering of the receivers based on supermajorization of singular values of the channel norm matrix. Consider the following flat fading MIMO channel with M transmit antennas and N receive antennas:
y = H x + n ,
where x is the input ( M × 1 ) vector, n is the ( N × 1 ) noise vector with complex Gaussian i.i.d. CN ( 0 , 1 ) elements. The propagation matrix ( N × M ) is designated by H and also possesses complex Gaussian i.i.d. CN ( 0 , 1 ) elements. The received ( N × 1 ) vector is denoted by y . We adhere to the non-ergodic case, where H is fixed throughout the code word transmission. We assume that the receiver is aware of H while the transmitter is not. The total transmit power constraint is P, i.e., E [ tr { x x H } ] P .

2.4.1. Weak Supermajorization

First, we introduce some partial ordering relations based on classical theory of majorization [72]. Let α = { α i } , β = { β i } be two sequences of length K. Let { α ( i ) } , { β ( i ) } be the increasing ordered permutations of the sequences, i.e.,
α ( 1 ) α ( 2 ) α ( K ) ,
β ( 1 ) β ( 2 ) β ( K ) .
Let α be weakly supermajorized by β , α w β , that is
i = 1 k α ( i ) i = 1 k β ( i ) , k = 1 , K .
Then, the relation α w β implies that [72]
i = 1 K ϕ ( α i ) i = 1 K ϕ ( β i ) ,
for all continuous decreasing convex functions ϕ ( · ) .

2.4.2. Relation to Capacity

Next, consider the received signal in (31), where the undetectable code layers are explicitly stated as
y = H ( x S + x I ) + n ,
where x S and x I are decodable information and residual interference Gaussian vectors, respectively. Their average norms are denoted by P S and P I , respectively, and the total transmit power P = P I + P S . n is an i.i.d. Gaussian complex vector with unit variance per component. The mutual information between x S and y is given by
I ( y ; x S ) = I ( y ; x S , x I ) I ( y ; x I | x S )
= log det I + P S + P I M H H H log det I + P I M H H H
= k = 1 J log 1 + P S λ k 1 + P I λ k
C ( λ ; P S , P I ) .
Parameters { λ k } for k = 1 J , where J min ( N , M ) , designate the singular values (or eigenvalues) of the matrix 1 M H H H for M N , or 1 M H H H for N M  [66]. Finally, if  λ w δ , we have
C ( λ ; P S , P I ) C ( δ ; P S , P I ) .

2.4.3. The MIMO Broadcast Approach Derivation

We discuss the MIMO channel broadcast approach via supermajorization layering for the simple case of M = N = 2 . The signal x is composed of a layered double indexed data stream with indices denoted by u and v. We refer to layer ordering by columns bottom-up, where u and v are described as a pair of indices taking integer values within the prescribed region. This is only for demonstration purposes, as indices u and v are continuous singular values of 1 2 H H H . Say u and v are associated with the minimal eigenvalue λ 2 and the sum of eigenvalues λ 2 + λ 1 , respectively. Evidently, u 0 , v 2 u . Say that λ 2 , λ 1 take on the set of integer values { 0 , 1 , 2 , 3 , 4 } , then the layered system is described by ( u , v ) in the order: ( 0 , 0 ) , ( 0 , 1 ) , ( 0 , 2 ) , ( 0 , 3 ) , ( 0 , 4 ) , ( 1 , 2 ) , ( 1 , 3 ) , ( 1 , 4 ) , ( 2 , 4 ) . The actual ordering of the layers is in fact immaterial, as will be shown, decoding is not done successively as in the SISO case [25], but rather according to what is decodable adhering to partial ordering.
We envisage all possible realizations of H and order them by u = λ 2 , v = λ 2 + λ 1 where λ 2 and λ 1 are, respectively, the minimal and maximal eigenvalues of 1 2 H H H (a 2 × 2 matrix in our case). Supermajorization ordering dictates that all streams decodable for realization H will be decodable for realization H as long as
λ 2 λ 2 , λ 2 + λ 1 > λ 2 + λ 1 .
Thus, we visualize all possible realizations of H as channels referring to different users in a broadcast setting, and we investigate the associated rates of the users, which we have ranked as in Section 2.4.1, via a degraded ordering. It is evident that the current approach specifies an achievable rate region, but by no means is it claimed to be optimal. In fact, it even has some inherent limitations.
Let u = λ 2 and v = λ 1 be the eigenvalues of 1 2 H H H for some channel realization such that v u 0 . Let ρ ( u , v ) d u d v be the power associated with the information stream indexed by ( u , v ) where v u , and featuring the incremental rate d 2 R ( u , v ) . Again, for a given u and v, all rates associated with the indices ( a , b ) , a u , b v can be decoded, as ( λ 2 , λ 1 ) is supermajorized by ( λ 2 = a , λ 1 = b ) . A natural optimization problem, in parallel to that posed and solved for the single dimensional case, is to optimize the power density ρ ( u , v ) , or the related interference pattern I ( u , v ) maximizing the average rate, under the power constraint I ( 0 , 0 ) = P . Let I ( u , v ) designate the residual interference at ( u , v ) . Hence,
I ( u , v ) = P 0 u d a a v d b ρ ( a , b ) .
The associated incremental rate d 2 R ( u , v ) , based on (3) and (37), is then given by
d 2 R ( u , v ) = log 1 + u ρ ( u , v ) d u d v 1 + u I ( u , v ) + log 1 + v ρ ( u , v ) d u d v 1 + v I ( u , v )
= u ρ ( u , v ) d u d v 1 + u I ( u , v ) + v ρ ( u , v ) d u d v 1 + v I ( u , v ) .
The power density is the second order derivative of the residual interference Function (43), i.e.,
ρ ( u , v ) = 2 u v I ( u , v ) I u v ,
and the incremental rate may be expressed as
d 2 R ( u , v , I , I u v ) = u I u v ( u , v ) d u d v 1 + u I ( u , v ) v I u v ( u , v ) d u d v 1 + v I ( u , v ) .
The accumulated reliable rate decoded at ( u , v ) is
R ( u , v ) = 0 u a v d 2 R ( a , b ) .
The expected rate, averaged over various channel realizations, is then given by
R ave = 0 0 f ( u , v ) R ( u , v ) d u d v ,
where f ( u , v ) designates the joint PDF of the ordered eigenvalues of 1 2 H H H , random variables u and v. For a Gaussian H with i.i.d. components, the joint density function of λ 2 , λ 1  is given by [66]
f λ 2 , λ 1 ( u , v ) = 16 e 2 v 2 u ( v u ) 2 , v u 0 .
The optimal expected rate is a product of an optimal selection of the power distribution ρ ( u , v ) . Specifying the power distribution uniquely specifies the residual interference function I ( u , v ) (43) and (46). Hence, optimizing R ave can instead be carried out with respect to the I ( u , v ) , i.e.,
R ave max = max I ( u , v ) 0 d a 0 d b f ( a , b ) 0 a d u u b d v R F ( u , v , I , I u v ) ,
where f ( a , b ) is defined in (50), and we have set R F ( u , v , I , I u v ) d 2 R ( u , v , I , I u v ) d u d v from (47), which depends on the interference function I ( u , v ) and the power density I u v ( u , v ) from (43) and (46), respectively. Maximizing R ave with respect to the functional I ( u , v ) is a variational problem ([23], Appendix A). Consequently, the optimization problem may be stated in the form of a partial differential equation (PDE),
S I + 2 u v S I u v = 0 ,
where
S ( a , b , I , I a b ) 1 + F ( a , b ) F ( a ) F ( b ) · R F ( a , b , I , I a b ) ,
and S I is the partial derivative with respect to the function I ( u , v ) , S I u v is the partial derivative with respect to the function I u v , and I u v is the second-order partial derivative of I ( u , v ) with respect to u and v. The necessary condition for the extremum is given in ([23], Appendix A) in terms of a non-linear second order PDE and does not appear to have a straightforward analytical solution. Therefore, we demonstrate a single-dimension approximation to the optimal solution. This approximation approach is called the 1-D approximation, and it is developed for the 2 × 2 channel, i.e., two transmit and two receive antennas. It suggests breaking the mutual dependency of the optimal power distribution ρ ( a , b ) by requiring ρ ( a , b ) = ρ ( a ) ρ ( b ) . Such a representation bears two independent solutions, obtained from solving the optimal SISO broadcast strategy. Another sub-optimal solution could be obtained based on a finite-level code layering, as suggested in [57] for the SISO scheme. Accordingly, a single layer (outage) coding with and without employing majorization ranking at the receiver is suggested by [23]. A two-layer coded scheme for the 2 × 2 channel is also studied and compared with the outage approach in [23]. Another sub-optimal approach to the MIMO channel involves modeling the MIMO channel as a multiple-access channel (MAC), where each antenna transmits an independent stream [23]. In an MAC approach for the MIMO channel, instead of performing joint encoding for all transmit antennas, each antenna has an independent encoder. Thus, the receiver views an MAC. When each encoder performs layered coding, we essentially get an MAC-broadcast strategy. This approach was first presented in [73] for the multiple-access channel, employing the broadcast approach at the receiver. The advantage of this approach is that each transmitter views an equivalent degraded broadcast channel, and the results of the SISO broadcast strategy may be directly used.

2.4.4. Degraded Message Sets

Next, we briefly outline the formulation of the general MIMO broadcasting with degraded message sets. The key step for addressing the continuous broadcast approach for MIMO channels with degraded message sets involves decoupling the layering index and the channel state. In many previous studies on the continuous broadcast approach (e.g., [23,32,74]) the layering index is associated with the channel fading gain. However, for the MIMO case with degraded message set, it is proposed that the continuous layering indices are associated with only the power allocation and layer rates.
Consider the MIMO channel model in (31). The source transmits layered messages with a power density distribution function ρ ( s ) , where s [ 0 , ) . The first transmitted message is associated with s = 0 , and can be considered as a common message for all receivers. The next layer indexed by d s , cannot be decoded by the first user, but it is a common message for all other users. The capacity of the channel in (31) for a given channel state is the mutual information given by
I ( y ; x ) = log det I + P M H H H ,
which can also be expressed using the eigenvalues of 1 M H H H  [66],
I ( y ; x ) = k = 1 K log 1 + P λ k ,
where K = min ( M , N ) is the degree of freedom of the MIMO channel, and { λ k } k = 1 K are the eigenvalues of 1 M H H H . The singular value decomposition (SVD) of 1 M H H H = U Λ V H where U and V are unitary matrices and Λ is a [ K x K ] diagonal matrix of singular values of 1 M H H H . The equivalent receive signal of (31) multiplied by H is y = U Λ V H x + n , and multiplying the received signal by U H creates a parallel channel U H y = Λ x + n , where x = V x . This makes the channel of (31) an effective parallel channel when x is transmitted. However V is known at the receiver, and therefore the transmitter does not have to perform any precoding, and layering can be performed with respect to singular values distribution of 1 M H H H . The fractional achievable rate for a power allocation ρ ( s ) d s , and under successive decoding, is given by
k = 1 K log 1 + λ k ρ ( s ) d s 1 + λ k I ( s ) = k = 1 K λ k ρ ( s ) d s 1 + λ k I ( s ) ,
where I ( s ) is the residual layering power. I ( s ) serves as interference for decoding layer s. The relationship between power density distribution and the residual interference is ρ ( s ) = d I ( s ) d s . It is achievable for the set of eigenvalues { λ k } k = 1 K such that
d R ( s ) k = 1 K λ k ρ ( s ) d s 1 + λ k I ( s ) d I K ( λ 1 , . . . , λ K , s ) .
Feasibility of successive decoding here results from the fact that the function d I K ( λ 1 , , λ K , s ) is an increasing function of λ k , k { 1 , , K } . Define a fractional rate allocation function r ( s ) , such that r ( s ) ρ ( s ) = d R ( s ) . The cumulative rate achievable for a layer index s is simply
R ( s ) = 0 s r ( u ) ρ ( u ) d u .
The probability of achieving R ( s ) is given by
F c ( s ) = P r ( s ) k = 1 K λ k 1 + λ k I ( s ) ,
where F c ( s ) is the complementary CDF of the layering index s, i.e., F c ( s ) = 1 F ( s ) . The expected broadcasting rate is then
R bs = 0 d s ( 1 F ( s ) ) r ( s ) ρ ( s ) = 0 d s P r ( s ) k = 1 K λ k 1 + λ k I ( s ) r ( s ) ρ ( s ) .
We focus now on the case of K = 2 , i.e., min ( M , N ) = 2 . In this case, the fractional rate r ( s ) is decipherable if
r ( s ) λ 1 1 + λ 1 I ( s ) + λ 2 1 + λ 2 I ( s ) .
An alternative formulation is for a given λ 1 , the eigenvalues λ 2 for which r ( s ) can be reliably decoded are given by
λ 2 r ( s ) + r ( s ) λ 1 I ( s ) λ 1 1 + ( 2 λ 1 r ( s ) ) I ( s ) r ( s ) λ 1 I 2 ( s ) G ( λ 1 , s , I , r ) ,
where the inequality holds only for G ( λ 1 , s , I , r ) 0 . An alternative representation of the decoding probability of layer s is thus
F c ( s ) = P λ 2 G ( λ 1 , s , I , r )
= 0 d u G ( u , s , I , r ) d v f λ 1 , λ 2 ( u , v ) · 1 G ( u , s , I , r ) 0
= 0 d u f λ 1 ( u ) Q λ 1 , λ 2 ( u , G ( u , s , I , r ) ) · 1 G ( u , s , I , r ) 0 ,
where 1 ( x ) is the indicator function, and
f λ 1 , λ 2 ( u , v ) = 2 F λ 1 , λ 2 ( u , v ) u v ,
is the joint PDF of ( λ 1 , λ 2 ) , and
Q λ 1 , λ 2 ( u , v ) F λ 1 , λ 2 ( u , v ) u .
The expected rate for a general layering function r ( s ) and a layering power allocation function I ( s ) is given by
R bs = 0 d s r ( s ) ρ ( s ) · 0 d u f λ 1 ( u ) Q λ 1 , λ 2 ( u , G ( u , s , I , r ) ) · 1 G ( u , s , I , r ) 0 .
Clearly, the optimization problem for expected broadcasting rate maximization is given by
R bs , opt = max r ( s ) 0 , I ( s ) , s . t . I ( 0 ) = P , ρ ( s ) 0 0 d s J ( s , I , I , r ) ,
where the integrand functional J ( s , I , I , r ) is given by
J ( s , I , I , r ) = r ( s ) ρ ( s ) 0 d u f λ 1 ( u ) Q λ 1 , λ 2 ( u , G ( u , s , I , r ) ) · 1 G ( u , s , I , r ) 0 .
The necessary conditions for extremum are given by the Euler equations [59]
J r = 0 ,
J I s J I = 0 ,
where J r is the partial derivative of J with respect to r ( s ) . The extremum condition for r ( s ) in (71) can be expressed as follows:
0 d u f λ 1 ( u ) Q λ 1 , λ 2 ( u , G ) · 1 r ( s ) 1 G 0 + δ ( G ) f λ 1 , λ 2 ( u , G ) s G · 1 G 0 = 0 ,
where for brevity, G ( u , s , I , r ) is replaced by G, and δ ( x ) is the Dirac delta function. The extremum conditions, as stated in (71) and (72), do not lend themselves into closed-form analytical solutions even though K = 2 , and characterizing them remains an open problem for future research.

2.5. On Queuing and Multilayer Coding

Classical information theory generally assumes an infinitely long queue of data ready for transmission, which is motivated by maximizing communication throughput (Shannon capacity). In network theory, on the other hand, the input data is usually a random process that controls writing to a buffer (serving as a queue), and the readout from this buffer is another random process. In these settings, the design goal of transmission concentrates on minimizing the queue delay for the input data. However, designing the data queue and transmission algorithm cannot be decoupled in the presence of stringent delay constraints on input data transmission. This is because the objective is no longer only maximizing the throughput. This conceptual difference between network theory and information theory can be overcome by posing a common optimization problem and jointly minimizing the delay of a random input process under a power (rate) control constraint. This becomes a cross-layer optimization problem involving the joint optimization of two layers of the seven-layer open systems interconnection (OSI) model. Other fundamental gaps between network theory and information theory are covered in detail in [75,76,77,78].
Queuing and channel coding for a block fading channel with transmit CSI only, for a single user, is discussed in [79]. In this section, we first consider optimizing rate and power allocation for a single layer code transmission. For this scheme, the outage capacity [13] maximizes the achievable throughput. Rate and power are optimized jointly to minimize the overall delay. The delay is measured from the arrival of a packet at the queue until successfully decoded, including, if needed, retransmission due to outage events.
The study in [80] considers a cross-layer system optimization approach for a single-server queue followed by a multi-layer wireless channel encoder, as depicted in Figure 3. The main focus is on minimizing the average delay of a packet measured from entering the queue until successful service completion.

2.5.1. Queue Model—Zero-Padding Queue

Next, we consider the zero-padding queue model described in [80]. It is assumed that the transmission is performed every time the queue is not empty. If the available queue data are less than a packet size, a frame can be generated with zero-padding to have a valid frame for the channel encoder. We define the queuing time as the time from arrival to completion of service, and the waiting time as the time measured from arrival until initially being served. The queue’s waiting time analysis can be done at embedded points: the beginning of every time slot. The random process of packet arrival random at each slot is a deterministic process denoted by λ (bits/channel use).
The queue waiting time can be measured directly based on the queue size, as stated on Little’s theorem [81], by normalizing the queue size by the inverse of the input rate λ . Notice that Little’s theorem does not consider the instantaneous quantities to the average waiting time and average queue size. The following equation defines the queue size:
Q n + 1 = N λ n + 1 + Q n N R n N λ n + 1 + Q n N R n 0 0 otherwise ,
where N is the number of channel uses between slots, which is also the block length, and λ n + 1 is a deterministic queue input rate λ . It is noteworthy that in a single-layer coding, R n is a fixed R with probability p, and it is 0 with probability 1 p . This waiting time equation is also analyzed in ([82], Chapter 5) for a single-layer coding approach and a deterministic arrival process, where tight bounds on the expected waiting time are obtained. For simplicity, by normalization of the queue size by the block-length N, the Lindley equation is obtained [83]:
q ˜ n + 1 = q ˜ n + λ n + 1 R n q ˜ n + λ n + 1 R n 0 0 q ˜ n + λ n + 1 R n < 0 ,
where q ˜ n is now the queue size in units of blocks of data corresponding to N arrivals to the queue. In an outage approach, we have R n = R with probability p, and R n = 0 with a complementary probability 1 p , which is also the outage probability. For the rest of the analysis, the queue equations will be normalized following (75). We specify the queuing time equation for completeness of the definitions, which is the overall system delay for the zero-padding queue. The overall delay must always take into account the additional delay of service time beyond the queue’s waiting time. The normalized queue size is the waiting time equivalent, i.e.,
q n + 1 = q n + λ n + 1 λ R n λ , q n R n λ 0 λ n + 1 λ , o t h e r w i s e ,
where q n is a normalized queue size at a renewal slot n. In a single-layer coding approach, it is possible to analyze the queue delay by adopting the standard M/G/1 queue model. The input random process of an M/G/1 model follows a Poisson process, and its service distribution is another general random process. In an outage approach, a geometrically distributed random variable characterizes the time between services. For using the M/G/1 model, an important assumption on the system model is made: input arrives in blocks that have the same length as the coded transmission blocks. That is, the queue equation is normalized to the data block size of its corresponding transmission. The number of arrivals is measured in block units, and the input process has a rate of λ norm .
Having the arrival blocks are equal in size to transmitted blocks is a limiting constraint since a change of transmission rate means a change in input block size. Therefore, the M/G/1 queue model is not adopted in [80], and in the following, we use the zero-padding queue model as described earlier.

2.5.2. Delay Bounds for a Finite Level Code Layering

We consider here K multi-layer coding, and describe the Lindley equation [81]. The queue update equation is given by
w n + 1 = w n + x n w n + x n 0 0 w n + x n < 0 ,
where x n is the update random variable, which depends on the number of code layers. Its value represents the difference between the queue input λ and the number of layers successfully decoded, i.e.,
x n λ i = 1 K ν i , n R i .
Random variables { ν i , n } i = 1 K are associated with the outage probability as function of layer index. The corresponding fading power thresholds are denoted by { s th , i } i = 1 K . Random variables { ν i , n } i = 1 K are related to the fading thresholds as follows
ν i , n = 1 s th , i s n s th , i + 1 0 otherwise ,
where s n is the fading power realization at the nth time-slot, and s th , K + 1 = . Every random variable ν i , n has a probability of being 1, denoted by p K i + 1 . Note that outage probability is
p ¯ = 1 i = 1 K p i ,
in which p ¯ represents the probability that all layers cannot be decoded. The CDF of the queue size at these embedding points requires computing the CDF at every time instant. In this setting, the probability density d F X ( τ ) of X (78) is given by
d F X ( x ) = i = 1 K p i δ x ( λ j = 1 K i + 1 R j ) + p ¯ δ ( x λ ) ,
where p i = P { s th , i s n s th , i + 1 } for i { 1 , , K } and s th , K + 1 = . The next theorem, discussed in detail in ([80], Appendix B), establishes upper and lower bounds on E [ W K ] .
Theorem 1
([80]). For a K-layer coding, the expected queue size is upper and lower bounded by
E [ w K ] ( K λ ) ( i = 1 K p i K i + 1 λ ) ( K λ ) 2 + i = 1 K p i ( K K i + 1 ) 2 + p ¯ K 2 2 ( i = 1 K p i K i + 1 λ ) ,
and
E [ w K ] 2 ( K λ ) ( i = 1 K p i K i + 1 λ ) ( K λ ) 2 + i = 1 K p i ( K K i + 1 ) 2 + p ¯ K 2 2 ( i = 1 K p i K i + 1 λ ) ,
where V j = 1 V R j .
The variance of the achievable rate random variable σ R KL 2 is given by
σ R KL 2 i = 1 K p i K i + 1 2 ( R KL , av ) 2 ,
where
R KL , av i = 1 K p i K i + 1 .
Corollary 1.
Queue expected size and expected delay for K-layer coding are upper bounded by
E [ w KL ] σ R K L 2 2 ( R KL , av λ ) ( 1 λ R KL , av ) σ R K L 2 2 R KL , av ,
and the expected delay is upper bounded by
E [ w λ , KL ] σ R K L 2 2 λ ( R KL , av λ ) ( 1 λ R KL , av ) σ R K L 2 2 R KL , av λ ,
where σ R KL 2 and R KL , av are given by (84) and (85) respectively.

2.5.3. Delay Bounds for Continuum Broadcasting

A continuous broadcasting approach is considered in this section. In this approach, the transmitter also sends multi-layer coded data. Unlike K-layer coding, the layering is a continuous function of the channel fading gain parameter. The number of layers is not limited, and an incremental rate with a differential power allocation is associated with every layer. The differential per layer rate is d R ( s ) = s ρ ( s ) d s 1 + s I ( s ) and ρ ( s ) d s is the transmit power of a layer s. This also determines the transmission power distribution per layer [58]. The residual interference for a fading power s is I ( s ) = s ρ ( u ) d u (4). The total achievable rate for a fading gain realization s is R ( s ) = 0 s u ρ ( u ) d u 1 + u I ( u ) (6). It is possible to extend the K-layer coding bounds shown above to this continuous broadcast setting. The bounds in (82) and (83) could be used for broadcasting after performing the following modifications:
  • The number of layers is unlimited, that is K .
  • Since the layering is continuous, every layer i is associated with a fading gain parameter s. Every rate R i is associated with a differential rate d R ( s ) specified in (3).
  • The cumulative rate K should be replaced by
    R T = 0 d R ( s ) .
  • The sum i = 1 K p i K i + 1 is actually the average rate and it turns to be R bs (7) for the continuum case.
  • Finally, in finite-level coding the expression i = 1 K p i ( K K i + 1 ) 2 + p ¯ K 2 turns out to be
    R d , bs 2 0 d u f ( u ) R T 0 u d R ( s ) 2
    = 0 d u f ( u ) u d R ( s ) 2
    = 2 0 d u F ( u ) d R ( u ) u d R ( s ) ,
    in the continuous case, where d R ( u ) and R ( u ) are specified in (3) and (6), respectively.
Corollary 2.
The queue average size for a continuous code layering is upper and lower bounded by
E [ w bs ] R T λ 2 + R d , bs 2 ( R T λ ) 2 2 ( R bs λ ) ,
E [ w bs ] ( R T λ ) + R d , bs 2 ( R T λ ) 2 2 ( R bs λ ) ,
and the average delay is lower and upper bounded by
E [ w λ , bs ] R T λ 2 λ + R d , bs 2 ( R T λ ) 2 2 λ ( R bs λ ) ,
E [ w λ , bs ] R T λ λ + R d , bs 2 ( R T λ ) 2 2 λ ( R bs λ ) ,
where R bs , R T , and R d , bs 2 are specified in (7), (88), and (89) respectively.
The variance of the achievable rate random variable σ R bs 2 is given by
σ R bs 2 0 d u f ( u ) R ( u ) 2 R bs 2
= 0 d u f ( u ) 0 u d R ( s ) 2 R bs 2
= 2 0 d u ( 1 F ( u ) ) d R ( u ) 0 u d R ( s ) R bs 2
= 2 0 d u ( 1 F ( u ) ) d R ( u ) R ( u ) R bs 2 .
Corollary 3.
The queue average size for a continuous code layering is upper bounded by
E [ w bs ] σ R bs 2 2 ( R bs λ ) ( 1 λ R bs ) σ R bs 2 2 R bs ,
and the average delay is upper bounded by
E [ w λ , bs ] σ R bs 2 2 λ ( R bs λ ) ( 1 λ R bs ) σ R bs 2 2 R bs λ ,
where R bs and σ R bs 2 are given by (7) and (96), respectively.
For minimizing the expected delay in the continuous layering case, it is required to obtain the optimal ρ ( s ) (4) which minimizes the average queue size upper bound. As in multi-layer coding, an analytic solution is not available and remains an open problem for further research. However, numerical optimization is impossible here. The constraint of optimization is a continuous function. The target functional in the optimization problem for continuous layering does not have a localization property [59]. A functional with localization property can be written as an integral of some target function. Our functional contains a ratio of integrals and further multiplication of integrals, which cannot be converted to an integral over a single target function. Such functional is also denoted as a nonlocal functional in [59]. In such cases, it is preferable to look for an approximate representation of the nonlocal functional, which has the localization property. Alternatively, approximate target functions with reduced degrees of freedom may be optimized.
An interesting observation from the numerical results of [80] is that when considering delay as a performance measure, code layering could give noticeable performance gains in terms of delay, which are more impressive than those associated with throughput. This makes layering more attractive when communicating under stringent delay constraints.
Analytic resource allocation optimization for delay minimization, under the simple queue model in [80], remains an open problem for further research. In general, when layering is adopted at the transmitter, in conjunction with successive decoding at the receiver, the first layer is decoded earlier than other layers, and it has the shortest service time. Accounting for a different service delay per layer, the basic queue size update equation (the Lindley equation) should be modified accordingly. The analysis of the broadcast approach with a per layer queue is a subject for further research. The queue model which was used in [80] is a zero-padding queue. In this model, the frame size is kept fixed every transmission, and if the queue is nearly empty, the transmission includes zero-padded bits on top of queue data. Optimizing the transmission strategy as a function of the queue size, such that no zero-padding is required, can further increase layering efficiency and minimize the expected delay. This is a possible direction for further research.

2.6. Delay Constraints

There are various aspects in which delay constraints in communications may impact the system design. Stringent delay constraints might not allow to capture the channel ergodic distribution, and may benefit from a broadcast approach. This is while relaxed delay constraints may allow transmission of long codewords that capture the channel ergodicity. When there is a mixture of delay requirements on data using the same physical transmission resources, interesting coded transmission schemes can be considered. This is studied in [84] as discussed in next subsections and also widely covered in [85,86]. Another aspect is decoding multiple independent blocks, as considered in [87], and studied by its equivalent channel setting, which is the MIMO parallel channel [20] and discussed in detail in the next subsections.

2.6.1. Mixed Delay Constraints

The work in [84] considers the problem of transmission with delay-constrained (DC) and non-delay-constrained (NDC) streams are transmitted over an SISO channel, with no CSIT adhering to the broadcast approach for the DC stream. The DC stream comprises layers that have to be decoded within a short period of a single transmission block. The NDC stream comprises layers that may be encoded over multiple blocks and decoded after the complete codeword is received, potentially observing the channel ergodicity. Three overall approaches are suggested in [84], trying to maximize the expected sum rate. Their achievable rate regions over DC and NDC are examined. A DC stream is always decoded in the presence of an NDC stream, which is treated as interference. However, before decoding an NDC stream, the decodable DC layers can be removed, allowing NDC decoding at the highest signal-to-interference-plus-noise ratio (SINR). A closed-form solution of the sum-rate maximization problem can be derived for the outage and broadcast DC stream in parallel to a single NDC layer. When NDC transmission is also composed of multi-layers, the optimization problem of the expected sum-rate becomes much more complicated.
The joint strategy of accessing both DC and NDC parts on a single channel uses a two-level block nesting. Every L samples define a block for the DC stream, while the NDC stream is encoded over K such blocks, consisting of L · K samples. The NDC block is called a super block. L is large enough for reliable communication for the DC part, but it is much shorter than the dynamics of the slow fading process. K is large enough to enable the empirical distribution of the fading coefficient to be similar to the real one. Two independent streams of information are encoded. The DC stream is decoded at the completion of each block at the decoder, at a rate dependent upon the realization of the channel fading coefficient for that block. The NDC stream is decoded only at the completion of the super block. All of the following proposed schemes assume superposition coding, equivalent to symbol-wise additivity of the DC and NDC code letters. Denote by w L the L-length codeword for the DC code for each block, and z K L the K L -length codeword for the NDC code for each super block. Define one super block as
y k , i = s k · ( w k , i + z k , i ) + n k , i , for i = 1 , 2 , , L , k = 1 , 2 , , K ,
where the double sub-index { k , i } is equivalent to the time index ( k 1 ) · L + i . Note that slow fading channel nature was used by defining s k , i = s k . This scheme reflects a power constraint of the form E [ | w k , i + z k , i | 2 ] P . Define R DC ( s ) as the achievable rate for a fading power realization s per block. The total expected DC rate over all fading power realizations is given by
R DC = 0 f S ( u ) R DC ( u ) d u .
Let R NDC designate the rate of the NDC part, which experiences enough such realizations throughout communication. When relaxing the stringent delay constraint, coding over sufficient large blocks achieves ergodic capacity, denoted by C erg = E S [ log 1 + S P ] . Clearly, for any coding scheme R DC + R NDC C erg .

2.6.2. Broadcasting with Mixed Delay Constraints

The superposition of DC and NDC is employed by allocating a fixed amount of power per stream. Define the DC relative power portion as β [ 0 , 1 ] , that is β · P is the power allocated for the DC stream and the rest ( 1 β ) · P for the NDC stream. The DC part uses the broadcast approach. During decoding of the DC part, the NDC is treated as additional interference since during the decoding of each DC block the NDC codeword cannot be completely received, and thus cannot be decoded nor reconstructed to assist the DC decoding. The NDC decoder is informed of all DC decoded layers per DC codeword, and it cancels out the decoded part from the corresponding NDC block, maximizing its SINR for NDC decoding. By designing the two encoders like described earlier, we can justify that both DC and NDC parts communicate over a flat fading channel with additive Gaussian noise. The imposed noise for each part consists of the white channel noise along with undecodable codewords of those that are undecoded yet from both parts.
The DC encoder uses superposition of an infinite number of layers, ordered using channel fading realization s in a manner that forms a degraded broadcast channel. Per DC message, the transmitted codeword of length L is given by
w L ( m 1 , m 2 , , m ) = j = 1 w j L ( m j ) .
Designate ρ ( s ) to be the DC layering power distribution, which will be optimized later on, and each layer communication scheme will try to overcome a Gaussian channel where the fading is known to both sides. The NDC encoder sends a single message through a block of length L · K . By random coding over a Gaussian channel, the codewords can be generated. A total of e L · K · R NDC codewords can be used, where the channel rate R NDC relies on the optimized channel power ρ ( s ) as well.
The decoders are activated by order. First, the DC decoder works on every L-block and by successive decoding can reveal as many layers as the channel permits. It is similar to the classic broadcast approach, except all layers suffer from an undecodable (at this stage) interference. All DC decoders’ outputs are fed to the NDC decoder, which works after K such blocks. After removal of the decodable DC codewords of all blocks, the NDC part is decoded with a minimal residual interference, where the interference includes only the undecoded DC layers. Calculating the DC rate in the presence of NDC is a direct extension of [23], which is a special case for β = 1 . Define the DC interference for a fading power s as I ( s ) , implying
I ( s ) = s ρ ( u ) d u , and ρ ( s ) = d d s I ( s ) .
It associates the undecodable layers upon a channel fading realization s as noise to the transmission. It is restricted to the total DC allocated power
I ( 0 ) = 0 ρ ( u ) d u = β P ,
with 0 β 1 .
Lemma 1
(Achievable Expected DC Rate [84]). Any total expected DC rate R DC , which is averaged over all fading realizations, that satisfies
R DC u = 0 ( 1 F S ( u ) ) u ρ ( u ) 1 + u I ( u ) + ( 1 β ) P u d u ,
is achievable.
Lemma 2
(Achievable Expected NDC Rate [84]). Any total expected NDC rate R NDC , which is averaged over all fading realizations, that satisfies
R NDC 0 f S ( u ) log 1 + ( 1 β ) P u 1 + u I ( u ) d u ,
is achievable.
It is possible to derive the optimal power allocation for DC layering that maximizes the sum rate ( R DC + R NDC as stated in (106) and (107), respectively. It is a function that depends on I ( s ) according to (105). Specifically, the optimization problem is
I * ( s ) = argmax I ( s ) R DC + R NDC s . t . I ( 0 ) = β P , and I ( ) = 0 .
The outage approach is a simple special case of layering, where a single DC coded layer is used. In this case, the power distribution I ( s ) is explicitly given by
I ( s ) = β P if 0 s s th 0 if s > s th ,
ρ ( s ) = β P · δ ( s s th ) ,
where δ is the Dirac function and s th is a parameter set prior to the communication. Constant  s th may be interpreted as the fading gain threshold for single layer coding. The advantages of this approach are low implementation complexity and ease of analysis. The disadvantage is its sub-optimality. The outage approach is designed for a channel with fixed fading of s th . On the one hand, if s s th , the message can be transmitted error-free at a rate adjusted for s th . On the other hand, if s < s th , the specific transmission is useless.
Proposition 1
(Joint Optimality by Outage DC [84]). The maximizer I o ( s ) of (108) subject that satisfies the form in (109) is specified by s th * , which can be found as the solution to
f S ( s th * ) log 1 + β P s th * = ( 1 F S ( s th * ) ) β P ( 1 + P s th * ) ( 1 + ( 1 β ) P s th * ) .
The optimal expected DC outage rate and the optimal expected NDC outage rate, which together maximize the sum rate are
R DC , o = 1 F S ( s th * ) log 1 + β P s th * 1 + ( 1 β ) P s th * ,
R NDC , o = 0 s th * f S ( u ) log 1 + ( 1 β ) P u 1 + β P u d u + s th * f S ( u ) log 1 + ( 1 β ) P u d u .
Maximizing (108) can be derived analytically by developing an Eüler Equation in a similar way to [23]. This is done by enlarging the class of admissible functions I ( s ) (as opposed to the outage approach) to be continuously differentiable and to satisfy the boundary conditions I ( 0 ) = β P and I ( ) = 0 .
Proposition 2
(Joint Optimality by Broadcast DC ). The maximizer I b s ( s ) of (108) when considering all continuously differentiable boundary conditioned functions is I bs ( s ) = [ I ˜ ( s ) ] 0 β P , where
I ˜ ( x ) = 1 x b ( x ) + b 2 ( x ) 4 a ( x ) c ( x ) 2 a ( x ) 1 ,
a ( x ) = x f S ( x ) ,
b ( x ) = 2 ( 1 β ) P f S ( x ) x 2 ( 1 F S ( x ) ) ,
c ( x ) = ( 1 β ) 2 P 2 f S ( x ) x 3 .
The associated rates R DC , bs and R NDC , bs can be achieved by substituting it in (106) and (107).
The square root in (114) can impose a finite-length domain for I ˜ ( s ) , that can result in discontinuity at I ( s ) . This situation is addressed by assigning a Dirac function at ρ ( s ) , which can be interpreted as a superposition of single-layer coding and continuous layering. Figure 4 shows the relation of R DC + R NDC for the joint outage approach and the joint broadcast approach, for selected values of β . The total expected sum-rate is the sum of the DC rate and the NDC rate. As may be observed, if β 0.9 , then the ergodic capacity can be nearly achieved in high SNRs.

2.6.3. Parallel MIMO Two-State Fading Channel

Broadcasting over MIMO channels is still an open problem, and only sub-optimal achievable schemes are known [23]. The work in [20] considers a two-state parallel MIMO channel, which is equivalent to a SISO two-state channel, where decoding can be done over multiple consecutive blocks, as studied in [87]. The work in [20] considers the slow (block) fading parallel MIMO channel [66], where channel state is known at the receiver only. Under this channel model, the transmitter may adopt a broadcast approach [23], which can optimize the expected transmission rate under no transmission CSI, which is essentially characterized by the variable-to-fixed coding [6].
The study in [49] composes two degraded broadcast channels [24,88] into a three-user setup: an encoder with two outputs, each driving a dual-output broadcast channel; two decoders, where each is fed by one less-noisy broadcast channel output and one more-noisy output of the other channel (called unmatched). This channel is referred to as degraded broadcast product channel. For the AWGN case, the capacity region (private and common rates) of this channel was derived [49]. In [20], the MIMO setting for the broadcast approach is revisited, with new tools that differ from those in [23,74]. This is by analyzing the finite-state parallel MIMO channel, where the capacity region in [49] is used to address the multi-layering optimization problem for maximizing the expected rate of a two-state fading [87,89,90] parallel MIMO channel.

2.6.4. Capacity of Degraded Gaussian Broadcast Product Channels

Consider the model introduced in [49], which is a two-receiver discrete memoryless degraded product broadcast channel. The Gaussian case was addressed as a special case. A single transmitter encodes two n-length codewords consisting of a common message w 0 { 1 , , 2 n R 0 } to be decoded by both users, and two private messages w BA { 1 , , 2 n R BA } and w AB { 1 , , 2 n R AB } , one for each of the two decoding users. A single function encodes these three messages into two codewords, where each undergoes parallel degraded broadcast sub-channels
y 1 = x 1 + n 11 z 1 = y 1 + n 12 , and z 2 = x 2 + n 21 y 2 = z 2 + n 22 ,
where n 11 , n 21 CN ( 0 , ν b 1 ) , n 21 , n 22 CN ( 0 , ν a 1 ν b 1 ) . As depicted in the bold and red parts of Figure 5, two users (namely A B and B A ) receive both common and private messages from the transmitter and independently decode the messages. This is an unmatched setting, as y 1 is less noisy than z 1 , and z 2 is less noisy than y 2 . Hence, each of the users has one less-noisy channel output alongside another, which is the noisier output of the other sub-channel. Following ([49], Theorem 2), which shows this case, and exploiting symmetry for equal power allocation to both sub-channels, optimal allocation is expected to be achieved by equal common rate allocation to every user (state). Denoting α ¯ = 1 α , the capacity region ( R 0 , R BA , R AB ) is
R 0 log 1 + ν a α P 1 + ν a α ¯ P + log 1 + ν b α P 1 + ν b α ¯ P ,
R 0 + R BA = R 0 + R AB log 1 + ν a α P 1 + ν a α ¯ P + log ( 1 + ν b P ) ,
R 0 + R BA + R AB log 1 + ν b P + log 1 + ν a α P 1 + ν a α ¯ P + log 1 + ν b α ¯ P .

2.6.5. Extended Degraded Gaussian Broadcast Product Channels

The classical product channel is extended by introducing two dual-input receivers in addition to the original two. The first gets the two more noisy channel outputs ( z 1 , y 2 ) , whereas the second receives the two less noisy outputs ( z 2 , y 1 ) . To support this, two messages w AA and w BB are added. The total two n-length codewords are the superposition of three codewords by independent encoders as follows ( X 1 , X 2 ) = f AA ( w AA ) + f cr ( w 0 , w BA , w AB ) + f BB ( w BB ) , where subscript cr stands for crossed states ( ( A , B ) or ( B , A ) ). See Figure 5 for an illustration.
Stream AA is decoded first, regardless of whether the others can be decoded (this is done by treating all the other streams as interference). Then, both streams AB and BA, including their common stream subscripted 0 can be decoded after removing the AA impact from their decoder inputs (treating the BB stream as interference). Finally, removing all the above decoded streams allows decoding stream BB. From (121), we have
R AA 2 log 1 + α AA P ν a 1 + α ¯ AA P ,
R AA + R 0 2 log 1 + α AA P ν a 1 + α ¯ AA P + log 1 + α α cr P ν b 1 + ( α ¯ α cr + α BB ) P + log 1 + α α cr P ν a 1 + ( α ¯ α cr + α BB ) P ,
R AA + R 0 + R BA = R AA + R 0 + R AB 2 log 1 + α AA P ν a 1 + α ¯ AA P + log 1 + α α cr P ν a 1 + ( α ¯ α cr + α BB ) P + log 1 + α cr P ν b 1 + α BB P ,
R AA + R 0 + R BA + R AB 2 log 1 + α AA P ν a 1 + α ¯ AA P + log 1 + α cr P ν b 1 + α BB P + log 1 + α α cr P ν a 1 + ( α ¯ α cr + α BB ) P + log 1 + α ¯ α cr P ν b 1 + α BB P ,
R AA + R 0 + R BA + R AB + R BB 2 log 1 + α AA P ν a 1 + α ¯ AA P + log 1 + α cr P ν b 1 + α BB P + log 1 + α α cr P ν a 1 + ( α ¯ α cr + α BB ) P + log 1 + α ¯ α cr P ν b 1 + α BB P + 2 log 1 + α BB P ν b 1 ,
where α AA , α cr , α BB [ 0 , 1 ] are the relative power allocations for the subscripted letters α AA + α cr + α BB = 1 , and α [ 0 , 1 ] is the single user private power allocation within the unmatched channel.

2.6.6. Broadcast Encoding Scheme

Adding a message splitter at the transmitter and channel state-dependent message multiplexer at the receiver enriches the domain. Figure 6 illustrates the encoding and decoding schemes. During decoding, the four possible channel states S = ( S 1 , S 2 ) impose different decoding capabilities. If S = ( A , A ) , then g AA ( · ) can reconstruct w AA to achieve a total rate of R AA . For S = ( B , A ) , g BA ( · ) is capable of reconstructing three messages ( w AA , w 0 , w BA ) with sum rate of R AA + R 0 + R BA . Similarly for S = ( A , B ) , g AB ( · ) reconstructs ( w AA , w 0 , w AB ) with sum rate R AA + R 0 + R AB . When both channels are permissive S = ( B , B ) , all five messages ( w AA , w 0 , w BA , w AB , w BB ) are reconstructed at g BB ( · ) under the rate R AA + R 0 + R BA + R AB + R BB . Recall that a single user transmission is of interest here, thus the expected rate of the parallel channel underhand can be expressed by
R ave = P A 2 R AA + P A P B ( R AA + R 0 + R AB ) + P B P A ( R AA + R 0 + R BA ) + P B 2 ( R AA + R 0 + R BA + R AB + R BB ) .
Using (126), and since both channels have identical statistics leading to R AB = R BA , the achievable average rate is
R ave = 2 ( P A + P B ) 2 log 1 + ν a P + R 0 ( 1 α AA ) + R 1 ( 1 α AA α α cr ) + R 2 ( 1 α AA α cr ) ,
where the new notations are
R 0 ( α 0 ) = [ ( P A + P B ) 2 P A 2 ] log ( 1 + ν b α 0 P ) [ ( P A + P B ) 2 + P A 2 ] log ( 1 + ν a α 0 P ) ,
R 1 ( α 1 ) = P B 2 log ( 1 + ν b α 1 P ) [ ( P A + P B ) 2 P A 2 ] log ( 1 + ν a α 1 P ) ,
R 2 ( α 2 ) = 2 P A P B log ( 1 + ν b α 2 P ) .
and the arguments satisfy α 0 = 1 α AA , α 1 = 1 α AA α α cr , and α 2 = 1 α AA α cr = α BB . Note that R 0 ( α 0 ) and R 1 ( α 1 ) are not obliged to be positive, as they can be negative for some scenarios, and R 2 ( α 2 ) is non-positive by definition. Denoting the domain D of valid power allocations vector α = [ α , α AA , α cr , α BB ] T [ 0 , 1 ] 4 and the operator [ x ] + = max { 0 , x } yield the following proposition.
Proposition 3.
The maximal sum rate of the symmetric two parallel two state channel over all power allocations is
max α D R ave ( α ) = 2 ( P A + P B ) 2 log ( 1 + ν a P ) + max 0 α AA 1 R 0 ( 1 α AA ) + R 1 ( α 1 opt ( α AA ) ) ,
where
α 1 opt ( α AA ) = max { 0 , min { 1 α AA , α 1 * } } ,
α 1 * = P B 2 ν b [ ( P A + P B ) 2 P A 2 ] ν a [ ( P A + P B ) 2 P A 2 P B 2 ] ν a ν b P ,
and the latter solves α 1 R 1 ( α 1 * ) = 0 .
Corollary 4.
The optimal power allocation for the state ( B , B ) is α BB opt = 0 .
This is true for any set of parameters ν a , ν b , P A , P B , even if P B 1 and ν b ν a . Inherently, a penalty occurs when trying to exploit the double permissive state.
Corollary 5.
Under the optimal power allocation, α opt ( α AA ) = 1 α 1 opt ( α AA ) / ( 1 α AA ) .
This removes a degree of freedom in the optimization problem.
Using these corollaries, and the notation α = [ α , α AA , α cr , α BB ] T instead of α = [ α 0 , α 1 , α 2 ] T , we have the following theorem.
Theorem 2.
The maximal sum rate of the symmetric two-parallel two-state channel over all allocations α D is
R ave opt = 2 ( P A + P B ) 2 log ( 1 + ν a P ) + [ l ] max 0 α AA 1 R 0 ( 1 α AA ) + R 1 ( ( 1 α AA ) · ( 1 α opt ( α AA ) ) ) ,
where
r C l α opt ( α AA ) = min 1 , 1 P B 2 ν b [ ( P A + P B ) 2 P A 2 ] ν a 2 P A · P B · ν a ν b P ( 1 α AA ) + .
Denoting the argument of the maximization as α AA opt , the optimal power allocation vector is
α opt = [ α opt ( α AA ) , α AA opt , 1 α AA opt , 0 ] .
From Proposition 3 and by setting α 1 = 1 α AA α α cr = ( 1 α AA ) ( 1 α ) , it can be observed that the optimal allocation for state BB is α BB = 0 . For evaluation of the advantage of the joint α AA and α , the following sub-optimal schemes are compared: (a) independent broadcasting; (b) privately broadcasting; and (c) only common broadcasting. A scheme for which the encoder disjointly encodes different messages into each single channel of the parallel channel using the broadcast approach over the fading channel is denoted by independent broadcasting. The broadcast approach for fading SISO channel relies on two main operations: superposition coding by layering at the transmitter; and successive interference cancellation at the receiver. The maximal expected sum rate of the symmetric two parallel two state channel under independent broadcasting is
R ave ind - bc , opt = 2 ( P A + P B ) log 1 + ν a P 1 + ν a ( 1 α ind - bc , opt ) P + 2 P B log 1 + ν b ( 1 α ind - bc , opt ) P ,
α bc , opt = min 1 , 1 P B ν b ( P A + P B ) ν a P A ν a ν b P + .
A scheme for which no power is allocated for the common stream in the ( B , A ) and ( A , B ) states (message w 0 ) is called privately broadcasting. This scheme is equivalent to setting α = 0 in Theorem 2, thus allocating encoding power from the common stream ( R 0 = 0 ) to the other streams R AA , R AB , R BA and R BB , which achieves optimality for
α AA prv - bc , opt = min 1 , 1 [ P B P A ] ν b [ P B + P A ] ν a 2 P A ν a ν b P + .
A scheme for which all of the crossed state power is allocated only to the common stream (message w 0 ) and no power is allocated to the private messages (no allocation for messages w AB and w BA ) is called only common broadcasting. This scheme is equivalent to setting α = 1 in Theorem 2, thus allocating encoding power from the private streams ( R AB = R BA = 0 ) to the other streams R AA , R 0 and R BB , which achieves optimality for
α AA cmn - bc , opt = min 1 , 1 [ ( P A + P B ) 2 P A 2 ] ν b [ ( P A + P B ) 2 + P A 2 ] ν a 2 P A 2 ν a ν b P + .
The result in Theorem 2 differs from the one presented in [87] for the two-parallel two state channel. In [87] it is chosen to transmit only common information to the pairs ( A , B ) and ( B , A ) . (Ref. [87], Equation (39)) clearly states that for the crossed states (A, B) and (B, A) only common rate is used without justification. It is further claimed that this is an expected rate upper bound for some power allocation. The result in (137) proves that broadcasting common information only, i.e., α = 1 is sub-optimal, and does not yield the maximal average rate.

2.7. Broadcast Approach via Dirty Paper Coding

We conclude this section by noting the relevance of dirty paper coding (DPC) to the broadcast approaches discussed. Even though the central focus of the broadcast approaches discussed is superposition coding, all these approaches can be revisited by instead adopting dirty paper coding. Information layers generated by a broadcast approach interfere with one another with the key property that the interference is known to the transmitter. DPC enables effective transmission when the transmitted signal is corrupted by interference (and noise in general) terms that are known to the transmitter. This is facilitated via precoding the transmitted signal by accounting for and canceling the interference.
DPC plays a pivotal role in the broadcast channel. It is an optimal (capacity-achieving) scheme for the multi-antenna Gaussian broadcast channel [47,50] with general message sets and effective, in the form of binning, for the general broadcast channel with degraded message sets [22]. To discuss the application of the DPC in the broadcast approach, consider the single-user channel with a two-state fading process, that is for the model in (1) we have h { h w , h s } where | h w | < | h s | , rendering the following two models in these two states:
y w = h w x + n w ,
y s = h s x + n s ,
which can be also considered a broadcast channel with two receivers with channels h w and h s . The broadcast region for this channel can be achieved by both superposition coding and DPC. When the noise terms have standard Gaussian distribution, the capacity region is characterized by all pairs
R w 1 2 log 1 + α P | h w | 2 1 + ( 1 α ) P | h w | 2 ,
R s l e q 1 2 log 1 + ( 1 α ) P | h s | 2 .
over all α [ 0 , 1 ] . This capacity region is achievable by superposition coding of two information layers x w and x s with rates R w and R s to transmit x = x w + x s . The same region can be achieved by DPC, where x w is generated and decoded as done in superposition coding, and x s is designed by treating x w as the interference known to the transmitter non-causally. It is noteworthy that the original design of DPC in [91] the non-causally known interference term is modeled as additive Gaussian noise. However, as shown in [92], the interference term can be any sequence, like a Gaussian codeword, and still achieve the same capacity region.
The operational difference of superposition coding and DPC at the receiver side is that when using superposition coding, at the stronger receiver, the layers x w and x s have to be decoded sequentially, while when using DPC, the two layers can be decoded in parallel. This observation alludes to an operational advantage of DPC over superposition coding: while both achieving the capacity region, DPC imposes shorter decoding latency.

3. The Multiple Access Channel

3.1. Overview

As discussed in detail in Section 2, CSI uncertainties result in degradation in communication reliability. Such degradations can be further exacerbated as we transition to multiuser networks consisting of a larger number of simultaneously communicating users. Irrespective of multiuser channel models, a common realistic assumption is that slowly varying channels can be estimated by the receivers with high fidelity, providing the receivers with the CSI. Acquiring the CSI by the transmitters can be further facilitated via feedback from the receivers. However, feedback communication is often infeasible or incurs additional communication and delay costs, which increase significantly as the number of transmitters and receivers grows in the network.
This section focuses on the multi-access channel, consisting of multiple users with independent messages communicating with a common receiver. The channels undergo slow fading processes. Similar to the setting considered in Section 2, it is assumed that the receivers can acquire the CSI with high fidelity (e.g., through training sessions). While the receiver has perfect and instantaneous access to the states of all channels, the transmitters are either entirely or partially oblivious to the CSI, rendering settings in which the transmitters face CSI uncertainty. The information-theoretic limits of the MAC when all the transmitters and receivers have complete CSI are well-investigated [7,93,94]. Furthermore, there is a rich literature on the MAC’s information-theoretic limits under varying degrees of availability of instantaneous CSIT. Representative studies on the capacity region include the impact of degraded CSIT [95], quantized and asymmetric CSIT [96], asymmetric delayed CSIT [97], non-causal asymmetric partial CSIT [98], and symmetric noisy CSIT [99]. Bounds on the capacity region of the memoryless MAC in which the CSIT is made available to a different encoder in a causal manner are characterized in [100]. Counterpart results are characterized for the case of common CSI at all transmitters in [101], which are also extended in [102] to address the case in which the encoder compresses previously transmitted symbols and the previous states. The study in [103] provides an inner bound on the capacity region of the discrete and Gaussian memoryless two-user MAC in which the CSI is made available to one of the encoders non-causally. An inner bound on the capacity of the Gaussian MAC is derived in [104] when both encoders are aware of the CSI in a strictly causal manner. The capacity region of a cooperative MAC with partial CSIT is characterized in [105]. The capacity region of the multiuser Gaussian MAC in which each interference state is known to only one transmitter is characterized within a constant gap in [106]. A two-user generalized MAC with correlated states and non-causally known CSIT is studied in [107]. In [108], a two-user Gaussian double-dirty compound MAC with partial CSIT is studied. The capacity regions of a MAC with full and distributed CSIT are analyzed in [109]. A two-user cooperative MAC with correlated states and partial CSIT is analyzed in [110]. The study in [111] characterizes inner and upper bounds on the capacity region of a finite-state MAC with feedback.
Despite the rich literature on the MAC with full CSIT, when the transmitters can acquire only the probability distribution of the fading channel state, without any instantaneous CSIT, the performance limits are not fully known. The broadcast approach is investigated for the two-user MAC with no CSIT in [73,89,90,112,113,114]. Specifically, the effectiveness of a broadcast strategy for multiuser channels is investigated in [73,90,112] for the settings in which the transmitters are oblivious to all channels, and in [89,114] for the settings in which each transmitter is oblivious to only the channels linking the other users to the receiver. When the transmitters are oblivious to all channels, the approaches in [73] and [112] adopt the broadcast strategy designed for single-user channels and directly apply it to the MAC. As a result, each transmitter generates a number of information streams, each adapted to a specific realization of the direct channel linking the transmitter to the receiver. The study in [90] takes a different approach based on the premise that the contribution of each user to the overall performance of the multiple access channel not only depends on the direct channel linking this user to the receiver, but is also influenced by the relative qualities of the other users' channels. Hence, it proposes a strategy in which the information streams are generated and adapted to the channel's combined state resulting from incorporating all individual channel states. The setting in which the transmitters have only local CSIT, that is, each transmitter has the CSI of its direct channel to the receiver while being unaware of the states of the other users' channels, is studied in [89,114]. Medium access without transmitter coordination is studied in [115].
The remainder of this section is organized as follows. This section focuses primarily on the two-user MAC, for which we provide a model in Section 3.2. We start by discussing the settings in which the transmitters have access to only the statistical model of the channel and are oblivious to the instantaneous channel state in Section 3.4, with an emphasis on continuous channel models. Next, we focus on the setting in which the receiver has full CSI and the transmitters have only the statistical model of the CSI, and review two broadcast approaches in Section 3.5 and Section 3.6. The focus of these two subsections is on two-state discrete channel models; their generalization to multi-state channels is discussed in Section 3.7. Finally, we review two broadcast approach solutions for settings with local (partial) CSIT in Section 3.8 and Section 3.9. The focus of these two subsections is on the two-state discrete channel models, and their generalization to the multi-state models is discussed in Section 3.10.

3.2. Network Model

Consider a two-user multiple access channel, in which two independent users transmit independent messages to a common receiver via a discrete-time Gaussian multiple-access fading channel. All the users are equipped with one antenna, and the random channel coefficients are statistically independent. The fading process is assumed to remain unchanged during each transmission cycle and can change to independent states afterward. The users are subject to an average transmission power constraint $P$. By defining $x_i$ as the signal of transmitter $i \in \{1,2\}$ and $h_i$ as the coefficient of the channel linking transmitter $i \in \{1,2\}$ to the receiver, the received signal is
$y = h_1 x_1 + h_2 x_2 + n ,$
where n accounts for the AWGN with mean 0 and variance 1. We consider both continuous and discrete models for the channel. 

3.2.1. Discrete Channel Model

Each of the channels, independently of the other one, can be in one of finitely many distinct states. We denote the number of states by $\ell$ and denote the distinct values that $h_1$ and $h_2$ can take by $\{ s_m : m \in \{1, \ldots, \ell\} \}$. Hence, the multiple access channel can be in one of $\ell^2$ combined states. By leveraging the broadcast approach (c.f. [23,25,112]), the communication model in (144) can be equivalently presented by a broadcast network that has two inputs $x_1$ and $x_2$ and $\ell^2$ outputs, each corresponding to one possible combination of the channels $h_1$ and $h_2$. We denote the output corresponding to the combination $h_1 = s_m$ and $h_2 = s_n$ by
$y_{mn} = s_m x_1 + s_n x_2 + n_{mn} ,$
where $n_{mn}$ is a standard Gaussian random variable for all $m, n \in \{1, \ldots, \ell\}$. Figure 7 depicts this network for the case of two-state channels ($\ell = 2$). Without loss of generality and for convenience in notation, we assume that the channel gains $\{ s_m : m \in \{1, \ldots, \ell\} \}$ take real positive values and are ordered in ascending order, i.e., $0 < s_1 < s_2 < \cdots < s_\ell$. We define $p_{mn}$ as the probability of the state $(h_1, h_2) = (s_m, s_n)$. Accordingly, we also define $q_m = \sum_{n=1}^{\ell} p_{mn}$ and $p_n = \sum_{m=1}^{\ell} p_{mn}$. We focus throughout the section on the case of symmetric average transmission power constraints, i.e., $P_1 = P_2 = P$, whereas the generalization to asymmetric power constraints is straightforward.
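For concreteness, the following minimal Python sketch sets up the joint state probabilities $p_{mn}$ and the marginals $q_m$ and $p_n$ defined above; the numerical values of the joint distribution are illustrative assumptions, not taken from the referenced works.

import numpy as np

# Discrete channel-state model: p_joint[m, n] = P(h1 = s_{m+1}, h2 = s_{n+1}).
l = 2
p_joint = np.array([[0.25, 0.25],
                    [0.25, 0.25]])          # illustrative joint distribution
assert np.isclose(p_joint.sum(), 1.0)

q = p_joint.sum(axis=1)    # q_m = sum_n p_{mn}  (marginal of h1)
p = p_joint.sum(axis=0)    # p_n = sum_m p_{mn}  (marginal of h2)
print(q, p)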

3.2.2. Continuous Channel Model

In the continuous channel model, the fading coefficients $h_1$ and $h_2$ take a continuum of values that follow known statistical models. These statistical models are known to the transmitters and the receiver. We denote the fading powers by $s_1 = |h_1|^2$ and $s_2 = |h_2|^2$. Depending on the channel realizations, we denote the channel output when the fading powers are $s_1$ and $s_2$ by
$y_{s_1 s_2} = \sqrt{s_1}\, x_1 + \sqrt{s_2}\, x_2 + n_{s_1 s_2} .$
Throughout this section, we use the notation $C(x, y) = \frac{1}{2} \log_2\!\left( 1 + \frac{x}{y + \frac{1}{P}} \right)$.
We review settings in which the transmitters are either fully oblivious to all channels or have local CSIT. That is, each transmitter 1 (2) knows channel h 1 ( h 2 ) while being unaware of channel h 2 ( h 1 ). We refer to this model by L-CSIT, and similarly to the N-CSIT setting, we characterize an achievable rate region for it.
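For the numerical illustrations in the rest of this section, the following minimal Python sketch implements the rate function $C(x, y)$ in the form reconstructed above; the exact form of $C(\cdot,\cdot)$ and the example parameter values are assumptions for illustration only.

import math

def rate_C(x, y, P):
    """Rate function C(x, y), assuming the reconstructed form
    C(x, y) = (1/2) * log2(1 + x / (y + 1/P)):
    x is the (normalized) signal power, y the interference power, P the SNR."""
    return 0.5 * math.log2(1.0 + x / (y + 1.0 / P))

# Example call: the single-layer rate C(2*s2, 0) over a strong channel s2 = 1 at P = 10.
print(rate_C(2 * 1.0, 0.0, 10.0))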

3.3. Degradedness and Optimal Rate Splitting

The broadcast approach's hallmark is designating an order of degradedness among different network realizations based on their qualities. Designating degradedness in the single-user single-antenna channel arises naturally, as discussed in Section 2. For multiuser networks, however, there is no natural notion of degradedness, and any ordering approach will bear at least some level of heuristics. In the broadcast approaches that we discuss in this section for the MAC, we use the capacity regions of the multiple access channels under the different network states to designate degradedness. Based on this notion, once one of the channels improves, the associated capacity region expands, alluding to the possibility of sustaining higher reliable rates.

3.4. MAC without CSIT—Continuous Channels

We start by discussing the canonical Gaussian multiple-access channel in which the channels undergo the continuous fading model in (146). This is the setting primarily investigated in [73]. To formalize this approach, we define $R_i(s)$ as the reliably communicated information rate of transmitter $i$ at fading level $s$. Similarly to the single-user channel, we define $\rho_i(s)$ as the power assigned to the infinitesimal information layer of transmitter $i$ corresponding to fading power $s$. Accordingly, we define the interference terms
$I_i(s) = \int_{s}^{\infty} \rho_i(u)\, \mathrm{d}u .$
When the channels' fading powers are $s_1$ and $s_2$, we define $\mathrm{SNR}_i(s_1, s_2)$ as the effective SNR of transmitter $i$. These SNR terms satisfy
$\mathrm{SNR}_1(s_1, s_2) = \frac{s_1}{1 + s_2\, I_2\big(\mathrm{SNR}_2(s_1, s_2)\big)} , \qquad \mathrm{SNR}_2(s_1, s_2) = \frac{s_2}{1 + s_1\, I_1\big(\mathrm{SNR}_1(s_1, s_2)\big)} .$
Hence, corresponding to this channel combination, the rate that transmitter $i$ can sustain reliably is
$R_i(s_1, s_2) = \int_{0}^{\mathrm{SNR}_i(s_1, s_2)} \frac{-\,u\, \mathrm{d}I_i(u)}{1 + u\, I_i(u)} ,$
and subsequently, the expected rate of transmitter i is
$\bar{R}_i = \mathbb{E}\big[ R_i(s_1, s_2) \big] = \int_{0}^{\infty} \big( 1 - F_i(u) \big) \frac{-\,u\, \mathrm{d}I_i(u)}{1 + u\, I_i(u)} ,$
where F i denotes the CDF of SNR i ( s 1 , s 2 ) . Any resource allocation or optimization problem over the average rates R ¯ 1 and R ¯ 2 consists in determining the power allocation functions ρ i ( s ) . For instance, finding the transmission policy that yields the maximum average rate R ¯ 1 + R ¯ 2 boils down to designing functions ρ 1 and ρ 2 . The same formulation can be readily generalized to the K-user MAC, in which we designate a power allocation function to each transmitter, accordingly define the interference functions, the achievable rates for each specific channel realization, and the average rate of each user.
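To make the layered construction concrete, the following Python sketch discretizes a power density $\rho_i$ over fading-power levels, iterates the fixed-point relations for the effective SNRs as reconstructed above, and evaluates the layered rate integral. The uniform power density, the grid, and the fading realization are illustrative assumptions rather than design choices taken from [73].

import numpy as np

P = 10.0                            # per-user average power
u = np.linspace(0.0, 5.0, 2001)     # grid of fading-power levels
du = u[1] - u[0]

def make_layers(total_power, grid):
    """Uniform power density over the grid (an arbitrary illustrative choice)."""
    return np.full_like(grid, total_power / (grid[-1] - grid[0]))

rho1 = make_layers(P, u)
rho2 = make_layers(P, u)

def I_of(rho, s):
    """Residual (interfering) layer power I(s) = integral_s^inf rho(v) dv."""
    return np.sum(rho[u >= s]) * du

def effective_snrs(s1, s2, iters=100):
    """Fixed-point iteration for SNR_1 and SNR_2 (reconstructed relations)."""
    snr1, snr2 = s1, s2
    for _ in range(iters):
        snr1 = s1 / (1.0 + s2 * I_of(rho2, snr2))
        snr2 = s2 / (1.0 + s1 * I_of(rho1, snr1))
    return snr1, snr2

def layered_rate(rho, snr):
    """R = integral_0^snr  u * rho(u) / (1 + u * I(u)) du  (nats-scale sketch)."""
    I_vals = np.cumsum(rho[::-1])[::-1] * du     # I(u) on the grid
    mask = u <= snr
    integrand = u[mask] * rho[mask] / (1.0 + u[mask] * I_vals[mask])
    return np.sum(integrand) * du

s1, s2 = 0.5, 1.0                    # one illustrative fading realization
snr1, snr2 = effective_snrs(s1, s2)
print(layered_rate(rho1, snr1), layered_rate(rho2, snr2))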

3.5. MAC without CSIT—Two-State Channels: Adapting Streams to the Single-User Channels

We continue by reviewing finite-state multiple access channels. This setting was first investigated in [112] for the two-state discrete channel model. As suggested in [112], one can readily adopt the single-user strategy of [25] and split the information stream of a transmitter into two streams, each corresponding to one fading state, and encode them independently. Recalling the canonical model in (146), let us refer to the channels with the fading gains $s_1$ and $s_2$ as the weak and strong channels, respectively (we will use this strong versus weak dichotomy throughout Section 3). The two encoded information streams are subsequently superimposed and transmitted over the channel. One of the streams, denoted by $W_1$, is always decoded by the receiver, while the second stream, denoted by $W_2$, is decoded only when the channel is strong.
This strategy is adopted and directly applied to the multiple access channel in [112]. Specifically, it generates two coded information streams per transmitter, where the streams of user i { 1 , 2 } are denoted by { W 1 i , W 2 i } . Based on the channels’ actual realizations, a combination of these streams is successively decoded by the receiver. In the first stage, the baseline streams W 1 1 and W 1 2 , which constitute the minimum amount of guaranteed information, are decoded. Additionally, when the channel between transmitter i and the receiver, i.e., h i is strong, in the second stage information stream W 2 i is also decoded. Table 1 depicts the decoding sequence corresponding to each of the four possible channel combinations.
Based on the codebook assignment and decoding specified in Table 1, the equivalent multiuser network is depicted in Figure 8. The performance limits on the rates are characterized by delineating the interplay among the rates of the four codebooks $\{W_1^1, W_1^2, W_2^1, W_2^2\}$. We denote the rate of codebook $W_i^j$ by $R(W_i^j)$. There are two ways of grouping these rates and assessing the interplay among the groups. One approach is to analyze the interplay between the rates of the codebooks adapted to the weak channels and the rates of those adapted to the strong channels. The second approach is to analyze the interplay between the rates of the two users. In a symmetric setting, and in the face of CSIT uncertainty, the former is the natural choice. For this purpose, define $R_w = R_1^1 + R_1^2$ and $R_s = R_2^1 + R_2^2$ as the sum-rates of the codebooks adapted to the weak and strong channels, respectively. The study in [112] characterizes the capacity region of the pairs $(R_w, R_s)$ achievable in the Gaussian channel, where it is shown that superposition coding is the optimal coding strategy. The capacity region of this channel is specified in the following theorem.
Theorem 3
([112]). The ( R w , R s ) capacity region for the channel depicted in Figure 8 is given by the set of all rates satisfying
$R_w \leq C\big( 2 s_1 (1 - \beta) ,\; 2 s_1 \beta \big) ,$
$R_s \leq C\big( 2 s_2 \beta ,\; 0 \big) ,$
for all $\beta \in [0, 1]$.
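For a numerical illustration of Theorem 3, the following Python sketch sweeps the power-split parameter $\beta$ and traces the resulting $(R_w, R_s)$ boundary; it relies on the reconstructed forms of $C(x, y)$ and of the two bounds above, and the parameter values mirror the ones used later in the Figure 11 discussion.

import numpy as np

def rate_C(x, y, P):
    # Reconstructed rate function: C(x, y) = (1/2) log2(1 + x / (y + 1/P)).
    return 0.5 * np.log2(1.0 + x / (y + 1.0 / P))

P, s1, s2 = 10.0, 0.5, 1.0

# The strong-adapted layers of each user get a power fraction beta; the weak-adapted
# layers get 1 - beta and are decoded first, treating the strong layers as noise.
for beta in np.linspace(0.0, 1.0, 11):
    R_w = rate_C(2 * s1 * (1 - beta), 2 * s1 * beta, P)
    R_s = rate_C(2 * s2 * beta, 0.0, P)
    print(f"beta={beta:.1f}  R_w={R_w:.3f}  R_s={R_s:.3f}")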

3.6. MAC without CSIT—Two-State Channels: State-Dependent Layering

In the approach of Section 3.5, each transmitter adapts its transmission to its direct link to the receiver without regard for the channel linking the other transmitter to the receiver. However, the contribution of user $i \in \{1,2\}$ to a network-wide performance metric (e.g., sum-rate capacity) depends not only on the quality of the channel $h_i$, but also on the quality of the other user's channel. This motivates adapting the transmission scheme of each transmitter to the MAC's combined state instead of the individual channels. As investigated in [90,113], adapting to the network state can be facilitated by assigning more information streams to each transmitter and adapting them to the combined effect of both channels. Designing and assigning more than two information streams to each transmitter allows for a finer resolution in successive decoding, which in turn expands the capacity region characterized in [112].
To review the encoding and decoding scheme as well as the attendant rate regions, we start by focusing on the two-state discrete channel model. This setting furnishes the context to highlight the differences between the layering and successive decoding strategy in this section and those investigated in Section 3.5. By leveraging the intuition gained, the general multi-state discrete channel model will be discussed in Section 3.7.
In the approach that adapts the transmissions to the combined network states, each transmitter splits its message into four streams corresponding to the four possible combinations of the two channels. These codebooks for transmitter i { 1 , 2 } are denoted by { W 11 i , W 12 i , W 21 i , W 22 i } , where the information stream W u v i is associated with the channel realization in which the channel gain of user i is s v , and the channel gain of the other user is s u . These stream assignments are demonstrated in Figure 9. The initial streams { W 11 1 , W 11 2 } account for the minimum amount of guaranteed information, which are adapted to the channel combination ( h 1 2 , h 2 2 ) = ( s 1 , s 1 ) and they should be decoded by all four possible channel combinations. When at least one of the channels is strong, the remaining codebooks are grouped and adapted to different channel realizations according to the assignments described in Figure 9. Specifically:
  • The second group of streams { W 12 1 , W 21 2 } are reserved to be decoded in addition to { W 11 1 , W 11 2 } when h 1 is strong, while h 2 is still weak.
  • Alternatively, when h 1 is weak and h 2 is strong, instead the third group of streams, i.e., { W 21 1 , W 12 2 } , are decoded.
  • Finally, when both channels are strong, in addition to all the previous streams, the fourth group { W 22 1 , W 22 2 } is also decoded.
The order in which the codebooks are successively decoded in different network states is presented in Table 2. Based on this successive decoding order, channel gain state ( s 1 , s 1 ) is degraded with respect to all other states (i.e., the capacity region of the MAC corresponding to receiver y 11 is strictly smaller than those of the other three receivers), while  ( s 1 , s 2 ) and ( s 2 , s 1 ) are degraded with respect to ( s 2 , s 2 ) . Clearly, the codebook assignment and successive decoding approach presented in Table 2 subsumes the one proposed in [112] presented in Table 1. In particular, Table 1 can be recovered as a special case of Table 2 by setting the rates of the streams { W 21 1 , W 21 2 , W 22 1 , W 22 2 } to zero. The codebook assignment and decoding order discussed leads to the equivalent multiuser network with two inputs { x 1 , x 2 } and four outputs { y 11 , y 12 , y 21 , y 22 } , as depicted in Figure 10. Each receiver is designated to decode a pre-specified set of codebooks.
Next, we delineate the region of all achievable rates $R_{uv}^i$ for $i, u, v \in \{1,2\}$, where $R_{uv}^i$ accounts for the rate of codebook $W_{uv}^i$. Define $\beta_{uv}^i \in [0,1]$ as the fraction of the power that transmitter $i$ allocates to stream $W_{uv}^i$ for $u, v \in \{1,2\}$, where clearly $\sum_{u=1}^{2} \sum_{v=1}^{2} \beta_{uv}^i = 1$. For simplicity in notation, and in order to place the emphasis on the interplay among the rates of different information streams, we focus on a symmetric setting in which the corresponding streams of the two users have identical rates, i.e., $R_{uv} = R_{uv}^1 = R_{uv}^2$.
Theorem 4
([90]). The achievable rate region of the rates ( R 11 , R 12 , R 21 , R 22 ) for the channel depicted in Figure 10 is the set of all rates satisfying:
$R_{11} \leq r_{11}$
$R_{12} \leq r_{12}$
$R_{21} \leq r_{21}$
$R_{12} + R_{21} \leq r_{1}$
$2 R_{12} + R_{21} \leq r_{12}$
$R_{12} + 2 R_{21} \leq r_{21}$
$R_{22} \leq r_{22} ,$
over all possible power allocation factors $\beta_{uv}^i \in [0,1]$ such that $\sum_{u=1}^{2} \sum_{v=1}^{2} \beta_{uv}^i = 1$, where, by setting $\bar{\beta}_{uv} = 1 - \beta_{uv}$, we have defined
$r_{11} = \min\Big\{ \tfrac{1}{2} C\big( 2 s_1 \beta_{11} , 2 s_1 \bar{\beta}_{11} \big) ,\; C\big( s_1 \beta_{11} , (s_1 + s_2) \bar{\beta}_{11} \big) ,\; C\big( s_2 \beta_{12} , s_1 (\beta_{12} + \beta_{22}) + s_2 (\beta_{21} + \beta_{22}) \big) \Big\} ,$
$r_{12} = \min\Big\{ \tfrac{1}{2} C\big( 2 s_2 \beta_{12} , 2 s_2 \beta_{22} \big) ,\; C\big( s_2 \beta_{12} , s_1 (\beta_{12} + \beta_{22}) + s_2 (\beta_{21} + \beta_{22}) \big) \Big\} ,$
$r_{21} = \min\Big\{ \tfrac{1}{2} C\big( 2 s_2 \beta_{21} , 2 s_2 \beta_{22} \big) ,\; C\big( s_1 \beta_{21} , s_1 (\beta_{12} + \beta_{22}) + s_2 (\beta_{21} + \beta_{22}) \big) \Big\} , \quad r_{1} = \min\Big\{ \tfrac{1}{2} C\big( 2 s_2 (\beta_{12} + \beta_{21}) , 2 s_2 \beta_{22} \big) ,\; C\big( s_1 \beta_{21} + s_2 \beta_{12} , s_1 (\beta_{12} + \beta_{22}) + s_2 (\beta_{21} + \beta_{22}) \big) \Big\} ,$
$r_{12} = C\big( s_2 (2 \beta_{12} + \beta_{21}) , 2 s_2 \beta_{22} \big) ,$
$r_{21} = C\big( s_2 (\beta_{12} + 2 \beta_{21}) , 2 s_2 \beta_{22} \big) ,$
$r_{22} = \tfrac{1}{2} C\big( 2 s_2 \beta_{22} , 0 \big) .$
Proof. 
The proof follows from the structure of the rate-splitting approach presented in Figure 9 and the decoding strategy presented in Table 2. The detailed proof is provided in ([90], Appendix B).    □
In order to compare the achievable rate region in Theorem 4 with the capacity region presented in Theorem 3, we group the information streams in the way that they are ordered and decoded in [112]. Specifically, the streams $\{W_{21}^1, W_{21}^2, W_{22}^1, W_{22}^2\}$ are allocated zero power. Information streams $W_{11}^1$ and $W_{11}^2$ are adapted to the weak channels, and the information streams $W_{12}^1$ and $W_{12}^2$ are reserved to be decoded when one or both channels are strong. The streams adapted to the strong channels are grouped and their rates aggregated, and those adapted to the weak channels are grouped likewise. Based on this, the region presented in Theorem 4 can be used to form the sum-rates $R_w = R_{11}^1 + R_{11}^2$ and $R_s = R_{12}^1 + R_{12}^2$.
Theorem 5
([90]). By setting the power allocated to streams { W 21 1 , W 21 2 , W 22 1 , W 22 2 } to zero, the achievable rate region characterized by Theorem 4 reduces to the following region, which coincides with the capacity region characterized in [112].
$R_w \leq \min\{ a_3 , a_6 , a_9 , a_4 + a_8 \} ,$
and $R_s \leq C\big( s_2 \beta_{12}^1 + s_2 \beta_{12}^2 , 0 \big) ,$
where we have defined
$a_3 = C\big( s_1 (\beta_{11}^1 + \beta_{11}^2) , s_1 (\bar{\beta}_{11}^1 + \bar{\beta}_{11}^2) \big) ,$
$a_4 = C\big( s_1 \beta_{11}^1 , s_1 \bar{\beta}_{11}^1 + s_2 \bar{\beta}_{11}^2 \big) ,$
$a_6 = C\big( s_1 \beta_{11}^1 + s_2 \beta_{11}^2 , s_1 \bar{\beta}_{11}^1 + s_2 \bar{\beta}_{11}^2 \big) ,$
$a_8 = C\big( s_1 \beta_{11}^2 , s_2 \bar{\beta}_{11}^1 + s_1 \bar{\beta}_{11}^2 \big) ,$
$a_9 = C\big( s_2 \beta_{11}^1 + s_1 \beta_{11}^2 , s_2 \bar{\beta}_{11}^1 + s_1 \bar{\beta}_{11}^2 \big) .$
Proof. 
See ([90], Appendix D).    □
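Before turning to the numerical comparison, the following Python sketch evaluates the $(R_w, R_s)$ corner implied by Theorem 5 for a given pair of weak-adapted power fractions. It uses the reconstructed rate function $C(x, y)$ and the constants $a_3, a_4, a_6, a_8, a_9$ as written above; the swept power splits are illustrative.

import numpy as np

def rate_C(x, y, P):
    # Reconstructed rate function C(x, y) = (1/2) log2(1 + x / (y + 1/P)).
    return 0.5 * np.log2(1.0 + x / (y + 1.0 / P))

def theorem5_point(b11_1, b11_2, s1, s2, P):
    """(R_w, R_s) pair implied by Theorem 5 for the weak-adapted power
    fractions b11_1, b11_2 of the two users (a sketch based on the
    reconstructed constants above)."""
    bb1, bb2 = 1.0 - b11_1, 1.0 - b11_2      # power left for the strong-adapted streams
    a3 = rate_C(s1 * (b11_1 + b11_2), s1 * (bb1 + bb2), P)
    a4 = rate_C(s1 * b11_1, s1 * bb1 + s2 * bb2, P)
    a6 = rate_C(s1 * b11_1 + s2 * b11_2, s1 * bb1 + s2 * bb2, P)
    a8 = rate_C(s1 * b11_2, s2 * bb1 + s1 * bb2, P)
    a9 = rate_C(s2 * b11_1 + s1 * b11_2, s2 * bb1 + s1 * bb2, P)
    R_w = min(a3, a6, a9, a4 + a8)
    R_s = rate_C(s2 * bb1 + s2 * bb2, 0.0, P)
    return R_w, R_s

# Sweep symmetric power splits to sketch the boundary compared in Figure 11.
for b in np.linspace(0.0, 1.0, 6):
    print(theorem5_point(b, b, s1=0.5, s2=1.0, P=10.0))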
Figure 11 quantifies and compares the achievable rate regions characterized in Theorems 4 and 5 with the capacity region characterized in Theorem 3. The regions presented in Theorems 4 and 5 capture the interplay among the rates of the individual codebooks, while the capacity region of Theorem 3 characterizes the trade-off between the sum-rates of the information streams adapted to the weak and strong channels. To have a common ground for comparison, the results of Theorems 4 and 5 can be presented in terms of the codebooks of the weak and strong channel states. Recall that earlier we defined the sum-rates
R w = R 11 1 + R 11 2 , and R s = R 12 1 + R 12 2 .
Accordingly, for the coding scheme (Table 2) we define
R ¯ w = R 11 1 + R 11 2 + R 21 1 + R 21 2 + R 12 1 + R 12 2 ,
and R ¯ s = R 22 1 + R 22 2 .
Based on these definitions, Figure 11 demonstrates the regions described by $(R_w, R_s)$ and $(\bar{R}_w, \bar{R}_s)$, in which the transmission SNR is 10, the channel coefficients are $(s_1, s_2) = (0.5, 1)$, and the regions are optimized over all possible power allocation ratios. The numerical evaluations in Figure 11 depict that the achievable rate region in Theorem 4 subsumes that of Theorem 5 (and, subsequently, that of Theorem 3), and that the gap between the two regions diminishes as the rates of the information layers adapted to the strong channels increase, i.e., as $R_s$ and $\bar{R}_s$ increase. Next, in order to assess the tightness of the achievable rate regions, we present an outer bound on the capacity region of the network in Figure 10.
Theorem 6
([90]). An outer bound for the capacity region of the rates ( R 11 , R 12 , R 21 , R 22 ) for the channel depicted in Figure 10 is the set of all rates satisfying:
$R_{11} \leq \tfrac{1}{2} a_3 , \quad R_{12} \leq \tfrac{1}{2} a_{24} , \quad R_{21} \leq \tfrac{1}{2} a_{27} , \quad R_{22} \leq r_{22} ,$
where we have defined
$a_{24} = C\big( s_2 \beta_{12}^1 + s_2 \beta_{12}^2 , s_2 \beta_{22}^1 + s_2 \beta_{22}^2 \big) ,$
$a_{27} = C\big( s_2 \beta_{21}^1 + s_2 \beta_{21}^2 , s_2 \beta_{22}^1 + s_2 \beta_{22}^2 \big) ,$
$r_{22} = \tfrac{1}{2} C\big( 2 s_2 \beta_{22} , 0 \big) .$
Figure 12 compares the outer bound specified in Theorem 6 and the achievable rate region presented in Theorem 4 for SNR values 1 and 5, and the choice of ( s 1 , s 2 ) = ( 0.5 , 1 ) . Corresponding to each SNR, this figure illustrates the capacity region obtained in Theorem 3, as well as the achievable rate region and the outer bound reviewed in this section.
Next, we evaluate the average rate as a proper long-term measure capturing the expected rate over a large number of transmission cycles, where each cycle undergoes an independent fading realization. Consider a symmetric channel, in which the corresponding information streams are allocated identical power and have the same rate, and set $R_{uv} = R_{uv}^1 = R_{uv}^2$ for $u, v \in \{1,2\}$. In addition, consider a symmetric distribution for $h_1$ and $h_2$ such that $\mathbb{P}(h_1^2 = s_i) = \mathbb{P}(h_2^2 = s_i)$ for $i \in \{1,2\}$, and define $p = \mathbb{P}(h_1^2 = s_1) = \mathbb{P}(h_2^2 = s_1)$. By leveraging the stochastic model of the fading process, the average rate is
$R_{\mathrm{ave}} = 2 \big[ R_{11} + (1 - p)(R_{12} + R_{21}) + (1 - p)^2 R_{22} \big] .$
Figure 13 depicts the variations of the average sum-rate versus $p$ for different values of $s_1$. The observations from this figure confirm that higher gains are exhibited as $p$ decreases. It is noteworthy that the results in Figure 11 are consistent with the observations from Figure 13: the improvement in the average rate is significant when the probability of encountering a weak channel state is low, since the rate allocation considered in the achievable-rate-region comparison corresponds to the average rate when the probability of observing $s_1$ is zero.
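The dependence of the average sum-rate on $p$ can be evaluated directly from the expression above, as in the following short Python sketch; the per-layer rates plugged in are illustrative placeholders rather than optimized values from Theorem 4.

def average_sum_rate(R11, R12, R21, R22, p):
    """Average sum-rate of the symmetric two-state scheme of Table 2:
    R_ave = 2 * [ R11 + (1 - p) * (R12 + R21) + (1 - p)**2 * R22 ],
    where p is the probability of a weak channel state."""
    return 2.0 * (R11 + (1.0 - p) * (R12 + R21) + (1.0 - p) ** 2 * R22)

# Example: illustrative per-layer rates and p = 0.3.
print(average_sum_rate(0.4, 0.2, 0.2, 0.6, 0.3))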

3.7. MAC without CSIT—Multi-State Channels: State-Dependent Layering

The idea of adapting the transmission to the combined network states can be extended to devise codebook assignment and decoding strategies for the general multi-state channel. Similarly to the two-state channel, in the $\ell$-state channel model, $\ell^2$ codebooks are assigned to each transmitter. Specifically, corresponding to the combined channel state $(h_1^2, h_2^2) = (s_q, s_p)$, codebook $W_{pq}^1$ is assigned to transmitter 1 and codebook $W_{qp}^2$ is assigned to transmitter 2. By following the same line of analysis as in the two-state channel, the network state $(h_1^2, h_2^2) = (s_1, s_1)$ can be readily verified to be degraded with respect to the states $(s_1, s_2)$, $(s_2, s_1)$, and $(s_2, s_2)$ when $s_2 > s_1$. Additionally, the channel combinations $(s_1, s_2)$ and $(s_2, s_1)$ are degraded with respect to the state $(s_2, s_2)$. When a particular transmitter's channel becomes stronger while the interfering channel remains constant, the receiver can decode additional codebooks from that transmitter. Similarly, when a transmitter's own channel remains constant while the interfering channel becomes stronger, the receiver can decode additional layers from that transmitter; this is facilitated by decoding and removing the interfering transmitter's message, after which the remaining layers experience reduced interference. Based on these observations, by ordering the different realizations of $h_1$ and $h_2$ in ascending order and determining their relative degradedness, a successive decoding strategy is illustrated in Table 3. In this table, $A_{p,q}$ denotes the cell in the $p$th row and the $q$th column, and it specifies the set of codebooks $U_{pq}$ to be decoded when the combined channel state is $(h_1^2, h_2^2) = (s_q, s_p)$. The codebooks to be decoded in each possible combined state are recursively related to the codebooks decoded in the weaker channels. Specifically, the state corresponding to $A_{p-1,q-1}$ is degraded with respect to the states $A_{p,q-1}$ and $A_{p-1,q}$. Therefore, in the state $A_{p,q}$, the receiver decodes all streams from the states $A_{p-1,q-1}$ (included in $U_{p-1,q-1}$), $A_{p,q-1}$ (included in $U_{p,q-1}$), and $A_{p-1,q}$ (included in $U_{p-1,q}$). Subsequently, these are followed by decoding one additional stream from each user, denoted by $W_{pq}^1$ and $W_{qp}^2$. When both channel coefficients have the strongest possible realizations, all the streams from both users are decoded at the receiver.
Next, the achievable rate region is presented in Theorem 7 for the general multi-state channel. It can be verified that the region characterized by Theorem 4 is subsumed by this general rate region. Similarly to the two-state channel setting, define $R_{uv}^i$ as the rate of codebook $W_{uv}^i$ for $i \in \{1,2\}$ and $u, v \in \{1, \ldots, \ell\}$. Furthermore, define $\beta_{uv} \in [0,1]$ as the fraction of the power allocated to the codebook $W_{uv}^i$, where $\sum_{u=1}^{\ell} \sum_{v=1}^{\ell} \beta_{uv} = 1$. For simplicity in notation and to emphasize the interplay among the rates, we focus on the symmetric case in which $R_{uv} = R_{uv}^1 = R_{uv}^2$.
Theorem 7
([90]). A region of simultaneously achievable rates
$\{ R_{uv} : u < v \ \text{and} \ u, v \in \{1, \ldots, \ell\} \}$
for an ℓ-state two-user multiple access channel is characterized as the set of all rates satisfying:
$R_{uv} \leq \frac{\min\{ b_1(u,v) , b_2(u,v) , b_3(u,v) \}}{2}$
$R_{vu} \leq \frac{\min\{ b_4(u,v) , b_5(u,v) \}}{2}$
$R_{uv} + R_{vu} \leq \frac{\min\{ b_6(u,v) , b_7(u,v) , b_8(u,v) \}}{2}$
$2 R_{uv} + R_{vu} \leq b_9(u,v)$
$R_{uv} + 2 R_{vu} \leq b_{10}(u,v)$
$R_{uu} \leq \frac{\min\{ b_{11}(u) , b_{12}(u) \}}{2} ,$
where the constants $\{ b_i : i \in \{1, \ldots, 12\} \}$ are specified in Appendix A.

3.8. MAC with Local CSIT—Two-State Channels: Fixed Layering

Next, we turn to the setting in which the transmitters have local CSI. Specifically, each channel randomly takes one of a finite number of states, and each transmitter only knows the state of its direct channel to the receiver perfectly, along with the probability distribution of the state of the other transmitter’s channel. This model was first studied in [114], in which a single-user broadcast approach is directly applied to the MAC. In this approach, each transmitter generates two coded layers, where each layer is adapted to one of the states of the channel linking the other transmitter to its receiver. This transmission approach is followed by successive decoding at the receiver in which there exists a pre-specified order of decoding of the information layers.
This scheme assigns codebooks based on the channels' strengths such that it reserves one additional information layer as the channel state gets stronger. In this scheme, the number of transmitted layers and the decoding order are fixed and independent of the actual channel state. In the two-state channel model, when transmitter $i$ experiences the channel state $s_m$, it splits its message into two information layers via two independent codebooks denoted by $T_{m1}^i$ and $T_{m2}^i$. The rate of layer $T_{m1}^i$ is adapted to the weak channel state of the other user, while the rate of layer $T_{m2}^i$ is adapted to the strong channel state. Thus, each transmitter encodes its information stream by two layers and adapts the power distribution between them according to its channel state. Subsequently, the receiver implements a successive decoding scheme according to which it decodes one layer from transmitter 1 followed by one layer from transmitter 2, then the remaining layer of transmitter 1, and finally the remaining layer of transmitter 2. This order is pre-fixed and is used in all channel states. This scheme is summarized in Table 4.
The following theorem characterizes an outer bound on the average rate region. For this purpose, define R i ( h 1 , h 2 ) as the rate of transmitter i for the state pair ( h 1 , h 2 ) . Accordingly, define R ¯ i = E h 1 , h 2 [ R i ( h 1 , h 2 ) ] as the average rate of transmitter i, where the expected value is with respect to the distributions of h 1 and h 2 .
Theorem 8
([114]). When the transmitters have local CSIT, an outer bound on the expected capacity region contains rates ( R ¯ 1 , R ¯ 2 ) satisfying
$\bar{R}_1 \leq q_1 C(s_1, 0) + (1 - q_1) C(s_2, 0) ,$
$\bar{R}_2 \leq q_2 C(s_1, 0) + (1 - q_2) C(s_2, 0) ,$
$\bar{R}_1 + \bar{R}_2 \leq q_1 q_2 C(2 s_1, 0) + (q_1 + q_2 - 2 q_1 q_2) C(s_1 + s_2, 0) + (1 - q_1)(1 - q_2) C(2 s_2, 0) .$
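The outer bound of Theorem 8 is straightforward to evaluate numerically, as in the following Python sketch; it relies on the reconstructed rate function $C(x, y)$, and the example parameters (state probabilities, gains, and power) are illustrative.

import math

def rate_C(x, y, P):
    # Reconstructed rate function C(x, y) = (1/2) log2(1 + x / (y + 1/P)).
    return 0.5 * math.log2(1.0 + x / (y + 1.0 / P))

def theorem8_outer_bound(q1, q2, s1, s2, P):
    """Outer bound of Theorem 8 on (R1_bar, R2_bar): individual-rate bounds and
    the sum-rate bound, with q_i the probability that channel i is weak."""
    R1_max = q1 * rate_C(s1, 0.0, P) + (1 - q1) * rate_C(s2, 0.0, P)
    R2_max = q2 * rate_C(s1, 0.0, P) + (1 - q2) * rate_C(s2, 0.0, P)
    Rsum_max = (q1 * q2 * rate_C(2 * s1, 0.0, P)
                + (q1 + q2 - 2 * q1 * q2) * rate_C(s1 + s2, 0.0, P)
                + (1 - q1) * (1 - q2) * rate_C(2 * s2, 0.0, P))
    return R1_max, R2_max, Rsum_max

print(theorem8_outer_bound(0.5, 0.5, 0.25, 1.0, 10.0))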

3.9. MAC with Local CSIT—Two-State Channels: State-Dependent Layering

Next, we present another scheme for the MAC with local CSIT that generalizes the scheme of Section 3.8 by adapting the information layering to the combined state of the channel. The underlying motivation guiding this generalization is that we need to account for both the direct and interfering roles that each transmitter plays. Hence, the transmission rates of the different layers should be adapted to the combined state of the entire network. The major difference between this approach and that in Section 3.8 is that this scheme relies on the local CSIT available to the individual transmitters, such that each transmitter adapts its layers and their associated rates to the instantaneous state of its channel. This facilitates opportunistically sustaining higher rates.
  • State-dependent Layering. In this approach, each transmitter, depending on the instantaneous state of the local CSI available to it, splits its message into independent information layers. Formally, when transmitter $i \in \{1,2\}$ is in the weak state, it encodes its message by only one layer, which we denote by $U_{11}^i$. On the contrary, when transmitter $i \in \{1,2\}$ is in the strong state, it divides its message into two information layers, which we denote by $U_{12}^i$ and $U_{22}^i$. Hence, transmitter $i$ adapts the codebook $U_{12}^i$ (or $U_{22}^i$) to the state in which the other transmitter experiences a weak (or strong) channel. A summary of the layering scheme and the assignment of the codebooks to different network states is provided in Figure 14, in which the cell associated with the state $(s_m, s_n)$ for $m, n \in \{1,2\}$ specifies the codebook adapted to this state.
  • Decoding Scheme. A successive decoding scheme is designed based on the premise that as the combined channel state becomes stronger, more layers are decoded. Based on this, the total number of codebooks decoded increases as either of the two channels becomes stronger. In this decoding scheme, the combination of codebooks decoded in different states is as follows (and it is summarized in Table 5):
  • State ( s 1 , s 1 ) : In this state, both transmitters experience weak states, and they generate codebooks { U 11 1 , U 11 2 } according to Figure 14. In this state, the receiver jointly decodes the baseline layers U 11 1 and U 11 2 .
  • State ( s 2 , s 1 ) : When the channel of transmitter 1 is strong and the channel of transmitter 2 is weak, three codebooks { U 12 1 , U 22 1 , U 11 2 } are generated and transmitted. As specified by Table 5, the receiver jointly decodes { U 12 1 , U 11 2 } . This is followed by decoding the remaining codebook, i.e., U 22 1 .
  • State ( s 1 , s 2 ) : In this state, codebook generation and decoding are similar to those in the state ( s 2 , s 1 ) , except that the roles of transmitters 1 and 2 are interchanged.
  • State ( s 2 , s 2 ) : Finally, when both transmitters experience strong channels, the receiver decodes four codebooks in the order specified by the last row of Table 5. Specifically, the receiver first jointly decodes the baseline layers { U 12 1 , U 12 2 } , followed by jointly decoding the remaining codebooks { U 22 1 , U 22 2 } .
Compared to the setting without any CSIT at the transmitters (i.e., the setting discussed in Section 3.6), the key difference is that the transmitters have distinct transmission strategies when they experience different channel states. Specifically, each transmitter dynamically chooses its layering scheme based on the instantaneous channel state known to it. Furthermore, the major difference with the scheme of Section 3.8 is that this scheme adapts the number of encoded layers to the strength of the combined channel state. Such adaptation of the number of encoded layers has two advantages. First, it leads to overall fewer information layers being generated and transmitted, which in turn results in decoding overall fewer codebooks and reduced decoding complexity. Second, it provides the receiver with the flexibility to vary the decoding order according to the combined channel state, which allows for a higher degree of freedom in optimizing the power allocation and, subsequently, larger achievable rate regions. In support of these observations, the numerical evaluations in Figure 15 show that the achievable rate region subsumes that of Section 3.8. Furthermore, as the number of channel states increases, the sum-rate gap between these two schemes becomes more significant. Finally, the number of codebooks decoded by the scheme in this section varies with the actual channel state and can be as small as two, whereas the scheme of Section 3.8 always decodes the same number of codebooks irrespective of the state.
It is noteworthy that when in the two-state channel model of Figure 16 the channel states are s 1 = 0 and s 2 = 1 , this model simplifies to the two-user random access channel investigated in Section 3.5. In this special case, reserving one codebook to be decoded exclusively in each of the interference-free states, i.e., ( s 1 , s 2 ) and ( s 2 , s 1 ) , enlarges the achievable rate region. Hence, it is beneficial in this special case to treat codebooks ( U 22 1 , U 22 2 ) as interference whenever both users are active, i.e., when the channel state is ( s 2 , s 2 ) . In general, however, when the channel gain s 1 is non-zero, i.e., s 1 > 0 , reserving two codebooks to be decoded exclusively in these two channel states limits the average achievable rate region.
  • Achievable Rate Region. Next, we provide an inner bound on the average capacity region. Recall that the average rate of transmitter $i$ is denoted by $\bar{R}_i = \mathbb{E}_{h_1, h_2}[R_i(h_1, h_2)]$, where the expectation is with respect to the random variables $h_1$ and $h_2$. Hence, the average capacity region is the convex hull of all simultaneously achievable average rates $(\bar{R}_1, \bar{R}_2)$. Furthermore, we define $\beta_{ij}^k \in [0,1]$ as the fraction of the total power $P$ assigned to information layer $U_{ij}^k$, where we have
    $\sum_{i=1}^{j} \beta_{ij}^k = 1$
    for all $j, k \in \{1,2\}$. The next theorem characterizes an average achievable rate region.
Theorem 9
([89]). For the codebook assignment in Figure 14, and the decoding scheme in Table 5, for any given set of power allocation factors { β i j k } , the average achievable rate region { R ¯ 1 , R ¯ 2 } is the set of all rates that satisfy
$\bar{R}_1 \leq q_1\, C\big( s_1 , s_2 \beta_{22}^2 \big) + q_2 \Big[ C\big( s_2 \beta_{12}^1 , s_2 \beta_{22}^1 + s_2 \beta_{22}^2 \big) + C\big( s_2 \beta_{22}^1 , 0 \big) \Big] ,$
$\bar{R}_2 \leq p_1\, C\big( s_1 , s_2 \beta_{22}^1 \big) + p_2 \Big[ C\big( s_2 \beta_{12}^2 , s_2 \beta_{22}^1 + s_2 \beta_{22}^2 \big) + C\big( s_2 \beta_{22}^2 , 0 \big) \Big] ,$
$\bar{R}_1 + \bar{R}_2 \leq q_1 p_1\, C\big( 2 s_1 , 0 \big) + q_1 p_2\, C\big( s_1 + s_2 \beta_{12}^2 + s_2 \beta_{22}^2 , 0 \big) + q_2 p_1\, C\big( s_1 + s_2 \beta_{12}^1 + s_2 \beta_{22}^1 , 0 \big) + q_2 p_2\, C\big( s_2 \beta_{12}^1 + s_2 \beta_{12}^2 + s_2 \beta_{22}^1 + s_2 \beta_{22}^2 , 0 \big) .$
Achieving the average rate region specified in this theorem requires decoding the codebooks in the order specified by Table 5. Specifically, the receiver adopts a multi-stage decoding scheme in which, at each stage, it jointly decodes at most two codebooks. This decoding process continues until all the codebooks from both transmitters are decoded. Even though limiting the number of codebooks decoded at each stage is expected to result in a reduced rate region, it can be readily verified that the rate region achieved by employing a fully joint decoding scheme can be recovered via time-sharing among the average achievable rates corresponding to all possible decoding orders in each channel state.
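For a given set of power fractions, the three bounds of Theorem 9 can be evaluated as in the following Python sketch. It assumes the reconstructed rate function $C(x, y)$ and the grouping of the two strong-state terms under $q_2$ and $p_2$ as written above; the power fractions and state probabilities in the example are illustrative.

import math

def rate_C(x, y, P):
    # Reconstructed rate function C(x, y) = (1/2) log2(1 + x / (y + 1/P)).
    return 0.5 * math.log2(1.0 + x / (y + 1.0 / P))

def theorem9_bounds(b12_1, b22_1, b12_2, b22_2, q1, q2, p1, p2, s1, s2, P):
    """Average-rate bounds of Theorem 9 for power fractions beta_12^i, beta_22^i
    (with beta_12^i + beta_22^i = 1 for each strong-state transmitter)."""
    R1 = (q1 * rate_C(s1, s2 * b22_2, P)
          + q2 * (rate_C(s2 * b12_1, s2 * b22_1 + s2 * b22_2, P)
                  + rate_C(s2 * b22_1, 0.0, P)))
    R2 = (p1 * rate_C(s1, s2 * b22_1, P)
          + p2 * (rate_C(s2 * b12_2, s2 * b22_1 + s2 * b22_2, P)
                  + rate_C(s2 * b22_2, 0.0, P)))
    Rsum = (q1 * p1 * rate_C(2 * s1, 0.0, P)
            + q1 * p2 * rate_C(s1 + s2 * b12_2 + s2 * b22_2, 0.0, P)
            + q2 * p1 * rate_C(s1 + s2 * b12_1 + s2 * b22_1, 0.0, P)
            + q2 * p2 * rate_C(s2 * (b12_1 + b12_2 + b22_1 + b22_2), 0.0, P))
    return R1, R2, Rsum

# Example with the parameters used around Figure 15: P = 10 dB (i.e., 10 in linear scale),
# s1 = 0.25, s2 = 1, and q1 = p1 = 0.5.
print(theorem9_bounds(0.6, 0.4, 0.6, 0.4, 0.5, 0.5, 0.5, 0.5, 0.25, 1.0, 10.0))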
  • Outer Bound. Next, we provide outer bounds on the average capacity region, and we compare them with the achievable rate region specified by Theorem 9.
  • Outer bound 1: The first outer bound is the average capacity region corresponding to the two-user MAC in which the transmitters have complete access to the CSI [116]. This region is specified by OTVYZO in Figure 17.
  • Outer bound 2: The second outer bound is the average capacity region of the two-user MAC with local CSI at transmitter 1 and full CSI at transmitter 2. Outer bound 2 is formally characterized in the following theorem.
Theorem 10
([89]). For the two-user MAC with local CSI at transmitter 1 and full CSI at transmitter 2, the average capacity region is the set of all average rates enclosed by the region OTUWYZO shown in Figure 17, where the corner points are specified in Appendix B.
For the case of available local CSI at transmitter 1 and full CSI at transmitter 2, it can be shown that deploying the discussed layering scheme at transmitter 1 (with local CSIT) achieves the average sum-rate capacity of Outer bound 1. This is formalized in the following theorem.
Theorem 11
([89]). With local CSI at transmitter 1 and full CSI at transmitter 2, an average achievable rate region is the region OTUXYZO shown in Figure 17. The average capacity region is achieved along TU and YZ , and the sum-rate capacity is achieved on XY. The corner points are specified in Appendix B.
Figure 17 illustrates the inner and outer bounds on the average capacity region. Specifically, the region specified by OTVYZO is the average capacity region of a two-user MAC with full CSI at each transmitter, which serves as Outer Bound 1 specified earlier. This region encompasses Outer Bound 2, denoted by OTUWYZO. Per Theorem 11, segments TU and YZ of the boundary of the achievable region coincide with the average capacity region, and segment XY achieves the sum-rate capacity of the two-user MAC with full CSIT.
Figure 15 demonstrates the average rate region for the two-state channel. For this region, we have $P = P_1 = P_2 = 10$ dB and select the channel gains as $s_1 = 0.25$ and $s_2 = 1$. Accordingly, the channel probability parameters are set to $q_1 = p_1 = 0.5$. The main observation is that the average achievable rate region coincides with the average rate region achieved when the receiver adopts joint decoding. It can be shown that when the transmitters have local CSIT, it is possible to achieve an average sum-rate that is close to Outer Bound 1, and that the average sum-rate capacity can be achieved asymptotically in the low and high power regimes. This observation is formalized in the next theorem.
Theorem 12
([89]). By adopting the codebook assignment presented above and setting $\beta_{22}^1 = \beta_{22}^2 = s_1 / s_2$, the sum-rate capacity of a two-user MAC with full CSIT is achievable asymptotically as $P \to 0$ or $P \to \infty$.

3.10. MAC with Local CSIT—Multi-State Channels: State-Dependent Layering

In this section, we generalize the encoding and decoding strategy of Section 3.9 to the general $\ell$-state channel. When the channels have $\ell$ possible states, each transmitter is allocated $\ell$ different sets of codebooks, one corresponding to each channel state. Specifically, corresponding to channel state $s_m$ for $m \in \{1, \ldots, \ell\}$, transmitter $i$ encodes its message via $m$ information layers generated according to independent codebooks. This set of codebooks is denoted by $W_m^i = \{ U_{1m}^i , \ldots , U_{mm}^i \}$.
Table 6 specifies the designation of the codebooks to the different combined channel states. In this table, the channels are ordered in ascending order. In particular, varying the channel of transmitter 1, the combined channel state $(s_q, s_p)$ precedes all channel states $(s_k, s_p)$ for $k > q$. Similarly, for transmitter 2, channel state $(s_q, s_p)$ precedes the channel state $(s_q, s_k)$ for every $k > p$. Furthermore, according to this approach, when user $i$'s channel becomes stronger, the receiver decodes additional codebooks from user $i$. The sequence of decoding the codebooks, as shown in Table 6, is specified in three steps:
  • State ( s 1 , s 1 ) : Start with the weakest channel combination ( s 1 , s 1 ) , and reserve the baseline codebooks U 11 1 , U 11 2 to be the only codebooks to be decoded in this state. Define V 11 i = { U 11 i } as the set of codebooks that the receiver decodes from transmitter i when the channel state is ( s 1 , s 1 ) .
  • States ( s 1 , s q ) and ( s q , s 1 ) : Next, construct the first row of the table. For this purpose, define V 1 q 2 as the set of the codebooks that the receiver decodes from transmitter 2, when the channel state is ( s 1 , s q ) . Based on this, the set of codebooks in each state can be specified recursively. Specifically, in the state ( s 1 , s q ) , decode what has been decoded in the preceding state ( s 1 , s q 1 ) , i.e., the set of codebooks V 1 ( q 1 ) 2 , plus new codebooks { U 1 q 1 , , U q q 1 } . Then, construct the first column of the table in a similar fashion, except that the roles of transmitter 1 and 2 are swapped.
  • States ( s q , s p ) for p , q > 1 : By defining the set of codebooks that the receiver decodes from transmitter i in the state ( s q , s p ) by V q p i , the codebooks decoded in this state are related to the ones decoded in two preceding states. Specifically, in state ( s q , s p ) decode codebooks V ( p 1 ) q 1 and V p ( q 1 ) 1 . For example, for = 3 , the codebooks decoded in ( s 2 , s 3 ) include those decoded for transmitter 1 in state ( s 2 , s 2 ) along with those decoded for transmitter 2 in channel state ( s 1 , s 3 ) .
The decoding order in the general case is similar to the one used for $\ell = 2$ in Table 5. In particular, in channel state $(s_q, s_p)$ the receiver successively decodes $q$ codebooks from transmitter 1 along with $p$ codebooks from transmitter 2. The set of decodable codebooks in channel state $(s_q, s_p)$ is related to the set of codebooks decoded for transmitter 2 in state $(s_{q-1}, s_p)$ and those decoded for transmitter 1 in state $(s_q, s_{p-1})$. The average achievable rate region for the codebook assignment and decoding strategy presented in this section is summarized in Theorem 13. Similarly to the two-state channel case, define $\beta_{mn}^i \in [0,1]$ as the fraction of power allocated to the codebook $U_{mn}^i$ such that $\sum_{m=1}^{n} \beta_{mn}^i = 1$ for all $n \in \{1, \ldots, \ell\}$.
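The bookkeeping of which codebooks are transmitted and decoded in a given combined state can be sketched as follows in Python; this enumerates labels only (it does not compute rates), and the label format is an illustrative convention.

def codebooks_decoded(q, p):
    """Codebooks decoded in combined state (s_q, s_p) under the state-dependent
    layering: transmitter 1, in state s_q, sends (and has decoded) the q layers
    U_{1q}^1, ..., U_{qq}^1, and transmitter 2 the p layers U_{1p}^2, ..., U_{pp}^2."""
    tx1 = [f"U_{j}{q}^1" for j in range(1, q + 1)]
    tx2 = [f"U_{j}{p}^2" for j in range(1, p + 1)]
    return tx1, tx2

# Example: a three-state channel, combined state (s_2, s_3).
print(codebooks_decoded(2, 3))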
Theorem 13
([89]). For the codebook assignment in this section and the decoding scheme in Table 6, for any given set of power allocation factors { β m n i } , the average achievable rate region { R ¯ 1 , R ¯ 2 } for the ℓ-state channel is the set of all rates that satisfy
$\bar{R}_1 \leq \mathbb{E}\big[ r_1(n, m) \big] ,$
$\bar{R}_2 \leq \mathbb{E}\big[ r_2(n, m) \big] ,$
$\bar{R}_1 + \bar{R}_2 \leq \mathbb{E}\big[ \min\{ r_3(n, m) , r_4(n, m) \} \big] ,$
where the functions { r 1 ( n , m ) , , r 4 ( n , m ) } , for all m , n { 1 , , } are defined as follows.
r 1 ( n , m ) = min m j = 1 c 1 ( j , m ) + c 3 ( j , n , m ) ,
r 2 ( n , m ) = min m j = 1 c 2 ( j , m ) + c 4 ( j , n , m ) ,
r 3 ( n , m ) = k < m c 5 ( m ) + c 7 ( m , n ) + c 9 ( k , m , n ) ,
r 4 ( n , m ) = k < m c 5 ( m ) + c 6 ( m , n ) + c 8 ( k , m , n ) ,
where
c 1 ( j , m ) = C ( s j β j j 1 , s m C 2 ( j , m ) ) , j { 1 , , } , m { j , , } ,
c 2 ( j , i ) = C ( s j β j j 2 , s m C 1 ( j , m ) ) , j { 1 , , } ,
c 3 ( j , n , m ) = C ( s n β j n 1 , s n C 1 ( j , n ) + s j C 2 ( j , j ) ) , n { j + 1 , , } , m { j , , } ,
c 4 ( j , n , m ) = C ( s n β j n 2 , s j C 1 ( j , m ) + s n C 2 ( j , n ) ) , n { j + 1 , , } ,
c 5 ( m ) = C ( s m β m m 1 + s n β m m 2 ) , m { 1 , , } ,
c 6 ( m , n ) = C ( s m β m m 1 + s n β m n 2 , s n C 2 ( m , n ) ) , m < n , n { m + 1 , , } ,
c 7 ( m , n ) = C ( s n β m n 1 + s m β m m 2 , s n C 1 ( m , n ) ) , m < n , n { m + 1 , , } ,
c 8 ( k , m , n ) = C s m β k m 1 + s n β k n 2 , s m C 1 ( k , m ) + s n C 2 ( k , n ) , k < m , n { m , , } ,
c 9 ( k , m , n ) = C s n β k n 1 + s m β k m 2 , a n C 1 ( k , n ) + s m C 2 ( k , m ) , k < m , n { m , , } ,
and we have defined $C_1(m, n) = 1 - \sum_{i=1}^{m} \beta_{in}^1$ and $C_2(m, n) = 1 - \sum_{i=1}^{m} \beta_{in}^2$, for all $m < n$ and $n \in \{m+1, \ldots, \ell\}$.
Figure 18 demonstrates the average rate region for the three-state channel, in which the channel gains are $s_1 = 0.04$, $s_2 = 0.25$, $s_3 = 1$, and the channel probability parameters are $q_1 = 0.3$, $q_2 = 0.4$ for transmitter 1 and $p_1 = 0.6$, $p_2 = 0.1$ for transmitter 2. Furthermore, the region in Theorem 11 is evaluated in Figure 19. Specifically, the average achievable rate region OTUXYZ specified in Figure 17 is evaluated for three scenarios $\hat{S}_1$, $\hat{S}_2$, and $\hat{S}_3$. In all three scenarios, the average power constraint is set to 10 dB, i.e., $P_1 = P_2 = P = 10$ dB, and the channel states are $(s_1, s_2) = (0.3, 1)$. Evaluations are carried out for the symmetric setting $\hat{S}_1$ with the probability distribution $q_1 = p_1 = 0.5$, and for the asymmetric cases $\hat{S}_2$ and $\hat{S}_3$ with probability distributions $q_1 = 0.2$, $p_1 = 0.8$ and $q_1 = 0.4$, $p_1 = 0.5$, respectively. These figures illustrate that the average capacity region of the two-user MAC with full CSIT can be partially achieved when only one user has full CSIT.

4. The Interference Channel

4.1. Overview

In this section, we turn the focus to the interference channel as a key building block of interference-limited wireless networks. In this channel, multiple transmitters communicate with their designated receivers, imposing interference on one another. The design and analysis of interference management schemes have a rich literature. Irrespective of their discrepancies, the existing approaches often rely on the accurate availability of the CSIT and CSIR. We discuss how the broadcast approach can be viewed as a distributed interference management scheme, rendering a practical approach for effective communication over the interference channel in the face of unknown CSIT.
While the literature on assessing the communication reliability limits of the interference channel and the attendant interference management schemes is rich, a significant focus is on channels with perfect availability of the CSIT at all transmitters. Representative known results in the asymptote of the high-SNR regime include the degrees-of-freedom (DoF) region achievable by interference alignment [117,118]. In the non-asymptotic SNR regime, of particular note is the achievable rate region due to Han–Kobayashi (HK) [119,120], which is shown to achieve rates within one bit of the capacity region for the Gaussian interference channel [121]. While unknown in its general form, the capacity region is known in special cases, including the strong interference channel [122,123], the discrete additive degraded interference channel [124], certain classes of the deterministic interference channel [125,126,127,128], and opportunistic communication under bursty interference, which is a form of the broadcast approach and is studied under different assumptions on the non-causal availability of the CSI at the transmitters and receivers [129]. Some other examples of the broadcast approach applied to the interference channel are found in [130,131]. There are extensive studies on circumventing the challenges associated with the analysis of, and optimal resource allocation over, the HK region [130,132,133,134,135,136,137,138]. A more detailed and thorough overview of these can be found in [94].
Interference management without CSIT has also been the subject of intense studies more recently, with more focus on the high SNR regime. Representative studies in the high SNR regime include characterizing the DoF region for the two-user multi-antenna interference channel in [139,140,141,142,143,144,145]; blind interference alignment in [146,147,148,149,150,151,152,153,154,155]; interference management via leveraging network topologies in [156,157]; and ergodic interference channels in [158,159,160]. In the non-asymptotic SNR regime, the studies are more limited, and they include analysis on the capacity region of the erasure interference channel in [161,162]; the compound interference channel in [163]; ergodic capacity for the Z-interference channel in [164]; ergodic capacity of the strong and very strong interference channels in [165,166]; and approximate capacity region for the fast-fading channels [167,168].
In this section, conducive to relieving the dependency on full CSIT, we discuss how the broadcast approach can be viewed as a distributed interference management solution for circumventing the lack of CSIT in the multiuser interference channel. One significant intuition provided by the HK scheme is that, even with full CSIT, layering and superposition coding are necessary. Built upon this intuition, the broadcast approach is a natural evolution of the HK scheme. We focus on the two-user finite-state Gaussian interference channel to convey the key ideas in rate-splitting, codebook assignment, and decoding schemes. The remainder of this section is organized as follows. This section focuses primarily on the two-user Gaussian interference channel, for which we provide a model in Section 4.2. We start by discussing the setting in which the receivers have full CSI and the transmitters have only the statistical model of the CSI, and review the application of the broadcast approach in this setting in Section 4.3 for the two-user channel and in Section 4.4 for the multiuser channel. Finally, we review the interference channel with local CSIT in Section 4.5. Under the setting with local CSIT, we consider two scenarios in which each transmitter knows either the level of the interference that its respective receiver experiences, or the level of interference it imposes on the unintended receiver. We discuss how the broadcast approach can be designed for each of these two scenarios.

4.2. Broadcast Approach in the Interference Channel—Preliminaries

Consider the two-user slowly-fading Gaussian interference channel, in which the coefficient of the channel connecting transmitter $j$ to receiver $i$ is denoted by $h_{ij}^*$ for $i, j \in \{1,2\}$. We refer to $h_{ii}^*$ and $h_{ij}^*$, $i \neq j$, as the direct and cross channel coefficients, respectively. The signal received by receiver $i$ is
$y_i^* = h_{ii}^*\, x_i^* + h_{ij}^*\, x_j^* + n_i^* ,$
where $x_i^*$ denotes the signal transmitted by transmitter $i$, and $n_i^*$ accounts for the AWGN distributed according to $\mathcal{N}(0, N_i)$. The transmitted symbol $x_i^*$ is subject to the average power constraint $P_i^*$, i.e., $\mathbb{E}[|x_i^*|^2] \leq P_i^*$. Each channel is assumed to follow a block-fading model in which the channel coefficients remain constant for the duration of a transmission block of length $n$ and change randomly to another state afterward. We consider an $\ell$-state channel model in which each channel coefficient $h_{ij}^*$, randomly and independently of the rest of the channels, takes one of the $\ell$ possible states $\{ s_i : i \in \{1, \ldots, \ell\} \}$. Without loss of generality, we assume that $0 < s_1 < \cdots < s_\ell < +\infty$. The $\ell$-state interference channel in (208) gives rise to an interference channel with $\ell^2$ different states. The entire channel state is assumed to be fully known to the receivers while being unknown to the transmitters. A statistically equivalent form of the $\ell$-state interference channel in (208) is the standard interference channel model given by [169,170]
$y_1 = x_1 + \sqrt{a_1}\, x_2 + n_1 , \quad \text{and} \quad y_2 = \sqrt{a_2}\, x_1 + x_2 + n_2 ,$
and the inputs satisfy $\mathbb{E}[|x_i^*|^2] \leq P_i^*$, where we have defined
$a_1 = \left( \frac{h_{12}^*}{h_{22}^*} \right)^2 \cdot \frac{N_2}{N_1} , \qquad a_2 = \left( \frac{h_{21}^*}{h_{11}^*} \right)^2 \cdot \frac{N_1}{N_2} , \qquad \text{and} \qquad P_i = \frac{(h_{ii}^*)^2}{N_i} \cdot P_i^* ,$
and the terms n 1 and n 2 are the additive noise terms distributed according to N ( 0 , 1 ) . The equivalence between (208) and (209) can be established by setting
$y_i = \frac{y_i^*}{\sqrt{N_i}} , \qquad x_i = \frac{h_{ii}^*}{\sqrt{N_i}}\, x_i^* , \qquad n_i = \frac{n_i^*}{\sqrt{N_i}} .$
Channel gains $a_1$ and $a_2$ are statistically independent, inheriting their independence from that of the channel coefficients. By invoking the normalization in (210), it can be readily verified that the cross channel gains $a_i$ take one of $K = \ell(\ell - 1) + 1$ possible states, which we denote by $\{ \beta_1 , \ldots , \beta_K \}$. Without loss of generality, we assume they are in ascending order. For the two-state channel, the cross channel gain takes one of the three states $\beta_1 = s_1 / s_2$, $\beta_2 = 1$, and $\beta_3 = 1 / \beta_1$. Hence, the state of the network is specified by the two cross links, rendering $K^2$ states for the network. We say that the network is in state $(\beta_s, \beta_t)$ when $(a_1, a_2) = (\beta_s, \beta_t)$. To distinguish different states, in the network state $(\beta_s, \beta_t)$ we denote the outputs by
$y_1^s = x_1 + \sqrt{\beta_s}\, x_2 + n_1 , \quad \text{and} \quad y_2^t = \sqrt{\beta_t}\, x_1 + x_2 + n_2 .$
Hence, this interference channel can be equivalently presented as a network with two transmitters and $K^2$ receiver pairs, where each receiver pair corresponds to one possible channel state. In the case of the symmetric interference channel, we have $a_1 = a_2$, and the number of possible channel combinations reduces to $K$, rendering an equivalent network with two transmitters and $2K$ receivers. Figure 20 depicts such a symmetric network for the two-state channel. Finally, we define
$q_1^s = \mathbb{P}(a_1 = \beta_s) \quad \text{and} \quad q_2^s = \mathbb{P}(a_2 = \beta_s) .$
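The normalization to the standard form and the enumeration of the cross-gain values can be sketched as follows in Python; the channel coefficients, noise variances, and powers in the example are illustrative assumptions, and the expressions follow the definitions reconstructed above.

import itertools

def standard_form(h, N, P_star):
    """Map the original coefficients to the standard model: h[i][j] is the
    coefficient of the channel from transmitter j to receiver i, N[i] the noise
    variance at receiver i, and P_star[i] the original power constraint."""
    a1 = (h[0][1] / h[1][1]) ** 2 * N[1] / N[0]
    a2 = (h[1][0] / h[0][0]) ** 2 * N[0] / N[1]
    P1 = h[0][0] ** 2 / N[0] * P_star[0]
    P2 = h[1][1] ** 2 / N[1] * P_star[1]
    return a1, a2, P1, P2

# Cross-gain alphabet {beta_1, ..., beta_K} obtained as ratios of the l states.
s = [0.5, 1.0]                                           # l = 2 channel states
betas = sorted({si / sj for si, sj in itertools.product(s, s)})
print(betas)                                             # [s1/s2, 1, s2/s1]; K = l(l-1)+1 = 3

print(standard_form(h=[[1.0, 0.5], [0.5, 1.0]], N=[1.0, 1.0], P_star=[10.0, 10.0]))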

4.3. Two-User Interference Channel without CSIT

Effective interference management in the interference channel hinges on how a transmitter can balance the two opposing roles that it has as both an information source and an interferer. Striking such a balance requires designating a proper notion of degradedness according to which different realizations of the network can be distinguished and ordered. Hence, specifying an order of degradedness plays a central role in assigning codebooks and designing the decoding schemes. We adopt the same notion of degradedness that was used for the MAC with proper modifications.
When each channel has $\ell$ possible states, the cross channel gains take one of $K = \ell(\ell - 1) + 1$ possible values. Hence, by adopting the broadcast approach, this two-user interference channel becomes equivalent to a multiuser network consisting of two transmitters and $K^2$ receivers. The transmitters and each of these receivers form a MAC, in which the receiver is interested in decoding as many information layers as possible. To this end, and by following the same line of arguments we had for the MAC, we use the capacity regions of the individual MACs to designate degradedness among distinct network states.
The network model in (212) is equivalent to a collection of MACs. The MAC associated with receiver y i s , for s < k , is degraded with respect to the MAC associated with the receiver y i k . Hence, receiver y i k can successfully decode all the information layers that are decoded by the receivers { y i 1 , , y i s } . Driven by this approach to designating degradedness, each transmitter splits its message into multiple independent codebooks, where each is adapted to one combined state of the network and intended to be decoded by specific receivers.
At receiver $y_i^k$, decoding every additional layer from transmitter $i$ directly increases the achievable rate. In parallel, decoding each additional layer from the other (interfering) transmitter indirectly increases the achievable rate by canceling a part of the interfering signal. Driven by these two observations, transmitter $i$ breaks its message into $2K$ layers denoted by $\{ V_i^k , U_i^k \}_{k=1}^{K}$, each serving a specific purpose. Recall that in the canonical model in (212), the direct channels remain unchanged and only the cross channels have varying states. Hence, each of the $2K$ layers of each transmitter is designated to a specific cross channel state and receiver.
  • Transmitter 1 (or 2) reserves the information layer $V_1^k$ (or $V_2^k$) for adaptation to the channel from transmitter 1 (or 2) to the unintended receiver $y_2^k$ (or $y_1^k$). Based on this designation, the intended receivers $\{y_1^k\}_{k=1}^{K}$ (or $\{y_2^k\}_{k=1}^{K}$) will decode all codebooks $\{V_1^k\}_{k=1}^{K}$ (or $\{V_2^k\}_{k=1}^{K}$), and the unintended receivers $\{y_2^k\}_{k=1}^{K}$ (or $\{y_1^k\}_{k=1}^{K}$) will decode a subset of these codebooks. The selection of the subsets depends on the channel strengths of the receivers, such that the unintended receiver $y_2^k$ (or $y_1^k$) decodes only the codebooks $\{V_1^s\}_{s=1}^{k}$ (or $\{V_2^s\}_{s=1}^{k}$).
  • Transmitter 1 (or 2) reserves the layer $U_1^k$ (or $U_2^k$) for adaptation to the channel from transmitter 2 (or 1) to the intended receiver $y_1^k$ (or $y_2^k$). Based on this designation, the unintended receivers $\{y_2^k\}_{k=1}^{K}$ (or $\{y_1^k\}_{k=1}^{K}$) will not decode any of the codebooks $\{U_1^k\}_{k=1}^{K}$ (or $\{U_2^k\}_{k=1}^{K}$), and the intended receivers $\{y_1^k\}_{k=1}^{K}$ (or $\{y_2^k\}_{k=1}^{K}$) will decode a subset of these codebooks. The selection of these subsets depends on the channel strengths of the receivers, such that the intended receiver $y_1^k$ (or $y_2^k$) decodes only the codebooks $\{U_1^s\}_{s=1}^{k}$ (or $\{U_2^s\}_{s=1}^{k}$).
Figure 21 specifies how the codebooks are assigned to transmitter 1, as well as the set of codebooks decoded by each of the three receivers $\{y_1^k\}_{k=1}^{3}$ associated with transmitter 1.

4.3.1. Successive Decoding: Two-State Channel

We review a successive decoding scheme for the two-state channel; this scheme is then generalized in Section 4.3.2. In this decoding scheme, each codebook is decoded by a number of receivers, and therefore the rate of each codebook is limited by its most degraded channel state. The codebooks that are not decoded by a receiver are treated as Gaussian noise; they impose interference on that receiver, which compromises its achievable rate. This observation guides the design of a successive decoding scheme that dynamically identifies (i) the set of receivers that decode a given codebook, and (ii) the order in which the codebooks are successively decoded by each receiver.
To formalize this decoding strategy, denote the set of receivers that decode codebook $V_i^k$ by $\mathcal{V}_i^k$, and denote the set of receivers that decode $U_i^k$ by $\mathcal{U}_i^k$. Therefore, we have
$$\mathcal{V}_1^k = \{y_1^s\}_{s=1}^{3} \cup \{y_2^s\}_{s=k}^{3}, \qquad \mathcal{V}_2^k = \{y_2^s\}_{s=1}^{3} \cup \{y_1^s\}_{s=k}^{3}, \qquad \text{and} \qquad \mathcal{U}_i^k = \{y_i^s\}_{s=k}^{3}. \qquad (214)$$
The order of successively decoding the codebooks at receiver $y_i^k$ is specified in Table 7.
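To make these assignments concrete, the following sketch enumerates the decode sets in (214) and inverts them to list the codebooks decoded by each receiver realization. It is an illustration of the set definitions only; the function and variable names (e.g., `decoders_V`) are ours, and replacing the constant 3 by $K$ yields the general case of Section 4.3.2.

```python
# Illustrative enumeration of the decode sets in (214) for the two-state example
# (three receiver realizations per transmitter). Names are ours, not the paper's.
K = 3

def decoders_V(i, k):
    """Receivers decoding codebook V_i^k: all intended realizations {y_i^s}_{s=1..K}
    together with the unintended realizations {y_j^s}_{s=k..K}."""
    j = 2 if i == 1 else 1
    return {(i, s) for s in range(1, K + 1)} | {(j, s) for s in range(k, K + 1)}

def decoders_U(i, k):
    """Receivers decoding codebook U_i^k: the intended realizations {y_i^s}_{s=k..K}."""
    return {(i, s) for s in range(k, K + 1)}

# Invert the sets: which codebooks does each receiver realization y_i^k decode?
for i in (1, 2):
    for k in range(1, K + 1):
        books = [f"V_{t}^{m}" for t in (1, 2) for m in range(1, K + 1)
                 if (i, k) in decoders_V(t, m)]
        books += [f"U_{i}^{m}" for m in range(1, K + 1) if (i, k) in decoders_U(i, m)]
        print(f"y_{i}^{k} decodes: " + ", ".join(books))
```

For instance, the output shows that receiver $y_1^1$ decodes all of $\{V_1^m\}_{m=1}^{3}$ but only $V_2^1$ and $U_1^1$, in agreement with the assignment rules above.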

4.3.2. Successive Decoding: $\ell$-State Channel

In this section, we generalize the successive decoding scheme to general multi-state channels. Similarly to (214), we define
$$\mathcal{V}_1^k = \{y_1^s\}_{s=1}^{K} \cup \{y_2^s\}_{s=k}^{K}, \qquad \mathcal{V}_2^k = \{y_2^s\}_{s=1}^{K} \cup \{y_1^s\}_{s=k}^{K}, \qquad \text{and} \qquad \mathcal{U}_i^k = \{y_i^s\}_{s=k}^{K}.$$
Each of the two receivers decodes a set of the codebooks, and the choice of this set depends on the channel states. Specifically, when the network state is $(\beta_q, \beta_p)$, receiver 1 decodes $K+q$ codebooks from transmitter 1 and $q$ codebooks from transmitter 2. These codebooks are decoded successively in two stages, in the following order:
  • Receiver 1—stage 1 (Codebooks $\{V_i^s\}_{s=1}^{q}$, $i\in\{1,2\}$): Receiver 1 decodes one information layer from each transmitter in an alternating manner until all the codebooks $\{V_1^s\}_{s=1}^{q}$ and $\{V_2^s\}_{s=1}^{q}$ are decoded. The first layer to be decoded in this stage depends on the state $\beta_q$. If $\beta_q < 1$, the receiver starts by decoding codebook $V_1^1$ from transmitter 1, then decodes the respective layer $V_2^1$ from transmitter 2, and continues alternating between the two transmitters. Otherwise, if $\beta_q > 1$, receiver 1 first decodes $V_2^1$ from the interfering transmitter 2, followed by $V_1^1$ from transmitter 1, and continues alternating. By the end of stage 1, receiver 1 has decoded $q$ codebooks from each transmitter.
  • Receiver 1—stage 2 (Codebooks $\{V_1^s\}_{s=q+1}^{K}$ and $\{U_1^s\}_{s=1}^{q}$): In stage 2, receiver 1 carries on decoding the layers $\{V_1^s\}_{s=q+1}^{K}$ from transmitter 1, in ascending order of the index $s$. Finally, receiver 1 decodes the layers $\{U_1^s\}_{s=1}^{q}$, which are specially adapted to receivers $\{y_1^s\}_{s=1}^{q}$, in ascending order of the index $s$. By the end of stage 2, receiver 1 has decoded $K$ additional codebooks from its intended transmitter 1.
The decoding scheme at receiver 2 follows the same structure with the roles of the two transmitters swapped. The set of codebooks decoded by receiver $i$ in channel state $(\beta_q, \beta_p)$ is partly defined by the set of codebooks decoded by receiver $i$ and the set decoded by receiver $j$ in state $(\beta_{q-1}, \beta_{p-1})$. The decoding scheme is summarized in Table 8, in which the channels are listed in ascending order such that, at receiver 1, state $(\beta_q, \beta_p)$ precedes all channel states $(\beta_k, \beta_p)$ with $k>q$. Similarly, at receiver 2, state $(\beta_q, \beta_p)$ precedes network state $(\beta_q, \beta_k)$ for every $k>p$. Furthermore, according to this approach, when the cross channel of receiver $i$ becomes stronger, receiver $i$ decodes additional codebooks from both transmitters. In particular, every cell of Table 8 contains the codebooks decoded in the combined channel state $(\beta_q, \beta_p)$, where the codebooks decoded by receiver 1 are marked in blue and those decoded by receiver 2 in red. To further highlight the relationship between the decodable codebooks in different states, we denote by $\mathcal{C}_i^k$ the set of codebooks decoded by receiver $i$ when $a_i = \beta_k$. A programmatic sketch of the resulting decoding order at receiver 1 is given below.
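The following sketch captures the two-stage ordering at receiver 1 described above; the ordering at receiver 2 follows by swapping the transmitter roles. It is a schematic illustration only, and the function name `rx1_decoding_order` is ours.

```python
# Schematic sketch of the successive decoding order at receiver 1 when the
# network state is (beta_q, beta_p); K is the number of layers per codebook family.
def rx1_decoding_order(q, beta_q, K):
    order = []
    # Stage 1: alternate between the transmitters over the layers s = 1, ..., q.
    # The starting transmitter depends on whether beta_q < 1 or beta_q > 1.
    first, second = (1, 2) if beta_q < 1 else (2, 1)
    for s in range(1, q + 1):
        order.append(f"V_{first}^{s}")
        order.append(f"V_{second}^{s}")
    # Stage 2: remaining V-layers of transmitter 1, then the U-layers adapted to
    # receivers y_1^1, ..., y_1^q, both in ascending order of the index s.
    order += [f"V_1^{s}" for s in range(q + 1, K + 1)]
    order += [f"U_1^{s}" for s in range(1, q + 1)]
    return order

# Example: K = 4 layers per family, state index q = 2, cross gain weaker than direct.
print(rx1_decoding_order(q=2, beta_q=0.5, K=4))
# -> ['V_1^1', 'V_2^1', 'V_1^2', 'V_2^2', 'V_1^3', 'V_1^4', 'U_1^1', 'U_1^2']
```

In total, receiver 1 decodes $K+q$ codebooks from transmitter 1 and $q$ codebooks from transmitter 2, as stated above.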

4.3.3. Average Achievable Rate Region

In this section, we provide an overview of the average achievable rate region. The average rates of the users are specified by the rates of the codebooks $\{V_i^k, U_i^k\}_{k=1}^{K}$, for $i\in\{1,2\}$. These rates should satisfy all the constraints imposed by the different receivers in order for them to successfully decode all of their designated codebooks. Hence, the rate of each codebook is bounded by the smallest rate achievable at the receivers in $\mathcal{V}_i^k$ and $\mathcal{U}_i^k$, respectively. To formalize the rate regions, define $R(A)$ as the rate of codebook $A\in\{V_i^k, U_i^k : i,k\}$, and define $\gamma(A)$ as the fraction of the power $P_i$ allocated to codebook $A$. Accordingly, define $R_i(s,t)$ as the total achievable rate of user $i$ when the network is in state $(\beta_s, \beta_t)$. Finally, denote the average achievable rate at receiver $i$ by $\bar R_i = \mathbb{E}[R_i(s,t)]$, where the expectation is taken with respect to the probabilistic model of the channel. Note that the transmitters collectively have $4K$ codebooks. Corresponding to the set $\mathcal{S}\subseteq\mathbb{R}_+^{4K}$, define the rate region $\mathcal{R}_{\rm in}(\mathcal{S})$ as the set of all average rate pairs $(\bar R_1, \bar R_2)$ such that $R(A)\in\mathcal{S}$ for all $A\in\{V_i^k, U_i^k : i,k\}$, i.e.,
$$\mathcal{R}_{\rm in}(\mathcal{S}) = \Big\{ (\bar R_1, \bar R_2) \;:\; R(A)\in\mathcal{S}, \;\; \forall A\in\{V_i^k, U_i^k : i,k\} \Big\}.$$
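Since $\bar R_i$ is defined as an expectation over the channel's probabilistic model, it can be computed by weighting the per-state rates $R_i(s,t)$ by the joint state probabilities. The minimal sketch below assumes, purely for illustration, an i.i.d. two-state model and hypothetical per-state rate values.

```python
import numpy as np

# Minimal sketch: average achievable rate of user i as the expectation of the
# per-state rates R_i(s, t) over an assumed joint pmf of the cross-channel states.
def average_rate(R_i, pmf):
    """R_i[s, t]: rate of user i in state (beta_s, beta_t); pmf[s, t]: its probability."""
    return float(np.sum(pmf * R_i))

pmf = np.full((2, 2), 0.25)                 # two equiprobable, independent states
R_1 = np.array([[1.0, 1.2], [1.5, 1.8]])    # hypothetical per-state rates of user 1
print(average_rate(R_1, pmf))               # -> 1.375
```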
Furthermore, corresponding to each receiver $y_i^k$ and each codebook $A\in\{U_i^k, V_i^k : i,k\}$ that should be decoded by $y_i^k$, define $R_i^k(A)$ as the maximum rate that can be sustained for codebook $A$ while remaining decodable by $y_i^k$. Accordingly, for user $i$ and for $s,t\in\{1,\dots,K\}$, define the rates
$$r_i(s,t) = \sum_{k=t+1}^{K} R_i^{s}(V_i^{k}) + \sum_{k=1}^{s} R_i^{k}(U_i^{k}) + \min_{j\in\{1,2\}} \sum_{k=1}^{t} R_j^{k}(V_i^{k}),$$
where the rates $R_i^s(A)$ are specified as follows. First, recalling that $\gamma(A)$ denotes the fraction of $P_i$ allocated to codebook $A\in\{U_i^k, V_i^k : k\}$, set
$$\Gamma_v(i,k) = \sum_{j=1}^{k} \gamma(V_i^{j}), \qquad \text{and} \qquad \Gamma_u(i,k) = \sum_{j=1}^{k} \gamma(U_i^{j}).$$
Based on these definitions, if the codebook $V_i^k$ is decoded by receiver $y_i^s$, then we have
$$\text{If } \beta_s \le 1: \quad R_i^{s}(V_i^{k}) = C\Big(\gamma(V_i^{k})\,P_i,\; \big(1-\Gamma_v(i,k)\big)P_i + \beta_k\big(1-\Gamma_v(j,s-1)\big)P_j\Big),$$
$$\text{If } \beta_s > 1: \quad R_i^{s}(V_i^{k}) = C\Big(\gamma(V_i^{k})\,P_i,\; \big(1-\Gamma_v(i,k)\big)P_i + \beta_k\big(1-\Gamma_v(j,s)\big)P_j\Big).$$
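As a numerical illustration of the expression just given, the sketch below evaluates $R_i^s(V_i^k)$ for a given power split; the case of the unintended receiver, treated next, follows analogously. The two-argument function $C(\cdot,\cdot)$ is defined earlier in the paper and is therefore passed in as a callable here; the placeholder $C(x,y)=\log_2\!\big(1+x/(1+y)\big)$ used in the example, the power fractions, and all names are our own assumptions.

```python
import math

# Sketch of the rate expression R_i^s(V_i^k) at the intended receiver y_i^s,
# following the two cases above. All names and the example numbers are ours.
def Gamma(gamma, i, k):
    """Cumulative power fraction of layers 1..k of transmitter i (Gamma_v or Gamma_u,
    depending on which family of fractions is passed in)."""
    return sum(gamma[(i, j)] for j in range(1, k + 1))

def R_V_intended(C, gamma_v, beta, P, i, s, k):
    """R_i^s(V_i^k): rate of codebook V_i^k sustained at receiver y_i^s."""
    j = 2 if i == 1 else 1
    cancelled = s - 1 if beta[s] <= 1 else s      # interfering V-layers already removed
    signal = gamma_v[(i, k)] * P[i]
    interference = (1 - Gamma(gamma_v, i, k)) * P[i] \
                   + beta[k] * (1 - Gamma(gamma_v, j, cancelled)) * P[j]
    return C(signal, interference)

# Example usage with a placeholder capacity function and a hypothetical power split.
C = lambda x, y: math.log2(1 + x / (1 + y))       # assumption: unit-variance noise
gamma_v = {(1, 1): 0.3, (1, 2): 0.2, (2, 1): 0.3, (2, 2): 0.2}
beta = {1: 0.5, 2: 1.5}
P = {1: 10.0, 2: 10.0}
print(R_V_intended(C, gamma_v, beta, P, i=1, s=1, k=1))
```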
Similarly, if the codebook $V_i^k$ is decoded by receiver $y_j^s$, then we have
If $\beta_s \le 1$, $\quad R_j^{s}(V_i^{k}) = C\big(\beta_s\,\gamma(V_i^{k})\,P_i,$