Article

Information Rates for Channels with Fading, Side Information and Adaptive Codewords

School of Computation, Information and Technology, Technical University of Munich (TUM), 80333 Munich, Germany
Entropy 2023, 25(5), 728; https://doi.org/10.3390/e25050728
Submission received: 3 March 2023 / Revised: 26 March 2023 / Accepted: 22 April 2023 / Published: 27 April 2023
(This article belongs to the Special Issue Wireless Networks: Information Theoretic Perspectives III)

Abstract
Generalized mutual information (GMI) is used to compute achievable rates for fading channels with various types of channel state information at the transmitter (CSIT) and receiver (CSIR). The GMI is based on variations of auxiliary channel models with additive white Gaussian noise (AWGN) and circularly-symmetric complex Gaussian inputs. One variation uses reverse channel models with minimum mean square error (MMSE) estimates that give the largest rates but are challenging to optimize. A second variation uses forward channel models with linear MMSE estimates that are easier to optimize. Both model classes are applied to channels where the receiver is unaware of the CSIT and for which adaptive codewords achieve capacity. The forward model inputs are chosen as linear functions of the adaptive codeword’s entries to simplify the analysis. For scalar channels, the maximum GMI is then achieved by a conventional codebook, where the amplitude and phase of each channel symbol are modified based on the CSIT. The GMI increases by partitioning the channel output alphabet and using a different auxiliary model for each partition subset. The partitioning also helps to determine the capacity scaling at high and low signal-to-noise ratios. A class of power control policies is described for partial CSIR, including an MMSE policy for full CSIT. Several examples of fading channels with AWGN illustrate the theory, focusing on on-off fading and Rayleigh fading. The capacity results generalize to block fading channels with in-block feedback, including capacity expressions in terms of mutual and directed information.

1. Introduction

The capacity of fading channels is a topic of interest in wireless communications [1,2,3,4]. Fading refers to model variations over time, frequency, and space. A common approach to track fading is to insert pilot symbols into transmit symbol strings, have receivers estimate fading parameters via the pilot symbols, and have the receivers share their estimated channel state information (CSI) with the transmitters. The CSI available at the receiver (CSIR) and transmitter (CSIT) may be different and imperfect.
Information-theoretic studies on fading channels distinguish between average (ergodic) and outage capacity, causal and non-causal CSI, symbol and rate-limited CSI, and different qualities of CSIR and CSIT that are coarsely categorized as no, perfect, or partial. We refer to [5] for a review of the literature up to 2008. We here focus exclusively on average capacity and causal CSIT as introduced in [6]. Codes for such CSIT, or more generally for noisy feedback [7], are based on Shannon strategies, also called codetrees ([8], Chapter 9.4), or adaptive codewords ([9], Section 4.1). (The term “adaptive codeword” was suggested to the author by J. L. Massey.) Adaptive codewords are usually implemented by a conventional codebook and by modifying the codeword symbols as a function of the CSIT. This approach is optimal for some channels [10] and will be our main interest.

1.1. Block Fading

A model that accounts for the different time scales of data transmission (e.g., nanoseconds) and channel variations (e.g., milliseconds) is block fading [11,12]. Such fading has the channel parameters constant within blocks of L symbols and varying across blocks. A basic setup is as follows.
  • The fading is described by a state process $S_{H1}, S_{H2}, \dots$, independent of the transmitter messages and channel noise. The subscript “H” emphasizes that the states $S_{Hi}$ may be hidden from the transceivers.
  • Each receiver sees a state process $S_{R1}, S_{R2}, \dots$ where $S_{Ri}$ is a noisy function of $S_{Hi}$ for all $i$.
  • Each transmitter sees a state process $S_{T1}, S_{T2}, \dots$ where $S_{Ti}$ is a noisy function of $S_{Hi}$ for all $i$.
The state processes may be modeled as memoryless [11,12] or governed by a Markov chain [13,14,15,16,17,18,19,20,21]. The memoryless models are particular cases of Shannon’s model [6]. For scalar channels, $S_{Hi}$ is usually a complex number $H_i$. Similarly, for vector or multi-input, multi-output (MIMO) channels with $M$- and $N$-dimensional inputs and outputs, respectively, $S_{Hi}$ is an $N \times M$ matrix $\mathbf{H}_i$.
Consider, for example, a point-to-point channel with block-fading and complex-alphabet inputs $X_{i\ell}$ and outputs
$$Y_{i\ell} = H_i X_{i\ell} + Z_{i\ell}$$
where the index $i$, $i = 1, \dots, n$, enumerates the blocks and the index $\ell$, $\ell = 1, \dots, L$, enumerates the symbols of each block. The additive white Gaussian noise (AWGN) $Z_{11}, Z_{12}, \dots$ is a sequence of independent and identically distributed (i.i.d.) random variables that have a common circularly-symmetric complex Gaussian (CSCG) distribution.

1.2. CSI and In-Block Feedback

The motivation for modeling CSI as independent of the messages is simplicity. If one uses only pilot symbols to estimate the $H_i$ in (1), for example, then the independence is valid, and the capacity analysis may be tractable. However, to improve performance, one can implement data and parameter estimation jointly, and one can actively adjust the transmit symbols $X_{i\ell}$ using past received symbols $Y_{ik}$, $k = 1, \dots, \ell-1$, if in-block feedback is available. (Across-block feedback does not increase capacity if the state processes are memoryless; see ([22], Remark 16).) An information theory for such feedback was developed in [22], where a challenge is that code design is based on adaptive codewords that are more sophisticated than conventional codewords.
For example, suppose the CSIR is $S_{Ri} = H_i$. Then, one might expect that CSCG signaling is optimal, and the capacity is an average of $\log(1 + \mathrm{SNR})$ terms, where SNR is a signal-to-noise ratio. However, this simplification is based on constraints, e.g., that the CSIT is a function of the CSIR and that the $X_{i\ell}$ cannot influence the CSIT. The former constraint can be realistic, e.g., if the receiver quantizes a pilot-based estimate of $H_i$ and sends the quantization bits to the transmitter via a low-latency and reliable feedback link. On the other hand, the latter constraint is unrealistic in general.

1.3. Auxiliary Models

This paper’s primary motivation is to further develop information theory for adaptive codewords. To gain insight, it is helpful to have achievable rates with log ( 1 + SNR ) terms. A common approach to obtain such expressions is to lower bound the channel mutual information I ( X ; Y ) as follows.
Suppose X is continuous and consider two conditional densities: the density p ( x | y ) and an auxiliary density q ( x | y ) . We will refer to such densities as reverse models; similarly, p ( y | x ) and q ( y | x ) are called forward models. One may write the differential entropy of X given Y as
$$h(X|Y) = \mathrm{E}\left[-\log p(X|Y)\right] = \underbrace{\mathrm{E}\left[-\log q(X|Y)\right]}_{\text{average cross-entropy}} - \underbrace{\mathrm{E}\left[\log \frac{p(X|Y)}{q(X|Y)}\right]}_{\text{average divergence}\ \ge\ 0}$$
where the first expectation in (2) is an average cross-entropy, and the second is an average informational divergence, which is non-negative. Several criteria affect the choice of q ( x | y ) : the cross-entropy should be simple enough to admit theoretical or numerical analysis, e.g., by Monte Carlo simulation; the cross-entropy should be close to h ( X | Y ) ; and the cross-entropy should suggest suitable transmitter and receiver structures.
We illustrate how reverse and forward auxiliary models have been applied to bound mutual information. Assume that E X = E Y = 0 for simplicity.
Reverse Model: Consider the reverse density that models X , Y as jointly CSCG:
$$q(x|y) = \frac{1}{\pi \sigma_L^2} \exp\left(-|x - \hat{x}_L|^2/\sigma_L^2\right)$$
where $\hat{X}_L = \left(\mathrm{E}[XY^*]/\mathrm{E}[|Y|^2]\right) Y$ and
$$\sigma_L^2 = \mathrm{E}\left[|X - \hat{X}_L|^2\right] = \mathrm{E}[|X|^2] - \frac{\left|\mathrm{E}[XY^*]\right|^2}{\mathrm{E}[|Y|^2]}$$
is the mean square error (MSE) of the estimate X ^ L . In fact, X ^ L is the linear estimate with the minimum MSE (MMSE), and σ L 2 is the linear MMSE (LMMSE) which is independent of Y = y ; see Section 2.5. The bound in (2) gives
$$h(X|Y) \le \log\left(\pi e\, \sigma_L^2\right).$$
Thus, if $X$ is CSCG, then we have the desired form
$$I(X;Y) = h(X) - h(X|Y) \ge \log\left(1 + \frac{|h|^2\, \mathrm{E}[|X|^2]}{\sigma^2}\right)$$
where the parameters $h$ and $\sigma^2$ are
$$h = \frac{\mathrm{E}[YX^*]}{\mathrm{E}[|X|^2]}, \quad \sigma^2 = \mathrm{E}\left[|Y - hX|^2\right].$$
The bound (6) is apparently due to Pinsker [23,24,25] and is widely used in the literature; see e.g., [18,26,27,28,29,30,31,32,33,34,35,36,37,38]. The bound is usually related to channels p ( y | x ) with additive noise but (2)–(6) show that it applies generally. The extension to vector channels is given in Section 2.7 below.
Forward Model: A more flexible approach is to choose the reverse density as
$$q(x|y) = \frac{p(x)\, q(y|x)^s}{q(y)}$$
where $q(y|x)$ is a forward auxiliary model (not necessarily a density), $s \ge 0$ is a parameter to be optimized, and
$$q(y) = \int_{\mathbb{C}} p(x)\, q(y|x)^s\, dx.$$
Inserting (8) into (2) we compute
$$I(X;Y) \ge \max_{s \ge 0}\ \mathrm{E}\left[\log \frac{q(Y|X)^s}{q(Y)}\right].$$
The right-hand side (RHS) of (10) is called a generalized mutual information (GMI) [39,40] and has been applied to problems in information theory [41], wireless communications [42,43,44,45,46,47,48,49,50,51], and fiber-optic communications [52,53,54,55,56,57,58,59,60,61]. For example, the bounds (6) and (10) are the same if s = 1 and
$$q(y|x) = \exp\left(-|y - hx|^2/\sigma^2\right)$$
where $h$ and $\sigma^2$ are given by (7). Note that (11) is not a density unless $\sigma^2 = 1/\pi$, but $q(x|y)$ is a density. (We require $q(x|y)$ to be a density to apply the divergence bound in (2).)
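For concreteness, here is a short check of this claim (a calculation added for illustration; it is consistent with (45) and (49) below). For CSCG $X$ with $\mathrm{E}[|X|^2] = P$, inserting (11) into (9) with $s = 1$ gives
$$q(y) = \frac{\sigma^2}{\sigma^2 + |h|^2 P} \exp\left(-\frac{|y|^2}{\sigma^2 + |h|^2 P}\right)$$
so that
$$\mathrm{E}\left[\log \frac{q(Y|X)}{q(Y)}\right] = \log\left(1 + \frac{|h|^2 P}{\sigma^2}\right) + \frac{\mathrm{E}[|Y|^2]}{\sigma^2 + |h|^2 P} - \frac{\mathrm{E}\left[|Y - hX|^2\right]}{\sigma^2} = \log\left(1 + \frac{|h|^2 P}{\sigma^2}\right),$$
where the last step uses $\mathrm{E}[|Y - hX|^2] = \sigma^2$ and $\mathrm{E}[|Y|^2] = \sigma^2 + |h|^2 P$, both of which follow from (7).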
We compare the two approaches. The bound (5) is simple to apply and works well since the choices (7) give the maximal GMI for CSCG X; see Proposition 1 below. However, there are limitations: one must use continuous X, the auxiliary model q ( y | x ) is fixed as (11), and the bound does not show how to design the receiver. Instead, the GMI applies to continuous/discrete/mixed X and has an operational interpretation: the receiver uses q ( y | x ) rather than p ( y | x ) to decode. The framework of such mismatched receivers appeared in ([62], Exercise 5.22); see also [63].

1.4. Refined Auxiliary Models

The two approaches above can be refined in several ways, and we review selected variations in the literature.
Reverse Models: The model $q(x|y)$ can be different for each $Y = y$, e.g., one may choose $X$ as Gaussian with mean $\mathrm{E}[X|Y=y]$ and variance
$$\mathrm{Var}[X|Y=y] = \mathrm{E}\left[|X|^2 \,\middle|\, Y=y\right] - \left|\mathrm{E}[X|Y=y]\right|^2$$
and where
$$q(x|y) = \frac{1}{\pi\, \mathrm{Var}[X|Y=y]} \exp\left(-\frac{\left|x - \mathrm{E}[X|Y=y]\right|^2}{\mathrm{Var}[X|Y=y]}\right).$$
Inserting (13) in (2) we have the bound
$$h(X|Y) \le \mathrm{E}\left[\log\left(\pi e\, \mathrm{Var}[X|Y]\right)\right]$$
which improves (5) in general, since $\mathrm{Var}[X|Y=y]$ is the MMSE of $X$ given the event $Y=y$. In other words, we have $\mathrm{Var}[X|Y=y] \le \sigma_L^2$ for all $Y=y$ and the following bound improves (6) for CSCG $X$:
$$I(X;Y) \ge \mathrm{E}\left[\log \frac{\mathrm{E}[|X|^2]}{\mathrm{Var}[X|Y]}\right].$$
In fact, the bound (15) was derived in ([50], Section III.B) by optimizing the GMI in (10) over all forward models of the form
$$q(y|x) = \exp\left(-\left|\tilde{g}_y - \tilde{f}_y\, x\right|^2\right)$$
where f ˜ y , g ˜ y depend on y; see also [47,48,49]. We provide a simple proof. By inserting (16) into (8) and (9), absorbing the s parameter in f ˜ y and g ˜ y , and completing squares, one can equivalently optimize over all reverse densities of the form
$$q(x|y) = \exp\left(-\left|g_y - f_y\, x\right|^2 + h_y\right)$$
where $|f_y|^2 = \pi e^{h_y}$ so that $q(x|y)$ is a density. We next bound the cross-entropy as
$$\mathrm{E}\left[-\log q(X|Y=y)\right] = \mathrm{E}\left[\left|g_y/f_y - X\right|^2\right] |f_y|^2 - h_y \ge \mathrm{Var}[X|Y=y]\, \pi e^{h_y} - h_y$$
with equality if $g_y/f_y = \mathrm{E}[X|Y=y]$; see Section 2.5. The RHS of (18) is minimized by $\mathrm{Var}[X|Y=y]\, \pi e^{h_y} = 1$, so the best choice for $f_y, g_y, h_y$ gives the bound (14).
Remark 1.
The model (16) uses generalized nearest-neighbor decoding, improving the rules proposed in [42,43,44]. The authors of [50] pointed out that (6) and (15) use the LMMSE and MMSE, respectively; see ([50], Equation (87)).
Remark 2.
A corresponding forward model can be based on (8) and (13), namely
$$q(y|x)^s = \frac{q(x|y)}{p(x)}, \quad q(y) = 1.$$
Remark 3.
The RHS of (15) has a more complicated form than the RHS of (6) due to the outer expectation and conditional variance, and this makes optimizing X challenging when there is CSIR and CSIT. Also, if p ( y | x ) is known, then it seems sensible to numerically compute p ( y ) and I ( X ; Y ) directly, e.g., via Monte Carlo or numerical integration.
Remark 4.
Decoding rules for discrete X can be based on decision theory as well as estimation theory; see ([64], Equation (11)).
Forward Models: Refinements of (11) appear in the optical fiber literature where the non-linear Schrödinger equation describes wave propagation [52]. Such channels exhibit complicated interactions of attenuation, dispersion, nonlinearity, and noise, and the channel density is too challenging to compute. One thus resorts to capacity lower bounds based on GMI and Monte Carlo simulation. The simplest models are memoryless, and they work well if chosen carefully. For example, the paper [52] used auxiliary models of the form
$$q(y|x) = \exp\left(-|y - hx|^2/\sigma_{|x|}^2\right)$$
where $h$ accounts for attenuation and self-phase modulation, and where the noise variance $\sigma_{|x|}^2$ depends on $|x|$. Also, $X$ was chosen to have concentric rings rather than a CSCG density. Subsequent papers applied progressively more sophisticated models with memory to better approximate the actual channel; see [53,54,55,56,57,58,59]. However, the rate gains over the model (20) are minor (≈12%) for 1000 km links, and the newer models do not suggest practical receiver structures.
A related application is short-reach fiber-optic systems that use direct detection (DD) receivers [65] with photodiodes. The paper [60] showed that sampling faster than the symbol rate increases the DD capacity. However, spectrally efficient filtering gives the channel a long memory, motivating auxiliary models q ( y | x ) with reduced memory to simplify GMI computations [61,66]. More generally, one may use channel-shortening filters [67,68,69] to increase the GMI.
Remark 5.
The ultimate GMI is I ( X ; Y ) , and one can compute this quantity numerically for the channels considered in this paper. We are motivated to focus on forward auxiliary models q ( y | x ) to understand how to improve information rates for more complex channels. For instance, simple q ( y | x ) let one understand properties of optimal codes, see Lemma 3, and they suggest explicit power control policies, see Theorem 2.
Remark 6.
The paper [37] (see also ([2], Equation (3.3.45)) and ([70], Equation (6))) derives two capacity lower bounds for massive MIMO channels. These bounds are designed for problems where the fading parameters have small variance so that, in effect, σ 2 in (7) is small. We will instead encounter cases where σ 2 grows in proportion to E | X | 2 and the RHS of (6) quickly saturates as E | X | 2 grows; see Remark 20.

1.5. Organization

This paper is organized as follows. Section 2 defines notation and reviews basic results. Section 3 develops two results for the GMI of scalar auxiliary models with AWGN:
  • Proposition 1 in Section 3.1 states a known result, namely that the RHS of (6) is the maximum GMI for the AWGN auxiliary model (11) and a CSCG X.
  • Lemma 1 in Section 3.2 generalizes Proposition 1 by partitioning the channel output alphabet into K subsets, K 1 . We use K = 2 to establish capacity properties at high and low SNR.
Section 4 and Section 5 apply the GMI to channels with CSIT and CSIR.
  • Section 4.3 treats adaptive codewords and develops structural properties of their optimal distribution.
  • Lemma 2 in Section 4.4 generalizes Proposition 1 to MIMO channels and adaptive codewords. The receiver models each transmit symbol as a weighted sum of the entries of the corresponding adaptive symbol.
  • Lemma 3 in Section 4.5 states that the maximum GMI for scalar channels, an AWGN auxiliary model, adaptive codewords with jointly CSCG entries, and K = 1 is achieved by using a conventional codebook where each symbol is modified based on the CSIT.
  • Lemma 4 in Section 4.6 extends Lemma 3 to MIMO channels, including diagonal or parallel channels.
  • Theorem 1 in Section 5.1 generalizes Lemma 3 to include CSIR; we use this result several times in Section 6.
  • Lemma 5 in Section 5.3 generalizes Lemmas 1 and 2 by partitioning the channel output alphabet.
Section 6, Section 7 and Section 8 apply the GMI to fading channels with AWGN and illustrate the theory for on-off and Rayleigh fading.
  • Lemma 6 in Section 6 gives a general capacity upper bound.
  • Section 6.5 introduces a class of power control policies for full CSIT. Theorem 2 develops the optimal policy with an MMSE form.
  • Theorem 3 in Section 6.6 provides a quadratic waterfilling expression for the GMI with partial CSIR.
Section 9 develops theory for block fading channels with in-block feedback (or in-block CSIT) that is a function of the CSIR and past channel inputs and outputs.
  • Theorem 4 in Section 9.2 generalizes Lemma 4 to MIMO block fading channels;
  • Section 9.3 develops capacity expressions in terms of directed information;
  • Section 9.4 specializes the capacity to fading channels with AWGN and delayed CSIR;
  • Proposition 3 generalizes Proposition 2 to channels with special CSIR and CSIT.
Section 10 concludes the paper. Finally, Appendix A, Appendix B, Appendix C, Appendix D, Appendix E, Appendix F and Appendix G provide results on special functions, GMI calculations, and proofs.

2. Preliminaries

2.1. Basic Notation

Let $1(\cdot)$ be the indicator function that takes on the value 1 if its argument is true and 0 otherwise. Let $\delta(\cdot)$ be the Dirac generalized function with $\int_{\mathcal{X}} \delta(x) f(x)\, dx = f(0) \cdot 1(0 \in \mathcal{X})$. For $x \in \mathbb{R}$, define $(x)^+ = \max(0, x)$. The complex-conjugate, absolute value, and phase of $x \in \mathbb{C}$ are written as $x^*$, $|x|$, and $\arg(x)$, respectively. We write $j = \sqrt{-1}$ and $\bar{\epsilon} = 1 - \epsilon$.
Sets are written with calligraphic font, e.g., S = { 1 , , n } and the cardinality of S is | S | . The complement of S in T is S c where T is understood from the context.

2.2. Vectors and Matrices

Column vectors are written as $\underline{x} = [x_1, \dots, x_M]^T$ where $M$ is the dimension, and $T$ denotes transposition. The complex-conjugate transpose (or Hermitian) of $\underline{x}$ is written as $\underline{x}^\dagger$. The Euclidean norm of $\underline{x}$ is $\|\underline{x}\|$. Matrices are written with bold letters such as $\mathbf{A}$. The letter $\mathbf{I}$ denotes the identity matrix. The determinant and trace of a square matrix $\mathbf{A}$ are written as $\det \mathbf{A}$ and $\mathrm{tr}\, \mathbf{A}$, respectively.
A singular value decomposition (SVD) is $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^\dagger$ where $\mathbf{U}$ and $\mathbf{V}$ are unitary matrices and $\boldsymbol{\Sigma}$ is a rectangular diagonal matrix with the singular values of $\mathbf{A}$ on the diagonal. The square matrix $\mathbf{A}$ is positive semi-definite if $\underline{x}^\dagger \mathbf{A} \underline{x} \ge 0$ for all $\underline{x}$. The notation $\mathbf{A} \preceq \mathbf{B}$ means that $\mathbf{B} - \mathbf{A}$ is positive semi-definite. Similarly, $\mathbf{A}$ is positive definite if $\underline{x}^\dagger \mathbf{A} \underline{x} > 0$ for all $\underline{x}$, and we write $\mathbf{A} \prec \mathbf{B}$ if $\mathbf{B} - \mathbf{A}$ is positive definite.

2.3. Random Variables

Random variables are written with uppercase letters, such as X, and their realizations with lowercase letters, such as x. We write the distribution of discrete X with alphabet X = { 0 , , n 1 } as P X = [ P X ( 0 ) , , P X ( n 1 ) ] . The density of a real- or complex-valued X is written as p X . Mixed discrete-continuous distributions are written using mixtures of densities and Dirac- δ functions.
Conditional distributions and densities are written as P X | Y and p X | Y , respectively. We usually drop subscripts if the argument is a lowercase version of the random variable, e.g., we write p ( y | x ) for p Y | X ( y | x ) . One exception is that we consistently write the distributions P S R ( . ) and P S T ( . ) of the CSIR and CSIT with the subscript to avoid confusion with power notation.

2.4. Second-Order Statistics

The expectation and variance of the complex-valued random variable X are E X and Var X = E | X E X | 2 , respectively. The correlation coefficient of X 1 and X 2 is ρ = E U 1 U 2 * where
$$U_i = \left(X_i - \mathrm{E}[X_i]\right)\big/\sqrt{\mathrm{Var}[X_i]}$$
for i = 1 , 2 . We say that X 1 and X 2 are fully correlated if ρ = e j ϕ for some real ϕ . Conditional expectation and variance are written as E X | A = a and
$$\mathrm{Var}\left[X \mid A = a\right] = \mathrm{E}\left[\left(X - \mathrm{E}[X \mid A=a]\right)\left(X - \mathrm{E}[X \mid A=a]\right)^* \,\middle|\, A = a\right].$$
The expressions E X | A , Var X | A are random variables that take on the values E X | A = a , Var X | A = a if A = a .
The expectation and covariance matrix of the random column vector $\underline{X} = [X_1, \dots, X_M]^T$ are $\mathrm{E}[\underline{X}]$ and $\mathbf{Q}_{\underline{X}} = \mathrm{E}\left[(\underline{X} - \mathrm{E}[\underline{X}])(\underline{X} - \mathrm{E}[\underline{X}])^\dagger\right]$, respectively. We write $\mathbf{Q}_{\underline{X},\underline{Y}}$ for the covariance matrix of the stacked vector $[\underline{X}^T\ \underline{Y}^T]^T$. We write $\mathbf{Q}_{\underline{X}|\underline{Y}=\underline{y}}$ for the covariance matrix of $\underline{X}$ conditioned on the event $\underline{Y} = \underline{y}$. $\mathbf{Q}_{\underline{X}|\underline{Y}}$ is a random matrix that takes on the matrix value $\mathbf{Q}_{\underline{X}|\underline{Y}=\underline{y}}$ when $\underline{Y} = \underline{y}$.
We often consider CSCG random variables and vectors. A CSCG X ̲ has density
$$p(\underline{x}) = \frac{\exp\left(-\underline{x}^\dagger\, \mathbf{Q}_{\underline{X}}^{-1}\, \underline{x}\right)}{\pi^M \det \mathbf{Q}_{\underline{X}}}$$
and we write $\underline{X} \sim \mathcal{CN}(\underline{0}, \mathbf{Q}_{\underline{X}})$.

2.5. MMSE and LMMSE Estimation

Assume that $\mathrm{E}[\underline{X}] = \mathrm{E}[\underline{Y}] = \underline{0}$. The MMSE estimate of $\underline{X}$ given the event $\underline{Y} = \underline{y}$ is the vector $\hat{\underline{X}}(\underline{y})$ that minimizes
$$\mathrm{E}\left[\left\|\underline{X} - \hat{\underline{X}}(\underline{y})\right\|^2 \,\middle|\, \underline{Y} = \underline{y}\right].$$
Direct analysis gives ([71], Chapter 4)
$$\hat{\underline{X}}(\underline{y}) = \mathrm{E}\left[\underline{X} \,\middle|\, \underline{Y} = \underline{y}\right]$$
$$\mathrm{E}\left[\|\underline{X} - \hat{\underline{X}}\|^2\right] = \mathrm{E}\left[\|\underline{X}\|^2\right] - \mathrm{E}\left[\|\hat{\underline{X}}\|^2\right]$$
$$\mathbf{Q}_{\underline{X} - \hat{\underline{X}}} = \mathbf{Q}_{\underline{X}} - \mathbf{Q}_{\hat{\underline{X}}}$$
$$\mathrm{E}\left[(\underline{X} - \hat{\underline{X}})\, \underline{Y}^\dagger\right] = \mathbf{0}$$
where the last identity is called the orthogonality principle.
The LMMSE estimate of $\underline{X}$ given $\underline{Y}$ with invertible $\mathbf{Q}_{\underline{Y}}$ is the vector $\hat{\underline{X}}_L = \mathbf{C}\underline{Y}$ where $\mathbf{C}$ is chosen to minimize $\mathrm{E}\left[\|\underline{X} - \hat{\underline{X}}_L\|^2\right]$. We compute
$$\hat{\underline{X}}_L = \mathrm{E}\left[\underline{X}\, \underline{Y}^\dagger\right] \mathbf{Q}_{\underline{Y}}^{-1}\, \underline{Y}$$
and we also have the properties (22)–(24) with $\hat{\underline{X}}$ replaced by $\hat{\underline{X}}_L$. Moreover, if $\underline{X}$ and $\underline{Y}$ are jointly CSCG, then the MMSE and LMMSE estimators coincide, and the orthogonality principle (24) implies that the error $\underline{X} - \hat{\underline{X}}$ is independent of $\underline{Y}$, i.e., we have
$$\mathrm{E}\left[(\underline{X} - \hat{\underline{X}})(\underline{X} - \hat{\underline{X}})^\dagger \,\middle|\, \underline{Y} = \underline{y}\right] = \mathrm{E}\left[\underline{X}\,\underline{X}^\dagger \,\middle|\, \underline{Y} = \underline{y}\right] - \mathrm{E}\left[\underline{X}\,\underline{Y}^\dagger\right] \mathbf{Q}_{\underline{Y}}^{-1}\, \underline{y}\,\underline{y}^\dagger\, \mathbf{Q}_{\underline{Y}}^{-1}\, \mathrm{E}\left[\underline{Y}\,\underline{X}^\dagger\right] = \mathbf{Q}_{\underline{X}} - \mathbf{Q}_{\hat{\underline{X}}}.$$

2.6. Entropy, Divergence, and Information

Entropies of random vectors with densities p are written as
$$h(\underline{X}) = \mathrm{E}\left[-\log p(\underline{X})\right], \quad h(\underline{X}|\underline{Y}) = \mathrm{E}\left[-\log p(\underline{X}|\underline{Y})\right]$$
where we use logarithms to the base e for analysis. The informational divergence of the densities p and q is
$$D(p\,\|\,q) = \mathrm{E}\left[\log \frac{p(\underline{X})}{q(\underline{X})}\right]$$
and $D(p\,\|\,q) \ge 0$ with equality if and only if $p = q$ almost everywhere. The mutual information of $\underline{X}$ and $\underline{Y}$ is
$$I(\underline{X};\underline{Y}) = D\left(p_{\underline{X},\underline{Y}} \,\big\|\, p_{\underline{X}}\, p_{\underline{Y}}\right) = \mathrm{E}\left[\log \frac{p(\underline{Y}|\underline{X})}{p(\underline{Y})}\right].$$
The average mutual information of $\underline{X}$ and $\underline{Y}$ conditioned on $\underline{Z}$ is $I(\underline{X};\underline{Y}|\underline{Z})$. We write strings as $X^L = (X_1, X_2, \dots, X_L)$ and use the directed information notation (see [9,72])
$$I(X^L \to Y^L \mid Z) = \sum_{\ell=1}^{L} I\left(X^\ell; Y_\ell \,\middle|\, Y^{\ell-1}, Z\right)$$
$$I(X^L \to Y^L \,\|\, Z^L \mid W) = \sum_{\ell=1}^{L} I\left(X^\ell; Y_\ell \,\middle|\, Y^{\ell-1}, Z^\ell, W\right)$$
where $Y^0 = 0$.

2.7. Entropy and Information Bounds

The expression (2) applies to random vectors. Choosing $q(\underline{x}|\underline{y})$ as the conditional density where the $\underline{X}, \underline{Y}$ are modeled as jointly CSCG we obtain a generalization of (5):
$$h(\underline{X}|\underline{Y}) \le \log \frac{\det\left(\pi e\, \mathbf{Q}_{\underline{X},\underline{Y}}\right)}{\det\left(\pi e\, \mathbf{Q}_{\underline{Y}}\right)} = \log \det\left(\pi e \left[\mathbf{Q}_{\underline{X}} - \mathrm{E}\left[\underline{X}\,\underline{Y}^\dagger\right] \mathbf{Q}_{\underline{Y}}^{-1}\, \mathrm{E}\left[\underline{Y}\,\underline{X}^\dagger\right]\right]\right).$$
The vector generalization of (6) for CSCG $\underline{X}$ is
$$I(\underline{X};\underline{Y}) = h(\underline{X}) - h(\underline{X}|\underline{Y}) \ge \log \det\left(\left[\mathbf{Q}_{\underline{X}} - \mathrm{E}\left[\underline{X}\,\underline{Y}^\dagger\right] \mathbf{Q}_{\underline{Y}}^{-1}\, \mathrm{E}\left[\underline{Y}\,\underline{X}^\dagger\right]\right]^{-1} \mathbf{Q}_{\underline{X}}\right) \overset{(a)}{=} \log \det\left(\mathbf{I} + \mathbf{Q}_{\underline{Z}}^{-1}\, \mathbf{H}\, \mathbf{Q}_{\underline{X}}\, \mathbf{H}^\dagger\right)$$
where (cf. (7))
$$\mathbf{H} = \mathrm{E}\left[\underline{Y}\,\underline{X}^\dagger\right] \mathbf{Q}_{\underline{X}}^{-1}, \quad \mathbf{Q}_{\underline{Z}} = \mathbf{Q}_{\underline{Y}} - \mathbf{H}\, \mathbf{Q}_{\underline{X}}\, \mathbf{H}^\dagger$$
and step $(a)$ in (30) follows by the Woodbury identity
$$\left(\mathbf{A} + \mathbf{B}\mathbf{C}\mathbf{D}\right)^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{B}\left(\mathbf{C}^{-1} + \mathbf{D}\mathbf{A}^{-1}\mathbf{B}\right)^{-1}\mathbf{D}\mathbf{A}^{-1}$$
and the Sylvester identity
$$\det\left(\mathbf{I} + \mathbf{A}\mathbf{B}\right) = \det\left(\mathbf{I} + \mathbf{B}\mathbf{A}\right).$$
We also have vector generalizations of (14) and (15):
$$h(\underline{X}|\underline{Y}) \le \mathrm{E}\left[\log \det\left(\pi e\, \mathbf{Q}_{\underline{X}|\underline{Y}}\right)\right]$$
$$I(\underline{X};\underline{Y}) \ge \mathrm{E}\left[\log \frac{\det \mathbf{Q}_{\underline{X}}}{\det \mathbf{Q}_{\underline{X}|\underline{Y}}}\right] \quad \text{for CSCG } \underline{X}.$$
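The following short numeric sketch (added for illustration, not from the paper) checks step $(a)$ of (30) for a randomly generated joint covariance of a CSCG pair $(\underline{X}, \underline{Y})$.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 2, 3

# Random valid covariance of the stacked vector [X; Y] (an assumption for illustration).
A = rng.standard_normal((M + N, M + N)) + 1j * rng.standard_normal((M + N, M + N))
Q = A @ A.conj().T
Q_X, Q_Y = Q[:M, :M], Q[M:, M:]
R_XY = Q[:M, M:]                                      # E[X Y^dagger]

H   = R_XY.conj().T @ np.linalg.inv(Q_X)              # H = E[Y X^dagger] Q_X^{-1}
Q_Z = Q_Y - H @ Q_X @ H.conj().T                      # Q_Z = Q_Y - H Q_X H^dagger

lhs = np.linalg.inv(Q_X - R_XY @ np.linalg.solve(Q_Y, R_XY.conj().T)) @ Q_X
rhs = np.eye(N) + np.linalg.solve(Q_Z, H @ Q_X @ H.conj().T)
print(np.log(np.linalg.det(lhs).real), np.log(np.linalg.det(rhs).real))  # equal
```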

2.8. Capacity and Wideband Rates

Consider the complex-alphabet AWGN channel with output $Y = X + Z$ and noise $Z \sim \mathcal{CN}(0,1)$. The capacity with the block power constraint $\frac{1}{n}\sum_{i=1}^{n} |X_i|^2 \le P$ is
$$C(P) = \max_{\mathrm{E}[|X|^2] \le P} I(X;Y) = \log(1+P).$$
The low SNR regime (small $P$) is known as the wideband regime [73]. For well-behaved channels such as AWGN channels, the minimum $E_b/N_0$ and the slope $S$ of the capacity vs. $E_b/N_0$ in bits/(3 dB) at the minimum $E_b/N_0$ are (see ([73], Equation (35)) and ([73], Theorem 9))
$$\left.\frac{E_b}{N_0}\right|_{\min} = \frac{\log 2}{C'(0)}, \quad S = \frac{2\,[C'(0)]^2}{-C''(0)}$$
where $C'(P)$ and $C''(P)$ are the first and second derivatives of $C(P)$ (measured in nats) with respect to $P$, respectively. For example, the wideband derivatives for (36) are $C'(0) = 1$ and $C''(0) = -1$ so that the wideband values (37) are
$$\left.\frac{E_b}{N_0}\right|_{\min} = \log 2, \quad S = 2.$$
The minimal $E_b/N_0$ is usually stated in decibels, for example $10\log_{10}(\log 2) \approx -1.59$ dB. An extension of the theory to general channels is described in ([74], Section III).
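A minimal finite-difference check (added for illustration, not from the paper) of (37) for the AWGN capacity $C(P) = \log(1+P)$ in nats:

```python
import numpy as np

C = np.log1p                                   # C(P) = log(1 + P)
d = 1e-4
C1 = (C(d) - C(0.0)) / d                       # C'(0)  ~  1
C2 = (C(2 * d) - 2 * C(d) + C(0.0)) / d ** 2   # C''(0) ~ -1

Eb_N0_min = np.log(2) / C1                     # log 2, i.e. about -1.59 dB
S = 2 * C1 ** 2 / (-C2)                        # 2 bits/(3 dB)
print(10 * np.log10(Eb_N0_min), S)
```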
Remark 7.
A useful method is flash signaling, where one sends with zero energy most of the time. In particular, we will consider the CSCG flash density
$$p(x) = (1-p)\,\delta(x) + p\, \frac{e^{-|x|^2/(P/p)}}{\pi (P/p)}$$
where $0 < p \le 1$ so that the average power is $\mathrm{E}[|X|^2] = P$. Note that flash signaling is defined in ([73], Definition 2) as a family of distributions satisfying a particular property as $P \to 0$. We use the terminology informally.

2.9. Uniformly-Spaced Quantizer

Consider a uniformly-spaced scalar quantizer $q_u(\cdot)$ with $B$ bits, domain $[0, \infty)$, and reconstruction points
$$s \in \left\{\Delta/2,\ 3\Delta/2,\ \dots,\ \Delta/2 + (2^B - 1)\Delta\right\}$$
where $\Delta > 0$. The quantization intervals are
$$\mathcal{I}(s) = \begin{cases} \left[\,s - \frac{\Delta}{2},\ s + \frac{\Delta}{2}\,\right), & s < s_{\max} \\ \left[\,s - \frac{\Delta}{2},\ \infty\,\right), & s = s_{\max} \end{cases}$$
where $s_{\max} = \Delta/2 + (2^B - 1)\Delta$. We will consider $B = 0, 1, \dots, \infty$. For $B = \infty$ we choose $q_u(x) = x$.
Suppose one applies the quantizer to the non-negative random variable G with density p ( g ) to obtain S T = q u ( G ) . Let P S T and P S T | G be the probability mass functions of S T without and with conditioning on G, respectively. We have
$$P_{S_T|G}(s|g) = 1\left(g \in \mathcal{I}(s)\right), \quad P_{S_T}(s) = \int_{\mathcal{I}(s)} p(g)\, dg$$
and using Bayes’ rule, we obtain
$$p(g|s) = \begin{cases} p(g)/P_{S_T}(s), & g \in \mathcal{I}(s) \\ 0, & \text{else.} \end{cases}$$
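An illustrative numpy sketch of this quantizer (not from the paper; the choices $B = 2$, $\Delta = 0.5$, and an exponential gain density $p(g) = e^{-g}$, $g \ge 0$, are assumptions for the example):

```python
import numpy as np

B, Delta = 2, 0.5
points = Delta / 2 + Delta * np.arange(2 ** B)   # reconstruction points
s_max = points[-1]

def q_u(g):
    """Uniformly-spaced quantizer: [s - Delta/2, s + Delta/2) maps to s,
    and everything above s_max - Delta/2 maps to s_max."""
    idx = np.minimum(np.floor(np.asarray(g) / Delta).astype(int), 2 ** B - 1)
    return points[idx]

# P_{S_T}(s): integrate p(g) over I(s); the last interval extends to infinity.
edges = np.append(points - Delta / 2, np.inf)
P_ST = np.exp(-edges[:-1]) - np.exp(-edges[1:])
print(q_u([0.1, 0.6, 5.0]))    # -> [0.25, 0.75, 1.75]
print(P_ST, P_ST.sum())        # probabilities sum to 1
```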

3. Generalized Mutual Information

We re-derive the GMI in the usual way, where one starts with the forward model $q(y|x)$ rather than the reverse density $q(x|y)$ in (8). Consider the joint density $p(x,y)$ and define $q(y)$ as in (9) for $s \ge 0$. Note that neither $q(y|x)$ nor $q(y)$ must be densities. The GMI is defined in [39] to be $\max_{s \ge 0} I_s(X;Y)$ where (see the RHS of (10))
$$I_s(X;Y) = \mathrm{E}\left[\log \frac{q(Y|X)^s}{q(Y)}\right]$$
and where the expectation is with respect to p ( x , y ) . The GMI is a lower bound on the mutual information since
$$I_s(X;Y) = I(X;Y) - D\left(p_{X,Y} \,\big\|\, p_Y\, q_{X|Y}\right).$$
Moreover, by using Gallager’s derivation of error exponents, but without modifying his “s” variable, the GMI I s ( X ; Y ) is achievable with a mismatched decoder that uses q ( y | x ) for its decoding metric [39].

3.1. AWGN Forward Model with CSCG Inputs

A natural metric is based on the AWGN auxiliary channel $Y_a = hX + Z$ where $h$ is a channel parameter and $Z \sim \mathcal{CN}(0, \sigma^2)$ is independent of $X$, i.e., we have the auxiliary model (here a density)
$$q(y|x) = \frac{1}{\pi \sigma^2} \exp\left(-|y - hx|^2/\sigma^2\right)$$
where $h$ and $\sigma^2$ are to be optimized. A natural input is $X \sim \mathcal{CN}(0, P)$ so that (9) is
$$q(y) = \frac{\pi \sigma^2/s}{(\pi \sigma^2)^s} \cdot \frac{\exp\left(-\dfrac{|y|^2}{\sigma^2/s + |h|^2 P}\right)}{\pi\left(\sigma^2/s + |h|^2 P\right)}.$$
We have the following result, see [43] that considers channels of the form (1) and ([47], Proposition 1) that considers general p ( y | x ) .
Proposition 1.
The maximum GMI (42) for the channel p ( y | x ) , a CSCG input X with variance P > 0 , and the auxiliary model (44) with σ 2 > 0 is
$$I_1(X;Y) = \log\left(1 + \frac{|\tilde{h}|^2 P}{\tilde{\sigma}^2}\right)$$
where s = 1 and (cf. (7))
$$\tilde{h} = \mathrm{E}[YX^*]/P$$
$$\tilde{\sigma}^2 = \mathrm{E}\left[|Y - \tilde{h}X|^2\right] = \mathrm{E}[|Y|^2] - |\tilde{h}|^2 P.$$
The expectations are with respect to the actual density p ( x , y ) .
Proof. 
The GMI (42) for the model (44) is
$$I_s(X;Y) = \log\left(1 + \frac{|h|^2 P}{\sigma^2/s}\right) + \frac{\mathrm{E}[|Y|^2]}{\sigma^2/s + |h|^2 P} - \frac{\mathrm{E}\left[|Y - hX|^2\right]}{\sigma^2/s}.$$
Since (49) depends only on the ratio σ 2 / s one may as well set s = 1 . Thus, choosing h = h ˜ and σ 2 = σ ˜ 2 gives (46).
Next, consider $Y_a = \tilde{h}X + \tilde{Z}$ where $\tilde{Z} \sim \mathcal{CN}(0, \tilde{\sigma}^2)$ is independent of $X$. We have
$$\mathrm{E}[|Y_a|^2] = \mathrm{E}[|Y|^2]$$
$$\mathrm{E}\left[|Y_a - \tilde{h}X|^2\right] = \mathrm{E}\left[|Y - \tilde{h}X|^2\right].$$
In other words, the second-order statistics for the two channels with outputs Y (the actual channel output) and Y a are the same. But the GMI (46) is the mutual information I ( X ; Y a ) . Using (43) and (49), for any s, h and σ 2 we have
$$I(X;Y_a) = \log\left(1 + \frac{|\tilde{h}|^2 P}{\tilde{\sigma}^2}\right) \ge I_s(X;Y_a) = I_s(X;Y)$$
and equality holds if h = h ˜ and σ 2 / s = σ ˜ 2 . □
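A minimal Monte Carlo sketch of Proposition 1 (added for illustration, not from the paper), assuming the on-off fading channel of Section 3.3 as the actual channel: it estimates $\tilde{h}$ and $\tilde{\sigma}^2$ per (47)–(48) from samples, evaluates the closed form (46), and checks it against a direct sample average of $\log\left(q(Y|X)/q(Y)\right)$ with $s = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, P = 500_000, 4.0

def cscg(n, var):
    return np.sqrt(var / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))

# Actual channel (the on-off fading example of Section 3.3): Y = H X + Z.
X = cscg(n, P)
H = np.sqrt(2) * rng.integers(0, 2, n)
Y = H * X + cscg(n, 1.0)

# (47), (48): h_tilde = E[Y X*]/P and sigma_tilde^2 = E[|Y|^2] - |h_tilde|^2 P.
h = np.mean(Y * np.conj(X)) / P
sig2 = np.mean(np.abs(Y) ** 2) - np.abs(h) ** 2 * P

gmi_closed = np.log1p(np.abs(h) ** 2 * P / sig2)          # (46)

# Direct evaluation of (42) with s = 1, the model (44), and q(y) from (45).
log_q_y_given_x = -np.abs(Y - h * X) ** 2 / sig2 - np.log(np.pi * sig2)
log_q_y = (-np.abs(Y) ** 2 / (sig2 + np.abs(h) ** 2 * P)
           - np.log(np.pi * (sig2 + np.abs(h) ** 2 * P)))
gmi_mc = np.mean(log_q_y_given_x - log_q_y)

print(gmi_closed, gmi_mc, np.log1p(P / (2 + P)))          # last value is (78)
```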
Remark 8.
The rate (46) is the same as the RHS of (6).
Remark 9.
Proposition 1 generalizes to vector models and adaptive input symbols; see Section 4.4.
Remark 10.
The estimate h ˜ is the MMSE estimate of h:
$$\tilde{h} = \arg\min_{h}\ \mathrm{E}\left[|Y - hX|^2\right]$$
and σ ˜ 2 is the variance of the error. To see this, expand
$$\mathrm{E}\left[|Y - hX|^2\right] = \mathrm{E}\left[\left|(Y - \tilde{h}X) + (\tilde{h} - h)X\right|^2\right] = \tilde{\sigma}^2 + |\tilde{h} - h|^2 P$$
where the final step follows by the definition of h ˜ in (47).
Remark 11.
Suppose that $h$ is an estimate other than (53). Then if $\mathrm{E}[|Y|^2] > \mathrm{E}\left[|Y - hX|^2\right]$ we may choose
$$\sigma^2/s = \frac{|h|^2 P \cdot \mathrm{E}\left[|Y - hX|^2\right]}{\mathrm{E}[|Y|^2] - \mathrm{E}\left[|Y - hX|^2\right]}$$
and the GMI (49) simplifies to
$$I_s(X;Y) = \log \frac{\mathrm{E}[|Y|^2]}{\mathrm{E}\left[|Y - hX|^2\right]}.$$
Remark 12.
The LM rate (for “lower bound to the mismatch capacity”) improves the GMI for some $q(y|x)$ [40,75]. The LM rate replaces $q(y|x)$ with $q(y|x)\, e^{t(x)/s}$ for some function $t(\cdot)$ and permits optimizing $s$ and $t(\cdot)$; see ([41], Section 2.3.2). For example, if $p(y|x)$ has the form $q(y|x)^s\, e^{t(x)}$ then the LM rate can be larger than the GMI; see [76,77].

3.2. CSIR and K-Partitions

We consider two generalizations of Proposition 1. The first is for channels with a state S R known at the receiver but not at the transmitter. The second expands the class of CSCG auxiliary models. The motivation is to obtain more precise models under partial CSIR, especially to better deal with channels at high SNR and with high rates. We here consider discrete S R and later extend to continuous S R .
CSIR: Consider the average GMI
$$I_1(X;Y|S_R) = \sum_{s_R} P_{S_R}(s_R)\, I_1(X;Y|S_R = s_R)$$
where I 1 ( X ; Y | S R = s R ) is the usual GMI where all densities are conditioned on S R = s R . The parameters (47) and (48) for the event S R = s R are now
$$\tilde{h}(s_R) = \frac{\mathrm{E}\left[YX^* \,\middle|\, S_R = s_R\right]}{\mathrm{E}\left[|X|^2 \,\middle|\, S_R = s_R\right]}$$
$$\tilde{\sigma}^2(s_R) = \mathrm{E}\left[|Y - \tilde{h}(s_R)X|^2 \,\middle|\, S_R = s_R\right].$$
The GMI (57) is thus
$$I_1(X;Y|S_R) = \sum_{s_R} P_{S_R}(s_R) \log\left(1 + \frac{|\tilde{h}(s_R)|^2 P}{\tilde{\sigma}^2(s_R)}\right).$$
K-Partitions: Let { Y k : k = 1 , , K } be a K-partition of Y and define the auxiliary model
$$q(y|x) = \frac{1}{\pi \sigma_k^2}\, e^{-|y - h_k x|^2/\sigma_k^2}, \quad y \in \mathcal{Y}_k.$$
Observe that q ( y | x ) is not necessarily a density. We choose X CN ( 0 , P ) so that (9) becomes (cf. (45))
$$q(y) = \frac{\pi \sigma_k^2/s}{(\pi \sigma_k^2)^s} \cdot \frac{\exp\left(-\dfrac{|y|^2}{\sigma_k^2/s + |h_k|^2 P}\right)}{\pi\left(\sigma_k^2/s + |h_k|^2 P\right)}, \quad y \in \mathcal{Y}_k.$$
Define the events $E_k = \{Y \in \mathcal{Y}_k\}$ for $k = 1, \dots, K$. We have
$$I_s(X;Y) = \sum_{k=1}^{K} \Pr[E_k] \cdot \mathrm{E}\left[\log \frac{q(Y|X)^s}{q(Y)} \,\middle|\, E_k\right]$$
and inserting (61) and (62) we have the following lemma.
Lemma 1.
The GMI (42) for the channel p ( y | x ) , s = 1 , a CSCG input X with variance P, and the auxiliary model (61) is (see (49))
$$I_1(X;Y) = \sum_{k=1}^{K} \Pr[E_k]\left[\log\left(1 + \frac{|h_k|^2 P}{\sigma_k^2}\right) + \frac{\mathrm{E}\left[|Y|^2 \,\middle|\, E_k\right]}{\sigma_k^2 + |h_k|^2 P} - \frac{\mathrm{E}\left[|Y - h_k X|^2 \,\middle|\, E_k\right]}{\sigma_k^2}\right].$$
Remark 13.
K-partitioning formally includes (57) as a special case by including S R as part of the receiver’s “overall” channel output Y ˜ = [ Y , S R ] . For example, one can partition Y ˜ as { Y ˜ s R : s R S R } where Y ˜ s R = Y × { s R } .
Remark 14.
The models (16) and (61) suggest building receivers based on adaptive Gaussian statistics. However, we are motivated to introduce (61) to prove capacity scaling results. For this purpose, we will use K = 2 with the partition
$$E_1 = \{|Y|^2 < t_R\}, \quad E_2 = \{|Y|^2 \ge t_R\}$$
and h 1 = 0 , σ 1 2 = 1 . The GMI (64) thus has only the k = 2 term and it remains to choose h 2 , σ 2 2 , and t R .
Remark 15.
One can generalize Lemma 1 and partition X × Y rather than Y only. However, the q ( y ) in (62) might not have a CSCG form.
Remark 16.
Define P k = E | X | 2 | E k and choose the LMMSE auxiliary models with
$$h_k = \mathrm{E}\left[YX^* \,\middle|\, E_k\right]/P_k$$
$$\sigma_k^2 = \mathrm{E}\left[|Y - h_k X|^2 \,\middle|\, E_k\right] = \mathrm{E}\left[|Y|^2 \,\middle|\, E_k\right] - |h_k|^2 P_k$$
for k = 1 , , K . The expression (64) is then
$$I_1(X;Y) = \sum_{k=1}^{K} \Pr[E_k]\left[\log\left(1 + \frac{|h_k|^2 P}{\mathrm{E}\left[|Y|^2 \,\middle|\, E_k\right] - |h_k|^2 P_k}\right) - \frac{|h_k|^2 (P - P_k)}{\mathrm{E}\left[|Y|^2 \,\middle|\, E_k\right] + |h_k|^2 (P - P_k)}\right].$$
Remark 17.
The LMMSE-based GMI (68) reduces to the GMI of Proposition 1 by choosing the trivial partition with $K = 1$ and $\mathcal{Y}_1 = \mathcal{Y}$. However, the GMI (68) may not be optimal for $K \ge 2$. What can be said is that the phase of $h_k$ in (64) should be the same as the phase of $\mathrm{E}\left[YX^* \,\middle|\, E_k\right]$ for all $k$. We thus have $K$ two-dimensional optimization problems, one for each pair $(|h_k|, \sigma_k^2)$, $k = 1, \dots, K$.
Remark 18.
Suppose we choose a different auxiliary model for each $Y = y$, i.e., consider $K \to \infty$. The reverse density GMI uses the auxiliary model (19) which gives the RHS of (15):
$$I_1(X;Y) = \int_{\mathbb{C}} p(y) \log \frac{P}{\mathrm{Var}[X|Y=y]}\, dy.$$
Instead, the suboptimal (68) is the complicated expression
$$I_1(X;Y) = \int_{\mathbb{C}} p(y) \left[\log\left(1 + \frac{\left|\mathrm{E}[X|Y=y]\right|^2 (P/P_y)}{\mathrm{Var}[X|Y=y]}\right) - \frac{\left|\mathrm{E}[X|Y=y]\right|^2 (P/P_y - 1)}{\mathrm{Var}[X|Y=y] + \left|\mathrm{E}[X|Y=y]\right|^2 (P/P_y)}\right] dy$$
where $P_y = \mathrm{E}\left[|X|^2 \,\middle|\, Y=y\right]$. We show how to compute these GMIs in Appendix C.

3.3. Example: On-Off Fading

Consider the channel $Y = HX + Z$ where $H, X, Z$ are mutually independent, $P_H(0) = P_H(\sqrt{2}) = 1/2$, and $Z \sim \mathcal{CN}(0,1)$. The channel exhibits particularly simple fading, giving basic insight into more realistic fading models. We consider two basic scenarios: full CSIR and no CSIR.
Full CSIR: Suppose S R = H and
$$q(y|x,h) = p(y|x,h) = \frac{1}{\pi \sigma^2}\, e^{-|y - hx|^2/\sigma^2}$$
which corresponds to having (58) and (59) as
$$\tilde{h}(0) = 0, \quad \tilde{h}(\sqrt{2}) = \sqrt{2}, \quad \tilde{\sigma}^2(0) = \tilde{\sigma}^2(\sqrt{2}) = 1.$$
The GMI (60) with X CN ( 0 , P ) thus gives the capacity
$$C(P) = \frac{1}{2} \log\left(1 + 2P\right).$$
The wideband values (37) are
E b N 0 min = log 2 , S = 1 .
Compared with (38), the minimal $E_b/N_0$ is the same as without fading, namely $-1.59$ dB. However, fading reduces the capacity slope $S$; see the dashed curve in Figure 1.
No CSIR: Suppose S R = 0 and X CN ( 0 , P ) and consider the densities
$$p(y|x) = \frac{e^{-|y|^2}}{2\pi} + \frac{e^{-|y - \sqrt{2}\,x|^2}}{2\pi}$$
$$p(y) = \frac{e^{-|y|^2}}{2\pi} + \frac{e^{-|y|^2/(1+2P)}}{2\pi(1+2P)}.$$
The mutual information can be computed by numerical integration or by Monte Carlo integration:
$$I(X;Y) \approx \frac{1}{N} \sum_{i=1}^{N} \log \frac{p_{Y|X}(y_i|x_i)}{p_Y(y_i)}$$
where the RHS of (77) converges to I ( X ; Y ) for long strings x N , y N sampled from p ( x , y ) . The results for X CN ( 0 , P ) are shown in Figure 1 as the curve labeled “ I ( X ; Y ) Gauss”.
Next, Proposition 1 gives $\tilde{h} = 1/\sqrt{2}$, $\tilde{\sigma}^2 = 1 + P/2$, and
$$I_1(X;Y) = \log\left(1 + \frac{P}{2 + P}\right).$$
The wideband values (37) are
E b N 0 min = log 4 , S = 2 / 3
so the minimal E b / N 0 is 1.42 dB and the capacity slope S has decreased further. Moreover, the rate saturates at large SNR at 1 bit per channel use.
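A minimal Monte Carlo sketch (added for illustration, not from the paper) of the estimate (77) for this no-CSIR on-off fading channel, using the densities (75) and (76); the $K = 1$ GMI (78) is printed for comparison.

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 500_000, 4.0

# Sample (x_i, y_i) from the actual channel: H in {0, sqrt(2)} equiprobable.
X = np.sqrt(P / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
H = np.sqrt(2) * rng.integers(0, 2, N)
Z = np.sqrt(0.5) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
Y = H * X + Z

# Densities (75) and (76) for CSCG X with variance P.
p_y_given_x = (np.exp(-np.abs(Y) ** 2) + np.exp(-np.abs(Y - np.sqrt(2) * X) ** 2)) / (2 * np.pi)
p_y = (np.exp(-np.abs(Y) ** 2) / (2 * np.pi)
       + np.exp(-np.abs(Y) ** 2 / (1 + 2 * P)) / (2 * np.pi * (1 + 2 * P)))

I_mc = np.mean(np.log(p_y_given_x / p_y))   # Monte Carlo estimate (77)
print(I_mc, np.log1p(P / (2 + P)))          # exact-MI estimate vs. the GMI (78)
```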
The “$I(X;Y)$ Gauss” curve in Figure 1 suggests that the no-CSIR capacity approaches the full-CSIR capacity for large SNR. To prove this, consider the $K = 2$ partition specified in Remark 14 with $h_1 = 0$, $h_2 = \sqrt{2}$, and $\sigma_2^2 = 1$. Since we are not using LMMSE auxiliary models, we must compute the GMI using the general expression (64), which is
$$I_1(X;Y) = \Pr[E_2]\left[\log(1+2P) + \frac{\mathrm{E}\left[|Y|^2 \,\middle|\, E_2\right]}{1+2P} - \mathrm{E}\left[|Y - \sqrt{2}\,X|^2 \,\middle|\, E_2\right]\right].$$
In Appendix B.1, we show that choosing $t_R = P^{\lambda_R} + b$ where $0 < \lambda_R < 1$ and $b$ is a real constant makes all terms behave as desired as $P$ increases:
$$\Pr[E_2] \to 1/2, \quad \frac{\mathrm{E}\left[|Y|^2 \,\middle|\, E_2\right]}{1+2P} \to 1, \quad \mathrm{E}\left[|Y - \sqrt{2}\,X|^2 \,\middle|\, E_2\right] \to 1.$$
The GMI (80) of Lemma 1 thus gives the maximal value (73) for large P:
$$\lim_{P \to \infty}\ \left[\frac{1}{2}\log(1+2P) - I_1(X;Y)\right] = 0.$$
Figure 1 shows the behavior of I 1 ( X ; Y ) for K = 2 , λ R = 0.4 , and b = 3 . Effectively, at large SNR, the receiver can estimate H accurately, and one approaches the full-CSIR capacity.
Remark 19.
For on-off fading, one may compute I ( X ; Y ) directly and use the densities (75) and (76) to decode. Nevertheless, the partitioning of Lemma 1 helps prove the capacity scaling (82).
Consider next the reverse density GMI (69) and the forward model GMI (70). Appendix C.1 shows how to compute $\mathrm{E}[X|Y=y]$, $\mathrm{E}\left[|X|^2 \mid Y=y\right]$, and $\mathrm{Var}[X|Y=y]$, and Figure 1 plots the GMIs as the curves labeled “rGMI” and “GMI, $K = \infty$”, respectively. The rGMI curve gives the best possible rates for AWGN auxiliary models, as shown in Section 1.4. The results also show that the large-$K$ GMI (70) is worse than the $K = 1$ GMI at low SNR but better than the $K = 2$ GMI of Remark 14.
Finally, the curve labeled “$I(X;Y)$ Gauss” in Figure 1 suggests that the minimal $E_b/N_0$ is 1.42 dB even for the capacity-achieving distribution. However, we know from ([73], Theorem 1) that flash signaling (39) can approach the minimal $E_b/N_0$ of $-1.59$ dB. For example, the flash rates $I(X;Y)$ with $p = 0.05$ are plotted in Figure 1. Unfortunately, the wideband slope is $S = 0$ ([73], Theorem 17), and one requires very large flash powers (very small $p$) to approach $-1.59$ dB.
Remark 20.
As stated in Remark 6, the paper [37] (see also [2,70]) derives two capacity lower bounds. These bounds are the same for our problem, and they are derived using the following steps (see ([37], Lemmas 3 and 4)):
$$I(X;Y) = I(X, S_H; Y) - I(S_H; Y \mid X) \ge I(X; Y \mid S_H) - I(S_H; Y \mid X).$$
Now consider Y = H X + Z where H , X , Z are mutually independent, S H = H , Var Z = 1 , and X CN ( 0 , P ) . We have
$$I(X;Y \mid H) \ge \mathrm{E}\left[\log\left(1 + |H|^2 P\right)\right]$$
$$I(H;Y \mid X) = h(Y \mid X) - h(Z) \le \log\left(\pi e \left(1 + \mathrm{Var}[H]\, P\right)\right) - h(Z)$$
where (84) and (85) follow by (5), in the latter case with the roles of X and Y reversed. The bound (85) works well if Var H is small, as for massive MIMO with “channel hardening”. However, for our on-off fading model, the bound (83) is
$$I(X;Y) \ge \mathrm{E}\left[\log\left(1 + |H|^2 P\right)\right] - \log\left(1 + \mathrm{Var}[H]\, P\right) = \frac{1}{2}\log(1+2P) - \log(1 + P/2)$$
which is worse than the $K = 1$ and $K = \infty$ GMIs and is not shown in Figure 1.

4. Channels with CSIT

This section studies Shannon’s channel with side information, or state, known causally at the transmitter [5,6]. We begin by treating general channels and then focus mainly on complex-alphabet channels. The capacity expression has a random variable A that is either a list (for discrete-alphabet states) or a function (for continuous-alphabet states). We refer to A as an adaptive symbol of an adaptive codeword.

4.1. Model

The problem is specified by the functional dependence graph (FDG) in Figure 2. The model has a message M, a CSIT string S T n , and a noise string Z n . The variables M, S T n , Z n are mutually statistically independent, and S T n and Z n are strings of i.i.d. random variables with the same distributions as S T and Z, respectively. S T n is available causally at the transmitter, i.e., the channel input X i , i = 1 , , n , is a function of M and the sub-string S T i . The receiver sees the channel outputs
Y i = f ( X i , S T i , Z i )
for some function f ( . ) and i = 1 , 2 , , n .
Each A i represents a list of possible choices of X i at time i. More precisely, suppose that S T has alphabet S T = { 0 , 1 , , ν 1 } and define the adaptive symbol
A = X ( 0 ) , , X ( ν 1 )
whose entries have alphabet X . Here S T = s T means that X ( s T ) is transmitted, i.e., we have X = X ( S T ) . If S T has a continuous alphabet, we make A a function rather than a list, and we may again write X = X ( S T ) . Some authors therefore write A as X ( . ) . (Shannon in [6] denoted our A and X as the respective X and x.)
Remark 21.
The conventional choice for $A$ if $\mathcal{X} = \mathbb{C}$ is
$$A = \left[\sqrt{P(0)}\, e^{j\phi(0)},\ \dots,\ \sqrt{P(\nu-1)}\, e^{j\phi(\nu-1)}\right] \cdot U$$
where U has E | U | 2 = 1 , P ( s T ) = E | X ( s T ) | 2 , and ϕ ( s T ) is a phase shift. The interpretation is that U represents the symbol of a conventional codebook without CSIT, and these symbols are scaled and rotated. In other words, one separates the message-carrying U from an adaptation due to S T via
$$X = \sqrt{P(S_T)}\, e^{j\phi(S_T)}\, U.$$
Remark 22.
One may define the channel by the functional relation (87), by p ( y | a ) , or by p ( y | x , s T ) ; see Shannon’s emphasis in ([6], Theorem); see ([22], Remark 3). We generally prefer to use p ( y | a ) since we interpret A as a channel input.
Remark 23.
One can add feedback and let X i be a function of ( M , S T i , Y i 1 ) , but feedback does not increase the capacity if the state and noise processes are memoryless ([22], Section V).
Remark 24.
The model (87) permits block fading and MIMO transmission by choosing X i and Y i as vectors [11,78].

4.2. Capacity

The capacity of the model under study is (see [6])
C = max A I ( A ; Y )
where A [ S T , X ] Y forms a Markov chain. One may limit attention to A with cardinality | A | satisfying (see ([22], Equation (56)), [79], ([80], Theorem 1))
$$|\mathcal{A}| \le \min\left\{|\mathcal{Y}|,\ 1 + |\mathcal{S}_T|\left(|\mathcal{X}| - 1\right)\right\}.$$
As usual, for the cost function c ( x , y ) and the average block cost constraint
$$\frac{1}{n}\sum_{i=1}^{n} \mathrm{E}\left[c(X_i, Y_i)\right] \le P$$
the unconstrained maximization in (90) becomes a constrained maximization over the $A$ for which $\mathrm{E}[c(X,Y)] \le P$. Also, a simple upper bound on the capacity is
$$C(P) \le \max_{A:\ \mathrm{E}[c(X,Y)] \le P} I(A; Y, S_T) \overset{(a)}{=} \max_{X(S_T):\ \mathrm{E}[c(X(S_T),Y)] \le P} I(X; Y \mid S_T)$$
where step ( a ) follows by the independence of A and S T . This bound is tight if the receiver knows S T .
Remark 25.
The chain rule for mutual information gives
$$I(A;Y) = I\left(X(0) \cdots X(\nu-1);\ Y\right)$$
$$= \sum_{s_T=0}^{\nu-1} I\left(X(s_T);\ Y \,\middle|\, X(0), \dots, X(s_T - 1)\right).$$
The RHS of (94) suggests treating the channel as a multi-input, single-output (MISO) channel, and the expression (95) suggests using multi-level coding with multi-stage decoding [81]. For example, one may use polar coded modulation [82,83,84] with Honda-Yamamoto shaping [85,86].
Remark 26.
For $\mathcal{X} = \mathbb{C}$ and the conventional adaptive symbol (88), we compute $I(A;Y) = I(U;Y)$ and
$$C(P) = \max_{P(S_T),\, \phi(S_T):\ \mathrm{E}[c(X(S_T),Y)] \le P} I(U;Y).$$

4.3. Structure of the Optimal Input Distribution

Let $\mathcal{A}$ be the alphabet of $A$ and let $\mathcal{X} = \mathbb{C}$, i.e., we have $\mathcal{A} = \mathbb{C}^\nu$ for discrete $S_T$. Consider the expansions
$$p(y|a) = \sum_{s_T} P_{S_T}(s_T)\, p\left(y \mid x(s_T), s_T\right), \quad p(y) = \int_{\mathcal{A}} p(a)\, p(y|a)\, da$$
$$= \sum_{s_T} P_{S_T}(s_T) \int_{\mathbb{C}} p\left(x(s_T)\right)\, p\left(y \mid x(s_T), s_T\right)\, dx(s_T).$$
Observe that p ( y ) , and hence h ( Y ) , depends only on the marginals p ( x ( s T ) ) of A; see ([80], Section III). So define the set of densities having the same marginals as A:
$$\mathcal{P}(A) = \left\{p(\tilde{a}) : p(\tilde{x}(s_T)) = p(x(s_T)) \text{ for all } s_T \in \mathcal{S}_T\right\}.$$
This set is convex, since for any $p^{(1)}(a), p^{(2)}(a) \in \mathcal{P}(A)$ and $0 \le \lambda \le 1$ we have
$$\lambda\, p^{(1)}(a) + (1-\lambda)\, p^{(2)}(a) \in \mathcal{P}(A).$$
Moreover, for fixed p ( y ) , the expression I ( A ; Y ) is a convex function of p ( a | y ) , and p ( a | y ) = p ( a ) p ( y | a ) / p ( y ) is a linear function of p ( a ) . Maximizing I ( A ; Y ) over P ( A ) is thus the same as minimizing the concave function h ( Y | A ) over the convex set P ( A ) . An optimal p ( a ) is thus an extreme of P ( A ) . Some properties of such extremes are developed in [87,88].
For example, consider | S T | = 2 and X = S T = { 0 , 1 } , for which (91) states that at most | A | = 3 adaptive symbols need have positive probability (and at most | A | = 2 adaptive symbols if | Y | = 2 ). Suppose the marginals have P X ( 0 ) ( 0 ) = 1 / 2 , P X ( 1 ) ( 0 ) = 3 / 4 and consider the matrix notation
$$\mathbf{P}_A = \begin{bmatrix} P_A(0,0) & P_A(0,1) \\ P_A(1,0) & P_A(1,1) \end{bmatrix}$$
where we write P A ( x 1 , x 2 ) for P A ( [ x 1 , x 2 ] ) . The optimal P A must then be one of the two extremes
$$\mathbf{P}_A = \begin{bmatrix} 1/2 & 0 \\ 1/4 & 1/4 \end{bmatrix}, \quad \mathbf{P}_A = \begin{bmatrix} 1/4 & 1/4 \\ 1/2 & 0 \end{bmatrix}.$$
For the first P A , the codebook has the property that if X ( 0 ) = 0 then X ( 1 ) = 0 while if X ( 0 ) = 1 then X ( 1 ) is uniformly distributed over X = { 0 , 1 } .
Next, consider | S T | = 2 and marginals P X ( 0 ) , P X ( 1 ) that are uniform over X = { 0 , 1 , , | X | 1 } . This case was treated in detail in ([80], Section VI.A), see also [89], and we provide a different perspective. A classic theorem of Birkhoff [90] ensures that the extremes of P ( A ) are the | X | ! distributions P A for which the | X | × | X | matrix
$$\mathbf{P}_A = \begin{bmatrix} P_A(0,0) & \cdots & P_A(0, |\mathcal{X}|-1) \\ \vdots & \ddots & \vdots \\ P_A(|\mathcal{X}|-1, 0) & \cdots & P_A(|\mathcal{X}|-1, |\mathcal{X}|-1) \end{bmatrix}$$
is a permutation matrix multiplied by 1 / | X | . For example, for | X | = 2 we have the two extremes
$$\mathbf{P}_A = \frac{1}{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \mathbf{P}_A = \frac{1}{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
The permutation property means that $X(s_T)$ is a function of $X(0)$, i.e., the encoding simplifies to a conventional codebook as in Remark 21 with uniformly-distributed $U$ and a permutation $\pi_{s_T}(\cdot)$ indexed by $s_T$ such that $X(S_T) = \pi_{S_T}(U)$. For example, for the first $\mathbf{P}_A$ in (101) we may choose $X(S_T) = U$, which is independent of $S_T$. On the other hand, for the second $\mathbf{P}_A$ in (101) we may choose $X(S_T) = U \oplus S_T$ where $\oplus$ denotes addition modulo-2.
For | S T | > 2 , the geometry of P ( A ) is more complicated; see ([80], Section VI.B). For example, consider X = { 0 , 1 } and suppose the marginals P X ( s T ) , s T S T , are all uniform. Then the extremes include P A related to linear codes and their cosets, e.g., two extremes for | S T | = 3 are related to the repetition code and single parity check code:
$$P_A(a) = 1/2, \quad a \in \{[0,0,0],\ [1,1,1]\}$$
$$P_A(a) = 1/4, \quad a \in \{[0,0,0],\ [0,1,1],\ [1,0,1],\ [1,1,0]\}.$$
This observation motivates concatenated coding, where the message is first encoded by an outer encoder followed by an inner code that is the coset of a linear code. The transmitter then sends the entries at position S T of the inner codewords, which are vectors of dimension | S T | . We do not know if there are channels for which such codes are helpful.
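The following minimal numpy sketch (added for illustration, not from the paper) generates adaptive symbols according to the two constructions just described: the permutation extreme $X(S_T) = U \oplus S_T$ for binary alphabets, and a coset-code extreme for $|\mathcal{S}_T| = 3$ based on the single parity check code; in both cases, each entry of $A$ has a uniform marginal as required.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Permutation extreme: conventional uniform codebook symbol U, and X(s_T) = U xor s_T.
U = rng.integers(0, 2, n)
A = np.stack([U ^ 0, U ^ 1], axis=1)        # adaptive symbol A = [X(0), X(1)]
print(A[:, 0].mean(), A[:, 1].mean())       # both marginals close to uniform (1/2)

# Coset-code extreme for |S_T| = 3: the entries of A form a codeword of the
# single parity check code, so each marginal P_{X(s_T)} is again uniform.
bits = rng.integers(0, 2, (n, 2))
A3 = np.concatenate([bits, bits[:, :1] ^ bits[:, 1:2]], axis=1)
print(A3.mean(axis=0))
```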

4.4. Generalized Mutual Information

Consider the vector channel $p(\underline{y}|\underline{x})$ with input set $\mathcal{X} = \mathbb{C}^M$ and output set $\mathcal{Y} = \mathbb{C}^N$. The GMI for adaptive symbols is $\max_{s \ge 0} I_s(A; \underline{Y})$ where
$$I_s(A; \underline{Y}) = \mathrm{E}\left[\log \frac{q(\underline{Y}|A)^s}{q(\underline{Y})}\right]$$
and the expectation is with respect to p ( a , y ̲ ) . Suppose the auxiliary model is q ( y ̲ | a ) and define
$$q(\underline{y}) = \int_{\mathcal{A}} p(a)\, q(\underline{y}|a)^s\, da.$$
The GMI again provides a lower bound on the mutual information since (cf. (43))
$$I_s(A; \underline{Y}) = I(A; \underline{Y}) - D\left(p_{A,\underline{Y}} \,\big\|\, p_{\underline{Y}}\, q_{A|\underline{Y}}\right)$$
where $q(a|\underline{y}) = p(a)\, q(\underline{y}|a)^s / q(\underline{y})$ is a reverse channel density.
We next study reverse and forward models as in Section 1.3 and Section 1.4. Suppose the entries X ̲ ( s T ) of A are jointly CSCG.
Reverse Model: We write A ̲ when we consider A to be a column vector that stacks the X ̲ ( s T ) . Consider the following reverse density motivated by (13):
$$q(\underline{a}|\underline{y}) = \frac{\exp\left(-\left(\underline{a} - \mathrm{E}[\underline{A}|\underline{Y}=\underline{y}]\right)^\dagger \mathbf{Q}_{\underline{A}|\underline{Y}=\underline{y}}^{-1} \left(\underline{a} - \mathrm{E}[\underline{A}|\underline{Y}=\underline{y}]\right)\right)}{\pi^{\nu M} \det \mathbf{Q}_{\underline{A}|\underline{Y}=\underline{y}}}.$$
A corresponding forward model is $q(\underline{y}|a) = q(a|\underline{y})/p(a)$ and the GMI with $s = 1$ becomes (cf. (35))
$$I_1(A; \underline{Y}) = \mathrm{E}\left[\log \frac{\det \mathbf{Q}_{\underline{A}}}{\det \mathbf{Q}_{\underline{A}|\underline{Y}}}\right].$$
To simplify, one may focus on adaptive symbols as in (89):
X ̲ = Q X ̲ ( S T ) 1 / 2 · U ̲
where U ̲ CN ( 0 ̲ , I ) and the Q X ̲ ( s T ) are covariance matrices. We thus have I ( A ; Y ̲ ) = I ( U ̲ ; Y ̲ ) (cf. (96)) and using (105) but with A ̲ replaced with U ̲ we obtain
$$I_1(A; \underline{Y}) = \mathrm{E}\left[-\log \det \mathbf{Q}_{\underline{U}|\underline{Y}}\right].$$
Forward Model: Perhaps the simplest forward model is q ( y ̲ | a ) = p ( y ̲ | x ̲ ( s T ) ) for some fixed value s T S T . One may interpret this model as having the receiver assume that S T = s T . A natural generalization of this idea is as follows: define the auxiliary vector
$$\bar{\underline{X}} = \sum_{s_T} \mathbf{W}(s_T)\, \underline{X}(s_T)$$
where the W ( s T ) are M × M complex matrices, i.e., X ¯ ̲ is a linear function of the entries of A = [ X ̲ ( s T ) : s T S T ] . For example, the matrices might be chosen based on P S T ( . ) . However, observe that X ¯ ̲ is independent of S T . Now define the auxiliary model
q ( y ̲ | a ) = q ( y ̲ | x ¯ ̲ )
where we abuse notation by using the same q ( . ) . The expression (103) becomes
$$q(\underline{y}) = \int_{\mathcal{A}} p(a)\, q(\underline{y}|a)^s\, da = \int_{\mathbb{C}^M} p(\bar{\underline{x}})\, q(\underline{y}|\bar{\underline{x}})^s\, d\bar{\underline{x}}.$$
Remark 27.
We often consider S T to be a discrete set, but for CSCG channels we also consider S T = C so that the sum over S T in (109) is replaced by an integral over C .
We now specialize further by choosing the auxiliary channel Y ̲ a = H X ¯ ̲ + Z ̲ where H is an N × M complex matrix, Z ̲ is an N-dimensional CSCG vector that is independent of X ¯ ̲ and has invertible covariance matrix Q Z ̲ , and H and Q Z ̲ are to be optimized. Further choose A = [ X ̲ ( s T ) : s T S T ] whose entries are jointly CSCG with correlation matrices
$$\mathbf{R}(s_{T1}, s_{T2}) = \mathrm{E}\left[\underline{X}(s_{T1})\, \underline{X}(s_{T2})^\dagger\right].$$
Since X ¯ ̲ in (109) is independent of S T , we have
$$q(\underline{y}|a) = \frac{\exp\left(-\left(\underline{y} - \mathbf{H}\bar{\underline{x}}\right)^\dagger \mathbf{Q}_{\underline{Z}}^{-1} \left(\underline{y} - \mathbf{H}\bar{\underline{x}}\right)\right)}{\pi^N \det \mathbf{Q}_{\underline{Z}}}.$$
Moreover, X ¯ ̲ is CSCG so (110) is
$$q(\underline{y}) = \frac{\pi^N \det\left(\mathbf{Q}_{\underline{Z}}/s\right)}{\left(\pi^N \det \mathbf{Q}_{\underline{Z}}\right)^s} \cdot \frac{\exp\left(-\underline{y}^\dagger \left(\mathbf{Q}_{\underline{Z}}/s + \mathbf{H}\, \mathbf{Q}_{\bar{\underline{X}}}\, \mathbf{H}^\dagger\right)^{-1} \underline{y}\right)}{\pi^N \det\left(\mathbf{Q}_{\underline{Z}}/s + \mathbf{H}\, \mathbf{Q}_{\bar{\underline{X}}}\, \mathbf{H}^\dagger\right)}$$
where
$$\mathbf{Q}_{\bar{\underline{X}}} = \sum_{s_{T1}, s_{T2}} \mathbf{W}(s_{T1})\, \mathbf{R}(s_{T1}, s_{T2})\, \mathbf{W}(s_{T2})^\dagger.$$
We have the following generalization of Proposition 1.
Lemma 2.
The maximum GMI (102) for the channel $p(\underline{y}|a)$, an adaptive vector $A = [\underline{X}(s_T) : s_T \in \mathcal{S}_T]$ that has jointly CSCG entries, an $\bar{\underline{X}}$ as in (109) with $\mathbf{Q}_{\bar{\underline{X}}} \succ \mathbf{0}$, and the auxiliary model (111) with $\mathbf{Q}_{\underline{Z}} \succ \mathbf{0}$ is
$$I_1(A; \underline{Y}) = \log \det\left(\mathbf{I} + \mathbf{Q}_{\tilde{\underline{Z}}}^{-1}\, \tilde{\mathbf{H}}\, \mathbf{Q}_{\bar{\underline{X}}}\, \tilde{\mathbf{H}}^\dagger\right)$$
where (cf. (31))
$$\tilde{\mathbf{H}} = \mathrm{E}\left[\underline{Y}\, \bar{\underline{X}}^\dagger\right] \mathbf{Q}_{\bar{\underline{X}}}^{-1}$$
$$\mathbf{Q}_{\tilde{\underline{Z}}} = \mathbf{Q}_{\underline{Y}} - \tilde{\mathbf{H}}\, \mathbf{Q}_{\bar{\underline{X}}}\, \tilde{\mathbf{H}}^\dagger.$$
The expectation is with respect to the actual channel with joint distribution/density p ( a , y ̲ ) .
Proof. 
See Appendix D. □
Remark 28.
Since X ¯ ̲ is a function of A, the rate (112) can alternatively be derived by using I ( A ; Y ̲ ) I ( X ¯ ̲ ; Y ̲ ) and applying the bound (30) with X ̲ replaced with X ¯ ̲ .
Remark 29.
The estimate H ˜ is the MMSE estimate of H :
$$\tilde{\mathbf{H}} = \arg\min_{\mathbf{H}}\ \mathrm{E}\left[\left\|\underline{Y} - \mathbf{H}\bar{\underline{X}}\right\|^2\right]$$
and Q Z ̲ ˜ is the resulting covariance matrix of the error. To see this, expand (cf. (54))
$$\mathrm{E}\left[\left\|\underline{Y} - \mathbf{H}\bar{\underline{X}}\right\|^2\right] = \mathrm{E}\left[\left\|\left(\underline{Y} - \tilde{\mathbf{H}}\bar{\underline{X}}\right) + \left(\tilde{\mathbf{H}} - \mathbf{H}\right)\bar{\underline{X}}\right\|^2\right] = \mathrm{E}\left[\left\|\underline{Y} - \tilde{\mathbf{H}}\bar{\underline{X}}\right\|^2\right] + \mathrm{tr}\left(\left(\tilde{\mathbf{H}} - \mathbf{H}\right) \mathbf{Q}_{\bar{\underline{X}}} \left(\tilde{\mathbf{H}} - \mathbf{H}\right)^\dagger\right)$$
where the final step follows by the definition of H ˜ in (113).
Remark 30.
Suppose that $\mathbf{H}$ is an estimate other than (115). Generalizing (55), if $\mathbf{Q}_{\underline{Y}} \succ \mathbf{Q}_{\bar{\underline{Z}}}$ we may choose
$$\mathbf{Q}_{\underline{Z}}/s = \left(\mathbf{H}\, \mathbf{Q}_{\bar{\underline{X}}}\, \mathbf{H}^\dagger\right)^{1/2} \left(\mathbf{Q}_{\underline{Y}} - \mathbf{Q}_{\bar{\underline{Z}}}\right)^{-1/2} \mathbf{Q}_{\bar{\underline{Z}}} \left(\mathbf{Q}_{\underline{Y}} - \mathbf{Q}_{\bar{\underline{Z}}}\right)^{-1/2} \left(\mathbf{H}\, \mathbf{Q}_{\bar{\underline{X}}}\, \mathbf{H}^\dagger\right)^{1/2}$$
where
$$\mathbf{Q}_{\bar{\underline{Z}}} = \mathrm{E}\left[\left(\underline{Y} - \mathbf{H}\bar{\underline{X}}\right)\left(\underline{Y} - \mathbf{H}\bar{\underline{X}}\right)^\dagger\right].$$
Appendix D shows that (102) then simplifies to (cf. (56))
$$I_s(A; \underline{Y}) = \log \det\left(\mathbf{Q}_{\bar{\underline{Z}}}^{-1}\, \mathbf{Q}_{\underline{Y}}\right).$$
Remark 31.
The GMI (112) does not depend on the scaling of $\bar{\underline{X}}$ since this is absorbed in $\tilde{\mathbf{H}}$. For example, one can choose the weighting matrices in (109) so that $\mathrm{E}\left[\|\bar{\underline{X}}\|^2\right] = P$.

4.5. Optimal Codebooks for CSCG Forward Models

The following Lemma maximizes the GMI for scalar channels and A with CSCG entries without requiring A to have the form (89). Nevertheless, this form is optimal, and we refer to ([10], page 2013) and Section 6.4 for similar results. In the following, let U ( s T ) CN ( 0 , 1 ) for all s T .
Lemma 3.
The maximum GMI (102) for the channel p ( y | a ) , an adaptive symbol A with jointly CSCG entries, the forward model (111), and with fixed P ( s T ) = E | X ( s T ) | 2 is
$$I_1(A;Y) = \log\left(1 + \frac{\tilde{P}}{\mathrm{E}[|Y|^2] - \tilde{P}}\right)$$
where, writing $X(s_T) = \sqrt{P(s_T)}\, U(s_T)$ for all $s_T$, we have
$$\tilde{P} = \left(\mathrm{E}\left[\left|\mathrm{E}\left[Y\, U(S_T)^* \,\middle|\, S_T\right]\right|\right]\right)^2.$$
This GMI is achieved by choosing fully-correlated symbols:
$$X(s_T) = \sqrt{P(s_T)}\, e^{j\phi(s_T)}\, U$$
and X ¯ = c U for some non-zero constant c and a common U CN ( 0 , 1 ) , and where
$$\phi(s_T) = -\arg\left(\mathrm{E}\left[Y\, U(s_T)^* \,\middle|\, S_T = s_T\right]\right).$$
Proof. 
See Appendix E. □
Remark 32.
The expression (121) is based on (A58) in Appendix E and can alternatively be written as P ˜ = | h ˜ | 2 P ¯ where
h ˜ = E Y X ¯ * / P ¯ .
Remark 33.
The power levels P ( s T ) may be optimized, usually under a constraint such as E P ( S T ) P .
Remark 34.
By the Cauchy-Schwarz inequality, we have
$$\left(\mathrm{E}\left[\left|\mathrm{E}\left[Y\, U(S_T)^* \,\middle|\, S_T\right]\right|\right]\right)^2 \le \mathrm{E}[|Y|^2].$$
Furthermore, equality holds if and only if | Y U ( s T ) * | is a constant for each s T , but this case is not interesting.

4.6. Forward Model GMI for MIMO Channels

The following lemma generalizes Lemma 3 to MIMO channels without claiming a closed-form expression for the optimal GMI. Let U ̲ ( s T ) CN ( 0 ̲ , I ) for all s T .
Lemma 4.
A GMI (102) for the channel p ( y ̲ | a ) , an adaptive vector A with jointly CSCG entries, the auxiliary model (111), and with fixed Q X ̲ ( s T ) is given by (112) that we write as
$$I_1(A; \underline{Y}) = \log \frac{\det \mathbf{Q}_{\underline{Y}}}{\det\left(\mathbf{Q}_{\underline{Y}} - \tilde{\mathbf{D}}\tilde{\mathbf{D}}^\dagger\right)}.$$
where for M × M unitary V R ( s T ) we have
$$\tilde{\mathbf{D}} = \mathrm{E}\left[\mathbf{U}_T(S_T)\, \boldsymbol{\Sigma}(S_T)\, \mathbf{V}_R(S_T)\right]$$
and U T ( s T ) and Σ ( s T ) are N × N unitary and N × M rectangular diagonal matrices, respectively, of the SVD
$$\mathrm{E}\left[\underline{Y}\, \underline{U}(s_T)^\dagger \,\middle|\, S_T = s_T\right] = \mathbf{U}_T(s_T)\, \boldsymbol{\Sigma}(s_T)\, \mathbf{V}_T(s_T)^\dagger$$
for all s T , and the V T ( s T ) are M × M unitary matrices. The GMI (124) is achieved by choosing the symbols (cf. (122) and (A87) below):
$$\underline{X}(s_T) = \mathbf{Q}_{\underline{X}(s_T)}^{1/2}\, \mathbf{V}_T(s_T)\, \underline{U}$$
and X ¯ ̲ = C U ̲ for some invertible M × M matrix C and a common M-dimensional vector U ̲ CN ( 0 ̲ , I ) . One may maximize (124) over the unitary V R ( s T ) .
Proof. 
See Appendix G. □
Using Lemma 4, the theory for MISO channels with N = 1 is similar to the scalar case of Lemma 3; see Remark 35 below. However, optimizing the GMI is more difficult for N > 1 because one must optimize over the unitary matrices V R ( s T ) in (125); see Remark 36 below.
Remark 35.
Consider N = 1 in which case one may set U T ( s T ) = 1 and (126) is a 1 × M vector where Σ ( s T ) has as the only non-zero singular value
$$\sigma(s_T) = \left\|\mathrm{E}\left[Y\, \underline{U}(s_T)^\dagger \,\middle|\, S_T = s_T\right]\right\| = \left(\sum_{m=1}^{M} \left|\mathrm{E}\left[Y\, U_m(s_T)^* \,\middle|\, S_T = s_T\right]\right|^2\right)^{1/2}.$$
The absolute value of the scalar (125) is maximized by choosing V R ( s T ) = I for all s T to obtain (cf. (121))
$$\tilde{\mathbf{D}}\tilde{\mathbf{D}}^\dagger = \left(\mathrm{E}\left[\sigma(S_T)\right]\right)^2.$$
Remark 36.
Consider $M = 1$ in which case one may set $\mathbf{V}_T(s_T) = 1$ and (126) is an $N \times 1$ vector where $\boldsymbol{\Sigma}(s_T)$ has as the only non-zero singular value
$$\sigma(s_T) = \left\|\mathrm{E}\left[\underline{Y}\, U(s_T)^* \,\middle|\, S_T = s_T\right]\right\| = \left(\sum_{n=1}^{N} \left|\mathrm{E}\left[Y_n\, U(s_T)^* \,\middle|\, S_T = s_T\right]\right|^2\right)^{1/2}.$$
We should now find the V R ( s T ) = e j ϕ R ( s T ) that minimize the determinant in the denominator of (124) where (see (125))
$$\tilde{\mathbf{D}} = \mathrm{E}\left[\underline{u}_T(S_T)\, \sigma(S_T)\, e^{j\phi_R(S_T)}\right]$$
and where each u ̲ T ( s T ) is one of the columns of the N × N unitary matrix U T ( s T ) .
Remark 37.
Consider M = N and the product channel
$$p(\underline{y}|a) = \prod_{m=1}^{M} p\left(y_m \,\middle|\, [x_m(s_T) : s_T \in \mathcal{S}_T]\right)$$
where x m ( s T ) is the m’th entry of x ̲ ( s T ) . We choose Q X ̲ ( s T ) as diagonal with diagonal entries P m ( s T ) , m = 1 , , M . Also choosing V R ( s T ) = I makes the matrix D ˜ D ˜ diagonal with the diagonal entries (cf. (121) where M = N = 1 )
$$\left(\sum_{s_T} P_{S_T}(s_T) \left|\mathrm{E}\left[Y_m\, U_m(s_T)^* \,\middle|\, S_T = s_T\right]\right|\right)^2$$
for m = 1 , , M . The GMI (124) is thus (cf. (120))
$$I_1(A; \underline{Y}) = \sum_{m=1}^{M} \log \frac{\mathrm{E}[|Y_m|^2]}{\mathrm{E}[|Y_m|^2] - \left(\mathrm{E}\left[\left|\mathrm{E}\left[Y_m\, U_m(S_T)^* \,\middle|\, S_T\right]\right|\right]\right)^2}.$$
Remark 38.
For general p ( y ̲ | a ) , one might wish to choose diagonal Q X ̲ ( s T ) and a product model
$$q(\underline{y}|a) = \prod_{m=1}^{M} q_m\left(y_m \,\middle|\, \bar{x}_m\right)$$
where the $q_m(\cdot)$ are scalar AWGN channels
$$q_m(y|x) = \frac{1}{\pi \sigma_m^2} \exp\left(-|y - h_m x|^2/\sigma_m^2\right)$$
with possibly different $h_m$ and $\sigma_m^2$ for each $m$. Consider also
$$\bar{X}_m = \sum_{s_T} w_m(s_T)\, X_m(s_T)$$
for some complex weights w m ( s T ) , i.e., X ¯ m is a weighted sum of entries from the list [ X m ( s T ) : s T S T ] . The maximum GMI is now the same as (134) but without requiring the actual channel to have the form (132).
Remark 39.
If the actual channel is Y ̲ = H X ̲ + Z ̲ then
$$\mathrm{E}\left[\underline{Y}\, \underline{U}(s_T)^\dagger \,\middle|\, S_T = s_T\right] = \mathrm{E}\left[\mathbf{H}\, \underline{X}(s_T)\, \underline{U}(s_T)^\dagger \,\middle|\, S_T = s_T\right] = \mathrm{E}\left[\mathbf{H} \,\middle|\, S_T = s_T\right] \mathbf{Q}_{\underline{X}(s_T)}^{1/2}$$
where the final step follows because $\underline{U}(S_T) - S_T - \mathbf{H}$ forms a Markov chain. The expression (135) is useful because it separates the effects of the channel and the transmitter.
Remark 40.
Combining Remarks 37 and 39, suppose the actual channel is Y ̲ = H X ̲ + Z ̲ with M = N and where H is diagonal with diagonal entries H m , m = 1 , , M . The GMI (124) is then (cf. (134))
$$I_1(A; \underline{Y}) = \sum_{m=1}^{M} \log \frac{\mathrm{E}[|Y_m|^2]}{\mathrm{E}[|Y_m|^2] - \left(\mathrm{E}\left[\left|\mathrm{E}\left[H_m \sqrt{P_m(S_T)} \,\middle|\, S_T\right]\right|\right]\right)^2}$$
where $\mathrm{E}[|Y_m|^2] = 1 + \mathrm{E}\left[|H_m|^2\, P_m(S_T)\right]$.

5. Channels with CSIR and CSIT

Shannon’s model includes CSIR [11]. The FDG is shown in Figure 3 where there is a hidden state S H , the CSIR S R and CSIT S T are functions of S H , and the receiver sees the channel outputs
$$[\, Y_i,\, S_{R,i} \,] = [\, f(X_i, S_{H,i}, Z_i),\, S_{R,i} \,]$$
for some function $f(\cdot)$ and $i = 1, 2, \dots, n$. (By defining $S_H = [S_{H1}, Z_H]$ and calling $S_{H1}$ the hidden channel state, we can include the case where $S_R$ and $S_T$ are noisy functions of $S_{H1}$.) As before, $M$, $S_H^n$, $Z^n$ are mutually statistically independent, and $S_H^n$ and $Z^n$ are i.i.d. strings of random variables with the same distributions as $S_H$ and $Z$, respectively. Observe that we have changed the notation by writing $Y$ for only part of the channel output. The new $Y$ (without the $S_R$) is usually called the "channel output".

5.1. Capacity and GMI

We begin with scalar channels for which (90) is
$$C = \max_{A}\, I(A;\, Y, S_R) = \max_{A}\, I(A;\, Y \,|\, S_R)$$
where A and S R are independent.
Reverse Model: The expression (108) with the adaptive symbol (88) is
$$I_1(A;\, Y, S_R) = \mathrm{E}\big[\, -\log \mathrm{Var}[\, U \,|\, Y, S_R \,] \,\big].$$
Forward Model: Consider the expansion
$$I_1(A;\, Y \,|\, S_R) = \int_{\mathcal{S}_R} p(s_R)\, I_1(A;\, Y \,|\, S_R = s_R)\, \mathrm{d}s_R$$
where $I_1(A;\, Y \,|\, S_R = s_R)$ is the GMI (102) with all densities conditioned on $S_R = s_R$. We choose the forward model
$$q(y\,|\,a, s_R) = \frac{1}{\pi\,\sigma(s_R)^2}\, \exp\!\left( -\frac{|\, y - h(s_R)\,\bar{x}(s_R) \,|^2}{\sigma(s_R)^2} \right)$$
where, similar to (109), we define
$$\bar{X}(s_R) = \sum_{s_T} w(s_T, s_R)\, X(s_T)$$
for complex weights $w(s_T, s_R)$, i.e., $\bar{X}(s_R)$ is a weighted sum of entries from the list $A = [\, X(s_T) : s_T \in \mathcal{S}_T \,]$. We have the following straightforward generalization of Lemma 3.
Theorem 1.
The maximum GMI (140) for the channel $p(y\,|\,a, s_R)$, an adaptive symbol $A$ with jointly CSCG entries, the model (141), and fixed $P(s_T) = \mathrm{E}[\,|X(s_T)|^2\,]$ is
$$I_1(A;\, Y \,|\, S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{P}(S_R)}{\mathrm{E}[\,|Y|^2 \,|\, S_R\,] - \tilde{P}(S_R)} \right) \right]$$
where for all s R S R we have
P ˜ ( s R ) = E E Y U ( S T ) * S T , S R = s R 2 .
Remark 41.
To establish Theorem 1, the receiver may choose $\bar{X} = \sqrt{P}\, U$ to be independent of $s_R$. Alternatively, the receiver may choose $\bar{X}(s_R) = \sqrt{\mathrm{E}[\,|X|^2 \,|\, S_R = s_R\,]}\; U$. Both choices give the same GMI since the expectation in (144) does not depend on the scaling of $\bar{X}$; see Remark 31.
Remark 42.
The partition idea of Lemmas 1 and 5 carries over to Theorem 1. We may generalize (143) as
$$I_1(A;\, Y \,|\, S_R) = \int_{\mathcal{S}_R} p(s_R) \sum_{k=1}^{K} \Pr\big[\mathcal{E}_k \,\big|\, S_R = s_R\big] \left( \log\!\left( 1 + \frac{|h_k(s_R)|^2 P}{\sigma_k^2(s_R)} \right) + \frac{\mathrm{E}\big[\,|Y|^2 \,\big|\, \mathcal{E}_k, S_R = s_R\,\big]}{\sigma_k^2(s_R) + |h_k(s_R)|^2 P} - \frac{\mathrm{E}\big[\,|Y - h_k(s_R)\sqrt{P}\, U|^2 \,\big|\, \mathcal{E}_k, S_R = s_R\,\big]}{\sigma_k^2(s_R)} \right) \mathrm{d}s_R$$
where the X ( s T ) , s T S T , are given by (122) and the h k ( s R ) and σ k 2 ( s R ) , k = 1 , , K , s R S R , can be optimized.
Remark 43.
One is usually interested in the optimal power control policy $P(s_T)$ under the constraint $\mathrm{E}[\,P(S_T)\,] \le P$. Taking the derivative of (143) with respect to $\sqrt{P(s_T)}$ and setting it to zero, we obtain
$$\mathrm{E}\!\left[ \frac{\mathrm{E}[\,|Y|^2\,|\,S_R\,]\; \tilde{P}'(S_R) - \tilde{P}(S_R)\; \mathrm{E}[\,|Y|^2\,|\,S_R\,]'}{\mathrm{E}[\,|Y|^2\,|\,S_R\,]\,\big( \mathrm{E}[\,|Y|^2\,|\,S_R\,] - \tilde{P}(S_R) \big)} \right] = 2\lambda\, \sqrt{P(s_T)}\; P_{S_T}(s_T)$$
where $\tilde{P}'(S_R)$ and $\mathrm{E}[\,|Y|^2\,|\,S_R\,]'$ denote derivatives with respect to $\sqrt{P(s_T)}$. We use (146) below to derive power control policies.
Remark 44.
A related model is a compound channel where p ( y | a , s R ) is indexed by the parameter s R ([91], Chapter 4). The problem is to find the maximum worst-case reliable rate if the transmitter does not know s R . Alternatively, the transmitter must send its message to all | S R | receivers indexed by s R S R . A compound channel may thus be interpreted as a broadcast channel with a common message.

5.2. CSIT@R

An interesting specialization of Shannon’s model is when the receiver knows S T and can determine X ( S T ) . We refer to this scenario as CSIT@R. The model was considered in ([10], Section II) when S T is a function of S R . More generally, suppose S T is a function of [ Y , S R ] . The capacity (138) then simplifies to (see ([10], Proposition 1))
$$C \stackrel{(a)}{=} \max_{A}\, I(A;\, Y, S_T \,|\, S_R) \stackrel{(b)}{=} \max_{A}\, I(X;\, Y \,|\, S_R, S_T) \stackrel{(c)}{=} \sum_{s_T} P_{S_T}(s_T)\, \max_{X(s_T)}\, I\big(X(s_T);\, Y \,\big|\, S_R, S_T = s_T\big)$$
where step $(a)$ follows because $S_T$ is a function of $[Y, S_R]$; step $(b)$ follows because $A$ and $(S_R, S_T)$ are independent, $X$ is a function of $[A, S_T]$, and $A - [S_T, X] - Y$ forms a Markov chain; and step $(c)$ follows because one may optimize $X(s_T)$ separately for each $s_T \in \mathcal{S}_T$.
As discussed in [10], a practical motivation for this model is when the CSIT is based on error-free feedback from the receiver to the transmitter. In this case, where S T is a function of S R , the expression (144) becomes
$$\tilde{P}(s_R) = \big|\, \mathrm{E}\big[\, Y\, U(s_T)^* \,\big|\, S_R = s_R \,\big] \,\big|^2.$$
Remark 45.
The insight that one can replace adaptive symbols A with channel inputs X when X is a function of A and past Y appeared for two-way channels in ([9], Section 4.2.3) and networks in ([22], Section V.A), ([72], Section IV.F).

5.3. MIMO Channels and K-Partitions

We consider generalizations to MIMO channels and K-partitions as in Section 3.2.
MIMO Channels: Consider the average GMI
$$I_1(A;\, \underline{Y} \,|\, S_R) = \int_{\mathcal{S}_R} p(s_R)\, I_1(A;\, \underline{Y} \,|\, S_R = s_R)\, \mathrm{d}s_R$$
and choose the parameters (113) and (114) for the event $S_R = s_R$. We have
$$\tilde{H}(s_R) = \mathrm{E}\big[\, \underline{Y}\, \bar{\underline{X}}^\dagger \,\big|\, S_R = s_R \,\big]\; \mathrm{E}\big[\, \bar{\underline{X}}\, \bar{\underline{X}}^\dagger \,\big|\, S_R = s_R \,\big]^{-1}$$
$$Q_{\tilde{\underline{Z}}}(s_R) = \mathrm{E}\big[\, \underline{Y}\, \underline{Y}^\dagger \,\big|\, S_R = s_R \,\big] - \tilde{H}(s_R)\; \mathrm{E}\big[\, \bar{\underline{X}}\, \bar{\underline{X}}^\dagger \,\big|\, S_R = s_R \,\big]\; \tilde{H}(s_R)^\dagger$$
and the GMI (149) is (cf. (60) and (112))
$$I_1(A;\, \underline{Y} \,|\, S_R) = \mathrm{E}\Big[ \log\det\Big( \mathrm{I} + Q_{\tilde{\underline{Z}}}(S_R)^{-1}\, \tilde{H}(S_R)\, Q_{\bar{\underline{X}}}\, \tilde{H}(S_R)^\dagger \Big) \Big].$$
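The conditional moments in (150) and (151) can be estimated directly from channel samples. The following Python sketch (an illustration, not the paper's code) does this for a hypothetical 2x2 channel whose CSIR $S_R$ takes two values and selects the conditional mean of the channel matrix; the input is CSCG with $Q_{\bar{\underline{X}}} = (P/M)\,\mathrm{I}$ and $\bar{\underline{X}} = \underline{X}$ (no CSIT), and (152) is evaluated by averaging over the two CSIR values.

```python
import numpy as np

rng = np.random.default_rng(0)
M = N = 2                     # transmit / receive dimensions (illustrative)
P = 4.0                       # total power, split equally: Q_Xbar = (P/M) I
n = 100_000                   # Monte Carlo samples

def crandn(*shape):           # CN(0,1) samples
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Hypothetical two-state CSIR: S_R selects the conditional mean of H;
# the receiver does not know the residual perturbation.
H_mean = np.array([[[1.0, 0.2], [0.1, 0.8]], [[0.3, 0.0], [0.0, 1.2]]])
s_R = rng.integers(0, 2, size=n)
H = H_mean[s_R] + 0.3 * crandn(n, N, M)

X = np.sqrt(P / M) * crandn(n, M)                       # CSCG input, no CSIT
Y = np.einsum('kij,kj->ki', H, X) + crandn(n, N)        # Y = H X + Z

def corr(A, B):               # empirical E[A B^dagger]
    return np.einsum('ki,kj->ij', A, B.conj()) / A.shape[0]

Q_Xbar = (P / M) * np.eye(M)
gmi = 0.0
for s in (0, 1):
    idx = (s_R == s)
    Ys, Xs = Y[idx], X[idx]
    Htil = corr(Ys, Xs) @ np.linalg.inv(corr(Xs, Xs))           # cf. (150)
    Qz = corr(Ys, Ys) - Htil @ corr(Xs, Xs) @ Htil.conj().T     # cf. (151)
    arg = np.eye(N) + np.linalg.inv(Qz) @ Htil @ Q_Xbar @ Htil.conj().T
    gmi += idx.mean() * np.linalg.slogdet(arg)[1]               # cf. (152), nats
print(f"forward-model GMI: {gmi:.3f} nats/use")
```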
K-Partitions: Let { Y ̲ k : k = 1 , , K } be a K-partition of Y ̲ and define the events E k = { Y ̲ Y ̲ k } for k = 1 , , K . As in Remark 13, K-partitioning formally includes (149) as a special case by including S R as part of the receiver’s “overall” channel output Y ˜ ̲ = [ Y ̲ , S R ] . The following lemma generalizes Lemma 1.
Lemma 5.
A GMI with s = 1 for the channel p ( y ̲ | a ) is
$$I_1(A;\, \underline{Y}) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left( \log\det\big( \mathrm{I} + Q_{\underline{Z}_k}^{-1} H_k Q_{\bar{\underline{X}}} H_k^\dagger \big) + \mathrm{E}\Big[\, \underline{Y}^\dagger \big( Q_{\underline{Z}_k} + H_k Q_{\bar{\underline{X}}} H_k^\dagger \big)^{-1} \underline{Y} \,\Big|\, \mathcal{E}_k \Big] - \mathrm{E}\Big[\, \big(\underline{Y} - H_k \bar{\underline{X}}\big)^\dagger Q_{\underline{Z}_k}^{-1} \big(\underline{Y} - H_k \bar{\underline{X}}\big) \,\Big|\, \mathcal{E}_k \Big] \right)$$
where the H k and Q Z ̲ k , k = 1 , , K , can be optimized.
Remark 46.
For scalars the GMI (153) is
$$I_1(A;\, Y) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left( \log\!\left( 1 + \frac{|h_k|^2 \bar{P}}{\sigma_k^2} \right) + \frac{\mathrm{E}\big[\,|Y|^2 \,\big|\, \mathcal{E}_k\,\big]}{\sigma_k^2 + |h_k|^2 \bar{P}} - \frac{\mathrm{E}\big[\,|Y - h_k \bar{X}|^2 \,\big|\, \mathcal{E}_k\,\big]}{\sigma_k^2} \right)$$
which is the same as (64) except that $\bar{X}, \bar{P}$ replace $X, P$. If we follow (66) and (67) then (154) becomes (68), but with
$$h_k = \mathrm{E}\big[\, Y \bar{X}^* \,\big|\, \mathcal{E}_k \,\big] \big/ P_k, \qquad P_k = \mathrm{E}\big[\, |\bar{X}|^2 \,\big|\, \mathcal{E}_k \,\big].$$
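As a numerical illustration of (154) with the LMMSE choices (155) (a sketch under assumed parameters, not a result from the paper), the following Monte Carlo estimate uses on-off fading with no CSIR and no CSIT, $\bar{X} = X$, and a two-set partition of the output by the magnitude threshold $t = \sqrt{1+P}$; the $K=1$ GMI is printed for comparison (cf. (174) in Section 6.2).

```python
import numpy as np

rng = np.random.default_rng(1)
n, P = 500_000, 10.0
H = np.sqrt(2.0) * rng.integers(0, 2, n)          # on-off fading, P_H(0)=P_H(sqrt(2))=1/2
X = np.sqrt(P / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
Z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Y = H * X + Z

Pbar = P                                          # E|Xbar|^2 with Xbar = X
t = np.sqrt(1.0 + P)                              # magnitude threshold (illustrative)
events = [np.abs(Y) <= t, np.abs(Y) > t]          # 2-partition of the output

gmi = 0.0
for Ek in events:
    pk = Ek.mean()
    Pk = np.mean(np.abs(X[Ek]) ** 2)              # E[|Xbar|^2 | E_k]
    hk = np.mean(Y[Ek] * X[Ek].conj()) / Pk       # cf. (155)
    sk2 = np.mean(np.abs(Y[Ek]) ** 2) - np.abs(hk) ** 2 * Pk   # LMMSE noise variance
    term = (np.log(1 + np.abs(hk) ** 2 * Pbar / sk2)
            + np.mean(np.abs(Y[Ek]) ** 2) / (sk2 + np.abs(hk) ** 2 * Pbar)
            - np.mean(np.abs(Y[Ek] - hk * X[Ek]) ** 2) / sk2)  # cf. (154)
    gmi += pk * term

print(f"2-partition GMI : {gmi:.3f} nats/use")
muH2, varH = 0.5, 0.5                             # |E[H]|^2 and Var[H] for on-off fading
print(f"1-partition GMI : {np.log(1 + P * muH2 / (1 + P * varH)):.3f} nats/use")
```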
Remark 47.
Consider Remark 14 and choose K = 2 , h 1 = 0 , σ 1 2 = 1 . The GMI (154) then has only the k = 2 term, and it again remains to select h 2 , σ 2 2 , and t R .
Remark 48.
If we define
$$Q_{\bar{\underline{X}}}^{(k)} = \mathrm{E}\big[\, \bar{\underline{X}}\, \bar{\underline{X}}^\dagger \,\big|\, \mathcal{E}_k \,\big], \qquad Q_{\underline{Y}}^{(k)} = \mathrm{E}\big[\, \underline{Y}\, \underline{Y}^\dagger \,\big|\, \mathcal{E}_k \,\big]$$
and choose the LMMSE auxiliary models with
$$H_k = \mathrm{E}\big[\, \underline{Y}\, \bar{\underline{X}}^\dagger \,\big|\, \mathcal{E}_k \,\big]\; \big( Q_{\bar{\underline{X}}}^{(k)} \big)^{-1}$$
$$Q_{\underline{Z}_k} = Q_{\underline{Y}}^{(k)} - H_k\, Q_{\bar{\underline{X}}}^{(k)}\, H_k^\dagger$$
for $k = 1, \dots, K$, then the expression (153) is (cf. (68))
$$I_1(A;\, \underline{Y}) = \sum_{k=1}^{K} \Pr[\mathcal{E}_k] \left( \log\det\big( \mathrm{I} + Q_{\underline{Z}_k}^{-1} H_k Q_{\bar{\underline{X}}} H_k^\dagger \big) - \mathrm{tr}\Big[ \big( Q_{\underline{Y}}^{(k)} + H_k D_{\bar{\underline{X}}}^{(k)} H_k^\dagger \big)^{-1} H_k D_{\bar{\underline{X}}}^{(k)} H_k^\dagger \Big] \right)$$
where $D_{\bar{\underline{X}}}^{(k)} = Q_{\bar{\underline{X}}} - Q_{\bar{\underline{X}}}^{(k)}$.
Remark 49.
We may proceed as in Remark 18 and consider large K. These steps are given in Appendix F.

6. Fading Channels with AWGN

This section treats scalar, complex-alphabet, AWGN channels with CSIR for which the channel output is
$$[\, Y, S_R \,] = [\, H X + Z,\, S_R \,]$$
where $H, A, Z$ are mutually independent, $\mathrm{E}[\,|H|^2\,] = 1$, and $Z \sim \mathcal{CN}(0, 1)$. The capacity under the power constraint $\mathrm{E}[\,|X|^2\,] \le P$ is (cf. (138))
$$C(P) = \max_{A:\, \mathrm{E}[|X|^2] \le P}\, I(A;\, Y \,|\, S_R).$$
However, the optimization in (160) is often intractable, and we desire expressions with log ( 1 + SNR ) terms to gain insight. We develop three such expressions: an upper bound and two lower bounds. It will be convenient to write G = | H | 2 .
Capacity Upper Bound: We state this bound as a lemma since we use it to prove Proposition 2 below.
Lemma 6.
The capacity (160) is upper bounded as
$$C(P) \le \max\, \mathrm{E}\big[\, \log\big( 1 + G\, P(S_T) \big) \,\big]$$
where the maximization is over $P(S_T)$ with $\mathrm{E}[\,P(S_T)\,] = P$.
Proof. 
Consider the steps
$$I(A;\, Y \,|\, S_R) \le I(A;\, Y, S_T, H \,|\, S_R) \stackrel{(a)}{=} I(A;\, Y \,|\, S_R, S_T, H) = h(Y \,|\, S_R, S_T, H) - h(Z) \stackrel{(b)}{\le} \mathrm{E}\big[\, \log \mathrm{Var}[\, Y \,|\, S_R, S_T, H \,] \,\big]$$
where step $(a)$ holds because $A$ and $[S_R, S_T, H]$ are independent, and step $(b)$ follows from the entropy bound
$$h(Y \,|\, B = b) \le \log\big( \pi e\, \mathrm{Var}[\, Y \,|\, B = b \,] \big)$$
applied with $B = [S_R, S_T, H]$. Finally, we compute $\mathrm{Var}[\, Y \,|\, S_R, S_T, H \,] = 1 + G\, P(S_T)$. □
Reverse Model GMI: Consider the adaptive symbol (88) and the GMI (139). We expand the variances in (139) as
$$\mathrm{Var}[\, U \,|\, Y = y, S_R = s_R \,] = \mathrm{E}[\, |U|^2 \,|\, Y = y, S_R = s_R \,] - \big|\, \mathrm{E}[\, U \,|\, Y = y, S_R = s_R \,] \,\big|^2.$$
Appendix C shows that one may write
$$\mathrm{E}[\, U \,|\, Y = y, S_R = s_R \,] = \int_{\mathbb{C}\times\mathcal{S}_T} p(h, s_T \,|\, y, s_R)\; \frac{h^* \sqrt{P(s_T)}\, e^{-j\phi(s_T)}\, y}{1 + |h|^2 P(s_T)}\; \mathrm{d}s_T\, \mathrm{d}h$$
and
$$\mathrm{E}[\, |U|^2 \,|\, Y = y, S_R = s_R \,] = \int_{\mathbb{C}\times\mathcal{S}_T} p(h, s_T \,|\, y, s_R) \left( \frac{1}{1 + |h|^2 P(s_T)} + \frac{|h|^2 P(s_T)\, |y|^2}{\big( 1 + |h|^2 P(s_T) \big)^2} \right) \mathrm{d}s_T\, \mathrm{d}h.$$
We use the expressions (164) and (165) to compute achievable rates by numerical integration. For example, suppose that S T = 0 and S R = H , i.e., we have full CSIR and no CSIT. The averaging density is then
$$p(h, s_T \,|\, y, s_R) = \delta(h - s_R)\, \delta(s_T)$$
and the variance simplifies to the capacity-achieving form
$$\mathrm{Var}[\, U \,|\, Y = y, S_R = h \,] = \frac{1}{1 + |h|^2 P}.$$
Forward Model GMI: A forward model GMI is given by Theorem 1 where
P ˜ ( s R ) = E E H P ( S T ) S T , S R = s R 2
$$\mathrm{E}[\, |Y|^2 \,|\, S_R = s_R \,] = 1 + \mathrm{E}[\, G\, P(S_T) \,|\, S_R = s_R \,]$$
so that (143) becomes
$$I_1(A;\, Y \,|\, S_R) = \mathrm{E}\!\left[ \log\!\left( 1 + \frac{\tilde{P}(S_R)}{1 + \mathrm{E}[\, G\, P(S_T) \,|\, S_R \,] - \tilde{P}(S_R)} \right) \right].$$
Remark 50.
Jensen’s inequality implies that the denominator in (168) is greater than or equal to
1 + Var G P ( S T ) S R .
Equality requires that for all S R = s R we have
P ˜ ( s R ) = E G P ( S T ) S R = s R 2
which is valid if H is a function of [ S R , S T ] , for example. However, if there is channel uncertainty after conditioning on [ S R , S T ] then P ˜ ( s R ) is usually smaller than the RHS of (170).
Remark 51.
Consider $S_R = H$ or $S_R = H\sqrt{P(S_T)}$. For both cases, $H$ is a function of $[S_R, S_T]$ and the denominator in (168) is the variance (169). In fact, for $S_R = H\sqrt{P(S_T)}$, the expression (169) takes on the minimal value 1. This CSIR is thus the best possible; see Proposition 2.
Remark 52.
For MIMO channels we replace (159) with
$$[\, \underline{Y}, S_R \,] = [\, \mathsf{H}\, \underline{X} + \underline{Z},\, S_R \,]$$
where $\mathsf{H}, A, \underline{Z}$ are mutually independent and $\underline{Z} \sim \mathcal{CN}(\underline{0}, \mathrm{I})$. One usually considers the constraint $\mathrm{E}[\, \|\underline{X}\|^2 \,] \le P$.
Remark 53.
The model (171) includes block fading. For example, choosing M = N and H = H I gives scalar block fading. Moreover, the capacity per symbol without in-block feedback is the same as for the M = N = 1 case except that P is replaced with P / M ; see [11] and Section 9.

6.1. CSIR and CSIT Models

We study two classes of CSIR, as shown in Table 1. The first class has full (or "perfect") CSIR, by which we mean either $S_R = H$ or $S_R = H\sqrt{P(S_T)}$. The motivation for studying the latter case is that it models block fading channels with long blocks, where the receiver estimates $H\sqrt{P(S_T)}$ using pilot symbols and the number of pilot symbols is much smaller than the block length [10]. Moreover, this CSIR achieves the upper bound (161); see Proposition 2 below.
We coarsely categorize the CSIT as follows:
  • Full CSIT: S T = H ;
  • CSIT@R: $S_T = q_u(G)$ where $q_u(\cdot)$ is the quantizer of Section 2.9 with $B = 0, 1, \dots$;
  • Partial CSIT: S T is not known exactly at the receiver.
The capacity of the CSIT@R models is given by log ( 1 + SNR ) expressions [10,92]; see also [93]. The partial CSIT model is interesting because achieving capacity generally requires adaptive codewords and closed-form capacity expressions are unavailable. The GMI lower bound of Theorem 1 and Remark 42 and the capacity upper bound of Lemma 6 serve as benchmarks.
The partial CSIR models have S R being a lossy function of H. For example, a common model is based on LMMSE channel estimation with
$$H = \bar{\epsilon}\, S_R + \epsilon\, Z_R$$
where $0 \le \epsilon \le 1$ and $S_R, Z_R$ are uncorrelated. The CSIT is categorized as above, except that we consider $S_T = f_T(S_R)$ for some function $f_T(\cdot)$ rather than $S_T = q_u(G)$.
To illustrate the theory, we study two types of fading: one with discrete H and one with continuous H, namely
  • Section 7: on-off fading with $P_H(0) = P_H(\sqrt{2}) = 1/2$;
  • Section 8: Rayleigh fading with $H \sim \mathcal{CN}(0, 1)$.
For on-off fading we have $p(g) = \frac{1}{2}\delta(g) + \frac{1}{2}\delta(g - 2)$, and for Rayleigh fading we have $p(g) = e^{-g}\cdot 1(g \ge 0)$.
Remark 54.
For channels with partial CSIR, we will study the GMI for partitions with K = 1 and K = 2 . The full CSIT model has received relatively little attention in the literature, perhaps because CSIR is usually more accurate than CSIT ([5], Section 4.2.3).

6.2. No CSIR, No CSIT

Without CSIR or CSIT, the channel is a classic memoryless channel [94] for which the capacity (160) becomes the usual expression with $S_R = 0$ and $A = X$. For CSCG $X$ and $U = X / \sqrt{\mathrm{E}[\,|X|^2\,]}$, the reverse and forward model GMIs (139) and (168) are, respectively,
$$I_1(X;\, Y) = \mathrm{E}\big[\, -\log \mathrm{Var}[\, U \,|\, Y \,] \,\big]$$
$$I_1(X;\, Y) = \log\!\left( 1 + \frac{P\, |\mathrm{E}[H]|^2}{1 + P\, \mathrm{Var}[H]} \right).$$
For example, the forward model GMI is zero if E H = 0 .
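For on-off fading (see Section 7), the reverse-model GMI (173) can be estimated by Monte Carlo using (164) and (165): with $S_R = 0$ and $S_T = 0$ the averaging density reduces to a two-point posterior over $h \in \{0, \sqrt{2}\}$. The sketch below is an illustration (assuming a single power level $P$ and no phase adaptation), not the paper's code; it also prints the forward-model GMI (174).

```python
import numpy as np

rng = np.random.default_rng(2)
n, P = 400_000, 10.0
h_vals = np.array([0.0, np.sqrt(2.0)])            # on-off fading states
g_vals = h_vals ** 2

H = h_vals[rng.integers(0, 2, n)]
X = np.sqrt(P / 2) * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
Z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Y = H * X + Z

# Posterior p(h | y) over the two fading states (no CSIR, no CSIT)
lik = np.array([np.exp(-np.abs(Y) ** 2 / (1 + g * P)) / (np.pi * (1 + g * P))
                for g in g_vals])                 # shape (2, n)
post = 0.5 * lik
post /= post.sum(axis=0)

# Conditional mean and second moment of U = X / sqrt(P), cf. (164)-(165)
EU = sum(post[i] * (np.sqrt(P) * h_vals[i] / (1 + g_vals[i] * P)) * Y
         for i in range(2))
EU2 = sum(post[i] * (1 / (1 + g_vals[i] * P)
                     + g_vals[i] * P * np.abs(Y) ** 2 / (1 + g_vals[i] * P) ** 2)
          for i in range(2))
var_U = EU2 - np.abs(EU) ** 2

I_rev = np.mean(-np.log(var_U))                                      # cf. (173)
I_fwd = np.log(1 + P * np.abs(h_vals.mean()) ** 2 / (1 + P * 0.5))   # cf. (174)
print(f"reverse GMI {I_rev:.3f} nats, forward GMI {I_fwd:.3f} nats")
```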

6.3. Full CSIR, CSIT@R

Consider the full CSIR models with S R = H and CSIT@R. The capacity is given by log ( 1 + SNR ) expressions that we review.
First, the capacity with B = 0 (no CSIT) is
$$C(P) = \mathrm{E}\big[\, \log( 1 + G P ) \,\big] = \int_0^\infty p(g)\, \log( 1 + g P )\, \mathrm{d}g.$$
The wideband derivatives are (see (37))
$$C'(0) = \mathrm{E}[G] = 1, \qquad C''(0) = -\mathrm{E}[G^2]$$
so that the wideband values (37) are (see ([73], Theorem 13))
$$\left( \frac{E_b}{N_0} \right)_{\min} = \log 2, \qquad S = \frac{2}{\mathrm{E}[G^2]}.$$
The minimal $E_b/N_0$ is the same as without fading, namely $-1.59$ dB. However, Jensen's inequality gives $\mathrm{E}[G^2] \ge \mathrm{E}[G]^2 = 1$ with equality if and only if $G = 1$. Thus, fading reduces the capacity slope $S$.
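As a numerical illustration (not from the paper), the following evaluates (175)–(177) for Rayleigh fading, where $G$ is unit-mean exponential; it returns $(E_b/N_0)_{\min} \approx -1.59$ dB and slope $S = 1$, half the slope of the non-fading AWGN channel.

```python
import numpy as np
from scipy.integrate import quad

p_g = lambda g: np.exp(-g)                 # Rayleigh fading: G ~ Exp(1)

def C(P):                                  # capacity (175), full CSIR, no CSIT
    return quad(lambda g: p_g(g) * np.log(1 + g * P), 0, np.inf)[0]

EG  = quad(lambda g: g * p_g(g), 0, np.inf)[0]          # = 1
EG2 = quad(lambda g: g * g * p_g(g), 0, np.inf)[0]      # = 2
print("C(P) at P = 10 :", C(10.0), "nats")
print("Eb/N0_min      :", 10 * np.log10(np.log(2) / EG), "dB")   # cf. (177)
print("wideband slope :", 2 / EG2)                                # cf. (177)
```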
More generally, the capacity with full CSIR and S T = q u ( G ) is (see [10])
$$C(P) = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P}\, \mathrm{E}\big[\, \log\big( 1 + G\, P(S_T) \big) \,\big] = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P}\, \int_{\mathcal{S}_T}\!\int_0^\infty p(g, s_T)\, \log\big( 1 + g\, P(s_T) \big)\, \mathrm{d}g\, \mathrm{d}s_T.$$
To optimize the power levels $P(s_T)$, consider the Lagrangian
$$\mathrm{E}\big[\, \log\big( 1 + G\, P(S_T) \big) \,\big] + \lambda\big( P - \mathrm{E}[\, P(S_T) \,] \big)$$
where $\lambda \ge 0$ is a Lagrange multiplier. Taking the derivative with respect to $P(s_T)$, we have
$$\lambda = \mathrm{E}\!\left[ \frac{G}{1 + G\, P(s_T)} \,\middle|\, S_T = s_T \right] = \int_0^\infty p(g \,|\, s_T)\, \frac{g}{1 + g\, P(s_T)}\, \mathrm{d}g$$
as long as $P(s_T) \ge 0$. If this equation cannot be satisfied, choose $P(s_T) = 0$. Finally, set $\lambda$ so that $\mathrm{E}[\, P(S_T) \,] = P$.
For example, consider $B = \infty$ and $S_T = G$. We then have $p(g \,|\, s_T) = \delta(g - s_T)$ and therefore
$$P(g) = \left[ \frac{1}{\lambda} - \frac{1}{g} \right]^+$$
where $\lambda$ is chosen so that $\mathrm{E}[\, P(G) \,] = P$. The capacity (178) is then (see ([95], Equation (7)))
$$C(P) = \int_\lambda^\infty p(g)\, \log( g / \lambda )\, \mathrm{d}g.$$
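A water-filling sketch for Rayleigh fading (illustrative; the power budget $P = 10$ is an arbitrary choice): $\lambda$ is found by bisection so that $\mathrm{E}[P(G)] = P$, and the capacity is evaluated both via (178) and via (182) as a consistency check.

```python
import numpy as np
from scipy.integrate import quad

p_g = lambda g: np.exp(-g)                         # Rayleigh fading: G ~ Exp(1)
P = 10.0

def avg_power(lam):                                # E[(1/lam - 1/G)^+]
    return quad(lambda g: p_g(g) * (1 / lam - 1 / g), lam, np.inf)[0]

lo, hi = 1e-6, 100.0
for _ in range(60):                                # avg_power is decreasing in lam
    lam = 0.5 * (lo + hi)
    lo, hi = (lo, lam) if avg_power(lam) < P else (lam, hi)

C_178 = quad(lambda g: p_g(g) * np.log(1 + g * (1 / lam - 1 / g)), lam, np.inf)[0]
C_182 = quad(lambda g: p_g(g) * np.log(g / lam), lam, np.inf)[0]
print(f"lambda = {lam:.4f}, C = {C_178:.4f} = {C_182:.4f} nats/symbol")
```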
Consider now the quantizer q u ( . ) of Section 2.9 with B = 1 . We have two equations for λ , namely
$$\lambda = \int_0^\Delta \frac{p(g)}{P_{S_T}(\Delta/2)} \cdot \frac{g}{1 + g\, P(\Delta/2)}\, \mathrm{d}g$$
$$\lambda = \int_\Delta^\infty \frac{p(g)}{P_{S_T}(3\Delta/2)} \cdot \frac{g}{1 + g\, P(3\Delta/2)}\, \mathrm{d}g.$$
Observe the following for (183) and (184):
  • both P ( Δ / 2 ) and P ( 3 Δ / 2 ) decrease as λ increases;
  • the maximal $\lambda$ permitted by (183) is $\mathrm{E}[\, G \,|\, G \le \Delta \,]$, which is obtained with $P(\Delta/2) = 0$;
  • the maximal $\lambda$ permitted by (184) is $\mathrm{E}[\, G \,|\, G \ge \Delta \,]$, which is obtained with $P(3\Delta/2) = 0$.
Thus, if $\mathrm{E}[\, G \,|\, G \ge \Delta \,] > \mathrm{E}[\, G \,|\, G \le \Delta \,]$, then at $P$ below some threshold we have $P(\Delta/2) = 0$ and $P(3\Delta/2) = P / P_{S_T}(3\Delta/2)$. The capacity in nats per symbol at low power and for fixed $\Delta$ is thus
$$C(P) = \int_\Delta^\infty p(g)\, \log\big( 1 + g\, P(3\Delta/2) \big)\, \mathrm{d}g \approx P\, \mathrm{E}[\, G \,|\, G \ge \Delta \,] - \frac{P^2}{2\, P_{S_T}(3\Delta/2)}\, \mathrm{E}[\, G^2 \,|\, G \ge \Delta \,]$$
where we used
$$\log(1 + x) \approx x - \frac{x^2}{2}$$
for small $x$. The wideband values (37) are
$$\left( \frac{E_b}{N_0} \right)_{\min} = \frac{\log 2}{\mathrm{E}[\, G \,|\, G \ge \Delta \,]}$$
$$S = \frac{2\, P_{S_T}(3\Delta/2)\, \mathrm{E}[\, G \,|\, G \ge \Delta \,]^2}{\mathrm{E}[\, G^2 \,|\, G \ge \Delta \,]}.$$
One can thus make the minimum $E_b/N_0$ approach zero if one can make $\mathrm{E}[\, G \,|\, G \ge \Delta \,]$ as large as desired by increasing $\Delta$.
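The effect of $\Delta$ on the wideband values (187) and (188) can be seen numerically. The sketch below (for Rayleigh fading; the listed $\Delta$ values are arbitrary) computes the conditional moments by quadrature.

```python
import numpy as np
from scipy.integrate import quad

p_g = lambda g: np.exp(-g)                      # Rayleigh fading: G ~ Exp(1)

for Delta in (0.5, 1.0, 2.0, 4.0):
    pr_hi = quad(p_g, Delta, np.inf)[0]                            # P_{S_T}(3*Delta/2)
    EG_hi = quad(lambda g: g * p_g(g), Delta, np.inf)[0] / pr_hi   # E[G | G >= Delta]
    EG2_hi = quad(lambda g: g * g * p_g(g), Delta, np.inf)[0] / pr_hi
    ebn0_db = 10 * np.log10(np.log(2) / EG_hi)                     # cf. (187), in dB
    slope = 2 * pr_hi * EG_hi ** 2 / EG2_hi                        # cf. (188)
    print(f"Delta={Delta:3.1f}: Eb/N0_min={ebn0_db:6.2f} dB, slope={slope:.3f}")
```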
Remark 55.
Consider the MIMO model (171) with $S_R = \mathsf{H}$. Suppose the CSIT is $S_T = f_T(S_R)$ for some function $f_T(\cdot)$. The capacity (178) generalizes to
$$C(P) = \max_{\underline{X}(S_T):\, \mathrm{E}[\|\underline{X}(S_T)\|^2] \le P}\, I\big( \underline{X};\, \mathsf{H}\underline{X} + \underline{Z} \,\big|\, \mathsf{H}, S_T \big) = \max_{\mathsf{Q}(S_T):\, \mathrm{E}[\mathrm{tr}\,\mathsf{Q}(S_T)] \le P}\, \mathrm{E}\Big[ \log\det\Big( \mathrm{I} + \mathsf{H}\, \mathsf{Q}(S_T)\, \mathsf{H}^\dagger \Big) \Big].$$
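For the special case of no CSIT (constant $f_T$) and i.i.d. $\mathcal{CN}(0,1)$ entries of $\mathsf{H}$, it is well known that the isotropic input $\mathsf{Q} = (P/M)\,\mathrm{I}$ maximizes (189), so the capacity becomes $\mathrm{E}[\log\det(\mathrm{I} + (P/M)\,\mathsf{H}\mathsf{H}^\dagger)]$. A minimal Monte Carlo sketch (dimensions and $P$ chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M = N = 4
P = 10.0
n = 20_000

def crandn(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(n, N, M)                               # i.i.d. Rayleigh fading realizations
A = np.eye(N) + (P / M) * H @ H.conj().transpose(0, 2, 1)
C = np.mean(np.linalg.slogdet(A)[1])              # E[log det(I + (P/M) H H^dagger)]
print(f"ergodic {N}x{M} capacity, no CSIT: {C:.2f} nats/use")
```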

6.4. Full CSIR, Partial CSIT

Consider first the full CSIR $S_R = H\sqrt{P(S_T)}$ and then the less informative $S_R = H$.
$S_R = H\sqrt{P(S_T)}$: We have the following capacity result, which implies this CSIR is the best possible since one can achieve the same rate as if the receiver sees both $H$ and $S_T$; see the first step of (162). We could thus have classified this model as CSIT@R.
Proposition 2
(see ([10], Proposition 3)). The capacity of the channel (159) with $S_R = H\sqrt{P(S_T)}$ and general $S_T$ is
$$C(P) = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P}\, \int_{\mathbb{C}} p(s_R)\, \log\big( 1 + |s_R|^2 \big)\, \mathrm{d}s_R = \max_{P(S_T):\, \mathrm{E}[P(S_T)] \le P}\, \mathrm{E}\big[\, \log\big( 1 + G\, P(S_T) \big) \,\big].$$
Proof. 
Achievability follows by Theorem 1 with Remark 51. The converse is given by Lemma 6. □
Remark 56.
Proposition 2 gives an upper bound and (thus) a target rate when the receiver has partial CSIR. For example, we will use the K-partition idea of Lemma 1 (see also Remark 46) to approach the upper bound for large SNR.
Remark 57.
Proposition 2 partially generalizes to block-fading channels; see Proposition 3 in Section 9.5.
$S_R = H$: The capacity is (138) with
$$I(A;\, Y \,|\, H) = \mathrm{E}\!\left[ \log \frac{p(Y \,|\, A, H)}{p(Y \,|\, H)} \right]$$
where $\mathrm{E}[\,|X|^2\,] \le P$ and where
$$p(y \,|\, a, h) = \int_{\mathbb{C}} p(s_T \,|\, h)\; \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\; \mathrm{d}s_T$$
and
$$p(y \,|\, h) = \int_{\mathbb{C}} p(s_T \,|\, h) \int_{\mathcal{A}} p(a)\, p(y \,|\, a, h, s_T)\, \mathrm{d}a\, \mathrm{d}s_T = \int_{\mathbb{C}} p(s_T \,|\, h) \int_{\mathbb{C}} p(x(s_T))\; \frac{e^{-|y - h\, x(s_T)|^2}}{\pi}\; \mathrm{d}x(s_T)\, \mathrm{d}s_T.$$
For example, if each entry $X(s_T)$ of $A$ is CSCG with variance $P(s_T)$ then
$$p(y \,|\, h) = \int_{\mathbb{C}} p(s_T \,|\, h)\; \frac{\exp\!\big( -|y|^2 / (1 + g\, P(s_T)) \big)}{\pi\, \big( 1 + g\, P(s_T) \big)}\; \mathrm{d}s_T.$$
In general, one can compute I ( A ; Y | H ) numerically by using (190)–(192), but the calculations are hampered if the integrals in (191) and (192) do not simplify.
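For a finite CSIT alphabet, the integrals in (191) and (192) become sums, and (190) can be estimated by Monte Carlo. The sketch below is an illustration (not the paper's setup): on-off fading with $S_R = H$, a hypothetical noisy one-bit CSIT that equals the on/off state with probability $1-\epsilon$, and independent CSCG entries $X(0), X(1)$ with powers $P(0), P(1)$; all numerical values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000
eps = 0.1                                   # hypothetical CSIT error probability
P_pow = np.array([2.0, 18.0])               # P(0), P(1); average power 10
h_vals = np.array([0.0, np.sqrt(2.0)])      # on-off fading, S_R = H

B = rng.integers(0, 2, n)                   # true on/off state index, H = h_vals[B]
flip = rng.random(n) < eps
T = np.where(flip, 1 - B, B)                # noisy one-bit CSIT S_T
H = h_vals[B]
G = H ** 2

# independent CSCG entries X(0), X(1) of the adaptive symbol A
Xa = np.sqrt(P_pow / 2) * (rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2)))
X = Xa[np.arange(n), T]                     # transmitted symbol X(S_T)
Z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Y = H * X + Z

# p(s_T | h): probability that the CSIT reports t given the true state b
p_t_given_b = np.array([[1 - eps, eps], [eps, 1 - eps]])
w = p_t_given_b[B]                          # shape (n, 2)

# cf. (191): p(y | a, h) = sum_t p(t|h) exp(-|y - h x(t)|^2) / pi
p_y_ah = np.sum(w * np.exp(-np.abs(Y[:, None] - H[:, None] * Xa) ** 2) / np.pi, axis=1)
# cf. (192): p(y | h) = sum_t p(t|h) exp(-|y|^2/(1+g P(t))) / (pi (1+g P(t)))
den = 1 + G[:, None] * P_pow[None, :]
p_y_h = np.sum(w * np.exp(-np.abs(Y[:, None]) ** 2 / den) / (np.pi * den), axis=1)

print(f"I(A;Y|H) ~ {np.mean(np.log(p_y_ah / p_y_h)):.3f} nats/use")
```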
For the reverse model GMI (139), the averaging density in (164) and (165) is here
$$p(h, s_T \,|\, y, s_R) = \delta(h - s_R)\; \frac{p(s_T \,|\, h)\, p(y \,|\, h, s_T)}{p(y \,|\, h)}.$$
We use numerical integration to compute the GMI.
To obtain more insight, we state the forward model rates of Theorem 1 and Remark 51 as a Corollary.
Corollary 1.
An achievable rate for the fading channels (159) with S R = H and partial CSIT is the forward model GMI
I 1 ( A ; Y | H ) = E log 1 +