New Formulas of Feedback Capacity for AGN Channels with Memory: A Time-Domain Sufficient Statistic Approach

Charalambos D. Charalambous; Christos Kourtellaris; Stelios Louka

doi:10.3390/e27020207

,

and

Department of Electrical and Computer Engineering, University of Cyprus, P.O. Box 20537, Nicosia CY-1678, Cyprus

^*

Author to whom correspondence should be addressed.

^†

Preliminary results are presented in part at the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. This paper appeared in arXiv document https://arxiv.org/abs/2010.06226, 13 October 2020.

Entropy2025, 27(2), 207;https://doi.org/10.3390/e27020207

This article belongs to the Section Information Theory, Probability and Statistics

Version Notes

Order Reprints

Review Reports

Abstract

Recently, several papers identified technical issues related to equivalent time-domain and frequency-domain “characterization of the n–block or transmission” feedback capacity formula and its asymptotic limit, the feedback capacity, of additive Gaussian noise (AGN) channels, first introduce by Cover and Pombra in 1989 (IEEE Transactions on Information Theory). The main objective of this paper is to derive new results on the Cover and Pombra characterization of the n–block feedback capacity formula, and to clarify the main points of confusion regarding the time-domain results that appeared in the literature. The first part of this paper derives new equivalent time-domain sequential characterizations of feedback capacity of AGN channels driven by non-stationary and non-ergodic Gaussian noise. It is shown that the optimal channel input processes of the new equivalent sequential characterizations are expressed as functionals of a sufficient statistic and a Gaussian orthogonal innovations process. Further, the Cover and Pombra n–block capacity formula is expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems, contrary to results that appeared in the literature and involve only one DRE. It is clarified that prior literature deals with a simpler problem that presupposes the state of the noise is known to the encoder and the decoder. In the second part of this paper, the existence of the asymptotic limit of the n–block feedback capacity formula is shown to be equivalent to the convergence properties of solutions of the two generalized DREs. Further, necessary and or sufficient conditions are identified for the existence of asymptotic limits, for stable and unstable Gaussian noise, when the optimal input distributions are asymptotically time-invariant but not necessarily stationary. This paper contains an in-depth analysis, with various examples, and identifies the technical conditions on the feedback code and state space noise realization, so that the time-domain capacity formulas that appeared in the literature, for AGN channels with stationary noises, are indeed correct.

Keywords:

feedback capacity; non-stationary; unstable; Gaussian channels; memory; sufficient statistic

1. Introduction, Motivation, Main Results, Current State of Knowledge

In the recent papers [1,2,3], concerns are raised whether the time-domain analysis in [4], deals with the Cover and Pombra [5] “characterization of the n–block or transmission” feedback capacity formula and its asymptotic limit, the feedback capacity, of additive Gaussian noise (AGN) channels. Furthermore, the recent comment paper [6] identified gaps in the proof of the simplified frequency-domain characterization of Theorem 4.1 in [4]. The main objective of this paper is to derive new results on the Cover and Pombra characterization of the n–block feedback capacity formula, and to clarify the main points of confusion regarding the time-domain results of [4] and related literature i.e., [7].

1.1. The Problem, Motivation, and Main Results

We consider the additive Gaussian noise (AGN) channel defined by [5]

\begin{matrix} Y_{t} = X_{t} + V_{t}, t = 1, \dots, n, \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ, κ \in [0, \infty) \end{matrix}

(1)

where

$X^{n} \overset{▵}{=} {X_{1}, X_{2}, \dots, X_{n}}$ is the sequence of channel input random variables (RVs) $X_{t} : Ω \to R$ ;
$Y^{n} \overset{▵}{=} {Y_{1}, Y_{2}, \dots, Y_{n}}$ is the sequence of channel output RVs $Y_{t} : Ω \to R$ ;
$V^{n} \overset{▵}{=} {V_{1}, \dots, V_{n}}$ is the sequence of jointly Gaussian distributed RVs $V_{t} : Ω \to R$ , with distribution $P_{V^{n}} (d v^{n})$ , which are not necessarily stationary or ergodic.
We wish to examine the feedback capacity of the AGN channel (1) for two distinct formulations of code definition and a noise model, described below under Case (I) and Case (II) formulations. Special cases of these will be related to the existing literature.

Case (I) Formulation. This formulation respects the following two conditions.

(I.1). The feedback code does not assume knowledge of the initial state of the noise at the encoder and the decoder (see Definition 1), and
(I.2) the noise sequence $V^{n}$ is represented by a partially observable (partially observable means that knowledge of $V^{t - 1}$ and the initial state do not specify the state $S^{t}, t = 1, \dots, n$ .) state space realization, with state sequence $S^{n}$ (see Definition 2).

For a formulation that respects (I.1) and (I.2), Cover and Pombra characterized the “n–finite transmission” feedback capacity [5] (Equations (10) and (11)), using the information measure (

C_{n}^{f b} (κ)

is identified using the converse coding theorem [5]),

\begin{matrix} C_{n}^{f b} (κ) \overset{▵}{=} \sup_{P_{X_{t} | X^{t - 1}, Y^{t - 1}}, t = 1, \dots, n : \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}) - H (V^{n}) \end{matrix}

(2)

provided the supremum exists, where

H (\cdot)

denotes (differential) entropy.

Case (I.a) Formulation. Although not mentioned in [5], if the feedback code assumes knowledge of the initial state of the noise or the channel,

S_{1} = s

, at the encoder and the decoder (see Definition 3), it follows directly from [5] (Equations (10) and (11)), that (2) is replaced by the information measure

\begin{matrix} C_{n}^{f b} (κ, s) \overset{▵}{=} \sup_{P_{X_{t} | X^{t - 1}, Y^{t - 1}, s}, t = 1, \dots, n : \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2} | S_{1} = s\} \leq κ} \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}, s) - H (V^{n} | s) . \end{matrix}

(3)

Case (II) Formulation. This formulation relaxes Conditions (I.1) and (I.2) to the following two conditions:

(II.1) The feedback code assumes knowledge of the initial state of the noise or the channel, $S_{1} = s$ , at the encoder and the decoder (see Definition 3);
(II.2) the noise sequence $V^{n}$ is represented by a fully observable state space realization (fully observable means knowledge of $V^{t - 1}$ and initial state specify the state $S^{t}, t = 1, \dots, n$ ), with state sequence $S^{n}$ such that the noise $V^{t - 1}$ and the initial state $S_{1}$ uniquely defines the noise state sequence $S^{t}$ and vice versa for $t = 1, \dots, n$ .
Thus, Formulation (2), which respects Conditions (I.1) and (I.2), is the most general.

For a formulation that respects Conditions (II.1) and (II.2), Yang, Kavcic, and Tatikonda [8] characterized the n–finite transmission feedback capacity [8] (Section II (particularly Section II.C, I–III)), using the information measure,

\begin{matrix} C_{n}^{f b, S} (κ, s) \overset{▵}{=} \sup_{P_{X_{t} | S^{t}, Y^{t - 1}, s}, t = 1, \dots, n : \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2} | S_{1} = s\} \leq κ} \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}, s) - H (V^{n} | s) . \end{matrix}

(4)

Compared to

C_{n}^{f b} (κ)

and

C_{n}^{f b} (κ, s)

, the definition of

C_{n}^{f b, S} (κ, s)

imposes Condition (II.2) and is fundamentally different from the former, because the input distributions of

C_{n}^{f b, S} (κ, s)

are different from

C_{n}^{f b} (κ)

and

C_{n}^{f b} (κ, s)

. Hence, the information rates of the three formulas are generally different. However, for certain Gaussian noise models of

V^{n}

, it might be the case that, under Condition (II.1), the information measures

C_{n}^{f b} (κ, s)

and

C_{n}^{f b, S} (κ, s)

coincide. We provide several examples in the main body of this paper.

Motivation and Fundamental Differences of Case (I) and Case (II) Formulations.

At this point, we pause to discuss two technical issues of Case (I) and Case (II) formulations, which are not clarified in [4,7,9]. These technical issues are related to the time-domain characterization of feedback capacity, Theorem 6.1 in [4]; they are first discussed in [1,2,3,10]. A recent comment paper [6] also identified gaps in the proof of the frequency-domain characterization of capacity, Theorem 4.1 in [4], which affect the proofs of Theorems 4.1, 4.6, 5.3 and 6.1; Propositions 4.7 and 5.1; Remarks 4.5 and 5.2; and Lemma 6.1 in [4].

To illustrate the technical issues, we consider the autoregressive moving average stable noise denoted by ARMA

(a, c), a \in [- 1, 1], c \in (- 1, 1), c \neq a

, studied by many authors [4,7,8,9,11,12], as a benchmark example.

\begin{matrix} V_{t} = c V_{t - 1} + W_{t} - a W_{t - 1}, \forall t \in Z_{+} \overset{▵}{=} {1, 2, \dots}, \end{matrix}

(5)

\begin{matrix} V_{0} \in N (0, K_{V_{0}}), K_{V_{0}} \geq 0, W_{0} \in N (0, K_{W_{0}}), K_{W_{0}} \geq 0, W_{t} \in N (0, K_{W}), K_{W} > 0, \forall t, \end{matrix}

(6)

\begin{matrix} {W_{0}, W_{1}, \dots, W_{n}} indep. seq. and indep. of V_{0} . \end{matrix}

(7)

As in [4,7], we define the state variable of the noise by

\begin{matrix} S_{t} \overset{▵}{=} \frac{c V_{t - 1} - a W_{t - 1}}{c - a}, \forall t \in Z_{+} \end{matrix}

(8)

Then, the state space realization of

V^{n}

is

\begin{matrix} S_{t + 1} = c S_{t} + W_{t}, V_{t} = (c - a) S_{t} + W_{t}, \forall t \in Z_{+}, \end{matrix}

(9)

\begin{matrix} K_{S_{1}} = \frac{{(c)}^{2} K_{V_{0}} + {(a)}^{2} K_{W_{0}}}{{(c - a)}^{2}}, K_{V_{0}} \geq 0, K_{W_{0}} \geq 0 both given . \end{matrix}

(10)

Bounds on feedback capacity, when

a = 0

, i.e., corresponding to an autoregressive noise are derived in [13,14,15] using linear coding schemes.

For the Case (I) formulation, the information measure

C_{n}^{f b} (κ)

, i.e., (2), corresponds to the supremum over channel input distributions

P_{X_{t} | X^{t - 1}, Y^{t - 1}}, t = 1, \dots, n

.

For the Case (II) formulation, the information measure

C_{n}^{f b, S} (κ, s)

, i.e., (4), corresponds to the supremum over channel input distributions

P_{X_{t} | S^{t}, Y^{t - 1}, s}, t = 1, \dots, n

, and Conditions (II.1) and (II.2) are necessary. Alternatively, the necessary conditions for

C_{n}^{f b} (κ)

to reduce to

C_{n}^{f b, S} (κ, s)

are the conditions stated in (12) and (13) (these also follow independently of the Case (I) formulation from the converse coding theorem).

\begin{matrix} P_{X_{t} | X^{t - 1}, Y^{t - 1}} = P_{X_{t} | V^{t - 1}, Y^{t - 1}} holds by channel def. Y_{k} = X_{k} + V_{k}, k = 1, \dots, n; \end{matrix}

(11)

\begin{matrix} = P_{X_{t} | V^{t - 1}, Y^{t - 1}, s} i f S_{1} = s is known to the feedback code, i.e., Cond. (II.1); \end{matrix}

(12)

\begin{matrix} = P_{X_{t} | S^{t}, Y^{t - 1}, s} i f (V^{t - 1}, S_{1} = s) uniquely def . S^{t} and vice-versa, i.e., Cond. (II.2) . \end{matrix}

(13)

Thus, a necessary condition for (13) to hold is

S_{1} = (V_{0}, W_{0}) = (v_{0}, w_{0}) = s

, which is known to the encoder. It follows from [4] (Theorem 6.1; also see Lemma 6.1 and the comments above it, Equation (71)), that Conditions (II.1) and (II.2) are assumed; hence, these results are not developed for the Cover and Pombra [5] formulation. Additional information can be found in Remark 5.

Second, the analysis of the asymptotic per unit time limits of (2)–(4), i.e.,

\begin{matrix} C^{f b} (κ) \overset{▵}{=} & lim_{n ⟶ \infty} \sup_{{P_{X_{t} | X^{t - 1}, Y^{t - 1}}}_{t = 1}^{n}, \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \frac{1}{n} \{\sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}) - H (V^{n})\}, \end{matrix}

(14)

is a non-trivial problem, and it requires certain technical necessary and/or sufficient conditions for the limits to exist, as well as for the rates to be independent of the initial distribution

P_{V_{1}}

or the initial data,

S_{1} = s

(see [3,10]), even if the noise process

V^{n}

is stationary. It is easy to verify that the analysis in the past studies [4,7,9,11,12] considered the simpler problem

C_{n}^{f b, S} (κ, s)

, and that the asymptotic limit

{lim}_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b, S} (κ, s)

does not correspond to the ergodic capacity. We clarify these points in our examples.

Main Results. The main results of this paper are briefly stated below.

(1) In the first part of this paper, we derive new equivalent sequential characterizations of the Cover and Pombra “n–block or transmission” feedback capacity formula [5] (Equation (11)),

C_{n}^{f b} (κ)

(this first appeared in [10]). In particular, we derive equivalent realizations to the optimal channel input process

X^{n}

[5] (Equation (11)), which are linear functionals of a finite-dimensional sufficient statistic and an orthogonal innovations process. From these new realizations follows the equivalent sequential characterizations of the “n–block or transmission” feedback capacity formula [5] (Equation (11)), which will henceforth be called the “n–finite transmission feedback information (n–FTFI) capacity”. The new n–FTFI capacity is expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems, instead of the one DRE given in [4,7,8]. In fact, we also show that the n–FTFI capacity of [4,7,8] corresponds to

C_{n}^{f b, S} (κ, s)

.

(2) In the second part of this paper, we analyze the asymptotic per unit time limit of the sequential characterizations of the n–FTFI capacity, when the supremum and limit over

n ⟶ \infty

are interchanged in (14), denoted by

C^{f b, o} (κ)

. Then, we show

C^{f b, o} (κ) = C^{f b} (κ)

. We identify necessary and/or sufficient conditions for the asymptotic limit to exist, and for the optimal input process

X_{t}, t = 1, \dots,

to be asymptotically stationary, in terms of the convergence properties of two generalized matrix difference Riccati equations (DREs) to their corresponding two generalized matrix algebraic Riccati equations (AREs). We make use of the so-called detectability, stabilizability, and unit circle controllablity conditions of generalized Kalman filters of Gaussian processes [16,17].

(3) From (1) and (2), we derive analogous results for

C_{n}^{f b} (κ, s)

and its per unit time asymptotic limit, denoted by

C^{f b, o} (κ, s)

, as degenerate versions of

C_{n}^{f b} (κ)

and

C^{f b, o} (κ)

. Further, we show that for certain noise models, and under certain conditions, it holds that

C^{f b, o} (κ, s) = C^{f b} (κ)

, i.e., these values do not depend on the initial state or initial distributions.

(4) From (1) and (2), we derive analogous results for the Case (II) formulation, i.e.,

C_{n}^{f b, S} (κ, s)

and its per unit time asymptotic limit denoted by

C^{f b, S} (κ, s)

, and we show these are fundamentally different from the Case (I) formulation,

C_{n}^{f b} (κ)

and

C^{f b} (κ)

, as well as

C_{n}^{f b} (κ, s)

and

C^{f b} (κ, s)

. In particular, we show that the characterizations of n–FTFI capacity,

C_{n}^{f b, S} (κ, s)

, for the Case (II) formulation follow directly from the Case (I) formulation, as a special case (an independent derivation is also presented). Moreover,

C_{n}^{f b, S} (κ, s)

is a functional of one generalized DRE, while

C_{n}^{f b} (κ), C_{n}^{f b} (κ, s)

are functionals of two generalized DREs.

1.2. The Code Definitions and Noise Models

Case (I) Feedback Code and Noise Definitions. For the Case (I) formulation, we consider the code of Definition 1 (due to [5]).

Definition 1.

Time-varying feedback code [5]

A noiseless time-varying feedback code for the AGN Channel (1) is denoted by $(2^{n R}, n)$ , $n = 1, 2, \dots$ and consists of the following elements and assumptions:
(i) The uniformly distributed messages $W : Ω \to M_{n} \overset{▵}{=} \{1, 2, \dots, 2^{n R}\}$ .
(ii) The time-varying encoder strategies, often called codewords of block length n, defined by (the superscript $e (\cdot)$ on $E^{e}$ indicates that the distribution depends on the strategy $e (\cdot) \in E_{[1, n]} (κ)$ ).

$\begin{matrix} E_{[1, n]} (κ) ≜ & {X_{1} = e_{1} (W), X_{2} = e_{2} (W, X_{1}, Y_{1}) \dots, X_{n} = e_{n} (W, X^{n - 1}, Y^{n - 1}) : \\ \frac{1}{n} E^{e} (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ} . \end{matrix}$

(15)

(iii) The average error probability of the decoder functions $y^{n} ⟼ d_{n} (y^{n}) \in M_{n}$ , defined by

$\begin{matrix} P_{e r r o r}^{(n)} = P \{d_{n} (Y^{n}) \neq W\} = \frac{1}{2^{n R}} \sum_{w = 1}^{2^{n R}} P \{d_{n} (Y^{n}) \neq w | W = w\} . \end{matrix}$

(16)

(iv) The channel input sequence “ $X^{n} \overset{▵}{=} {X_{1}, \dots, X_{n}}$ is causally related (a notion found in [5], page 39, above Lemma 5) to $V^{n}$ ”, which is equivalent to the following decomposition of the joint probability distribution of $(X^{n}, V^{n})$ :

$\begin{matrix} P_{X^{n}, V^{n}} = & P_{V_{n} | V^{n - 1}, X^{n}} P_{X_{n} | X^{n - 1}, V^{n - 1}} \dots P_{V_{2} | V_{1}, X^{2}} P_{X_{2} | X_{1}, V_{1}} P_{V_{1} | X_{1}} P_{X_{1}} \end{matrix}$

(17)

$\begin{matrix} = & P_{V^{n}} \prod_{t = 1}^{n} P_{X_{t} | X^{t - 1}, V^{t - 1}}, t h a t i s, P_{V_{t} | V^{t - 1}, X^{t}} = P_{V_{t} | V^{t - 1}} . \end{matrix}$

(18)

That is, $X^{t} \leftrightarrow V^{t - 1} \leftrightarrow V_{t}$ is a Markov chain, for $t = 1, \dots, n$ . As usual, the messages W are independent of the channel noise $V^{n}$ .
A rate R is called an achievable rate with feedback coding, if there exists a sequence of codes $(2^{n R}, n), n = 1, 2, \dots$ , such that $P_{e r r o r}^{(n)} ⟶ 0$ as $n ⟶ \infty$ . The feedback capacity $C^{f b} (κ)$ is defined as the supremum of all achievable rates R.

We note that, in general,

C^{f b} (κ)

depends on the initial distribution

P_{V_{1}}

; the ergodic capacity requires that

C^{f b} (κ)

is independent of

P_{V_{1}}

(see [18]).

We consider a noise model which is consistent with [5], i.e.,

V^{n}

is jointly Gaussian distributed,

P_{V^{n}} = \times_{t = 1}^{n} P_{V_{t} | V^{t - 1}}

, and induced by the partially observable state space (PO-SS) realization of Definition 2.

Definition 2.

A time-varying PO-SS realization of Gaussian noise

V^{n} \in N (0, K_{V^{n}})

is defined by

$\begin{matrix} S_{t + 1} = A_{t} S_{t} + B_{t} W_{t}, t = 1, \dots, n - 1 \end{matrix}$

(19)

$\begin{matrix} V_{t} = C_{t} S_{t} + N_{t} W_{t}, t = 1, \dots, n, \end{matrix}$

(20)

$\begin{matrix} S_{1} \in N (μ_{S_{1}}, K_{S_{1}}), K_{S_{1}} ⪰ 0, \end{matrix}$

(21)

$\begin{matrix} W_{t} \in N (0, K_{W_{t}}), K_{W_{t}} ⪰ 0, t = 1 \dots, n i n d e p . G a u s s i a n p r o c e s s, W^{t} i n d e p . o f S_{1}, \end{matrix}$

(22)

$\begin{matrix} S_{t} : Ω \to R^{n_{s}}, W_{t} : Ω \to R^{n_{w}}, V_{t} : Ω \to R^{n_{v}}, R_{t} \overset{▵}{=} N_{t} K_{W_{t}} N_{t}^{T} ≻ 0, t = 1, \dots, n \end{matrix}$

(23)

where $n_{v} = 1$ , $n_{s}, n_{w}$ are arbitrary positive integers, $(A_{t}, B_{t}, C_{t}, N_{t}, μ_{S_{1}}, K_{S_{1}}, K_{W_{t}})$ are nonrandom, measurable in t, with bounded entries, and $n_{s}, n_{w}$ are finite positive integers.
The above components $B_{t} W_{t}$ and $N_{t} W_{t}$ can be taken as

$\begin{matrix} B_{t} W_{t} = B_{t}^{1} W_{t}^{1} + B_{t}^{2} W_{t}^{2}, N_{t} W_{t} = N_{t}^{1} W_{t}^{1} + N_{t}^{2} W_{t}^{2}, t = 1, \dots, n, \end{matrix}$

(24)

$\begin{matrix} W^{1, n} a n d W^{1, n} a r e i n d e p e n d e n t, \end{matrix}$

(25)

where $B_{t}^{1}$ or $B_{t}^{2}$ can be zero and $N_{t}^{1}$ or $N_{t}^{2}$ can be zero $\forall t$ .
A time-invariant PO-SS realization of the Gaussian noise $V^{n} \in N (0, K_{V^{n}})$ is defined by (19)–(23), with $(A_{t}, B_{t}, C_{t}, N_{t}, K_{W_{t}}) = (A, B, C, N, K_{W}), \forall t$ .

For the Case (I) formulation, we use the terminology “partially observable”, which is standard in filtering theory [16], because the noise

V^{n}

induces a distribution

P_{V^{n}} = \times_{t = 1}^{n} P_{V_{t} | V^{t - 1}}

, and

P_{V_{t} | V^{t - 1}}

cannot be expressed as a function of the state of the noise, i.e.,

V^{t - 1}

does not uniquely define

S^{t}

. However, if

S_{1} = s

is known to the encoder, then it can be as easily verified from the ARMA

(a, c)

, that

V^{t - 1}

uniquely defines

S^{t}

, recursively. However, for the PO-SS realization, with

B_{t}^{2} = 0, N_{t}^{1} = 0, \forall t

, even if the initial state

S_{1} = s

is known to the encoder,

V^{t - 1}

does not uniquely define

S^{t}

because there are two independent noises

W^{1, n - 1}

and

W^{2, n}

that enter the equations of

S^{n}

and

V^{n}

. The PO-SS realization is often adopted in many practical problems of engineering and science to realize jointly Gaussian processes

V^{n}

.

We should emphasize that for the Case (I) formulation to be consistent with the Cover and Pombra [5] formulation, both the code of Definition 1 and the PO-SS realization of Definition 2 must respect the following two conditions:

(A1) The initial state

S_{1}

of the noise is not known at the encoder nor the decoder;

(A2) At each t, the representation of the noise

V^{t - 1}

by the PO-SS realization of Definition 2 does not uniquely determine the state of the noise

S^{t}

and vice-versa, i.e., it is a partially observable realization.

Case (II) Formulation of Feedback Code and Noise Definitions. For the Case (II) formulation, we presuppose the following:

Condition 1. The initial state of the noise or channel

S_{1} = s

is known to the encoder and decoder;

Condition 2. Given a fixed initial state

S_{1} = s

, known to the encoder and the decoder, at each t, the channel noise

V^{t - 1}

uniquely defines the state of the noise

S^{t}

and vice-versa.

Thus, for the Case (II) formulation, the code is that of Definition 3, below (hence different from Definition 1).

Definition 3.

A code with initial state known at the encoder and the decoder

A variant of the code of Definition 1 is feedback code with the initial state of the noise or channel $S_{1} = s$ , known to the encoder and decoder strategies, denoted by $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ .
The code $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ is defined as in Definition 2, with (ii), (iii), and (iv) being replaced by

$\begin{matrix} E_{[1, n]}^{s} (κ) ≜ & {X_{1} = e_{1} (W, S_{1}), X_{2} = e_{2} (W, S_{1}, X_{1}, Y_{1}) \dots, X_{n} = e_{n} (W, S_{1}, X^{n - 1}, Y^{n - 1}) : \end{matrix}$

$\begin{matrix} \frac{1}{n} E^{e} \{\sum_{i = 1}^{n} {(X_{t})}^{2} | S_{1} = s\} \leq κ}, (y^{n}, s) ⟼ d_{n}^{s} (y^{n}, s) \in M_{n}, \end{matrix}$

(26)

$\begin{matrix} P_{X^{n}, V^{n} | S_{1}} = & P_{V^{n} | S_{1}} \prod_{t = 1}^{n} P_{X_{t} | X^{t - 1}, V^{t - 1}, S_{1}}, t h a t i s, P_{V_{t} | V^{t - 1}, X^{t}, S_{1}} = P_{V_{t} | V^{t - 1}, S_{1}} . \end{matrix}$

(27)

The feedback capacity is denoted by $C^{f b} (κ, s)$ and should be distinguished from $C^{f b} (κ)$ .
The feedback capacity is denoted by and $C^{f b, S} (κ, s)$ if, in addition, Condition 2 holds.
The initial state may include $S_{1} \overset{▵}{=} (V_{- \infty}^{0}, Y_{- \infty}^{0})$ , etc.

It will become apparent that past studies [4,7,8] considered feedback capacity,

C^{f b, S} (κ, s)

, and not

C^{f b} (κ, s)

or

C^{f b} (κ)

.

For the Case (II) formulation, it is obvious (from the converse to the coding theorem) that the optimal channel input conditional distribution is expressed as a function of the state of the noise,

S^{n}

, due to (12), (13).

1.3. Approach of This Paper

Our approach and analysis of information measures (2)–(4), as well sa their per unit time limits, is based on the following two step procedure:

Step # 1. We apply a linear transformation to the Cover and Pombra optimal channel input process [5] (Equation (11)), (see (33)–(39) to equivalently represent it by a linear functional of the past channel noise sequence, the past channel output sequence, and an orthogonal Gaussian process, i.e., an innovations process. That is,

X^{n}

is uniquely represented, since it is expressed in terms of the orthogonal process.

Step # 2. We express the optimal input process by a functional of a sufficient statistic, which satisfies a Markov recursion, and an orthogonal innovations process. It then follows that the Cover and Pombra characterization of the “n–block” formula [5] (Equation (10)) (see (33) and (34)) is equivalently represented by a sequential characterization. The problem of feedback capacity is then expressed as the maximization over two sequences of time-varying strategies of the channel input process of the difference of (differential) entropies of the innovations processes of

Y^{n}

and

V^{n}

(analog of entropies in the right-hand side of (2)).

\begin{matrix} H (Y^{n}) - H (V^{n}) = & \sum_{t = 1}^{n} \{H (Y_{t} | Y^{t - 1}) - H (V_{t} | V^{t - 1})\} \end{matrix}

(28)

\begin{matrix} = & \sum_{t = 1}^{n} \{H (Y_{t} - E \{Y_{t} | Y^{t - 1}\} | Y^{t - 1}) - H (V_{t} - E \{V_{t} | V^{t - 1}\} | V^{t - 1})\} \end{matrix}

(29)

\begin{matrix} \overset{(a)}{=} & \sum_{t = 1}^{n} \{H (Y_{t} - E \{Y_{t} | Y^{t - 1}\}) - H (V_{t} - E \{V_{t} | V^{t - 1}\})\} \end{matrix}

(30)

\begin{matrix} = & \sum_{t = 1}^{n} \{H (I_{t}) - H ({\hat{I}}_{t})\}, I_{t} \overset{▵}{=} Y_{t} - E \{Y_{t} | Y^{t - 1}\}, {\hat{I}}_{t} \overset{▵}{=} V_{t} - E \{V_{t} | V^{t - 1}\} \end{matrix}

(31)

where

(a)

is due to the Gaussianity of

Y^{n}

and

V^{n}

, as well as due the independence of the innovations processes

I_{t}

and

Y^{t - 1}

and of innovations processes

{\hat{I}}_{t}

and

V^{t - 1}

.

The asymptotic analysis of

{lim}_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b} (κ)

(or with limit and supremum interchanged) is then addressed from the asymptotic properties of entropy rates and the average power,

\begin{matrix} lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} \{H (I_{t}) - H ({\hat{I}}_{t})\}, lim_{n ⟶ \infty} \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \end{matrix}

(32)

over the channel input distributions, and where the covariance of the innovations process of

Y^{n}

is a functional of the solutions of two generalized matrix DREs. We identify necessary and/or sufficient conditions for the existence of limits, irrespective of whether the noise

V^{n}

is non-stationary, unstable, or stationary. Further, we show that, in general, the characterizations of feedback capacity for the Case (I) and Case (II) formulations are fundamentally different, and we identify conditions based on the feedback codes and noise to coincide.

1.4. Review of Related Literature

Asymptotic feedback capacity formulas and bounds for AGN channels, driven by stationary and asymptotically stationary (often limited memory) noise, are derived since the early 1970s in an anthology of papers based on information theoretic formulas, under various assumptions [7,8,9,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33], in two directions.

(D1): Characterizations and explicit formulas of asymptotic feedback capacity that correspond to feedback codes, when the initial state of the noise (the initial state $S_{1}$ is any a priori information) $S_{1} = s_{1}$ is known or not known to the encoder and the decoder;
(D2): Bounds on asymptotic feedback capacity that correspond to linear feedback coding schemes of communicating Gaussian random variables (RVs), $Θ : Ω \to R$ , and coding schemes of communicating digital messages $W : Ω \to {1, 2, \dots, 2^{n R}}$ , when the initial state the $S_{1} = s_{1}$ is known or not known to the encoder and the decoder.

1.4.1. The Cover and Pombra Characterizations of Capacity Pombra [5]

Cover and Pombra characterized the n–FTFI capacity for non-stationary and non-ergodic noise

V^{n}

, [5] (Equation (10)), by (we use

H (X)

to denote differential entropy of a continuous-valued RV X; hence, we indirectly assume the probability density functions exist)

\begin{matrix} C_{n}^{f b} (κ) \overset{▵}{=} & \max_{(B^{n}, K_{{\bar{Z}}^{n}}) : \frac{1}{n} t r \{E (X^{n} {(X^{n})}^{T})\} \leq κ} H (Y^{n}) - H (V^{n}) \end{matrix}

(33)

\begin{matrix} = & \max_{(B^{n}, K_{{\bar{Z}}^{n}}) : \frac{1}{n} t r \{B^{n} K_{V^{n}} {(B^{n})}^{T} + K_{{\bar{Z}}^{n}}\} \leq κ} \frac{1}{2} log \frac{| (B^{n} + I_{n \times n}) K_{V^{n}} {(B^{n} + I_{n \times n})}^{T} + K_{{\bar{Z}}^{n}} |}{| K_{V^{n}} |} \end{matrix}

(34)

where the distribution

P_{Y^{n}}

is induced by a jointly Gaussian channel input process

X^{n}

[5] (Equation (11)):

\begin{matrix} X_{t} = \sum_{j = 1}^{t - 1} B_{t, j} V_{j} + {\bar{Z}}_{t}, t = 1, \dots, n, \end{matrix}

(35)

\begin{matrix} X^{n} = B^{n} V^{n} + {\bar{Z}}^{n}, Y^{n} = (B^{n} + I_{n \times n}) V^{n} + {\bar{Z}}^{n}, \end{matrix}

(36)

\begin{matrix} {\bar{Z}}^{n} is jointly Gaussian, N (0, K_{{\bar{Z}}^{n}}), {\bar{Z}}^{n} is independent of V^{n}, \end{matrix}

(37)

\begin{matrix} X^{n} \overset{▵}{=} {[\begin{matrix} X_{1} & X_{2} & \dots & X_{n} \end{matrix}]}^{T} similarly for the rest, B^{n} lower diagonal matrix, \end{matrix}

(38)

\begin{matrix} \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} = \frac{1}{n} t r \{E (X^{n} {(X^{n})}^{T})\} \leq κ . \end{matrix}

(39)

The notation

N (0, K_{{\bar{Z}}^{n}})

means that the random variable

{\bar{Z}}^{n}

is jointly Gaussian with mean

E {{\bar{Z}}^{n}} = 0

and covariance matrix

K_{{\bar{Z}}^{n}} = E {{\bar{Z}}^{n} {({\bar{Z}}^{n})}^{T}}

, and

I_{n \times n}

denotes an n by the n identity matrix. Feedback capacity,

C^{f b} (κ)

, is characterized by the per unit time limit of the n–FTFI capacity [5] (Theorem 1).

\begin{matrix} C^{f b} (κ) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b} (κ) . \end{matrix}

(40)

Over the years, considerable efforts have been devoted to compute

C_{n}^{f b} (κ)

and

C^{f b} (κ)

[4,7,8,9,11,33], often under simplified assumptions on the channel noise. In addition, bounds are described in [27,28,29], while numerical methods are developed in [31] for time-invariant AGN channels, driven by stationary noise. In [4,7,11,33,34], the authors considered a variant of (40) by interchanging the per unit time limit and maximization operations under the following assumption: the joint process

(X^{n}, Y^{n}), n = 1, 2, \dots

is either jointly stationary or asymptotically stationary, and the joint distribution of the joint process

(X^{n}, Y^{n}), n = 1, 2, \dots

is time-invariant. We describe [4,7,8] below.

1.4.2. The Yang, Kavcic and Tatikonda [8] Characterization of Maximal Information Rate

In [8], the authors analyzed the feedback capacity of the AGN channel (1), driven by a stationary noise, described the power spectral density (PSD) functions

S_{V} (e^{j θ}), θ \in [- π, π]

:

\begin{matrix} S_{V} (e^{j θ}) \overset{▵}{=} K_{W} \frac{(1 - \sum_{k = 1}^{L} a (k) e^{j k θ}) (1 - \sum_{k = 1}^{L} a (k) e^{- j k θ})}{(1 - \sum_{k = 1}^{L} c (k) e^{j k θ}) (1 - \sum_{k = 1}^{L} c (k) e^{- j k θ})}, | c (k) | < 1, | a (k) | < 1, c (k) \neq a (k) . \end{matrix}

(41)

The analysis in [8] is based on time-domain methods and corresponds to the Case (II) formulation (see [8] (Section II; in particular, Section II.C, I–III, Theorem 1, Section III) by considering a specific time-invariant, stable, state space realization of the noise PSD (41), such that Conditions (II.1), and (II.2) hold, i.e.,

The initial state of the noise,

S_{1} = s

, is known to the encoder and the decoder, and the initial state and noise

(s, V^{t - 1})

uniquely define the noise state

S^{t}

, and vice-versa, for all t.

The time-domain characterization of feedback capacity, called the maximal information rate [8] (Theorem 6), corresponds to the supremum and limit being interchanged, and involves only one matrix Riccati equation of the linear filtering theory. However, ref. [8] (Theorem 6) does not state the conditions based on which the maximal information rate is valid (i.e., existence of asymptotic limit).

1.4.3. The Kim [4] Characterization of Feedback Capacity

The author in [4] also analyzed the feedback capacity of the AGN channel (1), driven by stationary noise described by the PSD (41) and by a time-invariant, stable, state space realization of the noise

V^{n}

(see [4] (Section VI)). A major point of confusion is that the characterization of feedback capacity in the time domain [4] (Theorem 6.1) does not state whether this corresponds to Case (I) or Case (II) formulations. The reader, however, can verify from [4] (Lemma 6.1 and comments above it) that the time-domain characterization of feedback capacity [4] (Theorem 6.1) corresponds to the Case (II) formulation, as stated in the study by Yang, Kavcic and Tatikonda [8]. In fact, the characterization of feedback capacity [4] (Theorem 6.1) involves only one Riccati equation of the linear filtering theory, as in [8]. We reconfirm this point at various parts of this paper (see, for example, Section 2.6).

1.4.4. The Gattami [7] Characterization of Feedback Capacity and Semi-Definite Progamming Formulation

The authors in [7] re-visited the feedback capacity of the AGN channel (1) driven by a stationary noise described by the PSD (41) and with a time-invariant, stable, state space realization of the noise

V^{n}

. One of the main results of [7] is the feedback capacity characterization for the Case (II) formulation, i.e., that involves only one matrix Riccati equation of the filtering theory, precisely as in [4,8]. Another main result of [7] is the re-formulation of the optimization problem using semi-definite programming.

In the following remark, we will provide additional insights into the results of [4,7,8].

Remark 1.

On the formulas in [4,7,8].

Refs. [4,7] considered the stable, time-invariant PO-SS realization of Definition 2, with $W_{t} : Ω \to R, N_{t} = 1, \forall t$ , i.e., $n_{w} = 1$ .

In [4], (Theorem 6.1) (see [4] (above and below Equation (70))), the asymptotic characterization of feedback capacity involves one filtering matrix ARE and is achieved by a channel input, which is different at times

n = 1, 2, \dots, n_{s}

from subsequent time

n = n_{s} + 1, \dots

, given by

\begin{matrix} X_{n} = Z_{n} \in N (0, K_{Z}), K_{Z} > 0, n = 1, \dots, n_{s}, \end{matrix}

(42)

\begin{matrix} X_{n} = χ (S_{n} - E \{S_{n} | Y^{n - 1}\}), n = n_{s} + 1, n_{s} + 2, \dots . \end{matrix}

(43)

where χ is a constant vector, and

Z_{n}, n = 1, 2, \dots, n_{s}

are IID Gaussian RVs.

In [7] (Theorem 3), the asymptotic characterization of feedback capacity involves one filtering matrix ARE and is incurred by the time-invariant channel input

\begin{matrix} X_{n} = χ (S_{n} - E \{S_{n} | Y^{n - 1}\}) + Z_{n}, Z_{n} \in N (0, K_{Z}), K_{Z} \geq 0, n = 1, 2, \dots \end{matrix}

(44)

Z_{n}, n = 1, 2, \dots

, are IID Gaussian RVs.

In [8], a state space realization of the PSD is considered, and the maximal information rate is presented, (Equation (125)) and [8] (Theorem 6, Equation (138),

I_{m a x}

), which involves one filtering matrix ARE. It is achieved by

\begin{matrix} X_{n} = χ (S_{n - 1} - E \{S_{n - 1} | Y^{n - 1}, S_{0} = s_{0}\}) + Z_{n}, X_{1} = Z_{1}, \end{matrix}

(45)

\begin{matrix} Z_{n} \in N (0, K_{Z}), K_{Z} \geq 0, n = 1, 2, \dots, \end{matrix}

(46)

where

Z_{n}, n = 1, 2, \dots

are IID Gaussian RVs.

The above references computed the feedback capacity and maximal information rate for the ARMA

(a, c), a \in [- 1, 1], c \in (- 1, 1), c \neq a

and arrived at the conclusion that it is precisely Butman’s [13] and Wolfowitz’s [14] lower bound.

It will become apparent that [4,7,8] arrived at the above expressions by considering problem

C^{f b, S} (κ, s)

, and not

C^{f b} (κ)

or

C^{f b} (κ, s)

.

Sequential equivalent characterizations of the Cover and Pombra [5] n–FTFI capacity,

C_{n}^{f b} (κ)

and

C^{f b} (κ)

capacity of the AGN channel (1) driven by a non-stationary and non-ergodic Gaussian noise

V^{n}

, as well as by time-varying, unstable (and stable) state space realizations of Definition 2, are derived in [1,2,3,10], including relations between Case (I) and Case (II) formulations. In particular, ref. [10] proved that

C_{n}^{f b} (κ)

of the AGN channel with state space noise of Definition 2 involves two matrix Riccati equations of the linear filtering theory and that, under certain conditions,

C_{n}^{f b} (κ)

reduces to

C_{n}^{f b, S} (κ, s)

, which involves only one matrix Riccati equation. Corresponding expressions are obtained for their per unit time asymptotic limits. The methods and results of [10] are generalized to multiple-input multiple-output (MIMO) Gaussian channels in [3]. Further, ref. [2] derived closed-form expressions of

C^{f b} (κ)

for AGN channels driven by the stable and unstable ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty), c \neq a

noise, showing the connection between Case (I) and Case (II) formulations. Ref. [3] generalized the earlier investigation in [1], which considered the autoregressive unit memory stable and unstable noise. An investigation of nonfeedback capacity of stable and unstable noise is outlined in [35]. The connection of ergodic theory and feedback capacity of unstable channels is discussed in [36,37].

MIMO Gaussian channels are also investigated in [38]. However, many of the expressions in [38] are previously obtained in [3,10]. The analysis in [38] does not include a derivation of the optimal channel input that achieves

C_{n}^{f b} (κ)

and

C^{f b} (κ)

, and does not discuss the connection between Case (I) and Case (II) formulations. The closed-form expressions of capacity of the examples in [38], are special cases of expressions in [1] and in [2], which treated the noise ARMA

(a, c), \forall a \in (- \infty, \infty), \forall c \in (- \infty, \infty), c \neq a

.

We structure this paper as follows.

In Section 2, we derive the new sequential characterizations of the n–FTFI capacity for the Cover and Pombra formulation of feedback capacity of the AGN channel (1), i.e., for the Case (I) formulation,

C_{n}^{f b} (κ)

. We also derive analogous sequential characterizations for

C_{n}^{f b} (κ, s)

and for the Case (II) formulation,

C_{n}^{f b, S} (κ, s)

, i.e., when Conditions 1 and 2 hold, to illustrate their fundamental differences.

In Section 3, we present the asymptotic analysis of feedback capacity for the Case (I) formulation. In Section 4, we treat the Case (II) formulation.

This paper contains several examples and makes comparisons to the existing literature.

2. Sequential Characterizations of n–FTFI Capacity for Case (I) Formulation

In this section, we derive equivalent sequential characterizations for the following:

(i)

C_{n}^{f b} (κ)

defined by (2) of the Case (I) formulation, i.e., for the Cover and Pombra n–FTFI capacity characterization (34);

(ii)

C_{n}^{f b} (κ, s)

defined by (3), as a degenerated case of

C_{n}^{f b} (κ)

;

(iii)

C_{n}^{f b, S} (κ, s)

defined by (4) of the Case (II) formulation, as a degenerated case of

C_{n}^{f b} (κ)

.

We organize the presentation of the material as follows.

(1) Section 2.1. Here, we introduce our notation.

(2) Section 2.2. The main result is Theorem 1, which provides an equivalent sequential characterization of the Cover and Pombra characterization of the n–FTFI capacity,

C_{n}^{f b} (κ)

, i.e., of (33) and (34). Our derivation proceeds as follows. We apply a linear transformation to the Cover and Pombra Gaussian optimal channel input

X^{n}

(35), to represent

X_{t}

, with a linear function of

(V^{t - 1}, Y^{t - 1})

or equivalently

(X^{t - 1}, Y^{t - 1})

and an orthogonal Gaussian innovations process

Z_{t}

, which is independent of

(Z^{t - 1}, X^{t - 1}, V^{t - 1}, Y^{t - 1})

for

t = 1, \dots, n

.

Subsequently, we apply Theorem 1 to the time-varying PO-SS $(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}, d_{t}^{1}, d_{t}^{2})$ noise (see Example 1) to the non-stationary autoregressive moving average, ARMA $(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)$ noise, and to the stationary ARMA $(a, c), a \in (- 1, 1), c \in (- 1, 1)$ noise (see Example 2), which is found in many references, such as [4]. It will become apparent that our characterizations of n–FTFI capacity are fundamentally different from past studies.

(3) Section 2.3. The main result is Theorem 3, which provides a simplified characterization of the sequential characterization of the n–FTFI capacity,

C_{n}^{f b} (κ)

, given in Theorem 1 (i.e., the equivalent of (34)), for the time-varying AGN channel (1) driven by the PO-SS realization of Definition 2, for the code of Definition 1. The n–FTFI capacity of Theorem 3 is expressed in terms of solutions to two DREs. Our derivation is based on identifying a finite-dimensional sufficient statistic to express

X_{t}

as a functional of the sufficient statistic, instead of

(V^{t - 1}, Y^{t - 1})

or

(X^{t - 1}, Y^{t - 1})

, and an orthogonal Gaussian innovations process.

(4) Section 2.4. The main results is Corollary 6, which is an application of Theorem 3 (i.e., the sufficient statistic representation), to the ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Example 2. This example shows that the n–FTFI capacity is expressed in terms of solutions to two DREs.

From Corollary 6, the following will become apparent:

(i) Neither the time-domain characterization [4] (Theorem 6.1) (see [4]) (Theorem 5.3) nor the frequency domain characterization [4] (Theorem 4.1) correspond to the Cover and Pombra characterization of feedback capacity.

(5) Section 2.5. The main results is Corollary 7, which gives the n–FTFI capacity for the Case (II) formulation, as a degenerate case of the Case (I) formulation, i.e., of Theorem 3.

(6) Section 2.6. The main result is Proposition 2, which further clarifies the following.

(i) The formulation of [8] and the formulation that led to [4] (Theorem 6.1) are based on the Case (II) formulation, as well as (ii) some of the oversights in [4,7,9,11,12].

2.1. Notation

Throughout this paper, we use the following notation:

$Z \overset{▵}{=} {\dots, - 1, 0, 1, \dots}, Z_{+} \overset{▵}{=} {1, \dots}, Z_{+}^{n} \overset{▵}{=} {1, 2, \dots, n}$ , where n is a finite positive integer.
$R \overset{▵}{=} (- \infty, \infty)$ , and $R^{m}$ is the vector space of tuples of the real numbers for an integer $n \in Z_{+}$ .
$C \overset{▵}{=} {a + j b : (a, b) \in R \times R}$ is the space of complex numbers.
$R^{n \times m}$ is the set of n by m matrices with entries from the set of real numbers for integers $(n, m) \in Z_{+} \times Z_{+}$ .
$D_{o} \overset{▵}{=} \{c \in C : | c | < 1\}$ is the open unit disc of the space of complex number $C$ .
$S_{+}^{n \times n}, n \in Z_{+}$ (resp. $S_{+ +}^{n \times n}$ ) denotes the set of positive semidefinite (resp. positive definite) symmetric matrices with elements in the real numbers and of size $n \times n$ . Thus, $A \in S_{+}^{n \times n}$ if for all $w \in R^{n}$ , $w^{T} A w \geq 0$ . Positive semidefiniteness is denoted by $A ⪰ 0$ and (strict) positive definiteness by $A ≻ 0$ . $I_{n \times n} \in S_{+ +}^{n \times n}, n \in Z_{+}$ denotes the identity matrix, $t r (A)$ denotes the trace of any matrix $A \in R^{n \times n}, n \in Z_{+}$ .
$s p e c (A) \subset C$ is the Spectrum of a matrix $A \in R^{q \times q}, q \in Z_{+}$ (the set of all its eigenvalues). A matrix $A \in R^{q \times q}$ is called exponentially stable if all its eigenvalues are within the open unit disc, i.e., $s p e c (A) \subset D_{o}$ .
$(Ω, F, P)$ denotes a probability space. Given a random variable $X : Ω \to R^{n_{x}}, n_{x} \in Z_{+}^{n}$ , its induced distribution on $R^{n_{x}}$ is denoted by $P_{X}$ .
$P_{X} \in N (μ_{X}, K_{X}), K_{X} ⪰ 0$ denotes a Gaussian distributed RV X, with mean value $μ_{X}$ and covariance matrix $K_{X} = c o v (X, X) ⪰ 0$ , defined by

$\begin{matrix} μ_{X} \overset{▵}{=} E {X}, K_{X} = c o v (X, X) \overset{▵}{=} E \{(X - E \{X\}) {(X - E \{X\})}^{T}\} . \end{matrix}$

(47)

Given another Gaussian random variable $Y : Ω \to R^{n_{y}}, n_{y} \in Z_{+}^{n}$ , which is jointly Gaussian distributed with X, i.e., the joint distribution is $P_{X, Y}$ , the conditional covariance of X given Y is defined by

$\begin{matrix} K_{X | Y} = c o v (X, X | Y) \overset{▵}{=} & E \{(X - E \{X | Y\}) {(X - E \{X | Y\})}^{T} | Y\} \end{matrix}$

(48)

$\begin{matrix} = & E \{(X - E \{X | Y\}) {(X - E \{X | Y\})}^{T}\} \end{matrix}$

(49)

where the last equality is due to a property of jointly Gaussian distributed RVs.
Given three arbitrary RVs $(X, Y, Z)$ with induced distribution $P_{X, Y, Z}$ , the RVs $(X, Z)$ are called conditionally independent given the RV Y if $P_{Z | X, Y} = P_{Z | Y}$ . This conditional independence is often denoted by $X \leftrightarrow Y \leftrightarrow Z$ , which is a Markov chain.

2.2. Preliminary Characterizations of n–FTFI Capacity of AGN Channels Driven by Correlated Noise

We start with preliminary calculations for the feedback code of Definition 1, which we use to prove Theorem 1. These calculations are introduced for the sake of clarity and to establish our notation.

For the feedback code of Definition 1, by the channel definition (1), i.e., (18), the conditional distribution of $Y_{t}$ , given $Y^{t - 1} = y^{t - 1}, X^{t} = x^{t}$ , is

$\begin{matrix} P \{Y_{t} \in d y | Y^{t - 1} = y^{t - 1}, X^{t} = x^{t}\} \overset{(a)}{=} & P \{Y_{t} \in d y | Y^{t - 1} = y^{t - 1}, X^{t} = x^{t}, V^{t - 1} = v^{t - 1}\} \end{matrix}$

$\begin{matrix} \overset{(b)}{=} & P_{V_{t} | V^{t - 1}} (v_{t} : x_{t} + v_{t} \in d y), t = 2, \dots, n \end{matrix}$

(50)

$\begin{matrix} = & P_{Y_{t} | X_{t}, V^{t - 1}} \end{matrix}$

(51)

$\begin{matrix} \equiv & P_{t} (d y | x_{t}, v^{t - 1}), \end{matrix}$

(52)

$\begin{matrix} P \{Y_{1} \in d y | Y^{0} = y^{0}, X^{1} = x^{1}\} = & P_{Y_{1} | X_{1}} \equiv P_{1} (d y | x_{1}) . \end{matrix}$

(53)

where $(a)$ is due to (1) and $(b)$ is due to (18). We introduce the set of channel input distributions with feedback, which are consistent with the code of Definition 1, not necessarily generated by the messages W, as follows:

$\begin{matrix} P_{[1, n]} (κ) \overset{▵}{=} \{P_{t} (d x_{t} | x^{t - 1}, y^{t - 1}) \overset{▵}{=} P_{X_{t} | X^{t - 1}, Y^{t - 1}}, t = 1, \dots, n : \frac{1}{n} E^{P} (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ\} . \end{matrix}$

(54)

By Definition 1, we have $E_{[1, n]} (κ) \subseteq P_{[1, n]} (κ)$ . Moreover, by the channel definition, any pair of the sequence triple $(V^{t}, X^{t}, Y^{t})$ uniquely defines the remaining sequence. Thus, the identity holds:

$\begin{matrix} {\bar{P}}_{[1, n]} (κ) \overset{▵}{=} \{{\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n : \frac{1}{n} E^{\bar{P}} (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ\} = P_{[1, n]} (κ) . \end{matrix}$

(55)

We also emphasize that, by Definition 1, for a given feedback encoder strategy $e (\cdot) \in E_{[1, n]} (κ)$ , i.e., $x_{1} = e_{1} (w), x_{2} = e_{2} (w, x_{1}, y_{1}), \dots, x_{n} = e_{n} (w, x^{n - 1}, y^{n - 1})$ the conditional distributions of $Y_{t}$ given $(Y^{t - 1}, W) = (y^{t - 1}, w)$ is obtained as follows:

$\begin{matrix} P_{Y_{t} | W, Y^{t - 1}}^{e} (d y_{t} |, y^{t - 1}, w) \overset{(a)}{=} & P_{t} (d y_{t} | {e_{j} (w, x^{j - 1}, y^{j - 1}) : j = 1, \dots, t}, y^{t - 1}, w) \end{matrix}$

(56)

$\begin{matrix} \overset{(b)}{=} & P_{t} (d y_{t} | {e_{j} (w, x^{j - 1}, y^{j - 1}) : j = 1, \dots, t}, y^{t - 1}, v^{t - 1}, w) \end{matrix}$

(57)

$\begin{matrix} \overset{(c)}{=} & P_{t} (d y_{t} | {e_{j} (w, x^{j - 1}, y^{j - 1}) : j = 1, \dots, t}, v^{t - 1}, w) \end{matrix}$

(58)

$\begin{matrix} \overset{(d)}{=} & P_{t} (d y_{t} | {e_{j} (w, x^{j - 1}, y^{j - 1}) : j = 1, \dots, t}, v^{t - 1}) \end{matrix}$

(59)

$\begin{matrix} \overset{(e)}{=} & P_{t} (d y_{t} | e_{t} (w, x^{t - 1}, y^{t - 1}), v^{t - 1}) \end{matrix}$

(60)

$(a)$ is due to knowledge of the distribution of the strategies $e_{j} (\cdot), j = 1, \dots, t$ , the code definition, and the recursive substitution, $x_{1} = e_{1} (w), x_{2} = e_{2} (w, x_{1}, y_{1}), \dots, e_{t} (w, x^{t - 1}, y^{t - 1})$ , where $x^{t - 1}$ is specified by the knowledge of the strategies, $e_{j} (\cdot), j = 1, \dots, t - 1$ and the knowledge of $(y^{t - 2}, w)$ ,
$(b)$ is due to knowing $x_{j} = e_{j} (w, x^{j - 1}, y^{j - 1}), y_{j}, j = 1, \dots, t - 1$ specifies $v_{j} = y_{j} - x_{j}, j = 1, \dots, t - 1$ ,
$(c)$ is due to the fact that, any pair of the triple $(x^{t}, y^{t}, v^{t})$ specifies the remaining sequence, i.e., knowing $(x^{t - 1}, v^{t - 1})$ specifies $y^{t - 1}$ , and $y^{t - 1}$ is thus redundant,
$(d)$ is due to the conditional independence $P_{V_{t} | V^{t - 1}, X^{t}, W} = P_{V_{t} | V^{t - 1}, X^{t}}$ ,
$(e)$ is due to (18), i.e., $P_{V_{t} | V^{t - 1}, X^{t}} = P_{V_{t} | V^{t - 1}}$ , and the channel definition.

By the channel definition

Y_{t} = X_{t} + V_{t}, t = 1, \dots, n

, each

e (\cdot) \in E_{[0, n]} (κ)

is also expressed as

\begin{matrix} x_{1} = e_{1} (w) = {\bar{e}}_{1} (w), x_{2} = e_{2} (w, x_{1}, y_{1}) = {\tilde{e}}_{2} (w, x_{1}, v_{1}, y_{1}) \overset{(a)}{=} {\bar{e}}_{2} (w, v_{1}, y_{1}), \dots, \\ x_{n} = e_{n} (w, x^{n - 1}, y^{n - 1}) = {\tilde{e}}_{n} (w, x^{n - 1}, v^{n - 1}, y^{n - 1}) \overset{(a)}{=} {\bar{e}}_{n} (w, v^{n - 1}, y^{n - 1}), w \in M^{(n)} . \end{matrix}

(61)

where

(a)

is due to the channel definition—i.e., the presence of

x^{t - 1}

in

{\tilde{e}}_{t} (\cdot, v^{t - 1}, \cdot)

can be removed, since it is redundant—and specified by

(v^{t - 1}, y^{t - 1})

. Consequently, we have the identity

\begin{matrix} {\bar{E}}_{[1, n]} (κ) ≜ & {x_{1} = {\bar{e}}_{1} (w), x_{2} = {\bar{e}}_{2} (w, v_{1}, y_{1}) \dots, x_{n} = {\bar{e}}_{n} (w, v^{n - 1}, y^{n - 1}) : \\ \frac{1}{n} E^{\bar{e}} (\sum_{i = 1}^{n} {(X_{t})}^{2}) \leq κ} = E_{[1, n]} (κ) . \end{matrix}

(62)

Notation 1.

For the feedback code of Definition 3, with initial state

S_{1} = s

, known to the encoder and the decoder, the above sets

P_{[1, n]} (κ), {\bar{P}}_{[1, n]} (κ), E_{[1, n]}, {\bar{E}}_{[1, n]}

are replaced by

P_{[0, n]}^{s} (κ), {\bar{P}}_{[1, n]}^{s} (κ)

,

E_{[1, n]}^{s}, {\bar{E}}_{[1, n]}^{s}

to indicate that the distributions and codes are

{\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}, s), t = 1, \dots, x_{1} = {\bar{e}}_{1} (w, s), x_{2} = {\bar{e}}_{2} (w, v_{1}, y_{1}, s) \dots, x_{n} = {\bar{e}}_{n} (w, v^{n - 1}, y^{n - 1}, s)

, etc., and these depend on s.

In the next theorem, we present our preliminary equivalent sequential characterization of the Cover and Pombra characterization

C_{n}^{f b} (κ)

, i.e., of (33), under encoder strategies

E_{[0, n]} (κ) = {\bar{E}}_{[0, n]} (κ)

and channel input distributions

P_{[0, n]} (κ) = {\bar{P}}_{[0, n]} (κ)

. Unlike the Cover and Pombra [5] realization of

X^{n}

, given by (35), at each time t,

X_{t}

is driven by an orthogonal Gaussian process

Z_{t}

.

Theorem 1.

Information structures of maximizing distributions for AGN Channels

Consider the AGN channel (1), i.e., with noise distribution $P_{V^{n}}$ , and the code of Definition 1. Then, the following hold:
(a) The following inequality holds:

$\begin{matrix} \sup_{{\bar{E}}_{[1, n]} (κ)} \sum_{t = 1}^{n} H^{\bar{e}} (Y_{t} | Y^{t - 1}) \leq \sup_{{\bar{P}}_{[1, n]} (κ)} \sum_{t = 1}^{n} H^{\bar{P}} (Y_{t} | Y^{t - 1}) \end{matrix}$

(63)

where the conditional (differential) entropy $H^{\bar{e}} (Y_{t} | Y^{t - 1})$ is evaluated with respect to the probability distribution $P_{t}^{\bar{e}} (d y_{t} | y^{t - 1})$ , defined by

$\begin{matrix} P_{t}^{\bar{e}} (d y_{t} | y^{t - 1}) = \int P_{t} (d y_{t} | {\bar{e}}_{t} (w, v^{t - 1}, y^{t - 1}), v^{t - 1}) P_{t}^{\bar{e}} (d w | y^{t - 1}), t = 0, \dots, n . \end{matrix}$

(64)

and $H^{\bar{P}} (Y_{t} | Y^{t - 1})$ is evaluated with respect to the probability distribution $P_{t}^{\bar{P}} (d y_{t} | y^{t - 1})$ , defined by

$\begin{matrix} P_{t}^{\bar{P}} (d y_{t} | y^{t - 1}) = \int P_{t} (d y_{t} | x_{t}, v^{t - 1}) P_{t}^{\bar{P}} (d x_{t} | v^{t - 1}, y^{t - 1}) P_{t}^{\bar{P}} (d v^{t - 1} | y^{t - 1}), t = 0, \dots, n . \end{matrix}$

(65)

(b) The optimal channel input distribution ${\bar{P} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n} \in {\bar{P}}_{[1, n]} (κ)$ , which maximizes $\sum_{t = 1}^{n} H^{\bar{P}} (Y_{t} | Y^{t - 1})$ of part (a), i.e., the right-hand side of (63), is induced by an input process $X^{n}$ , which is conditionally Gaussian, with linear conditional mean, nonrandom conditional covariance and given by

$\begin{matrix} E^{\bar{P}} \{X_{t} | V^{t - 1}, Y^{t - 1}\} = \{\begin{matrix} Γ_{t}^{1} V^{t - 1} + Γ_{t}^{2} Y^{t - 1}, & f o r & t = 2, \dots, n \\ 0, & f o r & t = 1, \end{matrix} \end{matrix}$

(66)

$\begin{matrix} K_{X_{t} | V^{t - 1}, Y^{t - 1}} \overset{▵}{=} c o v (X_{t}, X_{t} | V^{t - 1}, Y^{t - 1}) = K_{Z_{t}} ⪰ 0, t = 1, \dots, n \end{matrix}$

(67)

and such that the average constraint holds and (18) is respected.
(c) The optimal channel input distribution ${\bar{P} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n} \in {\bar{P}}_{[1, n]} (κ)$ of part (b) is induced by a jointly Gaussian process $X^{n}$ , with a realization given by

$\begin{matrix} X_{t} = \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 1} Γ_{t, j}^{2} Y_{j} + Z_{t}, X_{1} = Z_{1}, t = 2, \dots, n, \end{matrix}$

(68)

$\begin{matrix} = Γ_{t}^{1} V^{t - 1} + Γ_{t}^{2} Y^{t - 1} + Z_{t}, \end{matrix}$

(69)

$\begin{matrix} Z_{t} \in N (0, K_{Z_{t}}), t = 1, \dots, n a G a u s s i a n s e q u e n c e, \end{matrix}$

(70)

$\begin{matrix} Z_{t} i n d e p e n d e n t o f (V^{t - 1}, X^{t - 1}, Y^{t - 1}), t = 1, \dots, n, \end{matrix}$

(71)

$\begin{matrix} Z^{n} i n d e p e n d e n t o f V^{n}, \end{matrix}$

(72)

$\begin{matrix} \frac{1}{n} E^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ, \end{matrix}$

(73)

$\begin{matrix} (Γ_{t}^{1}, Γ_{t}^{2}, K_{Z_{t}}) \in (- \infty, \infty) \times (- \infty, \infty) \times [0, \infty) n o n r a n d o m . \end{matrix}$

(74)

(d) An equivalent characterization of the n–FTFI capacity $C_{n}^{f b} (κ)$ , defined by (33) and (34), is given by

$\begin{matrix} C_{n}^{f b} (κ) = & \sup_{\frac{1}{n} E^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} H^{\bar{P}} (Y_{t} | Y^{t - 1}) - H (V^{n}) \end{matrix}$

(75)

$\begin{matrix} = & \sup_{\frac{1}{n} E^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} \{H^{\bar{P}} (I_{t}) - H ({\hat{I}}_{t})\} \end{matrix}$

(76)

where $I_{t} a n d {\hat{I}}_{t}$ are innovations processes defined by

$\begin{matrix} I_{t} \overset{▵}{=} Y_{t} - E^{\bar{P}} \{Y_{t} | Y^{t - 1}\}, {\hat{I}}_{t} \overset{▵}{=} V_{t} - E \{V_{t} | V^{t - 1}\} \end{matrix}$

(77)

where parts (b) and (c) hold, and where the supremum is over all $(Γ_{t}^{1}, Γ_{t}^{2}, K_{Z_{t}}), t = 1, \dots, n$ of the realization of part (c), which induces the distribution ${\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n$ .

Proof.

See Appendix A.1. □

Remark 2.

For the code of Definition 3 that assumes knowledge of the initial state

S_{1} = s

, it is easy to verify that

C_{n}^{f b} (κ, s)

is directly obtained from Theorem 1, as a degenerate case (an independent derivation is easily produced following the derivation of Corollary 11, with slight variations).

By utilizing Theorem 1, we can derive the converse coding theorems stated below for the feedback codes of Definitions 1 and 3.

Theorem 2.

Converse coding theorems for codes of Definitions 1 and 3

Consider the AGN channel (1).

(a) Any achievable rate R for the code of Definition 1 satisfies

$\begin{matrix} R \leq & C^{f b} (κ) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b} (κ), \end{matrix}$

(78)

$\begin{matrix} C_{n}^{f b} (κ) = & \sup_{{\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n : \frac{1}{n} E^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} H^{\bar{P}} (Y_{t} | Y^{t - 1}) - H (V^{n}) \end{matrix}$

(79)

provided the supremum exists and the limit exists, where the right-hand side of (79) is given in Theorem 1(d).
(b) Any achievable rate R for the code of Definition 3 (with initial state $S_{1} = s$ ) satisfies

$\begin{matrix} R \leq & C^{f b} (κ, s) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b} (κ, s), \end{matrix}$

(80)

$\begin{matrix} C_{n}^{f b} (κ, s) = & \sup_{{\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}, s), t = 1, \dots, n : \frac{1}{n} E_{s}^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2} | S_{1}\} \leq κ} \sum_{t = 1}^{n} H^{\bar{P}} (Y_{t} | Y^{t - 1}, s) - H (V^{n} | s) . \end{matrix}$

(81)

where $E_{s}^{\bar{P}} {\cdot}$ means the expectation is for a fixed $S_{1} = s$ , provided the supremum exists and the limit exists, and where the right-hand side of (81) is obtained from Theorem 1, part (d), by replacing all conditional distributions, entropies, etc., for fixed initial state $S_{1} = s$ (see Notation 1).

Proof.

Following on from standard arguments, we use Fano’s inequality (see also [5]) and Theorem 1. □

In the next remark, we clarify the equivalence of Theorem 1(d) to Cover and Pombra [5].

Remark 3.

Relation of Theorem 1 and Cover and Pombra [5]

(a) From the realization of $X^{n}$ given by (68), we can recover the Cover and Pombra [5] realization (35) by recursive substitution of $Y^{t - 1}$ into the right-hand side of (68), as follows:

$\begin{matrix} X_{t} = & \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 1} Γ_{t, j}^{2} Y_{j} + Z_{t} \end{matrix}$

(82)

$\begin{matrix} = & \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 2} Γ_{t, j}^{2} Y_{j} + Γ_{t, t - 1}^{2} (X_{t - 1} + Z_{t - 1}) + Z_{t} \end{matrix}$

(83)

$\begin{matrix} = & \sum_{j = 1}^{t - 1} B_{t, j} V_{j} + {\bar{Z}}_{t}, b y r e c u r s i v e s u b s t i t u t i o n o f X_{1}, \dots, X_{t - 1}, Y_{1}, \dots, Y_{t - 2} \end{matrix}$

(84)

for some ${\bar{Z}}_{t} \in (0, K_{{\bar{Z}}_{t}})$ which is jointly correlated, and some nonrandom $B_{t, j}$ , as given by (35) and (36).
(b) Unlike the Cover and Pombra [5] realization of $X^{n}$ , i.e., (35), the realization of $X^{n}$ given by (68) or in vector form by (69) is such that, at each time t, $X_{t}$ depends on $(V^{t - 1}, Y^{t - 1}, Z_{t})$ or in vector form on $(V^{t - 1}, Y^{t - 1}, Z_{t})$ , where $Z^{t}$ is an innovations or orthogonal process, i.e., (71) holds.
(c) In subsequent parts of the paper, we derive an equivalent sequential characterization of the Cover and Pombra n–FTFI capacity (34), which is simplified further with the use of a sufficient statistic (that satisfies a Markov recursion).

To characterize

C_{n}^{f b} (κ)

using Theorem 1, part (d), we need to compute the (differential) entropy

H (V^{n})

of

V^{n}

. The following lemma is useful in this respect:

Lemma 1.

Entropy

H (V^{n})

calculation from generalized Kalman filter of the PO-SS noise realization.

Consider the PO-SS realization of

V^{n}

of Definition 2. Define the conditional covariance and conditional mean of

S_{t}

given

V^{t - 1}

by

\begin{matrix} Σ_{t} \overset{▵}{=} & c o v (S_{t}, S_{t} | V^{t - 1}) = E \{(S_{t} - {\hat{S}}_{t}) {(S_{t} - {\hat{S}}_{t})}^{T} | V^{t - 1}\}, {\hat{S}}_{t} \overset{▵}{=} E \{S_{t} | V^{t - 1}\}, t = 2, \dots, n, \end{matrix}

(85)

\begin{matrix} Σ_{1} \overset{▵}{=} & c o v (S_{1}, S_{1}) = K_{S_{1}}, {\hat{S}}_{1} \overset{▵}{=} μ_{S_{1}} . \end{matrix}

(86)

Then, the following hold:

(a) The conditional distribution of $V_{t}$ conditioned on $V^{t - 1}$ is Gaussian, i.e.,

$\begin{matrix} P_{V_{t} | V^{t - 1}} \in N (μ_{V_{t} | V^{t - 1}}, K_{V_{t} | V^{t - 1}}), t = 1, \dots, n \end{matrix}$

(87)

where $μ_{V_{t} | V^{t - 1}} \overset{▵}{=} E \{V_{t} | V^{t - 1}\}, K_{V_{t} | V^{t - 1}} \overset{▵}{=} c o v (V_{t}, V_{t} | V^{t - 1})$ .
(b) The conditional mean and covariance $μ_{V_{t} | V^{t - 1}}, K_{V_{t} | V^{t - 1}}$ are given by the generalized Kalman filter recursions, as follows:
(i) The optimal mean square error estimate ${\hat{S}}_{t}$ satisfies the generalized Kalman filter recursion

$\begin{matrix} {\hat{S}}_{t + 1} = A_{t} {\hat{S}}_{t} + M_{t} (Σ_{t}) {\hat{I}}_{t}, {\hat{S}}_{1} = μ_{S_{1}}, \end{matrix}$

(88)

$\begin{matrix} M_{t} (Σ_{t}) \overset{▵}{=} (A_{t} Σ_{t} C_{t}^{T} + B_{t} K_{W_{t}} N_{t}^{T}) {(N_{t} K_{W_{t}} N_{t}^{T} + C_{t} Σ_{t} C_{t}^{T})}^{- 1}, \end{matrix}$

(89)

$\begin{matrix} {\hat{I}}_{t} \overset{▵}{=} V_{t} - E \{V_{t} | V^{t - 1}\} = V_{t} - C_{t} {\hat{S}}_{t} = C_{t} (S_{t} - {\hat{S}}_{t}) + N_{t} W_{t}, t = 1, \dots, n, \end{matrix}$

(90)

$\begin{matrix} {\hat{I}}_{t} \in N (0, K_{{\hat{I}}_{t}}), t = 1, \dots, n i s a n o r t h o g o n a l i n n o v a t i o n s p r o c e s s, i . e ., {\hat{I}}_{t} i s i n d e p e n d e n t o f \\ {\hat{I}}_{s}, f o r a l l t \neq s, a n d {\hat{I}}_{t} i s i n d e p e n d e n t o f V^{t - 1}, \end{matrix}$

(91)

$\begin{matrix} K_{{\hat{I}}_{t}} \overset{▵}{=} c o v ({\hat{I}}_{t}, {\hat{I}}_{t}) = C_{t} Σ_{t} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T} . \end{matrix}$

(92)

(ii) The error $E_{t} \overset{▵}{=} S_{t} - {\hat{S}}_{t}$ satisfies the recursion

$\begin{matrix} E_{t + 1} = & M_{t}^{C L} (Σ_{t}) E_{t} + (B_{t} - M (Σ_{t}) N_{t}) W_{t}, E_{1} = S_{1} - {\hat{S}}_{1}, t = 1, \dots, n, \end{matrix}$

(93)

$\begin{matrix} M_{t}^{C L} (Σ_{t}) \overset{▵}{=} & A_{t} - M_{t} (Σ_{t}) C_{t} . \end{matrix}$

(94)

(iii) The covariance of the error is such that $E \{E_{t} E_{t}^{T}\} = Σ_{t}$ and satisfies the generalized matrix DRE

$\begin{matrix} Σ_{t + 1} = & A_{t} Σ_{t} A_{t}^{T} + B_{t} K_{W_{t}} B_{t}^{T} - (A_{t} Σ_{t} C_{t}^{T} + B_{t} K_{W_{t}} N_{t}^{T}) {(N_{t} K_{W_{t}} N_{t}^{T} + C_{t} Σ_{t} C_{t}^{T})}^{- 1} \\ . {(A_{t} Σ_{t} C_{t}^{T} + B_{t} K_{W_{t}} N_{t}^{T})}^{T}, t = 1, \dots, n, Σ_{1} = K_{S_{1}} ⪰ 0, Σ_{t} ⪰ 0 . \end{matrix}$

(95)

(iv) The conditional mean and covariance $μ_{V_{t} | V^{t - 1}}, K_{V_{t} | V^{t - 1}}$ are given by

$\begin{matrix} μ_{V_{t} | V^{t - 1}} = & C_{t} {\hat{S}}_{t}, t = 1, \dots, n, \end{matrix}$

(96)

$\begin{matrix} K_{V_{t} | V^{t - 1}} = & K_{{\hat{I}}_{t}} = C_{t} Σ_{t} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T}, t = 1, \dots, n . \end{matrix}$

(97)

(v) The entropy of $V^{n}$ is given by

$\begin{matrix} H (V^{n}) = \sum_{t = 1}^{n} H ({\hat{I}}_{t}) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e [C_{t} Σ_{t} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T}]) \end{matrix}$

(98)

Proof.

(a,b).(i–iv). The generalized Kalman filter of the PO-SS realization of

V^{n}

and accompanied statements can be found in many textbooks, i.e., [16]. However, it is noted that

{\hat{I}}_{t}, t = 2, \dots, n

,

{\hat{I}}_{1} = V_{1}

are all independent Gaussian. For example, to show (93), we write the recursion for

E_{t} = S_{t} - {\hat{S}}_{t}

using part (i) and the realization of

S_{t}

, part (b). (v) By the chain rule of joint entropy, we have

\begin{matrix} H (V^{n}) = & H (V_{1}) + \sum_{t = 2}^{n} H (V_{t} | V^{t - 1}) \end{matrix}

(99)

\begin{matrix} = & H (V_{1}) + \sum_{t = 2}^{n} H (V_{t} - E \{V_{t} | V^{t - 1}\} | V^{t - 1}) \end{matrix}

(100)

\begin{matrix} = & H (V_{1}) + \sum_{t = 2}^{n} H ({\hat{I}}_{t}), by orthogonality of {\hat{I}}_{t} \overset{▵}{=} V_{t} - E \{V_{t} | V^{t - 1}\} and V^{t - 1} \end{matrix}

(101)

From (101) and (97), we have (98) from the entropy formula of Gaussian RVs. □

The next corollary of the entropy

H (V^{n} | s)

follows directly from Lemma 1 when

S_{1} = s

is fixed.

Corollary 1.

Conditional entropy

H (V^{n} | s), S_{1} = s

of the PO-SS noise realization.

Consider the PO-SS realization of

V^{n}

of Definition 2, for fixed

S_{1} = s

, and denote the state process generated by recursion (19) via (we often use the notation

S_{t} = S_{t}^{s}

to emphasize that the

S_{t}

process is generated for

S_{1} = S_{1}^{s} = s

fixed)

S_{t} = S_{t}^{s}, t = 2, \dots, n, S_{1} = S_{1}^{s} = s

. Replace the conditional covariance and conditional mean (85) and (86) by

\begin{matrix} Σ_{t}^{s} \overset{▵}{=} & c o v (S_{t}^{s}, S_{t}^{s} | V^{t - 1}, S_{1}^{s}) = E \{(S_{t}^{s} - {\hat{S}}_{t}^{s}) {(S_{t}^{s} - {\hat{S}}_{t}^{s})}^{T} | V^{t - 1}, S_{1}^{s}\}, \end{matrix}

(102)

\begin{matrix} {\hat{S}}_{t}^{s} \overset{▵}{=} & E \{S_{t}^{s} | V^{t - 1}, S_{1}^{s}\}, t = 2, \dots, n, S_{1}^{s} = s, {\hat{S}}_{1}^{s} \overset{▵}{=} s, Σ_{1}^{s} \overset{▵}{=} c o v (S_{1}^{s}, S_{1}^{s} | S_{1}^{s}) = 0 . \end{matrix}

(103)

Then, all statements of Lemma 1 hold, with the following changes:

\begin{matrix} Σ_{t} ⟼ Σ_{t}^{s}, Σ_{1}^{s} = 0, P_{V_{t} | V^{t - 1}} ⟼ P_{V_{t} | V^{t - 1}, S_{1}^{s}}, {\hat{S}}_{t} ⟼ {\hat{S}}_{t}^{s}, {\hat{S}}_{1}^{s} = s, e t c, t = 1, \dots, n . \end{matrix}

(104)

In particular, the conditional entropy of the

V^{n}

conditioned on

S_{1} = S_{1}^{s} = s

is given by

\begin{matrix} H (V^{n} | s) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e [C_{t} Σ_{t}^{s} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T}]) \end{matrix}

(105)

where

Σ_{t}^{s}, t = 2, \dots, n

satisfies the generalized DRE (95) with initial condition

Σ_{1}^{s} = 0

.

Proof.

We continue directly from Lemma 1 and (102), (103). □

Next, we introduce an example of a PO-SS realization of the noise that we often use in this paper.

Example 1.

A time-varying PO-SS

(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}, d_{t}^{1}, d_{t}^{2})

noise realization is defined by

\begin{matrix} S_{t + 1} = a_{t} S_{t} + b_{t}^{1} W_{t}^{1} + b_{t}^{2} W_{t}^{2}, t = 1, 2, \dots, n - 1 \end{matrix}

(106)

\begin{matrix} V_{t} = c_{t} S_{t} + d_{t}^{1} W_{t}^{1} + d_{t}^{2} W_{t}^{2}, t = 1, \dots, n, \end{matrix}

(107)

\begin{matrix} S_{1} \in N (μ_{S_{1}}, K_{S_{1}}), K_{S_{1}} \geq 0, W_{t}^{i} \in N (0, K_{W_{t}^{i}}), K_{W_{t}^{i}} \geq 0, i = 1, 2, t = 1, \dots, n, \end{matrix}

(108)

\begin{matrix} W^{1, n} a n d W^{2, n} i n d e p . s e q . a n d i n d e p . o f S_{1}, \end{matrix}

(109)

\begin{matrix} a_{t} \in R, c_{t} \in R, b_{t}^{i} \in R, d_{t}^{i} \in R, i = 1, 2, \forall t a r e n o n r a n d o m, \end{matrix}

(110)

\begin{matrix} b_{t} \circ b_{t} \overset{▵}{=} {(b_{t}^{1})}^{2} K_{W_{t}^{1}} + {(b_{t}^{2})}^{2} K_{W_{t}^{2}}, b_{t} \circ d_{t} \overset{▵}{=} b_{t}^{1} K_{W_{t}^{1}} d_{t}^{1} + b_{t}^{2} K_{W_{t}^{2}} d_{t}^{2}, \end{matrix}

(111)

\begin{matrix} d_{t} \circ d_{t} \overset{▵}{=} {(d_{t}^{1})}^{2} K_{W_{t}^{1}} + {(d_{t}^{2})}^{2} K_{W_{t}^{2}} > 0, \forall t . \end{matrix}

(112)

The next corollary is an application of Lemma 1 to the time-varying PO-SS noise of Example 1.

Corollary 2.

The entropy

H (V^{n})

of the PO-SS

(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}, d_{t}^{1}, d_{t}^{2})

noise of Example 1 is computed from Lemma 1 with the following changes:

\begin{matrix} C_{t} ⟼ c_{t}, A_{t} ⟼ a_{t}, B_{t} K_{W_{t}} N_{t}^{T} ⟼ b_{t} \circ d_{t}, \\ B_{t} K_{W_{t}} B_{t}^{T} ⟼ b_{t} \circ b_{t}, N_{t} K_{W_{t}} N_{t}^{T} ⟼ d_{t} \circ d_{t}, t = 1, \dots, n . \end{matrix}

(113)

Proof.

This is easily verified. □

From Corollary 2, we have the following observations:

Remark 4.

Consider the PO-SS

(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}, d_{t}^{1}, d_{t}^{2})

noise of Example 1. Then, the following hold.

(a) Consider the code of Definition 2. At each time t, the optimal channel input process $X^{n}$ is either realized by (35), or equivalently by (68), i.e.,

$\begin{matrix} X_{t} = \sum_{j = 1}^{t - 1} B_{t, j} V_{t, j} + {\bar{Z}}_{t} = \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 1} Γ_{t, j}^{2} Y_{j} + Z_{t}, t = 1, 2, \dots, n . \end{matrix}$

(114)

Moreover, $X_{t}$ cannot be expressed in terms of the state $S^{t}$ because, by (106) and (107), the noise sequence $V^{t - 1}$ does not specify $S^{t}$ for $t = 1, \dots, n$ .
(b) Consider the code of Definition 3, i.e., with a fixed initial state $S_{1} = S_{1}^{s} = s$ . With Corollary 1 using (113) $H (V^{n} | s)$ is computed from Lemma 1, with $Σ_{1} = Σ_{1}^{s} = 0$ , and (105) reduces to

$\begin{matrix} H (V^{n} | s) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e [{(c_{t})}^{2} Σ_{t}^{s} + d_{t} \circ d_{t}]) \end{matrix}$

(115)

where $Σ_{t}^{s}$ is the solution of (95) with $Σ_{1} = Σ_{1}^{s} = 0$ (using (113)).

We also apply our results to various versions of the autoregressive moving average (ARMA) noise model, such as the double-side and single-sided, stationary version of the ARMA noise, previously analyzed in [4] and in many other papers, to illustrate fundamental differences of Case (I) and Case (II) formulations.

Example 2.

The time-invariant ARMA

(a, c)

noise

(a) The time-invariant one-sided, stable or unstable, autoregressive moving average (ARMA $(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)$ ) noise is defined by

$\begin{matrix} V_{t} = c V_{t - 1} + W_{t} - a W_{t - 1}, \forall t \in Z_{+} \overset{▵}{=} {1, 2, \dots}, \end{matrix}$

(116)

$\begin{matrix} V_{0} \in N (0, K_{V_{0}}), K_{V_{0}} \geq 0, W_{0} \in N (0, K_{W_{0}}), K_{W_{0}} \geq 0, \end{matrix}$

(117)

$\begin{matrix} W_{t} \in N (0, K_{W}), K_{W} > 0, \forall t \in Z_{+}, {W_{0}, W_{1}, \dots, W_{n}} i n d e p . s e q . a n d i n d e p . o f V_{0}, \end{matrix}$

(118)

$\begin{matrix} c \in (- \infty, \infty), a \in (- \infty, \infty), c \neq a . \end{matrix}$

(119)

To express the AR $(a, c)$ in state space form, we define the state variable of the noise by

$\begin{matrix} S_{t} \overset{▵}{=} \frac{c V_{t - 1} - a W_{t - 1}}{c - a}, \forall t \in Z_{+} \end{matrix}$

(120)

Then, the state space realization of $V^{n}$ is

$\begin{matrix} S_{t + 1} = c S_{t} + W_{t}, \forall t \in Z_{+}, \end{matrix}$

(121)

$\begin{matrix} V_{t} = (c - a) S_{t} + W_{t}, \forall t \in Z_{+}, \end{matrix}$

(122)

$\begin{matrix} K_{S_{1}} = \frac{{(c)}^{2} K_{V_{0}} + {(a)}^{2} K_{W_{0}}}{{(c - a)}^{2}}, K_{V_{0}} \geq 0, K_{W_{0}} \geq 0 b o t h g i v e n . \end{matrix}$

(123)

We note that the AR $(a, c)$ is not necessarily stationary or asymptotically stationary.
A special case of the AR $(a, c)$ is the AR $(c)$ noise (i.e., with $a = 0$ ) defined by

$\begin{matrix} V_{t} = c V_{t - 1} + W_{t}, t = 1, 2, \dots, K_{V_{0}} \geq 0, K_{W} > 0 . \end{matrix}$

(124)

(b) Double-sided wide-sense stationary ARMA $(a, c), a \in [- 1, 1], c \in (- 1, 1)$ noise. A double-sided wide-sense stationary ARMA $(a, c)$ noise is defined by

$\begin{matrix} V_{t} = c V_{t - 1} + W_{t} - a W_{t - 1}, \forall t \in Z \overset{▵}{=} {\dots, - 1, 0, 1, \dots}, | a | \leq 1, | c | < 1 . \end{matrix}$

(125)

where $W_{t}, \forall t \in Z$ is an independent and identically distributed Gaussian sequence, i.e., $W_{t} \in N (0, K_{W})$ , $\forall t$ . The power spectral density (PSD) of the wide-sense stationary noise (this corresponds to [4] (Equation (43) with $L = 1$ )) is given by

$\begin{matrix} S_{V} (e^{j θ}) \overset{▵}{=} & K_{W} \frac{(1 - a e^{i θ}) (1 - a e^{- i θ})}{(1 - c e^{i θ}) (1 - c e^{- i θ})}, | c | < 1, | a | \leq 1, c \neq a, K_{W} > 0 . \end{matrix}$

(126)

We define the state process by

$\begin{matrix} S_{t} \overset{▵}{=} \frac{c V_{t - 1} - a W_{t - 1}}{c - a}, \forall t \in Z . \end{matrix}$

(127)

Then, the stationary state space realization of $V_{t}, \forall t \in Z$ is

$\begin{matrix} S_{t + 1} = c S_{t} + W_{t}, \forall t \in Z, \end{matrix}$

(128)

$\begin{matrix} V_{t} = (c - a) S_{t} + W_{t}, \forall t \in Z \end{matrix}$

(129)

provided that the initial covariances $c o v (S_{t}, S_{t}), c o v (S_{t}, V_{t}), c o v (V_{t}, V_{t})$ are chosen appropriately to ensure stationarity (see Proposition 1).
(c) One-sided wide-sense stationary ARMA $(a, c), a \in [- 1, 1], c \in (- 1, 1)$ . The one-sided wide-sense stationary ARMA $(a, c)$ noise is defined as in part (b) with $\forall t \in Z \overset{▵}{=} {\dots, - 1, 0, 1, \dots,}$ replaced by $\forall t \in Z_{+} \overset{▵}{=} {1, 2, \dots,}$ and (127)–(129), which hold, $\forall t \in Z_{+}$ , provided that the initial covariances are chosen appropriately (see Proposition 1).

For the AR

(a, c)

noise, we clarify differences of the feedback codes of Definitions 1 and 3 in the next remark, as well as of the Case (I) formulation versus the Case (II) formulation (and we discuss implication to results in [4,7,9,11,12]).

Remark 5.

ARMA

(a, c)

noise of Example 2

(a) Consider any of the AR $(a, c)$ of Example 2. For the code of Definition 2, the channel input process $X^{n}$ cannot be expressed in terms of the state $S^{n}$ (also see Remark 4(a)).
(b) Consider the non-stationary AR $(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)$ of Example 2(a).
(i) Assume the code of Definition 3, with initial state $V_{0} = v_{0}$ known to the encoder. By (120),

$\begin{matrix} S_{1} = S_{1}^{v_{0}} = \frac{c v_{0} - a W_{0}}{c - a}, V_{0} = v_{0}; \end{matrix}$

(130)

hence, knowledge of $V_{0} = v_{0}$ at the encoder does not determine $S_{1}^{v_{0}}$ because the encoder requires knowledge of $W_{0}$ for this to hold. It then follows that $H (V^{n} | v_{0})$ is computed from Corollary 1 as follows:

$\begin{matrix} Σ_{1} = Σ_{1}^{v_{0}} = \frac{{(a)}^{2} K_{W_{0}}}{{(c - a)}^{2}}, (105) r e d u c e s t o H (V^{n} | v_{0}) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e [{(c)}^{2} Σ_{t}^{v_{0}} + K_{W}]) \end{matrix}$

(131)

where $Σ_{t}^{v_{0}}$ is the solution of (95) with initial data $Σ_{1} = K_{S_{1}} = Σ_{1}^{v_{0}}, K_{W_{0}} \geq 0$ .
(ii) Assume the code of Definition 3, with initial state $S_{1} = s$ or $(V_{0}, W_{0}) = (v_{0}, w_{0})$ , known to the encoder. Then, by Corollary 1,

$\begin{matrix} H (V^{n} | v_{0}, w_{0}) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e K_{W}) . \end{matrix}$

(132)

By (120), $S_{1} \overset{▵}{=} \frac{c V_{0} - a W_{0}}{c - a}$ , and a necessary condition for Conditions 1 of Section 1.1 to hold is the following: both $(V_{0}, W_{0}) = (v_{0}, w_{0})$ are known to the encoder and the decoder.
(c) The statements of parts (a) and (b) also hold for the double-sided and the one-sided wide-sense stationary AR $(a, c), a \in [- 1, 1], c \in (- 1, 1)$ of Example 2(b,c).
(d) The Case (II) formulation discussed in Section 1.1 requires Conditions 1 and 2 to hold. For any of the AR $(a . c)$ noise models, Conditions 1 and 2 hold if and only if $S_{1} = s_{1}$ or $(V_{0}, W_{0}) = (v_{0}, w_{0})$ are known to the encoder. Clearly, the values of $H (V^{n})$ under the Case (I) formulation are fundamentally different from the values of $H (V^{n} | s), S_{1} = s$ under the Case (II) formulation. Consequently, in general, $C_{n}^{f b} (κ)$ given by (75) is fundamentally different from $C_{n}^{f b} (κ, s)$ , i.e., it corresponds to a fixed initial state $S_{1} = s$ , known to the encoder and the decoder, and to the channel input distribution.
(e) From parts (a–d), the characterization of feedback capacity for the stationary ARMA $(a, c)$ , $a \in [- 1, 1], c \in (- 1, 1)$ given in [4] (Theorem 6.1, $C_{F B}$ ) (which is derived based on [4] (Lemma 6.1)) presupposed the encoder and the decoder assumed knowledge of $S_{1} = S_{1}^{s} = s$ ,

In fact, the formulas of capacity in [4,7,8] use

H (V^{n} | v_{0}, w_{0}) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e K_{W})

.

In the next proposition, we state conditions for the stable realizations of Example 2(a), i.e., AR

(a, c), a \in [- 1, 1], c \in (- 1, 1)

to be asymptotically stationary, and for the realizations of Example 2(b,c) to be stationary. We should emphasize that for stationary noise, we need to determine the initial conditions of the generalized Kalman filter of Lemma 1 to correspond to the stationary noise.

Proposition 1.

Asymptotically stationary and stationary ARMA

(a, c)

noises of Example 2

(a) The realization of the double-sided ARMA $(a, c), a \in [- 1, 1], c \in (- 1, 1)$ noise of Example 2(b) is stationary if the following conditions hold:

$\begin{matrix} d_{11} \overset{▵}{=} c o v (S_{t}, S_{t}) = K_{S_{t}}, d_{12} \overset{▵}{=} c o v (S_{t}, V_{t}) = K_{S_{t}, V_{t}}, \\ d_{22} \overset{▵}{=} c o v (V_{t}, V_{t}) = K_{V_{t}}, a r e c o n s t a n t \forall t \in Z . \end{matrix}$

(133)

where the constants $(d_{11}, d_{12}, d_{22})$ are given by

$\begin{matrix} d_{11} = \frac{K_{W}}{1 - c^{2}}, d_{12} = \frac{(c - a) K_{W}}{1 - c^{2}}, d_{22} = \frac{{(c - a)}^{2} K_{W}}{1 - c^{2}} + K_{W} . \end{matrix}$

(134)

Similarly, the one-sided ARMA $(a, c), a \in [- 1, 1], c \in (- 1, 1)$ noise of Example 2(c) is stationary if the above equations hold $\forall t \in Z_{+} \overset{▵}{=} {1, 2, \dots}$ .
(b) The realization of the ARMA $(a, c)$ noise of Example 2(a) is asymptotically stationary if $a \in [- 1, 1], c \in (- 1, 1)$ .
(c) For the stationary realization of part (a), the optimal conditional variance and conditional mean of $S_{t}$ from $(V_{0}, V_{1}, V_{2}, \dots, V_{t - 1})$ , i.e., $Σ_{t} \overset{▵}{=} c o v (S_{t}, S_{t} | V^{t - 1}, V_{0}), {\hat{S}}_{t} \overset{▵}{=} E \{S_{t} | V^{t - 1}, V_{0}\}$ are defined by the generalized Kalman filter given by

$\begin{matrix} {\hat{S}}_{t + 1} = c {\hat{S}}_{t} + ({(c)}^{2} Σ_{t} + K_{W}) {(K_{W} + {(c - a)}^{2} Σ_{t})}^{- 1} (V_{t} - (c - a) {\hat{S}}_{t}), t = 1, 2, \dots, \end{matrix}$

(135)

$\begin{matrix} Σ_{t + 1} = {(c)}^{2} Σ_{t} + K_{W_{t}} - {(c (a - c) Σ_{t} + K_{W})}^{2} {(K_{W} + {(c (a - c))}^{2} Σ_{t})}^{- 1} \end{matrix}$

(136)

initialized at the initial data

$\begin{matrix} {\hat{S}}_{1} \overset{▵}{=} E \{S_{1} | V_{0}\} = \frac{c d_{12} + K_{W}}{d_{22}} V_{0}, \end{matrix}$

(137)

$\begin{matrix} Σ_{1} \overset{▵}{=} c o v (S_{1}, S_{1} | V_{0}) = d_{11} - \frac{{(d_{12})}^{2}}{d_{22}} . \end{matrix}$

(138)

(i) If the conditioning information is $(V_{- N}, \dots, V_{0}, V_{1}, V_{2}, \dots, V_{t - 1})$ , then the generalized Kalman filters (135) and (136) still hold and are initialized at the initial data

$\begin{matrix} {\hat{S}}_{- N + 1} = \frac{c d_{12} + K_{W}}{d_{22}} V_{- N}, \end{matrix}$

(139)

$\begin{matrix} Σ_{- N + 1} = d_{11} - \frac{{(d_{12})}^{2}}{d_{22}} . \end{matrix}$

(140)

(ii) If the inital data $V_{0}$ are not available, then the generalized Kalman filter is initialized at initial data ${\hat{S}}_{1} = 0$ , $Σ_{1} = c o v (S_{1}, S_{1}) = d_{11}$ .

Proof.

See Appendix A.2. □

Remark 6.

Consider the stationary double-sided or one-sided ARMA

(a, c), a \in [- 1, 1], c \in (- 1, 1)

of Example 2. From Proposition 1 and, in particular, the initial data

{\hat{S}}_{1}, Σ_{1}

stated in (137) and (138), it is clear that even if the encoder and the decoder know the initial state

V_{0}

. Thus,

H (V^{n} | v_{0}) \neq \frac{1}{2} \sum_{t = 1}^{n} log (2 π e K_{W})

. In this case, the value of

C_{n}^{f b} (κ, v_{0})

defined by (81) is fundamentally different from the formulation in [4,7,8], leading to the characterization of feedback capacity [4] (Theorem 6.1).

In the next corollary, we further clarify the difference between Case (I) formulation and Case (II) formulation, by stating the analog of Theorem 1 for the code of Definition 3, i.e., when

S_{1} = S_{1}^{s} = s

is fixed.

Corollary 3.

n–FTFI capacity for feedback code of Definition 3

Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 3, with initial state $S_{1} = S_{1}^{s} = s$ being fixed.
Then, the following hold.
(a) The n–FTFI capacity $C_{n}^{f b} (κ, s)$ is given by

$\begin{matrix} C_{n}^{f b} (κ, s) \overset{▵}{=} & \sup_{\frac{1}{n} E_{s}^{\bar{P}} \{\sum_{t = 1}^{n} {(X_{t})}^{2} | S_{1}\} \leq κ} H^{\bar{P}} (Y^{n} | s) - H (V^{n} | s) . \end{matrix}$

(141)

$\begin{matrix} X_{t} = & Γ^{0} s + \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 1} Γ_{t, j}^{2} Y_{j} + Z_{t}, t = 1, \dots, n . \end{matrix}$

(142)

where the supremum is over all $(Γ^{0}, Γ_{t, j}^{1}, Γ_{t, j}^{2}, K_{Z_{t}}), j = 1, \dots, t - 1, t = 1, \dots, n$ of the realization of $X^{n}$ , which induces the distribution ${\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}, s), t = 1, \dots, n$ , and all statements of Theorem 1 and Lemma 1 hold, with the conditional distributions, expectations, and entropies replaced by the corresponding expressions with a fixed $S_{1} = S_{1}^{s} = s$ .
(b) A necessary condition for Condition 2 of Section 1.1 to hold is as follows:
(i) $N_{t} W_{t}$ uniquely defines $C_{t + 1} B_{t} W_{t}, \forall t$ .
Moreover, if (i) holds, then the entropy $H (V^{n} | s)$ of part (a) is given by

$\begin{matrix} H (V^{n} | s) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e N_{t} K_{W_{t}} N_{t}^{T}) . \end{matrix}$

(143)

The stable, time-invariant PO-SS realization of Definition 2, which is considered in [4,7], satisfies $W_{t} : Ω \to R, N_{t} = 1, \forall t$ , i.e., $n_{w} = 1$ . Moreover, for this realization, (i) always holds.

Proof.

See Appendix A.3. □

In the next remark, we illustrate that

H (V^{n} | s)

given by (143) follows directly from Lemma 1 by fixing

S_{1} = S_{1}^{s} = s

and assuming that

N_{t} W_{t}

uniquely defines

C_{t + 1} B_{t} W_{t}, \forall t

.

Remark 7.

The n–FTFI capacity for code of Definition 1 versus code of Definition 3.

Consider the generalized Kalman filter of the PO-SS noise realization, of Lemma 1, and assume the following:

(i) The initial state of the noise

S_{1}

is known, i.e.,

S_{1} = S_{1}^{s} = s

or

S_{1} = S_{1}^{s} = s = 0

, and

N_{t} W_{t}

uniquely defines

C_{t + 1} B_{t} W_{t}, \forall t

.

Then, all statements of Lemma 1 hold, by replacing

(Σ_{t}, {\hat{S}}_{t})

with

(Σ_{t}^{s}, {\hat{S}}_{t}^{s})

for

t = 1, 2, \dots

. Since

Σ_{t}^{s}

satisfies the generalized DRE (95) with initial condition

Σ_{1}^{s} = 0

, then it is easy to deduce that

Σ_{t}^{s} = 0

for

t = 1, 2, \dots, n

is a solution. By substituting

Σ_{t}^{s} = 0, t = 1, 2, \dots, n

in (98), we obtain (143), as expected, which is precisely the entropy of the noise that appeared in [4,7].

On the other hand, for the code of Definition 1, by Theorem 1(d), the right-hand side of the n–FTFI capacity $C_{n}^{f b} (κ)$ involves $H (V^{n})$ , which is computed using the generalized Kalman filter of Lemma 1.

2.3. A Sufficient Statistic Approach to the Characterization of n–FTFI Capacity of AGN Channels Driven by PO-SS Noise Realizations

The characterization of the n–FTFI capacity via (34) (which is equivalently given in Theorem 1(d)), although compactly represented, is not very practical because the input process

X^{n}

is not expressed in terms of a sufficient statistic that summarizes the information of the channel input strategy [39].

In this section, we wish to identify a sufficient statistic for the input process $X_{t}$ , given by (68), called the state of the input, which summarizes the information contained in $(V^{t - 1}, Y^{t - 1})$ . It will then become apparent that the characterization of the n–FTFI capacity for the Cover and Pombra formulation and code of Definition 1 can be expressed as a functional of two generalized matrix DREs.

First, we invoke Theorem 1 and Lemma 1 to show that for each time t,

X_{t}

is expressed as

\begin{matrix} X_{t} = Λ_{t} ({\hat{S}}_{t} - E \{{\hat{S}}_{t} | Y^{t - 1}\}) + Z_{t}, t = 1, \dots, n, \end{matrix}

(144)

\begin{matrix} {\hat{S}}_{t} \overset{▵}{=} E \{S_{t} | V^{t - 1}\}, {\hat{\hat{S}}}_{t} \overset{▵}{=} E \{{\hat{S}}_{t} | Y^{t - 1}\} \end{matrix}

(145)

which means, at each time t, the state of the channel input process

X_{t}

is

({\hat{S}}_{t}, {\hat{\hat{S}}}_{t})

. We show that

{\hat{\hat{S}}}_{t}

satisfies another generalized Kalman filter recursion.

Now, we prepare to prove (144) and the main theorem. We start with preliminary calculations.

\begin{matrix} P \{Y_{t} \in d y | Y^{t - 1}, X^{t}\} = & P_{t} (d y | X_{t}, V^{t - 1}), t = 2, \dots, n, by channel definition \end{matrix}

(146)

\begin{matrix} = & P_{t} (d y | X_{t}, V^{t - 1}, {\hat{S}}^{t}), by {\hat{S}}_{t} = E \{S_{t} | V^{t - 1}\} \end{matrix}

(147)

\begin{matrix} = & P_{t} (d y | X_{t}, V^{t - 1}, {\hat{S}}_{t}, {\hat{I}}^{t - 1}), by (90), i . e ., V_{t} = C_{t} {\hat{S}}_{t} + {\hat{I}}_{t} \end{matrix}

(148)

\begin{matrix} = & P_{t} (d y | X_{t}, {\hat{S}}_{t}), by Y_{t} = X_{t} + V_{t} = X_{t} + C_{t} {\hat{S}}_{t} + {\hat{I}}_{t}, (91) . \end{matrix}

(149)

At

t = 1

, we also have

P \{Y_{1} \in d y | X_{1}\} = P_{1} (d y | X_{1})

. By (149), it follows that the conditional distribution of

Y_{t}

given

Y^{t - 1} = y^{t - 1}

is

\begin{matrix} P_{t} (d y_{t} | y^{t - 1}) = & \int P_{t} (d y | x_{t}, {\hat{s}}_{t}) P_{t} (d x_{t} | {\hat{s}}_{t}, y^{t - 1}) P_{t} (d {\hat{s}}_{t} | y^{t - 1}), t = 2, \dots, n, \end{matrix}

(150)

\begin{matrix} P_{1} (d y_{1}) = & \int P_{1} (d y | x_{t}, {\hat{s}}_{1}) P_{1} (d x_{1} | {\hat{s}}_{1}) P_{1} (d {\hat{s}}_{1}) . \end{matrix}

(151)

From the above distributions, at each time t, the distribution of

X_{t}

conditioned on

(V^{t - 1}, Y^{t - 1})

, given in Theorem 1, is also expressed as a linear functional of

({\hat{S}}_{t}, Y^{t - 1})

, for

t = 1, \dots, n

.

The next theorem further shows that for each t, the dependence of

X_{t}

on

Y^{t - 1}

is expressed in terms of

E \{{\hat{S}}_{t} | Y^{t - 1}\}

for

t = 1, \dots, n

, and this dependence gives rise to an equivalent sequential characterization of the Cover and Pombra n–FTFI capacity,

C_{n}^{f b} (κ)

.

Theorem 3.

Equivalent characterization of n–FTFI capacity

C_{n}^{f b} (κ)

for PO-SS noise realizations

Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 1. Also consider the generalized Kalman filter of Lemma 1.
Define the conditional covariance and conditional mean of ${\hat{S}}_{t}$ , given $Y^{t - 1}$ , by

$\begin{matrix} K_{t} \overset{▵}{=} & c o v ({\hat{S}}_{t}, {\hat{S}}_{t} | Y^{t - 1}) = E \{({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) {({\hat{S}}_{t} - {\hat{\hat{S}}}_{t})}^{T} | Y^{t - 1}\}, t = 2, \dots, n, \end{matrix}$

(152)

$\begin{matrix} {\hat{\hat{S}}}_{t} \overset{▵}{=} & E \{{\hat{S}}_{t} | Y^{t - 1}\}, {\hat{\hat{S}}}_{1} \overset{▵}{=} μ_{S_{1}}, K_{1} \overset{▵}{=} 0 . \end{matrix}$

(153)

Then, the following hold.
(a) An equivalent characterization of the n–FTFI capacity $C_{n}^{f b} (κ)$ , defined by (34) and (35), is

$\begin{matrix} C_{n}^{f b} (κ) = \sup_{P_{[1, n]}^{\hat{S}} (κ)} \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}) - H (V^{n}) \end{matrix}$

(154)

where $(X^{n}, Y^{n})$ is jointly Gaussian, and

$\begin{matrix} H (V^{n}) i s t h e e n t r o p y o f V^{n} g i v e n i n L e m m a 1, i . e ., (98), \end{matrix}$

(155)

$\begin{matrix} {\hat{I}}^{n} i s t h e i n n o v a t i o n s p r o c e s s o f V^{n} g i v e n i n L e m m a 1; \end{matrix}$

(156)

$\begin{matrix} Y_{t} = X_{t} + V_{t}, t = 1, \dots, n; \end{matrix}$

(157)

$\begin{matrix} V_{t} = C_{t} {\hat{S}}_{t} + {\hat{I}}_{t}; \end{matrix}$

(158)

$\begin{matrix} P_{t} (d y_{t} | y^{t - 1}) = \int P_{t} (d y | x_{t}, {\hat{s}}_{t}) P_{t} (d x_{t} | {\hat{s}}_{t}, y^{t - 1}) P_{t} (d {\hat{s}}_{t} | y^{t - 1}), t = 2, \dots, n; \end{matrix}$

(159)

$\begin{matrix} P_{1} (d y_{1}) = \int P_{1} (d y | x_{t}, {\hat{s}}_{1}) P_{1} (d x_{1} | {\hat{s}}_{1}) P_{1} (d {\hat{s}}_{1}); \end{matrix}$

(160)

$\begin{matrix} P_{t} (d y_{t} | y^{t - 1}) \in N (μ_{Y_{t} | Y^{t - 1}}, K_{Y_{t} | Y^{t - 1}}); \end{matrix}$

(161)

$\begin{matrix} μ_{Y_{t} | Y_{t - 1}} i s l i n e a r i n Y^{t - 1} a n d K_{Y_{t} | Y^{t - 1}} i s n o n r a n d o m; \end{matrix}$

(162)

$\begin{matrix} P_{t} (d x_{t} | {\hat{s}}_{t}, y^{t - 1}) \in N (μ_{X_{t} | {\hat{S}}_{t}, Y^{t - 1}}, K_{X_{t} | {\hat{S}}_{t}, Y^{t - 1}}); \end{matrix}$

(163)

$\begin{matrix} μ_{X_{t} | {\hat{S}}_{t}, Y^{t - 1}} i s l i n e a r i n ({\hat{S}}_{t}, Y^{t - 1}) a n d K_{X_{t} | {\hat{S}}_{t}, Y^{t - 1}} i s n o n r a n d o m; \end{matrix}$

(164)

$\begin{matrix} P_{[1, n]}^{\hat{S}} (κ) \overset{▵}{=} \{P_{t} (d x_{t} | {\hat{s}}_{t}, y^{t - 1}), t = 1, \dots, n : \frac{1}{n} E (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ\} . \end{matrix}$

(165)

(b) The optimal jointly Gaussian process $(X^{n}, Y^{n})$ of part (a) is represented as a function of a sufficient statistic by

$\begin{matrix} X_{t} = Λ_{t} ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + Z_{t}, t = 1, \dots, n, \end{matrix}$

(166)

$\begin{matrix} Z_{t} \in N (0, K_{Z_{t}}) i n d e p e n d e n t o f (X^{t - 1}, V^{t - 1}, {\hat{S}}^{t}, \hat{{\hat{S}}^{t}}, {\hat{I}}^{t}, Y^{t - 1}), t = 1, \dots, n; \end{matrix}$

(167)

$\begin{matrix} {\hat{I}}_{t} \in N (0, K_{{\hat{I}}_{t}}) i n d e p e n d e n t o f (X^{t - 1}, V^{t - 1}, {\hat{S}}^{t}, Y^{t - 1}, \hat{{\hat{S}}^{t}}), t = 1, \dots, n; \end{matrix}$

(168)

$\begin{matrix} Y_{t} = Λ_{t} ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + Z_{t} + V_{t}, t = 1, \dots, n; \end{matrix}$

(169)

$\begin{matrix} = Λ_{t} ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + C_{t} {\hat{S}}_{t} + {\hat{I}}_{t} + Z_{t}; \end{matrix}$

(170)

$\begin{matrix} \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} = \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t} Λ_{t}^{T} + K_{Z_{t}}) . \end{matrix}$

(171)

where $Λ_{t}$ is nonrandom.
The conditional mean and covariance, ${\hat{\hat{S}}}_{t}$ and $K_{t}$ , are given by generalized Kalman filter equations, as follows:
(i) ${\hat{\hat{S}}}_{t}$ satisfies the Kalman filter recursion

$\begin{matrix} {\hat{\hat{S}}}_{t + 1} = A_{t} {\hat{\hat{S}}}_{t} + F_{t} (Σ_{t}, K_{t}) I_{t}, {\hat{\hat{S}}}_{1} = μ_{S_{1}}; \end{matrix}$

(172)

$\begin{matrix} F_{t} (Σ_{t}, K_{t}) \overset{▵}{=} (A_{t} K_{t} {(Λ_{t} + C_{t})}^{T} + M_{t} (Σ_{t}) K_{{\hat{I}}_{t}}) {\{K_{{\hat{I}}_{t}} + K_{Z_{t}} + (Λ_{t} + C_{t}) K_{t} {(Λ_{t} + C_{t})}^{T}\}}^{- 1}; \end{matrix}$

(173)

$\begin{matrix} I_{t} \overset{▵}{=} Y_{t} - E \{Y_{t} | Y^{t - 1}\} = Y_{t} - C_{t} {\hat{\hat{S}}}_{t} = (Λ_{t} + C_{t}) ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + {\hat{I}}_{t} + Z_{t}, t = 1, \dots, n; \end{matrix}$

(174)

$\begin{matrix} I_{t} \in N (0, K_{I_{t}}), t = 1, \dots, n i s a n o r t h o g o n a l i n n o v a t i o n s p r o c e s s, \\ i . e ., I_{t} i s i n d e p e n d e n t o f I_{s}, f o r a l l t \neq s, a n d I_{t} i s i n d e p e n d e n t o f V^{t - 1}; \end{matrix}$

(175)

$\begin{matrix} K_{Y_{t} | Y^{t - 1}} = K_{I_{t}} \overset{▵}{=} c o v (I_{t}, I_{t}) = (Λ_{t} + C_{t}) K_{t} {(Λ_{t} + C_{t})}^{T} + K_{{\hat{I}}_{t}} + K_{Z_{t}}; \end{matrix}$

(176)

$\begin{matrix} K_{{\hat{I}}_{t}} g i v e n b y (92) . \end{matrix}$

(177)

(ii) The error ${\hat{E}}_{t} \overset{▵}{=} {\hat{S}}_{t} - {\hat{\hat{S}}}_{t}$ satisfies the recursion

$\begin{matrix} {\hat{E}}_{t + 1} = & F_{t}^{C L} (Σ_{t}, K_{t}) {\hat{E}}_{t} + (M_{t} (Σ_{t}) - F_{t} (Σ_{t}, K_{t})) {\hat{I}}_{t} \end{matrix}$

$\begin{matrix} - F_{t} (Σ_{t} . K_{t}) Z_{t}, {\hat{E}}_{1} = {\hat{S}}_{1} - {\hat{\hat{S}}}_{1} = 0, t = 1, \dots, n; \end{matrix}$

(178)

$\begin{matrix} F_{t}^{C L} (Σ_{t}, & K_{t}) \overset{▵}{=} A_{t} - F_{t} (Σ_{t}, K_{t}) (Λ_{t} + C_{t}) . \end{matrix}$

(179)

(iii) $K_{t} = E \{{\hat{E}}_{t} {\hat{E}}_{t}^{T}\}$ satisfies the generalized DRE

$\begin{matrix} K_{t + 1} = A_{t} K_{t} A_{t}^{T} - (A_{t} K_{t} {(Λ_{t} + C_{t})}^{T} + M_{t} (Σ_{t}) K_{{\hat{I}}_{t}}) (K_{{\hat{I}}_{t}} + K_{Z_{t}} \\ + (Λ_{t} + C_{t}) K_{t} {(Λ_{t} + C_{t})}^{T})^{- 1} {(A_{t} K_{t} {(Λ_{t} + C_{t})}^{T} + M_{t} (Σ_{t}) K_{{\hat{I}}_{t}})}^{T} \\ + M_{t} (Σ_{t}) K_{{\hat{I}}_{t}} {(M_{t} (Σ_{t}))}^{T}, K_{t} ⪰ 0, t = 1, \dots, n, K_{1} = 0 . \end{matrix}$

(180)

(c) An equivalent characterization of the n–FTFI capacity $C_{n}^{f b} (κ)$ , defined by (33) and (34), using the sufficient statistics of part (b) is

$\begin{matrix} C_{n}^{f b} (κ) = & \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \frac{1}{2} \sum_{t = 1}^{n} log \frac{K_{Y_{t} | Y^{t - 1}}}{K_{V_{t} | V^{t - 1}}} \end{matrix}$

(181)

$\begin{matrix} = & \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \frac{1}{2} \sum_{t = 1}^{n} log \frac{K_{I_{t}}}{K_{{\hat{I}}_{t}}} \\ = & \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t} Λ_{t}^{T} + K_{Z_{t}}) \leq κ} \frac{1}{2} { \end{matrix}$

(182)

$\begin{matrix} \sum_{t = 1}^{n} log (\frac{(Λ_{t} + C_{t}) K_{t} {(Λ_{t} + C_{t})}^{T} + K_{{\hat{I}}_{t}} + K_{Z_{t}}}{K_{{\hat{I}}_{t}}})} . \end{matrix}$

(183)

Proof.

See Appendix A.4. □

Remark 8.

On the characterization of n–FTFI capacity of Theorem 3

The characterization of the n–FTFI capacity $C_{n}^{f b} (κ)$ given by (183) involves the generalized matrix DRE $K_{t}$ , which is also a functional of the generalized matrix DRE $Σ_{t}$ of the error covariance of the state $S^{n}$ from the noise output $V^{n}$ . This feature is not part of the analysis in [4] and recent studies [4,7,9,11,12].

The next corollary follows directly from Theorem 3 as a degenerate case.

Corollary 4.

Equivalent characterization of n–FTFI capacity

C_{n}^{f b} (κ, s)

for PO-SS noise realizations Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 3, with initial state

S_{1} = S_{1}^{s} = s

fixed, and replace (152) and (153) by

\begin{matrix} K_{t} = K_{t}^{s} \overset{▵}{=} & c o v ({\hat{S}}_{t}^{s}, {\hat{S}}_{t}^{s} | Y^{t - 1}, S_{1} = s) = E \{({\hat{S}}_{t}^{s} - \hat{{\hat{S}}_{t}^{s}}) {({\hat{S}}_{t}^{s} - \hat{{\hat{S}}_{t}^{s}})}^{T}\}, \end{matrix}

(184)

\begin{matrix} {\hat{\hat{S}}}_{t} = & \hat{{\hat{S}}_{t}^{s}} \overset{▵}{=} E \{{\hat{S}}_{t}^{s} | Y^{t - 1}, S_{1} = s\}, t = 2, \dots, n, \hat{{\hat{S}}_{1}} = \hat{{\hat{S}}_{1}^{s}} \overset{▵}{=} s, K_{1} = K_{1}^{s} = 0 . \end{matrix}

(185)

Then, the characterization of n–FTFI capacity, (3), is

\begin{matrix} C_{n}^{f b} (κ, s) = \sup_{P_{[1, n]}^{\hat{{\hat{S}}^{s}}} (κ)} \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}, s) - H (V^{n} | s), \end{matrix}

(186)

\begin{matrix} P_{[1, n]}^{\hat{{\hat{S}}^{s}}} (κ) \overset{▵}{=} \{P_{t} (d x_{t} | \hat{{\hat{s}}_{t}^{s}}, y^{t - 1}, s), t = 1, \dots, n : \frac{1}{n} E (\sum_{t = 1}^{n} {(X_{t})}^{2} | S_{1}^{s} = s) \leq κ\} \end{matrix}

(187)

where

H (V^{n} | s)

is given by Corollary 1, and the statements of Theorem 3 hold with the above changes, i.e., (184), (185), and all conditional entropies, distributions, expectations, etc., are defined for fixed

S_{1} = S_{1}^{s} = s

.

Proof.

It is easily verified from the derivation of Theorem 3 by fixing

S_{1} = S_{1}^{s} = s

. □

Remark 9.

On the characterization of n–FTFI capacity of Corollary 4

The characterization of the n–FTFI capacity $C_{n}^{f b} (κ, s)$ given in Corollary 4 (similar to Theorem 3) involves two generalized matrix DREs, because it does not assume that Conditions 1 and 2 hold. This distinction is not part of the analysis in [4,7,9,11,12].

2.4. Application Examples

In this section, we apply Theorem 3 to specific examples.

First, we consider the application example of the AGN channel driven by the PO-SS

(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}, d_{t}^{1}, d_{t}^{2})

noise.

Corollary 5.

The n–FTFI capacity

C_{n}^{f b} (κ)

of the AGN channel driven by the PO-SS

(a_{t}, c_{t}, b_{t}^{1}, b_{t}^{2}

,

d_{t}^{1}, d_{t}^{2})

noise is obtained from Lemma 1 and Theorem 3 by using (113).

Proof.

This is easily verified, as in Corollary 2. □

In the next corollary, we apply Theorem 3 to the stable and unstable ARMA

(a, c)

noise to obtain the characterization of the n–FTFI capacity

C_{n}^{f b} (κ)

and

C_{n}^{f b} (κ, s)

. It is then obvious that for the stable ARMA

(a, c), a \in [- 1, 1], c \in (- 1, 1)

noise, the characterization of

C_{n}^{f b} (κ)

involves two generalized DREs, contrary to the analysis in [4,7,9,11,12], for the same noise model.

Corollary 6.

Characterization of n–FTFI capacity

C_{n}^{f b} (κ)

for the ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

Consider the time-varying AGN channel defined by (1) and the code of Definition 1.
(a) For the non-stationary ARMA $(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)$ noise of Example 2(a), the characterization of the n–FTFI capacity, $C_{n}^{f b} (κ)$ , is

$\begin{matrix} C_{n}^{f b} (κ) = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} ({(Λ_{t})}^{2} K_{t} + K_{Z_{t}}) \leq κ} \frac{1}{2} \sum_{t = 1}^{n} log (\frac{{(Λ_{t} + c - a)}^{2} K_{t} + K_{{\hat{I}}_{t}} + K_{Z_{t}}}{K_{{\hat{I}}_{t}}}) \end{matrix}$

(188)

subject to the constraints

$\begin{matrix} K_{t + 1} = & {(c)}^{2} K_{t} + {(M_{t} (Σ_{t}))}^{2} K_{{\hat{I}}_{t}} - {(c K_{t} (Λ_{t} + c - a) + M_{t} (Σ_{t}) K_{{\hat{I}}_{t}})}^{2} \end{matrix}$

$\begin{matrix} . {(K_{{\hat{I}}_{t}} + K_{Z_{t}} + {(Λ_{t} + c - a)}^{2} K_{t})}^{- 1}, K_{1} = 0, t = 1, \dots, n, \end{matrix}$

(189)

$\begin{matrix} K_{Z_{t}} \geq & 0, K_{t} \geq 0, c \neq a, K_{W} > 0, t = 1, \dots, n \end{matrix}$

(190)

and where

$\begin{matrix} M_{t} (Σ_{t}) \overset{▵}{=} (c Σ_{t} (c - a) + K_{W}) {(K_{W} + {(c - a)}^{2} Σ_{t})}^{- 1}, \end{matrix}$

(191)

$\begin{matrix} K_{{\hat{I}}_{t}} = {(c - a)}^{2} Σ_{t} + K_{W}, t = 1, \dots, n, \end{matrix}$

(192)

$\begin{matrix} Σ_{t + 1} = {(c)}^{2} Σ_{t} + K_{W} - {(c Σ_{t} (c - a) + K_{W})}^{2} {(K_{W} + {(c - a)}^{2} Σ_{t})}^{- 1}, t = 1, \dots, n, \end{matrix}$

(193)

$\begin{matrix} Σ_{1} = K_{S_{1}} = \frac{{(c_{0})}^{2} K_{S_{0}} + {(a_{0})}^{2} K_{W_{0}}}{{(c_{0} - a_{0})}^{2}} . \end{matrix}$

(194)

The optimal jointly Gaussian process $(X^{n}, Y^{n})$ is obtained from Theorem 3(b) by invoking

$\begin{matrix} A_{t} ⟼ c, C_{t} ⟼ c - a, B_{t} ⟼ 1, N_{t} ⟼ 1, t = 1, 2, \dots, n . \end{matrix}$

(195)

Special Case. If $Σ_{1} = 0$ or the initial state is fixed, $S_{1} = S_{1}^{s} = s$ , then

$\begin{matrix} Σ_{t} = Σ_{t}^{s} = 0, K_{{\hat{I}}_{t}} = K_{W}, M_{t} (Σ_{t}) = M_{t} (Σ_{t}^{s}) = 1, t = 1, 2, \dots \end{matrix}$

(196)

and $C_{n}^{f b} (κ)$ reduces to

$\begin{matrix} C_{n}^{f b} (κ) = & C_{n}^{f b} (κ, s) = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} ({(Λ_{t})}^{2} K_{t}^{s} + K_{Z_{t}}) \leq κ} { \\ \frac{1}{2} \sum_{t = 1}^{n} log (\frac{{(Λ_{t} + c - a)}^{2} K_{t}^{s} + K_{W} + K_{Z_{t}}}{K_{W}})} \end{matrix}$

(197)

subject to the constraints

$\begin{matrix} K_{t + 1}^{s} = {(c)}^{2} K_{t}^{s} + K_{W} - {(c K_{t}^{s} (Λ_{t} + c - a) + K_{W})}^{2} \\ . {(K_{Z_{t}} + {(Λ_{t} + c - a)}^{2} K_{t}^{s} + K_{W})}^{- 1}, K_{1}^{s} = 0, K_{t}^{s} \geq 0, K_{Z_{t}} \geq 0, t = 1, \dots, n . \end{matrix}$

(198)

(This special case is precisely the application example analyzed in [4,7,8]).

(b) For the non-stationary AR

(c), c \in (- \infty, \infty)

noise of Example 2(c), the characterization of the n–FTFI capacity

C_{n}^{f b} (κ)

is obtained from part (a) by setting

a = 0

, i.e.,

\begin{matrix} C_{n}^{f b} (κ) = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} ({(Λ_{t})}^{2} K_{t} + K_{Z_{t}}) \leq κ} \frac{1}{2} \sum_{t = 1}^{n} log (\frac{{(Λ_{t} + c)}^{2} K_{t} + {(c)}^{2} Σ_{t} + K_{W} + K_{Z_{t}}}{{(c)}^{2} Σ_{t} + K_{W}}) \end{matrix}

(199)

subject to the constraints of

K_{t}, Σ_{t}

are the non-negative solutions of the generalized RDEs:

\begin{matrix} K_{t + 1} = & {(c)}^{2} K_{t} + {(c)}^{2} Σ_{t} + K_{W} - {(c K_{t} (Λ_{t} + c) + {(c)}^{2} Σ_{t} + K_{W})}^{2} \end{matrix}

\begin{matrix} . {({(c)}^{2} Σ_{t} + K_{W} + K_{Z_{t}} + {(Λ_{t} + c)}^{2} K_{t})}^{- 1}, K_{1} = 0, t = 1, \dots, n, \end{matrix}

(200)

\begin{matrix} Σ_{t + 1} = & {(c)}^{2} Σ_{t} + K_{W} - {({(c)}^{2} Σ_{t} + K_{W})}^{2} {(K_{W} + {(c)}^{2} Σ_{t})}^{- 1}, Σ_{1} = K_{S_{1}} = K_{S_{0}} \geq 0 . \end{matrix}

(201)

(c) For the non-stationary AR

(c), c \in (- \infty, \infty)

noise of Example 2(c), with

Σ_{1} = 0

or a fixed initial state

S_{1} = S_{1}^{s} = s

, then (196) holds, i.e.,

Σ_{t} = Σ_{t}^{s} = 0, K_{{\hat{I}}_{t}} = K_{W}, M_{t} (Σ_{t}) = M_{t} (Σ_{t}^{s}) = 1, t = 1, 2, \dots

, and

C_{n}^{f b} (κ)

reduces to

\begin{matrix} C_{n}^{f b} (κ) = C_{n}^{f b} (κ, s) \\ = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} ({(Λ_{t})}^{2} K_{t}^{s} + K_{Z_{t}}) \leq κ} \frac{1}{2} \sum_{t = 1}^{n} log (\frac{{(Λ_{t} + c)}^{2} K_{t}^{s} + K_{W} + K_{Z_{t}}}{K_{W}}) \end{matrix}

(202)

subject to the constraint

\begin{matrix} K_{t + 1}^{s} = {(c)}^{2} K_{t}^{s} + K_{W} \\ - {(c K_{t}^{s} (Λ_{t} + c) + K_{W})}^{2} {(K_{W} + K_{Z_{t}} + {(Λ_{t} + c)}^{2} K_{t}^{s})}^{- 1}, K_{1}^{s} = 0, t = 1, \dots, n . \end{matrix}

(203)

Proof.

(a) The first part follows directly from Theorem 3 by using (195). The last part is obtained as follows. If

Σ_{1} = 0

or

S_{1} = S_{1}^{s} = s

is fixed, then, by (193), it follows that

Σ_{t} = Σ_{t}^{s} = 0, \forall t = 2, \dots

. Moreover, by (191) and (192), it follows that

M_{t} (Σ_{t}) = M_{t} (Σ_{t}^{s}) = 1, K_{{\hat{I}}_{t}} = {(c - a)}^{2} Σ_{t}^{s} + K_{W} = K_{W}, t = 1, 2, \dots

. By substituting into (188) and (189), we obtain (197) and (180). (b) From part (a), let

a = 0

. Then,

\begin{matrix} M_{t} (Σ_{t}) = ({(c)}^{2} Σ_{t} + K_{W}) {(K_{W} + {(c)}^{2} Σ_{t})}^{- 1}, K_{{\hat{I}}_{t}} = {(c)}^{2} Σ_{t} + K_{W}, t = 1, \dots, n . \end{matrix}

(204)

By substituting into the equations of part (a), we obtain (200) and (201). (c) This is a special case of parts (a) and (b). □

Remark 10.

By Corollary 6(a), it is obvious that if

Σ_{1} = 0

, i.e.,

K_{S_{0}} = K_{W_{0}} = 0

, which means

S_{1} = S_{1}^{s} = s

is fixed,

(V_{0}, W_{0}) = (v_{0}, w_{0})

is fixed (and known to the encoder and the decoder)—see (120). Then,

Σ_{1} = Σ_{1}^{s} = 0

and

C_{n}^{f b} (κ) = C_{n}^{f b} (κ, s)

, which depends on the initial state

S_{1} = S_{1}^{s} = s

. To ensure that we obtain a large enough n, the rate

\frac{1}{n} C_{n} (κ, s)

is independent of s. Moreover, it is necessary to identify conditions for convergence of solutions

K_{t}^{s}, t = 1, 2, \dots

of the generalized DRE (180) to a unique limit,

{lim}_{n ⟶ \infty} K_{n}^{s} = K^{\infty} \geq 0

, which does not depend on the initial data

K_{1}^{s} = 0

. We address this problem in Section 3.

2.5. Case (II) Formulation: A Degenerate of Case (I) Formulation

Theorem 3 gives the n–FTFI capacity for the Case (I) formulation. However, since the Case (II) formulation is a special case of the Case (I) formulation, we expect that we can recover the characterization of the n–FTFI capacity for the Case (II) formulation from Theorem 3, i.e., when the code is

(s, 2^{n R}, n)

,

n = 1, 2, \dots

, and Conditions 1 and 2 of Section 1.1 hold. We show this in the next corollary.

Corollary 7.

The degenerate n–FTFI capacity

C_{n}^{f b} (κ)

of Theorem 3 for the Case (II) formulation

Consider the time-varying AGN channel defined by (1), driven by a noise with PO-SS realization of Definition 2, and suppose that the following hold:
- (1) The code is $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ ;
  (2) Conditions 1 and 2 of Section 1.1 hold.

Then, the following hold:
(a) Corollary 1 holds, i.e., all statements of Lemma 1 hold with $(Σ_{t}, {\hat{S}}_{t})$ replaced by $(Σ_{t}^{s}, {\hat{S}}_{t}^{s})$ as defined by (102) and (103). In particular, $(Σ_{t}^{s}, {\hat{S}}_{t}^{s}) = (0, S_{t}^{s})$ for $t = 1, 2, \dots$ , and $H (V^{n}) = H (V^{n} | s)$ is given by (143).
(b) All statements of Theorem 3 hold with $(Σ_{t}, {\hat{S}}_{t})$ replaced by $(Σ_{t}^{s}, {\hat{S}}_{t}^{s})$ , as in part (a), and $(K_{t}, {\hat{\hat{S}}}_{t})$ defined by (152) and (153) reduces to

$\begin{matrix} K_{t} = K_{t}^{s} = c o v (S_{t}^{s}, S_{t}^{s} | Y^{t - 1}, S_{1}^{s} = s), \end{matrix}$

(205)

$\begin{matrix} {\hat{\hat{S}}}_{t} = {\hat{S}}_{t}^{s} = E \{S_{t}^{s} | Y^{t - 1}, S_{1}^{s} = s\}, K_{1}^{s} = 0, {\hat{S}}_{1}^{s} = s, t = 2, \dots, n \end{matrix}$

(206)

In particular, the optimal input process $X^{n}$ of Theorem 3(c) degenerates to

$\begin{matrix} X_{t} = Λ_{t} (S_{t}^{s} - {\hat{S}}_{t}^{s}) + Z_{t}, X_{1} = Z_{t}, t = 2, \dots, n . \end{matrix}$

(207)

(c) The characterization of the n–FTFI capacity, $C_{n}^{f b} (κ)$ of Theorem 3, degenerates to $C_{n}^{f b, S} (κ, s)$ . It is defined by

$\begin{matrix} C_{n}^{f b} (κ) = & C_{n}^{f b, S} (κ, s) \overset{▵}{=} \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} E_{s} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} log \frac{K_{Y_{t} | Y^{t - 1}, s}}{K_{V_{t} | V^{t - 1}, s}} \end{matrix}$

(208)

$\begin{matrix} = & \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t}^{s} Λ_{t}^{T} + K_{Z_{t}}) \leq κ} { \\ \frac{1}{2} \sum_{t = 1}^{n} log (\frac{(Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T} + N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}}}{N_{t} K_{W_{t}} N_{t}^{T}})} . \end{matrix}$

(209)

$K_{t} = K_{t}^{s} = E_{s} \{E_{t}^{s} {(E_{t}^{s})}^{T}\}$ satisfies the generalized DRE

$\begin{matrix} K_{t + 1}^{s} = A_{t} K_{t} A_{t}^{T} + B_{t} K_{W_{t}} B_{t}^{T} - (B_{t} K_{W_{t}} N_{t}^{T} + A_{t} K_{t}^{s} {(Λ_{t} + C_{t})}^{T}) {N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}} \\ + (Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T}}^{- 1} {(B_{t} K_{W_{t}} N_{t}^{T} + A_{t} K_{t}^{s} (Λ_{t} + C_{t}))}^{T}, K_{t}^{s} ⪰ 0, K_{1}^{s} = 0, t = 1, \dots, n . \end{matrix}$

(210)

and the statements of parts (a) and (b) hold.

Proof.

(a) The statements about Lemma 1 follow from Remark 7. (b) The statements about Theorem 3 are easily verified by replacing all conditional expectations, distributions, etc., for a fixed initial state

S_{1} = S_{1}^{s} = s

, and by using part (a), i.e.,

(Σ_{t}^{s}, {\hat{S}}_{t}^{s}) = (0, S_{t}^{s})

,

t = 1, 2, \dots

. Part (c) follows from parts (a) and (b). □

2.6. Comments on Past Studies

It is easily verified that Yang, Kavcic and Tatikonda [8] analyzed

C_{n}^{f b} (κ, s)

, which is defined by (81), under the Case (II) formulation, i.e., Conditions 1 and 2 of Section 1.1 hold, as discussed in the next remark.

Remark 11.

Prior studies on the time-invariant stationary noise of PSD (41)

Yang, Kavcic and Tatikonda [8] analyzed the AGN channel driven by a stationary noise with PSD defined by (41) (see [8] (Theorem 1)). The special case of (126) is found in [8] (Section VI.B, Theorem 7).
The analysis in [8] presupposed the following formulation:
(i) The code is $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ , where $S_{1} = S_{1}^{s} = s$ is the initial state of the noise, known to the encoder and the decoder, as discussed in Definition 3;
(ii) Conditions 1 and 2 of Section 1.1 hold;
(iii) The n–FTFI capacity formula is $C_{n}^{f b} (κ, s)$ , defined by (81).
We emphasize that in [8] (Section II.C), a specific realization of the PSD is considered to ensure that Conditions 1 and 2 hold, i.e., the analysis in [8] presupposed a stationary noise and the Case (II) formulation.

Now, we ask the following: Given the PSD of the noise defined by (41), and the double-sided realization [4] (Equation (58)), i.e., the analog of time-invariant version of the PO-SS realization of Definition 2, or its analogous one-sided realization, what are the necessary conditions for the feedback capacity of [4] (Theorem 6.1) to be valid?

The answer to this question is as follows: Conditions 1 and 2 of Section 1.1 are necessary conditions. We show this in the next proposition.

Proposition 2.

Conditions for validity of the feedback capacity characterization of [4] (Theorem 6.1)

Consider the AGN channel (1) driven by a stationary noise with PSD defined by (41) with the double-sided or one-sided realization [4] (Equation (58)), (i.e., analog of time invariant of Definition 2).
Then, a necessary condition for [4] (Theorem 6.1) to hold is

$\begin{matrix} P_{X_{t} | X^{t - 1}, Y_{- \infty}^{t - 1}} = P_{X_{t} | S^{t}, Y_{- \infty}^{t - 1}}, t = 1, \dots, \end{matrix}$

(211)

Further, Conditions 1 and 2 of Section 1.1 are necessary and sufficient for Equality (211) to hold.

Proof.

See Appendix A.5. □

The next remark is our final observation on prior studies.

Remark 12.

Comparison of Cover and Pombra Characterization and current literature

From Corollary 7 and Proposition 2, we have the following:
The characterization of feedback capacity given in [4] (Theorem 6.1, $C_{F B}$ ) corresponds to the Case (II) formulation and not to the Case (I) formulation. Further, the optimization problem of [4] (Theorem 6.1, $C_{F B}$ ) is precisely the optimization problem investigated in [8] (Section VI), with the additional restriction that the innovations’ part of the channel input is taken to be asymptotically zero in [4] (Theorem 6.1, $C_{F B}$ ), i.e., see [4] (Lemma 6.1 and comments above it). Recent studies [7,9,11,12] should be read with caution because the results therein often build on [4] (Theorems 4.1 and 6.1).

3. Asymptotic Analysis for Case (I) Formulation

In this section, we address the asymptotic per unit time limit of the n–FTFI capacity. Our analysis includes the following:

(1) Fundamental differences of entropy rates of jointly Gaussian stable versus unstable noise processes.

(2) Necessary and/or sufficient conditions for existence of entropy rates of unstable (and stable)

Y^{n}, n = 1, 2, \dots

, and

V^{n}, n = 1, 2, \dots

, expressed in terms of detectability and stabilizability or unit circle controllability conditions of generalized DREs [16,17], and asymptotic stationarity of the optimal input process

X^{n}, n = 1, 2, \dots

(and output process

Y^{n}, n = 1, 2, \dots

, if the noise is stable).

This section also reconfirms that, in general, the asymptotic analysis of the n–FTFI capacity of a feedback code that depends on the initial state of the channel, i.e.,

S_{1} = S_{1}^{s} = s

is fundamentally different from code that does does not depend on the initial state.

Closed-form expressions of the asymptotic per unit time limit of

C_{n}^{f b} (κ, s)

of AGN channels driven by AR

(c), c \in (- \infty, \infty)

noise, i.e., stable and unstable, are found in [1].

Closed-form expressions of the asymptotic per unit time limit of

C_{n}^{f b} (κ)

of AGN channels driven by ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise are found in [3].

We consider the following definition of rate, often used for nonfeedback capacity of stationary processes; however, our formulations does not assume stationarity.

Definition 4.

Per unit time limit of

C_{n}^{f b, o} (κ)

and

C_{n}^{f b, o} (κ, s)

Consider the AGN channel defined by (1), driven by the time-invariant PO-SS realization of Definition 2.
(a) For the code of Definition 1, define the per unit time limit

$\begin{matrix} C^{f b, o} (κ) & \overset{▵}{=} \sup_{{lim}_{n ⟶ \infty} \frac{1}{n} E \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} lim_{n ⟶ \infty} \frac{1}{n} \{H (Y^{n}) - H (V^{n})\} \end{matrix}$

(212)

$\begin{matrix} \leq C^{f b} (κ) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b} (κ) \end{matrix}$

(213)

where, for problem $C^{f b, o} (κ)$ , the supremum is taken over all asymptotically time-invarinat distributions with feedback $P_{X_{t} | X^{t - 1}, Y^{t - 1}}^{o} = P_{X_{t} | V^{t - 1}, Y^{t - 1}}^{o}, t = 1, 2, \dots$ , such that the limits exists and the supremum exists and it is finite.
(b) For the code of Definition 3, i.e., $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ , with initial state $S_{1} = S_{1}^{s} = s$ , $C^{f b, o} (κ)$ is replaced by $C^{f b, o} (κ, s)$ , defined by (212), with differential entropies, conditional expectations, and conditional distributions, defined for fixed $S_{1}^{s} = s$ .

The rate definition,

C^{f b, o} (κ)

, i.e., the interchange of limit and supremum, is consistent with the definition of rates considered in [4,7,9,11,12]. However, unlike [4,7,9,11,12], we treat the general time-invariant, stable and unstable PO-SS noise realization of Definition 2, which is not necessarily stationary or asymptotically stationary.

We should emphasize that, in general, and irrespective of whether the noise is stable or unstable, the entropy rates that appear in (212) and (213) may not exist. To show existence of the limits $C^{f b, o} (κ), C^{f b} (κ)$ and $C^{f b, o} (κ, s)$ , we identify necessary and/or sufficient conditions, using the characterization of Theorem 3, when the channel input strategies are restricted to asymptotically time-invariant strategies ${lim}_{n ⟶ \infty} Λ_{n} = Λ^{\infty}, {lim}_{n ⟶ \infty} K_{Z_{n}} = K_{Z}^{\infty}$ . Clearly, by (212), whether the limit as, $n ⟶ \infty$ and the supremum over channel input distributions exist, depend on the convergence properties of the coupled generalized matrix DREs, $Σ_{n}, K_{n}$ , as $n ⟶ \infty$ .

3.1. Entropy Rates of Gaussian Processes

First, we recall the following definition, which is standard and found in many textbooks:

Definition 5.

Entropy rate of continuous-valued random processes

Let $X_{t} : Ω \to R^{n_{z}}, n_{x} \in Z_{+}$ be a random process defined on some probability space $(Ω, F, P)$ . The entropy rate (differential) is defined by

$\begin{matrix} H_{R} (X^{\infty}) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} H (X_{1}, X_{2}, \dots, X_{n}) \end{matrix}$

(214)

when the limit exists.

The next theorem quantifies the existence of entropy rates of stationary Gaussian processes [16].

Theorem 4.

The entropy rate of stationary zero-mean full-rank Gaussian process [16]

Let $X_{t} : Ω \to R^{n_{x}}, n_{x} \in Z_{+}, \forall t \in Z_{+}$ be a stationary Gaussian process, with a zero mean, and full-rank covariance of $X^{n}$ . Let $H_{t}^{X}$ denote the Hilbert space of RVs generated by ${X_{t} : s \leq t, s, t \in Z_{+}}$ , and define the innovations process by

$\begin{matrix} Σ_{t} \overset{▵}{=} E \{(X_{t} - E \{X_{t} | H_{t - 1}^{X}\}) {(X_{t} - E \{X_{t} | H_{t - 1}^{X}\})}^{T}\} ≻ 0 \end{matrix}$

(215)

and its limit

$\begin{matrix} Σ \overset{▵}{=} lim_{n ⟶ \infty} Σ_{n} \end{matrix}$

(216)

Then, the entropy rate is given by

$\begin{matrix} H_{R} (X^{\infty}) = & \frac{n_{x}}{2} log (2 π e) + \frac{1}{2} lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} log | Σ_{t} | \end{matrix}$

(217)

$\begin{matrix} = & \frac{n_{x}}{2 π} log (2 π e) + \frac{1}{2} log | Σ | \end{matrix}$

(218)

when it exists.

An application of Theorem 4 is given in the next proposition [15].

Proposition 3.

Entropy rate of Gaussian process described by PSD (41)

Let $V_{t}, \forall t \in Z_{+}$ be a real, scalar-valued, stationary Gaussian noise with PSD (41), with a corresponding time-invariant stationary realization (similar to Definition 2). Then, the entropy rate is given by

$\begin{matrix} H_{R} (V^{\infty}) = \frac{1}{2} log (2 π e K_{W}) . \end{matrix}$

(219)

Proof.

This is shown in [15] by using the Szego formula and Poisson’s integral formula. □

The next remark is trivial; it is introduced for a subsequent comparison.

Remark 13.

Let

V_{t}, \forall t \in Z_{+}

be the non-stationary ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Example 2. Then, the conditional entropy of

V^{n}

for fixed initial state

S_{1} = S_{1}^{s} = s

is given by

\begin{matrix} H_{R} (V^{\infty} | s) \overset{▵}{=} lim_{n ⟶ \infty} \frac{1}{n} H (V^{n} | s) = lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} \frac{1}{2} log (2 π e K_{W}) = \frac{1}{2} log (2 π e K_{W}) . \end{matrix}

(220)

The next lemma identifies fundamental conditions for the existence of the entropy rate of the time-varying PO-SS noise realization of Definition 2 (if

S_{1} = S_{1}^{s} = s

is not fixed) and includes the entropy rate

H_{R} (V^{\infty})

of the non-stationary ARMA

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Remark 13.

Lemma 2.

Entropy rate of the time-varying PO-SS noise realization of Definition 2

Consider the time-varying PO-SS noise realization of Definition 2. Then, the following hold:
(a) The joint entropy of $V^{n}$ , when it exists, is given by

$\begin{matrix} H (V^{n}) = \sum_{t = 1}^{n} H ({\hat{I}}_{t}) = \frac{1}{2} \sum_{t = 1}^{n} log (2 π e K_{{\hat{I}}_{t}}) \end{matrix}$

(221)

where ${\hat{I}}_{t}, t = 1, \dots, n$ is a zero-mean covariance $K_{{\hat{I}}_{t}} \overset{▵}{=} c o v ({\hat{I}}_{t}, {\hat{I}}_{t})$ , Gaussian orthogonal innovation process of $V^{n}$ that is defined by

$\begin{matrix} {\hat{I}}_{t} \overset{▵}{=} & V_{t} - E \{V_{t} | V^{t - 1}\}, t = 1, \dots, n \end{matrix}$

(222)

that is, ${\hat{I}}_{t}$ is independent of ${\hat{I}}_{k}, \forall k \neq t$ .
(b) Suppose that the sequence $K_{{\hat{I}}_{t}}, t = 1, 2, \dots, n$ is such that

$\begin{matrix} lim_{n ⟶ \infty} K_{{\hat{I}}_{n}} = K_{\hat{I}}^{\infty} ≻ 0 . \end{matrix}$

(223)

Then, the entropy rate of $V_{t}, \forall t \in Z_{+}$ is given by

$\begin{matrix} H_{R} (V^{\infty}) = lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} H ({\hat{I}}_{t}) = \frac{1}{2} log (2 π e K_{\hat{I}}^{\infty}) . \end{matrix}$

(224)

Proof.

See Appendix A.6. □

Remark 14.

Entropy rate of non-stationary Gaussian noise

By Lemma 2, a necessary condition for existence of the entropy rate of non-stationary Gaussian process $V^{n}$ is the convergence of the covariance of the Gaussian orthogonal innovations process of $V^{n}$ , i.e., of $K_{{\hat{I}}_{t}} \overset{▵}{=} c o v ({\hat{I}}_{t}, {\hat{I}}_{t})$ , since ${lim}_{n \to \infty} \frac{1}{n} H (V^{n}) = {lim}_{n \to \infty} \frac{1}{n} \sum_{t = 1}^{n} H ({\hat{I}}_{t})$ . We can determine such necessary and/or sufficient conditions from the convergence properties of the generalized Kalman filter equations [16,17] of Lemma 1.

3.2. Convergence Properties of Generalized Matrix DREs to AREs

To address the asymptotic properties of estimation errors generated by the recursions of generalized Kalman filters, such as

{\hat{E}}_{t}, t = 1, 2, \dots

of Theorem 3, generated by (178), we need to introduce the stabilizing solutions of generalized AREs. The next definition is useful in this respect.

Definition 6.

Stabilizing solutions of generalized matrix AREs

Let $(A, G, Q, S, R, C) \in R^{q \times q} \times R^{q \times k} \times R^{k \times k} \times R^{k \times p} \times R^{p \times p} \times R^{p \times q}$ .
Define the generalized time-invariant matrix DRE

$\begin{matrix} P_{t + 1} = A P_{t} A^{T} + G Q G^{T} - (A P_{t} C^{T} + G S) {(R + C P_{t} C^{T})}^{- 1} . {(A P_{t} C^{T} + G S)}^{T}, P_{1} = g i v e n, \\ P_{t} \in S_{+}^{q \times q}, t = 1, \dots, R = R^{T} ≻ 0, \\ F^{C L} (P) \overset{▵}{=} A - (A P C^{T} + G S) {(R + C P C^{T})}^{- 1} C . \end{matrix}$

(225)

Moreover, define the corresponding generalized matrix ARE as follows:

$\begin{matrix} P = & A P A^{T} + G Q G^{T} - (A P C^{T} + G S) {(R + C P C^{T})}^{- 1} . {(A P C^{T} + G S)}^{T}, P \in S_{+}^{q \times q} . \end{matrix}$

(226)

A solution $P = P^{T} ⪰ 0$ to the generalized matrix ARE (226), assuming it exists, is called stabilizing if $s p e c (F^{C L} (P)) \in D_{o}$ . In this case, we say that $F^{C L} (P)$ is asymptotically stable, i.e., the eigenvalues of $F^{C L} (P)$ are stable.

With respect to any of the above generalized matrices DRE and ARE, we introduce the important notions of detectability, unit circle controllability, and stabilizability. We use these notions to characterize the convergence properties of solutions of generalized matrix DREs,

P_{n}

, as

n ⟶ \infty

, to a unique symmetric, non-negative, stabilizing solution P of the generalized matrix ARE. These notions are used to identify necessary and/or sufficient conditions for the error recursions of generalized Kalman filters, such as

{\hat{E}}_{t}, t = 1, 2, \dots

of Theorem 3, generated by (178), to converge in a mean square sense, to a unique limit. However, we should distinguish whether the convergence is uniform for all initial conditions,

P_{1} ⪰ 0

or only for

P_{1} ≻ 0

.

Definition 7.

Detectability, Stabilizability, and Unit Circle Controllability

Consider the generalized matrix ARE of Definition 6 and introduce the matrices

$\begin{matrix} A^{*} \overset{▵}{=} A - G S R^{- 1} C, B^{*} \overset{▵}{=} Q - S R^{- 1} S^{T}, B^{*} = B^{*, \frac{1}{2}} {(B^{*, \frac{1}{2}})}^{T} . \end{matrix}$

(227)

(a) The pair $\{A, C\}$ is called detectable if there exists a matrix $K \in R^{q \times p}$ such that $s p e c (A - K C) \in D_{o}$ , i.e., the eigenvalues λ of $A - K C$ lie in $D_{o}$ (stable).
(b) The pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is called unit circle controllable if there exists a $K \in R^{k \times q}$ such that $s p e c (A^{*} - G B^{*, \frac{1}{2}} K) \notin {c \in C : | c | = 1}$ , i.e., all eigenvalues λ of $A^{*} - G B^{*, \frac{1}{2}} K$ are such that $| λ | \neq 1$ .
(c) The pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is called stabilizable if there exists a $K \in R^{k \times q}$ such that $s p e c (A^{*} - G B^{*, \frac{1}{2}} K) \in D_{o}$ , i.e., all all eigenvalues λ of $A^{*} - G B^{*, \frac{1}{2}} K$ lie in $D_{o}$ .
(d) The pair $\{A, C\}$ is called observable if the rank condition holds:

$\begin{matrix} r a n k (O) = q, O \overset{▵}{=} [\begin{matrix} C \\ C A \\ ⋮ \\ C A^{q - 1} \end{matrix}] . \end{matrix}$

(228)

(e) The pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is called controllable if the rank condition holds:

$\begin{matrix} r a n k (C) = q, C \overset{▵}{=} [\begin{matrix} G B^{*, \frac{1}{2}} & A^{*} G B^{*, \frac{1}{2}} & \dots & {(A^{*})}^{q - 1} G B^{*, \frac{1}{2}} \end{matrix}] . \end{matrix}$

(229)

Remark 15.

The following are well known [16]. If the pair

\{A, C\}

is observable, then it is detectable. If the pair

\{A^{*}, G B^{*, \frac{1}{2}}\}

is controllable, then it is stabilizable.

The next theorem characterizes detectability, unit circle controllability, and stabilizability [17,40].

Lemma 3

([17,40]). Necessary and sufficient conditions for detectability, unit circle controllability, and stabilizability

(a) The pair $\{A, C\}$ is detectable if and only if there exists no eigenvalue and eigenvector ${λ, x}$ , $A x = λ x$ , such that $| λ | \geq 1$ and such that $C x = 0$ .
(b) The pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is unit circle controllable if and only if there exists no eigenvalue and eigenvector ${λ, x}$ , $x^{T} {(A^{*})}^{T} = x^{T} λ$ , such that $| λ | = 1$ and such that $x^{T} G B^{*, \frac{1}{2}} = 0$ .
(c) The pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is stabilizable if and only there exists no eigenvalue and eigenvector ${λ, x}$ , $x^{T} {(A^{*})}^{T} = x^{T} λ$ , such that $| λ | \geq 1$ and such that $x^{T} G B^{*, \frac{1}{2}} = 0$ .

In the next theorem, we summarize known results on sufficient and/or necessary conditions for the convergence of solutions

{P_{t}, t = 1, 2, \dots, n}

of the generalized time-invariant DRE (225), as

n ⟶ \infty

, to a symmetric, non-negative

P ⪰ 0

, stabilizing solution of the corresponding generalized ARE (226). We recall that the pair

\{A, C\}

is detectable, which is a necessary condition forthe convergence of the sequence

{P_{t}, t = 1, 2, \dots, n}

as

n ⟶ \infty

to a non-negative P, which is a stabilizing solution of a corresponding generalized ARE. However, it is not sufficient. To have a sufficient condition, it is necessary that the pair

\{A^{*}, G B^{*, \frac{1}{2}}\}

is unit circle controllable; however, the limiting P is not necessarily the unique solution of

P ⪰ 0

of the generalized ARE. There may be multiple solutions depending on the initial condition

P_{1} ⪰ 0

.

Theorem 5

([16,17]). Convergence of time-invariant generalized DRE

Let ${P_{t}, t = 1, 2, \dots, n}$ denote a sequence that satisfies the time-invariant generalized DRE (225) with arbitrary initial condition $P_{1} ⪰ 0$ . The following hold:
(1) Consider the generalized DRE (225) with a zero initial condition, i.e., $P_{1} = 0$ , and assume that the pair $\{A, C\}$ is detectable and that the pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is unit circle controllable.
Then, the sequence ${P_{t} : t = 1, 2, \dots, n}$ that satisfies the generalized DRE (225), with a zero initial condition $P_{1} = 0$ , converges to P, i.e., ${lim}_{n ⟶ \infty} P_{n} = P$ , where P satisfies the generalized matrix ARE (226) if and only if the pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is stabilizable.
(2) Assume that the pair $\{A, C\}$ is detectable and that the pair $\{A^{*}, G B^{*, \frac{1}{2}}\}$ is unit circle controllable. Then there exists a unique stabilizing solution $P ⪰ 0$ to the generalized ARE (226), i.e., such that $s p e c (F^{C L} (P)) \in D_{o}$ if and only if ${A^{*}, G B^{*, \frac{1}{2}}}$ is stabilizable.
(3) If ${A, C}$ is detectable and ${A^{*}, G B^{*, \frac{1}{2}}}$ is stabilizable, then any solution $P_{t}, t = 1, 2, \dots, n$ to the generalized matrix DRE (225) with arbitrary initial condition $P_{1} ⪰ 0$ is such that ${lim}_{n ⟶ \infty} P_{n} = P$ , where $P ⪰ 0$ is the unique solution of the generalized matrix ARE (226) with $s p e c (F^{C L} (P)) \in D_{o}$ , i.e., it is stabilizing.
(4) ${A, C}$ is detectable and ${A^{*}, G B^{*, \frac{1}{2}}}$ unit circle controllable, which are necessary and sufficient conditions for any solution $P_{t}, t = 2, \dots, n$ to the generalized DRE (225) to converge, ${lim}_{n ⟶ \infty} P_{n} = P$ , from some initial condition $P_{1} ⪰ 0$ , where $P ⪰ 0$ is a stabilizing solution of the generalized ARE (226), but it may not be unique (i.e., (226) may have multiple solutions $P ⪰ 0$ ).

Proposition 4.

Generalizations to asymptotic-time invariant coefficients

Suppose that the coefficients of the generalized matrix DRE (225), $(A, C, G, Q, S)$ are replaced by $(A_{t}, C_{t}, G_{t}, Q_{t}, S_{t})$ , $t = 1, 2, \dots, n$ , and they are asymptotically time-invariant, i.e.,

$\begin{matrix} lim_{n ⟶ \infty} (A_{n}, C_{n}, G_{n}, Q_{n}, S_{n}) = (A, C, G, Q, S) . \end{matrix}$

(230)

Then, Theorem 5 remains valid.

Proof.

This is due to the well-known continuity properties of matrix DREs with respect to their coefficients, i.e., the convergence properties are characterized by the limiting pairs,

\{A, C\}

and

\{A^{*}, G B^{*, \frac{1}{2}}\}

. □

3.3. Feedback Rates

Now, we return to the feedback rates of Definition 4. The next corollary is an application of Theorem 5 to the generalized Kalman filter of Lemma 1 (for the time-invariant PO-SS realization); it identifies conditions for existence of the entropy rate

H_{R} (V^{\infty})

, irrespective of whether the noise is stable or unstable.

Corollary 8.

The entropy rate of PO-SS noise realization based on the generalized Kalman filter

Let $Σ_{t}^{o} = Σ_{t}, t = 1, 2, \dots$ denote the solution of the generalized matrix DRE (95) of the generalized Kalman filter of Lemma 1 of the time-invariant PO-SS realization of $V^{n}$ of Definition 2, i.e., $(A_{t}, B_{t}, C_{t}, N_{t}, K_{W_{t}}) = (A, B, C, N, K_{W}), \forall t$ , generated by

$\begin{matrix} Σ_{t + 1}^{o} = & A Σ_{t}^{o} A^{T} + B K_{W} B^{T} - (A Σ_{t}^{o} C^{T} + B K_{W} N^{T}) {(N K_{W} N^{T} + C Σ_{t}^{o} C^{T})}^{- 1} \end{matrix}$

$\begin{matrix} . {(A Σ_{t}^{o} C^{T} + B K_{W} N^{T})}^{T}, Σ_{t}^{o} ⪰ 0, t = 1, \dots, n, Σ_{1}^{o} = K_{S_{1}} ⪰ 0 . \end{matrix}$

(231)

$\begin{matrix} M^{C L} (Σ^{o}) \overset{▵}{=} & A - M (Σ^{o}) C, M (Σ^{o}) \overset{▵}{=} (A Σ^{o} C^{T} + B_{t} K_{W} N^{T}) {(N K_{W} N^{T} + C Σ^{o} C^{T})}^{- 1} . \end{matrix}$

(232)

Let $Σ^{\infty} = Σ^{\infty, T} ⪰ 0$ be a solution of the corresponding generalized ARE

$\begin{matrix} Σ^{\infty} = & A Σ^{\infty} A^{T} + B K_{W} B^{T} \\ - (A Σ^{\infty} C^{T} + B K_{W} N^{T}) {(N K_{W} N^{T} + C Σ^{\infty} C^{T})}^{- 1} {(A Σ^{\infty} C^{T} + B K_{W} N^{T})}^{T} . \end{matrix}$

(233)

Define the matrices

$\begin{matrix} G Q G^{T} \overset{▵}{=} B K_{W} B^{T}, G S \overset{▵}{=} B K_{W} N^{T}, R \overset{▵}{=} N K_{W} N^{T} ⟹ G \overset{▵}{=} B, Q \overset{▵}{=} K_{W}, S \overset{▵}{=} K_{W} N^{T}, \end{matrix}$

(234)

$\begin{matrix} A^{*} \overset{▵}{=} A - B K_{W} N^{T} {(N K_{W} N^{T})}^{- 1} C, B^{*} \overset{▵}{=} K_{W} - K_{W} N^{T} {(N K_{W} N^{T})}^{- 1} {(K_{W} N^{T})}^{T} . \end{matrix}$

(235)

(a) All statements of Theorem 5 hold with $(G, Q, S, R)$ as defined by (234) and (235).
In particular, suppose the following:
(i) ${A, C}$ is detectable;
(ii) ${A^{*}, G B^{*, \frac{1}{2}}}$ is stabilizable.
Then, any solution $Σ_{t}^{o}, t = 1, 2, \dots, n$ to the generalized matrix DRE (231) with arbitrary initial condition $Σ_{1}^{o} ⪰ 0$ is such that ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty}$ , where $Σ^{\infty} ⪰ 0$ is the unique and stabilizing solution of the generalized matrix ARE (233), i.e., with $s p e c (M^{C L} (Σ^{\infty})) \in D_{o}$ .
(b) Suppose that (i) and (ii) hold. The entropy rate of $V^{n}$ is given by

$\begin{matrix} H_{R} (V^{\infty}) = & lim_{n ⟶ \infty} \frac{1}{2 n} \sum_{t = 1}^{n} log (2 π e [C Σ_{t}^{o} C^{T} + N K_{W} N^{T}]) \end{matrix}$

(236)

$\begin{matrix} = & H ({\hat{I}}_{t}^{\infty}) \overset{▵}{=} \frac{1}{2} log (2 π e [C Σ^{\infty} C^{T} + N K_{W} N^{T}]), \forall Σ_{1}^{o} ⪰ 0, \forall t \end{matrix}$

(237)

where

$\begin{matrix} {\hat{I}}_{t}^{\infty} \overset{▵}{=} C (S_{t} - {\hat{S}}_{t}^{\infty}) + N W_{t} \in N (0, C Σ^{\infty} C^{T} + N K_{W} N^{T}), t = 1, 2, \dots, \end{matrix}$

(238)

is the stationary Gaussian innovations process, i.e., with $Σ_{t}^{o}$ replaced by $Σ^{\infty}$ , and the entropy rate $H_{R} (V^{\infty})$ is independent of the initial data $Σ_{1}^{o} ⪰ 0$ .
(c) Suppose in parts (a) and (b), the condition ${A^{*}, G B^{*, \frac{1}{2}}}$ is stabilizable is replaced by
(iii) ${A^{*}, G B^{*, \frac{1}{2}}}$ is unit circle controllable.
Then, the statements of parts (a) and (b) hold, for some $Σ_{1}^{o} ⪰ 0$ , but not for all $Σ_{1}^{o} ⪰ 0$ . Moreover, ${\hat{I}}_{t}^{\infty}$ is not necessarily a stationary process, i.e., it depends on the value of $Σ_{1}^{o} ⪰ 0$ .

Proof.

(a) These are direct applications of Theorem 5. (b) This follows from Lemma 2. (c) This is due to Theorem 5(4). □

Next, we apply Corollary 8 to the non-stationary AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise.

Lemma 4.

Properties of solutions of DREs and AREs of AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise and entropy rate

H_{R} (V^{\infty})

Consider the AR $(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)$ noise of Example 2(a), and the DRE $Σ_{t}^{o} \overset{▵}{=} Σ_{t}, t = 1, \dots, n$ , generated by Corollary 6(a), i.e.,

$\begin{matrix} Σ_{t + 1}^{o} = {(c)}^{2} Σ_{t}^{o} + K_{W} - {(c Σ_{t}^{o} (c - a) + K_{W})}^{2} {(K_{W} + {(c - a)}^{2} Σ_{t}^{o})}^{- 1}, t = 1, \dots, n, \end{matrix}$

(239)

$\begin{matrix} Σ_{1}^{o} = K_{S_{1}} = \frac{{(c_{0})}^{2} K_{S_{0}} + {(a_{0})}^{2} K_{W_{0}}}{{(c_{0} - a_{0})}^{2}} \geq 0 . \end{matrix}$

(240)

where $K_{W} > 0, c \neq a$ , $K_{S_{0}} \geq 0, K_{W_{0}} \geq 0$ . Let $Σ^{\infty} \geq 0$ be a solution of the corresponding generalized ARE, as follows:

$\begin{matrix} Σ^{\infty} = {(c)}^{2} Σ^{\infty} + K_{W} - {(c Σ^{\infty} (c - a) + K_{W})}^{2} {(K_{W} + {(c - a)}^{2} Σ^{\infty})}^{- 1} . \end{matrix}$

(241)

Then, the detectability and stabilizability pairs are

$\begin{matrix} {A, C} = {c, c - a}, {A^{*}, G B^{*, \frac{1}{2}}} = {a, 0} . \end{matrix}$

(242)

and the following hold:
(1) The pair ${A, C} = {c, c - a}$ is detectable $\forall c \in (- \infty, \infty), a \in (- \infty, \infty)$ (the restriction $c \neq a$ is always assumed).
(2) The pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is unit circle controllable if and only if $| a | \neq 1$ ( $\forall c \in (- \infty, \infty)$ ).
(3) The pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is stabilizable if and only if $a \in (- 1, 1)$ ( $\forall c \in (- \infty, \infty)$ ).
(4) Suppose $c \in (- \infty, \infty)$ and $a \in (- \infty, \infty)$ . The sequence ${Σ_{t}^{o}, t = 1, 2, \dots, n}$ that satisfies the generalized DRE with any initial condition, $Σ_{1}^{o} \geq 0$ , converges to $Σ^{\infty}$ , i.e., ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty}$ , where $Σ^{\infty} \geq 0$ satisfies the ARE (241) if and only if the ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is unit circle controllable, equivalently, $| a | \neq 1$ . Moreover, the solutions of the quadratic Equation (241), without imposing $Σ^{\infty} \geq 0$ are

$\begin{matrix} Σ^{\infty} = \{\begin{matrix} 0 & u n i q u e, s t a b i l i z i n g, Σ^{\infty} \geq 0 s o l . o f (241) f o r c \in (- \infty, \infty), | a | < 1 \\ \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}} > 0 & m a x i m a l, s t a b i l i z i n g, Σ^{\infty} > 0 s o l . o f (241) f o r c \in (- \infty, \infty), | a | > 1 \\ \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}} < 0 & n o n - s t a b i l i z i n g, Σ^{\infty} < 0 s o l . o f (241) f o r c \in (- \infty, \infty), | a | < 1 . \end{matrix} \end{matrix}$

(243)

That is, ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty} = 0, \forall Σ_{1}^{o} \geq 0$ , is the unique and stabilizing solution $Σ^{\infty} \geq 0$ of (241), i.e., such that $| M^{C L} (Σ^{\infty}) | < 1$ , if and only if $| a | < 1$ , and
${lim}_{n ⟶ \infty} Σ_{n}^{0} = Σ^{\infty} = \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}} > 0, \forall Σ_{1}^{o} > 0$ is the maximal and stabilizing solution $Σ^{\infty}$ of (241), i.e., such that $| M^{C L} (Σ^{\infty}) | < 1$ , if and only if $| a | > 1$ .
(5) Suppose $c \in (- \infty, \infty)$ and $| a | < 1$ . Then, any solution $Σ_{t}^{o}, t = 1, 2, \dots, n$ to the generalized DRE (239) with an arbitrary initial condition, $Σ_{1}^{o} \geq 0$ , is such that ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty}$ , where $Σ^{\infty} \geq 0$ is the unique solution of the generalized ARE (241) with $M^{C L} (Σ^{\infty}) \in (- 1, 1)$ , i.e., it is stabilizing. Moreover, $Σ^{\infty} = 0$ .
(6) (i) Suppose $c \in (- \infty, \infty)$ and $| a | < 1$ . The entropy rate of $V_{t}, \forall t \in Z_{+}$ is given by

$\begin{matrix} H_{R} (V^{\infty}) = & lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} \frac{1}{2} log (2 π e [{(c - a)}^{2} Σ_{t}^{o} + K_{W}]) \end{matrix}$

(244)

$\begin{matrix} = & \frac{1}{2} log (2 π e K_{W}), \forall Σ_{1}^{o} \geq 0 . \end{matrix}$

(245)

(ii) Suppose $c \in (- \infty, \infty)$ and $| a | > 1$ . The entropy rate of $V_{t}, \forall t \in Z_{+}$ is given by

$\begin{matrix} H_{R} (V^{\infty}) = & lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} \frac{1}{2} log (2 π e [{(c - a)}^{2} Σ_{t}^{o} + K_{W}]) \end{matrix}$

(246)

$\begin{matrix} = & \frac{1}{2} log (2 π e K_{W} a^{2}), \forall Σ_{1}^{o} > 0 . \end{matrix}$

(247)

Proof.

See Appendix A.7. □

Remark 16.

Lemma 4(4) emphasizes the fact that in the asymptotic analysis of

{Σ_{t}^{o}, t = 1, 2, \dots,}

, which satisfies the DREs (239) and (240), its limiting value,

{lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty}

, where

Σ^{\infty} \geq 0

satisfies the ARE (241), with two solutions

Σ^{\infty} = 0

and

Σ^{\infty} = \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}}

. For any

c \in (- \infty, \infty)

, it is clear that for

| a | < 1

, the unique and stabilizing solution is

Σ^{\infty} = 0

,

\forall Σ_{1}^{o} \geq 0

, since the other solution

Σ^{\infty} = \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}} < 0

is negative. On the other hand, for any

c \in (- \infty, \infty)

and

| a | > 1

, the stabilizing solution is the maximal solution,

Σ^{\infty} = \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}} > 0

, provided

Σ_{1}^{o} > 0

.

To gain additional insights, we discuss the application of Lemma 4 to the AR

(c), c \in (- \infty, \infty)

noise in the next remark.

Remark 17.

Entropy rate

H_{R} (V^{\infty})

of the AR

(c), c \in (- \infty, \infty)

noise

Consider the non-stationary AR $(c), c \in (- \infty, \infty)$ noise defined by (124). Then, from Lemma 4, $Σ_{t}^{o}, t = 1, \dots, n$ is the solution of (239) and (240), with $a = 0$ (see Corollary 6(b), (201)), and (241) degenerates to the ARE, as follows:

$\begin{matrix} Σ^{\infty} = {(c)}^{2} Σ^{\infty} + K_{W} - {({(c)}^{2} Σ^{\infty} + K_{W})}^{2} {(K_{W} + {(c)}^{2} Σ^{\infty})}^{- 1} \end{matrix}$

(248)

For $a = 0$ , by (242), the pair ${A, C} = {c, c}$ is detectable, and the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {0, 0}$ is stabilizable. The two solutions of the ARE (248), without imposing $Σ^{\infty} \geq 0$ , are

$\begin{matrix} Σ^{\infty} = \{\begin{matrix} 0 & t h e u n i q u e, s t a b i l i z i n g, n o n - n e g a t i v e s o l u t i o n o f t h e A R E \\ - \frac{K_{W}}{c^{2}} < 0 & t h e n o n - s t a b i l i z i n g, n e g a t i v e s o l u t i o n o f t h e A R E . \end{matrix} \end{matrix}$

(249)

That is, ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty} \geq 0$ , where $Σ^{\infty} = 0$ is the unique (stabilizing) solution of the ARE and corresponds to the stable eigenvalue of the error equation (see (93), i.e., $M^{C L} (Σ^{\infty}) = c - \frac{K_{W}}{K_{W}} c = 0$ .

Next, we compute the entropy rate

H_{R} (V^{\infty})

of the time-invariant non-stationary PO-SS

(a, c, b^{1}, b^{2}, d^{1}, d^{2})

noise of Corollary 2 to show fundamental differences from the entropy rate

H_{R} (V^{\infty})

of the AR

(a, c)

noise of Lemma 4.

Lemma 5.

Properties of solutions of DREs and AREs of PO-SS

(a, c, b^{1} = b, b^{2} = 0, d^{1} = 0, d^{2} = d)

noise and entropy rate

H_{R} (V^{\infty})

Consider the the time-invariant non-stationary PO-SS $(a, c, b^{1}, b^{2} = 0, d^{1} = 0, d^{2} = d)$ noise of Example 1, i.e., given by

$\begin{matrix} S_{t + 1} = a S_{t} + b W_{t}^{1}, t = 1, 2, \dots, n - 1 \end{matrix}$

(250)

$\begin{matrix} V_{t} = c S_{t} + d W_{t}^{2}, t = 1, \dots, n, \end{matrix}$

(251)

and the sequence $Σ_{t}^{o} \overset{▵}{=} Σ_{t}, t = 1, \dots, n$ , generated by the DRE of Lemma 1 (see (113), i.e.,

$\begin{matrix} Σ_{t + 1}^{o} = {(a)}^{2} Σ_{t}^{o} + {(b)}^{2} K_{W^{1}} \\ - {(a Σ_{t}^{o} c)}^{2} {({(d)}^{2} K_{W^{2}} + {(c)}^{2} Σ_{t}^{o})}^{- 1}, t = 1, \dots, n, Σ_{1}^{o} = K_{S_{1}} \geq 0, Σ_{t}^{o} \geq 0 \end{matrix}$

(252)

where ${(b)}^{2} K_{W^{1}} \geq 0, {(d)}^{2} K_{W}^{2} > 0$ . Let $Σ^{\infty} \geq 0$ be the corresponding solution of generalized ARE:

$\begin{matrix} Σ^{\infty} = {(a)}^{2} Σ^{\infty} + {(b)}^{2} K_{W^{1}} - {(a Σ^{\infty} c)}^{2} {({(d)}^{2} K_{W^{2}} + {(c)}^{2} Σ^{\infty})}^{- 1} . \end{matrix}$

(253)

Then, the detectability and stabilizability pairs are

$\begin{matrix} {A, C} = {a, c}, {A^{*}, G B^{*, \frac{1}{2}}} = {a, b {(K_{W^{1}})}^{\frac{1}{2}}} . \end{matrix}$

(254)

and the following hold:
(1) The pair ${A, C} = {a, c}$ is detectable $\forall c \in (- \infty, \infty), a \in (- \infty, \infty), c \neq 0$ . If $c = 0$ the pair ${A, C} = {a, 0}$ is detectable if and only if $| a | < 1$ .
(2) The pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, b {(K_{W^{1}})}^{\frac{1}{2}}}$ is unit circle controllable if and only if $| b {(K_{W^{1}})}^{\frac{1}{2}} | \neq 1$ , $\forall a \in (- \infty, \infty), c \in (- \infty, \infty)$ .
(3) The pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, b {(K_{W^{1}})}^{\frac{1}{2}}}$ is stabilizable if $b {(K_{W^{1}})}^{\frac{1}{2}} \neq 0$ , $\forall a \in (\infty, \infty), c \in (- \infty, \infty)$ . If $b {(K_{W^{1}})}^{\frac{1}{2}} = 0$ the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is stabilizable if and only if $| a | < 1$ .
(4) Define the set

$\begin{matrix} L^{\infty} \overset{▵}{=} {(a, c, {(b)}^{2} K_{W^{1}}) \in & {(- \infty, \infty)}^{2} \times [0, \infty) : (i) t h e p a i r {A, C} = {a, c} i s d e t e c t a b l e, a n d \\ (i i) t h e p a i r {A^{*}, G B^{*, \frac{1}{2}}} = {a, b {(K_{W^{1}})}^{\frac{1}{2}}} i s s t a b i l i z a b l e} . \end{matrix}$

(255)

For any $(a, c, b {(K_{W^{1}})}^{\frac{1}{2}}) \in L^{\infty}$ , any solution $Σ_{t}^{o}, t = 1, 2, \dots, n$ to the (classical) DRE (252) with an arbitrary initial condition, $Σ_{1}^{o} \geq 0$ is such that ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty}$ , where $Σ^{\infty} \geq 0$ is the unique solution of the (classical) ARE (253) with $M^{C L} (Σ^{\infty}) \in (- 1, 1)$ , i.e., it is stabilizing.
(5) For any $(a, c, b^{2} K_{W^{1}}) \in L^{\infty}$ of part (4), the entropy rate of $V_{t}, \forall t \in Z_{+}$ is given by

$\begin{matrix} H_{R} (V^{\infty}) = & lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} \frac{1}{2} log (2 π e [{(c)}^{2} Σ_{t}^{o} + {(d)}^{2} K_{W^{2}}]) \end{matrix}$

(256)

$\begin{matrix} = & \frac{1}{2} log (2 π e [{(c)}^{2} Σ^{\infty} + {(d)}^{2} K_{W^{2}}]), \forall Σ_{1}^{o} \geq 0 . \end{matrix}$

(257)

Proof.

Follows from Theorem 5. □

Next, we turn our attention to the convergence properties of the entropy rate

H_{R} (Y^{\infty})

, which is needed for the characterization of

C^{f b, o} (κ)

of Definition 4.

Theorem 6.

Asymptotic properties of entropy rate

H_{R} (Y^{\infty})

of Theorem 3

Let $K_{t}^{o}, t = 1, \dots,$ be the solution of the generalized DRE (180) of the generalized Kalman filter of Theorem 3, corresponding to the time-invariant PO-SS realization of $V^{n}$ of Definition 2, $(A_{t}, B_{t}, C_{t}, N_{t}, K_{W_{t}}) = (A, B, C, N, K_{W}), \forall t$ , with time-invariant strategies $(Λ_{t}, K_{Z_{t}}) = (Λ^{\infty}, K_{Z}^{\infty})$ , $\forall t$ , generated by

$\begin{matrix} K_{t + 1}^{o} = A K_{t}^{o} A^{T} + M (Σ_{t}^{o}) K_{{\hat{I}}_{t}^{o}} {(M (Σ_{t}^{o}))}^{T} \\ - (A K_{t}^{o} {(Λ^{\infty} + C)}^{T} + M (Σ_{t}^{o}) K_{{\hat{I}}_{t}^{o}}) {(K_{{\hat{I}}_{t}^{o}} + K_{Z}^{\infty} + (Λ^{\infty} + C) K_{t}^{o} {(Λ^{\infty} + C)}^{T})}^{- 1} \\ . {(A K_{t}^{o} {(Λ^{\infty} + C)}^{T} + M (Σ_{t}^{o}) K_{{\hat{I}}_{t}^{o}})}^{T}, K_{t}^{o} = K_{t}^{o, T} ⪰ 0, t = 1, \dots, n, K_{1}^{o} = 0 \end{matrix}$

(258)

where

$\begin{matrix} K_{{\hat{I}}_{t}^{o}} = C Σ_{t}^{o} C^{T} + N K_{W} N^{T}, Σ_{t}^{o} i s a s o l u t i o n o f (231), M (Σ^{o}) i s g i v e n b y (232), \end{matrix}$

(259)

$\begin{matrix} F^{C L} (Σ^{o}, K^{o}) \overset{▵}{=} A - F (Σ^{o}, K^{o}) (Λ^{\infty} + C), \end{matrix}$

(260)

$\begin{matrix} F (Σ^{o}, K^{o}) \overset{▵}{=} (A K^{o} {(Λ^{\infty} + C)}^{T} + M (Σ^{o}) K_{{\hat{I}}^{o}}) {\{K_{{\hat{I}}^{o}} + K_{Z^{\infty}} + (Λ^{\infty} + C) K^{o} {(Λ^{\infty} + C)}^{T}\}}^{- 1} . \end{matrix}$

(261)

Define the corresponding generalized ARE by

$\begin{matrix} K^{\infty} = A K^{\infty} A^{T} + M (Σ^{\infty}) K_{{\hat{I}}^{\infty}} {(M (Σ^{\infty}))}^{T} - (A K^{\infty} {(Λ^{\infty} + C)}^{T} + M (Σ^{\infty}) K_{{\hat{I}}^{\infty}}) (K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty} \\ + (Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T})^{- 1} {(A K^{\infty} {(Λ^{\infty} + C)}^{T} + M (Σ^{\infty}) K_{{\hat{I}}^{\infty}})}^{T}, K^{\infty} = K^{\infty, T} ⪰ 0 . \end{matrix}$

(262)

where

$\begin{matrix} K_{{\hat{I}}^{\infty}} = C Σ^{\infty} C^{T} + N K_{W} N^{T}, Σ^{\infty} i s a s o l u t i o n o f (233), M (Σ^{\infty}) i s g i v e n b y (232) . \end{matrix}$

(263)

Introduce the matrices

$\begin{matrix} C (Λ^{\infty}) \overset{▵}{=} Λ^{\infty} + C, G Q G^{T} \overset{▵}{=} M (Σ^{\infty}) K_{{\hat{I}}^{\infty}} {(M (Σ^{\infty}))}^{T}, \end{matrix}$

(264)

$\begin{matrix} G S \overset{▵}{=} M (Σ^{\infty}) K_{{\hat{I}}^{\infty}}, R (K_{Z}^{\infty}) \overset{▵}{=} K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty} . \end{matrix}$

(265)

$\begin{matrix} ⟹ G \overset{▵}{=} M (Σ^{\infty}), Q \overset{▵}{=} K_{{\hat{I}}^{\infty}}, S \overset{▵}{=} K_{{\hat{I}}^{\infty}}, \end{matrix}$

(266)

$\begin{matrix} A^{*} (Λ^{\infty}, K_{Z}^{\infty}) \overset{▵}{=} A - M (Σ^{\infty}) K_{{\hat{I}}^{\infty}} {(K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty})}^{- 1} (Λ^{\infty} + C), \end{matrix}$

(267)

$\begin{matrix} B^{*} (K_{Z}^{\infty}) \overset{▵}{=} K_{{\hat{I}}^{\infty}} - K_{{\hat{I}}^{\infty}} {(K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty})}^{- 1} K_{{\hat{I}}^{\infty}} . \end{matrix}$

(268)

Suppose that the detectability and stabilizability conditions of Corollary 8(i,ii) hold.
Then, all statements of Theorem 5 hold with $(C (Λ^{\infty}), G, Q, S, R (K_{Z}^{\infty}))$ as defined by (265).
In particular, suppose
(i) ${A, C (Λ^{\infty})} = {A, Λ^{\infty} + C}$ is detectable;
(ii) ${A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})}$ is stabilizable.
Then, any solution $K_{t}^{o}, t = 1, 2, \dots, n$ to the generalized matrix DRE (258) with an arbitrary initial condition $K_{1}^{o} ⪰ 0$ is such that ${lim}_{n ⟶ \infty} K_{n}^{o} = K^{\infty}$ , where $K^{\infty} ⪰ 0$ is the unique solution of the generalized matrix ARE (262) with $s p e c (F^{C L} (K^{\infty}, Σ^{\infty})) \in D_{o}$ , i.e., it is stabilizing.
Moreover, the entropy rate of $Y^{n}$ is given by

$\begin{matrix} H_{R} (Y^{\infty}) = lim_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} H (I_{t}^{o}) \end{matrix}$

(269)

$\begin{matrix} = lim_{n ⟶ \infty} \frac{1}{2 n} \sum_{t = 1}^{n} log (2 π e [(Λ^{\infty} + C) K_{t}^{o} {(Λ^{\infty} + C)}^{T} + K_{{\hat{I}}_{t}^{o}} + K_{Z}^{\infty}]) \end{matrix}$

(270)

$\begin{matrix} = H (I_{t}^{\infty}) \overset{▵}{=} \frac{1}{2} log (2 π e [(Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T} + K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty}]), \forall K_{1}^{o} ⪰ 0, \forall t \end{matrix}$

(271)

where $I_{t}^{o}, t = 1, \dots, n$ is the innovations process of Theorem 3 (with indicated changes of time-invariant strategies) and where

$\begin{matrix} I_{t}^{\infty} = (Λ^{\infty} + C) ({\hat{S}}_{t}^{\infty} - \hat{{\hat{S}}_{t}^{\infty}}) + {\hat{I}}_{t}^{\infty} + Z_{t}, \end{matrix}$

(272)

$\begin{matrix} I_{t}^{\infty} \in N (0, (Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T} + K_{{\hat{I}}^{\infty}} + K_{Z}^{\infty}), t = 1, 2, \dots, \end{matrix}$

(273)

is the stationary Gaussian innovations process, i.e., with $(K_{t}^{o}, Σ_{t}^{o})$ replaced by $(K^{\infty}, Σ^{\infty})$ .

Proof.

Since the detectability and stabilizability conditions of Corollary 8 hold, then the statements of Corollary 8 hold. By the continuity property of the solutions of generalized difference Riccati equations, with respect to its coefficients (see [16]), and the convergence of the sequence

{lim}_{n ⟶ \infty} Σ_{n}^{\infty} = Σ^{\infty}

, where

Σ^{\infty} ⪰ 0

is the unique stabilizing solution of (233), then the statements of Theorem 6 hold, as stated. In particular, under the detectability and stabilizability Conditions (i) and (ii),

{lim}_{n ⟶ \infty} K_{n}^{o} = K^{\infty}

, where

K^{\infty} ⪰ 0

is the unique and stabilizing solution of (262). □

In the next lemma, we apply Theorem 6 to the AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Example 2(a) using Lemma 4.

Lemma 6.

Consider the AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Example 2(a), and the DRE

Σ_{t}^{o} \overset{▵}{=} Σ_{t}, t = 1, \dots, n

and ARE of Lemma 4, (239)–(242).

Let $K_{t}^{o}, t = 1, \dots, n$ denote the solution of the DRE of Corollary 6(a), when $Λ_{t} = Λ^{\infty}, K_{Z_{t}} = K_{Z}^{\infty}, K_{t} = K_{t}^{o}, \forall t$ , i.e., given by

$\begin{matrix} K_{t + 1}^{o} = & {(c)}^{2} K_{t}^{o} + {(M (Σ_{t}^{o}))}^{2} K_{{\hat{I}}_{t}^{o}} - {(c K_{t}^{o} (Λ^{\infty} + c - a) + M (Σ_{t}^{o}) K_{{\hat{I}}_{t}^{o}})}^{2} \end{matrix}$

$\begin{matrix} . {(K_{{\hat{I}}_{t}^{o}} + K_{Z}^{\infty} + {(Λ^{\infty} + c - a)}^{2} K_{t}^{o})}^{- 1}, K_{1}^{o} = 0, t = 1, \dots, n, \end{matrix}$

(274)

$\begin{matrix} K_{Z}^{\infty} \geq & 0, K_{t}^{o} \geq 0, t = 1, \dots, n \end{matrix}$

(275)

and where

$\begin{matrix} M (Σ_{t}^{o}) \overset{▵}{=} (c Σ_{t}^{o} (c - a) + K_{W}) {(K_{W} + {(c - a)}^{2} Σ_{t}^{o})}^{- 1}, \end{matrix}$

(276)

$\begin{matrix} K_{{\hat{I}}_{t}^{o}} = {(c - a)}^{2} Σ_{t}^{o} + K_{W}, t = 1, \dots, n . \end{matrix}$

(277)

Define the set

$\begin{matrix} L^{\infty} \overset{▵}{=} {(a, c) \in {(- \infty, \infty)}^{2}, & a \neq c : (i) t h e p a i r {A, C} = {a, c - a} i s d e t e c t a b l e, a n d \\ (i i) t h e p a i r {A^{*}, G B^{*, \frac{1}{2}}} = {a, 0} i s s t a b i l i z a b l e} . \end{matrix}$

(278)

For any $(a, c) \in L^{\infty}$ , let $K^{\infty} \geq 0$ be a corresponding solution of the ARE (evaluated at ${lim}_{n ⟶ \infty} Σ_{n}^{\infty} = Σ^{\infty} = 0$ ),

$\begin{matrix} K^{\infty} = & {(c)}^{2} K^{\infty} + K_{W} - {(c K^{\infty} (Λ^{\infty} + c - a) + K_{W})}^{2} {(K_{W} + K_{Z}^{\infty} + {(Λ^{\infty} + c - a)}^{2} K^{\infty})}^{- 1} . \end{matrix}$

(279)

$\begin{matrix} K_{Z}^{\infty} \geq & 0, K_{W} > 0 . \end{matrix}$

(280)

and define the pairs

$\begin{matrix} {A, C (Λ^{\infty}} = {c, Λ^{\infty} + c - a}, \end{matrix}$

(281)

$\begin{matrix} {A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})} \\ = {c - K_{W} {(K_{W} + K_{Z}^{\infty})}^{- 1} (Λ^{\infty} + c - a), {(K_{W} - {(K_{W})}^{2} {(K_{W} + K_{Z}^{\infty})}^{- 1})}^{\frac{1}{2}}} . \end{matrix}$

(282)

Then, the following hold:
(1) Suppose $Λ^{\infty} + c - a \neq 0$ . Then, ${A, C (Λ^{\infty}} = {c, Λ^{\infty} + c - a}$ is detectable $\forall (a, c) \in {(- \infty, \infty)}^{2}$ .
(2) Suppose $Λ^{\infty} + c - a = 0$ . Then, ${A, C (Λ^{\infty}} = {c, 0}$ is detectable for if and only if $| c | < 1$ $\forall a \in (- \infty, \infty)$ .
(3) Suppose $K_{Z}^{\infty} = 0$ . Then, the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {- Λ^{\infty} + a, 0}$ is unit circle controllable if and only if $| Λ - a | \neq 1$ $\forall a \in (- \infty, \infty)$ .
(4) Suppose $K_{Z}^{\infty} = 0$ . Then, the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {- Λ^{\infty} + a, 0}$ is stabilizable if and only if $| Λ - a | < 1$ $\forall a \in (- \infty, \infty)$ .
(5) Suppose $Λ^{\infty} + c - a \neq 0, | Λ - a | \neq 1$ $\forall (a, c) \in {(- \infty, \infty)}^{2}$ , and $K_{Z}^{\infty} = 0$ , $Σ_{1}^{o} = 0$ . The sequence $K_{t}^{o}, t = 1, 2, \dots, n$ that satisfies the generalized DRE (274) with a zero initial condition, $K_{1}^{o} = 0$ , converges to $K^{\infty} \geq 0$ , i.e., ${lim}_{n ⟶ \infty} K_{n}^{o} = K^{\infty}$ , where $K^{\infty}$ satisfies the generalized ARE,

$\begin{matrix} K^{\infty} = & {(c)}^{2} K^{\infty} + K_{W} - {(c K^{\infty} (Λ^{\infty} + c - a) + K_{W})}^{2} {(K_{W} + {(Λ^{\infty} + c - a)}^{2} K^{\infty})}^{- 1}, K^{\infty} \geq 0 \end{matrix}$

(283)

if and only if $| a | < 1$ (by Lemma 4(4)), and the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {- Λ^{\infty} + a, 0}$ is stabilizable, equivalently, $| Λ^{\infty} - a | < 1$ .
Moreover, the solutions of the ARE (283), under the stabilizability condition, i.e., $| Λ^{\infty} - a | < 1$ , are

$\begin{matrix} K^{\infty} = \{\begin{matrix} 0 & t h e u n i q u e, s t a b i l i z i n g, K^{\infty} \geq 0 s o l . o f (283) f o r | Λ^{\infty} - a | < 1 \\ \frac{K_{W} ({(Λ^{\infty} - a)}^{2} - 1)}{{(Λ^{\infty} + c - a)}^{2}} < 0 & t h e n o n - s t a b i l i z i n g, K^{\infty} < 0 s o l . o f (283) f o r | Λ^{\infty} - a | < 1 . \end{matrix} \end{matrix}$

(284)

That is, ${lim}_{n ⟶ \infty} Σ_{n}^{o} = Σ^{\infty} = 0$ is the unique and stabilizing solution $Σ^{\infty} \geq 0$ of (283), i.e., such that $| M^{C L} (Σ^{\infty}) | < 1$ , if and only if $| Λ^{\infty} - a | < 1, | a | < 1$ .

Proof.

The statements follow from Lemma 4, Theorem 6 (and general properties of Theorem 5). □

Remark 18.

From Lemma 6(5), it follows that if

K_{Z}^{\infty} = 0, Σ_{1}^{o} = 0

, then the unique and stabilizing solution is

K^{\infty} = 0

and corresponds to

| Λ^{\infty} - a | < 1, a \in (- 1, 1)

. This is an application of Theorem 5(1).

In the next theorem, we characterize the asymptotic limit of Definition 4 by invoking Theorems 3 and 6 and Corollary 8.

Theorem 7.

Feedback capacity

C^{f b, o} (κ)

of Theorem 3 for time-invariant strategies

Consider $C^{f b, o} (κ)$ of Definition 4 corresponding to Theorem 3, i.e., the PO-SS realization of $V^{n}$ of Definition 2 is time-invariant, $(A_{t}, B_{t}, C_{t}, N_{t}, K_{W_{t}}) = (A, B, C, N, K_{W}), \forall t$ , and the strategies are time-invariant, $(Λ_{t}, K_{Z_{t}}) = (Λ^{\infty}, K_{Z}^{\infty}), \forall t$ .
Define the set

$\begin{matrix} P^{\infty} \overset{▵}{=} & {(Λ^{\infty}, K_{Z}^{\infty}) \in (- \infty, \infty) \times [0, \infty) : \\ (i) {A, C} o f C o r o l l a r y 8 i s d e t e c t a b l e; \\ (i i) {A^{*}, G B^{*, \frac{1}{2}}} o f C o r o l l a r y 8 i s s t a b i l i z a b l e, w h e r e (A^{*}, B^{*}) i s d e f i n e d b y (235); \\ (i i i) {A, C (Λ^{\infty})} = {A, Λ^{\infty} + C} o f T h e o r e m 6 i s d e t e c t a b l e; \\ (i v) {A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})} o f T h e o r e m 6 i s s t a b i l i z a b l e; \\ (A^{*} (Λ^{\infty}, K_{Z}^{\infty}), B^{*} (K_{Z}^{\infty})) i s d e f i n e d b y (268) .} . \end{matrix}$

(285)

Then,

$\begin{matrix} C^{f b, o} (κ) \overset{▵}{=} \sup_{(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} : {lim}_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} (Λ^{\infty} K_{t}^{o} {(Λ^{\infty})}^{T} + K_{Z}^{\infty}) \leq κ} { \end{matrix}$

$\begin{matrix} lim_{n ⟶ \infty} \frac{1}{2 n} \sum_{t = 1}^{n} log (\frac{(Λ^{\infty} + C) K_{t}^{o} {(Λ^{\infty} + C)}^{T} + C Σ_{t}^{o} C^{T} + N K_{W} N^{T} + K_{Z}^{\infty}}{C Σ_{t}^{o} C^{T} + N K_{W} N^{T}})} \end{matrix}$

(286)

$\begin{matrix} = & \sup_{(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} (κ)} \frac{1}{2} log (\frac{(Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T} + C Σ^{\infty} C^{T} + N K_{W} N^{T} + K_{Z}^{\infty}}{C Σ^{\infty} C^{T} + N K_{W} N^{T}}) \end{matrix}$

(287)

where

$\begin{matrix} P^{\infty} (κ) \overset{▵}{=} & {(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} : K_{Z}^{\infty} \geq 0, Λ^{\infty} K^{\infty} {(Λ^{\infty})}^{T} + K_{Z}^{\infty} \leq κ, \\ K^{\infty} i s t h e u n i q u e a n d s t a b i l i z i n g s o l u t i o n o f (262), i . e ., | F^{C L} (Σ^{\infty}, K^{\infty}) | < 1 \\ Σ^{\infty} i s t h e u n i q u e, s t a b i l i z i n g s o l u t i o n o f (233), i . e ., | M^{C L} (Σ^{\infty}) | < 1,} \end{matrix}$

(288)

provided there exists $κ \in [0, \infty)$ such that the set $P^{\infty} (κ)$ is non-empty.
Moreover, the maximum element $(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} (κ)$ is such that
(1) It induces asymptotic stationarity of the corresponding input and innovations processes (see Theorem 3 for specification);
(2) If $V^{n}, n = 1, 2, \dots$ is asymptotically stationary, then it induces asymptotic stationarity of the corresponding, input and output processes;
(3) For (i) and (ii), $C^{f b, o} (κ)$ is independent of the initial conditions $K_{1}^{o} ⪰ 0, Σ_{1}^{o} ⪰ 0$ .
Furthermore, if the set $P^{\infty} (κ)$ is empty, replace stabilizability of ${A^{*}, G B^{*, \frac{1}{2}}}$ and ${A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})}$ by unit circle controllablilty, i.e., the maximal and stabilizing solutions $K^{\infty}, Σ^{\infty}$ of the AREs are utilized.

Proof.

By Definition 4, Theorems 3 and 6 and Corollary 8, (286) follows. We defined the set

P^{\infty}

using the detectability and stabilizability conditions of Corollary 8 and Theorem 6 to ensure the convergence of solutions

{(K_{t}^{o}, Σ_{t}^{o}) : t = 1, 2, \dots, n}

of the generalized matrix DREs to unique non-negative, stabilizing solutions of the corresponding generalized matrix AREs. Then, for any element

(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty}

, both summands in (286) converge. This establishes the characterization of the right-hand side of (287). Parts (1)–(3) follow from the asymptotic properties of the Kalman filter (due to the stabilizability and detectability conditions). The last statement follows due to the relaxation, Theorem 5(4). □

Remark 19.

In Theorem 7, if we replace stabilizability by unit circle controllability, then the supremum in (288) is over a larger set. However, the asymptotic limits

(Σ^{\infty}, K^{\infty})

are not unique stabilizing solutions but are instead the maximal and stabilizing solutions.

Theorem 7 also holds for asymptotically time-invariant strategies. We state this as a corollary.

Corollary 9.

Feedback capacity

C^{f b, o} (κ)

of Theorem 3 for asymptotically time-invariant strategies

Consider the problem statement of Theorem 7 with asymptotically time-invariant strategies, ${lim}_{n ⟶ \infty} (Λ_{n}, K_{Z_{n}}) = (Λ^{\infty}, K_{Z}^{\infty})$ and corresponding $K_{n}, n = 1, 2, \dots$ .
Then,

$\begin{matrix} C^{f b, o} (κ) \overset{▵}{=} \sup_{{lim}_{n ⟶ \infty} (Λ_{n}, K_{Z_{n}}) = (Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} : {lim}_{n ⟶ \infty} \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t} {(Λ_{t})}^{T} + K_{Z_{t}}) \leq κ} { \end{matrix}$

$\begin{matrix} lim_{n ⟶ \infty} \frac{1}{2 n} \sum_{t = 1}^{n} log (\frac{(Λ_{t} + C) K_{t} {(Λ_{t} + C)}^{T} + C Σ_{t} C^{T} + N K_{W} N^{T} + K_{Z_{t}}}{C Σ_{t} C^{T} + N K_{W} N^{T}})} \end{matrix}$

(289)

$\begin{matrix} = & (287) \end{matrix}$

(290)

and the statements of Theorem 7 hold.

Proof.

This follows from Proposition 4 and Theorem 7. □

Remark 20.

Explicit closed-form expressions of

C^{f b, o} (κ)

are given in [1,2] for the stable and unstable AR and ARMA noise processes

V^{n}

. The expressions

C^{f b, o} (κ)

consist of multiple regimes that depend on the parameters of noise, i.e.,

(c, a, K_{W})

for the ARMA noise and value of κ. Moreover, for some regimes,

C^{f b, o} (κ)

is achieved by an optimal

K_{Z}^{\infty} > 0

, while for other regimes, it is achieved by

{lim}_{n ⟶ \infty} K_{Z_{n}} = K_{Z}^{\infty} = 0

, such that

K_{Z_{1}} > 0, K_{Z_{t}} = 0, t = 2, 3, \dots, n

.

Next, we give the expression of feedback capacity

C^{f b} (κ)

, which is generally an upper bound on the expressions of Theorem 7 and Corollary 9.

Theorem 8.

Feedback capacity

C^{f b} (κ)

of Theorem 3 for asymptotically time-invariant noise and strategies

Consider $C^{f b} (κ)$ of Definition 4 corresponding to Theorem 3, where $(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n$ and the coefficients of the PO-SS realization of $V^{n}$ of Definition 2 are asymptotically time-invariant, i.e.,

$\begin{matrix} (a) lim_{n ⟶ \infty} (Λ_{n}, K_{Z_{n}}) = (Λ^{\infty}, K_{Z}^{\infty}); (b) lim_{n ⟶ \infty} (A_{n}, B_{n}, C_{n}, N_{n}, K_{W_{n}}) = (A, B, C, N, K_{W}) . \end{matrix}$

(291)

Let $P^{\infty, u c c}$ correspond to $P^{\infty} =$ (285) with ${A^{*}, G B^{*, \frac{1}{2}}}$ and ${A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})}$ being replaced by unit circle controllablity.
Let $P^{\infty, u c c} (κ)$ correspond to $P^{\infty} (κ) =$ (288), with $K^{\infty}$ and $Σ^{\infty}$ being the maximal and stabilizing solution of (262) and the maximal and stabilizing solution of (233), respectively.
Then,

$\begin{matrix} C^{f b} (κ) \overset{▵}{=} lim_{n ⟶ \infty} \sup_{{\{Λ_{t}, K_{Z_{t}}\}}_{t = 1}^{n} : {lim}_{n ⟶ \infty} (Λ_{n}, K_{Z_{n}}) = (Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty, u c c}, \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t} {(Λ_{t})}^{T} + K_{Z_{t}}) \leq κ} { \end{matrix}$

$\begin{matrix} \frac{1}{2 n} \sum_{t = 1}^{n} log (\frac{(Λ_{t} + C_{t}) K_{t} {(Λ_{t} + C_{t})}^{T} + C_{t} Σ_{t} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}}}{C_{t} Σ_{t} C_{t}^{T} + N_{t} K_{W_{t}} N_{t}^{T}})} \end{matrix}$

(292)

$\begin{matrix} = & \sup_{(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty, u c c} (κ)} \frac{1}{2} log (\frac{(Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T} + C Σ^{\infty} C^{T} + N K_{W} N^{T} + K_{Z}^{\infty}}{C Σ^{\infty} C^{T} + N K_{W} N^{T}}) \end{matrix}$

(293)

$\begin{matrix} \geq & C^{f b, o} (κ) . \end{matrix}$

(294)

Proof.

First, we note that Theorem 7 continuous to hold, if we consider asymptotically time-invariant strategies and coefficients, i.e., (a) and (b) (by Proposition 4), and the stabilizability conditions are replaced by unit circle controllability conditions. Hence, (288) remains valid with the set

P^{\infty} (κ)

replaced by the larger set

P^{\infty, u c c} (κ)

, giving the higher value (293) ≥ (288). For the derivation of (292) = (293), it suffices to show we can interchange the

{lim}_{n ⟶ \infty}

and the supremum in (292). This can be completed by using the definition of the set

P^{\infty, u c c} (κ)

and Conditions (a) and (b). The procedure, although lengthy, is similar to the one described in [41]; hence, we omit it. □

Conclusion 1.

Degenerate versions of Theorem 7, Theorem 8 for feedback code of Definition 3, i.e.,

(s, 2^{n R}, n)

,

n = 1, 2, \dots

The characterizations of feedback capacity $C^{f b, o} (κ, s)$ of the AGN channel (1) driven by a noise $V^{n}$ of Definition 2, for the code of Definition 3, i.e., $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ , are degenerate cases of Theorem 7 and Theorem 8, corresponding to $Σ_{t} = Σ_{t}^{s}, t = 1, \dots, Σ_{1} = Σ_{1}^{s} = 0$ . In particular, since Theorem 7 characterizes $C^{f b, o} (κ)$ for all initial data $Σ_{1} ⪰ 0$ , then it includes $Σ_{1} = Σ_{1}^{s} = 0$ . Moreover, it follows that $C^{f b, o} (κ) = C^{f b, o} (κ, s)$ , where $C^{f b, o} (κ, s)$ is independent of the initial state $S_{1}^{s} = s$ .

We apply Theorem 7 to obtain

C^{f b, o} (κ)

of the AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise.

Corollary 10.

Consider the AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise of Example 2(a).

Define the set

$\begin{matrix} P^{\infty} \overset{▵}{=} & {(Λ^{\infty}, K_{Z}^{\infty}) \in (- \infty, \infty) \times [0, \infty) : \\ (i) c \in (- \infty, \infty), a \in (- 1, 1), c \neq a, \\ (i i) t h e p a i r {A, C (Λ^{\infty})} \overset{▵}{=} {c, Λ^{\infty} + c - a} i s d e t e c t a b l e, \\ (i i) t h e p a i r {A^{*} (Λ^{\infty}, K_{Z}^{\infty}), G B^{*, \frac{1}{2}} (K_{Z}^{\infty})} i s s t a b i l i z a b l e, w h e r e \\ A^{*} (Λ^{\infty}, K_{Z}^{\infty}) \overset{▵}{=} c - K_{W} {(K_{W} + K_{Z}^{\infty})}^{- 1} (Λ^{\infty} + c - a), \\ G B^{*, \frac{1}{2}} (K_{Z}^{\infty}) \overset{▵}{=} {(K_{W} - {(K_{W})}^{2} {(K_{W} + K_{Z}^{\infty})}^{- 1})}^{\frac{1}{2}}} . \end{matrix}$

(295)

Then,

$\begin{matrix} C^{f b, o} (κ) = \sup_{(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} (κ)} \frac{1}{2} log (\frac{{(Λ^{\infty} + c - a)}^{2} K^{\infty} + K_{W} + K_{Z}^{\infty}}{K_{W}}) = C^{f b, o} (κ, s), \forall s \end{matrix}$

(296)

where

$\begin{matrix} P^{\infty} & (κ) \overset{▵}{=} {(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} : {(Λ^{\infty})}^{2} K^{\infty} + K_{Z}^{\infty} \leq κ, \\ K^{\infty} \geq 0 i s t h e u n i q u e a n d s t a b i l i z i n g s o l u t i o n o f \\ K^{\infty} = {(c)}^{2} K^{\infty} + K_{W} \\ - {(c K^{\infty} (Λ^{\infty} + c - a) + K_{W})}^{2} {(K_{W} + K_{Z}^{\infty} + {(Λ^{\infty} + c - a)}^{2} K^{\infty})}^{- 1}} \end{matrix}$

(297)

provided that there exists $κ \in [0, \infty)$ such that the set $P^{\infty} (κ)$ is non-empty.
Moreover, the maximum element $(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty} (κ)$ , is such that,
(1) It induces asymptotic stationarity of the corresponding, input and innovations processes;
(2) If $V^{n}, n = 1, 2, \dots$ is asymptotically stationary, then it induces asymptotic stationarity of the corresponding, input and output processes;
(3) For (i) and (ii), $C^{f b, o} (κ)$ and $C^{f b, o} (κ, s)$ are independent of $Σ_{1} \geq 0$ and s, respectively, and the following identities hold.

$\begin{matrix} C^{f b, o} (κ) = C^{f b, o} (κ, s) = C^{f b, S, o} (κ, s), \forall s \end{matrix}$

(298)

Furthermore, if the set $P^{\infty} (κ)$ is empty, replace stabilizability of ${A^{*}, G B^{*, \frac{1}{2}}}$ by unit circle controllability, i.e., so that $K^{\infty}$ is the maximal and stabilizing solutions of the ARE.

Proof.

The first part is an application of Theorem 7, Lemmas 4 and 6. Parts (1)–(3) are due to the convergence properties of the Kalman filter (due to the stabilizability and detectability conditions). It remains to show (298). The equality

C^{f b, o} (κ) = C^{f b, o} (κ, s), \forall s

holds by Conclusion 1(a). The last equality holds due to the AR

(a, c), a \in (- \infty, \infty), c \in (- \infty, \infty)

noise. If the initial state

S_{1} = S_{1}^{s} = s

is known to the encoder and the decoder, then Condition 1 of Section 1.1 holds. In addition, Condition 2 also holds, as can be easily verified from Equations (121) and (122). □

Remark 21.

From Corollary 10, we obtain the degenerate cases, AR

(c), c \in (- \infty, \infty)

, noise, i.e., setting

a = 0

. The various implications of the detectability and stabilizability conditions for the AR

(c), c \in (- \infty, \infty)

noise are found in [1]. The corresponding

C^{f b, o} (κ, s)

states that for stable AR

(c)

and time-invariant strategies, feedback does not increase capacity (because of the stronger condition of stabilizabily). However, if unit circle controllability is imposed instead, then feedback increases capacity.

4. Sequential Characterization of n–FTFI Capacity for Case (II) Formulation

In this section, we consider the Case (II) formulation, and we derive the characterization of feedback capacity,

C_{n}^{f b, S} (κ, s)

, of the AGN channel (1) driven by a noise

V^{n}

of Definition 2, i.e., for the code of Definition 3,

(s, 2^{n R}, n)

,

n = 1, 2, \dots

, when Conditions 1 and 2 of Section 1.1 hold.

Definition 8.

AGN channels driven by noise with invertible PO-SS realizations

The PO-SS realization of the noise of Definition 2 is called invertible if it satisfies the following condition:
(A1) Given the initial state $S_{1} = S_{1}^{s} = s$ , the noise $V^{t - 1}$ uniquely specifies the state $S^{t}$ , for $t = 1, \dots, n$ , and vice versa.

Corollary 11.

Characterization of n–FTFI Capacity for the Case (II) formulation

Consider the AGN channel (1) driven by a noise $V^{n}$ of Definition 8, and the code of Definition 3, $(s, 2^{n R}, n)$ , $n = 1, 2, \dots$ , i.e., Conditions 1 and 2 of Section 1.1 hold.
Define the n–FTFI capacity for a fixed initial state $S_{1} = S_{1}^{s} = s$ by

$\begin{matrix} C_{n}^{f b, S} (κ, s) = \sup_{P_{[1, n]}^{s} (κ)} H^{P} (Y^{n} | s) - H (V^{n} | s) \end{matrix}$

(299)

where the set $P_{[1, n]}^{s} (κ)$ is defined by

$\begin{matrix} P_{[1, n]}^{s} (κ) \overset{▵}{=} \{P_{t} (d x_{t} | x^{t - 1}, y^{t - 1}, s), t = 1, \dots, n : \frac{1}{n} E_{s}^{P} (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ\} \end{matrix}$

(300)

and where $E_{s}^{P}$ means $S_{1} = S_{1}^{s} = s$ is fixed, and the joint distribution depends on the elements of $P_{[1, n]}^{s} (κ)$ .
Then, the following hold:
(a) The n–FTFI capacity, for a fixed $S_{1} = s$ , is characterized by

$\begin{matrix} C_{n}^{f b, S} (κ, s) = \sup_{{\bar{P}}_{[1, n]}^{s, M} (κ)} \sum_{t = 1}^{n} \{H^{{\bar{P}}^{M}} (Y_{t} | Y^{t - 1}, s) - H (V_{t} | V^{t - 1}, s)\} \end{matrix}$

(301)

$\begin{matrix} = & \sup_{{\bar{P}}_{[1, n]}^{s, M} (κ)} \sum_{t = 1}^{n} \{H^{{\bar{P}}^{M}} (Y_{t} - E^{{\bar{P}}^{M}} \{Y_{t} | | Y^{t - 1}, s\} | s) - H (V_{t} - E \{V_{t} | V_{t}^{t - 1}, s\} | s)\} \end{matrix}$

(302)

where the ${\bar{P}}_{[1, n]}^{s, M} (κ)$ is defined by

$\begin{matrix} {\bar{P}}_{[1, n]}^{s, M} (κ) \overset{▵}{=} \{{\bar{P}}_{t}^{M} (d x_{t} | s_{t}, y^{t - 1}, s), t = 1, \dots, n : \frac{1}{n + 1} E_{s}^{{\bar{P}}^{M}} (\sum_{t = 1}^{n} {(X_{t})}^{2}) \leq κ\} \end{matrix}$

(303)

and where (18) is respected, ${\bar{P}}_{t}^{M} (d x_{t} | s_{t}, y^{t - 1}, s)$ , is conditionally Gaussian, with linear conditional mean and nonrandom conditional covariance, given by (the notation $S_{t} = S_{t}^{s}, t = 2, \dots, n$ means this sequence is generated from (19), when the initial state is fixed, $S_{1} = S_{1}^{s} = s$ ).

$\begin{matrix} E^{{\bar{P}}^{M}} \{X_{t} | S_{t}^{s}, Y^{t - 1}, S_{1}^{s} = s\} = \{\begin{matrix} Λ_{t} (S_{t}^{s} - E^{{\bar{P}}^{M}} \{S_{t}^{s} | Y^{t - 1}, S_{1}^{s} = s\}), & t = 2, \dots, n \\ 0, & t = 1, \end{matrix} \end{matrix}$

(304)

$\begin{matrix} K_{X_{t} | S_{t}^{s}, Y^{t - 1}, S_{1}^{s} = s} \overset{▵}{=} c o v (X_{t}, X_{t} | S_{t}^{s}, Y^{t - 1}, S_{1}^{s} = s) = K_{Z_{t}} ⪰ 0, t = 1, \dots, n . \end{matrix}$

(305)

and $H^{\bar{P}} (Y_{t} | Y^{t - 1}, s)$ is evaluated with respect to the probability distribution $P_{t}^{{\bar{P}}^{M}} (d y_{t} | y^{t - 1}, s)$ , defined by

$\begin{matrix} P_{t}^{{\bar{P}}^{M}} (d y_{t} | y^{t - 1}, s) = \int P_{t} (d y_{t} | x_{t}, s_{t}) P_{t}^{{\bar{P}}^{M}} (d x_{t} | s_{t}, y^{t - 1}, s) P_{t}^{{\bar{P}}^{M}} (d s_{t} | y^{t - 1}, s), t = 1, \dots, n . \end{matrix}$

(306)

(b) Define the conditional means and conditional covariance for a fixed $S_{1} = S_{1}^{s} = s$ by

$\begin{matrix} K_{t}^{s} \overset{▵}{=} & c o v (S_{t}^{s}, S_{t}^{s} | Y^{t - 1}, S_{1}^{s} = s) = E_{s}^{{\bar{P}}^{M}} \{(S_{t}^{s} - {\hat{S}}_{t}^{s}) {(S_{t}^{s} - {\hat{S}}_{t}^{s})}^{T}\}, \end{matrix}$

(307)

$\begin{matrix} {\hat{S}}_{t}^{s} \overset{▵}{=} & E_{s}^{{\bar{P}}^{M}} \{S_{t}^{s} | Y^{t - 1}, S_{1}^{s} = s\}, t = 2, \dots, n, K_{1}^{s} \overset{▵}{=} c o v (S_{1}^{s}, S_{1}^{s} | S_{1}^{s} = s) = 0, {\hat{S}}_{1}^{s} \overset{▵}{=} s . \end{matrix}$

(308)

The optimal channel input distribution of part (a) is induced by a jointly Gaussian process $X^{n}$ , with a realization given by

$\begin{matrix} X_{t} = Λ_{t} (S_{t}^{s} - {\hat{S}}_{t}^{s}) + Z_{t}, X_{1} = Z_{1}, t = 2, \dots, n, \end{matrix}$

(309)

$\begin{matrix} Z_{t} \in N (0, K_{Z_{t}}) i n d e p e n d e n t o f (S_{1}, X^{t - 1}, V^{t - 1}, Y^{t - 1}), t = 1, \dots, n, \end{matrix}$

(310)

$\begin{matrix} Y_{t} = Λ_{t} (S_{t}^{s} - {\hat{S}}_{t}^{s}) + Z_{t} + V_{t}, t = 1, \dots, n, \end{matrix}$

(311)

$\begin{matrix} = Λ_{t} (S_{t}^{s} - {\hat{S}}_{t}^{s}) + C_{t} S_{t}^{s} + N_{t} W_{t} + Z_{t}, \end{matrix}$

(312)

$\begin{matrix} \frac{1}{n} E_{s}^{{\bar{P}}^{M}} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} = \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t}^{s} Λ_{t}^{T} + K_{Z_{t}}) \leq κ \end{matrix}$

(313)

where $Λ_{t}$ is nonrandom.
The conditional means and conditional covariance ${\hat{S}}_{t}^{s}$ and $K_{t}^{s}$ are given by the generalized Kalman filter, as follows:
(i) ${\hat{S}}_{t}^{s}$ satisfies the Kalman filter recursion

$\begin{matrix} {\hat{S}}_{t + 1}^{s} = A_{t} {\hat{S}}_{t}^{s} + F_{t} (K_{t}^{s}) I_{t}^{s}, {\hat{S}}_{1} = s, \\ F_{t} (K_{t}^{s}) \overset{▵}{=} (A_{t} K_{t}^{s} {(Λ_{t} + C_{t})}^{T} + B_{t} K_{W_{t}} N_{t}^{T}) \end{matrix}$

(314)

$\begin{matrix} . {\{N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}} + (Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T}\}}^{- 1}, \end{matrix}$

(315)

$\begin{matrix} I_{t}^{s} \overset{▵}{=} Y_{t} - C_{t} {\hat{S}}_{t}^{s} = (Λ_{t} + C_{t}) (S_{t}^{s} - {\hat{S}}_{t}^{s}) + N_{t} W_{t} + Z_{t}, t = 1, \dots, n, \end{matrix}$

(316)

$\begin{matrix} I_{t}^{s} \in N (0, K_{I_{t}^{s}}), t = 1, \dots, n a n o r t h o g o n a l i n n o v a t i o n s p r o c e s s, i . e ., \\ I_{t}^{s} i s i n d e p e n d e n t o f I_{k}^{s}, f o r a l l t \neq k, a n d I_{t}^{s} i s i n d e p e n d e n t o f Y^{t - 1}, \end{matrix}$

(317)

$\begin{matrix} K_{Y_{t} | Y^{t - 1}, s} = K_{I_{t}^{s}} \overset{▵}{=} c o v (I_{t}, I_{t} | S_{1}^{s} = s) \\ = (Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T} + N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}} . \end{matrix}$

(318)

(ii) The error $E_{t}^{s} \overset{▵}{=} S_{t}^{s} - {\hat{S}}_{t}^{s}$ satisfies the recursion

$\begin{matrix} E_{t + 1}^{s} = & F_{t}^{C L} (K_{t}^{s}) E_{t}^{s} + (B_{t} - F_{t} (K_{t}^{s}) N_{t}) W_{t} - F_{t} (K_{t}^{s}) Z_{t}, E_{1}^{s} = S_{1}^{s} - {\hat{S}}_{1}^{s} = 0, t = 1, \dots, n, \end{matrix}$

(319)

$\begin{matrix} F_{t}^{C L} ( & K_{t}^{s}) \overset{▵}{=} A_{t} - F_{t} (K_{t}^{s}) (Λ_{t} + C_{t}) . \end{matrix}$

(320)

(iii) $K_{t}^{s} = E \{E_{t}^{s} {(E_{t}^{s})}^{T}\}$ satisfies the generalized DRE

$\begin{matrix} K_{t + 1}^{s} = A_{t} K_{t} A_{t}^{T} + B_{t} K_{W_{t}} B_{t}^{T} - (B_{t} K_{W_{t}} N_{t}^{T} + A_{t} K_{t}^{s} {(Λ_{t} + C_{t})}^{T}) {N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}} \\ + (Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T}}^{- 1} {(B_{t} K_{W_{t}} N_{t}^{T} + A_{t} K_{t}^{s} (Λ_{t} + C_{t}))}^{T}, K_{t}^{s} ⪰ 0, K_{1}^{s} = 0, t = 1, \dots, n . \end{matrix}$

(321)

(c) The characterization of the n–FTFI capacity of part (a) is

$\begin{matrix} C_{n}^{f b, S} (κ, s) = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} E_{s} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} log \frac{K_{Y_{t} | Y^{t - 1}, s}}{K_{V_{t} | V^{t - 1}, s}} \end{matrix}$

(322)

$\begin{matrix} = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} E_{s} \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} \leq κ} \sum_{t = 1}^{n} \{H (I_{t}^{s}) - H (N_{t} W_{t})\} \end{matrix}$

(323)

$\begin{matrix} = \sup_{(Λ_{t}, K_{Z_{t}}), t = 1, \dots, n : \frac{1}{n} \sum_{t = 1}^{n} (Λ_{t} K_{t}^{s} Λ_{t}^{T} + K_{Z_{t}}) \leq κ} { \\ \frac{1}{2} \sum_{t = 1}^{n} log (\frac{(Λ_{t} + C_{t}) K_{t}^{s} {(Λ_{t} + C_{t})}^{T} + N_{t} K_{W_{t}} N_{t}^{T} + K_{Z_{t}}}{N_{t} K_{W_{t}} N_{t}^{T}})} . \end{matrix}$

(324)

and the statements of part (b) hold.

Proof.

See Appendix A.8. □

Remark 22.

The asymptotic analysis of Section 3, based on Definition 4, applies naturally to Corollary 11.

Corollary 12.

Characterization of Feedback Capacity for the Case (II) formulation

Consider the statement of Corollary 11 for asymptotically time-invariant noise. The feedback capacity is given by

$\begin{matrix} C^{f b, S} (κ, s) = lim_{n ⟶ \infty} \frac{1}{n} C_{n}^{f b, S} (κ, s) \end{matrix}$

(325)

$\begin{matrix} = \sup_{(Λ^{\infty}, K_{Z}^{\infty}) \in P^{\infty, u c c, S} (κ)} \frac{1}{2} log (\frac{(Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T} + N K_{W} N^{T} + K_{Z}^{\infty}}{N K_{W} N^{T}}), \end{matrix}$

(326)

$\begin{matrix} P^{\infty, u c c, S} (κ) \overset{▵}{=} {(Λ^{\infty}, K_{Z}^{\infty}) : K_{Z}^{\infty} \geq 0, Λ^{\infty} K^{\infty} {(Λ^{\infty})}^{T} + K_{Z}^{\infty} \leq κ, K^{\infty} i s t h e m a x i m a l a n d \\ s t a b i l i z i n g s o l u t i o n o f (328), i . e ., d e t e c t a b i l i t y a n d u n i t c i r c l e c o n t r o l l a b i l i t y h o l d}, \end{matrix}$

(327)

$\begin{matrix} K^{\infty} = A K^{\infty} A^{T} + B K_{W} B^{T} - (B K_{W} N^{T} + A K^{\infty} {(Λ^{\infty} + C)}^{T}) {N K_{W} N^{T} + K_{Z}^{\infty} \\ + (Λ^{\infty} + C) K^{\infty} {(Λ^{\infty} + C)}^{T}}^{- 1} {(B K_{W} N^{T} + A K^{\infty} (Λ^{\infty} + C))}^{T} . \end{matrix}$

(328)

If the optimal solution is such that the stabilizability holds, then the feedback capacity is independent of the initial state $S_{1} = s$ .

Proof.

This is a special case of Theorem 7, with

Σ^{\infty} = 0

. □

Corollary 13.

Feedback capacity of [4,7]

Consider the channels studied in [4,7], i.e., with time-invariant and stable realization.
(a) The time-domain n–FTFI capacity is $C_{n}^{f b, S} (κ, s)$ given in Corollary 11.
(b) The time-domain feedback capacity is given by Corollary 12.

Proof.

Since the noise in [4,7] is time-invariant and stable, according to Definition 8, and the code is that of Definition 3, i.e.,

(s, 2^{n R}, n)

,

n = 1, 2, \dots

, then their results are special cases of Corollaries 11 and 12. □

In the next remark, we clarify the relation of Corollary 11 and the analysis of [4,8].

Remark 23.

Relations of Corollary 11 and [4,8]

(a) The state space problem analyzed in [8] is precisely $C_{n}^{f b, S} (κ, s)$ , when the noise is stationary and Gaussian, i.e., it corresponds to the Case (II) formulation. Corollary 11 is derived in [8] for the degenerate case of a time-invariant realization of the noise $V^{n}$ , i.e., of Definition 8. However, the asymptotic analysis of [8] (Section VI) should be read with caution, because it did not impose the necessary and/or sufficient conditions for convergence of the sequence $K_{t}^{s}, t = 1, 2, \dots$ generated by the time-invariant version of the generalized DRE (321), i.e., ${lim}_{n ⟶ \infty} K_{n}^{s} = K^{\infty} ⪰ 0$ , where $K^{\infty} ⪰ 0$ is either the maximal or the unique and stabilizing solution of a corresponding generalized ARE.
(b) The problem analyzed [4] that led to [4] (Theorem 6.1, $C_{F B}$ ) is the per unit time limit of $C_{n}^{f b, S} (κ, s)$ , when the noise is stationary, two-sided or one-sided (asymptotically stationary) and Gaussian, i.e., it corresponds to the Case (II) formulation. The characterization of feedback capacity presented in [4] (Theorem 6.1, $C_{F B}$ ) presupposed that the following hold ((i)–(iii) are also assumed in [8] (Section VI)):
(i) The feedback code is Definition 3, i.e., $(s, 2^{n R}, n)$ .
(ii) The noise is time-invariant and stable, and the PO-SS realization of the noise is invertible, as presented in Definition 8.
(iii) The definition of rate is $C^{f b, S, o} (κ, s)$ , with supremum and per unit time limit interchanged, and the supremum taken over using time-invariant channel input distributions.
(iv) The innovations covariance of the channel input process is asymptotically zero, i.e., ${lim}_{n ⟶ \infty} K_{Z_{n}} = K_{Z}^{\infty} = 0$ . This implies the corresponding ${lim}_{n ⟶ \infty} K_{n}^{s} = K^{\infty} \geq 0$ is the maximal and stabilizing solution of the corresponding matrix ARE, since detectability and unit circle controllability conditions hold, but not the stabilizability condition.
Items (i)–(iv) are confirmed from [4] (Lemma 6.1) (and comments above), which is used to derive [4] (Theorem 6.1, $C_{F B}$ ).
However, the characterization of feedback capacity in [4] (Theorem 6.1, $C_{F B}$ ) should be read with caution, because the stabilizability condition is violated, due to the requirement of the author, which states that $K_{Z}^{\infty} = 0$ is optimal. By Theorem 5(1), for the choice $K_{Z}^{\infty} = 0$ , the only choice is the maximal and stabilizing solution of the generalized ARE presented in [4] (Theorem 6.1, $C_{F B}$ ).
However, it is easy to verify that [4] (Theorem 6.1, C_FB) cannot be the capacity of asymptotically stationary noise because $C_{F B}$ depends on the covariance $K_{S_{1}} ≻ 0$ . Moreover, $K_{1} ≻ 0$ is required.
Finally, we emphasize that the treatment of the ARMA noise in [2] clarifies the above issues.

5. Conclusions

New equivalent sequential characterizations of Cover and Pombra [5] “n–block” feedback capacity formulas are derived using time-domain methods for additive Gaussian noise (AGN) channels driven by non-stationary Gaussian noise. New features of the equivalent characterizations encompass the representation of the optimal channel input process by a sufficient statistic and Gaussian orthogonal innovations process. The sequential characterizations of the n–block feedback capacity formula are expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems. The asymptotic analysis of the per unit time limit of the n–block”, called feedback capacity, is also presented for time-invariant and asymptotically time-invariant channel input distributions, using the tools from the theory of generalized matrix Riccati equations.

Prior analysis and characterizations of feedback capacity follows on from the analysis and derivation of the new sequential characterizations of feedback capacity, such as [4,7,11,12], who do not address the Cover and Pombra [5] feedback capacity problem, as the code definitions and noise assumptions in [4,7,11,12] (even under the restriction of stationary noise) are fundamentally different from those in [5]. This paper clarifies several of these points of confusion.

Author Contributions

C.D.C., C.K. and S.L. contributed to the conceptualization, methodology, and writing of this manuscript. All authors have read and agreed to the published version of this manuscript.

Funding

The work of C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

No data are contained within this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of this manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1. Proof of Theorem 1

(a) Consider an element of

{\bar{E}}_{[0, n]} (κ)

. Then, the conditional entropies

H^{\bar{e}} (Y_{t} | Y^{t - 1}), t = 1, \dots, n

are defined, provided that the conditional distributions of

Y_{t}

conditioned on

Y^{t - 1}

, i.e.,

P_{t}^{\bar{e}} (d y_{t} | y^{t - 1})

, for

t = 1, \dots, n

, are determined. By the reconditioning, using (60), we have

\begin{matrix} P_{t}^{\bar{e}} & (d y_{t} | y^{t - 1}) = \int P_{t}^{\bar{e}} (d y_{t} | y^{t - 1}, w) P_{t}^{\bar{e}} (d w | y^{t - 1}), t = 0, \dots, n \end{matrix}

(A1)

\begin{matrix} = & \int P_{t} (d y_{t} | {\bar{e}}_{t} (w, v^{t - 1}, y^{t - 1}), v^{t - 1}) P_{t}^{\bar{e}} (d w | y^{t - 1}), by (60), (61) . \end{matrix}

(A2)

Hence, (64) is shown. Similarly, consider an element of

{\bar{P}}_{[0, n]} (κ)

. Then, the conditional entropies

H^{\bar{P}} (Y_{t} | Y^{t - 1}), t = 1, \dots, n

are defined, provided that the conditional distributions of

Y_{t}

conditioned on

Y^{t - 1}

, i.e.,

P_{t}^{\bar{P}} (d y_{t} | y^{t - 1})

for

t = 1, \dots, n

, are determined. By (52) and (53), (65) is obtained. Since

{\bar{E}}_{[0, n]} (κ) \subseteq {\bar{P}}_{[0, n]} (κ)

, Inequality (63) follows.

(b) This part is followed by the maximum entropy principle of Gaussian distributions. That is, under the restriction (18), a conditional Gaussian element of

{\bar{P} (d x_{t} | v^{t - 1}, y^{t - 1}), t = 1, \dots, n} \in {\bar{P}}_{[0, n]} (κ)

with linear conditional mean and nonrandom conditional covariance induces a jointly Gaussian distribution of the process

(X^{n}, Y^{n})

, such that the marginal distribution of

Y^{n}

is jointly Gaussian. Below, we provide alternative proof that uses the Cover and Pombra characterization of the n–FTFI capacity, given by (34) and (35). Consider (35) and define the process

\begin{matrix} Z_{1} \overset{▵}{=} & {\bar{Z}}_{1} - E \{{\bar{Z}}_{1}\}, \end{matrix}

(A3)

\begin{matrix} Z_{t} \overset{▵}{=} & {\bar{Z}}_{t} - E^{\bar{P}} \{{\bar{Z}}_{t} | X^{t - 1}, V^{t - 1}, Y^{t - 1}\}, t = 2, \dots, n, \end{matrix}

(A4)

\begin{matrix} = & {\bar{Z}}_{t} - E^{\bar{P}} \{{\bar{Z}}_{t} | V^{t - 1}, Y^{t - 1}\}, since X^{t - 1} is uniquely defined by (V^{t - 1}, Y^{t - 1}) . \end{matrix}

(A5)

Then,

Z_{t}

is a Gaussian orthogonal innovations process, independent of

(X^{t - 1}, V^{t - 1}, Y^{t - 1})

, for

t = 2, \dots, n

, and

E \{Z_{t}\} = 0

, for

t = 1, \dots, n

. By (35), we re-write

X_{t}, t = 1, \dots, n

as,

\begin{matrix} X_{t} = & \sum_{j = 1}^{t - 1} B_{t, j} V_{j} + {\bar{Z}}_{t}, t = 1, \dots, n, \end{matrix}

(A6)

\begin{matrix} = & \sum_{j = 1}^{t - 1} B_{t, j} V_{j} + E^{\bar{P}} \{{\bar{Z}}_{t} | V^{t - 1}, Y^{t - 1}\} + Z_{t}, b y () \end{matrix}

(A7)

\begin{matrix} \overset{(a)}{=} & \sum_{j = 1}^{t - 1} B_{t, j} V_{j} + {\bar{Γ}}_{t} (\begin{matrix} V^{t - 1} \\ Y^{t - 1} \end{matrix}) + Z_{t}, for some {\bar{Γ}}_{t} nonrandom \end{matrix}

(A8)

\begin{matrix} = & \sum_{j = 1}^{t - 1} Γ_{t, j}^{1} V_{j} + \sum_{j = 1}^{t - 1} Γ_{t, j}^{2} Y_{j} + Z_{t}, for some (Γ_{\cdot, \cdot}^{1}, Γ_{\cdot, \cdot}^{2}) \end{matrix}

(A9)

\begin{matrix} = & Γ_{t}^{1} V^{t - 1} + Γ_{t}^{2} Y^{t - 1} + Z_{t}, by definition \end{matrix}

(A10)

where

(a)

is due to the by joint Gaussianity of

(Z^{n}, X^{n}, Y^{n})

. From (A10) and the independence of

Z_{t}

and

(X^{t - 1}, V^{t - 1}, Y^{t - 1})

, for

t = 2, \dots, n

, (66) then follows, as well as (67).

(c) The statements follow directly from the representation of part (b), while the independence of $Z^{n}$ and $V^{n}$ is due to the code definition, i.e., Definition 1(iv).
(d) The statement follows on from parts (a–c) and (28)–(31).

Appendix A.2. Proof of Proposition 1

(a) The covariances of the realization of the ARMA

(a, c)

noise of Example 2(b) satisfy the recursions

\begin{matrix} K_{S_{t + 1}} = c^{2} K_{S_{t}} + K_{W}, K_{S_{t}, V_{t}} = (c - a) K_{S_{t}}, K_{V_{t}} = {(c - a)}^{2} K_{S_{t}} + K_{W}, \forall t \in Z . \end{matrix}

(A11)

If the recursion

K_{S_{t + 1}} = c^{2} K_{S_{t}} + K_{W}

is initiated at the stationary value

K_{S_{1}} = d_{11} = \frac{K_{W}}{1 - c^{2}}

, then

K_{S_{t + 1}} = d_{11}, \forall t = 2, 3, \dots

; hence,

S_{t}, \forall t \in Z

is stationary, which then implies the stationarity of

V_{t}, \forall t \in Z

. Hence, if (133) holds, then

(V_{t}, S_{t}), \forall t \in Z

is stationary. Via simple calculations, (134) then follows. We carry out the same operation for the one-sided ARMA

(a, c)

. (b) By the above covariances, for all

K_{S_{1}} \geq 0

, we have

{lim}_{n ⟶ \infty} K_{S_{n}} = K_{S}^{\infty}

, where

K_{S}^{\infty} = c^{2} K_{S}^{\infty} + K_{W}

, which then implies

K_{S}^{\infty} = d_{11}

. Similarly,

{lim}_{n ⟶ \infty} K_{S_{n}, V_{n}} = K_{S, V}^{\infty} = d_{12}

,

{lim}_{n ⟶ \infty} K_{V_{n}} = K_{V}^{\infty} = d_{22}

. (c) (135) and (136) follow on from Lemma 1 by replacing the conditioning information

V^{t - 1}

with

(V_{0}, V^{t - 1})

in (85). By mean square estimation, the initial data are

\begin{matrix} {\hat{S}}_{1} = & E \{S_{1} | V_{0}\} = E \{c S_{0} + W_{0} | V_{0}\} \\ = & E \{c S_{0} + W_{0}\} + c o v (c S_{0} + W_{0}, V_{0}) {\{c o v (V_{0}, V_{0})\}}^{- 1} (V_{0} - E \{V_{0}\}) \\ = & (c d_{12} + K_{W}) d_{22}^{- 1} V_{0}, \end{matrix}

(A12)

\begin{matrix} Σ_{1} = & c o v (S_{1}, S_{1} | V_{0}) = c o v (S_{1}, S_{1}) - {\{c o v (S_{1}, S_{1})\}}^{2} {\{c o v (V_{0}, V_{0})\}}^{- 1} \\ = & d_{11} - d_{11}^{2} d_{22}^{- 1} \end{matrix}

(A13)

The last part is obvious.

Appendix A.3. Proof of Corollary 3

(a) Since we have assumed that

S_{1} = S_{1}^{s} = s

is fixed, and that it is known to the encoder and the decoder, then Theorem 1 still holds. This is due to replacing all conditional distributions, expectations and entropies via corresponding expressions with a fixed

S_{1} = S_{1}^{s} = s

. Hence, (75) is replaced by (141), and (68) is replaced by (142) (since the code is allowed to depend on

S_{1} = S_{1}^{s} = s

). (b) From the PO-SS realization of Definition 2 with

S_{1} = S_{1}^{s} = s

fixed, it follows that a necessary condition for Conditions 1 of Section 1.1 to hold is (i). The expression of entropy (143) is easily obtained by invoking Condition (i) and the properties of conditional entropy. That is,

H (V_{1} | S_{1}^{s} = s) = H (C_{1} S_{1}^{s} + N_{1} W_{1} | S_{1}^{s} = s) = H (N_{1} W_{1} | S_{1} = s) = H (N_{1} W_{1})

by independence of

W_{1}

and

S_{1}^{s}

, and

H (V_{2} | V_{1}, s) = H (V_{2} | V_{1}, S_{1}^{s} = s) = H (C_{2} S_{2}^{s} + N_{2} W_{2} | C_{1} S_{1}^{s} + N_{1} W_{1}, S_{1}^{s} = s) = H (C_{2} S_{2}^{s} + N_{2} W_{2} | N_{1} W_{1}, S_{1}^{s} = s) = H (C_{2} A_{1} S_{1}^{s} + C_{2} B_{1} W_{1} + N_{2} W_{2} | N_{1} W_{1}, S_{1}^{s} = s) = H (N_{2} W_{2} | N_{1} W_{1}, S_{1}^{s} = s) = H (N_{2} W_{2})

, etc. The last statement is obvious. This completes the proof.

Appendix A.4. Proof of Theorem 3

(a) Clearly, (154)–(165) follow directly from Theorem 1 and the preliminary calculations prior to the statement of the theorem. However, (154)–(165) can also be shown independently of Theorem 1 by invoking the maximum entropy property of Gaussian distributions, as follows: By Lemma 1, we have

H (V^{n}) = \sum_{t = 1}^{n} H ({\hat{I}}_{t})

. By the maximum entropy principle,

H (Y^{n})

is maximized if

P_{Y^{n}}

is jointly Gaussian, the average power constraint holds, and (17) is respected. By (146), (150), and (160), if (150)–(165) hold, then

(X^{n}, Y^{n})

is jointly Gaussian; hence,

H (Y^{n})

is maximized. This shows (a).

(b) Step 1. By (163) and (164), an alternative representation of $X^{n}$ to the one given in Theorem 1 and induced by (68) is

$\begin{matrix} X_{t} = Γ_{t}^{1} {\hat{S}}_{t} + Γ_{t}^{2} Y^{t - 1} + Z_{t}, t = 1, \dots, n, \end{matrix}$

(A14)

$\begin{matrix} Z_{t} satisfies (167) . \end{matrix}$

(A15)

for some nonrandom $(Γ_{\cdot}^{1}, Γ_{\cdot}^{2})$ . Upon substituting (A14) into the channel output $Y^{n}$ , we have

$\begin{matrix} Y_{t} = & Γ_{t}^{1} {\hat{S}}_{t} + Γ_{t}^{2} Y^{t - 1} + Z_{t} + V_{t}, t = 1, \dots, n \end{matrix}$

(A16)

$\begin{matrix} = & (Γ_{t}^{1} + C_{t}) {\hat{S}}_{t} + Γ_{t}^{2} Y^{t - 1} + Z_{t} + {\hat{I}}_{t}, by (90) . \end{matrix}$

(A17)

The right-hand side of (A17) is driven by two independent processes, $Z_{t}, t = 1, \dots, n$ and ${\hat{I}}_{t}, t = 1, \dots, n$ , which are also mutually independent. Further, the right-hand side of (A17) is a linear function of a state process ${\hat{S}}_{t}, t = 1, \dots, n$ , which satisfies the following recursion (88):

$\begin{matrix} {\hat{S}}_{t + 1} = A_{t} {\hat{S}}_{t} + M_{t} (Σ_{t}) {\hat{I}}_{t}, {\hat{S}}_{1} = μ_{S_{1}}, \end{matrix}$

(A18)

Note that the right-hand side of (A18) is driven by the orthogonal process ${\hat{I}}_{t}$ , which is independent of $V^{t - 1}$ and hence of ${\hat{S}}_{t}$ . It is also independent of $Y^{t - 1}$ . By (167), $Z_{t}$ is independent of $Y^{t - 1}$ and of ${\hat{S}}_{t}$ . By (A17) and (A18), it follows that ${\hat{\hat{S}}}_{t}, t = 1, \dots, n$ satisfies a generalized Kalman filter recursion, similar to that of Lemma 1. Hence, the entropy $H (Y^{n})$ can be computed using the innovations process of $Y^{n}$ , as in Lemma 1. Define the orthogonal Gaussian innovations process $I^{n}$ of $Y^{n}$ by

$\begin{matrix} I_{t} \overset{▵}{=} & Y_{t} - E \{Y_{t} | Y^{t - 1}\}, t = 1, \dots, n \end{matrix}$

(A19)

$\begin{matrix} = & (Γ_{t}^{1} + C_{t}) ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + {\hat{I}}_{t} - E \{{\hat{I}}_{t} | Y^{t - 1}\} + Z_{t}, by (A17) . \end{matrix}$

(A20)

$\begin{matrix} = & (Γ_{t}^{1} + C_{t}) ({\hat{S}}_{t} - {\hat{\hat{S}}}_{t}) + {\hat{I}}_{t} + Z_{t}, by {\hat{I}}_{t} indep . of Y^{t - 1}, E \{{\hat{I}}_{t}\} = 0 \end{matrix}$

(A21)

The entropy of $Y^{n}$ is computed as follows.

$\begin{matrix} H (Y^{n}) = & \sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}) \end{matrix}$

(A22)

$\begin{matrix} = & \sum_{t = 1}^{n} H (I_{t} | Y^{t - 1}), by (A19) and a property of conditional entropy; \end{matrix}$

(A23)

$\begin{matrix} = & \sum_{t = 1}^{n} H (I_{t}), by orthogonality of I_{t} and Y^{t - 1} . \end{matrix}$

(A24)

By (A21), the Gaussian innovations process $I^{n}$ does not depend on the strategy of $Γ^{2}$ . Consequently, by (A24), the entropy $H (Y^{n})$ does not depend on the strategy of $Γ^{2}$ .
Step 2. Let $g_{t} (Y^{t - 1}) \overset{▵}{=} Γ_{t}^{2} Y^{t - 1}, t = 1, \dots, n$ . By (A14) and (A15), it then follows that

$\begin{matrix} \frac{1}{n} E & \{\sum_{t = 1}^{n} {(X_{t})}^{2}\} = \frac{1}{n} E \{\sum_{t = 1}^{n} {(Γ_{t}^{1} {\hat{S}}_{t} + g_{t} (Y^{t - 1}) + Z_{t})}^{2}\} \end{matrix}$

(A25)

$\begin{matrix} = & \frac{1}{n} E \{\sum_{t = 1}^{n} {(Γ_{t}^{1} {\hat{S}}_{t} + g_{t} (Y^{t - 1}))}^{2}\} + \sum_{t = 1}^{n} K_{Z_{t}}, by indep. of Z_{t} and (V^{t - 1}, {\hat{S}}_{t}, Y^{t - 1}) . \end{matrix}$

(A26)

By utilizing the mean square estimation theory, then the choice of $g (\cdot)$ that minimizes the right hand of (A26) is

$\begin{matrix} g_{t} (Y^{t - 1}) = g_{t}^{*} (Y^{t - 1}) = - Γ_{t}^{1} E \{{\hat{S}}_{t} | Y^{t - 1}\} = - Γ_{t}^{1} {\hat{\hat{S}}}_{t}, t = 1, \dots, n . \end{matrix}$

(A27)

Hence, $Γ_{t}^{2} = - Γ_{t}^{1}, \forall t$ . Let $Λ_{t} \overset{▵}{=} Γ_{t}^{1}, \forall t$ and substitute into the (A17) recursion to obtain (170) and into the average power (A26) to obtain (171). Hence, the derivation of (166)–(171) is completed.

It then follows that (172)–(180) are generalized Kalman filter recursions of estimating the new state process

{\hat{S}}_{t}, t = 1, \dots, n

that satisfies recursion (A18) from the channel output process

Y_{t}

that satisfies the recursion (170).

(c) By parts (a) and (b), and the entropy of Gaussian RVs, upon substituting (A24) and (98) into (154), we obtain part (c). This completes the proof.

Appendix A.5. Proof of Proposition 2

Since the proof of [4] (Theorem 6.1) is based on [4] (Lemma 6.1), where the channel input

X_{t}

is expressed as

X_{t} = Λ (S_{t} - E \{S_{t} | Y_{- \infty}^{t - 1}), t = 1, \dots

, where

Λ

is a nonrandom vector, then (211) is necessary for [4] (Theorem 6.1) to hold. Next, we show that Conditions 1 and 2 of Section 1.1 are necessary and sufficient for Equality (211) to hold. To avoid a complex notation, we prove the claim for the realization of Example 2(a). Suppose that the initial state

S_{1}

of the noise is

S_{1} = S_{1}^{s} = s

and that it is known to the encoder and the decoder; without loss of generality, take

s = 0

, which, by (127), implies

V_{0} = 0, W_{0} = 0

(as is often carried out in [4]). Then, the following hold:

\begin{matrix} S_{1} = 0 ⟹ V_{1} = W_{1}, S_{2} = W_{1} = V_{1}, by (128), (129); \end{matrix}

(A28)

\begin{matrix} (S_{1} = 0, V_{1}) uniquely define S_{2} = c S_{1} + W_{1} = W_{1} = V_{1}, by (128), (129); \end{matrix}

(A29)

\begin{matrix} V_{2} = (c - a) S_{2} + W_{2}, S_{3} = c S_{2} + W_{2} = c V_{1} + W_{2}, by (128), (129); \end{matrix}

(A30)

\begin{matrix} (S_{1} = 0, V_{1}, V_{2}) uniquely define (S_{2}, S_{3}); \end{matrix}

(A31)

\begin{matrix} Repeating, then (S_{1} = 0, V_{1}, \dots, V_{t - 1}) uniquely define (S_{2}, S_{3}, \dots, S_{t}), \forall t = 3, 4, \dots . \end{matrix}

(A32)

From (A28)–(A32), it then follows that for any

S_{1} = s

, including

s = 0

, it is known by the encoder that the following equalities hold:

\begin{matrix} P_{X_{t} | X^{t - 1}, Y_{- \infty}^{t - 1}, S_{1}} = & P_{X_{t} | V^{t - 1}, Y_{- \infty}^{t - 1}, S_{1}}, b y Y_{t} = X_{t} + V_{t} \end{matrix}

(A33)

\begin{matrix} = & P_{X_{t} | S^{t}, Y_{- \infty}^{t - 1}, S_{1}}, t = 1, \dots, \end{matrix}

(A34)

We can go one step further to identify the information structure of optimal channel input distributions using (A34), i.e., to show

P_{X_{t} | S^{t}, Y_{- \infty}^{t - 1}, S_{1}} = P_{X_{t} | S_{t}, Y_{- \infty}^{t - 1}, S_{1}}, t = 1, \dots

, by repeating to proof of [8] (Theorem 1). However, for the statement of the proposition, this is not necessary.

Suppose that either

S_{1} = s

is not known to the encoder, i.e.,

V_{0} = v_{0}, W_{0} = w_{0}

are not known to the encoder, and

S_{1} \neq 0

, while the optimal channel input is expressed as a function of the state of the noise,

S^{n}

:

\begin{matrix} P_{X_{t} | X^{t - 1}, Y_{- \infty}^{t - 1}} = & P_{X_{t} | V^{t - 1}, Y_{- \infty}^{t - 1}} = P_{X_{t} | S^{t}, Y_{- \infty}^{t - 1}}, t = 1, \dots, \end{matrix}

(A35)

Then, by (128) and (129), it follows that

V_{1} = (c - a) S_{1} + W_{1}, S_{2} = a S_{1} + W_{1}

. Hence, knowledge of

V_{1}

does not specify

S_{2}

. Similarly,

V^{t - 1}

does not specify

S^{t}

, for

t = 2, 3, \dots

. Hence, we arrive at a contradiction of the last equality in (A35). This competes the proof.

Appendix A.6. Proof of Lemma 2

(a) This is due to Lemma 1(v).
(b) By taking the per unit time limit (221) and utilizing the hypothesis (223), the continuity of the $log (\cdot)$ and the fact that, for any convergent sequence $a_{n}, n = 1, 2, \dots$ , i.e., ${lim}_{n ⟶ \infty} a_{n} = a$ , then $\frac{1}{n} \sum_{t = 1}^{n} a_{n} ⟶ a$ , as $n ⟶ \infty$ , (224) follows.

Appendix A.7. Proof of Lemma 4

From Corollary 6(a), we deduce that

Σ_{t}^{o} \overset{▵}{=} Σ_{t}, t = 1, \dots, n

satisfies (239) with initial condition (240). By Definition 7, the corresponding generalized algebraic Riccati equation is (241), and pairs

{A, C}

and

{A^{*}, G B^{*, \frac{1}{2}}}

are given by (242).

(1) By Definition 7, for $c \neq a$ , the pair ${A, C} = {c, c - a}$ is observable and hence detectable.
(2) By Definition 7, the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is unit circle controllable if and only if $| a | \neq 1$ .
(3) By Definition 7, the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is stabilizable if and only if $a \in (- 1, 1)$ .
(4) This follows from Theorem 5(1) and parts (1)–(4). Since (241) is a quadratic equation, we can verify that the two solutions are $Σ^{\infty} = 0$ and $Σ^{\infty} = \frac{K_{W} (a^{2} - 1)}{{(c - a)}^{2}}$ , and we consider the statement of (243).
(5) For values $c \in (- \infty, \infty)$ and $| a | < 1$ , the pair ${A, C} = {c, c - a}$ is detectable and the pair ${A^{*}, G B^{*, \frac{1}{2}}} = {a, 0}$ is stabilizable, and the statement follows from Theorem 5(3).
(6) (245) follows from Lemma 2(b) by invoking Corollary 6(a), i.e., $K_{{\hat{I}}_{t}} = {(c - a)}^{2} Σ_{t}^{o} + K_{W}, t = 1, \dots, n$ , where $Σ_{t}^{o}$ is generated by (239) and parts (4) and (5).

Appendix A.8. Proof of Corollary 11

First, note that the analog of Theorem 1(a), for the code

(s, 2^{n R}, n)

,

n = 1, 2, \dots

is (299) and (300) because

P_{t} (d x_{t} | x^{t - 1}, y^{t - 1}, s) = {\bar{P}}_{t} (d x_{t} | v^{t - 1}, y^{t - 1}, s), t = 1, \dots, n

. Define

{\bar{P}}_{[0, n]}^{s} (κ)

as in (300), with

x^{t - 1}

being replaced by

v^{t - 1}, t = 1, \dots, n

.

(a) Then

$\begin{matrix} P_{t} (d x_{t} | x^{t - 1}, y^{t - 1}, s) = & P_{t} (d x_{t} | v^{t - 1}, y^{t - 1}, s_{0}), t = 1, \dots, n, b y Y_{t} = X_{t} + V_{t} \end{matrix}$

(A36)

$\begin{matrix} = & {\bar{P}}_{t} (d x_{t} | s^{t}, y^{t - 1}, s), Definition 8 . \end{matrix}$

(A37)

The PO-SS realization, for a fixed $S_{1}^{s} = s$ , is then

$\begin{matrix} V_{t} = C_{t} S_{t}^{s} + N_{t} W_{t}, S_{1}^{s} = s, t = 1, \dots, n, \end{matrix}$

(A38)

$\begin{matrix} S_{t + 1}^{s} = A_{t} S_{t}^{s} + B_{t} W_{t}, S_{1}^{s} = s . \end{matrix}$

(A39)

Then,

$\begin{matrix} P_{t} (d y_{t} | x^{t}, y^{t - 1}, s) = & P_{t} (d y_{t} | x^{t}, v^{t - 1}, y^{t - 1}, s) \end{matrix}$

(A40)

$\begin{matrix} = & P_{t} (d y_{t} | x^{t}, s^{t}, y^{t - 1}, s), by Definition 8, (A1) \end{matrix}$

(A41)

$\begin{matrix} = & P_{t} (d y_{t} | x_{t}, s^{t}), by Y_{t} = X_{t} + V_{t} and (A38) \end{matrix}$

(A42)

$\begin{matrix} \overset{(a)}{=} & P_{t} (d y_{t} | x_{t}, s_{t}, s), by mutually independent of (W_{1}, \dots, W_{n}, S_{1}^{s}) . \end{matrix}$

(A43)

The probability distribution $P_{t} (d y_{t} | y^{t - 1}, s)$ is then given by

$\begin{matrix} P_{t}^{\bar{P}} (d y_{t} | y^{t - 1}, s) = & \int P_{t} (d y_{t} | x_{t}, s_{t}) P_{t} (d x_{t} | s_{t}, y^{t - 1}, s) \\ . P_{t}^{\bar{P}} (d s_{t} | y^{t - 1}, s), t = 1, \dots, n, by reconditioning and (A43) . \end{matrix}$

(A44)

The pay-off is the sum of conditional entropies $\sum_{t = 1}^{n} H (Y_{t} | Y^{t - 1}, s)$ , and the constraint is (300). By Definition 2, the state $S_{t}^{s}, t = 2, \dots, n$ is Markov, $P_{S_{t}^{s} | S^{s, t - 1}} = P_{S_{t}^{s} | S_{t - 1}^{s}}, t = 2, \dots, n$ . By (A44) and the Markov property of $S_{t}^{s}, t = 1, \dots, n$ , at each time t, the input distribution $P_{t}^{\bar{P}} (d s_{t} | y^{t - 1}, s)$ depends on $P_{j} (d x_{j} | s_{j}, y^{j - 1}, s), j = 1, \dots, t - 1$ and not on ${\bar{P}}_{j} (d x_{j} | s^{j}, y^{j - 1}, s), j = 1, \dots, t - 1$ . By (301) and (303), $P_{t} (d x_{t} | s_{t}, y^{t - 1}, s) = {\bar{P}}_{t}^{M} (d x_{t} | s_{t}, y^{t - 1}, s), t = 1, \dots, n$ . It is noted that (301) and (303) also follow from a slight variation of the derivation given in [8] (Theorem 1). By the maximum entropy principle of Gaussian distributions, it then follows that the distribution ${\bar{P}}_{t}^{M} (d x_{t} | s_{t}, y^{t - 1}, s)$ is conditionally Gaussian, with linear conditional mean and nonrandom conditional covariance. Then, (304) follows by repeating Step 2 of the derivation of Theorem 3. This completes the derivation of all statements of part (a).
(b,c). The statements follow from part (a), using the generalized Kalman filter, as in Theorem 3.

References

Kourtellaris, C.; Charalambous, C.D.; Sergey, L. New Formulas for Ergodic Feedback Capacity of AGN Channels Driven by Stable and Unstable Autoregressive Noise. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
Louka, S.; Kourtellaris, C.; Charalambous, C.D. Qualitative Analysis of Feedback Capacity of AGN Channels Driven by Stable and Unstable Autoregressive Moving Average. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. [Google Scholar]
Charalambous, C.D.; Kourtellaris, C.; Louka, S. Sequential Characterization of Cover and Pombra Gaussian Feedback Capacity: Generalizations to MIMO Channels via a Sufficient Statistic. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. [Google Scholar]
Kim, Y.H. Feedback Capacity of Stationary Gaussian Channels. IEEE Trans. Inf. Theory 2010, 56, 57–85. [Google Scholar] [CrossRef]
Cover, T.; Pombra, S. Gaussian feedback capacity. IEEE Trans. Inf. Theory 1989, 35, 37–43. [Google Scholar] [CrossRef]
Derpich, M.S.; Ostergaard, J. Comments on “Feedback Capacity of Stationary Gaussian Channels”. IEEE Trans. Inf. Theory 2024, 70, 1848–1851. [Google Scholar] [CrossRef]
Gattami, A. Feedback Capacity of Gaussian Channels Revisited. IEEE Trans. Inf. Theory 2019, 65, 1948–1960. [Google Scholar] [CrossRef]
Yang, S.; Kavcic, A.; Tatikonda, S. On Feedback Capacity of Power-Constrained Gaussian Noise Channels with Memory. Inf. Theory IEEE Trans. Theory 2007, 53, 929–954. [Google Scholar] [CrossRef]
Ihara, S. On the Feedback Capacity of the First-Order Moving Average Gaussian Channel. Jpn. J. Stat. Data Sci. 2019, 2, 491–506. [Google Scholar] [CrossRef]
Charalambous, C.D.; Kourtellaris, C.; Louka, S. New Formulas of Feedback Capacity for AGN Channels with Memory: A Time-Domain Sufficient Statistic Approach. arXiv 2020, arXiv:2010.06226. [Google Scholar] [CrossRef]
Liu, T.; Han, G. Feedback Capacity of Stationary Gaussian Channels Further Examined. IEEE Trans. Inf. Theory 2019, 64, 2494–2506. [Google Scholar] [CrossRef]
Li, C.; Elia, N. Youla coding and computation of Gaussian feedback capacity. IEEE Trans. Inf. Theory 2019, 64, 3197–3215. [Google Scholar] [CrossRef]
Butman, S. Linear Feedback Rate Bounds for Regressive Channels. IEEE Trans. Inf. Theory 1976, 22, 363–366. [Google Scholar] [CrossRef]
Wolfowitz, J. Signalling over a Gaussian Channel with Feedback and Autoregressive Noise. J. Appl. Probab. 1975, 12, 713–723. [Google Scholar] [CrossRef]
Ihara, S. Information Theory for Continuous Systems; World Scientific: Singapore, 1993; pp. I–XIII, 1–308. [Google Scholar]
Caines, P.E. Linear Stochastic Systems; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1988. [Google Scholar]
Kailath, T.; Sayed, A.; Hassibi, B. Linear Estimation; Prentice Hall: Hoboken, NJ, USA, 2000. [Google Scholar]
Gallager, R.T. Information Theory and Reliable Communication; John Wiley & Sons, Inc.: New York, NY, USA, 1968. [Google Scholar]
Butman, S. A General Formulation of Linear Feedback Communications systems with Solutions. IEEE Trans. Inf. Theory 1969, 15, 392–400. [Google Scholar] [CrossRef]
Ebert, P.M. The capacity of the Gaussian channel with feedback. Bell Sys. Tech. J. 1970, 49, 1705–1712. [Google Scholar] [CrossRef]
Tienan, J.; Schalkwijk, J.P.M. An upper bound to the capacity of bandlimited Gaussian autoregressive channel with noiseless feedback. IEEE Trans. Inf. Theory 1974, 20, 311–316. [Google Scholar] [CrossRef]
Dembo, A. On Gaussian Feedback Capacity. IEEE Trans. Inf. Theory 1989, 35, 1072–1076. [Google Scholar] [CrossRef]
Ihara, S.; Yanaki, K. Capacity of discrete-time Gaussian channels with and without feedback-II. Jpn. J. Appl. Math. 1989, 6, 245–258. [Google Scholar] [CrossRef]
Ozarow, L. Upper Bounds on the Capacity of Gaussian Channel with Feedback. IEEE Trans. Inf. Theory 1984, 36, 156–161. [Google Scholar] [CrossRef]
Ozarow, L.H. Random coding for additive Gaussin channels with Feedback. IEEE Trans. Inf. Theory 1984, 36, 17–22. [Google Scholar] [CrossRef]
Yanaki, K. Necessary and sufficient conditions for the capacity of the discrete-time Gaussian channel to be increased by feedback. IEEE Trans. Inf. Theory 1992, 38, 1788–1791. [Google Scholar] [CrossRef]
Yanaki, K. An upper bound on the discrete-time Gaussian channel with feedback-II. IEEE Trans. Inf. Theory 1994, 40, 588–593. [Google Scholar] [CrossRef][Green Version]
Chen, H.W.; Yanaki, K. Refiniements of the half-bit and factor-of-two bounds for capacity in Gaussian channels with feedback. IEEE Trans. Inf. Theory 1999, 45, 316–325. [Google Scholar]
Chen, H.W.; Yanaki, K. Upper bounds on the capacity of discrete-time blockwise white Gaussian channels with feedback. IEEE Trans. Inf. Theory 2000, 43, 1125–1131. [Google Scholar] [CrossRef]
Gallager, R.G.; Nakiboglu, B. Variations on a Theme by Schalkwijk and Kailath. IEEE Trans. Inf. Theory 2010, 56, 6–17. [Google Scholar] [CrossRef]
Ordentlich, E. A Class of Optimal Coding Schmes for Moving Average Additive Gaussian Noise Channels with Feedback. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT), Trondheim, Norway, 27 June–1 July 1994; p. 467. [Google Scholar]
Tatikonda, S.; Mitter, S. The Capacity of Channels with Feedback. IEEE Trans. Inf. Theory 2009, 55, 323–349. [Google Scholar] [CrossRef]
Liu, T.; Han, G. The ARMAk Gaussian Feedback Capacity. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT), Aachen, Germany, 25–30 June 2017; pp. 211–215. [Google Scholar]
Kim, Y.H. Feedback capacity of the first-order moving average Gaussian channel. IEEE Trans. Inf. Theory 2006, 52, 3063–3079. [Google Scholar] [CrossRef]
Kourtellaris, C.; Charalambous, C.D.; Sergey, L. From Feedback Capacity to Tight Achievable Bounds without Feedback for AGN Channels with Stable and Unstable Autoregressive Noise. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
Kourtellaris, C.; Charalambous, C.D. Information Structures of Capacity Achieving Distributions for Feedback Channels with Memory and Transmission Cost: Stochastic Optimal Control & Variational Equalities. IEEE Trans. Inf. Theory 2018, 64, 4962–4992. [Google Scholar]
Charalambous, C.D.; Kourtellaris, C.; Loyka, S. Capacity Achieving Distributions and Separation Principle for Feedback Gaussian Channels with Memory: The LQG Theory of Directed Information. IEEE Trans. Inf. Theory 2018, 64, 6384–6418. [Google Scholar] [CrossRef]
Sabag, O.; Kostina, V.; Hassibi, B. Feedback Capacity of MIMO Gaussian Channels. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021. [Google Scholar]
Kumar, P.R.; Varaiya, P. Stochastic Systems: Estimation, Identification, and Adaptive Control; Prentice Hall: Hoboken, NJ, USA, 1986. [Google Scholar]
van Schuppen, J.H. Control and System Theory of Discrete-Time Stochastic Systems; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
Charalambous, C.D.; Louka, S. A Riccati-Lyapunov Approach to Nonfeedback Capacity of MIMO Gaussian Channels Driven by Stable and Unstable Noise. In Proceedings of the 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, 1–9 November 2022; pp. 184–189. [Google Scholar]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

New Formulas of Feedback Capacity for AGN Channels with Memory: A Time-Domain Sufficient Statistic Approach †

Abstract

1. Introduction, Motivation, Main Results, Current State of Knowledge

1.1. The Problem, Motivation, and Main Results

1.2. The Code Definitions and Noise Models

1.3. Approach of This Paper

1.4. Review of Related Literature

1.4.1. The Cover and Pombra Characterizations of Capacity Pombra [5]

1.4.2. The Yang, Kavcic and Tatikonda [8] Characterization of Maximal Information Rate

1.4.3. The Kim [4] Characterization of Feedback Capacity

1.4.4. The Gattami [7] Characterization of Feedback Capacity and Semi-Definite Progamming Formulation

2. Sequential Characterizations of n–FTFI Capacity for Case (I) Formulation

2.1. Notation

2.2. Preliminary Characterizations of n–FTFI Capacity of AGN Channels Driven by Correlated Noise

2.3. A Sufficient Statistic Approach to the Characterization of n–FTFI Capacity of AGN Channels Driven by PO-SS Noise Realizations

2.4. Application Examples

2.5. Case (II) Formulation: A Degenerate of Case (I) Formulation

2.6. Comments on Past Studies

3. Asymptotic Analysis for Case (I) Formulation

3.1. Entropy Rates of Gaussian Processes

3.2. Convergence Properties of Generalized Matrix DREs to AREs

3.3. Feedback Rates

4. Sequential Characterization of n–FTFI Capacity for Case (II) Formulation

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

Appendix A

Appendix A.1. Proof of Theorem 1

Appendix A.2. Proof of Proposition 1

Appendix A.3. Proof of Corollary 3

Appendix A.4. Proof of Theorem 3

Appendix A.5. Proof of Proposition 2

Appendix A.6. Proof of Lemma 2

Appendix A.7. Proof of Lemma 4

Appendix A.8. Proof of Corollary 11

References

Article Metrics

Citations

Article Access Statistics

New Formulas of Feedback Capacity for AGN Channels with Memory: A Time-Domain Sufficient Statistic Approach^†