Abstract
Recently, several papers identified technical issues related to equivalent time-domain and frequency-domain “characterization of the n–block or transmission” feedback capacity formula and its asymptotic limit, the feedback capacity, of additive Gaussian noise (AGN) channels, first introduced by Cover and Pombra in 1989 (IEEE Transactions on Information Theory). The main objective of this paper is to derive new results on the Cover and Pombra characterization of the n–block feedback capacity formula, and to clarify the main points of confusion regarding the time-domain results that appeared in the literature. The first part of this paper derives new equivalent time-domain sequential characterizations of the feedback capacity of AGN channels driven by non-stationary and non-ergodic Gaussian noise. It is shown that the optimal channel input processes of the new equivalent sequential characterizations are expressed as functionals of a sufficient statistic and a Gaussian orthogonal innovations process. Further, the Cover and Pombra n–block capacity formula is expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems, contrary to results in the literature, which involve only one DRE. It is clarified that the prior literature deals with a simpler problem that presupposes the state of the noise is known to the encoder and the decoder. In the second part of this paper, the existence of the asymptotic limit of the n–block feedback capacity formula is shown to be equivalent to the convergence properties of solutions of the two generalized DREs. Further, necessary and/or sufficient conditions are identified for the existence of asymptotic limits, for stable and unstable Gaussian noise, when the optimal input distributions are asymptotically time-invariant but not necessarily stationary.
This paper contains an in-depth analysis, with various examples, and identifies the technical conditions on the feedback code and state space noise realization, so that the time-domain capacity formulas that appeared in the literature, for AGN channels with stationary noises, are indeed correct.
1. Introduction, Motivation, Main Results, Current State of Knowledge
In the recent papers [1,2,3], concerns are raised as to whether the time-domain analysis in [4] deals with the Cover and Pombra [5] “characterization of the n–block or transmission” feedback capacity formula and its asymptotic limit, the feedback capacity, of additive Gaussian noise (AGN) channels. Furthermore, the recent comment paper [6] identified gaps in the proof of the simplified frequency-domain characterization of Theorem 4.1 in [4]. The main objective of this paper is to derive new results on the Cover and Pombra characterization of the n–block feedback capacity formula, and to clarify the main points of confusion regarding the time-domain results of [4] and the related literature, i.e., [7].
1.1. The Problem, Motivation, and Main Results
We consider the additive Gaussian noise (AGN) channel defined by [5]
where
- … is the sequence of channel input random variables (RVs);
- … is the sequence of channel output RVs;
- … is the sequence of jointly Gaussian distributed RVs, with distribution …, which are not necessarily stationary or ergodic.

We wish to examine the feedback capacity of the AGN channel (1) for two distinct formulations of the code definition and the noise model, described below under the Case (I) and Case (II) formulations. Special cases of these will be related to the existing literature.
Case (I) Formulation. This formulation respects the following two conditions.
- (I.1) The feedback code does not assume knowledge of the initial state of the noise at the encoder and the decoder (see Definition 1);
- (I.2) the noise sequence … is represented by a partially observable state space realization (partially observable means that knowledge of … and the initial state … do not specify the state …), with state sequence … (see Definition 2).
For a formulation that respects (I.1) and (I.2), Cover and Pombra characterized the “n–finite transmission” feedback capacity [5] (Equations (10) and (11)), using the information measure ( is identified using the converse coding theorem [5]),
provided the supremum exists, where denotes (differential) entropy.
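For the reader's orientation, the Cover and Pombra n–block formula is commonly restated in the following matrix form (a standard restatement in our own notation; it is not a substitute for the elided display above):

```latex
C_{n}^{fb}(P) \;=\; \max \; \frac{1}{n}\big[h(Y^n) - h(Z^n)\big]
\;=\; \max_{(B_n,\,K_{V^n})} \; \frac{1}{2n}\,
\log \frac{\det\!\big((B_n+I_n)\,K_{Z^n}\,(B_n+I_n)^{\top} + K_{V^n}\big)}{\det K_{Z^n}},
```

where the maximum is over inputs of the form $X^n = B_n Z^n + V^n$, with $B_n$ strictly lower triangular, $V^n \sim N(0, K_{V^n})$ independent of the noise $Z^n$, and average power $\frac{1}{n}\,\mathrm{tr}\big(B_n K_{Z^n} B_n^{\top} + K_{V^n}\big) \le P$.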
Case (I.a) Formulation. Although not mentioned in [5], if the feedback code assumes knowledge of the initial state of the noise or the channel, , at the encoder and the decoder (see Definition 3), it follows directly from [5] (Equations (10) and (11)), that (2) is replaced by the information measure
Case (II) Formulation. This formulation relaxes Conditions (I.1) and (I.2) to the following two conditions:
- (II.1) The feedback code assumes knowledge of the initial state of the noise or the channel, …, at the encoder and the decoder (see Definition 3);
- (II.2) the noise sequence … is represented by a fully observable state space realization (fully observable means knowledge of … and the initial state … specify the state …), with state sequence … such that the noise … and the initial state … uniquely define the noise state sequence …, and vice versa, for ….

Thus, Formulation (2), which respects Conditions (I.1) and (I.2), is the most general.
For a formulation that respects Conditions (II.1) and (II.2), Yang, Kavcic, and Tatikonda [8] characterized the n–finite transmission feedback capacity [8] (Section II (particularly Section II.C, I–III)), using the information measure,
Compared to and , the definition of imposes Condition (II.2) and is fundamentally different from the former, because the input distributions of are different from and . Hence, the information rates of the three formulas are generally different. However, for certain Gaussian noise models of , it might be the case that, under Condition (II.1), the information measures and coincide. We provide several examples in the main body of this paper.
Motivation and Fundamental Differences of Case (I) and Case (II) Formulations.
At this point, we pause to discuss two technical issues of Case (I) and Case (II) formulations, which are not clarified in [4,7,9]. These technical issues are related to the time-domain characterization of feedback capacity, Theorem 6.1 in [4]; they are first discussed in [1,2,3,10]. A recent comment paper [6] also identified gaps in the proof of the frequency-domain characterization of capacity, Theorem 4.1 in [4], which affect the proofs of Theorems 4.1, 4.6, 5.3 and 6.1; Propositions 4.7 and 5.1; Remarks 4.5 and 5.2; and Lemma 6.1 in [4].
To illustrate the technical issues, we consider the autoregressive moving average stable noise denoted by ARMA , studied by many authors [4,7,8,9,11,12], as a benchmark example.
As in [4,7], we define the state variable of the noise by
Then, the state space realization of is
Bounds on feedback capacity, when …, i.e., corresponding to an autoregressive noise, are derived in [13,14,15] using linear coding schemes.
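For concreteness, one common stable realization of an ARMA(a, c) benchmark noise is the following (our sign convention for illustration; the elided definitions above may use a different parameterization):

```latex
Z_t = S_t + W_t, \qquad S_{t+1} = a\,S_t + (a+c)\,W_t, \qquad |a| < 1,\; |c| < 1,
```

with $W_t$ an i.i.d. $N(0,\sigma_W^2)$ sequence; substitution gives $Z_t = a Z_{t-1} + W_t + c W_{t-1}$, and setting $c = 0$ recovers an autoregressive AR(a) noise. Note that, in this realization, knowledge of the initial state $S_1$ and of $Z^t$ recovers $W_t = Z_t - S_t$ and hence $S_{t+1}$, recursively, which is the fully observable situation of Condition (II.2).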
For the Case (I) formulation, the information measure , i.e., (2), corresponds to the supremum over channel input distributions .
For the Case (II) formulation, the information measure , i.e., (4), corresponds to the supremum over channel input distributions , and Conditions (II.1) and (II.2) are necessary. Alternatively, the necessary conditions for to reduce to are the conditions stated in (12) and (13) (these also follow independently of the Case (I) formulation from the converse coding theorem).
Thus, a necessary condition for (13) to hold is , which is known to the encoder. It follows from [4] (Theorem 6.1; also see Lemma 6.1 and the comments above it, Equation (71)), that Conditions (II.1) and (II.2) are assumed; hence, these results are not developed for the Cover and Pombra [5] formulation. Additional information can be found in Remark 5.
Second, the analysis of the asymptotic per unit time limits of (2)–(4), i.e.,
is a non-trivial problem, and it requires certain technical necessary and/or sufficient conditions for the limits to exist, as well as for the rates to be independent of the initial distribution or the initial data (see [3,10]), even if the noise process is stationary. It is easy to verify that the analysis in the past studies [4,7,9,11,12] considered the simpler problem , and that the asymptotic limit does not correspond to the ergodic capacity. We clarify these points in our examples.
Main Results. The main results of this paper are briefly stated below.
(1) In the first part of this paper, we derive new equivalent sequential characterizations of the Cover and Pombra “n–block or transmission” feedback capacity formula [5] (Equation (11)), (this first appeared in [10]). In particular, we derive equivalent realizations of the optimal channel input process [5] (Equation (11)), which are linear functionals of a finite-dimensional sufficient statistic and an orthogonal innovations process. From these new realizations follow the equivalent sequential characterizations of the “n–block or transmission” feedback capacity formula [5] (Equation (11)), which will henceforth be called the “n–finite transmission feedback information (n–FTFI) capacity”. The new n–FTFI capacity is expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems, instead of the one DRE given in [4,7,8]. In fact, we also show that the n–FTFI capacity of [4,7,8] corresponds to .
(2) In the second part of this paper, we analyze the asymptotic per unit time limit of the sequential characterizations of the n–FTFI capacity, when the supremum and limit over are interchanged in (14), denoted by . Then, we show . We identify necessary and/or sufficient conditions for the asymptotic limit to exist, and for the optimal input process to be asymptotically stationary, in terms of the convergence properties of two generalized matrix difference Riccati equations (DREs) to their corresponding two generalized matrix algebraic Riccati equations (AREs). We make use of the so-called detectability, stabilizability, and unit circle controllability conditions of generalized Kalman filters of Gaussian processes [16,17].
(3) From (1) and (2), we derive analogous results for and its per unit time asymptotic limit, denoted by , as degenerate versions of and . Further, we show that for certain noise models, and under certain conditions, it holds that , i.e., these values do not depend on the initial state or initial distributions.
(4) From (1) and (2), we derive analogous results for the Case (II) formulation, i.e., and its per unit time asymptotic limit denoted by , and we show these are fundamentally different from the Case (I) formulation, and , as well as and . In particular, we show that the characterizations of n–FTFI capacity, , for the Case (II) formulation follow directly from the Case (I) formulation, as a special case (an independent derivation is also presented). Moreover, is a functional of one generalized DRE, while are functionals of two generalized DREs.
1.2. The Code Definitions and Noise Models
Case (I) Feedback Code and Noise Definitions. For the Case (I) formulation, we consider the code of Definition 1 (due to [5]).
Definition 1.
Time-varying feedback code [5]
- A noiseless time-varying feedback code for the AGN channel (1) is denoted by …, and consists of the following elements and assumptions:
(i) the uniformly distributed messages …;
(ii) the time-varying encoder strategies, often called codewords of block length n, defined by … (the superscript on … indicates that the distribution depends on the strategy …);
(iii) the average error probability of the decoder functions …, defined by
(iv) the channel input sequence, “… is causally related (a notion found in [5], page 39, above Lemma 5) to …”, which is equivalent to the following decomposition of the joint probability distribution of …:
That is, … is a Markov chain, for …. As usual, the messages W are independent of the channel noise ….
A rate R is called an achievable rate with feedback coding if there exists a sequence of codes …, such that … as …. The feedback capacity … is defined as the supremum of all achievable rates R.
We note that, in general, depends on the initial distribution ; the ergodic capacity requires that is independent of (see [18]).
We consider a noise model which is consistent with [5], i.e., is jointly Gaussian distributed, , and induced by the partially observable state space (PO-SS) realization of Definition 2.
Definition 2.
A time-varying PO-SS realization of Gaussian noise is defined by
For the Case (I) formulation, we use the terminology “partially observable”, which is standard in filtering theory [16], because the noise induces a distribution , and cannot be expressed as a function of the state of the noise, i.e., does not uniquely define . However, if is known to the encoder, then it can be easily verified from the ARMA , that uniquely defines , recursively. In contrast, for the PO-SS realization, with , even if the initial state is known to the encoder, does not uniquely define because there are two independent noises and that enter the equations of and . The PO-SS realization is often adopted in many practical problems of engineering and science to realize jointly Gaussian processes .
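A generic time-varying PO-SS realization of the kind meant in Definition 2 can be sketched as follows (our notation; the exact matrices are those of the elided display):

```latex
S_{t+1} = A_t\,S_t + B_t\,W_t, \qquad Z_t = C_t\,S_t + N_t\,V_t, \qquad S_1 \sim N(\mu_{S_1}, K_{S_1}),
```

where $(W_t)$ and $(V_t)$ are mutually independent Gaussian processes. Because $V_t$ enters the output equation independently of the process noise $W_t$, knowledge of $Z^t$ (even together with $S_1$) does not determine the state $S_t$; the state must be estimated by a filter, which is how the generalized filtering DREs enter the capacity characterization.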
We should emphasize that for the Case (I) formulation to be consistent with the Cover and Pombra [5] formulation, both the code of Definition 1 and the PO-SS realization of Definition 2 must respect the following two conditions:
(A1)
The initial state of the noise is not known at the encoder nor the decoder;
(A2)
At each t, the representation of the noise by the PO-SS realization of Definition 2 does not uniquely determine the state of the noise and vice-versa, i.e., it is a partially observable realization.
Case (II) Formulation of Feedback Code and Noise Definitions. For the Case (II) formulation, we presuppose the following:
Condition 1.
The initial state of the noise or channel is known to the encoder and decoder;
Condition 2.
Given a fixed initial state , known to the encoder and the decoder, at each t, the channel noise uniquely defines the state of the noise and vice-versa.
Thus, for the Case (II) formulation, the code is that of Definition 3, below (hence different from Definition 1).
Definition 3.
A code with initial state known at the encoder and the decoder
- A variant of the code of Definition 1 is a feedback code with the initial state of the noise or channel, …, known to the encoder and decoder strategies, denoted by …, ….
- The code … is defined as in Definition 1, with (ii), (iii), and (iv) being replaced by
- The feedback capacity is denoted by … and should be distinguished from ….
- The feedback capacity is denoted by … if, in addition, Condition 2 holds.
- The initial state … may include …, etc.
It will become apparent that past studies [4,7,8] considered feedback capacity, , and not or .
1.3. Approach of This Paper
Our approach to, and analysis of, the information measures (2)–(4), as well as their per unit time limits, is based on the following two-step procedure:
Step # 1. We apply a linear transformation to the Cover and Pombra optimal channel input process [5] (Equation (11)) (see (33)–(39)) to equivalently represent it by a linear functional of the past channel noise sequence, the past channel output sequence, and an orthogonal Gaussian process, i.e., an innovations process. That is, is uniquely represented, since it is expressed in terms of the orthogonal process.
Step # 2. We express the optimal input process by a functional of a sufficient statistic, which satisfies a Markov recursion, and an orthogonal innovations process. It then follows that the Cover and Pombra characterization of the “n–block” formula [5] (Equation (10)) (see (33) and (34)) is equivalently represented by a sequential characterization. The problem of feedback capacity is then expressed as the maximization over two sequences of time-varying strategies of the channel input process of the difference of (differential) entropies of the innovations processes of and (analog of entropies in the right-hand side of (2)).
where is due to the Gaussianity of and , as well as to the independence of the innovations processes and , and of the innovations processes and .
The asymptotic analysis of (or with limit and supremum interchanged) is then addressed from the asymptotic properties of entropy rates and the average power,
over the channel input distributions, and where the covariance of the innovations process of is a functional of the solutions of two generalized matrix DREs. We identify necessary and/or sufficient conditions for the existence of the limits, irrespective of whether the noise is non-stationary, unstable, or stationary. Further, we show that, in general, the characterizations of feedback capacity for the Case (I) and Case (II) formulations are fundamentally different, and we identify conditions on the feedback codes and the noise under which they coincide.
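As a toy illustration of the kind of convergence analysis involved (a minimal scalar sketch under assumed dynamics, not the paper's generalized DREs), the following iterates a filtering DRE for a hypothetical stable state-space model and checks that it approaches its ARE fixed point regardless of the initial condition:

```python
# Scalar filtering DRE for the hypothetical model
#     s_{t+1} = a*s_t + w_t,   z_t = c*s_t + v_t,
# with w_t ~ N(0, q) and v_t ~ N(0, r) independent. For |a| < 1 (stable
# noise dynamics), the error covariance converges to the ARE fixed point.

def dre_step(p, a, c, q, r):
    """One step of the filtering DRE: p -> a^2 p + q - (a c p)^2 / (c^2 p + r)."""
    return a * a * p + q - (a * c * p) ** 2 / (c * c * p + r)

def iterate_dre(p0, a, c, q, r, steps=1000):
    """Iterate the DRE forward from initial error covariance p0."""
    p = p0
    for _ in range(steps):
        p = dre_step(p, a, c, q, r)
    return p

if __name__ == "__main__":
    a, c, q, r = 0.9, 1.0, 1.0, 1.0   # |a| < 1: stable noise dynamics
    p_from_zero = iterate_dre(0.0, a, c, q, r)
    p_from_ten = iterate_dre(10.0, a, c, q, r)
    # Convergence is insensitive to the initial condition ...
    print(abs(p_from_zero - p_from_ten) < 1e-9)
    # ... and the limit is a fixed point of the DRE, i.e., solves the ARE.
    print(abs(p_from_zero - dre_step(p_from_zero, a, c, q, r)) < 1e-12)
```

For unstable dynamics (|a| > 1), convergence to a stabilizing ARE solution is exactly where the detectability, stabilizability, and unit circle controllability conditions cited in the text become essential.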
1.4. Review of Related Literature
Asymptotic feedback capacity formulas and bounds for AGN channels, driven by stationary and asymptotically stationary (often limited memory) noise, have been derived since the early 1970s in an anthology of papers based on information-theoretic formulas, under various assumptions [7,8,9,11,12,13,14,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33], in two directions.
- (D1) Characterizations and explicit formulas of asymptotic feedback capacity that correspond to feedback codes, when the initial state of the noise (the initial state is any a priori information) is known or not known to the encoder and the decoder;
- (D2) Bounds on asymptotic feedback capacity that correspond to linear feedback coding schemes of communicating Gaussian random variables (RVs), , and coding schemes of communicating digital messages , when the initial state is known or not known to the encoder and the decoder.
1.4.1. The Cover and Pombra [5] Characterizations of Capacity
Cover and Pombra characterized the n–FTFI capacity for non-stationary and non-ergodic noise , [5] (Equation (10)), by (we use to denote differential entropy of a continuous-valued RV X; hence, we indirectly assume the probability density functions exist)
where the distribution is induced by a jointly Gaussian channel input process [5] (Equation (11)):
The notation means that the random variable is jointly Gaussian with mean and covariance matrix , and denotes the n by n identity matrix. Feedback capacity, , is characterized by the per unit time limit of the n–FTFI capacity [5] (Theorem 1).
Over the years, considerable efforts have been devoted to compute and [4,7,8,9,11,33], often under simplified assumptions on the channel noise. In addition, bounds are described in [27,28,29], while numerical methods are developed in [31] for time-invariant AGN channels, driven by stationary noise. In [4,7,11,33,34], the authors considered a variant of (40) by interchanging the per unit time limit and maximization operations under the following assumption: the joint process is either jointly stationary or asymptotically stationary, and the joint distribution of the joint process is time-invariant. We describe [4,7,8] below.
1.4.2. The Yang, Kavcic and Tatikonda [8] Characterization of Maximal Information Rate
In [8], the authors analyzed the feedback capacity of the AGN channel (1), driven by a stationary noise described by the power spectral density (PSD) function :
The analysis in [8] is based on time-domain methods and corresponds to the Case (II) formulation (see [8] (Section II; in particular, Section II.C, I–III, Theorem 1, and Section III)) by considering a specific time-invariant, stable, state space realization of the noise PSD (41), such that Conditions (II.1) and (II.2) hold, i.e.,
The initial state of the noise, , is known to the encoder and the decoder, and the initial state and noise uniquely define the noise state , and vice-versa, for all t.
The time-domain characterization of feedback capacity, called the maximal information rate [8] (Theorem 6), corresponds to the supremum and limit being interchanged, and involves only one matrix Riccati equation of the linear filtering theory. However, ref. [8] (Theorem 6) does not state the conditions under which the maximal information rate is valid (i.e., the existence of the asymptotic limit).
1.4.3. The Kim [4] Characterization of Feedback Capacity
The author in [4] also analyzed the feedback capacity of the AGN channel (1), driven by stationary noise described by the PSD (41) and by a time-invariant, stable, state space realization of the noise (see [4] (Section VI)). A major point of confusion is that the characterization of feedback capacity in the time domain [4] (Theorem 6.1) does not state whether this corresponds to Case (I) or Case (II) formulations. The reader, however, can verify from [4] (Lemma 6.1 and comments above it) that the time-domain characterization of feedback capacity [4] (Theorem 6.1) corresponds to the Case (II) formulation, as stated in the study by Yang, Kavcic and Tatikonda [8]. In fact, the characterization of feedback capacity [4] (Theorem 6.1) involves only one Riccati equation of the linear filtering theory, as in [8]. We reconfirm this point at various parts of this paper (see, for example, Section 2.6).
1.4.4. The Gattami [7] Characterization of Feedback Capacity and Semi-Definite Programming Formulation
The author in [7] revisited the feedback capacity of the AGN channel (1) driven by a stationary noise described by the PSD (41) and with a time-invariant, stable, state space realization of the noise. One of the main results of [7] is the feedback capacity characterization for the Case (II) formulation, i.e., that involves only one matrix Riccati equation of the filtering theory, precisely as in [4,8]. Another main result of [7] is the re-formulation of the optimization problem using semi-definite programming.
In the following remark, we will provide additional insights into the results of [4,7,8].
Remark 1.
On the formulas in [4,7,8].
- Refs. [4,7] considered the stable, time-invariant PO-SS realization of Definition 2, with , i.e., .
In [4] (Theorem 6.1) (see [4] (above and below Equation (70))), the asymptotic characterization of feedback capacity involves one filtering matrix ARE and is achieved by a channel input, which is different at initial times from subsequent times, given by
where χ is a constant vector, and are IID Gaussian RVs.
In [7] (Theorem 3), the asymptotic characterization of feedback capacity involves one filtering matrix ARE and is incurred by the time-invariant channel input
, are IID Gaussian RVs.
In [8], a state space realization of the PSD is considered, and the maximal information rate is presented in [8] (Equation (125)) and [8] (Theorem 6, Equation (138)), which involves one filtering matrix ARE. It is achieved by
where are IID Gaussian RVs.
The above references computed the feedback capacity and maximal information rate for the ARMA and arrived at the conclusion that it is precisely Butman’s [13] and Wolfowitz’s [14] lower bound.
It will become apparent that [4,7,8] arrived at the above expressions by considering problem , and not or .
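Schematically, the optimal inputs reported in [4,7,8] above share a common estimation-error structure, which can be summarized as follows (our notation, for orientation only; the exact gains and innovation terms are those in the elided displays above):

```latex
X_t \;=\; \Lambda_t\,\big(S_t - \mathbb{E}\big[\,S_t \,\big|\, Y^{t-1}\big]\big) \;+\; V_t,
```

where $\Lambda_t$ is a deterministic gain (time-invariant in the asymptotic regime) and $V_t$ is the IID Gaussian innovations sequence; the error covariance of the estimate $\mathbb{E}[S_t \mid Y^{t-1}]$ is the quantity propagated by the single filtering ARE appearing in [4,7,8].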
Sequential equivalent characterizations of the Cover and Pombra [5] n–FTFI capacity, and capacity of the AGN channel (1) driven by a non-stationary and non-ergodic Gaussian noise , as well as by time-varying, unstable (and stable) state space realizations of Definition 2, are derived in [1,2,3,10], including relations between Case (I) and Case (II) formulations. In particular, ref. [10] proved that of the AGN channel with state space noise of Definition 2 involves two matrix Riccati equations of the linear filtering theory and that, under certain conditions, reduces to , which involves only one matrix Riccati equation. Corresponding expressions are obtained for their per unit time asymptotic limits. The methods and results of [10] are generalized to multiple-input multiple-output (MIMO) Gaussian channels in [3]. Further, ref. [2] derived closed-form expressions of for AGN channels driven by the stable and unstable ARMA noise, showing the connection between Case (I) and Case (II) formulations. Ref. [3] generalized the earlier investigation in [1], which considered the autoregressive unit memory stable and unstable noise. An investigation of nonfeedback capacity of stable and unstable noise is outlined in [35]. The connection of ergodic theory and feedback capacity of unstable channels is discussed in [36,37].
MIMO Gaussian channels are also investigated in [38]. However, many of the expressions in [38] were previously obtained in [3,10]. The analysis in [38] does not include a derivation of the optimal channel input that achieves and , and does not discuss the connection between the Case (I) and Case (II) formulations. The closed-form expressions of the capacity of the examples in [38] are special cases of expressions in [1] and in [2], which treated the ARMA noise.
We structure this paper as follows.
In Section 2, we derive the new sequential characterizations of the n–FTFI capacity for the Cover and Pombra formulation of feedback capacity of the AGN channel (1), i.e., for the Case (I) formulation, . We also derive analogous sequential characterizations for and for the Case (II) formulation, , i.e., when Conditions 1 and 2 hold, to illustrate their fundamental differences.
In Section 3, we present the asymptotic analysis of feedback capacity for the Case (I) formulation. In Section 4, we treat the Case (II) formulation.
This paper contains several examples and makes comparisons to the existing literature.
2. Sequential Characterizations of n–FTFI Capacity for Case (I) Formulation
In this section, we derive equivalent sequential characterizations for the following:
(i) defined by (2) of the Case (I) formulation, i.e., for the Cover and Pombra n–FTFI capacity characterization (34);
(ii) defined by (3), as a degenerate case of ;
(iii) defined by (4) of the Case (II) formulation, as a degenerate case of .
We organize the presentation of the material as follows.
(1) Section 2.1. Here, we introduce our notation.
(2) Section 2.2. The main result is Theorem 1, which provides an equivalent sequential characterization of the Cover and Pombra characterization of the n–FTFI capacity, , i.e., of (33) and (34). Our derivation proceeds as follows. We apply a linear transformation to the Cover and Pombra Gaussian optimal channel input (35), to represent , with a linear function of or equivalently and an orthogonal Gaussian innovations process , which is independent of for .
- Subsequently, we apply Theorem 1 to the time-varying PO-SS noise (see Example 1), to the non-stationary autoregressive moving average (ARMA) noise, and to the stationary ARMA noise (see Example 2), which is found in many references, such as [4]. It will become apparent that our characterizations of the n–FTFI capacity are fundamentally different from those of past studies.
(3) Section 2.3. The main result is Theorem 3, which provides a simplified characterization of the sequential characterization of the n–FTFI capacity, , given in Theorem 1 (i.e., the equivalent of (34)), for the time-varying AGN channel (1) driven by the PO-SS realization of Definition 2, for the code of Definition 1. The n–FTFI capacity of Theorem 3 is expressed in terms of solutions to two DREs. Our derivation is based on identifying a finite-dimensional sufficient statistic to express as a functional of the sufficient statistic, instead of or , and an orthogonal Gaussian innovations process.
(4) Section 2.4. The main result is Corollary 6, which is an application of Theorem 3 (i.e., the sufficient statistic representation) to the ARMA noise of Example 2. This example shows that the n–FTFI capacity is expressed in terms of solutions to two DREs.
- From Corollary 6, the following will become apparent:
(i) Neither the time-domain characterization [4] (Theorem 6.1) (see also [4] (Theorem 5.3)) nor the frequency-domain characterization [4] (Theorem 4.1) corresponds to the Cover and Pombra characterization of feedback capacity.
(5) Section 2.5. The main result is Corollary 7, which gives the n–FTFI capacity for the Case (II) formulation, as a degenerate case of the Case (I) formulation, i.e., of Theorem 3.
(6) Section 2.6. The main result is Proposition 2, which further clarifies the following.
(i) The formulation of [8] and the formulation that led to [4] (Theorem 6.1) are based on the Case (II) formulation, as well as (ii) some of the oversights in [4,7,9,11,12].
2.1. Notation
Throughout this paper, we use the following notation:
- …, where n is a finite positive integer.
- … is the vector space of tuples of the real numbers, for an integer ….
- … is the space of complex numbers.
- … is the set of n by m matrices with entries from the set of real numbers, for integers ….
- … is the open unit disc of the space of complex numbers ….
- … (resp. …) denotes the set of positive semidefinite (resp. positive definite) symmetric matrices with elements in the real numbers and of size …. Thus, … if … for all …. Positive semidefiniteness is denoted by … and (strict) positive definiteness by ….
- … denotes the identity matrix; … denotes the trace of any matrix ….
- … denotes the spectrum of a matrix (the set of all its eigenvalues). A matrix is called exponentially stable if all its eigenvalues lie within the open unit disc, i.e., ….
- … denotes a probability space. Given a random variable …, its induced distribution on … is denoted by ….
- … denotes a Gaussian distributed RV X, with mean value … and covariance matrix …, defined by ….
- Given another Gaussian random variable …, which is jointly Gaussian distributed with X, i.e., the joint distribution is …, the conditional covariance of X given Y is defined by …, where the last equality is due to a property of jointly Gaussian distributed RVs.
- Given three arbitrary RVs … with induced distribution …, the RVs … are called conditionally independent given the RV Y if …. This conditional independence is often denoted by …, which is a Markov chain.
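The conditional covariance property invoked in the notation above is the standard jointly Gaussian identity, restated here in explicit notation:

```latex
K_{X \mid Y} \;\triangleq\;
\mathbb{E}\Big[\big(X - \mathbb{E}[X \mid Y]\big)\big(X - \mathbb{E}[X \mid Y]\big)^{\top} \,\Big|\, Y\Big]
\;=\; K_X - K_{XY}\,K_Y^{-1}\,K_{YX},
```

which is nonrandom, i.e., independent of the realization of Y; this is the property of jointly Gaussian distributed RVs referred to in the notation section.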
2.2. Preliminary Characterizations of n–FTFI Capacity of AGN Channels Driven by Correlated Noise
We start with preliminary calculations for the feedback code of Definition 1, which we use to prove Theorem 1. These calculations are introduced for the sake of clarity and to establish our notation.
- For the feedback code of Definition 1, by the channel definition (1), i.e., (18), the conditional distribution of …, given …, is
where … is due to (1) and … is due to (18).
- We introduce the set of channel input distributions with feedback, which are consistent with the code of Definition 1, not necessarily generated by the messages W, as follows:
- By Definition 1, we have …. Moreover, by the channel definition, any pair of the sequence triple … uniquely defines the remaining sequence. Thus, the following identity holds:
- We also emphasize that, by Definition 1, for a given feedback encoder strategy …, the conditional distribution of … given … is obtained as follows:
… is due to knowledge of the distribution of the strategies …, the code definition, and the recursive substitution …, where … is specified by the knowledge of the strategies and the knowledge of …;
… is due to the fact that knowing … specifies …;
… is due to the fact that any pair of the triple … specifies the remaining sequence, i.e., knowing … specifies …, and … is thus redundant;
… is due to the conditional independence …;
… is due to (18), i.e., …, and the channel definition.
By the channel definition , each is also expressed as
where is due to the channel definition—i.e., the presence of in can be removed, since it is redundant—and specified by . Consequently, we have the identity
Notation 1.
For the feedback code of Definition 3, with initial state , known to the encoder and the decoder, the above sets are replaced by , to indicate that the distributions and codes are , etc., and these depend on s.
In the next theorem, we present our preliminary equivalent sequential characterization of the Cover and Pombra characterization , i.e., of (33), under encoder strategies and channel input distributions . Unlike the Cover and Pombra [5] realization of , given by (35), at each time t, is driven by an orthogonal Gaussian process .
Theorem 1.
Information structures of maximizing distributions for AGN Channels
- Consider the AGN channel (1), i.e., with noise distribution , and the code of Definition 1. Then, the following hold:
(a) The following inequality holds:
where the conditional (differential) entropy is evaluated with respect to the probability distribution , defined by
and is evaluated with respect to the probability distribution , defined by
(b) The optimal channel input distribution , which maximizes of part (a), i.e., the right-hand side of (63), is induced by an input process that is conditionally Gaussian, with linear conditional mean and nonrandom conditional covariance, given by
and such that the average constraint holds and (18) is respected.
(c) The optimal channel input distribution of part (b) is induced by a jointly Gaussian process , with a realization given by
Proof.
See Appendix A.1. □
Remark 2.
For the code of Definition 3 that assumes knowledge of the initial state , it is easy to verify that is directly obtained from Theorem 1, as a degenerate case (an independent derivation is easily produced following the derivation of Corollary 11, with slight variations).
By utilizing Theorem 1, we can derive the converse coding theorems stated below for the feedback codes of Definitions 1 and 3.
Theorem 2.
Converse coding theorems for codes of Definitions 1 and 3
Consider the AGN channel (1).
- (a) Any achievable rate R for the code of Definition 1 satisfies
provided the supremum exists and the limit exists, where the right-hand side of (79) is given in Theorem 1(d).
(b) Any achievable rate R for the code of Definition 3 (with initial state ) satisfies
where means the expectation is taken for a fixed , provided the supremum exists and the limit exists, and where the right-hand side of (81) is obtained from Theorem 1, part (d), by replacing all conditional distributions, entropies, etc., with their counterparts for a fixed initial state (see Notation 1).
Proof.
Following standard arguments, we use Fano’s inequality (see also [5]) and Theorem 1. □
In the next remark, we clarify the equivalence of Theorem 1(d) to Cover and Pombra [5].
Remark 3.
Relation of Theorem 1 and Cover and Pombra [5]
- (a) From the realization of given by (68), we can recover the Cover and Pombra [5] realization (35) by recursive substitution of into the right-hand side of (68), as follows:
for some that is jointly correlated and some nonrandom , as in (35) and (36).
(b) Unlike the Cover and Pombra [5] realization (35) of , the realization of given by (68), or in vector form by (69), is such that, at each time t, depends on , or in vector form on , where is an innovations, i.e., orthogonal, process, so that (71) holds.
(c) In subsequent parts of the paper, we derive an equivalent sequential characterization of the Cover and Pombra n–FTFI capacity (34), which is simplified further with the use of a sufficient statistic (one that satisfies a Markov recursion).
To characterize using Theorem 1, part (d), we need to compute the (differential) entropy of . The following lemma is useful in this respect:
Lemma 1.
Entropy calculation from generalized Kalman filter of the PO-SS noise realization.
Consider the PO-SS realization of of Definition 2. Define the conditional covariance and conditional mean of given by
Then, the following hold:
- (a) The conditional distribution of conditioned on is Gaussian, i.e.,
where .
(b) The conditional mean and covariance are given by the generalized Kalman filter recursions, as follows:
(i) The optimal mean square error estimate satisfies the generalized Kalman filter recursion
(ii) The error satisfies the recursion
(iii) The covariance of the error is such that and satisfies the generalized matrix DRE
(iv) The conditional mean and covariance are given by
(v) The entropy of is given by
Proof.
(a), (b)(i)–(iv). The generalized Kalman filter of the PO-SS realization of and the accompanying statements can be found in many textbooks, e.g., [16]. It is noted, however, that , are all independent Gaussian. For example, to show (93), we write the recursion for using part (i) and the realization of , part (b). (v) By the chain rule of joint entropy, we have
Combining (101) and (97) with the entropy formula for Gaussian RVs yields (98). □
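The chain-rule entropy computation of Lemma 1(v) can be illustrated with a minimal scalar sketch. The coefficients below are assumed illustrative values for a hypothetical scalar state-space noise, not the paper's PO-SS realization (whose generalized Kalman filter also carries a cross-covariance term, omitted here for brevity): the DRE propagates the error covariance, and the joint entropy accumulates the per-step innovation entropies.

```python
import math

# Hypothetical scalar state-space noise (assumed values, not from the paper):
# s_{t+1} = a*s_t + b*w_t, n_t = c*s_t + v_t, w_t ~ N(0,1), v_t ~ N(0,r),
# mutually independent.
a, b, c, r = 0.8, 1.0, 1.0, 0.5

def kalman_entropy(n_steps, p0):
    """Propagate the DRE for the error covariance and accumulate the joint
    entropy of n^T via the chain rule: h(n^T) = sum_t h(innovation at t)."""
    p, joint_entropy = p0, 0.0
    for _ in range(n_steps):
        s_innov = c * p * c + r                  # innovation variance
        joint_entropy += 0.5 * math.log(2 * math.pi * math.e * s_innov)
        gain = a * p * c / s_innov               # Kalman gain
        p = a * p * a + b * b - gain * s_innov * gain   # scalar matrix DRE
    return p, joint_entropy

p_lim, h_joint = kalman_entropy(400, p0=1.0)
```

The per-sample entropy stabilizes because the DRE solution converges; this is the mechanism exploited in the asymptotic analysis of Section 3.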
The next corollary, on the conditional entropy, follows directly from Lemma 1 when is fixed.
Corollary 1.
Conditional entropy of the PO-SS noise realization.
Consider the PO-SS realization of of Definition 2, for fixed , and denote the state process generated by recursion (19) by (we often use this notation to emphasize that the process is generated for a fixed ). Replace the conditional covariance and conditional mean (85) and (86) by
Then, all statements of Lemma 1 hold, with the following changes:
In particular, the conditional entropy of conditioned on is given by
where satisfies the generalized DRE (95) with initial condition .
Next, we introduce an example of a PO-SS realization of the noise that we often use in this paper.
Example 1.
A time-varying PO-SS noise realization is defined by
The next corollary is an application of Lemma 1 to the time-varying PO-SS noise of Example 1.
Corollary 2.
The entropy of the PO-SS noise of Example 1 is computed from Lemma 1 with the following changes:
Proof.
This is easily verified. □
From Corollary 2, we have the following observations:
Remark 4.
Consider the PO-SS noise of Example 1. Then, the following hold.
We also apply our results to various versions of the autoregressive moving average (ARMA) noise model, such as the double-sided and single-sided stationary versions of the ARMA noise, previously analyzed in [4] and in many other papers, to illustrate fundamental differences between the Case (I) and Case (II) formulations.
Example 2.
The time-invariant ARMA noise
- (a) The time-invariant one-sided, stable or unstable, autoregressive moving average (ARMA) noise is defined by
To express the AR in state space form, we define the state variable of the noise by
Then, the state space realization of is
We note that the AR is not necessarily stationary or asymptotically stationary. A special case of the AR is the AR noise (i.e., with ) defined by
(b) Double-sided wide-sense stationary ARMA noise. A double-sided wide-sense stationary ARMA noise is defined by
where is an independent and identically distributed Gaussian sequence, i.e., , . The power spectral density (PSD) of the wide-sense stationary noise (this corresponds to [4] (Equation (43) with )) is given by
We define the state process by
Then, the stationary state space realization of is
provided that the initial covariances are chosen appropriately to ensure stationarity (see Proposition 1).
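As a sanity check on the AR special case of part (a), the following sketch propagates the variance of a scalar AR(1) noise deterministically; alpha and sigma2 are assumed illustrative values, not taken from the paper. For |alpha| < 1, the variance converges to sigma2/(1 - alpha^2), i.e., the noise is asymptotically stationary but not stationary when started from a fixed initial state.

```python
# Illustrative AR(1) noise: n_t = alpha*n_{t-1} + w_t, w_t ~ N(0, sigma2),
# fixed initial state n_0 = 0 (alpha and sigma2 are assumed values).
alpha, sigma2 = 0.5, 1.0

def variance_recursion(n_steps, var0=0.0):
    """Propagate Var(n_t) = alpha^2 * Var(n_{t-1}) + sigma2 deterministically."""
    var, hist = var0, []
    for _ in range(n_steps):
        var = alpha * alpha * var + sigma2
        hist.append(var)
    return hist

hist = variance_recursion(100)
stationary_var = sigma2 / (1 - alpha * alpha)   # the asymptotic variance
```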
For the AR noise, we clarify in the next remark the differences between the feedback codes of Definitions 1 and 3, as well as between the Case (I) and Case (II) formulations (and we discuss the implications for the results in [4,7,9,11,12]).
Remark 5.
ARMA noise of Example 2
- (a) Consider any of the AR noises of Example 2. For the code of Definition 2, the channel input process cannot be expressed in terms of the state (see also Remark 4(a)).
(b) Consider the non-stationary AR of Example 2(a).
(i) Assume the code of Definition 3, with initial state known to the encoder. By (120),
hence, knowledge of at the encoder does not determine , because the encoder requires knowledge of for this to hold. It then follows that is computed from Corollary 1 as follows:
where is the solution of (95) with initial data .
(ii) Assume the code of Definition 3, with initial state or known to the encoder. Then, by Corollary 1,
By (120), , and a necessary condition for Condition 1 of Section 1.1 to hold is that both are known to the encoder and the decoder.
(c) The statements of parts (a) and (b) also hold for the double-sided and the one-sided wide-sense stationary AR of Example 2(b,c).
(d) The Case (II) formulation discussed in Section 1.1 requires Conditions 1 and 2 to hold. For any of the AR noise models, Conditions 1 and 2 hold if and only if or is known to the encoder. Clearly, the values of under the Case (I) formulation are fundamentally different from the values of under the Case (II) formulation. Consequently, in general, given by (75) is fundamentally different from , i.e., the latter corresponds to a fixed initial state , known to the encoder and the decoder, and to the corresponding channel input distribution.
(e) From parts (a)–(d), the characterization of feedback capacity for the stationary ARMA noise, given in [4] (Theorem 6.1, ) (which is derived based on [4] (Lemma 6.1)), presupposed that the encoder and the decoder have knowledge of .
In fact, the formulas of capacity in [4,7,8] use .
In the next proposition, we state conditions for the stable realizations of Example 2(a), i.e., the AR noise, to be asymptotically stationary, and for the realizations of Example 2(b,c) to be stationary. We should emphasize that, for stationary noise, we need to determine the initial conditions of the generalized Kalman filter of Lemma 1 that correspond to the stationary noise.
Proposition 1.
Asymptotically stationary and stationary ARMA noises of Example 2
- (a) The realization of the double-sided ARMA noise of Example 2(b) is stationary if the following conditions hold:
where the constants are given by
Similarly, the one-sided ARMA noise of Example 2(c) is stationary if the above equations hold .
(b) The realization of the ARMA noise of Example 2(a) is asymptotically stationary if .
(c) For the stationary realization of part (a), the optimal conditional variance and conditional mean of from , i.e., , are defined by the generalized Kalman filter given by
initialized at the initial data
(i) If the conditioning information is , then the generalized Kalman filters (135) and (136) still hold and are initialized at the initial data
(ii) If the initial data are not available, then the generalized Kalman filter is initialized at the initial data , .
Proof.
See Appendix A.2. □
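The requirement of Proposition 1, that the initial covariances must be chosen appropriately for stationarity, amounts to solving a discrete Lyapunov equation for the initial state covariance. A minimal scalar sketch, with assumed coefficients a and b that are not taken from the paper:

```python
# Stationary initialization: the scalar discrete Lyapunov equation
# Sigma = a*Sigma*a + b*b (a, b are assumed values; for |a| < 1 the closed
# form is b^2 / (1 - a^2)).
a, b = 0.7, 1.0

def stationary_covariance(tol=1e-13):
    """Fixed-point iteration for the scalar discrete Lyapunov equation."""
    sigma = 0.0
    while True:
        nxt = a * a * sigma + b * b
        if abs(nxt - sigma) < tol:
            return nxt
        sigma = nxt

sigma_stat = stationary_covariance()
# initializing the state covariance at sigma_stat keeps Var(s_t) constant in t
```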
Remark 6.
Consider the stationary double-sided or one-sided ARMA noise of Example 2. From Proposition 1, and in particular the initial data stated in (137) and (138), it is clear that even if the encoder and the decoder know the initial state . Thus, . In this case, the value of defined by (81) is fundamentally different from that of the formulation in [4,7,8], which leads to the characterization of feedback capacity of [4] (Theorem 6.1).
In the next corollary, we further clarify the difference between the Case (I) and Case (II) formulations by stating the analog of Theorem 1 for the code of Definition 3, i.e., when is fixed.
Corollary 3.
n–FTFI capacity for feedback code of Definition 3
- Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 3, with the initial state fixed. Then, the following hold.
(a) The n–FTFI capacity is given by
where the supremum is over all realizations of that induce the distribution , and all statements of Theorem 1 and Lemma 1 hold, with the conditional distributions, expectations, and entropies replaced by the corresponding expressions with a fixed .
(b) A necessary condition for Condition 2 of Section 1.1 to hold is the following:
(i) uniquely defines .
Moreover, if (i) holds, then the entropy of part (a) is given by
The stable, time-invariant PO-SS realization of Definition 2, which is considered in [4,7], satisfies , i.e., . Moreover, for this realization, (i) always holds.
Proof.
See Appendix A.3. □
In the next remark, we illustrate that given by (143) follows directly from Lemma 1 by fixing and assuming that uniquely defines .
Remark 7.
The n–FTFI capacity for code of Definition 1 versus code of Definition 3.
Consider the generalized Kalman filter of the PO-SS noise realization, of Lemma 1, and assume the following:
(i) The initial state of the noise is known, i.e., or , and uniquely defines .
Then, all statements of Lemma 1 hold, with replaced by for . Since satisfies the generalized DRE (95) with initial condition , it is easy to deduce that for is a solution. Substituting into (98) yields (143), as expected, which is precisely the entropy of the noise that appeared in [4,7].
- On the other hand, for the code of Definition 1, by Theorem 1(d), the right-hand side of the n–FTFI capacity involves , which is computed using the generalized Kalman filter of Lemma 1.
2.3. A Sufficient Statistic Approach to the Characterization of n–FTFI Capacity of AGN Channels Driven by PO-SS Noise Realizations
The characterization of the n–FTFI capacity via (34) (which is equivalently given in Theorem 1(d)), although compactly represented, is not very practical because the input process is not expressed in terms of a sufficient statistic that summarizes the information of the channel input strategy [39].
- In this section, we wish to identify a sufficient statistic for the input process , given by (68), called the state of the input, which summarizes the information contained in . It will then become apparent that the characterization of the n–FTFI capacity for the Cover and Pombra formulation and code of Definition 1 can be expressed as a functional of two generalized matrix DREs.
First, we invoke Theorem 1 and Lemma 1 to show that for each time t, is expressed as
which means, at each time t, the state of the channel input process is . We show that satisfies another generalized Kalman filter recursion.
Now, we prepare to prove (144) and the main theorem. We start with preliminary calculations.
At , we also have . By (149), it follows that the conditional distribution of given is
From the above distributions, at each time t, the distribution of conditioned on , given in Theorem 1, is also expressed as a linear functional of , for .
The next theorem further shows that for each t, the dependence of on is expressed in terms of for , and this dependence gives rise to an equivalent sequential characterization of the Cover and Pombra n–FTFI capacity, .
Theorem 3.
Equivalent characterization of n–FTFI capacity for PO-SS noise realizations
- Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 1. Also, consider the generalized Kalman filter of Lemma 1. Define the conditional covariance and conditional mean of , given , by
Then, the following hold.
(a) An equivalent characterization of the n–FTFI capacity , defined by (34) and (35), is
where is jointly Gaussian, and
(b) The optimal jointly Gaussian process of part (a) is represented as a function of a sufficient statistic by
where is nonrandom. The conditional mean and covariance, and , are given by generalized Kalman filter equations, as follows:
(i) satisfies the Kalman filter recursion
Proof.
See Appendix A.4. □
Remark 8.
On the characterization of n–FTFI capacity of Theorem 3
- The characterization of the n–FTFI capacity given by (183) involves the generalized matrix DRE , which is also a functional of the generalized matrix DRE of the error covariance of the state given the noise output . This feature is not part of the analysis in [4] and the recent studies [7,9,11,12].
The next corollary follows directly from Theorem 3 as a degenerate case.
Corollary 4.
Equivalent characterization of n–FTFI capacity for PO-SS noise realizations
Consider the time-varying AGN channel defined by (1), driven by a noise with the PO-SS realization of Definition 2, and the code of Definition 3, with the initial state fixed, and replace (152) and (153) by
Then, the characterization of n–FTFI capacity, (3), is
where is given by Corollary 1, and the statements of Theorem 3 hold with the above changes, i.e., (184), (185), and all conditional entropies, distributions, expectations, etc., are defined for fixed .
Proof.
It is easily verified from the derivation of Theorem 3 by fixing . □
Remark 9.
On the characterization of n–FTFI capacity of Corollary 4
- The characterization of the n–FTFI capacity given in Corollary 4 (similar to Theorem 3) involves two generalized matrix DREs, because it does not assume that Conditions 1 and 2 hold. This distinction is not part of the analysis in [4,7,9,11,12].
2.4. Application Examples
In this section, we apply Theorem 3 to specific examples.
First, we consider the application example of the AGN channel driven by the PO-SS noise.
Corollary 5.
The n–FTFI capacity of the AGN channel driven by the PO-SS noise is obtained from Lemma 1 and Theorem 3 by using (113).
Proof.
This is easily verified, as in Corollary 2. □
In the next corollary, we apply Theorem 3 to the stable and unstable ARMA noise to obtain the characterization of the n–FTFI capacity and . It is then obvious that, for the stable ARMA noise, the characterization of involves two generalized DREs, contrary to the analysis in [4,7,9,11,12] for the same noise model.
Corollary 6.
Characterization of n–FTFI capacity for the ARMA
- Consider the time-varying AGN channel defined by (1) and the code of Definition 1.
(a) For the non-stationary ARMA noise of Example 2(a), the characterization of the n–FTFI capacity, , is
subject to the constraints
and where
The optimal jointly Gaussian process is obtained from Theorem 3(b) by invoking
Special Case. If or the initial state is fixed, , then
and reduces to
subject to the constraints
(This special case is precisely the application example analyzed in [4,7,8].)
(b) For the non-stationary AR noise of Example 2(c), the characterization of the n–FTFI capacity is obtained from part (a) by setting , i.e.,
subject to the constraints, where are the non-negative solutions of the generalized DREs:
(c) For the non-stationary AR noise of Example 2(c), with or a fixed initial state , (196) holds, i.e., , and reduces to
subject to the constraint
Proof.
(a) The first part follows directly from Theorem 3 by using (195). The last part is obtained as follows. If or is fixed, then, by (193), it follows that . Moreover, by (191) and (192), it follows that . By substituting into (188) and (189), we obtain (197) and (180). (b) From part (a), let . Then,
By substituting into the equations of part (a), we obtain (200) and (201). (c) This is a special case of parts (a) and (b). □
Remark 10.
By Corollary 6(a), it is obvious that if , i.e., , which means is fixed, then is fixed (and known to the encoder and the decoder); see (120). Then, and , which depends on the initial state . To ensure that, for large enough n, the rate is independent of s, it is necessary to identify conditions for convergence of the solutions of the generalized DRE (180) to a unique limit , which does not depend on the initial data . We address this problem in Section 3.
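The convergence question raised here can be illustrated numerically. The following sketch, with assumed scalar coefficients (not the paper's DRE (180)), iterates a scalar generalized DRE from two different initial conditions; under detectability and stabilizability conditions of the kind identified in Section 3, both runs approach the same stabilizing ARE solution, so the limit does not depend on the initial data.

```python
# Assumed scalar coefficients for an illustrative DRE (not from the paper).
a, c, q, r = 0.9, 1.0, 1.0, 1.0

def dre(p0, n_steps=300):
    """Iterate the scalar Riccati recursion from the initial condition p0."""
    p = p0
    for _ in range(n_steps):
        s = c * p * c + r
        p = a * p * a + q - (a * p * c) ** 2 / s   # scalar Riccati step
    return p

p_from_zero = dre(0.0)
p_from_large = dre(50.0)
# both runs approach the same stabilizing solution of the corresponding ARE
```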
2.5. Case (II) Formulation: A Degenerate Case of the Case (I) Formulation
Theorem 3 gives the n–FTFI capacity for the Case (I) formulation. However, since the Case (II) formulation is a special case of the Case (I) formulation, we expect that we can recover the characterization of the n–FTFI capacity for the Case (II) formulation from Theorem 3, i.e., when the code is , , and Conditions 1 and 2 of Section 1.1 hold. We show this in the next corollary.
Corollary 7.
The degenerate n–FTFI capacity of Theorem 3 for the Case (II) formulation
- Consider the time-varying AGN channel defined by (1), driven by a noise with PO-SS realization of Definition 2, and suppose that the following hold:
- (1) The code is , ;(2) Conditions 1 and 2 of Section 1.1 hold.
- Then, the following hold:
(a) Corollary 1 holds, i.e., all statements of Lemma 1 hold with replaced by , as defined by (102) and (103). In particular, for , and is given by (143).
(b) All statements of Theorem 3 hold with replaced by , as in part (a), and defined by (152) and (153) reduces to
In particular, the optimal input process of Theorem 3(c) degenerates to
(c) The characterization of the n–FTFI capacity of Theorem 3 degenerates to . It is defined by
satisfies the generalized DRE
and the statements of parts (a) and (b) hold.
Proof.
(a) The statements about Lemma 1 follow from Remark 7. (b) The statements about Theorem 3 are easily verified by replacing all conditional expectations, distributions, etc., for a fixed initial state , and by using part (a), i.e., , . Part (c) follows from parts (a) and (b). □
2.6. Comments on Past Studies
It is easily verified that Yang, Kavcic, and Tatikonda [8] analyzed , defined by (81), under the Case (II) formulation, i.e., when Conditions 1 and 2 of Section 1.1 hold, as discussed in the next remark.
Remark 11.
Prior studies on the time-invariant stationary noise of PSD (41)
- Yang, Kavcic, and Tatikonda [8] analyzed the AGN channel driven by a stationary noise with the PSD defined by (41) (see [8] (Theorem 1)). The special case of (126) is found in [8] (Section VI.B, Theorem 7). The analysis in [8] presupposed the following formulation:
(i) The code is , , where is the initial state of the noise, known to the encoder and the decoder, as discussed in Definition 3;
(ii) Conditions 1 and 2 of Section 1.1 hold;
(iii) The n–FTFI capacity formula is , defined by (81).
We emphasize that in [8] (Section II.C), a specific realization of the PSD is considered to ensure that Conditions 1 and 2 hold, i.e., the analysis in [8] presupposed a stationary noise and the Case (II) formulation.
Now, we ask the following: Given the PSD of the noise defined by (41), and the double-sided realization [4] (Equation (58)), i.e., the analog of time-invariant version of the PO-SS realization of Definition 2, or its analogous one-sided realization, what are the necessary conditions for the feedback capacity of [4] (Theorem 6.1) to be valid?
- The answer to this question is as follows: Conditions 1 and 2 of Section 1.1 are necessary conditions. We show this in the next proposition.
Proposition 2.
Conditions for validity of the feedback capacity characterization of [4] (Theorem 6.1)
- Consider the AGN channel (1) driven by a stationary noise with the PSD defined by (41), with the double-sided or one-sided realization [4] (Equation (58)) (i.e., the time-invariant analog of Definition 2). Then, a necessary condition for [4] (Theorem 6.1) to hold is
Further, Conditions 1 and 2 of Section 1.1 are necessary and sufficient for equality (211) to hold.
Proof.
See Appendix A.5. □
The next remark is our final observation on prior studies.
Remark 12.
Comparison of Cover and Pombra Characterization and current literature
- From Corollary 7 and Proposition 2, we have the following: The characterization of feedback capacity given in [4] (Theorem 6.1, ) corresponds to the Case (II) formulation and not to the Case (I) formulation. Further, the optimization problem of [4] (Theorem 6.1, ) is precisely the optimization problem investigated in [8] (Section VI), with the additional restriction that the innovations part of the channel input is taken to be asymptotically zero in [4] (Theorem 6.1, ); see [4] (Lemma 6.1 and the comments above it). Recent studies [7,9,11,12] should be read with caution because the results therein often build on [4] (Theorems 4.1 and 6.1).
3. Asymptotic Analysis for Case (I) Formulation
In this section, we address the asymptotic per unit time limit of the n–FTFI capacity. Our analysis includes the following:
(1) Fundamental differences of entropy rates of jointly Gaussian stable versus unstable noise processes.
(2) Necessary and/or sufficient conditions for existence of entropy rates of unstable (and stable) , and , expressed in terms of detectability and stabilizability or unit circle controllability conditions of generalized DREs [16,17], and asymptotic stationarity of the optimal input process (and output process , if the noise is stable).
This section also reconfirms that, in general, the asymptotic analysis of the n–FTFI capacity of a feedback code that depends on the initial state of the channel, i.e., , is fundamentally different from that of a code that does not depend on the initial state.
Closed-form expressions of the asymptotic per unit time limit of of AGN channels driven by AR noise, i.e., stable and unstable, are found in [1].
Closed-form expressions of the asymptotic per unit time limit of of AGN channels driven by ARMA noise are found in [3].
We consider the following definition of rate, often used for the nonfeedback capacity of stationary processes; however, our formulation does not assume stationarity.
Definition 4.
Per unit time limit of and
- Consider the AGN channel defined by (1), driven by the time-invariant PO-SS realization of Definition 2.
(a) For the code of Definition 1, define the per unit time limit
where, for problem , the supremum is taken over all asymptotically time-invariant distributions with feedback , such that the limit exists and the supremum exists and is finite.
(b) For the code of Definition 3, i.e., , , with initial state , is replaced by , defined by (212), with the differential entropies, conditional expectations, and conditional distributions defined for fixed .
The rate definition, , i.e., the interchange of limit and supremum, is consistent with the definition of rates considered in [4,7,9,11,12]. However, unlike [4,7,9,11,12], we treat the general time-invariant, stable and unstable PO-SS noise realization of Definition 2, which is not necessarily stationary or asymptotically stationary.
- We should emphasize that, in general, and irrespective of whether the noise is stable or unstable, the entropy rates that appear in (212) and (213) may not exist. To show existence of the limits and , we identify necessary and/or sufficient conditions, using the characterization of Theorem 3, when the channel input strategies are restricted to asymptotically time-invariant strategies . Clearly, by (212), whether the limit, as , and the supremum over channel input distributions exist depends on the convergence properties of the coupled generalized matrix DREs , as .
3.1. Entropy Rates of Gaussian Processes
First, we recall the following definition, which is standard and found in many textbooks:
Definition 5.
Entropy rate of continuous-valued random processes
- Let be a random process defined on some probability space . The (differential) entropy rate is defined by
when the limit exists.
The next theorem quantifies the existence of entropy rates of stationary Gaussian processes [16].
Theorem 4.
The entropy rate of stationary zero-mean full-rank Gaussian process [16]
- Let be a stationary Gaussian process with zero mean and full-rank covariance of . Let denote the Hilbert space of RVs generated by , and define the innovations process by
and its limit
Then, the entropy rate is given by
when it exists.
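Theorem 4 reduces the entropy rate to the limiting innovations variance. As a sketch, for a stationary AR(1) process with assumed parameters a and sigma2 (illustrative values, not from the paper), the Levinson-Durbin recursion applied to its autocovariances recovers the one-step prediction error variance sigma2, and hence the entropy rate 0.5*log(2*pi*e*sigma2):

```python
import math

# Stationary AR(1) with assumed parameters; its autocovariances are
# r(k) = (sigma2 / (1 - a^2)) * a^k.
a, sigma2 = 0.6, 1.0
n_lags = 50
acov = [sigma2 / (1 - a * a) * a ** k for k in range(n_lags + 1)]

def levinson_error(r):
    """One-step prediction error variance after len(r)-1 lags (Levinson-Durbin)."""
    err, phi = r[0], []
    for k in range(1, len(r)):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(len(phi)))
        kappa = acc / err                 # reflection coefficient
        phi = [phi[j] - kappa * phi[len(phi) - 1 - j] for j in range(len(phi))] + [kappa]
        err *= 1 - kappa * kappa          # updated prediction error variance
    return err

innov_var = levinson_error(acov)
entropy_rate = 0.5 * math.log(2 * math.pi * math.e * innov_var)
```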
An application of Theorem 4 is given in the next proposition [15].
Proposition 3.
Entropy rate of Gaussian process described by PSD (41)
- Let be a real, scalar-valued, stationary Gaussian noise with PSD (41), with a corresponding time-invariant stationary realization (similar to Definition 2). Then, the entropy rate is given by
Proof.
This is shown in [15] by using the Szego formula and Poisson’s integral formula. □
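Proposition 3 rests on a Szego-type identity: the geometric mean of the PSD over frequency equals the innovations variance. A numerical sketch for an assumed AR(1)-type PSD (a stand-in for the PSD (41), whose exact form is not reproduced here):

```python
import math

# Assumed PSD S(w) = sigma2 / |1 - a*exp(-iw)|^2 = sigma2 / (1 - 2a cos(w) + a^2).
a, sigma2 = 0.5, 2.0
m = 4096
log_integral = 0.0
for k in range(m):
    w = 2 * math.pi * (k + 0.5) / m                 # midpoint rule nodes on [0, 2*pi]
    s_w = sigma2 / (1 - 2 * a * math.cos(w) + a * a)
    log_integral += math.log(s_w)
log_mean = log_integral / m                          # (1/(2*pi)) * integral of log S
geom_mean = math.exp(log_mean)                       # geometric mean of the PSD
entropy_rate = 0.5 * math.log(2 * math.pi * math.e * geom_mean)
```

Since the integral of log|1 - a*exp(-iw)|^2 over a period vanishes for |a| < 1, the geometric mean equals sigma2, consistent with the innovations variance of the AR(1) model.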
The next remark is trivial; it is introduced for a subsequent comparison.
Remark 13.
Let be the non-stationary ARMA noise of Example 2. Then, the conditional entropy of for fixed initial state is given by
The next lemma identifies fundamental conditions for the existence of the entropy rate of the time-varying PO-SS noise realization of Definition 2 (if is not fixed) and includes the entropy rate of the non-stationary ARMA noise of Remark 13.
Lemma 2.
Entropy rate of the time-varying PO-SS noise realization of Definition 2
- Consider the time-varying PO-SS noise realization of Definition 2. Then, the following hold:
(a) The joint entropy of , when it exists, is given by
where is a zero-mean, covariance , Gaussian orthogonal innovations process of , defined by
that is, is independent of .
(b) Suppose that the sequence is such that
Then, the entropy rate of is given by
Proof.
See Appendix A.6. □
Remark 14.
Entropy rate of non-stationary Gaussian noise
- By Lemma 2, a necessary condition for the existence of the entropy rate of a non-stationary Gaussian process is the convergence of the covariance of the Gaussian orthogonal innovations process of , i.e., of , since . We can determine such necessary and/or sufficient conditions from the convergence properties of the generalized Kalman filter equations [16,17] of Lemma 1.
3.2. Convergence Properties of Generalized Matrix DREs to AREs
To address the asymptotic properties of estimation errors generated by the recursions of generalized Kalman filters, such as of Theorem 3, generated by (178), we need to introduce the stabilizing solutions of generalized AREs. The next definition is useful in this respect.
Definition 6.
Stabilizing solutions of generalized matrix AREs
- Let . Define the generalized time-invariant matrix DRE
Moreover, define the corresponding generalized matrix ARE as follows:
A solution of the generalized matrix ARE (226), assuming it exists, is called stabilizing if . In this case, we say that is asymptotically stable, i.e., the eigenvalues of are stable.
With respect to any of the above generalized matrix DREs and AREs, we introduce the important notions of detectability, unit circle controllability, and stabilizability. We use these notions to characterize the convergence properties of solutions of generalized matrix DREs, , as , to a unique symmetric, non-negative, stabilizing solution P of the generalized matrix ARE. These notions are used to identify necessary and/or sufficient conditions for the error recursions of generalized Kalman filters, such as that of Theorem 3 generated by (178), to converge in the mean square sense to a unique limit. However, we should distinguish whether the convergence is uniform over all initial conditions, or holds only for .
Definition 7.
Detectability, Stabilizability, and Unit Circle Controllability
- Consider the generalized matrix ARE of Definition 6 and introduce the matrices
(a) The pair is called detectable if there exists a matrix such that , i.e., the eigenvalues λ of lie in (stable).
(b) The pair is called unit circle controllable if there exists a such that , i.e., all eigenvalues λ of are such that .
(c) The pair is called stabilizable if there exists a such that , i.e., all eigenvalues λ of lie in .
(d) The pair is called observable if the following rank condition holds:
(e) The pair is called controllable if the following rank condition holds:
Remark 15.
The following are well known [16]. If the pair is observable, then it is detectable. If the pair is controllable, then it is stabilizable.
The next theorem characterizes detectability, unit circle controllability, and stabilizability [17,40].
Lemma 3
([17,40]). Necessary and sufficient conditions for detectability, unit circle controllability, and stabilizability
- (a) The pair is detectable if and only if there exists no eigenvalue and eigenvector , , such that and .
(b) The pair is unit circle controllable if and only if there exists no eigenvalue and eigenvector , , such that and .
(c) The pair is stabilizable if and only if there exists no eigenvalue and eigenvector , , such that and .
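The eigenvalue/eigenvector tests of Lemma 3 are PBH-type rank tests and are straightforward to check numerically. A sketch for detectability, with illustrative matrices that are not from the paper: the pair (A, C) is detectable iff no eigenvalue of A on or outside the unit circle has an eigenvector annihilated by C, equivalently the stacked matrix [lambda*I - A; C] has full column rank at every such eigenvalue.

```python
import numpy as np

def is_detectable(A, C, tol=1e-9):
    """PBH-type detectability test: every unstable or marginal mode of A
    must be visible through C."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if abs(lam) >= 1 - tol:   # only modes with |lambda| >= 1 matter
            M = np.vstack([lam * np.eye(n) - A, C])
            if np.linalg.matrix_rank(M, tol) < n:
                return False
    return True

A = np.array([[1.2, 0.0], [0.0, 0.5]])   # one unstable mode (eigenvalue 1.2)
C_good = np.array([[1.0, 0.0]])          # observes the unstable mode
C_bad = np.array([[0.0, 1.0]])           # misses the unstable mode
```

The analogous tests for stabilizability and unit circle controllability follow by stacking with the input matrix and adjusting the eigenvalue region.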
In the next theorem, we summarize known results on sufficient and/or necessary conditions for the convergence of solutions of the generalized time-invariant DRE (225), as , to a symmetric, non-negative, stabilizing solution of the corresponding generalized ARE (226). We recall that detectability of the pair is a necessary condition for the convergence of the sequence , as , to a non-negative P that is a stabilizing solution of the corresponding generalized ARE. However, it is not sufficient. For a sufficient condition, it is also necessary that the pair is unit circle controllable; however, the limiting P is not necessarily the unique solution of the generalized ARE. There may be multiple solutions, depending on the initial condition .
Theorem 5
([16,17]). Convergence of time-invariant generalized DRE
- Let denote a sequence that satisfies the time-invariant generalized DRE (225) with arbitrary initial condition . The following hold:
(1) Consider the generalized DRE (225) with a zero initial condition, i.e., , and assume that the pair is detectable and that the pair is unit circle controllable. Then, the sequence that satisfies the generalized DRE (225), with a zero initial condition , converges to P, i.e., , where P satisfies the generalized matrix ARE (226), if and only if the pair is stabilizable.
(2) Assume that the pair is detectable and that the pair is unit circle controllable. Then, there exists a unique stabilizing solution to the generalized ARE (226), i.e., such that , if and only if is stabilizable.
(3) If is detectable and is stabilizable, then any solution to the generalized matrix DRE (225) with arbitrary initial condition is such that , where is the unique solution of the generalized matrix ARE (226) with , i.e., it is stabilizing.
(4) Detectability and unit circle controllability of are necessary and sufficient conditions for any solution to the generalized DRE (225) to converge, , from some initial condition , where is a stabilizing solution of the generalized ARE (226), but it may not be unique (i.e., (226) may have multiple solutions ).
Proposition 4.
Generalizations to asymptotic-time invariant coefficients
- Suppose that the coefficients of the generalized matrix DRE (225), are replaced by , , and they are asymptotically time-invariant, i.e.,
Then, Theorem 5 remains valid.
Proof.
This is due to the well-known continuity properties of matrix DREs with respect to their coefficients, i.e., the convergence properties are characterized by the limiting pairs, and . □
3.3. Feedback Rates
Now, we return to the feedback rates of Definition 4. The next corollary is an application of Theorem 5 to the generalized Kalman filter of Lemma 1 (for the time-invariant PO-SS realization); it identifies conditions for the existence of the entropy rate , irrespective of whether the noise is stable or unstable.
Corollary 8.
The entropy rate of PO-SS noise realization based on the generalized Kalman filter
- Let denote the solution of the generalized matrix DRE (95) of the generalized Kalman filter of Lemma 1 of the time-invariant PO-SS realization of of Definition 2, i.e., , generated by
Let be a solution of the corresponding generalized ARE
Define the matrices
(a) All statements of Theorem 5 hold with as defined by (234) and (235). In particular, suppose the following:
(i) is detectable;
(ii) is stabilizable.
Then, any solution to the generalized matrix DRE (231) with arbitrary initial condition is such that , where is the unique and stabilizing solution of the generalized matrix ARE (233), i.e., with .
(b) Suppose that (i) and (ii) hold. The entropy rate of is given by
where
is the stationary Gaussian innovations process, i.e., with replaced by , and the entropy rate is independent of the initial data .
(c) Suppose that in parts (a) and (b), the condition that is stabilizable is replaced by
(iii) is unit circle controllable.
Then, the statements of parts (a) and (b) hold for some , but not for all . Moreover, is not necessarily a stationary process, i.e., it depends on the value of .
Proof.
(a) These are direct applications of Theorem 5. (b) This follows from Lemma 2. (c) This is due to Theorem 5(4). □
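Corollary 8(b) states that the Gaussian entropy rate is determined by the limiting innovations variance of the (generalized) Kalman filter. A minimal scalar sketch, under assumed illustrative coefficients rather than the PO-SS realization of the corollary: run the filter DRE, average the per-step Gaussian conditional entropies, and compare against the closed-form rate computed from the stationary innovations variance.

```python
import math

a, c, q, r = 0.9, 1.0, 1.0, 1.0   # illustrative stable scalar realization

def dre_step(P):
    return a * a * P + q - (a * c * P) ** 2 / (c * c * P + r)

# Cesaro average of the per-step Gaussian conditional entropies
# h_t = 0.5 * log(2 pi e nu_t), with innovations variance nu_t = c^2 P_t + r.
P, total, n = 0.0, 0.0, 2000
for _ in range(n):
    nu_t = c * c * P + r
    total += 0.5 * math.log(2.0 * math.pi * math.e * nu_t)
    P = dre_step(P)
avg_entropy = total / n

# Closed-form entropy rate from the stationary innovations variance.
for _ in range(5000):
    P = dre_step(P)
rate = 0.5 * math.log(2.0 * math.pi * math.e * (c * c * P + r))
print(avg_entropy, rate)
```

Because the DRE converges geometrically, the average of the per-step entropies and the rate computed at the ARE fixed point agree closely, and the limit does not depend on the initial condition, mirroring the independence from the initial data asserted in part (b).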
Next, we apply Corollary 8 to the non-stationary AR noise.
Lemma 4.
Properties of solutions of DREs and AREs of AR noise and entropy rate
- Consider the AR noise of Example 2(a), and the DRE , generated by Corollary 6(a), i.e.,
where , . Let be a solution of the corresponding generalized ARE, as follows:
Then, the detectability and stabilizability pairs are
and the following hold:
(1) The pair is detectable (the restriction is always assumed).
(2) The pair is unit circle controllable if and only if ().
(3) The pair is stabilizable if and only if ().
(4) Suppose and . The sequence that satisfies the generalized DRE with any initial condition, , converges to , i.e., , where satisfies the ARE (241) if and only if the pair is unit circle controllable, equivalently, . Moreover, the solutions of the quadratic Equation (241), without imposing , are
That is, , is the unique and stabilizing solution of (241), i.e., such that , if and only if , and
is the maximal and stabilizing solution of (241), i.e., such that , if and only if .
(5) Suppose and . Then, any solution to the generalized DRE (239) with an arbitrary initial condition, , is such that , where is the unique solution of the generalized ARE (241) with , i.e., it is stabilizing. Moreover, .
(6) (i) Suppose and . The entropy rate of is given by
(ii) Suppose and . The entropy rate of is given by
Proof.
See Appendix A.7. □
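The two closed-form solutions in Lemma 4(4) arise because a scalar ARE is a quadratic equation. As a hedged illustration with generic scalar coefficients `a, q, r` (not the paper's exact Equation (241)), the sketch below computes both roots of the quadratic and verifies that only the non-negative root is stabilizing, i.e., places the closed-loop filter coefficient inside the unit circle, for both a stable and an unstable noise coefficient.

```python
import math

def are_roots(a, q, r=1.0):
    # Scalar ARE: P = a^2 P + q - (a P)^2 / (P + r)   (observation gain c = 1)
    # Equivalent quadratic: P^2 + (r - a^2 r - q) P - q r = 0
    b = r - a * a * r - q
    disc = math.sqrt(b * b + 4.0 * q * r)
    return ((-b + disc) / 2.0, (-b - disc) / 2.0)

def is_stabilizing(P, a, r=1.0):
    # Closed-loop coefficient of the filter error recursion:
    # F = a - K, with Kalman gain K = a P / (P + r), so F = a r / (P + r).
    return abs(a * r / (P + r)) < 1.0

results = {}
for a in (0.5, 1.5):                       # stable and unstable coefficient
    P_plus, P_minus = are_roots(a, q=1.0)
    results[a] = (P_plus, is_stabilizing(P_plus, a),
                  P_minus, is_stabilizing(P_minus, a))
print(results)
```

The root product is negative, so one root is negative and necessarily non-stabilizing, matching the dichotomy described in the lemma.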
Remark 16.
Lemma 4(4) emphasizes that, in the asymptotic analysis of , which satisfies the DREs (239) and (240), the limiting value is , where satisfies the ARE (241), which has two solutions, and . For any , it is clear that for , the unique and stabilizing solution is , , since the other solution is negative. On the other hand, for any and , the stabilizing solution is the maximal solution, , provided .
To gain additional insights, we discuss the application of Lemma 4 to the AR noise in the next remark.
Remark 17.
Entropy rate of the AR noise
- Consider the non-stationary AR noise defined by (124). Then, from Lemma 4, is the solution of (239) and (240), with (see Corollary 6(b), (201)), and (241) degenerates to the ARE, as follows:
For , by (242), the pair is detectable, and the pair is stabilizable. The two solutions of the ARE (248), without imposing , are
That is, , where is the unique (stabilizing) solution of the ARE and corresponds to the stable eigenvalue of the error equation (see (93)), i.e., .
Next, we compute the entropy rate of the time-invariant non-stationary PO-SS noise of Corollary 2 to show fundamental differences from the entropy rate of the AR noise of Lemma 4.
Lemma 5.
Properties of solutions of DREs and AREs of PO-SS noise and entropy rate
- Consider the time-invariant non-stationary PO-SS noise of Example 1, i.e., given by
and the sequence , generated by the DRE of Lemma 1 (see (113)), i.e.,
where . Let be the corresponding solution of the generalized ARE:
Then, the detectability and stabilizability pairs are
and the following hold:
(1) The pair is detectable . If , then the pair is detectable if and only if .
(2) The pair is unit circle controllable if and only if , .
(3) The pair is stabilizable if , . If , then the pair is stabilizable if and only if .
(4) Define the set
For any , any solution to the (classical) DRE (252) with an arbitrary initial condition, , is such that , where is the unique solution of the (classical) ARE (253) with , i.e., it is stabilizing.
(5) For any of part (4), the entropy rate of is given by
Proof.
Follows from Theorem 5. □
Next, we turn our attention to the convergence properties of the entropy rate , which is needed for the characterization of of Definition 4.
Theorem 6.
Asymptotic properties of entropy rate of Theorem 3
- Let be the solution of the generalized DRE (180) of the generalized Kalman filter of Theorem 3, corresponding to the time-invariant PO-SS realization of of Definition 2, , with time-invariant strategies , , generated by
where
Define the corresponding generalized ARE by
where
Introduce the matrices
Suppose that the detectability and stabilizability conditions of Corollary 8(i,ii) hold. Then, all statements of Theorem 5 hold with as defined by (265). In particular, suppose the following:
(i) is detectable;
(ii) is stabilizable.
Then, any solution to the generalized matrix DRE (258) with an arbitrary initial condition is such that , where is the unique solution of the generalized matrix ARE (262) with , i.e., it is stabilizing. Moreover, the entropy rate of is given by
where is the innovations process of Theorem 3 (with the indicated changes of time-invariant strategies) and where
is the stationary Gaussian innovations process, i.e., with replaced by .
Proof.
Since the detectability and stabilizability conditions of Corollary 8 hold, the statements of Corollary 8 hold. By the continuity property of the solutions of generalized difference Riccati equations with respect to their coefficients (see [16]), and the convergence of the sequence , where is the unique stabilizing solution of (233), the statements of Theorem 6 hold, as stated. In particular, under the detectability and stabilizability Conditions (i) and (ii), , where is the unique and stabilizing solution of (262). □
In the next lemma, we apply Theorem 6 to the AR noise of Example 2(a) using Lemma 4.
Lemma 6.
- Let denote the solution of the DRE of Corollary 6(a), when , i.e., given by
and where
Define the set
For any , let be a corresponding solution of the ARE (evaluated at ),
and define the pairs
Then, the following hold:
(1) Suppose . Then, is detectable .
(2) Suppose . Then, is detectable for if and only if .
(3) Suppose . Then, the pair is unit circle controllable if and only if .
(4) Suppose . Then, the pair is stabilizable if and only if .
(5) Suppose , and , . The sequence that satisfies the generalized DRE (274) with a zero initial condition, , converges to , i.e., , where satisfies the generalized ARE,
if and only if (by Lemma 4(4)), and the pair is stabilizable, equivalently, .
Proof.
The statements follow from Lemma 4, Theorem 6 (and general properties of Theorem 5). □
Remark 18.
From Lemma 6(5), it follows that if , then the unique and stabilizing solution is and corresponds to . This is an application of Theorem 5(1).
In the next theorem, we characterize the asymptotic limit of Definition 4 by invoking Theorems 3 and 6 and Corollary 8.
Theorem 7.
Feedback capacity of Theorem 3 for time-invariant strategies
- Consider of Definition 4 corresponding to Theorem 3, i.e., the PO-SS realization of of Definition 2 is time-invariant, , and the strategies are time-invariant, . Define the set
Then,
where
provided there exists such that the set is non-empty.
Moreover, the maximum element is such that
(1) It induces asymptotic stationarity of the corresponding input and innovations processes (see Theorem 3 for specification);
(2) If is asymptotically stationary, then it induces asymptotic stationarity of the corresponding input and output processes;
(3) For (i) and (ii), is independent of the initial conditions .
Furthermore, if the set is empty, replace stabilizability of and by unit circle controllability, i.e., the maximal and stabilizing solutions of the AREs are utilized.
Proof.
By Definition 4, Theorems 3 and 6 and Corollary 8, (286) follows. We defined the set using the detectability and stabilizability conditions of Corollary 8 and Theorem 6 to ensure the convergence of solutions of the generalized matrix DREs to unique non-negative, stabilizing solutions of the corresponding generalized matrix AREs. Then, for any element , both summands in (286) converge. This establishes the characterization of the right-hand side of (287). Parts (1)–(3) follow from the asymptotic properties of the Kalman filter (due to the stabilizability and detectability conditions). The last statement follows due to the relaxation, Theorem 5(4). □
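The structure of Theorem 7, a supremum of a rate functional over the set of strategies for which the associated Riccati limits exist, can be mimicked in a toy computation. The sketch below is purely illustrative: the scalar coefficients, the power-feasibility check, and the rate functional (half the log-ratio of two limiting innovations variances, echoing the two-DRE structure) are hypothetical choices, not the paper's characterization (287)-(288).

```python
import math

# Toy "two-DRE" rate evaluation: for each candidate scalar strategy
# parameter g in a feasible set, iterate a channel-output DRE and a
# noise DRE to their limits, evaluate the rate as half the log-ratio
# of the limiting innovations variances, and take the maximum.
a, q, r, kappa = 0.9, 1.0, 1.0, 4.0       # all values illustrative

def dre_limit(a_cl, q_cl, n=2000):
    # Scalar DRE with unit observation gain; iterates are bounded by
    # q_cl + a_cl^2 * r, so the loop is safe even for unstable a_cl.
    P = 0.0
    for _ in range(n):
        P = a_cl * a_cl * P + q_cl - (a_cl * P) ** 2 / (P + r)
    return P

P_noise = dre_limit(a, q)                 # noise-estimation DRE limit
nu_noise = P_noise + r                    # noise innovations variance

best_rate, best_g = -float("inf"), None
for i in range(401):
    g = -2.0 + 0.01 * i                   # candidate feedback gains
    if g * g * P_noise > kappa:           # toy power-feasibility check
        continue                          # g outside the feasible set
    P_out = dre_limit(a + g, q + g * g)   # channel-output DRE limit
    rate = 0.5 * math.log((P_out + r) / nu_noise)
    if rate > best_rate:
        best_rate, best_g = rate, g
print(best_g, best_rate)
```

The point of the sketch is the shape of the optimization, evaluate each admissible strategy through its Riccati limits and then maximize, which is the pattern that (287)-(288) formalize; the numbers themselves carry no meaning for the paper's channels.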
Remark 19.
In Theorem 7, if we replace stabilizability by unit circle controllability, then the supremum in (288) is over a larger set. However, the asymptotic limits are not unique stabilizing solutions but are instead the maximal and stabilizing solutions.
Theorem 7 also holds for asymptotically time-invariant strategies. We state this as a corollary.
Corollary 9.
Feedback capacity of Theorem 3 for asymptotically time-invariant strategies
- Consider the problem statement of Theorem 7 with asymptotically time-invariant strategies, and corresponding .
Then,
and the statements of Theorem 7 hold.
Proof.
This follows from Proposition 4 and Theorem 7. □
Remark 20.
Explicit closed-form expressions of are given in [1,2] for the stable and unstable AR and ARMA noise processes . The expressions consist of multiple regimes that depend on the parameters of the noise, i.e., for the ARMA noise, and on the value of κ. Moreover, for some regimes, is achieved by an optimal , while for other regimes, it is achieved by , such that .
Next, we give the expression of feedback capacity , which is generally an upper bound on the expressions of Theorem 7 and Corollary 9.
Theorem 8.
Feedback capacity of Theorem 3 for asymptotically time-invariant noise and strategies
- Consider of Definition 4 corresponding to Theorem 3, where and the coefficients of the PO-SS realization of of Definition 2 are asymptotically time-invariant, i.e.,
Let correspond to (285) with and being replaced by unit circle controllability.
Let correspond to (288), with and being the maximal and stabilizing solution of (262) and the maximal and stabilizing solution of (233), respectively.
Then,
Proof.
First, we note that Theorem 7 continues to hold if we consider asymptotically time-invariant strategies and coefficients, i.e., (a) and (b) (by Proposition 4), and the stabilizability conditions are replaced by unit circle controllability conditions. Hence, (288) remains valid with the set replaced by the larger set , giving the higher value (293) ≥ (288). For the derivation of (292) = (293), it suffices to show that we can interchange the limit and the supremum in (292). This can be accomplished by using the definition of the set and Conditions (a) and (b). The procedure, although lengthy, is similar to the one described in [41]; hence, we omit it. □
Conclusion 1.
Degenerate versions of Theorems 7 and 8 for the feedback code of Definition 3, i.e., ,
- The characterizations of feedback capacity of the AGN channel (1) driven by a noise of Definition 2, for the code of Definition 3, i.e., , , are degenerate cases of Theorems 7 and 8, corresponding to . In particular, since Theorem 7 characterizes for all initial data , it includes . Moreover, it follows that , where is independent of the initial state .
We apply Theorem 7 to obtain of the AR noise.
Corollary 10.
Consider the AR noise of Example 2(a).
- Define the set
Then,
where
provided that there exists such that the set is non-empty.
Moreover, the maximum element , is such that
(1) It induces asymptotic stationarity of the corresponding input and innovations processes;
(2) If is asymptotically stationary, then it induces asymptotic stationarity of the corresponding input and output processes;
(3) For (i) and (ii), and are independent of and s, respectively, and the following identities hold.
Furthermore, if the set is empty, replace stabilizability of by unit circle controllability, i.e., so that is the maximal and stabilizing solution of the ARE.
Proof.
The first part is an application of Theorem 7, Lemmas 4 and 6. Parts (1)–(3) are due to the convergence properties of the Kalman filter (due to the stabilizability and detectability conditions). It remains to show (298). The equality holds by Conclusion 1(a). The last equality holds due to the AR noise. If the initial state is known to the encoder and the decoder, then Condition 1 of Section 1.1 holds. In addition, Condition 2 also holds, as can be easily verified from Equations (121) and (122). □
Remark 21.
From Corollary 10, we obtain the degenerate cases of AR noise, i.e., setting . The various implications of the detectability and stabilizability conditions for the AR noise are found in [1]. The corresponding states that for stable AR noise and time-invariant strategies, feedback does not increase capacity (because of the stronger condition of stabilizability). However, if unit circle controllability is imposed instead, then feedback increases capacity.
4. Sequential Characterization of n–FTFI Capacity for Case (II) Formulation
In this section, we consider the Case (II) formulation, and we derive the characterization of feedback capacity, , of the AGN channel (1) driven by a noise of Definition 2, i.e., for the code of Definition 3, , , when Conditions 1 and 2 of Section 1.1 hold.
Definition 8.
AGN channels driven by noise with invertible PO-SS realizations
- The PO-SS realization of the noise of Definition 2 is called invertible if it satisfies the following condition:(A1) Given the initial state , the noise uniquely specifies the state , for , and vice versa.
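For intuition, Condition (A1) holds trivially for an AR realization in which the state stacks past noise values: the initial state together with the noise path determines the state path, and conversely. The sketch below is an assumed AR(2) realization with illustrative coefficients, not a realization taken from the paper; it simulates the noise and then rebuilds the state path from the initial state and the noise alone.

```python
import random

# Assumed invertible realization: AR(2) noise Z_t with state
# S_t = (Z_{t-1}, Z_{t-2}), so Z_t = a1*Z_{t-1} + a2*Z_{t-2} + W_t
# and S_{t+1} = (Z_t, Z_{t-1}) is recovered exactly from (S_1, Z_1..Z_t).
a1, a2 = 0.5, -0.3
random.seed(0)

S = (0.0, 0.0)                      # known initial state S_1
Z, states = [], [S]
for _ in range(50):
    z = a1 * S[0] + a2 * S[1] + random.gauss(0.0, 1.0)
    Z.append(z)
    S = (z, S[0])                   # state update driven by the noise
    states.append(S)

# Reconstruction: rebuild the state path from the initial state and Z alone.
S_hat, rec = (0.0, 0.0), [(0.0, 0.0)]
for z in Z:
    S_hat = (z, S_hat[0])
    rec.append(S_hat)
print(rec == states)
```

The reconstruction is exact because the state map is a deterministic function of the noise path, which is the one-to-one correspondence that Condition (A1) demands.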
Corollary 11.
Characterization of n–FTFI Capacity for the Case (II) formulation
- Consider the AGN channel (1) driven by a noise of Definition 8, and the code of Definition 3, , , i.e., Conditions 1 and 2 of Section 1.1 hold. Define the n–FTFI capacity for a fixed initial state by
where the set is defined by
and where means is fixed, and the joint distribution depends on the elements of .
Then, the following hold:
(a) The n–FTFI capacity, for a fixed , is characterized by
where the is defined by
and where (18) is respected, , is conditionally Gaussian, with linear conditional mean and nonrandom conditional covariance, given by (the notation means this sequence is generated from (19), when the initial state is fixed, ),
and is evaluated with respect to the probability distribution , defined by
(b) Define the conditional means and conditional covariance for a fixed by
The optimal channel input distribution of part (a) is induced by a jointly Gaussian process , with a realization given by
where is nonrandom. The conditional means and conditional covariance and are given by the generalized Kalman filter, as follows:
(i) satisfies the Kalman filter recursion
(ii) The error satisfies the recursion
(iii) satisfies the generalized DRE
(c) The characterization of the n–FTFI capacity of part (a) is
and the statements of part (b) hold.
Proof.
See Appendix A.8. □
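Parts (b)(i)-(iii) of Corollary 11 follow the standard Kalman filter pattern: a data-driven conditional-mean recursion and a data-independent covariance (DRE) recursion. A generic scalar sketch with illustrative coefficients, not the corollary's equations: it filters a simulated path and checks that the error covariance produced online coincides with the covariance propagated offline by the DRE alone, i.e., the covariance never depends on the observations.

```python
import random

a, c, q, r = 0.8, 1.0, 0.5, 1.0    # illustrative scalar state-space model
random.seed(1)

# Simulate state S_t and observation Y_t = c*S_t + measurement noise.
S, ys = 0.0, []
for _ in range(100):
    S = a * S + random.gauss(0.0, q ** 0.5)
    ys.append(c * S + random.gauss(0.0, r ** 0.5))

# Kalman filter: conditional-mean recursion plus covariance DRE.
m, P, P_path = 0.0, 0.0, []
for y in ys:
    m_pred = a * m                          # predict mean
    P_pred = a * a * P + q                  # predict covariance
    K = P_pred * c / (c * c * P_pred + r)   # Kalman gain
    m = m_pred + K * (y - c * m_pred)       # update with the innovation
    P = (1.0 - K * c) * P_pred              # update covariance
    P_path.append(P)

# The covariance recursion never touches the data: re-running it
# without observations reproduces P_path exactly.
P2, P2_path = 0.0, []
for _ in range(100):
    P_pred = a * a * P2 + q
    K = P_pred * c / (c * c * P_pred + r)
    P2 = (1.0 - K * c) * P_pred
    P2_path.append(P2)
print(P_path == P2_path)
```

This data-independence of the covariance is what allows the n–FTFI capacity to be expressed through DREs alone, without reference to particular channel output realizations.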
Remark 22.
The asymptotic analysis of Section 3, based on Definition 4, applies naturally to Corollary 11.
Corollary 12.
Characterization of Feedback Capacity for the Case (II) formulation
- Consider the statement of Corollary 11 for asymptotically time-invariant noise. The feedback capacity is given by
If the optimal solution is such that stabilizability holds, then the feedback capacity is independent of the initial state .
Proof.
This is a special case of Theorem 7, with . □
Corollary 13.
Feedback capacity of [4,7]
- Consider the channels studied in [4,7], i.e., with time-invariant and stable realization.(a) The time-domain n–FTFI capacity is given in Corollary 11.(b) The time-domain feedback capacity is given by Corollary 12.
Proof.
Since the noise in [4,7] is time-invariant and stable, according to Definition 8, and the code is that of Definition 3, i.e., , , their results are special cases of Corollaries 11 and 12. □
In the next remark, we clarify the relation of Corollary 11 and the analysis of [4,8].
Remark 23.
Relations of Corollary 11 and [4,8]
- (a) The state space problem analyzed in [8] is precisely , when the noise is stationary and Gaussian, i.e., it corresponds to the Case (II) formulation. Corollary 11 is derived in [8] for the degenerate case of a time-invariant realization of the noise , i.e., of Definition 8. However, the asymptotic analysis of [8] (Section VI) should be read with caution, because it did not impose the necessary and/or sufficient conditions for the convergence of the sequence generated by the time-invariant version of the generalized DRE (321), i.e., , where is either the maximal or the unique and stabilizing solution of a corresponding generalized ARE.
(b) The problem analyzed in [4] that led to [4] (Theorem 6.1, ) is the per unit time limit of , when the noise is stationary, two-sided or one-sided (asymptotically stationary) and Gaussian, i.e., it corresponds to the Case (II) formulation. The characterization of feedback capacity presented in [4] (Theorem 6.1, ) presupposed that the following hold ((i)–(iii) are also assumed in [8] (Section VI)):
(i) The feedback code is that of Definition 3, i.e., .
(ii) The noise is time-invariant and stable, and the PO-SS realization of the noise is invertible, as presented in Definition 8.
(iii) The definition of rate is , with the supremum and the per unit time limit interchanged, and the supremum taken over time-invariant channel input distributions.
(iv) The innovations covariance of the channel input process is asymptotically zero, i.e., .
This implies that the corresponding is the maximal and stabilizing solution of the corresponding matrix ARE, since the detectability and unit circle controllability conditions hold, but not the stabilizability condition. Items (i)–(iv) are confirmed from [4] (Lemma 6.1) (and the comments above), which is used to derive [4] (Theorem 6.1, ). However, the characterization of feedback capacity in [4] (Theorem 6.1, ) should be read with caution, because the stabilizability condition is violated, due to the author's requirement that is optimal. By Theorem 5(1), for the choice , the only choice is the maximal and stabilizing solution of the generalized ARE presented in [4] (Theorem 6.1, ). However, it is easy to verify that [4] (Theorem 6.1, CFB) cannot be the capacity of asymptotically stationary noise, because depends on the covariance . Moreover, is required. Finally, we emphasize that the treatment of the ARMA noise in [2] clarifies the above issues.
5. Conclusions
New equivalent sequential characterizations of the Cover and Pombra [5] “n–block” feedback capacity formulas are derived using time-domain methods for additive Gaussian noise (AGN) channels driven by non-stationary Gaussian noise. New features of the equivalent characterizations include the representation of the optimal channel input process by a sufficient statistic and a Gaussian orthogonal innovations process. The sequential characterizations of the n–block feedback capacity formula are expressed as a functional of two generalized matrix difference Riccati equations (DREs) of the filtering theory of Gaussian systems. The asymptotic analysis of the per unit time limit of the “n–block” formula, called feedback capacity, is also presented for time-invariant and asymptotically time-invariant channel input distributions, using tools from the theory of generalized matrix Riccati equations.
The analysis and derivation of the new sequential characterizations of feedback capacity also clarify prior analyses and characterizations of feedback capacity, such as [4,7,11,12], which do not address the Cover and Pombra [5] feedback capacity problem, because the code definitions and noise assumptions in [4,7,11,12] (even under the restriction of stationary noise) are fundamentally different from those in [5]. This paper resolves several of these points of confusion.
Author Contributions
C.D.C., C.K. and S.L. contributed to the conceptualization, methodology, and writing of this manuscript. All authors have read and agreed to the published version of this manuscript.
Funding
The work of C.D. Charalambous was co-funded by the European Regional Development Fund and the Republic of Cyprus through the Research and Innovation Foundation (Project: EXCELLENCE/1216/0296).
Institutional Review Board Statement
Not applicable.
Data Availability Statement
No data are contained within this article.
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of data; in the writing of this manuscript; or in the decision to publish the results.
Appendix A
Appendix A.1. Proof of Theorem 1
(a) Consider an element of . Then, the conditional entropies are defined, provided that the conditional distributions of conditioned on , i.e., , for , are determined. By the reconditioning, using (60), we have
Hence, (64) is shown. Similarly, consider an element of . Then, the conditional entropies are defined, provided that the conditional distributions of conditioned on , i.e., for , are determined. By (52) and (53), (65) is obtained. Since , Inequality (63) follows.
(b) This part follows from the maximum entropy principle of Gaussian distributions. That is, under the restriction (18), a conditional Gaussian element of with linear conditional mean and nonrandom conditional covariance induces a jointly Gaussian distribution of the process , such that the marginal distribution of is jointly Gaussian. Below, we provide an alternative proof that uses the Cover and Pombra characterization of the n–FTFI capacity, given by (34) and (35). Consider (35) and define the process
Then, is a Gaussian orthogonal innovations process, independent of , for , and , for . By (35), we re-write as
where is due to the joint Gaussianity of . From (A10) and the independence of and , for , (66) then follows, as well as (67).
Appendix A.2. Proof of Proposition 1
(a) The covariances of the realization of the ARMA noise of Example 2(b) satisfy the recursions
If the recursion is initiated at the stationary value , then ; hence, is stationary, which then implies the stationarity of . Hence, if (133) holds, then is stationary. Via simple calculations, (134) then follows. We carry out the same operation for the one-sided ARMA. (b) By the above covariances, for all , we have , where , which then implies . Similarly, , . (c) (135) and (136) follow on from Lemma 1 by replacing the conditioning information with in (85). By mean square estimation, the initial data are
The last part is obvious.
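The stationarity argument in the proof above is a fixed-point statement: initializing the covariance recursion at its stationary (Lyapunov) value keeps it constant for all time, while any other initialization merely converges to it. A scalar sketch with an assumed stable coefficient, not the ARMA recursions of the proposition:

```python
# Scalar covariance recursion  Sigma_{t+1} = a^2 * Sigma_t + q.
# Initializing at the stationary value Sigma* = q / (1 - a^2)
# keeps the covariance (hence the Gaussian process) stationary.
a, q = 0.7, 1.0
sigma_star = q / (1.0 - a * a)

sigma, path = sigma_star, []
for _ in range(20):
    sigma = a * a * sigma + q
    path.append(sigma)
constant = all(abs(s - sigma_star) < 1e-9 for s in path)

# Any other initialization converges to Sigma* but is not constant.
sigma = 0.0
for _ in range(200):
    sigma = a * a * sigma + q
converged = abs(sigma - sigma_star) < 1e-9
print(constant, converged)
```

This is the scalar analogue of initiating the recursion at the stationary value so that the covariance, and with it the process, is stationary from the start.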
Appendix A.3. Proof of Corollary 3
(a) Since we have assumed that is fixed, and that it is known to the encoder and the decoder, then Theorem 1 still holds. This is due to replacing all conditional distributions, expectations and entropies via corresponding expressions with a fixed . Hence, (75) is replaced by (141), and (68) is replaced by (142) (since the code is allowed to depend on ). (b) From the PO-SS realization of Definition 2 with fixed, it follows that a necessary condition for Conditions 1 of Section 1.1 to hold is (i). The expression of entropy (143) is easily obtained by invoking Condition (i) and the properties of conditional entropy. That is, by independence of and , and , etc. The last statement is obvious. This completes the proof.
Appendix A.4. Proof of Theorem 3
(a) Clearly, (154)–(165) follow directly from Theorem 1 and the preliminary calculations prior to the statement of the theorem. However, (154)–(165) can also be shown independently of Theorem 1 by invoking the maximum entropy property of Gaussian distributions, as follows: By Lemma 1, we have . By the maximum entropy principle, is maximized if is jointly Gaussian, the average power constraint holds, and (17) is respected. By (146), (150), and (160), if (150)–(165) hold, then is jointly Gaussian; hence, is maximized. This shows (a).
- (b) Step 1. By (163) and (164), an alternative representation of to the one given in Theorem 1 and induced by (68) is
for some nonrandom . Upon substituting (A14) into the channel output , we have
The right-hand side of (A17) is driven by two independent processes, and , which are also mutually independent. Further, the right-hand side of (A17) is a linear function of a state process , which satisfies the following recursion (88):
Note that the right-hand side of (A18) is driven by the orthogonal process , which is independent of and hence of . It is also independent of . By (167), is independent of and of . By (A17) and (A18), it follows that satisfies a generalized Kalman filter recursion, similar to that of Lemma 1. Hence, the entropy can be computed using the innovations process of , as in Lemma 1. Define the orthogonal Gaussian innovations process of by
The entropy of is computed as follows.
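The entropy computation via the innovations process rests on the chain rule h(Y^n) = sum_t h(Y_t | Y^{t-1}), with each conditional variance appearing as a squared Cholesky diagonal of the joint covariance. A self-contained numerical check, using an illustrative AR(1)-like covariance and a plain-Python Cholesky factorization:

```python
import math

# Joint Gaussian entropy equals the sum of innovations entropies:
#   0.5*log((2 pi e)^n det C) == sum_t 0.5*log(2 pi e * d_t),
# where d_t = L[t][t]^2 are the squared Cholesky diagonals of C,
# i.e., the conditional variances Var(Y_t | Y^{t-1}).

def cholesky(C):
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

# Illustrative positive-definite covariance (AR(1)-like, rho = 0.6).
rho, n = 0.6, 4
C = [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

L = cholesky(C)
logdet = 2.0 * sum(math.log(L[t][t]) for t in range(n))
joint_entropy = 0.5 * (n * math.log(2.0 * math.pi * math.e) + logdet)
innovations_entropy = sum(0.5 * math.log(2.0 * math.pi * math.e * L[t][t] ** 2)
                          for t in range(n))
print(joint_entropy, innovations_entropy)
```

The two quantities agree to numerical precision, which is exactly why the entropy of the output process can be computed term by term from the innovations variances, as done in the step above.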
Appendix A.5. Proof of Proposition 2
Since the proof of [4] (Theorem 6.1) is based on [4] (Lemma 6.1), where the channel input is expressed as , where is a nonrandom vector, then (211) is necessary for [4] (Theorem 6.1) to hold. Next, we show that Conditions 1 and 2 of Section 1.1 are necessary and sufficient for Equality (211) to hold. To avoid a complex notation, we prove the claim for the realization of Example 2(a). Suppose that the initial state of the noise is and that it is known to the encoder and the decoder; without loss of generality, take , which, by (127), implies (as is often carried out in [4]). Then, the following hold:
From (A28)–(A32), it then follows that for any , including , it is known by the encoder that the following equalities hold:
We can go one step further to identify the information structure of optimal channel input distributions using (A34), i.e., to show , by repeating the proof of [8] (Theorem 1). However, for the statement of the proposition, this is not necessary.
Suppose that either is not known to the encoder, i.e., are not known to the encoder, and , while the optimal channel input is expressed as a function of the state of the noise, :
Appendix A.6. Proof of Lemma 2
Appendix A.7. Proof of Lemma 4
From Corollary 6(a), we deduce that satisfies (239) with initial condition (240). By Definition 7, the corresponding generalized algebraic Riccati equation is (241), and pairs and are given by (242).
- (1) By Definition 7, for , the pair is observable and hence detectable.
(2) By Definition 7, the pair is unit circle controllable if and only if .
(3) By Definition 7, the pair is stabilizable if and only if .
(4) This follows from Theorem 5(1) and parts (1)–(3). Since (241) is a quadratic equation, we can verify that the two solutions are and , and we consider the statement of (243).
(5) For the values and , the pair is detectable and the pair is stabilizable, and the statement follows from Theorem 5(3).
Appendix A.8. Proof of Corollary 11
First, note that the analog of Theorem 1(a), for the code , is (299) and (300) because . Define as in (300), with being replaced by .
- (a) Then
The PO-SS realization, for a fixed , is then
Then,
The probability distribution is then given by
The pay-off is the sum of conditional entropies , and the constraint is (300). By Definition 2, the state is Markov, . By (A44) and the Markov property of , at each time t, the input distribution depends on and not on . By (301) and (303), . It is noted that (301) and (303) also follow from a slight variation of the derivation given in [8] (Theorem 1). By the maximum entropy principle of Gaussian distributions, it then follows that the distribution is conditionally Gaussian, with linear conditional mean and nonrandom conditional covariance. Then, (304) follows by repeating Step 2 of the derivation of Theorem 3. This completes the derivation of all statements of part (a).
(b,c) The statements follow from part (a), using the generalized Kalman filter, as in Theorem 3.
References
- Kourtellaris, C.; Charalambous, C.D.; Sergey, L. New Formulas for Ergodic Feedback Capacity of AGN Channels Driven by Stable and Unstable Autoregressive Noise. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
- Louka, S.; Kourtellaris, C.; Charalambous, C.D. Qualitative Analysis of Feedback Capacity of AGN Channels Driven by Stable and Unstable Autoregressive Moving Average. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. [Google Scholar]
- Charalambous, C.D.; Kourtellaris, C.; Louka, S. Sequential Characterization of Cover and Pombra Gaussian Feedback Capacity: Generalizations to MIMO Channels via a Sufficient Statistic. In Proceedings of the 2021 IEEE Information Theory Workshop (ITW), Kanazawa, Japan, 17–21 October 2021. [Google Scholar]
- Kim, Y.H. Feedback Capacity of Stationary Gaussian Channels. IEEE Trans. Inf. Theory 2010, 56, 57–85. [Google Scholar] [CrossRef]
- Cover, T.; Pombra, S. Gaussian feedback capacity. IEEE Trans. Inf. Theory 1989, 35, 37–43. [Google Scholar] [CrossRef]
- Derpich, M.S.; Ostergaard, J. Comments on “Feedback Capacity of Stationary Gaussian Channels”. IEEE Trans. Inf. Theory 2024, 70, 1848–1851. [Google Scholar] [CrossRef]
- Gattami, A. Feedback Capacity of Gaussian Channels Revisited. IEEE Trans. Inf. Theory 2019, 65, 1948–1960. [Google Scholar] [CrossRef]
- Yang, S.; Kavcic, A.; Tatikonda, S. On Feedback Capacity of Power-Constrained Gaussian Noise Channels with Memory. IEEE Trans. Inf. Theory 2007, 53, 929–954. [Google Scholar] [CrossRef]
- Ihara, S. On the Feedback Capacity of the First-Order Moving Average Gaussian Channel. Jpn. J. Stat. Data Sci. 2019, 2, 491–506. [Google Scholar] [CrossRef]
- Charalambous, C.D.; Kourtellaris, C.; Louka, S. New Formulas of Feedback Capacity for AGN Channels with Memory: A Time-Domain Sufficient Statistic Approach. arXiv 2020, arXiv:2010.06226. [Google Scholar] [CrossRef]
- Liu, T.; Han, G. Feedback Capacity of Stationary Gaussian Channels Further Examined. IEEE Trans. Inf. Theory 2019, 64, 2494–2506. [Google Scholar] [CrossRef]
- Li, C.; Elia, N. Youla coding and computation of Gaussian feedback capacity. IEEE Trans. Inf. Theory 2019, 64, 3197–3215. [Google Scholar] [CrossRef]
- Butman, S. Linear Feedback Rate Bounds for Regressive Channels. IEEE Trans. Inf. Theory 1976, 22, 363–366. [Google Scholar] [CrossRef]
- Wolfowitz, J. Signalling over a Gaussian Channel with Feedback and Autoregressive Noise. J. Appl. Probab. 1975, 12, 713–723. [Google Scholar] [CrossRef]
- Ihara, S. Information Theory for Continuous Systems; World Scientific: Singapore, 1993; pp. I–XIII, 1–308. [Google Scholar]
- Caines, P.E. Linear Stochastic Systems; Wiley Series in Probability and Statistics; John Wiley & Sons, Inc.: New York, NY, USA, 1988. [Google Scholar]
- Kailath, T.; Sayed, A.; Hassibi, B. Linear Estimation; Prentice Hall: Hoboken, NJ, USA, 2000. [Google Scholar]
- Gallager, R.G. Information Theory and Reliable Communication; John Wiley & Sons, Inc.: New York, NY, USA, 1968. [Google Scholar]
- Butman, S. A General Formulation of Linear Feedback Communication Systems with Solutions. IEEE Trans. Inf. Theory 1969, 15, 392–400. [Google Scholar] [CrossRef]
- Ebert, P.M. The capacity of the Gaussian channel with feedback. Bell Syst. Tech. J. 1970, 49, 1705–1712. [Google Scholar] [CrossRef]
- Tiernan, J.; Schalkwijk, J.P.M. An upper bound to the capacity of the band-limited Gaussian autoregressive channel with noiseless feedback. IEEE Trans. Inf. Theory 1974, 20, 311–316. [Google Scholar] [CrossRef]
- Dembo, A. On Gaussian Feedback Capacity. IEEE Trans. Inf. Theory 1989, 35, 1072–1076. [Google Scholar] [CrossRef]
- Ihara, S.; Yanagi, K. Capacity of discrete-time Gaussian channels with and without feedback-II. Jpn. J. Appl. Math. 1989, 6, 245–258. [Google Scholar] [CrossRef]
- Ozarow, L.H. Upper Bounds on the Capacity of Gaussian Channels with Feedback. IEEE Trans. Inf. Theory 1990, 36, 156–161. [Google Scholar] [CrossRef]
- Ozarow, L.H. Random coding for additive Gaussian channels with Feedback. IEEE Trans. Inf. Theory 1990, 36, 17–22. [Google Scholar] [CrossRef]
- Yanagi, K. Necessary and sufficient conditions for the capacity of the discrete-time Gaussian channel to be increased by feedback. IEEE Trans. Inf. Theory 1992, 38, 1788–1791. [Google Scholar] [CrossRef]
- Yanagi, K. An upper bound on the discrete-time Gaussian channel with feedback-II. IEEE Trans. Inf. Theory 1994, 40, 588–593. [Google Scholar] [CrossRef]
- Chen, H.W.; Yanagi, K. Refinements of the half-bit and factor-of-two bounds for capacity in Gaussian channels with feedback. IEEE Trans. Inf. Theory 1999, 45, 316–325. [Google Scholar]
- Chen, H.W.; Yanagi, K. Upper bounds on the capacity of discrete-time blockwise white Gaussian channels with feedback. IEEE Trans. Inf. Theory 2000, 46, 1125–1131. [Google Scholar] [CrossRef]
- Gallager, R.G.; Nakiboglu, B. Variations on a Theme by Schalkwijk and Kailath. IEEE Trans. Inf. Theory 2010, 56, 6–17. [Google Scholar] [CrossRef]
- Ordentlich, E. A Class of Optimal Coding Schemes for Moving Average Additive Gaussian Noise Channels with Feedback. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT), Trondheim, Norway, 27 June–1 July 1994; p. 467. [Google Scholar]
- Tatikonda, S.; Mitter, S. The Capacity of Channels with Feedback. IEEE Trans. Inf. Theory 2009, 55, 323–349. [Google Scholar] [CrossRef]
- Liu, T.; Han, G. The ARMA(k) Gaussian Feedback Capacity. In Proceedings of the IEEE International Symposium on Information Theory Proceedings (ISIT), Aachen, Germany, 25–30 June 2017; pp. 211–215. [Google Scholar]
- Kim, Y.H. Feedback capacity of the first-order moving average Gaussian channel. IEEE Trans. Inf. Theory 2006, 52, 3063–3079. [Google Scholar] [CrossRef]
- Kourtellaris, C.; Charalambous, C.D.; Loyka, S. From Feedback Capacity to Tight Achievable Bounds without Feedback for AGN Channels with Stable and Unstable Autoregressive Noise. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020. [Google Scholar]
- Kourtellaris, C.; Charalambous, C.D. Information Structures of Capacity Achieving Distributions for Feedback Channels with Memory and Transmission Cost: Stochastic Optimal Control & Variational Equalities. IEEE Trans. Inf. Theory 2018, 64, 4962–4992. [Google Scholar]
- Charalambous, C.D.; Kourtellaris, C.; Loyka, S. Capacity Achieving Distributions and Separation Principle for Feedback Gaussian Channels with Memory: The LQG Theory of Directed Information. IEEE Trans. Inf. Theory 2018, 64, 6384–6418. [Google Scholar] [CrossRef]
- Sabag, O.; Kostina, V.; Hassibi, B. Feedback Capacity of MIMO Gaussian Channels. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021. [Google Scholar]
- Kumar, P.R.; Varaiya, P. Stochastic Systems: Estimation, Identification, and Adaptive Control; Prentice Hall: Hoboken, NJ, USA, 1986. [Google Scholar]
- van Schuppen, J.H. Control and System Theory of Discrete-Time Stochastic Systems; Springer: Berlin/Heidelberg, Germany, 2021. [Google Scholar]
- Charalambous, C.D.; Louka, S. A Riccati-Lyapunov Approach to Nonfeedback Capacity of MIMO Gaussian Channels Driven by Stable and Unstable Noise. In Proceedings of the 2022 IEEE Information Theory Workshop (ITW), Mumbai, India, 1–9 November 2022; pp. 184–189. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).